Introduction
Welcome to Edition 14 of Making SAS Accessible to Everyone. This edition focuses on an essential task in data processing: combining tables. Whether you’re preparing monthly reports, integrating survey results, building comprehensive views across multiple sources, or staging data for modeling, SAS provides powerful tools for integrating datasets in flexible and efficient ways.
In this article, we’ll explore how to:
Concatenate datasets using
SETMerge datasets using
MERGEandBYFilter and track rows using the
IN=optionApply conditional logic for matching and nonmatching rows
Use practical scenarios to demonstrate table combination strategies
1. Concatenating Tables Using SET
Basic Syntax and Behavior
DATA output_table;
SET table1 table2;
RUN;
This approach stacks the rows of table2 below table1. It’s often used to:
Combine data from multiple time periods (e.g., monthly sales files)
Append survey results collected across different locations
Integrate datasets with identical structure
Column Handling and Attributes
Shared variables are aligned by name and type.
Variables exclusive to one table appear in the result with missing values for non-origin rows.
Important: SAS assigns the variable attributes (type, length, format) from the first dataset listed. Be cautious when appending datasets with differences.
Managing Inconsistent Columns
Use RENAME= to ensure consistency before stacking.
DATA combined;
SET table1(rename=(Name=FullName)) table2;
RUN;
This ensures seamless concatenation where variable names differ slightly.
2. Merging Tables Using MERGE and BY
Pre-sorting the Datasets
Merging requires both datasets to be sorted by the merge key(s):
PROC SORT DATA=table1 OUT=sorted1;
BY ID;
RUN;
PROC SORT DATA=table2 OUT=sorted2;
BY ID;
RUN;
This step ensures that SAS aligns rows correctly during the merge process.
Performing the Merge
DATA merged_table;
MERGE sorted1 sorted2;
BY ID;
RUN;
Combines variables from
table1andtable2for rows with the sameID.Produces a full outer join by default, includes all observations from both tables.
Types of Merge Results
One-to-one: one row in each table per key.
One-to-many or many-to-many: SAS merges row combinations in sequence. Be careful to verify data alignment.
3. Controlling Output Using the IN= Option
Purpose of IN=
The IN= option creates Boolean variables indicating whether a row originated from a specific dataset. It is used for:
Identifying matches and nonmatches
Performing conditional logic after merging
Syntax Example
DATA merged_flags;
MERGE table1(IN=in1) table2(IN=in2);
BY ID;
IF in1 AND in2; /* Keep only matched records */
RUN;
Interpretation:
in1 = 1if the row exists intable1.in2 = 1if the row exists intable2.Allows filtering of rows according to presence in source tables.
Filtering for Nonmatches
DATA only_in_table1;
MERGE table1(IN=a) table2(IN=b);
BY ID;
IF a AND NOT b;
RUN;
This keeps rows that exist in table1 but not in table2, useful for QA or incremental load monitoring.
4. Practical Example: Quarterly Sales
Step 1: Define Q1 and Q2 Tables
DATA sales_q1;
INPUT ID $ Revenue;
DATALINES;
A1 1000
A2 1500
A3 1700
;
RUN;
DATA sales_q2;
INPUT ID $ Revenue;
DATALINES;
A2 1200
A3 1300
A4 1400
;
RUN;
Step 2: Merge and Filter Matches
DATA q1_q2_overlap;
MERGE sales_q1(IN=inQ1) sales_q2(IN=inQ2);
BY ID;
IF inQ1 AND inQ2;
RUN;
Result: Only customers present in both quarters are retained. Helpful in customer retention analysis.
Step 3: Combine All Records
DATA all_sales;
MERGE sales_q1(IN=in1) sales_q2(IN=in2);
BY ID;
RUN;
Adds a row per unique ID across both quarters. Use IN= flags to identify and analyze new or lost customers.
Conclusion
In Edition 14, you learned to:
Concatenate tables row-wise using
SET, ideal for appending data with shared structure.Merge datasets using
MERGEandBYto combine variables on common keys.Use the
IN=option to filter and tag rows based on source dataset origin.Apply practical join strategies for matched and unmatched records.
Combining tables is a foundational skill in SAS programming and supports advanced data preparation and integration tasks.
What’s Next
In Edition 15, we’ll explore processing repetitive codes.
Combine confidently with 3 D Statistical Learning.
Special thanks to Dr. Dany Djeudeu for advancing accessibility in SAS education through structured, example-driven guidance.
We help businesses and researchers solve complex challenges by providing expert guidance in statistics, machine learning, and tailored education.
Our core services include:
– Statistical Consulting:
Comprehensive consulting tailored to your data-driven needs.
– Training and Coaching:
In-depth instruction in statistics, machine learning, and the use of statistical software such as SAS, R, and Python.
– Reproducible Data Analysis Pipelines:
Development of documented, reproducible workflows using SAS macros and customized R and Python code.
– Interactive Data Visualization and Web Applications:
Creation of dynamic visualizations and web apps with R (Shiny, Plotly), Python (Streamlit, Dash by Plotly), and SAS (SAS Viya, SAS Web Report Studio).
– Automated Reporting and Presentation:
Generation of automated reports and presentations using Markdown and Quarto.
– Scientific Data Analysis:
Advanced analytical support for scientific research projects.