Introduction

Welcome to Edition 14 of Making SAS Accessible to Everyone. This edition focuses on an essential task in data processing: combining tables. Whether you’re preparing monthly reports, integrating survey results, building comprehensive views across multiple sources, or staging data for modeling, SAS provides powerful tools for integrating datasets in flexible and efficient ways.

In this article, we’ll explore how to:

  • Concatenate datasets using SET

  • Merge datasets using MERGE and BY

  • Filter and track rows using the IN= option

  • Apply conditional logic for matching and nonmatching rows

  • Use practical scenarios to demonstrate table combination strategies


1. Concatenating Tables Using SET

Basic Syntax and Behavior

DATA output_table;
    SET table1 table2;
RUN;

This approach stacks the rows of table2 below table1. It’s often used to:

  • Combine data from multiple time periods (e.g., monthly sales files)

  • Append survey results collected across different locations

  • Integrate datasets with identical structure

Column Handling and Attributes

  • Shared variables are aligned by name and type.

  • Variables exclusive to one table appear in the result with missing values for non-origin rows.

  • Important: SAS assigns the variable attributes (type, length, format) from the first dataset listed. Be cautious when appending datasets with differences.

Managing Inconsistent Columns

Use RENAME= to ensure consistency before stacking.

DATA combined;
    SET table1(rename=(Name=FullName)) table2;
RUN;

This ensures seamless concatenation where variable names differ slightly.


2. Merging Tables Using MERGE and BY

Pre-sorting the Datasets

Merging requires both datasets to be sorted by the merge key(s):

PROC SORT DATA=table1 OUT=sorted1;
    BY ID;
RUN;

PROC SORT DATA=table2 OUT=sorted2;
    BY ID;
RUN;

This step ensures that SAS aligns rows correctly during the merge process.

Performing the Merge

DATA merged_table;
    MERGE sorted1 sorted2;
    BY ID;
RUN;
  • Combines variables from table1 and table2 for rows with the same ID.

  • Produces a full outer join by default, includes all observations from both tables.

Types of Merge Results

  • One-to-one: one row in each table per key.

  • One-to-many or many-to-many: SAS merges row combinations in sequence. Be careful to verify data alignment.


3. Controlling Output Using the IN= Option

Purpose of IN=

The IN= option creates Boolean variables indicating whether a row originated from a specific dataset. It is used for:

  • Identifying matches and nonmatches

  • Performing conditional logic after merging

Syntax Example

DATA merged_flags;
    MERGE table1(IN=in1) table2(IN=in2);
    BY ID;
    IF in1 AND in2; /* Keep only matched records */
RUN;

Interpretation:

  • in1 = 1 if the row exists in table1.

  • in2 = 1 if the row exists in table2.

  • Allows filtering of rows according to presence in source tables.

Filtering for Nonmatches

DATA only_in_table1;
    MERGE table1(IN=a) table2(IN=b);
    BY ID;
    IF a AND NOT b;
RUN;

This keeps rows that exist in table1 but not in table2, useful for QA or incremental load monitoring.


4. Practical Example: Quarterly Sales

Step 1: Define Q1 and Q2 Tables

DATA sales_q1;
    INPUT ID $ Revenue;
    DATALINES;
    A1 1000
    A2 1500
    A3 1700
    ;
RUN;

DATA sales_q2;
    INPUT ID $ Revenue;
    DATALINES;
    A2 1200
    A3 1300
    A4 1400
    ;
RUN;

Step 2: Merge and Filter Matches

DATA q1_q2_overlap;
    MERGE sales_q1(IN=inQ1) sales_q2(IN=inQ2);
    BY ID;
    IF inQ1 AND inQ2;
RUN;

Result: Only customers present in both quarters are retained. Helpful in customer retention analysis.

Step 3: Combine All Records

DATA all_sales;
    MERGE sales_q1(IN=in1) sales_q2(IN=in2);
    BY ID;
RUN;

Adds a row per unique ID across both quarters. Use IN= flags to identify and analyze new or lost customers.


Conclusion

In Edition 14, you learned to:

  • Concatenate tables row-wise using SET, ideal for appending data with shared structure.

  • Merge datasets using MERGE and BY to combine variables on common keys.

  • Use the IN= option to filter and tag rows based on source dataset origin.

  • Apply practical join strategies for matched and unmatched records.

Combining tables is a foundational skill in SAS programming and supports advanced data preparation and integration tasks.


What’s Next

In Edition 15, we’ll explore processing repetitive codes.


Combine confidently with 3 D Statistical Learning.

Special thanks to Dr. Dany Djeudeu for advancing accessibility in SAS education through structured, example-driven guidance.