Introduction

Welcome to Edition 16 of Making SAS Accessible to Everyone. In our ongoing journey through SAS programming, we now turn our focus to a crucial part of data preparation and analysis, restructuring tables.

In many real-world scenarios, the structure of your dataset may not be suitable for the type of analysis or visualization you want to perform. For instance, statistical models may require data in a long format, while reporting tools may need it in a wide format. Knowing how to pivot and unpivot your data is therefore essential.

SAS provides two principal methods for this:

  • Manual reshaping using the DATA step with arrays and loops

  • Automated reshaping using PROC TRANSPOSE, a powerful built-in procedure

This edition provides you with a comprehensive guide to both approaches, offering use cases, syntax examples, and guidance on when to use each method.


1. Restructuring Data with the DATA Step

The DATA step is extremely flexible and allows you to reshape data with precision, especially when dealing with non-standard structures.

Manually Stacking Columns (Wide to Long)

Suppose each row in your dataset contains scores for multiple subjects. You want to transform these into one row per subject score.

DATA scores_long_manual;
    SET scores;
    Test = "Math"; Score = Math; OUTPUT;
    Test = "English"; Score = English; OUTPUT;
    Test = "Science"; Score = Science; OUTPUT;
RUN;

Using Arrays and DO Loops

For larger numbers of columns, manual coding becomes impractical. Arrays and loops streamline this process.

DATA scores_long_array;
    SET scores;
    ARRAY Scores[3] Math English Science;
    ARRAY Tests[3] $8 _TEMPORARY_ ("Math", "English", "Science");
    DO i = 1 TO 3;
        Test = Tests[i];
        Score = Scores[i];
        OUTPUT;
    END;
RUN;

This approach can easily be scaled up for dozens of variables.


2. Restructuring Data with PROC TRANSPOSE

PROC TRANSPOSE provides a fast and efficient way to pivot datasets. It is ideal for straightforward transformations where variable names and values need to be rotated.

Basic Use

PROC TRANSPOSE DATA=input OUT=output PREFIX=col;
    VAR values;
RUN;

This flips columns into rows, optionally adding a prefix to output variables.

Customizing with ID and BY

PROC TRANSPOSE DATA=scores_long OUT=scores_wide PREFIX=TestScore;
    BY StudentID;
    ID Test;
    VAR Score;
RUN;
  • BY keeps rows grouped by StudentID.

  • ID uses values in the Test column to create new variable names.

  • VAR specifies the numeric variable to transpose.

Output Review and Cleanup

The output typically includes:

  • A _NAME_ column (from the source variable name)

  • One column per value in the ID variable

To refine your dataset, apply a follow-up DATA step to drop, rename, or format columns as needed.


3. Comparing the Two Methods

FeatureDATA Step Manual LogicPROC TRANSPOSE
FlexibilityHigh – supports complex logicModerate – best for simple pivots
Syntax ComplexityMedium to HighLow – concise and direct
Output ControlFull controlNeeds post-processing
ScalabilityHigh with arraysHigh with few lines

Use the DATA step when fine-grained control is necessary (e.g., conditional reshaping, renaming on-the-fly). Use PROC TRANSPOSE when you need quick reshaping of uniformly structured data.


4. Practical Scenario

Suppose you collect test scores for each student across three subjects and want to:

  1. Convert the wide dataset into a long format for repeated measures analysis.

  2. Reshape it back into wide format for reporting.

Step 1: Original Dataset

DATA scores;
    INPUT StudentID $ Math English Science;
    DATALINES;
    S01 85 90 88
    S02 78 84 82
    S03 92 89 95
    ;
RUN;

Step 2: Convert to Long Format (Array Method)

DATA scores_long;
    SET scores;
    ARRAY Subjects[3] Math English Science;
    ARRAY Names[3] $8 _TEMPORARY_ ("Math" "English" "Science");
    DO i = 1 TO 3;
        Test = Names[i];
        Score = Subjects[i];
        OUTPUT;
    END;
RUN;

Step 3: Convert Back to Wide Format

PROC TRANSPOSE DATA=scores_long OUT=scores_wide PREFIX=Score_;
    BY StudentID;
    ID Test;
    VAR Score;
RUN;

This cycle allows data to move smoothly between analysis-friendly and reporting-friendly formats.


Conclusion

In this edition, we focused on mastering table restructuring in SAS using two main techniques:

  • Manual control with ARRAYS, DO loops, and OUTPUT in the DATA step.

  • Automated reshaping with PROC TRANSPOSE, using BY, ID, and VAR to pivot data efficiently.

Understanding when and how to reshape your data is a vital skill for every data professional. It enables flexible modeling, reporting, and integration of diverse datasets.


What’s Next

In Edition 17, we introduce macro programming in SAS. You will learn how to:

  • Create and use macro variables

  • Write reusable macro code blocks

  • Parameterize logic to make your programs more dynamic and efficient


Structure smart, reshape with confidence – with 3 D Statistical Learning.

Special thanks to Dr. Dany Djeudeu for his invaluable efforts in making SAS an accessible and powerful tool for learners at every level.