Introduction
Welcome to Edition 16 of Making SAS Accessible to Everyone. In our ongoing journey through SAS programming, we now turn our focus to a crucial part of data preparation and analysis, restructuring tables.
In many real-world scenarios, the structure of your dataset may not be suitable for the type of analysis or visualization you want to perform. For instance, statistical models may require data in a long format, while reporting tools may need it in a wide format. Knowing how to pivot and unpivot your data is therefore essential.
SAS provides two principal methods for this:
Manual reshaping using the DATA step with arrays and loops
Automated reshaping using
PROC TRANSPOSE, a powerful built-in procedure
This edition provides you with a comprehensive guide to both approaches, offering use cases, syntax examples, and guidance on when to use each method.
1. Restructuring Data with the DATA Step
The DATA step is extremely flexible and allows you to reshape data with precision, especially when dealing with non-standard structures.
Manually Stacking Columns (Wide to Long)
Suppose each row in your dataset contains scores for multiple subjects. You want to transform these into one row per subject score.
DATA scores_long_manual;
SET scores;
Test = "Math"; Score = Math; OUTPUT;
Test = "English"; Score = English; OUTPUT;
Test = "Science"; Score = Science; OUTPUT;
RUN;
Using Arrays and DO Loops
For larger numbers of columns, manual coding becomes impractical. Arrays and loops streamline this process.
DATA scores_long_array;
SET scores;
ARRAY Scores[3] Math English Science;
ARRAY Tests[3] $8 _TEMPORARY_ ("Math", "English", "Science");
DO i = 1 TO 3;
Test = Tests[i];
Score = Scores[i];
OUTPUT;
END;
RUN;
This approach can easily be scaled up for dozens of variables.
2. Restructuring Data with PROC TRANSPOSE
PROC TRANSPOSE provides a fast and efficient way to pivot datasets. It is ideal for straightforward transformations where variable names and values need to be rotated.
Basic Use
PROC TRANSPOSE DATA=input OUT=output PREFIX=col;
VAR values;
RUN;
This flips columns into rows, optionally adding a prefix to output variables.
Customizing with ID and BY
PROC TRANSPOSE DATA=scores_long OUT=scores_wide PREFIX=TestScore;
BY StudentID;
ID Test;
VAR Score;
RUN;
BYkeeps rows grouped byStudentID.IDuses values in theTestcolumn to create new variable names.VARspecifies the numeric variable to transpose.
Output Review and Cleanup
The output typically includes:
A
_NAME_column (from the source variable name)One column per value in the
IDvariable
To refine your dataset, apply a follow-up DATA step to drop, rename, or format columns as needed.
3. Comparing the Two Methods
| Feature | DATA Step Manual Logic | PROC TRANSPOSE |
|---|---|---|
| Flexibility | High – supports complex logic | Moderate – best for simple pivots |
| Syntax Complexity | Medium to High | Low – concise and direct |
| Output Control | Full control | Needs post-processing |
| Scalability | High with arrays | High with few lines |
Use the DATA step when fine-grained control is necessary (e.g., conditional reshaping, renaming on-the-fly). Use PROC TRANSPOSE when you need quick reshaping of uniformly structured data.
4. Practical Scenario
Suppose you collect test scores for each student across three subjects and want to:
Convert the wide dataset into a long format for repeated measures analysis.
Reshape it back into wide format for reporting.
Step 1: Original Dataset
DATA scores;
INPUT StudentID $ Math English Science;
DATALINES;
S01 85 90 88
S02 78 84 82
S03 92 89 95
;
RUN;
Step 2: Convert to Long Format (Array Method)
DATA scores_long;
SET scores;
ARRAY Subjects[3] Math English Science;
ARRAY Names[3] $8 _TEMPORARY_ ("Math" "English" "Science");
DO i = 1 TO 3;
Test = Names[i];
Score = Subjects[i];
OUTPUT;
END;
RUN;
Step 3: Convert Back to Wide Format
PROC TRANSPOSE DATA=scores_long OUT=scores_wide PREFIX=Score_;
BY StudentID;
ID Test;
VAR Score;
RUN;
This cycle allows data to move smoothly between analysis-friendly and reporting-friendly formats.
Conclusion
In this edition, we focused on mastering table restructuring in SAS using two main techniques:
Manual control with
ARRAYS,DOloops, andOUTPUTin the DATA step.Automated reshaping with
PROC TRANSPOSE, usingBY,ID, andVARto pivot data efficiently.
Understanding when and how to reshape your data is a vital skill for every data professional. It enables flexible modeling, reporting, and integration of diverse datasets.
What’s Next
In Edition 17, we introduce macro programming in SAS. You will learn how to:
Create and use macro variables
Write reusable macro code blocks
Parameterize logic to make your programs more dynamic and efficient
Structure smart, reshape with confidence – with 3 D Statistical Learning.
Special thanks to Dr. Dany Djeudeu for his invaluable efforts in making SAS an accessible and powerful tool for learners at every level.
We help businesses and researchers solve complex challenges by providing expert guidance in statistics, machine learning, and tailored education.
Our core services include:
– Statistical Consulting:
Comprehensive consulting tailored to your data-driven needs.
– Training and Coaching:
In-depth instruction in statistics, machine learning, and the use of statistical software such as SAS, R, and Python.
– Reproducible Data Analysis Pipelines:
Development of documented, reproducible workflows using SAS macros and customized R and Python code.
– Interactive Data Visualization and Web Applications:
Creation of dynamic visualizations and web apps with R (Shiny, Plotly), Python (Streamlit, Dash by Plotly), and SAS (SAS Viya, SAS Web Report Studio).
– Automated Reporting and Presentation:
Generation of automated reports and presentations using Markdown and Quarto.
– Scientific Data Analysis:
Advanced analytical support for scientific research projects.