Introduction

Welcome to Edition 10 of Making SAS Accessible to Everyone. After exploring SQL integration in SAS in Edition 9, we now turn our attention back to the DATA step, the core of SAS programming. This edition focuses on controlling DATA step processing for efficient, readable, and accurate code.

Understanding the internal mechanics of the DATA step, especially the distinction between the compilation and execution phases, is essential for mastering SAS. You’ll learn how to manage outputs explicitly, use DROP= and KEEP= efficiently, and debug with PUTLOG. Mastering these tools makes your programs faster, cleaner, and easier to troubleshoot.

1. DATA Step Processing Phases

Compilation Phase

When a DATA step is compiled, SAS does not execute any data-related logic. Instead, it:

Scans your code to determine variable names, types (numeric or character), and lengths.
Builds the Program Data Vector (PDV), a memory structure that holds one observation at a time.
Prepares the data descriptor portion of the output dataset.

Even if the dataset has millions of rows, no data is read during compilation; SAS is only preparing the structure.
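The compile-time behavior described above can be sketched as follows. This is a minimal illustration, assuming a small hypothetical work.patients dataset whose variables and values are invented here. Because the condition IF 0 is never true, the SET statement never executes, yet the compiler still reads the descriptor of work.patients and builds the PDV from it; STOP then ends the step before any observation is written.

```sas
/* Hypothetical sample data so the sketch is self-contained */
DATA work.patients;
    INPUT ID Age Gender $;
    DATALINES;
1 70 F
2 34 M
;
RUN;

/* Compilation alone defines the variables of work.shell */
DATA work.shell;
    IF 0 THEN SET work.patients;   /* never executes, but is still compiled */
    STOP;                          /* end the step before any row is written */
RUN;

/* Inspect the descriptor portion of the result */
PROC CONTENTS DATA=work.shell;
RUN;
```

PROC CONTENTS should report the variables ID, Age, and Gender with zero observations, which shows that the structure comes from compilation rather than from reading data.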

Execution Phase

In this phase, SAS reads and processes data row-by-row:

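Reads the next record from the source dataset (the first record on the first iteration).
Fills the PDV with values.
Executes all logic (e.g., IF, PUT, OUTPUT statements).
Writes the resulting observation to the output dataset (implicitly or explicitly).
Returns to the top of the step, resets variables created in the step to missing (variables read with SET keep their values until the next record overwrites them), and repeats the process for the next row.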

Use `PUTLOG` as a diagnostic tool during execution to inspect values in real time.

PUTLOG _ALL_;         * Logs all variables and their values;
PUTLOG Age= Gender=;  * Logs selected variables;
PUTLOG "Checkpoint";  * Writes a custom message;
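To see the execution cycle in the log, the sketch below writes one line after the record is read and one line at the bottom of each iteration. It assumes a hypothetical work.patients dataset and a hypothetical derived variable AgeGroup; the automatic variable _N_ counts iterations.

```sas
/* Hypothetical sample data so the sketch is self-contained */
DATA work.patients;
    INPUT ID Age Gender $;
    DATALINES;
1 70 F
2 34 M
3 65 F
;
RUN;

DATA work.trace;
    LENGTH AgeGroup $6;            /* declare before first use so it is character */
    SET work.patients;
    /* AgeGroup was reset to missing at the top of the iteration */
    PUTLOG "After SET: " _N_= Age= AgeGroup=;
    IF Age >= 65 THEN AgeGroup = 'Senior';
    ELSE AgeGroup = 'Junior';
    /* These are the values that implicit output will write */
    PUTLOG "Bottom:    " _N_= Age= AgeGroup=;
RUN;
```

Comparing the two lines for each value of _N_ shows that Age is refreshed by SET while AgeGroup starts every iteration blank.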

2. Implicit vs. Explicit OUTPUT

Implicit Output

By default, SAS automatically writes each PDV record to the output dataset at the end of every iteration. This behavior is called implicit output.

Explicit Output

As soon as an OUTPUT statement appears anywhere in the DATA step, SAS turns off implicit output, and an observation is written only when OUTPUT executes:

DATA result;
    SET patients;
    IF Age >= 65 THEN OUTPUT;   /* Only output these rows */
RUN;
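Because OUTPUT writes the PDV as it stands at that moment, the position of the statement matters. The following sketch, which assumes a hypothetical work.patients dataset and a hypothetical variable Flag, illustrates that assignments made after OUTPUT do not reach the dataset.

```sas
/* Hypothetical sample data so the sketch is self-contained */
DATA work.patients;
    INPUT ID Age;
    DATALINES;
1 70
2 34
;
RUN;

DATA work.timing;
    SET work.patients;
    Flag = 'before';
    OUTPUT;            /* the row is written here, with Flag = 'before' */
    Flag = 'after';    /* changes the PDV only; no implicit output follows */
RUN;

PROC PRINT DATA=work.timing;
RUN;
```

Every row in work.timing should carry Flag = 'before', since the presence of OUTPUT disables the implicit write at the end of the step.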

You may also create multiple output datasets:

DATA seniors juniors;
    SET patients;
    IF Age >= 65 THEN OUTPUT seniors;
    ELSE OUTPUT juniors;
RUN;

This technique is ideal for classification and segmentation.
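One detail worth guarding against when splitting data this way: SAS treats a missing numeric value as smaller than any number, so a patient with a missing Age would land in juniors. The sketch below routes such rows to a third dataset; the name work.unknown and the sample data are assumptions made for illustration.

```sas
/* Hypothetical sample data; patient 3 has a missing Age */
DATA work.patients;
    INPUT ID Age;
    DATALINES;
1 70
2 34
3 .
;
RUN;

DATA work.seniors work.juniors work.unknown;
    SET work.patients;
    IF MISSING(Age) THEN OUTPUT work.unknown;     /* check missing values first */
    ELSE IF Age >= 65 THEN OUTPUT work.seniors;
    ELSE OUTPUT work.juniors;
RUN;
```

Checking for missing values before the numeric comparison keeps incomplete records out of both age groups.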

3. Optimizing Variables with `DROP=` and `KEEP=`

Using DROP=

The DROP= option removes variables from the output dataset, although they remain in the PDV during execution:

DATA output (DROP=TempVar);
    SET source;
    TempVar = ... ;   /* Used for calculation */
RUN;
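As a concrete illustration of the pattern above, the sketch below uses an intermediate variable to compute a body mass index and then drops it from the result. The dataset work.measures and the variables Weight_kg, Height_cm, and Height_m are hypothetical names chosen for this example.

```sas
/* Hypothetical sample data so the sketch is self-contained */
DATA work.measures;
    INPUT ID Weight_kg Height_cm;
    DATALINES;
1 82 175
2 60 162
;
RUN;

/* Height_m stays in the PDV for the calculation
   but is not written to work.bmi */
DATA work.bmi (DROP=Height_m);
    SET work.measures;
    Height_m = Height_cm / 100;
    BMI = Weight_kg / (Height_m ** 2);
RUN;
```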

Using KEEP=

The KEEP= option limits which variables are read from input and/or written to output:

DATA result;
    SET rawdata (KEEP=Name Age Gender);
RUN;

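On an input dataset, KEEP= and DROP= keep unneeded variables out of the PDV, which reduces memory use and I/O. On an output dataset, they produce leaner datasets and make the intent of the step clearer.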
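The difference between placing KEEP= on the input and on the output can be sketched as follows, assuming a hypothetical work.patients dataset and a hypothetical derived variable Senior. On the SET statement, variables that are not kept never enter the PDV; on the DATA statement, every variable is available to the logic and only the kept ones are written.

```sas
/* Hypothetical sample data so the sketch is self-contained */
DATA work.patients;
    INPUT ID Age BMI;
    DATALINES;
1 70 31.2
2 34 22.5
;
RUN;

/* KEEP= on input: BMI is never read into the PDV */
DATA work.slim_in;
    SET work.patients (KEEP=ID Age);
RUN;

/* KEEP= on output: Age is available for the calculation,
   but only ID and Senior are written */
DATA work.slim_out (KEEP=ID Senior);
    SET work.patients;
    Senior = (Age >= 65);   /* 1 if true, 0 if false */
RUN;
```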

4. Conditional Logic + Output Control

SAS allows combining logic with output and variable control for more robust pipelines:

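DATA diabetes_flags (KEEP=ID RiskLevel);
    SET patients;
    /* Without LENGTH, the first assignment ('High') would fix the
       length at 4 and truncate 'Moderate' to 'Mode' */
    LENGTH RiskLevel $8;
    IF BMI > 30 AND Age > 50 THEN RiskLevel = 'High';
    ELSE IF BMI > 25 THEN RiskLevel = 'Moderate';
    ELSE RiskLevel = 'Low';
RUN;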

By specifying KEEP=, we write only relevant variables. The logic inside allows complex condition-based classification.
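The same ideas can be combined with explicit OUTPUT and separate KEEP= lists for each output dataset. The sketch below is one possible arrangement, not a prescribed design: the dataset names work.high_risk and work.review and the sample data are assumptions, and rows with missing inputs are set aside for review instead of being classified.

```sas
/* Hypothetical sample data; patient 3 has a missing BMI */
DATA work.patients;
    INPUT ID Age BMI;
    DATALINES;
1 62 33.1
2 45 27.4
3 58 .
;
RUN;

DATA work.high_risk (KEEP=ID RiskLevel)
     work.review    (KEEP=ID Age BMI);
    SET work.patients;
    LENGTH RiskLevel $8;   /* long enough for 'Moderate' */
    IF MISSING(BMI) OR MISSING(Age) THEN OUTPUT work.review;
    ELSE DO;
        IF BMI > 30 AND Age > 50 THEN RiskLevel = 'High';
        ELSE IF BMI > 25 THEN RiskLevel = 'Moderate';
        ELSE RiskLevel = 'Low';
        IF RiskLevel = 'High' THEN OUTPUT work.high_risk;
    END;
RUN;
```

Each output dataset receives only the variables named in its own KEEP= list, even though all of them share one PDV during execution.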

5. Advanced Debugging with `PUTLOG`

Use PUTLOG to investigate how your program behaves during execution:

DATA debug_test;
    SET work.patients;
    PUTLOG "Processing ID=" ID " Age=" Age;
    IF Age < 0 THEN PUTLOG "Warning: Negative age detected for ID=" ID;
RUN;

This is particularly useful for identifying:

Incorrect data values
Logic bugs
Unexpected PDV behavior

Try PUTLOG _ALL_; to see the entire PDV content at each iteration.
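On large inputs, writing a line for every row floods the log. A hedged sketch of one way to keep the output manageable: print the PDV for the first few iterations only, flag bad rows, and report a count at the end. The counter name bad, the END= variable last, and the sample data are hypothetical. Messages that begin with NOTE: or WARNING: are highlighted like SAS's own log messages.

```sas
/* Hypothetical sample data; patient 2 has a negative Age */
DATA work.patients;
    INPUT ID Age;
    DATALINES;
1 70
2 -4
3 34
4 51
;
RUN;

DATA work.debug_test (DROP=bad);
    SET work.patients END=last;       /* last = 1 on the final record */
    IF _N_ <= 3 THEN PUTLOG _ALL_;    /* full PDV for the first 3 rows only */
    IF Age < 0 THEN DO;
        PUTLOG "WARNING: Negative age detected for " ID=;
        bad + 1;                      /* sum statement: retained, starts at 0 */
    END;
    IF last THEN PUTLOG "NOTE: Rows with negative age: " bad;
RUN;
```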

Conclusion

In Edition 10, you gained a deeper understanding of the internals of SAS DATA step processing. We covered:

The two-phase nature of DATA steps: compilation and execution
Managing output behavior using implicit and explicit OUTPUT
Optimizing memory and clarity with DROP= and KEEP=
Structuring conditional outputs
Debugging effectively with PUTLOG

These foundational concepts unlock better performance and more readable code for advanced SAS projects.

What’s Next

In Edition 11, we transition to summarizing data in SAS, building upon the summary techniques introduced in previous editions.

Keep refining your SAS skills with 3 D Statistical Learning.

Special thanks to Dr. Dany Djeudeu for demystifying core programming mechanics for learners around the world.

3 D Statistical Learning

We help businesses and researchers solve complex challenges by providing expert guidance in statistics, machine learning, and tailored education.

Our core services include:

– Statistical Consulting:
Comprehensive consulting tailored to your data-driven needs.

– Training and Coaching:
In-depth instruction in statistics, machine learning, and the use of statistical software such as SAS, R, and Python.

– Reproducible Data Analysis Pipelines:
Development of documented, reproducible workflows using SAS macros and customized R and Python code.

– Interactive Data Visualization and Web Applications:
Creation of dynamic visualizations and web apps with R (Shiny, Plotly), Python (Streamlit, Dash by Plotly), and SAS (SAS Viya, SAS Web Report Studio).

– Automated Reporting and Presentation:
Generation of automated reports and presentations using Markdown and Quarto.

– Scientific Data Analysis:
Advanced analytical support for scientific research projects.

Making SAS Accessible to Everyone – Edition 10: Controlling DATA Step Processing
