Making SAS Accessible to Everyone – Edition 5: Starting Data Exploration

Introduction

This edition marks the beginning of our deep dive into data exploration using SAS.

This edition does not claim to cover data exploration in full, or even a large part of it. In our experience, data exploration can consume up to 90% of the data analytics workload, depending on the goal and the data quality.

Our goal here is simple: to introduce foundational tools and steps to start exploring your data in SAS.

In future intermediate-level tutorials, we’ll go deeper into each of the important SAS procedures for data exploration.

1. Previewing Data with `PROC PRINT`

PROC PRINT is a basic yet essential tool for examining raw data rows.

PROC PRINT DATA=input-table(OBS=10);
    VAR col1 col2 col3;
RUN;

OBS=n: A data set option that stops reading after row n, so at most n rows are displayed.
VAR: Selects and orders variables to display.

Use this as your first look into a dataset.
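
As a concrete sketch, the template above can be tried on SASHELP.CLASS, a small sample table that ships with most SAS installations (we assume it is available in yours). The NOOBS option and the choice of columns are our own additions for illustration:

```sas
/* Preview the first 5 rows of the SASHELP.CLASS sample table */
PROC PRINT DATA=sashelp.class(OBS=5) NOOBS;   /* NOOBS hides the Obs column */
    VAR Name Sex Age;                         /* only these columns, in this order */
RUN;
```

Leaving out the VAR statement prints every column in its stored order.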

2. Generating Descriptive Statistics

2.1 With `PROC MEANS`

Use PROC MEANS to quickly view summary statistics for numeric variables:

PROC MEANS DATA=input-table;
    VAR var1 var2;
RUN;

Shows by default: N, Mean, Std Dev, Minimum, and Maximum.
Use VAR to focus on specific variables; without it, all numeric variables are summarized.
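
If the default statistics are not the ones you need, you can list statistic keywords on the PROC MEANS statement. The following is a minimal sketch, assuming the SASHELP.CLASS sample table is available; the chosen statistics and the CLASS grouping are illustrative, not part of the template above:

```sas
/* Request specific statistics, computed separately for each value of Sex */
PROC MEANS DATA=sashelp.class N MEAN MEDIAN MIN MAX MAXDEC=1;
    CLASS Sex;             /* one block of statistics per group */
    VAR Height Weight;     /* numeric variables to summarize */
RUN;
```

MAXDEC=1 only limits the number of decimals printed; it does not change the computed values.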

2.2 With `PROC UNIVARIATE`

For more detailed insights (e.g., distribution, skewness, extreme values):

PROC UNIVARIATE DATA=input-table;
    VAR var1;
RUN;

This procedure is powerful for uncovering outliers and understanding distributional properties.
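
A possible next step, sketched here on the SASHELP.CLASS sample table (an assumption on our side), is to add a histogram and to label the extreme observations. The HISTOGRAM and ID statements are optional additions to the template above:

```sas
/* Detailed distribution report for one variable */
PROC UNIVARIATE DATA=sashelp.class;
    VAR Height;
    HISTOGRAM Height / NORMAL;   /* histogram with a fitted normal curve */
    ID Name;                     /* identify extreme observations by Name */
RUN;
```

The histogram requires graphics output to be enabled in your SAS session.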

3. Exploring Categorical Data with `PROC FREQ`

For counts and percentages of unique values:

PROC FREQ DATA=input-table;
    TABLES col1 col2;
RUN;

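You can add options after a slash to control how frequency tables are displayed. For example, in a two-way table NOCOL suppresses the column percentages and NOPERCENT suppresses the overall cell percentages: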

TABLES gender*region / NOCOL NOPERCENT;

Use this to explore categorical distributions and combinations.
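
Put together, a complete step might look like the sketch below. It assumes the SASHELP.CARS sample table with its Origin and Type columns; the MISSING option is our own addition and simply treats missing values as a category of their own:

```sas
PROC FREQ DATA=sashelp.cars;
    TABLES Origin / MISSING;                /* one-way table, missing values counted as a level */
    TABLES Origin*Type / NOCOL NOPERCENT;   /* two-way table: counts and row percentages only */
RUN;
```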

4. Filtering Observations with `WHERE`

Use WHERE to focus your exploration on specific subsets of the data:

PROC PRINT DATA=input-table;
    WHERE Age > 40 AND Gender = "M";
RUN;

Operators You Can Use:

= or EQ
^= or NE
> or GT, < or LT, >= or GE, <= or LE
IN, NOT IN, BETWEEN, LIKE

Example with date filter:

WHERE AdmissionDate >= "01JAN2020"d;

Use logical operators (AND, OR, NOT) for complex filters.
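
The operators listed above can be combined in a single condition. Here is a hedged sketch on the SASHELP.CLASS sample table (assumed to be available); the particular conditions are made up for illustration:

```sas
/* Girls aged 12 to 14 whose name starts with J */
PROC PRINT DATA=sashelp.class;
    WHERE Sex = "F"
      AND Age BETWEEN 12 AND 14    /* range includes both endpoints */
      AND Name LIKE "J%";          /* % matches any sequence of characters */
RUN;

/* Everyone aged 11 or 12, plus everyone who is not male */
PROC PRINT DATA=sashelp.class;
    WHERE Age IN (11, 12) OR NOT (Sex = "M");
RUN;
```

Character comparisons are case-sensitive, so "F" and "f" are different values.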

5. Dynamic Filtering with Macro Variables

%LET ageLimit = 30;

PROC PRINT DATA=input-table;
    WHERE Age > &ageLimit;
RUN;

%LET: Creates macro variables.
&macrovar: Substitutes values in code.
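Use "&charvar" for character values and "&date"d for date literals. The quotes must be double quotes, because macro variables do not resolve inside single quotes.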

Macro variables make filtering parameterized and reusable.
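
To illustrate character and date parameters together, here is a small sketch using the SASHELP.STOCKS sample table (assumed to be available, with columns Stock, Date, and Close). The macro variable names stockName and startDate are hypothetical:

```sas
%LET stockName = IBM;          /* character value, stored without quotes */
%LET startDate = 01JAN2005;    /* date text, stored without quotes or the d suffix */

PROC PRINT DATA=sashelp.stocks;
    WHERE Stock = "&stockName" AND Date >= "&startDate"d;
    VAR Stock Date Close;
RUN;
```

Changing the two %LET lines is enough to rerun the same report for another stock or period.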

6. Formatting Variables for Readability

Change how data is displayed (not stored) using formats:

PROC PRINT DATA=input-table;
    FORMAT Salary DOLLAR12.2 BirthDate DATE9.;   /* width 12 leaves room for $, commas, and decimals */
RUN;

Formats help produce clean, human-readable outputs.
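
A runnable variant, sketched on the SASHELP.STOCKS sample table (an assumption; any table with a date and a numeric amount would do), could look like this. The format widths are our own choice:

```sas
PROC PRINT DATA=sashelp.stocks(OBS=5);
    VAR Stock Date Close Volume;
    FORMAT Close DOLLAR10.2     /* dollar sign, commas, 2 decimals */
           Date DATE9.          /* e.g. 01DEC2005 */
           Volume COMMA14.;     /* thousands separators, no decimals */
RUN;
```

A FORMAT statement inside a procedure applies only to that step, so the stored table keeps its original formats and values.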

7. Sorting and Deduplicating with `PROC SORT`

Sorting:

PROC SORT DATA=input-table OUT=sorted-table;
    BY Age;
RUN;

BY: Specifies sort key(s).
DESCENDING: Placed before a variable name in BY to reverse the sort order for that variable (e.g., BY DESCENDING Age).

Removing Duplicates:

PROC SORT DATA=input-table OUT=nodups NODUPKEY;
    BY _ALL_;
RUN;

NODUPKEY: Keeps the first row for each combination of BY values and removes the rest.
_ALL_: Uses all columns to identify exact duplicates.

Add DUPOUT=dataset-name to the PROC SORT statement to save the removed rows to a separate dataset.
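
Both uses can be sketched on the SASHELP sample tables (assumed to be available). The output dataset names class_sorted, one_per_make, and removed_rows are hypothetical:

```sas
/* Sort by Sex ascending, then by Age from oldest to youngest */
PROC SORT DATA=sashelp.class OUT=work.class_sorted;
    BY Sex DESCENDING Age;
RUN;

/* Keep one row per Make and send the removed rows to a second table */
PROC SORT DATA=sashelp.cars OUT=work.one_per_make
          NODUPKEY DUPOUT=work.removed_rows;
    BY Make;
RUN;
```

Always specifying OUT= is a cautious habit: without it, PROC SORT overwrites the input table, and with NODUPKEY the removed rows would be lost.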

Conclusion

In this edition, we introduced basic yet powerful tools for exploring data in SAS:

PROC PRINT to preview data
PROC MEANS and PROC UNIVARIATE for numeric summaries
PROC FREQ for categorical analysis
WHERE statements for filtering
Macro variables for dynamic queries
FORMAT for readable output
PROC SORT for ordering and deduplication

Mastering these techniques will prepare you for more in-depth exploration and visualization in SAS.

Coming Next

In Edition 6, we’ll explore data validation and common strategies to ensure your data is clean, consistent, and analysis-ready.

Stay curious and keep coding with 3 D Statistical Learning.

Special thanks to Dr. Dany Djeudeu for his continuous effort to make statistical tools accessible and intuitive.

3 D Statistical Learning

We help businesses and researchers solve complex challenges by providing expert guidance in statistics, machine learning, and tailored education.

Our core services include:

– Statistical Consulting:
Comprehensive consulting tailored to your data-driven needs.

– Training and Coaching:
In-depth instruction in statistics, machine learning, and the use of statistical software such as SAS, R, and Python.

– Reproducible Data Analysis Pipelines:
Development of documented, reproducible workflows using SAS macros and customized R and Python code.

– Interactive Data Visualization and Web Applications:
Creation of dynamic visualizations and web apps with R (Shiny, Plotly), Python (Streamlit, Dash by Plotly), and SAS (SAS Viya, SAS Web Report Studio).

– Automated Reporting and Presentation:
Generation of automated reports and presentations using Markdown and Quarto.

– Scientific Data Analysis:
Advanced analytical support for scientific research projects.
