Introduction
Welcome to Edition 9 of Making SAS Accessible to Everyone. After mastering report automation with ODS in Edition 8, we now turn to data manipulation and querying using SQL within SAS.
SAS offers full support for SQL through the PROC SQL procedure, blending the power of SQL with the flexibility of SAS datasets. Whether you are already familiar with SQL or just starting out, this edition will help you efficiently query, combine, and summarize your data using PROC SQL.
1. Getting Started with PROC SQL
At its core, PROC SQL enables SQL-like querying on SAS datasets:
PROC SQL;
SELECT column1, column2
FROM dataset;
QUIT;
This is useful for quick queries, simple reports, or more advanced manipulation without writing multiple DATA steps.
Example:
PROC SQL;
SELECT Name, Age, Gender
FROM work.demographics;
QUIT;
This displays the Name, Age, and Gender columns from the dataset work.demographics.
2. Filtering Rows with WHERE
The WHERE clause narrows down the rows returned based on conditions.
PROC SQL;
SELECT *
FROM work.demographics
WHERE Age > 40 AND Gender = 'F';
QUIT;
You can use logical operators like AND, OR, NOT, as well as pattern matching with LIKE, and inclusion using IN.
WHERE Name LIKE 'A%' /* starts with A */
WHERE Age IN (25, 30, 35)
3. Sorting Rows with ORDER BY
The ORDER BY clause controls the order of displayed results:
PROC SQL;
SELECT Name, Age
FROM work.demographics
ORDER BY Age DESC;
QUIT;
You can sort by multiple columns and specify ascending (default) or descending (DESC) order.
4. Creating New Tables
You can save query results as a new SAS dataset:
PROC SQL;
CREATE TABLE work.seniors AS
SELECT *
FROM work.demographics
WHERE Age >= 65;
QUIT;
This is equivalent to using DATA and SET with a WHERE clause in a DATA step, but often more concise.
5. Deleting Tables
SQL allows dropping a dataset (table) from your SAS workspace:
PROC SQL;
DROP TABLE work.temp_data;
QUIT;
Only use this once you’re certain the table is no longer needed.
6. Joining Tables
One of SQL’s most powerful features is the ability to join datasets.
Inner Join Example:
PROC SQL;
SELECT A.ID, A.Name, B.Salary
FROM work.employees AS A
INNER JOIN work.payroll AS B
ON A.ID = B.ID;
QUIT;
This merges employees and payroll by matching ID. You can also use LEFT JOIN, RIGHT JOIN, or FULL JOIN depending on your data needs.
7. Summary and Aggregation
SQL supports aggregation functions, allowing you to summarize data:
PROC SQL;
SELECT Gender, COUNT(*) AS N, AVG(Age) AS AvgAge, MAX(Age) AS MaxAge
FROM work.demographics
GROUP BY Gender;
QUIT;
Functions available:
AVG(),SUM(),MIN(),MAX(),COUNT()GROUP BYgroups data by one or more variables
Use HAVING to filter groups after aggregation:
HAVING AVG(Age) > 40
8. Combining Queries with UNION and EXCEPT
UNION: Combine rows from two queries
SELECT Name FROM table1
UNION
SELECT Name FROM table2;
EXCEPT: Keep values from the first query that aren’t in the second
SELECT Name FROM table1
EXCEPT
SELECT Name FROM table2;
These are useful for comparing datasets.
Conclusion
In Edition 9, we learned to:
- Use
PROC SQLto query and manipulate SAS data - Filter and sort results using
WHEREandORDER BY - Create and delete datasets dynamically
- Join datasets to enrich and combine information
- Aggregate and summarize data with SQL functions
- Use
UNIONandEXCEPTfor dataset comparison
SQL provides a powerful, readable, and flexible way to work with SAS datasets, especially when handling complex transformations and joins.
What’s Next
In Edition 10, we’ll transition from querying data to Controlling DATA Step Processing.
Stay analytical and SQL-savvy with 3 D Statistical Learning.
Special thanks to Dr. Dany Djeudeu for promoting clarity and accessibility in statistical programming.
We help businesses and researchers solve complex challenges by providing expert guidance in statistics, machine learning, and tailored education.
Our core services include:
– Statistical Consulting:
Comprehensive consulting tailored to your data-driven needs.
– Training and Coaching:
In-depth instruction in statistics, machine learning, and the use of statistical software such as SAS, R, and Python.
– Reproducible Data Analysis Pipelines:
Development of documented, reproducible workflows using SAS macros and customized R and Python code.
– Interactive Data Visualization and Web Applications:
Creation of dynamic visualizations and web apps with R (Shiny, Plotly), Python (Streamlit, Dash by Plotly), and SAS (SAS Viya, SAS Web Report Studio).
– Automated Reporting and Presentation:
Generation of automated reports and presentations using Markdown and Quarto.
– Scientific Data Analysis:
Advanced analytical support for scientific research projects.