Introduction
Welcome to Edition 12 of Making SAS Accessible to Everyone. In the previous edition, we explored techniques for summarizing data using RETAIN, BY-group processing, and subsetting. In this edition, we delve into one of the most powerful and flexible tools in the SAS programming language: functions.
SAS functions allow users to perform data cleaning, transformation, and enhancement operations efficiently. These functions are the foundation of data wrangling, which is an essential part of any analytics pipeline. Whether you are standardizing date formats, cleaning up strings, or performing complex mathematical operations, SAS provides a rich library of built-in functions to get the job done.
This edition introduces:
Numeric and date/time functions for calculating values and intervals.
Character functions for string manipulation.
Type conversion functions for switching between numeric and character values.
CALL routines for side-effect operations that don’t return values.
1. Understanding SAS Functions and CALL Routines
SAS functions and CALL routines differ primarily in how they operate:
A function takes inputs (arguments) and returns a single value.
A CALL routine performs a task or action but does not return a value. Instead, it affects variables directly.
Syntax:
new_variable = function(arg1, arg2, ...);
CALL routine(arg1, arg2, ...);
Functions and routines can be used in DATA steps, PROC SQL, and macro expressions.
2. Numeric and Date Functions
These are essential for quantitative calculations and time-based analysis.
Generating Random Values
RAND('INTEGER', lower, upper)
Generates a uniformly distributed random integer between two bounds. Useful for sampling and simulations.
Finding Extremes
LARGEST(k, val1, val2, ...)
SMALLEST(k, val1, val2, ...)
Used to find the k-th largest or smallest value among several.
Rounding Numbers
ROUND(value, round_to)
CEIL(value)
FLOOR(value)
INT(value)
These help format or normalize numeric data:
ROUND(27.34, 0.1)→ 27.3CEIL(3.1)→ 4FLOOR(3.9)→ 3INT(3.9)→ 3 (truncates decimals)
Date and Time Extraction
DATEPART(datetime)
TIMEPART(datetime)
Split datetime into separate date and time components for analysis or formatting.
Calculating Intervals Between Dates
INTCK('interval', start_date, end_date, <'method'>)
Counts intervals (e.g., months or years) between two dates.
Shifting Dates
INTNX('interval', start_date, increment, <'alignment'>)
Calculates a new date that is a set number of intervals away from the starting date. Great for rolling periods and forecasts.
3. Character Functions
SAS offers powerful tools for handling text-based data.
Cleaning Text
COMPBL(string) * Reduce multiple spaces to one;
COMPRESS(string, chars) * Remove specified characters;
STRIP(string) * Remove leading and trailing spaces;
Parsing and Case Transformation
SCAN(string, n, 'delims') * Extracts the n-th word based on delimiters;
PROPCASE(string, 'delims') * Converts to Proper Case (Title Case);
Searching Strings
FIND(string, substr, 'mod')
INDEX(string, substr)
LENGTH(string)
These help locate substrings or measure string lengths.
Pattern Recognition
ANYDIGIT(string), ANYALPHA(string), ANYPUNCT(string)
Return the position of the first occurrence of a digit, letter, or punctuation.
String Replacement and Construction
TRANWRD(source, from, to) * Replace all occurrences;
CAT(string1, ..., stringn) * Concatenate without trimming;
CATS(...) * Trim + concatenate;
CATX('delim', ...) * Concatenate with delimiter;
Use these to build IDs, merge names, or standardize formats.
4. Type Conversion Functions
Converting Character to Numeric
numeric_var = INPUT(char_var, informat.);
Safely converts string numbers into numeric form.
Converting Numeric to Character
char_var = PUT(numeric_var, format.);
Useful for labeling outputs or formatting numbers as strings.
Conversion Example
DATA convert;
char_val = "200";
num_val = INPUT(char_val, 8.);
label_val = PUT(num_val, 6.1);
RUN;
Avoids common issues associated with implicit conversions.
5. Use Case: Standardizing Customer Identifiers
DATA customers_clean;
SET customers_raw;
CleanName = PROPCASE(COMPRESS(Name, '0123456789'), ' ');
CustomerID = CATX('-', STRIP(City), STRIP(ID));
RUN;
This standardizes names and constructs unique customer IDs.
Conclusion
In Edition 12, you explored how to:
Apply numeric, date/time, and string functions to transform data.
Leverage type conversion functions to safely change variable formats.
Use string construction and pattern functions for text normalization.
Implement
INTCKandINTNXfor time-based interval operations.Combine multiple functions to achieve real-world data cleaning tasks.
Mastering functions is crucial to transitioning from basic data preparation to robust analytical pipelines in SAS.
What’s Next
In Edition 13, we begin Creating and using custom formats.
Keep transforming intelligently with 3 D Statistical Learning.
Special thanks to Dr. Dany Djeudeu for continuously translating SAS concepts into accessible learning journeys.
We help businesses and researchers solve complex challenges by providing expert guidance in statistics, machine learning, and tailored education.
Our core services include:
– Statistical Consulting:
Comprehensive consulting tailored to your data-driven needs.
– Training and Coaching:
In-depth instruction in statistics, machine learning, and the use of statistical software such as SAS, R, and Python.
– Reproducible Data Analysis Pipelines:
Development of documented, reproducible workflows using SAS macros and customized R and Python code.
– Interactive Data Visualization and Web Applications:
Creation of dynamic visualizations and web apps with R (Shiny, Plotly), Python (Streamlit, Dash by Plotly), and SAS (SAS Viya, SAS Web Report Studio).
– Automated Reporting and Presentation:
Generation of automated reports and presentations using Markdown and Quarto.
– Scientific Data Analysis:
Advanced analytical support for scientific research projects.