Introduction
In the previous editions, we introduced SAS Studio and explored the interface and fundamental programming concepts. Now in Edition 3, we dive into one of the most essential topics: Accessing and Importing Data in SAS Studio.
Understanding how SAS reads, stores, and manages data is the foundation for efficient and error-free analysis. In this article, we’ll explore how SAS datasets work, how to access them via libraries, and how to import data from common external sources such as CSV and Excel files.
1. Understanding SAS Data Sets and Their Structure
SAS datasets are the foundational data structure in SAS. When you import or generate data using SAS from raw text files to Excel spreadsheets, SAS stores this information internally as a SAS dataset, a format optimized for processing and analysis within the SAS environment.
Structure of a SAS Dataset
A SAS dataset is composed of two main parts:
Descriptor portion: Metadata that describes the dataset, including variable names, types, lengths, formats, labels, and other attributes.
Data portion: The actual data values stored in rows (observations) and columns (variables).
You can view a dataset’s metadata using:
proc contents data=your_table;
run;
Although SAS datasets can only be read and written by SAS, you don’t need to worry about the underlying storage mechanism. SAS handles all the complexity behind the scenes.
Variable Types and Storage
SAS simplifies data typing by supporting only two types of variables:
Character: Textual data (letters, numbers, special characters, blanks)
Numeric: Numbers, including integers, decimals, and dates
This simplicity makes SAS easier to learn than languages with more granular data types (e.g., integers, floats, logical types).
Here’s what you need to know:
Numeric variables: Stored in 8 bytes by default, giving ~14–15 significant digits of precision.
Character variables: Assigned a fixed storage length (1–32,767 bytes), either explicitly in your code or determined by SAS rules.
A Note on SAS File Format
If you try to open a SAS dataset using a non-SAS program (e.g., Microsoft Word), you’ll see unreadable characters. This is because SAS datasets are in a proprietary binary format that ensures fast and efficient processing inside SAS.
SAS even converts external data (e.g., Oracle or DB2) into this internal format behind the scenes to maintain consistency.
In practice, most SAS users won’t need to manually adjust storage lengths or worry about the dataset format. SAS abstracts these details for you, making the language more approachable and robust.
2. Accessing Data Through Libraries
SAS organizes data using libraries, which are references to directories containing datasets. A library is defined with the LIBNAME statement:
libname mydata "path-to-your-folder";
Once defined, you can reference datasets as mydata.datasetname.
To remove a library reference:
libname mydata clear;
Built-in Libraries
Work: Temporary library.
Datasets here are deleted after the session ends.Sashelp: Read-only library containing sample data and session-related information.
3. Reading Excel Files with the XLSX Engine
You can read Excel files directly using the XLSX engine (requires SAS/ACCESS to PC Files license):
options validvarname=v7;
libname myxls xlsx "path-to-your-file.xlsx";
Column names with spaces or special characters are automatically converted to valid SAS names (e.g., spaces become underscores).
You can reference sheets as myxls.sheetname. When done, clear the libref:
libname myxls clear;
4. Importing Data Using PROC IMPORT
To import external files such as CSV or Excel, use PROC IMPORT. SAS automatically interprets date values and column types based on the data.
Importing a CSV File
proc import datafile="path-to-your-file.csv"
dbms=csv
out=work.mydata
replace;
guessingrows=100;
run;
Importing an Excel File
proc import datafile="path-to-your-file.xlsx"
dbms=xlsx
out=work.mydata
replace;
sheet="Sheet1";
run;
5. Using Raw Data Files with the DATA Step
SAS also supports reading data directly from plain text using the DATA step and the INFILE statement.
List Input with Space-Delimited Data
data demo;
infile "path-to-your-file.txt";
input Gender $ Age Height Weight;
run;
CSV Files with DSD Option
data demo;
infile "path-to-your-file.csv" dsd;
input Gender $ Age Height Weight;
run;
6. Inline Data with DATALINES
For testing or small datasets, you can include data directly in your program using DATALINES:
data demo;
input Gender $ Age Height Weight;
datalines;
M 50 68 155
F 23 60 101
M 65 72 220
;
run;
7. Using Informats and Formats
Informats tell SAS how to read data. Formats control how data is displayed. For example:
data example;
input ID : $3.
Name : $20.
DOB : mmddyy10.
Salary : dollar8.;
datalines;
001 John Smith 01/12/1990 $50,000
002 Jane Doe 02/25/1985 $65,000
;
format DOB date9. Salary dollar11.2;
run;
Conclusion
In this edition, we’ve explored how to:
- Understand the structure of SAS datasets,
- Access and manage data using libraries,
- Import data from CSV and Excel files,
- Read raw data files using the
INFILEandDATALINESmethods, - Apply informats and formats to control data input and output.
These are essential skills for any SAS user working with real-world data.
In Edition 4, we will cover How SAS Works (a deeper Look Inside the
“Black Box”).
Stay connected with us at 3 D Statistical Learning as we continue our journey into the world of SAS.
Special thanks to Dr. Dany Djeudeu for his expert guidance and dedication to accessible data education.
We help businesses and researchers solve complex challenges by providing expert guidance in statistics, machine learning, and tailored education.
Our core services include:
– Statistical Consulting:
Comprehensive consulting tailored to your data-driven needs.
– Training and Coaching:
In-depth instruction in statistics, machine learning, and the use of statistical software such as SAS, R, and Python.
– Reproducible Data Analysis Pipelines:
Development of documented, reproducible workflows using SAS macros and customized R and Python code.
– Interactive Data Visualization and Web Applications:
Creation of dynamic visualizations and web apps with R (Shiny, Plotly), Python (Streamlit, Dash by Plotly), and SAS (SAS Viya, SAS Web Report Studio).
– Automated Reporting and Presentation:
Generation of automated reports and presentations using Markdown and Quarto.
– Scientific Data Analysis:
Advanced analytical support for scientific research projects.