Introduction

In the previous editions, we introduced SAS Studio and explored the interface and fundamental programming concepts. Now in Edition 3, we dive into one of the most essential topics: Accessing and Importing Data in SAS Studio.

Understanding how SAS reads, stores, and manages data is the foundation for efficient and error-free analysis. In this article, we’ll explore how SAS datasets work, how to access them via libraries, and how to import data from common external sources such as CSV and Excel files.

1. Understanding SAS Data Sets and Their Structure

SAS datasets are the foundational data structure in SAS. When you import or generate data using SAS from raw text files to Excel spreadsheets, SAS stores this information internally as a SAS dataset, a format optimized for processing and analysis within the SAS environment.

Structure of a SAS Dataset

A SAS dataset is composed of two main parts:

  • Descriptor portion: Metadata that describes the dataset, including variable names, types, lengths, formats, labels, and other attributes.

  • Data portion: The actual data values stored in rows (observations) and columns (variables).

You can view a dataset’s metadata using:

proc contents data=your_table;
run;

Although SAS datasets can only be read and written by SAS, you don’t need to worry about the underlying storage mechanism. SAS handles all the complexity behind the scenes.

Variable Types and Storage

SAS simplifies data typing by supporting only two types of variables:

  • Character: Textual data (letters, numbers, special characters, blanks)

  • Numeric: Numbers, including integers, decimals, and dates

This simplicity makes SAS easier to learn than languages with more granular data types (e.g., integers, floats, logical types).

Here’s what you need to know:

  • Numeric variables: Stored in 8 bytes by default, giving ~14–15 significant digits of precision.

  • Character variables: Assigned a fixed storage length (1–32,767 bytes), either explicitly in your code or determined by SAS rules.

A Note on SAS File Format

If you try to open a SAS dataset using a non-SAS program (e.g., Microsoft Word), you’ll see unreadable characters. This is because SAS datasets are in a proprietary binary format that ensures fast and efficient processing inside SAS.

SAS even converts external data (e.g., Oracle or DB2) into this internal format behind the scenes to maintain consistency.

In practice, most SAS users won’t need to manually adjust storage lengths or worry about the dataset format. SAS abstracts these details for you, making the language more approachable and robust.


2. Accessing Data Through Libraries

SAS organizes data using libraries, which are references to directories containing datasets. A library is defined with the LIBNAME statement:

libname mydata "path-to-your-folder";

Once defined, you can reference datasets as mydata.datasetname.

To remove a library reference:

libname mydata clear;

Built-in Libraries

  • Work: Temporary library.
    Datasets here are deleted after the session ends.

  • Sashelp: Read-only library containing sample data and session-related information.


3. Reading Excel Files with the XLSX Engine

You can read Excel files directly using the XLSX engine (requires SAS/ACCESS to PC Files license):

options validvarname=v7;
libname myxls xlsx "path-to-your-file.xlsx";

Column names with spaces or special characters are automatically converted to valid SAS names (e.g., spaces become underscores).

You can reference sheets as myxls.sheetname. When done, clear the libref:

libname myxls clear;

4. Importing Data Using PROC IMPORT

To import external files such as CSV or Excel, use PROC IMPORT. SAS automatically interprets date values and column types based on the data.

Importing a CSV File

proc import datafile="path-to-your-file.csv"
    dbms=csv
    out=work.mydata
    replace;
    guessingrows=100;
run;

Importing an Excel File

proc import datafile="path-to-your-file.xlsx"
    dbms=xlsx
    out=work.mydata
    replace;
    sheet="Sheet1";
run;

5. Using Raw Data Files with the DATA Step

SAS also supports reading data directly from plain text using the DATA step and the INFILE statement.

List Input with Space-Delimited Data

data demo;
    infile "path-to-your-file.txt";
    input Gender $ Age Height Weight;
run;

CSV Files with DSD Option

data demo;
    infile "path-to-your-file.csv" dsd;
    input Gender $ Age Height Weight;
run;

6. Inline Data with DATALINES

For testing or small datasets, you can include data directly in your program using DATALINES:

data demo;
    input Gender $ Age Height Weight;
    datalines;
M 50 68 155
F 23 60 101
M 65 72 220
;
run;

7. Using Informats and Formats

Informats tell SAS how to read data. Formats control how data is displayed. For example:

data example;
    input ID : $3.
          Name : $20.
          DOB : mmddyy10.
          Salary : dollar8.;
    datalines;
001 John Smith 01/12/1990 $50,000
002 Jane Doe 02/25/1985 $65,000
;
format DOB date9. Salary dollar11.2;
run;

Conclusion

In this edition, we’ve explored how to:

  • Understand the structure of SAS datasets,
  • Access and manage data using libraries,
  • Import data from CSV and Excel files,
  • Read raw data files using the INFILE and DATALINES methods,
  • Apply informats and formats to control data input and output.

These are essential skills for any SAS user working with real-world data.

In Edition 4, we will cover How SAS Works (a deeper Look Inside the
“Black Box”)
.


Stay connected with us at 3 D Statistical Learning as we continue our journey into the world of SAS.

Special thanks to Dr. Dany Djeudeu for his expert guidance and dedication to accessible data education.