SAS How To's

3 Easy Ways to Create an Index in SAS [Examples]

If you want to speed up your SAS code by using an index, this article is for you. But, what is an index actually, and how do you create one?

In short, an index is an optional SAS file that enables you to directly locate and access specific observations in a dataset whiteout processing each row sequentially. You create an index with the INDEX=-option in a SAS Data Step or in a PROC SQL procedure. Alternatively, you can use the PROC DATASETS procedure.

In the remainder of this article, we will discuss what an index is, how to create one, how to delete one, and how to check if a dataset has one. We support all our explanations with SAS code that you can apply directly in your own work.

Indexes in SAS

What is an Index

An index in SAS is an optional file that enables you to locate and access specific observations in a dataset directly. By using an index, you can dramatically increase the performance of your program.

For example, suppose you have a dataset with Customer IDs and you want to filter customer 12345. For a dataset without an index, SAS would access all the observations sequentially looking for customer 12345 until it has read all observations. Instead, for a dataset with an index, SAS can access the observations of customer 12345 directly.

In general, a SAS index improves the performance of your program in the following circumstances:

  • When you filter your data with a WHERE statement or a WHERE=-option. Note that, an index does not boost efficiency when you filter data with an IF statement.
  • When you use a BY statement for an operation on a non-sorted dataset. For example, while merging datasets.
  • When you join tables with PROC SQL.
  • When you use the SET or MODIFY statement and the KEY=-option to perform table look-up.

What Type of Indexes Exist

In SAS, you can distinguish two types of indexes, namely a simple index and a composite index. Additionally, you can constraint indexes with the unique and nomiss option.

Simple Index

The most common index in SAS is a simple index which is based on the values of one variable. This variable can be numeric or character and gives the name to the variable. In other words, if you create an index of the values of the CustomerID variable, then the name of the index is also CustomerID.

Composite Index

The second type of index is the composite index. This index is based on multiple variables that can be numeric, character, or a combination of both. In contrast to the simple index, you need to specify a name when you create a composite index.

Unique Values

You can enhance your index by adding the unique option to guarantee that the values of a simple index or the combination of values of a composite index are unique. Consequently, if you try to add duplicates to a table with such an index, SAS rejects the update.

Missing Values

The second option that enhances your index is the nomiss option. This option makes sure that missing values are not maintained by the index. Therefore, the index requires less space.

When to Use an Index

While creating an index is not difficult, the question arises when to use an index.

In general, an index is most effective if you use it to extract a small subset of observations from a large dataset. Although defining “small” and “large” might be difficult, you can use the following table as a rule-of-thumb to determine whether using an index improves the performance of your program.

Size SubsetImproves Performance
1% – 15%Definitely
16% – 20%Probably
21% – 33%Maybe
34% – 100%No
Source

In general, SAS users regard datasets that have more than ten thousand observations as large. However, this might depend on the number of available computer resources.

How to Create an Index in SAS

In this section, we will demonstrate how to create an index (both simple and composite) for new and existing tables.

Create an Index on a New Dataset with a Data Step

There exist two methods to create an index on a new dataset, namely with a SAS Data Step and with PROC SQL.

You create an index in a SAS Data Step with the INDEX=-option. This option enables you to create one or more simple or composite indexes. Additionally, you can add the unique or nomiss options to constraint the index.

Hereafter, we show some examples of how to create an index with the INDEX=-option.

Simple Index

A simple index is the most common index and is easy to create. To do so, you use the INDEX=-option followed by the name of the column that defines the index between parentheses.

Below we create a simple index based on the values of the variable name.

data work.my_data (index=(Name));
    set sashelp.baseball;
run;

The image below proves that we have created an index.

Composite Index

You can also use the INDEX=-option to create a composite index in a SAS Data Step. That means an index that is based on the values of multiple variables.

A composite index in a SAS Data Step starts with the index keyword, an equal sign, and the definition of the index between parentheses. The definition of the composite has a fixed structure, namely the name of the index, an equal sign, and between parentheses the variables that form the index.

In the example below, we create a composite index with the variables Division and League.

data work.my_data (index=(DivLeague=(Division League)));
    set sashelp.baseball;
run;

Combination of a Simple Index and a Composite Index

As mentioned before, you can use the INDEX=-option to create more than one index at once. For example, with the SAS code below we create a simple index and a composite index.

data work.my_data (index=(Name DivLeague=(Division League)));
    set sashelp.baseball;
run;

Unique Index

Besides creating a simple or composite index, you can use the INDEX=-option also to constraint an index. For example, by making sure that all the values in the index are unique.

You create a unique index by adding a forward-slash (/) and the unique keyword right after the definition of your index. In the two examples below, we create a unique simple index and a unique composite index.

data work.my_data (index=(Name / unique));
    set sashelp.baseball;
run;
data work.my_data (index=(NameLeague = (Name League) / unique));
    set sashelp.baseball;
run;

No Missing Values Index

Similar to the unique option, you can also use the INDEX=-option to ignore missing values while creating an index. To do so, you place a forward-slash (/) and the nomiss option directly after the index definition.

You can also combine the unique and nomiss options. For example:

data work.my_data (index=(Name / unique / nomiss));
    set sashelp.baseball;
run;

Create an Index on a New Dataset with PROC SQL

Instead of creating a dataset with a SAS Data Step, you can also create new datasets with PROC SQL. But, how do you create a new table and add an index in a single step?

You add an index to a new table in PROC SQL with the INDEX=-option. With this option, you can create one or more simple or composite indexes at once. Moreover, you can use the unique or nomiss options to constraint your indexes. You must place the INDEX=-option between the table name and the AS clause.

Below we will demonstrate how to use the INDEX=-option in the PROC SQL procedure. As the examples will prove, the syntax of the INDEX=-option in PROC SQL is identical to its syntax in a SAS Data Step.

Simple Index

You create a simple index on a new table with the INDEX=-option followed by the name of the variable that defines the values of the index. You must place the name of the variable between parentheses.

For example, with the following SAS code, we create an index on the variable Name.

proc sql;
    create table work.my_data (index=(Name)) as 
	select *
	from sashelp.baseball;
quit;

Composite Index

You can also use the INDEX=-option to create a composite index, i.e., an index of multiple variables. You create such an index by specifying the name of the index followed by an equal sign, and the names of the variables that define the index.

For example, below we create a composite index of the variables Division and League.

proc sql;
    create table work.my_data (index=(DivLeague=(Division League))) as 
	select *
	from sashelp.baseball;
quit;

Combination of a Simple Index and a Composite Index

As mentioned before, you can also use the INDEX=-option to create multiple indexes at once. In the example below, we create both a simple index and a composite index.

proc sql;
    create table work.my_data (index=(Name DivLeague=(Division League))) as 
	select *
	from sashelp.baseball;
quit;

Unique Index

One of the index options is the uniqueness constraint. You can create such an index by adding the unique keyword after the index definition. Remember that you must separate the index definition and the unique keyword with a forward-slash (/).

For example, below we create a unique simple index.

proc sql;
    create table work.my_data (index=(Name / unique)) as 
	select *
	from sashelp.baseball;
quit;

Likewise, you can also create a unique composite index.

proc sql;
    create table work.my_data (index=(NameLeague = (Name League) / unique)) as 
	select *
	from sashelp.baseball;
quit;

Note that, if you try to create a unique index on a variable with duplicates, that SAS will write an error to the log.

ERROR: Diplicate values not allowed on index ABC for file XYZ.
ERROR: Index creaetion filed for one or more indexes.

So, before you create the index you should remove the duplicates.

No Missing Values Index

Additionally, you can add the nomiss option to your index. By adding this option, SAS will ignore missing values while creating the index. As a result, the index will occupy less space.

Below we show an example of how to use the nomiss option in combination with the unique option.

proc sql;
    create table work.my_data (index=(Name / unique / nomiss)) as 
	select *
	from sashelp.baseball;
quit;

Create an Index on an Existing Dataset with PROC DATASETS

So far, we’ve created indexes on new datasets. However, sometimes you might want to create an index on an existing dataset. You could do so by re-creating the dataset and adding the INDEX=-option, or by using PROC DATASETS.

PROC DATASETS enables you to create an index on an existing SAS dataset with the INDEX CREATE statement. This statement allows you to create both a simple index and a composite index. A disadvantage of PROC DATASETS is that you need multiple INDEX CREATE statements to create indexes with different constraints.

These are the steps to create an index with the PROC DATASETS procedure:

  1. Start the PROC DATASETS procedure with the PROC DATASETS statement.
  2. Specify the library of your dataset with the LIB=-option. If you don’t use this option, SAS will assume that your dataset is in the WORK library.
  3. Use the MODIFY statement to specify the name of your dataset you want to modify (i.e., add the index).
  4. Create the index(es) with the INDEX CREATE statement.
  5. Finish the PROC DATASETS procedure with the RUN statement.

In the examples below, we demonstrate how to create different types of indexes with one or more constraints.

Simple Index

You can create a simple index with the INDEX CREATE statement followed by the name of the column you want to use for the index. If you want to create more than one simple index, then you just add more column names.

In the example below we use the column Name to create an index.

proc datasets lib=work;
    modify my_data;
    index create name;
run;

Composite Index

You can use PROC DATASETS and the INDEX CREATE statement also to create a composite index. In this case, the statement starts with the INDEX CREATE keywords, followed by the name of the index, an equal sign, and the variables that define the index between parentheses.

With the SAS code below we create a composite index called DivLeague that is based on the variables Division and League. Subsequently, we use the PROC CONTENTS procedure to make sure that the index has been created.

proc datasets lib=work;
    modify my_data;
    index create DivLeague=(Division League);
run;

Combination of a Simple Index and a Composite Index

If you want to create more than one index (either simple or composite), you just need one INDEX CREATE statement. This possibility makes your code much more efficient.

The following code shows how to create a simple and composite index with one INDEX CREATE statement.

proc datasets lib=work;
    modify my_data;
    index create Name DivLeague=(Division League);
run;

Unique Index

Additionally, you can also add constraints to the indexes you create with the INDEX CREATE statement. For example, you can add the unique option to make sure there are no duplicates in the index. However, if you try to create an index on a column with duplicates, SAS will return an error.

Below, we show an example of how to create a unique index on the variable Name.

proc datasets lib=work;
    modify my_data;
    index create Name / unique;
run;

You can also create a composite index with the unique constraint.

proc datasets lib=work;
    modify my_data;
    index create NameLeague = (Name League) / unique;
run;

No Missing Values Index

Another option is the nomiss index. If you use this option, the index will ignore missing values. As a result, the index occupies less space.

Below we show to use the nomiss option in combination with the unique option to create a simple index.

proc datasets lib=work;
    modify my_data;
    index create Name / unique nomiss;
run;

Create an Index on an Existing Dataset with PROC SQL

You can also add an index to an existing dataset with the CREATE INDEX statement of the PROC SQL procedure. With this statement, you can create both simple indexes and composite indexes. Additionally, you can add the unique keyword to create a unique index.

In contrast to all other methods discussed so far, a disadvantage of the CREATE INDEX statement is that you can create only one index per statement. Moreover, you can’t add the nomiss option. Therefore, we do not recommend this method.

Below follow some examples of how to use the CREATE INDEX statement.

Simple Index

Creating a simple index with the CREATE INDEX statement is straightforward. You start the statement with the CREATE INDEX keywords, followed by the variable that defines the index, the ON keyword, and finally the dataset on which you want to create the index.

For example:

proc sql;
    create index name on work.my_data;
quit;

Composite Index

Adding a composite index to an existing dataset is also possible with PROC SQL.

As usual, the CREATE INDEX statement starts with the CREATE INDEX keywords. Then, you need to define the name of the composite index. Finally, after the ON keyword and the name of your dataset, you specify the variables that define the composite index.

For example, below we create a composite index based on the variables Division and League.

proc sql;
    create index DivLeague on work.my_data (Division, League);
quit;

Unique Index

You can also use PROC SQL to create a unique index. Instead of using the CREATE INDEX statement, you use the CREATE UNIQUE INDEX statement to create a unique index.

In the examples below we demonstrate how to create a unique simple index and a unique composite index.

proc sql;
    create unique index name on work.my_data;
quit;
proc sql;
    create unique index NameLeague on work.my_data (Name, League);
quit;

How to Remove an Index in SAS

Above we have demonstrated several ways to create an index in SAS. Now it’s time to discuss how to remove them.

Basically, there are two methods to drop an index in SAS. In the first place, you can remove an index with the INDEX DELETE statement in the PROC DATASETS procedure. The second method is the DROP INDEX statement within the PROC SQL procedure. Both methods can delete simple and composite indexes. However, with only the INDEX DELETE statement, you can delete all indexes at once.

Note that, new datasets don’t inhere the indexes from the dataset in the SET statement or the FROM clause. Therefore, new datasets don’t have indexes, except when you explicitly define them.

Below we discuss the two methods that you can use to drop an index.

Remove an Index with PROC DATASETS

The best way to remove an index from a dataset in SAS is with PROC DATASETS. These are the steps:

  1. Call the PROC DATASETS procedure

    You start the PROC DATASETS procedure with the PROC DATASETS statement

  2. Specify the library

    You can specify the library where your dataset resides with the LIB=-option. If you don’t use this option, then SAS assumes that your dataset is in the WORK library.

  3. Specify the dataset

    You can specify the name of your dataset with the MODIFY statement.

  4. Remove the index

    You can remove your index with the INDEX DELETE statement. This statement starts with the INDEX DELETE keywords, followed by the name of the index. You can also drop all indexes on the same dataset at once when you use the _ALL_ keyword.

  5. Run the PROC DATASETS procedure

    You finish and run your code with the RUN statement.

In the SAS code below we remove the index DivLeague from the dataset work.my_data.

proc datasets lib=work;
    modify my_data;
    index delete divleague;
run;

You can use the SAS log to check if the index has been removed.

Remove an Index with PROC SQL

The second method to delete an index in SAS is with the DROP INDEX statement when you use PROC SQL.

The DROP INDEX statement starts with the DROP INDEX keywords, followed by the name of the index you want to remove. Then you use the FROM clause to specify the table which has the index.

The SAS code below provides an example.

proc sql;
    drop index DivLeague from work.my_data;
quit;

The log shows that SAS has successfully removed the index DivLeague.

How to Check if an Index Exists in SAS

Above we have discussed several ways to create and delete indexes. However, how do you know if an index actually exists?

You can check if a SAS dataset has one or more indexes with the PROC CONTENTS procedure or the Dictionary Tables. Compared to the Dictionary Tables, PROC CONTENTS gives you a better overview of the existing indexes and their definitions. However, the dictionary tables have the advantage that all information is stored in a dataset and therefore can be queried easily.

In the two sections below we show how to use both methods.

Check if an Index Exists with PROC CONTENTS

The PROC CONTENTS procedure provides the easiest way to check if a dataset has an index. You only need to call the PROC CONTENTS statement, the DATA=-option to specify your dataset and the RUN statement.

With the SAS code below we try to find out if the dataset work.my_data has any index.

proc contents data=work.my_data;
run;

By default, the PROC CONTENTS procedure creates a report with information about your dataset. At the end of the report, you find the section “Alphabetic List of Indexes and Attributes“. This section contains the following information:

  • Name: The name of the index.
  • Unique Option: Shows whether the values in the index must be unique.
  • NoMiss Option: Shows whether the index ignores missing values.
  • # of Unique Values: The number of unique values in the index. The higher the number of unique values, the better the performance of the index.
  • Variables: The variables that define a composite index. This field is empty in case of a simple index.

Check if an Index Exists with Dictionary Tables

The second option to check if a dataset has an index is with the Dictionary Tables. A Dictionary Table is a special table that contains metadata (i.e., information) about SAS datasets. For example, one of the Dictionary Tables is the indexes table.

The indexes table contains information about datasets in the work and permanent libraries. Because of the amount of data this table contains, we recommend using the libname and memname columns to filter and select your dataset. Make sure that you write the values of the libname and memname columns always in uppercase.

With the SAS code below, we check if the dataset work.my_data has an index.

proc sql;
    select *
    from dictionary.indexes
    where libname = "WORK"
	and memname = "MY_DATA";
quit;

If you select all columns from the Dictionary Table indexes, then SAS will show you the following information:

  • Library Name: The name of the library where your dataset resides.
  • Member Name: The name of your dataset.
  • Member Type: The type of your dataset (DATA or VIEW).
  • Column Name: The name of the column that defines (part of) the index.
  • Column Index Type: The type of the index (simple or composite).
  • Index Name: The name of the index.
  • Position of Column in Concatenated Key: The position of the column in case of a composite index.
  • Nomiss Option: Indicator whether the index uses the nomiss option, or not.
  • Unique Option: Indicator whether the index uses the unique option, or not.