SAS How To's

How to Select Variables with the KEEP & DROP Option

In this article, we discuss how to select variables from a SAS dataset with the KEEP and DROP option.

In SAS, you can select variables from a dataset with the KEEP=-option and the DROP=-option. The KEEP=-option specifies which variables to select, while the DROP=-option indicates which variable to ignore. You can use the KEEP and DROP option in a DATA statement and a SET statement.

Throughout this article, we use the data set below to demonstrate how you select columns from a dataset. The data set contains sale information (Product, Price, Units, and SaleDate) for 2 different days.

data work.my_ds;
	infile datalines dlm=",";
	input Product $ Price Units SaleDate :date9.;
	format Price numx12.2 SaleDate date9.;
	datalines;
Milk, 1.50, 2, 02NOV2020
Bread, 1.20, 3, 02NOV2020
Apple, 0.25, 2, 03NOV2020
Cheese, 3.00, 1, 03NOV2020
Soap, 2.75, 1, 03NOV2020
;
run;

In the sections below we show how to select variables in SAS with the KEEP and DROP option.

How to Select Variables in SAS?

If you work with large datasets, it can be beneficial to select only the columns you absolutely need. But, how do you select variables from a SAS dataset?

In SAS, you select variables with the KEEP=-option. The KEEP=-option starts with the KEEP keyword, then an equal sign, and finally the variable(s) you want to select. If you select more than one variable, then you must separate them with a blank. You can use the KEEP=-option in a DATA statement and a SET statement.

In the two examples below, we use the KEEP option to select the Product and Price column. In the first example, the KEEP option is part of the DATA statement, while in the second example, the KEEP option is used in the SET statement. Nevertheless, the result is the same.

/* KEEP Option in DATA statement (I) */
data work.keep_option_data_1 (keep=Product Price);
	set work.my_ds;
run;
 
/* KEEP Option in SET statement (I) */
data work.keep_option_set_1;
	set work.my_ds (keep=Product Price);
run;
Use the SAS Keep Option to select variables
The SAS KEEP= Option

In the example above, the location of the KEEP option didn’t matter to obtain the desired new data set. However, in most cases the location of the KEEP option is important. We show this in the next example.

Where to Place the KEEP Option?

As mentioned before, you can use the KEEP=-option in the DATA statement and in the SET statement. In most cases, the result is the same. However, in some cases the location of the KEEP=-option is important.

Firstly, the location of the KEEP=-option is important when you create a new variable.

If you create a new variable based on existing variables and you use the KEEP=-option in the SET statement, then you must specify these variables explicitly in the KEEP=-option. If you don’t do this, SAS throws an error.

Also, when you create a new variable and you use the KEEP=-option in the DATA statement, then you need to define the new variable in the KEEP=-option.

In this example, we use again the KEEP option to keep the Product and Price column. However, we also created a new column, namely PriceEuros which contains the price of the product in Euros.

/* KEEP Option in DATA statement (II) */
data work.keep_option_data_2 (keep=Product Price);
	set work.my_ds;
 
	PriceEuros = Price * 0.85;
run;
 
/* KEEP Option in SET statement (II) */
data work.keep_option_set_2;
	set work.my_ds (keep=Product Price);
 
	PriceEuros = Price * 0.85;
run;
Create New Variables in Combination With the KEEP= Option

In contrast to the first example, the 2 new data sets aren’t equal. The first data set doesn’t contain the new column PriceEuros, while the second data set does. This is because we specified that KEEP option in the DATA statement for the first data set (work.keep_option_data_2).

Secondly, the location of the KEEP=-option affects the efficiency of you code.

We recommend using the KEEP=-option in the SET statement to improve the efficiency of your code. By doing so, SAS only reads the variables from the input data set that you actually need (in the output dataset or for any calculation). By reading only variables you need, you minize the required working memory and you reduce the processing time.

How to Remove Variables in SAS?

Instead of selecting the variable you need, you could also remove the variables you don’t need. Removing variables is especially useful if you want to select all variables except one or two. So, how do you remove variables from a SAS dataset?

In SAS, you remove variables from a dataset with the DROP=-option. The DROP=-option starts with the DROP keyword, followed by an equal sign, and the variables you want to remove. You can use this option in the DATA statement and the SET statement.

In the two examples below, we use the DROP=-option to remove the SaleDate column. First, the DROP option is part of the DATA statement. In the second example, we use the DROP option in the SET statement.

/* DROP Option in DATA statement (I) */
data work.drop_option_data_1 (drop=SaleDate);
	set work.my_ds;
run;
 
/* DROP Option in SET statement (I) */
data work.drop_option_set_1;
	set work.my_ds (drop=SaleDate);
run;
Use the SAS DROP Option to remove variables
The SAS DROP= Option

As you can see above, it didn’t matter where we placed the DROP option. Both output data sets are the same. However, in the next example, we show that normally the location of the DROP option is important.

/* DROP Option in DATA statement (II) */
data work.drop_option_data_2 (drop=SaleDate);
	set work.my_ds;
 
	SaleWeek = week(SaleDate);
run;
 
/* DROP Option in SET statement (II) */
data work.drop_option_set_2;
	set work.my_ds (drop=SaleDate);
 
	SaleWeek = week(SaleDate);
run;

In this example, we create a new column, namely SaleWeek, that contains the week number of the week that sale occurred. However, as you can see in the image below, the SaleWeek column in the second data set is empty.

Create New Variables in Combination With the DROP= Option

The first output table contains the columns Product, Price, Units, and SaleWeek because we used the DROP option to specify that this table should contain all columns from the input data set and any newly created column, except for the SaleDate column. The second output table contains the Product, Price, Units, SaleWeek, and SaleDate columns. However, the SaleWeek and SaleDate columns are empty. These columns are empty because we specified in the SET statement to not read the SaleDate column into memory.

So, it’s important to read all column into memory (Program Data Vector (PDV)) that you need in the data step. If you run the example code, you will notice that SAS doesn’t give a warning when executing the code.

How to Use the DROP Option Efficiently?

In the example above, we showed that you can use the DROP option to prevent SAS from writing a column to the output data set. Also, the DROP option can be used to prevent SAS from reading columns from the input data set. Ignoring columns from the input data set is especially useful if you want to improve the efficiency and speed of your SAS program. So, use the DROP option in the SET statement to speed-up your SAS code.

How to Select Variables with KEEP and DROP Statement?

Besides the KEEP and DROP option to select variables, you can also use the KEEP and DROP statements. The KEEP and DROP statements are comparable to the KEEP and DROP options in the DATA statement.

If you have a SAS program that creates 2 or more data sets with the same columns you could use the KEEP (or DROP) option twice. However, you can also use the KEEP statement to write less SAS code.

In the example below, we use the DROP option in the SET statement to prevent SAS from reading the Price column (for efficiency reasons). Then we use an IF-ELSE THEN statement send a row to one of the two output data sets. Finally, we use the KEEP statement to select only the Product and Units columns in the output data sets.

data work.sales_02nov2020
	work.sales_03nov2020;
	set work.my_ds (drop=Price);
 
	if SaleDate = "02NOV2020"D then output work.sales_02nov2020;
	else if SaleDate = "03NOV2020"D then output work.sales_03nov2020;
 
	keep Product Units;
run;

Since the KEEP and DROP statement is comparable to the KEEP and DROP option in the DATA statement, you can’t use them to prevent SAS from reading columns from the input data set. Hence, the KEEP and DROP statements won’t make your SAS code more efficient.

You can find the official documentation of the KEEP and DROP option to select variables on the SAS website.