SAS How To's

How to Select Variables with the KEEP & DROP Option

A SAS data set contains columns. If you create a new data set with the DATA and SET statement, the new data set will contain, by default, all columns of the original data set. However, it might be unnecessary or unwanted that your new data set contains all the original columns. So, in this article, we discuss how to select variables in SAS with the KEEP and DROP option.

Throughout this article, we use the data set below in all the examples. The data set contains sale information (Product, Price, Units, and SaleDate) for 2 different days.

data work.my_ds;
	infile datalines dlm=",";
	input Product $ Price Units SaleDate :date9.;
	format Price numx12.2 SaleDate date9.;
	datalines;
Milk, 1.50, 2, 02NOV2020
Bread, 1.20, 3, 02NOV2020
Apple, 0.25, 2, 03NOV2020
Cheese, 3.00, 1, 03NOV2020
Soap, 2.75, 1, 03NOV2020
;
run;

In the sections below we show how to select variables in SAS with the KEEP and DROP option.

The KEEP Option

The KEEP option is a SAS data set option that can be used in both the DATA statement as well as in the SET statement. You write the KEEP option between parentheses followed by the equal-sign and the variables/columns that you want to keep.

In the two examples below, we use the KEEP option to retain the Product and Price column. In the first example, the KEEP option is part of the DATA statement, while in the second example, the KEEP option is used in the SET statement. Nevertheless, the result is the same.

/* KEEP Option in DATA statement (I) */
data work.keep_option_data_1 (keep=Product Price);
	set work.my_ds;
run;
 
/* KEEP Option in SET statement (I) */
data work.keep_option_set_1;
	set work.my_ds (keep=Product Price);
run;
Use the SAS Keep Option to select variables
The SAS KEEP= Option

In the example above, the location of the KEEP option didn’t matter to obtain the desired new data set. However, in most cases the location of the KEEP option is important. We show this in the next example.

In this example, we use again the KEEP option to retain the Product and Price column. However, we also created a new column, namely PriceEuros which contains the price of the product in Euros.

/* KEEP Option in DATA statement (II) */
data work.keep_option_data_2 (keep=Product Price);
	set work.my_ds;
 
	PriceEuros = Price * 0.85;
run;
 
/* KEEP Option in SET statement (II) */
data work.keep_option_set_2;
	set work.my_ds (keep=Product Price);
 
	PriceEuros = Price * 0.85;
run;
Create New Variables in Combination With the KEEP= Option

In contrast to the first example, the 2 new data sets aren’t equal. The first data set doesn’t contain the new column PriceEuros, while the second data set does. This is because for the first data set (work.keep_option_data_2) we specified that KEEP option in the DATA statement. That is to say, we defined which columns are in the output data set. In the second example, we used the KEEP option to read only the Product and Price columns from the input data set, and then we created the new column. Finally, in the second data set, there were no restrictions for the output data set.

The KEEP Option and Efficiency

As the examples above show, you can use the KEEP option in both the DATA statement as well as in the SET statement. Regarding efficiency, it’s recommended to use the KEEP option in the SET statement to only read the variables from the input data set that you actually need (in the output dataset or for any calculation). By doing this, you prevent SAS from reading data into memory that won’t be used and speed-up the process.

So, you can use the KEEP option to select variables, but, more importantly, to improve efficiency.

The DROP Option

Like the KEEP option, the DROP option is a SAS data set option that can be used in the DATA statement and the SET statement. You write the DROP option between parentheses followed by the equal-sign and the column(s) you want to drop (remove).

In the two examples below, we use the DROP option to drop the SaleDate column. First, the DROP option is part of the DATA statement. In the second example, we use the DROP option in the SET statement.

/* DROP Option in DATA statement (I) */
data work.drop_option_data_1 (drop=SaleDate);
	set work.my_ds;
run;
 
/* DROP Option in SET statement (I) */
data work.drop_option_set_1;
	set work.my_ds (drop=SaleDate);
run;
Use the SAS DROP Option to remove variables
The SAS DROP= Option

As you can see above, it didn’t matter where we placed the DROP option. Both output data sets are the same. However, in the next example, we show that normally the location of the DROP option is important.

/* DROP Option in DATA statement (II) */
data work.drop_option_data_2 (drop=SaleDate);
	set work.my_ds;
 
	SaleWeek = week(SaleDate);
run;
 
/* DROP Option in SET statement (II) */
data work.drop_option_set_2;
	set work.my_ds (drop=SaleDate);
 
	SaleWeek = week(SaleDate);
run;

In this example, we create a new column, namely SaleWeek, that contains the week number of the week that sale occurred. However, as you can see in the image below, the SaleWeek column in the second data set is empty.

Create New Variables in Combination With the DROP= Option

The first output table contains the columns Product, Price, Units, and SaleWeek because we used the DROP option to specify that this table should contain all columns from the input data set and any newly created column, except for the SaleDate column. The second output table contains the Product, Price, Units, SaleWeek, and SaleDate columns. However, the SaleWeek and SaleDate columns are empty. These columns are empty because we specified in the SET statement to not read the SaleDate column into memory.

So, it’s important to read all column into memory (Program Data Vector (PDV)) that you need in the data step. If you run the example code, you will notice that SAS doesn’t give a warning when executing the code.

The DROP Option and Efficiency

In the example above, we showed that you can use the DROP option to prevent SAS from writing a column to the output data set. Also, the DROP option can be used to prevent SAS from reading columns from the input data set. Ignoring columns from the input data set is especially useful if you want to improve the efficiency and speed of your SAS program. So, use the DROP option in the SET statement to speed-up your SAS code.

The KEEP and DROP Statement

Besides the KEEP and DROP option to select variables, you can also use the KEEP and DROP statements. The KEEP and DROP statements are comparable to the KEEP and DROP options in the DATA statement.

If you have a SAS program that creates 2 or more data sets with the same columns you could use the KEEP (or DROP) option twice. However, you can also use the KEEP statement to write less SAS code.

In the example below, we use the DROP option in the SET statement to prevent SAS from reading the Price column (for efficiency reasons). Then we use an IF-ELSE THEN statement send a row to one of the two output data sets. Finally, we use the KEEP statement to select only the Product and Units columns in the output data sets.

data work.sales_02nov2020
	work.sales_03nov2020;
	set work.my_ds (drop=Price);
 
	if SaleDate = "02NOV2020"D then output work.sales_02nov2020;
	else if SaleDate = "03NOV2020"D then output work.sales_03nov2020;
 
	keep Product Units;
run;

Since the KEEP and DROP statement is comparable to the KEEP and DROP option in the DATA statement, you can’t use them to prevent SAS from reading columns from the input data set. Hence, the KEEP and DROP statements won’t make your SAS code more efficient.

You can find the official documentation of the KEEP and DROP option to select variables on the SAS website.