SAS How To's

How to Easily Sort a Dataset in SAS

For some operations such as merging datasets or ranking observations, it is necessary to have an ordered dataset. Also, an ordered dataset can be useful while analyzing your data. So, in this article, we answer the question of how to sort a dataset in SAS?

Sorting a dataset in SAS can be done with PROC SORT and with PROC SQL. Both can order datasets ascendingly, descendingly, and by multiple variables. PROC SORT can also be used to remove duplicates. PROC SQL is a good option to order datasets by calculated fields.

Throughout the examples in this article, we will use the dataset below to demonstrate how you can sort your data.

Sample Dataset

How to Sort a Dataset with PROC SORT

These are the steps to sort a dataset in SAS with PROC SORT:

  1. Start the procedure with the PROC SORT statement

    With the PROC SORT statement, you let SAS know to start ordering a dataset.

  2. Use the DATA=-option to specify the input dataset

    The DATA=-option specifies the name of the unordered, input dataset.

  3. Optionally, use the OUT=-option to specify the output dataset

    The OUT=-option specifies the name of the ordered, output dataset. If you don’t use the OUT=-option, then SAS overwrites the input dataset.

  4. Use the BY statement to specify the order of the dataset

    The BY statement specifies the variables that define the order of the dataset. You can use one or more variables, separated by whitespace.
    The BY statement also indicates whether the dataset must be ordered ascendingly or descendingly.

  5. End the procedure with the RUN statement

    The RUN statement finishes the PROC SORT procedure.

In the sections below, we dive into the details of the PROC SORT procedure and give some examples with SAS code.

How to Sort a Dataset in Ascending Order in SAS

Generally speaking, you can order a dataset in two ways, namely ascending or descending. So, let’s first answer the question of how to order a SAS dataset in ascending order?

By default, the PROC SORT procedure orders data in ascending order. That is to say, SAS sorts the values of numeric column(s), specified in the BY statement, from low to high. In the case of character columns, SAS orders the values alphabetically.

The SAS code below shows an example of how to use PROC SORT to order a dataset ascendingly.

proc sort data=work.my_ds;
    by var1;
run;
 
proc print data=work.my_ds noobs;
run;
Sort dataset ascendingly with PROC SORT in SAS

How to Sort a Dataset in Descending Order in SAS

So, by default, SAS orders a dataset ascendingly, but how do you sort a dataset in descending order (i.e., from high to low)?

To order a dataset in descending order with PROC SORT, you need to add the DESCENDING keyword to the BY statement. The DESCENDING keyword precedes the name of the variable that defines the order.

With the SAS code below, we order the my_ds dataset in descending order based on the variable var1.

proc sort data=work.my_ds;
    by descending var1;
run;
 
proc print data=work.my_ds noobs;
run;
Sort dataset descendingly with PROC SORT in SAS

How to Sort a Dataset by Multiple Variables

So far, we have seen how to sort a dataset based on one variable. But, how do you order a dataset by multiple variables?

You can order a SAS dataset by multiple variables with the PROC SORT procedure. In the BY statement, you specify the variables that define the order. The variables are separated by a blank.

In the example below, we order the my_ds dataset by the variables var, var2, and var3, all in ascending order.

proc sort data=work.my_ds;
    by var1 var2 var3;
run;
 
proc print data=work.my_ds noobs;
run;
Sort a dataset in SAS by multiple variables with PROC SORT

Note that it is not necessary to order all the variables in either ascending or descending order. You can use PROC SORT to order some columns ascendingly, while others are ordered descendingly.

How to Create a New, Ordered Dataset

By default, the PROC SORT procedure replaces the unordered dataset with the ordered dataset. However, you might want to keep the original dataset and create a new, ordered dataset. So, how do you create a new dataset with PROC SORT?

You can use the OUT=-option of the PROC SORT procedure to create a new, ordered dataset, instead of overwriting the original dataset. The OUT=-option must be placed after the DATA=-option.

When you use the OUT=-option to create a new dataset, you can save the ordered dataset in a permanent library or the WORK library. If you don’t specify a library, SAS assumes that you want to use the WORK library.

In the example below, we use the OUT=-option to create a new, ordered dataset called work.my_ds_srt.

proc sort data=work.my_ds
    out=work.my_ds_srt;
    by var1 var2 var3;
run;

How to Remove Duplicates in SAS

Lastly, we demonstrate how to remove duplicates from a SAS dataset with the PROC SORT procedure.

These are the steps to remove duplicates:

  1. Start the PROC SORT procedure with the PROC SORT statement.
  2. Specify the name of the unordered dataset with the DATA=-option. This dataset contains all observations.
  3. Optionally, define the name of the ordered dataset with the OUT=-option. This dataset will contain the unique observations. If you dont specify the name of the output dataset, SAS will overwrite the unordered dataset.
  4. Also optionally, use the DUPOUT=-option to define the name of the dataset that will contain the duplicate observations.
  5. Add the NODUPKEY option to the PROC SORT statement. This option tells SAS to remove duplicates.
  6. Use the BY statement to specify the variables that will be used to identify the duplicates. You can use one or more variables, both in ascending or descending order.

In the example below, we will use the NODUPKEY option to remove duplicates from the work.my_ds dataset.

proc sort data=work.my_ds
    out=work.my_ds_no_duplicates
    dupout=work.my_ds_duplicates
    nodupkey;
    by var1;
run;
 
proc print data=work.my_ds_no_duplicates noobs;
run;
 
proc print data=work.my_ds_duplicates noobs;
run;
Remove duplicates with PROC SORT. One variable.

As you can see in the image above, SAS created two datasets. The first dataset contains unique observations, while the second one has duplicates.

At first sight, it might not be directly clear how SAS removes duplicates with the PROC SORT.

These are the steps how SAS identifies duplicates in the PROC SORT procedure:

  1. Firstly, SAS orders the dataset according to the variable(s) mentioned in the BY statement. The variable(s) can be sorted in ascending order or descending order.
  2. Secondly, SAS orders the remaining variables, i.e., those that aren’t specified in the BY statement, in ascending order.
  3. Finally, SAS writes the first observation of each combination of the variables in the BY statement to the output dataset.

In the next example, we use two variables to remove duplicates.

proc sort data=work.my_ds
    out=work.my_ds_no_duplicates
    dupout=work.my_ds_duplicates
    nodupkey;
    by var1 var2;
run;
 
proc print data=work.my_ds_no_duplicates noobs;
run;
 
proc print data=work.my_ds_duplicates noobs;
run;
Remove duplicates with PROC SORT. Multiple variables.

How to Sort a Dataset with PROC SQL

The second method to order a dataset in SAS is with PROC SQL.

These are the steps to order a SAS dataset with PROC SQL:

  1. Start the PROC SQL procedure with the PROC SQL statement.
  2. Optionally, use the CREATE TABLE clause to create an output dataset with the ordered data.
  3. Select the output variables with the SELECT clause.
  4. Use the FROM clause to specify the unordered, input dataset.
  5. Use the ORDER BY clause to specify the variables that define the order of the dataset.
  6. Finish the PROC SQL procedure with the QUIT statement.

Below we will provide some examples of how to use PROC SQL to sort a table.

How to Order a Dataset Ascendingly with PROC SQL

One of the ways to order a dataset is in ascending order. But, how do you order a dataset ascendingly with PROC SQL?

In PROC SQL, a dataset can be ordered ascendingly with the ORDER BY clause. This clause starts with the ORDER BY keywords, followed by the variables the define the order. If the dataset is ordered by multiple variables, then the variables must be separated by a comma.

ORDER BY variable-name

In the example below, we demonstrate how to use the ORDER BY clause to sort a dataset.

proc sql;
    select var1,
	var2,
	var3
    from work.my_ds
    order by var1;
quit;
Sort dataset ascendingly with PROC SQL in SAS

How to Order a Dataset Descendingly with PROC SQL

Since PROC SQL orders data by default in ascending order, the question arises how to order a dataset in descending order with PROC SQL?

In PROC SQL, you can sort a dataset in descending order with the ORDER BY clause and the DESCENDING keyword. The DESCENDING keyword must be placed after the variable name that defines the order. Instead of the DESCENDING keyword, you can use the DESC keyword also.

This is the correct syntax:

ORDER BY variable-name DESCENDING

Or with the DESC keyword instead of DESCENDING

ORDER BY variable-name DESC

Below we demonstrate how to use the DESCENDING keyword to order a table in descending order.

proc sql;
    select var1,
	var2,
	var3
    from work.my_ds
    order by var1 desc; /* or descending*/
quit;
Sort dataset descendingly with PROC SQL in SAS

How to Order a Dataset by Multiple Columns

Instead of sorting a dataset on one variable, you can also order a dataset o multiple variables.

You order a dataset by multiple variables with the ORDER BY clause. The clause starts with the ORDER BY keywords, followed by the variables that define the new order. The variables that define the order must be separated by a comma.

The SAS code for ordering a dataset on multiple variable looks like this:

ORDER BY variable-1, variable-2, variable-3

Or, when you want to order a dataset on multiple variable with mixed orders:

ORDER BY variable-1, variable-2 DESCENDING, variable-3 DESCENDING

With the SAS code below we order the dataset work.my_ds based on the variables var1, var2, and var3 (all in ascending order).

proc sql;
    select var1,
	var2,
	var3
    from work.my_ds
    order by var1, var2, var3;
quit;
Sort a dataset in SAS by multiple variables with PROC SQL

How to Order a Dataset on a Calculated Column

In contrast to PROC SORT, you can use PROC SQL to create a new column. Moreover, you can also sort your dataset on the computed column. So, how do you order a dataset on a calculated column?

You order a dataset in SAS on a computed column with the ORDER BY clause. The clause starts with the ORDER BY keywords, followed by the CALCULATED keyword and the name of the computed column.

This is the correct syntax.

ORDER BY CALCULATED variable-name

You can use the CALCULATED keyword also when you order a dataset on multiple (computed) columns. Moreover, you can combine the CALCULATED keyword with the DESCENDING keyword.

ORDER BY CALCULATED variable-name DESCENDING

In the example below, we demonstrate how to use the CALCULATED keyword to sort a dataset on a computed column.

proc sql;
    select var1,
	var2,
	var3,
	var2 * var3 as var2_x_var3
    from work.my_ds
    order by calculated var2_x_var3;
quit;

In the SAS code above, we have created a new column “var2_x_var3” (multiplication of column “var2” and “var3“) and used it to sort the dataset.

Sort a dataset in SAS by a calculated column

How to Order a Dataset Randomly

Another advantage of PROC SQL is that you can order a dataset randomly. This can be useful if you want to draw a random sample of you dataset. So, how do you sort a dataset randomly in SAS?

You can sort a dataset in SAS in random order with a combination of the ORDER BY clause and the RANUNI faunction. Firstñy. the RANUNI function returns a random number between 0 and 1 for each row. Then, the ORDER BY clause orders the random numbers.

In the example below, we show how you can use the RANUNI function to order a dataset randomly.

proc sql;
    select var1,
	var2,
	var3
    from work.my_ds
    order by ranuni(1234);
quit;
Sort a dataset randomly in SAS with PROC SQL