Uncategorized

Can You Sort your Dataset in a Data Step?

As a good SAS Programmer, you want to write efficient code in as few lines as possible. So, it would be great if you could create a SAS Data Step where you both create a new variable and sort your data.

But, can you actually sort a dataset in a Data Step? No, you can’t sort your data in a Data Step. With a Data Step, you can only create new datasets and new variables, filter observations, and select columns. You need the PROC SORT procedure to sort your data.

However, there might be a solution to this problem.

Only Sorting Your Data

Suppose we have the following SAS dataset with sales data.

Sample dataset

If you only want to sort your data, you can use the PROC SORT procedure. For example, here we sort the data by the units_sold column in ascending order.

proc sort data=work.sales;
     by units_sold;
run;
Sorted SAS dataset

With the DATA= keyword, you specify the dataset you want to sort. You use the BY keyword to let SAS know by which variable(s) you want to sort the data. By default, SAS sorts data in ascending order.

If you run the code above, SAS overwrites the original, unordered dataset work.sales and replaces it with an ordered dataset. If you don’t want to overwrite the original data, but, instead, want to create a new dataset with the ordered data, you need the OUT= keyword. For example:

proc sort data=work.sales out=work.sales_sorted;
     by units_sold;
run;

We have written another post where we discuss the PROC SORT procedure in more detail.

Create New Variables and Sort Your Data

Now, suppose you want to order your dataset by the revenue (price x units sold). However, the revenue variable doesn’t exist yet.

To sort your dataset on a variable that doesn’t exist in the original data, we have to take two steps. First, create a SAS dataset with the new variable in a Data Step, and second, sort the dataset with PROC SORT. So, for example:

data work.revenue;
     set work.sales;
     revenue = price * units_sold; 
     format revenue dollar8.2;
run;

proc sort data=work.revenue out=work.revenue_sorted;
    by descending revenue;
run;
SAS Dataset after a Data Step and PROC SORT

Although you can’t create a new variable and sort your dataset in a Data Step, you can use PROC SQL to perform this task in one step.

With PROC SQL, you can create a variable and use it in the ORDER BY clause to sort your data. Since the new variable isn’t in the original dataset, you need to use the keyword CALCULATED in the ORDER BY clause. For example:

proc sql;
    create table work.revenue_sorted_sql as
    select *,
        price * units_sold as revenue format=dollar8.2
    from work.sales
    order by calculated revenue desc;
quit;
Create a new variable and sort the dataset with PROC SQL