In most cases, SAS can execute your code independently of the position of each variable in your dataset. However, the order of the variables might be of importance if you want to present or analyze your data. So, how do you reorder variables in a SAS dataset?
You change the position of a variable in a SAS dataset with a DATA Step and the RETAIN statement. The RETAIN statement must be placed before the SET statement and is followed by the column names in the desired order.
In this article, we demonstrate how to rearrange the position of columns with a Data Step, as well as with PROC SQL. We also provide a robust method to order the columns alphabetically.
For the examples in this article, we use a subset of the CARS dataset in the SASHELP library. The subset contains 6 columns (Make, Model, Type, Invoice, EngineSize, and Horsepower) and 6 rows, and is called work.my_data.
data work.my_data; set sashelp.cars ( keep=Make Model Type Invoice EngineSize Horsepower obs=6); run;
How to Reorder Variables with PROC SQL
The first way to reorder the variables in your SAS dataset is with PROC SQL.
PROC SQL is a powerful SAS Base procedure that you can use to process SQL statements. So, if you are familiar with writing SQL code, this method might be the easiest for you to rearrange the column positions.
The SAS code has at least 4 steps:
- Open the SQL procedure with the PROC SQL statement.
- If you want to create a table with the columns in their new position, you need the CREATE TABLE statement followed by the name of the new table and the AS keyword. You can omit this step if you only want to show the table on your screen.
- With the SELECT statement, you specify the order of the columns.
- With the FROM statement, you let SAS know what the input table is. For example, work.my_data.
- Finally, you close the SQL procedure with the QUIT statement.
proc sql; create table work.my_data_reordered as select Type, Make, Model, Horsepower, EngineSize, Invoice from work.my_data; quit;
How to Reorder Variables with a Data Step
The second method to change the order of your columns is with a DATA Step.
A SAS DATA Step is a set of one or more SAS statements that you can use to create a new data set. An important concept of the SAS DATA Step, and the order of the variables, is the Program Data Vector.
SAS builds a new dataset by writing one observation at a time from the input dataset to the Program Data Vector (PDV). The PDV processes the observation (e.g., adding columns, applying formats, etc.) and writes the result to the output dataset. The order of the variables in the PDV defines the order of the variables in the output dataset.
The position of each variable in the PDV is defined the first time the PDV is created. Normally, this is when you use the SET statement to read the input dataset. So, if you write the SET statement directly after the DATA statement, the positions of the variables in the input and output dataset are the same.
However, you can place a RETAIN statement before the SET statement to explicitly define the order of the variables in the output dataset. After the RETAIN keyword you specify the position of each variable.
In the example below, we use the RETAIN statement to define the arrangement of the columns. The desired order of the columns is: Type, Make, Model, Horsepower, EngineSize, and Invoice.
data work.my_data_reordered; retain Type Make Model Horsepower EngineSize Invoice; set work.my_data; run;
Reorder some Variables
In the example above, we used the RETAIN statement to explicitly define the position of each column. However, it’s not necessary to mention all column names.
If you want to change the position of some columns, for example, Invoice and Type, you can use the RETAIN statement too. In this case, the first column of the output dataset will be Invoice and the second Type. The position of the remaining columns depends on the order of how they appear in the SET statement.
In the example below, we use the RETAIN statement to reorder the output dataset so that the first column is Invoice and the second is Type.
data work.my_data_some_reordered; retain Invoice Type; set work.my_data; run;
How to Reorder Variables Alphabetically
Finally, we demonstrate how to position the columns in a dataset in alphabetical order.
You could use the RETAIN statement and manually write down the column names in alphabetical order. However, this can be time-consuming and your code might not be robust. For example, if new columns appear, you need to modify the RETAIN statement.
Fortunately, there exists a method to generate code that always rearranges the columns in alphabetical order automatically.
In order to create and understand this code, you need to be familiar with the Dictionary tables in SAS. Dictionary tables are auxiliary tables that contain metadata about all SAS tables, such as the columns in each table.
In the code below, we use the Dictionary table columns to retrieve some information about the columns in our dataset work.my_data.
proc sql; select * from dictionary.columns where libname = "WORK" and memname = "MY_DATA"; quit;
The table above shows that our work.my_data dataset has 6 columns. The Name column contains the column names of our dataset. We can use this information to order the dataset alphabetically.
The SELECT INTO Statement
Since Dictionary tables are ordinary tables, we can sort them by the variable Name. Besides that, you can use the SELECT INTO statement to create a macro variable of the contents in the Name column.
In the code below, we use the SELECT INTO statement and the ORDER BY statement to generate the macro variable my_column_order. This macro variable contains the column names of the work.my_data dataset in alphabetical order.
proc sql; select name into :my_column_order separated by " " from dictionary.columns where libname = "WORK" and memname = "MY_DATA" order by name; quit;
Here we use a PUT statement to print the value of the macro variable my_column_order to the log.
As you can see, the my_column_order macro variable contains the column names in alphabetical order.
Now, instead of defining manually the order of the columns after the RETAIN statement, we use the macro variable my_column_order to place the columns in the output dataset in alphabetical order.
data work.my_data_ordered_alphabet; retain &my_column_order.; set work.my_data; run;