If you work with data, inevitably, you sometimes need to transpose your data. In this article, we explain how to use PROC TRANSPOSE to transpose a SAS data set. We both cover how to transpose your data from long-to-wide as well as wide-to-long. We illustrate this with step-by-step examples.
Contents
PROC TRANSPOSE: Long-to-Wide
First, we will show how to transpose a SAS data set from long to wide, i.e. rows to columns.
Example 1: The PROC TRANSPOSE Options
In this first example, we will reshape the data shown below.
We will show the default behavior of the transpose procedure and how to use the procedure’s options and statements.
The Data Set
For this example, we use the SHOES data set from the SASHELP library. This data set contains information about shoes’ sales from around the world. First, we use the IF statement to filter only the shoes’ sales from Johannesburg.
/* Create a simple data set */ DATA WORK.SHOES_JOBURG; SET SASHELP.SHOES; IF SUBSIDIARY = "Johannesburg"; KEEP REGION PRODUCT SUBSIDIARY SALES INVENTORY; RUN;
The Default Behavior of PROC TRANSPOSE
By default, the transpose procedure only transposes the numeric columns from long to wide and ignores any character column. The code and image below show this default behaviour.
/* Default transpose */ PROC TRANSPOSE DATA=WORK.SHOES_JOBURG OUT=WORK.SHOES_JOBURG_TRNS; RUN;
As the image above shows, SAS has transposed the numeric columns Sales and Inventory. It has created the columns _NAME_ and _LABEL_, which contain the original column names and its labels. The rows that are transposed to columns are named COL1, COL2, etc. Next, we will show how to give the new columns more meaningful names.
The Options of PROC TRANSPOSE
Like many other SAS procedures, the transpose procedure has options too. Some of the available options are:
- NAME: With the NAME option you can change the name of the _NAME_ variable.
- LABEL: With the LABEL option you can modify the name of the _LABEL_ variable.
- PREFIX: With the PREFIX option you can change the “COL” part of the newly created columns. However, the number in the column name won’t be changed. Later on, we will explain how to do this.
- OUT: With the OUT option you can specify the name of the transposed table. If you don’t use the OUT option, SAS will create an output data set with a default name.
With the code below we change the _NAME_ column to “VAR_NAME” and the _LABEL_ column to “VAR_LABEL”. We use the PREFIX option to name the new columns “PRODUCT1”, “PRODUCT2”, etc. Finally, with the OUT option, we specify the name of the output data set as “SHOES_JOBURG_TRNS”.
/* Options */ PROC TRANSPOSE DATA=WORK.SHOES_JOBURG NAME=VAR_NAME LABEL=VAR_LABEL PREFIX=PRODUCT OUT=WORK.SHOES_JOBURG_TRNS; RUN;
Example 2: The PROC TRANSPOSE Statements (VAR & ID)
In the previous example, we showed how to use the Options of the transpose procedure. In this part of this article, we will discuss the Statements of PROC TRANSPOSE in SAS. More specifically, the VAR and ID statement. We continue with the example mentioned above.
The VAR statement
As mentioned before, the default behavior of PROC TRANSPOSE is to transpose all the numeric columns from vertical to horizontal. However, with the VAR statement, you can select which column(s) you want to transpose. You can select both numeric and character column(s). With the code below we show how to transpose only the Sales column.
/* The VAR statement */ PROC TRANSPOSE DATA=WORK.SHOES_JOBURG NAME=VAR_NAME LABEL=VAR_LABEL PREFIX=PRODUCT OUT=WORK.SHOES_JOBURG_TRNS; VAR SALES; RUN;
As you can see, PROC TRANSPOSE only transposed the Sales column of the SAS data set.
The ID statement
In a previous section we discussed the PREFIX options. With this option you can change the prefix of the new column names. We used this option to change the column names from “COL1”, “COL2”, etc. to “PRODUCT1”, “PRODUCT2”, etc.
However, with the ID statement you can use the values of a column as the new variable names. In this example, we will use the column Product as ID. The code below show how to do this.
/* The ID statement */ PROC TRANSPOSE DATA=WORK.SHOES_JOBURG NAME=VAR_NAME LABEL=VAR_LABEL OUT=WORK.SHOES_JOBURG_TRNS; VAR SALES; ID PRODUCT; RUN;
As you can see in the image above, we used the ID statement to give the columns the names of the product (Boot, Sandal, etc.). If the variable used in the ID statement contains special characters or blanks, SAS replaces them with an underscore (Women_s_Dress).
The column you use in the ID statement can be a numeric or character variable. If it is a numeric variable, then an underscore “_” is added at the beginning of the variable name automatically. It is also possible to combine the ID statement with the PREFIX option. Lastly, you can use more than one variable in the ID statement.
Example 3: The PROC TRANSPOSE Statements (BY)
The last statement of the transpose procedure we will discuss is the BY statement. With the BY statement, you can define groups of data that must be transposed. SAS won’t transpose the variables that are mentioned in the BY statement.
By default, SAS assumes that the data you want to transpose is sorted by the variables mentioned in the BY statement. If not, you need to sort the data first or add the keyword NOTSORTED to let SAS know that the data is not sorted.
To illustrate how to use the BY statement we create a new data set. This new data set contains both the shoes’ sales from Johannesburg as well as from Nairobi. We will use the BY statement to reshape and group the data by these cities (Subsidiaries).
/* Create a data set */ DATA WORK.SHOES_JOBURG_NAIROBI; SET SASHELP.SHOES; IF SUBSIDIARY IN ("Johannesburg" "Nairobi"); KEEP REGION PRODUCT SUBSIDIARY SALES INVENTORY; RUN;
The transpose operation shown in the images above uses all the PROC TRANSPOSE Options and Statements that we have discussed. See the code below.
/* The BY statement */ PROC TRANSPOSE DATA=WORK.SHOES_JOBURG_NAIROBI NAME=VAR_NAME OUT=WORK.SHOES_JOBURG_NAIROBI_TRNS (DROP=_LABEL_); VAR SALES; ID PRODUCT; BY SUBSIDIARY; RUN;