While working with data, you may need to change the variables of your data set or table. In this article, we explain how to dynamically rename multiple column names in SAS. First, we will describe the RENAME option of the data step. This option is useful when you want to rename some variables. However, if you need to rename a lot of variables, then the MODIFY statement is more practical. This will be explained towards the end of the article.
Contents
The RENAME Syntax
The syntax of the RENAME option is shown below
RENAME = (var_name_1_old = var_name_1_new ….. var_name_n_old = var_name_n_new)
With the RENAME options you can rename one or more variables at the same time. First you specify the old variable name. After the equal-sign follows the new variable name.
The effect of the RENAME options depends on where you place it:
- In the SET statement: If you write the RENAME option in the SET statement, then the new variable name(s) will take effect when the Program Data Vector (PDV) is created. Hence, all programming statements in the data step must refer to the new variable name(s).
- In the DATA statement: If you write the RENAME option in the DATA statement, then the new variable name(s) will take effect after all programming statements.
Examples: Rename Column Names in SAS
In the two examples below, we will rename variables using the RENAME options. For this purpose, we create the data set below. This data set contains two columns with student names and their test scores.
/* CREATE A SIMPLE DATA SET */ DATA WORK.GRADES; INPUT NAME $ GRADE; DATALINES; MARIA 80 JOHN 70 PETER 85 EMMA 75 MIA 90 WILLIAM 65 ; RUN;
Rename ONE Variable
In this example, we rename the column GRADE to SCORE. We use the RENAME option in the SET statement. See the example code below on how to do this.
/* RENAME THE COLUMN 'GRADE' TO 'SCORE' */ DATA WORK.SCORES; SET WORK.GRADES (RENAME=(GRADE=SCORE)); RUN;
Rename TWO Variables
As mentioned in the syntax section, you can use the RENAME option to rename more than one variable at the same time. This example shows how to rename the NAME column to FIRST_NAME, and the GRADE column to SCORE. This time we place the RENAME option in the DATA statement.
/* RENAME THE COLUMNS 'NAME' AND 'GRADE' */ DATA WORK.FIRST_NAME_SCORES (RENAME=(NAME=FIRST_NAME GRADE=SCORE)); SET WORK.GRADES; RUN;
Rename MULTIPLE Variables Dynamically
Now suppose you have a SAS data set with many columns and you want to rename (some of) them. You can use the RENAME option and write out all the changes you need to carry out. However, this may be infeasible because the number of column names to change if very large and/or you don’t know all the column names. In the example below, we show a way to rename multiple column names dynamically at the same time.
For this example, we create a data set with student names and their grades on four exams. The goal is to change the column names from exam_a, exma_b, exam_c and exam_d to exam_1, exam_2, exam_3 and exam_4.
/* CREATE A DATA SET WITH MULTIPLE COLUMNS */ DATA WORK.EXAM_GRADES; INPUT NAME $ EXAM_A EXAM_B EXAM_C EXAM_D; DATALINES; MARIA 80 70 90 80 JOHN 70 65 75 75 PETER 85 80 80 90 EMMA 75 80 75 70 MIA 90 85 80 80 WILLIAM 65 70 75 75 ; RUN;
To dynamically rename all aforementioned column names, we create a mapping table. This mapping table contains a column with the old column names and a column with the new column names.
Use the Power of the Dictionary Tables
First, we obtain the current column names using the Dictionary Tables. Dictionary Tables are metadata (i.e., data about data) and are automatically available when a SAS session starts. They are updated automatically whenever there is a change in a data set.
We use the Dictionary Table COLUMNS which contains information about all the columns of the tables in the current session. We need three WHERE statements to select all the numeric columns of our example data set.
/* OBTAIN COLUMN NAMES */ PROC SQL; CREATE TABLE WORK.COLUMN_NAMES AS SELECT NAME AS COL_NAME FROM DICTIONARY.COLUMNS WHERE LIBNAME = "WORK" AND MEMNAME = "EXAM_GRADES" AND TYPE = "num"; QUIT;
Create a Mapping Table
In the previous step, we obtained the current column names of our example data set. In this step, we create the second column of our mapping table. This table contains the new variable names. For each column name, we need to replace the last character (a, b, c, etc.) with an increasing number. We do this with the use of the CATS function, the SUBSTR function, and the FIND function.
/* CREATE A MAPPING TABLE WITH THE NEW COLUMN NAMES */ DATA WORK.MAPPING_COL_NAMES; SET WORK.COLUMN_NAMES; COL_NAME_NEW = CATS(SUBSTR(COL_NAME, 1, FIND(COL_NAME,"_")),_N_); RUN;
Create a Macro Variable with all the Renames
The last step to use the MODIFY statement to dynamically rename multiple variable names is to create a macro variable with the syntax. We want a list of all the renames. The code below shows how to create a macro variable with the SELECT INTO statement.
/* SAVE THE MAPPING AS A MACRO VARIABLE */ PROC SQL; SELECT CATX("=", COL_NAME, COL_NAME_NEW) INTO :LIST_RENAME SEPARATED BY " " FROM WORK.MAPPING_COL_NAMES; QUIT; %PUT &=LIST_RENAME.;
Apply the MODIFY Statement
Finally, we can use the macro variable list_rename in the MODIFY statement of the datasets procedure to rename the variables.
PROC DATASETS LIB=WORK; MODIFY EXAM_GRADES; RENAME &LIST_RENAME.; RUN;
Although the total number of lines of code above might seem substantial, this code is very convenient to dynamically rename multiple column names in SAS. Especially when you need to change many (tens, hundreds, etc.) of column names. This code works also if you don’t exactly know all the current column names. For example, if you don’t know how many exams the students made. Furthermore, you can create a macro function of the code above and reuse it.