SAS provides many ways to avoid repetitive code, one of which the colon modifier. In this article, we discuss the many applications of the colon modifier. For example, how to append or delete multiple tables with a similar name, how to keep or drop columns with a similar name, and how to use all columns with a similar name as input for an arithmetic operation.
Contents
Use the Colon Modifier to Append Multiple Tables
Firstly, you can use the colon modifier to append multiple tables with a similar name.
In the example below, we will append the tables work.cities_africa, work.cities_asia, and work.cities_europe, and create a new table: work.cities_world. We could write-out the complete code.
data work.cities_world; set work.cities_africa work.cities_asia work.cities_europe; run;
If you have a few tables, then writing-out all the table names is doable. However, if you have many tables, then this can be a very tedious task. Moreover, writing-out all table names is impossible if you don’t know all the table names.
Fortunately, you can use the colon (:) modifier to append all tables with a similar name. In our example, all data sets start with countries_. If we use the SET statement “SET work.cities_:”, then SAS will append all tables whose name starts with cities_.
This trick to append tables is especially useful if you don’t know the names of the tables you want to combine. This might happen, for example, if you write a macro function.
data work.cities_europe; length city country $15.; infile datalines dlm=","; input city $ country $; datalines; Amsterdam, The Netherlands Madrid, Spain Brussels, Belgium ; run; data work.cities_africa; length city country $15.; infile datalines dlm=","; input city $ country $; datalines; Cairo, Egypt Luanda, Angola Tunis, Tunisia ; run; data work.cities_asia; length city country $15.; infile datalines dlm=","; input city $ country $; datalines; Tokyo, Japan Hanoi, Vietnam Jakarta, Indonesia ; run; data work.cities_world; set work.cities_:; run;
Use the Colon Modifier to Delete Multiple Tables
Secondly, you can use the colon (:) modifier to delete multiple tables with a similar name.
We could write-out the complete name of the tables we want to remove. For example, if we want to remove the tables work.cities_africa, work.cities_asia, and work.cities_europe, we could use the following code.
proc datasets lib=work nolist; delete cities_africa cities_asia cities_europe; run;
Although this code works fine, writing-out all the table names can be a tedious task. Also, writing-out is impossible if you don’t know or remember all table names.
To avoid writing many lines of SAS code, you can use the colon modifier. If you use the colon modifier in the delete statement, then SAS removes all tables with a similar name. That is to say, the statement “delete cities_:” deletes all tables whose name starts with cities_.
data work.cities_europe; length city country $15.; infile datalines dlm=","; input city $ country $; datalines; Amsterdam, The Netherlands Madrid, Spain Brussels, Belgium ; run; data work.cities_africa; length city country $15.; infile datalines dlm=","; input city $ country $; datalines; Cairo, Egypt Luanda, Angola Tunis, Tunisia ; run; data work.cities_asia; length city country $15.; infile datalines dlm=","; input city $ country $; datalines; Tokyo, Japan Hanoi, Vietnam Jakarta, Indonesia ; run; proc datasets lib=work nolist; delete cities_:; run;
Use the Colon Modifier to Keep Multiple Columns
Besides efficiently combining and removing multiple tables, you can use the colon (:) modifier in the KEEP option to select multiple columns at once. With the KEEP option, you can select the columns that SAS will read from the input data set and/or write to the output data set.
In this example, we have a data set with the results of 5 exams of 3 students. If we want to create a new data set with only the results of the exams, we could use the following code.
data work.keep_exam_scores; set work.exam_scores (keep=exam1 exam2 exam3 exam4 exam5); run;
The code above works fine, but specifying all columns names can be a boring and time-consuming task. Instead, you can use the colon modifier to keep all the columns that start with exam results. If you use “keep=exam:“, SAS will select all columns that start with “exam“. See the example below.
data work.exam_scores; infile datalines dlm=","; input id name $ exam1 exam2 exam3 exam4 exam5; datalines; 1, John, 60, 65, 62, 70, 71 2, Emma, 84, 92, 79, 85, 87 3, Will, 77, 73, 70, 81, 79 ; run; data work.keep_exam_scores; set work.exam_scores (keep=exam:); run;
Use the Colon Modifier to Drop Multiple Columns
Just as you can use the colon (:) modifier to keep (select) columns, you use the colon modifier also to drop (remove) multiple columns at once from a dataset. For example, in the following example, we drop all columns that start with “exam“.
Again, you could specify all the columns you want to drop by writing their complete name after the DROP option. However, you can make your code shorter by using the colon modifier. If you use “drop=exam:“, SAS will remove all columns whose name starts with “exam“. See the example below.
data work.exam_scores; infile datalines dlm=","; input id name $ exam1 exam2 exam3 exam4 exam5; datalines; 1, John, 60, 65, 62, 70, 71 2, Emma, 84, 92, 79, 85, 87 3, Will, 77, 73, 70, 81, 79 ; run; data work.drop_exam_scores; set work.exam_scores (drop=exam:); run;
Use the Colon Modifier to Carry Out Arithmetic Operations
Finally, you can also use the colon (:) modifier in arithmetic operations such as sum, mean, and max.
For example, if you want to know the average score of John’s 5 exams you could use this code:
mean_score = mean(exam1, exam2, exam3, exam4, exam5)
However, there is a more efficient way to write this code. With the colon modifier you can take the average of all columns whose name starts with “exam“.
mean_score = mean(of exam:)
data work.exam_scores; infile datalines dlm=","; input id name $ exam1 exam2 exam3 exam4 exam5; datalines; 1, John, 60, 65, 62, 70, 71 2, Emma, 84, 92, 79, 85, 87 3, Will, 77, 73, 70, 81, 79 ; run; data work.arithmetic_opertaions; set work.exam_scores; sum_score = sum(of exam:); mean_score = mean(of exam:); run;
You can use the same trick to take the sum, the min, the max, etc. from a number of columns with a similar name.
You can find more applications of the colon modifier in this article.