One of the most common mathematical operations is addition, i.e. summation. If you work with tables, you can calculate the sum of a row or a column. In this article, we discuss how to calculate the sum of a column in SAS.
In SAS, you can calculate the sum of a column with PROC SQL, PROC MEANS, PROC SUMMARY, PROC UNIVARIATE, and an ordinary Data Step. PROC SQL provides the easiest way to get the column sum, while the other procedures give you more flexibility. The Data Step is more complex but shows you the cumulative sum too.
In the remainder of this article, we take a detailed look at all 5 methods. We provide examples with reusable SAS and discuss their advantages and disadvantages. At the end of this article, you find a table with a comparison of the 5 methods.
If you want to know how to calculate the cummlative sum, check out this article.
In all examples, we use the dataset work.my_data which has 1 column (MyColumn) and 5 rows. The goal is to get the sum of the column MyColumn (which is 1 + 2 + 3 + 4 + 5 = 15).
Here is the SAS code to create this dataset.
data work.my_data; input MyColumn; datalines; 1 2 3 4 5 ; run;
1. Calculate the Column Sum in SAS with PROC SQL
The first, and probably easiest, method to calculate the sum of a column in SAS is with PROC SQL.
PROC SQL is a powerful SAS Base procedure that you can use to process SQL statements. So, if you want to calculate the column sum and you have experience with coding in SQL, this will be your preferred method.
This method consists of 4 steps:
- Open the SQL procedure with
- Create a new column containing the column sum with the SELECT statement and the SUM(column-name) function. The SUM function calculates the sum of the provided column. After the AS keyword, you define the name of the column that will contain the result of the SUM function.
- The FROM statement provides the dataset where SAS can find the column that you want to sum.
- Close the SQL procedure with
proc sql; select sum(MyColumn) as Sum_MyColumn from work.my_data; quit;
By default, PROC SQL shows only the result of the executed code on the screen. In other words, it doesn’t create an output dataset.
If you need a table with the column sum, you need to add the
CREATE TABLE statement to your code. After this statement you provide the name of the table that SAS will create, and the
AS keyword. If you use this code, SAS doesn’t print the result to your screen.
proc sql; create table work.sum_my_data as select sum(MyColumn) as Sum_MyColumn from work.my_data; quit;
To summarize, PROC SQL provides a quick and easy way to calculate the sum of a column. Especially, if you have experience with coding in SQL. However, this method either shows the column sum on your screen or creates a table with the result.
2. Calculate the Column Sum in SAS with PROC MEANS
The second method to calculate the column total in SAS is with PROC MEANS.
PROC MEANS is a SAS Base procedure that you can use for analyzing your data. It provides descriptive statistics such as the number of observations, the sum, the mean, and the median.
If you use PROC MEANS to calculate the sum of a column in SAS, your code has (at least) 3 parts:
- You start the procedure with the
PROC MEANSstatement followed by the
DATA=-option. With the
DATA=-option you provide the input dataset. You use the
SUMkeyword to only calculate the column sum.
- With the
VAR <column-name>statement you let SAS know of which column you want to calculate the sum.
- You close the procedure with the
proc means data=work.my_data sum; var MyColumn; run;
By default, PROC MEANS only prints the result of the procedure to your screen. However, you can add an
OUTPUT statement to your code to generate an output table too.
OUTPUT statement, you use the
OUT=-option to define the name of the output table. With the
SUM=-option, you define the column name that contains the column sum.
With the code below we create an output table where the column Sum_MyColumn contains the sum of the MyColumn column.
proc means data=work.my_data sum; var MyColumn; output out=work.sum_my_data sum=Sum_MyColumn; run;
As you can see in the image above, the OUTPUT statement of the MEANS procedure creates two additional columns (_TYPE_ and _FREQ_). You can remove these two columns with the
DROP=-option. Check this article to learn more about selecting and removing columns in SAS.
3. Calculate the Column Sum in SAS with PROC SUMMARY
The third method to calculate the sum of a variable in SAS is with PROC SUMMARY.
PROC SUMMARY is also a SAS Base procedure to analyze data and calculate descriptive statistics. This procedure is very similar to PROC MEANS, but there is one big difference. By default, PROC SUMMARY doesn’t print the result of the executed code to your screen.
You need 3 statements to calculate the sum of a SAS variable with PROC SUMMARY:
- Start the procedure with the
PROC SUMMARYstatement. Use the
DATA=-option to define the input table. Provide the
SUMkeyword to calculate the sum of the SAS variable. Use the
- Use the
VAR <variable-name>statement to let SAS know of which column you want to calculate the sum.
- Finish the SUMMARY procedure with the
proc summary data=work.my_data sum print; var MyColumn; run;
With the code above, SAS only prints the sum of the variable to your screen. So, if you need a table with the column sum, you need extra code.
Like the MEANS procedure, you need to add an
OUTPUT statement. Again, with the
OUT=-option you provide the name of the desired output table. You use the
SUM=-option to define the name of the new column.
proc summary data=work.my_data print sum; var MyColumn; output out=sum_my_data sum=Sum_MyColumn; run;
The output of PROC SUMMARY is identical to PROC MEANS. Hence, the output dataset has two additional columns which you could remove with the
4. Calculate the Column Sum in SAS with PROC UNIVARIATE
The fourth way to calculate the sum of a column in SAS is with PROC UNIVARIATE.
PROC UNIVARIATE is a powerful SAS Base procedure that you can use to assess the distribution of your data, including a test for normality. Although this procedure is more for statistical purposes, you can still use it to calculate the sum of a column.
You can calculate the column sum using PROC UNIVARIATE with 3 statements:
- Start with the
PROC UNIVARIATEstatement and the
DATA=-option. You use the
DATA=-option to define your input dataset.
- Use the
VAR <column-name>to specify of which column you want to calculate the sum.
- Finish your code with the
proc univariate data=work.my_data; var MyColumn; run;
Running the code above will produce the following output.
As you can see, the UNIVARIATE procedure generates much information about the selected variable. You can find the sum of the variable in the Moments section.
Like the previous procedures, the UNIVARIATE procedure only generates a report. However, you can add extra code to your program to create a SAS dataset.
You use the
OUTPUT statement to define the name of the output dataset (
OUT=-option) and the statistic that it will contain (
proc univariate data=work.my_data; var MyColumn; output out=work.sum_my_data sum=Sum_MyColumn; run;
Compared to the MEANS and SUMMARY procedures, the output dataset of the UNIVARIATE procedure has the advantages that it doesn’t contain the 2 additional columns (_TYPE_ and _FREQ_).
5. Calculate the Column Sum in SAS with a Data Step
The fifth, and final, method to calculate the column sum in SAS is with a Data Step.
As you know, a SAS Data Step processes the input data row by row. This is great to calculate the sum of a row and might seem less useful to calculate the sum of a column. However, it is still possible and provides an extra advantage compared to the previous methods.
As a SAS Data Step processes the dataset in the
SET statement row by row, it is necessary to know the sum of the previous rows to calculate the column total. Using the
RETAIN keyword you can add the value of the current row to the sum of all previous rows. This way, SAS will create a dataset with a new column that contains the cumulative sum.
With the code below we create a new, temporary dataset that contains a new column Sum_MyColumn with the cumulative sum.
data work.sum_my_data_tmp; set work.my_data; retain Sum_MyColumn; Sum_MyColumn = sum(Sum_MyColumn, MyColumn); run;
If you are only interested in the total sum of the column, you need to select the last row. You can do this with the
END=-option and the
data work.sum_my_data; set work.sum_my_data_tmp end=last_obs; if last_obs then output; run;
If you want to know more about the
END=-option and how to select specific rows from a SAS dataset, we highly recommend this article.
In this article, we’ve discussed 5 methods to calculate the sum of a column in SAS. Each method has its own use case. The table below provides an overview of the 5 methods, their advantages, and their disadvantages.
|PROC SQL||Low||Quick & Easy||Can’t create Results & Output table with same code|
|PROC MEANS||Medium||Creates Result & Output table by default||Output table contains 2 extra columns (_TYPE_ & _FREQ_)|
|PROC SUMMARY||Medium||Creates only Results table by default||Output table contains 2 extra columns (_TYPE_ & _FREQ_)|
|PROC UNIVARIATE||Medium||Many Statistics by default||Can be slow because of many extra stastics|
|DATA STEP||High||Also creates cumulative sum||Hard(er) to code|