SAS How To's

5 Easy Ways to Calculate the Column Sum in SAS

One of the most common mathematical operations is addition, i.e. summation. If you work with tables, you can calculate the sum of a row or a column. In this article, we discuss how to calculate the sum of a column in SAS.

In SAS, you can calculate the sum of a column with PROC SQL, PROC MEANS, PROC SUMMARY, PROC UNIVARIATE, and an ordinary Data Step. PROC SQL provides the easiest way to get the column sum, while the other procedures give you more flexibility. The Data Step is more complex but shows you the cumulative sum too.

In the remainder of this article, we take a detailed look at all 5 methods. We provide examples with reusable SAS and discuss their advantages and disadvantages. At the end of this article, you find a table with a comparison of the 5 methods.

If you want to know how to calculate the cummlative sum, check out this article.

Sample Data

In all examples, we use the dataset work.my_data which has 1 column (MyColumn) and 5 rows. The goal is to get the sum of the column MyColumn (which is 1 + 2 + 3 + 4 + 5 = 15).

Here is the SAS code to create this dataset.

data work.my_data;
    input MyColumn;
    datalines;
1
2
3
4
5
;
run;
SAS Sample Data
work.my_data

1. Calculate the Column Sum in SAS with PROC SQL

The first, and probably easiest, method to calculate the sum of a column in SAS is with PROC SQL.

PROC SQL is a powerful SAS Base procedure that you can use to process SQL statements. So, if you want to calculate the column sum and you have experience with coding in SQL, this will be your preferred method.

This method consists of 4 steps:

  1. Open the SQL procedure with PROC SQL.
  2. Create a new column containing the column sum with the SELECT statement and the SUM(column-name) function. The SUM function calculates the sum of the provided column. After the AS keyword, you define the name of the column that will contain the result of the SUM function.
  3. The FROM statement provides the dataset where SAS can find the column that you want to sum.
  4. Close the SQL procedure with QUIT.
proc sql;
    select sum(MyColumn) as Sum_MyColumn
    from work.my_data;
quit;
Calculate the sum of a column in SAS with PROC SQL

By default, PROC SQL shows only the result of the executed code on the screen. In other words, it doesn’t create an output dataset.

If you need a table with the column sum, you need to add the CREATE TABLE statement to your code. After this statement you provide the name of the table that SAS will create, and the AS keyword. If you use this code, SAS doesn’t print the result to your screen.

proc sql;
    create table work.sum_my_data as
        select sum(MyColumn) as Sum_MyColumn
        from work.my_data;
quit;

To summarize, PROC SQL provides a quick and easy way to calculate the sum of a column. Especially, if you have experience with coding in SQL. However, this method either shows the column sum on your screen or creates a table with the result.

2. Calculate the Column Sum in SAS with PROC MEANS

The second method to calculate the column total in SAS is with PROC MEANS.

PROC MEANS is a SAS Base procedure that you can use for analyzing your data. It provides descriptive statistics such as the number of observations, the sum, the mean, and the median.

If you use PROC MEANS to calculate the sum of a column in SAS, your code has (at least) 3 parts:

  1. You start the procedure with the PROC MEANS statement followed by the DATA=-option. With the DATA=-option you provide the input dataset. You use the SUM keyword to only calculate the column sum.
  2. With the VAR <column-name> statement you let SAS know of which column you want to calculate the sum.
  3. You close the procedure with the RUN statement.
proc means data=work.my_data sum;
    var MyColumn;
run;
Calculate the sum of a column in SAS with PROC MEANS

By default, PROC MEANS only prints the result of the procedure to your screen. However, you can add an OUTPUT statement to your code to generate an output table too.

Within the OUTPUT statement, you use the OUT=-option to define the name of the output table. With the SUM=-option, you define the column name that contains the column sum.

With the code below we create an output table where the column Sum_MyColumn contains the sum of the MyColumn column.

proc means data=work.my_data sum;
    var MyColumn;
    output out=work.sum_my_data 
        sum=Sum_MyColumn;
run;
PROC MEANS Output

As you can see in the image above, the OUTPUT statement of the MEANS procedure creates two additional columns (_TYPE_ and _FREQ_). You can remove these two columns with the DROP=-option. Check this article to learn more about selecting and removing columns in SAS.

3. Calculate the Column Sum in SAS with PROC SUMMARY

The third method to calculate the sum of a variable in SAS is with PROC SUMMARY.

PROC SUMMARY is also a SAS Base procedure to analyze data and calculate descriptive statistics. This procedure is very similar to PROC MEANS, but there is one big difference. By default, PROC SUMMARY doesn’t print the result of the executed code to your screen.

You need 3 statements to calculate the sum of a SAS variable with PROC SUMMARY:

  1. Start the procedure with the PROC SUMMARY statement. Use the DATA=-option to define the input table. Provide the SUM keyword to calculate the sum of the SAS variable. Use the PRINT keyword to print the result to your screen. If you don’t provide the PRINT keyword, SAS will return an error.
  2. Use the VAR <variable-name> statement to let SAS know of which column you want to calculate the sum.
  3. Finish the SUMMARY procedure with the RUN statement.
proc summary data=work.my_data sum print;
    var MyColumn;
run;
Calculate the sum of a column in SAS with PROC SUMMARY

With the code above, SAS only prints the sum of the variable to your screen. So, if you need a table with the column sum, you need extra code.

Like the MEANS procedure, you need to add an OUTPUT statement. Again, with the OUT=-option you provide the name of the desired output table. You use the SUM=-option to define the name of the new column.

proc summary data=work.my_data print sum;
    var MyColumn;
    output out=sum_my_data
        sum=Sum_MyColumn;
run;
PROC SUMMARY Output

The output of PROC SUMMARY is identical to PROC MEANS. Hence, the output dataset has two additional columns which you could remove with the DROP=-option.

4. Calculate the Column Sum in SAS with PROC UNIVARIATE

The fourth way to calculate the sum of a column in SAS is with PROC UNIVARIATE.

PROC UNIVARIATE is a powerful SAS Base procedure that you can use to assess the distribution of your data, including a test for normality. Although this procedure is more for statistical purposes, you can still use it to calculate the sum of a column.

You can calculate the column sum using PROC UNIVARIATE with 3 statements:

  1. Start with the PROC UNIVARIATE statement and the DATA=-option. You use the DATA=-option to define your input dataset.
  2. Use the VAR <column-name> to specify of which column you want to calculate the sum.
  3. Finish your code with the RUN statement.
proc univariate data=work.my_data;
    var MyColumn;
run;

Running the code above will produce the following output.

Calculate the sum of a column in SAS with PROC UNIVARIATE

As you can see, the UNIVARIATE procedure generates much information about the selected variable. You can find the sum of the variable in the Moments section.

Like the previous procedures, the UNIVARIATE procedure only generates a report. However, you can add extra code to your program to create a SAS dataset.

You use the OUTPUT statement to define the name of the output dataset (OUT=-option) and the statistic that it will contain (SUM=-option).

proc univariate data=work.my_data;
    var MyColumn;
    output out=work.sum_my_data
        sum=Sum_MyColumn;
run;
PROC UNIVARIATE OUTPUT

Compared to the MEANS and SUMMARY procedures, the output dataset of the UNIVARIATE procedure has the advantages that it doesn’t contain the 2 additional columns (_TYPE_ and _FREQ_).

5. Calculate the Column Sum in SAS with a Data Step

The fifth, and final, method to calculate the column sum in SAS is with a Data Step.

As you know, a SAS Data Step processes the input data row by row. This is great to calculate the sum of a row and might seem less useful to calculate the sum of a column. However, it is still possible and provides an extra advantage compared to the previous methods.

As a SAS Data Step processes the dataset in the SET statement row by row, it is necessary to know the sum of the previous rows to calculate the column total. Using the RETAIN keyword you can add the value of the current row to the sum of all previous rows. This way, SAS will create a dataset with a new column that contains the cumulative sum.

With the code below we create a new, temporary dataset that contains a new column Sum_MyColumn with the cumulative sum.

data work.sum_my_data_tmp;
    set work.my_data;
 
    retain Sum_MyColumn;
 
    Sum_MyColumn = sum(Sum_MyColumn, MyColumn);
run;
Calculate the sum of a column in SAS with a Data Step.

If you are only interested in the total sum of the column, you need to select the last row. You can do this with the END=-option and the IF statement.

data work.sum_my_data;
    set work.sum_my_data_tmp end=last_obs;
 
    if last_obs then output;
run;
Calculate the sum of a column in SAS with a Data Step, output

If you want to know more about the END=-option and how to select specific rows from a SAS dataset, we highly recommend this article.

Summary

In this article, we’ve discussed 5 methods to calculate the sum of a column in SAS. Each method has its own use case. The table below provides an overview of the 5 methods, their advantages, and their disadvantages.

MethodComplexityAdvantagesDisadvantage
PROC SQLLowQuick & EasyCan’t create Results & Output table with same code
PROC MEANSMediumCreates Result & Output table by defaultOutput table contains 2 extra columns (_TYPE_ & _FREQ_)
PROC SUMMARYMediumCreates only Results table by defaultOutput table contains 2 extra columns (_TYPE_ & _FREQ_)
PROC UNIVARIATEMediumMany Statistics by defaultCan be slow because of many extra stastics
DATA STEPHighAlso creates cumulative sumHard(er) to code

Do you know? 5 Ways to Find the Minimum Value of a Column (by Group)