If you analyze your data, you probably want to know descriptive statistics such as the sum and the mean. If you work with tables, you can calculate the mean of a row or a column. In this article, we discuss **how to calculate the mean of a column in SAS**.

**In SAS, you can calculate the mean of a column with PROC SQL, PROC MEANS, PROC SUMMARY, PROC UNIVARIATE, and an ordinary Data Step. PROC SQL provides the easiest way to get the column average, while the other procedures give you more flexibility. The Data Step is more complex but shows you the cumulative mean too.**

In the remainder of this article, we take a detailed look at all 5 methods. We provide examples with reusable SAS and discuss their advantages and disadvantages. At the end of this article, you find a table with a comparison of the 5 methods.

Throughout this article, we will use the terms mean and averages interchangeably.

*Instead, we have also written articles about how to calculate the average per group or the weighted average (per group)*.

**Sample Data**

In all examples, we use the dataset *work.my_data* which has 1 column (*MyColumn*) and 5 rows. The goal is to find the average of the column *MyColumn* (which is (1 + 2 + 3 + 4 + 5) / 5 = 3).

Here is the SAS code to create this dataset.

data work.my_data; input MyColumn; datalines; 1 2 3 4 5 ; run;

Contents

## 1. Calculate the Column Mean in SAS with *PROC SQL*

The first, and probably easiest method to calculate the mean of a column in SAS is with PROC SQL.

PROC SQL is a powerful SAS Base procedure that you can use to process SQL statements. So, if you want to find the average of a column and you have experience with coding in SQL, this will be your preferred method.

This method consists of 4 steps:

- Open the SQL procedure with
*PROC SQL*. - Create a new column containing the column mean by using the
*SELECT*statement and the*AVG(column-name)*function. The*AVG*function calculates the average of the provided column. After the*AS*keyword, you define the name of the column that will contain the result of the*AVG*function. - The
*FROM*statement provides the dataset where SAS can find the column of which you want to calculate the mean. - Close the SQL procedure with
*QUIT*.

proc sql; select avg(MyColumn) as Avg_MyColumn from work.my_data; quit;

By default, PROC SQL shows only the result of the executed code on your screen. In other words, it doesn’t create an output dataset.

If you need a table with the calculated mean, you need to add the *CREATE TABLE* statement to your code. After this statement, you provide the name of the table that SAS will create, and the *AS* keyword. If you use this code, SAS doesn’t print the result to your screen.

proc sql; create table work.avg_my_data as select avg(MyColumn) as Avg_MyColumn from work.my_data; quit;

To summarize, PROC SQL provides a quick and easy way to calculate the average of a column. Especially, if you have experience with coding in SQL. However, this method either shows the column average on your screen or creates a table with the result.

**Do you know? How to Count the Number of Observations per Group with PROC SQL**

## 2. Calculate the Column Mean in SAS with *PROC MEANS*

The second method to calculate the mean of a column in SAS is with PROC MEANS.

PROC MEANS is a SAS Base procedure that you can use for analyzing your data. It provides descriptive statistics such as the number of observations, the sum, the mean, and the median.

If you use PROC MEANS to calculate the average of a column in SAS, your code has (at least) 3 parts:

- You start the procedure with the
*PROC MEANS*statement followed by the*DATA*=-option. With the*DATA*=-option you provide the input dataset. You use the*MEAN*keyword to only calculate the column mean. - With the
*VAR <column-name>*statement, you let SAS know of which column you want to calculate the mean. - You close the procedure with the
*RUN*statement.

proc means data=work.my_data mean; var MyColumn; run;

By default, PROC MEANS only prints the result of the procedure to your screen. However, you can add an *OUTPUT *statement to your code to generate an output table too.

Within the *OUTPUT *statement, you use the *OUT*=-option to define the name of the output table. With the *MEAN*=-option, you define the column name that contains the column mean.

With the code below we create an output table where the column *Avg_MyColumn* contains the average of the *MyColumn* column.

proc means data=work.my_data mean; var MyColumn; output out=work.avg_my_data mean=Avg_MyColumn; run;

As you can see in the image above, the OUTPUT statement of the MEANS procedure creates two additional columns (*_TYPE_* and *_FREQ_*). You can remove these two columns with the *DROP*=-option. Check this article to learn more about selecting and removing columns in SAS.

## 3. Calculate the Column Mean in SAS with *PROC SUMMARY*

The third method to calculate the average of a variable in SAS is with PROC SUMMARY.

PROC SUMMARY is also a SAS Base procedure to analyze data and calculate descriptive statistics. This procedure is very similar to PROC MEANS, but there is one big difference. By default, PROC SUMMARY doesn’t print the result of the executed code to your screen.

You need 3 statements to calculate the average of a SAS variable with PROC SUMMARY:

- Start the procedure with the
*PROC SUMMARY*statement. Use the*DATA*=-option to define the input table. Provide the*MEAN*keyword to calculate the mean of the SAS variable. Use the*PRINT*keyword to print the result to your screen. If you don’t provide the*PRINT*keyword, SAS will return an error. - Use the
*VAR <variable-name>*statement to let SAS know of which column you want to calculate the mean. - Finish the SUMMARY procedure with the
*RUN*statement.

proc summary data=work.my_data mean print; var MyColumn; run;

With the code above, SAS only prints the average of the variable to your screen. So, if you need a table with the column average, you need extra code.

Like the MEANS procedure, you need to add an *OUTPUT *statement. Again, with the *OUT*=-option you provide the name of the desired output table. You use the *MEAN*=-option to define the name of the new column.

proc summary data=work.my_data mean print; var MyColumn; output out=avg_my_data mean=Avg_MyColumn; run;

The output of PROC SUMMARY is identical to PROC MEANS. Hence, the output dataset has two additional columns which you could remove with the `DROP`

=-option.

## 4. Calculate the Column Mean in SAS with *PROC UNIVARIATE*

The fourth way to calculate the mean of a column in SAS is with PROC UNIVARIATE.

PROC UNIVARIATE is a powerful SAS Base procedure that you can use to assess the distribution of your data, including a test for normality. Although this procedure is more for statistical purposes, you can still use it to calculate the average of a variable.

You can calculate the column average using PROC UNIVARIATE with 3 statements:

- Start with the
*PROC UNIVARIATE*statement and the*DATA*=-option. You use the*DATA*=-option to define your input dataset. - Use the
*VAR <column-name>*to specify - Finish your code with the
*RUN*statement.

proc univariate data=work.my_data; var MyColumn; run;

Running the code above will produce the following output.

As you can see, the UNIVARIATE procedure generates much information about the selected variable. You can find the mean of the variable in the *Moments* section.

Like the previous procedures, the UNIVARIATE procedure only generates a report. However, you can add extra code to your program to create a SAS dataset.

You use the *OUTPUT *statement to define the name of the output dataset (*OUT*=-option) and the statistic that it will contain (*MEAN*=-option).

proc univariate data=work.my_data; var MyColumn; output out=work.avg_my_data mean=Avg_MyColumn; run;

Compared to the MEANS and SUMMARY procedures, the output dataset of the UNIVARIATE procedure has the advantages that it doesn’t contain the 2 additional columns (*_TYPE_* and *_FREQ_*).

## 5. Calculate the Column Mean in SAS with a *Data Step*

The fifth, and final method to calculate the column mean in SAS is with a Data Step.

As you know, a SAS Data Step processes the input data row by row. This is great to calculate the average of a row and might seem less useful to calculate the average of a column. However, it is still possible and provides an extra advantage compared to the previous methods.

As a SAS Data Step processes the dataset in the *SET *statement row by row, it is necessary to know the sum of the previous rows and the number of rows to calculate the column average. Using the *RETAIN *keyword you can add the value of the current row to the sum of all previous rows. This way, SAS will create a dataset with a new column that contains the cumulative sum.

With the cumulative sum and the special variable *_N_*, we can calculate the cumulative average of each row. The variable *_N_* contains the row number of the row that SAS is currently processing.

With the code below we create a new, temporary dataset that contains two new columns:

*Sum_MyColumn*with the cumulative sum.*Avg_MyColumn*with the cumulative average.

data work.avg_my_data_tmp; set work.my_data; retain Sum_MyColumn; Sum_MyColumn = sum(Sum_MyColumn, MyColumn); Avg_MyColumn = Sum_MyColumn / _N_; run;

If you are only interested in the total average of the column, you need to select the last row. You can do this with the *END*=-option and the *IF *statement.

data work.avg_my_data; set work.avg_my_data_tmp end=last_obs; if last_obs then output; run;

If you want to know more about the *END*=-option and how to select specific rows from a SAS dataset, we highly recommend this article.

## Summary

In this article, we’ve discussed 5 methods to calculate the mean of a column in SAS. Each method has its own use case. The table below provides an overview of the 5 methods, their advantages, and their disadvantages.

Method | Complexity | Advantages | Disadvantage |
---|---|---|---|

PROC SQL | Low | Quick & Easy | Can’t create Results & Output table with same code |

PROC MEANS | Medium | Creates Result & Output table by default | Output table contains 2 extra columns (_TYPE_ & _FREQ_) |

PROC SUMMARY | Medium | Creates only Results table by default | Output table contains 2 extra columns (_TYPE_ & _FREQ_) |

PROC UNIVARIATE | Medium | Many Statistics by default | Can be slow because of many extra stastics |

DATA STEP | High | Also creates cumulative mean | Hard(er) to code |

**Do you know? 5 Ways to Find the Minimum Value of a Column (by Group)**