Besides the mean and the mode, the median is the third measure of central tendency in statistics. Because this statistic is so important in many analyses, we dedicate this article to how you calculate the median in SAS.
Basically, there are 3 ways to calculate the median in SAS, namely with PROC MEANS, PROC UNIVARIATE, and PROC SQL. Depending on your knowledge and SAS skills, you can use one of them. PROC MEANS and PROC UNIVARIATE are perfect if you want to know other statistics as well. PROC SQL is convenient if you calculate the median as part of a data processing step.
Besides demonstrating the basics of these 3 methods, we also show how to calculate the median per group, the weighted median, and how to save the median in an output dataset. For the examples in this article, we use the famous Iris Flower dataset from 1936.
Calculate the Median with PROC MEANS
The first method to calculate the median in SAS is with the PROC MEANS procedure.
PROC MEANS is a SAS Base procedure to quickly analyze the numeric variables in a dataset. By default, it shows for any given variable the number of observations, its mean, its standard deviation, its minimum, and its maximum. The code to run PROC MEANS is straightforward to read and easy to remember.
proc means data=sashelp.iris; var SepalLength; run;
As the image above shows, PROC MEANS doesn’t calculate the median by default. However, it is possible by adding an extra option.
These are the steps to calculate the median in SAS:
- Start the PROC MEANS procedure
You begin the PROC MEANS procedure with the PROC MEANS statement.
- Define the input dataset
You define the input dataset with the data=-option followed by the name of your dataset. For example, sashelp.iris.
- Let SAS know you want to calculate the median
To calculate the median with PROC MEANS, you add the MEDIAN option to the PROC MEANS statement. This is the most important step if you want to calculate the median with PROC MEANS.
- Define the relevant variable(s)
You specify the variable(s) of which you want to know the median with the VAR statement. This statement starts with the VAR keyword followed by the variable(s) you are interested in.
- Execute the PROC MEANS procedure
You finish and run the PROC MEANS procedure with the RUN statement.
When you use the steps above, SAS calculates the median while ignoring missing values.
The example below shows how we calculate the median of the variable SepalLength.
proc means data=sashelp.iris median; var SepalLength; run;
Note that, if you use the MEDIAN option, PROC MEANS only calculates the median. Therefore, if you are interested in other descriptive statistics, such as the mean or the mode, you need to add extra options to the PROC MEANS statement.
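As a minimal sketch of this, assume you want the number of observations, the mean, the median, and the mode in one report. N, MEAN, MEDIAN, and MODE are statistic keywords of PROC MEANS; the selection itself is only an illustration.

```sas
/* Request several statistics at once by listing their keywords */
proc means data=sashelp.iris n mean median mode;
    var SepalLength;
run;
```

The report then contains one column per requested statistic, in the order of the keywords.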
Calculate the Median of Multiple Variables
Above we have shown how to find the median of one variable. However, you can also use PROC MEANS to calculate the median of multiple variables in a single step. Moreover, PROC MEANS presents the results in a single table, so you can easily compare them.
You calculate the median of multiple variables with PROC MEANS by using the VAR statement. The statement starts with the VAR keyword followed by the names of the variables you are interested in. The variable names must be separated by a blank and the statement must be finished with a semicolon.
The SAS code below shows how we calculate the median of four variables.
proc means data=sashelp.iris median; var SepalLength SepalWidth PetalLength PetalWidth; run;
The report that PROC MEANS creates shows the median of the different variables in a neat way. Besides the name of the variable and the median, it also shows the variable labels.
Additionally, you can add the MAXDEC=-option to the PROC MEANS statement to limit the number of decimals SAS shows in the report. For example, below we use MAXDEC=1 to show just one decimal.
proc means data=sashelp.iris median maxdec=1; var SepalLength SepalWidth PetalLength PetalWidth; run;
Calculate the Median per Group
Besides calculating the median of one or more variables, you can also use PROC MEANS to find the median per group. That means that SAS calculates the median for each category within a variable.
To calculate the median per group in SAS you need to add the CLASS statement to the PROC MEANS procedure. This statement starts with the CLASS keyword followed by the variable that defines the groups. The statement ends with a semicolon.
Additionally, by adding more than one variable to the CLASS statement, SAS calculates the median per subgroup.
In the example below we calculate the median of the variable SepalLength for each type of Iris Species. Besides the name of the species and the median, SAS also shows you the number of observations (N Obs) in each group.
proc means data=sashelp.iris median; class Species; var SepalLength; run;
As you might have expected, you can also calculate the median of multiple variables per group. Below we show an example.
proc means data=sashelp.iris median; class Species; var SepalLength SepalWidth PetalLength PetalWidth; run;
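The Iris dataset has only one grouping variable, so it can't illustrate the median per subgroup mentioned earlier. The sketch below therefore switches to the sashelp.cars sample dataset (an assumption on our side, not part of the examples above) and uses its variables Origin, Type, and MSRP.

```sas
/* Two CLASS variables: one median per combination of Origin and Type */
proc means data=sashelp.cars median maxdec=0;
    class Origin Type;
    var MSRP;
run;
```

Each row of the report corresponds to one combination of the two CLASS variables that occurs in the data.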
Calculate the Weighted Median
So far, while demonstrating how to calculate the median (per group), we assumed that all observations have the same weight. However, this isn’t always the case. If your observations have different weights, then you should calculate the weighted median.
You can use PROC MEANS to calculate the weighted median of a variable. You do so by adding the WEIGHT statement to your code. This statement starts with the WEIGHT keyword followed by the variable that determines the weights (i.e., importance) of each observation.
In the example below we show how to use the WEIGHT statement and calculate the weighted median.
proc means data=sashelp.iris median; var SepalLength; weight PetalLength; run;
Although we have used the WEIGHT statement, the report doesn’t mention that we’ve calculated the weighted median.
You can also use the WEIGHT statement while calculating other descriptive statistics, such as the weighted average.
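For instance, the following sketch requests both the weighted mean and the weighted median in one step. Using PetalLength as the weight variable is only for illustration, as in the example above.

```sas
/* The WEIGHT statement applies to every requested statistic */
proc means data=sashelp.iris mean median;
    var SepalLength;
    weight PetalLength;
run;
```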
Create an Output Dataset with the Median
By default, PROC MEANS only creates a report with the median. However, sometimes it is useful to save the median in a SAS dataset for further use. Fortunately, this is possible.
You create an output dataset with the results of the PROC MEANS procedure with the OUTPUT statement. The statement starts with the OUTPUT keyword followed by the OUT=-option, the name of the output dataset, the MEDIAN=-option, and the name of the column that will contain the value of the median.
You can use the OUT=-option to save the dataset in either the WORK library or in a permanent library. As always, the name of the dataset can’t be longer than 32 characters and must start with an alphabetic character or an underscore.
With the MEDIAN=-option you specify the name of the column in the output dataset that will contain the median. You need this option to save the median, and the name can’t exceed 32 characters. Although not recommended, the name might contain blanks if you set the VALIDVARNAME=ANY system option and write the name as a name literal.
The example below contains the SAS code to create an output dataset of the PROC MEANS procedure.
proc means data=sashelp.iris median; var SepalLength; output out=work.median_iris median = median_SepalLength; run;
As the image below shows, besides the median, the output dataset also contains two extra columns, namely _TYPE_ and _FREQ_. Normally, you don’t need these variables. Therefore, we recommend using the DROP=-option to remove them.
If you use the PROC MEANS procedure to calculate the median of multiple variables and you want to save them in an output dataset, then the OUTPUT statement is slightly different, especially the MEDIAN=-option.
To create a SAS dataset with the median of multiple variables, you need to modify the MEDIAN=-option. In this case, the option starts with the MEDIAN keyword followed by the names of the variables you are analyzing in parentheses. Then, after the equal sign, you specify the column names in the output dataset that will contain the medians.
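A minimal sketch of the DROP=-option, applied as a dataset option to the output dataset in the OUT=-option:

```sas
/* Remove the automatic variables _TYPE_ and _FREQ_ from the output dataset */
proc means data=sashelp.iris median;
    var SepalLength;
    output out=work.median_iris (drop=_type_ _freq_)
        median = median_SepalLength;
run;
```

The resulting dataset then contains only the column median_SepalLength.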
In the SAS code below we show how to save the median of two variables (SepalLength and SepalWidth) into an output dataset.
proc means data=sashelp.iris median;
    var SepalLength SepalWidth;
    output out=work.median_iris
        median(SepalLength SepalWidth) = median_SepalLength median_SepalWidth;
run;
You can also store the median per group of one or more variables into a dataset. To do so, you only need the standard CLASS statement to define the groups. You don’t need additional code in the OUTPUT statement.
proc means data=sashelp.iris median;
    class Species;
    var SepalLength SepalWidth;
    output out=work.median_iris
        median(SepalLength SepalWidth) = median_SepalLength median_SepalWidth;
run;
As the image above shows, the output dataset has, besides the median per group, one extra row. This row contains the overall median per variable (i.e., ignoring the groups). This might be useful, but in most cases it isn’t. Therefore, you can use the NWAY option in the PROC MEANS statement to only calculate the median per group.
In the next example, we show how to use the NWAY option to calculate only the median per group and save the results in an output dataset.
proc means data=sashelp.iris median nway;
    class Species;
    var SepalLength SepalWidth;
    output out=work.median_iris
        median(SepalLength SepalWidth) = median_SepalLength median_SepalWidth;
run;
Calculate the Median with PROC UNIVARIATE
The second method to find the median in SAS is with the PROC UNIVARIATE procedure.
The PROC UNIVARIATE procedure is a SAS Base procedure that lets you assess the distribution of your data. Although it’s a more advanced procedure than PROC MEANS, you can still use it to calculate the median of a variable. In contrast to PROC MEANS, the PROC UNIVARIATE procedure shows you the median by default. In other words, you don’t need extra options.
To calculate the median with the PROC UNIVARIATE procedure, you only need to specify the input dataset and the variable of interest. You do this with the DATA=-option and the VAR statement. The PROC UNIVARIATE procedure ignores missing values while calculating the median.
The SAS code below shows an example of how to find the median with PROC UNIVARIATE.
proc univariate data=sashelp.iris; var SepalLength; run;
By default, PROC UNIVARIATE creates a report with many statistics. Under the “Basic Statistical Measures” section you find the median of your variable. The image below shows part of the report.
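If you are only interested in that section, one option is to restrict the report with an ODS SELECT statement. The sketch below assumes the ODS table name BasicMeasures, which is the name SAS uses for the “Basic Statistical Measures” table.

```sas
/* Show only the Basic Statistical Measures table of the next procedure step */
ods select BasicMeasures;
proc univariate data=sashelp.iris;
    var SepalLength;
run;
```

The ODS SELECT statement only affects the procedure step that follows it, so later reports are shown in full again.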
Calculate the Median of Multiple Variables
Similar to PROC MEANS, you can also easily calculate the median of multiple variables with PROC UNIVARIATE.
The VAR statement lets you specify for which variables you want to calculate the median. The statement starts with the VAR keyword followed by the variable names. The variable names must be separated by a blank, and the statement ends with a semicolon.
Below we show the SAS code to find the median of multiple variables with PROC UNIVARIATE.
proc univariate data=sashelp.iris; var SepalLength SepalWidth; run;
In contrast to PROC MEANS, the PROC UNIVARIATE procedure creates a report for each variable. Therefore, PROC UNIVARIATE is less suited to compare the medians. See the image below.
Calculate the Median per Group
You can also use PROC UNIVARIATE to calculate the median per group. To do so, you need to add the CLASS statement to your code.
The CLASS statement specifies the classification variable(s) in order to divide your data into different groups. This statement starts with the CLASS keyword followed by one or more variables that define the groups. If you use more than one variable to define the groups, then the variables must be separated by a blank.
In the example below, we use the CLASS statement to calculate the median of the variable SepalLength per Species.
proc univariate data=sashelp.iris; class Species; var SepalLength; run;
The following image shows how PROC UNIVARIATE presents the median per group. Note that PROC UNIVARIATE creates a report per combination of variable and group. This makes the comparison of the median for the different variables and groups more difficult. Moreover, the report can become very long.
Similarly, you can also calculate the median for multiple variables per group. See the SAS code below.
proc univariate data=sashelp.iris; class Species; var SepalLength SepalWidth; run;
Calculate the Weighted Median
Similar to PROC MEANS, you can also use PROC UNIVARIATE to calculate the weighted median of a variable. Moreover, the syntax of both methods is identical. That means that you can find the weighted median in PROC UNIVARIATE by adding the WEIGHT statement.
The SAS example below demonstrates how to use the WEIGHT statement to calculate the weighted median in SAS.
proc univariate data=sashelp.iris; var SepalLength; weight PetalLength; run;
In contrast to PROC MEANS, the PROC UNIVARIATE procedure does explicitly show that you have calculated the weighted median. Furthermore, it displays the name of the variable that it has used as the weighting variable.
Create an Output Dataset with the Median
By default, PROC UNIVARIATE only generates a report. If you want to save the outcome of the procedure (e.g., the median) in a SAS table, then you need to add additional code.
You can create an output dataset with the results of the PROC UNIVARIATE procedure with the OUTPUT statement. The statement starts with the OUTPUT keyword, followed by the OUT=-option to define the name of the output dataset. Next, you need to add an extra option to specify which statistics (e.g., the median) you want to save in the table.
You can save the median in the output dataset with the MEDIAN=-option. With this option, you can define the name of the column that will contain the median value.
For example, with the following SAS code, we create the output dataset work.median_iris and save the median in the column median_SepalLength.
proc univariate data=sashelp.iris; var SepalLength; output out=work.median_iris median = median_SepalLength; run;
You can also create a dataset with the median of more than one variable. To do so, you use the MEDIAN=-option followed by the names of the output columns that will contain the medians, in the same order as the variables in the VAR statement.
Make sure that the number of variables in the VAR statement equals the number of variables in the MEDIAN=-option. For example:
proc univariate data=sashelp.iris;
    var SepalLength SepalWidth;
    output out=work.median_iris
        median = median_SepalLength median_SepalWidth;
run;
If you have calculated the median per group, you can use the OUTPUT statement to save the results in a SAS dataset. Since the CLASS statement doesn’t affect the OUTPUT statement, you can use the same OUTPUT statement as in the SAS code above.
proc univariate data=sashelp.iris;
    class Species;
    var SepalLength SepalWidth;
    output out=work.median_iris
        median = median_SepalLength median_SepalWidth;
run;
Calculate the Median with PROC SQL
The third method to calculate the median in SAS is with the PROC SQL procedure.
Since SAS 9.4, PROC SQL has a summary function to calculate the median of a column. Therefore, calculating the median with PROC SQL is straightforward.
You calculate the median of a variable in PROC SQL with the MEDIAN()-function. The function takes a numeric variable as its only argument and returns its median. The MEDIAN()-function ignores missing values.
In the SAS code below, we use the MEDIAN()-function to find the median of the variable SepalLength.
proc sql; select median(SepalLength) as median_SepalLength from sashelp.iris; quit;
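To see how the MEDIAN()-function treats missing values, you can try it on a small test table. The dataset work.demo_missing and its variable x are hypothetical names for this sketch, and we assume a SAS release in which MEDIAN() works as a summary function in PROC SQL.

```sas
/* Hypothetical test data: three known values and one missing value */
data work.demo_missing;
    input x;
    datalines;
1
.
3
10
;
run;

/* The missing value is ignored, so the median is taken over 1, 3, and 10 */
proc sql;
    select median(x) as median_x
    from work.demo_missing;
quit;
```

Under these assumptions, the query returns 3, the middle value of the three non-missing observations.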
Calculate the Median of Multiple Variables and/or by Group
You can easily calculate the median of multiple variables with PROC SQL in SAS. For each variable of which you want to know the median, you add an extra line of code with the MEDIAN()-function. For example:
proc sql;
    select median(SepalLength) as median_SepalLength,
           median(SepalWidth) as median_SepalWidth
    from sashelp.iris;
quit;
Also, you can use PROC SQL to find the median per group. To do so, you need to:
- Add the variable that defines the groups to the SELECT clause.
- Add the GROUP BY clause to your code. The GROUP BY keywords are followed by the variable that defines the groups.
The SAS code below shows an example of how to calculate the median per Species.
proc sql;
    select Species,
           median(SepalLength) as median_SepalLength,
           median(SepalWidth) as median_SepalWidth
    from sashelp.iris
    group by Species;
quit;
Create an Output Table with the Median
In contrast to PROC MEANS and PROC UNIVARIATE, creating an output dataset with the median with PROC SQL is easy and doesn’t require much additional code.
You can create a table with the CREATE TABLE clause. This clause starts with the CREATE TABLE keywords followed by the name of the output table and the keyword AS.
For example:
proc sql;
    create table work.medians_iris as
    select Species,
           median(SepalLength) as median_SepalLength,
           median(SepalWidth) as median_SepalWidth
    from sashelp.iris
    group by Species;
quit;
Note that, as opposed to PROC MEANS and PROC UNIVARIATE, a single PROC SQL query creates either a report or an output table. It isn’t possible to create both with the same query.
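If you want both, one possible approach is to add a second query to the same PROC SQL step that prints the table you have just created. The sketch below reuses the table name work.medians_iris from the example above.

```sas
proc sql;
    /* First query: store the medians per group in a table */
    create table work.medians_iris as
    select Species,
           median(SepalLength) as median_SepalLength,
           median(SepalWidth) as median_SepalWidth
    from sashelp.iris
    group by Species;

    /* Second query: print the stored table as a report */
    select *
    from work.medians_iris;
quit;
```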