When you work with data, there will be opportunities that you need to know the frequency of a given item. In this article, we discuss how to create frequency tables in SAS with PROC FREQ. We cover the basic syntax and its most frequently used options, such as how to order a frequency table and how to create an output data set with frequencies.
Throughout this article, we will use the Baseball data set from the SASHELP library. This data set contains information about the performance and salary of Major League Baseball players from the 1986 season. For our examples, we will mainly use the Team and Salary columns.
proc print data=sashelp.baseball; run;
PROC FREQ in SAS
The syntax of the FREQ procedure is relatively straightforward. The only obligatory argument is the SAS data set you want to use to create frequency tables. By default, SAS creates a frequency table for each variable of the data set. The code and images below show the results of this default behavior.
proc freq data=sashelp.baseball; run;
The SAS code above creates frequency tables with 5 columns. Namely, the frequency of each value, the percentage of each value, and the cumulative frequency and percentage. Later on in this article, we show how to suppress the cumulative frequency and cumulative percentage.
Frequency Tables for Specific Variables
The Baseball data set contains 24 columns. However, we are only interested in the frequencies of the values in the columns Team and Salary. With the tables statement, you can specify the columns for which you want a frequency table. For example, the SAS code below shows how to create two frequency tables.
proc freq data=sashelp.baseball; tables Team Salary; run;
Order Frequency Tables in SAS
By default, SAS sorts the frequency tables in an alphabetical (character columns) or ascending order (numerical columns). However, it might be useful to order the frequency table by the frequencies. You can use the order option to do this. For instance, with the code below, we order the frequency table in a descending order based on the frequency of each team in the Baseball data set.
proc freq data=sashelp.baseball order=freq; tables Team; run;
If you use the order keyword and the tables statement with more that one variable, then all frequency tables are sorted in a descending order.
Handle Missing Values in Frequency Tables
If you work with data, missing values are common. Depending on your needs, you can treat missing values differently with PROC FREQ. By default, SAS ignores the missing values. That is to say, the values aren’t shown in the output table as a separate value and aren’t taken into account to calculate the (cumulative) frequency and (cumulative) percentage. Only at the bottom of the frequency table, the number of missing value is mentioned.
proc freq data=sashelp.baseball order=freq; tables salary; run;
Unlike the previous example, you can use the missing option to include the missing values. The missing option treats the missing values as a separate value and includes them in the (cumulative) percentage column. The SAS code below demonstrates how to use this option.
proc freq data=sashelp.baseball order=freq; tables salary / missing; run;
Finally, you can use the missprint option to only show the missing values in the frequency table. This option won’t include the missing values in the calculation of the percentage, cumulative frequency, and cumulative percentage columns.
proc freq data=sashelp.baseball order=freq; tables salary / missprint; run;
Output Table with Frequencies
One of the most frequenctly asked questions regarding PROC FREQ is how to create a data set with the frequencies. You can simply create an output table with the out= option. For example, with the code below, we create an output table with frequencies for the Team variable and the Division variable.
proc freq data=sashelp.baseball; tables Team /out=work.team_freq; tables Division /out=work.division_freq; run;
By default, the output data set contains the three columns: the variable, the count (frequency), and the percentage. However, as shown before, the FREQ procedure can also calculate the cumulative frequency and the cumulative percentage. With the outcum option, you can add these two columns to the output data set.
proc freq data=sashelp.baseball; tables Team /out=work.team_freq outcum; run;
Create Frequencies Based on Formats
Lastly, you can create frequency tables in SAS based on a format. In the example below, we create 4 salary categories: Low, Medium, High, and Unknown. Based on these new categories we create the frequency tables.
You use the format statement to apply the format before creating the frequency tables. So, in this example, we apply the format salary_group to the variable salary. To include the Unknown group in the output we added the missprint option.
proc format; Value salary_group . = "Unknown" 1 - 500 = "Low" 501 - 1000 = "Medium" 1001 - high = "High" ; run; proc freq data=sashelp.baseball order=freq; tables salary / missprint; format salary salary_group.; run;
Select Top N Frequencies
With all the examples discussed above, we can answer the question how to create an output table with the top N frequencies. The sas code below combines the tables statement, the order option, and the out option. Once we have created a SAS data set sorted in a descending order by the frequencies, we use the a data step and the obs option to select the first N rows (in this case 5).
proc freq data=sashelp.baseball order=freq; tables Team /out=work.team_freq; run; data work.team_freq_top_5; set work.team_freq (obs=5); run;