SAS How To's

How to Count the Number of Observations per Group in SAS

In a previous article, we explained 4 methods to count the number of observations in a dataset. However, sometimes you need the number of observations per group instead of the total number of observations. So, how do you count the number of observations per group in SAS?

You can count the number of observations per group (i.e., per variable) in a SAS dataset with PROC SQL. You need the COUNT function and the GROUP BY statement to make it work. With the GROUP BY statement, you define the groups.

Besides PROC SQL, you can also use PROC FREQ and a SAS DATA Step to calculate the number of rows per group. Although these methods are more complex, they provide some advantages over PROC SQL.

In this article, we discuss 3 methods you can use to count the number of rows by one variable, as well as the number of observations by multiple variables

Do you also know? 5 Ways to Calculate the Sum per Group and Average per Group

In the examples in this article, we use the CARS dataset from the SASHELP library. This dataset contains many variables with information about cars. We will use the columns Type and Origin to demonstrate how to count the observations per group.

Here we show the 10 first rows of the CARS dataset.

proc print data=sashelp.cars (keep=Make Model Type Origin obs=10) noobs;
run;
SAS CARS dataset

Do you know? How to Select the First 10 Rows of a Dataset

Count the Number of Observations by Group

There are several methods to count the number of observations per group in SAS. Here we discuss 3 of them; PROC SQL, PROC FREQ, and a DATA Step.

Method 1: Count Observations by Group with PROC SQL

The easiest method to find the number of observations per group is with PROC SQL.

PROC SQL is a powerful SAS Base procedure that you can use to process SQL statements. So, if you have experience with SQL this will be your preferred method.

The syntax of this method consists of 6 steps:

  1. With the PROC SQL statement, you start the procedure.
  2. After the SELECT statement follows the column you want to use to group the observations by. Here we use the Type column.
  3. With the COUNT function, SAS counts the number of observations.
  4. After the FROM statement, you define the name of the input dataset.
  5. You use the GROUP BY statement to let SAS know which variable to use to group observations.
  6. With the QUIT statement, you end the procedure.
proc sql;
    select type,
        count(*) as N
    from sashelp.cars
    group by type;
quit;

By default, the output is ordered in ascending order according to the appears of the column(s) in the GROUP BY statement.

Count the number of observations per group with PROC SQL in SAS

If you want to create an output table with the number of observations, you need to add a CREATE TABLE statement to the code above.

Method 2: Count Observations by Group with PROC FREQ

The second method to count the observations per group is by using the PROC FREQ procedure.

The FREQ procedure is a SAS procedure for analyzing the count of data. You can count the observations per group with the TABLE statement. The results are ordered in alphabetical order.

The SAS code below uses PROC FREQ to count the number of observations per Type.

proc freq data=sashelp.cars;
    table type;
run;
Count the number of observations per group with PROC FREQ in SAS

As you can see, PROC FREQ doesn’t only provide the count per group, but also the percentage, the cumulative frequency, and cumulative percentage. If you want to remove these statistics, you can add the NOPERCENT and NOCUM options.

proc freq data=sashelp.cars;
    table type / nopercent nocum;
run;

Do you know? How to Sort the Output of PROC FREQ by Frequency

Method 3: Count Observations by Group with a DATA Step

The third method to calculate the number of observations per variable is with a SAS DATA Step.

This method requires more coding and is not (that) easy to understand. However, it could provide you with a running count per group.

The first step is to sort your data by the variable you want to use to group the observations. You can do this with PROC SORT.

The second step is a SAS DATA Step. Since SAS processes row by row, we create a counter to count the number of observations per group. If SAS processes the first row of a new group, the counter is set to one again.

We create the counter with the RETAIN statement. The RETAIN statement “remembers” the last value of the counter when SAS starts processing a new row.

With the IF statement and the FIRST keyword, we check if SAS is processing the first observation of a new group.

Finally, we use the IF statement and the LAST keyword to only output the last row per group to the output dataset. You can create a table with the running count by group by omitting the last IF statement.

proc sort data=sashelp.cars 
    out=work.cars_ordered;
    by type;
run;
 
data work.observations_by_group;
    set work.cars_ordered;
    by type;
 
    retain N;
 
    if first.type then N = 1;
    else N = N + 1;
 
    if last.type then output;
 
    keep type N;
run;

Here we create a running count by the Type group.

proc sort data=sashelp.cars 
    out=work.cars_ordered;
    by type;
run;
 
data work.observations_by_group;
    set work.cars_ordered;
    by type;
 
    retain N;
 
    if first.type then N = 1;
    else N = N + 1;
 
    keep make model type N;
run;
A running count per group in SAS

Count the Number of Observations by Multiple Groups

Thus far, we have demonstrated how to count the number of observations grouped by one variable. Now, we will discuss how to count the observations grouped by multiple variables.

Method 1: Count Observations by Multiple Groups with PROC SQL

You can use PROC SQL to group your data by multiple variables and count the number of observations.

You use the SELECT statement to show the names of the variables you use to group your data. With the GROUP BY statement, you let SAS actually know which variables to use to group your data.

Here we use two variables to group the CARS dataset (Type and Origin). However, PROC SQL doesn’t have a limit to the number of variables to group your data.

proc sql;
    select type,
        origin,
        count(*) as N
    from sashelp.cars
    group by type, origin;
quit;
Count the number of observations per multiple groups with PROC SQL in SAS

Method 2: Count Observations by Multiple Groups with PROC FREQ

You can also use PROC FREQ to count the number of observations by two variables.

After the PROC FREQ statement, you use the DATA option to define the name of your input dataset. With the TABLE statement, you specify the two variables you want to use to group your data. It is important to place an asterisk between the two variables.

With the code below we count the number of observations grouped by the variables Type and Origin.

proc freq data=sashelp.cars;
    table type*origin;
run;
Count the number of observations per multiple groups with PROC FREQ in SAS

By default, SAS creates a report with the Frequency, Percentage, Cumulative Frequency, and Cumulative Percentage. If you are only interested in the Frequency, then you need to add the NOPERCENT, NOROW, and NOCOL options.

proc freq data=sashelp.cars;
    table type*origin / nopercent norow nocol;
run;

Do you know? How to Create an Output Table in PROC FREQ

Method 3: Count Observations by Multiple Groups with a DATA Step

The third method to count the number of observations by multiple variables is with a SAS DATA Step. This method is more complex but can provide you with a running count per group.

Firstly, you sort your data by the variables you want to use to group your data.

Secondly, you use a SAS DATA Step to create a counter. This counter determines the position of each observation in the group. Consequently, the last observation per group contains the total number of observations per group.

Lastly, you can use an IF statement and the LAST keyword to only output the last row per group.

proc sort data=sashelp.cars 
    out=work.cars_ordered;
    by type origin;
run;
 
data work.observations_by_group;
    set work.cars_ordered;
    by type origin;
 
    retain N;
 
    if first.origin then N = 1;
    else N = N + 1;
 
    if last.origin then output;
 
    keep type origin N;
run;

Do you want to learn more useful How To’s?