Uncategorized

How to Calculate the Cumulative Percentage by Group in SAS

A cumulative percentage, or running total of the percentage, divides the cumulative frequency by the total number of observations. In this article, we discuss how to calculate the cumulative percentage (per group) in SAS.

You can calculate the cumulative percentage in SAS with the frequency procedure (PROC FREQ). This procedure generates a report with frequencies and percentages. Optionally, it can create an output dataset.

In this article, we discuss how to calculate the cumulative percentage (per group), and how to create an output dataset.

Calculate the Cumulative Percentage in SAS

The SAS code below generates a sample dataset with test scores. We will use this dataset to demonstrate how to calculate the cumulative percentage with the PROC FREQ procedure.

data work.test_scores;
    input test_score;
    datalines;
9
6
8
8
5
9
8
6
6
8
;
run;
 
proc print data=work.test_scores noobs;
run;

This is how you calculate the cumulative percentage in SAS in 3 simple steps:

  1. The FREQUENCY Statement

    This statement starts the frequency procedure. With the DATA=-option you can provide the input dataset. If you don’t specify the input dataset, SAS will use the last create dataset.

  2. The TABLE Statement

    With this statement, you specify the variable of which you want to calculate the (cumulative) frequency and (cumulative) percentage. Instead of the TABLE statement, you can also use the TABLES statement.

  3. The RUN Statement

    This statement finishes and executes the PROC FREQ procedure.

proc freq data=work.test_scores;
    table test_score;
run;

By running the example code above, SAS generates a “FREQ Procedure” report. This report summarizes for each test_score the:

  • Frequency
  • Percentage
  • Cumulative Frequency
  • Cumulative Percentage

By default, this report is ascendingly ordered the variable specified in the TABLE statement. However, you can use the ORDER=-option to change this behavior.

Although it’s not the case in this example, the PROC FREQ procedure ignores missing values. However, if you want to take missing values into account for calculating frequencies and percentages, you can use the MISSING option.

Calculate the Cumulative Percentage by Group in SAS

The following sample dataset contains the test scores of two groups (A and B). We will use this table to demonstrate how to calculate the cumulative percentage per group.

data work.test_scores_by_group;
    input group $ test_score;
    datalines;
A 9
A 6
A 8
A 8
A 5
A 9
A 8
A 6
A 6
A 8
B 2
B 2
B 4
B 1
B 2
B 1
B 4
B 2
B 1
B 3
;
run;
 
proc print data=work.test_scores_by_group noobs;
run;

You can use the PROC FREQ procedure in SAS to calculate the cumulative percentage per group. To do so, you need to provide 2 statements:

  • The TABLE statement to specify the variable you want to analyse.
  • The BY statement to specify the variable that defines the groups.

You can use the DATA=-option to define the input dataset. If you don’t use this option, SAS will use the last dataset that has been created.

proc freq data=work.test_scores_by_group;
    table test_score;
    by group;
run;

If you execute the SAS code above, SAS creates a frequency report for each group in your data. In the example below, SAS creates one report for group A and one for group B. The report contains the (cumulative) frequency and (cumulative) percentage.

Create a SAS Dataset with the Cumulative Percentage

By default, the PROC FREQ procedure only generates a report. You can use the OUT=-option to create a SAS dataset with the results of the procedure.

The OUT=-option is part of the TABLE statement and specifies the name of the output dataset. By default, the output dataset only contains the frequency and percentage. The OUTCUM keyword adds the cumulative frequency and cumulative percentage to the dataset, too.

proc freq data=work.test_scores;
    table test_score / out=work.cumulative_pct outcum;
run;
 
proc print data=work.cumulative_pct noobs;
run;

By default, the PROC FREQ procedure always generates a report. If you only want to create an output table, you can use the NOPRINT option. You need to place this option in the FREQ statement after the DATA=-option.

proc freq data=work.test_scores_by_group noprint;
    table test_score / out=work.cumulative_pct_by_group outcum;
    by group;
run;
 
proc print data=work.cumulative_pct_by_group noobs;
run;

Do you know? How to Change Column Names

Calculate the Cumulative Percentage of a Summed Column

In the examples above, we have calculated the cumulative percentage while counting observations. However, you can also calculate the cumulative percentage for a summed column. Below, we show how to do this.

With the following SAS code we create a dataset that contains the daily revenue for four days.

data work.revenue;
    input date :date9. revenue;
    format date date9.;
    datalines;
01jan2020 100
02jan2020 120
03jan2020 80
04jan2020 50
;
run;
 
proc print data=work.revenue noobs;
run;

To this table we want to add two new columns:

  • The percentage that the daily revenue represents compared to the total revenue.
  • The cumulative percentage of revenue of previous days.

Calculating the cumulative percentage of the revenue column is a three-step process. First, you need to calculate the total revenue. Then, you calculate the percentage that each daily revenue represents compared to the total revenue. Finally, you can calculate the cumulative percentage.

In the example below, we calculate the column sum and the percentage of each day.

proc sql;
    create table work.revenue_pct as
        select date,
            revenue,
            sum(revenue) as total_revenue,
            revenue / sum(revenue) as revenue_pct
        from work.revenue;
quit;
 
proc print data=work.revenue_pct noobs;
run;

Do you know? 5 Ways to Calculate the Column Sum

Now that we’ve calculated the percentage per day, we can create a new column with the cumulative percentage. We will do this with the RETAIN statement and an IF statement.

For the first row in our dataset, the cumulative percentage is equal to the row percentage. You can check if SAS is processing the first row with the IF statement and the special variable _N_.

In all other situations, the cumulative percentage is the sum of the row percentage and the previous cumulative percentage. You can use the RETAIN statement to help SAS “remember” the previous value of the cumulative percentage.

data work.cum_pct;
    set work.revenue_pct;
 
    retain cumulative_pct;
 
    if _n_ = 1 then cumulative_pct = revenue_pct;
    else cumulative_pct = cumulative_pct + revenue_pct;
run;
 
proc print data=work.cum_pct noobs;
run;

Calculate the Cumulative Percentage by Group of a Summed Column

You can also calculate the cumulative percentage of a summed column for two or more groups. The dataset below contains the revenue of four days for two groups (A and B).

data work.revenue_by_group;
    input group $ date :date9. revenue;
    format date date9.;
    datalines;
A 01jan2020 100
B 01jan2020 50
A 02jan2020 120
B 02jan2020 60
A 03jan2020 80
B 03jan2020 60
A 04jan2020 50
B 04jan2020 100
;
run;
 
proc print data=work.revenue_by_group noobs;
run;

The first step to calculate the cumulative percentage per group is to calculate the row percentages. For this purpose, we need a new column (total_revenue) that calculates the sum of the revenues per group. You can do this with PROC SQL and the SUM option.

Once you have the total revenue per group, you can calculate the percentage of each row.

proc sql;
    create table work.revenue_by_group_pct as
        select group,
            date,
            revenue,
            sum(revenue) as total_revenue,
            revenue / sum(revenue) as revenue_pct
        from work.revenue_by_group
        group by group
        order by group, date;
quit;
 
proc print data=work.revenue_by_group_pct noobs;
run;

Finally, you can create a new column with the cumulative percentage per group. To do so, you need the RETAIN statement and the IF statement.

If SAS is processing the first row of a group, then the cumulative percentage is equal to the row percentage. You can use the FIRST keyword to check this.

In all other cases, the cumulative percentage is the sum of the row percentage and the previous cumulative percentage. Normally, SAS “forgets” everything about previously processed rows. However, with the RETAIN statement you can help SAS “remember” the previous value of the cumulative percentage.

data work.cum_pct_by_group;
    set work.revenue_by_group_pct;
    by group;
 
    retain cumulative_pct;
 
    if first.group then cumulative_pct = revenue_pct;
    else cumulative_pct = cumulative_pct + revenue_pct;
run;
 
proc print data=work.cum_pct_by_group noobs;
run;