In this article, we discuss how to calculate the Cumulative Sum and the Cumulative Sum per Group in SAS. A Cumulative Sum, also known as the Running Total, is a sum of items that changes each time you add a new item. We show two examples and support them with SAS code and images.
Calculate the Cumulative Sum
First, we discuss how to create a new column that contains the cumulative sum. In our example below, we have a small dataset of four rows. Each row contains the revenue of a specific day. We want to create a new column with the total revenue until that day.
By default, the first value of the cumulative sum is equal to the value in the first row. So, in our example, both revenue and cumulative revenue are 100 for January, 1st 2020. In SAS, you can use an IF statement and the _N_ variable to check if you are at the first row. This special variable _N_ stores the row number of the row that is currently being processed by SAS.
After the first row, we need to add the value of the revenue column in the current row to the value of the cumulative revenue column of the previous row. As you know, SAS reads and processes data row by row. When a row has been processed, SAS “forgets” everything about this row. So, in a normal situation, SAS won’t “remember” the value of the previous cumulative revenue, and hence the new cumulative revenue can’t be calculated. However, you can use the RETAIN statement to let SAS “remember” information about the previous row when processing the current row. To do so we write retain cumulative_sum.
data work.revenue; input date :date9. revenue; format date date9.; datalines; 01jan2020 100 02jan2020 120 03jan2020 80 04jan2020 50 ; run; data work.cum_sum; set work.revenue; retain cumulative_revenue; if _n_ = 1 then cumulative_revenue = revenue; else cumulative_revenue = cumulative_revenue + revenue; run;
Calculate the Cumulative Sum per Group
In this section, we demonstrate how to calculate the cumulative sum per group in SAS.
In this example, we use a small dataset with revenue data. Unlike the previous example, this dataset has 2 groups (A and B). For each group, we want to calculate the cumulative revenue. We do this in 2 steps.
STEP 1: Order Your Dataset
Since SAS processes data row by row and the RETAIN statement only remembers the value of the previous row, we need to order our dataset first before we can calculate the cumulative sum per group. To order our dataset we use the PROC SORT procedure. We order the data in ascending order by Group and Date.
data work.revenue_by_group; input group $ date :date9. revenue; format date date9.; datalines; A 01jan2020 100 B 01jan2020 50 A 02jan2020 120 B 02jan2020 60 A 03jan2020 80 B 03jan2020 60 A 04jan2020 50 B 04jan2020 100 ; run; proc sort data=work.revenue_by_group out=work.revenue_by_group_srt; by group date; run;
STEP 2: Calculate the Cumulative Sum by Group
Now that we have ordered the dataset by Group, we can calculate the cumulative sum. Like the previous example, we use the RETAIN statement and IF statement. However, we don’t use the _N_ variable to determine which row SAS is currently processing.
Since we want to calculate the cumulative sum per group, we need to know the first row per group, not the first row of the complete dataset. For this reason, we use the FIRST.variable. This variable is 1 if SAS processes the first row of a group, and 0 otherwise. If SAS encounters the first row of a new group, the cumulative_revenue variable is reset.
data work.cum_sum_by_group; set work.revenue_by_group_srt; by group; retain cumulative_revenue; if first.group then cumulative_revenue = revenue; else cumulative_revenue = cumulative_revenue + revenue; run;