If you work with data, you can calculate the highest value per column or row. In a previous article, we discussed how to find the maximum value per column. Now, we will explain how to find in SAS the maximum value per row across multiple columns.
You can find the highest value of a row in SAS with the MAX function. This function returns the maximum value based on its inputs. As input, you can use column names, constants, and SAS expressions. The MAX function works in a SAS DATA Step.
In this article, we demonstrate how to use the MAX function and show some examples. We also use arrays to select multiple columns and make your code more robust and efficient.
Do you know? How to Find the Minimum Value of a Row
Contents
How to Find the Maximum Value of a Row with Two Columns in SAS
In our first example, we show how to find the maximum value of a row with two columns.
First, we create a SAS dataset that has two columns (first_value and second_value) and 6 rows. For each row, we want to find the maximum value of the two columns. Note that this dataset has missing values.
Do you know? 5 Ways to Replace Missing Values
/* SAMPLE DATA 1 */ data work.my_ds1; input first_value second_value; datalines; 1 3 6 4 4 -2 5 9 7 . . -6 ; run; proc print data=work.my_ds1 noobs; run;
To find the highest value in a row with two columns you can use the MAX function in a DATA Step. The MAX function takes as arguments the names of the columns and returns the highest values. The column names must be separated by a comma. Missing values are ignored.
In the example below, we create a new column highest_value that will be assigned the maximum value of the columns first_value and second_value.
/* Maximum Value of 2 Columns */ data work.highest_value; set work.my_ds1; highest_value = max(first_value, second_value); run; proc print data=work.highest_value noobs; run;
As you can see in the image above, the MAX function returns the highest value of the two columns and ignores missing values.
How to Find the Maximum Value in a Row with Multiple Columns in SAS
In the example above, we found the maximum value of two columns. However, you can also use the MAX function to find the highest value in a row with multiple columns. How to do that will be shown here.
To demonstrate how to find the maximum value across multiple columns we create a new sample dataset. This dataset has four columns (first_value, second_value, third_value, and fourth_value) and 6 rows.
Do you know? How to Create a SAS Data Set from Scratch
/* SAMPLE DATA 2 */ data work.my_ds2; input first_value second_value third_value fourth_value; datalines; 1 3 4 2 6 4 1 . 4 -2 3 0 5 9 -3 -1 7 . 5 -2 . -6 8 3 ; run; proc print data=work.my_ds2 noobs; run;
We can apply the same strategy as in the previous section to find the highest value. That is, explicitly specifying the names of all the columns you want to take into account. With the SAS code below, we show how to do that.
/* Minimum Value of Multiple Columns */ data work.highest_value; set work.my_ds2; highest_value = max(first_value, second_value, third_value, fourth_value); run; proc print data=work.highest_value noobs; run;
In the example above, we found the maximum value of a row with four columns. We did this by explicitly specifying the column names in the MAX function. This works fine if you want to compare a few columns, but it can be a tedious task if you have many columns. Fortunately, there is a way to do this more efficiently.
You can use a SAS array to store all column names and use it as an argument of the MAX function. The MAX function will compare all the columns in the array and return the highest value.
An extra advantage of the array is that you don’t have to specify all columns explicitly that will be part of it. If you want to specify a range of columns (based on their position), you only need to specify the first and last column name separated by a double dash (i.e., –).
In the example below, we first define an array called columns. This array contains all columns between first_value and fourth_value (both included). After that, we use the MAX function to find the highest value across all columns in our array columns.
data work.highest_value; set work.my_ds2; array columns first_value -- fourth_value; highest_value = max(of columns[*]); run; proc print data=work.highest_value noobs; run;
As you can see, we found the highest value across four columns without the need of explicitly specifying all column names. We’ve used an array instead. This makes our code more efficient.
Do you know? How to Count the Number of Missing Values
How to Find the Maximum Value Across Multiple Columns with the Same Name
In the previous section, we demonstrated how to use an array to find the highest value of a row with multiple columns. We used the first and last column to specify the range of the columns. However, how do you calculate the lowest value of columns with a similar name?
To demonstrate this we create a new dataset with 6 columns and 6 rows. The columns are called value1, value2, value3, value4, value5, and value6. We want to find the highest value of these columns for each row.
/* SAMPLE DATA 3 */ data work.my_ds3; input value1 - value6; datalines; 1 3 4 2 1 3 6 4 1 . 0 2 4 -2 3 0 -1 7 5 9 -3 -1 2 6 7 . 5 -2 -4 5 . -6 8 3 8 4 ; run; proc print data=work.my_ds3 noobs; run;
Again, we make our code more efficient by using an array. This array will also contain a range of columns. However, this array contains columns based on their names. We use a single dash to create an array that contains all columns that are “alphabetically between” the columns value1 and value6. So, in this case, the position of these columns in the dataset doesn’t matter.
In the example below we use this method.
data work.highest_value; set work.my_ds3; array columns value1 - value6; highest_value = max(of columns[*]); run; proc print data=work.highest_value noobs; run;
This method is very useful when you want to find the highest value across multiple columns that have similar names. For example, all columns that start with the same prefix.
Do you know? How to Rename Variable with the Same Prefix
How to Find the Maximum Value of All Numeric Columns
Until now, we have explicitly specified the column names or the range of columns of which we want to find the maximum. But, how do you find the maximum value across all numeric columns in SAS?
You use the MAX function and the _numeric_ keyword to find the highest value of all numeric columns in a row. First, you define the name of the new variable follwed by an equal sign. Then, you start the MAX function. Finally, your use “of _numeric_” as the argument of the MAX function.
We create a sample dataset to demonstrate how to do this. This dataset has 5 columns and 6 rows. As you can see, the column names are completely different.
/* SAMPLE DATA 4 */ data work.my_ds4; input ColA VarB Age Salary X; datalines; 1 3 4 2 1 6 4 1 . 0 4 -2 3 0 -1 5 9 -3 -1 2 7 . 5 -2 -4 . -6 8 3 8 ; run; proc print data=work.my_ds4 noobs; run;
Instead of creating an array with a range of columns (with the double dash), we can use the special keyword _numeric_. In the SAS code below, we use this keyword as the argument of the MAX function to find the maximum value of all numeric columns in a row.
data work.highest_value; set work.my_ds4; highest_value = max(of _numeric_); run; proc print data=work.highest_value noobs; run;
As you can see, the column lowest_value contains the highest value of all (numeric) columns per row.
How to Find the Column Name of the Column with the Highest Value
To finish this article, we demonstrate not only how to find the highest value, but also how to retrieve the name of the column with the highest value. This might be useful if you want to save the name of the column as a macro variable.
There are 4 steps to find the name of the column with the highest value:
- Create an array of all numeric columns.
- Find the highest value with the MAX function.
- Find the position of the column with the highest value.
- Convert the position of the column in the name of the column.
As discussed in the sections above, you can find the highest value with the MAX function and an array.
To find the position of the column with the highest value we use the WHICHN function. This function searches for a numeric value that is equal to the first argument and returns its position in the array (the second argument).
Finally, you use the VNAME function to convert the position of the column to the name of the column.
data work.highest_value; set work.my_ds4; array columns _numeric_; highest_value = max(of columns[*]); index_highest_value = whichn(highest_value, of columns[*]); column_highest_value = vname(columns[index_highest_value]); run; proc print data=work.highest_value noobs; run;
As you can see in the last row, two columns contain the highest value, namely Age and X. The method we showed here returns the first column (reading from left to right) where it encounters the maximum value.
Do you know? How to Rename Variables in SAS