A common task while analyzing your data is finding the minimum and maximum value of a variable (i.e., column). But, how do you find the minimum value of a variable in SAS?
The easiest way to find the minimum value of a variable in SAS is with the MIN() function in PROC SQL. This function returns the minimum value of a given column. You can also use the MIN() function to find the minimum value per group.
In this article, we discuss 5 methods to find the lowest value of a column, as well as the lowest value per group. The methods that you can use are PROC SQL, PROC MEANS, PROC SUMMARY, PROC UNIVARIATE, and PROC SORT in combination with a SAS DATA Step.
Besides finding the minimum value, we also demonstrate how to select the row with the (second) lowest value from a dataset. Again, we do this for the overall minimum value, as well as per group.
Do you also know? How to Find the Minimum Value per Row
Contents
5 Methods to Find the Minimum Value of a Variable in SAS
We use a sample dataset to demonstrate the 5 methods that you can use to find the minimum value of a variable. This dataset has one column my_value with values between -6 and 8. Notice that there is one missing value.
data work.my_data; input my_value; datalines; 4 -6 2 8 . 3 ; run; proc print data=work.my_data noobs; run;
Do you know? How to Create a SAS Dataset Manually
Method 1: PROC SQL
The first method to calculate the lowest value of a column is with PROC SQL.
PROC SQL is a SAS BASE procedure that you can use to execute SQL code. Hence, if you have experience with SQL this methods will the easiest for you.
You use the MIN() function to find the minimum value of a column in SAS. This function has one argument, namely the column of which you want to find the minimum, and returns the minimum value in this column. The MIN() function ignores missing values.
Below we demonstrate how to apply this method.
proc sql; select min(my_value) as min_of_my_value from work.my_data; quit;
By default, PROC SQL only creates a report. Instead, if you want to create a table with the minimum value you need to add a CREATE TABLE statement to your code.
proc sql; create table work.minimum_proc_sql as select min(my_value) as min_of_my_value from work.my_data; quit;
Method 2: PROC MEANS
The second method to calculate the lowest value of a variable in SAS is with PROC MEANS.
PROC MEANS is a powerful SAS BASE procedure that you can use to analyze columns in your data. By default, it returns the number of observations, the mean value, the standard deviation, the minimum value, and the maximum value. PROC MEANS ignores missing values.
This procedure is very easy to code. You only need to provide the input dataset (with the DATA option) and the variable you want to analyse (with the VAR statement).
proc means data=work.my_data; var my_value; run;
Like PROC SQL, PROC MEANS also generates by default only a report. If you need a dataset with the minimum value of the variable you’re analysing, you need an OUTPUT statement.
In the OUTPUT statement, you provide the name of the output dataset (OUT option) and the statistic you need. Here we use min=min_of_my_value to let SAS know to create a column called min_of_my_value that contains the minimum value of the my_value column.
proc means data=work.my_data; var my_value; output out=work.minimum_proc_mean min=min_of_my_value; run; proc print data=work.minimum_proc_mean noobs; run;
By default, PROC MEANS creates two additional columns in the output dataset, namely _TYPE_ and _FREQ_. You can remove these columns with the DROP option.
Do you know? How to Select Columns with the DROP Option
Method 3: PROC SUMMARY
PROC SUMMARY is the third method to find the lowest value of a column in SAS.
This procedure is very similar to PROC MEANS. It provides you with the same default statistics. The only difference is that it doesn’t create a report. You need to add the PRINT option to generate one.
With the SAS code below we create a default PROC SUMMARY report.
proc summary data=work.my_data print; var my_value; run;
If you need an output table with the minimum value, you need to add the OUTPUT statement. The syntax of the OUTPUT statement is the same as discussed previously for PROC MEANS.
Run the code below to create an output dataset with PROC SUMMARY.
proc summary data=work.my_data print; var my_value; output out=work.minimum_proc_summary min=min_of_my_value; run; proc print data=work.minimum_proc_summary noobs; run;
Like PROC MEANS, PROC SUMMARY also creates two extra columns (that you can remove with the DROP option).
Method 4: PROC UNIVARIATE
The fourth method to find the minimum value of a variable in SAS is with PROC UNIVARIATE.
Normally, PROC UNIVARIATE is used to assess the distribution of values of a given variable. It even carries out a test for normality. However, you can still use this procedure to calculate the minimum value of a variable.
Because this procedure calculates many statistics and carries out some hypothesis testing, this method can be rather slow compared to the previously discussed methods.
If you run the code below, SAS generates a report. Within the section Extreme Observations, you can find the minimum and maximum values.
proc univariate data=work.my_data ; var my_value; run;
Again, you need to add an OUTPUT statement to generate a table with the lowest value of a column. Fortunately, the output dataset doesn’t contain the two extra columns.
proc univariate data=work.my_data; var my_value; output out=work.minimum_proc_univariate min=min_of_my_value; run; proc print data=work.minimum_proc_univariate noobs; run;
Method 5: PROC SORT + SAS DATA Step
The fifth method to calculate the minimum value of a column is a two-step process. You will use PROC SORT and a SAS DATA Step.
The first step is to sort your dataset in ascending order with PROC SORT. You use the BY statement to define the variable that will be used to sort your data.
proc sort data=work.my_data out=work.my_data_srt; by my_value; run; proc print data=work.my_data_srt noobs; run;
The second step is to select the first row of the ascendingly ordered dataset. However, as you can see in the image above, PROC SORT doesn’t ignore missing values. So, before you select the first row, you have to filter out rows with missing values.
With the code below we select the first row (after we have ignored missing values) using an IF statement and the special variable _N_.
data work.minimum_data_step; set work.my_data_srt (where=(not missing(my_value))); if _N_ = 1 then output; run; proc print data=work.minimum_data_step noobs; run;
Do you know? How to Select the First Rows of a Table
3 Methods to Find the Minimum Value of a Variable by Group in SAS
Besides finding the overall minimum value of a column, it’s a common task to analyze the lowest value per group. So, how do you find the minimum value of a variable per group in SAS?
For the examples in this section, we use another example dataset. This dataset has 6 rows which are split into two groups, namely A and B. We want to find the minimum of each group. Notice that in group B there is a missing value.
data work.my_data_groups; input my_group $ my_value; datalines; A 4 A -6 B 2 B 8 B . A 3 ; run; proc print data=work.my_data_groups noobs; run;
Method 1: PROC SQL
The easiest method to find the minimum value per group in SAS is with PROC SQL.
You use the MIN() function and the GROUP BY statement to calculate the minimum value. With the GROUP BY statement, you define the variable(s) that will define your groups. This variable needs to be present in the SELECT statement, too.
With the SAS code below, we calculate the minimum value per group. If you want to create an output table instead of a report, you need to add the CREATE TABLE statement.
proc sql; select my_group, min(my_value) as min_of_my_value from work.my_data_groups group by my_group; quit; proc sql; create table work.minimum_group_proc_sql as select my_group, min(my_value) as min_of_my_value from work.my_data_groups group by my_group; quit;
An advantage of this method is that you can easily extend the code to find the minimum value of subgroups. You only need to add the extra variables that form the subgroup to the SELECT statement and the GROUP BY statement.
Method 2: PROC MEANS, PROC SUMMARY, PROC UNIVARIATE
A second method to calculate the minimum value per group is with the PROC MEANS, PROC SUMMARY, or PROC UNIVARIATE procedures. If you use one of these procedures, you need two steps.
First, you need to order your dataset by the variable that defines the groups, for example with PROC SORT.
proc sort data=work.my_data_groups out=work.my_data_groups_srt; by my_group; run; proc print data=work.my_data_groups_srt noobs; run;
Then, you choose one of the procedures to calculate the minimum per group. With the VAR statement, you define the variable of which you want to know the minimum. To define the groups, you use the BY statement.
/* PROC MEANS */ proc means data=work.my_data_groups_srt; var my_value; by my_group; output out=work.minimum_group_proc_mean min=min_of_my_value; run; /* PROC SUMMARY */ proc summary data=work.my_data_groups_srt print; var my_value; by my_group; output out=work.minimum_group_proc_summary min=min_of_my_value; run; /* PROC UNIVARIATE */ proc univariate data=work.my_data_groups_srt; var my_value; by my_group; output out=work.minimum_group_proc_univariate min=min_of_my_value; run;
Method 3: PROC SORT + SAS DATA Step
The third method to find the minimum value of a group is with PROC SORT and a SAS DATA Step.
First, you order your dataset ascendingly by the variable the defines the groups in your data and by the variable of which you want to know the minimum.
proc sort data=work.my_data_groups out=work.my_data_groups_srt; by my_group my_value; run; proc print data=work.my_data_groups_srt noobs; run;
Then, you select the first row of each new group. However, SAS doesn’t ignore missing values. So, before selecting the first row per group, you need to filter out the rows with missing values.
In the SAS code below, we use an IF statement and the FRIST keyword to select each group’s first row (ignoring missing values in the my_value column).
data work.minimum_group_data_step; set work.my_data_groups_srt (where=(not missing(my_value))); by my_group; if first.my_group then output; run; proc print data=work.minimum_group_data_step noobs; run;
How to Select the Row with the Lowest Value
Thus far, we have discussed methods to find the lowest value of a group. These methods work fine unless you need to select the complete row with the lowest value. That is to say, all columns. So, how do you select the complete row with the lowest value?
You can select the row with the lowest value of a given column with PROC SQL and a SAS DATA Step. In PROC SQL, you need to use the HAVING clause. This clause allows you to filter on aggregated functions, such as the MIN() function.
We use a sample dataset to illustrate both methods. This dataset has 6 rows and 4 columns (first_name, last_name, gender, and age). We want to select the row with the lowest age. Note that there’s one row with a missing age.
data work.my_data; input first_name $ last_name $ gender $ age; datalines; Maria Williams F 40 John Smith M 42 Jess Jordan F 28 Michelle Young F 35 Peter Owen M . George Olsen M 37 ; run; proc print data=work.my_data noobs; run;
As you can see in the table above, Jess is the youngest person in our dataset. This is the row we want to select.
PROC SQL
You need 3 steps to select the row with the lowest value with PROC SQL in SAS:
- You need the SELECT * (asterisk) statement to select all columns.
- With the FROM statement to specify the input dataset.
- You use the HAVING statement to filter only the row with the lowest value.
Although you normally filter data in SQL with the WHERE statement, here you need the HAVING statement. In this case, you filter based on a aggregate function, i.e. the MIN() function. Functions such as the MIN() function can only be used to filter data with the HAVING statement.
proc sql; select * from work.my_data having age = min(age); quit;
As you can see in the image above, we selected the complete row with the lowest age. Note that, PROC SQL ignores missing values when it calculates the minimum value.
SAS DATA Step
The second method to select the row with the lowest value is with PROC SORT and a SAS DATA Step. This is a two-step process. First, you order your data ascendingly by the column you want to use to filter your data. Then you use an IF statement and the special _N_ variable to filter the first row.
In the SAS code below, we use PROC SORT to order our dataset ascendingly based on the age column.
proc sort data=work.my_data out=work.my_data_srt; by age; run; proc print data=work.my_data_srt noobs; run;
Note that PROC SORT doesn’t ignore missing values. In fact, according to the table above the missing value is the lowest value. However, the row we want to select is the second row. We have to account for this in the next step.
Here we filter the first row of our ordered dataset. We do this with the IF statement and the _N_ keyword. We use the WHERE=-option in the input dataset to ignore rows with missing data.
data work.select_min_value; set work.my_data_srt (where=(not missing(age))); if _N_ = 1 then output; run; proc print data=work.select_min_value noobs; run;
Do you know? How to Select the First Row of a Dataset
How to Select the Row with the Lowest Value by Group
A variation to the problem above is how to select the row with the lowest value by group? Again, this problem can be easily solved with PROC SQL and a SAS DATA Step.
In the example below, we want to find the youngest person (i.e., lowest age) per gender.
PROC SQL
To select the row with the lowest value of a specific variable within a group, we need to modify the code mentioned in the previous section slightly. We only need to add the GROUP BY statement. With this statement we specify the variable that defines the groups.
proc sql; select * from work.my_data group by sex having age = min(age); quit;
SAS DATA Step
To select the row with the lowest value for each per gender, we need to order our dataset first by these two variables. Then, we use the IF statement and the FIRST keyword to select one row per group. Keep in mind that you need ignore explicitly the missing values.
proc sort data=work.my_data out=work.my_data_srt; by sex age; run; proc print data=work.my_data_srt noobs; run; data work.select_min_value; set work.my_data_srt (where=(not missing(age))); by sex; if first.sex then output; run; proc print data=work.select_min_value noobs; run;
How to Select the Row with the Second Lowest Value
The last problem we will discuss in this article is how to select the row with the second lowest value?
Although in most cases you want to select the row with the lowest value, sometimes it is necessary to get the row with the second-lowest value. You can filter this row with PROC RANK and a simple IF statement.
The PROC RANK procedure assigns a rank to each row based on the value of a specified variable (ignoring missing values). With the SAS code below we find the rank for each row based on the age variable. We save the rank of each row in the new variable rank_of_age.
Finally, to select the row with the second-lowest value, we need to filter the row with rank equal to 2. We do this with a simple IF statement.
proc rank data=work.my_data out=work.ranked_age; var age; ranks rank_of_age; run; data work.select_second_lowest_value; set work.ranked_age; if rank_of_age = 2 then output; run; proc print data=work.select_second_lowest_value noobs; run;
Do you know? How to Rank Data in SAS
3 thoughts on “How to Find the Minimum Value of a Variable (by Group) in SAS”
Comments are closed.