SAS How To's

How to Count the Number of Missing Values in SAS

If you work with surveys, you need to know how to deal with missing data. Although it’s crucial to know how to replace missing data, it’s even more important to know the number of missing data.

So, how do you count the number of missing values in SAS? You can use the PROC FREQ procedure to count the number of missing values per column. If you want to know the number of missing values per row, you need to NMISS function or the CMISS function.

In the remainder article, we discuss in more detail how to count the number of missing values per column and per row. We give examples of how to use the PROC FREQ procedure, the NMISS function, and the CMISS functions. We support all examples by images and SAS code.

Sample Data Missing Values

For the examples in this article, we have created the dataset below. The dataset contains three character columns and three numeric columns. The columns are conveniently named char_var1 – char_var3 and num_var1 – num_var3.

SAS Dataset with Missing Values
data my_ds;
    infile datalines dlm=",";
    input char_var1 $ char_var2 $ char_var3 $ num_var1 num_var2 num_var3;
    datalines;
A, B, C, 1, 2, 3
A, B, ,1, 2, .  
 , , ,., ., 3
A, , C, ., ., .
 , B, ,1, 2, .
A, , ,., 2, .
;
run;

Count the Missing Values by Row

We start by counting the number of missing values per row in a SAS dataset. We separate this section according to the variable type (character, numeric, and both).

Character Variables

If you want to count the number of missing character values per row, you can use the CMISS function. The CMISS function takes as arguments the columns you want to analyze.

So, in our example we could use the following code:

CMISS(char_var1, char_var2, char_var3)

However, since all character variables start with char_var we can use the colon (:) modifier to make our code more efficient.

data work.cnt_missing_row_char1;
    set work.my_ds (keep= _character_);
    cnt_missing_character = cmiss(of char_var:);	
run;

Above, we used the actual names of the columns as arguments for the CMISS function. However, if you want to know the number of missing values in a row for all character columns, you can use the special keyword _character_. With this keyword, SAS takes into account automatically all character columns and makes your code more efficient.

data work.cnt_missing_row_char2;
    set work.my_ds (keep= _character_);
    cnt_missing_character = cmiss(of _character_);	
run;
Count Missing Character Values by Row

Numeric Variables

You can use the NMISS function to count the missing numeric values in SAS. Like the CMISS function, the NMISS function takes as arguments the columns you want to analyze.

Again, we could use the following syntax to count the number of missing numerical values:

NMISS(num_var1, num_var2, num_var3)

However, the we will use the colon modifier to make our code more efficient.

data work.cnt_missing_row_num1;
    set work.my_ds (keep= _numeric_);
    cnt_missing_numeric = nmiss(of num_var:);
run;

The second method to count all the missing values in numeric columns is using the _numeric_ keyword. However, if you use this method you must subtract 1 to get the correct answer. Let me explain this.

The numeric columns in our original dataset (work.my_ds) are num_var1, num_var2, and num_var3. In the code below, we define a new, extra numeric column, namely cnt_missing_numeric. This column will contain the number of missing values per row. So, cnt_missing_numeric is also a numeric value. Since this is a new variable, it has no value at first. However, we don’t want to take this column into account while counting the number of missing numeric value. Hence, we need to subtract 1.

data work.cnt_missing_row_num2;
    set work.my_ds (keep= _numeric_);
    cnt_missing_numeric = nmiss(of _numeric_)-1;	
run;
Count Missing Numeric Variables by Row

Character and Numeric Variables

If you want to count all the missing values by row in a SAS dataset, you have two options.

First, you can explicitly name the range of variables you want to take into account as arguments of the CMISS function. So, in this example, all variables between char_var_1 and num_var_3.

data work.cnt_missing_row_all1;
    set work.my_ds;
    cnt_missing_all = cmiss(of char_var1 -- num_var3);
run;

Second, we can use the special keyword _all_. Using this keyword, SAS takes all variables into account while counting the number of missing values. However, like in the previous section, we need to subtract 1 (see above), because the new column cnt_missing_all should be ignored.

data work.cnt_missing_row_all2;
    set work.my_ds;
    cnt_missing_all = cmiss(of _all_)-1;
run;
Count All Missing Values by Row

Count the Missing Values by Column

Instead of counting the missing values per row, you can also count the number of missing values per column.

You can use the PROC FREQ procedure to count the number of missing values per column. You use the statement “table _all_ / missing” to do this for all variables. If you want the count the number of a specific (type) of variable, you can change the _all_ keyword for _character_, _numeric_, or a column name.

proc freq data=work.my_ds;
    table _all_ / missing;
run;
Count Missing Values by Column
Count Missing Values by Column