How to Extract a Specific Word from a SAS String

Suppose you have a string with many words, but you are only interested in the third word. How do you extract this word from your string in SAS?

In SAS, you can use the SCAN function to extract a word from a string. This function takes the string you want to scan as an argument, as well as a number that represents the position of the word you want to extract.
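As a minimal sketch of this idea, the DATA step below extracts the third word from a sentence. The sample sentence and the variable names are hypothetical and only serve as an illustration.

```sas
data _null_;
    /* Hypothetical sample sentence, used only for illustration */
    my_string = "Suppose you have a string with many words";

    /* Second argument = 3: return the third word, counting from the left */
    third_word = scan(my_string, 3);

    /* Writes third_word=have to the log */
    put third_word=;
run;
```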

Although most users will use the SCAN function to extract a specific word from a text string in SAS, you can use this function for a wider variety of purposes. In general, you use the SCAN function to parse a string and extract the part you are interested in.

In this article, we discuss the syntax of the SCAN function and provide some examples. However, if you are interested in extracting a substring from a longer string, then we recommend this article.

The SCAN syntax

The SCAN function takes two required arguments and one optional argument (SAS also accepts a further optional modifiers argument, which this article does not cover):

SCAN(string, count <, character list>)

  • string: This is the string you want to parse. In other words, the string from which you want to extract a specific part.
  • count: This number specifies the position of the word you want to extract. If the number is positive, SAS scans the string from left to right. However, if the number is negative, SAS scans the string from right to left.
  • character list (optional): With this optional parameter you can define the character(s) that are used as delimiters to parse the string.

If you don’t set the optional argument, SAS uses a default list of characters to parse the string:

blank ! $ % & ( ) * + , - . / ; < ^ |
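To illustrate the default list, here is a short sketch that assumes an ASCII system and uses a made-up value containing several of the default delimiters. Without modifiers, SAS treats consecutive delimiters as a single delimiter, which the last call shows.

```sas
data _null_;
    /* Hypothetical value that mixes several default delimiters (- , ;) */
    my_string = "2024-05-17,Madrid;Spain";

    /* The words are: 2024, 05, 17, Madrid, Spain */
    fourth = scan(my_string, 4);     /* Madrid */
    last   = scan(my_string, -1);    /* Spain  */

    /* Two commas in a row still separate only two words */
    second = scan("a,,b", 2);        /* b */

    put fourth= last= second=;
run;
```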

How to Scan a String (Forward & Backward)

SAS extracts from a string the word whose position corresponds to the value of the count argument. So, if count = 3, then SAS extracts the third word from a string, counting from left to right. However, if count = -3, then SAS counts from right to left and extracts the third-to-last word.

Below you find an example of how the value of the count argument impacts the result of the SCAN function. If you set count = 1, then SAS extracts the first word of a string. But, if count = -1, then the SCAN function returns the last word.

[Images: SAS dataset for Example 1; result of the SCAN function for different count values]
data work.ds;
	input my_string $1-50;
	datalines;
Today is a sunny day
Tomorrow will be a sunny day too
Yesterday was my birthday
World/Europe/Spain/Madrid
0800-123-456-789
;
run;

data work.ds_scan;
	set work.ds;
	
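	/* count = 1: first word, counting from the left */
	my_scan = scan(my_string, 1);
	/* count = -1: last word, counting from the right */
	my_scan_backward = scan(my_string, -1);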
run;
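The same logic applies to other values of count. The sketch below reuses two of the sample strings from the dataset above with count = 3 and count = -3, and also shows what happens when count is larger than the number of words: SCAN then returns a blank value. The variable names are our own choice.

```sas
data _null_;
    my_string = "Tomorrow will be a sunny day too";

    third_from_left  = scan(my_string, 3);     /* be    */
    third_from_right = scan(my_string, -3);    /* sunny */

    /* The string has only 7 words, so the 10th word is blank */
    out_of_range = scan(my_string, 10);

    /* The hyphen is a default delimiter, so this value splits on it */
    first_block = scan("0800-123-456-789", 1);     /* 0800 */
    last_block  = scan("0800-123-456-789", -1);    /* 789  */

    put third_from_left= third_from_right= out_of_range=
        first_block= last_block=;
run;
```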

How to Use the Delimiter to Extract a Word from a String

The third, optional argument of the SCAN function is the delimiter. You use the delimiter to let SAS know how to parse a string.

By default, if you don’t provide a delimiter, SAS uses the following characters as delimiters:

blank ! $ % & ( ) * + , - . / ; < ^ |

This means that SAS starts a new “word” each time it encounters one of these characters. However, this might not be what you want. Fortunately, you can provide your own delimiter as a third argument.

In the example below, we define that only the “/” character counts as a delimiter. Note the result for the 3rd and 4th rows, which contain the “-” character that normally counts as a delimiter.

[Images: SAS dataset for Example 2; result of extracting a word from a string in SAS]
data work.ds;
	input my_string $1-50;
	datalines;
World/Europe/Spain/Madrid
World/Europe/France/Paris
World/Latin-America/Argentina/Buenos Aires
World/Latin-America/Peru/Lima
;
run;


data work.ds_scan_slash;
	set work.ds;
	
	/* Only "/" separates words, so hyphens and blanks stay inside a word */
	my_scan_slash = scan(my_string, 3, '/');
run;
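To make the effect of the delimiter explicit, the following sketch compares the default delimiters with a custom “/” delimiter on the third row of the dataset. The variable names are hypothetical, and the results in the comments assume an ASCII system.

```sas
data _null_;
    my_string = "World/Latin-America/Argentina/Buenos Aires";

    /* Default delimiters: the hyphen and the blank also split the string */
    default_third = scan(my_string, 3);          /* America */
    default_last  = scan(my_string, -1);         /* Aires   */

    /* Only "/" as delimiter: hyphens and blanks are part of the word */
    slash_third = scan(my_string, 3, '/');       /* Argentina    */
    slash_last  = scan(my_string, -1, '/');      /* Buenos Aires */

    /* A list of several characters: both "/" and "-" act as delimiters */
    both_third = scan(my_string, 3, '/-');       /* America */

    put default_third= default_last= slash_third= slash_last= both_third=;
run;
```

Note that the third argument is a list of single characters, not a multi-character separator: with '/-', either character on its own ends a word.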

You can find the official SAS documentation about the SCAN function here. Check this list for other character functions.
