
How to Extract Characters from a String in SAS

In this article, we discuss how to create a substring in SAS, in other words, how to extract a specified number of characters from a longer string.

The easiest way to create a substring in SAS is with the SUBSTR function. This function extracts characters from a SAS string based on three input arguments: string, position, and (optionally) length. Combined with the LENGTH function, SUBSTR can also get the last N characters from a string. Finally, you can use SUBSTR to replace a range of characters with a set of new characters.

Note that this article discusses how to extract and replace characters. If you want to remove characters, we recommend our article on the applications of the COMPRESS function.

How to Extract N Characters from a String in SAS

The SAS SUBSTR() function extracts a number of characters (i.e., a substring) from a text string starting at a given position. The function has three arguments, namely string, position, and (optionally) length:

  1. String: The text string from which you want to extract a substring.
  2. Position: The starting position of the substring in the original text string.
  3. Length (optional): The length (i.e., number of characters) of the substring to extract.

If you omit the length argument, SAS extracts all characters from the starting position to the end of the text string.

SUBSTR(string, position, <length>)

Below we provide some examples of how to use the SUBSTR function.

With the following SAS code, we extract the first 3 characters from a text string.

data work.ds;
    text_string = "abcde";
    substring = substr(text_string, 1, 3);
    output;
run;
The new variable substring contains abc.

Instead of starting the substring at the first position, you can also use the SUBSTR function to extract characters starting at other positions (e.g., the second, third, or fourth).

For example, below we extract three characters from a text string starting at the second position. With this example, you can clearly see that the third argument is the length of the substring, instead of its ending position.

data work.ds;
    text_string = "abcde";
    substring = substr(text_string, 2, 3);
    output;
run;
The new variable substring contains bcd, that is, three characters starting at position 2.

In the two examples above, we used all three arguments (string, position, and length). However, you can omit the length argument. In that case, SAS creates a substring that contains all the characters from the starting position to the end of the string. For example:

data work.ds;
    text_string = "abcde";
    substring = substr(text_string, 2);
    output;
run;
The new variable substring contains bcde.

If you use the SUBSTR function to create a new variable (as in the previous examples), SAS gives it the same length as the first argument (i.e., the text string). However, you can set a different length with a LENGTH statement placed before the assignment.
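As a minimal sketch (the data set name work.ds_len is hypothetical), the DATA step below declares substring with a length of 3 before assigning it. Without the LENGTH statement, substring would get length 5, the length of text_string.

```sas
data work.ds_len;
    /* Declare the length before SUBSTR assigns a value;   */
    /* otherwise substring inherits the length of the      */
    /* first argument (text_string, which has length 5).   */
    length substring $ 3;
    text_string = "abcde";
    substring = substr(text_string, 1, 3);
run;

/* Inspect the variable lengths: substring now has length 3 */
proc contents data=work.ds_len;
run;
```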

How to Extract the Last N Characters from a String in SAS

Another common question is how to extract the last N characters from a string.

A natural thought is to use the SUBSTR function and extract the characters backward (from right to left) instead of forward (from left to right). However, the SUBSTR function doesn't support this. Therefore, you need to combine SUBSTR with another function to extract the last N characters from a string.

You can extract the last N characters from a SAS string by combining the SUBSTR function and the LENGTH function. First, you determine the length of the string with the LENGTH function (which ignores trailing blanks) and use this information to calculate the starting position of the substring. Then, with the SUBSTR function, you extract the last N characters you need.

For example, we have two strings with different lengths, namely abc and abcdef.

You can extract the last 2 characters of these text strings with the following 3 steps:

1. Determine the length of the string with the LENGTH function.


2. Calculate the starting position of the last N characters. You do so by subtracting N-1 from the length of the original string. For example, for abcdef and N = 2, the starting position is 6 - (2 - 1) = 5.

3. Extract the last N characters with the SUBSTR() function, using the starting position from step 2 and N as the length.

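The three steps above can be written out in a DATA step that keeps each intermediate result in its own variable. This is a sketch: the data set names work.strings and work.last_n and the helper variables n, string_length, and start_position are illustrative names, not part of the original example. Because text_string is declared with length 10, its values are padded with trailing blanks; LENGTH ignores those blanks, which is why it yields the correct starting position.

```sas
/* Sample data: two strings of different lengths */
data work.strings;
    length text_string $ 10;
    input text_string;
    datalines;
abc
abcdef
;
run;

data work.last_n;
    set work.strings;
    n = 2;  /* number of characters to keep */
    /* Step 1: length of the string (trailing blanks are ignored) */
    string_length = length(text_string);
    /* Step 2: starting position of the last n characters */
    start_position = string_length - (n - 1);
    /* Step 3: extract n characters from the starting position */
    last_n_characters = substr(text_string, start_position, n);
run;

proc print data=work.last_n noobs;
run;
```

For abc, this gives string_length = 3, start_position = 2, and bc; for abcdef, it gives 6, 5, and ef.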

You can also combine these steps into one line of SAS code.

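data work.ds;
    set work.ds;  /* assumes work.ds contains the character variable text_string */
 
    /* Last 2 characters: start at LENGTH - (2 - 1) and read 2 characters */
    last_N_characters = substr(text_string, length(text_string)-(2-1), 2);
run;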
For abc, last_N_characters contains bc; for abcdef, it contains ef.
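One caveat with the one-line formula above: if a string is shorter than N characters, the computed starting position falls below 1, and SUBSTR writes an invalid-argument note to the log. A minimal sketch of one way to guard against this, assuming you want the whole string back in that case, is to clamp the starting position with MAX and the length with MIN. The data set names work.short_strings and work.last_n_safe and the variables n and start_position are illustrative.

```sas
data work.short_strings;
    length text_string $ 10;
    input text_string;
    datalines;
a
abc
abcdef
;
run;

data work.last_n_safe;
    set work.short_strings;
    n = 2;  /* number of characters to keep */
    /* Clamp the start at 1 so strings shorter than n */
    /* never produce an invalid starting position     */
    start_position = max(1, length(text_string) - (n - 1));
    /* Take n characters, or the whole string if it is shorter */
    last_n_characters = substr(text_string, start_position,
                               min(n, length(text_string)));
run;
```

With this guard, the one-character string a simply returns a, while abc and abcdef still return bc and ef.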

The method above is just one way to create a substring of the last N characters of a string. In a separate article, we discuss other ways to do this. There, we also demonstrate how to get only the last character of a string, as well as how to extract the last alphabetic or numeric character(s).

How to Replace N Characters in a String in SAS

Instead of creating a substring, you can also use the SUBSTR() function to replace characters in a string.

Before we explain how to do this, we must first distinguish two types of character replacement. You can either replace specific characters (e.g., a word) or replace a range of characters based on their positions.

For example, below we show how to replace a specific word in a string.

data work.ds;
    text_string = "It will be RAINING tomorrow";
    text_string_replace = tranwrd(text_string, "RAINING", "SNOWING");
    output;
 
    text_string = "It is RAINING for 2 hours";
    text_string_replace = tranwrd(text_string, "RAINING", "SNOWING");
    output;
run;
 
proc print data=work.ds noobs;
run;

In the example above, we replaced the word RAINING with SNOWING by using the TRANWRD() function. For more examples of the TRANWRD function, see our dedicated article on it.

The other type of replacement is based on position. For example, below we replace the characters in positions 2 to 4 with the characters XYZ.

You replace a range of characters in a SAS string by placing the SUBSTR function on the left side of the equal sign (=). The arguments of SUBSTR specify the range of characters you want to replace. Then, on the right side of the equal sign, you specify the characters that replace the original characters.

A drawback of this method is that you directly overwrite the original value of the text string.

In the example below, we replace 3 characters starting from the second position with the characters XYZ.

data work.ds;
    text_string = "abcdef";
    substr(text_string, 2, 3) = "XYZ";
    output;
 
    text_string = "ghijkl";
    substr(text_string, 2, 3) = "XYZ";
    output;
run;
 
proc print data=work.ds noobs;
run;
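To work around the drawback mentioned above, one option is to copy the value into a new variable first and apply SUBSTR to the copy only. A minimal sketch, where work.ds_copy and text_string_replace are illustrative names:

```sas
data work.ds_copy;
    text_string = "abcdef";
    /* Copy the value first; the copy gets the same length (6) */
    text_string_replace = text_string;
    /* Overwrite positions 2 to 4 of the copy only */
    substr(text_string_replace, 2, 3) = "XYZ";
run;

proc print data=work.ds_copy noobs;
run;
```

After this step, text_string still contains abcdef, while text_string_replace contains aXYZef.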