SAS Statements

Join two tables in SAS

In SAS you can join two or more tables in two ways. In this article, we discuss how to join tables with the MERGE statement. We explain the syntax of the MERGE statement and discuss how to create a full, left, right, and inner join.

Do you know? How to Compare Datasets without Merging Them First

The MERGE Syntax

MERGE SAS-data-set-1 <(data-set-options)>
SAS-data-set-2 <(data-set-options) >
<…SAS-data-set-n<(data-set-options)>>;

The MERGE statement contains at least two obligatory arguments, namely the SAS tables you want to join. You can keep adding more arguments to join more tables.

The optional arguments are the data set options. For example, the WHERE statement to subset the input data set or the KEEP statement to select variables of the input data set. In this article, we focus only on the IN statement. With this statement, you can create left, right, and inner joins.

In contrast to a join in SQL, the MERGE statement in SAS doesn’t create a cartesian, many-to-many join. The joins in SAS are one-to-one joins or one-to-many joins. The advantage of theses types of joins is that SAS carries out the joins faster. However, the only condition to create these joins is that the input tables are sorted on the common variable(s), i.e. the variable(s) you want to use to join on.

MERGE Example

The Input Data Sets

Throughout this article, we will use 2 data sets of 2 columns with information about cities (population and elevation). Both tables contain the column city. We will use this column to join the 2 tables. The tables will be referred to as Left and Right.

data work.left;
	infile datalines dlm=',';
	length city $ 15;
	input city $ population;
	format population nlnum12.;
	datalines;
Copenhagen, 602481
Amsterdam, 872680
Paris, 2150271
Berlin, 3769495
Rome, 4342212
Madrid, 6345000
London, 8908081
;
run;

data work.right;
	infile datalines dlm=',';
	length city $ 15;
	input city $ elevation;
	datalines;
Amsterdam, -2
Lisbon, 2
Brussels, 13
Rome, 21
Berlin, 34
Paris, 35
Warsaw, 78
Vienna, 151
Madrid, 667
;
run;
Left SAS table to join
Table LEFT
Right SAS table to join
Table RIGHT

As mentioned before, the tables must be sorted on the common variable(s). In this example, we have only one common variable, namely city. Currently, the tables are sorted on population and elevation. So, first, we sort both tables on the city column. We use PROC SORT to order the tables.

proc sort data=work.left out= work.left_srt;
	by city;
run;

proc sort data=work.right out=work.right_srt;
	by city;
run;
Table LEFT Sorted
Table LEFT Sorted
Table RIGHT Sorted
Table RIGHT Sorted

If we wanted to join tables on more than one the column, then these columns must be specified in the BY statement.

The Full Join

The first join type we discuss is the FULL JOIN. This join returns all matching observations from both tables whether the other table matches or not. This join type is the most straightforward of all joins and doesn’t require the IN keyword. However, we always need to specify the common variable(s) in the BY statement.

data work.full_join;
	merge work.left_srt
		work.right_srt;
	by city;
run;
The FULL JOIN
The FULL JOIN

As you can see, some observations contain missing value. For example, the population of Brussels. This is because Brussels isn’t present in the Left table.

The LEFT and RIGHT Join

The second and third join types are the LEFT JOIN and the RIGHT JOIN. The LEFT JOIN returns all observations of the Left table and the matching observations from the Right table. In contrast, the RIGHT JOIN returns all observations of the Right table and the matching observations from the Left table.

To create these types of join we need to use the IN keyword. The IN keyword is used to refer to the input tables. In the examples below we use IN = a to refer to the Left table and IN = b to refer to the Right table. The characters a and b are arbitrary. However, it’s common to use these characters.

We use the IF statement to let SAS know which observation must be written to the output data set.

data work.left_join;
	merge work.left_srt (in=a)
		work.right_srt (in=b);
	by city;
	
	if a then output;
run;

data work.right_join;
	merge work.left_srt (in=a)
		work.right_srt (in=b);
	by city;
	
	if b then output;
run;
LEFT JOIN of SAS tables
The LEFT JOIN
RIGHT JOIN of SAS tables
The RIGHT JOIN

The INNER Join

The last join type we discuss is the INNER JOIN. The INNER JOIN returns all observations that match in the Left and Right table. For this join we need the IN keyword and IF statement too.

data work.inner_join;
	merge work.left_srt (in=a)
		work.right_srt (in=b);
	by city;
	
	if a and b then output;
run;
INNER JOIN of SAS tables
The INNER JOIN

The Conclusion

In SAS you can join two or more tables using the MERGE statement. Firstly, you need to order the tables you want to join on the common variable(s). Secondly, you need to define the common variable(s) in SAS with the BY statement. Finally, to create a left, right or inner join, you need the IN keyword and the IF statement.

On the SAS website you can find the official documentation.