In this article, we demonstrate two ways to perform a left join in SAS.
A join (left, right, inner, outer, etc.) is used to combine data from two or more tables based on a common value in both sources, e.g., a customer ID.
More specifically, a left join returns all records from the left table and the matching records from the right table. If there is no match, zero records from the right table are returned.
A left join is sometimes also called a left outer join.
In SAS, there are two ways to left join tables. Namely, you can either use the LEFT JOIN operation from the PROC SQL procedure or the MERGE statement in a SAS data step.
In this article, we will discuss both methods, provide examples, and discuss their advantages and disadvantages.
METHOD 1: Left Join Tables with PROC SQL
The first method to perform a left-join in SAS is with the LEFT JOIN operation in an SQL procedure (i.e., PROC SQL).
With the LEFT JOIN operation, the PROC SQL procedure returns all the records from the table in the FROM clause plus the matching records from the table mentioned after the LEFT JOIN keywords.
Additionally, the ON keyword specifies the condition for matching records. If you have more than one condition, you can add more using the AND keyword.
Next, we provide an example of how to perform a left join with two tables.
data work.table1; input Name $ Weight; label weight='Weight (in kg)'; datalines; John 85 Bob 79 Susan 65 Aaron 82 Michelle 70 ; run; proc print data=work.table1 noobs label; run; data work.table2; input Name $ Height; label Height='Height (in cm)'; datalines; Bob 185 Melissa 168 Susan 164 John 178 Patrick 191 ; run; proc print data=work.table2 noobs label; run; proc sql; create table work.left_join_proc_sql as select t1.name, t1.weight, t2.height from work.table1 t1 left join work.table2 t2 on t1.name = t2.name; quit; proc print data=work.left_join_proc_sql noobs label; run;
In the code above, we perform a left-join on table1 and table2 based on the column Name.
METHOD 2: Left Join Table with a SAS Data Step
Alternatively, you can use a SAS Data Step to join data from two or more data sets.
In a SAS Data Step, you start a join with the MERGE statement followed by the names of the tables you want to combine. After the MERGE statement follows a BY statement to specify the columns you want to match.
Finally, to create a (left) join, you need an IF statement. The IF statement specifies the type of join. For example, that you want to keep all records from the left table plus the matching records from the right table.
In contrast to the PROC SQL procedure, the MERGE statement requires that the input tables must be ordered. You can do this with the PROC SORT procedure.
Additionally (and also in contrast to the PROC SQL procedure), the column names to find matching records must be identical. For example, you can’t merge two tables with the column names first_name and firstname.
All these extra requirements come with a benefit. Namely, if you want to perform a left join on two or more large tables, the MERGE statement is faster than the SQL procedure.
Next, we provide an example to do a left join with the MERGE statement.
data work.table1; input Name $ Weight; label weight='Weight (in kg)'; datalines; John 85 Bob 79 Susan 65 Aaron 82 Michelle 70 ; run; proc print data=work.table1 noobs label; run; data work.table2; input Name $ Height; label Height='Height (in cm)'; datalines; Bob 185 Melissa 168 Susan 164 John 178 Patrick 191 ; run; proc print data=work.table2 noobs label; run; proc sort data=work.table1 out=work.table1_srt; by name; run; proc sort data=work.table2 out=work.table2_srt; by name; run; data work.left_join_merge; merge work.table1_srt (in=a) work.table2_srt (in=b); by Name; if a; run; proc print data=work.table1_srt noobs label; run; proc print data=work.table2_srt noobs label; run; proc print data=work.left_join_merge noobs label; run;
SQL Left Join versus Merge
What follows next is a comparison of the two methods to perform a left join in SAS.
LEFT JOIN operation (PROC SQL) | MERGE statement (SAS Data Step) | |
---|---|---|
Advantages | – Input tables can be unordered. – Matching records can be found in columns with different names. – The order of columns in output data can be easily changed | – Fast – Can be used for different kinds of joins (e.g., left, right, inner, etc.). |
Disadvantages | – Slow | – Input tables must be ordered – Matching columns must have the same name. |