In SAS there are many ways to create a dataset, but suppose you want to add a row to an existing table. How do you do that?
In this article, we show 5 ways how to insert a row into an existing SAS dataset. The most well-known and convenient way is with the INSERT INTO statement, but there are more options. We especially focus on methods where you (manually) define the values that need to be added.
At the end of this article, we provide an overview of all 5 methods with their pros and cons.
Insert a Row into a SAS Dataset
In all the methods we will discuss, we will add two rows to an existing dataset. You can create this table for yourself by running the SAS code below. The table consists of three rows and three columns, namely a character column, a numeric column, and a SAS date column.
/* My Dataset */ data work.my_data; input name $ salary birthdate date9.; format salary dollar12.0 birthdate date9.; datalines; John 50000 05MAY1985 Maria 55000 17AUG1982 Matt 43500 23OCT1991 ; run;
Method 1: INSERT INTO & SET Statement
The first method to add a row to a table is with the INSERT INTO statement and the SET statement.
With the INSERT INTO statement, you can insert new rows into an existing table. The syntax of this method is straightforward. First of all, you write down the INSERT INTO keywords followed by the table to which you want to add the new row(s). Then, after the SET statement, you specify the variable names and the variable values.
INSERT INTO table-name SET variable-1 = value-1 variable-2 = value-2 etc. ;
Syntax – Remarks
First of all, this method inserts records into an existing table. So, with this method, you overwrite the existing table. (Keep this in mind.) At the same time, this method is very efficient. It only inserts the new rows but doesn’t process the existing rows.
Secondly, if you insert character values, then you need to write them between quotes. Date values can be written as a number or as a date encapsulated by quotes followed by a D (i..e., 21915 or “01JAN2020″d).
Thirdly, it isn’t necessary to provide values for all existing columns. If you want to insert a blank (missing) value, you can omit the variable-value pair.
Finally, you can’t use this method to add new columns to the existing table. That is to say, in the SET statement, you can only specify columns that already exist. If you want to create a new column, then have a look at this article.
In this example, we insert two records into the work.my_data dataset with the INSERT INTO statement and the SET statement.
proc sql; insert into work.my_data set name = "Meghan", salary = 52000, birthdate = "04APR1989"d set name = "Joe", salary = 48000, birthdate = "27NOV1994"d; quit;
Method 2: INSERT INTO & VALUES Statement
The second method to insert a row into a SAS dataset is with the INSERT INTO statement and the VALUES statement.
The syntax of this method is quite similar to the syntax of the previous method. Firstly, you use the INSERT INTO keywords to specify to which table you want to insert the row(s). Then, you use one or more VALUES statements to detail the values you want to insert.
INSERT INTO table-name VALUES(value-1, value-2, etc.) ;
Syntax – Remarks
Like the first method, this method also overwrites the original table. Besides that, this method is also very efficient. SAS doesn’t read/process the rows of the existing table, it only adds the new row(s).
Unlike the previous method, you can’t leave values blank. In other words, the VALUES statement must contain the same number of values as the number of columns. If you want to add a blank value, then you should specify this with a period (.) or a blank (” “).
In this example, we add two rows to an existing table with the INSERT INTO statement and the VALUES statement.
proc sql; insert into work.my_data values("Meghan", 52000, "04APR1989"d) values("Joe", 48000, "27NOV1994"d); quit;
Method 3: PROC APPEND
The third method to add a row to a dataset in SAS is with PROC APPEND.
With the PROC APPEND procedure, you can add a dataset of one or more rows to an existing dataset. The PROC APPEND procedure has two obligatory arguments, namely the base-argument and the data-argument. With the base-argument, you specify the table to which you want to add rows, while you use the data-argument to detail the dataset you want to add.
PROC APPEND base = table-name-1 data = table-name-2; RUN;
Syntax – Remarks
First of all, PROC APPEND its intended use is to combine tables (vertically). So, it is always necessary to create a table with the rows you want to add beforehand. Therefore, this method requires more lines of code to add a row.
Secondly, PROC APPEND is an efficient procedure since it only appends rows. SAS doesn’t process the rows of the existing (base) dataset.
Finally, you can use PROC APPEND to add rows and create new columns at once (with the FORCE option). For more information about how to do this, read this article.
With the code below, we add two rows to an existing table using PROC APPEND. First, we create a new table with the records we want to add, then we use PROC APPEND to combine the existing table with the new table.
data work.my_data_to_add; input name $ salary birthdate date9.; format salary dollar12.0 birthdate date9.; datalines; Meghan 52000 04APR1989 Joe 48000 27NOV1994 ; run; proc append base=work.my_data data=work.my_data_to_add; run;
Method 4: DATA STEP & SET Statement
The fourth method to insert a row into a dataset is with a Data Step and the SET statement.
This method is actually another way of appending datasets. Like the previous method (PROC APPEND), it’s mainly used to combine datasets, not to add individual rows. However, if you want to add multiple rows, this method can be quite efficient (regarding the number of lines of code).
To use this method, you need to create a table with the records you want to add first. Then, you use the Data Step and the SET statement to combine the two tables (existing table and table with the new records).
DATA table-name-1; SET table-name-1 table-name-2; RUN;
Syntax – Remarks
Although this method is mainly used to append datasets, its advantage is that it doesn’t necessarily overwrite the existing dataset. With the DATA statement, you can define and create a new table.
The drawback of this method is that always processes all rows of the existing table. Hence, this method isn’t very efficient. Moreover, if the existing dataset has many rows, this method can be rather slow.
In this example, we use a Data Step and the SET statement to add 2 records to the work.my_data dataset.
data work.my_data_to_add; input name $ salary birthdate date9.; format salary dollar12.0 birthdate date9.; datalines; Meghan 52000 04APR1989 Joe 48000 27NOV1994 ; run; data work.my_data; set work.my_data work.my_data_to_add; run;
Method 5: DATA STEP & OUTPUT Statement
The fifth way to add a row to a table is with a Data Step and the OUTPUT statement.
This method adds rows to a dataset after SAS has read and processed all rows of the dataset in the SET statement. We use the end keyword to check if SAS has processed the last row and, if so, add the new row(s). We define the values of the new row and use the OUTPUT statement to send these values to the new dataset.
If you want to know more about the end keyword, you can check this article.
Syntax – Remarks
Because this method processes first all rows from the input dataset before adding the new rows, this method can be slow. However, the code is very easy to understand and modify. Moreover, you can create new columns when you add new rows. Finally, the table names in the DATA statement and the SET statement can be different. So, this method doesn’t necessarily overwrite the original dataset.
In this example, we use the Data Step and the OUTPUT statement to insert a row into the SAS dataset work.my_data.
data work.my_data; set work.my_data end=eof; output; if eof then do; name = "Meghan"; salary = 52000; birthdate = "04APR1989"d; output; name = "Joe"; salary = 48000; birthdate = "27NOV1994"d; output; end; run;
Summary: Insert a Row into a SAS Dataset
To conclude this article, we provide a table that compares the characteristics of the 5 methods. You can use this table to decide which method is best for your needs.
|Method||Overwrites Dataset||Efficiency||Lines of Code||Add New Column(s)|
|SELECT INTO & SET Statement||Yes||High||Low||No|
|SELECT INTO & VALUES Statement||Yes||High||Low||No|
|DATA STEP & SET Statement||Not necessarily||Low||Medium||Yes|
|DATA STEP & OUTPUT Statement||Not necessarily||Low||Medium||Yes|