/* Arrays are very useful for performing the same action on many variables.
   As we saw in a previous example, an array can be used to rearrange data
   from a multivariate form to a univariate form. For some people, it has been
   a common practice for many years to use a specific value to represent
   missing data in a raw data file. Ideally, this value is one that has no
   possibility of being a valid value in the data. For example, in a list of
   temperatures for a group of cities, one could represent a missing
   temperature with -999 since this is an impossible temperature. However, SAS
   will not know that this -999 represents a missing value, so we must recode
   these values to a period in order for SAS to work with the data
   correctly. */

data temperatures;
   input city $ Jan_temp Feb_temp Mar_temp Apr_temp May_temp Jun_temp
         Jul_temp Aug_temp Sep_temp Oct_temp Nov_temp Dec_temp;
   datalines;
Raleigh 40.5 -999 49.2 59.5 67.4 74.4 77.5 76.5 70.6 -999 50.0 41.2
Fargo 12.2 16.5 28.3 -999 57.1 66.9 71.9 70.2 60.0 50.0 32.4 18.6
Phoenix 52.1 55.1 -999 67.7 76.3 84.6 91.2 -999 83.8 72.2 59.8 52.5
;
run;

proc print data=temperatures;
   title "Monthly temperatures";
run;

/* Let's use an array to recode all the -999 values to the proper SAS missing
   value. */

data recoded;
   set temperatures;
   array t {12} Jan_temp -- Dec_temp;
   do i = 1 to 12;
      if t{i} = -999 then t{i} = .;
   end;
   drop i;
run;

proc print data=recoded;
   title "Missing values of -999 recoded";
run;

/* In the above example, we of course know how many months there are so we can
   tell SAS that there are 12 elements in the array (variables) and use 12 as
   the upper bound of the DO loop. If we want to recode a lot of variables
   using an array, but don't know (or don't want to count) exactly how many
   variables there are between the first and last variable inclusive, we can
   use a couple of tricks. First, we can use an asterisk to tell SAS that it
   should count how many array elements (variables) are in the array.
   Then we can use the dim() function as the upper bound of the DO loop. */

data recoded;
   set temperatures;
   array t {*} Jan_temp -- Dec_temp;
   do i = 1 to dim(t);
      if t{i} = -999 then t{i} = .;
   end;
   drop i;
run;

proc print data=recoded;
   title "Missing values of -999 recoded";
   title2 "Use of asterisk and dim() function";
run;

/* What if our data were not all numeric but had some character data in
   between the variables we want to recode? We can tell SAS in the ARRAY
   statement that the array should contain all the numeric variables in the
   data set. */

data temperatures;
   input Jan_temp Feb_temp Mar_temp Apr_temp May_temp city $ Jun_temp
         Jul_temp Aug_temp Sep_temp Oct_temp Nov_temp Dec_temp;
   datalines;
40.5 -999 49.2 59.5 67.4 Raleigh 74.4 77.5 76.5 70.6 -999 50.0 41.2
12.2 16.5 28.3 -999 57.1 Fargo 66.9 71.9 70.2 60.0 50.0 32.4 18.6
52.1 55.1 -999 67.7 76.3 Phoenix 84.6 91.2 -999 83.8 72.2 59.8 52.5
;
run;

proc print data=temperatures;
   title "Monthly temperatures";
run;

data recoded;
   set temperatures;
   array t{*} _NUMERIC_;
   do i = 1 to dim(t);
      if t{i} = -999 then t{i} = .;
   end;
   drop i;
run;

proc print data=recoded;
   title "Missing values of -999 recoded";
   title2 "Use of _NUMERIC_ option";
run;
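
/* A possible refinement, offered as a sketch rather than as part of the
   original example: _NUMERIC_ places every numeric variable in the data set
   into the array, so a numeric variable that is not a temperature would be
   recoded as well. A name range list of the form first-numeric-last limits
   the array to the numeric variables positioned between the two named
   variables, and it skips the character variable city. The data set name
   recoded_range is hypothetical and chosen only for this illustration. */

data recoded_range;
   set temperatures;
   /* Only the numeric variables from Jan_temp through Dec_temp, in the order
      they appear in the data set, become array elements. */
   array t{*} Jan_temp-numeric-Dec_temp;
   do i = 1 to dim(t);
      if t{i} = -999 then t{i} = .;
   end;
   drop i;
run;

proc print data=recoded_range;
   title "Missing values of -999 recoded";
   title2 "Use of a numeric name range list";
run;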