/* Sometimes a single file contains records of different types, and each
   record carries an identifier, usually as its first piece of information.
   The following example reads a common form of multi-type data, in which
   there is information about the household and about each person in the
   household. Household records are identified by an H in the first column,
   while person records have a P in the first column. Our first attempt
   reads the record type, holds the pointer, and evaluates the value of
   REC_TYPE so that we can go on to read the appropriate type of data from
   the same record. */

options ls=90;

filename foo URL "http://www.uvm.edu/~abh/stat295/datasets/households.dat";

data hh;
    infile foo;
    input rec_type $1. @;
    /* The trailing @ sign above tells SAS to hold the pointer so that it
       can read more data on the same record with another input statement. */
    if rec_type = "H" then
        input hh_id zipcode:$5. housing_type year_built valuation;
    else if rec_type = "P" then
        input person_id age sex education occupation income;
run;

proc print data=hh;
    title "Household and person data";
run;

/* We see from the printout that something is not right. What we really
   want is one observation for each person in the data file, carrying all
   the information about the household and about the person. What we got
   instead was one observation per record: household observations with
   missing values for all the person variables, and person observations
   with missing values for all the household variables. We don't want to
   output the observation to the data set until we have some person data,
   so let's try outputting the observation after we read the person data. */

data hh;
    infile foo;
    input rec_type $1. @;
    if rec_type = "H" then
        input hh_id zipcode:$5.
              housing_type year_built valuation;
    else if rec_type = "P" then do;
        input person_id age sex education occupation income;
        output;
    end;
run;

proc print data=hh;
    title "Household and person data";
run;

/* This didn't work very well either. Now all the household data are
   missing for every person. Recall from our discussion of the "Program
   Data Vector" that SAS sets the variables read with INPUT to missing at
   the beginning of each loop through the data step. That is exactly what
   is happening here. We read values for the household variables on one
   pass through the data step, but by the time we get to the person
   record, we have looped through the data step again and SAS has reset
   those variables to missing. So what we need is a way to tell SAS not to
   set the household variables to missing in the Program Data Vector when
   it loops back to the beginning of the data step. The RETAIN statement
   is designed to do exactly that. */

data hh;
    infile foo;
    retain hh_id zipcode housing_type year_built valuation;
    input rec_type $1. @;
    if rec_type = "H" then
        input hh_id zipcode:$5. housing_type year_built valuation;
    else if rec_type = "P" then do;
        input person_id age sex education occupation income;
        output;
    end;
run;

proc print data=hh;
    title "Household and person data";
run;

/* This seems to work pretty well. Notice the order of the variables in
   the printout. The RETAIN statement affects the order just as the LENGTH
   statement did in previous examples: the order in which SAS encounters
   the variables in the data step determines the order in which they are
   written to the data set. Notice also that the value of REC_TYPE is
   always "P", because we only output an observation after we have read
   the person record. So we may as well not output the REC_TYPE variable.
   Two of the variables, valuation and income, hold dollar amounts, so we
   can provide a format for them.
   If we knew what the codes for housing_type, sex, education, and
   occupation were, we could construct formats for them with PROC FORMAT
   and associate these formats with the corresponding variables. */

data hh;
    infile foo;
    retain hh_id zipcode housing_type year_built valuation;
    input rec_type $1. @;
    if rec_type = "H" then
        input hh_id zipcode:$5. housing_type year_built valuation;
    else if rec_type = "P" then do;
        input person_id age sex education occupation income;
        output;
    end;
    drop rec_type;
run;

proc print data=hh;
    format valuation income dollar10.;
    title "Household and person data";
run;
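
/* A minimal sketch of the PROC FORMAT idea mentioned in the comment
   before the last data step. The code values and labels below are
   HYPOTHETICAL, invented only for illustration; the actual coding of
   housing_type and sex in households.dat is not documented here. The
   format names HOUSEFMT and SEXFMT are likewise made up for this sketch.
   Values not listed in a VALUE statement simply print as the raw number. */

proc format;
    /* Hypothetical codes: replace with the real codebook values. */
    value housefmt 1 = "Single family"
                   2 = "Apartment";
    value sexfmt   1 = "Male"
                   2 = "Female";
run;

proc print data=hh;
    /* Attach the illustrative formats for this printout only. */
    format valuation income dollar10.
           housing_type housefmt.
           sex sexfmt.;
    title "Household and person data with value labels";
run;

/* Putting the FORMAT statement in PROC PRINT leaves the stored data set
   unchanged. Placing the same FORMAT statement in the data step instead
   would associate the formats with the variables permanently. */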