/* The first few examples have been seen before, in the SAS_if_then.sas
   program and in one of the assignments. The larger data set contains
   physical fitness measurements, and the variables are Age (years),
   Weight (kg), Oxygen intake rate (ml per kg body weight per minute),
   time to run 1.5 miles (minutes), heart rate while resting, heart rate
   while running (measured at the same time as the oxygen rate), and
   maximum heart rate recorded while running. The smaller data set
   contains the variables sex and bmi. */

options ls=100;

filename foo1 url "http://www.uvm.edu/~abh/stat295/datasets/fitness2.dat";
filename foo2 url "http://www.uvm.edu/~abh/stat295/datasets/bmi.dat";
filename foo3 url "http://www.uvm.edu/~abh/stat295/datasets/bmi2.dat";

data fitness;
   infile foo1;
   input id Sex $1. Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse;
run;

proc print;
   title "Fitness Data";
run;

data bmi;
   infile foo2;
   input Sex $1. bmi;
run;

proc print;
   title "BMI Data";
run;

/* The following code uses the MERGE statement to combine the fitness
   and BMI data sets. It is an example of a one-to-one merge: the
   resulting data set matches the first observation from the fitness
   data with the first observation from the BMI data set, the second
   with the second, and so on. Matching continues until both data sets
   run out of observations. The resulting data set has 31 observations
   because the fitness and BMI data sets both have 31 observations.
   However, we must take it on faith that the BMI values are matched to
   the correct person, since there is no way to identify the
   observations in the BMI data set. */

data fit_bmi;
   merge fitness bmi;
run;

proc print;
   title "Fitness and BMI data merged";
run;

/* The following code is an example of a one-to-one merge (no BY
   statement) when the data sets do not have the same number of
   observations. The resulting data set will have missing values for
   the variables in the smaller data set after the smaller data set
   runs out of observations.
   It will also overwrite the values of id from the fitness data with
   those from the BMI data set. This time we are using a BMI data set
   that includes an ID variable. */

data bmi2;
   infile foo3;
   input id Sex $1. bmi;
run;

proc print;
   title "BMI Data with ID";
run;

data fit_bmi2;
   merge fitness bmi2;
run;

proc print data=fit_bmi2;
   title "Fitness and BMI data merged with unequal data set sizes";
run;

/* The following code is an example of match merging with a BY
   statement. Both data sets must be sorted by the BY variable before
   merging. Match merging with a BY variable ensures that each BMI value
   is matched to the person with the same id in the fitness data. */

proc sort data=fitness;
   by id;
run;

proc sort data=bmi2;
   by id;
run;

data fit_bmi2;
   merge fitness bmi2;
   by id;
run;

proc print;
   title "Fitness and BMI data merged by id";
run;

/* It is often necessary to calculate descriptive statistics (sum,
   mean, etc.) by groups and merge these group statistics back into the
   original data for further calculation and/or analysis. First, sort
   the data, then use PROC MEANS to obtain the descriptive statistics
   and output them to a SAS data set. */

proc sort data=bmi2;
   by sex;
run;

/* The NOPRINT option of the PROC MEANS statement suppresses all printed
   output. The OUTPUT statement creates the SAS data set containing the
   bmi means. The MEAN= option creates a variable containing the mean
   for each sex and names it bmimean. */

proc means data=bmi2 noprint;
   by sex;
   var bmi;
   output out=stats mean=bmimean;
run;

/* The data set "stats" contains the bmi mean for each sex as well as
   two automatic variables, _TYPE_ and _FREQ_, created by PROC MEANS.
   These variables can be dropped when the data set is created by using
   the DROP= data set option:

      output out=stats(drop=_TYPE_ _FREQ_) mean=bmimean;
*/

/* Let's make sure that the original data set is sorted in the proper
   order before we do the merge. It is OK to include a PROC SORT just
   to be sure that the data are sorted properly.
   If SAS determines that the data set is already sorted in that order,
   it will not actually sort it again. */

proc sort data=bmi2;
   by sex;
run;

data bmi3;
   merge bmi2 stats;
   by sex;
run;

proc print;
   title "BMI data with means for each sex merged back in";
run;

/* Merging simple descriptive statistics back into the original data
   was the impetus for creating this course. A student was running
   PROC MEANS to get sums and then putting the sums back into the
   original data with IF...THEN statements (80 of them!). A three-line
   DATA step would do what the 80+ lines of code had been doing. */
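
/* A possible follow-up, offered only as an illustrative sketch: the
   data set names stats2 and bmi4 and the variable bmidev are
   hypothetical and not part of the original program. It drops the
   automatic _TYPE_ and _FREQ_ variables when the statistics data set
   is created, merges the means back in by sex, and then uses them in
   a further calculation: each person's deviation from the mean bmi
   for his or her sex. It assumes bmi2 is still sorted by sex, as it
   is after the PROC SORT above. */

proc means data=bmi2 noprint;
   by sex;
   var bmi;
   /* DROP= removes the automatic variables as stats2 is written */
   output out=stats2(drop=_TYPE_ _FREQ_) mean=bmimean;
run;

data bmi4;
   merge bmi2 stats2;
   by sex;
   /* hypothetical variable: deviation from the sex-specific mean */
   bmidev = bmi - bmimean;
run;

proc print data=bmi4;
   title "BMI data with deviations from the mean for each sex";
run;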