This web page simply contains SAS code for the treatment of missing data. Because SAS runs primarily from code, rather than from a GUI, it is much easier to show you how to do the analysis. For a more extensive discussion of this data set and how multiple imputation works, see MissingDataNorm.html or Missing.html.
The data file is named CancerNoHeadDot.dat and contains the following variables related to behavior problems among children who have a parent with cancer. (The "Dot" in the file name is there to remind me that this file uses "." for missing data, which is the standard notation for missing data in SAS. Unfortunately SPSS, for some odd reason, recently decided that it no longer likes periods in prepared data files. The "NoHead" tells me that the names of the variables are not to be found in Line 1.) Several of the variables in this example relate to the parent (patient) with cancer. The other variables relate to the spouse of the patient. The variable names are, in order, SexP (sex of parent), DeptP (parent's depression T score), AnxtP (parent's anxiety T score), GSItP (parent's global symptom index T score), DeptS, AnxtS, GSItS (the same variables for the spouse), SexChild (sex of child), and Totbpt (total behavior problem T score for the child). These are a subset of a larger dataset, and the analysis itself has no particular meaning. I just needed a bunch of data and I grabbed an available file related to a research project with which I was involved. We will assume that we want to predict the child's Total Behavior Problem T score as a function of the other variables. I no longer recall whether the missing values were actually missing or whether I deleted a bunch of values to create an example.
The first few cases are shown below. Notice that variable names are NOT included in the first line. Missing data are indicated by ".".
2 50 52 52 44 41 42 . .
1 65 55 57 73 68 71 1 60
1 57 67 61 67 63 65 2 45
2 61 64 57 60 59 62 1 48
2 61 52 57 44 50 50 1 58
1 53 55 53 70 70 69 . .
2 64 59 60 . . . . .
1 53 50 50 42 38 33 2 52
2 42 38 39 44 41 45 . .
2 61 61 55 44 50 42 1 51
1 44 50 42 42 38 43 . .
2 57 55 51 44 41 35 . .
. . . . 57 52 57 2 65
2 70 59 66 . . . 1 61
2 57 61 52 53 59 53 2 49
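A quick check of how much is missing can be sketched as follows. This is an illustration only: it assumes each case occupies one line of nine values in the order listed above, the data set name First15 is made up for the example, and the cases shown above are typed inline rather than read from the file.

```sas
* Sketch only. First15 is a hypothetical data set holding the cases shown above;
Data First15;
   Input SexP DeptP AnxtP GSItP DeptS AnxtS GSItS SexChild Totbpt;
   Datalines;
2 50 52 52 44 41 42 . .
1 65 55 57 73 68 71 1 60
1 57 67 61 67 63 65 2 45
2 61 64 57 60 59 62 1 48
2 61 52 57 44 50 50 1 58
1 53 55 53 70 70 69 . .
2 64 59 60 . . . . .
1 53 50 50 42 38 33 2 52
2 42 38 39 44 41 45 . .
2 61 61 55 44 50 42 1 51
1 44 50 42 42 38 43 . .
2 57 55 51 44 41 35 . .
. . . . 57 52 57 2 65
2 70 59 66 . . . 1 61
2 57 61 52 53 59 53 2 49
;
Run;

* N and NMISS count the present and missing values for each variable;
Proc Means data = First15 n nmiss;
Run;

* nimpute = 0 asks Proc MI to list the missing-data patterns without imputing;
Proc MI data = First15 nimpute = 0;
   Var SexP DeptP AnxtP GSItP DeptS AnxtS GSItS SexChild Totbpt;
Run;
```

Proc Means shows how many values each variable is missing, and Proc MI with nimpute = 0 shows which variables tend to be missing together, without creating any imputed data sets.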
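The code which follows will read the data and perform the necessary analysis. You will need to change the file path in the Infile statement so that it points to your copy of the data. The program first fits the regression to the original file with missing data; Proc Reg drops any case that is missing a value on a variable in the model. It then imputes the data and fits the regression again to each imputed data set. Proc MI does the imputation, while Proc MIANALYZE combines the results across the imputed data sets, averaging the coefficients and pooling their standard errors, to obtain the final result.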
Data Missing;
   Infile 'C:\Users\Dave\Dropbox\Webs\StatPages\More_Stuff\Missing_Data\CancerDot.dat';
   Input SexP DeptP AnxtP GSItP DeptS AnxtS GSItS SexChild Totbpt;
run;

Proc Reg data = Missing;    * This runs the analysis on the original data file;
   Model Totbpt = SexP DeptP AnxtP DeptS AnxtS;
Run;

Proc MI Data = Missing out = miout seed = 35399;
   Var SexP DeptP AnxtP GSItP DeptS AnxtS GSItS SexChild Totbpt;
Run;

Proc Reg data = miout outest = outreg covout ;    * Runs analyses for each imputation set;
   Model Totbpt = SexP DeptP AnxtP DeptS AnxtS;
   by _Imputation_;
Run;

proc MIANALYZE data = outreg;    * Averages across the 5 regressions from previous procedure;
   ModelEffects SexP DeptP AnxtP DeptS AnxtS intercept;
Run;
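The "averaging" in the last step follows the standard combining rules for multiple imputation (Rubin's rules). For a single regression coefficient they can be written as below. The notation is introduced here for illustration and is not taken from the SAS output: m is the number of imputed data sets (5 in the code above), \hat{Q}_i is the coefficient estimated from imputed data set i, and U_i is its squared standard error.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i && \text{(combined estimate)}\\
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m} U_i && \text{(within-imputation variance)}\\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2 && \text{(between-imputation variance)}\\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B && \text{(total variance)}
\end{align*}
```

The standard error of the combined coefficient is the square root of T, so it reflects both the ordinary sampling error within each regression and the extra uncertainty that comes from having to impute the missing values.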
I suggest that you run this code and compare the output with results you obtain from other programs to which I have referred. The results will not be exactly the same (in part because the procedures involve a random component), but they should be close. The output from my analysis can be seen as CancerPrintout.html. The printout can be a bit difficult to read because of the way that SAS breaks up the output to fit on a page.
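To get a feel for how much the random component matters, one option is to repeat the imputation with a different seed and more imputations, and then compare the combined estimates with the first run. The sketch below assumes that the data set Missing already exists from the Data step above. The seed 12345, the choice of 20 imputations, and the names miout2 and outreg2 are arbitrary choices made for illustration.

```sas
* Sketch only. Assumes the data set Missing was created by the Data step above;
* The seed, nimpute value, and output data set names are arbitrary choices;
Proc MI Data = Missing out = miout2 seed = 12345 nimpute = 20;
   Var SexP DeptP AnxtP GSItP DeptS AnxtS GSItS SexChild Totbpt;
Run;

* noprint suppresses the 20 separate regression printouts;
Proc Reg data = miout2 outest = outreg2 covout noprint;
   Model Totbpt = SexP DeptP AnxtP DeptS AnxtS;
   by _Imputation_;
Run;

* Combines the 20 sets of estimates into one set of results;
Proc MIANALYZE data = outreg2;
   ModelEffects SexP DeptP AnxtP DeptS AnxtS intercept;
Run;
```

If the combined coefficients and standard errors change only slightly from the first run, the random component is having little effect on the conclusions.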
Return to Dave Howell's Statistical Home Page
Send mail to: David.Howell@uvm.edu
Last revised 12/3/2012