Missing data are a part of almost all research, and we all have to decide how to deal with them from time to time. I have written two web pages on multiple regression with missing data, which you can see at (Missing Data Part One) and (Missing Data Part Two). Those pages use a set of data from a study that I was involved with some time ago and published as Epping-Jordan, Compas, & Howell (1994). It dealt with cancer outcomes (improved versus not improved) as a function of several variables. These include Survrate (a rating by the oncologist of the individual's expected survival time), Prognosis (a four-point scale), Amttreat (amount of treatment), GSI (the Global Symptom Index from Derogatis' Symptom Checklist 90), Avoid (a measure of avoidance behavior), and Intrus (a measure of intrusive thoughts). The original data are available at survrate.dat. For that example I doubled the sample size by randomly adding or subtracting random numbers from the data in the original set, which left me with the original 66 cases and an additional 66 pseudocases. I did this simply to create a better example. The full set of 132 cases had no missing observations, so those pages begin with a logistic regression on a full data set and use the results for comparison. Having a full data set, I randomly deleted 35 observations and replaced them with a missing data code. That file is available at survrateMissing999.dat. The 999 in the title indicates that for this particular set I used 999 as the missing data code. (For SAS I will change 999 to ".", and for R I will change 999 to "NA". For some software I will include variable labels in line 1, and for other software I will leave the labels out.) The results of a standard logistic regression on the data set with no missing values appear there; for that analysis I used SAS, but I could have done the same thing with any other program.

There are a
number of alternative ways of dealing with missing data, and this
document is an attempt to outline those approaches. The original
version of this document spent considerable space on using dummy
variables to code for missing observations. That idea was
popularized in the behavioral sciences by Cohen and Cohen (1983).
However, that approach does not produce unbiased parameter
estimates (Jones, 1996), and is no longer to be
recommended--especially in light of the availability of excellent
software to handle other approaches. For a very thorough
book-length treatment of the issue of missing data, I recommend
Little and Rubin (1987). A shorter treatment can be found in Allison (2001). Perhaps the nicest treatment of modern approaches can be found in Baraldi & Enders (2010).
I have recently written a chapter on missing data for an edited volume (Howell, 2007). Part of that paper forms the basis for some of what is found here. You can write me at David.Howell@uvm.edu
for a copy of that missing data chapter. I am in the process of revising this page by breaking it into at least two pages. It has grown much too long, and probably no one is eager to read all the way through it. When I am done, this page will cover missing data in general and focus primarily on the situation where we either look for ways to use the data in their original form, or use traditional missing data techniques such as listwise deletion and mean replacement. I will cover situations that involve both multiple linear regression and the analysis of variance. The next document (Missing Data Part Two) focuses on newer data imputation methods, which replace the missing data with a best guess at what each value would have been had we been able to obtain it. This is the material that most people now think of under the heading of "missing data," but the former material is still important and often very useful. If your interest is in missing data in a repeated measures ANOVA, you will find useful material at http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Mixed Models for Repeated Measures.pdf .

1.1 The nature of missing data

Missing completely at random

There are several reasons why data may be missing. They
may be missing because equipment malfunctioned, the weather was
terrible, people got sick, or the data were not entered
correctly. Here the data are missing completely at random
(MCAR). When we say that data are missing completely at
random, we mean that the probability that an observation
(Xi) is missing is unrelated to the value of
Xi or to the value of any other variables. Thus data
on family income would not be considered MCAR if people
with low incomes were less likely to report their family income
than people with higher incomes. Similarly, if Whites were more
likely to omit reporting income than African Americans, we again
would not have data that were MCAR because missingness would be
correlated with ethnicity. However if a participant's data were
missing because he was stopped for a traffic violation and missed
the data collection session, his data would presumably be missing
completely at random. Another way to think of MCAR is to note
that in that case any piece of data is just as likely to be
missing as any other piece of data. Notice that it is the value of the observation, and not its
"missingness," that is important. If people who refused to report
personal income were also likely to refuse to report family
income, the data could still be considered MCAR, so long as
neither of these had any relation to the income value itself.
This is an important consideration, because when a data set
consists of responses to several survey instruments, someone who
did not complete the Beck Depression Inventory would be missing
all BDI subscores, but that would not affect whether the data can
be classed as MCAR.

The nice feature of data that are MCAR is that the analysis
remains unbiased. We may lose power for our design, but the
estimated parameters are not biased by the absence of data.

Missing at random

Often data are not missing completely at random, but
they may be classifiable as missing at random (MAR). (MAR
is not really a good name for this condition because most people
would take it to be synonymous with MCAR, which it is not. However,
the label has stuck.) Let's back up one step. For
data to be missing completely at random, the probability
that Xi is missing is unrelated to the value of
Xi or other variables in the analysis. But the data
can be considered as missing at random if the data meet the
requirement that missingness does not depend on the value of
Xi after controlling for another variable. For
example, people who are depressed might be less inclined to
report their income, and thus reported income will be related to
depression. Depressed people might also have a lower income in
general, and thus when we have a high rate of missing data among
depressed individuals, the existing mean income might be lower
than it would be without missing data. However, if, within depressed patients, the probability of reporting income is unrelated to income level, then the data would be considered MAR,
though not MCAR. Another way of saying this is that, to the extent that missingness is correlated only with other variables that are included in the analysis, the data are MAR.

The phraseology is a bit awkward here because we tend to
think of randomness as not producing bias, and thus might well
think that Missing at Random is not a problem. Unfortunately it
is a problem, although in this case we have ways of dealing with
the issue so as to produce meaningful and relatively unbiased
estimates. But just because a variable is MAR does not mean that you can simply forget about the problem. Nor does it mean that you have to throw up your hands and declare that there is nothing to be done.

The situation in which the data are at least MAR is sometimes referred to as ignorable missingness. This name comes about because for those data we can still produce unbiased parameter estimates without needing to provide a model to explain missingness. Cases of MNAR, to be considered next, could be labeled cases of nonignorable missingness.

Missing Not at Random

If data are not MCAR or MAR, then
they are classed as Missing Not at Random (MNAR). For
example, if we are studying mental health and people who have
been diagnosed as depressed are less likely than others to report
their mental status, the data are not missing at random. Clearly
the mean mental status score for the available data will not be
an unbiased estimate of the mean that we would have obtained with
complete data. The same thing happens when people with low income
are less likely to report their income on a data collection
form.

When we have data that are MNAR, we have a problem. The only
way to obtain an unbiased estimate of parameters is to model
missingness. In other words we would need to write a model that
accounts for the missing data. That model could then be
incorporated into a more complex model for estimating missing
values. This is not a task anyone would take on lightly. See
Dunning and Freedman (2008) for an
example. However, even if the data are MNAR, all is not lost. Our estimators may be biased, but the bias may be small.
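If it helps to see the three mechanisms side by side, here is a small sketch in R. The data are entirely invented for the illustration (an income variable related to a depression score); nothing here comes from any file discussed on this page.

    # Generate data, then delete income under each of the three mechanisms.
    set.seed(42)
    n          <- 10000
    depression <- rnorm(n)                          # always observed
    income     <- 50 - 5 * depression + rnorm(n, sd = 5)

    inc.mcar <- income                              # MCAR: same chance for everyone
    inc.mcar[runif(n) < .20] <- NA

    inc.mar <- income                               # MAR: depends on depression only
    inc.mar[runif(n) < ifelse(depression > 0, .35, .05)] <- NA

    inc.mnar <- income                              # MNAR: low incomes go unreported
    inc.mnar[runif(n) < ifelse(income < 50, .35, .05)] <- NA

    # The MCAR mean is essentially unbiased. The MAR and MNAR means are both
    # pulled upward, but only the MAR bias can be removed by conditioning on
    # depression, which we have in hand.
    round(c(full = mean(income),
            mcar = mean(inc.mcar, na.rm = TRUE),
            mar  = mean(inc.mar,  na.rm = TRUE),
            mnar = mean(inc.mnar, na.rm = TRUE)), 2)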
1.2 Traditional treatments for missing data

The simplest approach--listwise deletion

By far the most common approach to missing data is to simply omit those cases with missing data and to run our analyses on what remains. Thus
if 5 subjects in group one don't show up to be tested, that group
is 5 observations short. Or if 5 individuals have missing
scores on one or more variables, we simply omit those individuals
from the analysis. This approach is usually called listwise
deletion, but it is also known as complete case
analysis. Although listwise deletion often results in a substantial
decrease in the sample size available for the analysis, it does
have important advantages. In particular, under the assumption
that data are missing completely at random, it leads to unbiased
parameter estimates. Unfortunately, even when the data are MCAR
there is a loss in power using this approach, especially if we have to rule out a large number of subjects. And when the data
are not MCAR, bias results. (For example when low income
individuals are less likely to report their income level, the
resulting mean is biased in favor of higher incomes.) The
alternative approaches discussed below should be considered as
a replacement for listwise deletion, though in some cases we may
be better off to "bite the bullet" and fall back on listwise
deletion.
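For what it is worth, listwise deletion is the default behavior of most regression software. A minimal R sketch, with made-up variables, just to make the mechanics explicit:

    # Complete case (listwise) analysis: any row with an NA is dropped.
    set.seed(1)
    dat <- data.frame(income = rnorm(100, 50, 10),
                      satis  = rnorm(100,  5,  1),
                      age    = rnorm(100, 45,  8))
    dat$income[sample(100, 10)] <- NA               # poke some holes in the data
    dat$age[sample(100, 10)]    <- NA

    nrow(na.omit(dat))                              # cases that survive deletion

    # lm() quietly does the same thing through its default na.action,
    # so the fit below is based only on the complete cases.
    fit <- lm(satis ~ income + age, data = dat)
    summary(fit)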
A poor approach--pairwise deletion

Many computer packages offer the option of using what is generally known as pairwise deletion but has also been called
"unwise" deletion. Under this approach each element of the
intercorrelation matrix is estimated using all available data. If
one participant reports his income and life satisfaction index,
but not his age, he is included in the correlation of income and
life satisfaction, but not in the correlations involving age. The
problem with this approach is that the parameters of the model
will be based on different sets of data, with different sample
sizes and different standard errors. It is even quite possible to
generate an intercorrelation matrix that is not positive
definite, which is likely to bring your whole analysis to a
stop. It has been suggested that if there are only a few missing
observations it doesn't hurt anything to use pairwise deletion.
But I would argue that if there are only a few missing
observations, it doesn't hurt much to toss those participants
out and use complete cases. If there are many missing
observations you can do considerable harm with either analysis.
In both situations the approaches given below are generally
preferable.
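Continuing the little data frame from the previous sketch, R will happily compute a pairwise-deletion correlation matrix, and you can check for the positive-definiteness problem directly:

    # Pairwise deletion: each correlation uses every case available to it.
    r.pair <- cor(dat, use = "pairwise.complete.obs")
    r.list <- cor(dat, use = "complete.obs")        # listwise, for comparison

    # Because the entries rest on different subsets of cases, the matrix
    # is not guaranteed to be positive definite; a negative eigenvalue
    # here would stop a regression or factor analysis in its tracks.
    eigen(r.pair)$values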
1.3 Other Not-So-Good Approaches

I want to talk about a few approaches that are sometimes used and that we know are not very wise choices. It is important to
talk about these in order to discourage their use, but more importantly because they lead logically to modern approaches that are very much better.

Hot deck imputation

Hot deck imputation goes back over 50 years and was used quite
successfully by the Census Bureau and others in better times. I have included it here partly for historical reasons and partly because it represents one approach to replacing missing data. In
the 1940's and '50's most citizens seemed to feel that they had a
responsibility to fill out surveys, and, as a result, relatively
little data was missing. Suppose that in the 1950 census a young,
black, male resident of census block 32a either was not available
or refused to participate. The census bureau would simply take a
stack of Hollerith cards (you may know them as "IBM cards"--or you may not even know them at all) that
came from young, black males in census block 32a, reach in the
pile, and pull one out at random. That card was substituted for the missing
card and the analysis continued. That is not as outrageous a
procedure as it might seem at first glance. First of all there
were relatively few missing observations to be replaced. Second, the replacement data were a random draw from a collection of data on similar participants. Third, the statistical implications of
this process were thought to be pretty well understood. I don't
believe that hot deck imputation is much used anymore, but it
served its purpose at the time. Scheuren (2005) has an
interesting discussion of how the process was developed within
the U. S. Census Bureau.
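Although the procedure is little used now, the idea is easy to sketch. The function below is a bare-bones illustration of my own, not anything the Census Bureau actually ran: for each missing value it draws a donor at random from respondents in the same demographic cell.

    # A toy hot-deck imputation: random donor from the same cell.
    hot.deck <- function(y, cell) {
      for (i in which(is.na(y))) {
        donors <- y[!is.na(y) & cell == cell[i]]
        # sample.int avoids R's sample() surprise when only one donor exists
        if (length(donors) > 0) y[i] <- donors[sample.int(length(donors), 1)]
      }
      y
    }

    set.seed(3)
    cell   <- rep(c("A", "B"), each = 10)           # two demographic cells
    income <- c(rnorm(10, 40, 5), rnorm(10, 60, 5))
    income[c(4, 15)] <- NA
    hot.deck(income, cell)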
Mean substitution

An old procedure that should certainly be relegated to the past was the idea of substituting a mean for the missing data.
For example, if you don't know my systolic blood pressure, just
substitute the mean systolic blood pressure for mine and
continue. There are a couple of problems with this approach. In
the first place it adds no new information. The overall mean,
with or without replacing my missing data, will be the same. In
addition, such a process leads to an underestimate of error.
Cohen et al. (2003) gave an interesting example of a data set on
university faculty. The data consisted of salary and the citation level of publications. There were 62 cases with complete data and 7 cases for which the citation index was missing. Cohen gives the following table.

    Analysis            N    r     b        St. Err. b
    Complete cases      62   .55   310.747  60.95
    Mean substitution   69   .54   310.747  59.13

Notice that using mean substitution makes only a trivial
change in the correlation coefficient and no change in the
regression coefficient. But the standard error of b is noticeably smaller
using mean substitution. That should not be surprising. We have
really added no new information to the data but we have increased
the sample size. The effect of increasing the sample size is to
increase the denominator for computing the standard error, thus
reducing the standard error. Adding no new information certainly
should not make you more comfortable with the result, but this
would seem to. The reduction is spurious and should be
avoided--as we'll see below.
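You can watch the standard error shrink for yourself. The simulation below is mine, not Cohen's data, but it mirrors the structure of the table above: a regression with the predictor missing for a handful of cases, run once on the complete cases and once with mean substitution.

    # Mean substitution adds cases but no new information about the slope.
    set.seed(5)
    citation <- rnorm(69, 50, 10)
    salary   <- 40000 + 310 * citation + rnorm(69, sd = 15000)
    citation[63:69] <- NA                           # 7 cases lose the predictor

    cit.mean <- ifelse(is.na(citation), mean(citation, na.rm = TRUE), citation)

    # lm() drops the 7 incomplete cases, so the first fit uses n = 62;
    # the second pretends to have n = 69. Compare the standard errors.
    summary(lm(salary ~ citation))$coefficients
    summary(lm(salary ~ cit.mean))$coefficients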
Regression substitution

If we don't like mean substitution, why not try using linear regression to predict what the missing score should be on the
basis of other variables that are present? We use existing variables to make a prediction, and then substitute that predicted value as if it were an actual obtained value. This approach has been
around for a long time and has at least one advantage over mean
substitution. At least the imputed value is in some way
conditional on other information we have about the person. With
mean substitution, if we were missing a person's weight we assigned
him the average weight. Put somewhat incorrectly, with regression
substitution we would assign him the weight of males of around
the same age. That has to be an improvement. But the problem of
error variance remains. By substituting a value that is perfectly
predictable from other variables, we have not really added more
information but we have increased the sample size and reduced the
standard error. There is one way out of this difficulty, however, and it is known as stochastic regression imputation. The approach adds a residual term, randomly sampled from the normal (or some other) distribution, to each imputed value. SPSS has
implemented this in their Missing Value Analysis procedure. By
default that procedure adds a bit of random error to each
substitution. That does not totally eliminate the problem, but it
does reduce it. There are better ways, however, and they build on
this simple idea.
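Here is what that looks like in miniature. The variables are invented, and this is only the single-imputation version of the idea, but it shows the one-line difference between regression imputation and its stochastic cousin:

    # Regression imputation, with and without a random residual.
    set.seed(7)
    age    <- rnorm(200, 40, 10)
    weight <- 60 + 0.5 * age + rnorm(200, sd = 8)
    weight[sample(200, 40)] <- NA

    fit  <- lm(weight ~ age)                        # fit on the complete cases
    pred <- predict(fit, newdata = data.frame(age = age))

    w.reg <- ifelse(is.na(weight), pred, weight)    # plain regression imputation
    w.sto <- ifelse(is.na(weight),                  # stochastic version
                    pred + rnorm(200, sd = summary(fit)$sigma), weight)

    # Plain imputation understates the spread; the stochastic version
    # restores roughly the right amount of error variance.
    c(observed   = sd(weight, na.rm = TRUE),
      regression = sd(w.reg),
      stochastic = sd(w.sto))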
1.4 The Special Case of Missing Group Membership

I am about to make the distinction between regression and ANOVA models. This may not be the distinction that others might
make, but it makes sense for me. I am really trying to
distinguish between those cases for which group membership is
unknown and cases in which the substantive variables are
unknown.

Missing Identification of Group Membership

I will begin with a discussion of an approach that probably
won't seem very unusual. In experimental research we usually know
which group a subject belongs to because we specifically assigned
them to that group. Unless we somehow bungle our data, group
membership is not a problem. But in much applied research we
don't always know group assignments. For example, suppose that we
wanted to study differences in optimism among different religious
denominations. We could do as Sethi and Seligman (1993) did and
hand out an optimism scale in churches and synagogues, in which
case we have our subjects pretty well classified with respect to
religious affiliation because we know where we recruited them.
However we could instead simply hand out the optimism scale to
many people on the street and ask them to check off their
religious affiliation. Some people might check "None," which is a
perfectly appropriate response. But others might think that their
religious affiliation is not our business, and refuse to check
anything, leaving us completely in the dark. I would be hard
pressed to defend the idea that this is a random event across all
religious categories, but perhaps it is. Certainly "no response"
is not the same as a response of "none," and we wouldn't want to
treat it as if it were.

The most obvious thing to do in this situation would be to drop all of those non-responders from the analysis, and to try to convince ourselves that these are data missing completely at random. (Even if we did convince ourselves, I doubt we would fool our readers.) But a better approach is to make use of the fact that non-response is itself a bit of data, and to put those subjects into a group of their own. We would then have a specific test of the null hypothesis that non-responders are no different from other subjects in terms of their optimism score. And once we establish that this null hypothesis is reasonable (if we should), we can then go ahead and compare the rest of the groups with somewhat more confidence. On the other hand, if we find that the non-responders differ systematically from the others on optimism, then we need to take that into account in interpreting differences among the remaining groups.

An Example

I will take the data from the study by Sethi
and Seligman (1993) on optimism and religious fundamentalism as
an example, although I will assume that data collection
involved asking respondents to check a box specifying their religious affiliation.
These are data that I created and analyzed elsewhere to match
the results that Sethi and Seligman obtained, although for
purposes of this example I will alter those data so as to
remove "Religious Affiliation" from 30 cases. I won't tell you
whether I did this randomly or systematically, because the
answer to that will be part of our analysis. The data for this
example are contained in a file named FundMiss.dat, which is available for
downloading, although it is much too long to show here. (The
variables are, in order, ID, Group (string variable),
Optimism, Group Number (a numerical coding of Group),
Religious Influence, Religious Involvement, Religious Hope,
and Miss (to be explained later).) We will assume that when
respondents are missing any data, the data are missing on
Group membership and on all three religiosity variables, but
not on Optimism. Essentially our respondents are telling us that their religious beliefs are not our business. (Missing values are designated here with a
period (.). If your software doesn't like periods as missing
data (and SPSS no longer does), you can take any editor and change periods to asterisks
(*), or blanks, or 999s, or whatever it does like. R uses the
symbol "NA" for missing observations.) This is the
kind of result you might find if the religiosity variables all
come off the same measurement instrument and that instrument
also has a place to record religious affiliation. We see cases
like this all the time. The dependent variable for these
analyses is the respondent's score on the Optimism scale, and
the resulting sample sizes, means, and standard deviations are
shown in Table 1, as produced by SPSS.

Table 1
- - Description of Subpopulations - -

Summaries of OPTIMISM
By levels of GROUPNUM (Group Membership)

Variable  Value  Label            Mean    Std Dev  Cases
For Entire Population             2.1633  3.2053   600
GROUPNUM  1      Fundamentalist   3.0944  2.8573   180
GROUPNUM  2      Moderate         1.9418  3.1629   275
GROUPNUM  3      Liberal           .8783  3.2985   115
GROUPNUM  4      Missing          3.5333  3.1919    30

Total Cases = 600

From this table we see that there are substantial differences among the three groups for whom Religious Affiliation is known. We also see that the mean for the Missing subjects is much closer to the mean of the Fundamentalists than to the other means, which might suggest that Fundamentalists were more likely to refuse to provide a religious affiliation than were members of the other groups.

The results of an analysis of variance on the Optimism scores of all four groups are presented in Table 2. Here I have asked SPSS
to use what are called "Simple Contrasts" with the last (missing)
group as the reference group. This will cause SPSS to print out a
comparison of each of the first three groups with the Missing
group. I chose to use simple contrasts because I wanted to see
how Missing subjects compared to each of the three non-missing
groups.

At the top of Table 2 you see the means of the four groups, as well as the unweighted mean of all groups (labeled Grand Mean). Next in the table is an analysis of variance summary table, showing that there are significant differences between groups (F(3, 599) = 14.395, p = .000).

A moment's calculation will show you that the difference
between the mean of Fundamentalists and the mean of the Missing
group is 3.094 - 3.533 = -0.439. Similarly the Moderate group
mean differs from the mean of the Missing group by 1.942 - 3.533
= -1.591, and the Liberal and Missing means differ by 0.878 -
3.533 = -2.655. Thus participants who do not give their religious
affiliation have Optimism scores that are much closer to those of
Fundamentalists than those of the other affiliations. In the section of the table labeled "Parameter Estimates" we
see the coefficients of -.439, -1.592, and -2.655. You should note
that these coefficients are equal to the difference between each
group's mean and the mean of the last (Missing) group. Moreover,
the t values in that section of the table represent a
significance test on the deviations from the mean of the Missing
group, and we can see that Missing deviates significantly from
Moderates and Liberals, but not from Fundamentalists. This
suggests to me that there is a systematic pattern of non-response, which we must keep in mind when we evaluate our data.
Subjects are not missing at random because missingness depends on
the value of that variable. (Notice that the coefficient for
missing is set at 0 and labeled "redundant." It is redundant
because if someone is not in the Fundamentalist, Moderate, or
Liberal group, we know that they are missing. "Missing," in this
case, adds no new information.)
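If you would rather reproduce this style of analysis in R than in SPSS, the sketch below shows the key moves: promote the missing code to a factor level of its own, and then ask for treatment (simple) contrasts with the Missing group as the reference. The data here are a small invented stand-in, not FundMiss.dat.

    # Treat non-response as a group and compare the others against it.
    grp <- factor(c("Fund", "Mod", NA, "Lib", NA, "Fund", "Mod", "Lib",
                    "Fund", "Mod", "Lib", NA))
    grp <- addNA(grp)                               # NA becomes a real level
    levels(grp)[is.na(levels(grp))] <- "Missing"
    optimism <- c(3.2, 1.9, 3.6, 0.9, 3.4, 2.8, 2.1, 1.1, 3.0, 1.7, 0.8, 3.7)

    # Simple contrasts: each named group versus the last (Missing) level.
    contrasts(grp) <- contr.treatment(levels(grp), base = nlevels(grp))
    summary(lm(optimism ~ grp))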
You might be inclined to suggest that the previous analysis doesn't give us exactly what we want because it does not tell us
about relationships among the three groups having non-missing
membership. In part, that's the point, because we wanted to
include all of the data in a way that told us something about
those people who failed to respond, as well as those who did
supply the necessary information.

I am going to move slightly away from the problem of missing data in order to make this example more complete. If you prefer, you can jump to the next main heading. For those who want to focus on those subjects who
provided Religious Affiliation while not totally ignoring those
who did not, an alternative analysis would involve the use of
orthogonal contrasts not only to compare the non-responders with
all responders, but also to make specific comparisons among the
three known groups. But keep in mind that because the data are
not MCAR, the means, particularly the grand mean, are likely to be
biased. (If Fundamentalists are less likely to respond, and if
they have higher optimism scores, the grand mean of optimism will
be biased downward from what it would have been had they
responded.) You can use SPSS (OneWay) or any other program to perform the
contrasts in question. (Or you can easily do it by hand.) Suppose
that I am particularly interested in knowing how the
non-responders differ from the average of all responders, but
that I am also interested in comparing the Liberals with the other two identified groups, and then the Fundamentalists with the Moderates. I can run these contrasts by providing SPSS with
the following coefficients.

Orthogonal Contrasts
  Missing vs. Non-Missing                   1   1   1  -3
  (Fundamentalist & Moderate) vs. Liberal   1   1  -2   0
  Fundamentalist vs. Moderate               1  -1   0   0

The first contrast deals with those missing responses that
have caused us a problem, and the second and third contrasts deal
with differences among the identified groups. The results of this
analysis are presented below. (I have run this using SPSS syntax because it produces more useful printout.)

ONEWAY
  optimism BY groupnum(1 4)
  /CONTRAST = 1 1 1 -3
  /CONTRAST = 1 1 -2 0
  /CONTRAST = 1 -1 0 0
  /HARMONIC NONE
  /FORMAT NOLABELS
  /MISSING ANALYSIS .

Table 3  One-Way Analysis of Variance on Optimism with Orthogonal Contrasts

Notice in Table 3 that the contrasts are computed both with and without pooling of error terms. In our particular case the variances are sufficiently equal to allow us to pool error but, in fact, for these data it would not make any important difference to the outcome which analysis we used. In Table 3 you will see that all of the contrasts are significant. This means that non-responders are significantly different from (and more optimistic than) responders, that Fundamentalists and Moderates combined are more optimistic than Liberals, and that Fundamentalists are in turn more optimistic than Moderates.

I have presented this last analysis to make the point that you
have not lost a thing by including the missing cases in your
analysis relative to running the analysis excluding missing
observations. The second and third contrasts are exactly the same
as you would have run if you had only used the three identified
groups. However, this analysis includes the variability of
Optimism scores from the Missing group in determining the error
term, giving you somewhat more degrees of freedom. In a sense,
you can have your cake and eat it too, although, as I noted
above, the overall mean is biased relative to what it would have
been had we collected complete data.
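The same three contrasts can be run in R. The sketch below fabricates data to match the group sizes and means in Table 1 (so the numbers will only approximate the SPSS output), and it uses the pooled error term:

    # Orthogonal contrasts on the four groups, Missing included.
    grp <- factor(rep(c("Fund", "Mod", "Lib", "Missing"),
                      times = c(180, 275, 115, 30)),
                  levels = c("Fund", "Mod", "Lib", "Missing"))
    set.seed(9)
    optimism <- rnorm(600, mean = c(3.09, 1.94, 0.88, 3.53)[as.numeric(grp)],
                      sd = 3.2)

    contrasts(grp) <- cbind(c(1,  1,  1, -3),       # responders vs. Missing
                            c(1,  1, -2,  0),       # Fund & Mod vs. Liberal
                            c(1, -1,  0,  0))       # Fund vs. Moderate
    fit <- aov(optimism ~ grp)
    summary.lm(fit)                                 # one t test per contrast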
Cohen and Cohen (1983, Chapter 7) provide additional comments on the treatment of missing group membership, and you might look
there for additional ideas. In particular, you might look at
their treatment of the case where there is missing information on
more than one independent variable. This situation, where data on group membership is missing, is
handled by the analysis above. Notice that, other than the
overall mean, the analysis is not dependent on the nature of the
mechanism behind missingness, which is in fact addressed by the
analysis. This will not necessarily be the case in the following
analysis, where the nature of missingness is important.

1.5 The More General Case of Missing Dependent Variables

We have a different kind of problem when data are missing on the dependent variable, a problem that makes the results of our study much more difficult to interpret. If our data are in the form of
a one-way analysis of variance, and if we can assume that data
are missing completely at random, things are not particularly
bad. We will lose power because of smaller sample sizes, and the
means of larger groups will be estimated with less error than
means of smaller groups, but we will not have problems with
biased estimates. But keep in mind that I'm speaking here of data
that are missing completely at random. But suppose that our data are not missing completely at
random. Suppose that we are comparing two treatments for
hypertension. In the ideal study we have all participants take
the medication they are prescribed and then we compare blood
pressure levels at the end of treatment. But in the real world we
know that there is usually a dropout problem in medical studies.
In particular, those who are not helped by the treatment are more
likely to drop out, or to die. If one drug is quite successful
and the other is pretty much a failure, the sample size will be
very much smaller in the second treatment. Moreover, those who
remain, and whose blood pressure is eventually measured, are
likely to be the ones who benefitted from the treatment. So if we
see that the means of the two groups are nearly equal at the end
of treatment, we might be led to the conclusion that the two
treatments are equally effective. In fact, one was a horrible
treatment but we didn't have data from its "failures." In such a
setting missing data make the interpretation of means quite
risky. (Perhaps the most appropriate statistic would be the dropout rate instead of the mean.)
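A ten-line simulation makes the danger vivid. Everything below is invented: drug A truly lowers final blood pressure to about 130 and drug B leaves it near 160, but patients whose pressure remains high drop out before the final measurement.

    # MNAR dropout: the failures disappear, and the means converge.
    set.seed(11)
    bp.A <- rnorm(200, 130, 15)                     # effective drug
    bp.B <- rnorm(200, 160, 15)                     # ineffective drug

    stay.A <- bp.A < 140                            # only the "successes" remain
    stay.B <- bp.B < 140

    c(mean(bp.A[stay.A]), mean(bp.B[stay.B]))       # survivor means: far closer
                                                    # than the true 130 vs. 160
    c(mean(stay.A), mean(stay.B))                   # retention tells the real story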
Missing Data Imputation

This is where I am going to split off and create a separate web page on the problem of missing dependent variables. The techniques there are quite a bit more sophisticated than those we have seen so far, but with software that is now generally available, there is much that we can do to salvage our study. To continue, go to Missing Data Imputation.

Alternative Software Solutions

I have shown how to do this with NORM. I was asked by a former student if I could write
something that was a step-by-step approach to using NORM, and that document is available
at "MissingDataNorm.html". You can also do something similar with SPSS and with SAS. In addition, there is an R program called Amelia (in honor of Amelia Earhart). I have written (or will write) instructions for the use of those programs. An important point, however, is that each program uses its own algorithm for imputing data, and it is not always clear exactly what algorithm they are using. For all practical purposes it probably doesn't matter, but I would like to know.
The continuation page for this material can be found at Missing Data Part Two.
References
Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage Publications.

Baraldi, A. N. & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48, 5-37.

Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.

Dunning, T. & Freedman, D. A. (2008). Modeling selection effects. In Outhwaite, W. & Turner, S. (Eds.), Handbook of social science methodology. London: Sage.

Howell, D. C. (2007). The analysis of missing data. In Outhwaite, W. & Turner, S. (Eds.), Handbook of social science methodology. London: Sage.

Jones, M. P. (1996). Indicator and stratification methods for missing explanatory variables in multiple linear regression. Journal of the American Statistical Association, 91, 222-230.

Little, R. J. A. & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.

Schafer, J. L. & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst's perspective. Multivariate Behavioral Research, 33, 545-571.

Scheuren, F. (2005). Multiple imputation: How it began and continues. The American Statistician, 59, 315-319.

Sethi, S. & Seligman, M. E. P. (1993). Optimism and fundamentalism. Psychological Science, 4, 256-259.

Last revised 6/28/2015