Parametric and Resampling Statistics (cont.):

The Absence of Random Assignment

The entire logic of randomization tests rests on the concept of random assignment. Whereas parametric tests rely on the idea of random sampling to justify parameter estimation, randomization tests rely on the idea of random assignment to justify randomizing (or shuffling) the data in line with the null hypothesis. If participants were randomly assigned to treatments, and if the null hypothesis is true, then a given score was equally likely to have fallen in any of the treatments. This means that under the null hypothesis, all assignments of scores to treatments (given the constraints on sample size) are equally probable.
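
To make that logic concrete, here is a minimal sketch of a two-sample randomization test in Python. The data and the function name are invented purely for illustration, and rather than enumerating every possible assignment of scores to treatments, it approximates the randomization distribution by shuffling the scores a few thousand times.

    import random

    def randomization_test(group1, group2, n_shuffles=10000, seed=1):
        # Observed (absolute) difference in means between the two treatments.
        n1, n2 = len(group1), len(group2)
        observed = abs(sum(group1) / n1 - sum(group2) / n2)
        pooled = list(group1) + list(group2)
        rng = random.Random(seed)
        extreme = 0
        for _ in range(n_shuffles):
            # Reassign the scores to "treatments" at random, as the null hypothesis allows.
            rng.shuffle(pooled)
            diff = abs(sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2)
            if diff >= observed:
                extreme += 1
        # Proportion of random assignments giving a difference at least as large
        # as the one we actually observed (a two-tailed probability).
        return extreme / n_shuffles

    # Hypothetical scores from two treatment groups.
    treatment_a = [22, 25, 17, 24, 16, 29, 20]
    treatment_b = [18, 14, 21, 15, 12, 19, 17]
    print(randomization_test(treatment_a, treatment_b))

If the returned proportion is very small, a difference as large as the one we observed rarely arises when scores are assigned to treatments at random.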

Although random assignment, and the logical operations that flow from it, are powerful tools for hypothesis testing, there are some drawbacks. If we want to compare two treatments of anorexia, we can randomly assign participants to treatments and then run our test. But what if we want to know whether depressed patients perform worse than nondepressed patients on a test of vigilance? We cannot possibly assign patients at random to the depressed and nondepressed conditions, and that failure undercuts the rationale for a randomization test. Because we cannot randomly assign a particular person to a particular condition, we have no basis for saying that a given score would be equally likely to appear under either condition. This would appear to severely limit the use of randomization tests, and at one level it certainly does. What's a body to do?

In one sense we are in the same predicament as the parametric statistician who knows that she doesn't have a random sample, but is willing to make statistical inferences anyway, on the grounds that she thinks the sample is similar to the one that would result if she could draw participants randomly. In both cases, the inferences we draw are not statistical inferences but logical inferences. An excellent paper addressing random assignment was published in 1969 by Winch and Campbell. The title is revealing: "Proof? No. Evidence? Yes. The significance of tests of significance." Donald Campbell is a highly respected methodologist who has discussed at great length the threats to the internal and external validity of an experiment. He has identified 15 such threats, one of which is the argument that an obtained difference is due to chance. Winch and Campbell argue that we can perform a randomization test on a set of data regardless of whether those data were randomly sampled or randomly assigned, and even if the data completely exhaust the population or populations. If the results show that our treatments differ from each other in means, medians, variances, or whatever, by more than randomly assigned scores would normally differ, we can reject the hypothesis that the difference is due to chance. We may not be able to eliminate other threats to internal validity (the difference may be due to many causes other than the one we identify), but it is not due to random fluctuations in the data. And that is a very powerful conclusion.

Winch and Campbell use the example of comparing political stability in all countries with press censorship with stability in all countries without censorship. Traditional usage would suggest that a statistical test is not valid, because we have exhausted the two populations and the countries were not randomly assigned to conditions. Moreover, because we have whole populations, we know their mean levels of stability and whether they differ. So if we know that μ1 < μ2, why do we need a test? However, it is still a plausible hypothesis that the difference we see is of a size that would occur quite frequently by chance, and thus it is worth running the statistical test anyway. If we can rule out the explanation that the difference we found is of the kind that would often occur by chance, then we have gone some way toward drawing inferences about the effects of censorship. Of course we have absolutely no basis for concluding that censorship causes instability; a reasonable case might be made that it is the other way around, or that the difference is due to concomitants of censorship. But we nevertheless have a phenomenon to be explained.
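
The same machinery can be applied to the censorship example, even though nothing was randomly assigned and the two groups exhaust their populations. The stability ratings below are made up purely for illustration, and the sketch assumes the randomization_test function defined above. A small returned proportion rules out only one explanation: that a difference this large would often arise from a random division of these particular scores.

    # Hypothetical stability ratings for all countries with and without press censorship.
    censored   = [3.1, 2.4, 4.0, 2.8, 3.3, 2.2, 3.0]
    uncensored = [4.2, 3.9, 5.1, 4.4, 3.6, 4.8, 4.1]

    # A small value says the observed difference is not the sort that random shuffling
    # of these scores would often produce; it says nothing about why the groups differ.
    print(randomization_test(censored, uncensored))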

The point of this page is not to suggest that random assignment is unimportant, or that it doesn't really matter whether we assign cases randomly to treatments. The point is, rather, that the absence of random assignment is no excuse to stand around with our hands in our pockets, claiming that there is nothing we can do. There is a lot that we can do, even if our final conclusions are less precise than we would like them to be. This is an important point, because many of the comparisons that we would like to make are between pre-existing groups for which random assignment is impossible.


Last revised: 04/01/2007
dch