The major assumption behind traditional parametric procedures--more fundamental than normality and homogeneity of variance--is the assumption that we have randomly sampled from some population (usually a normal one). Of course, virtually no study you are likely to run will employ true random sampling, but leave that aside for the moment. To see why this assumption is so critical, consider an example in which we draw two samples, calculate the sample means and variances, and use those as estimates of the corresponding population parameters. For example, we might draw a random sample of anorexic girls (potentially) given one treatment, and a random sample of anorexic girls given another treatment, and use our statistical test to draw inferences about the parameters of the corresponding populations from which the girls were randomly sampled. We would probably like to show that our favorite treatment leads to greater weight gain than the competing treatment, and thus that the mean of the population of all girls given our favorite treatment is greater than the mean of the other population. But statistically, it makes no sense to say that the sample means are estimates of the corresponding population parameters unless the samples are drawn randomly from that (those) population(s). (Using the 12 middle school girls in your third period living-arts class is not going to give you a believable estimate of U. S. (let alone world) weights of pre-adolescent girls.) That is why the assumption of random sampling is so critical. In the extreme, if we don't sample randomly, we can't say anything meaningful about the parameters, so why bother? That is part of the argument put forth by the resampling camp.
Of course, those of us who have been involved in statistics for any length of time recognize this assumption, but we rarely give it much thought. We assume that our sample, though not really random, is a pretty good example of what we would have if we had the resources to draw truly random samples, and we go merrily on our way, confident in the belief that the samples we actually have are "good enough" for the purpose. That is where the parametric folks and the resampling folks have a parting of the ways.
The parametric people are not necessarily wrong in thinking that on occasion nonrandom sampling is good enough. If we are measuring something that would not be expected to vary systematically among participants, such as the effect of specific stimulus variations on visual illusions, then a convenience sample may give acceptable results. But keep in mind that any inferences we draw are not statistical inferences, but logical inferences. Without random sampling we cannot make a statistical inference about the mean of a larger population. But on nonstatistical grounds it may make good sense to assume that we have learned something about how people in general process visual information. But using that kind of argument to brush aside some of the criticisms of parametric tests doesn't diminish the fact that the resampling approach legitimately differs in its underlying philosophy.
The resampling approach--and for now I mean the randomization test approach, not bootstrapping--really looks at the problem differently. In the first place, people in that area don't give a "population" the centrality that we are used to assigning to it in parametric statistics. They don't speak as if they sit around fondly imagining those lovely bell-shaped distributions with numbers streaming out of them that we often see in introductory textbooks. In fact, they hardly appear to think about populations at all. And they certainly don't think about drawing random samples from those imaginary populations. Those people are as qualified as statisticians as you could wish, but they don't worry too much about estimating parameters, for which you really do need random samples. They just want to know the likelihood of the sample data falling as they did if treatments were equally effective. And for that, they don't absolutely need to think of populations.
In the history of statistics, the procedures with which we are most familiar were developed on the assumption of random sampling. And they were developed with the expectation that we are trying to estimate the corresponding population mean, variance, or whatever. This idea of "estimation" is central to the whole history of traditional statistics--we estimate population means so that we can (hopefully) conclude that they are different and that the treatments have different effects.
But that is not what the randomization test folks are trying to do. They start with the assumption that samples are probably not drawn randomly, and assume that we have no valid basis (or need) for estimating population parameters. This, I think, is the best reason to think of these procedures as nonparametric procedures, though there are other reasons to call them that. But if we can't estimate population parameters, and thus have no legitimate basis for retaining or rejecting a null hypothesis about those parameters, what basis do we have for constructing any statistical test? It turns out that we have legitimate alternative ways of testing our hypothesis, though I'm not sure that we should even be calling it a null hypothesis.
This difference over the role of random sampling is a critical difference between the two approaches. But that is not all. The resampling people, in particular, care greatly about random assignment. The whole approach is based on the idea of random assignment of cases to conditions. That will appear to create problems later on, but take it as part of the underlying rationale. Both groups certainly think that random assignment to conditions is important, primarily because it rules out alternative explanations for any differences that are found. But the resampling camp goes further, and makes it the center point of their analysis. To put it very succinctly, a randomization test works on the logical principle that if cases were randomly assigned to treatments, and if treatments have absolutely no effect on scores, then a particular score is just as likely to have appeared under one condition as under any other. Notice that the principle of random assignment tells us that if the null hypothesis is true, we could validly shuffle the data and expect to get essentially the same results. This is why random assignment is fundamental to the statistical procedure employed.
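The shuffle logic just described can be sketched in a few lines of code. This is a minimal illustration, not a production implementation; the weight-gain numbers are made up for the example, and the statistic (a simple difference in group means) is only one of many that could be used:

```python
import random

def randomization_test(group_a, group_b, n_shuffles=10000, seed=42):
    """Two-sample randomization test on the difference in means.

    If treatments have no effect, any score could equally well have
    landed in either group, so we repeatedly shuffle the pooled scores,
    re-split them into two groups of the original sizes, and ask how
    often the shuffled mean difference is at least as extreme as the
    one we actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = (sum(pooled[:n_a]) / n_a
                - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if abs(diff) >= abs(observed):  # two-tailed comparison
            count += 1
    return count / n_shuffles  # proportion of shuffles as extreme as the data

# Hypothetical weight gains under two treatments (illustrative numbers only)
gain_a = [6.1, 4.3, 7.0, 5.8, 6.5, 5.2]
gain_b = [3.2, 4.0, 2.8, 3.9, 4.4, 3.1]
p = randomization_test(gain_a, gain_b)
```

Notice that nothing in this procedure refers to a population or a parameter: the reference distribution is built entirely from rearrangements of the scores we actually have, which is exactly the philosophical point made above.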
David C. Howell
University of Vermont