Parametric and Resampling Statistics:

Two Different Philosophies of Hypothesis Testing -- Or is it Three?

David C. Howell

University of Vermont

To me, one of the major problems facing people who are interested in exploring resampling procedures is understanding (or even recognizing) the difference in philosophy between the traditional parametric tests and resampling statistics. Most discussions of resampling procedures assume that you and the author share a common understanding, and therefore skip over the philosophy very quickly. Perhaps the best exception to this statement is a paper by Lunneborg. By some good fortune, I happen to have copies of those pages. You can download the zipped file at Lunneborg papers. I consider this required reading to understand the underlying issues behind randomization tests.

However, the philosophies that lie behind parametric and resampling approaches (especially randomization tests) are quite different. It isn't simply that parametric tests focus on random sampling and population assumptions, while randomization tests focus on random assignment and the data at hand. While both of those statements are true, there is a lot more going on behind the scenes that people rarely think to talk about. That is why I have used the word "philosophy" for the title of this page, rather than just focusing on "assumptions."

Much as I like the resampling approach to statistical tests, and easy as it is to understand the calculations behind them, once you get beyond the calculations and try to understand the true differences, things can get really messy. For me, there are at least three reasons for that.

In the first place, I have taught parametric statistics for a long time, and have incorporated that approach into my thinking. Thus when I read something different, it takes a long time to realize that what I have been taking for granted isn't really what the author has in mind. We all try to fit what we are trying to learn into what we already know, and sometimes that hurts rather than helps.

For a second reason, I have long been familiar with the parametric statisticians' appeal to robustness, and with the idea behind robustness that assumptions often really don't matter all that much. It is common to read, in my own work as well as that of others, that such and such a test is robust and we don't have to be too worried about the fact that the underlying assumptions are probably violated. And it is only a tiny step from here to the idea that even if we don't have random assignment, it doesn't matter too much, and thus the idea of parameter estimation is still intact and a legitimate basis for hypothesis testing. It is hard to say "Hey, wait a minute, these guys aren't really talking about parameter estimation."

A third explanation is that most of us have too often heard the call of the "purists." They constantly bring out the argument that without normality or homogeneity, or whatever their favorite assumption, a test is not valid and we can't use it. We routinely brush aside these criticisms as the province of those who are rarely faced with real-world data, and thus don't have to dirty their hands with violations of assumptions. In other words, we are used to sweeping aside the arguments of the purists, and don't really notice what they are saying.

In fact, the problem is partly of their own making. They are so busy telling us that we are doing the wrong thing that they don't really take the time to emphasize that their approach is more than just a better way of doing the same thing. It is a way of doing something different.

I want to talk first about sampling (random or otherwise), then about the role of underlying parametric assumptions, and then about the null hypothesis. Finally, I will take a look at how far we can push the issue of random assignment. The links below lead to the appropriate pages.

A (Longish) Final Word

I would like to end with a final comment on the idea of parametric and nonparametric tests. The distinction between them is not always clear, and I probably won't make it completely clear here. But I want to consider a very simple example that pits one approach against the other, though in this case the two approaches would come up with the same answer. I will take the very prosaic example of flipping coins to test for fairness. (I don't know if anyone has ever truly tested a coin to see if it is fair, but we often use this as an example because it is parallel to many things that we actually do.)

Suppose that we have a coin, perhaps an old Roman coin that has more silver on the front than on the back. We want to know if this coin is what we all would call a "fair coin." Suppose that we flip the coin 9 times and obtain 9 heads. There are two different questions we could ask, and, though subtly different, they address the distinction between parametric and resampling statistical tests.

The first approach establishes a null hypothesis that p = .50. It then asks "What is the probability of getting 9 heads out of 9 flips if the null hypothesis (p = .50) is true?" This question can be answered by some simple calculations based on the binomial distribution. Notice that we have established a parameter (p), and calculated a result based on that parameter. For this example, this is probably the easiest approach to use. It is probably the approach that you or I would think of first, because it is in all of our textbooks.
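
For readers who want to see that calculation spelled out, here is a minimal sketch in Python. The function name binomial_pmf is purely illustrative (it is not taken from any textbook or package), and the code simply evaluates the standard binomial formula with the parameter p fixed at .50 by the null hypothesis.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips,
    when each flip comes up heads with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Null hypothesis: p = .50. How likely are 9 heads in 9 flips?
p_nine_heads = binomial_pmf(9, 9, 0.5)
print(p_nine_heads)  # 0.001953125, which is 1/512
```

Notice that the parameter p appears explicitly as an argument. The whole calculation is built around it.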

The second approach jumps right in and asks "If this coin is fair, how often will we get 9 heads out of 9 tosses?" That question doesn't use the word "parameter," nor need it necessarily refer to a parameter. Most of us mentally translate this question into the first question (if the coin is fair, the probability of a head on any one toss is .50), but we don't need to do that. We just do because our textbooks told us to. We could just as easily take 9 fair coins, toss them in the air a whole bunch of times, and count how often 9 heads come up in any one toss. We could run the test this way without ever thinking about what p would be under the null. We don't even have to know the slightest thing about the binomial distribution. This turns out to be a completely different way of testing a null hypothesis, and it is the fundamental approach of permutation test statistics.
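
The coin-tossing exercise just described can be mimicked on a computer. What follows is only a rough sketch: the function name count_all_heads, the number of tosses, and the seed are all my own illustrative choices, and the computer's random choice between two faces is standing in for a handful of physical fair coins. No binomial formula appears anywhere in it. We simply toss and count.

```python
import random

def count_all_heads(n_coins=9, n_tosses=50_000, seed=1):
    """Toss n_coins simulated fair coins n_tosses times and count
    how many tosses show heads on every coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    all_heads = 0
    for _ in range(n_tosses):
        # One toss of the whole handful: each coin lands on one of its two faces.
        toss = [rng.choice("HT") for _ in range(n_coins)]
        if toss.count("H") == n_coins:
            all_heads += 1
    return all_heads

n_tosses = 50_000
hits = count_all_heads(n_coins=9, n_tosses=n_tosses)
print(hits, "of", n_tosses, "tosses gave 9 heads; proportion =", hits / n_tosses)
```

The proportion printed will wander a little from run to run if you change the seed, which is exactly what would happen with real coins.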

Many resampling papers start with some example, point to how unreasonable it is to assume normality or homoscedasticity, and then pull out their favorite test as a way around this problem. By doing this, they give the impression that resampling statistics are just something you drag out when assumptions are violated. In other words, those tests are justified on the basis of assumption violations, which is not necessarily their strong suit, and the idea that they are good tests for other reasons gets swept aside. And those papers that do not start by complaining about normality or homoscedasticity start by beating up on people who don't have random samples, and look irrelevant to anyone who thinks that his or her samples are "random enough." In both cases the appeal seems to depend on throwing stones at parametric tests, when I think it should be on putting forth resampling statistics as a legitimate way of developing useful and meaningful tests for important research questions (notice that I didn't use the phrase "null hypothesis"). If I had two groups that were each randomly sampled from normally distributed populations with equal variances, I could still use a randomization test. It would be just as valid as a parametric test, though its power would sometimes be less. If I had two random samples from populations with unequal variances, my resampling test would continue to be valid, and, in this case, might even be more powerful than the corresponding parametric test. And finally, if I randomly split a convenience sample with no possible claim to randomness into two experimental groups, with or without homoscedasticity, my randomization test would be legitimate, whereas any conclusions that I drew from a parametric test might be dubious at best.
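
To make the last case concrete, here is one way a two-group randomization test might be sketched in Python. Everything specific in it is an assumption made for illustration: the function name randomization_test, the use of the absolute difference in means as the test statistic, the number of reshuffles, and above all the scores, which are made-up numbers and not data from any study. The logic is simply to ask how often a random reassignment of the same scores to two groups produces a difference at least as large as the one we obtained.

```python
import random

def randomization_test(group_a, group_b, n_shuffles=10_000, seed=1):
    """Approximate randomization test on the absolute difference in means.
    Returns the proportion of random reassignments whose difference is
    at least as large as the observed one."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_difference(scores):
        first, second = scores[:n_a], scores[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_difference(pooled)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # one random reassignment of scores to groups
        # Small tolerance guards against floating-point rounding in ties.
        if mean_difference(pooled) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_shuffles

# Hypothetical scores for a convenience sample randomly split in two.
treatment = [12, 15, 14, 18, 17, 16]
control = [10, 11, 13, 9, 12, 14]
print(randomization_test(treatment, control))
```

Nothing in this sketch refers to a population, a parameter, or a sampling distribution. It uses only the scores at hand and the fact that they were randomly assigned to groups.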
