To me, one of the major problems facing people who are interested in exploring resampling procedures is understanding (or even recognizing) the difference in philosophy between the traditional parametric tests and resampling statistics. Most discussions of resampling procedures assume that you and the author share a common understanding, and therefore skip over the philosophy very quickly. Perhaps the best exception to this statement is a paper by Lunneborg. By some good fortune, I happen to have copies of the relevant pages, and you can download them as a zipped file at Lunneborg papers. I consider this required reading for understanding the issues that underlie randomization tests.

However, the philosophies that lie behind parametric and resampling approaches (especially randomization tests) are quite different. It isn't simply that parametric tests focus on random sampling and population assumptions, while randomization tests focus on random assignment and the data at hand. While both of those statements are true, there is a lot more going on behind the scenes that people rarely think to talk about. That is why I have used the word "philosophy" for the title of this page, rather than just focusing on "assumptions."

Much as I like the resampling approach to statistical tests, and easy as it is to understand the calculations behind them, once you get beyond the calculations and try to understand the true differences, things can get really messy. For me, there are at least three reasons for that. In the first place, I have taught parametric statistics for a long time, and have incorporated that approach into my thinking. Thus when I read something different, it takes a long time to realize that what I have been taking for granted isn't really what the author has in mind. We all try to fit what we are trying to learn into what we already know, and sometimes that hurts rather than helps. Second, I have long been familiar with the parametric statisticians' appeal to robustness, and with the idea behind robustness that assumptions often don't matter all that much. It is common to read, in my own work as well as that of others, that such and such a test is robust, so we don't have to worry much about the fact that the underlying assumptions are probably violated. And it is only a tiny step from there to the idea that even if we don't have random assignment, it doesn't matter too much, and thus the idea of parameter estimation is still intact and a legitimate basis for hypothesis testing. It is hard to say, "Hey, wait a minute, these guys aren't really talking about parameter estimation." A third explanation is that most of us have too often heard the call of the "purists." They constantly argue that without normality or homogeneity, or whatever their favorite assumption is, a test is not valid and we can't use it. We routinely brush aside these criticisms as the province of those who are rarely faced with real-world data, and thus don't have to dirty their hands with violations of assumptions. In other words, we are used to sweeping aside the arguments of the purists, and don't really notice what they are saying. In fact, the problem is partly of their own making. They are so busy telling us that we are doing the wrong thing that they don't take the time to emphasize that their approach is more than just a better way of doing the same thing. *It is a way of doing something different.*

I want to talk first about sampling (random or otherwise), then about the role of underlying parametric assumptions, and then about the null hypothesis. Finally, I will take a look at how far we can push the issue of random assignment. The links below lead to the appropriate pages.

I would like to end with a final comment on the idea of parametric and nonparametric tests. The distinction between them is not always clear, and I probably won't make it completely clear here. But I want to consider a very simple example that pits one approach against the other, though in this case the two approaches would come up with the same answer. I will take the very prosaic example of flipping coins to test for fairness. (I don't know if anyone has ever truly tested a coin to see if it is fair, but we often use this as an example because it is parallel to many things that we actually do.)

Suppose that we have a coin, perhaps an old Roman coin that has more silver on the front than on the back. We want to know if this coin is what we all would call a "fair coin." Suppose that we flip the coin 9 times and obtain 9 heads. There are two different questions we could ask, and although they differ only subtly, they capture the distinction between parametric and resampling statistical tests.

The first approach establishes a null hypothesis that *p* = .50. It then asks, "What is the probability of getting 9 heads out of 9 flips if the null hypothesis (*p* = .50) is true?" This question can be answered by some simple calculations based on the binomial distribution. Notice that we have established a parameter (*p*) and calculated a result based on that parameter. For this example, this is probably the easiest approach to use. It is probably the approach that you or I would think of first, because it is in all of our textbooks.
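For the coin example, the parametric calculation takes only a line or two. The sketch below is a minimal Python illustration, assuming independent tosses and the null value *p* = .50; the function name `binomial_prob` is hypothetical. Under those assumptions the probability of 9 heads in 9 tosses is .5^9 = 1/512, or about .002.

```python
from math import comb

def binomial_prob(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Null hypothesis: the coin is fair, so p = .50 on every toss.
prob_all_heads = binomial_prob(9, 9, 0.5)
print(prob_all_heads)  # 0.001953125 (= 1/512)
```

Because 9 heads is the most extreme outcome possible in that direction, this is also the one-tailed *p* value; if 9 tails were counted as equally extreme, the two-tailed value would be twice as large.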

The second approach jumps right in and asks, "If this coin is fair, how often will we get 9 heads out of 9 tosses?" That question doesn't use the word "parameter," nor need it *necessarily* refer to a parameter. Most of us mentally translate this question into the first question (if the coin is fair, the probability of a head on any one toss is .50), but we don't need to do that. We just do because our textbooks told us to. We could just as easily take 9 fair coins, toss them in the air a whole bunch of times, and count how often all 9 come up heads on a single toss. We could run the test this way without ever thinking about what *p* would be under the null. We don't even have to know the slightest thing about the binomial distribution. This turns out to be a completely different way of testing a null hypothesis, and it is the fundamental approach of permutation test statistics.
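The "toss 9 fair coins a whole bunch of times" procedure can be mimicked in a few lines. This is a minimal Python sketch, assuming that a pseudo-random choice between "H" and "T" is an acceptable stand-in for tossing a real fair coin; the function name `all_heads_rate`, the number of throws, and the seed are arbitrary, hypothetical choices. Nothing in it refers to *p* or to the binomial distribution; it simply counts how often all 9 coins land heads.

```python
import random

def all_heads_rate(n_coins: int = 9, n_throws: int = 100_000, seed: int = 1) -> float:
    """Throw n_coins 'fair' coins n_throws times; return the share of throws that are all heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(n_throws):
        # One throw: every coin independently shows "H" or "T".
        faces = [rng.choice("HT") for _ in range(n_coins)]
        if all(face == "H" for face in faces):
            hits += 1
    return hits / n_throws

print(all_heads_rate())  # near 1/512 (about .002); the exact value depends on the seed
```

With 100,000 throws the observed proportion should land close to the binomial answer, and more throws tighten the estimate, but at no point did we have to specify a parameter.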

Many resampling papers start with some example, point to how unreasonable it is to assume normality or homoscedasticity, and then pull out their favorite test as a way around this problem. By doing this, they give the impression that resampling statistics are just something you drag out when assumptions are violated. In other words, those tests are justified on the basis of assumption violations, which is *not necessarily* their strong suit, and the idea that they are good tests for other reasons gets swept aside. And those papers that do not start by complaining about normality or homoscedasticity start by beating up on people who don't have random samples, and look irrelevant to anyone who thinks that his or her samples are "random enough." In both cases the appeal seems to depend on throwing stones at parametric tests, when I think it should rest on putting forth resampling statistics as a legitimate way of developing useful and meaningful tests for important research questions (notice that I didn't use the phrase "null hypothesis"). If I had two groups that were each randomly sampled from normally distributed populations with equal variances, I could still use a randomization test. It would be just as valid as a parametric test, though its power would sometimes be less. If I had two random samples from populations with unequal variances, my resampling test would continue to be valid, and, in this case, might even be more powerful than the corresponding parametric test. And finally, if I randomly split a convenience sample with no possible claim to randomness into two experimental groups, with or without homoscedasticity, my randomization test would be legitimate, whereas any conclusions that I drew from a parametric test might be dubious at best.
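The randomization test in that last scenario can be sketched in Python. This is a minimal Monte Carlo version, assuming the statistic of interest is the difference between two group means and that reshuffling the pooled scores into groups of the original sizes is a fair model of the random assignment actually used. The function name, the number of reshuffles, and the example scores are all hypothetical. Notice that nothing in it refers to a population, a normal distribution, or a variance.

```python
import random
from statistics import mean

def randomization_test(group_a, group_b, n_shuffles=10_000, seed=1):
    """Two-tailed Monte Carlo randomization test for a difference in means.

    Repeatedly re-assigns the pooled scores to two groups of the original sizes,
    mimicking the random assignment of participants to conditions, and returns the
    proportion of re-assignments whose |mean difference| is at least as large as
    the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # one random re-assignment of scores to groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance so ties are not lost to floating-point rounding.
        if diff >= observed - 1e-9:
            at_least_as_extreme += 1
    # Count the observed assignment itself, so the p value is never exactly 0.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Hypothetical scores for a convenience sample randomly split into two conditions.
treatment = [12, 15, 14, 10, 13, 16, 11]
control = [9, 11, 8, 10, 12, 7, 9]
print(randomization_test(treatment, control))
```

With groups this small, one could instead enumerate every possible assignment and compute the *p* value exactly; sampling reshuffles is simply the more general choice when the number of assignments becomes too large to list.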


dch: