Overview of Randomization Tests

Randomization tests can be thought of as another way to examine data, and do not restrictive assumptions about populations. As a very quick example, suppose that you have two groups of scores. One came from subjects who were presented with a particular treatment, and the other came from a subjects who did not receive the treatment. The question is, "can we draw a conclusion about the effectiveness of the treatment by looking at thoses two sets of scores." And we won't make any assumptions about the distribution of scores, though we will assume that subjects were assigned to groups at random.

Let's start by assuming that the treatment had absolutely no effect. And let's assume that one participant had a score of 27. If the treatment had no effect, then that 27 would be equally likely to have come from the treatment group as it is to come from the control group. The same holds for all of the other scores. So let's set out by taking all of our data, tossing it in the air, and letting half of it fall in one group and the other half in the other group. That is an example of what we would expect if the treatment had no effect. Now let's calculate the mean, or perhaps the median, of each group, and then calculate the difference in the medians (or means). Record that number, and then toss the data in the air again, separate them at random into two groups, and again calculate the difference in the medians. Now keep doing this a great many times, say 10,000, each time recording the median difference. Those 10,000 differences are 10,000 examples of what you would expect if there were no treatment effect.

Now consider the data that you actually obtained. If there was no treatment effect, your obtained median difference would look like all of the other differences. But suppose that your median difference was guite large--totally unlike the kind of differences you found with random assignment of scores to groups. You would then conclude that the difference you actually found is totally unlike the differences you find when there is no effect, and you would likely conclude that the treatment actually worked--it made a difference how the scores were distributed. I am going to be a bit sloppy here and say that this result would lead us to reject the "null hypothesis." Notice that I have said nothing about normality and nothing about homogeneity of variance. In fact I have said nothing at all about population parameters. That is part of the nature of randomization, or "permutation," tests.

Now, of course, I do not expect you to sit around tossing your data in the air 10,000 times. That would be absurd. But there is nothing to prevent you from letting your computer do roughly that, and you would probably be amazed at how fast it can do so. Without a computer, randomization tests would be totally impractical, which is why they did not arise years ago. But with a computer, such tests are entirely practical and have several advantages of the usual parametric tests that fill most of our textbooks.

I have worked with randomization tests for a number of years. At one time I wrote a set of programs in Visual Basic that ran such tests and printed out the results if a very neat and attractive way. But no one uses Visual Basic any more. But you can do the same thing with other kinds of software, and, in particular, with R. Much of what follows is based on a set of R programs, but even if you do not want to play with R, you can learn a great deal just by reading the text. If you want to run such tests using SPSS, look at

Hayes, A. F. (1998). SPSS procedures for approximate randomization tests. Behavior Research Methods, Instruments & Computers, 30(3), 536-543.

If you want to use SAS, look at

Ru San Chen, R. S. & Dunlap, W. P.(1993) SAS procedures for approximate randomization tests. Behavior Research Methods, Instruments, & Computers 25 (3), pp 406–409.

An Elaboration

Before discussing specific procedures, I need to say something, actually quite a lot, on the characteristics of randomization tests.