I want to discuss randomization procedures for data analysis, but I have worked myself into somewhat of a mess. Years ago I wrote a set of programs in Visual Basic that worked very well. (They were quite pretty, too.) I then created a set of pages based on those programs. More recently I have approached the whole problem from the point of view of *R*. These pages come in different forms. Some of them, such as philosophy.html, are generic and do not depend on any particular program. They are like the first couple of chapters in a textbook, which lay out the underlying issues. The rest are directed toward a specific problem--e.g., comparing the means of two groups, running an analysis of variance, and so on. Where I had one page discussing procedures using Visual Basic and another covering roughly the same material using *R*, I have combined the two. Often you will see a discussion of comparing two groups in which the Visual Basic program comes first, followed by *R* code that does the same thing. I tried working with two separate pages, but that became unwieldy. There is also a page about downloading *R* and understanding its basic commands, and another about where to find the Visual Basic programs and how to install them.

Before elaborating on specific procedures, I need to say something--actually quite a lot--about the characteristics of randomization tests.

Randomization tests differ from parametric tests in almost every respect.

- There is no requirement that we have random samples from one or more populations—in fact we usually have not sampled randomly.
- Why do we worry about random samples with parametric tests? Because we use the mean and variance to estimate the population parameters, and we need random samples from some population in order to have legitimate estimators. Randomization tests don't estimate parameters.
- Parametric tests need to assume normality so that our test statistic, such as *t*, follows a *t* distribution.
- For resampling tests we rarely think in terms of the populations from which the data came, and there is no need to assume anything about normality or homoscedasticity.
- Our null hypothesis has nothing to do with parameters, but is phrased rather vaguely, as, for example, the hypothesis that the treatment has no effect on how participants perform.
- This is an important distinction. The alternative hypothesis is simply that different treatments have an effect. But, note that we haven't specified whether the difference will reveal itself in terms of means, or variances, or some other statistic. That we leave up to the statistic we calculate in running the test.
- That might be phrased a bit more precisely by saying that, under the null hypothesis, the score that is associated with a participant is independent of the treatment that person received.
- Because we are not concerned with populations, we are not concerned with estimating (or even testing) characteristics of those populations.
- We do calculate some sort of test statistic, however we do not compare that statistic to tabled distributions.
- Instead, we compare it to the results we obtain when we repeatedly randomize the data across the groups, and calculate the corresponding statistic for each randomization.
- Even more than parametric tests, randomization tests emphasize the importance of random assignment of participants to treatments.
- This is very important because we make statements of the form "If treatments had no effect, that particular score could just as easily have ended up in the second group instead of the first." You need random assignment to do that.
- I need to hedge a bit here. If the groups are males and females, you obviously cannot randomly assign subjects to groups. But you need to assume that, conditional on gender, there are no other systematic differences in group assignment.
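To make the randomization logic above concrete, here is a minimal sketch in *R* of a randomization test comparing two group means. The data, the seed, and the number of randomizations are invented purely for illustration; the pages that follow develop this logic in detail.

```r
# Randomization test for the difference between two group means.
# The scores below are made up purely for illustration.
group1 <- c(12, 15, 9, 14, 17, 11)
group2 <- c(8, 10, 7, 13, 9, 6)

obs.diff <- mean(group1) - mean(group2)   # the observed statistic
combined <- c(group1, group2)
n1 <- length(group1)

set.seed(123)                             # so the result is reproducible
nreps <- 5000
diffs <- numeric(nreps)
for (i in 1:nreps) {
  shuffled <- sample(combined)            # randomize scores across groups
  diffs[i] <- mean(shuffled[1:n1]) - mean(shuffled[-(1:n1)])
}

# Two-tailed probability: the proportion of randomizations producing a
# difference at least as extreme as the one we observed
p.value <- mean(abs(diffs) >= abs(obs.diff))
p.value
```

Notice that nothing here refers to a population or to a tabled distribution; the reference distribution is built entirely from rearrangements of the data at hand.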

I need to say something about exchangeability. It applies to the null-hypothesis sampling distribution--in other words, the data are exchangeable under the null. Phil Good, for example, is a big fan of this term. He would argue that if the scores in one group have a higher variance than the other, then the data are not exchangeable and the test is not valid. BUT, if the hypothesis being tested is that the treatments have no effect on scores, then under the null hypothesis why would one set of scores have a higher variance other than by chance? The real problem is that we have to select the test statistic with care. We normally test means, or their equivalent, but the groups could also differ in their variances, and that is another way a treatment effect could show itself. If we focus on mean differences, we are implicitly assuming that the groups are exchangeable in other respects, such as variance, under the null hypothesis. We need to be specific about which statistic we are testing and what we are assuming about the rest. So much for that little hobby horse of mine.
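To show that the choice of statistic really is up to us, the same randomization machinery can just as easily use a variance ratio instead of a mean difference. This is my own invented illustration, not a standard named test; the data and seed are arbitrary.

```r
# Randomization test using the ratio of group variances as the statistic.
# Invented data: group1 is noticeably more variable than group2.
group1 <- c(12, 15, 9, 14, 17, 11)
group2 <- c(10, 10, 11, 9, 10, 11)

obs.ratio <- var(group1) / var(group2)    # observed variance ratio
combined <- c(group1, group2)
n1 <- length(group1)

set.seed(456)
nreps <- 5000
ratios <- numeric(nreps)
for (i in 1:nreps) {
  shuffled <- sample(combined)            # randomize scores across groups
  ratios[i] <- var(shuffled[1:n1]) / var(shuffled[-(1:n1)])
}

# One-tailed probability that randomization alone produces a ratio
# at least as large as the observed one
p.value <- mean(ratios >= obs.ratio)
p.value
```

Only the statistic changed; the randomization scheme, and therefore the null hypothesis of no treatment effect, is exactly the same as before.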

Cliff Lunneborg wrote an excellent discussion of randomization tests. (Unfortunately, he has since died, and the paper is no longer available at his web site. By good fortune, I happen to have copies of those pages.) You can download the files as Paper One and Paper Two. I consider these required reading if you want to fully understand the issues underlying randomization tests. Lunneborg writes extremely well, but (and?) he chooses his words very carefully. Don't read this when you are too tired to do anything else--you have to be alert.

I have broken these many pages down into three sections. The first contains pages designed to explain the logic and structure of resampling tests. I have labeled these "Background Material" for obvious reasons. I then move to what I have called "Randomization Tests." These are pages that deal with specific tests, such as comparing two group means. Here I have combined my pages on Visual Basic programs with those on programs written in *R*. Generally the Visual Basic section serves better to describe the tests and their logic, while the section on *R* is primarily aimed at providing code for similar tests in *R*. In the *R* code I have deliberately pulled coding steps apart where I could have shortened the code by using more complex commands. I did this to make it easier for you to follow what I am doing. After the randomization tests I provide a section on bootstrapping. This is a bit shorter simply because the topics in bootstrapping are fewer and more easily covered. Again I have tried to provide programs in both Visual Basic and *R*.
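As an example of what I mean by pulling coding steps apart, here are a compact version and a broken-out version of a single randomization step. The scores and group labels are invented for illustration; with the same seed, the two forms give an identical result.

```r
# Invented scores and group labels for illustration
score <- c(12, 15, 9, 14, 8, 10, 7, 13)
group <- rep(c("A", "B"), each = 4)

# Compact form: one randomization and the mean difference in a single line
set.seed(1)
diff(tapply(sample(score), group, mean))     # mean of B minus mean of A

# The same computation pulled apart into separate steps
set.seed(1)                                  # same shuffle as above
shuffled <- sample(score)                    # randomize the scores
meanA <- mean(shuffled[group == "A"])        # mean of the shuffled "A" group
meanB <- mean(shuffled[group == "B"])        # mean of the shuffled "B" group
meanB - meanA                                # identical to the compact form
```

The compact form is what an experienced *R* user might write; the broken-out form makes each step visible, which is the style I use in these pages.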

- The Philosophy of Randomization Tests
- Random Assignment and Random Sampling
- Randomization Tests/Parametric Assumptions
- The Null Hypothesis
- Absence of Random Assignment
- Comparing Means of Two Groups
- Two Paired Samples
- Comparing Two Medians
- Correlation of Two Variables
- Comparing More Than Two Groups
- Repeated Measures Analysis of Variance
- Permutation Tests on Factorial ANOVA

Edgington, E. & Onghena, P. (2007). *Randomization tests*. New York: Chapman & Hall.

Efron, B. & Tibshirani, R. J. (1993) *An introduction to the bootstrap*. New York: Chapman and Hall.

Lunneborg, C. E. (2000). *Random assignment of available cases: Let the inference fit the design*. http://faculty.washington.edu/lunnebor/Australia/randomiz.pdf


David C. Howell

University of Vermont

David.Howell@uvm.edu