Much of what I need to say about bootstrapping medians has already been said
in the section on bootstrapping
means. The procedures for bootstrapping almost any statistic follow a very
predictable pattern, and I am not going to repeat much of that here. The only
method that I have programmed as of the time of this original writing is the
percentile method.
The percentile method applied to medians is essentially the same as that
applied to means. We take our original sample of n observations, and
sample from it, with replacement to create new samples. Each new sample
contains n elements. We create B bootstrap samples, where B is typically
1,000 or more. The procedures in Resampling.exe draw at least 1,000 bootstrap
samples and can draw as many as 50,000; the program runs quickly enough that
time is rarely a concern.
For each bootstrap sample we compute the sample median (denoted Med*), and
when we have drawn all of our samples, these values of Med* represent the
sampling distribution of the median. The program then sorts this sampling
distribution from low to high (which can take a fair amount of time for very
large values of B) and calculates the relevant percentiles. For our purposes
here, these will be the 2.5th and 97.5th percentiles, though generically they
are the α/2 and 1 - α/2 percentiles. For B = 1000, these will be the 25th and
975th order statistics (values from the ordered series). The percentile method
takes these as the lower and upper cutoffs of the 95% confidence interval.
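The percentile procedure just described can be sketched in a few lines of Python. This is only an illustration of the algorithm, not the code inside Resampling.exe; the function name and the sample data are my own.

```python
import numpy as np

rng = np.random.default_rng(42)

def percentile_ci_median(sample, B=1000, alpha=0.05):
    """Percentile bootstrap CI for the median.

    Draw B samples of size n with replacement, compute each sample's
    median, sort the B medians, and read off the alpha/2 and 1 - alpha/2
    order statistics as the confidence limits.
    """
    sample = np.asarray(sample)
    n = len(sample)
    meds = np.sort([np.median(rng.choice(sample, n, replace=True))
                    for _ in range(B)])
    lower = meds[int(B * alpha / 2) - 1]        # 25th order statistic for B = 1000
    upper = meds[int(B * (1 - alpha / 2)) - 1]  # 975th order statistic for B = 1000
    return lower, upper

data = [2, 2, 3, 4, 5, 5, 5, 6, 7, 9]  # a small illustrative sample
lo, hi = percentile_ci_median(data)
```

With B = 1000 the sort is trivial; as the text notes, it only becomes noticeable for very large B.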
As I said when I discussed bootstrapping limits on the mean, Lunneborg did
not invent this approach (it is implicit in other sources), but his is a
convenient name to assign.
When the sampling distribution is perfectly symmetric, the percentile method
is quick, easy to comprehend, and accurate. But when the distribution isn't
symmetric, better limits can be obtained by doing the same thing that we did
with the mean. (I will use α = .05 and B = 1000 as a concrete example,
assuming that you can make the obvious transition to other values.) First take
the original sample and calculate its median
(denoted Med). Then do your resampling. Having drawn B bootstrap samples and
computed their medians, we sort those medians as before from low to high. We
take the distance from the original sample median down to the 25th ordered
value and label it "a." We take the distance from the 975th ordered value
down to the original sample median and label it "b." Then the lower limit of
the CI is Med - b, and the upper limit is Med + a. I am sure that you will
think that I wrote that backwards, but I did not.
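The arithmetic of those reflected distances can be sketched as follows. Again this is a hypothetical illustration of the method in Python, not the program's own code.

```python
import numpy as np

rng = np.random.default_rng(7)

def reflected_ci_median(sample, B=1000, alpha=0.05):
    """'Backwards-looking' bootstrap CI for the median: reflect the
    distances to the percentile cutoffs around the sample median."""
    sample = np.asarray(sample)
    n = len(sample)
    med = np.median(sample)                      # Med, from the original sample
    meds = np.sort([np.median(rng.choice(sample, n, replace=True))
                    for _ in range(B)])
    a = med - meds[int(B * alpha / 2) - 1]       # distance down to the 25th value
    b = meds[int(B * (1 - alpha / 2)) - 1] - med # distance up to the 975th value
    return med - b, med + a                      # lower = Med - b, upper = Med + a

data = [2, 2, 3, 4, 5, 5, 5, 6, 7, 9]
lower, upper = reflected_ci_median(data)
```

When the bootstrap distribution is symmetric, a and b are nearly equal and this interval matches the plain percentile interval; the two differ only when the distribution is skewed.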
When we were working with the mean, we could fall back on the traditional
method of creating confidence limits. This method computes the limits as
X̄ ± t(α/2) × (s/√n). However, there are two important limitations of this
approach. In the first place, it relies on the Student t distribution, which
is based on an assumption of normality. Even if the population is not normal,
the Central Limit Theorem tells us that the sampling distribution of the mean
will be at least approximately normal, so we don't worry too much. We do not
have a corresponding theorem for the
median, and I don't know how quickly the sampling distribution approaches normal
as n increases. Moreover, this formula requires that we estimate the
standard error of the mean by taking the standard deviation of the sample and
dividing by the square root of n. But we do not have a comparable formula
when we are talking about the median, especially if the population is not
normal. So the traditional method is out when it comes to the median.
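For comparison, the traditional limits for the mean could be computed as in the following sketch (assuming SciPy is available; the function name and sample data are my own).

```python
import numpy as np
from scipy import stats

def traditional_mean_ci(sample, alpha=0.05):
    """Classical t-based CI for the mean: X-bar +/- t(alpha/2) * s / sqrt(n)."""
    x = np.asarray(sample, dtype=float)
    n = len(x)
    se = x.std(ddof=1) / np.sqrt(n)               # estimated SE of the mean
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1) # Student t cutoff
    return x.mean() - t_crit * se, x.mean() + t_crit * se

data = [2, 2, 3, 4, 5, 5, 5, 6, 7, 9]
lower, upper = traditional_mean_ci(data)
```

The point of the passage above is that no analog of the line computing `se` exists for the median, which is why we must fall back on resampling.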
Bootstrapped t Method
Just as we did with the mean, we can calculate a bootstrapped t
estimate of the confidence limits for the median. However, we have to overcome
at least two problems.
You should recall that for the mean we drew a bootstrapped sample, calculated
the sample mean and standard deviation, converted these to a t* statistic by
t* = (X̄* - X̄) / (s*/√n), and stored away the values of t*, which creates the
sampling distribution of t*.
Under ideal conditions this would simply reproduce Student's t
distribution, and under less than ideal conditions this would reproduce
something like, but more accurate than, Student's t distribution. The a/2
and 1-a/2 cutoffs give us the t values we need
for the traditional approach.
However, we have a problem. We could, and will, substitute Med* and Med for
X̄* and X̄ and calculate t*, but what do we use in place of s*/√n? That
quantity is our estimate of the standard error, but it only works for the
mean. It is not the standard error of
the median. To get the standard error of the median, we have to have the
empirical standard deviation of a bunch of medians. In other words, at this
point in our calculations we have to come to a full stop, draw a bunch of new
bootstrap samples, using our current bootstrap sample as if it were the original
sample, calculate the standard deviation of that bunch of sample medians, then
calculate t*, and go on to draw bootstrap sample number 2. (The concept of
dividing (Med* - Med) by its standard error is known as Studentizing, and it
is an important step.) In other words, we need to do bootstrapping within
bootstrapping. That is a lot of work, but computers never need to sleep or
rest, so it is not impossible.
Let B represent the number of bootstrap samples we calculate in the
outer loop, and let b represent the number of bootstrap samples we draw
based on each outer bootstrap sample. Thus we will draw B*b
samples in all. Efron works in terms of B being a number on the order of
1000, while b is a number on the order of 50. There is nothing sacred
about these values, but they should give you the general idea.
One more step. From the sampling above we will get B values of t*, from which
we will find the α/2 and 1 - α/2 cutoffs. We use those as we did in the
traditional method. But we need one more thing--we need the standard error of
the median that corresponds to the standard error of the mean in the
traditional formula. We can obtain an estimate of that by taking the medians
of our B outer samples and simply calculating the standard deviation of that
distribution. Call that SE(Med). Then our confidence limits become
Med + t*(α/2) × SE(Med) and Med + t*(1 - α/2) × SE(Med).
This is analogous to what we did with the mean.
I have discussed this approach with respect to the median, but you could use
it with respect to a wide array of statistics. The nice thing is that we don't
need to know a formula for the standard error we seek--we just go out and
estimate it by brute force.
Diagram of the bootstrapped t method:

Original Sample:  2 2 3 4 5 5 5 6 7 9  --> Med

Sample 1:         2 2 2 5 6 6 6 7 7 9  --> Med1*
  Subsample 1a:   2 2 5 5 5 5 6 6 7 9  --> Med1a
  Subsample 1b:   2 5 5 5 6 7 7 7 7 9  --> Med1b
  Subsample 1c:   2 5 5 6 6 6 7 7 9 9  --> Med1c

Sample 2:         3 3 3 5 5 5 6 6 6 9  --> Med2*
  Subsample 2a:   3 3 5 5 6 6 6 9 9 9  --> Med2a
  Subsample 2b:   3 3 3 3 5 5 5 6 6 9  --> Med2b
  Subsample 2c:   3 3 5 5 5 5 5 6 6 9  --> Med2c
Med = median of the original sample--the value around which we will build our
interval.
Med1*, Med2*, etc. = the outer set of bootstrapped medians, which will give us
the standard error of the median.
Med1a, Med1b, Med1c, etc. = the inner set of bootstrapped medians, which will
be used to calculate t1*. Similarly for Med2a, Med2b, and Med2c. These inner
sets generate the estimates of the standard errors of Med1*, Med2*, etc.
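The nested procedure diagrammed above can be sketched in Python. This is my own illustration, not the Resampling.exe implementation; following the text, the t* cutoffs are plugged directly into the traditional formula (some references reflect the cutoffs instead).

```python
import numpy as np

rng = np.random.default_rng(11)

def bootstrap_t_ci_median(sample, B=1000, b=50, alpha=0.05):
    """Nested bootstrap-t CI for the median: B outer samples, each
    Studentized using b inner resamples drawn from it."""
    x = np.asarray(sample)
    n = len(x)
    med = np.median(x)                                   # Med
    t_stars, outer_medians = [], []
    for _ in range(B):
        boot = rng.choice(x, n, replace=True)            # outer bootstrap sample
        med_star = np.median(boot)                       # Med*
        # inner loop: estimate the standard error of this particular Med*
        inner = [np.median(rng.choice(boot, n, replace=True)) for _ in range(b)]
        se_star = np.std(inner, ddof=1)
        if se_star > 0:                                  # skip degenerate inner sets
            t_stars.append((med_star - med) / se_star)   # Studentized statistic t*
        outer_medians.append(med_star)
    t_stars = np.sort(t_stars)
    k = len(t_stars)
    t_lo = t_stars[int(k * alpha / 2)]                   # alpha/2 cutoff (negative)
    t_hi = t_stars[int(k * (1 - alpha / 2)) - 1]         # 1 - alpha/2 cutoff
    se_med = np.std(outer_medians, ddof=1)               # SE(Med) from outer medians
    # use the cutoffs in the traditional formula, as described in the text
    return med + t_lo * se_med, med + t_hi * se_med

data = [2, 2, 3, 4, 5, 5, 5, 6, 7, 9]  # the original sample from the diagram
lower, upper = bootstrap_t_ci_median(data)
```

With B = 1000 and b = 50 this draws 50,000 samples in all, which still runs in a few seconds on a modern machine.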
Although I have discussed the bootstrapped t method in some
detail, it is not implemented in Resampling.exe.
I could say the same things here that I said for confidence limits on the
mean with respect to corrections for bias and acceleration. I will not repeat
that here, but the translation should be straightforward--though the
calculations are not.
Reaction time example
The Resampling.exe program calculates a confidence interval on the median
using the percentile approach. We can illustrate the result of this method
using an example that I have used elsewhere. This is based on a
study by Sternberg (1966), in which he asked subjects to view a set of digits
for a brief time (measured in milliseconds) and then view a single test digit.
They were required to press one key if the test digit had been present in the
comparison digits, and a different key if it had not. I will use the data from
the condition in which 5 comparison digits were first presented and the test
stimulus actually was one of those digits.
The distribution of reaction times is somewhat skewed. Notice that it has a
range of about 60 milliseconds, with a mean of about 65 milliseconds (the median
was 62). We would expect a positive skew because of the nature of the task.
If we were calculating 95% confidence limits on the mean, SPSS would tell us
that those limits were 61.01 and 68.19. However, SPSS cannot give us limits on
the median.
If we use our program to calculate confidence limits on the median, we obtain
the following results.
Notice that these limits are somewhat narrower (57.5 and 65.0) and that they
are slightly asymmetric around the sample median. The number of distinct
values of the median across bootstrapped samples is rather limited, but that
is because the median must either be one of the obtained values or the average
of two of them. Note also the slight positive skewness of the sampling
distribution of the median; the median is not as well behaved as the mean in
this respect, because the central limit theorem does not apply to medians.
David C. Howell
University of Vermont