| Subject | 12 Hours | 10 Min. | Gain |
|---|---|---|---|
| 1 | 10.0 | 20.0 | 10.0 |
| 2 | 6.5 | 14.0 | 7.5 |
| 3 | 8.0 | 13.5 | 5.5 |
| 4 | 12.0 | 18.0 | 6.0 |
| 5 | 5.0 | 14.5 | 9.5 |
| 6 | 11.5 | 9.0 | -2.5 |
| 7 | 5.0 | 18.0 | 13.0 |
| 8 | 3.5 | 6.5 | 3.0 |
| 9 | 7.5 | 7.5 | 0.0 |
| 10 | 5.8 | 6.0 | 0.2 |
| 11 | 4.7 | 25.0 | 20.3 |
| 12 | 8.0 | 12.0 | 4.0 |
| 13 | 7.0 | 15.0 | 8.0 |
| 14 | 17.0 | 42.0 | 25.0 |
| 15 | 8.8 | 16.0 | 7.2 |
| 16 | 17.0 | 52.0 | 35.0 |
| 17 | 15.0 | 11.5 | -3.5 |
| 18 | 4.4 | 2.5 | -1.9 |
| 19 | 2.0 | 2.0 | 0.0 |
The relevant statistics are:
Before Mean = 8.35 St. Dev. = 4.40
After Mean = 16.05 St. Dev. = 12.51
Diff Mean = 7.70 St. Dev. = 13.52 N = 19
Suppose that stress has nothing to do with beta-endorphin levels.
- We would expect that there is just as great a chance that the first score is less than the second, as there is that the second is less than the first.
- Therefore even if we stayed with these data values, there would be a random arrangement of positive and negative signs on the Diff column.
- You could think of it as picking a pair of scores, and flipping a coin to decide which would be the 12 hr. score, and which would be the 10 min. score.
- IF stress is irrelevant, this is a perfectly good model for what is happening.
- And, if stress is irrelevant, the set of data we have is no more or less likely than any other set of data.
- We will use these particular values, and assign the sign of the difference at random.
- Then we could take the mean of the Diff column and see how large it is.
- This is the kind of mean we will get when stress doesn't make any difference.
- We could do this 1000 times and plot the means.
- Those would be the means that we would expect if the signs were distributed at random and there was no effect due to stress.
- Then we could see how those compare to the actual mean difference we obtained above.
- It is important to keep the probability of a sample and the probability of a mean separate. When we flip coins, it is very unlikely that we would get 10 heads. But the result HHHHHHHHHH is just as probable, or as improbable, as the specific result HTHHHTHHTT, which has 6 heads and 4 tails. The reason that "6 heads" is more likely than "10 heads" is that there are so many more ways of getting 6 heads.
- To take 2 coins, the outcomes
HH HT TH TT
are all equally likely, each with probability .25. But 1 head and 1 tail is more likely than 2 heads, because the former consists of HT and TH, while the latter is just HH.

I created a program using resampling statistics to randomly assign + and - as discussed above. This example is based on a program called Resampling Stats by Simon and Bruce. It is a very simple program, but also a very powerful one. I have no desire for people to understand the following code, but I think that reading through it once is good for the soul. It gives a general understanding of what is going on.
numbers (10 7.5 5.5 6 9.5 2.5 13 3 0 .2 20.3 4 8 25 7.2 35 3.5 1.9 0) a [put these numbers in a]
numbers (1 -1) b [put these numbers in b]
repeat 1000 [repeat the following 1000 times]
sample 19 b c [sample 19 scores from b and put them in c]
multiply a c d [multiply numbers in a by numbers in c and put answers in d]
mean d e [take the mean of d and put it in e]
score e f [take the value in e and add it to the values in f]
end [end of loop]
histogram f [draw a histogram of f]
- This program reads in the absolute values of the beta-endorphin differences that we obtained, and then reads in a 1 and a -1.
- Then it samples from the 1, -1 column with replacement, and puts those answers in c.
- It is possible for c to contain all 1's, all -1's, or any number of each.
- Then it multiplies a times c to randomly associate a sign with the entries in a. It puts this answer in d. (This is exactly equivalent to flipping a coin to choose which of a pair of scores will be the 12 hr. score and which will be the 10 min. score.)
- Then it takes the mean of d, puts it in e, and accumulates those means in f.
- Then it repeats the above steps 1000 times and draws a histogram of the 1000 results.
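For anyone who would rather read Python than Resampling Stats, the steps above can be sketched roughly as follows. This is only a loose translation, not the Simon and Bruce program itself: the function names, the seed, and the crude text histogram are assumptions made for the sketch, and the comments show which line of the original program each step is meant to mirror.

```python
import random
from collections import Counter
from statistics import mean

# Absolute differences, copied from the "numbers (...) a" line of the program.
ABS_DIFFS = [10, 7.5, 5.5, 6, 9.5, 2.5, 13, 3, 0, 0.2,
             20.3, 4, 8, 25, 7.2, 35, 3.5, 1.9, 0]


def resample_means(abs_diffs, reps=1000, seed=None):
    """Return `reps` mean differences, each computed after random sign flips."""
    rng = random.Random(seed)  # the seed is only there for reproducibility
    means = []                 # plays the role of f
    for _ in range(reps):      # repeat 1000
        signs = [rng.choice((1, -1)) for _ in abs_diffs]    # sample 19 b c
        signed = [a * s for a, s in zip(abs_diffs, signs)]  # multiply a c d
        means.append(mean(signed))                          # mean d e; score e f
    return means


def text_histogram(means):
    """Print a crude histogram of the means, rounded to whole numbers."""
    counts = Counter(round(m) for m in means)
    for value in sorted(counts):
        # One star per 5 resamples, followed by the raw count.
        print(f"{value:>4} {'*' * (counts[value] // 5)} ({counts[value]})")


means = resample_means(ABS_DIFFS, reps=1000, seed=1)
text_histogram(means)  # histogram f
```

The exact counts will change with the seed, but the pile of means should be centered near 0, which is what we expect when the signs are assigned at random.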
The frequency distribution follows:
[Figure: frequency distribution of the 1000 resampled mean differences]
Then I have a histogram of these results.
Our mean difference was -7.70 when taken as 12 hr. minus 10 min. (the Gain column above subtracts the other way, which gives +7.70). We can compare that to the means a randomization test (which is what this is often called) produces when the null is true.
Notice from the frequency distribution that we find a value (that rounds to) -8 only 1 time out of 1000. That suggests that when the null hypothesis is true, and scores are just as likely to be higher in the 12 hr. test as in the 10 min. test, the probability of a -7.7 is about 1/1000 = .001. We had another resampling result where the difference was +8, and for a two-tailed test we need to add those together. Therefore the probability is 2/1000 = .002 that we would get such an extreme result with a true null hypothesis.
Because we actually obtained a mean difference of -7.7, this would suggest that the null is not true in this case.
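The counting in the preceding paragraphs can be sketched in Python as well. This is a minimal illustration, not the program that produced the result quoted above: the function name, the seed, and the choice of 10000 resamples are all assumptions, and because the signs are random the proportion will not match 2/1000 exactly.

```python
import random

# Absolute differences from the Gain column; 7.70 is the size of the obtained mean.
ABS_DIFFS = [10, 7.5, 5.5, 6, 9.5, 2.5, 13, 3, 0, 0.2,
             20.3, 4, 8, 25, 7.2, 35, 3.5, 1.9, 0]
OBSERVED = 7.70


def two_tailed_p(abs_diffs, observed, reps=10000, seed=None):
    """Proportion of sign-flipped means at least as extreme as `observed`."""
    rng = random.Random(seed)
    n = len(abs_diffs)
    extreme = 0
    for _ in range(reps):
        m = sum(a * rng.choice((1, -1)) for a in abs_diffs) / n
        # Count both tails; the tiny tolerance guards against rounding error.
        if abs(m) >= abs(observed) - 1e-9:
            extreme += 1
    return extreme / reps


p = two_tailed_p(ABS_DIFFS, OBSERVED, reps=10000, seed=1)
print(f"two-tailed p is roughly {p:.4f}")
```

Taking the absolute value of each resampled mean is what makes this two-tailed: a mean of -8 and a mean of +8 both count as "at least as extreme" as what we obtained.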
I have written my own program which will do similar resampling. I am sticking the results here just for fun.
In the Simon and Bruce program I resampled (and counted up) differences between the means. Here I have computed a t statistic and plotted those. There are some good, but technical, reasons why this approach is better, but in this case it is not very much better. Notice that the p value is about the same. Notice that I drew many more samples--it is so quick that it doesn't matter.
The previous approach treated the actual obtained sample values as a population, and drew from that population. [Well, that isn't really true for Fisher and many others. They don't worry about populations--they treat this as the set of numbers they got.] With a t test we are going to behave differently. We are going to assume that the population has the same standard deviation as we found (because we don't have a better guess). However, we are going to ask what kinds of means we would expect if we drew from an infinite population of difference scores (with that standard deviation) where we know that µ = 0. We are also going to assume that the population is normally distributed, which is not something we assumed above.
We will treat the set of difference scores as one sample.
If the beta-endorphin scores had not really changed over time, then we would expect that the average of the 12 hr. scores would be the same as the average of the 10 min. scores. This would mean that the mean difference score would be 0.00. (Explain why)
Ask what H0 is.
Ask what µ would be if H0 were true.
We will use D to stand for the set of difference scores.
$\bar{D} = 7.70$; $s_D = 13.52$; $N = 19$
Our mean difference is 7.70. Is that far enough from zero for us to reject H0?
To answer this question we need to know something about how means would be distributed if they were actually drawn from a normal population where there are no pre-post differences.
Get them to think about what that statement means
This is subtly different from the resampling approach, because we are sampling from an infinite population, not from the population of the scores we actually obtained. And we are assuming that the population is normally distributed.
Ask them what they know (or can guess) about means and their distribution.
Define the sampling distribution of the mean.
Show them that this is what they were actually plotting when they plotted the distribution of means in lab.
Central Limit Theorem
Tells us exactly what the Sampling Distribution of the Mean looks like.
Given a population with mean µ and variance σ², the sampling distribution of the mean (the distribution of sample means) will have a mean = µ and a variance = σ²/N (st. dev. = σ/sqrt(N)). The distribution will approach normal as N, the sample size, increases.
Show transparencies
I have a plot of the sampling distribution of the mean when N = 19, which is N(0, 3.10). This is what the means would look like if we drew many samples of N = 19 from a population with a mean of 0 and a st. dev. of 13.52, and calculated each sample mean.
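If you want to see this for yourself rather than take the plot on faith, a small simulation along the following lines should do it. This is a sketch under the stated assumptions (a normal population with mean 0 and st. dev. 13.52, samples of N = 19); the function name, the seed, and the 5000 replications are arbitrary choices of mine.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(mu=0.0, sigma=13.52, n=19, reps=5000, seed=None):
    """Draw `reps` samples of size `n` from N(mu, sigma) and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(mean(sample))
    return means


means = simulate_sample_means(seed=1)
print(f"mean of the sample means:     {mean(means):.2f}")
print(f"st. dev. of the sample means: {stdev(means):.2f}")
print(f"sigma / sqrt(N):              {13.52 / sqrt(19):.2f}")
```

The last two printed values should be close to each other, which is the Central Limit Theorem's claim about the standard error.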
To get a nice example of the central limit theorem, and also to look at the sampling distribution of the variance, go to David Lane's pages at http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html Just click on Begin to begin.
Using the Central Limit Theorem to test hypotheses
We now know what the distribution of means of differences looks like:
Mean = µ = 0 (if H0 is true); st. dev. = σ/sqrt(N) = 13.52/sqrt(19) = 3.10
Here I have cheated and substituted the sample standard deviation in place of σ. This is a no-no, but let me get away with it for a minute.
Since this is a normal distribution with a known mean and st. dev., we can ask how extreme (and therefore how likely) our sample mean of 7.70 is.
$z = (X - \mu)/\sigma$ in the general case
becomes
$$z = \frac{\bar{D} - \mu}{\sigma/\sqrt{N}} = \frac{7.70 - 0}{3.10} = 2.48$$
Show this graphically.
The preceding assumed that we know the variance of the population. We rarely do, although here is a case where we happen to.
If we do not know σ, we can't solve for z, which requires it.
BUT we could say that s is an estimate of σ, and therefore we could substitute that in our equation.
The only problem with that is that the answer is not really z.
As I say in the text, the sampling distribution of the variance is skewed. Therefore, s is more likely to be smaller than the true σ than larger. This means that the resulting answer is more likely to be larger than z would be, if we could calculate it, than smaller.
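The claim that s tends to come out smaller than σ is easy to check by simulation. The following is a rough sketch, assuming a normal population with σ = 13.52 and samples of N = 19; the function name, the seed, and the number of replications are assumptions of mine, not anything from the text.

```python
import random
from statistics import stdev


def share_of_small_s(sigma=13.52, n=19, reps=5000, seed=None):
    """Proportion of samples whose standard deviation s falls below sigma."""
    rng = random.Random(seed)
    below = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        if stdev(sample) < sigma:  # stdev uses the N - 1 denominator
            below += 1
    return below / reps


share = share_of_small_s(seed=1)
print(f"s fell below sigma in {share:.1%} of the samples")
```

If the sampling distribution of s were symmetric around σ, the share would hover around 50%. It should come out somewhat above that, and the gap grows as N gets smaller.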
t.025(18) = ±2.101 <-- this is the critical value taken from tables of t.
We will reject H0.
Mention, but don't repeat, the discussion in the text about the sampling distribution of s². They could go back to the simple sampling study they conducted a few weeks ago and look at the variances (or standard deviations) that they got.
We will go ahead and make this calculation, but we will call the answer t to reflect the fact that it is not z
We will then look up the resulting t in t tables.
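The calculation itself can be sketched in a few lines of Python, using only the summary statistics quoted earlier (mean difference 7.70, st. dev. 13.52, N = 19) and the tabled critical value of 2.101. The function and variable names are my own, and this is an illustration of the arithmetic rather than a replacement for the t tables.

```python
from math import sqrt

# Summary statistics and critical value as quoted in these notes.
D_BAR = 7.70    # mean of the difference scores
S_D = 13.52     # st. dev. of the difference scores
N = 19          # number of subjects
T_CRIT = 2.101  # two-tailed critical value, alpha = .05, 18 df


def one_sample_t(mean_diff, sd_diff, n, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n)), with s standing in for sigma."""
    standard_error = sd_diff / sqrt(n)
    return (mean_diff - mu0) / standard_error


t = one_sample_t(D_BAR, S_D, N)
df = N - 1
print(f"t({df}) = {t:.2f}")
print("reject H0" if abs(t) > T_CRIT else "do not reject H0")
```

The formula is the same one we used for z. The only differences are that s replaces σ and that the result is compared against t on N - 1 = 18 df.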
[Figure: the t distribution compared with the normal distribution]
This distribution is a bit exaggerated. For most cases the distributions are closer together than this.
For a Java applet that allows you to adjust the degrees of freedom and see how the distribution changes, go to http://www-stat.stanford.edu/~naras/jsm/TDensity/TDensity.html
It isn't going to hold your attention for more than about 8 seconds, but that's something.
SPSS
SPSS allows us to test the null hypothesis two different ways for this example.
- We can test the mean of the differences against 0.
- We can run a dependent sample t using before and after.

The results will be the same.
Starting with the one-sample test:
T-TEST
/TESTVAL=0
/MISSING=ANALYSIS
/VARIABLES=diff
/CRITERIA=CIN (.95) .


T-TEST
PAIRS= before WITH after (PAIRED)
/CRITERIA=CIN(.95)
/MISSING=ANALYSIS.

Note that the results are the same, and that they agree with what I got.
Point out that there is very likely to be a correlation between Time 1 and Time 2, which is why we need to resort to using the difference scores. (The correlation here is .699, which is about what I might expect.)
Get them to think about what it would mean if there were not a correlation between the scores at those two times.
Last revised: 10/09/01