t tests--Introduction

10/9/01 

Announcements:

  • Tell them about the answer sheet for last week's lab.

  • Are there remaining problems from last week?

  • Say something about odds ratios and relative risk?

Introduction:

  • Chi–square was the first true hypothesis testing procedure we have used.

  • Chi–square dealt with the case in which we had “qualitative” data and simply
    counted the number of subjects (observations) falling in each of several categories.

  • We set up a model that said that the rows and columns of a contingency table
    (Drug and Outcome) were independent, and created expected frequencies
    on the basis of that model. We asked if the data we obtained (O) were in
    line with what we would expect (E) if H0 (the model) were true. We
    rejected H0 if the data were not in line with the expectations that came
    from our model.

Now we will go to the analysis of “quantitative” or “measurement” data.

  • Here we will worry primarily about the mean.
  • In general we will worry about comparing two means to test H0: µ1 = µ2
  • For us, t tests are more important in terms of how they will fit with
    the analysis of variance than for simple tests of two means.
  • I’m going to start with the one–sample case, simply because it is clearer.
    • The one–sample test, in itself, is not particularly common.

One Sample tests (especially with paired scores)

I’m going to skip over the “truly one-sample” case and look at the case where we have paired scores and have created one sample of difference scores. This kills two birds with one stone and saves a lot of time.

I'm not really going to start with a t test, but with a "resampling" test, because that is a nice way to lay out what is happening, and at the same time remind people of some things that I covered in my faculty seminar presentation. I am going to abbreviate this discussion, however, because students have heard it before. A link to all of this stuff can be found at www.uvm.edu/~dhowell/StatPages/Resampling/Resampling.html.

There are several reasons for starting with resampling tests, but the primary one is that students can see that the logic of the resampling test and the logic of the t test are parallel. Resampling also gives a viable alternative to standard parametric tests.

Resampling Statistics

I'll start with the simple example that I have used elsewhere. 

Hoaglin, Mosteller, and Tukey (1983) looked at the role of beta-endorphins in response to stress. They recorded beta-endorphin levels in 19 patients 12 hours before surgery and again 10 minutes before surgery. The data follow in fmol/ml:

Subject    12 Hours Before    10 Min. Before     Gain
    1            10.0              20.0          10.0
    2             6.5              14.0           7.5
    3             8.0              13.5           5.5
    4            12.0              18.0           6.0
    5             5.0              14.5           9.5
    6            11.5               9.0          -2.5
    7             5.0              18.0          13.0
    8             3.5               6.5           3.0
    9             7.5               7.5           0.0
   10             5.8               6.0           0.2
   11             4.7              25.0          20.3
   12             8.0              12.0           4.0
   13             7.0              15.0           8.0
   14            17.0              42.0          25.0
   15             8.8              16.0           7.2
   16            17.0              52.0          35.0
   17            15.0              11.5          -3.5
   18             4.4               2.5          -1.9
   19             2.0               2.0           0.0

The relevant statistics are:

Before     Mean = 8.35 St. Dev. = 4.40

After        Mean = 16.05 St. Dev. = 12.51

Diff           Mean = 7.70 St. Dev. = 9.95 N = 19
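
As a check on the arithmetic, here is a minimal Python sketch that recomputes the summary statistics directly from the table. The variable names (before, after, gain) are my own labels for the three columns, not anything from the original analysis.

```python
from statistics import mean, stdev

# Beta-endorphin levels (fmol/ml) copied from the table above.
before = [10.0, 6.5, 8.0, 12.0, 5.0, 11.5, 5.0, 3.5, 7.5, 5.8,
          4.7, 8.0, 7.0, 17.0, 8.8, 17.0, 15.0, 4.4, 2.0]
after = [20.0, 14.0, 13.5, 18.0, 14.5, 9.0, 18.0, 6.5, 7.5, 6.0,
         25.0, 12.0, 15.0, 42.0, 16.0, 52.0, 11.5, 2.5, 2.0]

# Gain = 10 min. score minus 12 hr. score for each patient.
gain = [a - b for a, b in zip(after, before)]

for label, scores in [("Before", before), ("After", after), ("Diff", gain)]:
    # stdev() divides by N - 1, i.e. it is the sample standard deviation.
    print(f"{label:6s} Mean = {mean(scores):5.2f}  "
          f"St. Dev. = {stdev(scores):5.2f}  N = {len(scores)}")
```

Note that the standard deviation of the differences cannot be found from the two column standard deviations alone; it has to be computed on the Gain column itself.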

Resampling Approach

Suppose that stress has nothing to do with beta-endorphin levels.

  • We would expect that there is just as great a chance that the first score is less than the second, as there is that the second is less than the first.
  • Therefore even if we stayed with these data values, there would be a random arrangement of positive and negative signs on the Diff column.
    • You could think of it as picking a pair of scores, and flipping a coin to decide which would be the 12 hr. score, and which would be the 10 min. score.
    • IF stress is irrelevant, this is a perfectly good model for what is happening.
    • And, if stress is irrelevant, the set of data we have is no more or less likely than any other set of data.
  • We will use these particular values, and assign the sign of the difference at random.
  • Then we could take the mean of the Diff column and see how large it is.
    • This is the kind of mean we will get when stress doesn't make any difference.
  • We could do this 1000 times and plot the means. 
  • Those would be the means that we would expect if the signs were distributed at random and there was no effect due to stress.
  • Then we could see how those compare to the actual mean difference we obtained above.
  • It is important to keep the probability of a sample and the probability of a mean separate. When we flip coins, it is very unlikely that we would get 10 heads. But the result HHHHHHHHHH is just as probable, or as improbable, as the specific result HTHHHTHHTT, which has 6 heads and 4 tails. The reason that "6 heads" is more likely than "10 heads" is that there are so many more ways of getting 6 heads.
    • To take 2 coins, the outcomes
      HH     HT     TH     TT
      are all equally likely, each with probability .25. But 1 head and 1 tail is more likely than 2 heads, because the former consists of HT and TH (probability .50), while the latter is just HH (probability .25).
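
The coin-flipping point can be checked by brute force. The following is a small Python sketch (my own illustration, not part of the original materials) that lists every possible sequence of 10 flips and counts how many of them give each number of heads.

```python
from itertools import product

# Enumerate every possible sequence of 10 fair coin flips (2**10 of them).
sequences = list(product("HT", repeat=10))
p_each = 1 / len(sequences)   # each specific sequence is equally likely

# Count how many sequences produce each number of heads (0 through 10).
ways = {k: sum(1 for s in sequences if s.count("H") == k) for k in range(11)}

print(f"P(one specific sequence) = {p_each:.5f}")
print(f"ways to get 10 heads = {ways[10]:3d}   P(10 heads) = {ways[10] * p_each:.5f}")
print(f"ways to get  6 heads = {ways[6]:3d}   P( 6 heads) = {ways[6] * p_each:.5f}")
```

HHHHHHHHHH and HTHHHTHHTT each appear exactly once in the list, so they are equally probable as sequences; "6 heads" is more probable only because many sequences share that count.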

I created a program using resampling statistics to randomly assign + and - as discussed above. This example is based on a program called Resampling Stats by Simon and Bruce. It is a very simple program, but also a very powerful one. I have no desire for people to understand the following code, but I think that reading through it once is good for the soul. It gives a general understanding of what is going on.


numbers (10 7.5 5.5 6 9.5 2.5 13 3 0 .2 20.3 4 8 25 7.2 35 3.5 1.9 0) a  [put these numbers in a]
numbers (1  -1) b        [put these numbers in b]
repeat 1000                [repeat the following 1000 times]
sample 19 b c            [sample 19 scores from b and put them in c]
multiply a c d            [multiply numbers in a by numbers in c and put answers in d]
mean d e                    [take the mean of d and put it in e]
score e f                    [take the value in e and add it to the values in f]
end                            [end of loop]
histogram f                [draw a histogram of f]


  • This program reads in the absolute values of the differences in beta-endorphin levels that we obtained (a), and then reads in a 1 and a -1 (b).
  • Then it samples 19 times from the 1, -1 column with replacement, and puts those answers in c.
  • It is possible for c to contain all 1's, all -1's, or any number of each.
  • Then it multiplies a times c to randomly associate a sign with the entries in a. It puts this answer in d. (This is exactly equivalent to flipping a coin to choose which of a pair of scores will be the 12 hr. score and which will be the 10 min. score.)
  • Then it takes the mean of d, puts it in e, and accumulates those in f.
  • Then it repeats the above steps 1000 times and draws a histogram of the 1000 results.
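
For anyone without Resampling Stats, the same steps can be sketched in Python. This is my own rough translation, not the Simon and Bruce program; the function name resample_means and the seed are arbitrary choices, and the counts it produces will differ somewhat from run to run and from the frequency distribution reported in these notes.

```python
import random
from statistics import mean

# Absolute values of the 19 difference scores (the "a" vector above).
abs_diffs = [10, 7.5, 5.5, 6, 9.5, 2.5, 13, 3, 0, 0.2,
             20.3, 4, 8, 25, 7.2, 35, 3.5, 1.9, 0]
observed = 7.70   # size of the mean difference actually obtained

def resample_means(values, reps, rng):
    """Attach random signs to `values` and return the mean, `reps` times."""
    means = []
    for _ in range(reps):
        signs = [rng.choice((1, -1)) for _ in values]              # sample 19 b c
        means.append(mean(s * v for s, v in zip(signs, values)))   # multiply, mean, score
    return means

rng = random.Random(2001)   # arbitrary seed, used only for reproducibility
null_means = resample_means(abs_diffs, 1000, rng)

# Two-tailed: how often is a random-sign mean at least as extreme as ours?
extreme = sum(1 for m in null_means if abs(m) >= observed)
print(f"proportion at least as extreme as {observed}: {extreme / len(null_means):.3f}")
```

Counting on abs(m) handles both tails at once, which is the same as adding the counts in the two tails of the histogram.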

The frequency distribution follows:


Then I have a histogram of these results.

Our mean difference was 7.70 in absolute value (-7.70 when it is computed as the 12 hr. score minus the 10 min. score, which is the sign used here). We can compare that to what we find when we use a randomization test (which is what this is often called) when the null is true.

Notice from the frequency distribution that we find a value that rounds to -8 only 1 time out of 1000. That suggests that when the null hypothesis is true, and a score is just as likely to be higher as lower on the 12 hr. test than on the 10 min. test, the probability of a mean difference as low as -7.7 is about 1/1000 = .001. We had another resampling result where the difference was +8, and for a two-tailed test we need to add those together. Therefore the probability is 2/1000 = .002 that we would get such an extreme result with a true null hypothesis.

Because we actually obtained a mean difference of -7.7, this would suggest that the null is not true in this case.

Just for fun!

I have written my own program which will do similar resampling. I am sticking the results here just for fun. 

    In the Simon and Bruce program I resampled (and counted up) differences between the means. Here I have computed a t statistic for each resample and plotted those. There are some good, but technical, reasons why this approach is better, but in this case it is not very much better. Notice that the p value is about the same. Notice also that I drew many more samples; it is so quick that it doesn't matter.

t test (an alternative approach)

The previous approach treated the actual obtained sample values as a population, and drew from that population. [Well, that isn't really true for Fisher and many others. They don't worry about populations--they treat this as the set of numbers they got.] With a t test we are going to behave differently. We are going to assume that the population has the same standard deviation as we found (because we don't have a better guess). However, we are going to ask what kinds of means we would expect if we drew from an infinite population of difference scores (with that standard deviation) where we know that µ = 0. We are also going to assume that the population is normally distributed, which is not something we assumed above.

We will treat the set of difference scores as one sample.

  • If the beta-endorphin scores had not really changed over time, then we would expect that the average of the 12 hr. scores would be the same as the average of the 10 min. scores. This would mean that the mean difference score would be 0.00. (Explain why)

  • Ask what H0 is.

  • Ask what µ would be if H0 were true.

  • We will use D to stand for the set of difference scores.

  • Mean of D = 7.70; sD = 9.95; N = 19

  • Our mean difference is 7.70. Is that far enough from zero for us to reject H0?

  • To answer this question we need to know something about how means would be distributed if they were actually drawn from a normal population in which there are no pre-post differences.

    • Get them to think about what that statement means

      • This is subtly different from the resampling approach, because we are sampling from an infinite population, not from the population of the scores we actually obtained. And we are assuming that the population is normally distributed.

    • Ask them what they know (or can guess) about means and their distribution.

Sampling Distribution of the Mean

Define

Show them that this is what they were actually plotting when they plotted the distribution of means in lab.

Central Limit Theorem

  • Tells us exactly what the Sampling Distribution of the Mean looks like.

    • Given a population with mean µ and variance σ², the sampling distribution of
      the mean (the distribution of sample means) will have a mean = µ and a
      variance = σ²/N (st. dev. = σ/sqrt(N)). The distribution will approach normal as N,
      the sample size, increases.

    • Show transparencies

I have a plot of the sampling distribution of the mean when N = 19 and we sample from a normal population with a mean of 0 and a st. dev. of 9.95. The means of such samples are distributed as N(0, 2.28), because 9.95/sqrt(19) = 2.28. This is what the means would look like if we drew many samples of N = 19 from that population and calculated each sample mean.
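
A quick simulation makes the same point without the plot. This is a minimal Python sketch under my own assumptions: the population is normal with mean 0, its standard deviation is set to 9.95 (the standard deviation of the Gain column, standing in for σ), and the function name and seed are arbitrary.

```python
import random
from statistics import fmean, stdev

def simulate_sample_means(mu, sigma, n, reps, rng):
    """Draw `reps` samples of size `n` from N(mu, sigma); return their means."""
    return [fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

rng = random.Random(19)   # arbitrary seed, used only for reproducibility
sigma = 9.95              # st. dev. of the Gain column, standing in for sigma
n = 19
means = simulate_sample_means(0.0, sigma, n, 5000, rng)

print(f"mean of the sample means     = {fmean(means):.2f}")
print(f"st. dev. of the sample means = {stdev(means):.2f}")
print(f"sigma / sqrt(N)              = {sigma / n ** 0.5:.2f}")
```

The last two printed values should be close to each other, which is what the Central Limit Theorem says about the standard deviation of the sampling distribution.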

 

To get a nice example of the central limit theorem, and also to look at the sampling distribution of the variance, go to David Lane's pages at http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html Just click on Begin to begin.
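
If the applet is not available, the sampling distribution of the variance can be sketched in a few lines of Python. This is my own illustration, assuming a standard normal population (σ² = 1) and samples of N = 19; the seed and the number of samples are arbitrary.

```python
import random
from statistics import fmean, median, variance

rng = random.Random(7)   # arbitrary seed, used only for reproducibility
sigma2 = 1.0             # true population variance (assumed for the illustration)
n = 19

# Sample variance (N - 1 denominator) for each of 4000 samples of size 19.
variances = [variance([rng.gauss(0.0, 1.0) for _ in range(n)])
             for _ in range(4000)]

below = sum(1 for v in variances if v < sigma2) / len(variances)
print(f"mean of the sample variances   = {fmean(variances):.3f}")
print(f"median of the sample variances = {median(variances):.3f}")
print(f"proportion below sigma squared = {below:.3f}")
```

The mean of the sample variances sits near the true value, but the median falls below it and more than half of the samples underestimate σ², which is what positive skew looks like.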

Using the Central Limit Theorem to test hypotheses

  • We now know what the distribution of means of differences looks like:

    • Mean = µ = 0 (if H0 is true); st. dev. = σ/sqrt(N) = 9.95/sqrt(19) = 2.28

    • Here I have cheated and substituted the sample standard deviation in
      place of σ. This is a no-no, but let me get away with it for a minute.

    • Since this is a normal distribution with a known mean and st. dev., we
      can ask how extreme (and therefore how likely) our sample mean of 7.70 is.

    • z = (X - µ)/σ in the general case

      • becomes z = (mean of D - µ)/(σ/sqrt(N)) = (7.70 - 0)/(9.95/sqrt(19)) = 3.37 when the score is a sample mean

  • Show this graphically.
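
The z calculation can be sketched in Python as follows. This is only an illustration of the "cheating" version described above: the sample standard deviation of the Gain column is used as if it were σ, and the variable names are mine.

```python
from statistics import NormalDist, mean, stdev

# Gain column from the data table (10 min. score minus 12 hr. score).
gain = [10.0, 7.5, 5.5, 6.0, 9.5, -2.5, 13.0, 3.0, 0.0, 0.2,
        20.3, 4.0, 8.0, 25.0, 7.2, 35.0, -3.5, -1.9, 0.0]

n = len(gain)
mu0 = 0.0                            # mean of the differences if H0 is true
std_error = stdev(gain) / n ** 0.5   # sample s used in place of sigma (the "no-no")

z = (mean(gain) - mu0) / std_error
# Two-tailed probability of a z at least this extreme under the normal curve.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"standard error = {std_error:.2f}")
print(f"z = {z:.2f}")
print(f"two-tailed p under the normal curve = {p_two_tailed:.4f}")
```

Because s was substituted for σ, this probability is only approximate, which is the motivation for the t test that follows.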

 

Simple one sample t test

  • The preceding assumed that we know the variance of the population.

    • We rarely do, although here is a case where we happen to.

  • If we do not know σ, we can’t solve for z, which requires it.

  • BUT we could say that s is an estimate of σ, and therefore we could substitute
    that in our equation.

  • The only problem with that is that the answer is not really z.

  • As I say in the text, the sampling distribution of the variance is positively skewed.
    Therefore, s is more likely to be smaller than the true σ than larger. This means
    that the resulting answer is more likely to be larger (in absolute value) than z would be, if we
    could calculate it, than smaller.

  • Mention, but don’t repeat, the discussion in text about the sampling distribution of s². They could go back to the simple sampling study they conducted a few weeks ago and look at the variances (or standard deviations) that they got.

  • We will go ahead and make this calculation, but we will call the answer t to reflect the fact that it is not z.

    • t = (mean of D - µ)/(sD/sqrt(N)), with µ = 0 under H0 and N - 1 = 18 df

    • We will then look up the resulting t in t tables.

    • t.025(18) = ±2.101  <-- this is the critical value taken from tables of t.

  • We will reject H0, because the obtained t is beyond the critical value.
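
Here is a minimal Python sketch of the one-sample t calculation on the difference scores. The critical value is simply typed in from the t table (2.101 for 18 df, two-tailed, alpha = .05) rather than computed, and the variable names are my own.

```python
from statistics import mean, stdev

# Gain column from the data table (10 min. score minus 12 hr. score).
gain = [10.0, 7.5, 5.5, 6.0, 9.5, -2.5, 13.0, 3.0, 0.0, 0.2,
        20.3, 4.0, 8.0, 25.0, 7.2, 35.0, -3.5, -1.9, 0.0]

n = len(gain)
df = n - 1
mu0 = 0.0                            # mean difference if H0 is true

# Same form as z, but with the sample s in the denominator, so we call it t.
t = (mean(gain) - mu0) / (stdev(gain) / n ** 0.5)

critical = 2.101                     # t.025(18), copied from tables of t
reject_h0 = abs(t) > critical

print(f"t({df}) = {t:.2f}, critical value = ±{critical}, reject H0: {reject_h0}")
```

The arithmetic is identical to the z calculation; the only thing that changes is the distribution used to evaluate the result.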

The t distribution

[Figure: the t distribution (tdist01.gif)]

This distribution is a bit exaggerated. For most cases the distributions are closer together than this.

For a Java applet that allows you to adjust the degrees of freedom and see how the distribution changes, go to http://www-stat.stanford.edu/~naras/jsm/TDensity/TDensity.html It isn't going to hold your attention for more than about 8 seconds, but that's something.

SPSS

  • SPSS allows us to test the null hypothesis two different ways for this example.

    • We can test the mean of the differences against 0

    • We can run a dependent sample t using before and after.

    • The results will be the same.

  • Starting with the one-sample test:

T-TEST
  /TESTVAL=0
  /MISSING=ANALYSIS
  /VARIABLES=diff
  /CRITERIA=CIN (.95) .

  • Now the two-sample test (I can't control the order using menus)

T-TEST
  PAIRS= before WITH after (PAIRED)
  /CRITERIA=CIN(.95)
  /MISSING=ANALYSIS.

 

 

Note that the results are the same, and that they agree with what I got.

Point out that there is very likely to be a correlation between Time 1 and Time 2, which is why we need to resort to using the difference scores. (The correlation here is .699, which is about what I might expect.)

Get them to think about what it would mean if there were not a correlation between the scores at those two times.
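
One way to make that question concrete is to compute the standard error of the mean difference from the two columns and their correlation, and then ask what it would be if the correlation were zero. The following Python sketch is my own illustration; the function name se_of_mean_difference is hypothetical, and the "uncorrelated" result is a what-if, not an analysis anyone should run on paired data.

```python
from statistics import mean, stdev

before = [10.0, 6.5, 8.0, 12.0, 5.0, 11.5, 5.0, 3.5, 7.5, 5.8,
          4.7, 8.0, 7.0, 17.0, 8.8, 17.0, 15.0, 4.4, 2.0]
after = [20.0, 14.0, 13.5, 18.0, 14.5, 9.0, 18.0, 6.5, 7.5, 6.0,
         25.0, 12.0, 15.0, 42.0, 16.0, 52.0, 11.5, 2.5, 2.0]

n = len(before)
mb, ma = mean(before), mean(after)
sb, sa = stdev(before), stdev(after)

# Pearson correlation between the Time 1 and Time 2 scores.
cov = sum((b - mb) * (a - ma) for b, a in zip(before, after)) / (n - 1)
r = cov / (sb * sa)

def se_of_mean_difference(r_value):
    """Standard error of the mean difference for a given correlation."""
    return ((sa ** 2 + sb ** 2 - 2 * r_value * sa * sb) / n) ** 0.5

t_paired = (ma - mb) / se_of_mean_difference(r)            # same as t on the differences
t_if_uncorrelated = (ma - mb) / se_of_mean_difference(0.0)  # what-if: r = 0

print(f"r = {r:.3f}")
print(f"t with the obtained correlation = {t_paired:.2f}")
print(f"t if the correlation were zero  = {t_if_uncorrelated:.2f}")
```

A positive correlation shrinks the standard error of the difference, so ignoring the pairing throws away information and gives a smaller t for the same mean difference.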

  Last revised: 10/09/01