One-way Analysis of Variance

11/13/01

Announcements:

Introduction:

The Analysis of Variance (Anova) is a test comparing the means of several groups. It is an extension of the independent groups t test to the multiple group case. (There is also an extension of the repeated measures t, but I’ll put that off until next semester.)

In the past I have discussed the equal n case before the unequal n case, but there is no great reason to do so. Here I’ll do it the other way around. There is a big difference between the two cases when we have more than one independent variable, but not with a one-way.

One-way Anova

Here we are talking about the situation in which we have only one independent variable, and the groups are stretched out along that dimension.

We are going to first test a null hypothesis that relates to all of the groups, and later I will come back and ask how we can test the differences between individual groups.

I’m going to give an example and plunge right into the calculation. I want students to go back to the text and understand exactly what I did. I am changing my example from previous years just for variety, but we will come back to the older one.

I am going to first simply use SPSS. Then I will go to an example where I work through the actual calculations by hand.

Cheryan, S. & Bodenhausen, G. V. (2000). When positive stereotypes threaten intellectual performance:... Psychological Science, 11, 399-402.

It has been shown that invoking a cultural stereotype can help or hinder performance, depending on the stereotype. Shih et al. (1999) showed that when they tested Asian-American women on a test of mathematical ability, making prominent the "Asian" characteristic of these women improved performance relative to making prominent their "Female" characteristic. But we also know that making those characteristics publicly apparent can hurt performance, because the individual then sees him/herself as having to uphold a positive stereotype, leading to "choking." Shih had not made the prominence public, and Cheryan & Bodenhausen wondered what would happen if you did.

They had three groups of Asian-American women. To make ethnicity salient the respondents completed a questionnaire with such items as "I am a worthy member of the racial group to which I belong." To make gender prominent, the questions were of the form "I am a worthy member of the gender to which I belong." In the private group the participants answered questions about "their personal, individual identity."

The results are in the following table.

Condition    Mean (%)    Standard Deviation    n
Ethnicity    71          17                    16
Gender       81          14                    16
Control      83          9                     16

We can see that when Ethnicity was made prominent, performance was worse than in the Control condition.

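I want to test the null hypothesis that says all three conditions have the same population mean: $H_0: \mu_{\text{Ethnicity}} = \mu_{\text{Gender}} = \mu_{\text{Control}}$.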

When I use SPSS to run the analysis I get the following

 

Interpret this only in terms of what the F means, and what the Sig. column means.

F is the ratio of Between MS over Within MS. Thus it is asking if there is more difference between groups than we would expect if the only source of variability in this experiment were error.

Cheryan reported an F(2,46) = 2.34, p < .025, but I don't see how she could have found that given the statistics she reported.
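As a rough check, the F can be rebuilt from the summary statistics in the table alone. The following is a minimal Python sketch (Python is my choice here, not part of the original SPSS analysis), and it assumes the rounded means and standard deviations in the table are accurate and that n = 16 in every group. The variable names are illustrative.

```python
# Summary statistics from the table above: (mean, standard deviation, n).
groups = {
    "Ethnicity": (71.0, 17.0, 16),
    "Gender": (81.0, 14.0, 16),
    "Control": (83.0, 9.0, 16),
}

k = len(groups)
N = sum(n for _, _, n in groups.values())

# Grand mean is the n-weighted mean of the group means.
grand_mean = sum(m * n for m, _, n in groups.values()) / N

# Between-groups SS: squared deviations of group means, weighted by n.
ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups.values())

# Within-groups SS: each group variance weighted by its df (n - 1).
ss_within = sum((n - 1) * s ** 2 for _, s, n in groups.values())

df_between = k - 1
df_within = N - k
f_ratio = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

With 48 participants in three groups the error df come out as 45, and the rounded table values give an F of about 3.5, which is one way to see why the reported F(2,46) = 2.34 is hard to reconcile with the reported statistics.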

Example #2: Foa et al.

Foa, Rothbaum, Riggs, and Murdock (1991). Treatment of posttraumatic stress disorder in rape victims: A comparison between cognitive-behavioral procedures and counseling. Journal of Consulting and Clinical Psychology, 59, 715-723.

The title tells us something about what they consider the important comparison—there are additional groups that they could have compared.

They compared three treatments (and a waiting list control) for the treatment of posttraumatic stress disorder in rape victims.

  1. Stress Inoculation Training (SIT): Instruction in coping skills (deep breathing, muscle relaxation, stopping intrusive thoughts, etc.)
  2. Prolonged Exposure (PE): 7 sessions devoted to reliving the rape scene in their imagination.
  3. Supportive Counseling (SC): Patients taught general problem solving, with the therapist playing an indirect, unconditionally supportive role. This was the control for nonspecific therapeutic effects.
  4. Waiting List Control (WL)

Therapy was carried out over 9 sessions. I am going to look only at the post-test scores.

There were pre and post treatment measures on a number of dependent variables. In fact, this was a much better study than I am going to present.

My dependent variable will be the sum of the subject’s ratings on about 15 variables related to PTSD—e.g. flashbacks, nightmares, memory difficulties, etc. Therefore, higher scores represent more disturbance.

Data:

          SIT        PE         SC         WL
          (n = 14)   (n = 10)   (n = 11)   (n = 10)

          3          18         24         12
          13         6          14         30
          13         21         21         27
          8          34         5          20
          11         26         17         17
          9          11         17         23
          12         2          23         13
          7          5          19         28
          16         5          7          12
          15         26         27         13
          18                    25
          12
          8
          10

Mean      11.07      15.40      18.09      19.50
St. Dev.  3.95       11.12      7.13       7.11
ΣX        155        154        199        195

Grand Mean = 15.6222

Describe what data file would look like.
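One common layout is "long" format: one row per participant, with one column identifying the group and one holding the score. The following Python sketch builds such a file from the scores in the data table. The column names "group" and "score" are hypothetical, and in SPSS the group column would typically be a numeric code with value labels rather than a string.

```python
import csv
import io

# Raw posttest scores from the data table, keyed by treatment group.
scores = {
    "SIT": [3, 13, 13, 8, 11, 9, 12, 7, 16, 15, 18, 12, 8, 10],
    "PE": [18, 6, 21, 34, 26, 11, 2, 5, 5, 26],
    "SC": [24, 14, 21, 5, 17, 17, 23, 19, 7, 27, 25],
    "WL": [12, 30, 27, 20, 17, 23, 13, 28, 12, 13],
}

# Long format: one (group, score) row per participant.
rows = [(group, x) for group, values in scores.items() for x in values]

# Write the rows as CSV text; an in-memory buffer stands in for a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["group", "score"])  # hypothetical column names
writer.writerows(rows)

# Show the header and the first few data rows.
print("\n".join(buffer.getvalue().splitlines()[:5]))
```

The file has 45 data rows in all, and the unequal group sizes need no special handling in this layout.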

Calculations:

First I want to define the formulae using a "definitional equation," and then I'll use a computational one. 
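For reference, the standard forms of these formulae can be written as follows. The notation is an assumption on my part rather than copied from the text: $X_{ij}$ is score $i$ in group $j$, $\bar{X}_j$ and $n_j$ are the mean and size of group $j$, $T_j$ is the group total, $\bar{X}_{..}$ is the grand mean, and $N$ is the total sample size.

```latex
% Definitional formulae
\begin{align*}
SS_{total} &= \sum_{j}\sum_{i} \left(X_{ij} - \bar{X}_{..}\right)^2 \\
SS_{treat} &= \sum_{j} n_j \left(\bar{X}_j - \bar{X}_{..}\right)^2 \\
SS_{error} &= \sum_{j}\sum_{i} \left(X_{ij} - \bar{X}_j\right)^2
\end{align*}

% Computational formulae
\begin{align*}
SS_{total} &= \sum X^2 - \frac{\left(\sum X\right)^2}{N} \\
SS_{treat} &= \sum_{j} \frac{T_j^2}{n_j} - \frac{\left(\sum X\right)^2}{N} \\
SS_{error} &= SS_{total} - SS_{treat}
\end{align*}
```

The two sets are algebraically equivalent, and because $n_j$ sits inside the sum they apply to unequal group sizes without modification.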

These translate to
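With the raw scores from the data table, the computational route can be carried out directly. The following is a minimal Python sketch of that arithmetic, using the usual computational formulae (sum of squared scores, squared group totals over n, and a shared correction term). The variable names are mine.

```python
# Raw posttest scores from the data table, keyed by treatment group.
scores = {
    "SIT": [3, 13, 13, 8, 11, 9, 12, 7, 16, 15, 18, 12, 8, 10],
    "PE": [18, 6, 21, 34, 26, 11, 2, 5, 5, 26],
    "SC": [24, 14, 21, 5, 17, 17, 23, 19, 7, 27, 25],
    "WL": [12, 30, 27, 20, 17, 23, 13, 28, 12, 13],
}

all_scores = [x for values in scores.values() for x in values]
N = len(all_scores)
k = len(scores)

# Correction term (sum X)^2 / N, shared by SS_total and SS_group.
correction = sum(all_scores) ** 2 / N

# SS_total = sum X^2 - correction.
ss_total = sum(x ** 2 for x in all_scores) - correction

# SS_group = sum over groups of (group total)^2 / n_j, minus correction.
ss_group = sum(sum(v) ** 2 / len(v) for v in scores.values()) - correction

# SS_error is obtained by subtraction.
ss_error = ss_total - ss_group

df_group, df_error = k - 1, N - k
ms_group = ss_group / df_group
ms_error = ss_error / df_error
f_ratio = ms_group / ms_error

print(f"SS_group = {ss_group:.4f}  SS_error = {ss_error:.4f}  SS_total = {ss_total:.4f}")
print(f"MS_group = {ms_group:.3f}  MS_error = {ms_error:.3f}  F({df_group}, {df_error}) = {f_ratio:.2f}")
```

These quantities should match the entries in the summary table that follows.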

Summary Table

Source    df    SS           MS         F       p
Group     3     507.8401     169.280    3.05    .0394
Error     41    2278.7377    55.579
Total     44    2786.5778

Since p < .05, we can reject H0 and conclude that the samples did not all come from populations with the same mean. The groups did not score equally at posttest.

But I think that the best way to look at a set of data is to graph them. I’ll first do this using boxplots in SPSS.

 

Direct Calculation:

The formulae above are fine for calculation by hand, but the more direct way to see exactly what we are doing is to consider the definitional formulae I gave above.

Explain what these two formulae say. They are algebraically equivalent to what we had before, and so give the same answer.

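SStreat is a weighted sum of the squared deviations of the sample means around the grand mean.

Here the weights are the group sample sizes, so larger groups count for more.

SSerror is a weighted sum of the cell variances, each weighted by its degrees of freedom (n - 1). Dividing SSerror by dferror gives MSerror, which is the weighted average of the cell variances.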
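Both definitional quantities can be checked numerically from the group means and variances. The following Python sketch uses only the standard library's statistics module. It is an illustration of the definitional formulae, not the SPSS computation, and the variable names are mine.

```python
from statistics import mean, variance

# Raw posttest scores from the data table, keyed by treatment group.
scores = {
    "SIT": [3, 13, 13, 8, 11, 9, 12, 7, 16, 15, 18, 12, 8, 10],
    "PE": [18, 6, 21, 34, 26, 11, 2, 5, 5, 26],
    "SC": [24, 14, 21, 5, 17, 17, 23, 19, 7, 27, 25],
    "WL": [12, 30, 27, 20, 17, 23, 13, 28, 12, 13],
}

all_scores = [x for values in scores.values() for x in values]
grand_mean = mean(all_scores)

# SS_treat: squared deviation of each group mean from the grand mean,
# weighted by that group's sample size.
ss_treat = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in scores.values())

# SS_error: each cell variance (n - 1 denominator) weighted by its df.
ss_error = sum((len(v) - 1) * variance(v) for v in scores.values())

# MS_error is therefore the df-weighted average of the cell variances.
df_error = sum(len(v) - 1 for v in scores.values())
ms_error = ss_error / df_error

print(f"SS_treat = {ss_treat:.4f}  SS_error = {ss_error:.4f}  MS_error = {ms_error:.3f}")
```

The results agree with the computational formulae, which is the point of calling the two sets algebraically equivalent.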

SPSS analysis:

I used the one-way procedure in SPSS to analyze these data. The results are below. I used the one-way procedure instead of the Anova/Factorial procedure just because it would give me the means I want, and because it will run multiple comparisons among group means when we come to them. The summary table itself will be exactly the same.

 

Notice that Levene's test is significant. That might make us nervous, but perhaps not. See below.
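The idea behind Levene's test is simple enough to sketch: replace each score by its absolute deviation from its own group mean, then run an ordinary one-way Anova on those deviations. The Python below does this for Foa's data. I am assuming the mean-centered version of the test. The function name one_way_f is hypothetical, and the sketch stops at the test statistic rather than computing a p value.

```python
from statistics import mean

# Raw posttest scores from the data table, keyed by treatment group.
scores = {
    "SIT": [3, 13, 13, 8, 11, 9, 12, 7, 16, 15, 18, 12, 8, 10],
    "PE": [18, 6, 21, 34, 26, 11, 2, 5, 5, 26],
    "SC": [24, 14, 21, 5, 17, 17, 23, 19, 7, 27, 25],
    "WL": [12, 30, 27, 20, 17, 23, 13, 28, 12, 13],
}


def one_way_f(groups):
    """Return the one-way Anova F ratio for a list of lists of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Absolute deviation of every score from its own group mean.
abs_devs = []
for values in scores.values():
    group_mean = mean(values)
    abs_devs.append([abs(x - group_mean) for x in values])

# Levene's statistic is the F from a one-way Anova on those deviations.
levene_w = one_way_f(abs_devs)
print(f"Levene W = {levene_w:.2f} on (3, 41) df")
```

A large value here reflects the fact that the PE group is far more variable than the SIT group, which is visible in the standard deviations in the data table.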

Get them to draw the necessary conclusions about treatment. Get them to see that we need to subsequently compare one group with another, and ask what kinds of comparisons they would like to make.

Sampling Distribution of F

I computed the sampling distribution of F for a situation in which the null hypothesis was true. 

I simply took Foa's data and allowed a resampling program to combine the groups and then sample at random. Here we are looking at the case where the null is true.

The sampling distribution follows.

I then assumed that the separate populations exactly mirrored the data that Foa found. In other words, the null is false to exactly the same degree that she found it false. I sampled from four different populations whose data were random (with replacement) samples from Foa's group data. This led to the following F distribution.

Notice the degree to which the second distribution is displaced to the right. This distance is what we will call the noncentrality parameter when we talk about power. In comparing these two distributions, the critical value is approximately 2.81.
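The two resampling schemes can be sketched in a few lines of Python. This is not the resampling program I used. It assumes sampling with replacement in both cases, and the seed, the number of resamples, and the function name one_way_f are all arbitrary choices made for illustration.

```python
import random

# Raw posttest scores from the data table, keyed by treatment group.
scores = {
    "SIT": [3, 13, 13, 8, 11, 9, 12, 7, 16, 15, 18, 12, 8, 10],
    "PE": [18, 6, 21, 34, 26, 11, 2, 5, 5, 26],
    "SC": [24, 14, 21, 5, 17, 17, 23, 19, 7, 27, 25],
    "WL": [12, 30, 27, 20, 17, 23, 13, 28, 12, 13],
}


def one_way_f(groups):
    """Return the one-way Anova F ratio for a list of lists of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


random.seed(2001)   # arbitrary seed so the sketch is repeatable
n_reps = 5000       # arbitrary number of resamples
groups = list(scores.values())
pooled = [x for g in groups for x in g]
observed_f = one_way_f(groups)

# Null true: every group is drawn with replacement from the pooled scores.
null_fs = sorted(
    one_way_f([random.choices(pooled, k=len(g)) for g in groups])
    for _ in range(n_reps)
)

# Null false: each group is drawn with replacement from its own scores.
alt_fs = [
    one_way_f([random.choices(g, k=len(g)) for g in groups])
    for _ in range(n_reps)
]

# Empirical .05 critical value, resampled p, and proportion of rejections.
critical = null_fs[int(0.95 * n_reps)]
p_resampled = sum(f >= observed_f for f in null_fs) / n_reps
power = sum(f > critical for f in alt_fs) / n_reps

print(f"critical F = {critical:.2f}, resampled p = {p_resampled:.3f}, power = {power:.2f}")
```

The proportion of F values from the second distribution that exceed the critical value from the first is an empirical estimate of power for an effect of the size Foa found.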

Welch's approach to heterogeneity of variance.

The following is taken from the solutions manual for this book. It shows that we still have a significant difference here even when we take heterogeneity into account.

I give a formula in the text (due to Welch) that could be used if you are really concerned. It is a very messy formula.

 

The result would still be significant: the F is substantially larger, although it is evaluated on a substantially reduced number of degrees of freedom. I feel more confident, because I know that even when I take the differences in group variances into account, I still get a significant result.
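The messy formula is less intimidating in code. The following Python sketch uses the usual statement of Welch's procedure: each group is weighted by n divided by its variance, and the error df are adjusted for unequal variances. I am assuming that this matches the form given in the text, and the variable names are mine.

```python
from statistics import mean, variance

# Raw posttest scores from the data table, keyed by treatment group.
scores = {
    "SIT": [3, 13, 13, 8, 11, 9, 12, 7, 16, 15, 18, 12, 8, 10],
    "PE": [18, 6, 21, 34, 26, 11, 2, 5, 5, 26],
    "SC": [24, 14, 21, 5, 17, 17, 23, 19, 7, 27, 25],
    "WL": [12, 30, 27, 20, 17, 23, 13, 28, 12, 13],
}

groups = list(scores.values())
k = len(groups)
sizes = [len(g) for g in groups]
means = [mean(g) for g in groups]

# Weights w_j = n_j / s_j^2: precise (low-variance, large) groups count more.
weights = [len(g) / variance(g) for g in groups]
w_sum = sum(weights)

# Weighted grand mean of the group means.
weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_sum

# Shared term: sum of (1 - w_j / sum w)^2 / (n_j - 1).
lam = sum((1 - w / w_sum) ** 2 / (n - 1) for w, n in zip(weights, sizes))

numerator = sum(w * (m - weighted_mean) ** 2
                for w, m in zip(weights, means)) / (k - 1)
denominator = 1 + (2 * (k - 2) / (k ** 2 - 1)) * lam

welch_f = numerator / denominator
welch_df_error = (k ** 2 - 1) / (3 * lam)

print(f"Welch F' = {welch_f:.2f} on ({k - 1}, {welch_df_error:.2f}) df")
```

Run on these data, the sketch gives an F' of roughly 5.4 on about 19 error df, compared with 3.05 on 41 df for the standard analysis.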

 

 

Last revised: November 11, 2001