Factorial Anova #1
1/29/2002

Announcements

I was asked in class about the format for reporting research results. I have scanned the relevant section from the APA Publication Manual, and I recommend it as the most authoritative source.

Definition of Factorial Anova

Define a factorial as a crossing of variables.

Define as 3×5, etc.

In what I am talking about now, we will have independent groups of subjects in each cell. This will change when we come to repeated measures.

Example

Spilich et al. (1992) looked at the effects of smoking on various tasks. They varied tasks in terms of the level of cognitive involvement required to perform the task (Pattern recognition, Recall task, Driving simulation) and in terms of the smoking behavior of subjects (Non-smoker, Delayed smoker (not currently smoking), and Active smoker (currently or recently smoking)). In the original study the dependent variable was different in each task. I have made it "errors" across all tasks, and have not violated the spirit of what they found.

What would students expect to find in this study?

    Does that involve an interaction?

The data are provided in the following table for those who want to work with them. They are also available at Spilich.sav.

Notice that there are the same number of observations in each cell.

Pattern Recognition           Recall Task                   Driving Simulation
Non-      Delayed   Active    Non-      Delayed   Active    Non-      Delayed   Active
smoker    smoker    smoker    smoker    smoker    smoker    smoker    smoker    smoker
9         12        8         27        48        34        15        7         3
8         7         8         34        29        65        2         0         2
12        14        9         19        34        55        2         6         0
10        4         1         20        6         33        14        0         0
7         8         9         56        18        42        5         12        6
10        11        7         35        63        54        0         17        2
9         16        16        23        9         21        16        1         0
11        17        19        37        54        44        14        11        6
8         5         1         4         28        61        9         4         4
10        6         1         30        71        38        17        4         1
8         9         22        4         60        75        15        3         0
10        6         12        42        54        61        9         5         0
8         6         18        34        51        51        3         16        6
11        7         8         19        25        32        15        5         2
10        16        10        49        49        47        13        11        3

Analysis

Cell Means

[Image: task1.gif, means by Task]

[Image: smkgrp.gif, means by smoking group]

[Image: cellmeans.gif, cell means]

We can see that there are very large differences in means in the different tasks. That is part of the nature of the task, and is completely uninteresting.

We can also see that there are differences among the smoking groups, with active smokers making more errors than nonsmokers. We don't yet know if this is significant.

The cell means show the most interesting pattern.

We can plot the means and get a more visual effect.
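For those who want to check the means without SPSS, here is a minimal sketch in plain Python (my choice of language for illustration; the course itself uses SPSS). It assumes the data are typed in exactly as in the table above, one row per line and nine columns in table order. The names ROWS, cell_scores, and so on are made up for this example.

```python
# Spilich data as laid out in the table: columns 0-2 are Pattern Recognition,
# 3-5 are Recall, 6-8 are Driving; within each task the order is
# Nonsmoker, Delayed smoker, Active smoker.
TASKS = ["Pattern", "Recall", "Driving"]
GROUPS = ["Nonsmoker", "Delayed", "Active"]
ROWS = [
    [9, 12, 8, 27, 48, 34, 15, 7, 3],
    [8, 7, 8, 34, 29, 65, 2, 0, 2],
    [12, 14, 9, 19, 34, 55, 2, 6, 0],
    [10, 4, 1, 20, 6, 33, 14, 0, 0],
    [7, 8, 9, 56, 18, 42, 5, 12, 6],
    [10, 11, 7, 35, 63, 54, 0, 17, 2],
    [9, 16, 16, 23, 9, 21, 16, 1, 0],
    [11, 17, 19, 37, 54, 44, 14, 11, 6],
    [8, 5, 1, 4, 28, 61, 9, 4, 4],
    [10, 6, 1, 30, 71, 38, 17, 4, 1],
    [8, 9, 22, 4, 60, 75, 15, 3, 0],
    [10, 6, 12, 42, 54, 61, 9, 5, 0],
    [8, 6, 18, 34, 51, 51, 3, 16, 6],
    [11, 7, 8, 19, 25, 32, 15, 5, 2],
    [10, 16, 10, 49, 49, 47, 13, 11, 3],
]


def cell_scores(t, g):
    """Return the 15 scores for task index t and smoking-group index g."""
    return [row[3 * t + g] for row in ROWS]


def mean(xs):
    return sum(xs) / len(xs)


# Cell means, then marginal means pooled over the other factor.
cell_means = {(t, g): mean(cell_scores(t, g)) for t in range(3) for g in range(3)}
task_means = [mean([x for g in range(3) for x in cell_scores(t, g)]) for t in range(3)]
group_means = [mean([x for t in range(3) for x in cell_scores(t, g)]) for g in range(3)]
grand_mean = mean([x for row in ROWS for x in row])

print(f"{'':8s} " + "  ".join(f"{g:>9s}" for g in GROUPS) + "  |      Mean")
for t, task in enumerate(TASKS):
    cells = "  ".join(f"{cell_means[(t, g)]:9.2f}" for g in range(3))
    print(f"{task:8s} {cells}  | {task_means[t]:9.2f}")
print(f"{'Mean':8s} " + "  ".join(f"{m:9.2f}" for m in group_means) + f"  | {grand_mean:9.2f}")
```

Because every cell has the same number of observations, the marginal means are simply the averages of the cell means in that row or column.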

Calculations

I’m going to please everyone by not focusing on hand calculations. (Let A represent Tasks and C represent Smoking conditions.)

If we look at the means (from the table) it looks as if there is a difference due to Task and to Smoking. These are called main effects, because they are the effect of one variable averaged over the other variable(s).

It also looks as if there is an interaction effect, meaning that the pattern across one level of Task looks different from the pattern across other levels of Task. (Or vice versa.)

The SPSS analysis is shown below:

(Illustrate setup using SPSS.)

The shaded areas in the above table are lines that we don't normally include. They are perfectly correct, but just not very useful for our purposes.

The following is a paste of an exchange with a woman named Diana Sharp, who asked a perfectly reasonable question about the extra terms. (Note how easily an e-mail message can get pasted into a web page and made available for the whole world to see. In this case she had sent hers to a wide audience, but she could have sent it privately to me, and then done an internet search and discovered that her message was floating around in the ether. Just a comment.)

Diana,

You asked about all of those extraneous terms in SPSS factorial anova.

The "intercept" is what most textbooks (especially older ones) call the "correction factor." It is the grand total squared divided by N. (Equivalently, it is xbar squared times N.)

The "Corrected total" is what everyone else calls the total sum of squares.

The "total" is the total sum of squares before you subtract the correction factor, or, equivalently, sigma(X-squared).

The "corrected model" is, when you have equal sample sizes, the sum of the main effects and the interaction. When you don't have equal sample sizes, it is the variability that can be explained by all three effects (the two main effects and the interaction) simultaneously.

Hope that helps,

Dave Howell

At 10:37 PM 1/26/99 -0500, Diana Sharp wrote:

>I am using GLM Univariate to do two factor analysis of variance for my PhD research (trying to do it without a stats coach!) The results include an "intercept", a corrected model, a total and a corrected total. Please help me understand what the intercept is. Also, the corrected model is, I assume, due to the unequal sizes of the groups in my study. Is there any way to know which one of the main effects or the interaction to attribute the "difference" between the Sum of Squares for each of the main and interaction effects versus the amount in the corrected model Sum of Squares (they never quite match up mathematically). I assume one cannot attribute it directly to any of them.

>Can anyone shed some light on this for me. I would really appreciate the assistance.

>In addition, thanks for the help with copying charts. I thought I was the only one seeing double. Now I can stop retyping all these (if I get your instructions to work)!

>Sincerely,

>Diana Sharp

From the Task entry, we can see that there are significant differences due to tasks. I don't care about this effect, and will ignore it. (Ask: why?)

From the Smkgrp entry we can see that differences between smoking groups overall (the main effect of Smkgrp) are not significant. This will turn out not to be an important effect, anyway, so we'll set it aside. (Ask: why?)

You can see that there is a significant interaction, which is something that we had predicted. It is this interaction that caused me to say that I didn't care very much about the Smkgrp main effect.
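The SPSS summary table does not survive in this text version, so here is a rough sketch of how its rows could be rebuilt from the raw data in plain Python. It uses the usual equal-n computational formulas (totals squared, divided by the number of scores in each total, minus the correction factor), which are only appropriate because every cell has 15 observations. The variable names are mine, and the labels simply mimic the SPSS row names discussed in the e-mail above.

```python
# Raw data, one row per line of the data table, nine columns in table order
# (Pattern: Non, Delayed, Active; Recall: ...; Driving: ...).
ROWS = [
    [9, 12, 8, 27, 48, 34, 15, 7, 3],
    [8, 7, 8, 34, 29, 65, 2, 0, 2],
    [12, 14, 9, 19, 34, 55, 2, 6, 0],
    [10, 4, 1, 20, 6, 33, 14, 0, 0],
    [7, 8, 9, 56, 18, 42, 5, 12, 6],
    [10, 11, 7, 35, 63, 54, 0, 17, 2],
    [9, 16, 16, 23, 9, 21, 16, 1, 0],
    [11, 17, 19, 37, 54, 44, 14, 11, 6],
    [8, 5, 1, 4, 28, 61, 9, 4, 4],
    [10, 6, 1, 30, 71, 38, 17, 4, 1],
    [8, 9, 22, 4, 60, 75, 15, 3, 0],
    [10, 6, 12, 42, 54, 61, 9, 5, 0],
    [8, 6, 18, 34, 51, 51, 3, 16, 6],
    [11, 7, 8, 19, 25, 32, 15, 5, 2],
    [10, 16, 10, 49, 49, 47, 13, 11, 3],
]
a, b, n = 3, 3, 15          # levels of Task, levels of Smkgrp, scores per cell
N = a * b * n


def cell(t, g):
    """Scores for task index t and smoking-group index g."""
    return [row[3 * t + g] for row in ROWS]


scores = [x for row in ROWS for x in row]

intercept = sum(scores) ** 2 / N               # the "correction factor"
total = sum(x * x for x in scores)             # uncorrected total, sum of X squared
corrected_total = total - intercept            # what most texts call SStotal

task_totals = [sum(sum(cell(t, g)) for g in range(b)) for t in range(a)]
group_totals = [sum(sum(cell(t, g)) for t in range(a)) for g in range(b)]
cell_totals = [sum(cell(t, g)) for t in range(a) for g in range(b)]

ss_task = sum(T * T for T in task_totals) / (b * n) - intercept
ss_smk = sum(T * T for T in group_totals) / (a * n) - intercept
corrected_model = sum(T * T for T in cell_totals) / n - intercept
ss_inter = corrected_model - ss_task - ss_smk  # valid only with equal n
ss_error = corrected_total - corrected_model

ss = {"Task": ss_task, "Smkgrp": ss_smk, "Task*Smkgrp": ss_inter}
df = {"Task": a - 1, "Smkgrp": b - 1, "Task*Smkgrp": (a - 1) * (b - 1)}
df_error = N - a * b
ms_error = ss_error / df_error
f_ratio = {k: (ss[k] / df[k]) / ms_error for k in ss}

print(f"Corrected Model  {corrected_model:12.3f}")
print(f"Intercept        {intercept:12.3f}")
for k in ss:
    print(f"{k:16s} {ss[k]:12.3f}  df={df[k]}  F={f_ratio[k]:.3f}")
print(f"Error            {ss_error:12.3f}  df={df_error}  MS={ms_error:.3f}")
print(f"Total            {total:12.3f}")
print(f"Corrected Total  {corrected_total:12.3f}")
```

With equal cell sizes the Corrected Model line is exactly the sum of the two main effects and the interaction, which is the point made in the e-mail.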

We have the means above, so we can see what's going on. I can also use SPSS to draw a graph of these results, which is what I prefer. Notice that this graph is different from the one above, only because of which variable is plotted on the X-axis and which variable is plotted with separate lines. (Ask: Is one plot better than the other?)

Remember that the dv is Errors, so high scores are bad!

[Image: plot1.gif, plot of the cell means]

Summary table again

Discuss the structure of the summary table.

Point out the main effects and interaction, and draw the appropriate conclusion.

Point out the MSerror and discuss what it is.

Emphasize that it is the average within cell variance (weighted by sample size if necessary.)

Ask what would happen if we reran this as a one-way, ignoring the presence of the Task variable. I want them to see that any Task and interaction effects would go into the error term. See following printout.

New SSerror = SSerror + SSTask + SSinteraction = 13587.20 + 28661.526 + 2728.652 = 44977.378

This would be on new dferror = dferror + dfTask + dfinteraction = 126 + 2 + 4 = 132, giving a new MSerror = 44977.378/132 = 340.738, which is much larger than the factorial MSerror of 107.835. This would drastically reduce power.
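The pooling arithmetic can be checked with a few lines of Python. This is only a sketch using the sums of squares and degrees of freedom quoted above; the variable names are made up for the example.

```python
# Values quoted from the factorial summary table.
ss_error, ss_task, ss_inter = 13587.20, 28661.526, 2728.652
df_error, df_task, df_inter = 126, 2, 4

# Error term in the factorial analysis.
ms_error_factorial = ss_error / df_error

# In a one-way on Smkgrp alone, Task and the interaction are no longer
# in the model, so their variability falls into the error term.
ss_error_oneway = ss_error + ss_task + ss_inter
df_error_oneway = df_error + df_task + df_inter
ms_error_oneway = ss_error_oneway / df_error_oneway

print(f"Factorial MSerror: {ms_error_factorial:.3f} on {df_error} df")
print(f"One-way MSerror:   {ms_error_oneway:.3f} on {df_error_oneway} df")
print(f"Ratio:             {ms_error_oneway / ms_error_factorial:.2f}")
```

The ratio line shows how many times larger the one-way error term is, which is why any F for Smkgrp would shrink accordingly.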

Simple effects

A simple effect is the effect of one variable at one level of the other independent variable. Thus, for example, it is the effect of Smkgrp at Task = 1. (Or the effect of Smkgrp at Task = 2, or of Smkgrp at Task = 3.)

The easiest way to get the simple effects is to restrict the analysis to one level of the other variable, though you still might have to go back and correct the F value.

To do this we need to use a filter to specify either that we will only use one set of data, or that we will run separate analyses for each level of Task. We will just use Data/Split File.

This analysis will be the effect of Smkgrp at each level of Task.

[Image: simple1.gif]

[Image: simple2.gif]

In the Overall Anova, MSerror was 107.835 on 126 df. For the first analysis (Task level = 1) we will take the above table and replace the error term in that table by 107.835, and the df by 126. This will give us an F = .01, which is clearly not significant. 

We would do the same thing for the other simple effects, getting F's of 12.256 and 2.029. The first is significant, but not the second (critical value approx = 3.07 for 2 and 126 df). Notice that these are not the same conclusions that we came to when treating them as three separate analyses.
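The three F's against the pooled error term can be reproduced from cell totals alone. The sketch below assumes the cell totals obtained by summing each column of the data table (15 scores per cell) and takes MSerror = 107.835 from the overall analysis. The function name simple_effect_f is hypothetical.

```python
N_PER_CELL = 15
MS_ERROR = 107.835          # pooled error from the overall Anova, 126 df
F_CRIT = 3.07               # approximate critical value for 2 and 126 df

# Column sums of the data table: Nonsmoker, Delayed, Active within each task.
CELL_TOTALS = {
    "Pattern": [141, 144, 149],
    "Recall": [433, 599, 713],
    "Driving": [149, 102, 35],
}


def simple_effect_f(totals, n, ms_error):
    """F for Smkgrp at one level of Task, tested against a supplied error term."""
    k = len(totals)
    ss = sum(t * t for t in totals) / n - sum(totals) ** 2 / (k * n)
    ms = ss / (k - 1)
    return ms / ms_error


f_values = {task: simple_effect_f(t, N_PER_CELL, MS_ERROR)
            for task, t in CELL_TOTALS.items()}

for task, f in f_values.items():
    verdict = "significant" if f > F_CRIT else "not significant"
    print(f"Smkgrp at {task:8s} F(2, 126) = {f:7.3f}  {verdict}")
```

Swapping MS_ERROR for the error term from a separate analysis of each task would give the unpooled F's instead.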

There is a good argument to be made that replacing the error term with MSerror was not a very bright idea. Notice that the individual error terms varied from about 21 to about 279, indicating heterogeneity of variance between tasks. Given the nature of the tasks, this might well have been predicted. If these were my data I would not use the pooled error term.

Ask: Why might we expect such heterogeneity of variance in this analysis?

The tasks are very different from one another, and I can easily imagine that you might have greater (or lesser) variability in errors on a driving simulation task than on a cognitive task. Notice that the recall task has a much higher mean. Though that is probably meaningless to us, it would allow for much more variability on recall.

There is a big discussion among people who like statistics about when to look at the simple effects and when not to. I strongly suggest looking at them whenever the interaction is significant. An interaction is telling you that different things happen to one variable at different levels of the other variable, and I think you need to look and see what those things are.

A related question is how to treat main effects in the presence of an interaction. I’ll try to remember to come back to that.

Calculation of Power for Spilich Data Given Sample Means as Parameters

The following table gives the cell means, the row and column means, and the treatment effects. (The treatment effects in each cell are in parentheses.)

It is important that students understand this, because they are almost certain to have to carry out power calculations before they finish their degree. (At least I hope they do.) I may very well put something like this on an exam.

 

              Nonsmoker   Delayed Smoker   Active Smoker   Mean     Effect
Pattern Rec   9.40        9.60             9.93            9.64     -8.615
              (1.948)     (-0.563)         (-1.385)
Recall        28.87       39.93            47.53           38.78    20.518
              (-7.718)    (0.637)          (7.081)
Driving       9.93        6.80             2.33            6.36     -11.904
              (5.770)     (-0.074)         (-5.696)
Mean          16.07       18.78            19.93           18.26
Effect        -2.193      0.519            1.674

Note how the "effects" are calculated. I have done that immediately below. They are the deviations of row or column means from the grand mean, and then, for the interaction, deviations of cell means from the grand mean with row and column effects removed.

Students need to understand how to calculate them. (There is an error in the first cell effect, because 9.94 should read 9.40. The answer is correct. It would take me too long to correct this, because it is one large graphic.)

[Image: effects.gif, calculation of the treatment effects]
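Since the graphic may not be available, here is a small Python sketch of the same calculation. It assumes the two-decimal cell means from the table, so the results agree with the tabled effects only to about two decimals. Effects are taken as mean minus grand mean. Only the squared effects enter the power calculation, so the sign convention does not change anything that follows.

```python
# Cell means from the table: rows are Pattern Rec, Recall, Driving;
# columns are Nonsmoker, Delayed Smoker, Active Smoker.
CELL_MEANS = [
    [9.40, 9.60, 9.93],
    [28.87, 39.93, 47.53],
    [9.93, 6.80, 2.33],
]
n_rows, n_cols = len(CELL_MEANS), len(CELL_MEANS[0])

# With equal cell sizes the marginal means are plain averages of cell means.
row_means = [sum(row) / n_cols for row in CELL_MEANS]
col_means = [sum(CELL_MEANS[i][j] for i in range(n_rows)) / n_rows
             for j in range(n_cols)]
grand_mean = sum(row_means) / n_rows

# Main effects: marginal mean minus grand mean.
row_effects = [m - grand_mean for m in row_means]
col_effects = [m - grand_mean for m in col_means]

# Interaction effects: what is left of each cell mean after removing
# the grand mean and both main effects.
inter_effects = [
    [CELL_MEANS[i][j] - row_means[i] - col_means[j] + grand_mean
     for j in range(n_cols)]
    for i in range(n_rows)
]

print("Task effects:   ", [round(e, 3) for e in row_effects])
print("Smkgrp effects: ", [round(e, 3) for e in col_effects])
for row in inter_effects:
    print("Interaction:    ", [round(e, 3) for e in row])
```

A useful check is that each set of effects sums to zero, as do the interaction effects within every row and every column.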

MSerror is found in the summary table to be 107.835.

Power in a factorial is a direct extension of the way we calculated power with a one-way. There we calculated

[Image: factor1.gif, the one-way power formula]

Here we will simply extend that to rows, columns, and interactions.

In what follows I have replaced terms like [image: factor2.gif] with Σαj² (the sum of the squared effects αj).

I use the symbol k to refer to the number of rows, columns, or cells whose deviations are in the numerator. (Probably not a great idea, but I did it. If I were smart I would use r, c, and rc rather than k.)

[Images: phi.gif (two formula images)]

There is a problem when we come to specifying the sample size for Cohen's tables. For reasons I won't go into, Cohen defines

n' = dferror/(dfeffect +1) + 1

For our main effects this becomes 126/3 + 1 = 43

and for the interaction this is 126/5 + 1 = 26.2

Using the tables from Cohen, with dfe = 30 and phi-prime rounded, or a program called G*Power, I calculate power as

Effect        n'    Phi-prime   Phi    Power
Task          43    1.42        9.55   .99
SmkGrp        43    0.16        1.07   .32
Interaction   26    0.43        1.68   .98
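The formula images are missing from this version, so the following Python sketch rests on an assumption: that phi-prime is the square root of the mean squared effect divided by MSerror, which is the usual Cohen-style definition. With the tabled effects it gives 0.16 for SmkGrp and 0.43 for the interaction, matching the table. For Task it gives about 1.40 rather than the tabled 1.42, so treat it as an approximation to whatever was computed originally. The n' function follows the formula stated above.

```python
from math import sqrt

MS_ERROR = 107.835
DF_ERROR = 126

# Treatment effects from the table of cell means (signs are irrelevant
# here because every effect is squared).
EFFECTS = {
    "Task": [-8.615, 20.518, -11.904],
    "SmkGrp": [-2.193, 0.519, 1.674],
    "Interaction": [1.948, -0.563, -1.385,
                    -7.718, 0.637, 7.081,
                    5.770, -0.074, -5.696],
}
DF_EFFECT = {"Task": 2, "SmkGrp": 2, "Interaction": 4}


def phi_prime(effects, ms_error):
    """Assumed definition: sqrt of (mean squared effect / error variance)."""
    k = len(effects)        # number of rows, columns, or cells
    return sqrt(sum(e * e for e in effects) / k / ms_error)


def n_prime(df_error, df_effect):
    """Cohen's adjusted sample size for entering his tables."""
    return df_error / (df_effect + 1) + 1


results = {
    name: (n_prime(DF_ERROR, DF_EFFECT[name]), phi_prime(eff, MS_ERROR))
    for name, eff in EFFECTS.items()
}

for name, (np_, pp) in results.items():
    print(f"{name:12s} n' = {np_:5.1f}   phi-prime = {pp:.2f}")
```

Power itself still has to come from Cohen's tables or from a program such as G*Power; the sketch only prepares the quantities you would look up.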

I cannot make this come out properly when I use the tables of the noncentral F distribution. After several hours on it, I gave up. But students generally don't calculate power using the non-central F anyway.

A great source for power is a program called G*Power. It is available at

http://www.psychologie.uni-trier.de:8000/projects/gpower.html

Even if you don't want the program itself, they have an excellent manual that covers lots of stuff about power.

Effect Size Measures

I have been advocating the use of effect size measures. There are several different things that you could do here.

1.  You could use an r² type measure, such as η² or ω². I talk about those in the text.

2.  You could use a modification of an r² type measure to give a more accurate account of what is going on. For example, there is a very large Task effect, which is of no particular interest. If you were to calculate η² for either SmokeGrp or the Interaction, you would get a relatively small answer because SStotal is inflated by the Task effect. It would be perfectly legitimate to subtract SSTask from SStotal in calculating the denominator. Just be sure that you tell your reader what you have done.

3.  d or f

In discussing power, we calculated φ′. For SmokeGrp it was 0.16. This is what Cohen called f and what others have called d. It is a standardized measure of the variability of group means.

4.  d or f for a pair of means

As I said in class yesterday, it often makes far more sense to present the effect measure for two groups, because it is clear what that measure means, and it is often the thing you really care about. 

I think that it would be particularly interesting to compare the errors for the NonSmokers and the Active Smokers, but only for the simple effect of SmokeGrp at Recall.

Ask why I wouldn't want it for the two groups overall.

Taking √MSerror as our estimate of σ, we would have

d = (47.53 − 28.87)/√107.835 = 18.66/10.38 = 1.80

That is a very substantial treatment effect. The two means differ by 1.8 standard deviations.

Notice that the error term does not involve any variability attributable to the main effects or interactions, so it is not influenced by the large Task effect. At the same time, by looking only at the simple effect at recall, we have also removed any Task effects. So we have a meaningful measure of effect size.
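The measures in items 1, 2, and 4 can be sketched in a few lines of Python. Two things here are assumptions for the sake of the example. First, SS for SmokeGrp is not quoted in these notes, so it is rebuilt from the column effects in the power table, with 45 scores per smoking group. Second, the pair of means is taken to be Nonsmokers versus Active Smokers at Recall. All other values are quoted from above.

```python
from math import sqrt

# Quoted from the summary table.
SS_TASK, SS_INTER, SS_ERROR = 28661.526, 2728.652, 13587.20
MS_ERROR = 107.835

# Assumption: rebuild SS for SmokeGrp as 45 * sum of squared column effects.
SS_SMK = 45 * sum(e * e for e in (2.193, 0.519, 1.674))
SS_TOTAL = SS_TASK + SS_SMK + SS_INTER + SS_ERROR


def eta_squared(ss_effect, ss_total):
    return ss_effect / ss_total


# Item 1: ordinary eta-squared, with the Task effect inflating SStotal.
eta_smk = eta_squared(SS_SMK, SS_TOTAL)
eta_inter = eta_squared(SS_INTER, SS_TOTAL)

# Item 2: the same measures with SSTask removed from the denominator.
eta_smk_adj = eta_squared(SS_SMK, SS_TOTAL - SS_TASK)
eta_inter_adj = eta_squared(SS_INTER, SS_TOTAL - SS_TASK)

# Item 4: d for Nonsmokers vs Active Smokers at Recall, using the
# square root of the pooled MSerror as the standard deviation.
d_recall = (47.53 - 28.87) / sqrt(MS_ERROR)

print(f"eta-squared  SmokeGrp {eta_smk:.3f}   Interaction {eta_inter:.3f}")
print(f"adjusted     SmokeGrp {eta_smk_adj:.3f}   Interaction {eta_inter_adj:.3f}")
print(f"d at Recall  {d_recall:.2f}")
```

Whichever denominator you use for the r² type measures, report it, since the adjusted values are several times larger than the ordinary ones.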

 

Last revised: 02/01/02