

GLM, Unequal Sample Sizes,
and the Analysis of Covariance

4/23/02

Announcements

Hand back assignments

Talk about Final

Background

Last Tuesday I went over the basics of GLM, and on Thursday we carried that work over to Ancova.

I will start with a factorial, some of which I know is review, and some of which is not. I will go very quickly over the equal n case, and then make the sample sizes unequal and show what happens. Then I will go on to Ancova. 

I am still using the Smoking X Task example, and the next section is just a cut-and-paste from last week's notes.

It is easy for me to get carried away and talk about all the neat things you can see in what follows. I don't want students to get bogged down in that stuff. What is important is that they see that there are real parallels between Anova and Regression with dummy variables. That allows us to develop approaches which rely on these parallels. (Specifically: the analysis of unequal cell sizes and Ancova.) What follows is designed to aid understanding of what these more complex analyses are telling us.

GLM and Factorial Anova

First, we will take the Spilich example, but with all three tasks, and create dummy variables for the different tasks as well.

Then we create interaction dummy variables by multiplying our dummies together to create 4 new variables:

Nonsmoke*Patrec, Nonsmoke*Cognit, Delayed*Patrec, and Delayed*Cognit
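The predictors can be generated mechanically. The sketch below assumes effect coding (1, 0, -1). Under that coding, the last level of each factor is coded -1 on every one of that factor's variables. Here those last levels are the Active smokers and a third task that I am calling "Driving" (an assumed name).

Effect coding is my assumption about how these variables were built. It is, however, the coding for which subtracting one model from another returns the usual Anova sums of squares. With plain 0/1 coding, a reduced model that keeps the product terms but drops a main effect tests a simple effect instead. The vector this sketch produces for Nonsmoke/Patrec, [1 0 1 0 1 0 0 0], is the one substituted into the regression equation later in these notes.

```python
# Sketch (assumption): effect coding, where the last level of each factor
# is coded -1 on every one of that factor's variables.
SMOKE_LEVELS = ["Nonsmoke", "Delayed", "Active"]  # Active = last level
TASK_LEVELS = ["Patrec", "Cognit", "Driving"]     # "Driving" is an assumed name


def effect_code(level, levels):
    """Return the len(levels) - 1 coded variables for one factor."""
    if level == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if level == lev else 0 for lev in levels[:-1]]


def predictors(smoke, task):
    """Main-effect variables plus their products (the interaction variables)."""
    nonsmoke, delayed = effect_code(smoke, SMOKE_LEVELS)
    patrec, cognit = effect_code(task, TASK_LEVELS)
    return {
        "Nonsmoke": nonsmoke, "Delayed": delayed,
        "Patrec": patrec, "Cognit": cognit,
        "Nonsmoke*Patrec": nonsmoke * patrec,
        "Nonsmoke*Cognit": nonsmoke * cognit,
        "Delayed*Patrec": delayed * patrec,
        "Delayed*Cognit": delayed * cognit,
    }


for s in SMOKE_LEVELS:
    for t in TASK_LEVELS:
        print(f"{s:9s} {t:8s}", list(predictors(s, t).values()))
```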

The overall Factorial Anova follows:

Regression approach

If we run the complete multiple regression using all dummy variables as predictors, we get

Regression

SSregression is equivalent to "Model" in regular Anova. (Explain why it has 8 df: there are 9 cells, so 8 dummy variables.)

Comment on the error term.

Now they should understand why the Anova summary table is the way it is, even if that is a confusing way to have chosen to present it.

Removing the Interaction Terms gives:

The difference in the SSregression is 31744.726 - 29016.074 = 2728.652.

This is the SS for the interaction term in the Anova.

Removing the Task terms (after putting the interaction terms back in) gives:

If we subtract this SSregression from the SSregression in full model, we get

31744.726 - 3083.200 = 28661.526

This is the effect of Task

Lastly, look at the model with dummy variables for Task and Interaction, but no dummy variables for Condition

Here the difference between the full and reduced models is

31744.726 - 31390.178 = 354.548

This is the effect of Condition.

From here I get the following models:

Model                  SSreg       Difference   SSerror     Effect
Full                   31744.726                13587.200   Error
Main effects           29016.074    2728.652                Interaction
Task + Interaction     31390.178     354.548                Condition
Cond + Interaction      3083.200   28661.526                Task

Notice that when we subtract one model from another, we get the appropriate sums of squares.
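The model comparisons above can be sketched in Python with NumPy. The data are simulated, not the Spilich data: the cell means and the 15 subjects per cell are hypothetical. The predictors use effect coding (1, 0, -1), which is my assumption.

With equal cell sizes, the three differences add up exactly to SSregression for the full model. The Task difference also equals the classical Anova sum of squares for Task.

```python
import numpy as np


def effect_codes(levels, k):
    """n x (k-1) effect-coded matrix; the last level is coded -1 in every column."""
    E = np.zeros((len(levels), k - 1))
    for row, lev in enumerate(levels):
        if lev == k - 1:
            E[row, :] = -1.0
        else:
            E[row, lev] = 1.0
    return E


def ss_regression(y, *blocks):
    """SSregression from an OLS fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((X @ beta - y.mean()) ** 2))


rng = np.random.default_rng(1)
n = 15                                          # hypothetical cell size
cond = np.repeat(np.arange(3), 3 * n)           # smoking condition 0, 1, 2
task = np.tile(np.repeat(np.arange(3), n), 3)   # task 0, 1, 2
cell_means = np.array([[10, 40, 5], [9, 38, 7], [8, 30, 12]])  # hypothetical
y = cell_means[cond, task] + rng.normal(0, 10, size=9 * n)

C, T = effect_codes(cond, 3), effect_codes(task, 3)
CT = np.column_stack([C[:, i] * T[:, j] for i in range(2) for j in range(2)])

full = ss_regression(y, C, T, CT)
ss_interaction = full - ss_regression(y, C, T)   # drop the interaction terms
ss_task = full - ss_regression(y, C, CT)         # drop the task terms
ss_condition = full - ss_regression(y, T, CT)    # drop the condition terms
print(f"full {full:.3f} = {ss_condition:.3f} + {ss_task:.3f} + {ss_interaction:.3f}")
```

With equal n, the last line prints a full-model SS equal to the sum of the three effects.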

But we aren't done.

Now we will go back to the full model.

(Note: the table lists Inter4 before Inter3, just because of the way I happened to create them.)

This gives us the regression equation

Y = -2.193 Nonsmoke + 0.519 Delayed - 8.615 Patrec + 20.519 Cognit + 1.948 Inter1

    - 7.719 Inter2 - 0.563 Inter3 + 0.637 Inter4 + 18.259

By substituting the dummy-variable values for a subject in cell11, [1 0 1 0 1 0 0 0], we can get an expected score for that subject. This is the expected cell mean.

Y = -2.193(1) + 0.519(0) - 8.615(1) + 20.519(0) + 1.948(1)

    - 7.719(0) - 0.563(0) + 0.637(0) + 18.259 = 9.399 ≈ 9.400 (the cell11 mean)

In other words, we have reproduced the mean of cell11, which is what we wanted to do. This is nothing but what we do when we make a prediction in multiple regression.
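The substitution is a dot product of the coefficients with the cell's predictor values, plus the intercept. The coefficients below are copied from the regression equation above. The 0.001 gap from 9.400 presumably comes from rounding the coefficients to three decimals.

```python
# Coefficients in the order Nonsmoke, Delayed, Patrec, Cognit, Inter1..Inter4.
coef = [-2.193, 0.519, -8.615, 20.519, 1.948, -7.719, -0.563, 0.637]
intercept = 18.259
cell11 = [1, 0, 1, 0, 1, 0, 0, 0]   # predictor values for a subject in cell11

pred = intercept + sum(b * x for b, x in zip(coef, cell11))
print(round(pred, 3))  # 9.399
```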

All of this has been review. Students should be able to see that the regression analyses give us everything that the Anova gave us, but in a different form. The advantage of the regression approach is the flexibility it gives you.

Unequal Sample Sizes

This is the quick version of coverage, because I want to get to Ancova, and there is a good example in the text.

The following is designed to show students the long and painful way of doing what SPSS does automatically if you take the default method, which is Unique. Again, don't get caught up in the details.

This approach deals with what in the book I call unweighted means. It treats each cell as if it had the same number of observations, and the row and column means are the means of the cell means within that row and column.

There are other ways of dealing with unequal sample sizes, but they should only be used if you have a very specific reason for doing so. And in all the years that I have been teaching, I have yet to see an example where I would seriously consider an alternative approach, although I did once make one up that I found pretty convincing.

Below are two diagrams that illustrate the difference between equal and unequal sample sizes. The important point is that the independent variables themselves are correlated when we have unequal samples, and this correlation contaminates the results. [By this I mean that the correlations between the dummy variables for rows and the dummy variables for columns are 0.00 when we have equal sample sizes, and not equal to 0.00 when we don't.]
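The claim in brackets is easy to check for a 2 x 2 design. The sketch builds a 0/1 indicator for the first row and another for the first column, then correlates them. The cell sizes are hypothetical.

```python
def pearson_r(x, y):
    """Plain Pearson correlation, no libraries needed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def row_col_dummies(cell_ns):
    """cell_ns[(r, c)] = number of subjects in that cell of a 2 x 2 design."""
    row, col = [], []
    for (r, c), n in cell_ns.items():
        row += [1 if r == 0 else 0] * n   # dummy for the first row
        col += [1 if c == 0 else 0] * n   # dummy for the first column
    return row, col


equal = {(0, 0): 10, (0, 1): 10, (1, 0): 10, (1, 1): 10}
unequal = {(0, 0): 20, (0, 1): 5, (1, 0): 5, (1, 1): 10}   # hypothetical ns

r_equal = pearson_r(*row_col_dummies(equal))
r_unequal = pearson_r(*row_col_dummies(unequal))
print(r_equal)    # 0.0: the row and column predictors do not overlap
print(r_unequal)  # about 0.467: the predictors now overlap
```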

[Figure: Equal Sample Sizes, no overlap (Venn diagram)]

[Figure: Unequal Sample Sizes, overlap (Venn diagram)]

If I made up an example with unequal sample sizes, I would go through the same steps that I went through with the equal sample size case, and create the same dummy variables. I would also go through the same process of subtracting one model from another. But now the parts would not add up to the whole. I could show (but will leave it to the text) that with unequal sample sizes the effects we get compare unweighted means. In other words, the Anova operates as if each cell had the same number of subjects; no mean gets more weight than another.
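Here is a sketch of the unequal-n case. It uses simulated data for a 2 x 2 design with hypothetical cell sizes and cell means, and effect coding (an assumption). Each effect's SS is again the drop in SSregression when its predictors are removed from the full model, but the three pieces no longer sum to the full SSregression.

The last lines check the unweighted-means point. The full model reproduces the cell means, so its intercept is the unweighted average of the four cell means rather than the overall mean of y.

```python
import numpy as np


def ss_regression(y, *blocks):
    """SSregression from an OLS fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((X @ beta - y.mean()) ** 2))


rng = np.random.default_rng(2)
ns = {(0, 0): 20, (0, 1): 5, (1, 0): 5, (1, 1): 10}              # hypothetical
mu = {(0, 0): 10.0, (0, 1): 14.0, (1, 0): 12.0, (1, 1): 20.0}    # hypothetical
rows, cols, y = [], [], []
for (r, c), n in ns.items():
    rows += [r] * n
    cols += [c] * n
    y += list(mu[(r, c)] + rng.normal(0, 3, n))
rows, cols, y = np.array(rows), np.array(cols), np.array(y)
A = np.where(rows == 0, 1.0, -1.0)   # effect code for rows
B = np.where(cols == 0, 1.0, -1.0)   # effect code for columns
AB = A * B

full = ss_regression(y, A, B, AB)
ss_a = full - ss_regression(y, B, AB)
ss_b = full - ss_regression(y, A, AB)
ss_ab = full - ss_regression(y, A, B)
print(f"full {full:.2f} vs sum of parts {ss_a + ss_b + ss_ab:.2f}")

X = np.column_stack([np.ones(len(y)), A, B, AB])
intercept = np.linalg.lstsq(X, y, rcond=None)[0][0]
cell_means = [y[(rows == r) & (cols == c)].mean() for (r, c) in ns]
print(intercept, np.mean(cell_means), y.mean())   # first two agree
```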

Thus the mean for the Cognitive task is the average of the means of the nonsmokers, the delayed smokers, and the active smokers who participated in that task. It makes no difference if one group had a sample size that was 10 times as large as the size of the other samples. We saw this last Thursday.
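The difference between the two kinds of marginal mean is simple arithmetic. The cell means and sample sizes below are hypothetical, with one group ten times as large as the others.

```python
# Hypothetical (cell mean, n) for the three smoking groups on the Cognitive task.
cells = {
    "Nonsmoke": (30.0, 10),
    "Delayed": (35.0, 10),
    "Active": (45.0, 100),   # ten times as many subjects
}

# Unweighted: every cell mean counts once, regardless of n.
unweighted = sum(m for m, _ in cells.values()) / len(cells)
# Weighted: the raw mean of all subjects, dominated by the large group.
weighted = sum(m * n for m, n in cells.values()) / sum(n for _, n in cells.values())
print(unweighted)  # about 36.67
print(weighted)    # about 42.92
```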

This is really all that I have to say about the unequal n case. But if you go back to the first time that I talked about unweighted means in discussing Chapter 13, you will see that I made the point that if you just look at raw (weighted) means, the row effect might actually represent the contaminating influence of a column effect, or vice versa. With the solution used here, each effect is adjusted for the others, so the row effect is now adjusted for the column effect rather than reflecting its influence. The unequal sample sizes no longer play any role. I strongly recommend that you go back to that material and look it over briefly.

The Analysis of Covariance

The analysis of covariance (Ancova) is really nothing more than what we have been doing all along. It is an analysis of variance version of semi-partial correlation and regression, where we look at what happens to the independent variable(s) after we control for some other variable or variables (the covariate(s)).

I’m going to modify the example above by adding another variable. I made this variable up; it was not part of the original study.

Suppose that we had a measure of Distractibility for each of our subjects. We might assume that how well the subject performs the task is partly a function of his/her basic level of distractibility. We are not inherently interested in distractibility, but we don't want it to mess up our data by adding error variance. We want to get any variability that is due to distractibility out of our data.

Important:  The question that we are asking here is what would the data have looked like if we could somehow equate all subjects on their level of distractibility.

We basically adjust the subject's score to what it would be predicted to be if that subject had the average level of distractibility.
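In regression terms, the adjustment slides each score along the regression line to the mean of the covariate:

adjusted Y = Y - b(X - mean of X)

In Ancova, b is the pooled within-group slope. The slope and mean used here are made-up values for illustration.

```python
def adjusted_score(y, x, b, x_mean):
    """Move y along a regression line of slope b to where it would be at x = x_mean."""
    return y - b * (x - x_mean)


# Hypothetical: slope of 0.5 Errors per Distract point, mean Distract of 100.
print(adjusted_score(30.0, 120.0, 0.5, 100.0))  # 20.0: more distractible than average, so lowered
print(adjusted_score(30.0, 80.0, 0.5, 100.0))   # 40.0: less distractible than average, so raised
```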

The first question I would like to ask is whether Errors are related to Distract. If they are not, there is no point in worrying about distractibility as a covariate. The simple regression result follows:

We can see that Errors are correlated with Distract. Therefore it makes sense to remove from the analysis whatever variability in Errors is attributable to Distract. In the ideal world this will just be removing random noise that has nothing to do with what we are studying. By removing this noise we will have a smaller error term and more power.

The second thing that I need to do is to make sure that my major independent variables (Task and SmokeGrp) do not differ on the covariate. For that I ran an Anova with Distract as the dv.

Here we see that the groups don't differ in any significant way on Distract. This is important because if they did, by removing the effect of Distract, we would be removing part of the treatment effect.
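Checking the covariate is just an Anova with the covariate as the dependent variable. The sketch below is a one-way simplification; the notes used the full Task x SmokeGrp factorial. The Distract scores in the usage lines are hypothetical.

```python
def one_way_f(groups):
    """F ratio for a one-way Anova; groups is a list of lists of scores."""
    all_x = [x for g in groups for x in g]
    grand = sum(all_x) / len(all_x)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, len(all_x) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical Distract scores for three task groups.
distract_by_task = [[95, 102, 110, 98], [101, 97, 105, 99], [100, 104, 96, 103]]
print(round(one_way_f(distract_by_task), 3))   # a small F is what we hope to see
```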

Explanation:

Forget the smoking conditions for a minute and imagine that we are just looking at the three tasks, which any person can see are drastically different in Errors. (F = 113.474)

Now let's fudge Distract so that the group with a lot of errors is also very high on Distract. I created NewDistract, which was set equal to Distract except for the Cognitive group, where I added 80 points to the old Distract score.

Now the R² between NewDistract and Errors has risen to .750, because the subjects who are high in Errors (the Cognitive group) are now also much higher on NewDistract.

When I run an analysis of covariance using NewDistract, SPSS will first adjust the means of Errors (the dependent variable) to what they would be if the groups did not differ on NewDistract. But that is going to knock a huge amount off the Errors mean for the Cognitive group, leaving it much more like the other groups. Now when I run my Ancova (using NewDistract as my covariate) I get

You can see that the Task effect has now completely disappeared.

To see what is really happening here, look at the adjusted means on Errors when we use no covariate.

Now look at the means when we use Distract as the covariate

There hasn't been any real change. But remember that the data did not show important group differences on the covariate.

Now look again when we use NewDistract as the covariate.

Notice how the differences between the Tasks have largely disappeared.

This is a good example of how we can lose power in an analysis of covariance if the groups differ on the covariate.

But the alternative is not simply to pretend you don't know about Distract and leave it out. If your groups had differed on distractibility, you would have confounded Groups and Distract, and your analysis would not address the question you want, regardless of whether or not you use Distract as a covariate.

NOTE that in manipulating these data I didn't do a thing to Task or to Errors. I just made the group that was high on errors also high on NewDistract. Then when I ran my Ancova, I basically adjusted the Error means to what they would be if all subjects had the same NewDistract. But this means that I would be subtracting a whole lot from the Error scores in the Cognitive group. By creating a group difference on the covariate, and then adjusting the dv for the covariate, I have removed a previously significant effect on the dv.
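The NewDistract effect can be reproduced with a tiny hypothetical data set. Three task groups share the same covariate values and the same within-group slope (0.5). Only the Cognitive group's covariate is shifted by 80, and Errors are left untouched.

Adjusted means use the standard Ancova formula: the group mean of Errors minus b times (the group mean of the covariate minus its grand mean), where b is the pooled within-group slope. "Driving" is an assumed name for the third task.

```python
def pooled_within_slope(groups):
    """groups: list of (xs, ys). Pooled within-group regression slope used by Ancova."""
    sxy = sxx = 0.0
    for xs, ys in groups:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def adjusted_means(groups):
    """Group means of y adjusted to the grand mean of the covariate."""
    b = pooled_within_slope(groups)
    all_x = [x for xs, _ in groups for x in xs]
    grand_x = sum(all_x) / len(all_x)
    return [sum(ys) / len(ys) - b * (sum(xs) / len(xs) - grand_x) for xs, ys in groups]


x = [20, 30, 40, 50]                                       # hypothetical Distract
noise = [1, -1, -1, 1]
task_effect = {"Patrec": 0, "Driving": 5, "Cognit": 40}    # hypothetical
errors = {t: [0.5 * xi + e + eff for xi, e in zip(x, noise)] for t, eff in task_effect.items()}

distract = [(x, errors[t]) for t in task_effect]
new_distract = [([xi + 80 for xi in x] if t == "Cognit" else x, errors[t])
                for t in task_effect]

print(adjusted_means(distract))      # [17.5, 22.5, 57.5], the same as the raw means
print(adjusted_means(new_distract))  # about [30.83, 35.83, 30.83]
```

The Cognitive group, which had by far the most errors, ends up with the same adjusted mean as the Patrec group, even though no error score changed.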

An excellent discussion of Ancova, and of the effects of group differences on the covariate, can be found in Huitema (1980), The analysis of covariance and alternatives, New York: Wiley. It is out of print, but should be in most libraries.

Ancova:

The following is the printout from using the Anova procedure and specifying a covariate in the appropriate box. Notice that I am done with NewDistract. I just created that to make a point, and the point has been made. We are back to Distract as the covariate.

Note that the first important item in this table is Distract. This is a test on the relationship between Distract and Errors (controlling for Task, SmkGrp, and Interaction).

I want to compare the above Ancova table with the following Anova table.

1.  Look at the difference in the error term. In the Anova it is 13,587. In the Ancova it is only 8,942. This drop is due to the fact that we have removed the variability in Errors that was related to Distract. This gives us a good deal more power. This is one of the things that we wanted to accomplish.

2.  Because the groups did differ somewhat on Distract, by removing the effect of the covariate we have also reduced SSTask from 28,661 to 23,870. We have actually increased Smkgrp from 354 to 563, but reduced the interaction term from 2728 to 1626.

3.  Balancing off the changes in the SSeffects and SSerror we have a larger F for Smokegrp and Task, and a slightly smaller effect for the interaction.
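The trade-off in point 3 can be checked by computing the F ratios from the sums of squares quoted above.

The degrees of freedom are an assumption, not something the tables here state. The sketch presumes 15 subjects in each of the 9 cells (N = 135). That gives 126 error df for the Anova and 125 for the Ancova, since the covariate uses one df.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = MS_effect / MS_error."""
    return (ss_effect / df_effect) / (ss_error / df_error)


# (SS, df) pairs from the Anova and Ancova tables discussed above.
anova = {"Task": (28661.526, 2), "SmokeGrp": (354.548, 2), "Interaction": (2728.652, 4)}
ancova = {"Task": (23870, 2), "SmokeGrp": (563, 2), "Interaction": (1626, 4)}
DF_ERROR_ANOVA, DF_ERROR_ANCOVA = 126, 125   # ASSUMED: 9 cells x 15 subjects

fs = {}
for effect in anova:
    fs[effect] = (f_ratio(*anova[effect], 13587.2, DF_ERROR_ANOVA),
                  f_ratio(*ancova[effect], 8942, DF_ERROR_ANCOVA))
    print(f"{effect:12s} Anova F = {fs[effect][0]:7.2f}   Ancova F = {fs[effect][1]:7.2f}")
```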

A different way of looking at things:

One way to think about this stuff is to realize that by including Distract as a covariate, we are looking at each of the other effects controlling for the covariate.

I would never do an analysis of covariance this way, but it is instructive to look at how we would use regression to solve the problem.

When we did not have a covariate, we took the sum of squares regression for the base model of

Y = Task, Group, Task X Group   (where Task, etc. were represented by dummy variables).

We then compared that model against the sum of squares for regression in the following models

  1. Y = Task, Group
  2. Y = Group, Task X Group
  3. Y = Task, Task X Group

When we do an analysis of covariance, we do almost the same thing. The only difference is that we put the covariate in every model.

Y = Task, Group, Task X Group, Distract  (where Task, etc. were represented by dummy variables).

We then compared that model against

  1. Y = Task, Group, Distract
  2. Y = Group, Task X Group, Distract
  3. Y = Task, Task X Group, Distract

Notice that every effect is now "over and above" (or controlling for) the effects of Distract.
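Here is a sketch of this covariate-in-every-model recipe using NumPy and simulated data. The cell size, covariate slope, and task effects are hypothetical, and the predictors are effect coded (an assumption). The full Ancova model simply adds one predictor to the full Anova model, so its error SS can never be larger than the Anova error SS.

```python
import numpy as np


def effect_codes(levels, k):
    """n x (k-1) effect-coded matrix; the last level is coded -1 in every column."""
    E = np.zeros((len(levels), k - 1))
    for row, lev in enumerate(levels):
        if lev == k - 1:
            E[row, :] = -1.0
        else:
            E[row, lev] = 1.0
    return E


def ss_reg(y, *blocks):
    """SSregression for y regressed on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((X @ beta - y.mean()) ** 2))


rng = np.random.default_rng(3)
n = 15                                               # hypothetical cell size
grp = np.repeat(np.arange(3), 3 * n)
task = np.tile(np.repeat(np.arange(3), n), 3)
distract = rng.normal(100, 20, size=9 * n)            # hypothetical covariate
errors = 0.3 * distract + np.array([5.0, 35.0, 8.0])[task] + rng.normal(0, 8, size=9 * n)

G, T = effect_codes(grp, 3), effect_codes(task, 3)
GT = np.column_stack([G[:, i] * T[:, j] for i in range(2) for j in range(2)])

full = ss_reg(errors, T, G, GT, distract)             # base model with the covariate
ss = {
    "Interaction": full - ss_reg(errors, T, G, distract),   # model 1
    "Task": full - ss_reg(errors, G, GT, distract),         # model 2
    "Group": full - ss_reg(errors, T, GT, distract),        # model 3
}
ss_total = float(np.sum((errors - errors.mean()) ** 2))
sse_ancova = ss_total - full
sse_anova = ss_total - ss_reg(errors, T, G, GT)       # same effects, no covariate
print(ss)
print(f"error SS: Ancova {sse_ancova:.1f}, Anova {sse_anova:.1f}")
```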

Assumptions:

The analysis of covariance has all of the assumptions about normality, homogeneity of variance, and independence that we had in the analysis of variance. In addition, we assume that the relationship between the dv and the covariate is the same in every group. (It would never do to have a couple of groups where the two are highly related, and other groups in which they are not related.)

The reason for this assumption is that we are basically fitting one regression equation to all of the data, and using that equation to adjust the data as if the groups did not differ on the covariate. If the slope of the relationship is different in the different groups, it doesn't make much sense to dump them into one regression equation.

In the book I talk about how we test this assumption. We basically include interaction terms between the group dummy variables and the covariate. We then check that these interactions are not significant. If they were, it would mean that the relationship between Errors and Distract depends on what group you are in.
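The slope-homogeneity test compares two models. The first fits one common slope; the second adds the group-by-covariate products, which lets each group have its own slope. The F compares their error sums of squares.

This sketch uses a one-way design and simulated data with deliberately unequal slopes, so the F here should be large; with real data you hope it is small. Plain 0/1 group dummies are fine in this case, because the test of the product terms does not depend on the coding.

```python
import numpy as np


def sse(y, X):
    """Error SS and number of parameters for an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid), X1.shape[1]


rng = np.random.default_rng(4)
n, k = 30, 3
group = np.repeat(np.arange(k), n)
x = rng.normal(100, 20, size=n * k)            # hypothetical covariate
slopes = np.array([0.0, 1.0, 2.0])             # deliberately unequal slopes
y = slopes[group] * x + rng.normal(0, 5, size=n * k)

D = np.column_stack([(group == g).astype(float) for g in range(k - 1)])  # 0/1 dummies
DX = D * x[:, None]                            # group-by-covariate products

sse_parallel, p0 = sse(y, np.column_stack([D, x]))
sse_separate, p1 = sse(y, np.column_stack([D, x, DX]))
df_num, df_den = p1 - p0, len(y) - p1
F = ((sse_parallel - sse_separate) / df_num) / (sse_separate / df_den)
print(f"F({df_num}, {df_den}) = {F:.1f}")
```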

Summary

If the groups differ randomly on the covariate, that is a bad thing, but perhaps not too bad.

If the groups had systematic differences on the covariate, that is a very bad thing.

If the groups differ significantly on the covariate, that is a very very bad thing, which is worth a couple of Hail Marys.

Special topics in Ancova

Adjusted Means

You can ask SPSS to print out "Adjusted Means" whenever you run an Ancova. What SPSS actually does is run a regression in which the predictors are the main effects, the interaction(s), and the covariate. Each subject has a 0,1 score on the dummy variables representing groups, as well as a score on the covariate. To get the adjusted means, we substitute those dummy scores, but use the mean score on the covariate for every subject.

The adjusted means are the means that we would expect if there were no differences on the covariate.
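The substitution procedure can be sketched directly. For brevity, this version uses one grouping factor plus the covariate (no interaction), simulated data, and 0/1 group dummies. The result matches the usual formula: group mean minus b times (group mean of the covariate minus its grand mean).

```python
import numpy as np

rng = np.random.default_rng(5)
k, n = 3, 20
group = np.repeat(np.arange(k), n)
x = rng.normal(100, 15, size=k * n) + 5 * group      # hypothetical covariate
y = 0.4 * x + np.array([10.0, 14.0, 20.0])[group] + rng.normal(0, 4, size=k * n)

D = np.column_stack([(group == g).astype(float) for g in range(1, k)])  # group 0 = reference
X = np.column_stack([np.ones(k * n), D, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Substitute each group's dummy codes, but the grand mean of the covariate for everyone.
x_bar = x.mean()
adjusted = []
for g in range(k):
    codes = [1.0] + [1.0 if g == j else 0.0 for j in range(1, k)] + [x_bar]
    adjusted.append(float(np.dot(beta, codes)))

# Same numbers from the formula, with b the pooled within-group slope.
b = beta[-1]
formula = [y[group == g].mean() - b * (x[group == g].mean() - x_bar) for g in range(k)]
print(np.round(adjusted, 3), np.round(formula, 3))
```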

Repeated Measures versus Ancova

The question that comes up over and over again is whether it is better to run a repeated measures Anova with Pre and Post (along with Groups) [or a t test on difference scores, which amounts to the same thing], or to use the PreTest as a covariate.

First of all, if the slope relating pre to post is 1.00, these are the same analysis.

Demonstration

Imagine that I was going to create data for an analysis of covariance (I'm not actually going to do that.)

I could take two groups of pretest scores and create a set of posttest scores from them.

How could I make a slope of 1.00?

Answer: posttest = pretest * 1 + 5.6

In other words, the slope will be 1.0 if we just add a constant to every score to get our posttest score.

Think about this in the context of the way some experimental treatment affects subjects. If the effect of the treatment is to add a constant to the score of the subject who gets that treatment, then the slope is likely to be about 1.00.

We could add a different constant to Group 1 pretests and Group 2 pretests, which is what we would have to do to get group differences. But here again, the slope would be about 1.00.

How could I make the slope different from 1.00?

I could make the posttest score some percentage of the pretest score (and perhaps add a constant).

For example, Post = .70*Pre + 5

Here, the higher you were at pretest, the less you will change to posttest.

This would happen if you had some sort of ceiling effect.

We probably have many situations where we could convince ourselves that the slope is one (the treatment just adds a constant to everyone's score). If that is true, things are fine.

As I said above, only when the slope is 1.00 will the Ancova and the t test on the differences (or the interaction term in the repeated measures anova) be the same.

When the slope is not one, the two analyses will differ.
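The two estimates of the group difference can be written side by side:

- Ancova estimate: (difference in posttest means) - b(difference in pretest means)
- Gain-score estimate: the same expression with b fixed at 1.00

The pretest scores below are hypothetical. The posttests follow the two recipes above: adding a constant, then Post = .70*Pre + constant.

```python
def mean(v):
    return sum(v) / len(v)


def pooled_within_slope(groups):
    """Pooled within-group slope of posttest on pretest."""
    sxy = sxx = 0.0
    for pre, post in groups:
        mx, my = mean(pre), mean(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    return sxy / sxx


def ancova_effect(g1, g2):
    """Group difference in posttest means, adjusted for pretest."""
    b = pooled_within_slope([g1, g2])
    (pre1, post1), (pre2, post2) = g1, g2
    return (mean(post1) - mean(post2)) - b * (mean(pre1) - mean(pre2))


def gain_effect(g1, g2):
    """Group difference in mean gain (post - pre), as in the difference-score t test."""
    (pre1, post1), (pre2, post2) = g1, g2
    return (mean(post1) - mean(pre1)) - (mean(post2) - mean(pre2))


pre1, pre2 = [10, 12, 14, 16], [12, 14, 16, 18]   # hypothetical pretests

# Slope 1.00: the treatment adds a constant (5.6 in Group 1, 2.0 in Group 2).
g1, g2 = (pre1, [p + 5.6 for p in pre1]), (pre2, [p + 2.0 for p in pre2])
slope_one = (ancova_effect(g1, g2), gain_effect(g1, g2))
print(slope_one)   # both about 3.6

# Slope .70: Post = .70*Pre + 5 in Group 1 and .70*Pre + 2 in Group 2.
g1, g2 = (pre1, [0.7 * p + 5 for p in pre1]), (pre2, [0.7 * p + 2 for p in pre2])
slope_070 = (ancova_effect(g1, g2), gain_effect(g1, g2))
print(slope_070)   # about 3.0 versus 3.6
```

In the second case, the Ancova recovers the 3-point difference built into the constants. The gain-score analysis, which acts as if the slope were 1.00, does not.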

In this case, I don't have a strong feeling about which analysis is better, but I suspect the covariate is. The t test just goes on pretending the slope is 1.00 when it isn't.

 

Last revised: 04/23/02