Unequal Cell Sizes Do Matter

2/7/02

We may spend the entire class on Thursday finishing up some important stuff that I will not be able to cover completely on Tuesday. I may ask that this lab be done on your own. It should not be difficult to do so.

Most textbooks dealing with factorial analysis of variance will tell you that unequal cell sizes alter the analysis in some way. I recently came across an excellent example that illustrates this point, and working through it may be helpful to people who have to analyze data of this kind. This example is valuable for several reasons. First of all, the pattern of inequality is not dramatic, at least in the sense that within every row, the columns have roughly similar sample sizes. Second, the effect is quite dramatic. Finally, the explanation I originally built around an extreme case, while still correct, also works in this relatively less extreme case.

This lab is based on a web page that I put together two years ago. That page can be found at Unequal CellSizes, but please do not read what I have there until after you have worked on this for a while.

I received the following example from Jo Sullivan-Lyons, who is a research psychologist at the University of Greenwich in London. She was kind enough to share her data with me. In her dissertation, she was asking how men and women differ in their reports of depression on the HADS (Hospital Anxiety and Depression Scale), and whether this difference depends on ethnicity. So we have two independent variables, Gender (Male/Female) and Ethnicity (White/Black/Other), and one dependent variable, the HADS score.

I have created data which exactly reflect the cell means and standard deviations that the author obtained, and these are available at JSLdep.dat as a tab-delimited ASCII file with the variable names in the first line.

First produce the descriptive statistics for these data broken down by Gender and then by Ethnicity. (Please recall that the data, though fictitious, do reflect her results, and the results are proprietary.) Examine these results and draw some simple conclusions about what the data look like. (At this point, don't run any hypothesis test.)
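
If you would rather use syntax than the menus, something along the following lines should work. This is only a sketch: it assumes the data file is already open and that the variables are named hads, sex, and Ethnicit, as in the Manova command given later in this lab. Check the names in your own file before running it.

```spss
* Means, counts, and standard deviations of hads broken down by sex and then by Ethnicit.
* Variable names are assumed from the Manova command used later in this lab.
MEANS TABLES=hads BY sex BY Ethnicit
  /CELLS=MEAN COUNT STDDEV.
```

Pay attention to the cell counts as well as the means.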

You might suspect that the original data are somewhat skewed given the fact that the standard deviation for males is larger than the mean, but for the data that I am using the observations were drawn from random normal distributions, so that is not a concern to us in this assignment, though it should be a concern to her.

The author's first question concerned whether males and females differ in their level of reported depression. Run a t test to compare males and females on the HADS. What would you conclude? (Report the group statistics and the results of the t test. Be sure to report the means of males and females.)
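
A possible syntax version of this t test is sketched below. It assumes the same variable names as before (hads and sex) and that sex is coded 1 and 2, which is what the Manova command later in this lab implies.

```spss
* Independent-samples t test comparing the two sex groups on hads.
* The codes 1 and 2 for sex are assumed from the Manova command in this lab.
T-TEST GROUPS=sex(1 2)
  /VARIABLES=hads.
```

The output includes a Group Statistics table, which gives you the male and female means to report.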

But the original question implied that we should take ethnicity into account. This would suggest a two-way analysis of variance, with Sex and Ethnicity as independent variables, and HADS as the dependent variable. Run this analysis and paste in the results. In the process, ask GLM/Univariate to print out the means for Sex, Ethnicity, and the interaction. Save these results.
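
For those who prefer syntax, the GLM/Univariate run might look roughly like this. Again it is a sketch under the same assumptions about variable names (hads, sex, Ethnicit). The METHOD subcommand simply spells out Type III sums of squares, which is the SPSS default, and each EMMEANS subcommand requests one table of estimated marginal means.

```spss
* Two-way analysis of variance of hads by sex and Ethnicit.
* Variable names are assumed from the Manova command used later in this lab.
UNIANOVA hads BY sex Ethnicit
  /METHOD=SSTYPE(3)
  /EMMEANS=TABLES(sex)
  /EMMEANS=TABLES(Ethnicit)
  /EMMEANS=TABLES(sex*Ethnicit)
  /PRINT=DESCRIPTIVE
  /DESIGN=sex Ethnicit sex*Ethnicit.
```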

What conclusions would you draw from the analysis of variance, and how would those conclusions differ from the ones you drew based on the t test? (Here we are especially interested in any sex differences.)

Notice that there is a significant effect due to Ethnicity, and there is an Ethnicity by Sex interaction, but there is no Sex effect. There isn't even an "almost" Sex effect. The F and p values for Sex are 0.048 and 0.826, respectively. What happened? Well, compare the male and female means you found in the t test with the ones from the ANOVA.

No, you didn't do anything wrong, although you may have made a poor choice of analysis. The two sets of answers will be very different with respect to Sex. Your job is to examine the data, think about what is happening, and tell me why the results are so different.

The differences described above, along with the significant interaction, should lead you to an interest in simple effects. Specifically, is there a significant Sex effect at each level of Ethnicity? You could run the analysis separately for each row, and then go back and recalculate the F for Sex by substituting the error term from the overall analysis of variance. Alternatively, you could accomplish the same thing by using SPSS syntax commands. (If there is a way to do this from the menus, I haven't figured it out.) The relevant commands are

* Simple effects of sex at each level of Ethnicit, tested against the overall error term.
MANOVA hads BY sex(1,2) Ethnicit(1,3)
  /DESIGN = Ethnicit, sex WITHIN Ethnicit(1), sex WITHIN Ethnicit(2),
            sex WITHIN Ethnicit(3).
EXECUTE.

Now write up the results of your investigation explaining what is happening. You should write this as if you were sending the answer to Jo Sullivan-Lyons, who asked the original question. Her original question follows, but I have not asked for permission to post it, so please don't make it any more public than I already have.

Name: Jo Sullivan-Lyons <jlyons@molly.u-net.com>
Institution: University of Greenwich, London, England

What a great home page! I am writing up my PhD and I am wondering why for a couple of analyses there seems to be a discrepancy between my t-test result and my ANOVA result. For example, I find a significant difference between depression scores for men and women when I use a t-test (p = 0.0001). However when I use depression scores as the dependent variable and gender (male, female) as one factor in a 2-way ANOVA (other factor is ethnicity, not that that is relevant really) I don't get a significant main effect for gender (p = 0.83). Any suggestions gratefully received!
Jo
Jo Sullivan-Lyons

Last revised: 02/01/02