
Solomon, Secker-Walker, Skelly, and Flynn (1996, Journal of Behavioral Medicine) studied smoking behavior in pregnant women. They looked at the women’s determination to quit smoking while pregnant. They interviewed 349 women at their first prenatal visit, all of whom were smokers when they became pregnant, and classified them into four groups.
- PC (Precontemplation): Smokes and has no plan to quit smoking
- C (Contemplation): Smokes but is thinking of quitting
- P (Preparation): Smokes, but has made some effort at quitting
- A (Action): Has already quit
They wanted to look at the subsequent smoking behavior of these subjects over the course of their pregnancy, but one important consideration is how much these women smoked when they became pregnant. If the groups differ on that variable, that might affect the interpretation of the results. (This is our problem for today.) The data can be found in Solomon.sav.
The answers to this assignment can be found at Answers
The means and standard deviations of these four groups, in terms of cigarettes/day when they became pregnant, follow.
| | PC | C | P | A |
|---|---|---|---|---|
| Mean | 24.8 | 16.6 | 28.8 | 13.7 |
| St. Dev. | 13.3 | 5.2 | 12.2 | 8.8 |
| $n_j$ | 69 | 37 | 153 | 90 |
Notice that this is not really a simple problem. Your sample sizes are grossly unequal, and you have problems with heterogeneity of variance. Not to worry: I tell you how to deal with this in the text, using the Games-Howell approach. It is a modification of the S-N-K, though it can be applied in the context of most pairwise comparisons by making suitable changes in the critical value of $q_r$.
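To put rough numbers on "grossly unequal" and "heterogeneity of variance," here is a minimal sketch that computes the ratio of the largest to the smallest sample variance and sample size from the table. The variable names are my own, and the ratios are descriptive only; this is not a formal test of homogeneity.

```python
# Summary statistics copied from the table above (cigarettes/day).
sds = {"PC": 13.3, "C": 5.2, "P": 12.2, "A": 8.8}
ns = {"PC": 69, "C": 37, "P": 153, "A": 90}

# Sample variances are the squared standard deviations.
variances = {group: sd ** 2 for group, sd in sds.items()}

# Ratio of the largest to the smallest sample variance.
variance_ratio = max(variances.values()) / min(variances.values())

# Ratio of the largest to the smallest sample size.
size_ratio = max(ns.values()) / min(ns.values())

print(f"largest/smallest variance:    {variance_ratio:.2f}")
print(f"largest/smallest sample size: {size_ratio:.2f}")
```

The largest variance (PC) is more than six times the smallest (C), and the largest group (P) is more than four times the size of the smallest (C), which is why a pooled error term is hard to defend here.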
Notice that this is a real data set, and this is the kind of problem that each of you can expect to face in the future. This isn’t some trumped-up example that doesn’t apply to anything important. Using the Games-Howell procedure within SPSS, what can you conclude from these data? Tell me how this test differs from a standard S-N-K or Tukey test, as far as the arithmetic is concerned.
Last revised: 04/10/03
Because the question that prompted this note referred specifically to the Newman-Keuls test, I will answer with respect to that test. The approach generalizes to any of the multiple comparison procedures that are based on a t or q statistic, but you have to be careful about the test statistic. For example, the Tukey and the Newman-Keuls tests use the same arithmetic, but they evaluate q, their test statistic, against different critical values. Tukey evaluates it against the Studentized range statistic with r = the number of means (i.e., the number of levels of the independent variable). The Newman-Keuls, on the other hand, calculates the same q, but sets r equal to the number of means in the ordered set for which the two in question are the largest and smallest. [If you work with "widths" rather than q itself, just make the appropriate change.]
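The difference in r between the two tests can be sketched in a few lines. This is only an illustration using the four group means from the smoking data; the function names `r_tukey` and `r_newman_keuls` are hypothetical labels of mine, not part of any package.

```python
# Group means from the smoking data (cigarettes/day).
means = {"PC": 24.8, "C": 16.6, "P": 28.8, "A": 13.7}

# Order the groups from smallest to largest mean.
ordered = sorted(means, key=means.get)
rank = {group: position for position, group in enumerate(ordered)}


def r_tukey(all_means):
    """Tukey: r is always the total number of means."""
    return len(all_means)


def r_newman_keuls(group_1, group_2):
    """Newman-Keuls: r is the number of ordered means spanned by the pair,
    counting both endpoints."""
    return abs(rank[group_1] - rank[group_2]) + 1


print("ordered groups:", ordered)
for pair in [("A", "P"), ("A", "PC"), ("C", "PC")]:
    print(pair, "Tukey r =", r_tukey(means),
          "| Newman-Keuls r =", r_newman_keuls(*pair))
```

Only the comparison of the smallest and largest means gets the same r under both tests; every other pair is evaluated against a smaller critical value under the Newman-Keuls.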
The solution for doing a Newman-Keuls test with unequal sample sizes is basically the same solution you would use for a variety of post-hoc procedures. Most of the post-hoc tests involve some sort of t test or Studentized range test. As such, they contain a standard error of the form

$$\sqrt{\frac{2\,MS_{error}}{n}} \qquad \text{or} \qquad \sqrt{\frac{MS_{error}}{n}}$$

The former is used with a t test, and the latter with a Studentized range-based test (such as the Newman-Keuls or Tukey tests).
The problem with either of these formulae is that they assume that you have a constant sample size. If you have different sample sizes, you need to replace $n$ with $n_i$ and $n_j$.
There are two ways to do this. The simpler, known as the Tukey-Kramer approach, is to assume that the populations have equal variances, and therefore to continue to use $MS_{error}$ as our variance estimate. Thus the formulae would be

$$\sqrt{\frac{MS_{error}}{n_i} + \frac{MS_{error}}{n_j}} \qquad \text{and} \qquad \sqrt{\frac{\dfrac{MS_{error}}{n_i} + \dfrac{MS_{error}}{n_j}}{2}}$$

again, using the first if you have a t test and the second if you have a Studentized range test.
Notice that these formulae, and those that follow, assume that you carry out separate calculations of the error term for each pair of samples. That is because ni and nj will change as you change the two groups you are comparing. This is painful, but you don’t have much choice.
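As a minimal sketch of those pair-by-pair calculations, the code below computes a Tukey-Kramer error term for every pair of groups in the smoking data. One assumption to flag: the note does not report MSerror, so I obtain it by pooling the four sample variances weighted by their degrees of freedom, which is what the one-way ANOVA error term amounts to. The function names are mine.

```python
import math
from itertools import combinations

# (mean, standard deviation, n) for each group, from the smoking data.
groups = {
    "PC": (24.8, 13.3, 69),
    "C": (16.6, 5.2, 37),
    "P": (28.8, 12.2, 153),
    "A": (13.7, 8.8, 90),
}


def pooled_ms_error(data):
    """Pool the sample variances, weighting each by its df (n - 1)."""
    numerator = sum((n - 1) * sd ** 2 for _, sd, n in data.values())
    denominator = sum(n - 1 for _, _, n in data.values())
    return numerator / denominator


def tukey_kramer_se(ms_error, n_i, n_j, studentized=True):
    """Error term for one pair. Divide by 2 for a Studentized range test;
    leave it out for a t test."""
    variance_sum = ms_error / n_i + ms_error / n_j
    if studentized:
        return math.sqrt(variance_sum / 2)
    return math.sqrt(variance_sum)


ms_error = pooled_ms_error(groups)

# A separate error term for each of the six pairs.
se_by_pair = {
    (a, b): tukey_kramer_se(ms_error, groups[a][2], groups[b][2])
    for a, b in combinations(groups, 2)
}

print(f"MS_error = {ms_error:.2f}")
for pair, se in se_by_pair.items():
    print(pair, f"{se:.3f}")
```

With equal sample sizes the two branches reduce to the familiar constant-n error terms, which is a quick way to check the function.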
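Using this approach, you can calculate either t or q, and evaluate them against the t or Studentized range tables.

$$t = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{\dfrac{MS_{error}}{n_i} + \dfrac{MS_{error}}{n_j}}}$$

and

$$q = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{\dfrac{\dfrac{MS_{error}}{n_i} + \dfrac{MS_{error}}{n_j}}{2}}}$$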
In both cases the degrees of freedom would equal the degrees of freedom for MSerror.
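Here is a rough sketch of the two Tukey-Kramer statistics for a single pair, PC versus A, from the smoking data. I am assuming MSerror = 123.24, a value I obtained by pooling the four sample variances from the summary table rather than one reported in the note; the function names are also mine.

```python
import math


def tukey_kramer_t(mean_i, mean_j, n_i, n_j, ms_error):
    """t statistic with a pooled error term and unequal n."""
    return (mean_i - mean_j) / math.sqrt(ms_error / n_i + ms_error / n_j)


def tukey_kramer_q(mean_i, mean_j, n_i, n_j, ms_error):
    """Studentized range statistic with a pooled error term and unequal n."""
    return (mean_i - mean_j) / math.sqrt((ms_error / n_i + ms_error / n_j) / 2)


MS_ERROR = 123.24      # assumption: pooled from the table's standard deviations
DF_ERROR = 349 - 4     # N - k, the degrees of freedom for MS_error

# PC (mean 24.8, n = 69) versus A (mean 13.7, n = 90).
t_pc_a = tukey_kramer_t(24.8, 13.7, 69, 90, MS_ERROR)
q_pc_a = tukey_kramer_q(24.8, 13.7, 69, 90, MS_ERROR)

print(f"t = {t_pc_a:.3f}, q = {q_pc_a:.3f}, df = {DF_ERROR}")
```

The two statistics carry the same information, since q is just t multiplied by the square root of 2; what differs is the table you evaluate them against.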
If you want to calculate a critical width ($W_r$) instead of a test statistic like t or q, you can simply multiply the appropriate error term by the critical value of t on $df_{error}$ or by the critical value of q for r and $df_{error}$.
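A minimal sketch of the critical-width calculation follows, again for PC versus A. Two assumptions are built in: MSerror = 123.24 is pooled from the summary table, and the critical value 3.63 is the tabled q at alpha = .05 for r = 4 with infinite df, used here only as a stand-in for the value at 345 df. Look up the exact critical value for a real analysis.

```python
import math


def critical_width(critical_value, ms_error, n_i, n_j, studentized=True):
    """Smallest difference between two means that would be declared
    significant: critical value times the error term for this pair."""
    variance_sum = ms_error / n_i + ms_error / n_j
    if studentized:
        error_term = math.sqrt(variance_sum / 2)   # pairs with a q critical value
    else:
        error_term = math.sqrt(variance_sum)       # pairs with a t critical value
    return critical_value * error_term


MS_ERROR = 123.24   # assumption: pooled from the table's standard deviations
Q_CRIT = 3.63       # assumption: tabled q(.05) for r = 4, df = infinity

w_pc_a = critical_width(Q_CRIT, MS_ERROR, 69, 90)
observed_difference = abs(24.8 - 13.7)
significant = observed_difference > w_pc_a

print(f"W = {w_pc_a:.2f}, observed difference = {observed_difference:.1f}, "
      f"significant: {significant}")
```

Because the error term changes with each pair's sample sizes, the critical width has to be recomputed for every comparison.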
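Games and Howell (1976) (no relation, unfortunately) carried this one step further by allowing for heterogeneous sample variances, as well as unequal sample sizes. They proposed an error term of the form

$$\sqrt{\frac{\dfrac{s_i^2}{n_i} + \dfrac{s_j^2}{n_j}}{2}}$$

where $s_i^2$ and $s_j^2$ are the variances of the two samples being compared. (This is the form for a Studentized range test; for a t test, omit the division by 2.)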
Notice that this is similar to the error term for the Tukey-Kramer test, except that we have replaced $MS_{error}$ by the individual variances. Here again you will be required to calculate a separate error term for each pair of samples.
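The individualized error term can be sketched as follows for the smoking data. This is an illustration of the arithmetic under my own function names, not a reproduction of the SPSS output, and the resulting q values still have to be evaluated against critical values with adjusted degrees of freedom.

```python
import math
from itertools import combinations

# (mean, standard deviation, n) for each group, from the smoking data.
groups = {
    "PC": (24.8, 13.3, 69),
    "C": (16.6, 5.2, 37),
    "P": (28.8, 12.2, 153),
    "A": (13.7, 8.8, 90),
}


def games_howell_se(sd_i, n_i, sd_j, n_j):
    """Error term built from the two samples' own variances (q form)."""
    return math.sqrt((sd_i ** 2 / n_i + sd_j ** 2 / n_j) / 2)


def games_howell_q(group_i, group_j):
    """Studentized range statistic using the individualized error term."""
    mean_i, sd_i, n_i = groups[group_i]
    mean_j, sd_j, n_j = groups[group_j]
    return (mean_i - mean_j) / games_howell_se(sd_i, n_i, sd_j, n_j)


# One error term and one q per pair of groups.
q_by_pair = {pair: games_howell_q(*pair) for pair in combinations(groups, 2)}

for pair, q in q_by_pair.items():
    print(pair, f"q = {q:.3f}")
```

The sign of each q simply reflects the order in which the two groups are listed; compare its absolute value with the critical value.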
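Games and Howell went a bit further, recognizing that with this error term the degrees of freedom need to be adjusted. Their adjustment goes back to the adjustments proposed by Welch and by Satterthwaite, and can be written as

$$df' = \frac{\left(\dfrac{s_i^2}{n_i} + \dfrac{s_j^2}{n_j}\right)^2}{\dfrac{\left(s_i^2/n_i\right)^2}{n_i - 1} + \dfrac{\left(s_j^2/n_j\right)^2}{n_j - 1}}$$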
This, too, must obviously be computed for each pair of samples.
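A short sketch of the adjusted degrees of freedom for each pair in the smoking data is given below. The function name `adjusted_df` is my own, and the code implements the usual Welch-Satterthwaite form; packages may round or truncate the result differently.

```python
from itertools import combinations

# (standard deviation, n) for each group, from the smoking data.
groups = {
    "PC": (13.3, 69),
    "C": (5.2, 37),
    "P": (12.2, 153),
    "A": (8.8, 90),
}


def adjusted_df(sd_i, n_i, sd_j, n_j):
    """Welch-Satterthwaite degrees of freedom for one pair of samples."""
    v_i = sd_i ** 2 / n_i   # squared standard error of mean i
    v_j = sd_j ** 2 / n_j   # squared standard error of mean j
    return (v_i + v_j) ** 2 / (v_i ** 2 / (n_i - 1) + v_j ** 2 / (n_j - 1))


df_by_pair = {
    (a, b): adjusted_df(*groups[a], *groups[b])
    for a, b in combinations(groups, 2)
}

for pair, df in df_by_pair.items():
    print(pair, f"df' = {df:.1f}")
```

As a check on the function, two samples with equal variances and equal n give df' = 2(n - 1), the same as the ordinary pooled-variance value; the more the variances and sample sizes diverge, the further df' falls toward the smaller sample's n - 1.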
Again you can form the t or q test statistic by replacing the standard (common) error term with the individualized error term above, and using df’ instead of dferror for your degrees of freedom.
As I say in the text (Methods, 5th edition, p. 397), if the sample sizes are nearly equal, you can save a great deal of time by using the more traditional formulae and substituting the harmonic mean of the sample sizes. I do not recommend this if you have heterogeneous variances.
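For reference, the harmonic mean of the sample sizes is easy to compute. The sketch below uses the four sample sizes from the smoking data purely to show the arithmetic; given how unequal those groups are (and how heterogeneous their variances), this is not a case where I would actually use the shortcut.

```python
from statistics import harmonic_mean

# Sample sizes for PC, C, P, and A from the smoking data.
sample_sizes = [69, 37, 153, 90]

# Harmonic mean by hand: k divided by the sum of the reciprocals.
k = len(sample_sizes)
n_harmonic = k / sum(1 / n for n in sample_sizes)

# The standard library gives the same value.
n_harmonic_stdlib = harmonic_mean(sample_sizes)

print(f"harmonic mean n = {n_harmonic:.2f}")
print(f"arithmetic mean n = {sum(sample_sizes) / k:.2f}")
```

The harmonic mean is pulled toward the smaller samples, so it always falls at or below the arithmetic mean of the sample sizes.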