
11/29/01
This lab is intended to give you experience with running multiple comparison procedures using SPSS, and to contrast some of these procedures. It is also intended to illustrate the kinds of variability that we can expect in even fairly large studies, and the confusion that can result from trying to make sense of data.
Solomon, Secker-Walker, Skelly, and Flynn (1996) in the Journal of Behavioral Medicine studied smoking behavior in pregnant women. They looked at the women's determination to quit smoking while pregnant. They interviewed 349 women at their first pre-natal visit, all of whom were smokers when they became pregnant, and classified them into four groups.
| Label | Condition | Description |
| PC | Precontemplation | Smokes and has no plan to quit smoking |
| C | Contemplation | Smokes but is thinking of quitting |
| P | Preparation | Smokes, but has made some effort at quitting |
| A | Action | Has already quit |
They wanted to look at the subsequent smoking behavior of these subjects over the course of their pregnancy, but one important consideration is how much these women smoked when they became pregnant. If the groups differ on that variable, that might affect the interpretation of the results.
The means and standard deviations of these four groups, in terms of cigarettes/day when they became pregnant, are given below. What can we tentatively conclude about group differences in pre-pregnant smoking behavior?
| | PC | C | P | A |
| Mean | 24.8 | 16.6 | 28.8 | 13.7 |
| St. Dev. | 13.3 | 5.2 | 12.2 | 8.8 |
| nj | 69 | 37 | 153 | 90 |
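As a rough check on that question, the overall F can be computed directly from the summary statistics. This is only a sketch from the rounded values in the table, and it treats the variances as homogeneous; F values from randomly drawn samples will differ somewhat.

```latex
\begin{aligned}
\bar{X}_{..} &= \frac{\sum_j n_j \bar{X}_j}{N}
  = \frac{69(24.8) + 37(16.6) + 153(28.8) + 90(13.7)}{349}
  = \frac{7964.8}{349} \approx 22.82 \\
SS_{\text{between}} &= \sum_j n_j (\bar{X}_j - \bar{X}_{..})^2
  \approx 270.0 + 1432.3 + 5468.1 + 7488.6 = 14659.0 \\
SS_{\text{within}} &= \sum_j (n_j - 1) s_j^2
  = 68(13.3^2) + 36(5.2^2) + 152(12.2^2) + 89(8.8^2) \approx 42517.8 \\
F &= \frac{SS_{\text{between}}/(k - 1)}{SS_{\text{within}}/(N - k)}
  = \frac{14659.0/3}{42517.8/345}
  \approx \frac{4886.3}{123.24} \approx 39.6
\end{aligned}
```

With 3 and 345 degrees of freedom the critical value at α = .05 is roughly 2.6, so the overall difference among groups is far beyond chance. Which particular groups differ is the question the multiple comparison procedures address.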
Notice that this is not really a simple problem. Your sample sizes are grossly unequal, as you might expect in the real world, and you have problems with heterogeneity of variance. We are not going to worry about that at the moment, but it is still worth noting.
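To put numbers on those two problems, compare the largest and smallest sample variances (groups PC and C) and the largest and smallest sample sizes (groups P and C). The rule of thumb mentioned after the calculation is a commonly cited guideline, stated here as an assumption rather than something this lab establishes.

```latex
\begin{aligned}
\frac{s^2_{\max}}{s^2_{\min}} &= \frac{13.3^2}{5.2^2} = \frac{176.89}{27.04} \approx 6.5 \\
\frac{n_{\max}}{n_{\min}} &= \frac{153}{37} \approx 4.1
\end{aligned}
```

A frequently quoted guideline treats a variance ratio of up to about 4 as tolerable when sample sizes are roughly equal. Here the ratio exceeds that and the sample sizes are far from equal.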
We will recreate this data set by randomly sampling from populations with these means and standard deviations. Each of you will do that 5 times, running the appropriate analysis of variance and the subsequent multiple comparison procedures on each sample. You will then report the results to me and I will collate them for the entire class.
The SPSS syntax program to do this is attached at the end of this document. You can cut and paste it into SPSS, so you don't have to do a lot of typing.
After you have pasted in the syntax, run the entire program. It will perform 5 analyses, then automatically cut the sample sizes roughly in half and run 5 more analyses. For the first table below, just fill in the 5 F values that you found for the full-sample and half-sample ANOVAs. The differences that you see reflect the relative power of the two sample sizes. Next, I want you to fill in the following tables, based on Fisher's LSD procedure (which is basically all possible t tests at α = .05) and then Tukey's test (which is one of the more commonly recommended procedures). In each cell of the table I want to know the number of times (out of 5) that each procedure, at each sample size, found a significant difference between the corresponding pair of groups.
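The contrast between the two procedures can be sketched with standard formulas. The notation (c for the number of comparisons, MS_error and df_error from the overall ANOVA) is assumed here, and the Tukey-Kramer form is the usual extension of Tukey's test to unequal sample sizes.

```latex
\begin{aligned}
c &= \binom{4}{2} = 6 \quad \text{pairwise comparisons among four groups} \\
\alpha_{FW} &= 1 - (1 - .05)^{6} \approx .265 \quad \text{if the six tests were independent} \\
\alpha_{FW} &\le 6(.05) = .30 \quad \text{in general (Bonferroni bound)} \\
\text{LSD:} \quad |\bar{X}_i - \bar{X}_j| &\ge t_{.025}(df_{\text{error}})
  \sqrt{MS_{\text{error}}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} \\
\text{Tukey-Kramer:} \quad |\bar{X}_i - \bar{X}_j| &\ge \frac{q_{.05}(4, df_{\text{error}})}{\sqrt{2}}
  \sqrt{MS_{\text{error}}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}
\end{aligned}
```

The pairwise tests are not actually independent, so the .265 figure is only an approximation. With error degrees of freedom in the hundreds, the LSD multiplier is about 1.97 while the Tukey multiplier is about 2.6, which is why Tukey's test should declare fewer differences significant than LSD on the same data.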
| Obtained F values | r1 | r2 | r3 | r4 | r5 |
| Full sample | | | | | |
| Half sample | | | | | |
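| Fisher's LSD | PC vs C | PC vs P | PC vs A | C vs P | C vs A | P vs A |
| Full sample | | | | | | |
| Half sample | | | | | | |

| Tukey | PC vs C | PC vs P | PC vs A | C vs P | C vs A | P vs A |
| Full sample | | | | | | |
| Half sample | | | | | | |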
Notice that this is a real data set, and this is the kind of problem that each of you can expect to face in the future. This isn't some trumped up example that doesn't apply to anything important. What can you conclude from these data?
I have not covered other testing procedures in class, though I do discuss them in the book. I would like you to go through the rest of the choices for multiple comparison procedures and apply each of those to your data for r1, which is the first dependent variable. I just want you to get a sense of how to apply them. You do not have to cut and paste the results, because that would be too much paper, but I would like you to write a sentence or two indicating what, if anything, you noticed when you tried the other procedures.
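One way to try several of the other procedures at once is to list more keywords on the POSTHOC subcommand of ONEWAY. The following is a minimal sketch, not part of the attached program: the particular procedures chosen (Bonferroni, Scheffé, Student-Newman-Keuls, Games-Howell) are just an illustrative selection, and the first two commands assume the half-sample filter from the main program is still active and needs to be switched off.

```spss
* Sketch only: rerun r1 with a selection of other post hoc procedures.
* Turn off the half-sample filter so that all 349 cases are used.
FILTER OFF.
USE ALL.
ONEWAY
  r1 BY group
  /MISSING ANALYSIS
  /POSTHOC = BONFERRONI SCHEFFE SNK GH ALPHA(.05).
```

The same choices are available as check boxes under the Post Hoc button of the One-Way ANOVA dialog, so you can add or drop procedures there instead of editing the syntax.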
Last revised: 11/28/2001
SPSS Syntax:
* This program creates data that resemble data collected by Solomon et al. (1996).
* The data fall into four groups, and are sampled from populations with their means.
* and standard deviations. We first do it with their sample sizes, and then with samples.
* cut about in half.
new file.
input program.
SET SEED RANDOM.
loop #i = 1 to 349.
*Draw data for 5 experiments and randomly compute data for them.
do repeat response = r1 to r5.
COMPUTE response = rv.normal(0,1).
end repeat.
end case.
end loop.
end file.
end input program.
Save outfile = "DataOut.sav" /Keep = r1 to r5.
*We now have a file with 5 experiments, each having 349 cases.
*Create a dummy variable to be used later to cut samples in half.
COMPUTE cuthalf = MOD($casenum,2) .
*Now create a variable for group membership.
IF ($casenum le 69) Group = 1 .
IF (($casenum ge 70) and ($casenum le 106)) Group = 2 .
IF (($casenum ge 107) and ($casenum le 259)) Group = 3 .
IF ($casenum ge 260) Group = 4 .
EXECUTE .
*Now we need to set the means and variances.
*These are the data we would expect if we drew from populations with the appropriate means and variances.
Do repeat response = r1 to r5.
IF (group =1) response = response*13.3 + 24.8.
IF (group = 2) response = response*5.2 + 16.6.
IF (group = 3) response = response*12.2 + 28.8.
IF (group = 4) response = response*8.8 + 13.7 .
end repeat.
*Now we want to run the analysis of variance on all 5 experiments, followed by the LSD and Tukey procedures.
MEANS
TABLES=r1 r2 r3 r4 r5 BY group
/CELLS MEAN COUNT STDDEV .
ONEWAY
r1 r2 r3 r4 r5 BY group
/MISSING ANALYSIS
/POSTHOC = TUKEY LSD ALPHA(.05).
EXECUTE .
*Now we want to cut the sample in half and rerun the analysis.
USE ALL.
COMPUTE filter_$=(cuthalf = 1).
VARIABLE LABEL filter_$ 'cuthalf = 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
MEANS
TABLES=r1 r2 r3 r4 r5 BY group
/CELLS MEAN COUNT STDDEV .
ONEWAY
r1 r2 r3 r4 r5 BY group
/MISSING ANALYSIS
/POSTHOC = TUKEY LSD ALPHA(.05).
EXECUTE .
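The lab sets heterogeneity of variance aside, but if you want to see how much it matters, ONEWAY can report a homogeneity test and two robust alternatives to the usual F. This is an optional sketch and an assumption about what is worth checking, not part of the assignment. It is shown for r1 only and uses the full sample.

```spss
* Optional sketch: test homogeneity of variance for r1 and request.
* the Welch and Brown-Forsythe versions of the overall test.
FILTER OFF.
USE ALL.
ONEWAY
  r1 BY group
  /STATISTICS HOMOGENEITY WELCH BROWNFORSYTHE
  /MISSING ANALYSIS.
```

Comparing the Welch and Brown-Forsythe results with the ordinary F gives a sense of whether the unequal variances change the conclusion.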