They compared three treatments (and a waiting list control) for the treatment of posttraumatic stress disorder in rape victims.
- Stress Inoculation Training (SIT): Instruction in coping skills (deep breathing, muscle relaxation, stopping intrusive thoughts, etc.)
- Prolonged Exposure (PE): 7 sessions devoted to reliving the rape scene in their imagination.
- Supportive Counseling (SC): Patients taught general problem solving, with the therapist playing an indirect, unconditionally supportive role. This was the control for nonspecific therapeutic effects.
- Waiting List Control (WL)
Therapy was carried out over 9 sessions. I am going to look only at the post-test scores.
There were pre and post treatment measures on a number of dependent variables. In fact, this was a much better study than I am going to present.
My dependent variable will be the sum of the subjects' ratings on about 15 variables related to PTSD (e.g., flashbacks, nightmares, and memory difficulties). Therefore, higher scores represent more disturbance.
Data:
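| | SIT (n = 14) | PE (n = 10) | SC (n = 11) | WL (n = 10) |
|---|---|---|---|---|
| | 3 | 18 | 24 | 12 |
| | 13 | 6 | 14 | 30 |
| | 13 | 21 | 21 | 27 |
| | 8 | 34 | 5 | 20 |
| | 11 | 26 | 17 | 17 |
| | 9 | 11 | 17 | 23 |
| | 12 | 2 | 23 | 13 |
| | 7 | 5 | 19 | 28 |
| | 16 | 5 | 7 | 12 |
| | 15 | 26 | 27 | 13 |
| | 18 | | 25 | |
| | 12 | | | |
| | 8 | | | |
| | 10 | | | |
| Mean | 11.07 | 15.40 | 18.09 | 19.50 |
| St. Dev. | 3.95 | 11.12 | 7.13 | 7.11 |
| ΣX | 155 | 154 | 199 | 195 |

Grand Mean = 15.6222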
Summary Table
| Source | df | SS | MS | F | p |
|---|---|---|---|---|---|
| Group | 3 | 507.8401 | 169.280 | 3.05 | .0394 |
| Error | 41 | 2278.7377 | 55.579 | | |
| Total | 44 | 2786.5778 | | | |
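The summary table can be reproduced from the raw scores. The following is a minimal sketch in plain Python; the function name `one_way_anova` and the dictionary `groups` are my own labels for illustration, not anything from the text or from SPSS. It computes the sums of squares, degrees of freedom, and F, but not the p value, which would need an F distribution routine.

```python
# Post-test scores for the four groups, as given in the data table.
groups = {
    "SIT": [3, 13, 13, 8, 11, 9, 12, 7, 16, 15, 18, 12, 8, 10],
    "PE": [18, 6, 21, 34, 26, 11, 2, 5, 5, 26],
    "SC": [24, 14, 21, 5, 17, 17, 23, 19, 7, 27, 25],
    "WL": [12, 30, 27, 20, 17, 23, 13, 28, 12, 13],
}


def one_way_anova(samples):
    """Return (ss_group, ss_error, df_group, df_error, f_ratio)."""
    all_scores = [x for scores in samples.values() for x in scores]
    grand_mean = sum(all_scores) / len(all_scores)  # weighted grand mean
    ss_group = 0.0
    ss_error = 0.0
    for scores in samples.values():
        mean = sum(scores) / len(scores)
        # Between-groups: n_j times squared deviation of the group mean.
        ss_group += len(scores) * (mean - grand_mean) ** 2
        # Within-groups: squared deviations of scores from their own mean.
        ss_error += sum((x - mean) ** 2 for x in scores)
    df_group = len(samples) - 1
    df_error = len(all_scores) - len(samples)
    f_ratio = (ss_group / df_group) / (ss_error / df_error)
    return ss_group, ss_error, df_group, df_error, f_ratio


ss_group, ss_error, df_group, df_error, f_ratio = one_way_anova(groups)
print(f"SS group = {ss_group:.4f}, SS error = {ss_error:.4f}")
print(f"F({df_group}, {df_error}) = {f_ratio:.2f}")
```

Note that SS group and SS error add up to SS total, which is a useful check on any hand calculation.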
Assumptions
Homogeneity of Variance
This is the assumption that the populations from which the data came all have the same variance, regardless of whether or not their means are equal.
This is really the same assumption we made in regression, when we spoke about homogeneity of variance in arrays. Get them to see this as a regression problem.
Ask whether they think this assumption is reasonable, given our sample results.
The following is the output from JMP, where you can see how various homogeneity of variance tests come out. SPSS does produce a test of this assumption, and attributes it to Levene. I'll talk about this in a minute when I return to the homogeneity of variance assumption.
Normality
The samples for each of the groups are assumed to come from normal populations.
This is an assumption that is virtually impossible to check with such small samples. Fortunately, it is also one that the test is robust against. (Define robust.)
Independence
This is the assumption that the errors associated with each observation are independent. For this particular example, it is the assumption that if you and I are in the same group, knowing how much you deviate from the group mean tells someone nothing about how much I deviate from the group mean.
Get them to think of examples where this might be violated.
e.g., married couples
Take a dorm and use all residents. Then roommates might be more alike than non-roommates.
This is an extremely important assumption, and one that we do not want to violate.
Homogeneity of Variance
Levene's Test
Test on the absolute deviations (or squared deviations) of scores from their own group mean.
If a group has a large variance, the deviations from the mean will be large. If it has a small variance, the deviations will be small. When we take the absolute value, or square them, we get around the fact that the sum of the deviations from the mean will always be zero.
Here we can see that we do have heterogeneity of variance to a very significant degree.
I didn't know if SPSS used squared or absolute deviations, so I calculated both and ran an analysis of variance on them. The results follow.
It is pretty obvious that SPSS uses absolute deviations.
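The calculation I just described can be sketched in a few lines of plain Python: replace each score by its deviation from its own group mean (absolute or squared), then run an ordinary one-way analysis of variance on those deviations. The function names `anova_f` and `levene_f` are hypothetical, and the sketch stops at the F ratio and its degrees of freedom rather than producing a p value.

```python
# Post-test scores for the four groups, as given in the data table.
groups = [
    [3, 13, 13, 8, 11, 9, 12, 7, 16, 15, 18, 12, 8, 10],  # SIT
    [18, 6, 21, 34, 26, 11, 2, 5, 5, 26],                  # PE
    [24, 14, 21, 5, 17, 17, 23, 19, 7, 27, 25],            # SC
    [12, 30, 27, 20, 17, 23, 13, 28, 12, 13],              # WL
]


def anova_f(samples):
    """Ordinary one-way Anova; returns (F, df_between, df_within)."""
    all_scores = [x for scores in samples for x in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = 0.0
    ss_within = 0.0
    for scores in samples:
        mean = sum(scores) / len(scores)
        ss_between += len(scores) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in scores)
    df_between = len(samples) - 1
    df_within = len(all_scores) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def levene_f(samples, squared=False):
    """Anova on deviations of each score from its own group mean."""
    deviations = []
    for scores in samples:
        mean = sum(scores) / len(scores)
        if squared:
            deviations.append([(x - mean) ** 2 for x in scores])
        else:
            deviations.append([abs(x - mean) for x in scores])
    return anova_f(deviations)


f_abs, df1, df2 = levene_f(groups)
f_sq, _, _ = levene_f(groups, squared=True)
print(f"Absolute deviations: F({df1}, {df2}) = {f_abs:.2f}")
print(f"Squared deviations:  F({df1}, {df2}) = {f_sq:.2f}")
```

Comparing the two F values with the one the package reports is exactly how to tell which kind of deviation it used.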
The basic problem is that Anova assumes that the variances of the separate populations are homogeneous, and the test is based on that assumption. The question is what to do when we clearly violate that assumption. As with t tests, violation of homogeneity is particularly a problem when we have quite different sample sizes.
The variances in our samples range from 3.95² = 15.60 to 11.12² = 123.65, and the corresponding n's are 14 and 10.
I give a formula in the text (due to Welch) that could be used if you are really concerned. It is a very messy formula. I certainly would never expect anyone to memorize this.
![]()
The result would still be significant, with a substantially reduced number of degrees of freedom but a substantially larger F. I feel more confident, because I know that even when I take the differences in group variances into account, I still get a significant result.
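For anyone who wants to see the arithmetic, here is a minimal sketch of Welch's procedure in plain Python. I am assuming the usual form of Welch's (1951) formula, in which each group is weighted by w_j = n_j / s_j²; the function name `welch_anova` is hypothetical, and the inputs are the rounded means and standard deviations from the data table, so the result is approximate.

```python
def welch_anova(ns, means, sds):
    """Welch's F for k groups; returns (f_welch, df_num, df_den)."""
    k = len(ns)
    # Each group is weighted by n_j / s_j^2, so noisy groups count for less.
    weights = [n / sd ** 2 for n, sd in zip(ns, sds)]
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total
    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    # Correction term shared by the denominator of F and the error df.
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, ns))
    f_welch = numerator / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df_den = (k ** 2 - 1) / (3 * lam)
    return f_welch, k - 1, df_den


# Summary statistics from the data table (SIT, PE, SC, WL).
ns = [14, 10, 11, 10]
means = [11.07, 15.40, 18.09, 19.50]
sds = [3.95, 11.12, 7.13, 7.11]

f_welch, df_num, df_den = welch_anova(ns, means, sds)
print(f"Welch F({df_num}, {df_den:.1f}) = {f_welch:.2f}")
```

The error degrees of freedom come out as a fraction, which is normal for this test.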
From the text I define power as a function of the effect size (φ′) and φ, which takes the sample size into account.
What I am going to look at here is what is called "post hoc power." There are a lot of good reasons for disliking post hoc power, but I am using it here to illustrate the calculations. What we are asking is "What is the power of this study if the population means and variances are exactly like those found by Foa et al.?" It would be better if we had an independent estimate of those, but for now we don't.
We have the following sample means
The grand mean is 16.015 (unweighted). (Notice that the unweighted mean is different from the weighted mean. We want the unweighted mean because it doesn't give any extra influence to one group over another. Explain what unweighted is.)
We'll assume that we have equal n = 11.04, which is just the harmonic mean of the sample sizes.
The following table contains our best estimates of the population means.

| Group | μj | μj - μ | (μj - μ)² |
|---|---|---|---|
| 1 | 11.07 | -4.945 | 24.453 |
| 2 | 15.40 | -0.615 | 0.378 |
| 3 | 18.09 | 2.075 | 4.306 |
| 4 | 19.50 | 3.485 | 12.145 |
| Total | (16.015) | 0.000 | 41.28 |
(I can't see the decimals there, but the answer is 1.43.)
Notice that in calculating φ′ we are just taking the average of the squared deviations of the group means around the grand mean, dividing by MSerror, and taking the square root. Put another way, the numerator is (almost) the variance of the group means, and the denominator is the within-group variance.
This says that our effect size (which Cohen labels f) is .43. Roughly speaking, this means that the groups differ from each other (on average) by .43 standard deviations--sort of.
Taking n into account (multiplying φ′ by the square root of n), we get φ = 1.43.
We need to use tables of the noncentral F distribution, which can be found in the Appendix. We enter with dft = 3, dfe = 41, and φ = 1.43, which we could round to 1.4.
With interpolation we find that beta = .40, giving us
Power = 1 - beta = 1 - .40 = .60
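The effect-size arithmetic that feeds the power table can be checked with a short sketch in plain Python. The variable names are my own, and the group means and MSerror are the values from the tables above. It stops at φ, because the power itself comes from the noncentral F tables (or from software such as G*Power) rather than from a simple formula.

```python
import math

ns = [14, 10, 11, 10]                 # group sizes (SIT, PE, SC, WL)
means = [11.07, 15.40, 18.09, 19.50]  # sample means used as population estimates
ms_error = 55.579                     # MSerror from the summary table

k = len(means)
grand_mean = sum(means) / k                 # unweighted grand mean
n_harmonic = k / sum(1 / n for n in ns)     # harmonic mean of the n's

# phi-prime: root of (average squared deviation of the means / MSerror).
mean_sq_dev = sum((m - grand_mean) ** 2 for m in means) / k
phi_prime = math.sqrt(mean_sq_dev / ms_error)

# phi takes sample size into account.
phi = phi_prime * math.sqrt(n_harmonic)

print(f"grand mean = {grand_mean:.3f}, harmonic n = {n_harmonic:.2f}")
print(f"phi' = {phi_prime:.2f}, phi = {phi:.2f}")
```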
When I use a (free) software program called G*Power, I get a slightly higher value for power. Its result is likely to be closer to the true value than the one interpolated from the tables.
I have a web page that gives the link to download this software. It is available at http://hobbes.uvm.edu/StatPages/More_Stuff/Gpower.html
The following is a piece of software (a Java applet) that came from http://www.dartmouth.edu/~matc/X10/java/anova/Anova.html. It took a while to figure it out, but after you enter any value in one of the boxes, you have to hit the Enter key for it to take effect. I went over this on Thursday, just because I thought that it would be confusing to use, but I think that it does a nice job of illustrating some aspects of power in Anova. (I'll skip this in class, but am leaving it in for reference.)

I entered the means and variance from the Foa, et al. article, assuming that those were the actual parameters.
Explain the above figure, with special reference to:

We have already dealt with this in our example, but I just want to make explicit what we are doing. I gave a formula that said
If we had equal sample sizes, there is no reason to divide by nj as we go along, because the nj are all equal to a common n. Therefore we simply change the equation to
But the point I want to really beat on is that this only works in the one-way design. When we get to factorial analysis of variance, we will have to do something entirely different to handle the unequal n case, and, in fact, there is no completely satisfactory solution for unequal ns (except to not have any).
We are looking for a measure of the degree to which the independent variable affects the dependent variable.
One way to look at this would be to look at our measure of effect size. In t tests we called this d, but in Anova it is really f.
To quote what I said before: "This says that our effect size (which Cohen labels f) is .43. Roughly speaking, this means that the groups differ from each other (on average) by .43 standard deviations--sort of."
An alternative would be to get something like r² between the independent and dependent variable, but the independent variable (X) is not a continuous variable, and it is not even ordinal. So we can't really correlate X and Y.
For the Foa data, we could just plot the means against groups, and it would have a nice linear relationship for some ordering of treatments.
But, suppose that I numbered the groups differently. There should be absolutely nothing wrong with doing that, but when I plot the data I get:
Obviously, we can't change our statement about the relationship between X and Y on the basis of how we happen to label the groups.
But in regression we pass a line through our best estimate of Y. In this case, our best estimate is the group mean. So why not pass a line through the four group means?
This is the best fitting curvilinear line.
In regression, we take as our measure of goodness of fit, the squared deviations from the line. But if the "line" is the line shown above, then the squared deviations from the line are really the squared deviations from the means. So,
![]()
becomes
For our data this becomes:
But, eta squared is biased because it is optimized for the particular set of data we have. Therefore we need to go to a less biased statistic:
ω² is always going to be smaller than η², but generally it is only a little smaller.
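Both statistics can be computed directly from the summary table. The sketch below assumes the usual definitions for the fixed-effects one-way design: eta squared as SS group over SS total, and omega squared as (SS group - df group × MSerror) / (SS total + MSerror). The variable names are mine.

```python
# Values taken from the Anova summary table.
ss_group = 507.8401
ss_total = 2786.5778
df_group = 3
ms_error = 55.579

# Eta squared: proportion of total variation attributable to groups.
eta_squared = ss_group / ss_total

# Omega squared: the same idea, with a correction for the bias in eta squared.
omega_squared = (ss_group - df_group * ms_error) / (ss_total + ms_error)

print(f"eta squared   = {eta_squared:.3f}")
print(f"omega squared = {omega_squared:.3f}")
```

The numerator shrinks and the denominator grows in going from eta squared to omega squared, which is why the second value can never be the larger one.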
I'm going to skip all the stuff in the text about random models, and so can they.
How large does an effect need to be?
Rosenthal and Rubin (1979, 1982) have argued that even small effects can be important.
This is best seen if we take only two groups with a dichotomous dependent variable, but it generalizes to other cases.
Suppose that we have a magic drug which we hope will cure people. We give the drug to 100 people and a placebo to another 100 people. We get the following data:
| | Improved | Worse | Total |
|---|---|---|---|
| Drug | 66 | 34 | 100 |
| Placebo | 34 | 66 | 100 |
That looks very exciting. No one would be likely to argue that this drug is not a good thing. But we could calculate η² = .10, and be very discouraged because we have only accounted for 10% of the variance. On the other hand, we have gone from a success rate of only 34% to a success rate of 66%, and I can't imagine that anyone would suggest that this isn't a major accomplishment.
Rosenthal has argued in several places that just because an effect size is small doesn't mean that it isn't important. This one is clearly important.
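To see where the 10% comes from, here is a minimal sketch in plain Python. With two groups and a dichotomous outcome, the correlation between treatment and outcome is the phi coefficient of the 2 × 2 table, and its square plays the role of η². The variable names are my own.

```python
import math

# Cell counts from the drug example: (improved, worse) for each group.
drug_improved, drug_worse = 66, 34
placebo_improved, placebo_worse = 34, 66

n_drug = drug_improved + drug_worse
n_placebo = placebo_improved + placebo_worse
n_improved = drug_improved + placebo_improved
n_worse = drug_worse + placebo_worse

# Phi coefficient: Pearson r between two dichotomous variables.
phi_coef = (drug_improved * placebo_worse - drug_worse * placebo_improved) / math.sqrt(
    n_drug * n_placebo * n_improved * n_worse
)
r_squared = phi_coef ** 2

# Difference in success rates between the two groups.
rate_difference = drug_improved / n_drug - placebo_improved / n_placebo

print(f"r = {phi_coef:.2f}, r squared = {r_squared:.2f}")
print(f"difference in success rates = {rate_difference:.2f}")
```

With equal group sizes and symmetric margins like these, the correlation and the difference in success rates are the same number, which is the point of the display: an r² of about .10 corresponds to a 32 percentage point change in outcome.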
One way to get a handle on this would be to run an analysis of variance between male and female heights. We have some general feeling what the difference is in the real world, but what about eta-squared?
The following data are real data.
Oneway
This is a good index of what we mean by a "large" percentage of variation.
Last revised: 11/19/01