# Repeated Measures Designs

## Announcements

Talk about the exam (not until 3/7/02). We will have 4 classes on repeated measures and a good bit of review before then.

I'll be at COGDOP on the 21st, but we still have lab.

## Repeated Measures Designs

A repeated measures design is one in which we measure subjects repeatedly over time.

Sometimes it isn’t really the same subject that we measure repeatedly: it could be members of the same family.

For example:

| Family | Mom | Dad | Daughter | Son #1 |
|---|---|---|---|---|
| 1 | 23 | 28 | 19 | 18 |
| 2 | 34 | 36 | 28 | 30 |
| 3 | 20 | 23 | 14 | 18 |

The important thing is that we have a set of non-independent measures. If Mom and Dad are high on some variable, probably the kids are too.

If we are measuring the same subjects repeatedly, it is even easier to see why this would happen.

The most important thing to note here is that we are talking about a non-zero correlation between a set of measurements. For example, if columns 1, 2, 3, and 4 represented repeated measurements on a set of participants, it is highly doubtful that the correlation between columns 2 and 3, or 3 and 4 would be 0.00 in the population.

However, if these were just sets of scores from four independent groups of participants, we couldn’t really calculate a correlation, because there would be no natural way to pair the scores. But if we did, its expected value would be 0.00.

I discovered a year or two ago that I am being too simplistic when I imply that the proper analysis is obvious. Exercises 14.10 and 14.20 refer to a set of variables on "goals," "settings," and "dispositions" in stories told by people of different ages. I argued that this was a repeated measure because the same subject had a score on each of those types of content. Someone from William and Mary pointed out that perhaps it should be treated as a standard Manova. A Manova is basically an analysis of variance where you have multiple dependent variables. I think he has a good point. (If you measure exactly the same thing on repeated trials, the answer is pretty clear. But when you are measuring somewhat different things, it may not be so clear.) We will see an example of a Manova in about two weeks.

### Example from Foa et al. (1991)

This is the same study we saw before on rape and PTSD, but then I was looking only at the number of reported symptoms at the end of the study. But Foa measured people at baseline, at the end of therapy, and at follow-up. I’m trying to replicate her results.

For now, I’m going to look only at the Supportive Counseling group. I’ll bring in additional groups later, when I complicate things.

In order to generate data, I need to know something about the correlations between observations. (I am putting this in to emphasize that these correlations play a role.)

Ask what the students think these are likely to be.

I had to choose something, so I chose r = .70 for all three correlations.

I know how to create data with a fixed (or random) correlation between columns, and then I can adjust means to whatever I want them to be. The data I generated are:

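| Subj | Pre | Post | Follow-Up |
|---|---|---|---|
| 1 | 21 | 15 | 15 |
| 2 | 24 | 15 | 8 |
| 3 | 21 | 17 | 22 |
| 4 | 26 | 20 | 15 |
| 5 | 32 | 17 | 16 |
| 6 | 27 | 20 | 17 |
| 7 | 21 | 8 | 8 |
| 8 | 25 | 19 | 15 |
| 9 | 18 | 10 | 3 |
| Mean | 23.89 | 15.67 | 13.22 |
| s.d. | 4.20 | 4.24 | 5.78 |
| s² | 17.61 | 18.00 | 33.44 |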

Note: The data are available at faorep.dat, but you will have to read them as an ASCII file. They are also available at foarep.sav (an SPSS system file), but downloading a system file is not always as error-free as it should be, partly depending on your browser.
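As a rough sketch of how scores with a chosen correlation between occasions can be generated (this is not necessarily the procedure used for the data above), each score can be built from a component shared across occasions plus an occasion-specific component. The function name, the target means, and the standard deviations below are illustrative assumptions. This version fixes the population correlation at r, so the sample correlations from only 9 cases will wander around .70 rather than equal it exactly.

```python
import math
import random
import statistics


def make_correlated_scores(n, means, sds, r, seed=0):
    """Return n rows of scores whose population correlations all equal r."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0, 1)      # component common to every occasion
        row = []
        for mean, sd in zip(means, sds):
            unique = rng.gauss(0, 1)  # occasion-specific component
            # z has variance r + (1 - r) = 1 and covariance r with the other occasions
            z = math.sqrt(r) * shared + math.sqrt(1 - r) * unique
            row.append(mean + sd * z)  # shift and scale to the target mean and s.d.
        rows.append(row)
    return rows


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative targets only (assumed values, not taken from Foa et al.).
scores = make_correlated_scores(9, means=[24, 16, 13], sds=[4, 4, 6], r=0.70)
pre, post, follow = (list(col) for col in zip(*scores))
print("r(Pre, Post)       =", round(pearson_r(pre, post), 2))
print("r(Pre, Follow-Up)  =", round(pearson_r(pre, follow), 2))
print("r(Post, Follow-Up) =", round(pearson_r(post, follow), 2))
```

Rerunning with a different seed gives a feel for how much three sample correlations can differ from one another with n = 9, even when the population values are identical.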

We could think of this as a two-way (Subj × Trials) design.

But we don’t have independent observations. Knowing one person’s score at Post tells us something about their score at Follow-Up--at least we hope that it does. (Elaborate).

In this particular case it might not make sense to assume that there is a correlation between Pre and Follow-Up, though I have assumed that. (Why do you suppose that is?)

• Emphasize the fact that with no systematic variability at pretest, the correlation might turn out to be near 0.00.
• I have unintentionally stumbled into one of the problems with repeated measures--this correlation is likely to be low, but the others will be high.

We have one score per cell, which makes it impossible to separate error from the Subj × Time interaction. In fact, the error term is the interaction, and can be calculated that way.

Get them to think about the difference in how we would set up the data file for a two-way, as opposed to this design.

For this problem, the data are actually entered just as they are presented.
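To make the contrast between the two layouts concrete, here is a small sketch that reshapes the data from the one-row-per-subject layout used here into the one-row-per-score layout that an ordinary two-way analysis would expect. The variable names are illustrative assumptions.

```python
# "Wide" layout: one row per subject, one column per occasion
# (the way the data are entered for the repeated measures analysis).
wide = [
    (1, 21, 15, 15), (2, 24, 15, 8), (3, 21, 17, 22),
    (4, 26, 20, 15), (5, 32, 17, 16), (6, 27, 20, 17),
    (7, 21, 8, 8), (8, 25, 19, 15), (9, 18, 10, 3),
]
occasions = ["Pre", "Post", "Follow-Up"]

# "Long" layout: one row per score, with Subject and Time coded as factors
# (the way a standard two-way design would be entered).
long_rows = [
    (subj, occasion, score)
    for subj, *scores in wide
    for occasion, score in zip(occasions, scores)
]

for row in long_rows[:6]:
    print(row)
print(len(wide), "rows in wide layout;", len(long_rows), "rows in long layout")
```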

### Correlation and Covariance Matrices

I created this using SPSS and specifying options to get the sums of squares and cross products and the covariances. Then I had to play with the pivot table to arrange the table.

GROUP = 3.00 = Supportive Counseling

The covariance matrix is actually the more important one, because we have an assumption about it. We assume that the off-diagonal elements are equal to each other, and the diagonal elements are equal to each other. We don’t worry about the relationship between the diagonal and off-diagonal elements.

This is the assumption of Compound Symmetry, which is a special case of Sphericity. If you have compound symmetry, you have sphericity. I could make up some weird cases where the reverse doesn’t hold, but they aren’t of practical importance. Students will see the word "sphericity" used with most software.

This assumption, in everyday practice, turns out to be an assumption that the correlations between variables are constant.

With only 9 cases, we couldn’t hope to test this.
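For readers without SPSS, the covariance and correlation matrices for the Supportive Counseling group can be recomputed directly from the data table. This is a plain-Python sketch; the helper name `covariance` is an assumption for illustration.

```python
import math

# Supportive Counseling group, taken from the data table above.
pre = [21, 24, 21, 26, 32, 27, 21, 25, 18]
post = [15, 15, 17, 20, 17, 20, 8, 19, 10]
follow = [15, 8, 22, 15, 16, 17, 8, 15, 3]
columns = [pre, post, follow]
labels = ["Pre", "Post", "Follow-Up"]


def covariance(x, y):
    """Sample covariance (n - 1 in the denominator)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


# Variances sit on the diagonal, covariances off the diagonal.
cov = [[covariance(x, y) for y in columns] for x in columns]
corr = [
    [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(3)]
    for i in range(3)
]

print("Covariance matrix")
for label, row in zip(labels, cov):
    print(f"{label:>10}", "  ".join(f"{v:7.2f}" for v in row))
print("Correlation matrix")
for label, row in zip(labels, corr):
    print(f"{label:>10}", "  ".join(f"{v:7.2f}" for v in row))
```

Under compound symmetry the three diagonal entries would be roughly equal, and so would the three off-diagonal entries. Here the sample correlations come out at roughly .64 (Pre, Post), .43 (Pre, Follow-Up), and .74 (Post, Follow-Up), which is the kind of spread you can get from 9 cases.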

## The Anova

If I were doing this with pencil and paper, or if I were using a different kind of software, I would have the following summary table.

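| Source | df | SS | MS | F | p |
|---|---|---|---|---|---|
| Between Subj | 8 | 397.85 | | | |
| Within Subj | 18 | 716.66 | | | |
| Time | 2 | 562.07 | 281.04 | 29.09 | .000 |
| Error | 16 | 154.59 | 9.66 | | |
| Total | 26 | 1114.51 | | | |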
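The pencil-and-paper calculations behind this summary table can be sketched in a few lines of plain Python. The sketch also computes the error sum of squares a second way, as the Subj × Time interaction residuals, to show that the two are the same quantity. Variable names are illustrative assumptions, and the p value is left out because it needs an F distribution routine.

```python
# Supportive Counseling group: one row per subject (Pre, Post, Follow-Up).
scores = [
    (21, 15, 15), (24, 15, 8), (21, 17, 22),
    (26, 20, 15), (32, 17, 16), (27, 20, 17),
    (21, 8, 8), (25, 19, 15), (18, 10, 3),
]
n, k = len(scores), len(scores[0])

grand_mean = sum(x for row in scores for x in row) / (n * k)
subj_means = [sum(row) / k for row in scores]
time_means = [sum(col) / n for col in zip(*scores)]

ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
ss_subj = k * sum((m - grand_mean) ** 2 for m in subj_means)
ss_within = ss_total - ss_subj
ss_time = n * sum((m - grand_mean) ** 2 for m in time_means)
ss_error = ss_within - ss_time

# Same error term, computed directly as the Subj x Time interaction.
ss_interaction = sum(
    (x - subj_means[i] - time_means[j] + grand_mean) ** 2
    for i, row in enumerate(scores)
    for j, x in enumerate(row)
)

df_time, df_error = k - 1, (n - 1) * (k - 1)
ms_time, ms_error = ss_time / df_time, ss_error / df_error
f_time = ms_time / ms_error

print(f"Between Subj  df={n - 1:2d}  SS={ss_subj:8.2f}")
print(f"Within Subj   df={n * (k - 1):2d}  SS={ss_within:8.2f}")
print(f"  Time        df={df_time:2d}  SS={ss_time:8.2f}  MS={ms_time:7.2f}  F={f_time:.2f}")
print(f"  Error       df={df_error:2d}  SS={ss_error:8.2f}  MS={ms_error:7.2f}")
print(f"Total         df={n * k - 1:2d}  SS={ss_total:8.2f}")
```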

The Anova from SPSS looks quite different. Even after I cut it down drastically, we get the following pieces of output:

Discuss this printout.

Show that there is nothing important in the Between subjects part because there are no comparisons between different subjects. We just see a test on the null hypothesis that the grand mean is 0.00.

The Within subjects part is the only thing we care about in this particular example.

The effect of time is clearly significant, although we don’t yet know which time is different from which other time. I would assume that we would like to know if scores went down from Pre to Post. We probably assume that they go up some from post to Follow-Up, and that is probably not worth testing. But, I’d like to know if people reverted all the way back to where they were at Pre, so perhaps we can test these in a minute.

If we are satisfied that we have met the assumption of sphericity, then we are all set--just look at the line labeled "sphericity assumed." If not, we have to apply corrections, which follow.

Notice that the printout gives the Greenhouse-Geisser and Huynh-Feldt corrections, and then applies them to the df. Thus, for the Greenhouse-Geisser error df, .764 × 16 = 12.224.

Discuss these corrections.

They aren’t going to make any difference here, because the F is so huge that changing the df isn’t going to alter the conclusions.

The df would be 1.53 and 12.22 using G-G, and 1.82 and 14.52 using H-F.

I’d go with H-F here.

Note that even the bare minimum df (the lower bound, where the correction factor is 1/(k - 1) = .50) would be 1 and 8, which has a critical F value of 5.32 at α = .05.
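The correction factors themselves can be recovered from the sample covariance matrix. The sketch below uses the usual textbook formula for the Greenhouse-Geisser epsilon (Box's estimate) and the single-group form of the Huynh-Feldt adjustment; the function names are assumptions for illustration, and SPSS may differ in the last decimal place.

```python
pre = [21, 24, 21, 26, 32, 27, 21, 25, 18]
post = [15, 15, 17, 20, 17, 20, 8, 19, 10]
follow = [15, 8, 22, 15, 16, 17, 8, 15, 3]
columns = [pre, post, follow]
n, k = len(pre), len(columns)


def covariance(x, y):
    """Sample covariance (n - 1 in the denominator)."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (m - 1)


def greenhouse_geisser_epsilon(cov):
    """Box's epsilon-hat computed from a k x k covariance matrix."""
    size = len(cov)
    mean_diag = sum(cov[i][i] for i in range(size)) / size
    grand = sum(sum(row) for row in cov) / size ** 2
    row_means = [sum(row) / size for row in cov]
    numerator = (size * (mean_diag - grand)) ** 2
    denominator = (size - 1) * (
        sum(v * v for row in cov for v in row)
        - 2 * size * sum(m * m for m in row_means)
        + size ** 2 * grand ** 2
    )
    return numerator / denominator


def huynh_feldt_epsilon(eps_gg, n_subjects, n_levels):
    """Huynh-Feldt adjustment for a single group of subjects, capped at 1."""
    numerator = n_subjects * (n_levels - 1) * eps_gg - 2
    denominator = (n_levels - 1) * (n_subjects - 1 - (n_levels - 1) * eps_gg)
    return min(numerator / denominator, 1.0)


cov = [[covariance(x, y) for y in columns] for x in columns]
eps_gg = greenhouse_geisser_epsilon(cov)
eps_hf = huynh_feldt_epsilon(eps_gg, n, k)

df_time, df_error = k - 1, (n - 1) * (k - 1)
for label, eps in (("G-G", eps_gg), ("H-F", eps_hf)):
    print(f"{label}: epsilon = {eps:.3f}, df = {eps * df_time:.2f} and {eps * df_error:.2f}")
```

Each epsilon simply multiplies both the numerator and denominator df; the F itself does not change.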

### Conclusions

We can simply conclude that there are significant differences between the mean number of symptoms reported at the different times.

### Multiple Comparisons

The question of multiple comparisons in repeated measures Anova is a big deal. There is no particular problem when we have different groups of subjects, but there is a problem when we have the same group(s) measured multiple times. Most traditional statistical software doesn't apply a standard multiple comparison test for these, although they do apply trend analyses.

When the repeated measures variable really is the same measure collected over several different times, trend analysis actually makes a good deal of sense. (But I am going to ignore that right now.)

The easiest way to make the comparisons we want (pre vs follow-up, post vs follow-up) is to fall back on our old friend the t test, perhaps adjusted as a Bonferroni.

I’m going to compare Pre-Post and Pre-FollowUp

We want a paired t test here. The easiest way is just to use SPSS to calculate it.

The first of these is the difference between Pre and Post: (Note that the following tests are standard t tests, they are not created as part of GLM-Repeated. I just told it to run two t-tests.)

GROUP = 3.00
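The two paired t tests can also be sketched by hand in plain Python. The critical value is an assumption read from a standard t table (two-tailed, df = 8, α = .05/2 for the Bonferroni adjustment), since the standard library has no t distribution; the helper name is likewise illustrative.

```python
import math

pre = [21, 24, 21, 26, 32, 27, 21, 25, 18]
post = [15, 15, 17, 20, 17, 20, 8, 19, 10]
follow = [15, 8, 22, 15, 16, 17, 8, 15, 3]


def paired_t(x, y):
    """Return (mean difference, standard error, t, df) for paired scores."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var_d / n)
    return mean_d, se, mean_d / se, n - 1


# Assumption: two-tailed critical t for df = 8 at alpha = .05 / 2 = .025,
# taken from a standard t table (approximately 2.75).
T_CRIT = 2.75

results = {
    "Pre vs Post": paired_t(pre, post),
    "Pre vs Follow-Up": paired_t(pre, follow),
}
for label, (mean_d, se, t, df) in results.items():
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{label}: mean diff = {mean_d:.2f}, SE = {se:.2f}, t({df}) = {t:.2f} ({verdict})")
```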

What can we conclude from these results?

### Simple contrasts

We can accomplish the same thing by using contrasts. Set up GLM-Repeated to run the analysis, and under the Contrast button make sure that Time is highlighted and then choose Simple contrasts with "last" as the reference level. This will contrast each level with the last level, which is what we have just done. (Be certain that after you select Simple you click on the "change" button.)

Note that the F values for these contrasts are exactly equal to t² for the t tests.

Note that it produces a separate error term for each contrast, which makes this equivalent to separate t tests.

Note also that MS(error) for the contrasts = n × (standard error of the mean)² for the t tests.

This just goes to show that these are the same tests.
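The equivalence can be checked numerically. The sketch below builds each simple contrast (each level against the last) as a per-subject contrast score, tests it against its own error term, and compares the result with the paired t test. It assumes unnormalized weights of +1 and -1, which is what makes MS(error) equal n times the squared standard error; the function names are illustrative.

```python
import math

scores = [
    (21, 15, 15), (24, 15, 8), (21, 17, 22),
    (26, 20, 15), (32, 17, 16), (27, 20, 17),
    (21, 8, 8), (25, 19, 15), (18, 10, 3),
]
pre, post, follow = (list(col) for col in zip(*scores))
n = len(scores)


def contrast_test(rows, weights):
    """F for a within-subject contrast tested against its own error term."""
    values = [sum(w * x for w, x in zip(weights, row)) for row in rows]
    size = len(values)
    mean = sum(values) / size
    ms_error = sum((v - mean) ** 2 for v in values) / (size - 1)
    ss_contrast = size * mean ** 2  # 1 df, so SS and MS coincide
    return ss_contrast / ms_error, ms_error


def paired_t(x, y):
    """Return (t, standard error of the mean difference)."""
    diffs = [a - b for a, b in zip(x, y)]
    size = len(diffs)
    mean = sum(diffs) / size
    se = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (size - 1) / size)
    return mean / se, se


checks = []
for label, weights, first in (
    ("Pre vs Follow-Up", (1, 0, -1), pre),
    ("Post vs Follow-Up", (0, 1, -1), post),
):
    f, ms_error = contrast_test(scores, weights)
    t, se = paired_t(first, follow)
    checks.append((f, t, ms_error, se))
    print(f"{label}: F = {f:.3f}, t squared = {t ** 2:.3f}, "
          f"MS(error) = {ms_error:.3f}, n * SE squared = {n * se ** 2:.3f}")
```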

If I had wanted a comparison of each level of Time with the next level, I could have specified Contrast/Repeated in the dialogue box. I didn't do that here, but students might think about why I might want to.

Comment on why we would want separate error terms, and how that relates to the assumption of sphericity.

Now I’ll extend this Anova to handle multiple groups as well as repeated measures.

## One Between-Group and One Within-Group Variable

Here I am going to use the more complete Foa et al. study, but I am limited to three groups because the Waiting List group doesn’t have pre-post-followup.

The data follow:

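| Group | Pre | Post | Follow-Up |
|---|---|---|---|
| 1 | 26 | 13 | 7 |
| 1 | 23 | 15 | 29 |
| 1 | 32 | 9 | 9 |
| 1 | 22 | 12 | 20 |
| 1 | 27 | 10 | 11 |
| 1 | 25 | 8 | 10 |
| 1 | 38 | 11 | 22 |
| 1 | 29 | 9 | 13 |
| 1 | 20 | 3 | 3 |
| 2 | 30 | 34 | 27 |
| 2 | 26 | 11 | 19 |
| 2 | 21 | 12 | 10 |
| 2 | 31 | 34 | 21 |
| 2 | 25 | 21 | 8 |
| 2 | 24 | 1 | 6 |
| 2 | 16 | 14 | 0 |
| 2 | 28 | 26 | 10 |
| 2 | 26 | 6 | 14 |
| 3 | 21 | 15 | 15 |
| 3 | 24 | 15 | 8 |
| 3 | 21 | 17 | 22 |
| 3 | 26 | 20 | 15 |
| 3 | 32 | 17 | 16 |
| 3 | 27 | 20 | 17 |
| 3 | 21 | 8 | 8 |
| 3 | 25 | 19 | 15 |
| 3 | 18 | 10 | 3 |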

Notice that I have added a column for Group membership, where 1 = SIT, 2 = PE, and 3 = SC

First we need to look at the correlation matrices for each group separately.

Explain why I would want to look at this!

• Note that the sample size is really too small to make much of the result.
• What I really care about are the covariances, because that is what the assumption is really all about.

Correlations

GROUP = 1.00

GROUP = 2.00

GROUP = 3.00

Note that for two of these matrices, the variances on the diagonal are pretty different. This should make us nervous.
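The per-group covariance matrices can be recomputed from the data table with a short plain-Python sketch. The dictionary layout and the function name are assumptions for illustration.

```python
# Scores by group: (Pre, Post, Follow-Up); 1 = SIT, 2 = PE, 3 = SC.
data = {
    1: [(26, 13, 7), (23, 15, 29), (32, 9, 9), (22, 12, 20), (27, 10, 11),
        (25, 8, 10), (38, 11, 22), (29, 9, 13), (20, 3, 3)],
    2: [(30, 34, 27), (26, 11, 19), (21, 12, 10), (31, 34, 21), (25, 21, 8),
        (24, 1, 6), (16, 14, 0), (28, 26, 10), (26, 6, 14)],
    3: [(21, 15, 15), (24, 15, 8), (21, 17, 22), (26, 20, 15), (32, 17, 16),
        (27, 20, 17), (21, 8, 8), (25, 19, 15), (18, 10, 3)],
}


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) for a list of score rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(col) / n for col in cols]
    size = len(cols)
    return [
        [
            sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j])) / (n - 1)
            for j in range(size)
        ]
        for i in range(size)
    ]


cov_by_group = {group: covariance_matrix(rows) for group, rows in data.items()}
for group, cov in cov_by_group.items():
    print(f"GROUP = {group}")
    for row in cov:
        print("  " + "  ".join(f"{v:8.2f}" for v in row))
```

Reading down each diagonal shows how unequal the variances are within a group, which is the first thing to check before worrying about the covariances.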

Now the Group Means for each variable

Elaborate on the results. It looks as if there are differences at Post, but not at the other two times.
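The cell means under discussion can be recomputed from the data table. A minimal sketch, with the dictionary layout assumed for illustration:

```python
# Scores by group: (Pre, Post, Follow-Up); 1 = SIT, 2 = PE, 3 = SC.
data = {
    1: [(26, 13, 7), (23, 15, 29), (32, 9, 9), (22, 12, 20), (27, 10, 11),
        (25, 8, 10), (38, 11, 22), (29, 9, 13), (20, 3, 3)],
    2: [(30, 34, 27), (26, 11, 19), (21, 12, 10), (31, 34, 21), (25, 21, 8),
        (24, 1, 6), (16, 14, 0), (28, 26, 10), (26, 6, 14)],
    3: [(21, 15, 15), (24, 15, 8), (21, 17, 22), (26, 20, 15), (32, 17, 16),
        (27, 20, 17), (21, 8, 8), (25, 19, 15), (18, 10, 3)],
}
occasions = ("Pre", "Post", "Follow-Up")

# One mean per Group x Time cell.
cell_means = {
    group: [sum(col) / len(rows) for col in zip(*rows)]
    for group, rows in data.items()
}
for group, means in cell_means.items():
    summary = "  ".join(f"{name} = {m:5.2f}" for name, m in zip(occasions, means))
    print(f"Group {group}: {summary}")
```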

### The Anova

If I were doing this with pencil and paper, or if I were using a different kind of software, I would have the following summary table.

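| Source | df | SS | MS | F | p |
|---|---|---|---|---|---|
| Between Subj | 26 | 2152.32 | | | |
| Groups | 2 | 37.80 | 18.90 | 0.21 | .808 |
| Error(b) | 24 | 2114.52 | 88.10 | | |
| Within Subj | 54 | 3851.33 | | | |
| Time | 2 | 2391.80 | 1195.90 | 49.17 | .000 |
| T × G | 4 | 292.05 | 73.01 | 3.00 | .027 |
| Error(w) | 48 | 1167.48 | 24.32 | | |
| Total | 80 | 6003.65 | | | |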
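The partitioning behind this summary table can be sketched in plain Python. The sketch assumes equal group sizes, as in these data; variable names are illustrative, and p values are omitted because they need an F distribution routine.

```python
# Scores by group: (Pre, Post, Follow-Up); 1 = SIT, 2 = PE, 3 = SC.
data = {
    1: [(26, 13, 7), (23, 15, 29), (32, 9, 9), (22, 12, 20), (27, 10, 11),
        (25, 8, 10), (38, 11, 22), (29, 9, 13), (20, 3, 3)],
    2: [(30, 34, 27), (26, 11, 19), (21, 12, 10), (31, 34, 21), (25, 21, 8),
        (24, 1, 6), (16, 14, 0), (28, 26, 10), (26, 6, 14)],
    3: [(21, 15, 15), (24, 15, 8), (21, 17, 22), (26, 20, 15), (32, 17, 16),
        (27, 20, 17), (21, 8, 8), (25, 19, 15), (18, 10, 3)],
}
k = 3  # number of occasions
all_rows = [row for rows in data.values() for row in rows]
n_subj, n_groups = len(all_rows), len(data)
grand = sum(x for row in all_rows for x in row) / (n_subj * k)

ss_total = sum((x - grand) ** 2 for row in all_rows for x in row)

# Between-subjects partition: Groups and subjects within groups.
ss_between_subj = k * sum((sum(row) / k - grand) ** 2 for row in all_rows)
ss_groups = sum(
    len(rows) * k * (sum(x for row in rows for x in row) / (len(rows) * k) - grand) ** 2
    for rows in data.values()
)
ss_error_b = ss_between_subj - ss_groups

# Within-subjects partition: Time, Time x Group, and the within error.
ss_within = ss_total - ss_between_subj
ss_time = n_subj * sum((sum(col) / n_subj - grand) ** 2 for col in zip(*all_rows))
ss_cells = sum(
    len(rows) * (sum(col) / len(rows) - grand) ** 2
    for rows in data.values()
    for col in zip(*rows)
)
ss_txg = ss_cells - ss_groups - ss_time
ss_error_w = ss_within - ss_time - ss_txg

df_groups, df_error_b = n_groups - 1, n_subj - n_groups
df_time = k - 1
df_txg, df_error_w = df_groups * df_time, df_error_b * df_time

ms_error_b, ms_error_w = ss_error_b / df_error_b, ss_error_w / df_error_w
f_groups = (ss_groups / df_groups) / ms_error_b
f_time = (ss_time / df_time) / ms_error_w
f_txg = (ss_txg / df_txg) / ms_error_w

print(f"Groups    df={df_groups:2d}  SS={ss_groups:8.2f}  F={f_groups:.2f}")
print(f"Error(b)  df={df_error_b:2d}  SS={ss_error_b:8.2f}  MS={ms_error_b:.2f}")
print(f"Time      df={df_time:2d}  SS={ss_time:8.2f}  F={f_time:.2f}")
print(f"T x G     df={df_txg:2d}  SS={ss_txg:8.2f}  F={f_txg:.2f}")
print(f"Error(w)  df={df_error_w:2d}  SS={ss_error_w:8.2f}  MS={ms_error_w:.2f}")
```

Note that Groups is tested against Error(b), while Time and the Time × Group interaction are both tested against Error(w).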

SPSS will give me the same answers, but it sets up the summary tables in quite a different way.

The SPSS printout (heavily edited)

General Linear Model

Profile Plots

Ask about what they would do for multiple comparisons.

Last revised: 02/10/02