Some time ago I wrote two web pages on using mixed-models for repeated measures designs. Those pages can be found at Mixed-Models-for-Repeated-Measures1.html and Mixed-Models-for-Repeated-Measures2.html. This page, or perhaps set of pages, is designed for a different purpose. I am trying to sort out mixed models so that the average reader can understand their purpose and their relationship to one another. This involves focusing on fixed and random models and on repeated measures, and trying to explain why they go together.

Just about every source you read on these models takes a somewhat different approach, and it is not always clear how they relate to each other and why they look at the models so differently. One way to understand what is going on, if only vaguely, is to understand that the traditional analysis of variance assumes independent observations or, with repeated measures, compound symmetry. This implies that observations within the same group should be uncorrelated, or correlated in an unrealistic way. That's fine if you have just a bunch of random subjects, but it creates problems when you have repeated measures or you have independent variables nested within other independent variables. I want to try to sort this out. In addition, standard analyses of variance become controversial when you have unbalanced designs, meaning unequal cell sizes.

There are many books written on this topic, but each seems to take a slightly different approach and you sometimes end up wondering what they have to do with the problem you are trying to address. On top of that, different books and articles use different software, and a discussion written for SAS looks, on the surface, quite different from one written with respect to R. I can't cover everything, so I will first focus on SPSS, which uses a graphical user interface, and R, which is becoming more and more common, but does not have a GUI that will handle much of what I will cover here. Later on I will move to SAS.

Perhaps my favorite source for mixed models is Maxwell and Delaney (2004) *Designing Experiments and Analyzing Data* (Chapter 15). They have done a particularly good job of working through the meaning of such designs, and are quite interpretable. For those interested in working through this material using *R*, there is a really excellent, and fairly brief, discussion of these models in a tutorial by Bodo Winter. It is entitled *A very basic, and excellent, tutorial for performing linear mixed effects analyses*, and can be downloaded at http://www.bodowinter.com/tutorials.html. You want the second tutorial, but it wouldn't hurt to read the first one as well.

But what do we mean by mixed models? And how do repeated measures enter the discussion? And then what about nested effects? All of these topics have a role in this discussion, and sorting them out in a meaningful way has proved to be difficult. Part of my difficulty is that I wanted to come at this topic from the direction of an analysis of variance, with which you probably have some reasonable familiarity. There are situations under which the analysis of variance and maximum likelihood, the major tool for mixed models, produce the same result. It is therefore tempting to start with Anova and then move on. But I have been forced to give up on that approach simply because it leads me down too many alleys at the same time. Instead, I am going to go at the issues head on.

You might think that the distinction between fixed and random variables is something that statisticians worked out years ago. But actually there is still a good bit of discussion about them. There are two issues here. First, what is the difference between fixed and random variables? Second, how do we estimate their effects, and why are the estimates different?

If you want to run a study to compare the effects of four different drugs on the treatment of depression, you will want to treat Drugs as a **fixed effect**. You have deliberately selected those drugs and not others, and your interest is just in their effects. Put a different way, you want to focus on the differences between groups. You are not planning to generalize to other drugs. In the field of experimental psychology this is the way that we ran most of our experiments, and fixed effects were the basic building blocks.

The designs that we often use for a standard analysis of variance include one random variable, normally "subjects," but this is often about it. A random variable can be thought of as a variable whose levels were chosen at random. Alternatively, some would argue that a random variable is one for which you want to make a global statement about the population of levels. When we apply an experiment to many different classrooms, we really aren't particularly interested in saying that Miss Smith's class is better than Mr. Jones' class. We want to make a general statement about the *variability* among classrooms, as opposed to statements like Drug A is a better treatment for depression than Drug B, which is what we want for fixed variables.

But this last statement brings me to another distinction that will color the output we receive and, in some ways, make the printout from most software analyses look as if the distinction from the analysis of variance is even greater than it is. In the analysis of variance approach, even when our model has a random experimental variable, we expect to see a summary table that emphasizes *F* tests on each of the effects. In other words, we expect to see output that says there are significant differences due to A, perhaps no differences due to B, and perhaps a significant interaction. Then we flesh out what those mean, perhaps with multiple comparison tests, effect sizes, and so on, and stop. But in a mixed model you will see that we pull apart fixed factors from random factors, present a significance test on the fixed factors, if we can figure out how to do that, and then compute and focus on the *variance* of the random factors. So our resulting output has two quite different parts. And this really does make sense. I want to know if Drug A is a better treatment than Drug B, but I don't particularly want to know that Mary is better than John--although I might want to make a statement about how variable subjects are in general. So expect to see output that is in many ways different from what you are used to.

There is at least one more feature of mixed-models that makes things look quite different. In Howell (2013 - Chapter 13) I show that with a balanced design, which generally means equal cell sizes, we can work out expected mean squares, figure out how to compute *F* values for fixed effects, and go our merry way. But in the general case where we often don't have nice balanced designs, our statistical test for that fixed effect will often boil down to a chi-square test comparing a full and reduced model, where the reduced model omits the factor in question. That makes it look as if we have gone a long way in a different direction. We really haven't, but it certainly looks that way.

You probably know that for a standard analysis of variance we assume that observations are independent. Or, when it comes to repeated measures, we assume sphericity or compound symmetry. But for repeated measures designs in particular, those are frequently not reasonable assumptions. In the standard analysis of variance we dropped back to corrections by Huynh and Feldt (1970) or Greenhouse and Geisser (1958). But those are only approximations. With mixed models we no longer have to make a compound symmetry assumption, so we can avoid approximations. You will see how to do that later. But I need to say a bit more about this here because it is a central aspect of dealing with mixed models.

It is easier to see what the problem is with assuming independence if we look at a repeated measure. Suppose that we follow a group of patients by testing them every month for six months. If we had compound symmetry, a standard repeated measures assumption, then the correlation between patients' data at Time 1 and Time 2 would be the same as the (expected) correlation between the data on Time 1 and Time 5. But is that reasonable? Don't you think that your subjects' Time 1 and Time 2 scores will be more highly correlated than their Time 1 and Time 5 scores? In other words, won't data collected closer in time be more similar than data collected further apart in time? But if that is the case, we won't have compound symmetry. As I said, we can, on occasion, fall back on correction factors such as those of Greenhouse and Geisser or Huynh and Feldt, but those are not always satisfactory either. But with our new analysis of choice, maximum likelihood, we can find other ways around this problem that give a better solution.

For at least the first part of this page I am going to base my analyses on the SPSS package and on *R*. SPSS has the advantage (???) of offering a graphical user interface to help you set up your analysis. It is also widely available, especially on university campuses. At the moment, the lme4 package in *R* does not allow you to specify alternative correlational structures, although you can specify that the slopes for each individual over time are assessed separately.

But, while a graphical user interface is generally easier to use, I don't find it all that easy with the mixed models analysis. But I found a very good discussion of using that interface in a chapter by Howard Seltman at CMU, which is available on line at http://www.stat.cmu.edu/~hseltman/309/Book/chapter15.pdf. I think that he has done a very good job of explaining what you need to do. However in the SPSS code you see below, I have given you the syntax. You can simply paste that in to SPSS and run it from there. (Of course you need to do something about identifying the data set, but you can use the GUI to do that.) By the way, I strongly recommend that you go to Howard Seltman's home page at http://www.stat.cmu.edu/~hseltman/ and click on the link to "Statistics/Math." He has a wealth of amazing links to important articles and material on the web. In addition, he has a very complete FREE statistics text on the web at http://www.stat.cmu.edu/~hseltman/309/Book/Book.pdf. Check it out, although I hope that you think my book is better.

My motivation for this document came from a question asked by Rikard Wicksell at Karolinska University in Sweden. He had a randomized clinical trial with two treatment groups and measurements at pre, post, 3 months, and 6 months. His problem is that some of his data were missing. He considered a wide range of possible solutions, including "last trial carried forward," mean substitution, and listwise deletion. In some ways listwise deletion appealed most, but it would mean the loss of too much data. One of the nice things about mixed models is that we can use all of the data we have. If a score is missing, it is just missing. It has no effect on other scores from that same patient.

Another advantage of mixed models is that we don't have to be consistent about time. For example, and it does not apply in this particular example, if one subject had a follow-up test at 4 months while another had their follow-up test at 5 months, we simply enter 4 (or 5) as the time of follow-up. We don't have to worry that they couldn't be tested at the same intervals.

I have created data to have a number of important characteristics. (These are my own fabricated data, and should not be taken as the data that Wicksell found.) There are two groups - a Control group and a Treatment group, measured at 4 times. These times are labeled as 0 (pretest), 1 (one month posttest), 3 (three months follow-up), and 6 (six months follow-up). Both Group and Time are fixed variables. I created the treatment group to show a sharp drop at post-test and then sustain that drop (with slight regression) at 3 and 6 months. The Control group declines slowly over the 4 intervals but does not reach the low level of the Treatment group. There are noticeable individual differences in the Control group, and some subjects show a steeper slope than others. In the Treatment group there are individual differences in level but the slopes are not all that much different from one another. You might think of this as a study of depression, where the dependent variable is a depression score (e.g. Beck Depression Inventory) and the treatment is drug versus no drug. If the drug worked about as well for all subjects the slopes would be comparable and negative across time. For the control group we would expect some subjects to get better on their own and some to stay depressed, which would lead to differences in slope for that group. These facts are important because with random variables the individual differences will show up as variances in subjects' intercepts, and any slope differences will show up as a significant variance in the slopes. The only random variable in this particular example is "Subjects."

The data used below are available (with no missing values) at WicksellLongComplete.dat. (Until I get things cleaned up, there are similarly titled files there that you want to ignore.) I need to say something important about the structure of the data. If you were planning on running a standard analysis of variance, you would most likely put each subject's data on a single line. You would probably have Subject Number, Group, Time1, Time2, Time3, and Time4 all sitting side by side. That is generally referred to as the "wide format." But for what we will be doing, we will use what is called the "long format." There will be a separate line for each observation of the dependent variable. In other words, we will have a column for Subject Number, one for Time (represented as 0, 1, 3, or 6), one for Group, and one for the dependent variable. Then we will go to the next line, enter the same Subject Number, the same Group, a "1" for Time2, and then the Time2 measure. This data file is shown below, although to save space here I have typed it as three separate columns rather than display it all in one very long table with one line of variable names and 96 lines of data.

Interaction Plot, First by Groups, and then by Subjects within Groups

    Subj Time Group     dv     Subj Time Group     dv     Subj Time Group     dv
       1    0  1.00 296.00        9    0  1.00 220.00       17    0  2.00 263.00
       1    1  1.00 175.00        9    1  1.00  70.00       17    1  2.00  94.00
       1    3  1.00 187.00        9    3  1.00  95.00       17    3  2.00 141.00
       1    6  1.00 192.00        9    6  1.00  87.00       17    6  2.00 142.00
       2    0  1.00 376.00       10    0  1.00 375.00       18    0  2.00 138.00
       2    1  1.00 329.00       10    1  1.00 335.00       18    1  2.00  38.00
       2    3  1.00 236.00       10    3  1.00 334.00       18    3  2.00  16.00
       2    6  1.00  76.00       10    6  1.00  79.00       18    6  2.00  95.00
       3    0  1.00 309.00       11    0  1.00 310.00       19    0  2.00 329.00
       3    1  1.00 238.00       11    1  1.00 300.00       19    1  2.00  62.00
       3    3  1.00 150.00       11    3  1.00 253.00       19    3  2.00  62.00
       3    6  1.00 123.00       11    6  1.00 140.00       19    6  2.00   6.00
       4    0  1.00 222.00       12    0  1.00 310.00       20    0  2.00 292.00
       4    1  1.00  60.00       12    1  1.00 245.00       20    1  2.00 139.00
       4    3  1.00  82.00       12    3  1.00 200.00       20    3  2.00 104.00
       4    6  1.00  85.00       12    6  1.00 120.00       20    6  2.00 184.00
       5    0  1.00 150.00       13    0  2.00 282.00       21    0  2.00 275.00
       5    1  1.00 271.00       13    1  2.00 186.00       21    1  2.00  94.00
       5    3  1.00 250.00       13    3  2.00 225.00       21    3  2.00 135.00
       5    6  1.00 216.00       13    6  2.00 134.00       21    6  2.00 137.00
       6    0  1.00 316.00       14    0  2.00 317.00       22    0  2.00 150.00
       6    1  1.00 291.00       14    1  2.00  31.00       22    1  2.00  48.00
       6    3  1.00 238.00       14    3  2.00  85.00       22    3  2.00  20.00
       6    6  1.00 144.00       14    6  2.00 120.00       22    6  2.00  85.00
       7    0  1.00 321.00       15    0  2.00 362.00       23    0  2.00 319.00
       7    1  1.00 364.00       15    1  2.00 104.00       23    1  2.00  68.00
       7    3  1.00 270.00       15    3  2.00 144.00       23    3  2.00  67.00
       7    6  1.00 308.00       15    6  2.00 114.00       23    6  2.00  12.00
       8    0  1.00 447.00       16    0  2.00 338.00       24    0  2.00 300.00
       8    1  1.00 402.00       16    1  2.00 132.00       24    1  2.00 138.00
       8    3  1.00 294.00       16    3  2.00  91.00       24    3  2.00 114.00
       8    6  1.00 216.00       16    6  2.00  77.00       24    6  2.00 174.00
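The wide-to-long conversion described above can be sketched with the base *R* function reshape(). This is only an illustration under stated assumptions: the small wide data frame is typed in by hand from the scores of subjects 1, 2, and 13 in the table, and the column names Time1 through Time4 are taken from my description of the wide format, not from an actual file.

```r
# Wide format: one row per subject (scores for subjects 1, 2, and 13 from the table)
# The names Time1..Time4 are assumed for illustration only.
wide <- data.frame(
  Subj  = c(1, 2, 13),
  Group = c(1, 1, 2),
  Time1 = c(296, 376, 282),
  Time2 = c(175, 329, 186),
  Time3 = c(187, 236, 225),
  Time4 = c(192,  76, 134)
)

# Long format: one row per subject per measurement occasion
long <- reshape(wide,
                direction = "long",
                varying   = c("Time1", "Time2", "Time3", "Time4"),
                v.names   = "dv",
                timevar   = "Time",
                times     = c(0, 1, 3, 6),   # months, as coded in the data file
                idvar     = "Subj")

# Sort by subject and time, and put the columns in the order used on this page
long <- long[order(long$Subj, long$Time), c("Subj", "Time", "Group", "dv")]
rownames(long) <- NULL
print(long)
```

Each subject now occupies four rows, one per measurement occasion, which is the layout that the mixed model procedures expect.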

To give a starting point for the analyses that follow, I have included the SPSS and R code that would run the appropriate analysis. With balanced data this analysis is just fine, which is why I use it as a starting point, but you are going to see that we will vary the code as we go along. Notice several things in the code that follows. In SPSS I have to specify that the independent variables of Time and Group are nominal variables. In R they need to be converted to factors. Otherwise the analysis would treat Time as a continuous measure with 1 *df* instead of a factor with 4 levels. But I am also going to need time as a continuous measure in later analyses, so I also create the continuous Time.cont variable before modifying Time to a factor. In SPSS be sure that you specify that this is a scaled variable.

    DATASET ACTIVATE DataSet4.
    COMPUTE Time.cont=Time.
    EXECUTE.
    MIXED dv BY Group Time
      /CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001)
        HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
      /FIXED=Group Time Group*Time | SSTYPE(3)
      /METHOD=REML
      /REPEATED=Time | SUBJECT(Subject) COVTYPE(CS).

If I ran the same analysis using *R*, I would use the following code and obtain the results shown below. You will see that the results look exactly like those of SPSS *except* for the -2 Restricted Log Likelihood. I do not know why they differ. They both call for REML, so they should be the same. Go Figure!

If you look at the output for fixef(model2), you will first obtain the intercept, which is the grand mean of dv, and then the group and cell effects, which are their deviations from the grand mean or, for example, mean(dv) - mean(Group1) - mean(Time1) + mean(Cell11). If you look at the random effects with ranef(model2), you will see the individual intercepts for each subject.

    # Analysis of Wicksell Data with complete data
    rm(list = ls())
    # The data are read directly from the URL, so no setwd() call is needed.
    dataComplete <- read.table("http://www.uvm.edu/~dhowell/StatPages/DataFiles/WicksellLongComplete.dat", header = TRUE)
    data <- within(dataComplete, {
      Time.cont <- Time        # A continuous (or at least ordered) variable
      Time = factor(Time)
      Group = factor(Group)
      Subject = factor(Subject)
    })
    options(contrasts = c("contr.sum","contr.poly"))   # sum-to-zero effects
    library(lme4)
    library(car)
    model2 <- lmer(dv~Group + Time + Time:Group + (1|Subject), REML = TRUE, data = data)
    summary(model2)
    anova(model2)
    fixef(model2)
    ranef(model2)

    ### Interaction plot
    with(dataComplete, {   ## Need this because interaction.plot will not take
                           #  "data = dataComplete"
      par(mfrow = c(2,1))
      interaction.plot(Time, factor(Group), dv, type="b", pch = c(4,6),
                       legend = FALSE, col = c(4,6))
      legend(2, 300, c("Control", "Treatment"), col = c(4,6), text.col = "green4",
             lty = c(2, 1), pch = c(4, 6), merge = TRUE, bty = "n")
    })

    #### Output
    Linear mixed model fit by REML ['lmerMod']
    Formula: dv ~ Group + Time + Time:Group + (1 | Subject)
       Data: data

    REML criterion at convergence: 1011.9

    Scaled residuals:
         Min       1Q   Median       3Q      Max
    -2.79300 -0.38860  0.00641  0.55955  1.76696

    Random effects:
     Groups   Name        Variance Std.Dev.
     Subject  (Intercept) 2539     50.39
     Residual             2761     52.54
    Number of obs: 96, groups:  Subject, 24

    Fixed effects:
                 Estimate Std. Error t value
    (Intercept)   188.437     11.600  16.244
    Group1         42.958     11.600   3.703
    Time1         103.938      9.288  11.190
    Time2         -12.854      9.288  -1.384
    Time3         -30.396      9.288  -3.273
    Group1:Time1  -31.000      9.288  -3.338
    Group1:Time2   38.125      9.288   4.105
    Group1:Time3   14.750      9.288   1.588

    Correlation of Fixed Effects:
                (Intr) Group1 Time1  Time2  Time3  Gr1:T1 Gr1:T2
    Group1       0.000
    Time1        0.000  0.000
    Time2        0.000  0.000 -0.333
    Time3        0.000  0.000 -0.333 -0.333
    Group1:Tim1  0.000  0.000  0.000  0.000  0.000
    Group1:Tim2  0.000  0.000  0.000  0.000  0.000 -0.333
    Group1:Tim3  0.000  0.000  0.000  0.000  0.000 -0.333 -0.333

    anova(model2)
    Analysis of Variance Table
               Df Sum Sq Mean Sq F value
    Group       1  37860   37860 13.7141
    Time        3 373803  124601 45.1351
    Group:Time  3  74654   24885  9.0142

    fixef(model2)
     (Intercept)       Group1        Time1        Time2        Time3
       188.43750     42.95833    103.93750    -12.85417    -30.39583
    Group1:Time1 Group1:Time2 Group1:Time3
       -31.00000     38.12500     14.75000

    ranef(model2)
    $Subject
       (Intercept)
    1   -14.857748
    2    17.970176
    3   -20.754980
    4   -93.684082
    5    -7.584496
    ...        ...
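To see how the sum-to-zero effects in fixef(model2) relate to the cell means, here is a small sketch that rebuilds the 2 x 4 table of means from the printed coefficients. The numbers are copied by hand from the output above. The variable names are mine, and the only assumption is the usual contr.sum rule that the effect for the last level of a factor is minus the sum of the others.

```r
# Fixed-effect estimates copied from the fixef(model2) output
intercept <- 188.43750
group     <- 42.95833                            # Group1
time      <- c(103.93750, -12.85417, -30.39583)  # Time1, Time2, Time3
inter     <- c(-31.00000, 38.12500, 14.75000)    # Group1:Time1 .. Group1:Time3

# With contr.sum the effect for the last level is minus the sum of the others
group_eff <- c(group, -sum(group))               # Group 1, Group 2
time_eff  <- c(time, -sum(time))                 # Times 0, 1, 3, 6
inter_g1  <- c(inter, -sum(inter))               # interaction effects, Group 1
inter_eff <- rbind(inter_g1, -inter_g1)          # Group 2 is the mirror image

# Cell mean = grand mean + group effect + time effect + interaction effect
cell_means <- intercept + outer(group_eff, time_eff, "+") + inter_eff
dimnames(cell_means) <- list(Group = c("1", "2"), Time = c("0", "1", "3", "6"))
print(round(cell_means, 2))
```

The entry for Group 1 at Time 0 agrees with the mean of the twelve pretest scores for that group in the data table (3652/12), which is a quick check that the effects are being read correctly.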

The information criteria given in the output of SPSS are measures of goodness of fit. We will return to these shortly. Notice that the summary table for the fixed effects is a standard analysis of variance table, complete with *p* values. I should also point out that SPSS allowed me to specify compound symmetry of the covariances, so the table of covariance parameters gives only one entry for the CS covariance element, because all of these covariances are assumed to be equal in the population.
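A random-intercept model of the kind fit by lmer above implies a compound symmetric pattern: every pair of times shares the same covariance (the subject variance), so every pair has the same correlation. As a rough sketch, using the two variance components printed in the *R* output (2539 for Subject and 2761 for Residual), the implied common correlation can be worked out as follows. The variable names are mine.

```r
# Variance components copied from the summary(model2) output
var_subject  <- 2539   # Subject (Intercept) variance
var_residual <- 2761   # Residual variance

# Under compound symmetry every pair of times has the same correlation
rho <- var_subject / (var_subject + var_residual)

times   <- as.character(c(0, 1, 3, 6))
cs_corr <- matrix(rho, nrow = 4, ncol = 4, dimnames = list(times, times))
diag(cs_corr) <- 1
print(round(cs_corr, 3))
```

Every off-diagonal entry is the same value, a little under .48, no matter how far apart the two measurements are in time. That equality is exactly the assumption that repeated measures data often violate.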

Here we see that each of the effects in the overall analysis is significant. We don't care very much about the group effect because we expected both groups to start off equal at pre-test. What is important is the interaction, and it is significant at *p* = .0001. Clearly the drug treatment is having a differential effect on the two groups, which is what we wanted to see. The fact that the Control group seems to be dropping in the number of symptoms over time is to be expected and not exciting, although we could look at these simple effects if we wanted to. We would just run two analyses, one on each group. I would not suggest pooling the variances to calculate *F*, though that would be possible.

To go one small step further, you can ask for different slopes by running the model

    # Now use the following model, which allows subjects to have different slopes over time.
    model3 <- lmer(dv ~ Group + Time + Time:Group + (1 + Time.cont|Subject), data = data)
    summary(model3)
    anova(model3)
    fixef(model3)
    ranef(model3)

The printout for the random effects will show both different intercepts and different slopes for each subject. That is probably a more realistic analysis, because we would expect that different people would change in somewhat different ways over time. Notice that I had to use Time.cont to obtain the differences in slopes. That makes sense because if time is just a categorical variable, how would you expect to calculate a slope with an arbitrary ordering of a variable?

There is at least one more final step. We have two models, one of which allows varying slopes across subjects while the other does not. We can compare these two models with another "anova" statement. This will produce a chi-square test on the difference between the two models. The result is given below, where we see that the model with varying slopes is a better fit to the data. (Of course, I created the data to have that effect.)

    anova(model2, model3)
    refitting model(s) with ML (instead of REML)
    Data: data
    Models:
    model2: dv ~ Group + Time + Time:Group + (1 | Subject)
    model3: dv ~ Group + Time + Time:Group + (1 + Time.cont | Subject)
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    model2 10 1081.8 1107.4 -530.87   1061.8
    model3 12 1076.4 1107.2 -526.21   1052.4 9.3349      2   0.009396 **
    ---
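The chi-square in that table is just twice the difference in log likelihoods, tested on the difference in the number of parameters. A quick sketch using the values printed above shows the arithmetic. The variable names are mine, and the log likelihoods are the rounded ones from the table, so the statistic reproduces the printed 9.3349 only approximately.

```r
# Values copied from the anova(model2, model3) table
logLik_model2 <- -530.87   # random intercepts only, 10 parameters
logLik_model3 <- -526.21   # random intercepts and slopes, 12 parameters

# Likelihood ratio statistic from the rounded log likelihoods
chisq_rounded <- 2 * (logLik_model3 - logLik_model2)

# model3 adds a slope variance and an intercept-slope covariance
df_diff <- 12 - 10

# p value for the chi-square as printed by R
p_value <- pchisq(9.3349, df = df_diff, lower.tail = FALSE)

print(c(chisq = chisq_rounded, df = df_diff, p = p_value))
```

The two extra parameters in model3 are the variance of the subjects' slopes and the covariance between slopes and intercepts, which is why the test has 2 degrees of freedom.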

The final general point I will make about mixed models, actually about maximum likelihood, is that missing data present far fewer problems than they do with the analysis of variance. If a piece of data is missing, it is just missing and you go on. But in the analysis of variance with repeated measures, if a subject is missing a score on Trial 2, all of that person's data are left out of the analysis. Our analyses include only subjects with complete data. And that can do a serious job of distorting an analysis and it certainly eats into the degrees of freedom. So here is one very big plus for maximum likelihood.

In the section above I have computed a mixed model analysis. With balanced data the result does, and should, come out just about the same as a standard analysis of variance. But now I want to go to a model with missing, and therefore unbalanced, data. I use the same data set but deleted data pretty much at random. I took out nine data points from 7 cases. For a standard analysis of variance this would mean completely eliminating every case that has a missing score, and with it all four of that case's observations. But with maximum likelihood I lose only the 9 missing data points. That's a big difference in itself. I have not reproduced the data, but they are available at http://www.uvm.edu/~dhowell/StatPages/Mixed-Models-Repeated/WicksellLongMiss.dat. Before going on, I show the same analyses that we just ran, but this time on the missing data. I will begin with SPSS printout calling for a Type III analysis assuming compound symmetry. This is shown below.

    Type III Tests of Fixed Effects
    Source        Numerator df  Denominator df        F   Sig.
    Intercept                1          22.327  269.632   .000
    Time                     3          58.646   32.453   .000
    Group                    1          22.327   16.524   .001
    Time * Group             3          58.646    6.089   .001
    Dependent Variable: dv.

And now using the lmer function from R.

    Wickselllm <- lmer(dv ~ Group*Time + (1|Subj), data = data.miss )
    anova(Wickselllm)
    Analysis of Variance Table
               Df Sum Sq Mean Sq F value
    Time        3 282491   94164 31.8708
    Group       1  46529   46529 15.7483
    Group:Time  3  53973   17991  6.0892

You may note that the *F* values for the interactions are the same in the two analyses, but the main effects are slightly different. This is because *R*, by default, performs a Type I analysis of variance, whereas SPSS uses Type III by default. If you look at the code I gave for R, I asked R to allow each subject to have a different intercept. The way I have set this up, that means that they will start off at different levels. But I have not asked it to allow them to have different slopes, i.e. to change differentially over trials. So that is where I will go now.

If I want *R* to allow different slopes for each subject, I have to code the command differently. Writing

WicksellMiss <- lmer(dv ~ Group*Time + (1|Subj), data = data.miss )

says that I want a different intercept for each subject. That is what the "1" stands for. If I want a different slope, I need to write something like

lmer(dv ~ Group*Time + (1 + Time|Subj), data = data.miss )

but that won't work because in my code Time is a factor. Here is where that Time.cont comes in. It is a numeric variable, so you can have a slope for Time.cont, and the random part of the model becomes (1 + Time.cont|Subj). By default, you will also get different intercepts, whether or not you keep the "1." The results follow. Notice that to get this output I needed to ask for a summary of WicksellMiss, not just the anova.

    summary(WicksellMiss)
    Linear mixed model fit by REML ['lmerMod']
    Formula: dv ~ Group + Time + Time:Group + (1 + Time.cont | Subject)
       Data: data

    REML criterion at convergence: 907.1

    Scaled residuals:
         Min       1Q   Median       3Q      Max
    -1.71598 -0.44530  0.03835  0.50584  1.83195

    Random effects:
     Groups   Name        Variance Std.Dev. Corr
     Subject  (Intercept) 5315.9   72.91
              Time.cont    160.8   12.68    -0.76
     Residual             1828.4   42.76
    Number of obs: 87, groups:  Subject, 24

    Fixed effects:
                 Estimate Std. Error t value
    (Intercept)   194.761     11.860  16.421
    Group1         48.280     11.860   4.071
    Time1          97.614     10.147   9.620
    Time2         -20.634      8.996  -2.294
    Time3         -35.945      8.303  -4.329
    Group1:Time1  -36.321     10.147  -3.579
    Group1:Time2   27.022      8.996   3.004
    Group1:Time3    9.852      8.303   1.187

    Correlation of Fixed Effects:
                (Intr) Group1 Time1  Time2  Time3  Gr1:T1 Gr1:T2
    Group1      -0.014
    Time1        0.225  0.017
    Time2        0.163  0.015  0.079
    Time3       -0.058 -0.004 -0.331 -0.349
    Group1:Tim1  0.017  0.225 -0.019 -0.018  0.004
    Group1:Tim2  0.015  0.163 -0.018 -0.020 -0.009  0.079
    Group1:Tim3 -0.004 -0.058  0.004 -0.009 -0.051 -0.331 -0.349

    > anova(model3)
    Analysis of Variance Table
               Df Sum Sq Mean Sq F value
    Group       1  34725   34725 18.9925
    Time        3 195434   65145 35.6299
    Group:Time  3  45995   15332  8.3854

The following set of commands (from SAS) plots the results of each individual subject broken down by groups. Earlier we saw the group means over time. Now we can see how each of the subjects stands relative to the means of his or her group. In the ideal world the lines would start out at the same point on the Y axis (i.e. have a common intercept) and move in parallel (i.e. have a common slope). That isn't quite what happens here, but whether those are chance variations or systematic ones is something that we will look at later. We can see in the Control group that a few subjects decline linearly over time and a few other subjects, especially those with lower scores, decline at first and then increase during follow-up.

To this point all of our analyses have been based on an assumption of compound symmetry. (The assumption is really about sphericity, but the two are close and **Proc Mixed** refers to the solution as type = cs.) But if you look at the correlation matrix given earlier it is quite clear that correlations further apart in time are distinctly lower than correlations close in time, which sounds like a reasonable result. Also if you looked at Mauchly's test of sphericity (not shown) it is significant with *p* = .012. While this is not a great test, it should give us pause. We really ought to do something about sphericity.

In what follows, I am switching to SAS because it offers greater flexibility. (Immediately below this paragraph is the comparable syntax for SPSS using WicksellLongMiss.dat. You will need to change the covariance structure as SAS changes, but otherwise the outputs agree remarkably well.) The first thing that we could do about sphericity is to specify that the model will make no assumptions whatsoever about the form of the covariance matrix. To do this I will ask for an unstructured matrix. This is accomplished by including type = un in the repeated statement. This will force SAS to estimate all of the variances and covariances and use them in its solution. The problem with this is that there are 10 things to be estimated and therefore we will lose degrees of freedom for our tests. But I will go ahead anyway. For this analysis I will continue to use the data set with missing data, though I could have used the complete data had I wished. I will include a request that SAS use procedures due to Hotelling-Lawley-McKeon (hlm) and Hotelling-Lawley-Pillai-Samson (hlps) which do a better job of estimating the degrees of freedom for our denominators. This is recommended for an unstructured model. The results are shown below.

Comparable Syntax for SPSS

    MIXED dv BY Group Time
      /CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001)
        HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
      /FIXED=Group Time Group*Time | SSTYPE(3)
      /METHOD=REML
      /REPEATED=Time | SUBJECT(Subject) COVTYPE(UN).

    Proc Mixed data = lib.WicksellLongMiss;
       class group time subject;
       model dv = group time group*time ;
       repeated time /subject = subject type = un hlm hlps rcorr;
    run;

    Estimated R Correlation Matrix for subject 1
    Row        Col1       Col2       Col3       Col4
      1      1.0000     0.5858     0.5424   -0.02740
      2      0.5858     1.0000     0.8581     0.3896
      3      0.5424     0.8581     1.0000     0.3971
      4    -0.02740     0.3896     0.3971     1.0000

    Covariance Parameter Estimates
    Cov Parm    Subject    Estimate
    UN(1,1)     subject     5548.42
    UN(2,1)     subject     3686.76
    UN(2,2)     subject     7139.94
    UN(3,1)     subject     2877.46
    UN(3,2)     subject     5163.81
    UN(3,3)     subject     5072.14
    UN(4,1)     subject     -129.84
    UN(4,2)     subject     2094.43
    UN(4,3)     subject     1799.21
    UN(4,4)     subject     4048.07

    Fit Statistics
    -2 Res Log Likelihood         883.7
    AIC (smaller is better)       903.7
    AICC (smaller is better)      906.9
    BIC (smaller is better)       915.5
    ----------------------------------------------------------------
    Same analysis but specifying an unstructured covariance matrix.
    The Mixed Procedure

    Null Model Likelihood Ratio Test
    DF    Chi-Square    Pr > ChiSq
     9         40.92        <.0001

    Type 3 Tests of Fixed Effects
    Effect        Num DF   Den DF   F Value   Pr > F
    group              1       22     17.95   0.0003
    time               3       22     28.44   <.0001
    group*time         3       22      6.80   0.0021

    Type 3 Hotelling-Lawley-McKeon Statistics
    Effect        Num DF   Den DF   F Value   Pr > F
    time               3       20     25.85   <.0001
    group*time         3       20      6.18   0.0038

    Type 3 Hotelling-Lawley-Pillai-Samson Statistics
    Effect        Num DF   Den DF   F Value   Pr > F
    time               3       20     25.85   <.0001
    group*time         3       20      6.18   0.0038

Notice the matrix of correlations. From the first post-treatment measurement to the 6 month follow-up, the correlation with pretest scores has dropped from .59 to -.03, and the other rows show the same pattern of correlations shrinking as the measurement occasions move further apart. That certainly doesn't inspire confidence in compound symmetry.

The *F*s have not changed very much from the previous model, but the degrees of freedom for within-subject terms have dropped from 57 to 22, which is a huge drop. That results from the fact that the model had to make additional estimates of covariances. Finally, the hlm and hlps statistics further reduce the degrees of freedom to 20, but the effects are still significant. This would make me feel pretty good about the study if the data had been real data.
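To see where the 10 estimates come from, and how quickly that number would grow with more measurement occasions, here is a minimal sketch in Python. Python is used only to illustrate the arithmetic and is not part of the SAS or SPSS analyses; the function name is hypothetical. An unstructured matrix for t time points has t variances plus t(t-1)/2 covariances, whereas compound symmetry needs only two parameters no matter how many time points there are.

```python
def unstructured_param_count(n_times: int) -> int:
    """Free parameters in an unstructured covariance matrix:
    n_times variances plus n_times * (n_times - 1) / 2 covariances."""
    n_variances = n_times
    n_covariances = n_times * (n_times - 1) // 2
    return n_variances + n_covariances


# Compound symmetry always estimates one variance and one common covariance.
COMPOUND_SYMMETRY_PARAMS = 2

for n_times in (4, 6, 10):
    print(
        f"{n_times} time points: "
        f"un = {unstructured_param_count(n_times)}, "
        f"cs = {COMPOUND_SYMMETRY_PARAMS}"
    )
```

With the four time points used here the unstructured count is 10, which matches the ten UN(i,j) rows in the Covariance Parameter Estimates table.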

But we have gone from one extreme to another. We estimated two covariance parameters when we used type = cs and 10 covariance parameters when we used type = un. (Put another way, with the unstructured solution we threw up our hands and said to the program "You figure it out! We don't know what's going on.") There is a middle ground (in fact there are many). We probably do know at least something about what those correlations should look like. Often we would expect correlations to decrease as the trials in question are further removed from each other. They might not decrease as fast as our data suggest, but they should probably decrease. An autoregressive model, which we will see next, assumes that correlations between any two times depend on both the correlation at the previous time and an error component. To put that differently, your score at time 3 depends on your score at time 2 and error. (This is a first order autoregression model. A second order model would have a score depend on the two previous times plus error.) In effect an AR(1) model assumes that if the correlation between Time 1 and Time 2 is .51, then the correlation between Time 1 and Time 3 has an expected value of .51^{2} = .26 and between Time 1 and Time 4 has an expected value of .51^{3} = .13. (For SPSS the comparable covariance structure is named "AR1".) Our data look reasonably close to that. (Remember that these are expected values of *r*, not the actual obtained correlations.) The solution using a first order autoregressive model follows.

    Title 'Same analysis but specifying an autoregressive covariance matrix.';
    Proc Mixed data = lib.WicksellLongMiss;
       class group subject time;
       model dv = group time group*time;
       repeated time / subject = subject type = AR(1) rcorr;
    run;

    Same analysis but specifying an autoregressive covariance matrix.

    Estimated R Correlation Matrix for subject 1

     Row        Col1        Col2        Col3        Col4
       1      1.0000      0.6182      0.3822      0.2363
       2      0.6182      1.0000      0.6182      0.3822
       3      0.3822      0.6182      1.0000      0.6182
       4      0.2363      0.3822      0.6182      1.0000

    Covariance Parameter Estimates

    Cov Parm    Subject    Estimate
    AR(1)       subject      0.6182
    Residual                5350.25

    Fit Statistics

    -2 Res Log Likelihood         895.1
    AIC (smaller is better)       899.1
    AICC (smaller is better)      899.2
    BIC (smaller is better)       901.4

    Null Model Likelihood Ratio Test

      DF    Chi-Square    Pr > ChiSq
       1         29.55        <.0001

    Type 3 Tests of Fixed Effects

                   Num    Den
    Effect          DF     DF    F Value    Pr > F
    group            1     22      17.32    0.0004
    time             3     57      30.82    <.0001
    group*time       3     57       7.72    <.0002

Notice the pattern of correlations. The .6182 as the correlation between adjacent trials is essentially an average of the correlations between adjacent trials in the unstructured case. The .3822 is just .6182^{2} and .2363 = .6182^{3}. Notice that tests on within-subject effects are back up to 57 df, which is certainly nice, and our results are still significant. This is a far nicer solution than we had using **Proc GLM**.
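The AR(1) pattern is easy to reproduce by hand. The following is a minimal Python sketch, offered only as an illustration of the arithmetic and not as part of the SAS analysis. It takes the AR(1) estimate of .6182 from the output above, raises it to the power of the lag between two time points, and also averages the three adjacent-trial correlations copied from the unstructured solution. The variable names are my own.

```python
# AR(1) estimate copied from the Covariance Parameter Estimates table.
rho = 0.6182
n_times = 4

# Under AR(1), the correlation between times i and j is rho ** |i - j|.
ar1_corr = [
    [round(rho ** abs(i - j), 4) for j in range(n_times)]
    for i in range(n_times)
]
for row in ar1_corr:
    print(row)

# Adjacent-trial correlations copied from the unstructured (type = un) matrix:
# Time 1 with 2, Time 2 with 3, Time 3 with 4.
adjacent_un = [0.5858, 0.8581, 0.3971]
mean_adjacent = sum(adjacent_un) / len(adjacent_un)
print(f"Mean adjacent correlation (unstructured): {mean_adjacent:.4f}")
```

The first row of the matrix reproduces the 1.0000, 0.6182, 0.3822, 0.2363 that SAS printed, and the simple average of the adjacent unstructured correlations comes out at about .61, close to the AR(1) estimate. The estimate is not literally computed as that average; the comparison is only a rough check.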

Now we have three solutions, but which should we choose? One aid in choosing is to look at the "Fit Statistics" that are printed out with each solution. These statistics take into account both how well the model fits the data and how many estimates it took to get there. Put loosely, we would probably be happier with a pretty good fit based on few parameter estimates than with a slightly better fit based on many parameter estimates. If you look at the three models we have fit for the unbalanced design you will see that the AIC criterion for the type = cs model was 909.4, which dropped to 903.7 when we relaxed the assumption of compound symmetry. A smaller AIC value is better, so we should prefer the second model. Then when we aimed for a middle ground, by specifying the pattern of correlations but not making SAS estimate 10 separate variances and covariances, AIC dropped again to 899.1. That model fit better, and the fact that it did so by only estimating a variance and one correlation leads us to prefer that model.
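The trade-off between fit and number of estimates can be checked directly from the printed output. The sketch below is in Python, purely as an illustration, and assumes the usual REML form of the criterion, AIC = (-2 Res Log Likelihood) + 2q, where q is the number of covariance parameters. The -2 Res Log Likelihood values and parameter counts are copied from the unstructured and AR(1) outputs above; for the type = cs model only the reported AIC is used, because its log likelihood is not shown in this section.

```python
def aic(neg2_res_log_lik: float, n_cov_params: int) -> float:
    """AIC under REML, assuming the penalty is 2 per covariance parameter."""
    return neg2_res_log_lik + 2 * n_cov_params


# Values copied from the Fit Statistics and Covariance Parameter tables.
fitted = {
    "un": {"neg2_res_log_lik": 883.7, "n_cov_params": 10},
    "ar1": {"neg2_res_log_lik": 895.1, "n_cov_params": 2},
}
computed_aic = {
    name: round(aic(m["neg2_res_log_lik"], m["n_cov_params"]), 1)
    for name, m in fitted.items()
}
print(computed_aic)

# AIC values as reported in the text for all three covariance structures.
reported_aic = {"cs": 909.4, "un": 903.7, "ar1": 899.1}
preferred = min(reported_aic, key=reported_aic.get)
print(f"Smallest AIC: {preferred}")
```

The unstructured model has the better raw fit (883.7 versus 895.1), but its penalty of 20 for ten parameters outweighs the AR(1) penalty of 4 for two, which is why AR(1) ends up with the smallest AIC.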

Guerin, L., and W.W. Stroup. 2000. A simulation study to evaluate PROC MIXED analysis of repeated measures data. p. 170-203. In Proc. 12th Kansas State Univ. Conf. on Applied Statistics in Agriculture. Kansas State Univ., Manhattan.

Howell, D. C. (2008). The analysis of variance. In J. W. Osborne (Ed.), Best Practices in Quantitative Methods. Sage.

Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D., & Schabenberger, O. (2006). SAS for Mixed Models. Cary, NC: SAS Institute Inc.

Maxwell, S. E. & Delaney, H. D. (2004) Designing Experiments and Analyzing Data: A Model Comparison Approach, 2nd edition. Belmont, CA. Wadsworth.

Overall, J. E., Ahn, C., Shivakumar, C., & Kalburgi, Y. (1999). Problematic formulations of SAS Proc.Mixed models for repeated measurements. Journal of Biopharmaceutical Statistics, 9, 189-216.

Overall, J. E. & Tonidandel, S. (2002). Measuring change in controlled longitudinal studies. British Journal of Mathematical and Statistical Psychology, 55, 109-124.

Overall, J. E. & Tonidandel, S. (2007). Analysis of data from a controlled repeated measurements design with baseline-dependent dropouts. Methodology, 3, 58-66.

Pinheiro, J. C. & Bates, D. M. (2000). Mixed-effects Models in S and S-Plus. Springer.

Winter, B. (2013). Linear models and linear mixed effects models in R in linguistic applications. arXiv:1308.5499. [http://arxiv.org/pdf/1308.5499.pdf]

Some good references on the web are:

http://www.ats.ucla.edu/stat/sas/faq/anovmix1.htm

http://www.ats.ucla.edu/stat/sas/library/mixedglm.pdf

The following is a good reference for people with questions about using SAS in general.

http://ssc.utexas.edu/consulting/answers/sas/sas94.html

Downloadable Papers on Multilevel Models

Good coverage of alternative covariance structures

http://cda.morris.umn.edu/~anderson/math4601/gopher/SAS/longdata/structures.pdf

The main reference for SAS Proc Mixed is

Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D., & Schabenberger, O. (2006). SAS for Mixed Models. Cary, NC: SAS Institute Inc.

See also

Maxwell, S. E. & Delaney, H. D. (2004). Designing Experiments and Analyzing Data (2nd edition). Lawrence Erlbaum Associates.

The classic reference for R is Pinheiro, J. C. & Bates, D. M. (2000). Mixed-effects Models in S and S-Plus. New York: Springer. (That book deals with the earlier nlme library and is quite technical.)

Last revised 3/5/2017