Using R

Running a repeated measures analysis of variance in *R* can be a bit more difficult than running a standard between-subjects anova. This page is intended to simply show a number of different programs, varying in the number and type of variables. In another section I have gone to extend this to randomization tests with repeated measures, and you can find that page at www.uvm.edu/~dhowell/StatPages/R/RandomRepeatedMeasuresAnovaR.html

I should point out that there are a number of different ways of performing this analysis within *R*, and setting them up is not always obvious. But I strongly recommend that you do a search under "Repeated measures analysis of variance using R." I think that you will be surprised at the quality of the discussions you will find. I particularly like this site by Rudolf Cardinal at Cambridge. But there are a whole bunch of really good sites out there. (I look up R stuff so often that Google defaults to that if I just type "R". Your might be wise to use "R-project" for a while until it gets the idea. Otherwise you might just get a bunch of stuff about the alphabet.)

The following example is loosely based on a study by Nolen-Hoeksema and Morrow (1991). The authors had the good fortune to have measured depression in college students two weeks before the Loma Prieta earthquake in California in 1987. After the earthquake they went back and tracked changes in depression in these same students over time. The following example is based on their work, and assumes that participants were assessed every three weeks for five measurement sessions.

I have changed the data from the ones that I have used elsewhere to build in a violation of our standard assumption of sphericity. I have made measurements close in time correlate highly, but measurements separated by 9 or 12 weeks correlate less well. This is probably a very reasonable thing to expect, but it does violate our assumption of sphericity.

If we ran a traditional repeated-measures analysis of variance on these data we would find

Notice that *F*_{obt} = 2.726, which is significant, based on
unadjusted *df*, with *p* = .034. Greenhouse and Geisser's correction
factor is 0.617, while Huynh and Feldt's is 0.693. Both of these lead to an
increase in the value of *p*, and neither is significant at α
= .05. (The *F* from a multivariate analysis of variance, which does not
require sphericity has *p* = .037.)

# R for repeated measures # One within subject measure and no between subject measures # Be sure to run this from the beginning because # otherwise vectors become longer and longer. library(car) rm(list = ls()) ## You want to clear out old variables --with "rm(list = ls())" -- ## before building new ones. data <- read.table("NolenHoeksema.dat", header = TRUE) datLong <- reshape(data = data, varying = 2:6, v.names = "outcome", timevar = "time", idvar = "Subject", ids = 1:9, direction = "long") datLong$time <- factor(datLong$time) datLong$Subject <- factor(datLong$Subject) orderedTime <- datLong[order(datLong$time),] options(contrasts=c("contr.sum","contr.poly")) # Using "data = dataLong" I can use the simple names for the variables modelAOV <- aov(outcome~factor(time)+Error(factor(Subject)), data = datLong) print(summary(modelAOV)) obtF <- summary(modelAOV)$"Error: Within"[[1]][[4]][1] par( mfrow = c(2,2)) plot(datLong$time, datLong$outcome, pch = c(2,4,6), col = c(3,4,6)) legend(1, 20, c("same", "different", "control"), col = c(4,6,3), text.col = "green4", pch = c(4, 6, 2), bg = 'gray90') The results follow. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Error: factor(Subject) Df Sum Sq Mean Sq F value Pr(>F) Residuals 24 953.7 39.74 Error: Within Df Sum Sq Mean Sq F value Pr(>F) factor(time) 4 94.3 23.572 2.726 0.0337 * Residuals 96 830.1 8.647 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Having two repeated measures is not really much different than having one. Instead of each subject's data across the one measure, we will randomize it across the second measure as well. For example, if we had a 2 X 5 design, both measures being repeated, each subject would have 10 scores and those 10 scores would be randomized.

The data are taken from a study by Bouton and Swartzentruber (1985) on conditioned suppression, but I have only used the data from Group 2. (This study is discussed in my book on Statistical Methods for Psychologists, 8th ed.) on page 484ff. In this study each of 8 subjects was measured across two cycles on each of 4 phases, so Cycle and Phase are repeated measures. The code is shown below.

### Test for Two Within-Subject Repeated Measures ### One tricky part of this code came from Ben Bauer at the Univ. of Trent, Canada. ### He writes code that is more R-like than I do. In fact, he knows more about R than ### I do. # Two within subject variables, 1:8 # Data from Bouton & Schwartzentruber (1985) -- Group 2 # Methods8, p. 486 data <- read.table("Tab14-11long.dat", header = T) n <- 8 #Subjects cells <- 8 #cells = 2*4 nobs <- length(data$dv) attach(data) cat("Cell Means \n") print(tapply(dv, list(Cycle,Phase), mean)) #cell means cat("\n") Subj <- factor(Subj) Phase <- factor(Phase) Cycle <- factor(Cycle) #Standard Anova options(contrasts = c("contr.sum","contr.poly")) model1 <- aov(dv ~ (Cycle*Phase) + Error(Subj/(Cycle*Phase)), contrasts = contr.sum) summary(model1) # Standard repeated measures anova *** I have used this example elsewhere except with a randomization approach. # The resuts follow_____________________________________________________ Error: Subj Df Sum Sq Mean Sq F value Pr(>F) Residuals 7 8411 1202 Error: Subj:Cycle Df Sum Sq Mean Sq F value Pr(>F) Cycle 1 110.2 110.25 2.877 0.134 Residuals 7 268.2 38.32 Error: Subj:Phase Df Sum Sq Mean Sq F value Pr(>F) Phase 3 283.4 94.46 1.029 0.4 Residuals 21 1928.1 91.82 Error: Subj:Cycle:Phase Df Sum Sq Mean Sq F value Pr(>F) Cycle:Phase 3 147.6 49.21 0.728 0.547 Residuals 21 1418.9 67.57

I am going to use an example of a study with one between subject measure and one within subject measure, The data are from a study by King (1986) on motor activity in rats following the administration of midazolam under three different conditions.

# This code was originally written by Joshua Wiley, in the Psychology Department at UCLA. # Modified for one between and one within for King.dat by dch ### Howell Table 14.4 ### ## Repeated Measures ANOVA with 2 variables ## Read in data, convert to 'long' format, and factor() dat <- read.table("Tab14-4.dat", header = TRUE) head(dat) dat$subject <- factor(1:24) datLong <- reshape(data = dat, varying = 2:7, v.names = "outcome", timevar = "time", idvar = "subject", ids = 1:24, direction = "long") datLong$Interval <- factor(rep(x = 1:6, each = 24), levels = 1:6, labels = 1:6) datLong$Group <- factor(datLong$Group, levels = 1:3, labels = c("Control", "Same", "Different")) cat("Group Means","\n") cat(tapply(datLong$outcome, datLong$Group, mean),"\n") cat("\nInterval Means","\n") cat(tapply(datLong$outcome, datLong$Interval, mean),"\n") # Actual formula and calculation King.aov <- aov(outcome ~ (Group*Interval) + Error(subject/(Interval)), data = datLong) # Present the summary table (ANOVA source table) print(summary(King.aov)) interaction.plot(datLong$Interval, factor(datLong$Group), datLong$outcome, fun = mean, type="b", pch = c(2,4,6), legend = "F", col = c(3,4,6), ylab = "Mean of Outcome", legend(4, 300, c("Same", "Different", "Control"), col = c(4,6,3), text.col = "green4", lty = c(2, 1, 3), pch = c(4, 6, 2), merge = TRUE, bg = 'gray90') #The Results follow Error: subject Df Sum Sq Mean Sq F value Pr(>F) Group 2 285815 142908 7.801 0.00293 ** Residuals 21 384722 18320 Error: subject:Interval Df Sum Sq Mean Sq F value Pr(>F) Interval 5 399737 79947 29.852 < 2e-16 *** Group:Interval 10 80820 8082 3.018 0.00216 ** Residuals 105 281199 2678 ---

# Two between subject variables, one within # Data from St. Lawrence et al.(1995) # Methods8, p. 479 # Note: In data file each subject is called a Person, and I convert that to a factor. # In reshape, it creates a variable called "subject," but that is not what I want to use. # In aov the model is based on Person, not subject. rm(list = ls()) data <- read.table("Tab14-7.dat", header = T) head(data) # Create factors data$Condition <- factor(data$Condition) data$Sex <- factor(data$Sex) data$Person <- factor(data$Person) #Reshape the data dataLong <- reshape(data = data, varying = 4:7, v.names = "outcome", timevar = "Time", idvar = "subject", ids = 1:40, direction = "long") dataLong$Time <- factor(dataLong$Time) tapply(dataLong$outcome, dataLong$Sex, mean) tapply(dataLong$outcome, dataLong$Condition, mean) tapply(dataLong$outcome, dataLong$Time, mean) options(contrasts = c("contr.sum","contr.poly")) model1 <- aov(outcome ~ (Condition*Sex*factor(Time)) + Error(Person/(Time))) summary(model1) # The results follow ________________________________ Error: Person Df Sum Sq Mean Sq F value Pr(>F) Condition 1 107 107 0.215 0.6457 Sex 1 3358 3358 6.731 0.0136 * Condition:Sex 1 64 64 0.128 0.7228 Residuals 36 17961 499 --- Error: Person:Time Df Sum Sq Mean Sq F value Pr(>F) factor(Time) 3 274 91.4 0.896 0.4456 Condition:factor(Time) 3 1378 459.3 4.507 0.0051 ** Sex:factor(Time) 3 780 260.0 2.551 0.0594 . Condition:Sex:factor(Time) 3 476 158.8 1.558 0.2037 Residuals 108 11006 101.9 ...

These data are from the same Bouten and Swartzentrub study used above. The difference is that I have used all three groups.

rm(list = ls()) data <- read.table("Tab14-11.dat", header = T) attach(data) Phase <- factor(rep(1:2, each = 24, times = 4)) Cycle <- factor(rep(1:4, each = 48)) Group = factor(rep(1:3, each = 8,times = 8)) dv <- c(C1P1, C1P2, C2P1, C2P2, C3P1, C3P2, C4P1, C4P2) Subj <- factor(rep(1:24, times = 8)) n <- 24 #Subjects withincells <- 8 #withincells = 2*4 nobs <- length(dv) cat("Cell Means Across Subjects \n") print(tapply(dv, list(Cycle,Phase), mean)) #cell means cat("\n") #Standard Anova options(contrasts = c("contr.sum","contr.poly")) model1 <- aov(dv ~ (Group*Cycle*Phase) + Error(Subj/(Cycle*Phase)), contrasts = contr.sum) print(summary(model1)) # Standard repeated measures anova # The results follow Error: Subj Df Sum Sq Mean Sq F value Pr(>F) Group 2 4617 2308.4 3.083 0.067 . Residuals 21 15723 748.7 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Error: Subj:Cycle Df Sum Sq Mean Sq F value Pr(>F) Cycle 3 2727 909.0 12.027 2.53e-06 *** Group:Cycle 6 1047 174.5 2.309 0.0445 * Residuals 63 4761 75.6 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Error: Subj:Phase Df Sum Sq Mean Sq F value Pr(>F) Phase 1 11703 11703 129.85 1.88e-10 *** Group:Phase 2 4054 2027 22.49 6.01e-06 *** Residuals 21 1893 90 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Error: Subj:Cycle:Phase Df Sum Sq Mean Sq F value Pr(>F) Cycle:Phase 3 742 247.17 4.035 0.01090 * Group:Cycle:Phase 6 1274 212.30 3.466 0.00505 ** Residuals 63 3859 61.26 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Last revised: 6/26/2015