
R Code for Chapter 16

One-way Analysis of Variance


When we come to the analysis of variance, the workhorse for those analyses will be the linear model function, lm(). This is the same function that we used with multiple regression, with one important difference. For the analysis of variance, the "predictor variables" (those on the right side of the equation) are factors. That means that they take on only a few possible values and are usually categorical.

We will start with the data on aggressive behavior by Giancola and Corman. These data can be found in Table 16.1. (The analysis of variance is printed out in Table 16.5.) I usually don't provide printout in these pages, but I did here to illustrate a point. The calculation of the model is straightforward. I then printed the model. You will see that the only thing that it printed was labeled "Coefficients." In this printout the intercept is the mean of group 1, and the remaining coefficients are the deviations of the individual group means from the mean of group 1. That's not really very helpful.

When I printed the summary of the model I obtained those same coefficients, but also their estimated standard errors, and the individual t tests and their probabilities. Again, it is not obvious what we would want these for at this time. We know that the mean of group 1 is significantly different from 0.00, that the means of groups 3 and 4 are significantly different from the mean of group 1, and that the others are not. Again, that's not very helpful.

When I ask for "anova(model1)" I obtain the printout that you would expect. The group effect is significant, with F = 6.90 and p = .0001. That is the result we usually look for. So the command that you want is of the form "anova(model1)." (You could try to put them all together with print(summary(anova(model1))), but you won't like what you get.)
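If the ANOVA table is all you want, an alternative worth knowing is R's aov() function, which is a wrapper around lm() designed for analysis of variance; calling summary() on an aov object prints the table directly. This is just a sketch, assuming the Giancola data frame has been read in and group converted to a factor as in the code below.

```r
# Sketch: aov() wraps lm(), and summary() on an aov object prints the
# ANOVA table directly -- the same table that anova(lm(dv ~ group)) gives.
# Assumes Giancola has been read in and group made a factor, as below.
model1a <- aov(dv ~ group, data = Giancola)
summary(model1a)
```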

### Chapter 16   The analysis of variance

### Start with Giancola and Corman

Giancola <- read.table("http://www.uvm.edu/~dhowell/fundamentals9/DataFiles/Tab16-1.dat", 
header = TRUE)
head(Giancola)      # Variables are group and dv
attach(Giancola)
group <- as.factor(group)       # Make group a factor
### Let's look at the group means
tapply(dv, group, mean)
###    group means =   1.8025000  0.6750000 -0.9125000  0.5441667  1.8891667 
model1 <- lm(dv ~ group)
print(model1)
print(summary(model1))
print(anova(model1))
#_______________
### Output
	> print(model1)
	
	Call:
	lm(formula = dv ~ group)
	
	Coefficients:
	(Intercept)       group2       group3       group4       group5  
		1.80250     -1.12750     -2.71500     -1.25833      0.08667  
#_________________________	
	> print(summary(model1))
	
	Call:
	lm(formula = dv ~ group)
	
	Residuals:
		Min      1Q  Median      3Q     Max 
	-4.4825 -0.5904  0.0500  0.7106  4.5808 
	
	Coefficients:
	             Estimate    Std. Error   t value    Pr(>|t|)    
	(Intercept)  1.80250       0.43425     4.151    0.000116 ***
	group2      -1.12750       0.61412    -1.836    0.071772 .  
	group3      -2.71500       0.61412    -4.421    4.67e-05 ***
	group4      -1.25833       0.61412    -2.049    0.045245 *  
	group5       0.08667       0.61412     0.141    0.888288    
	---
	Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
	
	Residual standard error: 1.504 on 55 degrees of freedom
	Multiple R-squared:  0.3342,	Adjusted R-squared:  0.2857 
	F-statistic: 6.901 on 4 and 55 DF,  p-value: 0.0001415
#______________________	
	> print(anova(model1))
	Analysis of Variance Table
	
	Response: dv
			  Df Sum Sq Mean Sq F value    Pr(>F)    
	group      4  62.46 15.6151  6.9005 0.0001415 ***
	Residuals 55 124.46  2.2629                      
	---
	Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
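Because the intercept is the mean of group 1 and the other coefficients are deviations from it, the five group means can be recovered by adding each coefficient to the intercept. A quick check, using the coefficients printed above (in a live session you would pull them out with coef(model1) instead of typing them):

```r
# The intercept is the mean of group 1; each remaining coefficient is a
# group's deviation from that mean. Adding them back recovers the means
# reported by tapply(dv, group, mean) above.
b <- c(1.80250, -1.12750, -2.71500, -1.25833, 0.08667)
group.means <- c(b[1], b[1] + b[-1])
round(group.means, 4)   # 1.8025  0.6750 -0.9125  0.5442  1.8892
```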


The next analysis is an analysis of data from groups having unequal sizes. The analysis is straightforward, because when you have a one-way analysis of variance it makes no difference how the sample sizes compare. I have included the result, and you can see that it agrees with the result in your text.

Immediately following that analysis I give Fisher's LSD test. This is a straightforward test, but you need to specify p.adj = "none" to obtain it. Remember, we do not apply any direct adjustment to the probabilities, at least when we have 4 or fewer groups. The only requirement is a significant F. When we have many groups the Type I error rate would get out of hand with Fisher's test, but we seldom have that many groups. I do not give a test for the Bonferroni correction. It is possible to ask for such a test, but that test would assume that you wanted to make all pairwise contrasts, and that rarely makes sense. And adjusting for all pairwise contrasts makes the test overly conservative. The printout for this analysis is shown in the text in Section 16.5.


### Unequal sample sizes from Table 16.6--Maternal Adaptation
matAdapt <- read.table("http://www.uvm.edu/~dhowell/fundamentals9/DataFiles/Tab16-6.dat",
header = TRUE)
names(matAdapt)
attach(matAdapt)
Group <- factor(Group)
model2 <- lm(Adapt ~ Group)
print(anova(model2))
#__________________________________________
	Analysis of Variance Table
	
	Response: Adapt
			  Df  Sum Sq Mean Sq F value   Pr(>F)   
	Group      2  226.93 113.466   5.532 0.005421 **
	Residuals 90 1845.99  20.511                    
	---
	Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
	
pairwise.t.test(Adapt, Group, p.adj = "none", pool.sd = TRUE) # "none" gives us the LSD test
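For comparison, the all-pairwise Bonferroni version discussed above is a one-word change to the same call. This is only a sketch (it assumes Adapt and Group from matAdapt are still attached), included to show why I prefer the LSD form: with the adjustment, every p value is inflated by the number of comparisons.

```r
# Same pairwise tests, but with each p value adjusted for the number of
# comparisons (Bonferroni). Assumes Adapt and Group from matAdapt above.
pairwise.t.test(Adapt, Group, p.adj = "bonferroni", pool.sd = TRUE)
```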



In the text I talk about the size of an effect. I suggested that when we have an experiment with more than two groups, it makes sense to choose our contrasts to answer very specific questions, either by comparing two groups, or by comparing two sets of groups. Here I will make two comparisons. I will look at the two low-birth-weight groups and compare the experimental with the control group. I will then look at the LBW-control group versus the full-term group. These are the contrasts that I used in the book.

I can make these contrasts using the MBESS package and its ci.sc() function, which gives us standardized differences. This has the advantage of also giving us confidence limits on the standardized difference. The code is shown below, where the means and sample sizes come from Table 16.6. Notice that the results that we obtain agree with those in the text.


library(MBESS)
### We don't need the data because we have the relevant statistics
ci.sc(means = c(18.333, 14.838), error.variance = 20.511, c.weights = c(1, -1),
    n = c(27, 37), N = 64, conf.level = .95)
ci.sc(means = c(14.97, 18.33), error.variance = 20.511, c.weights = c(1, -1),
    n = c(29, 27), N = 56, conf.level = .95)
#___________________________________________________________
[1] "The 0.95 confidence limits for the standardized contrast are given as:"
$Lower.Conf.Limit.Standardized.Contrast
[1] 0.25454

$Standardized.contrast
[1] 0.771709

$Upper.Conf.Limit.Standardized.Contrast
[1] 1.28308
#######

[1] "The 0.95 confidence limits for the standardized contrast are given as:"
$Lower.Conf.Limit.Standardized.Contrast
[1] -1.28112

$Standardized.contrast
[1] -0.741901

$Upper.Conf.Limit.Standardized.Contrast
[1] -0.196265


Notice that for both sets of effect sizes the confidence limits do not cross zero. This is in line with the fact that both of those contrasts were statistically significant.
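You can verify the standardized contrasts by hand: each is just the raw contrast divided by the square root of the error variance (i.e., the pooled within-group standard deviation). A quick check, using the means and MSerror from Table 16.6:

```r
# Standardized contrast = (sum of c.weights * means) / sqrt(MSerror)
sd.pooled <- sqrt(20.511)        # pooled within-group sd, about 4.529
(18.333 - 14.838) / sd.pooled    # about  0.7717, matching ci.sc() above
(14.97 - 18.33) / sd.pooled      # about -0.7419, matching ci.sc() above
```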

I will only work two of the homework exercises here because most of them are similar. I have chosen Exercise 16.21 because I can then easily go on to Exercise 16.22. The analysis is given below. The result of the multiple comparisons test, which in this case is really Fisher's LSD test, is probably different from the result you calculated, because I assume that you used MSerror for each test. Each of these tests is based on an error term computed from only the two groups in question. There are a number of ways to create multiple comparisons in R, but they are more complex and will not provide the t value that you would obtain using an error term pooled across the three conditions.

Notice the way that I created separate subgroups of the data, leaving out one of the conditions in each set. There are other ways to do this, but this is probably the simplest. Notice also that in one case I told it to treat the variances as equal, while in the other case I did not. Because we have equal sample sizes, we obtain the same result either way. I speak about what I have done as Fisher's LSD test because you can evaluate each of those t values at α = .05. They could also be thought of as tests with a Bonferroni correction if you evaluate them against α = .025.



### Exercises 16.21 and 16.22
data <- read.table("http://www.uvm.edu/~dhowell/fundamentals9/DataFiles/Ex16-21.dat", 
header = TRUE)
names(data)
attach(data)
Smkgrp <- factor(Smkgrp)
model3 <- lm(Recall ~ Smkgrp)
print(anova(model3))

test1 <- subset(data, Smkgrp != 2)
test2 <- subset(data, Smkgrp != 1)

cat("The following t test leaves out the Delayed Smokers.\n")
t.test(Recall ~ Smkgrp, data = test1, var.equal = TRUE)
cat("The following t test leaves out Nonsmokers.\n")
t.test(Recall ~ Smkgrp, data = test2)
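If you would rather base each comparison on the error term pooled across all three conditions, pairwise.t.test() with pool.sd = TRUE will do that, although it reports only the p values, not the t statistics themselves. A sketch, assuming Recall and Smkgrp are attached as above:

```r
# LSD-style pairwise comparisons using an error term pooled across all
# three conditions. Assumes Recall and Smkgrp from the Ex16-21 data above.
# Note: pairwise.t.test() reports p values but not the t statistics.
pairwise.t.test(Recall, Smkgrp, p.adj = "none", pool.sd = TRUE)
```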

