
R Code for Chapter 12

Hypothesis Tests Applied to Means: One Sample


The first piece of code is quite extensive and plots many of the figures used in this chapter. For Figure 12.1 I first created a variable (x) containing equally spaced values running from -3 to 3. I then computed the density (height) of a normal distribution at each value of x. But that left me with a distribution centered on 0.00 with a standard deviation of only 1. So I adjusted my values of x to have the appropriate mean and standard deviation (50 and 7.65). Then I simply plotted y as a function of the adjusted values of x and supplied a legend.

That was the easy part. I next wanted to superimpose a normal distribution on top of a histogram. That can be a little tricky because you want the two distributions to have the same height, and the height of a frequency histogram changes with the sample size. One solution is to plot the histogram on a density scale by requesting "freq = FALSE." The first block below (Figure 12-2a) takes that approach. It is intended to show the sampling distribution of the mean for n = 5. The mean of the population was set at 50, with a standard deviation of 7.65. So I set up my values for the X-axis to run from 40 to 60 in steps of .1. I then drew 100,000 samples of n = 5 from the appropriate population and calculated the mean of each sample. Given those 100,000 means, I computed the mean of the means and the st. dev. of the means, which is the standard error. Using the hist() function I drew a histogram of those means and then superimposed the normal densities on top of that using the lines() function. (That command might be clearer if I first calculated "y <- dnorm(x, meanxbar, sterr)" and then used "lines(x, y).") Finally, I added the appropriate legend giving the mean and standard error. That works nicely, and it will continue to work even if you change the number of samples (nreps).

Notice that in Figure 12-2b I took a different approach: I first plotted the histogram without any axes, and then added the axes when I overplotted the normal curve after adjusting the values of x to have the appropriate mean and standard deviation. Several of the graphical parameters, such as the x-axis label, I left blank (xlab = "") in the overplotting because I didn't want to repeat the label; printing the same label twice in the same place sometimes gives a blurred result.

The next set of code in this section produces Figure 12.4. It draws random samples from a normal population and plots the sampling distribution of the variance. I have plotted a kernel density estimate, using the density() function, on top of the histogram. You will see that it is slightly uneven because it follows the obtained variances as closely as it can rather than tracing a smooth theoretical curve.


### Chapter 12

# Figures 12-1 and 12-2a and 12-2b

x <- pretty(c(-3, 3), 50)                # roughly 50 equally spaced values covering -3 to 3
y <- dnorm(x)                            # standard normal density at each value of x
xadj <- x*7.65 + 50                      # rescale x to mean = 50, sd = 7.65
plot(y~xadj, type = "l", ylab = "density", xlab = "Value of X", ylim = c(0, .4),
 xlim = c(20, 80))
legend(60, .35, "mu = 50 \nsigma = 7.65", bty = "n")
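
An equivalent way to draw Figure 12-1 is to let dnorm() do the rescaling itself. The short sketch below is my addition, not the code used for the figure in the text; note that it plots the true density of a normal variable with mu = 50 and sigma = 7.65, whose peak is only about .05, whereas the code above plots standard-normal heights (peaking near .4) against the rescaled values of x.

# Alternative sketch for Figure 12-1: give dnorm() the mean and sd directly
curve(dnorm(x, mean = 50, sd = 7.65), from = 20, to = 80,
      ylab = "density", xlab = "Value of X")
legend(60, .045, "mu = 50 \nsigma = 7.65", bty = "n")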

##########################
# Figure 12-2a
x <- seq(40, 60, .1)                     # values on the X-axis for the normal curve
nreps <- 100000                          # number of samples to draw
means <- numeric(nreps)                  # storage for the sample means
for (i in 1:nreps) {
   samp <- rnorm(5, 50, 7.65)            # draw one random sample of n = 5
   means[i] <- mean(samp)                # and store its mean
}
meanxbar <- round(mean(means), digits = 2)   # mean of the sample means
sterr <- round(sd(means), digits = 2)        # st. dev. of the means = standard error

hist(means,  xlim = c(35,65), ylim = c(0,.2), ylab = "density",
     main = "Histogram of Means; N = 5", freq = FALSE, breaks = 50)
lines(x, dnorm(x, meanxbar, sterr))      # superimpose the normal curve with the obtained mean and standard error
legend(55,.15,bquote("mean"  == .(meanxbar)), bty = "n")
legend(55, .14, bquote("st. error" == .(sterr)), bty = "n")  # Another way to insert calculated value
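
As the parenthetical note above suggests, the lines() step may be easier to follow if the density values are computed first. This is just a restatement of that single command, nothing new:

# Equivalent, more explicit form of the lines() command above
y <- dnorm(x, meanxbar, sterr)
lines(x, y)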

############################
# Figure 12-2b
x <- seq(-3, 3, by = .1)                 # standard normal values for the curve
y <- dnorm(x)                            # density at each value of x
nreps <- 100000
means <- numeric(nreps)
for (i in 1:nreps) {
   samp <- rnorm(30, 50, 7.65)           # draw one random sample of n = 30
   means[i] <- mean(samp)
}
meanxbar <- round(mean(means), digits = 2)
sterr <- round(sd(means), digits = 2)

hist(means, breaks = 50, xlim = c(35,65), axes = FALSE, ylab = "density",
     main = "Histogram of Means; N = 30")    # axes suppressed; they come from the overplot
xadj <- x*sterr + meanxbar               # rescale x to the obtained mean and standard error

par(new = TRUE)                          # overplot the normal curve on the same figure
plot(xadj, y, type = "l", ylab = "", yaxt = "s", xlab = " ", xlim = c(35,65))
legend(55, .30, bquote("mean"  == .(meanxbar)), bty = "n")
legend(55, .28, bquote("st. error" == .(sterr)), bty = "n")
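
As a quick check that is not part of the original code, the standard errors obtained from these two simulations should be close to the theoretical value sigma/sqrt(n):

# Theoretical standard errors to compare with the simulated values above
7.65/sqrt(5)         # approximately 3.42, for Figure 12-2a (n = 5)
7.65/sqrt(30)        # approximately 1.40, for Figure 12-2b (n = 30)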

##############################
# Figure 12-4  Sampling distribution of the variance
library(psych)
df <- 4                                  # degrees of freedom = n - 1
nreps <- 10000
var <- numeric(nreps)                    # storage for the sample variances
for (i in 1:nreps) {
   samp <- rnorm(5, 50, 11.785)          # draw one random sample of n = 5
   var[i] <- var(samp)                   # and store its variance
}
meanvar <- round(mean(var), digits = 2)
sdvar <- round(sd(var), digits = 2)
hist(var, breaks = 75, xlim = c(0,800), axes = F, ylab = "Frequency",
     main = "Sampling Distribution of Variances; N = 5", probability = TRUE,
     xlab = "Variance", ylim = c(0, .0055))
legend(300, .005, bquote("mean variance"  == .(meanvar)), bty = "n")
legend(300, .0045, bquote("s.d. of var" == .(sdvar)), bty = "n")
arrows(325, .0048, 139, 0, length = 0.125, col = "blue")   # arrow points at the population variance (11.785^2 = 138.9)
par(new = TRUE)

plot(density(var), ylim = c(0, .0055), xlim = c(0, 800), ylab = "", xlab = "", main = "")
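
A further check, again my addition rather than part of the original code: when sampling from a normal population, (n - 1)s^2/sigma^2 has a chi-square distribution on n - 1 df, so the sample variances themselves follow a scaled chi-square distribution. The sketch below overlays that theoretical curve (dashed, in red) on the plot just drawn, and the mean of the simulated variances should be close to sigma^2 = 11.785^2 = 138.9.

# Overlay the theoretical (scaled chi-square) density on the plot above
sigma <- 11.785; n <- 5
mean(var)                                # should be close to sigma^2 = 138.9
curve((n - 1)/sigma^2 * dchisq((n - 1)*x/sigma^2, df = n - 1),
      from = 0, to = 800, add = TRUE, lty = 2, col = "red")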


The next set of code produces Figure 12.7. I have skipped Figure 12.6 because I imagine that you have seen enough about drawing polygons. Figure 12.7 shows the results of drawing 25 samples from a population with μ = 5 and σ = 1 and computing confidence limits on the mean for each sample. I have drawn a vertical line at μ = 5, and on average 95% of such confidence intervals will cross that line. Because I did not use something like "set.seed(1234)" to start the random number generator from the same place each time, every time you run this you will get a different result, with a different number of intervals failing to include 5. That is what we want.


# Figure 12.7  Confidence limits

mu <- 5; sigma <- 1; N <- 4; nreps <- 25        # population parameters, sample size, and number of samples
lower <- numeric(nreps)
upper <- numeric(nreps)
for (i in 1:nreps) {
   x <- rnorm(N, mu, sigma)                     # draw one sample of N = 4
   test <- t.test(x, conf.level = .95)
   lower[i] <- test$conf.int[1]                 # store the confidence limits
   upper[i] <- test$conf.int[2]
}

samples <- c(1:nreps)
outcomes <- seq(2, 8, length.out = nreps)
plot(samples ~ outcomes, type = "n", ylim = c(0,35),
     xlab = "Mean Value", ylab = "Sample")      # empty plot to hold the intervals
for (i in 1:nreps)  {
   lines(x = c(lower[i], upper[i]), y = c(i, i))   # one horizontal line per interval
}
abline(v = 5)                                   # vertical line at the population mean
thelabel <- expression(italic(mu) == 5)
text(x = 5, y = 30, labels = thelabel, cex = 2)
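
To see how many of the intervals actually missed the mark on a particular run (my addition), you can count them directly; adding set.seed(), as mentioned above, would make that count reproducible from run to run.

# Count the intervals that fail to include mu = 5
misses <- sum(lower > mu | upper < mu)
cat("Intervals missing mu:", misses, "out of", nreps, "\n")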


The two analyses that follow do not need much explanation. The first runs a t-test on the moon illusion data and plots a histogram of the ratio scores; the second runs a t-test on the data on guessing answers on the SAT (Section 12.9).



###  Moon illusion analysis

ratio <- c(1.73, 1.06, 2.03, 1.40, 0.95, 1.13, 1.41, 1.73, 1.63, 1.56)
cat( "The mean of the ratio scores is ",mean(ratio),"\n")
cat("The standard deviation of ratio scores is ", sd(ratio), "\n")
t.test(ratio, conf.level = .95, mu = 1)
hist(ratio, col = "red", breaks = 5, main = "Histogram of Ratio of Horizon to Zenith",
 xlab = "Ratio")

### A good guess
score <- c(58, 48, 48, 41, 34, 43, 38, 53, 41, 60, 55, 44, 43, 49,
           47, 33, 47, 40, 46, 53, 40, 45, 39, 47, 50, 53, 46, 53)
t.test(score, conf.level = .95, mu = 20)


The following sets of code deal with exercises at the end of the chapter. If you highlight the first block (Exercises 12.1 and 12.4) and keep clicking "Run," you will see as many replications as you like. That is worth doing just to get a feel for the kinds of differences you can expect to see from sample to sample.

Exercise 12.8 is straightforward. I used t.test() to print out the confidence interval. Similarly, Exercise 12.11 is a simple t-test. However, I did something slightly different for Exercise 12.13. Most of the time the code that I give uses built-in functions, such as t.test() or hist(). So for Exercise 12.13 you might expect me to use something like "effectSize()." But there is no built-in effect size function that I wanted to use here. (There are some, but they would require loading another library; I give a short example after the Exercise 12.13 code below.) So I did something closer to a hand calculation. I calculated the mean and standard deviation, and then I plugged them into the equation for d̂ given in Section 12.9. That directly calculated Cohen's d̂.


## Exercise 12.1 and 12.4
## I have superimposed distributions to compare spread
## The population mean is 4.5 with a standard deviation of 2.6
## Draw 1000 observations from this (normal) population
nreps <- 1000
data <- rnorm(n = nreps, mean = 4.5, sd = 2.6)
hist(data, main = "Distribution of Sample Values", xlab = "Score or Mean",
     col = "red", breaks = 10, density = 10, xlim = c(-2,12))  # density = lines/inch
## Now draw 1000 samples of n = 5 and calculate and plot means

Mean <- numeric(nreps)   # Place to store sample means
for (i in 1:nreps) {
      sample <- rnorm(5, 4.5, 2.6)
      Mean[i] <- mean(sample)
}
hist(Mean, breaks = 10, add = TRUE, col = "darkblue", xlim = c(-2,12)) 
  # "add" = Superimpose on first distribution
  # Go back to "sample <- rnorm(5, 4.5, 2.6)" and change 5 to 15
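
As a side note, the loop that computes the sample means can also be written in one line with replicate(); the result is the same, and this is purely an optional alternative.

# Optional: the loop above, written with replicate()
Mean <- replicate(nreps, mean(rnorm(5, 4.5, 2.6)))
# Mean can then be plotted with hist(..., add = TRUE) exactly as above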

### Exercise 12.8
education.data <- read.table("http://www.uvm.edu/~dhowell/fundamentals9/DataFiles/Tab11-1.dat",
   header = TRUE)
names(education.data)
attach(education.data)
t.test(PTratio, conf.level = .95)

### Exercise 12.11 and 12.13
gain <- c(1.7, 0.7, -0.1, -0.7, -3.5, 14.9, 3.5, 17.1, -7.6,
 1.6, 11.7, 6.1, 1.1, -4.0, 20.9, -9.1, 2.1, -1.4, 1.4, -0.3,
 -3.7, -0.8, 2.4, 12.6, 1.9, 3.9, 0.1, 15.4, -0.7)
t.test(gain, conf.level = .95)
hist(gain)
xbar <- mean(gain)
stdev <- sd(gain)
cat("xbar = ", xbar, "      st. dev. = ", stdev)
mu <- 0                                          # null-hypothesis mean
delta <- round((xbar - mu)/stdev, digits = 3)    # d-hat = (xbar - mu)/s
cat("Cohen's estimate of delta = ", delta)
