R Code for Chapter 5

Measures of Variability

The set of code below plots the data on rated attractiveness from Langlois and Roggman. There is nothing particularly new here.

###  Chapter 5
###    ____________________________________________________________________________
### Graphics for Langois and Roggman    Figure 5.1 in Fundamentals

X <- c(1.20, 1.82, 1.93, 2.04, 2.30, 2.33, 2.34, 2.47, 2.51, 2.55, 2.64,
 2.76, 2.77, 2.90, 2.91, 3.20, 3.22, 3.39, 3.59, 4.02)
Y <- c(3.13, 3.17, 3.19, 3.19, 3.20, 3.20, 3.22, 3.23, 3.25, 3.26, 3.27,
 3.29, 3.29, 3.30, 3.31, 3.31, 3.34, 3.34, 3.36, 3.38)

print(mean(X)); print(mean(Y))  # You can have multiple commands per line if you use a semi-colon
print(var(X)); print(var(Y))    #  These are needed to supply information for the legends.
print(sd(X)); print(sd(Y))
N = 20
par(mfrow = c(1,2))
hist(X, xlab = "Rated Attractiveness", main = "Composite of 4 Faces", col = "darkgreen", 
   xlim = c(1,5)  )
legend(2.5,7,"Mean = 2.64\nVariance = 0.43\nN = 20", bty = "n", cex = .75)

hist(Y, xlab = "Rated Attractiveness", main = "Composite of 32 Faces", col = "darkgreen",
 xlim = c(1,5)   )
legend(2,5,"Mean = 3.26\nVariance = .0048\nN = 20", bty = "n", cex = .75)
#  Try changing the beginning of last command from 2,5 to 4,4 to see what happens.

This next batch of code is a bit more interesting. I want to demonstrate that using N - 1 in computing the sample variance and standard deviation gives us a much better result than using just N. I start by creating a population of 1 million numbers with a mean of 35 and a standard deviation of 10.5. This population is named pop and it has an actual population variance of 109.425. (sd <- 10.461). Here I draw 10,000 samples of N = 15 observations each, and compute the variance of them with both N and N-1. (Back when I wrote the first edition of this book, the computer that was available would have taken several hours to do this, and probably could not have stored even a fraction of the data. ) To do this I have to set aside some space to store those results. Then I use one of those "for" loops. We start out with i = 1. Then I draw a sample of data (to be described in a minute) and place the data in a variable called "temp". I then calculate the variance of temp with both denominators and store those away as var1[1] and var2[1]. Now I go back and increment i to 2. Again I draw a sample, gets its variances, and store them away as var1[2] and var2[2]. This keeps repeating until I have calculated and stored the variances of the 10,000 sample in var1[10000] and var2[10000]. Drawing those 10,000 samples and making the calculations took my macPro just under 2 seconds.

I should say something about the "sample" command. I set up a population named "pop" in the first line of the program. It has 1 million observations from a population with a mean of 35 and a standard deviation of 10.5. Every time I use the "sample(pop, N, replace = TRUE)" I draw N = 15 random observations from that population, using sampling with replacement.

   ### Dividing by N vs N-1
set.seed(1486)    # Will produce the same result every time it is run
pop <- rnorm(1000000, mean = 35, sd = 10.5)   #Notice how long it takes to draw
  #a million random numbers
N <- 15
nreps = 10000                          #  I'll draw 10,000 samples
var1 <- numeric(nreps)          #  Setting aside space to store 10,000 variances
var2 <- numeric(nreps)
temp <- numeric(N)
popVar <- round(var(pop), digits = 3)
cat("The variance of the population is = ", popVar)
for (i in 1:nreps) {
   temp <- sample(pop, N, replace = TRUE)
   var1[i] <- var(temp)
   var2[i] <- var(temp)*(N-1)/N    # This really divides by N instead  of N-1
}

par(mfrow = c(1,2))
hist(var1, main = "With N-1", ylim = c(0, 3500))
legend(200, 2000, "Mean \nvariance = \n 109.85", bty = "n", cex = .75)
legend(200, 1600, "Population \nvariance = \n110.242", cex = .75, bty = "n")

hist(var2, main = "With N", ylim = c(0, 3500))
mean(var1); mean(var2)
legend(200, 2000, "Mean \nvariance = \n 102.528", bty = "n", cex = .75)
legend(200, 1600, "Population \nvariance = \n110.242", cex = .75, bty = "n")

dch: