The set of code below plots the data on rated attractiveness from Langlois and Roggman. There is nothing particularly new here.
### Chapter 5 ### ____________________________________________________________________________ ### Graphics for Langois and Roggman Figure 5.1 in Fundamentals X <- c(1.20, 1.82, 1.93, 2.04, 2.30, 2.33, 2.34, 2.47, 2.51, 2.55, 2.64, 2.76, 2.77, 2.90, 2.91, 3.20, 3.22, 3.39, 3.59, 4.02) Y <- c(3.13, 3.17, 3.19, 3.19, 3.20, 3.20, 3.22, 3.23, 3.25, 3.26, 3.27, 3.29, 3.29, 3.30, 3.31, 3.31, 3.34, 3.34, 3.36, 3.38) print(mean(X)); print(mean(Y)) # You can have multiple commands per line if you use a semi-colon print(var(X)); print(var(Y)) # These are needed to supply information for the legends. print(sd(X)); print(sd(Y)) N = 20 par(mfrow = c(1,2)) hist(X, xlab = "Rated Attractiveness", main = "Composite of 4 Faces", col = "darkgreen", xlim = c(1,5) ) legend(2.5,7,"Mean = 2.64\nVariance = 0.43\nN = 20", bty = "n", cex = .75) hist(Y, xlab = "Rated Attractiveness", main = "Composite of 32 Faces", col = "darkgreen", xlim = c(1,5) ) legend(2,5,"Mean = 3.26\nVariance = .0048\nN = 20", bty = "n", cex = .75) # Try changing the beginning of last command from 2,5 to 4,4 to see what happens.
This next batch of code is a bit more interesting. I want to demonstrate that using N - 1 in computing the sample variance and standard deviation gives us a much better result than using just N. I start by creating a population of 1 million numbers with a mean of 35 and a standard deviation of 10.5. This population is named pop and it has an actual population variance of 109.425. (sd <- 10.461). Here I draw 10,000 samples of N = 15 observations each, and compute the variance of them with both N and N-1. (Back when I wrote the first edition of this book, the computer that was available would have taken several hours to do this, and probably could not have stored even a fraction of the data. ) To do this I have to set aside some space to store those results. Then I use one of those "for" loops. We start out with i = 1. Then I draw a sample of data (to be described in a minute) and place the data in a variable called "temp". I then calculate the variance of temp with both denominators and store those away as var1[1] and var2[1]. Now I go back and increment i to 2. Again I draw a sample, gets its variances, and store them away as var1[2] and var2[2]. This keeps repeating until I have calculated and stored the variances of the 10,000 sample in var1[10000] and var2[10000]. Drawing those 10,000 samples and making the calculations took my macPro just under 2 seconds.
I should say something about the "sample" command. I set up a population named "pop" in the first line of the program. It has 1 million observations from a population with a mean of 35 and a standard deviation of 10.5. Every time I use the "sample(pop, N, replace = TRUE)" I draw N = 15 random observations from that population, using sampling with replacement.
dch:### Dividing by N vs N-1 set.seed(1486) # Will produce the same result every time it is run pop <- rnorm(1000000, mean = 35, sd = 10.5) #Notice how long it takes to draw #a million random numbers N <- 15 nreps = 10000 # I'll draw 10,000 samples var1 <- numeric(nreps) # Setting aside space to store 10,000 variances var2 <- numeric(nreps) temp <- numeric(N) popVar <- round(var(pop), digits = 3) cat("The variance of the population is = ", popVar) for (i in 1:nreps) { temp <- sample(pop, N, replace = TRUE) var1[i] <- var(temp) var2[i] <- var(temp)*(N-1)/N # This really divides by N instead of N-1 } par(mfrow = c(1,2)) hist(var1, main = "With N-1", ylim = c(0, 3500)) legend(200, 2000, "Mean \nvariance = \n 109.85", bty = "n", cex = .75) legend(200, 1600, "Population \nvariance = \n110.242", cex = .75, bty = "n") hist(var2, main = "With N", ylim = c(0, 3500)) mean(var1); mean(var2) legend(200, 2000, "Mean \nvariance = \n 102.528", bty = "n", cex = .75) legend(200, 1600, "Population \nvariance = \n110.242", cex = .75, bty = "n")