R Code for Chapter 6

The Normal Distribution

The first two bits of code that follow simply show you how to calculate the density (height of the curve) of the normal distribution at a specific value of x (1 in this case). You can also calculate the probability of a score less than q = 1.0 in the next line. For example, if you set q = 1.96 for pnorm, the result would be 0.975.

The next set of code creates a bar plot. Notice that I use the names() function to create names for the different levels of "areas". The barplot function will then add these names to the abscissa, though you may have to drag your plot wider for all of the names to appear. The next section simply repeats those commands and adds a bit more. Nothing is particularly new there.

### Calculate height (density) of the normal distribution at x = 1
dnorm(x = 1, mean = 0, sd = 1)         # Just a demonstration
pnorm(q = 1.00, mean = 0, sd = 1)

### Bar chart for correctional supervision  
areas <- c(.61, .19, .09, .11)
names(areas) <- c("Probation","Prison","Jail", "Parole")
barplot(height = areas, ylab = "Percentage", main = "Form of Supervision", density = 10, 
col = "darkgreen")

### Calculate areas, density, etc. of normal distribution
density <- dnorm(x = -0.5, mean = 0, sd = 1)            # Produces density (height) at that point
probability <- pnorm(q = -0.5, mean = 0, sd = 1)     # Area below z = -.50
criticalValue <- qnorm(p = .025, mean = 0, sd = 1)  # Cutoff for p = .025
cat("density = ", density)
cat("probability = ", probability)
cat("critical value = ", criticalValue)
sample <- rnorm(n = 10, mean = 0, sd = 1)               # Draw a random sample of 10 cases
print(sample)
cat("\nNow to more stuff about plotting")

The next set of code does some stuff with overplotting a histogram with lines. First it reads the data and plots a histogram. Nothing new there. Next it does something you haven't seen with "par(new = TRUE)" The "par" says that it is going to change the parameters of a graphic. But the "new" is confusing. I am about to superimpose a line or two on the graphic, without first erasing the graphic. I would have thought that when I said "new = TRUE" it would mean that I want a new graphic. But it means just the opposite. It means that I want to superimpose a new thing on top of the old graphic. (oh, well.)

If you look at the histogram you will see that it runs roughly from 15 to 90. So I create a variable (x) that is that long (15, 15.01, 15.02, ... 89.99, 90) and, in the following plot command I compute the density at each x point. It takes each value of x, finds the corresponding normal density, the height of the curve, and uses that. If you look, you will see that I tell it to plot each value of x against the density of the distribution at that point. I need to spell out the mean and standard deviation so that the densities will be appropriate. Then the code goes ahead and puts in a point at all of those x,y combinations, making a nice pretty normal curve.You will see that my plot command includes a lot of extra stuff. For example, I want to specify the range of the x axis so that in the next plot my x axis will be the same size. The same goes for the y axis. You will see that in some places I specify something like main = "". That is because the histogram has already supplied the main label, and I don't want to overwrite it with the one here. That happens in several places.

A density() function is a bit like the normal curve in this case, but it is basically the best fitting "curve" for the data. So it has some bumps here and there. (Someone referred to it as a kind of smoothed histogram.) I plot that using the "lines" command.

Finally, I show a different way of doing the same thing with the "curve" function. It does a nice job, but you could live without knowing about it.


### Impose normal distribution on Achenbach data---Fig 6.3
### Doesn't use "curve" function
par(mfrow = c(1,1))
setwd("/Users/davidhowell/Dropbox/Webs/fundamentals9/DataFiles") 
data <- read.table("Fig6-3.dat", header = TRUE)
attach(data)
hist(BehavProbTot, breaks = 35, freq = FALSE, xlim = c(15, 85), ylim = c(0,.05), 
col = "green", xlab = "Behavior Problem Score", main = "")
par(new = TRUE)
x <- seq(15, 90, .01)
mean(BehavProbTot); sd(BehavProbTot)
### Now superimpose a normal distribution
plot(x, dnorm(x, mean = 49.12, sd = 10.548), type = "l", xlim = c(15, 85), ylim = c(0,.05), 
ylab = "", xlab = "", col = "red")
### Now superimpose the best fitting density function
lines(density(BehavProbTot), col = "blue") 

### Now use the "curve" function -- A bit more than you really need
attach(data)
hist(BehavProbTot, breaks = 35, freq = FALSE, xlim = c(15, 85), ylim = c(0,.05), 
col = "green", xlab = "Behavior Problem Score", main = "")
par(new = TRUE)
curve(dnorm(x, mean = mean(BehavProbTot), sd = sd(BehavProbTot)), from = 15, to = 90, 
xlim = c(15, 85), ylim = c(0, .05), ylab = "", xlab = "")

Now we come to polygons. In theory, they should be quite simple. But I always mess them up. Always! Every single time! We will use Figure 6-8 as an example. First we want to plot a histogram and superimpose a normal curve on it. That's no problem. Then we want the polygon. We happen to have a distribution with mean = 0, sd = 1, which makes things a bit simpler. I want to shade the area under the normal distribution from x = -2 to x = -1. We could put our cursor on the point -2,0 and make a dot. Now move up to the point (-2, 0.054), where 0.054 is the density (height) of the curve at z = -2. Make another dot. Now increase x by a little, e.g. -1.9. Find the density at -1.9, which is .066 and make a dot at x = -1.9, y = .066. Keep going until you make it up to (-1,.2420). After you make that dot, you want to drop down to the abscissa at (-1,0) and make a dot. Finally, go back to your starting point at (-2,0) and make your last dot. Draw a line to connect all of those dots and you have what you want, though you should supply a color so that you can see the polygon better.

Now we will work on the code to use the "polygon()" function. Notice that I have set up my collection of points, named x and y. The variable x is just the set of values between -2 and -1. The variable y is just the corresponding set of densities. Now my polygon function starts with the lowest value of x and sets y = 0, so I'm on the baseline. Then I move through all of the x values and the corresponding densities (y) until I get to -1, .242. Finally I go to the maximum x (-1) and 0. I could then go back to where I started, but I don't have to because "polygon()" is pretty smart and does that automatically. Well--when I do it I think that polygon() is pretty stupid, but that is just projecting my stupidity onto R to make myself feel better.

That part of the code has drawn the polygon. The rest is there to add some labels. I calculate the area between -2 and -1 and call it "Area." I use the "text()" function to write "Area =" on the screen . Then I use the legend function to write the value of "area." The two numbers at the beginning of those commands are just the coordinates of x and y where you want this stuff written. The "bty" is for "box type, and by specifying bty = "n" I am saying that I want no box. They look tacky. Finally, I use the arrows() command to draw an arrow from the two sets of points that I give. And then I sit back and tell myself how well I have done.

The next section of code does some more shading. I'll let you work through that on your own. But here the mean and st. dev. are not 0 and 1, so you have to take that into account when you write your "dnorm()" command.

### Now let's do Figure 6.8
### A bit easier because we are starting with z
rm(x)    # More housekeeping
zx <- seq(-3,3,.01)
zy <- dnorm(zx)
plot(zy ~ zx, col = "red", type = "l", ylab = "density", xlab = "z")
x <- seq(-2, -1, .01)
y <- dnorm(x)
polygon(c(min(x), x, max(x)), c(0,y,0), col = "blue")
lowp <- pnorm(-2)
upperp <- pnorm(-1)
area <- round(upperp - lowp, digits = 3)
text(-2.4, .305, "Area =")
legend(-2.8, .3, area, bty = "n")
arrows(x0 = -2.3, y0 = .27, x1 = -1.5, y1 = .15)

### Now let's shade another polygon using Achenbach's data.
### Shade area below Behav. Prob. score of 35 under the normal distribution
rm(x)       # Housecleaning
dataAchenbach <- read.table("http://www.uvm.edu/~dhowell/fundamentals9/DataFiles/
Fig6-3.dat", header = TRUE)
attach(dataAchenbach)
hist(BehavProbTot, breaks = 35, freq = FALSE, xlim = c(15, 85), ylim = c(0,.05),  
xlab = "Behavior Problem Score", main = "")
par(new = TRUE)
plot(BehavProbTot, dnorm(BehavProbTot, mean = 49.12, sd = 10.548), 
  type = "l", xlim = c(15, 85), ylim = c(0,.05), 
ylab = "", xlab = "", col = "red")
x <- seq(0, max(BehavProbTot), .01)
y <- dnorm(x, mean(BehavProbTot), sd(BehavProbTot))
polygon(c(x[x <= 35],35),c(y[x <= 35], 0), col = "blue")

dch: