R Code for Chapter 7

Basic Concepts of Probability

The first batch of code should be fairly clear. Notice how I created a variable named "labels" and then used that variable in the barplot command. In the next portion I read in data from 2012 and 2002 and plotted them separately. I don't know why the fitted curve is noticeably smoother than the one in the book. Changing some of the parameters of scatter.smooth() does not seem to make a big difference. Notice the separate lines for the two distributions.

###  Chapter 7

### Figure 7.1    Percentages are close estimates of actual values.

Percentages <- c(8,39,25,10,10,9,3)
labels <- c("Strongly\nAgree", "Agree", "Slightly\nAgree",
 "Neither", "Slightly\nDisagree", "Disagree", "Strongly\nDisagree")
barplot(height = Percentages, names = labels, ylab = "Percentage Agreement")
   # To see all labels, drag the plot wider.

### Figure 7.2
### Birthrate and age -- First Child
### Data from National Vital Statistics Reports vol 62, #9, Births:Final Data for 2012.
par(mfrow = c(2,1))    # Fit on same plot--2 rows, 1 column

# From 2012
age <- c(14, 15, 16, 17, 18, 19, 22, 27, 32, 37, 42, 47, 52)
births2012 <- c( 3578, 10570, 24700, 43864,  70517, 101371, 461553, 421704, 299857, 106892,
  24251,  1952,  167)
total2012 <- sum(births2012)
percent.rate2012 <- births2012/total2012
scatter.smooth(percent.rate2012 ~ age, xlim = c(12, 55), ylim = c(0, .32), col = "red",
   ylab = "Percentage of Births", family = "gaussian", main = "Birthrate by Age 2012")
### Type "?scatter.smooth" to learn more about this function.

# From 2002
births2002 <- c(7149, 17909, 39736, 66132, 93489, 118795, 472976, 378647, 276110, 102180,
 95788, 1302, 63)
total2002 <- sum(births2002)
percent.rate2002 <- births2002/total2002
par(new = TRUE)     # Superimpose another curve
scatter.smooth(percent.rate2002~age, xlim = c(12, 55), ylim = c(0, .32), col = "blue",
   ylab = "", xlab = "", family = "gaussian")
legend(40, .35, "red = 2012\nblue = 2002", bty = "n")
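An alternative way to put two smoothed curves on one set of axes is to compute each fit with loess.smooth() and draw it with lines(), rather than using par(new = TRUE). The following is only a sketch: the x, y1, and y2 values are made-up illustrative numbers, not the birth data above, and the span of 2/3 is simply the function's default, shown so that you can experiment with it.

```r
# Hypothetical data for illustration only (not the 2002/2012 birth data)
x  <- c(14, 17, 20, 23, 26, 29, 32, 37, 42)
y1 <- c(0.01, 0.06, 0.18, 0.27, 0.24, 0.14, 0.07, 0.02, 0.01)
y2 <- c(0.02, 0.09, 0.22, 0.26, 0.20, 0.12, 0.06, 0.02, 0.01)

# loess.smooth() returns a list with the x and y values of the fitted curve
fit1 <- loess.smooth(x, y1, span = 2/3, degree = 1, family = "gaussian")
fit2 <- loess.smooth(x, y2, span = 2/3, degree = 1, family = "gaussian")

# Draw the points once, then add each fitted curve to the same axes
plot(x, y1, col = "red", ylim = c(0, 0.32),
     xlab = "age", ylab = "Proportion of Births")
points(x, y2, col = "blue")
lines(fit1, col = "red")
lines(fit2, col = "blue")
legend("topright", legend = c("series 1", "series 2"),
       col = c("red", "blue"), lty = 1, bty = "n")
```

Because both curves share one plot() call, the axes are drawn only once and the legend can be placed by keyword instead of by coordinates.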

The next set of code shows two ways to create the data table in Exercise 7.25. As I said in the code, for a huge data set like this I can't believe that anyone would do it the way I did first. But for smaller datasets, that makes some sense. To understand just what is happening there, go to the console window of R and enter ?rep. That will show you how the "rep()" (repeat) command is set up. Once I created these two very long variables, I used the table() command to make them into a table showing the frequencies of each race/sentence pair. Note how I added row and column names, which get printed out when we print out the table. You certainly don't have to add all the extra goodies just to make it pretty.
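To see how rep() and table() work together before looking at the full-sized version, here is a small sketch with made-up counts (the names group, outcome, and small are just for illustration and are not part of Exercise 7.25).

```r
# Toy example with hypothetical counts
group   <- rep(c("A", "B"), times = c(5, 3))       # five A's followed by three B's
outcome <- rep(c("yes", "no", "yes", "no"),
               times = c(2, 3, 1, 2))              # 2 yes, 3 no, 1 yes, 2 no

small <- table(group, outcome)   # cross-tabulate the two long variables
print(small)
```

The times argument says how often each element is repeated, so the first five outcomes line up with group A and the last three with group B.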

The second way to create this table makes a lot more sense. I simply created a 2 X 2 matrix. The byrow = TRUE and ncol = 2 arguments tell R how the numbers fall into the matrix. The prop.table() command simply turns those matrix elements into proportions. Note that each command has either a 1 or a 2. A 1 means that I want proportions taken within each row, while a 2 does the same for columns. The last two lines use margin.table() to give the row and column marginal totals.
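A quick way to convince yourself what the 1 and the 2 do is to check the sums on a tiny matrix. This is a sketch with invented numbers; counts, byRow, and byCol are hypothetical names used only here.

```r
# Hypothetical 2 x 2 table of counts, filled row by row
counts <- matrix(c(10, 30, 20, 40), byrow = TRUE, ncol = 2,
                 dimnames = list(c("r1", "r2"), c("c1", "c2")))

byRow <- prop.table(counts, 1)   # proportions within each row
byCol <- prop.table(counts, 2)   # proportions within each column

print(rowSums(byRow))            # every row of byRow sums to 1
print(colSums(byCol))            # every column of byCol sums to 1
print(margin.table(counts, 1))   # row totals of the original counts
```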

### Creating table in Ex7-25    No one would seriously do this for such a huge N
race <- rep(c(1,2),c(616, 278))
sentence <- rep(c(1,2,1,2), times = c(388, 228, 202, 76))
display <- table(race, sentence)
rownames(display) <- c("Non-White", "White")
colnames(display) <- c("Death Sentence", "No Death Sentence")
print(display)
### The following won't come up until Chapter 19, but it is appropriate here
print(chisq.test(display))

### Now let's set it up more reasonably by entering cell frequencies.
dataTable <- matrix(c(388, 228, 202, 76), byrow = TRUE, ncol = 2, 
  dimnames = list(c("Non-White", "White"), c("Yes", "No")))
  # Rows = race, columns = death sentence, so the labels match the table built above.
print(dataTable)                  # Just to make sure that you have what you want.
prop.table(dataTable,1)     # Proportions within rows (1)
prop.table(dataTable,2)     # Proportions within columns (2)
print(dataTable)
print(chisq.test(dataTable))

margin.table(dataTable,1)     # "1" translates to rows, "2" to columns
margin.table(dataTable,2)     # These give row and column marginal totals
