R Code for Chapter 3

Displaying Data

For each chapter I will present the code that I used in that chapter. In some places I will give you just the simple code that is printed in the text. In other cases I will give code that goes beyond what I have given in the book, but I will add explanation of all new material. In most cases I will make comments about specific aspects of the code. But I have to admit that I sometimes get carried away and keep adding stuff. If you come to a point where you say to yourself "this guy's crazy," that is probably where you should throw up your hands and quit. But if you think that you are still learning something useful, keep going.

Before you go further, I would suggest that you can make your life a bit simpler by having your own copies of the R code files and the data files. Once you have copies on your own machine or a thumb drive, you can load those files directly without even being connected to the Internet. A very brief page on getting those files can be found at http://www.uvm.edu/~dhowell/fundamentals9/downloading.html.

As a brief introduction to R you would expect this to be a short page. But the chapter turned out to have much more code than I thought, so this page is quite long. (Not a good way to start out, but you can't say that I didn't warn you!) There is a risk that you will be discouraged because of just how much is here. You don't have to read it all! But read what you can and try to understand what is happening. You will be surprised how much you will be able to figure out. And remember, this is not a course on R. I want you to understand stuff and be able to take some code and make obvious changes for your own purposes, but I don't expect you to sit at a blank screen and write a lot of code.

Figure 3.1 -- Histogram for reaction time

The code for the first figure in Chapter 3 follows, but before you jump into it, I want to make a few comments.


### ---------- This file contains code for in Chapter 3 
###                Figure 3.1
### ---------- Plotting a histogram for the reaction time data. ----------

# setwd("C:/Users/Dave/Dropbox/Webs/Fundamentals9/DataFiles")    
    #An example of "set working directory. You will need to provide the correct path for your computer."
# data1<- read.table("Tab3-1.dat", header = TRUE)                
    # With the directory set, just give the file name, assuming the file is on your machine in the 
    #  default directory.
	
data1 <- read.table("http://www.uvm.edu/~dhowell/fundamentals9/DataFiles/Tab3-1.dat", 
header = TRUE)                              # This assumes that you are using files on the web.
names(data1)                                 # Prints out the names of those variables
attach(data1)                                  # Make the variables available for use
par(mfrow = c(2,1))                       # Just to make things pretty--
                                                          # For one plot, this uses only half the screen
hist(RTsec, breaks = 40, xlim = c(0,5), xlab = "Reaction Time (in sec.)") 

stem(RTsec, scale = 2)                  # Not discussed yet in text

# More useful stuff
install.packages("psych")            # Only use this the first time you install  a library
library(psych)                                # Assumes psych package has been installed
describe(data1)                            # Extra stuff that we come back to later

Any line that begins with a "#" is a comment and is skipped as the program runs.
Lines 5 and 6 show you how to specify a default directory. Having done that, whenever the code calls for a file, R will start its search in that directory, which makes life easier. But this only works if the file is in one of the directories on your computer. To run the code, remove the "#" (comment) at the beginning of the line. If I were you, I would create a folder on my machine that contains all of the data sets and all of the files ending in "Chapter??.R. Then you can set that up as the default directory, and your work will be much simpler.
Some of you will know that it is usually safer to use "https" instead of "http" to address a file on the web. But if you are running R on a Mac or a Unix machine, R doesn't know what https means unless you do a bunch of other stuff. So I will drop the "s." But those on a Windows machine can still put it in. And this only applies to R. A browser on a Mac knows perfectly well what "https" means.
Notice the line that says "attach(data1)" I talk about this on a separate page, but it is basically a way for you to be able to easily call up a variable that was read into data1. The page about "attach() is at is at https://www.uvm.edu/~dhowell/fundamentals9/Supplements/attaching.html
You will often see a command like "par(mfrow = c(2,1)" That sets up the graphics screen to have two rows and one column. This will allow me to plot two figures, one above the other. By changing the numbers you can change the partitioning.
The "hist" function obviously draws a histogram. "breaks" tells it (approximately) how many breaks there will be on the X axis, xlim tells it the range of values on the X axis, and you can figure out the rest. In the "stem" function, "scale = 2" tells it how to break up the data. Change 2 to 1 or 3 and you'll see what I mean.
I tell you in several places about the "install.packages" command. You only use if the first time you need the package. After that you don't need it. (It's like shoes. You only have to buy them once, but you have to put them on each time you need them. The "library" command is like putting on your shoes. Installing a package makes it available on your computer for future use. In this example it gets the code ready to use a function that is in the psych library--e.g. describe(data1). We need the psych library because the base package of R doesn't have a function that does what we want.

I said a lot about that little bit of code, but having said it once, I can skip it in the next batch of code.

Figure 3.4 -- Does it hurt to skip class?

I haven't told you how to draw a stem-and-leaf display, but it was in the previous batch of code and I'm sure that you don't need my advice. But a back-to-back stem-and-leaf display is a bit trickier, so I'll say something about that. The code follows my comments. The base package doesn't do back-to-back stem plots, so I need to load a library called aplpack.

This set of code introduces a new feature--subsets. We want to break the data down into those students who missed class frequently and those who attended most classes. So we use the "subset" command. Notice that when we ask if Attend is equal to 1, we use the double equal sign (==). The single equal sign is for use when you want to set something equal to some value.

We need the "stem.leaf.backback()" function. We call it with two variables for input. One contains the data for those who freqently missed class, and the other contains the data for those who were usually there.

There is a final command that I do not use as often as I should. I told the program to "detach(data1)." This gets rid of those particular variables -- at least for the time being. (But it does not delete the original data frame.) It is unlikely to be important here, though it is still a good idea. But suppose that the code had created or loaded variables named X and Y. It is reasonably likely that the next bunch of code that I create could also have variables with those names. Then everything gets confused when one variable seems to take the place of another. It doesn't always mess up, but it does so often enough that it is a good idea to use the detach command. Another alternative, especially after you have been trying, without success, to get something to work for a while, is to just get rid of all of your variables. For that you can use the "rm(list = ls()) command, which removes variables. You can either type something like rm(X), or you can throw up your hands and remove everything with rm(list = ls()). But you still need to use detach() to be sure that everything is really gone, and even then some of it seems to hang around. When all else fails, quit R and start over.



### Draw Figure 3.4

### The stem-and-leaf is given by stem(RTsec, scale = 2) in previous code
### ----------Drawing back-to-back stem-and-leaf displays ----------
### But see a slightly different example below, which might be clearer.
### install.packages(aplpack)                    # This command only needed once

### NOTE!!! AS OF 6/7/2016 LIBRARY(APLPACK) FAILED !!!!

library(aplpack)                                          # Load the package
data2 <- read.table("http://www.uvm.edu/~dhowell/fundamentals9/DataFiles/Fig3-4.dat", 
header = TRUE)
names(data2) 
attach(data2)
missed <- subset(Points, Attend == 1)   # Creating subsets of data  for each category
attended <- subset(Points, Attend == 2)

stem.leaf.backback(missed, attended)   # display appear in console, not as graph
detach(data2)                                             # Clears out those variables

Figure 3.5 -- Reaction time and accuracy

Bargraphs are something that people often use in presenting data, and it is worth seeing how those can be created. The following code first prints out the descriptive statistics, using a function from the "psych" library. That is pretty clear. But bar graphs are a bit more complicated. We first use a function called tapply which essentially creates a table of means. You can read the command to say "set the variable 'group.means' equal to the mean of RTsec broken down by the different levels of Accuracy." That will give us the means of correct and incorrect responses. Then we adjust the size of the graphics display with par(mfrow = c(1,1) by saying that it has 1 row and 1 column. (Try changing that to c(2,3) to see what you will get if you run the plot a couple of times. )

Finally the barplot command tells it make the plot using the variable "means" -- our sample means--, name those means as "correct" and "wrong", run the bars vertically, and labels a bunch of stuff. The "density" just tells it how close to make the cross-hatching. (Try something else to see what happens. Then play around with the other parameters to see what they do.)

attach(data1)                        # The data frame from first set of code
# install.packages("psych")
library(psych)
describe(data1)                   # Computes many descriptive statistics for each variable
				                # More appropriate for next few chapters--Ignore warnings

###------------Creating barplots --------
### You probably don't want to get into "tapply" yet, but we use it for barplots
### tapply(RTsec, Accuracy, group.means)  calculates the mean of RTsec broken down 
### by Accuracy, and name that variable 'group.means'.
                   
group.means = tapply(RTsec, Accuracy, group.means) # Compute average of RTsec for each level of Accuracy.

par(mfrow = c(1,1))
barplot(group.means, names.arg = c("Correct", "Wrong"),  horiz = FALSE, 
  ylab = "Mean Reaction Time", density = 10, angle = 45, col = "green", space = 1, width = .5,
  border = "blue", main = "Reaction Time as Function \n of Accuracy")   
        # using "\n" goes to new line.

Figure 3.6 -- A line graph

Moving along slowly we come to Figure 3.6. Here I have entered the data directly because it is easier than reading them in. Note how I did that. The command "age = seq(from = 8, to = 18)" just creates a variable with the values 8, 9, 10, ..., 18. (I could use "age <- c(8:18)" instead.) The "c() tells R to "concatenate" or "combine".Then I tell it to plot freq as a function of age and then hours as a function of age. That isn't too complicated. I used par(mfrow = c(2,1) to make space for both plots on the same screen.

### ---------- Creating Figure 3.6 on video game play ----------
###  Gentile,D. Pathological Video-game use among youth 8 to 18. 
###  Psychological Science, 2009, 20, 594-602.

hours <- c(11.3, 11.7, 14.3, 11.9, 15.0, 16.4, 12.1, 11.1, 13.6, 14.9, 12.6)
freq <- c(4.8, 4.85, 4.9, 4.4, 5.4, 4.85, 3.6, 4.1, 3.4, 3.7, 2.9)
age <- seq(from = 8, to = 18)
par(mfrow = c(2,1))                     # Changes size and shape of display screen
plot(freq ~ age, ylim = c(0,6), type = "l", ylab = "Mean Frequency", 
     main = "Frequency of Video \n Play by Age")   # The "l" means "line"
plot(hours ~ age,  type = "l", ylab = "Mean Hours Use",ylim = c(10,18), 
     main = "Hours of Video \n Play by Age")

Skipping a bit

The next plot looks at reaction time as a function of the degree to which the stimulus has been rotated. I suspect that you can figure that out if I tell you that "hist(RTsec[angleFactor == 20])" simply tells the program to make a histogram of RTsec, but only for the case where the variable "angleFactor" is equal to 20. (As an aside, square brackets "[ ]" represent subscripts. So I am asking hist() to use the variable RTsec, where the value of angleFactor is 20.) The one thing that is different here is that we had to take "Angle", which was a continuous variable, and make it into a factor, which is just an ordinal or nominal variable. You can see how I did that. I'll skip the next two plots with the comment that the line "cat("The mean ...") is essentially equivalent to a print command, except that you can do a bit more with cat, such as write some text and then write the value of a variable. See if you can figure out why I created a variable called "breaksAt."

In the code you will see something new. We read the data into a data frame named mrdata. It has a couple of variables in it. But I wanted to add another variable named "RxTime", which was a rounded off set of RTmsec scores. By specifying mrdata$RxTime, I am telling it to calculate RxTime and make it part of the mrdata data frame. You will also come across a line that says "a <- table(RxTime)." What do you think that does? (It creates a frequency distribution.) I snuck in a line that says "tempAcc <- tapply(RxTime, Accuracy, mean). That creates a table of means of RxTime broken down by Accuracy.

### Mental Rotation
### 600 trials with rotations in 20 degree intervals.
# mrdata <- read.table(file.choose(), header = TRUE)   # You want Tab3-1.dat--- or use
mrdata <- read.table("http://www.uvm.edu/~dhowell/fundamentals9/DataFiles/Tab3-1.dat", 
header = TRUE)
mrdata$RxTime <- round(mrdata$RTmsec/1000, digits = 2)
attach(mrdata)
par(mfrow = c(2,2))
meanrxt <- mean(RxTime)
sdrxt <- sd(RxTime)   
cat("The mean reaction time is ",meanrxt, " msec. \n")                   
cat("The s.d. of reaction time is ",sdrxt, " msec. \n")
cat("\n")
cat("Mean reaction time as a function of correct response \n   Wrong    Right \n")
breaksAt <- seq(from = 0.4, to = 5, by = .1)
hist(RxTime, breaks = breaksAt, main = "Histogram of Reaction Time", 
xlab = "Reaction Time (seconds)", density = 10)

########################################################################
tempAcc <- tapply(RxTime, Accuracy, mean)
barplot(tempAcc, ylim = c(0,2), xlab = "Accuracy", 
   ylab = "Mean Reaction Time", names.arg = c("Wrong", "Right"),
   xlim = c(0,5), width = 1)
print(tempAcc); cat("\n")
tempAngle <- tapply(RxTime, Angle, mean)
barplot(tempAngle, ylim = c(0,2), xlab = "Angle",
   ylab = "Mean Reaction Time")
cat("Mean reaction time as a function of degrees of rotation \n")
print(tapply(RxTime, Angle, mean)); cat("\n")     # ";" is like starting a new line

pctcorrect <- round(100*(sum(Accuracy) /length(Accuracy)))
cat( pctcorrect, "percent of the responses were correct \n")
cat("\n")
a <- table(RxTime)
cat("The frequency distribution for reaction times rounded to 10ths of a second is, \n ")
print(a[1:78])     # Produces a frequency distribution of reaction times.

Enough is Enough

This page has gone on way-way too long, so I will quit here.What follows are pieces of code that deal with the exercises at the end of the chapter. I give them without comment.


### _______________________________________________________________________________
### ---------- Exercise 3.34 -- Creating a boxplot ----------
data4 <- read.table("http://www.uvm.edu/~dhowell/fundamentals9/DataFiles/Fig3-4.dat", header = TRUE)
attach(data4)
names(data4)
boxplot(Points ~ Attend)

### _______________________________________________________________________________
# Exercise 3.20

Ethnicity <- c("White","Black", "NatAmer","Hispanic","Asian","Foreign")
Yr82 <-c(9997, 1101, 88, 519, 351, 331)
Yr91 <- c(10990, 1335, 114, 867, 637, 416)
Yr05 <- c(11774, 2276, 179, 1935, 1164, 591)
Names= c("White","Black","NativeAmer","Hispanic","Asian","Foreign")
Change8291 <- ((Yr91 - Yr82)/Yr82) * 100
Change9105 <- ((Yr05 - Yr91)/Yr91) * 100
Change8205 <- ((Yr05 - Yr82)/Yr82) * 100
barplot(Change8205,  angle = 30, density = 5, names.arg = Names, xlab = "Ethnicity",
        col = "green" , ylab = "Percentage Change", main = "Change from 1982 - 2005")

### ________________________________________________________________________________
### Calculate sample mode     # This section adds in several things that you haven't seen before.
mySample <- c(19,4,5,7,29,19,29,13,25,19, 29)
tabSmpl <- tabulate(mySample)
print(tabSmpl)        # This will tell you that there are 3 19's and 3 29's, 
				   # but only 1 4, 1 5, etc..
SmplMode <- which(tabSmpl == max(tabSmpl))
# if(sum(tabSmpl == max(tabSmpl)) > 1) SmplMode <- NA
### That line gives NA if there are 2 or more modes
cat(SmplMode)

# or
a <- mySample
table(a)
names(table(a))[table(a)==max(table(a))]

### _______________________________________________________________________________
### Life expectancy
par(mfrow = c(2,1))
whitewomen <- c(49, 52, 66, 64, 67, 72, 74, 76, 78, 80, 80)
blackwomen <- c(34, 38, 45, 49, 55, 63, 66, 68, 73, 74, 75)
year <- seq(1900,2000,10)
plot(year, whitewomen, ylab = "Life Expectancy", ylim = c(40,90),type = "l", lty = "dashed" )  
title( main = "Change in Life Expectancy " )
legend(x = 1920, y = 80, "White", bty = "n")
par(new = T)
plot(year, blackwomen, ylim = c(40,90),type = "l", lty = "solid")
legend(x = 1960, y = 70, "Black", bty = "n")

### _______________________________________________________________________________
### The following code goes beyond what I have covered in the text, but it is
### worth running it to see what you get.

### Superimpose density function on a histogrram  --  basically smooths the histogram

data1 <- read.table("http://www.uvm.edu/~dhowell/fundamentals9/DataFiles/Tab3-1.dat", header = TRUE)
attach(data1)
hist(RTsec, breaks = 40, xlim = c(0,5), xlab = "Reaction Time (in sec.)")
par(new = TRUE)                      # Will cause next plot to be superimposed  on the histogram
plot(density(RTsec), xlim = c(0,5), main = '', xlab = '', ylab = '', yaxt = "n")

#### Now superimpose normal distribution
y <- dnorm(seq(0,5, .1), mean = mean(RTsec), sd = sd(RTsec))
x <- seq(0,5,.1)
hist(RTsec, breaks = 15, xlim = c(0,5),ylim = c(0,.8), freq = FALSE)
par(new = TRUE)
plot(y~x, xlim = c(0,5), ylim = c(0,.8), xlab = "", ylab = "", type = "l")

 _________________________________________________________________________

dch: