R Code for Chapter 19

Chi-Square

The chi-square test is quite easy to run in R because we have a function "chisq.test()" built into the base package of R. It can be used either for a one dimensional design or for a contingency table. I will start with the one-way design in Table 19.1. These are the data collected by Emily Rosa on therapeutic touch. The analysis is remarkably simple. I have included a barplot, and would encourage you to play with different parameters of that function call to see what happens.

### From Table 19.1
observed <- c(123, 157)
names(observed) <- c("Correct", "Incorrect")
barplot(observed, density = 10, col = "blue", ylab = "Number of Correct Trials", 
  main = "Emily's Data")
result <- chisq.test(observed, correct = FALSE)
prob <- result$p.value
cat("The chi-square statistic is = ", result$statistic, "\nwith a probability of ", prob)
print("The complete printout is \n")
print(result)

### Let's draw a chi-square distribution of 4 df
x <- seq(0,20, by = .1)
y <- dchisq(x = x, df = 4)
plot(y~x, type = "l")

### Compute critical value on 4df
cutoff <- qchisq(p = .95, df = 4)
cat("The critical value of chi-square at .05 on 4 df = \n",cutoff)
xx <- seq(cutoff, 25, .01)
yy <- dchisq(xx, df = 4)
polygon(x = c(cutoff,xx), y = c(0,yy), density = 5)
value <- expression(chi ^2)
legend.value <- c(value, "= 9.488")
legend(15, .05, legend.value, bty = "n")
arrows(15,.035, 11, .018)

There is a lot in that set of code. I calculated chi-square for Emily's data, which had a probability under the null of .042, which is significant at p = .05. Something that may be new in that code is a statement that uses "result$statistic." When chisq.test() does its thing, it stores the output in an object called "result." But that object includes a whole bunch of stuff, including the value of chi-sq, its degrees of freedom, and its probability under the null. If you type "names(result)" you will see a list of the parts of result.

> names(result)
[1] "statistic" "parameter" "p.value"   "method"    "data.name"
[6] "observed"  "expected"  "residuals" "stdres"

The chi-square statistic is labeled "statistic", and can be retrieved by "result$statistic." The probability value can be retrieved by "result$p.value", and so on.

Even though the previous example only had 1 df, I used 4 df for this example to show how skewed the distribution is. Below that I calculated the critical value of chi-sq for 4 df and printed it on the plot.

We can move on to a simple 2 x 2 contingency table using the data from Walsh et al. (2006) data on Prozac and Anorexia. Here we enter the data as a matrix, specifying the orientation of the data.

### 2 x 2 Contingency Table  Data from Walsh et al. 2006

contin.table = matrix(c(13, 36, 14, 30), byrow = TRUE, ncol = 2)
test.stat <- chisq.test(contin.table, correct = FALSE)
print(test.stat)
Fisher.exact <- fisher.test(contin.table, )
print(Fisher.exact)
- - - - - - - - - - - - - - - - - - - - - -
	Pearson's Chi-squared test
data:  contin.table
X-squared = 0.3146, df = 1, p-value = 0.5749

	Fisher's Exact Test for Count Data
data:  contin.table
p-value = 0.6501
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.2859687 2.0897821
sample estimates:
odds ratio 
 0.7759623

The last thing that I want to show for this chapter is the calculation of risk ratios and odds ratios. I use the data on heart attacks and aspirin in Table 19.6. I will also use the data for Exercise 19.18 because with higher probability values the risk ratio and odds ratio will differ. There is an R package called "epicalc" that we could use to compute these values, but it adds a layer of complexity that we can do without. So I simply perform the arithmetic myself. But first a bit about using matrices

Notice that for both of the analyses I have read the data into a matrix. How that works should be clear. When it comes to calculating Risk.placebo, I specify "contin.table[2,1]/sum(contin.table[2,]" The first is the entry in cell(2,1), which is 189. The second is the sum of the cells in row 2, specified as contin.table[2,]. The [2,] refers to all of the entries in row 2. Similarly, contin.table[,2] would be all of the entries in column 2, whereas [1,2] would be the first entry in column 2. When I came down to using the example from Dabbs and Morris (1990), I simply entered the numerical values, rather than the cell locations. Either approach works.


### Odds ratios and risk ratios
contin.table <- matrix(c(104, 10933, 189, 10845), byrow = TRUE, ncol = 2)
rownames(contin.table) <- c("Aspirin","Placebo")
colnames(contin.table) <- c("Heart Attack","No Heart Attacl")
print(contin.table)
Risk.Placebo <- contin.table[2,1]/sum(contin.table[2,])
Risk.Aspirin <- contin.table[1,1]/sum(contin.table[1,])
RR <- Risk.Aspirin/Risk.Placebo
cat("The risk ratio for heart attack given Aspirin = ",RR)
Odds.Placebo <- contin.table[2,1]/contin.table[2,2]
Odds.Aspirin <- contin.table[1,1]/contin.table[1,2]
OR <- Odds.Aspirin/Odds.Placebo
cat("The Risk Ratio is = ",RR,"\n")
cat("The Odds ratio is = ",OR,"\n")
cat("The two ratios are almost the same because of the very low probability of a heart
 attack")


### Using a table with smaller N and higher proportions-- From Ex19.18

dabbs <- matrix(c(402, 3614, 101, 345), byrow = TRUE, ncol = 2)
rownames(dabbs) <- c("Normal", "High")
colnames(dabbs) <- c("Delinq", "Non-delinq")
chisq.test(dabbs, correct = FALSE)
cat("The following will produce a risk ratio \n")
RR <- (402/4016)/(101/446)
OR <- (402/3614)/(101/345)
cat("The Risk Ratio is = ",RR,"\n")
cat("The Odds ratio is = ",OR,"\n")
- - - - - - - - - - - - - - - - - - - -
The Risk Ratio is =  0.442024 
The Odds ratio is =  0.379958

dch: