The chi-square test is quite easy to run in R because we have a function "chisq.test()" built into the base package of R. It can be used either for a one dimensional design or for a contingency table. I will start with the one-way design in Table 19.1. These are the data collected by Emily Rosa on therapeutic touch. The analysis is remarkably simple. I have included a barplot, and would encourage you to play with different parameters of that function call to see what happens.
### From Table 19.1
observed <- c(123, 157)
names(observed) <- c("Correct", "Incorrect")
barplot(observed, density = 10, col = "blue", ylab = "Number of Correct Trials",
main = "Emily's Data")
result <- chisq.test(observed, correct = FALSE)
prob <- result$p.value
cat("The chi-square statistic is = ", result$statistic, "\nwith a probability of ", prob)
print("The complete printout is \n")
print(result)
### Let's draw a chi-square distribution of 4 df
x <- seq(0,20, by = .1)
y <- dchisq(x = x, df = 4)
plot(y~x, type = "l")
### Compute critical value on 4df
cutoff <- qchisq(p = .95, df = 4)
cat("The critical value of chi-square at .05 on 4 df = \n",cutoff)
xx <- seq(cutoff, 25, .01)
yy <- dchisq(xx, df = 4)
polygon(x = c(cutoff,xx), y = c(0,yy), density = 5)
value <- expression(chi ^2)
legend.value <- c(value, "= 9.488")
legend(15, .05, legend.value, bty = "n")
arrows(15,.035, 11, .018)
There is a lot in that set of code. I calculated chi-square for Emily's data, which had a probability under the null of .042, which is significant at p = .05. Something that may be new in that code is a statement that uses "result$statistic." When chisq.test() does its thing, it stores the output in an object called "result." But that object includes a whole bunch of stuff, including the value of chi-sq, its degrees of freedom, and its probability under the null. If you type "names(result)" you will see a list of the parts of result.
> names(result) [1] "statistic" "parameter" "p.value" "method" "data.name" [6] "observed" "expected" "residuals" "stdres"
The chi-square statistic is labeled "statistic", and can be retrieved by "result$statistic." The probability value can be retrieved by "result$p.value", and so on.
Even though the previous example only had 1 df, I used 4 df for this example to show how skewed the distribution is. Below that I calculated the critical value of chi-sq for 4 df and printed it on the plot.
We can move on to a simple 2 x 2 contingency table using the data from Walsh et al. (2006) data on Prozac and Anorexia. Here we enter the data as a matrix, specifying the orientation of the data.
### 2 x 2 Contingency Table Data from Walsh et al. 2006 contin.table = matrix(c(13, 36, 14, 30), byrow = TRUE, ncol = 2) test.stat <- chisq.test(contin.table, correct = FALSE) print(test.stat) Fisher.exact <- fisher.test(contin.table, ) print(Fisher.exact) - - - - - - - - - - - - - - - - - - - - - - Pearson's Chi-squared test data: contin.table X-squared = 0.3146, df = 1, p-value = 0.5749 Fisher's Exact Test for Count Data data: contin.table p-value = 0.6501 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 0.2859687 2.0897821 sample estimates: odds ratio 0.7759623
The last thing that I want to show for this chapter is the calculation of risk ratios and odds ratios. I use the data on heart attacks and aspirin in Table 19.6. I will also use the data for Exercise 19.18 because with higher probability values the risk ratio and odds ratio will differ. There is an R package called "epicalc" that we could use to compute these values, but it adds a layer of complexity that we can do without. So I simply perform the arithmetic myself. But first a bit about using matrices
Notice that for both of the analyses I have read the data into a matrix. How that works should be clear. When it comes to calculating Risk.placebo, I specify "contin.table[2,1]/sum(contin.table[2,]" The first is the entry in cell(2,1), which is 189. The second is the sum of the cells in row 2, specified as contin.table[2,]. The [2,] refers to all of the entries in row 2. Similarly, contin.table[,2] would be all of the entries in column 2, whereas [1,2] would be the first entry in column 2. When I came down to using the example from Dabbs and Morris (1990), I simply entered the numerical values, rather than the cell locations. Either approach works.
### Odds ratios and risk ratios
contin.table <- matrix(c(104, 10933, 189, 10845), byrow = TRUE, ncol = 2)
rownames(contin.table) <- c("Aspirin","Placebo")
colnames(contin.table) <- c("Heart Attack","No Heart Attacl")
print(contin.table)
Risk.Placebo <- contin.table[2,1]/sum(contin.table[2,])
Risk.Aspirin <- contin.table[1,1]/sum(contin.table[1,])
RR <- Risk.Aspirin/Risk.Placebo
cat("The risk ratio for heart attack given Aspirin = ",RR)
Odds.Placebo <- contin.table[2,1]/contin.table[2,2]
Odds.Aspirin <- contin.table[1,1]/contin.table[1,2]
OR <- Odds.Aspirin/Odds.Placebo
cat("The Risk Ratio is = ",RR,"\n")
cat("The Odds ratio is = ",OR,"\n")
cat("The two ratios are almost the same because of the very low probability of a heart
attack")
### Using a table with smaller N and higher proportions-- From Ex19.18
dabbs <- matrix(c(402, 3614, 101, 345), byrow = TRUE, ncol = 2)
rownames(dabbs) <- c("Normal", "High")
colnames(dabbs) <- c("Delinq", "Non-delinq")
chisq.test(dabbs, correct = FALSE)
cat("The following will produce a risk ratio \n")
RR <- (402/4016)/(101/446)
OR <- (402/3614)/(101/345)
cat("The Risk Ratio is = ",RR,"\n")
cat("The Odds ratio is = ",OR,"\n")
- - - - - - - - - - - - - - - - - - - -
The Risk Ratio is = 0.442024
The Odds ratio is = 0.379958
dch: