The chi-square test is quite easy to run in R because we have a function "chisq.test()" built into the base package of R. It can be used either for a one dimensional design or for a contingency table. I will start with the one-way design in Table 19.1. These are the data collected by Emily Rosa on therapeutic touch. The analysis is remarkably simple. I have included a barplot, and would encourage you to play with different parameters of that function call to see what happens.
### From Table 19.1 observed <- c(123, 157) names(observed) <- c("Correct", "Incorrect") barplot(observed, density = 10, col = "blue", ylab = "Number of Correct Trials", main = "Emily's Data") result <- chisq.test(observed, correct = FALSE) prob <- result$p.value cat("The chi-square statistic is = ", result$statistic, "\nwith a probability of ", prob) print("The complete printout is \n") print(result) ### Let's draw a chi-square distribution of 4 df x <- seq(0,20, by = .1) y <- dchisq(x = x, df = 4) plot(y~x, type = "l") ### Compute critical value on 4df cutoff <- qchisq(p = .95, df = 4) cat("The critical value of chi-square at .05 on 4 df = \n",cutoff) xx <- seq(cutoff, 25, .01) yy <- dchisq(xx, df = 4) polygon(x = c(cutoff,xx), y = c(0,yy), density = 5) value <- expression(chi ^2) legend.value <- c(value, "= 9.488") legend(15, .05, legend.value, bty = "n") arrows(15,.035, 11, .018)
There is a lot in that set of code. I calculated chi-square for Emily's data, which had a probability under the null of .042, which is significant at p = .05. Something that may be new in that code is a statement that uses "result$statistic." When chisq.test() does its thing, it stores the output in an object called "result." But that object includes a whole bunch of stuff, including the value of chi-sq, its degrees of freedom, and its probability under the null. If you type "names(result)" you will see a list of the parts of result.
> names(result) [1] "statistic" "parameter" "p.value" "method" "data.name" [6] "observed" "expected" "residuals" "stdres"
The chi-square statistic is labeled "statistic", and can be retrieved by "result$statistic." The probability value can be retrieved by "result$p.value", and so on.
Even though the previous example only had 1 df, I used 4 df for this example to show how skewed the distribution is. Below that I calculated the critical value of chi-sq for 4 df and printed it on the plot.
We can move on to a simple 2 x 2 contingency table using the data from Walsh et al. (2006) data on Prozac and Anorexia. Here we enter the data as a matrix, specifying the orientation of the data.
### 2 x 2 Contingency Table Data from Walsh et al. 2006 contin.table = matrix(c(13, 36, 14, 30), byrow = TRUE, ncol = 2) test.stat <- chisq.test(contin.table, correct = FALSE) print(test.stat) Fisher.exact <- fisher.test(contin.table, ) print(Fisher.exact) - - - - - - - - - - - - - - - - - - - - - - Pearson's Chi-squared test data: contin.table X-squared = 0.3146, df = 1, p-value = 0.5749 Fisher's Exact Test for Count Data data: contin.table p-value = 0.6501 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 0.2859687 2.0897821 sample estimates: odds ratio 0.7759623
The last thing that I want to show for this chapter is the calculation of risk ratios and odds ratios. I use the data on heart attacks and aspirin in Table 19.6. I will also use the data for Exercise 19.18 because with higher probability values the risk ratio and odds ratio will differ. There is an R package called "epicalc" that we could use to compute these values, but it adds a layer of complexity that we can do without. So I simply perform the arithmetic myself. But first a bit about using matrices
Notice that for both of the analyses I have read the data into a matrix. How that works should be clear. When it comes to calculating Risk.placebo, I specify "contin.table[2,1]/sum(contin.table[2,]" The first is the entry in cell(2,1), which is 189. The second is the sum of the cells in row 2, specified as contin.table[2,]. The [2,] refers to all of the entries in row 2. Similarly, contin.table[,2] would be all of the entries in column 2, whereas [1,2] would be the first entry in column 2. When I came down to using the example from Dabbs and Morris (1990), I simply entered the numerical values, rather than the cell locations. Either approach works.
dch:### Odds ratios and risk ratios contin.table <- matrix(c(104, 10933, 189, 10845), byrow = TRUE, ncol = 2) rownames(contin.table) <- c("Aspirin","Placebo") colnames(contin.table) <- c("Heart Attack","No Heart Attacl") print(contin.table) Risk.Placebo <- contin.table[2,1]/sum(contin.table[2,]) Risk.Aspirin <- contin.table[1,1]/sum(contin.table[1,]) RR <- Risk.Aspirin/Risk.Placebo cat("The risk ratio for heart attack given Aspirin = ",RR) Odds.Placebo <- contin.table[2,1]/contin.table[2,2] Odds.Aspirin <- contin.table[1,1]/contin.table[1,2] OR <- Odds.Aspirin/Odds.Placebo cat("The Risk Ratio is = ",RR,"\n") cat("The Odds ratio is = ",OR,"\n") cat("The two ratios are almost the same because of the very low probability of a heart attack") ### Using a table with smaller N and higher proportions-- From Ex19.18 dabbs <- matrix(c(402, 3614, 101, 345), byrow = TRUE, ncol = 2) rownames(dabbs) <- c("Normal", "High") colnames(dabbs) <- c("Delinq", "Non-delinq") chisq.test(dabbs, correct = FALSE) cat("The following will produce a risk ratio \n") RR <- (402/4016)/(101/446) OR <- (402/3614)/(101/345) cat("The Risk Ratio is = ",RR,"\n") cat("The Odds ratio is = ",OR,"\n") - - - - - - - - - - - - - - - - - - - - The Risk Ratio is = 0.442024 The Odds ratio is = 0.379958