 
Chi-Square Analyses of Categorical Data
10/2/01
Announcements:
Introduction:
 Categorical (Qualitative) data
 Point to differences from data on which we calculate means and standard deviations.
 We have seen examples of each in lab, with Liddle's means and
Siegel's frequencies.
 Our individual observations can be classified into a few bins or cells
 The basic result is a set of counts.
 I'll start with the simplest possible models and move up in complexity.
 Note that I am always asking whether a particular model will fit a particular set of
data.
 The model usually defines the null hypothesis.
 A model is a statement about the process behind the data. For example:
 "The data are normally distributed"
 "The data depend only on row and column parameters."
 "The data are equally distributed across a set of cells."
 etc.
Goodness of fit tests
(Present this as a problem concerning therapeutic touch, rather than as a
statistical technique looking for an example.)
 These are the simplest models.
 Responses are distributed along one dimension, as opposed to two or more for the next
type of test.
 Example
 The following came from an excellent website named Chance
News, and the data that I am going to talk about come from an
article they cite in the New York Times.
 Therapeutic touch.

A child's paper poses a medical challenge. The New York Times, 1 April 1998, A1. Gina Kolata
{The paper actually appeared in the 1998 volume of JAMA, the abstract
of which is attached. dch. The full paper can be found at http://www.amaassn.org/scipubs/journals/archive/jama/vol_279/no_13/joc71352.htm.
In addition, a presentation on this issue for a Chance Course Seminar can be found at http://www.math.yorku.ca/Who/Faculty/Brettler/public_html/chance/touch.html.}
The practice of therapeutic touch is used in hospitals all over the world and is taught in
medical and nursing schools. In this therapy, trained practitioners manipulate something
that they call the 'human energy field'. This manipulation is carried out without actually
touching the patient's body. Practitioners of this therapy claim that anyone can be
trained to feel this energy field.
Some researchers say that there exists no reliable evidence showing that this technique
heals patients. Dr. Donal O'Mathuna, a professor of bioethics and chemistry at the Mount
Carmel School of Nursing in Columbus, Ohio, has reviewed more than 100 papers and doctoral
dissertations on this technique without finding any convincing data.
James Randi, a magician who is a well-known skeptic of some types of alternative medicine,
has been trying to test the practice of therapeutic touch for years. Only one person has
agreed to submit to his test, but when she was tested she did no better than chance in
detecting the energy field.
Emily Rosa, an 11-year-old in Colorado, was able to recruit 21 practitioners of
therapeutic touch in an experiment she conducted two years ago. Emily's mother, who is a
nurse and a skeptic of this technique, believes that Emily was able to recruit this many
subjects because they did not feel threatened by a 9-year-old girl working on a project
for a science fair.
Her test consisted of placing a screen between a subject's eyes and hands, and then
holding her own hand over one hand or the other of the subject. The premise of this
experiment is that if, in fact, the subject can feel Emily's energy field, then the
subject should be able to determine over which hand Emily's hand is being held. Emily
conducted 280 tests with 21 subjects, and they identified the correct location of her hand
in 44% of the tests.
The results of her study were reported this month in the Journal of the American Medical
Association. Reaction was swift from proponents of the therapeutic touch technique.
Meanwhile, Emily recently received a letter from the Guinness Book of World Records, saying
she may be the youngest person ever to publish a paper in a major scientific journal.
DISCUSSION QUESTIONS:
(1) One practitioner of therapeutic touch, in response to Emily's results, stated that
people who use this technique rely on more than just touch to sense the energy field. They
also use 'the sense of intuition and even a sense of sight'. Other users of this method
claim that patients who are ill have hot or cold spots in their energy fields in some
cases, or have areas that feel tingly. Can you design an experiment that allows the
practitioner to use senses of touch, sight, and intuition, and that still tests whether
the technique is a valid one?
(2) How likely is so poor a showing as 123 or fewer correct responses out of 280 tests,
given no real ability to detect the energy field?

Emily Rosa
Emily giving her keynote speech at the 1998 Ig Nobel Award ceremonies.
Emily was also a 1998 invited speaker at the Ig Nobel prize ceremony in Boston. The Ig Nobel is a highly contested prize awarded by a committee associated with the Annals of Improbable Research. The committee awarded an Ig Nobel prize, which is not a badge of honor, to Dolores Krieger for her paper on Therapeutic Touch.
Emily found a 44% accuracy rate out of 280 trials, which would mean that
subjects were correct on 123 trials, and incorrect on 157 trials. We want
to know if this result is more divergent from a 50/50 split than would be
expected by chance.
I recognize that this is actually a worse than chance outcome for
those who support TT, but we didn't know that before the experiment began. It
still makes sense to test the null hypothesis that the probability of a
correct response is .50.
 We are going to test the hypothesis that there would be 123 or fewer correct responses
out of 280 tests if subjects are not able to sense the presence of Emily's hand.
(We will also turn this into a two-tailed test, which I much prefer.)
 We are actually ignoring a problem that I would not ignore in "real life." Because the 280 responses came from 21 subjects, the responses technically are not independent, although it is hard to believe that, if subjects cannot sense the presence of Emily's hand, the lack of independence would create difficulties.
 Ask what difficulties it might create if there is such a thing as therapeutic touch.
 There are several different ways to do this.
 z test
 We have already talked about this. (Well, I wrote about it, for
those who have read the past notes.)
 I'm going to elaborate on the z test because it parallels chi-square in the simple case.
 If we repeat this experiment an infinite number of times, where the probability of a
correct response is .50 and there are 280 trials, we'll have a distribution of outcomes.
 Notice that there have been many occasions this semester when we
actually did repeat an experiment many times. This is the first
time that we actually see what would happen without going through
all of those repetitions. That is really what statistical tests are
all about.
 These outcomes will be approximately normally distributed, with a mean of Np and a variance
of Npq, where N = 280, p = .50, and q = 1 - .50 = .50
 These values come from what statisticians know about the
binomial distribution, which I mentioned about 2 classes back.
 This is a mean of 140 and a variance of 70, for a standard deviation of 8.367.
 z = (X - Np)/sqrt(Npq) = (123 - 140)/8.367 = -2.03
 With a one-tailed test, the probability of a z as low as -2.03 (that is, 123 or fewer correct) = .0212
 For a two-tailed test (X ≤ 123 or X ≥ 157) this would be 2(.0212) = .0424
 We would reject the null hypothesis that subjects are responding at random.
 But notice, subjects are correct less often than the "therapeutic touch" model would predict. What do we do with this? (That's an interesting problem.)
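The z test above can be checked with a short sketch. This is an illustration in Python using only the standard library, not the software used in class; the function names are my own. It also computes the exact binomial lower tail for comparison, which the notes do not do.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_z_test(correct, n, p=0.5):
    """Normal approximation to the binomial: z, one-tailed p, two-tailed p."""
    mean = n * p                       # Np
    sd = math.sqrt(n * p * (1 - p))    # sqrt(Npq)
    z = (correct - mean) / sd
    one_tailed = normal_cdf(-abs(z))
    return z, one_tailed, 2 * one_tailed

def exact_lower_tail(correct, n, p=0.5):
    """Exact binomial probability of `correct` or fewer successes."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(correct + 1))

z, p_one, p_two = binomial_z_test(123, 280)
p_exact = exact_lower_tail(123, 280)
print(f"z = {z:.2f}, one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
print(f"exact binomial P(X <= 123) = {p_exact:.4f}")
```

The normal approximation uses no continuity correction, so the exact binomial tail will come out a little larger than the z-based probability.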
Contingency Tables
Again, present this as a problem looking for a solution, rather than a solution
looking for an example.
I'm going to start with the results of last Thursday's lab
 We first generated data with the null hypothesis true for Siegel's
study.
 This was the study in which rats were given injections of morphine in the
same, or different, environment in which they had built up tolerance.
 Siegel's data are categorized on two dimensions (Group 1 or Group 2, and Survive or Die).
 His actual results are shown below:

           Survived   Died   Totals
Group 1       21        9      30
Group 2       11       19      30
Totals        32       28      60
 If context makes no difference, the probability of survival should be
the same in both groups.
 We made that probability = .5333 because that was Siegel's overall
survival rate.
 Notice that we are replicating the results to be expected when the
null is true.
 We generated 15 chi-square values per student, for 135 values overall.
 The results follow; they are pooled across the last three years:
Notice the shape of this distribution. It decreases at a negatively accelerated rate. Very few of the values are greater than about 4. In a perfect chi-square distribution with 1 df, the cutoff for the upper 5% would lie at 3.84.
Notice how closely this distribution matches the previous chi-square distribution with 1 df.
I'll come back to what this would look like when the null is false later.
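The lab exercise can be imitated with a small simulation. What follows is only a sketch in Python: the seed, the number of replications, and the function names are my assumptions, and it is not the program we used in lab. It draws two groups of 30 with the same survival probability (.5333), computes a Pearson chi-square for each replication, and counts how often the value exceeds 3.84.

```python
import random

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected == 0:      # an empty row or column carries no information
                return 0.0
            total += (observed[i][j] - expected) ** 2 / expected
    return total

def simulate_null(n_sims, n_per_group=30, p_survive=0.5333, seed=2001):
    """Generate chi-square values when the null hypothesis is true."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_sims):
        s1 = sum(rng.random() < p_survive for _ in range(n_per_group))
        s2 = sum(rng.random() < p_survive for _ in range(n_per_group))
        values.append(chi_square_2x2(s1, n_per_group - s1,
                                     s2, n_per_group - s2))
    return values

values = simulate_null(5000)
prop_above = sum(v > 3.84 for v in values) / len(values)
print(f"proportion of chi-square values above 3.84: {prop_above:.3f}")
```

With the null true, the proportion above 3.84 should land somewhere near .05, although the counts are discrete so it will not be exact.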
Another Example
 Friedman, Katcher, Lynch, and Thomas (1980) did an interesting study on the effect of
having a pet for people recovering from heart attacks. I don't recall whether they
supplied a pet or just found people who did, and did not, have pets.
 What difference would this make from a methodological perspective?
 They found 92 people who had recently had a heart attack, and classified them in terms
of whether or not they had a pet. They then determined whether these people were alive one
year later.
 Here we have two variables of classification:
 Pet (yes/no)
 Alive/Dead
 Notice that in this case the row frequencies are not equal. That
is not a problem, and, in fact, it's kind of nice that so few people
died.
 We want to test the null hypothesis that Pet and Survival are independent.

                 Pet
           Yes     No    Total
Alive       50     28      78
Dead         3     11      14
Total       53     39      92
The next task is to find the expected frequencies if subjects fall
in cells at random within the constraints of the row and column
totals.
 If rows and columns are independent, the multiplicative law of
probability tells us that the probability of falling in row_{1} col_{1}
= the product of the probability of row_{1} times the probability of col_{1}.
 p(row_{1} ) = freq(row_{1} )/N = 78/92 = .848
 p(col_{1} ) = freq(col_{1})/N = 53/92 = .576
 Then, the p_{11} = .848*.576 = .4885.
 If there are 92 subjects overall, then .4885*92 = 44.94 would be expected to fall in
cell_{11}
 We can put this into a formula:
 E_{11}=(Row_{1}*Col_{1})/N
 Or, in the general case, E_{ij}=(Row_{i}*Col_{j})/N
 For Row_{2}Col_{1} E_{21}=14*53/92 = 8.07
 Filling in the rest of the cells, we get
 Expected Frequencies

                 Pet
           Yes      No     Total
Alive     44.94   33.06      78
Dead       8.07    5.93      14
Total     53      39         92
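We will use the same formula for chi-square, but this time we will calculate it over the
four cells of the table.
 χ^{2} = Σ (O - E)^{2}/E
 = (50 - 44.94)^{2}/44.94 + (28 - 33.06)^{2}/33.06 + (3 - 8.07)^{2}/8.07 + (11 - 5.93)^{2}/5.93
 = 0.57 + 0.77 + 3.19 + 4.33 = 8.86
 For a contingency table, df = (R - 1)(C - 1), which in this case is 1.
 We already know that with 1 df the critical value of chi-square = 3.84.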
 So we will reject our null hypothesis and conclude that there is a relationship between
having a pet and living for a decent length of time after a heart attack.
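The expected-frequency rule E_{ij}=(Row_{i}*Col_{j})/N and the chi-square sum can be written as one short function. This is a minimal sketch in plain Python; the function name and the list-of-rows layout are my own choices for illustration. It works for a table of any size.

```python
def pearson_chi_square(table):
    """Return (chi-square, df, expected frequencies) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = Row_i * Col_j / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Rows: Alive, Dead; columns: Pet yes, Pet no
pets = [[50, 28], [3, 11]]
chi2, df, expected = pearson_chi_square(pets)
print(f"chi-square = {chi2:.2f} on {df} df")
for row in expected:
    print([round(e, 2) for e in row])
```

Because the function carries the expected frequencies at full precision, its chi-square (about 8.85) can differ in the second decimal from a hand calculation based on rounded expected values.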
More Complex Contingency Tables
 These are data from Jody Kamon (1998), but I don't recall where she got them.
 The experiment involves the relationship between problem behavior in children and their
parents.
 Kamon (?) classified parents with respect to whether they exhibited Antisocial
Personality Disorder (APD)
 She also classified children with respect to whether or not they were diagnosed as
Conduct Disorder (CD), Oppositional Defiant Disorder (ODD) or no problem (Control)
 Observed Frequencies

                            Child's Diagnosis
Parent's Diagnosis     CD     ODD    Control   Total
APD                    27      16       3        46
Non-APD                41      54      36       131
Total                  68      70      39       177
We calculate the expected frequencies in exactly the same way we did above. These are
given in the following table.
 Expected Frequencies

                            Child's Diagnosis
Parent's Diagnosis     CD      ODD    Control   Total
APD                  17.67   18.19    10.14       46
Non-APD              50.33   51.81    28.86      131
Total                68      70       39         177

 χ^{2} = Σ (O - E)^{2}/E = 4.93 + 0.26 + 5.03 + 1.73 + 0.09 + 1.77 = 13.81
 Here we have (2 - 1)(3 - 1) = 2 df.
 The critical value of chi-square with 2 df = 5.99
 Again we will reject the null hypothesis and conclude that the diagnosis of the child is
not independent of the diagnosis of the parent.
 If we look at the data we see that children are more likely to be diagnosed as CD
or ODD if their parents have a diagnosis of APD.
 We could check this better if we combined the CD and ODD cells, which I'll do next with
SPSS.
 Notice that I slipped into a table larger than 2 X 2 without any problem.
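To see which cells drive the result, it can help to list each cell's contribution (O - E)^{2}/E. The sketch below does that in plain Python for the parent-by-child table; the dictionary layout and variable names are assumptions made for illustration.

```python
observed = {
    "APD":     {"CD": 27, "ODD": 16, "Control": 3},
    "Non-APD": {"CD": 41, "ODD": 54, "Control": 36},
}

row_totals = {parent: sum(cells.values()) for parent, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for child, count in cells.items():
        col_totals[child] = col_totals.get(child, 0) + count
n = sum(row_totals.values())

# Contribution of each cell to chi-square: (O - E)^2 / E
contributions = {}
for parent, cells in observed.items():
    for child, count in cells.items():
        expected = row_totals[parent] * col_totals[child] / n
        contributions[(parent, child)] = (count - expected) ** 2 / expected

chi2 = sum(contributions.values())
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Largest contributions first
for (parent, child), value in sorted(contributions.items(),
                                     key=lambda kv: -kv[1]):
    print(f"{parent:8s} {child:8s} {value:5.2f}")
print(f"chi-square = {chi2:.2f} on {df} df")
```

The two largest contributions come from the APD row (too few Control children and too many CD children relative to expectation), which fits the reading of the table given above.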
SPSS Analysis
 First we need to create the data file without combining anything.
 Enter a column for Child and a column for Parents
 You can enter CD, ODD, etc instead of numbers, but you need to create all six cells.
 Then create a column called Freq (or whatever) and enter the cell frequencies.
 Go to the Data menu entry and select "Weight Cases." Tell it to weight cases by freq.
 Analyze/Descriptive Statistics/Crosstabs; put child on columns and parents on rows.
 Be sure to click on Statistics and tell it to compute chi-square.
 The data look as follows:

 The printout would be as follows.
That chi-square is, within rounding, the same as we calculated above.
Then I recoded Child into NewChild, making CD and ODD into Problem and leaving Control as Control. (Note: I had to specify that the new variable was a string variable.)
The results follow:
Interpret this result.
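For anyone without SPSS at hand, the same recode can be mimicked in a few lines. This is a sketch in plain Python, not SPSS syntax: the data are entered one row per cell with a frequency (as in the weighted data file), CD and ODD are mapped to Problem, and an uncorrected Pearson chi-square is computed on the collapsed 2 X 2 table. The names `recode`, `cells`, and `collapsed` are hypothetical.

```python
# One row per cell, as in the weighted data file: (Parent, Child, Freq)
cells = [
    ("APD", "CD", 27), ("APD", "ODD", 16), ("APD", "Control", 3),
    ("Non-APD", "CD", 41), ("Non-APD", "ODD", 54), ("Non-APD", "Control", 36),
]

# Recode Child into NewChild
recode = {"CD": "Problem", "ODD": "Problem", "Control": "Control"}

collapsed = {}
for parent, child, freq in cells:
    key = (parent, recode[child])
    collapsed[key] = collapsed.get(key, 0) + freq

parents = ["APD", "Non-APD"]
new_child = ["Problem", "Control"]
table = [[collapsed[(p, c)] for c in new_child] for p in parents]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-square without a continuity correction
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)
print(table)
print(f"chi-square = {chi2:.2f} on 1 df")
```

If you compare this with a printout, keep in mind that SPSS also reports a continuity-corrected value for 2 X 2 tables, which will be smaller.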
Measuring the size of an effect
One of the most important recent developments in behavioral statistics
is the emphasis on effect sizes in addition to (if not in place of) statistical
hypothesis tests.
When it comes to contingency tables, perhaps the best measure is the odds
ratio, especially for a 2 X 2 table.
Odds Ratios
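As an illustration, here is a minimal sketch of an odds ratio for the pet data given earlier (50 alive and 3 dead with a pet; 28 alive and 11 dead without). It uses the standard definition, odds = successes/failures, and the function name is my own.

```python
def odds(successes, failures):
    """Odds of an outcome: how many times it occurred per time it did not."""
    return successes / failures

odds_pet = odds(50, 3)        # odds of survival for pet owners
odds_no_pet = odds(28, 11)    # odds of survival for non-owners
ratio = odds_pet / odds_no_pet

print(f"odds with a pet    = {odds_pet:.2f}")
print(f"odds without a pet = {odds_no_pet:.2f}")
print(f"odds ratio         = {ratio:.2f}")
```

An odds ratio of 1 would mean that having a pet makes no difference to the odds of survival; the further the ratio is from 1, the larger the effect.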
Likelihood ratio tests
This is an alternative way of calculating a χ^{2} statistic as a test of the null hypothesis.
 There is some evidence that it is a better statistic than Pearson's chi-square with small sample sizes, but I doubt that. The following is a quote from Agresti, 1990 (p. 49):
 "When independence holds, the Pearson statistic χ^{2} and the likelihood-ratio statistic G^{2} have asymptotic chi-squared distributions with df = (I - 1)(J - 1). In fact, χ^{2} and G^{2} are asymptotically equivalent in that case: χ^{2} - G^{2} converges in probability to zero. The limiting results for multinomial sampling also apply to the other sampling schemes...
 "It is not simple to describe the sample size needed for the chi-squared distribution to approximate well the exact distributions of χ^{2} and G^{2}. For a fixed number of cells, χ^{2} usually converges more quickly than G^{2}. The chi-squared approximation is usually poor for G^{2} when N/IJ < 5. When I or J is large, it can be decent for χ^{2} for n_{ij} as small as 1, if the table does not contain both very small and moderately large expected frequencies...."
Likelihood ratio chi-square is heavily used in log-linear models, which are like ANOVA for categorical data.
I give the formula in the text
Explain formula
Do this by hand on the AIDS data.

              Survived        Died        Total
Ritonavir   472 (433.90)   71 (109.10)     543
Placebo     399 (437.10)  148 (109.90)     547
Total           871           219         1090
(Expected frequencies are in parentheses.)
G^{2} = 2 Σ O*ln(O/E)
= 2[472*ln(472/433.90) + 71*ln(71/109.10) + 399*ln(399/437.10) + 148*ln(148/109.90)]
= 2[16.886] = 33.77
which is quite close to the Pearson chi-square of 33.18.
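The hand calculation can be checked with a short sketch. The Python below (standard library only; the function names are my own) computes both the likelihood ratio statistic G^{2} = 2 Σ O*ln(O/E) and the Pearson statistic for the Ritonavir table, carrying the expected frequencies at full precision.

```python
import math

def expected_frequencies(table):
    """E_ij = Row_i * Col_j / N for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def likelihood_ratio_chi_square(table):
    """G^2 = 2 * sum of O * ln(O / E); empty cells contribute zero."""
    expected = expected_frequencies(table)
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )

def pearson_chi_square(table):
    """Sum of (O - E)^2 / E over all cells."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Rows: Ritonavir, Placebo; columns: Survived, Died
aids = [[472, 71], [399, 148]]
g2 = likelihood_ratio_chi_square(aids)
x2 = pearson_chi_square(aids)
print(f"likelihood ratio G^2 = {g2:.2f}")
print(f"Pearson chi-square   = {x2:.2f}")
```

Note that G^{2} is sensitive to how the ratios O/E are rounded, so a hand calculation with ratios rounded to three decimals can differ from this result in the first decimal place.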
The SPSS output is
Last revised: 10/01/01
