A child's paper poses a medical challenge. The New York Times, 1 April 1998, A1, Gina Kolata.
{The paper actually appeared in the 1998 edition of JAMA, the abstract of which is attached. dch. The full paper can be found at http://www.ama-assn.org/sci-pubs/journals/archive/jama/vol_279/no_13/joc71352.htm.
In addition, a presentation on this issue for a Chance Course Seminar can be found at http://www.math.yorku.ca/Who/Faculty/Brettler/public_html/chance/touch.html.}
[Photo: Emily Rosa giving her keynote speech at the 1998 Ig Nobel Award ceremonies.]
Emily was also a 1998 invited speaker at the Ig Nobel prize ceremony in Boston, a highly contested prize awarded by a committee associated with the Annals of Improbable Research. The committee awarded an Ig Nobel prize, which is not a badge of honor, to Dolores Krieger for her work on Therapeutic Touch.
Emily found a 44% accuracy rate out of 280 trials, which means that subjects were correct on 123 trials and incorrect on 157 trials. We want to know whether this result departs further from a 50/50 split than would be expected by chance. I recognize that this is actually a worse-than-chance outcome for those who support TT, but we didn't know that before the experiment began. It still makes sense to test the null hypothesis that the probability of a correct response is .50.
- We are going to ask how likely it would be to obtain 123 or fewer correct responses out of 280 trials if subjects are not able to sense the presence of Emily's hand. (We will also turn this into a two-tailed test, which I much prefer.)
- We are actually ignoring a problem that I would not ignore in "real life." Because the 280 responses came from 21 subjects, this means that the responses technically are not independent, although it is hard to believe that if subjects cannot sense the presence of Emily's hand, the lack of independence would create difficulties.
- Ask what difficulties it might create if there is such a thing as therapeutic touch.
- There are several different ways to do this.
- z test
- We have already talked about this. (Well, I wrote about it, for those who have read the past notes.)
- I'm going to elaborate on the z test because it parallels chi-square in the simple case.
- If we repeat this experiment an infinite number of times, where the probability of a correct response is .50 and there are 280 trials, we'll have a distribution of outcomes.
- Notice that there have been many occasions this semester when we actually did repeat an experiment many times. This is the first time that we actually see what would happen without going through all of those repetitions. That is really what statistical tests are all about.
- These outcomes will be approximately normally distributed, with a mean of Np and a variance of Npq, where N = 280, p = .50, and q = 1 - .50 = .50.
- These values come from what statisticians know about the binomial distribution, which I mentioned about 2 classes back.
- This is a mean of 140 and a variance of 70, for a standard deviation of 8.367
- With a one-tailed test, z = (123 - 140)/8.367 = -2.032, and the probability of a z that low is .0212.
- For a two-tailed test (X ≤ 123 or X ≥ 157) this would be 2(.0212) = .0424.
- We would reject the null hypothesis that subjects are responding at random.
- But notice, subjects are correct less often than the "therapeutic touch" model would predict. What do we do with this? (That's an interesting problem.)
- Chi-square test.
- An alternative test is to use the chi-square distribution.
- We calculate the number of correct and incorrect responses we would expect if the null were true, and then we compare that with the number of obtained responses.
- If H0 were true, we would expect 140 correct responses and 140 incorrect responses.
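| | Correct | Incorrect | Total |
|---|---|---|---|
| Observed | 123 | 157 | 280 |
| Expected | 140 | 140 | 280 |

- The chi-square statistic compares the observed (O) and expected (E) frequencies:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(123 - 140)^2}{140} + \frac{(157 - 140)^2}{140} = 2.064 + 2.064 = 4.129$$

- This statistic follows a chi-square distribution, which depends on the degrees of freedom. For the goodness-of-fit test, df = c - 1, where c is the number of categories, so we have 2 - 1 = 1 df in this case.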
- Such a distribution is shown below, though I had to draw in the curve for 1 df by hand. This one came from David Lane's Hyperstat site:
http://www.ruf.rice.edu/~lane/rvls.html
- (This is purely an aside, but if you go to http://www.ruf.rice.edu/~lane/stat_analysis/index.html and click on Analyze, it will shortly give you a window in which one of your choices is "Datasets." Click on that and proceed.)
- You could do much worse than wasting a few minutes looking at David Lane's site. It is pretty impressive.
[Figure: chi-square distributions for several df, with the 1 df curve drawn in by hand. My curve for 1 df should be displaced a bit to the left and down.]
- You can probably guess from this curve for 1 df that a chi-square of 4.13 or greater is not very likely.
- In fact, the critical value that cuts off the upper 5% of the distribution with 1 df is 3.84.
- We will again reject our null hypothesis because 4.13 > 3.84.
- We can conclude that the number of correct guesses departs from what we would expect under the null.
- But keep in mind that we are rejecting the null because there are too few correct choices.
- Chi-square and z
- In the case where we have only two categories (right and wrong), the z test and the chi-square test turn out to be exactly equivalent, though the chi-square is by nature a two-tailed test.
- The chi-square distribution for 1 df is just the square of the z distribution.
- sqrt(4.129) = 2.032
- Note that we had a z = -2.032. Squaring this gives 4.129, which, within rounding, is our chi-square.
- Note also that the critical value of z = 1.96, which, when squared, = 3.8416 = the critical value of chi-square.
- Then why bother with chi-square?
- The equality only holds if we have 1 df.
- If we asked subjects to say "right", "left", or "middle", and if Emily chose to put her hand in those three positions 1/3 of the time, then we would have expected frequencies of 93.33, and would have 2 df for our chi-square. I'm not going to show that here, but it is just a simple extension of what we have done. The z test would no longer apply.
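For anyone who wants to check these numbers without tables, here is a minimal Python sketch that uses only the standard library (no SciPy is assumed). It computes the z test from Np and Npq, the goodness-of-fit chi-square, and shows that z² equals chi-square and that both give the same two-tailed p. The helper names (`normal_cdf`, `chi_square_gof`, `chi_square_upper_tail_1df`) are hypothetical, written for this illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_upper_tail_1df(x):
    """P(chi-square with 1 df >= x); uses the fact that chi-square(1) = z^2."""
    return math.erfc(math.sqrt(x / 2))

# Rosa's data: 123 correct out of 280 trials; under H0, p = .50
n, correct, p = 280, 123, 0.50
mean = n * p                                # Np = 140
sd = math.sqrt(n * p * (1 - p))             # sqrt(Npq) = sqrt(70), about 8.367
z = (correct - mean) / sd                   # about -2.032
p_one_tailed = normal_cdf(z)                # about .021
p_two_tailed = 2 * p_one_tailed             # about .042

# The same question as a goodness-of-fit chi-square with c - 1 = 1 df
observed = [correct, n - correct]           # [123, 157]
expected = [n * p, n * (1 - p)]             # [140.0, 140.0]
chi2 = chi_square_gof(observed, expected)   # about 4.129
p_chi2 = chi_square_upper_tail_1df(chi2)    # same as p_two_tailed
critical = 1.96 ** 2                        # 3.8416, the 5% cutoff for 1 df

print(f"z = {z:.3f}, one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
print(f"chi-square = {chi2:.3f}, z squared = {z ** 2:.3f}, p = {p_chi2:.4f}")
print("reject H0 at .05 (two-tailed):", chi2 > critical)
```

The exact normal probabilities come out a hair below the tabled .0212 and .0424, because the table lookup rounds z to -2.03. `chi_square_gof` accepts any number of categories, so the three-position version with expected frequencies of 93.33 would just pass three observed and three expected counts (and need a 2 df tail probability instead).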
Contingency Tables
Again, present this as a problem looking for a solution, rather than a solution looking for an example.
I'm going to start with the results of last Thursday's lab
- We first generated data with the null hypothesis true for Siegel's study.
- This was the study in which rats were given injections of morphine either in the same environment in which they had built up tolerance or in a different one.
- Siegel's data are categorized on two dimensions (Group 1 or Group 2, and Survive or Die).
- His actual results are shown below:
| | Survived | Died | Totals |
|---|---|---|---|
| Group 1 | 21 | 9 | 30 |
| Group 2 | 11 | 19 | 30 |
| Totals | 32 | 28 | 60 |
- If context makes no difference, the probability of the survival should be the same in both groups.
- We made that probability = .5333 because that was Siegel's overall survival rate.
- Notice that we are replicating the results to be expected when the null is true.
- We generated 15 chi-square values per student, for 135 values overall.
- The results follow; they are pooled across the last three years:
Notice the shape of this distribution. It decreases at a negatively accelerated rate. Very few of the values are greater than about 4; in a perfect chi-square distribution with 1 df, the cutoff for the upper 5% would lie at 3.84.
Notice how closely this distribution matches the previous chi-square distribution with 1 df.
Later I'll come back to what this would look like when the null is false.
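The lab exercise can be imitated in a few lines. This is a minimal Python sketch, not the procedure we actually used in lab: it assumes 30 rats per group, a common survival probability of .5333 (Siegel's overall rate) for both groups, and 10,000 replications instead of our 135 so that the shape is smoother. The seed is arbitrary, and `pearson_chi_square` and `simulate_null` are hypothetical helper names.

```python
import random

def pearson_chi_square(table):
    """Pearson chi-square for an R x C table, with E_ij = Row_i * Col_j / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            chi2 += (table[i][j] - e) ** 2 / e
    return chi2

def simulate_null(reps=10_000, n_per_group=30, p_survive=0.5333, seed=1):
    """Chi-square values for 2 x 2 tables generated with H0 true.

    Both groups share the same survival probability, so context has no effect.
    """
    rng = random.Random(seed)
    values = []
    while len(values) < reps:
        survived = [sum(rng.random() < p_survive for _ in range(n_per_group))
                    for _ in range(2)]
        table = [[s, n_per_group - s] for s in survived]   # rows: Group 1, Group 2
        # Skip the (astronomically rare) table with an empty column, where E = 0
        if any(sum(col) == 0 for col in zip(*table)):
            continue
        values.append(pearson_chi_square(table))
    return values

values = simulate_null()
prop_over = sum(v > 3.84 for v in values) / len(values)
print(f"{len(values)} values; proportion greater than 3.84 = {prop_over:.3f}")

# Crude text histogram (one # per 100 values): the counts fall off quickly
for lo in range(8):
    count = sum(lo <= v < lo + 1 for v in values)
    print(f"{lo}-{lo + 1}: {'#' * (count // 100)}")
```

With the null true, the proportion above 3.84 should land near .05, and the histogram should show the same negatively accelerated decline as the class results.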
Another Example
- Friedman, Katcher, Lynch, and Thomas (1980) did an interesting study on the effect of having a pet for people recovering from heart attacks. I don't recall whether they supplied a pet or just found people who did, and did not, have pets.
- What difference would this make from a methodological perspective?
- They found 92 people who had recently had a heart attack, and classified them in terms of whether or not they had a pet. They then determined whether these people were alive one year later.
- Here we have two variables of classification:
- Pet (yes/no)
- Alive/Dead
- Notice that in this case the row frequencies are not equal. That is not a problem, and, in fact, it's kind of nice that so few people died.
- We want to test the null hypothesis that Pet and Survival are independent.
| | Pet: Yes | Pet: No | Total |
|---|---|---|---|
| Alive | 50 | 28 | 78 |
| Dead | 3 | 11 | 14 |
| Total | 53 | 39 | 92 |
The next task is to find the expected frequencies if subjects fall in cells at random within the constraints of the row and column totals.
- If rows and columns are independent, the multiplicative law of probability tells us that the probability of falling in row 1, column 1 is the product of the probability of row 1 and the probability of column 1.
- p(row1) = freq(row1)/N = 78/92 = .848
- p(col1) = freq(col1)/N = 53/92 = .576
- Then p11 = .848 * .576 = .4884.
- If there are 92 subjects overall, then .4884 * 92 = 44.93 would be expected to fall in cell11.
- We can put this into a formula:
- $E_{11} = \frac{Row_1 \times Col_1}{N}$
- Or, in the general case, $E_{ij} = \frac{Row_i \times Col_j}{N}$
- For row 2, column 1, $E_{21} = (14 \times 53)/92 = 8.07$
- Filling in the rest of the cells, we get
- Expected Frequencies

| | Pet: Yes | Pet: No | Total |
|---|---|---|---|
| Alive | 44.93 | 33.07 | 78 |
| Dead | 8.07 | 5.93 | 14 |
| Total | 53 | 39 | 92 |
We will use the same formula for chi-square, but this time we will calculate it over the four cells of the table. Using the unrounded expected frequencies, $\chi^2 = \sum \frac{(O - E)^2}{E} = 8.85$.
- For a contingency table, df = (R-1)(C-1), which in this case is (2-1)(2-1) = 1.
- We already know that with 1 df the critical value of chi-square = 3.84.
- So we will reject our null hypothesis and conclude that there is a relationship between having a pet and living for a decent length of time after a heart attack.
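The expected-frequency rule and the chi-square sum are easy to mechanize. Below is a minimal standard-library Python sketch (the function names are hypothetical) that applies $E_{ij} = Row_i \times Col_j / N$ to the pet data and then sums $(O - E)^2/E$ over the four cells. The p value uses the 1 df shortcut, P(chi-square ≥ x) = erfc(sqrt(x/2)), which is valid only for 1 df.

```python
import math

def expected_frequencies(table):
    """E_ij = (Row_i * Col_j) / N for every cell of an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Friedman et al. (1980): rows = Alive, Dead; columns = Pet yes, Pet no
observed = [[50, 28],
            [3, 11]]
expected = expected_frequencies(observed)
chi2 = pearson_chi_square(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail for 1 df only

for row in expected:
    print([round(e, 2) for e in row])
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```

Because every cell's O - E here is the same size (about 5.07 in absolute value), the chi-square is driven mostly by the two small expected frequencies in the Dead row.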
More Complex Contingency Tables
- These are data from Jody Kamon (1998), but I don't recall where she got them.
- The study concerns the relationship between problem behavior in children and the diagnoses of their parents.
- Kamon (?) classified parents with respect to whether they exhibited Antisocial Personality Disorder (APD)
- She also classified children with respect to whether or not they were diagnosed as Conduct Disorder (CD), Oppositional Defiant Disorder (ODD) or no problem (Control)
- Observed Frequencies
| Parent's Diagnosis | Child: CD | Child: ODD | Child: Control | Total |
|---|---|---|---|---|
| APD | 27 | 16 | 3 | 46 |
| Non-APD | 41 | 54 | 36 | 131 |
| Total | 68 | 70 | 39 | 177 |
We calculate the expected frequencies in exactly the same way we did above. These are given in the following table.
- Expected Frequencies
| Parent's Diagnosis | Child: CD | Child: ODD | Child: Control | Total |
|---|---|---|---|---|
| APD | 17.67 | 18.19 | 10.14 | 46 |
| Non-APD | 50.33 | 51.81 | 28.86 | 131 |
| Total | 68 | 70 | 39 | 177 |
- Summing $(O - E)^2/E$ over the six cells gives $\chi^2 = 13.80$.
- Here we have (2-1)(3-1) = 2 df.
- The critical value of chi-square with 2 df = 5.99
- Again we will reject the null hypothesis and conclude that the diagnosis of the child is not independent of the diagnosis of the parent.
- If we look at the data we see that children are more likely to be diagnosed as CD or ODD if their parents have a diagnosis of APD.
- We could check this better if we combined the CD and ODD cells, which I'll do next with SPSS.
- Notice that I slipped into a table larger than 2 X 2 without any problem.
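Nothing in the computation changes for a 2 X 3 table. This minimal Python sketch (standard library only; `pearson_chi_square` is a hypothetical helper) returns both chi-square and df. It also uses a convenient fact: with exactly 2 df, the chi-square upper tail is exp(-x/2), so the 5% critical value is -2 ln(.05), which is where 5.99 comes from.

```python
import math

def pearson_chi_square(table):
    """Return (chi-square, df) for an R x C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n                      # E_ij = Row_i * Col_j / N
            chi2 += (table[i][j] - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Rows: parent APD, parent non-APD; columns: child CD, ODD, Control
observed = [[27, 16, 3],
            [41, 54, 36]]
chi2, df = pearson_chi_square(observed)

# With exactly 2 df the chi-square upper tail has a closed form, exp(-x / 2),
# so the 5% critical value is -2 ln(.05), about 5.99
critical = -2 * math.log(0.05)
p_value = math.exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f}, df = {df}, critical = {critical:.2f}, p = {p_value:.4f}")
```

The exp(-x/2) shortcut works only for 2 df; other df values need a general chi-square tail function.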
SPSS Analysis
- First we need to create the data file without combining anything.
- Enter a column for Child and a column for Parents
- You can enter CD, ODD, etc. instead of numbers, but you need to create all six cells.
- Then create a column called Freq (or whatever) and enter the cell frequencies.
- Go to the Data menu and select "Weight Cases." Tell it to weight cases by Freq.
- Analyze/Descriptive Statistics/Crosstabs: put Child on columns and Parents on rows.
- Be sure to click on statistics and tell it to compute chi-square.
- The data look as follows:
- The printout would be as follows.
That chi-square is, within rounding, the same as we calculated above.
Then I recoded Child into NewChild, making CD and ODD into Problem and leaving Control as Control.
(Note: I had to specify that the new variable was a string variable.)
The results follow:
Interpret this result.
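The "Weight Cases" and recode steps have a direct analogue outside SPSS. This is a minimal Python sketch under stated assumptions: one record per cell with a frequency, as in the SPSS data file; a hypothetical `crosstab` helper that adds each record's Freq into its cell (which is what weighting by Freq does); and a recode that maps CD and ODD to Problem, mirroring the NewChild variable. Category labels are sorted alphabetically, so Control comes before Problem.

```python
from collections import Counter

# One record per cell, as in the SPSS data file: (Child, Parent, Freq)
cells = [
    ("CD", "APD", 27), ("ODD", "APD", 16), ("Control", "APD", 3),
    ("CD", "Non-APD", 41), ("ODD", "Non-APD", 54), ("Control", "Non-APD", 36),
]

def crosstab(records, recode=None):
    """Sum Freq into a Parent x Child table, optionally recoding Child first.

    Adding Freq for each record mimics weighting cases by Freq.
    """
    counts = Counter()
    for child, parent, freq in records:
        if recode:
            child = recode.get(child, child)
        counts[(parent, child)] += freq
    parents = sorted({p for p, _ in counts})
    children = sorted({c for _, c in counts})
    return parents, children, [[counts[(p, c)] for c in children] for p in parents]

def pearson_chi_square(table):
    """Sum of (O - E)^2 / E with E_ij = Row_i * Col_j / N."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return sum((table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i in range(len(row_totals)) for j in range(len(col_totals)))

# Recode CD and ODD into Problem, leaving Control as Control
new_child = {"CD": "Problem", "ODD": "Problem"}
parents, children, table = crosstab(cells, recode=new_child)
chi2 = pearson_chi_square(table)
print(parents, children, table)
print(f"chi-square = {chi2:.2f}, df = 1")
```

Calling `crosstab(cells)` without a recode rebuilds the original 2 X 3 table, so the same records serve both analyses.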
Measuring the size of an effect
One of the most important recent developments in behavioral statistics is the emphasis on effect sizes in addition to (if not in place of) statistical hypothesis tests.
When it comes to contingency tables, perhaps the best measure is the odds ratio--especially for a 2 X 2 table.
Odds Ratios
For 2 X 2 tables this is one of my favorite topics. Define odds: (# of positive outcomes)/(# of negative outcomes).
Using Jody's study with the data for kids reclassified to Problem and No Problem.
- The odds of being a Problem given that you have a parent with APD are
- 43/3 = 14.333
- The odds of being a Problem given that you do not have a parent who is APD are
- 95/36 = 2.639
- Notice that odds are conditional on something, just like conditional probabilities.
- Notice that odds are not a proportion or a probability, because the denominator is not the total, but the number in the other category.
- Clearly the odds of being a Problem, given that you have a parent with APD, are higher than the odds of being a Problem given that your parent is not APD. But how much higher?
- The odds ratio is just what it sounds like, a ratio of odds.
- The odds ratio here is 14.333/2.639 = 5.43
- This can be interpreted to mean that your odds of being a Problem are about 5.4 times higher if your parent was classed as APD. That's pretty impressive.
- Going back to the Ritonavir example that we used about two weeks ago:

| | Improved | Died or worse | Total |
|---|---|---|---|
| Ritonavir | 472 | 71 | 543 |
| Placebo | 399 | 148 | 547 |
| Total | 871 | 219 | 1090 |
Chi-square = 33.18, which is clearly significant.
- The odds of dying in the Ritonavir group = 71/472 = .15
- The odds of dying in the Placebo group = 148/399 = .37
- The odds ratio of the Placebo relative to the Ritonavir group = .37/.15 = 2.47 ≈ 2.5, meaning that an AIDS patient's odds of dying were about 2 1/2 times higher in the Placebo group than in the Ritonavir group.
- If we took the ratio the other way around, as .15/.37 = .405, we would say that your odds of surviving are only 40.5% as high if you are in the Placebo group as they are if you are in the Ritonavir group. That is simply 1/2.47.
- Point out that you would be talking about the same odds, and the same odds ratios (in reverse) if you talked about surviving.
- Odds of surviving in the Ritonavir group = 472/71 = 6.648 ≈ 1/.15.
- Odds of surviving in the Placebo group = 399/148 = 2.696 ≈ 1/.37.
- Comment on odds ratios with larger contingency tables.
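Odds and odds ratios need nothing more than division. This minimal Python sketch (the helper names `odds` and `odds_ratio` are hypothetical) reproduces both examples from the raw counts. Working from unrounded odds gives 2.466 for the Ritonavir odds ratio instead of the 2.47 obtained from .37/.15, and it shows directly that the odds ratio for dying (Placebo vs. Ritonavir) is the same number as the odds ratio for surviving (Ritonavir vs. Placebo).

```python
def odds(positive, negative):
    """Odds = (# positive outcomes) / (# negative outcomes)."""
    return positive / negative

def odds_ratio(pos_1, neg_1, pos_2, neg_2):
    """Ratio of the odds in group 1 to the odds in group 2."""
    return odds(pos_1, neg_1) / odds(pos_2, neg_2)

# Kamon data recoded to Problem vs Control: APD parents vs non-APD parents
or_kamon = odds_ratio(43, 3, 95, 36)

# Ritonavir data: Placebo vs Ritonavir for dying, Ritonavir vs Placebo for surviving
or_dying = odds_ratio(148, 399, 71, 472)
or_surviving = odds_ratio(472, 71, 399, 148)

print(f"Kamon: odds {odds(43, 3):.3f} vs {odds(95, 36):.3f}, odds ratio = {or_kamon:.2f}")
print(f"Ritonavir: odds ratio (dying) = {or_dying:.3f}, (surviving) = {or_surviving:.3f}")
print(f"Reversed ratio = {1 / or_dying:.3f}")
```

Both Ritonavir ratios reduce to (148 × 472)/(399 × 71), which is why reversing the outcome and reversing the groups together leaves the odds ratio unchanged.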
Likelihood ratio tests
The likelihood-ratio chi-square, $G^2$, is an alternative way of calculating a $\chi^2$ statistic as a test of the null hypothesis.
- There is some evidence that it is a better statistic than Pearson's chi-square with small sample sizes, but I doubt that. The following is a quote from Agresti (1990, p. 49):
- "When independence holds, the Pearson statistic $\chi^2$ and the likelihood-ratio statistic $G^2$ have asymptotic chi-squared distributions with df = (I-1)(J-1). In fact, $\chi^2$ and $G^2$ are asymptotically equivalent in that case: $\chi^2 - G^2$ converges in probability to zero. The limiting results for multinomial sampling also apply to the other sampling schemes...
"It is not simple to describe the sample size needed for the chi-squared distribution to approximate well the exact distributions of $\chi^2$ and $G^2$. For a fixed number of cells, $\chi^2$ usually converges more quickly than $G^2$. The chi-squared approximation is usually poor for $G^2$ when N/IJ < 5. When I or J is large, it can be decent for $\chi^2$ for $n_{ij}$ as small as 1, if the table does not contain both very small and moderately large expected frequencies...."
The likelihood-ratio chi-square is heavily used in log-linear models, which are like ANOVA for categorical data.
I give the formula in the text:

$$G^2 = 2\sum O_{ij}\ln\left(\frac{O_{ij}}{E_{ij}}\right)$$
Explain formula
Do this by hand on the AIDS data.
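Observed frequencies, with expected frequencies in parentheses:

| | Survived | Died | Total |
|---|---|---|---|
| Ritonavir | 472 (433.90) | 71 (109.10) | 543 |
| Placebo | 399 (437.10) | 148 (109.90) | 547 |
| Total | 871 | 219 | 1090 |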
$$G^2 = 2\left[472\ln\frac{472}{433.90} + 71\ln\frac{71}{109.10} + 399\ln\frac{399}{437.10} + 148\ln\frac{148}{109.90}\right] = 2(16.89) = 33.77$$
which is quite close to the Pearson chi-square of 33.18.
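Doing this by hand is error-prone, because rounding the ratios O/E to three decimals and then multiplying by counts in the hundreds shifts $G^2$ noticeably. A minimal Python sketch (standard library only; the helper names are hypothetical) computes $G^2 = 2\sum O\ln(O/E)$ and the Pearson statistic from the same unrounded expected frequencies:

```python
import math

def expected(table):
    """E_ij = Row_i * Col_j / N."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def likelihood_ratio_g2(table):
    """G^2 = 2 * sum of O * ln(O / E); cells with O = 0 contribute 0."""
    exp_table = expected(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, exp_table)
                   for o, e in zip(obs_row, exp_row) if o > 0)

def pearson_chi_square(table):
    """Sum of (O - E)^2 / E."""
    exp_table = expected(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp_table)
               for o, e in zip(obs_row, exp_row))

aids = [[472, 71],    # Ritonavir: survived, died
        [399, 148]]   # Placebo:   survived, died
g2 = likelihood_ratio_g2(aids)
x2 = pearson_chi_square(aids)
print(f"G^2 = {g2:.2f}, Pearson chi-square = {x2:.2f}")
```

The `if o > 0` guard follows the usual convention that 0 · ln 0 = 0; it never triggers for these data.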
The SPSS output is
Last revised: 10/01/01