Chi-Square

9/27/01

This lab is a combination of a lab on hypothesis testing and a lab on Chi-Square. We haven’t discussed the chi-square test yet; we will do that on Tuesday. But I want you to see more about hypothesis testing, something more about sampling distributions, and something about chi-square, so we are going to take a very simple experiment and replicate it a number of times. The point is to see what happens over repeated experiments conducted under known conditions. In particular, I want you to see what happens when the null hypothesis is true. (You are probably getting tired of running replications, or Monte Carlo studies, like this, but they do illustrate some very important concepts.)

In a study we will consider at great length this year, Siegel (1988) examined the role of context on the effect of morphine. The study has relevance both to clinical and experimental psychology, and everyone needs to be familiar with the general hypotheses. Siegel's prior results had shown that if you give a rat morphine in a familiar setting, the rat develops tolerance. Over a period of time you can build up the tolerance to morphine to the point that a rat is receiving massive doses with no lethal effect. Siegel hypothesized that if you now gave the rat its now-accustomed massive dose in a novel environment, the effects would be greatly magnified. (I'll talk about the theory behind this prediction when we discuss a related study by him later in the semester.)

Siegel used 3 groups in his study, but we’re going to eliminate one of those groups to simplify the experiment. Subjects in Group 1 were given increasing doses of morphine in Room A, and built up a tolerance to the drug. They were then given a test trial in that same room (Room A). Group 2 was treated exactly the same way, except that they received their test trial in a different room (Room B). If Siegel was correct about the role of context, those animals who received their test trial in a different context (Room B) would be much more susceptible to the effects of large doses of morphine, and many of them would die from it (just as do heroin addicts who shoot up in novel environments).

Siegel had 30 rats in each group, and the data below show the number of rats in each group who survived, and the number who died. (In the second table I have converted the frequencies to proportions within each row.) Are these the kind of data that we would expect if the null hypothesis were true? By that I mean, are these the kind of data we would expect if the probability of survival is the same in each condition?

 

            Survived    Died    Totals
Group 1        21         9       30
Group 2        11        19       30
Totals         32        28       60

The dependent variable is the number of rats who survived and the number who died.

 

            Survived    Died    Totals
Group 1       .70        .30     1.00
Group 2       .367       .633    1.00
Average       .533       .467    1.00

 30% of the rats in Group 1, and 63% of the rats in Group 2 died from morphine overdose.

 

Now we are going to forget Siegel's actual data for a moment and create some new data that we would expect to find if the null hypothesis is true, and then compute data we would expect to find if the null is false. For each set of data we will calculate a bunch of chi-square test statistics. I want you to see what these statistics look like under the true and the false null hypotheses. Finally, we will go back to Siegel's data and calculate a chi-square for them and draw a conclusion.

Null hypothesis true:

We are going to start with the assumption that the probability of survival is the same in each condition. Since 53.33% of Siegel's rats survived overall, we will assume that .5333 is the probability of survival in each condition. (Note that we have just stated the assumption that the null hypothesis is true.)

For the null = true condition, we will create 30 subjects in each of 2 groups (N = 60), and we will set the probability of survival = .5333 for each group. Each of you will repeat this experiment 15 times, for a total of 150 replications. Then we will look at the combined results as an illustration of the chi-square distribution.

We will do this by using SPSS to draw random samples. The software (syntax) will assign pseudo-rats to the Survived vs. Died outcome on the basis of random numbers. For example, if I draw numbers uniformly distributed between 0 and 1, I will call a Group 1 animal a survivor if its random number is < .5333. Otherwise it will be classed as a victim of drug overdose. Because we are generating results under conditions where the null hypothesis is true, I will also call a Group 2 animal a survivor if its random number is < .5333. Otherwise it will be classed as Died. You should be able to see that over the long term this will mean that 53.3% of the animals in both groups will survive, and 46.7% of the animals in both groups will die. However, that won’t necessarily be the result for any given sample of 30 rats. In fact, it probably won't be.
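If you want to see the assignment rule outside SPSS, here is a minimal sketch in Python. It is not part of the lab itself. The function name `simulate_group` and the seed are my own choices for illustration; the only values taken from the text are the group size of 30 and the survival probability of .5333.

```python
import random

P_SURVIVE = 0.5333   # common survival probability under the null (from the text)
N_PER_GROUP = 30     # rats per group (from the text)

def simulate_group(n, p_survive, rng):
    """Return (survived, died) counts for n pseudo-rats.

    A rat is a survivor when its uniform random number is below p_survive.
    """
    survived = sum(1 for _ in range(n) if rng.random() < p_survive)
    return survived, n - survived

rng = random.Random(20010927)  # arbitrary seed, used only for repeatability
group1 = simulate_group(N_PER_GROUP, P_SURVIVE, rng)
group2 = simulate_group(N_PER_GROUP, P_SURVIVE, rng)
print("Group 1 (survived, died):", group1)
print("Group 2 (survived, died):", group2)
```

Running it with several different seeds shows the point made above: the two groups share one survival probability, yet their counts usually differ from each other and from the long-run 53.3%.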

The following program looks a bit clumsy because it generates data separately for the two groups, even though the population proportions of survival are the same. I have done that so that it is simple to modify the program for a false null hypothesis.


To generate data, do the following:

Start SPSS

Set a random seed. You can just go to Transform/Random Number Seed, and take the default, although we had problems with that. Instead, just type in a big number.

Create a variable named Group with 30 1’s and 30 2’s. This just sets up the data file for 60 animals and assigns them to groups.

Now we need to create some outcomes. We will first give everyone a 1, to make them all into survivors. That is just for a starting point. Then we will kill off a bunch. Enter the following as syntax statements--note carefully the placement of parentheses. Note also that "outcomexx" is spelled "outcomxx" so as not to exceed 8 characters.

COMPUTE outcom1 = 1 .
IF (((Group = 1) and (rv.uniform(0,1) gt .5333)) or ((Group = 2) and
(rv.uniform(0,1) gt .5333))) outcom1 = 2 .
EXECUTE .

(Note, the notation "gt" means "greater than," just as "ge" means "greater than or equal to." Be sure to leave spaces around "gt", "and", and "or.")

Next you want to copy and paste this 14 more times, editing it to create variables named outcom2, ..., outcom15. 

To see what the results look like, invoke Analyze/Descriptive statistics/CrossTabs. Put Group on the Rows, and outcom1, outcom2, ..., outcom15 on the columns. You must also click the Statistics button at the bottom of the dialog box, and then choose chi-square. You will get fifteen tables that look like the table of frequencies above, each of which will have a chi-square statistic.

The chi-square statistic is a measure of the degree to which the Survive/Die ratio is the same or different in the two groups. (If exactly the same number died in each group, chi-square would be 0. If the two groups have quite different survival rates, the chi-square values should be large.)
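To make that description concrete, here is a hedged sketch in Python of the usual Pearson chi-square calculation. The statistic sums (observed - expected)^2 / expected over the cells, where each expected count is row total times column total divided by N. The function name `pearson_chi_square` is mine, and the two tables are made up for illustration; they are not Siegel's data.

```python
def pearson_chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts.

    Assumes every row total and column total is greater than zero;
    otherwise an expected count is 0 and the statistic is undefined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_sq += (observed - expected) ** 2 / expected
    return chi_sq

# Hypothetical tables: rows are groups, columns are Survived, Died.
same_rates = [[16, 14], [16, 14]]
different_rates = [[25, 5], [10, 20]]
print(round(pearson_chi_square(same_rates), 2))       # 0.0
print(round(pearson_chi_square(different_rates), 2))  # 15.43
```

The first table has identical outcomes in both groups and gives 0; the second has very different survival rates and gives a large value.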

Neatly record your tables and their corresponding chi-square values (to two decimal places), and pass me a sheet with the fifteen chi-squares on it. 

STOP!!! Go back and reread that last sentence about the number of decimal places!! Surprisingly, everyone got that right last week. That is a first!!

I’ll record the data and make the file available to you. (You do not have to give me the actual cell frequencies, only the values of chi-square.) Don't run away when you have done that, because I want to give the compiled results back to you.

After the results have been compiled into a single data file, you should plot a histogram of the resulting chi-square values. We will then discuss chi-square in class on Tuesday, and you can see if your results looked like what you should expect with repeated sampling.
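For a preview of what the compiled results might look like, the following Python sketch runs 150 replications with the null hypothesis true and prints a crude text histogram. It is an illustration under stated assumptions, not the class data. The helper names, the seed, the one-unit bin width, and the choice to return 0 when a row or column is empty are all mine.

```python
import random

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (shortcut formula)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:   # an empty row or column: statistic undefined, report 0 (my choice)
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def one_replication(p1, p2, n_per_group, rng):
    """Simulate one two-group experiment and return its chi-square."""
    s1 = sum(rng.random() < p1 for _ in range(n_per_group))
    s2 = sum(rng.random() < p2 for _ in range(n_per_group))
    return chi_square_2x2(s1, n_per_group - s1, s2, n_per_group - s2)

rng = random.Random(12345)  # arbitrary seed
values = [one_replication(0.5333, 0.5333, 30, rng) for _ in range(150)]

# Text histogram with bins one unit wide; the last bin collects values of 6 or more.
bins = [0] * 7
for v in values:
    bins[min(int(v), 6)] += 1
for i, count in enumerate(bins):
    label = f"{i}-{i + 1}" if i < 6 else "6+"
    print(f"{label:>4} | {'*' * count} ({count})")
```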

Print out the output page so that you have a record of your results. 

Now I want you to do this all over again, but this time generate data where the null hypothesis is false. In this case the probability of dying is lower for one group than for the other.

Saying "the null is false" is very imprecise, because it doesn't say how false. I have to pick some values, and I'll use 70% survivors for the Same Context group, and 63.33% survivors for the Different Context group. That is a pretty small difference, but probably one that is big enough to see. 

We will probably not have time to generate 15 sets of results when the null is false, but the following code would do so. 

COMPUTE outcom1 = 1 .
IF (((Group = 1) and (rv.uniform(0,1) gt .70)) or ((Group = 2) and
(rv.uniform(0,1) gt .6333))) outcom1 = 2 .
EXECUTE .

You can create 15 copies of these statements, modify them to name the variables outcom1 ... outcom15, generate the data, and then produce the tables and chi-square values. Note that these chi-square values are appreciably higher on average, but there are very likely to be one or two small values as well.
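The logic of the IF statement can also be traced case by case. The Python sketch below mirrors it under a few assumptions of mine: it draws one random number per case rather than the two `rv.uniform` calls in the syntax (only the one for the case's own group matters), and the names `CUTOFFS` and `make_outcome` and the seed are hypothetical. The cutoffs .70 and .6333 are the ones in the syntax, and they can be changed to try other versions of a false null.

```python
import random
from collections import Counter

# Cutoffs from the syntax: a case dies when its uniform number is greater than the cutoff.
CUTOFFS = {1: 0.70, 2: 0.6333}

rng = random.Random(927)        # arbitrary seed
group = [1] * 30 + [2] * 30     # the Group variable: 30 1's and 30 2's

def make_outcome(group, cutoffs, rng):
    """Mimic COMPUTE outcom = 1 followed by the IF that recodes some cases to 2."""
    outcome = []
    for g in group:
        code = 1                         # everyone starts as a survivor
        if rng.random() > cutoffs[g]:    # 'gt' test against this group's cutoff
            code = 2                     # recode to died
        outcome.append(code)
    return outcome

# Fifteen outcome variables, each cross-tabulated against Group.
for k in range(1, 16):
    outcome = make_outcome(group, CUTOFFS, rng)
    table = Counter(zip(group, outcome))
    print(f"outcom{k}: Group 1 survived/died {table[(1, 1)]}/{table[(1, 2)]}, "
          f"Group 2 survived/died {table[(2, 1)]}/{table[(2, 2)]}")
```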

Now I want you to calculate chi-square for Siegel's data. Add a column labeled Siegel, and put in 1's and 2's in correspondence with the table at the beginning of this assignment. In other words, you will have 21 1's and 9 2's for the animals in Group 1, and 11 1's and 19 2's for the animals in Group 2. Now run the chi-square test on this column, just as you did for the simulated outcomes.
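As a cross-check on the SPSS output, the same column can be built and tabulated in a few lines of Python. This is a sketch rather than the lab's method. It uses the shortcut formula for a 2x2 table, N(ad - bc)^2 divided by the product of the four marginal totals, which should agree with the value SPSS labels Pearson Chi-Square (SPSS also prints other statistics, such as a continuity-corrected value, which will differ).

```python
from collections import Counter

group = [1] * 30 + [2] * 30
# 1 = survived, 2 = died, entered in the order described in the text.
siegel = [1] * 21 + [2] * 9 + [1] * 11 + [2] * 19

counts = Counter(zip(group, siegel))
a, b = counts[(1, 1)], counts[(1, 2)]   # Group 1: survived, died
c, d = counts[(2, 1)], counts[(2, 2)]   # Group 2: survived, died
n = a + b + c + d

chi_sq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(f"Pearson chi-square = {chi_sq:.2f}")   # Pearson chi-square = 6.70
```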

 

I have created a page showing the results of the first part of this lab. It contains both the histogram and the frequency distribution. You could recreate the output completely from the frequency distribution.

 

Last revised: 09/27/01