The "Monty Hall three-door problem" is an interesting, and generally puzzling, problem that has been around for years. In its most recent turn on the stage it was brought up by Marilyn vos Savant in an article she wrote in the popular press (source unknown). The problem generated an enormous correspondence over the Internet from statistically minded types, among others. Most of those responding argued that she was out of her mind, and I'm afraid that many of them still believe that, though they should know better. She is correct. If you don't believe me, will you believe Click and Clack, the Tappet Brothers? Their web page at http://cartalk.com/About/Monty/ contains a great demonstration. And if you can't believe the guys on Car Talk, who can you believe?

As originally conceived, the problem can be stated in the following way. Imagine that you are part of a television audience and have three doors on the stage in front of you. The master of ceremonies promises you that there is a car hidden behind one of the doors and invites you to guess which door. Suppose you guess Door #3. After you guess, the master of ceremonies, who knows where the car is and will only ever point to an empty door, informs you that, of the two doors remaining, the car is not behind Door #1. She then invites you to change your mind and go with Door #2 rather than Door #3. Should you make the switch, should you stay with what you have chosen, or does it make any difference anyway?

The simple, and often counterintuitive, answer is that you will have a better chance of winning if you switch. There have been many attempts to explain the issues involved so that people will say "Oh, of course!", but they rarely work. I watched for days as a bunch of statisticians on the Internet went through exactly the same kinds of arguments my Psychology 1 students make to explain why it doesn't make the slightest difference whether you switch or not.

Before I give my explanation, which may satisfy a small percentage of my audience at best, let's remember that many of us are supposed to be psychologists, who like to think that they learn about the world by observing it. Better yet, let's observe it by carrying out an experiment. If we had lots of money to buy lots of cars, we could actually run this experiment a whole bunch of times with real people and see if the people who switch win more often than those who don't. Or, in class you could do what I do in Psych 1 and put a dollar in one of three envelopes, have someone choose, and so on. But even that costs money and takes time. So why not use a computer to run the experiment a large number of times, sometimes switching and sometimes staying, and see what happens? We can use a random number generator to select the winning "door" and to make the decisions, and that approach directly parallels what human participants would do.

I wrote a small SPSS program to run this experiment, and you are welcome to download it. The program randomly chooses the winning door, and then chooses a door for the participant. That automatically means that the master of ceremonies has the other two doors. It then decides, again randomly, whether the participant will switch on this trial. If a switch is to be made, the participant is given whichever of the master of ceremonies' two doors she did not declare to be empty. (If you carefully work through the logic of the "IF" statements in that program, you will get a pretty good idea of why the results come out as they do.) All that is left is to decide whether the participant won or lost, and to break that down by whether he or she switched on that trial.
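The SPSS program itself is not reproduced here, but the logic just described can be sketched in Python. This is a minimal sketch under the assumptions stated above, not the author's code: the function names `simulate` and `print_table` are hypothetical, the switch/stay decision is a fair coin flip, and the master of ceremonies always names an empty door from the two she holds.

```python
import random

def simulate(n_trials, seed=None):
    """Play the three-door game n_trials times, deciding at random whether
    the participant switches. Returns {(switched, won): count}."""
    rng = random.Random(seed)
    counts = {(s, w): 0 for s in (False, True) for w in (False, True)}
    for _ in range(n_trials):
        car = rng.randint(1, 3)     # door hiding the car
        pick = rng.randint(1, 3)    # participant's first choice
        # The master of ceremonies holds the other two doors and declares
        # one of them empty (never the car's door).
        mc_doors = [d for d in (1, 2, 3) if d != pick]
        empty = rng.choice([d for d in mc_doors if d != car])
        switched = rng.random() < 0.5
        if switched:
            # Take the MC's door that was not declared empty.
            pick = next(d for d in mc_doors if d != empty)
        counts[(switched, pick == car)] += 1
    return counts

def print_table(counts):
    """Print win/lose counts and the win percentage for each strategy."""
    for switched in (False, True):
        lost = counts[(switched, False)]
        won = counts[(switched, True)]
        total = lost + won
        rate = 100 * won / total if total else float("nan")
        label = "Yes" if switched else "No"
        print(f"SWITCH={label}: lost {lost}, won {won}, won {rate:.1f}%")

if __name__ == "__main__":
    print_table(simulate(10_000))
```

With 10,000 trials the win percentages should land near 1/3 for staying and 2/3 for switching; the exact counts change from run to run unless a `seed` is supplied.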

I ran this program using 10,000 trials, which I hope you would admit is a pretty good test. The results I received are shown below. (If you run the program, your results will look pretty much like mine, though not identical.) From the following table you will see that on the 5036 trials when the participant did not switch, she won 1643 times, which comes out to 32.6% of the time. (Notice that this is remarkably close to 1/3 of the time.) On the 4964 trials on which she did switch, she won 3262 times, or 65.7% of the time. (And this is very close to 2/3.) In other words, she won about 2/3 of the time when she switched and about 1/3 of the time when she didn't switch. Therefore it is decidedly to your advantage to switch when given the chance.

SWITCH by WIN (each cell shows the count and, in parentheses, the row percentage; totals show percentages of all trials)

| SWITCH | WIN: No | WIN: Yes | Row Total |
|---|---|---|---|
| No | 3393 (67.4%) | 1643 (32.6%) | 5036 (50.4%) |
| Yes | 1702 (34.3%) | 3262 (65.7%) | 4964 (49.6%) |
| Column Total | 5095 (51.0%) | 4905 (49.1%) | 10000 (100.0%) |

Most people have trouble understanding why the results should come out this way, even if they are willing to admit that they do. There are probably as many explanations of this as there are explainers, but I just have to put in my two cents.

Suppose that you are the participant and I am the master of ceremonies. You pick a door. Now you would probably agree that you have a 1/3 chance of winning at that point, while I have a 2/3 chance of winning. So presumably you would rather have my two doors than your one door, and you're right. But you know that only one of my doors can be a winner. When I peek behind my two doors and swing open one that doesn't have a car, why wouldn't you still rather be me? You know I had at least one empty door, and telling you which one it is shouldn't really make a difference. You knew all along that I had a better deal than you, and that I had to have at least one empty door. Knowing which door is the empty one doesn't change the fact that I'm better off than you are. It just tells you exactly which of my doors you want.
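The same argument can be checked by brute force rather than by simulation. The Python sketch below (the function name `exact_win_probabilities` is hypothetical) enumerates the nine equally likely combinations of car door and first pick, lets the master of ceremonies open each door she is allowed to open with equal probability, and adds up the exact probability of winning under each strategy. It assumes, as in the game described above, that she never opens the participant's door or the door hiding the car.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

def exact_win_probabilities():
    """Exact P(win | stay) and P(win | switch) by enumerating every case."""
    p_stay = p_switch = Fraction(0)
    for car, pick in product(DOORS, DOORS):       # 9 equally likely cases
        mc_doors = [d for d in DOORS if d != pick]
        # The MC may open any of her doors that does not hide the car.
        openable = [d for d in mc_doors if d != car]
        for opened in openable:
            weight = Fraction(1, 9) / len(openable)
            remaining = next(d for d in mc_doors if d != opened)
            if pick == car:
                p_stay += weight                  # staying keeps the car
            if remaining == car:
                p_switch += weight                # switching takes the car
    return p_stay, p_switch

stay, switch = exact_win_probabilities()
print(f"P(win | stay) = {stay}, P(win | switch) = {switch}")
# prints: P(win | stay) = 1/3, P(win | switch) = 2/3
```

The two probabilities sum to 1 because, once an empty door has been opened, the car must be behind either the original pick or the one door the master of ceremonies still holds.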

An interesting link on the Internet that relates to this problem is the Monty Hall 3-Door problem. It offers the opportunity to play the game repeatedly, as well as a discussion of the problem and another explanation.

If you have not yet covered hypothesis testing and the chi-square test, you can stop right here. But this example does give us a nice opportunity to illustrate one use of the chi-square test, and that's what I'll do next.

It's very nice when you can run simple experiments on a computer and generate 10,000 replications. It doesn't take much of a genius to tell that when 67.4% of the non-switchers lose and 65.7% of the switchers win, something real must be going on here. But most of our experiments are not performed with such large samples. A much more likely way of doing this would be to take 25 pairs of students in a class and have them perform some reasonable variation on this study. (For instance, they could each use three envelopes containing one smiley face and two sad-sack faces.) Suppose that we did that experiment with 25 pairs and found the following results.

SWITCH by WIN (each cell shows the count and, in parentheses, the row percentage; totals show percentages of all pairs)

| SWITCH | WIN: No | WIN: Yes | Row Total |
|---|---|---|---|
| No | 6 (60.0%) | 4 (40.0%) | 10 (40.0%) |
| Yes | 5 (33.3%) | 10 (66.7%) | 15 (60.0%) |
| Column Total | 11 (44.0%) | 14 (56.0%) | 25 (100.0%) |

Here we see that there was a tendency to win decidedly more often when you switched, but there was a relatively even distribution of win/lose when you did not switch. I would not be very comfortable saying that this experiment has shown me a definitive result.

But think about what our null hypothesis would be in such a study. We would be testing the null hypothesis that the distribution of win/lose is the *same* whether we switch or stay with our first choice. (I have not known anyone to seriously argue that we would actually be *better off* by staying, only that it wouldn't make any difference.) This is precisely what a chi-square test does a good job of testing. It asks whether the distribution of responses at one level of an independent variable is the same as that distribution at another level of that variable.

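Our standard formula for chi-square is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where *O* is the observed frequency in each cell and *E* is the frequency expected in that cell if the null hypothesis is true.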

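The expected frequency for each cell is its row total times its column total, divided by *N*: 10(11)/25 = 4.4 and 10(14)/25 = 5.6 for the non-switchers, and 15(11)/25 = 6.6 and 15(14)/25 = 8.4 for the switchers. When we apply that formula to our data we have

$$\chi^2 = \frac{(6 - 4.4)^2}{4.4} + \frac{(4 - 5.6)^2}{5.6} + \frac{(5 - 6.6)^2}{6.6} + \frac{(10 - 8.4)^2}{8.4} = 0.582 + 0.457 + 0.388 + 0.305 = 1.73$$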

These data produce a chi-square of 1.73. We have (2 - 1)(2 - 1) = 1 degree of freedom because each variable has two levels. From a standard chi-square table we find that for 1 *df* the critical value of chi-square at α = .05 is 3.84. Since our value does not exceed 3.84, we cannot reject the null hypothesis. We would have to conclude that our data do not allow us to decide that the chances of winning are greater under one strategy (i.e., switching) than under the other.
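The hand calculation is easy to check in code. The Python sketch below computes the Pearson chi-square from the observed 2 × 2 table, with expected frequencies taken from the margins, and gets the *p* value for 1 *df* from the standard library, using the fact that a 1-*df* chi-square is the square of a standard normal deviate, so p = P(|Z| > √χ²). The function names are hypothetical; with more than 1 *df* you would need a chi-square distribution function such as `scipy.stats.chi2.sf` instead.

```python
import math

def pearson_chi_square(table):
    """Pearson chi-square for a table of observed counts (list of rows)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n   # row total x column total / N
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def p_value_1df(chi2):
    """Upper-tail p for chi-square with 1 df: P(|Z| > sqrt(chi2))."""
    return math.erfc(math.sqrt(chi2 / 2))

# Rows: SWITCH = No, Yes; columns: WIN = No, Yes (the 25-pair data).
observed = [[6, 4], [5, 10]]
chi2 = pearson_chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p = p_value_1df(chi2)
print(f"chi-square = {chi2:.5f}, df = {df}, p = {p:.5f}")
print("reject H0" if chi2 > 3.84 else "cannot reject H0")
```

It should reproduce the 1.73 obtained by hand, and its *p* value should agree with the .18821 that SPSS reports for the Pearson chi-square.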

If you don't want to run the chi-square test by hand, you can add the "/Statistics chisq" subcommand to the Crosstabs procedure in SPSS. This will produce the following results.

| Chi-Square | Value | DF | Significance |
|---|---|---|---|
| Pearson | 1.73160 | 1 | .18821 |
| Continuity Correction | .81845 | 1 | .36563 |
| Likelihood Ratio | 1.74083 | 1 | .18703 |
| Fisher's Exact Test: One-Tail | | | .18306 |
| Fisher's Exact Test: Two-Tail | | | .24063 |

Minimum Expected Frequency: 4.400. Cells with Expected Frequency less than 5: 1 of 4 (25.0%).

Notice that these results replicate those we obtained by hand. In addition we have several other statistics for evaluating the truth or falsity of the null hypothesis. I would argue against the Continuity Correction or Fisher's Exact Tests here, for the same reason I did in the *Methods* book: the marginal totals cannot be thought of as fixed.
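The other rows of that output can be reproduced the same way. The following Python sketch (variable names are hypothetical, not SPSS's) computes the Yates continuity correction, the likelihood-ratio chi-square G² = 2ΣO ln(O/E), and Fisher's exact test, which treats both sets of marginal totals as fixed so that the count in any one cell follows a hypergeometric distribution. The two-tailed Fisher value here sums every table no more probable than the observed one; that convention reproduces the SPSS figures for these data, but other packages may define the two-tailed value differently.

```python
import math

# Rows: SWITCH = No, Yes; columns: WIN = No, Yes (the 25-pair data).
observed = [[6, 4], [5, 10]]
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
n = sum(rows)
expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
cells = [(observed[i][j], expected[i][j]) for i in range(2) for j in range(2)]

def p_1df(x):
    """Upper-tail p for chi-square with 1 df."""
    return math.erfc(math.sqrt(x / 2))

# Yates continuity correction: shrink each |O - E| by 0.5 before squaring.
yates = sum(max(0.0, abs(o - e) - 0.5) ** 2 / e for o, e in cells)

# Likelihood-ratio chi-square (assumes every observed count is > 0).
g2 = 2 * sum(o * math.log(o / e) for o, e in cells)

# Fisher's exact test: with both margins fixed, the SWITCH=Yes/WIN=Yes
# count k is hypergeometric: C(c, k) * C(n - c, r - k) / C(n, r).
r, c, a = rows[1], cols[1], observed[1][1]
probs = {k: math.comb(c, k) * math.comb(n - c, r - k) / math.comb(n, r)
         for k in range(max(0, r + c - n), min(r, c) + 1)}
one_tail = sum(pk for k, pk in probs.items() if k >= a)  # upper tail: 10 > 8.4
two_tail = sum(pk for pk in probs.values() if pk <= probs[a] * (1 + 1e-7))
min_expected = min(e for _, e in cells)

print(f"Continuity correction: {yates:.5f}, p = {p_1df(yates):.5f}")
print(f"Likelihood ratio:      {g2:.5f}, p = {p_1df(g2):.5f}")
print(f"Fisher's exact: one-tail {one_tail:.5f}, two-tail {two_tail:.5f}")
print(f"Minimum expected frequency: {min_expected:.3f}")
```

The small tolerance in the two-tailed sum guards against floating-point ties, so that the observed table itself is always counted.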

We already know that when we ran the experiment with 10,000 trials we clearly showed that the chances of winning and losing were different under the two strategies (Switch/Stay). But that did not happen here. The reason is very simple; when we have small sample sizes it is more difficult to reject the null hypothesis. With small samples the random error in our data tends to overwhelm the true underlying pattern of the results, and the answer is sometimes obscure. With very large sample sizes, the random noise tends to cancel itself out, and the systematic nature of the data shines through.

What we are talking about here has much to do with "power." Small experiments are not as powerful as large ones. But this leads to a very nice demonstration of power. If you take the SPSS program that I have provided (or write a similar one for other software) and run the experiment with 25 trials several times, I think you will be surprised at how much the results jump around, and how erratic the *p* value associated with chi-square is. If you do this a whole bunch of times (or make it a class project and collect data from everyone), the percentage of times that the null hypothesis is rejected will give you a pretty good estimate of the power of that experiment. Now if you increase the number of trials from 25 to 100, you will start to see the variability die down and the power increase. Try it; you'll learn something.
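If you would rather not rerun the SPSS program by hand, the whole power exercise can be sketched in Python. This is a minimal sketch, not the author's program: each hypothetical experiment plays `n_trials` games with a random switch/stay decision, tests the resulting 2 × 2 table with the Pearson chi-square against the .05 critical value of 3.84, and the proportion of experiments that reject the null hypothesis estimates the power. An experiment in which one strategy or one outcome never occurs is counted as a non-rejection, which is an arbitrary choice.

```python
import random

def one_experiment(n_trials, rng):
    """Play n_trials games; return [[stay-lose, stay-win],
    [switch-lose, switch-win]]."""
    table = [[0, 0], [0, 0]]
    for _ in range(n_trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        switched = rng.random() < 0.5
        # The MC always reveals an empty door, so switching wins exactly
        # when the first pick was wrong; staying wins when it was right.
        won = (pick != car) if switched else (pick == car)
        table[int(switched)][int(won)] += 1
    return table

def pearson_chi_square(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    if 0 in rows or 0 in cols:
        return 0.0  # statistic undefined; treated as "not rejected"
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def estimated_power(n_trials, n_experiments=2000, critical=3.84, seed=None):
    """Proportion of simulated experiments whose chi-square exceeds the
    .05 critical value for 1 df."""
    rng = random.Random(seed)
    rejections = sum(
        pearson_chi_square(one_experiment(n_trials, rng)) > critical
        for _ in range(n_experiments)
    )
    return rejections / n_experiments

for n in (25, 100):
    print(f"{n} trials: estimated power = {estimated_power(n, seed=1):.2f}")
```

The power estimates themselves wobble a little from seed to seed, but the jump from 25 to 100 trials should be obvious.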

Another way of looking at the power of the chi-square test in this situation is to use the program called G*Power, discussed elsewhere, to draw a graph of the power of the chi-square test for this effect size (*w* = .50) as a function of sample size. This graph can be downloaded if you wish. (I am assuming that you are testing the null hypothesis that the probability of the participant winning is .3333, and of losing is .6667, regardless of whether he or she stays with the first choice or switches.) I think it would be quite useful for students to calculate power empirically, as in the previous paragraph, and compare their results to those predicted by G*Power.
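The effect size of .50 follows from the assumption in parentheses. The Python sketch below computes Cohen's w = sqrt(Σ (p1 − p0)² / p0) over the four stay/switch by lose/win cells, assuming half the participants stay and half switch, and then computes power for a 1-*df* chi-square test at α = .05 from the noncentral chi-square with noncentrality λ = Nw², which for 1 *df* reduces to normal probabilities. The function names are hypothetical and this is not G*Power's code; its figures should match a G*Power graph only if that graph was also drawn for 1 *df* and α = .05.

```python
import math

def cohens_w(p_null, p_alt):
    """Cohen's w for cell probabilities under H0 (p_null) and H1 (p_alt)."""
    return math.sqrt(sum((p1 - p0) ** 2 / p0 for p0, p1 in zip(p_null, p_alt)))

def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))

def power_1df(w, n, z_crit=1.959964):
    """Power of a 1-df chi-square test at alpha = .05. The statistic is
    noncentral chi-square with lambda = n * w**2; with 1 df that is the
    square of a normal deviate with mean sqrt(lambda) and SD 1."""
    shift = math.sqrt(n) * w
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)

# Cells: stay-lose, stay-win, switch-lose, switch-win (half stay, half switch).
p_alt = [1/3, 1/6, 1/6, 1/3]      # staying wins 1/3, switching wins 2/3
p_null = [1/3, 1/6, 1/3, 1/6]     # H0 above: win .3333 under either strategy
p_indep = [1/4, 1/4, 1/4, 1/4]    # independence, given the H1 margins

w = cohens_w(p_null, p_alt)
w_indep = cohens_w(p_indep, p_alt)
print(f"w = {w:.3f} (vs. fixed .3333), w = {w_indep:.3f} (vs. independence)")
for n in (10, 25, 50, 100):
    print(f"N = {n:3d}: power = {power_1df(w, n):.3f}")
```

One caveat when comparing with the empirical exercise that uses the SPSS program: its chi-square tests whether the win rate is the *same* under both strategies, not whether it equals .3333, and against that null the same alternative has w = 1/3 (the second value printed). Its empirical power at a given sample size should therefore come out lower than the w = .50 curve.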

I wish that I could believe that you are all convinced that switching is the best strategy, but I'm sure you're not. Still, I hope you have seen several things:

- Some questions can be answered empirically--just try something out and see what happens.
- Experiments are a useful route to knowledge.
- Small samples are less likely to produce reliable (and significant) results than large samples.
- The chi-square test is a useful way to compare the distribution of responses under different levels of an independent variable.
- You can model the power of an experiment by simulating it on a computer.

Return to Dave Howell's Statistical Home Page

University of Vermont Home Page

Send mail to: David.Howell@uvm.edu

Last revised: 7/11/98