
Old Faithful at Yellowstone

A Bimodal Distribution

David C. Howell


An interesting example of a bimodal distribution is found in Chatterjee, S., Handcock, M. S., & Simonoff, J. S. (1995), A Casebook for a First Course in Statistics and Data Analysis. New York: Wiley. Most of us have grown up thinking of the Yellowstone geyser named Old Faithful as just that: faithful and reliable. But actually it isn't very faithful at all, with times between eruptions varying between about 45 minutes and 90 minutes. (And it has gotten worse in the last few months, following recent earthquake activity.) Chatterjee et al. present data on the timing of nearly 300 eruptions, as well as the length of each eruption.

The authors currently have these (and other) data available at geyser2a.dat. The variables, in order, are the length of the previous eruption, the interval between eruptions, and a dichotomized version of the first variable. The M's in the dataset represent missing values. There is a brief discussion of these data in item 4 at http://www.geom.umn.edu/docs/education/chance//chance_news/recent_news/chance_news_5.03.html
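For anyone who wants to read the file with a program, here is a minimal Python sketch of a reader. It assumes (I have not confirmed this against the file) that each row holds the three variables separated by whitespace, with no header line, and that a missing value is written as a bare M. The function name parse_geyser and the sample rows are made up for illustration; they are not values from geyser2a.dat.

```python
def parse_geyser(text):
    """Parse rows of: previous eruption length, interval, dichotomized length.

    Assumes whitespace-separated fields and no header line.
    A field equal to "M" is stored as None (missing).
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        length, interval, flag = (
            None if f == "M" else float(f) for f in fields
        )
        rows.append((length, interval, flag))
    return rows


# Made-up sample rows for illustration only, NOT values from geyser2a.dat.
sample = """\
4.4 78 1
M 74 M
2.0 68 0
"""

rows = parse_geyser(sample)
for row in rows:
    print(row)
```

To use it on the real file, read the downloaded file into a string and pass that to parse_geyser in place of sample.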

The students (or the instructor) can download these data and draw a frequency distribution of the length of eruptions or of the interval between them. A quick histogram of the length of the eruptions is shown below to give a flavor of what the data look like.

[Figure: histogram of the length of eruptions]
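A frequency distribution like this one can be drawn without any graphics package. The following Python sketch counts eruption lengths in equal-width bins and prints a row of asterisks for each bin. The bin start (1.5) and width (0.5) are my own arbitrary choices, and the lengths listed are made-up numbers for illustration, not the Chatterjee et al. data. If the distribution is bimodal, two separate clumps of asterisks should appear.

```python
from collections import Counter


def bin_counts(values, start, width):
    """Count values falling in bins [start + k*width, start + (k+1)*width)."""
    counts = Counter()
    for v in values:
        if v is None:
            continue  # skip missing values
        k = int((v - start) // width)
        counts[k] += 1
    return counts


def print_histogram(values, start=1.5, width=0.5):
    """Print one line per bin, with one asterisk per observation."""
    counts = bin_counts(values, start, width)
    if not counts:
        return
    for k in range(min(counts), max(counts) + 1):
        lo = start + k * width
        print(f"{lo:4.1f}-{lo + width:4.1f} | {'*' * counts[k]}")


# Made-up eruption lengths for illustration only, NOT the real data.
lengths = [1.8, 2.0, 2.1, 4.0, 4.3, 4.4, 4.6, None]

counts = bin_counts(lengths, start=1.5, width=0.5)
print_histogram(lengths)
```

The same two functions work for the intervals between eruptions; only the start and width need to change to suit values measured in minutes.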

The geyser data set leads to a great example for discussions of regression, associated with Chapter 10. We would presumably like to be able to predict when the next eruption will take place. (Otherwise the spectators will be caught in the coffee shop when the thing goes off, and have to make a dash to see it--as I did.) Use the data you have to try to predict the next eruption.
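One way to start on the prediction, sketched below in Python, is an ordinary least-squares line with the length of the previous eruption as the predictor and the interval until the next eruption as the outcome. This is only one possible approach, not necessarily the analysis Chatterjee et al. carry out. The function names are hypothetical, and the data pairs are made-up numbers chosen to lie on a straight line so the arithmetic is easy to check by hand; they are not the real data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted interval (minutes) after an eruption of length x."""
    return intercept + slope * x


# Made-up (length, interval) pairs for illustration only, NOT the real data.
# None marks a missing value, like the M's in the dataset.
pairs = [(2.0, 60.0), (3.0, 70.0), (None, 75.0), (4.0, 80.0)]

# Keep only the cases where both variables are present.
complete = [(x, y) for x, y in pairs if x is not None and y is not None]
xs = [x for x, _ in complete]
ys = [y for _, y in complete]

slope, intercept = fit_line(xs, ys)
print(f"predicted interval = {intercept:.1f} + {slope:.1f} * length")
print(f"after a 4.5-minute eruption: {predict(slope, intercept, 4.5):.1f} minutes")
```

With the real data the points will scatter around the line, so it is worth looking at the residuals as well as the fitted line before trusting the predictions.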


 

Last revised: 7/11/98