 # The Normal Distribution

## David C. Howell

The material in this section will relate to the normal distribution. I do not have much of it written yet, but what is here represents a start.

Much of what you ever wanted to know about the normal distribution.

Rich Ulrich at the University of Pittsburgh has created a page giving all sorts of useful information about the normal distribution--such as how to generate normal data. This page is available at http://www.pitt.edu/~wpilib/statfaq/gaussfaq.html

Normal probabilities
This is a short Java program to calculate probabilities (and points) under the normal distribution. It was written at UCLA, and will run on many web browsers. It will certainly run under Netscape 2.0. When you see what little code there is in the html document itself, you will be impressed with what Java can do. Use it to check the tables in the book. (Remember about scientific notation; -1.0106675468e-7 tells you to move the decimal 7 places to the left. That is -0.0000001010667, which is as close to zero as I need to come.)
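If the applet will not run in your browser, the same kind of check can be sketched in a few lines of Python. This is not the UCLA program, just a stand-in that assumes Python 3.8 or later, whose standard library includes `statistics.NormalDist`. The values 1.96 and 0.975 are my own illustrative choices.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution

# Probability: area under the curve below a given z.
area_below = std_normal.cdf(1.96)

# Point: the z that cuts off a given area below it.
z_cutoff = std_normal.inv_cdf(0.975)

print(f"area below z = 1.96: {area_below:.4f}")
print(f"z with 0.975 of the area below it: {z_cutoff:.4f}")
```

Compare what it prints against the corresponding entries in the normal table.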

Generating data from a Normal Population
You can easily generate data from a normal distribution using any of the commonly available statistical packages. If you want to specify a mean and variance, the easiest thing to do is to ask the program to standardize the data and save the standardized variable. Then you simply multiply the new variable by the standard deviation you want, and add the mean that you want. That's all there is to it.
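For readers without a statistical package at hand, here is a minimal sketch of that standardize-then-rescale recipe in Python, using only the standard library. The function name `rescale_to`, the seed, the sample size of 200, and the target mean of 50 and standard deviation of 10 are all my own assumptions for illustration.

```python
import random
import statistics

def rescale_to(values, target_mean, target_sd):
    """Standardize the values, then multiply by the SD and add the mean we want."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [target_mean + target_sd * (v - m) / s for v in values]

random.seed(1998)  # arbitrary seed so the run is repeatable
raw = [random.gauss(0, 1) for _ in range(200)]  # draws from a normal population

scores = rescale_to(raw, target_mean=50, target_sd=10)
print(statistics.fmean(scores), statistics.stdev(scores))
```

Because the data are standardized first, the sample mean and standard deviation come out at the requested values (up to floating-point rounding), not merely close to them. If you only want the population to have that mean and standard deviation, skip the standardizing step and rescale the raw draws directly.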

Applying the Normal distribution to NonNormal Data
While we like to speak about normal distributions, our data are not always normally distributed. Go to the bimodally distributed data on Old Faithful, referred to elsewhere. Calculate the percentage of observations with intervals longer than 50, 60, 70, 80, 90, and 100 minutes by counting up actual observations. Then estimate what percentage would be above those points if the data were normally distributed. What do you find?
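One way to organize that comparison is sketched below in Python (3.8 or later). The function `tail_percentages` is a hypothetical helper of my own, and the sample is a synthetic two-humped placeholder, not the Old Faithful data; substitute the real intervals before drawing any conclusions.

```python
import random
import statistics

def tail_percentages(values, cutoffs):
    """For each cutoff, return (cutoff, observed % above, normal-theory % above)."""
    # Normal curve with the same mean and standard deviation as the data.
    fitted = statistics.NormalDist(statistics.fmean(values), statistics.stdev(values))
    n = len(values)
    rows = []
    for c in cutoffs:
        observed = 100 * sum(v > c for v in values) / n  # count actual observations
        expected = 100 * (1 - fitted.cdf(c))             # area above c under the curve
        rows.append((c, observed, expected))
    return rows

# Placeholder sample only: a synthetic bimodal mixture, NOT the Old Faithful data.
random.seed(0)
intervals = ([random.gauss(55, 6) for _ in range(100)]
             + [random.gauss(80, 6) for _ in range(170)])

for cutoff, observed, expected in tail_percentages(intervals, [50, 60, 70, 80, 90, 100]):
    print(f"> {cutoff:3d} min: observed {observed:5.1f}%   normal {expected:5.1f}%")
```

The second column is the simple count; the third is what a normal distribution with the same mean and standard deviation would predict.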

"That's easy, it's normally distributed"
I still recall with chagrin the day in graduate school when I was a lab instructor in a statistics course, and our students were to test a hypothesis about some statistic. The instructor said "Just tell them that it's normally distributed, and they can take it from there." I was too embarrassed to tell him that I couldn't take it from there. Instead I asked a fellow student, who explained what he meant.

Suppose that we have two groups, with n1 = 18 observations in one group and n2 = 26 observations in the other group. We want to know if the groups came from populations centered at the same point -- i.e., the same means, or medians, or modes, or whatever. Jumping way ahead to distribution-free tests, you will find there that Wilcoxon's Ws statistic is useful for this purpose. We know that when the two populations do have the same central tendency (we often use the word "location"), Ws is approximately normally distributed around a mean of n1(n1 + n2 + 1)/2 with a standard deviation (called a standard error) of sqrt(n1*n2(n1 + n2 + 1)/12). Suppose that we calculated Ws for our data as 300. What is the probability that we would obtain a value of Ws this small if the populations have the same location? Hint: What I had to be told is that you just calculate a z score and find the area under the normal distribution below that point. We know the point (300) and the mean and standard deviation, so it is easy to calculate z. (Thankfully, I have learned more statistics since that afternoon.)
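For anyone who wants to check their arithmetic, the calculation in the hint can be sketched in Python (3.8 or later) as follows. The variable names are my own, and the normal approximation is taken at face value, with no continuity correction.

```python
import math
from statistics import NormalDist

n1, n2 = 18, 26  # group sizes from the example
ws = 300         # the obtained value of Ws

# Mean and standard error of Ws when the populations have the same location.
mean_ws = n1 * (n1 + n2 + 1) / 2
se_ws = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

# z score, and the area under the standard normal curve below it.
z = (ws - mean_ws) / se_ws
p_lower = NormalDist().cdf(z)

print(f"mean = {mean_ws:.1f}, standard error = {se_ws:.2f}")
print(f"z = {z:.2f}, probability of a Ws this small = {p_lower:.4f}")
```

The mean works out to 405 and the standard error to about 41.9, so z is about -2.51 and the area below it is roughly 0.006.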

This is a short discussion of the fact that gender or racial biases in hiring and admissions can be a statistical artifact--which doesn't make them any less wrong. Two Java applets are used to illustrate the point, and readers can play around with numbers and see what happens. It raises some interesting issues.

Send mail to: David.Howell@uvm.edu

Last revised: 7/11/98