The Normal Distribution
David C. Howell
The material in this section relates to the normal
distribution. I do not have much of it written yet, but what is here is a start.
Much of what you ever wanted to know about the normal distribution.
Rich Ulrich at the University of Pittsburgh has created a page giving all
sorts of useful information about the normal distribution, such as how to
generate normal data. This page is available at http://www.pitt.edu/~wpilib/statfaq/gaussfaq.html
- Normal probabilities
- This is a short Java program to calculate probabilities (and the points that cut off given probabilities) under the normal
distribution. It was written at UCLA and will run in many web browsers; it will certainly
run under Netscape 2.0. When you see how little code there is in the HTML document
itself, you will be impressed with what Java can do. Use it to check the tables in the
book. (Remember about scientific notation: -1.0106675468e-7 tells you to move the decimal point
7 places to the left. That is -0.00000010106675468, which is as close to zero as I need to come.)
- Generating data from a Normal Population
- You can easily generate data from a normal distribution using any of the commonly
available statistical packages. If you want to specify a mean and variance, the easiest
thing to do is to ask the program to standardize the data and save the standardized
variable. Then simply multiply the new variable by the standard deviation you want (the square root
of the variance you want) and add the mean that you want. That's all there is to it.
- Applying the Normal Distribution to Nonnormal Data
- While we like to speak about normal distributions, our data are not always normally
distributed. Go to the bimodally distributed data on Old Faithful, referred to elsewhere. Calculate the
percentage of observations with intervals longer than 50, 60, 70, 80, 90, and 100 minutes
by counting up actual observations. Then, using the mean and standard deviation of the data,
estimate what percentage would be above those points if the data were normally distributed.
What do you find?
- "That's easy, it's normally distributed"
- I still recall with chagrin the day in graduate school when I was a lab instructor in a
statistics course and our students were to test a hypothesis about some statistic. The
instructor said, "Just tell them that it's normally distributed, and they can take it
from there." I was too embarrassed to tell him that I couldn't take it from
there. Instead, I asked a fellow student, who explained what he meant.
Suppose that we have two groups, with $n_1 = 18$ observations in one group and $n_2 = 26$
observations in the other. We want to know whether the groups came from populations centered
at the same point, i.e., with the same means, or medians, or modes, or whatever. Jumping way
ahead to distribution-free tests, you will find there that Wilcoxon's $W_s$ statistic is useful
for this purpose. We know that when the two populations do have the same central tendency (we
often use the word "location"), $W_s$ is approximately normally distributed around a mean of
$n_1(n_1 + n_2 + 1)/2$ with a standard deviation (called a standard error) of
$\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$. Suppose that we calculated $W_s$ for our data as 300.
What is the probability that we would obtain a value of $W_s$ this small if the populations
have the same location? Hint: what I had to be told is that you simply calculate a z score
and find the area under the normal distribution below that point. We know the point (300)
and the mean and standard deviation, so it is easy to calculate z. (Thankfully, I have
learned more statistics since that afternoon.)
- Hiring and Admissions Bias
- This is a short discussion of the fact that gender or racial biases in hiring and
admissions can be a statistical artifact, which doesn't make them any less wrong. Two
Java applets illustrate the point, and readers can experiment with the numbers
to see what happens. The discussion raises some interesting issues.
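Java applets of the kind described under "Normal probabilities" generally do not run in current browsers. As a rough stand-in for checking the tables in the book, here is a minimal Python sketch, assuming Python 3.8 or later for `statistics.NormalDist`. The helper names `area_below`, `area_above`, and `point_below` are hypothetical, chosen only for this illustration. The last line shows how to print a number written in scientific notation as an ordinary decimal.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_below(z: float) -> float:
    """Area under the standard normal curve to the left of z (hypothetical helper)."""
    return STANDARD_NORMAL.cdf(z)


def area_above(z: float) -> float:
    """Area to the right of z (hypothetical helper).

    Uses symmetry (area above z equals area below -z), which keeps more
    precision for very small tail areas than computing 1 - cdf(z).
    """
    return STANDARD_NORMAL.cdf(-z)


def point_below(p: float) -> float:
    """The z value that cuts off the lowest proportion p (hypothetical helper)."""
    return STANDARD_NORMAL.inv_cdf(p)


if __name__ == "__main__":
    print(f"{area_below(1.96):.4f}")   # 0.9750
    print(f"{point_below(0.95):.4f}")  # 1.6449
    # Scientific notation: e-7 means "move the decimal point 7 places left".
    print(f"{-1.0106675468e-7:.17f}")  # -0.00000010106675468
```

Comparing a few of these values with the printed tables is a quick way to confirm that you are reading the table in the intended direction (area below versus area above).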
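The recipe under "Generating data from a Normal Population" (standardize, then multiply by the desired standard deviation and add the desired mean) can also be sketched with Python's standard library rather than a statistical package. This is an illustration, not any package's own procedure: the function name `generate_normal` and its `seed` parameter are hypothetical, and the sketch assumes standardization uses the sample standard deviation with an n - 1 denominator (`statistics.stdev`). Because the sample itself is standardized, the result has exactly the requested sample mean and standard deviation, up to floating-point rounding.

```python
import random
import statistics


def generate_normal(n, mean, sd, seed=None):
    """Return n normal values whose sample mean and SD equal mean and sd.

    Hypothetical helper for illustration; requires n >= 2.
    """
    rng = random.Random(seed)
    # Step 1: draw raw values from a normal distribution (any mean and SD would do).
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Step 2: standardize them (subtract the sample mean, divide by the sample SD).
    m = statistics.mean(raw)
    s = statistics.stdev(raw)  # n - 1 denominator (an assumption; packages may differ)
    z = [(x - m) / s for x in raw]
    # Step 3: multiply by the SD you want, then add the mean you want.
    return [mean + sd * value for value in z]


if __name__ == "__main__":
    data = generate_normal(50, mean=100, sd=15, seed=1)
    print(round(statistics.mean(data), 6), round(statistics.stdev(data), 6))
```

If you only need the population (rather than the sample) to have the given mean and standard deviation, you can skip the standardizing step and call `rng.gauss(mean, sd)` directly; the sample statistics will then vary around the requested values.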
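For the Old Faithful exercise, the observed and normal-theory percentages can be computed side by side. This sketch does not include the Old Faithful data; it assumes you have loaded the intervals (in minutes) into a Python list yourself. The normal-theory estimate assumes a normal distribution with the sample's mean and standard deviation, and `compare_tail_percentages` is a hypothetical name.

```python
import statistics
from statistics import NormalDist


def compare_tail_percentages(intervals, cutoffs=(50, 60, 70, 80, 90, 100)):
    """For each cutoff, return (cutoff, observed %, normal-theory %) above it.

    intervals: observed intervals in minutes (you supply these; not included here).
    Hypothetical helper for illustration; requires at least two observations.
    """
    n = len(intervals)
    # Fit a normal distribution using the sample mean and SD.
    fitted = NormalDist(mu=statistics.mean(intervals), sigma=statistics.stdev(intervals))
    rows = []
    for cutoff in cutoffs:
        # Observed: count the intervals that are actually longer than the cutoff.
        observed = 100.0 * sum(1 for x in intervals if x > cutoff) / n
        # Expected: area above the cutoff under the fitted normal curve.
        expected = 100.0 * (1.0 - fitted.cdf(cutoff))
        rows.append((cutoff, observed, expected))
    return rows
```

Once `intervals` holds the data, a loop such as `for cutoff, obs, exp in compare_tail_percentages(intervals): print(cutoff, round(obs, 1), round(exp, 1))` prints the two columns for comparison.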
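The calculation described in the hint for the Wilcoxon example can be carried out in a few lines. This sketch uses the normal approximation exactly as given in the text (no continuity or tie correction), with the example's numbers: n1 = 18, n2 = 26, and Ws = 300. The function name `ws_z_score` is hypothetical.

```python
import math
from statistics import NormalDist


def ws_z_score(ws, n1, n2):
    """Normal approximation for Wilcoxon's rank-sum statistic Ws (hypothetical helper).

    Returns the mean and standard error of Ws when the two populations have
    the same location, plus the z score for the observed Ws.
    No continuity or tie correction is applied.
    """
    mean_ws = n1 * (n1 + n2 + 1) / 2
    se_ws = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (ws - mean_ws) / se_ws
    return mean_ws, se_ws, z


if __name__ == "__main__":
    mean_ws, se_ws, z = ws_z_score(300, n1=18, n2=26)
    # Area below z: probability of a Ws this small if the locations are equal.
    p_lower = NormalDist().cdf(z)
    print(f"mean = {mean_ws}, SE = {se_ws:.3f}, z = {z:.3f}, p = {p_lower:.4f}")
    # mean = 405.0, SE = 41.893, z = -2.506, p = 0.0061
```

The formulas for the mean and standard error assume that Ws is the sum of the ranks in the group of size n1, so the same group must be used when computing Ws from the data.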
Send mail to: David.Howell@uvm.edu
Last revised: 7/11/98