Normal Probability Plots

From: Minitab Users Group Newsletter (1992), 15, 3-4

This document has been scanned from the above source for use in my course (Psychology 340). I am assuming that it meets the requirements of fair use under the copyright laws. The author was not given.

It is often important to determine whether or not a set of data is normally distributed. Before the availability of packages such as MINITAB, this was usually done by examining the shape of the data's histogram. Another commonly used approach was to compute the percentage of observations within 1, 2, or 3 standard deviations of the mean and compare those values to what would be expected for a normal distribution.
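
The percentage check described above can be sketched in a few lines. This illustration is in Python rather than MINITAB. The function name within_sd_percentages, the seed, and the simulated sample (mean 80, standard deviation 5) are assumptions made for demonstration only.

```python
import random
import statistics
from statistics import NormalDist

def within_sd_percentages(data):
    """Return the percentage of observations within 1, 2 and 3 sample SDs of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    n = len(data)
    return {k: 100.0 * sum(abs(x - mean) <= k * sd for x in data) / n
            for k in (1, 2, 3)}

# Percentages expected under an exact normal distribution.
std_normal = NormalDist()
expected = {k: 100.0 * (std_normal.cdf(k) - std_normal.cdf(-k)) for k in (1, 2, 3)}

# Hypothetical sample: 1000 simulated normal observations (arbitrary seed).
rng = random.Random(15)
sample = [rng.gauss(80, 5) for _ in range(1000)]
observed = within_sd_percentages(sample)

for k in (1, 2, 3):
    print(f"within {k} SD: observed {observed[k]:.1f}%, expected {expected[k]:.1f}%")
```

Large gaps between the observed and expected percentages suggest non-normality, but the comparison uses only three numbers and says little about where the distribution departs from normal.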

Both of these approaches are problematic when the data set is either very small or very large. When the sample size is small, the shape of a histogram is hard to determine. For large data sets, either technique is very time-consuming without a statistical package. Today a far superior tool for checking normality exists: the normal probability plot.

MINITAB does not have a normal probability plot command. Still, you can construct this plot by using the following two commands. Suppose that the data are located in C1:

NSCORES C1 into C2

PLOT C1 versus C2

NSCORES calculates a normal score for each observation in the data set. The normal score of the minimum value in the data set is the expected value of the first order statistic in a sample of the same size from a standard normal distribution. The normal score of the next smallest value is the expected value of the second order statistic from the same sample, and so on. If several observations are equal, each is given the same normal score, which is calculated from the average of their ranks. If the sample is from a normal population, the plotted points should lie close to a straight line. Otherwise, another shape will be present.
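
To make the idea of a normal score concrete, here is a minimal sketch in Python. It assumes Blom's approximation, inv_cdf((r - 3/8) / (n + 1/4)), for the expected standard normal order statistics; this is a common choice, and it is an assumption here that it is close to what NSCORES computes. The function name normal_scores and the five data values are hypothetical.

```python
from statistics import NormalDist

def normal_scores(data):
    """Approximate normal scores, returned in the same order as the data.

    Assumption: Blom's approximation to the expected standard normal order
    statistics. Tied values share the average of their ranks.
    """
    n = len(data)
    order = sorted(range(n), key=lambda i: data[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend [start, end] over every observation tied with the current one.
        while end + 1 < n and data[order[end + 1]] == data[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for j in range(start, end + 1):
            ranks[order[j]] = average_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]

# Hypothetical data containing one tie (the two 7.1 values).
data = [7.1, 5.3, 9.8, 7.1, 6.0]
scores = normal_scores(data)
for x, z in zip(data, scores):
    print(f"{x:5.1f}  {z:+.3f}")
```

The two tied observations receive the same score, and the smallest and largest observations receive scores of equal size and opposite sign.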

Let us now use MINITAB to generate a set of normal data and to construct a normal probability plot from these data.

MTB > BASE 15

MTB > RANDOM 1000 C1;

SUBC> NORMAL 80 5.

MTB > NSCORES C1 C2

MTB > GPLOT C1 C2

 

(The graphs aren’t great, but they will suffice. dch)

Note that the points in this normal probability plot appear to follow a straight line. Still, there may be situations in which it is not as easy to judge the linearity of the normal probability plot. Then one may measure the amount of linearity by computing the Pearson correlation coefficient and employing a powerful test for normality that is based upon this coefficient. Critical values for this test have always appeared in the MINITAB Reference Manual, but they only handled sample sizes up to 75. The Release 8 manual has critical values for sample sizes up to 1000. For example, with an alpha equal to 0.05 the critical value is .9984. Because the correlation coefficient of .999 computed below exceeds the critical value, we do not reject the null hypothesis of normality.

MTB > CORR C1 C2

Correlation of C1 and C2 = 0.999
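
The correlation test can be sketched outside MINITAB as follows. This Python illustration makes several assumptions: the data are simulated with an arbitrary seed (it does not reproduce the BASE 15 data), the normal scores use Blom's approximation rather than the exact NSCORES values, and the helper names are hypothetical. The critical value .9984 is the one quoted in the text. Normality is rejected only when the correlation falls below the critical value.

```python
import math
import random
from statistics import NormalDist, fmean

def normal_scores_sorted(n):
    """Approximate normal scores for ranks 1..n (Blom's approximation, an assumption)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Simulated stand-in for C1: 1000 normal observations with mean 80 and SD 5.
rng = random.Random(15)  # arbitrary seed, not MINITAB's random number stream
c1 = sorted(rng.gauss(80, 5) for _ in range(1000))
# With the data sorted and no ties, the i-th value pairs with the score for rank i.
c2 = normal_scores_sorted(len(c1))

r = pearson(c1, c2)
critical_value = 0.9984  # quoted in the text for alpha = 0.05
verdict = "do not reject" if r >= critical_value else "reject"
print(f"r = {r:.4f}; {verdict} normality at alpha = 0.05")
```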

For a normal distribution, this plot can also be used to estimate the mean and standard deviation of the distribution. They are simply the y-intercept and slope of the straight line fitted to the plot, with the data on the vertical axis and the normal scores on the horizontal axis. (Please refer to Chambers et al. (1983) for a more complete discussion.) For example, by examining the plot given above we can determine that estimates of the population mean and standard deviation are approximately 80 and 5, respectively.

This can be verified by using the REGRESSION output that follows.

 

MTB > BRIEF 1

MTB > REGRESSION C1 on 1 predictor C2

The regression equation is
C1 = 79.9 + 5.07 C2

Predictor        Coef      Stdev     t-ratio        p
Constant      79.9235    0.00780    10194.93    0.000
C2            5.06958    0.00787      644.56    0.000

s = 0.2479     R-sq = 99.8%     R-sq(adj) = 99.8%

Analysis of Variance

SOURCE         DF       SS       MS            F        p
Regression      1    25533    25533    415459.16    0.000
Error         998       61        0
Total         999    25595
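
The same estimates can be recovered with a short least squares fit. The sketch below is in Python and rests on assumptions: the data are simulated with an arbitrary seed, the normal scores use Blom's approximation, and fit_line is a hypothetical helper, so the numbers will not match the MINITAB output above exactly.

```python
import random
from statistics import NormalDist, fmean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Simulated normal data with mean 80 and SD 5 (arbitrary seed).
rng = random.Random(15)
data = sorted(rng.gauss(80, 5) for _ in range(1000))
n = len(data)

# Approximate normal scores for ranks 1..n (Blom's approximation, an assumption).
std_normal = NormalDist()
scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

# Data on the vertical axis, normal scores on the horizontal axis.
intercept, slope = fit_line(scores, data)
print(f"estimated mean (intercept) = {intercept:.2f}")
print(f"estimated SD (slope)       = {slope:.2f}")
```

Because the normal scores are symmetric about zero, the intercept is essentially the sample mean, and the slope is close to the sample standard deviation when the data are normal.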

In addition, many characteristics of any distribution can be determined by studying a normal probability plot. Among them are heavy and light tails, skewness, outliers, granularity, and bimodality. (Please refer to Hamilton (1992) for a more complete discussion.) For example, positively skewed distributions produce a downward-bowed normal probability plot. Let us check this by generating a set of data from a chi-square distribution with 15 degrees of freedom and constructing a normal probability plot.

MTB > BASE 15

MTB > RANDOM 1000 C11;

SUBC> CHIS 15.

MTB > NSCORES C11 C12

MTB > GPLOT C11 C12

Observe the downward-bowed appearance of the plot, which is based upon positively skewed data.
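
One way to see the bow numerically is to fit a straight line to the plot and look at the sign of the residuals. With the data on the vertical axis, a downward-bowed plot has points above the fitted line in both tails and below it in the middle. The Python sketch below is an illustration under assumptions: the chi-square data are simulated as a gamma distribution with shape 7.5 and scale 2 (equivalent to 15 degrees of freedom), the seed is arbitrary, the normal scores use Blom's approximation, and the function name and the 10% and 50% cut-offs are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean

def tail_and_middle_residuals(data):
    """Fit a line to (normal score, sorted data) and average residuals by region.

    Returns the mean residual for the lowest 10%, the middle 50% and the
    highest 10% of the sorted observations.
    """
    y = sorted(data)
    n = len(y)
    std_normal = NormalDist()
    x = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    tenth, quarter = n // 10, n // 4
    return (fmean(residuals[:tenth]),
            fmean(residuals[quarter:n - quarter]),
            fmean(residuals[n - tenth:]))

# Chi-square with 15 degrees of freedom, simulated as Gamma(shape 7.5, scale 2).
rng = random.Random(15)  # arbitrary seed
chisq = [rng.gammavariate(7.5, 2.0) for _ in range(1000)]

low, middle, high = tail_and_middle_residuals(chisq)
print(f"mean residual, lowest 10%:  {low:+.2f}")
print(f"mean residual, middle 50%:  {middle:+.2f}")
print(f"mean residual, highest 10%: {high:+.2f}")
```

Positive residuals in both tails together with negative residuals in the middle correspond to the bowed shape seen in the plot.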

 

References:

Chambers, J.M., et al. (1983) Graphical Methods for Data Analysis. Boston: Duxbury.

Hamilton, L.C. (1992) Regression with Graphics: A Second Course in Applied Statistics. Pacific Grove, CA: Brooks/Cole.

Minitab Inc. (1991) MINITAB Reference Manual, Release 8, PC Version. Rosemont, PA.