David C. Howell

For this example I used SPSS to generate five variables as random samples (of 20 cases each) from a normally distributed population. Because the samples are independent of each other, the true population correlation between any pair of variables is 0.00. The results, without the data, are shown below. Notice that the intercorrelation matrix shows you the correlation, below that the sample size, and below that the two-tailed significance level. (Thus, for example, when the true correlation between X1 and X2 in the population is 0.00, a sample correlation as extreme as ±.1127 would occur 63.6 percent of the time.)

```
                  - -  Correlation Coefficients  - -

               X1         X2         X3         X4         X5

X1         1.0000     -.1127      .2541     -.3364      .1563
          (   20)    (   20)    (   20)    (   20)    (   20)
          P= .       P= .636    P= .280    P= .147    P= .511

X2         -.1127     1.0000     -.1044      .1905      .0451
          (   20)    (   20)    (   20)    (   20)    (   20)
          P= .636    P= .       P= .661    P= .421    P= .850

X3          .2541     -.1044     1.0000     -.1739      .3960
          (   20)    (   20)    (   20)    (   20)    (   20)
          P= .280    P= .661    P= .       P= .464    P= .084

X4         -.3364      .1905     -.1739     1.0000     -.1503
          (   20)    (   20)    (   20)    (   20)    (   20)
          P= .147    P= .421    P= .464    P= .       P= .527

X5          .1563      .0451      .3960     -.1503     1.0000
          (   20)    (   20)    (   20)    (   20)    (   20)
          P= .511    P= .850    P= .084    P= .527    P= .

(Coefficient / (Cases) / 2-tailed Significance)

```
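The same kind of simulation can be sketched outside SPSS. The following Python snippet (my own sketch, not part of the original page; `pearson_r` is a small helper written here, not a library routine) draws five independent normal samples of 20 cases and computes every pairwise correlation:

```python
# Sketch of the simulation: five independent normal samples (n = 20 each),
# then the Pearson correlation for every pair. Since the samples are
# independent, each r should hover around 0, but with n = 20 the sample
# correlations can wander fairly far from it.
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # arbitrary seed, for reproducibility
samples = [[random.gauss(0, 1) for _ in range(20)] for _ in range(5)]
for i in range(5):
    for j in range(i + 1, 5):
        r = pearson_r(samples[i], samples[j])
        print(f"r(X{i + 1}, X{j + 1}) = {r:+.4f}")
```

Running this repeatedly with different seeds gives a feel for how much sample correlations bounce around even when the population correlation is exactly zero.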

A scatterplot of these data follows:

[scatterplot matrix of X1 through X5 not reproduced here]

What if we increase the sample size?

To give you a sense of the relationship between sample size and the variability of correlation coefficients, I have repeated the previous example, but this time I have generated 200 cases. Because the correlations are based on much more data, they should hover more closely around the true population correlation of 0.00. Can you see this in the following set of data?

```
                  - -  Correlation Coefficients  - -

               X1         X2         X3         X4         X5

X1         1.0000     -.0002      .0500      .0236      .0072
          (  200)    (  200)    (  200)    (  200)    (  200)
          P= .       P= .998    P= .482    P= .741    P= .919

X2         -.0002     1.0000     -.0378      .1233      .0306
          (  200)    (  200)    (  200)    (  200)    (  200)
          P= .998    P= .       P= .595    P= .082    P= .667

X3          .0500     -.0378     1.0000      .1810     -.0225
          (  200)    (  200)    (  200)    (  200)    (  200)
          P= .482    P= .595    P= .       P= .010    P= .751

X4          .0236      .1233      .1810     1.0000     -.0168
          (  200)    (  200)    (  200)    (  200)    (  200)
          P= .741    P= .082    P= .010    P= .       P= .814

X5          .0072      .0306     -.0225     -.0168     1.0000
          (  200)    (  200)    (  200)    (  200)    (  200)
          P= .919    P= .667    P= .751    P= .814    P= .

(Coefficient / (Cases) / 2-tailed Significance)

" . " is printed if a coefficient cannot be computed

```
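The tighter clustering of the correlations at n = 200 can be quantified. Under the null hypothesis of zero correlation (and bivariate normality), the sampling variance of r is 1/(n − 1), so its standard deviation shrinks as the sample grows. A quick check (my own sketch, not part of the original page):

```python
# Standard deviation of the sample correlation r when the true
# correlation is 0 (bivariate normal case): sd(r) = 1 / sqrt(n - 1).
import math

def sd_r_under_null(n):
    """Sampling standard deviation of r when rho = 0."""
    return 1 / math.sqrt(n - 1)

print(round(sd_r_under_null(20), 3))   # ~0.229 for n = 20
print(round(sd_r_under_null(200), 3))  # ~0.071 for n = 200
```

This matches what the two outputs show: with 20 cases the sample correlations wander as far as ±.40 from zero, while with 200 cases nearly all of them sit within about ±.12 of it.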

Notice that there is one Type I error here. (Remember that a Type I error consists of rejecting the null hypothesis when it is in fact true. Since I drew all of my samples independently, the true correlation in the population is in fact 0.00.) Can you find the Type I error? What do you think happens to the probability of a Type I error when we work at α = .05 but run many hypothesis tests? (How many tests did we actually run here?)
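To see why running many tests matters, note that with 5 variables there are C(5, 2) = 10 distinct pairwise correlations. If we (roughly) treat those 10 tests as independent, the chance of at least one Type I error is 1 − (1 − α)^k, which is far above .05. A quick sketch of the arithmetic (mine, not from the original page; the independence assumption is only approximate for correlations sharing variables):

```python
# Familywise Type I error rate for k tests, each at level alpha,
# treating the tests as independent (an approximation here).
from math import comb

alpha = 0.05
k = comb(5, 2)  # 10 distinct pairwise correlations among 5 variables
p_at_least_one_type1 = 1 - (1 - alpha) ** k
print(k, round(p_at_least_one_type1, 3))  # 10 tests -> about 0.401
```

So even with every population correlation equal to zero, there is roughly a 40% chance of flagging at least one "significant" correlation in a matrix like this one.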

Last revised: 7/13/98