*Generating Data with a Specified Correlation*

## David C. Howell

It is quite easy to generate a set of data that represents a sample from a population with a specified correlation coefficient, r. I don't have the time right now to write out a specific program. However, the basic steps are very simple. The program will not generate a data set with exactly the correlation you specify. Instead, it will draw data from a *population* whose correlation parameter (ρ) is that correlation.

1. Use the normal random number function available in almost all software to generate two random variables (X and Y).
2. Standardize these variables to mean = 0, sd = 1.
3. Calculate a = r/sqrt(1 - r^2).
4. Calculate Z = a*X + Y.
5. Adjust the means and variances of X and Z to what you want them to be by simple linear transformations (e.g., Xnew = Xold*NewSD + NewMean).

Now the correlation between X and Z will be r.
Before the means and variances are adjusted, the mean of Z will be 0.00, and its standard deviation will be sqrt(a^2 + 1).
If you don't standardize the variables, I would assume that the resulting r will come from a population where ρ = r, but I haven't worked this out. If anyone knows for sure, I'd appreciate hearing.
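
To see why the weight a gives the intended correlation, here is a short derivation. It assumes, as a population idealization, that X and Y are uncorrelated, that each has mean 0 and variance 1, and that |r| < 1. The notation Var, Cov, and ρ_XZ is introduced here only for this derivation.

$$
\begin{aligned}
\operatorname{Var}(Z) &= a^{2}\operatorname{Var}(X) + \operatorname{Var}(Y) + 2a\operatorname{Cov}(X,Y) = a^{2} + 1,\\
\operatorname{Cov}(X,Z) &= a\operatorname{Var}(X) + \operatorname{Cov}(X,Y) = a,\\
\rho_{XZ} &= \frac{\operatorname{Cov}(X,Z)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Z)}} = \frac{a}{\sqrt{a^{2}+1}}.
\end{aligned}
$$

Substituting a = r/sqrt(1 - r^2) gives a^2 + 1 = 1/(1 - r^2), so

$$
\rho_{XZ} = \frac{r}{\sqrt{1-r^{2}}}\cdot\sqrt{1-r^{2}} = r.
$$

The first line also gives the standard deviation of Z as sqrt(a^2 + 1). If X and Y share some other common variance, that variance cancels from the ratio, so ρ_XZ is still r. In any one sample the correlation between X and Y will not be exactly zero, which is why the sample correlation between X and Z only approximates r.
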

I got this idea from an electronic message from Marco Welton, at University College Cork, Ireland, but I'm sure that it is not original with him.

If you want a program in SPSS or R that will generate a data set with an exact correlation matrix, go to CorrGen2.html. That program will handle a matrix with many variables, not just two.
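
The two-variable steps described above can be sketched in Python. This is only a minimal illustration, not the SPSS or R program just mentioned, and it uses nothing beyond the standard library. The function names (`standardize`, `pearson_r`, `correlated_sample`) and their parameters are hypothetical, chosen for this example. One assumption goes beyond the text: Z is standardized again before rescaling, so that the Xnew = Xold*NewSD + NewMean formula can be applied to it as well.

```python
import math
import random
import statistics


def standardize(values):
    """Rescale values to sample mean 0 and sample sd 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson_r(xs, ys):
    """Sample Pearson correlation, computed from deviations about the means."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlated_sample(n, r, x_mean=0.0, x_sd=1.0, z_mean=0.0, z_sd=1.0, seed=None):
    """Draw n (X, Z) pairs from a population whose correlation is r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    rng = random.Random(seed)

    # Generate two independent normal variables and standardize each one.
    x = standardize([rng.gauss(0.0, 1.0) for _ in range(n)])
    y = standardize([rng.gauss(0.0, 1.0) for _ in range(n)])

    # Weight that turns an uncorrelated pair into a pair correlated r.
    a = r / math.sqrt(1.0 - r * r)
    z = [a * xi + yi for xi, yi in zip(x, y)]

    # Assumption of this sketch: standardize Z again (its sd is roughly
    # sqrt(a^2 + 1)) so the same linear rescaling works for both variables.
    z = standardize(z)

    # Linear rescaling; this does not change the correlation.
    x_new = [xi * x_sd + x_mean for xi in x]
    z_new = [zi * z_sd + z_mean for zi in z]
    return x_new, z_new
```

For example, `x, z = correlated_sample(500, 0.6, x_mean=100, x_sd=15, z_mean=50, z_sd=10)` returns two lists, and `pearson_r(x, z)` should come out close to 0.6 but not exactly equal to it, since 0.6 is the population value rather than the sample value.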
