*Generating Data with a Fixed Intercorrelation Matrix*

David C. Howell

Frequently people want to generate data with specific properties. For example, they might want a set of data whose intercorrelation matrix is exactly specified in advance. A long time ago I wrote such a program. It is a nice program, but it is somewhat long.

Recently Andrew Hayes at Ohio State provided me with a program in SPSS syntax that I like better. All of the credit goes to him; I am just writing the web page. It is a very easy program to implement.

If you would rather work in R, there is code at the bottom of this page that you can use. I wrote that code myself, building on the mvrnorm function written by Venables and Ripley.

The SPSS syntax is given below, but first a word about implementation. The simplest thing to do is to paste a copy of the program into any text editor. Then decide whether you want your sample to reproduce the correlation coefficients exactly or whether you want to draw data from populations with those correlations. In the latter case, your sample correlations will not be exactly the ones specified in the matrix; they will vary around them, as any sample statistics do. If you want exact correlations, leave the 4th line of the syntax (not counting the commented section), "compute exact = 1.", as is. If you want to draw from populations with the specified correlations, change that line to "compute exact = 0."
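Why can the "exact" option hit the specified matrix exactly? A linear transformation changes a covariance matrix in a predictable way: if the rows of a data set have sample covariance S = Ls Ls^T and the target is r = Lr Lr^T, then transforming every row v into Lr Ls^-1 v gives a sample covariance of Lr Ls^-1 S Ls^-T Lr^T = r. Below is a minimal Python sketch of that idea (standard library only). The function names cholesky, sample_cov, forward_solve, and make_exact are hypothetical, chosen for illustration. One assumption to flag: this sketch uses Cholesky factors of S and r, whereas Hayes's SPSS program uses symmetric square roots computed with eigen(). Both reproduce r exactly, but they yield different (equally valid) data sets. Skipping the make_exact step corresponds to "compute exact = 0."

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (raises ValueError if a is not positive definite)."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def sample_cov(x):
    """Sample covariance matrix (divisor n - 1) of a list of rows."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in x) / (n - 1)
             for j in range(p)] for i in range(p)]


def forward_solve(L, b):
    """Solve L y = b for y, where L is lower triangular."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def make_exact(x, r):
    """Map each row v of x to Lr Ls^-1 v, so the sample covariance becomes exactly r."""
    ls = cholesky(sample_cov(x))  # S = Ls Ls^T
    lr = cholesky(r)              # r = Lr Lr^T
    p = len(r)
    out = []
    for row in x:
        w = forward_solve(ls, row)  # Ls^-1 v: these rows have identity sample covariance
        out.append([sum(lr[i][k] * w[k] for k in range(i + 1)) for i in range(p)])
    return out


rng = random.Random(1)
r = [[1.0, 0.4, -0.3], [0.4, 1.0, 0.6], [-0.3, 0.6, 1.0]]
raw = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
exact = make_exact(raw, r)
for row in sample_cov(exact):
    print(" ".join(f"{v:8.4f}" for v in row))  # reproduces r, up to rounding error
```

Because r has 1s on its diagonal, a sample covariance matrix equal to r is also a sample correlation matrix equal to r.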

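Next, decide how many rows (cases) you want in your data matrix. If you want 150 cases, change the 3rd line to "compute n = 150." The number of variables is not set on this line; it is the number of rows (and columns) in the matrix r, so 150 cases on five variables also requires a 5 x 5 matrix.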

Finally, specify the matrix of intercorrelations by editing the "compute r = {...}." command so that it contains your matrix. In SPSS matrix syntax, commas separate the entries within a row and semicolons separate the rows. Be very careful here, because for some reason it is hard to type a matrix in just the way you want it. Remember that a correlation matrix is symmetric, so be sure that your upper and lower triangular matrices match. Maybe that is easy, but I always screw it up.
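Because typing mistakes are so easy to make, it can help to check the matrix before running the syntax. Besides being symmetric with 1s on the diagonal, the matrix must be positive definite, or the chol() step in the program cannot factor it. Some combinations of correlations are simply impossible: for example, .9, .9, and -.9 among three variables. Below is a minimal Python sketch of such a check. check_correlation_matrix is a hypothetical helper, not part of the SPSS program, and it tests positive definiteness by attempting a Cholesky factorization.

```python
import math


def check_correlation_matrix(r, tol=1e-12):
    """Return a list of problems found in r; an empty list means it looks usable."""
    p = len(r)
    if any(len(row) != p for row in r):
        return ["matrix is not square"]
    problems = []
    for i in range(p):
        if abs(r[i][i] - 1.0) > tol:
            problems.append(f"diagonal entry ({i + 1},{i + 1}) is {r[i][i]}, not 1")
        for j in range(i + 1, p):
            if abs(r[i][j] - r[j][i]) > tol:
                problems.append(f"entries ({i + 1},{j + 1}) and ({j + 1},{i + 1}) differ")
    if problems:
        return problems
    # Attempt a Cholesky factorization; it breaks down when r is not positive definite.
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = r[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return ["matrix is not positive definite"]
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return problems


example = [[1, .4, -.3], [.4, 1, .6], [-.3, .6, 1]]      # the matrix in the SPSS code
typo = [[1, .4, -.3], [.4, 1, .6], [.3, .6, 1]]          # lower triangle mistyped
impossible = [[1, .9, .9], [.9, 1, -.9], [.9, -.9, 1]]   # symmetric but not positive definite
for name, m in [("example", example), ("typo", typo), ("impossible", impossible)]:
    print(name, check_correlation_matrix(m) or "OK")
```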

Having modified the code appropriately, cut and paste it into SPSS. Just start up SPSS, click on File/New, and specify that you want syntax (not data). Then paste the code into that screen and click "Run." The resulting data will appear in the SPSS data window and can be saved.

If, on the other hand, you only want a pair of variables drawn from a population with a specified correlation, you can easily generate them using the instructions given in CorrGen.html. This will not, however, give you the exact correlation you want for sample data.
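For the two-variable case, one common approach (not necessarily the one CorrGen.html describes) combines two independent standard normal variables: if X and E each have variance 1, then Y = rho*X + sqrt(1 - rho^2)*E also has variance 1 and correlates rho with X in the population. Below is a minimal Python sketch; two_correlated and pearson are hypothetical names. Each call draws a new sample, so the sample correlation only approximates rho, which is exactly the limitation noted above.

```python
import math
import random


def two_correlated(n, rho, seed=None):
    """Draw n (x, y) pairs from a bivariate standard normal population with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        # Var(y) = rho^2 + (1 - rho^2) = 1 and Cov(x, y) = rho
        y = rho * x + math.sqrt(1.0 - rho * rho) * e
        pairs.append((x, y))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


pairs = two_correlated(50, 0.6, seed=42)
print(round(pearson(pairs), 3))  # near 0.6, but not exactly 0.6
```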

*******************************************

```
/* This program generates a multivariate random normal sample */
/* of size n from a population described by covariance matrix r */
/* Setting exact to 1 yields a sample that exactly reproduces */
/* the population matrix. Setting exact to any value other than 1 */
/* produces a sample from the population, which will be subject */
/* to random sample error, meaning that sample covariance */
/* matrix will not be exactly equal to the population matrix */
/* keep the seed constant to reproduce the data from run to run */
/* Written by Andrew F. Hayes */
/* School of Communication */
/* The Ohio State University */
/* hayes.338@osu.edu */
/* Version 1.1, Sept 15, 2010 */
set seed = 12343.
matrix.
compute n = 500.
compute exact = 1.
compute r = {1, .4, -.3;
             .4, 1, .6;
             -.3, .6, 1}.
compute rn = nrow(r).
compute x1 = sqrt(-2*ln(uniform(n,rn)))&*cos((2*3.14159265358979)*uniform(n,rn)).
compute x1=x1*chol(r).
compute ones = make(n,1,1).
compute sigma = (t(x1)*(ident(n)-(1/n)*ones*t(ones))*x1)*(1/(n-1)).
do if (exact = 1).
call eigen(r, vc, vl).
compute sqrtr = vc*sqrt(mdiag(vl))*t(vc).
call eigen(sigma, vc, vl).
compute sqrts = vc*sqrt(mdiag(vl))*t(vc).
compute x1 = x1*inv(sqrts)*sqrtr.
compute ones = make(n,1,1).
compute sigma = (t(x1)*(ident(n)-(1/n)*ones*t(ones))*x1)*(1/(n-1)).
end if.
print r/title = "Population Matrix"/format = F16.4.
print sigma/title = "Sample Matrix"/format = F16.4.
print n/title = "number of cases created"/format = F16.0.
save x1/outfile = *.
end matrix.
```
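For readers who want to follow the matrix algebra outside SPSS, here is a minimal Python sketch of the program's sampling step, that is, what happens when exact is not 1. It generates standard normal deviates with the same Box-Muller formula as the x1 line, sqrt(-2 ln U1) cos(2 pi U2), and then multiplies them by a Cholesky factor of r, which gives a population covariance matrix of r. The function names are hypothetical, and Python's random module will not reproduce SPSS's numbers even with the same seed.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (raises ValueError if a is not positive definite)."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def box_muller(rng):
    """One standard normal deviate, using the formula from the x1 line of the SPSS program."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def sample_from_population(n, r, seed=12343):
    """n rows drawn from a multivariate normal population with covariance matrix r."""
    rng = random.Random(seed)  # does NOT reproduce SPSS's random stream for the same seed
    L = cholesky(r)
    p = len(r)
    data = []
    for _ in range(n):
        z = [box_muller(rng) for _ in range(p)]
        # SPSS computes z * chol(r), where chol(r) is upper triangular (U = L^T);
        # for one row that equals L times z written as a column vector.
        data.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)])
    return data


def sample_corr(data):
    """Sample correlation matrix of a list of rows."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            for j in range(p)] for i in range(p)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)] for i in range(p)]


r = [[1.0, 0.4, -0.3], [0.4, 1.0, 0.6], [-0.3, 0.6, 1.0]]
data = sample_from_population(500, r)
for row in sample_corr(data):
    print(" ".join(f"{v:8.4f}" for v in row))  # close to r, but not equal to it
```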

```r
# This code uses the mvrnorm function in the MASS library, written by Venables and Ripley.
# I no longer recall who suggested that, but someone deserves credit.
# The variables will each have a mean of 0 and a variance of 1. After you have the data, you can
# apply a linear transformation to get any mean and variance that you want.
library(MASS)

rmat <- matrix(c( 1.0, .50, -.50,
                  .50, 1.0,  .30,
                 -.50, .30,  1.0), nrow = 3, byrow = TRUE)
mu <- c(0, 0, 0)

# empirical = TRUE makes the sample means and covariance matrix equal mu and rmat exactly.
# If empirical = FALSE, the sample correlations will only approximate rmat.
mat <- mvrnorm(100, mu = mu, Sigma = rmat, empirical = TRUE)

cat("The intercorrelation matrix is:\n")
print(cor(mat))
```
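The comments in the R code mention applying a linear transformation to get any mean and variance you want. Here is a minimal Python sketch of that step; rescale is a hypothetical helper. Each column is multiplied by the desired standard deviation and then shifted by the desired mean. Multiplying by a positive constant and adding a constant leaves Pearson correlations unchanged, so the intercorrelation matrix is unaffected. If the columns start with a mean of exactly 0 and a variance of exactly 1 (as with empirical = TRUE), the new means and standard deviations are exactly the ones requested.

```python
import statistics


def rescale(data, means, sds):
    """Map columns with mean 0 and SD 1 to columns with the given means and SDs."""
    return [[m + s * v for v, m, s in zip(row, means, sds)] for row in data]


# Toy data whose two columns each have mean 0 and sample variance 1 (divisor n - 1)
z = [[-1.0, 1.0], [1.0, -1.0], [0.0, 0.0]]
scores = rescale(z, means=[100.0, 50.0], sds=[15.0, 10.0])
print([statistics.mean(col) for col in zip(*scores)])   # [100.0, 50.0]
print([statistics.stdev(col) for col in zip(*scores)])  # [15.0, 10.0]
```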


David C. Howell

University of Vermont

David.Howell@uvm.edu