Generating Data with a Fixed Intercorrelation Matrix

David C. Howell

* This SPSS syntax generates a set of data drawn from populations having a
* specified set of means and variances, and the pattern of correlations
* specified in the "Compute R = { }." command.

* You can alter this program in a large number of ways to produce whatever data
* you wish. For example, you can leave out the material on the pattern of intercorrelations,
* and you will have a set of variables that are independent in the population, although their
* sample correlations will not be exactly zero. If the sample correlations must be exactly zero,
* leave in the factor and correlation material, but set the off-diagonal correlations to 0.00.
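
* As a sketch of that zero-correlation setup (an addition, not part of the original
* program): inside the Matrix block further down, the ten typed rows of R could be
* replaced by the identity matrix using the single line below, minus its asterisk.
*Compute R = ident(10).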


* If you want to draw from non-normal populations, simply replace the command
* "COMPUTE response = rv.normal(0,1)." with the command for a different
* distribution, e.g. "COMPUTE response = rv.chisq(df)."
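
* For illustration only (the df of 4 is an arbitrary choice, not a value from the
* original program), a chi-square version of the drawing step is sketched below.
* Because a chi-square variable has mean df and variance 2*df, the sketch subtracts
* df and divides by sqrt(2*df), so the draws still have population mean 0 and
* variance 1 as the rest of the program assumes. Remove the asterisks to use it.
*do repeat response = r1 to r10.
*COMPUTE response = (rv.chisq(4) - 4)/sqrt(2*4).
*end repeat.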

* If you want to draw variables from a population with a specified mean and variance, use the commands
* shown here. For example, "Compute violwit = nr1*12.48+20.36." is really just "Compute violwit =
* nr1*NewSD + NewMean." If you want the sample mean and standard deviation to be exactly
* 20.36 and 12.48, respectively, then run the Statistics/Summarize/Descriptives procedure with
* the "Save standardized values as variables" option checked, and apply the transformation to the
* standardized variables.
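
* A sketch of that exact-rescaling step in syntax form, for use after nr1 has been
* created (remove the leading asterisks to run it). It assumes SPSS gives the saved
* z-score of nr1 its default name, Znr1.
*DESCRIPTIVES VARIABLES=nr1 /SAVE.
*COMPUTE violwit = Znr1*12.48 + 20.36.
*EXECUTE.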

 

* If you run a Principal Components analysis
* and extract all factors, and enter these factors in the
* Matrix statements (in place of r1, r2, ...), the output variables
* will have EXACTLY the pattern of correlations given in R.

* David C. Howell
* The Factor commands were supplied by Lawrence Gordon.
* 4/27/98

new file.
input program.
* Draw 99 cases.
SET SEED 3458769.
loop #i = 1 to 99.
*Draw data for 10 variables.
do repeat response = r1 to r10.
COMPUTE response = rv.normal(0,1).
end repeat.
end case.
end loop.
end file.
end input program.

*Modify the next line for your own system.
Save outfile = "DataOut.sav".

* You now have 10 variables with 99 cases each, drawn from a normal population
* with mean = 0 and variance = 1. If you don't care about the correlations among
* the variables, skip the next two sections.

* The next commands provide a principal components analysis and
* give orthogonal factors which will then reproduce the matrix exactly. If you only
* want the population to have a specified pattern, skip this section.
*You will have to modify this command in 3 places (/variables, /analysis, and Factors( ))
* to adjust to the number of variables.
Factor
/variables r1 to r10
/analysis r1 to r10
/print correlation extraction
/criteria Factors(10) Iterate(25)
/extraction pc
/rotation norotate
/save reg(all).

Save outfile = "DataOut.sav".

* This section sets the pattern of correlations. Be very very careful here. I invariably
* get something wrong and the program aborts. The matrix must be perfectly symmetric.

Matrix.
Get X
/File = "DataOut.sav"
/Variables = fac1_1 to fac10_1.

* If you don't want the exact pattern of correlations, replace previous
* three lines with: (Omitting the "*".)
*Get X
* /File = "DataOut.sav"
* /Variables = r1 to r10.

Compute R = {1.0, .43, .37, .20, .08, .19, .05, -.18, .20, -.20;
.43, 1.0, .48, .31, -.04, .17, -.02, -.03, .04, -.05;
.37, .48, 1.0, .39, -.08, .27, .04, -.25, -.09, -.10;
.20, .31, .39, 1.0, -.17, .33, .27, -.21, -.16, -.04;
.08, -.04, -.08, -.17, 1.0, -.29, -.08, .12, .07, .06;
.19, .17, .27, .33, -.29, 1.0, .00, -.19, -.10, .09;
.05, -.02, .04, .27, -.08, .00, 1.0, .01, .06, -.08;
-.18, -.03, -.25, -.21, .12, -.19, .01, 1.0, .04, .01;
.20, .04, -.09, -.16, .07, -.10, .06, .04, 1.0, .00;
-.20, -.05, -.10, -.04, .06, .09, -.08, .01, .00, 1.0}.
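
* Optional check, added as a sketch (the names Asym and Ev are arbitrary). Asym is the
* largest absolute difference between R and its transpose and should print as 0.
* If it does not, fix R before going on, since the next step expects a symmetric matrix.
* Ev holds the eigenvalues of R, which must all be positive for chol(R) to work.
Compute Asym = mmax(abs(R - t(R))).
Print Asym /title = "Largest asymmetry in R (should be 0)".
Compute Ev = eval(R).
Print Ev /title = "Eigenvalues of R (all should be positive)".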

Compute NewX = X*chol(R).
Save NewX /outfile = */variables = nr1 to nr10.
End matrix.

* Now you have your correlated variables. The next lines adjust the means and variances.

Compute violwit = nr1*12.48+20.36.
Compute violvict = nr2*2.43+2.28.
compute intrus = nr3*3.32+10.09.
compute internal=nr4*2.35 + 0.
compute socsupp=nr5*5.25+27.55.
compute sostrain=nr6*1.70+3.77.
compute stress=nr7*2.12+2.94.
compute mateduc=nr8*1.91+11.75.
compute age=nr9*1.27+10.71.

* The following was inserted for a specific example I was creating.

RECODE
nr10
(Lowest thru -.24=1) (ELSE=2) INTO gender .
EXECUTE .

CORRELATIONS
/VARIABLES=nr1 to nr10 /PRINT=TWOTAIL SIG.
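
* As a further check (an addition, not part of the original program), the means and
* standard deviations of the rescaled variables can be compared with the values used
* in the Compute statements above.
DESCRIPTIVES VARIABLES=violwit violvict intrus internal socsupp sostrain stress mateduc age
 /STATISTICS=MEAN STDDEV.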

 

I have recently found a much simpler program at http://www.jcomm.ohio-state.edu/ahayes/SPSS Programs/cholesky.htm. I have never used it, but it certainly looks simple. I would want to modify it to get the appropriate means and variances, but that is just a set of linear transformations, so there shouldn't be a problem.

Last revised: 12/17/05