
The purpose of this lab is to illustrate clearly the meaning of multiple regression. I love this first example, although it seems a bit prosaic, because it illustrates things in ways that are easy to see and easy to grasp. The statistics are easy, but you need to think carefully about what they are telling us. We will first use data on height and weight, taken from the old Minitab handbook and known as the "Pulse data." If you don't think that this is really related to psychology, think about the role that weight plays in many people's lives, especially women's. What do these data tell you about weight and about its relationship to sex and height?
The data file can be found on Gumby and is named HeightWt.sav.
Answer the following questions by use of SPSS and the dataset.
Use Graph/Scatter to plot the data, with Weight on the ordinate and Height on the abscissa. Choose Set Markers by Sex. This will give you differently colored dots for males and females. After the graph has plotted, double click on the graph, then choose the Chart/Options menu and click on the Fit Line by Subgroups box. What do you see here? (You may also want to change the symbols on the graph so that it is still readable when printed out in black and white.)
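If it helps to see what Fit Line by Subgroups is doing, here is a minimal sketch in Python (not SPSS) that fits a separate least-squares line of weight on height for each sex. The records below are made-up numbers for illustration only, not the Pulse data, and the names fit_line and lines_by_group are hypothetical helpers of my own.

```python
# Illustrative sketch only: the numbers below are made up, NOT the Pulse data.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical records: (sex, height in inches, weight in pounds)
records = [
    ("M", 68, 150), ("M", 70, 160), ("M", 72, 175), ("M", 74, 185),
    ("F", 62, 110), ("F", 64, 118), ("F", 66, 125), ("F", 68, 135),
]

def lines_by_group(records):
    """Fit one weight-on-height line per sex, as a subgroup fit would."""
    groups = {}
    for sex, height, weight in records:
        heights, weights = groups.setdefault(sex, ([], []))
        heights.append(height)
        weights.append(weight)
    return {sex: fit_line(h, w) for sex, (h, w) in groups.items()}

if __name__ == "__main__":
    for sex, (intercept, slope) in lines_by_group(records).items():
        print(f"{sex}: weight = {intercept:.1f} + {slope:.2f} * height")
```

With the real data, compare the two slopes and the two intercepts: roughly parallel lines at different heights on the page would tell a different story from lines that differ in slope.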
You could also try a three-dimensional scatterplot. You will have to double click on the image once the graph is plotted, then choose set/exit spin mode on the right of the menu, and then play around. What does this do for you? (Don't spend a lot of time on this.)
Remember, these are real data collected from Psych 1 students at Cornell about 15 years ago. And remember that real people lie about both their height and their weight, and this lying probably varies substantially as a function of sex. How might this "fact" affect the result that you get?
Problem #2
We will work on this example only if there is time after we have done the height/weight example.
We will take as an example the data from Primo and Compas. These data are available in a file called karireg.sav on Gumby. This file contains data from 85 breast cancer patients, measured at three times. There are data on Intrusive Thoughts (intrus1) and Avoidance (avoid1) at Time 1, and BSI (Brief Symptom Inventory) measures at Times 1, 2, and 3 (Anx# and Dep#). (There are other measures that we are going to ignore for now.) I want you to look carefully at the output we get, and think about what it does, or doesn't, mean.
- Start by looking at the intercorrelation matrix of intrusive thoughts, avoidance, anxiety, depression, and age. What information do you glean from this?
- Now use Regression to predict Anxiety at Time 2 from Avoidance at Time 1. What does this show, and how does it compare with the intercorrelation matrix?
- Now predict Anxiety at Time 2 from Avoidance and Intrusions at Time 1. What can you tell from this analysis? Don't expect to be overwhelmed by the power of these two variables.
- You should have been a bit discouraged by what you found in the above step. What was there that was discouraging?
- Now add Age to the pot and see what this does.
- What would be the optimal regression equation for all three predictors? Write it out and predict the anxiety level for Subject 1 at Time 2.
- Finally, go back and rerun the regression, but this time use the Save button to save the unstandardized predictions and residuals. What do these tell you? Plot the prediction against Anx2, and calculate the correlation.
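The last few steps above (write out the equation, save predictions and residuals, correlate the predictions with Anx2) can be sketched outside SPSS as well. The Python below is a rough illustration under stated assumptions: the eight cases are made-up numbers, not the karireg.sav data, and solve, fit, predict, and pearson_r are hypothetical helper names. It fits the equation by ordinary least squares through the normal equations, which is one standard way to get the coefficients; I am not claiming it mirrors SPSS's internal computation.

```python
# Illustrative sketch only: made-up numbers, NOT the karireg.sav data.
from math import sqrt

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(predictors, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1.0 is the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, predictors):
    """Apply Y-hat = b0 + b1*X1 + b2*X2 + ... to every case."""
    return [coefs[0] + sum(b * x for b, x in zip(coefs[1:], p)) for p in predictors]

def pearson_r(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

# Hypothetical cases: (avoid1, intrus1, age) and the matching anx2 scores
predictors = [(10, 12, 45), (15, 20, 52), (8, 9, 60), (20, 25, 38),
              (12, 18, 49), (18, 15, 55), (5, 7, 63), (14, 22, 41)]
anx2 = [52, 60, 47, 68, 57, 58, 44, 62]

if __name__ == "__main__":
    coefs = fit(predictors, anx2)
    predicted = predict(coefs, predictors)
    residuals = [obs - pred for obs, pred in zip(anx2, predicted)]
    print("coefficients (intercept, avoid1, intrus1, age):", [round(b, 3) for b in coefs])
    print("prediction for the first case:", round(predicted[0], 2))
    print("correlation of predictions with anx2:", round(pearson_r(predicted, anx2), 3))
```

The point to notice is that the correlation between the saved predictions and Anx2 is the multiple R from the regression output, so its square is R squared. The residuals are simply what is left of Anx2 after the prediction has been subtracted out.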
Last revised: 11/30/03