

Multiple Regression

3/12/2002

Announcements

Exams not done

Basic Review

I’m going to start with the facts they know from last semester.

$\hat{Y} = bX + a$

where b is the slope (the difference in $\hat{Y}$ for a one-unit difference in X).

a is the intercept = the value of $\hat{Y}$ when X = 0.

$r_{YX} = r_{Y\hat{Y}}$ = the correlation between Y and $\hat{Y}$ (apart from sign when the slope is negative).

This is true but unimportant in simple regression; it becomes useful in multiple regression.

$SS_Y$ = sum of squares of Y = $\sum (Y - \bar{Y})^2 = (N-1)s_Y^2$

$SS_{reg}$ = sum of squares of $\hat{Y}$ = $\sum (\hat{Y} - \bar{Y})^2 = r^2 SS_Y$

$SS_{error}$ = residual sum of squares = $\sum (Y - \hat{Y})^2 = (1 - r^2) SS_Y$

$df_{error}$ = N - p - 1 = N - 2

$r^2$ = proportion of variation in Y accounted for by variation in X (multiply by 100 for a percentage).
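These identities are easy to check numerically. The sketch below is only an illustration: it assumes a simple regression of meq5 on se, using the se and meq5 values from the first 12 cases of the example data file listed later in these notes. The helper names (`mean`, `sum_cross`, `correlation`) are my own, not anything from SPSS.

```python
import math

# se (X) and meq5 (Y) for the first 12 cases listed in the example data file.
x = [3.83, 3.50, 4.00, 4.00, 4.00, 3.33, 3.50, 3.67, 1.83, 2.83, 3.67, 4.00]
y = [3.70, 3.40, 3.80, 3.90, 3.90, 3.70, 3.80, 3.50, 3.10, 3.50, 3.30, 4.00]


def mean(v):
    return sum(v) / len(v)


def sum_cross(u, v):
    """Sum of cross-products of deviations from the two means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def correlation(u, v):
    return sum_cross(u, v) / math.sqrt(sum_cross(u, u) * sum_cross(v, v))


n = len(y)
b = sum_cross(x, y) / sum_cross(x, x)   # slope
a = mean(y) - b * mean(x)               # intercept
y_hat = [b * xi + a for xi in x]        # predicted values

r = correlation(x, y)
ss_y = sum_cross(y, y)
ss_reg = sum((yh - mean(y)) ** 2 for yh in y_hat)
ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
df_error = n - 1 - 1                    # N - p - 1 with p = 1 predictor

print(f"r_YX = {r:.3f}, r_YYhat = {correlation(y, y_hat):.3f}")
print(f"SS_Y = {ss_y:.4f}")
print(f"SS_reg = {ss_reg:.4f}  vs  r^2 * SS_Y = {r ** 2 * ss_y:.4f}")
print(f"SS_error = {ss_error:.4f}  vs  (1 - r^2) * SS_Y = {(1 - r ** 2) * ss_y:.4f}")
print(f"df_error = {df_error}")
```

Each printed pair should agree apart from floating-point rounding, and SS_reg plus SS_error should add up to SS_Y.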

Each of these relationships has a counterpart in multiple regression, with only minor (and pretty much obvious) changes.

Multiple Regression

Here we try to use information on several predictors (Xj) to simultaneously predict Y.

We will form a linear combination of the $X_j$ to yield $\hat{Y}$, subject to the same least squares restriction that we had in Chapter 9.

Get them to remember what a linear combination is.

We want an equation of the form:

$\hat{Y} = b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_kX_k + b_0$

where the $b_j$ are chosen so as to minimize $\sum (Y - \hat{Y})^2 = SS_{residual}$

Point out that the subscript j stands for the variable, and not the subject.

Point out b0 (the intercept).

Example

I’ll start out with the example from Esther Leerkes's work. It has only two independent variables, which makes things a bit simpler. And it is a very nice study.

Esther was looking at maternal self-efficacy, measured 5 months after mom became a mom (meq5). She wanted to know how that score was related to the care that mom received from her own mom when she was little (MCarem) and to mom's own sense of self-esteem (SE).

Define criterion variable = the dependent variable.

Define predictors = the independent variables.

Describe data file.

Variables are FamilyID, se, mcarem, and meq5

The first 12 cases follow. (These are the actual data she collected.)

familyid   se     mcarem   meq5
1          3.83   2.58     3.70
2          3.50   2.83     3.40
4          4.00   3.17     3.80
8          4.00   3.75     3.90
9          4.00   3.58     3.90
11         3.33   3.67     3.70
12         3.50   3.25     3.80
13         3.67   3.75     3.50
15         1.83   3.17     3.10
16         2.83   2.67     3.50
17         3.67   4.00     3.30
19         4.00   4.00     4.00
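To make the least squares idea concrete, here is a minimal sketch that fits meq5 on se and mcarem using only the 12 cases listed above. It solves the normal equations directly in plain Python. The `solve` helper and all variable names are my own, and this is not how SPSS computes the solution internally. Because it uses just these 12 cases rather than the full sample, its coefficients will not match the SPSS printout.

```python
# First 12 cases from the data file: (se, mcarem, meq5).
cases = [
    (3.83, 2.58, 3.70), (3.50, 2.83, 3.40), (4.00, 3.17, 3.80),
    (4.00, 3.75, 3.90), (4.00, 3.58, 3.90), (3.33, 3.67, 3.70),
    (3.50, 3.25, 3.80), (3.67, 3.75, 3.50), (1.83, 3.17, 3.10),
    (2.83, 2.67, 3.50), (3.67, 4.00, 3.30), (4.00, 4.00, 4.00),
]


def solve(a, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [v] for row, v in zip(a, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Design matrix rows are (1, se, mcarem); the leading 1 carries the intercept b0.
X = [(1.0, se, mcarem) for se, mcarem, _ in cases]
Y = [meq5 for _, _, meq5 in cases]

# Normal equations: (X'X) b = X'Y.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

residuals = [y - (b0 + b1 * row[1] + b2 * row[2]) for row, y in zip(X, Y)]
ss_residual = sum(e ** 2 for e in residuals)

print(f"Yhat = {b1:.3f}*se + {b2:.3f}*mcarem + {b0:.3f}")
print(f"SS_residual = {ss_residual:.4f}")
```

A useful check on any least squares solution is that the residuals sum to zero and are uncorrelated with each predictor.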

 

SPSS Printout

Descriptives:

Correlations:

 

 

3 - D Scatterplot

I rarely think that these are helpful, but some people really like them.

Regression:

REGRESSION

 

 

The following plot shows maternal self-efficacy (meq5) plotted against the predicted values.

Discuss the above printout

Intercorrelation Matrix

"validities"

correlation among the variables

Missing values

Here we don’t have any, but if we did:

  • Pairwise deletion
  • Casewise or listwise deletion
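The difference between the two deletion rules is easiest to see by counting cases. The sketch below uses a small made-up data set with missing values marked as `None`. These are hypothetical numbers, not Esther's data, which are complete. The function names are my own.

```python
import math

# Hypothetical records with missing values (None); NOT the actual study data.
rows = [
    {"se": 3.8, "mcarem": 2.6, "meq5": 3.7},
    {"se": 3.5, "mcarem": None, "meq5": 3.4},
    {"se": 4.0, "mcarem": 3.2, "meq5": 3.8},
    {"se": None, "mcarem": 3.8, "meq5": 3.9},
    {"se": 3.3, "mcarem": 3.7, "meq5": 3.7},
    {"se": 1.8, "mcarem": 3.2, "meq5": 3.1},
]
variables = ["se", "mcarem", "meq5"]


def pearson(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def correlations(records):
    """Return {(u, v): (r, N)}, skipping records that lack either u or v."""
    out = {}
    for i, u in enumerate(variables):
        for v in variables[i + 1:]:
            pairs = [(rec[u], rec[v]) for rec in records
                     if rec[u] is not None and rec[v] is not None]
            out[(u, v)] = (pearson(pairs), len(pairs))
    return out


# Listwise (casewise): drop every case that is missing anything, then correlate.
complete = [rec for rec in rows if all(rec[v] is not None for v in variables)]
listwise = correlations(complete)

# Pairwise: each correlation uses every case that has both of its variables.
pairwise = correlations(rows)

for pair in listwise:
    print(pair, "listwise N =", listwise[pair][1], " pairwise N =", pairwise[pair][1])
```

With listwise deletion every correlation rests on the same cases. With pairwise deletion the N, and therefore the sample, can differ from one correlation to the next.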

Scatterplot:

Regression solution:

Discuss overall F and R-squared
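For reference when discussing the printout, the usual textbook relationships between the overall F and R-squared are sketched below. The notation follows the review section above (p predictors, N cases). This is the standard formula, not something copied from the SPSS output.

```latex
R^2 = \frac{SS_{reg}}{SS_Y} = 1 - \frac{SS_{error}}{SS_Y},
\qquad
F = \frac{R^2 / p}{(1 - R^2) / (N - p - 1)}
\quad \text{on } p \text{ and } N - p - 1 \text{ df}
```

In this example p = 2, so F has 2 and N - 3 degrees of freedom.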

Discuss parameter estimates:

  • Point out that these are just slopes

  • Called regression coefficients

Predicted values

Optimal regression equation:

$\hat{Y}$ = .147*se + .0582*mcarem + 2.929

familyid   se     mcarem   meq5
1          3.83   2.58     3.70
2          3.50   2.83     3.40
4          4.00   3.17     3.80
8          4.00   3.75     3.90

For the first subject:

$\hat{Y}$ = .147*3.83 + .0582*2.58 + 2.929 = 3.642

For the second subject:

$\hat{Y}$ = .147*3.50 + .0582*2.83 + 2.929 = 3.608

For the third subject:

$\hat{Y}$ = .147*4.00 + .0582*3.17 + 2.929 = 3.701

 

Case   Y – $\hat{Y}$      Residual
1      3.70 – 3.642   0.058
2      3.40 – 3.608   -0.208
3      3.80 – 3.701   0.099
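The same arithmetic can be scripted. The sketch below applies the regression equation from the printout (.147, .0582, 2.929) to the first four cases and takes each residual as the observed meq5 minus the predicted value. The function and variable names are my own.

```python
# Coefficients from the regression equation reported in the printout.
B_SE, B_MCAREM, B0 = 0.147, 0.0582, 2.929

# (familyid, se, mcarem, meq5) for the first four cases.
cases = [
    (1, 3.83, 2.58, 3.70),
    (2, 3.50, 2.83, 3.40),
    (4, 4.00, 3.17, 3.80),
    (8, 4.00, 3.75, 3.90),
]


def predict(se, mcarem):
    """Predicted maternal self-efficacy from the fitted equation."""
    return B_SE * se + B_MCAREM * mcarem + B0


results = []
for familyid, se, mcarem, meq5 in cases:
    y_hat = predict(se, mcarem)
    residual = meq5 - y_hat          # observed minus predicted
    results.append((familyid, y_hat, residual))
    print(f"family {familyid}: predicted = {y_hat:.3f}, residual = {residual:+.3f}")
```

Small differences in the last decimal place relative to hand calculation come from rounding the predicted value before subtracting.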

 

Standardized Regression Coefficients

Explain why we might want to standardize the variables

Get them to remember that this creates comparable standard deviations, and hence scales the coefficients by the standard deviations.
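Here is a sketch of how the scaling works with two predictors. It assumes the standard two-predictor result that the standardized coefficients can be written in terms of the three correlations, and that each raw slope is the standardized one multiplied by $s_Y / s_{X_j}$. It uses only the first 12 cases from the data file, so the numbers will not match the SPSS printout for the full sample. All names are my own.

```python
import math

# First 12 cases from the data file: (se, mcarem, meq5).
cases = [
    (3.83, 2.58, 3.70), (3.50, 2.83, 3.40), (4.00, 3.17, 3.80),
    (4.00, 3.75, 3.90), (4.00, 3.58, 3.90), (3.33, 3.67, 3.70),
    (3.50, 3.25, 3.80), (3.67, 3.75, 3.50), (1.83, 3.17, 3.10),
    (2.83, 2.67, 3.50), (3.67, 4.00, 3.30), (4.00, 4.00, 4.00),
]
se = [c[0] for c in cases]
mcarem = [c[1] for c in cases]
meq5 = [c[2] for c in cases]


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Sample standard deviation (N - 1 in the denominator)."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))


def corr(u, v):
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


r_y1, r_y2, r_12 = corr(meq5, se), corr(meq5, mcarem), corr(se, mcarem)

# Standardized coefficients for the two-predictor case.
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Rescale to raw-score slopes: b_j = beta_j * s_Y / s_Xj.
b1 = beta1 * sd(meq5) / sd(se)
b2 = beta2 * sd(meq5) / sd(mcarem)
b0 = mean(meq5) - b1 * mean(se) - b2 * mean(mcarem)

print(f"beta(se) = {beta1:.3f}, beta(mcarem) = {beta2:.3f}")
print(f"b(se) = {b1:.3f}, b(mcarem) = {b2:.3f}, intercept = {b0:.3f}")
```

The standardized coefficients are in standard deviation units, so they can be compared across predictors measured on different scales, which the raw slopes cannot.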

Last revised: 03/11/02