
Exams not done
I'm going to start with the facts they know from last semester.
$\hat{Y} = bX + a$
where b is the slope (the difference in $\hat{Y}$ for a one-unit difference in X) and a is the intercept (the value of $\hat{Y}$ when X = 0).
$r_{YX}$ = the correlation between Y and X, which is also the correlation between $Y$ and $\hat{Y}$ (apart from sign when the slope is negative).
This is true but unimportant in simple regression; it becomes useful in multiple regression.
$SS_Y$ = sum of squares of Y = $\sum (Y - \bar{Y})^2 = (N - 1)s_Y^2$
$SS_{reg}$ = sum of squares of $\hat{Y}$ = $\sum (\hat{Y} - \bar{Y})^2 = r^2 SS_Y$
$SS_{error}$ = residual sum of squares = $\sum (Y - \hat{Y})^2 = (1 - r^2) SS_Y$
$df_{error} = N - p - 1 = N - 2$
$r^2$ = proportion of variation in Y accounted for by variation in X.
Each of these relationships has a counterpart in multiple regression, with only minor (and pretty much obvious) changes.
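A quick numerical check of these relationships may help them see that they are identities, not approximations. This is only a sketch: it uses the se and meq5 values from the 12 cases listed in the example, treats se as the single predictor, and the slope formula b = r(sY/sX) is the standard one rather than anything taken from the printout.

```python
import numpy as np

# se and meq5 for the 12 listed cases (a subset, not the full sample).
se = np.array([3.83, 3.50, 4.00, 4.00, 4.00, 3.33,
               3.50, 3.67, 1.83, 2.83, 3.67, 4.00])
meq5 = np.array([3.70, 3.40, 3.80, 3.90, 3.90, 3.70,
                 3.80, 3.50, 3.10, 3.50, 3.30, 4.00])

n = len(meq5)
r = np.corrcoef(se, meq5)[0, 1]

# Simple-regression slope and intercept from r and the standard deviations.
b = r * meq5.std(ddof=1) / se.std(ddof=1)
a = meq5.mean() - b * se.mean()
y_hat = b * se + a

ss_y = np.sum((meq5 - meq5.mean()) ** 2)
ss_reg = np.sum((y_hat - meq5.mean()) ** 2)
ss_error = np.sum((meq5 - y_hat) ** 2)

print(f"r = {r:.3f}, b = {b:.3f}, a = {a:.3f}")
print(f"SSY = {ss_y:.4f}, SSreg = {ss_reg:.4f}, SSerror = {ss_error:.4f}")
print(f"r^2 * SSY = {r**2 * ss_y:.4f}, (1 - r^2) * SSY = {(1 - r**2) * ss_y:.4f}")
```

The last line should reproduce SSreg and SSerror, and SSreg + SSerror should equal SSY.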
Here we try to use information on several predictors ($X_j$) to simultaneously predict Y.
We will form a linear combination of the $X_j$ to yield $\hat{Y}$, subject to the same least squares restriction that we had in Chapter 9.
Get them to remember what a linear combination is.
We want an equation of the form:
$\hat{Y} = b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_kX_k + b_0$
where the $b_j$ are chosen so as to minimize $\sum (Y - \hat{Y})^2 = SS_{residual}$.
Point out that the subscript j stands for the variable, and not the subject.
Point out $b_0$, the intercept.
Example
I'll start out with the example from Esther Leerkes's work. It has only two independent variables, which makes things a bit simpler. And it is a very nice study.
Esther was looking at Maternal Self-Efficacy, measured 5 months after mom became a mom (meq5). Esther wanted to know how that score was related to the care that mom received from her own mom when she was little (mcarem) and mom's own sense of self-esteem (se).
Define criterion variable = the dependent variable.
Define predictors = the independent variables.
Describe data file.
Variables are FamilyID, se, mcarem, and meq5
The first 12 cases follow. (These are the actual data she collected.)

familyid   se     mcarem   meq5
1          3.83   2.58     3.70
2          3.50   2.83     3.40
4          4.00   3.17     3.80
8          4.00   3.75     3.90
9          4.00   3.58     3.90
11         3.33   3.67     3.70
12         3.50   3.25     3.80
13         3.67   3.75     3.50
15         1.83   3.17     3.10
16         2.83   2.67     3.50
17         3.67   4.00     3.30
19         4.00   4.00     4.00
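For anyone who wants to see the least squares criterion at work outside SPSS, here is a minimal sketch using NumPy. It fits only the 12 cases listed above, so the coefficients will not match the SPSS printout, which is based on the full sample. The variable names are mine.

```python
import numpy as np

# The 12 listed cases (a subset of the full data file).
se = np.array([3.83, 3.50, 4.00, 4.00, 4.00, 3.33,
               3.50, 3.67, 1.83, 2.83, 3.67, 4.00])
mcarem = np.array([2.58, 2.83, 3.17, 3.75, 3.58, 3.67,
                   3.25, 3.75, 3.17, 2.67, 4.00, 4.00])
meq5 = np.array([3.70, 3.40, 3.80, 3.90, 3.90, 3.70,
                 3.80, 3.50, 3.10, 3.50, 3.30, 4.00])

# Design matrix: the column of 1s carries the intercept b0.
X = np.column_stack([np.ones_like(se), se, mcarem])

# lstsq returns the coefficients that minimize sum((Y - Y_hat)^2).
coef, *_ = np.linalg.lstsq(X, meq5, rcond=None)
b0, b_se, b_mcarem = coef

y_hat = X @ coef
ss_residual = np.sum((meq5 - y_hat) ** 2)

print(f"Y_hat = {b_se:.4f}*se + {b_mcarem:.4f}*mcarem + {b0:.4f}")
print(f"SSresidual = {ss_residual:.4f}")
```

Nudging any coefficient away from the fitted value and recomputing SSresidual is a simple way to show that no other linear combination does better.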
SPSS Printout
Descriptives:
Correlations:

3 - D Scatterplot
I rarely think that these are helpful, but some people really like them.
Regression:
The following shows the Maternal Efficacy plotted against the predicted values
Discuss the above printout
Intercorrelation Matrix
"validities"
correlation among the variables
- discuss the meaning of these
- discuss multicollinearity
Missing values
Here we don't have any, but if we did:
- pairwise deletion
- Casewise or listwise deletion
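The difference between the two deletion rules is easiest to see by counting how many cases each correlation is based on. The sketch below uses made-up scores with two missing values (these are not Esther's data), and the function names are hypothetical.

```python
import numpy as np

# Hypothetical scores on three variables; np.nan marks a missing value.
data = np.array([
    [3.8, 2.6, 3.7],
    [3.5, np.nan, 3.4],
    [4.0, 3.2, 3.8],
    [np.nan, 3.8, 3.9],
    [3.3, 3.7, 3.7],
    [1.8, 3.2, 3.1],
])

def listwise_corr(data):
    """Drop every case with any missing value, then correlate."""
    complete = data[~np.isnan(data).any(axis=1)]
    return np.corrcoef(complete, rowvar=False), len(complete)

def pairwise_corr(data):
    """Correlate each pair of variables on the cases that have both."""
    k = data.shape[1]
    r = np.eye(k)
    n = np.zeros((k, k), dtype=int)
    for i in range(k):
        for j in range(k):
            ok = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            n[i, j] = ok.sum()
            if i != j:
                r[i, j] = np.corrcoef(data[ok, i], data[ok, j])[0, 1]
    return r, n

r_list, n_list = listwise_corr(data)
r_pair, n_pair = pairwise_corr(data)

print("Listwise N:", n_list)
print("Pairwise N for each correlation:")
print(n_pair)
```

Listwise deletion uses one N throughout. Pairwise deletion keeps more data, but each correlation can rest on a different set of cases.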
Scatterplot:
- Discuss how to get this.
- What does it tell us?
Regression solution:
Discuss overall F and R-squared
- Tie this to Anova and discuss similarities and differences
- Point out SSregression and SSresidual
- Ask them why dfresidual = 44.
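To tie the Anova summary table to the regression, it can help to build the table by hand. The sketch below does this for the 12 listed cases only, so the residual df is 12 - 2 - 1 = 9 rather than the 44 on the printout, and none of the numbers will match the SPSS output. The F formula is the usual ratio of mean squares.

```python
import numpy as np

# The 12 listed cases (a subset of the full data file).
se = np.array([3.83, 3.50, 4.00, 4.00, 4.00, 3.33,
               3.50, 3.67, 1.83, 2.83, 3.67, 4.00])
mcarem = np.array([2.58, 2.83, 3.17, 3.75, 3.58, 3.67,
                   3.25, 3.75, 3.17, 2.67, 4.00, 4.00])
meq5 = np.array([3.70, 3.40, 3.80, 3.90, 3.90, 3.70,
                 3.80, 3.50, 3.10, 3.50, 3.30, 4.00])

n, p = len(meq5), 2  # p = number of predictors

X = np.column_stack([np.ones(n), se, mcarem])
coef, *_ = np.linalg.lstsq(X, meq5, rcond=None)
y_hat = X @ coef

ss_total = np.sum((meq5 - meq5.mean()) ** 2)
ss_regression = np.sum((y_hat - meq5.mean()) ** 2)
ss_residual = np.sum((meq5 - y_hat) ** 2)

df_regression = p
df_residual = n - p - 1

r_squared = ss_regression / ss_total
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"SSregression = {ss_regression:.4f} on {df_regression} df")
print(f"SSresidual   = {ss_residual:.4f} on {df_residual} df")
print(f"R-squared = {r_squared:.3f}, F = {f_ratio:.3f}")
```

The same F can be written as (R²/p) / ((1 - R²)/(N - p - 1)), which makes the link between the overall F and R-squared explicit.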
Discuss parameter estimates:
- Intercept
- bj
- Point out that these are just slopes
- Called regression coefficients
Predicted values
Optimal regression equation:
$\hat{Y} = .147(\text{se}) + .0582(\text{mcarem}) + 2.929$
familyid   se     mcarem   meq5
1          3.83   2.58     3.70
2          3.50   2.83     3.40
4          4.00   3.17     3.80
8          4.00   3.75     3.90
For the first subject:
$\hat{Y}$ = .147*3.83 + .0582*2.58 + 2.929 = 3.642
For the second subject:
$\hat{Y}$ = .147*3.50 + .0582*2.83 + 2.929 = 3.608
For the third subject:
$\hat{Y}$ = .147*4.00 + .0582*3.17 + 2.929 = 3.701
Case   Y      Predicted Y   Residual
1      3.70   3.642          0.058
2      3.40   3.608         -0.208
3      3.80   3.701          0.099
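The hand calculations can be checked with a few lines of code. This sketch takes the coefficients exactly as reported in the regression equation above and the observed values from the data listing. The function name is mine.

```python
# Coefficients as reported in the regression equation in these notes.
B_SE = 0.147
B_MCAREM = 0.0582
INTERCEPT = 2.929

# (familyid, se, mcarem, meq5) for the first three cases in the data listing.
cases = [
    (1, 3.83, 2.58, 3.70),
    (2, 3.50, 2.83, 3.40),
    (4, 4.00, 3.17, 3.80),
]

def predict(se, mcarem):
    """Predicted maternal self-efficacy from the reported equation."""
    return B_SE * se + B_MCAREM * mcarem + INTERCEPT

for familyid, se, mcarem, meq5 in cases:
    y_hat = predict(se, mcarem)
    residual = meq5 - y_hat  # residual = observed - predicted
    print(f"familyid {familyid}: Y = {meq5:.2f}, "
          f"predicted = {y_hat:.3f}, residual = {residual:.3f}")
```

For the first two cases this reproduces the predicted values 3.642 and 3.608.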
Standardized Regression Coefficients
Explain why we might want to standardize the variables
Get them to remember that standardizing gives every variable a standard deviation of 1, which amounts to rescaling each coefficient by the ratio of standard deviations: $\beta_j = b_j s_{X_j} / s_Y$.
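One way to make this concrete is to compute the standardized coefficients two ways: by regressing z-scores on z-scores, and by rescaling the raw coefficients by the standard deviations. The sketch below does both for the 12 listed cases only, so the values will not match the betas on the SPSS printout. The helper name zscore is mine.

```python
import numpy as np

# The 12 listed cases (a subset of the full data file).
se = np.array([3.83, 3.50, 4.00, 4.00, 4.00, 3.33,
               3.50, 3.67, 1.83, 2.83, 3.67, 4.00])
mcarem = np.array([2.58, 2.83, 3.17, 3.75, 3.58, 3.67,
                   3.25, 3.75, 3.17, 2.67, 4.00, 4.00])
meq5 = np.array([3.70, 3.40, 3.80, 3.90, 3.90, 3.70,
                 3.80, 3.50, 3.10, 3.50, 3.30, 4.00])

def zscore(v):
    """Convert scores to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std(ddof=1)

# Raw-score regression (column of 1s for the intercept).
X = np.column_stack([np.ones_like(se), se, mcarem])
b, *_ = np.linalg.lstsq(X, meq5, rcond=None)

# Regression on z-scores; no intercept column is needed because every
# standardized variable has mean 0.
Z = np.column_stack([zscore(se), zscore(mcarem)])
beta, *_ = np.linalg.lstsq(Z, zscore(meq5), rcond=None)

# Same thing by rescaling the raw slopes: beta_j = b_j * s_Xj / s_Y.
sd_x = np.array([se.std(ddof=1), mcarem.std(ddof=1)])
beta_rescaled = b[1:] * sd_x / meq5.std(ddof=1)

print("beta from z-scores:  ", np.round(beta, 4))
print("beta from rescaling: ", np.round(beta_rescaled, 4))
```

The two lines of output should agree, which shows that standardizing changes the units of the coefficients and not the fit.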
Last revised: 03/11/02