

Logistic Regression--Answers

4/6/02

Abbreviated answers for the questions are included in a different type face after the questions.

This lab is intended as a lead-in to logistic regression. I will go over logistic regression in more depth on Tuesday in class. We will do this lab together in class, because it led to a lot of questions last year. I want you to hand in the answers to each of these questions next week.

Logistic regression is quite easy to perform, and it is not that hard once you get your head straight, but it does take some head straightening because it involves a different way of looking at things. This exercise is intended to give you some experience with logistic regression, and to help you understand just what the resulting models are all about. We will use a data set from Howell and Huessy (1985) on children who did, and did not, show behavior associated with Attention Deficit Disorder (ADD). These are not the data in the book, but a complete set of data on about 350 cases. The data are in a file named ADDfull.sav, and the variables are:

We will use these data to see if we can predict Social Problems in 9th grade from the children’s level of ADD-like behavior in elementary school, their gender, and their subsequent GPA in 9th grade.

We will start out with Gender as a single predictor. I chose this because it is a dichotomous predictor.

The first thing to do is to get a plain old CrossTab (contingency table) of Gender against SocProb. It is located under Statistics/Descriptives. Be sure to ask it to include a chi-square test.

Note both the Pearson chi-square (what you usually think of as chi-square) and the Likelihood Ratio chi-square.

Using this printout, what are the odds of SocProb|male, the odds SocProb|female, the odds ratio, and the inverted odds ratio?

Odds(sp|female) = 11/147 = .075

Odds(sp|male) = 33/157 = .210

OR = .075/.210 = 0.356

1/OR = 1/.356 = 2.8

We can interpret this OR as saying that the odds of displaying social problems if you are female are 35.6% of the odds if you are male. Sounds reasonable.

The inverse can be interpreted to mean that the odds of not exhibiting social problems are 2.8 times as high if you are female as if you are male.
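The arithmetic above can be checked with a few lines of Python. This is only a sketch: the cell counts are the ones used in the answers (11 of 158 females and 33 of 190 males showing social problems), and the variable names are made up for illustration.

```python
# Cell counts from the Gender by SocProb crosstab used in the answers above.
female_yes, female_no = 11, 147
male_yes, male_no = 33, 157

odds_female = female_yes / female_no   # odds of SocProb given female
odds_male = male_yes / male_no         # odds of SocProb given male
odds_ratio = odds_female / odds_male   # female odds relative to male odds
inverse_or = 1 / odds_ratio            # male odds relative to female odds

print(f"odds(sp|female) = {odds_female:.3f}")
print(f"odds(sp|male)   = {odds_male:.3f}")
print(f"OR              = {odds_ratio:.3f}")
print(f"1/OR            = {inverse_or:.1f}")
```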

I chose to ask for a Risk Analysis under the options for the cross-tab. The following result shows the odds ratio, along with some other stuff we will ignore.

I do not have an explanation for the .888 and 2.495. I should have, but I don't. The ratio of them is also the odds ratio.
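One way to reproduce the .888 and 2.495 from the cell counts is as ratios of row proportions (relative risks), males relative to females, once for the "no social problems" column and once for the "social problems" column. The sketch below assumes the same cell counts as in the odds calculations; the names are invented for illustration, and it is only a check on the arithmetic, not a statement about how SPSS labels these values.

```python
# Cell counts from the Gender by SocProb crosstab.
female_yes, female_no = 11, 147
male_yes, male_no = 33, 157
n_female = female_yes + female_no   # 158 females
n_male = male_yes + male_no         # 190 males

# Ratio of proportions WITHOUT social problems, male relative to female.
rr_no = (male_no / n_male) / (female_no / n_female)
# Ratio of proportions WITH social problems, male relative to female.
rr_yes = (male_yes / n_male) / (female_yes / n_female)

print(round(rr_no, 3), round(rr_yes, 3))
# The ratio of the two relative risks equals the odds ratio.
print(round(rr_no / rr_yes, 3))
```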

Now use logistic regression with SocProb as the dependent variable and Gender as the independent (covariate)  variable.

Warning: A huge problem that we run into with logistic regression is how SPSS defines "success" and "fail." Different programs do it differently. Is "success" the one coded with the smallest value, or the one coded 0, or the one coded 1? It always creates confusion.

First I want to show an older printout so that students can see where some of these numbers come from. Compare that with the printout from SPSS 10.1, which is the newer version.

Beginning Block Number 0. Initial Log Likelihood Function
-2 Log Likelihood 264.172
* Constant is included in the model.

Beginning Block Number 1. Method: Enter
Variable(s) Entered on Step Number
1.. GENDER GENDER
-2 Log Likelihood 255.278
Goodness of Fit   ???????
Cox & Snell - R^2   .025
Nagelkerke - R^2    .047

        Chi-Square   df   Significance
Model        8.894    1           .003
Block        8.894    1           .003
Step         8.894    1           .003

 

Notice that the program prints out -2 log likelihood = 264.172 when no variable is used--i.e. when we assume that every child has the same probability of social problems, regardless of Gender. Then it adds in Gender, producing different expected frequencies based on Gender, and calculates a second -2 log likelihood. In this case it is 255.278. The difference between those two is 264.172 - 255.278 = 8.894, which is the chi-square given for the model.
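Both -2 log likelihood values can be recovered directly from the crosstab counts. The sketch below relies on one fact and one assumption: with a single dichotomous predictor the fitted probabilities equal the observed proportions in each group, and the cell counts are taken to be the ones used in the odds calculations (11/147 for females, 33/157 for males). The function name is hypothetical.

```python
import math

female_yes, female_no = 11, 147
male_yes, male_no = 33, 157

def neg2_log_likelihood(groups):
    """-2 log likelihood when each (yes, no) group is fitted with its own proportion."""
    total = 0.0
    for yes, no in groups:
        p = yes / (yes + no)
        total += yes * math.log(p) + no * math.log(1 - p)
    return -2 * total

# Null model: one overall proportion for everybody.
null_dev = neg2_log_likelihood([(female_yes + male_yes, female_no + male_no)])
# Gender model: a separate proportion for each gender.
gender_dev = neg2_log_likelihood([(female_yes, female_no), (male_yes, male_no)])

chi_square = null_dev - gender_dev
# Upper-tail probability for a chi-square on 1 df: erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"{null_dev:.3f} {gender_dev:.3f} {chi_square:.3f} {p_value:.3f}")
```

The printed values should agree with the printout to within rounding.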

Now look at the version 10.1 printout.

You don't see the equivalent of the -2 log likelihood ratio for the "null" model, but you do see it for the model with Gender included (255.278). You also see a chi-square test on the model with Gender (8.894). If we had the model without gender, its -2 log likelihood would be 255.278 + 8.894 = 264.172.

What is the coefficient for Gender?

Coefficient for Gender = -1.033

That is the ln(odds ratio). Calculate the odds ratio, e^ln(odds ratio), and relate that to what you already know.

When we raise e to the power ln(x), written e^ln(x), we get back x. So raising e to the log odds ratio power gives us the odds ratio.

Exponentiating -1.033 gives us e^-1.033 = 0.3559 = 0.356, which is the odds ratio.

Do you see why I started with a dichotomous predictor?

This shows clearly that the slope (b) in logistic regression (at least when the predictor is a dichotomy) is just the ln(odds ratio).
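The link between the Gender coefficient and the crosstab can be verified numerically in both directions. This is a minimal sketch using the coefficient from the printout and the cell counts from the odds calculations; the variable names are illustrative only.

```python
import math

b_gender = -1.033                        # Gender coefficient from the printout
odds_ratio_from_b = math.exp(b_gender)   # e raised to the log odds ratio

# Odds ratio computed directly from the crosstab counts.
odds_ratio_from_table = (11 / 147) / (33 / 157)
log_or_from_table = math.log(odds_ratio_from_table)

print(round(odds_ratio_from_b, 4), round(odds_ratio_from_table, 3))
print(round(log_or_from_table, 3))
```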

 

Now we will move on to a continuous predictor. Continuous predictors work the same way as dichotomous ones, but it is a bit harder to draw parallels with what you already know. It seems reasonable to assume that we want to know if ADDSC alone can predict future SocProb scores, so use that as your sole predictor.

Does ADDSC significantly predict SocProb using logistic regression? How do you know?

 

 

The prediction using ADDSC is significant (with 1 df). SPSS prints the significance as .000, which means p < .001.

What is the logistic regression equation? What does it mean?

The regression equation is ln(odds) = .104(ADDSC) - 7.82

Use that equation to make a prediction for someone with an ADDSC score of 40, 50, 51, 60, 70, 80, 90, & 100. Crudely sketch out these predicted values in terms of odds and then probabilities. (Remember, p = odds/(1 + odds).) (You can get these plots in Word using Graph, and can put them all on the same graph.)

If we do this we will see that ln(odds) increases linearly as we increase ADDSC, the odds increase exponentially, and the probabilities increase sigmoidally.

ADDSC    ln(odds)          odds    p = odds/(1+odds)

   20      -5.74       0.003215    0.003204
   30      -4.7        0.009095    0.009013
   40      -3.66       0.025733    0.025087
   50      -2.62       0.072803    0.067862
   51      -2.516      0.080782    0.074744
   60      -1.58       0.205975    0.170795
   70      -0.54       0.582748    0.368188
   80       0.5        1.648721    0.622459
   90       1.54       4.66459     0.823465
  100       2.58      13.19714     0.929563
  110       3.62      37.33757     0.973916
  120       4.66     105.6361      0.990622

ADDSC plotted against p = odds/(1+odds). Notice that this is an ogive.

ADDSC plotted against ln(odds). Notice that this plot is linear.
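The predicted values can be generated from the equation rather than worked out by hand. The sketch below assumes the rounded coefficients .104 and -7.82 given in the answer; the function and variable names are hypothetical.

```python
import math

B_ADDSC = 0.104     # slope for ADDSC, from the answer above
CONSTANT = -7.82    # constant, from the answer above

def predict(addsc):
    """Return (ln odds, odds, probability) for one ADDSC score."""
    log_odds = B_ADDSC * addsc + CONSTANT
    odds = math.exp(log_odds)
    p = odds / (1 + odds)
    return log_odds, odds, p

rows = {score: predict(score) for score in (40, 50, 51, 60, 70, 80, 90, 100)}
for score, (log_odds, odds, p) in rows.items():
    print(f"{score:5d} {log_odds:8.3f} {odds:10.6f} {p:9.6f}")
```

Plotting the second column against ADDSC gives the straight line, and plotting the last column gives the ogive.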

 

Why are we interested in e^ln(odds), and what does it tell us?

The equation predicts the log of the odds. Exponentiating it gives us the odds.

What do these predicted values in terms of odds stand for?

The odds of exhibiting social problems are multiplied by 1.109 for every 1 point increase in ADDSC.

What does the difference between the prediction for 50 and 51 tell you?

The change in the log odds  [ln(Odds)] for every one unit change in ADDSC.
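A quick check of the 50 versus 51 comparison, assuming the rounded equation ln(odds) = .104(ADDSC) - 7.82. With the rounded slope the multiplier comes out at about 1.110; the 1.109 quoted above presumably reflects the unrounded coefficient in the printout, which is an assumption on my part. The names below are illustrative.

```python
import math

B_ADDSC, CONSTANT = 0.104, -7.82

def log_odds(addsc):
    """Predicted ln(odds) of social problems for a given ADDSC score."""
    return B_ADDSC * addsc + CONSTANT

# A one-point change in ADDSC changes ln(odds) by the slope b.
diff_log_odds = log_odds(51) - log_odds(50)
# The same change multiplies the odds by e^b.
odds_multiplier = math.exp(log_odds(51)) / math.exp(log_odds(50))

print(round(diff_log_odds, 3), round(odds_multiplier, 3))
```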

Write a sentence or two telling me what you have found.

Now add GPA to the model.

Is that model an improvement on the previous model?

How are you best going to test whether GPA adds significant explanatory power?

With just ADDSC the -2 log likelihood was 227.728.
When we add GPA it drops to 220.158.
The drop = 7.57.
This used up 1 df, so we have a chi-sq = 7.57 on 1 df, which is significant at p < .05.
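The test for the added variable can be sketched in Python using only the two fit statistics reported above. For 1 df the upper-tail chi-square probability can be written with the complementary error function, so no statistics library is needed; the variable names are made up for illustration.

```python
import math

fit_addsc = 227.728        # value reported with ADDSC only
fit_addsc_gpa = 220.158    # value reported with ADDSC and GPA

chi_square = fit_addsc - fit_addsc_gpa   # improvement from adding GPA, 1 df
# Upper-tail probability of a chi-square on 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

The resulting p is well under .05, which agrees with the conclusion above.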

Write out the model and calculate ln(odds), odds, and p that a person with an ADDSC score of 45 and a GPA of 2.5 will exhibit social problems.

What do you think this model has to say about causal relationships?

This is not a causal model.

Why might you want to look at the model in this incremental way, rather than just starting with both ADDSC and GPA as predictors? (Hint: Think about Wald.)

Would you give the same answer to this question if you also included birthweight as a predictor at the same time as you included GPA?

Then you would be adding two variables at the same time, and the decrease in chi-square would not speak to either of them individually.

How would your answer to the question about incremental change differ if you were using standard linear regression rather than logistic regression?

Then we would not have to worry about the "approximate" nature of Wald's test, because the t test would give the same answer as the increase in R^2.

Finally, design an experiment that you might imagine doing that would use logistic regression.

Last revised: 04/03/02