Median-Splits: Probably Not a Good Idea
Psychologists and others often like to think in terms of groups or classes of people. We like to call some people Extroverted, and others Introverted, or we like to classify people as Type A or Type B personalities, and so on. And, in fact, there may be very good reason to think of people in this way, as opposed to thinking of them as spread out along a single dimension of extroversion.
And if we are going to think of people this way, it is sometimes very tempting to actually classify them this way. For example, we administer a scale of Optimism, and then use a median-split to label the people above the median as Optimists, and those below the median as Pessimists. We then run a t test between the two groups on some dependent measure, concluding, for example, that Optimists have more friends than Pessimists.
We sometimes even complicate the situation by having two independent variables. We might pass out scales of both Extroversion and Optimism, and again use the median to divide people into Extroverts and Introverts, as well as Optimists and Pessimists. We might then think of this as a factorial design, and use a two-way analysis of variance to examine number of friends as a function of both dichotomous variables and their interaction. The analysis might reveal that both Optimism/Pessimism and Extrovert/Introvert are related to the number of friends that an individual has, and there might even be an interaction between the two variables.
But is the design just described an acceptable way to analyze the data? Certainly it is a common method, but that doesn't necessarily make it right. In fact, there is a lot to suggest that this is not really the best way to proceed, especially when we have a factorial design.
The Simplest Case: the One-Way Design.
The most common use of the median-split is to create two groups, who are then compared on some dependent variable. Just for an example, we might compare Optimists and Pessimists on the number of friends subjects report having. (In fact, we wouldn't need to use the median; we could use quartiles to classify them into four groups. However, the case is cleanest if I talk about a median-split, so that's what I'll do.)
There is quite a literature to suggest that, even though it is nice and convenient to sort people into 2 groups and then use a t test to compare group means on Friends, we lose considerable power doing that as compared to simply looking at the regression of Friends on Optimism. Cohen (1983) has said that breaking subjects into two groups leads to the loss of 1/5 to 2/3 of the variance accounted for by the original variables. The loss in power is equivalent to tossing out 1/3 to 2/3 of the sample. Ouch! No one likes to lose that much power.
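To get a feel for the size of that loss, here is a minimal simulation sketch. It is not Cohen's analysis; it simply assumes two standard normal variables with a made-up correlation of .50 (the names optimism and friends are only labels), and compares the correlation before and after a median-split. Under bivariate normality the split should shrink the correlation by a factor of about sqrt(2/pi), or .798, which means keeping only about 64% of the variance accounted for.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation written out with the standard library only."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_split_attenuation(rho=0.5, n=50000, seed=1):
    """Simulate bivariate normal data (an assumption) and dichotomize the predictor."""
    rng = random.Random(seed)
    optimism = [rng.gauss(0, 1) for _ in range(n)]
    friends = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in optimism]
    cut = statistics.median(optimism)
    opt_dich = [1.0 if x > cut else 0.0 for x in optimism]  # the median-split
    return pearson_r(optimism, friends), pearson_r(opt_dich, friends)


r_continuous, r_dichotomized = median_split_attenuation()
ratio = r_dichotomized / r_continuous
expected_ratio = math.sqrt(2 / math.pi)  # theoretical shrinkage under normality

print(f"r with continuous predictor:   {r_continuous:.3f}")
print(f"r with median-split predictor: {r_dichotomized:.3f}")
print(f"ratio: {ratio:.3f} (theory: {expected_ratio:.3f})")
print(f"share of r-squared retained: {ratio ** 2:.2f}")
```

The exact numbers depend on the seed and on the normality assumption; with skewed or otherwise non-normal scores the loss can be different.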
If, on the other hand, we have a one-way design, as we do here, and we find a difference between the two groups, we can be pretty confident of our results. In other words, the biggest problem with dichotomizing with a one-way design is the loss in power.
As an example, suppose that we look at the relationship between an individual's Optimism score and his/her level of Religious Hope (RelHope), as measured by questions such as "Do you believe that there is a heaven?" (I would use number of friends, but I don't have any data on that.) A study by Sethi and Seligman (1993), which I refer to elsewhere (http://www.uvm.edu/~dhowell/StatPages/Fundamentalism/Fundamentalism.html), examined just such a relationship. I created data reflecting the data they found, and will use the first 150 cases to look at this relationship.
First, we can dichotomize the Optimism score at the median, creating a variable named OptDich. If we use the analysis of variance to compare the resulting groups on their mean level of RelHope, we obtain the following result.
- - - - -  O N E W A Y  - - - - -
Variable      RHOPE
By Variable   OPTDICH
Analysis of Variance
                           Sum of       Mean        F        F
Source            D.F.    Squares    Squares    Ratio    Prob.
Between Groups       1     2.7070     2.7070   2.2924    .1321
Within Groups      148   174.7663     1.1809
Total              149   177.4733
Clearly, there is no significant difference between the two groups, F(1,148) = 2.29, p = .1321.
However, suppose that we had not dichotomized Optimism, but had simply looked at the regression of RelHope on Optimism. Such an analysis is shown below, where it is clear that there is a significant relationship between the two variables, F(1, 148) = 5.81, p = .0172. The fact that we had a significant difference with the regression, but lost it with the dichotomized variable, is a function of the relative power of the two procedures.
* * * *  M U L T I P L E   R E G R E S S I O N  * * * *
Equation Number 1    Dependent Variable..   RELHOPE
Block Number 1.  Method: Enter   OPTIMISM
Multiple R            .19435
R Square              .03777
Adjusted R Square     .03127
Standard Error       1.07417
Analysis of Variance
               DF    Sum of Squares    Mean Square
Regression      1           6.70383        6.70383
Residual      148         170.76951        1.15385
F = 5.80997    Signif F = .0172
------------------ Variables in the Equation ------------------
Variable              B        SE B        Beta         T    Sig T
OPTIMISM        .074415     .030873     .194355     2.410    .0172
(Constant)     5.274708     .132261                39.881    .0000
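As a quick arithmetic check on the two printouts, the sketch below uses only numbers copied from them. It recovers the regression F from R Square, shows that the squared t for the slope gives the same value, and compares the proportion of variance accounted for by the dichotomized variable (Between Groups SS divided by Total SS) with R Square. The variable names are mine, not SPSS's.

```python
# Values copied from the two printouts above.
n_cases = 150
r_square = 0.03777                      # regression on continuous Optimism
b, se_b = 0.074415, 0.030873            # slope and its standard error
ss_between, ss_total = 2.7070, 177.4733  # one-way ANOVA on OptDich

# F for a single predictor: R^2 / (1 - R^2) times the residual df (N - 2).
f_regression = r_square / (1 - r_square) * (n_cases - 2)

# With one predictor, the squared t for the slope equals that F.
t_slope = b / se_b

# Proportion of variance accounted for by the dichotomized predictor.
eta_square = ss_between / ss_total
retained = eta_square / r_square

print(f"F from R Square: {f_regression:.2f}")
print(f"t squared:       {t_slope ** 2:.2f}")
print(f"eta squared for OptDich: {eta_square:.4f}")
print(f"share of R Square retained after the split: {retained:.2f}")
```

In this particular sample the median-split kept only about 40% of the variance that the continuous Optimism score accounted for, which is why the significant relationship disappeared.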
The More Complex Case: the Factorial Design.
When we were looking at only one independent variable, the biggest problem we had to face was the decrease in power. And in some situations, we can take the loss in power in stride, and still have the results we seek. But all that can change when we come to factorial designs that result from the dichotomization of two independent variables. In that situation we also suffer a potential loss in power, but, much worse, we run the risk of finding spurious effects that don't actually exist in the data.
Much of what follows was inspired by a paper by Maxwell and Delaney (1993). In that paper the authors challenged the common belief that the only concern with dichotomizing continuous variables was a loss in power. They showed that it was not always possible to disentangle the effects of the two dichotomized independent variables. I won't give their specific arguments here, but will instead provide an example which illustrates their arguments.
Primo, Compas, et al. (in press) studied how people coped with breast cancer. They hypothesized that Intrusive Thoughts and Avoidance were variables controlling the subsequent outcome of patients. Someone with a high intrusion score is someone who continually thinks about her cancer and "can't get it out of her mind." Someone with a high avoidance score goes to great lengths (often unsuccessfully) to avoid thinking about her cancer. The data that I will present are only a small part of the data that they collected, and they have been chosen to make specific points. The conclusions are my own, and not necessarily theirs.
The data on 85 subjects are available in the file named Kari.txt. The variables, in order, are
Obs Case Number
Intrus1 Intrusion score at diagnosis
Avoid1 Avoidance score at diagnosis
Anxt1 Anxiety score at diagnosis
Anxt2 Anxiety score at 3 months post-diagnosis
Anxt3 Anxiety score at 6 months post-diagnosis
Dept1 Depression score at diagnosis
Dept2 Depression score at 3 months post-diagnosis
Dept3 Depression score at 6 months post-diagnosis
Age Age of patient
AvoidCen Avoid1 - mean
IntrCen Intrus1 - mean
AvIntCen AvoidCen * IntrCen
HiLoInt High or Low Intrusion group (median-split) 1 = Low, 2 = High
HiLoAv High or Low Avoidance group (median-split) 1 = Low, 2 = High
Group 1 = LowInt/LowAvoid, 2 = LowInt/HighAvoid, 3 = HighInt/LowAvoid, 4 = HighInt/HighAvoid
Before we even think about a dependent variable, we first need to look at the two independent variables, Intrusions and Avoidance. It turns out that the problems stem from these two variables, and not from the dependent variable.
It probably is not too difficult to believe that there would be a correlation between Intrusions and Avoidance. It turns out that it is a positive correlation, with people who engage in a lot of avoidance having a lot of intrusions, and vice versa. (I might have expected that if you did a lot of avoiding, you wouldn't have many bad thoughts, but that just shows how little I know.) The correlation in this case is .405, which, with 85 cases, is significant at p < .01. This turns out to be important. (Fortunately, it also turns out to be almost exactly equal to one of the parameters in the Maxwell and Delaney paper, which allows for comparison. If you don't believe what follows, just look at their paper.)
As Maxwell and Delaney show, just knowing this correlation and the sample size allows us to make a very good guess as to the number of subjects who will fall in each quadrant once we make median-splits. In fact, for 85 subjects we would predict 27 in the High/High condition, 27 in the Low/Low, and 15 in each of the other two conditions (allowing for rounding). Guess what! Our example had ns of 28, 26, 16, and 14, respectively. Not bad!
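One way to make that kind of guess is sketched below. I am assuming the two scores are bivariate normal and using the standard result that the probability of falling above both medians is 1/4 + arcsin(r)/(2*pi). I am not claiming that this is exactly how Maxwell and Delaney did their calculation, and the function name is just one I made up.

```python
import math


def expected_quadrant_counts(r, n):
    """Expected cell sizes after median-splitting two bivariate normal
    variables with correlation r (normality is an assumption here)."""
    p_concordant = 0.25 + math.asin(r) / (2 * math.pi)  # High/High or Low/Low
    p_discordant = 0.5 - p_concordant                   # High/Low or Low/High
    return {
        "High/High": n * p_concordant,
        "Low/Low": n * p_concordant,
        "High/Low": n * p_discordant,
        "Low/High": n * p_discordant,
    }


counts = expected_quadrant_counts(r=0.405, n=85)
for cell, expected in counts.items():
    print(f"{cell:10s} {expected:5.1f}")
```

With r = .405 and 85 cases this gives roughly 26.9 subjects in each of the two concordant cells and roughly 15.6 in each of the other two, which is close to the prediction quoted above once you allow for rounding.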
But unequal sample sizes in factorial designs often cause problems, and this case is no exception. On the other hand, if you have read the sections of my text on unequal sample size in factorial designs, you will know that the usual methods of analysis control for the effects of other variables in the design. Well, they do when we have standard designs with the usual kinds of groups. You'd probably suspect that they would control for similar things here, but you'd be wrong. And that's the problem. Maxwell and Delaney show that, with median-splits, the effect of one independent variable can contaminate the effect of the other variable. For our example, it can make an Avoidance effect pop up just because there is an Intrusion effect.
Regression approaches to the problem
You probably believe that I can show that, or else I wouldn't have gone out on a limb by saying it, but how about a simple example? First I will use the continuous variables in a regression problem, and then I'll use the dichotomized ones in an analysis of variance. But which variables? You might think that I would just grab Intrusions and Avoidance and enter them. And I could, with no particular problem. But I also want to check out the possibility that there is an interaction between Intrusion and Avoidance, and for that I might use the product of those two variables (call it IntAvoid). The problem with that approach is that IntAvoid is created from Intrusion and Avoidance, and is almost certain to be highly correlated with both of them. This makes for a messy situation when we try to sort out the variance. Instead, we will center both Intrusion and Avoidance by subtracting their respective means from each case's scores (creating IntrCen and AvoidCen), and then take the product of those two centered variables (AvIntCen). When you look at the overall result, the two variables and their product will explain exactly the same amount of variability (with the same overall F), regardless of whether or not we center. However, the partitioning of that variability among the three sources is quite different with centered and uncentered variables. That is why I have chosen to use the centered variables.
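To see why centering helps, here is a small sketch with simulated scores. These are not the Kari.txt data; the means, standard deviations, and seed are invented purely for illustration. It compares how strongly the product term correlates with Intrusion before and after centering.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation written out with the standard library only."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def product_term_correlations(n=20000, seed=3):
    """Hypothetical scores: the means of 15 and 12 are made up for the demo."""
    rng = random.Random(seed)
    intrus = [15 + 8 * rng.gauss(0, 1) for _ in range(n)]
    avoid = [12 + 0.4 * (x - 15) + 7 * rng.gauss(0, 1) for x in intrus]

    # Uncentered product term.
    int_avoid = [x * z for x, z in zip(intrus, avoid)]

    # Centered variables and their product.
    m_int, m_av = statistics.fmean(intrus), statistics.fmean(avoid)
    intr_cen = [x - m_int for x in intrus]
    avoid_cen = [z - m_av for z in avoid]
    av_int_cen = [x * z for x, z in zip(intr_cen, avoid_cen)]

    return pearson_r(intrus, int_avoid), pearson_r(intr_cen, av_int_cen)


r_raw, r_centered = product_term_correlations()
print(f"r(Intrusion, raw product):    {r_raw:.3f}")
print(f"r(IntrCen, centered product): {r_centered:.3f}")
```

In this simulation the raw product is strongly correlated with Intrusion, while the centered product is nearly uncorrelated with IntrCen. With real, somewhat skewed data the centered correlation will not be exactly zero, but it is usually much smaller than the uncentered one.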
The results of this regression are given below.
-> REGRESSION
-> /DESCRIPTIVES MEAN STDDEV CORR SIG N
-> /MISSING LISTWISE
-> /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
-> /CRITERIA=PIN(.05) POUT(.10)
-> /NOORIGIN
-> /DEPENDENT anxt3
-> /METHOD=ENTER intrcen avoidcen avintcen .
* * * *  M U L T I P L E   R E G R E S S I O N  * * * *
               Mean    Std Dev    Label
ANXT3        51.235     11.538
INTRCEN        .003      8.004
AVOIDCEN      -.003      7.599
AVINTCEN     24.043     60.524
N of Cases = 85
Correlation, 1-tailed Sig:
              ANXT3    INTRCEN    AVOIDCEN    AVINTCEN
ANXT3         1.000       .402        .183       -.166
                  .       .000        .047        .065
INTRCEN        .402      1.000        .400       -.155
               .000          .        .000        .078
AVOIDCEN       .183       .400       1.000       -.020
               .047       .000           .        .428
AVINTCEN      -.166      -.155       -.020       1.000
               .065       .078        .428           .
* * * *  M U L T I P L E   R E G R E S S I O N  * * * *
Equation Number 1    Dependent Variable..   ANXT3
Multiple R            .41635
R Square              .17335
Adjusted R Square     .14273
Standard Error      10.68325
Analysis of Variance
               DF    Sum of Squares    Mean Square
Regression      3        1938.61798      646.20599
Residual       81        9244.67613      114.13180
F = 5.66193    Signif F = .0014
------------------------- Variables in the Equation --------------------------
Variable               B        SE B        Beta    Tolerance      VIF         T    Sig T
INTRCEN          .536845     .160981     .372410      .818353    1.222     3.335    .0013
AVOIDCEN         .048700     .167538     .032074      .838197    1.193      .291    .7720
AVINTCEN        -.020473     .019516    -.107391      .973822    1.027    -1.049    .2973
(Constant)     51.726277    1.250193                                      41.375    .0000
From this table you can see the correlation between the two independent variables, which is .40 as I reported earlier. You can also see how those variables correlate with Anxiety at 6 months. Intrusions and Avoidance are each significantly correlated with Anxt3 (one-tailed), and their interaction comes close (p = .065), but we really can't tell much from this. If we assume some true relationship between Intrusions and Anxt3, and if Avoidance is correlated with Intrusions, it is a pretty good bet that Avoidance will also correlate with Anxt3. That's why we use multiple regression in the first place: we want to look at the effects of one variable while controlling for the effects of other variables.
If you look toward the center of the table, you will see that the multiple correlation is .416 between Anxt3 and all three predictors. But remember, it was .402 between Intrusions and Anxt3. A fat lot of good Avoidance and the interaction term did; they raised the correlation by only .014! This already tells us that when we control for Intrusions, the other variables really don't make much of a contribution.
If you look next at the regression coefficients, you see that the coefficient for Intrusions was significant (b = .537, p = .0013), but that the other two predictors were not significant predictors of Anxt3 (b = .049, p = .7720; and b = -.020, p = .2973, respectively).
So what are we going to conclude from this analysis? Well, I'm going to conclude that when it comes to looking at the level of anxiety 6 months after diagnosis, the important, and only, predictor variable is the level of intrusive thoughts around the time of diagnosis. At this point I think we have learned something about coping with breast cancer.
But I like to put people in groups
Regression is all well and good, but I really do like the analysis of variance. It makes it easy for people with simple minds like mine to sort things out. I want to be able to say that Martha does a lot of avoiding and has lots of intrusions, and people like her are headed for trouble. None of this "Well, she's sort of high on intrusions" stuff for me.
So I take my two independent variables and I create groups. I find people who are high on both Intrusions and Avoidance, those who are high on Intrusions and low on Avoidance, and so on. This leaves me with a 2X2 factorial, and I know just how to apply the analysis of variance to that. In fact, the subgroup means that I get look like:
                  HighAvoid    LowAvoid    Row Mean
HighIntrusion        56.96      50.733      53.849
LowIntrusion        51.688      45.077      48.382
Column Mean         54.321      47.905      51.115
Oh, Dear! I'm pleased to see that I still get differences due to Intrusions, with means of approximately 53.8 and 48.4. But what's happening with Avoidance? There the means are 54.3 and 47.9, which is an even larger difference than the Intrusion difference. Where did this come from?
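If you want to check those marginal means for yourself, the short sketch below averages the four cell means from the table, giving each cell equal weight regardless of how many subjects are in it. The dictionary layout is just my own convenience. Small discrepancies in the third decimal place come from the rounding of the printed cell means.

```python
# Cell means of Anxt3 copied from the table above, keyed by
# (Intrusion group, Avoidance group).
cell_means = {
    ("High", "High"): 56.96,
    ("High", "Low"): 50.733,
    ("Low", "High"): 51.688,
    ("Low", "Low"): 45.077,
}


def marginal_mean(factor_index, level):
    """Average the cell means at one level of one factor, ignoring cell sizes."""
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)


intrusion_high = marginal_mean(0, "High")
intrusion_low = marginal_mean(0, "Low")
avoid_high = marginal_mean(1, "High")
avoid_low = marginal_mean(1, "Low")

intrusion_diff = intrusion_high - intrusion_low
avoid_diff = avoid_high - avoid_low

print(f"Intrusion: {intrusion_high:.3f} vs {intrusion_low:.3f}, difference {intrusion_diff:.2f}")
print(f"Avoidance: {avoid_high:.3f} vs {avoid_low:.3f}, difference {avoid_diff:.2f}")
```

The Avoidance difference comes out at about 6.4 points against about 5.5 for Intrusions, so the puzzle is already there in the cell means themselves.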
Well, you might think that the problem was just caused by unequal sample sizes, and we certainly have those. But that won't explain everything, because the marginal means given in this table are what I have elsewhere called "unweighted means," which says that they are the means of the cell means. Well, maybe everything will go away when I run my analysis of variance. Generations of students have known that the analysis of variance is magic, and perhaps the magic will work here. The results of the analysis of variance are shown below:
-> ANOVA
-> VARIABLES=anxt3
-> BY hiloint(1 2) hiloav(1 2)
-> /MAXORDERS 2
-> /METHOD UNIQUE
-> /FORMAT LABELS .
* * *  A N A L Y S I S   O F   V A R I A N C E  * * *
ANXT3
by HILOINT
   HILOAV
UNIQUE sums of squares
All effects entered simultaneously
                             Sum of               Mean                Sig
Source of Variation         Squares     DF      Square         F     of F
Main Effects               1912.108      2     956.054     8.353     .001
   HILOINT                  587.849      1     587.849     5.136     .026
   HILOAV                   810.971      1     810.971     7.085     .009
2-Way Interactions             .709      1        .709      .006     .937
   HILOINT HILOAV              .709      1        .709      .006     .937
Explained                  1912.113      3     637.371     5.569     .002
Residual                   9271.181     81     114.459
Total                     11183.294     84     133.134
Even Houdini had a bad day! The Avoidance effect didn't go away. In fact, the F value associated with it is even larger than the F value associated with Intrusions. I seem to have created an effect when there wasn't one here in the first place. Well, at least I didn't also get an interaction. That would have been really embarrassing.
Thanks to Maxwell and Delaney, and a lot of thinking on my part, I believe I can show you what actually went wrong. Remember way back at the beginning I told you that Intrusions and Avoidance were correlated with r = .40? Well that correlation is messing things up. Suppose that we make a scatterplot of that relationship, and that we then cut it into quadrants by using the medians. Suppose that we go one step further and calculate what the mean Intrusion and Avoidance score would be for each of the subgroups. This relationship is plotted below, compliments of SPSS.

See those little circles in each quadrant? Those are called "centroids", and they represent the center of the scores in each quadrant. Thus, for example, subjects who fall in the HighIntrusion/LowAvoidance quadrant have a mean Intrusion score of about 19, and a mean Avoidance score of about 10. But notice that those little circles (which I'll try to make more obvious) don't form a square; they form a parallelogram. That means something to statisticians, but us simple folk need a better explanation. Notice that the people who are high in Intrusions don't really all have the same Intrusion scores. If you were in the upper right quadrant, your group's mean Intrusion score would be about 22. However, if you were in the lower right quadrant, your group's mean Intrusion score would be about 19. Whereas our labeling of groups has led us to see HighInt/HighAvoid and HighInt/LowAvoid as both being high (and therefore equal) on Intrusions, the one group actually has a higher mean Intrusion score than the other. When we compare HighInt/HighAvoid and HighInt/LowAvoid, they really don't just differ on Avoidance, they also differ on Intrusions. And we already know Intrusions are a bad thing and make people feel bad. Because of this, when we see those two groups differing on Anxiety, our group labels lead us to think in terms of Avoidance, when the reason may really be the (normally hidden) difference in Intrusions. In other words, Intrusions and Avoidance are confounded, which is another way of saying that they are correlated, which is what I said way back at the beginning.
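You can watch this happen in a small simulation. The sketch below uses made-up standardized scores, not the breast cancer data: Intrusion and Avoidance correlate about .40, and anxiety is built to depend on Intrusion only, with no direct Avoidance effect at all. The slope, sample size, and seed are arbitrary choices of mine. After median-splitting both predictors it reports the centroid on Intrusion and the mean anxiety for each cell.

```python
import math
import random
import statistics


def simulate_cells(rho=0.4, slope=0.5, n=40000, seed=7):
    """Simulate correlated predictors, median-split both, and summarize cells.
    Anxiety depends ONLY on intrusion; avoidance has no direct effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        intrusion = rng.gauss(0, 1)
        avoidance = rho * intrusion + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        anxiety = slope * intrusion + rng.gauss(0, 1)
        rows.append((intrusion, avoidance, anxiety))

    med_int = statistics.median(r[0] for r in rows)
    med_av = statistics.median(r[1] for r in rows)

    groups = {}
    for intrusion, avoidance, anxiety in rows:
        key = ("High" if intrusion > med_int else "Low",
               "High" if avoidance > med_av else "Low")
        groups.setdefault(key, []).append((intrusion, anxiety))

    return {
        key: {
            "n": len(members),
            "mean_intrusion": statistics.fmean(m[0] for m in members),
            "mean_anxiety": statistics.fmean(m[1] for m in members),
        }
        for key, members in groups.items()
    }


cells = simulate_cells()
for key in [("High", "High"), ("High", "Low"), ("Low", "High"), ("Low", "Low")]:
    c = cells[key]
    print(f"Int {key[0]:4s} / Avoid {key[1]:4s}  n={c['n']:6d}  "
          f"mean intrusion={c['mean_intrusion']:6.3f}  mean anxiety={c['mean_anxiety']:6.3f}")
```

Within each Intrusion group, the HighAvoid cell has the higher mean Intrusion score, and so it also has the higher mean anxiety, even though Avoidance was given no effect of its own. That is the parallelogram at work.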
What's a Body to Do?
Where do we go from here? We have one analysis that suggests that the only important variable here is Intrusions. We have another analysis that suggests that both Intrusions and Avoidance are important. They can't both be true, or can they?
Well, this is where I came into this game. I was brought this problem for the very reason that the two analyses lead to seemingly conflicting conclusions. I at least now understand why the conclusions conflict, but I'm not sure that I am really all that much closer to understanding what this all means. I guess what this really comes down to is that I'm a pragmatist rather than a theorist. I'm actually interested in whether optimists have more friends than pessimists. I'm not really all that excited by the fact that some optimists are more optimistic than others, just as Newt is probably not all that much interested that some Republicans are more committed to his (adjective deleted) cause than others. If they vote like a Republican, that's good enough for him. Well, then, it's true that people with high Avoidance scores do worse than people with low Avoidance scores. The theorists among us may jump up and down and say "But, But, But, it's just cuz they're also high in Intrusions," but that doesn't take away from the fact that the groups differ. Maybe they know why and I don't, but if I were a shrink faced with a high avoider, I'd start to worry. (Now maybe I'd be better off poking her with a pin every time she had an intrusive thought, and letting the avoidance problem take care of itself, as the theorist would have it, but I should still worry.)
Well, If this isn't a Problem with the One-way, Why Not Use That?
Why not, indeed. I've said that Maxwell and Delaney don't take quite as much umbrage at people who carry out median-splits on one independent variable. And we know that the problem is certainly less confusing in that setting, so why not just create four different groups (rather than think of it as a 2X2) and run the one-way analysis of variance? Then you can follow it up with planned contrasts or even something like Tukey's test. Perhaps this isn't the normal approach, and perhaps Scott Maxwell would whap me on the side of my head for thinking such a thing, but I don't think that this would necessarily be such a terrible solution. For that matter, I'm not even sure that the factorial solution is so terribly bad if we are focusing on the groups, rather than on the underlying variables. Let's remember that we're not just talking about the right versus the wrong statistical analysis. We're really talking about how to think about the variables and their interpretation. Primo, Compas, et al., while normally very conscientious pointy-head theorists, this time happen to be acting like bleeding-heart shrinks. They want to know who's at risk, not why. (At least until they write their next in a seemingly limitless flow of papers.) And they aren't even particularly interested in the factorial nature of these groups, as much as in the fact that they can identify four subsets of people.
Well, I'd feel better if I didn't have this nagging little worry in the back of my head. I know that if I treat the design as a one-way, I can get answers that tell me something about group differences. But I also know that just because I pretended that there weren't two independent variables doesn't mean that they magically went away. They are still out there, and the HighIntrusion/HighAvoid group still differs from the HighIntrusion/LowAvoid group on the level of intrusive thoughts they experience. I have to be willing to say that this doesn't really matter to me, and that I really don't want to focus on the underlying cause of the theory. And that's going to take a bit of doing.
Stay Tuned. Don't Touch That Dial!