Median-Splits—Probably not a Good Idea

Psychologists and others often like to think in terms of groups or classes of people. We like to call some people Extroverted, and others Introverted, or we like to classify people as Type A or Type B personalities, and so on. And, in fact, there may be very good reason to think of people in this way, as opposed to thinking of them as spread out along a single dimension of extroversion.

And if we are going to think of people this way, it is sometimes very tempting to actually classify them this way. For example, we administer a scale of Optimism, and then use a median-split to label the people above the median as Optimists, and those below the median as Pessimists. We then run a t test between the two groups on some dependent measure, concluding, for example, that Optimists have more friends than Pessimists.

We sometimes even complicate the situation by having two independent variables. We might pass out scales of both Extroversion and Optimism, and again use the median to divide people into Extroverts and Introverts, as well as Optimists and Pessimists. We might then think of this as a factorial design, and use a two-way analysis of variance to examine number of friends as a function of both dichotomous variables and their interaction. The analysis might reveal that both Optimism/Pessimism and Extrovert/Introvert are related to the number of friends that an individual has, and there might even be an interaction between the two variables.

But, is the design just described an acceptable way to analyze the data? Certainly it is a common method, but that doesn’t necessarily make it right. In fact, there is a lot to suggest that this is not really the best way to proceed, especially when we have a factorial design.


The Simplest Case—the One-Way Design.

The most common use of the median-split is to create two groups, who are then compared on some dependent variable. Just for an example, we might compare Optimists and Pessimists on the number of friends subjects report having. (In fact, we wouldn’t need to use the median, we could use quartiles to classify them into four groups. However, the case is cleanest if I talk about a median-split, so that’s what I’ll do.)

There is quite a literature to suggest that, even though it is nice and convenient to sort people into 2 groups and then use a t test to compare group means on Friends, we lose considerable power doing that as compared to simply looking at the regression of Friends on Optimism. Cohen (1983) has said that breaking subjects into two groups leads to the loss of 1/5 to 2/3 of the variance accounted for by the original variables. The loss in power is equivalent to tossing out 1/3 to 2/3 of the sample. Ouch! No one likes to lose that much power.
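To get a feel for where a loss of that size comes from, here is a minimal simulation sketch in Python (not part of the original analyses). It assumes bivariate normal data, a made-up population correlation of .50, and a made-up sample of 100,000 cases; the helper names pearson_r and median_split are purely illustrative. Under those assumptions a median-split should keep roughly 64% of the squared correlation, which puts the loss inside the range Cohen describes.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def median_split(xs):
    """Return 1 for scores above the sample median, 0 otherwise."""
    cut = statistics.median(xs)
    return [1 if x > cut else 0 for x in xs]


# Assumed values, chosen only for illustration (not real data).
RHO, N = 0.50, 100_000
rng = random.Random(1)

optimism, friends = [], []
for _ in range(N):
    x = rng.gauss(0, 1)
    y = RHO * x + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1)
    optimism.append(x)
    friends.append(y)

r_continuous = pearson_r(optimism, friends)
r_split = pearson_r(median_split(optimism), friends)

# Share of the "variance accounted for" that survives the split.
retained = (r_split / r_continuous) ** 2

print(f"r, continuous predictor:   {r_continuous:.3f}")
print(f"r, median-split predictor: {r_split:.3f}")
print(f"share of r-squared kept:   {retained:.3f}")
```

The exact figures depend on the seed and on the normality assumption; with skewed scores or a split away from the median the loss can be larger.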

If, on the other hand, we have a one-way design, as we do here, and we find a difference between the two groups, we can be pretty confident of our results. In other words, the biggest problem with dichotomizing with a one-way design is the loss in power.
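The loss in power can also be sketched directly. The simulation below is an illustration under stated assumptions, not a reanalysis of any data mentioned here: it draws 2,000 samples of 150 cases from a bivariate normal population with an assumed correlation of .20, and counts how often each analysis rejects at the two-tailed .05 level. It relies on the fact that the pooled-variance t test between two groups is the same test as the test of the correlation between a 0/1 group code and the outcome, so one function serves both analyses. The critical value 1.976 is the approximate two-tailed .05 cutoff for 148 df.

```python
import math
import random
import statistics


def t_statistic(xs, ys):
    """t (df = n - 2) for the correlation between xs and ys.

    With a 0/1 predictor this equals the pooled-variance two-group t.
    """
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Assumed values, chosen only for illustration.
RHO, N, REPS = 0.20, 150, 2000
T_CRIT = 1.976  # approximate two-tailed .05 critical value, df = 148
rng = random.Random(2024)

hits_regression = 0
hits_split = 0
for _ in range(REPS):
    xs, ys = [], []
    for _ in range(N):
        x = rng.gauss(0, 1)
        xs.append(x)
        ys.append(RHO * x + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1))
    cut = statistics.median(xs)
    groups = [1 if x > cut else 0 for x in xs]
    if abs(t_statistic(xs, ys)) > T_CRIT:
        hits_regression += 1
    if abs(t_statistic(groups, ys)) > T_CRIT:
        hits_split += 1

power_regression = hits_regression / REPS
power_split = hits_split / REPS
print(f"power, regression on continuous score: {power_regression:.2f}")
print(f"power, t test after median-split:      {power_split:.2f}")
```

Both analyses hold the Type I error rate when there is no relationship at all, which is why a significant one-way result can still be trusted; the split simply finds real effects less often.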

As an example, suppose that we look at the relationship between an individual’s Optimism score and his/her level of Religious Hope (RelHope), as measured by questions such as "Do you believe that there is a heaven?" (I would use number of friends, but I don’t have any data on that.) A study by Sethi and Seligman (1993), which I refer to elsewhere, examined just such a relationship. I created data reflecting the data they found, and will use the first 150 cases to look at this relationship.

First, we can dichotomize the Optimism score at the median, creating a variable named OptDich. If we use the analysis of variance to compare the resulting groups on their mean level of RelHope, we obtain the following result.



- - - - -  O N E W A Y  - - - - -

Variable RHOPE
By Variable OPTDICH

Analysis of Variance

                          Sum of       Mean        F        F
Source           D.F.    Squares     Squares     Ratio    Prob.

Between Groups      1     2.7070      2.7070    2.2924    .1321
Within Groups     148   174.7663      1.1809
Total             149   177.4733



Clearly, there is no significant difference between the two groups, F(1,148) = 2.29, p = .1321.

However, suppose that we had not dichotomized Optimism, but had simply looked at the regression of RelHope on Optimism. Such an analysis is shown below, where it is clear that there is a significant relationship between the two variables, F(1, 148) = 5.81, p = .0172. The fact that we had a significant difference with the regression, but lost it with the dichotomized variable, is a function of the relative power of the two procedures.





* * * *  M U L T I P L E   R E G R E S S I O N  * * * *

Equation Number 1    Dependent Variable..  RELHOPE
Block Number 1.  Method: Enter    OPTIMISM

Multiple R            .19435
R Square              .03777
Adjusted R Square     .03127
Standard Error       1.07417

Analysis of Variance
               DF    Sum of Squares    Mean Square
Regression      1           6.70383        6.70383
Residual      148         170.76951        1.15385

F = 5.80997       Signif F = .0172

------------------ Variables in the Equation ------------------

Variable             B       SE B       Beta         T    Sig T
OPTIMISM       .074415    .030873    .194355     2.410    .0172
(Constant)    5.274708    .132261               39.881    .0000
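The two printouts can be tied together with a little arithmetic. The sketch below (Python, used here only as a calculator) copies the sums of squares and mean squares from the two tables and recomputes each F, then compares the proportion of RelHope variance accounted for by the dichotomized and the continuous Optimism scores. Only the variable names are invented; every number is taken from the output above.

```python
# Values copied from the ONEWAY table (dichotomized Optimism).
ss_between = 2.7070
ms_within = 1.1809
ss_total = 177.4733

# Values copied from the regression table (continuous Optimism).
ss_regression = 6.70383
ss_residual = 170.76951
ms_residual = 1.15385

# Each F is the effect mean square (1 df) over the error mean square.
f_split = (ss_between / 1) / ms_within
f_regression = (ss_regression / 1) / ms_residual

# Proportion of variance in RelHope accounted for by each predictor.
eta_squared_split = ss_between / ss_total
r_squared = ss_regression / (ss_regression + ss_residual)

# Share of the explained variance lost by dichotomizing.
proportion_lost = 1 - eta_squared_split / r_squared

print(f"F, median-split:  {f_split:.2f}")
print(f"F, regression:    {f_regression:.2f}")
print(f"variance explained, split:      {eta_squared_split:.4f}")
print(f"variance explained, regression: {r_squared:.4f}")
print(f"proportion of explained variance lost: {proportion_lost:.2f}")
```

In this sample the split throws away roughly 60% of the variance accounted for, which sits inside the 1/5 to 2/3 range attributed to Cohen (1983).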




The More Complex Case—the Factorial Design.

When we were looking at only one independent variable, the biggest problem we had to face was the decrease in power. And in some situations, we can take the loss in power in stride, and still have the results we seek. But all that can change when we come to factorial designs, resulting from the dichotomization of two independent variables. In that situation we also suffer a potential loss in power, but, much worse, we run the risk of finding spurious effects that don’t actually exist in the data.

Much of what follows was inspired by a paper by Maxwell and Delaney (1993). In that paper the authors challenged the common belief that the only concern with dichotomizing continuous variables was a loss in power. They showed that it was not always possible to disentangle the effects of the two dichotomized independent variables. I won’t give their specific arguments here, but will instead provide an example which illustrates their arguments.

Primo, Compas, et al. (in press) studied how people coped with breast cancer. They hypothesized that Intrusive Thoughts and Avoidance were variables controlling the subsequent outcome of patients. Someone with a high intrusion score is someone who continually thinks about her cancer and "can’t get it out of her mind." Someone with a high avoidance score goes to great length (often unsuccessfully) to avoid thinking about her cancer. The data that I will present are only a small part of the data that they collected, and they have been chosen to make specific points. The conclusions are my own, and not necessarily theirs.

The data on 85 subjects are available in the file named Kari.txt. The variables, in order, are

Obs Case Number

Intrus1 Intrusion score at diagnosis

Avoid1 Avoidance score at diagnosis

Anxt1 Anxiety score at diagnosis

Anxt2 Anxiety score at 3 months post-diagnosis

Anxt3 Anxiety score at 6 months post-diagnosis

Dept1 Depression score at diagnosis

Dept2 Depression score at 3 months post-diagnosis

Dept3 Depression score at 6 months post-diagnosis

Age Age of patient

AvoidCen Avoid1 - mean

IntrCen Intrus1 - mean

AvIntCen AvoidCen * IntrCen

HiLoInt High or Low Intrusion group (median-split) 1 = Low, 2 = High

HiLoAv High or Low Avoidance group (median-split) 1 = Low, 2 = High

Group 1 = LowInt/LowAvoid, 2 = LowInt/HighAvoid, 3 = HighInt/LowAvoid, 4 = HighInt/HighAvoid

Before we even think about a dependent variable, we first need to look at the two independent variables, Intrusions and Avoidance. It turns out that the problems stem from these two variables, and not from the dependent variable.

It probably is not too difficult to believe that there would be a correlation between Intrusions and Avoidance. It turns out that it is a positive correlation, with people who engage in a lot of avoidance having a lot of intrusions, and vice versa. (I might have expected that if you did a lot of avoiding, you wouldn’t have many bad thoughts, but that just shows how little I know.) The correlation in this case is .405, which, with 85 cases, is significant at p < .01. This turns out to be important. (Fortunately, it also turns out to be almost exactly equal to one of the parameters in the Maxwell and Delaney paper, which allows for comparison. If you don’t believe what follows, just look at their paper.)

As Maxwell and Delaney show, just knowing this correlation and the sample size allows us to make a very good guess as to the number of subjects who will fall in each quadrant once we make median-splits. In fact, for 85 subjects we would predict 27 in the HighHigh condition, 27 in the LowLow, and 15 in the other two conditions (allowing for rounding). Guess what! Our examples had ns of 28, 26, 16, and 14, respectively. Not bad!
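One way to arrive at predictions like these is a standard result for the bivariate normal distribution: when two variables correlate rho and each is split at its median, the proportion of cases falling in each "same side" quadrant (High/High or Low/Low) is 1/4 + arcsin(rho)/(2*pi), and the proportion in each mixed quadrant is 1/4 - arcsin(rho)/(2*pi). Whether this is exactly the route Maxwell and Delaney take is an assumption here, and the function name quadrant_proportions is hypothetical; the sketch only shows that the formula lands on essentially the same counts.

```python
import math


def quadrant_proportions(rho):
    """Expected quadrant proportions after median-splitting two
    bivariate normal variables that correlate rho."""
    shift = math.asin(rho) / (2 * math.pi)
    same_side = 0.25 + shift   # High/High, and also Low/Low
    mixed = 0.25 - shift       # High/Low, and also Low/High
    return same_side, mixed


n_cases = 85
rho = 0.405

p_same, p_mixed = quadrant_proportions(rho)
expected_same = n_cases * p_same
expected_mixed = n_cases * p_mixed

print(f"expected n in High/High (and Low/Low):   {expected_same:.1f}")
print(f"expected n in each mixed quadrant:       {expected_mixed:.1f}")
```

The formula gives about 26.9 cases in each of the High/High and Low/Low cells and about 15.6 in each mixed cell, in line with the rounded figures quoted above. The point to carry forward is that a correlation between the two predictors guarantees unequal cell sizes once both are split.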

But unequal sample sizes in factorial designs often cause problems, and this case is no exception. On the other hand, if you have read the sections of my text on unequal sample size in factorial designs, you will know that the usual methods of analysis control for the effects of other variables in the design. Well, they do when we have standard designs with usual kinds of groups. You’d probably suspect that they would control for similar things here, but you’d be wrong. And that’s the problem. Maxwell and Delaney show that, when we use median-splits, the effect of one independent variable can contaminate the effect of the other variable. For our example, it can make an Avoidance effect pop up just because there is an Intrusion effect.


Regression approaches to the problem

You probably believe that I can show that, or else I wouldn’t have gone out on a limb by saying it, but how about a simple example? First I will use the continuous variables in a regression problem, and then I’ll use the dichotomized ones in an analysis of variance.

But which variables? You might think that I would just grab Intrusions and Avoidance and enter them. And I could, with no particular problem. But I also want to check out the possibility that there is an interaction between Intrusions and Avoidance, and for that I might use the product of those two variables (call it IntAvoid). The problem with that approach is that IntAvoid is created from Intrusions and Avoidance, and is almost certain to be highly correlated with both of them. This makes for a messy situation when we try to sort out the variance.

Instead, we will center both Intrusions and Avoidance by subtracting their respective means from each case (creating IntrCen and AvoidCen), and then take the product of those two centered variables (AvIntCen). When you look at the overall result, the two variables and their product will explain exactly the same amount of variability (with the same overall F), regardless of whether or not we center. However, the partitioning of that variability among the three sources is quite different with centered and uncentered variables. That is why I have chosen to use the centered variables.
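The effect of centering on the product term is easy to see in a small simulation. The sketch below is an illustration only: it generates 5,000 made-up cases with assumed means of 20 and 15, a standard deviation of 8 on each variable, and a correlation of .40, none of which are taken from Kari.txt. It then correlates the Intrusion score with the product term before and after centering.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up population values, for illustration only.
rng = random.Random(42)
intrusion, avoidance = [], []
for _ in range(5000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    intrusion.append(20 + 8 * z1)
    avoidance.append(15 + 8 * (0.4 * z1 + math.sqrt(1 - 0.4 ** 2) * z2))

# Uncentered product term.
raw_product = [i * a for i, a in zip(intrusion, avoidance)]

# Centered variables and their product.
mean_i, mean_a = statistics.fmean(intrusion), statistics.fmean(avoidance)
intr_cen = [i - mean_i for i in intrusion]
avoid_cen = [a - mean_a for a in avoidance]
cen_product = [i * a for i, a in zip(intr_cen, avoid_cen)]

r_raw = pearson_r(intrusion, raw_product)
r_centered = pearson_r(intr_cen, cen_product)

print(f"r(Intrusion, uncentered product): {r_raw:.3f}")
print(f"r(IntrCen, centered product):     {r_centered:.3f}")
```

With symmetric (here normal) scores the centered product is nearly uncorrelated with its components, while the uncentered product is strongly correlated with them. Real data with skewed distributions will not come out quite so clean.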

The results of this regression are given below.





-> /CRITERIA=PIN(.05) POUT(.10)


-> /DEPENDENT anxt3

-> /METHOD=ENTER intrcen avoidcen avintcen .


* * * *  M U L T I P L E   R E G R E S S I O N  * * * *

               Mean    Std Dev   Label
ANXT3        51.235     11.538
INTRCEN        .003      8.004
AVOIDCEN      -.003      7.599
AVINTCEN     24.043     60.524

N of Cases = 85

Correlation, 1-tailed Sig:

              ANXT3   INTRCEN  AVOIDCEN  AVINTCEN

ANXT3         1.000      .402      .183     -.166
               .         .000      .047      .065

INTRCEN        .402     1.000      .400     -.155
               .000      .         .000      .078

AVOIDCEN       .183      .400     1.000     -.020
               .047      .000      .         .428

AVINTCEN      -.166     -.155     -.020     1.000
               .065      .078      .428      .

* * * *  M U L T I P L E   R E G R E S S I O N  * * * *

Equation Number 1    Dependent Variable..  ANXT3

Multiple R             .41635
R Square               .17335
Adjusted R Square      .14273
Standard Error       10.68325

Analysis of Variance
               DF    Sum of Squares    Mean Square
Regression      3        1938.61798      646.20599
Residual       81        9244.67613      114.13180

F = 5.66193       Signif F = .0014

------------------------- Variables in the Equation --------------------------

Variable             B       SE B       Beta   Tolerance     VIF        T    Sig T
INTRCEN        .536845    .160981    .372410     .818353   1.222    3.335    .0013
AVOIDCEN       .048700    .167538    .032074     .838197   1.193     .291    .7720
AVINTCEN      -.020473    .019516   -.107391     .973822   1.027   -1.049    .2973
(Constant)   51.726277   1.250193                                  41.375    .0000




From this table you can see the correlation between the two independent variables, which is .40 as I reported earlier. You can also see how those variables correlate with Anxiety at 6 months. Intrusions and Avoidance are each significantly correlated with Anxt3 (one-tailed p = .000 and .047), and the interaction term comes close (p = .065), but we really can’t tell much from this. If we assume some true relationship between Intrusions and Anxt3, and if Avoidance is correlated with Intrusions, it is a pretty good bet that Avoidance will also correlate with Anxt3. That’s why we use multiple regression in the first place: we want to look at the effects of one variable while controlling for the effects of other variables.

If you look toward the center of the table, you will see that the multiple correlation is .416 between Anxt3 and all three predictors. But remember, it was .402 between Intrusions and Anxt3. A fat lot of good Avoidance and the interaction term did; they raised the correlation by only .014! This already tells us that when we control for Intrusions, the other variables really don’t make much of a contribution.
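The same point can be put in terms of variance accounted for, using the usual F test for the increment in R-squared when predictors are added to a model. The sketch below plugs in the values printed above (r = .402 for Intrusions alone, R-squared = .17335 for all three predictors, 81 residual df); the variable names are illustrative.

```python
# Values read from the regression output above.
r_intrusions_only = 0.402   # correlation of IntrCen with Anxt3
r_squared_full = 0.17335    # R-squared with all three predictors
df_added = 2                # AvoidCen and AvIntCen
df_residual = 81            # residual df for the full model

r_squared_reduced = r_intrusions_only ** 2
r_squared_gain = r_squared_full - r_squared_reduced

# F for the increment: gain per added predictor over the
# unexplained variance per residual degree of freedom.
f_change = (r_squared_gain / df_added) / ((1 - r_squared_full) / df_residual)

print(f"R-squared, Intrusions only: {r_squared_reduced:.4f}")
print(f"gain from the other two:    {r_squared_gain:.4f}")
print(f"F for the gain (2, 81 df):  {f_change:.2f}")
```

The two added terms buy a little over one percentage point of explained variance, and the F for that gain is below 1, so it cannot be significant at any conventional level.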

If you look next at the regression coefficients, you see that the coefficient for Intrusions was significant (b = .537, p = .0013), but that the other two predictors were not significant predictors of Anxt3 (b = .049, p = .7720; and b = -.020, p = .2973, respectively).

So what are we going to conclude from this analysis? Well, I’m going to conclude that when it comes to looking at the level of anxiety 6 months after diagnosis, the important, and only, predictor variable is the level of intrusive thoughts around the time of diagnosis. At this point I think we have learned something about coping with breast cancer.


But I like to put people in groups


Regression is all well and good, but I really do like the analysis of variance. It makes it easy for people with simple minds like mine to sort things out. I want to be able to say that Martha does a lot of avoiding and has lots of intrusions, and people like her are headed for trouble. None of this "Well, she’s sort of high on intrusions" stuff for me.

So I take my two independent variables and I create groups. I find people who are high on both Intrusions and Avoidance, those who are high on Intrusions and low on Avoidance, and so on. This leaves me with a 2X2 factorial, and I know just how to apply the analysis of variance to that. In fact, the subgroup means that I get look like:




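[Table: mean Anxt3 for each cell of the 2X2 (High/Low Intrusion by High/Low Avoidance), with row and column means. The cell values are not recoverable in this copy.]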





Oh, Dear! I’m pleased to see that I still get differences due to Intrusions, with means of approximately 53.8 and 48.4. But what’s happening with Avoidance? There the means are 54.3 and 47.9, which is an even larger difference than the Intrusion difference. Where did this come from?

Well, you might think that the problem was just caused by unequal sample sizes, and we certainly have those. But that won’t explain everything because the marginal means given in this table are what I have elsewhere called "unweighted means," which says that they are the means of the cell means. Well, maybe everything will go away when I run my analysis of variance. Generations of students have known that the analysis of variance is magic, and perhaps the magic will work here. The results of the analysis of variance are shown below:



-> VARIABLES=anxt3

-> BY hiloint(1 2) hiloav(1 2)




* * *  A N A L Y S I S   O F   V A R I A N C E  * * *

UNIQUE sums of squares
All effects entered simultaneously

                          Sum of             Mean               Sig
Source of Variation      Squares    DF      Square       F     of F

Main Effects            1912.108     2     956.054   8.353     .001
   HILOINT               587.849     1     587.849   5.136     .026
   HILOAV                810.971     1     810.971   7.085     .009

2-Way Interactions          .709     1        .709    .006     .937
   HILOINT  HILOAV          .709     1        .709    .006     .937

Explained               1912.113     3     637.371   5.569     .002
Residual                9271.181    81     114.459
Total                  11183.294    84     133.134

Even Houdini had a bad day! The Avoidance effect didn’t go away. In fact, the F value associated with it is even larger than the F value associated with Intrusions. I seem to have created an effect when there wasn’t one here in the first place. Well, at least I didn’t also get an interaction. That would have been really embarrassing.

Thanks to Maxwell and Delaney, and a lot of thinking on my part, I believe I can show you what actually went wrong. Remember way back at the beginning I told you that Intrusions and Avoidance were correlated with r = .40? Well, that correlation is messing things up. Suppose that we make a scatterplot of that relationship, and that we then cut it into quadrants by using the medians. Suppose that we go one step further and calculate what the mean Intrusion and Avoidance score would be for each of the subgroups. This relationship is plotted below, courtesy of SPSS.


See those little circles in each quadrant? Those are called "centroids", and they represent the center of the scores in each quadrant. Thus, for example, subjects who fall in the HighIntrusion/LowAvoidance quadrant have a mean Intrusion score of about 19, and a mean Avoidance score of about 10. But notice that those little circles (which I’ll try to make more obvious) don’t form a square; they form a parallelogram. That means something to statisticians, but us simple folk need a better explanation.

Notice that the people who are high in Intrusions don’t all have the same Intrusion scores. If you were in the upper right quadrant, your group’s mean Intrusion score would be about 22. However, if you were in the lower right quadrant, your group’s mean Intrusion score would be about 19. Whereas our labeling of groups has led us to see HighInt/HighAvoid and HighInt/LowAvoid as both being high (and therefore equal) on Intrusions, the one group actually has a higher mean Intrusion score than the other. When we compare HighInt/HighAvoid and HighInt/LowAvoid, they really don’t just differ on Avoidance, they also differ on Intrusions.

And we already know Intrusions are a bad thing and make people feel bad. Because of this, when we see those two groups differing on Anxiety, our group labels lead us to think in terms of Avoidance, when the reason may really be the (normally hidden) difference in Intrusions. In other words, Intrusions and Avoidance are confounded, which is another way of saying that they are correlated, which is what I said way back at the beginning.
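The mechanism can be reproduced from scratch. The sketch below is a simulation under stated assumptions, not a reanalysis of Kari.txt: it generates 20,000 made-up cases with standardized Intrusion and Avoidance scores that correlate .40, and it builds Anxiety from Intrusion plus random noise only, so Avoidance has no effect of its own by construction. It then median-splits both predictors, compares the Intrusion centroids of the two High-Intrusion cells, and computes the unweighted Avoidance marginal means (the means of the cell means).

```python
import math
import random
import statistics

# Assumed values, chosen only for illustration.
RHO, SLOPE, N = 0.40, 1.0, 20_000
rng = random.Random(7)

cases = []
for _ in range(N):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    intrusion = z1
    avoidance = RHO * z1 + math.sqrt(1 - RHO ** 2) * z2
    anxiety = SLOPE * intrusion + rng.gauss(0, 1)  # no Avoidance term at all
    cases.append((intrusion, avoidance, anxiety))

med_int = statistics.median([c[0] for c in cases])
med_av = statistics.median([c[1] for c in cases])

# Sort cases into the four cells of the 2 x 2.
cells = {}
for intrusion, avoidance, anxiety in cases:
    key = (intrusion > med_int, avoidance > med_av)
    cells.setdefault(key, []).append((intrusion, anxiety))


def cell_mean(key, index):
    """Mean Intrusion (index 0) or Anxiety (index 1) within one cell."""
    return statistics.fmean([row[index] for row in cells[key]])


HIGH, LOW = True, False

# Intrusion centroids of the two "High Intrusion" cells.
centroid_hh = cell_mean((HIGH, HIGH), 0)
centroid_hl = cell_mean((HIGH, LOW), 0)

# Unweighted Avoidance marginal means on Anxiety.
anx_high_avoid = (cell_mean((HIGH, HIGH), 1) + cell_mean((LOW, HIGH), 1)) / 2
anx_low_avoid = (cell_mean((HIGH, LOW), 1) + cell_mean((LOW, LOW), 1)) / 2
spurious_gap = anx_high_avoid - anx_low_avoid

print(f"mean Intrusion, HighInt/HighAvoid: {centroid_hh:.2f}")
print(f"mean Intrusion, HighInt/LowAvoid:  {centroid_hl:.2f}")
print(f"Avoidance 'effect' on Anxiety:     {spurious_gap:.2f}")
```

Even though Avoidance was never allowed to influence Anxiety, the High-Avoidance cells come out more anxious, because within each Intrusion half they also carry higher Intrusion scores. Setting RHO to 0 in the sketch should make the gap shrink toward zero.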


What’s a Body to Do?

Where do we go from here? We have one analysis that suggests that the only important variable here is Intrusions. We have another analysis that suggests that both Intrusions and Avoidance are important. They can’t both be true—or can they?

Well, this is where I came into this game. I was brought this problem for the very reason that the two analyses lead to seemingly conflicting conclusions. I at least now understand why the conclusions conflict, but I’m not sure that I am really all that much closer to understanding what this all means. I guess what this really comes down to is that I’m a pragmatist rather than a theorist. I’m actually interested in whether optimists have more friends than pessimists. I’m not really all that excited by the fact that some optimists are more optimistic than others, just as Newt is probably not all that much interested that some Republicans are more committed to his (adjective deleted) cause than others. If they vote like a Republican, that’s good enough for him. Well, then, it’s true that people with high Avoidance scores do worse than people with low Avoidance scores. The theorists among us may jump up and down and say "But, But, But, it’s just ‘cuz they’re also high in Intrusions," but that doesn’t take away from the fact that the groups differ. Maybe they know why and I don’t, but if I were a shrink faced with a high avoider, I’d start to worry. (Now maybe I’d be better off poking her with a pin every time she had an intrusive thought, and let the avoidance problem take care of itself—as the theorist would have it— but I should still worry.)


Well, If this isn’t a Problem with the One-way, Why Not Use That?

Why not, indeed. I’ve said that Maxwell and Delaney don’t take quite as much umbrage at people who carry out median-splits on one independent variable. And we know that the problem is certainly less confusing in that setting, so why not just create four different groups (rather than think of it as a 2X2) and run the one-way analysis of variance? Then you can follow it up with planned contrasts or even something like Tukey’s test. Perhaps this isn’t the normal approach, and perhaps Scott Maxwell would whap me on the side of my head for thinking such a thing, but I don’t think that this would necessarily be such a terrible solution. For that matter, I’m not even sure that the factorial solution is so terribly bad if we are focusing on the groups, rather than on the underlying variables. Let’s remember that we’re not just talking about the right versus the wrong statistical analysis. We’re really talking about how to think about the variables and their interpretation. Primo, Compas, et al., while normally very conscientious pointy-head theorists, this time happen to be acting like bleeding-heart shrinks. They want to know who’s at risk, not why. (At least until they write their next in a seemingly limitless flow of papers.) And they aren’t even particularly interested in the factorial nature of these groups, as much as in the fact that they can identify four subsets of people.
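For readers who want to see the mechanics of treating the four cells as one factor, here is a minimal sketch of the one-way F computation. The function name one_way_anova is hypothetical, and the scores are tiny made-up numbers used only to check the arithmetic; they are not anxiety scores from Kari.txt. In practice the four lists would hold the Anxt3 scores for Group = 1 through 4.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = statistics.fmean(all_scores)

    # Between-groups SS: group sizes times squared deviations of group means.
    ss_between = sum(
        len(group) * (statistics.fmean(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-groups SS: squared deviations of scores from their own group mean.
    ss_within = 0.0
    for group in groups:
        group_mean = statistics.fmean(group)
        ss_within += sum((score - group_mean) ** 2 for score in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Made-up toy scores for four groups (LowInt/LowAvoid ... HighInt/HighAvoid).
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]

f_ratio, df_between, df_within = one_way_anova(toy_groups)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

A significant overall F only says that the four groups differ somewhere; the planned contrasts or Tukey comparisons mentioned above are what say which groups differ.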

Well, I’d feel better if I didn’t have this nagging little worry in the back of my head. I know that if I treat the design as a one-way, I can get answers that tell me something about group differences. But I also know that just because I pretended that there weren’t two independent variables doesn’t mean that they magically went away. They are still out there, and the HighIntrusion/HighAvoid group still differs from the HighIntrusion/LowAvoid group on the level of intrusive thoughts they experience. I have to be willing to say that this doesn’t really matter to me, and that I really don’t want to focus on the underlying cause of the theory. And that’s going to take a bit of doing.

Stay Tuned. Don’t Touch That Dial!