
A Different Look at Power

David C. Howell


We normally think of power in terms of its precise definition, which is the probability of rejecting a false null hypothesis. We have a traditional system that includes a null hypothesis and an alternative hypothesis, where the alternative hypothesis is really just the negative of the null. We either reject the null hypothesis of equality, or we don't.

Suppose that we are testing two drugs for the treatment of clots in heart attack patients. By our traditional system, we have a null hypothesis that says that the two drugs are equally effective in the treatment of clots, and an alternative hypothesis that says that one drug is better than the other. Power, then, is the probability of rejecting the null hypothesis, in favor of the alternative, when in fact one drug is (no matter how trivially) more effective than the other.

But the world really isn't as simple as that choice would suggest. The following argument is based on an article in Discover magazine for May, 1996, which builds on a paper by Brophy and Joseph in the Journal of the American Medical Association (May, 1995). That paper takes a Bayesian view of hypothesis testing, but the argument that I make here really doesn't depend on your opinion of the good Reverend Bayes, who didn't even make his mark until after he was dead.

If you go stumbling into the hospital showing symptoms of a serious heart attack, you could be prescribed one of two clot-dissolving drugs--streptokinase or t-PA (tissue plasminogen activator). The question is, which one should you get? Someone with a good standard statistical training might assume, as we usually do, that the issue is easily resolved. Just give a whole bunch of patients streptokinase, and another bunch of patients t-PA, and then wait and see which group shows the higher survival rate. If we assume that the null hypothesis is false, then one drug is superior to the other, and power is simply the probability of finding that difference that is really there. But I left out an interesting fact. T-PA sells for $1,530 a pop, whereas you can get streptokinase for a mere $220. Oh! Well, if the drugs are equally effective, you would probably go for the cheaper one, unless your insurance company is paying the bill. But then we have to define what "equally effective" means. If streptokinase will save 90 percent of those who receive it, and t-PA will save 90.3 percent of those who receive it, and if we're talking about a 6- to 7-fold difference in price, you might be tempted, if you were paying the bill, to go with streptokinase. But what if the difference in survival rate is 5%, or 3%, or even 1%? Then which would you prefer?
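To see what the "standard training" approach does with numbers like those above, here is a small sketch of a pooled two-proportion z test comparing 90.0% and 90.3% survival. The arm sizes are my own assumptions, chosen only to make the point that statistical significance tracks the sample size, not whether a 0.3% difference is worth a 7-fold price difference.

```python
from math import sqrt, erf

def z_two_prop(p1, p2, n):
    # pooled two-proportion z statistic, assuming equal arm sizes of n
    pooled = (p1 + p2) / 2
    se = sqrt(2 * pooled * (1 - pooled) / n)
    return (p2 - p1) / se

def p_one_sided(z):
    # upper-tail p value from the standard normal distribution
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))

# With 20,000 per arm the 0.3% gap is not significant (z is about 1.0);
# with 200,000 per arm the very same gap is highly significant (z > 3).
for n in (20_000, 200_000):
    z = z_two_prop(0.900, 0.903, n)
    print(f"n = {n:>7,} per arm: z = {z:.2f}, p = {p_one_sided(z):.4f}")
```

The same 0.3% difference goes from "no evidence" to "overwhelming evidence" purely because we bought more patients, which is exactly why significance alone can't settle the streptokinase-versus-t-PA question.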

I assume that we could agree that there is some point at which the difference in survival rate is so clearly on the side of t-PA that we would vote for it regardless of cost. But I also suppose that there is a point at which we would decide that the difference is so small that we would go for the cheaper streptokinase. (Remember, if we spend huge amounts of our medical dollars on one treatment, we can't spend them on others.) The only question is where that cutoff lies. Well, in traditional treatments of power there is no particular cutoff. We speak about the null hypothesis being "false;" we don't speak about it being false by a certain amount--though certainly our calculations take into account how false it is.

There have been several studies of the relative effectiveness of the two drugs. One study with 20,000 patients found in favor of one of the two drugs, while another, with 30,000 patients, found in favor of the other. (As a psychologist, I can only marvel at the huge sample sizes. Isn't it wonderful what money will buy?) But the conflicting results left people in doubt, so the manufacturers of t-PA teamed up with some other folks and funded a really huge study of 40,000 patients. They decided that a difference of 1% in survival percentages would be a meaningful difference, and were truly excited when they found that 93.7 percent of patients who received t-PA survived, while only 92.7 percent of those who received streptokinase survived. They argued that this was convincing evidence that cardiologists should forget about the burden on the poor patient, worry about the burdens on their own liability insurance, and prescribe the (much) more expensive drug t-PA.

But, argued Brophy and Joseph, what does this study really tell us? It's true that the odds of getting that particular result, if the drugs are equally effective, are 1000:1. But who's talking about "equally effective?" What the originators of the study meant by "clinically superior" was a 1% difference in survival rates. And that is exactly what they found. Now suppose that there really is a 1% difference between the two drugs, and suppose that you ran a study in which you would declare t-PA clinically superior only if its observed survival rate exceeded streptokinase's by at least 1%. Then you actually have only a 50:50 chance of getting a result that you would call superior. (If the true difference is 1%, then half the time you will find differences like 1.1% and 1.4%, and half the time you will find differences like 0.7% and 0.9%.) As long as the sampling distribution of the difference is symmetric about the true difference, the probability of an observed difference greater than 1% is only .50. Or, put another way, when the true difference is 1%, the probability that a study will come out labeling t-PA "clinically superior" is only 50:50. In other words, Brophy and Joseph argue, the data don't really resolve anything.
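The 50:50 claim can be checked directly with a normal approximation. This sketch uses the survival rates reported in the text (93.7% versus 92.7%, a true difference of exactly 1%); the arm size of 20,000 is my assumption, and in fact the answer doesn't depend on it.

```python
from math import sqrt, erf

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 20_000                      # patients per arm (my assumption)
p_tpa, p_sk = 0.937, 0.927      # survival rates from the text
true_diff = p_tpa - p_sk        # exactly the 1% "clinically meaningful" cutoff
se = sqrt(p_tpa * (1 - p_tpa) / n + p_sk * (1 - p_sk) / n)

# Probability that the *observed* difference reaches the 1% cutoff
# when the *true* difference is itself exactly 1%:
p_meaningful = 1 - norm_cdf((0.01 - true_diff) / se)
print(f"P(observed difference >= 1%) = {p_meaningful:.2f}")  # 0.50
```

Because the cutoff sits exactly at the true difference, the z value is zero and the probability is .50 no matter how large the study is; bigger samples tighten the distribution around 1% but don't move it past the cutoff.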

It is not my intention to argue the case for one pharmaceutical company over another, especially given how little I know about pharmacology. My point is that we have to stop and think about what we mean by power. The paper by Brophy and Joseph raises the interesting point that we may be asking the wrong question when we ask about the power of rejecting the null hypothesis that µ1 = µ2. What we really need to think about is the probability of finding a difference that we would call "meaningful." That is quite a different thing, and it is not what we are usually talking about.

This page isn't intended to come up with a definitive statement of what we mean by power in this situation. It is intended to raise questions--some of which I can't really answer. Put yourself in the position of an intelligent and compassionate HMO. (I know that most people think that is an oxymoron, but let that pass.) You don't want people to die, but neither do you want to spend your very scarce resources needlessly. Furthermore, you agree that a true 1% difference in survival is worth paying for, but a (true) 0.9% difference is not. How are you going to design the definitive study, assuming that large subject populations are readily available?

Gee, that would make a great exam question.
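One hedged way to attack that exam question: treat a true 0.9% difference as the null and a true 1.0% difference as the alternative, and ask how many patients per arm a test needs to tell them apart. The one-sided alpha of .05, the power of .80, and the survival rates are all my own assumptions, not the author's.

```python
from math import sqrt

def n_per_arm(p1, p2, delta, z_alpha=1.645, z_beta=0.8416):
    # Normal-approximation sample size per arm needed to resolve a shift
    # of `delta` in the difference between two independent proportions
    # (z_alpha: one-sided .05; z_beta: 80% power).
    var = p1 * (1 - p1) + p2 * (1 - p2)  # per-patient variance of the difference
    return ((z_alpha + z_beta) * sqrt(var) / delta) ** 2

# Distinguishing a true 1.0% difference from a true 0.9% difference
# means resolving a shift of delta = 0.001:
n = n_per_arm(0.937, 0.927, delta=0.001)
print(f"about {n:,.0f} patients per arm")  # on the order of 800,000
```

A shift ten times smaller than the 1% cutoff demands roughly a hundred times the sample, which is why even a 40,000-patient trial can leave the "is it really worth it?" question open.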



Send mail to: David.Howell@uvm.edu

Last revised: 7/11/98