Stats: Statistical Assessment Service
October 2000
Sudden surges, surprising rallies, dramatic
breakthroughs, swelling support, unexpected stumbles,
rough weeks, critical turning-points. Such breathless phrases as
these are the building blocks of campaign reporting.
This apparent drama of fluxes and flows is buttressed with impressive statistics drawn from the polls, validating each shift in the plotline and offering hints as to which element will prove decisive. But this year the apparent changes in voter sentiment have been so abrupt they can seem to defy explanation. As the Baltimore Sun put it on Oct. 1, politicians in both parties are somewhat at a loss to explain the apparent wide swings in public opinion over the past few weeks.
Ironically, no explanation at all may be the best explanation for the majority of these apparent race-defining moments. In fact, many of them likely represent a simple sloshing around of mushy opinion within the standard margin of error of the polls. The term "margin of error" in election poll reporting may sound somewhat negative, suggesting that pollsters should just shape up and get rid of all that error.
Of course, margins of error do not represent a failure to perform correctly in the way we speak of mechanical errors. Rather, they are an ineradicable feature of all polling design. Any time a small sample must represent a vast population, there can be precision, but it will be of the fuzzy kind: one can declare a hit, but only somewhere in that wide space between the margins. Moreover, dead-eye accuracy only emerges in the aggregate. For any single rifle-shot, there simply cannot be any guarantees, except in a parallel universe in which the sample was as large as the population, and the survey took place at exactly the same time as the election itself.
Two election polls can be wildly different without either necessarily being at fault. Critics still cite a Newsweek survey in late October, 1996 which found a 23 percentage point lead for Bill Clinton over Bob Dole. Barely a week later, Clinton won by only 8 percentage points. At the other end of the spectrum, the Battleground tracking poll was within 0.5 percent of the final vote. Both survey organizations used legitimate professional polling methodology.
This year, history may be repeating itself. One of the most widely discussed polls of the tight 2000 campaign season was a September 14-15 Newsweek poll of 580 likely voters that gave Al Gore a 14-point lead over his rival George W. Bush. No other contemporaneous poll found a gap of this magnitude. A scant week later, the same Newsweek poll showed only a 2-point lead for Gore, a finding within the margin of error that was consistent with other trackings.
Commentators were quick to ask what went wrong with the first Newsweek poll. Yet the answer could well be nothing or, conversely, any number of things. The result could have been presciently correct, a valid portrait of the September electorate. Or there may have been unintended problems in the design or execution of the poll, such as subtle bias in the selection of the sample. Even the use of scientific sampling techniques cannot ensure that the correct population was surveyed.
A strong possibility is that the result was in fact an innocent anomaly produced by chance, even with a perfectly designed and executed survey. But to understand that prospect, we need to grasp just what is meant when a poll result is said to be accurate within a such-and-such margin of error at a such-and-such confidence level.
Many of us remember a staple of the carnival midway -- a shooting gallery, where a mechanical bear with an X on his side marched back and forth across an open space, turning around automatically when he reached the edge of the box (or reversing abruptly when his X was hit square-on by the shooter). A good shooter could keep him turning in a relatively confined space by a tighter and tighter bracketing of his shots, hitting the X more and more frequently until the bear simply spun in place. Then you had him. That's every pollster's dream of zeroing in on public opinion.
The problem is, when it comes to the magic X-spot for millions of voters (the exact result on election day), the target is not only moving, it's in the future, in effect invisible. The only action comes within the open space between the edges, and the blur of a moving bear inside. Where exactly the bear will stop on November 7 no one knows. In such a case, pollsters have to define what counts as a hit in a different manner, and that's where the margins of error and confidence levels come in.
A survey draws a sample of a certain size from a population of a known size. From those two facts, statistical tables can tell you just how faithfully your survey result will represent the opinion of the larger population, within a margin of error at a particular confidence level. Of course, 100 percent confidence is absolute certainty that the single rifle shot hit its mark (the bear spins every time), and zero confidence means everything misses. If confidence were set at 50 percent, that would be no better than coin-flipping chance, fifty-fifty, that your shot hit within the margins.
Standard expectations for a well-done poll are usually set at the level of 95 percent confidence. So, five percent of the time your shots will neither hit the bear nor even be close enough for horseshoes; they're outside the box. That's one out of every 20 shots going completely astray. But confidence in what, exactly? That you nailed the X on the bear? Unfortunately, not even that. All that the 95 percent confidence level can tell you is that your shot hit somewhere between the edges of the bear's box 19 out of 20 times. Much of the apparent movement during a campaign is actually within the margin of error, meaning that chance alone is providing different reports on the likely whereabouts of that bear with every poll taken, even though he may not have moved at all.
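That chance-driven wobble is easy to demonstrate with a quick simulation (a hypothetical Python sketch; the 50 percent "true" support level and the fixed random seed are illustrative assumptions, not data from any actual poll):

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

TRUE_SUPPORT = 0.50   # hypothetical frozen opinion: the bear never moves
SAMPLE_SIZE = 1065    # the standard sample size discussed below

# Simulate 20 independent, perfectly executed polls of the same population.
results = []
for _ in range(20):
    hits = sum(random.random() < TRUE_SUPPORT for _ in range(SAMPLE_SIZE))
    results.append(hits / SAMPLE_SIZE)

spread = max(results) - min(results)
print([round(r * 100, 1) for r in results])
print(f"spread across 20 polls: {spread * 100:.1f} points")
```

Even with opinion frozen at exactly 50 percent, the 20 readings typically wander a couple of percentage points, enough raw material for a week of "surge" and "stumble" headlines.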
But even impressive numbers like 95 percent confidence cannot guarantee that you've bagged any game. They only guard against statistical chance error, saying, in effect, among the reasons you might have missed the bear, the role of sheer chance can be minimized to no more than one out of twenty times (which represents the remaining 5 percent). They can do nothing to protect against measurement error (your gun failed to fire) or systematic bias, which is like bent sights on the rifle or a magnet on one side that pulls the shots offline.
The larger the sample, the narrower the margin of error within which we can be confident of 95 hits in every 100 shots. For any population over 500,000, a minimum of 1,065 respondents is necessary to provide a 95 percent confidence level with a 3 percent plus or minus margin.
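The arithmetic behind that figure can be checked in a few lines (a sketch using the standard formula for a proportion; the z-score of 1.96 and the worst-case 50 percent split are textbook conventions, not values from the article, and the finite-population correction is omitted since it is negligible at this scale):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a sampled proportion.

    z = 1.96 is the standard normal score for 95 percent confidence;
    p = 0.5 is the worst case, which gives the widest possible margin.
    """
    return z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(1065) * 100, 1))  # prints 3.0
```

Note how the margin shrinks only with the square root of the sample size: quadrupling the sample merely halves the margin, which is why narrowing it gets expensive fast.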
One can narrow the margin of error, but most pollsters find it prohibitively expensive. Ed Goeas of the Tarrance Group said his firm needs to make 15,000 telephone calls over three nights to reach 1,065 people. If 1,065 voters give a +/- 3 percent margin of error with 95 percent confidence, then getting a 1 percent margin with 99 percent confidence would require 16,641 voters. Given Goeas' 15-to-one ratio, one would have to make nearly 250,000 calls to get the response!
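Those numbers check out under the usual sample-size formula (a sketch; the z-scores of 1.96 and 2.58 are the standard textbook values for 95 and 99 percent confidence, and the flat 15-dials-per-interview ratio is taken from Goeas' figures above):

```python
import math

def sample_size(margin, z, p=0.5):
    # Invert the margin-of-error formula: n = z^2 * p * (1 - p) / margin^2
    return math.ceil((z / margin) ** 2 * p * (1 - p))

n_95_3 = sample_size(0.03, 1.96)  # roughly 1,065-1,070 respondents
n_99_1 = sample_size(0.01, 2.58)  # 16,641 respondents
calls = n_99_1 * 15               # at 15 dials per completed interview
print(n_95_3, n_99_1, calls)
```

At fifteen calls per completion, 16,641 interviews works out to 249,615 dials, the article's "nearly 250,000."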
Hence, most pollsters spend money reducing bias rather than expanding samples beyond 4,000 voters. The Rasmussen firm uses an automated telephone prompting system. This may increase the refusal rate, but it also allows a larger sample because of the cost savings, thereby reducing the overall margin of error to +/- 2 percent. Gallup was able to call the last six elections, coming within a +/- 2 percent margin of error, with fewer than 4,000 voters.
At least four polling organizations are conducting daily tracking polls on this presidential race, yielding at least 240 polls. Nearly all have a 95 percent confidence level. That is, one out of every 20 polls will produce outcomes beyond the specified margins of error by chance alone. That means we can expect about 12 of these 240 to differ from actual opinion by more than the margin of error. So the September Newsweek poll could have squeezed the trigger on a perfect design but gotten a wild shot by sheer chance. Unfortunately, it is just those outlier results that will suggest the most dramatic changes from earlier polling numbers, and hence be more likely to draw media coverage.
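The expected number of flukes follows directly (a two-line check; the 240-poll count and the 5 percent miss rate are the article's own figures):

```python
expected_outliers = 240 * 0.05    # polls expected to miss by chance alone
p_at_least_one = 1 - 0.95 ** 240  # chance that at least one poll misses
print(expected_outliers)          # prints 12.0
print(round(p_at_least_one, 4))
```

With 240 polls in the field, at least one wild shot is a near-certainty, so the appearance of a dramatic outlier is itself entirely predictable.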
Given these unavoidable limitations, it is generally good advice to average the outcomes of several polls and stick with the pack when reporting the horse-race. But even this approach doesn't always work. Pollster Warren Mitofsky points to the recent Mexican presidential elections, won by Vicente Fox. One poll showed Fox with a commanding lead, but the press ignored the result because it seemed so unlikely. Until the day after the election, that is. Fox won by the exact margin the outlier poll anticipated.
Many reporters also forget that in a two-person race, the margin of error applies to each candidate. Hence, the actual level of support with a +/- 3 percent margin of error is 6 percent wide for each candidate. That is, there are two dancing bears inside the box, each with a fuzzy buffer zone 3 percentage points wide on either side of him, left and right. Sometimes they overlap, sometimes they spread apart.
So, which bear is actually in the lead? As a general rule of thumb, if the margin between the candidates is greater than twice the sampling error for the poll (e.g., Candidate A has 41 percent, Candidate B has 48 percent, a 7-point gap against a total margin of error of 6 percent), then the candidate with the greatest level of support is leading. It is more reliable to focus on changes in an individual candidate's level of support, rather than the horse-race divide, because a single candidate's number carries only one margin of error.
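That rule of thumb is simple enough to encode (a sketch; the function name is illustrative, and the 48-41 example is the article's own):

```python
def lead_is_outside_noise(a_pct, b_pct, margin_pct):
    """Rule of thumb from the article: the gap between two candidates
    must exceed twice the per-candidate sampling margin before the
    front-runner can be called a real leader."""
    return abs(a_pct - b_pct) > 2 * margin_pct

print(lead_is_outside_noise(48, 41, 3))  # prints True: a 7-point gap beats 6
print(lead_is_outside_noise(46, 44, 3))  # prints False: a 2-point gap is noise
```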
Margin of sampling error is just one possible source of inaccuracy in a poll. It is not necessarily the greatest source of possible error; but it is the only one that can be precisely quantified. Remember the sportscaster's cliche -- it's not over until the fat lady sings. But only if she's within the margin of aria.