February 26, 2021

Polling, Models, and Prediction

What happened on November 3rd, 2020? How did an election in which Joe Biden had a large lead in national polls and in most key swing states turn into an electoral college nail-biter?

First things first. Polls are not self-interpreting. They must be placed in historical and contemporary context. We also know that aggregating many polls will provide a more accurate picture than any single poll.


Hence, the era of predictive models.


What should we make of them? In both 2016 and 2020 the models, drawing on both national and state polls, tended to predict comfortable Democratic victories. The only notable outlier was FiveThirtyEight in 2016, which gave Trump a roughly 30% chance of winning. Most other models put Clinton’s odds between 85% and 99%, and most models in 2020 gave Biden similar odds.


Have the polls and the models that draw on them been vindicated? No. Let’s try to figure out why this is the case.


Trump came fairly close to winning the electoral college in 2020 (he lost Wisconsin by ~20,000 votes, Pennsylvania by ~81,000 votes, and Michigan by ~154,000 votes). What does this say about the many polls that had him well behind, both nationally and in key swing states, and the many models that considered him an overwhelming underdog?


As Nate Silver and others stress, probabilistic thinking is hard. A model generally isn’t wrong or right but rather better or worse. The most prominent modelers dutifully reminded us that a 10% chance of winning is not zero. Hence, Trump having a 10% chance in FiveThirtyEight’s final predictive model, for instance, does not mean he will lose, nor is the model necessarily discredited if Trump wins. One-in-ten events do happen. But if Trump had won twice, which nearly happened, this would have called for more radical rethinking. Indeed, the actual outcome that we witnessed, in which Trump narrowly lost the electoral college, calls for serious rethinking. The margins in the national popular vote, key swing states, and the electoral college were all much closer than the polls and models suggested, not to mention the massive underperformance of Democratic Senate and House candidates relative to prominent polls and models.
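The "won twice" scenario can be put in rough numbers. What follows is only a sketch: it takes the roughly 30% (2016) and 10% (2020) chances cited above at face value and assumes the two elections are independent events, which is a simplification. The function and variable names are illustrative.

```python
def joint_probability(probabilities):
    """Probability that every event occurs, assuming the events are independent."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result

# Trump's approximate chances in the 2016 and 2020 models, as cited in the text.
model_odds = [0.30, 0.10]
# The alternative reading: each election was roughly a coin toss.
coin_toss_odds = [0.50, 0.50]

p_model = joint_probability(model_odds)      # about 0.03
p_coin = joint_probability(coin_toss_odds)   # 0.25

print(f"Two Trump wins if the models were right: {p_model:.2f}")
print(f"Two Trump wins if each was a coin toss:  {p_coin:.2f}")
print(f"The coin-toss reading makes that outcome {p_coin / p_model:.1f}x more likely")
```

Under these assumptions, two Trump wins would have been a roughly 3-in-100 event if the models were right, against 1-in-4 if each election was a toss-up.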


Is the best explanation that in 2016 and 2020 two unlikely events happened? Isn’t it more plausible that the narrow Trump win in 2016 and the narrow Trump loss in 2020 were actually fairly likely, maybe on the order of a coin toss? If supposedly unlikely events keep happening, don’t basic principles of interpretation suggest that we need to adjust our models, rather than insisting that unlikely outcomes keep occurring?


How about a simpler and more plausible explanation: the models were wrong to place such confidence in state and national polls. They were thus wrong to build in assumptions that could give any presidential candidate in this polarized period an 80% or 90% chance of winning.


Why were they wrong to have such confidence? To begin with, the state polls in 2016 were very inaccurate. They were not random or useless. They were systematically incorrect, which is worse. They didn’t tell us nothing; rather, they told us the opposite of what was true. In 2020 modelers took it on faith that this was no longer the case, that the decent polling leads for Biden in places like Wisconsin, Michigan, Pennsylvania, North Carolina, and Florida reflected reality, even though this was not true in 2016. 
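The difference between random and systematic error can be illustrated with a small simulation. This is a sketch with made-up numbers: the true margin, the size of the shared bias, and the noise level are all hypothetical, not estimates drawn from any actual polls.

```python
import random
import statistics

def simulate_poll_average(true_margin, shared_bias, noise_sd, n_polls, rng):
    """Average n_polls simulated polls of a candidate's margin (in points).

    Each poll = true margin + a bias shared by all polls + its own random noise.
    """
    polls = [
        true_margin + shared_bias + rng.gauss(0, noise_sd)
        for _ in range(n_polls)
    ]
    return statistics.mean(polls)

rng = random.Random(2020)  # fixed seed so the run is repeatable

TRUE_MARGIN = 1.0   # hypothetical true lead, in points
NOISE_SD = 3.0      # hypothetical random error of a single poll
N_POLLS = 200

# Random error only: averaging many polls lands close to the truth.
unbiased_avg = simulate_poll_average(TRUE_MARGIN, 0.0, NOISE_SD, N_POLLS, rng)

# Same noise plus a 4-point error shared by every poll: averaging cannot remove it.
biased_avg = simulate_poll_average(TRUE_MARGIN, 4.0, NOISE_SD, N_POLLS, rng)

print(f"True margin:                  {TRUE_MARGIN:+.1f}")
print(f"Poll average, random error:   {unbiased_avg:+.1f}")
print(f"Poll average, shared bias:    {biased_avg:+.1f}")
```

Aggregation washes out each poll's own noise, but an error that every poll shares passes straight through to the average, and to any model built on it.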


Why did anyone assume that state polls, so inaccurate in 2016, had corrected themselves? Don’t they have to re-earn our trust in the next Presidential election? Once again in 2020 the polls were systematically off in key swing states and nationally, predicting a large (8-10 point) popular vote margin for Biden and comfortable swing state victories.


So again: which is more plausible, that something unlikely keeps happening, or that the outcome was a fifty-fifty proposition all along? Yet those who make the polls and design the models continue to insist that the unlikely keeps happening. That is, they claim that in a narrowly divided and polarized country it is unlikely that Presidential elections will be close in the popular vote and the electoral college. To be blunt, this defies all sense and logic.


Sorry, pollsters, pundits, and modelers. You were wrong when you said Biden was a heavy favorite, just as you were wrong about Clinton in 2016. The simplest and best explanation is that American Presidential elections can be expected to be close, barring some new developments (politics is always in flux).


In a country that at the moment leans blue nationally but with an electoral college that gives Republicans an edge (often estimated at around 3 percentage points), it makes sense to assume as a starting point that Democrats and Republicans each have around a 50% chance of winning the Presidency. Keep this in mind in the future.
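A rough sketch of why a national lean and an electoral college edge of similar size cancel out. It assumes the Democratic national margin in any given election is normally distributed around an average lean, and that a Democrat wins only if that margin exceeds the Republican electoral college edge. The 3-point edge is the estimate mentioned above; the national leans and the 4-point spread are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def democratic_win_probability(national_lean, ec_edge, spread):
    """P(Democratic national margin exceeds the Republican electoral college edge).

    All arguments are in percentage points. The margin is modeled as
    Normal(national_lean, spread), which is a simplifying assumption.
    """
    margin = NormalDist(mu=national_lean, sigma=spread)
    return 1.0 - margin.cdf(ec_edge)

EC_EDGE = 3.0   # Republican electoral college edge, as estimated in the text
SPREAD = 4.0    # hypothetical election-to-election variability

for lean in (3.0, 5.0, 8.0):  # hypothetical Democratic national leans
    p = democratic_win_probability(lean, EC_EDGE, SPREAD)
    print(f"National lean D+{lean:.0f}: Democratic win probability {p:.0%}")
```

When the national lean equals the electoral college edge, the result is exactly 50% whatever spread is chosen. Only a lean well beyond the edge pushes the odds toward the 80% or 90% range.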


See this recent analysis in FiveThirtyEight, which discusses the fact that every Presidential election in the past thirty years has been fairly close: Why A Biden Blowout Didn't Happen.