Friday, November 11, 2016

What Does Early Polling Tell Us?

In the wake of Hillary Clinton's loss to Donald Trump, it's worth considering whether Bernie Sanders would have suffered the same fate. During the primary, I pointed to historical polling of party nominees during the primary season, and how well it predicted final general election results:

https://twitter.com/n8r0n74/status/705825446785515520

At the time, two major sets of polling data were available to help assess the candidates: individual candidates' favorability ratings, and general election polling of several potential November matchups.

In the favorability ratings, Bernie Sanders far outperformed Clinton, who in turn rated noticeably better than Donald Trump (net ratings as of June 6 in parentheses):

https://elections.huffingtonpost.com/pollster/bernie-sanders-favorable-rating (+9%)
http://elections.huffingtonpost.com/pollster/hillary-clinton-favorable-rating (-14%)
http://elections.huffingtonpost.com/pollster/donald-trump-favorable-rating (-25%)

In matchup polls, the final polling of a Sanders-Trump race (conducted as Sanders' realistic chances of winning the nomination came to an end) showed Sanders ahead by an average of 10 points, which is an enormous margin in modern electoral terms:

http://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_sanders-5565.html

The RealClearPolitics running average for Clinton-Trump, measured at that same end date (June 6):

http://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton-5491.html

It's important to make an apples-to-apples comparison, using polls taken while the Sanders-Clinton race was still in progress. After it ended, Clinton predictably got a large but short-lived polling boost (as Trump had earlier in the season). Depending on how you average or smooth the data, the Clinton-Trump matchup polls gave Clinton an advantage of between +2% and +4% at that time. So, conservatively, Sanders fared about 6 points better than Clinton against Trump.
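The arithmetic behind that "about 6 points" can be spelled out in a few lines. This is only a sketch using the rounded figures quoted above; the function name `margin_gap` is made up for illustration, and the underlying polling averages are not recomputed here.

```python
# Margins are in percentage points; positive means the Democrat leads Trump.
# These are the rounded figures quoted in the text, not freshly computed averages.
SANDERS_VS_TRUMP = 10.0
CLINTON_VS_TRUMP_RANGE = (2.0, 4.0)  # low and high estimates, depending on smoothing


def margin_gap(sanders_margin, clinton_range):
    """Return (conservative, generous) estimates of Sanders' edge over Clinton."""
    low, high = clinton_range
    # The conservative gap compares Sanders against Clinton's best number,
    # the generous gap against her worst.
    return sanders_margin - high, sanders_margin - low


conservative_gap, generous_gap = margin_gap(SANDERS_VS_TRUMP, CLINTON_VS_TRUMP_RANGE)
print(f"Sanders' edge over Clinton: {conservative_gap:.0f} to {generous_gap:.0f} points")
```

Taking Clinton's best case (+4%) gives the conservative 6-point figure; taking her worst case (+2%) gives 8.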

Are all these early data meaningful? I believe so. Here is an analysis of primary season polling data as a predictor of general elections:

http://fivethirtyeight.blogs.nytimes.com/2012/04/18/do-romneys-favorability-ratings-matter/?_r=1

I draw some different conclusions than Silver, but I'm starting with the data he provides.


What this shows are the averages of favorability ratings during the primaries, and of course, the final results. In these 10 previous contests, 7 times the winner was the candidate who polled more favorably during the primaries. 7 out of 10 seems modestly, but not wildly, skillful. But, I believe the results are more useful than that, when inspected closely.

One of the 3 "failures" was the 1988 Bush-Dukakis race. I will only note that, unlike other candidates, Dukakis had excellent net favorability while still having a low favorable rating. In other words, during the primary season, many respondents had either a neutral opinion or no opinion of Dukakis. In statistical terms, we have to treat this kind of data as lower-confidence data. Dukakis and Bush had identical early favorable ratings of 34%. But by November, Dukakis's unfavorable rating had risen from 16% to 39%. In any case, this race was a failure in terms of prediction skill.
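To see why a strong net rating can sit on top of thin data, it helps to split Dukakis's early numbers into three buckets. The sketch below assumes every respondent is either favorable, unfavorable, or neutral/no opinion, so that the three shares sum to 100%; that bucketing and the helper name `summarize` are assumptions for illustration, and only the 34% and 16% figures come from the text.

```python
def summarize(favorable, unfavorable):
    """Return (net favorability, share with a neutral view or no opinion), in points.

    Assumes the favorable, unfavorable, and neutral/no-opinion shares sum to 100.
    """
    net = favorable - unfavorable
    no_opinion = 100 - favorable - unfavorable
    return net, no_opinion


# Dukakis's early (primary season) ratings as quoted above.
dukakis_net, dukakis_no_opinion = summarize(favorable=34, unfavorable=16)
print(f"Net favorability: {dukakis_net:+d}, neutral or no opinion: {dukakis_no_opinion}%")
```

Under that assumption, half of the respondents had not yet taken a side on Dukakis, which left a lot of room for his unfavorable rating to climb by November.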

However, the 1992 Clinton-Bush failure was different. While Clinton's early ratings were worse than Bush's:

  • As with Dukakis, many voters had not yet formed either a favorable or an unfavorable view of him
  • They were only modestly worse than Bush's (-11% vs -3%)
  • Most importantly, the final race was impacted by an independent candidate winning nearly 19% of the vote

This last point is crucial. A comparison of two early candidates cannot be expected to account for a third candidate garnering nearly 19% of the final vote, unless that candidate siphons exactly as many Democratic votes as Republican ones. It seems entirely plausible to me that Ross Perot allowed Clinton to win the 1992 election. Thus, I think we need to throw that year out entirely in our analysis.

The last failure was the 2004 Bush-Kerry race. But I submit that this wasn't a failure at all, but rather an indication of the closeness of the race. Early polling showed Kerry at +1% and Bush at -1%, the smallest early season difference in the last 40 years. In the end, Bush won the popular vote by roughly 2.4 points. So, while early polling did not successfully predict the winner, I think it's fair to say that it did predict the outcome, within a very modest range of uncertainty. In statistics, that needs to be the standard. Barely missing is not as bad as missing by a large margin. I argue that 2004 should be considered a success of this predictor, with the caveat that a couple of percentage points of uncertainty have to be assumed. This, then, would mean that in 8 of the last 9 elections without large 3rd-party influence, primary season favorability ratings were a good predictor of outcome.

8 out of 9, or even 8 out of 10, has to be considered very good performance. This year, the gold standard of political poll analysis, FiveThirtyEight.com, had Hillary Clinton at >70% to win the election, on the very day of the election. They ended up missing on the popular vote by a modest 2%. No method of quantifying electoral chances is free of uncertainty.
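One rough way to put a number on "very good performance" is to ask how often pure guessing would do as well. This is a minimal sketch, assuming each election is an independent 50/50 call under the null hypothesis. That is a simplification (real elections are neither independent nor coin flips, and ten races is a small sample), and the helper `chance_of_at_least` is a hypothetical name used only here.

```python
from math import comb


def chance_of_at_least(hits, trials, p=0.5):
    """Binomial tail: probability of `hits` or more successes in `trials` tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# The three readings of the record discussed in this post.
for hits, trials in [(7, 10), (8, 10), (8, 9)]:
    prob = chance_of_at_least(hits, trials)
    print(f"{hits} of {trials} or better by coin flip: {prob:.1%}")
```

Under that coin-flip assumption, 7 of 10 would happen by luck about 17% of the time, 8 of 10 about 5% of the time, and 8 of 9 about 2% of the time. So how the borderline years are scored matters a great deal for how impressive the record looks.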

That quantifiable predictors are not perfect is not a good reason to favor more qualitative analysis. It's only a good reason to be less certain about the predictions.

Questions


Q: But, if we're to believe that favorability rating is a good predictor, why didn't it predict Sanders beating Clinton?

A: I don't claim that favorability is a means by which to predict primary races. There are two unique factors affecting primaries. The first is name recognition. Primary voters must choose between less well-known candidates, whereas by November, both major party candidates are always well-known. In primaries, those with high familiarity (e.g. Clinton) have a huge advantage. This is why we often see party nominees who were "losers" in previous years' races. Second, at the national level, we have a close left-right split of the electorate. In that environment, I believe personal favorability has the opportunity to be a deciding factor. In primaries, there is no clear center-point of the party, and so no symmetry to be slightly tipped in one candidate's favor by good favorability ratings.

Q: What about 2000? Isn't that a "failure" because Gore actually won the popular vote?

A: That's an entirely fair reading. But, then, we could also consider 2016 a success because Clinton won the popular vote. In any case, I think 8 out of 10 successes is a reasonable interpretation of the history of this heuristic.

Q: But these polls were taken so far in advance. Wouldn't things change by November?

A: Of course, they could. But the final polls were conducted in June, only 5 months before the election. Clinton and Trump are also well-known quantities, and their favorability ratings were very consistent throughout the election season. Sanders is the least well-known of the three, so his numbers may have changed. But the point is that they would have had to change by a tremendous amount in 5 months. His favorability and matchup numbers weren't just better than Clinton's in June. They were much better. And Sanders' favorability ratings did not mirror Dukakis's in 1988, when large numbers of voters expressed no favorable or unfavorable view in the polls.

In the end, the data show that, historically, things usually don't change that much from June to November.
