Friday, November 4, 2016

Sanity Checking

One of the best things about posting ideas online is exposing them to a community to be challenged. You also record a snapshot of your thinking at a given time, which can later be revisited. This should serve as an opportunity to calibrate your judgment and evaluate your analyses. Here's an opportunity to do that with a claim I made last week:

https://twitter.com/n8r0n74/status/792168797322883073


This tweet was in reference to the James Comey (FBI) announcement on 10/28 about the discovery of new emails, possibly related to the previously-closed Clinton email investigation. My tweet claimed that the announcement would surely not change polls by 1%, persistently. To be clear, I was referring to the net margin between Clinton and Trump, which at the time stood at approximately 4% in Clinton's favor. Thus, I was claiming that the event would not cause her lead to drop to 3% or lower. It should also be clear from the "hiccups" clause that I was making no statements about the possibility of short-term effects on polls, as voters scrambled to understand the meaning of the Comey release. Granted, I offered no specific clarification about what "persistent" meant, but in the context of an election that was only 11 days away, I intended the claim to pertain to the polls as they stood just before the election.

Results


As the election is still 4 days away, I actually don't think it's necessary to judge the prediction at all today. I consider less than a week still within "hiccup" territory, and many pollsters' latest results still include polling from the day of this news release, when voters may have been caught in the confusion of politicos spinning this story. Nonetheless, let's evaluate it today. I will be happy to admit failure and publicize it as such, should election day come and the prediction not have been validated.

Measuring


To assess my claim, we need a measure of the polling today, and a baseline before the Comey letter. I've consistently referred to rolling poll averages at RealClearPolitics.com, because their math is straightforward and the source data is well linked.

http://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton_vs_johnson_vs_stein-5952.html

I consider 4-candidate polls to be the proper measure, as in almost all 50 states, voters will actually have 4 (or more) choices before them. A 1% effect on polling can't be evaluated without accounting for the effect of third parties.

The results today (updated as new polls come in) show Clinton with a +2.4% margin. This is the easy part. The more difficult part is establishing a baseline. I anticipated I might need to back up this claim in the future, and actually tweeted about the (then) current polling immediately after the Comey letter:

https://twitter.com/n8r0n74/status/792556051904069633


following that up by noting that at that time, the RCP moving average (Clinton +3.8%) included no polling data collected after the FBI letter was announced. 


So, this is clear, right? Polls at +3.8% before the event, and +2.4% afterwards? I was wrong? Well, maybe. Again, I think the jury is still out (until Tuesday), but there are also a couple of points to consider:

What is our Baseline?


10/28 is clearly reflecting data before the event. But, what about the average on 10/29? Looking closely, we see a huge jump in Trump's numbers between 10/28 and 10/29:


Is that jump due to the Comey letter? Almost certainly, mostly not. It's important to understand that RCP's moving average is not showing the sentiment of American voters on a given day. It's an average of the most recent polls that have been released before that day. Expanding the individual poll summaries below the chart, we see that RCP averages approximately 5 days' worth of polls, typically between 5 and 8 polls' worth. 


On 10/29, I believe their moving average was calculated from these 7 polls, judging by the average:


I say "I believe," because a simple average of these polls gives Clinton +2.7%. Hovering over the graph, however, shows +2.6%. So, I can't say for sure. But, it's possible that either:

  • RCP applies different weighting to polls based on sample size or margin of error. The two biggest polls in that group were both Clinton +1%, so that might explain a slightly lower result.
  • RCP may be displaying each poll's result only to the nearest percent on its webpage, while using finer-grained data to calculate the overall average. So, 2.6 vs 2.7 may be due to rounding.

But, doesn't the 10/29 moving average include the Comey event? Barely at all. Only 1 of those 7 polls continued into 10/28. That was a 6-day poll (IBD/TIPP). If we assume its polling was spread equally across those days, only 169 voters could possibly have known about the Comey letter. This is a generous estimate, given that many were likely at work on Friday, or otherwise didn't hear about the event until after participating in the poll. In any case, 169 voters represent only 2% of those polled in the 10/29 RCP moving average.

So, why the big Trump jump from 10/28 to 10/29?

Almost certainly, this is a result of a block of very good poll results for Clinton from 10/24 now being 5 days old, and no longer in the average. Dropping results of +9, +9, +9, +14, +1, and +4 for Clinton was guaranteed to significantly reduce this moving average between 10/28 and 10/29. This aspect had nothing to do with the FBI letter.
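To make the size of that effect concrete, here is a rough sketch that averages the six dropped margins listed above and compares them with the RCP averages quoted in this post. The variable names are my own, and this is not RCP's actual calculation, since I don't know exactly which polls entered the window to replace them. The six polls average about Clinton +7.7, far above either day's moving average, so losing them pulls the average down on its own.

```python
# Clinton margins (percentage points) of the 10/24 polls that aged out of the
# window, as listed in the paragraph above.
dropped_margins = [9, 9, 9, 14, 1, 4]

# RCP moving-average margins quoted in this post.
rcp_avg_10_28 = 3.8
rcp_avg_10_29 = 2.6

# Simple (unweighted) mean of the dropped polls.
dropped_mean = sum(dropped_margins) / len(dropped_margins)

print(f"Mean of dropped polls: Clinton +{dropped_mean:.1f}")
print(f"Points above the 10/28 average: {dropped_mean - rcp_avg_10_28:.1f}")
print(f"Points above the 10/29 average: {dropped_mean - rcp_avg_10_29:.1f}")
```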

This underscores an important point: Clinton was already losing ground quickly, before the FBI letter was released. From the data, her biggest margin of +7.1% came on 10/17. By 10/28, it was down to +3.8% with zero influence from the FBI letter. By 10/29, with only 2% of respondents polled on Friday 10/28, the margin was down to +2.6%.



So, what is the best baseline? Well, there's probably no perfect answer. 10/28 has no post-FBI data in it. But, it does still carry the effect of 4 really good Clinton polls (+9,+9,+9,+14). 10/29 only has about 2% of its data coming from after the announcement.

The entire 10/28-10/29 gap in margin (+3.8% to +2.6%) can be explained by the removal of the 10/24 polls. For the limited amount of Friday the 28th polling to account for that difference, those 169 voters would have had to choose Trump by a ratio of nearly 4:1. That's nearly impossible. Even a 55/45 split can't be supported by the polls conducted since that date.
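The arithmetic behind that ratio can be sketched as follows. It rests on two simplifying assumptions of mine: that everyone polled before the letter still sits at the 10/28 margin, and that the post-letter respondents split only between Clinton and Trump. Under those assumptions the required split comes out near 78/22 for Trump, in the neighborhood of the 4:1 figure.

```python
# Figures quoted in this post.
baseline_margin = 3.8      # Clinton's RCP margin on 10/28, in points
observed_margin = 2.6      # Clinton's RCP margin on 10/29, in points
post_letter_share = 0.02   # the 169 Friday respondents as a share of the 10/29 sample

# Assumption: all other respondents still sit at the 10/28 margin, so the
# whole gap has to come from the post-letter respondents. Solve
#   (1 - share) * baseline + share * required = observed
# for the Clinton margin required among those respondents.
required_margin = (
    observed_margin - (1 - post_letter_share) * baseline_margin
) / post_letter_share

# Assumption: a two-way Clinton/Trump split among those respondents.
clinton_pct = (100 + required_margin) / 2
trump_pct = (100 - required_margin) / 2
ratio = trump_pct / clinton_pct

print(f"Required Clinton margin among post-letter respondents: {required_margin:.1f}")
print(f"Implied split: Trump {trump_pct:.1f}% / Clinton {clinton_pct:.1f}%")
print(f"Trump-to-Clinton ratio: {ratio:.2f} to 1")
```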

So, I feel comfortable with the assessment that 10/29 is actually the best baseline. With that baseline, the change in polling is now down to only 0.2% to 0.3%, depending on whether you start from RCP's displayed +2.6% or the simple average of +2.7%. The selection of start date literally makes the difference between my prediction looking right (well under 1%), or wrong (assuming we evaluate it today at all).
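A quick sketch of how much the verdict depends on the start date, using only margins quoted in this post. The labels and variable names are mine, and I include both the displayed 10/29 figure and the simple average, since I can't be sure which one RCP's math actually produces.

```python
threshold = 1.0        # the 1% change my tweet said would not persist
current_margin = 2.4   # Clinton's RCP margin as of this post (11/4)

# Candidate baselines, all taken from figures quoted above.
baselines = {
    "10/28 (RCP average)": 3.8,
    "10/29 (RCP displayed)": 2.6,
    "10/29 (simple average)": 2.7,
}

# Shift in Clinton's margin since each baseline, rounded to one decimal
# to match the precision of the published figures.
shifts = {
    label: round(margin - current_margin, 1)
    for label, margin in baselines.items()
}

for label, shift in shifts.items():
    verdict = "under" if shift < threshold else "at or over"
    print(f"{label}: shift of {shift} points, {verdict} the 1-point threshold")
```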

Where Did the Votes Move From?


The second interesting thing in the RCP moving average data is that Clinton's total did not drop at all, whether you use 10/28 or 10/29 as the start date. Jill Stein's numbers didn't change at all. It doesn't appear that the left, broadly, had any change in opinion based on this announcement.

The entirety of the difference in polling in the last week is Gary Johnson losing ground, and Trump gaining it (mostly the latter). Are we really to believe that bad news for Hillary Clinton caused Gary Johnson voters to shift their votes to Trump?

What seems more plausible to me is that Gary Johnson has been on a continual slide for nearly 2 months (as has Stein to a lesser degree). Johnson had a bad last week: his faux marijuana heart-attack and VP Weld's semi-endorsement of Clinton both likely lost him moderate Republican followers.

TL;DR


Was I right or wrong? Maybe. I still submit that it's too early to tell. I'll revisit this prediction on election day, when I'm hopefully not celebrating a Trump victory. And, hopefully improving on my early primary prediction of Marco Rubio being the GOP nominee. 😉
