Variations in party % sampling, 2016 and Oct 2016 polls.

October 24, 2016

Note: I am strongly opposed to both Clinton and Trump. This blog advocates 3rd parties.

This is a brief look at the numbers behind recent ZeroHedge articles, which claim that Clinton’s recent strength in the polls is exaggerated by uncorrected oversampling of Democrats.

I decided to look at how the party composition of various polls varies. The data had already been aggregated; I just made some graphs. Data is from Huffpo’s Party ID page [found it via The Federalist]. Note that I do not know whether the pollsters applied any corrections to these figures in the actual results of the polls.

First, 2016 figures for the most prolific pollsters. There is considerable variation between pollsters! This was actually the most surprising and interesting graph for me. I was surprised to see that FOX is tied for the highest %Dem and has the lowest %Indep! I didn’t do time series of these (if there is interest, I can), but the figures were consistent within each pollster, in 2016 at least.

[graph: pct-party-vs-pollster]

Since the Republican vs Independent breakdown makes the chart above a little confusing, I condensed the same data into a single figure to crudely represent “bias”. It should only be used to get an idea of the magnitude of the differences that could result from these variations in the party composition of sampling, if not otherwise compensated. This is quite experimental, and it doesn’t seem to align well with the differences between actual national polls conducted by the pollsters below at about the same time. It would be interesting to investigate that correlation. [update, time series of this measure at end.]


Next, analysis of October polls vs the rest of the year. Note that the %stdev here is the variation of, for example, %Dem between different polls within a time group. It is very different from the “margin of error” reported in any poll, which is an indication of uncertainty due to sample size.
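To make the distinction concrete, here is a minimal sketch of the two quantities. The %Dem values and the sample size of 1000 are made up for illustration and are not taken from the Huffpo data; the margin-of-error function uses the usual 95% approximation for a simple random sample, which is an assumption about how a pollster might compute it, not a description of any particular pollster's method.

```python
import math
import statistics

# Hypothetical %Dem values from five polls in one time group (made-up numbers).
pct_dem = [33.0, 35.0, 36.0, 38.0, 34.0]

def between_poll_stdev(values):
    """Sample standard deviation of a party share across different polls."""
    return statistics.stdev(values)

def margin_of_error(pct, n, z=1.96):
    """Approximate 95% sampling margin of error, in percentage points,
    for a single poll with share `pct` (in percent) and sample size `n`."""
    p = pct / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)

spread = between_poll_stdev(pct_dem)   # variation between polls
moe = margin_of_error(35.0, 1000)      # uncertainty within one poll

print(f"stdev across polls: {spread:.2f} points")
print(f"margin of error for one poll (n=1000): {moe:.2f} points")
```

The first number describes how much pollsters disagree with each other about party composition; the second describes how uncertain a single poll is because of its sample size. They can happen to be similar in size, but they measure different things.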


Next, plot vs time for % Democrat, across all pollsters and polls, to see what can be learned at this level (not much; there is a faint uptrend, but it’s buried in the noise). I am looking at %Dem because, historically, %Rep has been considerably lower than the actual share of Republican voters, large numbers of whom identify as Independent. Take a look at the multi-year graph of Rep/Dem/Ind in the Huffpo link.


Lastly, a histogram of the above, just to be complete. 35-36% Democrat is the clear center of the distribution of sampling in 2016 national election polls.


Conclusion: In this case, the ZH article is overstating its case. Whether there is distortion at a more fine-grained level, I suppose we won’t find out until the election, but I am not holding my breath. At the same time, I am wondering if variations in the party composition of sampling (unless they’re further corrected) would result in noticeable biases which vary between pollsters (ballpark: +/- 2% or 3% vs some central average of all pollsters) – the latter completely apart from sampling error.

I repeat, I think both Clinton and Trump are unworthy of being president, by a long shot.

[update- additional info: Gallup’s party affiliation poll series. I don’t think this goes through a likely-voter model like most of the election polls, hence the disconnect vs the election polls above.]

[update- on what the term “oversampling” means – article from Pew]

[update- just to satisfy my curiosity, broke out the RDI “net” bias vs average-of-all, vs time vs pollster.]

formula:  “bias” = D – R – 0.1*I  where D,R,I  are percent Dem,Rep,Ind

The 0.1 number is a wild guess: figures range from 0 to 0.2, tending to the low side in the general election due to Trump’s extreme negatives, and toward the high side of that range in generic congressional races. Also, D’s and R’s chances of voting for their own party are more like 80%, and aren’t exactly equal; polls seem to show that D’s should get a little bonus from that.

the graph below is relative, meaning (bias – bias_average_over_all_2016_data)
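For anyone who wants to reproduce the measure, here is a minimal sketch of the formula and the relative version. The function names and the three example polls are hypothetical (made-up numbers, not the Huffpo data), and the average here is taken over whatever polls are passed in, standing in for the average over all 2016 data.

```python
# Weight applied to the Independent share; 0.1 is the "wild guess" from the formula.
INDEP_WEIGHT = 0.1

def bias(d, r, i, indep_weight=INDEP_WEIGHT):
    """Crude party-composition "bias": D - R - indep_weight * I (all in percent)."""
    return d - r - indep_weight * i

def relative_bias(polls, indep_weight=INDEP_WEIGHT):
    """Each poll's bias minus the average bias over all polls supplied."""
    raw = [bias(p["D"], p["R"], p["I"], indep_weight) for p in polls]
    average = sum(raw) / len(raw)
    return [b - average for b in raw]

# Hypothetical party-ID percentages (made-up numbers for illustration only).
polls = [
    {"pollster": "Pollster A", "D": 36.0, "R": 30.0, "I": 30.0},
    {"pollster": "Pollster B", "D": 38.0, "R": 32.0, "I": 26.0},
    {"pollster": "Pollster C", "D": 33.0, "R": 29.0, "I": 34.0},
]

rel = relative_bias(polls)
for poll, value in zip(polls, rel):
    print(f'{poll["pollster"]}: relative bias {value:+.2f}')
```

Changing `indep_weight` anywhere in the 0 to 0.2 range is an easy way to see how sensitive the measure is to that guess.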

I’m not concluding anything from this; it’s just another way to see how much variation there is in the party balance coming out of the methodology and/or likely-voter models in these polls (not all of which are LV’s, i.e. likely-voter polls! Gallup, I think, is just a survey of party affiliation, and Pew is RV’s, i.e. registered voters, for example.)



