13 Suspects: The Verdicts on Gallup’s Gaffes
Even if you don’t care about political polling, or the fact that Gallup consistently overestimates support for Republican candidates, it is worth paying attention to how Gallup is trying to fix its problems with surveys and polling.
They are not happy with how poorly their polls have fared (who would be?), and they have teams of smart people trying to figure out what is wrong. Given their high profile, they are making the process and findings of their investigations public, and we have much to learn from that.
Last week they released the findings of an extensive review that involved outside experts as well as internal ones. It is fascinating reading, because it identifies 13 suspects in Gallup's survey process that every company doing survey research should be thinking about:
1. Tracking design. Did daily sampling quotas, interviews, and weighting (versus aggregating quotas and weighting protocols over several days) affect the results? Verdict: innocent.
2. RDD list-assisted landline vs. listed landline samples. Gallup used random digit dialing (RDD) for its cellphone sample but not for landlines. For landlines, it called only numbers published in directories. Verdict: guilty.
3. Company name. Does the brand name of the polling firm influence who is willing to participate in surveys and how they answer? Verdict: innocent.
4. Race of the interviewer. African American interviewers win better cooperation among African American voters, but there was no discernible effect on outcomes. Verdict: innocent.
5. Gender of the interviewer. Women interviewers were somewhat more likely to get respondents who supported Obama, but there was no discernible effect on outcomes. Verdict: innocent.
6. Neutral probing of “don’t know” and “refused” responses. Gently pushing respondents to reveal an underlying preference even if they say “I don’t know” might have affected statistical estimates. Verdict: innocent.
7. Geographic distribution of interviews. National samples usually set quotas by four large regions within the U.S., but not by areas within those regions. Verdict: guilty.
8. Interview completion time. Which day of the week polling is done, and at what times of day, can affect the types of respondents who are reached. Verdict: innocent.
9. Cellphone and landline phone distribution. The most accurate telephone polling now requires companies to call both cellphones and landlines. Was the 50%-50% mix appropriate? Verdict: innocent.
10. Measuring and weighting race. The most accurate ways to ask about race, and to compensate for skews with statistical weighting, are always changing, even at the U.S. Census Bureau. Did Gallup’s protocols affect their results? Verdict: guilty.
11. Handling of third-party candidates. Not explicitly reading third-party candidate names and party affiliations could affect the accuracy of polling predictions. Verdict: innocent.
12. Candidate name order in questions. The order in which names are read does affect responses, but randomizing that order spreads any effect evenly across candidates instead of favoring one. Verdict: innocent.
13. Likely voter estimating. Election polls should set aside the opinions of respondents when there are good reasons to believe those people will not vote. But how good were the criteria for deciding who the most likely voters are? Verdict: guilty.
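Several of these suspects, especially number 10, come down to statistical weighting. As a rough illustration only (this is not Gallup's actual procedure), here is a minimal sketch of simple cell weighting, also called post-stratification: each respondent gets a weight equal to their group's share of the population divided by that group's share of the sample. The group labels, population shares, and responses in the demo are hypothetical.

```python
from collections import Counter


def cell_weights(sample_groups, population_shares):
    """Return one weight per respondent so that the weighted share of each
    group matches its (assumed) share of the target population."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    weights = []
    for group in sample_groups:
        sample_share = counts[group] / n
        # Under-represented groups get weights above 1, over-represented below 1.
        weights.append(population_shares[group] / sample_share)
    return weights


def weighted_share(responses, weights, answer):
    """Weighted proportion of respondents who gave `answer`."""
    total = sum(weights)
    return sum(w for r, w in zip(responses, weights) if r == answer) / total


if __name__ == "__main__":
    # Hypothetical sample: group A is over-represented (80% of the sample
    # versus an assumed 60% of the population).
    groups = ["A"] * 8 + ["B"] * 2
    responses = ["X"] * 5 + ["Y"] * 3 + ["Y"] * 2
    shares = {"A": 0.6, "B": 0.4}

    w = cell_weights(groups, shares)
    print("unweighted X share:", responses.count("X") / len(responses))
    print("weighted X share:", weighted_share(responses, w, "X"))
```

In this toy sample, support for X drops from 50% unweighted to about 37.5% once group A is weighted down to its population share, which shows how much the choice of weighting targets can move an estimate. Real surveys typically weight on several variables at once, often with iterative raking; this single-variable version only shows the basic idea.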
Conducting valid and reliable surveys is not easy. Design, fieldwork, data collection, and analysis can each go wrong, and at many points experience and professional judgment play critical roles.
Every survey research and measurement firm and every internal market research team should be keeping their eyes on suspects like these — and additional suspects that are unique to each research effort — with the highest levels of vigilance and care.