NYT Blunders on Sample Size
Readers of this newsletter and blog know how much we value and respect the New York Times. It is one of the few resources we cite regularly, as it always has relevant, authoritative information that informs our work and that of our clients. So my heart sank when I read this description of a new study from Pew:
The study is not the first to suggest that American politics are sorting along ideological lines. But it is based on a survey of 10,000 Americans, roughly 10 times the size of the average political poll.
It was written by a staff reporter for “The Upshot,” the paper’s new regular feature devoted to news stories built on data and statistics. So why did my heart sink? Because it suggests and reinforces the ignorant idea that very large samples make studies better and more believable. “Wow, you can really believe it now: this poll had 10,000 respondents, not just 1,000!”
Take a look at Versta’s Interactive Graph for Choosing Sample Size. The margin of error for a random sample of 1,000 people is ±3%. Boost that to 10,000 people and the margin of error drops to ±1%. The meaningful polarization this reporter is so eager to document ought to be visible with a three percentage point margin of error. In fact, it is, which is why smaller political polls consistently report it as well.
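To put numbers on this, here is a minimal sketch of the standard margin-of-error calculation for a simple random sample, assuming a 95% confidence level and the worst-case 50/50 split. The helper function is illustrative only; it is not code from the interactive graph or from Pew.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated
    from a simple random sample of size n (worst case p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 10_000):
    print(f"n = {n:>6,}: ±{margin_of_error(n) * 100:.1f} points")

# n =  1,000: ±3.1 points
# n = 10,000: ±1.0 points
```

Ten times the respondents buys roughly two percentage points of extra precision, because the margin shrinks with the square root of the sample size, not with the sample size itself.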
Very little is gained with very large samples, which is why most national polls use sample sizes of 800 to 1,200. The smart people at Pew surely used a large sample not because bigger is better, but because they needed to divide and analyze their data by subgroup (region, age, race, voting behavior, and so on) in order to tell more detailed stories beyond the overall polarization of American politics.
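The subgroup arithmetic makes the point. The ten-way split below is a hypothetical illustration (Pew’s actual breakdowns differ), but it shows why a 10,000-person sample matters for subgroup reporting even though it adds little to the topline estimate.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (worst case p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical illustration: cut each sample into 10 equal subgroups
# (regions, age brackets, etc.) and check the precision within each cell.
for total in (1_000, 10_000):
    subgroup = total // 10
    print(f"total n = {total:>6,} -> subgroup n = {subgroup:>5,}: "
          f"±{margin_of_error(subgroup) * 100:.1f} points per subgroup")

# total n =  1,000 -> subgroup n =   100: ±9.8 points per subgroup
# total n = 10,000 -> subgroup n = 1,000: ±3.1 points per subgroup
```

A ±10-point margin within each cell is too wide to say much about a region or an age group; ±3 points is workable. That, not a more believable topline, is what the extra 9,000 interviews buy.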
This new feature’s print advertisement reads: Driven by data—Informed by experts—Interpreted by us. Unfortunately, their interpretation of Pew’s research perpetuates a rather persistent myth about data and statistics. So here’s an idea for The Upshot: how about a story on sampling and statistics, and how they actually work? It would be a valuable piece for their reporters to research (and learn about) and then offer to their readers.