Trim Your Weights at Five
How many young African American men support Donald Trump? Not many. But if one of them ends up randomly selected into your survey, and if you use statistical weighting to ensure the sample does not skew too old and too white, then you are in trouble.
This is what’s happening to the LA Times’ election tracking poll being conducted in partnership with the University of Southern California’s Center for Economic and Social Research. USC has built a probability-based panel of 5,500 U.S. adults who are being polled at regular intervals about the upcoming election.
The problem with their poll is that it includes very few young African American men, and as luck would have it, one of the few it does include plans to vote for Trump. To fix the overall imbalance of age and race on the panel, he is being assigned a statistical weight 30 times the average weight assigned to other panelists. The result? The poll shows that support for Trump among African Americans, and especially among young men, is far stronger than any other polling (and everyday experience) suggests.
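To see how much damage a single extreme weight can do, here is a minimal sketch with hypothetical numbers (not the actual panel data): a subgroup of 15 respondents, one of whom supports Trump, tallied once with equal weights and once with that one respondent up-weighted 30-fold.

```python
import numpy as np

# Hypothetical subgroup of 15 respondents; exactly one supports Trump.
supports_trump = np.array([1] + [0] * 14)
equal_weights  = np.ones(15)                    # everyone counted equally
skewed_weights = np.array([30.0] + [1.0] * 14)  # the lone supporter weighted 30x

def weighted_share(y, w):
    """Weighted proportion: sum(w * y) / sum(w)."""
    return np.sum(w * y) / np.sum(w)

print(weighted_share(supports_trump, equal_weights))   # ~0.07, about 7% support
print(weighted_share(supports_trump, skewed_weights))  # ~0.68, about 68% support
```

One respondent moves the subgroup estimate from roughly 7 percent to roughly 68 percent, which is exactly the kind of distortion at work here.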
The flaw here is obviously methodological, but chances are pretty good it affects some of your research data as well. There are plenty of “just push this button!” statistical weighting tools out there that will adjust your sample so it perfectly matches a pre-defined target population. The result will be mathematically beautiful, but potentially ridiculous.
Weighting sample data is almost always necessary, and it is almost always a good thing to do. But keep in mind the dangers of doing it mindlessly, and follow these best practices:
- Avoid trying to analyze and weight very small subgroups. The LA Times pollsters decided to weight at micro-levels, targeting subgroups as small as 15 respondents. Just as you would never (I hope!) compare and contrast subgroups in your data that are that small, neither should you set weighting targets for them. Instead, roll them into a larger subgroup and admit that you don’t have the statistical power to do much else.
- Trim your weights at a value of 5. Most of us weight our data using raking algorithms, which adjust weights iteratively until the sample’s marginal distributions line up with the population targets. Once you’ve done that, look at your weights. A good rule of thumb is to trim any large weights back down to a value of five, and then re-rake (potentially many times) to bring all your other weights back into alignment. A sketch of that trim-and-re-rake loop follows this list.
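Here is a minimal sketch of that trim-and-re-rake loop, assuming your respondents sit in a pandas DataFrame and your targets are marginal shares for each weighting variable. The function names and the cap-at-five-times-the-mean convention are my own illustration of the idea, not any particular vendor’s tool.

```python
import numpy as np

def rake(df, weights, targets, max_iter=100, tol=1e-6):
    """Raking (iterative proportional fitting): cycle over the weighting
    variables, scaling weights so each category's weighted share matches
    its target, until the adjustments become negligible.

    `targets` maps a column of `df` to a dict of {category: target share}.
    """
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(max_iter):
        biggest_adjustment = 0.0
        for var, target in targets.items():
            total = w.sum()
            factors = np.ones_like(w)
            for cat, target_share in target.items():
                mask = (df[var] == cat).to_numpy()
                current_share = w[mask].sum() / total
                if current_share > 0:
                    factors[mask] = target_share / current_share
                    biggest_adjustment = max(
                        biggest_adjustment, abs(target_share / current_share - 1.0)
                    )
            w = w * factors
        if biggest_adjustment < tol:
            break
    return w

def rake_and_trim(df, weights, targets, cap=5.0, max_rounds=20):
    """Rake, cap any weight above `cap` times the mean weight, then re-rake;
    repeat until no weight pops back above the cap."""
    w = rake(df, weights, targets)
    for _ in range(max_rounds):
        ceiling = cap * w.mean()
        if (w <= ceiling).all():
            break
        w = np.minimum(w, ceiling)
        w = rake(df, w, targets)
    return w
```

In use, something like `rake_and_trim(df, np.ones(len(df)), targets)` with your own target shares. If the loop keeps hitting the cap, that is usually a sign your targets are too fine-grained for the sample you have, which is the previous bullet’s point.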
If you are not the person actually doing the weighting, or preparing and tabulating your data, this is another great reason that you should always ask for and review your full data file. Look at your weights and make sure they are not ridiculous before you start drawing conclusions.
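Checking the weights takes only a few lines once you have the file. A minimal sketch, assuming the weights arrive as a simple numeric vector; Kish’s effective-sample-size formula is the standard way to see how much precision extreme weights are costing you, and the threshold of roughly five echoes the rule of thumb above.

```python
import numpy as np

def weight_diagnostics(w):
    """Quick sanity checks on a weight vector before any analysis."""
    w = np.asarray(w, dtype=float)
    relative = w / w.mean()
    kish_n = w.sum() ** 2 / np.sum(w ** 2)  # Kish's effective sample size
    return {
        "n": len(w),
        "max_relative_weight": relative.max(),   # anything far above ~5 is a red flag
        "effective_n": kish_n,                   # how many interviews the weights leave you
        "weighting_efficiency_pct": 100 * kish_n / len(w),
    }
```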