Six Ways to Identify Bad Survey Data
Bad data is worse than no data. With no data, at least you have intuition and experience to rely on. Bad data, on the other hand, will result in “findings” that are likely to spoil important decisions.
With most surveys now fielded online, we have lost the opportunity for phone (or in-person) interviewers to assess the quality of a respondent’s answers. On the other hand, online surveys offer all sorts of useful indicators for assessing whether respondents are providing truthful responses. That is because almost nobody takes a survey with the deliberate aim of giving wrong or misleading data. Instead, they are bored, lazy, or irritated by your less-than-optimal survey design, or they are racing through the survey to collect an incentive payment or sweepstakes entry. Those behaviors leave traces in the data.
There are tell-tale signs that respondents are giving bad data, and we look for them in almost every survey we conduct or dataset we analyze. Here are six ways we suggest to identify bad survey data, followed by a sketch of how such checks might be automated:
- Examine multiple select questions. Flag the data as suspect if a respondent (a) selects all options, especially in screening questions; (b) selects exactly one option in all multiple select questions throughout the survey; or (c) selects exactly two options in all multiple select questions throughout the survey.
- Identify straight-lining or patterning in questions that are laid out in grids, especially if it results in inconsistent answers or if a respondent does it on multiple grids in the survey. (We generally recommend avoiding grids, if possible.)
- Identify numeric inconsistencies, especially with respect to age. For example, does a respondent say he has been in his current job for 20 years, but also that he was born in 1980?
- Scan through open-end responses (and other-specify responses) for gibberish or for excessively vague and short answers.
- Flag all “speeders,” which for us usually means respondents in the fastest ten percent by total time from start to finish.
- Include quality-check questions, and flag data if the respondent fails any of them. We usually include two quality checks: a stand-alone question asking the respondent to select a specific item, and a grid row instructing the respondent to click on a certain column.
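To make these checks concrete, here is a minimal sketch of how all six might be automated with Python and pandas. Everything in it is hypothetical: the column names (s1_opt1, duration_sec, birth_year, and so on), the survey year, and the “correct” quality-check answers are placeholders that would need to be mapped to your actual survey export, and it assumes multiple-select options are coded 0/1.

```python
import pandas as pd

# Hypothetical export: one row per respondent, multi-select options coded 0/1.
df = pd.read_csv("survey_export.csv")
flags = pd.DataFrame(index=df.index)

# 1. Multiple-select patterns: all options checked on a screener, or exactly
#    one (or exactly two) options checked on every multi-select in the survey.
screener = ["s1_opt1", "s1_opt2", "s1_opt3", "s1_opt4"]        # hypothetical
multi = {"q5": ["q5_opt1", "q5_opt2", "q5_opt3"],              # hypothetical
         "q9": ["q9_opt1", "q9_opt2", "q9_opt3", "q9_opt4"]}
counts = pd.DataFrame({q: df[cols].sum(axis=1) for q, cols in multi.items()})
flags["screener_all"] = df[screener].sum(axis=1) == len(screener)
flags["always_one"] = counts.eq(1).all(axis=1)
flags["always_two"] = counts.eq(2).all(axis=1)

# 2. Straight-lining: the same answer down every row of a grid.
grid = ["g1_r1", "g1_r2", "g1_r3", "g1_r4", "g1_r5"]           # hypothetical
flags["straight_line"] = df[grid].nunique(axis=1) == 1

# 3. Numeric inconsistency: job tenure implies starting work before age 15.
SURVEY_YEAR = 2024                                             # assumption
age = SURVEY_YEAR - df["birth_year"]
flags["age_conflict"] = df["years_in_job"] > age - 15

# 4. Open ends: trivially short answers (gibberish still needs human review).
flags["short_open_end"] = df["open_end"].fillna("").str.strip().str.len() < 5

# 5. Speeders: the fastest ten percent by total completion time.
flags["speeder"] = df["duration_sec"] <= df["duration_sec"].quantile(0.10)

# 6. Quality checks: failed either trap question (correct answers hypothetical).
flags["failed_qc"] = (df["qc_select_item"] != 3) | (df["qc_grid_row"] != 2)
```

Keeping each check as its own boolean column makes it easy to tally violations per respondent, which is exactly what the review step below needs.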
For each of these, we review every case in the data and flag potential violations. A single violation, or even two, is rarely grounds for dismissal. Some respondents are very fast; a straight line through a grid is often legitimate; honest respondents sometimes misinterpret what we intended to ask. But if we start seeing three or more violations and/or failed quality check items, we will replace those cases with new ones. Every survey is different, and the cut-off decisions vary.
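Continuing the hypothetical sketch above, the tally-and-replace step might look like this; the threshold of three is the rule of thumb from the paragraph above, not a universal constant.

```python
# Tally violations per respondent and list cases to replace.
violations = flags.sum(axis=1)
to_replace = df.index[violations >= 3]  # cut-off varies by survey
print(f"{len(to_replace)} of {len(df)} cases flagged for replacement")
```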
Whenever a customer tells me about discovering, only after the report has been shared, that the data is riddled with bad cases, I wish they had called Versta first. With simple quality-check processes in place, bad data like this should never happen.
—Joe Hopper, Ph.D.