Can You Really Use AI to Create “Synthetic” Survey Respondents? Just-Published Academic Research Says No.
One of the weirdest new uses of artificial intelligence in market research is to create “synthetic respondents” for surveys and qualitative interviews. The idea is to use large language models, which are trained on vast amounts of text scraped from the Internet, to construct a sample of synthetic people that matches the demographics of one’s target population, and then to ask those synthetic people the questions from a survey or in-depth interview.
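To picture what this looks like in practice, here is a minimal sketch of how a single synthetic respondent might be prompted, assuming the OpenAI Python client; the persona, model name, and survey question are illustrative placeholders, not any particular firm’s actual method.

```python
# A minimal sketch of asking an LLM to answer a survey question "in
# persona." Assumes the OpenAI Python client and an OPENAI_API_KEY in
# the environment; the persona, model, and question are illustrative.
from openai import OpenAI

client = OpenAI()

# A hypothetical persona constructed to match target demographics
persona = "a 34-year-old woman in Ohio with a college degree who votes regularly"

question = (
    "On a scale of 0 (very cold) to 100 (very warm), how do you feel "
    "toward labor unions? Answer with a number only."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": f"Answer survey questions as {persona}."},
        {"role": "user", "content": question},
    ],
)

print(response.choices[0].message.content)  # e.g., "72"
```

Repeat this over thousands of generated personas and you have a “synthetic sample” whose answers can be tabulated just like survey data.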
Research firms are racing to commercialize and sell this new use of AI. Fortunately, we have more cautious colleagues in the academic world who are carefully testing whether it actually works. They are comparing data constructed from synthetic respondents to rigorous survey data obtained from real people. Can one really get valid and reliable answers to market research questions by using AI to create synthetic samples? The answer is no.
Here is what one team of researchers set out to do:
Our primary analysis compares the distribution of responses from synthetic ChatGPT personas to corresponding respondents in the ANES [American National Election Studies]. We focus on three metrics of interest to social scientists: (1) how well ChatGPT recovers the overall mean and variance of feelings toward various groups, (2) how closely the (conditional) correlations between persona characteristics and survey responses mirror the inferences we would draw from the ANES, and (3) the sensitivity of our comparisons to changes in the prompt, the LLM, and the timing of data collection.
The researchers found that for the 11 measures of interest, the overall mean scores from the synthetic respondents closely matched the overall mean scores of the real survey. But this similarity in means was superficial. Additional analysis (illustrated in the sketch after this list) showed:
- There was less variation in scores than in the real survey
- Correlations and regression coefficients did not match the real survey
- The score distributions changed significantly with minor changes in question wording
- Identical questions yielded significantly different results when replicated over a 3-month period
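To make these checks concrete, here is a minimal sketch of the kind of comparison involved, using pandas with made-up numbers; the data are hypothetical placeholders chosen to mimic the reported pattern, not the ANES or the study’s actual results.

```python
# A minimal sketch of comparing synthetic and real survey responses on
# the metrics described above: means, variances, and the correlation
# between a persona characteristic (age) and the survey answer. All
# numbers are made-up placeholders, not the ANES or the study's data.
import pandas as pd

real = pd.DataFrame({
    "age":     [22, 34, 45, 51, 63, 70],
    "feeling": [15, 40, 85, 30, 95, 60],  # 0-100 feeling thermometer
})
synthetic = pd.DataFrame({
    "age":     [22, 34, 45, 51, 63, 70],
    "feeling": [55, 52, 48, 54, 50, 53],  # clustered near the mean
})

for name, df in [("real", real), ("synthetic", synthetic)]:
    print(
        f"{name:9s} mean={df.feeling.mean():5.1f}  "
        f"variance={df.feeling.var():7.1f}  "
        f"corr(age, feeling)={df.age.corr(df.feeling):5.2f}"
    )
# Similar means can mask a collapsed variance and a very different
# correlation structure -- the same pattern the researchers report.
```

In this toy example the means line up, but the variance and the age correlation do not, which is exactly why matching averages alone is not evidence of validity.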
Ultimately, the authors concluded: “… our findings raise serious concerns about the quality, reliability, and reproducibility of synthetic survey data generated by LLMs.”
So if you are curious about how AI might be applied to the work we do in survey research, by all means start experimenting with it. But keep in mind that you are experimenting and learning for the future. You are definitely not generating findings that are in any way valid, reliable, or scientific.
—Joe Hopper, Ph.D.