Why You Need Your Raw Data
As we get further along in our careers, most of us drift further away from the nitty-gritty of data collection, data coding, and sometimes even data analysis. But that distance always makes me nervous, and I usually want to take a quick scan through the raw data myself. I want to see all those rows and columns of numbers, variables, and cases, just to be sure that our insights and stories are built on a solid foundation of reliable numbers. And it always surprises me how just five minutes of sorting, counting, and cross-tabbing can point to errors and suspicious data that would otherwise be invisible.
A fascinating article in the New York Times Magazine last year reminded me how important it is for good researchers to have, and at least look at, their raw data. A prominent academic in social psychology fabricated experimental data over the course of many years, publishing some 55 articles in top academic journals:
Sitting at his kitchen table in Groningen, he began typing numbers into his laptop that would give him the outcome he wanted. He knew that the effect he was looking for had to be small in order to be believable; even the most successful psychology experiments rarely yield significant results. The math had to be done in reverse order: the individual attractiveness scores that subjects gave themselves on a 0-7 scale needed to be such that Stapel would get a small but significant difference in the average scores for each of the two conditions he was comparing. He made up individual scores like 4, 5, 3, 3 for subjects who were shown the attractive face. “I tried to make it random, which of course was very hard to do,” Stapel told me.
Doing the analysis, Stapel at first ended up getting a bigger difference between the two conditions than was ideal. He went back and tweaked the numbers again. It took a few hours of trial and error, spread out over a few days, to get the data just right.
The amazing thing is that this professor’s co-authors seem never to have looked closely at the data themselves. It wasn’t until a suspicious junior colleague and several graduate students pulled together a number of his raw data sets and found anomalies, including multiple rows of nearly identical data, that the fraud was unmasked.
A research colleague at a large bank told me a similar story, not about fraud but about sloppy fieldwork and data coding by a research firm that resulted in one row of data being replicated hundreds of times. Had she not looked at the data herself, the erroneous conclusions she would have drawn from it could have done real damage.
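As a minimal sketch of the kind of five-minute check described above, assuming the data has already been loaded as a list of rows (for example with Python's csv module), a few lines of Python can count how often each identical row appears. The function name find_repeated_rows and the sample survey data are hypothetical, made up for illustration only.

```python
from collections import Counter


def find_repeated_rows(rows, min_count=2):
    """Return (row, count) pairs for every row appearing at least min_count times.

    `rows` is a list of records (e.g. lists of cell values read from a CSV file).
    Each row is converted to a tuple so it can be used as a Counter key.
    """
    counts = Counter(tuple(row) for row in rows)
    repeated = [(row, n) for row, n in counts.items() if n >= min_count]
    # Most-repeated rows first, so a row copied hundreds of times tops the list.
    return sorted(repeated, key=lambda pair: pair[1], reverse=True)


# Hypothetical survey extract: answers only. Respondent IDs are left out on
# purpose, since a duplicated row may still have been given a fresh ID.
survey = [
    ["F", "35-44", "4", "5", "3"],
    ["M", "18-24", "2", "6", "1"],
    ["F", "35-44", "4", "5", "3"],
    ["F", "35-44", "4", "5", "3"],
    ["M", "55-64", "7", "7", "2"],
]

for row, n in find_repeated_rows(survey):
    print(n, row)
# prints: 3 ('F', '35-44', '4', '5', '3')
```

This only catches exact copies. Rows that are nearly identical, like those that exposed the fabricated data above, would need a looser comparison, such as counting how many answers two rows share.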
Of course there are other important reasons to get your raw data as well. For example, a great new partner like Versta Research might come on board and suggest a useful way to re-analyze data from studies you’ve done in the past. But where is it? Make sure you have it, and make sure you are always comfortable with what’s inside.