P-Hacking and Other Bad Research Practices
Leave it to the National Science Foundation to keep me hip on current research lingo. A report published last May (with the very unhip title Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science) offered this definition of an edgy new research term called "p-hacking":
Generating many different sets of results and selecting one to report simply because it confirms a researcher’s expectations is a behavior referred to as “p-hacking”: a disingenuous attempt to generate a publishable result when the full array of available evidence raises questions about its replicability.
The unhip phrase we often use in market research is "cherry picking," but I'm a fan of the term p-hacking because it captures a whole spectrum of bad research behaviors and conceptually links those behaviors to the ongoing problems with p-values in research.
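To see why the practice is so corrosive, here is a quick simulation of the behavior the NSF describes. This is my own illustrative sketch, not anything from their report, and the numbers (ten unrelated measures, thirty respondents per group) are made up for the example: test a pile of noise-only measures and "report" whichever one happens to come out significant.

```python
# Minimal sketch (not from the NSF report): cherry-picking among many
# noise-only measures inflates the false-positive rate far above 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n_simulations = 5_000
n_measures = 10      # hypothetical: 10 unrelated outcome measures per study
n_per_group = 30     # hypothetical sample size per group

honest_hits = 0      # significant result on a single pre-specified measure
hacked_hits = 0      # significant result on at least one of the 10 measures

for _ in range(n_simulations):
    p_values = []
    for _ in range(n_measures):
        a = rng.normal(size=n_per_group)   # both groups drawn from the
        b = rng.normal(size=n_per_group)   # same distribution: no true effect
        p_values.append(stats.ttest_ind(a, b).pvalue)
    honest_hits += p_values[0] < 0.05      # report only the planned measure
    hacked_hits += min(p_values) < 0.05    # report whichever measure "worked"

print(f"Honest false-positive rate:  {honest_hits / n_simulations:.1%}")  # ~5%
print(f"Cherry-picked rate:          {hacked_hits / n_simulations:.1%}")  # ~40%
```

The arithmetic behind the punchline: with ten independent tests at the .05 level and no real effects anywhere, the chance of at least one "significant" result is about 1 - 0.95^10, or roughly 40%.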
Falling under the rubric of p-hacking, then, here is a list of those bad research behaviors (which the NSF politely refers to as "Questionable Research Practices"), pulled verbatim from their report; a second simulation after the list shows what a couple of these practices do to the false-positive rate:
(a) Failing to report analyses of all of the measures collected in a study and describing only those that yield desired findings.
(b) Deciding whether to collect more data after determining whether obtained results with a smaller sample document desired results.
(c) Failing to report analyses of data from all relevant experimental conditions that were executed in the course of data collection, because data from those conditions did not yield desired results.
(d) Stopping collecting data earlier than initially planned because desired results have already been obtained.
(e) “Rounding off” a p value in a way inconsistent with conventional practice (e.g., reporting that a p value of .054 is less than .05) in order to enhance the apparent robustness of a desired finding.
(f) Reporting only studies that produced desired findings and discarding studies that did not produce desired findings.
(g) Deciding to exclude data points only after determining that doing so will enhance the degree to which a study seems to produce desired findings.
(h) Keeping in data points because without them the desired findings will no longer be found.
(i) Reporting an unexpected finding as if it had been predicted a priori and thereby increasing its apparent plausibility.
(j) Claiming that analytic results are unaltered by controlling for other variables when this has not been fully checked empirically.
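Practices (b) and (d) are a matched pair: peek at the data as it comes in, keep collecting if the result isn't there yet, and stop the moment it is. Here is another rough sketch of my own (again, not the NSF's, with made-up numbers: a cap of 100 respondents per group, checking significance after every 10) showing what that optional stopping does even when there is no real effect at all.

```python
# Rough illustration of practices (b) and (d): checking for significance
# after every batch of respondents and stopping as soon as p < .05.
# With no true effect, the false-positive rate climbs well above 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
n_simulations = 5_000
max_n = 100          # hypothetical cap on respondents per group
peek_every = 10      # hypothetical: test again after every 10 per group

false_positives = 0
for _ in range(n_simulations):
    a = rng.normal(size=max_n)   # again, no true difference between groups
    b = rng.normal(size=max_n)
    for n in range(peek_every, max_n + 1, peek_every):
        if stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05:
            false_positives += 1   # "desired results obtained" -- stop early
            break

print(f"False-positive rate with optional stopping: "
      f"{false_positives / n_simulations:.1%}")   # well above the nominal 5%
```

The more often you peek, the worse the inflation gets, which is exactly why these two behaviors made the NSF's list.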
Don’t be a p-hack. Rigorous, truthful research findings are far more useful and far more important, no matter how boring they may seem. Plus, good research is never boring if you have the right skills and strategy at hand for turning data into stories.