Data Geniuses Who Predict the Past
If there is one thing I hope you remember from your college statistics class, it’s this: Correlation does not imply causation. This is especially important to remember in our world of big data. Any large dataset will have hundreds of thousands of correlations, but most of those correlations will reflect purely random occurrences that mean nothing.
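To see how easily chance alone produces impressive-looking correlations, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the 50 students, the 1,000 candidate predictors, the function names, and the rough |r| > 2/sqrt(n) cutoff standing in for the usual 5 percent significance test. Both the outcome and the predictors are random noise, so any correlation it finds is meaningless by construction.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def chance_correlations(n_students=50, n_predictors=1000, seed=0):
    """Correlate pure-noise 'predictors' with a pure-noise 'outcome'.

    Returns (number of predictors passing the rough cutoff,
             strongest correlation found).
    """
    rng = random.Random(seed)
    # Hypothetical outcome (say, a graduation score): random by construction.
    outcome = [rng.gauss(0, 1) for _ in range(n_students)]
    # Rough rule of thumb for "significant at about the 5 percent level".
    cutoff = 2 / math.sqrt(n_students)
    passing = 0
    strongest = 0.0
    for _ in range(n_predictors):
        # Hypothetical predictor (say, a course grade): also pure noise.
        predictor = [rng.gauss(0, 1) for _ in range(n_students)]
        r = pearson(predictor, outcome)
        if abs(r) > cutoff:
            passing += 1
        if abs(r) > abs(strongest):
            strongest = r
    return passing, strongest


if __name__ == "__main__":
    passing, strongest = chance_correlations()
    print(f"{passing} of 1000 noise predictors look 'significant'")
    print(f"strongest chance correlation: {strongest:.2f}")
```

With a 5 percent significance bar, roughly one in twenty pure-noise predictors would be expected to pass, so a search over a thousand of them should turn up dozens of "findings" that mean nothing.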
Welcome to the world of “predictive analytics” – a fancy term for statistical efforts to sift through lots of data looking for correlations, most of which mean nothing.
And for today’s statistics lesson, welcome to the world of higher education, where administrators may have forgotten that correlation does not mean causation. A recent feature article in the New York Times described how big data is being used to predict success among college students:
Georgia State is one of a growing number of colleges and universities using what is known as predictive analytics to spot students in danger of dropping out. Crunching hundreds of thousands and sometimes millions of student academic and personal records, past and present, they are coming up with courses that signal a need for intervention.
Of course, when you crunch millions of numbers, you can find some pretty compelling results:
The analysis showed that fewer than 10 percent of nursing students with a C in math graduated, compared with about 80 percent of students with at least a B+. Algebra and statistics, it seems, were providing an essential foundation for later classes in biology, microbiology, physiology and pharmacology.
Hmm. Trouble is, the miracle of finding such predictors is inconsistent. “Different courses at different universities have proved to be predictors of success, or failure,” the article continues, noting various universities where predictor courses might be algebra, statistics, biology, English composition, American history … you name it.
Instead of seeing this as damning evidence that the predictors may be random, the reporter suggests (and administrators seem to believe) that this is all part of the miracle. Predictive analytics can find the predictor customized to your school and your own students. As a result, universities are investing hundreds of thousands of dollars in search of their predictors, and even more in building interventions to help the students they identify as being at risk.
Are these courses truly predictors of student success? Possibly. But the article presents no validated evidence of it. Remember, correlation does not imply causality, and indeed, correlation may not imply any meaningful relationship at all. The only thing these fancy models have done so far is identify potentially random predictors of the past.
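What would validation look like? One common check is to find the predictor in one set of students and then test it on students the search never saw. The Python sketch below is only an illustration under stated assumptions: the data are pure noise, and the trial counts, sample sizes and function names are all hypothetical, not anything the universities in the article are reported to have done. It picks the best-looking predictor on a "past" half of the data and then measures how well that same predictor holds up on the "future" half.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_shrinkage(n_trials=20, n_students=100, n_predictors=200, seed=0):
    """Average |r| of the best noise predictor, in-sample vs. held-out.

    Returns (mean |r| on the half used for the search,
             mean |r| of the same predictor on the untouched half).
    """
    rng = random.Random(seed)
    half = n_students // 2
    in_total = 0.0
    out_total = 0.0
    for _ in range(n_trials):
        outcome = [rng.gauss(0, 1) for _ in range(n_students)]
        best_r = 0.0
        best_predictor = None
        for _ in range(n_predictors):
            predictor = [rng.gauss(0, 1) for _ in range(n_students)]
            # Search only the "past" students for the strongest correlation.
            r = pearson(predictor[:half], outcome[:half])
            if best_predictor is None or abs(r) > abs(best_r):
                best_r = r
                best_predictor = predictor
        # Test the winner on "future" students the search never saw.
        out_r = pearson(best_predictor[half:], outcome[half:])
        in_total += abs(best_r)
        out_total += abs(out_r)
    return in_total / n_trials, out_total / n_trials


if __name__ == "__main__":
    in_sample, held_out = average_shrinkage()
    print(f"average |r| where the predictor was found: {in_sample:.2f}")
    print(f"average |r| on held-out students:          {held_out:.2f}")
```

A predictor that reflects a real relationship should hold up reasonably well on the held-out half; one found by chance should look strong only in the data where it was discovered.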