Don’t Be Fooled by Big Numbers
Suppose you could buy a machine-learning algorithm, a neural network, that is 91% accurate in predicting who will buy your product and who will not, just from a computer facial scan. Wow, 91% sounds amazing! Don’t buy it, and don’t be a sucker for big numbers until you look more closely at the math behind them.
Case in point: a business professor at Stanford University just published research in the Journal of Personality and Social Psychology claiming to have built an algorithm that identifies sexual orientation from facial photos. It’s 91% accurate, he says, and I’m sure he’s right. Here’s the trouble, though, accurately laid out by Heather Murphy of The New York Times (get out your calculator; it will help):
Let’s say 5 percent of the population is gay, or 50 of every 1,000 people. A facial scan that is 91 percent accurate would misidentify 9 percent of straight people as gay; in the example above, that’s 85 people. The software would also mistake 9 percent of gay people as straight people. The result: Of 130 people the facial scan identified as gay, 85 actually would be straight. “When an algorithm with 91 percent accuracy operates in the real world,” Dr. Cox [a critic of the study] said, “almost two-thirds of the times it says someone is gay, it would be wrong.”
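If you would rather let a computer be your calculator, the arithmetic in that passage can be sketched in a few lines of Python. This is only an illustration under the same assumption the quoted example makes, namely that the 91 percent figure applies equally to both groups; the function name `flagged_breakdown` and its parameters are made up for this sketch and do not come from the study.

```python
def flagged_breakdown(population, base_rate, accuracy):
    """Split the people an algorithm flags into correct and incorrect calls.

    Assumption (taken from the quoted example, not from the study itself):
    the same accuracy applies to people inside and outside the group.
    """
    in_group = population * base_rate              # e.g. 50 of 1,000
    out_group = population - in_group              # e.g. 950 of 1,000
    true_positives = in_group * accuracy           # group members correctly flagged
    false_positives = out_group * (1 - accuracy)   # non-members wrongly flagged
    flagged = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "flagged": flagged,
        "share_of_flags_wrong": false_positives / flagged,
    }


result = flagged_breakdown(population=1000, base_rate=0.05, accuracy=0.91)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Without rounding, this gives 45.5 correct flags and 85.5 wrong ones out of 131, so roughly 65 percent of the flags are wrong. The Times example rounds down to 45, 85, and 130, which lands in the same place: almost two-thirds.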
Here is a different, even easier way to think about it, and it will not require your calculator. Let me, Joe, tweak the algorithm. If 5 percent of the population is gay, I’ll delete all the fancy calculations and turn it into a blunt prediction machine that always predicts the person is straight. Voilà, I will have boosted the accuracy of your algorithm from 91% to 95%, without ever correctly identifying a single gay person.
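For the skeptics, here is a minimal sketch of that blunt prediction machine in Python. The 1,000-person population with 50 gay people is the same hypothetical used in the Times example, and the helper name `accuracy` is just an illustrative choice.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical population: 5 percent gay, 95 percent straight.
labels = ["gay"] * 50 + ["straight"] * 950

# The "tweaked" algorithm: ignore the photo and always say "straight".
always_straight = ["straight"] * len(labels)

baseline_accuracy = accuracy(always_straight, labels)

# How many gay people did the blunt machine correctly identify?
gay_identified = sum(
    p == "gay" and y == "gay" for p, y in zip(always_straight, labels)
)

print(f"accuracy: {baseline_accuracy:.0%}")
print(f"gay people correctly identified: {gay_identified}")
```

The machine is right about all 950 straight people and wrong about all 50 gay people, which is how it scores 95% while telling you nothing at all.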
The next time you are in the market for fancy predictive analytics, do some simple math before you buy. Impressive-sounding big numbers often mask some pretty flimsy findings underneath.