A Tale of Data That Got Too Big
Here is a brilliant thought experiment that bears directly on big data. It was published in 1946 as a one-paragraph short story, "On Exactitude in Science," by the Argentine writer Jorge Luis Borges:
“. . . In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.”
Borges is writing about maps, not about big data. But the idea is the same. We capture and document more and more information about our lives. The data gets bigger and bigger. Eventually it becomes a one-to-one map of reality itself: a repository of data as gigantic, and as difficult to comprehend, as the reality it is supposed to represent. It becomes a Map of our World whose size is that of our World, and which coincides point for point with it.
At this point the data becomes Useless. There is the “real” world we live in, and now there is a “replica” world of big data that looks exactly like the real world it maps. But what is the point of replicating the world exactly?
So we’re back to square one. How do we make sense of all that data? How do we query it? How do we create sampling shortcuts and models of it, given that we can’t possibly analyze it all? Well, in the same ways we research the “real” world, because the two have become identical. So, thank goodness for researchers (the Researchers Guild!) who still know how to extract, simplify, synthesize, model, infer, communicate, and summarize what is going on in the world and in all of that data.