Big Data (which others call the universe) is old. Our brains naturally process overwhelming sensory input and distill it to something intelligible. Markets have always been massively parallel computers of prices. And many data science techniques mimic nature: Metallurgic annealing, ant colony behavior, and evolution all form the basis of modern algorithms.
Still, something’s happening.
Below, a selection of large data sets is plotted according to dimension and sample size: that is, how many attributes of how many things. While it’s hard to define those quantities unambiguously, we can at least get a rough sense of the contours of Big Data’s bigness.
Thought experiments, with no data except the mechanics of our brains, would lie infinitely far to the bottom left; Jorge Luis Borges’s “Library of Babel,” containing all possible truths and untruths, would lie infinitely far to the top right. Thought experiments have been invaluable in science, from Galileo to Einstein. Borges’s library, in its perfect completeness, was perfectly useless. Thus the proof of the Big Data pudding will be in the eating.
“Far better an approximate answer to the right question, which is often vague,” said statistician John Tukey, “than an exact answer to the wrong question, which can always be made precise.”