Consultant Andreas Weigend on Big Data Refineries

Andreas Weigend, independent consultant, former chief scientist at

Independent consultant Andreas Weigend joins our experts discussing the untapped potential of data analysis in medicine, education, and elsewhere, along with the pitfalls that may lie ahead.

You’ve compared big data to oil.
If you find raw oil in your backyard, it’s probably not that useful. You need to refine the raw materials in order to get something that people need. Raw data is not that useful, either. Examples of data refineries would be Amazon or Google. A big difference, of course, is that we’re not going to run out of data anytime soon, as far as I know. So in terms of pricing, there are interesting implications of information products vs. oil products.

The process of drilling for and refining oil has led to all sorts of environmental and political problems. What’s the parallel for data?
If you want to push the energy metaphor, it is more like a nuclear power plant, where the question is whether we are willing to deal with the risk of a nuclear accident, which is very unlikely, but when it happens: big problem. A nuclear accident would be like late last year, when data on a third of all credit-card users in the U.S. leaked.

Seems like many of the benefits are enjoyed by the companies you describe as refineries, as opposed to their users. How do we make sure everyone benefits from big data?
I think that what has happened with Apple’s App Store will happen in the data economy, where companies will build services on top of the raw material. There is value in having an app store where another party uses data to provide apps for the consumer and shares revenue with the data company.

Doesn’t this create very powerful gatekeepers?
Yes. At the same time, a lot of the valuable research has moved to places like Amazon or Facebook, and if we try to take antitrust action against them in the best interest of consumers, we should look at the second-order effects.

You were the chief scientist at Amazon about a decade ago. Was it a completely different world then?
Ten years ago we were already seeing the shift in focus from algorithms, which meant getting everything you could out of the data you had, to simply getting more raw data. So it was totally different, but we still have similar ideas. Jeff [Bezos] is still Jeff.

What industry do you think is sitting on the most interesting data reserves and hasn’t figured out a way to use them?
There’s a Chinese company called Tencent that makes WeChat, which has totally changed the way Chinese people communicate. Contrast that with the e-commerce company Alibaba, which knows what you’re interested in, what you search for, and what you eventually buy. They know whether you returned the item, whether there are payment issues, etc. These are two companies with a billion users. Which has more potential, knowing all about my communication behavior or knowing all of my financial transactions? It depends, of course, on which industry you are interested in. But the real potential is the cross-fertilization across those two things. For instance, you can learn a lot from Tencent if you need to make credit decisions. Knowing whether you hang out with prostitutes or with a pimp says a bit about your propensity to pay back a loan.

How do you deal with information overload in your personal life?
We just have to have a model that acknowledges that, hey, people miss things. Don’t get angry if somebody else misses an e-mail. Try to reach them through another channel.
