Apple and Its Rivals Bet Their Futures on These Men’s Dreams
Over the past five years, artificial intelligence has gone from perennial vaporware to one of the technology industry’s brightest hopes. Computers have learned to recognize faces and objects, understand the spoken word, and translate scores of languages. The world’s biggest companies—Alphabet, Amazon.com, Apple, Facebook, and Microsoft—have bet their futures largely on AI, racing to see who’s fastest at building smarter machines. That’s fueled the perception that AI has come out of nowhere, what with Tesla’s self-driving cars and Alexa chatting up your child. But this was no overnight hit, nor was it the brainchild of a single Silicon Valley entrepreneur.
The ideas behind modern AI—neural networks and machine learning—have roots you can trace to the last stages of World War II. Back then, academics were beginning to build computing systems meant to store and process information in ways similar to the human brain. Over the decades, the technology had its ups and downs, but it failed to capture the attention of computer scientists broadly until around 2012, thanks to a handful of stubborn researchers who weren’t afraid to look foolish. They remained convinced that neural nets would light up the world and alter humanity’s destiny.
While these pioneers were scattered around the globe, there happened to be an unusually large concentration of neural net devotees in Canada. That’s only partly through luck: The government-backed Canadian Institute for Advanced Research (Cifar) attracted a small group of academics to the country by funding neural net research when it was anything but fashionable. It backed computer scientists such as Geoffrey Hinton and Yann LeCun at the University of Toronto, Yoshua Bengio at the University of Montreal, and the University of Alberta’s Richard Sutton, encouraging them to share ideas and stick to their beliefs. They came up with many of the concepts that fueled the AI revolution, and all are now considered godfathers of the technology. This is the peculiar story—pieced together from my interviews with them—of why it took so long for neural nets to work, how these scientists stuck together, and why Canada, of all places, ended up as the staging ground for the rise of the machines.
(Now, not everyone agrees with Canada’s pride of place. See if you can spot German researcher Jürgen Schmidhuber below, and find out why he’s so upset.)
JUSTIN TRUDEAU, Canadian prime minister: AI is just a computer that can simulate human thought or human behavior. Within that, there’s machine learning, which is where you get a computer to do an experiment over and over again. It could be driving a simulated car down a road or trying to recognize a cat in a photo.
Within that, there’s a subset of machine learning called deep learning. The general idea is you build a neural network, and it has weights and biases that can be tweaked to home in on the desired outcome. You allow the computer to iterate and evolve the problem-solving. That’s what Geoff Hinton and others have really worked on over the past decades, and it’s now the underpinning of what’s most exciting about AI. It does a better job of mimicking the way a human brain thinks.
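As a rough illustration of the "weights and biases that can be tweaked" idea described above, here is a minimal sketch of a single artificial neuron fitted by gradient descent. Everything in it is an assumption made for illustration: the toy data, the target line y = 2x + 1, the learning rate, and the function name `train` are hypothetical and do not come from the interviews.

```python
# Minimal sketch: one "neuron" with a single weight and bias, tuned by
# gradient descent to reduce its mean squared error on toy data.
# The data and the target line y = 2x + 1 are hypothetical.

DATA = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]


def train(steps=2000, lr=0.05):
    """Repeatedly nudge the weight and bias toward lower error."""
    w, b = 0.0, 0.0
    n = len(DATA)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in DATA:
            err = (w * x + b) - y          # prediction minus desired outcome
            grad_w += 2.0 * err * x / n    # d(mean squared error)/dw
            grad_b += 2.0 * err / n        # d(mean squared error)/db
        w -= lr * grad_w                   # tweak the weight
        b -= lr * grad_b                   # tweak the bias
    return w, b


if __name__ == "__main__":
    w, b = train()
    print(f"learned weight={w:.4f}, bias={b:.4f}")
```

Deep learning systems stack many such units in layers and adjust millions of weights at once, but the loop is the same in spirit: measure the error, then iterate toward the desired outcome.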
CADE METZ, reporter for the New York Times and author of a forthcoming history of AI: The idea of a neural network dates back to the 1940s—the notion of a computing system that would mimic the web of neurons in the brain. But a guy named Frank Rosenblatt really pushed the work forward in the 1950s. He was a professor and was also working with the U.S. Navy and other parts of the government, and he developed this thing called a Perceptron based off the neural network concept. When he revealed it, places like the New York Times and the New Yorker covered it in pretty grand terms.
Rosenblatt claimed it would not only learn to do small tasks like recognize images but also could theoretically teach machines to walk and to talk and to show emotion. But it was a single layer of neurons, and that meant it was extremely limited in what it could do. Needless to say, none of the things he promised actually happened.
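To make the single-layer limitation concrete, the sketch below trains a one-layer perceptron with the classic error-correction rule on two tiny tasks. It is an illustration under stated assumptions, not Rosenblatt's original hardware or code: the function names, the epoch count, and the choice of AND and XOR as examples are all hypothetical additions. AND can be separated by a single straight line, so the perceptron learns it; XOR cannot, so no setting of a single layer's weights gets every case right.

```python
# Minimal sketch of a single-layer perceptron (hypothetical helper names).
# It learns AND, which is linearly separable, but cannot fully learn XOR.

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def predict(w, b, x):
    """Fire (1) if the weighted sum exceeds zero, otherwise stay silent (0)."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(samples, epochs=25, lr=1.0):
    """Perceptron rule: shift weights toward the target after each mistake."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            delta = lr * (target - predict(w, b, x))
            w[0] += delta * x[0]
            w[1] += delta * x[1]
            b += delta
    return w, b


def accuracy(samples, w, b):
    """Fraction of samples the trained perceptron classifies correctly."""
    correct = sum(1 for x, target in samples if predict(w, b, x) == target)
    return correct / len(samples)


if __name__ == "__main__":
    for name, samples in (("AND", AND), ("XOR", XOR)):
        w, b = train_perceptron(samples)
        print(name, "accuracy:", accuracy(samples, w, b))
```

Running it reports perfect accuracy on AND and something short of perfect on XOR, however many epochs are allowed. Adding a hidden layer removes this particular limit, which is the multilayered approach the researchers describe later.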
Marvin Minsky, a colleague of Rosenblatt’s who happened to be one of his old high school classmates from the Bronx, wrote a book in the late 1960s that detailed the limitations of the Perceptron and neural networks, and it kind of put the whole area of research into a deep freeze for a good 10 years at least.
GEOFF HINTON: Rosenblatt’s Perceptron could do some interesting things, but he got ahead of himself by about 50 years. While Minsky had sort of been a believer in neural nets, he was able to show there were certain things they couldn’t cope with. The book by Minsky and Seymour Papert on the technology (Perceptrons: An Introduction to Computational Geometry) basically led to the demise of the field.
During the 1970s a small group of people kept working on neural nets, but overall we were in the midst of an AI winter.
METZ: Geoff Hinton, at Carnegie Mellon University and then later at the University of Toronto, stuck with the neural network idea. Eventually he and his collaborators and others developed a multilayered neural network—a deep neural network—and this started to work in a lot of ways.
A French computer scientist, Yann LeCun, spent a year doing postdoctoral research at Hinton’s lab in Toronto. LeCun was then recruited by Bell Labs in New Jersey.
YANN LECUN: I was fascinated by intelligence as a whole from a very early age. I grew up in the 1960s, so there was space exploration, the emergence of the first computers, and AI. So when I started studying engineering, I was really interested in artificial intelligence, a field that was very nascent.
I heard about the Perceptron and was intrigued, because I thought learning was an integral part of intelligence. I dug around to find everything I could about the Perceptron. As an engineer, if you want to understand intelligence, the obvious approach is to try to build a smart machine—it forces you to focus on the components needed to foster intelligence. It’s a bit like how the pioneers of aviation were inspired by birds, but they didn’t really copy them exactly. You don’t want to just mimic biological intelligence or the brain, because there are a lot of aspects of its function that are just due to biochemistry and biology—they’re not relevant to intelligence, really. Like how feathers aren’t crucial for flight: What’s important are the underlying aerodynamic principles.
METZ: There were people who thought LeCun was a complete nut and that this was sort of a Sisyphean task. You would go to these big AI conferences as a neural network researcher, and you weren’t accepted by the core of academia. These ideas were on the fringes.
YOSHUA BENGIO: In 1985 neural nets were a marginal thing and weren’t taught in my classes at McGill University. I was taught classical, symbolic AI. So I had to convince my professor to supervise me doing neural nets. I had a scholarship from the government, so I could basically choose my topic, and it didn’t cost anything to the professor. We made a deal that I could do machine learning, but I would apply it to the thing that he cared about, which was speech recognition.
LECUN: Around 1986, there was a period of elation around neural nets, partly due to the interest in those models from physicists who came up with new mathematical techniques. That made the field acceptable again, and this led to a lot of excitement in the late 1980s and early 1990s. Some of us made neural-net-based systems to do practical things like fraud detection for credit cards. I worked on an automated system for reading checks with character recognition.
METZ: At Carnegie Mellon, a guy named Dean Pomerleau built a self-driving car in the late 1980s using a neural network. It drove on public roads. LeCun used the technology in the 1990s to build a system that could recognize handwritten digits, which ended up being used commercially by banks.
So through the late ’80s and on into the ’90s, there was this resurgence in neural networks and their practical applications, LeCun’s work being the prime example. But again they hit a ceiling, mainly because of a lack of computing power and available data. We entered another AI winter.
JÜRGEN SCHMIDHUBER: With the Canadian guys, it’s clear we are not using their algorithms; they are using our algorithms. LeCun is really a French guy originally, and we are using his algorithm. So that’s good. And he had lots of contributions, which were really important and useful.
I have known these other guys for a long time. My first encounter with Yoshua was when he published the same thing, or more or less the same thing, four years after one of my students published it. And then a couple of years later there was a showdown at a conference where all of this came out. There was a public debate in the workshop, and there it was really clear who did what first. It wasn’t nasty. It was just clarifying things. What you do in science is you clarify things. (Bengio has denied Schmidhuber’s claims.)
LECUN: The problem back then was that the methods required complicated software, lots of data, and powerful computers. Not many people had access to those things or were willing to invest the time. Between the mid-1990s and mid-2000s, people opted for simpler methods; nobody was really interested in neural nets. That was kind of a dark period for Geoff, Yoshua, and me. We were not bitter, but perhaps a little sad that people didn’t want to see what we all thought was an obvious advantage.
HINTON: Of course, we kept believing in it and kept working on it, but engineers discovered that other methods worked just as well or better on small data sets, so they pursued those avenues and decided neural networks were wishful thinking. The number of people getting neural networks to work better was quite small.
The Canadian Institute for Advanced Research got people like us from all over the world to talk to each other much more. It gave us something of a critical mass.
LECUN: There was this very small community of people who had this in the back of their minds, that eventually neural nets would come back to the fore. In 2003, Geoff was in Toronto and was approached by Cifar to start a program on neural computations. We got together and decided that we should strive to rekindle interest in our work.
But we needed a safe space to have little workshops and meetings to really develop our ideas before publishing them. The program officially started in 2004, and by 2006 there were papers that were really interesting. Geoff published one in Science.
TRUDEAU: Learning that Canada had quietly built the foundations of modern AI during this most recent winter, when people had given up and moved on, is sort of a validation for me of something Canada’s always done well, which is support pure science. We give really smart people the capacity to do smart things that may or may not end up somewhere commercial or concrete.
HINTON: In 2006 in Toronto, we developed this method of training networks with lots of layers, which was more efficient. We had a paper that same year in Science that was very influential and helped back up our claims, which got a lot of people interested again. In 2009 two of the students in my lab developed a way of doing speech recognition using these deep nets, and that worked better than what was already out there.
It was only a little better, but the existing technology had already been around for 30 years with no advances. The fact that these deep nets could do even slightly better over a few months meant that it was obvious that within a few years’ time they were going to progress even further.
METZ: Around 2009, there was this random meeting between Hinton and a speech recognition researcher at Microsoft named Li Deng. Like just about everyone else, Li Deng believed in a different form of AI known as symbolic AI. In this approach, you basically had to build speech recognition systems one line at a time, coding in specific behavior, and this was really slow going.
Hinton mentioned that his neural-net approach to speech recognition was showing real progress. It could learn to recognize words by analyzing the patterns in databases of spoken words, and it was performing faster than the symbolic, line-by-line work. Deng didn’t necessarily believe Hinton, but invited him and eventually two of his collaborators to Microsoft to work on the technology. Speech recognition took a huge leap forward at Microsoft, and then Google as well in 2010.
Then, at the end of 2012, Hinton and two of his students had a huge image recognition breakthrough where they blew away previous techniques. That’s when not just Microsoft and Google but the rest of the industry woke up to these ideas.
The thing to remember is, these are very old ideas. What changed is the amount of computing power and data behind the neural networks. To run a Microsoft or a Google, you need thousands of machines operating in concert, processing everything from text to videos. This ultimately allowed neural networks to succeed. You need the data to train on, and you need the computing power to execute that training.
LECUN: Why did it take so long? That’s just the way science works. It’s psychology. Before a set of techniques is adopted, people have to be convinced that it can work. These methods had a bad reputation of being finicky and requiring some black magic.
RICHARD SUTTON: It’s profound to see such steady increases in computer power. Now we’re in a bit of a race between people trying to develop the algorithms and people trying to develop faster and faster computers. You have to sort of plan for your AI algorithms to work with the computers that will be available in 5 years’ and 10 years’ time.
The computer has to have a sense of what’s good and what’s bad, and so you give it a special signal called a reward. If the reward is high, that means it’s good. If the reward is low, that means it’s bad. It’s where a purpose comes from.
A neural net is where you store the learning, and reinforcement is how you decide what changes you’d like to make.
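The reward idea Sutton describes can be sketched with a very small reinforcement learning setup, often called a two-armed bandit. This is a hedged illustration, not his method: the function name `run_bandit`, the payoff probabilities, the exploration rate, and the seed are hypothetical, and a plain table of estimates stands in for the neural net that would store the learning in a real system.

```python
import random

# Minimal sketch of learning from a reward signal (epsilon-greedy bandit).
# A table of value estimates stands in for the neural net; all numbers
# below are hypothetical.


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best using only the rewards received."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)   # learned value of each action
    counts = [0] * len(reward_probs)        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Occasionally explore a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Otherwise exploit the action currently believed to be best.
            action = max(range(len(reward_probs)), key=lambda a: estimates[a])
        # High reward (1.0) means good, low reward (0.0) means bad.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Move the stored estimate toward the observed reward (running mean).
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_bandit([0.2, 0.8])
    print("estimated values:", [round(v, 2) for v in estimates])
    print("times chosen:", counts)
```

The program is never told which action is correct. It only sees rewards, and over time it shifts toward the action that pays off more often, which is the sense in which the reward supplies a purpose.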
BENGIO: We’re still a long way from the kind of unsupervised learning that Geoff, Yann, and I dream about. Pretty much every industrial product based on deep learning relies mostly on supervised learning, where computers have to be told what to do in millions of cases. And of course, humans don’t learn that way; we learn autonomously. We discover the world around us by ourselves. A 2-year-old has intuitive notions of physics, gravity, pressure, and so on, and her parents never need to tell her about Isaac Newton’s equations for force and gravity. We interact with the world, observe, and somehow build a mental model of how things will unfold in the future, if we do this or that.
We’re moving into a new phase of research into unsupervised learning, which connects with the work on reinforcement. We’re not just observing the world, but we’re acting in the world and then using the effect of those actions to figure out how it works.
LECUN: I’m interested in getting machines to learn as efficiently as animals and humans. When you learn to drive, you know that if you get off the road, bad things will happen. We can predict the consequences of our actions, which means that we don’t need to actually do something bad to realize it’s bad.
So, what I’m after is finding ways to train machines so that they can learn by observation, so they can build those kinds of predictive models of the world. Every living animal has a predictive model of its environment. The smarter they are, the better they are at doing this. You could say that the ability to predict is really the essence of intelligence, combined with the ability to act on your predictions.
LECUN: It’s quite possible we’re going to make some significant progress over the next 3 years, 5 years, 10 years, or 15 years—something fairly nearby. It’s going to take a long time after that to actually build systems around this that are somewhere near human intelligence. That will take decades.
BENGIO: I don’t think that humans will necessarily be out of jobs, even if machines become very smart and maybe even smarter than us. We’ll always want real people for jobs that really are about human interactions. I don’t want a robot taking care of my babies, or grandparents, or me when I’m sick in the hospital. I’m not worried about the Terminator scenario. I believe that if we’re able to build machines that are as smart as us, they will also be smart enough to understand our values and our moral system, and so act in a way that’s good for us.
My real concern is around the potential misuse of AI, for example as applied to military weapons. It’s already being used to influence people, as you can see in advertising. In places where deploying AI is morally or ethically wrong, I think we should just make it illegal. We need to become collectively wiser.
SUTTON: I think it’s a big mistake that we’ve called the field “artificial intelligence.” It makes it seem like it’s very different from people and like it’s not real intelligence. It makes people think of it as more alien than it should be, but it’s a very human thing we’re trying to do: re-create human intelligence.
Science has always revealed truths that not all people like—you get the truth but not always the one you wanted. Maybe this is why religion has historically been at odds with science. I think it will be the same as we learn more about the mind. Maybe there won’t be an explanation of consciousness. And some people will like that, and some people won’t like that. Science can’t change what’s true.
There will always be winners and losers whenever you have change, and there’s vast change coming. I think we will become the intelligent machines. We should think of the AIs either as ourselves or as our offspring. We can create them to be as we see fit.
What is humanity? It’s a striving to be better. We shouldn’t want to freeze the way we are now and say that’s the way it should always be.
LECUN: Until we know exactly what it’s going to look like, worrying about this really is premature. I don’t believe in the concept of singularity, where one day we’ll figure out how to build superintelligent machines and the next day that machine will build even smarter ones and then it will take off. I think people forget that every physical or social phenomenon will face friction, and so an exponentially growing process cannot grow indefinitely.
This Hollywood scenario where some genius somewhere in Alaska comes up with the secret to AI and builds one robot and it takes over the world, that’s just preposterous.
TRUDEAU: It’s not something I overly fret about. I think we’ve all seen or read enough science fiction about how dangerous AI theoretically could be. I think there’s always a sense that technology can be used for good or bad. I’m reassured that Canada is part of it in terms of trying to set us on the right path. And I wouldn’t want to slow down our research, our trying to figure out the nuts and bolts of the universe.
The question is: What kind of world do we want? Do we want a world where the successful have to hide behind gated communities and everyone else is jealous and shows up with pitchforks? Or do you want a world where everyone has the potential to contribute to innovation?
HINTON: I think the social impact of all this stuff is very much up to the political system we’re in. Intrinsically, making the production of goods more efficient ought to increase the general good. The only way that’s going to be bad is if you have a society that takes all of the benefit of that rise in productivity and gives it to the top 1 percent. One of the reasons I live in Canada is its tax system; if you make a lot of money, the country taxes you a lot. I think that’s great.
My main take is that it’s really hard to predict the future. As soon as you start making predictions about what’s going to happen in 20 years’ time, you almost always wind up hopelessly wrong. But there are some things we can predict, like that this technology is going to change everything.