Photographer: Tomohiro Ohsumi/Bloomberg

What’s in This Picture? AI Becomes as Smart as a Toddler

Developments in machine learning allow computers to answer more complex questions about the contents of images

Artificial intelligence has graduated past the infancy stage of figuring out what's in an image. Computers have previously been capable of little more than a simple game of I Spy: Name a specific object or person, and they'll show you an image containing it. But thanks to new developments in AI research, machines can now answer more complex questions, like, “What is there on the grass, except the person?” (For the answer to that awkwardly worded enigma, take a look at the last image.)

A research paper published on Thursday on Cornell University's arXiv outlines a system that learns to identify fine-grained visual features of images and the words associated with them. It combines the two into a dictionary in its digital brain, which it then references to answer new questions about never-before-seen images.

The research was conducted by a team made up of experts from the Chinese Internet search company Baidu and a student at the University of California at Los Angeles. It coincides with similar, recently published research from Microsoft, Virginia Tech, and various other academic institutions. “Our goal is to enable the computer to connect language with experiences in the physical world,” says Wei Xu, a distinguished scientist in Baidu's research group. “This is important for solving the problem of common sense reasoning.”

Courtesy Baidu

Bloomberg put the Baidu and UCLA system to its own test. I took a picture of a small citrus fruit in the palm of my hand, and sent it to Baidu with the question, “What is in the center of the hand?” The software answered: “An orange.” (It's actually a satsuma, but we'll let it slide.)

This development may sound small, but teaching computers to discern what's inside images and associate it with language has proved immensely challenging. Such research draws on different disciplines that have only recently started to converge. Advances in this field bring us closer to a day when we may be able to ask a search engine like Google or Baidu to ferret through millions of images and find only the ones containing a Volkswagen bus with a flat tire, or seven oranges in a bowl.

The development from Baidu and UCLA, while important, is far from perfect. The system can't handle multiple questions in a row, like asking what types of fruit are in a basket, then asking it to count the number of apples. In tests, it gave the correct answer 64.7 percent of the time, the paper says. People answered the questions with 94.8 percent accuracy. “In its current stage, the system is not ready for serious applications, as it still makes errors,” Xu says. But things tend to proceed quickly in AI. Since a major image recognition challenge called Imagenet began in 2010, the rate at which computers misidentify items has fallen about fourfold.

The AI system answered questions about these images incorrectly.
Courtesy Baidu

Baidu and UCLA's research follows earlier attempts by researchers at universities around the world, including the Max Planck Institute for Informatics in Germany, the University of California at Berkeley, the University of Toronto, and various technology companies. Microsoft's motivation is to nurture the development of ultra-smart AI systems that could be included with its products.

Both Microsoft and Baidu have used their research to generate humongous databases that other groups can use to test their own systems. Microsoft plans to organize an annual challenge and workshops to spur researchers to explore this area further. Creating computers that can look at images and answer specific questions about them “has the distinctive advantage of pushing the frontiers on ‘AI-complete’ problems,” Microsoft and Virginia Tech wrote in their paper. “Given the recent progress in the community, we believe the time is ripe to take on such an endeavor.”

Correct answers from the AI system developed by Baidu and UCLA.
Courtesy Baidu

Work done by UCLA and two startups has focused on the analysis of surveillance videos. One day, AI may be able to monitor security camera footage to quickly and automatically spot unmarked vans that have been parked outside banks for four hours without moving. Baidu is interested in other aspects, too. “In the future, potential applications are education and mobile image search,” Xu says. AI might tailor lessons to students by, for instance, quizzing them on the types of animals in a photo their parents shot on a weekend trip to the zoo.

With the new research, computers have reached a milestone, not unlike that of many young kids figuring out the world. You can now show a machine a Dr. Seuss book, and it can tell you: On the cover of this book is a cat wearing a red and white striped hat.
