Before he died on Oct. 5, Steve Jobs left clues that he was working on a new product that would revolutionize how we interact with our TVs. “It will have the simplest user interface you could imagine,” he told biographer Walter Isaacson, and it will eliminate the collections of remote controls that litter living rooms. After years of struggling with the Apple TV set-top box, which was never a huge success, “I finally cracked it,” he said.
No one knows for sure what “it” is, and Isaacson isn’t saying. But many tech executives agree that an Apple TV set is likely to make use of humankind’s most natural interface: the voice. Already, millions of Apple customers are talking to their new iPhone 4S, thanks to a program called Siri that tries to provide an answer to questions like, “How’s the weather today?” Whether or not the rumors that Apple plans to release a TV set by 2013 are true, Siri-like voice recognition is headed for the living room. Microsoft is already there, via its Xbox 360 game console, and Comcast, Samsung Electronics, LG, and Sharp are working on voice-enabled features for TV sets, set-top boxes, and related products. Mike Thompson, senior vice-president at Nuance Communications, the world’s largest supplier of voice recognition technology, says “a wave” of device makers will ship products that understand voice commands next year.
It’s easy to see the appeal. Few would be upset if, instead of figuring out which one of three remotes to use, viewers could sit on the couch and say, “Record the next episode of Modern Family.” And while a growing percentage of new TVs connect to the Internet, many customers are put off by overly complex controls or on-screen keyboards that require the user to type by moving a cursor at an excruciatingly slow pace, says Jakob Nielsen, a product usability expert and co-founder of design consultancy Nielsen Norman Group. “Anything would be better than what we have now,” he says. “We can only go up from here.”
Microsoft has the early lead thanks to Kinect, an Xbox peripheral with cameras and motion sensors for hands-free gaming. Kinect also has sensitive microphones. After waking up the system by saying “Xbox,” subscribers to Microsoft’s $60-a-year Xbox LIVE service can search for shows, movies, and games by speaking to Microsoft’s Bing search engine. “You get a lot of claims saying, ‘We’re about to transform TV,’ ” says Ross Honey, general manager of Xbox LIVE entertainment and advertising for Microsoft. “We already have.”
Most consumers’ first opportunity to talk to their TVs—and have them listen—will be through voice-enabled apps for their smartphone or tablet. More than 3 million Comcast subscribers have downloaded an app that turns their smartphone into a remote control for the company’s Xfinity broadband service. Comcast is looking at adding voice-control features to the app, says spokeswoman Jennifer Khoury. Samsung and Sharp are developing similar apps of their own, according to people familiar with their plans. This may well have been the approach Jobs had in mind. According to one former Apple manager who asked to remain anonymous because he was not authorized to speak publicly, Jobs saw little reason for a stand-alone remote when iPhones and iPads can do the job better.
Others are looking to fix rather than eliminate the remote. Nuance’s Thompson says TV, DVD, and set-top box makers are all working on remotes that look more like iPhones, some with touchscreens in place of the usual gaggle of unused buttons. Some of the prototypes are designed around a single prominent button that activates a microphone, he says. Cost will be a challenge, since such a device would need a microphone and Wi-Fi antenna instead of the infrared sensors now commonly used. Industry politics will also be an issue. Since having every electronic box within earshot respond at once would be a nightmare, equipment makers need to agree on which device runs the show.
The best approach of all, says design expert Nielsen, is to have no remote at all. Nuance is researching ways to embed microphones around the living room, like so many home-theater speakers—the better to discern words, says Thompson. And TV makers are looking into building mics right into TVs. Piper Jaffray analyst Gene Munster expects such devices from Apple in 2013, but others may be on the market by then. Nuance’s Thompson estimates that 5 percent of TVs could have built-in voice control by Christmas 2012.
Dave Grannan, chief executive officer of voice software maker Vlingo, expects many technologies to be integrated eventually. In his dream scenario, he’ll be able to tell his TV to pull up his Netflix queue. Using Kinect-style motion control, he could then air-swipe through his library. Thanks to eye-tracking software in the TV or set-top box, he would simply look at the movie he wants to watch and say, “Play that.” “The combination of voice, gesture, and eye tracking is the future,” says Grannan. He says Vlingo will announce its first voice recognition product for TV at the International Consumer Electronics Show that begins on Jan. 10.
Major hurdles remain. “The living room is tough,” says Dag Kittlaus, who co-founded Siri, the voice control technology startup bought by Apple in 2010. Kittlaus left Apple in October and is now writing a sci-fi novel (which, yes, involves computers that listen). Voice recognition products in the living room need to be able to distinguish voice commands from casual chitchat, screaming kids, and the sounds coming from the TV itself. Also, most TVs don’t share a common operating system such as Windows for which developers can create new applications. “The minute someone comes along with an ‘app store’ for TV, it will break wide open and transform the experience,” says Kittlaus. “As it stands, the only people who have access to what you see on the TV are the TV makers and cable guys.” A startup called Zypr, founded by consumer electronics maker Pioneer, has developed standards for speech-enabled devices but has yet to announce partners.
The biggest challenge, however, is simply making the software smarter. Kinect users are limited to a handful of commands, and plenty of iPhone 4S owners have already tired of hearing Siri tell them, “I don’t understand.” That means the heaviest lifting still needs to be done by the technology companies that have worked for decades on artificial intelligence. SRI International, the Silicon Valley research lab that created the Siri technology before spinning it off as a separate company, is working on software to allow for far more advanced computer-human dialogue. There’s even a project to enable technology to discern a person’s mood by picking up on verbal cues. “Siri is the beginning of the story, or near the beginning of the story,” says SRI Vice-President Norman Winarsky, an early champion of Siri. “There’s much more to come.”