Hello Again, Speech Recognition
Back in the bubbly 1990s, many folks in and out of the tech industry believed they would soon be controlling their computers simply by talking to them. This never happened, of course, but when the hype died, the research work went on and great progress was made. Speech may now be poised for a comeback on PCs and, even more so, on handheld devices.
I recently spent a day at IBM's Thomas J. Watson Research Center in Yorktown Heights, N.Y., catching up on progress in speech recognition, computer-synthesized speech, and automatic translation. Since the mid-1990s, speech recognition programs have helped the relatively small group of PC users who can't manage a keyboard. You can still buy several versions of IBM ViaVoice starting at about $75, or Dragon NaturallySpeaking from Nuance Communications for $150 and up, but PC-based speech recognition has never outgrown its niche status. That's because it takes lots of practice to reach the point where the computer recognizes 95% of what you say, which still means one error in every 20 words. For most people, typing is far more efficient.
Speech recognition in call centers is a different story. Many have replaced those horrid "press 1 for English" automated response systems with more user-friendly voice recognition. This shows that a technology doesn't have to be perfect to please customers, just less annoying than what it replaces. The real progress has been in the quality of the synthesized responses. In the best samples that I heard at Watson, cyber-voices could almost be mistaken for a real human.
LOOK FOR VOICE TECHNOLOGY TO BECOME a lot more common in cars, where hands-free operation of everything but the car itself can be a real safety plus. My favorite system is in the Acura TL, where the driver can control the navigation, climate-control, and audio systems, as well as a Bluetooth-equipped phone, just by speaking. Most luxury cars now have some sort of voice-control system, and this should move rapidly into the mass market over the next couple of years.
Speech recognition became more accurate as computer scientists found ways for software to grasp some of the meaning of language, so that it could put individual words in context. That same understanding is being extended, though not without difficulty, from simple recognition to simultaneous translation. One example is a medical translation system that, in the prototype I saw, lets a Chinese-speaking patient discuss symptoms and treatment options in some detail with an English-speaking doctor. The translations into English, both spoken and written, are a bit stilted but easily comprehensible.
Given the scarcity of human interpreters, fast machine translation is a hot topic in both academia and industry. Watson researchers showed a demo that takes a foreign-language television feed (from Al Jazeera, in this case) and translates it in real time. This is much harder than the medical translation because the subject matter is unrestricted and the syntax is far more complex than "Where does it hurt?" The output is just good enough to give you a sense of what is being said, and it can be used to decide which parts of a broadcast merit a human translator's attention.
While dictation is unlikely ever to become the dominant method for entering text on PCs (keyboards simply work too well to be displaced), applications such as translation could give speech technology a big boost on laptops and desktops. Meanwhile, recognition software is becoming more efficient, and handheld devices are gaining processing power. The combination should soon make recognition practical on your Palm or BlackBerry, where data entry remains a challenge.
Speech has been flying below the radar for quite a while, promising more than it delivers. This next wave of tools could make devices easier to use and users more productive.
For past columns and online-only reviews, go to Tech Maven at www.businessweek.com/technology/wildstrom.htm