PCs and Speech: A Rocky Marriage
When David Nahamoo joined IBM's "speech" team in 1982, he was signing on to one of the most difficult but exciting tech challenges of the 20th century. The goal was to make computers smart enough to recognize spoken language. He wasn't alone. Small bands of researchers at brain centers such as the Massachusetts Institute of Technology and SRI International also fancied that they could free humans from the tyranny of keyboards.
Two decades later, the keyboard has yet to land on history's trash heap. Although faster processors have helped software recognize tens of thousands of words with increasing accuracy, users still complain that dictation programs for PCs garble words and don't automatically punctuate sentences. In automated call centers, speech-recognition programs sometimes balk at accented English and have trouble with the high-pitched voices of women and children. "We realized that performance always falls below the customer expectations," says Nahamoo, now manager of human-language technologies for IBM Research.
And so, while scientists toiled in their labs, the market for dictation tools faded like a distant radio signal. Because computers are so important, most schools now push students to master typing, which undercuts demand. Furthermore, the whole speech sector was tarnished when market leader Lernout & Hauspie admitted in late 2000 that it had overstated sales and then filed for bankruptcy. Doubtful that the market would ever mature, some analysts stopped covering speech altogether. "The industry remains a legacy of disappointments, uncertain demand, and lackluster reliability," says John P. Dalton, an analyst at Forrester Research Inc. in Cambridge, Mass.
Die-hard speech-recognition enthusiasts concede that the dictation market has been a bust, but still believe speech technology will take root. They're particularly excited about environments such as call centers, where consumers dangle on help lines and navigate maddening touch-tone menus. Already, in some locations, these have been replaced by friendly, human-sounding responders that seem to understand natural speech and can deliver on request everything from bank balances to weather forecasts and travel itineraries. Forget about dictation, says Ronald Croen, CEO of Nuance Communications (NUAN) in Menlo Park, Calif., which sells core recognition software that others build into call-center applications. "We've all realized there are more diverse opportunities on the communications and services side," he says.
Among the companies that have gone down this road are Yahoo! Inc. and Amtrak. At Yahoo, subscribers pay $4.95 a month to interact with a virtual responder named Jenni, who can help them find sports scores and weather reports. And Amtrak's perky attendant, Julie, serves up schedule, fare, and train-status information.
Sometimes, wireless technology is the lever for speech. Later this year, Wells Fargo & Co. (WFC) plans to test a speech program named Reed, which will help credit-card customers check balances and make payments--making life easier for callers on the go. "A lot of our customers say it's hard to use our automated system with their cell phones," says Tom La Centra, vice-president for customer service in card services at Wells Fargo. If the choice is talking or punching in a 16-digit credit-card number on a tiny keypad, "people would find speech less aggravating," agrees Steve McClure, an analyst at IDC.
Impressive as some of these programs are, most companies that operate call centers have yet to adopt sophisticated speech software. Researcher Frost & Sullivan puts the call-center market for such voice programs at $114 million in 2001--just 10% of what analysts predicted a few years ago for the speech-recognition market as a whole. And some speech companies that banked on sales to call centers have seen revenues slide. Sales at Nuance were down 24% in 2001, to $39.2 million, and the company lost $110 million. Market leader SpeechWorks International Inc. in Boston saw sales rise 50%, to $44 million, in 2001, but its net loss deepened 58%, to $46.8 million. This year, SpeechWorks predicts that sales will be flat. Both companies announced layoffs in the last 12 months.
To explain the eerie silence on the speech front, company executives blame the sinking economy. But analysts say the problems are more fundamental. Voice software can be buggy, testifies IDC's McClure, and takes far too long to implement. Amtrak, for example, spent 18 months getting Julie to the point where she could tell callers if a certain train was on time. Twelve months later--in April of this year--she was able to answer questions about fares and schedules. But Julie won't be able to take credit-card payments until this fall. In the meantime, she relays those callers to a human operator.
To date, Amtrak has spent $4 million on speech-related hardware, software, and integration. Execs at the financially strapped operator don't regret the outlay, though. The share of satisfied callers--those who get information before they hang up or switch to a live person--has risen to an average of 27% with the voice system, from 16% with the touch-tone menu, according to Robert Hackman, senior director of distribution systems at Amtrak. "Our payback is within a year," he says, based on reduced labor costs in call centers. Voice systems cost 25 cents a call, Hackman explains, compared with about $5 when a human is on the line.
Such testimonials give developers hope. Stuart Patterson, CEO of SpeechWorks, figures that if all touch-tone call centers converted to speech, it would amount to a $20 billion market over the next 10 years. (Giga Information Group more conservatively predicts a cumulative $4 billion in sales by 2006.) Patterson and other speech leaders expect a turn-around when tech spending revives and key telecom customers get back on their feet. "The speech wave hasn't hit," insists Patterson. "I think it'll happen in 6 to 12 months."
Projections like that, however, raise red flags. For decades, skeptics have joked that true speech recognition is 18 months away--and always will be. So far, history bears them out. Back in 1984, a company in Newton, Mass., called Dragon Systems Inc. introduced a 1,000-word dictation product. Steady improvements eventually led to the 1997 release of the 230,000-word Dragon NaturallySpeaking program. Sales in 1998 jumped 158%, to $69.4 million.
But in hindsight, that year looks like both the beginning and the end of dictation's golden era. Sales dipped in 1999, and Dragon co-founders Janet and Jim Baker sold their company to Lernout & Hauspie in March, 2000--mostly for L&H stock. L&H snapped up other pioneers as well. When the Belgian company filed for bankruptcy later that year, the Bakers were left virtually empty-handed after spending 30 years trying to perfect the technology. ScanSoft Inc. in Peabody, Mass., picked up pieces of L&H's intellectual property, but Janet Baker says few of the original Dragon engineers went with it. Now on the lecture circuit, she continues to have high hopes for the technology. However, she says: "To make serious improvement for the future will take the best and brightest minds."
Defenders of speech technology say it's a mistake to judge the whole sector on the history of dictation alone. ScanSoft, for example, sees an important niche in voice-synthesis software, which lets computers read text out loud. This application might be useful to drivers who wish to have their e-mail read aloud during long commutes. IBM, meantime, has just launched an initiative dubbed the Super Human Speech Recognition Program, to look for a quantum leap in accuracy and reduce the need for customization. Big Blue says the initiative will run until 2010. Maybe by then, you'll be listening to this magazine. Then again, maybe not.
By Faith Keenan in Boston