A Computer That Recognizes Its Master's Voice?

Thomas Fumai is the type of New Yorker who talks so fast that some people, even his wife, have trouble understanding him. At work, though, the machine on his desk has no trouble at all. As a securities trader for Shearson Lehman Brothers Inc., Fumai spends much of the day chatting with a personal computer. When Fumai tells it to, his PC will instantly buy or sell $25 million or so in government bonds. Not only is speaking to the computer easier and faster than typing, but it's also less error-prone, he says.

Computers that can understand and respond to human speech have been science-fiction staples for decades. While versatile conversationalists like the HAL computer in 2001: A Space Odyssey have not yet arrived, recent advances make it practical to add much simpler forms of speech recognition to everyday tasks in offices, homes, hospitals, and factories.

SMALL TALK. The dream of getting computers to process speech the way humans do has exasperated some of the world's best minds. "When I started, I assumed it would all be solved in 10 years," says Lalit R. Bahl, who has been exploring speech at IBM's Thomas J. Watson Research Center since 1972.

Lately, many researchers have concluded that it's not really imperative to create computers that can comprehend everything. "Let's have a machine that knows all about just one thing," says Lawrence R. Rabiner, a research director at AT&T Bell Laboratories. If a computer is trained to understand, for instance, the vocabulary of booking an airline reservation or trading stocks, then the problem becomes far simpler (chart).

Thanks to this new approach, the technology is finally living up to its promise. The U.S. market for speech-recognition hardware and software will surpass $100 million this year, according to Probe Research Inc. in Cedar Knolls, N.J. That's up from $50 million in 1990 and from almost nothing in the mid-1980s. Another big factor in the growth: Speech-recognition programs, which a few years ago required massive mainframes, can now run on powerful but inexpensive desktop computers.

So far, the biggest market has been for systems that replace the keying-in of data by hand. At the U.S. Postal Service branch in Oklahoma City, mail sorters read zip codes from large sacks--about 10,000 of them daily--and speak the numbers into microphones. Each sack is then routed to the proper district. Because the sorters no longer have to put down the bags and type, they're working at four times the old rate.

ROBO OPERATORS. While replacing manual data entry is the leading use of speech recognition now, two other forms will become more popular within two years, according to Voice Information Associates Inc., a market researcher in Lexington, Mass. The first is automating telephone tasks, such as providing directory assistance or allowing consumers to order merchandise directly by phone. The second is dictation: instantly converting a person's voice into computer text.

Phone companies anticipate a bonanza, both in new business and in cost-cutting. "Speech recognition could be worth hundreds of millions of dollars per year for us," says Casmir Skrzypczak, vice-president for science and technology at Nynex Corp. Some savings, he says, will come from reducing fraud by storing a "voiceprint" of each calling-card customer. Since a voice is as unique as a fingerprint, phone cheats would be foiled.

Phone companies also plan to save money by automating some work that human operators do. Many phone companies already have computers that understand your "yes" or "no" when you're asked to pay for a collect call. The specter of robo-operators has phone company unions up in arms.

Nevertheless, phone companies are forging ahead. In 1992, Nynex plans to introduce voice-activated dialing, now available on some car phones. After Nynex's computer has been trained for your voice, you'll be able to pick up the phone and say "Mom" or "Bob" and the computer will call them.

Voice recognition is also expected to boost overall phone use. Phone companies figure that consumers will soon be calling in orders to voice-recognition computers owned by direct marketers, rather than filling out mail-order forms. Amway Corp. now uses such a system for its thousands of dealers to place merchandise orders around the clock.

While telephone applications may soon become the most pervasive form of speech recognition, the technology causing the most excitement involves converting speech to text. The holy grail is a system that will allow people to treat their PC like a human secretary, verbally telling it to take dictation and then print copies of documents.

PAUSES APLENTY. Two Boston-area companies, Dragon Systems Inc. and Kurzweil Applied Intelligence Inc., are among the furthest along in that area. With the ability to recognize 30,000 words, the DragonDictate system can convert to text almost anything you say. But it still can't process continuous speech, so talkers must pause unnaturally between words. That stalls the pace to a maximum of 40 words per minute, slower than a proficient typist. "It's not designed for able-bodied secretaries or journalists," says Janet M. Baker, president of Dragon Systems.

But the $9,000 system has been a godsend for the handicapped. One customer is a government attorney named David Bristol. Although Bristol has cerebral palsy, the Dragon system enables him to write briefs and other documents himself. The system is catching on so well among the disabled that IBM, instead of bringing its own technology out of the labs, has based its first commercial dictation product on Dragon's technology.

Kurzweil's niche is health care. Its system has become popular in hospitals, where doctors with busy hands and notoriously bad handwriting are finding that talking to a computer saves time and is more accurate than jotting notes that must later be transcribed by a secretary. It used to take five days to generate an emergency-room report at Mercy Hospital in Springfield, Mass., for example. Now, with the Kurzweil system, the same report can be created and printed in less than five minutes.

But these large-vocabulary systems have their drawbacks. Besides not being able to recognize continuous speech, they must be trained to "learn" the nuances of each speaker's voice. It often takes hours for a system to gather sufficient voice samples from each speaker. That's not acceptable in most over-the-phone tasks, which require "speaker independence." Fact is, not one of the dozen or so speech systems now available commercially can instantly understand large vocabularies of naturally paced speech from any person.

WORD SPOTTING. New techniques promise to overcome such limitations. Linguistics programs, for example, can increase accuracy by anticipating where a noun or verb is likely to occur in a sentence. "Word spotting" makes small-vocabulary systems far more useful by filtering out irrelevant words or phrases. So if you reply, "Um, well, yes, thank you," to a question that requires a simple yes or no, the computer is able to disregard everything but the key word.
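The yes-or-no case above can be sketched in a few lines of Python. This is only an illustration of the idea, not any vendor's method: it assumes the caller's speech has already been turned into text, whereas a real word spotter works on the audio signal itself. The names KEYWORDS and spot_keyword are hypothetical, chosen for this example.

```python
import re

# Hypothetical keyword set for a prompt that expects a simple yes or no.
KEYWORDS = {"yes", "no"}

def spot_keyword(transcript, keywords=KEYWORDS):
    """Return the one keyword present in the reply, or None.

    Assumption: the reply arrives as text. Filler words such as
    "um" or "thank you" are simply ignored. If the reply holds no
    keyword, or holds conflicting keywords, the result is None so
    the caller can ask the question again.
    """
    # Split the reply into lowercase words, dropping punctuation.
    words = re.findall(r"[a-z']+", transcript.lower())
    # Keep only the words that belong to the small vocabulary.
    hits = [w for w in words if w in keywords]
    if len(set(hits)) == 1:
        return hits[0]
    return None

if __name__ == "__main__":
    print(spot_keyword("Um, well, yes, thank you"))
```

Returning None for a reply such as "yes, no, maybe" is a design choice in this sketch: re-asking the question seems safer than guessing which key word the speaker meant.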

Despite recent progress, researchers all over the world are still racing to make new breakthroughs. In Japan, for instance, almost every high-tech company is working feverishly. Hitachi Ltd. is experimenting with so-called neural network systems that simulate the learning abilities of the human brain. Rival NEC Corp., meanwhile, is trying to combine voice recognition with machine translation. A prototype unveiled early this year grasps words spoken in Japanese and renders them into computer-synthesized English.

Experts predict that systems with HAL-like capabilities will indeed arrive--but probably not before 2001. Until then, the struggle continues. "Machines should deal with humans on human terms, not on their own," says Skrzypczak at Nynex. If that happens, when people complain of computerphobia, at least the computer will be able to listen.

By Evan I. Schwartz in New York, with Keith Hammonds in Boston and bureau reports

