One of the funniest moments in the classic Star Trek motion pictures is the scene in which the engineer "Scotty" - who has traveled back in time to the 1980s with his comrades - attempts to use a computer. "Computer!" he exclaims, attempting to initiate a dialogue with the PC. Embarrassed, a contemporary engineer hands him a mouse. "Aha," says Scotty, who then holds the mouse to his mouth only to exclaim again, "Computer!" The idea that computers in the future would be able to understand human speech was common a few decades ago. Speech generation and recognition are so fundamental to the human experience that we tend to underestimate the incredible complexity of the human information processing that makes them possible.
Each person has a unique pronunciation, which is why we are able to recognize friends and family by voice alone. When you factor in the differences between nationalities and regions, the potential variation in the pronunciation of individual words is incredibly wide. The reason that we humans are able to decipher - admittedly not perfectly - the speech of others is that we perform a semantic analysis of what we hear. In other words, we identify the words within a sentence by working out which words make sense within the context of what we think the person is trying to say. When we hear a sentence such as, "The rain in Spain falls mainly on the plain," we never believe that "plain" is "plane" because the latter would make no sense.
The inability of the computer to understand the meaning of sentences has led to some pretty disappointing speech recognition implementations. Most of us have experienced the frustration of dealing with a speech-based phone support system that misinterprets our commands, even though these systems have a very narrow set of expectations about what we are likely to say. The prospect of computers responding accurately to arbitrarily complex statements has seemed increasingly remote as we have better understood the magnitude of the challenge.
I had become particularly cynical about the practicality of speech recognition. Recently, however, a hand condition that made extensive typing uncomfortable gave me a strong motivation to try the built-in Windows 7 speech recognition system. To my surprise, I found that the quality of the speech recognition had improved dramatically. Indeed, I have been able to draft this column - as well as much longer documents - entirely through speech recognition without using the keyboard.
Like most speech recognition systems, Windows speech recognition uses a mix of heuristic techniques - essentially guessing what you might say next - together with training mechanisms to improve recognition quality. Corrections, which are presented when the system isn't sure what you are saying, are incorporated into the speech recognition profile, leading to increasingly accurate recognition of your unique voice.
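To make the idea of "guessing what you might say next" a little more concrete, here is a toy sketch in Python. It is emphatically not how Windows speech recognition is implemented, and every name in it (ToyRecognizer, train, guess, correct) is invented for illustration. It assumes that the acoustic side has already narrowed a sound down to a few sound-alike candidates, and it simply picks the candidate that has most often followed the previous word, with corrections feeding back into those counts.

```python
from collections import defaultdict


class ToyRecognizer:
    """Hypothetical illustration: picks among sound-alike words using
    counts of which word has followed which in the text seen so far."""

    def __init__(self):
        # pair_counts[previous_word][word] = times `word` followed `previous_word`
        self.pair_counts = defaultdict(lambda: defaultdict(int))

    def train(self, sentence):
        """Learn word pairs from a sentence (assumed to have no punctuation)."""
        words = sentence.lower().split()
        for previous_word, word in zip(words, words[1:]):
            self.pair_counts[previous_word][word] += 1

    def guess(self, previous_word, candidates):
        """Return the candidate seen most often after previous_word.
        On a tie, the first candidate listed wins."""
        counts = self.pair_counts[previous_word.lower()]
        return max(candidates, key=lambda word: counts[word.lower()])

    def correct(self, previous_word, right_word):
        """Record a user correction so the same guess improves next time."""
        self.pair_counts[previous_word.lower()][right_word.lower()] += 1


recognizer = ToyRecognizer()
recognizer.train("the rain in spain falls mainly on the plain")

# "plain" has been seen after "the"; "plane" has not.
after_the = recognizer.guess("the", ["plane", "plain"])

# Nothing has been seen after "a" yet, so the first candidate wins by default.
before_correction = recognizer.guess("a", ["plane", "plain"])

# The user corrects the system once, and the guess changes.
recognizer.correct("a", "plain")
after_correction = recognizer.guess("a", ["plane", "plain"])

print(after_the, before_correction, after_correction)
```

A real recognizer weighs far more context than a single preceding word and combines it with acoustic evidence, but the feedback loop has the same shape: each correction nudges future guesses toward what this particular speaker actually says.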
Humorous and often frustrating mistakes are still commonplace, and the inability of the system to make what would seem to be obvious corrections can be deeply annoying. For instance, when I try to dictate "IO," speech recognition often believes that I mean "Io." The chance that I am writing about the first moon of Jupiter would seem remote! Likewise, it was difficult to persuade the system to correctly transcribe, "The rain in Spain falls mainly on the plain."
Despite these occasional issues, I have to admit that I've become a convert to speech recognition, and I'm optimistic that we will eventually experience a Star Trek level of voice control over our computers. Now if they can just get those transporters working!