Before we get into the particulars of the design process, let's examine what a typical speech-recognition application looks like. Here's an example of a person calling the AirTran flight information toll-free number. The following exchange ensues.
Let's examine some elements of this conversation. When the system answers the telephone call, it plays an audio file: a recorded, spoken prompt. The caller then responds. At this and each succeeding turn (also called a state: a point in a speech-recognition application where the system asks the caller a question and then listens for an answer), the system analyzes the sound of what it has heard to determine whether or not it was something it expected to hear. So, for example, when the system asks the caller for "arrival or departure information," it is expecting to hear "Arrival" or "Departure" or even similar phrases, such as "Arrival information" or "Departure information." However, when it asks for the flight number, it is expecting to hear any one of a very large set of responses. For example, the caller might say, "Three twenty-one," "Three, two, one," "Flight three twenty-one," "Flight three, two, one," or any other number within a particular range of a few thousand possible flight numbers. Because of this, speech-recognition systems have to be designed and tested to ensure that they can "understand" virtually all possible responses that callers to the system are likely to make. That's why the effectiveness of a speech-recognition system can depend greatly on the capabilities of its core technology, the speech recognizer, and the way it is used.
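To make the idea of "expected responses" concrete, here is a minimal sketch in Python of how the two states above might map a caller's words to a result. It works on text that has already been transcribed, whereas a real speech recognizer matches audio against a grammar, so treat it as an illustration only. The function names `match_direction` and `parse_flight_number` and the word tables are assumptions made for this example; they are not taken from the AirTran system or from any recognizer's API.

```python
import re

# Hypothetical word tables for spoken numbers (illustration only).
UNITS = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def _words(utterance):
    """Lowercase the utterance and split it into alphabetic words."""
    return re.findall(r"[a-z]+", utterance.lower())


def match_direction(utterance):
    """Return 'arrival' or 'departure' if the utterance is expected, else None."""
    words = _words(utterance)
    # Accept an optional trailing "information", as in "Arrival information".
    if words and words[-1] == "information":
        words = words[:-1]
    if words in (["arrival"], ["departure"]):
        return words[0]
    return None


def parse_flight_number(utterance):
    """Return the flight number as a digit string, or None if not understood."""
    words = _words(utterance)
    # Accept an optional leading "flight", as in "Flight three twenty-one".
    if words and words[0] == "flight":
        words = words[1:]
    if not words:
        return None
    digits = ""
    i = 0
    while i < len(words):
        word = words[i]
        if word in UNITS:
            digits += str(UNITS[word])
        elif word in TEENS:
            digits += str(TEENS[word])
        elif word in TENS:
            value = TENS[word]
            # "twenty one" is 21: fold a following nonzero unit into the tens.
            nxt = words[i + 1] if i + 1 < len(words) else None
            if nxt in UNITS and UNITS[nxt] > 0:
                value += UNITS[nxt]
                i += 1
            digits += str(value)
        else:
            return None  # a word outside the expected vocabulary
        i += 1
    return digits
```

With this sketch, "Three twenty-one," "Three, two, one," "Flight three twenty-one," and "Flight three, two, one" all resolve to the same flight number, while an utterance containing words outside the vocabulary returns `None`, the point at which a real application would typically reprompt the caller.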