| United States Patent | 6173266 |
| Link to this page | http://www.wikipatents.com/6173266.html |
| Inventor(s) | Marx; Matthew T. (Everett, MA), Carter; Jerry K. (Somerville, MA), Phillips; Michael S. (Belmont, MA), Holthouse; Mark A. (Newton, MA), Seabury; Stephen D. (Boston, MA), Elizondo-Cecenas; Jose L. (Boston, MA), Phaneuf; Brett D. (Marshfield, MA) |
| Abstract | Dialogue modules are provided, with each dialogue module including computer
readable instructions for accomplishing a predefined interactive dialogue
task in an interactive speech application. In response to user input, a
subset of the plurality of dialogue modules are selected to accomplish
their respective interactive dialogue tasks in the interactive speech
application and are interconnected in an order defining the call flow of
the application, and the application is generated. A graphical user
interface represents the stored plurality of dialogue modules as icons in
a graphical display in which icons for the subset of dialogue modules are
selected in the graphical display in response to user input, the icons for
the subset of dialogue modules are graphically interconnected into a
graphical representation of the call flow of the interactive speech
application, and the interactive speech application is generated based
upon the graphical representation. Using the graphical display, the method
further includes associating configuration parameters with specific
dialogue modules. Each configuration parameter causes a change in
operation of the dialogue module when the interactive speech program
executes. A window is displayed for setting the value of the configuration
parameter in response to user input, when an icon for a dialogue module
having an associated configuration parameter is selected. |
System and method for developing interactive speech applications |
| Publication Date |
January 9, 2001 |
| Parent Case |
RELATED APPLICATIONS
This application claims priority from provisional application, U.S. Ser.
No. 60/045,741, filed on May 6, 1997, which is incorporated herein by
reference. |
References  |
U.S. References |
| Reference | Date |
| 6058166 Osder et al. | May, 2000 |
| 6035275 Brode et al. | Mar, 2000 |
| 5905774 Tatchell et al. | May, 1999 |
| 5842193 Reilly | Nov, 1998 |
| 5774860 Bayya et al. | Jun, 1998 |
| 5694558 Sparks et al. | Dec, 1997 |
| 5652789 Miner et al. | Jul, 1997 |
| 5638425 Meador, III et al. | Jun, 1997 |
| 5615296 Stanford et al. | Mar, 1997 |
| 5594638 Iliff | Jan, 1997 |
| 5566272 Brems et al. | Oct, 1996 |
| 5544305 Ohmaye et al. | Aug, 1996 |
| 5479488 Lennig et al. | Dec, 1995 |
| 5357596 Takebayashi et al. | Oct, 1994 |
| 4625081 Lotito et al. | Nov, 1986 |
Other References |
AVIOS 1996 Presentation, Applied Language Technologies, Sep., 1996, pp. 1-11.
Marx, Matthew, "Rapid Development of Robust Speech Applications with DialogModules", AVIOS Conference Proceedings, Sep., 1996.
Phillips, Mike, "Designing Successful Speech Recognition Applications", Computer Telephony Conference Exposition Seminar Manual, Mar. 12, 1996.
Davis, James Raymond, "Let Your Fingers Do the Spelling: Implicit Disambiguation of Words Spelled with the Telephone Keypad", Journal of the American Voice Input/Output Society, 9:57-66, Mar., 1991.
Kamm, Candace, "User Interfaces for Voice Applications", Voice Communication Between Humans and Machines, pp. 422-442, Washington, D.C., 1994.
Marx, Matthew Talin, "Toward Effective Conversational Messaging", MIT Master's Thesis, Program in Media Arts and Sciences, 1995.
Schmandt, Chris, "Phoneshell: The Telephone as Computer Terminal", Proceedings of ACM Multimedia Conference, Aug., 1993.
Pelton, Gordon E., "Development Software", Ch. 8 of Voice Processing, 1993, pp. 221-287.
Yankelovich, Nicole, et al., "SpeechActs: A Framework for Building Speech Application", 1994.
Yankelovich, Nicole, et al., "Designing SpeechActs" Issues in Speech User Interfaces, Proc. CHI, 1995.
Kamm, Candace, "User Interfaces for Voice Applications", from Voice Communication Between Humans and Machines, National Academy Press, 1994, pp. 422-442.
Ly, Eric, et al., "Chatter: A Conversational Learning Speech Interface", AAAI 1994 Spring Symposium on Multi-Media Multi-Modal Systems, Mar., 1994.
Marx, Matt, et al., "MailCall: Message Presentation and Navigation in a Nonvisual Environment".
Martin, Paul, et al., "SpeechActs: A Spoken-Language Framework", IEEE Computer, Jul., 1996, pp. 33-40.
Scharf, Ira, "A Dialog Specification Language for Developing Interactive Speech Recognition Applications", pp. 49-68.
Zue, Victor W., "Development of Spoken Language Systems", IEEE Expert, 1993.
Claims  |
What is claimed is:
1. A computer-implemented method of constructing, by a developer, an interactive speech application for use by an application user, the method comprising:
providing a plurality of dialogue modules, wherein each dialogue module includes computer readable instructions for accomplishing a predefined interactive dialogue task;
selecting, in response to developer input, at least one of the plurality of dialogue modules to accomplish at least one respective interactive dialogue task; and
establishing, in response to developer input, at least one relationship between the at least one dialogue module and a dialogue-processing unit other than the at least one dialogue module to define a flow of dialogue between the application user
and the interactive speech application.
2. The method of claim 1, further comprising:
setting, in response to developer input, at least one configuration parameter associated with at least one of the dialogue modules, wherein each configuration parameter affects how an associated dialogue module operates when the interactive
speech application executes.
3. The method of claim 2, wherein an interactive dialogue task associated with a dialogue module includes outputting a prompt to the application user and receiving a response from the application user, and at least one of the at least one
configuration parameters is a timeout parameter defining a period of time for the application user to respond after a prompt is output.
4. The method of claim 2, wherein an interactive dialogue task associated with a dialogue module includes outputting a prompt to the application user and receiving a response from the application user, and at least one of the at least one
configuration parameter is a prompt parameter defining a prompt to be output.
5. The method of claim 2, wherein an interactive dialogue task associated with a dialogue module includes outputting a prompt to the application user and receiving a response from the application user, and at least one of the at least one
configuration parameters is an apology prompt parameter for defining an apology prompt to be output if the application user's response is not recognized.
6. The method of claim 2, wherein an interactive dialogue task associated with a dialogue module includes outputting a prompt to the application user and receiving a response from the application user, and at least one of the at least one
configuration parameters is a parameter for designating recognizable responses from the application user.
7. The method of claim 1, further comprising storing the selected at least one dialogue module and an indication of the at least one relationship.
8. The method of claim 1, wherein an interactive dialogue task associated with a dialogue module includes:
instructions for outputting a prompt to the application user;
instructions for receiving a response from the application user; and
instructions for interacting with a speech recognition engine for recognizing the received response using recognition models.
9. The method of claim 8, wherein an interactive dialogue task associated with a dialogue module further includes instructions for updating the recognition models used by the speech recognition engine based on recognized responses during
execution of the interactive speech application.
10. The method of claim 1, further comprising:
graphically representing the plurality of dialogue modules as icons in a graphical display, wherein:
the selecting includes receiving an indication of the at least one dialogue module; and
the establishing includes graphically interconnecting the icon representing the at least one dialogue module with a graphical indication representing the other dialogue-processing unit according to the at least one relationship.
11. The method of claim 10, further comprising:
displaying a window in the graphical display for setting a value of a configuration parameter when an icon for a dialogue module having an associated configuration parameter is selected in response to developer input; and
setting the value of the configuration parameter in response to developer input;
wherein the configuration parameter affects how a dialogue module operates when the interactive speech application executes.
12. The method of claim 1 wherein the selecting includes selecting at least two dialogue modules.
13. The method of claim 12 wherein the another dialogue-processing unit is a selected dialogue module.
14. The method of claim 13 wherein the selecting includes selecting at least two different dialogue modules and the another dialogue-processing unit is different from the selected dialogue module with which the another dialogue-processing unit
has a relationship established.
15. A memory device storing computer-readable instructions for enabling a developer to construct an interactive speech application in a speech processing system, the instructions being for causing a computer to:
perform a plurality of predefined interactive dialogue tasks, at least the instructions associated with the tasks forming a respective plurality of dialogue module templates;
produce, in response to developer input, a plurality of dialogue module instances for use in an interactive speech application, wherein each dialogue module instance is based on, and is a customized version of, one of the dialogue module
templates, the dialogue module templates and the dialogue module instances being forms of dialogue modules; and
establish, in response to developer input, at least one relationship between at least two dialogue modules to define a dialogue flow.
16. The memory device of claim 15, further comprising instructions for causing a computer to:
set, in response to developer input, a value of at least one configuration parameter associated with at least one of the dialogue modules, wherein each configuration parameter affects how an associated dialogue module operates when the
interactive speech application executes.
17. The memory device of claim 16, wherein an interactive dialogue task associated with a dialogue module includes outputting a prompt to the application user and receiving a response from the application user, and at least one of the at least
one configuration parameter is a parameter for designating recognizable responses from the application user.
18. The memory device of claim 15, further comprising instructions for causing a computer to store the at least two dialogue modules and an indication of the relationship between the at least two dialogue modules.
19. The memory device of claim 15, wherein an interactive dialogue task associated with a dialogue module includes instructions for causing a computer to:
output a prompt to the application user;
receive a response from the application user; and
interact with a speech recognition engine for recognizing the received response using recognition modules.
20. The memory device of claim 19, wherein an interactive dialogue task associated with a dialogue module further includes instructions for causing a computer to update the recognition models used by the speech recognition engine based on
recognized responses during execution of the interactive speech application.
21. The memory device of claim 15, further comprising:
instructions for causing a computer to graphically represent the plurality of dialogue modules as icons in a graphical display,
wherein:
the instructions for causing a computer to produce the dialogue module instances include instructions for causing a computer to select the plurality of dialogue modules templates in response to developer input and instructions for causing a
computer to graphically represent the dialogue module instances as icons in the graphical display; and
the instructions for causing a computer to establish at least one relationship between at least two dialogue modules include instructions for causing a computer to graphically interconnect the icons representing the dialogue modules into a
graphical representation of the dialogue flow of the interactive speech application in accordance with the at least one relationship.
22. A computer program product, residing on a computer-readable medium, for enabling a speech-application developer to construct an interactive speech application for use by an application user, the computer program product comprising
instructions for causing a computer to:
provide a plurality of dialogue modules, each dialogue module including computer-readable instructions for causing a computer to accomplish a predefined interactive dialogue task;
select, in response to developer input, at least one of the plurality of dialogue modules to accomplish at least one respective interactive dialogue task; and
establish, in response to developer input, at least one relationship between the at least one dialogue module and a dialogue-processing unit other than the at least one dialogue module to define a flow of dialogue between the application user and
the interactive speech application.
23. The computer program product of claim 22 wherein the instructions for causing a computer to select include instructions for causing a computer to select, in response to developer input, at least two dialogue modules.
24. The computer program product of claim 23 wherein the another dialogue-processing unit is a selected dialogue module.
25. The computer program product of claim 22 further comprising instructions for causing a computer to store the selected at least one dialogue module and an indication of the at least one relationship. |
Description  |
FIELD OF THE INVENTION
The present invention relates generally to a system and method for developing computer-executed interactive speech applications.
BACKGROUND
Computer-based interactive speech applications are designed to provide automated interactive communication, typically for use in telephone systems to answer incoming calls. Such applications can be designed to perform various tasks of varying
complexity including, for example, gathering information from callers, providing information to callers, and connecting callers with appropriate people within the telephone system. However, using past approaches, developing these applications has been
difficult.
FIG. 1 shows a call flow of an illustrative interactive speech application 100 for use by a Company A to direct an incoming call. Application 100 is executed by a voice processing unit or PBX in a telephone system. The call flow is activated
when the system receives an incoming call, and begins by outputting a greeting, "Welcome to Company A" (110).
The application then lists available options to the caller (120). In this example, the application outputs an audible speech signal to the caller by, for example, playing a prerecorded prompt or using a speech generator such as a text-to-speech
converter: "If you know the name of the person you wish to speak to, please say the first name followed by the last name now. If you would like to speak to an operator, please say `Operator` now."
The application then waits for a response from the caller (130) and processes the response when received (140). If the caller says, for example, "Mike Smith," the application must be able to recognize what the caller said and determine whether
there is a Mike Smith to whom it can transfer the call. Robust systems should recognize common variations and permutations of names. For example, the application of FIG. 1 may identify members of a list of employees of Company A by their full
names--for example, "Michael Smith." However, the application should also recognize that a caller asking for "Mike Smith" (assuming there is only one employee listed that could match that name) should also be connected to the employee listed as "Michael
Smith."
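The name-variation requirement above can be illustrated with a minimal Python sketch. The nickname table, the directory contents, and the function names `canonical` and `find_employee` are all hypothetical and do not come from the patent; the sketch only shows one way a spoken "Mike Smith" might be resolved to a listed "Michael Smith" when exactly one employee matches.

```python
from typing import List, Optional

# Hypothetical nickname table (an assumption); a real system would need far more entries.
NICKNAMES = {
    "mike": "michael",
    "bill": "william",
    "liz": "elizabeth",
}

# Hypothetical employee directory for Company A.
DIRECTORY = ["Michael Smith", "William Jones", "Elizabeth Smith"]


def canonical(name: str) -> str:
    """Lower-case a full name and expand a known nickname in the first name."""
    parts = name.lower().split()
    if parts:
        parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)


def find_employee(spoken: str, directory: List[str]) -> Optional[str]:
    """Return the single directory entry matching the spoken name, else None."""
    target = canonical(spoken)
    matches = [entry for entry in directory if canonical(entry) == target]
    # Only accept the match when it is unambiguous, as the text assumes.
    return matches[0] if len(matches) == 1 else None
```

Returning `None` when more than one entry matches leaves room for the application to ask a follow-up question instead of guessing.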
Assuming the application finds such a person, the application outputs a confirming prompt: "Do you mean `Michael Smith`?" (150). The application once again waits to receive a response from the caller (160) and when received (170), takes
appropriate action (180). In this example, if the caller responded "Yes," the application might say "Thank you. Please hold while I transfer your call to Michael Smith," before taking the appropriate steps to transfer the call.
FIG. 2 shows some of the steps that are performed for each interactive step of the interactive application of FIG. 1. Specifically, applying the process of FIG. 2 to the first interaction of the application described in FIG. 1, the interactive
speech application outputs the prompt of step 120 of FIG. 1 (210). The application then waits for the caller's response (220, 130). This step should be implemented not only to process a received response, as shown in the example of FIG. 1 (140), but
also to handle a lack of response. For example, if no response is received within a predetermined time, the application can be implemented to "time out" (230) and reprompt the caller (step 215) with an appropriate prompt such as "I'm sorry, I didn't
hear your response. Please repeat your answer now," and return to waiting for the caller's response (220, 130).
When the application detects a response from the caller (240), step 140 of FIG. 1 attempts to recognize the caller's speech, which typically involves recording the waveform of the caller's speech, determining a phonetic representation for the speech
waveform, and matching the phonetic representation with an entry in a database of recognized vocabulary. If the application cannot determine any hypothesis for a possible match (250), it reprompts the caller (215) and returns to waiting for the caller's
response (220). Generally, the reprompt is varied at different points in the call flow of the application. For example, in contrast to the reprompt when no response is received during the time out interval, the reprompt when a caller's response is
received but not matched with a recognized response may be "I'm sorry, I didn't understand your response. Please repeat the name of the person to whom you wish to speak, or say `Operator.`"
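The prompt, wait, and reprompt cycle described above can be sketched as a small loop. This is an illustrative sketch only: the callables `say`, `listen`, and `recognize`, the 5-second timeout, and the limit of three attempts are assumptions introduced here, since the text specifies only "a predetermined time". The two reprompt strings are taken from the description.

```python
from typing import Callable, Optional

TIMEOUT_REPROMPT = (
    "I'm sorry, I didn't hear your response. Please repeat your answer now."
)
NO_MATCH_REPROMPT = (
    "I'm sorry, I didn't understand your response. Please repeat the name "
    "of the person to whom you wish to speak, or say 'Operator.'"
)


def collect_response(
    prompt: str,
    say: Callable[[str], None],
    listen: Callable[[float], Optional[str]],
    recognize: Callable[[str], Optional[str]],
    timeout_s: float = 5.0,   # assumed value
    max_attempts: int = 3,    # assumed value
) -> Optional[str]:
    """Play a prompt, then wait for a recognizable response.

    `listen` returns None when the caller stays silent for `timeout_s`;
    `recognize` returns None when no vocabulary entry matches.
    """
    say(prompt)
    for attempt in range(max_attempts):
        utterance = listen(timeout_s)
        if utterance is None:
            reprompt = TIMEOUT_REPROMPT      # time out (230)
        else:
            result = recognize(utterance)
            if result is not None:
                return result
            reprompt = NO_MATCH_REPROMPT     # no hypothesis (250)
        if attempt < max_attempts - 1:
            say(reprompt)                    # reprompt (215)
    return None
```

Passing the audio functions in as parameters keeps the loop independent of any particular telephony or recognition engine.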
If the application comes up with one or more hypotheses of what the caller said (260, 270), it determines a confidence parameter for each hypothesis, reflecting the likelihood that it is correct. FIG. 2 shows that the interpretation step (280)
may be applied for both low confidence and high confidence hypotheses. For example, if the confidence level falls within a range determined to be "high" (step 260), an application may be implemented to perform the appropriate action (290, 180) without
going through the confirmation process (150, 160, 170). Alternatively, an application can be implemented to use the confirmation process for both low and high confidence hypotheses. For example, the application of FIG. 1 identifies the best hypothesis
to the caller and asks whether it is correct.
If the application interprets the hypothesis to be incorrect (for example, if the caller responds "No" to the confirmation prompt of step 150), the application rejects the hypothesis and reprompts the caller to repeat his or her response (step
215). If the application interprets the hypothesis to be correct (for example, if the caller responds affirmatively to the verification prompt), the application accepts the hypothesis and takes appropriate action (290), which in the example of FIG. 1,
would be to output the prompt of 180 and transfer the caller to Michael Smith.
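The confidence-based handling above might look like the following sketch. The threshold value `HIGH_CONFIDENCE`, the `(text, confidence)` tuple layout, and the `confirm` callback are assumptions made for illustration; the description states only that a hypothesis falls into a "high" or "low" range and that confirmation may be skipped or always applied.

```python
from typing import Callable, List, Optional, Tuple

HIGH_CONFIDENCE = 0.85  # assumed threshold; the text gives no numeric value


def interpret(
    hypotheses: List[Tuple[str, float]],
    confirm: Callable[[str], bool],
    always_confirm: bool = False,
) -> Optional[str]:
    """Accept or reject the best hypothesis.

    Returns the accepted text, or None when there is no hypothesis or the
    caller rejects the confirmation prompt (the caller would then be reprompted).
    """
    if not hypotheses:
        return None
    best_text, best_conf = max(hypotheses, key=lambda h: h[1])
    if best_conf >= HIGH_CONFIDENCE and not always_confirm:
        # High confidence: act without the confirmation steps (150, 160, 170).
        return best_text
    # Otherwise ask the caller to verify the best hypothesis.
    return best_text if confirm(f"Do you mean '{best_text}'?") else None
```

Setting `always_confirm=True` corresponds to the alternative in which both low and high confidence hypotheses go through confirmation, as in the application of FIG. 1.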
As exemplified by application 100 of FIGS. 1 and 2, interactive speech applications are complex. Implementing an interactive speech application such as that described with reference to FIGS. 1 and 2 using past application development tools
requires a developer to design the entire call flow of the application, including defining vocabularies to be recognized by the application in response to each prompt of the application. In some cases, vocabulary implementation can require the use of an
additional application such as a database application. In past approaches, it has been time consuming and complicated for the developer to ensure compatibility between the interactive speech application and any external applications and data it
accesses.
Furthermore, the developer must design the call flow to account for different types of responses for the same prompt in an application. In general, past approaches require that the developer define a language model of the language to be
recognized, typically including grammar rules to generally define the language and to more specifically define the intended call flow of the interactive conversation to be carried on with callers. Such definition is tedious.
Because of the inevitable ambiguities and errors in understanding speech, an application developer also needs to provide error recovery capabilities, including error handling and error prevention, to gracefully handle speech ambiguities and
errors without frustrating callers. This requires the application developer not only to provide as reliable a speech recognition system as possible, but also to design alternative methods for successfully eliciting and processing the desired information
from callers. Such alternative methods may include designing helpful prompts to address specific situations and implementing different methods for a caller to respond, such as allowing callers to spell their responses or input their responses using the
keypad of a touch-tone phone. In past approaches, an application developer is required to manually prepare error handling, error prevention, and any alternative methods used in them. This is time consuming and may lead to omissions of functions or
critical steps.
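One of the alternative input methods mentioned above, entering a response on a touch-tone keypad, can be sketched as follows. The helper names `to_digits` and `match_keypad` and the sample names are hypothetical; the letter layout is the conventional telephone keypad arrangement, not something specified in the description.

```python
from typing import List

# Conventional letter layout on a touch-tone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def to_digits(name: str) -> str:
    """Convert a name to the key sequence a caller would press, ignoring non-letters."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in name.lower() if ch in LETTER_TO_DIGIT)


def match_keypad(digits: str, names: List[str]) -> List[str]:
    """Return every known name whose key sequence equals the digits entered."""
    return [name for name in names if to_digits(name) == digits]
```

Because several letters share each key, `match_keypad` can return more than one name, in which case the application would still need a confirmation step to settle on one.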
Based on the foregoing, there is a clear need in this field for an interactive speech development system and method that overcome these shortcomings.
SUMMARY
In general, in one aspect, the invention features a computer-implemented method of constructing an interactive speech