An application (50) utilizing a speech recognition interface (10) uses verbal proxies to accomplish dynamic vocabulary updates. One or more proxies used by the application (50) are defined (60) and added to the vocabulary (38) of the speech recognition interface (10). When the speech recognition interface (10) receives a speech signal (14) representative of a command containing a defined proxy, it converts the recognized speech to text (66, 68), generating a text version of the recognized speech (28). The application receives the text version of the recognized speech (28) and processes it in accordance with the received command (70, 72). A parsing function (52) of the application (50) receives the text version of the recognized speech and determines which proxy routine(s) of the application should receive the speech information (80, 82). The identified proxy function (54) of the application receives the text version of the recognized speech, together with externally provided target information, and links a textual target contained in the target information to the defined proxy (84, 86). The proxy function (54) further generates an update instruction (56) that causes the speech recognition interface to insert the selected text-based target (58), provided by the proxy function, into the grammar (40) of the speech recognition interface, thereby updating the vocabulary (38) and the grammar (40) of the speech recognition interface.
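The proxy flow described above can be sketched roughly as follows. This is only an illustrative model, not the claimed implementation: the class names, method names, and the choice of "this" as a proxy word are all hypothetical, and the speech recognition interface is reduced to two in-memory sets. It assumes single-word proxies and that recognition has already produced text.

```python
class SpeechInterface:
    """Hypothetical stand-in for the speech recognition interface (10)."""

    def __init__(self):
        self.vocabulary = set()  # individual words, cf. vocabulary (38)
        self.grammar = set()     # recognizable phrases, cf. grammar (40)

    def add_words(self, phrase):
        # Add every word of the phrase to the vocabulary.
        self.vocabulary.update(phrase.lower().split())

    def add_to_grammar(self, phrase):
        # Inserting a target into the grammar also updates the vocabulary.
        self.grammar.add(phrase.lower())
        self.add_words(phrase)


class Application:
    """Hypothetical stand-in for the application (50)."""

    def __init__(self, interface):
        self.interface = interface
        self.proxies = {}  # proxy word -> linked textual target (None until linked)

    def define_proxy(self, proxy):
        # Define a proxy and add it to the interface vocabulary.
        self.proxies[proxy.lower()] = None
        self.interface.add_words(proxy)

    def parse(self, recognized_text):
        # Parsing function: find the defined proxies in the recognized text.
        # Assumption: proxies are single words.
        words = recognized_text.lower().split()
        return [w for w in words if w in self.proxies]

    def proxy_function(self, proxy, target_text):
        # Link the externally provided target to the proxy, then issue the
        # update that inserts the target into the interface grammar.
        self.proxies[proxy] = target_text
        self.interface.add_to_grammar(target_text)

    def handle(self, recognized_text, target_text):
        # Route the recognized text to the proxy function for each proxy found.
        matched = self.parse(recognized_text)
        for proxy in matched:
            self.proxy_function(proxy, target_text)
        return matched


interface = SpeechInterface()
app = Application(interface)
app.define_proxy("this")
app.handle("call this", target_text="Jane Doe")
```

After the call to handle, the hypothetical target "Jane Doe" is linked to the proxy "this" and is present in both the grammar and the vocabulary of the sketch interface, so a later utterance could refer to it directly.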
Declarative markup languages such as VoiceXML are becoming an increasingly prevalent way to describe speech applications. Present declarative markup languages for speech applications model the running speech application as a state machine, with the program specifying the transitions among the states. These languages can be extended to support a mark semantic that simplifies several problems that are otherwise difficult to solve. In one embodiment, a partially overlapping target window is implemented using a mark semantic. Other uses include measurement of user listening time, detection and avoidance of errors, and better resumption of playback after a false barge-in.
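As a rough illustration of how marks could support listening-time measurement and resumption after a false barge-in, the sketch below models a prompt as a list of named marks at known playback offsets. It is not VoiceXML and not the described embodiment: the MarkedPrompt class, its methods, and the millisecond offsets are assumptions made for this example.

```python
from bisect import bisect_right


class MarkedPrompt:
    """Hypothetical prompt annotated with named marks at playback offsets (ms)."""

    def __init__(self, marks):
        # marks: iterable of (offset_ms, name) pairs
        self.marks = sorted(marks)
        self.offsets = [offset for offset, _ in self.marks]

    def last_mark(self, elapsed_ms):
        # Return (name, ms since that mark) for the last mark reached before
        # playback stopped, or None if no mark had been reached yet.
        i = bisect_right(self.offsets, elapsed_ms)
        if i == 0:
            return None
        offset, name = self.marks[i - 1]
        return name, elapsed_ms - offset

    def resume_offset(self, elapsed_ms):
        # After a false barge-in, restart playback at the last mark reached
        # rather than from the beginning of the prompt.
        i = bisect_right(self.offsets, elapsed_ms)
        return self.offsets[i - 1] if i else 0


prompt = MarkedPrompt([(0, "intro"), (3000, "option_1"), (6000, "option_2")])
print(prompt.last_mark(4500))      # last mark reached and time since it
print(prompt.resume_offset(4500))  # where playback would resume
```

The same two values, the last mark reached and the time elapsed since it, are enough to estimate how much of the prompt the user heard before interrupting.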
A method for speech recognition can include generating a context-enhanced database from a system input. A voice-generated output can be produced from a speech signal by performing a speech recognition task that converts the speech signal into computer-processable segments. During the speech recognition task, the context-enhanced database can be accessed to improve the speech recognition rate, so that the speech signal is interpreted with respect to words included within the context-enhanced database. Additionally, a user can edit or correct an output in order to generate the final voice-generated output, which can then be made available.
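One plausible way to use such a database, shown here only as a sketch, is to rescore competing recognition hypotheses in favor of words that appear in the context. The abstract does not specify this mechanism: the function names, the additive bonus, and the example scores are all assumptions, and the recognizer is assumed to supply a list of (text, score) hypotheses.

```python
import re
from collections import Counter


def build_context_database(system_input):
    # Hypothetical context-enhanced database: word counts from the system input.
    return Counter(re.findall(r"[a-z']+", system_input.lower()))


def pick_hypothesis(hypotheses, context_db, bonus=0.1):
    # hypotheses: list of (text, recognizer_score) pairs.
    # Each word found in the context database adds a fixed bonus (assumed
    # scheme), so hypotheses consistent with the context are preferred.
    def score(hypothesis):
        text, base = hypothesis
        hits = sum(1 for word in text.lower().split() if word in context_db)
        return base + bonus * hits

    return max(hypotheses, key=score)[0]


context_db = build_context_database("Visit notes: the patient reports chest pain.")
hypotheses = [
    ("the patience reports chess pain", 0.52),
    ("the patient reports chest pain", 0.50),
]
best = pick_hypothesis(hypotheses, context_db)
```

In this example the second hypothesis wins despite its lower recognizer score, because more of its words occur in the context. A user correction step, as the abstract describes, would still follow before the final output is produced.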