A distributed speech recognition system includes at least one client station and a server station connected via a network, such as the Internet. The client station includes means for receiving a speech input signal from a user. A speech controller directs at least part of the speech input signal to a local speech recognizer. The local speech recognizer, which is preferably limited, is capable of recognizing at least part of the speech input, for instance a spoken command for starting full recognition. Depending on the outcome of the recognition, the speech controller selectively directs a part of the speech input signal via the network to the server station. The server station includes means for receiving the speech-equivalent signal from the network and a large/huge vocabulary speech recognizer for recognizing the received speech-equivalent signal.
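The routing behavior described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: speech segments are modeled as plain strings rather than audio, the trigger phrases and the names `local_recognize`, `SpeechController`, and `send_to_server` are hypothetical, and the "stop dictation" command is not mentioned in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical command phrases; the abstract only mentions a spoken
# command for starting full recognition.
START_COMMAND = "start dictation"
STOP_COMMAND = "stop dictation"  # assumption, not in the abstract


def local_recognize(segment: str) -> Optional[str]:
    """Stand-in for the limited local recognizer: it knows only two commands."""
    text = segment.strip().lower()
    return text if text in {START_COMMAND, STOP_COMMAND} else None


@dataclass
class SpeechController:
    """Forwards segments to the server only after the start command is heard."""
    send_to_server: Callable[[str], None]
    forwarding: bool = False

    def handle(self, segment: str) -> None:
        command = local_recognize(segment)
        if command == START_COMMAND:
            self.forwarding = True
        elif command == STOP_COMMAND:
            self.forwarding = False
        elif self.forwarding:
            # Not a local command: hand it to the large-vocabulary server.
            self.send_to_server(segment)


sent: List[str] = []  # stands in for the network link to the server station
controller = SpeechController(send_to_server=sent.append)
for seg in ["hello", "start dictation", "dear john", "stop dictation", "bye"]:
    controller.handle(seg)
print(sent)
```

In this sketch only the segment spoken between the two commands reaches the server; input before the start command is neither recognized locally nor forwarded.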
A dynamically generated small grammar is used for text editing using voice commands in a hand-held electronic device where memory requirements do not allow a large grammar to be used. The text itself, however, is dictated using a large grammar, which can reside in the electronic device or at a remote site. When an editing session begins, words in the dictated text are added to the small grammar. This makes it possible, for example, to locate words that are to be deleted or replaced. When a voice command calls for a text-modifying word, the text-modifying word is obtained from the small grammar if possible. Otherwise it is obtained from the large grammar. During the editing session, as more text-modifying words are used to modify the text, these words are also added to the small grammar. At the end of the editing session, the dynamically generated part of the small grammar is removed.
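The lifecycle of the small grammar can be sketched roughly as follows, assuming grammars are modeled as plain sets of words. The class `EditingGrammar` and its method names are hypothetical and are not taken from the patent.

```python
class EditingGrammar:
    """Small grammar = fixed editing commands + a dynamically generated part."""

    def __init__(self, commands, large_grammar):
        self.fixed = set(commands)        # permanent part of the small grammar
        self.dynamic = set()              # dynamically generated part
        self.large = set(large_grammar)   # may reside locally or remotely

    def begin_session(self, dictated_text):
        # Words of the dictated text are added so they can be located by voice.
        self.dynamic.update(dictated_text.lower().split())

    def lookup(self, word):
        """Return which grammar supplied the word, or None if unknown."""
        word = word.lower()
        if word in self.fixed or word in self.dynamic:
            return "small"
        if word in self.large:
            # A text-modifying word taken from the large grammar is
            # remembered in the small grammar for the rest of the session.
            self.dynamic.add(word)
            return "large"
        return None

    def end_session(self):
        # Only the dynamically generated part is removed.
        self.dynamic.clear()


grammar = EditingGrammar(
    commands={"delete", "replace"},
    large_grammar={"meet", "me", "at", "noon", "midnight"},
)
grammar.begin_session("Meet me at noon")
located = grammar.lookup("noon")      # already in the dictated text
first = grammar.lookup("midnight")    # falls back to the large grammar
second = grammar.lookup("midnight")   # now served by the small grammar
grammar.end_session()
print(located, first, second, sorted(grammar.dynamic))
```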
A method and system for performing computer implemented recognition is disclosed. In one method embodiment, the present invention first accesses user input stored in a memory of a mobile device. On the mobile device, the present invention performs a coarse recognition process on the user input to generate a coarse result. The coarse process may operate in real-time. The embodiment then displays a portion of the coarse result on a display screen of the mobile device. The embodiment further performs a detailed recognition process on the user input to generate a detailed result. The detailed process has more recognition patterns and computing resources available to it. The present embodiment performs a comparison of the detailed result and the coarse result. The present embodiment displays a portion of the comparison on the display screen.
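The comparison step between the coarse and detailed results might look like the following sketch. It assumes both results are plain text and compares them word by word with Python's standard `difflib`. The function name `compare_results` and the sample strings are hypothetical, and the abstract does not specify how the comparison is actually computed.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def compare_results(coarse: str, detailed: str) -> List[Tuple[str, str]]:
    """Return (coarse_span, detailed_span) pairs where the two results differ."""
    a, b = coarse.split(), detailed.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


# Hypothetical outputs of the two recognition passes.
coarse_result = "recognize speech on a beach"
detailed_result = "recognize speech on the beach"
changes = compare_results(coarse_result, detailed_result)
print(changes)
```

A display layer could then show only the differing spans, so that the user reviews the places where the detailed pass disagreed with the real-time coarse pass.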
Generally, the present invention provides the ability to present a mixed display of a transcription to a user. The mixed display is preferably organized in a hierarchical fashion. Words, syllables and phones can be placed on the same display by the present invention, and the present invention can select the appropriate symbol transcription based on the parts of speech that meet minimum confidences. Words are displayed if they meet a minimum confidence or else syllables, which make up the word, are displayed. Additionally, if a syllable does not meet a predetermined confidence, then phones, which make up the syllable, may be displayed. A transcription, in one aspect of the present invention, may also be described as a hierarchical transcription, because a unique confidence is derived that accounts for mixed word/syllable/phone data.
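The word/syllable/phone fallback can be sketched as below. The thresholds, confidence values, phone symbols, and the names `Word`, `Syllable`, and `render` are all assumptions chosen for illustration. The sketch covers only the choice of display unit, not the derivation of a single confidence for mixed data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    text: str
    confidence: float
    phones: List[str]


@dataclass
class Word:
    text: str
    confidence: float
    syllables: List[Syllable]


def render(words: List[Word], word_min: float = 0.8,
           syllable_min: float = 0.6) -> str:
    """Show each word at the coarsest level that meets its minimum confidence."""
    out: List[str] = []
    for word in words:
        if word.confidence >= word_min:
            out.append(word.text)
            continue
        for syl in word.syllables:
            if syl.confidence >= syllable_min:
                out.append(syl.text)
            else:
                out.extend(syl.phones)  # lowest level: individual phones
    return " ".join(out)


# Hypothetical recognizer output with made-up confidences.
words = [
    Word("speech", 0.95, [Syllable("speech", 0.95, ["S", "P", "IY", "CH"])]),
    Word("recognition", 0.5, [
        Syllable("rec", 0.7, ["R", "EH", "K"]),
        Syllable("og", 0.4, ["AH", "G"]),
        Syllable("ni", 0.65, ["N", "IH"]),
        Syllable("tion", 0.9, ["SH", "AH", "N"]),
    ]),
]
display = render(words)
print(display)
```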
A method and system for performing computer implemented recognition is disclosed. In one method embodiment, the present invention first accesses user input stored in a memory of a mobile device. On the mobile device, the present invention performs a coarse recognition process on the user input to generate a coarse result. The coarse process may operate in real-time. The embodiment then displays a portion of the coarse result on a display screen of the mobile device. The embodiment further performs a detailed recognition process on the user input to generate a detailed result. The detailed process has more recognition patterns and computing resources available to it. The present embodiment performs a comparison of the detailed result and the coarse result. The present embodiment displays a portion of the comparison on the display screen.
An object of this invention is to provide a speech recognition system in which a client and a device that provides a speech recognition process are connected via a network, which makes a plurality of usable speech recognition means available to the client, and which allows the client to explicitly switch among the plurality of speech recognition means connected to the network. To achieve this object, the speech recognition system of this invention has speech input means for inputting speech at the client, designation means for designating one of the plurality of usable speech recognition means, and processing means for causing the speech recognition means designated by the designation means to recognize the speech input from the speech input means.
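The designation mechanism can be illustrated with a small sketch in which the usable recognizers are modeled as named callables. The class `RecognitionClient`, the recognizer names, and the stub recognizers are hypothetical. A real system would reach the recognizers over the network rather than call local functions.

```python
from typing import Callable, Dict, Optional

Recognizer = Callable[[bytes], str]


class RecognitionClient:
    """Client that explicitly designates which recognizer handles its speech."""

    def __init__(self, recognizers: Dict[str, Recognizer]):
        self.recognizers = dict(recognizers)
        self.designated: Optional[str] = None

    def designate(self, name: str) -> None:
        # Designation means: pick one of the usable recognizers by name.
        if name not in self.recognizers:
            raise KeyError(f"unknown recognizer: {name}")
        self.designated = name

    def recognize(self, speech: bytes) -> str:
        # Processing means: route the input to the designated recognizer.
        if self.designated is None:
            raise RuntimeError("no recognizer designated")
        return self.recognizers[self.designated](speech)


# Stub recognizers standing in for networked recognition services.
client = RecognitionClient({
    "digits": lambda speech: f"digits:{len(speech)}",
    "dictation": lambda speech: f"dictation:{len(speech)}",
})
client.designate("digits")
first = client.recognize(b"\x00\x01\x02")
client.designate("dictation")  # explicit switch by the client
second = client.recognize(b"\x00\x01\x02")
print(first, second)
```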