Method and system for buffering recognized words during speech recognition    
United States Patent 5,899,976
Inventor(s): Rozak; Michael J. (Issaquah, WA)
Abstract: A method and system for editing words that have been misrecognized. The system allows a speaker to specify a number of alternative words to be displayed in a correction window by resizing the correction window. The system also displays the words in the correction window in alphabetical order. A preferred system eliminates the possibility, when a misrecognized word is respoken, that the respoken utterance will be again recognized as the same misrecognized word. The system, when operating with a word processor, allows the speaker to specify the amount of speech that is buffered before transferring to the word processor.

Title Information
Owner/Assignee: Microsoft Corporation (Redmond, WA)
Publication Date: May 4, 1999
Application Number: 08/741,698
Filing Date: October 31, 1996
Examiner: Hudspeth; David R.
Assistant Examiner: Chawan; Vijay B.
Attorney/Law Firm: Seed & Berry LLP

Claims

I claim:

1. A method in a dictation editing system for buffering recognized words before sending to an application program, the method comprising:

receiving from a speaker an indication of an amount of speech;

receiving utterances from the speaker;

recognizing the received utterances as recognized words;

displaying the recognized words in a dictation window;

in response to a request from the speaker to correct a displayed word,

displaying a list of alternative words for the word to correct; and

replacing the word to correct with an alternative word from the list; and

when the indicated amount of speech has been recognized and displayed, transferring to the application program system words displayed in the dictation window.

2. The method of claim 1 wherein the amount of speech is indicated to be a sentence.

3. The method of claim 1 wherein the amount of speech is indicated to be a paragraph.

4. The method of claim 1 wherein the amount of speech is indicated by resizing the dictation window.

5. The method of claim 1 wherein the step of recognizing uses continuous speech recognition.

6. The method of claim 1 wherein the step of recognizing uses discrete speech recognition.

7. The method of claim 1 wherein the application program is a word processor.

8. A method in a computer system for delaying transmission of words from a dictation editing system to a processing system so that a user can correct any words misrecognized, the method comprising:

receiving from the user an indication of an amount of recognized words;

receiving representations of words;

recognizing the received representations as recognized words;

displaying the recognized words;

correcting the displayed words as directed by the user; and

when the indicated amount of recognized words have been recognized and displayed, transferring to the processing system some of the displayed words.

9. The method of claim 8 wherein the received representations are spoken utterances.

10. The method of claim 8 wherein the amount of recognized words is indicated to be a sentence.

11. The method of claim 8 wherein the amount of recognized words is indicated to be a paragraph.

12. The method of claim 8 wherein the amount of recognized words is indicated by resizing a window in which the words are displayed.

13. The method of claim 8 wherein the step of recognizing uses continuous speech recognition.

14. The method of claim 8 wherein the step of recognizing uses discrete speech recognition.

15. A computer system for delayed transmission of words from a dictation editing system to a processing system so that a user can correct any words misrecognized by the dictation editing system, comprising:

means for receiving from the user an indication of an amount of recognized words;

means for receiving representations of words;

means for recognizing the received representations as recognized words;

means for displaying the recognized words;

means for correcting the displayed words as directed by the user; and

means for transferring to the processing system some of the displayed words when the indicated amount of recognized words have been recognized and displayed.

16. The computer system of claim 15 wherein the received representations are spoken utterances.

17. The computer system of claim 15 wherein the amount of recognized words is indicated to be a sentence.

18. The computer system of claim 15 wherein the amount of recognized words is indicated to be a paragraph.

19. The computer system of claim 15 wherein the amount of recognized words is indicated by resizing a window in which the words are displayed.

20. A computer-readable medium containing instructions for causing a computer system to delay transmission of words from a dictation editing system to a processing system so that a user can correct any words misrecognized, by:

receiving from the user an indication of an amount of recognized words;

receiving spoken utterances from the user;

recognizing the received spoken utterances as recognized words;

displaying the recognized words;

correcting the displayed words as directed by the user; and

when the indicated amount of recognized words have been recognized and displayed, transferring to the processing system a portion of the displayed words as corrected.

21. The computer-readable medium of claim 20 wherein the amount of recognized words is indicated to be a sentence.

22. The computer-readable medium of claim 20 wherein the amount of recognized words is indicated to be a paragraph.

23. The computer-readable medium of claim 20 wherein the amount of recognized words is indicated by resizing a window in which the words are displayed.

24. The computer-readable medium of claim 20 wherein the recognizing uses continuous speech recognition.

25. The computer-readable medium of claim 20 wherein the recognizing uses discrete speech recognition.
Description

TECHNICAL FIELD

The present invention relates to computer speech recognition, and more particularly, to the editing of dictation produced by a speech recognition system.

BACKGROUND OF THE INVENTION

A computer speech dictation system that would allow a speaker to efficiently dictate and would allow the dictation to be automatically recognized has been a long-sought goal of developers of computer speech systems. The benefits that would result from such a computer speech recognition (CSR) system are substantial. For example, rather than typing a document into a computer system, a person could simply speak the words of the document, and the CSR system would recognize the words and store the letters of each word as if the words had been typed. Since people generally can speak faster than they can type, efficiency would be improved. Also, people would no longer need to learn how to type. Computers could also be used in many applications where their use is currently impracticable because a person's hands are occupied with tasks other than typing.

Typical CSR systems have a recognition component and a dictation editing component. The recognition component controls the receipt of a series of utterances from a speaker, recognizes each utterance, and sends a recognized word for each utterance to the dictation editing component. The dictation editing component displays the recognized words and allows a user to correct words that were misrecognized. For example, the dictation editing component would allow a user to replace a misrecognized word by either speaking the word again or typing the correct word.

The recognition component typically contains a model of an utterance for each word in its vocabulary. When the recognition component receives a spoken utterance, the recognition component compares that spoken utterance to the modeled utterance of each word in its vocabulary in an attempt to find the modeled utterance that most closely matches the spoken utterance. Typical recognition components calculate a probability that each modeled utterance matches the spoken utterance. Such recognition components send to the dictation editing component a list of the words with the highest probabilities of matching the spoken utterance, referred to as the recognized word list.

The dictation editing component generally selects the word from the recognized word list with the highest probability as the recognized word corresponding to the spoken utterance. The dictation editing component then displays that word. If, however, the displayed word is a misrecognition of the spoken utterance, then the dictation editing component allows the speaker to correct the misrecognized word. When the speaker indicates that the misrecognized word should be corrected, the dictation editing component displays a correction window that contains the words in the recognized word list. If one of the words in the list is the correct word, the speaker can just click on that word to effect the correction. If, however, the correct word is not in the list, the speaker would either speak or type the correct word.
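The recognized word list and the default selection described above can be illustrated with a minimal Python sketch. The patent does not define any data structures, so the probability dictionary, the list length `k`, and the function names `recognized_word_list` and `select_recognized_word` are hypothetical; the sketch only shows ranking vocabulary words by match probability and displaying the top entry.

```python
from typing import Dict, List, Tuple

# A recognized word list: (word, probability) pairs, most probable first.
WordList = List[Tuple[str, float]]


def recognized_word_list(match_probs: Dict[str, float], k: int = 4) -> WordList:
    """Keep the k vocabulary words most likely to match the utterance."""
    # sorted() is stable, so words with equal probability keep their input order.
    ranked = sorted(match_probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def select_recognized_word(word_list: WordList) -> str:
    """The dictation editing component displays the most probable word."""
    return max(word_list, key=lambda item: item[1])[0]


# Hypothetical match probabilities for an utterance of "make".
probs = {"fake": 0.4, "make": 0.3, "bake": 0.1, "mace": 0.1, "lake": 0.05}
word_list = recognized_word_list(probs)
print(word_list)                          # candidates for the correction window
print(select_recognized_word(word_list))  # -> fake
```

If the displayed word is wrong, the same `word_list` is what a correction window would draw its candidates from.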

Some CSR systems serve as a dictation facility for word processors. Such a CSR system controls the receiving and recognizing of a spoken utterance and then sends each character corresponding to the recognized word to the word processor. Such configurations have a disadvantage in that when a speaker attempts to correct a word that was previously spoken, the word processor does not have access to the recognized word list and thus cannot display those words to facilitate correction.

SUMMARY OF THE INVENTION

The present invention provides a new and improved computer speech recognition (CSR) system with a recognition component and a dictation editing component. The dictation editing component allows for rapid correction of misrecognized words. The dictation editing component allows a speaker to select the number of alternative words to be displayed in a correction window by resizing the correction window. The dictation editing component displays the words in the correction window in alphabetical order to facilitate locating the correct word. In another aspect of the present invention, the CSR system eliminates the possibility, when a misrecognized word or phrase is respoken, that the respoken utterance will be again recognized as the same misrecognized word or phrase, based on an analysis of both the previously spoken utterance and the newly spoken utterance. The dictation editing component also allows a speaker to specify the amount of speech that is buffered in a dictation editing component before transferring the recognized words to a word processor. The dictation editing component also uses a word correction metaphor or a phrase correction metaphor that changes editing actions that are normally character-based to be either word-based or phrase-based.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A illustrates a sample resizable correction window.

FIG. 1B illustrates the sample correction window after resizing.

FIG. 2A illustrates an adjustable dictation window.

FIG. 2B illustrates the use of a correction window to correct text in the dictation window.

FIGS. 3A-B illustrate the word/phrase correction metaphor for the dictation editing component.

FIGS. 4A-C are block diagrams of a computer system of a preferred embodiment.

FIG. 5A is a flow diagram of a dictation editing component with a resizable correction window.

FIG. 5B is a flow diagram of a window procedure for the resizable correction window.

FIG. 6 is a flow diagram of a dictation editing component with an adjustable dictation window.

FIG. 7 is a flow diagram of a window procedure for a word processor or dictation editing component that implements the word correction metaphor.

FIG. 8 is a flow diagram of a CSR system that eliminates misrecognized words from further recognition.

FIG. 9 is a flow diagram of automatic recognition training.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides for a dictation editing component that allows the editing of dictation produced by a computer speech recognition (CSR) system. In an exemplary embodiment, the dictation editing component allows a speaker to select the number of alternative words to be displayed in a correction window by resizing the correction window. The dictation editing component also displays the words in the correction window in alphabetical order. A preferred dictation editing component also eliminates the possibility, when a misrecognized word is respoken, that the respoken utterance will be again recognized as the same misrecognized word. The dictation editing component, when providing recognized words to an application program, such as a word processor, preferably allows the speaker to specify the amount of speech that is buffered by the dictation editing component before transferring recognized words to the application program. In the following, the various aspects of the present invention are described when used in conjunction with a discrete CSR system (i.e., the speaker pauses between each word). These aspects, however, can also be used in conjunction with a continuous CSR system. For example, the correction window can be resized to indicate the number of alternative phrases to be displayed. Also, when a speaker selects a phrase to be replaced, the user interface system can ensure that the same phrase is not recognized again.

FIG. 1A illustrates a sample resizable correction window. The dictation editing component window 101 contains the recognized words 102 and the correction window 103. In this example, the speaker spoke the words "I will make the cake." The recognition component misrecognized the word "make" as the word "fake." The speaker then indicated that the word "fake" should be corrected. Before displaying the correction window, the dictation editing component determines the current size of the resizable correction window and calculates the number of words that could be displayed in that correction window. The dictation editing component then selects that number of words with the highest probabilities (i.e., alternative words) from the recognized word list and displays those words in the correction window. If the speaker wishes to see more words from the list, the speaker simply resizes the correction window using standard window resizing techniques (e.g., pointing to a border of the window with a mouse pointer and dragging the mouse). When the correction window is resized, the dictation editing component again determines the number of words that can be displayed in the correction window and displays that number of words in the correction window. The next time that the speaker indicates that a word should be corrected, the dictation editing component displays the correction window with the number of words that will fit based on its last resizing. In this way, the speaker can effectively select the number of words to be displayed by simply resizing the correction window. FIG. 1B illustrates the sample correction window after resizing.
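A minimal sketch of how the number of displayed alternatives could follow the window size, assuming one word per line and a fixed line height in pixels (the text gives no geometry). The class `CorrectionWindow`, its fields, and the pixel values are hypothetical; what the sketch shows is that the count is recomputed from the current size and that the size persists between corrections.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CorrectionWindow:
    """Resizable correction window (hypothetical geometry, in pixels)."""
    height: int = 48       # current client-area height
    line_height: int = 16  # one alternative word per line

    def capacity(self) -> int:
        """Number of alternative words that fit at the current size."""
        return max(1, self.height // self.line_height)

    def resize(self, new_height: int) -> None:
        # The new size is kept, so the next correction reuses it.
        self.height = new_height

    def alternatives(self, word_list: List[Tuple[str, float]]) -> List[str]:
        """The capacity() most probable words from the recognized word list."""
        ranked = sorted(word_list, key=lambda item: item[1], reverse=True)
        return [word for word, _ in ranked[: self.capacity()]]


word_list = [("fake", 0.4), ("make", 0.3), ("bake", 0.1), ("mace", 0.1),
             ("lake", 0.05), ("rake", 0.05)]
window = CorrectionWindow()
print(window.alternatives(word_list))  # 3 words fit at 48 px
window.resize(96)                      # the speaker drags a border
print(window.alternatives(word_list))  # now 6 words fit
```

Because the height is stored on the window object rather than passed per call, the speaker's last resize implicitly becomes the preference for later corrections.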

Additionally, the dictation editing component preferably displays the words in the correction window in alphabetical order. The displaying of the words in alphabetical order allows the speaker to quickly locate the correct word if it is displayed. Prior dictation editing components would display words in correction windows in an order based on the probability as determined by the recognition component. However, when displayed in probability order, it may be difficult for a speaker to locate the correct word unless the correct word is displayed first or second.
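Alphabetical display order amounts to re-sorting the already selected alternatives by spelling instead of by probability. This sketch assumes the alternatives arrive as (word, probability) pairs and uses case-insensitive ordering, which is an assumption since the text only says "alphabetical order"; `display_order` is a hypothetical name.

```python
from typing import List, Tuple


def display_order(alternatives: List[Tuple[str, float]]) -> List[str]:
    """Alphabetical (case-insensitive) order instead of probability order."""
    return sorted((word for word, _ in alternatives), key=str.casefold)


alternatives = [("make", 0.3), ("bake", 0.1), ("mace", 0.1), ("Lake", 0.05)]
print(display_order(alternatives))  # ['bake', 'Lake', 'mace', 'make']
```

The probabilities still decide which words make it into the window; they just no longer decide where each word appears.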

FIG. 2A illustrates an adjustable dictation window for a CSR system that interfaces with a word processor. The CSR system inputs a series of utterances from the speaker, recognizes the utterances, and displays recognized words for the utterances in the dictation window 201. Since the dictation window is controlled by the dictation editing component, the speaker can correct the words in the dictation window. Thus, when a speaker selects a word to correct within the dictation window, the speaker can use any of the correction facilities supported by the dictation editing component. For example, the speaker can use the correction window to display the words in the recognized word list for any word currently displayed in the dictation window. FIG. 2B illustrates the use of a correction window to correct text in the dictation window.

In one embodiment, the dictation editing component allows a speaker to adjust the amount of speech that the dictation window can accommodate. Since the speaker can only use the correction facilities on words within the dictation window, but not on words within the word processor window, the speaker can adjust the size of the dictation window to accommodate the amount of speech based on the dictation habits of the speaker. For example, the speaker can specify that the dictation window should only accommodate one sentence, one paragraph, or a fixed number of words. Alternatively, the speaker can resize the dictation window using standard window resizing techniques to indicate that the dictation window should accommodate as many words as can fit into the window. When the dictation window becomes full, the CSR system transmits either all of the words or some of the words in the dictation window to the word processor. For example, if the speaker indicates that the dictation window should accommodate a sentence, then any time a new sentence is started, the CSR system would transmit all of the words (i.e., one sentence) to the word processor. Conversely, if the speaker resized the dictation window, then the CSR system may transmit only a line of words at a time to the word processor.
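A minimal sketch of the speaker-selected buffering, with the word processor replaced by a callback. The mode names, the sentence-end test (a recognized word ending in ".", "?" or "!"), and the window geometry are assumptions made for illustration; a paragraph mode would work the same way given some paragraph-break signal, which the text does not describe. Words are transferred only when a new word arrives and the buffer is already full, so the most recent words stay correctable.

```python
from dataclasses import dataclass, field
from typing import Callable, List

SENTENCE_END = (".", "?", "!")


@dataclass
class DictationBuffer:
    """Holds recognized words until the speaker-chosen amount is reached.

    mode is "sentence", "words" or "window" (hypothetical names).
    """
    mode: str
    send: Callable[[List[str]], None]  # stands in for the word processor
    max_words: int = 10                # "words" mode: fixed number of words
    words_per_line: int = 5            # "window" mode: words per displayed line
    lines_in_window: int = 3           # "window" mode: lines the window shows
    words: List[str] = field(default_factory=list)

    def add(self, word: str) -> None:
        """Display a newly recognized word, first making room if full."""
        if self._full():
            self._transfer()
        self.words.append(word)

    def _full(self) -> bool:
        if not self.words:
            return False
        if self.mode == "sentence":
            # A new word after a sentence end starts a new sentence.
            return self.words[-1].endswith(SENTENCE_END)
        if self.mode == "words":
            return len(self.words) >= self.max_words
        if self.mode == "window":
            return len(self.words) >= self.words_per_line * self.lines_in_window
        raise ValueError(f"unknown mode: {self.mode}")

    def _transfer(self) -> None:
        if self.mode == "window":
            # Resized window: transfer only the oldest line of words.
            line = self.words[: self.words_per_line]
            self.words = self.words[self.words_per_line:]
            self.send(line)
        else:
            # Sentence or fixed count: transfer everything buffered.
            self.send(self.words)
            self.words = []


sent: List[List[str]] = []
buffer = DictationBuffer(mode="sentence", send=sent.append)
for w in "I will make the cake. It is good.".split():
    buffer.add(w)
print(sent)          # the first sentence has been transferred
print(buffer.words)  # the second sentence is still correctable
```

Until a word leaves `buffer.words`, the dictation editing component still holds its recognized word list, which is what makes correction possible before the transfer.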

FIG. 3A illustrates the word correction metaphor for the dictation editing component. When a word processing system is in dictation mode, the dictation editing component automatically changes the definition of various editing events (e.g., keyboard events, mouse events, pen events, and speech events) to be word-based, rather than character-based. For example, when in dictation mode, the backspace key, which normally backspaces one character, is modified to backspace a word at a time. Thus, when the user presses the backspace key in dictation mode, the entire word to the left of the current insertion point is deleted. Similarly, when in dictation mode, the left and right arrow keys move the insertion point left or right one word, and the delete key deletes the entire word to the right of the insertion point. Also, when a user clicks a mouse button while the mouse pointer is over a word, the dictation editing component selects the word that the mouse pointer is over, rather than simply setting the insertion point within the word. However, if the mouse pointer is between words, then an insertion point is simply set between the words. Lines 301-304 illustrate sample effects of the word correction metaphor. Each line shows the before and after text when the indicated event occurs. For example, line 302 shows that if the insertion point is after the word "test," then the left arrow event will cause the insertion point to be moved before the word "test." The word correction metaphor facilitates the correction of words in dictation mode because speakers typically wish to re-speak the entire word when correcting. Thus, when a speaker clicks on a word, the entire word is selected and the speaker can simply speak to replace the selected word. When the speech recognition is continuous, a phrase correction metaphor may be preferable.

Because continuous speech recognition may not correctly identify word boundaries, the word correction metaphor may select a misrecognized word whose utterance represents only part of a word or represents multiple words. In such situations it may be preferable to simply re-speak the entire phrase. Consequently, the definition of various editing events would be changed to be phrase-based, rather than word-based. For example, the editing event of the user speaking the word "backspace," which would normally backspace over the previous character, would be changed to backspace a phrase at a time. FIG. 3B illustrates this phrase correction metaphor.
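A minimal sketch of the word correction metaphor, modelling the text as a list of words and the insertion point as an index between words. The class `WordEditor` and its method names are hypothetical, and real editors track characters; the sketch only shows each editing event acting on whole words. The same model would cover the phrase correction metaphor if each list element held a phrase instead of a word.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordEditor:
    """Word-based editing in dictation mode (hypothetical model).

    point is an index between words: 0 is before the first word,
    len(words) is after the last one.
    """
    words: List[str]
    point: int = 0
    selected: Optional[int] = None  # index of the selected word, if any

    def backspace(self) -> None:
        # Delete the entire word to the left of the insertion point.
        if self.point > 0:
            del self.words[self.point - 1]
            self.point -= 1

    def delete(self) -> None:
        # Delete the entire word to the right of the insertion point.
        if self.point < len(self.words):
            del self.words[self.point]

    def left(self) -> None:
        self.point = max(0, self.point - 1)

    def right(self) -> None:
        self.point = min(len(self.words), self.point + 1)

    def click_word(self, index: int) -> None:
        # A click over a word selects the whole word.
        self.selected = index

    def click_between(self, index: int) -> None:
        # A click between words just sets the insertion point.
        self.point, self.selected = index, None

    def replace_selection(self, new_word: str) -> None:
        # Respeaking (or typing) replaces the selected word.
        if self.selected is not None:
            self.words[self.selected] = new_word
            self.point, self.selected = self.selected + 1, None


editor = WordEditor("this is a test".split(), point=4)
editor.left()       # insertion point moves before "test"
editor.backspace()  # deletes "a" as a whole word
editor.click_word(1)
editor.replace_selection("was")
print(editor.words)  # ['this', 'was', 'test']
```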

In one embodiment, the CSR system provides misrecognized word elimination to prevent re-recognition of a respoken utterance as the same word that is being corrected. The dictation editing component determines when a speaker is correcting a misrecognized word. The speaker can correct a misrecognized word in different ways. For example, the speaker could delete the word and then speak with the insertion point at the location where the word was deleted. Alternatively, the speaker could highlight the misrecognized word and then speak to replace that highlighted word. When the recognition component receives a respoken utterance, it recognizes the utterance and sends a new recognized word list to the dictation editing component. The dictation editing component then selects and displays the word from the new recognized word list with the highest probability that is other than the word being corrected. In one embodiment, the dictation editing component uses the previous recognized word list for the misrecognized utterance and the new recognized word list to select a word (other than the word being corrected) that has the highest probability of matching both utterances. To calculate the highest probability, the dictation editing component identifies the words that are in both recognized word lists and multiplies their probabilities. For example, the following table illustrates sample recognized word lists and the corresponding probabilities.

______________________________________
Previous Recognized Word List    New Recognized Word List
______________________________________
Fake    .4                       Fake    .4
Make    .3                       Mace    .3
Bake    .1                       Make    .2
Mace    .1                       Bake    .1
______________________________________

If the speaker spoke the word "make," then without misrecognized word elimination the dictation editing component would select the word "fake" both times, since it has the highest probability in both lists. With misrecognized word elimination, the dictation editing component selects the word "mace" when the word "fake" is corrected, since the word "mace" has the highest probability other than the word "fake" in the current list. However, when the probabilities from both recognized word lists are combined, the dictation editing component selects the word "make" as the correct word, since it has the highest combined probability. The combined probability for the word "make" is 0.06 (0.3 × 0.2), for the word "mace" is 0.03 (0.1 × 0.3), and for the word "bake" is 0.01 (0.1 × 0.1).
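Both selection rules can be checked against the sample lists in the table with a short Python sketch. The function names `eliminate` and `combine` are hypothetical; `combine` implements the rule as stated in the text (keep words present in both lists, drop the word being corrected, take the largest product of probabilities), and both return `None` if no candidate remains, a case the text does not address.

```python
from typing import Dict, Optional


def eliminate(new: Dict[str, float], word_to_correct: str) -> Optional[str]:
    """Most probable word in the new list, excluding the word being corrected."""
    candidates = {w: p for w, p in new.items() if w != word_to_correct}
    return max(candidates, key=candidates.get, default=None)


def combine(previous: Dict[str, float], new: Dict[str, float],
            word_to_correct: str) -> Optional[str]:
    """Word on both lists (other than the one being corrected) with the
    highest product of its two probabilities."""
    common = (previous.keys() & new.keys()) - {word_to_correct}
    return max(common, key=lambda w: previous[w] * new[w], default=None)


# The two recognized word lists from the table.
previous = {"fake": 0.4, "make": 0.3, "bake": 0.1, "mace": 0.1}
new = {"fake": 0.4, "mace": 0.3, "make": 0.2, "bake": 0.1}
print(eliminate(new, "fake"))          # mace
print(combine(previous, new, "fake"))  # make (0.3 x 0.2 = 0.06)
```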

The CSR system also automatically adds words to its vocabulary and automatically trains. When a user corrects a misrecognized word by typing the correct word, the dictation editing component determines whether that typed word is in the vocabulary. If the typed word is not in the vocabulary, then the dictation editing component directs the recognition component to add it to the vocabulary using the spoken utterance that was misrecognized to train a model for that word. If, however, the typed word is in the vocabulary, the dictation editing component then automatically directs the recognition component to train the typed word with the spoken utterance that was misrecognized.
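The decision described above (add an unknown typed word to the vocabulary, otherwise train the existing word) can be sketched as follows. The `Recognizer` class is a toy stand-in for the recognition component in which a word's "model" is just the list of utterances used to train it, and utterances are opaque `bytes`; all of these names and representations are assumptions, not the patent's design.

```python
from typing import Dict, List


class Recognizer:
    """Toy stand-in for the recognition component (hypothetical).

    A word's model is represented only by the utterances it was trained on.
    """

    def __init__(self, vocabulary: List[str]) -> None:
        self.models: Dict[str, List[bytes]] = {w: [] for w in vocabulary}

    def add_word(self, word: str, utterance: bytes) -> None:
        # New vocabulary entry, trained from the misrecognized utterance.
        self.models[word] = [utterance]

    def train(self, word: str, utterance: bytes) -> None:
        self.models[word].append(utterance)


def on_typed_correction(recognizer: Recognizer, typed_word: str,
                        misrecognized_utterance: bytes) -> str:
    """Called when the user corrects a misrecognized word by typing."""
    if typed_word not in recognizer.models:
        recognizer.add_word(typed_word, misrecognized_utterance)
        return "added"
    recognizer.train(typed_word, misrecognized_utterance)
    return "trained"


recognizer = Recognizer(["make", "fake", "cake"])
print(on_typed_correction(recognizer, "make", b"utterance-1"))   # trained
print(on_typed_correction(recognizer, "Rozak", b"utterance-2"))  # added
```

In both branches the training data is the utterance that was misrecognized, so every typed correction improves the model for the word the speaker actually said.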

The dictation editing component allows for phrase correction, in addition to word correction, when used with a continuous dictation system. In a continuous dictation system, the recognition component may incorrectly identify a word boundary. For example, a speaker may say the phrase "I want to recognize speech." The recognition component may recognize the spoken phrase as "I want to wreck a nice beach." However, the use of single word c