WikiPatents - Community Patent Review
Semantic co-occurrence filtering for speech recognition and signal transcription applications    
United States Patent 5500920
Link to this page: http://www.wikipatents.com/5500920.html
Inventor(s): Kupiec; Julian M. (Cupertino, CA)
Abstract: A system and method for automatically transcribing an input question from a form convenient for user input into a form suitable for use by a computer. The question is a sequence of words represented in a form convenient for the user, such as a spoken utterance or a handwritten phrase. The question is transduced into a signal that is converted into a sequence of symbols. A set of hypotheses is generated from the sequence of symbols. The hypotheses are sequences of words represented in a form suitable for use by the computer, such as text. One or more information retrieval queries are constructed and executed to retrieve documents from a corpus (database). Retrieved documents are analyzed to produce an evaluation of the hypotheses of the set and to select one or more preferred hypotheses from the set. The preferred hypotheses are output to a display, speech synthesizer, or applications program. Additionally, retrieved documents relevant to the preferred hypotheses can be selected and output.

Title Information
Semantic co-occurrence filtering for speech recognition and signal transcription applications
Inventor     Kupiec; Julian M. (Cupertino, CA)
Owner/Assignee     Xerox Corporation (Stamford, CT)
Publication Date     March 19, 1996
Application Number     08/316,619
Filing Date     September 30, 1994
US Classification     704/270.1 704/7 704/275 704/277
Int'l Classification     G10L 009/00
Examiner     Knepper; David D.
Assistant Examiner     Sartori; Michael A.
Attorney/Law Firm     Silverman; Alexander E.
Parent Case     This is a continuation of application Ser. No. 08/126,170, filed Sep. 23, 1993, now abandoned.
USPTO Field of Search     395/2.44 395/2.69 395/2.79 395/2.84 395/2.86 381/43 381/44 381/52 364/419.03 364/419.08 364/419.07 364/419.13
References

U.S. References
Patent No.   Inventor      U.S. Class    Date
2921133
3158685
5406480      Kanno         704/10        Apr. 1995
5390281      Luciw         706/11        Feb. 1995
5278980      Pedersen      707/4         Jan. 1994
5278918      Bernzott      382/176       Jan. 1994
5063508      Yamada                      Nov. 1991
5062074      Kleinberger                 Oct. 1991
4994967      Asakawa       704/9         Feb. 1991
4931935      Ohira         704/8         Jun. 1990
4823306      Barbic        707/5         Apr. 1989
4674066      Kucera        707/5         Jun. 1987
4270182      Asija         704/8         May 1981
3996569      Saunders      365/189.07    Dec. 1976
Claims
 


What is claimed is:

1. An automated transcription disambiguation method comprising the steps of:

providing an input question having first and second words to a processor in a form subject to misinterpretation by the processor;

generating a plurality of hypotheses with the processor, the hypotheses including alternative interpretations of at least one of the first and second words due to possible misinterpretations of the input question by the processor;

producing with the processor an initial evaluation of the hypotheses;

gathering confirming evidence for the hypotheses by searching with the processor in a text corpus for co-occurrences of hypothesized first and second words for the hypotheses;

automatically and explicitly selecting with the processor from among the plurality of hypotheses a preferred hypothesis as to both of the first and second words based at least in part on the initial evaluation and at least in part on the gathered confirming evidence; and

outputting a transcription result from the processor, the transcription result representing the selected preferred hypothesis.

2. In the operation of a system comprising a processor, an input transducer, an output facility, and a corpus comprising at least one document comprising words represented in a first form, a method for transcribing an input question by transforming the input question from a sequence of words represented in a second form, subject to misinterpretation by the processor, into a sequence of words represented in the first form, the method comprising the steps of:

accepting the input question into the system, the question comprising a sequence of words represented in the second form;

converting the input question into a signal with the input transducer;

converting the signal into a sequence of symbols with the processor;

generating a set of hypotheses from the sequence of symbols with the processor, the hypotheses of the set comprising sequences of words represented in the first form, the set of hypotheses including alternative interpretations of at least one of the words to account for possible misinterpretation of the input question;

producing with the processor an initial evaluation of the hypotheses;

automatically constructing a query from hypotheses of the set with the processor;

executing the constructed query by searching with the processor in the corpus for co-occurrences of hypothesized words for the hypotheses;

analyzing the co-occurrences and the initial evaluation with the processor to produce a revised evaluation of the hypotheses of the set;

automatically and explicitly selecting a preferred hypothesis from the set with the processor responsively to the revised evaluation, the preferred hypothesis comprising a preferred sequence of words in the first form and thus a preferred transcription of the sequence of words of the input question; and

outputting the preferred hypothesis with the output facility.

3. The method of claim 2 wherein:

the corpus includes a plurality of documents;

the step of executing the constructed query includes retrieving documents containing the co-occurrences;

the step of automatically and explicitly selecting the preferred hypothesis further comprises selecting with the processor a preferred set of documents, the preferred set of documents comprising a subset of the retrieved documents that are relevant to the preferred hypothesis, and

the step of outputting the preferred hypothesis further comprises outputting with the output facility at least a portion of a document belonging to the preferred set of documents.

4. The method of claim 3 further comprising the steps, performed after the step of outputting at least a portion of a document belonging to the preferred set of documents, of:

accepting a relevance feedback input into the system, the relevance feedback input comprising a sequence of words represented in the second form, the sequence of words including a relevance feedback keyword and a word that occurs in the outputted document;

converting the relevance feedback input into an additional query with the processor; and

executing the additional query with the processor to retrieve an additional document from the corpus.

5. The method of claim 2 wherein:

the step of automatically and explicitly selecting the preferred hypothesis further comprises selecting a plurality of preferred hypotheses with the processor; and

the step of outputting the preferred hypothesis further comprises outputting the selected plurality of preferred hypotheses with the output facility.

6. The method of claim 2 wherein:

the step of accepting an input question further comprises accepting information into the system, the information concerning the locations of word boundaries between words of the question; and

the step of converting the signal into a sequence of symbols further comprises specifying subsequences of the sequence of symbols with the processor according to the locations of word boundaries thus accepted.

7. The method of claim 2 wherein the step of generating a set of hypotheses from the sequence of symbols further comprises generating hypothesized locations of word boundaries with the processor.

8. The method of claim 2 wherein the step of converting the input question into a signal comprises converting spoken input into an audio signal with an audio transducer.

9. The method of claim 2 wherein the step of constructing a query from hypotheses of the set comprises constructing a Boolean query with a proximity constraint.

10. The method of claim 2 wherein the step of generating a set of hypotheses from the sequence of symbols comprises detecting a keyword with the processor to prevent inclusion of the keyword in hypotheses of the set.

11. The method of claim 10 wherein the step of constructing a query from hypotheses of the set comprises constructing a query from hypotheses of the set with the processor, the query being responsive to the detected keyword.

12. The method of claim 2 wherein the step of constructing a query from hypotheses of the set comprises constructing an initial query with the processor and prior to the outputting step automatically constructing a reformulated query with the processor, the reformulated query comprising a reformulation of the initial query.

13. The method of claim 2 wherein the step of outputting the preferred hypothesis comprises visually displaying the preferred hypothesis.

14. The method of claim 2 wherein the step of outputting the preferred hypothesis comprises synthesizing a spoken form of the preferred hypothesis.

15. The method of claim 2 wherein the step of outputting the preferred hypothesis comprises providing the preferred hypothesis to an applications program.

16. The method of claim 15 further comprising the step of accepting the preferred hypothesis into the applications program as textual input to the applications program.

17. The method of claim 2 wherein the step of producing an initial evaluation comprises determining an initial evaluation measurement for each hypothesis.

18. In a system comprising a processor, a method for processing an input utterance comprising speech, the method comprising the steps of:

accepting the input utterance into the system;

producing a phonetic transcription of the input utterance with the processor;

responsively to the phonetic transcription, generating with the processor a set of hypotheses, the hypotheses of the set being hypotheses as to a first word contained in the input utterance and further as to a second word contained in the input utterance, the set of hypotheses including alternative interpretations of at least one of the words to account for the error-prone nature of speech analysis;

determining with the processor an initial evaluation measurement for each hypothesis;

automatically constructing an information retrieval query with the processor, the query comprising the set of hypotheses and a proximity constraint;

executing the constructed query in conjunction with an information retrieval subsystem comprising a text corpus; and

responsively to the results of the executed query with respect to each hypothesis of the set of hypotheses, and taking into consideration the initial evaluation measurements of the hypotheses, automatically and explicitly selecting with the processor from among the hypotheses of the set a preferred hypothesis, the preferred hypothesis including the first and second words.

19. The method of claim 18 wherein the step of generating a set of hypotheses comprises matching portions of the phonetic transcription against a phonetic index with the processor.

20. In a system comprising a processor, an error-prone input facility, and an information retrieval subsystem, said information retrieval subsystem comprising a natural-language text corpus, a method for accessing documents of the corpus, the method comprising the steps of:

transcribing a question with the error-prone input facility and the processor, the question comprising a sequence of words;

selecting a subset of words of the sequence with the processor;

forming with the processor a plurality of hypotheses about the selected subset of words, the hypotheses of the plurality representing possible alternative transcriptions of the question to account for the error-prone nature of the input facility;

producing with the processor an initial evaluation of the hypotheses;

automatically constructing a co-occurrence query with the processor, the co-occurrence query being based on hypotheses of the plurality;

executing the co-occurrence query in conjunction with the information retrieval subsystem to retrieve a set of documents;

analyzing the initial evaluation and documents of the retrieved set with the processor to produce a revised evaluation of the hypotheses;

responsively to the revised evaluation, automatically and explicitly selecting with the processor a preferred hypothesis representing a preferred transcription of the sequence of words of the question;

evaluating documents of the retrieved set with the processor with respect to the selected hypothesis to determine a relevant document; and

outputting from the system the relevant document thus determined.

21. An automated system for producing a preferred transcription of a question presented in a form prone to erroneous transcription, comprising:

a processor;

an input transducer, coupled to the processor, for accepting an input question and producing a signal therefrom;

converter means, coupled to the input transducer, for converting the signal to a string comprising a sequence of symbols;

hypothesis generation means, coupled to the converter means, for developing a set of hypotheses from the string, each hypothesis of the set comprising a sequence of word representations, the set of hypotheses representing a set of possible alternative transcriptions of the input question to account for the likelihood of erroneous transcription;

initial scoring means, coupled to the hypothesis generation means, for determining an initial score for each hypothesis;

query construction means, coupled to the hypothesis generation means, for automatically constructing at least one information retrieval query using hypotheses of the set;

a corpus comprising documents, each document comprising word representations;

query execution means, coupled to the query construction means and to the corpus, for retrieving from the corpus documents responsive to said at least one query;

analysis means, coupled to the query execution means, for generating an analysis of the retrieved documents and evaluating the hypotheses of the set based on the initial scores and the analysis to determine a preferred hypothesis from among the hypotheses of the set, the preferred hypothesis representing a preferred transcription of the sequence of words of the input question; and

output means, coupled to the analysis means, for outputting the preferred hypothesis.

22. A speech processing apparatus comprising:

input means for transducing a spoken utterance into an audio signal;

means for converting the audio signal into a sequence of phones;

means for analyzing the sequence of phones to generate a plurality of hypotheses comprising sequences of words, the hypotheses representing possible alternative transcriptions of the spoken utterance to account for the error-prone nature of speech analysis;

means for determining an initial evaluation measurement for each hypothesis;

means for automatically constructing a query using the hypotheses of the plurality;

information retrieval means, coupled to a corpus of documents and to the constructing means, for retrieving documents of the corpus relevant to the constructed query;

means for automatically and explicitly ranking the hypotheses of the plurality according to confirming evidence found in the retrieved documents and further according to the initial evaluation measurements previously determined; and

means for outputting a subset of the hypotheses thus ranked, each hypothesis of the subset comprising a sequence of words representing a possible transcription of the spoken utterance.
Description
 


COPYRIGHT NOTIFICATION

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owners have no objection to the facsimile reproduction, by anyone, of the patent document or the patent disclosure, as it appears in the patent and trademark office patent file or records, but otherwise reserve all copyright rights whatsoever.

SOFTWARE APPENDIX

An appendix comprising 71 pages is included as part of this application. The appendix provides two (2) files of a source code software program for implementation of an embodiment of the method of the invention on a digital computer.

The files reproduced in the appendix represent unpublished work that is Copyright © 1993 Xerox Corporation. All rights reserved. Copyright protection claimed includes all forms and matters of copyrightable material and information now allowed by statutory or judicial law or hereafter granted, including without limitation, material generated from the software programs which are displayed on the screen such as icons, screen display looks, etc.

BACKGROUND OF THE INVENTION

The present invention relates to systems and methods for transcribing words from a form convenient for input by a human user, e.g., spoken or handwritten words, into a form easily understood by an applications program executed by a computer, e.g., text. In particular, it relates to transcription systems and methods appropriate for use in conjunction with computerized information-retrieval (IR) systems and methods, and more particularly to speech-recognition systems and methods appropriate for use in conjunction with computerized information-retrieval systems and methods used with textual databases.

In prior art IR systems, the user typically enters input--either natural-language questions, or search terms connected by specialized database commands--by typing at a keyboard. Few IR systems permit the user to use speech input, that is, to speak questions or search strings into a microphone or other audio transducer. Systems that do accept speech input do not directly use the information in a database of free-text natural-language documents to facilitate recognition of the user's input speech.

The general problem of disambiguating the words contained in an error-prone transcription of user input arises in a number of contexts beyond speech recognition, including but not limited to handwriting recognition in pen-based computers and personal digital assistants (e.g., the Apple Newton) and optical character recognition. Transcription of user input from a form convenient to the user into a form convenient for use by the computer has any number of applications, including but not limited to word processing programs, document analysis programs, and, as already stated, information retrieval programs. Unfortunately, computerized transcription tends to be error-prone.

SUMMARY OF THE INVENTION

The present invention provides a technique for using information retrieved from a text corpus to automatically disambiguate an error-prone transcription, and more particularly provides a technique for using co-occurrence information in the corpus to disambiguate such input. According to the invention, a processor accepts an input question. The processor is used to generate a hypothesis, typically as to a first word and a second word in the input question, and then is used to gather confirming evidence for the hypothesis by seeking a co-occurrence of the first word and the second word in a corpus.

In one aspect, the present invention provides a system and method for automatically transcribing an input question from a form convenient for user input into a form suitable for use by a computer. The question is a sequence of words represented in a form convenient for the user, such as a spoken utterance or a handwritten phrase. The question is transduced into a signal that is converted into a sequence of symbols. A set of hypotheses is generated from the sequence of symbols. The hypotheses are sequences of words represented in a form suitable for use by the computer, such as text. One or more information retrieval queries are constructed and executed to retrieve documents from a corpus (database). Retrieved documents are analyzed to produce an evaluation of the hypotheses of the set and to select one or more preferred hypotheses from the set. The preferred hypotheses are output to a display, speech synthesizer, or applications program. Additionally, retrieved documents relevant to the preferred hypotheses can be selected and output.

In another aspect, the invention provides a system and method for retrieving information from a corpus of natural-language text in response to a question or utterance spoken by a user. The invention uses information retrieved from the corpus to help it properly interpret the user's question, as well as to respond to the question.

The invention takes advantage of the observation that the intended words in a user's question usually are semantically related to each other and thus are likely to co-occur in a corpus within relatively close proximity of each other. By contrast, words in the corpus that spuriously match incorrect phonetic transcriptions are much less likely to be semantically related to each other and thus less likely to co-occur within close proximity of each other. The invention retrieves from the corpus those segments of text or documents that are most relevant to the user's question by hypothesizing what words the user has spoken based on a somewhat unreliable, error-prone phonetic transcription of the user's spoken utterance, and then searching for co-occurrences of these hypothesized words in documents of the corpus by executing Boolean queries with proximity and order constraints. Hypotheses that are confirmed by query matching are considered to be preferred interpretations of the words of the user's question, and the documents in which they are found are considered to be of probable relevance to the user's question.
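The proximity and order constraints mentioned above can be illustrated with a minimal Python sketch. It is not taken from the software appendix: the function names `tokenize` and `cooccurs`, the `max_gap` parameter, and the simple alphabetic tokenizer are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurs(tokens, first, second, max_gap=5, ordered=False):
    """Return True if `first` and `second` occur within `max_gap` tokens.

    With ordered=True, `first` must also appear before `second`.
    Both names and the gap measure are hypothetical, not the patent's.
    """
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    for i in first_positions:
        for j in second_positions:
            if i == j:
                continue  # same token cannot co-occur with itself
            distance = j - i
            if ordered and distance < 0:
                continue  # order constraint violated
            if abs(distance) <= max_gap:
                return True
    return False


if __name__ == "__main__":
    talk = tokenize("Kennedy was present at the talks")
    print(cooccurs(talk, "present", "kennedy"))                # proximity only
    print(cooccurs(talk, "present", "kennedy", ordered=True))  # proximity and order
```

In this sketch a proximity-only query matches the sentence, while adding the order constraint rejects it because "present" follows "Kennedy".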

A further understanding of the nature and advantages of the invention will become apparent by reference to the remaining portions of the specification and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a system that embodies the invention;

FIG. 2 schematically depicts information flow in a system according to a first specific embodiment of the invention;

FIG. 3 is a flowchart of method steps carried out according to a first specific embodiment of the invention;

FIG. 4 illustrates a conceptual model of a portion of a phonetic index;

FIG. 5 is a flowchart of steps for phonetic index matching;

FIG. 6 is a flowchart of steps for query reformulation;

FIG. 7 is a flowchart of steps for scoring;

FIG. 8 schematically depicts an example of information flow in a system according to a second specific embodiment of the invention;

FIG. 9 is a flowchart of method steps carried out according to a second specific embodiment of the invention;

FIG. 10 illustrates a system in which the invention is used as a "front end" speech-recognizer component module in the context of a non-information-retrieval application; and

FIG. 11 is a specific embodiment that is adaptable to a range of input sources, hypothesis generation mechanisms, query construction mechanisms, and analysis techniques.

DESCRIPTION OF SPECIFIC EMBODIMENTS

The disclosures in this application of all articles and references, including patent documents, are incorporated herein by reference.

1. Introduction

The invention will be described in sections 1 through 6 with respect to embodiments that accept user input in the form of spoken words and that are used in information retrieval (IR) contexts. In these embodiments, the invention enables a person to use spoken input to access information in a corpus of natural-language text, such as contained in a typical IR system. The user is presented with information (e.g., document titles, position in the corpus, words in documents) relevant to the input question. Some of these embodiments can incorporate relevance feedback.

The invention uses information, particularly co-occurrence information, present in the corpus to help it recognize what the user has said. The invention provides robust performance in that it can retrieve relevant information from the corpus even if it does not recognize every word of the user's utterance or is uncertain about some or all of the words.

A simple example illustrates these ideas. Suppose that the corpus comprises a database of general-knowledge articles, such as the articles of an encyclopedia, and that the user is interested in learning about President Kennedy. The user speaks the utterance, "President Kennedy," which is input into the invention. The invention needs to recognize what was said and to retrieve appropriate documents, that is, documents having to do with President Kennedy. Suppose further that it is unclear whether the user has said "president" or "present" and also whether the user has said "Kennedy" or "Canada." The invention performs one or more searches in the corpus to try to confirm each of the following hypotheses, and at the same time, to try to gather documents that are relevant to each hypothesis:

______________________________________
president kennedy
present kennedy
president canada
present canada
______________________________________

The corpus is likely to include numerous articles that contain phrases such as "President Kennedy," "President John F. Kennedy," and the like. Perhaps it also includes an article on "present-day Canada," and an article that contains the phrase "Kennedy was present at the talks . . . ." It does not include any article that contains the phrase "president of Canada" (because Canada has a prime minister, not a president).

The invention assumes that semantically related words in the speaker's utterance will tend to appear together (co-occur) more frequently in the corpus. Put another way, the invention assumes that the user has spoken sense rather than nonsense, and that the sense of the user's words is reflected in the words of articles of the corpus. Thus the fact that "President Kennedy" and related phrases appear in the corpus much more frequently than phrases based on any of the other three hypotheses suggests that "President Kennedy" is the best interpretation of the user's utterance and that the articles that will most interest the user are those that contain this phrase and related phrases. Accordingly, the invention assigns a high score to the articles about President Kennedy and assigns lower scores to the article about present-day Canada and the article about Kennedy's presence at the talks. The highest-scoring articles can be presented to the user as a visual display on a computer screen, as phrases spoken by a speech synthesizer, or both. Optionally, the user can make additional utterances directing the invention to retrieve additional documents, narrow the scope of the displayed documents, and so forth, for example, "Tell me more about President Kennedy and the Warren Commission."
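The "President Kennedy" example can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the appendix code: the four-sentence `CORPUS` is a toy stand-in built from the phrases mentioned above, the names `tokens` and `hits` are hypothetical, and ranking by a raw count of matching documents is a simplification of the scoring described later in the specification.

```python
from itertools import product
import re

# Toy stand-in corpus (an assumption for illustration only).
CORPUS = [
    "President Kennedy addressed the nation.",
    "President John F. Kennedy was elected in 1960.",
    "An overview of present-day Canada.",
    "Kennedy was present at the talks.",
]


def tokens(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def hits(first, second, max_gap=3):
    """Count documents in which both words occur within max_gap tokens,
    in either order. The gap of 3 is an arbitrary illustrative choice."""
    count = 0
    for doc in CORPUS:
        toks = tokens(doc)
        a = [i for i, t in enumerate(toks) if t == first]
        b = [i for i, t in enumerate(toks) if t == second]
        if any(abs(i - j) <= max_gap for i in a for j in b):
            count += 1
    return count


# Every combination of the candidate words gives one hypothesis.
first_candidates = ["president", "present"]
second_candidates = ["kennedy", "canada"]
hypotheses = list(product(first_candidates, second_candidates))

# Rank hypotheses by how much confirming evidence the corpus holds.
ranked = sorted(hypotheses, key=lambda h: hits(*h), reverse=True)

if __name__ == "__main__":
    for h in ranked:
        print(h, hits(*h))
```

On this toy corpus the hypothesis "president kennedy" collects the most matching documents and "president canada" collects none, which mirrors the reasoning in the example.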

The present invention finds application in information retrieval systems with databases comprising free (unpreprocessed) natural-language text. It can be used both in systems that recognize discrete spoken words and in systems that recognize continuous speech. It can be used in systems that accommodate natural-language utterances, Boolean/proximity queries, special commands, or any combination of these.

More generally, the invention finds application in speech-recognition systems regardless of what they are connected to. A speech recognizer that embodies or incorporates the method of the invention with an appropriate corpus or corpora can be used as a "front end" to any application program where speech recognition is desired, such as, for example, a word-processing program. In this context, the invention helps the application program "make more sense" of what the user is saying and therefore make fewer speech-recognition mistakes than it would otherwise. This is discussed further in section 7 below.

Still more generally, the invention finds application beyond speech-recognition in handwriting recognition, optical character recognition, and other systems in which a user wishes to input words into a computer program in a form that is convenient for the user but easily misinterpreted by the computer. This is discussed further in section 8 below. The technique of the present invention, in which a sequence of words supplied by a user and transcribed by machine in an error-prone fashion is disambiguated and/or verified by automatically formulating alternative hypotheses about the correct or best interpretation, gathering confirming evidence for these hypotheses by searching a text corpus for occurrences and co-occurrences of hypothesized words, and analyzing the search results to evaluate which hypothesis or hypotheses best represents the user's intended meaning, is referred to as semantic co-occurrence filtering.
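The final step of semantic co-occurrence filtering, in which search results are analyzed to pick the best hypothesis, can be sketched as follows. The text up to this point does not say how an initial evaluation and the corpus evidence are combined, so the additive rule below, the `evidence_weight` parameter, and the names `Hypothesis`, `revised_score`, and `select_preferred` are purely hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: tuple          # hypothesized word sequence
    initial_score: float  # initial evaluation (hypothetical 0..1 scale)
    evidence: int = 0     # number of confirming co-occurrences found


def revised_score(h, evidence_weight=1.0):
    """Hypothetical combination rule: initial score plus weighted evidence."""
    return h.initial_score + evidence_weight * h.evidence


def select_preferred(hypotheses, evidence_weight=1.0):
    """Return the hypothesis with the highest revised score."""
    return max(hypotheses, key=lambda h: revised_score(h, evidence_weight))


if __name__ == "__main__":
    candidates = [
        Hypothesis(("present", "canada"), initial_score=0.6, evidence=0),
        Hypothesis(("president", "kennedy"), initial_score=0.5, evidence=2),
    ]
    print(select_preferred(candidates).words)
```

With the evidence taken into account, the hypothesis that had the lower initial score can still be selected; setting `evidence_weight` to zero falls back to the initial evaluation alone.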

2. Glossary

The following terms are intended to have the following general meanings:

Corpus: A body of natural language text to be searched, used by the invention. Plural: corpora.

Document match: The situation where a document satisfies a query.

FSM, finite-state recognizer: A device that receives a string of symbols as input, computes for a finite number of steps, and halts in some configuration signifying that the input has been accepted or else that it has been rejected.

Hypothesis: A guess at the correct interpretation of the words of a user's question, produced by the invention.

Inflected form: A form of a word that has been changed from the root form to mark such distinctions as case, gender, number, tense, person, mood, or voice.

Information retrieval, IR: The accessing and retrieval of stored information, typically from a computer database.

Keyword: A word that receives special treatment when input to the invention; for example, a common function word or a command word.

Match sentences: Sentences in a document that cause or help cause the document to be retrieved in response to a query. Match sentences contain phrases that conform to the search terms and constraints specified in the query.

Orthographic: Pertaining to the letters in a word's spelling.

Phone: A member of a collection of symbols that are used to describe the sounds uttered when a person pronounces a word.

Phonetic transcription: The process of transcribing a spoken word or utterance into a sequence of constituent phones.

Query: An expression that is used by an information retrieval system to search a corpus and return text that matches the expression.

Question: A user's information need, presented to the invention as input.

Root form: The uninflected form of a word; typically, the form that appears in a dictionary citation.

Utterance: Synonym for question in embodiments of the invention that accept spoken input.

Word index: A data structure that associates each word found in a corpus with all the places where that word occurs in the corpus.

3. System Components

Certain system components that are common to the specific embodiments of the invention described in sections 4, 5, and 6 will now be described.

FIG. 1 illustrates a system 1 that embodies the present invention. System 1 comprises a processor 10 coupled to an input audio transducer 20, an output visual display 30, an optional output speech synthesizer 31, and an information retrieval (IR) subsystem 40 which accesses documents from corpus 41 using a word index 42. Also in system 1 are a phonetic transcriber 50, a hypothesis generator 60, a phonetic index 62, a query constructor 70, and a scoring mechanism 80. Certain elements of system 1 will now be described in more detail.

Processor 10 is a computer processing unit (CPU). Typically it is part of a mainframe, workstation, or personal computer. It can comprise multiple processing elements in some embodiments.

Transducer 20 converts a user's spoken utterance into a signal that can be processed by processor 10. Transducer 20 can comprise a microphone coupled to an analog-to-digital converter, so that the user's speech is converted by transducer 20 into a digital signal. Transducer 20 can further comprise signal-conditioning equipment including components such as a preamplifier, a pre-emphasis filter, a noise reduction unit, a device for analyzing speech spectra (e.g., by Fast Fourier Transform), or other audio signal processing devices in some embodiments. Such signal-conditioning equipment can help to eliminate or minimize spurious or unwanted components from the signal that is output by transducer 20, or provide another representation (e.g., spectral) of the signal.
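Two of the signal-conditioning steps named above, pre-emphasis and spectral analysis, can be sketched in software. The patent describes these as equipment within transducer 20 and does not specify their form, so the following is only an assumed illustration: the first-order difference filter, the coefficient 0.97, and the function names are conventional choices, not details from the source. The spectrum is computed with a naive discrete Fourier transform; a Fast Fourier Transform yields the same values more efficiently.

```python
import cmath
import math


def pre_emphasis(samples, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for frequency bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


# A constant (zero-frequency) signal is strongly attenuated by pre-emphasis.
flat = pre_emphasis([1.0] * 8)

# A sinusoid completing 2 cycles in 16 samples peaks at frequency bin 2.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = magnitude_spectrum(frame)
peak_bin = spectrum.index(max(spectrum))
print(flat[1], peak_bin)
```

The spectral representation is the kind of alternative representation of the signal that the signal-conditioning equipment can provide to later processing stages.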

Display 30 provides visual output to the user, for example, alphanumeric display of the texts or titles of documents retrieved from corpus 41. Typically, display 30 comprises a computer screen or monitor.

Speech synthesizer 31 optionally can be included in system 1 to provide audio output, for example, to read aloud portions of retrieved documents to the user. Speech synthesizer 31 can comprise speech synthesis hardware, support software executed by processor 10, an audio amplifier, and a speaker.

IR subsystem 40 incorporates a processor that can process queries to search for documents in corpus 41. It can use processor 10 or, as shown in FIG. 1, can have its own processor 43. IR subsystem 40 can be located at the same site as processor 10 or can be located at a remote site and connected to processor 10 via a suitable communication network.

Corpus 41 comprises a database of documents that can be searched by IR subsystem 40. The documents comprise natural-language texts, for example, books, articles from newspapers and periodicals, encyclopedia articles, abstracts, office documents, etc.

It is assumed that corpus 41 has been indexed to create word index 42, and that corpus 41 can be searched by IR subsystem 40 using queries that comprise words (search terms) of word index 42 with Boolean operators and supplemental proximity and order constraints expressible between the words. This functionality is provided by many known IR systems. Words in word index 42 can correspond directly to their spellings in corpus 41, or as is often the case in IR systems, can be represented by their root (uninflected) forms.
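The kind of functionality assumed here can be illustrated with a small Python sketch of a positional word index that supports a Boolean AND and a proximity constraint with optional ordering. This is an assumed toy design, not the index format of any particular IR system: the function names (build_word_index, search_and, search_near) and the sample documents are hypothetical, and reduction of words to root forms is omitted, with lowercasing as the only normalization.

```python
from collections import defaultdict


def build_word_index(corpus):
    """Map each word to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(corpus):
        for pos, word in enumerate(text.lower().split()):
            index[word.strip('.,;:!?')][doc_id].append(pos)
    return index


def search_and(index, terms):
    """Return ids of documents that contain every term (Boolean AND)."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_near(index, a, b, max_dist, ordered=False):
    """Return ids of documents where a and b occur within max_dist words.

    If ordered is True, a must appear before b.
    """
    hits = set()
    for doc_id in search_and(index, [a, b]):
        for pa in index[a][doc_id]:
            for pb in index[b][doc_id]:
                gap = pb - pa
                ok = (0 < gap <= max_dist) if ordered else (0 < abs(gap) <= max_dist)
                if ok:
                    hits.add(doc_id)
    return hits


# Hypothetical three-document corpus.
docs = [
    "President Kennedy met the press.",
    "Kennedy spoke before the president arrived.",
    "Canada at present has ten provinces.",
]
index = build_word_index(docs)

both = search_and(index, ["president", "kennedy"])
adjacent_in_order = search_near(index, "president", "kennedy", 1, ordered=True)
near_any_order = search_near(index, "president", "kennedy", 5)
print(both, adjacent_in_order, near_any_order)
```

Storing word positions, not just document ids, is what makes proximity and order constraints answerable from the index alone, without rescanning the document texts.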

Transcriber 50, hypothesis generator 60, phonetic index 62, query constructor 70, and scoring mechanism 80 are typically implemented as software modules executed by processor 10. The operation and function of these modules are described more fully below for specific embodiments, in particular with reference to the embodiments of FIGS. 2 and 8. It will be observed that corresponding elements in FIGS. 1, 2, 8, and 10 are similarly numbered.

3.1 Query Syntax

It is assumed that IR subsystem 40 can perform certain IR query operations. IR queries are formulated in a query language that expresses Boolean, proximity, and ordering or sequence relationships between search terms in a form understandable by IR subsystem 40. For purposes of discussion the query language is represented as follows:

__________________________________________________________________________
term                        represents the single search term term. A term can be an individual word or in some cases another query.
<p term1 term2 . . . >      represents strict ordering of terms. The IR subsystem determines that a document matches this query if and only if all the terms enclosed in the angle brackets appear in the