A method and system for generating and searching a full text index. The full text index uses word numbers and a minimum delta, which minimizes the need to access document level information during the application of search operators. Word registers having coordinated document level and word level information, as well as relevance information, are used in search operations. Word numbers are clustered together during sub-operations in preparation for the next operation in a search query. The full text index according to the present invention is extremely efficient and greatly reduces table accesses and/or disk I/Os.
A method, system and computer program product implementing the method are provided to process a text search query in a collection of documents. A full posting index is generated for the documents in the collection. The full posting index comprises one or more first index terms and a full posting list for each first index term, enumerating the occurrences of the first index term in the documents. In addition to the full posting index, at least one additional posting index is generated for the documents. The additional posting index is related to a defined document part and comprises one or more second index terms and a restricted posting list for each second index term, enumerating all occurrences of the second index term in the document part of the documents of the collection. The text search query is performed using the additional posting index.
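The relationship between the full posting index and an additional posting index restricted to one document part can be illustrated with a minimal Python sketch. The function name `build_posting_index`, the part names `title` and `body`, and the `(doc_id, position)` posting format are assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict

def build_posting_index(docs, part=None):
    """Map each term to a list of (doc_id, position) occurrences.

    docs maps doc_id -> {part_name: text}. With part=None every part is
    indexed (full posting index); otherwise only occurrences inside the
    named part are kept (restricted posting index). Positions are counted
    across the whole document in both cases.
    """
    index = defaultdict(list)
    for doc_id, parts in docs.items():
        position = 0
        for part_name, text in parts.items():
            for word in text.lower().split():
                if part is None or part_name == part:
                    index[word].append((doc_id, position))
                position += 1
    return dict(index)

# Hypothetical two-document collection.
docs = {
    1: {"title": "index search", "body": "a full text index"},
    2: {"title": "query plans", "body": "search the index quickly"},
}

full_index = build_posting_index(docs)                 # all occurrences
title_index = build_posting_index(docs, part="title")  # title occurrences only

def search_in_title(term):
    """Answer a title-restricted query from the additional index alone."""
    return sorted({doc_id for doc_id, _ in title_index.get(term.lower(), [])})
```

In this sketch a query limited to titles reads only the shorter restricted posting lists, so it never has to scan the body occurrences recorded in the full index.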
A method, system, and computer program product are provided for processing a text search query in a collection of documents. A full posting index is generated. The full posting index comprises one or more first index terms and a full posting list for each first index term, enumerating occurrences of the first index term in the documents of the collection. A text search query comprises search conditions on search terms. These search conditions are translated into conditions on the search terms to provide translated conditions. At least one short posting index is generated. The short posting index comprises one or more second index terms and a short posting list for each second index term, enumerating the documents in which the second index term occurs. Filter conditions and complementary conditions to represent the full content of the translated conditions are generated, wherein the filter conditions approximate the translated conditions. The filter conditions are processed using the short posting index and the complementary conditions are processed using the full posting index to provide a query result.
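One way to picture the split into filter and complementary conditions is a two-word phrase query. In the hedged Python sketch below, the filter condition "both terms occur in the document" is evaluated on a short (document-level) posting index, and the complementary condition "the terms are adjacent" is evaluated on the full (positional) posting index. The phrase query, the function names, and the index layouts are assumptions chosen for illustration, not details taken from the abstract.

```python
from collections import defaultdict

def build_indexes(docs):
    """Build a full (positional) and a short (document-level) posting index."""
    full = defaultdict(lambda: defaultdict(list))  # term -> doc_id -> positions
    short = defaultdict(set)                       # term -> set of doc_ids
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            full[word][doc_id].append(pos)
            short[word].add(doc_id)
    return full, short

def phrase_search(first, second, full, short):
    """Return (candidates, hits) for the phrase '<first> <second>'."""
    # Filter condition: both terms occur somewhere in the document.
    # It approximates the phrase condition and needs only the short index.
    candidates = short.get(first, set()) & short.get(second, set())
    hits = []
    for doc_id in sorted(candidates):
        # Complementary condition: `second` directly follows `first`.
        # This needs word positions, so it uses the full index.
        second_positions = set(full[second][doc_id])
        if any(p + 1 in second_positions for p in full[first][doc_id]):
            hits.append(doc_id)
    return sorted(candidates), hits

# Hypothetical collection.
docs = {1: "full text index", 2: "text in full", 3: "index only"}
full, short = build_indexes(docs)
candidates, hits = phrase_search("full", "text", full, short)
```

Here the filter narrows the collection to documents 1 and 2, and the positional check keeps only document 1, so the full posting lists are consulted only for documents that survive the filter.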
A system and method transform queries with subqueries, using window aggregation. An optimizer in a relational database management system transforms queries to optimize their efficiency and speed. The method transforms queries that have a subquery, replacing the subquery with a window aggregation function. In the case of a correlated subquery, the window aggregation function is partitioned by a correlated column of a correlated table. All data in the main select clause, or outer block, of the query that was obtained through references to the correlated table is instead obtained through the new window aggregation subquery. By using window aggregation, the aggregation is performed at the same time as the selection of relevant data from the correlated table, thereby compiling all needed data in a single pass through the table or view. Reducing the number of times that tables or views are accessed reduces the computational demands of a query.
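The effect of such a transformation can be illustrated with a generic example, run here through Python's standard `sqlite3` module. The `emp` table, its columns, and both queries are hypothetical, and the rewrite shown is a common textbook form rather than the optimizer's actual rules. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO emp VALUES
  ('ann', 'a', 100), ('bob', 'a', 200),
  ('cy',  'b',  50), ('di',  'b',  70), ('ed', 'b', 90);
""")

# Original form: the correlated subquery re-reads emp for each outer row.
correlated = """
SELECT e.name
FROM emp e
WHERE e.salary > (SELECT AVG(i.salary) FROM emp i WHERE i.dept = e.dept)
ORDER BY e.name
"""

# Transformed form: a window aggregate partitioned by the correlated
# column (dept) computes the average alongside the row data.
windowed = """
SELECT name
FROM (
  SELECT name, salary, AVG(salary) OVER (PARTITION BY dept) AS dept_avg
  FROM emp
)
WHERE salary > dept_avg
ORDER BY name
"""

correlated_rows = [row[0] for row in conn.execute(correlated)]
windowed_rows = [row[0] for row in conn.execute(windowed)]
```

Both queries return the employees who earn more than their department's average; the windowed form references `emp` only once.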
A full text search system using a character string collation method which searches a large quantity of data using a plurality of search processing apparatuses is disclosed. This system comprises a search integration unit which divides search-target character string data into a group of character string records, allocates the divided records to one or more search processing apparatuses, transmits given character string search conditions to each search processing apparatus, and receives and integrates search results. Furthermore, this system comprises an update temporary storage unit which temporarily stores new character string records to update the search-target character string data, and an update record search instruction unit which instructs one of the search processing apparatuses, determined in advance, to search the new character string records stored in the update temporary storage unit as a part of the search-target character string data.
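The division of work described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the class names, the round-robin allocation, the substring match standing in for character string collation, and the in-process "apparatuses" are all hypothetical and are not taken from the abstract.

```python
class SearchProcessor:
    """Stands in for one search processing apparatus."""

    def __init__(self):
        self.records = {}  # record_id -> text

    def search(self, pattern, extra_records=None):
        # Search the allocated records plus any extra records handed over.
        pool = dict(self.records)
        if extra_records:
            pool.update(extra_records)
        return [rid for rid, text in pool.items() if pattern in text]


class SearchIntegrationUnit:
    """Allocates records, fans out a search condition, merges the results."""

    def __init__(self, num_processors, update_processor=0):
        self.processors = [SearchProcessor() for _ in range(num_processors)]
        self.update_processor = update_processor  # chosen in advance
        self.update_buffer = {}                   # update temporary storage

    def load(self, records):
        # Assumed allocation policy: round-robin over sorted record ids.
        for i, (rid, text) in enumerate(sorted(records.items())):
            self.processors[i % len(self.processors)].records[rid] = text

    def add_record(self, rid, text):
        # New records wait in temporary storage instead of being reallocated.
        self.update_buffer[rid] = text

    def search(self, pattern):
        hits = []
        for idx, proc in enumerate(self.processors):
            # Only the designated processor also searches the buffered records.
            extra = self.update_buffer if idx == self.update_processor else None
            hits.extend(proc.search(pattern, extra))
        return sorted(hits)


unit = SearchIntegrationUnit(num_processors=2)
unit.load({1: "alpha beta", 2: "beta gamma", 3: "gamma delta"})
unit.add_record(4, "delta beta")
```

With this setup, `unit.search("beta")` covers both the allocated records and the newly added record 4 without redistributing the existing data.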
A computer-based method of routing a message to a system includes receiving a message, and processing the message using large-vocabulary continuous speech recognition to generate a string of text corresponding to the message. The method includes generating a confidence estimate of the string of text corresponding to the message and comparing the confidence estimate to a predetermined threshold. If the confidence estimate satisfies the predetermined threshold, the string of text is forwarded to the system. If the confidence estimate does not satisfy the predetermined threshold, the information relating to the message is forwarded to a transcriptionist. The message may include one or more utterances. Each utterance in the message may be separately or jointly processed. In this way, a confidence estimate may be generated and evaluated for each utterance or for the whole message. Information relating to each utterance may be separately or jointly forwarded based on the results of the generation and evaluation.
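The routing decision can be summarized in a short Python sketch. The speech recognizer itself is not modeled; each utterance is assumed to arrive already paired with a recognized string and a confidence estimate. The names `Recognition` and `route`, the 0.8 threshold, the reading of "satisfies" as greater than or equal, and the use of the minimum confidence for joint processing are all assumptions made for illustration. The recognized text also stands in for the "information relating to the message" that goes to the transcriptionist.

```python
from dataclasses import dataclass

@dataclass
class Recognition:
    text: str          # string of text produced by speech recognition
    confidence: float  # confidence estimate for that string

def route(utterances, threshold=0.8, joint=False):
    """Return a list of (destination, text) routing decisions."""
    if not utterances:
        return []
    if joint:
        # Joint processing: one decision for the whole message.
        # Assumption: the message is only as reliable as its weakest utterance.
        confidence = min(u.confidence for u in utterances)
        destination = "system" if confidence >= threshold else "transcriptionist"
        return [(destination, " ".join(u.text for u in utterances))]
    # Separate processing: one decision per utterance.
    return [
        ("system" if u.confidence >= threshold else "transcriptionist", u.text)
        for u in utterances
    ]

# Hypothetical message made of two utterances.
message = [Recognition("call me back", 0.93), Recognition("at noon", 0.41)]
separate = route(message)
combined = route(message, joint=True)
```

Processed separately, the first utterance goes straight to the system and only the second is sent for transcription; processed jointly, the low-confidence utterance sends the whole message to the transcriptionist.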