A text comparator which includes a decoded data memory (13) which contains a plurality of shift registers (SR2d-SR7F), one shift register for each of the plurality of different symbols forming the data base stored within the mass storage device (11). The decoded signal is applied to the input lead of the shift register associated with that character, and a clock signal is applied to each shift register of the decoded data memory. The decoded data memory provides signals on the output leads of each shift register indicative of the most recently received character, as well as each of the preceding K characters received from the mass storage device and decoded. The output leads of the shift registers are connected to a variety of logic gates in order to provide an output signal indicating when the desired textual phrase has been located on the disk. In addition, word counters, paragraph counters, and other devices are employed as desired to provide special text comparison functions. A second embodiment of a text comparator constructed in accordance with this invention receives data stored in a mass storage device. This embodiment includes word logic, delimiter logic, set logic, set combination logic, proximity logic and programming logic.
RELATED APPLICATIONS
This application is a division of application Ser. No. 06/456,989 filed Jan. 12, 1983, now U.S. Pat. No. 4,625,295, which in turn is a continuation-in-part of U.S. Patent application Ser. No. 06/342,620 filed Jan. 25, 1982, now U.S. Pat. No. 4,531,201.
An improved method for text searching has application to application programs in computers, where the application program handles files written in a language. Advantageously, a character within the first string (the string being searched for) is selected. In the general case the character selected is a relatively rarely used character in the language, and in simplified cases it is selected as non-alphabetic or as found within a set of characters known to be relatively rarely used. The file is then scanned, starting not at the beginning but rather from a position offset from the beginning of the file according to the position of the selected character within the first string. In the event of a match for that character, the remainder of the character positions of the first string are compared for a string-length match. The scanning terminates at a position short of the end of the file according to the position of the selected character within the first string, or terminates on a string-length match. The speed of a text search within a file is improved by an order of magnitude or more, and sometimes by two orders of magnitude.
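The scan described above can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the names `RARE_LETTERS`, `pick_anchor` and `rare_char_search` are hypothetical, the rare-letter set is an assumed one for English, and skipping whitespace when choosing a non-alphabetic anchor is an added assumption (spaces are common in ordinary text).

```python
# Assumed set of relatively rare English letters (illustrative only).
RARE_LETTERS = set("jqxzJQXZ")


def pick_anchor(pattern: str) -> int:
    """Return the index of the character in `pattern` to scan for."""
    # Simplified case 1: a non-alphanumeric, non-whitespace character.
    for i, ch in enumerate(pattern):
        if not ch.isalnum() and not ch.isspace():
            return i
    # Simplified case 2: a character from the assumed rare-letter set.
    for i, ch in enumerate(pattern):
        if ch in RARE_LETTERS:
            return i
    # Fallback (assumption): use the last character of the pattern.
    return len(pattern) - 1


def rare_char_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of `pattern`, or -1."""
    if not pattern:
        return 0
    m = len(pattern)
    k = pick_anchor(pattern)
    anchor = pattern[k]
    # Start k characters into the text and stop (m - 1 - k) characters
    # short of the end, so a full-length comparison always fits.
    stop = len(text) - (m - 1 - k)
    i = text.find(anchor, k, stop)
    while i != -1:
        start = i - k
        # Anchor matched: compare the whole string at this alignment.
        if text[start:start + m] == pattern:
            return start
        i = text.find(anchor, i + 1, stop)
    return -1
```

For example, searching for `"o, w"` anchors on the comma at index 1 of the pattern, so only the commas in the file trigger a full-length comparison.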
A computerized method for retrieving documents from a text corpus in response to a user-supplied natural language input string, e.g., a question. An input string is accepted and analyzed to detect phrases therein. A series of queries based on the detected phrases is automatically constructed through a sequence of successive broadening and narrowing operations designed to generate an optimal query or queries. The queries of the series are executed to retrieve documents, which are then ranked and made available for output to the user, a storage device, or further processing. In another aspect the method is implemented in the context of a larger two-phase method, of which the first phase comprises the method of the invention and the second phase of the method comprises answer extraction.
In accordance with embodiments of the invention, local metadata is embedded into an embedded interactive code document by combining a first m-array and a second m-array to generate a combined m-array with encoded local metadata such that a start position of the second m-array in the combined m-array is shifted, by an amount that is based on the local metadata, relative to a start position of the first m-array in the combined m-array. The first m-array and the second m-array may contain the same repeating bit sequence. Local metadata may be decoded from the embedded interactive code document by decoding the local metadata from the combined m-array by determining the amount by which the second m-array is shifted, relative to the first m-array, in the combined m-array.
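The shift-based encoding above can be illustrated with a one-dimensional simplification. The sketch below is an assumption-laden toy, not the described embodiment: it uses a 15-bit m-sequence (from the recurrence s[n+4] = s[n] XOR s[n+1]) in place of a two-dimensional m-array, represents the combined array as pairs of bits, and the names `m_sequence`, `encode`, `locate` and `decode_shift` are hypothetical.

```python
ORDER = 4            # window length needed to locate a position
N = 2 ** ORDER - 1   # period of the m-sequence (15)


def m_sequence() -> list[int]:
    """Generate one period of an order-4 m-sequence with an LFSR."""
    state = [1, 0, 0, 0]
    seq = []
    for _ in range(N):
        seq.append(state[0])
        # Recurrence s[n+4] = s[n] XOR s[n+1] (polynomial x^4 + x + 1).
        state = state[1:] + [state[0] ^ state[1]]
    return seq


SEQ = m_sequence()


def encode(shift: int) -> list[tuple[int, int]]:
    """Combine two copies of SEQ; the second starts `shift` positions later."""
    return [(SEQ[i], SEQ[(i - shift) % N]) for i in range(N)]


def locate(bits: list[int]) -> int:
    """Find the cyclic position in SEQ where the ORDER-bit window occurs."""
    for p in range(N):
        if all(SEQ[(p + j) % N] == bits[j] for j in range(ORDER)):
            return p
    raise ValueError("window does not occur in the m-sequence")


def decode_shift(window: list[tuple[int, int]]) -> int:
    """Recover the shift (the local metadata) from ORDER consecutive pairs."""
    p1 = locate([a for a, _ in window])
    p2 = locate([b for _, b in window])
    return (p1 - p2) % N
```

In this toy version the local metadata is simply the integer `shift` in the range 0 to 14; any four consecutive pairs of the combined sequence are enough to recover it, because every 4-bit window occurs at exactly one cyclic position of the m-sequence.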
A method of searching a database having a plurality of objects is provided. Each object includes attributes and, for each attribute, a number of values. A query specifies two attributes and a maximum distance. A respective set of ranges is established for each object that has a value for the first attribute. Each set includes a range for each value of the first attribute. Each range is defined by minimum and maximum location values. A test range is established for one of the ranges. The test range has values equal to the minimum and maximum values of one of the ranges. The test range is adjusted, if necessary, so that it includes one of the values of the second attribute of the corresponding object. The test range is added to a group of ranges corresponding to the object if the minimum and maximum test values do not differ from one another by more than the maximum distance. The steps of (1) establishing a test range, (2) adjusting the test range and (3) adding the test range to the group are repeated for each range in the set of ranges corresponding to the one object. Steps (1) to (3) are repeated for each value of the second attribute of each respective object for which a set of ranges is established. Each object for which the group of ranges includes at least one range is identified as being found by the searching.
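One way to read the steps above is sketched below. It is a simplified interpretation under stated assumptions: each object is a dictionary mapping attribute names to lists of integer location values, each initial range is a single location of the first attribute (minimum equal to maximum), and the function name `proximity_search` and the sample data are hypothetical.

```python
def proximity_search(objects, first, second, max_distance):
    """Return {object name: group of ranges} for objects matching the query."""
    found = {}
    for name, attrs in objects.items():
        # A set of ranges exists only for objects with a first-attribute value.
        if first not in attrs:
            continue
        ranges = [(loc, loc) for loc in attrs[first]]
        group = []
        for b in attrs.get(second, []):
            for lo, hi in ranges:
                # (1) Establish a test range equal to the current range,
                # (2) widen it if necessary so that it includes b.
                t_lo, t_hi = min(lo, b), max(hi, b)
                # (3) Keep it only if it spans no more than max_distance.
                if t_hi - t_lo <= max_distance:
                    group.append((t_lo, t_hi))
        # An object is found when its group holds at least one range.
        if group:
            found[name] = group
    return found


# Hypothetical sample data: word positions within three documents.
docs = {
    "doc1": {"cat": [3, 40], "dog": [5]},
    "doc2": {"cat": [1], "dog": [30]},
    "doc3": {"dog": [2]},
}
result = proximity_search(docs, "cat", "dog", 5)
```

With this sample data only `doc1` is found, through the range (3, 5); `doc2` has both attributes but they lie too far apart, and `doc3` has no value for the first attribute.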
A method and apparatus for capturing information encoded within a surface, such as location information or document metadata, and associating the information with a document is described. The captured information may be obtained by a camera associated with a pointing or writing device, such as an image-capturing pen.