The subject of this disclosure is a Finite State Automaton (FSA) used as part of a term detector employed in a digital pattern search system (searcher). In particular the invention includes various advances in the art of FSA design which make the FSA practical for pattern recognition. Specifically, these advances minimize the amount of memory which is required in each FSA in performing pattern recognition, and allow a speed capability such that the searching can be performed at the rate at which a mass storage medium can supply data. The large amount of memory required and the low speed of processing in the prior state of the art made the use of an FSA impractical for most real applications. The new advances include the following: An adaptation of the indexing means described in reference 4, which allows simple selection of the correct success transition state from a number of possible success states; The partitioning of searchable digital patterns into parts (called nibbles) to reduce the amount of memory used within an FSA; The use of various types of states to allow the detection of specific input patterns in the presence of don't-care patterns; The use of multiple FSA's to reduce the amount of memory needed in these FSA's when handling don't-care patterns; The unique design of an FSA to search for multiple sequential input patterns; The unique features of the FSA design to allow recognition of numerical ranges of values from among numerical data.
A range-conditional character string retrieving method and system capable of performing retrieval of a numerical value from a character string at an increased speed by shortening the time taken for generation of finite automaton, range condition retrieval for a character string containing admixedly numeric characters and non-numeric characters such as alphabetic letters and highly intelligent retrieval of a numerical value with designation of preceding and succeeding characters. Given range condition is partitioned in accordance with difference in the digit number between upper and lower limit values, whereon retrieval is performed in each of partitioned ranges in parallel. When a finite automaton transits from a predetermined state to at least two state in dependence on the result of collation of a character string subjected to retrieval, conditions for the state transitions are designated in terms of corresponding codes. A numerical value detecting unit for detecting a numerical value of interest from the character string subjected to retrieval is provided in association with a range decision unit for deciding whether the numerical value detected by the numerical value detecting unit falls within a specified range. A character string collating unit for retrieving a specific character string from the string subjected to retrieval is provided in association with a range condition collating unit for detecting a numerical value falling within a specific range from the specific character string.
In a document information filing system of the present invention, the document picture information and the filing information are recorded in a longitudinal video recorder (LVR). When one of the codes forming a retrieval code of the filing information is called for by a keyboard input operation, a plurality of the retrieval codes containing the called for code are selected from the LVR and then sequentially stored in a title memory. Different sequence codes are added to each of the retrieval codes by means of a sequence code addition circuit. The sequence codes and the retrieval codes are displayed in a corresponding manner by a display device. When a particular sequence code is designated by an operator, the filing information containing the retrieval code with the particular sequence code are read out from the LVR and the picture information corresponding to the filing information are displayed by the display device and/or printed.
A microprocessor includes a programmable logic array (PLA) adapted to allow "subroutines" or sequences within the PLA. The subroutines can be used by more than one opcode with a return performed to the opcode after execution of the subroutine.
A means and method for comparing an incoming sequential string of digitally encoded characters from a database stored in a conventional memory against a pattern with an arbitrary number of elements, comprising specified characters or character types (alphabetic, numeric, delimiter, etc.) or tokens to indicate the matching of a specified or arbitrary number of input characters, is disclosed. The system comprises a number of digital machines, sequenced by control words fetched from their memories. The control words may indicate the current input character or character type of interest for each machine, the address of the potential next control word of the machine, a flag indicating the successful completion of a match, and other control fields. If the input character matches the character or type of interest, the machine's next control word will be that specified by the current control word, and optionally the next control word of one or more of the other machines will be forced to an address specified in the current control word. By properly specifying the control words in each machine the input character string can be compared against an arbitrary number of pattern elements, limited only by the ability to map the elements into the control word memories of the available machines.
Methods and apparatus are disclosed for regular expression matching, especially for, but not limited to high-speed applications such as in a packet switching system (e.g., a router). One implementation includes a matching mechanism for processing each character of a plurality of input characters to progressively generate keyword indications of matched keywords as matched keywords are identified, and for generating one or more matching indications of matched base expressions and non-keyword expressions. These indications are received by a matched regular expression detection mechanism which generates one or more matched regular expression indications based on said one or more keyword indications and said one or more matching indications. In one implementation, the matched regular expression detection mechanism maintains a keyword data structure, which is updated as matched keyword indications are received to ensure they are matched in a proper order. One implementation uses a bitmap to track the matched keywords and AND-SHIFT-OR operations to efficiently update the bitmap.