An electronic dictionary comprising an input device, a display, a dictionary memory, a searching device, and an indicator device. The memory consists of plural divisions of equal storage capacity and stores multiple sets of word data representative of different words in an alphabetical order. Each set of word data is stored in one of the plural divisions such that each of the divisions has at its end a waste portion storing no part of any of the multiple sets of word data, if the number of characters of the first set of word data stored in the following division is larger than the number of characters which may be stored in the waste portion. In response to input data entered through the input device, the display displays the input data, and the searching device finds, in a binary search process, one of the plural divisions of the memory which is assigned to store the set of word data representative of the input data, and scans the found division to search the set of word data representative of the input data. The indicator device indicates whether the set of word data representative of the input data has been found or not.
A memory device having an associative memory for the storage of data belonging to a plurality of classes. The associative memory has a plurality of memory locations aligned along rows and columns for the storage of data along the rows. Each memory row has a plurality of groups of memory locations, each storing a respective datum, wherein groups of memory locations adjacent along one and the same row store data belonging to different classes. Groups of memory locations adjacent in the direction of the columns and disposed on different rows store data belonging to one and the same class. Each class has data having a different maximum lengths. The device is particularly suitable for the storage of words belonging to a dictionary for automatic recognition of words in a written text.
In a method of, and apparatus for validating character sequences, for example strings of characters generated by keyboards or keypads in telephone systems, computers, and the like, increased speed and reduced memory requirements are achieved by comparing the characters with a database representing valid sequences of characters, the database comprising one or more segments each comprising valid character sequences. A potential range of characters is extrapolated from a character of the input character sequence taking into account possible values of possible succeeding characters. This potential range of sequences is than compared with a database segment. If there is intersection between the potential range and one of the valid character sequences, and there is no succeeding character in the input character sequence, the input character sequence is determined to be valid. If intersection exists and there is a succeeding character, a second potential range of sequences that is a subset of the first-mentioned potential range of sequences is extrapolated from the succeeding character. This second potential range of sequences is compared with the database to determine whether or not there is intersection between the second potential range of sequences and a the valid character sequence range. When the step of determining intersection has been performed for each character in the input sequence, the input character sequence is indicated to be complete.
A memory device having an associative memory for the storage of data belonging to a plurality of classes. The associative memory has a plurality of memory locations aligned along rows and columns for the storage of data along the rows. Each memory row has a plurality of groups of memory locations, each storing a respective datum, wherein groups of memory locations adjacent along one and the same row store data belonging to different classes. Groups of memory locations adjacent in the direction of the columns and disposed on different rows store data belonging to one and the same class. Each class has data having a different maximum lengths. The device is particularly suitable for the storage of words belonging to a dictionary for automatic recognition of words in a written text.
An FSM data structure is encoded by generating a transition unit of data corresponding to each transition which leads ultimately to a final state of the FSM. Information about the states is included in the transition units, so that the encoded data structure can be written without state units of data. The incoming transition units to a final state each contain an indication of finality. The incoming transition units to a state which has no outgoing transition units each contain a branch ending indication. The outgoing transition units of each state are ordered into a comparison sequence for comparison with a received element, and all but the last outgoing transition unit contain an alternative indication of a subsequent alternative outgoing transition. The indications are incorporated with the label of each transition unit into a single byte, and the remaining byte values are allocated among a number of pointer data units, some of which begin full length pointers and some of which begin pointer indexes to tables where pointers are entered. The pointers may be used where a state has a large number of incoming transitions or where the block of transition units depending from a state is broken down to speed access. The first outgoing transition unit of a state is positioned immediately after one of the incoming transitions so that it may be found without a pointer. Each alternative outgoing transition unit is stored immediately after the block beginning with the previous outgoing transition unit so that it may be found by proceeding through the transition units until the number of alternative bits and the number of branch ending bits balance.
A high-speed document retrieval system creates a regular expression dictionary and a word index on the basis of a retrieval document and a word dictionary to conduct retrieval to a document through the regular expression dictionary and the word index at a high speed. A regular expression dictionary expressing a set of character strings having the same length is created from a word dictionary. In terms of a character string included in a retrieval document and matching with a regular expression in the regular expression dictionary, an index element is recorded in a word index when there is no different index element which allows an observing index element to be deducible, which eventually produces a word index capable of achieving a high-speed full-text retrieval without the noticeable increase in the index capacity.