The subject invention is directed to a database system for organizing large amounts of data to be accessed by a digital computer. More particularly, a free form type database, in the form of a summarized, multikey tree, is built from files stored on the computer. After a building operation, the user obtains specified information by using the summarized database. Information in the files is divided into three categories; that is, a dimension field which comprises data to be organized, a summary field which comprises a numeric quantity on which calculations can be performed, and a non-summary field which comprises other information associated with an input record. The internal nodes of the tree summarize and organize sets of input records. Methods are provided for reducing the amount of storage space used by cutting off the tree when the size of sets go below a given threshold, and sharing parts of the tree so that each record does not appear n! times in the database.
CROSS REFERENCES TO RELATED APPLICATIONS
The present Patent Application is a divisional patent application from U.S. patent application Ser. No. 07/495,360, for MULTI-DIMENSIONAL SUMMARY DATABASE SYSTEM AND METHOD by Frederick A Powers and Stanley R. Zanarotti, filed Mar. 16, 1990, now U.S. Pat. No. 5,257,365.
The present Patent Application is related to U.S. patent application Ser. No. 07/495,360 by Frederick A. Powers and Stanley R. Zanarotti for MULTI-DIMENSIONAL SUMMARY DATABASE SYSTEM AND METHOD by Frederick A Powers and Stanley R. Zanarotti, filed on the same date as the present Patent Application;
U.S. patent application Ser. No. 08/079,248 by Frederick A. Powers and Stanley R. Zanarotti for MULTI-DIMENSIONAL SUMMARY DATABASE SYSTEM AND METHOD by Frederick A Powers and Stanley R. Zanarotti, filed on the same date as the present Patent Application; and,
U.S. patent application Ser. No. 08/078,396 by Frederick A. Powers and Stanley R. Zanarotti for MULTI-DIMENSIONAL SUMMARY DATABASE SYSTEM AND METHOD by Frederick A Powers and Stanley R. Zanarotti, filed on the same date as the present Patent Application.
The above referenced U.S. Patent Applications are assigned to the assignee of the present U.S. Patent Application.
A data structure allowing the necessary amount of memory to be reduced while maintaining or improving search performance is disclosed. The data structure in which items of data are stored for search includes: a tree structure in which the items or data are stored except for a portion of the items of data corresponding to a sub-tree structure, which is a selected portion of an assumed tree structure formed by all the items of data; and an equivalent table storing the portion of the items of data in table form.
A method of optimizing ranking queries in analytic applications. The method involves dividing a ranking query constructed by a user into multiple sub-commands. These sub-commands are based on the relationship between the foreign key and ranking attributes of a table of data and are run as a batch of commands. As usable data is extracted from each sub-command, it is stored and the foreign key is used as a filter for the next query in the batch. This eliminates accessing irrelevant data and facilitates the query operation. The extracted data is joined as the last step of the process in order to minimize the number of join operations performed by the query while the batch is being executed, thereby facilitating the query process.
The invention relates to a database system comprising a computing unit, a main memory and a, in particular a peripheral memory storing at least one multi-dimensional stock of data in the form of a UB tree, further the invention relates to methods to run a database of this kind to read data and to implement join operations and further relational algebra operations, and sub-dividing the UB tree into a predetermined number of sub-spaces and consecutively processing the sub-spaces to read and ready the stock of data. Advantageously a cache storage buffers the regions (jump regions) of the UB tree cutting the sub-space being processed until the jump region(s) has (have) been completely processed in the subsequent sub-spaces.
A method for performing a range-sum query in a database, in which the data is represented as a multi-dimensional data cube, is disclosed. The method comprises the steps of: selecting a subset of the dimensions of the data cube; computing a set of prefix-sums along the selected dimensions, based on the aggregate values in the cube corresponding the queried ranges; and generating a range-sum result based on the computed prefix-sums. Two d-dimensional arrays A and P are used for representing the data cube and the prefix-sums of the data cube, respectively. By maintaining the prefix-sum array P of the same size as the data cube, all range queries for a given cube can be answered in constant time, irrespective of the size of the sub-cube circumscribed by a query, using the inverse binary operator of the SUM operator. Alternatively, only auxiliary information for any user-specified fraction of the size of the d-dimensional data cube is maintained, to minimize the required system storage. The answer to a range query may now require access to some cells of the data cube in addition to the auxiliary information, but the average time complexity is still reduced significantly.
Disclosed is a method and system for performing a partial-sum query in a database in which the data is represented as a multi-dimensional data cube. The data cube is partitioned into multi-dimensional blocks. One or more covering codes are then selected for each block, and a group of partial-sums is computed for each block based on its covering codes. At query time, the query result is generated by combining the partial-sums for those blocks that intersect with the query subset. To improve the query response time and reduce system storage requirements, the covering codes are preferably augmented as single-weight extended covering codes or composition-extended covering codes. Also, a second partial-sum may also be computed for each block to efficiently find its partial sum, based on the block's first partial-sums and the bit-position differences between selected codewords for the block and bit strings representing the cell indexes of the blocks intersecting with the query subset.