A computer-aided official management system for developments described in patents and publications limits the official examination work required for each development, substantially supports development management, and permits data to be organized in a classification system with low redundancy. The management system contains a multiplicity of development systems (1) which emerge from one another by means of set operations via directed relationships (4). Each development system (1) is linked to a database (11), which contains, in particular, the definition of the development system (1) serving as a reference system (2), the formulation and status of each individual relationship (4) forming the development system (1), and unique indicators to further data. The development systems are uniquely defined in the data structure as intersection sets that link previously separately considered reference systems via the relationships (4).
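The idea of a development system defined as an intersection set of two reference systems, joined by a directed relationship, can be illustrated with a minimal Python sketch. The class names, the `derive` helper, the `"pending"` status value, and the sample identifiers are all hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """A directed relationship (4) between two reference systems (assumed shape)."""
    source: str
    target: str
    formulation: str
    status: str = "pending"  # assumed status value


@dataclass
class DevelopmentSystem:
    """A development system (1): a named set of development identifiers."""
    name: str
    members: frozenset
    relationships: list = field(default_factory=list)


def derive(name, source, target, formulation):
    """Derive a new development system as the intersection of two existing ones."""
    rel = Relationship(source.name, target.name, formulation)
    return DevelopmentSystem(name, source.members & target.members, [rel])


# Hypothetical reference systems and the system derived from them.
engines = DevelopmentSystem("engines", frozenset({"P1", "P2", "P3"}))
ceramics = DevelopmentSystem("ceramics", frozenset({"P2", "P3", "P4"}))
ceramic_engines = derive(
    "ceramic engines", engines, ceramics, "engines using ceramic parts"
)
```

Because the derived system stores only the relationship and the intersection, a development such as `P2` is recorded once and referenced from every system it belongs to, which is one way to read the "low redundancy" claim.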
A technique for efficiently representing dependencies between electronically stored documents, such as in an enterprise data processing system. A document distribution path is developed as a directed graph that represents the historic dependencies between documents and is constructed in real time as documents are created. The system preferably maintains a lossy hierarchical representation of the documents, indexed in a way that allows fast queries for similar but not necessarily equivalent documents. A distribution path, coupled with a document similarity service, can be used to provide a number of applications, such as a security solution capable of finding and restricting access to documents that contain information similar to that in existing files known to contain sensitive information.
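A distribution path of this kind can be sketched as a directed graph that grows as documents are created. The following Python sketch is only an illustration: the `DistributionPath` class, its method names, and the file names are assumptions, and the similarity service is left out entirely.

```python
from collections import defaultdict, deque


class DistributionPath:
    """Directed graph of document dependencies (hypothetical structure)."""

    def __init__(self):
        # Maps a document to the set of documents derived from it.
        self.children = defaultdict(set)

    def record(self, new_doc, derived_from=()):
        """Register a newly created document and its parent documents."""
        self.children[new_doc]  # touching the key registers the node
        for parent in derived_from:
            self.children[parent].add(new_doc)

    def descendants(self, doc):
        """Return every document that depends, directly or not, on doc."""
        seen = set()
        queue = deque([doc])
        while queue:
            current = queue.popleft()
            for child in self.children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


path = DistributionPath()
path.record("salary.xlsx")
path.record("summary.docx", derived_from=["salary.xlsx"])
path.record("email.eml", derived_from=["summary.docx"])
path.record("menu.txt")

# Documents to restrict if salary.xlsx is known to be sensitive.
restricted = path.descendants("salary.xlsx")
```

In a security application, the set returned by `descendants` would be a starting point for access restrictions; a similarity query could then extend it to documents with no recorded dependency.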
A technique for determining when documents stored in digital format in a data processing system are similar. A method compares sparse representations of two or more documents by breaking the documents into "chunks" of data of predefined sizes. Selected subsets of the chunks are determined to be representative of the data in the documents, and coefficients are developed to represent those chunks. The coefficients are then combined into coefficient clusters containing coefficients that are similar according to a predetermined similarity metric. The degree of similarity between documents is then evaluated by counting the clusters into which chunks of similar documents fall.
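The chunk, coefficient, and cluster pipeline can be illustrated with a deliberately simple Python sketch. Every concrete choice here is an assumption made for illustration: the chunk size, the rule of keeping only full-size chunks, the coefficient (mean byte value and number of distinct bytes), the bucketing used as a stand-in for clustering, and the ratio used as the final score.

```python
CHUNK_SIZE = 8     # assumed predefined chunk size, in bytes
BUCKET_WIDTH = 8   # assumed width of a cluster along the mean-byte axis


def chunks(data, size=CHUNK_SIZE):
    """Break a document into consecutive chunks of a predefined size."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def select(chunk_list, size=CHUNK_SIZE):
    """Keep a representative subset: here, only the full-size chunks."""
    return [c for c in chunk_list if len(c) == size]


def coefficient(chunk):
    """Represent a chunk by (mean byte value, number of distinct bytes)."""
    return (sum(chunk) / len(chunk), len(set(chunk)))


def cluster_id(coef):
    """Group similar coefficients by quantizing them into buckets."""
    mean, distinct = coef
    return (int(mean // BUCKET_WIDTH), distinct // 2)


def clusters(data):
    """Return the set of clusters that a document's chunks fall into."""
    return {cluster_id(coefficient(c)) for c in select(chunks(data))}


def similarity(a, b):
    """Count shared clusters, normalized by all clusters seen in either document."""
    ca, cb = clusters(a), clusters(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


doc = b"The quarterly report lists salaries for all staff."
edited = b"The quarterly report lists bonuses for all staff."
score = similarity(doc, edited)
```

Quantizing coefficients into buckets is a crude substitute for a real clustering step, but it shows why the comparison is cheap: each document reduces to a small set of cluster identifiers, and similarity becomes a set operation.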
A computer-implemented method of gathering large quantities of training data from case law documents, especially suitable for use as input to a learning algorithm that is used in a subsequent process of recognizing and distinguishing fact passages and discussion passages in additional case law documents, has the steps of: partitioning text in the documents by headings in the documents; comparing the headings in the documents to fact headings in a fact heading list and to discussion headings in a discussion heading list; filtering from the documents the headings and the text associated with the headings; and storing, on persistent storage in a manner adapted for input into the learning algorithm, fact training data and discussion training data that are based on the filtered headings and the associated text. Another method, which extracts features that are independent of specific machine learning algorithms and are needed to accurately classify case law text passages as fact passages or as discussion passages, has the steps of: determining a relative position of the text passages in an opinion segment in the case law text; parsing the text passages into text chunks; comparing the text chunks to predetermined feature entities for possible matched feature entities; and associating the relative position and matched feature entities with the text passages for use by one of the learning algorithms. Corresponding apparatus and computer-readable memories are also provided.
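The first method (partition by headings, compare against heading lists, filter, collect) might look roughly like the Python sketch below. The heading lists, the rule that a heading is an all-caps line, and the function names are assumptions for illustration; the sketch keeps the result in memory, whereas the abstract calls for persistent storage.

```python
# Hypothetical heading lists; real lists would be curated from case law.
FACT_HEADINGS = {"facts", "background", "statement of facts"}
DISCUSSION_HEADINGS = {"discussion", "analysis", "opinion"}


def partition_by_headings(text):
    """Split text into (heading, body) pairs.

    Assumption: a heading is any non-empty line written entirely in capitals.
    """
    sections, heading, body = [], None, []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and stripped.isupper():
            if heading is not None:
                sections.append((heading, " ".join(body)))
            heading, body = stripped, []
        elif stripped and heading is not None:
            body.append(stripped)
    if heading is not None:
        sections.append((heading, " ".join(body)))
    return sections


def gather_training_data(documents):
    """Keep only sections whose heading matches one of the two lists."""
    data = {"fact": [], "discussion": []}
    for text in documents:
        for heading, body in partition_by_headings(text):
            key = heading.lower()
            if key in FACT_HEADINGS:
                data["fact"].append(body)
            elif key in DISCUSSION_HEADINGS:
                data["discussion"].append(body)
    return data


opinion = (
    "STATEMENT OF FACTS\n"
    "The plaintiff slipped on ice.\n"
    "DISCUSSION\n"
    "Negligence requires a duty of care.\n"
    "CONCLUSION\n"
    "Affirmed.\n"
)
training = gather_training_data([opinion])
```

Sections under headings that appear in neither list, such as `CONCLUSION` here, are simply dropped, so the labels come from the headings themselves and no passage needs to be annotated by hand.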
A method (100) of researching and analyzing information contained in documents that belong to a first database (200) and are organized according to a first set of fields (210) for an electronic search and retrieval by a computer (850). The method includes the steps of: a) conducting an electronic search (202) of the first database to retrieve at least one document; b) developing user-defined fields (300); c) reading (310) the at least one document to retrieve information pertaining to the user-defined fields; d) entering into a second database (510) the at least one document, values of the first set of fields for the at least one document, the user-defined fields and the retrieved information pertaining to the user-defined fields; and e) analyzing (506) the information contained in the second database.
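Steps a) through e) can be walked through with a small in-memory Python sketch. The record layout, the function names, the sample documents, and the `read_material` stand-in for the reading step (which in practice may be performed by a person) are all hypothetical.

```python
from collections import Counter


def search(first_db, keyword):
    """Step a: retrieve documents whose text mentions the keyword."""
    return [doc for doc in first_db if keyword.lower() in doc["text"].lower()]


def build_second_db(hits, user_fields, read):
    """Steps b to d: store each hit with its original and user-defined fields."""
    second_db = []
    for doc in hits:
        record = dict(doc)  # the document and its first set of fields
        record["user"] = {name: read(doc, name) for name in user_fields}
        second_db.append(record)
    return second_db


def analyze(second_db, field_name):
    """Step e: tally the values of one user-defined field."""
    return Counter(record["user"][field_name] for record in second_db)


# Hypothetical first database with "id" and "year" as its first set of fields.
first_db = [
    {"id": 1, "year": 1998, "text": "A steel valve for fuel lines."},
    {"id": 2, "year": 2001, "text": "A ceramic valve for exhaust systems."},
    {"id": 3, "year": 2003, "text": "A pump housing."},
]


def read_material(doc, field_name):
    """Stand-in for reading a document to fill in a user-defined field."""
    return "ceramic" if "ceramic" in doc["text"] else "steel"


hits = search(first_db, "valve")
second_db = build_second_db(hits, ["material"], read_material)
summary = analyze(second_db, "material")
```

Keeping the user-defined fields under a separate `"user"` key is one way to prevent them from colliding with the first database's own field names.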