When a retrieval condition is input that includes a first desired word and a first desired component whose value includes the first desired word, a first detecting device detects second desired components, each similar to the first desired component, and an acquiring device acquires second desired words, each similar to the first desired word. A first retrieving device retrieves first structured documents, each including a first component whose value includes one of the first desired word and the second desired words. A second retrieving device retrieves second structured documents, each including a second component that corresponds to one of the first desired component and the second desired components and that includes or corresponds to the first component.
This is a continuation of U.S. application Ser. No. 10/107,066, now U.S. Pat. No. 6,889,223, filed Mar. 28, 2002, which is incorporated herein by reference.
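The retrieval flow described in the abstract above can be illustrated with a minimal sketch. It is not the claimed apparatus. It assumes that structured documents are small XML strings, that "similar" components and words come from hand-written lookup tables (the hypothetical `SIMILAR_COMPONENTS` and `SIMILAR_WORDS`), and that the second retrieval step filters the documents found by the first step. All function and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical similarity tables standing in for the detecting and
# acquiring devices; a real system would derive these some other way.
SIMILAR_COMPONENTS = {"title": {"headline", "subject"}}
SIMILAR_WORDS = {"retrieval": {"search", "lookup"}}

EXAMPLE_DOCS = {
    "a": "<doc><title>Document retrieval basics</title></doc>",
    "b": "<doc><headline><main>Fast search</main></headline></doc>",
    "c": "<doc><body>retrieval appears here only</body></doc>",
    "d": "<doc><title>Cooking</title></doc>",
}


def expand(term, table):
    """Return the desired term together with its similar terms."""
    return {term} | table.get(term, set())


def retrieve(docs, component, word):
    """Return ids of documents matching the expanded retrieval condition."""
    components = expand(component, SIMILAR_COMPONENTS)
    words = expand(word, SIMILAR_WORDS)
    result = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        # First retrieval: components whose own value contains one of the
        # desired or similar words.
        first = {
            id(e)
            for e in root.iter()
            if e.text and words & set(e.text.lower().split())
        }
        if not first:
            continue
        # Second retrieval: a component with a desired or similar name that
        # either is one of the first components or contains one of them.
        if any(
            e.tag in components and any(id(s) in first for s in e.iter())
            for e in root.iter()
        ):
            result.append(doc_id)
    return result


if __name__ == "__main__":
    print(retrieve(EXAMPLE_DOCS, "title", "retrieval"))
```

In this sketch, document "b" is matched only through expansion (a `headline` component containing the word "search"), while document "c" is rejected because the word appears outside any desired or similar component.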
A method and system for augmenting a corpus with documents on concepts not sufficiently covered within the corpus is provided. The augmentation system generates a corpus concept graph from the documents of a corpus. A corpus concept graph represents concepts of the documents as nodes and related concepts as links between nodes. To generate a corpus concept graph, the augmentation system identifies the concepts that are related within each document of the corpus and adds nodes and links to the corpus concept graph for related concepts. The augmentation system analyzes the corpus concept graph to determine whether the relatedness of concepts of the documents of the corpus is sufficient. If the relatedness of a pair of concepts is not sufficient, then the augmentation system attempts to identify documents not already in the corpus that are related to the concepts that are not sufficiently related.
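One way to picture the corpus concept graph is the sketch below. It rests on several assumptions: concept extraction has already been done, so each document is given as a set of concept strings; two concepts count as related within a document if they both appear in it; link weight is the number of documents in which a pair co-occurs; and a pair is "not sufficiently related" when that weight falls below a hypothetical `min_weight` threshold. The abstract does not specify any of these choices.

```python
from collections import Counter
from itertools import combinations


def build_concept_graph(doc_concepts):
    """Build nodes and weighted links from per-document concept sets."""
    nodes = set()
    links = Counter()  # (concept_a, concept_b) -> co-occurrence count
    for concepts in doc_concepts:
        nodes |= set(concepts)
        # Sort so each unordered pair always maps to the same key.
        for a, b in combinations(sorted(concepts), 2):
            links[(a, b)] += 1
    return nodes, links


def weak_pairs(nodes, links, min_weight=2):
    """List concept pairs whose link weight is below the threshold.

    Pairs with no link at all have weight 0 and are included.
    """
    return [
        pair
        for pair in combinations(sorted(nodes), 2)
        if links[pair] < min_weight
    ]


if __name__ == "__main__":
    corpus = [
        {"graph", "node"},
        {"graph", "node", "link"},
        {"link", "corpus"},
    ]
    nodes, links = build_concept_graph(corpus)
    for a, b in weak_pairs(nodes, links):
        # Each weak pair is a candidate topic for finding new documents.
        print(f"look for documents covering both '{a}' and '{b}'")
```

The pairs returned by `weak_pairs` would then drive the search for documents outside the corpus; how that search is carried out is left open here.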