WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Document filing system with knowledge-base network of concept interconnected by generic, subsumption, and superclass relations    
United States Patent 4868733
Link to this page: http://www.wikipatents.com/4868733.html
Inventor(s): Fujisawa; Hiromichi (Tokorozawa, JP); Higashino; Jun'ichi (Kokubunji, JP); Hatakeyama; Atushi (Kokubunji, JP)
Abstract: A document filing system is provided for storing a large amount of information in an orderly arrangement that facilitates its use, while allowing semantic retrieval even from vague, fragmentary information. Further, a method is provided for expressing the facts constituting information inside the computer in terms of "concepts" representing things and "relations" defined between the concepts, together with a method of inputting the user's information to the computer through a dialogical procedure and retrieving desired information. Information stored in the computer forms an internal concept network, which is displayed in various forms such as a hierarchical form based on subsumption relations between the concepts, a hierarchical representation based on part-whole relations between the concepts, a frame display of a single concept, and a tabular representation of a set of concepts belonging to a given class. The network may be browsed by referring to the contents of the display so that a user can easily learn what kind of information has been stored in the computer, whereby he or she can input new information and retrieve desired information in a facilitated and simplified manner. The relations stored in the computer are classified into "generic relationships" and "instance relations" representing individual facts, whereby a generic framework of facts can be stored. The generic framework is displayed upon interaction with the user for allowing new information to be inputted and desired information to be retrieved in a facilitated and simplified manner. Retrieval using a semantic retrieval formula created internally through the dialogical procedure is realized through inference processing.
   














Owner/Assignee     Hitachi, Ltd. (Tokyo, JP)
Publication Date     September 19, 1989
Application Number     06/844,123
Filing Date     March 26, 1986
US Classification     707/5 706/50 706/53 706/55 706/934 715/533
Int'l Classification     G06F 007/28 G06F 015/21
Examiner     Williams Jr.; Archie E.
Assistant Examiner     Wang; Leo Li
Attorney/Law Firm     Antonelli, Terry & Wands
Priority Data     Mar 27, 1985[JP]60-60678
USPTO Field of Search     364/521 364/200 MS File 364/900 MS File 364/513
Patent Tags     document filing knowledge-base network concept interconnected generic, subsumption, superclass relations
   
U.S. References
Patent No.   Inventor    Class      Date
4611298      Schuldt     707/1      Sep 1986
4606002      Waisman     707/3      Aug 1986
4575798      Lindstrom   707/7      Mar 1986
4497039      Kitakami    707/2      Jan 1985
4420817      Yoshida     704/6      Dec 1983
4384329      Rosenbaum   704/10     May 1983
4358824      Glickman    707/5      Nov 1982
4318184      Millett     707/1      Mar 1982
4305131      Best        715/716    Dec 1981
4298957      Duvall      715/522    Nov 1981
Claims
 


We claim:

1. A document filing system for retrieving stored information based on an operator's partial or abstract description of said information comprising:

means for storing said knowledge base in which knowledge is represented in terms of concepts, each of which has a name associated therewith, relations each of which connects two of said concepts, at least some of said relations being subsumption relations, each of which is an ordered relation representing a superclass relationship between said concepts and generic relationships, each of which exists between two classes of concepts and which represent a possible relationship between two concepts each of which is a concept subsumed by one of said two classes, respectively;

means for interacting with said user for presenting guiding information from part of said knowledge base to said user, and for allowing said user to enter information necessary to register said partial or abstract descriptions to retrieve documents, and information necessary to update said knowledge base;

means for storing programs and data necessary for managing said knowledge base and for carrying out document registration and inferential retrieval;

means for controlling operations of said system according to said programs stored in said storage means, including matching an abstract description of a concept entered by said user with a concept stored in said knowledge base,

means for inputting said documents,

means for storing a large amount of documents inputted by said inputting means, and

means for displaying the retrieved documents wherein the contents of said knowledge base is arranged in

a first table means for storing tables which record at least a concept identification number and names for said concept,

second table means for storing tables which record at least two concept identification numbers representing subsumption relations between two concepts,

third table means for storing tables which record at least an identification number of a generic relationship which is a relationship between any two concepts which represents a possible relationship between two different concepts each of which is subsumed by one of said two concepts, respectively, and

two character strings corresponding to said generic relationship for two directions, and

fourth table means for storing tables which record at least two concept identification numbers and one generic relationship identification number, representing a specific relation defined between two concepts.

2. A document filing system for retrieving stored information based on an operator's partial or abstract description of said information, comprising:

means for inputting information into said system;

means for storing said information;

means for storing said information as a knowledge base;

means for storing operating programs and data for managing said knowledge base;

means for controlling operations of said system according to said operating programs and said data;

means for interacting with said operator, including a display means, and for instructing said operator;

means for retrieving desired or precise information, based on an input of a partial or abstract description of said information, input by said operator;

means for displaying at least said desired or precise information;

wherein said knowledge base comprises a plurality of concepts and a plurality of relations which may exist between said concepts;

said plurality of concepts forming a concept tree and representing a taxonomic hierarchy having a first concept representing a universal concept and all remaining concepts being subsumed, either directly or indirectly, by said universal concept; and

said relations including,

generic relations, each of which represents at least one of a link between a first predetermined concept and a second predetermined concept, a link between said first predetermined concept and a concept subsumed by said second predetermined concept, and a link between said second predetermined concept and a concept subsumed by said first predetermined concept; and

instance relations, each of which represents a link between a concept subsumed by said first predetermined concept and a concept subsumed by said second predetermined concept.

3. A document filing system according to claim 2 wherein said means for storing said information as a knowledge base includes a first memory means for storing at least an identification number, and a name for each concept.

4. A document filing system according to claim 3, wherein said means for storing information as a knowledge base further includes a second memory means for storing at least a list of each concept and at least one corresponding subsuming concept.

5. A document filing system according to claim 4, wherein said means for storing information as a knowledge base further includes a third memory means for storing at least a list of all said generic relations.

6. A document filing system according to claim 5, wherein said means for storing information as a knowledge base further includes a fourth memory means for storing at least a list of all said instance relations.

7. A document filing system according to claim 3, wherein said first and second predetermined concepts are superclass concepts, a superclass concept being a concept which represents the highest concept in a related class of concepts, such that all concepts in said class of concepts are subsumed by the superclass concept for said class.

8. A document filing system according to claim 7, wherein said display means displays a generic frame to said user when a new concept is input by said user, said generic frame being generated by said retrieving means based on said instance relations corresponding to all superclass concepts of said new concept.
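
For orientation, the four tables recited in claim 1 can be pictured with a small relational sketch. This is only an illustration under stated assumptions: the table names, column names, and sample rows are hypothetical, and the use of Python's built-in sqlite3 module is a convenience for the sketch, not something the claims describe.

```python
import sqlite3

# All table and column names below are illustrative assumptions,
# not identifiers taken from the patent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- First table: concept identification number and a name for the concept.
    CREATE TABLE concept_name (concept_id INTEGER, name TEXT);
    -- Second table: two concept ids forming one subsumption (IS-A) link.
    CREATE TABLE subsumption (child_id INTEGER, parent_id INTEGER);
    -- Third table: a generic relationship id and one label per direction.
    CREATE TABLE generic_relationship (
        relationship_id INTEGER PRIMARY KEY,
        forward_label TEXT,
        reverse_label TEXT
    );
    -- Fourth table: two concept ids plus one generic relationship id.
    CREATE TABLE instance_relation (
        subject_id INTEGER, object_id INTEGER, relationship_id INTEGER
    );
""")

conn.executemany("INSERT INTO concept_name VALUES (?, ?)", [
    (1, "COMPANY"), (2, "FACTORY"),
    (3, "ABC Co., Ltd."), (4, "XYZ factory"),
])
conn.executemany("INSERT INTO subsumption VALUES (?, ?)", [(3, 1), (4, 2)])
conn.execute(
    "INSERT INTO generic_relationship VALUES (1, 'HAS-PART-OF', 'IS-PART-OF')"
)
conn.execute("INSERT INTO instance_relation VALUES (3, 4, 1)")

# Resolve every stored relation into names, with the label for each direction.
rows = conn.execute("""
    SELECT s.name, g.forward_label, o.name, g.reverse_label
    FROM instance_relation AS r
    JOIN concept_name AS s ON s.concept_id = r.subject_id
    JOIN concept_name AS o ON o.concept_id = r.object_id
    JOIN generic_relationship AS g ON g.relationship_id = r.relationship_id
""").fetchall()

for subject, forward, obj, reverse in rows:
    print(f'("{subject}" {forward} "{obj}")')
    print(f'("{obj}" {reverse} "{subject}")')
```

Storing both direction labels on the relationship row lets a single stored relation be read from either end, which is one plausible reading of the "two character strings ... for two directions" in the third table.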
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an information storage and retrieval system which permits storage, retrieval and display of information such as documents, drawings, photographs and the like in such a manner in which common users can easily manipulate the system for the storage and/or retrieval of information.

2. Description of the Prior Art

Heretofore, the management of data bases that store and retrieve enormous amounts of information has been left to specialists, and the information has been available to the end user only through such experts. However, with the development of compact, large-capacity storage devices such as optical disks, document filing systems for office use that end users can operate directly have been realized. Further, word processors have come into increasingly wide use. Under these circumstances, there is a growing tendency for large numbers of documents to be stored in electronic devices.

Heretofore, items, such as documents, are managed in tabular form listing bibliographic data such as identification names, titles and author's names attached to the documents, and attempts have been made to facilitate the retrieval of information by assigning keywords or classification codes thereto. Nevertheless, there arise problems mentioned below.

In most of the computer file systems, the file management is performed with the aid of identification names (each composed of ca. 20 characters). However, difficulty is often encountered in naming the document or file so that it can be readily recalled. Besides, searching the file on the basis of the character string which constitutes the name while inferring the contents from the name is an extremely difficult job even for the user who has prepared the name himself.

Since the bibliographic data are objective items, registration thereof can be easily made. However, there scarcely arises the situation in which the bibliographic data are made use of as means for retrieval. Utilization of the bibliographic data as the aid for the retrieval is restricted to the rare case in which the document to be retrieved is clearly known to the user as the source or reference literature.

In most cases of document retrieval, the title as ambiguously remembered by the user, or the contents of the document, provides the clue for the retrieval. To this end, keywords and classification codes are employed. However, difficulty is encountered in assigning the keywords or classification codes to the documents upon registration thereof. In other words, it is difficult to determine the keyword which makes it possible to retrieve the associated document properly later on. By way of example, it is assumed that many keywords are attached to a document so that it can be retrieved from various perspectives. This however means that a number of keywords which are useless for retrieval are employed. If the number of keywords is decreased, uncertainty arises as to the correct selection for retrieval. In data bases for literature, preparation and allocation of the keywords have heretofore been left to specialists.

Moreover, difficulty is often encountered in recalling the keyword itself. By way of example, upon preparation of the retrieval formula composed of the keywords for the retrieval of a document, literatures having a resemblance to the desired one are searched out from a general list for picking up their keywords, which are then referred to for determining the keywords possibly allocated to the desired document. Such procedure is not rare and tells how difficult it is to recall the keyword.

In the case of filing documents through classification, ambiguity of the taxonomic tree (hierarchical tree) as well as confusion of the taxonomic trees (i.e. multiple classifications of one document) provide problems. Further, standards for the classification vary as time passes. A span of several years will make the classification standards useless, giving rise to another problem.

Under these circumstances, easy management and retrieval of information by the user remain extremely important problems to be solved in the hitherto known document filing systems.

As an attempt to cope with the above problems, there has been proposed a method of diagramming the retrieval conditions and deriving a formal query formula for the retrieval by using natural language, as disclosed in J. F. Sowa's "Conceptual Graphs for a Data Base Interface", IBM J. Research and Development, Vol. 20, 1976, pp. 336-357. Furthermore, a method of assisting creation of the conditional formula for retrieval by presenting knowledge concerning the contents of a data base from a computer is known, as disclosed in F. N. Tou et al's "RABBIT: An Intelligent Database Assistant", Proceedings of National Conference of AAAI, 1982, pp. 314-318. These methods are intended only for assisting the retrieval from the data base. No teachings are disclosed as to the assistance of storage of information for the updating purpose.

In the filing of documents by the end user, registration of new documents as well as maintenance of the file system (e.g. reexamination as to pertinency of classification) is important for realizing the facilitated retrieval. The approaches mentioned above do not meet this requirement.

Finally, the retrieval is accompanied by still another problem. Namely, no measures are available for re-examining old information from the viewpoint of a new concept which had not yet been clearly defined at the time the old information was stored, or for retrieving it from the new point of view. By way of example, it often happens that a classification is to be modified from a new viewpoint, or in a manner specific to the user himself, after a lapse of several years. Thus, the possibility of rearranging information and of altering the manner of retrieval is also an important factor in enhancing the usability of the information storage and retrieval system.

SUMMARY OF THE INVENTION

An object of the present invention is to solve the problems mentioned above and provide an information storage and retrieval system which allows the user to retrieve the desired document from ambiguous or vague and fragmentary (partial) information in a facilitated and simplified manner while making it easy to enter or register documents and other information.

In view of the above and other objects which will be more apparent as description proceeds, there is provided according to a general aspect of the invention an information storage system in which a mechanism of storing information in the machine is so arranged as to be compatible or comparable to the user's memorization mechanism and thinking process so that the end user can easily understand manipulation of the system to thereby enhance the facilitated usability thereof.

More specifically, the invention aims to facilitate registration of new information and the inputting of conditions for retrieval, to realize semantically meaningful retrieval, and to adapt the retrieval to a diversity of viewpoints.

To this end, the system according to the invention is imparted with the novel functions mentioned below:

(1) Supporting function for registration.

For registration of new documents, it is necessary to input the subject matter and the nature or class thereof in addition to the entry of the bibliographic items (author's name, title, the sources and others). Further in order to realize semantic retrieval, it is required to additionally provide more detailed or concrete information. By way of example, suppose that the subject matter is a computer. Then, there may be required such information as "what kind of computer it is", "what characteristics it has", "what company has developed it", "where the company is located", "which country the location belongs to", and so forth. When the information mentioned above is stored, it is possible to retrieve with the aid of inference function "the document concerning a computer developed by a certain company located in a country A and having characteristic features B".

According to the teachings of the invention, knowledge about the concepts "computer", "company" and others is stored in the storage system, wherein upon addition of new information, the user is instructed through a dialogical procedure as to what kind of property data should be inputted, so that he or she can input the data within a short time without entering erroneous or false information.

In the case where information of similar property has already been registered, a function is provided which allows only the properties differing from those of the registered information to be inputted, without the need to enter all the property data of the information to be newly inputted, thereby facilitating the inputting procedure. By way of example, suppose a case in which a man named "John Smith" has been already registered and his brother named "George Smith" is to be newly registered. In that case, by selecting "John Smith" as a similar concept, the system displays a list of the properties of this concept, for example, in a manner as follows:

(FATHER-IS "Davise Smith")

(MOTHER-IS "Samanser Smith")

(BIRTHDAY-IS "May 4, 1960")

(SEX-IS "male")

(HOBBY-IS "music") (1)

Then, the user can input the properties of the concept "George Smith" that differ from the above, e.g. (BIRTHDAY-IS "June 7, 1963") and (HOBBY-IS "sport").
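
The registration-by-similarity step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a plain dictionary as the property store; the function name `register_like` and the store layout are hypothetical and are not taken from the patent.

```python
# Hypothetical property store: concept name -> {relation name: value}.
properties = {
    "John Smith": {
        "FATHER-IS": "Davise Smith",
        "MOTHER-IS": "Samanser Smith",
        "BIRTHDAY-IS": "May 4, 1960",
        "SEX-IS": "male",
        "HOBBY-IS": "music",
    }
}


def register_like(store, new_name, similar_name, overrides):
    """Register new_name by copying similar_name's properties, then
    replacing only the properties that differ."""
    merged = dict(store[similar_name])  # copy, so the original stays untouched
    merged.update(overrides)
    store[new_name] = merged
    return merged


george = register_like(
    properties,
    "George Smith",
    "John Smith",
    {"BIRTHDAY-IS": "June 7, 1963", "HOBBY-IS": "sport"},
)

for relation, value in george.items():
    print(f'({relation} "{value}")')
```

Only the two differing properties are typed in; the parents and sex carry over from the similar concept.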

(2) Supporting Function for retrieval condition input.

When the end user is going to perform the retrieval of a document, it is common that he or she has only an ambiguous image or concept of the document and has difficulty in expressing it in the natural language.

According to the teaching of the present invention, the retrieval is started from the most important concept and information is sequentially added through dialogical procedure or interaction. To this end, the knowledge of the world model concerning the content of the filed documents is stored in the system, as is the case with the registration assistance function. On the basis of this knowledge, the names of properties which can be inputted and the concepts (classes of things) to which the properties may belong are presented to the user.

By way of example, suppose that what the user wants is "technical paper". Then, the user inputs "technical paper". The system knows that "technical paper" has properties such as "author", "title", "subject matter" and others. Accordingly, the system displays on a terminal CRT sets of names of such properties and concepts such as (author, name), (title, text), ... and (subject, concept). The user who observes the display in turn inputs the selected data which he or she remembers as the relevant information. For example, "subject" is selected and "computer" is inputted. This process can be recursively repeated. In the above example, when "computer" is inputted as the selected subject, the system in turn displays (DEVELOPED-BY ORGANIZATION COMPANY), (RUNS COMPUTER-LANGUAGE), (RUNS-UNDER OS) and others. In response thereto, the user will input (RUNS LISP) as the additional condition for retrieval.
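
The prompting step in this dialogue can be sketched as a lookup of the property names attached to a concept and to every class that subsumes it. The sketch below is an illustration only: the dictionaries, the concept "MINICOMPUTER", the simplified (property, class) pairs, and the function `prompts_for` are assumptions made for the example, and each concept is assumed to have at most one direct superclass.

```python
# Hypothetical subsumption links (concept -> direct superclass).
IS_A = {
    "TECHNICAL-PAPER": "DOCUMENT",
    "MINICOMPUTER": "COMPUTER",
}

# Hypothetical generic relations: class -> (property name, class of the value).
GENERIC_RELATIONS = {
    "TECHNICAL-PAPER": [
        ("AUTHOR", "NAME"),
        ("TITLE", "TEXT"),
        ("SUBJECT", "CONCEPT"),
    ],
    "COMPUTER": [
        ("DEVELOPED-BY", "ORGANIZATION"),
        ("RUNS", "COMPUTER-LANGUAGE"),
        ("RUNS-UNDER", "OS"),
    ],
}


def prompts_for(concept):
    """Collect the (property, class) pairs for a concept and for every
    class above it in the subsumption chain."""
    prompts = []
    current = concept
    while current is not None:
        prompts.extend(GENERIC_RELATIONS.get(current, []))
        current = IS_A.get(current)
    return prompts


for name, value_class in prompts_for("TECHNICAL-PAPER"):
    print(f"({name}, {value_class})")

# A more specific concept inherits the prompts of its superclass.
print(prompts_for("MINICOMPUTER"))
```

Walking up the subsumption chain means a newly added, more specific concept is prompted with the properties of its superclasses without any extra definitions.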

By virtue of the assistance function mentioned above, there can be established the retrieval condition as follows:

"Technical paper about computer in which LISP runs and which is written by an employee of company A" (2)

As will be described in detail hereinafter, the above retrieval condition is expressed in the formula or expression as follows:

(TECHNICAL-PAPER
  (SUBJECT-IS (COMPUTER (RUNS LISP)))
  (AUTHOR-IS (EMPLOYEE (WORKS-AT COMPANY A))))          (3)

The above expression is based on symbolic expression (S-expression) in LISP Language (refer to P. H. Winston "LISP" Addison-Wesley Publishing Co., 1981, p. 18).
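
As a rough illustration of how such a nested retrieval formula might be held in memory and printed, the sketch below represents the condition as nested Python lists and renders it as a balanced S-expression. The list layout, the function `to_sexpr`, and the single symbol `COMPANY-A` (written here with a hyphen so that it is one token) are assumptions for the example and are not taken from the patent.

```python
def to_sexpr(node):
    """Render a nested list of symbols as a LISP-style S-expression."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_sexpr(child) for child in node) + ")"


# Each list is (head, condition, condition, ...); strings are symbols.
query = [
    "TECHNICAL-PAPER",
    ["SUBJECT-IS", ["COMPUTER", ["RUNS", "LISP"]]],
    ["AUTHOR-IS", ["EMPLOYEE", ["WORKS-AT", "COMPANY-A"]]],
]

print(to_sexpr(query))
```

Adding a condition during the dialogue then amounts to appending one more sub-list at the appropriate level of nesting.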

(3) Semantic retrieval function.

It is common that a user who wants to retrieve a certain item has only fragmentary and ambiguous information thereof. On the other hand, the computer memory (e.g. data base) stores that item in a concrete name. The gap between the user's fragmentary information and the precise data stored in the computer memory must be bridged.

In this connection, the ambiguity may be generally classified into five varieties mentioned below:

(i) Incompleteness of name

Only a part of the name of an item or concept is memorized.

(ii) Synonym

The same thing is often memorized or recalled in terms of different words. By way of example, the words "artificial intelligence", "thinking machine", and "AI" indicate the same concept.

(iii) Incompleteness of number.

It is rare that a user remembers numerical values precisely, as exemplified by "during the generation of 1980s", "about 1985", "from 1983 to 1987", "before 1960" and so on.

(iv) Taxonomic conceptual abstraction -1

Things and concepts are often memorized in terms of concepts of higher rank, with the concrete contents being forgotten. Memorization is often based on the classification of concepts, as exemplified by statements such as "although the name of the company is forgotten, the organization is neither university nor laboratory but a company at any rate", "that was a certain electric machinery manufacturer" or the like.

In this case, assuming that the electric machinery manufacturer is "ABC Co., Ltd.", for example, the following relations hold true.

("ABC Co., Ltd." IS-A ELECTRIC-MANUFACTURER)

(ELECTRIC-MANUFACTURER IS-A MANUFACTURER)

Schematically, the concepts "ABC Co., Ltd." and "ELECTRIC-MANUFACTURER" are coupled by a link "IS-A". Herein, the link "IS-A" represents a relation defined between the two concepts mentioned above and is referred to as the subsumption relation which is an ordered relation representing a superclass relation between two concepts.

In general, it is believed that all the concepts constitute a hierarchical taxonomy by means of the link "IS-A". The resulting hierarchical tree is referred to as a concept tree or conceptual tree.

(v) Partonomic conceptual abstraction -2

The abstraction discussed above is a sort of set-theoretical abstraction. It should be pointed out that people also often memorize a thing in terms of a higher-rank part in the part-whole relation of a concept. For example, a person may say that "although I can not remember the factory where Mr. A works, I am sure that he is an employee of ABC Co., Ltd." or "although I can not remember what the city is called, I am sure that the city is located in the state of California".

In contrast, the conventional data base stores the corresponding facts in more definite manner such as "Mr. A works at XYZ factory" or "ABC Co., Ltd., is located at Los Angeles". Accordingly, the information stored in the data base can not be retrieved starting from the ambiguous information memorized by the user.

In this case, the following relations play an important role.

("ABC Co., Ltd." HAS-PART-OF "XYZ factory")

("California state" HAS-PART-OF "Los Angeles").

What is important to note is that

("Los Angeles" IS-A "California state")

is not correct, but should be

("Los Angeles" IS-PART-OF "California state").

These relations "IS-PART-OF" and "HAS-PART-OF" are referred to as "part-whole" relations which are ordered relations representing a structural inclusion relationship between two concepts.

This relation should be clearly distinguished from the subsumption relation described above. Parenthetically, it should be mentioned that the relation "IS-PART-OF" is a reverse relation of "HAS-PART-OF".

In a stricter sense, a relation having directivity is referred to simply as a "relation", while it is referred to as a "relationship" when the direction is not of concern.
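
The distinction between the subsumption relation ("IS-A") and the part-whole relation ("IS-PART-OF" / "HAS-PART-OF") can be made concrete with a short sketch. The dictionaries and function names below are hypothetical, the sample links are taken from the examples in the text, and each concept is assumed to have at most one direct superclass and one direct whole.

```python
# Hypothetical sample links built from the examples in the text.
IS_A = {
    "ABC Co., Ltd.": "ELECTRIC-MANUFACTURER",
    "ELECTRIC-MANUFACTURER": "MANUFACTURER",
}
IS_PART_OF = {
    "XYZ factory": "ABC Co., Ltd.",
    "Los Angeles": "California state",
}


def reachable(start, target, links):
    """Follow child -> parent links upward from start and report whether
    target is reached. The seen set guards against accidental cycles."""
    seen = set()
    current = links.get(start)
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)
        current = links.get(current)
    return False


def is_a(concept, cls):
    return reachable(concept, cls, IS_A)


def is_part_of(part, whole):
    return reachable(part, whole, IS_PART_OF)


def has_part(whole, part):
    # HAS-PART-OF is the reverse of IS-PART-OF, so no second table is needed.
    return is_part_of(part, whole)


print(is_a("ABC Co., Ltd.", "MANUFACTURER"))
print(is_part_of("Los Angeles", "California state"))
print(is_a("Los Angeles", "California state"))
print(has_part("ABC Co., Ltd.", "XYZ factory"))
```

Keeping the two kinds of link in separate structures is what prevents "Los Angeles" from being treated as a kind of "California state" while still letting it be found as a part of it.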

As to a person's memorization faculty or characteristic, it may further be pointed out that a relation between concepts is more readily memorized than the concepts themselves. For example, in the case of retrieval starting from such fragmentary, ambiguous information as "the subject matter of a certain article is an operating system which was developed by an institute in U.S.A.", the fact "developed" is important, and this fact represents a "relation" defined between the two concepts "operating system" and "institute". More concretely, the retrieval condition may be expressed as follows:

("UX OPERATING SYSTEM" IS-DEVELOPED-BY "INSTITUTE B")

wherein "IS-DEVELOPED-BY" represents the relation. In the retrieval based on the ambiguous information, this "relation" defined among the concepts is important.

Among the characteristics of a person's memorizing faculty, the incompleteness of name and numerical values are taken into consideration in the hitherto known information retrieval. For example, there can be mentioned the matching function of fragmentary (partial) character string and designation of numerical range. The semantic retrieval function according to the invention is characterized above all by the conceptual abstractions among the classified varieties described above. More specifically, with the aid of the retrieval condition input supporting function, the semantically ambiguous retrieval is rendered possible, as follows:

Retrieval Condition: "Article concerning a computer developed by a certain company located in California state and in which an operating system developed by a certain institute runs" (4)

In the above conditional statement, the only concrete concept is "California state". Other words which may possibly be used as keywords are "computer", "institute", and "operating system". With the hitherto known information retrieval systems, e.g. a keyword retrieval system, no satisfactory retrieval results can be obtained. It is however noted that the conditional statement (4) is considered a "semantically meaningful retrieval condition" according to the invention, because the statement (4) contains relations between "California state" and "company", "company" and "computer", and "operating system" and "computer", respectively, as the information for retrieval. Further, in the sense that "company", "computer" and "operating system" are generic names (abstract concepts), the so-called "abstract" retrieval is realized. In contrast, in the case of the hitherto known retrieval system, since the relations between keywords are not stated, the above statement (4) may be erroneously interpreted as "article about computer introduced by an institute located in California state and in which operating system developed by a certain company runs", which is of course "semantically meaningless retrieval".
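
To illustrate why stating the relations matters, the following sketch evaluates retrieval condition (4) over a tiny set of facts. Everything in it is a hypothetical example: the articles, the computers, "OS-Z", the fact triples, and the helper names are invented for the illustration, and the traversal assumes the link tables contain no cycles. Both articles would share the same keywords, but only one satisfies the condition once the relations are checked.

```python
# Hypothetical instance relations as (subject, relation, object) triples.
FACTS = [
    ("Article-1", "SUBJECT-IS", "Computer-X"),
    ("Computer-X", "IS-DEVELOPED-BY", "ABC Co., Ltd."),
    ("Computer-X", "RUNS", "UX OPERATING SYSTEM"),
    ("UX OPERATING SYSTEM", "IS-DEVELOPED-BY", "INSTITUTE B"),
    ("ABC Co., Ltd.", "IS-LOCATED-IN", "Los Angeles"),
    # Article-2 swaps the roles of the company and the institute.
    ("Article-2", "SUBJECT-IS", "Computer-Y"),
    ("Computer-Y", "IS-DEVELOPED-BY", "INSTITUTE B"),
    ("Computer-Y", "RUNS", "OS-Z"),
    ("OS-Z", "IS-DEVELOPED-BY", "ABC Co., Ltd."),
    ("INSTITUTE B", "IS-LOCATED-IN", "Los Angeles"),
]
IS_A = {
    "Computer-X": "COMPUTER", "Computer-Y": "COMPUTER",
    "UX OPERATING SYSTEM": "OPERATING-SYSTEM", "OS-Z": "OPERATING-SYSTEM",
    "ABC Co., Ltd.": "COMPANY", "INSTITUTE B": "INSTITUTE",
}
IS_PART_OF = {"Los Angeles": "California state"}


def objects(subject, relation):
    """All objects linked to subject by the given relation."""
    return [o for s, r, o in FACTS if s == subject and r == relation]


def climbs_to(start, target, links):
    """True if target is start itself or lies above it along the links."""
    current = start
    while current is not None:
        if current == target:
            return True
        current = links.get(current)
    return False


def satisfies_condition_4(article):
    for computer in objects(article, "SUBJECT-IS"):
        if not climbs_to(computer, "COMPUTER", IS_A):
            continue
        by_california_company = any(
            climbs_to(dev, "COMPANY", IS_A)
            and any(climbs_to(place, "California state", IS_PART_OF)
                    for place in objects(dev, "IS-LOCATED-IN"))
            for dev in objects(computer, "IS-DEVELOPED-BY")
        )
        runs_institute_os = any(
            climbs_to(os_name, "OPERATING-SYSTEM", IS_A)
            and any(climbs_to(dev, "INSTITUTE", IS_A)
                    for dev in objects(os_name, "IS-DEVELOPED-BY"))
            for os_name in objects(computer, "RUNS")
        )
        if by_california_company and runs_institute_os:
            return True
    return False


hits = [a for a in ("Article-1", "Article-2") if satisfies_condition_4(a)]
print(hits)
```

The check uses the subsumption links to match the abstract classes ("company", "institute") and the part-whole link to match "located in California state" against a fact that only names the city.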

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a view showing a system arrangement according to an embodiment of the present invention;

FIG. 2 is a view for illustrating a concept network;

FIG. 3 is a view illustrating the concept network in a schematic diagram;

FIG. 4 is a view showing a concept relation model in an Entity-Relation diagram;

FIGS. 5 to 8 are views illustrating concrete examples of knowledge representation by the concept relation model;

FIG. 9 is a view illustrating an example of image data management;

FIG. 10 is a functional block diagram showing software employed according to an embodiment of the invention;

FIG. 11 is a view for illustrating a result of a character substring matching procedure;

FIG. 12 is a view showing a menu;

FIG. 13 is a view for illustrating network traverse procedure based on selection from the menu;

FIG. 14 is a view showing a concept tree display;

FIG. 15 is a view showing a hierarchical tree based on the part-whole relationship;

FIG. 16 is a view for illustrating network traverse procedure based on concept frames;

FIG. 17 is a view for illustrating a method for definition and registration of a new concept;

FIG. 18 is a view for illustrating concept network edition;

FIGS. 19 to 22 are views for illustrating dialogical retrieval formula creating procedure;

FIG. 23 is a view for illustrating semantic retrieval;

FIG. 24 is a view for illustrating a concept matching procedure; and

FIG. 25 is a view for illustrating functions for displaying concepts in tabular form.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the following, the present invention will be described in detail in conjunction with the exemplary or preferred embodiments thereof by referring to the accompanying drawings.

FIG. 1 shows a general arrangement of an image information filing system in which an information storage and retrieval system according to an exemplary embodiment of the invention is adopted. Initially, the structure and operation of the whole system will be outlined below.

Basically, the system is composed of a data processing portion and an image information processing portion. The data processing portion comprises a control unit (also referred to as CPU) 100, a main memory 300, magnetic disk units 400 and a terminal console 200 (which includes a CRT 210, a keyboard 220 and a mouse 230). On the other hand, the image information processing portion comprises an image scanner 700, an image printer 750, an optical disk unit 450, an image buffer memory 350, a high-speed image processor (also referred to as IP) 600 and a high-resolution image display (also referred to as CRT) 500. The data processing portion and the image information processing portion are interconnected through a bus adapter 805.

As main operations to be performed, there can be mentioned registration of image information from documents, retrieval of desired information for display or other type of outputting thereof, and inputting and editing of information or data belonging to the field to be filed. In the registration of the image information of a document, the latter is scanned through the image scanner 700, wherein the resulting image information is loaded in the image buffer memory 350 and stored in the optical disk unit 450 after having been coded in a compressed form by the high-speed image processor or IP 600. At that time, the image information in the buffer memory 350 is displayed on the image display or CRT 500 to check whether the image information has been properly digitized, while bibliographic data of the document (such as subject or title, author, the source and others) as well as the significance thereof in the world knowledge are inputted through the terminal console 200. The bibliographic data, the physical addresses (pack address, track address and sector address) of the image information concerned on the optical disk unit 450 and the properties of the image (size, scan density, type of coding as adopted and the like) are stored in the magnetic disk unit or file unit 420. On the other hand, information about the significance of the document in the world knowledge and the like is stored in the file unit 430.
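The division of the registered data between the file units 420 and 430 can be pictured with the following minimal sketch. All class names, field names, the relation name SUBJECT-IS and the sample values are hypothetical and serve only as illustration; the specification does not state the record layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PhysicalAddress:
    """Location of the coded image on the optical disk unit 450."""
    pack: int
    track: int
    sector: int


@dataclass(frozen=True)
class ImageProperties:
    size: Tuple[int, int]   # assumed to be (width, height) in pixels
    scan_density: int       # assumed unit: dots per inch
    coding: str             # type of coding adopted


@dataclass(frozen=True)
class DocumentRecord:
    """Bibliographic data, address and image properties (file unit 420)."""
    title: str
    author: str
    source: str
    address: PhysicalAddress
    properties: ImageProperties


Fact = Tuple[str, str, str]  # (concept, relation, concept)

file_unit_420: Dict[str, DocumentRecord] = {}  # bibliographic and address data
file_unit_430: Dict[str, List[Fact]] = {}      # significance in the world knowledge


def register(doc_id: str, record: DocumentRecord, facts: List[Fact]) -> None:
    """Store the two kinds of data in their respective file units."""
    file_unit_420[doc_id] = record
    file_unit_430[doc_id] = list(facts)


def locate(doc_id: str) -> PhysicalAddress:
    """Look up where the image is stored, as needed for retrieval."""
    return file_unit_420[doc_id].address


# Made-up sample values, for illustration only.
register(
    "ART #018",
    DocumentRecord(
        title="(title)", author="(author)", source="(source)",
        address=PhysicalAddress(pack=1, track=20, sector=3),
        properties=ImageProperties(size=(1728, 2200), scan_density=200, coding="MH"),
    ),
    [("ART #018", "SUBJECT-IS", "X-800")],
)
```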

In the retrieval and display operation, the desired document is identified with the aid of the terminal console 200 through the dialogical interaction process described hereinafter, to be thereby displayed on the image display CRT 500. When a hard copy is desired, this can be outputted from the printer 750. Information about the location of the identified document (such as the physical address of the optical disk unit) is read out from the file unit 420 to be subsequently sent to the optical disk control unit 450 as the control command for reading the optical disk by way of the bus adapter 805. The image information or data thus read out is temporarily stored in the buffer memory 350 and is sequentially decoded through the IP 600 to be displayed.

The mouse 230 is capable of designating the display position or location on both the CRTs 210 and 500. Accordingly, the display position of the image on the CRT 500 is designated by the mouse 230. By taking advantage of this function, the document images on a plurality of pages can also be displayed at given locations or positions on the CRT in overlapping relation. Furthermore, the document image corresponding to one page can be displayed in a reduced size through the IP 600, thereby allowing a number of pages to be simultaneously displayed on a single CRT screen. Management of images to be displayed on the CRT is performed by the control unit or CPU 100.

Inputs for editing the world knowledge are performed on the terminal 200 while displaying the document on the CRT 500 as required. The phrase "world knowledge" is intended to mean a set of concepts concerning the world or field described in the document, which document is to be registered or has already been registered, and the facts described in terms of relationships among the concepts. Further, the term "world knowledge" encompasses these concepts, as well as the interconceptual relationships, in a natural language. Needless to say, the document itself is included as one of the concepts by the term "world". This knowledge is stored in the file unit 430.

The three main functions described above can be arbitrarily called in a modeless manner whenever they are required. By way of example, information can be displayed as required on the CRT 500 by resorting to the retrieval function in the course of performing the additional editing of the world knowledge. It is also possible to additionally file the knowledge of the contents of a document in the course of performing the registration of the same document.

Next, discussion will be directed to the representation format of the world knowledge data. The representation of knowledge is made in terms of two varieties of elements, i.e. the concepts and the relation(s) between or among the concepts. FIG. 2 is a schematic diagram conceptually illustrating these elements in terms of a kind of semantic network. In the figure, each node represented by an ellipse represents a concept, wherein the word written within the ellipse is a typical word representing that concept. This word is referred to as the name of the concept. Links interconnecting the ellipses (i.e. solid and broken lines with respective arrows) represent the relationships among the concepts. For example, the fact that a "supercomputer 1012" is "one variety of" a "computer 1011" is represented by a link labelled "IS-A". It should be mentioned that "UNIVERSAL 1010" is a specific concept defined to subsume all the other concepts. In other words, all the concepts constitute a concept tree having a root constituted by the concept "UNIVERSAL", wherein the concept tree represents a taxonomic hierarchy. The link "IS-A" is one variety of the relationships. However, this link also serves as a route through which the property of a concept is inherited by the concepts ranked lower. Consequently, this link or relationship is distinguished from the other relationships. To this end, the links "IS-A" are represented by the arrowed solid lines, while other links or relationships are represented by broken lines.
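The concept tree formed by the "IS-A" links can be sketched as a mapping from each concept to the concept ranked immediately above it. The sketch below is an illustration under assumptions: the concept names are those mentioned in this description, but placing "COMPUTER" and "SOFTWARE" directly under "UNIVERSAL" is assumed here, since intermediate concepts of FIG. 2 are not stated in the text, and the function names are hypothetical.

```python
# IS-A links: lower-ranked concept -> concept ranked immediately above it.
IS_A = {
    "COMPUTER": "UNIVERSAL",        # assumed to hang directly under the root
    "SUPERCOMPUTER": "COMPUTER",
    "X-800": "SUPERCOMPUTER",
    "SOFTWARE": "UNIVERSAL",        # assumed to hang directly under the root
    "OPERATING SYSTEM": "SOFTWARE",
    "UX": "OPERATING SYSTEM",
}


def ancestors(concept):
    """Follow the IS-A links upward until the root UNIVERSAL is reached."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def subsumes(general, specific):
    """True if `general` is `specific` itself or lies above it in the tree."""
    return general == specific or general in ancestors(specific)
```

For instance, `ancestors("X-800")` yields the chain SUPERCOMPUTER, COMPUTER, UNIVERSAL, and `subsumes("UNIVERSAL", c)` holds for every concept `c` in the mapping, which corresponds to "UNIVERSAL" subsuming all the other concepts.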

By way of example, in considering a generic property that "computer runs software", it will be noted that this property can also be represented by the expression "software runs on computer". This kind of relationship will herein be referred to as the generic relation. The representing format of the generic relation in the case of the example mentioned above is

(COMPUTER RUNS SOFTWARE)

(SOFTWARE RUNS-ON COMPUTER) . . . . (5)

These generic relations can be taken over or inherited by the lower-ranked concepts in such a manner that "supercomputer runs software" and "X-800 computer runs software" or "operating system runs on computer" and "UX runs on computer", where each of the foregoing is referred to as a generic relation. These relationships can be derived from the generic relation (5) and are not directly described in the knowledge base.
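The derivation of the inherited relations from the generic relation (5) can be sketched as follows. This is a minimal illustration, assuming that only the two relations of (5) are stored and that inheritance specializes the first concept of a relation, as in the examples just given; the placement of "COMPUTER" and "SOFTWARE" directly under "UNIVERSAL" and the function names are assumptions.

```python
# IS-A links: lower-ranked concept -> concept ranked immediately above it.
IS_A = {
    "COMPUTER": "UNIVERSAL",        # assumed placement
    "SUPERCOMPUTER": "COMPUTER",
    "X-800": "SUPERCOMPUTER",
    "SOFTWARE": "UNIVERSAL",        # assumed placement
    "OPERATING SYSTEM": "SOFTWARE",
    "UX": "OPERATING SYSTEM",
}

# Only the generic relations (5) are described directly in the knowledge base.
GENERIC = [
    ("COMPUTER", "RUNS", "SOFTWARE"),
    ("SOFTWARE", "RUNS-ON", "COMPUTER"),
]


def self_and_ancestors(concept):
    """The concept followed by every concept above it along the IS-A links."""
    chain = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def inherited_relations(concept):
    """Derive the generic relations that `concept` inherits from above."""
    return [
        (concept, relation, obj)
        for upper in self_and_ancestors(concept)
        for (subj, relation, obj) in GENERIC
        if subj == upper
    ]
```

With these definitions, `inherited_relations("X-800")` gives `[("X-800", "RUNS", "SOFTWARE")]` and `inherited_relations("UX")` gives `[("UX", "RUNS-ON", "COMPUTER")]`, neither of which is stored explicitly.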

In FIG. 2, the link 1005 interconnecting the concepts "X-800" and "UX" differs from the aforementioned generic relationship. This link 1005 represents the individual relation defined between the two concepts linked together. This sort of relation will be referred to as the instance relation or simply as relation. It should however be noted that the relation 1005 is an instance relation of the generic relationship 1004.
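The statement that the relation 1005 is an instance relation of the generic relationship 1004 can be checked mechanically, as in the sketch below. It assumes that the generic relationship 1004 is (SOFTWARE RUNS-ON COMPUTER) and that the link 1005 reads (UX RUNS-ON X-800); the direction of both links, the placement of "COMPUTER" and "SOFTWARE" under "UNIVERSAL" and the function names are assumptions, not taken from FIG. 2.

```python
# IS-A links: lower-ranked concept -> concept ranked immediately above it.
IS_A = {
    "COMPUTER": "UNIVERSAL",        # assumed placement
    "SUPERCOMPUTER": "COMPUTER",
    "X-800": "SUPERCOMPUTER",
    "SOFTWARE": "UNIVERSAL",        # assumed placement
    "OPERATING SYSTEM": "SOFTWARE",
    "UX": "OPERATING SYSTEM",
}

GENERIC_1004 = ("SOFTWARE", "RUNS-ON", "COMPUTER")  # assumed reading of link 1004
INSTANCE_1005 = ("UX", "RUNS-ON", "X-800")          # assumed reading of link 1005


def subsumes(general, specific):
    """Climb the IS-A links from `specific` and report whether `general` is met."""
    while True:
        if specific == general:
            return True
        if specific not in IS_A:
            return False
        specific = IS_A[specific]


def is_instance_of(instance, generic):
    """An instance relation must carry the same relation name, and each of its
    concepts must be subsumed by the corresponding concept of the generic one."""
    i_subj, i_rel, i_obj = instance
    g_subj, g_rel, g_obj = generic
    return i_rel == g_rel and subsumes(g_subj, i_subj) and subsumes(g_obj, i_obj)
```

Under these assumptions `is_instance_of(INSTANCE_1005, GENERIC_1004)` holds, whereas a reversed triple such as `("X-800", "RUNS-ON", "UX")` is rejected because "X-800" is not subsumed by "SOFTWARE".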

In this way, the schematic diagram of FIG. 2 expresses the fact that the subject matter of an article "ART #018" denoted by a numeral 1018 is the supercomputer X-800 and that an operating system UX runs on the supercomputer X-800. Further, it will be seen that all the concepts are interconnected by longitudinal links labelled "IS-A" on the one hand and by transverse links referred to as the generic relations and the instance relations on the other hand, to thereby constitute the conceptual network.

In this conjunction, it is important to no