Method and apparatus for abstracting concepts from natural language    

United States Patent 5056021
Link to this page: http://www.wikipatents.com/5056021.html
Inventor(s): Ausborn; Carolyn (1904 Bluebird Ave., Huntsville, AL 35816)
Abstract: A method and apparatus for abstracting meanings from natural language words. Each word is analyzed for its semantic content by mapping into its category of meanings within each of four levels of abstraction. The preferred embodiment uses Roget's Thesaurus and Index of Classification to determine the levels of abstraction and category of meanings for words. Each of a sequence of words is mapped into the various levels of abstraction, forming a file of category of meanings for each of the words. The common categories between words are determined, and these common elements are output as data indicative of the most likely categories of meaning in each of the levels of abstraction to indicate the proper meaning intended to be conveyed. This data is then processed according to a rule to obtain a result.
   














Publication Date: October 8, 1991
Application Number: 07/363,181
Filing Date: June 8, 1989
US Classification: 704/9; 706/55
Int'l Classification: G06F 015/21
Examiner: Jablon; Clark A.
Attorney/Law Firm: Cushman, Darby & Cushman
USPTO Field of Search: 364/419
References
U.S. References: 4942526, Okajima, 704/10, Jul. 1990
Claims
 


What is claimed is:

1. A method for automatically determining a meaning from a sequence of words in a machine, comprising the steps of:

receiving a sequence of words in the machine;

using said machine to access a database that includes a prestored plurality of categories of meaning for each said word, to obtain at least one category of meaning for each of said words of said sequence; and

using said machine to select certain ones of said plurality of categories of meanings by determining ones of said categories of meanings which are common for different ones of said respective words, said common categories of meanings being used as abstracted meanings for the sequence.

2. A method as in claim 1, wherein said database is of the type that includes a plurality of pre-stored categories of meanings at different levels of abstraction for each said word, said levels of abstraction ranging progressively from a first level of abstraction which is a more specific category of meaning of the word to a last level of abstraction which is a more general category of meaning of the word, wherein said at least one category of meaning for each said word includes categories of meanings from each said level of abstraction for each said word, and wherein said selecting step includes the step of determining common categories of meaning within each said level of abstraction for each said word, and using said common categories of meaning for each said level of abstraction as said abstracted meanings within said each level of abstraction.

3. A method as in claim 2, wherein said using said machine to select step includes the steps of:

storing a multiple-dimension array of information from said database, each said level of abstraction representing one dimension of the array, and each word of the sequence representing another dimension of the array, and

using said machine to determine common categories of meanings and numbers of occurrences of said common categories of meanings among said words, within each said one dimension representing each said level of abstraction.

4. A method as in claim 2, further comprising the step of processing said abstracted meaning by correlating categories of meanings within one of said levels of abstraction to categories of meanings within another level of abstraction.

5. A method as in claim 2, wherein said sequence of words is in natural language, and said database is a Roget-type database stored in said machine having multiple levels of abstraction.

6. A method as in claim 5, wherein said database is Roget's thesaurus stored in said machine and has four levels of abstraction.

7. A method as in claim 1, wherein said using said machine to select step includes the steps of:

storing an array of information from said database, each element of the array including said at least one category of meaning for one of said words, and

determining common ones of said categories of meanings.

8. A method for automatically abstracting a meaning from a sequence of words using a machine, comprising the steps of:

receiving a sequence of words in said machine;

accessing a database with said machine of the type that includes a plurality of categories of meanings at different levels of abstraction for each said word, each said level of abstraction ranging progressively from a first level which is a more specific category of meaning of the word to a last level which is a more general category of meaning of the word, to obtain a plurality of categories of meanings for each said word which includes categories of meanings from each said level of abstraction for each said word of said sequence;

using said machine to determine ones of said categories of meanings within each said level of abstraction which are common for different ones of said respective words; and

using said common categories of meanings as abstracted meanings for the sequence.

9. A method as in claim 8, comprising the further step of processing said abstracted meanings to obtain a plurality of classifications into which said sequence belongs.

10. A method as in claim 8, further comprising the step of processing said abstracted meanings by correlating categories of meanings within one of said levels of abstraction to categories of meanings within another level of abstraction.

11. A method as in claim 8, wherein said sequence is in natural language, and said database is a Roget-type database stored in said machine.

12. A method as in claim 11, wherein said database is Roget's International Thesaurus stored in said machine.

13. A method for automatically using a Roget-type database that includes a plurality of categories of meanings at different levels of abstraction for a plurality of words, each said level of abstraction ranging progressively from a first level which is a more specific category of meaning of the word to a last level which is a more detailed category of meaning of the word to abstract a category of meaning from a sequence of words in natural language, comprising the steps of:

receiving a sequence of words in natural language in a machine;

accessing said Roget-type database to obtain a plurality of categories of meanings for each said word from each said level of abstraction for each said word of said sequence;

using said machine to determine commonality by determining ones of said categories of meanings within each said level of abstraction which are common for different ones of said respective words at said each level of abstraction; and

using said common categories of meanings as abstracted meanings for the sequence.

14. A method for automatically abstracting a category of meaning from a sequence of words using a machine, comprising the steps of:

using said machine to obtain for each of a sequence of words, pointers to all categories of meanings which said each word can convey; and

determining which are the most likely categories of meanings to be intended by the sequence, by determining ones of said categories of meanings which are common for different ones of said respective words.

15. A method as in claim 14, wherein said obtaining step includes the steps of obtaining said categories of meanings at each of a plurality of different levels of abstraction for each said word, said levels of abstraction ranging progressively from a lowest level of abstraction which is a more specific category of meaning of the word to a highest level of abstraction which is a more general category of meaning of the word, wherein said at least one category of meaning for each said word includes categories of meanings from each said level of abstraction for each said word, and wherein said determining step includes the step of determining common elements within each said level of abstraction for each said word, and further comprising the step of using said common categories of meanings as abstracted meanings for the sequence.

16. A method as in claim 15, further comprising the step of processing said abstracted meanings in said machine by correlating categories of meanings within one of said levels of abstraction to categories of meanings within another level of abstraction.

17. A method as in claim 16, wherein said processing step includes the steps of:

deriving and storing a plurality of pre-formed rules for correlating one of said levels of abstraction with another level of abstraction.

18. A method as in claim 17, wherein said rule is derived by:

taking a first level of abstraction and processing each category of meaning in said first level of abstraction by correlating categories of meanings within each of a second level of abstraction more specific than said first level of abstraction to get a plurality of results in each said second level of abstraction for each said first level of abstraction,

mapping each said result in said second level into a third level of abstraction more general than said first level; and

determining if any of said mapped third level results are present and using said second level of abstraction for said present third level of abstraction as results.

19. An apparatus for automatically abstracting a meaning from a sequence of words, using a machine comprising:

means for receiving a sequence of words;

database means, including a prestored plurality of categories of meaning for each said word, for producing at least one category of meaning for each of said words of said sequence; and

processing means for automatically selecting certain ones of said plurality of categories of meanings by determining categories of meanings which are common for different ones of said respective words, said common categories of meanings being used as abstracted meanings for the sequence.

20. An apparatus as in claim 19, wherein said database means stores a plurality of pre-stored categories of meanings at different levels of abstraction for each said word, said levels of abstraction ranging progressively from a lowest level of abstraction which is a more specific category of meaning of the word to a last level of abstraction which is a more general category of meaning of the word, wherein said at least one category of meaning for each said word includes categories of meanings from each said level of abstraction for each said word, and wherein said processing means includes means for determining common elements within each said level of abstraction for each said word, and using said common elements for each said level of abstraction as said abstracted meanings within said each level of abstraction.

21. An apparatus as in claim 20, wherein said processing means includes:

means for storing a multiple-dimensional array of information from said database means, each said level of abstraction representing one dimension of the array, and each word of the sequence representing another dimension of the array, and

means for determining common categories of meanings among said words, within each said one dimension representing each said level of abstraction.

22. An apparatus as in claim 20, wherein said processing means processes said abstracted meanings by correlating categories of meaning within one of said levels of abstraction to categories of meaning within another level of abstraction.

23. An apparatus as in claim 20, wherein said sequence of words is in natural language, and said database means is a Roget-type database having multiple levels of abstraction.

24. An apparatus as in claim 23, wherein said database is Roget's thesaurus, and has four levels of abstraction.

25. An apparatus as in claim 19, wherein said processing means includes means for:

storing an array of information from said database means, each element of the array including said at least one category of meaning for said each word, and

determining common ones of said categories of meaning.

26. An apparatus for automatically abstracting a meaning from a sequence of words using a machine, comprising:

means for receiving a sequence of words;

a database processor of the type that includes a plurality of categories of meanings at different levels of abstraction for each said word, each said level of abstraction ranging progressively from a lowest level which is a more specific category of meaning of the word to a last level which is a more general category of meaning of the word, for receiving said sequence of words and analyzing each said word to obtain a plurality of categories of meaning for each said word which includes category of meanings from each said level of abstraction for each said word of said sequence; and

processing means for automatically determining ones of said categories of meanings within each said level of abstraction which are common for different ones of said respective words and using said common categories of meanings as abstracted meanings for the sequence.

27. An apparatus as in claim 26, wherein said processing means comprises means for further processing said abstracted meanings to obtain a plurality of further categories of meaning into which said sequence belongs.

28. An apparatus as in claim 26, further comprising the step of processing said abstracted meanings by correlating categories of meanings within one of said levels of abstraction to categories of meanings within another level of abstraction.

29. An apparatus as in claim 26, wherein said sequence is in natural language, and said database is a Roget-type database.

30. An apparatus for automatically abstracting a category of meaning from a sequence of words using a machine, comprising:

means for obtaining, for each of a sequence of words, pointers to all categories of meanings which said each word can convey; and

means for automatically determining which are the most likely categories of meanings to be intended by the sequence, by determining ones of said categories of meanings which are common for different ones of said respective words.

31. An apparatus as in claim 30, wherein said obtaining means includes means for obtaining said categories of meanings at each of a plurality of different levels of abstraction for each said word, said levels of abstraction ranging progressively from a lowest level of abstraction which is a more specific category of meaning of the word to a highest level of abstraction which is a more general category of meaning of the word, wherein said at least one category of meaning for each said word includes categories of meanings from each said level of abstraction for each said word, and wherein said determining means includes means for determining common elements within each said level of abstraction for each said word.

32. An apparatus as in claim 31, wherein said determining means further comprises means for processing said abstracted category of meaning by correlating categories of meanings within one of said levels of abstraction to category of meanings within another level of abstraction.

33. An apparatus as in claim 32, wherein said processing means includes means for:

deriving and storing a plurality of pre-formed rules for correlating one of said levels of abstraction with another level of abstraction.

34. An apparatus as in claim 33, wherein said processing means includes means for deriving said rule by:

taking a first level of abstraction and processing each category of meaning in said first level of abstraction by correlating categories of meanings within each of a second level of abstraction within said first level of abstraction, and which second level of abstraction is one more specific than said first level of abstraction, to get a plurality of results in each said second level of abstraction for each said first level of abstraction,

mapping each said result in said second level into a third level of abstraction more general than said first level; and

determining if any of said mapped third level results are present and using said second level of abstraction for said present third level of abstraction as results.
Description
 


FIELD OF THE INVENTION

The present invention defines a system for abstracting concepts from natural language. This system enables an input sequence of words in natural language to be abstracted into categories of meanings which represent the concepts underlying the natural language phrases. This technique has application to classifying the information in a sequence for use in expert systems, database processing, artificial intelligence, and other similar fields of endeavor.

BACKGROUND OF THE INVENTION

Human beings communicate ideas with one another using a mechanism known as natural language. Natural language evolved as a medium of communications, as human beings learned to communicate with one another. The specific evolution and structure of our natural language is of little concern to all of us when we speak or write in it. However, due to the inherent structure of natural language, it is an imperfect mechanism for conveying ideas. The human brain, through its intricate and somewhat intuitive workings, translates natural language into concepts and ideas, and allows communications between different individuals using natural language through a complicated translation process that no machine has been able to duplicate.

Machines have no intuitive conceptualization capabilities as of this date (1989). Therefore many difficulties have been encountered in the attempt to teach machines to understand natural language. Machines communicate in languages which were specially invented for the machines and which do not include the ambiguities endemic to natural, evolved languages.

One of the most crucial problems in the artificial intelligence field is the presentation to a machine of the conceptual knowledge behind natural language.

Conceptual knowledge is the knowledge which is known by a human being as a function of semantic meaning. Aristotle had the view (known as the Aristotelian view) that meanings are represented most commonly in concepts which are actualized by words. For instance, when a human being uses the word "chair", the purpose of the word is not to use the actual word itself (e.g. not the letters c·h·a·i·r), but to denote the meaning of the item (a chair) behind the word "chair". The interpretation of language (such as the word chair) as a whole functions primarily at the conceptual level. That is, in our example we interpret the language which we hear (the compressions and rarefactions of air which comprise the sounds comprising the word "chair") as the item which is represented by the word "chair."

Aristotle's line of reasoning implies that meaning has three elements. The first element is the word itself, which differs from language to language, and even within its own language has inherent ambiguities. For instance, the word "chair" as a noun may mean a seat, a chair rail, or a chair lift, or as a verb may imply leadership, e.g., to chair a committee. The second element is the mental image of the concept in the mind, which is signified by the word and is the same for all people. This concept, for instance, is the one which the listener develops when he hears the word "chair". The third element is the thing itself, here the chair itself, which is the same for all people.

The second and third elements, the concepts and the things themselves, were believed by Aristotle to be the same for all people. It is only the words which differ.

Moreover, the concepts and the things themselves are totally unambiguous, while most words in natural language are ambiguous.

SUMMARY OF THE INVENTION

The present invention has an object of translating a sequence of words in natural language into a group of concepts based on the sequence, the concepts being of a type which are common to all people. The present invention links each word in a phrase to the concepts behind the word, to link the word to the concept.

The present invention defines a method and apparatus which translates each natural language word into a plurality of categories of meaning. Almost all words in natural language are ambiguous, and almost all words have such a plurality of categories of meaning. After each word is translated into the plurality of categories of meaning which it can represent, common ones of said categories of meaning are ascertained. This determination of commonality indicates that these common categories of meaning may be the actual meanings conveyed by the phrase.

According to a preferred embodiment of the invention, a database is used which includes a plurality of usages of each word organized into different levels of generality, called levels of abstraction. The preferred mode of the invention uses Roget's International Thesaurus, 4th Edition, to obtain these usages of each word. Roget's classification organizes each word into its conceptual meanings based on the membership of the word in the cluster of linguistic signs that go to make up some large and very general concept. Such a database will be referred to throughout this application as a Roget-type database. Roget's Index to Classifications is reproduced as an Appendix to this application. It includes four levels ranging from most specific to least specific, and all words in the English language can be classified into these categories.

A best mode of the present invention implements Roget's taxonomy along with the Principle of Commonality. This principle extracts the common elements within each level of abstraction according to Roget's taxonomy. The program takes as input a sequence of natural language words. Each word is traced through Roget's taxonomy from the most specific to the most general. Each level of abstraction is then examined to determine elements within the level of abstraction which are common to any two elements of the phrase. The common elements receive a certainty value, based on the number of occurrences of the element. The common elements and certainty value are used to generically describe the knowledge contained in the phrase.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other advantages of the invention will be described in detail with reference to the accompanying drawings, in which:

FIG. 1 shows an example of Roget's classification system using four progressively more specific levels of abstraction, as used in the preferred mode of the present invention;

FIG. 2 shows a general flowchart of the operation of the present invention;

FIG. 3 shows a more detailed flowchart of the operation of classifying the words according to their meaning;

FIG. 4 shows detail on how each word is classified into the various levels of abstraction;

FIG. 5 shows a sample hardware layout of the present invention; and

FIG. 6 shows a flow diagram of the operation of the rule used to correlate level of abstraction 2 with level of abstraction 3.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The inventor of the present invention was the first to recognize an alternative use of a database which includes meanings of words organized by levels of abstraction. One such database is found in Roget's International Thesaurus.

A thesaurus is a tool for transforming ideas into words. Its most common use is for determining synonyms or antonyms. When a writer has an idea (a conceptualization of a meaning) and wants a word for it, the writer consults a thesaurus in order to determine all of the possible words for the idea.

The inherent organization of a thesaurus is by conceptualization of ideas, organized into words which express the ideas. The inventor of the present invention was the first to realize that this organization has an application when it is used in reverse--to obtain the ideas behind words from the words themselves.

The inventor has also pioneered a technique for using these meanings, and has titled it "the principle of commonality". This technique is used for determining which ones of the many meanings are applicable to a sequence of words or phrase. This phrase may, of course, be a sentence. This principle finds common meanings between various words of a sequence and uses these common meanings as the key meanings: those which are most likely to be the ones of the meanings intended to be conveyed by the phrase.

The present invention has been embodied using Roget's International Thesaurus, 4th Edition, as the preferred embodiment of a database for obtaining ideas and meanings behind various words. However, it should be understood that any similarly organized database or classification scheme could be equivalently used.

Peter Mark Roget organized his thesaurus in 1852 by producing a finite list of unambiguous concepts, or categories of meaning, and identifying the relationship among these concepts as a hierarchy. Each ordinary word in natural language was mapped from the natural language into a hierarchy of categories of meaning, so that each word has its many meanings related into many of the single concepts, each of which is unambiguous. A proposed use for this is to determine synonyms.

Roget's conceptual taxonomy is organized according to the structure of FIG. 1. This cultural taxonomy is organized into four hierarchical levels of abstraction. The fourth level of abstraction is the most general, and has eight categories of meaning. All of these eight categories of meaning are shown in FIG. 1, and include ABSTRACT RELATIONS; SPACE; PHYSICS; MATTER; SENSATION; INTELLECT; VOLITION; and AFFECTIONS. All words in the natural English language can be classified into one of these eight, most-general categories of meaning within the fourth (most-general) level of abstraction. In fact, Roget has done so throughout his Thesaurus.

Within, and more specific than, each of these categories of meaning at the fourth level of abstraction are a number of categories at the third levels of abstraction. For instance, within the category of meaning at the fourth level of abstraction known as MATTER, there are three categories of meaning at the third level of abstraction: MATTER IN GENERAL; INORGANIC MATTER; and ORGANIC MATTER. All other categories of meaning at the fourth level of abstraction have a similar organization of categories at the third level.

Each of the categories of meaning within the third level of abstraction can be expanded into a number of more specific categories of meaning within a second level of abstraction. For instance, inorganic matter can be expanded into MINERAL KINGDOM; SOIL; LIQUID; and VAPORS.

Finally, the first level of abstraction and its associated categories of meaning are the most specific, and each of the second categories of meaning are expanded into first categories of meaning. For instance, SOIL can be expanded into LAND; BODY OF LAND; and PLAIN.

Roget has classified each word in natural language into categories of meaning within each of the four levels of abstraction. Some of these four levels of categories of meaning may not exist, however, for certain words. For instance, within the level of abstraction 4, the category of meaning MATTER has a third level of abstraction category of meaning labelled as MATTER IN GENERAL. Under this third category of meaning is no second category of meaning, but rather the third category of meaning MATTER IN GENERAL has only first level of abstraction categories of meaning depending therefrom, including UNIVERSE; MATERIALITY; IMMATERIALITY; MATERIAL; CHEMICAL; OILS AND LUBRICANTS; and RESINS AND GUMS.
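The hierarchy described above, including the case where a level is missing, can be sketched as a table of parent links. This is only an illustrative sketch: the `PARENT` table and the `trace` function are hypothetical names, and the table holds just a few of the category names quoted in the text, not Roget's full classification.

```python
# Toy slice of the hierarchy, using only category names quoted in the text.
# Each entry maps a category to (level of abstraction, parent category).
# Level 4 is the most general; its parent is None.
PARENT = {
    "MATTER": (4, None),
    "MATTER IN GENERAL": (3, "MATTER"),
    "INORGANIC MATTER": (3, "MATTER"),
    "ORGANIC MATTER": (3, "MATTER"),
    "SOIL": (2, "INORGANIC MATTER"),
    "LAND": (1, "SOIL"),
    # UNIVERSE hangs directly off a level-3 category: no level-2 entry exists.
    "UNIVERSE": (1, "MATTER IN GENERAL"),
}


def trace(category):
    """Walk upward from a category and return {level: category or None}."""
    path = {level: None for level in (1, 2, 3, 4)}
    while category is not None:
        level, parent = PARENT[category]
        path[level] = category
        category = parent
    return path


land_path = trace("LAND")
universe_path = trace("UNIVERSE")
```

Here `land_path` fills all four levels, while `universe_path` leaves level 2 as `None`, mirroring the MATTER IN GENERAL case in which no second-level category exists.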

The inventor has determined that using this hierarchy allows each word in a phrase of a natural language to be grouped according to its meaning. Each "word" in a phrase can be analyzed to determine the appropriate categories of meaning from Roget's classification that apply thereto. These categories of meaning from the four levels of abstraction are stored as functions of their level of abstraction. When the entire phrase has been analyzed, common categories of meaning within levels of abstraction are determined and assigned a certainty value based on how many times they occur. If any two words in the phrase have a common category of meaning within any level of abstraction, there is a high probability that this is a meaning behind these words when used in the phrase. However, the commonality need not be among all words in the phrase; any two occurrences of a category of meaning within any level of abstraction suffice.
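The commonality step can be sketched as a per-level count of how many words share each category. This is a minimal sketch under stated assumptions: the function name `common_categories` and the input layout are hypothetical, the raw count stands in for the certainty value (the text says only that it is based on the number of occurrences), and the categories given for the second word, "kindness", are invented for illustration rather than taken from Roget's Thesaurus.

```python
from collections import Counter


def common_categories(word_categories):
    """Find categories shared by at least two words, level by level.

    word_categories: {word: {level: set of category names}}
    Returns {level: {category: number of words that carry it}}.
    """
    result = {}
    for level in (1, 2, 3, 4):
        counts = Counter()
        for levels in word_categories.values():
            # Each word contributes at most once per category (sets).
            counts.update(levels.get(level, set()))
        result[level] = {cat: n for cat, n in counts.items() if n >= 2}
    return result


phrase = {
    # Levels 2-4 for "human" follow the example in the specification.
    "human": {
        4: {"MATTER", "AFFECTIONS"},
        3: {"ORGANIC MATTER", "SYMPATHETIC AFFECTIONS"},
        2: {"MANKIND", "BENEVOLENCE", "SYMPATHY"},
    },
    # Hypothetical assignment, for illustration only.
    "kindness": {
        4: {"AFFECTIONS"},
        3: {"SYMPATHETIC AFFECTIONS"},
        2: {"BENEVOLENCE"},
    },
}

shared = common_categories(phrase)
```

With this toy input, `shared` keeps AFFECTIONS, SYMPATHETIC AFFECTIONS and BENEVOLENCE, each with a count of 2, and drops the MATTER branch because only one word carries it.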

The term "word" is used loosely throughout the specification and claims, and refers to a structure which may be more than one natural language word. For instance, the word may include an article (such as "a word") or may be an entire verb such as "to obtain". It should be understood throughout this specification and claims that this terminology of "a word" refers to one conceptual structure to be analyzed, and does not necessarily refer only to a single natural language word.

As explained above, Roget's conceptual hierarchy can be used to trace the categories of meaning for a word through different levels of abstraction. The way in which this is commonly looked up using Roget's levels of abstraction will be demonstrated herein using the word "human" as an example.

When the word "human" is located in the Thesaurus, the following entry is found:

HUMAN

N. PERSON 417.3

ADJ. KIND 938.13

MANKIND 417.10

PITYING 944.7

This shows four possible meanings of the word HUMAN and the different levels of abstraction within which these meanings lie. The meanings are either in a fourth level of abstraction category called MATTER (including numbers 375-421) or a category called AFFECTIONS (including numbers 855-1042). (See Appendix showing the synopsis of categories of Roget's Thesaurus.) Within each of these categories of meaning at the fourth level of abstraction is a plurality of categories of meaning for the third level of abstraction, which include ORGANIC MATTER (including categories of meaning 406-421, in which MANKIND (417.10) and PERSON (417.3) are included) and SYMPATHETIC AFFECTIONS (922-956, in which KIND (938.13) and PITYING (944.7) are included). Within each of these categories of meaning at the third level of abstraction is a plurality of categories of meaning for the second level of abstraction covering the meanings of HUMAN, which include MANKIND (417-418), BENEVOLENCE (938-943) and SYMPATHY (944-948). The word HUMAN can be mapped using this kind of exercise to obtain the possible semantic ideas behind the word. The correct one of these meanings is determined using the principle of commonality.
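The lookup just traced for HUMAN can be sketched as a range search on the classification number. The names LEVEL_RANGES and classify are hypothetical, and the table holds only the ranges quoted above; a complete table would have to be built from the full synopsis of categories. Pointers are kept as strings on the assumption that 417.10 and 417.1 are distinct paragraph numbers.

```python
# Number ranges quoted in the text for the word HUMAN, keyed by level of
# abstraction (4 = most general). Hypothetical layout, not a full table.
LEVEL_RANGES = {
    4: [((375, 421), "MATTER"), ((855, 1042), "AFFECTIONS")],
    3: [((406, 421), "ORGANIC MATTER"), ((922, 956), "SYMPATHETIC AFFECTIONS")],
    2: [((417, 418), "MANKIND"), ((938, 943), "BENEVOLENCE"),
        ((944, 948), "SYMPATHY")],
}


def classify(pointer):
    """Map a pointer such as '417.3' to {level: category}, using only the
    integer part of the classification number."""
    number = int(pointer.split(".")[0])
    result = {}
    for level, ranges in LEVEL_RANGES.items():
        for (low, high), name in ranges:
            if low <= number <= high:
                result[level] = name
    return result


for pointer in ("417.3", "417.10", "938.13", "944.7"):
    print(pointer, classify(pointer))
```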

Roget's taxonomy of concepts is similar to what John Haugeland has called "stereotypes". However, Roget's system supplies the pointers to pick out the stereotypes according to our own natural language. The taxonomy reaches toward the deeper level to a level of communication, and is organized according to our natural language to enable us to more easily grasp the ideas behind our natural language.

One of the most important aspects about Roget's taxonomy is that it is not domain-specific. It not only contains information about tangible concepts such as physics and matter, but also about intangible concepts such as affection and sensations.

Although the preferred embodiment is described herein using Roget's taxonomy and Roget's Thesaurus, it should be understood that any similar database which includes information about uses of words could similarly be used. Roget's Thesaurus is the most comprehensive database of this type known to the inventor of the present invention, and therefore, has been used as the preferred mode. However, any other similarly organized database could alternatively be used, and should be considered equivalent in this regard.

The present invention makes use of a relation found by the inventor, using a principle of commonality, implemented into a program which the inventor has called "WordMap". This program implements Roget's taxonomy along with the principle of commonality. This principle extracts the common categories of meaning within each level of abstraction according to Roget's taxonomy. The program takes as input a sequence of natural language words. Each word is traced through Roget's taxonomy from the most specific level of abstraction to the most general level of abstraction. The categories of meaning within any level of abstraction are then examined to determine common elements within the level of abstraction. The common elements are used to generically describe the knowledge contained in the phrase.

The wordmap program will now be described in detail with reference to the accompanying figures of the present invention representing flow charts of the wordmap program. FIG. 2 shows a most general flow chart of the wordmap program. FIG. 2 starts with step 200 in which the sequence of natural language words which is to be analyzed is obtained. Step 202 follows, in which a pointer, which is in this embodiment Roget's classification number, is determined for each word to indicate a category of meaning for each of the plural levels of abstraction in the pre-stored database. For instance, the pointers for the word human are the numbers 417.3, 417.10, 938.13 and 944.7.

At step 204, a determination of commonality of categories of meaning, and a determination of a certainty value for this commonality, is made within each of the multiple levels of abstraction for the words of the phrase. Any common categories of meaning within a particular level of abstraction are taken as "answers", and the certainty value indicates how many times the commonality occurs, therefore indicating how certain it is to have occurred. At step 206, each of the common categories of meaning is stored. At step 208 these common categories are processed according to rules for the specific application. One example of such a rule will be described below. This ends the process.

A more detailed flow chart of steps 200-206 is shown in FIG. 3. At step 300, a variable x is set to 1. The variable x is used to denote the word number within the word sequence that is currently being processed. Step 302 is executed to access the word pointed to by the variable x. The particular word being processed is correlated to determine its categories of meaning at the various levels of abstraction within the pre-stored database at step 304 and the resulting categories of meaning are stored in an array at an address of the array referenced by array [x, LOA].

A test is made at step 306 to determine if there is another word within the word sequence. If the answer is yes, the variable x is incremented at step 308, followed by control passing to point 301 where the next word pointed to by the variable x (x now having been incremented) is obtained. If no other word is detected at step 306, the entire word sequence has been entered into the array, and the principle of commonality is applied beginning at step 310.
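Steps 300 through 308 can be sketched as a single loop that fills the array one word at a time. This is a sketch under assumptions: build_array and the lookup callable are hypothetical stand-ins for step 304, and the array is modelled as a dictionary keyed by (x, LOA) holding a set of categories.

```python
def build_array(words, lookup):
    """Fill array[(x, LOA)] for every word x (1-based) and level 1-4.
    lookup(word) is assumed to return {level: set of categories}."""
    array = {}
    if not words:
        return array, 0
    x = 1                                      # step 300
    while True:
        word = words[x - 1]                    # step 302
        by_level = lookup(word)                # step 304
        for loa in (1, 2, 3, 4):
            # a level with no category for this word is stored as an empty set
            array[(x, loa)] = set(by_level.get(loa, ()))
        if x < len(words):                     # step 306: another word?
            x += 1                             # step 308
        else:
            break
    return array, x
```

The function also returns the final value of x, the number of words entered, which serves as the upper bound when the principle of commonality is applied to the array.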

The principle of commonality is applied by processing the two-dimensional array to determine common elements within each particular level of abstraction labeled as levels 1-4, and certainty values based on a number of occurrences of commonality. This involves setting up two nested loops. A first nested loop is set up at step 310, from 1 to 4, each element representing one of the multiple levels of abstraction. The second nested loop is between 1 and x, so that one word in the sequence is processed in each pass.

For each word in the word sequence, the contents of the array at [N1, N] are determined at step 314 (N1 being level of abstraction, N being word number x). Each meaning of each word identified by variable x in the sequence is entered into temporary memory while the level of abstraction remains the same. Step 316 executes the "next" loop. At step 318, therefore, all of the meanings for all of the words in the particular level of abstraction (N) have been entered into the temporary memory. Step 318 finds all common categories of meaning and stores these common categories as the result for level N, as well as counting the number of common occurrences to store this as the certainty value. This is followed at step 320 by an incrementing of the level of abstraction, to the next level of abstraction.
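The two nested loops of steps 310 through 320 might look like the sketch below. The function name apply_commonality is hypothetical, and the array is assumed to be a dictionary keyed by (word number, level of abstraction) that holds a set of categories in each cell, so that a category is counted once per word.

```python
from collections import Counter


def apply_commonality(array, word_count, levels=(1, 2, 3, 4)):
    """Return {level: {category: certainty}} for categories that occur in
    at least two words at the same level of abstraction."""
    results = {}
    for level in levels:                          # outer loop, step 310
        temp = []                                 # temporary memory
        for x in range(1, word_count + 1):        # inner loop over words
            temp.extend(array.get((x, level), set()))   # step 314
        counts = Counter(temp)                    # step 318: count occurrences
        results[level] = {c: n for c, n in counts.items() if n >= 2}
    return results                                # one result set per level


# Illustrative two-word array; the categories are examples only.
demo = {
    (1, 1): {"KNOWLEDGE"},
    (2, 1): {"KNOWLEDGE", "ABSENCE"},
    (1, 4): {"INTELLECT"},
    (2, 4): {"SPACE"},
}
print(apply_commonality(demo, 2))
```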

A brief explanation of the hierarchy and of how each meaning for each word is obtained will be given with reference to FIG. 4. FIG. 4 corresponds substantially to step 304 in FIG. 3. Step 400 looks up word x in the pre-stored database to get its pointer into the many levels of abstraction. This number is processed to get the categories of meaning in each of the four levels of abstraction according to Roget's classification system at step 402. The categories of meaning comprising the four levels of abstraction are then stored at step 404, followed by step 306 (FIG. 3).

A first example of the operation of the wordmap program explained with reference to FIGS. 2-4 will now be given with reference to Tables 1 and 2 which follow:

TABLE 1

WE HAVE NOTHING IN COMMON

I. Event Nonexistence Newness Unimportance Receiving Absence Location Mediocrity Production; Birth Unimportance Interiority Language Knowledge Unsubstantiality Reception Plain Speech Permission Friendship Knowledge Affirmation Ingress; Entrance Mankind Memory Plants Intelligibility Frequency Deception Generality Compulsion Normality Possession Correlation Retention Inferiority Mean Prose Cooperation Property Participation Amusement Vulgarity Commonality Plainness Dullness

II. Possession Being in Abstract Relative Space Mankind Power in Operation Being in Concrete Social Affections Esteem Comprehension Existence in Space Time w/Reference Recurrent Time Acceptance Adaptation to Time to Age Language Consent Motion w/Reference Sharing Recollection to Direction Comprehension Transfer of Property External & Internal Support Dimensions Possession Adaptation to Ends Absolute Relation Conformity to Rule Distributive Order Comparative Quantity Discriminative Affections Linguistic Representation Recurrent Time Vegetable Life Pleasure; Pleasurableness Style; Mode of Expression

TABLE 2

WE HAVE NOTHING IN COMMON

III. Event Existence Time Quantity Power Space in General Dimensions Relation States of Mind Conditions Motion Order Authority; Control Sympathetic Time Affections Possessive Relations Space in Conditions General Communication of Ideas Support; Opposition Intellectual Faculties Organic Matter & Processes Communication of Ideas Intellectual Faculties & Processes Possessive Relations Personal Affections

IV. Intellect Space Affections Matter Volition Volition Space Intellect Abstract Abstract Abstract Volition Relations Relations Relations Abstract Relations Affections

Tables 1 and 2 show the phrase WE HAVE NOTHING IN COMMON being analyzed and processed, and show the categories of meaning for each word in the phrase, grouped by word and level of abstraction. All common elements are underlined. Level of abstraction IV is the most general level and level I is the most specific level.

The phrase WE HAVE NOTHING IN COMMON is processed in level I (most specific) to obtain the common categories of UNIMPORTANCE and KNOWLEDGE. These categories relate to the phrase as follows:

First of all, this statement is about knowledge of the person making the statement (KNOWLEDGE). The import of this phrase comes from the meaning of the sentence: "We have nothing in common" generally means that our relationship is not important (UNIMPORTANCE).

Within the level of abstraction II (second most specific), the common elements are POSSESSION, COMPREHENSION, and ADAPTATION TO ENDS. The statement WE HAVE NOTHING IN COMMON is a statement of the user's comprehension. POSSESSION is also an indication that the possessions between the speaker and the speakee have nothing in common. ADAPTATION TO ENDS relates to the inherent meaning of the phrase, that there is no adaptation between the speaker and the speakee.

Level of abstraction III has the common elements of POSSESSIVE RELATIONS, COMMUNICATION OF IDEAS, INTELLECTUAL FACULTIES AND PROCESSES, CONDITIONS, and TIME. Among these, perhaps the most interesting is that of POSSESSIVE RELATIONS. The phrase WE HAVE NOTHING IN COMMON is the type of phrase which might be said by one person who knows another, as an intimate statement to the other. The principle of commonality has enabled this meaning to be gleaned from the phrase, even though no one word by itself might be indicative of this relationship.

The wordmap program has been explained above, and is used according to the present invention to abstract meaning from words within a sequence of words. Many uses for this program would be apparent to those having ordinary skill in the art. For example, if the wordmap program were applied to semantic analysis, it could be used for natural language programs. The program could be used in expert systems for knowledge organization. This technique could be used in software engineering for systems specification. In programming languages, the present invention could be used for form, syntax and structure. In information systems, the present invention could find application in preparation of structure, organization and queries. In software libraries, this could be used for queries. In criminal investigations, it could be used for information structuring. Finally, in writing tools, the present application could be used for organizing information.

These examples are not intended to be limiting, and future research would be expected to obtain results which were usable in machine learning, man-machine interfaces, database queries and organization, natural language generation, programming languages, foreign interpretation, composition and text processing tools, expert systems design, and understanding, representing and implementing knowledge based systems.

The information obtained above from the phrase WE HAVE NOTHING IN COMMON gives unambiguous meanings for the words in the phrase. This is further processed according to the specific end application, so that it is output from this program in a more organized way. For instance, the information output in the description given above with respect to Tables 1 and 2, is processed using the processing step 208 of FIG. 2. One example of this processing step will be explained herein, it being understood that this example is only one of many ways in which the information could be processed.

The example provided herein is one rule which is used according to the application of specifying software structures. The example given is that of writing a program to correlate the curves of LINEAR, LOGARITHMIC, SEMI-LOGARITHMIC, HYPERBOLIC and RECIPROCAL. These words are input to the wordmap program. The common categories of meaning within each level of abstraction, where Level of Abstraction IV is the most general level of abstraction and Level of Abstraction I is the most specific level of abstraction, are as follows:

______________________________________ WORDMAP RESULTS - COMMONALITY GENERAL ______________________________________ IV. ABSTRACT RELATIONS SPACE SENSATION INTELLECT VOLITION III. RELATION ORDER NUMBER