Claims
What is claimed is:
1. A method for automatically determining a meaning from a sequence of
words in a machine, comprising the steps of:
receiving a sequence of words in the machine;
using said machine to access a database that includes a prestored plurality
of categories of meaning for each said word, to obtain at least one
category of meaning for each of said words of said sequence; and
using said machine to select certain ones of said plurality of categories
of meanings by determining ones of said categories of meanings which are
common for different ones of said respective words, said common categories
of meanings being used as abstracted meanings for the sequence.
2. A method as in claim 1, wherein said database is of the type that
includes a plurality of pre-stored categories of meanings at different
levels of abstraction for each said word, said levels of abstraction
ranging progressively from a first level of abstraction which is a more
specific category of meaning of the word to a last level of abstraction
which is a more general category of meaning of the word, wherein said at
least one category of meaning for each said word includes categories of
meanings from each said level of abstraction for each said word, and
wherein said selecting step includes the step of determining common
categories of meaning within each said level of abstraction for each said
word, and using said common categories of meaning for each said level of
abstraction as said abstracted meanings within said each level of
abstraction.
3. A method as in claim 2, wherein said using said machine to select step
includes the steps of:
storing a multiple-dimension array of information from said database, each
said level of abstraction representing one dimension of the array, and
each word of the sequence representing another dimension of the array, and
using said machine to determine common categories of meanings and numbers
of occurrences of said common categories of meanings among said words,
within each said one dimension representing each said level of
abstraction.
4. A method as in claim 2, further comprising the step of processing said
abstracted meaning by correlating categories of meanings within one of
said levels of abstraction to categories of meanings within another level
of abstraction.
5. A method as in claim 2, wherein said sequence of words is in natural
language, and said database is a Roget-type database stored in said
machine having multiple levels of abstraction.
6. A method as in claim 5, wherein said database is Roget's thesaurus
stored in said machine and has four levels of abstraction.
7. A method as in claim 1, wherein said using said machine to select step
includes the steps of:
storing an array of information from said database, each element of the
array including said at least one category of meaning for one of said
words, and
determining common ones of said categories of meanings.
8. A method for automatically abstracting a meaning from a sequence of
words using a machine, comprising the steps of:
receiving a sequence of words in said machine;
accessing a database with said machine of the type that includes a
plurality of categories of meanings at different levels of abstraction for
each said word, each said level of abstraction ranging progressively from
a first level which is a more specific category of meaning of the word to
a last level which is a more general category of meaning of the word, to
obtain a plurality of categories of meanings for each said word which
includes categories of meanings from each said level of abstraction for
each said word of said sequence;
using said machine to determine ones of said categories of meanings within
each said level of abstraction which are common for different ones of said
respective words; and
using said common categories of meanings as abstracted meanings for the
sequence.
9. A method as in claim 8, comprising the further step of processing said
abstracted meanings to obtain a plurality of classifications into which
said sequence belongs.
10. A method as in claim 8, further comprising the step of processing said
abstracted meanings by correlating categories of meanings within one of
said levels of abstraction to categories of meanings within another level
of abstraction.
11. A method as in claim 8, wherein said sequence is in natural language,
and said database is a Roget-type database stored in said machine.
12. A method as in claim 11, wherein said database is Roget's International
Thesaurus stored in said machine.
13. A method for automatically using a Roget-type database that includes a
plurality of categories of meanings at different levels of abstraction for
a plurality of words, each said level of abstraction ranging progressively
from a first level which is a more specific category of meaning of the
word to a last level which is a more detailed category of meaning of the
word to abstract a category of meaning from a sequence of words in natural
language, comprising the steps of:
receiving a sequence of words in natural language in a machine;
accessing said Roget-type database to obtain a plurality of categories of
meanings for each said word from each said level of abstraction for each
said word of said sequence;
using said machine to determine commonality by determining ones of said
categories of meanings within each said level of abstraction which are
common for different ones of said respective words at said each level of
abstraction; and
using said common categories of meanings as abstracted meanings for the
sequence.
14. A method for automatically abstracting a category of meaning from a
sequence of words using a machine, comprising the steps of:
using said machine to obtain for each of a sequence of words, pointers to
all categories of meanings which said each word can convey; and
determining which are the most likely categories of meanings to be intended
by the sequence, by determining ones of said categories of meanings which
are common for different ones of said respective words.
15. A method as in claim 14, wherein said obtaining step includes the steps
of obtaining said categories of meanings at each of a plurality of
different levels of abstraction for each said word, said levels of
abstraction ranging progressively from a lowest level of abstraction which
is a more specific category of meaning of the word to a highest level of
abstraction which is a more general category of meaning of the word,
wherein said at least one category of meaning for each said word includes
categories of meanings from each said level of abstraction for each said
word, and wherein said determining step includes the step of determining
common elements within each said level of abstraction for each said word,
and further comprising the step of using said common categories of
meanings as abstracted meanings for the sequence.
16. A method as in claim 15, further comprising the step of processing said
abstracted meanings in said machine by correlating categories of meanings
within one of said levels of abstraction to categories of meanings within
another level of abstraction.
17. A method as in claim 16, wherein said processing step includes the
steps of:
deriving and storing a plurality of pre-formed rules for correlating one of
said levels of abstraction with another level of abstraction.
18. A method as in claim 17, wherein said rule is derived by:
taking a first level of abstraction and processing each category of meaning
in said first level of abstraction by correlating categories of meanings
within each of a second level of abstraction more specific than said first
level of abstraction to get a plurality of results in each said second
level of abstraction for each said first level of abstraction,
mapping each said result in said second level into a third level of
abstraction more general than said first level; and
determining if any of said mapped third level results are present and using
said second level of abstraction for said present third level of
abstraction as results.
19. An apparatus for automatically abstracting a meaning from a sequence of
words, using a machine comprising:
means for receiving a sequence of words;
database means, including a prestored plurality of categories of meaning
for each said word, for producing at least one category of meaning for
each of said words of said sequence; and
processing means for automatically selecting certain ones of said plurality
of categories of meanings by determining categories of meanings which are
common for different ones of said respective words, said common categories
of meanings being used as abstracted meanings for the sequence.
20. An apparatus as in claim 19, wherein said database means stores a
plurality of pre-stored categories of meanings at different levels of
abstraction for each said word, said levels of abstraction ranging
progressively from a lowest level of abstraction which is a more specific
category of meaning of the word to a last level of abstraction which is a
more general category of meaning of the word, wherein said at least one
category of meaning for each said word includes categories of meanings
from each said level of abstraction for each said word, and wherein said
processing means includes means for determining common elements within
each said level of abstraction for each said word, and using said common
elements for each said level of abstraction as said abstracted meanings
within said each level of abstraction.
21. An apparatus as in claim 20, wherein said processing means includes:
means for storing a multiple-dimensional array of information from said
database means, each said level of abstraction representing one dimension
of the array, and each word of the sequence representing another dimension
of the array, and
means for determining common categories of meanings among said words,
within each said one dimension representing each said level of
abstraction.
22. An apparatus as in claim 20, wherein said processing means processes
said abstracted meanings by correlating categories of meaning within one
of said levels of abstraction to categories of meaning within another
level of abstraction.
23. An apparatus as in claim 20, wherein said sequence of words is in
natural language, and said database means is a Roget-type database having
multiple levels of abstraction.
24. An apparatus as in claim 23, wherein said database is Roget's
thesaurus, and has four levels of abstraction.
25. An apparatus as in claim 19, wherein said processing means includes
means for:
storing an array of information from said database means, each element of
the array including said at least one category of meaning for said each
word, and
determining common ones of said categories of meaning.
26. An apparatus for automatically abstracting a meaning from a sequence of
words using a machine, comprising:
means for receiving a sequence of words;
a database processor of the type that includes a plurality of categories of
meanings at different levels of abstraction for each said word, each said
level of abstraction ranging progressively from a lowest level which is a
more specific category of meaning of the word to a last level which is a
more general category of meaning of the word, for receiving said sequence
of words and analyzing each said word to obtain a plurality of categories
of meaning for each said word which includes category of meanings from
each said level of abstraction for each said word of said sequence; and
processing means for automatically determining ones of said categories of
meanings within each said level of abstraction which are common for
different ones of said respective words and using said common categories
of meanings as abstracted meanings for the sequence.
27. An apparatus as in claim 26, wherein said processing means comprises
means for further processing said abstracted meanings to obtain a
plurality of further categories of meaning into which said sequence
belongs.
28. An apparatus as in claim 26, further comprising the step of processing
said abstracted meanings by correlating categories of meanings within one
of said levels of abstraction to categories of meanings within another
level of abstraction.
29. An apparatus as in claim 26, wherein said sequence is in natural
language, and said database is a Roget-type database.
30. An apparatus for automatically abstracting a category of meaning from a
sequence of words using a machine, comprising:
means for obtaining, for each of a sequence of words, pointers to all
categories of meanings which said each word can convey; and
means for automatically determining which are the most likely categories of
meanings to be intended by the sequence, by determining ones of said
categories of meanings which are common for different ones of said
respective words.
31. An apparatus as in claim 30, wherein said obtaining means includes
means for obtaining said categories of meanings at each of a plurality of
different levels of abstraction for each said word, said levels of
abstraction ranging progressively from a lowest level of abstraction which
is a more specific category of meaning of the word to a highest level of
abstraction which is a more general category of meaning of the word,
wherein said at least one category of meaning for each said word includes
categories of meanings from each said level of abstraction for each said
word, and wherein said determining means includes means for determining
common elements within each said level of abstraction for each said word.
32. An apparatus as in claim 31, wherein said determining means further
comprises means for processing said abstracted category of meaning by
correlating categories of meanings within one of said levels of
abstraction to category of meanings within another level of abstraction.
33. An apparatus as in claim 32, wherein said processing means includes
means for:
deriving and storing a plurality of pre-formed rules for correlating one of
said levels of abstraction with another level of abstraction.
34. An apparatus as in claim 33, wherein said processing means includes
means for deriving said rule by:
taking a first level of abstraction and processing each category of meaning
in said first level of abstraction by correlating categories of meanings
within each of a second level of abstraction within said first level of
abstraction, and which second level of abstraction is one more specific
than said first level of abstraction, to get a plurality of results in
each said second level of abstraction for each said first level of
abstraction,
mapping each said result in said second level into a third level of
abstraction more general than said first level; and
determining if any of said mapped third level results are present and using
said second level of abstraction for said present third level of
abstraction as results.
Description
FIELD OF THE INVENTION
The present invention defines a system for abstracting concepts from
natural language. This system enables an input sequence of words in
natural language to be abstracted into categories of meanings which
represent the concepts underlying the natural language phrases. This
technique has application to classifying the information in a sequence for
use in expert systems, database processing, artificial intelligence, and
other similar fields of endeavor.
BACKGROUND OF THE INVENTION
Human beings communicate ideas with one another using a mechanism known as
natural language. Natural language evolved as a medium of communications,
as human beings learned to communicate with one another. The specific
evolution and structure of our natural language is of little concern to
all of us when we speak or write in it. However, due to the inherent
structure of natural language, it is an imperfect mechanism for conveying
ideas. The human brain, through its intricate and somewhat intuitive
workings, translates natural language into concepts and ideas, and allows
communications between different individuals using natural language
through a complicated translation process that no machine has been able to
duplicate.
Machines have no intuitive conceptualization capabilities as of this date
(1989). Therefore many difficulties have been encountered in the attempt
to teach machines to understand natural language. Machines communicate in
languages which were specially invented for the machines and which do not
include the ambiguities endemic to natural, evolved languages.
One of the most crucial problems in the artificial intelligence field is
the presentation to a machine of the conceptual knowledge behind natural
language.
Conceptual knowledge is the knowledge which is known by a human being as a
function of semantic meaning. Aristotle had the view (known as the
Aristotelian view) that meanings are represented most commonly in concepts
which are actualized by words. For instance, when a human being uses the
word "chair", the purpose of the word is not to use the actual word itself
(e.g. not the letters c·h·a·i·r), but to denote the meaning of the item (a
chair) behind the word "chair". The interpretation of language (such as
the word "chair") as a whole functions primarily at the conceptual level.
That is, in our example we interpret the language which we hear (the
compressions and rarefactions of air which make up the sounds of the word
"chair") as the item which is represented by the word "chair."
Aristotle's line of reasoning implies that meaning has three elements. The
first element is the word itself, which differs from language to language,
and even within its own language has inherent ambiguities. For instance,
the word "chair" as a noun may mean a seat, a chair rail, or a chair lift,
or as a verb may imply leadership, e.g., to chair a committee. The second
element is the mental image of the concept in the mind, which is signified
by the word and which is the same for all people. The concepts, for
instance, are those which the listener develops when he hears the word
"chair". The third element is the thing itself, which here is the chair
itself, and is the same for all people.
The second and third elements, the concepts and the things themselves, were
believed by Aristotle to be the same for all people. It is only the words
which differ.
Moreover, the concepts and the things themselves are totally unambiguous,
while most words in natural language are ambiguous.
SUMMARY OF THE INVENTION
The present invention has an object of translating a sequence of words in
natural language into a group of concepts based on the sequence, the
concepts being of a type which are common to all people. The present
invention links each word in a phrase to the concepts behind the word, to
link the word to the concept.
The present invention defines a method and apparatus which translates each
natural language word into a plurality of categories of meaning. Almost
all words in natural language are ambiguous, and almost all words have
such a plurality of such categories of meaning. After each word is
translated into the plurality of categories of meaning which it can
represent, common ones of said categories of meaning are ascertained. This
determination of commonality indicates that these common categories of
meaning may be actual meanings conveyed by the phrase.
According to a preferred embodiment of the invention, a database is used
which includes a plurality of usages of each word organized into different
levels of generality, called levels of abstraction. The preferred mode of
the invention uses Roget's International Thesaurus, 4th Edition, to obtain
these usages of each word. Roget's classification organizes each word into
its conceptual meanings based on the membership of the word in the
cluster of linguistic signs that go to make up some large and very general
concept. Such a database will be referred to throughout this application
as a Roget-type database. Roget's index to Classifications is reproduced
as an Appendix to this application. It includes four levels ranging from
most specific to least specific, and all words in the English language can
be classified into these categories.
A best mode of the present invention implements Roget's taxonomy along with
the Principle of Commonality. This principle extracts the common elements
within each level of an abstraction according to Roget's taxonomy. The
program takes as input a sequence of natural language words. Each word is
traced through Roget's taxonomy from the most specific to the most
general. Each level of abstraction is then examined to determine common
elements within the level of abstraction which are common to any two
elements of the phrase. The common elements receive a certainty value,
based on the number of occurrences of the element. The common elements and
certainty value are used to generically describe the knowledge contained
in the phrase.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other advantages of the invention will be described in detail
with reference to the accompanying drawings, in which:
FIG. 1 shows an example of Roget's classification system using four
progressively more specific levels of abstraction, as used in the
preferred mode of the present invention;
FIG. 2 shows a general flowchart of the operation of the present invention;
FIG. 3 shows a more detailed flowchart of the operation of classifying the
words according to their meaning;
FIG. 4 shows detail on how each word is classified into the various levels
of abstraction;
FIG. 5 shows a sample hardware layout of the present invention; and
FIG. 6 shows a flow diagram of the operation of the rule used to correlate
level of abstraction 2 with level of abstraction 3.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The inventor of the present invention has first made the realization of an
alternative use of a database which includes meanings of words organized
by levels of abstraction. One such database is found in Roget's
International Thesaurus.
A thesaurus is a tool for transforming ideas into words. Its most common
use is for determining synonyms or antonyms. When a writer has an idea (a
conceptualization of a meaning) and wants a word for it, the writer
consults a thesaurus in order to determine all of the possible words for
the idea.
The inherent organization of a thesaurus is by conceptualization of ideas,
organized into words which express the ideas. The inventor of the present
invention has first realized that this organization has an application
when it is used in reverse--to obtain the ideas behind words from the
words themselves.
The inventor has also pioneered a technique for using these meanings, and
has titled it "the principle of commonality". This technique is used for
determining which ones of the many meanings are applicable to a sequence
of words or phrase. This phrase may, of course, be a sentence. This
principle finds common meanings between various words of a sequence and
uses these common meanings as the key meanings: those which are most
likely to be the ones of the meanings intended to be conveyed by the
phrase.
The present invention has been embodied using Roget's International
Thesaurus, 4th Edition, as the preferred embodiment of a database for
obtaining ideas and meanings behind various words. However, it should be
understood that any similarly organized database or classification scheme
could be equivalently used.
Peter Mark Roget organized his thesaurus in 1852 by producing a finite list
of unambiguous concepts, or categories of meaning, and identifying the
relationship among these concepts as a hierarchy. Each ordinary word in
natural language was mapped from the natural language into a hierarchy of
categories of meaning, so that each word has its many meanings related
into many of the single concepts, each of which is unambiguous. A proposed
use for this is to determine synonyms.
Roget's conceptual taxonomy is organized according to the structure of FIG.
1. This cultural taxonomy is organized into four hierarchical levels of
abstraction. The fourth level of abstraction is the most general, and has
eight categories of meaning. All of these eight categories of meaning are
shown in FIG. 1, and include ABSTRACT RELATIONS; SPACE; PHYSICS; MATTER;
SENSATION; INTELLECT; VOLITION; and AFFECTIONS. All words in the natural
English language can be classified into one of these eight, most-general
categories of meaning within the fourth (most-general) level of
abstraction. In fact, Roget has done so throughout his Thesaurus.
Within, and more specific than, each of these categories of meaning at the
fourth level of abstraction are a number of categories at the third level
of abstraction. For instance, within the category of meaning at the fourth
level of abstraction known as MATTER, there are three categories of
meaning at the third level of abstraction: MATTER IN GENERAL; INORGANIC
MATTER; and ORGANIC MATTER. All other categories of meaning at the fourth
level of abstraction have a similar organization of categories at the
third level.
Each of the categories of meaning within the third level of abstraction can
be expanded into a number of more specific categories of meaning within a
second level of abstraction. For instance, inorganic matter can be
expanded into MINERAL KINGDOM; SOIL; LIQUID; and VAPORS.
Finally, the first level of abstraction and its associated categories of
meaning are the most specific, and each of the second-level categories of
meaning is expanded into first-level categories of meaning. For instance, SOIL
can be expanded into LAND; BODY OF LAND; and PLAIN.
Roget has classified each word in natural language into categories of
meaning within each of the four levels of abstraction. Some of these four
levels of categories of meaning may not exist, however, for certain words.
For instance, within the level of abstraction 4, the category of meaning
MATTER has a third level of abstraction category of meaning labelled as
MATTER IN GENERAL. Under this third category of meaning is no second
category of meaning, but rather the third category of meaning MATTER IN
GENERAL has only first level of abstraction categories of meaning
depending therefrom, including UNIVERSE; MATERIALITY; IMMATERIALITY;
MATERIAL; CHEMICAL; OILS AND LUBRICANTS; and RESINS AND GUMS.
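The hierarchy just described, including a branch with no second level, can be pictured as a small lookup table. The following is only an illustrative sketch: it covers just a few of the categories named above, and the names PARENTS and trace are assumptions introduced here, not part of the described program.

```python
from typing import Optional

# Fragment of the hierarchy named in the text; not the full taxonomy.
# Each first-level category maps to its (level 2, level 3, level 4) parents.
# None marks a level that does not exist for that branch.
PARENTS: dict[str, tuple[Optional[str], str, str]] = {
    "LAND": ("SOIL", "INORGANIC MATTER", "MATTER"),
    "BODY OF LAND": ("SOIL", "INORGANIC MATTER", "MATTER"),
    "PLAIN": ("SOIL", "INORGANIC MATTER", "MATTER"),
    "UNIVERSE": (None, "MATTER IN GENERAL", "MATTER"),
    "MATERIALITY": (None, "MATTER IN GENERAL", "MATTER"),
}


def trace(first_level: str) -> dict[int, Optional[str]]:
    """Trace a first-level category up through levels 2 to 4."""
    level2, level3, level4 = PARENTS[first_level]
    return {1: first_level, 2: level2, 3: level3, 4: level4}


print(trace("LAND"))      # level 2 is SOIL
print(trace("UNIVERSE"))  # level 2 is None: the branch skips that level
```

Keeping the missing level as an explicit None, rather than dropping the key, lets later per-level comparisons treat all four levels uniformly.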
The inventor has determined that using this hierarchy allows each word in a
phrase of a natural language to be grouped according to its meaning. Each
"word" in a phrase can be analyzed to determine the appropriate categories
of meaning from the Roget's classification that apply thereto. These
categories of meaning from the four levels of abstraction are stored as
functions of their level of abstraction. When the entire phrase has been
analyzed, common categories of meaning within levels of abstraction are
determined and assigned a certainty value based on how many times they
occur. If any two words in the phrase have a common category of meaning
within any level of abstraction, there is a high probability that this is
a meaning behind these words when used in the phrase. However, the
commonality need not be between all words in the phrase; any two
occurrences of a category of meaning within any level of abstraction suffice.
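A minimal sketch of this commonality test follows. It assumes that the certainty value is simply the number of words sharing a category, which is one possible reading of "how many times they occur". The function name common_categories and the two input words are invented for illustration and are not taken from the thesaurus.

```python
from collections import Counter


def common_categories(word_categories):
    """Find categories shared by at least two words, per level of abstraction.

    word_categories: one dict per word, mapping a level of abstraction
    (1 to 4) to the set of category names that word can convey.
    Returns {level: {category: number_of_words_sharing_it}}.
    """
    result = {}
    levels = sorted({level for cats in word_categories for level in cats})
    for level in levels:
        counts = Counter()
        for cats in word_categories:
            # set() ensures each word contributes at most once per category
            counts.update(set(cats.get(level, ())))
        shared = {name: n for name, n in counts.items() if n >= 2}
        if shared:
            result[level] = shared
    return result


# Hypothetical inputs: two words that only agree at the most general level.
word_a = {4: {"MATTER", "AFFECTIONS"}, 3: {"ORGANIC MATTER"}}
word_b = {4: {"MATTER"}, 3: {"INORGANIC MATTER"}, 2: {"SOIL"}}
print(common_categories([word_a, word_b]))  # {4: {'MATTER': 2}}
```

Note that the threshold of two words mirrors the statement that any two occurrences are enough; a stricter application could raise it.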
The term "word" is used loosely throughout the specification and claims,
and refers to a structure which may be more than one natural language
word. For instance, the word may include an article (such as "a word") or
may be an entire verb such as "to obtain". It should be understood
throughout this specification and claims that this terminology of "a word"
refers to one conceptual structure to be analyzed, and does not
necessarily refer only to a single natural language word.
As explained above, Roget's conceptual hierarchy can be used to trace the
categories of meaning for a word through different levels of abstraction.
The way in which this is commonly looked up using Roget's levels of
abstraction, will be demonstrated herein using the word "human" as an
example.
When the word "human" is located in the Thesaurus, the following entry is
found:
HUMAN
N. PERSON 417.3
ADJ. KIND 938.13
MANKIND 417.10
PITYING 944.7
This shows four possible meanings of the word HUMAN and the different
levels of abstraction within which these meanings lie. The meanings are
either in a fourth level of abstraction category called MATTER (including
numbers 375-421) or a category called AFFECTIONS (including numbers
855-1042). (See Appendix showing the synopsis of categories of Roget's
Thesaurus.) Within each of these categories of meaning at the fourth level
of abstraction is a plurality of categories of meaning for the third level
of abstraction, which includes ORGANIC MATTER (categories of meaning
406-421, in which MANKIND (417.10) and PERSON (417.3) are included) and
SYMPATHETIC AFFECTIONS (922-956, in which KIND (938.13) and PITYING
(944.7) are included). Within each of these categories of meaning
at the third level of abstraction is a plurality of categories of meaning
for the second level of abstraction covering the meanings of HUMAN which
includes MANKIND (417-418), BENEVOLENCE (938-943) and SYMPATHY (944-948).
The word HUMAN can be mapped using this kind of exercise to obtain the
possible semantic ideas behind the word. The correct one of these meanings
is determined using the principle of commonality.
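The mapping exercise for HUMAN can be sketched as a range lookup on the classification number. The table below contains only the number ranges quoted in this passage, so it is a fragment rather than the full synopsis; the names RANGES and categories_for are assumptions made for illustration.

```python
# Number ranges quoted in the text for the word HUMAN; only a fragment.
RANGES = {
    4: [("MATTER", 375, 421), ("AFFECTIONS", 855, 1042)],
    3: [("ORGANIC MATTER", 406, 421), ("SYMPATHETIC AFFECTIONS", 922, 956)],
    2: [("MANKIND", 417, 418), ("BENEVOLENCE", 938, 943), ("SYMPATHY", 944, 948)],
}


def categories_for(pointer: str) -> dict[int, str]:
    """Map a pointer such as '417.3' to its category at levels 2 to 4."""
    head = int(pointer.split(".")[0])  # the part before the dot selects the range
    found = {}
    for level, ranges in RANGES.items():
        for name, low, high in ranges:
            if low <= head <= high:
                found[level] = name
                break
    return found


HUMAN_POINTERS = ["417.3", "938.13", "417.10", "944.7"]
for pointer in HUMAN_POINTERS:
    print(pointer, categories_for(pointer))
```

Pointers are kept as strings so that 417.10 and 417.1 remain distinct, which a floating-point representation would not guarantee.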
Roget's taxonomy of concepts is similar to what John Haugeland has called
"stereotypes". However, Roget's system supplies the pointers to pick out
the stereotypes according to our own natural language. The taxonomy
reaches toward the deeper level to a level of communication, and is
organized according to our natural language to enable us to more easily
grasp the ideas behind our natural language.
One of the most important aspects about Roget's taxonomy is that it is not
domain-specific. It not only contains information about tangible concepts
such as physics and matter, but also about intangible concepts such as
affection and sensations.
Although the preferred embodiment is described herein using Roget's
taxonomy and Roget's Thesaurus, it should be understood that any similar
database which includes information about uses of words could similarly be
used. Roget's Thesaurus is the most comprehensive database of this type
known to the inventor of the present invention, and therefore, has been
used as the preferred mode. However, any other similarly organized
database could alternatively be used, and should be considered equivalent
in this regard.
The present invention makes use of a relation found by the inventor, using
a principle of commonality, implemented into a program which the inventor
has called "WordMap". This program implements Roget's taxonomy along with
the principle of commonality. This principle extracts the common
categories of meaning within each level of abstraction according to
Roget's taxonomy. The program takes as input a sequence of natural
language words. Each word is traced through Roget's taxonomy from the most
specific level of abstraction to the most general level of abstraction.
The categories of meaning within any level of abstraction are then
examined to determine common elements within the level of abstraction. The
common elements are used to generically describe the knowledge contained
in the phrase.
The WordMap program will now be described in detail with reference to the
accompanying figures of the present invention representing flow charts of
the WordMap program. FIG. 2 shows a most general flow chart of the WordMap
program. FIG. 2 starts with step 200 in which the sequence of natural
language words which is to be analyzed is obtained. Step 202 follows, in
which a pointer, which is in this embodiment Roget's classification
number, is determined for each word to indicate a category of meaning for
each of the plural levels of abstraction in the pre-stored database. For
instance, the pointers for the word human are the numbers 417.3, 417.10,
938.13 and 944.7.
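As an illustration only, the pointers quoted above could be held in a machine as follows. This is a minimal sketch, not the representation used by the wordmap program: it assumes that the part of each number before the dot identifies a category and the part after the dot identifies an entry within that category, which the description does not state. The names Pointer and parse_pointer are hypothetical.

```python
from typing import NamedTuple


class Pointer(NamedTuple):
    category: int  # assumed meaning: part before the dot
    entry: int     # assumed meaning: part after the dot


def parse_pointer(text: str) -> Pointer:
    """Split a classification number such as '417.10' into its two parts."""
    category, entry = text.split(".")
    return Pointer(int(category), int(entry))


# Pointers quoted in the description for the word "human".
human = [parse_pointer(p) for p in ["417.3", "417.10", "938.13", "944.7"]]

# Distinct category numbers the word points into, in first-seen order.
categories = list(dict.fromkeys(p.category for p in human))
print(categories)  # [417, 938, 944]
```

The pointers are parsed from strings rather than stored as floating-point numbers, because a floating-point value would make 417.10 indistinguishable from 417.1.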
At step 204, a determination of commonality of categories of meaning, and a
determination of a certainty value for this commonality, is made within
each of the multiple levels of abstraction for the words of the phrase.
Any common categories of meaning within a particular level of abstraction
are taken as "answers", and the certainty value indicates how many times
the commonality occurs, and therefore how certain the commonality is. At
step 206, each of the common categories of meaning is stored.
At step 208 these common categories are processed according to rules for
the specific application. One example of such a rule will be described
below. This ends the process.
A more detailed flow chart of steps 200-206 is shown in FIG. 3. At step
300, a variable x is set to 1. The variable x is used to denote the word
number within the word sequence that is currently being processed. Step
302 is executed to access the word pointed to by the variable x. The
particular word being processed is correlated to determine its categories
of meaning at the various levels of abstraction within the pre-stored
database at step 304 and the resulting categories of meaning are stored in
an array at an address of the array referenced by array [x, LOA].
A test is made at step 306 to determine if there is another word within the
word sequence. If the answer is yes, the variable x is incremented at step
308, followed by control passing to point 301 where the next word pointed
to by the variable x (x now having been incremented) is obtained. If no
other word is detected at step 306, the entire word sequence has been
entered into the array, and the principle of commonality is applied
beginning at step 310.
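The loop of steps 300-308 might be sketched as follows. This is a sketch under stated assumptions rather than the wordmap program itself: the array is modeled as a dictionary keyed by (x, LOA), the names LEVELS, TOY_DATABASE, lookup_levels and build_array are hypothetical, and the category assignments in TOY_DATABASE are made up for illustration and are not taken from Roget's Thesaurus or from the tables of this description.

```python
from typing import Dict, List, Set, Tuple

LEVELS = (1, 2, 3, 4)  # levels of abstraction I-IV

# Made-up stand-in for the pre-stored database (word -> level -> categories).
# The assignments below are illustrative only.
TOY_DATABASE: Dict[str, Dict[int, Set[str]]] = {
    "we": {1: {"Mankind", "Knowledge"}, 2: {"Mankind"},
           3: {"Organic Matter"}, 4: {"Matter"}},
    "have": {1: {"Possession", "Knowledge"}, 2: {"Possession"},
             3: {"Possessive Relations"}, 4: {"Volition"}},
}


def lookup_levels(word: str) -> Dict[int, Set[str]]:
    """Step 304: categories of one word at each level (empty if unknown)."""
    empty = {level: set() for level in LEVELS}
    return TOY_DATABASE.get(word.lower(), empty)


def build_array(words: List[str]) -> Dict[Tuple[int, int], Set[str]]:
    """Steps 300-308: fill array[x, LOA] for every word x of the sequence."""
    array: Dict[Tuple[int, int], Set[str]] = {}
    if not words:
        return array
    x = 1                              # step 300
    while True:
        word = words[x - 1]            # step 302: word pointed to by x
        levels = lookup_levels(word)   # step 304
        for loa in LEVELS:
            array[(x, loa)] = levels[loa]
        if x == len(words):            # step 306: no further word
            break
        x += 1                         # step 308
    return array


array = build_array(["WE", "HAVE", "NOTHING"])
print(sorted(array[(1, 1)]))  # ['Knowledge', 'Mankind']
```

A word that is missing from the toy database simply contributes empty sets, so it cannot create a commonality by itself.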
The principle of commonality is applied by processing the two-dimensional
array to determine common elements within each particular level of
abstraction labeled as levels 1-4, and certainty values based on a number
of occurrences of commonality. This involves setting up two nested loops. A
first nested loop is set up at step 310, from 1 to 4, each element
representing one of the multiple levels of abstraction. The second nested
loop is between 1 and x, so that one word in the sequence is processed in
each pass.
For each word in the word sequence, the contents of the array at [N1, N]
are determined at step 314 (N1 being level of abstraction, N being word
number x). Each meaning of each word identified by variable x in the
sequence is entered into temporary memory while the level of abstraction
remains the same. Step 316 executes the "next" loop. At step 318,
therefore, all of the meanings for all of the words in the particular
level of abstraction (N1) have been entered into the temporary memory. Step
318 finds all common categories of meaning and stores these common
categories as the result for level N1, as well as counting the number of
common occurrences to store this as the certainty value. This is followed
at step 320 by an incrementing of the level of abstraction, to the next
level of abstraction.
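The nested loops of steps 310-320 might be sketched as follows. This is a minimal sketch, not the wordmap program itself. It assumes that the array is a dictionary keyed by (word number, level of abstraction), that a category is "common" when it appears for at least two different words, and that the certainty value is the number of words sharing the category. The function name common_categories and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def common_categories(
    array: Dict[Tuple[int, int], Set[str]],
    num_words: int,
    levels: Tuple[int, ...] = (1, 2, 3, 4),
) -> Dict[int, Dict[str, int]]:
    """Return, per level, each common category with its certainty value."""
    results: Dict[int, Dict[str, int]] = {}
    for n1 in levels:                       # outer loop over levels (step 310)
        temporary: Counter = Counter()      # temporary memory for this level
        for n in range(1, num_words + 1):   # inner loop over words
            # Step 314: enter the meanings of word n at level n1.
            # set() ensures a word counts at most once per category.
            temporary.update(set(array.get((n, n1), ())))
        # Step 318: keep the categories shared by two or more words.
        results[n1] = {c: k for c, k in temporary.items() if k >= 2}
    return results


# Illustrative data only; not taken from Roget's Thesaurus.
sample = {
    (1, 1): {"Knowledge", "Mankind"},
    (2, 1): {"Possession", "Knowledge"},
    (3, 1): {"Unimportance"},
    (1, 4): {"Intellect"},
    (2, 4): {"Volition", "Intellect"},
    (3, 4): {"Intellect", "Abstract Relations"},
}
results = common_categories(sample, num_words=3)
print(results[1])  # {'Knowledge': 2}
print(results[4])  # {'Intellect': 3}
```

Counting distinct words rather than raw repetitions is one possible reading of "number of occurrences"; a different counting rule would change only the line that updates the temporary memory.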
The hierarchy, and how each meaning for each word is obtained, will now be
explained briefly with reference to FIG. 4. FIG. 4 corresponds
substantially to step 304 in FIG. 3. Step 400 looks up word x in the
pre-stored database to get its pointer into the many levels of
abstraction. This number is processed to get the categories of meaning in
each of the four levels of abstraction according to Roget's classification
system at step 402. The categories of meaning comprising the four levels
of abstraction are then stored at step 404, followed by step 306 (FIG. 3).
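Steps 400-404 might be sketched as follows. This is only a sketch under assumptions: the tables WORD_INDEX and HIERARCHY and the function levels_for_word are hypothetical, the part of each pointer before the dot is assumed to select a category, and the names in HIERARCHY are placeholders rather than actual entries of Roget's classification system. Only the pointers for the word human are taken from the description.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical word index (step 400): word -> classification numbers.
# The numbers for "human" are the ones quoted in the description.
WORD_INDEX: Dict[str, List[str]] = {
    "human": ["417.3", "417.10", "938.13", "944.7"],
}

# Hypothetical hierarchy: category number -> names at levels I, II, III, IV.
# All names are placeholders, not actual Roget categories.
HIERARCHY: Dict[int, Tuple[str, str, str, str]] = {
    417: ("cat-417", "group-A", "section-A", "class-A"),
    938: ("cat-938", "group-B", "section-B", "class-B"),
    944: ("cat-944", "group-B", "section-B", "class-B"),
}


def levels_for_word(word: str) -> Dict[int, Set[str]]:
    """Return the categories of meaning of a word at each of the four levels."""
    levels: Dict[int, Set[str]] = {1: set(), 2: set(), 3: set(), 4: set()}
    for pointer in WORD_INDEX.get(word.lower(), []):    # step 400
        category = int(pointer.split(".")[0])           # assumed pointer format
        path = HIERARCHY.get(category)                  # step 402
        if path is None:
            continue                                    # unknown category: skip
        for level, name in enumerate(path, start=1):    # step 404
            levels[level].add(name)
    return levels


human_levels = levels_for_word("human")
print(sorted(human_levels[1]))  # ['cat-417', 'cat-938', 'cat-944']
print(sorted(human_levels[4]))  # ['class-A', 'class-B']
```

Because sets are used at each level, two pointers that lead to the same higher-level category contribute that category only once for the word.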
A first example of the operation of the wordmap program explained with
reference to FIGS. 2-4 will now be given with reference to Tables 1 and 2
which follow:
TABLE 1
__________________________________________________________________________
WE HAVE NOTHING IN COMMON
__________________________________________________________________________
I. Event Nonexistence
Newness Unimportance
Receiving Absence Location Mediocrity
Production; Birth
Unimportance
Interiority
Language
Knowledge Unsubstantiality
Reception Plain Speech
Permission Friendship Knowledge
Affirmation Ingress; Entrance
Mankind
Memory Plants
Intelligibility Frequency
Deception Generality
Compulsion Normality
Possession Correlation
Retention Inferiority
Mean
Prose
Cooperation
Property
Participation
Amusement
Vulgarity
Commonality
Plainness
Dullness
II.
Possession Being in Abstract
Relative Space
Mankind
Power in Operation
Being in Concrete
Social Affections
Esteem
Comprehension
Existence in Space
Time w/Reference
Recurrent Time
Acceptance Adaptation to Time
to Age Language
Consent Motion w/Reference
Sharing
Recollection to Direction
Comprehension
Transfer of Property External & Internal
Support
Dimensions Possession
Adaptation to Ends
Absolute Relation
Conformity to Rule
Distributive Order
Comparative Quantity
Discriminative
Affections
Linguistic
Representation
Recurrent Time
Vegetable Life
Pleasure; Pleasurableness
Style; Mode of Expression
__________________________________________________________________________
TABLE 2
__________________________________________________________________________
WE HAVE NOTHING IN COMMON
__________________________________________________________________________
III.
Event Existence
Time Quantity
Power Space in General
Dimensions
Relation
States of Mind
Conditions
Motion Order
Authority; Control Sympathetic
Time
Affections
Possessive Relations Space in
Conditions
General
Communication of Ideas Support; Opposition
Intellectual Faculties Organic Matter
& Processes Communication of
Ideas
Intellectual Faculties
& Processes
Possessive Relations
Personal Affections
IV.
Intellect Space Affections
Matter
Volition Volition Space Intellect
Abstract Abstract Abstract
Volition
Relations Relations
Relations
Abstract Relations
Affections
__________________________________________________________________________
Tables 1 and 2 show the phrase WE HAVE NOTHING IN COMMON being analyzed and
processed, and show the categories of meaning for each word in the
phrase, grouped by word and level of abstraction. All common
elements are underlined. Level of abstraction IV is the most general level
and level I is the most specific level.
The phrase WE HAVE NOTHING IN COMMON is processed in level I (most
specific) to obtain the common categories of UNIMPORTANCE and KNOWLEDGE.
These categories relate to the phrase as follows:
First of all, this statement is about knowledge of the person making the
statement (KNOWLEDGE). The import of this phrase comes from the meaning of
the sentence. "We have nothing in common" generally means that our
relationship is not important (UNIMPORTANCE).
Within the level of abstraction II (second most specific), the common
elements are POSSESSION, COMPREHENSION, and ADAPTATION TO ENDS. The
statement WE HAVE NOTHING IN COMMON is a statement of the user's
comprehension. POSSESSION is also an indication that the possessions
between the speaker and the speakee have nothing in common. ADAPTATION TO
ENDS relates to the inherent meaning of the phrase, that there is no
adaptation between the speaker and the speakee.
Level of abstraction III has the common elements of POSSESSIVE RELATIONS,
COMMUNICATION OF IDEAS, INTELLECTUAL FACULTIES AND PROCESSES, CONDITIONS,
and TIME. Among these, perhaps the most interesting is that of POSSESSIVE
RELATIONS. The phrase WE HAVE NOTHING IN COMMON is the type of phrase which
might be said by one person who knows another, as an intimate statement
to the other. The principle of commonality has enabled this meaning to be
gleaned from the phrase, even though no one word by itself might be
indicative of this relationship.
The wordmap program has been explained above, and is used according to the
present invention to abstract meaning from words within a sequence of
words. Many uses for this program would be apparent to those having
ordinary skill in the art. For example, if the wordmap program were
applied to semantic analysis, it could be used for natural language
programs. The program could be used in expert systems for knowledge
organization. This technique could be used in software engineering for
systems specification. In programming languages, the present invention
could be used for form, syntax and structure. In information systems, the
present invention could find application in preparation of structure,
organization and queries. In software libraries, this could be used for
queries. In criminal investigations, it could be used for information
structuring. Finally, in writing tools, the present invention could be
used for organizing information.
These examples are not intended to be limiting, and future research would
be expected to obtain results which were usable in machine learning,
man-machine interfaces, database queries and organization, natural
language generation, programming languages, foreign interpretation,
composition and text processing tools, expert systems design, and
understanding, representing and implementing knowledge based systems.
The information obtained above from the phrase WE HAVE NOTHING IN COMMON
gives unambiguous meanings for the words in the phrase. This is further
processed according to the specific end application, so that it is output
from this program in a more organized way. For instance, the information
output in the description given above with respect to Tables 1 and 2, is
processed using the processing step 208 of FIG. 2. One example of this
processing step will be explained herein, it being understood that this
example is only one of many ways in which the information could be
processed.
The example provided herein is one rule which is used according to the
application of specifying software structures. The example given is that
of writing a program to correlate the curves of LINEAR, LOGARITHMIC,
SEMI-LOGARITHMIC, HYPERBOLIC and RECIPROCAL. These words are input to the
wordmap program. The common categories of meaning within each level of
abstraction, where Level of Abstraction IV is the most general level of
abstraction and Level of Abstraction I is the most specific level of
abstraction, are as follows:
______________________________________
WORDMAP RESULTS - COMMONALITY
GENERAL
______________________________________
IV. ABSTRACT RELATIONS
SPACE
SENSATION
INTELLECT
VOLITION
III. RELATION
ORDER
NUMBER