|
Description  |
|
|
Included is a microfiche appendix of 4 microfiche and 385 frames
BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates to a system for user-customizing a natural
language menu command system.
In general, the task of writing a natural language interface to some target
computer system involves writing a natural language grammar and a
corresponding lexicon and a set of semantic translations to the target
system. This process has traditionally been a manual one requiring a
grammar specialist and weeks or months or even years or work. Furthermore,
the interface that results is not portable to new domains, not robust with
respect to changes in the target system, not easy to debug, and may not
cover the target system (a proof that it does so may be extremely
difficult). The invention described here automates the process of
generating an interface in the special case of relational databases and in
the context of a menu-based natural language understanding system. In
essence, the invention involves parameterizing a core grammar, lexicon and
set of translations so that the user specifies only information that
includes what database tables (relations and views) he wants covered, what
table access rights (retrieved, insert, . . . ) the interface should
allow, what joins he wants the grammar to explicitly support, what
attributes are numeric and computable, etc.
The ADVANTAGES of this invention are that an end user can construct his own
interface knowing nothing about grammars and lexicons so no specialist is
required. The process of specifying the parameters to the
MAKE-PORTABLE-INTERFACE routine takes minutes, not weeks or months. The
technique is portable across domains. The user can control precisely what
tables to include, etc and so can control the coverage of the natural
language interface. The interface is easy to change if the user changes
table descriptions or adds or deletes tables. Finally, a given core
grammar can be constructed to probably cover any subportion of a target
relational database interface language. And it can be guaranteed to be
bug-free, since it is easy to validate the parameters to the
MAKE-PORTABLE-INTERFACE routine and prove it correct.
The significance of this invention is that it makes it possible, for a MUCH
broader class of users and applications, to use menu-based natural
language interfaces to databases.
Natural language interfaces to computer systems are not in common use today
for two main reasons: they are difficult to use and expensive to build and
maintain. A companion patent application described a particular
grammar-driven, menu-based software tool which, when used with a semantic
grammar and lexicon allows a naive user to construct understandable
sentences using a pointing device. That patent application solves many of
the problems having to do with "ease-of-use" of natural language
interfaces. The present invention solves the second problem, of making
natural language interfaces easy to build and maintain, in the very
important special case of relational databases and in the context of a
grammar-driven, menu-based interface drive. (The technique can be
generalized to non-relational database applications and probably to
applications other than database applications.
Specifically, this invention addresses the following problems: Existing
natural language interfaces to databases are seldom portable; almost all
are application-specific. They take from man-weeks to man-years for a
specialist to build. They are not robust with regard to changes in the
data they are interfacing to. They are hard to debug. And there is no
established way to guarantee that they cover the desired data or fit the
functionality of the target computer system. So, using existing
technology, natural language interfaces to databases will be built only
for important database applications. These problems will haunt even
NLMENU-driven, manually constructed interfaces, though, because simpler
grammars can be used, a grammar specialist should be able to complete an
interface in a few weeks, (but he would still be required to maintain the
interface when changes occurred to the structure of the database).
The invention described here solves the problems listed above. END-USERS
can build natural language interfaces to their own data; the interfaces
are robust with respect to database changes; they can be customized to
cover user-specified subsets of data and functionality in a precise
manner; and they are provably correct.
In this section we will briefly discuss the general state of natural
language interfaces to software systems and database systems in
particular. Then we will briefly discuss menu-based grammar-driven
interfaces. Then we will relate these to the invention and discuss known
prior art as it concerns automatically generating interfaces to a
database.
INTRODUCTION
Natural language interfaces to software fall into two categories: (1) those
based on pseudo-English, where a user must learn a command language that
is English-like. COBOL is a well-known computer language that employs some
pseudo-English. (2) Those based on the premise that a user should be
allowed to express himself in any way that is natural to him and that the
system will make sense of his input. Keyword extraction techniques suffice
only in simple applications. But for most other purposes, the user's
natural language input must be better understood.
Natural language interfaces to computer systems that have been constructed
employ a grammar which characterizes the set of acceptable input strings.
A parser then access this grammar to produce a parse tree (parse trees for
ambiguous input) for the input string. The parse tree is then translated
into an expression (or set of expressions) which represent the meaning of
the input string and which are interpretable by the target computer system
for which the interface is being built. A wide variety of grammar
formalisms and parsing strategies have been developed.
For all natural language systems, the user is required to type his question
using the keyboard of a computer terminal. When the entire query has been
received, the natural language interface processes the input. Processing
typically results in either an answer to the user's query or a response
indicating that the query was not well-understood.
Most natural language interfaces to database systems have been prototypes,
built by the research community. The primary application for natural
language interfaces has been to natural language database query systems.
Some of these prototype systems include the LUNAR system (Woods et al.
1972), the PLANES system (Waltz and Goodman, 1977), the LADDER system
(Hendrix, 1978), and the TQA system (Petrick, 1981). Some commercial
systems exist. Larry Harris of Artificial Intelligence Corp, Roger Schank
of Cognitive Systems Inc, and Gary Hendrix of Symantec are all marketing
natural language interfaces to software systems.
Natural language systems are not in wide spread use today for two reasons:
current systems are not easy to use nor are they easy to build and
maintain.
Tennant's dissertation (1980) explores why current natural language systems
are not easy to use. Tennant performed the first and only extensive
evaluation of a natural language interface. In his evaluation of the
PLANES system, he found that only simple queries of 6 or 7 words were
input to the system and 1/3 of the queries were not understood by the
system. The results showed clearly why natural language interfaces are not
in wide-spread use today.
Some of the problems involving "ease of use" of natural language interfaces
are:
users type poorly and often use ungrammatical constructions.--users had no
way of knowing the grammatical coverage of the system (what constructions
were allowed in its grammar) and they had no clear path to learn such
limitations.
users had no way of knowing the semantic coverage of the system (which
attributes, entities, and relationships the system knew about) and again
they had no clear path to learn about the system's limitations (what kinds
of things it di and did not know about).
Traditional techniques for handling these difficulties involved spelling
correctors and very large grammars and lexicons, but Tennant's
dissertation shows how unsuccessful these techniques have been.
The companion patent by Tennant, Ross, and Saenz describes an invention
that overcomes the difficulties involving "ease of use" of natural
language interfaces (and also formal language interfaces--see next
subsection).
The second reason why natural language interfaces to databases are not in
common use today is the large amount of time it has traditionally taken to
construct a natural language interface. Current technology is such that
each interface must be constructed on a case by case basis for each
application. Efforts taking from 10 to 30 man-years per application are
not uncommon. Thus, only applications that can justify such large
expenditure of manpower are candidate for possible applications. However,
given the quality of the system that results, the effort has not proven to
be worthwhile.
This patent application describes an invention, which is dependent on a
grammar-driven, menu-based interface driver, which solves the problems
involving ease of building and maintaining natural language interfaces to
databases.
MENU-BASED, GRAMMAR DRIVEN NATURAL LANGUAGE INTERFACES. The companion
application (simultaneously-filed U.S. patent application Ser. No.
462,151, filed Jan. 28, 1983, which is hereby incorporated by reference,
describes an interface driver (hereinafter called the NLMENU driver) that
parses a user's input phrase by phrase, at every point offering him one or
more menus from which to select sentence completions, so that only
syntactically well-formed strings can be constructed. This guarantees
grammatical coverage. A "semantic grammar" (a context-free grammar that
encodes the semantics of the domain) (Burton, 1976) is used to guarantee
semantic coverge. An all-paths, predictive parser interprets the grammar
and is used to control what menu selections are available to a user at any
point in the construction of sentences. A pointing device (or selection
from a keyboard) is used to make the selections of completing phrases from
a menu or menus. So an NLMENU-driven interface answers the set of
objections to natural language interfaces that involve "ease of use" of
the interface. One additional advantage of the NLMENU-driven interfaces is
that only one or a few ways are needed to state a natural language
command, instead of an arbitrarily large grammar that covers every
construction that every user is likely to use including ungrammatical
ones. So the grammar can be small.
The companion patent application actually describes a particular kind of
grammar-driven, menu-based interface system in which a set of menus
permanently resides on the screen of a CRT and menus that the user can
choose from are hilighted in some way and constrained to contain only next
legal choices. Unhilighted menus contain all menu choices for the menu
category to indicate the scope of the interface to the user at all times.
This interface is really a special case of grammar-driven, menu-based
interfaces. Another particular and successful grammar-driven, not quite
menu-based interface is DEC 2060 command language-like interface (TOPS-20,
1980), where a user types enough of a phrase to disambiguate it and asks
the system for completion and at any point in stating a command, he can
use "?" to ask for a menu-like list of next phrases.
The companion patent application is only a natural language interface if it
is used with a natural language grammar and lexicon. It can equally well
be used with formal natural languages like the database query language
SQL, and in fact the DSG product to be released on Feb. 1, 1982, will
feature a GUIDED-SQL interface that uses exactly the same driving software
but a different grammar and lexicon. This point is significant since it
widens the scope of the usefulness of that patent. It happens though that
many advantages (noted above) accrue from using natural language grammars
and lexicons with that NLMENU interface driver.
The above sections indicate that in the past, it has been expensive to
build and maintain natural language interfaces to databases. This
invention solves that problem for relational databases, but only in the
context of some grammar-driven, menu-based driver. The grammars and
lexicons produced by the MAKE-PORTABLE-INTERFACE function are for use with
such a system and would be very inadequate in traditional systems. The
principal reason is that they are purposely engineered to be simple, to be
expressive, and to provide only a limited set of grammatical and lexical
ways of expressing a statement. They are aimed at taking advantage of
person's ability to understand a fragment of natural language written in a
limited language and at guiding him to express himself in that limited
language. So there is no intent to cover more natural language than a
domain requires.
The MAKE-PORTABLE-INTERFACE function described in this patent is NOT the
first and only work in the area of generating interfaces of some kind from
descriptions of the data, but its purposes are different, its
immplementation is simpler, and it works. As mentioned above, its purpose
is to be used in conjunction with a menu-based grammar-driven interface
driver and to interface to a relational database and to be easy enough to
use so that a user who can build tables for his own applications can also
easily build and maintain language interface to those tables. Its
implementation is simpler than previous works since it does not involve a
large grammar that attempts to cover English. It works--several
automatically generated interfaces have been successfully tried.
In this section, I will be describing other's work in building
transportable natural language interfaces to databases. That work is often
not described precisely enough for another researcher to know exactly what
techniques were used or how successful those techniques were. It is not
surprising that techniques reported in the literature bear some general
similarities to my work: an acquisition dialogue of some sort must provide
the interface building mechanism with parameters of some kind. Some sort
of method is needed to construct the grammar. Some sort of database query
language must be the target. A big difference between the present
invention and what others have done is that the MAKE-PORTABLE-INTERFACE
function generates small grammars which are designed for use with a
grammar-driven, menu-based system. All the other systems use large
grammars, with all the attendant problems mentioned above. Using the
present invention, people who have never seen a lisp machine before can
formulate interesting queries using automatically generated natural
language interfaces, as often happens in our demos. None of the other
researchers have reported that their automatically generated natural
language unterfaces are usable (with the exception of Larry Harris and he
has not carefully documented his claims in the literature).
As early as 1978, a panel at the SIGMOD database conference in Austin
listed five questions concerning natural language interfaces to databases.
Even at that time, the question of portability was raised with the
question "How much can we automate the process of changing database
environments?" (SIGMOD, 1978).
Work at SRI: SRI has been an important center for research into natural
language interfaces to databases for several years. Some work has
concerned portable interfaces:
Two papers by Kurt Konolige (Konolige 1979) and (Konolige, 1980) are
related to this work. The 1979 paper (a tech report) describes a framework
for a portable natural language interface to a database. ((I don't have
it)). The 1980 paper briefly describes a method of applying metal rules to
context-free rules to generate other context-free rules. Konolige is
interested in showing how passive and dative movement and other
transformations in transformational grammar can be pre-compiled into a
grammar, but roughly the same sort of rule instantiation is used as I have
used (he uses symbol replacement and tests on whether to generate new
rules and I use string replacement and specify explicitly when to build
new rules). The method is not described with respect to a grammar for
databases and there is no mention of semantic grammars. Konolige
independently notices that his metarule system, because it was applied
uniformly, "exposed gaps in a phase-structure grammar that was
painstakingly developed over a five year period and was thought to be
reasonably complete for a large subset of English."
In (Hendrix and Lewis, 1981), the authors describe TED, a prototype system
for creating new domains for natural language systems without the need for
specially trained experts. Hendrix described a dialogue in which
information (like some of that found in the portable spec) is elicited
from a user. That information is stored in tables and a "pragmatic
grammar" is used to access the tables at execution time. The pragmatic
grammar differs from a semantic grammar in that it contains slots
(variable or procedures) that the information in the data structure fills.
Hendrix does not describe exactly how the constraints work to guarantee
that different slots in the same sentence are semantically related, but
assuming that they are, he is describing something with the same
descriptive power as a semantic grammar, but with the additional advantage
that the pragmatic grammar stays small. His stated intent is to use Jane
Robinson's large grammar DIAGRAM (Robinson, 1982) for his pragmatic
grammar. The information he gathers involves lexical information gathering
(synonyms, antonyms, verb conjugations, +-human, and also database
structural information (like what attributes are numeric . . . ).
Hendrix's approach offers a competing way to do the same sort of thing my
approach does, but it differs in the ways noted above.
In (Grosz et al, 1982a) and (Grosz, 1982b), Grosz reports on a software
component called TEAM. TEAM is an extension to Hendrix's TED. It is
implemented using Robinson's DIAGRAM grammar and the acquisition dialogue
has been extended to include domain-specific verbs. It is intended for use
by a database expert who is not necessarily a natural language expert. In
her section on future research in (Grosz, 1982b), Grosz states:
"The existing version of TEAM demonstrates the feasibility of constructing
transportable systems. It has been used (in experimental settings) by
people who are not experts in natural language processing to create
natural language interfaces to databases. Current work on TEAM is
concerned with extending the range of linguistic expressions it can handle
(e.g., to include temporal expressions, a larger range of verbs, and
quantified commands) and the kinds of databases for which it can provide
interfaces (e.g. non-normalized databases and those with fields containing
dates)."
Grosz offers no documentation to support a claim that usable, portable
natural language interfaces result from TEAM. She does not describe the
users or the databases they have built interfaces to or the usability of
those interfaces.
Larris Harris is responsible for the first important commercial venture
that is based on AI technology. He developed the ROBOT system, now called
INTELLECT and marketed on ON-LINE ENGLISH by Cullinaine. A database
administrator is needed to create and build a lexicon for a fixed grammar
and the system interfaces to a single file. Much of what goes into the
lexicon is synonym information or operational word definitions (e.g.
define "woman" as "employee whose sex="F") and the grammar is large and
complex. It is not clear how much effort is required to move to a new
domain, but the ROBOT system has been tested in a variety of applications,
which is saying more than can be said for the other efforts discussed in
this section, which report on transportable systems but give no concrete
evidence of a range of applications tested.
The IBM Rendezvous system. The Rendezvous prototype natural language
interface to a database uses much the same kind of information as the SRI
and TI systems. The information about database structure is stored in data
structures that describe both database structural information and
linguistic information. The grammar used is some variant of an augmented
transition network and contains rules that are domain dependent, with no
reported provision for automatically generating interfaces to new domains.
S Jerrold Kaplan (Kaplan, 1979) describes a portable interface which
includes an augmented transition network. He restricts his grammar to
handle only questions beginning with WH-words, like "who", "what", . . .
which limits the syntax his grammar has to recognize, but he offers the
user no path to learning to use the limited language that results. His
system interfaces to heirarchically structured collections of files. In
his system, all domain-specific information is stored in the lexicon. He
describes an experience of porting his natural language interface to a new
domain database in 5 hours, which is quite good by the standards of the
other systems mentioned above, but the process still involves an expert.
In his 1979? paper, he does not go into detail on exactly what is
involved.
None of the previous work in the field claims the advantages of portability
that the present system offers:
an end user can construct his own natural language using an interactive
system.
he can finetune the interface by specidying which tables will be covered,
the access rights to those tables, the attributes to be covered and the
joins that the natural language system will support. He can change menu
item phrasing to more natural phrasings.
the interface will be error free.
the interface can be treated as a database object and granted to other
users who can then use the natural language to access their data.
Thus, it is object of the present invention to provide a method for rapidly
instructing a natural-language menu interface system which is customized
according to a particular users needs.
Thus, it is a further object of the present invention to provide a method
and system for rapidly constructing a natural language menu interface
system which is customized according to the attributes of a particular
user-supplied database.
According to the present invention, there is provided:
A system for generating a user customized natural language menu interface
from a database supplied by the user and from inputs supplied
interactively by a user, comprising:
display means for showing to a user expert a menu of options;
input means, for moving a cursor's apparent position on said display means
by the action of said user;
means for receiving a database;
means for displaying on said display means a generalized relation between
two or more tables in said database, with generalized indications of
corresponding natural-language connecting phrases; and
means for interactively receiving from said user a plurality of said
relations between tables in said database, together with natural-language
connecting phrases corresponding to said relations between tables and said
database; and
means for integrating said relations interactively selected by said user
into a portable specification for natural-language-menu interface, and
forstoring said portable specification.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be discussed with reference to the accompanying
drawings, wherein:
FIGS. 1-11 demonstrate construction of a natural-language interface to a
relational database, as performed by an unskilled user.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The invention provides a means of generating a semantic grammar and a
semantic lexicon from a condensed collection of parameters.
The basic operation of the natural language menu interface system will
first be described, and then the features of the present invention, which
permit very rapid construction of a user-customized implementation of such
a natural language interface system, will then be described. In essence, a
natural language menu input system permits the user to provide input in a
natural language subset, which has a predefined correspondence to some
particular desired set of constrained outputs (e.g. executable machine
commands). Partial inputs from the user are successfully parsed, and the
parser indicates all possible legal next entries which could follow the
input received so far from the user. The set of legal next words is then
displayed to the user as active portions of the menu, so that the user
need only select one of the legal next words from the menu. Thus, a 0%
error rate is guranteed, and training time is greatly decreased, since the
user is guided to correct inputs and to correct inputs only.
The present invention provides three major innovations which make such a
natural language menu system more useful to users. First, a
"build-interface" function is used, which creates a portable
specification, i.e. a compact statement of the parameters which are
sufficient to define the desired natural language interface to a
relational database, according to the inputs received from a particular
user expert. It should be noted that the user expert must be familiar with
types of queries which are likely to be demanded from the database, but
the user expert need not be a computer expert, and need not know any
programming language nor need he be familiar with grammars or lexicons.
Second, the "make-portable-interface" function derives a grammar and
lexicon from a portable specification, such that the grammar and lexicon
are then sufficient to implement a natural language menu system. Thus,
using the make-portable-interface function, a large number of portable
specifications can be stored, and each portable specification can be
rapidly constructed into the required grammar and lexicon when that
particular natural language interface is called upon. Thus, a large number
of natural menu interfaces can be kept accessable, without requiring large
amounts of storage for the grammar and lexicon of each interface, which
can be somewhat bulky. Third, the present invention also provides a
well-formedness test, which tests a grammar-lexicon pair to ensure that
they are well-formed. This automatic debugging for an automatically
generated natural-language expert system is novel and provides major
advantages.
The present invention will be described with primary reference to the
presently preferred embodiment, wherein data base queries and updates are
addressed to a relational data base. The natural language inputs are
parsed using a parser which is generally similar to that disclosed in the
1981 dissertation of Ken Ross entitled "Parsing English Phrase Structure".
The system is coded in LISP, and has been demonstrated on a LISP machine
from LMI Incorporated. However, as will be obvious to those skilled in the
art, the present invention can alternatively be realized using a wide
variety of other physical systems, and the key ideas of the invention can
also be translated into other programming languages than LISP. For
example, a further embodiment of the present invention, which is presently
under development, realizes the present invention on a Texas Instruments
Professional Computer, instructed by source code which is written in "C".
It should also be noted that the particular parser used in the preferred
embodiment is not necessary to the invention, but other parsers can be
used. Numerous parsers have been disclosed in the prior art. However, the
presently-preferred embodiment uses a parser which will be described in
great detail.
As discussed above, the end result which the present invention seeks to
achieve is to permit the user to input well-understood sentences in a
convenient language, which is preferably a subset of a natural language
(such as English). According to this invention, after the user has input
each successive word of a sentence, the set of all possible next words
which could form the continuation of a grammatical sentence is displayed
to the user, so that he needs only to select one of the possible next
words. (More broadly, some of the possible inmuts can be lumped together
as a category, so that one of the displayed items would be, e.g.,
"(specific number)", as will be discussed below.)
In general, this is accomplished by parsing partial inputs received from
the user as they are received, so that the parser generates the set of
possible functional descriptions which could be the next element in a
legal input, and displays a set of words (corresponding to these
functional descriptions of possible next elements). Thus, the end result
of the operation, after the user has input a complete sentence, is a
parsed command which is in accordance with the predefined grammar, and
therefore can be trivially translated into an executable instruction, as
will be discussed below.
The first step in the implementation of this method which will be described
in detail is the operation of the parser itself. A simplified version of
the parser will first be described, and then the modifications which
permit word-at-a-time parsing and prediction will be described. The
preferred parser can be specified in terms of nonstandard turing machine
instructions, as described by Griffiths and Petrick, which operate on the
upper elements of an alpha stack and a beta stack in accordance with the
current state of the upper elements of the two stacks. The topmost
elements of these stacks are represented as an ordered pair such as (XY,
ZW), where X is a string variable corresponding to the upper elements of
the alpha stack, Y is the next element of the alpha stack after X, Z
represents the upper elements of the beta stack, and W is the next element
of the beta stack. The parser rules can then be stated as:
1. If one of the rules of the grammar which can be applied is of the form
A.fwdarw.V.sub.1 V.sub.2 . . . V.sub.n, then we map a condition (V.sub.1
Y, X) onto (Y, V.sub.2 . . . V.sub.n t A X), and we attach V.sub.1 as a
node beneath A. However, this rule is applicable only if X is in the set
of "non terminals", that is if X itself represents a partial parsing tree
rather than a first-level input, such a word or the immediate
classification of a word (e.g. noun, adjective, determiner, etc.). The
symbol "t" is merely a placeholding symbol, which is used in this
formalism to permit proper application of the succeeding parser rules.
Thus, this first partial rule stores the possibility that V.sub.1 is the
beginning of a group which can be parsed under a higher level node A
(e.g., the possibility that, if V.sub.1 is a determiner, it is the
beginning of a noun phrase such "the parts"). Where an element is
associated beneath another element, this provides one step in building up
the tree of associated nodes which will eventually form a complete parse.
Thus, a sample of a completed parse, in a grammar roughly corresponding to
English, might be as follows:
##STR1##
The first association is always a direct translation from a specific word
(e.g. "cat") to its directly associated category ("noun"). However, it
should be preliminarily noted here that the categories used need not be as
broad as familiar English-language parts of speech, but, at least for
database-query applications, are preferably modified to include some
semantic information. Thus, in the presently preferred embodiment,
categories such as "number-noun phrase", "worker-noun phrase",
"operation-noun phrase", "job card-noun phrase", and others are used.
(Such grammars are known in the art as "semantic grammars".)
Intuitively, parser Rule 1 means that, where a rule of grammar might be
applicable to a topmost element on stack alpha (V.sub.1), that element is
removed from the top of stack alpha, and stack beta is pushed down,
successively, with the hypothetical parent node of V.sub.1 (together with
V.sub.1 associated below it), with a place holding symbol "t", and with
the remaining symbols which would be required by the possibly-applicable
rule of grammar (V.sub.2 . . . V.sub.n). For convenience, we restate
parser rule 1 again:
Rule 1: If one of the available rules of grammar applied can be stated as
A.fwdarw.V.sub.1 V.sub.2 . . . V.sub.n ; and if a non terminal X is on top
of stack beta; then (V.sub.1 Y, X) maps onto
##STR2##
2. The second rule of the parser shows how the place holding symbol t is
used to indicate that a perfected partial parse can be transferred from
stack beta to stack alpha:
Rule 2: If the place holding symbol t is on top of the stack beta, and is
immediately followed by a nonterminal symbol A, the place holding symbol t
is deleted and the nonterminal symbol A is transferred from the top of
stack beta over to stack alpha:
##STR3##
Intuitively, the allocation of functions between stacks alpha and beta is
that stack alpha contains only parse trees or well-formed subparts of
parse trees, whereas stack beta is used to hold partially formed subpart
of parse trees.
3. The third rule is used to determine when a necessary portion of the
elements required to complete a parse have been found. The formal
statement is:
Rule 3:
##STR4##
The operation of the parser is initiated by loading the string of words to
be parsed into stack alpha, followed by a numeral sign, and by loading the
higher level symbol corresponding to the root node of the tree to be
constructed, followed by a numeral sign, on stack beta. That is, where (as
is usual) the parser is serving to organize strings of words into properly
parsed sentences, the symbol which indicates a complete sentence in the
grammar (in rules such as S.fwdarw.NP VP) is loaded onto stack beta. Thus,
the initial condition of the procedure can be represented as (W.sub.1
W.sub.2 . . . W.sub.n #, S#).
A successful parse has been found when the final condition of the two
stacks is (S#, S#). In this case, the nodes below the S on the alpha stack
provide the desired complete parse. If the root node (i.e. the object of
the parsing) is not a sentence parse, but, for example, a noun phrase
parse, then the condition indicating a successful parse would be
correspondingly different, e.g. (NP#, NP#).
When no rule is applicable, the instant parsing path represents a bad
parse, and the parser must retrace and follow a different parsing path.
When more than one parser rule can be applied, and/or when more than one
rule of the grammar can be applied through parser rule 1, alternative
parsing paths must be followed. Eventually, all of the alternative parsing
paths will result in either successful or unseccessful parses. The set of
seccessful parses represents the output of the parser, after completion of
the parsing process. This has substantial advantages in the present
invention, as described below.
An example of the operation of this simplified version of a parsing
procedure will now be given. Suppose, for example, that the grammar to be
applied is as follows:
(3.4) Grammar
S.fwdarw.A B
S.fwdarw.A B C
A.fwdarw.A C
A.fwdarw.x
B.fwdarw.C B
B.fwdarw.z
C.fwdarw.y
In this example, the input string is "xyz", which is to be parsed as an
"S".
TABLE 1
______________________________________
Parser Grammar
Rule Stack Alpha Stack Beta Rule
______________________________________
x y z # S #
Rule (1)
y z #
##STR5##
##STR6##
Rule (2)
##STR7## S #
Rule (1) applicable 3 Ways:
See continuations 1.1, 1.2 and 1.3
Cont. 1.1 Rule (1)
y z #
##STR8##
##STR9##
Rule (1)
z #
##STR10##
##STR11##
Rule (2)
##STR12##
##STR13##
Rule (1)
z #
##STR14##
##STR15##
Rule (1)
#
##STR16##
##STR17##
Rule (2)
##STR18##
##STR19##
Rule (3)
#
##STR20##
Rule (2)
##STR21##
##STR22##
Rule (3)
#
##STR23##
Rule (2)
##STR24## S #
PARSE FOUND
Cont. 1.2
Rule (1)
y z #
##STR25##
##STR26##
Rule (1)
z #
##STR27##
##STR28##
Rule (2)
##STR29##
##STR30##
Rule (1)
z #
##STR31##
##STR32##
Rule (1)
#
##STR33##
##STR34##
Rule (2)
##STR35##
##STR36##
Rule (3)
#
##STR37##
Rule (2)
##STR38##
##STR39##
Rule (3)
#
##STR40##
BAD PATH - NO INSTRUCTIONS APPLY
Cont. 1.3 Rule (1)
y z #
##STR41## A AC
Rule (1)
z #
##STR42##
##STR43##
Rule (2)
##STR44##
##STR45##
Rule (1) and Rule (3) both applicable -
See continuations 2.1 and 2.2
Cont. 2.1
Rule (1), $
z #
##STR46##
##STR47##
Rule (1)
#
##STR48##
##STR49##
Rule (2)
##STR50##
##STR51##
Rule (3)
#
##STR52##
Rule (2)
##STR53##
##STR54##
BAD PATH - NO INSTRUCTIONS APPLY
Cont. 2.2-
Rule (3)
z #
##STR55##
Rule (2)
##STR56## S #
Rule (1) applicable 3 ways -
See continuations 3.1, 3.2 and 3.3
Cont. 3.1 Rule (1)
z #
##STR57##
##STR58##
Rule (1)
#
##STR59##
##STR60##
Rule (2)
##STR61##
##STR62##
Rule (3)
#
##STR63##
Rule (2)
##STR64## S #
PARSE FOUND
Cont. 3.2 Rule (1)
z #
##STR65##
##STR66##
Rule (1)
#
##STR67##
##STR68##
Rule (2)
##STR69##
##STR70##
Rule (3)
#
##STR71##
BAD PATH - NO INSTRUCTIONS APPLY
Cont. 3.3 Rule (1)
z #
##STR72##
##STR73##
Rule (1) $
#
##STR74##
##STR75##
Rule (2)
##STR76##
##STR77##
BAD PATH - NO INSTRUCTIONS APPLY
DONE
______________________________________
The foregoing carefully-worked-out example shows in detail the operation of
a parser based on a context-free grammar. It can easily be seen from this
example that the formal manipulation rules as set forth above translate
very simply into programmable operations. For example, the final status of
stack beta, in the very last step of the preceding worked-out example,
could be easily represented in Lisp by a list formatted, e.g., as follows:
(C t (A (A (A x) (C y))) S).
Note that the # is not required in the Lisp implementation, since an empty
list is easily tested for.
As noted, the parser set forth above must be modified for use in the
present invention. The parser used in the present invention must be able
to produce the set of items which can come next, in addition to creating a
| | |