|
Description  |
|
|
BACKGROUND OF THE INVENTION
The present invention relates generally to recognition of user inputs to a
computer system such as a pen-based computer system and, more
particularly, to methods for analyzing the user inputs and providing
interpretations of those user inputs.
A pen-based computer system is a small, often hand-held, computer system
where the primary method for inputting data includes a "pen" or stylus. A
typical pen-based computer system is housed in a small, rectangular
enclosure, and has a dual-function display assembly having a viewing
screen exposed along one of the planar sides of the enclosure. The
dual-function display serves as both an input device and an output device.
When operating as an input device, the display assembly senses the
position of the tip of the stylus on the viewing screen and provides this
information to the computer's central processing unit (CPU). Some display
assemblies can also sense the pressure of the stylus on the screen to
provide further information to the CPU. When operating as an output
device, the screen displays computer-generated images developed by the
CPU.
The dual-function display assemblies of pen-based computer systems permit
users to operate the computer as a computerized notepad. For example,
graphical images can be input into the pen-based computer by merely moving
the stylus over the surface of the screen. As the CPU senses the position
and movement of the stylus, it generates a corresponding image on the
screen to create the illusion that the stylus is drawing the image
directly upon the screen. With suitable recognition software, text and
numeric information can also be entered into the pen-based computer system
in a similar fashion.
What is needed is a system for recognition of user inputs to a computer
system such as a pen-based system and for analysing the user inputs to
determine possible meanings.
SUMMARY OF THE INVENTION
The present invention provides a system and a method for analyzing and
interpreting user inputs to a computer, such as a hand-held, pen-based
computer system, described herein. Inputs are received at a user
interface, such as a dual function display/input screen, from users in the
form of pen strokes or gestures. These pen strokes or gestures are
preferably input into the CPU as an array of X and Y data points
corresponding to the path of a stylus across the display/input screen. A
database stores the input data points and hypotheses regarding possible
interpretations of the strokes represented by the array of data points.
Recognition of the input strokes and recognition of higher level
combinations of strokes (forming characters and words, etc.) is performed
using recognition domains, each of which performs a particular recognition
task. A controller is provided for the recognition domains. An arbiter is
provided for resolving conflicts among competing hypotheses
The recognition domains, or recognizers generate one or more competing
interpretations for the same input. The recognizers use a data structure
called a unit, where a unit is a set of subhypotheses together with all
their interpretations generated by a single recognizer. An interpretation
is a description of a particular portion of the input data (strokes),
where the description is based on the strokes or on other lower-level
interpretations. A recognizer operates at a first level for identifying
one or more groups of related subhypotheses using the unit absent any
interpretations and stores the unit in the database in piece-pool memory.
A recognizer has a second level of operation where each unit generated in
the grouping stage is classified to provide the unit with one or more
interpretations. The classified unit is stored in a unit pool memory.
One or more interpretations of the input data are combined in a
hierarchical structure according to a predetermined scheme in successive
steps to form higher level interpretations.
For a given hierarchy, an independent recognition domain has two required
parts and one optional part. The required parts are: (1) information to
establish the position of the domain in the hierarchy, and (2) knowledge
about how to perform a particular recognition task; and the optional part
is conditions and context constraints that the input data has to satisfy
to be considered for recognition by a recognizer. The information to
establish a domain as a part of a given hierarchy is information about
which type of hypotheses the domain takes as input and information about
which type of hypothesis the domain generates so that the type of input
hypotheses for a domain uniquely establishes its position in the
hierarchy. The knowledge on how to perform a particular recognition task
is subdivided into two parts, where one part is the grouping knowledge for
deciding which of the available strokes or hypotheses should be considered
as a whole to form a new interpretation, and where the other part is
classification knowledge for generating a list of interpretations from a
given set of hypotheses.
A hypothesis is removed from the database when it reaches the top of the
recognition hierarchy and any conflicts with another hypothesis have been
resolved by the arbitration process. When a hypothesis is removed from the
database, its supporting hierarchy of hypotheses, down to the constituent
strokes, is removed from the database and all other hypotheses in the
database that refer to the these constituent strokes, or the hypotheses
that use them, are removed.
The means for extracting information from each isolated portion of the
input data, the means for assigning an interpretation to each portion of
the input data; the means for combining two or more interpretations of the
input data to form new more integrated interpretations; the additional
means for combining two or more new interpretations to form newer
interpretations until a fully integrated interpretation of the input data
is formed, and the means for combining two or more interpretations of the
input data to form new more integrated interpretations includes a
recognition hierarchy for the task of recognizing, for example,
handwritten words and simple graphical shapes.
The controller, or control unit, includes means for deducing from the
information provided by the recognizers, or domains, precisely which
domains should be active within a specific area of the user-input screen
of, for example, a pen-based system. For recognition of words, the control
unit uses a character-part domain, a character domain, and a word domain
within that specific area. Means are provided for pre-computing for each
recognition area of a user screen what action the recognizer should take
when an input of each of the expected types is seen within that area.
The system can resolve conflicts among competing hypotheses and determines
when the recognition process is completed for a given stroke or set of
strokes. The system provides means for generating a hypothesis; means for
registering and accumulating all of the hypotheses generated by each
different recognizer; and means for finding those hypotheses that best
account for the input data. One technique for finding those hypotheses
that best account for the input data includes means for comparing both the
scores of the various interpretations,and the strokes that are accounted
for by those interpretations. If the score of the best hypothesis is
clearly better than that of competing hypotheses, the recognized object is
emited to the user module so that the user module that requested the
object can then examine its own context and decide whether or not to
accept the interpretation.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and form a part of
this specification, illustrate embodiments of the invention and, together
with the description, serve to explain the principles of the invention:
FIG. 1 is a block diagram of a pen-based computer system in which the
present invention is used.
FIG. 2 is a pictorial representation of the screen of a computer display
assembly in accordance with the present invention.
FIG. 3 is a simplified block diagram modelling the architecture of a
recognition system in accordance with the present invention.
FIG. 4 is a diagram illustrating a simple recognition hierarchy for
recognition of words and simple-shape graphics.
FIG. 5 is a diagram representing three layers of recognition into which
inputs to the the display screen can be categorized, namely, an
alwaysArea, a priorityArea, and a defaultArea.
FIG. 6 is a diagram illustrating the flow of information in a system
according to the invention.
FIG. 7 is a flow chart illustrating the sequence of events associated with
performing a recognition sequence.
FIG. 8 is a flow chart illustrating a sequence of programmed steps for
adding a recognizer domain.
FIG. 9 is a flow chart illustrating a sequence of programmed steps for
adding or deleting display screen areas.
FIG. 10 is a flow chart illustrating a sequence of programmed steps for
collecting new input events to the system.
FIG. 11 is a flow chart illustrating a sequence of programmed steps for
performing a recognition sequence.
FIG. 12 is a flow chart illustrating a sequence of programmed steps for
initiating a group operation.
FIG. 13 is a flow chart illustrating a sequence of programmed steps for
grouping one or more groups of related subhypotheses using domain grouping
knowledge.
FIG. 14 is a flow chart illustrating a sequence of programmed steps for
initiating a classify operation.
FIG. 15 is a flow chart illustrating a sequence of programmed steps for
classifying.
FIG. 16 is a flow chart illustrating a cleanup routine for a unit claimed
by an application.
FIG. 17 is a flow chart illustrating a routine for an application handling
a recognized object and for marking claimed units for deletion.
FIG. 18 is a flow chart for the ClaimUnits method, which is a recursive
method called by an application to mark all claimed units for deletion.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
A Pen-Based Computer System
In FIG. 1, a pen-based computer system 10 in accordance with the present
invention includes a central processing unit (CPU) 12, read only memory
(ROM) 14, random access memory (RAM) 16, input/output (I/O) circuitry 18,
and a display assembly 20. The pen-based computer system 10 may also
optionally include a mass storage unit 22 such as a disk drive unit or
non-volatile memory such as flash memory, and an array of input buttons
23.
The CPU 12 is preferably a commercially-available, single chip
microprocessor. While CPU 12 can be a complex instruction set computer
(CISC) chip, it is preferable that CPU 12 be one of the commercially
available, reduced instruction set computer (RISC) chips which are known
to be of generally higher performance than CISC chips. CPU 12 is coupled
to ROM 14 by a uni-directional data bus 24. ROM 14 contains the basic
operating system for the pen-based computer system 10. CPU 12 is connected
to RAM 16 by a bi-directional data bus 26 to permit the use of RAM 16 as
scratch pad memory. ROM 14 and RAM 16 are also coupled to CPU 12 by
appropriate control and address busses, as is well known to those skilled
in the art. CPU 12 is further coupled to the I/O circuitry 18 by
bi-directional data bus 28 to permit data transfers with peripheral
devices.
I/O circuitry 18 typically includes a number of latches, registers and
direct memory access (DMA) controllers. The purpose of I/O circuitry 18 is
to provide an interface between CPU 12 and such peripheral devices as
display assembly 20, mass storage 22, and the array of input buttons 23.
Display assembly 20 of pen-based computer system 10 is both an input and an
output device. Accordingly, it is coupled to I/O circuitry 18 by a
bi-directional data bus 30. When operating as an output device, the
display assembly 20 receives data from I/O circuitry 18 via bus 30 and
displays that data on a suitable screen. The screen for display assembly
20 is preferably a commercially available liquid crystal display (LCD)
which is available from a variety of manufacturers. The input device of
display assembly 20 is preferably a thin, clear membrane which covers the
LCD display and which is sensitive to the position of a stylus 32 on its
surface. These position-sensitive membranes are also readily available on
the commercial market. Combination display assemblies such as display
assembly 20 which include both the LCD output screen and the input
membrane are commercially available from such vendors as Scriptel
Corporation of Columbus, Ohio.
Other types of pointing devices can also be used in conjunction with the
present invention. While the method of the present invention is described
in the context of a pen-based system, other pointing devices such as a
computer mouse, a track ball, or a tablet can be used to manipulate a
pointer on a screen. Therefore, as used herein, the terms "pointing
device", "pointing means", and the like will refer to any mechanism or
device for pointing to a particular location on a screen of a computer
display.
Some type of mass storage 22 is generally considered desirable. However,
the mass storage 22 can be eliminated by providing a sufficient amount of
RAM 16 to store user application programs and data. In this instance, the
RAM 16 can be provided with a back-up battery to prevent the loss of data
even when the pen-based computer system 10 is turned off. However, it is
generally desirable to have some type of long term storage 22 such as a
commercially available miniature hard disk drive, or non-volatile memory
such as flash memory or battery-backed RAM.
In operation, information is input into the pen-based computer system 10 by
"writing" on the screen of display assembly 20 with the stylus 32.
Information concerning the location of the stylus 32 on the screen of the
display assembly 20 is input into the CPU via I/O circuitry 18. The CPU 12
then processes the data under control of an operating system and/or
application program stored in ROM 14 and/or RAM 16. The CPU 12 then
produces data which is output to the display assembly 20 to produce
appropriate images on its screen.
In FIG. 2, the pen-based computer system 10 is shown housed within an
enclosure 36. The CPU 12, ROM 14, RAM 16, I/O circuitry 18, and mass
storage 22 are preferably fully enclosed within the enclosure 36. The
display assembly 20 is mostly enclosed within the enclosure 36, but
exposes a viewing screen 38. As used herein, the term "screen" will refer
to the portion of the display assembly 20 which can display an image that
can be viewed by a user. Also accessible to the user are the array of
input buttons 23.
RECOGNITION ARCHITECTURE
FIG. 3 is a block diagram of a preferred recognition architecture 100 of a
recognition system of the present invention. This recognition architecture
depicts the organization of a system of data structures and procedures for
analyzing user inputs to a computer and for determining possible
interpretations of those user inputs. The following describes the
organization of the knowledge and data within the recognition architecture
model and the problem solving approach taken using object-oriented
programming techniques and terminology.
For a pen-based computer the user inputs are in the form of strokes of a
pen drawn on an electronic tablet, keystrokes, and voice inputs. This
recognition model can also be extended to other input sources, including
those of non-human origin. In places where the handwritten inputs are
described as characters or words, these inputs can also be numbers,
formulas, or graphic diagrams.
The recognition architecture model 100 consists of several basic system
components: recognition domain group 102; database 104; a control unit
106; recognition areas 108; and an arbiter 110.
The recognition domain group 102 includes a number of recognition domains.
A recognition domain is a recognizer, which contains the knowledge and
methods required to recognize a particular object (for instance,
characters or words).
The database 104 contains two arrays: a piece pool 111 and a unit pool 112.
These two arrays store the input data strokes to the recognition system
and the active recognition hypotheses. The recognition domains perform
changes to the strokes and hypotheses (via the controller). These changes
lead progressively to the recognition of the input. The pool mechanism
eliminates the need for direct communication between domains while
providing a uniform communication mechanism.
The controller, or control unit 106, manages the whole process. It
maintains communication with the recognition areas. It stores and
retrieves hypotheses from the unit and piece pools and ensures their
consistency. It hands the hypotheses to the appropriate domains for
recognition.
The recognition areas are determined by application components, which
specify areas of the pen-based screen (a text field, for instance) and the
type of objects that are expected in these areas. The controller maintains
a list of these areas, and when an object of the specified type (a word,
for instance) is recognized within an area, the controller passes the
object back to the application component.
A recognition domain 102 includes the knowledge and procedures to perform a
particular recognition task (e.g. word recognition, paragraph
recognition). Recognition domains are independent and are organized into a
hierarchy.
The database 104 stores the input data (strokes) to the recognition system
and the active recognition hypotheses. The recognition domains 102 perform
changes to strokes and hypotheses in the database 104. These changes lead
progressively to recognition of the input under the control of the
controller, or control unit 106, which manages the database 104 and
schedules the recognition tasks. A recognition domain 102 does not have
direct access to the database 104. Rather, the control unit 106 provides,
according to a schedule, appropriate strokes or hypotheses to each
recognition domain 102 for recognition. Then, some time later, the
recognition domain 102 returns to the control unit 106 new hypotheses for
storage into the database 102.
The arbiter 110 coexists with the controller and resolves conflicts among
competing hypothesis and also determines when the recognition process is
completed for a given stroke or set of strokes. The recognized unit is
then returned to the application program which requested recognition. The
application extracts information from the recognized unit and verifies to
the controller 106 that the extraction has taken place. After this
verification, the controller 106 removes the recognized unit and all units
related to the recognized unit from the memory arrays 104.
Recognition Areas
In order to receive input, an application needs to specify an area of the
display/input screen and the type of object it expects in that area. This
process is known as registering an area. A toolbox provides the view class
for this purpose.
A recognition area is represented by the TRecArea class:
______________________________________
class TRecArea : public TObject {
public:
ULong fArea;
TTypeAssoc *fATypes; // arbitration unit types
TTypeAssoc *fGTypes; // grouping unit types
// classification unit types = atypes + gtypes)
public:
static TRecArea *Make(ULong area, ULong flags);
void Dispose(void);
void AddAType(ULong atype, ULong (*Handler)
(TSIUnit *, ULong, dInfoRec *),
dInfoRec *dInfo);
};
______________________________________
The controller maintains a list of registered areas. When a new unit is
presented to the recognition system (usually a stroke), the controller
searches its list to determine which of the reas registered by the
application are hit by the new unit. An area is "hit" if the unit is more
than half in the area. This measurement is based on the bounding
rectangles of the areas and unit, and their intersection.
All the areas hit by this unit are collected into a list known as an
AreaList. This AreaList is then stored into the unit itself. It is also
stored into all units that are derived from this original unit. An
advantage of this technique is that when a new classification is assigned
to a unit, the unit's areaList can be scanned directly to determine who
should be called next, and no redundant searching of the controller's
areas is necessary. The controller maintains data structures that help it
perform only the necessary types of recognition in each area. If the area
is no longer needed, the method RemoveArea can be called, with the id
returned by RegisterArea as parameter.
The application that is interested in having recognition performed in a
particular area called the TController method RegisterArea, specifying the
type of area, the area, the type of units that it wants recognized, and a
routine to be called when an object of the appropriate type is recognized
(the "Display" routine). The area is passed in as a rectShape (whose
structure may vary, depending on the graphics system in use). An area can
be created by calling GNewRShape. The unit type is the same type that is
passed to the initialize routine of the domain that recognizes units of
that type. The display routine is also discussed below.
There is an alternate method for registering areas which allows multiple
types of things to be recognized within a single area. To do this, you
must first create a new recognition area by calling TRecArea::Make, and
passing in the rectShape that defines the area. Next you specify the types
of units that should be recognized, calling TRecArea::AddAType once for
each type to be added (and a "Display" routine for each). Once all types
have been added, call TController::BuildAndRegister on the area. In the
section on Display routines, below, there is further description of the
way that you are notified when recognition occurs.
Hierarchical Recognition Structure
FIG. 4 illustrates a simple recognition hierarchical structure 120 for
recognition of words and simple-shape graphics. It uses a problem solving
technique called a bottom-up approach for solving a complex recognition
problem. The first step is to isolate portions of the input data. The next
step is to extract information from each isolated portion. Then an
interpretation is assigned to each portion. Following this, one or more
interpretations of the input data are combined to form new, more
integrated interpretation. Some of the new interpretations are then
combined to form newer interpretations, and so on, until a fully
integrated interpretation of the input data has been formed.
A bottom-up recognition approach is visualized by a hierarchical structure,
or hierarchy, where each node in the hierarchy performs some type of
recognition. The lowest level in the hierarchy is the level closest to the
data and the level below the root node represents the last level of
integration among hypotheses. At the root node of the hierarchy, if there
are competing hypothesis for the data, the best one is chosen according to
some criterion. We will refer to lowest level in the hierarchy as the
first level, to the next level up as the second level and so on.
FIG. 4 presents an example of a simple recognition hierarchical structure
120 for the task of recognizing handwritten words and simple graphical
shapes as inputs. This recognition hierarchy is used to illustrate many of
the issues that arise in the recognition process.
To explain the recognition process in more detail it is necessary to
introduce two terms: interpretation and hypothesis. An interpretation is a
description of a portion of the input data (strokes). This description may
be based directly on the strokes or it may be based on other
interpretations. For example, if we are given the strokes representing the
script letters "c", "a", "r", an interpretation of these strokes is the
word "car". This interpretation may be based directly on the fact that the
strokes form such a word; or it may be based on the fact that the strokes
are interpreted as the letters c, a and r, and that the word car is made
up by these three interpretations. A hypothesis is a portion of the input
data together with an interpretation of the data. A hypothesis may be
derived directly from the input data or from other hypothesis.
In the simple recognition hierarchical structure 120 of FIG. 4, there are
two nodes 122, 124 at the first level and two nodes 126,128 at the second
level. There is one node 130 at the third level. At the highest, or root
node, 132 is performed an arbitration process among the various
hypotehesis produced. The left node 122 at the first level isolates
strokes and recognizes character parts. The right node 124 at the first
level isolates strokes and recognizes graphical shape parts. The left node
126 at the second level integrates character parts to form character
interpretations, and the right node 128 integrates shape parts to form
shape interpretations. Finally, the node 130 at the third level integrates
characters to form words. At the root node 132, arbitration among
hypotheses takes place when necessary.
In the simple recognition hierarchical structure 120, the bottom-up
approach to recognition of handwritten data is utilized, where a
recognition domain is at each node of the hierarchy, except the root node.
A recognition domain can be thought of as an element that encapsulates all
the knowledge necessary to perform a well defined recognition task.
Recognition domains are discussed in more detail herein after. The
recognition domains 122, 124 at the first level in the hierarchy isolate a
stroke or set of strokes from the input and produce very simple hypotheses
by analyzing the strokes (e.g. character parts, shape parts). The
recognition domains 126,128 at the second level in the hierarchy select
some of the hypotheses generated by the domains at the first level, and
generate new hypotheses by combining the ones that were selected (e.g.
characters, shapes). The domains at the next levels proceed in the same
fashion until the top of the hierarchy is reached.
In order for recognition to be successfully completed using the bottom-up
approach proposed, several additional problems need to be solved. One
problem to solve is hypothesis proliferation. By this we mean to prevent
the number of hypotheses present in the system at any given time from
growing without bounds. A second problem to solve is the arbitration of
conflicting and competing hypotheses. This problem arises because
recognition domains are independent from each other and may propose
alternative interpretations for the same data. We say that two or more
hypotheses are competing hypotheses if they use the same subhypotheses,
and that two or more hypotheses are conflicting hypotheses if they share a
subhypothesis.
In this system, both problems are solved with some help provided by the
recognition domains. To prevent hypothesis proliferation, each recognition
domain is required to keep the number of hypotheses it proposes under a
given limit. To help arbitration, each domain assigns a numerical value to
its hypotheses. This numerical value is called a score. The score of a
hypothesis reflects the degree of confidence that the domain has in the
hypothesis. The controller and the application interact to handle a
recognized object and cleanup units after being claimed by an application.
The recognition task is accomplished by independent recognition domains
organized into a hierarchical structure. Essentially, for a given
hierarchy, a recognition domain has two required parts and one optional
part. The required parts are: (1) information to establish the position of
the domain in the hierarchy, and (2) knowledge about how to perform a
particular recognition task. The optional part of a recognition domain is
concerned with conditions that the input data has to satisfy to be
considered for recognition by the domain and context constraints for
recognition.
The information to establish a domain as a part of a given hierarchy is
which type of hypotheses the domain takes as input and which type of
hypothesis the recognition domain generates. The type of input hypotheses
for a recognition domain uniquely establishes its position in the
hierarchy.
The knowledge of how to perform a particular recognition task can be
thought of as being subdivided into two parts, grouping knowledge and
classification knowledge. Grouping knowledge is the knowledge for deciding
which of the available strokes or hypotheses should be considered as a
whole to form a new interpretation. Classification knowledge is the
knowledge for generating a new interpretation from a given set of
hypotheses.
The optional part of a recognition domain provides a mechanism for ensuring
that a hypothesis produced by a recognition domain satisfies some
pre-established constraints. For instance, in an application that has
fields or areas in a form, we may want to specify an area which will
accept only date information. Hence, we need a mechanism to specify that
the input data to the field can only be recognized as a date and is
rejected if it is not a date. Thus each recognition domain can provide a
grammar, or rulebook, for limiting and sequencing its inputs.
Referring back to FIG. 3, in the recognition architecture model 100 of FIG.
3, strokes are stored in the piece pool 111 of the database 104. In
addition, the intermediate hypotheses generated by the domains during a
recognition task are also stored in the unit pool 112 of the database 104.
The database 104 eliminates the need for direct communication among
recognition domains while providing an indirect, but uniform,
communication mechanism between recognition domains.
A stroke is stored in the piece pool 111 of the database 104 as it enters
the system. As the various levels of the recognition domains in the
recognition heirarchy are activated, these recognition domains store their
hypotheses in the database 104. A hypothesis is removed from the database
104 when it reaches the top of the recognition hierarchy and any conflicts
with another hypothesis has been resolved by the arbiter 110. When a
hypothesis is removed from the database 104, its supporting hierarchy of
hypotheses, down to the constituent strokes, is removed from the database
104. In addition all other hypotheses in the database 104 that refer to
the these constituent strokes, or the hypotheses that use them, are
removed.
In this system, control function for the system is centered in the control
unit 106. The control unit 106 has three main functions: 1) the control
unit 106 stores and retrieves hypotheses from the database 104 and it
ensures the integrity of the database 104; 2) the control unit 106 hands
the appropriate hypotheses to the recognition domains 102 for grouping and
classification; and 3) the control unit 106 schedules grouping and
classification tasks.
The recognition function involves the use of recognition areas. A
recognition area is a specific area on the input screen of a pen-based
computer. When a recognition client (such as a document) wants to perform
specific types of recognition in a specific area, the recognition client
informs the control unit 106 about the types of recognition that the
recognition client is interested in. The recognition client also informs
the control unit about the screen areas, in which information is to be
recognized. The control unit 106 deduces from the information provided by
the recognition domains, precisely which recognition domains should be
active within a designated area of the screen. For example, if an area is
| | |