WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Recognition system and method for user inputs to a computer system    

Get related patents on CD
United States Patent5463696   
Link to this pagehttp://www.wikipatents.com/5463696.html
Inventor(s)Beernink; Ernest H. (San Carlos, CA); Pagallo; Giulia (Cupertino, CA); Bozinovic; Radmilo (San Jose, CA)
AbstractA technique for analyzing and interpreting user inputs to a computer, such as strokes, key depressions, or voice signals to a hand-held, pen-based computer system. Inputs to the system are received at a user interface, such as a dual function display/input screen from users in the form of pen strokes or gestures. A database stores the input data strokes and hypotheses regarding possible interpretations of the strokes. Recognition of the input strokes and recognition of higher level combinations of strokes (forming characters and words, etc.) is performed using recognizers, or recognition domains, each of which performs a particular recognition task. A controller is provided for controlling the hypotheses database and for scheduling the recognition tasks in the recognition domains. Arbitration resolves conflicts among competing hypotheses associated with each interpretation. The recognition domains, or recognizers generate two or more competing interpretations for the same input. The recognizers use a data structure called a unit, where a unit is a set of subhypotheses together with all their interpretations generated by a single recognizer. A recognizer operates at a first level for identifying one or more groups of related subhypotheses using grouping knowledge. These grouped subhypotheses generate a unit with no interpretations for each group and store the unit in the database in what is called a piece-pool memory. A recognizer has a second level of operation where each unit generated in the grouping stage is classified to provide the unit with one or more interpretations. The classified unit are stored in a unit pool memory. Two or more interpretations of the input data are combined in a hierarchical structure according to a predetermined scheme in successive steps to form higher level interpretations.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History Custom Search
Inventor     Beernink; Ernest H. (San Carlos, CA); Pagallo; Giulia (Cupertino, CA); Bozinovic; Radmilo (San Jose, CA)
Owner/Assignee     Apple Computer, Inc. (Cupertino, CA)
Patent assignment
All assignments
Company News
Publication Date     October 31, 1995
Application Number     08/270,405
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     July 5, 1994
US Classification     382/186
Int'l Classification     G06K 009/00
Examiner     Couso; Jose L.
Assistant Examiner    
Attorney/Law Firm     Hickman & Beyer
Address
Parent Case     This is a continuation of application Ser. No. 07/889,216 filed May 27, 1992 now abandoned.
Priority Data    
USPTO Field of Search     382/9 382/10 382/13 382/22 382/25 382/30 382/34 382/48
Patent Tags     recognition user inputs computer
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5197107
Katsuyama
382/209
Mar,1993

[0 after 0 votes]
5191622
Shojima
382/185
Mar,1993

[0 after 0 votes]
5038382
Lipscomb
382/189
Aug,1991

[0 after 0 votes]
5034989
Loh

Jul,1991

[0 after 0 votes]
4933865
Yamamoto
382/203
Jun,1990

[0 after 0 votes]
4680804
Kuzunuki
382/185
Jul,1987

[0 after 0 votes]
4653107
Shojima
382/189
Mar,1987

[0 after 0 votes]
5157737
Sklarew
382/315
Dec,1969

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B

[0 market size comments]
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%

[0 market share comments]
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%

[0 reasonable royalty comments]
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

[0 Guesstimation of Royalty Value Comments]
License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
[0 license availability comments]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
[0 owner/assignee comments]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

[0 competitive advantage comments]
Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

[0 commercial alternatives comments]
 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


We claim:

1. A system for recognizing user input to a computer from a user input device, comprising:

input means for receiving user input data in the form of a signal from a user input device chosen from the group consisting essentially of a tablet input device and a voice input device;

means for storing said user input data in memory of said computer system;

a hierarchy of independent recognition domains, each of said hierarchy of independent recognition domains comprising group means and classify means, including:

a first recognition domain for receiving a portion of said user input data as a first recognition domain input and performing character-related recognition on the user input data only if the user input data is of the type expected in a screen recognition area and generating one or more active character-related hypotheses, each of said active character-related hypotheses comprising a portion of said user input data together with a character-related interpretation of said user input data;

a second recognition domain for receiving at least one of said active character-related hypotheses as a second recognition domain input and performing word-related recognition on the user input data only if the user input data is of the type expected in said screen recognition area and generating one or more active word-related hypotheses, at least one of said active word-related hypotheses being derived from said one of said active character-related hypotheses;

a third recognition domain for receiving a portion of said user input data as a third recognition domain input and performing shape part recognition on the user input data only if the user input data is of the type expected in said screen recognition area and generating one or more active shape-pan-related hypotheses, each of said shape-part-related hypotheses comprising a portion of said user input data together with a shape-part-related interpretation of said user input data; and

a fourth recognition domain for receiving at least one of said active shape-part-related hypotheses as a fourth recognition domain input and performing shape recognition on the user input data only if the user input data is of the type expected in said screen recognition area and generating one or more active shape-related hypotheses, at least one of said active shape-related hypotheses being derived from said one of said active shape-part-related hypotheses;

means for storing said active hypotheses in a database comprising a piece pool and a unit pool in said memory of said computer system;

a control unit including:

first control means for managing the means for storing active hypotheses, said first control means further including means for removing a previously stored hypothesis from said database, said means for removing further removes a hierarchy of hypotheses from which said previously stored hypothesis is derived, down to constituent strokes as well all other hypotheses in said database that refer to said constituent strokes; and

second control means for scheduling the recognition tasks in said recognition domains;

arbiter means for resolving multiple active hypotheses from said recognition domains and choosing a recognition result among said multiple active hypotheses from said recognition domains; and

a user output device for communicating to said user said recognition result.

2. The system of claim 1 wherein said group means groups domain inputs into groups, said classify means generates one or more hypotheses from said groups and being a lower priority task than said group means, said classify means being activated after a settable delay time period following a completion of a grouping task by said group means.

3. The system of claim 2 wherein said group means includes:

means for identifying one or more groups of related domain inputs using domain grouping knowledge;

means for generating a unit with no hypothesis for each group and storing said unit in the database; and

means for storing units with no hypothesis in said piece-pool; and

wherein said classify means includes:

means for receiving each unit generated by the grouping means and providing a unit with a hypothesis; and

means for storing units with one or more hypothesis in said unit pool.

4. The system of claim 2 wherein said second recognition domain further comprises means for combining at least two interpretations from said one or more active character-related hypotheses to form a higher level interpretation for one of said active word-related hypotheses.

5. The system of claim 4 wherein said hierarchy of independent recognition domains comprises multiple, independent recognition domains organized into a hierarchical structure based upon hierarchy information, said hierarchy information comprises information about domain inputs and information about which type of hypothesis the domain generates so that the type of input hypotheses for a domain establishes its position in said hierarchical structure.

6. The system of claim 5 wherein each of said hierarchy of independent recognition domains further comprises means for preventing hypothesis proliferation, said means for preventing hypothesis proliferation keeps the number of hypotheses a recognition domain proposes under a given limit.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

The present invention relates generally to recognition of user inputs to a computer system such as a pen-based computer system and, more particularly, to methods for analyzing the user inputs and providing interpretations of those user inputs.

A pen-based computer system is a small, often hand-held, computer system where the primary method for inputting data includes a "pen" or stylus. A typical pen-based computer system is housed in a small, rectangular enclosure, and has a dual-function display assembly having a viewing screen exposed along one of the planar sides of the enclosure. The dual-function display serves as both an input device and an output device. When operating as an input device, the display assembly senses the position of the tip of the stylus on the viewing screen and provides this information to the computer's central processing unit (CPU). Some display assemblies can also sense the pressure of the stylus on the screen to provide further information to the CPU. When operating as an output device, the screen displays computer-generated images developed by the CPU.

The dual-function display assemblies of pen-based computer systems permit users to operate the computer as a computerized notepad. For example, graphical images can be input into the pen-based computer by merely moving the stylus over the surface of the screen. As the CPU senses the position and movement of the stylus, it generates a corresponding image on the screen to create the illusion that the stylus is drawing the image directly upon the screen. With suitable recognition software, text and numeric information can also be entered into the pen-based computer system in a similar fashion.

What is needed is a system for recognition of user inputs to a computer system such as a pen-based system and for analysing the user inputs to determine possible meanings.

SUMMARY OF THE INVENTION

The present invention provides a system and a method for analyzing and interpreting user inputs to a computer, such as a hand-held, pen-based computer system, described herein. Inputs are received at a user interface, such as a dual function display/input screen, from users in the form of pen strokes or gestures. These pen strokes or gestures are preferably input into the CPU as an array of X and Y data points corresponding to the path of a stylus across the display/input screen. A database stores the input data points and hypotheses regarding possible interpretations of the strokes represented by the array of data points. Recognition of the input strokes and recognition of higher level combinations of strokes (forming characters and words, etc.) is performed using recognition domains, each of which performs a particular recognition task. A controller is provided for the recognition domains. An arbiter is provided for resolving conflicts among competing hypotheses

The recognition domains, or recognizers generate one or more competing interpretations for the same input. The recognizers use a data structure called a unit, where a unit is a set of subhypotheses together with all their interpretations generated by a single recognizer. An interpretation is a description of a particular portion of the input data (strokes), where the description is based on the strokes or on other lower-level interpretations. A recognizer operates at a first level for identifying one or more groups of related subhypotheses using the unit absent any interpretations and stores the unit in the database in piece-pool memory. A recognizer has a second level of operation where each unit generated in the grouping stage is classified to provide the unit with one or more interpretations. The classified unit is stored in a unit pool memory.

One or more interpretations of the input data are combined in a hierarchical structure according to a predetermined scheme in successive steps to form higher level interpretations.

For a given hierarchy, an independent recognition domain has two required parts and one optional part. The required parts are: (1) information to establish the position of the domain in the hierarchy, and (2) knowledge about how to perform a particular recognition task; and the optional part is conditions and context constraints that the input data has to satisfy to be considered for recognition by a recognizer. The information to establish a domain as a part of a given hierarchy is information about which type of hypotheses the domain takes as input and information about which type of hypothesis the domain generates so that the type of input hypotheses for a domain uniquely establishes its position in the hierarchy. The knowledge on how to perform a particular recognition task is subdivided into two parts, where one part is the grouping knowledge for deciding which of the available strokes or hypotheses should be considered as a whole to form a new interpretation, and where the other part is classification knowledge for generating a list of interpretations from a given set of hypotheses.

A hypothesis is removed from the database when it reaches the top of the recognition hierarchy and any conflicts with another hypothesis have been resolved by the arbitration process. When a hypothesis is removed from the database, its supporting hierarchy of hypotheses, down to the constituent strokes, is removed from the database and all other hypotheses in the database that refer to the these constituent strokes, or the hypotheses that use them, are removed.

The means for extracting information from each isolated portion of the input data, the means for assigning an interpretation to each portion of the input data; the means for combining two or more interpretations of the input data to form new more integrated interpretations; the additional means for combining two or more new interpretations to form newer interpretations until a fully integrated interpretation of the input data is formed, and the means for combining two or more interpretations of the input data to form new more integrated interpretations includes a recognition hierarchy for the task of recognizing, for example, handwritten words and simple graphical shapes.

The controller, or control unit, includes means for deducing from the information provided by the recognizers, or domains, precisely which domains should be active within a specific area of the user-input screen of, for example, a pen-based system. For recognition of words, the control unit uses a character-part domain, a character domain, and a word domain within that specific area. Means are provided for pre-computing for each recognition area of a user screen what action the recognizer should take when an input of each of the expected types is seen within that area.

The system can resolve conflicts among competing hypotheses and determines when the recognition process is completed for a given stroke or set of strokes. The system provides means for generating a hypothesis; means for registering and accumulating all of the hypotheses generated by each different recognizer; and means for finding those hypotheses that best account for the input data. One technique for finding those hypotheses that best account for the input data includes means for comparing both the scores of the various interpretations,and the strokes that are accounted for by those interpretations. If the score of the best hypothesis is clearly better than that of competing hypotheses, the recognized object is emited to the user module so that the user module that requested the object can then examine its own context and decide whether or not to accept the interpretation.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention:

FIG. 1 is a block diagram of a pen-based computer system in which the present invention is used.

FIG. 2 is a pictorial representation of the screen of a computer display assembly in accordance with the present invention.

FIG. 3 is a simplified block diagram modelling the architecture of a recognition system in accordance with the present invention.

FIG. 4 is a diagram illustrating a simple recognition hierarchy for recognition of words and simple-shape graphics.

FIG. 5 is a diagram representing three layers of recognition into which inputs to the the display screen can be categorized, namely, an alwaysArea, a priorityArea, and a defaultArea.

FIG. 6 is a diagram illustrating the flow of information in a system according to the invention.

FIG. 7 is a flow chart illustrating the sequence of events associated with performing a recognition sequence.

FIG. 8 is a flow chart illustrating a sequence of programmed steps for adding a recognizer domain.

FIG. 9 is a flow chart illustrating a sequence of programmed steps for adding or deleting display screen areas.

FIG. 10 is a flow chart illustrating a sequence of programmed steps for collecting new input events to the system.

FIG. 11 is a flow chart illustrating a sequence of programmed steps for performing a recognition sequence.

FIG. 12 is a flow chart illustrating a sequence of programmed steps for initiating a group operation.

FIG. 13 is a flow chart illustrating a sequence of programmed steps for grouping one or more groups of related subhypotheses using domain grouping knowledge.

FIG. 14 is a flow chart illustrating a sequence of programmed steps for initiating a classify operation.

FIG. 15 is a flow chart illustrating a sequence of programmed steps for classifying.

FIG. 16 is a flow chart illustrating a cleanup routine for a unit claimed by an application.

FIG. 17 is a flow chart illustrating a routine for an application handling a recognized object and for marking claimed units for deletion.

FIG. 18 is a flow chart for the ClaimUnits method, which is a recursive method called by an application to mark all claimed units for deletion.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A Pen-Based Computer System

In FIG. 1, a pen-based computer system 10 in accordance with the present invention includes a central processing unit (CPU) 12, read only memory (ROM) 14, random access memory (RAM) 16, input/output (I/O) circuitry 18, and a display assembly 20. The pen-based computer system 10 may also optionally include a mass storage unit 22 such as a disk drive unit or non-volatile memory such as flash memory, and an array of input buttons 23.

The CPU 12 is preferably a commercially-available, single chip microprocessor. While CPU 12 can be a complex instruction set computer (CISC) chip, it is preferable that CPU 12 be one of the commercially available, reduced instruction set computer (RISC) chips which are known to be of generally higher performance than CISC chips. CPU 12 is coupled to ROM 14 by a uni-directional data bus 24. ROM 14 contains the basic operating system for the pen-based computer system 10. CPU 12 is connected to RAM 16 by a bi-directional data bus 26 to permit the use of RAM 16 as scratch pad memory. ROM 14 and RAM 16 are also coupled to CPU 12 by appropriate control and address busses, as is well known to those skilled in the art. CPU 12 is further coupled to the I/O circuitry 18 by bi-directional data bus 28 to permit data transfers with peripheral devices.

I/O circuitry 18 typically includes a number of latches, registers and direct memory access (DMA) controllers. The purpose of I/O circuitry 18 is to provide an interface between CPU 12 and such peripheral devices as display assembly 20, mass storage 22, and the array of input buttons 23.

Display assembly 20 of pen-based computer system 10 is both an input and an output device. Accordingly, it is coupled to I/O circuitry 18 by a bi-directional data bus 30. When operating as an output device, the display assembly 20 receives data from I/O circuitry 18 via bus 30 and displays that data on a suitable screen. The screen for display assembly 20 is preferably a commercially available liquid crystal display (LCD) which is available from a variety of manufacturers. The input device of display assembly 20 is preferably a thin, clear membrane which covers the LCD display and which is sensitive to the position of a stylus 32 on its surface. These position-sensitive membranes are also readily available on the commercial market. Combination display assemblies such as display assembly 20 which include both the LCD output screen and the input membrane are commercially available from such vendors as Scriptel Corporation of Columbus, Ohio.

Other types of pointing devices can also be used in conjunction with the present invention. While the method of the present invention is described in the context of a pen-based system, other pointing devices such as a computer mouse, a track ball, or a tablet can be used to manipulate a pointer on a screen. Therefore, as used herein, the terms "pointing device", "pointing means", and the like will refer to any mechanism or device for pointing to a particular location on a screen of a computer display.

Some type of mass storage 22 is generally considered desirable. However, the mass storage 22 can be eliminated by providing a sufficient amount of RAM 16 to store user application programs and data. In this instance, the RAM 16 can be provided with a back-up battery to prevent the loss of data even when the pen-based computer system 10 is turned off. However, it is generally desirable to have some type of long term storage 22 such as a commercially available miniature hard disk drive, or non-volatile memory such as flash memory or battery-backed RAM.

In operation, information is input into the pen-based computer system 10 by "writing" on the screen of display assembly 20 with the stylus 32. Information concerning the location of the stylus 32 on the screen of the display assembly 20 is input into the CPU via I/O circuitry 18. The CPU 12 then processes the data under control of an operating system and/or application program stored in ROM 14 and/or RAM 16. The CPU 12 then produces data which is output to the display assembly 20 to produce appropriate images on its screen.

In FIG. 2, the pen-based computer system 10 is shown housed within an enclosure 36. The CPU 12, ROM 14, RAM 16, I/O circuitry 18, and mass storage 22 are preferably fully enclosed within the enclosure 36. The display assembly 20 is mostly enclosed within the enclosure 36, but exposes a viewing screen 38. As used herein, the term "screen" will refer to the portion of the display assembly 20 which can display an image that can be viewed by a user. Also accessible to the user are the array of input buttons 23.

RECOGNITION ARCHITECTURE

FIG. 3 is a block diagram of a preferred recognition architecture 100 of a recognition system of the present invention. This recognition architecture depicts the organization of a system of data structures and procedures for analyzing user inputs to a computer and for determining possible interpretations of those user inputs. The following describes the organization of the knowledge and data within the recognition architecture model and the problem solving approach taken using object-oriented programming techniques and terminology.

For a pen-based computer the user inputs are in the form of strokes of a pen drawn on an electronic tablet, keystrokes, and voice inputs. This recognition model can also be extended to other input sources, including those of non-human origin. In places where the handwritten inputs are described as characters or words, these inputs can also be numbers, formulas, or graphic diagrams.

The recognition architecture model 100 consists of several basic system components: recognition domain group 102; database 104; a control unit 106; recognition areas 108; and an arbiter 110.

The recognition domain group 102 includes a number of recognition domains. A recognition domain is a recognizer, which contains the knowledge and methods required to recognize a particular object (for instance, characters or words).

The database 104 contains two arrays: a piece pool 111 and a unit pool 112. These two arrays store the input data strokes to the recognition system and the active recognition hypotheses. The recognition domains perform changes to the strokes and hypotheses (via the controller). These changes lead progressively to the recognition of the input. The pool mechanism eliminates the need for direct communication between domains while providing a uniform communication mechanism.

The controller, or control unit 106, manages the whole process. It maintains communication with the recognition areas. It stores and retrieves hypotheses from the unit and piece pools and ensures their consistency. It hands the hypotheses to the appropriate domains for recognition.

The recognition areas are determined by application components, which specify areas of the pen-based screen (a text field, for instance) and the type of objects that are expected in these areas. The controller maintains a list of these areas, and when an object of the specified type (a word, for instance) is recognized within an area, the controller passes the object back to the application component.

A recognition domain 102 includes the knowledge and procedures to perform a particular recognition task (e.g. word recognition, paragraph recognition). Recognition domains are independent and are organized into a hierarchy.

The database 104 stores the input data (strokes) to the recognition system and the active recognition hypotheses. The recognition domains 102 perform changes to strokes and hypotheses in the database 104. These changes lead progressively to recognition of the input under the control of the controller, or control unit 106, which manages the database 104 and schedules the recognition tasks. A recognition domain 102 does not have direct access to the database 104. Rather, the control unit 106 provides, according to a schedule, appropriate strokes or hypotheses to each recognition domain 102 for recognition. Then, some time later, the recognition domain 102 returns to the control unit 106 new hypotheses for storage into the database 102.

The arbiter 110 coexists with the controller and resolves conflicts among competing hypothesis and also determines when the recognition process is completed for a given stroke or set of strokes. The recognized unit is then returned to the application program which requested recognition. The application extracts information from the recognized unit and verifies to the controller 106 that the extraction has taken place. After this verification, the controller 106 removes the recognized unit and all units related to the recognized unit from the memory arrays 104.

Recognition Areas

In order to receive input, an application needs to specify an area of the display/input screen and the type of object it expects in that area. This process is known as registering an area. A toolbox provides the view class for this purpose.

A recognition area is represented by the TRecArea class:

______________________________________ class TRecArea : public TObject { public: ULong fArea; TTypeAssoc *fATypes; // arbitration unit types TTypeAssoc *fGTypes; // grouping unit types // classification unit types = atypes + gtypes) public: static TRecArea *Make(ULong area, ULong flags); void Dispose(void); void AddAType(ULong atype, ULong (*Handler) (TSIUnit *, ULong, dInfoRec *), dInfoRec *dInfo); }; ______________________________________

The controller maintains a list of registered areas. When a new unit is presented to the recognition system (usually a stroke), the controller searches its list to determine which of the reas registered by the application are hit by the new unit. An area is "hit" if the unit is more than half in the area. This measurement is based on the bounding rectangles of the areas and unit, and their intersection.

All the areas hit by this unit are collected into a list known as an AreaList. This AreaList is then stored into the unit itself. It is also stored into all units that are derived from this original unit. An advantage of this technique is that when a new classification is assigned to a unit, the unit's areaList can be scanned directly to determine who should be called next, and no redundant searching of the controller's areas is necessary. The controller maintains data structures that help it perform only the necessary types of recognition in each area. If the area is no longer needed, the method RemoveArea can be called, with the id returned by RegisterArea as parameter.

The application that is interested in having recognition performed in a particular area called the TController method RegisterArea, specifying the type of area, the area, the type of units that it wants recognized, and a routine to be called when an object of the appropriate type is recognized (the "Display" routine). The area is passed in as a rectShape (whose structure may vary, depending on the graphics system in use). An area can be created by calling GNewRShape. The unit type is the same type that is passed to the initialize routine of the domain that recognizes units of that type. The display routine is also discussed below.

There is an alternate method for registering areas which allows multiple types of things to be recognized within a single area. To do this, you must first create a new recognition area by calling TRecArea::Make, and passing in the rectShape that defines the area. Next you specify the types of units that should be recognized, calling TRecArea::AddAType once for each type to be added (and a "Display" routine for each). Once all types have been added, call TController::BuildAndRegister on the area. In the section on Display routines, below, there is further description of the way that you are notified when recognition occurs.

Hierarchical Recognition Structure

FIG. 4 illustrates a simple recognition hierarchical structure 120 for recognition of words and simple-shape graphics. It uses a problem solving technique called a bottom-up approach for solving a complex recognition problem. The first step is to isolate portions of the input data. The next step is to extract information from each isolated portion. Then an interpretation is assigned to each portion. Following this, one or more interpretations of the input data are combined to form new, more integrated interpretation. Some of the new interpretations are then combined to form newer interpretations, and so on, until a fully integrated interpretation of the input data has been formed.

A bottom-up recognition approach is visualized by a hierarchical structure, or hierarchy, where each node in the hierarchy performs some type of recognition. The lowest level in the hierarchy is the level closest to the data and the level below the root node represents the last level of integration among hypotheses. At the root node of the hierarchy, if there are competing hypothesis for the data, the best one is chosen according to some criterion. We will refer to lowest level in the hierarchy as the first level, to the next level up as the second level and so on.

FIG. 4 presents an example of a simple recognition hierarchical structure 120 for the task of recognizing handwritten words and simple graphical shapes as inputs. This recognition hierarchy is used to illustrate many of the issues that arise in the recognition process.

To explain the recognition process in more detail it is necessary to introduce two terms: interpretation and hypothesis. An interpretation is a description of a portion of the input data (strokes). This description may be based directly on the strokes or it may be based on other interpretations. For example, if we are given the strokes representing the script letters "c", "a", "r", an interpretation of these strokes is the word "car". This interpretation may be based directly on the fact that the strokes form such a word; or it may be based on the fact that the strokes are interpreted as the letters c, a and r, and that the word car is made up by these three interpretations. A hypothesis is a portion of the input data together with an interpretation of the data. A hypothesis may be derived directly from the input data or from other hypothesis.

In the simple recognition hierarchical structure 120 of FIG. 4, there are two nodes 122, 124 at the first level and two nodes 126,128 at the second level. There is one node 130 at the third level. At the highest, or root node, 132 is performed an arbitration process among the various hypotehesis produced. The left node 122 at the first level isolates strokes and recognizes character parts. The right node 124 at the first level isolates strokes and recognizes graphical shape parts. The left node 126 at the second level integrates character parts to form character interpretations, and the right node 128 integrates shape parts to form shape interpretations. Finally, the node 130 at the third level integrates characters to form words. At the root node 132, arbitration among hypotheses takes place when necessary.

In the simple recognition hierarchical structure 120, the bottom-up approach to recognition of handwritten data is utilized, where a recognition domain is at each node of the hierarchy, except the root node. A recognition domain can be thought of as an element that encapsulates all the knowledge necessary to perform a well defined recognition task. Recognition domains are discussed in more detail herein after. The recognition domains 122, 124 at the first level in the hierarchy isolate a stroke or set of strokes from the input and produce very simple hypotheses by analyzing the strokes (e.g. character parts, shape parts). The recognition domains 126,128 at the second level in the hierarchy select some of the hypotheses generated by the domains at the first level, and generate new hypotheses by combining the ones that were selected (e.g. characters, shapes). The domains at the next levels proceed in the same fashion until the top of the hierarchy is reached.

In order for recognition to be successfully completed using the bottom-up approach proposed, several additional problems need to be solved. One problem to solve is hypothesis proliferation. By this we mean to prevent the number of hypotheses present in the system at any given time from growing without bounds. A second problem to solve is the arbitration of conflicting and competing hypotheses. This problem arises because recognition domains are independent from each other and may propose alternative interpretations for the same data. We say that two or more hypotheses are competing hypotheses if they use the same subhypotheses, and that two or more hypotheses are conflicting hypotheses if they share a subhypothesis.

In this system, both problems are solved with some help provided by the recognition domains. To prevent hypothesis proliferation, each recognition domain is required to keep the number of hypotheses it proposes under a given limit. To help arbitration, each domain assigns a numerical value to its hypotheses. This numerical value is called a score. The score of a hypothesis reflects the degree of confidence that the domain has in the hypothesis. The controller and the application interact to handle a recognized object and cleanup units after being claimed by an application.

The recognition task is accomplished by independent recognition domains organized into a hierarchical structure. Essentially, for a given hierarchy, a recognition domain has two required parts and one optional part. The required parts are: (1) information to establish the position of the domain in the hierarchy, and (2) knowledge about how to perform a particular recognition task. The optional part of a recognition domain is concerned with conditions that the input data has to satisfy to be considered for recognition by the domain and context constraints for recognition.

The information to establish a domain as a part of a given hierarchy is which type of hypotheses the domain takes as input and which type of hypothesis the recognition domain generates. The type of input hypotheses for a recognition domain uniquely establishes its position in the hierarchy.

The knowledge of how to perform a particular recognition task can be thought of as being subdivided into two parts, grouping knowledge and classification knowledge. Grouping knowledge is the knowledge for deciding which of the available strokes or hypotheses should be considered as a whole to form a new interpretation. Classification knowledge is the knowledge for generating a new interpretation from a given set of hypotheses.

The optional part of a recognition domain provides a mechanism for ensuring that a hypothesis produced by a recognition domain satisfies some pre-established constraints. For instance, in an application that has fields or areas in a form, we may want to specify an area which will accept only date information. Hence, we need a mechanism to specify that the input data to the field can only be recognized as a date and is rejected if it is not a date. Thus each recognition domain can provide a grammar, or rulebook, for limiting and sequencing its inputs.

Referring back to FIG. 3, in the recognition architecture model 100 of FIG. 3, strokes are stored in the piece pool 111 of the database 104. In addition, the intermediate hypotheses generated by the domains during a recognition task are also stored in the unit pool 112 of the database 104. The database 104 eliminates the need for direct communication among recognition domains while providing an indirect, but uniform, communication mechanism between recognition domains.

A stroke is stored in the piece pool 111 of the database 104 as it enters the system. As the various levels of the recognition domains in the recognition heirarchy are activated, these recognition domains store their hypotheses in the database 104. A hypothesis is removed from the database 104 when it reaches the top of the recognition hierarchy and any conflicts with another hypothesis has been resolved by the arbiter 110. When a hypothesis is removed from the database 104, its supporting hierarchy of hypotheses, down to the constituent strokes, is removed from the database 104. In addition all other hypotheses in the database 104 that refer to the these constituent strokes, or the hypotheses that use them, are removed.

In this system, control function for the system is centered in the control unit 106. The control unit 106 has three main functions: 1) the control unit 106 stores and retrieves hypotheses from the database 104 and it ensures the integrity of the database 104; 2) the control unit 106 hands the appropriate hypotheses to the recognition domains 102 for grouping and classification; and 3) the control unit 106 schedules grouping and classification tasks.

The recognition function involves the use of recognition areas. A recognition area is a specific area on the input screen of a pen-based computer. When a recognition client (such as a document) wants to perform specific types of recognition in a specific area, the recognition client informs the control unit 106 about the types of recognition that the recognition client is interested in. The recognition client also informs the control unit about the screen areas, in which information is to be recognized. The control unit 106 deduces from the information provided by the recognition domains, precisely which recognition domains should be active within a designated area of the screen. For example, if an area is