WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Method for indexing and searching handwritten documents in a database    
United States Patent5553284   
Link to this pagehttp://www.wikipatents.com/5553284.html
Inventor(s)Barbara; Daniel (Princeton, NJ); Aref; Walid (Monmouth Junction, NJ)
AbstractA method for indexing electronic handwritten documents is provided. Each document includes a plurality of output symbols in an output sequence, and is modeled by a respective Hidden Markov Model (HMM). The HMMs share a common alphabet and a common sequence length. A tree is established, having linked nodes stored in a memory. Each node has n pointers, each identifying a different node in the next level of the tree. Each path from the root to a different one of the leaf nodes defines a respective sequence of pointers. An indexing procedure is performed, for each of a subset of the nodes in one of the levels of the tree. The procedure includes: (1) determining the probability that a subset of one of the sequences of pointers leading from the root to that node represents a subset of the output symbols in one of the documents; (2) invoking the procedure for the next level, if the determined probability exceeds the minimum probability value of that level; and (3) adding a pointer to that document in the list of pointers of the leaf node associated with that sequence of pointers, if the next level is the last level and the probability is greater than the threshold value. The procedure is repeated for each other document.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5553284
Method for indexing and searching handwritten documents in a database - US Patent 5553284 Drawing
Method for indexing and searching handwritten documents in a database
Inventor     Barbara; Daniel (Princeton, NJ); Aref; Walid (Monmouth Junction, NJ)
Owner/Assignee     Panasonic Technologies, Inc. (Princeton, NJ)
Patent assignment
All assignments
Publication Date     September 3, 1996
Application Number     08/469,803
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     June 6, 1995
US Classification     707/4
Int'l Classification     G06F 017/30
Examiner     Black; Thomas G.
Assistant Examiner     Pham; C.
Attorney/Law Firm     Ratner & Prestia
Address
Parent Case     This application is a continuation of application Ser. No. 08/248,392 filed May 24, 1994 now abandoned.
Priority Data    
USPTO Field of Search     345/600 345/575 345/700 382/36 382/21 382/48
Patent Tags     indexing searching handwritten documents database
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
3588823



[0 after 0 votes]
5347595
Bokser
382/225
Sep,1994

[0 after 0 votes]
5335289
Abdelazim

Aug,1994

[0 after 0 votes]
5263097
Katz
382/190
Nov,1993

[0 after 0 votes]
5241619
Schwartz
704/200
Aug,1993

[0 after 0 votes]
5226091
Howell
382/107
Jul,1993

[0 after 0 votes]
5202986
Nickel
707/3
Apr,1993

[0 after 0 votes]
5151950
Hullender
382/187
Sep,1992

[0 after 0 votes]
5136687
Edelman
706/20
Aug,1992

[0 after 0 votes]
5129002
Tsuboka
704/246
Jul,1992

[0 after 0 votes]
5123057
Verly
382/156
Jun,1992

[0 after 0 votes]
5105470
Will
382/186
Apr,1992

[0 after 0 votes]
5065431
Rollett

Nov,1991

[0 after 0 votes]
5033087
Bahl
704/256.5
Jul,1991

[0 after 0 votes]
5014327
Potter
382/220
May,1991

[0 after 0 votes]
4989258
Takahashi
382/226
Jan,1991

[0 after 0 votes]
4975975
Filipski
382/227
Dec,1990

[0 after 0 votes]
4653107
Shojima
382/189
Mar,1987

[0 after 0 votes]
4601012
Aiken, Jr.
714/15
Jul,1986

[0 after 0 votes]
4553206
Smutek
707/101
Nov,1985

[0 after 0 votes]
4419740
Hevenor, Jr.
711/202
Dec,1983

[0 after 0 votes]
4028673
Taylor
382/100
Jun,1977

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed:

1. A method for indexing a plurality of electronic handwritten documents, each of which includes a plurality of output symbols ordered in an output sequence, each document being modeled by a respective Hidden Markov Model (HMM), the Hidden Markov Models having a common alphabet including n symbols, and a common output sequence length of T symbols, where n and T are integers, the method comprising the steps of:

(a) establishing an index of symbols, each symbol being a feature vector, the index having T levels, each level between zero and T-1 having a respective minimum probability value, each level having at least one node;

(b) indexing the symbols in one of the documents without translating the symbols into one of a known character and a known pictograph by performing, for each node in one of the levels of the index, the steps of:

(1) determining a probability that a symbol stored in said node represents a corresponding output symbol in the one document, using the respective HMM for the one document,

(2) adding a node in a next level of the index, if the probability determined in step (b)(1) exceeds the minimum probability value of the one level and the next level is between the one.sup.th level and the T-1.sup.th level,

(3) executing step (b) for the added node in the next level, if the node is added in step (b)(2), and

(4) adding a pointer to the one document in a list of pointers stored in a node in the T.sup.th level of the index, if the next level is the T.sup.th level and the probability determined in step (b)(1) is greater than the minimum probability value of the T-1.sup.th level; and

(c) repeating step (b), for each of the plurality of documents other than the one document.

2. A method according to claim 1, wherein step (b) includes performing steps (b)(1) through (b)(4) n times at said node, corresponding to one time for each of said n output symbols.

3. A method according to claim 1, wherein each symbol comprises a direction coordinate.

4. A method according to claim 3, wherein each symbol further comprises a velocity coordinate.

5. A method according to claim 3, wherein each symbol further comprises a change-of-direction coordinate and a change-of-velocity coordinate.

6. A method for indexing a plurality of electronic handwritten documents, each of which includes a plurality of output symbols ordered in an output sequence, each document being modeled by a respective Hidden Markov Model (HMM), the Hidden Markov Models having a common alphabet including n output symbols, and a common output sequence length of T symbols, where n and T are integers, the method comprising the steps of:

(a) establishing an index of symbols without translating the symbols into one of a known character and a known pictograph, each symbol being a feature vector, the index comprising T+1 levels of linked nodes stored in a memory, the levels ordinally numbered zero.sup.th through T.sup.th, the zero.sup.th level having a root node, each node in the T.sup.th level being a leaf node, each node between the zero.sup.th level and the T-1.sup.th level having n pointers, each corresponding to a respectively different output symbol and identifying a respectively different subtree of the respective node, the pointers between the zero.sup.th and T-1.sup.th levels forming sequences of pointers, each of the sequences of pointers leading from the root node to a respectively different one of the leaf nodes;

(b) storing a respective list of pointers identifying a subset of the plurality of documents in one of the leaf nodes, comprising the steps of:

(1) determining, for the one leaf node, a probability that the respective sequence of pointers leading from the root node to the one leaf node represents the respective output sequence of one of the documents, using the respective HMM for that document,

(2) adding a pointer to the one document in the list of pointers of the one leaf node, if the probability is greater than a threshold value,

(3) repeating steps (1) and (2) for each one of the plurality of documents;

(c) repeating step (b) for each of the leaf nodes other than the one leaf node.

7. A method according to claim 6, wherein each level between zero and T-1 has a respective minimum probability value, and step (b)(1) comprises the steps of:

(i) determining a probability that a subset of said sequence of pointers leading from the root node to the jth node represents the first j+1 symbols in the one document, where j is an integer between zero and T-1, and

(ii) repeating step (i) for each value of j between one and T-1, while said probability determined in step (i) for the j-1.sup.th level exceeds the respective minimum probability value of the j-1.sup.th level.

8. A method for indexing a plurality of electronic handwritten documents, each of which includes a plurality of output symbols ordered in an output sequence, each document being modeled by a respective Hidden Markov Model (HMM), the Hidden Markov Models having a common alphabet including n output symbols, and a common output sequence length of T symbols, where n and T are integers, the method comprising the steps of:

(a) establishing an index of symbols without translating the symbols into one of a known character and a known pictograph, each symbol being a feature vector, the index comprising T+1 levels of linked nodes stored in a memory, the levels ordinally numbered zero.sup.th through T.sup.th, the zero.sup.th level representing a root node, each node in the T.sup.th level being a leaf node, each level of the index between zero and T-1 having a respectively different minimum probability value and, each node between the zero.sup.th level and the T-1.sup.th level having n pointers, each identifying a respectively different node in a next level of the index, the pointers between the zero.sup.th and T-1.sup.th levels forming sequences of pointers that represent sequences of output symbols, each sequence of pointers leading from the root node to a respectively different one of the leaf nodes, each leaf node being associated with a respectively different sequence of output symbols;

(b) storing information in the index for one of the documents, comprising the steps of:

(1) performing, for each one of a subset of the nodes in one of the levels, the steps of:

(i) determining a probability that a subsequence of one of said sequences of pointers leading from the root node to the one node represents a corresponding subset of the output symbols in the one document,

(ii) invoking step (b) for a node in a next level, if the probability determined in step (b)(1)(i) exceeds the minimum probability value of the one level and the next level is between the one.sup.th level and the T-1.sup.th level, and

(iii) adding a pointer to the one document in the list of pointers of the leaf node associated with the one sequence of pointers, if the next level is the T.sup.th level and the probability is greater than the threshold value; and

wherein step (b)(1) includes performing steps (b)(1)(i) through (b)(1)(iii) n times at said one node, corresponding to one time for each of said n output symbols; and

(c) repeating step (b), for each of the plurality of documents other than the one document.

9. A method according to claim 8, wherein step (b)(1) further comprises the steps of:

(iii) discontinuing repetition of step (b)(1) for any further nodes in a subtree for which the one node is a root node of the subtree, if said probability is less than the respective minimum probability value of one of the levels.

10. A method, according to claim 8, of indexing and searching a plurality of electronic handwritten documents, further comprising the steps of:

(d) providing a set of T input symbols;

(e) selecting one of the sequences of pointers that corresponds to the T input symbols;

(f) identifying the subset of the plurality of documents in the list of the leaf node to which the selected sequence of pointers leads as being found by the search, wherein steps (d) through (f) are initiated after steps (a) through (c) are completed, and no recognition operations are executed while performing steps (d) through (f).

11. A method according to claim 10, wherein step (e) further comprises the steps of:

(g) selecting the one of said nodes that is in the first level;

(h) repeating, for each value of j between zero and T-1, the steps of:

(1) identifying a respective index k.sub.j for the j.sup.th input symbol, where k.sub.j is an integer between zero and n-1, such that the k.sub.j.sup.th output symbol of the selected node matches the j.sup.th input symbol, and

(2) selecting the node in the j+1.sup.th level to which the k.sub.j.sup.th pointer of the node selected in step (l) (1) points.

12. A method for indexing a plurality of electronic handwritten documents, each of which includes a plurality of output symbols ordered in an output sequence, each document being modeled by a respective Hidden Markov Model (HMM), the Hidden Markov Models having a common alphabet including n output symbols, and a common output sequence length of T symbols, where n and T are integers, the method comprising the steps of:

(a) establishing an index of symbols without translating the symbols into one of a known character and a known pictograph, each symbol being a feature vector, the index comprising T+1 levels of linked nodes stored in a memory, the levels ordinally numbered zero.sup.th through T.sup.th, the zero.sup.th level being a root node, each node in the T.sup.th level being a leaf node, each level between zero and T-1 having a respectively different minimum probability value and, each node between the zero.sup.th level and the T-1.sup.th level having n pointers, each being associated with a respectively different symbol in the alphabet and identifying a respectively different node in a next level of the index, the pointers between the zero.sup.th and T-1.sup.th levels forming sequences of pointers that represent sequences of output symbols, each sequence of pointers leading from the root node to a respectively different one of the leaf nodes, each leaf node being associated with a respectively different sequence of output symbols;

(b) executing a procedure that comprises the steps of:

(1) performing, for each one of a subset of the nodes in one of the levels, the steps of:

(i) determining a probability that one of the pointers within one of the sequences of pointers leading from the root node to that node represents the output symbol at that level in one of the documents,

(ii) executing the procedure for a node in a next level, if the probability determined in step (b)(1)(i) exceeds the minimum probability value of the one level and the next level is between the one.sup.th level and the T-1.sup.th level, and

(iii) adding a pointer to the one document in the list of pointers of the leaf node associated with the one sequence of pointers, if the next level is the T.sup.th level and the probability is greater than the threshold value; and

wherein step (b)(1) includes performing steps (b)(1)(i) through (b)(1)(iii) n times at said one node, corresponding to one time for each of said n output symbols; and

(c) repeating step (b), for each of the plurality of documents other than the one document.

13. A method according to claim 12, wherein step (b)(1) further comprises the steps of:

(iii) discontinuing repetition of step (b)(1) for any further nodes in subtree for which the one node is a root node of the subtree, if said probability is less than the respective minimum probability value of one of the levels.

14. A method, according to claim 12, of indexing and searching a plurality of electronic handwritten documents, further comprising the steps of:

(d) providing a set of T input symbols;

(e) selecting one of the sequences of pointers that corresponds to the T input symbols;

(f) fetching the list of pointers stored in the leaf node to which the selected sequence of pointers leads;

(g) identifying the subset of the plurality of documents to which the pointers in the fetched list point as being found by the search, wherein steps (d) through (g) are initiated after steps (a) through (c) have been completed, and no recognition operations are executed while performing steps (d) through (g).

15. A method according to claim 14, wherein step (e) further comprises the steps of:

(h) selecting the one of said nodes that is in the first level;

(i) repeating, for each value of j between zero and T-1, the steps of:

(1) identifying a respective index k.sub.j for the j.sup.th input symbol, where k.sub.j is an integer between zero and n-1, such that the k.sub.j.sup.th output symbol of the selected node matches the j.sup.th input symbol, and

(2) selecting the node in the j+1.sup.th level to which the k.sub.j.sup.th pointer of the node selected in step (l)(1) points.

16. A method for indexing a plurality of electronic handwritten documents, each of which includes a plurality of output symbols and each of which is modeled by a respective Hidden Markov Model (HMM), the Hidden Markov Models having a common alphabet including n output symbols, and a common output sequence length of T symbols, where n and T are integers, the method comprising the steps of:

(a) establishing an index of symbols without translating the symbols into one of a known character and a known pictograph, each symbol being a feature vector, the index comprising T+1 levels of linked nodes stored in a memory, the levels ordinally numbered zero.sup.th through T.sup.th, each node between the zero.sup.th level and the T-1.sup.th level having n pointers, each pointer identifying a respectively different subtree of the respective node, each node in the T.sup.th level having a respective list of pointers identifying respective documents;

(b) establishing a respective threshold value for each level;

(c) calculating a probability that the HMM for one of the documents identifies the k.sup.th output symbol of the alphabet as being representative of the j.sup.th output symbol of the one document, where k is an integer between zero and n-1, and j is an integer between zero and T-1;

(d) setting the k.sup.th pointer of one of the nodes in the j.sup.th level to point to one of the subtrees of the one node if the probability calculated in step (c) is greater than the threshold value of the j.sup.th level, the one subtree including a subset of the nodes in the T.sup.th level;

(e) incrementing j by one;

(f) repeating steps (c) through (e) for each level j while: (1) j is less than T, and (2) the probability calculated in step (c) is greater than the threshold value of the j.sup.th level, thereby forming a sequence of pointers;

(g) repeating steps (c) through (f) for each value of k;

(h) identifying the one document in one of the lists in the T.sup.th level, if the probability of the sequence of pointers identifying each of the output symbols in the one document ordinally numbered one through T-1 is greater than the threshold value of the T.sup.th level;

(i) repeating steps (c) through (g) for each one of the plurality of documents.

17. A method for indexing a plurality of electronic handwritten documents, each of which includes a plurality of output symbols ordered in an output sequence, each document being modeled by a respective Hidden Markov Model (HMM), the Hidden Markov Models having a common alphabet including n symbols, and a common output sequence length of T symbols, where n and T are integers, the method comprising the steps of:

(a) establishing an index of symbols without translating the symbols into one of a known character and a known pictograph, each symbol being a feature vector, the index comprising T+1 levels of linked nodes stored in a memory, the levels ordinally numbered zero.sup.th through T.sup.th, the zero.sup.th level having a root node, each node in the T.sup.th level being a leaf node, each level between zero and T-1 having a respective minimum probability value;

(b) adding to each node between the zero.sup.th level and the T-1.sup.th level a respectively different number of pointers, the number of pointers being between 0 and n, each pointer identifying a respectively different node in a next level of the index, the pointers between the zero.sup.th and T-1.sup.th levels forming sequences of pointers that represent sequences of symbols, each sequence of pointers leading from the root node to a respectively different one of the leaf nodes, each leaf node being associated with a respectively different sequence of symbols, by executing a procedure that comprises the steps of:

(1) performing, for each one of the nodes in one of the levels, the steps of:

(i) determining a probability that a subset of one of said sequences of pointers leading from the root node to that node represents a subset of the output symbols in one of the documents,

(ii) adding a node in a next level, if the probability determined in step (b)(1)(i) exceeds the minimum probability value of the one level and the next level is between the one.sup.th level and the T-1.sup.th level,

(iii) executing the procedure for the added node in the next level, if the node is added, and

(iv) adding a pointer to the one document in the list of pointers of the leaf node associated with the one sequence of pointers, if the next level is the T.sup.th level and the probability is greater than the threshold value; and

(c) repeating step (b), for each of the plurality of documents other than the one document.

18. A method for indexing and searching a plurality of electronic handwritten documents, each of which includes a plurality of output symbols and is modeled by a respective Hidden Markov Model (HMM), the Hidden Markov Models having a common alphabet including n symbols, and a common output sequence length of T symbols, where n and T are integers, the method comprising the steps of:

(a) establishing an index of symbols without translating the symbols into one of a known character and a known pictograph, each symbol being a feature vector, the index comprising T+1 levels of linked nodes stored in a memory, the levels ordinally numbered zero.sup.th through T.sup.th, each node between the zero.sup.th level and the T-1.sup.th level having n pointers, each pointer identifying a respectively different subtree of the respective node, each node in the T.sup.th level having a respective list of pointers identifying respective documents;

(b) establishing a respective threshold value for each level;

(c) calculating a probability that the HMM for one of the documents identifies the k.sup.th symbol of the alphabet as being representative of the j.sup.th output symbol of the one document, where k is an integer between zero and n-1, and j is an integer between zero and T-1;

(d) setting the k.sup.th pointer of one of the nodes in the j.sup.th level to point to one of the subtrees of the one node if the probability calculated in step (c) is greater than the threshold value of the j.sup.th level, the one subtree including a subset of the nodes in the T.sup.th level;

(e) incrementing j by one;

(f) repeating steps (c) through (e) for each level j while: (1) j is less than T, and (2) the probability calculated in step (c) is greater than the threshold value of the j.sup.th level, thereby forming a sequence of pointers;

(g) repeating steps (c) through (f) for each value of k;

(h) identifying the one document in one of the lists in the T.sup.th level, if the probability of the sequence of pointers identifying each of the output symbols in the one document ordinally numbered one.sup.th through T-1.sup.th is greater than the threshold value of the T.sup.th level;

(i) repeating steps (c) through (g) for each one of the plurality of documents;

(j) providing a set of T input symbols;

(k) selecting the one of said nodes that is in the first level;

(l) repeating, for each value of j between zero and T-1, the steps of:

(1) identifying a respective index k.sub.j for the j.sup.th input symbol, where k.sub.j is an integer between zero and n-1, such that the k.sub.j.sup.th output symbol of the selected node matches the j.sup.th input symbol, and

(2) selecting the node in the j+1.sup.th level to which the k.sub.j.sup.th pointer of the node selected in step (l)(1) points; and

(m) selecting the subset of the plurality of documents to which the selected node in the T.sup.th level points as being found by the search, wherein steps (c) through (g) are completed for all documents prior to initiating any of steps (j) through (m).

19. Apparatus for indexing and querying a plurality of electronic handwritten documents, each of which includes a plurality of output symbols, the apparatus comprising:

an index formed by modeling each of the electronic handwritten documents by a respective Hidden Markov Model (HMM), the Hidden Markov Models having a common alphabet including n symbols, and a common output sequence length of T symbols, where n and T are integers, which includes:

a plurality of nodes arranged in T levels, said plurality of nodes including a root node at a zeroth level and a plurality of leaf nodes,

a plurality of paths from the root node to respectively different ones of the plurality of leaf nodes, each path defining a respectively different sequence of feature vectors which has at least a minimum probability of occurring in at least one of said documents, each of said leaf nodes storing at least one pointer which points to the one of said plurality of electronic handwritten documents having said minimum probability, each of said leaf nodes having the capacity to store a plurality of pointers;

a plurality of stored electronic handwritten documents, each pointed to by at least one of said pointers stored in said leaf nodes;

means for receiving an entered input sequence;

means for selecting the one of the plurality of paths that corresponds to the received input sequence without interpreting what the input sequence or the symbols in the input sequence represent; and

means for traversing the selected path of the index based and for identifying all documents pointed to by one of said leaf nodes if said input sequence corresponds to a path from the root node said one leaf node,

wherein said means for receiving, said means for selecting, and said means for traversing operate without translating the input sequence into a fixed character set.
 Description Submit all comments and votes
 


FIELD OF THE INVENTION

The present invention relates generally to methods for searching for documents in a database, and in particular to a method for indexing and searching for electronic handwritten documents.

BACKGROUND OF THE INVENTION

Electronic pictures and handwritten documents are entering common use in computers due to the introduction of pen based interfaces. Recent products have replaced the keyboard entirely by a pen with which all data entry is performed.

In a paper by D. Lopresti, and A. Tomkins, entitled "Pictographic Naming", INTERCHI '93 Adjunct Proceedings, April, 1993 (which is incorporated herein by reference for its teachings on use of pictographic names), the authors propose extending the range of acceptable document names to include arbitrary hand-drawn pictures. When a document is created or first stored in a storage medium, the author draws a pictographic name instead of typing in a textual name. To subsequently retrieve one of the documents, the pictographic names may be displayed in a menu or "browser", and the user selects the desired pictographic name.

If the database includes more than about 8-12 documents, it becomes impractical to display all of the pictographic names during retrieval.

In an alternative method to subsequently retrieve one of the documents, the pictographic name is redrawn using a pen based interface. Because the hand-drawn pictures are not drawn in exactly the same way each time, a pattern recognition technique is needed to determine which document (or output sequence) a given hand-drawn picture (or input sequence) is intended to represent.

One of the proposed techniques for identifying a document by its pictographic name involves the use of Hidden Markov Models (HMM) to provide a list of candidate documents that have pictographic names most closely resembling the input sequence. From this list, one file is selected using the pen. HMMs provide a powerful tool for picture and handwritten document matching. Several researchers have used HMMs to model handwriting and handwritten documents.

Rabiner, L. R., "A Tutorial on Hidden Markov Models and selected Applications in Speech Recognition", Proceedings of the IEEE, 77(2):257-285, February 1989, is incorporated by reference herein for its teachings on the use of Hidden Markov Models for pattern recognition.

Formally, an HMM is a doubly stochastic process that contains a non-observable underlying stochastic process (hidden) that is uncovered by a set of stochastic processes that produce the sequence of observed symbols. Mathematically an HMM is a tuple <.sigma., Q, a, b>,

where:

1) .sigma. is a (finite) alphabet of output symbols. A symbol is typically a subset of a character.

2) Q is a set of states, Q={O, . . . , N-1} for an N-state model.

3) a is a probability distribution that governs the transitions between states. The probability of going from state i to j is denoted by a.sub.ij. The transition probabilities a.sub.ij are real numbers between 0 and 1, such that: ##EQU1## The distribution includes the initial distribution of states, that is the probability a.sub.i of the first state being i.

4) b is an output probability distribution b.sub.i (s) that governs the distribution of output symbols for each state. That is, b.sub.i (s) is the probability of producing the symbol s .epsilon. .sigma. while being in state i. These probabilities follow the rules:

For all i.epsilon.Q and s.epsilon..sigma.:0<b.sub.i (s).ltoreq.1(2)

For all i.epsilon.Q, .SIGMA..sub.s.epsilon..sigma. b.sub.i (s)=1(3)

Usually, when HMMs are used, the transition probabilities (a) and the state set (Q) are computed by bestfitting the model to a series of samples. (This is known as training the model). Each sample consists of a sequence of output symbols (points), with which the parameters of the model may be adjusted. However, in applications such as recognition of handwritten documents, the model is described using a single sample (a sequence of output symbols for the document that is to be indexed). Quite commonly, then, the structure of the model is "fixed" to make up for the lack of samples with which to train it. That is, once a model is selected for an index, that model is used for the life of the index. The model is not changed dynamically after the index is created. For example, a left-to-right HMM may be used, i.e. a model in which it is only possible to remain in the current state or to jump to the next state in sequence.

For the handwritten document problem, each picture or document in the database is modeled by an HMM. As a result, given an input pattern, the recognition process involves executing each HMM in the database and selecting the one that generates the input pattern with highest probability. This is very time consuming. The primary impediment to using HMMs is execution speed, especially in the context of large databases. Executing a respective HMM for each document in the database in real-time to retrieve one of the documents introduces an unacceptable d