| United States Patent | 5034989 |
| Link to this page | http://www.wikipatents.com/5034989.html |
| Inventor(s) | Loh; Shiu-Chang (Toronto, Ontario, CA) |
| Abstract | An apparatus and a method for identifying handwritten characters are
provided, each of the characters being a member of a set and being formed
from a number of predetermined primitives. The apparatus includes an input
device receiving successively each primitive forming a character. The
input device generates input signals for each primitive forming the
handwritten character. The input signals are conveyed to a processor. The
processor examines the input signals and attempts to identify each of the
primitives used to form the handwritten character. A primitive code is
generated for each identified primitive and an unidentified primitive code
is generated for each unidentified primitive. The primitive and
unidentified primitive codes are combined to form an input character code.
A memory is provided and stores a character code and an international
output code for each of the characters in the set of characters. A
comparator compares the input character code generated for the handwritten
character with each of the character codes stored in the memory. When the
input character code is equivalent to a character code associated with
only one output code, the output code is conveyed to an output device such
as a printer wherein a reproduction of the handwritten character is
formed. When the input character code is equivalent to a character code
associated with more than one output code, a differentiator detects the
correct output code associated with the input character code so that the
handwritten character can be reproduced. |
Title Information  |
On-line handwritten character recognition apparatus with non-ambiguity
algorithm |
| Publication Date |
July 23, 1991 |
| Filing Date |
January 17, 1989 |
| Parent Case |
This is a continuation-in-part of U.S. patent application Ser. No.
07/131,734, filed Dec. 11, 1987, now abandoned. |
Claims  |
We claim:
1. A character recognition apparatus for identifying a handwritten character of a predetermined set of characters formed from at least one primitive selected from a predetermined set of primitives illustrated in FIG. 3 with said set of primitives forming said handwritten character being written in an order determined by pre-defined rules, said apparatus comprising:
input means for receiving successively and in accordance with said pre-defined rules each of the primitives forming said handwritten character and generating input signals for each of said received primitives;
processing means for receiving said input signals and identifying each of said primitives received by said input means, said processing means generating a character code representing said handwritten character upon identification of said
primitives forming said handwritten character;
storage means for storing a character code and an associated output code for each of the characters in said predetermined set;
comparing means for comparing said character code generated for said handwritten character with said character codes in said storage means to identify said handwritten character; and
output means in communication with said comparing means and generating a reproduction of said handwritten character upon the identification thereof by said comparing means.
2. The character recognition apparatus as defined in claim 1 further comprising:
differentiation means for examining said input signals generated for each of said received primitives and performing operations thereon when said character code generated for said handwritten character is equivalent to a character code in said
storage means associated with a plurality of output codes to identify the output code associated with said handwritten character.
3. The character recognition apparatus as defined in claim 2 wherein said primitives in FIG. 3 are capable of forming every character in a plurality of languages while reducing the number of characters in said languages formed from the same
series of primitives, said storage means storing a character code and an output code for each of said characters in said plurality of languages.
4. The character recognition apparatus as defined in claim 3 wherein said storage means further stores character test information, said test information being provided for each character code in said storage means associated with more than one
output code, said differentiation means receiving said character test information and said input signals and performing said operations thereon in accordance with said character test information to detect the output code corresponding to said handwritten
character.
5. The apparatus as defined in claim 3 wherein said processing means generates an unidentified primitive code for each handwritten primitive not detected as being in said set, said apparatus further comprising substitution means having means for receiving said character code generated for said handwritten character when it is not equivalent to any of said character codes in said storage means, said substitution means including comparator means comparing each primitive code forming the character code generated for said handwritten character with the corresponding primitive code of said character codes in said storage means formed from the same number of primitive codes as the character code generated for said handwritten character; and
a memory for storing the output code associated with each of the character codes in said storage means having fewer than a predetermined number of differences when compared with the character code generated for said handwritten character.
6. The apparatus as defined in claim 5 wherein said substitution means further comprises a probability matrix, said probability matrix generating a substitution primitive code most likely to be the unidentified primitive code when said
substitution means receives a character code generated for a handwritten character having at least one unidentified primitive code therein and replacing said unidentified primitive code with said substitution primitive code to form a character code
equivalent to a character code stored in said storage means, and most likely to represent said handwritten character.
7. The apparatus as defined in claim 1 wherein said input means is an on-line digitizer tablet providing cartesian co-ordinate data for each of said primitives forming said handwritten character, said processing means further comprising encoding
means for examining said cartesian co-ordinate data for each of said primitives forming said handwritten character and forming therefrom a series of unit vectors.
8. The apparatus as defined in claim 7 wherein said encoding means is a modified Freeman encoder which includes a plurality of freeman unit vectors, said encoder detecting only substantially vertical, horizontal and diagonal strokes forming the
primitives constituting said handwritten character.
9. The apparatus as defined in claim 8 wherein said processing means further comprises:
feature extraction means for receiving said series of unit vectors for each of said primitives and eliminating redundant unit vectors to form a vector code and an associated series of scalars for each of said handwritten primitives;
holding means for storing vector codes and an associated primitive code representing each of said primitives in said set along with an unidentified primitive code; and
comparator means for comparing said vector codes generated for said handwritten character with said vector codes stored in said holding means, said comparator means generating said primitive code when said vector code is equivalent to a vector
code stored in said holding means and generating said unidentified primitive code when said vector code is not equivalent to a vector code stored in said holding means.
10. The apparatus as defined in claim 9 wherein said holding means is further provided with primitive test information, said information being uniquely associated with vector codes which represent more than one primitive in said set, said
processing means further comprising a test section receiving said primitive test information and said series of scalars associated with said vector code and performing operations thereon to detect the correct primitive code associated with said vector
code generated for said handwritten primitive when said vector code is equivalent to a vector code representing more than one primitive.
11. The apparatus as defined in claim 1 wherein said output means is selected from the group comprising:
a printer, an audio-synthesizer and a video display terminal.
12. An apparatus as defined in claim 9 further comprising pre-processing means for receiving said cartesian co-ordinate data, said preprocessing means comparing the distance between first and adjacent second co-ordinates and removing said second
co-ordinate if said distance is less than a predetermined threshold value thereby reducing the amount of redundant data.
13. A method of identifying a handwritten character of a pre-determined set of characters formed from at least one primitive selected from the set of primitives shown in FIG. 3, said method comprising the steps of:
receiving successively and in an order determined by pre-defined rules each of said primitives forming said character and generating input signals for each of said received primitives;
examining said input signals to identify each of said entered primitives forming said handwritten character;
generating a primitive code for each of said primitives forming said handwritten character to form a character code upon identification of said primitives forming said handwritten character;
storing a character code and an associated output code for each of said characters in said set;
comparing said character code formed for said handwritten character with said character codes stored to detect said output code when said character code generated for said handwritten character is equivalent to a stored character code associated
with only one output code; and
examining said primitive codes generated for said handwritten character and performing operations thereon when said character code is equivalent to a stored character code associated with more than one output code in order to detect the output
code associated with said entered character; and
generating an image of said handwritten character upon detection of said associated output code.
14. A character recognition apparatus for identifying a handwritten character formed from at least one primitive, said character and said primitives being members of predetermined sets, said apparatus comprising:
input means for receiving successively and in an order determined by pre-defined rules, each of the handwritten primitives forming said handwritten character, said input means generating input signals for each of said handwritten primitives;
processing means receiving said input signals for each of said primitives, said processing means converting the input signals generated for each primitive into data representing a series of generally horizontal, vertical and diagonal vectors and
comparing said data with stored information therein and generating a primitive code for each of the primitives when said data are detected as being equivalent to stored information associated with a single primitive;
first differentiation means in communication with said processing means and performing discriminatory tests on said data when said data are detected as being equivalent to stored information associated with a plurality of primitives to determine
the primitive associated with said data to permit said processing means to determine said primitive code, the series of primitive codes generated by said processing means forming a character code;
storage means storing a character code and an associated output code for each of the characters in said predetermined set;
comparing means comparing said character code generated for said handwritten character with said character codes in said storage means to identify said entered handwritten character;
second differentiation means examining said input signals generated for each of said handwritten primitives and performing discriminatory tests thereon when said character code generated for said handwritten character is equivalent to a character
code in said storage means associated with a plurality of output codes to identify the output code associated with said handwritten character; and
output means in communication with said comparing means and said second differentiation means and generating a reproduction of said handwritten character upon identification of the output code associated with the handwritten character.
15. The character recognition apparatus as defined in claim 14 wherein said primitives are capable of forming substantially every character in a plurality of languages, said storage means storing a character code and an output code for each of
said characters in said plurality of languages.
16. The character recognition apparatus as defined in claim 15 wherein said storage means further stores character test information, said test information being provided for each character code associated with more than one output code, said
second differentiation means receiving said character test information and said output signals and performing said discriminatory test on said input signals in accordance with said character test information to detect the output code corresponding to
said handwritten character.
17. The character recognition apparatus as defined in claim 16 wherein said predetermined set of primitives includes twenty distinct primitives, the various combinations of said twenty primitives being capable of forming substantially all
characters in said plurality of languages, a substantial portion of said primitives being formed from only substantially horizontal, substantially vertical and substantially diagonal components.
18. The character recognition apparatus as defined in claim 17 further comprising:
substitution means receiving the character code generated for said handwritten character when said character code is not equivalent to any of said character codes stored in said storage means, said substitution means including comparator means
for comparing each primitive code forming said character code generated for the handwritten character with the corresponding primitive codes forming said character codes in said storage means having the same number of primitive codes as the character
code generated for the handwritten character to detect differences between the character code and said character codes in said storage means; and
a memory for storing the output code associated with each of the character codes in said storage means having fewer than a predetermined number of differences when compared with the character code generated for the handwritten character.
19. The character recognition apparatus as defined in claim 18 wherein said processing means generates an unidentified primitive code when said processing means and said first differentiation means do not detect said data as being equivalent to
any information stored therein, said unidentified primitive code when generated forming part of said character code, said substitution means further comprising a probability matrix, said probability matrix generating a substitution primitive code most
likely to be the unidentified primitive code when said substitution means receives a character code having at least one unidentified primitive code therein and replacing said unidentified primitive code with said substitution primitive code in an attempt
to form a character code equivalent to a character code stored in said storage means and most likely to represent said handwritten character.
20. The character recognition apparatus as defined in claim 14 wherein said input means is an on-line digitizer tablet generating cartesian co-ordinate data for each of said primitives forming said handwritten character, said processing means
further comprising encoding means for examining said cartesian co-ordinate data for each of said primitives and forming therefrom a series of vectors and associated series of scalars.
21. The character recognition apparatus as defined in claim 20 wherein said encoding means is a modified Freeman encoder, said encoder examining said series of vectors to detect substantially horizontal, substantially vertical and substantially
diagonal unit vectors and converting said series of vectors into said data, said first differentiation means performing discriminatory tests on said associated scalars when said data generated for a primitive forming part of said handwritten character is
detected as being equivalent to stored information associated with a plurality of primitives to determine the primitive associated with the data.
22. The character recognition apparatus as defined in claim 21 wherein said processing means further comprises feature extraction means receiving said series of vectors for each of said primitives and eliminating redundant vectors to form said
data, said data being in the form of a vector code and said associated series of scalars for each of said handwritten primitives;
holding means for storing vector codes and an associated primitive code representing each of said primitives in said set along with an unidentified primitive code; and
comparator means for comparing said vector codes generated for said handwritten primitive with said vector codes stored in said holding means, said comparator means outputting said primitive code when said vector code is equivalent to a vector code
stored in said holding means and outputting said unidentified primitive code when said vector code is not equivalent to a vector code stored in said holding means.
23. The apparatus as defined in claim 22 wherein said holding means is further provided with primitive test information, said information being uniquely associated with vector codes which represent more than one primitive, said first
differentiation means receiving said primitive test information and said series of scalars associated with said vector code from said processing means and performing operations thereon to detect the correct primitive code associated with said vector code
when said vector code is equivalent to a vector code representing more than one primitive code.
24. An apparatus as defined in claim 23 further comprising a pre-processing means for receiving and conditioning said cartesian co-ordinate data to eliminate spurious data and to reduce redundant data.
25. The apparatus defined in claim 14 wherein all primitives are generally horizontal vectors, generally vertical vectors, generally diagonal vectors, or a combination of generally horizontal, vertical and diagonal vectors.
26. A method of identifying a handwritten character formed from at least one primitive, said character and said primitives being members of predetermined sets, said method comprising the steps of:
receiving successively and in an order determined by predefined rules each of said primitives forming said character in a predetermined manner and generating input signals for each of said received primitives;
examining and converting the input signals for each primitive into data representing a series of generally horizontal, vertical and diagonal vectors and comparing said data generated for each of said entered primitives with stored information to
identify each of said entered primitives forming said character;
generating a primitive code for each of said primitives when the data are detected as being associated with only one primitive and performing tests on said data to determine the correct primitive code when said data are detected as being
associated with more than one primitive;
forming a generated character code from said series of primitive codes;
storing a character code and an associated output code for each of said characters in said set;
comparing the generated character code with said stored character codes to determine said output code when said generated character code is equivalent to a stored character code associated with only one output code;
examining said input signals generated for said entered primitives and performing tests thereon when said generated character code is equivalent to a stored character code associated with more than one output code in order to determine the output
code associated with said handwritten character; and
generating an image of said handwritten character upon detection of said correct output code.
27. The character recognition apparatus as defined in claim 14 wherein the discriminatory tests determine the relative length between two primitives forming said handwritten character or whether one primitive forming the handwritten character
crosses another.
28. The character recognition apparatus as defined in claim 21 wherein the discriminatory tests determine the relative length of the vertical, diagonal and horizontal vectors generated for the primitive.
29. The character recognition apparatus as defined in claim 4 wherein said test information causes said differentiation means to determine the relative length between two primitives forming said handwritten character or whether one primitive
forming the handwritten character crosses another.
30. The character recognition apparatus as defined in claim 10 wherein said test section examines said vector codes to determine the relative length of the vector codes generated for the handwritten character.
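The encoding chain recited in claims 7 to 9 and 12 (thinning of co-ordinate data, conversion to direction vectors, and elimination of redundant vectors) can be illustrated with a minimal sketch. It is not taken from the patent: the function names, the eight-direction numbering (0 along +x, counter-clockwise, y axis pointing up), the default threshold, and the reading of the "series of scalars" as run lengths are all assumptions made for illustration.

```python
import math


def thin_points(points, threshold):
    """Drop a point that lies closer than `threshold` to the last kept point (cf. claim 12)."""
    if not points:
        return []
    kept = [points[0]]
    for x, y in points[1:]:
        kx, ky = kept[-1]
        if math.hypot(x - kx, y - ky) >= threshold:
            kept.append((x, y))
    return kept


def freeman_direction(p, q):
    """Quantise the step p -> q to one of 8 directions.

    Assumed numbering: 0 = +x, increasing counter-clockwise, y axis up.
    """
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    return round(angle / (math.pi / 4)) % 8


def encode_stroke(points, threshold=2.0):
    """Return (vector_code, scalars) for one primitive.

    Consecutive steps with the same direction are merged, so the vector
    code holds one entry per run; `scalars` holds the length of each run
    (an assumed interpretation of the claimed "series of scalars").
    """
    pts = thin_points(points, threshold)
    vector_code, scalars = [], []
    for p, q in zip(pts, pts[1:]):
        direction = freeman_direction(p, q)
        length = math.hypot(q[0] - p[0], q[1] - p[1])
        if vector_code and vector_code[-1] == direction:
            scalars[-1] += length  # redundant vector: extend the current run
        else:
            vector_code.append(direction)
            scalars.append(length)
    return tuple(vector_code), scalars
```

Under these assumptions, a stroke drawn straight down and then to the right collapses to a two-element vector code, which could then be looked up in a table of stored vector codes to obtain a primitive code.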
Description  |
The present invention relates to an apparatus and method for identifying handwritten characters.
Since trade between non-English-speaking countries and Western countries has increased dramatically, the importance of communications has increased. For example, in the past when corresponding between English and Chinese speaking countries, a document written in English that was received in China would first be forwarded to a government translation centre. The document would then be translated and transcribed by hand into Chinese and finally delivered to the addressee of the document. When a response to the translated document was prepared, the response would be translated from Chinese into English at the government translation centre and forwarded to the English correspondent. However, the use of translators to transcribe the documents from English to Chinese and vice versa added a significant delay in the communications process.
To overcome these difficulties, a typewriter device has been developed having keys representing the ideographic characters of the Chinese language. This device allows hard copies of documents written in Chinese to be produced by hiring an operator skilled in the Chinese language and capable of using the typewriter. However, a large number of keys are required on the typewriter device since the Chinese language includes more than 50,000 different ideographic characters. Improvements to this type of device have been introduced to reduce the number of keys required by using function keys; however, the above-mentioned problem still exists. Furthermore, extensive training is required for operators to learn how to use the keyboard device adequately, a process which is expensive and time consuming.
To overcome the problems encountered when using the keyboard devices, an ideographic character detection apparatus has been developed for receiving and identifying handwritten ideographic characters. The apparatus requires that the ideographic
character be written on an input device and that the written characters be formed from predetermined fundamental strokes or primitives which are typical strokes used by everyone who writes in the ideographic language. After an ideographic character has
been entered into the apparatus, the apparatus examines the primitives forming the entered ideographic character and compares the entered primitives with the contents of a look-up table. The look-up table stores a plurality of variations of each of the
predetermined primitives to accommodate variations in users' handwriting. Due to the large number of variations of each primitive stored in the table, the primitives forming the character are usually determined by the device. The table also stores the
sets of primitives used to form each of the characters in the ideographic language. If the set of primitives forming the entered character corresponds with one of the sets of primitives in the look-up table, an output code associated with the set of
primitives is generated and conveyed to an output device. This allows a hard copy image of the entered handwritten ideographic character to be formed. However, a problem exists in that due to the large number of variations of each primitive stored in
the table, the processing speed of the apparatus is greatly reduced making it unsuitable for real-time applications.
Moreover, the number of predetermined fundamental strokes or primitives used in this apparatus has typically been chosen to be five or fewer, or twenty or more. By using only five fundamental primitives in the sub-set to form every ideographic
character in the language a problem exists in that a large number of different ideographic characters are formed from the identical set of primitives even though the ideographic characters are unique in appearance. This results in the decreased ability
of the apparatus to distinguish between different ideographic characters.
To attempt to overcome this problem, twenty or more distinct primitives have been included in the sub-set. However, the same problem still exists in that different ideographic characters are still formed from the identical series of primitives
although the occurrence of a set of primitives representing more than one ideographic character is reduced. However, by increasing the number of primitives in the sub-set, another problem exists in that the processing time of the apparatus is further
increased.
Furthermore, still yet another problem exists in that typically these devices are capable of detecting characters written in one language and do not permit multi-language character detection. Accordingly, there is a need for an improved
character recognition apparatus.
It is therefore an object of the present invention to obviate or mitigate the above disadvantages.
According to the present invention there is provided a character recognition apparatus for identifying characters formed from a number of primitives, said characters and primitives being members of predetermined sets, said apparatus comprising:
input means for receiving successively each of the primitives forming said character and generating input signals for each of said received primitives;
processing means receiving said input signals and identifying each of said primitives received by said input means, said processing means generating a character code representing said character upon identification of said primitives;
storage means storing a character code and an associated output code for each of the characters in said set;
comparing means comparing said character code generated for said entered character with each of said character codes in said storage means to identify said entered character; and
output means in communication with said comparison means and generating a reproduction of said entered character upon the identification thereof by said comparison means.
Preferably, the apparatus further includes differentiation means examining said input signals generated for each of said primitives and performing operations thereon, when said character code is equivalent to a character code associated with a
plurality of output codes to identify the output code associated with said character.
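The comparing and differentiation steps described above can be sketched as a table lookup followed, where needed, by a tie-breaking test. This is an illustrative sketch only: the single-letter primitive codes, the table contents, the output code names and the `longer_first_stroke` test are hypothetical, and the use of stroke lengths as the quantity derived from the input signals is an assumption.

```python
# Hypothetical storage: character code (tuple of primitive codes) -> output codes.
CHARACTER_TABLE = {
    ("a", "b"): ["OUT_1"],                # one output code: unambiguous
    ("a", "a", "b"): ["OUT_2", "OUT_3"],  # two characters share this code
}


def longer_first_stroke(stroke_lengths):
    """Hypothetical test: pick candidate 0 if the first primitive is longer than the second."""
    return 0 if stroke_lengths[0] > stroke_lengths[1] else 1


# Hypothetical character test information, stored only for ambiguous codes.
CHARACTER_TESTS = {("a", "a", "b"): longer_first_stroke}


def identify(character_code, stroke_lengths):
    """Return the output code for a character code, or None if it is not stored."""
    key = tuple(character_code)
    candidates = CHARACTER_TABLE.get(key)
    if candidates is None:
        return None  # no equivalent stored character code
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous code: let the stored test choose among the candidates.
    return candidates[CHARACTER_TESTS[key](stroke_lengths)]
```

For example, `identify(["a", "a", "b"], [9, 4, 3])` would select the first candidate because the first stroke is the longer of the two compared, while an unknown code returns `None` and would be a case for the substitution means.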
Preferably the apparatus is provided with substitution means for selecting the character code stored in the storage means having the highest probability of being equivalent to the character code generated for the entered character, when the input
character code is not equivalent to any of the character codes stored in the storage means. It is also preferred that the output means comprises at least one device chosen from the group comprising a printer, audio synthesizer or video display terminal
to allow a reproduction of the received ideographic character to be formed or an audio reproduction of the ideographic character to be produced.
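One way to picture the substitution step is a position-by-position comparison of the input character code with stored codes of the same length, keeping those with fewer than a set number of differences. The sketch below is an assumption-laden illustration, not the patented method: the function name, the "?" marker for an unidentified primitive, and the default limit are hypothetical, and ranking by fewest differences stands in for the probability matrix, which is not modelled here.

```python
def substitution_candidates(input_code, stored_codes, max_differences=2):
    """Return stored codes of equal length with fewer than `max_differences` mismatches.

    Results are ordered from fewest to most mismatches; ties keep the
    order in which the codes appear in `stored_codes`.
    """
    scored = []
    for stored in stored_codes:
        if len(stored) != len(input_code):
            continue  # only codes formed from the same number of primitives
        differences = sum(1 for a, b in zip(input_code, stored) if a != b)
        if differences < max_differences:
            scored.append((differences, stored))
    scored.sort(key=lambda item: item[0])
    return [stored for _, stored in scored]


# "?" is a hypothetical marker for an unidentified primitive code.
stored = [("a", "b", "c"), ("a", "b", "d"), ("x", "y", "z"), ("a", "b")]
best = substitution_candidates(("a", "?", "c"), stored)
```

Here only `("a", "b", "c")` differs from the input in fewer than two positions, so it is the single candidate returned.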
Preferably, the character recognition apparatus is capable of recognizing characters written in all ideographic languages, upper case English language characters, and Russian characters.
It is also desirable that the predetermined set of fundamental primitives is chosen to comprise 20 unique primitives, the various combinations of which will form substantially all characters in a plurality of different languages, while decreasing
the occurrence of different characters being formed from the same series of primitives. Thus, the use of twenty distinct primitives decreases the occurrence of entered characters being represented by character codes which are equivalent to a character code
associated with more than one international output code. This, of course, increases the probability of detecting the correct ideographic character.
An embodiment of the present invention will now be described, by way of example only, with
reference to the accompanying drawings in which:
FIG. 1 is a functional block diagram of an apparatus for identifying handwritten characters;
FIG. 2 is an illustration of an ideographic character;
FIG. 3 is an illustration of the fundamental primitives used in the device illustrated in FIG. 1;
FIGS. 4a to 4c are illustrations of the method of forming the character shown in FIG. 2 from the primitives shown in FIG. 3;
FIG. 5 is a more detailed functional block diagram of the device illustrated in FIG. 1;
FIG. 6 is a detailed functional block diagram of a portion of the device illustrated in FIG. 1;
FIG. 7 is an illustration of a coding method used in the device illustrated in FIG. 1;
FIGS. 8a and 8b are illustrations of entered fundamental strokes;
FIGS. 9a and 9b are illustrations of still more ideographic characters;
FIG. 10 is an illustration of a probability matrix used in the device illustrated in FIG. 1;
FIG. 11 is an illustration of an English character; and
FIG. 12 is an illustration of more English characters.
Referring to FIG. 1, an apparatus 10 for identifying handwritten characters is shown. The apparatus 10 comprises an input device 12 connected to a data processor 14. The input
device 12 receives the handwritten character and converts the character into a series of signals that are conveyed to the data processor 14. The data processor 14 processes the received signals in order to detect the character entered on the input
device 12. An output device 16 is also connected to the data processor 14 and receives therefrom an international ASCII output code representing the handwritten character received by the input device 12. This allows a reproduction of the handwritten
character to be generated.
The apparatus 10 is operable in a number of modes, each of which allows handwritten characters of a different language to be recognized and reproduced. Selection means 18 are provided to allow a user to select the language in which the
apparatus 10 is to operate. Thus, the processing means 14 is responsive to the selection means 18 and is partitioned into sections 14a, 14b, . . . , 14n so that appropriate information for each language is separately stored and accessible depending on
the mode selected by the selection means 18.
For simplicity, the apparatus shown in FIG. 1 will be described when the processing means 14 is conditioned to detect ideographic characters, although it should be realized that characters in other languages can be detected in a similar manner by
conditioning the selection means 18 to a different mode.
Referring to FIG. 2, an ideographic character IC is shown. As can be seen, the ideographic character IC is formed from a number of fundamental strokes or primitives, the primitives being labelled as Pr.sub.1 to Pr.sub.3 respectively. The
primitives Pr.sub.1 to Pr.sub.3 are fundamental strokes used when writing in the ideographic language.
The writing order of the sequence of strokes for ideographic characters is mainly based on logic, efficiency, experience and natural human habits. According to several research findings, there exist a number of basic rules when writing
ideographic characters and they are as follows:
up-down
left-right
out-in
horizontal-vertical
left slant-right slant
first enter-last close.
Each Chinese character may employ one or more of the above rules in the formation of the character. Examples of basic stroke sequences of ideographic characters are illustrated in Table 1 hereinbelow:
TABLE 1
______________________________________
UP-DOWN       HORIZONTAL-VERTICAL
LEFT-RIGHT    LEFT SLANT-RIGHT SLANT
OUT-IN        FIRST ENTER-LAST CLOSE
(example characters not reproduced)
______________________________________
To decrease the number of primitives that a user is required to write when forming an ideographic character and to reduce the amount of data that has to be processed by the processor 14, fifteen of the twenty primitives Pr.sub.a to Pr.sub.o
illustrated in FIG. 3 are used by the apparatus 10. The fifteen primitives Pr.sub.a to Pr.sub.o are members of the set of fundamental strokes typically used in the formation of ideographic characters. This sub-set of primitives is chosen since all of
the ideographic characters in the various languages can be formed from various combinations of the primitives Pr.sub.a to Pr.sub.o. The primitives Pr.sub.p to Pr.sub.t are used with some of the primitives Pr.sub.a to Pr.sub.o when the apparatus is
operating to detect characters written in another language as will be described.
Referring now to FIG. 5, the apparatus 10 is better illustrated. The input device 12 comprises an on-line digitizer tablet 20 having a stylus 20a. The ideographic character to be recognized is written on the tablet 20 with the stylus 20a. This
causes a series of cartesian co-ordinate data point signals PN.sub.o to PN.sub.N to be generated for each of the primitives Pr.sub.a to Pr.sub.o entered that form the ideographic character IC. The upper case "N" of the data point signal refers to the
order in which the primitive was entered when forming the character IC while the subscript "N" refers to the number of the sampled point along the primitive. The data point signals are then conveyed to the data processor 14.
A memory 22 is located in the data processor 14 and is connected to the digitizer tablet 20. The memory 22 receives the raw cartesian co-ordinate data point signals and stores them prior to processing. A pre-processor 24 receives a copy of the
cartesian co-ordinate data point signals PN.sub.o to PN.sub.N for each entered primitive and processes the data to remove redundant and spurious data. The pre-processed cartesian co-ordinate data signals are conveyed from the pre-processor 24 to a
feature extraction section 26 which converts the cartesian co-ordinate data point signals for each of the entered primitives Pr into a vector code and a series of scalars.
The vector code and series of scalars generated by the feature extraction section 26 are applied to a primitive detection section 28 which compares the vector code generated for each entered primitive Pr.sub.a to Pr.sub.o forming the character IC
with the contents of a look-up table or dictionary. This allows the processor 14 to detect whether the entered primitives are members of the fifteen primitives Pr.sub.a to Pr.sub.o. When an entered primitive Pr results in the formation of a vector code
equivalent to a vector code associated with only one of the fifteen primitives stored in the primitive detection section 28, a primitive code a to o is generated and conveyed to a memory 30. This process is performed for each vector code representing
each primitive Pr forming the entered ideographic character IC. Thus, a series of primitive codes or a character code is generated for the entered character which represents the ideographic character IC. However, if a vector code generated for an
entered primitive Pr is equivalent to a vector code associated with more than one of the fifteen primitives Pr.sub.a to Pr.sub.o, the detection section 28 performs tests on the series of scalars associated with the generated vector code to detect the
correct entered primitive.
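The look-up performed by the primitive detection section can be sketched roughly as follows. This is only an illustration under stated assumptions: the table entries, the vector-code strings, and the length-comparison test in `pick_by_scalars` are hypothetical, since this passage does not give the actual dictionary contents or the tests applied to the scalars. The `UNIDENTIFIED` placeholder stands in for the unidentified primitive code mentioned in the abstract.

```python
from typing import Sequence

# Hypothetical dictionary: vector code -> candidate primitive codes.
# The real table contents are not given in this passage.
PRIMITIVE_TABLE: dict[str, list[str]] = {
    "0": ["a"],        # vector code matching exactly one primitive
    "6": ["b"],        # vector code matching exactly one primitive
    "60": ["c", "d"],  # vector code shared by two primitives
}

UNIDENTIFIED = "?"  # placeholder for the unidentified primitive code


def pick_by_scalars(candidates: Sequence[str], scalars: Sequence[float]) -> str:
    """Hypothetical scalar test: compare the first and last segment lengths."""
    return candidates[0] if scalars[0] >= scalars[-1] else candidates[1]


def detect_primitive(vector_code: str, scalars: Sequence[float]) -> str:
    """Return the primitive code for one entered primitive."""
    candidates = PRIMITIVE_TABLE.get(vector_code)
    if not candidates:
        return UNIDENTIFIED          # vector code not in the dictionary
    if len(candidates) == 1:
        return candidates[0]         # unambiguous match
    return pick_by_scalars(candidates, scalars)  # ambiguous: test the scalars
```

Concatenating the codes returned for each entered primitive, in entry order, would give the character code that is passed on to the memory 30.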
The generated character code is conveyed from the memory 30 to a character detection section 32 and compared with the contents of a second look-up table or dictionary. Section 32 stores the character code representing each of the ideographic
characters in the language. The stored character codes are based on the requirement that the ideographic characters are formed from a combination of the fifteen primitives illustrated in FIG. 3 and that the characters are entered on the tablet 20 in an
order as determined by the previously mentioned rules. Since the previously mentioned rules are generally used when writing in an ideographic language, character codes which can represent ideographic characters, but are formed from primitives entered in
an incorrect order are omitted from the look-up table.
When the character code generated for the entered ideographic character IC is equivalent to a character code found in the character detection section 32, an associated output code or international ASCII output code is outputted to a memory 34.
However, if the character code is equivalent to a character code representing more than one ideographic character, the character detection section 32 performs operations on the raw cartesian co-ordinate data point signals stored in the memory 22 to
determine the correct ideographic character IC which the character code represents. This allows the correct international ASCII code to be outputted to the memory 34.
A substitution and correction means 36 is also provided and examines the entered character code when it is not equivalent to a character code stored in the character detection section 32. The substitution means 36 substitutes for the entered
character code, the most probable character code that the entered character code was supposed to represent and conveys it back to the character detection section 32 wherein the above-mentioned process is performed.
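The three outcomes described above (a unique match, a match shared by several characters, and no match) can be sketched as a small decision routine. The dictionary entries and output-code values below are hypothetical, and `differentiate` and `substitute` are caller-supplied stand-ins for the differentiating operations and the substitution means, whose internals are not specified in this passage. A single substitution attempt is also an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical dictionary: character code -> output codes of every
# character that shares that code. Values are arbitrary illustrations.
CHARACTER_TABLE: dict[str, list[int]] = {
    "ab": [1001],
    "abc": [1002, 1003],  # one character code, two characters
}


def detect_character(
    char_code: str,
    raw_points: Sequence[Point],
    differentiate: Callable[[Sequence[int], Sequence[Point]], int],
    substitute: Callable[[str], str],
) -> Optional[int]:
    """Return the output code for an entered character code, or None."""
    outputs = CHARACTER_TABLE.get(char_code)
    if outputs is None:
        # No match: retry once with the most probable replacement code.
        outputs = CHARACTER_TABLE.get(substitute(char_code))
        if outputs is None:
            return None
    if len(outputs) == 1:
        return outputs[0]  # unique match
    # Shared code: decide using the raw co-ordinate data.
    return differentiate(outputs, raw_points)
```

Passing the raw points into `differentiate` mirrors the text, in which the ambiguous case is resolved from the stored co-ordinate data rather than from the character code itself.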
The international ASCII code representing the ideographic character IC stored in the memory 34 is applied to the output device or devices 16 which typically include a video display terminal (VDT) 16a, printer 16b and/or a video synthesizer 16c
wherein an audio and/or visual reproduction of the ideographic character IC can be formed.
Referring to FIG. 6, the processing means 14 is better illustrated. The pre-processor 24 comprises a comparator 24a and a memory 24b which function in a manner to be described to eliminate redundant and spurious cartesian co-ordinate data point
signals. The feature extraction section 26 includes a second comparator 26a and a look-up table or dictionary 26b which function to generate vectors for adjacent cartesian co-ordinate data point signals forming each primitive Pr. A memory 26c receives
the vectors and in turn conveys the vectors to a third comparator 26d. The comparator 26d examines the vectors and removes redundant information to form a series of unit vectors or a vector code for each primitive Pr and a series of scalars. The
scalars represent the length of each unit vector in the vector code generated for each primitive. The vector code and series of scalars generated for each primitive Pr are conveyed to a memory 26e and stored prior to being conveyed to the primitive
detection section 28.
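One way to picture the feature extraction step is as direction quantization followed by run-length merging. The sketch below is a minimal illustration, not the coding method of FIG. 7, which is not reproduced in this passage: the eight-direction quantization, the counter-clockwise numbering from the positive x axis, and the use of Euclidean length for the scalars are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def direction(p: Point, q: Point) -> int:
    """Quantize the step p->q into one of 8 directions (assumed scheme).

    0 is the +x direction; numbering runs counter-clockwise in 45-degree steps.
    """
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    return round(angle / (math.pi / 4)) % 8


def extract_features(points: Sequence[Point]) -> Tuple[List[int], List[float]]:
    """Return (vector code, scalars) for one primitive.

    Consecutive steps with the same direction are merged into a single
    unit vector; the matching scalar accumulates their total length.
    """
    vector_code: List[int] = []
    scalars: List[float] = []
    for p, q in zip(points, points[1:]):
        if p == q:
            continue  # a zero-length step carries no direction
        d = direction(p, q)
        step = math.dist(p, q)
        if vector_code and vector_code[-1] == d:
            scalars[-1] += step       # same direction: extend the current run
        else:
            vector_code.append(d)     # new direction: start a new unit vector
            scalars.append(step)
    return vector_code, scalars
```

For example, an L-shaped point sequence that moves right and then up yields two unit vectors, with one scalar giving the length of each arm.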
The primitive detection section 28 includes a fourth comparator 28a connected to a second look-up table or dictionary 28b. The table 28b stores a list of predetermined vector codes and a primitive code for each primitive Pr.sub.a to Pr.sub.o.
The vector codes represent one or more of the fifteen primitives Pr.sub.a to Pr.sub.o. The primitive detection section 28 also comprises a memory 28c which holds the scalars generated for each vector code and a test section 28d. The test section 28d
performs operations on the series of scalars if the vector code associated therewith is equivalent to a vector code which represents more than one of the fifteen primitives. This allows the correct primitive to be determined. When the vector code for
each of the entered primitives Pr is located in the dictionary 28b, the primitive code a to o associated therewith is applied to the memory 30.
The series of primitive codes or character code generated for the entered ideographic character IC is conveyed to the character detection section 32 which comprises a fifth comparator 32a and a third look-up table or dictionary 32b. The
dictionary 32b stores a list of the character codes forming each of the ideographic characters in the language and an associated international output code. The comparator 32a and the dictionary 32b function to detect whether the character code
representing the entered handwritten ideographic character IC is equivalent to a character code stored in the dictionary 32b representing one or more of the ideographic characters in the language. The character detection section 32 also includes a
differentiator 32c which performs tests on the raw cartesian co-ordinate data point signals if the character code is equivalent to a character code stored in the dictionary 32b which represents more than one ideographic character. This allows the
correct ideographic character to be detected. When the correct ideographic character has been identified, the international ASCII code associated therewith is conveyed to the memory 34 and in turn to the output device 16.
As mentioned previously, when the character code is not equivalent to a character code found in the dictionary 32b, the substitution and correction means 36 is used. The substitution section 36 includes a probability matrix 36a, a sixth
comparator 36b and a memory 36c which collectively function to determine the most probable character code that the character code generated for the entered handwritten ideographic character IC was supposed to be. This increases the probability of
detecting the ideographic character IC entered on the digitizer tablet 20.
When an ideographic character IC is to be entered into the apparatus 10 via the digitizer tablet 20, the stylus 20a is placed on the tablet 20 and each of the primitives Pr forming the ideographic character IC is drawn separately. As described
hereinabove, the primitives used to form the ideographic character IC must be substantially equivalent to one of the fifteen primitives Pr.sub.a to Pr.sub.o. However, this limitation does not pose many problems since each of the fifteen primitives is
fundamental strokes used by substantially everyone who is capable of writing in an ideographic language. Furthermore, the primitives Pr.sub.a to Pr.sub.o are chosen to reduce the number of entered characters that generate the same character code when
inputted into the apparatus 10 and to simplify processing in section 14. After a primitive Pr has been entered, the stylus 20a is removed from the tablet 20 for a predetermined length of time. This results in a time-out signal being generated which
allows the data processor 14 to recognize that the primitive Pr has been completely entered. Thereafter, the next primitive forming the character is entered and a time-out signal is generated. This process continues until each primitive forming the
character has been entered into the apparatus 10.
As the stylus 20a is moved across the tablet 20 to form a primitive Pr, a series of cartesian co-ordinate data point signals are generated. The data processor 14 samples the cartesian co-ordinate data point signals generated for each primitive
at a sampling rate of approximately 100 samples per second and stores the sampled co-ordinate data signals in the memory 22. The sampled data for each primitive is continuously stored in separate registers until the data processor 14 receives a time-out
signal signifying that the complete primitive has been entered. While the next primitive Pr.sub.2 is being formed on the tablet 20, the sampled cartesian co-ordinate data point signals are separately stored in different registers in the memory 22 until
the next time-out signal is detected by the processor 14. This process continues until each primitive forming the ideographic character has been entered and the cartesian co-ordinate data signals generated therefor have been stored separately in the
memory 22. To indicate to the data processor 14 that the entire ideographic character IC has been entered, an end-of-character (EOC) key located on the tablet must be depressed. This prevents further data generated by the tablet 20 from corrupting the
data associated with the previously entered handwritten ideographic character.
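The time-out and end-of-character handling described above can be sketched as a routine that splits a stream of samples into per-primitive lists. This is a simplified model under stated assumptions: samples are taken to arrive only while the stylus is on the tablet, a time-out is inferred from a gap between consecutive sample times, the 0.5 second threshold is an arbitrary stand-in for the "predetermined length of time", and the `EOC` marker is a hypothetical representation of the end-of-character key.

```python
from typing import Iterable, List, Tuple, Union

Sample = Tuple[float, float, float]  # (time in seconds, x, y)
Point = Tuple[float, float]
EOC = "EOC"  # hypothetical marker for the end-of-character key


def split_primitives(
    events: Iterable[Union[Sample, str]], timeout: float = 0.5
) -> List[List[Point]]:
    """Group samples into primitives, one list of points per primitive."""
    primitives: List[List[Point]] = []
    current: List[Point] = []
    last_time = None
    for event in events:
        if event == EOC:
            break  # ignore anything generated after end-of-character
        t, x, y = event
        if current and t - last_time >= timeout:
            primitives.append(current)  # time-out: the primitive is complete
            current = []
        current.append((x, y))
        last_time = t
    if current:
        primitives.append(current)      # close the final primitive
    return primitives
```

At roughly 100 samples per second, samples within one primitive are about 0.01 seconds apart, so any threshold well above that spacing separates primitives cleanly in this model.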
Since a digitizer tablet 20 is used, temporal and irregular noise occurs during the sampling process due to miscoupling of the stylus 20a and the digitizer tablet surface 20. Furthermore, small amplitude noise occurs due to uneven movements in
the operator's hand which introduces discrepancies between the sampled cartesian co-ordinate data point signals and the desired cartesian co-ordinate data point signals. Also, the slow movement of the