BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a sign language translation system and
method for recognizing a type of sign language and translating it into a
spoken language.
2. Description of the Related Art
As described in Proceedings of Conference on Human Factors in Computing
Systems CHI'91 (1991), pp. 237 to 242, a conventional sign language
translation method recognizes a sign language by converting the motion of
hands wearing gloves with special sensors into electrical signals and
checking whether the pattern of signals matches any one of previously
registered signal patterns.
Another conventional method described in JP-A- 2-144675 recognizes finger
spellings by taking an image of a pair of colored gloves by a TV camera to
derive color information of differently colored areas of the gloves and by
determining whether the pattern of the derived color information matches
any one of previously registered color information patterns.
There are three types of Japanese sign languages, including (1) "a
traditional type sign language" having been used by auditory-impaired
persons, (2) "a simultaneous type sign language" in which each sign
language word is assigned a Japanese word, and (3) "an intermediate type
sign language" in which each sign language word is arranged in the same
order as a spoken language. The intermediate type sign language is most
frequently used. With this type of sign language, sign words are given in
the same order as a spoken language. Consider for example a Japanese
sentence " , , , , , "
equivalent to a phonetic sentence "Watakushi wa fuyu ni hokkaido e ikou to
omou" and corresponding to the English sentence "I am thinking of going to
Hokkaido in the winter". In this case, only important words (also called
independent words) such as a verb " "="iku"="go" and nouns " "="hokkaido"
(Japanese district name), and " "="fuyu"="winter", are expressed in the
sign language, and other words are generally omitted such as postpositions
" , , , ,"="te, ni, wo, ha"="various words or word elements similar to
English particles but postpositional according to the Japanese Grammar",
words dependent on auxiliary verbs " , , , "="reru, rareru, you,
tai"="various words or word elements used in a corresponding manner in
English", and pseudo nouns " , , "="koto, mono, no"="various words used in
a corresponding manner in English". In addition, the conjugations of
conjugative words such as verbs, auxiliary verbs, adjectives, adjective
verbs, are generally omitted. Accordingly, the first-mentioned
conventional method, which simply arranges sign language words in the order
recognized, has difficulty expressing a spoken-language sentence.
The sign language mainly used in the U.S.A. is the American Sign Language
(ASL). In the U.S.A., the "traditional type sign language" is widely
accepted. Also in the case of ASL, articles such as "a, an, the" and
prepositions such as "in, at, on, of, by, from" are often omitted.
The second-mentioned conventional method recognizes Japanese finger
spellings " "="a", " "="i", " "="u" and so on, corresponding to English
alphabet characters. Finger spellings are used as an alternative means when
sign words cannot be remembered or understood, so conversation using only
finger spellings is rare. This conventional
method is not satisfactory for conversation, because each word in a
sentence is expressed by giving the finger spellings of all characters or
letters of the word.
Such problems of sign language recognition are associated not only with
Japanese sign languages but also with other sign languages for English and
other foreign languages. Such problems are inherent to the sign language
which provides communications by changing the positions and directions of
hands (inclusive of fingers, backs, palms), elbows, and arms, and by using
the whole body including the face, chest, abdomen, and the like.
SUMMARY OF THE INVENTION
It is an object of the present invention to translate recognized sign words
into a spoken language.
In order to achieve the above object of the present invention, there is
provided a sign language translation system and method which produces a
spoken language by recognizing a series of hand motions to translate them
into words, analyzing the dependency relation between the recognized
words, supplementing omitted words between the recognized words by using
omission/supplement rules, and changing the conjugations of conjugative
words such as verbs and auxiliary verbs.
Namely, case postpositions (Japanese postpositional particles) are
supplemented between words having a dependence relationship: a case
postposition determined by the dependence relationship is inserted
between the recognized words.
In addition, the omitted conjugation at the ending of each conjugative word
having a dependence relationship is determined, while referring to the
predefined conjugations of conjugative words including verbs, adjectives,
adjective verbs, and auxiliary verbs.
Also supplemented are omitted auxiliary verbs, postpositions, pseudo nouns,
and conjunctions, respectively providing the semantic and time sequential
relationships between depending and depended words having a dependence
relationship.
For example, given a train of words in a sign language " "="watakushi"="I",
" "="fuyu"="winter", " "="Hokkaido", " "="iku"="go", and "
"="omou"="think", the dependence analysis finds that the words " ", " "
and " " are the subjective case, place case, and time case of the depended
verb " ", respectively. Next, the omission/supplement rule supplements
case postpositions "wa", "ni", and "e" after the words " ", " ", and " ",
respectively. A case postposition " "="to"="of" is added before the verb "
". An auxiliary verb " "="u" (meaning an intention) is added between the
words " " and " ". A conjugation at the ending of the conjugative word "
"="iku"="go" is changed in order to correctly connect, from the viewpoint of
grammar, the words " " " " and " "="to omou"="thinking of". As a result, a
spoken language sentence " , , , , , "="I am thinking of going to Hokkaido
in the winter" can be generated.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows the structure of a sign language translation system according
to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the outline of the operation to be
executed by the embodiment system;
FIG. 3 shows the details of a word recognition procedure;
FIG. 4 is a diagram illustrating a dependence analysis procedure;
FIG. 5 shows the details of the dependence analysis procedure;
FIG. 6 shows the details of an omission/supplement procedure;
FIG. 7 shows the data structure of recognized sign data;
FIG. 8 shows the data structure of an analysis table;
FIG. 9 shows an example of a word dictionary;
FIG. 10 shows the structure of an analysis stack;
FIG. 11 shows an example of a case dictionary;
FIG. 12 shows examples of omission/supplement rules;
FIG. 13 shows an example of a conjugation table; and
FIG. 14 shows an example of a generated sentence.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Embodiments of the present invention will be described with reference to
FIGS. 1 to 14.
FIG. 1 shows the structure of a Japanese sign language translation system
according to an embodiment of the present invention. This system is
realized by a computer having a CPU 102, input devices (keyboard 101 and
sign data input device 105), an output device (a CRT 103), and a memory
104. Stored in the memory 104 at a predetermined memory area are programs
2 for generating sentences of a spoken language expression from input sign
data 7 supplied from the sign data input device 105. The programs 2 are
loaded to the CPU 102 for various procedures to be performed. These
programs 2 include a program 3 for recognizing a train of words from the
input sign data 7, a program 4 for analyzing the dependency relationship
between words, and a program 6 for supplementing omitted words in and
between words having the dependency relation to generate a sentence 14.
Reference numeral 7 represents a memory area in which the input sign data
is stored. Reference numeral 8 represents a memory area in which an
analysis table to be used by the programs 2 is stored. Reference numeral
10 represents a memory stack area to be used by the dependency analyzing
program 4. Reference numeral 9 represents a memory area in which a word
dictionary is stored, the word dictionary storing the meaning and part of
speech of each word to be used by the sign language. Reference numeral 11
represents a memory area in which a case dictionary is stored, the case
dictionary being used for defining the case of a word dependable on a
predicate word such as a verb to be used by the sign language. Reference
numeral 12 represents a memory area in which omission/supplement rules are
stored, the rules providing a method of supplementing omitted words in and
between words having the dependency relation. Reference numeral 13
represents a memory area in which a conjugation table is stored, the table
defining the conjugation rule for conjugative words. Reference numeral 14
represents a memory area in which generated sentences are stored.
In the sign language translation system shown in FIG. 1, CPU 102 generates
translated sentences in accordance with the programs loaded from the
memory 104. In this case, a discrete hardware structure may be used for
the execution of part or the whole of the word recognition (corresponding
to the program 3 in FIG. 1), dependency relationship analysis
(corresponding to the program 4 in FIG. 1), word omission/supplement
(corresponding to the program 6 in FIG. 1), and other functions.
FIG. 7 shows the detailed structure of input sign data in the form of a
table. This table stores the history of right and left hand data 71 and
72, which represent states of the right and left hands, supplied from the
input device 105 at a predetermined time interval. It is well known to use
gloves for the input device 105, which gloves allow the position and
direction of each hand and the flection of each finger to be converted
into electrical signals. The motion of hands may be input by using a TV
camera for example. The right and left hand data 71 and 72 are stored in
the table at the locations 73, 74, . . . , 75 at sampling times T0, T1, .
. . , Tn, respectively. Data 711 indicates the first articulation angle of
a thumb of the right hand, and data 712 indicates the second articulation
angle of a thumb of the right hand. Data 713 indicates the x-coordinate
position of the right hand, data 714 indicates the y-coordinate position
of the right hand, and data 715 indicates the z-coordinate position of the
right hand. Data 716 indicates the angle of the direction of the right
hand relative to the x-axis, data 717 indicates the angle of the direction
of the right hand relative to the y-axis, and data 718 indicates the angle
of the direction of the right hand relative to the z-axis.
FIG. 8 shows the details of the analysis table 8 which stores data of a
train of words 88 obtained through analysis of the data 7 supplied from
the input device 105. This table is constructed of a word field 81, part
of speech field 82, meaning field 83, depended word field 84, dependency
relationship field 85, and supplemented result 86.
FIG. 9 shows an example of the word dictionary 9 which stores a part of
speech 92 of each word 91 used by the sign language and the meaning 93
thereof.
FIG. 10 shows the structure of the analysis stack area 10, indicating that
words 1001 to 1004 were stacked in this area during the dependency
analysis.
FIG. 11 shows an example of the case dictionary 11 which stores a
dependable case 112 for a predicate word such as a verb 111, dependable
case meaning 113, and case postposition 114 to be added to the word of the
dependable case.
FIG. 12 shows examples of the omission/supplement rule 12. The rule 121
illustrates a supplement method 123 of supplementing a depending word,
when the condition 122 is met wherein all conditions 1221, 1222, and 1223
are concurrently satisfied. Similarly, the rule 124 illustrates another
supplement method 126 of supplementing a depending word, when the
condition 125 is met wherein all conditions 1251 to 1254 are satisfied.
For example, with the rule 121, on the conditions that the part of speech
of a depending word is a noun, pronoun, or proper noun, that the part of
speech of the depended word is a verb, and that the dependable case is a
subjective case, time case, position case, or objective case, a case
postposition for the dependable case is supplemented to the depending
word.
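The rule 121 may be pictured with the following Python sketch. It assumes
that the parts of speech and cases are held as plain strings and that the
case postposition has already been read from the case dictionary 11. The
function name rule_121 and the string labels are hypothetical.

```python
# Condition 122 of the rule 121, written as three sets of accepted values.
DEPENDING_PARTS_OF_SPEECH = {"noun", "pronoun", "proper noun"}
DEPENDED_PARTS_OF_SPEECH = {"verb"}
DEPENDABLE_CASES = {"subjective", "time", "position", "objective"}


def rule_121(depending, depending_pos, depended_pos, case, postposition):
    """Return the depending word with its case postposition appended,
    or None when the three conditions are not all satisfied."""
    if (depending_pos in DEPENDING_PARTS_OF_SPEECH
            and depended_pos in DEPENDED_PARTS_OF_SPEECH
            and case in DEPENDABLE_CASES):
        return f"{depending} {postposition}"
    return None
```

For example, rule_121("watakushi", "pronoun", "verb", "subjective", "wa")
yields "watakushi wa".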
FIG. 13 shows an example of the conjugation table 13 showing a methodical
presentation of the conjugations of conjugative words such as verbs and
auxiliary verbs.
FIG. 14 shows an example of a generated sentence 14 subjected to
supplement.
The operation of the embodiment will be described with reference to FIGS. 2
to 6.
FIG. 2 is a diagram illustrating the overall operation to be executed by
the embodiment system. Steps 4 and 6 shown in FIG. 2 concern the subject
matter of the present invention.
At Step 3, a train of words is recognized from the input sign data 7 and
loaded into the analysis table 8. The details of Step 3 will be given later
with reference to FIG. 3.
At Step 4, the dependence relationship between the words of the recognized
train is analyzed by using the analysis table 8, word dictionary 9,
analysis stack area 10, and case dictionary 11. The analyzed results are
stored in the analysis table 8. The details of Step 4 will be given later
with reference to FIGS. 4 and 5.
At Step 6, omitted words between the words having the dependency
relationship are estimated by using the analysis table 8, case dictionary
11, omission/supplement rules 12, and conjugation table 13, and the
estimated words are supplemented to the words having the dependency
relationship to generate a spoken language sentence 14 and store it in the
memory. The details of Step 6 will be given later with reference to FIG.
6.
The details of the word recognition procedure shown in FIG. 3 will be
described. First, the principle of the word recognition method used in
this embodiment will be described. In a sign language, a word is expressed
by a series of hand and/or finger motions, and a sentence is expressed by
a train of words. In this embodiment, the history data of a series of
motions of hands, corresponding to a train of words, is stored as the
input sign data 7. The word recognition program 3 compares, on the basis
of maximum coincidence, i.e., a selection method of a train of words
having the longest matching portion, the input sign data 7 with history
data (not shown) of motions of hands learned in advance through a neural
network. The train of words is cut out while being recognized. Typical
learning and recognition by a neural network are well known in the art. For
example, refer to "Neural Computer--Learning from Brains and Neurons", by
Kazuyuki AIHARA, the Publication Department of Tokyo Electric-College,
1988, pp. 93 to 128.
At Step 31, an initial value of a recognition section for the input sign
data 7 is set. As the initial value, a start time of the recognition
section P1 is set to time T0, and an end time of the recognition section
P2 is set to time T0+W2. W2 is a maximum range value of the recognition
section.
At Step 32, data in the recognition section is taken out of the input sign
data 7, and is inputted to an already learned neural network. The neural
network outputs recognized words Y (P1, P2) and coincidence values Z (P1,
P2). The more a pattern of inputted data matches an already learned data
pattern, the larger the coincidence value Z is. If there are a plurality
of words having the same coincidence value Z, these words may be output as
a plurality of candidate words. Alternatively, a different evaluation
method may be used, or one of a plurality of candidates may be
preferentially selected in accordance with a predetermined order, e.g.,
the first processed word.
At Step 33, the end time of the recognition section is shortened by one
unit.
At Step 34, it is checked whether the recognition section has become
smaller than the minimum value W1. If not, the procedure returns to Step 32
to recognize again the data in the recognition section shortened by one unit.
At Step 35, the word having the most accurate recognition result, i.e., the
recognized word Y (P1, P2) having the maximum coincidence value Z (P1, P2)
among the recognition sections whose lengths range from W1 to W2, is
selected and stored in the analysis table 8 at the word field 81.
At Step 36, the recognition section is changed to the next recognition
section, and W2 is set as an initial value of the start point for the next
recognition section.
At Step 37, it is checked whether or not the whole input sign data 7 has
been processed. If not, the procedure returns to Step 32 to continue the
word recognition. If so, the word recognition is terminated. In this
manner, a train of words is recognized from the input sign data 7 and
stored in the analysis table 8.
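Steps 31 to 37 can be sketched in Python as follows. The sketch rests on
several assumptions: the learned neural network is replaced by a
hypothetical callable named recognizer that returns a word and its
coincidence value for a section, the input sign data is any sliceable
sequence, and the next recognition section is assumed to start where the
selected section ended.

```python
from typing import Callable, List, Sequence, Tuple


def recognize_words(
    data: Sequence,
    recognizer: Callable[[Sequence], Tuple[str, float]],
    w1: int,
    w2: int,
) -> List[str]:
    """Cut a train of words out of `data`, keeping for each start point
    the section (length w1 to w2) with the largest coincidence value."""
    words = []
    p1 = 0                                      # Step 31: start at T0
    while p1 + w1 <= len(data):                 # Step 37: data left to process
        p2 = min(p1 + w2, len(data))            # Step 31: end at start + W2
        best_word, best_z, best_end = None, float("-inf"), p2
        while p2 - p1 >= w1:                    # Step 34: stop below W1
            word, z = recognizer(data[p1:p2])   # Step 32: recognize the section
            if z > best_z:                      # ties keep the first processed word
                best_word, best_z, best_end = word, z, p2
            p2 -= 1                             # Step 33: shorten by one unit
        words.append(best_word)                 # Step 35: store the best word
        p1 = best_end                           # Step 36: assumed next start point
    return words
```

Because the longest section is tried first and ties keep the first
processed word, the sketch prefers the longest matching portion, in line
with the maximum coincidence principle described above.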
As another method for the word recognition procedure, a dynamic programming
(DP) matching scheme well known in the field of speech recognition, or
other methods, may also be used. For example, a continuous DP matching
scheme has been proposed by OKA in "Continuous Word Recognition using
Continuous DP", the Speech Study Group of Acoustical Society of Japan,
S78-20, pp. 145 to 152, 1978. By using this or other techniques, it is
possible to dynamically check the matching between a time sequential and
continuous pattern and reference patterns. According to the above-cited
document, it is recognized whether a reference pattern is included in a
continuous pattern, through the continuous matching while the reference
pattern is moved in a direction of the continuous pattern. During this
matching, the time sequence of similarity factors of the continuous
patterns relative to the reference patterns is obtained. A portion of the
continuous pattern having a minimum of the similarity factors which is
equal to or lower than a threshold value is selected as a candidate for
the reference pattern.
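The idea of moving a reference pattern along a continuous pattern can be
illustrated with the following Python sketch. It is a simplified
dynamic-programming spotting routine written in the spirit of continuous
DP matching, not the recurrence of the cited document. One-dimensional
samples, the absolute difference as the local distance, and the function
name spot_reference are assumptions made only for this illustration.

```python
def spot_reference(continuous, reference, threshold):
    """Return (end_index, score) for the portion of `continuous` that best
    matches `reference`, or None when no score is at or below `threshold`.
    A smaller score means a closer match."""
    n = len(reference)
    inf = float("inf")
    prev = [inf] * n                 # accumulated costs at the previous time
    best = None
    for t, x in enumerate(continuous):
        cur = [inf] * n
        for j, r in enumerate(reference):
            d = abs(x - r)           # local distance between two samples
            if j == 0:
                cur[j] = d           # a match may start at any time
            else:
                # allow stretching or shrinking along the time axis
                cur[j] = d + min(prev[j], prev[j - 1], cur[j - 1])
        score = cur[-1] / n          # normalize by the reference length
        if score <= threshold and (best is None or score < best[1]):
            best = (t, score)
        prev = cur
    return best
```

With continuous = [0, 0, 1, 2, 3, 0, 0], reference = [1, 2, 3], and a
threshold of 0.1, the reference is found ending at index 4.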
However, if data sampled at a predetermined time interval is used as it is
for the continuous pattern and the reference patterns, the time required
for pattern matching becomes long in proportion to the length of
continuous pattern and the number of reference patterns.
Effective word recognition of the sign language can be performed in the
following manner. Specifically, in order to normalize a sign language
pattern in consideration of non-linear expansion/compression, a
correspondence is established between samples of the reference pattern
through DP matching, and the corresponding samples are averaged to produce
a sign language reference pattern. In the continuous sign language pattern
recognition using the continuous DP matching, the continuous sign language
pattern and a reference pattern are compressed and then checked to see
whether they coincide with each other, i.e., the matching is obtained
while allowing the non-linear expansion/compression in the time domain.
In sign language conversation, persons or articles are often expressed by
assigning them particular positions, and each position is used thereafter
as the pronoun of each person or article. For example, assuming that a
person A is positioned at the front right and a person B is positioned at
the front left, a motion of " "="hanasu"="speaking" from person A
to person B means "person A speaks to person B". Since the word
recognition at Step 3 supplies the position information, it is possible to
correctly recognize a person or article indicated by the pronoun
represented by the position information used in the preceding speech.
FIG. 4 shows the contents of the operation to be executed by the dependence
analysis program 4. In the following description, the train of words 88
shown in FIG. 8 is used as an example. This word train has five words "
"="watakushi"="I", " "="fuyu"="winter", " "="Hokkaido", " "="iku"="go",
and " "="omou"="think".
At Step 41, the words in the word field 81 of the analysis table 8 are
checked sequentially, starting from the first word, to see whether each
word matches a word of the word dictionary 9. The part of speech 92 and
meaning 93 in the word dictionary for the matched word are entered in the
part of speech field 82 and meaning field 83 of the analysis table 8.
At Step 42, the depended word and dependence relationship of each matched
word are analyzed and entered in the depended word field 84 and dependence
relationship field 85 of the analysis table 8.
The details of Step 42 will be given with reference to FIG. 5. First, the
principle of the dependence relationship analysis will be described.
It is known that a dependence relationship is present between phrases
constituting a sentence, and a one-sentence/one-case principle and a
no-cross principle are applied.
The one-sentence/one-case principle teaches that a different case is
provided to each predicate in one sentence, without occurrence of the same
two cases. The no-cross principle teaches that a line indicating the
dependence relationship will not cross another line. The dependence
analysis for a train of words is performed using these principles.
At Step 51, the words 81 in the analysis table are sequentially read from
the analysis table 8, starting from the first word.
At Step 52, it is checked if there is no word to be processed. If there is
no word, the procedure is terminated.
At Step 53, the read-out words are stacked on the analysis stack area 10.
FIG. 10 shows the four words stacked, starting from the first word.
At Step 54, the part of speech of each word is checked. If the part of
speech is a predicate such as a verb, the procedure advances to Step 55,
and if not, returns to Step 51.
At Step 55, the case dictionary 11 is referred to with respect to the verb
on the analysis stack area 10 to determine the cases of depending words,
through comparison between the contents of the case dictionary 11 and
analysis stack area 10. In the example shown in FIG. 10, the verb "
"="iku"="go" is looked up in the case dictionary 11 to check the contents
shown in FIG. 11. Upon the comparison with the contents on the analysis
stack area 10 shown in FIG. 10, a correspondence can be obtained, which
shows that " "="watakushi"="I" 1001 is the subjective case for the verb "
"="iku"="go", " "="fuyu"="winter" is the time case, and " "="Hokkaido" is
the place case. This data is set in the analysis table 8 at the depended
word field 84 and dependence relationship field 85.
In the above example, the case frame of predicates has been described for
the dependence relationship analysis. The dependence relationship between
substantives such as nouns may also be analyzed using the case frame of
substantives. For example, the noun frames " "="dokokara"="from where" and
" , "="dokomade"="to where" for defining a " "="kyori"="distance"
determine a certain dependence relationship between nouns.
If the dependence relationship is ambiguous and cannot be analyzed
definitely, the most probable one may be selected or a plurality of
candidates may be used at the later processes.
At Step 56, the entries on the analysis stack area 10 whose dependence
relationships have been determined are cleared, with only the verb being
left. The remaining words of the train are then set on the analysis
stack area 10 to continue the dependence analysis.
As understood from the above description, when the next word "
"="omou"="think" is set on the analysis stack area 10, the contents of "
"="omou"="think" in the case dictionary 11 are referred to and compared
with the contents in the analysis stack area 10. As a result, the word "
"="iku"="go" is determined as the objective case of the word "
"="omou"="think". The results of the dependence analysis are stored in the
analysis table 8. It is noted that no dependence relationships cross one
another in the above analysis.
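The stack-based analysis of Steps 51 to 56 can be sketched in Python as
follows. Only the words, cases, and case postpositions are taken from the
example above. The meaning labels ("person", "time", "place", "action"),
the dictionary layout, and the function name analyze_dependence are
assumptions, and each word is assumed to occur only once in the train.

```python
# Hypothetical contents of the word dictionary 9: word -> (part of speech, meaning).
WORD_DICTIONARY = {
    "watakushi": ("pronoun", "person"),
    "fuyu": ("noun", "time"),
    "hokkaido": ("proper noun", "place"),
    "iku": ("verb", "action"),
    "omou": ("verb", "action"),
}

# Hypothetical contents of the case dictionary 11:
# verb -> list of (dependable case, required meaning, case postposition).
CASE_DICTIONARY = {
    "iku": [("subjective", "person", "wa"),
            ("time", "time", "ni"),
            ("place", "place", "e")],
    "omou": [("objective", "action", "to")],
}


def analyze_dependence(words):
    """Return {depending word: (depended word, dependable case)}."""
    result = {}
    stack = []
    for word in words:                          # Steps 51 and 52
        stack.append(word)                      # Step 53
        if WORD_DICTIONARY[word][0] != "verb":  # Step 54
            continue
        used_cases = set()                      # each case at most once per predicate
        for candidate in stack[:-1]:            # Step 55
            meaning = WORD_DICTIONARY[candidate][1]
            for case, required, _postposition in CASE_DICTIONARY[word]:
                if required == meaning and case not in used_cases:
                    result[candidate] = (word, case)
                    used_cases.add(case)
                    break
        # Step 56: clear the resolved entries, keep the verb.
        stack = [w for w in stack[:-1] if w not in result] + [word]
    return result
```

The set used_cases reflects the one-sentence/one-case principle. The
no-cross principle is not checked explicitly in this sketch; it holds here
only because resolved words are removed from the stack before later
predicates are examined.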
FIG. 6 shows the details of the operation to be executed by the
omission/supplement program 6. This program supplements omitted words by
using the omission/supplement rules 12 or the like, while considering the
word pairs having a dependence relationship determined by the dependence
analysis program 4.
At Step 61, the words 81 in the analysis table 8 are sequentially read
starting from the first word.
At Step 62, it is checked whether a pair of depending and depended words,
and the dependence relationship 85 between the words, match the conditions
of any omission/supplement rule 12, so as to find the rule satisfying the
conditions. Next, the supplement method designated by the found rule is
executed and the results are stored in the analysis table 8 at the
supplement result field 86.
At Step 63, it is checked if all the words in the analysis table 8 have
been processed. If not, the procedure returns to Step 61 to repeat the
above operation.
The above procedure will be explained by using a particular example. As the
first word " "="watakushi"="I" is read from the analysis table 8, the
depended word " "="iku"="go" and dependence relationship "
"="shukaku"="subjective case" are also read, and these are compared with
each of the omission/supplement rules 12 to identify the rule 121. The
supplement method 123 is therefore executed so that the case postposition
" "="wa" (Japanese postpositional particle representing the subjective
case) is read from the case dictionary 11 and supplemented to the word "
"="watakushi"="I" after it. The results " "="I am" are stored in the
analysis table 8 at the supplement result field 86. Similarly, the case
postpositions " "="ni"="in" and " "="e"="to" are supplemented to the words
" "="fuyu"="winter" and " "="Hokkaido"="hokkaido", respectively, to obtain
the results " "="fuyu ni"="in the winter" and " "="Hokkaido e"="to
Hokkaido". To the word pair of " "="iku"="go" and " "="omou"="think", the
rule 121 is first applied, and the case postposition " "="to"="of" is
supplemented after the word " "="iku"="go". Next the rule 124 is applied,
and the supplement method 126 is executed so that " "="u" (Japanese
auxiliary verb indicating intention) is supplemented between "
"="iku"="go" and " "="to"="of". The conjugations of this word "
"="iku"="go" are further determined so as to satisfy the conjugative
conditions defined by the supplement method 126. Accordingly, the "
"="mizenkei"="negative form (this is an arbitrary term since "mizenkei"
has various functions according to the Japanese grammar)" 2 of the word "
"="iku"="go" is determined as " "="ko" 131 from the conjugation table 13.
Similarly, the conclusive form of " "="u" is determined as " "="u" 132
from the conjugation table 13. In this manner, the result obtained by the
supplement method 126 becomes " "="ikou to"="of going" which is stored in
the analysis table 8 at the supplement result field 86.
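The conjugation lookup in this example can be sketched in Python as
follows. The endings "ko" and "u" are those read from the conjugation
table 13 in the text. The stems, the form labels "mizenkei_2" and
"conclusive", and the function names are hypothetical.

```python
# Hypothetical excerpt of the conjugation table 13: (word, form) -> ending.
CONJUGATION_TABLE = {
    ("iku", "mizenkei_2"): "ko",   # entry 131 in the text
    ("u", "conclusive"): "u",      # entry 132 in the text
}

# Assumed unchanging stems to which the endings are attached.
STEMS = {"iku": "i", "u": ""}


def conjugate(word, form):
    """Return the word in the requested conjugated form."""
    return STEMS[word] + CONJUGATION_TABLE[(word, form)]


def supplement_intention(verb, postposition):
    """Attach the auxiliary verb 'u' and a case postposition to a verb."""
    return conjugate(verb, "mizenkei_2") + conjugate("u", "conclusive") + " " + postposition
```

Calling supplement_intention("iku", "to") gives "ikou to", the result
stored in the supplement result field 86 in the example.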
If no applicable omission/supplement rule can be found, the word 81 per se
is stored in the supplement result field 86. In this case, even if the
analysis fails in the midst of procedures, the train of words analyzed
until then is output, to help the user obtain some information. A symbol
representing a failure of supplement may be stored in the supplement
result field 86.
At Step 64, the supplement results stored in the analysis table at the
result field 86 are connected together to generate and store a translated
spoken-language sentence 14 which is displayed on CRT 103 as a translated
spoken-language sentence 107. The train of words 81 entered as the sign
words 106 are also displayed on CRT 103. The translated spoken-language
sentence may be output as sounds by using a voice synthesizer.
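The connection of the supplement results at Step 64, together with the
fallback to the word itself when no rule was found, can be sketched in
Python as follows. The row layout with the keys "word" and "supplemented"
and the function name generate_sentence are hypothetical.

```python
def generate_sentence(rows):
    """Join the supplement results of the analysis table into one sentence.
    When a row has no supplement result, the recognized word is used as is."""
    parts = []
    for row in rows:
        supplemented = row.get("supplemented")
        parts.append(supplemented if supplemented else row["word"])
    return " ".join(parts)


# Rows corresponding to the example of the embodiment, in romanized form.
EXAMPLE_ROWS = [
    {"word": "watakushi", "supplemented": "watakushi wa"},
    {"word": "fuyu", "supplemented": "fuyu ni"},
    {"word": "hokkaido", "supplemented": "hokkaido e"},
    {"word": "iku", "supplemented": "ikou to"},
    {"word": "omou", "supplemented": None},
]
```

Applied to EXAMPLE_ROWS, the function returns "watakushi wa fuyu ni
hokkaido e ikou to omou".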
The motions of hands or the like may be displayed on CRT 103 in response to
the translated spoken-language sentence by using computer graphics
techniques. In such a case, the procedures of this embodiment are reversed
by time sequentially synthesizing images while referring to a dictionary
of sign elements each representing a minimum unit of a sign language word,
and to a dictionary of words each constituted by a combination of sign
elements. With synthesized images of a sign language, an auditory-impaired
person can visually confirm his or her sign language. The output may be
any combination of a sentence text, voice representations, and synthesized
images.
According to the present invention, omitted words between the sign language
words can be estimated and supplemented, allowing a sign language to be
translated into a spoken language.
The present invention is not limited only to a Japanese sign language, but
is applicable to various other sign languages. In the case of Japanese
language, omitted words can be supplemented using case frame grammar
models as in the above-described embodiment. For the European and American
languages such as English, the order of words has a significant meaning.
In such a case, omitted words may be supplemented by using syntactic
grammar models.