Claims
What is claimed is:
1. A method for processing speech information in a speech recognition
system, wherein the information is represented by a sequence of frames,
said speech recognition system being capable of comparing a given frame
set to a template, and having template memory to store said template, said
processing method comprising the steps of:
(a) combining contiguous acoustically similar frames of a previous frame
set into representative frames to form a reduced template;
(b) storing said reduced template in template memory; and
(c) comparing frames of said given frame set to said representative frames
of said reduced template according to the number of frames combined in
said representative frames of said reduced template to produce a measure
of similarity between the given frame set and the template.
2. The method of claim 1, wherein combining further includes the steps of
generating a distortion measure for each representative frame and
comparing each said distortion measure to a predetermined distortion
threshold.
3. The method of claim 1, wherein combining further includes the step of
determining a distortion measure corresponding to said reduced template.
4. The method of claim 1, further including the step of determining the
number of frames combined in each representative frame.
5. The method of claim 1, wherein storing further includes the step of
storing the number of frames combined into each representative frame.
6. The method of claim 1, further including the step of determining the
number of representative frames representing said reduced template.
7. The method of claim 1, wherein storing further includes the step of
storing a first element of frame data corresponding to the difference
between a second element of frame data and said first element of frame
data.
8. The method of claim 1, wherein storing further includes the step of
storing an energy measure for each representative frame.
9. The method of claim 1, wherein comparing further includes the step of
accumulating distance measures corresponding to the difference between the
given input word and the reduced template.
10. A method for generating a measure of similarity for speech information
in a speech recognition system, wherein the information is represented by
a sequence of frames, said speech recognition system being capable of
comparing a given input frame set to a template, said method comprising
the steps of:
combining contiguous acoustically similar frames of a previous frame set
into representative frames to form a reduced template;
comparing frames of said given frame set to said representative frames of
said reduced template by accumulating a set of distance measures for each
representative frame, each said set having a total number of accumulated
distance measures corresponding to the number of frames combined in each
said representative frame; and
determining a measure of similarity between said given frame set and said
template based on said accumulated distance measures.
11. The method of claim 10, wherein comparing further includes the step of
defining a maximum number of accumulations for a particular representative
frame corresponding to the number of frames combined into said particular
representative frame.
12. The method of claim 11, wherein comparing further includes the step of
defining a minimum number of accumulations proportional to said maximum
number of accumulations for said particular representative frame.
13. The method of claim 10, wherein comparing further includes the step of
sequentially accumulating similarity measures corresponding to two
representative frames, two said representative frames being separated by
at least one other said representative frame, but without accumulating a
similarity measure from said other representative frame.
14. An arrangement for processing speech information in a speech
recognition system, wherein the information is represented by a sequence
of frames, said speech recognition system being capable of comparing a
given frame set to a template, and having template memory to store said
template, said processing arrangement comprising:
(a) means for combining contiguous acoustically similar frames of a
previous frame set into representative frames to form a reduced template;
(b) means for storing said reduced template in template memory; and
(c) means for comparing frames of said given frame set to said
representative frames of said reduced template according to the number of
frames combined in said representative frames of said reduced template to
produce a measure of similarity between the given frame set and the
template.
15. The arrangement of claim 1, wherein means for combining further
includes means for generating a distortion measure for each representative
frame and comparing each said distortion measure to a predetermined
distortion threshold.
16. The arrangement of claim 1, wherein means for combining further
includes means for determining a distortion measure corresponding to said
reduced template.
17. The arrangement of claim 1, further including means for determining the
number of frames combined in each representative frame.
18. The arrangement of claim 1, wherein means for storing further includes
means for storing the number of frames combined into each representative
frame.
19. The arrangement of claim 1, further including means for determining the
number of representative frames representing said reduced template.
20. The arrangement of claim 1, wherein means for storing further includes
means for storing a first element of frame data corresponding to the
difference between a second element of frame data and said first element
of frame data.
21. The arrangement of claim 1, wherein means for storing further includes
means for storing an energy measure for each representative frame.
22. The arrangement of claim 1, wherein means for comparing further
includes means for accumulating distance measures corresponding to the
difference between the given input word and the reduced template.
23. An arrangement for generating a measure of similarity for speech
information in a speech recognition system, wherein the information is
represented by a sequence of frames, said speech recognition system being
capable of comparing a given input frame set to a template, said
arrangement comprising:
means for combining contiguous acoustically similar frames of a previous
frame set into representative frames to form a reduced template;
means for comparing frames of said given frame set to said representative
frames of said reduced template by accumulating a set of distance measures
for each representative frame, each said set having a total number of
accumulated distance measures corresponding to the number of frames
combined in each said representative frame; and
means for determining a measure of similarity between said given frame set
and said template based on said accumulated distance measures.
24. The arrangement of claim 10, wherein means for comparing further
includes means for defining a maximum number of accumulations for a
particular representative frame corresponding to the number of frames
combined into said particular representative frame.
25. The arrangement of claim 11, wherein means for comparing further
includes means for defining a minimum number of accumulations proportional
to said maximum number of accumulations for said particular representative
frame.
26. The arrangement of claim 10, wherein means for comparing further
includes means for sequentially accumulating similarity measures
corresponding to two representative frames, two said representative frames
being separated by at least one other said representative frame, but
without accumulating a similarity measure from said other representative
frame.
27. The method for processing speech information in a speech recognition
system, wherein the information is represented by a sequence of frames,
said speech recognition system being capable of comparing a given
frame-set to a template using a word model comprising a plurality of
states, and having template memory to store said template, said processing
method comprising the steps of:
(a) combining contiguous acoustically similar frames of a previous
frame-set into representative frames;
(b) forming a word model having a plurality of states, each state
corresponding to one of said representative frames; and
(c) comparing at least a predetermined minimum number of frames of the
given frame-set to a state of the word model according to the number of
frames combined in said representative frames to produce a measure of
similarity between the given frame-set and the template.
28. The method for processing speech information in a speech recognition
system, wherein the information is represented by a sequence of frames,
said speech recognition system being capable of comparing a given
frame-set to a template using a word model comprising a plurality of
states, and having template memory to store said template, said processing
method comprising the steps of:
(a) combining contiguous acoustically similar frames of a previous
frame-set into representative frames;
(b) forming a word model having a plurality of states, each state
corresponding to one of said representative frames; and
(c) comparing not more than a predetermined maximum number of frames of the
given frame-set to a state of the word model according to the number of
frames combined in said representative frames to produce a measure of
similarity between the given frame-set and the template.
29. The method for processing speech information in a speech recognition
system, wherein the information is represented by a sequence of frames,
said speech recognition system being capable of comparing a given
frame-set to a template using a word model comprising a plurality of
states, and having template memory to store said template, said processing
method comprising the steps of:
(a) combining contiguous acoustically similar frames of a previous
frame-set into representative frames;
(b) forming a word model having a plurality of states, each state
corresponding to one of said representative frames; and
(c) comparing at least a predetermined minimum number of frames, but not
more than a predetermined maximum number of frames, of the given frame-set
to a state of the word model according to the number of frames combined in
said representative frames to produce a measure of similarity between the
given frame-set and the template.
Description
BACKGROUND
The present invention relates to word recognition for a speech recognition
system and, more particularly, to word recognition using word templates
having a data reduced format.
Typically, speech recognition systems represent spoken words as word
templates stored in system memory. When a system user speaks into the
system, the system must digitally represent the speech for comparison to
the word templates stored in memory.
Two particular aspects of such an implementation have received a great deal
of attention. The first aspect pertains to the amount of memory which is
required to store the word templates. The representation of speech is such
that the data used for matching to an input word typically requires a
significant amount of memory to be dedicated for each particular word.
Moreover, a large vocabulary causes extensive computation time to be
consumed for the match. In general, the computation time increases
linearly with the amount of memory required for the template memory. Practical
implementation in real time requires that this computation time be
reduced. Of course, a faster processor architecture could be employed to
reduce this computation time, but due to cost considerations, it is
preferred that the amount of data representing the word templates be
reduced instead, thereby reducing the computation.
The second aspect pertains to the particular matching techniques used in
the system. Most word recognition techniques have been directed to the
accuracy of the recognition process for a particular type of feature data
used to represent the speech. Typically, channel bank information or LPC
parameters represent the speech. When using feature data of a reduced
format, the word recognition process must be sensitive to the format for
an effective implementation.
The speech recognition system described herein clusters frames within the
word templates to reduce the amount of representative data. A word
recognition technique applied to such templates must therefore give special
consideration to the combined frames. Data reduced word templates
represent spoken words in a compacted
form. Matching an incoming word to a reduced word template without
adequately compensating for its compacted form will result in degraded
recognizer performance. An obvious method for compensating for data
reduced word templates would be uncompacting the reduced data before
matching. Unfortunately, uncompacting the reduced data defeats the purpose
of data reduction. Hence, a word recognition method is needed which allows
reduced data to be directly matched against an incoming spoken word
without degrading the word recognition process.
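As an illustration only, the idea of clustering contiguous frames can be
sketched in Python. The sketch below is a simple greedy merge and is not
the clustering procedure of the preferred embodiment, which is described in
conjunction with FIGS. 5a through 5i. The function names, the squared
Euclidean distance, and the threshold value are assumptions made for the
example.

```python
from typing import List, Tuple

Frame = List[float]


def mean_frame(frames: List[Frame]) -> Frame:
    """Channel-wise average of a non-empty list of equal-length frames."""
    n = len(frames)
    return [sum(channel) / n for channel in zip(*frames)]


def distortion(frames: List[Frame]) -> float:
    """Largest squared Euclidean distance from any frame to the cluster mean."""
    centre = mean_frame(frames)
    return max(sum((a - b) ** 2 for a, b in zip(f, centre)) for f in frames)


def cluster_contiguous(frames: List[Frame],
                       threshold: float) -> List[Tuple[Frame, int]]:
    """Greedily merge neighbouring frames while distortion stays in bounds.

    Returns (representative_frame, frames_combined) pairs in time order.
    This greedy rule is an assumption for illustration only.
    """
    reduced: List[Tuple[Frame, int]] = []
    current: List[Frame] = []
    for frame in frames:
        candidate = current + [frame]
        if current and distortion(candidate) > threshold:
            # Close the running cluster and start a new one at this frame.
            reduced.append((mean_frame(current), len(current)))
            current = [frame]
        else:
            current = candidate
    if current:
        reduced.append((mean_frame(current), len(current)))
    return reduced
```

Each representative frame is returned together with the number of frames it
replaces, since that count is what a matcher would need in order to
compensate for the compacted form.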
OBJECTS AND SUMMARY OF THE PRESENT INVENTION
Accordingly, an object of the present invention is to provide a system of
word recognition which reduces template data and recognizes the reduced
data in an efficient manner.
The present invention teaches an arrangement and method for processing
speech information in a speech recognition system. In a system where the
information is depicted as words, each word represented by a sequence of
frames and where the recognition system has means for comparing present
input speech to a word template, the word template stored in template
memory and derived from one or more previous input words, the processing
method includes (1) combining contiguous acoustically similar frames
derived from the previous input word into representative frames to
form a corresponding reduced word template, (2) storing the reduced word
template in template memory in an efficient manner, and (3) comparing
frames of the present input speech to the representative frames of the
reduced word template according to the number of frames combined in the
representative frames of the reduced word template to produce a measure of
similarity between the present input speech and the word template.
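One way to picture step (3) is a small dynamic-programming matcher in which
each representative frame acts as a state that may absorb a bounded number
of input frames, the bounds depending on how many frames were combined into
it. The following Python sketch is an assumption-laden illustration, not
the decoder of the preferred embodiment: the names match_reduced and
frame_distance, the squared Euclidean distance, and the particular bounds
(at most twice the frame count, at least half of it) are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_reduced(input_frames: List[Frame],
                  template: List[Tuple[Frame, int]]) -> float:
    """Accumulated distance between an input word and a reduced template.

    template holds (representative_frame, frames_combined) pairs. Lower
    scores mean greater similarity; math.inf means no alignment satisfies
    the dwell bounds.
    """
    n_in = len(input_frames)
    # best[t]: lowest accumulated distance after the states handled so far
    # have together absorbed exactly the first t input frames.
    best = [0.0] + [math.inf] * n_in
    for rep, count in template:
        max_dwell = 2 * count           # assumed upper bound
        min_dwell = max(1, count // 2)  # assumed lower bound
        nxt = [math.inf] * (n_in + 1)
        for end in range(1, n_in + 1):
            cost = 0.0
            # Extend this state backwards one input frame at a time.
            for dwell in range(1, min(max_dwell, end) + 1):
                start = end - dwell
                cost += frame_distance(input_frames[start], rep)
                if dwell >= min_dwell and best[start] + cost < nxt[end]:
                    nxt[end] = best[start] + cost
        best = nxt
    return best[n_in]
```

Because the reduced template is matched directly, the representative
frames never have to be expanded back into their original frames for
recognition.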
BRIEF DESCRIPTION OF THE DRAWINGS
Additional objects, features, and advantages in accordance with the present
invention will be more clearly understood by reference to the following
description taken in connection with the accompanying drawings, in the
several figures of which like reference numerals identify like elements,
and in which:
FIG. 1 is a general block diagram illustrating the technique of
synthesizing speech from speech recognition templates according to the
present invention;
FIG. 2 is a block diagram of a speech communications device having a
user-interactive control system employing speech recognition and speech
synthesis in accordance with the present invention;
FIG. 3 is a detailed block diagram of the preferred embodiment of the
present invention illustrating a radio transceiver having a hands-free
speech recognition/speech synthesis control system;
FIG. 4a is an expanded block diagram of the data reducer block 322 of FIG.
3;
FIG. 4b is a flowchart showing the sequence of steps performed by the
energy normalization block 410 of FIG. 4a;
FIG. 4c is a detailed block diagram of the particular hardware
configuration of the segmentation/compression block 420 of FIG. 4a;
FIG. 5a is a graphical representation of a spoken word segmented into
frames for forming a cluster according to the present invention;
FIG. 5b is a diagram exemplifying output clusters being formed for a
particular word template, according to the present invention;
FIG. 5c is a table showing the possible formations of an arbitrary partial
cluster path according to the present invention;
FIGS. 5d and 5e show a flowchart illustrating a basic implementation of the
data reduction process performed by the segmentation/compression block 420
of FIG. 4a;
FIG. 5f is a detailed flowchart of the traceback and output clusters block
582 of FIG. 5e, showing the formation of a data reduced word template from
previously determined clusters;
FIG. 5g is a traceback pointer table illustrating a clustering path for 24
frames, according to the present invention, applicable to partial
traceback;
FIG. 5h is a graphical representation of the traceback pointer table of
FIG. 5g illustrated in the form of a frame connection tree;
FIG. 5i is a graphical representation of FIG. 5h showing the frame
connection tree after three clusters have been output by tracing back to
common frames in the tree;
FIGS. 6a and 6b comprise a flowchart showing the sequence of steps
performed by the differential encoding block 430 of FIG. 4a;
FIG. 6c is a generalized memory map showing the particular data format of
one frame of the template memory 160 of FIG. 3;
FIG. 7a is a graphical representation of frames clustered into average
frames, each average frame represented by a state in a word model, in
accordance with the present invention;
FIG. 7b is a detailed block diagram of the recognition processor 120 of FIG.
3, illustrating its relationship with the template memory 160;
FIG. 7c is a flowchart illustrating one embodiment of the sequence of steps
required for word decoding according to the present invention;
FIGS. 7d and 7e comprise a flowchart illustrating one embodiment of the
steps required for state decoding according to the present invention;
FIG. 8a is a detailed block diagram of the data expander block 346 of FIG.
3;
FIG. 8b is a flowchart showing the sequence of steps performed by the
differential decoding block 802 of FIG. 8a;
FIG. 8c is a flowchart showing the sequence of steps performed by the
energy denormalization block 804 of FIG. 8a;
FIG. 8d is a flowchart showing the sequence of steps performed by the frame
repeating block 806 of FIG. 8a;
FIG. 9a is a detailed block diagram of the channel bank speech synthesizer
340 of FIG. 3;
FIG. 9b is an alternate embodiment of the modulator/bandpass filter
configuration 980 of FIG. 9a;
FIG. 9c is a detailed block diagram of the preferred embodiment of the
pitch pulse source 920 of FIG. 9a;
FIG. 9d is a graphic representation illustrating various waveforms of FIGS.
9a and 9c.
DESCRIPTION OF THE PREFERRED EMBODIMENT
1. System Configuration
Referring now to the accompanying drawings, FIG. 1 shows a general block
diagram of user-interactive control system 100 of the present invention.
Electronic device 150 may include any electronic apparatus that is
sophisticated enough to warrant the incorporation of a speech
recognition/speech synthesis control system. In the preferred embodiment,
electronic device 150 represents a speech communications device such as a
mobile radiotelephone.
User-spoken input speech is applied to microphone 105, which acts as an
acoustic coupler providing an electrical input speech signal for the
control system. Acoustic processor 110 performs acoustic feature
extraction upon the input speech signal. Word features, defined as the
amplitude/frequency parameters of each user-spoken input word, are thereby
provided to speech recognition processor 120 and to training processor
170. Acoustic processor 110 may also include a signal conditioner, such as
an analog-to-digital converter, to interface the input speech signal to
the speech recognition control system. Acoustic processor 110 will be
further described in conjunction with FIG. 3.
Training processor 170 manipulates this word feature information from
acoustic processor 110 to provide word recognition templates to be stored
in template memory 160. During the training procedure, the incoming word
features are arranged into individual words by locating their endpoints.
If the training procedure is designed to accommodate multiple training
utterances for word feature consistency, then the multiple utterances may
be averaged to form a single word template. Furthermore, since most speech
recognition systems do not require all of the speech information to be
stored as a template, some type of data reduction is often performed by
training processor 170 to reduce the template memory requirements. The
word templates are stored in template memory 160 for use by speech
recognition processor 120 as well as by speech synthesis processor 140.
The exact training procedure utilized by the preferred embodiment of the
present invention may be found in the description accompanying FIG. 2.
In the recognition mode, speech recognition processor 120 compares the word
feature information provided by acoustic processor 110 to the word
recognition templates provided by template memory 160. If the acoustic
features of the present word feature information derived from the
user-spoken input speech sufficiently match the acoustic features of a
particular prestored word template derived from the template memory, then
recognition processor 120 provides device control data to device
controller 130 indicative of the particular word recognized. A further
discussion of an appropriate speech recognition apparatus, and of how the
preferred embodiment incorporates data reduction into the training process,
may be found in the description accompanying FIGS. 3 through 5.
Device controller 130 interfaces the entire control system to electronic
device 150. Device controller 130 translates the device control data
provided by recognition processor 120 into control signals adaptable for
use by the particular electronic device. These control signals direct the
device to perform specific operating functions as instructed by the user.
(Device controller 130 may also perform additional supervisory functions
related to other elements shown in FIG. 1.) An example of a device
controller known in the art and suitable for use with the present
invention is a microcomputer. Refer to FIG. 3 for further details of the
hardware implementation.
Device controller 130 also provides device status data representing the
operating status of electronic device 150. This data is applied to speech
synthesis processor 140, along with word recognition templates from
template memory 160. Synthesis processor 140 utilizes the status data to
determine which word recognition template is to be synthesized into
user-recognizable reply speech. Synthesis processor 140 may also include
an internal reply memory, also controlled by the status data, to provide
"canned" reply words to the user. In either case, the user is informed of
the electronic device operating status when the speech reply signal is
output via speaker 145.
Thus, FIG. 1 illustrates how the present invention provides a
user-interactive control system utilizing speech recognition to control
the operating parameters of an electronic device, and how a speech
recognition template may be utilized to generate reply speech to the user
indicative of the operating status of the device.
FIG. 2 illustrates in more detail the application of the user-interactive
control system to a speech communications device comprising a part of any
radio or landline voice communications system, such as, for example, a
two-way radio system, a telephone system, an intercom system, etc.
Acoustic processor 110, recognition processor 120, template memory 160,
and device controller 130 are the same in structure and in operation as
the corresponding blocks of FIG. 1. However, control system 200
illustrates the internal structure of speech communications device 210.
Speech communications terminal 225 represents the main electronic network
of device 210, such as, for example, a telephone terminal or a
communications console. In this embodiment, microphone 205 and speaker 245
are incorporated into the speech communications device itself. A typical
example of this microphone/speaker arrangement would be a telephone
handset. Speech communications terminal 225 interfaces operating status
information of the speech communications device to device controller 130.
This operating status information may comprise functional status data of
the terminal itself (e.g., channel data, service information, operating
mode messages, etc.), user-feedback information of the speech recognition
control system (e.g., directory contents, word recognition verification,
operating mode status, etc.), or may include system status data pertaining
to the communications link (e.g., loss-of-line, system busy, invalid
access code, etc.).
In either the training mode or the recognition mode, the features of
user-spoken input speech are extracted by acoustic processor 110. In the
training mode, which is represented in FIG. 2 by a position "A" of switch
215, the word feature information is applied to word averager 220 of
training processor 170. As previously mentioned, if the system is designed
to average multiple utterances together to form a single word template,
the averaging is performed by word averager 220. Through the use of word
averaging, the training processor can take into account the minor
variances between two or more utterances of the same word, thereby
producing a more reliable word template. Numerous word averaging
techniques may be used. For example, one method would be to combine only
the similar word features of all training utterances to produce a "best"
set of features for the word template. Another technique may be to simply
compare all training utterances to determine which one provides the "best"
template. Still another word averaging technique is described by L. R.
Rabiner and J. G. Wilpon in "A Simplified Robust Training Procedure for
Speaker Trained, Isolated Word Recognition Systems", Journal of the
Acoustical Society of America, vol. 68 (November 1980), pp. 1271-76.
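For illustration, a very simple form of word averaging can be sketched in
Python. The sketch assumes that each utterance is a list of feature frames,
brings all utterances to a common length by picking evenly spaced frames,
and then averages frame by frame. It is a hypothetical example and is not
the technique of the cited reference; the names resample and
average_utterances are introduced only for this sketch.

```python
from typing import List

Frame = List[float]


def resample(utterance: List[Frame], length: int) -> List[Frame]:
    """Pick `length` frames spread evenly over the utterance (nearest frame)."""
    last = len(utterance) - 1
    if length == 1:
        return [utterance[0]]
    return [utterance[round(i * last / (length - 1))] for i in range(length)]


def average_utterances(utterances: List[List[Frame]]) -> List[Frame]:
    """Average several utterances of one word into a single template.

    All utterances are first shortened to the length of the shortest one
    (an assumed, crude form of time alignment), then averaged per channel.
    """
    length = min(len(u) for u in utterances)
    aligned = [resample(u, length) for u in utterances]
    return [
        [sum(values) / len(values) for values in zip(*frames_at_t)]
        for frames_at_t in zip(*aligned)
    ]
```

A practical system would likely align the utterances more carefully before
averaging; the even-spacing rule here only keeps the example short.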
Data reducer 230 then performs data reduction upon either the averaged word
data from word averager 220 or upon the word feature signals directly from
acoustic processor 110, depending upon the presence or absence of a word
averager. In either case, the reduction process consists of segmenting
this "raw" word feature data and combining the data in each segment. The
storage requirements for the template are then further reduced by
differential encoding of the segmented data to produce "reduced" word
feature data. This specific data reduction technique of the present
invention is fully described in conjunction with FIGS. 4 and 5. To
summarize, data reducer 230 compresses the raw word data to minimize the
template storage requirements and to reduce the speech recognition
computation time.
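The general idea of differential encoding can be sketched as follows. The
Python example assumes that a frame is a list of integer channel values and
stores the first value followed by the differences between neighbouring
channels; the actual storage format of the preferred embodiment is the one
described with FIGS. 6a through 6c, so the layout and the function names
here are assumptions for illustration.

```python
from typing import List


def differential_encode(channels: List[int]) -> List[int]:
    """Keep the first channel value, then store channel-to-channel deltas."""
    if not channels:
        return []
    return [channels[0]] + [b - a for a, b in zip(channels, channels[1:])]


def differential_decode(encoded: List[int]) -> List[int]:
    """Rebuild the channel values by accumulating the stored deltas."""
    decoded: List[int] = []
    total = 0
    for delta in encoded:
        total += delta
        decoded.append(total)
    return decoded
```

When neighbouring channel values are close to one another, the deltas span
a smaller range than the raw values and can be stored in fewer bits, which
is the source of the storage saving.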
The reduced word feature data provided by training processor 170 is stored
as word recognition templates in template memory 160. In the recognition
mode, which is illustrated by position "B" of switch 215, recognition
processor 120 compares the incoming word feature signals to the word
recognition templates. Upon recognition of a valid command word,
recognition processor 120 may instruct device controller 130 to cause a
corresponding speech communications device control function to be executed
by speech communications terminal 225. Terminal 225 may respond to device
controller 130 by sending operating status information back to controller
130 in the form of terminal status data. This data can be used by the
control system to synthesize the appropriate speech reply signal to inform
the user of the present device operating status. This sequence of events
will be more clearly understood by referring to the subsequent example.
Synthesis processor 140 is comprised of speech synthesizer 240, data
expander 250, and reply memory 260. A synthesis processor of this
configuration is capable of generating "canned" replies to the user from a
prestored vocabulary (stored in reply memory 260), as well as generating
"template" responses from a user-generated vocabulary (stored in template
memory 160). Speech synthesizer 240 and reply memory 260 are further
described in conjunction with FIG. 3, and data expander 250 is fully
described in the text accompanying FIG. 8a. In combination, the blocks of
synthesis processor 140 generate a speech reply signal to speaker 245.
Accordingly, FIG. 2 illustrates the technique of using a single template
memory for both speech recognition and speech synthesis.
The simplified example of a "smart" telephone terminal employing
voice-controlled dialing from a stored telephone number directory is now
used to describe the operation of the control system of FIG. 2. Initially,
an untrained speaker-dependent speech recognition system cannot recognize
command words. Therefore, the user must manually prompt the device to
begin the training procedure, perhaps by entering a particular code into
the telephone keypad. Device controller 130 then directs switch 215 to
enter the training mode (position "A"). Device controller 130 then
instructs speech synthesizer 240 to respond with the predefined phrase
TRAINING VOCABULARY ONE, which is a "canned" response obtained from reply
memory 260. The user then begins to build a command word vocabulary by
uttering command words, such as STORE or RECALL, into microphone 205. The
features of the utterance are first extracted by acoustic processor 110,
and then applied to either word averager 220 or data reducer 230. If the
particular speech recognition system is designed to accept multiple
utterances of the same word, word averager 220 produces a set of averaged
word features representing the best representation of that particular
word. If the system does not have word averaging capabilities, the single
utterance word features (rather than the multiple utterance averaged word
features) are applied to data reducer 230. The data reduction process
removes unnecessary or duplicate feature data, compresses the remaining
data, and provides template memory 160 with "reduced" word recognition
templates. A similar procedure is followed for training the system to
recognize digits.
Once the system is trained with the command word vocabulary, the user must
continue the training procedure by entering telephone directory names and
numbers. To accomplish this task, the user utters the previously-trained
command word ENTER. Upon recognition of this utterance as a valid user
command, device controller 130 instructs speech synthesizer 240 to reply
with the "canned" phrase DIGITS PLEASE? stored in reply memory 260. Upon
entering the appropriate telephone number digits (e.g., 555-1234), the
user says TERMINATE and the system replies NAME PLEASE? to prompt
user-entry of the corresponding directory name (e.g., SMITH). This
user-interactive process continues until the telephone number directory is
completely filled with the appropriate telephone names and digits.
To place a phone call, the user simply utters the command word RECALL. When
the utterance is recognized as a valid user command by recognition
processor 120, device controller 130 directs speech synthesizer 240 to
generate the verbal reply NAME? via synthesizing information provided by
reply memory 260. The user then responds by speaking the name in the
directory index corresponding to the telephone number that he desires to
dial (e.g. JONES). The word will be recognized as a valid directory entry
if it corresponds to a predetermined name index stored in template memory
160. If valid, device controller 130 directs data expander 250 to obtain
the appropriate reduced word recognition template from template memory 160
and perform the data expansion process for synthesis. Data expander 250
"unpacks" the reduced word feature data and restores the proper energy
contour for an intelligible reply word. The expanded word template data is
then fed to speech synthesizer 240. Using both the template data and the
reply memory data, speech synthesizer 240 generates the phrase JONES . . .
(from template memory 160 through data expander 250) . . . FIVE-FIVE-FIVE,
SIX-SEVEN-EIGHT-NINE (from reply memory 260).
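The unpacking step can be pictured with a short Python sketch. It assumes
that a reduced template is a list of (representative frame, frame count)
pairs and simply repeats each representative frame as many times as the
number of frames it replaced. Differential decoding and restoration of the
energy contour are omitted, and the name expand_template is hypothetical.

```python
from typing import List, Sequence, Tuple


def expand_template(
        reduced: List[Tuple[Sequence[float], int]]) -> List[List[float]]:
    """Repeat each representative frame `count` times, in time order."""
    expanded: List[List[float]] = []
    for rep, count in reduced:
        # Copy the frame each time so later edits do not alias one another.
        expanded.extend(list(rep) for _ in range(count))
    return expanded
```

The expanded sequence has the same number of frames as the word had before
data reduction, which preserves the original timing of the word for
synthesis.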
The user then says the command word SEND which, when recognized by the
control system, instructs device controller 130 to send telephone number
dialing information to speech communications terminal 225. Terminal 225
outputs this dialing information via an appropriate communications link.
When the telephone connection is made, speech communications terminal 225
interfaces microphone audio from microphone 205 to the appropriate
transmit path, and receive audio from the appropriate receive audio path
to speaker 245. If a proper telephone connection cannot be made, terminal
225 provides the appropriate communications link status
information to device controller 130. Accordingly, device controller 130
instructs speech synthesizer 240 to generate the appropriate reply word
corresponding to the status information provided, such as the reply word
SYSTEM BUSY. In this manner, the user is informed of the communications
link status, and user-interactive voice-controlled directory dialing is
achieved.
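The RECALL, name, SEND sequence described above can be summarized as a
small state machine. The Python sketch below is a simplified illustration
under stated assumptions: recognized words arrive as plain strings, the
directory maps a name to its digits, and the class name VoiceDialer, its
state names, and the dialed list (standing in for dialing information sent
to the terminal) are hypothetical.

```python
from typing import Dict, List, Optional


class VoiceDialer:
    """Toy model of the directory-dialing dialogue (illustrative only)."""

    def __init__(self, directory: Dict[str, str]) -> None:
        self.directory = dict(directory)   # name -> telephone digits
        self.state = "idle"
        self.selected: Optional[str] = None
        self.dialed: List[str] = []        # numbers passed on for dialing

    def hear(self, word: str) -> str:
        """Handle one recognized word and return the spoken reply, if any."""
        if self.state == "idle" and word == "RECALL":
            self.state = "await_name"
            return "NAME?"
        if self.state == "await_name" and word in self.directory:
            self.selected = word
            self.state = "await_send"
            # Name is echoed back, followed by the stored digits.
            return f"{word} ... {self.directory[word]}"
        if self.state == "await_send" and word == "SEND":
            self.dialed.append(self.directory[self.selected])
            self.state = "idle"
            return ""
        return ""  # unrecognized or out-of-sequence word: no reply
```

For example, feeding the words RECALL, JONES, and SEND in turn yields the
prompt NAME?, then the name followed by its stored digits, and finally
places the digits in the dialed list.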
The above operational description is merely one application of synthesizing
speech from speech recognition templates according to the present
invention. Numerous other applications of this novel technique to a speech
communications device are contemplated, such as, for example, a
communications console, a two-way radio, etc. In the preferred embodiment,
the control system of the present invention is used with a mobile
radiotelephone.
Although speech recognition and speech synthesis allow a vehicle operator
to keep both eyes on the road, the conventional handset or hand-held
microphone prohibits him from keeping both hands on the steering wheel or
from executing proper manual (or automatic) transmission shifting. For
this reason, the control system of the preferred embodiment incorporates a
speakerphone to provide hands-free control of the speech communications
device. The speakerphone performs the transmit/receive audio switching
function, as well as the received/reply audio multiplexing function.
Referring now to FIG. 3, control system 300 utilizes the same acoustic
processor block 110, training processor block 170, recognition processor
block 120, template memory block 160, device controller block 130, and
synthesis processor block 140 as the corresponding blocks of FIG. 2.
However, microphone 302 and speaker 375 are not an integral part of the
speech communications terminal. Instead, input speech signal from
microphone 302 is directed to radiotelephone 350