| United States Patent | 6381568 |
| Link to this page | http://www.wikipatents.com/6381568.html |
| Inventor(s) | Supplee; Lynn Michele (Crownsville, MD);
Dean; Richard A. (Columbia, MD);
Kohler; Mary A (Columbia, MD) |
| Abstract | Speech transmission method by initializing silence, transmit, and
blank-period counters; receiving frame; determining frame is speech; if
transmit counter is zero and blank-period counter is less than x then
discard frame, increment blank-period counter, and return to second step;
if transmit counter is zero, blank-period counter greater than x-1, and
frame not speech then discard frame, increment blank-period counter, and
return to second step; if transmit counter is zero, blank-period counter
greater than x-1, and frame is speech then set transmit counter to one,
set blank-period counter to zero, set silence counter to zero, encode
frame, transmit encoded frame, and return to second step; if transmit
counter is one, frame not speech, and silence counter less than y then
encode frame, transmit encoded frame, increment silence counter, and
return to second step; if transmit counter is one, frame not speech, and
silence counter greater than y+z-2 then set transmit counter to zero,
discard frame, encode comfort noise, transmit encoded comfort noise,
increment silence counter, and return to second step; if transmit counter
is one, frame not speech, and silence counter greater than y-1 then
discard frame, encode comfort noise, transmit encoded comfort noise,
increment silence counter, and return to second step; and if transmit
counter is one, frame is speech, and silence counter less than y+z then
encode frame, transmit encoded frame, set silence counter to zero, and
return to second step. |
Title Information  |
Drawing from US Patent 6381568 |
Method of transmitting speech using discontinuous transmission and comfort
noise |
| Publication Date |
April 30, 2002 |
Claims  |
What is claimed is:
1. A method of transmitting speech, comprising the steps of:
a) setting a silence counter to zero;
b) setting a transmit counter to one;
c) setting a blank period counter to zero;
d) receiving a frame of digitized information;
e) determining if the frame contains speech;
f) if the transmit counter is equal to zero and the blank period counter is
less than x, where x is a positive integer, then discarding the frame,
incrementing the blank period counter by one, and returning to step (d);
g) if the transmit counter is equal to zero, the blank period counter is
greater than x-1 and the frame does not contain speech then discarding the
frame, incrementing the blank period counter by one, and returning to step
(d);
h) if the transmit counter is equal to zero, the blank period counter is
greater than x-1, and the frame contains speech then setting the transmit
counter to one, setting the blank period counter equal to zero, setting
the silence counter equal to zero, encoding the frame, transmitting the
encoded frame, and returning to step (d);
i) if the transmit counter is equal to one, the frame does not contain
speech, and the silence counter is less than y then encoding the frame,
transmitting the encoded frame, incrementing the silence counter by one,
and returning to step (d);
j) if the transmit counter is equal to one, the frame does not contain
speech, and the silence counter is greater than y+z-2, where y and z are
both positive integers, then setting the transmit counter to zero,
discarding the frame, encoding a frame containing comfort noise,
transmitting the encoded frame containing comfort noise, incrementing the
silence counter by one, and returning to step (d);
k) if the transmit counter is equal to one, the frame does not contain
speech, and the silence counter is greater than y-1 then discarding the
frame, encoding a frame containing comfort noise, transmitting the encoded
frame containing comfort noise, incrementing the silence counter by one,
and returning to step (d); and
l) if the transmit counter is equal to one, the frame contains speech, and
the silence counter is less than y+z then encoding the frame, transmitting
the encoded frame, setting the silence counter to zero, and returning to
step (d).
2. The method of claim 1, wherein the step of discarding the frame,
incrementing the blank period counter by one, and returning to step (d) if
the transmit counter is equal to zero and the blank period counter is less
than x is comprised of the step of discarding the frame, incrementing the
blank period counter by one, and returning to step (d) if the transmit
counter is equal to zero and the blank period counter is less than 2.
3. The method of claim 1, wherein said step of setting the transmit counter
to zero, discarding the frame, encoding a frame containing comfort noise,
transmitting the encoded frame containing comfort noise, incrementing the
silence counter by one, and returning to step (d) if the transmit counter
is equal to one, the frame does not contain speech, and the silence
counter is greater than y+z+2 is comprised of the step of setting the
transmit counter to zero, discarding the frame, encoding a frame
containing comfort noise, transmitting the encoded frame containing
comfort noise, incrementing the silence counter by one, and returning to
step (d) if the transmit counter is equal to one, the frame does not
contain speech, and the silence counter is greater than y+z+2, where y
equals 3 and z equals 2.
4. The method of claim 1, wherein said step of determining if the frame
contains speech is comprised of the steps of:
a) calculating an energy of the frame as
E = (A^H × A) / FrameSize
where A is a vector of the frame, where A.sup.H is a complex conjugate
transpose of A, and where FrameSize is a number of samples in the frame;
b) setting a minimum energy threshold;
c) setting a maximum energy threshold;
d) setting a speech threshold as
T = (0.07 × maximum energy threshold) + (K × minimum energy threshold),
where K is a user-definable value;
e) comparing E to T;
f) if E is less than T then concluding that no speech is contained within
the frame, otherwise concluding that speech is contained within the
frame; and
g) increasing the minimum energy threshold by a first user-definable
percentage.
5. The method of claim 4, wherein the step of increasing the minimum energy
threshold by a first user-definable percentage is comprised of the step of
increasing the minimum energy threshold by one percent.
6. The method of claim 5, further including the steps of:
a) if E is less than the minimum energy threshold then setting the first
user-definable percentage to what the first user-definable percentage was
set to initially; and
b) if E is greater than the minimum energy threshold then increasing the
first user-definable percentage by a second user-definable percentage.
7. The method of claim 6, wherein the step of if E is greater than the
minimum energy threshold then increasing the user-definable percentage by
a second user-definable percentage is comprised of the step of if E is
greater than the minimum energy threshold then increasing the first
user-definable percentage by one-hundredth of a percent.
8. The method of claim 4, further including the step of decreasing the
maximum energy threshold by a third user-definable percentage.
9. The method of claim 8, wherein the step of decreasing the maximum energy
threshold by a third user-definable percentage is comprised of the step of
decreasing the maximum energy threshold by one percent.
10. The method of claim 9, further including the steps of:
a) if E is greater than the maximum energy threshold then setting the third
user-definable percentage to what the third user-definable percentage was
set to initially; and
b) if E is less than the maximum energy threshold then decreasing the third
user-definable percentage by a fourth user-definable percentage.
11. The method of claim 10, wherein the step of if E is less than the
maximum energy threshold then decreasing the user-definable percentage by
a fourth user-definable percentage is comprised of the step of if E is
less than the maximum energy threshold then decreasing the third
user-definable percentage by one-hundredth of a percent.
12. The method of claim 1, wherein the step of encoding the frame in steps
(h), (i), (j), (k), and (l) are each comprised of the step of encoding the
frame in Mixed Excitation Linear Prediction (MELP) format. |
Description  |
FIELD OF THE INVENTION
The present invention relates, in general, to data processing and, in
particular, to speech signal processing.
BACKGROUND OF THE INVENTION
Systems for transmitting speech to a receiver often digitize the speech,
divide the digitized speech into frames, encode each frame using a
particular voice encoder, or vocoder algorithm, and transmit the frames to
a receiver.
Some of the problems encountered by these systems include unnecessary
complexity, recognizing background noise as speech when no speech is
present, transmitting too many frames that do not contain speech, sending
frames encoded using a format other than the chosen vocoder, and so on.
Some speech transmission systems are unnecessarily complex. Such systems
tend to be more expensive than simpler systems because of the additional
software required to perform a complex function. Also, a complex system
may be too slow for a particular purpose because of the additional time
required to complete a complex function.
Some speech systems set thresholds for background noise that are based on a
theoretical model of noise. Such systems are susceptible to erroneous
determinations that speech is present in a frame when it is not because of
unanticipated changes in the actual background noise from transmission to
transmission. Also, some systems do not adjust the background noise
thresholds once set or do not adjust the thresholds often enough to keep
pace with a rapidly changing noise background. These same points apply to
how systems set the threshold for determining whether or not speech is
present within a frame.
Speech transmission systems that send too many frames that do not contain
speech waste bandwidth that could have been used to transmit frames that
do contain speech and run the risk that the receiver will mistakenly
conclude that the transmission is over for lack of any voice activity.
Some speech transmission systems send additional frames (e.g., comfort
noise) that are not encoded using the chosen vocoder but are sent using
special frames. Using special frames adds complexity to the receiver
because the receiver must be able to recognize these special frames. Also,
special frames may cause bothersome noise in the receiver since the
special frames were not encoded using the chosen vocoder algorithm.
U.S. Pat. No. 3,832,491, entitled "DIGITAL VOICE SWITCH WITH AN ADAPTIVE
DIGITALLY-CONTROLLED THRESHOLD," discloses a voice switch in which the
threshold for determining the presence of speech is adjusted only after a
theoretically optimum threshold is exceeded 1,220 times and in which a
minimum speech threshold is adjusted based on noise. U.S. Pat. No. 3,832,491
does not perform the steps of the present invention and does not adjust
the speech threshold in the same manner, or as often, as does the present
invention. U.S. Pat. No. 3,832,491 is hereby incorporated by reference
into the specification of the present invention.
U.S. Pat. No. 4,008,375, entitled "DIGITAL VOICE SWITCH FOR SINGLE OR
MULTIPLE CHANNEL APPLICATIONS," discloses a voice switch that adjusts the
threshold for determining the presence of speech based on a statistical
analysis of whether or not the number of times the speech threshold is
exceeded is uniform or non-uniform. U.S. Pat. No. 4,008,375 does not
perform the steps of the present invention and does not adjust the speech
threshold as often as does the present invention. U.S. Pat. No. 4,008,375
is hereby incorporated by reference into the specification of the present
invention.
U.S. Pat. No. 5,612,955, entitled "MOBILE RADIO WITH TRANSMIT COMMAND
CONTROL AND MOBILE RADIO SYSTEM"; U.S. Pat. No. 5,812,965, entitled
"PROCESS AND DEVICE FOR CREATING COMFORT NOISE IN A DIGITAL SPEECH
TRANSMISSION"; and U.S. Pat. No. 5,835,889, entitled "METHOD AND APPARATUS
FOR DETECTING HANGOVER PERIODS IN A TDMA WIRELESS COMMUNICATION SYSTEM
USING DISCONTINUOUS TRANSMISSION" each transmit a special silence
descriptor (SID) frame when silence is encountered and the transmission of
speech is discontinued. This special frame may cause bothersome noise at
the receiver whereas the method of the present invention does not. U.S.
Pat. Nos. 5,612,955; 5,812,965; and 5,835,889 are hereby incorporated by
reference into the specification of the present invention.
U.S. Pat. No. 4,351,983, entitled "SPEECH DETECTOR WITH VARIABLE
THRESHOLD," discloses a device for and method of detecting speech by
adjusting the threshold for determining speech, but does not do so as does
the present invention. Also, U.S. Pat. No. 4,351,983 does not employ
comfort noise and discontinuous transmission as does the present
invention. U.S. Pat. No. 4,351,983 is hereby incorporated by reference
into the specification of the present invention.
U.S. Pat. No. 4,672,669, entitled "VOICE ACTIVITY DETECTION PROCESS AND
MEANS FOR IMPLEMENTING SAID PROCESS," discloses a device for and method of
detecting voice activity by comparing the energy of a signal to a
threshold. The signal is determined to be voice if its power is above the
threshold. If its power is below the threshold then the rate of change of
the spectral parameters is tested. U.S. Pat. No. 4,672,669 does not
employ comfort noise or discontinuous transmission as does the present
invention. U.S. Pat. No. 4,672,669 is hereby incorporated by reference
into the specification of the present invention.
U.S. Pat. No. 5,255,340, entitled "METHOD FOR DETECTING VOICE PRESENCE ON A
COMMUNICATION LINE," discloses a method of detecting voice activity by
determining the stationary or non-stationary state of a block of the
signal and comparing the result to the results of the last M blocks and
does not employ the steps of the present method. U.S. Pat. No. 5,255,340
is hereby incorporated by reference into the specification of the present
invention.
U.S. Pat. No. 5,276,765, entitled "VOICE ACTIVITY DETECTION," discloses a
device for and a method of detecting voice activity by performing an
autocorrelation on weighted and combined coefficients of the input signal
to provide a measure that depends on the power of the signal. The measure
is then compared against a variable threshold to determine voice activity.
However, the speech threshold is not adjusted during speech periods as in
the present invention. U.S. Pat. No. 5,276,765 is hereby incorporated by
reference into the specification of the present invention.
U.S. Pat. Nos. 5,459,814 and 5,649,055, both entitled "VOICE ACTIVITY
DETECTOR FOR SPEECH SIGNALS IN VARIABLE BACKGROUND NOISE," disclose a
device for and method of detecting voice activity by measuring short term
time domain characteristics of the input signal, including the
average signal level and the absolute value of any change in average
signal level, but do not employ the steps of the present method. U.S. Pat.
Nos. 5,459,814 and 5,649,055 are hereby incorporated by reference into the
specification of the present invention.
U.S. Pat. Nos. 5,533,118 and 5,619,565, both entitled "VOICE ACTIVITY
DETECTION METHOD AND APPARATUS USING THE SAME," disclose a device for and
method of distinguishing voice activity from two tones by dividing the
square of the maximum value of the received signal by its energy and
comparing this ratio to three different thresholds, but do not employ the
steps of the present method. U.S. Pat. Nos. 5,533,118 and 5,619,565 are hereby
incorporated by reference into the specification of the present invention.
U.S. Pat. Nos. 5,598,466 and 5,737,407, both entitled "VOICE ACTIVITY
DETECTOR FOR HALF-DUPLEX AUDIO COMMUNICATION SYSTEM," disclose a device
for and method of detecting voice activity by determining an average peak
value, determining a standard deviation, updating a power density function,
and detecting voice activity if the average peak value exceeds the power
density function, but do not employ the steps of the present method. U.S.
Pat. Nos. 5,598,466 and 5,737,407 are hereby incorporated by reference into
the specification of the present invention.
U.S. Pat. No. 5,619,566, entitled "VOICE ACTIVITY DETECTOR FOR AN ECHO
SUPPRESSOR AND AN ECHO SUPPRESSOR," discloses a device for detecting voice
activity that includes a whitening filter and a means for measuring energy,
and that uses the energy level to determine the presence of voice activity,
but does not employ the steps of the present method. U.S. Pat. No. 5,619,566
is hereby incorporated by reference into the specification of the present
invention.
U.S. Pat. No. 5,732,141, entitled "DETECTING VOICE ACTIVITY," discloses a
device for and method of detecting voice activity by computing the
autocorrelation coefficients of a signal, identifying a first
autocorrelation vector, identifying a second autocorrelation vector,
subtracting the first autocorrelation vector from the second
autocorrelation vector, and computing a norm of the differentiation vector
which indicates whether or not voice activity is present and not the steps
of the present method. U.S. Pat. No. 5,732,141 is hereby incorporated by
reference into the specification of the present invention.
U.S. Pat. No. 5,749,067, entitled "VOICE ACTIVITY DETECTOR," discloses a
device for and method of detecting voice activity by comparing the
spectrum of a signal to a noise estimate, updating the noise estimate,
computing a linear predictive coding prediction gain, and suppressing
the update of the noise estimate if the gain exceeds a threshold, but does
not employ the steps of the present method. U.S. Pat. No. 5,749,067 is hereby
incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,867,574, entitled "VOICE ACTIVITY DETECTION SYSTEM AND
METHOD," discloses a device for and method of detecting voice activity by
computing an energy term based on an integral of the absolute value of a
derivative of a speech signal, computing a ratio of the energy to a noise
level, and comparing the ratio to a voice activity threshold and not the
steps of the present method. U.S. Pat. No. 5,867,574 is hereby
incorporated by reference into the specification of the present invention.
SUMMARY OF THE INVENTION
It is an object of the present invention to transmit encoded frames of
digitized speech.
It is another object of the present invention to transmit encoded comfort
noise after a user-definable number of frames have been detected that do
not contain speech.
It is another object of the present invention to discontinue transmission
after a user-definable number of frames are detected that do not contain
speech.
It is another object of the present invention to resume transmission after
transmission has been discontinued upon the detection of a frame
containing speech.
It is another object of the present invention to adjust the threshold for
determining the presence of speech based on the energy of the frame on a
frame by frame basis.
It is another object of the present invention to adjust a minimum energy
threshold on a frame by frame basis.
It is another object of the present invention to adjust a maximum energy
threshold on a frame by frame basis.
The present invention is a method of transmitting speech.
The first step is setting a silence counter to zero.
The second step is setting a transmit counter to one.
The third step is setting a blank period counter to zero.
The fourth step is receiving a frame of digitized information that may or
may not contain speech.
The fifth step is determining if the frame contains speech.
The sixth step is checking if the transmit counter is equal to zero and the
blank period counter is less than x, where x is a positive integer.
The seventh step is checking if the transmit counter is equal to zero, the
blank period counter is greater than x-1, and the frame does not contain
speech.
The eighth step is checking if the transmit counter is equal to zero, the
blank period counter is greater than x-1, and the frame contains speech.
The ninth step is checking if the transmit counter is equal to one, the
frame does not contain speech, and the silence counter is less than y.
The tenth step is checking if the transmit counter is equal to one, the
frame does not contain speech, and the silence counter is greater than
y+z-2, where y and z are both positive integers.
The eleventh step is checking if the transmit counter is equal to one, the
frame does not contain speech and the silence counter is greater than y-1.
The twelfth, and last, step is checking if the transmit counter is equal to
one, the frame contains speech and the silence counter is less than y+z.
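The twelve steps above can be read as a small state machine driven by one speech/no-speech decision per frame. The following Python sketch is an illustration under stated assumptions, not the patented implementation: the names DtxState and dtx_step, and the action labels "frame", "comfort_noise", and "discard", are hypothetical. Encoding and transmission are left to the caller. The tenth step is tested before the eleventh because any silence count greater than y+z-2 is also greater than y-1.

```python
from dataclasses import dataclass


@dataclass
class DtxState:
    """Counters from the first three steps (names are hypothetical)."""
    silence: int = 0   # first step: silence counter starts at zero
    transmit: int = 1  # second step: transmit counter starts at one
    blank: int = 0     # third step: blank period counter starts at zero


def dtx_step(state: DtxState, is_speech: bool, x: int, y: int, z: int) -> str:
    """Decide what to do with one frame.

    Returns "frame" (encode and send the frame), "comfort_noise"
    (discard the frame, encode and send comfort noise), or "discard".
    """
    if state.transmit == 0:
        # Sixth and seventh steps: still inside the blank period,
        # or past it but the frame holds no speech.
        if state.blank < x or not is_speech:
            state.blank += 1
            return "discard"
        # Eighth step: blank period over and speech detected, so resume.
        state.transmit, state.blank, state.silence = 1, 0, 0
        return "frame"

    if is_speech:
        # Twelfth step. The source also requires silence < y+z here; in this
        # sketch the counter cannot reach y+z while transmit is one.
        state.silence = 0
        return "frame"

    if state.silence < y:
        # Ninth step: keep sending the first y non-speech frames.
        state.silence += 1
        return "frame"

    if state.silence > y + z - 2:
        # Tenth step: last comfort-noise frame, then stop transmitting.
        state.transmit = 0
    # Tenth and eleventh steps both send comfort noise and count the frame.
    state.silence += 1
    return "comfort_noise"
```

With x = 2, y = 3, and z = 2 (values chosen only for illustration), a speech frame followed by five non-speech frames yields four transmitted frames, then two comfort-noise frames, after which transmission stops. The next two frames are discarded regardless of content, and the first speech frame after that resumes transmission.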
In the preferred embodiment, the energy of a frame is calculated using the
following equation.
E = (A^H × A) / FrameSize
A minimum energy threshold is set.
A maximum energy threshold is set.
A speech threshold is set as T = (0.07 × maximum energy threshold) +
(K × minimum energy threshold), where K is a user-definable value.
The energy of the frame is compared to the speech threshold.
If the energy of the frame is less than the speech threshold then it is
concluded that no speech is contained within the frame; otherwise it is
concluded that speech is contained within the frame.
The minimum energy threshold is then increased by a first user-definable
percentage.
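The energy calculation and the speech decision can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the frame is assumed to be a sequence of real or complex samples, and "increasing by a percentage" is assumed to mean multiplying by (1 + percentage/100).

```python
def frame_energy(samples):
    """E = (A^H x A) / FrameSize, i.e. the mean squared magnitude of the frame."""
    return sum(abs(a) ** 2 for a in samples) / len(samples)


def contains_speech(energy, e_min, e_max, k):
    """Compare E to T = 0.07 * e_max + k * e_min; speech unless E is less than T."""
    threshold = 0.07 * e_max + k * e_min
    return energy >= threshold


def raise_minimum(e_min, percent):
    """Increase the minimum energy threshold by `percent` percent (assumed multiplicative)."""
    return e_min * (1 + percent / 100)
```

For example, a two-sample frame [3, 4] has energy (9 + 16) / 2 = 12.5. With a hypothetical minimum threshold of 10, maximum threshold of 100, and K = 0.5, the speech threshold is about 12, so the frame would be classified as speech.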
Additionally, the energy of the frame may be checked to see if it is less
than the minimum energy threshold. If so, set the first user-definable
percentage to what the first user-definable percentage was set to
initially. Also, check if the energy of the frame is greater than the
minimum energy threshold. If so then increase the first user-definable
percentage by a second user-definable percentage.
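The adjustment of the first user-definable percentage described above can be sketched as follows. The function name is hypothetical, the default values of one percent and one-hundredth of a percent are taken from claims 5 and 7, and the increase is assumed to be additive in percentage points, which the source does not spell out.

```python
def update_min_percentage(energy, e_min, percent, initial_percent=1.0, step=0.01):
    """Return the percentage to use when next raising the minimum energy threshold.

    energy  -- energy E of the current frame
    e_min   -- current minimum energy threshold
    percent -- current first user-definable percentage
    """
    if energy < e_min:
        # Frame energy fell below the minimum: go back to the initial percentage.
        return initial_percent
    if energy > e_min:
        # Frame energy is above the minimum: grow the percentage by the step
        # (assumed additive, in percentage points).
        return percent + step
    # Equal energies are not covered by the text; leave the percentage as is.
    return percent
```

Under these assumptions the minimum threshold rises faster the longer the frame energy stays above it, and falls back to its initial rate once a frame drops below it.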
In an alternate embodiment, the | | |