Method of transmitting speech using discontinuous transmission and comfort noise    
United States Patent 6381568
Inventor(s): Supplee; Lynn Michele (Crownsville, MD); Dean; Richard A. (Columbia, MD); Kohler; Mary A (Columbia, MD)
Abstract: A method of transmitting speech by initializing silence, transmit, and blank-period counters; receiving a frame; determining whether the frame contains speech; if the transmit counter is zero and the blank-period counter is less than x, then discarding the frame, incrementing the blank-period counter, and returning to the second step; if the transmit counter is zero, the blank-period counter is greater than x-1, and the frame is not speech, then discarding the frame, incrementing the blank-period counter, and returning to the second step; if the transmit counter is zero, the blank-period counter is greater than x-1, and the frame is speech, then setting the transmit counter to one, setting the blank-period counter to zero, setting the silence counter to zero, encoding the frame, transmitting the encoded frame, and returning to the second step; if the transmit counter is one, the frame is not speech, and the silence counter is less than y, then encoding the frame, transmitting the encoded frame, incrementing the silence counter, and returning to the second step; if the transmit counter is one, the frame is not speech, and the silence counter is greater than y+z-2, then setting the transmit counter to zero, discarding the frame, encoding comfort noise, transmitting the encoded comfort noise, incrementing the silence counter, and returning to the second step; if the transmit counter is one, the frame is not speech, and the silence counter is greater than y-1, then discarding the frame, encoding comfort noise, transmitting the encoded comfort noise, incrementing the silence counter, and returning to the second step; and if the transmit counter is one, the frame is speech, and the silence counter is less than y+z, then encoding the frame, transmitting the encoded frame, setting the silence counter to zero, and returning to the second step.
Owner/Assignee     The United States of America as represented by the National Security Agency (Washington, DC)
Publication Date     April 30, 2002
Application Number     09/305,325
Filing Date     May 5, 1999
US Classification     704/210 704/233
Int'l Classification     G10L 011/06
Examiner     Dorvil; Richemond
Attorney/Law Firm     Morelli; Robert D.
USPTO Field of Search     704/210 704/215 704/219 704/200 704/207 704/200.1 704/231 704/233
References

U.S. References
Patent No.   Inventor      US Class      Date
6205476      Hayes, Jr.    709/220       Mar 2001
6188980      Thyssen       704/230       Feb 2001
6173257      Gao           704/220       Jan 2001
6097772      Johnson                     Aug 2000
6055497      Hallkvist                   Apr 2000
6049765      Iyengar       704/201       Apr 2000
5978756      Walker        704/210       Nov 1999
5890109      Walker                      Mar 1999
5867574      Eryilmaz      379/388.04    Feb 1999
5835889      Kapanen       704/215       Nov 1998
5812965      Massaloux                   Sep 1998
5749067      Barrett       704/233       May 1998
5737407      Graumann      379/388.04    Apr 1998
5732141      Chaoui        381/56        Mar 1998
5722086      Teitler       455/561       Feb 1998
5649055      Gupta         704/233       Jul 1997
5619566      Fogel         379/406.07    Apr 1997
5619565      Cesaro        379/386       Apr 1997
5612955      Fernandes     370/433       Mar 1997
5598466      Graumann      379/388.04    Jan 1997
5533118      Cesaro        379/386       Jul 1996
5459814      Gupta         704/233       Oct 1995
5276765      Freeman       704/233       Jan 1994
5255340      Arnaud        704/200       Oct 1993
4696039      Doddington    704/215       Sep 1987
4672669      DesBlache     704/237       Jun 1987
4351983      Crouse        704/233       Sep 1982
4008375      Lanier        704/250       Feb 1977
3832491      Sciulli       704/212       Aug 1974
Claims


What is claimed is:

1. A method of transmitting speech, comprising the steps of:

a) setting a silence counter to zero;

b) setting a transmit counter to one;

c) setting a blank period counter to zero;

d) receiving a frame of digitized information;

e) determining if the frame contains speech;

f) if the transmit counter is equal to zero and the blank period counter is less than x, where x is a positive integer, then discarding the frame, incrementing the blank period counter by one, and returning to step (d);

g) if the transmit counter is equal to zero, the blank period counter is greater than x-1 and the frame does not contain speech then discarding the frame, incrementing the blank period counter by one, and returning to step (d);

h) if the transmit counter is equal to zero, the blank period counter is greater than x-1, and the frame contains speech then setting the transmit counter to one, setting the blank period counter equal to zero, setting the silence counter equal to zero, encoding the frame, transmitting the encoded frame, and returning to step (d);

i) if the transmit counter is equal to one, the frame does not contain speech, and the silence counter is less than y then encoding the frame, transmitting the encoded frame, incrementing the silence counter by one, and returning to step (d);

j) if the transmit counter is equal to one, the frame does not contain speech, and the silence counter is greater than y+z-2, where y and z are both positive integers, then setting the transmit counter to zero, discarding the frame, encoding a frame containing comfort noise, transmitting the encoded frame containing comfort noise, incrementing the silence counter by one, and returning to step (d);

k) if the transmit counter is equal to one, the frame does not contain speech, and the silence counter is greater than y-1 then discarding the frame, encoding a frame containing comfort noise, transmitting the encoded frame containing comfort noise, incrementing the silence counter by one, and returning to step (d); and

l) if the transmit counter is equal to one, the frame contains speech, and the silence counter is less than y+z then encoding the frame, transmitting the encoded frame, setting the silence counter to zero, and returning to step (d).

2. The method of claim 1, wherein the step of discarding the frame, incrementing the blank period counter by one, and returning to step (d) if the transmit counter is equal to zero and the blank period counter is less than x is comprised of the step of discarding the frame, incrementing the blank period counter by one, and returning to step (d) if the transmit counter is equal to zero and the blank period counter is less than 2.

3. The method of claim 1, wherein said step of setting the transmit counter to zero, discarding the frame, encoding a frame containing comfort noise, transmitting the encoded frame containing comfort noise, incrementing the silence counter by one, and returning to step (d) if the transmit counter is equal to one, the frame does not contain speech, and the silence counter is greater than y+z+2 is comprised of the step of setting the transmit counter to zero, discarding the frame, encoding a frame containing comfort noise, transmitting the encoded frame containing comfort noise, incrementing the silence counter by one, and returning to step (d) if the transmit counter is equal to one, the frame does not contain speech, and the silence counter is greater than y+z+2, where y equals 3 and z equals 2.

4. The method of claim 1, wherein said step of determining if the frame contains speech is comprised of the steps of:

a) calculating an energy of the frame as

\[ E = \frac{A^{H} \times A}{\text{FrameSize}} \]

where A is a vector of the frame, where A.sup.H is a complex conjugate transpose of A, and where FrameSize is a number of samples in the frame;

b) setting a minimum energy threshold;

c) setting a maximum energy threshold;

d) setting a speech threshold as

\( T = (0.07 \times \text{maximum energy threshold}) + (K \times \text{minimum energy threshold}) \), where K is a user-definable value;

e) comparing E to T;

f) if E is less than T then concluding that no speech is contained within the frame, otherwise concluding that speech is contained within the frame; and

g) increasing the minimum energy threshold by a first user-definable percentage.

5. The method of claim 4, wherein the step of increasing the minimum energy threshold by a first user-definable percentage is comprised of the step of increasing the minimum energy threshold by one percent.

6. The method of claim 5, further including the steps of:

a) if E is less than the minimum energy threshold then setting the first user-definable percentage to what the first user-definable percentage was set to initially; and

b) if E is greater than the minimum energy threshold then increasing the first user-definable percentage by a second user-definable percentage.

7. The method of claim 6, wherein the step of if E is greater than the minimum energy threshold then increasing the user-definable percentage by a second user-definable percentage is comprised of the step of if E is greater than the minimum energy threshold then increasing the first user-definable percentage by one-hundredth of a percent.

8. The method of claim 4, further including the step of decreasing the maximum energy threshold by a third user-definable percentage.

9. The method of claim 8, wherein the step of decreasing the maximum energy threshold by a third user-definable percentage is comprised of the step of decreasing the maximum energy threshold by one percent.

10. The method of claim 9, further including the steps of:

a) if E is greater than the maximum energy threshold then setting the third user-definable percentage to what the third user-definable percentage was set to initially; and

b) if E is less than the maximum energy threshold then decreasing the third user-definable percentage by a fourth user-definable percentage.

11. The method of claim 10, wherein the step of if E is less than the maximum energy threshold then decreasing the user-definable percentage by a fourth user-definable percentage is comprised of the step of if E is less than the maximum energy threshold then decreasing the third user-definable percentage by one-hundredth of a percent.

12. The method of claim 1, wherein the step of encoding the frame in steps (h), (i), (j), (k), and (l) are each comprised of the step of encoding the frame in Mixed Excitation Linear Prediction (MELP) format.
Description
 


FIELD OF THE INVENTION

The present invention relates, in general, to data processing and, in particular, to speech signal processing.

BACKGROUND OF THE INVENTION

Systems for transmitting speech to a receiver often digitize the speech, divide the digitized speech into frames, encode each frame using a particular voice encoder, or vocoder algorithm, and transmit the frames to a receiver.

Some of the problems encountered by these systems include unnecessary complexity, recognizing background noise as speech when no speech is present, transmitting too many frames that do not contain speech, sending frames encoded using a format other than the chosen vocoder, and so on.

Some speech transmission systems are unnecessarily complex. Such systems tend to be more expensive than simpler systems because of the additional software required to perform a complex function. Also, a complex system may be too slow for a particular purpose because of the additional time required to complete a complex function.

Some speech systems set thresholds for background noise that are based on a theoretical model of noise. Such systems are susceptible to erroneous determinations that speech is present in a frame when it is not because of unanticipated changes in the actual background noise from transmission to transmission. Also, some systems do not adjust the background noise thresholds once set or do not adjust the thresholds often enough to keep pace with a rapidly changing noise background. These same points apply to how systems set the threshold for determining whether or not speech is present within a frame.

Speech transmission systems that send too many frames that do not contain speech waste bandwidth that could have been used to transmit frames that do contain speech and run the risk that the receiver will mistakenly conclude that the transmission is over for lack of any voice activity.

Some speech transmission systems send additional frames (e.g., comfort noise) that are not encoded using the chosen vocoder but are sent using special frames. Using special frames adds complexity to the receiver because the receiver must be able to recognize these special frames. Also, special frames may cause bothersome noise in the receiver since the special frames were not encoded using the chosen vocoder algorithm.

U.S. Pat. No. 3,832,491, entitled "DIGITAL VOICE SWITCH WITH AN ADAPTIVE DIGITALLY-CONTROLLED THRESHOLD," discloses a voice switch that adjusts the threshold for determining the presence of speech only after a theoretically optimum threshold is exceeded 1,220 times, and that adjusts a minimum speech threshold based on noise. U.S. Pat. No. 3,832,491 does not perform the steps of the present invention and does not adjust the speech threshold in the same manner, or as often, as does the present invention. U.S. Pat. No. 3,832,491 is hereby incorporated by reference into the specification of the present invention.

U.S. Pat. No. 4,008,375, entitled "DIGITAL VOICE SWITCH FOR SINGLE OR MULTIPLE CHANNEL APPLICATIONS," discloses a voice switch that adjusts the threshold for determining the presence of speech based on a statistical analysis of whether or not the number of times the speech threshold is exceeded is uniform or non-uniform. U.S. Pat. No. 4,008,375 does not perform the steps of the present invention and does not adjust the speech threshold as often as does the present invention. U.S. Pat. No. 4,008,375 is hereby incorporated by reference into the specification of the present invention.

U.S. Pat. No. 5,612,955, entitled "MOBILE RADIO WITH TRANSMIT COMMAND CONTROL AND MOBILE RADIO SYSTEM"; U.S. Pat. No. 5,812,965, entitled "PROCESS AND DEVICE FOR CREATING COMFORT NOISE IN A DIGITAL SPEECH TRANSMISSION"; and U.S. Pat. No. 5,835,889, entitled "METHOD AND APPARATUS FOR DETECTING HANGOVER PERIODS IN A TDMA WIRELESS COMMUNICATION SYSTEM USING DISCONTINUOUS TRANSMISSION" each transmit a special silence descriptor (SID) frame when silence is encountered and the transmission of speech is discontinued. This special frame may cause bothersome noise at the receiver whereas the method of the present invention does not. U.S. Pat. Nos. 5,612,955; 5,812,965; and 5,835,889 are hereby incorporated by reference into the specification of the present invention.

U.S. Pat. No. 4,351,983, entitled "SPEECH DETECTOR WITH VARIABLE THRESHOLD," discloses a device for and method of detecting speech by adjusting the threshold for determining speech, but does not do so as does the present invention. Also, U.S. Pat. No. 4,351,983 does not employ comfort noise and discontinuous transmission as does the present invention. U.S. Pat. No. 4,351,983 is hereby incorporated by reference into the specification of the present invention.

U.S. Pat. No. 4,672,669, entitled "VOICE ACTIVITY DETECTION PROCESS AND MEANS FOR IMPLEMENTING SAID PROCESS," discloses a device for and method of detecting voice activity by comparing the energy of a signal to a threshold. The signal is determined to be voice if its power is above the threshold. If its power is below the threshold then the rate of change of the spectral parameters is tested. U.S. Pat. No. 4,672,669 does not employ comfort noise or discontinuous transmission as does the present invention. U.S. Pat. No. 4,672,669 is hereby incorporated by reference into the specification of the present invention.

U.S. Pat. No. 5,255,340, entitled "METHOD FOR DETECTING VOICE PRESENCE ON A COMMUNICATION LINE," discloses a method of detecting voice activity by determining the stationary or non-stationary state of a block of the signal and comparing the result to the results of the last M blocks and does not employ the steps of the present method. U.S. Pat. No. 5,255,340 is hereby incorporated by reference into the specification of the present invention.

U.S. Pat. No. 5,276,765, entitled "VOICE ACTIVITY DETECTION," discloses a device for and a method of detecting voice activity by performing an autocorrelation on weighted and combined coefficients of the input signal to provide a measure that depends on the power of the signal. The measure is then compared against a variable threshold to determine voice activity. However, the speech threshold is not adjusted during speech periods as in the present invention. U.S. Pat. No. 5,276,765 is hereby incorporated by reference into the specification of the present invention.

U.S. Pat. Nos. 5,459,814 and 5,649,055, both entitled "VOICE ACTIVITY DETECTOR FOR SPEECH SIGNALS IN VARIABLE BACKGROUND NOISE," disclose a device for and method of detecting voice activity by measuring short term time domain characteristics of the input signal, including the average signal level and the absolute value of any change in average signal level, and not by the steps of the present method. U.S. Pat. Nos. 5,459,814 and 5,649,055 are hereby incorporated by reference into the specification of the present invention.

U.S. Pat. Nos. 5,533,118 and 5,619,565, both entitled "VOICE ACTIVITY DETECTION METHOD AND APPARATUS USING THE SAME," disclose a device for and method of distinguishing voice activity from two tones by dividing the square of the maximum value of the received signal by its energy and comparing this ratio to three different thresholds, and not by the steps of the present method. U.S. Pat. Nos. 5,533,118 and 5,619,565 are hereby incorporated by reference into the specification of the present invention.

U.S. Pat. Nos. 5,598,466 and 5,737,407, both entitled "VOICE ACTIVITY DETECTOR FOR HALF-DUPLEX AUDIO COMMUNICATION SYSTEM," disclose a device for and method of detecting voice activity by determining an average peak value, determining a standard deviation, updating a power density function, and detecting voice activity if the average peak value exceeds the power density function, and not by the steps of the present method. U.S. Pat. Nos. 5,598,466 and 5,737,407 are hereby incorporated by reference into the specification of the present invention.

U.S. Pat. No. 5,619,566, entitled "VOICE ACTIVITY DETECTOR FOR AN ECHO SUPPRESSOR AND AN ECHO SUPPRESSOR," discloses a device for detecting voice activity that includes a whitening filter, a means for measuring energy, and using the energy level to determine the presence of voice activity and not the steps of the present method. U.S. Pat. No. 5,619,566 is hereby incorporated by reference into the specification of the present invention.

U.S. Pat. No. 5,732,141, entitled "DETECTING VOICE ACTIVITY," discloses a device for and method of detecting voice activity by computing the autocorrelation coefficients of a signal, identifying a first autocorrelation vector, identifying a second autocorrelation vector, subtracting the first autocorrelation vector from the second autocorrelation vector, and computing a norm of the differentiation vector which indicates whether or not voice activity is present and not the steps of the present method. U.S. Pat. No. 5,732,141 is hereby incorporated by reference into the specification of the present invention.

U.S. Pat. No. 5,749,067, entitled "VOICE ACTIVITY DETECTOR," discloses a device for and method of detecting voice activity by comparing the spectrum of a signal to a noise estimate, updating the noise estimate, computing a linear predictive coding prediction gain, and suppressing the update of the noise estimate if the gain exceeds a threshold, and not by the steps of the present method. U.S. Pat. No. 5,749,067 is hereby incorporated by reference into the specification of the present invention.

U.S. Pat. No. 5,867,574, entitled "VOICE ACTIVITY DETECTION SYSTEM AND METHOD," discloses a device for and method of detecting voice activity by computing an energy term based on an integral of the absolute value of a derivative of a speech signal, computing a ratio of the energy to a noise level, and comparing the ratio to a voice activity threshold and not the steps of the present method. U.S. Pat. No. 5,867,574 is hereby incorporated by reference into the specification of the present invention.

SUMMARY OF THE INVENTION

It is an object of the present invention to transmit encoded frames of digitized speech.

It is another object of the present invention to transmit encoded comfort noise after a user-definable number of frames have been detected that do not contain speech.

It is another object of the present invention to discontinue transmission after a user-definable number of frames are detected that do not contain speech.

It is another object of the present invention to resume transmission after transmission has been discontinued upon the detection of a frame containing speech.

It is another object of the present invention to adjust the threshold for determining the presence of speech based on the energy of the frame on a frame by frame basis.

It is another object of the present invention to adjust a minimum energy threshold on a frame by frame basis.

It is another object of the present invention to adjust a maximum energy threshold on a frame by frame basis.

The present invention is a method of transmitting speech.

The first step is setting a silence counter to zero.

The second step is setting a transmit counter to one.

The third step is setting a blank period counter to zero.

The fourth step is receiving a frame of digitized information that may or may not contain speech.

The fifth step is determining if the frame contains speech.

The sixth step is checking if the transmit counter is equal to zero and the blank period counter is less than x, where x is a positive integer.

The seventh step is checking if the transmit counter is equal to zero, the blank period counter is greater than x-1, and the frame does not contain speech.

The eighth step is checking if the transmit counter is equal to zero, the blank period counter is greater than x-1, and the frame contains speech.

The ninth step is checking if the transmit counter is equal to one, the frame does not contain speech, and the silence counter is less than y.

The tenth step is checking if the transmit counter is equal to one, the frame does not contain speech, and the silence counter is greater than y+z-2, where y and z are both positive integers.

The eleventh step is checking if the transmit counter is equal to one, the frame does not contain speech and the silence counter is greater than y-1.

The twelfth, and last, step is checking if the transmit counter is equal to one, the frame contains speech and the silence counter is less than y+z.
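The twelve steps above can be read as a small state machine driven by the three counters. The following Python sketch is one possible reading, not the patented implementation. The names `DtxState` and `process_frame`, and the string return values, are illustrative assumptions. The defaults x = 2, y = 3, and z = 2 are the values given in claims 2 and 3. The step-ten test is placed before the step-eleven test because both conditions hold once the silence counter exceeds y+z-2.

```python
from dataclasses import dataclass


@dataclass
class DtxState:
    """Hypothetical container for the three counters."""
    silence: int = 0   # first step: silence counter starts at zero
    transmit: int = 1  # second step: transmit counter starts at one
    blank: int = 0     # third step: blank period counter starts at zero


def process_frame(state: DtxState, is_speech: bool,
                  x: int = 2, y: int = 3, z: int = 2) -> str:
    """Apply steps six through twelve to one frame.

    Returns "frame" (encode and transmit the frame), "comfort" (encode and
    transmit comfort noise in its place), or "discard" (transmit nothing).
    """
    if state.transmit == 0:
        if state.blank < x or not is_speech:
            # Steps six and seven: stay silent and count the blank frame.
            state.blank += 1
            return "discard"
        # Step eight: speech after the blank period resumes transmission.
        state.transmit = 1
        state.blank = 0
        state.silence = 0
        return "frame"

    if is_speech and state.silence < y + z:
        # Step twelve: speech is transmitted and resets the silence counter.
        state.silence = 0
        return "frame"
    if not is_speech and state.silence < y:
        # Step nine: the first y non-speech frames are still transmitted.
        state.silence += 1
        return "frame"
    if not is_speech and state.silence > y + z - 2:
        # Step ten: send a final comfort noise frame, then stop transmitting.
        state.transmit = 0
        state.silence += 1
        return "comfort"
    if not is_speech and state.silence > y - 1:
        # Step eleven: replace the frame with comfort noise.
        state.silence += 1
        return "comfort"
    raise ValueError("no step applies; check that x, y, and z are positive")


state = DtxState()
frames = [True] + [False] * 7 + [True]  # speech, seven non-speech, speech
actions = [process_frame(state, f) for f in frames]
```

With these defaults, `actions` holds four "frame" entries (the speech frame and three trailing non-speech frames), two "comfort" entries, two "discard" entries, and a final "frame" when speech resumes.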

In the preferred embodiment, the energy of a frame is calculated using the following equation.

\[ E = \frac{A^{H} \times A}{\text{FrameSize}} \]

A minimum energy threshold is set.

A maximum energy threshold is set.

A speech threshold is set as \( T = (0.07 \times \text{maximum energy threshold}) + (K \times \text{minimum energy threshold}) \), where K is a user-definable value.

The energy of the frame is compared to the speech threshold.

If the energy of the frame is less than the speech threshold, it is concluded that no speech is contained within the frame; otherwise, it is concluded that speech is contained within the frame.

The minimum energy threshold is then increased by a first user-definable percentage.
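The energy calculation and the speech decision can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the frame is a plain list of real or complex samples, and the threshold values in the usage lines are arbitrary numbers chosen only to exercise the formulas. For a sample vector A, the product of its conjugate transpose with A equals the sum of the squared magnitudes of its samples, which is what `frame_energy` computes.

```python
def frame_energy(samples):
    """E = (A^H x A) / FrameSize, i.e. the mean squared sample magnitude."""
    return sum(abs(a) ** 2 for a in samples) / len(samples)


def speech_threshold(max_energy, min_energy, k):
    """T = (0.07 x maximum energy threshold) + (K x minimum energy threshold)."""
    return 0.07 * max_energy + k * min_energy


def contains_speech(samples, max_energy, min_energy, k):
    """A frame is judged non-speech only if E is less than T."""
    return frame_energy(samples) >= speech_threshold(max_energy, min_energy, k)


# Arbitrary illustrative values (assumptions, not taken from the patent).
quiet = [1, -1, 2, -2]
loud = [4, -4, 4, -4]
quiet_is_speech = contains_speech(quiet, max_energy=100.0, min_energy=1.0, k=2.0)
loud_is_speech = contains_speech(loud, max_energy=100.0, min_energy=1.0, k=2.0)
```

Here the quiet frame has energy 2.5 and the loud frame has energy 16, against a threshold of about 9, so only the loud frame is classified as speech.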

Additionally, the energy of the frame may be checked to see if it is less than the minimum energy threshold. If so, set the first user-definable percentage to what the first user-definable percentage was set to initially. Also, check if the energy of the frame is greater than the minimum energy threshold. If so then increase the first user-definable percentage by a second user-definable percentage.
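The adaptive growth of the minimum energy threshold can be sketched as follows. Several details are assumptions rather than statements from the text: the class name `MinEnergyTracker` is hypothetical, the second percentage is treated as additive percentage points, the threshold is raised before the frame energy is compared against it, and the threshold itself is left untouched when the percentage resets, since the text does not say how it is re-seeded. The defaults of one percent and one-hundredth of a percent come from claims 5 and 7.

```python
from dataclasses import dataclass, field


@dataclass
class MinEnergyTracker:
    """Hypothetical tracker for the minimum energy threshold."""
    threshold: float
    initial_percent: float = 1.0   # first percentage (claim 5: one percent)
    step_percent: float = 0.01     # second percentage (claim 7: 0.01 percent)
    percent: float = field(init=False)

    def __post_init__(self):
        # The growth rate starts at its initial value.
        self.percent = self.initial_percent

    def update(self, energy: float) -> float:
        # Raise the minimum threshold by the current percentage.
        self.threshold *= 1.0 + self.percent / 100.0
        if energy < self.threshold:
            # Frame is below the minimum: restore the initial percentage.
            self.percent = self.initial_percent
        elif energy > self.threshold:
            # Frame is above the minimum: let the threshold grow faster.
            # Assumption: the increase is additive percentage points.
            self.percent += self.step_percent
        return self.threshold


tracker = MinEnergyTracker(threshold=100.0)
tracker.update(500.0)  # energy above the minimum: percent grows to 1.01
tracker.update(500.0)  # percent grows again to 1.02
tracker.update(1.0)    # energy below the minimum: percent returns to 1.0
```

One consequence of this reading is that the minimum threshold climbs slightly faster the longer the frame energy stays above it, and the growth rate falls back to its initial value as soon as a frame drops below it.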

In an alternate embodiment, the