Voice activity detector for half-duplex audio communication system    
United States Patent 5598466
Link to this page: http://www.wikipatents.com/5598466.html
Inventor(s): Graumann; David L. (Vancouver, WA)
Abstract: A method of detecting voice in an audio signal comprises the steps of determining an average peak value representing an envelope of the audio signal, determining a running instance of audio signal standard deviation, which corresponds to one of a number of overlapping time intervals, and updating a power density function (PDF) by adding instances of noise to the PDF if the average peak of the audio signal exceeds the current level of the audio signal by a certain amount and if the current standard deviation value falls below a threshold for a predetermined time interval. A noise floor is located based on the mean value of the PDF, and, if the audio signal sustains a power level exceeding the noise floor, voice activity is determined to be present in the audio signal. The PDF is updated by a low confidence factor if all of the standard deviation values calculated during a certain period of time are below the threshold value and by a high confidence factor if all standard deviation values within a certain longer period of time are below the threshold value.
   














Inventor     Graumann; David L. (Vancouver, WA)
Owner/Assignee     Intel Corporation (Santa Clara, CA)
Publication Date     January 28, 1997
Application Number     08/520,305
Filing Date     August 28, 1995
US Classification     379/388.04 379/351 379/392.01 381/56 704/233
Int'l Classification     H04M 009/10 H04M 009/08 G10L 003/00 G10L 009/18
Examiner     Zele; Krista M.
Assistant Examiner     Kumar; Devendra
Attorney/Law Firm     Blakely, Sokoloff, Taylor & Zafman
USPTO Field of Search     379/388 379/389 379/390 379/406 379/414 379/416 379/351 395/2.42 395/2.35 395/2.36 395/2.37 395/2.17 395/2.23 381/46 381/47 381/56 381/57 381/94 381/110
U.S. References

Reference    Inventor     US Class      Date
5471528      Reesor       379/406.08    Nov 1995
5459814      Gupta        704/233       Oct 1995
5357567      Barron       379/406.06    Oct 1994
5323337      Wilson       702/73        Jun 1994
5297198      Butani                     Mar 1994
5293588      Satoh        704/233       Mar 1994
5255340      Arnaud       704/200       Oct 1993
5239574      Brandman     379/88.08     Aug 1993
4979214      Hamilton     704/233       Dec 1990
4959857      Erving       379/406.07    Sep 1990
4887288      Erving       379/22.02     Dec 1989
4796287      Reesor       379/390.03    Jan 1989
4715063      Haddad       379/390.01    Dec 1987
4672669      DesBlache    704/237       Jun 1987
4630304      Borth        381/94.3      Dec 1986
4461024      Rengger      704/233       Jul 1984
4147892      Miller       379/388.05    Apr 1979
4028496      LaMarche     704/233       Jun 1977
Claims
 


What is claimed is:

1. A method of locating a noise floor for qualifying a signal, comprising the steps of:

establishing a noise power density function (NPDF), based on:

a relationship between an approximate peak level of the signal and a current level of the signal, and

a plurality of standard deviation values of the signal, each of the standard deviation values corresponding to one of a plurality of time intervals;

repeatedly updating the NPDF to produce a current state of the NPDF; and

using the current state of the NPDF to locate the noise floor.

2. The method according to claim 1, wherein each of the time intervals overlaps at least one other time interval.

3. The method according to claim 1, wherein the step of repeatedly updating comprises the steps of:

determining whether the approximate peak level of the signal exceeds the current level of the signal by a predetermined amount;

determining whether all of the standard deviation values calculated during a first time period are below a threshold value; and

identifying a noise instance if:

the approximate peak level of the signal exceeds the current level of the signal by a predetermined amount, and

all of the standard deviation values calculated during the first time period are below the threshold value.

4. The method according to claim 1, wherein the approximate peak level corresponds to an envelope of the signal.

5. A method of detecting speech in an audio signal, comprising the steps of:

determining an average peak of the audio signal;

determining a plurality of standard deviation values of the audio signal, each of the standard deviation values corresponding to one of a plurality of time intervals;

updating a power density function (PDF) to establish a current state of the PDF according to a relationship between the average peak and a current level of the audio signal and based on the standard deviation values;

locating a noise floor based on the current state of the PDF; and

if a predetermined relationship exists between the current level of the audio signal and the noise floor, determining that speech is represented in the audio signal.

6. The method according to claim 5, wherein each of the time intervals overlaps at least one other time interval.

7. The method according to claim 5, wherein the PDF represents a plurality of noise instances.

8. The method according to claim 7, wherein the updating step comprises the steps of:

determining whether the average peak of the audio signal exceeds the current level of the audio signal by a predetermined amount;

determining whether all of the standard deviation values calculated during a first time period are below a threshold value; and

identifying a noise instance if:

the average peak of the audio signal exceeds the current level of the audio signal by a predetermined amount, and

all of the standard deviation values calculated during the first time period are below the threshold value.

9. The method according to claim 8, wherein the updating step further comprises the step of modifying the PDF to reflect an additional noise instance if a noise instance was identified.

10. The method according to claim 9, wherein the modifying step comprises the steps of:

modifying the PDF according to a low confidence factor if all of the standard deviation values calculated during the first time period are below the threshold value; and

modifying the PDF according to a high confidence factor if all of the standard deviation values calculated during a second time period are below the threshold value, wherein the second time period is greater than the first time period.

11. The method according to claim 5, wherein the average peak corresponds to an envelope of the audio signal.

12. The method according to claim 5, wherein the predetermined relationship is a relationship in which the current level exceeds the noise floor by a predetermined amount.

13. An apparatus for determining whether voice is present in an audio signal, comprising:

a peak calculator determining a peak of the audio signal;

a standard deviation generator determining a plurality of standard deviation values of the audio signal, each of the standard deviation values corresponding to one of a plurality of time intervals;

updating logic coupled to receive the peak and the standard deviation values, the updating logic updating a power density function (PDF) to establish a current state of the PDF according to a relationship between the peak and a current level of the audio signal and based on the standard deviation values;

a noise floor locator coupled to receive the current state of the PDF, the noise floor locator locating a noise floor based on the current state of the PDF; and

decision logic coupled to receive the noise floor and the audio signal, the decision logic determining that voice is represented in the audio signal when a predetermined relationship exists between the current level of the audio signal and the noise floor.

14. The apparatus according to claim 13, wherein each of the time intervals overlaps at least one other time interval.

15. The apparatus according to claim 14, wherein the PDF represents a plurality of noise instances.

16. The apparatus according to claim 15, wherein the updating logic comprises:

first comparator logic determining whether the peak of the audio signal exceeds the current level of the audio signal by a predetermined amount;

second comparator logic determining whether all of the standard deviation values calculated during a first time period are below a threshold value; and

noise logic coupled to the first comparator logic and the second comparator logic, the noise logic identifying a noise instance if:

the peak of the audio signal exceeds the current level of the audio signal by a predetermined amount, and

all of the standard deviation values calculated during the first time period are below the threshold value.

17. The apparatus according to claim 15, wherein the peak corresponds to an envelope of the audio signal.

18. An apparatus for detecting voice in an audio signal, comprising:

means for determining an average peak of the audio signal;

means for determining a plurality of standard deviation values of the audio signal, each of the standard deviation values corresponding to one of a plurality of time intervals;

means for updating a power density function (PDF) to establish a current state of the PDF according to a relationship between the average peak and a current level of the audio signal and based on the standard deviation values;

means for locating a noise floor based on the current state of the PDF; and

means for determining that voice is represented in the audio signal if a predetermined relationship exists between the current level of the audio signal and the noise floor.

19. The apparatus according to claim 18, wherein each of the time intervals overlaps at least one other time interval.

20. The apparatus according to claim 18, wherein the PDF represents a plurality of noise instances.

21. The apparatus according to claim 20, wherein the means for updating comprises:

means for determining whether the average peak of the audio signal exceeds the current level of the audio signal by a predetermined amount;

means for determining whether all of the standard deviation values calculated during a first time period are below a threshold value; and

means for identifying a noise instance if:

the average peak of the audio signal exceeds the current level of the audio signal by a predetermined amount, and

all of the standard deviation values calculated during the first time period are below the threshold value.

22. The apparatus according to claim 21, wherein the means for updating further comprises means for modifying the PDF to reflect an additional noise instance if a noise instance was identified.

23. The apparatus according to claim 22, wherein the means for modifying comprises:

means for modifying the PDF according to a low confidence factor if all of the standard deviation values calculated during the first time period are below the threshold value; and

means for modifying the PDF according to a high confidence factor if all of the standard deviation values calculated during a second time period are below the threshold value, wherein the second time period is greater than the first time period.

24. The apparatus according to claim 20, wherein the average peak corresponds to an envelope of the audio signal.

25. A computer system having capability for duplex audio communication with a remote site, the system comprising:

a processor controlling the computer system;

an input device coupled to the processor and coupled to input audio information to be transmitted to the remote site;

an output device coupled to the processor and coupled to output audio information received from the remote site; and

a voice activity detector coupled to the input device and the output device, the voice activity detector detecting voice represented in an audio signal received by the computer system or to be transmitted by the computer system, the voice activity detector including:

peak logic determining an average peak of the audio signal;

a standard deviation generator determining a plurality of standard deviation values of the audio signal, each of the standard deviation values corresponding to one of a plurality of time intervals;

updating logic coupled to receive the standard deviation values and the average peak updating a power density function (PDF) to establish a current state of the PDF according to a relationship between the average peak and a current level of the audio signal and based on the standard deviation values;

noise logic locating a noise floor based on the current state of the PDF; and

decision logic determining that voice is represented in the audio signal when a predetermined relationship exists between the current level of the audio signal and the noise floor.

26. The computer system according to claim 25, wherein each of the time intervals overlaps at least one other time interval.

27. A processing system having capability for duplex audio communication with a remote site, the system comprising:

processor means for controlling the processing system;

input means for inputting audio information to be transmitted to the remote site;

output means for outputting audio information received from the remote site; and

voice detection means for detecting voice in an audio signal received by the processing system or to be transmitted by the processing system, the voice detection means including:

means for determining an approximate peak of the audio signal;

means for determining a plurality of standard deviation values of the audio signal, each of the standard deviation values corresponding to one of a plurality of time intervals;

means for updating a power density function (PDF) to establish a current state of the PDF according to a relationship between the approximate peak and a current level of the audio signal and based on the standard deviation values;

means for locating a noise floor based on the current state of the PDF; and

means for determining that voice is represented in the audio signal if a predetermined relationship exists between the current level of the audio signal and the noise floor.

28. The processing system according to claim 27, wherein each of the time intervals overlaps at least one other time interval.

29. The processing system according to claim 27, wherein the PDF represents a plurality of noise instances.

30. The processing system according to claim 27, wherein the means for updating comprises:

means for determining whether the approximate peak of the audio signal exceeds the current level of the audio signal by a predetermined amount;

means for determining whether all of the standard deviation values calculated during a first time period are below a threshold value; and

means for identifying a noise instance if:

the approximate peak of the audio signal exceeds the current level of the audio signal by a predetermined amount, and

all of the standard deviation values calculated during the first time period are below the threshold value.

31. The processing system according to claim 30, wherein the means for updating further comprises means for modifying the PDF to reflect an additional noise instance if a noise instance was identified.
Description
 


FIELD OF THE INVENTION

The present invention pertains to the field of telecommunications. More particularly, the present invention relates to establishing a noise floor and detecting speech activity in an audio signal.

BACKGROUND OF THE INVENTION

Advances in telecommunications technology are continuously improving the ways in which people carry out both business and personal communications. Such advances include improvements in video conferencing, increased availability of ISDN links and computer networks, and improvements in ordinary telephone service. These technological advances create many design challenges. For example, many telecommunication systems require a solution for distinguishing speech from noise in an audio signal; a device which performs this function has been referred to as a voice activity detector (VAD).

One application for a VAD is in a half-duplex audio communication system used in "open audio", or speakerphone, teleconferencing. Half-duplex transmission is transmission which takes place in only one direction at a given point in time. Therefore, it is a common practice in such a system to temporarily deactivate the microphone at a given site while that site is receiving a transmission and to mute the speaker at either site to eliminate audio feedback being received by the remote site. Consequently, a VAD may be necessary to detect the presence of speech both in the audio signal received from a remote site and in the audio signal to be transmitted to the remote site in order to implement these functions. A VAD may also be used to signal an echo suppression algorithm, to distinguish "voiced" speech from "unvoiced" speech, and in various other aspects of audio communications.

Some existing VADs make use of the communication link itself in detecting speech activity. For example, certain data may be provided to a VAD at one end of the link by "piggybacking" the data on other audio data transmitted from the other end. For various reasons, however, it is not desirable to have a VAD which is dependent upon a remote site in detecting speech. In addition, some existing VADs have undesirably slow response times, frequently misclassify speech, or require excessive processing time.

Another design issue relates to the use of headsets to implement closed audio microphone and speakers in video conferencing. Video conferencing software applications are available which, in general, permit both audio and visual communication between the user of one personal computer and the user of another personal computer via ISDN lines, a LAN, or other channels. One such application is the ProShare™ Personal Conferencing Video System, created by Intel Corporation of Santa Clara, California. Some video conferencing applications are sold precalibrated to support one or more particular models of headsets. This precalibration may be accomplished by including data in the software code relating to the appropriate hardware settings, such as the microphone input gain. However, if the user wishes to use a non-supported headset, he or she must generally go outside of the video conferencing application to the operating system in order to adjust the hardware settings. In doing so, the user essentially must guess at the best hardware settings, often having to readjust the settings by trial and error in order to achieve the optimum settings. Hence, existing hardware calibration solutions provide little flexibility in terms of ability to support multiple different headsets.

In view of these and other design issues, therefore, it is desirable to have a VAD which operates independently of the remote site. It is further desirable that such a VAD provide high-accuracy (infrequent misclassifications), fast response time, adaption to the remote site's fluctuating signal-to-noise ratio, and consistent half-duplex performance when the remote user transitions between open and closed audio modes. In addition, it is desirable to provide a VAD which can be directly used by a hardware calibration solution. Finally, it is desirable to have a hardware calibration solution which automatically adjusts the hardware settings to be appropriate for any headset a user wishes to employ.

SUMMARY OF THE INVENTION

An aspect of the present invention is a method of locating a noise floor for qualifying a signal. The method comprises the step of establishing a noise power density function (NPDF), based on (1) a relationship between an approximate peak level of the signal and a current level of the signal, and (2) a number of standard deviation values of the signal. Each of the standard deviation values corresponds to one of a number of time intervals. The method further comprises the steps of repeatedly updating the NPDF to a current state, and using the current state of the NPDF to locate the noise floor.

Another aspect of the present invention is a method of detecting speech in an audio signal. The method comprises the steps of: (1) determining an average peak value of the audio signal; (2) determining a number of standard deviation values of the audio signal, each of which corresponds to one of a number of time intervals; (3) updating a power density function (PDF) to a current state of the PDF, according to: (a) the relationship between the average peak and a current level of the audio signal, and (b) the standard deviation values; (4) locating a noise floor based on the current state of the PDF; and (5) if a certain relationship exists between the current level of the audio signal and the noise floor, determining that speech activity is present in the audio signal.
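The update rule summarized above can be illustrated with a minimal Python sketch. It follows the noise-instance test recited in the claims (the average peak exceeds the current level by a predetermined amount, and all standard deviation values in the period are below a threshold). The function names, the dictionary-based histogram, and the numeric thresholds are illustrative assumptions and are not taken from the patent.

```python
def is_noise_instance(avg_peak, current_level, sd_values, peak_margin, sd_threshold):
    """Return True when both noise conditions hold (names are hypothetical)."""
    # Condition 1: the average peak exceeds the current level by a set amount.
    peak_condition = (avg_peak - current_level) > peak_margin
    # Condition 2: every standard deviation value in the period is below the threshold.
    # An empty period is treated as "not enough evidence" (an assumption).
    sd_condition = bool(sd_values) and all(sd < sd_threshold for sd in sd_values)
    return peak_condition and sd_condition


def add_noise_instance(pdf, current_level, weight=1):
    """Add one noise instance to a histogram keyed by integer energy bin."""
    bin_index = int(current_level)  # assumed bin width of 1 energy unit
    pdf[bin_index] = pdf.get(bin_index, 0) + weight
    return pdf


pdf = {}
if is_noise_instance(avg_peak=10.0, current_level=2.7,
                     sd_values=[0.1, 0.2], peak_margin=5.0, sd_threshold=0.5):
    add_noise_instance(pdf, 2.7)
```

The `weight` argument is one possible place to apply the low and high confidence factors mentioned in the abstract; how those factors are actually applied is not specified in this section.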

Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follows.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 illustrates a computer system in which the present invention can be implemented.

FIG. 2 illustrates the data flow associated with speech detection and automatic calibration of a microphone in a computer system using half-duplex audio communication.

FIG. 3 illustrates a waveform of an audio signal having speech activity.

FIGS. 4A and 4B illustrate the function of a voice activity detector (VAD).

FIG. 4C is a block diagram of a voice activity detector.

FIG. 5 is a flowchart illustrating the overall operation of a voice activity detector.

FIG. 6 illustrates a noise power density function.

FIG. 7 is a flowchart illustrating a process of determining and updating a noise floor.

FIG. 8 illustrates a prior art approach to calculating the standard deviation of the energy of an input audio signal.

FIG. 9A illustrates an approach to calculating the standard deviation of an audio signal according to the present invention.

FIG. 9B illustrates a plot of the standard deviation of an input audio signal over time.

FIG. 10 illustrates a waveform of an input audio signal and a plot of the average peak of the input audio signal.

FIG. 11 is a flowchart illustrating a process of calculating an average peak of an input audio signal.

FIG. 12 is a flowchart illustrating a process of determining whether an input signal contains only noise and updating a noise power density function.

FIG. 13 illustrates a waveform of an input audio signal showing a comparison of the sample windows used in calculating average energy, standard deviation, and average peak of the input audio signal.

FIG. 14 is a flowchart illustrating a process for determining whether speech is present in an input audio signal.

FIG. 15 illustrates a power density function of an input audio signal containing noise energy and speech energy.

FIG. 16 is a flowchart illustrating a process for automatically calibrating a microphone of a headset.

FIG. 17 is a flowchart illustrating a process for eliminating erroneous data during automatic calibration of a microphone.

FIGS. 18A and 18B illustrate processes for adjusting hardware settings during automatic calibration of a microphone.

DETAILED DESCRIPTION

A method and apparatus for establishing a noise floor and for detecting speech activity in an audio signal is described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be evident, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

The present invention is implemented in a computer system 1 having half-duplex audio communication with at least one other computer system through an audio channel 95, as illustrated in FIG. 1. The audio channel 95 may be an Integrated Services Digital Network (ISDN) link or a standard computer local area network (LAN), or an analog phone system. The computer system 1 includes a central processing unit 10, a disk storage device 20, a keyboard 30, a memory 40, an audio input/output (I/O) subsystem 50, a cursor control device 60, a display 70, a video I/O subsystem 80 receiving input from a video camera 85, and an interface device 90, such as a modem, providing an interface between the computer system 1 and the audio channel 95. The audio I/O subsystem 50 is coupled to a speaker 52 and a microphone 53 for open audio communication and to a headset 51 having both a speaker and a microphone for closed audio communication. The cursor control device 60 may be a mouse, trackball, light pen, stylus/graphics tablet, or other similar device. The disk storage device 20 may be a magnetic disk, CD-ROM, or other alternative data storage device.

FIG. 2 illustrates the data flow associated with operation of the present invention. The present invention is implemented in a voice activity detector (VAD) receive channel 210, a VAD transmit channel 211, and an autocalibrator 230, each of which may be embodied in software stored in memory 40 or on the disk storage device 20, or in equivalent circuitry. In FIG. 2, compressed audio data is received by the computer system 1 from the audio channel 95 and input to decompression unit 220. Signal AUDIO RX, which contains decompressed audio data, is then output by decompression unit 220 to half-duplex receive channel 200 and to VAD receive channel 210. The energy E of the signal AUDIO RX has a waveform similar to that illustrated in FIG. 3. In FIG. 3, the portion 301 of the waveform which exceeds a noise floor NF is considered to be speech energy, whereas the portions 302 of the waveform not exceeding the noise floor NF are considered to be only noise energy. The VAD receive channel 210 receives signal AUDIO RX as input and generates an output RXO to half-duplex receive channel 200 indicating whether or not the signal AUDIO RX contains speech at any given point in time.

The half-duplex receive channel 200 selectively passes on the signal AUDIO RX to audio front-end output circuitry 252 depending upon the output RXO of the VAD receive channel 210. Audio data passed on to audio front-end output circuitry 252 is processed and sent to the speaker 52. Referring now to FIG. 4A, if the VAD receive channel 210 indicates to the half-duplex receive channel 200 that speech is present in the signal AUDIO RX in step 401, then the half-duplex receive channel 200 communicates with half-duplex transmit channel 201 to cause the microphone 53 to be muted in step 402. The microphone 53 remains muted until speech is no longer detected in the signal AUDIO RX.

Referring again to FIG. 2, sound to be transmitted across the audio channel 95 is input by a user either through the microphone of the headset 51 or through the open audio microphone 53 into audio front-end input circuitry 253, which outputs the signal AUDIO TX. The energy E of signal AUDIO TX, as with signal AUDIO RX, has a form similar to that depicted in FIG. 3. The signal AUDIO TX is provided to VAD transmit channel 211 and to half-duplex transmit channel 201. Half-duplex channel 201 selectively passes on the signal AUDIO TX to compression unit 222 for transmission across the audio channel 95, depending upon an input TXO received from the VAD transmit channel 211 indicating whether or not speech is present in signal AUDIO TX. Referring now to FIG. 4B, if half-duplex transmit channel 201 receives an input TXO from VAD transmit channel 211 indicating that speech is present in signal AUDIO TX in step 404, then half-duplex transmit channel 201 communicates with half-duplex receive channel 200 to cause the half-duplex receive channel 200 to mute the speaker 52 in step 405. The speaker 52 remains muted until speech is no longer detected in the signal AUDIO TX.
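The muting behavior of FIGS. 4A and 4B can be sketched as a small Python function. This is only an illustration under stated assumptions: the function and key names are hypothetical, the flags `rxo_speech` and `txo_speech` stand in for the signals RXO and TXO, and the description does not say how simultaneous speech on both channels is resolved, so the sketch simply reports both mute flags.

```python
def half_duplex_controls(rxo_speech, txo_speech):
    """Map the two VAD outputs to mute decisions (hypothetical helper).

    rxo_speech: True while speech is detected in AUDIO RX (signal RXO).
    txo_speech: True while speech is detected in AUDIO TX (signal TXO).
    """
    return {
        # FIG. 4A: speech received from the remote site mutes the microphone.
        "mute_microphone": bool(rxo_speech),
        # FIG. 4B: speech to be transmitted mutes the speaker.
        "mute_speaker": bool(txo_speech),
    }


def gate(frame, speech_present):
    """Pass a frame onward only when the VAD reports speech.

    Assumption: "selectively passes" is read here as passing frames
    that contain speech; anything else is returned as None.
    """
    return frame if speech_present else None
```

A real half-duplex channel would also need a policy for the case where both flags are set at once; that policy is outside what this passage describes.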

Referring again to FIG. 2, autocalibrator 230 automatically calibrates headset 51 in response to a user input entered through a graphical user interface (GUI) 240 in a manner which is not dependent upon the particular make or model of headset 51. Autocalibrator 230 receives a user input UI from the GUI 240 and the signal TXO from the VAD transmit channel 211. Autocalibrator 230 outputs a first calibration signal CAL1 to the audio front-end input circuitry 253 and a second calibration signal CAL2 to the memory 40 and the disk storage device 20. The signal CAL1 is used to calibrate the audio front-end input circuitry 253, and the signal CAL2 is used to store the appropriate hardware settings on the disk storage device 20 or in the memory 40.

Although VAD receive channel 210 and VAD transmit channel 211 have thus far been illustrated and described separately, they perform essentially identical functions and are each hereinafter represented interchangeably by the VAD 410 illustrated in FIG. 4C. The VAD 410 receives an input audio signal AUDIN, which represents either signal AUDIO RX or signal AUDIO TX, and outputs a signal VADOUT, which represents either signal RXO or signal TXO and which indicates whether speech is present in the input signal AUDIN. Referring now to FIG. 5, a flow chart is shown illustrating the overall function of the VAD 410. The function of the VAD 410 consists generally of two steps. In step 501, a noise floor NF is established. Next, in step 502, the VAD 410 determines whether speech is present in the input signal AUDIN based upon the relationship of the input signal AUDIN to the noise floor NF. In the preferred embodiment, steps 501 and 502 are each repeated once every 20 milliseconds (msec).
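The two-step cycle of FIG. 5 can be sketched in Python as a loop that runs once per 20 msec frame. The callables `establish_noise_floor` and `speech_present` are hypothetical placeholders for steps 501 and 502; their internals are described later in the specification and are not implemented here.

```python
def run_vad(frames, establish_noise_floor, speech_present):
    """Run the two-step VAD cycle once per frame and collect VADOUT values.

    frames: iterable of per-frame inputs (one per 20 msec interval).
    establish_noise_floor: callable(frame) -> noise floor NF (step 501).
    speech_present: callable(frame, nf) -> bool (step 502).
    """
    decisions = []
    for frame in frames:
        nf = establish_noise_floor(frame)            # step 501
        decisions.append(speech_present(frame, nf))  # step 502
    return decisions


# Toy usage: each "frame" is a single energy value and NF is fixed at 5.
vadout = run_vad([1, 9, 2],
                 establish_noise_floor=lambda frame: 5,
                 speech_present=lambda frame, nf: frame > nf)
```

Passing the two steps in as functions keeps the loop independent of how the noise floor is tracked, which matches the way the description treats the receive and transmit channels as interchangeable instances of the same detector.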

The VAD 410 continuously recomputes the noise floor NF in determining whether speech is present in the input signal, as will be described in greater detail below. The noise floor is generated based on a noise power density function (NPDF) which is created and updated by the VAD 410. The energy level of the noise floor NF is based upon the current state of the NPDF at any given point in time. An example of an NPDF is illustrated in FIG. 6. The noise floor NF is taken to be the mean energy value of the NPDF, i.e., the mean noise energy level (MNEL), plus a margin value MV. In the preferred embodiment, the input signal AUDIN is sampled by the VAD 410 at a rate of 8 kHz and the NPDF is updated every 20 msec. Consequently, the input signal AUDIN is sampled 160 times for every 20 msec time interval.
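The relationship NF = MNEL + MV and the 160-samples-per-update figure can be checked with a short Python sketch. The histogram representation (a dictionary mapping an energy level to a count of noise instances) and the example values are assumptions made for illustration; the patent does not specify a data structure here.

```python
SAMPLE_RATE_HZ = 8000       # sampling rate in the preferred embodiment
UPDATE_INTERVAL_MS = 20     # NPDF update interval in the preferred embodiment
SAMPLES_PER_UPDATE = SAMPLE_RATE_HZ * UPDATE_INTERVAL_MS // 1000  # 8000 * 0.020 = 160


def mean_noise_energy_level(npdf):
    """Weighted mean energy of the NPDF (the MNEL).

    npdf: dict mapping energy level -> count of noise instances (assumed layout).
    """
    total = sum(npdf.values())
    if total == 0:
        raise ValueError("NPDF contains no noise instances")
    return sum(level * count for level, count in npdf.items()) / total


def noise_floor(npdf, margin_value):
    """Noise floor NF = MNEL + MV."""
    return mean_noise_energy_level(npdf) + margin_value


example_npdf = {10: 1, 20: 3}          # hypothetical noise histogram
nf = noise_floor(example_npdf, 2.5)    # (10*1 + 20*3) / 4 + 2.5 = 20.0
```

Raising an error on an empty NPDF is a design choice of this sketch; a running detector would more likely fall back to an initial estimate until enough noise instances have accumulated.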

The VAD 410 uses both the standard deviation of the energy of the input signal over a period of time as well as the current energy level of the input signal at a particular point in time to update the NPDF. A "sliding window" of time is used in gathering samples of the input signal's energy to generate each new value of the standard deviation SD. That is, each calculated value of standard deviation SD is based upon a sample period which overlaps at least one previous sample period, as illustrated in FIG. 9A and as will be further discussed below. In the preferred embodiment, a sample period of 500 msec is used to generate each standard deviation value SD. This period of 500 msec is updated every 20 msec in order to achieve a fast response time of the VAD 410. Because such short time periods are used, the current energy level E is examined in comparison to