Multi-rate digital voice coder apparatus    
United States Patent 4890327
Link to this page: http://www.wikipatents.com/4890327.html
Inventor(s): Bertrand; John (Upper Nyack, NY); Noah; Matthew J. (Detroit Lakes, MN)
Abstract: An analog to digital converter for a speech signal is implemented in modules to allow for changes in bit rate and changes in bit stream length according to requirements of the digital transmission system. A pre-emphasis circuit provides an array of pre-emphasized speech samples which are stored in memory. A linear predictive coder provides an array of reflection coefficients and an array of filter coefficients. A pulse processor receives the speech samples and filter coefficients and generates speech amplitude and location signals. These signals are multiplied to generate quantized speech samples. The quantized speech samples and reflection coefficients are provided to a buffer which provides an output signal of a proper bit stream length and bit rate for the digital transmission system.



Owner/Assignee     ITT Corporation (New York, NY)
Publication Date     December 26, 1989
Application Number     07/057,474
Filing Date     June 3, 1987
Examiner     Clark; David L.
Assistant Examiner     Merecki; John A.
Attorney/Law Firm     Twomey; Thomas N. Werner; Mary C.
Patent Tags     multi-rate digital voice coder
   
U.S. References

Patent No.    Inventor    U.S. Class    Date
4720861       Bertrand    704/222       Jan. 1988
4720865       Taguchi     704/216       Jan. 1988
4716592       Ozawa       704/216       Dec. 1987
4710959       Feldman     704/207       Dec. 1987
4669120       Ono         704/216       May 1987
4472832       Atal        704/221       Sep. 1984
Claims
 


What is claimed is:

1. Apparatus for converting analog speech into a digital signal for transmission of said digital signal over a conventional communications channel, comprising:

pre-emphasis means responsive to said analog speech at an input for providing at an output an array of pre-emphasized speech samples,

memory means coupled to said pre-emphasis means for storing said array of samples in contiguous storage locations,

linear predictive coder means coupled to the output of said memory means and responsive to said stored samples for providing a first array of reflection coefficients at a first output and a second array of filter coefficients at a second output,

pole broadening means coupled to said linear predictive coder means and responsive to said filter coefficient array for providing an array of filter coefficients having a broadened bandwidth including means for multiplying each of said filter coefficients in said array by a given factor,

a pre-emphasis correction means coupled to said pole broadening means for receiving at an input said array of broadened bandwidth filter coefficients for providing at an output an array of corrected filter coefficients,

pulse processing means coupled to said pre-emphasis means and said pre-emphasis correction means and responsive to said pre-emphasis speech samples and said corrected filter coefficients for providing at a first output a first series of pulses indicative of pulse amplitude and at a second output a second series of pulses indicative of pulse location,

encoder means coupled to said first and second outputs of said pulse processing means for providing a stream of pulses indicative of a product code of said first and second series of pulses, and

output buffer means having a first input coupled to said first output of said linear predictive coding means for receiving said reflection coefficients and a second input coupled to said encoder means for receiving said stream of pulses for providing at an output a digital signal of a given length bit stream having a bit rate determined according to said communications channel.

2. The apparatus according to claim 1, further including:

a noise broadening means between said pre-emphasis correction means and said pulse processing means, said noise broadening means responsive to said corrected filter coefficients and including multiplier means for multiplying each corrected filter coefficient by a given multiplication factor and providing to said pulse processing means an array of noise broadened filter coefficients; and

said pulse processing means is responsive to said noise broadened filter coefficients for providing said first and second series of pulses.

3. The apparatus according to claim 2, wherein said pulse processing means comprises:

a noise shaping means having one input for receiving said pre-emphasized speech samples, and having another input coupled to said pre-emphasis correction means for receiving said corrected filter coefficients and another input coupled to said noise broadening means for receiving said noise broadened filter coefficients to provide at an output an array of noise shaped speech samples according to a given pole-zero filter format.

4. The apparatus according to claim 3, wherein said pulse processing means further comprises:

impulse response means coupled to said noise broadening means for providing at an output an impulse response according to said filter format.

5. The apparatus according to claim 4, wherein said pulse processing means further comprises:

auto-correlation means coupled to said impulse response means for providing at an output the auto-correlation signal of said filter format.

6. The apparatus according to claim 5, wherein said pulse processing means further comprises:

cross-correlation means coupled to said impulse response means and said noise shaping means for providing at an output the cross-correlation signal between said noise shaped speech and said impulse response.

7. The apparatus according to claim 6, wherein said pulse processing means further comprises:

pick pulse means coupled to said cross-correlation means and including correlation update means coupled to said cross-correlation means to provide at an output an array indicative of pulse amplitude and location according to a search of the maximum cross-correlation for determining the location and amplitude of the next pulse, wherein said correlation update means scales said impulse response auto-correlation by a value related to pulse amplitude.

8. The apparatus according to claim 7, wherein said pulse processing means further comprises:

an add pulse means having an input coupled to the output of said pick pulse means for providing a first array indicative of pulse location and a second array indicative of pulse amplitude and including means for storing said arrays.

9. The apparatus according to claim 8, wherein said pulse processing means further comprises:

overhang processing means coupled to said impulse response means for providing at an output a signal indicative of the overlap between framed speech.

10. The apparatus according to claim 9, wherein said pulse processing means further comprises:

receiving means coupled to said channel for receiving said digital signal as provided at said output of said buffer means, including:

input buffer means for storing said digital signal as a stored digital signal, means for reading said stored digital signal at a given bit rate for each frame, a linear predictive (LPC) decoder means coupled to said input buffer means for providing decoding filter coefficients from said stored digital signal, a pulse decoder means coupled to said input buffer for receiving said stored digital signal and for providing pulse amplitude and location signals to an excitation format means;

said excitation format means providing an excitation array indicative of pulse position and amplitude,

a linear predictive synthesis filter means for receiving said decoding filter coefficients and for receiving said excitation array for providing at an output an analog speech signal.

11. The apparatus according to claim 10, further including:

decoder pre-emphasis correction means for receiving said decoding filter coefficients and providing corrected decoding filter coefficients to said linear predictive synthesis filter means.

12. Apparatus for converting analog speech into a digital signal for transmission of said digital signal over a conventional communications channel, comprising:

an analog to digital converter for converting said analog speech into digitized speech,

pre-emphasis means responsive to said digitized speech for providing an array of pre-emphasized speech samples,

memory means coupled to said pre-emphasis means for storing said array of samples,

linear predictive coder means coupled to said memory means and responsive to said stored samples for providing a first array of reflection coefficients and a second array of filter coefficients,

pole broadening means coupled to said linear predictive coder means and responsive to said second array of filter coefficients for providing an array of filter coefficients having a broadened bandwidth, said pole broadening means including means for multiplying each of said filter coefficients in said second array of filter coefficients by a given factor, and

a pre-emphasis correction means coupled to said pole broadening means for receiving said array of broadened bandwidth filter coefficients for providing an array of corrected filter coefficients,

pulse processing means coupled to said pre-emphasis means and said pre-emphasis correction means and responsive to said array of pre-emphasized speech samples and said corrected filter coefficients for providing a first series of pulses indicative of pulse amplitude and a second series of pulses indicative of pulse location,

encoder means coupled to said pulse processing means for receiving said first and second series of pulses and for providing a stream of pulses indicative of a product code of said first and second series of pulses,

output buffer means coupled to said linear predictive coding means for receiving said reflection coefficients and coupled to said encoder means for receiving said stream of pulses for providing at an output a digital signal of a given length bit stream having a bit rate determined according to said communications channel.

13. The apparatus according to claim 12, further including:

a noise broadening means responsive to said corrected filter coefficients for providing to said pulse processing means an array of noise broadened coefficients, said noise broadening means including multiplier means for multiplying each corrected filter coefficient by a given multiplication factor to provide said array of noise broadened coefficients.

14. The apparatus according to claim 13, wherein said pulse processing means further comprises:

a noise shaping means for receiving said pre-emphasized speech samples, and for receiving said corrected filter coefficients and for receiving said noise broadened coefficients for providing an array of noise shaped speech samples according to a given pole-zero filter format.

15. Apparatus for converting an analog speech signal into a digital signal, comprising:

a pre-emphasizer to an analog speech input, said pre-emphasizer providing a digital speech sample array;

a linear predictive coder for receiving said digital speech sample array and providing a reflection coefficient digital signal and a filter coefficient digital signal;

a pole broadener for receiving said filter coefficient digital signal and providing a pole broadened filter coefficient signal;

a pre-emphasis corrector for receiving said pole broadened filter coefficient signal and providing a corrected filter coefficient signal;

a pulse processor for receiving said corrected filter coefficient signal and said digital speech sample array, said pulse processor generating a first pulse array of amplitude indicating pulses and a second pulse array of position indicating pulses and providing a digital product code indicative of the product of said first pulse array of amplitude indicating pulses and said second pulse array of position indicating pulses;

an output means for receiving said digital product code and said reflection coefficient digital signal, said output means providing a digital output signal of a given length bit stream and having a predetermined bit rate and representative of said analog speech signal.
Description
 


BACKGROUND OF THE INVENTION

This invention relates to apparatus for digitizing analog speech and more particularly to apparatus for providing compressed speech to allow transmission of such compressed speech over conventional communication channels.

Presently, many modern switching systems employ digital data which is transmitted from a first location to a second location through a digital switching system. In such systems, digital signals are employed throughout the system in order to increase system reliability and to further alleviate many of the problems involved with the transmission of analog data. In this manner conventional analog signals are converted to digital signals such as pulse code modulated signals and are transmitted through the switching network over existing communications channels.

As one can ascertain, such switching networks accommodate various transmission capabilities. In this manner, the number of bits as well as the bit rate of the signal varies according to the particular modems employed and in regard to the capacity of the transmission lines associated with such a system. A basic problem which has existed with regard to the digitization and transmission of analog speech involves the fact that the analog speech typically resides in a frequency range from zero to around 3 KHZ. In regard to digitizing such speech one must use a rate which is high enough to satisfy the Nyquist criterion of sampling and hence employ a frequency of twice the bandwidth. That would result in a sampling rate of approximately 8 KHZ.

Assuming that 10 bits would be sufficient to describe the amplitude of the speech wave for each sample, the required bit transmission rate would be 80 kilobits per second. This, for example, is not capable of being handled by conventional telephone lines. The prior art is cognizant of such problems and employed a technique designated as linear predictive coding (LPC). Linear predictive coding uses a parametric model of the human vocal system to encode speech. This model describes speech production as being controlled by three factors: the excitation source, the energy or gain of the signal, and the shape of the acoustic cavity from the epiglottis to the lips. Speech signals can be either voiced, such as the A in Ape, or unvoiced, such as the S in Sister.

In any event, the excitation mechanism for the voiced signal is modeled by a series of pulses separated by a fixed pitch. The excitation source for the unvoiced signal is modeled as a noise generator. The shape of the acoustic cavity is represented by a plurality of resonant circuits tuned to give information regarding the natural frequencies of the analog speech. The linear predictive coding technique takes advantage of the fact that many speech parameters will not change for a considerable number of samples during a typical speech pattern. Thus, linear predictive coding models typically use an analysis frame containing many samples to arrive at a composite profile for the speech frame before transmitting information on the channel. A commonly used analysis frame duration is 180 samples.
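The source model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's circuit: the function names, the default pitch period of 80 samples, the uniform noise source, and the sign convention of the all-pole filter (a[0] equal to one, feedback terms subtracted) are all hypothetical choices made for the example.

```python
import random

def excitation(voiced, n_samples, pitch_period=80, gain=1.0, seed=0):
    """Return one frame of excitation for a two-state source model.

    pitch_period, gain and seed are illustrative parameters, not values
    taken from the patent.
    """
    if voiced:
        # Voiced speech: pulses separated by a fixed pitch period.
        return [gain if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    # Unvoiced speech: a noise generator (uniform noise assumed here).
    rng = random.Random(seed)
    return [gain * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def synthesize(excite, a):
    """All-pole resonator bank: s[n] = e[n] - sum(a[k] * s[n-k]) for k >= 1.

    a[0] is assumed to be 1 and is not used in the recursion.
    """
    out = []
    for n, e in enumerate(excite):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out.append(acc)
    return out
```

A 180-sample voiced frame with a pitch period of 80 contains pulses at samples 0, 80 and 160; passing it through `synthesize` with a set of filter coefficients shapes those pulses the way the resonant circuits shape the glottal excitation in the model.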

Thus, the channel bit transmission rate can be of the order of a few kilobits per second, a rate which channels such as ordinary telephone lines are capable of transmitting. The linear predictive coding technique has been discussed in many technical papers. For example, see an article by A. Buzo et al. entitled "Speech Coding Based on Vector Quantization", I.E.E.E. Transactions on ASSP, Oct. 1980. See also an article by B. S. Atal and J. M. Remde entitled "A New Model of LPC Excitation. . .", Proceedings 1982 ICASSP, pages 614-617. See also an article by Parker et al. entitled "Low Bit Rate Speech Enhancement. . .", Proceedings 1984 ICASSP, pages 1.5.1-1.5.4.
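The arithmetic behind the two rates discussed above can be checked with a short Python sketch. The 8 kHz sampling rate, 10 bits per sample and 180-sample frame come from the text; the figure of 54 bits per frame is an assumption chosen only to illustrate a frame-based rate of a few kilobits per second, and the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate when every sample is transmitted directly."""
    return sample_rate_hz * bits_per_sample

def frame_bit_rate(sample_rate_hz, frame_samples, bits_per_frame):
    """Bit rate when one block of parameters is sent per analysis frame."""
    return sample_rate_hz * bits_per_frame / frame_samples

def frame_duration_ms(sample_rate_hz, frame_samples):
    """Length of one analysis frame in milliseconds."""
    return 1000.0 * frame_samples / sample_rate_hz

direct = pcm_bit_rate(8000, 10)           # sample-by-sample transmission
framed = frame_bit_rate(8000, 180, 54)    # 54 bits per frame is an assumed value
duration = frame_duration_ms(8000, 180)   # frame length at 8 kHz
```

Under these assumptions the direct rate is 80,000 bits per second, while one 180-sample frame lasts 22.5 ms and the framed rate is 2,400 bits per second.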

As one can ascertain from the prior art, there are problems in transmitting digitized speech over transmission lines or telephone lines. There is a desire to transmit digitized speech of high quality at required bit rates or at multiple rates according to the qualities and characteristics of the switching system or the transmission medium. In providing multiple rate capability, one must assure that the speech processing in regard to quality is suitable for purposes of reconverting the digitized speech back into analog signals without losing excessive information content.

The prior art was cognizant of providing apparatus wherein analog speech was digitized and transmitted over a channel at a minimum bit rate and yet allowing such speech to be synthesized at the receiver end with high intelligibility and quality. In any event, as indicated above, based on modern communication systems, such as digital switching systems employing digital transmissions, one must provide the digitization of analog speech in a digital format which is capable of providing high speech quality with the required bit rate and having the further capability of varying the rate to accommodate different modems or different transmission requirements. For examples of certain prior art techniques, reference is made to a patent application entitled DIGITAL SPEECH CODING CIRCUIT filed on Dec. 24, 1985 for J. Bertrand as Serial No. 813,110 and assigned to the assignee herein, now U.S. Pat. No. 4,720,861, issued Jan. 19, 1988.

This application relates to a digital speech coding apparatus circuit which makes use of linear predictive coding, vector quantization, Huffman coding, and excitation estimation to produce digital representations of human speech having bit rates low enough to be transmitted over telephone lines and at the same time capable of being synthesized in the receiver portion of the circuit to produce analog speech of high intelligibility and quality.

The transmitter portion of the circuit comprises a series connection of a lowpass filter, analog-to-digital converter, a linear predictive coding module comprising five resonators for establishing five center frequencies and bandwidths of the analog speech, a vector quantization module for providing a binary representation of the likely combinations of resonance found in human speech, a Huffman coding module, a variable bit rate to fixed bit rate converter and optionally an encryption module. Another branch of the transmitter circuit extends from the output of the analog to digital converter to the bit rate converter and comprises a series combination of an inverse filter and an excitation estimation module having parallel outputs respectively representative of a voiced/unvoiced signal, the excitation amplitude, and the excitation pulse position. The receiver portion of the circuit comprises a series connection of a fixed bit rate to variable rate converter, a bit unmapping module which produces separate outputs representative of the reflection coefficients and excitation of the speech. The synthesis filter which receives these outputs produces a digital signal representative of the analog speech and converts the signal to audio by a digital to analog converter and a lowpass filter.

As indicated, the prior art is cognizant of the necessity of providing digital speech coders and reference is also made to U.S. Pat. No. 4,472,832 issued on Sept. 18, 1984 to B. S. Atal et al and entitled DIGITAL SPEECH CODER. In that patent there is shown a speech analysis and synthesis system where an LPC parameter and a modified residual signal for excitation is transmitted. The excitation signal is the crosscorrelation of the residual signal and the LPC recreated original signal. Essentially, the patent recognizes the fact that digital speech communication systems including voice storage and voice response facilities may utilize signal compression to reduce the bit rate needed for storage and/or transmission.

The patent then describes a sequential pattern processing arrangement in which the sequential pattern is partitioned into successive time intervals. In each time interval a set of signals representative of the interval sequential pattern and a signal representative of the differences between the interval sequential pattern and the interval representative signal are generated.

The speech pattern is partitioned in successive time intervals. In each interval a set of signals representative of the speech pattern and a signal representative of the differences between the interval speech pattern are generated.

In this manner one can obtain a compression of speech after the speech has been digitized. Thus, as indicated, the prior art has been concerned with the problem and concerned with devices which enable one to compress speech to allow transmission without sacrificing speech quality. See also an article entitled "Improved Pulse Search Algorithms For Multi-Pulse Excited Speech Coder" by S. Ono, T. Araseki, and K. Ozawa of the NEC Corporation of Japan, published 1984 at the Globe Com Conference in Atlanta, Ga.

It is an object of the present invention to provide a multi-rate digital voice coder which voice coder allows one to compress speech to allow digital speech to be transmitted over conventional communications channels such as telephone links.

It is a further object of the present invention to provide a multi-rate digital voice coder apparatus which enables one to preserve high speech quality after digitization which digitized signal is capable of being transmitted at different rates for accommodating different transmission channels.

It is a further object of the present invention to provide a multi-rate digital voice coder apparatus which enables one to provide compressed speech for more efficient digital transmission and storage.

BRIEF DESCRIPTION OF PREFERRED EMBODIMENT

Apparatus for converting analog speech into a digital signal for transmission of said digital signal over a conventional communications channel, comprising pre-emphasis means responsive to said analog speech at an input and operative to provide at an output an array of pre-emphasized speech samples, memory means coupled to said pre-emphasis means for storing said array of samples in contiguous storage locations, linear predictive coder means coupled to said pre-emphasis means and said memory means and responsive to said stored samples to provide a first array of reflection coefficients at a first output and a second array of filter coefficients at a second output, pulse processing means coupled to said pre-emphasis means and said linear predictive coder means and responsive to said speech samples and said filter coefficients to provide at a first output a first series of pulses indicative of speech amplitude and at a second output a second series of pulses indicative of speech location and including encoder means coupled to said first and second outputs for providing a stream of pulses indicative of a product code of said first and second series of pulses indicative of quantized speech samples, output buffer means having a first input coupled to said first output of said linear predictive coding means for receiving said reflection coefficients and a second input coupled to said pulse processing means for receiving said stream of pulses for providing at an output a digital signal of a given length bit stream having a bit rate determined according to said communications channel.

BRIEF DESCRIPTION OF FIGURES

FIG. 1 is a block diagram showing a transmitter analysis section of a multi-rate digital voice coder according to this invention.

FIG. 2 is a detailed block diagram showing an LPC analyzer section associated with the module shown in FIG. 1.

FIG. 3 is a detailed block diagram showing the pulse finding section of the module depicted in FIG. 1.

FIG. 4 is a block diagram depicting the receiver or synthesis section of the multi-rate digital voice coder.

DETAILED DESCRIPTION OF FIGURES

Referring to FIG. 1, there is shown a block diagram of a portion of a multi-pulse linear predictive coder (MPLPC). The coder to be described is capable of providing multi-rate digitized bit formats which are indicative of digitized voice signals and which are capable of being transmitted to a conventional modem.

The block diagram of FIG. 1 shows the MPLPC transmitting and analyzing section. The module shown in FIG. 1 and which will be described is capable of converting analog speech to a digital format and outputting the digital format at variable bit rates and variable transmission rates to accommodate different modems or different transmission channels.

As shown in FIG. 1, incoming speech is first directed to a module 10 designated as EXEC which essentially is an execution module as will be further explained. The module 10 is coupled to a module 11 designated as INIT. This module is an analysis initialization module and essentially serves to initialize the system prior to processing of speech. The output of the EXEC module 10 is directed to an LPC module 12. The function of the LPC module is to derive a linear predictive code from the speech samples.

Speech output of the EXEC module 10 is also directed to an input of a pulse-finder module 14. The pulse-finder module 14 receives another input from the LPC module 12. As will be explained, the output of the pulse-finder module 14 provides a series of pulses indicative of the processed speech. These pulses are directed to a pulse encoder 15. An output buffer 16 receives one output from the LPC module 12 and one output from the pulse encoder 15. The output buffer 16, as will be explained, stores and transmits the information from the LPC module 12 and the pulse encoder module 15 to produce a digital stream at a given bit rate and at a given transmission rate for application to a modem or communications channel.

As will be further explained, the rates of the digital stream can be varied accordingly to accommodate various transmission requirements. It is immediately understood as it is conventional with speech processing circuitry that each and every module as for example shown in FIG. 1 can be implemented by means of microprocessors and hence the functions to be described can be implemented by either hardware or software.

As will be further explained each of the modules in FIG. 1 has a well defined boundary with specific inputs and outputs. In most cases it is possible to exchange a function with a substitute function to obtain a modification of system operation. For example, the module marked pulse encoder as 15 of FIG. 1 could represent a simple scalar quantization of the pulse locations and amplitudes. This could be exchanged with a more sophisticated type of quantizer.
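The idea of a replaceable pulse encoder can be illustrated with a small Python sketch. It assumes a uniform scalar quantizer for the pulse amplitudes and leaves the pulse locations as integers; the function names, the 4-bit amplitude resolution and the amplitude range are hypothetical and are not taken from the patent.

```python
def scalar_encode(locations, amplitudes, amp_max=1.0, amp_bits=4):
    """Uniformly quantize each pulse amplitude; keep locations as integers.

    amp_max and amp_bits are illustrative parameters only.
    """
    levels = (1 << amp_bits) - 1
    codes = []
    for loc, amp in zip(locations, amplitudes):
        clipped = max(-amp_max, min(amp_max, amp))
        index = round((clipped + amp_max) / (2 * amp_max) * levels)
        codes.append((loc, index))
    return codes

def scalar_decode(codes, amp_max=1.0, amp_bits=4):
    """Map each quantizer index back to a reconstructed amplitude."""
    levels = (1 << amp_bits) - 1
    return [(loc, index / levels * 2 * amp_max - amp_max) for loc, index in codes]

def encode_frame(locations, amplitudes, pulse_encoder=scalar_encode):
    """The encoder is a parameter, so another quantizer can be substituted."""
    return pulse_encoder(locations, amplitudes)
```

Because `encode_frame` only depends on the encoder's inputs and outputs, swapping in a more sophisticated quantizer would not require changes to the surrounding modules, which is the property the modular structure is meant to provide.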

Essentially, a major feature of the present invention as will be explained is based on the modular structure of the architecture which can, as indicated, be implemented by conventional integrated circuitry or by means of suitable software programs. The modularity leads to the ease of accommodating different system requirements. In this manner, each module will be discussed and defined in terms of its function, its inputs and outputs and hence the exact nature of the module is thus determined.

In the following discussion, a variable name is given in capital letters, for example LIR. The value of that variable is written as the variable name preceded by *, e.g. *LIR. The name of a variable also denotes its memory address; for example, the external data memory address of the variable LIR is LIR. One memory location greater than LIR has the address LIR+1. If 16 is the value of the variable LIR then *LIR=16.

Referring to FIG. 2, there is shown a more detailed block diagram showing the processing of speech as performed for example by the modules of FIG. 1. In FIG. 2, there is shown a pre-emphasis module 20. Essentially, the pre-emphasis module 20 is contained within the EXEC module 10 of FIG. 1 which is again coupled to the analysis initialization or INIT module 11.

Inputs for the Pre-Emphasis module 20 all come from the EXEC module 10 and the Analysis Initialization module INIT 11. The EXEC module 10 provides N samples of speech stored contiguously in an external data memory 30 starting at a location referenced by the base name ATODIN. The number of samples N, is given by the variable LFRAME. LFRAME is either the value given by FSIZ, one less than FSIZ or one greater than FSIZ. FSIZ is a fixed value given by the Analysis Initialization module 11.

The Analysis Initialization module 11 provides a single sixteen bit quantity called PREFAC which contains the preemphasis factor. It also provides a single sixteen bit quantity called BEGIN.

The pre-emphasis module 20 uses data starting at the location specified by ATODIN and BEGIN. It subtracts the value of BEGIN from the base name ATODIN to find the first valid input sample. For example, if the value in BEGIN is 11 then the first input sample is to be found in ATODIN-11.

The pre-emphasis module 20 provides an array of preemphasized speech samples stored contiguously in external data memory 30 starting at a location referenced by the base name PRSPCH. The number of samples stored at PRSPCH is given by the value of the variable FSIZ.

The module 20 performs the pre-emphasis on the input speech. The first value of the speech data, i.e. x.sub.0 is stored K samples in front of the ATODIN array. The value K is specified in the variable BEGIN. The pre-emphasis factor is .alpha.. The pre-emphasis equation is shown below. ##EQU1##
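As an illustration only, the pre-emphasis step might be sketched in C as follows. The sketch assumes the conventional first-order form y[i] = x[i+1] - alpha*x[i], with x.sub.0 treated purely as history for the first output; how the equation above aligns its first output with x.sub.0 cannot be recovered from the text. The function name `preemphasize` and the use of floating point rather than sixteen bit quantities are hypothetical choices made for readability.

```c
/* Hypothetical sketch of first-order pre-emphasis (not the patent's code).
 * atodin : base of the input array (ATODIN); older samples are assumed
 *          to sit immediately before it in memory
 * begin  : value of BEGIN; x[0] is located at atodin - begin
 * fsiz   : number of output samples to write (FSIZ)
 * alpha  : pre-emphasis factor (PREFAC)
 * prspch : output array (PRSPCH)
 */
void preemphasize(const float *atodin, int begin, int fsiz,
                  float alpha, float *prspch)
{
    const float *x = atodin - begin;   /* x[0] is the first valid sample */

    for (int i = 0; i < fsiz; i++) {
        /* Assumption: x[0] serves only as history for the first output. */
        prspch[i] = x[i + 1] - alpha * x[i];
    }
}
```

The caller must make sure that at least fsiz+1 valid samples exist starting at atodin - begin.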

Note that x.sub.0 is stored in the location ATODIN-(*BEGIN). The pre-emphasis of speech signals is known in the prior art and has been employed with analog speech.

Inputs for the LPC module 21 come from the Pre-Emphasis module 20 and the Analysis Initialization module 11. The pre-emphasized speech is passed from the Pre-Emphasis module 20 via storage in the external data RAM or memory 30. The pre-emphasized speech is stored contiguously starting at a location referenced by the base name PRSPCH. The number of speech samples stored is given by the variable FSIZ. The order of the LPC filter is stored in the variable ORDER.

The LPC module 21 outputs an array of filter coefficients and an array of quantized reflection coefficients. The reflection coefficients (a.sub.0 -a.sub.n) are outputted to the buffer 16 of FIG. 1. Each filter coefficient is stored as a single word. a.sub.0 is equal to one and need not be stored. a.sub.1 through a.sub.n are stored beginning at the location referenced by the base name ACOEFF, where n is the order of the LPC filter as specified by the variable ORDER. a.sub.1 is stored in location ACOEFF+1 while a.sub.n is stored in location ACOEFF+n. The value stored in location ACOEFF+0 is a shift factor, .beta., used to scale the rest of the coefficients. The actual value of coefficient a.sub.i is obtained by multiplying the stored value by 2.sup..beta..

The quantized reflection coefficients are stored in an array referenced by the base name QRC. k.sub.1 is stored at QRC while k.sub.10 is stored at QRC+9. The quantization is done in accordance with typical industrial standards.

The LPC module 21 accepts pre-emphasized speech samples from the current frame and performs the LPC analysis as known in the prior art. The analysis referred to here is an LPC covariance analysis solved using Cholesky decomposition. The LPC module 21 performs scalar quantization to encode the LPC reflection coefficients. The quantized reflection coefficients must then be converted to LPC filter coefficients. It is vitally important that the quantized, rather than the unquantized, reflection coefficients be used in this conversion.
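The conversion from reflection coefficients to filter coefficients is commonly done with the step-up recursion, sketched below in C. This is a minimal sketch and not the patent's routine: it assumes the convention A(z) = 1 + a.sub.1 z.sup.-1 + ... + a.sub.n z.sup.-n (other texts flip the sign of the reflection coefficients), it stores a.sub.0 explicitly instead of a shift factor, and the names `rc_to_fc` and `MAX_ORDER` are hypothetical.

```c
#define MAX_ORDER 16   /* hypothetical upper bound on the filter order */

/* Step-up recursion (sketch): converts reflection coefficients
 * k[0..order-1], holding k_1..k_order, into filter coefficients
 * a[0..order] with a[0] = 1.
 * Assumed convention: A(z) = 1 + sum_{i=1..order} a[i] z^-i.
 * Returns 0 on success, -1 if the order is out of range.
 */
int rc_to_fc(const float *k, int order, float *a)
{
    float prev[MAX_ORDER + 1];

    if (order < 0 || order > MAX_ORDER)
        return -1;

    a[0] = 1.0f;
    for (int m = 1; m <= order; m++) {
        /* Keep the coefficients of the order m-1 filter. */
        for (int i = 1; i < m; i++)
            prev[i] = a[i];
        /* a_i(m) = a_i(m-1) + k_m * a_{m-i}(m-1) */
        for (int i = 1; i < m; i++)
            a[i] = prev[i] + k[m - 1] * prev[m - i];
        a[m] = k[m - 1];
    }
    return 0;
}
```

Passing the quantized values (decoded from the QRC array) as `k` keeps the analysis filter identical to the one a receiver would rebuild from the transmitted coefficients.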

Inputs for the Pole Bandwidth Broadening module 22 come from the LPC module 21 and the Analysis Initialization module, INIT 11. The LPC module provides N LPC filter coefficients stored contiguously starting at ACOEF+1, i.e. a.sub.1 is stored at ACOEF+1 and a.sub.i is stored at ACOEF+i. The first coefficient, a.sub.0, is always 1.0 and need not be stored. The value stored at ACOEF+0 is a shift factor .beta.. Each coefficient a.sub.i is actually normalized and is scaled by 2.sup..beta.. The number N is stored in a location named ORDER which defines the order of the LPC filter. The last coefficient is, therefore, a.sub.N. The pole bandwidth broadening factor is stored in external data memory 30 in a location referenced by the name PBBFAC.

The output of the pole BW module 22 is an array of LPC filter coefficients whose bandwidths have been broadened. The size of the array is the same as the ACOEF array. The name of the array is FC. The module 22 performs a simple multiplication on each of the LPC filter coefficients. The multiplication factor is stored in PBBFAC. It is referred to here as .beta.. If a.sub.i is an LPC filter coefficient then the broadened LPC filter coefficient a.sub.i is given as shown below. ##EQU2## N is the order of the LPC filter.
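A hedged C sketch of the broadening step follows. It assumes the conventional bandwidth-expansion rule, in which coefficient a.sub.i is multiplied by the i-th power of the factor; the equation above is not reproduced in this text, so that rule is an assumption. The function name `broaden` and the floating point types are hypothetical.

```c
/* Sketch of pole bandwidth broadening (assumed rule: fc[i] = a[i] * factor^i).
 * a      : input array; a[0] holds the shift factor, a[1..order] the
 *          normalized coefficients (layout as described for ACOEF)
 * order  : filter order (ORDER)
 * factor : broadening factor (PBBFAC)
 * fc     : output array with the same layout (FC)
 */
void broaden(const float *a, int order, float factor, float *fc)
{
    float g = 1.0f;

    fc[0] = a[0];                 /* shift factor is copied unchanged */
    for (int i = 1; i <= order; i++) {
        g *= factor;              /* g = factor^i */
        fc[i] = a[i] * g;
    }
}
```

Because every coefficient shares the same power-of-two scale, the multiplication can be applied to the normalized values directly and the shift factor carried over as is.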

Inputs for the Pre-Emphasis Correction module 23 come from the Pole Bandwidth Broadening module 22 and the Analysis Initialization module or INIT 11. The Pole Bandwidth Broadening module 22 provides the broadened LPC filter coefficients in the array FC. There are N filter coefficients stored in FC where N is the LPC filter order as specified by the variable ORDER. FC+k holds a.sub.k. a.sub.0 is always 1.0 and is not stored. Instead, FC+0 holds a number .beta. which is the scale factor. That is, the actual value of the broadened LPC filter coefficient stored at FC+k is 2.sup..beta. a.sub.k. The pre-emphasis factor is stored in PREFAC.

The output of the pre-emphasis correction module 23 is an array of LPC filter coefficients which have been corrected for pre-emphasis. The base name of the array is FCPRE. The size of this array is one location larger than the FC array. The format of the FCPRE array is identical to that of the FC array. The module 23 performs the pre-emphasis correction of the broadened LPC filter coefficients. The pre-emphasis factor is .alpha.. If a.sub.i represents a broadened LPC filter coefficient, then the corrected LPC filter coefficient, a.sub.i is given by the pre-emphasis correction equation below. ##EQU3## a.sub.0 is one and a.sub.N+1 =.alpha.*a.sub.N. N is the order of the broadened LPC filter.

Inputs for the Noise Broadening module 24 come from the Pre-Emphasis Correction module 23 and the Analysis Initialization module 11. The Pre-Emphasis Correction module 23 provides N LPC filter coefficients stored contiguously starting at FCPRE, i.e. a.sub.1 is stored at FCPRE+1 and a.sub.i is stored at FCPRE+i. The first coefficient, a.sub.0, is always 1.0 and need not be stored. A scale factor .beta. is stored at location FCPRE+0. The actual filter coefficient is scaled by 2.sup..beta.. The number N is one greater than the LPC filter order which is stored in a location named ORDER. The last coefficient is, therefore, a.sub.N. The noise broadening factor is stored in external data memory 30 in a location referenced by the name SSF.

The output of the Noise Broadening module 24 is an array of LPC filter coefficients whose bandwidths have been broadened. The size of the array is the same as the FCPRE array. The name of the array is NSFC. The NSFC array has the same format as the FCPRE array. The module 24 performs a simple multiplication on each of the LPC filter coefficients. The multiplication factor is stored in SSF. It is referred to here as .beta.. If a.sub.i is an LPC filter coefficient then the noise broadened LPC filter coefficient a.sub.i is given as shown below. ##EQU4## N is one greater than the order of the LPC filter.

Referring to FIG. 3, there is shown a block diagram of additional processing required. Inputs for the Noise Shaping module 31 come from the Pre-Emphasis Correction module 23, the Noise Broadening module 24, the EXEC module 10 and the Analysis Initialization module 11. The EXEC module 10 provides the speech samples to be noise filtered. Most samples are stored in the array referenced by the base name ATODIN. The remaining samples are stored in memory locations immediately and contiguously preceding the ATODIN array. The numerator and denominator filter orders are identical and that order is one greater than the value stored in the variable ORDER provided by the Analysis Initialization module 11. The same module provides the variable LIR which is the length of the impulse response. It also provides the variable FSIZ which is the size of the frame. The Noise Broadening module 24 provides the noise-shaped filter coefficients NSFC. The Pre-Emphasis Correction module 23 provides the filter coefficients FCPRE. The noise shaping function consists of a pole-zero filter operation. The FCPRE array contains the numerator coefficients while the NSFC array contains the denominator coefficients.

The noise shaping module 31 is a complex module in the sense that a good deal of address arithmetic takes place. A detailed description of this arithmetic is given. This can be implemented by many well known processor modules such as the Texas Instruments TMS 32020 module. See also U.S. Pat. No. 4,641,238 issued on Feb. 3, 1987 to K. N. Knieb entitled MULTIPROCESSOR SYSTEM EMPLOYING DYNAMICALLY PROGRAMMABLE PROCESSING ELEMENTS CONTROLLED BY A MASTER PROCESSOR and assigned to the assignee herein.

Since the first coefficient of both filters is always 1.0, this value is never stored. Instead, the values stored at FCPRE and NSFC are scale factors. That is, each filter coefficient is actually multiplied by 2.sup..beta. where .beta. is the appropriate scale factor. Let n.sub.i represent the i-th numerator filter coefficient where i is in the range [1,M]. The value of M is (*ORDER)+1. n.sub.i is stored in FCPRE+i. Let d.sub.i represent the i-th denominator filter coefficient where i is in the range [1,M]. d.sub.i is stored in NSFC+i.

The EXEC module writes speech samples every frame to the array ATODIN. It writes *LFRAME samples beginning at location ATODIN. Samples from the previous frame are stored immediately and contiguously preceding ATODIN. If x.sub.i is the input to the noise shaping filter, y.sub.i the output of the filter, n.sub.i the i-th numerator coefficient and d.sub.i the i-th denominator coefficient, then ##EQU5##

For k=0, i.e. the first output value, one requires the input samples from x.sub.-M through x.sub.0. Hence, by knowing where x.sub.0 occurs in the ATODIN array, one can then define the input addressing. As noted, x.sub.0 does not occur at ATODIN+0. Rather, x.sub.0 occurs at ATODIN-(*ORDER). Therefore, at least ((*ORDER)*2)+1 samples are required from the previous frame to precede the ATODIN array.

The output of the noise shaping module 31 is an array of noise shaped speech samples. The array has the base name DESIG. Its size is *FSIZ plus the value of the variable LIR. DESIG also serves as input to this module since the pole-zero filter requires previous values of its output to calculate the current output as seen from Equation 5.

In this case, at least (*ORDER)+1 samples of the previous output must be placed immediately preceding the DESIG array. The DESIG array is (*FSIZ)+(*LIR) samples long. However, the samples which are stored preceding the DESIG array are samples DESIG+(*FSIZ)-(*ORDER)-1 through DESIG+(*FSIZ)-1. The storing of these last (*ORDER)+1 samples is the last thing done before exiting this module.

This module 31 performs the noise shaping on the input speech. The noise shaping filter is a pole-zero filter of the form shown below. ##EQU6## If x.sub.i is the input to the noise shaping filter, y.sub.i the output of the filter, n.sub.i the i-th numerator coefficient and d.sub.i the i-th denominator coefficient, then ##EQU7##
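The pole-zero filtering and its reliance on history stored ahead of each array can be sketched in C as below. The sketch assumes the difference equation y[k] = x[k] + sum of n[i]*x[k-i] - sum of d[i]*y[k-i] for i in [1,M]; the sign convention is an assumption, since the equations above are not reproduced in this text. The function name `pole_zero_filter` is hypothetical and floating point replaces the scaled fixed-point coefficients.

```c
/* Sketch of a pole-zero filter with history kept in front of the arrays.
 * x, y  : pointers to sample 0 of input and output; x[-1..-m] and
 *         y[-1..-m] must already hold samples from the previous frame
 * count : number of output samples to compute
 * num   : numerator coefficients, num[1..m] (num[0] is unused here; in
 *         the text that slot holds a scale factor)
 * den   : denominator coefficients, den[1..m] (den[0] unused likewise)
 * m     : filter order, (*ORDER)+1 in the text
 * Assumed form: y[k] = x[k] + sum num[i]*x[k-i] - sum den[i]*y[k-i].
 */
void pole_zero_filter(const float *x, float *y, int count,
                      const float *num, const float *den, int m)
{
    for (int k = 0; k < count; k++) {
        float acc = x[k];
        for (int i = 1; i <= m; i++) {
            acc += num[i] * x[k - i];   /* zeros: past inputs  */
            acc -= den[i] * y[k - i];   /* poles: past outputs */
        }
        y[k] = acc;
    }
}
```

Copying the last m outputs to the locations just before the output array at the end of each frame, as the text describes for DESIG, lets the next frame call the same loop without special-casing its first samples.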

Inputs for the All Pole Impulse Response module 32 come from the Noise Broadening module 24 and the Analysis Initialization module 11. The Noise Broadening module 24 provides the noise shaped filter coefficients in the array NSFC. The size of this array is one larger than the LPC filter order specified by the variable ORDER. The first coefficient, a.sub.1, is stored in the NSFC array at location NSFC+1. a.sub.0 is always equal to one and need not be stored. The value stored in NSFC+0 is a shift factor .beta.. The actual value of the noise-broadened filter coefficient a.sub.i is scaled by 2.sup..beta..

The impulse response module 32 provides the impulse response of the noise shaped all pole LPC filter. The length of the impulse response is specified by the variable LIR. The impulse response is stored in an array referenced by the base name IR. The values stored in IR represent normalized values. The actual values are scaled by the shift factor .nu.. That is, the actual values are multiplied by 2.sup..nu.. .nu. is stored at a location referenced by the name IRSCL.

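The module 32 calculates the impulse response of the noise shaped LPC filter. Careful attention to scaling is necessary to ensure enough numerical precision. A C function describing the impulse response calculation is shown below. FUNCTION: Computes the impulse response of the all-pole noise shaping filter.

______________________________________
#include <stdio.h>
#include <math.h>
#include "mplpc.h"   /* project header, not reproduced in this text */

/* order : filter order
 * pdfc  : denominator filter coefficients, pdfc[1..order]
 * lir   : length of the impulse response
 * pir   : output array of lir samples
 */
void getapir(int order, float *pdfc, int lir, float *pir)
{
    register int n, k, index;

    *pir = 1.0;                         /* h[0] = 1 */
    for (n = 1; n < lir; n++) {
        *(pir + n) = 0.0;
        for (k = 1; k <= order; k++) {
            index = n - k;
            /* The accumulation operator is illegible in the source text;
               "-=" assumes the filter 1/(1 + sum of a[k] z^-k). */
            if (index >= 0)
                *(pir + n) -= *(pdfc + k) * (*(pir + index));
        }
    }
}
______________________________________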

Inputs for the Impulse Response Autocorrelation module 33 come from the All Pole Impulse Response module 32 and the Analysis Initialization module 11.

This module receives the impulse response array IR and calculates the autocorrelation. The length of the IR array is specified by the variable LIR. Associated with the array IR is a scale factor. The values stored in IR represent normalized values. The actual values are scaled by the shift factor .nu.. That is, the actual values are multiplied by 2.sup..nu.. .nu. is stored at a location referenced by the name IRSCL.

The autocorrelation module 33 outputs a two-sided autocorrelation array, a one-sided autocorrelation array and a scale factor. The two-sided autocorrelation array is referenced by the base name IRCOR2. The one-sided autocorrelation array is referenced by the base name IRCOR1. The length of the one-sided autocorrelation is specified by the variable LIR. If K is the length of the one-sided autocorrelation then the length of the two-sided autocorrelation is (2*K)-1. If r.sub.i is the value of the autocorrelation function for the i-th lag, then r.sub.i is stored at IRCOR1+i, IRCOR2+K-1+i and IRCOR2+K-1-i. Associated with the arrays IRCOR1 and IRCOR2 is a scale factor. The values stored in both arrays represent normalized values. The actual values are scaled by the shift factor .beta.. That is, the actual values are multiplied by 2.sup..beta.. .beta. is stored at a location referenced by the name CORSCL. CORSCL may be either positive or negative.

The autocorrelation module 33 calculates the autocorrelation of the impulse response of the noise shaped LPC filter. The autocorrelation equation is shown below. ##EQU8## In addition, the data may have to be scaled appropriately to ensure that the finite precision arithmetic of the processor is not compromised. The input scale factor is stored in IRSCL. The output scale factor is to be stored in CORSCL.
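The following C sketch shows one way the one-sided and two-sided arrays could be filled in a single pass. It assumes the usual finite-length autocorrelation, r[i] = sum over n of h[n]*h[n+i], and leaves out the IRSCL and CORSCL scale factors entirely; the function name `ir_autocorr` and the floating point types are hypothetical.

```c
/* Sketch: autocorrelation of an impulse response of length lir.
 * ir     : impulse response, ir[0..lir-1] (IR)
 * ircor1 : one-sided result, lir entries, lag i at ircor1[i] (IRCOR1)
 * ircor2 : two-sided result, 2*lir-1 entries, lag 0 at ircor2[lir-1]
 *          and lag i mirrored at ircor2[lir-1+i] and ircor2[lir-1-i]
 * Scale factor handling (IRSCL, CORSCL) is intentionally omitted.
 */
void ir_autocorr(const float *ir, int lir, float *ircor1, float *ircor2)
{
    for (int lag = 0; lag < lir; lag++) {
        float r = 0.0f;
        for (int n = 0; n + lag < lir; n++)
            r += ir[n] * ir[n + lag];
        ircor1[lag] = r;
        ircor2[lir - 1 + lag] = r;   /* positive lag */
        ircor2[lir - 1 - lag] = r;   /* mirrored negative lag */
    }
}
```

In a fixed-point implementation the accumulator would need headroom or a block shift, which is the role the text assigns to CORSCL.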

Inputs for the Cross Correlation module 34 come from the Noise Shaping module 31, the All Pole Impulse Response module 32, the Analysis Main module 40, the Overhang module 35 and the Analysis Initialization module 11. The Noise Shaping module 31 provides noise shaped speech samples in an array referenced by the base name DESIG. The All Pole Impulse Response module 32 provides the impulse response in an array referenced by the base name IR, together with the scale factor IRSCL. The size of the IR array is given by the variable LIR. The size of the DESIG array is the value of the variable FSIZ plus the value of the variable LIR. The relative sample location in the DESIG array to start the cross correlation is given in the variable PTRDES. PTRDES is set in the Analysis Main module 40.

The Overhang module 35 provides an array of samples which are the result of the synthesis filter ring down. The array is referenced by the base name OVR. Its size is the value of the variable BLKSIZ plus the value of the variable LIR.

The outputs from the cross correlation module 34 are two arrays of BLKSIZ samples each. They are referenced by the base names XCOR1 and XCOR2. The module 34 performs the cross correlation between the noise shaped speech and the impulse response of the noise shaped synthesis filter.

The first calculation to perform is to subtract the samples in the OVR array from the noise shaped speech samples. The result is placed in a local array. For the sake of explanation, call the difference w.sub.n. The number of samples in the difference array is N. The number of samples in the impulse response is M. The impulse response is denoted by h.sub.n. If the cross correlation is .theta..sub.n, then ##EQU9## L is the value of the variable BLKSIZ.
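A minimal C sketch of these two steps is given below. It assumes the cross correlation has the form theta[n] = sum over m in [0,M-1] of w[n+m]*h[m] for n in [0,L-1], which is the usual form in multi-pulse analysis but is not confirmed by this text. The names `cross_correlate` and `MAX_DIFF` are hypothetical, only one output array is written, and the scale factors are ignored.

```c
#define MAX_DIFF 512   /* hypothetical bound on BLKSIZ + LIR */

/* Sketch: difference signal followed by cross correlation.
 * desig  : noise shaped speech, already offset by PTRDES by the caller
 * ovr    : synthesis filter ring-down samples (OVR)
 * ir     : impulse response h[0..lir-1] (IR)
 * blksiz : number of correlation terms to produce (BLKSIZ, L in the text)
 * lir    : impulse response length (LIR, M in the text)
 * xcor   : output, blksiz entries
 * Assumed form: xcor[n] = sum_{m=0..lir-1} w[n+m] * ir[m].
 * Returns 0 on success, -1 if the sizes do not fit the local array.
 */
int cross_correlate(const float *desig, const float *ovr, const float *ir,
                    int blksiz, int lir, float *xcor)
{
    float w[MAX_DIFF];
    int n_diff = blksiz + lir;       /* samples in the difference array */

    if (blksiz < 0 || lir < 0 || n_diff > MAX_DIFF)
        return -1;

    for (int n = 0; n < n_diff; n++)
        w[n] = desig[n] - ovr[n];

    for (int n = 0; n < blksiz; n++) {
        float acc = 0.0f;
        for (int m = 0; m < lir; m++)
            acc += w[n + m] * ir[m];
        xcor[n] = acc;
    }
    return 0;
}
```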

Inputs for the Pick Pulse module 41 come from the Cross Correlation module 34, the Correlation Update module 42, the Impulse Response Autocorrelation module 33, the Analysis Main module 40 and the Analysis Initialization module 11. The Cross Correlation module 34 and the Correlation Update module 42 provide a cross correlation array referenced by the base name XCOR2. The Impulse Response Autocorrelation module 33 provides an array referenced by the base name IRCOR1 and a variable referenced by the name CORSCL. The value stored in CORSCL is a scale factor used to adjust the IRCOR1 array values. The Analysis Initialization module 11 provides the variables NPULSE and BLKSIZ. The Analysis Main module 40 provides the variable PCNTR.

The output of this pick pulse module 41 is a pulse location and amplitude. The amplitude is stored in the variable PAMP while the location is stored in the variable PLOC. The module 41 performs the search for the maximum cross correlation term and then determines the location and amplitude of the next MPLPC pulse. It searches the cross correlation array XCOR2 for the largest magnitude pulse. The size of the array is contained in the variable BLKSIZ. The location of the MPLPC pulse is the same as that of the largest magnitude cross correlation pulse, i.e., in the range [0,BLKSIZ-1].

The amplitude of the MPLPC pulse is the value (negative or positive) of the largest cross-correlation value divided by the value of the impulse response autocorrelation value at lag 0. The impulse response autocorrelation value at lag 0 has to be scaled appropriately by *CORSCL. An LPC frame is 192 samples long. For each block, currently three MPLPC pulses are found. The locations of the first two pulses in a block are not constrained. The location of the last pulse in a block is constrained due to quantization constraints. The third pulse must be located no further than 24 locations from any other pulse in the block. Also at least one of the pulses must occur in one of the first 25 locations in the block. The burden of these constraints is placed on the third pulse. Therefore, the search for the third pulse must be constrained to lie in the range so defined by the above two constraints.
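The unconstrained search used for the first two pulses of a block can be sketched in C as follows. The sketch assumes that the lag 0 autocorrelation value passed in has already been scaled by *CORSCL and is nonzero, and it does not implement the location constraints on the third pulse. The function name `pick_pulse` is hypothetical.

```c
/* Sketch: unconstrained pulse search.
 * xcor2  : cross correlation array (XCOR2)
 * blksiz : number of entries to search (BLKSIZ), assumed >= 1
 * r0     : impulse response autocorrelation at lag 0, already scaled
 *          by *CORSCL and assumed nonzero
 * ploc   : receives the pulse location (PLOC), in [0, blksiz-1]
 * pamp   : receives the signed pulse amplitude (PAMP)
 */
void pick_pulse(const float *xcor2, int blksiz, float r0,
                int *ploc, float *pamp)
{
    int best = 0;
    float best_mag = -1.0f;

    for (int n = 0; n < blksiz; n++) {
        float mag = xcor2[n] < 0.0f ? -xcor2[n] : xcor2[n];
        if (mag > best_mag) {        /* keeps the earliest maximum on ties */
            best_mag = mag;
            best = n;
        }
    }
    *ploc = best;
    *pamp = xcor2[best] / r0;        /* sign of the correlation is kept */
}
```

A constrained search for the third pulse could reuse the same loop over a narrower index range derived from the two constraints described in the text.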

The variables NPULSE and PCNTR are provided so that the user may determine when the constraints must be applied. Whenever the value of PCNTR plus 1 is evenly divisible by the value of NPULSE, the constraints must be applied. For example, the value of PCNTR is 0 when the initial pulse is found. Since NPULSE is 3, (0+1)/3 is not an integer so the constraints are not applied. When PCNTR is 1, the second pulse is found. (1+1)/3 is not an integer so the constraints are not applied. However, when PCNTR is 2, the third pulse is found and (2+1)/3 is an integer and the constraints are applied.
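The divisibility test described above reduces to a one-line predicate, sketched here in C. The function name `constraints_apply` is hypothetical, and the guard against a non-positive NPULSE is an added assumption.

```c
/* Sketch: returns 1 when the location constraints must be applied to
 * the pulse about to be found, 0 otherwise.
 * pcntr  : value of PCNTR (0 for the first pulse)
 * npulse : value of NPULSE (pulses per block, 3 in the example)
 */
int constraints_apply(int pcntr, int npulse)
{
    /* (PCNTR + 1) must be a whole multiple of NPULSE. */
    return npulse > 0 && (pcntr + 1) % npulse == 0;
}
```

With NPULSE equal to 3 this returns 0 for PCNTR values 0 and 1 and returns 1 for PCNTR equal to 2, matching the worked example.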

Inputs for the Add Pulse module 43 come from the Pick Pulse module 41 and the Analysis Initialization module 11. The Pick Pulse module 41 provides a pulse location and amplitude. The amplitude is stored in the variable PAMP while the location is stored in the variable PLOC. The Analysis Initialization module 11 provides the variable