Claims
What is claimed is:
1. A speech synthesis apparatus comprising:
a wavetable synthesizer; and
a speech element wavetable memory coupled to the wavetable synthesizer, the speech element wavetable memory storing a plurality of primitive speech sounds for processing on the wavetable synthesizer and generating speech sounds.
2. An apparatus according to claim 1, wherein:
the primitive speech sounds are individually assigned to a memory cell of the speech element wavetable memory designated by an instrument identification.
3. An apparatus according to claim 1, wherein:
the primitive speech sounds are selected from among sound bites, entire words and phrases, frequently-occurring syllables, phonemes and smaller atomic speech elements.
4. An apparatus according to claim 1, wherein:
the wavetable synthesizer includes an oscillator, a volume scalar, a pan scalar, and an effects processor for playing back the primitive speech elements at a selected pitch, duration, attack velocity and envelope, sustain, and decay velocity and
envelope.
5. An apparatus according to claim 1, wherein:
the speech element wavetable memory includes:
a speech sample database storing a plurality of speech samples; and
a speech reference database including a dictionary, a context list, and an heuristic rules list.
6. An apparatus according to claim 5, wherein:
the dictionary stores sampled words and phonics and an encoding designating the pronunciation of the words and phonics; and
the context list encodes emphasis, lift and emotion that are expressed using variations in volume and addition of vibrato and tremolo.
7. An apparatus according to claim 5, wherein:
the heuristic rules list includes information for guiding decisions relating to selection of primitive speech element, duration, and volume.
8. An apparatus according to claim 1, wherein:
the primitive speech sounds selected include multiple voices and voices combined with sounds; and
the wavetable synthesizer includes multiple channels for creating sounds including the multiple voices and voices combined with sounds simultaneously.
9. A method of synthesizing speech sounds comprising:
storing a plurality of primitive speech sounds in a speech element wavetable memory; and
generating speech sounds as a function of the stored plurality of primitive speech sounds using a wavetable synthesizer.
10. A method according to claim 9, wherein:
storing the plurality of primitive speech sounds includes individually assigning the primitive speech sounds to a memory cell of the speech element wavetable memory designated by an instrument identification.
11. A method according to claim 9, wherein:
storing the plurality of primitive speech sounds includes storing primitive speech sounds in the form of sound bites, entire words and phrases, frequently-occurring syllables, phonemes and smaller atomic speech elements; and
generating speech sounds as a function of the stored plurality of primitive speech sounds includes selecting from the primitive speech sounds.
12. A method according to claim 9, wherein:
generating speech sounds as a function of the stored plurality of primitive speech sounds includes playing back the primitive speech elements at a selected pitch, duration, attack velocity and envelope, sustain, and decay velocity and envelope
using a wavetable synthesizer including an oscillator, a volume scalar, a pan scalar, and an effects processor.
13. A method according to claim 9, wherein:
storing the plurality of primitive speech sounds includes:
storing a plurality of speech samples in a speech sample database; and
storing a dictionary, a context list, and an heuristic rules list in a speech reference database.
14. A method according to claim 13, wherein:
storing a dictionary includes storing sampled words and phonics and an encoding designating the pronunciation of the words and phonics; and
storing a context list includes encoding emphasis, lift and emotion expressed using variations in volume and addition of vibrato and tremolo.
15. A method according to claim 13, wherein:
storing an heuristic rules list includes storing information guiding decisions relating to selection of primitive speech element, duration, and volume.
16. A method according to claim 9, wherein:
storing primitive speech sounds includes storing multiple voices and voices combined with sounds; and
creating sounds including the multiple voices and voices combined with sounds simultaneously.
17. A speech synthesis apparatus comprising:
means for storing a plurality of primitive speech sounds in a speech element wavetable memory; and
means coupled to the storing means for generating speech sounds as a function of the stored plurality of primitive speech sounds using a wavetable synthesizer.
18. A computer system comprising:
a processor;
a memory coupled to the processor and storing a plurality of primitive speech sounds in a speech element wavetable memory; and
an executable program code executable on the processor for generating speech sounds as a function of the stored plurality of primitive speech sounds using a wavetable synthesizer including an effects processor.
19. A computer system according to claim 18 wherein the processor is an MMX processor.
20. A computer system comprising:
a processor;
means coupled to the processor for storing a plurality of primitive speech sounds in a speech element wavetable memory; and
means coupled to the processor and coupled to the storing means for generating speech sounds as a function of the stored plurality of primitive speech sounds using a wavetable synthesizer.
21. A computer system comprising:
a processor; and
a speech synthesis apparatus coupled to the processor and including:
a wavetable synthesizer, including an effects processor; and
a speech element wavetable memory coupled to the wavetable synthesizer, the speech element wavetable memory storing a plurality of primitive speech sounds for processing on the wavetable synthesizer and generating speech sounds.
22. A telephone system comprising:
a telephone;
a controller coupled to the telephone; and
a speech synthesis apparatus coupled to the controller including:
a wavetable synthesizer; and
a speech element wavetable memory coupled to the wavetable synthesizer, the speech element wavetable memory storing a plurality of primitive speech sounds for processing on the wavetable synthesizer and generating speech sounds.
23. A communication apparatus comprising:
an interface for connecting to a communication system; and
a speech synthesis apparatus coupled to the interface including:
a wavetable synthesizer; and
a speech element wavetable memory coupled to the wavetable synthesizer, the speech element wavetable memory storing a plurality of primitive speech sounds for processing on the wavetable synthesizer and generating speech sounds.
24. A communication apparatus according to claim 23 wherein the interface communicates with a modem.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech synthesizer and speech synthesis technique. More specifically, the present invention relates to a speech synthesizer and operating method that produces an improved, robust sound through utilization of
wavetable synthesis techniques.
2. Description of the Related Art
Speech synthesis is the computer generation of sound that resembles human speech. Speech synthesizers have evolved from systems that store and replay speech sounds in the form of simple phonics to more elemental common particles of sound to
sound bites including words and phrases. What is common among digital speech processing systems throughout this evolution is the playback of fundamentally flawed speech with lifeless, monotonic sounds that are unnaturally stilted and formal through
repetitious playback of a limited library of sounds.
Speech synthesis is accomplished using a speech synthesizer operating on stored sounds and algorithms. The speech synthesizer is a device that converts a numerical code representing a digital speech signal into recognizable speech sounds. The
digital speech signal is sampled and recorded speech which is divided into small sound units. The small sound units have characteristics such as pitch, loudness and timbre that are represented as excitation and filter parameter numbers which become a
digital code representing speech. Human speech sounds are stored, generally in ROM, EPROM, RAM, CD, or disk memory or are created by a program, and then generated from the stored digital code by excitation of a time-varying digital filter and played
over a loudspeaker. A processor supplies overall control of speech production. The process of speech production is typically a digital process up to the point of a digital-to-analog converter, which supplies an analog signal to drive a speaker.
An alternative to the time-varying filter approach is a speech generation system which stores digitized speech data signals, samples the speech data at a constant rate such as an 8 kHz rate, and interpolates the data, for example to a 100 kHz rate.
In a further alternative embodiment, logarithmically compressed amplitude data are used which are analogous to the data processed by digital telephone systems and result in a data rate of 64 kbits/second with very good sound quality. The
time-varying filter techniques supply acceptable speech quality but at a much lower digital input data rate. For example, average rates down to about 1200 bits/second are achieved for a ten-pole filter derived from a linear production model of speech. The low data
rates for speech generation are possible due to the redundancy in speech and by using a simplified simulator of the human speech-generating system. The vocal tract is simulated by a dozen or so connected pipes of different diameter, and the excitation is
represented by a pulse stream at the vocal-cord rate for voiced sound or a random noise source for the unvoiced parts of speech. The reflection coefficients at the junctions of the pipes are obtained from a linear prediction analysis of the speech
waveform.
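The data rates quoted above can be checked with simple arithmetic: a digital telephone channel carries 8-bit logarithmically compressed samples at 8 kHz, giving 64 kbits/second, roughly 53 times the 1200 bits/second figure. The following is a minimal sketch of mu-law style logarithmic amplitude compression in Python. The function names and the constant MU = 255 are assumptions chosen for illustration and are not taken from any system described here.

```python
import math

MU = 255  # assumed companding constant (hypothetical choice for this sketch)


def mu_law_compress(x: float) -> float:
    """Map a linear amplitude in [-1, 1] to a log-compressed amplitude in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Invert mu_law_compress, recovering the linear amplitude."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def data_rate_bits_per_second(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw data rate of an uncompressed sample stream."""
    return sample_rate_hz * bits_per_sample
```

For example, data_rate_bits_per_second(8000, 8) evaluates to 64000, and quiet amplitudes receive proportionally more of the output range than loud ones, which is why 8 bits per sample suffice for good speech quality.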
The synthesis techniques for synthesizing speech sounds are substantially different from the synthesis techniques which have been developed to synthesize music. Some music synthesis techniques attempt to mimic the acoustical characteristics of
an actual musical instrument. Other techniques generate musical sounds based on mathematical analysis and relationships.
One type of synthesis for generating musical sounds is called subtractive synthesis. Subtractive synthesis closely imitates the physical basis of sound generation inherent in acoustic musical instruments. A harmonic-rich periodic signal is
generated that contains energy at every partial frequency existing in the sound to be produced. Specific selected frequency components are selectively altered using filters. The filters subtract unwanted frequencies. Electronic filters also supply a
frequency-dependent gain so that selected frequencies are enhanced. Subtractive synthesis employs an envelope generator such as a voltage-controlled amplifier or analog multiplier to selectively alter the frequency components of the sound. Subtractive
synthesis generates musical sounds in a manner analogous to an actual acoustic instrument so that the physics of the functional basis of the instrument serve as a model for designing the subtractive synthesis technique. Subtractive synthesis using
digital techniques is relatively difficult and complex since substantial computations are necessary to generate a harmonic-rich signal that is properly band-limited.
Additive synthesis is a musical synthesis technique in which each partial frequency is generated separately, arbitrarily and independently. The separate partial frequencies are added to form a music signal. Each partial frequency is an integer
multiple of the fundamental frequency of the sound to be generated. Additive synthesis functions by providing a plurality of separate oscillators, each of which generally forms a sine wave, and combining the separate sine waves to form a signal that
sounds as close as possible to a particular sound.
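Additive synthesis as described above can be illustrated with a short Python sketch that sums sine-wave partials at integer multiples of a fundamental. The function name, parameter names, and default sample rate are hypothetical and serve only to illustrate the technique.

```python
import math


def additive_tone(fundamental_hz, partial_amplitudes, sample_rate_hz=8000, n_samples=80):
    """Sum sine partials; partial k (1-based) has frequency k * fundamental_hz."""
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # time of this sample in seconds
        value = sum(
            amp * math.sin(2.0 * math.pi * fundamental_hz * (k + 1) * t)
            for k, amp in enumerate(partial_amplitudes)
        )
        samples.append(value)
    return samples
```

Each entry of partial_amplitudes corresponds to one oscillator, so a richer timbre requires more oscillators and more computation per output sample.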
A further music synthesis method is wavetable synthesis. Wavetable synthesis is a method of generating sound by playing back digitally stored samples. Real musical sounds, performed by actual musical instruments, are sampled and stored in a
digital recording format in a storage such as a read-only memory (ROM). The digital sound recordings are sampled and mapped to accurately reproduce the acoustic range of the instrument.
In wavetable synthesis, a sample is a recorded sound stored in a digital data form. An instrument is a selectable entry which defines a particular type of sound corresponding to the sound produced by a specific musical instrument. A wave is a
sample or group of samples that are used to reproduce the sound of an instrument over an entire range of frequencies. Instruments are either single-sampled or multi-sampled depending on the timbral characteristics of the corresponding musical
instrument, sampling characteristics of the data and sampling system, and playback characteristics of the data and playback system. Some instruments, a flute for example, are typically single-sampled. Other instruments, such as a piano, have a more
complex data structure and are nearly always sampled, stored, and played in multiple samples. A program is a set of parameters that are selected to completely define a wavetable synthesizer's generation of a particular sound.
Wavetable synthesis may be practiced by sampling and playing back a virtually limitless amount of data. However, system performance, circuit and memory size, and cost are advantageously reduced through many data reduction techniques. One such
data reduction technique is termed "looping". Musical sounds are highly sustained and highly repetitive. Looping exploits the sustained and repetitive nature of sound by playing back a section of a sample repeatedly. Different types of looping are
typically supported, including forward looping, reverse looping, bi-directional looping and the like.
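The looping technique can be sketched in Python as a generator of playback positions. Only forward and bi-directional looping are shown; the function names, the convention that the loop end is exclusive, and the choice not to repeat end points when reversing direction are assumptions made for this illustration.

```python
def looped_indices(n_out, loop_start, loop_end, mode="forward"):
    """Return n_out sample positions: play from 0, then repeat [loop_start, loop_end)."""
    if mode not in ("forward", "bidirectional"):
        raise ValueError("unsupported loop mode: " + mode)
    if not 0 <= loop_start < loop_end:
        raise ValueError("loop region must satisfy 0 <= loop_start < loop_end")
    indices = []
    pos, step = 0, 1
    for _ in range(n_out):
        indices.append(pos)
        pos += step
        if mode == "forward":
            if pos >= loop_end:  # jump back to the start of the loop
                pos = loop_start
        else:
            if pos >= loop_end:  # ran past the end: turn around
                step = -1
                pos = max(loop_end - 2, loop_start)
            elif step < 0 and pos < loop_start:  # ran past the start: turn around
                step = 1
                pos = min(loop_start + 1, loop_end - 1)
    return indices


def play_looped(sample, n_out, loop_start, loop_end, mode="forward"):
    """Produce n_out output samples from a short stored sample by looping."""
    if loop_end > len(sample):
        raise ValueError("loop region extends past the end of the sample")
    return [sample[i] for i in looped_indices(n_out, loop_start, loop_end, mode)]
```

A sustained sound of arbitrary duration is thereby produced from a stored sample only as long as the attack plus one loop section.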
Conventional computer-generated speech devices create sounds that are unnaturally stilted and formal due to the repetitious usage of a limited library of sound elements. What is needed is a speech synthesis apparatus and technique that improves
the sound of computer-generated speech. What is further needed is a speech synthesis device that generates an interesting, robust-sounding speech.
SUMMARY OF THE INVENTION
It has been discovered that music wavetable synthesis techniques can be advantageously applied to synthesize speech.
In accordance with the present invention, a wavetable speech synthesis apparatus includes a wavetable memory for defining a plurality of primitive speech sounds. The primitive speech elements are individually assigned to a memory cell designated
by an instrument identification in the wavetable memory. Various primitive speech elements are defined and selected from among sound bites, entire words and phrases, frequently-occurring syllables, phonemes or smaller atomic speech elements. The
primitive speech elements generate primitive sounds that are played back at a selected pitch, duration, attack velocity and envelope, sustain, and decay velocity and envelope.
Various types of speaker qualities or identities are assigned to different frequency ranges of the speech elements. In one example, the lowest octave is assigned to "grandfather" speech samples. The next lowest octave is assigned to "father"
speech samples. Then, in order, "grandmother", "mother", "brother", "sister", and "baby" speech samples are assigned to sequentially higher octaves.
In another example, a lowest octave is assigned to a voice expressing the emotion of anger. Then, in order, the emotions of surprise, boredom, normalcy, fright, and the like are assigned to sequentially higher octaves.
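The assignment of speaker identities to octaves can be sketched as a simple lookup in Python. The note numbering (twelve notes per octave, counted upward from a lowest note of zero) and all names below are assumptions made for illustration; the description specifies only the ordering of the octaves.

```python
# Speaker identities in order of sequentially higher octaves, as listed above.
SPEAKER_BY_OCTAVE = [
    "grandfather", "father", "grandmother", "mother", "brother", "sister", "baby",
]


def speaker_for_note(note_number, lowest_note=0, notes_per_octave=12):
    """Return the speaker identity whose octave contains note_number."""
    octave = (note_number - lowest_note) // notes_per_octave
    if not 0 <= octave < len(SPEAKER_BY_OCTAVE):
        raise ValueError("note is outside the assigned octaves")
    return SPEAKER_BY_OCTAVE[octave]
```

The same lookup applies to the emotion example by substituting a list of emotions for the list of speakers.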
The wavetable memory includes a speech sample database and a speech reference database. The speech sample database supplies speech signals that are processed by the wavetable synthesizer according to information contained in the speech reference
database. Reference information in the speech reference database includes various dictionaries, context lists, algorithms, and heuristic rules for guiding decisions relating to selection of primitive speech element, duration, volume and other
parameters. The dictionaries store sampled words and phonics and an encoding designating the pronunciation of the words and phonics. The context lists encode emphasis, lift and emotion that are expressed using variations in volume and addition of
vibrato and tremolo.
In accordance with an embodiment of the present invention, the wavetable synthesizer forms words from a plurality of different primitive speech elements so that variations from note to note are available to pitch shift the sounds, introducing
interesting randomness into speech. Multiple primitive speech elements are combined into a word while note variations are used to control speed, emphasis and context. The sounds of speech are further processed by adding tremolo and vibrato.
Many advantages are gained by the described speech synthesis system and operating method. One advantage is that a wavetable speech synthesis device provides for the simple introduction of multiple character voices or multiple tones of voice at a
reasonable cost. Another advantage is that effects such as tremolo and vibrato can be used to express a more natural sounding speech by allowing sound pitch, duration and volume to be varied as speech progresses. Volume of speech is selectively
dithered to generate a more random speech effect. Other special effects including light echo, chorus and reverb are selectively added to speech to generate a voice having a more realistic sound.
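Tremolo, one of the effects named above, is a slow periodic variation in volume. A minimal Python sketch follows, assuming a sine-wave low-frequency oscillator; the function name, the 5 Hz rate, and the depth value are hypothetical illustration values rather than parameters of the described system.

```python
import math


def apply_tremolo(samples, sample_rate_hz, rate_hz=5.0, depth=0.3):
    """Scale each sample by a gain that oscillates between 1 - depth and 1."""
    out = []
    for n, s in enumerate(samples):
        # Low-frequency oscillator normalized to the range 0..1.
        lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * rate_hz * n / sample_rate_hz))
        gain = 1.0 - depth * lfo
        out.append(s * gain)
    return out
```

Vibrato is the corresponding periodic variation in pitch and would be applied to the playback rate rather than to the gain.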
It is further advantageous that the described speech synthesis method and system uses samples that are processed to apply to a specific person or group of people selected from a particular age, gender, occupational, cultural, or other group.
Similarly, the samples are processed to apply to particular conditions or situations, such as stressful, frightful, or happy situations. It is further advantageous that the described speech synthesis method and system generates multiple sounds
simultaneously, such as occurs in the case of background conversation with multiple voices active at one time, including overlapping of voice sounds.
The wavetable speech synthesizer has advantages over systems that merely play back phoneme, syllabic or word wave patches because the wavetable speech synthesizer can change pitch, duration, tremolo, vibrato and the like during the expression of
a note, thereby expressing emotion and emphasis as well as the characters and sounds of speech.
BRIEF DESCRIPTION OF THE DRAWINGS
The features of the described embodiments believed to be novel are specifically set forth in the appended claims. However, embodiments of the invention relating to both structure and method of operation, may best be understood by referring to
the following description and accompanying drawings.
FIG. 1 is a schematic block diagram illustrating a computer system embodiment of a Speech Synthesis device which accesses stored wavetable speech data from a memory and generates speech signals for performance.
FIG. 2 is a schematic block diagram illustrating a telephonic system embodiment of a Speech Synthesis device which accesses stored wavetable speech data from a memory and generates speech signals for performance.
FIG. 3 is a schematic block diagram illustrating a computer system incorporating an audio wavetable synthesizer integrated circuit in accordance with one embodiment of the present invention.
FIG. 4 is a schematic block diagram illustrating an embodiment of the audio wavetable synthesizer integrated circuit for performing logic and digital signal processing supporting audio functions and including a vertical wavetable cache in
accordance with an embodiment of the present invention.
FIG. 5 is a flow chart illustrating an embodiment of a method for coding samples of speech sounds which is performed under the direction of a speech editor program.
FIG. 6 is a schematic block diagram illustrating a representation of a voice architecture definition.
FIG. 7 is a schematic block diagram depicting fundamental signal data paths of a wavetable synthesizer.
FIG. 8 is a signal flow diagram which shows flow of a signal from a first voice to a second voice in which two of 32 available voices are linked as a signal generator voice and an effects processor voice.
FIG. 9 is a flowchart which illustrates a method for generating sound using signal voices.
FIG. 10 is a flowchart which illustrates a method for using a voice as an effects processor.
DESCRIPTION OF THE PREFERRED EMBODIMENT(S)
Referring to FIGS. 1 and 2, a pair of schematic high-level block diagrams illustrate two embodiments of a Speech Synthesis device 100 which accesses stored wavetable speech data from a memory and generates speech signals for performance. In an
embodiment shown in FIG. 1, a computer system 100 includes the speech processor 102, a central processing unit 104, a memory 106, and an interface 108, connected to a modem 110. The computer system 100 also includes a keyboard 112 and a display 114
forming a user interface. The speech processor 102 performs various functions such as reading back e-mail messages that are textually written for access by the computer system 100. In another application, the speech processor 102 may be used to supply
Internet data to a blind user.
In an embodiment shown in FIG. 2, a telephone system 150 includes the speech processor 152 for processing telephonic messages, a central processing unit 154, a memory 156, and an interface 158, connected to a modem 160. One application of the
telephone system 150 is a system supplying Internet data to a user by telephone.
Referring to FIG. 3, a schematic block diagram illustrates an audio performance computer system 300 including an audio wavetable synthesizer integrated circuit 310. The computer system 300 employs an architecture based on a bus, such as an
Intel.TM. PCI bus interface 320, and includes a central processing unit (CPU) 302 connected to the PCI bus interface 320 through a Host/PCI/Cache interface 304. The CPU 302 is connected to a main system memory 306 through the Host/PCI/Cache interface
304. A plurality of various special-purpose circuits may be connected to the PCI bus interface 320 such as, for example, the audio wavetable synthesizer integrated circuit 310, a motion video circuit 330 connected to a video memory 331, a graphics
adapter 332 connected to a video frame buffer 333, a small computer system interface (SCSI) adapter 334, a local area network (LAN) adapter 336, and perhaps an expansion bus such as an ISA expansion bus 338 which is connected to the PCI bus interface 320
through an SIO PCI/ISA bridge 340.
The audio wavetable synthesizer integrated circuit 310 accesses musical voice data in several different voices and processes the multiple voice data into a single set of audio signals, such as stereo audio signals, although other audio formats
such as three-output, five-output, theater-in-the-home formats and other audio formats are also possible. A voice data signal is a single defined sound such as a note of one instrument, a digital audio file, or a digital speech file.
The audio wavetable synthesizer integrated circuit 310 advantageously supplies high-quality, low-cost audio functions in a personal computer environment. The audio wavetable synthesizer integrated circuit 310 supports logic functions and digital
signal processing for performing audio functions typically found in personal computer systems. The audio wavetable synthesizer integrated circuit 310 incorporates a polyphonic music synthesizer and a stereo codec. The audio wavetable synthesizer
integrated circuit 310 generates audio signals based on data that is received from the main system memory 306, rather than through a local memory interface. Accordingly, performance of the audio wavetable synthesizer integrated circuit 310 is highly
dependent on the bus communication structures of the computer system 300. In one embodiment, the audio wavetable synthesizer integrated circuit 310 addresses up to 64 Mbytes of system memory 306 and generates an audio signal including up to 32
simultaneous voices.
Various embodiments of the computer system 300 use operating systems such as MS-DOS.TM., Windows.TM., Windows 95.TM., Windows NT.TM. and the like.
Referring to FIG. 4, a schematic block diagram illustrates an embodiment of the audio wavetable synthesizer integrated circuit 310 which performs logic and digital signal processing supporting audio functions implemented in a personal computer. The
audio wavetable synthesizer 310 is connected to a PCI bus interface 320 and includes a PCI bus interface unit 402, an audio codec 404, an audio cache 406, and an audio synthesizer 408.
The PCI bus interface unit 402 is connected between the PCI bus 320 and two buses internal to the audio wavetable synthesizer 310, specifically a general (GEN) bus 428 and a temporary (TMP) bus 432. The TMP bus 432 is internal to the audio cache
406. The audio cache 406 includes the TMP bus 432, a TMP bus control circuit 442 and a voice data queue 440. The TMP bus control circuit 442 and the voice data queue 440 are connected to the TMP bus 432.
The audio synthesizer 408 is connected to the GEN bus 428 and communicates via the PCI bus 320 through the PCI bus interface unit 402. The audio synthesizer 408 includes a 16-bit synthesizer bus 450 which is connected to the GEN bus 428 by a
synthesizer bus interface 452. The audio synthesizer 408 includes a synthesizer bus controller 454, an audio digital signal processor (DSP) 456, a plurality of digital signal processor (DSP) registers 458, a PCI-Audio data controller 460, and an audio
static random access memory (SRAM) 462. The audio DSP 456 is connected to the synthesizer bus 450 and connected to the TMP bus 432 of the audio cache 406. The synthesizer bus controller 454, the PCI-Audio data controller 460, and the audio SRAM 462 are
connected to the synthesizer bus 450. The DSP registers 458 are connected to the audio DSP 456.
The audio DSP 456 processes the multiple voices of the digital musical signal by performing various known signal processing functions, most fundamentally by performing sample rate conversion and mixing. Sample rate conversion is performed to
coordinate the input signal rate of a musical voice signal to an output audio rate since a single output rate is imposed and the input signals commonly may have multiple different sampling rates. For example, the output rate of the audio DSP 456 may be
44.1 kHz while the input rate of a signal such as a telephony-type codec is 8 kHz so that the audio DSP 456 interpolates to generate an output signal at 44.1 kHz.
Furthermore, voice memory is conserved by storing a single voice musical system to represent multiple octaves of a note. The sample rate is converted to provide multiple harmonic key registers to a single stored note. For example, a voice file
is typically recorded at the output frequency of the audio DSP 456 (44.1 kHz). A voice signal corresponding to a single key, for example a middle-C, is recorded at 44.1 kHz and saved in the memory so that the sample rate conversion frequency ratio
F.sub.C is equal to one. To conserve memory, other harmonics of the voice signal such as a D or E are generated by reading the sample corresponding to a middle-C and converting the sample rate. The output frequency is increased by a full octave for an
F.sub.C equal to two, and increased by two octaves for an F.sub.C equal to four.
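The sample rate conversion described above can be sketched in Python using linear interpolation, in which the stored sample is read at F.sub.C input samples per output sample. Linear interpolation, the equal-tempered semitone formula, and all function names are assumptions made for this illustration; the description does not specify the interpolation method used by the audio DSP 456.

```python
def resample_linear(samples, ratio):
    """Read `samples` at `ratio` input samples per output sample (ratio plays the role of F_C)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    out = []
    k = 0
    while k * ratio <= len(samples) - 1:
        pos = k * ratio  # fractional read position in the stored sample
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        k += 1
    return out


def ratio_for_semitones(semitones):
    """Conversion ratio that raises pitch by the given number of equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)
```

Under these assumptions a ratio of two reads every other stored sample and raises the pitch one octave, while a ratio of 8000/44100 (about 0.18) interpolates an 8 kHz telephony signal up to a 44.1 kHz output rate.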
The sample rate conversion frequency ratio F.sub.C represents the rate at which the audio wavetable synthesizer integrated circuit 310 processes a data file in the system memory 306. Thus, the sample rate conversion frequency ratio F.sub.C is
important for determining a favorable size of each queue of the voice data queue 440. If the sample rate conversion frequency ratio F.sub.C is large, data is accessed from the queue at a high rate so a large queue is advantageous for reducing the
servicing of the queue. However, if the queue is too large, the audio wavetable synthesizer integrated circuit 310 must include a large amount of memory, disadvantageously increasing the size of the circuit.
The audio wavetable synthesizer integrated circuit 310 processes all of the data for a single voice at one time so that the size of the queue for handling a single voice determines the performance of the audio performance computer system 300. If
the queue for storing data for a single voice is small, the audio wavetable synthesizer integrated circuit 310 must frequently request data from the system memory 306, reducing performance by increasing traffic on the PCI bus 320 and delaying processing
of audio signals. With a small queue, audio processing performance is further reduced when the sample rate conversion frequency ratio F.sub.C is large.
The voice data queue 440 is therefore designed in a vertical cache structure having large voice queues but reducing the number of voice queues that are active at one time. In particular, the vertical cache structure includes a substantially
reduced set of active voice queues, typically three or four, rather than having an active voice queue for each performed voice. Each of the active voice queues in the vertical cache structure is substantially larger than the voice queues in a system
having an active voice queue for each performed voice. In this manner, data communication between the system memory 306 and the audio DSP 456 is greatly reduced while the queue memory size in the audio wavetable synthesizer integrated circuit 310 is not
increased.
In the vertical cache structure, the illustrative voice data queue 440 includes four queues instead of having a queue allocated to each voice. Data from the system memory 306 is accessed to fill a single queue at a time so that the audio DSP 456
operates on a plurality of frames in a "frame batch" for each voice at one time. In the illustrative embodiment, a frame batch includes 32 frames. The PCI-Audio data controller 460 requests 32 frames of data for a single voice from the system memory
306. The 32 frames of single-voice data are communicated from the system memory 306 to the voice data queue 440 in a burst mode. The audio DSP 456 processes the 32 frames of data for the single voice and the results are accumulated by the audio DSP 456
and stored in the audio SRAM 462. The PCI-Audio data controller 460 then requests 32 frames of data for a next single voice, progressing through all 32 voices but processing the frame batch data for each voice separately. The PCI bus 320, like most
buses, operates more efficiently when data is communicated in a block at one time rather than by transmitting data a single piece at a time. Thus, the vertical cache structure advantageously processes multiple samples of a single voice at one time.
The number of voice queues in the voice data queue 440, typically three or four voice queues, is selected to substantially increase the size of a single voice queue while maintaining the total size of the voice data queue 440 at a reasonable
level. Multiple voice queues are implemented so that data is loaded from the system memory 306 to a first voice queue of the voice data queue 440 while data is written from a second voice queue to the audio DSP 456 so that the first voice queue is
filled as the data from the second voice queue is processed. More than two voice queues are implemented to assure that the signal processing circuits of the audio DSP 456 remain busy, reducing the possibility that a queue will become empty due to bus
latencies or congestion on the PCI bus 320. The latencies involved in communicating data via the PCI bus 320 vary widely and unpredictably based on the specifications and load of the audio performance computer system 300. The processing of the audio
DSP 456 proceeds at a generally steady pace while the filling of the queues from the system memory 306 via the PCI bus 320 is highly variable.
The operation of the voice data queue 440 is illustrated by an example in which voice 0 data is previously loaded into a voice queue 0 and is presently accessed by the signal processor circuits of the audio DSP 456. Voice 1 data is filled into
voice queue 1 of the voice data queue 440, voice 2 data is filled into voice queue 2, and voice 3 data is filled into voice queue 3 as the voice 0 data is processed by the audio DSP 456. When processing of the voice 0 data is complete, the audio DSP 456
begins processing of the voice 1 data from voice queue 1 while any unfinished filling of voice queues 1, 2 and 3 is completed and voice queue 0 is filled with voice 4 data. In subsequent cycles, voice 5-31 data are filled into
the voice data queue 440 and processed. In this manner, data from the system memory 306 is filled into the voice data queue 440 over the PCI bus 320 asynchronously from the processing of the queued data by the audio DSP 456.
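The queue rotation in this example can be modeled as follows. The text states only that voice queue 0 is refilled with voice 4 data once voice 0 is done; extending that pattern to a modulo-four assignment for voices 5-31 is an assumption of this sketch, and the function names are hypothetical.

```python
NUM_QUEUES = 4    # voice queues in the illustrative voice data queue
NUM_VOICES = 32   # voices 0-31

def queue_for_voice(voice):
    """Assumed mapping: voice v occupies voice queue v mod 4."""
    return voice % NUM_QUEUES

def refill_schedule():
    """List (voice now processed, queue just freed, voice loaded into that queue)."""
    schedule = []
    for processing in range(1, NUM_VOICES):
        finished = processing - 1                # voice whose processing just completed
        freed_queue = queue_for_voice(finished)  # its queue can now be refilled
        incoming = finished + NUM_QUEUES         # next voice not yet queued
        if incoming < NUM_VOICES:
            schedule.append((processing, freed_queue, incoming))
    return schedule

schedule = refill_schedule()
```

The first entry, `(1, 0, 4)`, matches the example in the text: while voice 1 is processed, voice queue 0 is refilled with voice 4 data. The model ignores bus latency, which in practice decides when each refill actually completes.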
Mixing is performed to mix the signals of the multiple voices to create a composite sound. The audio DSP 456 also performs other processing such as separation of a voice into two channels for stereo performance, balancing the signal between
different channels, performing three-dimensional localization of multiple output signal channels and other operations.
The DSP registers 458 include an audio DSP system memory address register (ADSMA) and an audio DSP master control register (ADMC). The audio DSP system memory address register (ADSMA) has a format, as follows:
31:0
SAP
where SAP is a system address pointer. The system address pointer specifies the system memory address for master data accesses.
The audio DSP master control register (ADMC) has a format, as follows:
15:9      8       7:6       5:0
Reserved  RdWr_L  TMPqueue  DWCount
where DWCount is a doubleword (DWORD) count, TMPqueue is a TMP-bus queue number, and RdWr_L is a read-write bit. DWCount specifies the number of double words (DWORDs) to be accessed from system memory 306 in a PCI burst. TMPqueue
specifies which of four data queues on the TMP bus 432 is the source or destination of the data. The read-write bit RdWr_L, when reset, specifies that the system memory master access is to originate from the PCI master write data FIFO 420 and be
written to system memory 306. The read-write bit RdWr_L, when set, specifies that the system memory access is to originate from system memory 306 and be sent to the PCI master read data FIFO 418.
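The ADMC field layout given above (bits 5:0 for DWCount, bits 7:6 for TMPqueue, bit 8 for the read-write bit, bits 15:9 reserved) can be illustrated with a small packing helper. The function names are hypothetical, and writing the reserved bits as zero is an assumption; the text does not say how reserved bits are treated.

```python
DWCOUNT_MASK = 0x3F    # bits 5:0, doubleword count
TMPQUEUE_SHIFT = 6     # bits 7:6, TMP-bus queue number
TMPQUEUE_MASK = 0x3
RDWR_L_SHIFT = 8       # bit 8, read-write bit

def pack_admc(dw_count, tmp_queue, read_from_system_memory):
    """Build a 16-bit ADMC value; reserved bits 15:9 are left at zero (assumption)."""
    if not 0 <= dw_count <= DWCOUNT_MASK:
        raise ValueError("DWCount must fit in 6 bits")
    if not 0 <= tmp_queue <= TMPQUEUE_MASK:
        raise ValueError("TMPqueue must select one of four queues")
    rdwr_l = 1 if read_from_system_memory else 0   # set: system memory to read FIFO
    return (rdwr_l << RDWR_L_SHIFT) | (tmp_queue << TMPQUEUE_SHIFT) | dw_count

def unpack_admc(value):
    """Split a 16-bit ADMC value back into its named fields."""
    return {
        "DWCount": value & DWCOUNT_MASK,
        "TMPqueue": (value >> TMPQUEUE_SHIFT) & TMPQUEUE_MASK,
        "RdWr_L": (value >> RDWR_L_SHIFT) & 1,
    }

# Example: read 32 doublewords from system memory into TMP-bus queue 2.
admc = pack_admc(32, 2, True)
fields = unpack_admc(admc)
```

Because DWCount is six bits wide, a single register write in this model can describe at most 63 doublewords; whether a count of zero has a special meaning is not stated in the text.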
The PCI bus interface unit 402 includes a bus interface circuit 410, a master state machine 412, and a target state machine 414. The PCI bus interface unit 402 also includes a PCI bus master control unit 416, a PCI master read data FIFO 418, a
PCI master write data FIFO 420, a target data to bus converter 422, and configuration registers 424.
The bus interface circuit 410 is directly connected to the PCI bus 320, the master state machine 412 and the target state machine 414. The bus interface circuit 410 includes I/O pad state machines, latches, decoding circuits, parity
generation circuits and multiplexers for handling data transfer to the audio wavetable synthesizer 310. The I/O pad state machines of the bus interface circuit 410 are simple controllers for PCI output signals. The master state machine 412 and the
target state machine 414 generate control signals for controlling input and output signals of the PCI bus interface unit 402 according to the PCI protocol and track the current state of the PCI bus 320. The bus interface circuit 410, master state
machine 412, and target state machine 414 are designed to comply with PCI bus timing rules and generally operate as slaves to the PCI bus 320 and to the PCI bus master control unit 416.
Target data accesses are controlled by the target state machine 414 and pass from the PCI bus 320 through the bus interface circuit 410 to a target address and data (TAD) bus 426. The TAD bus 426 has a width of 32 bits. The target data accesses
are passed from the TAD bus 426 to a destination determined by the target address, either the configuration registers 424 on the TAD bus 426 or through the target data to bus converter 422 to the general (GEN) bus 428. The GEN bus 428 conveys target
data accesses to the audio DSP 456. The GEN bus 428 has a width of sixteen bits. The target data to bus converter 422 converts 32-bit data from the TAD bus 426 into a 16-bit data form for placement on the GEN bus 428. The target data to bus converter
422 includes configuration registers and decoders for converting the data. Target data accesses are generated by the CPU 302 and controlled by the target state machine 414 to control operations of the audio DSP 456 and the PCI bus master control unit
416.
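The width conversion between the 32-bit TAD bus 426 and the 16-bit GEN bus 428 can be pictured as splitting each doubleword into two halves. The text does not state the order in which the halves are placed on the GEN bus; the low-half-first default below is purely an assumption, and `split_dword` is a hypothetical name.

```python
def split_dword(dword, low_first=True):
    """Split one 32-bit value into two 16-bit values (ordering is an assumption)."""
    if not 0 <= dword <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit unsigned value")
    low = dword & 0xFFFF            # bits 15:0
    high = (dword >> 16) & 0xFFFF   # bits 31:16
    return (low, high) if low_first else (high, low)

halves = split_dword(0x12345678)
```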
Master data are passed from the PCI bus 320 through the bus interface circuit 410 to a master address and data (MAD) bus 430. Master data includes wavetable data read from the wavetable memory 200. The MAD bus 430 has a width of 32 bits. Under
control of the PCI bus master control unit 416, data is passed from the MAD bus 430 to the GEN bus 428 or to the temporary (TMP) bus 432 through the PCI master read data FIFO 418. The TMP bus 432 carries sample voice data to the voice data queue 440.
The TMP bus 432 has a width of 32 bits. Also under control of the PCI bus master control unit 416, data is passed from the GEN bus 428 or from the TMP bus 432 to the MAD bus 430 through the PCI master write data FIFO 420.
The PCI bus master control unit 416 is connected to the MAD bus 430, the GEN bus 428 and the TMP bus 432 for communicating master data. The PCI bus master control unit 416 manages interfacing to the master state machine 412 to initiate master
bus cycles. The PCI bus master control unit 416 generates addresses for accessing data in the system memory 306. The PCI bus master control unit 416 includes an array of programmable registers (not shown) which are programmed to generate automatic data
access signals to the system memory 306. The PCI bus master control unit 416 then directs the transfer of the accessed data to either the GEN bus 428 or the TMP bus 432. The programmable registers in the PCI bus master control unit 416 are programmed
to generate both read and write accesses to the system memory 306. The programmable registers in the PCI bus master control unit 416 are programmed by a system CPU 302 using target accesses and by the audio synthesizer 408. Accordingly, master bus
cycles are initiated both from the system CPU 302 and from the audio synthesizer 408.
In the case of master write signals, the PCI bus master control unit 416, when the access is requested, moves data from the buffer of a requesting machine (not shown) on the PCI bus 320 into the PCI master write data FIFO 420. In one example,
the PCI bus master control unit 416 moves data from an audio codec record path FIFO (not shown) into the PCI master write data FIFO 420. The PCI bus master control unit 416 then performs a plurality of master bus cycles.
In the case of master read cycles, the PCI bus master control unit 416 first performs the master bus cycles to move data from the system memory 306 into the PCI master read data FIFO 418. Then the PCI bus master control unit 416 moves the data
to the buffer of the requesting machine on the PCI bus 320.
The audio wavetable synthesizer 310 includes many features for improving audio performance by increasing data flow from the PCI bus 320 to the audio DSP 456. The highest performance data flowpath is the master data flowpath through the MAD bus
430 and either the PCI master read data FIFO 418 or the PCI master write data FIFO 420, depending on the data flow direction. The master data flow path is isolated from the 16-bit GEN bus 428 and the 16-bit synthesizer bus 450, instead traversing the
TMP bus 432 to prevent the buses internal to the audio wavetable synthesizer 310 from choking other system data flow through the audio wavetable synthesizer 310.
The remainder of the data flow, not including the master data flowpath, traverses the GEN bus 428. Target data accesses typically pass through the GEN bus 428 to destinations including the system memory 306 and various internal registers
throughout the audio wavetable synthesizer 310. Low bandwidth master data also flows via the GEN bus 428. The synthesizer bus 450 in the audio synthesizer 408 is a separate extension to the GEN bus 428 and forms a primary communication bus for the
synthesizer bus controller 454, the audio DSP 456, the PCI-Audio data controller 460, and the audio SRAM 462. The synthesizer bus 450 is isolated from the GEN bus 428 so that data flows over the synthesizer bus 450 without a heavy amount of bus traffic
choking the GEN bus 428. Both the GEN bus 428 and the synthesizer bus 450 use the same communication protocol and an identical addressing scheme.
In the described embodiment, the audio DSP 456 includes an audio digital-to-analog converter (DAC) (not shown) operating at a rate of 44,100 samples per second (44.1 kHz). Accordingly, the output data rate of the audio DSP 456 is 44.1 kHz,
although the input data rate can be substantially any rate. One sample period is called a frame. A group of 32 samples is called a frame batch. The audio DSP 456 includes two 32-sample stereo accumulators (not shown) for passing data to the audio DAC. As a first accumulator is updated with the next frame batch for transfer to the audio DAC, a second accumulator passes current data to the audio DAC.
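One way to picture the pair of 32-sample stereo accumulators is as a double buffer: one is filled with the next frame batch while the other feeds the audio DAC, and the roles then swap. Reading the hardware this way is an interpretation, and the class and method names below are hypothetical. The sketch also computes how long one frame batch lasts at 44.1 kHz.

```python
SAMPLE_RATE_HZ = 44_100   # audio DAC rate, per the description
FRAMES_PER_BATCH = 32     # frames in one frame batch, per the description

class PingPongAccumulators:
    """Two stereo buffers: one being filled, the other being played."""

    def __init__(self):
        silence = [(0, 0)] * FRAMES_PER_BATCH          # (left, right) pairs
        self.buffers = [list(silence), list(silence)]
        self.filling = 0                               # index of the buffer being updated

    def write_batch(self, stereo_frames):
        """Store the next frame batch in the buffer that is not being played."""
        if len(stereo_frames) != FRAMES_PER_BATCH:
            raise ValueError("a frame batch holds exactly 32 frames")
        self.buffers[self.filling] = list(stereo_frames)

    def swap(self):
        """Exchange roles once the playing buffer has been consumed."""
        self.filling ^= 1

    def playing(self):
        """Return the buffer currently feeding the DAC."""
        return self.buffers[self.filling ^ 1]

batch_period_s = FRAMES_PER_BATCH / SAMPLE_RATE_HZ    # about 725.6 microseconds

acc = PingPongAccumulators()
acc.write_batch([(1, 1)] * FRAMES_PER_BATCH)
before_swap = acc.playing()
acc.swap()
after_swap = acc.playing()
```

Under this reading, all voices for the next frame batch must be mixed within roughly 726 microseconds, the time the DAC takes to play the current batch.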
Nearly all blocks of the audio wavetable synthesizer 310 operate synchronously at the clock rate of the PCI bus 320, typically 33 MHz. The blocks operating at the clock rate of the PCI bus 320 include the PCI bus interface unit 402, the audio
synthesizer 408 and all buses. The audio codec 404 and a telephony codec (not shown), which may be included in other embodiments of an audio wavetable synthesizer, operate at various selected rates that are typically based upon a 16.9344 MHz oscillator.
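The clock figures quoted above relate to the 44.1 kHz output rate by simple arithmetic, worked through below. The 33 MHz value is the nominal PCI clock, the 32-voice count comes from the earlier description, and the per-voice figure is only a nominal clock budget, not a statement about how the audio DSP 456 schedules its work.

```python
from fractions import Fraction

PCI_CLOCK_HZ = 33_000_000     # nominal PCI bus clock
CODEC_OSC_HZ = 16_934_400     # 16.9344 MHz codec oscillator
SAMPLE_RATE_HZ = 44_100       # audio DAC rate
NUM_VOICES = 32               # voices performed, per the description

# The codec oscillator is an exact integer multiple of the sample rate.
osc_ratio = Fraction(CODEC_OSC_HZ, SAMPLE_RATE_HZ)

# PCI clock cycles available per output frame, and per voice within a frame.
cycles_per_frame = PCI_CLOCK_HZ / SAMPLE_RATE_HZ
cycles_per_voice_frame = cycles_per_frame / NUM_VOICES
```

The oscillator frequency is exactly 384 times 44.1 kHz, which is why it is a convenient base for codec rates. At a nominal 33 MHz there are about 748 bus clock cycles per frame, or roughly 23 per voice when 32 voices are active.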
Referring to FIG. 5, a flow chart illustrates an embodiment of a method for coding samples of speech sounds, which is performed under the direction of a speech editor program 500. The speech editor program 500 is executed to define a set of
primitive speech elements based on input source material such as email, sample text, dictionaries, literature and the like. The speech editor program 500 is typically an interactive program that includes a translator 510 for translating a speech sample
database 512 based on a speech reference database 514. The speech reference database 514 typically includes various dictionaries, context lists, algorithms, heuristic rules and the like and is used to make decisions relating to selection of primitive
speech element, duration, volume and other parameters.
The speech editor program 500 includes a speech sample acquisition routine 502 for acquiring raw speech samples for storing in the speech sample database 512. The speech sample acquisition routine 502 performs acquisition, processing, and
storage of speech samples. The method of the speech sample acquisition routine 502 includes multiple steps: sensing analog speech signals 502, filtering the signals 504 to constrain the frequency content of the signals to a
preselected frequency band, and digitizing the analog signals 506 using an analog-to-digital converter. In some embodiments, the speech signals are digitally filtered following the step of digitizing the analog signals 506 instead of filtering the
speech signals using analog filters. In other embodiments, both digital and analog filtering are performed. The step of filtering the signals 504 typically involves low pass filtering of the signal, although high pass filtering is also performed in
some embodiments. Filtered signals are stored in the speech sample database 512 for subsequent playback.
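The acquisition steps (sense, band-limit, digitize) can be approximated in software as a rough sketch. Everything specific here is an assumption: the text names no filter type, cutoff frequency, or bit depth, so the one-pole low-pass filter, the 4 kHz cutoff, and the 16-bit quantizer are illustrative choices, and the "analog" input is modeled as a list of floats in the range -1.0 to 1.0.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Simple low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out

def quantize(samples, bits=16):
    """Map floats in [-1.0, 1.0] to signed integers, clamping out-of-range values."""
    full_scale = 2 ** (bits - 1) - 1
    return [max(-full_scale - 1, min(full_scale, round(x * full_scale)))
            for x in samples]

def acquire(analog, cutoff_hz=4000.0, sample_rate_hz=44100.0):
    """Band-limit the input, then digitize it for storage in a sample database."""
    return quantize(one_pole_lowpass(analog, cutoff_hz, sample_rate_hz))

# A constant input passes through; a signal alternating at half the sample rate is attenuated.
steady = acquire([1.0] * 2000)
alternating = one_pole_lowpass([1.0, -1.0] * 500, 4000.0, 44100.0)
```

A one-pole filter is far gentler than the filters used in practical acquisition hardware; it is used here only to keep the sketch short.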
The samples stored in the speech sample database 512 form a wavetable memory defining a plurality of primitive speech sounds. The primitiv