Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates in general to the field of digital audio
systems and, in particular, to systems which include MIDI synthesizers.
Still more particularly the present invention relates to a method and
apparatus for outputting digital audio and MIDI synthesized music with
efficient memory utilization.
2. Description of the Related Art
MIDI, the "Musical Instrument Digital Interface," was established as a
hardware and software specification which would make it possible to
exchange information including musical notes, program changes, expression
control, etc. between different musical instruments or other devices such
as sequencers, computers, lighting controllers, mixers, etc. This ability
to transmit and receive data was originally conceived for live
performances, although subsequent developments have had enormous impact in
recording studios, audio and video production, and composition
environments.
A standard for the MIDI interface has been prepared and published as a
joint effort between the MIDI Manufacturers Association (MMA) and the
Japan MIDI Standards Committee (JMSC). This standard is subject to change
by agreement between JMSC and MMA and is currently published as the MIDI
1.0 Detailed Specification, Document Version 4.1, January 1989.
The hardware portion of the MIDI interface operates at 31.25 KBaud,
asynchronous, with a start bit, eight data bits and a stop bit. This makes
a total of ten bits for a period of 320 microseconds per serial byte. The
start bit is a logical zero and the stop bit is a logical one. Bytes are
transmitted by sending the least significant bit first. Data bits are
transmitted in the MIDI interface by utilizing a five milliamp current
loop. A logical zero is represented by the current being turned on and a
logical one is represented by the current being turned off. Rise times and
fall times for this current loop are less than two microseconds. A five
pin DIN connector is utilized to provide a connection for this current
loop with only two pins being utilized to transmit the current loop
signal. Typically, an opto-isolator is utilized to provide isolation
between devices which are coupled together utilizing a MIDI format.
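The serial timing and framing described above can be checked with a short sketch. The function names below are illustrative only and are not taken from the MIDI specification; the figures (31.25 KBaud, ten bits per byte, least significant bit first, current on for a logical zero) are those stated in the text.

```python
BAUD_RATE = 31_250       # bits per second, as stated in the text
BITS_PER_FRAME = 10      # one start bit + eight data bits + one stop bit


def byte_time_us(baud=BAUD_RATE, bits=BITS_PER_FRAME):
    """Time needed to send one serial byte, in microseconds."""
    return bits * 1_000_000 / baud


def frame_byte(value):
    """Return the ten line bits for one byte.

    Order on the wire: start bit (0), eight data bits with the least
    significant bit first, then the stop bit (1).
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in eight bits")
    data_bits = [(value >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def current_on(bit):
    """Current-loop state: a logical zero turns the current on."""
    return bit == 0


print(byte_time_us())      # 320.0
print(frame_byte(0x90))    # [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
```

The 320 microsecond figure follows directly from ten bits at 31,250 bits per second.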
Communication utilizing the MIDI interface is achieved through multi-byte
"messages" which consist of one status byte followed by one or two data
bytes. There are certain exceptions to this rule. MIDI messages are sent
over any of sixteen channels which may be utilized for a variety of
performance information. There are five major types of MIDI messages:
Channel Voice; Channel Mode; System Common; System Real-Time; and, System
Exclusive. A MIDI event is transmitted as a message and consists of one or
more bytes.
A channel message in the MIDI system utilizes four bits in the status byte
to address the message to one of sixteen MIDI channels and four bits to
define the message. Channel messages are thereby intended for the
receivers in a system whose channel number matches the channel number
encoded in the status byte. An instrument may receive a MIDI message on
more than one channel. The channel in which it receives its main
instructions, such as which program number to be on and what mode to be
in, is often referred to as its "Basic Channel." There are two basic types
of channel messages, a Voice message and a Mode message. A Voice message
is utilized to control an instrument's voices and Voice messages are
typically sent over voice channels. A Mode message is utilized to define
the instrument's response to Voice messages. Mode messages are generally
sent over the instrument's Basic Channel.
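As a minimal sketch of the status byte layout just described, the upper four bits can be read as the message type and the lower four bits as the channel. The table and function name below are illustrative; the data byte counts are the usual MIDI 1.0 values, and channels are reported as 1 through 16.

```python
# Upper nibble of a channel status byte -> (message name, number of data bytes)
CHANNEL_MESSAGES = {
    0x8: ("note off", 2),
    0x9: ("note on", 2),
    0xA: ("polyphonic key pressure", 2),
    0xB: ("control change / channel mode", 2),
    0xC: ("program change", 1),
    0xD: ("channel pressure", 1),
    0xE: ("pitch bend", 2),
}


def parse_channel_status(status):
    """Split a channel status byte into (name, channel 1..16, data byte count)."""
    kind = status >> 4
    if kind not in CHANNEL_MESSAGES:
        raise ValueError("not a channel message status byte")
    name, data_bytes = CHANNEL_MESSAGES[kind]
    channel = (status & 0x0F) + 1
    return name, channel, data_bytes


print(parse_channel_status(0x92))   # ('note on', 3, 2)
```

A receiver would compare the decoded channel with its own channel number, such as its Basic Channel, before acting on the message.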
System messages within the MIDI system may include Common messages,
Real-Time messages, and Exclusive messages. Common messages are intended
for all receivers in a system regardless of the channel that receiver is
associated with. Real-Time messages are utilized for synchronization and
are intended for all clock based units in a system. Real-Time messages
contain status bytes only, and do not include data bytes. Real-Time
messages may be sent at any time, even between bytes of a message which
has a different status. Exclusive messages may contain any number of data
bytes and can be terminated either by an end of exclusive or any other
status byte, with the exception of Real-Time messages. An end of exclusive
should be sent at the end of a system exclusive message. System exclusive
messages always include a manufacturer's identification code. If a
receiver does not recognize the identification code it will ignore the
following data.
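The termination rule for Exclusive messages, and the way Real-Time bytes may appear in the middle of another message, can be sketched as follows. This is a simplified illustration rather than a complete MIDI parser: the function name is hypothetical, an unterminated message at the end of the stream is simply dropped, and the first data byte of each message would be the manufacturer's identification code.

```python
SOX = 0xF0   # start of system exclusive
EOX = 0xF7   # end of system exclusive


def is_status(byte):
    """Status bytes have the most significant bit set."""
    return byte & 0x80 != 0


def is_real_time(byte):
    """Real-Time messages are the single status bytes 0xF8 through 0xFF."""
    return byte >= 0xF8


def split_sysex(stream):
    """Return (exclusive_messages, real_time_bytes) from a sequence of bytes."""
    messages, real_time = [], []
    current = None   # data bytes of the exclusive message in progress, if any
    for byte in stream:
        if is_real_time(byte):
            real_time.append(byte)       # may arrive anywhere; ends nothing
        elif byte == SOX:
            if current is not None:
                messages.append(current)
            current = []
        elif is_status(byte):            # EOX or any other status terminates
            if current is not None:
                messages.append(current)
                current = None
        elif current is not None:
            current.append(byte)
    return messages, real_time


stream = [0xF0, 0x43, 0x01, 0xF8, 0x02, 0xF7, 0x90, 0x3C, 0x40]
print(split_sysex(stream))   # ([[67, 1, 2]], [248])
```

Note that the timing byte 0xF8 arrives between two data bytes of the exclusive message without ending it.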
As those skilled in the art will appreciate upon reference to the
foregoing, musical compositions may be encoded utilizing the MIDI standard
and stored and/or transmitted utilizing substantially less data than a
digitized recording of the same performance would require. The MIDI
standard permits the use of a serial listing of program status messages
and channel messages, such as "note on" and "note off" as control
messages.
When utilized in conjunction with various MIDI-controlled sound-generating
devices or modules, musical compositions may be recorded and played.
As will hereinafter be detailed, these sound generators or "modules" have
taken many forms. In one form, referred to as "wavetable" or subtractive
synthesis, stored wave forms (shorter than an entire sampled sound
discussed below) are operated upon by filters, voltage controlled
amplifiers, and the like to generate or "synthesize" sound. One benefit of
this approach in addition to creating new and unusual sound forms not
present in nature was that relatively little memory was required, which,
in low-end computer systems, can be an extremely precious commodity.
Yet another form of sound generation took the form of sampling, digitizing,
and storing an analog acoustic signal, and then subsequently converting it
back to analog form during playback. A distinct advantage to this approach
was that it frequently could emulate complex acoustic wave forms in a far
more realistic and convincing manner than other techniques known in the
art. However there was a price to be paid for such realism. The data rate
required for such simple sampling systems can be quite enormous with
several tens of thousands of bits of data and associated memory being
required for each second of audio signal.
As a consequence, many different encoding systems have been developed to
decrease the amount of data required in such systems. For example, many
modern digital audio systems utilize pulse code modulation (PCM) which
employs a variation of a digital signal to represent analog information.
Such systems may utilize pulse amplitude modulation (PAM), pulse duration
modulation (PDM) or pulse position modulation (PPM) to represent
variations in an analog signal.
One variation of pulse code modulation, Delta Pulse Code Modulation (DPCM)
achieves still further data compression by encoding only the difference
between one sample and the next sample. Thus, despite the fact that an
analog signal may have a substantial dynamic range, if the sampling rate
is sufficiently high so that adjacent signals do not differ greatly,
encoding only the difference between two adjacent signals can save
substantial data. Further, adaptive or predictive techniques are often
utilized to further decrease the amount of data necessary to represent an
analog signal by attempting to predict the value of a signal based upon a
weighted sum of previous signals or by some similar algorithm.
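The difference-encoding idea can be illustrated with a small lossless sketch. It assumes integer samples and stores the first sample followed by successive differences; practical DPCM and adaptive coders also quantize the differences, which is not shown here. The function names are illustrative.

```python
def dpcm_encode(samples):
    """Encode as the first sample followed by sample-to-sample differences."""
    if not samples:
        return []
    deltas = [samples[0]]
    for previous, current in zip(samples, samples[1:]):
        deltas.append(current - previous)
    return deltas


def dpcm_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    total = 0
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples


samples = [1000, 1003, 1001, 998, 1002]
encoded = dpcm_encode(samples)
print(encoded)                 # [1000, 3, -2, -3, 4]
print(dpcm_decode(encoded))    # [1000, 1003, 1001, 998, 1002]
```

After the first value, the differences in this example are far smaller than the samples themselves, which is why they can be stored in fewer bits when adjacent samples do not differ greatly.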
In each of these digital audio techniques speech or an audio signal may be
sampled and digitized utilizing straightforward processing and
digital-to-analog or analog-to-digital conversion techniques to store or
recreate the signal.
While the aforementioned digital audio systems may be utilized to
accurately store speech or other audio signal samples, even with data
compression the substantial penalty in storage requirements must be paid
as compared with those required in MIDI-controlled synthesized systems
described above. However, in systems where it is desired to recreate
realistic human speech or other acoustic sounds, there often exists no
appropriate alternative.
Several hybrid approaches were attempted in the prior art seeking to obtain
the benefits of both synthesized sound, such as wavetable synthesis, and
sampled sound as hereinbefore discussed. In one such attempt, a parallel
implementation of both wavetable synthesis and sampled sounds was provided
in hardware, a representative example being the SY77 Synthesizer
manufactured by the Yamaha Corporation. Such a synthesizer provided for
switching between wavetable or sample-generated sounds and in some limited
instances cross-connection between features of each (such as using the VFO
of the wavetable synthesizer with a playback of a sampled sound). While
thus providing the benefits of both sampled and wavetable synthesis, the
obvious limitation of this parallel implementation was the requirement of
dual parallel implementations having attendant cost increases.
In still another attempt to provide a hybrid approach offering benefits of
wavetable and sampled synthesis, referred to in the art as "LA" synthesis
and as implemented representatively by various synthesizers manufactured
by the Roland Corporation, the generated waveform was a combination of a
sampled and wavetable-generated waveform. It has been found
psychoacoustically that much of the character of a sound is identified in
the human ear by the information carried in the attack portion of a
waveform. Accordingly, in accordance with this technique, a first attack
portion of a waveform was generated by means of playback of an actual
sampled attack of the desired instrument, thereby lending the necessary
realism to the implementation of the sound. This was of course at the cost
of memory in that as previously discussed such sampled waveforms, for any
reasonable resolution and signal-to-noise ratio, require relatively more
memory than a corresponding sound genesis technique utilizing synthesis
such as wavetable synthesis. Nevertheless, because only the attack portion
of the sound was generated by an actual sampled sound, memory was saved
which would otherwise have to be used if the entire waveform was a sample
playback. The remaining portion of the desired waveform was thence
generated by means of the second technique, namely wavetable synthesis
which provided more or less the sustained or steady state portion of the
desired waveform. Inasmuch as this portion was generated by wavetable
synthesis with less severe memory requirements than would otherwise be
necessary if this portion of the waveform were generated by a stored
sample, a savings in memory was thereby realized. Although there were
distinct benefits to this hybrid approach, such as the ability to generate
new sounds which were combinations of sampled and wavetable-generated
artificial sounds, there were nevertheless serious drawbacks to this
approach as well.
First, provision was not made for selecting one or the other mode of
sound generation for generating the entirety of the sound. One reason, of
course, was that this would defeat the purpose of such a hybrid approach
inasmuch as for the sampling case, for example, it would require storage
not only of just the attack portion of the sampled waveform but the rest
of the waveform (for which the whole approach was directed to saving the
memory otherwise necessary to create this portion). Yet another serious
drawback to this approach was that there was no provision made for
uploading, altering, or otherwise upgrading the sounds by way of altering
and adding to the existing sample portions and wavetable parameters.
In yet another attempt to avoid the problems of the aforementioned
approaches requiring dual hardware, limitations in upgrading new sounds or
providing for a complete sampled or wavetable sound implementation if
desired, development also focused on digital signal processor or DSP sound
generation. In such an approach, wherein the DSP could implement the sound
generation, attempts have been made to reconfigure the DSP dynamically to
generate either sampled or synthesized sound as desired. In such an
implementation, particularly wherein an expensive multi-tasking DSP system
was not provided, it was found necessary to load DSP code implementing
either the wavetable or sample-based sound generation on the fly, and to
switch between these forms of code dynamically, determining from the
incoming MIDI datastream which mode the DSP should switch to.
Such a system was found to be extremely difficult to implement, one
alternative being to provide multiple copies of DSP code simultaneously
available depending upon the mode desired. The problems with the approach
of dynamically loading DSP code, depending upon the sound-generation
technique desired, were compounded in multi-tasking operating systems since
it was difficult if not impossible to know, due to the ongoing task
switching, when the appropriate time was and how to coordinate the loading
and switching of the DSP code, again resulting in a need to load complete
sets of DSP code and permit the multi-tasking system to perform the
switching.
Multimedia is an emerging market wherein MIDI capability is a key
multimedia element. However, as previously noted, a serious problem for
low-end systems which may become prevalent in homes and school
environments is in maintaining low cost of the system which
characteristically results in relatively small memory systems, giving rise
to the aforementioned problems. As the use of MIDI increases as well, it
is likely to further increase adoption by low-end users where equipment
expenditures in this area are extremely limited. Thus, techniques are
highly sought which will provide for multimedia function to operate on
smaller, less expensive systems such as techniques for saving memory. Such
memory costs in low-end systems may be the critical difference in
successfully providing systems in the high volume, low price market.
Specifically, a means was needed to provide for MIDI, including sampled
sounds on limited hardware while nevertheless providing the highest
quality sound possible within these constraints of low price systems.
It was thus apparent that a need existed for a method and apparatus whereby
certain digitized audio samples, such as human speech and acoustical
musical sounds, could be recreated and combined with synthesized music
utilizing a MIDI data file in such a way as to obtain the benefits of both
approaches, while at the same time accounting for these severe limitations
imposed on memory availability by low end systems.
More particularly, it was found highly desirable to provide a single
hardware configuration implementing multiple modes of sound generation,
and in particular, either synthesized (such as wavetable) sounds or
sampled sound generation. Still further, it was found desirable to provide
for such a system which would not require dynamic reloading of code such
as DSP code and which would not require inordinate time to be spent trying
to determine which modules of DSP code to execute. Yet a further object
was to provide a system providing the benefits of both synthesized and
sampled sounds wherein it was nevertheless possible to upgrade the system
with improved synthesized and sampled sounds. Still further, it was
desired to implement the system wherein a basic set of acceptable sounds
was provided (such as the standard 175 General MIDI implementation sounds)
implemented with a reasonably cost effective yet pleasing system such as
wavetable synthesis, and wherein, if desired, the user might nevertheless
upgrade the quality of these sounds to sampled sounds which could be
automatically substituted for the corresponding General MIDI wavetable
synthesized sounds if available as desired and as the system resources
permitted.
These and other benefits are provided by the invention which will best be
understood by reference to the following detailed description of an
illustrative embodiment when read in conjunction with the accompanying
drawings wherein:
SUMMARY OF THE INVENTION
A system and method are provided for improving quality of sound generated
by computerized systems having limited memory. A wavetable synthesizer is
implemented wherein data utilized to synthetically generate acoustic
waveforms is stored. A plurality of datasets is also generated and stored,
each comprised of a digitized acoustic waveform. In response to a MIDI
datastream, the system determines if an appropriate stored acoustic sample
corresponding thereto resides in the system's memory. If so, the system
will generate the desired sound utilizing the stored acoustic sample data.
If not, the system automatically determines in real time the appropriate
wavetable dataset which will generate a sound most closely approximating
the acoustic sound. The system thus dynamically reconfigures in real time
between wavetable and acoustic sample synthesis, being configured for the
former when appropriate acoustic samples are not present.
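The selection logic summarized above can be sketched as a lookup with a fallback. The names used here (`select_voice`, `sample_bank`, `wavetable_bank`) and the use of the MIDI program number as the key are assumptions made for illustration; they are not taken from the described embodiment.

```python
def select_voice(program, sample_bank, wavetable_bank):
    """Choose the data used to voice a MIDI program number.

    A stored acoustic sample is used when one is resident in memory;
    otherwise the wavetable dataset for the same program is used.
    """
    if program in sample_bank:
        return "sample", sample_bank[program]
    return "wavetable", wavetable_bank[program]


# Hypothetical banks: only program 0 has an upgraded sample loaded.
sample_bank = {0: "piano_sample_data"}
wavetable_bank = {0: "piano_wavetable", 40: "violin_wavetable"}

print(select_voice(0, sample_bank, wavetable_bank))    # ('sample', 'piano_sample_data')
print(select_voice(40, sample_bank, wavetable_bank))   # ('wavetable', 'violin_wavetable')
```

Because the choice is made per event from data already in memory, no synthesis code has to be reloaded when the outcome changes.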
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a computer system which may be utilized to
implement the method and apparatus of the present invention;
FIG. 2 is a block diagram illustrating the prior art system of sampling
synthesis;
FIG. 3 is a block diagram illustrating the prior art system of subtractive
synthesis;
FIG. 4 is a block diagram illustrating the prior art system of wavetable
synthesis;
FIG. 5 is a block diagram of a dynamically configuring synthesis method and
apparatus in accordance with the present invention;
FIG. 6 is a block diagram of control structures used in the conversion of
MIDI events to the selection of voicing parameters and waveforms or
samples;
FIG. 7 is a block diagram illustrating how the ADSRs and LFO are commonly
shared between the oscillator, filter, and digitally controlled amplifier
(DCA);
FIG. 8 is a flowchart of the method and apparatus of the present invention;
FIG. 9 is a block diagram of a portion of a computer system of FIG. 1 used
in implementing the method and apparatus of the present invention,
including an audio adapter having a digital signal processor and
digital-to-analog and analog-to-digital converters.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
With reference now to the figures and in particular with reference to FIG.
1, there is depicted a block diagram of a computer system 1 which may be
utilized to implement the method and apparatus of the present invention.
Related technology for implementing the invention regarding sampling,
MIDI, DSP and the like may be found in patent application Ser. Nos.
07/608,111; 07/608,105; 07/608,126; and 07/770,494 which are incorporated
herein by reference. As is illustrated, a computer system 1 is depicted
which will implement a dynamically configurable synthesizer generating
wavetable synthesized as well as sampled acoustic sound, preferably under
MIDI control, in accordance with the teachings of the invention. Computer
system 1 may be implemented utilizing any state-of-the-art digital
computer system having a suitable digital signal processor disposed
therein which is capable of implementing a MIDI synthesizer. For example,
computer system 1 may be implemented utilizing an IBM PS/2 type computer
which includes an IBM Audio Capture & Playback Adapter (ACPA).
Also included within computer system 1 is display 3. Display 3 may be
utilized, as those skilled in the art will appreciate, to display those
command and control features typically utilized in the processing of audio
signals within a digital computer system. Also coupled to computer system
1 is computer keyboard 4 which may be utilized to enter data and select
various files stored within computer system 1 in a manner well known in
the art. Of course, those skilled in the art will appreciate that a
graphical pointing device, such as a mouse or light pen, may also be
utilized to enter commands or select appropriate files within computer
system 1.
Still referring to computer system 1, it may be seen that processor 2 is
depicted. Processor 2 is preferably the central processing unit for
computer system 1 and, in the depicted embodiment of the present
invention, preferably includes an audio adapter capable of implementing a
MIDI synthesizer by utilizing a digital signal processor. One example of
such a device is the IBM Audio Capture & Playback Adapter (ACPA).
As is illustrated, MIDI file 6 and digital audio file 7 are both depicted
as stored within memory within processor 2. The output of each file may
then be coupled to interface/driver circuitry 8. Interface/driver
circuitry 8 is preferably implemented utilizing any suitable audio
application programming interface which permits the accessing of MIDI
protocol files or digital audio files and the coupling of those files to
an appropriate device driver circuit within interface/driver circuitry 8.
Thereafter, the output of interface/driver circuitry 8 is coupled to
digital signal processor 9. Digital signal processor 9, in a manner which
will be explained in greater detail herein, is utilized to output digital
audio and MIDI synthesized music and to couple that output to audio output
device 5. Audio output device 5 is preferably an audio speaker or pair of
speakers in the case of stereo music files.
Turning now to FIG. 2, in order to more fully comprehend the invention it
will be helpful to describe a technique referred to as sampling synthesis
utilized in the music synthesizer art today in order to generate sounds of
existing (as well as non-existent) musical instruments. Depicted in FIG. 2
is a functional block diagram of such an instrument. In the simplest case,
an existing instrument is "tape recorded" in the sense that a single note
is played from the instrument, and that note is subsequently digitized for
storage in digital memory, shown as sample data 10. Playback of that sound
by a "sampler" device is performed in a manner analogous to playing back
the original tape. Many instrument sounds have variable-length durations.
The clarinet, for example, will continue to sound as long as the musician
continues to blow into the mouthpiece. This is in contrast, for example,
to a drum, whose sound quickly dies out at a fairly constant rate after
being struck. A sampler allows notes of different lengths to be generated
using a technique known as looping. A section of the digitized waveform is
played back repeatedly, thus giving the impression of a continuous sound.
Various functions may be implemented in analog circuitry or in the digital
domain to enhance the sound. For example a low frequency oscillator 14 may
be provided with an output signal 26 which operates upon the sample data
output 24 to modulate the sound from playing back the samples in a desired
manner to create a vibrato. The interpolating oscillator 12 receiving the
sample data output 24 and vibrato data 26 operates upon this data to
produce a vibrato modulated audio signal of the desired average pitch.
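Looping and interpolated playback can be sketched together as follows. This is an illustrative reading of the technique, not the circuit of FIG. 2: the function name, the linear interpolation, and the fixed `step` (the playback pitch ratio) are assumptions. A vibrato could be obtained by varying `step` slightly with a low frequency oscillator, which is omitted here for brevity.

```python
def play_looped(sample, loop_start, loop_end, n_out, step=1.0):
    """Read n_out points from sample, repeating sample[loop_start:loop_end].

    step is the read increment per output point (1.0 = original pitch).
    Values between stored points are linearly interpolated.
    """
    if not 0 <= loop_start < loop_end <= len(sample):
        raise ValueError("invalid loop points")
    if step <= 0:
        raise ValueError("step must be positive")
    loop_length = loop_end - loop_start
    out = []
    position = 0.0
    for _ in range(n_out):
        index = int(position)
        fraction = position - index
        # The neighbor of the last loop point is the first loop point.
        following = index + 1 if index + 1 < loop_end else loop_start
        out.append(sample[index] * (1 - fraction) + sample[following] * fraction)
        position += step
        while position >= loop_end:
            position -= loop_length
    return out


sample = [0, 10, 20, 30]
print(play_looped(sample, 2, 4, 7))            # [0.0, 10.0, 20.0, 30.0, 20.0, 30.0, 20.0]
print(play_looped(sample, 2, 4, 3, step=0.5))  # [0.0, 5.0, 10.0]
```

Only four stored points are needed to produce an output of any length, which is the memory saving that looping provides.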
Still referring to FIG. 2, yet another technique for enhancing the
played-back sample data commonly used in samplers is filtering. A filter
is utilized to change the tonal quality of the digitized waveform. This is
effective in producing the types of changes that occur to sounds that a
musical instrument will make when played at different volumes. Generally
speaking, for example, a musical instrument will generate a brighter sound
when played loudly. A filter may therefore be utilized to remove some of
the brightness from a waveform when being played quietly. In the block
diagram of a typical sampler in FIG. 2, such a filter 16 is thereby
provided which operates on the output 28 of the interpolating oscillator 12
to generate a filter output 32.
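One simple way to picture this behavior is a one-pole low-pass filter whose smoothing depends on how hard the note is played. The sketch below is an assumption for illustration only: the text does not specify the filter type, and the linear mapping from MIDI velocity to a filter coefficient is hypothetical.

```python
def velocity_to_coefficient(velocity):
    """Map MIDI velocity 1..127 to a smoothing coefficient in (0, 1].

    Hypothetical linear mapping: louder notes give a coefficient nearer 1,
    which removes less of the high-frequency content.
    """
    if not 1 <= velocity <= 127:
        raise ValueError("velocity must be in 1..127")
    return velocity / 127


def one_pole_lowpass(signal, coefficient):
    """y[n] = y[n-1] + coefficient * (x[n] - y[n-1]), starting from y = 0."""
    out = []
    y = 0.0
    for x in signal:
        y += coefficient * (x - y)
        out.append(y)
    return out


print(one_pole_lowpass([1.0, 1.0, 1.0], 0.5))   # [0.5, 0.75, 0.875]
print(one_pole_lowpass([3.0, -2.0], velocity_to_coefficient(127)))   # [3.0, -2.0]
```

At full velocity the signal passes unchanged, while a smaller coefficient rounds off rapid changes and so reduces brightness.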
Yet another desired capability of such a sampler is to control the
amplitude of the resulting output 36. This may be conveniently effected by
means of an amplifier 20 receiving the output of the filter 32, whereby
the amplifier, after operating upon the filter output 32 generates the
desired output 36. It will be appreciated that in a manner well known in the
art it has been found convenient to regulate operation of such filters 16
and amplifiers 20 by means of voltage control, and consequently ADSR
generators 18 and 22 may typically be provided having respective outputs
30 and 34 that operate upon their respective filter 16 or amplifier 20.
Such an ADSR generator will be easily recognized in the art as being an
attack, decay, sustain, and release generator providing an envelope
comprised, in sequence, of such an attack, decay, sustain, and release
value defining the envelope which will be a voltage value whose magnitude
regulates the amount of filtering or amplification provided.
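An ADSR envelope of the kind described can be sketched as a piecewise-linear sequence of control values between 0 and 1. The function names, the use of discrete steps rather than a voltage, and the linear segment shapes are assumptions made for illustration.

```python
def adsr(attack, decay, sustain_level, release, note_length):
    """Return envelope levels for a note held for note_length steps.

    attack, decay, and release are durations in steps; sustain_level is
    the level held after the decay until the note is released.
    """
    envelope = []
    for n in range(note_length):
        if n < attack:
            envelope.append(n / attack)                 # rise toward 1.0
        elif n < attack + decay:
            progress = (n - attack) / decay
            envelope.append(1.0 - (1.0 - sustain_level) * progress)
        else:
            envelope.append(sustain_level)              # hold while note is on
    last = envelope[-1] if envelope else 0.0
    for n in range(release):
        envelope.append(last * (1 - (n + 1) / release)) # fall to zero
    return envelope


def apply_envelope(signal, envelope):
    """Scale each signal value by the matching envelope level (amplifier)."""
    return [s * e for s, e in zip(signal, envelope)]


print(adsr(2, 2, 0.5, 2, 6))
# [0.0, 0.5, 1.0, 0.75, 0.5, 0.5, 0.25, 0.0]
```

The same envelope values could instead drive a filter coefficient, which corresponds to the separate ADSR generators 18 and 22 acting on the filter 16 and the amplifier 20.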
A shortcoming of the foregoing sampler technique just described is that it
requires large amounts of memory 10 to store each digitized sound, even if
techniques are employed in an attempt to reduce the requirements of such
memory such as the looping technique previously mentioned wherein to
obtain a sustained sound the same data is read out over and over and
converted into sound rather than having to capture and digitize the entire
duration of the desired sound. In an environment such as a synthesizer
implemented utilizing a DSP attached to a personal computer, it may not be
possible to guarantee that a given amount of such memory 10 will be
available for the storage of musical instrument digital waveforms, i.e.,
"samples". The previously mentioned General MIDI Mode standard
nevertheless requires that a base set of 175 musical instruments and
special effects sounds be available. This obviously poses a problem if
there isn't enough such memory 10 available to hold the samples for all
175 sounds.
Turning now to FIG. 3, yet an additional technique of sound generation
known in the prior art should be understood to gain a comprehensive
understanding of the invention. FIG. 3 is a simplified block diagram of
yet another type of synthesizer known in the art referred to as a
subtractive synthesizer, such subtractive synthesis being popularized
during the mid-1970's as for example in the well known Moog synthesizer.
This type of synthesizer utilizes an oscillator 40 to generate a
continuous fixed periodic waveform shown as oscillator output 52. As in
the case of the sampling synthesis of FIG. 2, a low frequency oscillator
42 may be provided for similar reasons having an output 54 modulating the
oscillator 40 to provide a modulated output 52 including vibrato as
desired. Also similar to the sampling synthesis technique illustrated in
FIG. 2, a filter 44 may be provided to modify the harmonic content of the
oscillator output 52 in response to an ADSR generator 46 output 58. The
output 56 of the filter, containing the oscillator output 52 with its
harmonic content modified under control of the ADSR generator 46, will then
preferably be delivered to a voltage controlled amplifier 48 in the manner
of the synthesizer depicted in FIG. 2 whereby the envelope of the signal
may thereby be shaped by the operation of a second ADSR generator 50,
whose output 62 regulates the amount of amplification by the amplifier 48
utilized to generate the output 60.
Yet a third form of sound generation should be understood in gaining an
appreciation of the subject invention known as wavetable synthesis. A
wavetable synthesizer, which is a derivative of the subtractive
synthesizer of FIG. 3, is depicted in a functional block diagram
in FIG. 4. This form of synthesizer will be recognized as being quite
similar to that of FIG. 3. More particularly, an interpolating oscillator
72 is provided which operates upon sound data 84 in response to a vibrato
output 86 from a low frequency oscillator 74, resulting in a modulated
output 88 delivered to a filter 76. In a typical embodiment