| United States Patent | 5038658 |
| Link to this page | http://www.wikipatents.com/5038658.html |
| Inventor(s) | Tsuruta; Schichirou (Osaka, JP);
Takashima; Yosuke (Tokyo, JP);
Fujimoto; Masaki (Tokyo, JP);
Mizuno; Masanori (Tokyo, JP) |
| Abstract | An automatic music transcription method and system for generating a musical
score from an input acoustic signal. The acoustic signal may include vocal
songs, vocal humming, and music from musical instruments. The system
comprises means for extracting pitch information and power information
from the input acoustic signal, for correcting the pitch information based
on the deviation of the acoustic signal relative to an absolute musical
scale, for dividing the acoustic signal into a set of single-sound
segments using the corrected pitch information, dividing the acoustic
signal into a second set of single-sound segments this time using changes
in the power information, for dividing the acoustic signal in still
greater detail using information contained in both previous segmentations,
for associating each segment with a musical interval of an absolute
musical scale, and for determining single-sound segments depending on
whether or not the musical intervals of adjacent segments are identical,
for determining the key of the acoustic signal, for correcting the
placement of the segments on the musical scale of the determined key using
the pitch information, for determining the time and tempo of the acoustic
signal using this placement, and for compiling musical score data using
the determined musical scale, sound length, key, time, and tempo of the
acoustic signal. |
Title Information  |
Method for automatically transcribing music and apparatus therefore |
| Publication Date |
August 13, 1991 |
| Filing Date |
February 27, 1989 |
| Priority Data |
Feb 29, 1988[JP]63-46111
Feb 29, 1988[JP]63-46112
Feb 29, 1988[JP]63-46113
Feb 29, 1988[JP]63-46114
Feb 29, 1988[JP]63-46115
Feb 29, 1988[JP]63-46116
Feb 29, 1988[JP]63-46117
Feb 29, 1988[JP]63-46118
Feb 29, 1988[JP]63-46119
Feb 29, 1988[JP]63-46120
Feb 29, 1988[JP]63-46121
Feb 29, 1988[JP]63-46122
Feb 29, 1988[JP]63-46123
Feb 29, 1988[JP]63-46124
Feb 29, 1988[JP]63-46125
Feb 29, 1988[JP]63-46126
Feb 29, 1988[JP]63-46127
Feb 29, 1988[JP]63-46128
Feb 29, 1988[JP]63-46129
Feb 29, 1988[JP]63-46130 |
Claims  |
What is claimed is:
1. A method for transcribing music onto an absolute musical interval axis
with predetermined frequencies marking boundaries of each interval,
comprising the steps of:
inputting an acoustic signal;
extracting pitch information and power information from said acoustic
signal;
correcting said pitch information by determining a musical interval axis of
said pitch information according to a predetermined algorithm and then
shifting the pitch of said pitch information so that a musical interval
axis of the shifted pitch information according to said algorithm matches
the absolute musical interval axis;
first dividing said acoustic signal into first single sound segments on the
basis of said corrected pitch information while second dividing said
acoustic signal into second single sound segments on the basis of power
changes in said power information;
third dividing said acoustic signal into third single sound segments on the
basis of both said first and second single sound segments;
identifying musical intervals in said acoustic signal by matching each of
said third single sound segments to one of said predetermined frequencies
marking the boundaries of the absolute musical interval axis;
fourth dividing said acoustic signal again into fourth single sound
segments by combining adjacent third single sound segments which are
matched to the same predetermined marking frequency;
determining a key inherent in said acoustic signal on the basis of the
pitch information extracted in said extracting pitch information step;
correcting the matching of said fourth dividing step using said determined
key;
fifth dividing said acoustic signal again into fifth single sound segments
by combining adjacent third single sound segments which are matched to the
same predetermined marking frequency;
determining a time and tempo inherent in said acoustic signal on the basis
of said corrected segment information; and
compiling musical score data from the fifth single sound segments, the
predetermined marking frequency on the absolute musical interval axis to
which each of the fifth single sound segments is matched, the key, the
time and the tempo.
2. The method for transcribing music of claim 1, further comprising the
step of:
eliminating noise from and interpolating said extracted pitch and power
information, the noise eliminating and interpolating step being performed
after said step of extracting pitch and power information and before said
step of correcting said pitch information.
3. The method for transcribing music of claim 1, wherein said second
dividing step comprises the steps of:
comparing said power information to a predetermined value and dividing said
acoustic signal into a first section larger than said predetermined value
while recognizing said first section as an effective section and also
dividing said acoustic signal into a second section smaller than said
value while recognizing said second section as an invalid section;
extracting a point of change where said power information rises with
respect to said effective section;
dividing said effective segment into smaller parts at said point of change;
measuring the length of said segments of both of said effective and invalid
sections; and
connecting any segment with a length shorter than a predetermined length to
the preceding segment to form one segment.
4. The method for transcribing music of claim 1, wherein said second
dividing step comprises the steps of:
comparing said power information to a predetermined value and dividing said
acoustic signal into a first section larger than said predetermined value
while recognizing said first section as an effective section and also
dividing said acoustic signal into a second section smaller than said
value while recognizing said second section as an invalid section;
extracting a point of change where said power information rises with
respect to said effective section; and
dividing said acoustic signal on the basis of said extracted point of
change.
5. The method for transcribing music of claim 1, wherein said second
dividing step comprises the steps of:
dividing said acoustic signal into a first section larger than a
predetermined value while recognizing said first section as an effective
section and into a second section smaller than said predetermined value
while recognizing said second section as an invalid section;
measuring the length of both said first and second sections; and
connecting any segment with a length shorter than a predetermined length to
the preceding segment.
6. The method for transcribing music of claim 1, wherein said second
dividing step comprises the steps of:
extracting a point of change where said power information rises; and
dividing said acoustic signal with respect to said point of change.
7. The method for transcribing music of claim 1, wherein said second
dividing step comprises the steps of:
extracting a point of change where said power information rises;
dividing said acoustic signal with respect to said point of change; and
connecting any segment with a length shorter than a predetermined length to
the preceding segment.
8. The method for transcribing music of claim 1 wherein the acoustic signal
is sampled into individual sampling points, wherein said first dividing
step comprises the steps of:
analyzing said individual sampling points of the acoustic signal using said
extracted pitch information to determine a length of a series of said
sampling points in which the pitch of said sampling points remains in a
range;
detecting a section in which said determined length of said series exceeds
a predetermined value;
identifying the sampling point beginning the series having the maximum
series length of said detected sections to be the typical point;
detecting the amount of the variation in said pitch information between
adjacent typical points with respect to the individual sampling points
between them when the difference in said pitch information at two adjacent
typical points exceeds a predetermined value; and
dividing said acoustic signal at one of said sampling points between
adjacent typical points where the amount of variation between said one
sampling point and an adjacent sampling point is maximum.
9. The method for transcribing music of claim 1, wherein said third
dividing step comprises the steps of:
determining a standard length of a note corresponding to a predetermined
duration of time on the basis of the length of each of said first single
sound segments divided in said first dividing step; and
dividing each of said first single sound segments on the basis of said
determined standard length and dividing said single sound segments again
which have lengths longer than said predetermined duration of time of said
note.
10. The method for transcribing music of claim 1, wherein said step of
identifying musical intervals comprises the steps of:
calculating the differences in pitch between the pitches of each of said
third single sound segments and said predetermined frequencies of said
absolute musical interval;
detecting the smallest difference; and
recognizing the musical interval of said third single sound segment to be
at said predetermined frequency on said absolute musical interval axis in
relation to which the pitch of said third single sound segment has said
smallest difference.
11. The method for transcribing music of claim 1, wherein said step of
identifying musical intervals comprises the steps of:
calculating an average value of all said pitch information of each of said
third single sound segments; and
recognizing the musical interval of each of said third single sound
segments to be at the predetermined frequency on said absolute musical
interval axis in relation to which said calculated average pitch value of
said third single sound segment is closest.
12. The method for transcribing music of claim 1, wherein said step of
identifying musical intervals comprises the steps of:
extracting an intermediate value of said pitch information of each of said
third single sound segments; and
recognizing the musical interval of each of said third single sound
segments to be at the predetermined frequency on said absolute musical
interval axis in relation to which said intermediate value is closest.
13. The method for transcribing music of claim 1, wherein said step of
identifying musical intervals comprises the steps of:
extracting the most frequent value of said pitch information of each of
said third single sound segments; and
recognizing the musical interval of each of said third single sound
segments to be at the predetermined frequency on said absolute musical
interval axis in relation to which said most frequent value is closest.
14. The method for transcribing music of claim 1, wherein said step of
identifying musical intervals comprises the steps of:
extracting the peak point pitch value of said power information for each of
said third single sound segments; and
recognizing the musical interval of each of said third single sound segments
to be at the predetermined frequency on said absolute musical interval
axis in relation to which said peak point pitch value is closest.
15. The method for transcribing music of claim 1, wherein the acoustic
signal is sampled into individual sampling points, wherein the step of
identifying musical intervals comprises the steps of:
analyzing said individual sampling points of the acoustic signal using said
extracted pitch information to determine a series for each of said
sampling points in which the pitch of said sampling points in the series
remains in a range;
identifying which of said series in each of said third single sound
segments has the longest length;
finding an analytical point for said series of longest length in each of
said third single sound segments, the analytical point being the sampling
point about which the pitches of all other sampling points fall within
half of said range; and
identifying each of said third single sound segments with a predetermined
pitch of the absolute musical interval axis by matching the pitch of the
analytical point to the closest predetermined pitch on the absolute
musical interval axis.
16. The method for transcribing music of claim 1, wherein said step of
identifying musical intervals comprises the steps of:
extracting segments with lengths lower than a predetermined value;
extracting segments which have changes in pitch information of a particular
constant inclination;
detecting the differences in pitch between the identified musical interval
of each of said extracted segments and adjacent segments;
identifying the musical interval of both the extracted segment and the
adjacent segment to be the predetermined marking frequency of the absolute
musical interval axis which is closest to either of the extracted segment
and the adjacent segment which is smaller than a predetermined value as an
actual musical interval.
17. The method for transcribing music of claim 1, wherein said step of
identifying musical intervals comprises the steps of:
extracting segments of said acoustic signal which begin and end according
to a half step above and a half step below each of the predetermined
frequencies of the absolute musical interval axis;
classifying totals of each of said extracted segments in said acoustic
signal which corresponds to the same predetermined frequency on the
absolute musical interval axis; and
identifying the musical interval of each of said segments in accordance
with said classified totals.
18. The method for transcribing music of claim 1, wherein said key
determining step comprises the steps of:
classifying totals of said pitch information with respect to the absolute
musical interval axis;
extracting a frequency of occurrence of each of said predetermined
frequencies on the absolute musical interval axis;
calculating product sums of predetermined weighting coefficients and said
extracted frequency of occurrence of each of said predetermined
frequencies on the absolute musical interval axis, a different calculation
being performed for each musical key; and
identifying the key of the acoustic signal to be the particular musical key
resulting in the maximum product sum calculation.
19. The method for transcribing music of claim 1, wherein said step of
extracting pitch information comprises the steps of:
converting said acoustic signal into digital form;
calculating an autocorrelation function of said acoustic signal in the
digital form;
detecting an amount of deviation giving the maximum of the local maximum
for said calculated autocorrelation functions by an amount of deviation
other than zero;
detecting an approximate curve through which said autocorrelation functions
of a plurality of sampling points including that giving said amount of
deviation pass;
determining an amount of deviation resulting in the local maximum of said
autocorrelation on said calculated approximate curve; and
detecting a pitch frequency in accordance with said determined amount of
deviation.
20. The method for transcribing music of claim 1, wherein said step of
extracting pitch information comprises the steps of:
converting said acoustic signal into digital form;
calculating an autocorrelation function of said acoustic signal in the
digital form;
detecting a pitch information in accordance with the maximum information of
said calculated autocorrelation function;
judging whether the local maximum point of said autocorrelation function
exists approximate to two-times of the largest frequency component of said
detected pitch information; and
outputting pitch information corresponding to said local maximum if the
result of said judging is positive.
21. The method for transcribing music of claim 1, wherein said step of
correcting said pitch information comprises the steps of:
classifying totals of said pitch information;
detecting a deviation from the absolute musical interval axis using said
classified totals; and
shifting the pitch of said pitch information by the amount of said detected
deviation.
22. An apparatus for transcribing music, comprising:
means for inputting an acoustic signal;
means for amplifying said inputted acoustic signal;
means for converting the analog acoustic signal into digital form;
means for processing said digital acoustic signal for extracting pitch
information and power information;
means for storing the processing program;
means for controlling said signal processing program; and
means for displaying the transcribed music,
wherein said means for amplifying, said means for converting, and said
means for processing are formed in a hardware construction. |
Description  |
BACKGROUND OF THE INVENTION
The present invention relates to automatically transcribing music (vocal
music, vocal humming, and sounds of musical instruments) into a musical
score.
In such an automatic music transcription system, it is necessary to detect
the basic items of information in musical scores: sound lengths, musical
intervals, keys, times, and tempos.
Because acoustic signals generally consist of continuous repetitions of
fundamental waveforms, the above-mentioned items of information cannot be
obtained from them directly.
Therefore, the present applicants have already proposed an automatic music
transcription system as disclosed, for example, in Unexamined Patent
Application No. 62-178409.
This automatic music transcription system is shown in FIG. 1. In this
system, analog/digital (A/D) converter 12 converts hummed vocal sound
signals 11 into digital signals. The digitized sound is called vocal sound
data 13. Autocorrelation analyzing means 14 then extracts pitch
information and sound power information 15 from the vocal sound data 13.
Segmenting means 16 divides the input song or
hummed sounds into a plural number of segments on the basis of the sound
power information. Musical interval identifying means 17 identifies the
musical interval on the basis of the afore-mentioned pitch data with
respect to each of the segments as established by the afore-mentioned
segmenting means. Key determining means 18 determines the key of the input
song or hummed vocal sounds on the basis of the musical interval as
identified by the afore-mentioned musical interval identifying means.
Tempo and time determining means determines the tempo and time of the
input song or hummed vocal sounds on the basis of the segments established
by division by the afore-mentioned segmenting means. Musical score data
compiling means 110 prepares musical score data on the basis of the output
of the afore-mentioned segmenting means, musical interval identifying
means, key determining means, and tempo and time determining means.
Musical score data outputting means 111 generates musical score data 112
prepared by the afore-mentioned musical score compiling means 110.
It is to be noted in this regard that such acoustic signals as those of
vocal sounds in songs, hummed voices, and musical instrument sounds
consist of repetitions of fundamental waveforms. In an automatic music
transcription system for transforming such acoustic signals into musical
score data, it is necessary first to extract for each analytical cycle the
repetitive frequency of the fundamental waveform in the acoustic signal.
This frequency is hereinafter referred to as "the pitch frequency". The
corresponding cycle is called "the pitch cycle." This "pitch" information
is taken into account, in order accurately to determine various kinds of
information on such items as musical interval and sound length in acoustic
signals.
Two extracting methods, frequency analysis and autocorrelation analysis,
have been developed in the fields of vocal sound synthesis and vocal sound
recognition. Autocorrelation analysis has hitherto been employed because
it extracts pitch without being affected by noises in the environment and
because it permits easy processing.
In the automatic music transcription system mentioned above, the system
calculates the autocorrelation function after it converts acoustic signals
into digital signals. Therefore, an autocorrelation function can be
calculated for each analytical cycle.
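Pitch extraction accuracy therefore depends on the sampling cycle, because
the autocorrelation function of a digitized signal is available only at
lags that are whole multiples of that cycle.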
If the resolution of a pitch so extracted is low, then the musical
interval and sound length determined by the processes described later will
have a low degree of accuracy.
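The dependence of pitch resolution on the sampling cycle can be illustrated with a minimal sketch. It is not the procedure of the embodiment; the function names, the 80 to 1000 Hz search range, and the three-point parabolic fit are assumptions chosen for illustration. The sketch finds the non-zero lag with the largest autocorrelation and then refines it with a curve fitted through the neighbouring lags, so that the estimate is not restricted to whole multiples of the sampling cycle.

```python
import math

def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] is the sum of frame[n] * frame[n + k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def estimate_pitch(frame, sample_rate, min_hz=80.0, max_hz=1000.0):
    """Estimate the pitch frequency (Hz) of one analysis frame.

    The search range min_hz..max_hz is a hypothetical choice; the frame
    must be longer than sample_rate / min_hz + 1 samples.
    """
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    r = autocorrelation(frame, max_lag + 1)
    # Whole-sample lag (other than zero) with the largest autocorrelation.
    best = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    # Fit a parabola through the peak and its two neighbours and take the
    # lag at its vertex, which may fall between sampling points.
    a, b, c = r[best - 1], r[best], r[best + 1]
    denom = a - 2.0 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return sample_rate / (best + offset)
```

With a sampling frequency of 8000 Hz, the whole-sample lags 18 and 19 correspond to about 444.4 Hz and 421.1 Hz, so a 440 Hz tone cannot be represented without some form of interpolation or a higher sampling frequency.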
It is conceivable to use a higher frequency for sampling, but such an
approach is liable to result in the inability of the system to perform
real-time processing, as well as a larger-sized, more expensive, automatic
music transcription system apparatus. The disadvantages are a consequence
of the increase in the amount of data processed in arithmetic operations
such as the autocorrelation function.
Acoustic signals have the characteristic feature that their power is
augmented immediately after a change in sound. This feature of sound is
utilized in segmentation on the basis of power information.
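Segmentation on the basis of power information can be sketched as follows. This is an illustration only: the function names, the fixed threshold, the rise ratio of 2.0, and the frame-based representation are assumptions, not values taken from the specification. Power is computed as the square sum of each frame, frames are grouped into effective and invalid sections by comparison with a threshold, rise points are located inside effective sections, and any section shorter than a minimum length is joined to the preceding one.

```python
def frame_power(samples, frame_len):
    """Square sum of the samples in each consecutive analysis frame."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def rise_points(power, threshold, ratio=2.0):
    """Frame indices inside an effective section where power jumps by `ratio`."""
    return [i for i in range(1, len(power))
            if power[i - 1] > threshold and power[i] > ratio * power[i - 1]]

def segment_by_power(power, threshold, min_len):
    """Return (start, end, effective) runs of frames, end exclusive.

    Runs shorter than min_len frames are joined to the preceding run.
    """
    runs = []
    for i, p in enumerate(power):
        effective = p > threshold
        if runs and runs[-1][2] == effective:
            runs[-1][1] = i + 1          # extend the current run
        else:
            runs.append([i, i + 1, effective])
    merged = []
    for start, end, effective in runs:
        if merged and end - start < min_len:
            merged[-1][1] = end          # absorb a short run
        elif merged and merged[-1][2] == effective:
            merged[-1][1] = end          # rejoin runs split by a short one
        else:
            merged.append([start, end, effective])
    return [tuple(run) for run in merged]
```

For example, with a threshold of 1 and a minimum length of 2 frames, a single low-power frame inside a sustained sound is absorbed rather than treated as a separate invalid section.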
Unfortunately, acoustic signals, particularly those in songs sung by a
person, do not necessarily follow any specific pattern in the change of
their power information. Songs fluctuate in their pattern of change. In
addition, the sound to be transcribed often contains abrupt sounds, such
as outside noises. In these circumstances, a simple segmentation that
attends only to changes in the power information has not necessarily
divided the signal well into individual sounds.
In this regard, it is noted that acoustic signals generated by a person
are not stable in sound length, either. In addition, such signals
fluctuate considerably in pitch. This has been an obstacle to good
segmentation based on pitch information.
Thus, in view of the fluctuations existing in pitch information,
conventional systems sometimes treat two or more sounds as a single
segment.
With existing transcription equipment, even sounds generated by musical
instruments do not readily lend themselves to segmentation based on pitch
information. This shortcoming is due to ambient noises intruding into the
pitch information after capture by the acoustic signal input apparatus for
converting acoustic signals into electrical signals.
When musical intervals, times, tempos, etc. are determined on the basis of
sound segments (sound length), the process of segmentation becomes a very
important factor in the preparation of musical score data. A low accuracy
of segmentation reduces the accuracy of the ultimately developed musical
score data. A high initial accuracy of segmentation is therefore desired
when final segmentation utilizes the results of the power information. A
high initial accuracy is also desired when final segmentation utilizes the
results of both pitch information segmentation and the results of power
information segmentation.
Acoustic signals, particularly those uttered by a person, are
not stable in their musical interval. These signals have considerable
fluctuations in pitch even when the same pitch (one tone) is intended.
Accordingly, it is very difficult to identify musical intervals in such
signals.
When a transition occurs from one sound to another, it often happens that a
smooth transition is not made to the pitch of the following sound. Pitch
fluctuations occur before and after the transition. Consequently, the
segments on either side are often mistaken for another sound segment. The
result is that sound segments with pitch transitions are often identified
as belonging to a different pitch level in the identification of a musical
interval.
In order to explain this in specific terms, methods permitting simplicity
in arithmetic operation are considered for the automatic music
transcription system mentioned above. For example a given sound can be
identified with a pitch closest on the absolute axis to the average value
of the pitch information within the segment. The sound can also be
identified with the pitch closest on the absolute axis to the median value
of the pitch information of the segment.
With a method like this, it is possible to identify the musical interval
well when the interval difference between two adjacent sounds is a whole
tone, for example do and re on the C-major scale. But, if the difference
between two adjacent sounds is a semitone, for example mi and fa on the
C-major scale, there may sometimes be an inaccuracy in the identification
of the musical interval. For example, the sounds intended to be mi on the
C-major scale can be identified as fa.
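The sensitivity of these simple methods can be shown with a small sketch. The reference of 440 Hz for A4, the equal-tempered semitone axis, and the function names are assumptions made for illustration; the sample pitch values are invented and are not measurements. Each pitch is converted to a position on a semitone axis, and the segment is identified with the semitone closest to either the average or the median of those positions.

```python
import math
import statistics

A4_HZ = 440.0  # assumed reference pitch for the absolute axis

def to_semitones(freq_hz):
    """Position of a frequency on the semitone axis, relative to A4."""
    return 12.0 * math.log2(freq_hz / A4_HZ)

def identify_interval(pitches_hz, method="mean"):
    """Return the nearest semitone (offset from A4) for one segment.

    method is "mean" for the average value, anything else for the median.
    """
    values = [to_semitones(f) for f in pitches_hz]
    if method == "mean":
        representative = statistics.mean(values)
    else:
        representative = statistics.median(values)
    return round(representative)

# Invented example: four frames near mi (E4, about 329.6 Hz) followed by
# three frames that slide up toward the next sound (G4, about 392.0 Hz).
segment = [329.6, 329.6, 329.6, 329.6, 392.0, 392.0, 392.0]
print(identify_interval(segment, "mean"))    # -4, i.e. fa (F4)
print(identify_interval(segment, "median"))  # -5, i.e. mi (E4)
```

In this invented case the average is pulled a semitone upward by the transition frames, while the median is not; with different data the median can fail in the same way.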
In addition to sound length, the musical interval is a fundamental element.
It is therefore necessary to identify the interval accurately. If it
cannot be identified accurately, the accuracy of the resulting musical
score data will be low.
The key, on the other hand, is not merely an element of musical score data.
The key gives an important clue to the determination of a musical
interval. A key has a certain relationship to a musical interval and to
the frequency of occurrence of a musical interval. In improving the
accuracy of the musical interval, it is desirable to determine the key and
to review the identified musical interval.
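The relationship between a key and the frequency of occurrence of musical intervals can be sketched as a weighted sum over a histogram. The weights below (2 for the tonic, 1 for the other tones of the major scale, 0 otherwise), the restriction to major keys, and the use of MIDI-style note numbers are all assumptions for illustration; the specification does not give these values here.

```python
NOTE_NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")

# Hypothetical weighting coefficients, indexed by semitones above the tonic.
WEIGHTS_MAJOR = (2, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1)

def pitch_class_histogram(notes):
    """Count how often each of the 12 pitch classes occurs (C = 0)."""
    hist = [0] * 12
    for note in notes:
        hist[note % 12] += 1
    return hist

def determine_major_key(notes):
    """Return (key name, scores) where scores[t] is the product sum for tonic t."""
    hist = pitch_class_histogram(notes)
    scores = [sum(WEIGHTS_MAJOR[(pc - tonic) % 12] * hist[pc]
                  for pc in range(12))
              for tonic in range(12)]
    best = max(range(12), key=lambda t: scores[t])
    return NOTE_NAMES[best], scores

# A short melody on the C-major scale, as note numbers with C4 = 60.
melody = [60, 62, 64, 65, 67, 69, 71, 72, 67, 64, 60]
key, scores = determine_major_key(melody)
print(key)  # C
```

Once a key has been chosen in this way, intervals that fall outside its scale are natural candidates for review.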
Furthermore, as mentioned above, the musical intervals of acoustic signals,
particularly those of vocal music, deviate from the absolute musical
interval. The greater the deviation, the more inaccurate the musical
interval identified on the musical interval axis. The deviation of the
musical intervals in vocal music heretofore has resulted in lower accuracy
in music transcription.
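One way to picture this deviation is as a constant offset, in cents, between the singer's own interval axis and the absolute axis. The sketch below is an assumption-laden illustration: the cent representation, the 10-cent histogram bins, and the function names are not taken from the specification. It tallies how far each pitch lies from its nearest semitone, takes the most populated bin as the deviation, and shifts every pitch by that amount.

```python
def detect_deviation(cents, bin_width=10):
    """Most common offset from the nearest semitone, in cents.

    `cents` are pitches on an axis where semitones fall on multiples of
    100. Offsets lie in [-50, 50); the centre of the fullest bin is returned.
    """
    n_bins = 100 // bin_width
    hist = [0] * n_bins
    for c in cents:
        offset = (c + 50.0) % 100.0 - 50.0
        index = min(int((offset + 50.0) // bin_width), n_bins - 1)
        hist[index] += 1
    peak = max(range(n_bins), key=lambda b: hist[b])
    return -50.0 + (peak + 0.5) * bin_width

def correct_pitch(cents, deviation):
    """Shift every pitch so the singer's axis matches the absolute axis."""
    return [c - deviation for c in cents]

# Invented example: five pitches sung roughly 30 cents sharp.
sung = [32.0, 228.0, 431.0, 533.0, 727.0]
deviation = detect_deviation(sung)
print(deviation)                       # 35.0
print(correct_pitch(sung, deviation))  # [-3.0, 193.0, 396.0, 498.0, 692.0]
```

After the shift, each pitch lies within a few cents of a semitone, so the nearest-semitone decision is far less likely to tip to the wrong side.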
In summary, the automatic music transcription system and apparatus
disclosed in the present applicants' published patent application No.
62-178409 may generate musical score data with low accuracy. It has
therefore not found widespread practical use.
SUMMARY OF THE INVENTION
The present invention has been made in consideration of the problems
mentioned hereinabove. Therefore, a primary object of the invention is to
provide a practically usable automatic music transcription system and
apparatus which improves the accuracy of the final musical score data.
Another object of the present invention is to provide an automatic music
transcription method and apparatus which further improves the accuracy of
the final musical score data by segmentation based on power information
segmentation and pitch information segmentation. This accuracy is to be
achieved without being influenced by fluctuations in acoustic signals or
abrupt intrusions of outside sounds.
Still another object of the present invention is to provide a method of
identifying musical intervals accurately, and an automatic music
transcription system that uses it to further improve the accuracy of the
final musical score data.
Still another object of the present invention is to provide an automatic
music transcription method and apparatus which further improves the
accuracy of the final musical score data by obtaining more accurate
information on the musical interval. The more accurate musical interval is
achieved through correction of the pitch of segments (identified with
musical intervals whose pitch differs from those pitches intended by the
singer due to pitch fluctuations occurring at the time of transition from
one sound to the next). The pitch of the segment is corrected with
reference to musical interval information on the preceding segment and on
the following segment.
Still another object of the present invention is to provide an automatic
music transcription method and apparatus capable of accurately determining
the key of acoustic signals.
Still another object of the present invention is to provide an automatic
music transcription method and apparatus capable of detecting the amount
of deviation of the musical interval axis of an acoustic signal in
relation to the axis of the absolute musical interval, correcting the
pitch information in proportion to the detected deviation, and making it
possible to compile musical score data more accurately in the subsequent
process.
Still another object of the present invention is to provide a pitch
extracting method and pitch extracting apparatus capable of extracting the
pitch of an acoustic signal with high accuracy without employing a higher
sampling frequency.
In order to attain these and other objects, the automatic music
transcription system according to the present invention involves
extracting pitch information and power information from the input acoustic
signal, correcting pitch information in proportion to the deviation of the
musical interval axis from the absolute musical interval axis, dividing
the acoustic signal into single sound segments on the basis of the
corrected pitch information and on the basis of changes in the power
information, making more detailed divisions of the acoustic signal on the
basis of the segment information, identifying musical intervals amid the
individual segments referencing the pitch information, and dividing the
acoustic signal again into single-sound segments on the basis of whether
or not the identified musical intervals of the segments in continuum are
identical, determining the key of the acoustic signal on the basis of the
extracted pitch information, correcting the prescribed musical interval on
the musical scale for the determined key on the basis of the pitch
information, determining the time and tempo of the acoustic signal on the
basis of the segment information, and finally compiling musical score data
from the information on the determined musical interval, sound length,
key, time, and tempo.
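The step of dividing the signal again according to whether adjacent segments have identical musical intervals can be sketched briefly. The tuple layout (start, end, note) and the function name are hypothetical; the sketch simply joins neighbouring segments that touch and carry the same identified interval.

```python
def merge_same_interval(segments):
    """Join adjacent segments identified with the same musical interval.

    `segments` is a list of (start, end, note) tuples in time order,
    with `end` exclusive; this layout is assumed for illustration.
    """
    merged = []
    for start, end, note in segments:
        if merged and merged[-1][2] == note and merged[-1][1] == start:
            previous_start = merged[-1][0]
            merged[-1] = (previous_start, end, note)
        else:
            merged.append((start, end, note))
    return merged

segments = [(0, 10, 60), (10, 18, 60), (18, 30, 62), (30, 35, 62), (35, 50, 60)]
print(merge_same_interval(segments))
# [(0, 18, 60), (18, 35, 62), (35, 50, 60)]
```

The lengths of the merged segments are then the sound lengths from which time and tempo are determined.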
Similarly, the automatic music transcription system according to the
present invention comprises a means for extracting from the input acoustic
signal the pitch information and the power information thereof, a means
for correcting the pitch information in accordance with the amount of
deviation of the musical interval for the acoustic signal in relation to
the axis of the absolute musical interval, a means for dividing the
acoustic signal into single-sound segments on the basis of the corrected
pitch information, a means for dividing the acoustic signal into
single-sound segments on the basis of the changes in the power
information, a means for making further divisions of the acoustic signal
into segments on the basis of both of these sets of segment information
thus made available, a means for identifying the musical intervals for the
acoustic signals in the individual segments along the axis of the absolute
musical interval, a means for dividing the acoustic signal again into
single-sound segments on the basis of whether or not the musical intervals
of the identified segments in continuum are identical, a means for
determining the key for the acoustic signal on the basis of the extracted
pitch information, a means for correcting the prescribed musical interval
on the determined key on the basis of the pitch information, a means for
determining the time and tempo of the acoustic signal on the basis of the
segment information, and a means for compiling musical score data from the
information on the musical interval, sound length, key, time and tempo so
determined.
The automatic music transcription system according to the present invention
is further characterized by a means for inputting acoustic signals, a
means for amplifying the acoustic signals thus input, a means for
converting the amplified analog signals into digital signals, a means for
extracting the pitch information by performing autocorrelation analysis of
the digital acoustic signals and extracting the power information by
performing the operations for finding the square sum (the means for
extracting the pitch information and the power information being
constructed in hardware), a storage means for keeping in memory the
prescribed music-transcribing procedure, a controlling means for executing
the music-transcribing procedure kept in memory in the storage means, a
means for starting the processing by the control means, and a means for
generating the output of the musical score data obtained by the
processing.
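The two extraction operations named above can be illustrated with a minimal
sketch. It is not the hardware construction described here; it only assumes
that one analytical cycle is a list of samples, that the power is the plain
sum of squared samples, and that the pitch corresponds to the lag at which
the autocorrelation is largest. The function names and the search range
(min_hz, max_hz) are hypothetical and do not come from the specification.

```python
import math


def frame_power(frame):
    """Power of one analysis frame as the sum of its squared samples."""
    return sum(x * x for x in frame)


def autocorrelation(frame, lag):
    """Unnormalized autocorrelation of the frame at the given lag."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def frame_pitch_hz(frame, sample_rate, min_hz=80.0, max_hz=1000.0):
    """Estimate pitch from the lag that maximizes the autocorrelation.

    min_hz and max_hz are assumed search limits, not values from the patent.
    Returns None for a silent frame or an empty lag range.
    """
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(len(frame) - 1, int(sample_rate / min_hz))
    if frame_power(frame) == 0.0 or min_lag > max_lag:
        return None
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    return sample_rate / best_lag


# Example: a 200 Hz sine sampled at 8000 Hz, one 50 ms frame (400 samples).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(400)]
pitch = frame_pitch_hz(frame, sample_rate)
power = frame_power(frame)
```

A practical extractor would also need the post-treatments mentioned later in
the description, such as noise elimination and interpolation, which this
sketch omits.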
The present invention has made it possible to provide an automatic music
transcription system with sufficient capabilities for its practical
application owing to the extremely significant improvement in its accuracy
in generating the final musical score data. This is so because the system
accurately extracts pitch information and power information from acoustic
signals such as vocal songs, humming voices, and musical instrument
sounds, divides the acoustic signals accurately into single-sound segments
on the basis of such information, and identifies the musical interval and
the key with high accuracy. These performance features therefore have
proven effective in reducing the influence of noise and power fluctuations
in the processing of acoustic signals.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating the automatic music transcription
system leading to the present invention.
FIG. 2 is a block diagram illustrating the first hardware embodiment of the
automatic music transcription system according to the present invention.
FIG. 3 is a flow chart showing the automatic music transcription process in
the first embodiment of the present invention.
FIG. 4 is a summary flow chart illustrating the segmentation process based
on the power information pertinent to the present invention.
FIG. 5 is a flow chart illustrating an example of the segmentation process
in greater detail.
FIG. 6 is a characteristic curve chart illustrating one example of
segmentation by such a process.
FIG. 7 is a summary flow chart illustrating another example of the
segmentation process based on the power information according to the
present invention.
FIG. 8 is a flow chart illustrating the segmentation process in greater
detail.
FIG. 9 is a flow chart illustrating an example of the segmentation process
based on the power information according to the present invention.
FIG. 10 is a characteristic curve chart presenting the chronological change
of the power information together with the results of the segmentation.
FIG. 11 is a flow chart illustrating an example of the segmentation process
based on the power information according to the present invention.
FIG. 12 is a characteristic curve chart presenting the chronological
changes of the power information and those of the rise extracting
functions, together with the results of the segmentation.
FIG. 13 and FIG. 14 are flow charts each illustrating an example of the
segmentation process based on the power information according to the
present invention.
FIG. 15 is a characteristic curve chart presenting the chronological
changes of the power information and the rise extracting functions,
together with the results of the segmentation.
FIG. 16 and FIG. 17 are flow charts each illustrating an example of the
segmentation process based on the pitch information according to the
present invention.
FIG. 18 is a schematic drawing providing an explanation of the length of
the series.
FIG. 19 is a flow chart illustrating the reviewing process for the
segmentation according to the present invention.
FIG. 20 is a schematic drawing provided for an explanation of the reviewing
process.
FIG. 21 is a flow chart illustrating the musical interval identifying
process according to the present invention.
FIG. 22 is a schematic drawing providing an explanation of the distance of
the pitch information to the axis of the absolute musical interval in each
segment.
FIG. 23 is a flow chart illustrating an example of the musical interval
identifying process according to the present invention.
FIG. 24 is a schematic drawing illustrating one example of such a musical
interval identifying process.
FIG. 25 is a flow chart illustrating an example of the musical interval
identifying process according to the present invention.
FIG. 26 is a schematic drawing illustrating one example of such a musical
interval identifying process.
FIG. 27 is a flow chart illustrating one example of the musical interval
identifying process according to the present invention.
FIG. 28 is a schematic drawing showing one example of such a musical
interval identifying process.
FIG. 29 is a flow chart illustrating an example of the process for
correcting the identified musical interval according to the present
invention.
FIG. 30 is a schematic drawing illustrating one example of the correction
of such an identified musical interval.
FIG. 31 is a flow chart illustrating an example of the musical interval
identifying process according to the present invention.
FIG. 32 is a schematic drawing illustrating one example of such a musical
interval identifying process.
FIG. 33 is a flow chart illustrating an example of the musical interval
identifying process according to the present invention.
FIG. 34 is a chart for explaining the length of the series applicable to
the present invention.
FIG. 35 is a schematic drawing illustrating one example of such a musical
interval identifying process.
FIG. 36 is a flow chart illustrating an example of the process for
correcting the identified musical interval according to the present
invention.
FIG. 37 is a schematic drawing explaining such a correcting process for the
identified musical interval.
FIG. 38 is a flow chart illustrating an example of the key determining
process according to the present invention.
FIG. 39 is a table presenting some examples of the weighting coefficients
for each musical scale established in accordance with each key.
FIG. 40 is a flow chart illustrating an example of the key determining
process according to the present invention.
FIG. 41 is a flow chart illustrating an example of the tuning process
according to the present invention.
FIG. 42 is a histogram showing the state of distribution of the pitch
information.
FIG. 43 is a flow chart showing an example of the pitch extracting process
according to the present invention.
FIG. 44 is a schematic drawing presenting the autocorrelation function
curves to be used for the pitch extracting process.
FIG. 45 is a flow chart illustrating an example of the pitch extracting
process according to the present invention.
FIG. 46 is a schematic drawing showing the autocorrelation function curves
used in the pitch extracting process.
FIG. 47 is a block diagram illustrating the second embodiment of the
construction of the automatic music transcription system.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Detailed descriptions of the various embodiments of the present invention
with reference to the accompanying drawings are given below.
FIG. 2 is a block diagram illustrating the construction of the automatic
music transcription system to which the first embodiment according to the
present invention is applied. FIG. 3 is a flow chart illustrating the
processing procedure for the system.
In FIG. 2, the Central Processing Unit (CPU) 1 performs overall control for
the entire system and executes the music score processing program shown in
FIG. 3. This program is stored in the main storage device 3. CPU 1 and main
storage device 3 are connected to bus 2, as are keyboard 4 serving as the
input device, display unit 5 serving as the output device, auxiliary memory
device 6 for use as working memory, and analog/digital converter 7.
Acoustic signal input device 8, which is composed of a microphone, is
connected to analog/digital converter 7. This acoustic signal input device 8
captures the acoustic signals in vocal songs and transforms them into
electrical signals. The electrical signals are supplied to analog/digital
converter 7.
CPU 1 begins the music transcription process when it receives a command to
that effect as entered on the keyboard input device 4. CPU 1 then executes
the program stored in the main storage device 3, temporarily storing the
acoustic signals as converted into digital signals by the analog/digital
converter 7 into the auxiliary memory device 6. CPU 1 thereafter converts
these acoustic signals into musical score data by executing the
above-mentioned program so that the musical score data may be output as
required.
After CPU 1 has input the acoustic signals, processing for musical score
transcription occurs. This processing is described in detail with
reference to the flow chart shown in terms of functional levels in FIG. 3.
First, CPU 1 extracts pitch information for the acoustic signals for each
analytical cycle through its autocorrelation analysis of the acoustic
signals. CPU 1 also extracts power information for each analytical cycle
by first processing the acoustic signals to find the square sum, and then
performing post-treatments. Post-treatments may include the elimination of
noises and an interpolation operation (Steps SP 1 and SP 2). Thereafter,
CPU 1 calculates, with respect to the pitch information, the amount of
deviation of the musical interval axis of the acoustic signal in relation
to the axis of the absolute musical interval. This deviation is calculated
on the basis of the distribution around the musical interval axis. CPU 1
then performs the tuning process (Step SP 3), which involves shifting the
pitch information in proportion to the amount of deviation of the musical
interval axis. In other words, the CPU corrects the pitch information to
reduce the difference between the musical interval axis of the singer or
musical instrument and the axis of the absolute musical interval.
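The tuning step can be sketched as follows. This is one possible reading of
"the distribution around the musical interval axis", not the procedure of
FIG. 41: pitch values are assumed to be already expressed in semitone units
on the absolute axis (the 440 Hz reference in hz_to_semitones is an
assumption), the deviation is taken as the peak of a histogram of each
frame's offset from its nearest semitone, and every pitch value is shifted
by that amount. All names and the bin width are hypothetical.

```python
import math
from collections import Counter

REFERENCE_HZ = 440.0  # assumed reference for the absolute axis


def hz_to_semitones(hz):
    """Convert a frequency to semitone units relative to REFERENCE_HZ."""
    return 12.0 * math.log2(hz / REFERENCE_HZ)


def estimate_deviation(pitches, bins_per_semitone=20):
    """Return the most common offset from the nearest semitone.

    Offsets are binned into a histogram; the bin width is an assumption.
    """
    if not pitches:
        return 0.0
    half = bins_per_semitone // 2
    counts = Counter()
    for p in pitches:
        offset = p - round(p)              # lies in [-0.5, 0.5]
        b = round(offset * bins_per_semitone)
        if b == half:                      # +0.5 and -0.5 are the same offset
            b = -half
        counts[b] += 1
    peak_bin = counts.most_common(1)[0][0]
    return peak_bin / bins_per_semitone


def tune(pitches):
    """Shift all pitch values by the estimated deviation."""
    deviation = estimate_deviation(pitches)
    return [p - deviation for p in pitches]


# Example: a singer who is consistently about 0.3 semitone sharp.
sung = [0.3, 2.3, 4.31, 5.29, 7.3]
deviation = estimate_deviation(sung)
tuned = tune(sung)
```

After the shift, each value in the example lies within a few hundredths of a
semitone of a step on the absolute axis, which is the condition the later
interval identification relies on.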
Then, CPU 1 executes the segmentation process. This process divides the
acoustic signals into single-sound segments, each of which has a continuous
duration of pitch information. CPU 1 treats each resulting segment as
indicating one musical interval. The CPU then executes the segmentation
process again on the basis of the changes in the obtained power
information (Steps SP 4 and SP 5). Each resulting set of segment
information has continuous pitch. CPU 1 then calculates the standard
lengths corresponding respectively to the time lengths of a half note, an
eighth note, and so forth and executes the segmentation process in further
detail on the basis of these standard lengths (Step SP 6).
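The first two segmentation passes can be sketched as below. This is an
illustration under stated assumptions, not the processes of FIG. 4 through
FIG. 17: pitch is assumed to be given per analytical cycle in semitone units
with None where no pitch was found, a pitch segment ends when the pitch
jumps by more than a tolerance, and a power boundary is placed wherever the
power more than doubles from one cycle to the next. The function names, the
tolerance, and the ratio are hypothetical. The further division by standard
note lengths (Step SP 6) is not covered.

```python
def segment_by_pitch(pitches, tolerance=0.5):
    """Split frames into (start, end) runs of continuous pitch, end exclusive."""
    segments = []
    start = None
    for i, p in enumerate(pitches):
        if p is None:
            if start is not None:
                segments.append((start, i))
                start = None
        elif start is None:
            start = i
        elif abs(p - pitches[i - 1]) > tolerance:
            segments.append((start, i))
            start = i
    if start is not None:
        segments.append((start, len(pitches)))
    return segments


def power_rise_boundaries(powers, ratio=2.0):
    """Frame indices where the power rises by more than the given ratio."""
    return [i for i in range(1, len(powers))
            if powers[i] > ratio * powers[i - 1]]


def split_segments(segments, boundaries):
    """Divide each segment further at any boundary that falls inside it."""
    result = []
    for start, end in segments:
        inner = [b for b in sorted(boundaries) if start < b < end]
        cuts = [start] + inner + [end]
        result.extend(zip(cuts[:-1], cuts[1:]))
    return result


# Example: two pitches; the second note is struck twice at the same pitch.
pitches = [None, 0.0, 0.1, 0.0, 2.0, 2.1, 2.0, 2.0, None]
powers = [0, 5, 4, 3, 6, 5, 1, 4, 0]
by_pitch = segment_by_pitch(pitches)
boundaries = power_rise_boundaries(powers)
combined = split_segments(by_pitch, boundaries)
```

In the example the pitch pass alone cannot separate the repeated note, while
the power rise at frame 7 supplies the missing boundary. This is the reason
for using both sets of segment information together.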
CPU 1 then identifies the musical interval of a given segment with the
musical interval on the absolute musical interval axis to which the
relevant pitch information is considered to be closest. This determination
is made on the basis of the pitch information of the segment obtained by
segmentation. CPU 1 then executes the segmentation process again
on the basis of whether or not the musical intervals of the identified
segments in continuum are identical (Steps SP 7 and SP 8).
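A minimal sketch of these two steps follows. It assumes that pitch is given
per analytical cycle in semitone units, so that integer values are the steps
of the absolute axis, and it takes "closest" to mean the step with the
smallest total distance to the pitch values of the segment. That distance
measure is only one of several possibilities and is an assumption here, as
are the function names.

```python
import math


def identify_interval(frame_pitches):
    """Return the integer semitone with the smallest total distance."""
    lo = math.floor(min(frame_pitches))
    hi = math.ceil(max(frame_pitches))
    return min(range(lo, hi + 1),
               key=lambda note: sum(abs(p - note) for p in frame_pitches))


def merge_identical(segments, intervals):
    """Join contiguous segments that were identified with the same interval."""
    merged = []
    for seg, note in zip(segments, intervals):
        if merged and merged[-1][1] == note and merged[-1][0][1] == seg[0]:
            (start, _), _ = merged[-1]
            merged[-1] = ((start, seg[1]), note)
        else:
            merged.append((seg, note))
    return merged


# Example: the first two segments both resolve to step 0 and are rejoined.
pitches = [0.1, -0.1, 0.2, 0.4, 0.3, 0.1, 2.1, 1.9, 2.0]
segments = [(0, 3), (3, 6), (6, 9)]
intervals = [identify_interval(pitches[s:e]) for s, e in segments]
merged = merge_identical(segments, intervals)
```

The merge undoes divisions made by the earlier passes that turn out not to
correspond to a change of musical interval.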
After that, CPU 1 finds the product sum of the frequency of occurrence of
the musical interval.