Description
This invention relates to the generation of audio signals during play of a
software (e.g., motion picture) carrier, and more particularly to a
technique by which multiple dialog languages may be recorded on separate
audio tracks of the same carrier without requiring a full track for each
language version.
BACKGROUND OF THE INVENTION
The most widespread medium for distributing motion pictures is the
videocassette. The conventional practice is to provide only one language
soundtrack on each videocassette. This means that different versions of
the same motion picture must be prepared for distribution in different
countries. Rather than dedicating a different version of the same motion
picture to each of several different languages, it would be far more
advantageous to provide all desired sound tracks, containing different
dialog languages, on the same carrier; this would require the production
of far fewer versions of the same motion picture. Because of the large
storage requirements, however, this has not proven to be practical. In
fact, the only practical consumer use of multiple sound tracks on the same
carrier is the provision of annotated and non-annotated soundtracks in
some laserdisc releases. (It is possible, for example, to store different
soundtracks in the digital and analog audio channels of a laserdisc.)
Despite the fact that it has occurred to others in the prior art to provide
multiple soundtracks on the same software carrier, certainly the provision
of perhaps a dozen different soundtracks, in different dialog languages,
all on the same consumer software carrier, is not to be found anywhere.
Not only are there no consumer players capable of selecting one from among
so many different sound tracks, but software publishers have just not
found it practical to store so much audio information on a single carrier.
The traditional approach is to publish different versions of the same
motion picture for distribution in different territories where different
languages are spoken.
Digitally encoded optical disks are in theory far superior for the
distribution of motion pictures and other forms of presentation.
Especially advantageous is the use of "compressed video," by which it is
possible to digitally encode a motion picture on a disk no larger than the
present-day audio CD. While much effort has been expended in developing
compressed video systems, less work has been devoted to the provision of
multiple soundtracks on the same software carrier. The conventional
thinking is to pack as much video as possible on any given disk, but still
to provide a different soundtrack version carrier for each required dialog
language.
It is therefore an object of this invention to provide a system and method
for a software publisher to record on a software carrier, such as an
optical disk, a motion picture accompanied with multiple soundtracks, in
different dialog languages, while at the same time eliminating redundant
information so that the storage is as efficient as possible.
SUMMARY OF THE INVENTION
A key to the understanding of the present invention is that there are
sections of many video programs in which no dialog occurs. In the absence
of dialog, there is no reason to provide a language-specific track. During
any "no-dialog" sequence, all that are available, if even that, are music
and effects. Thus a music and effects (M&E) track is really all that is
necessary--for all language versions--during much of the total running
time of a motion picture. In fact, an M&E track is all that is required in
the usual case for far more than half the running time. Obviously, a
Shakespearean movie will have more dialog, and hence more
language-specific dialog, than an action-adventure movie. Nevertheless,
most present-day releases have far more non-dialog M&E than they do the
spoken word.
Before summarizing the invention, it is to be appreciated that the present
invention contemplates data-efficient storage and recovery of various
audio versions, and not just different language movie soundtracks. For
example, multiple soundtracks could include teaching and testing versions
of the same material, and there could perhaps be teaching and testing
versions for multiple levels of expertise. The multiple soundtracks that
would be provided in such a case might even have some dialog in common,
not only M&E. Thus, it is to be understood that the object of the
invention is to provide a plurality of audio tracks synchronized with a
motion picture, and not necessarily audio tracks which differ only in
terms of language. It is also to be understood that the invention is not
limited to a particular medium, and it is applicable to tape carriers and
all digital storage media, not just the optical disks of the illustrative
embodiment of the invention. Nor is the invention limited only to the
distribution of motion pictures. For example, in an extreme case, the
invention is applicable to the distribution of a library of still
pictures, in which case there is no "motion" at all. The term "audio
tracks" thus embraces much more than audio tracks with different dialog
languages, the term "software publisher" thus embraces much more than a
motion picture company, and the term "carrier" embraces much more than a
digitally encoded optical disk.
The illustrative embodiment of the invention is an optical disk which
includes multiple audio tracks synchronized with a motion picture track.
At least one of the audio tracks is a mixing master or a switching master.
A mixing master is a track which includes M&E, but for the most part no
dialog. A switching master is a track which includes M&E, together with
dialog in a particular language. Other tracks on the disks are specific to
respective languages and include material that is language specific. Where
no language-specific material is required for a particular audio track,
nothing is recorded so that there is no wasted "real estate," as will be
described below. Consider the case of a mixing master M&E track, and three
language-specific audio tracks in English, Spanish and French. For a
two-hour movie, the M&E track might have recorded close to two hours of
audio. (Where there is no sound at all, there is no need to store any
data, once again in order to avoid wasting any storage capacity.) The
three language-specific tracks have dialog recorded in them, but no music
and no effects--and each of the three tracks has data recorded in it only
where it is necessary for dialog. The user selects one of the three
tracks, the French track, for example, if he wants to hear the French
version of the movie. The mixing master audio track and the French audio
track are the only ones which are read by the player, and the digital
information recorded in the two tracks is mixed, so that the net result is
a conventional soundtrack, in French. To play the Spanish version of the
same movie, the user would simply select the Spanish soundtrack instead of
the French.
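The mixing scheme described above can be sketched as follows. This is only a minimal illustration, not the player's actual implementation: the function name mix_block, the use of 16-bit signed samples, and the convention that None stands for "nothing recorded in this block" are all assumptions made for the example.

```python
from typing import List, Optional, Sequence

# Assumption for illustration: audio samples are 16-bit signed integers.
SAMPLE_MIN, SAMPLE_MAX = -32768, 32767


def mix_block(master: Optional[Sequence[int]],
              dialog: Optional[Sequence[int]]) -> List[int]:
    """Mix one block of mixing-master (M&E) samples with dialog samples.

    None means that nothing was recorded for that track in this block,
    which the text suggests is the usual case for a dialog track.
    """
    if master is None and dialog is None:
        return []                      # no sound at all in this block
    if dialog is None:
        return list(master)            # M&E only, no dialog to add
    if master is None:
        return list(dialog)            # dialog only, no M&E
    # Sum the two tracks sample by sample and clip to the sample range.
    return [max(SAMPLE_MIN, min(SAMPLE_MAX, m + d))
            for m, d in zip(master, dialog)]
```

For a block with no French dialog, mix_block(me_samples, None) simply returns the M&E samples, so the language-specific track costs nothing in that block.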
A switching master, on the other hand, would typically include dialog.
Consider a motion picture which is originally shot with the actors
speaking English. The switching master audio track would include the
original motion picture soundtrack. To play the English version of the
release, the switching master audio track would be played by itself from
beginning to end. But suppose that it is desired to play the French
version of the motion picture. In this case, the French audio track would
include not only French dialog, but French dialog together with music and
effects. All that is necessary to derive the French version of the motion
picture is to play the switching master audio track most of the time, but
to switch from it to the French audio track--and to play the French audio
track alone--where there is French dialog. The major difference between
using mixing and switching masters is that the former is mixed with one of
the language-specific tracks so that M&E can be (although does not
necessarily have to be) recorded only on the master track, while in a
switching system only one track is played at any given time so that M&E
has to be recorded on the language-specific tracks. It is also possible to
provide both schemes on the same disk, i.e., to provide both kinds of
master tracks, with some of the language-specific tracks being used with
the mixing master, and some being used with the switching master.
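By way of contrast with mixing, the switching scheme plays exactly one track per block. The sketch below is hypothetical: the names select_switched_block and play_switched, and the representation of a block as a pair (master audio, language audio or None), are assumptions chosen for the example.

```python
from typing import List, Optional, Sequence, Tuple

Audio = Sequence[int]


def select_switched_block(master: Audio,
                          language: Optional[Audio]) -> Audio:
    """Return the audio to play for one block in a switching scheme.

    The language-specific track already contains its own music and
    effects, so it is played alone whenever it is present.
    """
    return language if language is not None else master


def play_switched(blocks: List[Tuple[Audio, Optional[Audio]]]) -> List[Audio]:
    """Choose, block by block, between the switching master and the
    selected language-specific track."""
    return [select_switched_block(m, lang) for m, lang in blocks]
```

Playing the original-language version corresponds to passing None for the language track in every block, so that the switching master is played from beginning to end.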
The disk includes within its lead-in section a series of codes which
identify whether each audio track on the disk is a mixing master, a
switching master, a track to be mixed with a mixing master, or a track to
be switched with a switching master. There are a maximum of 16 audio
tracks which may be provided. However, there are many more languages than
this number. It is necessary to identify which languages are available on
the disk so that the user can control his player to generate a soundtrack
in the desired language. For this reason, the lead-in section of the disk
identifies which languages are available on the disk. In the illustrative
embodiment of the invention, the first audio track is an M&E track, a
mixing master or a switching master. If there are a total of N audio
tracks, where N is 16 or less, then there may be N-1 language-specific
audio tracks. (There would be N-2 language-specific tracks if both mixing
and switching masters are provided.) If the first track is a mixing
master, then there can be at most N-1 language-specific versions since
dialog is available only starting with the second track. (Theoretically,
if the first track is a switching master and it contains dialog in the
original language, then this track can be played alone from beginning to
end and there are N language versions available.) If a player determines
from an analysis of the lead-in section of the disk that the first audio
track is a mixing master and the fourth audio track contains dialog in
French, and it is this fourth track that is to be mixed with the mixing
master, then all that is required for generation of a French soundtrack is
to mix the first and fourth soundtracks. This is not to say that there
will always be data in these tracks. On the contrary, the underlying
assumption of the invention is that the French-specific audio track will,
more often than not, contain no data.
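The lead-in analysis just described might be sketched as a lookup over a table of track descriptors. The sketch is illustrative only: the actual field layout of the lead-in section is not given in this passage, so the TrackKind names, the (kind, language) tuples, and the language codes are all assumptions.

```python
from enum import Enum
from typing import List, Optional, Tuple


class TrackKind(Enum):
    # The four kinds of audio track named in the text (values are arbitrary).
    MIXING_MASTER = "mixing master"
    SWITCHING_MASTER = "switching master"
    MIXED = "mixed with mixing master"
    SWITCHED = "switched with switching master"


Track = Tuple[TrackKind, Optional[str]]   # (kind, language or None)


def find_master(table: List[Track], kind: TrackKind) -> Optional[int]:
    """Return the 1-based number of the first track of the given kind."""
    for number, (track_kind, _) in enumerate(table, start=1):
        if track_kind is kind:
            return number
    return None


def tracks_for_language(
        table: List[Track],
        language: str) -> Optional[Tuple[int, Optional[int], str]]:
    """Return (master track, language track, mode) for a language.

    mode is "mix", "switch", or "alone" (a switching master that already
    carries dialog in the requested language). None means not available.
    """
    for number, (kind, track_language) in enumerate(table, start=1):
        if track_language != language:
            continue
        if kind is TrackKind.SWITCHING_MASTER:
            return (number, None, "alone")
        if kind is TrackKind.MIXED:
            master = find_master(table, TrackKind.MIXING_MASTER)
            if master is not None:
                return (master, number, "mix")
        if kind is TrackKind.SWITCHED:
            master = find_master(table, TrackKind.SWITCHING_MASTER)
            if master is not None:
                return (master, number, "switch")
    return None
```

With a mixing master in track 1 and French dialog in track 4, tracks_for_language(table, "fr") yields (1, 4, "mix"), matching the example in the text.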
Information recorded on the software carrier is recorded in separately
identifiable blocks. This is true for both video and all of the
synchronized audio. Each block contains indicia of which audio tracks in
the block represent a signal. Thus, a particular block may contain
switching master information, as well as information in a
language-specific track which is to be switched with the switching master.
When the player determines at the start of the reading of a block that the
block contains data in a language-specific track, it switches from play of
the switching master to play of the language-specific track. All it takes
is a single bit for each of the up to N tracks at the beginning of a block
to allow the player to determine whether respective language-specific
information is in the block being processed.
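The one-bit-per-track indicia can be pictured as a bit mask at the head of each block. The following is a sketch under stated assumptions: the bit ordering (track 1 in the least significant bit) and the function names are hypothetical, since the exact block format is not specified in this passage.

```python
from typing import List


def tracks_present(flags: int, num_tracks: int) -> List[int]:
    """List the 1-based audio track numbers that carry data in a block.

    Assumption: bit (n - 1) of flags is set when track n has data.
    """
    return [n for n in range(1, num_tracks + 1) if flags & (1 << (n - 1))]


def track_to_play(flags: int, master_track: int, language_track: int) -> int:
    """Switching scheme: play the language-specific track when the block
    contains data for it, otherwise play the switching master."""
    has_language_data = bool(flags & (1 << (language_track - 1)))
    return language_track if has_language_data else master_track
```

A block whose flags are 0b1001 carries data for tracks 1 and 4, so a player switching between master track 1 and language track 4 would play track 4 for that block.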
Other features of the invention will be described below. For example, a
citizen of Spain, who purchases a player and optical disks in Spain, can
be assumed to want to hear Spanish versions of a motion picture.
Therefore, a player sold in Spain should "default" to play of a Spanish
audio track if one is available on the disk. Only if the default language
is not available, or the user actually wants to hear dialog in a different
language, should she be required to choose from among the available
languages. How the data is stored on software carriers, and how it is
accessed and played, will be discussed at length below.
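The default-language behavior described above amounts to a simple preference order. In this hypothetical sketch, the function name choose_language and the language codes are assumptions; a return value of None stands for "the player must ask the user to choose".

```python
from typing import Iterable, Optional


def choose_language(player_default: str,
                    available: Iterable[str],
                    user_choice: Optional[str] = None) -> Optional[str]:
    """Pick the dialog language for play.

    An explicit user choice wins if the disk offers it; otherwise the
    player's default language is used if available; otherwise None is
    returned so that the user can be prompted.
    """
    offered = set(available)
    if user_choice is not None and user_choice in offered:
        return user_choice
    if player_default in offered:
        return player_default
    return None
```

A player sold in Spain would call choose_language("es", languages_on_disk) and prompt the user only when the result is None.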
The invention is disclosed in the context of an overall system which offers
numerous advantageous features. The entire system is described although
the appended claims are directed to specific features. The overall list of
features which are of particular interest in the description below
include:
Video standard and territorial lock out.
Play in multiple aspect ratios.
Play of multiple versions, e.g., PG-rated and R-rated, of the same motion
picture from the same disk, with selective automatic parental disablement
of R-rated play.
Encrypted authorization codes that prevent unauthorized publishers from
producing playable disks.
Provision of multiple-language audio tracks and multiple-language subtitle
tracks on a single disk, with the user specifying the language of choice.
Provision of multiple "other" audio tracks, e.g., each containing some
component of orchestral music, with the user choosing the desired mix.
Variable rate encoding of data blocks, and efficient use of bit capacity
with track switching and/or mixing, to allow all of the above capabilities
on a single carrier.
Further objects, features and advantages of the invention will become
apparent upon consideration of the following detailed description in
conjunction with the drawing, in which:
FIG. 1 depicts a prior art system and typifies the lack of flexibility in,
and the poor performance of, presently available media players;
FIG. 2 depicts the illustrative embodiment of the invention;
FIG. 3 is a chart which lists the fields in the lead-in portion of the
digital data track of an optical disk that can be played in the system of
FIG. 2;
FIG. 4 is a similar chart which lists the fields in each of the data blocks
which follow the lead-in track section of FIG. 3;
FIGS. 5A-5E comprise a flowchart that illustrates the processing by the
system of FIG. 2 of the data contained in the lead-in track section of an
optical disk being played;
FIG. 6 is a flowchart that illustrates the processing of the data blocks,
in the format depicted in FIG. 4, that follow the lead-in section of the
track;
FIG. 7A is a state diagram and legend that characterize the manner in which
the player of the invention reads only those data blocks on a disk track
that are required for the play of a selected version of a motion picture
or other video presentation, and FIG. 7B depicts the way in which one of
two alternate versions can be played by following the rules illustrated by
the state diagram of FIG. 7A;
FIG. 8 depicts symbolically a prior art technique used in compressing the
digital representation of a video signal; and
FIG. 9 illustrates the relationships among three different image aspect
ratios.
THE PRIOR ART
The limitations of the prior art are exemplified by the system of FIG. 1.
Such a system is presently available for playing a single source of
program material, usually a VHS videocassette, to generate a video signal
conforming to a selected one of multiple standards. A system of this type
is referred to as a multistandard VCR, although stand-alone components are
shown in the drawing. Typically, a VHS tape 7 has recorded on it an NTSC
(analog) video signal, and the tape is played in a VHS player 5. The
analog signal is converted to digital form in A/D converter 9, and the
digital representations of successive frames are written into video frame
store 11. Circuit 13 then deletes excess frames, or estimates and adds
additional frames, necessary to conform to the selected standard, e.g.,
PAL. To convert from one standard to another, it is generally necessary to
change the number of horizontal lines in a field or frame (image scaling).
This is usually accomplished by dropping some lines, and/or repeating some
or averaging successive lines to derive a new line to be inserted between
them. The main function of circuit 13, of course, is to convert a digital
frame representation to analog form as the video output.
Systems of the type shown in FIG. 1 generally degrade the video output.
Conventional videocassettes deliver reduced quality video when they
support more than one video standard. One reason is that there is a double
conversion from analog to digital, and then back again. Another is that
the image scaling is usually performed in a crude manner (deleting lines,
repeating lines and averaging lines). There are known ways, however, to
perform image scaling in the digital domain without degrading the picture.
While not generally used, the technique is in the prior art and will
therefore be described briefly as it is also used in the illustrative
embodiment of the invention.
To give a concrete example, the PAL standard has 625 lines per frame, while
the NTSC standard has 525 lines per frame. Because no part of the image is
formed during the vertical retrace, not all of the horizontal line scans
in either system are usable for representing image information. In the PAL
standard there are nominally 576 lines per frame with image information,
and in an NTSC frame there are nominally 483 lines with image information.
To convert from one standard to another, successive fields are first
de-interlaced. Then 576 lines are converted to 483, or vice versa, and
re-interlaced. How this is done is easy to visualize conceptually.
Consider, for example, a very thin vertical slice through a PAL frame. The
slice is broken down into its three color components. Image scaling for
converting from PAL to NTSC, from a conceptual standpoint, is nothing more
than drawing a curve based on 576 PAL pieces of color data and then
dividing the curve into 483 parts to derive a piece of data for each
horizontal line of the desired NTSC signal. In actuality, this is
accomplished by a process of interpolation, and it is done digitally.
(Image scaling, in general, may also involve a change in the aspect ratio,
for example, in going from HDTV to NTSC, and may require clipping off
information at both ends of every horizontal line.)
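The "draw a curve and divide it into parts" idea can be illustrated with a simple resampling routine. This is a sketch only: it uses linear interpolation as a stand-in for whatever interpolation a real scaler would apply, and the name rescale_column is hypothetical. It operates on one color component of one thin vertical slice, after de-interlacing.

```python
from typing import List, Sequence


def rescale_column(values: Sequence[float], new_count: int) -> List[float]:
    """Resample one vertical slice from len(values) lines to new_count lines.

    Linear interpolation is an assumption made for simplicity; it treats
    the input lines as points on a curve and samples that curve evenly.
    """
    old_count = len(values)
    if old_count == 1 or new_count == 1:
        return [float(values[0])] * new_count
    step = (old_count - 1) / (new_count - 1)   # spacing in input-line units
    out = []
    for i in range(new_count):
        pos = i * step
        lo = int(pos)                          # input line at or above pos
        hi = min(lo + 1, old_count - 1)        # next input line, clamped
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out
```

Converting a PAL slice to NTSC corresponds to rescale_column(pal_slice, 483) where pal_slice holds 576 values; the reverse conversion passes 576 as new_count.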
While prior art systems thus do provide for standards conversion, that is
about the extent of their flexibility. The system of FIG. 2, on the other
hand, offers unprecedented flexibility in ways not even contemplated in
the prior art.
The Illustrative System Of The Invention
The system of FIG. 2 includes a disk drive 21 for playing an optical disk
23. Digital data stored on the disk appears on the DATA OUT conductor 25.
The disk drive operation is governed by microprocessor disk drive
controller 27. The read head is positioned by commands issued over HEAD
POSITION CONTROL lead 29, and the speed of the disk rotation is governed
by commands issued over RATE CONTROL conductor 31. Optical disks are
usually driven at either constant linear velocity or constant angular
velocity. (Another possibility involves the use of a discrete number of
constant angular velocities.) Disks of the invention may be driven at
constant linear velocity so that the linear length of track taken by each
bit is the same whether a bit is recorded in an inner or outer portion of
the track. This allows for the storage of the most data. A constant linear
velocity requires that the rate of rotation of the disk decrease when
outer tracks are being read. This type of optical disk control is
conventional. For example, the CD audio standard also requires disks which
are rotated at a constant linear rate.
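The relationship between constant linear velocity and rotation rate follows from simple geometry: one revolution at radius r covers a track length of 2*pi*r. The sketch below illustrates this; the function name and the numeric values in the usage note are illustrative assumptions, not figures from the text.

```python
import math


def rotation_rate_rpm(linear_velocity_m_s: float, radius_m: float) -> float:
    """Revolutions per minute needed to keep the track moving past the
    read head at the given linear velocity, at the given radius."""
    revolutions_per_second = linear_velocity_m_s / (2 * math.pi * radius_m)
    return revolutions_per_second * 60
```

For example, at an assumed 1.3 m/s the disk would turn roughly twice as fast at a 25 mm radius as at a 58 mm radius, which is why the rate of rotation must decrease as the read head moves outward.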
Microprocessor 41 is the master controller of the system. As such, it
issues commands to the disk drive controller over conductor 43 and it
determines the status of the disk drive controller over conductor 45. The
disk drive controller is provided with two other inputs. Block
number/pointer analyzer 47 issues commands to the disk drive controller
over conductor 49, and BUFFER FULL conductor 51 extends a control signal
from OR gate 54 to the disk drive controller. These two inputs will be
described below. (In general, although reference is made to individual
conductors, it is to be understood that in context some of these
conductors are in reality cables for extending bits in parallel. For
example, while the output of OR gate 54 can be extended to the disk drive
controller over a single conductor 51, block number/pointer analyzer 47
could be connected to the disk drive controller over a cable 49 so that
multi-bit data can be sent in parallel rather than serially.)
An important feature of the system of FIG. 2 is that bit information is
stored on the disk at a rate which varies according to the complexity of
the encoded material. By this is meant not that the number of bits per
second which actually appear on the DATA OUT conductor 25 varies, but
rather that the number of bits which are used per second varies. Video
information is stored in compressed digital form. FIG. 8 shows the manner
in which video frames are coded according to the MPEG1 and MPEG2
standards. An independent I-frame is coded in its entirety. Predicted or
P-frames are frames which are predicted based upon preceding independent
frames, and the digital information that is actually required for a P
frame simply represents the difference between the actual frame and its
prediction. Bidirectionally predicted B-frames are frames which are
predicted from I and/or P frames, with the information required for such a
frame once again representing the difference between the actual and
predicted forms. (As can be appreciated, fast forward and fast reverse
functions, if desired, are best implemented using I-frames.) The number of
bits required to represent any frame depends not only on its type, but
also on the actual visual information which is to be represented.
Obviously, it requires far fewer bits to represent a blue sky than it does
to represent a field of flowers. The MPEG standards are designed to allow
picture frames to be encoded with a minimal number of bits. Frame
information is required at a constant rate. For example, if a motion
picture film is represented in digital form on the disk, 24 frames will be
represented for each second of play. The number of bits required for a
frame differs radically from frame to frame. Since frames are processed at
a constant rate, it is apparent that the number of bits which are
processed (used) per second can vary from very low values to very high
values. Thus when bits are actually read from the disk, while they may be
read from the disk at a constant rate, they are not necessarily processed
at a constant rate.
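The distinction between a constant frame rate and a variable bit usage rate can be made concrete with a small calculation. This is an illustrative sketch; the function name and the frame sizes used in the usage note are made up for the example.

```python
from typing import List, Sequence


def bits_used_per_second(frame_sizes_bits: Sequence[int],
                         frames_per_second: int = 24) -> List[int]:
    """Total the bits consumed in each second of play.

    Frames are consumed at a constant rate (24 per second for film),
    but each frame may need a very different number of bits.
    """
    return [sum(frame_sizes_bits[i:i + frames_per_second])
            for i in range(0, len(frame_sizes_bits), frames_per_second)]
```

With hypothetical frame sizes [10, 20, 30, 40] and two frames per second, the result is [30, 70]: the same number of frames is processed in each second, but the number of bits used differs.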
Similar considerations apply to any audio stored on the disk. Any data
block may contain the bit information required for a variable number of
image frames. Any data block may similarly contain the bit information
required for a variable time duration of a variable, and possibly large,
number of audio tracks. (There is just one physical track. The reference to
multiple audio tracks is to different series of time-division slices
containing respective audio materials.) The audio tracks contain digital
information, which may also be in compressed form. This means that if
there is information stored in any data block for a particular audio
track, those bits do not necessarily represent the same time duration. It
might be thought that the duration of the sound recorded for any audio
track corresponding to any picture frames represented in a block would be
the duration of the picture frames. However, that is not necessarily true.
This means that audio information may be read before it is actually
needed, with the reading of more audio information pausing when a
sufficient amount has already accumulated or with audio not being included
in some data blocks to compensate for the preceding over-supply. This
leads to the concept of buffering, the function of audio buffers 53, video
buffer 55, pan scan buffer 57, subtitle buffer 59, and OR gate 54 which
generates the BUFFER FULL signal.
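The role of the OR gate can be sketched in software terms. This is a loose illustration under stated assumptions: it assumes that each buffer contributes a "full" indication to OR gate 54, and the class name, capacities, and fill counters are hypothetical.

```python
from typing import Iterable


class Buffer:
    """A minimal stand-in for one of the audio, video, pan scan or
    subtitle buffers; capacity and fill are counted in bits."""

    def __init__(self, capacity_bits: int) -> None:
        self.capacity_bits = capacity_bits
        self.fill_bits = 0

    def is_full(self) -> bool:
        return self.fill_bits >= self.capacity_bits


def buffer_full(buffers: Iterable[Buffer]) -> bool:
    """Software analogue of the OR gate: the signal is asserted as soon
    as any one buffer reports that it is full."""
    return any(b.is_full() for b in buffers)
```

Under this assumption, reading from the disk would pause whenever buffer_full(...) is true and resume once the full buffer has been drained.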
As each data block is read from the disk, it passes through gate 61,
provided the gate is open, and the bit fields are distributed by
demultiplexer 63 to the various buffers and, over the COMMAND/DATA line
65, to master controller 41. Each data block in the illustrative
embodiment of the invention contains video bit information corresponding
to a variable number of picture frames. As discussed above, there may be a
large number of bits, or a small number, or even no bits (for example, if
the particular disk being played does not represent any video). Successive
groups of video data are stored in video buffer 55 separated by markers.
Video decoder 67 issues a command over conductor 69 when it wants to be
furnished with a new batch of data over conductor 71. Commands are issued
at a steady rate, although the number of bits furnished in reply varies in
accordance with the number of bits required for the particular frames
being processed. The rate at which bits are read from the disk drive is
high enough to accommodate frames which require maximal information, but
most frames do not. This means that the rate at which data blocks are
actually read is higher than the rate at which they are used. This does
not mean, however, that a well-designed system should delay reading of a
block of data until the data is actually required for processing. For one
thing, when data is actually required, the read head may not be positioned
at the start of the desired data block. It is for this reason that
buffering is provided. The video buffer 55 contains the bit information
for a number of successive frames (the actual number depending upon the
rate at which bits are read, the rate at which frames are processed, etc.,
as is known in the art), and video data block info