|
Claims  |
|
|
We claim:
1. A structure for encoding by motion vectors a current frame of video data
using a reference frame of video data, said current frame and said
reference frame each being divided into multiple rows and multiple columns
of macroblocks, said macroblocks of said current frame being designated
"current macroblocks" and said macroblocks of said reference frame being
designated "reference macroblocks", each macroblock representing a number
of pixels in the corresponding frame, said structure comprising:
a memory circuit for storing (a) a plurality n of adjacent current
macroblocks from a row j of current macroblocks in said current frame,
beginning at column p of current macroblocks said current frame, said
plurality of said adjacent current macroblocks being designated C.sub.j,p,
C.sub.j,p+1, . . . , C.sub.j,p+n-1 in the order along one direction of
said row of macroblocks; and (b) a plurality of adjacent reference
macroblocks from a first column i of reference macroblocks in said
reference frame, beginning at row q of reference macroblocks in said
reference frame, said plurality of reference macroblocks being designated
R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m-1,i, said plurality of adjacent
reference macroblocks being reference macroblocks within the range of said
motion vectors, each of said current macroblocks being substantially
equidistance from said R.sub.q,i and R.sub.q+m-1,i reference macroblocks;
means for providing each current macroblock a set of scores, each score
corresponding to one of said motion vectors and measuring a difference in
pixel values between said current macroblock and a reference macroblock
linked by said motion vector;
means for selecting, using said set of scores, a motion vector
corresponding to the least of said differences measured; and
means for replacing in said memory circuit (a) the current macroblock
C.sub.j,p with a current macroblock C.sub.j,p+n, said current macroblock
C.sub.j,p+n being the current macroblock adjacent said macroblock
C.sub.j,p+n-1 in said direction; and (b) said first column of adjacent
reference macroblocks R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m-1,i with a
second column of adjacent reference macroblocks R.sub.q,i+1,
R.sub.q+1,i+1, . . . , R.sub.q+m-1,i+1, said second column being adjacent
said first column in said direction.
2. A structure as in claim 1, wherein said memory circuit, instead of
storing a row of current macroblocks and a column of reference
macroblocks, stores a column of current macroblocks and a row of reference
macroblocks.
3. A structure as in claim 1, further comprising a counter for controlling
said structure, said means for providing each current macroblock a set of
score providing said scores one score at a time, said counter including a
first field and a second field representing respectively the current
macroblock and the reference macroblock involved in the score currently
being provided; wherein, at any given time, said first field takes, as its
value, one of a first number of values, said first number of values
equalling the number of current macroblocks in said memory circuit, and
said second field takes, as its value, one of a second number of values,
said second number of values equalling the number of reference macroblocks
in said memory circuit.
4. A structure as in claim 3, wherein said first and second number of
values being programmable.
5. A structure as in claim 3, further comprising a third and a fourth field
representing, respectively, the row number j of current macroblocks in
said current frame and the column number i of said reference macroblocks
in said reference frame, wherein, at any given time, said third field
takes, as its value, one of a third number of values, said third number of
values equalling the number of rows in said current frame, and said fourth
field takes, as its value, one of a fourth number of values, said fourth
number of values equalling the number of columns in said reference frame.
6. A method for encoding by motion vectors a current frame of video data
using a reference frame of video data, said current frame and said
reference frame each being divided into multiple rows and multiple columns
of macroblocks, said macroblocks of said current frame being designated
"current macroblocks" and said macroblocks of said reference frame being
designated "reference macroblocks", each macroblock representing a number
of pixels in the corresponding frame, said method comprising the steps of:
storing in a memory circuit (a) a plurality n of adjacent current
macroblocks from a row j of current macroblocks in said current frame,
beginning at column p of current macroblocks in said current frame, said
plurality of said adjacent current macroblocks being designated C.sub.j,p,
C.sub.j,p+1, . . . , C.sub.j,p+n-1 in the order along one direction of
said row of macroblocks; and (b) a plurality of adjacent reference
macroblocks from a first column i of reference macroblocks in said
reference frame, beginning at row q of reference macroblocks in said
reference frame, said plurality of reference macroblocks being designated
R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m-1,i, said plurality of adjacent
reference macroblocks being reference macroblocks within the range of said
motion vectors, each of said current macroblocks being substantially
equidistance from said R.sub.q,i and R.sub.q+m-1,i reference macroblocks;
providing each current macroblock a set of scores, each score corresponding
to one of said motion vectors and measuring a difference in pixel values
between said current macroblock and a reference macroblock linked by said
motion vector;
selecting, using said set of scores, a motion vector corresponding to the
least of said differences measured; and
replacing in said memory circuit (a) the current macroblock C.sub.j,p with
a current macroblock C.sub.j,p+n, said current macroblock C.sub.j,p+n
being the current macroblock adjacent said macroblock C.sub.j,p+n-1 in
said direction; and (b) said first column of adjacent reference
macroblocks R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m-1,i with a second
column of adjacent reference macroblocks R.sub.q,i+1, R.sub.q+1,i+1, . . .
, R.sub.q+m-1,i+1, said second column being adjacent said first column in
said direction.
7. A method as in claim 6, wherein said step of storing in a memory
circuit, instead of storing a row of current macroblocks and a column of
reference macroblocks, stores a column of current macroblocks and a row of
reference macroblocks.
8. A method as in claim 6, further comprising the step of using a counter
for controlling, said step of providing each macroblock a set of scores
providing said scores one score at a time, said counter including a first
field and a second field representing respectively the current macroblock
and the reference macroblock involved in the score currently being
provided; wherein, at any given time, said first field takes, as its
value, one of a first number of values, said first number of values
equalling the number of current macroblocks in said memory circuit, and
said second field takes, as its value, one of a second number of values,
said second number of values equalling the number of reference macroblocks
in said memory circuit.
9. A method as in claim 8, wherein said first and second number of values
being programmable.
10. A method as in claim 8, further comprising the step of allocating in
said counter a third and a fourth field representing, respectively, the
row number j of said current macroblocks in said current frame and the
column number i of said reference macroblocks in said reference frame,
wherein, at any given time, said third field takes, as its value, one of a
third number of values, said third number of values equalling the number
of rows in said current frame, and said fourth field takes, as its value,
one of a fourth number of values, said fourth number of values equalling
the number of columns in said reference frame. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to integrated circuit designs; and, in
particular, the present invention relates to integrated circuit designs
for image processing.
2. Discussion of the Related Art
The Motion Picture Experts Group (MPEG) is an international committee
charged with providing a standard (hereinbelow "MPEG standard") for
achieving compatibility between image compression and decompression
equipment. This standard specifies both the coded digital representation
of video signal for the storage media, and the method for decoding. The
representation supports normal speed playback, as well as other playback
modes of color motion pictures, and reproduction of still pictures. The
MPEG standard covers the common 525- and 625-line television, personal
computer and workstation display formats. The MPEG standard is intended
for equipment supporting continuous transfer rate of up to 1.5 Mbits per
second, such as compact disks, digital audio tapes, or magnetic hard
disks. The MPEG standard is intended to support picture frames of
approximately 288.times.352 pixels each at a rate between 24 Hz and 30 Hz.
A publication by MPEG entitled "Coding for Moving Pictures and Associated
Audio for digital storage medium at 1.5Mbit/s," included herein as
Appendix A, provides in draft form the proposed MPEG standard, which is
hereby incorporated by reference in its entirety to provide detailed
information about the MPEG standard.
Under the MPEG standard, the picture is divided into a matrix of
"Macroblock slices" (MBS), each MBS containing a number of picture areas
(called "macroblocks") each covering an area of 16.times.16 pixels. Each
of these picture areas is further represented by one or more 8.times.8
matrices which elements are the spatial luminance and chrominance values.
In one representation (4:2:2) of the macroblock, a luminance value (Y
type) is provided for every pixel in the 16.times.16-pixel picture area
(i.e. in four 8.times.8 "Y" matrices), and chrominance values of the U and
V (i.e., blue and red chrominance) types, each covering the same
16.times.16 picture area, are respectively provided in two 8.times.8 "U"
and two 8.times.8 "V" matrices. That is, each 8.times.8 U or V matrix has
a lower resolution than its luminance counterpart and covers an area of
8.times.16 pixels. In another representation (4:2:0), a luminance value is
provided for every pixel in the 16.times.16 pixels picture area, and one
8.times.8 matrix for each of the U and V types is provided to represent
the chrominance values of the 16.times.16-pixel picture area. A group of
four contiguous pixels in a 2.times.2 configuration is called a "quad
pixel"; hence, the macroblock can also be thought of as comprising 64 quad
pixels in an 8.times.8 configuration.
The MPEG standard adopts a model of compression and decompression based on
lossy compression of both interframe and intraframe information. To
compress interframe information, each frame is encoded in one of the
following formats: "intra", "predicted", or "interpolated". Intra encoded
frames are least frequently provided, the predicted frames are provided
more frequently than the intra frames, and all the remaining frames are
interpolated frames. In a prediction frame ("P-picture"), only the
incremental changes in pixel values from the last I-picture or P-picture
are coded. In an interpolation frame ("B-picture"), the pixel values are
encoded with respect to both an earlier frame and a later frame. By
encoding frames incrementally, using predicted and interpolated frames,
the redundancy between frames can be eliminated, resulting in a high
efficiency in data storage. Under the MPEG, the motion of an object moving
from one screen position to another screen position can be represented by
motion vectors. A motion vector provides a shorthand for encoding a
spatial translation of a group of pixels, typically a macroblock.
The next steps in compression under the MPEG standard provide lossy
compression of intraframe information. In the first step, a 2-dimensional
discrete cosine transform (DCT) is performed on each of the 8.times.8
pixel matrices to map the spatial luminance or chrominance values into the
frequency domain.
Next, a process called "quantization" weights each element of the 8.times.8
transformed matrix, consisting of 1 "DC" value and sixty-three "AC"
values, according to whether the pixel matrix is of the chrominance or the
luminance type, and the frequency represented by each element of the
transformed matrix. In an I-picture, the quantization weights are intended
to reduce to zero many high frequency components to which the human eye is
not sensitive. In P- and B- pictures, which contain mostly higher
frequency components, the weights are not related to visual perception.
Having created many zero elements in the 8.times.8 transformed matrix,
each matrix can be represented without further information loss as an
ordered list consisting of the "DC" value, and alternating pairs of a
non-zero "AC" value and a length of zero elements following the non-zero
value. The values on the list are ordered such that the elements of the
matrix are presented as if the matrix is read in a zig.sub.-- zag manner
(i.e., the elements of a matrix A are read in the order A00, A01, A10,
A02, A11, A20 etc.). This representation is space efficient because zero
elements are not represented individually.
Finally, an entropy encoding scheme is used to further compress, using
variable-length codes, the representations of the DC coefficient and the
AC value-run length pairs. Under the entropy encoding scheme, the more
frequently occurring symbols are represented by shorter codes. Further
efficiency in storage is thereby achieved.
The steps involved in compression under the MPEG standard are
computationally intensive. For such a compression scheme to be practical
and widely accepted, however, a high speed processor at an economical cost
is desired. Such processor is preferably provided in an integrated
circuit.
Other standards for image processing exist. These standards include JPEG
("Joint Photographic Expert Group") and CCITT H.261 (also known as
"P.times.64"). These standards are available from the respective
committees, which are international bodies well-known to those skilled in
the art.
SUMMARY OF THE INVENTION
In accordance with the present invention, a structure and a method for
encoding digitized video signals are provided. In one embodiment, the
video signals are stored in an external memory system, and the present
embodiment provides (a) two video ports each configurable to become either
an input port or an output port for video signals; (b) a host bus
interface circuit for interfacing with an external host computer; (c) a
scratch-pad memory for storing a portion of the video image; (d) a
processor for arithmetic and logic operations, which computes discrete
cosine transforms and quantization on the video signals to obtain
coefficients for compression under a lossy compression algorithm; (e) a
motion estimation unit for matching objects in motion between frames of
images of the video signals, and outputting motion vectors representing
the motion of objects between frames; and (f) a variable-length coding
unit for applying an entropy coding scheme on the quantized coefficients
and motion vectors.
In one embodiment, a global bus is provided to be accessed by video ports,
the host bus interface, the scratch-pad memory, the processor, the motion
estimation unit, and the variable-length coding unit. The global bus
provides data transfer among the functional units. In addition, in that
embodiment, a processor bus having a higher bandwidth than the global bus
is provided to allow higher band-width data transfer among the processor,
the scratch-pad memory, and the variable-length coding units. A memory
controller controls data transfers to and from the external memory while
at the same time provides arbitration the uses of the global bus and the
processor bus.
Multiple copies of the structure of the present invention can be provided
to form a multiprocessor of video signals. Under such configuration, one
of the video ports in each structure would be used to receive the incoming
video signal, and the other video port would be used for communication
between the structure and one or more of its neighboring structures.
In accordance with another aspect of the present invention, one of the two
video port in one embodiment comprises a decimation filter for reducing
the resolution of incoming video signals. In one embodiment, one of the
video ports include an interpolator for restoring the reduced resolution
video into a higher resolution upon video signal output.
In accordance with another aspect of the present invention, a memory with a
novel address mechanism is provided to sort video signals arriving at the
structure of the present invention in pixel interleaved order into several
regions of the memory, such that the data in the several regions of this
memory can be read in block interleaved order, which is used in subsequent
signal processing steps used under various video processing standards,
including MPEG.
In accordance with another aspect of the present invention, a synchronizer
circuit synchronizes the system clock of one embodiment with an external
video clock to which the incoming video signals are synchronized. The
synchronization circuit provides for accurate detection of an edge
transition in the external clock within a time period which is comparable
with a flip-flop's metastable period, without requiring an extension of
the system clock period.
In one embodiment of the present invention, a "corner turn" memory is
provided. In this corner-turn memory, a selected region is mapped to two
set of addresses. Using an address in the first set of addresses, a row of
memory cells are accessed. Using an address in the second set of
addresses, a column of memory cells are accessed. The corner-turn memory
is particularly useful for DCT and IDCT operations where each macroblock
of pixels are accessed in two passes, one pass in column order, and the
other pass in row order.
In accordance with another aspect of the present invention, a scratch pad
memory having a width four times the data path of the processor is
provided. In addition, two set of buffer registers, each set including
registers of the width of the data path, are provided as buffers between
the processor and the scratch pad memory. The buffer registers operates at
the clock rate of the processor, while the scratch pad memory can operate
at a lower clock rate. In this manner, the bandwidths of the processor and
the scratch pad memory are matched without the use of expensive memory
circuitry. Each set of buffer registers are either loaded from, or stored
into, the scratch pad as a one register having the width of the scratch
pad memory, but accessed by the processor individually as registers having
the width of the data path. In one set of the buffer registers, each
register is provided with two addresses. Using one address, the four data
words (each having the width of the data path) are stored into the
register in the order presented. Using the other address, prior to storing
into the buffer register, a transpose is performed on the four halfwords
of the higher order two data words. A similar transpose is performed on
the four halfwords of the lower order two data words. The latter mode,
together with the corner turn memory allows pixels of a macroblock to be
read from, or stored into, the scratch pad memory either in row order or
in column order.
In accordance with another aspect of the present invention, the pixels of a
macroblock are stored in one of two arrangements in the external dynamic
random access memory. Under one arrangement, called the "scan-line" mode,
four horizontally adjacent pixels are accessed at a time. Under the other
arrangement, which is suitable for fetching reference pixels in motion
estimation, pixels are fetched in tiles (4 by 4 pixels) in column order. A
novel address generation scheme is provided to access either the memory
for scan-line elements or for quad pels. Since most filtering involves
quad pels (2.times.2 pixels), the quad pel mode arrangement is efficient
in access time and storage, and avoids rearrangement and complex address
decoding.
In accordance with another aspect of the present invention, the operand
input terminals of the arithmetic and logic unit in the process is
provided a set of "byte multiplexors" for rearranging the four 9-bit bytes
in each operand in any order. Because each 9-bit byte can be used to store
the value of a pixel, so that the arithmetic and logic unit can operate on
the pixels in a quad pel stored in a 36-bit operand simultaneously, the
byte multiplexor allows rearranging the relative positions of the pixels
within the 36-bit operands, numerous filtering operations can be achieved
by simply setting the correct pixel configuration. In one embodiment, in
accordance with the present invention, filters for performing pixel
offsets, decimations, in either horizontal or vertical directions, or both
are provided using the byte multiplexor. In addition, the present
invention provides higher compression ratios, using novel functions for
(a) activities analysis, used in applying adaptive control of
quantization, and (b) scene analysis, used in reduction of interframe
redundancy.
In accordance with another aspect of the present invention, a fast detector
of a zero result in an adder is provided. The fast zero detector includes
a number of "zero generator" circuits and a number of zero propagator
circuits. The fast detector signals the presence of a zero result within,
as a function of the length of the adder's operands, logarithm time,
rather than linear time.
In accordance with another aspect of the present invention, the present
invention provides a structure and a method for a non-linear "alpha"
filter. Under this non-linear filter, thresholds T.sub.1 and T.sub.2 are
set by the two parameters m and n. If the absolute difference between the
two input values of the non-linear filter are less than T.sub.1 or greater
than T.sub.2, a fixed relative weight are accorded the input values,
otherwise a relative weight proportional to the absolute difference is
accorded the input values. This non-linear filter finds numerous
application in signal processing. In one embodiment, the non-linear filter
is used in deinterlacing and temporal noise reduction applications.
In accordance with another aspect of the present invention, a structure for
performing motion estimation is provided, including: (a) a memory for
storing said macroblocks of a current frame and macroblocks of a reference
frame; (b) a filter receiving a first group of pixels from the memory for
resampling; and (c) a matcher receiving the resampled first group of
pixels and a second group of pixels from a current macroblock, for
evaluation of a number of motion vectors. The matcher provides a score
representing the difference between the second group of pixels and the
first group of pixels for each of the motion vectors evaluated. In this
embodiment, the best score over a macroblock is selected as the motion
vector for the macroblock. In one embodiment, the matcher evaluates 8
motion vectors at a time using a 2.times.8 "slice" of current pixels and a
4.times.12 pixel reference area.
In accordance with another aspect of the present invention, a structure is
provided for encoding by motion vectors a current frame of video data,
using a reference frame of video data. The structure includes a memory
circuit for storing (a) adjacent current macroblocks from a row j of
current macroblocks, designated C.sub.j,p, C.sub.j,p+1, . . . ,
C.sub.j,p+n-1 in the order along one direction of the row of macroblocks;
and (b) adjacent reference macroblocks from a first column i of reference
macroblocks, designated R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m,1,i and
a second column C.sub.j+1,p C.sub.p+1,p+1, . . . , C.sub.j+1,p+n+1. The
adjacent reference macroblocks are reference macroblocks within the range
of the motion vectors, with each of said current macroblocks being
substantially equidistance from the R.sub.q,i and Rq+.sub.m-1,i reference
macroblocks. The structure of the present invention evaluates each of the
adjacent current macroblocks against each of the adjacent reference
macroblocks under the motion vectors, so as to select a motion vector
representing the best match between each of said current macroblock and a
corresponding one of said reference macroblocks. When evaluation of the
current macroblock against the set of reference frame macroblock in the
memory circuit is completed, the current macroblock C.sub.j,p is remove
from the memory circuit and replaced by a current macroblock C.sub.j,p+n,
said current macroblock C.sub.j,p+n being the current macroblock adjacent
said macroblock C.sub.j,p+n-1. At the same time, the column of adjacent
reference macroblocks R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m-1,i are
removed from the memory circuit and replaced by the next column of
adjacent reference macroblocks R.sub.q,i+1, R.sub.q+1,i+1, . . . ,
R.sub.q+m-1,i+1. In this manner, each current macroblock, while in memory,
is evaluated against the largest number of reference macroblocks which can
be held in the memory circuit, thereby minimizing the number of time
current and reference macroblocks have to be loaded into memory. Of
course, for purely convenience reasons, the terms "rows" and "columns" are
used to describe the relationship between current and reference
macroblocks. It is understood that a column of current macroblocks can be
evaluated against a row of reference macroblock, within the scope of the
present invention.
In accordance with the present invention, the control structure for
controlling evaluation of motion vectors is provided by a counter which
includes first and second fields representing respectively the current
macroblock and the reference macroblock being evaluated. Under the
controlling scheme of one embodiment, each of the first and second fields
are individually counted, such that when the first field reaches a
maximum, a carry is generated to increment the count in the second field.
The number of counts in the first and second fields are respectively, the
number of current and reference macroblocks. In this manner, each current
macroblock is evaluated completely with the reference macroblocks in the
memory circuit.
In accordance with another aspect of the present invention, an adaptive
thresholding circuit is provided in the zero-packing circuit prior to
entropy encoding of the DCT coefficients into variable length code. In
this adaptive threshold circuit, a current DCT coefficient is set to zero,
if the immediately preceding and the immediately following DCT
coefficients are both zero, and the current DCT coefficient is less than a
programmable threshold. This thresholding circuit allows even higher
compression ratio by extending a zero runlength.
The present invention is better understood upon consideration of the
detailed description below and the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1a is a block diagram of an embodiment of the present invention
provided in an MPEG encoder chip 100.
FIG. 1b shows a multi-chip configuration in which two copies of chip 100,
chips 100a and 100b, are used.
FIG. 1c is a map of chip 100's address space.
FIG. 2 is a block diagram of video port 107 of chip 100 shown in FIG. 1.
FIG. 3a shows a synchronization circuit 300 for synchronizing video data
arrival at port 107 with an external video source, which provides video at
13.5 Mhz under 16-bit mode, and 27 Mhz under 8-bit mode.
FIG. 3b shows the times at which the samples of video clock signal Vclk
indicated in FIG. 3a are obtained.
FIG. 4a is a timing diagram of video port 107 for latching video data
provided at 13.5 Mhz on video bus 190a under 16-bit mode.
FIG. 4b is a timing diagram of video port 107 for latching video data
provided at 27 Mhz on video bus 190a under 8-bit mode.
FIG. 5a shows the sequence in which 4:2:2 video data arrives at port 107.
FIG. 5b is a block diagram of decimator 204 of video port 107.
FIG. 5c is a tables showing, at each phase of the CIF decimation, the data
output R.sub.out of register 201, the operand inputs A.sub.in and B.sub.in
of 14-bit adder 504, the carry-in input C.sub.in, and the data output Dec
of decimator 204.
FIG. 5d is a tables showing, at each phase of the CCR 601 decimation, the
data output R.sub.out of register 201, the operand inputs A.sub.in and
B.sub.in of 14-bit adder 504, the carry-in input C.sub.in, and the data
output Dec of decimator 204.
FIG. 6a is a block diagram of interpolator 206.
FIG. 6b is an address map of video FIFO 205, showing the partition of video
FIFO 205 into Y region 651, U region 652 and V region 653, and the storage
locations of data in a data stream 654 received from decimator 204.
FIG. 6c illustrates the generation of addresses for accessing video FIFO
205 from the contents of address counter 207, during YUV separation, or
during video output.
FIG. 6d illustrates the sequence in which stored and interpolated luminance
and chrominance pixels are output under interpolation mode.
FIG. 6e shows two block interleaved groups 630 and 631 in video FIFO 205.
FIG. 7a is an overview of data flow between memory blocks relating to CPU
150.
FIG. 7b illustrates in further detail the data flow between P memory 702,
QMEM 701, registers R0-R23, and scratch memory 159.
FIG. 7c shows the mappings of registers P4-P7 into the four physical
registers corresponding to registers P0-P3.
FIG. 7d shows the mappings between direct and alias addresses of the higher
64 36-bit locations in SMEM 159.
FIG. 8a is a block diagram of memory controller 104, in accordance with the
present invention.
FIG. 8b show a bit assignment diagram for the channel memory entries of
channel 1.
FIG. 8c show a bit assignment diagram for the channel memory entries of
channels 0, and 3-7.
FIG. 8d shows a bit assignment diagram for the channel memory entry of
channel 2.
FIG. 9a shows chip 100 interfaced with an external 4-bank memory system 103
in a configuration 900.
FIG. 9b is a timing diagram for an interleaved access under "reference"
mode of the memory system of configuration 900.
FIG. 9c is a timing diagram for an interleaved access under "scan-line"
mode of the memory system of configuration 900.
FIGS. 10a and 10b shows pixel arrangements 1000a and 1000b, which are
respectively provided to support scan-line mode operation and reference
frame fetching during motion estimation.
FIG. 10c shows the logical addresses for scan-line mode access.
FIG. 10d shows the logical addresses for reference frame fetching.
FIG. 10e shows a reference frame fetch in which the reference frame crosses
a memory page boundary.
FIGS. 11a and 11b are timing diagrams showing respectively data transfers
between external memory 103 and SMEM 159 via QG register 810.
FIG. 12 illustrates the pipeline stages of CPU 150.
FIG. 13a shows a 32-bit zero-lookahead circuit 1300, comprising 32
generator circuits 1301 and propagator circuits.
FIG. 13b shows the logic circuits for generator circuit 1301 and propagator
circuit 1302.
FIGS. 14a and 14b show schematically the byte multiplexors 1451 and 1452 of
ALU 156.
FIG. 15a is a block diagram of arithmetic unit 750.
FIG. 15b is a schematic diagram of MAC 158.
FIG. 15c(i) illustrates an example of "alpha filtering" in the mixing
filter for combining chroma during a deinterlacing operation.
FIG. 15c(ii) is a block diagram of a circuit 1550 for computing the value
of alpha.
FIG. 15c(iii) shows the values of alpha obtainable from the various values
of parameters m and n.
FIGS. 15d(i)-15d(iv) illustrates instructions using the byte multiplexors
of arithmetic unit 750, using one mode selected from each of the HOFF,
VOFF, HSHRINK and VSHRINK instructions, respectively.
FIG. 15e shows the pixels involved in computing activities of quad pels A
and B as input to a STAT1 or STAT2 instruction.
FIG. 15f shows a macroblock of luminance data for which a measure of
activity is computed using repeated calls to a STAT1 or a STAT2
instruction.
FIGS. 16a and 16b are respectively a block diagram and a data and control
flow diagram of motion estimator 111.
FIG. 16c is a block diagram of window memory 705, showing odd and even
banks 705a and 705b.
FIG. 16d shows how, in the present invention, vertical half-tiles of a
macroblock are stored in odd and even memory banks of window memory 750.
FIG. 17 illustrates a 2-stage motion estimation algorithm which can be
executed by motion estimator 111.
FIGS. 18a and 18b show, with respect to reference macroblocks, a decimated
current macroblock and the range of a motion vector having an origin at
the upper right corner of the current macroblock for the first stage of a
B frame motion estimation and a P frame motion estimation respectively.
FIG. 18c shows, with respect to reference macroblocks, a full resolution
current macroblock and the range of a motion vector having an origin at
the upper right corner of the current macroblock for the second stage of
motion estimation in both P-frame and B-frame motion estimations.
FIG. 18d shows the respectively locations of current and reference
macroblocks in the first stage of a B frame motion estimation.
FIG. 18e shows the respective locations of current and reference
macroblocks in the first stage of a P frame motion estimation.
FIG. 18f shows both a 4.times.4 tile current macroblock 1840 and a
5.times.5 tile reference region 1841 in the second stage of motion
estimation.
FIG. 18g shows the fields of a state counter 1890 having programmable
fields for control of motion estimation.
FIG. 18h shows the four possibilities by which a patch of motion vectors
crosses a reference frame boundary.
FIG. 18i shows the twelve possible ways the reference frame boundary can
intersect the reference and current macroblocks in window memory 705 under
the first stage motion estimation for B-frames.
FIG. 18j shows, for each of the 12 cases shown in FIG. 18h, the INIT and
WRAP values for each of the fields in state counter 1890.
FIG. 18k shows the twenty possible ways the reference frame boundary can
intersect the current and reference macroblocks in window memory 705.
FIG. 18l shows, for each of the twenty cases shown in FIG. 18k, the
corresponding INIT and WRAP values for each of the fields of state counter
1890.
FIGS. 18m-1 and 18m-2 show the clipping of motion estimation with respect
to the reference frame boundary for either the second stage of a 2-stage
motion estimation, or the third stage of a 3-stage motion estimation.
FIG. 18n provides the INIT and WRAP values for state counter 1890
corresponding to the reference frame boundary clipping shown in FIGS.
18m-1 and 18m-2.
FIG. 19a illustrates the algorithm used in matcher 1606 for evaluate eight
motion vectors over eight cycles.
FIG. 19b shows the locations of the "patch" of eight motion vector
evaluated for each slice of current pixels.
FIG. 19c shows the structure of matcher 1608.
FIG. 19d shows the pipeline in the motion estimator 111 formed by the
registers in subpel filter 1606.
FIGS. 20a and 20b together form a block diagram of VLC 109.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
1. Overview
FIG. 1a is a block diagram of an embodiment of the present invention
provided in an encoder/decoder integrated circuit 100 ("chip 100"). In
this embodiment, chip 100 encodes or decodes bit stream compatible with
MPEG, JPEG and CCITT H.64. As shown in FIG. 1a, chip 100 communicates
through host bus interface 102 with a host computer (not shown) over
32-bit host bus 101. Host bus interface 102 implements the IEEE 1196 NuBus
standard. In addition, chip 100 communicates with an external memory 103
(not shown) over 32-bit memory bus 105. Chip 100's access to external
memory 103 is controlled by a memory controller 104, which includes
dynamic random access memory (DRAM) controller 104a and direct memory
access (DMA) controller 106. Chip 100 has two independent 16-bit
bidirectional video ports 107 and 108 receiving and sending data on video
busses 190a and 190b respectively. Video ports 107 and 108 are
substantially identical, except that port 107 is provided with a
decimation filter, and port 108 is provided with an interpolator. Both the
decimator and the interpolator circuits of ports 107 and 108 are described
in further detail below.
The functional units of chip 100 communicate over an internal global bus
120, these units include the central processing unit (CPU) 150, the
variable-length code coder (VLC) 109, variable-length code decoder (VLD)
110, and motion estimator 111. Central processing unit 150 includes the
processor status word register 151, which stores the state of CPU 150,
instruction memory ("I mem") 152, instruction register 153, register file
("RMEM") 154, which includes 31 general purpose registers R1-R31, byte
multiplexor 155, arithmetic logic unit ("ALU") 156, memory controller 104,
multiplier-accumulator (MAC) 158, and scratch memory ("SMEM") 159, which
includes address generation unit 160. Memory controller 104 provides
access to external memory 103, including direct memory access (DMA) modes.
Global bus 120 is accessed by SMEM 159, motion estimator 111, VLC 109 and
VLD 110, memory controller 104, instruction memory 152, host interface 102
and bidirectional video ports 107 and 108. A processor bus 180 is used for
data transfer between SMEM 159, VLC 109 and VLD 110, and CPU 150.
During video operations, the host computer initializes chip 100 by loading
the configuration registers in the functional units of chip 100, and
maintains the bit streams sending to or receiving from video ports 107 and
108.
Chip 100 has an memory address space of 16 megabytes. A map of chip 100's
address space is provided in FIG. 1c. As shown in FIG. 1c, chip 100 is
assigned a base address. The memory space between the base address and the
location (base address+7FFFFF.sup.1) is reserved for an external dynamic
random access memory (DRAM). The memory space between location (base
address+800000) to location (base address+9FFFFF) is reserved for
registers addressable over global bus 120. The memory space between
location (base address+A00000) and location (base address+BFFFFF) is
reserved for registers | | |