| United States Patent | 5598514 |
| Link to this page | http://www.wikipatents.com/5598514.html |
| Inventor(s) | Purcell; Stephen C. (Mountain View, CA);
Le Gall; Didier J. (Los Altos, CA);
Bose; Subroto (Santa Clara, CA) |
| Abstract | A structure and a format provide a video signal encoder under the MPEG
(Moving Picture Experts Group) standard. In one embodiment, the video
signal interface is provided with a decimator for providing input
filtering for the incoming signals. In one embodiment, the central
processing unit (CPU) and multiple coprocessors implement discrete cosine
transform (DCT), inverse discrete cosine transform (IDCT) and other
signal processing functions, generate variable length codes, and
provide motion estimation and memory management. The instruction set of
the central processing unit provides numerous features in support of such
functions as alpha filtering, eliminating redundancies in video signals
derived from motion pictures, and scene analysis. In one embodiment, a
matcher evaluates 16 absolute differences to evaluate a "patch" of eight
motion vectors at a time. |
Title Information  |
| Publication Date | January 28, 1997 |
| Filing Date | August 9, 1993 |
References  |
U.S. References |
| Patent No. | Inventor | Class | Date |
| 3252148 | | | |
| 5335321 | Harney | | Aug, 1994 |
| 5231484 | Gonzales | 375/240.04 | Jul, 1993 |
| 5099322 | Gove | 348/700 | Mar, 1992 |
| 5049993 | LeGall | 348/448 | Sep, 1991 |
| 5014187 | Debize | 710/66 | May, 1991 |
| 4973860 | Ludwig | 327/145 | Nov, 1990 |
| 4935942 | Hwang | 375/354 | Jun, 1990 |
| 4870563 | Oguchi | 712/224 | Sep, 1989 |
| 4838685 | Martinez | | Jun, 1989 |
| 4816914 | Ericsson | 375/240.03 | Mar, 1989 |
| 4779190 | O'Dell | 710/66 | Oct, 1988 |
| 4591976 | Webber | 714/20 | May, 1986 |
| 4559608 | Young | 708/702 | Dec, 1985 |
| 4514808 | Murayama | 710/307 | Apr, 1985 |
| 4489395 | Sato | 712/38 | Dec, 1984 |
| 3812467 | Batcher | 712/300 | May, 1974 |
Claims  |
We claim:
1. A structure for encoding digitized video signals representing a series
of frames of images, said digitized video signals being stored in an
external memory system, said structure comprising:
a first and a second video ports, each video port being configurable to be
either an input port or an output port for video signals;
a host bus interface circuit for interfacing with an external host
computer;
a scratch-pad memory for storing a portion of said series of frames of
images;
a processor for arithmetic and logic operations, wherein said processor
computing coefficients of a discrete cosine transform of said portion of
said series of frames of images, and for applying a quantization step for
said coefficients to obtained quantized coefficients under a lossy
compression algorithm;
a motion estimation unit for matching objects in motion between said frames
of images, said motion estimation unit providing as data output motion
vectors representing said motion of said objects in motion between said
frames of images;
a variable-length coding unit for applying an entropy coding scheme on said
quantized coefficients and said motion vectors to represent said video
signals;
a global bus accessible by said first and second video port, said host bus
interface, said scratch-pad memory, said processor, said motion estimation
unit, and said variable-length coding unit, said global bus providing data
transfer among said first and second video port, said host bus interface,
said scratch-pad memory, said processor, said motion estimation unit, and
said variable-length coding unit;
a processor bus having a higher bandwidth than said global bus for
providing data transfer among said processor, said scratch-pad memory, and
said variable-length coding unit; and
a memory controller for (a) controlling data transfers between said
external memory and said structure, and (b) for controlling the uses of
said global bus and said processor bus.
2. A structure as in claim 1, wherein said processor comprises:
an instruction memory for storing instructions executable by said
processor;
a register file including a predetermined number of registers for storing
operands;
an arithmetic and logic unit for providing arithmetic and logic operations
for operands in said register file; and
a multiplication unit for performing multiplication operations among said
operands and a result of said arithmetic and logic operations.
3. A structure as in claim 1, for encoding digitized video signals
representing a series of frames of images, said digitized video signals
being stored in an external memory system, said structure comprising:
a first and a second video ports, each video port being configurable to be
either an input port or an output port for video signals;
a host bus interface circuit for interfacing with an external host
computer;
a scratch-pad memory for storing a portion of said series of frames of
images;
a processor for arithmetic and logic operations, wherein said processor
computing coefficients of a discrete cosine transform of said portion of
said series of frames of images, and for applying a quantization step for
said coefficients to obtained quantized coefficients under a lossy
compression algorithm;
a motion estimation unit for matching objects in motion between said frames
of images, said motion estimation unit providing as data output motion
vectors representing said motion of said objects in motion between said
frames of images;
a variable-length coding unit for applying an entropy coding scheme on said
quantized coefficients and said motion vectors to represent said video
signals;
a global bus accessible by said first and second video port, said host bus
interface, said scratch-pad memory, said processor, said motion estimation
unit, and said variable-length coding unit, said global bus providing data
transfer among said first and second video port, said host bus interface,
said scratch-pad memory, said processor, said motion estimation unit, and
said variable-length coding unit;
a processor bus having a higher bandwidth than said global bus for
providing data transfer among said processor, said scratch-pad memory, and
said variable-length coding unit; and
a memory controller for (a) controlling data transfers between said
external memory and said structure, and (b) for controlling the uses of
said global bus and said processor bus;
wherein said motion estimation unit comprises:
a window memory for storing a second portion of said series of frames of
images, said second portion being a subset of said portion of said series
of frames of images stored in said scratch-pad memory, said second portion
of said series of frames of images including video data from a current
frame and video data from a reference frame; and
a matcher for matching said video data from said current frame and said
video data from said reference frame to evaluate a predetermined number of
motion vectors.
4. A structure for encoding digitized video signals representing a series
of frames of images, said digitized video signals being stored in an
external memory system, said structure comprising:
a first and a second video ports, each video port being configurable to be
either an input port or an output port for video signals;
a host bus interface circuit for interfacing with an external host
computer;
a scratch-pad memory for storing a portion of said series of frames of
images;
a processor for arithmetic and logic operations, wherein said processor
computing coefficients of a discrete cosine transform of said portion of
said series of frames of images, and for applying a quantization step for
said coefficients to obtained quantized coefficients under a lossy
compression algorithm;
a motion estimation unit for matching objects in motion between said frames
of images, said motion estimation unit providing as data output motion
vectors representing said motion of said objects in motion between said
frames of images;
a variable-length coding unit for applying an entropy coding scheme on said
quantized coefficients and said motion vectors to represent said video
signals;
a global bus accessible by said first and second video port, said host bus
interface, said scratch-pad memory, said processor, said motion estimation
unit, and said variable-length coding unit, said global bus providing data
transfer among said first and second video port, said host bus interface,
said scratch-pad memory, said processor, said motion estimation unit, and
said variable-length coding unit;
a processor bus having a higher bandwidth than said global bus for
providing data transfer among said processor, said scratch-pad memory, and
said variable-length coding unit; and
a memory controller for (a) controlling data transfers between said
external memory and said structure, and (b) for controlling the uses of
said global bus and said processor bus; wherein said first video port
comprises a decimation filter for reducing the resolution of said video
signals.
5. A system comprising a first and a second structures, each structure
encoding digitized video signals representing a series of frames of
images, said digitized video signals being stored in an external memory
system, each structure comprising:
a first and a second video ports, each video port being configurable to be
either an input port or an output port for video signals;
a host bus interface circuit for interfacing with an external host
computer;
a scratch-pad memory for storing a portion of said series of frames of
images;
a processor for arithmetic and logic operations, wherein said processor
computing coefficients of a discrete cosine transform of said portion of
said series of frames of images, and for applying a quantization step for
said coefficients to obtained quantized coefficients under a lossy
compression algorithm;
a motion estimation unit for matching objects in motion between said frames
of images, said motion estimation unit providing as data output motion
vectors representing said motion of said objects in motion between said
frames of images;
a variable-length coding unit for applying an entropy coding scheme on said
quantized coefficients and said motion vectors to represent said video
signals;
a global bus accessible by said first and second video port, said host bus
interface, said scratch-pad memory, said processor, said motion estimation
unit, and said variable-length coding unit, said global bus providing data
transfer among said first and second video port, said host bus interface,
said scratch-pad memory, said processor, said motion estimation unit, and
said variable-length coding unit;
a processor bus having a higher bandwidth than said global bus for
providing data transfer among said processor, said scratch-pad memory, and
said variable-length coding unit; and
a memory controller for (a) controlling data transfers between said
external memory and said structure, and (b) for controlling the uses of
said global bus and said processor bus;
wherein said first video port of said first structure and said first video
port of said second structure are connected to receive said video signals,
and said second video port of said first structure and said second video
port of said second structure are connected to pass said video data
between said first structure and said second structure.
6. A method for encoding digitized video signals representing a series of
frames of images, said digitized video signals being stored in an external
memory system, said method comprising the steps of:
providing a first and a second video ports, each video port being
configurable to be either an input port or an output port for video
signals;
using a host bus interface circuit to interface with an external host
computer;
storing a portion of said series of frames of images in a scratch-pad
memory;
providing a processor for arithmetic and logic operations, wherein said
processor computing coefficients of a discrete cosine transform of said
portion of said series of frames of images, and for applying a
quantization step for said coefficients to obtained quantized coefficients
under a lossy compression algorithm;
matching objects in motion between said frames of images using a motion
estimation unit, said motion estimation unit providing as data output
motion vectors representing said motion of said objects in motion between
said frames of images;
applying in a variable-length coding unit an entropy coding scheme on said
quantized coefficients and said motion vectors to represent said video
signals;
providing a global bus accessible by said first and second video port, said
host bus interface, said scratch-pad memory, said processor, said motion
estimation unit, and said variable-length coding unit, said global bus
providing data transfer among said first and second video port, said host
bus interface, said scratch-pad memory, said processor, said motion
estimation unit, and said variable-length coding unit;
providing a processor bus having a higher bandwidth than said global bus
for providing data transfer among said processor, said scratch-pad memory,
and said variable-length coding unit; and
providing a memory controller for (a) controlling data transfers between
said external memory and said structure, and (b) for controlling the uses
of said global bus and said processor bus.
7. A method as in claim 6, wherein said step of providing a processor
comprises the steps of:
storing instructions executable by said processor in an instruction memory;
storing operands in a register file including a predetermined number of
registers;
providing arithmetic and logic operations in an arithmetic and logic unit
for operands in said register file; and
performing multiplication operations among said operands in a
multiplication unit and a result of said arithmetic and logic operations.
8. A method for encoding digitized video signals representing a series of
frames of images, said digitized video signals being stored in an external
memory system, said method comprising the steps of:
providing a first and a second video ports, each video port being
configurable to be either an input port or an output port for video
signals;
using a host bus interface circuit to interface with an external host
computer;
storing a portion of said series of frames of images in a scratch-pad
memory;
providing a processor for arithmetic and logic operations, wherein said
processor computing coefficients of a discrete cosine transform of said
portion of said series of frames of images, and for applying a
quantization step for said coefficients to obtained quantized coefficients
under a lossy compression algorithm;
matching objects in motion between said frames of images using a motion
estimation unit, said motion estimation unit providing as data output
motion vectors representing said motion of said objects in motion between
said frames of images;
applying in a variable-length coding unit an entropy coding scheme on said
quantized coefficients and said motion vectors to represent said video
signals;
providing a global bus accessible by said first and second video port, said
host bus interface, said scratch-pad memory, said processor, said motion
estimation unit, and said variable-length coding unit, said global bus
providing data transfer among said first and second video port, said host
bus interface, said scratch-pad memory, said processor, said motion
estimation unit, and said variable-length coding unit;
providing a processor bus having a higher bandwidth than said global bus
for providing data transfer among said processor, said scratch-pad memory,
and said variable-length coding unit; and
providing a memory controller for (a) controlling data transfers between
said external memory and said structure, and (b) for controlling the uses
of said global bus and said processor bus;
wherein said matching step comprises the steps of:
storing in a window memory a second portion of said series of frames of
images, said second portion being a subset of said portion of said series
of frames of images stored in said scratch-pad memory, said second portion
of said series of frames of images including video data from a current
frame and video data from a reference frame; and
matching in a matcher said video data from said current frame and said
video data from said reference frame to evaluate a predetermined number of
motion vectors.
9. A method for encoding digitized video signals representing a series of
frames of images, said digitized video signals being stored in an external
memory system, said method comprising the steps of:
providing a first and a second video ports, each video port being
configurable to be either an input port or an output port for video
signals;
using a host bus interface circuit to interface with an external host
computer;
storing a portion of said series of frames of images in a scratch-pad
memory;
providing a processor for arithmetic and logic operations, wherein said
processor computing coefficients of a discrete cosine transform of said
portion of said series of frames of images, and for applying a
quantization step for said coefficients to obtained quantized coefficients
under a lossy compression algorithm;
matching objects in motion between said frames of images using a motion
estimation unit, said motion estimation unit providing as data output
motion vectors representing said motion of said objects in motion between
said frames of images;
applying in a variable-length coding unit an entropy coding scheme on said
quantized coefficients and said motion vectors to represent said video
signals;
providing a global bus accessible by said first and second video port, said
host bus interface, said scratch-pad memory, said processor, said motion
estimation unit, and said variable-length coding unit, said global bus
providing data transfer among said first and second video port, said host
bus interface, said scratch-pad memory, said processor, said motion
estimation unit, and said variable-length coding unit;
providing a processor bus having a higher bandwidth than said global bus
for providing data transfer among said processor, said scratch-pad memory,
and said variable-length coding unit; and
providing a memory controller for (a) controlling data transfers between
said external memory and said structure, and (b) for controlling the uses
of said global bus and said processor bus;
wherein said step of providing provides a first video port comprising a
decimation filter for reducing the resolution of said video signals.
Description  |
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to integrated circuit designs; and, in
particular, the present invention relates to integrated circuit designs
for image processing.
2. Discussion of the Related Art
The Moving Picture Experts Group (MPEG) is an international committee
charged with providing a standard (hereinbelow "MPEG standard") for
achieving compatibility between image compression and decompression
equipment. This standard specifies both the coded digital representation
of the video signal for the storage media, and the method for decoding. The
representation supports normal speed playback, as well as other playback
modes of color motion pictures, and reproduction of still pictures. The
MPEG standard covers the common 525- and 625-line television, personal
computer and workstation display formats. The MPEG standard is intended
for equipment supporting a continuous transfer rate of up to 1.5 Mbits per
second, such as compact disks, digital audio tapes, or magnetic hard
disks. The MPEG standard is intended to support picture frames of
approximately 288×352 pixels each at a rate between 24 Hz and 30 Hz.
A publication by MPEG entitled "Coding for Moving Pictures and Associated
Audio for digital storage medium at 1.5 Mbit/s," included herein as
Appendix A, provides in draft form the proposed MPEG standard, which is
hereby incorporated by reference in its entirety to provide detailed
information about the MPEG standard.
Under the MPEG standard, the picture is divided into a matrix of
"Macroblock slices" (MBS), each MBS containing a number of picture areas
(called "macroblocks") each covering an area of 16×16 pixels. Each
of these picture areas is further represented by one or more 8×8
matrices whose elements are the spatial luminance and chrominance values.
In one representation (4:2:2) of the macroblock, a luminance value (Y
type) is provided for every pixel in the 16×16-pixel picture area
(i.e., in four 8×8 "Y" matrices), and chrominance values of the U and
V (i.e., blue and red chrominance) types, each covering the same
16×16 picture area, are respectively provided in two 8×8 "U"
and two 8×8 "V" matrices. That is, each 8×8 U or V matrix has
a lower resolution than its luminance counterpart and covers an area of
8×16 pixels. In another representation (4:2:0), a luminance value is
provided for every pixel in the 16×16-pixel picture area, and one
8×8 matrix for each of the U and V types is provided to represent
the chrominance values of the 16×16-pixel picture area. A group of
four contiguous pixels in a 2×2 configuration is called a "quad
pixel"; hence, the macroblock can also be thought of as comprising 64 quad
pixels in an 8×8 configuration.
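The block counts described above can be checked with a short sketch. The function name and the subsampling table are illustrative assumptions, not part of the patent; only the 16×16 macroblock, the 8×8 matrix size, and the 4:2:2 and 4:2:0 layouts come from the text.

```python
# Illustrative sketch: count the 8x8 matrices in one 16x16 macroblock.
# The names below are hypothetical; the sizes are taken from the text.
MACROBLOCK_SIZE = 16
BLOCK_SIZE = 8

# (horizontal, vertical) chrominance subsampling factor for each format.
CHROMA_SUBSAMPLING = {
    "4:2:2": (2, 1),  # chrominance halved in one direction only
    "4:2:0": (2, 2),  # chrominance halved in both directions
}


def blocks_per_macroblock(chroma_format):
    """Return the number of 8x8 Y, U and V matrices in one macroblock."""
    h_sub, v_sub = CHROMA_SUBSAMPLING[chroma_format]
    luma_blocks = (MACROBLOCK_SIZE // BLOCK_SIZE) ** 2
    chroma_width = MACROBLOCK_SIZE // h_sub
    chroma_height = MACROBLOCK_SIZE // v_sub
    chroma_blocks = (chroma_width // BLOCK_SIZE) * (chroma_height // BLOCK_SIZE)
    return {"Y": luma_blocks, "U": chroma_blocks, "V": chroma_blocks}


def quad_pixels_per_macroblock():
    """A quad pixel is a 2x2 group, so a 16x16 macroblock holds 8x8 of them."""
    return (MACROBLOCK_SIZE // 2) ** 2


if __name__ == "__main__":
    for fmt in CHROMA_SUBSAMPLING:
        print(fmt, blocks_per_macroblock(fmt))
    print("quad pixels:", quad_pixels_per_macroblock())
```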
The MPEG standard adopts a model of compression and decompression based on
lossy compression of both interframe and intraframe information. To
compress interframe information, each frame is encoded in one of the
following formats: "intra", "predicted", or "interpolated". Intra encoded
frames are least frequently provided, the predicted frames are provided
more frequently than the intra frames, and all the remaining frames are
interpolated frames. In a prediction frame ("P-picture"), only the
incremental changes in pixel values from the last I-picture or P-picture
are coded. In an interpolation frame ("B-picture"), the pixel values are
encoded with respect to both an earlier frame and a later frame. By
encoding frames incrementally, using predicted and interpolated frames,
the redundancy between frames can be eliminated, resulting in a high
efficiency in data storage. Under the MPEG standard, the motion of an object moving
from one screen position to another screen position can be represented by
motion vectors. A motion vector provides a shorthand for encoding a
spatial translation of a group of pixels, typically a macroblock.
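As a rough illustration of how a motion vector for a block might be found, the sketch below performs an exhaustive search that minimizes the sum of absolute differences (SAD) between a block of the current frame and displaced blocks of a reference frame. The exhaustive search, the function names, and the frame layout (lists of rows) are assumptions made for illustration; the abstract only states that a matcher evaluates absolute differences, and this is not a description of that hardware.

```python
# Illustrative sketch (assumed technique): exhaustive SAD block matching.
def sad(current, reference, top, left, dy, dx, size):
    """Sum of absolute differences between the current block at (top, left)
    and the reference block displaced by (dy, dx)."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(current[top + r][left + c]
                         - reference[top + dy + r][left + dx + c])
    return total


def best_motion_vector(current, reference, top, left, size, search):
    """Try every displacement within +/-search pixels that stays inside the
    reference frame; return ((dy, dx), cost) for the lowest SAD."""
    height, width = len(reference), len(reference[0])
    best_cost, best_vector = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= height
                      and 0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue
            cost = sad(current, reference, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dy, dx)
    return best_vector, best_cost
```

The encoder would then code the vector and the (ideally small) residual between the current block and the matched reference block instead of the raw pixel values.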
The next steps in compression under the MPEG standard provide lossy
compression of intraframe information. In the first step, a 2-dimensional
discrete cosine transform (DCT) is performed on each of the 8×8
pixel matrices to map the spatial luminance or chrominance values into the
frequency domain.
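The transform step can be sketched directly from the textbook definition of the 2-dimensional DCT. The code below is a deliberately unoptimized, orthonormal DCT-II written for clarity; the function name is hypothetical, and the direct four-loop form is an assumption for illustration rather than the computation scheme used by the processor described here.

```python
import math

N = 8  # matrix size used for each luminance or chrominance block


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block, computed directly from the
    definition (O(N^4); real encoders use fast separable algorithms)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


if __name__ == "__main__":
    flat = [[100] * N for _ in range(N)]
    coefficients = dct_2d(flat)
    # A constant block has all of its energy in the "DC" term.
    print(round(coefficients[0][0], 6))
```

A block with no spatial variation maps to a single non-zero "DC" coefficient, which is why smooth picture areas compress well after this step.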
Next, a process called "quantization" weights each element of the 8×8
transformed matrix, consisting of one "DC" value and sixty-three "AC"
values, according to whether the pixel matrix is of the chrominance or the
luminance type, and the frequency represented by each element of the
transformed matrix. In an I-picture, the quantization weights are intended
to reduce to zero many high frequency components to which the human eye is
not sensitive. In P- and B-pictures, which contain mostly higher
frequency components, the weights are not related to visual perception.
Having created many zero elements in the 8×8 transformed matrix,
each matrix can be represented without further information loss as an
ordered list consisting of the "DC" value, and alternating pairs of a
non-zero "AC" value and a length of zero elements following the non-zero
value. The values on the list are ordered such that the elements of the
matrix are presented as if the matrix is read in a zig-zag manner
(i.e., the elements of a matrix A are read in the order A00, A01, A10,
A02, A11, A20 etc.). This representation is space efficient because zero
elements are not represented individually.
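The ordered-list representation can be sketched as follows. The scan below follows the order quoted in the text literally (A00, A01, A10, A02, A11, A20, ...), reading each anti-diagonal from the top row downward; the zig-zag scan defined in the MPEG standard itself alternates direction on successive diagonals. Pairing the "DC" value with the run of zeros that follows it is an assumption made here so that the list decodes without loss; the function names are hypothetical.

```python
def scan_order(n=8):
    """Matrix positions in the diagonal order quoted in the text."""
    order = []
    for diagonal in range(2 * n - 1):
        for row in range(n):
            col = diagonal - row
            if 0 <= col < n:
                order.append((row, col))
    return order


def run_length_encode(matrix):
    """Return (value, zeros_following) pairs; the first pair holds the DC value."""
    n = len(matrix)
    values = [matrix[r][c] for r, c in scan_order(n)]
    pairs = [[values[0], 0]]
    for value in values[1:]:
        if value == 0:
            pairs[-1][1] += 1          # extend the current run of zeros
        else:
            pairs.append([value, 0])   # start a new (value, run) pair
    return [tuple(pair) for pair in pairs]


def run_length_decode(pairs, n=8):
    """Rebuild the n x n matrix from the (value, zeros_following) pairs."""
    values = []
    for value, zeros in pairs:
        values.append(value)
        values.extend([0] * zeros)
    matrix = [[0] * n for _ in range(n)]
    for (r, c), value in zip(scan_order(n), values):
        matrix[r][c] = value
    return matrix
```

A quantized matrix with only a few non-zero low-frequency entries collapses to a handful of pairs, and decoding the pairs restores the matrix exactly.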
Finally, an entropy encoding scheme is used to further compress, using
variable-length codes, the representations of the DC coefficient and the
AC value-run length pairs. Under the entropy encoding scheme, the more
frequently occurring symbols are represented by shorter codes. Further
efficiency in storage is thereby achieved.
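The principle that frequent symbols receive shorter codes can be illustrated with a small Huffman code builder. This is only a sketch of the general idea: the MPEG standard uses predefined variable-length code tables rather than a code built per stream, and the function name and symbol values below are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code in which more frequent symbols get codes
    that are no longer than those of less frequent symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie_breaker, {symbol: code_so_far}).
    heap = [(count, index, {symbol: ""})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_index = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)
        count_b, _, codes_b = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in codes_a.items()}
        merged.update({s: "1" + code for s, code in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_index, merged))
        next_index += 1
    return heap[0][2]


if __name__ == "__main__":
    # Symbols stand in for (AC value, zero-run length) pairs.
    stream = [(1, 0)] * 8 + [(2, 0)] * 4 + [(1, 3)] * 2 + [(5, 1)] * 2
    for symbol, code in sorted(huffman_code(stream).items()):
        print(symbol, code)
```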
The steps involved in compression under the MPEG standard are
computationally intensive. For such a compression scheme to be practical
and widely accepted, however, a high speed processor at an economical cost
is desired. Such a processor is preferably provided in an integrated
circuit.
Other standards for image processing exist. These standards include JPEG
("Joint Photographic Experts Group") and CCITT H.261 (also known as
"P×64"). These standards are available from the respective
committees, which are international bodies well-known to those skilled in
the art.
SUMMARY OF THE INVENTION
In accordance with the present invention, a structure and a method for
encoding digitized video signals are provided. In one embodiment, the
video signals are stored in an external memory system, and the present
embodiment provides (a) two video ports each configurable to become either
an input port or an output port for video signals; (b) a host bus
interface circuit for interfacing with an external host computer; (c) a
scratch-pad memory for storing a portion of the video image; (d) a
processor for arithmetic and logic operations, which computes discrete
cosine transforms and quantization on the video signals to obtain
coefficients for compression under a lossy compression algorithm; (e) a
motion estimation unit for matching objects in motion between frames of
images of the video signals, and outputting motion vectors representing
the motion of objects between frames; and (f) a variable-length coding
unit for applying an entropy coding scheme on the quantized coefficients
and motion vectors.
In one embodiment, a global bus is provided to be accessed by the video ports,
the host bus interface, the scratch-pad memory, the processor, the motion
estimation unit, and the variable-length coding unit. The global bus
provides data transfer among the functional units. In addition, in that
embodiment, a processor bus having a higher bandwidth than the global bus
is provided to allow higher-bandwidth data transfer among the processor,
the scratch-pad memory, and the variable-length coding unit. A memory
controller controls data transfers to and from the external memory while
at the same time arbitrating the uses of the global bus and the
processor bus.
Multiple copies of the structure of the present invention can be provided
to form a multiprocessor of video signals. Under such a configuration, one
of the video ports in each structure would be used to receive the incoming
video signal, and the other video port would be used for communication
between the structure and one or more of its neighboring structures.
In accordance with another aspect of the present invention, one of the two
video ports in one embodiment comprises a decimation filter for reducing
the resolution of incoming video signals. In one embodiment, one of the
video ports includes an interpolator for restoring the reduced-resolution
video to a higher resolution upon video signal output.
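A minimal sketch of 2:1 decimation and the matching interpolation on one line of samples is given below. The pairwise-average filter and the linear interpolation are assumptions chosen for brevity; this passage does not specify the filter taps of the decimation filter or the interpolator, and the function names are hypothetical.

```python
def decimate_2to1(samples):
    """Average each adjacent pair (a crude low-pass filter) and keep one
    output sample per two input samples, rounding to the nearest integer."""
    return [(samples[i] + samples[i + 1] + 1) // 2
            for i in range(0, len(samples) - 1, 2)]


def interpolate_1to2(samples):
    """Double the sample count by inserting the rounded average of each
    sample and its right-hand neighbour (the last sample is repeated)."""
    out = []
    for i, sample in enumerate(samples):
        following = samples[i + 1] if i + 1 < len(samples) else sample
        out.append(sample)
        out.append((sample + following + 1) // 2)
    return out


if __name__ == "__main__":
    line = [10, 20, 30, 50, 7, 7]
    reduced = decimate_2to1(line)
    print(reduced)
    print(interpolate_1to2(reduced))
```

Decimation followed by interpolation does not recover the original samples exactly; the detail removed by the low-pass step is lost.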
In accordance with another aspect of the present invention, a memory with a
novel address mechanism is provided to sort video signals arriving at the
structure of the present invention in pixel interleaved order into several
regions of the memory, such that the data in the several regions of this
memory can be read in block interleaved order, which is used in subsequent
signal processing steps used under various video processing standards,
including MPEG.
In accordance with another aspect of the present invention, a synchronizer
circuit synchronizes the system clock of one embodiment with an external
video clock to which the incoming video signals are synchronized. The
synchronization circuit provides for accurate detection of an edge
transition in the external clock within a time period which is comparable
with a flip-flop's metastable period, without requiring an extension of
the system clock period.
In one embodiment of the present invention, a "corner turn" memory is
provided. In this corner-turn memory, a selected region is mapped to two
sets of addresses. Using an address in the first set of addresses, a row
of memory cells is accessed. Using an address in the second set of
addresses, a column of memory cells is accessed. The corner-turn memory
is particularly useful for DCT and IDCT operations where each macroblock
of pixels is accessed in two passes, one pass in column order, and the
other pass in row order.
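The dual address mapping can be modeled in software as a small class. The address layout used here (addresses 0 to size-1 select rows, addresses size to 2*size-1 select columns) and all names are hypothetical; the text states only that one set of addresses reaches rows and a second set reaches columns of the same cells.

```python
class CornerTurnMemory:
    """Software model of a square memory region reachable through two
    address sets: one accesses a row, the other accesses a column."""

    def __init__(self, size=8):
        self.size = size
        self.cells = [[0] * size for _ in range(size)]

    def _decode(self, address):
        # Assumed layout: first `size` addresses are rows, next `size` are columns.
        if not 0 <= address < 2 * self.size:
            raise IndexError("address outside the corner-turn region")
        return address >= self.size, address % self.size

    def write(self, address, words):
        by_column, index = self._decode(address)
        if len(words) != self.size:
            raise ValueError("expected one word per cell in the row or column")
        for i, word in enumerate(words):
            if by_column:
                self.cells[i][index] = word
            else:
                self.cells[index][i] = word

    def read(self, address):
        by_column, index = self._decode(address)
        if by_column:
            return [self.cells[i][index] for i in range(self.size)]
        return list(self.cells[index])


if __name__ == "__main__":
    memory = CornerTurnMemory(size=4)
    for row in range(4):
        memory.write(row, [10 * row + col for col in range(4)])
    print(memory.read(4 + 1))  # column 1, read through the second address set
```

Writing results by row and reading them back by column gives the transposed access pattern that a two-pass, row-then-column transform needs, without moving any data.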
In accordance with another aspect of the present invention, a scratch pad
memory having a width four times the data path of the processor is
provided. In addition, two sets of buffer registers, each set including
registers of the width of the data path, are provided as buffers between
the processor and the scratch pad memory. The buffer registers operate at
the clock rate of the processor, while the scratch pad memory can operate
at a lower clock rate. In this manner, the bandwidths of the processor and
the scratch pad memory are matched without the use of expensive memory
circuitry. Each set of buffer registers is either loaded from, or stored
into, the scratch pad as one register having the width of the scratch
pad memory, but accessed by the processor individually as registers having
the width of the data path. In one set of the buffer registers, each
register is provided with two addresses. Using one address, the four data
words (each having the width of the data path) are stored into the
register in the order presented. Using the other address, prior to storing
into the buffer register, a transpose is performed on the four halfwords
of the higher order two data words. A similar transpose is performed on
the four halfwords of the lower order two data words. The latter mode,
together with the corner turn memory, allows pixels of a macroblock to be
read from, or stored into, the scratch pad memory either in row order or
in column order.
In accordance with another aspect of the present invention, the pixels of a
macroblock are stored in one of two arrangements in the external dynamic
random access memory. Under one arrangement, called the "scan-line" mode,
four horizontally adjacent pixels are accessed at a time. Under the other
arrangement, which is suitable for fetching reference pixels in motion
estimation, pixels are fetched in tiles (4 by 4 pixels) in column order. A
novel address generation scheme is provided to access the memory either
for scan-line elements or for quad pels. Since most filtering involves
quad pels (2×2 pixels), the quad pel mode arrangement is efficient
in access time and storage, and avoids rearrangement and complex address
decoding.
In accordance with another aspect of the present invention, the operand
input terminals of the arithmetic and logic unit in the processor are
provided a set of "byte multiple