WikiPatents - Community Patent Review
Structure and method for a multistandard video encoder/decoder    
United States Patent 5598514
Link to this page     http://www.wikipatents.com/5598514.html
Inventor(s)     Purcell; Stephen C. (Mountain View, CA); Le Gall; Didier J. (Los Altos, CA); Bose; Subroto (Santa Clara, CA)
Abstract     A structure and a format provide a video signal encoder under the MPEG (Moving Picture Experts Group) standard. In one embodiment, the video signal interface is provided with a decimator that filters the incoming signals. In one embodiment, the central processing unit (CPU) and multiple coprocessors implement the discrete cosine transform (DCT), the inverse discrete cosine transform (IDCT), and other signal processing functions, generate variable-length codes, and provide motion estimation and memory management. The instruction set of the central processing unit provides numerous features in support of functions such as alpha filtering, elimination of redundancies in video signals derived from motion pictures, and scene analysis. In one embodiment, a matcher computes 16 absolute differences to evaluate a "patch" of eight motion vectors at a time.
Owner/Assignee     C-Cube Microsystems (Milpitas)
Publication Date     January 28, 1997
Application Number     08/105,253
Filing Date     August 9, 1993
US Classification     345/418 345/474 345/501 715/719
Int'l Classification     G06T 001/00
Examiner     Jankus; Almis R.
Attorney/Law Firm     Kwok; Edward C.; Skjerven, Morrill, MacPherson, Franklin & Friel
USPTO Field of Search     395/118 395/114 395/128 395/133 395/152 395/153 395/154 358/133 358/134 358/135 358/136 358/137 358/138 358/140 358/141 358/142 382/232 382/233 382/234 382/235 382/236 382/237 382/238 382/239 382/240
References

U.S. References (patent number, inventor, U.S. class, date)
3252148
5335321, Harney, Aug 1994
5231484, Gonzales, 375/240.04, Jul 1993
5099322, Gove, 348/700, Mar 1992
5049993, LeGall, 348/448, Sep 1991
5014187, Debize, 710/66, May 1991
4973860, Ludwig, 327/145, Nov 1990
4935942, Hwang, 375/354, Jun 1990
4870563, Oguchi, 712/224, Sep 1989
4838685, Martinez, Jun 1989
4816914, Ericsson, 375/240.03, Mar 1989
4779190, O'Dell, 710/66, Oct 1988
4591976, Webber, 714/20, May 1986
4559608, Young, 708/702, Dec 1985
4514808, Murayama, 710/307, Apr 1985
4489395, Sato, 712/38, Dec 1984
3812467, Batcher, 712/300, May 1974
Claims
We claim:

1. A structure for encoding digitized video signals representing a series of frames of images, said digitized video signals being stored in an external memory system, said structure comprising:

first and second video ports, each video port being configurable to be either an input port or an output port for video signals;

a host bus interface circuit for interfacing with an external host computer;

a scratch-pad memory for storing a portion of said series of frames of images;

a processor for arithmetic and logic operations, said processor being for computing coefficients of a discrete cosine transform of said portion of said series of frames of images and for applying a quantization step to said coefficients to obtain quantized coefficients under a lossy compression algorithm;

a motion estimation unit for matching objects in motion between said frames of images, said motion estimation unit providing as data output motion vectors representing said motion of said objects in motion between said frames of images;

a variable-length coding unit for applying an entropy coding scheme on said quantized coefficients and said motion vectors to represent said video signals;

a global bus accessible by said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit, said global bus providing data transfer among said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit;

a processor bus having a higher bandwidth than said global bus for providing data transfer among said processor, said scratch-pad memory, and said variable-length coding unit; and

a memory controller for (a) controlling data transfers between said external memory and said structure, and (b) for controlling the uses of said global bus and said processor bus.

2. A structure as in claim 1, wherein said processor comprises:

an instruction memory for storing instructions executable by said processor;

a register file including a predetermined number of registers for storing operands;

an arithmetic and logic unit for providing arithmetic and logic operations for operands in said register file; and

a multiplication unit for performing multiplication operations among said operands and a result of said arithmetic and logic operations.

3. A structure as in claim 1, for encoding digitized video signals representing a series of frames of images, said digitized video signals being stored in an external memory system, said structure comprising:

first and second video ports, each video port being configurable to be either an input port or an output port for video signals;

a host bus interface circuit for interfacing with an external host computer;

a scratch-pad memory for storing a portion of said series of frames of images;

a processor for arithmetic and logic operations, said processor being for computing coefficients of a discrete cosine transform of said portion of said series of frames of images and for applying a quantization step to said coefficients to obtain quantized coefficients under a lossy compression algorithm;

a motion estimation unit for matching objects in motion between said frames of images, said motion estimation unit providing as data output motion vectors representing said motion of said objects in motion between said frames of images;

a variable-length coding unit for applying an entropy coding scheme on said quantized coefficients and said motion vectors to represent said video signals;

a global bus accessible by said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit, said global bus providing data transfer among said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit;

a processor bus having a higher bandwidth than said global bus for providing data transfer among said processor, said scratch-pad memory, and said variable-length coding unit; and

a memory controller for (a) controlling data transfers between said external memory and said structure, and (b) for controlling the uses of said global bus and said processor bus;

wherein said motion estimation unit comprises:

a window memory for storing a second portion of said series of frames of images, said second portion being a subset of said portion of said series of frames of images stored in said scratch-pad memory, said second portion of said series of frames of images including video data from a current frame and video data from a reference frame; and

a matcher for matching said video data from said current frame and said video data from said reference frame to evaluate a predetermined number of motion vectors.

4. A structure for encoding digitized video signals representing a series of frames of images, said digitized video signals being stored in an external memory system, said structure comprising:

first and second video ports, each video port being configurable to be either an input port or an output port for video signals;

a host bus interface circuit for interfacing with an external host computer;

a scratch-pad memory for storing a portion of said series of frames of images;

a processor for arithmetic and logic operations, said processor being for computing coefficients of a discrete cosine transform of said portion of said series of frames of images and for applying a quantization step to said coefficients to obtain quantized coefficients under a lossy compression algorithm;

a motion estimation unit for matching objects in motion between said frames of images, said motion estimation unit providing as data output motion vectors representing said motion of said objects in motion between said frames of images;

a variable-length coding unit for applying an entropy coding scheme on said quantized coefficients and said motion vectors to represent said video signals;

a global bus accessible by said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit, said global bus providing data transfer among said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit;

a processor bus having a higher bandwidth than said global bus for providing data transfer among said processor, said scratch-pad memory, and said variable-length coding unit; and

a memory controller for (a) controlling data transfers between said external memory and said structure, and (b) for controlling the uses of said global bus and said processor bus; wherein said first video port comprises a decimation filter for reducing the resolution of said video signals.

5. A system comprising first and second structures, each structure encoding digitized video signals representing a series of frames of images, said digitized video signals being stored in an external memory system, each structure comprising:

first and second video ports, each video port being configurable to be either an input port or an output port for video signals;

a host bus interface circuit for interfacing with an external host computer;

a scratch-pad memory for storing a portion of said series of frames of images;

a processor for arithmetic and logic operations, said processor being for computing coefficients of a discrete cosine transform of said portion of said series of frames of images and for applying a quantization step to said coefficients to obtain quantized coefficients under a lossy compression algorithm;

a motion estimation unit for matching objects in motion between said frames of images, said motion estimation unit providing as data output motion vectors representing said motion of said objects in motion between said frames of images;

a variable-length coding unit for applying an entropy coding scheme on said quantized coefficients and said motion vectors to represent said video signals;

a global bus accessible by said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit, said global bus providing data transfer among said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit;

a processor bus having a higher bandwidth than said global bus for providing data transfer among said processor, said scratch-pad memory, and said variable-length coding unit; and

a memory controller for (a) controlling data transfers between said external memory and said structure, and (b) for controlling the uses of said global bus and said processor bus;

wherein said first video port of said first structure and said first video port of said second structure are connected to receive said video signals, and said second video port of said first structure and said second video port of said second structure are connected to pass said video data between said first structure and said second structure.

6. A method for encoding digitized video signals representing a series of frames of images, said digitized video signals being stored in an external memory system, said method comprising the steps of:

providing first and second video ports, each video port being configurable to be either an input port or an output port for video signals;

using a host bus interface circuit to interface with an external host computer;

storing a portion of said series of frames of images in a scratch-pad memory;

providing a processor for arithmetic and logic operations, said processor being for computing coefficients of a discrete cosine transform of said portion of said series of frames of images and for applying a quantization step to said coefficients to obtain quantized coefficients under a lossy compression algorithm;

matching objects in motion between said frames of images using a motion estimation unit, said motion estimation unit providing as data output motion vectors representing said motion of said objects in motion between said frames of images;

applying in a variable-length coding unit an entropy coding scheme on said quantized coefficients and said motion vectors to represent said video signals;

providing a global bus accessible by said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit, said global bus providing data transfer among said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit;

providing a processor bus having a higher bandwidth than said global bus for providing data transfer among said processor, said scratch-pad memory, and said variable-length coding unit; and

providing a memory controller for (a) controlling data transfers between said external memory and said structure, and (b) for controlling the uses of said global bus and said processor bus.

7. A method as in claim 6, wherein said step of providing a processor comprises the steps of:

storing instructions executable by said processor in an instruction memory;

storing operands in a register file including a predetermined number of registers;

providing arithmetic and logic operations in an arithmetic and logic unit for operands in said register file; and

performing, in a multiplication unit, multiplication operations among said operands and a result of said arithmetic and logic operations.

8. A method for encoding digitized video signals representing a series of frames of images, said digitized video signals being stored in an external memory system, said method comprising the steps of:

providing first and second video ports, each video port being configurable to be either an input port or an output port for video signals;

using a host bus interface circuit to interface with an external host computer;

storing a portion of said series of frames of images in a scratch-pad memory;

providing a processor for arithmetic and logic operations, said processor being for computing coefficients of a discrete cosine transform of said portion of said series of frames of images and for applying a quantization step to said coefficients to obtain quantized coefficients under a lossy compression algorithm;

matching objects in motion between said frames of images using a motion estimation unit, said motion estimation unit providing as data output motion vectors representing said motion of said objects in motion between said frames of images;

applying in a variable-length coding unit an entropy coding scheme on said quantized coefficients and said motion vectors to represent said video signals;

providing a global bus accessible by said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit, said global bus providing data transfer among said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit;

providing a processor bus having a higher bandwidth than said global bus for providing data transfer among said processor, said scratch-pad memory, and said variable-length coding unit; and

providing a memory controller for (a) controlling data transfers between said external memory and said structure, and (b) for controlling the uses of said global bus and said processor bus;

wherein said matching step comprises the steps of:

storing in a window memory a second portion of said series of frames of images, said second portion being a subset of said portion of said series of frames of images stored in said scratch-pad memory, said second portion of said series of frames of images including video data from a current frame and video data from a reference frame; and

matching in a matcher said video data from said current frame and said video data from said reference frame to evaluate a predetermined number of motion vectors.

9. A method for encoding digitized video signals representing a series of frames of images, said digitized video signals being stored in an external memory system, said method comprising the steps of:

providing first and second video ports, each video port being configurable to be either an input port or an output port for video signals;

using a host bus interface circuit to interface with an external host computer;

storing a portion of said series of frames of images in a scratch-pad memory;

providing a processor for arithmetic and logic operations, said processor being for computing coefficients of a discrete cosine transform of said portion of said series of frames of images and for applying a quantization step to said coefficients to obtain quantized coefficients under a lossy compression algorithm;

matching objects in motion between said frames of images using a motion estimation unit, said motion estimation unit providing as data output motion vectors representing said motion of said objects in motion between said frames of images;

applying in a variable-length coding unit an entropy coding scheme on said quantized coefficients and said motion vectors to represent said video signals;

providing a global bus accessible by said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit, said global bus providing data transfer among said first and second video ports, said host bus interface, said scratch-pad memory, said processor, said motion estimation unit, and said variable-length coding unit;

providing a processor bus having a higher bandwidth than said global bus for providing data transfer among said processor, said scratch-pad memory, and said variable-length coding unit; and

providing a memory controller for (a) controlling data transfers between said external memory and said structure, and (b) for controlling the uses of said global bus and said processor bus;

wherein, in said step of providing said video ports, said first video port comprises a decimation filter for reducing the resolution of said video signals.
Description
BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to integrated circuit designs and, in particular, to integrated circuit designs for image processing.

2. Discussion of the Related Art

The Moving Picture Experts Group (MPEG) is an international committee charged with providing a standard (hereinbelow "MPEG standard") for achieving compatibility between image compression and decompression equipment. This standard specifies both the coded digital representation of video signals for storage media and the method for decoding. The representation supports normal-speed playback, other playback modes of color motion pictures, and reproduction of still pictures. The MPEG standard covers the common 525- and 625-line television formats as well as personal computer and workstation display formats. The MPEG standard is intended for equipment supporting a continuous transfer rate of up to 1.5 Mbit/s, such as compact disks, digital audio tapes, or magnetic hard disks. The MPEG standard is intended to support picture frames of approximately 288×352 pixels each at a rate between 24 Hz and 30 Hz. A publication by MPEG entitled "Coding for Moving Pictures and Associated Audio for digital storage medium at 1.5 Mbit/s," included herein as Appendix A, provides the proposed MPEG standard in draft form and is hereby incorporated by reference in its entirety to provide detailed information about the MPEG standard.
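The following Python sketch makes the required compression ratio concrete by computing the uncompressed bit rate of the picture format mentioned above. It assumes 8-bit samples and 4:2:0 chroma sampling (one of the representations described in the next paragraph); `raw_bitrate` is a hypothetical helper for illustration only.

```python
def raw_bitrate(width, height, fps, chroma="4:2:0", bits_per_sample=8):
    """Uncompressed video bit rate in bits per second (illustrative helper)."""
    luma_samples = width * height
    if chroma == "4:2:0":
        # U and V are each subsampled 2:1 horizontally and vertically.
        chroma_samples = 2 * (width // 2) * (height // 2)
    elif chroma == "4:2:2":
        # U and V are each subsampled 2:1 horizontally only.
        chroma_samples = 2 * (width // 2) * height
    else:
        raise ValueError("unsupported chroma format: " + chroma)
    return (luma_samples + chroma_samples) * bits_per_sample * fps


if __name__ == "__main__":
    rate = raw_bitrate(352, 288, 30)
    print(f"raw: {rate / 1e6:.1f} Mbit/s, ratio to 1.5 Mbit/s: {rate / 1.5e6:.1f}:1")
```

Under these assumptions, 352×288 pictures at 30 Hz amount to about 36.5 Mbit/s uncompressed, so reaching 1.5 Mbit/s calls for roughly 24:1 compression.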

Under the MPEG standard, the picture is divided into a matrix of "macroblock slices" (MBS), each MBS containing a number of picture areas (called "macroblocks"), each covering an area of 16×16 pixels. Each of these picture areas is further represented by one or more 8×8 matrices whose elements are the spatial luminance and chrominance values. In one representation (4:2:2) of the macroblock, a luminance value (Y type) is provided for every pixel in the 16×16-pixel picture area (i.e., in four 8×8 "Y" matrices), and chrominance values of the U and V (i.e., blue and red chrominance) types, each covering the same 16×16 picture area, are provided in two 8×8 "U" matrices and two 8×8 "V" matrices, respectively. That is, each 8×8 U or V matrix has a lower resolution than its luminance counterpart and covers an area of 8×16 pixels. In another representation (4:2:0), a luminance value is provided for every pixel in the 16×16-pixel picture area, and one 8×8 matrix for each of the U and V types represents the chrominance values of the 16×16-pixel picture area. A group of four contiguous pixels in a 2×2 configuration is called a "quad pixel"; hence, the macroblock can also be thought of as comprising 64 quad pixels in an 8×8 configuration.
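The macroblock layout described above can be illustrated with a short Python sketch. It assumes a macroblock is held as a 16×16 list of lists of sample values; the function names and the block ordering (top-left, top-right, bottom-left, bottom-right) are hypothetical choices made for illustration.

```python
def split_luma(mb):
    """Split a 16x16 luminance macroblock into four 8x8 Y blocks.

    Order: top-left, top-right, bottom-left, bottom-right.
    """
    return [[row[c:c + 8] for row in mb[r:r + 8]] for r in (0, 8) for c in (0, 8)]


def quad_pixels(mb):
    """Group a 16x16 macroblock into an 8x8 grid of 2x2 quad pixels.

    Each quad pixel is returned as (top-left, top-right, bottom-left, bottom-right).
    """
    return [
        [(mb[2 * i][2 * j], mb[2 * i][2 * j + 1], mb[2 * i + 1][2 * j], mb[2 * i + 1][2 * j + 1])
         for j in range(8)]
        for i in range(8)
    ]


# Number of 8x8 matrices per macroblock: four Y blocks plus the U and V blocks.
BLOCKS_PER_MACROBLOCK = {"4:2:2": 4 + 2 + 2, "4:2:0": 4 + 1 + 1}
```

A 4:2:2 macroblock is thus carried in eight 8×8 matrices and a 4:2:0 macroblock in six.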

The MPEG standard adopts a model of compression and decompression based on lossy compression of both interframe and intraframe information. To compress interframe information, each frame is encoded in one of three formats: "intra", "predicted", or "interpolated". Intra-encoded frames ("I-pictures") are provided least frequently, predicted frames are provided more frequently than intra frames, and all remaining frames are interpolated frames. In a predicted frame ("P-picture"), only the incremental changes in pixel values from the last I-picture or P-picture are coded. In an interpolated frame ("B-picture"), the pixel values are encoded with respect to both an earlier frame and a later frame. By encoding frames incrementally, using predicted and interpolated frames, the redundancy between frames can be eliminated, resulting in highly efficient data storage. Under the MPEG standard, the motion of an object moving from one screen position to another can be represented by motion vectors. A motion vector provides a shorthand for encoding a spatial translation of a group of pixels, typically a macroblock.
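The block-matching principle behind motion vectors can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between a block of the current frame and displaced blocks of a reference frame. This is a generic illustration, not the matcher of this patent: the block size, the search range, and the sign convention (the vector points from the current block to its best match in the reference frame) are assumptions, and a hardware matcher would compute many absolute differences in parallel rather than in nested loops.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=16, search=7):
    """Exhaustive block matching over a +/- `search` window.

    Returns (dx, dy, best_sad), or None if no candidate fits inside the frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would extend outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= height and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

If the current frame is the reference frame shifted by a whole number of pixels within the search window, the search returns that shift with a SAD of zero.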

The next steps in compression under the MPEG standard provide lossy compression of intraframe information. In the first step, a 2-dimensional discrete cosine transform (DCT) is performed on each of the 8×8 pixel matrices to map the spatial luminance or chrominance values into the frequency domain.
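For reference, the 8×8 two-dimensional DCT used in this step can be written directly from its orthonormal DCT-II definition, as in the Python sketch below. It evaluates the formula term by term (O(N^4) work per block) purely for clarity; practical encoders, hardware ones included, use fast factorizations, and the names `dct_2d` and `idct_2d` are hypothetical.

```python
import math

N = 8


def _c(k):
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(0.5) if k == 0 else 1.0


def dct_2d(block):
    """Forward 8x8 DCT: spatial samples -> frequency coefficients."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * _c(u) * _c(v) * s
    return out


def idct_2d(coeffs):
    """Inverse 8x8 DCT: frequency coefficients -> spatial samples."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (_c(u) * _c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = 0.25 * s
    return out
```

A flat block maps to a single DC coefficient (for an all-10 block, F[0][0] = 80 and every AC coefficient is zero), and `idct_2d` recovers the original block up to floating-point error.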

Next, a process called "quantization" weights each element of the 8×8 transformed matrix, which consists of one "DC" value and sixty-three "AC" values, according to whether the pixel matrix is of the chrominance or the luminance type and according to the frequency represented by each element. In an I-picture, the quantization weights are intended to reduce to zero many high-frequency components to which the human eye is not sensitive. In P- and B-pictures, which contain mostly higher-frequency components, the weights are not related to visual perception. Having created many zero elements in the 8×8 transformed matrix, each matrix can be represented without further information loss as an ordered list consisting of the "DC" value and alternating pairs of a non-zero "AC" value and a length of zero elements following the non-zero value. The values on the list are ordered as if the matrix were read in a zigzag manner (i.e., the elements of a matrix A are read in the order A00, A01, A10, A20, A11, A02, etc.). This representation is space-efficient because zero elements are not represented individually.
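The quantization, zigzag scan, and run-length steps can be sketched in Python as follows. The uniform step matrix and the round-half-away-from-zero rule are simplifying assumptions, not the MPEG quantization rules. This sketch also pairs each non-zero AC value with the run of zeros that precedes it, the run/level convention used in MPEG and JPEG bitstreams, which handles zeros that occur before the first non-zero AC value; it therefore differs slightly from the wording above. Trailing zeros are left implicit, as an end-of-block code would signal them.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n matrix in zigzag scan order."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Within each anti-diagonal s = i + j the direction alternates:
    # odd diagonals run with the row index increasing, even ones with it decreasing.
    return sorted(cells, key=lambda ij: (ij[0] + ij[1], ij[0] if (ij[0] + ij[1]) % 2 else ij[1]))


def quantize(coeffs, step):
    """Divide each coefficient by its step size, rounding half away from zero."""
    def q(c, s):
        return int(c / s + (0.5 if c >= 0 else -0.5))
    return [[q(coeffs[i][j], step[i][j]) for j in range(len(coeffs[0]))]
            for i in range(len(coeffs))]


def run_level(qblock):
    """Return (DC value, [(zero_run, level), ...]) for a quantized block."""
    scan = [qblock[i][j] for i, j in zigzag_order(len(qblock))]
    dc, pairs, run = scan[0], [], 0
    for value in scan[1:]:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return dc, pairs
```

For example, a block whose only non-zero coefficients are 80 at A00, -12 at A01, and 6 at A20, quantized with a uniform step of 4, yields the DC value 20 and the pairs (0, -3) and (1, 2).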

Finally, an entropy encoding scheme is used to further compress, using variable-length codes, the representations of the DC coefficient and the AC value-run length pairs. Under the entropy encoding scheme, the more frequently occurring symbols are represented by shorter codes. Further efficiency in storage is thereby achieved.
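The principle that frequent symbols receive shorter codes can be illustrated with a Huffman code built from symbol counts. MPEG itself specifies fixed variable-length code tables rather than building a code for each picture, so the Python sketch below, with the hypothetical helper `huffman_code` and made-up symbol counts, only demonstrates the principle.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free code in which frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:
        (only,) = freq
        return {only: "0"}
    tie = count()  # tie-breaker so the heap never compares the dictionaries
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

With sample counts of 8 for an end-of-block symbol, 4 for the run/level pair (0, 1), 2 for (1, 1), and 1 each for (0, 2) and (2, 1), the resulting codewords are 1, 2, 3, 4, and 4 bits long, respectively.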

The steps involved in compression under the MPEG standard are computationally intensive. For such a compression scheme to be practical and widely accepted, however, a high-speed processor at an economical cost is desired. Such a processor is preferably provided in an integrated circuit.
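A back-of-the-envelope count shows how quickly the work grows. The Python sketch below counts the absolute differences needed for exhaustive full-pel motion estimation on luminance only; the ±16-pixel search range and the exhaustive search strategy are assumptions chosen for illustration, and `sad_ops_per_second` is a hypothetical helper.

```python
def sad_ops_per_second(width, height, fps, search_range, block=16):
    """Absolute differences per second for exhaustive full-pel block matching."""
    macroblocks_per_frame = (width // block) * (height // block)
    candidates = (2 * search_range + 1) ** 2  # positions in a +/- search_range window
    return macroblocks_per_frame * fps * candidates * block * block
```

For 352×288 pictures at 30 Hz (396 macroblocks per picture) and a ±16-pixel search, this comes to about 3.3 billion absolute differences per second, before counting the accumulations, the DCT, quantization, and variable-length coding.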

Other standards for image processing exist. These standards include JPEG ("Joint Photographic Experts Group") and CCITT H.261 (also known as "P×64"). These standards are available from the respective committees, which are international bodies well known to those skilled in the art.

SUMMARY OF THE INVENTION

In accordance with the present invention, a structure and a method for encoding digitized video signals are provided. In one embodiment, the video signals are stored in an external memory system, and the present embodiment provides (a) two video ports each configurable to become either an input port or an output port for video signals; (b) a host bus interface circuit for interfacing with an external host computer; (c) a scratch-pad memory for storing a portion of the video image; (d) a processor for arithmetic and logic operations, which computes discrete cosine transforms and quantization on the video signals to obtain coefficients for compression under a lossy compression algorithm; (e) a motion estimation unit for matching objects in motion between frames of images of the video signals, and outputting motion vectors representing the motion of objects between frames; and (f) a variable-length coding unit for applying an entropy coding scheme on the quantized coefficients and motion vectors.

In one embodiment, a global bus is provided to be accessed by the video ports, the host bus interface, the scratch-pad memory, the processor, the motion estimation unit, and the variable-length coding unit. The global bus provides data transfer among these functional units. In addition, in that embodiment, a processor bus having a higher bandwidth than the global bus is provided to allow higher-bandwidth data transfer among the processor, the scratch-pad memory, and the variable-length coding unit. A memory controller controls data transfers to and from the external memory and, at the same time, arbitrates the use of the global bus and the processor bus.

Multiple copies of the structure of the present invention can be provided to form a multiprocessor for video signals. In such a configuration, one of the video ports in each structure would be used to receive the incoming video signal, and the other video port would be used for communication between the structure and one or more of its neighboring structures.

In accordance with another aspect of the present invention, one of the two video ports in one embodiment comprises a decimation filter for reducing the resolution of incoming video signals. In one embodiment, one of the video ports includes an interpolator for restoring the reduced-resolution video to a higher resolution upon video signal output.
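As a minimal sketch of what a decimation filter and a matching interpolator might do on one line of samples, the Python code below halves the horizontal resolution with a [1, 2, 1]/4 low-pass filter and restores it by linear interpolation. The filter taps, the 2:1 ratio, and the edge handling are assumptions for illustration; the text above does not specify the filters used in the video ports.

```python
def decimate_2to1(row):
    """Halve horizontal resolution with a [1, 2, 1] / 4 low-pass filter (edges clamped)."""
    n = len(row)
    out = []
    for k in range(0, n, 2):
        left = row[max(k - 1, 0)]
        right = row[min(k + 1, n - 1)]
        out.append((left + 2 * row[k] + right + 2) // 4)  # +2 rounds to nearest
    return out


def interpolate_1to2(row):
    """Double horizontal resolution by linear interpolation (last sample repeated)."""
    n = len(row)
    out = []
    for k in range(n):
        nxt = row[min(k + 1, n - 1)]
        out.append(row[k])
        out.append((row[k] + nxt + 1) // 2)
    return out
```

Filtering before subsampling suppresses aliasing; the interpolator restores the sample count but not the detail removed by the filter.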

In accordance with another aspect of the present invention, a memory with a novel addressing mechanism is provided to sort video signals, which arrive at the structure of the present invention in pixel-interleaved order, into several regions of the memory, such that the data in those regions can be read in block-interleaved order. Block-interleaved order is the order used by subsequent signal processing steps under various video processing standards, including MPEG.
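The following Python sketch illustrates the kind of reordering described above. It assumes a 4:2:2 input stream in the pixel-interleaved order Cb, Y, Cr, Y, ..., and region dimensions that are multiples of 8; `demux_cbycry`, `block_address`, and `sort_to_blocks` are hypothetical names, and the address mapping is one simple choice rather than the patented mechanism.

```python
def demux_cbycry(stream):
    """Split a pixel-interleaved 4:2:2 stream (Cb, Y, Cr, Y, ...) into Y, Cb, Cr regions."""
    y = stream[1::2]
    cb = stream[0::4]
    cr = stream[2::4]
    return y, cb, cr


def block_address(x, y, width):
    """Address of sample (x, y) when a region is laid out as consecutive 8x8 blocks."""
    blocks_per_row = width // 8
    block_index = (y // 8) * blocks_per_row + (x // 8)
    return block_index * 64 + (y % 8) * 8 + (x % 8)


def sort_to_blocks(raster, width):
    """Write raster-order samples so that a sequential read yields 8x8 blocks."""
    memory = [0] * len(raster)
    for index, value in enumerate(raster):
        memory[block_address(index % width, index // width, width)] = value
    return memory
```

Reading the sorted region sequentially then delivers one complete 8×8 block after another, which is the order the DCT stage consumes.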

In accordance with another aspect of the present invention, a synchronizer circuit synchronizes the system clock of one embodiment with an external video clock to which the incoming video signals are synchronized. The synchronizer circuit accurately detects an edge transition in the external clock within a time period comparable to a flip-flop's metastable period, without requiring an extension of the system clock period.

In one embodiment of the present invention, a "corner turn" memory is provided. In this corner-turn memory, a selected region is mapped to two sets of addresses. Using an address in the first set, a row of memory cells is accessed; using an address in the second set, a column of memory cells is accessed. The corner-turn memory is particularly useful for DCT and IDCT operations, in which each macroblock of pixels is accessed in two passes, one in column order and the other in row order.
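A corner-turn memory can be modeled in software as an array that responds to two address ranges. In the Python sketch below, which is a hypothetical model rather than the circuit itself, addresses 0 to N-1 select a row and addresses N to 2N-1 select a column of the same cells, so data written row by row can be read back column by column, as the row and column passes of a separable DCT or IDCT require.

```python
class CornerTurnMemory:
    """Hypothetical model of an N x N corner-turn memory.

    Addresses 0..N-1 access a row; addresses N..2N-1 access a column.
    """

    def __init__(self, n=8):
        self.n = n
        self.cells = [[0] * n for _ in range(n)]

    def write(self, addr, values):
        if addr < self.n:
            self.cells[addr] = list(values)
        else:
            col = addr - self.n
            for r, v in enumerate(values):
                self.cells[r][col] = v

    def read(self, addr):
        if addr < self.n:
            return list(self.cells[addr])
        col = addr - self.n
        return [self.cells[r][col] for r in range(self.n)]
```

Writing the results of a row pass through the row addresses and then reading through the column addresses performs the transpose without any explicit data movement.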

In accordance with another aspect of the present invention, a scratch-pad memory having a width four times that of the processor's data path is provided. In addition, two sets of buffer registers, each set including registers of the width of the data path, are provided as buffers between the processor and the scratch-pad memory. The buffer registers operate at the clock rate of the processor, while the scratch-pad memory can operate at a lower clock rate. In this manner, the bandwidths of the processor and the scratch-pad memory are matched without the use of expensive memory circuitry. Each set of buffer registers is either loaded from, or stored into, the scratch-pad memory as a single register having the width of the scratch-pad memory, but is accessed by the processor as individual registers having the width of the data path. In one set of the buffer registers, each register is provided with two addresses. Using one address, the four data words (each having the width of the data path) are stored into the register in the order presented. Using the other address, a transpose is performed, prior to storing into the buffer register, on the four halfwords of the higher-order two data words; a similar transpose is performed on the four halfwords of the lower-order two data words. The latter mode, together with the corner-turn memory, allows pixels of a macroblock to be read from, or stored into, the scratch-pad memory in either row order or column order.
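The halfword transpose can be illustrated with bit operations. The Python sketch below assumes a 32-bit processor data path, 16-bit halfwords, and therefore a 128-bit scratch-pad line; these widths, the word ordering (highest word first), and the helper names are assumptions, since the text above does not state them. Each pair of 32-bit words is treated as a 2×2 matrix of halfwords and transposed.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def transpose_halfwords(w_hi, w_lo):
    """Treat two 32-bit words as a 2x2 matrix of 16-bit halfwords and transpose it."""
    a, b = (w_hi >> 16) & MASK16, w_hi & MASK16
    c, d = (w_lo >> 16) & MASK16, w_lo & MASK16
    # [[a, b], [c, d]] -> [[a, c], [b, d]]
    return (a << 16) | c, (b << 16) | d


def load_buffer(line128, transpose=False):
    """Split a 128-bit scratch-pad line into four 32-bit words (highest first).

    With transpose=True, the higher-order pair and the lower-order pair of words
    are each transposed as 2x2 halfword matrices.
    """
    words = [(line128 >> shift) & MASK32 for shift in (96, 64, 32, 0)]
    if transpose:
        words[0], words[1] = transpose_halfwords(words[0], words[1])
        words[2], words[3] = transpose_halfwords(words[2], words[3])
    return words
```

Applying the transpose twice restores the original words, so the same mechanism serves both directions of transfer.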

In accordance with another aspect of the present invention, the pixels of a macroblock are stored in one of two arrangements in the external dynamic random access memory. Under one arrangement, called the "scan-line" mode, four horizontally adjacent pixels are accessed at a time. Under the other arrangement, which is suitable for fetching reference pixels in motion estimation, pixels are fetched in tiles of 4 by 4 pixels in column order. A novel address generation scheme is provided to access the memory either for scan-line elements or for quad pels. Since most filtering involves quad pels (2×2 pixels), the quad-pel arrangement is efficient in access time and storage, and avoids rearrangement and complex address decoding.
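The two access patterns can be contrasted with a pair of address functions. The Python sketch below assumes 8-bit pixels packed four to a 32-bit word; in the hypothetical quad-pel layout each word holds one 2×2 quad pel, a simplification that does not model the 4×4 tiles mentioned above. Both functions return a (word address, byte lane) pair, and the names are illustrative only.

```python
def scanline_address(x, y, width):
    """Scan-line mode: four horizontally adjacent pixels share one word."""
    index = y * width + x
    return index // 4, index % 4


def quadpel_address(x, y, width):
    """Hypothetical quad-pel mode: each word holds one 2x2 quad pel."""
    quads_per_row = width // 2
    word = (y // 2) * quads_per_row + (x // 2)
    lane = (y % 2) * 2 + (x % 2)
    return word, lane
```

In the quad-pel layout, a 2×2 neighborhood of the kind used by half-pixel filtering is fetched with a single word access, whereas the scan-line layout needs two.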

In accordance with another aspect of the present invention, the operand input terminals of the arithmetic and logic unit in the processor are provided with a set of "byte multiple