WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Motion vector encoding circuit and method thereof    
United States Patent5608656   
Link to this pagehttp://www.wikipatents.com/5608656.html
Inventor(s)Purcell; Stephen C. (Mountain View, CA); Bose; Subroto (Santa Clara, CA)
AbstractA structure and a format for providing a video signal encoder under the MPEG standard are provided. In one embodiment, the video signal interface is provided with a decimator for providing input filtering for the incoming signals. In one embodiment, the central processing unit (CPU) and multiple coprocessors implements DCT and IDCT and other signal processing functions, generating variable length codes, and provides motion estimation and memory management. The instruction set of the central processing unit provides numerous features in support for such features as alpha filtering, eliminating redundancies in video signals derived from motion pictures and scene analysis. In one embodiment, a matcher evaluates 16 absolute differences to evaluate a "patch" of eight motion vectors at a time.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5608656
Motion vector encoding circuit and method thereof - US Patent 5608656 Drawing
Motion vector encoding circuit and method thereof
Inventor     Purcell; Stephen C. (Mountain View, CA); Bose; Subroto (Santa Clara, CA)
Owner/Assignee     C-Cube Microsystems, Inc. (Milpitas, CA)
Patent assignment
All assignments
Publication Date     March 4, 1997
Application Number     08/520,318
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     August 28, 1995
US Classification     708/203
Int'l Classification     G06F 017/00
Examiner     Ramirez; Ellis B.
Assistant Examiner    
Attorney/Law Firm     Friel, Kwok; Edward C. Skjerven, Morrill, MacPherson, Franklin &
Address
Parent Case     This application is a division of application Ser. No. 08/105,253, filed Aug. 9, 1993.
Priority Data    
USPTO Field of Search     364/514 R 364/715.02 348/402 348/404 348/413 348/414 348/699 348/700 348/394
Patent Tags     motion vector encoding circuit
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5488419
Hui
375/240.17
Jan,1996

[0 after 0 votes]
5475430
Hamada

Dec,1995

[0 after 0 votes]
5461423
Tsukagoshi
375/240.17
Oct,1995

[0 after 0 votes]
5408274
Chang
348/700
Apr,1995

[0 after 0 votes]
5400076
Iwamura
375/240.15
Mar,1995

[0 after 0 votes]
5317397
Odaka
375/240.15
May,1994

[0 after 0 votes]
5247586
Gobert
382/278
Sep,1993

[0 after 0 votes]
5157732
Ishii
382/107
Oct,1992

[0 after 0 votes]
5134480
Wang
348/620
Jul,1992

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


We claim:

1. A structure for encoding by motion vectors a current frame of video data using a reference frame of video data, said current frame and said reference frame each being divided into multiple rows and multiple columns of macroblocks, said macroblocks of said current frame being designated "current macroblocks" and said macroblocks of said reference frame being designated "reference macroblocks", each macroblock representing a number of pixels in the corresponding frame, said structure comprising:

a memory circuit for storing (a) a plurality n of adjacent current macroblocks from a row j of current macroblocks in said current frame, beginning at column p of current macroblocks said current frame, said plurality of said adjacent current macroblocks being designated C.sub.j,p, C.sub.j,p+1, . . . , C.sub.j,p+n-1 in the order along one direction of said row of macroblocks; and (b) a plurality of adjacent reference macroblocks from a first column i of reference macroblocks in said reference frame, beginning at row q of reference macroblocks in said reference frame, said plurality of reference macroblocks being designated R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m-1,i, said plurality of adjacent reference macroblocks being reference macroblocks within the range of said motion vectors, each of said current macroblocks being substantially equidistance from said R.sub.q,i and R.sub.q+m-1,i reference macroblocks;

means for providing each current macroblock a set of scores, each score corresponding to one of said motion vectors and measuring a difference in pixel values between said current macroblock and a reference macroblock linked by said motion vector;

means for selecting, using said set of scores, a motion vector corresponding to the least of said differences measured; and

means for replacing in said memory circuit (a) the current macroblock C.sub.j,p with a current macroblock C.sub.j,p+n, said current macroblock C.sub.j,p+n being the current macroblock adjacent said macroblock C.sub.j,p+n-1 in said direction; and (b) said first column of adjacent reference macroblocks R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m-1,i with a second column of adjacent reference macroblocks R.sub.q,i+1, R.sub.q+1,i+1, . . . , R.sub.q+m-1,i+1, said second column being adjacent said first column in said direction.

2. A structure as in claim 1, wherein said memory circuit, instead of storing a row of current macroblocks and a column of reference macroblocks, stores a column of current macroblocks and a row of reference macroblocks.

3. A structure as in claim 1, further comprising a counter for controlling said structure, said means for providing each current macroblock a set of score providing said scores one score at a time, said counter including a first field and a second field representing respectively the current macroblock and the reference macroblock involved in the score currently being provided; wherein, at any given time, said first field takes, as its value, one of a first number of values, said first number of values equalling the number of current macroblocks in said memory circuit, and said second field takes, as its value, one of a second number of values, said second number of values equalling the number of reference macroblocks in said memory circuit.

4. A structure as in claim 3, wherein said first and second number of values being programmable.

5. A structure as in claim 3, further comprising a third and a fourth field representing, respectively, the row number j of current macroblocks in said current frame and the column number i of said reference macroblocks in said reference frame, wherein, at any given time, said third field takes, as its value, one of a third number of values, said third number of values equalling the number of rows in said current frame, and said fourth field takes, as its value, one of a fourth number of values, said fourth number of values equalling the number of columns in said reference frame.

6. A method for encoding by motion vectors a current frame of video data using a reference frame of video data, said current frame and said reference frame each being divided into multiple rows and multiple columns of macroblocks, said macroblocks of said current frame being designated "current macroblocks" and said macroblocks of said reference frame being designated "reference macroblocks", each macroblock representing a number of pixels in the corresponding frame, said method comprising the steps of:

storing in a memory circuit (a) a plurality n of adjacent current macroblocks from a row j of current macroblocks in said current frame, beginning at column p of current macroblocks in said current frame, said plurality of said adjacent current macroblocks being designated C.sub.j,p, C.sub.j,p+1, . . . , C.sub.j,p+n-1 in the order along one direction of said row of macroblocks; and (b) a plurality of adjacent reference macroblocks from a first column i of reference macroblocks in said reference frame, beginning at row q of reference macroblocks in said reference frame, said plurality of reference macroblocks being designated R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m-1,i, said plurality of adjacent reference macroblocks being reference macroblocks within the range of said motion vectors, each of said current macroblocks being substantially equidistance from said R.sub.q,i and R.sub.q+m-1,i reference macroblocks;

providing each current macroblock a set of scores, each score corresponding to one of said motion vectors and measuring a difference in pixel values between said current macroblock and a reference macroblock linked by said motion vector;

selecting, using said set of scores, a motion vector corresponding to the least of said differences measured; and

replacing in said memory circuit (a) the current macroblock C.sub.j,p with a current macroblock C.sub.j,p+n, said current macroblock C.sub.j,p+n being the current macroblock adjacent said macroblock C.sub.j,p+n-1 in said direction; and (b) said first column of adjacent reference macroblocks R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m-1,i with a second column of adjacent reference macroblocks R.sub.q,i+1, R.sub.q+1,i+1, . . . , R.sub.q+m-1,i+1, said second column being adjacent said first column in said direction.

7. A method as in claim 6, wherein said step of storing in a memory circuit, instead of storing a row of current macroblocks and a column of reference macroblocks, stores a column of current macroblocks and a row of reference macroblocks.

8. A method as in claim 6, further comprising the step of using a counter for controlling, said step of providing each macroblock a set of scores providing said scores one score at a time, said counter including a first field and a second field representing respectively the current macroblock and the reference macroblock involved in the score currently being provided; wherein, at any given time, said first field takes, as its value, one of a first number of values, said first number of values equalling the number of current macroblocks in said memory circuit, and said second field takes, as its value, one of a second number of values, said second number of values equalling the number of reference macroblocks in said memory circuit.

9. A method as in claim 8, wherein said first and second number of values being programmable.

10. A method as in claim 8, further comprising the step of allocating in said counter a third and a fourth field representing, respectively, the row number j of said current macroblocks in said current frame and the column number i of said reference macroblocks in said reference frame, wherein, at any given time, said third field takes, as its value, one of a third number of values, said third number of values equalling the number of rows in said current frame, and said fourth field takes, as its value, one of a fourth number of values, said fourth number of values equalling the number of columns in said reference frame.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to integrated circuit designs; and, in particular, the present invention relates to integrated circuit designs for image processing.

2. Discussion of the Related Art

The Motion Picture Experts Group (MPEG) is an international committee charged with providing a standard (hereinbelow "MPEG standard") for achieving compatibility between image compression and decompression equipment. This standard specifies both the coded digital representation of video signal for the storage media, and the method for decoding. The representation supports normal speed playback, as well as other playback modes of color motion pictures, and reproduction of still pictures. The MPEG standard covers the common 525- and 625-line television, personal computer and workstation display formats. The MPEG standard is intended for equipment supporting continuous transfer rate of up to 1.5 Mbits per second, such as compact disks, digital audio tapes, or magnetic hard disks. The MPEG standard is intended to support picture frames of approximately 288.times.352 pixels each at a rate between 24 Hz and 30 Hz. A publication by MPEG entitled "Coding for Moving Pictures and Associated Audio for digital storage medium at 1.5Mbit/s," included herein as Appendix A, provides in draft form the proposed MPEG standard, which is hereby incorporated by reference in its entirety to provide detailed information about the MPEG standard.

Under the MPEG standard, the picture is divided into a matrix of "Macroblock slices" (MBS), each MBS containing a number of picture areas (called "macroblocks") each covering an area of 16.times.16 pixels. Each of these picture areas is further represented by one or more 8.times.8 matrices which elements are the spatial luminance and chrominance values. In one representation (4:2:2) of the macroblock, a luminance value (Y type) is provided for every pixel in the 16.times.16-pixel picture area (i.e. in four 8.times.8 "Y" matrices), and chrominance values of the U and V (i.e., blue and red chrominance) types, each covering the same 16.times.16 picture area, are respectively provided in two 8.times.8 "U" and two 8.times.8 "V" matrices. That is, each 8.times.8 U or V matrix has a lower resolution than its luminance counterpart and covers an area of 8.times.16 pixels. In another representation (4:2:0), a luminance value is provided for every pixel in the 16.times.16 pixels picture area, and one 8.times.8 matrix for each of the U and V types is provided to represent the chrominance values of the 16.times.16-pixel picture area. A group of four contiguous pixels in a 2.times.2 configuration is called a "quad pixel"; hence, the macroblock can also be thought of as comprising 64 quad pixels in an 8.times.8 configuration.

The MPEG standard adopts a model of compression and decompression based on lossy compression of both interframe and intraframe information. To compress interframe information, each frame is encoded in one of the following formats: "intra", "predicted", or "interpolated". Intra encoded frames are least frequently provided, the predicted frames are provided more frequently than the intra frames, and all the remaining frames are interpolated frames. In a prediction frame ("P-picture"), only the incremental changes in pixel values from the last I-picture or P-picture are coded. In an interpolation frame ("B-picture"), the pixel values are encoded with respect to both an earlier frame and a later frame. By encoding frames incrementally, using predicted and interpolated frames, the redundancy between frames can be eliminated, resulting in a high efficiency in data storage. Under the MPEG, the motion of an object moving from one screen position to another screen position can be represented by motion vectors. A motion vector provides a shorthand for encoding a spatial translation of a group of pixels, typically a macroblock.

The next steps in compression under the MPEG standard provide lossy compression of intraframe information. In the first step, a 2-dimensional discrete cosine transform (DCT) is performed on each of the 8.times.8 pixel matrices to map the spatial luminance or chrominance values into the frequency domain.

Next, a process called "quantization" weights each element of the 8.times.8 transformed matrix, consisting of 1 "DC" value and sixty-three "AC" values, according to whether the pixel matrix is of the chrominance or the luminance type, and the frequency represented by each element of the transformed matrix. In an I-picture, the quantization weights are intended to reduce to zero many high frequency components to which the human eye is not sensitive. In P- and B- pictures, which contain mostly higher frequency components, the weights are not related to visual perception. Having created many zero elements in the 8.times.8 transformed matrix, each matrix can be represented without further information loss as an ordered list consisting of the "DC" value, and alternating pairs of a non-zero "AC" value and a length of zero elements following the non-zero value. The values on the list are ordered such that the elements of the matrix are presented as if the matrix is read in a zig.sub.-- zag manner (i.e., the elements of a matrix A are read in the order A00, A01, A10, A02, A11, A20 etc.). This representation is space efficient because zero elements are not represented individually.

Finally, an entropy encoding scheme is used to further compress, using variable-length codes, the representations of the DC coefficient and the AC value-run length pairs. Under the entropy encoding scheme, the more frequently occurring symbols are represented by shorter codes. Further efficiency in storage is thereby achieved.

The steps involved in compression under the MPEG standard are computationally intensive. For such a compression scheme to be practical and widely accepted, however, a high speed processor at an economical cost is desired. Such processor is preferably provided in an integrated circuit.

Other standards for image processing exist. These standards include JPEG ("Joint Photographic Expert Group") and CCITT H.261 (also known as "P.times.64"). These standards are available from the respective committees, which are international bodies well-known to those skilled in the art.

SUMMARY OF THE INVENTION

In accordance with the present invention, a structure and a method for encoding digitized video signals are provided. In one embodiment, the video signals are stored in an external memory system, and the present embodiment provides (a) two video ports each configurable to become either an input port or an output port for video signals; (b) a host bus interface circuit for interfacing with an external host computer; (c) a scratch-pad memory for storing a portion of the video image; (d) a processor for arithmetic and logic operations, which computes discrete cosine transforms and quantization on the video signals to obtain coefficients for compression under a lossy compression algorithm; (e) a motion estimation unit for matching objects in motion between frames of images of the video signals, and outputting motion vectors representing the motion of objects between frames; and (f) a variable-length coding unit for applying an entropy coding scheme on the quantized coefficients and motion vectors.

In one embodiment, a global bus is provided to be accessed by video ports, the host bus interface, the scratch-pad memory, the processor, the motion estimation unit, and the variable-length coding unit. The global bus provides data transfer among the functional units. In addition, in that embodiment, a processor bus having a higher bandwidth than the global bus is provided to allow higher band-width data transfer among the processor, the scratch-pad memory, and the variable-length coding units. A memory controller controls data transfers to and from the external memory while at the same time provides arbitration the uses of the global bus and the processor bus.

Multiple copies of the structure of the present invention can be provided to form a multiprocessor of video signals. Under such configuration, one of the video ports in each structure would be used to receive the incoming video signal, and the other video port would be used for communication between the structure and one or more of its neighboring structures.

In accordance with another aspect of the present invention, one of the two video port in one embodiment comprises a decimation filter for reducing the resolution of incoming video signals. In one embodiment, one of the video ports include an interpolator for restoring the reduced resolution video into a higher resolution upon video signal output.

In accordance with another aspect of the present invention, a memory with a novel address mechanism is provided to sort video signals arriving at the structure of the present invention in pixel interleaved order into several regions of the memory, such that the data in the several regions of this memory can be read in block interleaved order, which is used in subsequent signal processing steps used under various video processing standards, including MPEG.

In accordance with another aspect of the present invention, a synchronizer circuit synchronizes the system clock of one embodiment with an external video clock to which the incoming video signals are synchronized. The synchronization circuit provides for accurate detection of an edge transition in the external clock within a time period which is comparable with a flip-flop's metastable period, without requiring an extension of the system clock period.

In one embodiment of the present invention, a "corner turn" memory is provided. In this corner-turn memory, a selected region is mapped to two set of addresses. Using an address in the first set of addresses, a row of memory cells are accessed. Using an address in the second set of addresses, a column of memory cells are accessed. The corner-turn memory is particularly useful for DCT and IDCT operations where each macroblock of pixels are accessed in two passes, one pass in column order, and the other pass in row order.

In accordance with another aspect of the present invention, a scratch pad memory having a width four times the data path of the processor is provided. In addition, two set of buffer registers, each set including registers of the width of the data path, are provided as buffers between the processor and the scratch pad memory. The buffer registers operates at the clock rate of the processor, while the scratch pad memory can operate at a lower clock rate. In this manner, the bandwidths of the processor and the scratch pad memory are matched without the use of expensive memory circuitry. Each set of buffer registers are either loaded from, or stored into, the scratch pad as a one register having the width of the scratch pad memory, but accessed by the processor individually as registers having the width of the data path. In one set of the buffer registers, each register is provided with two addresses. Using one address, the four data words (each having the width of the data path) are stored into the register in the order presented. Using the other address, prior to storing into the buffer register, a transpose is performed on the four halfwords of the higher order two data words. A similar transpose is performed on the four halfwords of the lower order two data words. The latter mode, together with the corner turn memory allows pixels of a macroblock to be read from, or stored into, the scratch pad memory either in row order or in column order.

In accordance with another aspect of the present invention, the pixels of a macroblock are stored in one of two arrangements in the external dynamic random access memory. Under one arrangement, called the "scan-line" mode, four horizontally adjacent pixels are accessed at a time. Under the other arrangement, which is suitable for fetching reference pixels in motion estimation, pixels are fetched in tiles (4 by 4 pixels) in column order. A novel address generation scheme is provided to access either the memory for scan-line elements or for quad pels. Since most filtering involves quad pels (2.times.2 pixels), the quad pel mode arrangement is efficient in access time and storage, and avoids rearrangement and complex address decoding.

In accordance with another aspect of the present invention, the operand input terminals of the arithmetic and logic unit in the process is provided a set of "byte multiplexors" for rearranging the four 9-bit bytes in each operand in any order. Because each 9-bit byte can be used to store the value of a pixel, so that the arithmetic and logic unit can operate on the pixels in a quad pel stored in a 36-bit operand simultaneously, the byte multiplexor allows rearranging the relative positions of the pixels within the 36-bit operands, numerous filtering operations can be achieved by simply setting the correct pixel configuration. In one embodiment, in accordance with the present invention, filters for performing pixel offsets, decimations, in either horizontal or vertical directions, or both are provided using the byte multiplexor. In addition, the present invention provides higher compression ratios, using novel functions for (a) activities analysis, used in applying adaptive control of quantization, and (b) scene analysis, used in reduction of interframe redundancy.

In accordance with another aspect of the present invention, a fast detector of a zero result in an adder is provided. The fast zero detector includes a number of "zero generator" circuits and a number of zero propagator circuits. The fast detector signals the presence of a zero result within, as a function of the length of the adder's operands, logarithm time, rather than linear time.

In accordance with another aspect of the present invention, the present invention provides a structure and a method for a non-linear "alpha" filter. Under this non-linear filter, thresholds T.sub.1 and T.sub.2 are set by the two parameters m and n. If the absolute difference between the two input values of the non-linear filter are less than T.sub.1 or greater than T.sub.2, a fixed relative weight are accorded the input values, otherwise a relative weight proportional to the absolute difference is accorded the input values. This non-linear filter finds numerous application in signal processing. In one embodiment, the non-linear filter is used in deinterlacing and temporal noise reduction applications.

In accordance with another aspect of the present invention, a structure for performing motion estimation is provided, including: (a) a memory for storing said macroblocks of a current frame and macroblocks of a reference frame; (b) a filter receiving a first group of pixels from the memory for resampling; and (c) a matcher receiving the resampled first group of pixels and a second group of pixels from a current macroblock, for evaluation of a number of motion vectors. The matcher provides a score representing the difference between the second group of pixels and the first group of pixels for each of the motion vectors evaluated. In this embodiment, the best score over a macroblock is selected as the motion vector for the macroblock. In one embodiment, the matcher evaluates 8 motion vectors at a time using a 2.times.8 "slice" of current pixels and a 4.times.12 pixel reference area.

In accordance with another aspect of the present invention, a structure is provided for encoding by motion vectors a current frame of video data, using a reference frame of video data. The structure includes a memory circuit for storing (a) adjacent current macroblocks from a row j of current macroblocks, designated C.sub.j,p, C.sub.j,p+1, . . . , C.sub.j,p+n-1 in the order along one direction of the row of macroblocks; and (b) adjacent reference macroblocks from a first column i of reference macroblocks, designated R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m,1,i and a second column C.sub.j+1,p C.sub.p+1,p+1, . . . , C.sub.j+1,p+n+1. The adjacent reference macroblocks are reference macroblocks within the range of the motion vectors, with each of said current macroblocks being substantially equidistance from the R.sub.q,i and Rq+.sub.m-1,i reference macroblocks. The structure of the present invention evaluates each of the adjacent current macroblocks against each of the adjacent reference macroblocks under the motion vectors, so as to select a motion vector representing the best match between each of said current macroblock and a corresponding one of said reference macroblocks. When evaluation of the current macroblock against the set of reference frame macroblock in the memory circuit is completed, the current macroblock C.sub.j,p is remove from the memory circuit and replaced by a current macroblock C.sub.j,p+n, said current macroblock C.sub.j,p+n being the current macroblock adjacent said macroblock C.sub.j,p+n-1. At the same time, the column of adjacent reference macroblocks R.sub.q,i, R.sub.q+1,i, . . . , R.sub.q+m-1,i are removed from the memory circuit and replaced by the next column of adjacent reference macroblocks R.sub.q,i+1, R.sub.q+1,i+1, . . . , R.sub.q+m-1,i+1. In this manner, each current macroblock, while in memory, is evaluated against the largest number of reference macroblocks which can be held in the memory circuit, thereby minimizing the number of time current and reference macroblocks have to be loaded into memory. Of course, for purely convenience reasons, the terms "rows" and "columns" are used to describe the relationship between current and reference macroblocks. It is understood that a column of current macroblocks can be evaluated against a row of reference macroblock, within the scope of the present invention.

In accordance with the present invention, the control structure for controlling evaluation of motion vectors is provided by a counter which includes first and second fields representing respectively the current macroblock and the reference macroblock being evaluated. Under the controlling scheme of one embodiment, each of the first and second fields are individually counted, such that when the first field reaches a maximum, a carry is generated to increment the count in the second field. The number of counts in the first and second fields are respectively, the number of current and reference macroblocks. In this manner, each current macroblock is evaluated completely with the reference macroblocks in the memory circuit.

In accordance with another aspect of the present invention, an adaptive thresholding circuit is provided in the zero-packing circuit prior to entropy encoding of the DCT coefficients into variable length code. In this adaptive threshold circuit, a current DCT coefficient is set to zero, if the immediately preceding and the immediately following DCT coefficients are both zero, and the current DCT coefficient is less than a programmable threshold. This thresholding circuit allows even higher compression ratio by extending a zero runlength.

The present invention is better understood upon consideration of the detailed description below and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1a is a block diagram of an embodiment of the present invention provided in an MPEG encoder chip 100.

FIG. 1b shows a multi-chip configuration in which two copies of chip 100, chips 100a and 100b, are used.

FIG. 1c is a map of chip 100's address space.

FIG. 2 is a block diagram of video port 107 of chip 100 shown in FIG. 1.

FIG. 3a shows a synchronization circuit 300 for synchronizing video data arrival at port 107 with an external video source, which provides video at 13.5 Mhz under 16-bit mode, and 27 Mhz under 8-bit mode.

FIG. 3b shows the times at which the samples of video clock signal Vclk indicated in FIG. 3a are obtained.

FIG. 4a is a timing diagram of video port 107 for latching video data provided at 13.5 Mhz on video bus 190a under 16-bit mode.

FIG. 4b is a timing diagram of video port 107 for latching video data provided at 27 Mhz on video bus 190a under 8-bit mode.

FIG. 5a shows the sequence in which 4:2:2 video data arrives at port 107.

FIG. 5b is a block diagram of decimator 204 of video port 107.

FIG. 5c is a tables showing, at each phase of the CIF decimation, the data output R.sub.out of register 201, the operand inputs A.sub.in and B.sub.in of 14-bit adder 504, the carry-in input C.sub.in, and the data output Dec of decimator 204.

FIG. 5d is a tables showing, at each phase of the CCR 601 decimation, the data output R.sub.out of register 201, the operand inputs A.sub.in and B.sub.in of 14-bit adder 504, the carry-in input C.sub.in, and the data output Dec of decimator 204.

FIG. 6a is a block diagram of interpolator 206.

FIG. 6b is an address map of video FIFO 205, showing the partition of video FIFO 205 into Y region 651, U region 652 and V region 653, and the storage locations of data in a data stream 654 received from decimator 204.

FIG. 6c illustrates the generation of addresses for accessing video FIFO 205 from the contents of address counter 207, during YUV separation, or during video output.

FIG. 6d illustrates the sequence in which stored and interpolated luminance and chrominance pixels are output under interpolation mode.

FIG. 6e shows two block interleaved groups 630 and 631 in video FIFO 205.

FIG. 7a is an overview of data flow between memory blocks relating to CPU 150.

FIG. 7b illustrates in further detail the data flow between P memory 702, QMEM 701, registers R0-R23, and scratch memory 159.

FIG. 7c shows the mappings of registers P4-P7 into the four physical registers corresponding to registers P0-P3.

FIG. 7d shows the mappings between direct and alias addresses of the higher 64 36-bit locations in SMEM 159.

FIG. 8a is a block diagram of memory controller 104, in accordance with the present invention.

FIG. 8b show a bit assignment diagram for the channel memory entries of channel 1.

FIG. 8c show a bit assignment diagram for the channel memory entries of channels 0, and 3-7.

FIG. 8d shows a bit assignment diagram for the channel memory entry of channel 2.

FIG. 9a shows chip 100 interfaced with an external 4-bank memory system 103 in a configuration 900.

FIG. 9b is a timing diagram for an interleaved access under "reference" mode of the memory system of configuration 900.

FIG. 9c is a timing diagram for an interleaved access under "scan-line" mode of the memory system of configuration 900.

FIGS. 10a and 10b shows pixel arrangements 1000a and 1000b, which are respectively provided to support scan-line mode operation and reference frame fetching during motion estimation.

FIG. 10c shows the logical addresses for scan-line mode access.

FIG. 10d shows the logical addresses for reference frame fetching.

FIG. 10e shows a reference frame fetch in which the reference frame crosses a memory page boundary.

FIGS. 11a and 11b are timing diagrams showing respectively data transfers between external memory 103 and SMEM 159 via QG register 810.

FIG. 12 illustrates the pipeline stages of CPU 150.

FIG. 13a shows a 32-bit zero-lookahead circuit 1300, comprising 32 generator circuits 1301 and propagator circuits.

FIG. 13b shows the logic circuits for generator circuit 1301 and propagator circuit 1302.

FIGS. 14a and 14b show schematically the byte multiplexors 1451 and 1452 of ALU 156.

FIG. 15a is a block diagram of arithmetic unit 750.

FIG. 15b is a schematic diagram of MAC 158.

FIG. 15c(i) illustrates an example of "alpha filtering" in the mixing filter for combining chroma during a deinterlacing operation.

FIG. 15c(ii) is a block diagram of a circuit 1550 for computing the value of alpha.

FIG. 15c(iii) shows the values of alpha obtainable from the various values of parameters m and n.

FIGS. 15d(i)-15d(iv) illustrates instructions using the byte multiplexors of arithmetic unit 750, using one mode selected from each of the HOFF, VOFF, HSHRINK and VSHRINK instructions, respectively.

FIG. 15e shows the pixels involved in computing activities of quad pels A and B as input to a STAT1 or STAT2 instruction.

FIG. 15f shows a macroblock of luminance data for which a measure of activity is computed using repeated calls to a STAT1 or a STAT2 instruction.

FIGS. 16a and 16b are respectively a block diagram and a data and control flow diagram of motion estimator 111.

FIG. 16c is a block diagram of window memory 705, showing odd and even banks 705a and 705b.

FIG. 16d shows how, in the present invention, vertical half-tiles of a macroblock are stored in odd and even memory banks of window memory 750.

FIG. 17 illustrates a 2-stage motion estimation algorithm which can be executed by motion estimator 111.

FIGS. 18a and 18b show, with respect to reference macroblocks, a decimated current macroblock and the range of a motion vector having an origin at the upper right corner of the current macroblock for the first stage of a B frame motion estimation and a P frame motion estimation respectively.

FIG. 18c shows, with respect to reference macroblocks, a full resolution current macroblock and the range of a motion vector having an origin at the upper right corner of the current macroblock for the second stage of motion estimation in both P-frame and B-frame motion estimations.

FIG. 18d shows the respectively locations of current and reference macroblocks in the first stage of a B frame motion estimation.

FIG. 18e shows the respective locations of current and reference macroblocks in the first stage of a P frame motion estimation.

FIG. 18f shows both a 4.times.4 tile current macroblock 1840 and a 5.times.5 tile reference region 1841 in the second stage of motion estimation.

FIG. 18g shows the fields of a state counter 1890 having programmable fields for control of motion estimation.

FIG. 18h shows the four possibilities by which a patch of motion vectors crosses a reference frame boundary.

FIG. 18i shows the twelve possible ways the reference frame boundary can intersect the reference and current macroblocks in window memory 705 under the first stage motion estimation for B-frames.

FIG. 18j shows, for each of the 12 cases shown in FIG. 18h, the INIT and WRAP values for each of the fields in state counter 1890.

FIG. 18k shows the twenty possible ways the reference frame boundary can intersect the current and reference macroblocks in window memory 705.

FIG. 18l shows, for each of the twenty cases shown in FIG. 18k, the corresponding INIT and WRAP values for each of the fields of state counter 1890.

FIGS. 18m-1 and 18m-2 show the clipping of motion estimation with respect to the reference frame boundary for either the second stage of a 2-stage motion estimation, or the third stage of a 3-stage motion estimation.

FIG. 18n provides the INIT and WRAP values for state counter 1890 corresponding to the reference frame boundary clipping shown in FIGS. 18m-1 and 18m-2.

FIG. 19a illustrates the algorithm used in matcher 1606 for evaluate eight motion vectors over eight cycles.

FIG. 19b shows the locations of the "patch" of eight motion vector evaluated for each slice of current pixels.

FIG. 19c shows the structure of matcher 1608.

FIG. 19d shows the pipeline in the motion estimator 111 formed by the registers in subpel filter 1606.

FIGS. 20a and 20b together form a block diagram of VLC 109.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

1. Overview

FIG. 1a is a block diagram of an embodiment of the present invention provided in an encoder/decoder integrated circuit 100 ("chip 100"). In this embodiment, chip 100 encodes or decodes bit stream compatible with MPEG, JPEG and CCITT H.64. As shown in FIG. 1a, chip 100 communicates through host bus interface 102 with a host computer (not shown) over 32-bit host bus 101. Host bus interface 102 implements the IEEE 1196 NuBus standard. In addition, chip 100 communicates with an external memory 103 (not shown) over 32-bit memory bus 105. Chip 100's access to external memory 103 is controlled by a memory controller 104, which includes dynamic random access memory (DRAM) controller 104a and direct memory access (DMA) controller 106. Chip 100 has two independent 16-bit bidirectional video ports 107 and 108 receiving and sending data on video busses 190a and 190b respectively. Video ports 107 and 108 are substantially identical, except that port 107 is provided with a decimation filter, and port 108 is provided with an interpolator. Both the decimator and the interpolator circuits of ports 107 and 108 are described in further detail below.

The functional units of chip 100 communicate over an internal global bus 120, these units include the central processing unit (CPU) 150, the variable-length code coder (VLC) 109, variable-length code decoder (VLD) 110, and motion estimator 111. Central processing unit 150 includes the processor status word register 151, which stores the state of CPU 150, instruction memory ("I mem") 152, instruction register 153, register file ("RMEM") 154, which includes 31 general purpose registers R1-R31, byte multiplexor 155, arithmetic logic unit ("ALU") 156, memory controller 104, multiplier-accumulator (MAC) 158, and scratch memory ("SMEM") 159, which includes address generation unit 160. Memory controller 104 provides access to external memory 103, including direct memory access (DMA) modes.

Global bus 120 is accessed by SMEM 159, motion estimator 111, VLC 109 and VLD 110, memory controller 104, instruction memory 152, host interface 102 and bidirectional video ports 107 and 108. A processor bus 180 is used for data transfer between SMEM 159, VLC 109 and VLD 110, and CPU 150.

During video operations, the host computer initializes chip 100 by loading the configuration registers in the functional units of chip 100, and maintains the bit streams sending to or receiving from video ports 107 and 108.

Chip 100 has an memory address space of 16 megabytes. A map of chip 100's address space is provided in FIG. 1c. As shown in FIG. 1c, chip 100 is assigned a base address. The memory space between the base address and the location (base address+7FFFFF.sup.1) is reserved for an external dynamic random access memory (DRAM). The memory space between location (base address+800000) to location (base address+9FFFFF) is reserved for registers addressable over global bus 120. The memory space between location (base address+A00000) and location (base address+BFFFFF) is reserved for registers