WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Method and apparatus for a region-based approach to coding a sequence of video images    
United States Patent5608458   
Link to this pagehttp://www.wikipatents.com/5608458.html
Inventor(s)Chen; Homer H. (Lincroft, NJ); Ebrahimi; Touradj (Ecublens, CH); Haskell; Barin G. (Tinton Falls, NJ); Horne; Caspar (Beaverton, OR)
AbstractAn encoder segments frames in a sequence of digital images into multiple regions of arbitrary shape each of which has a corresponding motion vector relative to a previous decoded frame. A hierarchical multi-resolution motion estimation and segmentation technique, which segments the frame into multiple blocks and which assigns a best motion vector to each block is used. Blocks having the same or similar motion vector are then merged to form the arbitrarily-shaped regions. The shape of each region is coded, and a decision is made to code additional image data of each region in one of three modes. In a first inter-frame mode, a motion vector associated with a region is encoded. In a second inter-frame mode, a prediction error for the region is also encoded. In an intra-frame mode, the intensity of each picture element in the region is encoded. A region interior coder with frequency domain region-zeroing and space domain region-enforcing operations is employed for effectively coding the interior image data of the arbitrarily-shaped regions. The region interior coder uses an iterative technique based on the theory of successive projection onto convex sets (POCS) to find the best values for a group of selected transform coefficients. The coded information, including the shape of the region, the choice of the mode, and the motion vector and/or the region's interior image data, may then be transmitted to a decoder where the image can be reconstructed.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Inventor     Chen; Homer H. (Lincroft, NJ); Ebrahimi; Touradj (Ecublens, CH); Haskell; Barin G. (Tinton Falls, NJ); Horne; Caspar (Beaverton, OR)
Owner/Assignee     Lucent Technologies Inc. (Murray Hill, NJ)
Patent assignment
All assignments
Publication Date     March 4, 1997
Application Number     08/322,893
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     October 13, 1994
US Classification     375/240.14 348/699 375/240.24
Int'l Classification     H04N 007/32
Examiner     Au; Amelia
Assistant Examiner    
Attorney/Law Firm    
Address
Parent Case    
Priority Data    
USPTO Field of Search     348/394 348/395 348/397 348/400 348/401 348/402 348/405 348/409 348/411 348/412 348/413 348/415 348/416 348/420 348/699 382/236 382/238 382/173 382/240
Patent Tags     region-based approach coding sequence of video images
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5440346
Alattar
375/240.24
Aug,1995

[0 after 0 votes]
5422963
Chen
382/232
Jun,1995

[0 after 0 votes]
5295201
Yokohama

Mar,1994

[0 after 0 votes]
5241383
Chen
375/240.04
Aug,1993

[0 after 0 votes]
5164828
Tahara
375/240.13
Nov,1992

[0 after 0 votes]
5117287
Koike
348/402.1
May,1992

[0 after 0 votes]
4922341
Strobach
375/240.24
May,1990

[0 after 0 votes]
4729021
Kondo
375/240.24
Mar,1988

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


We claim:

1. An encoder for encoding a sequence of video frames comprising:

a segmentation unit which segments a current frame in said video sequence into a plurality of arbitrarily-shaped regions which may have different dimensions, each of said plurality of arbitrarily-shaped regions being assigned a motion vector;

a decoded frame memory for storing a previously decoded frame in said video sequence;

a prediction unit connected to said segmentation unit and said decoded frame memory for predicting image data of said current frame based upon a previously decoded frame and based upon the motion vector assigned to one of said plurality of arbitrarily-shaped regions;

a region shape coding unit for encoding the shape of each of said arbitrarily-shaped regions;

a mode decision unit which determines in which one of a plurality of modes image data from each of said plurality of arbitrarily-shaped regions is to be encoded, where said plurality of modes comprises an intra-frame mode in which the intensity of each pel in one of said plurality of arbitrarily-shaped regions is encoded;

a mode coding unit which encodes the mode in which each of said plurality of arbitrarily-shaped regions is to be encoded;

a motion coding unit for encoding motion vectors associated with said plurality of arbitrarily-shaped regions;

a region interior coder which encodes the intensity of each pel in one of said plurality of arbitrarily-shaped regions if the region is to be encoded in said intra-frame mode;

a buffer which serves as an interface for transmitting encoded information from said encoder; and

a rate controller which receives signals from said buffer, where said rate controller sends control signals to said segmentation unit, said mode decision unit, and said region interior unit in response to the signals received from said buffer.

2. The encoder of claim 1 wherein

said plurality of modes further comprises a first inter-frame mode in which a motion vector associated with one of said plurality of arbitrarily-shaped regions is encoded and a second inter-frame mode in which a motion vector and a motion compensated prediction error associated with one of said plurality of arbitrarily-shaped regions are encoded; and

said region interior coder encodes a motion compensated prediction error associated with one of said plurality of arbitrarily-shaped regions if the region is to be encoded in said second inter-frame mode.

3. The encoder of claim 1 wherein said segmentation unit comprises a joint motion estimation and segmentation unit for performing the following functions:

(a) dividing said current frame into a plurality of smaller regions of predetermined shape and size;

(b) performing for each of said smaller regions a motion vector updating routine by assigning to the smaller region a best motion vector selected from among an initial motion vector assigned to the smaller region, motion vectors of neighboring regions, and an updated motion vector obtained by performing a block matching technique for the smaller region, wherein the best motion vector is selected according to a priority scheme and a predetermined threshold value;

(c) dividing each of said smaller regions into a plurality of smaller regions of predetermined shape and size;

(d) repeating step (b) for each of the smaller regions resulting from step (c);

(e) iteratively repeating steps (c) and (d) for each smaller region resulting from step (d) until a stop condition is reached; and

(f) merging adjacent regions having similar motion vectors to form said arbitrarily-shaped regions.

4. The encoder of claim 3 wherein said joint motion estimation and segmentation unit comprises a best motion vector selection unit which performs the following functions:

determining the smallest matching error from among the matching errors obtained respectively by assigning to the smaller region the following motion vectors:

(a) the initial motion vector assigned to the smaller region;

(b) the updated motion vector obtained by performing a block matching technique for the smaller region; and

(c) the motion vectors of the smaller region's neighboring regions;

selecting the initial motion vector as the best motion vector if the absolute value of the difference between the smallest matching error and the matching error obtained by using the initial motion vector is less than the predetermined threshold value;

selecting the motion vector of one of the neighboring regions as the best motion vector if:

(a) the absolute value of the difference between the smallest matching error and the matching error obtained by using the initial motion vector is not less than the predetermined threshold value; and

(b) the absolute value of the difference between the smallest matching error and the matching error obtained by assigning to the smaller region the motion vector of the neighboring region is less than the predetermined threshold value; and

selecting the matched motion vector as the best motion vector if:

(a) the absolute value of the difference between the smallest matching error and the matching error obtained by using the initial motion vector is not less than the predetermined threshold value; and

(b) the absolute value of the difference between the smallest matching error and each of the matching errors obtained by assigning to the smaller region the motion vector of one of the neighboring region is not less than the predetermined threshold value.

5. The encoder of claim 3 wherein the step of dividing each of said smaller regions into a plurality of smaller regions comprises the step of dividing each of said smaller regions into four smaller square blocks of equal size.

6. The encoder of claim 5 wherein said region interior coder uses an iterative technique with frequency domain region-zeroing and space domain region-enforcing operations to transform an arbitrarily-shaped image into optimal transform coefficients (OTC).

7. The encoder of claim 6 wherein said region interior coder comprises:

an image circumscription unit for receiving image data of an arbitrarily-shaped region, and circumscribing a rectangular block around said arbitrarily-shaped region, thereby defining an original internal pel set and an original external pel set;

an extrapolator which extrapolates pel values of said internal pel set to initialize pel values of said external pel set;

a forward transform which transforms said image to transform coefficients;

a TCS generator which generates a transform coefficient set (TCS) from said transform coefficients, said TCS generator outputs said TCS when said TCS represents said OTC, and sends said TCS to an inverse transform when said TCS does not represent said OTC;

an inverse transform which transforms said TCS to a computed region block having computed pel values; and

a replacer which replaces those computed pel values corresponding to said interior pel set with said original pel values to form a modified computed region block (MCRB), said replacer sends the modified computed region block to the forward transform for re-iteration.

8. The encoder of claim 7 wherein said TCS generator generates said TCS by selecting an retaining those transform coefficients which have high energy according to the energy compaction property of transform coefficients, and by zeroing all the non-selected transform coefficients.

9. The encoder of claim 8 further comprising a frame skip unit which receives said sequence of frames and which determines whether each frame in said sequence should be skipped.

10. The encoder of claim 9 further comprising a multiplexer for passing encoded information from said region shape coding unit, said mode coding unit, said motion coding unit, and said region interior coder to said buffer in a predefined order.

11. The encoder of claim 2 wherein said mode decision unit determines in which one of said plurality of modes to encode image data of a particular one of said arbitrarily-shaped regions based upon the values of the following normalized sums of absolute differences for the particular region: ##EQU2## where N is the total number of pels in the particular region, i is a given pel in the region, R is the set of all pels in the particular region, I.sub.i is the intensity of the given pel i, m is the mean value of the intensity of all the pels in the particular region, and e.sub.i designates the motion compensated prediction error associated with the given pel i.

12. The encoder of claim 3 wherein said segmentation unit further comprises:

an intensity segmentation unit which divides a frame into a plurality of arbitrarily-shaped intensity regions by grouping together pels that have a similar intensity features;

a region based matching unit for determining a motion vector indicating the relative difference in position between one of said plurality of intensity regions and a matched region in a previously decoded frame;

a switch for sending a received frame to said intensity segmentation unit in response to a first control signal and to said joint motion estimation and segmentation unit in response to a second control signal; and

a plurality of switches that allow image data information to pass from said intensity segmentation unit and said joint motion estimation and segmentation unit to other of said units in said encoder in response to control signals synchronized with said first and second control signals, respectively.

13. The encoder of claim 1 wherein said region interior coder uses an iterative technique with frequency domain region-zeroing and space domain region-enforcing operations to transform an arbitrarily-shaped image into optimal transform coefficients (OTC).

14. The encoder of claim 13 wherein said region interior coder comprises:

an image circumscription unit for receiving image data of an arbitrarily-shaped region, and circumscribing a rectangular block around said arbitrarily-shaped region, thereby defining an original internal pel set and an original external pel set;

an extrapolator which extrapolates pel values of said internal pel set to initialize pel values of said external pel set;

a forward transform which transforms said image to transform coefficients;

a TCS generator which generates a transform coefficient set (TCS) from said transform coefficients, said TCS generator outputs said TCS when said TCS represents said OTC, and sends said TCS to an inverse transform when said TCS does not represent said OTC;

an inverse transform which transforms said TCS to a computed region block having computed pel values; and

a replacer which replaces those computed pel values corresponding to said interior pel set with said original pel values to form a modified computed region block (MCRB), said replacer sends the modified computed region block to the forward transform for re-iteration.

15. The encoder of claim 1 further comprising a frame skip unit which receives said sequence of frames and which determines whether each frame in said sequence should be skipped.

16. The encoder of claim 15 further comprising a multiplexer for passing encoded information from said region shape coding unit, said mode coding unit, said motion coding unit, and said region interior coder to said buffer in a predefined order.

17. The encoder of claim 15 wherein said rate controller sends control signals to said frame skip unit in response to the signals received from said buffer.

18. A method of encoding a frame in a video sequence comprising the steps of:

(a) segmenting the frame into a plurality of arbitrarily-shaped regions which may have different dimensions, each having a corresponding motion vector;

(b) encoding the shape of each arbitrarily-shaped region;

(c) determining in which of a plurality of modes image data of each arbitrarily-shaped region is to be encoded, where said plurality of modes includes a first mode in which the motion vector corresponding to an arbitrarily-shaped region is encoded, a second mode in which the motion vector and a motion compensated prediction error associated with an arbitrarily-shaped region are encoded, and a third intra-frame mode in which the intensity of each pel in an arbitrarily-shaped region is encoded;

(d) encoding the mode in which each of said plurality of arbitrarily-shaped regions is to be encoded;

(e) encoding the motion vector corresponding to one of said plurality of arbitrarily-shaped regions if the region is to be encoded in either said first mode or said second mode;

(f) encoding a motion compensated prediction error associated with one of said plurality of arbitrarily-shaped regions if the region is to be encoded in said second mode;

(g) encoding the intensity of each pel in one of said plurality of arbitrarily-shaped regions if the region is to be encoded in said third mode; and

(h) storing information encoded in steps (b), (d), (e), (f) and (g).

19. The method of claim 18 wherein the step of segmenting said frame comprises the steps of:

(a) dividing the frame into a plurality of smaller regions of predetermined shape and size to form a first segmentation level;

(b) assigning to each of said plurality of smaller regions an initial motion vector;

(c) performing for each of said plurality of smaller regions a motion vector updating routine which updates the motion vector of a smaller region by assigning to it a best motion vector selected from among the initial motion vector assigned to the smaller region, an updated motion vector obtained by performing a block matching technique for the smaller region, and motion vectors of the smaller region's neighboring regions, wherein the best motion vector is selected according to a priority scheme and a predetermined threshold value;

(d) dividing each smaller region in the previous segmentation level into a plurality of smaller regions of predetermined shape and size to form a subsequent segmentation level;

(e) assigning to each of the plurality of smaller regions in the subsequent segmentation level an initial motion vector equal to the motion vector of its parent region;

(f) performing the motion vector updating routine for each of said plurality of smaller regions in the subsequent segmentation level;

(g) iteratively performing the steps (d), (e) and (f) until a stop condition is reached;

(h) merging adjacent smaller regions having similar motion vectors to form said plurality of arbitrarily-shaped regions.

20. The method of claim 19 wherein the motion vector updating routine comprises the steps of:

determining the smallest matching error from among the matching errors obtained respectively by assigning to the smaller region the following motion vectors:

(a) the initial motion vector assigned to the smaller region;

(b) the updated motion vector obtained by performing a block matching technique for the smaller region; and

(c) the motion vectors of the smaller region's neighboring regions;

selecting the initial motion vector as the best motion vector if the absolute value of the difference between the smallest matching error and the matching error obtained by using the initial motion vector is less than the predetermined threshold value;

selecting the motion vector of one of the neighboring regions as the best motion vector if:

(a) the absolute value of the difference between the smallest matching error and the matching error obtained by using the initial motion vector is not less than the predetermined threshold value; and

(b) the absolute value of the difference between the smallest matching error and the matching error obtained by assigning to the smaller region the motion vector of the neighboring region is less than the predetermined threshold value; and

selecting the matched motion vector as the best motion vector if:

(a) the absolute value of the difference between the smallest matching error and the matching error obtained by using the initial motion vector is not less than the predetermined threshold value; and

(b) the absolute value of the difference between the smallest matching error and each of the matching errors obtained by assigning to the smaller region the motion vector of one of the neighboring region is not less than the predetermined threshold value.

21. The method of claim 20 wherein the steps of encoding the prediction error and encoding the intensity of each pel comprise the steps of:

generating original pel values by:

(a) circumscribing said arbitrarily-shaped region with a rectangular region block, thereby creating an internal pel set which lies within said arbitrarily-shaped image and within said region block, and an external pel set which lies outside said arbitrarily-shaped region and within said region block; and,

(b) initializing pel values of said external pel set by extrapolating the pel values of said internal pel set; and

calculating optimal transform coefficients (OTC) by:

(a) performing a forward transform on said region block to generate transform coefficients;

(b) generating a transform coefficient set (TCS) from said transform coefficients;

(c) performing an inverse transform on said TCS thereby generating a computed region block having computed pel values;

(d) replacing those computed pel values corresponding to said internal pel set with original pel values to form a modified computed region block (MCRB);

(e) determining whether said TCS represents said OTC;

(f) reiterating steps (a) and (b) on said modified computed region block and outputting said TCS when said TCS represents said OTC; and,

(g) reiterating steps (a) through (g) on said modified computed region block when said TCS values do not represent said OTC.

22. The method of claim 21 wherein said step of performing a forward transform comprises the step of performing a discrete cosine transform (DCT).

23. The method of claim 22 wherein the step of generating a TCS comprises the step of quantizing said transform coefficients.

24. The method of claim 23 wherein the step of generating said TCS further comprises the steps of selecting and retaining those transform coefficients which have high energy according to the energy compaction property of transform coefficients, and zeroing the non-selected transform coefficients.

25. The method of claim 18 wherein the steps of encoding the prediction error and encoding the intensity of each pel comprise the steps of:

generating original pel values by:

(a) circumscribing said arbitrarily-shaped region with a rectangular region block, thereby creating an internal pel set which lies within said arbitrarily-shaped image and within said region block, and an external pel set which lies outside said arbitrarily-shaped region and within said region block; and,

(b) initializing pel values of said external pel set by extrapolating the pel values of said internal pel set; and

calculating optimal transform coefficients (OTC) by:

(a) performing a forward transform on said region block to generate transform coefficients;

(b) generating a transform coefficient set (TCS) from said transform coefficients;

(c) performing an inverse transform on said TCS thereby generating a computed region block having computed pel values;

(d) replacing those computed pel values corresponding to said internal pel set with original pel values to form a modified computed region block (MCRB);

(e) determining whether said TCS represents said OTC;

(f) reiterating steps (a) and (b) on said modified computed region block and outputting said TCS when said TCS represents said OTC; and,

(g) reiterating steps (a) through (g) on said modified computed region block when said TCS values do not represent said OTC.

26. The method of claim 25 wherein said step of performing a forward transform comprises the step of performing a discrete cosine transform (DCT).

27. The method of claim 26 wherein the step of generating a TCS comprises the step of quantizing said transform coefficients.

28. The method of claim 27 further comprising the step of deciding whether to skip said frame where said step of deciding depends upon the fullness of a buffer which serves as an interface to said decoder.

29. The method of claim 28 wherein the step of determining in which of a plurality of modes image data of each arbitrarily-shaped region is to be encoded is based upon the values of the following normalized sums of absolute differences for the particular region: ##EQU3## where N is the total number of pels in the particular region, i is a given pel in the region, R is the set of all pels in the particular region, I.sub.i is the intensity of the given pel i, m is the mean value of the intensity of all the pels in the particular region, and e.sub.i designates the motion compensated prediction error associated with the given pel i.

30. The method of claim 29 wherein image data of the particular region is encoded in said first mode if the value of NSAD.sub.P is less than a threshold value c and in said second mode when the value of NSAD.sub.P exceeds the threshold value c and is less than the value of NSAD.sub.I by at least a threshold value b.

31. The method of claim 30 wherein said threshold values b and c depend upon the fullness of a buffer which serves as an interface to said decoder.

32. The method of claim 24 wherein the number of selected transform coefficients is based upon the size of the particular arbitrarily-shaped region being encoded.

33. The method of claim 23 wherein the step of quantizing said transform coefficients uses a quantization step size which depends upon the fullness of a buffer which serves as an interface to a decoder.

34. A method of encoding a frame in a video sequence comprising the steps of:

(a) segmenting the frame into a plurality of arbitrarily-shaped regions which may have different dimensions, each having a corresponding motion vector;

(b) encoding the shape of each arbitrarily-shaped region;

(c) determining in which of a plurality of modes image data of each arbitrarily-shaped region is to be encoded, where said plurality of modes includes a first mode in which the motion vector corresponding to an arbitrarily-shaped region is encoded, a second mode in which the motion vector and a motion compensated prediction error associated with an arbitrarily-shaped region are encoded, and a third intra-frame mode in which the intensity of each pel in an arbitrarily-shaped region is encoded;

(d) encoding the mode in which each of said plurality of arbitrarily-shaped regions is to be encoded;

(e) encoding the motion vector corresponding to one of said plurality of arbitrarily-shaped regions if the region is to be encoded in either said first mode or said second mode;

(f) encoding a motion compensated prediction error associated with one of said plurality of arbitrarily-shaped regions if the region is to be encoded in said second mode;

(g) encoding the intensity of each pel in one of said plurality of arbitrarily-shaped regions if the region is to be encoded in said third mode; and

(h) transmitting information encoded in steps (b), (d), (e), (f) and (g) to a decoder.

35. The method of claim 34 wherein the step of segmenting said frame comprises the steps of:

(a) dividing the frame into a plurality of smaller regions of predetermined shape and size to form a first segmentation level;

(b) assigning to each of said plurality of smaller regions an initial motion vector;

(c) performing for each of said plurality of smaller regions a motion vector updating routine which updates the motion vector of a smaller region by assigning to it a best motion vector selected from among the initial motion vector assigned to the smaller region, an updated motion vector obtained by performing a block matching technique for the smaller region, and motion vectors of the smaller region's neighboring regions, wherein the best motion vector is selected according to a priority scheme and a predetermined threshold value;

(d) dividing each smaller region in the previous segmentation level into a plurality of smaller regions of predetermined shape and size to form a subsequent segmentation level;

(e) assigning to each of the plurality of smaller regions in the subsequent segmentation level an initial motion vector equal to the motion vector of its parent region;

(f) performing the motion vector updating routine for each of said plurality of smaller regions in the subsequent segmentation level;

(g) iteratively performing the steps (d), (e) and (f) until a stop condition is reached;

(h) merging adjacent smaller regions having similar motion vectors to form said plurality of arbitrarily-shaped regions.

36. The method of claim 35 wherein the motion vector updating routine comprises the steps of:

determining the smallest matching error from among the matching errors obtained respectively by assigning to the smaller region the following motion vectors:

(a) the initial motion vector assigned to the smaller region;

(b) the updated motion vector obtained by performing a block matching technique for the smaller region; and

(c) the motion vectors of the smaller region's neighboring regions;

selecting the initial motion vector as the best motion vector if the absolute value of the difference between the smallest matching error and the matching error obtained by using the initial motion vector is less than the predetermined threshold value;

selecting the motion vector of one of the neighboring regions as the best motion vector if:

(a) the absolute value of the difference between the smallest matching error and the matching error obtained by using the initial motion vector is not less than the predetermined threshold value; and

(b) the absolute value of the difference between the smallest matching error and the matching error obtained by assigning to the smaller region the motion vector of the neighboring region is less than the predetermined threshold value; and

selecting the matched motion vector as the best motion vector if:

(a) the absolute value of the difference between the smallest matching error and the matching error obtained by using the initial motion vector is not less than the predetermined threshold value; and

(b) the absolute value of the difference between the smallest matching error and each of the matching errors obtained by assigning to the smaller region the motion vector of one of the neighboring region is not less than the predetermined threshold value.

37. The method of claim 36 wherein the steps of encoding the prediction error and encoding the intensity of each pel comprise the steps of:

generating original pel values by:

(a) circumscribing said arbitrarily-shaped region with a rectangular region block, thereby creating an internal pel set which lies within said arbitrarily-shaped image and within said region block, and an external pel set which lies outside said arbitrarily-shaped region and within said region block; and,

(b) initializing pel values of said external pel set by extrapolating the pel values of said internal pel set; and

calculating optimal transform coefficients (OTC) by:

(a) performing a forward transform on said region block to generate transform coefficients;

(b) generating a transform coefficient set (TCS) from said transform coefficients;

(c) performing an inverse transform on said TCS thereby generating a computed region block having computed pel values;

(d) replacing those computed pel values corresponding to said internal pel set with original pel values to form a modified computed region block (MCRB);

(e) determining whether said TCS represents said OTC;

(f) reiterating steps (a) and (b) on said modified computed region block and outputting said TCS when said TCS represents said OTC; and,

(g) reiterating steps (a) through (g) on said modified computed region block when said TCS values do not represent said OTC.

38. The method of claim 37 wherein said step of performing a forward transform comprises the step of performing a discrete cosine transform (DCT).

39. The method of claim 38 wherein the step of generating a TCS comprises the step of quantizing said transform coefficients.

40. The method of claim 39 wherein the step of generating said TCS further comprises the steps of selecting and retaining those transform coefficients which have high energy according to the energy compaction property of transform coefficients, and zeroing the non-selected transform coefficients.

41. The method of claim 34 wherein the steps of encoding the prediction error and encoding the intensity of each pel comprise the steps of:

generating original pel values by:

(a) circumscribing said arbitrarily-shaped region with a rectangular region block, thereby creating an internal pel set which lies within said arbitrarily-shaped image and within said region block, and an external pel set which lies outside said arbitrarily-shaped region and within said region block; and,

(b) initializing pel values of said external pel set by extrapolating the pel values of said internal pel set; and

calculating optimal transform coefficients (OTC) by:

(a) performing a forward transform on said region block to generate transform coefficients;

(b) generating a transform coefficient set (TCS) from said transform coefficients;

(c) performing an inverse transform on said TCS thereby generating a computed region block having computed pel values;

(d) replacing those computed pel values corresponding to said internal pel set with original pel values to form a modified computed region block (MCRB);

(e) determining whether said TCS represents said OTC;

(f) reiterating steps (a) and (b) on said modified computed region block and outputting said TCS when said TCS represents said OTC; and,

(g) reiterating steps (a) through (g) on said modified computed region block when said TCS values do not represent said OTC.

42. The method of claim 41 wherein said step of performing a forward transform comprises the step of performing a discrete cosine transform (DCT).

43. The method of claim 42 wherein the step of generating a TCS comprises the step of quantizing said transform coefficients.

44. The method of claim 43 further comprising the step of deciding whether to skip said frame where said step of deciding depends upon the fullness of a buffer which serves as an interface to said decoder.

45. The method of claim 44 wherein the step of determining in which of a plurality of modes image data of each arbitrarily-shaped region is to be encoded is based upon the values of the following normalized sums of absolute differences for the particular region: ##EQU4## where N is the total number of pels in the particular region, i is a given pel in the region, R is the set of all pels in the particular region, I.sub.i is the intensity of the given pel i, m is the mean value of the intensity of all the pels in the particular region, and e.sub.i designates the motion compensated prediction error associated with the given pel i.

46. The method of claim 45 wherein image data of the particular region is encoded in said first mode if the value of NSAD.sub.P is less than a threshold value c and in said second mode when the value of NSAD.sub.P exceeds the threshold value c and is less than the value of NSAD.sub.I by at least a threshold value b.

47. The method of claim 46 wherein said threshold values b and c depend upon the fullness of a buffer which serves as an interface to said decoder.

48. The method of claim 40 wherein the number of selected transform coefficients is based upon the size of the particular arbitrarily-shaped region being encoded.

49. The method of claim 39 wherein the step of quantizing said transform coefficients uses a quantization step size which depends upon the fullness of a buffer which serves as an interface to a decoder.
 Description Submit all comments and votes
 


FIELD OF THE INVENTION

The present invention relates generally to a method and apparatus for coding a video sequence, and, in particular, to a region-oriented method and apparatus for coding arbitrarily-shaped regions of video images.

BACKGROUND OF THE INVENTION

Many approaches to encoding a sequence of digital video images are known in the art. One classical approach is to divide each frame in the sequence into square blocks of predetermined size also known as macroblocks. Each macroblock is then assigned a motion vector relative to a previous decoded frame, where the motion vector represents the offset between the current macroblock and a block of pixels of the same size in a previous reconstructed frame that forms a best match. The motion vector is transmitted to a decoder which can then reconstruct the current frame based upon the previous decoded frame, the motion vector and a prediction error. Block-based techniques, however, can lead to distortions such as blocking and mosquito effects in low bit-rate applications.

A more complex object-oriented, or region-oriented, approach encodes arbitrarily-shaped regions instead of rectangular or square blocks. While block-oriented coding techniques typically transmit two parameter sets, specifically the motion and color of each block, an object-oriented approach requires that the shape of each region be transmitted as well in order to allow reconstruction of the image. For example, in M. Hotter, "Object-Oriented Analysis-Synthesis Coding Based On Moving Two-Dimensional Objects," Signal Processing: Image Communication, Vol. 2, pp. 409-428 (1990), an encoder which encodes arbitrarily-shaped regions is presented, where objects are described by three parameter sets defining their motion, shape and color. A priority control determines in which of two modes the coded information will be sent based upon the success or failure of the motion estimation technique for a particular region. The shape coding technique considered in the aforementioned article approximates the shape of each region by a combination of polygon and spline representation of the shape. U.S. Pat. No. 5,295,201 also discloses an object-oriented encoder which includes an apparatus for approximating the shape of an arbitrarily-shaped region to a polygon. The vertices of the polygon are determined, and the coordinate values of the vertices are calculated and transmitted.

One color coding technique for use in object-oriented approaches is presented in Gilge et al., "Coding of Arbitrarily Shaped Image Segments Based On A Generalized Orthogonal Transform," Signal Processing: Image Communication, Vol. 1, pp. 153-180 (1989). According to the technique disclosed in this article, an intensity function inside each region is approximated by a weighted sum of basis functions which are orthogonal with respect to the shape of the region to be coded. While this technique may be theoretically useful, it is not practicable for implementation in a real-time system.

Due to the potential advantages of an object-oriented approach, there exists a need for an object-oriented encoder which provides powerful schemes for segmenting an image into arbitrarily-shaped regions, each of which has a corresponding motion vector, and for representing the segment content in a manner which can be readily implemented for use in real-time. It is also desirable to have an encoder which can encode a generic scene or sequence of images, the content of which is not known beforehand, in contrast to the requirements of the prior art. It is further desirable to provide an encoder which permits additional functionalities, such as tracking objects moving from one area of a scene to another between images in a sequence.

SUMMARY OF THE INVENTION

The present invention discloses an encoder for encoding a sequence of video frames. The encoder comprises a segmentation unit which segments a current frame in the video sequence into a plurality of arbitrarily-shaped regions, where each of the plurality of arbitrarily-shaped regions is assigned a motion vector. The encoder also has a decoded frame memory for storing a previously decoded frame in the video sequence and a prediction unit connected to the segmentation unit and the decoded frame memory for predicting image data of the current frame based upon a previously decoded frame and based upon the motion vector assigned to one of the plurality of arbitrarily-shaped regions. The encoder further comprises a region shape coding unit for encoding the shape of each of the arbitrarily-shaped regions.

A mode decision unit determines in which one of a plurality of modes image data from each of the plurality of arbitrarily-shaped regions is to be encoded. The plurality of modes comprises a first inter-frame mode in which a motion vector associated with one of the plurality of arbitrarily-shaped regions is encoded and a second inter-frame mode in which a motion vector and a motion compensated prediction error associated with one of the plurality of arbitrarily-shaped regions are encoded. A third mode is an intra-frame mode in which the intensity of each pel in one of the plurality of arbitrarily-shaped regions is encoded. A mode coding unit then encodes the mode in which each of the plurality of arbitrarily-shaped regions is to be encoded.

The encoder includes a motion coding unit for encoding motion vectors associated with the plurality of arbitrarily-shaped regions. In addition, the encoder comprises a region interior coder which encodes a motion compensated prediction error associated with one of the plurality of arbitrarily-shaped regions if the region is to be encoded in the se