|
|
|
| United States Patent | 5608458 |
| Link to this page | http://www.wikipatents.com/5608458.html |
| Inventor(s) | Chen; Homer H. (Lincroft, NJ);
Ebrahimi; Touradj (Ecublens, CH);
Haskell; Barin G. (Tinton Falls, NJ);
Horne; Caspar (Beaverton, OR) |
| Abstract | An encoder segments frames in a sequence of digital images into multiple
regions of arbitrary shape each of which has a corresponding motion vector
relative to a previous decoded frame. A hierarchical multi-resolution
motion estimation and segmentation technique, which segments the frame
into multiple blocks and which assigns a best motion vector to each block
is used. Blocks having the same or similar motion vector are then merged
to form the arbitrarily-shaped regions. The shape of each region is coded,
and a decision is made to code additional image data of each region in one
of three modes. In a first inter-frame mode, a motion vector associated
with a region is encoded. In a second inter-frame mode, a prediction error
for the region is also encoded. In an intra-frame mode, the intensity of
each picture element in the region is encoded. A region interior coder
with frequency domain region-zeroing and space domain region-enforcing
operations is employed for effectively coding the interior image data of
the arbitrarily-shaped regions. The region interior coder uses an
iterative technique based on the theory of successive projection onto
convex sets (POCS) to find the best values for a group of selected
transform coefficients. The coded information, including the shape of the
region, the choice of the mode, and the motion vector and/or the region's
interior image data, may then be transmitted to a decoder where the image
can be reconstructed. |
|
|
|
Title Information  |
|
|
|
|
|
|
| Publication Date |
March 4, 1997 |
|
|
|
|
|
| Filing Date |
October 13, 1994 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Title Information  |
|
|
References  |
|
|
| *references marked with an asterisk below are user-added references |
|
U.S. References |
|
|
|
|
|
|
U.S. References |
|
|
Foreign References |
|
|
|
|
|
|
Foreign References |
|
|
Other References |
|
|
|
|
|
|
Other References |
|
|
|
|
|
References  |
|
|
|
|
|
| Market Size |
|
Estimate the gross annual revenues of the relevant market
sector:
|
| | |
| |
|
|
| Market Share |
|
Estimate the percentage of the relevant market sector this invention will capture:
|
| | |
| |
|
|
| Reasonable Royalty |
|
What percentage of gross sales should the inventor or assignee be paid?
|
| | |
| |
|
|
|
Public's "Guesstimation" of Royalty Value
|
| Market Size | N/A | [No votes] | | x | Market Share | N/A | [No votes] | | x | Reasonable Royalty | N/A | [No votes] |
| | N/A | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
Market Review  |
|
|
Technical Review  |
|
|
Claims  |
|
|
We claim:
1. An encoder for encoding a sequence of video frames comprising:
a segmentation unit which segments a current frame in said video sequence
into a plurality of arbitrarily-shaped regions which may have different
dimensions, each of said plurality of arbitrarily-shaped regions being
assigned a motion vector;
a decoded frame memory for storing a previously decoded frame in said video
sequence;
a prediction unit connected to said segmentation unit and said decoded
frame memory for predicting image data of said current frame based upon a
previously decoded frame and based upon the motion vector assigned to one
of said plurality of arbitrarily-shaped regions;
a region shape coding unit for encoding the shape of each of said
arbitrarily-shaped regions;
a mode decision unit which determines in which one of a plurality of modes
image data from each of said plurality of arbitrarily-shaped regions is to
be encoded, where said plurality of modes comprises an intra-frame mode in
which the intensity of each pel in one of said plurality of
arbitrarily-shaped regions is encoded;
a mode coding unit which encodes the mode in which each of said plurality
of arbitrarily-shaped regions is to be encoded;
a motion coding unit for encoding motion vectors associated with said
plurality of arbitrarily-shaped regions;
a region interior coder which encodes the intensity of each pel in one of
said plurality of arbitrarily-shaped regions if the region is to be
encoded in said intra-frame mode;
a buffer which serves as an interface for transmitting encoded information
from said encoder; and
a rate controller which receives signals from said buffer, where said rate
controller sends control signals to said segmentation unit, said mode
decision unit, and said region interior unit in response to the signals
received from said buffer.
2. The encoder of claim 1 wherein
said plurality of modes further comprises a first inter-frame mode in which
a motion vector associated with one of said plurality of
arbitrarily-shaped regions is encoded and a second inter-frame mode in
which a motion vector and a motion compensated prediction error associated
with one of said plurality of arbitrarily-shaped regions are encoded; and
said region interior coder encodes a motion compensated prediction error
associated with one of said plurality of arbitrarily-shaped regions if the
region is to be encoded in said second inter-frame mode.
3. The encoder of claim 1 wherein said segmentation unit comprises a joint
motion estimation and segmentation unit for performing the following
functions:
(a) dividing said current frame into a plurality of smaller regions of
predetermined shape and size;
(b) performing for each of said smaller regions a motion vector updating
routine by assigning to the smaller region a best motion vector selected
from among an initial motion vector assigned to the smaller region, motion
vectors of neighboring regions, and an updated motion vector obtained by
performing a block matching technique for the smaller region, wherein the
best motion vector is selected according to a priority scheme and a
predetermined threshold value;
(c) dividing each of said smaller regions into a plurality of smaller
regions of predetermined shape and size;
(d) repeating step (b) for each of the smaller regions resulting from step
(c);
(e) iteratively repeating steps (c) and (d) for each smaller region
resulting from step (d) until a stop condition is reached; and
(f) merging adjacent regions having similar motion vectors to form said
arbitrarily-shaped regions.
4. The encoder of claim 3 wherein said joint motion estimation and
segmentation unit comprises a best motion vector selection unit which
performs the following functions:
determining the smallest matching error from among the matching errors
obtained respectively by assigning to the smaller region the following
motion vectors:
(a) the initial motion vector assigned to the smaller region;
(b) the updated motion vector obtained by performing a block matching
technique for the smaller region; and
(c) the motion vectors of the smaller region's neighboring regions;
selecting the initial motion vector as the best motion vector if the
absolute value of the difference between the smallest matching error and
the matching error obtained by using the initial motion vector is less
than the predetermined threshold value;
selecting the motion vector of one of the neighboring regions as the best
motion vector if:
(a) the absolute value of the difference between the smallest matching
error and the matching error obtained by using the initial motion vector
is not less than the predetermined threshold value; and
(b) the absolute value of the difference between the smallest matching
error and the matching error obtained by assigning to the smaller region
the motion vector of the neighboring region is less than the predetermined
threshold value; and
selecting the matched motion vector as the best motion vector if:
(a) the absolute value of the difference between the smallest matching
error and the matching error obtained by using the initial motion vector
is not less than the predetermined threshold value; and
(b) the absolute value of the difference between the smallest matching
error and each of the matching errors obtained by assigning to the smaller
region the motion vector of one of the neighboring region is not less than
the predetermined threshold value.
5. The encoder of claim 3 wherein the step of dividing each of said smaller
regions into a plurality of smaller regions comprises the step of dividing
each of said smaller regions into four smaller square blocks of equal
size.
6. The encoder of claim 5 wherein said region interior coder uses an
iterative technique with frequency domain region-zeroing and space domain
region-enforcing operations to transform an arbitrarily-shaped image into
optimal transform coefficients (OTC).
7. The encoder of claim 6 wherein said region interior coder comprises:
an image circumscription unit for receiving image data of an
arbitrarily-shaped region, and circumscribing a rectangular block around
said arbitrarily-shaped region, thereby defining an original internal pel
set and an original external pel set;
an extrapolator which extrapolates pel values of said internal pel set to
initialize pel values of said external pel set;
a forward transform which transforms said image to transform coefficients;
a TCS generator which generates a transform coefficient set (TCS) from said
transform coefficients, said TCS generator outputs said TCS when said TCS
represents said OTC, and sends said TCS to an inverse transform when said
TCS does not represent said OTC;
an inverse transform which transforms said TCS to a computed region block
having computed pel values; and
a replacer which replaces those computed pel values corresponding to said
interior pel set with said original pel values to form a modified computed
region block (MCRB), said replacer sends the modified computed region
block to the forward transform for re-iteration.
8. The encoder of claim 7 wherein said TCS generator generates said TCS by
selecting an retaining those transform coefficients which have high energy
according to the energy compaction property of transform coefficients, and
by zeroing all the non-selected transform coefficients.
9. The encoder of claim 8 further comprising a frame skip unit which
receives said sequence of frames and which determines whether each frame
in said sequence should be skipped.
10. The encoder of claim 9 further comprising a multiplexer for passing
encoded information from said region shape coding unit, said mode coding
unit, said motion coding unit, and said region interior coder to said
buffer in a predefined order.
11. The encoder of claim 2 wherein said mode decision unit determines in
which one of said plurality of modes to encode image data of a particular
one of said arbitrarily-shaped regions based upon the values of the
following normalized sums of absolute differences for the particular
region:
##EQU2##
where N is the total number of pels in the particular region, i is a given
pel in the region, R is the set of all pels in the particular region,
I.sub.i is the intensity of the given pel i, m is the mean value of the
intensity of all the pels in the particular region, and e.sub.i designates
the motion compensated prediction error associated with the given pel i.
12. The encoder of claim 3 wherein said segmentation unit further
comprises:
an intensity segmentation unit which divides a frame into a plurality of
arbitrarily-shaped intensity regions by grouping together pels that have a
similar intensity features;
a region based matching unit for determining a motion vector indicating the
relative difference in position between one of said plurality of intensity
regions and a matched region in a previously decoded frame;
a switch for sending a received frame to said intensity segmentation unit
in response to a first control signal and to said joint motion estimation
and segmentation unit in response to a second control signal; and
a plurality of switches that allow image data information to pass from said
intensity segmentation unit and said joint motion estimation and
segmentation unit to other of said units in said encoder in response to
control signals synchronized with said first and second control signals,
respectively.
13. The encoder of claim 1 wherein said region interior coder uses an
iterative technique with frequency domain region-zeroing and space domain
region-enforcing operations to transform an arbitrarily-shaped image into
optimal transform coefficients (OTC).
14. The encoder of claim 13 wherein said region interior coder comprises:
an image circumscription unit for receiving image data of an
arbitrarily-shaped region, and circumscribing a rectangular block around
said arbitrarily-shaped region, thereby defining an original internal pel
set and an original external pel set;
an extrapolator which extrapolates pel values of said internal pel set to
initialize pel values of said external pel set;
a forward transform which transforms said image to transform coefficients;
a TCS generator which generates a transform coefficient set (TCS) from said
transform coefficients, said TCS generator outputs said TCS when said TCS
represents said OTC, and sends said TCS to an inverse transform when said
TCS does not represent said OTC;
an inverse transform which transforms said TCS to a computed region block
having computed pel values; and
a replacer which replaces those computed pel values corresponding to said
interior pel set with said original pel values to form a modified computed
region block (MCRB), said replacer sends the modified computed region
block to the forward transform for re-iteration.
15. The encoder of claim 1 further comprising a frame skip unit which
receives said sequence of frames and which determines whether each frame
in said sequence should be skipped.
16. The encoder of claim 15 further comprising a multiplexer for passing
encoded information from said region shape coding unit, said mode coding
unit, said motion coding unit, and said region interior coder to said
buffer in a predefined order.
17. The encoder of claim 15 wherein said rate controller sends control
signals to said frame skip unit in response to the signals received from
said buffer.
18. A method of encoding a frame in a video sequence comprising the steps
of:
(a) segmenting the frame into a plurality of arbitrarily-shaped regions
which may have different dimensions, each having a corresponding motion
vector;
(b) encoding the shape of each arbitrarily-shaped region;
(c) determining in which of a plurality of modes image data of each
arbitrarily-shaped region is to be encoded, where said plurality of modes
includes a first mode in which the motion vector corresponding to an
arbitrarily-shaped region is encoded, a second mode in which the motion
vector and a motion compensated prediction error associated with an
arbitrarily-shaped region are encoded, and a third intra-frame mode in
which the intensity of each pel in an arbitrarily-shaped region is
encoded;
(d) encoding the mode in which each of said plurality of arbitrarily-shaped
regions is to be encoded;
(e) encoding the motion vector corresponding to one of said plurality of
arbitrarily-shaped regions if the region is to be encoded in either said
first mode or said second mode;
(f) encoding a motion compensated prediction error associated with one of
said plurality of arbitrarily-shaped regions if the region is to be
encoded in said second mode;
(g) encoding the intensity of each pel in one of said plurality of
arbitrarily-shaped regions if the region is to be encoded in said third
mode; and
(h) storing information encoded in steps (b), (d), (e), (f) and (g).
19. The method of claim 18 wherein the step of segmenting said frame
comprises the steps of:
(a) dividing the frame into a plurality of smaller regions of predetermined
shape and size to form a first segmentation level;
(b) assigning to each of said plurality of smaller regions an initial
motion vector;
(c) performing for each of said plurality of smaller regions a motion
vector updating routine which updates the motion vector of a smaller
region by assigning to it a best motion vector selected from among the
initial motion vector assigned to the smaller region, an updated motion
vector obtained by performing a block matching technique for the smaller
region, and motion vectors of the smaller region's neighboring regions,
wherein the best motion vector is selected according to a priority scheme
and a predetermined threshold value;
(d) dividing each smaller region in the previous segmentation level into a
plurality of smaller regions of predetermined shape and size to form a
subsequent segmentation level;
(e) assigning to each of the plurality of smaller regions in the subsequent
segmentation level an initial motion vector equal to the motion vector of
its parent region;
(f) performing the motion vector updating routine for each of said
plurality of smaller regions in the subsequent segmentation level;
(g) iteratively performing the steps (d), (e) and (f) until a stop
condition is reached;
(h) merging adjacent smaller regions having similar motion vectors to form
said plurality of arbitrarily-shaped regions.
20. The method of claim 19 wherein the motion vector updating routine
comprises the steps of:
determining the smallest matching error from among the matching errors
obtained respectively by assigning to the smaller region the following
motion vectors:
(a) the initial motion vector assigned to the smaller region;
(b) the updated motion vector obtained by performing a block matching
technique for the smaller region; and
(c) the motion vectors of the smaller region's neighboring regions;
selecting the initial motion vector as the best motion vector if the
absolute value of the difference between the smallest matching error and
the matching error obtained by using the initial motion vector is less
than the predetermined threshold value;
selecting the motion vector of one of the neighboring regions as the best
motion vector if:
(a) the absolute value of the difference between the smallest matching
error and the matching error obtained by using the initial motion vector
is not less than the predetermined threshold value; and
(b) the absolute value of the difference between the smallest matching
error and the matching error obtained by assigning to the smaller region
the motion vector of the neighboring region is less than the predetermined
threshold value; and
selecting the matched motion vector as the best motion vector if:
(a) the absolute value of the difference between the smallest matching
error and the matching error obtained by using the initial motion vector
is not less than the predetermined threshold value; and
(b) the absolute value of the difference between the smallest matching
error and each of the matching errors obtained by assigning to the smaller
region the motion vector of one of the neighboring region is not less than
the predetermined threshold value.
21. The method of claim 20 wherein the steps of encoding the prediction
error and encoding the intensity of each pel comprise the steps of:
generating original pel values by:
(a) circumscribing said arbitrarily-shaped region with a rectangular region
block, thereby creating an internal pel set which lies within said
arbitrarily-shaped image and within said region block, and an external pel
set which lies outside said arbitrarily-shaped region and within said
region block; and,
(b) initializing pel values of said external pel set by extrapolating the
pel values of said internal pel set; and
calculating optimal transform coefficients (OTC) by:
(a) performing a forward transform on said region block to generate
transform coefficients;
(b) generating a transform coefficient set (TCS) from said transform
coefficients;
(c) performing an inverse transform on said TCS thereby generating a
computed region block having computed pel values;
(d) replacing those computed pel values corresponding to said internal pel
set with original pel values to form a modified computed region block
(MCRB);
(e) determining whether said TCS represents said OTC;
(f) reiterating steps (a) and (b) on said modified computed region block
and outputting said TCS when said TCS represents said OTC; and,
(g) reiterating steps (a) through (g) on said modified computed region
block when said TCS values do not represent said OTC.
22. The method of claim 21 wherein said step of performing a forward
transform comprises the step of performing a discrete cosine transform
(DCT).
23. The method of claim 22 wherein the step of generating a TCS comprises
the step of quantizing said transform coefficients.
24. The method of claim 23 wherein the step of generating said TCS further
comprises the steps of selecting and retaining those transform
coefficients which have high energy according to the energy compaction
property of transform coefficients, and zeroing the non-selected transform
coefficients.
25. The method of claim 18 wherein the steps of encoding the prediction
error and encoding the intensity of each pel comprise the steps of:
generating original pel values by:
(a) circumscribing said arbitrarily-shaped region with a rectangular region
block, thereby creating an internal pel set which lies within said
arbitrarily-shaped image and within said region block, and an external pel
set which lies outside said arbitrarily-shaped region and within said
region block; and,
(b) initializing pel values of said external pel set by extrapolating the
pel values of said internal pel set; and
calculating optimal transform coefficients (OTC) by:
(a) performing a forward transform on said region block to generate
transform coefficients;
(b) generating a transform coefficient set (TCS) from said transform
coefficients;
(c) performing an inverse transform on said TCS thereby generating a
computed region block having computed pel values;
(d) replacing those computed pel values corresponding to said internal pel
set with original pel values to form a modified computed region block
(MCRB);
(e) determining whether said TCS represents said OTC;
(f) reiterating steps (a) and (b) on said modified computed region block
and outputting said TCS when said TCS represents said OTC; and,
(g) reiterating steps (a) through (g) on said modified computed region
block when said TCS values do not represent said OTC.
26. The method of claim 25 wherein said step of performing a forward
transform comprises the step of performing a discrete cosine transform
(DCT).
27. The method of claim 26 wherein the step of generating a TCS comprises
the step of quantizing said transform coefficients.
28. The method of claim 27 further comprising the step of deciding whether
to skip said frame where said step of deciding depends upon the fullness
of a buffer which serves as an interface to said decoder.
29. The method of claim 28 wherein the step of determining in which of a
plurality of modes image data of each arbitrarily-shaped region is to be
encoded is based upon the values of the following normalized sums of
absolute differences for the particular region:
##EQU3##
where N is the total number of pels in the particular region, i is a given
pel in the region, R is the set of all pels in the particular region,
I.sub.i is the intensity of the given pel i, m is the mean value of the
intensity of all the pels in the particular region, and e.sub.i designates
the motion compensated prediction error associated with the given pel i.
30. The method of claim 29 wherein image data of the particular region is
encoded in said first mode if the value of NSAD.sub.P is less than a
threshold value c and in said second mode when the value of NSAD.sub.P
exceeds the threshold value c and is less than the value of NSAD.sub.I by
at least a threshold value b.
31. The method of claim 30 wherein said threshold values b and c depend
upon the fullness of a buffer which serves as an interface to said
decoder.
32. The method of claim 24 wherein the number of selected transform
coefficients is based upon the size of the particular arbitrarily-shaped
region being encoded.
33. The method of claim 23 wherein the step of quantizing said transform
coefficients uses a quantization step size which depends upon the fullness
of a buffer which serves as an interface to a decoder.
34. A method of encoding a frame in a video sequence comprising the steps
of:
(a) segmenting the frame into a plurality of arbitrarily-shaped regions
which may have different dimensions, each having a corresponding motion
vector;
(b) encoding the shape of each arbitrarily-shaped region;
(c) determining in which of a plurality of modes image data of each
arbitrarily-shaped region is to be encoded, where said plurality of modes
includes a first mode in which the motion vector corresponding to an
arbitrarily-shaped region is encoded, a second mode in which the motion
vector and a motion compensated prediction error associated with an
arbitrarily-shaped region are encoded, and a third intra-frame mode in
which the intensity of each pel in an arbitrarily-shaped region is
encoded;
(d) encoding the mode in which each of said plurality of arbitrarily-shaped
regions is to be encoded;
(e) encoding the motion vector corresponding to one of said plurality of
arbitrarily-shaped regions if the region is to be encoded in either said
first mode or said second mode;
(f) encoding a motion compensated prediction error associated with one of
said plurality of arbitrarily-shaped regions if the region is to be
encoded in said second mode;
(g) encoding the intensity of each pel in one of said plurality of
arbitrarily-shaped regions if the region is to be encoded in said third
mode; and
(h) transmitting information encoded in steps (b), (d), (e), (f) and (g) to
a decoder.
35. The method of claim 34 wherein the step of segmenting said frame
comprises the steps of:
(a) dividing the frame into a plurality of smaller regions of predetermined
shape and size to form a first segmentation level;
(b) assigning to each of said plurality of smaller regions an initial
motion vector;
(c) performing for each of said plurality of smaller regions a motion
vector updating routine which updates the motion vector of a smaller
region by assigning to it a best motion vector selected from among the
initial motion vector assigned to the smaller region, an updated motion
vector obtained by performing a block matching technique for the smaller
region, and motion vectors of the smaller region's neighboring regions,
wherein the best motion vector is selected according to a priority scheme
and a predetermined threshold value;
(d) dividing each smaller region in the previous segmentation level into a
plurality of smaller regions of predetermined shape and size to form a
subsequent segmentation level;
(e) assigning to each of the plurality of smaller regions in the subsequent
segmentation level an initial motion vector equal to the motion vector of
its parent region;
(f) performing the motion vector updating routine for each of said
plurality of smaller regions in the subsequent segmentation level;
(g) iteratively performing the steps (d), (e) and (f) until a stop
condition is reached;
(h) merging adjacent smaller regions having similar motion vectors to form
said plurality of arbitrarily-shaped regions.
36. The method of claim 35 wherein the motion vector updating routine
comprises the steps of:
determining the smallest matching error from among the matching errors
obtained respectively by assigning to the smaller region the following
motion vectors:
(a) the initial motion vector assigned to the smaller region;
(b) the updated motion vector obtained by performing a block matching
technique for the smaller region; and
(c) the motion vectors of the smaller region's neighboring regions;
selecting the initial motion vector as the best motion vector if the
absolute value of the difference between the smallest matching error and
the matching error obtained by using the initial motion vector is less
than the predetermined threshold value;
selecting the motion vector of one of the neighboring regions as the best
motion vector if:
(a) the absolute value of the difference between the smallest matching
error and the matching error obtained by using the initial motion vector
is not less than the predetermined threshold value; and
(b) the absolute value of the difference between the smallest matching
error and the matching error obtained by assigning to the smaller region
the motion vector of the neighboring region is less than the predetermined
threshold value; and
selecting the matched motion vector as the best motion vector if:
(a) the absolute value of the difference between the smallest matching
error and the matching error obtained by using the initial motion vector
is not less than the predetermined threshold value; and
(b) the absolute value of the difference between the smallest matching
error and each of the matching errors obtained by assigning to the smaller
region the motion vector of one of the neighboring region is not less than
the predetermined threshold value.
37. The method of claim 36 wherein the steps of encoding the prediction
error and encoding the intensity of each pel comprise the steps of:
generating original pel values by:
(a) circumscribing said arbitrarily-shaped region with a rectangular region
block, thereby creating an internal pel set which lies within said
arbitrarily-shaped image and within said region block, and an external pel
set which lies outside said arbitrarily-shaped region and within said
region block; and,
(b) initializing pel values of said external pel set by extrapolating the
pel values of said internal pel set; and
calculating optimal transform coefficients (OTC) by:
(a) performing a forward transform on said region block to generate
transform coefficients;
(b) generating a transform coefficient set (TCS) from said transform
coefficients;
(c) performing an inverse transform on said TCS thereby generating a
computed region block having computed pel values;
(d) replacing those computed pel values corresponding to said internal pel
set with original pel values to form a modified computed region block
(MCRB);
(e) determining whether said TCS represents said OTC;
(f) reiterating steps (a) and (b) on said modified computed region block
and outputting said TCS when said TCS represents said OTC; and,
(g) reiterating steps (a) through (g) on said modified computed region
block when said TCS values do not represent said OTC.
38. The method of claim 37 wherein said step of performing a forward
transform comprises the step of performing a discrete cosine transform
(DCT).
39. The method of claim 38 wherein the step of generating a TCS comprises
the step of quantizing said transform coefficients.
40. The method of claim 39 wherein the step of generating said TCS further
comprises the steps of selecting and retaining those transform
coefficients which have high energy according to the energy compaction
property of transform coefficients, and zeroing the non-selected transform
coefficients.
41. The method of claim 34 wherein the steps of encoding the prediction
error and encoding the intensity of each pel comprise the steps of:
generating original pel values by:
(a) circumscribing said arbitrarily-shaped region with a rectangular region
block, thereby creating an internal pel set which lies within said
arbitrarily-shaped image and within said region block, and an external pel
set which lies outside said arbitrarily-shaped region and within said
region block; and,
(b) initializing pel values of said external pel set by extrapolating the
pel values of said internal pel set; and
calculating optimal transform coefficients (OTC) by:
(a) performing a forward transform on said region block to generate
transform coefficients;
(b) generating a transform coefficient set (TCS) from said transform
coefficients;
(c) performing an inverse transform on said TCS thereby generating a
computed region block having computed pel values;
(d) replacing those computed pel values corresponding to said internal pel
set with original pel values to form a modified computed region block
(MCRB);
(e) determining whether said TCS represents said OTC;
(f) reiterating steps (a) and (b) on said modified computed region block
and outputting said TCS when said TCS represents said OTC; and,
(g) reiterating steps (a) through (g) on said modified computed region
block when said TCS values do not represent said OTC.
42. The method of claim 41 wherein said step of performing a forward
transform comprises the step of performing a discrete cosine transform
(DCT).
43. The method of claim 42 wherein the step of generating a TCS comprises
the step of quantizing said transform coefficients.
44. The method of claim 43 further comprising the step of deciding whether
to skip said frame where said step of deciding depends upon the fullness
of a buffer which serves as an interface to said decoder.
45. The method of claim 44 wherein the step of determining in which of a
plurality of modes image data of each arbitrarily-shaped region is to be
encoded is based upon the values of the following normalized sums of
absolute differences for the particular region:
##EQU4##
where N is the total number of pels in the particular region, i is a given
pel in the region, R is the set of all pels in the particular region,
I.sub.i is the intensity of the given pel i, m is the mean value of the
intensity of all the pels in the particular region, and e.sub.i designates
the motion compensated prediction error associated with the given pel i.
46. The method of claim 45 wherein image data of the particular region is
encoded in said first mode if the value of NSAD.sub.P is less than a
threshold value c and in said second mode when the value of NSAD.sub.P
exceeds the threshold value c and is less than the value of NSAD.sub.I by
at least a threshold value b.
47. The method of claim 46 wherein said threshold values b and c depend
upon the fullness of a buffer which serves as an interface to said
decoder.
48. The method of claim 40 wherein the number of selected transform
coefficients is based upon the size of the particular arbitrarily-shaped
region being encoded.
49. The method of claim 39 wherein the step of quantizing said transform
coefficients uses a quantization step size which depends upon the fullness
of a buffer which serves as an interface to a decoder. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
FIELD OF THE INVENTION
The present invention relates generally to a method and apparatus for
coding a video sequence, and, in particular, to a region-oriented method
and apparatus for coding arbitrarily-shaped regions of video images.
BACKGROUND OF THE INVENTION
Many approaches to encoding a sequence of digital video images are known in
the art. One classical approach is to divide each frame in the sequence
into square blocks of predetermined size also known as macroblocks. Each
macroblock is then assigned a motion vector relative to a previous decoded
frame, where the motion vector represents the offset between the current
macroblock and a block of pixels of the same size in a previous
reconstructed frame that forms a best match. The motion vector is
transmitted to a decoder which can then reconstruct the current frame
based upon the previous decoded frame, the motion vector and a prediction
error. Block-based techniques, however, can lead to distortions such as
blocking and mosquito effects in low bit-rate applications.
A more complex object-oriented, or region-oriented, approach encodes
arbitrarily-shaped regions instead of rectangular or square blocks. While
block-oriented coding techniques typically transmit two parameter sets,
specifically the motion and color of each block, an object-oriented
approach requires that the shape of each region be transmitted as well in
order to allow reconstruction of the image. For example, in M. Hotter,
"Object-Oriented Analysis-Synthesis Coding Based On Moving Two-Dimensional
Objects," Signal Processing: Image Communication, Vol. 2, pp. 409-428
(1990), an encoder which encodes arbitrarily-shaped regions is presented,
where objects are described by three parameter sets defining their motion,
shape and color. A priority control determines in which of two modes the
coded information will be sent based upon the success or failure of the
motion estimation technique for a particular region. The shape coding
technique considered in the aforementioned article approximates the shape
of each region by a combination of polygon and spline representation of
the shape. U.S. Pat. No. 5,295,201 also discloses an object-oriented
encoder which includes an apparatus for approximating the shape of an
arbitrarily-shaped region to a polygon. The vertices of the polygon are
determined, and the coordinate values of the vertices are calculated and
transmitted.
One color coding technique for use in object-oriented approaches is
presented in Gilge et al., "Coding of Arbitrarily Shaped Image Segments
Based On A Generalized Orthogonal Transform," Signal Processing: Image
Communication, Vol. 1, pp. 153-180 (1989). According to the technique
disclosed in this article, an intensity function inside each region is
approximated by a weighted sum of basis functions which are orthogonal
with respect to the shape of the region to be coded. While this technique
may be theoretically useful, it is not practicable for implementation in a
real-time system.
Due to the potential advantages of an object-oriented approach, there
exists a need for an object-oriented encoder which provides powerful
schemes for segmenting an image into arbitrarily-shaped regions, each of
which has a corresponding motion vector, and for representing the segment
content in a manner which can be readily implemented for use in real-time.
It is also desirable to have an encoder which can encode a generic scene
or sequence of images, the content of which is not known beforehand, in
contrast to the requirements of the prior art. It is further desirable to
provide an encoder which permits additional functionalities, such as
tracking objects moving from one area of a scene to another between images
in a sequence.
SUMMARY OF THE INVENTION
The present invention discloses an encoder for encoding a sequence of video
frames. The encoder comprises a segmentation unit which segments a current
frame in the video sequence into a plurality of arbitrarily-shaped
regions, where each of the plurality of arbitrarily-shaped regions is
assigned a motion vector. The encoder also has a decoded frame memory for
storing a previously decoded frame in the video sequence and a prediction
unit connected to the segmentation unit and the decoded frame memory for
predicting image data of the current frame based upon a previously decoded
frame and based upon the motion vector assigned to one of the plurality of
arbitrarily-shaped regions. The encoder further comprises a region shape
coding unit for encoding the shape of each of the arbitrarily-shaped
regions.
A mode decision unit determines in which one of a plurality of modes image
data from each of the plurality of arbitrarily-shaped regions is to be
encoded. The plurality of modes comprises a first inter-frame mode in
which a motion vector associated with one of the plurality of
arbitrarily-shaped regions is encoded and a second inter-frame mode in
which a motion vector and a motion compensated prediction error associated
with one of the plurality of arbitrarily-shaped regions are encoded. A
third mode is an intra-frame mode in which the intensity of each pel in
one of the plurality of arbitrarily-shaped regions is encoded. A mode
coding unit then encodes the mode in which each of the plurality of
arbitrarily-shaped regions is to be encoded.
The encoder includes a motion coding unit for encoding motion vectors
associated with the plurality of arbitrarily-shaped regions. In addition,
the encoder comprises a region interior coder which encodes a motion
compensated prediction error associated with one of the plurality of
arbitrarily-shaped regions if the region is to be encoded in the se | | |