Efficient digital compression of 3D/stereoscopic video is achieved by a novel technique in which the various views forming 3D/stereoscopic video are coded by utilizing the redundancies among the views. Coding is performed in a manner compatible with existing equipment to allow decoding of one layer of video for display on normal (i.e., monoscopic) displays. The motion compensated discrete cosine transform ("DCT") coding framework of existing standards such as the Moving Picture Experts Group-Phase 2 ("MPEG-2") video standard is exploited, and when necessary extended, to result in highly efficient, yet practical, coding schemes. In contrast with known techniques of encoding the two views forming stereoscopic video, which rely on the use of a disparity estimate between the two views (where one of the views is the reference, coded by itself, and the other is disparity compensated predicted and coded with respect to the reference view), the present invention utilizes one disparity estimate and one motion compensated estimate. Two novel methods for combining these estimates for prediction are provided. The first method provides an average between the two estimates; the second method allows a choice between various combinations resulting from prespecified weightings applied to the estimates. Such a technique represents a significant improvement over known techniques in achieving high-efficiency digital compression of 3D/stereoscopic video, and is fully compatible with existing video compression standards. Furthermore, although digital broadcast service for 3D/stereoscopic television can be realized by the practice of the invention immediately, full compatibility with normal video displays is provided, allowing gradual introduction of high quality stereoscopic displays in the future.
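The two combination methods described above can be illustrated with a minimal sketch. It assumes prediction blocks are equally sized lists of rows of 8-bit samples; the function names, the weight set (0, 0.5, 1), and the use of the sum of absolute differences as the selection criterion are illustrative assumptions, not details taken from the abstract.

```python
def combine_predictions(disparity_pred, motion_pred, w_disparity=0.5):
    """Blend a disparity-compensated and a motion-compensated prediction block.

    w_disparity = 0.5 gives the plain average (first method); other values
    model a prespecified weighting (second method). Names are hypothetical.
    """
    if not 0.0 <= w_disparity <= 1.0:
        raise ValueError("w_disparity must lie in [0, 1]")
    w_motion = 1.0 - w_disparity
    return [
        [int(w_disparity * d + w_motion * m + 0.5) for d, m in zip(d_row, m_row)]
        for d_row, m_row in zip(disparity_pred, motion_pred)
    ]


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - b) for ra, rb in zip(block, pred) for a, b in zip(ra, rb))


def choose_weighting(target, disparity_pred, motion_pred, weights=(0.0, 0.5, 1.0)):
    """Pick the prespecified weight whose blended prediction best matches target.

    The SAD criterion and the weight set are assumptions for illustration.
    """
    return min(
        weights,
        key=lambda w: sad(target, combine_predictions(disparity_pred, motion_pred, w)),
    )
```

In this sketch an encoder would call `choose_weighting` per block and signal the selected weight to the decoder, which then repeats `combine_predictions` with the same weight.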
A corresponding MB2 for the left sequence is determined from the parallax vector of a coding-object MB1 of the frame Bi for the right sequence. The motion vector 2 of the corresponding MB2 is used as an origin of the search area for finding the motion vector of the coding-object MB1. As an alternative, the motion vector 3 of MB3 adjacent to the coding-object MB1 is used as an origin of the search area for finding a motion vector of the coding-object MB1. As another alternative, the motion vector 2 or 3, whichever is higher in utility evaluation, is used as an origin. As a result, both the search area and the circuit size for detecting a motion vector can be reduced. This configuration provides a stereo video data motion vector coding apparatus small in circuit size and capable of detecting a motion vector with high accuracy.
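The idea of seeding the motion search with an already known vector can be sketched as follows. The sketch assumes frames are lists of rows of samples, uses the sum of absolute differences as the matching cost, and interprets "higher in utility evaluation" as "lower matching cost at the candidate vector"; all three are assumptions, and the function names are hypothetical.

```python
def block_sad(cur, ref, bx, by, size, mv):
    """SAD between the block at (bx, by) in cur and the block displaced by mv in ref."""
    dx, dy = mv
    h, w = len(ref), len(ref[0])
    # Vectors that point outside the reference frame are treated as unusable.
    if bx + dx < 0 or by + dy < 0 or bx + dx + size > w or by + dy + size > h:
        return float("inf")
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(size)
        for i in range(size)
    )


def pick_origin(cur, ref, bx, by, size, candidates):
    """Choose the candidate vector (e.g. from MB2 or MB3) with the lowest cost."""
    return min(candidates, key=lambda mv: block_sad(cur, ref, bx, by, size, mv))


def search_around(cur, ref, bx, by, size, origin, radius=1):
    """Full search in a small window centred on the chosen origin vector."""
    ox, oy = origin
    window = [
        (ox + dx, oy + dy)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    ]
    return min(window, key=lambda mv: block_sad(cur, ref, bx, by, size, mv))
```

Because the window is centred on a vector that is already close to the true motion, a small `radius` can suffice, which is the source of the reduced search area mentioned above.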
In a stereoscopic video transmission system, video pictures of lower and enhancement layers are transmitted in a particular order such that the number of pictures which must be temporarily stored prior to presentation is minimized. Furthermore, a decode time stamp (DTS) and presentation time stamp (PTS) for each picture can be determined to provide synchronization between the lower layer and enhancement layer pictures. Decoding may occur either sequentially or in parallel. In particular, a method is presented where the enhancement layer includes disparity-predicted pictures which are predicted using corresponding lower layer pictures. The video pictures are ordered such that the disparity-predicted enhancement layer pictures are transmitted after the corresponding respective lower layer pictures. The scheme is illustrated with a number of different specific examples.
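One simple ordering that satisfies the stated constraint (each disparity-predicted enhancement layer picture follows its lower layer picture) is a strict interleave. The sketch below is only an illustration under that assumption; the picture labels and function names are hypothetical, and the specific orderings of the abstract's examples are not reproduced here.

```python
def interleave_layers(lower, enhancement):
    """Return a transmission order where each enhancement picture directly
    follows the lower layer picture it is predicted from.

    lower[i] is assumed to be the reference for enhancement[i].
    """
    if len(lower) != len(enhancement):
        raise ValueError("layers must contain the same number of pictures")
    order = []
    for lo, en in zip(lower, enhancement):
        order.append(("lower", lo))
        order.append(("enhancement", en))
    return order


def respects_dependency(order, lower, enhancement):
    """Check that every enhancement picture appears after its lower layer picture."""
    position = {item: idx for idx, item in enumerate(order)}
    return all(
        position[("lower", lo)] < position[("enhancement", en)]
        for lo, en in zip(lower, enhancement)
    )
```

With this interleave, a decoder needs to hold only the most recent lower layer picture while decoding the enhancement picture that depends on it.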
A motion compensation adder for increasing a motion compensation processing speed is provided in a microprocessor having a multiply-accumulate instruction. A pixel value of a predicted picture, which is expressed as an unsigned value, is loaded into a register, and the most significant bit is inverted to format-convert the pixel value to a signed value with a -128 offset. When the hexadecimal constant 0x01000000 as a multiplicand, a signed error value as a multiplier, and the format-converted pixel value of the predicted picture stored in the most significant byte of the register as an addition value are supplied to a multiply-accumulate instruction having a clipping function, the instruction performs both the addition of the predicted pixel value and the error value and the clipping needed for motion compensation adding in a single instruction.
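The arithmetic behind this trick can be checked with a small simulation. The sketch assumes a 32-bit accumulator that saturates to the signed 32-bit range and assumes the lower three bytes of the addend register are zero; `saturating_mac` and `mc_add` are hypothetical names that model the instruction, not a real instruction set.

```python
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def saturating_mac(multiplicand, multiplier, addend):
    """Model of a multiply-accumulate with clipping to signed 32 bits (assumed)."""
    total = multiplicand * multiplier + addend
    return max(INT32_MIN, min(INT32_MAX, total))


def mc_add(pred_pixel, error):
    """Add a signed error to an unsigned 8-bit predicted pixel with clipping to 0..255."""
    # Inverting the most significant bit maps 0..255 onto -128..127.
    flipped = pred_pixel ^ 0x80
    signed_pred = flipped - 256 if flipped & 0x80 else flipped
    # Place the signed pixel in the most significant byte of the register.
    addend = signed_pred << 24
    # error * 0x01000000 also lands in the most significant byte, so the
    # 32-bit saturation acts as an 8-bit clip on the sum.
    acc = saturating_mac(0x01000000, error, addend)
    top_byte = (acc >> 24) & 0xFF
    # Inverting the most significant bit again removes the -128 offset.
    return top_byte ^ 0x80
```

For example, `mc_add(250, 10)` saturates at the positive limit and yields 255, while `mc_add(5, -10)` saturates at the negative limit and yields 0.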
An apparatus and a method for converting a two-dimensional image sequence into a three-dimensional image using a conversion of a motion disparity into a horizontal disparity and a post-processing method during the generation of a three-dimensional image are provided. The apparatus for converting a two-dimensional image sequence into a three-dimensional image according to the present invention includes a block motion measuring portion for measuring a motion vector for each block of a current image divided into blocks having a predetermined size using a previous image frame, a horizontal disparity generating portion for obtaining a horizontal disparity from the motion vector of each block according to the motion characteristic of the current image, an image generating portion for moving each block in a horizontal direction according to each horizontal disparity and generating a composite image corresponding to the current image, and an outputting portion for displaying a three-dimensional image formed of the current image and the composite image.
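The block-shifting step can be sketched roughly as follows. The abstract does not say how the motion vector is mapped to a horizontal disparity, so this sketch assumes the rounded vector magnitude, capped at `max_disparity`; that mapping, the hole handling (unfilled pixels keep the current image's values), and all names are assumptions for illustration only.

```python
import math


def horizontal_disparity(mv, max_disparity=4):
    """Map a block motion vector to a horizontal disparity (assumed mapping)."""
    mx, my = mv
    return min(max_disparity, round(math.hypot(mx, my)))


def synthesize_view(image, block_mvs, block_size, max_disparity=4):
    """Shift each block horizontally by its disparity to form a composite image.

    image: list of rows of samples.
    block_mvs: dict mapping a block's top-left (x, y) to its motion vector.
    """
    h, w = len(image), len(image[0])
    # Start from a copy so pixels not covered by a shifted block stay defined.
    out = [row[:] for row in image]
    for (bx, by), mv in block_mvs.items():
        d = horizontal_disparity(mv, max_disparity)
        for y in range(by, min(by + block_size, h)):
            for x in range(bx, min(bx + block_size, w)):
                tx = x + d
                if tx < w:  # drop samples shifted past the right edge
                    out[y][tx] = image[y][x]
    return out
```

The current image and the composite returned by `synthesize_view` would then serve as the two views of the stereoscopic pair.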
In a stereoscopic video transmission system, where an enhancement layer image is disparity predicted using a lower layer image, the lower layer image is made to more closely match the enhancement layer image by shifting the lower layer image to the right to compensate for inter-ocular camera lens separation. The motion vector search range for disparity prediction is reduced to improve coding efficiency. At an encoder, the optimal offset, x, between the enhancement layer image and the lower layer image is determined according to either a minimum mean error or a minimum mean squared error between the enhancement and lower layer images. The offset x is bounded by an offset search range X. The x rightmost pixel columns of the lower layer image are deleted, and the x leftmost columns of the lower layer image are padded to effectively shift the lower layer image to the right by x pixels to obtain the reference image for use in disparity predicting the enhancement layer image. For arbitrarily shaped images such as VOPs within a frame, the leftmost portion is deleted and the rightmost portion is padded. At a decoder, the offset value x is recovered if available and used to reconstruct the reference frame.
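The offset search and the shift for rectangular frames can be sketched as below. The sketch assumes images are lists of rows of samples, evaluates the mean squared error only over the columns that overlap after the shift, and pads by replicating the leftmost pixel of each row; these choices and the function names are assumptions, since the abstract does not fix them.

```python
def overlap_mse(enh, lower, x):
    """Mean squared error between enh and lower shifted right by x columns,
    computed over the overlapping columns only (an assumption)."""
    width = len(lower[0])
    total, count = 0, 0
    for e_row, l_row in zip(enh, lower):
        for col in range(x, width):
            diff = e_row[col] - l_row[col - x]
            total += diff * diff
            count += 1
    return total / count


def best_offset(enh, lower, search_range):
    """Find the offset x in 0..search_range that minimizes the overlap MSE."""
    return min(range(search_range + 1), key=lambda x: overlap_mse(enh, lower, x))


def shift_right(image, x):
    """Delete the x rightmost columns and pad x columns on the left.

    Padding replicates each row's leftmost pixel (assumed padding rule).
    """
    return [[row[0]] * x + row[: len(row) - x] for row in image]
```

An encoder following this sketch would call `best_offset` once per frame, build the reference with `shift_right`, and transmit x so that the decoder can apply the same shift.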