Efficient digital compression of 3D/stereoscopic video is achieved by a novel technique in which the various views forming 3D/stereoscopic video are coded by utilizing the redundancies among the views. Coding is performed in a manner compatible with existing equipment to allow decoding of one layer of video for display on normal (i.e., monoscopic) displays. The motion compensated discrete cosine transform ("DCT") coding framework of existing standards such as the Moving Picture Experts Group-Phase 2 ("MPEG-2") video standard is exploited, and when necessary extended, to result in highly efficient, yet practical, coding schemes. In contrast with known techniques of encoding the two views forming stereoscopic video, which rely on a single disparity estimate between the two views (where one of the views is the reference, coded by itself, and the other is disparity compensated predicted and coded with respect to the reference view), the present techniques utilize two disparity estimates: one that allows forward prediction and another that allows backward prediction with respect to the reference view.
Methods and systems are provided for efficient video compression by recording various state signals of cameras. In accordance with the teaching of the present invention, a video camera with means to record its movement, zooming state, focus state, and aperture state is provided. The luminous intensity, camera identification number, and frame index are also recorded. These state signals are recorded along with video and audio signals on recording media, such as magnetic tapes, memory cards, and hard drives, in a predetermined data format. Additionally, video compression algorithms are provided that utilize such state signals to predict the current frame from previously reconstructed images. In particular, the information on the various states of the video camera is useful in obtaining accurate motion compensated images.
In the present invention, two-dimensional images are converted into three-dimensional images by producing from a two-dimensional image signal a main image signal and a sub-image signal delayed from the main image signal. A field delay, indicating how many fields there are from the field corresponding to the main image signal to the field corresponding to the sub-image signal, is changed depending on the speed of the horizontal movement of the main image signal. The upper limit of the field delay is determined on the basis of the vertical components of motion vectors detected from the main image signal. The field delay is determined so that it does not exceed that upper limit.
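The limiting step described above can be sketched as follows. This is a minimal illustration only: the abstract does not state how horizontal speed maps to a delay or how the vertical components map to an upper limit, so both mappings are passed in as functions. The names `clamp_field_delay`, `example_delay_from_speed`, and `example_limit_from_vertical`, and every numeric threshold, are assumptions made for this sketch.

```python
def clamp_field_delay(horizontal_speed, vertical_components,
                      delay_from_speed, limit_from_vertical):
    """Return a field delay that never exceeds the vertical-motion limit."""
    requested = delay_from_speed(horizontal_speed)
    upper_limit = limit_from_vertical(vertical_components)
    return min(requested, upper_limit)


def example_delay_from_speed(speed):
    # Hypothetical mapping (not from the source): slower horizontal
    # motion gets a longer delay, capped at 6 fields.
    magnitude = abs(speed)
    if magnitude == 0:
        return 0
    return min(6, max(1, round(8 / magnitude)))


def example_limit_from_vertical(vertical_components):
    # Hypothetical mapping (not from the source): larger mean vertical
    # motion gives a tighter upper limit on the delay.
    if not vertical_components:
        return 6
    mean_abs = sum(abs(v) for v in vertical_components) / len(vertical_components)
    if mean_abs < 1:
        return 6
    if mean_abs < 3:
        return 3
    return 1
```

With these illustrative mappings, a horizontal speed of 2 requests a delay of 4 fields; that request is kept when the vertical components are all zero and is reduced to 1 field when the vertical components are large.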
In a stereoscopic video transmission system, video pictures of lower and enhancement layers are transmitted in a particular order such that the number of pictures which must be temporarily stored prior to presentation is minimized. Furthermore, a decode time stamp (DTS) and presentation time stamp (PTS) for each picture can be determined to provide synchronization between the lower layer and enhancement layer pictures. Decoding may occur either sequentially or in parallel. In particular, a method is presented where the enhancement layer includes disparity-predicted pictures which are predicted using corresponding lower layer pictures. The video pictures are ordered such that the disparity-predicted enhancement layer pictures are transmitted after the corresponding respective lower layer pictures. The scheme is illustrated with a number of different specific examples.
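The ordering constraint above can be illustrated with a small sketch. It assumes that each enhancement layer picture is disparity-predicted from the lower layer picture carrying the same identifier, and it uses the simplest ordering that satisfies the constraint (each enhancement picture immediately follows its lower layer picture). The specific orderings of the original examples are not reproduced here, and the function names are hypothetical.

```python
def interleave_layers(lower_order):
    """Build a transmission order from lower-layer picture ids.

    lower_order lists picture ids in lower-layer transmission order.
    Each enhancement picture is placed right after the lower layer
    picture it is assumed to be predicted from.
    """
    stream = []
    for picture_id in lower_order:
        stream.append(("lower", picture_id))
        stream.append(("enhancement", picture_id))
    return stream


def satisfies_ordering(stream):
    """Check that no enhancement picture precedes its lower layer picture."""
    seen_lower = set()
    for layer, picture_id in stream:
        if layer == "lower":
            seen_lower.add(picture_id)
        elif picture_id not in seen_lower:
            return False
    return True
```

For a lower layer sent in the order 0, 3, 1, 2, `interleave_layers([0, 3, 1, 2])` places each enhancement picture directly behind its reference, so a decoder needs to hold only that one lower layer picture to decode the enhancement picture.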
In a stereoscopic video transmission system, where an enhancement layer image is disparity predicted using a lower layer image, the lower layer image is made to more closely match the enhancement layer image by shifting the lower layer image to the right to compensate for inter-ocular camera lens separation. The motion vector search range for disparity prediction is reduced to improve coding efficiency. At an encoder, the optimal offset, x, between the enhancement layer image and the lower layer image is determined according to either a minimum mean error or a minimum mean squared error between the enhancement and lower layer images. The offset x is bounded by an offset search range X. The x rightmost pixel columns of the lower layer image are deleted, and the x leftmost columns of the lower layer image are padded to effectively shift the lower layer image to the right by x pixels to obtain the reference image for use in disparity predicting the enhancement layer image. For arbitrarily shaped images such as VOPs within a frame, the leftmost portion is deleted and the rightmost portion is padded. At a decoder, the offset value x is recovered if available and used to reconstruct the reference frame.
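The encoder-side offset search for rectangular frames can be sketched as below. This is an illustration under stated assumptions, not the claimed implementation: images are lists of pixel rows, padding replicates the leftmost pixel of each row (the abstract does not say how padding is done), the error is taken over the whole shifted frame, and the minimum mean squared error criterion is used. The names `shift_right`, `mean_squared_error`, and `best_offset` are hypothetical, and the arbitrarily shaped (VOP) case is not covered.

```python
def shift_right(image, x):
    """Drop the x rightmost columns and pad the x leftmost columns.

    Padding by repeating each row's first pixel is an assumption.
    """
    if x == 0:
        return [row[:] for row in image]
    return [[row[0]] * x + row[:len(row) - x] for row in image]


def mean_squared_error(a, b):
    """Mean squared pixel difference between two equally sized images."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += (pixel_a - pixel_b) ** 2
            count += 1
    return total / count


def best_offset(enhancement, lower, search_range):
    """Return the offset x in 0..search_range with the smallest error."""
    best_x, best_error = 0, None
    for x in range(search_range + 1):
        error = mean_squared_error(enhancement, shift_right(lower, x))
        if best_error is None or error < best_error:
            best_x, best_error = x, error
    return best_x
```

The reference image for disparity prediction would then be `shift_right(lower, best_offset(enhancement, lower, X))`, and a decoder that recovers x can rebuild the same reference with `shift_right` alone.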
In order to improve coding efficiency, a segmentation circuit performs segmentation of an input picture by clustering, referring to distance data calculated from pixel values of the input data, statistical information, and a disparity compensated predictive picture. A statistical information calculator calculates an average and dispersion of pixel values of each segment according to pixel values of the input data and segmentation data from the segmentation circuit. A disparity calculator calculates disparity vectors in the horizontal dimension that minimize the error of the disparity compensated picture, using pixel values of a reference picture, those of the input picture, and the segmentation data. A disparity compensating predictor generates the disparity compensated predictive picture.
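The role of the statistical information calculator can be illustrated with a short sketch that computes the average and dispersion of pixel values per segment from a picture and its segmentation data. It assumes that the segmentation data is a label map of the same size as the picture and that "dispersion" means the population variance; the function name `segment_statistics` is hypothetical.

```python
def segment_statistics(pixels, labels):
    """Return {label: (mean, variance)} for each segment.

    pixels and labels are equally sized lists of rows; labels assigns
    a segment id to every pixel. Population variance is assumed.
    """
    sums = {}
    squares = {}
    counts = {}
    for pixel_row, label_row in zip(pixels, labels):
        for value, label in zip(pixel_row, label_row):
            sums[label] = sums.get(label, 0) + value
            squares[label] = squares.get(label, 0) + value * value
            counts[label] = counts.get(label, 0) + 1

    stats = {}
    for label, count in counts.items():
        mean = sums[label] / count
        variance = squares[label] / count - mean * mean
        stats[label] = (mean, variance)
    return stats
```

For a 2x2 picture `[[1, 3], [10, 10]]` with labels `[[0, 0], [1, 1]]`, segment 0 has mean 2.0 and variance 1.0, and segment 1 has mean 10.0 and variance 0.0.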