Claims
What is claimed is:
1. In a video processing system having a video memory for storing video
data having P rows and Q columns, a method for reading and writing image
data to the video memory comprising the steps of:
logically subdividing the video data of P rows and Q columns into a
plurality of tiles having dimensions of p rows and q columns of image
data; and
mapping a given one of said tiles to a storage memory by placing the image
data located on an even numbered row of said p rows and an even numbered
column of said q columns on a first data bus and placing the image data
located at an odd numbered row of said p rows and at an odd numbered
column of said q columns onto a second data bus.
2. The method of claim 1 further comprising the steps of:
storing the image data placed on said first data bus in a first buffer; and
storing the image data placed on said second data bus in a second buffer.
3. The method of claim 2 further comprising the steps of:
interpolating between a first set of image data stored in said first and
said second buffers and a second set of image data stored in said first
and said second buffers;
realigning, prior to said step of interpolating, said first and said second
sets of data to reconstitute the image data as a first array and a second
array of image data respectively; and
wherein said first and said second arrays have alternating odd and even
rows of image data.
4. The method of claim 1 where said first and said second data bus together
form a video data bus and further comprising the step of configuring the
video data bus to be one of 32 or 16 bits wide.
5. The method of claim 4 wherein said video data bus is configured as a 32
bit data bus and wherein:
the step of logically subdividing the video data into the plurality of
tiles comprises the step of forming tiles having dimensions p=2 and q=2.
6. The method of claim 4 wherein said video data bus is configured as a 16
bit data bus and wherein:
the step of logically subdividing the video data into the plurality of
tiles comprises the step of forming tiles having dimensions p=2 and q=1.
7. The method of claim 1 wherein the step of mapping further comprises the
step of addressing the given tile by a single row and a single column
coordinate.
8. The method of claim 1 wherein the step of mapping further comprises
writing individual ones of the image data in any one of a plurality of
memory devices according to a value stored in a register.
9. The method of claim 1 wherein the step of mapping onto a first and a
second data bus further comprises the step of operating said first and
said second data bus on a first video clock asynchronous with a system
clock.
10. The method of claim 1 further comprising the step of interpolating
between a first set and a second set of image data.
11. The method of claim 1 further comprising the step of refreshing the
video memory.
12. In a video processing system having a video memory for storing image
data having P rows and Q columns, a device for reading and writing data to
the video memory comprising:
means, coupled to a video bus, for mapping a physical image data having P
rows and Q columns logically subdivided into a plurality of tiles having
dimensions of p rows and q columns, to a storage memory, by placing the
image data located on an even numbered row of said p rows and an even
numbered column of said q columns in a first buffer memory; and
means, coupled to said video bus, for mapping said given one of said tiles
to said storage memory by placing the image data located at an odd
numbered row of said p rows and at an odd numbered column of said q
columns into a second buffer memory.
13. The device of claim 12 wherein said video bus comprises a 32 bit video
bus and p=2 and q=2.
14. The device of claim 12 wherein said video bus comprises a 16 bit video
bus and p=2 and q=1.
15. The device of claim 12 further comprising a means, coupled to the video
memory, for refreshing the video memory.
16. The device of claim 12 further comprising a video bus arbiter for
arbitrating access to the video bus.
17. The device of claim 12 further comprising a means for interpolating
between a first set of image data stored in said first and second buffers.
18. The device of claim 12 further comprising a circular buffer coupled to
said first buffer and said second buffer for storing a first set of data
from said first buffer together with a second set of data from said second
buffer as an array of image data having alternating odd and even rows.
19. The device of claim 12, wherein the video memory includes a serial
access video memory and a DRAM, the device further comprising:
a means, coupled to said video bus, for controlling transfers between the
serial access video memory and the DRAM.
20. The device of claim 12, wherein the video memory includes a serial
access video memory and a DRAM, the device further comprising:
means, coupled to the video bus, for controlling transfers between the
serial access video memory and the DRAM.
21. The device of claim 12 wherein said device operates on a video clock
asynchronous from a system clock.
22. The device according to claim 12 further comprising:
a latch having an input coupled to an output of the video memory and an
output coupled to said storage memory; and
a latching signal pin on the device coupled to a CAS signal, wherein the
CAS signal is output from a CAS pin of the device to the video memory and
wherein the latching signal pin forms a latching signal input to said
latch.
23. In a video processing system having a video memory for storing image
data having P rows and Q columns, a device for reading and writing data to
the video memory comprising:
means, coupled to a video bus, for mapping a physical image data having P
rows and Q columns logically subdivided into a plurality of tiles having
dimensions of p rows and q columns, to a storage memory, a given one of
said tiles, by placing the image data located on an even numbered row of
said p rows and an even numbered column of said q columns in a first
buffer memory and
for mapping said given one of said tiles to said storage memory by placing
the image data located at an odd numbered row of said p rows and at an odd
numbered column of said q columns into a second buffer memory;
a circular buffer, coupled to said first and to said second buffer memory
for storing a first set of data from said first buffer together with a
second set of data from said second buffer as an array of image data
having alternating odd and even rows;
an interpolator, coupled to said circular buffer for interpolating the
array of image data;
an arbiter, coupled to the video bus, for controlling access to the video
bus; and
a state machine, coupled to the means for mapping, to said interpolator and
to said arbiter for sequencing operation of said means for mapping, said
interpolator, and said arbiter.
24. The device of claim 23 further comprising a refresh controller, coupled
to the arbiter for refreshing the video memory.
25. A latch device for a DRAM memory interface comprising:
a latching signal pin on the DRAM memory interface coupled to a CAS signal
output from the DRAM memory interface CAS pin to the DRAM memory;
a latch having:
an input coupled to an output of the DRAM memory;
an output coupled to said interface;
a latching signal input coupled to said latching signal pin; and
wherein when said latching signal pin is disabled, said latch is
transparent.
Description
BACKGROUND OF THE INVENTION
The present invention relates to special purpose image compression
coprocessors and in particular to video buffers for transferring video
information between video system components.
To represent a picture image digitally, the image area is described as an
array of pixels. A digital number describes the color, luminance and
chrominance of each pixel. Pixel color information actually consists of
three digital values: one digital value for red, one digital value for
green and one digital value for blue. Thus, the sheer volume of data
needed to describe one single pixel means that digital representations of
complete picture images result in exceptionally large data files.
In full motion video, not only are large blocks of data required to
describe each individual picture image, but a new image or frame must be
presented to the viewer at approximately thirty new images per second to
create the illusion of motion. Moving these large streams of video data
across digital networks or phone lines is simply infeasible given the
available bandwidth.
Data compression is a technique for reducing the number of bits required to
send a given message. Data compression utilizes either a single shorthand
notation to signal a repetitive string of bits or omits data bits from the
transmitted message. The latter form of compression is called "lossy"
compression and capitalizes upon the ability of the human mind to provide
the omitted data. In still video, the JPEG standard is used for data
compression and defines the method by which the still image is to be
compressed. In motion video, much of the picture data remains constant
from frame to frame. Therefore, the video data may be compressed by first
describing a reference frame and describing subsequent frames in terms of
the change from the reference frame.
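The reference-frame idea can be sketched as follows. This is a minimal
illustration only, not the coding procedure of any standard: frames are
assumed to be small lists of integer pel values, and the helper names
encode_delta and decode_delta are hypothetical.

```python
def encode_delta(reference, frame):
    """Describe a frame as per-pel differences from a reference frame."""
    return [[f - r for f, r in zip(frow, rrow)]
            for frow, rrow in zip(frame, reference)]


def decode_delta(reference, delta):
    """Rebuild the frame by adding the differences back to the reference."""
    return [[r + d for r, d in zip(rrow, drow)]
            for rrow, drow in zip(reference, delta)]


reference = [[10, 10, 10], [20, 20, 20]]
frame = [[10, 11, 10], [20, 20, 19]]
delta = encode_delta(reference, frame)  # mostly zeros when little changes
```

When little changes between frames, most entries of the difference are
zero, which is what makes the difference cheaper to encode than the frame
itself.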
A reference frame can be used in three ways: forward prediction, backward
prediction and interpolation. Forward and backward prediction use a single
reference frame and describe subsequent or previous frames respectively in
terms of the difference from the reference frame. Interpolation uses both
forward and backward reference frames. The forward reference frame is
located in the data stream at an earlier point in time than the current
frame. The backward reference frame is located in the data stream at a
later point in time than the current frame. The current frame is
calculated based on averaged differences between the first reference frame
and the second reference frame.
Several specific protocols for implementing motion compression exist.
Several of these protocols are hardware specific and developed by chip
manufacturers in the absence of accepted compression standards. Recently,
however, two accepted standards for motion video compression have emerged.
The CCITT (International Consultative Committee on Telephone and
Telegraph) uses a standard called P x 64 (also known as H.261) for
video conferencing. The P refers to a multiplier in the range 2 to 30 and
the 64 refers to a single 64 Kbps ISDN channel for transmitting the data.
However, squeezing even this compressed data over the ISDN telephone line
requires drastic compression. Fortunately, the typical video conference
does not have much motion from frame to frame, and P x 64 utilizes
only forward prediction over a single frame time.
To enable higher quality, full motion video, a second standard called MPEG
(Moving Picture Experts Group) has evolved. The MPEG specifications do not
define the exact procedure for compressing the video. Rather, the standard
defines the format and data rate of the compressed output. The set of
compression tools employed by MPEG includes a JPEG-like method for
compressing intraframes, various combinations of forward, backward, and
interpolated motion compression, and subband coding for audio.
More particularly, operations according to the MPEG standard may be
summarized with reference to the following hypothetical in which the video
system wishes to describe four sequential image frames. The video
processing system first receives the first frame. This first received
frame cannot be described in terms of a reference frame and only
intraframe (i.e. non-predictive) coding is performed.
The second frame is then received. One possible implementation of the MPEG
compression standard describes this frame in terms of the first frame, or
intraframe ("I" frame) and a first forward predicted ("P") frame. However,
this first P frame is not yet defined and compression of the received
second frame is delayed until receipt of the first P frame by the
processing system. The third frame also will be described in terms of the
first I and P frames.
The fourth frame of this hypothetical example is used to form the first P
frame. The P frame is formed by predicting the fourth received frame using
the first I frame as a reference. Upon computation of the first P frame,
the motion estimation processor can process the second and third received
frames as bidirectionally predicted "B" frames by comparing blocks of
these frames to blocks of the first I and P frames. To do this processing,
the motion estimation processor first obtains a forward prediction of a
block in the received frame being processed using the first I frame as a
reference. The motion estimation processor then obtains a backward
prediction of that same block using the first P frame as a reference. The
two predictions are then averaged to form the final prediction for the
block.
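The averaging step for a bidirectionally predicted block may be sketched
as follows. The function name average_predictions is hypothetical, blocks
are assumed to be lists of rows of non-negative integer pels, and rounding
the half values upward is an assumption rather than something stated
above.

```python
def average_predictions(forward, backward):
    """Average the forward and backward predictions of one block, pel by pel.

    Assumption: non-negative pels, with .5 results rounded upward.
    """
    return [[(f + b + 1) >> 1 for f, b in zip(frow, brow)]
            for frow, brow in zip(forward, backward)]


forward = [[10, 20], [30, 40]]    # prediction from the first I frame
backward = [[12, 21], [30, 44]]   # prediction from the first P frame
final = average_predictions(forward, backward)
```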
In current motion estimation devices, an exhaustive full resolution pel by
pel search is performed for each block of the I or P frame. This method
requires a large bandwidth bus for transfer of the video data.
Furthermore, the processing time required to churn through the data slows
overall system speed.
SUMMARY OF THE INVENTION
The present invention provides a device, or video interface unit, for
accessing video data in a video system using video compression that
improves the throughput of the compression operations without requiring
corresponding increases in system bus bandwidth.
Digital video images are most commonly stored as an array of pixels having
P rows and Q columns. During a block matching search, the search window
most commonly moves across a video image in raster scan fashion. Thus
adjacent search windows share a common set of pixels and only a single
column and/or row of pixels change when the search window moves across an
image. The present invention provides a memory architecture for fetching
and storing video data that maps the physical video memory to a logical
space such that redundant pels need not be fetched for successive search
windows.
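The saving available from not refetching shared pels can be estimated with
a short calculation. The sketch below is illustrative only: it assumes a
search window that slides one column at a time to the right, and the
function name pels_fetched is hypothetical.

```python
def pels_fetched(window_rows, window_cols, steps, reuse):
    """Count pels fetched while a search window slides `steps` columns.

    With reuse, only the one new column is fetched per step; without
    reuse, the whole window is fetched again at every position.
    """
    first = window_rows * window_cols
    if reuse:
        return first + steps * window_rows
    return first * (steps + 1)


without_reuse = pels_fetched(16, 16, 15, reuse=False)
with_reuse = pels_fetched(16, 16, 15, reuse=True)
```

For a 16 by 16 window moved through 16 positions, this gives 4096 pels
fetched without reuse against 496 with reuse.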
According to one embodiment of the present invention, the P x Q physical
array of image data is logically subdivided into tiles of p rows and q
columns. The present invention receives the even rows and columns on a
first bus and the odd rows and columns of the tile on a second data bus.
According to another embodiment of the invention the data received on the
first bus is stored in a first buffer and the data received on the second
bus is stored in a second buffer. Thus, data transfers may be interleaved.
According to yet another embodiment of the present invention, the video
data bus is configurable as either a 32 bit or 16 bit data bus. The
dimensions of the p x q tile may be adjusted according to the selected
bandwidth of the bus.
According to still another embodiment of the invention, the image data may
be processed by an interpolator, and the device may also include circuitry
to perform a refresh of the video memories.
Other features and advantages of the present invention will be described
below.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a video system in which the video interface
system of the present invention may be included according to an embodiment
of the present invention;
FIG. 1A is a diagram showing component grid layouts for a (hpos, vpos)
coordinate system according to an embodiment of the present invention;
FIG. 1B is a diagram showing an 8 by 8 pel logical image for 32 bit
accesses according to an embodiment of the present invention;
FIG. 1C is a diagram showing logical to physical image mapping for 32 bit
mode, without pel interleaving according to an embodiment of the present
invention;
FIG. 1D is a diagram of logical to physical image mapping for 32 bit mode,
with pel interleaving according to an embodiment of the present invention;
FIG. 1E is a diagram showing 480 line CCIR 601 scan line formats according
to an embodiment of the present invention;
FIG. 1F is a diagram showing 480 line CCIR-601 image storage according to an embodiment
of the present invention;
FIG. 1G is a diagram of an 8 by 8 pel logical image for 16 bit accesses
according to an embodiment of the present invention;
FIG. 1H is a diagram of a logical to physical image mapping for 16 bit mode
without pel interleaving according to an embodiment of the present
invention;
FIGS. 1I and 1J are diagrams of logical to physical image mapping for 16 bit
mode, with pel interleaving according to an embodiment of the present
invention;
FIG. 2 is a block diagram of a video interface unit according to an
embodiment of the present invention;
FIG. 3 is a state transition diagram for a VIU global state machine
according to an embodiment of the present invention;
FIGS. 4A and 4B are a timing diagram for the global state machine of FIG. 3
according to an embodiment of the present invention;
FIGS. 5A-5C show an output token data transfer according to an embodiment
of the present invention;
FIG. 6 shows the logical segment to physical storage mapping of the video
interface buffer according to an embodiment of the present invention;
FIG. 7 is a block diagram of a data token buffer according to an embodiment
of the present invention;
FIG. 8 is a block diagram of a partial buffer according to an embodiment of
the present invention;
FIG. 9A is a diagram further illustrating the architecture of the buffer of
FIG. 8;
FIG. 9B is a block diagram of a DRAM data latch according to an embodiment
of the present invention;
FIG. 10 is a block diagram of an interpolator according to an embodiment of
the present invention;
FIGS. 11A and 11B are state transition diagrams for a processing state
machine of the VIU according to an embodiment of the present invention;
FIG. 12 is a state transition diagram for a video interface machine
according to an embodiment of the present invention;
FIGS. 13A-13C show a page mode read cycle timing according to an embodiment
of the present invention;
FIGS. 13D-13F show the page-mode write cycle 499 timing according to an
embodiment of the present invention;
FIG. 14 shows a state transition diagram for a refresh state machine
according to an embodiment of the present invention;
FIG. 15A shows a transition diagram of a request state machine according to
an embodiment of the present invention;
FIG. 15B shows a transition diagram for a serial memory access state
machine according to an embodiment of the present invention;
FIG. 16 shows priority rotation for a video bus arbiter according to an
embodiment of the present invention;
FIG. 17 is a state transition diagram of a video bus arbiter according to
an embodiment of the present invention; and
FIG. 18 is a state transition diagram for a buffer control state machine
according to an embodiment of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT
Video System Overview
FIG. 1 illustrates one example of a video system incorporating a motion
estimation coprocessor 20 (MEC 20) and an image compression coprocessor. A
camera 24 receives video data inputs and a display unit 26 displays video
data. A digital video control unit 27 connected to both camera 24 and
display 26 coordinates the communication of video data between camera 24,
display 26 and a video pre/post processor 28, and a host processor via a
bus 30. Digital video control unit 27 also coordinates the display of
video graphics data retrieved from a graphics frame store memory 31
operating under the control of a graphics control processor 32.
The video system of FIG. 1 may also include audio. Audio capabilities can
be added with a microphone 33 and a speaker 34 connected to an audio
conversion circuit 35. An audio compression coprocessor 36 for compression
of audio data connects between an embedded control processor 37 and the
audio conversion unit 35.
The system of FIG. 1 operates under the control of a host processor 38,
which couples to the remaining system components via bus 30. In a
preferred embodiment, host processor 38 comprises a RISC processor such as,
for example, an Intel i960 family processor manufactured by Intel of
Santa Clara, California. Also coupled to bus 30 are a ROM 39 for storing
host processor 38 program code and a DRAM 40.
Received video data to be compressed, or uncompressed data to be displayed,
are processed by a group of devices working in tandem and known as a video
compression engine 41. Included as part of the video compression engine 41
are a video frame store 43 and an image compression coprocessor 45. Video
frame store 43 serves as a buffer memory for data input to and output from
the video compression circuitry.
The image compression coprocessor 45 performs all video compression
functions in a typical system except motion estimation, Huffman encoding
and decoding, and bit stream management. An image compression coprocessor
usable with the VIU of the present invention is described in copending
application Serial No. 08/054-950, titled "Image Compression Coprocessor"
and filed the same day herewith and incorporated by reference. Image
compression coprocessor 45 connects to bus 30 through the optional
embedded control processor 37. Control processor 37 offloads from host
processor 38 certain decompression and compression functions not
accomplished by coprocessor 45, such as Huffman coding. Alternatively,
tasks performed by control processor 37 may be performed by host processor
38.
For motion video, MEC 20 processes compressed motion video data. MEC 20
operates in conjunction with its own local memory 54 and a video
prediction store memory 55. An embodiment of MEC 20 is described in
greater detail in copending application Serial No. 08/055-711, titled
"Motion Estimation Coprocessor," filed the same day herewith and
incorporated herein by reference.
Video Interface Unit Overview
The various components of the video system of FIG. 1 may include a video
interface unit (VIU) (not shown in FIG. 1) for fetching video data from
video memories and/or for transferring information between components. For
example, the VIU may generate transfer cycles to move data between a
serial access memory (SAM) and the DRAM of video random access memories
(VRAMs). The VIU writes and reads a token of image pels to and from the
external video memories via a video bus as a series of page mode
transfers. In a preferred embodiment, the video bus comprises a 4-pel
(32-bit) bi-directional data bus and 11 bit address bus which physically
accesses a two row by two column image "tile" with each 32 bit access. In
another preferred embodiment, the video bus comprises a 2-pel (16 bit)
bidirectional data bus and 11 bit address bus which physically accesses a
two row by one column image "tile" with each 16 bit access. In the 32 bit
bus mode a 16 x 16 block of image pels is accessed as an 8 row by 8
column tile array which is physically accessed as 8 DRAM rows of 8 words
each. In the 16 bit bus mode, the same 16 x 16 block of image pels is
accessed as an 8 row by 16 column tile array which is physically accessed
as 8 DRAM rows of 16 words each.
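The tile array dimensions quoted above follow directly from the tile shape
for each bus width. A minimal sketch, in which the table TILE_SHAPE and the
function tile_array are hypothetical names and the block is assumed to
start on a tile boundary:

```python
# Bus width in bits -> (tile rows p, tile columns q), as described above.
TILE_SHAPE = {32: (2, 2), 16: (2, 1)}


def tile_array(block_rows, block_cols, bus_width):
    """Return (tile rows, tile columns) needed to cover a block of pels.

    Each tile row maps to one DRAM row and each tile to one word.
    """
    p, q = TILE_SHAPE[bus_width]
    tile_rows = -(-block_rows // p)   # ceiling division
    tile_cols = -(-block_cols // q)
    return tile_rows, tile_cols


mode32 = tile_array(16, 16, 32)   # 8 DRAM rows of 8 words
mode16 = tile_array(16, 16, 16)   # 8 DRAM rows of 16 words
```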
The VIU contains several user-loadable timing parameters which are used by
the VIU to generate the proper video memory interface timing. Information
concerning the configuration of the video memory system is loaded by host
processor 38 into the VIU.
To support motion prediction on a half-pel resolution as required by MPEG,
the VIU may also perform an interpolation function on the image pels read
from the video memory. Bilinear interpolation is computed on either the
horizontal or vertical axis or both.
The VIU may also integrate a refresh function which may be enabled, for
example, by host processor 38 of FIG. 1. A CAS-before-RAS refresh cycle
generated by the VIU refreshes the external VRAMs or DRAMs. Contained in
the refresh logic is a refresh timer. The period of the refresh timer is
loaded by the user via a host interface.
An arbiter function arbitrates video bus requests by the page-mode pel
fetch function, the refresh function, the SAM-DRAM transfer function, and
an external bus request source. Each VIU can be programmed as the
arbitration master or a slave for the video bus. In the slave mode
operation, a daisy-chained priority scheme is also supported for the
external bus request source.
The Video Bus
According to a preferred embodiment of the invention, luminance images,
each consisting of up to 4096 by 4096 pels, may be accessed by the VIU
using either 16 or 32 bit page mode data transfers. The data width of the
video memory bus may be dynamically configured as either 32 or 16 bits as
determined by the video memory instruction executed by the VIU. If the bus
is configured as being 16 bits wide, pels are accessed on data busses 0
and 1 or on data busses 2 and 3, again as determined by the instruction.
Logical to physical address translation of the image pels is described
below. An overview of pel transfers in the 32 bit and 16 bit bus modes is
also provided below.
Pels within each video component are accessed using a logical grid of
component coordinates indexed by a hpos and a vpos descriptor field of the
token being processed by the video memory instruction. An example of a
code useful for converting logical component coordinates to logical pel
coordinates is contained in Table XIII. The commands and fields of Table
XIII are referred to throughout the discussion below.
The horizontal and vertical axis resolutions (in pels) of the logical grid
corresponding to a component "k" are determined by the contents of a
component configuration register (CONFIGk). Register CONFIGk holds the
geometric configuration of component "k" in the data tokens to be
processed. FIG. 1A shows an example grid layout 61-64 for each of the four
possible component configurations, and each grid highlights the pels
corresponding to hpos=4 and vpos=2.
The location of a component at logical coordinates (hpos, vpos) may also be
offset using full or half-pel resolution motion vectors stored in the
sfield (43:0) field of the operand token's descriptor. Motion vectors are
used by the read memory instructions RDV16FMV, RDV32FMV, RDV16BMV and
RDV32BMV prior to accessing a component in physical memory. The logical
component coordinates (hpos, vpos), possibly offset by a motion vector, are
therefore converted into a set of logical pel coordinates which can then be
translated into physical memory addresses.
The system executes instructions to determine whether a motion vector needs
to be extracted from the operand descriptor. For RDVxFMV instructions
("x"=16 or 32), the 11 bit horizontal and vertical components of the
forward vector are obtained from sfield (43:22) of the descriptor and
copied into variables xvect and yvect, respectively; for RDVxBMV
instructions, these components are obtained from sfield (21:0). The
fullpel variable is set to
"1" if motion vectors have full pel resolution and "0" if vectors have
half pel resolution and may possibly require pel interpolation. Video
memory instructions other than RDVxFMV and RDVxBMV do not reference motion
vectors and are treated as using a full pel motion vector of (0,0).
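The extraction of the vector components from the descriptor may be sketched
as follows. The bit ranges (43:22) and (21:0) come from the text above;
the placement of the horizontal component in the upper 11 bits of each
half, the two's complement encoding of each field, and the function names
are assumptions made for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def extract_vector(sfield, backward):
    """Return (xvect, yvect) from a 44 bit sfield.

    Forward vectors use sfield(43:22), backward vectors sfield(21:0).
    Assumption: within each 22 bit half, the horizontal component sits in
    the upper 11 bits and the vertical component in the lower 11 bits.
    """
    half = sfield & 0x3FFFFF if backward else (sfield >> 22) & 0x3FFFFF
    xvect = sign_extend(half >> 11, 11)
    yvect = sign_extend(half, 11)
    return xvect, yvect


# Pack a forward vector (5, -3) and a backward vector (-1, 7) for a check.
fwd = ((5 & 0x7FF) << 11) | (-3 & 0x7FF)
bwd = ((-1 & 0x7FF) << 11) | (7 & 0x7FF)
sfield = (fwd << 22) | bwd
```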
The next operation scales the motion vector components based on a
comparison of the resolution of image component "k" with the resolution of
component 0. The contents of registers CONFIGk and CONFIG0 are used to
make this comparison. A horizontal or vertical motion vector component is
halved if the corresponding horizontal or vertical dimension of component
"k" is half that of the corresponding dimension of component 0. This
scaling is consistent with the relative treatment of motion vectors for
luminance and chrominance components by the MPEG and H.261 standards. For
example, assume CONFIG0=3 and CONFIGk=0; then the dimensions of components
0 and "k" are 16 by 16 pels and 8 by 8 pels, respectively. The size ratio
of component 0 to component k is two for both horizontal and vertical
dimensions, so both motion vector components are halved.
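The scaling rule can be sketched as below. Because the encoding of the
CONFIG registers is not reproduced here, the hypothetical function
scale_vector takes component dimensions directly as (rows, columns) pairs;
halving by arithmetic right shift is likewise an assumption.

```python
def scale_vector(xvect, yvect, comp0_size, compk_size):
    """Halve each vector component whose axis in component k is half the
    corresponding axis of component 0. Sizes are (rows, columns) pairs.
    """
    rows0, cols0 = comp0_size
    rowsk, colsk = compk_size
    if colsk * 2 == cols0:    # horizontal dimension is halved
        xvect >>= 1
    if rowsk * 2 == rows0:    # vertical dimension is halved
        yvect >>= 1
    return xvect, yvect


# Component 0 is 16 by 16 and component k is 8 by 8: both parts are halved.
scaled = scale_vector(6, -4, (16, 16), (8, 8))
```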
Some additional special handling is required for half pel resolution motion
vectors. The procedure used in the code is consistent with the MPEG
standard which treats half pel motion vector components as fractional
two's complement integers. The xhalf and yhalf variables flag whether pel
interpolation is required and are respectively copied from the least
significant bit of xvect and yvect. If xhalf=1, horizontal interpolation
is needed; if yhalf=1, vertical interpolation is needed. xvect and yvect
are then each right-shifted by one bit with sign extension to give them
resolution consistent with full pel vectors.
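The half pel handling described above amounts to splitting each component
into an integer part and a half pel flag. A minimal sketch, with the
hypothetical function name split_half_pel, relying on the fact that the
Python right shift of a negative integer is an arithmetic shift (that is,
it sign extends):

```python
def split_half_pel(xvect, yvect, fullpel):
    """Return (xvect, yvect, xhalf, yhalf) in full pel units.

    For full pel vectors no interpolation flags are set. For half pel
    vectors the least significant bit becomes the flag and the remaining
    bits, shifted right with sign extension, give the full pel offset.
    """
    if fullpel:
        return xvect, yvect, 0, 0
    xhalf, yhalf = xvect & 1, yvect & 1
    return xvect >> 1, yvect >> 1, xhalf, yhalf


# 5 half pels = 2 + 1/2 pels; -3 half pels = -2 + 1/2 pels.
result = split_half_pel(5, -3, fullpel=False)
```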
The logical origin of the block of pels which the VIU needs to physically
access is delivered in the firstrow and firstcol variables. The number of
logical rows and columns in this block is given by numrows and numcols,
respectively. Note that this logical pel block includes all the pels which
may have to be fetched to perform horizontal or vertical pel
interpolation. For example, if a logical 16 by 16 pel image component must
be fetched using both horizontal and vertical interpolation, numrows and
numcols will both equal 17.
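The block extent can be sketched as a function of the interpolation flags.
The function name block_extent is hypothetical, and the sketch assumes that
horizontal interpolation requires one extra column and vertical
interpolation one extra row, which is consistent with the 17 by 17 example
above.

```python
def block_extent(rows, cols, xhalf, yhalf):
    """Return (numrows, numcols), including pels needed for interpolation."""
    numrows = rows + (1 if yhalf else 0)   # extra row for vertical
    numcols = cols + (1 if xhalf else 0)   # extra column for horizontal
    return numrows, numcols


both = block_extent(16, 16, xhalf=1, yhalf=1)
```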
In the 32 bit access mode, four 8 bit video memory data busses are used to
simultaneously read or write four pels on every memory cycle, requiring
the storage for one logical image to be physically spread across memories
connected to the four data busses. These four pels form a two by two pel
square in the logical image space. Thus the 32 bit access mode segments a
logical image into non-overlapping two by two pel tiles, and each physical
memory address selects one of these tiles.
In 32 bit mode, a block of pels at logical coordinates (firstcol, firstrow)
is physically accessed as an array of two by two tiles. Pels are
individually labeled with their row:column locations as shown in FIG. 1B.
The 32 bit access mode segments this 64 pel image into a four row by four
column grid of tiles. Pels from even numbered image columns are accessed on
data busses 0 or 1 and pels from odd numbered columns are accessed on
data busses 2 or 3. Pels from even numbered image rows are accessed on
data busses 0 or 2 while pels from odd numbered rows are accessed on data
busses 1 or 3. In another preferred embodiment, pels from even numbered
image rows are accessed on data busses 1 or 3 while pels from odd
numbered rows are accessed on busses 0 or 2. VIU 100 simultaneously
accesses the four pels making up a tile by outputting offset versions
of the tile's row and column coordinates on the video address bus.
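The bus assignment for the first of the two embodiments can be sketched as
follows. The function name locate_pel is hypothetical, and the sketch
ignores pel interleaving, line swapping and any address offsets; it only
illustrates how the row and column parities select one of the four data
busses.

```python
def locate_pel(row, col):
    """Map a logical pel (row, col) to (tile_row, tile_col, data_bus).

    Even columns use busses 0 or 1, odd columns busses 2 or 3;
    even rows use busses 0 or 2, odd rows busses 1 or 3.
    """
    tile_row, tile_col = row >> 1, col >> 1
    data_bus = 2 * (col & 1) + (row & 1)
    return tile_row, tile_col, data_bus


# The four pels of the top-left tile land on four different busses.
top_left = [locate_pel(r, c)[2] for r in (0, 1) for c in (0, 1)]

# An 8 by 8 pel image covers a four by four grid of tiles.
tiles = {locate_pel(r, c)[:2] for r in range(8) for c in range(8)}
```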
Thus, the VIU accesses a num_tile_rows by num_tile_cols array of tiles
beginning at row address first_tile_row and column address first_tile_col.
The tiles are accessed in a
left-to-right, top-to-bottom raster scan fashion. Row addresses are
incremented by one from one row of tiles to the next; column addresses are
incremented by one within a row. The tiles within each row are accessed
using a series of page mode accesses where VIU 100 outputs the tile row
number on the video address bus followed by a succession of tile column
addresses.
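The address sequence for such a raster scan can be sketched as a generator.
The function name page_mode_addresses and the ("row", n) / ("col", n)
tuples are illustrative conventions only, not signals of the VIU.

```python
def page_mode_addresses(first_tile_row, first_tile_col,
                        num_tile_rows, num_tile_cols):
    """Yield the address sequence for a raster scan of tiles.

    Each row of tiles starts with one row address, followed by a
    succession of column addresses within that page.
    """
    for r in range(first_tile_row, first_tile_row + num_tile_rows):
        yield ("row", r)
        for c in range(first_tile_col, first_tile_col + num_tile_cols):
            yield ("col", c)


sequence = list(page_mode_addresses(4, 2, 2, 3))
```

For a 2 by 3 array of tiles this produces two row addresses and six column
addresses, so the cost of each row address is shared by all tiles in that
row.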
When required, the RDVxFMV and RDVxBMV instructions perform pel
interpolation using the conventions established, for example, by the MPEG
specification. The interpolation is performed on pels in the logical image
domain. As an example, consider the following 3 by 3 array of pels on a
logical grid with half pel spacing:
______________________________________
P1 H P2
V B X
P3 X P4
______________________________________
Pels P1, P2, P3, and P4 are located at full pel coordinates; pels H, V, and
B are located at half pel coordinates and need to be interpolated from P1,
P2, P3 and P4. The bilinear interpolation formulas are as follows:
______________________________________
H = (P1 + P2)//2            (horizontal interpolation only)
V = (P1 + P3)//2            (vertical interpolation only)
B = (P1 + P2 + P3 + P4)//4  (bidirectional interpolation)
______________________________________
where "//" indicates integer division with rounding to the nearest integer,
with half-integer values rounded away from zero (e.g., 1.5 rounds to 2).
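The rounding convention above can be illustrated with a minimal sketch. The function names rdiv and interpolate are hypothetical and do not appear in the instruction set; the sketch assumes pel values are non-negative integers, so that rounding away from zero is the same as rounding halves upward.

```python
def rdiv(total, n):
    """Integer division rounded to the nearest integer.

    Half-integer results are rounded up, which equals rounding away
    from zero under the assumption that total >= 0 and n > 0.
    """
    return (2 * total + n) // (2 * n)


def interpolate(p1, p2, p3, p4):
    """Return the half pel values (H, V, B) for the full pels P1..P4."""
    h = rdiv(p1 + p2, 2)            # horizontal interpolation only
    v = rdiv(p1 + p3, 2)            # vertical interpolation only
    b = rdiv(p1 + p2 + p3 + p4, 4)  # bidirectional interpolation
    return h, v, b
```

For example, with P1=10, P2=13, P3=20 and P4=24, the exact values are 11.5, 15 and 16.75, which round to H=12, V=15 and B=17.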
Video memory instructions have four parameters which are used in
conjunction with internal registers and various token descriptor fields to
translate logical coordinates into physical memory addresses for the
memory chips connected to the video bus. These parameters are memsel,
horgsel, vorgsel and corgsel.
The three bit memsel parameter is the primary means for selecting separate
image memories or memory "banks". For video memory instructions using the
32 bit access mode, memsel is output unmodified on pins VMSEL (2:0) during
a memory access. For video memory instructions using the 16 bit access
mode, the least significant two bits of memsel are output on pins VMSEL
(1:0) and VMSEL (2) is set "high". The most significant bit of memsel is
used to select which half of the 32 bit video data bus is to be used for
transferring data. The horgsel and vorgsel parameters are used to select
a sub-image within memory memsel. The physical (x, y) coordinates of the
upper left corner of the sub-image are given by X=128*horgsel and
Y=128*vorgsel.
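The parameter decoding described above may be sketched as follows. The function names are hypothetical, and the sketch returns the bus-half select bit as a raw value because the text does not state which value selects which half of the 32 bit bus.

```python
def vmsel_pins(memsel, access_bits):
    """Return the 3 bit value driven on pins VMSEL(2:0).

    access_bits is 32 or 16 (the access mode of the instruction).
    """
    if access_bits == 32:
        return memsel & 0b111           # memsel is output unmodified
    if access_bits == 16:
        # VMSEL(1:0) carry the two least significant bits of memsel,
        # and VMSEL(2) is set high.
        return 0b100 | (memsel & 0b011)
    raise ValueError("access_bits must be 16 or 32")


def bus_half_select(memsel):
    """16 bit mode: most significant bit of memsel selects the bus half."""
    return (memsel >> 2) & 1


def sub_image_origin(horgsel, vorgsel):
    """Physical (X, Y) of the upper left corner of the selected sub-image."""
    return 128 * horgsel, 128 * vorgsel
```

For instance, horgsel=2 and vorgsel=3 select a sub-image whose upper left corner is at X=256, Y=384.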
The corgsel parameter selects one of four groups of memory address offset
and control registers which are initialized by host processor 38. Each of
these four groups comprises nine registers divided into three subgroups of
three registers each, with each subgroup corresponding to one of three
video components. The registers in each group and subgroup are defined as
follows (corgsel=0, 1, 2, 3; k=0, 1, 2):
VkX[corgsel](10:0) - Group corgsel, Component k Horizontal Offset
VkY[corgsel](10:0) - Group corgsel, Component k Vertical Offset
VkLSWP[corgsel] - Group corgsel, Component k Line Swap
The VkX and VkY registers corresponding to corgsel=0 and k=0, 1, and 2 are
defined to contain the value zero; the contents of the other 30 registers
are user-definable. The contents of the VkX and VkY registers are used to
arbitrarily offset physical pel coordinates for each component within the
sub-image selected by horgsel and vorgsel. The "line swap" register,
VkLSWP, indicates whether the video data busses on which even and odd
video lines (i.e., rows) are accessed are swapped.
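As a check on the register counts given above, the four groups can be modeled as a small table. This is an illustrative sketch only: the dictionary layout and the function names are assumptions, and the line swap value is modeled as a single bit per group and component.

```python
NUM_GROUPS = 4       # corgsel = 0, 1, 2, 3
NUM_COMPONENTS = 3   # k = 0, 1, 2


def make_register_file():
    """Return a dict keyed by (name, corgsel, k), all cleared to zero."""
    return {(name, g, k): 0
            for g in range(NUM_GROUPS)
            for k in range(NUM_COMPONENTS)
            for name in ("X", "Y", "LSWP")}


def is_user_definable(name, corgsel, k):
    """VkX and VkY for corgsel = 0 are defined to contain zero."""
    return not (corgsel == 0 and name in ("X", "Y"))


def write_register(regs, name, corgsel, k, value):
    """Write a register, enforcing the fixed-zero rule and field width."""
    if not is_user_definable(name, corgsel, k):
        raise ValueError("register is defined to contain zero")
    # X and Y offsets are 11 bit fields (10:0); LSWP is modeled as one bit.
    limit = 1 if name == "LSWP" else (1 << 11) - 1
    if not 0 <= value <= limit:
        raise ValueError("value out of range")
    regs[(name, corgsel, k)] = value
```

The table holds 4 x 9 = 36 registers, of which the six offset registers for corgsel=0 are fixed at zero, leaving 30 user-definable registers.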
Three single bit registers permit pels from different image components to
be physically interleaved in the same memory. Pel interleaving is
especially useful for storing UV chrominance components from YUV imagery
since these components are generally sampled from analog video in a
multiplexed fashion. These three registers are as follows:
______________________________________
C0INTLV - Component 0 Interleave Register
V1INTLV - Component 1 Interleave Register
V2INTLV - Component 2 Interleave Register
______________________________________
Setting the interleave bit to "1" for a particular component enables
interleaving for that component. Generally, if pel interleaving is enabled
for one component, it is also enabled for at least one other. For example,
interleaving of UV pels in YUV images requires that the interleave bits be
set for both of the U and V components.
If bit corgsel in the VkLSWP register is "0" the VIU accesses even numbered
rows from the component k logical image on video data busses 0 and 2 and
odd numbered rows on video data busses 1 and 3. If the bit is "1," even rows
are accessed on busses 1 and 3 and odd rows on busses 0 and 2. Line
swapping allows pels to be stored in memory in a fashion compatible with
the way in which they are sampled by a digitizer or read out for display
purposes.
The video memory instructions which use the 32 bit access mode are RDV32,
RDV32FMV, RDV32BMV, WRV32, and WR32.S. In this mode, the four 8 bit video
memory data busses are used to simultaneously read or write four pels on
every memory cycle, requiring the storage for one logical image to be
physically spread across memories connected to the four data busses. These
four pels form a two by two pel square in the logical pel coordinate
space. That is, the 32 bit access mode segments a logical image into
non-overlapping two by two pel tiles, and each (row, column) memory
address pair output by the VIU selects one of these tiles.
In 32 bit mode, a component at logical pel coordinates (firstcol, firstrow)
is physically accessed as an array of two by two tiles. As an example,
FIG. 1B shows an eight by eight pel logical image 69. Pels are
individually labeled with their row:column locations; tile coordinate axes
are shown along the top and left sides of the figure. The 32 bit access mode
segments this 64 pel image into a four by four grid of tiles; tile (0,0)
is shown shaded in the figure.
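The segmentation of a logical image into two by two tiles can be sketched as below. The function names are hypothetical; only the relationship between logical pel coordinates and tile coordinates is taken from the description.

```python
def tile_of(row, col):
    """Return the (tile_row, tile_col) of the tile containing a pel."""
    return row // 2, col // 2


def pels_in_tile(tile_row, tile_col):
    """Return the four logical (row, col) pels making up one tile."""
    r, c = 2 * tile_row, 2 * tile_col
    return [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
```

Applied to every pel of an eight by eight image, tile_of yields 16 distinct tiles, that is, a four by four grid, and tile (0,0) contains pels 0:0, 0:1, 1:0 and 1:1.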
FIG. 1C shows how the pels in FIG. 1B are physically mapped to memories on
the four video data busses with pel interleaving disabled. Line swapping
is disabled (i.e., VkLSWP=0) for the mappings 71 shown on the left side of
FIG. 1C and enabled for the mappings 72 shown on the right. Pels from even
numbered columns in FIG. 1B are always accessed on data busses 0 and 1 and
pels from odd numbered columns are always accessed on data busses 2 and 3.
Row accesses depend on the value of VkLSWP (corgsel).
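The bus assignment described above can be written compactly. This is a sketch under the stated rules only: the function name data_bus is hypothetical, and the formula is one way of encoding the column rule (even columns on busses 0 and 1, odd columns on busses 2 and 3) together with the row rule and its line swap option.

```python
def data_bus(row, col, line_swap=0):
    """Return the video data bus (0..3) on which a logical pel is accessed.

    line_swap is the line swap bit for the component: 0 places even rows
    on busses 0 and 2, 1 places even rows on busses 1 and 3.
    """
    column_part = 2 * (col & 1)          # 0 for even columns, 2 for odd
    row_part = (row & 1) ^ line_swap     # swapped when line_swap is 1
    return column_part + row_part
```

With line swapping disabled, the four pels of tile (0,0) at 0:0, 1:0, 0:1 and 1:1 fall on busses 0, 1, 2 and 3 respectively, so all four can be accessed in a single memory cycle. With line swapping enabled, the same pels fall on busses 1, 0, 3 and 2.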
The four pels corresponding to the tile at logical pel coordinates (0,0) in
FIG. 1B are highlighted in each half of FIG. 1C. The VIU simultaneously
accesses the four pels making up a tile by outputting offset versions
of the tile's row and column coordinates on the video address bus.
FIG. 1D shows the effects of pel interleaving on 32 bit accesses. Note that
the column address of a pel in FIG. 1C is multiplied by two in order to
arrive at the column address 75 of the same pel in FIG. 1D; the row
addresses are the same. That is, only columns are interleaved, not rows.
In an actual application, the missing columns in FIG. 1D are filled in
with the columns from another component.
To better see the usefulness of both pel interleaving and line swapping,
consider the examples shown in FIGS. 1E and 1F. FIG. 1E shows the order in
which lines of luminance (Y) 80 and chrominance (UV) pels are structured
in each of the interlaced fields which make up a 480 line CCIR-601 image.
Within each horizontal scan line in each of the fields, 720 luminance pels
and 720 interleaved UV pels are sampled in a spatially co-sited fashion.
FIG. 1F shows how these lines are stored in four 480 row by 360 column
video memories for each of the two interlaced fields. Each memory stores a
quarter of the total number of pels in the image. In order for the video
system to properly access the chrominance pels using the 32 bit bus mode,
the UV components must have pel interleaving enabled. In addition, these
components must be accessed with line swapping enabled since even numbered
rows of UV pels are stored on busses 1 and 3 and odd numbered rows are
stored on busses 0 and 2.
The VIU addresses a num_tile_rows by num_tile_cols array of
tiles beginning at row address first_tile_row and column address
first_tile_col. The tiles are accessed in a left-to-right,
top-to-bottom raster scan fashion. Row addresses are incremented by one
from one row of tiles to the next; column addresses are incremented by one
within a row if pel interleaving is disabled and two if pel interleaving
is enabled. The tiles within each row are accessed using a series of page
mode accesses; that is, the VIU addresses a row by outputting the tile row
number on the video address bus followed by a succession of tile column
addresses.
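The address sequence described above can be sketched as a generator. The function name tile_addresses is hypothetical, and the sketch assumes that first_tile_col is already the physical starting column address, which the description does not state explicitly for the interleaved case.

```python
def tile_addresses(first_tile_row, first_tile_col,
                   num_tile_rows, num_tile_cols, interleave=False):
    """Yield (row_address, [column_addresses]) for each row of tiles.

    Each yielded pair models one series of page mode accesses: the row
    address is output once, followed by a succession of column addresses.
    """
    step = 2 if interleave else 1   # columns advance by two when interleaved
    for r in range(num_tile_rows):
        row_address = first_tile_row + r
        columns = [first_tile_col + step * c for c in range(num_tile_cols)]
        yield row_address, columns
```

For a two by three array of tiles starting at row address 4 and column address 10, the column addresses for each row are 10, 11, 12 with pel interleaving disabled and 10, 12, 14 with it enabled.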
Motion vector offsets may result in memory fetches which do not use all the
pels in the tiles read using the latter formulas; that is, a logical block
of pels might not map evenly onto the grid of tiles. In such cases, any
unuse | | |