|
Claims  |
|
|
We claim:
1. A programmable data formatter responsive to pixel data triples
representing three different color intensities and producing therefrom
individual pixel intensity values to be stored in fewer bit-planes of a
frame buffer than there are bits in the pixel data triples, the data
formatter comprising:
a first pipeline stage that captures first and second portions of a pixel
data triple, truncates a third portion thereof by shifting it a first
selected amount and then captures the truncated third portion as a third
individual pixel intensity value;
a second pipeline stage coupled to the captured contents of the first
pipeline stage, that captures the first portion of the pixel data triple
therein, truncates the second portion of the pixel data triple therein by
shifting it and the truncated third portion together as an intermediate
unit by a second selected amount and then captures the shifted
intermediate unit as the third and a second individual pixel intensity
values;
a third pipeline stage coupled to the captured contents of the second
pipeline stage, that shifts those contents as a final unit by a third
selected amount and then captures that shifted final unit as the third,
second and a first individual pixel intensity values; and
register means coupled to the first, second and third pipeline stages for
containing bit patterns corresponding to the first, second and third
selected amounts.
2. A programmable data formatter responsive to pixel data triples having D
in number of, E in number of and F in number of bits respectively
representing three different color intensities, D, E and F each being
integers, and producing therefrom respective individual pixel intensity
values having d in number of, e in number of and f in number of bits to be
stored in a plurality, (d+e+f) in number, of bit-planes of a frame buffer
having a number of bit-planes at least equal to the sum (d+e+f), d, e and
f each being integers also, and (d+e+f)<(D+E+F), the programmable data
formatter comprising:
rendering means for producing pixel data triples having D in number, E in
number and F in number of bits respectively representing different color
intensities;
a first register of ordered bits, (D+E+F) in number, and having respective
inputs and outputs for each bit, E in number of those inputs respectively
coupled to the E in number of bits of color intensity and F in number of
those inputs respectively coupled to the F in number of bits of color
intensity;
a first shifting means, having inputs respectively coupled to the D in
number of bits of color intensity and having outputs respectively coupled
to D in number of the inputs to the first register means, for causing a
contiguous subset of d in number of the D in number of bits of color
intensity to be applied to a first contiguous subset of the D in number of
inputs of the first register, 0.ltoreq.d.ltoreq.D and the first contiguous
subset starting with the least significant input among the D in number of
first register inputs in the ordering thereof;
a second register of ordered bits, (D+E+F) in number, and having respective
inputs and outputs for each bit, F in number of those inputs respectively
coupled to the F in number of outputs of the first register;
a second shifting means, having inputs respectively coupled to the D in
number of and E in number of outputs of the first register and having
outputs respectively coupled to the D in number of and E in number of
inputs of the second register, for causing a continguous subset of (d+e)
in number of the D in number of and E in number of outputs of the first
register to be applied to a second contiguous subset of the D in number of
and E in number of inputs of the second register, 0.ltoreq.e.ltoreq.E and
the second contiguous subset starting with the least significant input
among the D in number of and E in number of second register inputs in the
ordering thereof;
a third register of ordered bits, having at least (d+e+f) in number of such
ordered bits, 0.ltoreq.f.ltoreq.F, and having respective inputs and
outputs for each bit;
a third shifting means, having inputs respectively coupled to the D in
number of, E in number of and F in number of outputs of the second
register and having outputs respectively coupled to the at least (d+e+f)
in number of inputs of the third register means, for causing a contiguous
subset of (d+e+f) in number of outputs of the second register to be
applied to a third contiguous subset of the at least (d+e+f) in number of
inputs of the third register, the third contiguous subset starting with
the least significant input among the at least (d+e+f) in number of third
register inputs in the ordering thereof; and
a frame buffer of at least (d+e+f) in number of bit planes respectively
coupled to the at least (d+e+f) in number of bits of the third register.
3. A programmable data formatter as in claim 2 further comprising
programming register means having outputs coupled to the first, second and
third shifting means for determining the values of d, e and f.
4. A method of storing a plurality of rendered pixel data values, each
associated with the same pixel address, in a selectable and variable
number of frame buffer bit-planes fewer than the number of bits in the
plurality of rendered pixel data values; the method comprising the steps
of:
selecting a first number of most significant bits from a first pixel data
value that is a member of the plurality of rendered pixel data values;
selecting a second number of most significant bits from a second pixel data
value that is a member of the plurality of rendered pixel data values;
and so on until a number of most significant bits from each member of the
plurality of rendered pixel data values has been selected;
concatenating the first, second, and so on, numbers of selected most
significant bits into a unified data field of selectable and variable
location in a data word having the same number of bits as bit-planes in a
frame buffer; and
storing the data word in an address in the frame buffer, each bit of the
data word being stored in one bit of an associated bit-plane of the frame
buffer.
5. A method of double buffering images stored in a frame buffer having M in
number of bit-planes, M being an integer, the method comprising the steps
of:
a. partitioning the frame buffer into a first sub-buffer of D in number of,
E in number of and F in number of bit-planes, D+E+F=J, and a second
sub-buffer of D' in number of, E' in number of and F' in number of
bit-planes, where each of D, E, F, D', E', F' and J are integers, and
D'+E'+F'=K, D=D', E=E', F=F' and J+K.ltoreq.M, K being an integer also;
b. logically ordering the first and second sub-buffers as a set {D in
number: D' in number: E in number: E' in number: F in number: F' in
number} of bit-planes;
c. rendering a plurality of pixel data values d, e and f each associated
with a same pixel that is part of a first image;
d. selecting as U the (D+D') in number of most significant bits of d;
e. concatenating U with e and f to obtain the logical order {U;e;f};
f. selecting as V from the result of step e the (E+E') in number of most
significant bits of e;
g. concatenating U, V and f from the results of steps e and f to obtain the
logical order {U;V;f};
h. selecting as W from the results of step g the (F+F') in number of most
significant bits of f;
i. concatenating U, V and W from the results of steps g and h to obtain the
logical order {U;V;W}, thereby obtaining the equivalent logical order
{D;D';E;E';F;F'};
j. enabling only the D in number of, E in number of and F in number of
bit-planes associated with the first sub-buffer; and then
k. subsequent to step j, performing a write memory cycle from the logical
order obtained in step i, whereby reduced precision data values of the
same pixel that is part of the first image is written into the first
sub-buffer;
l. repeating steps c through k until the first image has been stored in the
first sub-buffer; f' f' each associated with another same pixel that is
part of second image;
n. Selecting as U' D in number of most significant don't care bits
conjoined as least significant bits by D' in number of most significant
bits of d';
o. concatenating U' with e' and f' to obtain the logical order {U';e';f'};
p. selecting as V' from the result of step o E in number of most
significant don't care bits conjoined as least significant bits by the E'
in number of most significant bits of e';
q. concatenating U', V' and f' from the results of steps o and p to obtain
the logical order {U';V';f'};
r. selecting as W' from the result of step q F in number of most
significant don't care bits conjoined as least significant bits by the F'
in number of most significant bits of f';
s. concatenating U', V' and W' from the results of steps q and r to obtain
the logical order {U';V';W'}, thereby obtaining the equivalent logical
order {D.sub.don't care ;D';E.sub.don't care ;E';F.sub.don't care ;F'};
t. enabling only the D' in number of, E' in number of and F' in number of
bit planes associated with the second sub-buffer; and then
u. subsequent to step t, performing a write memory cycle from the logical
order obtained in step s, whereby reduced precision data values of the
other same pixel that is part of the second image is written into the
second sub-buffer; and
v. repeating steps m through u until the second image has been stored in
the second sub-buffer.
6. A method as in claim 5 wherein each of the pairs of selecting steps (d
and n), (f and p) and (h and r) is accomplished by shifting operations
performed by respective first, second and third barrel shifters.
7. A method as in claim 6 further comprising the steps of storing the
number of shifts each barrel shifter is to perform into an associated
control register. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
BACKGROUND AND SUMMARY OF THE INVENTION
A modern high-performance graphics workstation suitable for solid modelling
must incorporate a number of features to provide high speed rendering of
objects while at the same time remaining affordable. Experience shows that
the tasks to be accomplished are so numerous and often so complicated that
special purpose dedicated hardware is a necessity if useful images are to
be rendered and manipulated with adequate speed. Furthermore, it would be
desirable if the feature set of the dedicated hardware were flexible and
reconfigurable according to the firmware and software subtasks arising
from the user's high-level activities. The techniques disclosed herein
reduce costs, increase performance and add flexibility.
One major aspect of the invention involves the concept of cache memory.
This is a technique often used in high-performance computer systems to
increase the speed with which the CPU can access data stored in a main
memory. The idea is to use a small high speed memory to replicate the
contents of a selected region of the main memory. The CPU does its memory
accesses to the cache, which either does or does not contain data
representing the desired location in main memory. If the cache does
contain the data for the desired location the fast access to the cache
acts in the place of a slower access to the main memory. This is called a
"hit." If, on the other hand, the cache does not contain data for the
address to be accessed, then the contents of the cache must be changed to
reflect that part of main memory that does contain the desired data. This
is called a "miss," and involves writing the current contents of the cache
back into main memory (unless the current content of the cache was never
modified) and then loading the cache with data in the main memory taken
from the vicinity of the new address of interest. To facilitate this
architecture there is usually a wide data path between the cache and the
main memory.
A hardware pixel processor in a graphics system is essentially a CPU that
needs to write data into a memory. In this case the memory is called a
frame buffer, and it has an address for each pixel component of the
display. The frame buffer is also accessed by another mechanism that reads
the contents of the frame buffer to create the corresponding pixel by
pixel image upon a monitor. Typically, the monitor will be a color CRT
with red, green and blue (RGB) electron guns whose intensities are varied
by discrete steps to produce a wide range of colors. Accordingly, the
frame buffer is divided into portions containing multi-bit values for each
color of every pixel. The preferred way to do this is to organize the
frame buffer into "planes" which each receive the same address. Each plane
holds one bit at each address. Planes are grouped together to form
multi-bit values for the attributes of the pixels they represent.
Attributes include the RGB intensities, and in many systems ON and OFF for
pixels in an "overlay" plane that is merged with data in other planes. For
instance, an overlay plane might contain a cursor, and the presence of a
bit in the overlay plane might force saturation intensity for all three
electron guns, regardless of the actual RGB values for that pixel. In
graphics systems with two-dimensional displays that are intended for use
with solid modelling of three-dimensional objects, there is frequently
another attribute that is stored for each pixel: its depth. Hardware
storage of depth values greatly facilitates hidden surface removal, as it
allows the hardware to automatically suppress pixels that are not upon the
outer surface facing the viewer.
In accordance with what has been described above, it is not unusual to find
graphics systems with between twenty-four and forty planes of frame buffer
memory: perhaps three sets of eight for RGB values and sixteen or more for
Z, or depth, values. Considering that the monitor could easily be 1280
pixels wide and 1024 pixels high, and that refreshing the display at a
power line frequency of sixty Hertz is a requirement, it can be concluded
that a new pixel of twenty-four or more bits (and possibly qualified for
depth) must be obtained for the monitor from the frame buffer at a rate of
approximately one pixel every nine nanoseconds. To some extent the advent
of so called "video display RAM's" has made this easier to do. They have
special high speed ports that read blocks of data at high speed for use by
a shifter that, when grouped with the shifters of other planes for the
same color, produce the multi-bit values for color intensity. These
multi-bit values are applied to digital-to-analog converters (DAC's) that
in turn generate the signals that actually drive the electron guns.
Despite the video RAM's, formidable problems remain concerning the task of
getting the data into the frame buffer in the first place. In the long
run, the graphics system will not be able to manipulate an image (draw it,
rotate it, cut a hole in it, etc.) any faster than the image can be put
into the frame buffer. The speed with which this can be done is one
important aspect of "high performance" in a graphics system. Recalling the
purpose for caching in a conventional computer system, it will be noted
that there is a certain similarity. It would be desirable if a way could
be found to cache pixels into a high speed memory and reduce the number of
write operations made into the frame buffer. If this could be done without
sacrificing other desirable features it would significantly increase the
rate with which data could be put into the frame buffer. This is indeed
desirable, since much work has been done to develop and perfect dedicated
hardware to generate at high speed pixel values from a more abstract
description of the image to be rendered.
In the invention to be described each plane of frame buffer memory is
equipped with a corresponding plane of a pixel cache. The pixel rendering
hardware stores computed pixel values into the frame buffer by way of the
cache. Those familiar with pixel rendering mechanisms will appreciate that
the order in which pixels are calculated is not necessarily related to the
order they are accessed for use in driving the monitor, which is typically
vertically by horizontal rows for a raster scanned CRT. Instead, pixels
are apt to be generated in an order that makes sense in light of the
techniques being used to represent the object. A wire frame model would
rely heavily on the drawing of arbitrarily oriented vectors, while shaded
polygons would rely heavily upon an area fill based on successive
horizontal lines of pixels. For a curved surface the successive horizontal
lines are apt to be fairly short, may be of varying lengths and might not
line up exactly above or beneath each other. Clearly, the preferred pixel
rendering techniques are no respecters of sequentially addressed memory
spaces! Yet the sequence of generated pixels are still strongly related by
just more than being consecutive members in some order of pixel
generation; their locations in the final image are physically "close" to
each other. That is, sequentially generated pixels are apt to posses a
shared "locality." That this is so has been noticed by others, and has
been termed the "principle of locality." It seems clear that to maximize
the number of hits, a cache for a frame buffer ought to operate in view of
the principle of locality. But it is also clear that a different type of
locality obtains for area fill operations than does for arbitrary vectors.
A "tile" is a rectangular collection of pixels. Various schemes for
manipulating pixels in groups as tiles have been proposed. It would seem
that what a pixel cache for a frame buffer ought to do, at least in part,
is cache a tile. But again, the tile shape best suited for area fill
operations would be one that is one pixel high by some suitably long
number of pixels. The optimum tile shape for the drawing of arbitrary
vectors can be shown to be a square. So what is needed then, is a pixel
cache whose "shape" is adjustable according to the type of tile best
suited for use with the type of pixel rendering to be undertaken.
That object can be achieved by a pixel cache, frame buffer controller and
frame buffer memory organization that cooperate to implement a cache
corresponding to a tile of adjustable rectangular dimensions. The frame
buffer memory organization involves dividing the frame buffer into a
number of separately addressable groups. Each group is composed of one or
more bits. Along the scan lines of the raster groups repeat in a regular
order. Successive scan lines have different starting groups in the pattern
of repetition. Thus, whether a tile proceeds horizontally along a scan
line, or vertically across successive scan lines, different groups are
accessed for the pixels in that tile. This allows the entire tile to be
fetched with one memory cycle. In such a scheme adjacent pixel addresses
do not necessarily map into adjacent frame buffer addresses, as in
conventional bit-mapped displays. Instead, an address manipulator within
the frame buffer controller converts a pixel address (screen location)
into a collection of addresses (one for each group) according to rules
determined by the shape of the tile to be accessed.
Each plane of the frame buffer memory includes a sixteen-bit plane of an
RGB pixel cache and a sixteen-bit plane of a Z value cache. (It will be
understood, of course, that the number sixteen is merely exemplary, and is
not the only practical size of pixel cache.) For each bit in a pixel's RGB
values, the pixel's (X, Y) location on the monitor is mapped into the
proper location of the plane of the RGB cache associated with that bit. If
there is a hit, then the pixel is written to the cache. If there is a
miss, then the cache is written out to the frame buffer in accordance with
a replacement rule similar to those used with so-called "line movers" or
"bitblts." The replacement rule uses sixteen-bit registers named SOURCE,
DESTINATION and PATTERN. There is one of these registers for each plane of
frame buffer memory. At the time of the preceding miss, each DESTINATION,
and not the cache, was loaded with a copy of that region (tile) of the
frame buffer that the cache was then to represent. Data was then written
to the cache until there was a miss. Then the frame buffer controller
simultaneously copied all of the bits of each plane in the cache into each
SOURCE; this frees the cache for immediate use in storing new pixel
values. The frame buffer controller proceeded to combine each SOURCE with
its associated DESTINATION according to the desired rule (OR, AND, XOR,
etc.). The result was further modified by the associated PATTERN, which
can be used to impose special deviations upon the pixel data. For example,
PATTERN might suppress a regular succession of pixels to create "holes"
into which might later be placed pixels of another object, thus creating
the illusion of transparency. However achieved, the result is written, all
sixteen bits in parallel, for each plane, to the frame buffer. The mapping
of pixel addresses into the cache and the parallel write into the frame
buffer (i.e., the mapping of the cache contents back into frame buffer
addresses) are automatically adjusted according to the size and shape of
the tile being handled. Thus, one aspect of the invention to be disclosed
is a pixel cache memory that accepts programmatically variable tile sizes.
It will be further understood as the description proceeds that the tiles
may be aligned on selected pixel boundaries, and that those boundaries
need not be permanently fixed in advance.
A second major aspect of the invention concerns what is commonly referred
to as the Z buffer. In a conventional graphics system the Z buffer is a
memory, separate from the frame buffer, holding the Z (depth) value of
each pixel. In a high-performance graphics system the Z values are
typically sixteen-bit integers. Thus the conventional Z buffer would, like
the frame buffer, have an address for each pixel. The second major aspect
of the invention allows a more efficient use of memory by making each
plane of the frame buffer larger than is necessary merely to hold the RGB
values for pixels. Each plane of frame buffer memory contributes memory
that can be associated with other such contributions to form all or a
portion of a Z buffer. Furthermore, entire planes of what might otherwise
be frame buffer memory can be allocated to the Z buffer. At root, what is
taught is a very flexible division of available frame buffer memory into
an RGB buffer portion and a Z buffer portion. Said another way, the Z
buffer can be made any size and located anywhere in the frame buffer
memory through the use of a Z buffer mapping.
If it should be the case that the amount of available memory for the Z
buffer is less than enough to hold a sixteen-bit integer for each pixel
(and in a preferred embodiment this is frequently the case), then hidden
surface removal is performed in sections. For example, if there has been
only enough memory allocated to the Z buffer to correspond to one fourth
of the frame buffer, then the rendering of an image is divided into four
similar activities. First, an initial fourth of the display is created.
This might be a top-most horizontal strip, or a left-most vertical strip,
or any suitable fourth of the display. Pixels that are to reside in the
selected fourth of the display are rendered. As the RGB values for those
pixels are calculated, so are their Z values.
The existence of a hidden surface implies that there are some addresses in
the frame buffer to which more than one RGB value corresponds; each pixel
is associated with a different surface (or at least a different portion of
the same surface). Absent any special control to the contrary, the various
pixels will be calculated in some order related to the way the object has
been described to the graphics system's software and the rendering
algorithms in use. As each of the multiple pixels corresponding to an
address is rendered its RGB and Z values would overwrite the previous
values. Hidden surface removal at the hardware level with a Z buffer
compares the Z values of the conflicting pixels and allows the one with
the least depth to prevail. That is, the Z value of a new pixel in hand
for a certain address is compared with the Z buffer value for the pixel
already in that address. An old pixel's values are overwritten by the new
values if the old pixel is on a hidden surface to be removed, as indicated
by the comparison of the Z values. An additional feature of the invention
in this connection is the ability to programmatically decide what to do in
the event the new and previous Z values are equal.
To continue with the example, the above process is carried out for all
pixels in the fourth of the display being generated. Following that, the Z
buffer is allocated to represent the next fourth of the display, and the
process is repeated until the entire display has been created afresh. This
process might take several seconds if the image is extremely complicated
and there is but a very small Z buffer. On the other hand, it only has to
be done once for each presentation of a new image to the frame buffer, and
not once for each refresh of the image from the frame buffer to the
monitor.
The above described technique for hidden surface removal with a Z buffer
that corresponds to less than the full frame buffer is termed "strip Z
buffering." Strip Z buffering requires some cooperation from the software
that tells the graphics hardware what to draw. It will be appreciated that
the image to be rendered is described in a data structure called a display
list and resembling a data base. A simplified description of the graphics
system software is that it interacts with the user to get into the form of
a display list an object he desires to display and then manipulate. The
display list describes the object in the abstract. Any particular view of
the object must be derived from that abstract description through
specifying from were to view it, where the clip limits are, where the
light sources are, etc. This information is used to decide what pixels are
needed to form the image on the monitor. If strip Z buffering is in use,
then the software that makes that decision (the derivation mentioned
above) must also know what region of the screen corresponds to the
location of the Z buffer (i.e., where the "strip" is). During the
traversal of the display list it must decline to generate pixels for
regions not in the strip. Then it must traverse the list again with the
new strip, and so on until the entire object has been rendered. In a
preferred embodiment the software already knows where the Z buffer is
because it controls that, too; the Z buffer may be programmatically
located anywhere in the frame buffer.
When hidden surface removal is in effect the pixel processing mechanism
that creates individual RGB values for pixels also simultaneously creates
the Z value. The Z values need to be stored into the frame buffer at the
same overall rate as the RGB values. The Z values are stored via a
sixteen-bit cache memory (with one plane per plane of frame buffer memory)
that are very similar to the one that caches the RGB values. Recalling
that the Z values are themselves sixteen-bit values, one might be tempted
to conclude that all sixteen bits of a Z value are stored in the same
plane of (excess) frame buffer memory, and that when that is full then the
next Z value goes into the excess portion of the next plane. That is not
done since it would require the addresses of the Z values to map into
various planes of the Z cache, which is a major architectural feature not
having a counterpart in the RGB cache. Since the cache mechanism is part
of a VLSI chip, two instances of the same architecture is far more
desirable than two separate ones. Another important consideration involves
the number of planes of frame buffer memory fabricated in a frame buffer
memory assembly. The available number of planes of frame buffer memory
buffer memory assembly. The available number of planes of frame buffer
memory will be a multiple of that number, which in the preferred
embodiment is eight. The preferred embodiment to be described adopts a Z
buffer mapping into the excess portions of the frame buffer that spreads
the sixteen bits of each Z value out across eight planes of frame buffer
memory. This mapping must be flexible and programmatically determinable,
since the total number of planes of frame buffer memory can vary (in
increments of eight) according to the way the user configures the graphics
system (recall the discussion of strip Z buffering).
To cache Z values, the one or more groups of eight planes of frame buffer
memory are allocated to the Z buffer. Entire planes can be allocated, or
just the "excess" not used as RGB buffer. A minimum of one group of eight
must be allocated, implying that a minimum system configuration must
include eight planes of frame buffer memory. (This is certainly no
impractical requirement for a color system; a typical system would have
twenty-four planes of frame buffer memory, although less is possible.) A
group of eight planes of frame buffer memory used for the Z buffer can be
either eight "excess" portions not used as RGB buffer, or eight full
planes used solely for the Z buffer.
Each sixteen-bit plane in a Z buffer cache in a group of eight receives two
bits at a time from a Z value. In this way the entire sixteen-bit Z value
is cached in eight two-bit portions of Z buffer cache memory. When there
is a miss each plane of the Z cache is written out to an associated Z
WRITE register, from whence it is written to the Z buffer. The Z WRITE
registers may have to contend with the SOURCE registers for access to the
frame buffer (the Z cache fills at a rate twice that of the RGB cache, so
sometimes there will be contention, other times not). Thus, the transfer
from the Z cache to the Z WRITE registers frees the Z cache to begin
accepting new Z values immediately.
The display list will previously have been divided into portions that
correspond to the one or more groups of eight planes of frame buffer
memory. The display list portions are required to remain within the
boundaries of their associated strips. As the portions of the divided (and
perhaps even regrouped) display list are traversed, Z values are written
into the Z buffer. The order of these write operations is display list
dependent; some Z buffer locations may never be writted to, while others
may be written into more than once as hidden surface removal proceeds.
Eventually, the traversal of the display list portion is complete. If
there is another strip to construct, the mapping of the one or more groups
of eight is changed to reflect the next strip and the next portion of the
display list is traversed; otherwise the entire display has been
constructed and the strip Z buffering process has been concluded.
Another aspect of the invention concerns a programmatically variable
mapping between the pixel interpolator and the planes of frame buffer
memory. It is desirable in a system with three color interpolators but
with only a minimum number of planes in the frame buffer to be able to
select between one interpolator computing shades of gray and three
interpolators independently computing red, green and blue values. In
addition, to facilitate double buffering it is desirable to control the
mapping of pixel data bits into the frame buffer. Such a controlled
mapping can be obtained through the use of a pipelined data shifter
controlled by a register partitioned into values encoding the number of
shifts to be performed at each level of the pipeline. In a related aspect
of the invention the pipelined shifter allows the programmatic selection
of the number color intensity bits for each color that will be stored in
the frame buffer.
Still another aspect of the invention concerns the way the color map is
updated. A color map is used to create an arbitrary (or nearly so)
correspondence between the R, G and B values stored in the frame buffer
and the digital values actually sent to the R, G and B DAC's. This allows,
for example, a four-plane R value to be mapped into any sixteen of the
2.sup.8 values the DAC can accept. The total number of red values has not
increased, but they have been dispersed over the range of color resolution
available. This would be desirable in systems that either didn't have very
much frame buffer memory in the first place, or where not very many
different red colors were wanted, and frame buffer memory was de-allocated
from R value duty and used to advantage somewhere else.
From time to time the graphics system changes the color map. In
conventional systems this is accomplished without any special concern for
when it is done. This can cause display artifacts in two ways. First, the
read activities of most RAM's are distrubed during write operations. This
causes loss of the mapping action during the update, temporarily resulting
in arbitrary colors in random locations. Second, some rather peculiar
(albeit transient) colorizations can result if the color map is changed
within the duration of a raster presentation; part of the screen would be
mapped one way while part of the screen would be mapped another way. This
is almost sure to be the case because of the difficulty in synchronizing
the activities of the graphics system's CPU with raster generation by the
monitor. In the preferred embodiment the color map and the overlay map are
periodically updated from a shadow RAM during vertical retrace, whether or
not the shadow RAM has been changed. The graphics system updates the
shadow RAM in place of the conventional updates to the color and overlay
map RAM's, which are then subsequently updated from the shadow RAM during
vertical retrace.
In a further aspect of the invention the shadow RAM comprises first and
second portions, each of which is large enough to update both the color
map and the overlay RAM. After a certain number of frames of raster
generation, (say, eight) the color map and overlay map are updated from
the first portion. After eight more frames they are automatically updated
from the second portion. After another eight they are again updated
automatically from the first portion, and so on. Now suppose that the
first and second portions contain certain symmetrically different
information. A cursor in an overlay plane could be made to blink between
any two colors by the change to the overlay RAM. Or, some object in the
RGB planes could be made to blink by having alternating colors assigned to
it by the first and second portions of the color map. If the first and
second portions of the shadow RAM are made identical then no blinking is
induced by the overlay or color maps.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a simplified pictorial representation of a computer graphics
system incorporating the principles of the invention.
FIGS. 2A-C are a simplified block diagram of a portion of the computer
graphics system of FIG. 1.
FIG. 3 is a simplified block diagram of a tile address/data MUX circuit of
FIG. 2A.
FIG. 4 is a representation of a frame buffer memory organization used in
implementing programmable tile sizes.
FIGS. 5A-B illustrate the correspondence between pixel locations on the
monitor according to pixel addresses and their location in the frame
buffer memory in accordance with the memory organization of FIG. 4.
FIG. 6 is a diagram illustrating how the organization of the frame buffer
memory accommodates 16.times.1 tiles.
FIG. 7 is an example of how a specific 16.times.1 tile is stored according
to the frame buffer memory organization of FIG. 6.
FIG. 8 is a diagram illustrating how the organization of the frame buffer
memory accommodates 4.times.4 tiles.
FIG. 9 is an example of how a specific 4.times.4 tile is stored according
to the frame buffer memory organization of FIG. 8.
FIGS. 10A-F are an abbreviated schematic diagram of an address manipulator
used in implementing the frame buffer memory organization of FIGS. 6-9.
FIGS. 11A-B are a simplified block diagram of the mechanism used to refresh
the monitor of FIG. 2C from the frame buffer memory organization of FIGS.
5A-B.
FIG. 12 is a simplified block diagram of the RGB cache of FIG. 2B.
FIG. 13 is a block diagram illustrating the operation of group rotator and
unrotator circuits used in the block diagram of FIG. 12.
FIG. 14 is a simplified block diagram of the Z cache of FIG. 2B.
FIG. 15 is an illustration of how a Z buffer is mapped into the frame
buffer memory assembly of FIG. 2B.
FIG. 16 is a simplified block diagram of a three level pipelined shifter
used to programmably truncate and steer pixel data fields from a pixel
interpolator into a combined format to be stored in the frame buffer of
FIG. 2B.
FIG. 17 is a block diagram of a portion of the color map assembly of FIG.
2C, and illustrates the operation of a shadow RAM for updating the
contents of the color map and overlay map RAM's.
DESCRIPTION OF A PREFERRED EMBODIMENT
1. Introduction
Refer now to FIG. 1, wherein is shown a pictorial representation of an
actual graphics system embodying various aspects of the invention. In
particular, the graphics system includes a computer 1 (which may be a
Hewlett-Packard Model 9000 Series 320 Computer), a keyboard 2, knob box 3,
button box 4, mouse 5, graphics accelerator 7 (which may be a
Hewlett-Packard Model 98720A) and a color monitor 9. The computer 1
executes the software of the graphics system. That software includes the
user interface and the preparation of the display list, which might be
based either upon a B-spline description of the surface to be displayed or
upon a wire frame model. The computer 1 is coupled to the graphics
accelerator 7 through a high speed local graphics bus 16. The graphics
accelerator 7 is in turn coupled to the color monitor 9 through three
coaxial cables for carrying the Red, Green and Blue (RGB) video signals.
To render an image that has been described to the graphics system, the
graphics software traverses the display list and sends values representing
surface patches in a parameter space and/or vector endpoints to the
graphics accelerator 7. In the case of a B-spline description the
transmitted values are processed by microcode in the graphics accelerator
7 to obtain the (X, Y, Z) locations and colors for the vertices of
polygons that approximate each patch. It is then the job of a pixel
interpolator within the graphics accelerator to calculate and write into
the frame buffer all of the pixel values describing the entire polygon
surface, including multi-axis interpolation of the colors for shading
during the fill operation, and including Z axis interpolation and hidden
surface removal. In the case of a wire frame model the tasks are similar,
except that (1) the parameter space description is absent in favor of
vector end points in (X, Y, Z) space, and (2) instead of generating filled
polygons the graphics accelerator creates a continuous and
color-interpolated vector. In either case, as the calculated pixel values
are written into the frame buffer they become visible upon the monitor 9.
The frame buffer that is divisible into RGB and Z buffer portions, the RGB
pixel and Z buffer caches, circuitry for implementing the different tile
sizes, the Z mapping circuitry, and the shadow RAM, are all located in the
graphics accelerator 7. The graphics accelerator 7 is, of course, the item
with which we shall be principally concerned throughout the remainder of
this Specification.
FIGS. 2A-C show a simplified block diagram of the graphics accelerator 7. A
Data Input Output Bus (DIO Bus) 6 within the computer 1 is coupled to an
interface 10, from whence it emerges as a Local Graphic Bus 16 (LGB). The
LGB 16 is a communication path for data and instructions between the
computer 1 and the graphics accelerator 7, and between the various
mechanisms within the graphics accelerator 7. Among the mechanisms within
the graphics accelerator 7 are a transform engine 11, a scan converter 12,
a frame buffer controller 13, one or more frame buffer assemblies 14i-iv,
and a color map assembly 15. The output of the color map assembly 15 is
the three RGB video signals 8 that drive the color monitor 9.
The purpose of the transform engine 11 is to receive sections of the
display list as it is traversed by the graphics software executing in the
computer 1 and convert those into sequences of device coordinates.
Basically, these are pixel values (X, Y, Z, R, G, B) for either vector
endpoints or polygon vertices. These device coordinates are output upon a
device coordinate bus 17 that is coupled to the scan converter 12.
The purpose of the scan converter 12 is to calculate by interpolation
additional pixel values for pixels between vector endpoints or along the
edges and within polygons. To this end the device coordinates are buffered
in a Device Coordinate (DC) RAM 20, from which they are available to a
high-speed pixel interpolator 21. The resulting sequence of Z values is
separated and output on a Z bus 19. The RGB color values are compressed
and formatted by a color data formatter circuit 89, whereupon they pass
via a tile address/data MUX circuit 22 onto a pixel bus 18 that carries in
multiplexed form both pixel data values and pixel address values.
The pixel color data formatter 89 allows the programmatic steering of a
selected number of red pixel value bits, a selected number of green pixel
value bits, and a selected number of blue pixel value bits into
correponding planes of the frame buffer memory. This programmability is
combined with the necessary conversion of the high precision pixel color
values down to the precision that will actually be used in controlling the
electron guns in the CRT.
The tile address/data MUX circuit 22 is programmable to recognize different
tile sizes and shapes. By its multiplexing action it reduces the number of
lines needed in the pixel bus 18, without significantly increasing the
nu | | |