WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Programmable pipeline for formatting RGB pixel data into fields of selected size    
United States Patent5029105   
Link to this pagehttp://www.wikipatents.com/5029105.html
Inventor(s)Coleman; Mark D. (Ft. Collins, CO); Cherry; Robert W. (Loveland, CO)
AbstractA graphics system uses a programmable tile size and shape supported by a frame buffer memory organization wherein (X, Y) pixel map into regularly offset permutations on groups of RAM (Random Access Memory) address and data line assignments. Changing the mapping of (X, Y) pixel addressed to RAM addresses for the groups changes the size and shape of the tiles. A pixel data/partial address multiplexing method based on programmable tile size reduces the number of interconnections between a pixel interpolator and the frame buffer without significantly increasing the number of bus cycles needed to transfer the information. A programmable pipelined shifter allows the dynamic alteration of the mapping between bits of the RGB (red, green, blue) intensity values and the planes of the frame buffer into which those bits are stored, as well as allowing those values to be truncated to specified lengths. Tiles are cached. Tiles for RGB pixel values are cached in an RGB cache, while Z values are cached in a separate cache.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5029105
Programmable pipeline for formatting RGB pixel data into fields of

     selected size - US Patent 5029105 Drawing
Programmable pipeline for formatting RGB pixel data into fields of selected size
Inventor     Coleman; Mark D. (Ft. Collins, CO); Cherry; Robert W. (Loveland, CO)
Owner/Assignee     Hewlett-Packard (Palo Alto, CA)
Patent assignment
All assignments
Publication Date     July 2, 1991
Application Number     07/087,182
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     August 18, 1987
US Classification     345/539 345/600 345/605
Int'l Classification     G06F 015/20
Examiner     Harkcom; Gary V.
Assistant Examiner     Herndon; H. R.
Attorney/Law Firm     Miller; Edward L.
Address
Parent Case    
Priority Data    
USPTO Field of Search     364/518 364/519 364/520 364/521 364/522 364/523 364/200 MS File 364/ 340/747 340/750 340/798 340/799 340/800 340/703
Patent Tags     programmable pipeline formatting rgb pixel data into fields of selected size
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
4597044
Circello
713/100
Jun,1986

[0 after 0 votes]
4467414
Akagi
711/119
Aug,1984

[0 after 0 votes]
4407016
Bayliss
710/3
Sep,1983

[0 after 0 votes]
4574394
Holsztynski
382/303
Dec,1969

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


We claim:

1. A programmable data formatter responsive to pixel data triples representing three different color intensities and producing therefrom individual pixel intensity values to be stored in fewer bit-planes of a frame buffer than there are bits in the pixel data triples, the data formatter comprising:

a first pipeline stage that captures first and second portions of a pixel data triple, truncates a third portion thereof by shifting it a first selected amount and then captures the truncated third portion as a third individual pixel intensity value;

a second pipeline stage coupled to the captured contents of the first pipeline stage, that captures the first portion of the pixel data triple therein, truncates the second portion of the pixel data triple therein by shifting it and the truncated third portion together as an intermediate unit by a second selected amount and then captures the shifted intermediate unit as the third and a second individual pixel intensity values;

a third pipeline stage coupled to the captured contents of the second pipeline stage, that shifts those contents as a final unit by a third selected amount and then captures that shifted final unit as the third, second and a first individual pixel intensity values; and

register means coupled to the first, second and third pipeline stages for containing bit patterns corresponding to the first, second and third selected amounts.

2. A programmable data formatter responsive to pixel data triples having D in number of, E in number of and F in number of bits respectively representing three different color intensities, D, E and F each being integers, and producing therefrom respective individual pixel intensity values having d in number of, e in number of and f in number of bits to be stored in a plurality, (d+e+f) in number, of bit-planes of a frame buffer having a number of bit-planes at least equal to the sum (d+e+f), d, e and f each being integers also, and (d+e+f)<(D+E+F), the programmable data formatter comprising:

rendering means for producing pixel data triples having D in number, E in number and F in number of bits respectively representing different color intensities;

a first register of ordered bits, (D+E+F) in number, and having respective inputs and outputs for each bit, E in number of those inputs respectively coupled to the E in number of bits of color intensity and F in number of those inputs respectively coupled to the F in number of bits of color intensity;

a first shifting means, having inputs respectively coupled to the D in number of bits of color intensity and having outputs respectively coupled to D in number of the inputs to the first register means, for causing a contiguous subset of d in number of the D in number of bits of color intensity to be applied to a first contiguous subset of the D in number of inputs of the first register, 0.ltoreq.d.ltoreq.D and the first contiguous subset starting with the least significant input among the D in number of first register inputs in the ordering thereof;

a second register of ordered bits, (D+E+F) in number, and having respective inputs and outputs for each bit, F in number of those inputs respectively coupled to the F in number of outputs of the first register;

a second shifting means, having inputs respectively coupled to the D in number of and E in number of outputs of the first register and having outputs respectively coupled to the D in number of and E in number of inputs of the second register, for causing a continguous subset of (d+e) in number of the D in number of and E in number of outputs of the first register to be applied to a second contiguous subset of the D in number of and E in number of inputs of the second register, 0.ltoreq.e.ltoreq.E and the second contiguous subset starting with the least significant input among the D in number of and E in number of second register inputs in the ordering thereof;

a third register of ordered bits, having at least (d+e+f) in number of such ordered bits, 0.ltoreq.f.ltoreq.F, and having respective inputs and outputs for each bit;

a third shifting means, having inputs respectively coupled to the D in number of, E in number of and F in number of outputs of the second register and having outputs respectively coupled to the at least (d+e+f) in number of inputs of the third register means, for causing a contiguous subset of (d+e+f) in number of outputs of the second register to be applied to a third contiguous subset of the at least (d+e+f) in number of inputs of the third register, the third contiguous subset starting with the least significant input among the at least (d+e+f) in number of third register inputs in the ordering thereof; and

a frame buffer of at least (d+e+f) in number of bit planes respectively coupled to the at least (d+e+f) in number of bits of the third register.

3. A programmable data formatter as in claim 2 further comprising programming register means having outputs coupled to the first, second and third shifting means for determining the values of d, e and f.

4. A method of storing a plurality of rendered pixel data values, each associated with the same pixel address, in a selectable and variable number of frame buffer bit-planes fewer than the number of bits in the plurality of rendered pixel data values; the method comprising the steps of:

selecting a first number of most significant bits from a first pixel data value that is a member of the plurality of rendered pixel data values;

selecting a second number of most significant bits from a second pixel data value that is a member of the plurality of rendered pixel data values;

and so on until a number of most significant bits from each member of the plurality of rendered pixel data values has been selected;

concatenating the first, second, and so on, numbers of selected most significant bits into a unified data field of selectable and variable location in a data word having the same number of bits as bit-planes in a frame buffer; and

storing the data word in an address in the frame buffer, each bit of the data word being stored in one bit of an associated bit-plane of the frame buffer.

5. A method of double buffering images stored in a frame buffer having M in number of bit-planes, M being an integer, the method comprising the steps of:

a. partitioning the frame buffer into a first sub-buffer of D in number of, E in number of and F in number of bit-planes, D+E+F=J, and a second sub-buffer of D' in number of, E' in number of and F' in number of bit-planes, where each of D, E, F, D', E', F' and J are integers, and D'+E'+F'=K, D=D', E=E', F=F' and J+K.ltoreq.M, K being an integer also;

b. logically ordering the first and second sub-buffers as a set {D in number: D' in number: E in number: E' in number: F in number: F' in number} of bit-planes;

c. rendering a plurality of pixel data values d, e and f each associated with a same pixel that is part of a first image;

d. selecting as U the (D+D') in number of most significant bits of d;

e. concatenating U with e and f to obtain the logical order {U;e;f};

f. selecting as V from the result of step e the (E+E') in number of most significant bits of e;

g. concatenating U, V and f from the results of steps e and f to obtain the logical order {U;V;f};

h. selecting as W from the results of step g the (F+F') in number of most significant bits of f;

i. concatenating U, V and W from the results of steps g and h to obtain the logical order {U;V;W}, thereby obtaining the equivalent logical order {D;D';E;E';F;F'};

j. enabling only the D in number of, E in number of and F in number of bit-planes associated with the first sub-buffer; and then

k. subsequent to step j, performing a write memory cycle from the logical order obtained in step i, whereby reduced precision data values of the same pixel that is part of the first image is written into the first sub-buffer;

l. repeating steps c through k until the first image has been stored in the first sub-buffer; f' f' each associated with another same pixel that is part of second image;

n. Selecting as U' D in number of most significant don't care bits conjoined as least significant bits by D' in number of most significant bits of d';

o. concatenating U' with e' and f' to obtain the logical order {U';e';f'};

p. selecting as V' from the result of step o E in number of most significant don't care bits conjoined as least significant bits by the E' in number of most significant bits of e';

q. concatenating U', V' and f' from the results of steps o and p to obtain the logical order {U';V';f'};

r. selecting as W' from the result of step q F in number of most significant don't care bits conjoined as least significant bits by the F' in number of most significant bits of f';

s. concatenating U', V' and W' from the results of steps q and r to obtain the logical order {U';V';W'}, thereby obtaining the equivalent logical order {D.sub.don't care ;D';E.sub.don't care ;E';F.sub.don't care ;F'};

t. enabling only the D' in number of, E' in number of and F' in number of bit planes associated with the second sub-buffer; and then

u. subsequent to step t, performing a write memory cycle from the logical order obtained in step s, whereby reduced precision data values of the other same pixel that is part of the second image is written into the second sub-buffer; and

v. repeating steps m through u until the second image has been stored in the second sub-buffer.

6. A method as in claim 5 wherein each of the pairs of selecting steps (d and n), (f and p) and (h and r) is accomplished by shifting operations performed by respective first, second and third barrel shifters.

7. A method as in claim 6 further comprising the steps of storing the number of shifts each barrel shifter is to perform into an associated control register.
 Description Submit all comments and votes
 


BACKGROUND AND SUMMARY OF THE INVENTION

A modern high-performance graphics workstation suitable for solid modelling must incorporate a number of features to provide high speed rendering of objects while at the same time remaining affordable. Experience shows that the tasks to be accomplished are so numerous and often so complicated that special purpose dedicated hardware is a necessity if useful images are to be rendered and manipulated with adequate speed. Furthermore, it would be desirable if the feature set of the dedicated hardware were flexible and reconfigurable according to the firmware and software subtasks arising from the user's high-level activities. The techniques disclosed herein reduce costs, increase performance and add flexibility.

One major aspect of the invention involves the concept of cache memory. This is a technique often used in high-performance computer systems to increase the speed with which the CPU can access data stored in a main memory. The idea is to use a small high speed memory to replicate the contents of a selected region of the main memory. The CPU does its memory accesses to the cache, which either does or does not contain data representing the desired location in main memory. If the cache does contain the data for the desired location the fast access to the cache acts in the place of a slower access to the main memory. This is called a "hit." If, on the other hand, the cache does not contain data for the address to be accessed, then the contents of the cache must be changed to reflect that part of main memory that does contain the desired data. This is called a "miss," and involves writing the current contents of the cache back into main memory (unless the current content of the cache was never modified) and then loading the cache with data in the main memory taken from the vicinity of the new address of interest. To facilitate this architecture there is usually a wide data path between the cache and the main memory.

A hardware pixel processor in a graphics system is essentially a CPU that needs to write data into a memory. In this case the memory is called a frame buffer, and it has an address for each pixel component of the display. The frame buffer is also accessed by another mechanism that reads the contents of the frame buffer to create the corresponding pixel by pixel image upon a monitor. Typically, the monitor will be a color CRT with red, green and blue (RGB) electron guns whose intensities are varied by discrete steps to produce a wide range of colors. Accordingly, the frame buffer is divided into portions containing multi-bit values for each color of every pixel. The preferred way to do this is to organize the frame buffer into "planes" which each receive the same address. Each plane holds one bit at each address. Planes are grouped together to form multi-bit values for the attributes of the pixels they represent. Attributes include the RGB intensities, and in many systems ON and OFF for pixels in an "overlay" plane that is merged with data in other planes. For instance, an overlay plane might contain a cursor, and the presence of a bit in the overlay plane might force saturation intensity for all three electron guns, regardless of the actual RGB values for that pixel. In graphics systems with two-dimensional displays that are intended for use with solid modelling of three-dimensional objects, there is frequently another attribute that is stored for each pixel: its depth. Hardware storage of depth values greatly facilitates hidden surface removal, as it allows the hardware to automatically suppress pixels that are not upon the outer surface facing the viewer.

In accordance with what has been described above, it is not unusual to find graphics systems with between twenty-four and forty planes of frame buffer memory: perhaps three sets of eight for RGB values and sixteen or more for Z, or depth, values. Considering that the monitor could easily be 1280 pixels wide and 1024 pixels high, and that refreshing the display at a power line frequency of sixty Hertz is a requirement, it can be concluded that a new pixel of twenty-four or more bits (and possibly qualified for depth) must be obtained for the monitor from the frame buffer at a rate of approximately one pixel every nine nanoseconds. To some extent the advent of so called "video display RAM's" has made this easier to do. They have special high speed ports that read blocks of data at high speed for use by a shifter that, when grouped with the shifters of other planes for the same color, produce the multi-bit values for color intensity. These multi-bit values are applied to digital-to-analog converters (DAC's) that in turn generate the signals that actually drive the electron guns.

Despite the video RAM's, formidable problems remain concerning the task of getting the data into the frame buffer in the first place. In the long run, the graphics system will not be able to manipulate an image (draw it, rotate it, cut a hole in it, etc.) any faster than the image can be put into the frame buffer. The speed with which this can be done is one important aspect of "high performance" in a graphics system. Recalling the purpose for caching in a conventional computer system, it will be noted that there is a certain similarity. It would be desirable if a way could be found to cache pixels into a high speed memory and reduce the number of write operations made into the frame buffer. If this could be done without sacrificing other desirable features it would significantly increase the rate with which data could be put into the frame buffer. This is indeed desirable, since much work has been done to develop and perfect dedicated hardware to generate at high speed pixel values from a more abstract description of the image to be rendered.

In the invention to be described each plane of frame buffer memory is equipped with a corresponding plane of a pixel cache. The pixel rendering hardware stores computed pixel values into the frame buffer by way of the cache. Those familiar with pixel rendering mechanisms will appreciate that the order in which pixels are calculated is not necessarily related to the order they are accessed for use in driving the monitor, which is typically vertically by horizontal rows for a raster scanned CRT. Instead, pixels are apt to be generated in an order that makes sense in light of the techniques being used to represent the object. A wire frame model would rely heavily on the drawing of arbitrarily oriented vectors, while shaded polygons would rely heavily upon an area fill based on successive horizontal lines of pixels. For a curved surface the successive horizontal lines are apt to be fairly short, may be of varying lengths and might not line up exactly above or beneath each other. Clearly, the preferred pixel rendering techniques are no respecters of sequentially addressed memory spaces! Yet the sequence of generated pixels are still strongly related by just more than being consecutive members in some order of pixel generation; their locations in the final image are physically "close" to each other. That is, sequentially generated pixels are apt to posses a shared "locality." That this is so has been noticed by others, and has been termed the "principle of locality." It seems clear that to maximize the number of hits, a cache for a frame buffer ought to operate in view of the principle of locality. But it is also clear that a different type of locality obtains for area fill operations than does for arbitrary vectors.

A "tile" is a rectangular collection of pixels. Various schemes for manipulating pixels in groups as tiles have been proposed. It would seem that what a pixel cache for a frame buffer ought to do, at least in part, is cache a tile. But again, the tile shape best suited for area fill operations would be one that is one pixel high by some suitably long number of pixels. The optimum tile shape for the drawing of arbitrary vectors can be shown to be a square. So what is needed then, is a pixel cache whose "shape" is adjustable according to the type of tile best suited for use with the type of pixel rendering to be undertaken.

That object can be achieved by a pixel cache, frame buffer controller and frame buffer memory organization that cooperate to implement a cache corresponding to a tile of adjustable rectangular dimensions. The frame buffer memory organization involves dividing the frame buffer into a number of separately addressable groups. Each group is composed of one or more bits. Along the scan lines of the raster groups repeat in a regular order. Successive scan lines have different starting groups in the pattern of repetition. Thus, whether a tile proceeds horizontally along a scan line, or vertically across successive scan lines, different groups are accessed for the pixels in that tile. This allows the entire tile to be fetched with one memory cycle. In such a scheme adjacent pixel addresses do not necessarily map into adjacent frame buffer addresses, as in conventional bit-mapped displays. Instead, an address manipulator within the frame buffer controller converts a pixel address (screen location) into a collection of addresses (one for each group) according to rules determined by the shape of the tile to be accessed.

Each plane of the frame buffer memory includes a sixteen-bit plane of an RGB pixel cache and a sixteen-bit plane of a Z value cache. (It will be understood, of course, that the number sixteen is merely exemplary, and is not the only practical size of pixel cache.) For each bit in a pixel's RGB values, the pixel's (X, Y) location on the monitor is mapped into the proper location of the plane of the RGB cache associated with that bit. If there is a hit, then the pixel is written to the cache. If there is a miss, then the cache is written out to the frame buffer in accordance with a replacement rule similar to those used with so-called "line movers" or "bitblts." The replacement rule uses sixteen-bit registers named SOURCE, DESTINATION and PATTERN. There is one of these registers for each plane of frame buffer memory. At the time of the preceding miss, each DESTINATION, and not the cache, was loaded with a copy of that region (tile) of the frame buffer that the cache was then to represent. Data was then written to the cache until there was a miss. Then the frame buffer controller simultaneously copied all of the bits of each plane in the cache into each SOURCE; this frees the cache for immediate use in storing new pixel values. The frame buffer controller proceeded to combine each SOURCE with its associated DESTINATION according to the desired rule (OR, AND, XOR, etc.). The result was further modified by the associated PATTERN, which can be used to impose special deviations upon the pixel data. For example, PATTERN might suppress a regular succession of pixels to create "holes" into which might later be placed pixels of another object, thus creating the illusion of transparency. However achieved, the result is written, all sixteen bits in parallel, for each plane, to the frame buffer. The mapping of pixel addresses into the cache and the parallel write into the frame buffer (i.e., the mapping of the cache contents back into frame buffer addresses) are automatically adjusted according to the size and shape of the tile being handled. Thus, one aspect of the invention to be disclosed is a pixel cache memory that accepts programmatically variable tile sizes. It will be further understood as the description proceeds that the tiles may be aligned on selected pixel boundaries, and that those boundaries need not be permanently fixed in advance.

A second major aspect of the invention concerns what is commonly referred to as the Z buffer. In a conventional graphics system the Z buffer is a memory, separate from the frame buffer, holding the Z (depth) value of each pixel. In a high-performance graphics system the Z values are typically sixteen-bit integers. Thus the conventional Z buffer would, like the frame buffer, have an address for each pixel. The second major aspect of the invention allows a more efficient use of memory by making each plane of the frame buffer larger than is necessary merely to hold the RGB values for pixels. Each plane of frame buffer memory contributes memory that can be associated with other such contributions to form all or a portion of a Z buffer. Furthermore, entire planes of what might otherwise be frame buffer memory can be allocated to the Z buffer. At root, what is taught is a very flexible division of available frame buffer memory into an RGB buffer portion and a Z buffer portion. Said another way, the Z buffer can be made any size and located anywhere in the frame buffer memory through the use of a Z buffer mapping.

If it should be the case that the amount of available memory for the Z buffer is less than enough to hold a sixteen-bit integer for each pixel (and in a preferred embodiment this is frequently the case), then hidden surface removal is performed in sections. For example, if there has been only enough memory allocated to the Z buffer to correspond to one fourth of the frame buffer, then the rendering of an image is divided into four similar activities. First, an initial fourth of the display is created. This might be a top-most horizontal strip, or a left-most vertical strip, or any suitable fourth of the display. Pixels that are to reside in the selected fourth of the display are rendered. As the RGB values for those pixels are calculated, so are their Z values.

The existence of a hidden surface implies that there are some addresses in the frame buffer to which more than one RGB value corresponds; each pixel is associated with a different surface (or at least a different portion of the same surface). Absent any special control to the contrary, the various pixels will be calculated in some order related to the way the object has been described to the graphics system's software and the rendering algorithms in use. As each of the multiple pixels corresponding to an address is rendered its RGB and Z values would overwrite the previous values. Hidden surface removal at the hardware level with a Z buffer compares the Z values of the conflicting pixels and allows the one with the least depth to prevail. That is, the Z value of a new pixel in hand for a certain address is compared with the Z buffer value for the pixel already in that address. An old pixel's values are overwritten by the new values if the old pixel is on a hidden surface to be removed, as indicated by the comparison of the Z values. An additional feature of the invention in this connection is the ability to programmatically decide what to do in the event the new and previous Z values are equal.

To continue with the example, the above process is carried out for all pixels in the fourth of the display being generated. Following that, the Z buffer is allocated to represent the next fourth of the display, and the process is repeated until the entire display has been created afresh. This process might take several seconds if the image is extremely complicated and there is but a very small Z buffer. On the other hand, it only has to be done once for each presentation of a new image to the frame buffer, and not once for each refresh of the image from the frame buffer to the monitor.

The above described technique for hidden surface removal with a Z buffer that corresponds to less than the full frame buffer is termed "strip Z buffering." Strip Z buffering requires some cooperation from the software that tells the graphics hardware what to draw. It will be appreciated that the image to be rendered is described in a data structure called a display list and resembling a data base. A simplified description of the graphics system software is that it interacts with the user to get into the form of a display list an object he desires to display and then manipulate. The display list describes the object in the abstract. Any particular view of the object must be derived from that abstract description through specifying from were to view it, where the clip limits are, where the light sources are, etc. This information is used to decide what pixels are needed to form the image on the monitor. If strip Z buffering is in use, then the software that makes that decision (the derivation mentioned above) must also know what region of the screen corresponds to the location of the Z buffer (i.e., where the "strip" is). During the traversal of the display list it must decline to generate pixels for regions not in the strip. Then it must traverse the list again with the new strip, and so on until the entire object has been rendered. In a preferred embodiment the software already knows where the Z buffer is because it controls that, too; the Z buffer may be programmatically located anywhere in the frame buffer.

When hidden surface removal is in effect the pixel processing mechanism that creates individual RGB values for pixels also simultaneously creates the Z value. The Z values need to be stored into the frame buffer at the same overall rate as the RGB values. The Z values are stored via a sixteen-bit cache memory (with one plane per plane of frame buffer memory) that are very similar to the one that caches the RGB values. Recalling that the Z values are themselves sixteen-bit values, one might be tempted to conclude that all sixteen bits of a Z value are stored in the same plane of (excess) frame buffer memory, and that when that is full then the next Z value goes into the excess portion of the next plane. That is not done since it would require the addresses of the Z values to map into various planes of the Z cache, which is a major architectural feature not having a counterpart in the RGB cache. Since the cache mechanism is part of a VLSI chip, two instances of the same architecture is far more desirable than two separate ones. Another important consideration involves the number of planes of frame buffer memory fabricated in a frame buffer memory assembly. The available number of planes of frame buffer memory buffer memory assembly. The available number of planes of frame buffer memory will be a multiple of that number, which in the preferred embodiment is eight. The preferred embodiment to be described adopts a Z buffer mapping into the excess portions of the frame buffer that spreads the sixteen bits of each Z value out across eight planes of frame buffer memory. This mapping must be flexible and programmatically determinable, since the total number of planes of frame buffer memory can vary (in increments of eight) according to the way the user configures the graphics system (recall the discussion of strip Z buffering).

To cache Z values, the one or more groups of eight planes of frame buffer memory are allocated to the Z buffer. Entire planes can be allocated, or just the "excess" not used as RGB buffer. A minimum of one group of eight must be allocated, implying that a minimum system configuration must include eight planes of frame buffer memory. (This is certainly no impractical requirement for a color system; a typical system would have twenty-four planes of frame buffer memory, although less is possible.) A group of eight planes of frame buffer memory used for the Z buffer can be either eight "excess" portions not used as RGB buffer, or eight full planes used solely for the Z buffer.

Each sixteen-bit plane in a Z buffer cache in a group of eight receives two bits at a time from a Z value. In this way the entire sixteen-bit Z value is cached in eight two-bit portions of Z buffer cache memory. When there is a miss each plane of the Z cache is written out to an associated Z WRITE register, from whence it is written to the Z buffer. The Z WRITE registers may have to contend with the SOURCE registers for access to the frame buffer (the Z cache fills at a rate twice that of the RGB cache, so sometimes there will be contention, other times not). Thus, the transfer from the Z cache to the Z WRITE registers frees the Z cache to begin accepting new Z values immediately.

The display list will previously have been divided into portions that correspond to the one or more groups of eight planes of frame buffer memory. The display list portions are required to remain within the boundaries of their associated strips. As the portions of the divided (and perhaps even regrouped) display list are traversed, Z values are written into the Z buffer. The order of these write operations is display list dependent; some Z buffer locations may never be writted to, while others may be written into more than once as hidden surface removal proceeds. Eventually, the traversal of the display list portion is complete. If there is another strip to construct, the mapping of the one or more groups of eight is changed to reflect the next strip and the next portion of the display list is traversed; otherwise the entire display has been constructed and the strip Z buffering process has been concluded.

Another aspect of the invention concerns a programmatically variable mapping between the pixel interpolator and the planes of frame buffer memory. It is desirable in a system with three color interpolators but with only a minimum number of planes in the frame buffer to be able to select between one interpolator computing shades of gray and three interpolators independently computing red, green and blue values. In addition, to facilitate double buffering it is desirable to control the mapping of pixel data bits into the frame buffer. Such a controlled mapping can be obtained through the use of a pipelined data shifter controlled by a register partitioned into values encoding the number of shifts to be performed at each level of the pipeline. In a related aspect of the invention the pipelined shifter allows the programmatic selection of the number color intensity bits for each color that will be stored in the frame buffer.

Still another aspect of the invention concerns the way the color map is updated. A color map is used to create an arbitrary (or nearly so) correspondence between the R, G and B values stored in the frame buffer and the digital values actually sent to the R, G and B DAC's. This allows, for example, a four-plane R value to be mapped into any sixteen of the 2.sup.8 values the DAC can accept. The total number of red values has not increased, but they have been dispersed over the range of color resolution available. This would be desirable in systems that either didn't have very much frame buffer memory in the first place, or where not very many different red colors were wanted, and frame buffer memory was de-allocated from R value duty and used to advantage somewhere else.

From time to time the graphics system changes the color map. In conventional systems this is accomplished without any special concern for when it is done. This can cause display artifacts in two ways. First, the read activities of most RAM's are distrubed during write operations. This causes loss of the mapping action during the update, temporarily resulting in arbitrary colors in random locations. Second, some rather peculiar (albeit transient) colorizations can result if the color map is changed within the duration of a raster presentation; part of the screen would be mapped one way while part of the screen would be mapped another way. This is almost sure to be the case because of the difficulty in synchronizing the activities of the graphics system's CPU with raster generation by the monitor. In the preferred embodiment the color map and the overlay map are periodically updated from a shadow RAM during vertical retrace, whether or not the shadow RAM has been changed. The graphics system updates the shadow RAM in place of the conventional updates to the color and overlay map RAM's, which are then subsequently updated from the shadow RAM during vertical retrace.

In a further aspect of the invention the shadow RAM comprises first and second portions, each of which is large enough to update both the color map and the overlay RAM. After a certain number of frames of raster generation, (say, eight) the color map and overlay map are updated from the first portion. After eight more frames they are automatically updated from the second portion. After another eight they are again updated automatically from the first portion, and so on. Now suppose that the first and second portions contain certain symmetrically different information. A cursor in an overlay plane could be made to blink between any two colors by the change to the overlay RAM. Or, some object in the RGB planes could be made to blink by having alternating colors assigned to it by the first and second portions of the color map. If the first and second portions of the shadow RAM are made identical then no blinking is induced by the overlay or color maps.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified pictorial representation of a computer graphics system incorporating the principles of the invention.

FIGS. 2A-C are a simplified block diagram of a portion of the computer graphics system of FIG. 1.

FIG. 3 is a simplified block diagram of a tile address/data MUX circuit of FIG. 2A.

FIG. 4 is a representation of a frame buffer memory organization used in implementing programmable tile sizes.

FIGS. 5A-B illustrate the correspondence between pixel locations on the monitor according to pixel addresses and their location in the frame buffer memory in accordance with the memory organization of FIG. 4.

FIG. 6 is a diagram illustrating how the organization of the frame buffer memory accommodates 16.times.1 tiles.

FIG. 7 is an example of how a specific 16.times.1 tile is stored according to the frame buffer memory organization of FIG. 6.

FIG. 8 is a diagram illustrating how the organization of the frame buffer memory accommodates 4.times.4 tiles.

FIG. 9 is an example of how a specific 4.times.4 tile is stored according to the frame buffer memory organization of FIG. 8.

FIGS. 10A-F are an abbreviated schematic diagram of an address manipulator used in implementing the frame buffer memory organization of FIGS. 6-9.

FIGS. 11A-B are a simplified block diagram of the mechanism used to refresh the monitor of FIG. 2C from the frame buffer memory organization of FIGS. 5A-B.

FIG. 12 is a simplified block diagram of the RGB cache of FIG. 2B.

FIG. 13 is a block diagram illustrating the operation of group rotator and unrotator circuits used in the block diagram of FIG. 12.

FIG. 14 is a simplified block diagram of the Z cache of FIG. 2B.

FIG. 15 is an illustration of how a Z buffer is mapped into the frame buffer memory assembly of FIG. 2B.

FIG. 16 is a simplified block diagram of a three level pipelined shifter used to programmably truncate and steer pixel data fields from a pixel interpolator into a combined format to be stored in the frame buffer of FIG. 2B.

FIG. 17 is a block diagram of a portion of the color map assembly of FIG. 2C, and illustrates the operation of a shadow RAM for updating the contents of the color map and overlay map RAM's.

DESCRIPTION OF A PREFERRED EMBODIMENT

1. Introduction

Refer now to FIG. 1, wherein is shown a pictorial representation of an actual graphics system embodying various aspects of the invention. In particular, the graphics system includes a computer 1 (which may be a Hewlett-Packard Model 9000 Series 320 Computer), a keyboard 2, knob box 3, button box 4, mouse 5, graphics accelerator 7 (which may be a Hewlett-Packard Model 98720A) and a color monitor 9. The computer 1 executes the software of the graphics system. That software includes the user interface and the preparation of the display list, which might be based either upon a B-spline description of the surface to be displayed or upon a wire frame model. The computer 1 is coupled to the graphics accelerator 7 through a high speed local graphics bus 16. The graphics accelerator 7 is in turn coupled to the color monitor 9 through three coaxial cables for carrying the Red, Green and Blue (RGB) video signals.

To render an image that has been described to the graphics system, the graphics software traverses the display list and sends values representing surface patches in a parameter space and/or vector endpoints to the graphics accelerator 7. In the case of a B-spline description the transmitted values are processed by microcode in the graphics accelerator 7 to obtain the (X, Y, Z) locations and colors for the vertices of polygons that approximate each patch. It is then the job of a pixel interpolator within the graphics accelerator to calculate and write into the frame buffer all of the pixel values describing the entire polygon surface, including multi-axis interpolation of the colors for shading during the fill operation, and including Z axis interpolation and hidden surface removal. In the case of a wire frame model the tasks are similar, except that (1) the parameter space description is absent in favor of vector end points in (X, Y, Z) space, and (2) instead of generating filled polygons the graphics accelerator creates a continuous and color-interpolated vector. In either case, as the calculated pixel values are written into the frame buffer they become visible upon the monitor 9.

The frame buffer that is divisible into RGB and Z buffer portions, the RGB pixel and Z buffer caches, circuitry for implementing the different tile sizes, the Z mapping circuitry, and the shadow RAM, are all located in the graphics accelerator 7. The graphics accelerator 7 is, of course, the item with which we shall be principally concerned throughout the remainder of this Specification.

FIGS. 2A-C show a simplified block diagram of the graphics accelerator 7. A Data Input Output Bus (DIO Bus) 6 within the computer 1 is coupled to an interface 10, from whence it emerges as a Local Graphic Bus 16 (LGB). The LGB 16 is a communication path for data and instructions between the computer 1 and the graphics accelerator 7, and between the various mechanisms within the graphics accelerator 7. Among the mechanisms within the graphics accelerator 7 are a transform engine 11, a scan converter 12, a frame buffer controller 13, one or more frame buffer assemblies 14i-iv, and a color map assembly 15. The output of the color map assembly 15 is the three RGB video signals 8 that drive the color monitor 9.

The purpose of the transform engine 11 is to receive sections of the display list as it is traversed by the graphics software executing in the computer 1 and convert those into sequences of device coordinates. Basically, these are pixel values (X, Y, Z, R, G, B) for either vector endpoints or polygon vertices. These device coordinates are output upon a device coordinate bus 17 that is coupled to the scan converter 12.

The purpose of the scan converter 12 is to calculate by interpolation additional pixel values for pixels between vector endpoints or along the edges and within polygons. To this end the device coordinates are buffered in a Device Coordinate (DC) RAM 20, from which they are available to a high-speed pixel interpolator 21. The resulting sequence of Z values is separated and output on a Z bus 19. The RGB color values are compressed and formatted by a color data formatter circuit 89, whereupon they pass via a tile address/data MUX circuit 22 onto a pixel bus 18 that carries in multiplexed form both pixel data values and pixel address values.

The pixel color data formatter 89 allows the programmatic steering of a selected number of red pixel value bits, a selected number of green pixel value bits, and a selected number of blue pixel value bits into correponding planes of the frame buffer memory. This programmability is combined with the necessary conversion of the high precision pixel color values down to the precision that will actually be used in controlling the electron guns in the CRT.

The tile address/data MUX circuit 22 is programmable to recognize different tile sizes and shapes. By its multiplexing action it reduces the number of lines needed in the pixel bus 18, without significantly increasing the nu