Claims
What is claimed is:
1. A method of distributed DRAM refreshing, comprising:
refreshing a first row of memory cells in a first array of DRAM memory cells with a first row of sense amplifiers; and then
refreshing a second row of memory cells in a second array of DRAM memory cells with a second row of sense amplifiers, wherein refreshing said first row of memory cells is performed during a first clock cycle and refreshing said second row of
memory cells is performed during a second clock cycle, said first clock cycle and said second clock cycle defining a sequence.
2. The method of distributed DRAM refreshing according to claim 1, wherein said first row of memory cells has a row number and said second row of memory cells has an identical row number.
3. The method of distributed DRAM refreshing according to claim 1, further comprising:
refreshing a third row of memory cells in said first array of DRAM memory cells with said first row of sense amplifiers, after refreshing said second row of memory cells; and then
refreshing a fourth row of memory cells in said second array of DRAM memory cells with said second row of sense amplifiers.
4. The method of distributed DRAM refreshing according to claim 3, wherein said third row and said fourth row have an identical row number.
5. The method of distributed DRAM refreshing according to claim 1, wherein said first array and said second array compose a first sub-group of arrays, and,
further comprising, performing a read/write operation on a third row of memory cells in a third array of DRAM memory cells with a third row of sense amplifiers, said third array of DRAM memory cells composing a second sub-group of arrays.
6. A method of distributed DRAM refreshing, comprising:
refreshing a first row of memory cells in a first array of DRAM memory cells with a first row of sense amplifiers;
refreshing a second row of memory cells in a second array of DRAM memory cells with said first row of sense amplifiers; and
performing a read/write operation on a third row of memory cells in a third array of DRAM memory cells with a second row of sense amplifiers,
wherein said first array and said second array compose a first sub-group of arrays and said third array of DRAM memory cells compose a second sub-group of arrays.
7. The method of distributed DRAM refreshing according to claim 6, wherein said first row of memory cells has a row number and said second row of memory cells has an identical row number.
8. The method of distributed DRAM refreshing according to claim 6, further comprising:
refreshing a third row of memory cells in said first array of DRAM memory cells with said first row of sense amplifiers, after refreshing said second row of memory cells; and then
refreshing a fourth row of memory cells in said second array of DRAM memory cells with said second row of sense amplifiers.
9. The method of distributed DRAM refreshing according to claim 8, wherein said third row and said fourth row have an identical row number.
10. A semiconductor device, comprising:
a first array of DRAM memory cells;
a first row of sense amplifiers coupled to said first array of DRAM memory cells;
a second array of DRAM memory cells; and
a second row of sense amplifiers coupled to said second array of DRAM memory cells,
wherein a first row of DRAM memory cells in said first array of DRAM memory cells can be undergoing a first refresh operation at the same time that a second row of DRAM memory cells in said second array of DRAM memory cells is undergoing a second
refresh operation, and wherein refreshing said first row of memory cells is performed during a first clock cycle and refreshing said second row of memory cells is performed during a second clock cycle, said first clock cycle and said second clock cycle
defining a sequence.
11. The semiconductor device of claim 10, wherein said first array of DRAM memory cells, said first row of sense amplifiers, said second array of DRAM memory cells, and said second row of sense amplifiers compose a single semiconductor chip.
12. The semiconductor device of claim 10, further comprising a controller coupled to both said first row of sense amplifiers and said second row of sense amplifiers.
13. The semiconductor device of claim 10, further comprising a circuit adapted to perform a read/write operation to said first array of DRAM memory cells while simultaneously activating said second array of DRAM memory cells.
14. The semiconductor device of claim 10, wherein said first array and said second array compose a first sub-group of arrays, and,
further comprising, a third array of DRAM memory cells composing a second sub-group of arrays and a third row of sense amplifiers coupled to said third array of DRAM memory cells,
wherein a row of DRAM memory cells in said third array can undergo a read/write operation at the same time that both said first array of DRAM memory cells and said second array of DRAM memory cells is undergoing a refresh cycle.
15. An embedded DRAM memory, comprising
a first array of DRAM memory cells;
a first row of sense amplifiers coupled to said first array of DRAM memory cells;
a second array of DRAM memory cells;
a second row of sense amplifiers coupled to said second array of DRAM memory cells; and
a circuit adapted to perform a read/write operation to said first array of DRAM memory cells while simultaneously activating said second array of DRAM memory cells,
wherein said first array of DRAM memory cells, said first row of sense amplifiers, said second array of DRAM memory cells, said second row of sense amplifiers, and said circuit compose a single semiconductor chip, and wherein refreshing said
first row of memory cells is performed during a first clock cycle and refreshing said second row of memory cells is performed during a second clock cycle, said first clock cycle and said
second clock cycle defining a sequence.
16. The embedded DRAM memory of claim 15, further comprising a controller coupled to said row of sense amplifiers.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention pertains to a DRAM architecture that has multiple DRAMs on the same chip, each of which can be accessed independently and simultaneously for performing different tasks and whereby more than one array of a particular DRAM memory can
be opened at the same time.
2. Discussion of the Related Art
Computers can generally be broken into three main components: input/output (I/O) for interfacing the computer with external devices (e.g., monitor, mouse, keyboard, modem, etc.), a central processing unit (CPU) for processing data, and memory for
storing the data. The dominant type of memory used in most computer systems today is dynamic random access memory (DRAM). DRAMs are preferred because of their relatively low cost of production and high storage density.
Traditionally, DRAMs were used to store text, computer programs, and numerical data. But as computer systems became faster, more powerful, and more versatile, there was a corresponding requirement to have larger and larger memories to handle the
increased volumes of data. Today, there is a huge demand for additional memory in order to satisfy the demands imposed by video, audio, and graphics applications. This multimedia information consumes vast amounts of memory for storage.
Fortunately, advances in semiconductor manufacturing processes have substantially increased the capacity of DRAM chips, while costs have dropped on a per byte basis. In the past few years, DRAM chip storage capacity has exploded from storing
256 Kbytes, 1 Mbyte, 4 Mbytes, 16 Mbytes, . . . to 256 Mbytes of data. Indeed, the production of 1 Gigabyte DRAM chips is imminent.
However, the speed (i.e., bandwidth) at which data stored in the DRAMs can be accessed has not kept pace with demands. Video and audio recording and playback, three-dimensional graphics generation, real-time teleconferencing, on-the-fly
interactive simulations, etc., all require the transfer of huge amounts of data between the processor(s) and memory. Unfortunately, the amount of data which can be accessed from the DRAM is quite limited. This limitation is attributable to the fact
that the basic DRAM controller scheme has generally remained the same over the past twenty years. The same scheme that was originally developed for controlling 8 Kbyte DRAMs is now being applied to 256 Mbyte DRAMs. What was sufficient twenty years ago
is totally inadequate to meet today's demands. A proper analogy is that of a parking lot where the number of parking spaces has increased a thousandfold, yet there is still only one tollgate through which all cars must pass.
FIG. 1 shows a typical architecture of a prior art DRAM layout. Cell array 101 is comprised of a 128×128 array of memory cells. An individual memory cell consists of a transistor which causes a tiny capacitor to be placed in either a
charged (i.e., "1") or discharged (i.e., "0") state. Thereby, a single memory cell is capable of being programmed to store one bit of information. Hence, this particular 128×128 cell array is capable of storing 16 Kbits of data. The memory cells
are arranged in rows and columns. Seven address lines (2^7 = 128) are used to specify a particular memory cell for access. These seven address lines (e.g., A0-A6/A7-A13) are multiplexed to provide a 14-bit address by using a row address strobe (RAS)
signal and a column address strobe (CAS) signal. The RAS signal is used to clock addresses A0-A6 to the row address register 102. The row address decoder 103 decodes the address and specifies one of the 128 rows for access. Similarly, the CAS signal
is used to clock addresses A7-A13 to the column address register 104. The column address decoder 105 decodes the address and specifies one of the 128 columns for access. Once a particular cell is specified by decoding its row and column, a read/write
(R/W) signal is used to specify whether a bit is to be written into that cell via DATA IN, or the bit retained by that cell is to be read out via DATA OUT.
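The multiplexed addressing described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the names `split_address` and `CellArray` are hypothetical, and the choice of A0-A6 as the low-order bits of a single 14-bit integer is an assumption made for the example, not something the figure specifies.

```python
ROW_BITS = 7  # 2**7 = 128 rows, latched on RAS (A0-A6)
COL_BITS = 7  # 2**7 = 128 columns, latched on CAS (A7-A13)


def split_address(addr):
    """Split a 14-bit cell address into (row, column).

    Assumption for this sketch: A0-A6 are the low-order bits of addr.
    """
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address must fit in 14 bits")
    row = addr & ((1 << ROW_BITS) - 1)
    col = (addr >> ROW_BITS) & ((1 << COL_BITS) - 1)
    return row, col


class CellArray:
    """Hypothetical 128x128 array of one-bit cells (16 Kbits in total)."""

    def __init__(self):
        self.cells = [[0] * (1 << COL_BITS) for _ in range(1 << ROW_BITS)]

    def access(self, addr, write=False, data_in=0):
        """Write data_in to the cell if write is set, else return the stored bit."""
        row, col = split_address(addr)
        if write:
            self.cells[row][col] = data_in & 1
            return None
        return self.cells[row][col]


array = CellArray()
array.access(0x3FFF, write=True, data_in=1)  # R/W selects DATA IN
bit = array.access(0x3FFF)                   # R/W selects DATA OUT
```

The point of the sketch is that the same seven physical lines carry both halves of the address, so every access to a new row and column needs two address phases.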
In the past, designers have sought to increase the bandwidth of their DRAM architecture by implementing wider address and data buses. FIG. 2 shows a prior art memory architecture having wide buses. However, this workaround solution has a couple
of drawbacks. First, it requires more board space to physically route the wider buses. Wider buses consume precious area on an already crammed motherboard. Second, wider buses require a corresponding increase in the number of pins for the memory chips
and microprocessor. A higher pin count mandates larger chip packages. Again, larger chips consume valuable area on the motherboard. It may be physically impossible to insert these larger chips onto the printed circuit board. The practical limit
on bus width is approximately 64 or 128 bits. Beyond this width, the bus becomes too unwieldy.
Designers have also attempted to increase the DRAM bandwidth by implementing high speed special DRAMs. Although these specialized DRAMs can achieve relatively high peak bandwidths, it is difficult to sustain these peak bandwidths over time due
to the nature of their page misses. Generally, data is stored in a "page" format within the DRAM, whereby an entire page must be "opened" in order to access the piece of desired data residing within that page. If the requested data is not in the
currently opened page, a page "miss" occurs. Page misses require a lot of time to service because an entire RAS/CAS cycle must be performed in order to close the current page and open the new page containing the desired data. Hence, page misses
severely impact the specialized DRAMs' bandwidth. It is virtually impossible to avoid page misses because the specialized DRAMs typically implement the traditional RAS/CAS scheme. As such, there is minimal or no capability to perform a page open
look-ahead due to the fact that the page open (RAS) and read/write (CAS, OE) operations have to be performed in sequence and over the same address bus.
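The cost of page misses under a single-open-page scheme can be illustrated with a small counting model. This is a hedged sketch, not a timing-accurate simulation: the function name `count_page_misses`, the page size of 128 addresses, and the access traces are all assumptions chosen for the example.

```python
def count_page_misses(accesses, page_size):
    """Count page misses when only one page can be open at a time.

    Every access whose page differs from the currently open page forces
    a close/open (a full RAS/CAS cycle in the scheme described above).
    """
    open_page = None
    misses = 0
    for addr in accesses:
        page = addr // page_size
        if page != open_page:
            misses += 1
            open_page = page
    return misses


# Sequential accesses stay within one page: only the first access misses.
sequential = count_page_misses([0, 1, 2, 3], page_size=128)

# Two operands living in different pages, accessed alternately:
# every single access misses.
alternating = count_page_misses([0, 1000, 1, 1001, 2, 1002], page_size=128)
```

The alternating trace is the situation the text describes for operations that need operands from different page locations at the same time.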
Moreover, since specialized DRAMs have an inordinate number of pins (e.g., 80+ pins) to accommodate their complex interface, there is usually just one single on-chip DRAM controller. This same controller is used to access different types of
information. The different types of information are typically stored and accessed from the same DRAM. As a result, there is a relatively high page miss rate as the controller switches between the different types of data. For example, a two-dimensional
drawing operation might require different page locations for operands that are required at the same time. Consequently, the DRAM controller normally includes a large FIFO buffer in order to balance the memory accesses with the drawing engine operations. Furthermore, a large percentage of PC Windows applications require rectangular types of operations. A read-modify-write operation is often necessary to determine whether selected pixels are to be changed. These kinds of operations require multiple
accesses to the DRAM (i.e., a read and a write) and effectively cut the critical DRAM bandwidth in half.
Thus, there is a need in the prior art for a new high-capacity DRAM architecture that also has a sustainable high bandwidth. The invention provides an elegant solution by implementing a DRAM architecture having multiple DRAMs with multiple
arrays. In the invention, each of the on-chip DRAMs has its own address, data, and control lines. Hence, the DRAMs can be accessed independently and simultaneously for executing different tasks. Furthermore, in the invention, each DRAM is divided into
multiple arrays, which, once opened, stay open. Each of the arrays has its own circuitry that performs page open and circuitry that performs read/write. Hence, page open and read/write operations can be performed simultaneously within the same DRAM.
These improvements greatly minimize page misses, thus yielding a much greater DRAM bandwidth. In addition, each memory array is accompanied by byte write enable lines that control which portion of write data is actually updated into the DRAM array.
These byte write enable lines can change on every clock, which in real applications converts a read-modify-write cycle into a single write cycle. This reduction in memory accesses (from two to one) leaves more memory bandwidth for the controller to access data.
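The effect of byte write enables on access count can be sketched as follows. This is a minimal model under assumptions: the class `ByteMaskedMemory`, the helper names, and the 16-byte word width (chosen to match a 128-bit data path) are hypothetical and serve only to contrast the two update styles.

```python
class ByteMaskedMemory:
    """Hypothetical word-organized memory with per-byte write enables."""

    def __init__(self, words, word_bytes=16):
        self.word_bytes = word_bytes
        self.data = [bytearray(word_bytes) for _ in range(words)]
        self.accesses = 0  # counts read and write cycles

    def read(self, index):
        self.accesses += 1
        return bytes(self.data[index])

    def write(self, index, value, byte_enable=None):
        if len(value) != self.word_bytes:
            raise ValueError("value must be exactly one word wide")
        if byte_enable is None:
            byte_enable = [True] * self.word_bytes
        self.accesses += 1
        word = self.data[index]
        for i, enabled in enumerate(byte_enable):
            if enabled:  # only enabled bytes are updated in the array
                word[i] = value[i]


def update_rmw(mem, index, new, changed):
    """Read-modify-write: two memory accesses."""
    old = mem.read(index)
    merged = bytes(n if c else o for n, o, c in zip(new, old, changed))
    mem.write(index, merged)


def update_masked(mem, index, new, changed):
    """Masked write: one memory access, same final contents."""
    mem.write(index, new, byte_enable=changed)


original = bytes(range(16))
new = bytes([0xFF] * 16)
changed = [i % 2 == 0 for i in range(16)]  # update even bytes only

mem_a = ByteMaskedMemory(words=1)
mem_a.write(0, original)
mem_a.accesses = 0
update_rmw(mem_a, 0, new, changed)

mem_b = ByteMaskedMemory(words=1)
mem_b.write(0, original)
mem_b.accesses = 0
update_masked(mem_b, 0, new, changed)
```

Both memories end with identical contents, but the masked path uses one access where the read-modify-write path uses two.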
SUMMARY OF THE INVENTION
The invention pertains to a semiconductor chip having two or more memory sections, whereby one of the sections is divided into a number of separate arrays. Data is stored in a particular memory depending on its associated task. For instance,
pixel data is stored in a frame buffer memory, whereas data relating to pattern, cursor, and video line buffers are stored in an auxiliary memory. These two separate sections of memory have their own set of address, read/write, activate, control and
data lines. Hence, they can be accessed independently by the memory controller.
Furthermore, a memory can be configured into a number of distinct arrays. Two separate and distinct address buses are implemented to access these arrays. The first address bus is used to specify which of these arrays is to be activated. The
other address bus is used to specify a particular array for performing either a read or write operation. These two address buses, in conjunction with activate, row, column, data, and read/write lines, enable the memory controller to activate one array
while simultaneously reading from or writing to a different array. In addition, once an array is activated, it remains activated. This feature allows more than one array to be in an activated state at any given time.
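The two-address-bus behavior summarized above can be modeled in a short sketch. It is illustrative only: `MultiArrayMemory`, its `cycle` method, and the tuple encoding of the two buses are assumptions for the example, and real timing (activation latency, precharge) is ignored.

```python
class MultiArrayMemory:
    """Toy model: each array keeps one row (page) open until re-activated."""

    def __init__(self, num_arrays):
        self.open_row = [None] * num_arrays  # None means not yet activated

    def cycle(self, activate=None, access=None):
        """Model one clock cycle.

        activate: (array, row) driven on the activate address bus, or None.
        access:   (array, row) driven on the read/write address bus, or None.
        Returns True on a page hit, False on a page miss, None if no access.
        """
        if activate is not None and access is not None and activate[0] == access[0]:
            raise ValueError("activate and read/write must target different arrays")
        hit = None
        if access is not None:
            array, row = access
            hit = self.open_row[array] == row
        if activate is not None:
            array, row = activate
            self.open_row[array] = row  # stays activated afterwards
        return hit

    def active_arrays(self):
        return [i for i, row in enumerate(self.open_row) if row is not None]


mem = MultiArrayMemory(num_arrays=4)
mem.cycle(activate=(0, 5))                         # open row 5 of array 0
hit_a = mem.cycle(activate=(1, 9), access=(0, 5))  # open array 1 while using array 0
hit_b = mem.cycle(access=(1, 9))                   # array 1 is already open
```

Because an activated array remains activated, arrays 0 and 1 are both open at the end of the sequence, and the look-ahead activation hides what would otherwise have been a page miss on array 1.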
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
FIG. 1 shows a typical architecture of a prior art DRAM layout.
FIG. 2 shows a prior art memory architecture having wide buses.
FIG. 3 shows a block diagram of a high-performance media processor chip upon which the invention may be practiced.
FIG. 4 shows a more detailed block diagram of the auxiliary memory.
FIG. 5 shows a more detailed block diagram of the frame buffer memory.
FIG. 6 shows a block diagram of the interface between the frame buffer and associated circuits.
FIG. 7 shows a circuit diagram describing in detail the currently preferred decoding scheme associated with the frame buffer arrays.
FIG. 8 shows a group of frame buffer arrays having a common I/O.
FIG. 9 shows the frame buffer read/write and registers load timing diagram.
FIG. 10 shows the frame buffer DRAM access timing diagram.
FIG. 11 shows a detailed block diagram of one possible physical layout of the chip upon which the invention may be practiced.
FIG. 12 is a circuit schematic of the scoreboarding circuit of the invention for allowing dual array simultaneous memory access within the DRAM of the invention.
FIG. 13A is a logical block diagram of the memory storage arrangement of one of the eight memories of one implementation of the scoreboarding circuit of the invention.
FIG. 13B is an exemplary circuit block layout of one implementation of the scoreboarding circuit of the invention.
FIG. 14A and FIG. 14B illustrate a flow diagram of steps of the invention for performing DRAM scoreboarding.
FIG. 15 is an illustration of a central pixel and four surrounding pixels within the invention memory mapping configuration.
FIG. 16A is an illustration of a memory configuration utilized by an embodiment of the invention for computer system graphic display modes utilizing 640 (horizontal) × 480 (vertical) × 8 bits per pixel using 5 arrays per scan line.
FIG. 16B is an illustration of a memory configuration utilized by an embodiment of the invention for computer system graphic display modes utilizing 640 (horizontal) × 480 (vertical) × 16 bits per pixel using 10 arrays per scan line.
FIG. 17 is an illustration of a memory configuration utilized by an embodiment of the invention for computer system graphic display modes utilizing 640 (horizontal) × 480 (vertical) × 16 bits per pixel using 15 arrays per scan line.
FIG. 18A and FIG. 18B are illustrations of a memory configuration utilized by an embodiment of the invention for computer system graphic display modes utilizing 800 (horizontal) × 600 (vertical) × 8 bits per pixel using 50 columns per scan line.
FIG. 19 is an illustration of a memory configuration utilized by an embodiment of the invention for computer system graphic display modes utilizing 800 (horizontal) × 600 (vertical) × 16 bits per pixel using 100 columns per scan line.
FIG. 20A and FIG. 20B are illustrations of a memory configuration utilized by an embodiment of the invention for computer system graphic display modes utilizing 800 (horizontal) × 600 (vertical) × 24 bits per pixel using 150 columns per scan line.
FIG. 21 is an illustration of a memory configuration utilized by an embodiment of the invention for computer system graphic display modes utilizing 1024 (horizontal) × 768 (vertical) × 8 bits per pixel using 8 arrays per scan line.
FIG. 22 is an illustration of a memory configuration utilized by an embodiment of the invention for computer system graphic display modes utilizing 1024 (horizontal) × 768 (vertical) × 16 bits per pixel using 16 arrays per scan line.
FIG. 23 is an illustration of a memory configuration utilized by an embodiment of the invention for computer system graphic display modes utilizing 1280 (horizontal) × 1024 (vertical) × 8 bits per pixel using 12 arrays per scan line.
FIG. 24 is a logical block diagram of a general purpose computer system utilized in one embodiment of the invention.
FIG. 25 is a logical flow diagram illustrating hardware and software layering within one embodiment of the invention.
FIG. 26 illustrates refresh current as a function of time, representing an embodiment of the invention.
FIG. 27 illustrates a timing diagram for the same row number being refreshed (activated) in a sequence of arrays, representing an embodiment of the invention.
FIG. 28 illustrates a block diagram of a 128 bit address lookup entry, representing an embodiment of the invention.
FIG. 29 illustrates a timing diagram of an address lookup DRAM auto-aging operation for one row of an array of DRAM memory cells, representing an embodiment of the invention.
FIG. 30 illustrates a state diagram for the conditional read-modify-write (RMW) operation performed on a DRAM row containing 8 address entries, representing an embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
A novel DRAM architecture having increased bandwidth is described. This architecture is built on the concepts of multiDRAM and concurrent arrays. The multiDRAM concept pertains to the incorporation of multiple DRAMs on a single chip, whereby
each of the DRAMs can be accessed independently to perform different tasks. The concurrent array concept pertains to structuring the DRAMs into multiple arrays. Each DRAM has the capability of performing page open and read/write operations
simultaneously. These two improvements allow the DRAM architecture of the invention to sustain a bandwidth approaching its peak (e.g., 1.6 Gbytes/sec for general graphics and video operations). In the following description, for
purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be obvious, however, to one skilled in the art that the invention may be practiced without these specific details.
In other instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the invention.
FIG. 3 shows a block diagram of a high-performance media processor chip upon which the invention may be practiced. External data can be input to the chip either through the video port 301, general purpose port 302, or the PCI interface 303. The
input data is then stored in one of two independent DRAM memories that are available: a frame buffer memory 304 or a separate auxiliary memory 305. Exactly where the input data is ultimately stored depends on the nature of the data. More particularly,
frame buffer memory 304 contains the pixel data which is used to render images on a raster display. The size of frame buffer memory 304 depends on the size of the display and the number of bits assigned per pixel. A size of 1.5 Mbytes is sufficient for
a 640×480 display with 24-bit pixel color values. In contrast, auxiliary memory 305 is a smaller piece of DRAM memory which is used to store data pertaining to the background pattern (e.g., style and color), cursor (e.g., shape, size, and color),
and video line buffer. Either of these two DRAM memories can be independently accessed and controlled by address generator 330. The advantage of having two independently accessible DRAM memories is that now, two different tasks can be performed without
suffering a page miss. For instance, data from frame buffer 304 can be accessed for drawing a window display. Meanwhile, data for drawing the background or cursor is accessible through the auxiliary memory 305.
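The frame buffer sizing statement above can be checked with simple arithmetic. The sketch below assumes that "Mbyte" means 2^20 bytes and that pixels are packed with no padding; the function name `frame_buffer_bytes` is hypothetical, and the additional modes are taken from the display modes named in the figure descriptions.

```python
MBYTE = 2 ** 20  # assumption: 1 Mbyte = 1,048,576 bytes


def frame_buffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one packed frame (no row padding assumed)."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


capacity = int(1.5 * MBYTE)  # 1,572,864 bytes

# (width, height, bits per pixel) for a few of the modes named in the figures
modes = [(640, 480, 24), (800, 600, 24), (1024, 768, 16), (1280, 1024, 8)]
sizes = {mode: frame_buffer_bytes(*mode) for mode in modes}
all_fit = all(size <= capacity for size in sizes.values())
```

Under these assumptions the 640×480 mode at 24 bits per pixel needs 921,600 bytes, well inside 1.5 Mbytes, and the 1024×768 mode at 16 bits per pixel fills the 1.5 Mbytes exactly.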
The actual graphics operations are performed by the raster operation (ROP4) engine 306. Basically, the ROP4 engine 306 performs raster operations on the following four components: source, destination, pattern, and mask. In order to more
efficiently execute various graphics operations which may be performed on a pixel or set of pixels, the ROP4 engine 306 is tightly coupled with frame buffer 304 and its associated registers: Source registers 307, Destination registers 308, Result
registers 309, and Scanout registers 310. Frame buffer 304 outputs its data to the Source, Destination, and Scanout registers 307, 308, and 310 for associated read operations. In addition, frame buffer 304 accepts update data from the Result registers 309 for
associated write operations coupled by byte write enable control lines. All read operations from and write operations to the frame buffer 304 are transmitted via the internal 128-bit bus 311. The associated registers 307-310 are all 128 bits wide. To
improve efficiency, these four registers 307-310 are all double buffered (i.e., Result A and Result B, Source A and Source B, etc.). Hence, if one of the two registers in this double-buffering scheme happens to be filled, operations can continue on the other
register while the processor services the filled register. The Result register 309 (i.e., REST A and REST B) is loaded with data from the ROP4 engine 306 32 bits at a time. Data is passed out of the Source and Destination registers 307-308 as 32-bit words to
the ROP4 engine