Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a semiconductor memory device, and more
specifically to a semiconductor memory device containing a cache in which
a dynamic random access memory (DRAM) having a large storage capacity
serving as a main memory and a static random access memory (SRAM) having
small storage capacity serving as a cache memory are integrated on the
same semiconductor chip.
2. Description of the Background Art
The operating speed of recent 16-bit and 32-bit microprocessing units (MPUs)
has increased to the point where operating clock frequencies reach 25 MHz
or higher. In a data processing system, a standard DRAM (Dynamic
Random Access Memory) is often used as a main memory having a large storage
capacity, since its cost per bit is low. Although the access time of the
standard DRAM has been reduced, the operating speed of the MPU has
increased much faster than that of the standard DRAM. Consequently, in a
data processing system using the standard DRAM as a main memory, an increase
in wait states is inevitable. The gap in operating speed between the MPU and
the standard DRAM is inherent to the standard DRAM, which has the following
characteristics.
(1) A row address and a column address are time divisionally multiplexed
and applied to the same address pin terminal. The row address is taken in
the device at a falling edge of a row address strobe signal (/RAS). The
column address is taken in the device at a falling edge of a column
address strobe signal (/CAS). The row address strobe signal /RAS defines
start of a memory cycle and activates a row selecting system. The column
address strobe signal /CAS activates a column selecting system. Since a
prescribed time period called "RAS-CAS delay time" (tRCD) is necessary from
the time the signal /RAS is set to an active state to the time the signal
/CAS is set to the active state, there is a limit in reducing the access
time, namely, there is a limit derived from address multiplexing.
(2) Once the row address strobe signal /RAS is raised to set the DRAM
to a standby state, the row address strobe signal /RAS cannot fall to "L"
again until a time period called the RAS precharge time (tRP) has elapsed.
The RAS precharge time is necessary to ensure that various signal
lines in the DRAM are precharged to prescribed potentials. Due to the RAS
precharge time tRP, the cycle time of the DRAM cannot be reduced. In addition, when the
cycle time of the DRAM is reduced, the number of charging/discharging of
signal lines in the DRAM is increased, which increases current
consumption.
(3) Higher operating speed of the DRAM can be realized by circuit
techniques such as improvement of layout, increase in the degree of
integration of circuits, and development in process technology, and by
applicational improvements such as improved driving methods. However, the
operating speed of the MPU is increasing at a much faster rate than that of
the DRAM.
The speed of operation of semiconductor memories is hierarchical. For
example, there are high speed bipolar RAMs using bipolar transistors such
as ECLRAMs (Emitter Coupled Logic RAMs) and Static RAM, and comparatively
low speed DRAMs using MOS transistors (insulated gate type field effect
transistors). It is very difficult to expect an operating speed (cycle
time) as fast as several tens of ns (nanoseconds) in a standard DRAM formed
of MOS transistors.
There have been various applicational improvements to narrow the gap
between the operating speeds of the MPU and the standard DRAM. Such
improvements mainly comprise the following two approaches.
(1) Use of high speed mode of the DRAM and interleave method
(2) External provision of a high speed cache memory (SRAM).
The first approach (1) includes a method of using a high speed mode such as
a static column mode or a page mode, and a method of combining the high
speed mode and the interleave method. In the static column mode, one word line
(one row) is selected, and thereafter only the column address is changed
successively, to successively access memory cells of this row. In the page
mode, one word line is selected, and then column addresses are
successively taken by toggling the signal /CAS to successively access
memory cells connected to the selected one word line. In either of these
modes, memory cells can be accessed without toggling the signal /RAS,
enabling higher speed than the normal access using the signals /RAS and
/CAS.
In the interleave method, a plurality of memories are provided in parallel
to a data bus, and by alternately or successively accessing the plurality
of memories, the access time is reduced in effect. The use of high speed
mode of the DRAM and combination of the high speed mode and the interleave
method have been known as methods of using the standard DRAM as a high
speed DRAM in a simple and relatively effective manner.
The second approach (2) has been widely used in mainframes. A high speed
cache memory is expensive. However, in the field of personal computers, in
which high performance as well as low cost are desired, this approach is
employed in some cases at the sacrifice of cost. There are three possible
ways to provide the high speed cache memory, namely:
(a) the high speed cache memory is contained in the MPU itself;
(b) the high speed cache memory is provided outside the MPU; and
(c) the high speed cache memory is not separately provided but the high
speed mode contained in the standard DRAM is used as a cache (the high
speed mode is used as a pseudo cache memory). When a cache hit occurs, the
standard DRAM is accessed in the high speed mode, and at the time of a
cache miss, the standard DRAM is accessed in the normal mode. The above
mentioned three ways (a) to (c) have been employed in the data processing
systems in some way or other.
In most MPU systems, in view of cost, the memories are adapted to have a bank
structure and interleaving is carried out on a bank by bank basis in order
to conceal the RAS precharge time (tRP), which is inevitable in the DRAM.
By this method, the cycle time of the DRAM can be substantially one half
of the specification value. The method of interleave is effective only
when memories are sequentially accessed. When the same memory bank is to
be continuously accessed, it is ineffective. Further, substantial
improvement of the access time of the DRAM itself cannot be realized. The
minimum unit of the memory must be at least 2 banks.
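The limitation of the interleave method described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the bank is chosen by the low-order part of the address (address modulo the number of banks); the names bank_of and count_bank_conflicts are hypothetical and do not come from any described system. A "conflict" here means that two consecutive accesses fall in the same bank, so the RAS precharge time of that bank cannot be concealed.

```python
def bank_of(address: int, num_banks: int = 2) -> int:
    """Assumed mapping: low-order interleaving (address modulo bank count)."""
    return address % num_banks


def count_bank_conflicts(addresses, num_banks: int = 2) -> int:
    """Count consecutive accesses that hit the same bank back to back."""
    conflicts = 0
    previous_bank = None
    for address in addresses:
        bank = bank_of(address, num_banks)
        if bank == previous_bank:
            conflicts += 1  # precharge of this bank cannot be hidden
        previous_bank = bank
    return conflicts


sequential = list(range(8))          # 0, 1, 2, ... alternates between banks
same_bank = list(range(0, 16, 2))    # 0, 2, 4, ... always falls in bank 0

print(count_bank_conflicts(sequential))
print(count_bank_conflicts(same_bank))
```

Under this assumed mapping, the sequential trace never accesses the same bank twice in a row, while the stride-2 trace does so on every access after the first.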
When the high speed mode such as the page mode or the static column mode is
used, the access time can be reduced effectively only when the MPU
successively accesses a certain page (data of a designated one row). This
method is effective to some extent when the number of banks is
comparatively large, for example 2 to 4, since different rows can be
accessed in different banks. When the data of the memory requested by the
MPU does not exist in the given page, it is called a "miss hit". Normally,
a group of data are stored in adjacent addresses or sequential addresses.
In the high speed mode, a row address, which is one half of the addresses,
has been already designated, and therefore possibility of "miss hit" is
high. When the number of banks becomes as large as 30 to 40, data of
different pages can be stored in different banks, and therefore the "miss
hit" rate is remarkably reduced. However, it is not practical to provide
30 to 40 banks in a data processing system. In addition, if a "miss hit"
occurs, the signal (/RAS) is raised and the DRAM must be returned to the
precharge cycle in order to re-select the row address, which sacrifices
the characteristic of the bank structure.
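The relation between the number of banks and the "miss hit" count can be sketched as follows. The model is an assumption made for illustration only: each bank keeps one page (row) open, a page holds page_size consecutive addresses, and the page number modulo the number of banks selects the bank. The function name count_miss_hits is hypothetical.

```python
def count_miss_hits(addresses, num_banks: int, page_size: int = 1024) -> int:
    """Count accesses whose page is not the one currently open in its bank."""
    open_page = {}  # bank number -> page currently held open in that bank
    misses = 0
    for address in addresses:
        page = address // page_size
        bank = page % num_banks      # assumed page-to-bank mapping
        if open_page.get(bank) != page:
            misses += 1              # /RAS must be raised and the row re-selected
            open_page[bank] = page
    return misses


# A trace that alternates between two different pages.
trace = [0, 1024, 1, 1025, 2, 1026]

print(count_miss_hits(trace, num_banks=1))
print(count_miss_hits(trace, num_banks=2))
```

With a single bank every access in this trace replaces the open page, whereas with two banks each page stays open in its own bank after the first access to it.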
In the above described second method (2), a high speed cache memory is
provided between the MPU and the standard DRAM. In this case, the standard
DRAM may have relatively low speed of operation. Standard DRAMs having
storage capacities as large as 4M bits or 16M bits have come to be used. In
a small system such as a personal computer, the main memory thereof can be
formed by one or several chips of standard DRAMs. External provision of
the high speed cache memory is not very effective in such a small system
in which the main memory can be formed of one standard DRAM. If the
standard DRAM is used as the main memory, the data transfer speed between
the high speed cache memory and the main memory is limited by the number
of data input/output terminals of the standard DRAM, which constitutes a
bottleneck in increasing the speed of the system.
When the high speed mode is used as a pseudo cache memory, the speed of
operation is lower than the high speed cache memory, and it is difficult
to realize the desired system performance.
Provision of the high speed cache memory (SRAM) in the DRAM is proposed as
a method of forming a relatively inexpensive and small system, which can
solve the problem of sacrifice of system performance when the interleave
method or the high speed operation mode is used. More specifically, a
single chip memory having a hierarchical structure of a DRAM serving as a
main memory and a SRAM serving as a cache memory has been conceived. The
1-chip memory having such a hierarchical structure is called a cache DRAM
(CDRAM). The CDRAM will be described.
FIG. 1 shows a structure of a main portion of a conventional standard 1
megabit DRAM. As shown in FIG. 1, the DRAM comprises a memory cell array
500 including a plurality of memory cells MC arranged in a matrix of rows
and columns. A row of memory cells are connected to one word line WL. A
column of memory cells MC are connected to one column line CL. Normally,
the column line CL is formed by a pair of bit lines. A memory cell MC is
positioned at a crossing of one of the pair of bit lines and one word line
WL. In a 1M DRAM, the memory cells MC are arranged in a matrix of
1024 rows × 1024 columns. Namely, the memory cell array 500 includes 1024
word lines WL and 1024 column lines CL (1024 pairs of bit lines).
The DRAM further comprises a row decoder 502 which decodes an externally
applied row address (not shown) for selecting a corresponding row of the
memory cell array 500; a sense amplifier which detects and amplifies data
of the memory cell connected to the word line selected by the row decoder
502; and a column decoder which decodes an externally applied column
address (not shown) for selecting a corresponding column of the memory
cell array 500. In FIG. 1, the sense amplifier and the column decoder are
denoted by one block 504. If the DRAM has an x1 bit structure in which
input/output of data is effected bit by bit, one column line CL (bit line
pair) is selected by the column decoder. If the DRAM has an x4 bit
structure in which input/output of data is effected 4 bits by 4 bits, 4
column lines CL are selected by the column decoder. One sense amplifier is
provided for each column line (bit line pair) CL in the block 504.
In memory access for writing data to or reading data from the memory cell
MC in the DRAM, the following operation is carried out. First, a row
address is applied to the row decoder 502. The row decoder 502 decodes the
row address and raises the potential of one word line WL in the memory
cell array 500 to "H". Data of the 1024 memory cells MC connected
to the selected word line WL are transmitted to the corresponding column
lines CL. The data on the column lines CL are amplified by the sense
amplifiers included in the block 504. Among the memory cells connected to
the selected word line WL, the memory cell to which data is written or from
which data is read is selected by a column selection signal from
the column decoder included in the block 504.
In the above described high speed mode, column addresses are successively
applied to the column decoder included in the block 504. In the static
column mode operation, column addresses applied at every prescribed time
interval are decoded as new column addresses by the column decoder, and
the corresponding memory cell of the memory cells connected to the
selected word line WL is selected by the column line CL. In the page mode,
a new column address is applied at every toggling of the signal /CAS, and
the column decoder decodes the column address to select the corresponding
column line. In this manner, in the high speed mode, one row of memory
cells MC connected to the selected word line WL can be accessed at high
speed by setting one word line WL at a selected state and by changing the
column addresses only.
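The access sequence described above (select one word line, sense the whole row, then select columns) can be modeled with a minimal behavioral sketch. It is not a circuit-level description; the class name DramArray, its method names, and the small array size used in the usage lines are assumptions made for illustration.

```python
class DramArray:
    """Behavioral model: one open row held in the sense amplifiers."""

    def __init__(self, rows: int = 1024, cols: int = 1024):
        self.cells = [[0] * cols for _ in range(rows)]
        self.sense_amps = None   # data of the currently selected row
        self.open_row = None

    def activate(self, row: int) -> None:
        """Row decoder raises one word line; the whole row is sensed."""
        self.sense_amps = list(self.cells[row])
        self.open_row = row

    def read(self, col: int) -> int:
        """Column decoder selects one sense amplifier of the open row."""
        if self.sense_amps is None:
            raise RuntimeError("no word line selected")
        return self.sense_amps[col]

    def write(self, col: int, bit: int) -> None:
        """Write through the sense amplifier into the selected cell."""
        if self.sense_amps is None:
            raise RuntimeError("no word line selected")
        self.sense_amps[col] = bit
        self.cells[self.open_row][col] = bit

    def precharge(self) -> None:
        """Return the array to the standby state."""
        self.sense_amps = None
        self.open_row = None


array = DramArray(rows=4, cols=8)
array.activate(2)
array.write(5, 1)
# High speed mode: further columns of the same row without re-selecting it.
bits = [array.read(col) for col in range(8)]
array.precharge()
print(bits)
```

The list comprehension corresponds to the high speed mode: the word line stays selected and only the column address changes.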
FIG. 2 shows a general structure of a conventional 1M bit CDRAM. Referring
to FIG. 2, the conventional CDRAM comprises, in addition to the elements
of the standard DRAM shown in FIG. 1, SRAM 506 and a transfer gate 508 for
transferring data between one row of the memory cell array 500 of the DRAM
and the SRAM 506. The SRAM includes a cache register provided
corresponding to each column line CL of the memory cell array 500 so as to
enable simultaneous storage of data of one row of the DRAM memory cell
array 500. Therefore, 1024 cache registers are provided. The cache
register is formed by an SRAM cell. In the structure of the CDRAM shown in
FIG. 2, when a signal representing a cache hit is externally applied, the
SRAM 506 is accessed, enabling access to the memory at high speed. At the
time of a cache miss (miss hit), the DRAM portion is accessed.
A CDRAM as described above having a DRAM of a large storage capacity and a
high speed SRAM integrated on the same chip is disclosed in, for example,
Japanese Patent Laid Open (Kokai) Nos. 60-7690 and 62-38590.
In the above described conventional CDRAM structure, column lines (bit line
pairs) CL of the DRAM memory cell array 500 and column lines (bit line
pairs) of the SRAM (cache memory) 506 are connected in one to one
correspondence through a transfer gate 508. More specifically, in the
above described conventional CDRAM structure, data of the memory cells
connected to one word line WL in the DRAM memory cell array 500 and the
data of the same number of SRAM cells as one row of the memory cell array 500
are transferred bi-directionally and simultaneously, through the transfer
gate 508. In this structure, the SRAM 506 is used as a cache memory and
the DRAM is used as a main memory.
The so called block size of the cache is considered to be the number of
bits (memory cells) the contents of which are rewritten in one data
transfer in SRAM 506. Therefore, the block size is the same as the number
of memory cells which are physically coupled to one word line WL of DRAM
memory cell array 500. As shown in FIGS. 1 and 2, when 1024 memory cells
are physically connected to one word line WL, the block size is 1024.
Generally, when the block size becomes larger, the hit ratio is increased.
However, if the cache memory has the same size, the number of sets is
reduced in inverse proportion to the block size, and therefore the hit
ratio is decreased. For example, when the cache size is 4K bit and the
block size 1024, the number of sets is 4. However, if the block size is
32, the number of sets is 128. Therefore, in the conventional CDRAM
structure, the block size is made too large, and the cache hit ratio
cannot be very much improved.
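The arithmetic in the preceding example can be checked with a short sketch. The function name number_of_sets is hypothetical; the cache size and block sizes are the ones given above, with 4K bits taken as 4096 bits.

```python
def number_of_sets(cache_bits: int, block_bits: int) -> int:
    """Number of sets = cache size divided by block size."""
    if cache_bits % block_bits != 0:
        raise ValueError("cache size must be a multiple of the block size")
    return cache_bits // block_bits


CACHE_BITS = 4 * 1024  # 4K bit cache

for block_bits in (1024, 32):
    print(block_bits, number_of_sets(CACHE_BITS, block_bits))
```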
A structure enabling reduction in block size is disclosed in, for example,
Japanese Patent Laid Open (Kokai) No. 1-146187. In this prior art, column
lines (bit line pairs) of the DRAM array and the SRAM array are arranged
in one to one correspondence, but they are divided into a plurality of
blocks in the column direction. Selection of the block is carried out by a
block decoder. At the time of a cache miss (miss hit), one block is
selected by the block decoder. Data are transferred between only the
selected DRAM block and the SRAM block. By this structure, the block size
of the cache memory can be reduced to an appropriate size. However, there
remains the following problem unsolved.
FIG. 3 shows a standard array structure of a 1M bit DRAM array. In FIG. 3,
the DRAM array is divided into 8 memory blocks DMB1 to DMB8. A row decoder
502 is commonly provided for the memory blocks DMB1 to DMB8 on one side in
the longitudinal direction of the memory array. For each of the memory
blocks DMB1 to DMB8, (sense amplifier+column decoder) blocks 504-1 to
504-8 are provided.
Each of the memory blocks DMB1 to DMB8 has the capacity of 128K bits. In
FIG. 3, one memory block DMB is shown to have 128 rows and 1024 columns,
as an example. One column line CL includes a pair of bit lines BL, /BL.
As shown in FIG. 3, when the DRAM memory cell array is divided into a
plurality of blocks, one bit line BL (and /BL) becomes shorter. In data
reading, charges stored in a capacitor (memory cell capacitor) in the
memory cell are transmitted to a corresponding bit line BL (or /BL). At
this time the amount of potential change generated on the bit line BL (or
/BL) is proportional to the ratio Cs/Cb of the capacitance Cs of the
memory cell capacitor to the capacitance Cb of the bit line BL (or /BL).
If the bit line BL (and /BL) is made shorter, the bit line capacitance Cb
can be reduced. Therefore, the amount of potential change generated on the
bit line can be increased.
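The effect of a shorter bit line can be illustrated with the usual charge-sharing relation, in which the potential change is (Vcell - Vpre) * Cs / (Cs + Cb); this is approximately proportional to Cs/Cb when Cb is much larger than Cs. The relation is a standard approximation brought in for illustration, and the capacitance and voltage values below are hypothetical, not taken from any actual device.

```python
def bit_line_swing(cs: float, cb: float, v_cell: float, v_precharge: float) -> float:
    """Potential change on the bit line after charge sharing with the cell."""
    return (v_cell - v_precharge) * cs / (cs + cb)


# Hypothetical values chosen only to show the trend.
CS = 30e-15            # memory cell capacitance Cs, in farads
CB_LONG = 300e-15      # bit line capacitance Cb of an undivided bit line
CB_SHORT = CB_LONG / 2 # bit line capacitance after halving the bit line length

long_swing = bit_line_swing(CS, CB_LONG, v_cell=5.0, v_precharge=2.5)
short_swing = bit_line_swing(CS, CB_SHORT, v_cell=5.0, v_precharge=2.5)

print(f"undivided: {long_swing * 1000:.0f} mV, halved: {short_swing * 1000:.0f} mV")
```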
In operation, sensing is carried out only in the memory block (memory block
DMB2 in FIG. 3) that includes the word line WL selected by the row decoder
502, and the other blocks are kept in a standby state.
Consequently, power consumption incidental to charging/discharging of the
bit line during sensing operation can be reduced.
When the above described block dividing type CDRAM is applied to the DRAM
shown in FIG. 3, a SRAM register and a block decoder must be provided for
each of the memory blocks DMB1 to DMB8, which significantly increases the
chip area.
Further, the bit lines of the DRAM array and the SRAM array are in one to
one correspondence, as described above. When the direct mapping method is
employed as the method of mapping memories between the main memory and the
cache memory, the SRAM 506 is formed by 1024 cache registers arranged
in one row, as shown in FIG. 2. In this case, the capacity of the SRAM
cache is 1K bit.
When the 4 way set associative method is employed as the mapping method, the
SRAM array 506 includes 4 rows of cache registers 506a to 506d as shown in
FIG. 4. One of the 4 rows of cache registers 506a to 506d is selected by
the selector 510 in accordance with a way address. In this case, the
capacity of the SRAM cache is 4K bits.
As described above, the method of memory cell mapping between the DRAM
array and the cache memory is determined dependent on the structure in the
chip. When the mapping method is to be changed, the cache size also must
be changed.
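The dependence of the cache size on the mapping method in these conventional structures can be sketched as follows. The sketch assumes 1024 column lines as in FIG. 2 and FIG. 4; the function names cache_capacity_bits and sram_location are hypothetical. Because each cache register is tied to one DRAM column line, the only freedom in placing data is the way address.

```python
COLUMNS = 1024  # column lines CL, one cache register per column per way


def cache_capacity_bits(num_ways: int, columns: int = COLUMNS) -> int:
    """Capacity when each way is one row of cache registers."""
    return num_ways * columns


def sram_location(dram_column: int, way_address: int, num_ways: int):
    """Return (SRAM row, SRAM column) holding data of the given DRAM column.

    The SRAM column is forced to equal the DRAM column; only the way
    (SRAM row) can be chosen.
    """
    if not 0 <= way_address < num_ways:
        raise ValueError("way address out of range")
    if not 0 <= dram_column < COLUMNS:
        raise ValueError("column address out of range")
    return (way_address, dram_column)


print(cache_capacity_bits(1))   # direct mapping
print(cache_capacity_bits(4))   # 4 way set associative
print(sram_location(dram_column=37, way_address=2, num_ways=4))
```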
In both of the CDRAM structures described above, the bit lines of the DRAM
array and the SRAM array are in one to one correspondence. Therefore, the
column address of the DRAM array is inevitably the same as the column
address of the SRAM array. Therefore, a full associative method in which
memory cells of the DRAM array are mapped to an arbitrary position of
the SRAM array is impossible in principle.
Another structure of a semiconductor memory device in which the DRAM and
the SRAM are integrated on the same chip is disclosed in Japanese Patent
Laid Open (Kokai) No. 2-87392. In this prior art, the DRAM array and the
SRAM array are connected through an internal common data bus. The internal
common data bus is connected to an input/output buffer for
inputting/outputting data to and from the outside of the device. The
position of selection of the DRAM array and the SRAM array can be
designated by separate addresses. However, in this structure of the prior
art, data transfer between the DRAM array and the SRAM array is carried
out by an internal common data bus, and therefore the number of bits which
can be transferred at one time is limited by the number of internal data
buses, which prevents high speed rewriting of the contents of the cache
memory. Therefore, as in the above described structure in which the SRAM
cache is provided outside the standard DRAM, the speed of data transfer
between the DRAM array and the SRAM array becomes a bottleneck, preventing
provision of a high speed cache memory system.
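The bottleneck described above can be made concrete by counting how many bus transfers are needed to rewrite one cache block. The bus widths used below are hypothetical examples; the source does not state the width of the internal common data bus. The function name bus_transfers is likewise an assumption.

```python
import math


def bus_transfers(block_bits: int, bus_width_bits: int) -> int:
    """Transfers needed to move one block over a bus of the given width."""
    return math.ceil(block_bits / bus_width_bits)


BLOCK_BITS = 1024  # one DRAM row, as in FIG. 2

for width in (1, 4, 1024):  # hypothetical bus widths
    print(width, bus_transfers(BLOCK_BITS, width))
```

A transfer path as wide as the block needs a single transfer, while a narrow common bus needs many, which is the limitation the passage attributes to this prior art.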
In this prior art, data are transferred between the DRAM array and the SRAM
array through the internal common data bus. Therefore, an operation which
is generally called "copy back mode" cannot be carried out at high speed.
The "copy back mode" includes the step of transferring data of a
corresponding memory cell in the SRAM array to the original memory cell
position of the DRAM array at the time of cache miss, and the step of
transferring the data of the DRAM memory cell to which an access is
requested to a corresponding memory cell of the SRAM array. Although the
internal common data bus is a bi-directional bus, the data transfer at one
time is one way, namely, from SRAM to DRAM or from DRAM to SRAM.
Therefore, in this structure of the prior art, a number of steps, that is,
selecting a word line in the DRAM array, transferring data from the SRAM
array to the DRAM array, precharging of the DRAM array (setting to the
standby state), selecting of another word line of the DRAM array, and
transferring data of a corresponding memory cell of the selected word line
to the SRAM are necessary, and therefore "copy back" at high speed is
impossible.
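The sequence of steps listed above can be written out as a small behavioral sketch. It models the DRAM as a dictionary of rows and the SRAM as a single block; the function name copy_back and the data layout are assumptions made for illustration. Each recorded step must finish before the next one starts, which is why the operation is slow over a one-way-at-a-time bus.

```python
def copy_back(dram: dict, sram_block: list, cached_row: int, requested_row: int) -> list:
    """Perform copy back at a cache miss and return the ordered steps taken."""
    steps = []

    steps.append(f"select word line {cached_row}")
    dram[cached_row] = list(sram_block)        # write the cached data back
    steps.append("transfer SRAM -> DRAM")

    steps.append("precharge DRAM (standby state)")

    steps.append(f"select word line {requested_row}")
    sram_block[:] = dram[requested_row]        # load the requested data
    steps.append("transfer DRAM -> SRAM")

    return steps


dram = {0: [0, 0, 0, 0], 1: [1, 1, 1, 1]}
sram = [1, 0, 1, 0]   # modified copy of row 0 held in the cache

for step in copy_back(dram, sram, cached_row=0, requested_row=1):
    print(step)
```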
In this prior art, data are transferred between the DRAM array and the SRAM
array through the internal common data bus. Therefore, at a time of a
cache miss, access to the SRAM array to read data from the SRAM array
cannot be done until data transfer from the DRAM array to the SRAM array
is completed and the DRAM array is set to the standby state. Namely, at
the time of a cache miss or the like, reading of data cannot be carried
out at high speed.
In a general CDRAM, the DRAM must be refreshed. In the CDRAM in which
access to the DRAM array and access to the SRAM array cannot be done
independently, the SRAM array cannot be accessed during refreshing of the
DRAM array. Namely, during this period, the CPU cannot use the cache, and
the performance of the cache system is not available.
In a conventional CDRAM, data output timing is determined by external
control signals (/CAS and /WE). At this time, before the output data are
established, invalid data are output. Depending on the application, for
example in a pipeline application, it is preferred that only valid data
are always output. Accordingly, the conventional CDRAM has limited
application, since the data output timing cannot be changed dependent on
application. When it is to be applied to the pipeline processing, separate
latch means and the like must be externally provided, which inevitably
increases the scale of the cache system. In addition, if such a latch is
externally provided and the latch operation is effected by a system clock,
data output from the latch at one time must be the data of the previous
cycle, in order to prevent latching of invalid data. Data accessed at
present cycle cannot be read, which limits the application.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a semiconductor memory device
containing a cache having a novel structure.
Another object of the present invention is to provide an improved
semiconductor memory device containing a cache which can realize a desired
mapping system easily.
A further object of the present invention is to provide an improved
semiconductor memory device containing a cache in which the mapping method can
be changed easily without changing cache size.
A still further object of the present invention is to provide a high speed
semiconductor memory device containing a cache which can accommodate any
mapping method having proper block size and set number.
A still further object of the present invention is to provide a
semiconductor memory device containing a cache in which data can be
transferred at high speed and effectively between a high speed DRAM array
and a SRAM array.
A still further object of the present invention is to provide a
semiconductor memory device containing a cache in which the DRAM array can
be refreshed without keeping an external CPU in a waiting state.
A still further object of the present invention is to provide a
semiconductor memory device in which data can be read at high speed even
at a time of a cache miss.
A still further object of the present invention is to provide a method of
data transfer in a semiconductor memory device enabling high speed copy
back.
A still further object of the present invention is to provide a
semiconductor memory device in which data output timing can be changed
dependent on use.
A still further object of the present invention is to provide a data
transfer device capable of high speed and efficient data transfer between
a DRAM array and a SRAM array.
A still further object of the present invention is to provide a data
transfer device in a semiconductor memory device in which writing and
reading of data can be carried out at high speed even at a time of a cache
miss.
A still further object of the present invention is to provide a data
transfer device in a semiconductor memory device capable of high speed
copy back operation.
A semiconductor memory device in accordance with the present invention
includes an internal data line connected to an input/output buffer for
inputting/outputting data to and from the outside of the device; a DRAM
array formed of a plurality of dynamic memory cells arranged in a matrix
of rows and columns; and an SRAM array formed of a plurality of static
memory cells arranged in a matrix of rows and columns.
The semiconductor memory device in accordance with the present invention
further includes data transfer means provided independent from said
internal data line for transferring data between the DRAM array and the
SRAM array; first
connecting means for simultaneously selecting a plurality of memory cells
from the DRAM array in response to an externally applied first address and
for connecting the selected plurality of memory cells to said transfer
means; and a second connecting means for simultaneously selecting a
plurality of memory cells from the SRAM array in response to an externally
applied second address and for connecting the selected plurality of memory
cells to said transfer means. The first and second addresses are applied
independent from each other.
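The arrangement summarized above, in which a group of DRAM cells and a group of SRAM cells are selected by independent addresses and joined by a dedicated transfer path, can be pictured with a minimal behavioral sketch. It is not the claimed structure: the class name CacheDramSketch, the array sizes, and the grouping of DRAM columns into fixed blocks of block_bits cells are all assumptions made for illustration.

```python
class CacheDramSketch:
    """Behavioral picture of independently addressed DRAM and SRAM arrays."""

    def __init__(self, dram_rows=8, dram_cols=16, sram_rows=4, block_bits=4):
        self.block_bits = block_bits
        self.dram = [[0] * dram_cols for _ in range(dram_rows)]
        self.sram = [[0] * block_bits for _ in range(sram_rows)]

    def dram_to_sram(self, dram_row: int, dram_block: int, sram_row: int) -> None:
        """First address picks DRAM cells, second address picks SRAM cells."""
        start = dram_block * self.block_bits
        # All bits of the block move together, not through the I/O data line.
        self.sram[sram_row] = self.dram[dram_row][start:start + self.block_bits]

    def sram_to_dram(self, sram_row: int, dram_row: int, dram_block: int) -> None:
        """Transfer in the opposite direction over the same dedicated path."""
        start = dram_block * self.block_bits
        self.dram[dram_row][start:start + self.block_bits] = self.sram[sram_row]


device = CacheDramSketch()
device.dram[5][8:12] = [1, 0, 1, 1]
device.dram_to_sram(dram_row=5, dram_block=2, sram_row=0)
device.dram_to_sram(dram_row=5, dram_block=2, sram_row=3)
print(device.sram[0], device.sram[3])
```

Because the two addresses are independent in this sketch, the same DRAM block can be placed in any SRAM row, which the one to one bit line correspondence of the conventional structures does not allow.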
The semiconductor memory device in accordance with the present invention
further includes means