Claims
What is claimed is:
1. In a computer processor, a cache circuit for one-cycle operation of
cache read and cache write instructions, said cache circuit allowing
writing operations of a first one of said cache instructions in a first
cycle supplied by a main clock signal to be overlapped with a second one
of said cache instructions in a second cycle immediately following the
first cycle, said cache circuit including:
a physical address means for providing a physical address having a
plurality of fields including a tag address field, a byte offset field,
and a set select field including set select data;
a set associative cache having a plurality of sets, each set including one
or more cache lines for storing computer data, each cache line associating
a tag field, a status field, and a data field, said cache including
a data array having a plurality of data fields,
a tag array having a plurality of tag fields, and
a status array having a plurality of status fields, each status field
including a tag parity field indicative of the parity of the tag, and a
valid field indicative of the validity of the data;
a set select circuit coupled to the physical address means for decoding set
select data of a physical address to select a set of cache lines;
control means for providing a tag enable signal and also for providing a
status enable signal delayed in timing from said tag enable signal;
a plurality of tag drivers coupled to the set select circuit, each tag
driver responsive to said selected set of cache lines and the tag enable
signal, for enabling a tag field in the tag array; and
a plurality of status drivers coupled to the set select circuit, each
status driver being responsive to said selected set of cache lines and the
status enable signal, for enabling a status field in the status array.
2. The cache circuit of claim 1 wherein said set select circuit includes
a decoder coupled to receive set select data from the physical address
means and to select said set responsive thereto;
a set latch coupled to the decoder to latch the set selected by said
decoder, said set latch having a plurality of outputs coupled to the
plurality of tag drivers and status drivers for supplying said selected
set to the drivers for that selected set, and
a latch control circuit coupled to said set latch for latching said
selected set in the first cycle, wherein said latch control circuit
continuously controls latching by said set latch into said second cycle.
3. The cache circuit of claim 2 further comprising:
delayed clock means for delaying the main clock signal, said delayed clock
means coupled to the set latch for supplying a delayed clock cycle
thereto; and
wherein said set latch is transparent during a first half of said delayed
cycle so that the selected set is passed through to the status and tag
drivers in said first half, and said set latch latches and holds the
selected set during the second half of the delayed clock cycle.
4. The cache circuit of claim 1 further comprising, for each tag and status
array:
a tag bus coupled to the tag fields of the tag array;
a status bus coupled to the status fields of the status array;
a comparator coupled to the tag bus and the physical address means, said
comparator for comparing an enabled tag field provided on the tag bus with
a tag address provided by said physical address means; and a parity
generator circuit, coupled to the tag bus, for generating a generated
parity from said enabled tag field.
5. The cache circuit of claim 4 further comprising:
parity checking means, coupled to the status bus, for comparing the
generated parity with the enabled parity field.
6. The cache circuit of claim 5 further comprising:
a compare signal output from said comparator indicative of a match
between the enabled tag field and the tag address;
a valid signal provided on said status bus responsive to the valid field in
an enabled status field; and
a gate, responsive to said compare signal and said valid signal, for
providing a hit signal if said compare signal indicates a match and the
valid signal indicates that the data field associated with said enabled
status field is valid.
7. The cache circuit of claim 6 further comprising:
a parity signal output from the parity checking means indicative of a
result of comparing the generated parity with the enabled parity field;
and
a control means, responsive to the hit signal and the parity
signal, for preventing use of data from the selected cache line if there
is not a hit or if parity is incorrect, otherwise allowing use of the
data.
8. In a computer processor whose operations are defined by a series of
clock cycles, a method of writing a tag address and its associated parity
information to a cache having a tag array, a status array, and a data
array, said cache having a plurality of cache lines for storing computer
data, each cache line associating a data field in a data array with a tag
field in the tag array, and a status field including a tag parity field in
the status array, said method comprising the steps of:
(a) supplying a decoder with a physical address to be written, including a
tag address, a set select field including set select data, and a byte
offset;
(b) decoding said set select data to select a cache line;
(c) enabling a tag field associated with the selected cache line in the tag
array;
(d) writing said tag address to said enabled tag field;
(e) generating a parity bit indicative of the parity of said tag address;
(f) subsequently to said step (c), enabling a status field associated with
the selected cache line in the status array; and
(g) writing the parity bit generated in said step (e) to the parity field
in said enabled status field.
9. The method of claim 8 wherein
the series of cycles defining computer operation include a first cycle and
a second cycle,
the steps (a) through (c) are completed in the first cycle, and
the steps (d) through (g) are completed in the second cycle.
10. The method of claim 8 wherein
the cycles defining computer operation include a first cycle and a second
cycle,
the steps (a) through (e) are completed in the first cycle, and
the steps (f) and (g) are completed in the second cycle.
11. The method of claim 10 including, following completion of step (d) in
the first cycle, the steps of:
(h) supplying a second physical address including a second tag address, a
second set select field, and a second byte offset; and
(i) decoding said second set select field in the second cycle to select a
second cache line, said decoding of said second set select field occurring
simultaneously with said step (g).
12. The method of claim 11, further including steps for reading data
previously written to said second cache line, including:
(j) enabling the previously written second cache line in the tag array to
select a second tag field;
(k) comparing said second tag field with said second tag address, and
outputting a compare signal indicative of the results of said comparison;
(l) generating a second generated parity from said second tag field;
(m) enabling the second cache line in the status array to select a second
status field including a second parity field;
(n) comparing the second parity field with said second generated parity to
output a parity check signal indicative of the results of said comparison;
and
(o) if said compare signal indicates that the second tag field matches said
second tag address, and if said parity check signal indicates that the
second parity field matches said second generated parity, then supplying a hit
signal to indicate the validity of the data field of said second cache
line.
13. The method of claim 10 wherein operation of the steps (f) and (g)
begins in the first cycle and is completed in the second cycle.
14. In a computer processor for executing instructions timed with a series
of clock cycles, a method for accessing a cache line for reading or
writing, and calculating and checking the parity of tag data in the cache
line, so that a read instruction or a write instruction can be performed
each cycle, said method comprising the steps of:
(a) supplying a first clock cycle and a second clock cycle next following
the first clock cycle;
(b) supplying a physical address including a tag address and a set select
field;
(c) if a read is requested, then in the first clock cycle executing the
steps of
(c)(1) selecting a first cache line,
(c)(2) enabling a tag field of said first cache line,
(c)(3) reading a tag from said enabled tag field,
(c)(4) comparing the enabled tag field with the physical address to
determine if there is a match,
(c)(5) if a match is determined in step (c)(4), then supplying a first
signal, a tag match signal, and otherwise supplying a second signal,
(c)(6) checking the parity of the enabled tag field by generating a parity
of the enabled tag field and comparing it with a predetermined parity
stored in a status field of said first cache line,
(c)(7) if the generated parity of said enabled tag field matches said
predetermined parity, then supplying a parity match signal,
(c)(8) determining a hit only if both the tag match signal and the parity
match signal are provided, otherwise determining a miss,
(d) if a write is requested by an instruction, then
(d)(1) in the first clock cycle executing the steps of supplying a physical
address including a tag address that selects a second cache line to be
written to, and writing the tag address to the second cache line,
(d)(2) in the first clock cycle and continuing into the second clock cycle,
calculating the parity of the tag address to supply parity data and
storing said parity data in said second cache line.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to caches in computer processors that have
short cycles for faster performance. More specifically, the present
invention relates to reading and writing a cache line in a computer
processor and generating and checking parity to verify data integrity for
the cache.
2. Description of Related Art
Computer processors are being designed and re-designed by research and
development teams to process instructions faster. Computer processors
perform tasks by executing a series of instructions that are supplied from
a memory source. Thus, faster instruction processing generally means
higher performance. Clock cycles are used to define boundaries for
instruction execution. One way to increase performance is to reduce the
period of each clock cycle so that the computer processes instructions at
a higher rate of speed. However, shortening the clock period is not always
achievable because limits imposed by microprocessor fabrication technology
require a minimum time period for many operations. For example, 18
nanoseconds (ns) may be the minimum time necessary for the hardware to
execute a common instruction for a given technology. If the clock period
is shortened from 20 ns to 10 ns, then two clock cycles will be required
to execute the 18 ns instruction instead of one, and no time savings will
have been realized. Thus, reduction of the clock period is advantageous
only if the instructions can fit within shorter time constraints.
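The arithmetic of the 18 ns example can be checked with a short Python sketch. It assumes, as the example implies, that an operation can finish only on a clock-cycle boundary; the function names are illustrative.

```python
import math


def cycles_needed(op_ns: float, period_ns: float) -> int:
    """Whole clock cycles occupied by an operation lasting op_ns."""
    return math.ceil(op_ns / period_ns)


def elapsed_ns(op_ns: float, period_ns: float) -> float:
    """Time actually consumed, since the operation ends on a cycle boundary."""
    return cycles_needed(op_ns, period_ns) * period_ns


for period in (20.0, 10.0):
    print(f"{period:.0f} ns clock: {cycles_needed(18.0, period)} cycle(s), "
          f"{elapsed_ns(18.0, period):.0f} ns elapsed")
# 20 ns clock: 1 cycle(s), 20 ns elapsed
# 10 ns clock: 2 cycle(s), 20 ns elapsed
```

Halving the period doubles the cycle count for the 18 ns operation, so the elapsed time stays at 20 ns.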
Another way to increase performance is to reduce the number of clock cycles
necessary to execute common computer operations. Memory accesses, which
can consume three or more clock cycles, are common operations. In a memory
access, an instruction may instruct the processor to read data from
memory, or to store data in memory. If the processor executes instructions
faster than it accesses memory, then memory access times could
substantially delay computer operation, because often the processor must
stall other operations while waiting to receive the data. Furthermore,
because instructions are stored in memory, computer operation will be
delayed if a memory access takes longer than the average time the
processor spends executing an instruction. To reduce memory access time, a
"cache" may be utilized to store and supply often used instructions and
data. In most caches, retrieving data takes at most one or two clock
cycles, in comparison to three or more
cycles to retrieve data from memory. If the processor is faster than
memory, for example if one instruction is executed per clock, substantial
time savings and large increases in performance can result from use of a
cache that can perform one cache access per clock.
Caches are organized in "lines". A cache may include hundreds of cache
lines, each line including a selected block of memory which may be many
bytes in length. There are many types of caches. In a fully associative
cache, data can be stored in any cache line, regardless of its address. In
a set associative cache, the cache lines are organized into "sets". Each
set is assigned to hold data that has common lower address bits (the set
address), and the cache lines in a particular set can hold data only if
the lower bits match the set address. Because the set address uses the
lower bits of an address, a long block of data can be stored in a series
of sets. This is advantageous because data is usually read or written
sequentially from a large block of memory. There are further advantages to
a set associative cache. In a set associative cache, searching for a data
match is simplified because the cache lines from only one set need be
checked.
Each cache line is divided into fields that include a tag field indicative
of the upper portion of address of the memory block, and a data field that
stores the data at the memory location specified by the tag field. An
exemplary address to access a cache includes a tag field indicative of the
upper portion of address of the memory block, a set field indicative of
the lower portion of the address, and a byte offset field to define the
byte to be taken from the data. If a memory access occurs at a
predetermined address, then the computer usually looks first to the cache
to determine if a match (i.e., a "hit") can be found. If a hit occurs
during execution of a read operation, then the data can be read from the
cache line in the same cycle without the time-consuming memory access.
During a write operation, the data is written to the cache line and the
upper address is stored in the tag.
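A minimal Python sketch of the lookup just described, assuming a generic N-way set associative cache. The class and method names (`Line`, `SetAssocCache`, `split`, `read`, `write_line`) are hypothetical. Replacement policy and valid bits are not modeled, whole lines are filled at once, and the caller picks the way.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Line:
    tag: Optional[int] = None  # upper address bits of the cached block; None = empty
    data: bytes = b""          # the cached block itself


@dataclass
class SetAssocCache:
    num_sets: int
    ways: int
    line_bytes: int
    sets: List[List[Line]] = field(init=False)

    def __post_init__(self) -> None:
        # Each set holds `ways` cache lines.
        self.sets = [[Line() for _ in range(self.ways)] for _ in range(self.num_sets)]

    def split(self, addr: int) -> Tuple[int, int, int]:
        """Split an address into (tag, set index, byte offset)."""
        offset = addr % self.line_bytes
        set_index = (addr // self.line_bytes) % self.num_sets
        tag = addr // (self.line_bytes * self.num_sets)
        return tag, set_index, offset

    def read(self, addr: int) -> Optional[int]:
        """Return the addressed byte on a hit, or None on a miss."""
        tag, set_index, offset = self.split(addr)
        for line in self.sets[set_index]:  # only the lines of one set are checked
            if line.tag == tag:
                return line.data[offset]
        return None

    def write_line(self, addr: int, way: int, block: bytes) -> None:
        """Store a block in one way of its set, recording the upper address as the tag."""
        tag, set_index, _ = self.split(addr)
        self.sets[set_index][way] = Line(tag=tag, data=block)
```

With four sets, two ways and eight-byte lines, filling a line at address `0x123` makes any address in the same block (for example `0x125`) hit. By contrast, `0x223`, which maps to the same set with a different tag, misses.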
Often, it is desirable to verify the integrity of information stored in the
cache, to guard against the small but distinct possibility that the stored
data may have been altered in some way. Parity may be used for this
purpose. The "parity" of computer data is defined by the number of set
bits in a binary representation of the data. If the data has an even
number of set bits, then an "even parity" results. But if the data has an
odd number of set bits, then the data has an "odd parity". A "parity bit"
is usually appended to the computer data to provide a preselected parity.
For example, if the parity is predetermined to be "even" for each line of
computer data in the cache, then the parity bit gives the data an even
parity by either setting or clearing the parity bit according to the
number of set bits in the data.
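The even-parity rule above can be sketched in a few lines of Python. The helper names are hypothetical, and `value` is assumed to be a non-negative integer holding the data bits.

```python
def parity_bit(value: int, even: bool = True) -> int:
    """Bit to append so that the total number of set bits is even (or odd)."""
    bit = bin(value).count("1") & 1  # 1 if value has an odd number of set bits
    return bit if even else bit ^ 1


def check_parity(value: int, stored_bit: int, even: bool = True) -> bool:
    """Recompute the parity bit and compare it with the stored one."""
    return parity_bit(value, even) == stored_bit
```

A single flipped bit changes the count of set bits by one, so `check_parity` reports the error. Two flipped bits cancel out and go undetected.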
Parity checks are useful for both stored data (including instructions) and
tags in a cache. If the stored data and tags are housed in separate arrays
in the cache, then a location in the respective array is usually reserved
for the parity bits, so that the data parity bit is stored together with
the data in a data array, and the tag parity bit is stored together with
the tag in a tag array. During a write to the cache, this configuration
can slow cache operation because, although the data and the tag are
available before the parity information, they cannot be written until
after the parity information is calculated and becomes available. Parity
information is not data, and provides no benefit other than data
verification. It would be an advantage to provide a cache that allows
immediate writing of the data and tag to their respective arrays, while
still providing the advantages of parity verification.
It is advantageous if each cache operation, such as a read or a write,
together with its associated parity checking, consumes only one cycle.
This is particularly advantageous if the processor executes one
instruction per clock. It would be advantageous to provide a cache, and a
method for performing cache operations, that require only one clock cycle
per cache operation and yet fit within the constraints of the short clock
cycle of high speed computer processors.
SUMMARY OF THE INVENTION
The present invention provides a cache circuit for a computer and a method
for performing cache operations (e.g., read and write) in a single, short
cycle. The cache includes a tag array, a status array, and a data array.
Parity information is generated and checked to verify data and tag
integrity. The parity field is stored in a status array physically
separate from the tag array. The status array is offset in timing so that
it lags behind the tag array for both read and write operations.
Therefore, fields in the status array can be written in the early part of
the next clock cycle without affecting the tag array or another operation
that may be scheduled for the next clock cycle. Particularly, the cache
circuit provides overlapping writing of a first cache instruction in a
first cycle with a second cache instruction in a second cycle immediately
following the first cycle. An advantage of this arrangement is that the
integrity of data in the cache is verified by parity checking with
virtually no time penalty, because data processing is separated from
verification of the data.
The cache circuit includes a physical address means for providing a
physical address having a plurality of fields including a tag address
field, a byte offset field, and a set select field including set select
data for selecting a cache line. Also included is a set associative cache
having a plurality of sets, each of which has one or more cache lines for
storing computer data, each cache line associating a tag field, a status
field, and a data field. The set associative cache includes a data array
for storing a plurality of data fields, a tag array for storing a
plurality of tag fields, and a status array for storing a plurality of
status fields. Each status field includes a tag parity field indicative of
the parity of the tag and a valid field indicative of the validity of the
data. A decoder is coupled to the physical address means for decoding set
select data of a physical address to select a set. A control means
provides a tag enable signal and a status enable signal delayed in timing
from said tag enable signal. A tag driver is coupled to the decoder,
responsive to said selected set and the tag enable signal, for
enabling a tag field in the tag array, and a status driver is coupled to
the decoder, responsive to said selected set and the status enable signal,
for enabling a status field in the status array.
The method for writing a tag address and its associated parity information
to a cache includes the steps of supplying a physical address to be
written. The physical address includes a tag address, a set select field
including set select data, and a byte offset. The set select data is
decoded to select a set, which includes one or more cache lines. A tag
field associated with the selected cache line in the tag array is enabled.
The tag address in the cache address latch is written to the selected tag
field at the end of the main cycle or in the beginning of the next main
cycle. After the tag field is enabled, a status field associated
with the selected cache line in the status array is enabled. A parity bit
indicative of the parity of said tag address is generated. The
parity bit is written to the selected parity field in an operation that
continues past the end of the main cycle into the beginning of the next
main cycle.
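The write method, and its overlap with the next cycle, can be modeled behaviorally in Python. This is a sketch under stated assumptions, not the circuit: one way only, even parity, and each call to the hypothetical `write_cycle` method standing for one main clock cycle. The tag is stored immediately. The parity bit is held as a pending status write and completed at the start of the following cycle, which corresponds to steps (f) and (g) of claim 10 finishing while the next set select field is decoded, as in claim 11.

```python
from typing import List, Optional, Tuple


def even_parity_bit(value: int) -> int:
    """1 if value has an odd number of set bits, so value plus the bit has even parity."""
    return bin(value).count("1") & 1


class TagStatusModel:
    """One way: a tag array plus a separate status array holding each tag's parity bit."""

    def __init__(self, num_sets: int = 128) -> None:
        self.tags: List[Optional[int]] = [None] * num_sets
        self.parity: List[int] = [0] * num_sets
        # Status write started in the previous cycle: (set index, parity bit).
        self.pending: Optional[Tuple[int, int]] = None

    def write_cycle(self, set_index: int, tag: int) -> None:
        """One main clock cycle of a tag write."""
        # Early in this cycle, finish the previous cycle's status (parity) write
        # while this cycle's set select data is being decoded.
        self.retire()
        # Enable the tag field and write the tag without waiting for its parity.
        self.tags[set_index] = tag
        # Generate the parity now; it reaches the status array in the next cycle.
        self.pending = (set_index, even_parity_bit(tag))

    def retire(self) -> None:
        """Complete any pending status write."""
        if self.pending is not None:
            set_index, bit = self.pending
            self.parity[set_index] = bit
            self.pending = None
```

The design point the model makes visible is that the tag array never waits for parity: the only state carried across the cycle boundary is the pending status write, which touches the status array and not the tag array.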
The features and advantages described in the specification are not all
inclusive, and particularly, many additional features and advantages will
be apparent to one of ordinary skill in the art in view of the drawings,
specification and claims hereof. Moreover, it should be noted that the
language used in the specification has been principally selected for
readability and instructional purposes, and therefore resort to the claims
is necessary to determine the inventive subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a cache and cache circuitry in one preferred
embodiment.
FIG. 2 is a diagram of cache and cache circuitry of FIG. 1, illustrating
the signals flowing between the circuit elements.
FIGS. 3A and 3B are a timing diagram illustrating timing of the signals in
a preferred embodiment, including overlapped timing.
FIGS. 4A and 4B are flow charts illustrating cache read operations.
FIG. 5 is a flow chart illustrating cache write operations.
FIG. 6 is a block diagram of an addressing system for providing a physical
address.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
FIGS. 1 through 6 of the drawings disclose various embodiments of the
present invention for purposes of illustration only. One skilled in the
art will readily recognize from the following discussion that alternative
embodiments of the structures and methods illustrated herein may be
employed without departing from the principles of the invention.
The preferred embodiment includes a two-way set associative cache that has
128 sets for storing data blocks. Having two ways, each set has two cache
lines, one for each data block. However, the invention has application in
any N-way set associative cache: the number of "ways" and the number of
"sets" are not crucial to the invention. The number "N" can even be one. In
an N-way set associative cache, each cache set is divided into N cache
lines. For purposes of the following description, the term "data" will be
used in its usual sense to include any binary representation, including
instructions. When a cache line is to be stored, the data is first mapped
onto a set, and then the data is placed in a block within the set. In the
preferred embodiment, the two-way set associative cache has 128 cache sets
(256 lines) for storing instructions. The 2-way set associative cache
defines a block of data (the "data field") as 32 bytes. Data at a given
memory address can be stored in only one of the 128 cache sets, but that
data may be stored in either of the two "ways". The "way" in which the
data is to be stored is preselected by a control unit.
Reference is first made to FIG. 1 which is a block diagram of a cache and
associated circuitry incorporating the present invention. A control unit
20 controls operations to be described, including memory fetch operations
and generation of a physical address. The control unit 20 also controls
operations within the cache. Particularly, a timer 22 illustrated within
the control unit 20 provides timing signals that will be described in
detail with reference to FIGS. 2, 3A, and 3B. These signals supplied by
the timer 22 include a tag enable signal, a status enable signal, a tag
precharge signal, and a status precharge signal. The two tag signals are
offset from the two status signals. Particularly, the status signals are
delayed from the tag signals, as will be described later in more detail.
A physical address is illustrated in a box 30. Generation of the physical
address in the preferred embodiment is described later in more detail with
reference to FIG. 6. The physical address 30 is divided into a plurality
of fields including a tag address field 32, a set select field 34, and a
byte offset field 36. The tag address field 32 stores a tag address
indicative of the upper bits of the physical address, the set select field
34 stores set select data indicative of a set in the cache, and the byte
offset field 36 stores a byte offset indicative of the number of bytes to
be offset in the cache line. Particularly, in the cache to be described,
the preferred embodiment stores thirty-two bytes in a line, and thus the
byte offset is utilized to specify which of the thirty-two bytes in the
cache line are to be applied to a buffer 38. The data in the buffer 38 is
then utilized in any appropriate manner. In the preferred embodiment, the
data is an instruction which is to be executed in a pipeline 39.
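For the preferred embodiment, thirty-two bytes per line imply five byte-offset bits, and 128 sets imply seven set select bits, with the tag taking the remaining upper bits. The Python sketch below assumes that the byte offset occupies the lowest bits and that the set select field sits directly above it. The physical address width is not stated here, so the function accepts any non-negative integer; the names are illustrative.

```python
from typing import Tuple

OFFSET_BITS = 5  # 32 bytes per line
SET_BITS = 7     # 128 sets
WAYS = 2


def split_physical_address(pa: int) -> Tuple[int, int, int]:
    """Return (tag address, set select data, byte offset) for a physical address."""
    byte_offset = pa & ((1 << OFFSET_BITS) - 1)
    set_select = (pa >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag_address = pa >> (OFFSET_BITS + SET_BITS)
    return tag_address, set_select, byte_offset


# Data capacity: 128 sets x 2 ways x 32 bytes.
CAPACITY_BYTES = (1 << SET_BITS) * WAYS * (1 << OFFSET_BITS)
```

For example, address `0x12345` splits into tag address 18, set select data 26 and byte offset 5. The same constants give the data capacity: 128 × 2 × 32 = 8192 bytes.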
The set select data from the physical address 30 is applied to set select
circuitry 40 which will be described in more detail with reference to FIG.
2. The set select circuitry 40 includes a decoder 46 that selects one of
the 128 sets in the cache. Each set has a number of cache lines defined by
N, which is two in the preferred embodiment. The selected set from the
decoder 46 is applied to field drivers 50 including N×128 tag
drivers 52, 128 status drivers 54, and N×128 data drivers 56. The
circuitry for the drivers 50 is also discussed in more detail with
reference to FIG. 2. Each of the field drivers 50 is associated with one
entry in a cache 60. The cache 60 in the preferred embodiment includes two
tag arrays 62, one status array 64 and two data arrays 66. Each field in
the arrays 62,64,66 is enabled by selecting one of the field drivers 50. A
cache line includes all information associated with one block of data, and
is defined to span each of the arrays 62, 64, 66 in the cache 60.
The tag array 62 includes a plurality of tag fields 70 which can be enabled
by one of the tag drivers 52. A tag field 70 is associated with each of
the 128 cache lines. The status array 64 includes a plurality of status
fields 72, one for each of the 128 cache lines, which are enabled by one
of the status drivers 54. The status array field 72 includes an LRU field
74, a parity field 76 and a valid field 78. The LRU ("least recently
used") field 74 is used by the control unit 20 for determining which
cache entry will be replaced on a write. The parity field 76 specifies the
parity of the tag fields 70. The valid field 78 specifies whether or not a
particular cache line is valid.
Cache processing circuitry 79 includes circuitry for comparing tags,
verifying parity of the tags, verifying the validity of the cache lines,
and buses for communicating the fields in the status array 64 and the tag
arrays 62. The following discussion describes operation for one of the
ways; the cache processing circuitry 79 includes a duplicate of this
circuitry for each of the N tag arrays 62. Therefore, in the preferred
embodiment, which has two ways, the boxes shown within the cache
processing circuitry 79 are duplicated. A different tag array 62 is used
for each way, so that the number of tag arrays 62 equals the number of
ways. However, only one status array 64 is used to store status
information for all ways.
Each tag field 70 within the tag array 62 is accessed through a tag bus 80.
The tag bus 80 is connected to supply an enabled tag field 70 to a
comparator 82 and a parity generator 84. The tag bus 80 is also connected
to receive the tag address from the physical address 30 on a write. The
comparator 82 receives one input from the enabled tag field 70 on the tag
bus 80 and another input from the tag address. The output from the
comparator 82, a compare signal, is supplied to an AND gate 86. The parity
generator 84 is connected to supply its output to a parity checker 88 and
a status bus 90. The status bus 90 supplies an enabled status field 72,
including a parity field 76, from the status array 64 to the parity
checker 88. Furthermore, the parity generator 84 supplies a generated
parity to the status bus 90 on a write so that parity can be written to
the status array 64. The parity checker 88 supplies a parity check signal
to the control unit 20, where it is used as described further with
reference to FIGS. 4A and 4B.
The status bus 90 is also used to read or write the valid bit 78 and the LRU
field 74. The valid field 78 can be used directly as a valid signal to
the AND gate 86. For example, a valid bit 78 of "1" may signify that the
line is valid, and a valid bit 78 of "0" that the line is not valid. The
valid signal and the compare signal are applied to the AND gate 86, so that
if the data is valid and the comparison indicates a match, a hit signal is
provided to the control unit 20.
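The per-way signal logic can be summarized in Python. This sketch assumes an even-parity convention for the stored tag parity bit; `tag_parity`, `way_signals` and `data_usable` are hypothetical names, and the comments map each step to the comparator 82, parity generator 84, parity checker 88 and AND gate 86. `data_usable` follows the rule stated in claim 7: data is used only on a hit with correct parity.

```python
from typing import Tuple


def tag_parity(tag: int) -> int:
    """Even-parity bit of a tag (assumed convention)."""
    return bin(tag).count("1") & 1


def way_signals(stored_tag: int, stored_parity: int, valid: bool,
                tag_address: int) -> Tuple[bool, bool]:
    """Return (hit, parity_ok) for one way of the selected set."""
    compare = stored_tag == tag_address                  # comparator 82
    parity_ok = tag_parity(stored_tag) == stored_parity  # parity generator 84 + checker 88
    hit = compare and valid                              # AND gate 86
    return hit, parity_ok


def data_usable(hit: bool, parity_ok: bool) -> bool:
    """Use the line's data only on a hit with correct tag parity."""
    return hit and parity_ok
```

Keeping `hit` and `parity_ok` as separate outputs mirrors the circuit, where the hit signal and the parity check signal reach the control unit 20 independently.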
Reference is now made to FIG. 2 which is a schematic diagram illustrating
the circuitry associated with reading and writing a cache line. FIGS. 3A
and 3B are timing diagrams that illustrate reading and writing a line to
the status array 64 and the tag array 62. Circuitry for the data array 66
is similar to that of the tag array 62. The schematic diagram of FIG. 2 is
particularly useful in conjunction with the timing diagrams of FIGS. 3A
and 3B.
Reference is therefore made to FIGS. 2, 3A, and 3B. A clock cycle is
defined along the top of FIGS. 3A and 3B. The cycle is divided into
fourteen intervals, each representing one nanosecond (ns). Timing details
are provided herein to describe the preferred embodiment of the invention.
It should be noted that some of the specified times are approximations.
Beginning at the time -1.5 ns (before the beginning of a cycle), set select
data from the set select field 34 is supplied to input latches 100 in the
set select circuitry 40. Because the set select data has seven bits, there
are seven input latches 100. The input latches 100 are each latched by a
main clock signal; however, the input latches 100 are "transparent" when
the clock is low. When the input latches 100 are transparent, the set
select data appearing at the input passes through the latches 100 with a
very short delay. When the clock goes high, then the set select data is
latched into those latches 100 and remains latched until the main clock
signal again goes low. FIG. 3A shows a timing line indicating availability
of the set select data to the decoder 46, which includes a plurality of
predecoders and decoders to be described.
The output of the input latches 100 includes seven lines 102 that are
supplied to predecoders 104. There are a number of predecoders, each of
which receives one or more of the seven lines 102. The outputs from the
predecoders 104 are supplied on lines 106 to the decoders 108. The lines
106 include 128 sets of three lines each. These 128 sets are supplied to
decoders 108, which in the preferred embodiment include three-input AND
gates. The 128 outputs of the decoders 108 are supplied on set lines 110
to output latches 112. Each output latch 112 receives and latches one of
the set lines 110. The output latches 112 are clocked by a delayed and
inverted clock signal. A suitable delay is provided by an inverter element
114 that delays and also inverts the main clock signal. The output latches
112 are transparent while the inverted delayed clock signal is low. When
the inverted delayed clock signal becomes high, data on the set lines 110
is latched in, and remains latched in while the inverted delayed clock
signal remains high.
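The predecode-then-AND structure can be sketched in Python. The text does not say how the seven set select bits are divided among the predecoders. This sketch assumes groups of two, two and three bits, producing 4 + 4 + 8 = 16 one-hot predecoder lines, so that each of the 128 decoder outputs is a three-input AND of one line from each group, consistent with the three-input AND gates of the decoders 108. `GROUPS`, `predecode` and `decode` are hypothetical names.

```python
from itertools import product
from typing import List

# Hypothetical grouping of the 7 set select bits: (lowest bit, width).
GROUPS = [(0, 2), (2, 2), (4, 3)]


def predecode(set_select: int) -> List[List[int]]:
    """One one-hot predecoder output list per bit group (4, 4 and 8 lines)."""
    outputs = []
    for low, width in GROUPS:
        value = (set_select >> low) & ((1 << width) - 1)
        outputs.append([1 if i == value else 0 for i in range(1 << width)])
    return outputs


def decode(set_select: int) -> List[int]:
    """128 set lines; each is a three-input AND of one line from each group."""
    groups = predecode(set_select)
    set_lines = [0] * 128
    for picks in product(*(range(len(g)) for g in groups)):
        index = sum(p << low for p, (low, _) in zip(picks, GROUPS))
        set_lines[index] = int(all(g[p] for g, p in zip(groups, picks)))
    return set_lines
```

Predecoding keeps each final gate at three inputs instead of seven, at the cost of routing sixteen predecoder lines to the decoders.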
During operation, shortly after the initial clock edge of the main clock
cycle in which the set select data is clocked into the input latches 100,
the inverted delayed clock signal becomes low (and therefore the output
latches 112 become transparent). Thereafter, the decoded set select data
becomes available on the set lines 110, and the selected set then passes
through the output latches 112. This is illustrated in FIG. 3A as
occurring at approximately 3.5 ns. Subsequently at 8.5 ns, during the
second half of the main clock cycle, the inverted delayed clock signal
becomes high and therefore latches the selected set. The selected set
remains latched beyond the end of the second half of the main clock cycle
for a period determined by the clock delay which is illustrated as 1.5 ns.
This delayed and inverted latching mechanism, which provides continued set
selection beyond the end of the main clock cycle, has an advantage in that
data can be written to one of the arrays 62, 64, 66 of the cache 60 even after the end of the main
clock cycle.
The output latches 112 supply the selected set on select lines 122. Each
select line 122 is connected to one of a plurality of field drivers 50.
For purposes of illustration, only a single status driver 54a and a single
tag driver 52a are shown. It should be remembered that there is one driver
52,54 for each field in the cache 60 and therefore the number of drivers
52,54 equals the number of fields in the cache 60. One of the select lines
122 is connected to the status driver 54a and the tag driver 52a.
Furthermore, this select line 122 may also be connected to a data driver
56 (see FIG. 1). Specifically, the select line 122a is supplied to an AND
gate 123 within the status driver 54a and an AND gate 124 within the tag
driver 52a. A status enable signal from the timer 22 in the control unit
20 (FIG. 1) is supplied to an enable status line 125 which is connected to
the other input of the AND gate 123. A tag enable signal from the timer 22
in the control unit 20 (FIG. 1) is supplied to an enable tag line 126
which is connected to another input of the AND gate 124. Thus, the enable
signals on the enable status line 125 and the enable tag line 126 control
the timing of enabling of the respective arrays 62, 64, as will be
described in more detail subsequently, for example in the timing diagrams
of FIGS. 3A and 3B.
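The gating in the drivers 52a and 54a amounts to ANDing the select line 122a with an enable signal. The Python sketch below uses the tag enable window given for a read in FIG. 3A (approximately 3.5 ns to 8.5 ns). The status enable delay of 2 ns, and the assumption that the status window has the same width as the tag window, are hypothetical, since the text states only that the status enable signal is delayed. Gate delays, such as the roughly 1 ns between the enable tag line and the tag word line, are ignored.

```python
from typing import Tuple

TAG_ENABLE_NS: Tuple[float, float] = (3.5, 8.5)  # read timing described for FIG. 3A
STATUS_DELAY_NS = 2.0                            # hypothetical delay of the status enable


def window_high(window: Tuple[float, float], t: float) -> bool:
    """True while t lies in the half-open window [start, end)."""
    start, end = window
    return start <= t < end


def status_enable(t: float) -> bool:
    """Status enable: the tag enable window shifted later by the assumed delay."""
    start, end = TAG_ENABLE_NS
    return window_high((start + STATUS_DELAY_NS, end + STATUS_DELAY_NS), t)


def tag_word_line(select: bool, t: float) -> bool:
    return select and window_high(TAG_ENABLE_NS, t)  # AND gate 124


def status_word_line(select: bool, t: float) -> bool:
    return select and status_enable(t)               # AND gate 123
```

With these assumed values, at 4 ns only the tag word line of the selected set is high, and at 9 ns only the status word line is high; an unselected set drives neither.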
A status word line 128 connects the output of the AND gate 123 to enable a
specific field 72a in the status array 64. The status array 64 receives a
status precharge signal on a line 129. The status precharge signal is
timed to provide a high output which precharges the field 72a in the
status array 64 before the status enable signal enables a status word line
128. Similarly, a tag word line 130 connects the output of the AND gate
124 to enable a field 70a in the tag array 62. The tag array 62
receives a tag precharge signal on a line 131. The tag precharge signal is
timed to precharge the field 70a in the tag array 62 before the tag enable
signal enables the tag word line 130. The output signals from the latches
112 on the lines 122 are illustrated in FIG. 3A beginning approximately at
1.5 nanoseconds after the beginning of the cycle.
FIG. 3A illustrates timing during a read operation and FIG. 3B illustrates
timing during a write operation. Reads and writes are the two basic
operations described herein. Read-modify-write instructions may be
implemented as a combination of reading and writing. Beginning with FIG.
3A, it can be seen that, during a read, the enable tag line 126 is brought
high at approximately 3.5 ns, and continues to be high until approximately
8.5 ns. The selected tag word line 130 is high at 4.5 ns, and continues to
be high until about 9.5 ns. As a result, at approximately 7.5 ns, the tag
field 70a from the tag array 62 is enabled on the tag bus 80. This enabled
tag field 70a is latched into a latch (not shown) connected to the output
of the tag array 62 to hold the enabled tag field 70a for the remainder of
the cycle, as illustrated in FIG. 3A. At 7.5 ns, the enabled tag field 70a
is supplied on the tag bus 80 to the tag comparator 82, and the tag
address is also supplied to the tag comparator 82. The tag address
requires translation in addition to the set s