BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the operation of a communication bus in a
multiprocessor computer system and, more specifically, to a bus protocol
accessing interleaved memory modules.
2. Description of the Related Art
Due to the demand for increased processing speed and volume, many computer
systems, and other information processing systems, employ multiple central
processing units (CPUs). Typically, in such multiprocessor systems,
multiple CPUs communicate with memory modules, input/output (I/O) devices,
and other peripheral units, via a main system bus. Since the bus can only
be used by one processor at a time, such multiprocessor systems typically
use a bus protocol that determines which processor has control of the
bus at any given time.
Within a typical multiprocessor system, the bus protocol calls for the bus
to be in one of four phases, or states. In an inactive, or bus free state,
none of the CPUs control the bus or are vying for control of the bus. The
bus enters an arbitration state when one or more of the CPUs indicates
that one of the memory modules, or other units accessible on the bus, is
to be accessed. In the arbitration state, the CPUs competing for control
of the system bus determine which CPU should gain control of the bus based
upon the priority of the requests issued by the respective CPUs. Control
of the bus is granted to one of the CPUs in a selection state. Once
control of the bus has been granted to one of the CPUs, the bus enters an
active, or data/control state wherein data and control signals are
transferred over the bus to other units in communication with the bus.
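By way of illustration only, the four bus phases described above may be modeled as a small state machine. The following Python sketch is not taken from any particular bus specification; the function name next_phase and its inputs pending_requests and transfer_done are hypothetical names introduced for this example.

```python
from enum import Enum, auto


class BusPhase(Enum):
    BUS_FREE = auto()      # no CPU controls or requests the bus
    ARBITRATION = auto()   # requesting CPUs compete for the bus
    SELECTION = auto()     # control is granted to one CPU
    DATA_CONTROL = auto()  # data and control signals are transferred


def next_phase(phase, pending_requests, transfer_done):
    """Return the phase that follows `phase`.

    pending_requests: number of CPUs with a pending request (assumed input).
    transfer_done: True once the current transfer has completed (assumed input).
    """
    if phase is BusPhase.BUS_FREE:
        return BusPhase.ARBITRATION if pending_requests > 0 else BusPhase.BUS_FREE
    if phase is BusPhase.ARBITRATION:
        return BusPhase.SELECTION
    if phase is BusPhase.SELECTION:
        return BusPhase.DATA_CONTROL
    # DATA_CONTROL: stay until the transfer finishes, then re-arbitrate or go idle.
    if not transfer_done:
        return BusPhase.DATA_CONTROL
    return BusPhase.ARBITRATION if pending_requests > 0 else BusPhase.BUS_FREE
```

In this simplified model every new grant passes through the arbitration and selection phases, during which no data is transferred.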
Data bus width and clock speed are the bus parameters which are usually
considered when measuring bus performance. However, in order to increase
processing speed and volume, bus efficiency must be considered in addition
to these parameters. That is, when a CPU has control of the bus, there is
often some dead time wherein no data is being transferred along the bus.
The efficiency of the bus decreases when dead time as a percentage of the
time the CPU has control of the bus increases.
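The notion of bus efficiency used above may be expressed numerically. The following Python sketch assumes that efficiency is measured as the fraction of the cycles during which a CPU holds the bus in which data is actually transferred; the function name bus_efficiency and its parameters are hypothetical.

```python
def bus_efficiency(transfer_cycles, dead_cycles):
    """Fraction of bus-ownership cycles in which data is transferred.

    Both arguments are cycle counts observed while one CPU holds the bus
    (an assumed way of measuring; other definitions are possible).
    """
    total = transfer_cycles + dead_cycles
    if total <= 0:
        raise ValueError("at least one cycle must be observed")
    return transfer_cycles / total
```

For example, eight transfer cycles accompanied by two dead cycles give an efficiency of 0.8 under this definition.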
One of the main causes of bus inefficiency is the delay observed when a
memory module has to recover data for successive CPU requests. When a
first request is issued to a memory module, the module is generally in a
ready state so that the memory module can access data with little delay
(usually within one clock cycle). However, if the same memory module is
immediately accessed again, the module typically will exhibit a delay
before transferring data. This delay is typically called "recovery time."
While the memory module is accessing data, no data is transferred across
the system bus during the recovery time period. Thus, bus efficiency is
decreased whenever successive requests are made to the same memory module.
One way to improve bus efficiency involves interleaving the memory
addresses within the memory modules on a system bus. When memory modules
are interleaved, successive memory storage locations (i.e., memory
locations having consecutive addresses) are placed in separate memory
modules. Since associated data is typically stored in successive memory
storage locations, and a group of associated data is likely to be accessed
at once, it is likely that a CPU will access several successive memory
locations in a row for a typical memory access. By placing successive
memory locations in separate memory modules, the effects of recovery time
delay for a given memory module are reduced. This is because a CPU will
typically request data from one memory module, and then request the next
address, which is stored in another memory module, and so on, so that each
memory module is given a chance to recover from the last request. Thus,
interleaving memory modules has been found to be an effective way of
increasing bus efficiency.
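The effect of interleaving on recovery time delay may be illustrated with a simplified simulation. The Python sketch below assumes that each access occupies one cycle and that a module then needs recovery_cycles additional cycles before it can respond again; the names module_for, count_stall_cycles, and module_size, as well as the timing values, are assumptions made for this example only.

```python
def module_for(address, num_modules, interleaved, module_size=1024):
    """Map an address to a memory module index.

    Interleaved: consecutive addresses rotate through the modules.
    Non-interleaved: each module holds a contiguous block of `module_size`
    addresses (a hypothetical layout used only for comparison).
    """
    if interleaved:
        return address % num_modules
    return address // module_size


def count_stall_cycles(addresses, num_modules, interleaved, recovery_cycles=1):
    """Count cycles in which the bus idles while a module recovers."""
    ready_at = {}  # module index -> first cycle at which it can respond again
    cycle = 0
    stalls = 0
    for address in addresses:
        module = module_for(address, num_modules, interleaved)
        wait = max(0, ready_at.get(module, 0) - cycle)
        stalls += wait
        cycle += wait + 1  # one cycle to perform the access itself
        ready_at[module] = cycle + recovery_cycles
    return stalls
```

With eight consecutive addresses and two modules, this model reports no stall cycles when the modules are interleaved and seven stall cycles when all eight addresses fall within a single module.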
In multiprocessor systems, however, memory interleaving is typically not as
effective. This is because the system bus must be shared by multiple CPUs, and
each CPU has an opportunity to vie for control of the system bus after
each data transaction. That is, the system bus usually enters the
arbitration state whenever more than one CPU has a request to fill. In a
typical case, a first CPU may access successive memory locations (and
hence, different memory modules) if it maintains control of the system
bus. However, when a second CPU is granted control of the bus, the data
requested by the second CPU will usually have no relation to the data
requests of the first CPU. Thus, there is no way of assuring that a
different memory module than the memory module just accessed by the first
CPU will be accessed by the second CPU. This may result in bus
inefficiency due to the recovery time when the same memory module is
accessed by the second CPU. In this way, the benefits of memory
interleaving may be severely compromised.
Some systems have attempted to compensate for the bus inefficiency
associated with multiprocessor systems. For example, U.S. Pat. No.
4,669,056 entitled DATA PROCESSING SYSTEM WITH A PLURALITY OF PROCESSORS
ACCESSING A COMMON BUS TO INTERLEAVED MEMORY STORAGE, to Waldecker,
discloses a method of increasing system bus efficiency. In the Waldecker
patent, the addresses accessed by each of the CPUs are selected so that
when control of the bus is switched to another CPU, a memory request is
assured of going to a different memory module than that accessed by the
previous CPU. However, this method will not operate in conjunction with a
conventional CPU (e.g., an INTEL CPU). Even if such a device were to be
implemented within a CPU having pipelining capabilities, it appears that
additional data buffer circuitry would be required to accommodate address
requests which were not in the proper order to assure proper interleaving.
In another system, disclosed in U.S. Pat. No. 5,287,477 entitled
MEMORY-RESOURCE-DRIVEN ARBITRATION, to Johnson, et al., special memory
status queues hold information regarding the status of each of the
interleaved memory modules in communication with the system bus. The
master devices on the system bus (e.g., the CPUs) monitor the local memory
status queue in order to determine which of the memory modules are busy.
Those master devices which have pending requests for busy memory modules
are inhibited from arbitrating for control of the system bus. However,
such an implementation requires that master devices having requests to
ready memory modules rearbitrate for control of the bus. This may cause
system bus inefficiencies since the arbitration and selection states of
the bus must be re-entered, and in these states no data or control signals
are transferred over the system bus. Furthermore, special queues are
necessary to implement such a system.
SUMMARY OF THE INVENTION
The present invention provides an apparatus and method for improving bus
efficiency in a memory interleaved, multiprocessor system. A cache line
interleave memory subsystem monitors pending addresses from the processor
units waiting to access the system bus. If the pending address in the
CPU which has control of the system bus is directed to an idle memory module (i.e.,
a memory module which is immediately ready to process a memory request),
then the subsystem circuit of the present invention allows the CPU to
maintain control of the bus ("bus hogging"). Once the CPU in control of
the bus has a pending address request to a busy memory module, other CPUs
on the system bus are able to vie for control of the bus in the
arbitration phase. A counter circuit keeps track of the number of
sequential cycles which a CPU has run while "hogging the bus." In the
event that the number of cycles in which the same CPU has control of the
bus exceeds a designated value, the subsystem causes the system bus to
enter the arbitration state to ensure that other processing performance
factors are not compromised.
A multiprocessor information processing circuit has multiple interleaved
memory modules. The circuit comprises a system bus; first and second
interleaved memory modules in communication with the system bus; and first
and second central processing unit (CPU) modules in communication with the
interleaved memory modules via the system bus. Each of the CPU modules
comprises a CPU and a cache memory, wherein the CPU generates address
requests for accessing selected ones of the interleaved memory modules,
and transmits and receives data to and from the interleaved memory
modules; an address decoder circuit in communication with the CPU and
cache memory, wherein the address decoder circuit receives address and
control data indicative of the presence of a pending address request
generated by the CPU. The address and control data further indicates the
number of the interleaved memory modules on the system bus. Each of the
CPU modules further comprises an address latch circuit which latches pending
addresses generated by the CPU in response to a command from the address
decoder circuit; an address comparator circuit which compares addresses
output by the address latch circuit and the pending address requested by
the CPU; an interleave register which receives data that indicates the
number of the interleaved memory modules on the system bus from the
address decoder; control circuitry which receives inputs from the address
comparator circuit and the interleave register and, based upon the inputs,
generates a signal requesting control of the system bus when the pending
address request is issued to a different memory module from the memory
module accessed by the previous address request issued by the CPU; and a
bus controller which receives the signal generated by the control
circuitry and causes the CPU module to retain control of the system bus
when the control circuitry requests control of the system bus, or releases
control of the system bus when the control circuitry does not request
control of the system bus.
In a preferred embodiment, the CPU modules of the multiprocessor circuit
further comprise a transfer count register which stores a transfer count
value as determined by the address decoder; a transfer counter which
stores a counter value that is incremented each time a data transfer cycle
is performed between the CPU and one of the interleaved memory modules;
and a transfer count comparator circuit which compares the transfer count
value stored in the transfer count register and the counter value stored
in the transfer counter, and provides a terminate control signal to the
bus controller if the counter value is equal to the transfer count value.
Under another aspect, the present invention provides a monitoring
subcircuit for use in a processor module within a multiprocessor system
having a system bus in communication with interleaved memory modules. The
processor module generates address requests on the system bus for
accessing selected ones of the interleaved memory modules. The monitoring
subcircuit comprises an address locator circuit which determines if a
pending address request generated by the processor module is directed to a
memory module which received an immediately preceding address request
generated by the processor module; and a control circuit which indicates
that the processor module should retain control of the system bus when the
address locator circuit determines that the pending address request is
directed to accessing a different memory module than the memory module
which received the immediately preceding address request generated by the
processor module.
In a preferred embodiment, the monitoring subcircuit further comprises a
terminate control circuit which generates a signal indicating that the
processor module should release control of the bus once the processor
module has run a maximum number of consecutive data transfers without
relinquishing control of the system bus.
Under yet another aspect, the present invention provides a multiprocessor
information processing system which comprises a system bus; a plurality of
memory modules in communication with the system bus; and a plurality of
processing modules. Each of the processing modules includes a subcircuit
which monitors addresses requested by the processing modules, and wherein
the subcircuit grants the local processing circuit control of the system
bus for a next data transfer cycle if a current memory address is to a
different memory module than a previous address request.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic block diagram showing a simplified multiprocessor
system including multiple CPUs, as well as multiple interleaved memory
modules.
FIGS. 2A and 2B together illustrate a schematic block diagram showing the
internal components of a CPU module of FIG. 1 constructed in accordance
with the teachings of the present invention.
FIG. 3 is a schematic block diagram which shows the main internal circuitry
of the hog request control circuitry of FIGS. 2A and 2B.
FIG. 4 is a timing diagram which illustrates an exemplary data request and
transfer cycle on the system bus of FIG. 1 according to conventional data
accessing methods.
FIG. 5 is a timing diagram which illustrates exemplary data request cycles
employing the apparatus and method of the present invention and which
shows the improved system bus efficiency obtained by means of the present
invention.
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a simplified schematic block diagram showing a multiprocessor
information processing system 100, which may, for example, comprise a
personal computer, a computer mainframe, or other information-processing
systems which require multiple processing units. The multiprocessor system
100 includes a system bus 110 which provides communication amongst a first
CPU module 120, a second CPU module 125, a first memory module 130, a
second memory module 135, and an input/output device 140. It should be
noted that the schematic block diagram of FIG. 1 is highly simplified and
does not depict many of the accessory circuit elements and buffers
typically associated with multiprocessor systems, as will be appreciated
by those of ordinary skill in the art. Each of the CPU modules 120, 125
may, for example, comprise Intel 80486 microprocessors, in addition
to a cache memory, conventional bus interface circuitry, and subsystem
circuitry (not shown here), which will be described in greater detail with
reference to FIGS. 2A and 2B.
Each of the memory modules 130, 135 may, for example, comprise 64 Mbit
dynamic random access memory (DRAM) such as those manufactured by Motorola
under the Model No. MCM516400. As will be appreciated by those of ordinary
skill in the art, the memory module also may comprise a bus interface as
well as memory control circuitry (not shown) configured to support
interleaving. The input/output device 140 may, for example, comprise a
disk drive, a printer, a keyboard or display, or any other input/output
devices commonly associated with multiprocessor systems. The system bus
110 may, in one embodiment, comprise a 32-bit or a 64-bit bus such as a PCI
bus.
In operation, each of the CPUs 120, 125 serves as a master unit which
controls data transfers on the bus and initiates memory and I/O requests
on the system bus 110. When neither CPU 120, 125 has control of the system
bus 110, and there are no pending requests within either of the CPUs 120,
125, the bus 110 is in a bus-free phase. If the CPU module 120 or the CPU
module 125 wishes to initiate a data transfer via the bus 110, the system
bus 110 enters an arbitration phase. Within the arbitration phase, each of
the master units on the system bus 110 vies for control of the bus 110.
Within a selection phase of the bus 110, control of the bus 110 is granted
to that master unit which has the highest priority request. Finally, once
one of the master units has control of the bus 110, data or command
signals may be transferred via the bus 110 within a command or data phase.
Thus, for example, if the CPU module 120 wishes to access information
stored within the memory module 130, the CPU module 120 initiates a
request to obtain control of the system bus 110. If there are no other
requests to obtain control of the system bus 110, then the CPU module 120
immediately obtains control of the system bus 110. If, however, another
master device such as the CPU module 125 also has a pending data request,
then the priority of the data request from the CPU module 120 is compared
to the priority of the request issued by the CPU module 125. The higher
priority request is granted so that the CPU module issuing the higher
priority request gains control of the system bus 110. Assuming, for the
sake of example, that the CPU module 120 gains control of the bus 110, and
wishes to access data stored within the memory module 130, then address
data is transmitted by the CPU module 120 to the memory module 130 via the
bus 110. The memory module 130 receives the address request and identifies
it as an address contained within the memory module 130. The memory module
130 then retrieves the data at the desired address and retransmits this
data to the CPU 120 via the bus 110.
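The priority comparison performed during arbitration may be sketched as follows. This Python example assumes that each pending request carries a numeric priority and that a larger number denotes a higher priority; the encoding of priorities is not specified above, so the function arbitrate and its argument format are hypothetical.

```python
def arbitrate(requests):
    """Pick the module that gains control of the bus.

    requests: dict mapping a module name to the priority of its pending
    request (larger number = higher priority; an assumed encoding).
    Returns the winning module name, or None when nothing is pending.
    Ties go to the module listed first, which is an arbitrary choice.
    """
    if not requests:
        return None
    return max(requests, key=requests.get)
```

When only one module has a pending request, that module obtains the bus immediately, which corresponds to the uncontested case described above.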
FIGS. 2A and 2B together illustrate a schematic block diagram which shows
the internal circuitry of the first CPU module 120 of FIG. 1, constructed
in accordance with the teachings of the present invention. It should be
understood, of course, that the CPU modules 120, 125 are substantially
identical, so that the circuit diagram shown in FIGS. 2A and 2B is also
representative of the internal components of the CPU module 125 and any
other CPU modules in communication with the system bus 110 (FIG. 1).
The system bus 110 communicates with bus interface transceivers 205 via a
bus 202. The bus transceivers 205 communicate with a central processing
unit and cache memory, shown within a block 210, via a local data bus 207
and a local address and control bus 209. The CPU and cache memory 210
connect to a bus controller 215 via a bus 212. The bus controller 215
communicates with a driver receiver module 220 via a bus 217, while the
driver receiver module 220 connects to the system bus 110 via a bus 222.
The bus controller 215 also connects to the bus transceivers 205 via a bus
224. A transfer counter 225 receives a clock input from the bus controller
215 via an increment line 227. The transfer counter 225 further receives a
reset input from the bus controller 215 via a line 228. The transfer
counter 225 connects to a compare register 230 via a bus 229, while the
output of the compare register 230 connects to the bus controller 215 via
a line 232. The compare register 230 receives a second input from a
transfer count register 235. The transfer count register 235 receives an
input from the local data bus line 207 via a bus 239. The transfer count
register 235 further receives an enable, or latch pulse, input from an
address decoder 240 via a line 242. The address decoder 240 receives
inputs from the local address and control bus 209 via a bus 244. An
address latch circuit 245 receives a clock input from the address decoder
240 via a line 248, as well as receiving address bits from the local
address and control bus 209 via a bus 250. An interleave register 255 also
receives an enable, or latch pulse, input from the address decoder 240 via
a line 252. The interleave register 255 further receives inputs from the
local data bus 207 via a bus 257. A compare register 260 receives address
inputs from the local address and control bus 209 via a bus 261, and also
receives inputs from the address latch circuit 245 via a bus 262. The
interleave register 255 and the compare register 260 provide inputs to a
hog request control circuit 270 via lines 264-266, 267-269, respectively.
The hog request control circuitry 270 outputs data to the bus controller
215 via a hog request line 272.
The monitoring subsystem circuitry shown in FIGS. 2A and 2B generally
monitors addresses which are to be requested by the CPU 120 to determine
if additional sequential address request cycles can be run from the CPU
120 without giving up the system bus 110. Basically, the compare register
260 compares the previously accessed address with the present address to
be accessed. If the present address to be accessed is an address within a
different memory module than the memory module containing the previously
accessed address, then the hog request control circuitry 270 transmits a
request to the bus controller 215 via the line 272 to maintain control of
the system bus 110. The internal circuitry and operation of the hog
control circuitry 270 will be described in greater detail below with
reference to FIG. 3. The bus controller 215 then determines if the CPU 120
(i.e., the CPU which currently has control of the system bus 110) has had
control of the bus 110 for more than the maximum number of cycles allowed.
If the CPU 120 has not had control of the bus 110 for more than the
maximum number of allowed cycles, then the bus controller 215 grants
control of the bus 110 to the CPU 120 for the next cycle. When the current
CPU 120 has control of the bus for successive cycles, the CPU 120 is said
to be running in the "hog mode."
The operation of the internal circuitry of the CPU module 120 shown in
FIGS. 2A and 2B is described more specifically below. When the CPU module
120 has control of the system bus 110, address and control data are
transferred to the bus 110 from the CPU and cache memory 210 via the local
address and control bus 209, the bus transceivers 205, and the bus 202.
Data is transferred from the CPU and cache memory 210 to the system bus
110 via the local data bus 207, the bus transceivers 205, and the bus 202.
Data may also be transferred from the bus 110 to the CPU and cache memory
210.
During any data transfer cycle, the address decoder 240 receives address
and control data via the bus 244. The address decoder 240 employs the data
provided on the bus 244 to load an interleave value and a maximum transfer
count value into the interleave register 255 and the transfer count
register 235, respectively. The interleave value indicates the number of
memory boards (or modules) that are configured in interleave fashion
within the multiprocessor system 100, while the maximum transfer count
value indicates the maximum number of cycles which the CPU 120 is able to
maintain control of the system bus 110 while in the hog mode. When an
interleave value is to be loaded into the interleave register 255,
the CPU 210 polls the memory modules on the system bus 110 (e.g., the
memory modules 130, 135) to determine how many memory modules are
configured to interleave. The CPU 210 then supplies the address
corresponding to the memory location of the interleave register 255 to the
address decoder 240 via the address and control bus 209 and the bus 244.
In response to the address input over the bus 244, the address decoder 240
asserts a latch pulse input signal over the line 252. In the meanwhile,
the CPU 210 provides the interleave value on the local data bus 207. When
the interleave register 255 receives the latch pulse input signal over the
line 252, the interleave register 255 latches the interleave value
supplied on the local data bus 207 via the bus 257.
In a similar manner, the maximum transfer count value is supplied to the
transfer count register 235. Specifically, the CPU 210 supplies the
address of the transfer count register 235 to the address decoder 240 via
the address and control bus 209 and the bus 244. The address decoder 240
then asserts a latch pulse input signal over the line 242. In the
meanwhile, the CPU 210 provides the maximum transfer count value on the
local data bus 207. When the transfer count register 235 receives the
latch pulse signal from the address decoder 240, the transfer count
register 235 latches the maximum transfer count value from the data bus
207 via the bus 239.
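The register loading sequence described above resembles a write to a memory-mapped register: the address decoder recognizes the register address and pulses the corresponding latch line while the CPU drives the value on the local data bus. The Python sketch below models that behavior. The register addresses INTERLEAVE_REG_ADDR and TRANSFER_COUNT_REG_ADDR are invented for illustration and do not appear in the specification.

```python
INTERLEAVE_REG_ADDR = 0x100      # hypothetical address of interleave register 255
TRANSFER_COUNT_REG_ADDR = 0x104  # hypothetical address of transfer count register 235


class Registers:
    """Holds the two configuration values latched from the local data bus."""

    def __init__(self):
        self.interleave = None
        self.transfer_count = None


def decode_and_latch(regs, address, data):
    """Model the address decoder 240 for one bus cycle.

    If `address` selects a register, latch `data` into it and return the
    name of the latch line that was pulsed; otherwise return None.
    """
    if address == INTERLEAVE_REG_ADDR:
        regs.interleave = data
        return "line 252"
    if address == TRANSFER_COUNT_REG_ADDR:
        regs.transfer_count = data
        return "line 242"
    return None
```

A cycle to any other address leaves both registers unchanged.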
The address decoder 240 also latches the lower three address bits A4-A6
into the address latch circuit 245 by means of an enable line 248. The
address bits A4-A6 are provided to the address latch circuit 245 from the
address and control bus 209 via the bus 250. The address latch 245 holds
the address bits A4-A6 for one data transfer cycle.
The address compare circuit 260 receives the lower three address bits A4-A6
via the bus 261 from the local address and control bus 209. The compare
circuit 260 also receives the output of the address latch circuit 245 via
the output bus 262. The compare circuit 260 then compares the inputs from
the bus 261 and the bus 262. Because the address latch circuit 245 outputs
the address bits A4-A6 one data transfer cycle after those bits were
received in the address latch 245, the bits provided on the output
bus 262 represent the lowest three address bits of the previously accessed
address. Thus, the compare circuit 260 compares the current address
(provided on the bus 261) with the previously accessed address (provided
on the bus 262) in order to determine if the lowest three address bits are
the same or different.
For each of the compared address bit values A4-A6, an output comparison
value is provided to the hog request control circuitry 270 via lines
267-269. The line 267 outputs a comparison value based upon the values of
A4 and latched A4, while the line 268 outputs a comparison value based
upon the values of A5 and latched A5, and the line 269 outputs a
comparison value based upon the values of A6 and latched A6. In one
embodiment, the comparison circuit 260 comprises a plurality of exclusive
OR gates so that if the input bits are the same, the corresponding
comparison output is low (i.e., logical "0"), while if the input bits are
different, the corresponding comparison output is high (i.e., logical
"1"). The hog request control circuitry 270 uses the comparison outputs
provided on the lines 267-269 to determine if the presently requested
address is to a different memory module than the previously requested
address. In the case where all three comparison outputs are low (i.e., the
three least significant address bits are the same), the same memory module
is being requested. In the case where one or more of the comparison outputs
are high, the hog request control circuitry 270 must then use the
interleave register 255 to determine if
the address is to a different memory module. For the purposes of these
examples it is assumed that the memory modules use the least significant
address bits to determine which memory module is being accessed. For the
case of two memory modules, the bit A4 is used to select between the two
modules (e.g., if A4=0 then the first memory module is being accessed,
while if A4=1 then the second memory module is being accessed). For the
case of four interleaved memory modules, the lower two address bits A5, A4
are used so that these two bits in the combinations 00, 01, 10, 11 are
used to select a different memory module.
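The latch-and-compare operation described above may be sketched in software as follows. This Python model assumes the exclusive-OR embodiment of the comparison circuit; the class AddressLatch and the functions low_bits and compare are hypothetical names, and the returned tuple merely stands in for the signals on the lines 267, 268, and 269.

```python
A4_SHIFT = 4         # A4 is the lowest address bit that is examined
MASK_3BITS = 0b111   # keeps bits A6..A4 after shifting


def low_bits(address):
    """Return address bits A6..A4 as a 3-bit integer (A4 is bit 0)."""
    return (address >> A4_SHIFT) & MASK_3BITS


class AddressLatch:
    """Holds bits A4-A6 of one address for one data transfer cycle."""

    def __init__(self):
        self.value = 0

    def latch(self, address):
        """Store the new bits and return the previously latched bits."""
        previous = self.value
        self.value = low_bits(address)
        return previous


def compare(current_address, latched_bits):
    """Bitwise XOR of the current and latched bits.

    Returns (A4 differs, A5 differs, A6 differs) as 0/1 values, modeling
    the outputs on the lines 267, 268, and 269 respectively.
    """
    diff = low_bits(current_address) ^ latched_bits
    return (diff & 1, (diff >> 1) & 1, (diff >> 2) & 1)
```

For a previous address whose bits A6-A4 are 010 and a present address whose bits A6-A4 are 110, only the A6 output is high.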
In a manner similar to the operation of the compare register 260, the
interleave register 255 provides the interleave value to the hog request
control circuitry 270 via lines 264-266. In one embodiment, the interleave
register 255 outputs an active high signal (logical 1) on the line 264 if
the multiprocessor system 100 is configured to have two interleaved memory
modules, an active high signal on the line 265 if the multiprocessor
system 100 is configured to have four interleaved memory modules, and an
active high signal on the line 266 if the multiprocessor system 100 is
configured to have eight interleaved memory modules.
Given the interleave value, as well as the output of the compare circuitry
260, the hog request control circuitry 270 can determine whether or not
the current address which is to be accessed is within the same memory
module as the previously accessed address. This is because successive
address memory locations are written in successive interleaved memory
modules. Thus, if the interleave value is four (i.e., there are four
interleaved memory modules) this means that the lowest two bits A5, A4 are
used in combination so that the combinations 00, 01, 10, 11 will each
access a different memory module. Thus, in the above example, when the
address bits A6-A4 are 110, respectively, for the present address request,
and the latched address bits A6-A4 are 010, respectively, for the previous
address request, this indicates that the same interleave module is being
accessed when the interleave value is four. Whenever an interleave value
and an address comparison value are input to the hog request control
circuit 270, the hog request control circuit determines if the same memory
module is being accessed twice in a row. The internal operation and
structure of the hog request control circuitry 270 will be described in
greater detail below with reference to FIG. 3.
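One possible way of combining the interleave value with the comparison outputs is sketched below in Python. The gate arrangement shown is an assumption consistent with the behavior described above and is not a description of the circuit of FIG. 3; the function names interleave_lines and hog_request are hypothetical.

```python
def interleave_lines(num_modules):
    """Model the lines 264, 265, 266: exactly one is high for 2, 4, or 8 modules."""
    if num_modules not in (2, 4, 8):
        raise ValueError("this sketch supports 2, 4, or 8 interleaved modules")
    return (int(num_modules == 2), int(num_modules == 4), int(num_modules == 8))


def hog_request(compare_lines, ilv_lines):
    """Decide whether the pending address targets a different memory module.

    compare_lines: (A4 differs, A5 differs, A6 differs) as 0/1 values.
    ilv_lines: (two, four, eight) one-hot interleave indication.
    Only the bits that select a module for the configured interleave are
    considered: A4 for two modules, A5-A4 for four, A6-A4 for eight.
    """
    d4, d5, d6 = compare_lines
    two, four, eight = ilv_lines
    different = (two and d4) or (four and (d4 or d5)) or (eight and (d4 or d5 or d6))
    return bool(different)
```

Applied to the example above (present bits A6-A4 of 110, latched bits of 010), the comparison outputs are (0, 0, 1), so no hog request is produced for an interleave value of four, whereas the same addresses would fall in different modules with an interleave value of eight.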
It should be noted that, although the least significant three bits A4-A6
are used to identify the addresses for purposes of the present invention,
more or less than three bits may be used depending upon the number of
interleaved memory modules within the multiprocessor system 100. For
example, if there are 16 (i.e., 2^4) interleaved memory modules, then
the lowest four address bits should be used to identify the memory module
having a given memory location.
Furthermore, the use of A4 as the least significant bit implies that the
memory modules are interleaved on a 16-byte boundary for the purposes of
the present invention. A higher or lower address bit may be used to
increase or decrease the interleave boundary size. For example, if a
32-byte boundary were desired, then the bit A5 would be used as the least
significant bit, and bits A6, A7 would be used in conjunction with bit A5
for a system with up to eight interleaved memory modules.
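The relationship between the interleave boundary, the number of interleaved modules, and the address bits that select a module may be summarized in a short Python sketch. The function module_index and its parameters are hypothetical; the sketch assumes that both the boundary size and the module count are powers of two.

```python
def module_index(address, num_modules, boundary_bytes=16):
    """Return the index of the interleaved module that holds `address`.

    boundary_bytes selects the lowest address bit used (16 -> A4, 32 -> A5);
    num_modules determines how many bits above it are used.
    """
    for name, value in (("num_modules", num_modules), ("boundary_bytes", boundary_bytes)):
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two")
    lsb = boundary_bytes.bit_length() - 1  # bit position of the lowest selector bit
    return (address >> lsb) & (num_modules - 1)
```

With a 16-byte boundary and two modules only the bit A4 matters, while a 32-byte boundary with eight modules uses the bits A5 through A7.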
If the hog request control circuit 270 determines that the presently
accessed address is not to the same memory module as the previously
accessed address, then the hog request control circuitry 270 provides an
indication to the bus controller 215 that the CPU module 120 is to retain
control of the system bus 110. That is, the CPU module 120 is to "hog" the
bus 110.
Upon reception of a hog request on the line 272, the bus controller 215
determines whether or not the CPU module 120 will maintain control of the
system bus 110 based upon the input provided along the line 232 from the
comparator circuit 230. Basically, the signal on the line 232 indicates
whether or not the CPU module 120 has run the maximum allowed number of
successive cycles without relinquishing control of the system bus 110. In
order to generate an indication signal along the line 232, the comparator
circuit 230 receives input from the transfer count register 235 via the
bus 237, as well as from the transfer counter 225 via the bus 229.
As stated above, the transfer count register 235 holds the maximum transfer
count which is allowable before the CPU module 120 must hand over control
of the system bus to another requesting CPU module. If one CPU module has
been hogging the system bus 110 for too many cycles, this may be
detrimental to the multiprocessor system 100 as a whole, even though the
system bus 110 may be running very efficiently, since the processing
ability of the other processors on the system bus is compromised. Thus,
using a maximum transfer count value is a means of assuring that the other
CPU modules on the system bus 110 are able to operate effectively.
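The interaction between the hog request and the maximum transfer count may be modeled as follows. This Python sketch assumes that the terminate indication is produced when the counter value equals the stored maximum, and that the counter is reset when the bus is released; the class TransferLimiter and the function retain_bus are hypothetical names.

```python
class TransferLimiter:
    """Models the transfer count register 235, transfer counter 225, and comparator 230."""

    def __init__(self, max_transfers):
        self.max_transfers = max_transfers  # value held in the transfer count register
        self.count = 0                      # value held in the transfer counter

    def record_transfer(self):
        """One data transfer cycle completed (increment line 227)."""
        self.count += 1

    def reset(self):
        """Clear the counter (reset line 228), assumed to occur on bus release."""
        self.count = 0

    @property
    def terminate(self):
        """Indication on the line 232: the maximum count has been reached."""
        return self.count == self.max_transfers


def retain_bus(hog_requested, limiter):
    """The bus is retained only if a hog request is present and the limit is not reached."""
    return bool(hog_requested) and not limiter.terminate
```

With a maximum transfer count of four, the module in this model may retain the bus after each of its first three consecutive transfers and must release it after the fourth.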
Although the maximum transfer count value is typically on the
order of 4-16, it is highly
application dependent and may vary significantly from application to
application. For example, systems which have many processors vying for
control of the system bus are likely to