Description
A related invention concerns masking the presence of a command in one
processor from the remainder of the multiprocessor system until the flush
operation is complete. In that regard, see the previously co-pending
application entitled "Masking Commands For A Second Processor When A First
Processor Requires a Flushing Operation In A Multiprocessor System", Ser.
No. 664,283 filed Oct. 24, 1984, assigned to the same assignee, and issued
on Dec. 14, 1987 as U.S. Pat. No. 4,713,751.
BACKGROUND OF THE INVENTION
The present invention relates to multiprocessor computer technology, and
more particularly, to a sync and flush circuit within a multiprocessor
computer for synchronizing the clocks of a processor, with respect to a
control circuit, and for flushing modified data from the cache of the
processor to a main store prior to utilizing the modified data in the
execution of an instruction by another processor.
In a multiprocessor computer system, when one processor attempts to locate
desired data in its own cache, and fails to locate such data, it is
necessary to attempt to locate the data in the cache of the other
processor. If the data is not found in the cache of the other processor,
it is necessary to retrieve the data from a main store. Occasionally, the
data is found in the cache of the other processor. The one processor must
utilize the desired data in the execution of an instruction. For some
instructions, the one processor may retrieve the data directly from the
cache of the other processor, store the data in its own cache, and utilize
the data in the execution of the instruction.
However, for other instructions, the one processor cannot retrieve the data
directly from the cache of the other processor. It is therefore necessary
to flush the desired data from the cache of the other processor to the
main store, and utilize the desired data in the main store during the
execution of the instruction.
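The lookup order described above may be summarized by the following minimal sketch. It is illustrative only: the function name `locate_data`, the flag `needs_flush`, and the modeling of the caches and the main store as Python dictionaries are assumptions introduced here and are not elements of the disclosed circuits.

```python
def locate_data(address, own_cache, other_cache, main_store, needs_flush):
    """Return (value, source) following the lookup order in the text.

    Caches and the main store are modeled as plain dicts (an assumption).
    needs_flush is True for instructions that cannot retrieve the data
    directly from the other processor's cache.
    """
    if address in own_cache:
        return own_cache[address], "own cache"
    if address in other_cache:
        if needs_flush:
            # Flush: write the data to the main store and use it from there.
            # Whether the other cache keeps its copy is not modeled here.
            main_store[address] = other_cache[address]
            return main_store[address], "main store (after flush)"
        # Direct transfer: store the data in the requesting processor's cache.
        own_cache[address] = other_cache[address]
        return own_cache[address], "other cache (direct transfer)"
    # Not found in either cache: retrieve from the main store.
    return main_store[address], "main store"
```

In this sketch the only difference between the two kinds of instruction is where the requesting processor reads the data from once it has been found in the other processor's cache.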
The clocks of the one processor run independently of the clocks of the
other processor. As a result, the clock of the one processor may be
out-of-sync with respect to the clock of the other processor and with
respect to the clock of a main store control circuit. Therefore, when it
is necessary for the one processor to execute instructions on the other
processor's data, prior to flushing the desired data from the other
processor's cache to the main store, it is necessary to synchronize the
clock of the other processor with the clock of the main store control
circuit. When these clocks are synchronized, the flush operation may
commence.
SUMMARY OF THE INVENTION
It is a primary object of the present invention to provide a method and
apparatus for synchronizing the clocks of another processor of a
multiprocessor system with the clocks of a main store control circuit
prior to flushing a desired page of data from the cache of the other
processor to the main store.
These and other objects of the present invention are accomplished by
synchronizing the clocks of a second processor with the clocks of a main
store control circuit prior to flushing the desired page of data from the
cache of the second processor to the main store in order that a first
processor may utilize the desired page of data stored in the main store
during the execution of an instruction. When the desired page of data is
found in the cache of the second processor, a first signal is generated,
the first signal energizing an alternate cache search signal generation
circuit thereby causing a first alternate cache search signal to be
generated. The first alternate cache search signal energizes a block
circuit in the main store causing the block circuit to delay the start of
the flush operation. The first signal also energizes the sync circuit in
the second processor causing the sync circuit to synchronize the clocks of
the second processor with the clocks of the main store. When the
synchronization is complete, the sync circuit generates another signal
which ultimately results in the generation of a second alternate cache
search signal which energizes the block circuit releasing it from its
delay function. When the block circuit is released from its delay
function, the flush operation begins.
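The ordering of events in the preceding paragraph (block, synchronize, release, flush) may be pictured with the following toy model. It is a sketch under stated assumptions: the class `FlushSequencer` and its method names are hypothetical, and the hardware signals are reduced to simple method calls and boolean flags.

```python
class FlushSequencer:
    """Toy model of the block / sync / release ordering in the summary."""

    def __init__(self):
        self.blocked = False
        self.synced = False
        self.log = []

    def first_signal(self):
        # Data found in the second processor's cache: the first alternate
        # cache search signal makes the block circuit delay the flush, and
        # the sync circuit begins synchronizing the clocks.
        self.blocked = True
        self.synced = False
        self.log.append("flush blocked")
        self.log.append("sync started")

    def sync_complete(self):
        # The second alternate cache search signal releases the block
        # circuit from its delay function.
        self.synced = True
        self.blocked = False
        self.log.append("sync complete")
        self.log.append("block released")

    def try_flush(self):
        # The flush may begin only after the block circuit is released.
        if self.blocked or not self.synced:
            return False
        self.log.append("flush started")
        return True
```

A call to `try_flush()` made between `first_signal()` and `sync_complete()` returns `False`, mirroring the delay imposed by the block circuit.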
Further scope of applicability of the present invention will become
apparent from the detailed description presented hereinafter. It should be
understood, however, that the detailed description and the specific
examples, while representing a preferred embodiment of the invention, are
given by way of illustration only, since various changes and modifications
within the spirit and scope of the invention will become obvious to one
skilled in the art from a reading of the following detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
A full understanding of the present invention will be obtained from the
detailed description of the preferred embodiment presented hereinbelow,
and the accompanying drawings, which are given by way of illustration only
and are not intended to be limitative of the present invention, and
wherein:
FIG. 1 illustrates a block diagram of a multiprocessor system in accordance
with the present invention;
FIG. 2 illustrates a more detailed block diagram of the multiprocessor
system of FIG. 1;
FIG. 3 illustrates a block diagram of a BSM control circuit, a portion of
the multiprocessor system of FIG. 2;
FIG. 4 illustrates a block diagram of the stacked op discriminator circuit
of FIG. 3;
FIG. 5 illustrates a block diagram of a pair of alternate cache signal
generation circuits, a portion of the multiprocessor system of FIG. 2;
FIG. 6 illustrates a block diagram of the ICT Control(1), a portion of the
alternate cache signal generation circuits of FIG. 5;
FIG. 7 illustrates a block diagram of a block circuit, a portion of the
multiprocessor system of FIG. 2;
FIG. 8 illustrates a block diagram of a sync circuit, a portion of the
multiprocessor system of FIG. 2;
FIG. 9 illustrates a block diagram of the IPU WAIT TRAP SRL, a portion of
the sync circuit of FIG. 8;
FIGS. 10a and 10b illustrate block diagrams of the nand-invert (NI)
circuits of FIG. 9 and the SRL latch circuits (SRL) of FIGS. 6, 7, 8, and
9, respectively;
FIG. 11 illustrates a block diagram of the clock circuits of FIG. 2;
FIG. 12 illustrates the clock sequences associated with the instruction
processing unit (IPU) circuits, IPU 0 and IPU 1, and the BSM control
circuit shown in FIG. 2;
FIG. 13a illustrates an out-of-sync situation wherein a processor's clocks
are out-of-sync with a main store (BSM) control clock; and
FIG. 13b illustrates an in-sync situation wherein a processor's clocks are
in-sync with a main store (BSM) control clock.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring to FIG. 1, a multiprocessor system is illustrated. In FIG. 1, a
first processor 10, of the multiprocessor system, is connected to a main
memory 15 by way of a system bus (the main memory being alternatively
termed a Basic Storage Module, BSM, or a main store). A second processor 20,
of the multiprocessor system, is also connected to the main store 15 by
way of the system bus.
Referring to FIG. 2, a more detailed block diagram of the multiprocessor
system of FIG. 1 is illustrated. In FIG. 2, the first processor 10,
alternatively termed processor 0, includes an instruction processing unit
10a (IPU 0), a cache 10b (cache 0) connected to the IPU 10a, a clock
generator 10c connected to the IPU 10a, and an X module 10d connected to
the clock generator 10c. The X module 10d includes a novel sync circuit
10d1 connected to the clock generator 10c and an alternate cache search
signal generation circuit (ACS GEN) 10d2 connected to the sync circuit
10d1.
The first processor 10 further includes a trap priority circuit 10f
connected between the sync circuit 10d1 of the X module 10d and the clock
generator 10c, and a cache directory (Z) 10e connected to cache 10b, to
the sync circuit 10d1, to the ACS GEN 10d2 of X module 10d, and to IPU 10a.
The second processor 20, alternatively termed processor 1, includes an
instruction processing unit 20a (IPU 1), a cache 20b (cache 1)
connected to the IPU 20a, a clock generator 20c connected to the IPU 20a,
and an X module 20d connected to the clock generator 20c. The X module 20d
includes a novel sync circuit 20d1 connected to the clock generator 20c
and an alternate cache search signal generation circuit (ACS GEN) 20d2
connected to the sync circuit 20d1. The second processor 20 further
includes a trap priority circuit 20f connected between the sync circuit
20d1 of the X module 20d and the clock generator 20c, and a cache
directory (Z) 20e connected to cache 20b, to the sync circuit 20d1 and the
ACS GEN 20d2 of X module 20d, and to IPU 20a.
The multiprocessor system of FIG. 2 further includes a main store control
circuit 30, otherwise termed a Basic Storage Module (BSM) control circuit
30 or "BSM Controls 30", connected to processor 10, and in particular, to
the X module 10d, the cache 10b, and the cache directory 10e of processor
10, and to processor 20, and in particular, to the X module 20d, the cache
20b, and the cache directory 20e of processor 20. The BSM control 30 is
further connected to a main store 15 or Basic Storage Module (BSM) 15. The
BSM control 30 is responsible for controlling the functioning of the
multiprocessor system of FIG. 2, the details of its functioning being
described in the paragraphs below which are dedicated to a description of
the functional operation of the present invention.
Referring to FIG. 3, a block diagram of the BSM controls 30 of FIG. 2 is
illustrated. In FIG. 3, the BSM controls 30 comprise a command status
register (CS REG) 30a connected to the cache directory (Z) 10e and a
command status register (CS REG) 30b connected to the cache directory (Z)
20e. The cache directories (Z) supply hit/miss/flush/modified information
to the command status registers 30a and 30b. The command status registers
30a and 30b are each connected to a stacked op discriminator circuit 30d.
The stacked op discriminator circuit 30d receives the contents of the
command status registers 30a and 30b and develops an output signal when
the command status registers 30a and 30b contain predetermined
information. For example, when command status register 30a contains a
WHEREVER word, and command status register 30b contains a flush indication
from cache directory (Z) 20e, the stacked op discriminator circuit 30d
develops an output signal. The stacked op discriminator circuit 30d would
also generate an output signal if command status register 30b contained a
WHEREVER word and command status register 30a contained a flush indication
from cache directory (Z) 10e. The output signal (60) from the stacked op
discriminator circuit 30d performs a "masking" function in that it masks
the contents of the command status register (CS REG) containing the
WHEREVER word information from the remainder of the BSM controls 30 and
prevents the normal start signal 61 from being sent to the BSM ops control
circuit 30g. The output signal 60 of the stacked op discriminator circuit
30d also enables the contents of the command status register containing
the flush indication to be seen by the BSM ops control circuit 30g. The
stacked op discriminator circuit 30d is connected to a block circuit 30c
and to an AND gate 30e. A signal termed "allow reset BSM controls"
energizes the other input terminal of the AND gate 30e. The block circuit
30c is connected to clock 10c and to ACS GEN 10d2. The AND gate 30e is
connected, at its output terminal, to an input (set) terminal of a stacked
op latch 30f. The output terminal of the stacked op latch 30f is connected
to the CS REG 30b and to a BSM OPS CONTROL CIRCUIT 30g. The BSM ops
control circuit 30g is connected, at its output, to cache 10b, cache 20b,
and to BSM 15, the BSM ops control circuit 30g controlling the transfer of
data from the cache memories 10b and 20b to the BSM 15 and vice-versa.
Referring to FIG. 4, a block diagram of the stacked op discriminator
circuit 30d of FIG. 3 is illustrated. In FIG. 4, the stacked op
discriminator circuit 30d comprises an AND gate 30d1 connected to CS REG
30a and an AND gate 30d2 connected to CS REG 30b. AND gate 30d1 is
connected to AND gate 30d2, and vice-versa. Further, AND gate 30d1 is
connected to a "BSM CTLS BUSY FOR PU0" line, and AND gate 30d2 is
connected to a "BSM CTLS BUSY FOR PU1" line. The outputs of AND gates 30d1
and 30d2 are connected to an OR gate 30d3. The output of OR gate 30d3
represents line 60 as shown in FIG. 3.
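The behavior attributed to the stacked op discriminator circuit 30d (an output on line 60 when one command status register holds a WHEREVER word and the other holds a flush indication) may be sketched as the OR of two AND terms. The dictionary encoding of the register contents and the function name `stacked_op_output` are assumptions made for illustration. The "BSM CTLS BUSY" inputs are omitted because the text does not state how they condition the gates.

```python
def stacked_op_output(cs_reg_a, cs_reg_b):
    """Model line 60 as the OR of two AND terms.

    cs_reg_a and cs_reg_b stand for CS REG 30a and CS REG 30b. Each is a
    dict with boolean entries 'wherever' and 'flush' (a hypothetical
    encoding of the register contents).
    """
    # Term for AND gate 30d1: WHEREVER word in 30a, flush indication in 30b.
    and_30d1 = cs_reg_a["wherever"] and cs_reg_b["flush"]
    # Term for AND gate 30d2: WHEREVER word in 30b, flush indication in 30a.
    and_30d2 = cs_reg_b["wherever"] and cs_reg_a["flush"]
    # OR gate 30d3 drives line 60.
    return and_30d1 or and_30d2
```

For example, `stacked_op_output({"wherever": True, "flush": False}, {"wherever": False, "flush": True})` evaluates to `True`, which corresponds to the first case given in the description of FIG. 3.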
Referring to FIG. 5, a block diagram of the alternate cache search signal
generation circuits (ACS GEN) 10d2 and 20d2 of FIG. 2 is illustrated. In
FIG. 5, the ACS GEN 20d2 comprises portions of an ICT Control (2) 20d2(a).
The ICT Control (2) 20d2(a) receives signals from cache directory 20e and
sync circuit 20d1 and generates output signals "-Y busy gate XBRD" and "-Y
flush XBRD" in response thereto. The ACS GEN 10d2 comprises portions of an
ICT Control (2) 10d2(a) connected to sync circuit 10d1 and cache directory
(Z) 10e, the ICT Control (2) 10d2(a) generating output signals "+Y REQ
accepted" and "+flush reset" in response to output signals from the cache
directory (Z) 10e and sync circuit 10d1. The ACS GEN 10d2 also comprises
portions of an ICT Control(1) 10d2(b) connected to the ICT Control (2)
10d2(a) and receiving the output signals therefrom for developing the
alternate cache search signals in response thereto, the alternate cache
search signals being received by the block circuit 30c.
Referring to FIG. 6, a block diagram of portions of the ICT Control(1)
10d2(b) of FIG. 5 is illustrated. In FIG. 6, the ICT Control(1) 10d2(b)
portion comprises an SRL latch circuit (b)1 which receives the "+Y request
accepted" signal from the ICT Control (2) 10d2(a) and output signals from
a clock driver circuit (b)2. The clock driver circuit (b)2 receives a +S0
clock signal. The output of the SRL latch (b)1 is connected to an input
terminal of an OR gate (b)3. The "-Y flush XBRD" signal energizes another
input terminal of OR gate (b)3 via receiver circuit (b)4 and inverter
(b)5. The "+flush reset" signal energizes another input terminal of OR
gate (b)3 via inverter (b)6. The output of OR gate (b)3 is connected to a
second input terminal of OR driver (b)7. The ICT Control(1) 10d2(b)
portion also comprises an SRL scan only latch (b)8 connected, at output
terminal 21, to the OR driver circuit (b)7 via inverter (b)9. The latch
(b)8 is connected, at output terminal 10, to a clock driver circuit (b)10.
A +C1/C3 clock also energizes the clock driver (b)10. The clock driver
circuit (b)10 is connected, at its output, to the -C and +C input
terminals of an SRL latch circuit (b)11. The "-Y busy gate XBRD" signal
energizes another input terminal of the SRL latch circuit (b)11 via a
driver/receiver circuit (b)12. Output terminal 21 of the SRL latch circuit
(b)11 is connected to the first input terminal of OR driver (b)7 via
inverters (b)12 and (b)13. A -S0/-S2 clock energizes the second input of
OR driver (b)7 via inverter (b)14. The OR driver circuit (b)7 generates
the alternate cache search signals which energize the block circuit 30c.
Referring to FIG. 7, a block diagram of the block circuit 30c disposed
within the BSM controls 30, is illustrated. The block circuit 30c includes
an AND gate 30c1 having one input terminal connected to a "BSM busy
controls" line and another input terminal connected to the ACS GEN 10d2.
The output of AND gate 30c1 is connected to OR gate 30c2, the other input
of which is connected to a "Reset BSM controls" signal. The output of OR
gate 30c2 is connected to the input of a clock driver 30c3, the output of
clock driver 30c3 including a +C output terminal and a -C output terminal.
The +C and -C output terminals are input to an SRL latch circuit 30c4
(L1). The L1 (master) portion of the SRL latch circuit 30c4 is connected
internally to an L2 (slave) portion which is controlled by OR circuit
30c10. See FIG. 10b for a detailed construction of this SRL latch circuit. An
output of the SRL latch circuit 30c4 (L2) is connected to an input of
invert gate 30c5. The output of invert gate 30c5 is fed back to another
input terminal of the SRL latch circuit 30c4 (L1). An output terminal (L2)
of SRL latch circuit 30c4 is connected to one input of NAND gate 30c6
(NAND gate 30c6 comprising an AND gate with an inverter connected to its
output terminal). The other input of NAND gate 30c6 is connected to the
command status registers 30a and 30b via stacked op discriminator circuit
30d (it should be noted, at this point, that the command status registers
30a and 30b shown in FIGS. 3, 4, and 7 receive hit/miss/modified data
information from the cache directories 10e and 20e, data information, and
information related to the initiation of execution of a special
instruction called a "wherever", abbreviated W.E., instruction). The
output of NAND gate 30c6 is connected to a clock driver circuit 30c7. The
clock driver circuit 30c7 develops two outputs: a +C output and a -C
output. The +C and the -C outputs of clock driver 30c7 are connected to
input terminals of an SRL latch circuit 30c8. The output terminal of the
SRL latch circuit 30c8 is connected to the BSM ops control circuit 30g.
The BSM ops control circuit 30g functions to initiate the flush of data
from a processor's cache 10b or 20b to the BSM 15 and then execute the
subsequent WHEREVER word which has been masked during the flush operation.
Referring to FIG. 8, a block diagram of sync circuits 10d1 and 20d1 is
illustrated. In FIG. 8, sync circuits 10d1 and 20d1 each comprise an AND
gate 10d1(10) connected to cache directories 10e and 20e for receiving
hit/modified data information from cache directories 10e and 20e, for
receiving a clock signal, and for receiving the special "wherever"
instruction indicating that the "wherever" instruction is about to be
executed. The output of AND gate 10d1(10) is connected to an input of a
FLUSH REQ SRL latch circuit 10d1(12). This latch circuit is the same latch
circuit, in construction, as the latch circuits 30c4 and 30c8. The SRL
latch circuit 10d1(12) is connected to an IPU WAIT TRAP SRL latch circuit
10d1(14). Latch circuit 10d1(14) develops an IPU WAIT TRAP REQUEST signal
which energizes clock generators 10c and 20c. When the clock signals
energizing the IPUs 10a and 20a are in sync with the clock signal
energizing the block circuit 30c of the BSM controls 30, the clock
generators 10c and 20c generate a DIR MISS/IPU WAIT TRAP signal which
resets latch circuit 10d1(14) and which energizes one input terminal of
AND gate 10d1(16). The other input of AND gate 10d1(16) originates from
the output terminal of latch circuit 10d1(12). The output of AND gate
10d1(16) is connected to an input of clock driver circuit 10d1(18). A
clock signal energizes the clock driver circuit. The output of clock
driver circuit 10d1(18) is connected to an input of a FLUSH GO SRL latch
circuit 10d1(20). The output of the FLUSH GO latch circuit 10d1(20) is
connected to a reset terminal of the FLUSH REQ latch circuit 10d1(12), and
either to the ACS GEN 10d2 directly or to the ACS GEN 10d2 via the ACS GEN 20d2
(depending upon the sync circuit). The ACS GEN 10d2 develops the alternate
cache search signal which energizes the block circuit.
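The latch sequence of FIG. 8 (FLUSH REQ, then IPU WAIT TRAP, then FLUSH GO) may be followed with the small model below. It is a sketch only: the class `SyncCircuit` and its method names are hypothetical, each SRL latch is reduced to a boolean, and the clock inputs and clock driver 10d1(18) are not modeled.

```python
class SyncCircuit:
    """Boolean model of the FLUSH REQ / IPU WAIT TRAP / FLUSH GO latches."""

    def __init__(self):
        self.flush_req = False      # FLUSH REQ SRL latch 10d1(12)
        self.ipu_wait_trap = False  # IPU WAIT TRAP SRL latch 10d1(14)
        self.flush_go = False       # FLUSH GO SRL latch 10d1(20)

    def request(self, hit_modified, wherever):
        # AND gate 10d1(10): hit/modified information and the "wherever"
        # indication must both be present (the clock input is omitted).
        if hit_modified and wherever:
            self.flush_req = True
            # Latch 10d1(14) develops the IPU WAIT TRAP REQUEST signal.
            self.ipu_wait_trap = True

    def dir_miss_ipu_wait_trap(self):
        # Generated once the processor clock is in sync with the clock
        # of the BSM controls; it resets latch 10d1(14).
        self.ipu_wait_trap = False
        if self.flush_req:
            # AND gate 10d1(16) sets FLUSH GO, which resets FLUSH REQ.
            self.flush_go = True
            self.flush_req = False
        return self.flush_go
```

In this model FLUSH GO cannot be set unless a flush request is pending when the synchronization signal arrives, which is the role of AND gate 10d1(16).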
Referring to FIG. 9, a detailed construction of the IPU WAIT TRAP SRL
10d1(14) shown in FIG. 8 is illustrated. The IPU wait trap SRL 10d1(14) is
discussed in detail in a co-pending application entitled "Dual Stream
Processor Apparatus", serial number 548,748, filed Nov. 4, 1983, and
issued on Jan. 27, 1987 as U.S. Pat. No. 4,639,856 to the same assignee,
the disclosure of which is incorporated by reference into the
specification of this application. A complete discussion of the
construction and functional operation of the IPU wait trap SRL 10d1(14)
may be found in the above-referenced co-pending application associated
with FIG. 3 of the drawings.
Referring to FIG. 10a, a diagram of the construction of the nand-invert
circuit (NI) shown in FIG. 9 is illustrated.
Referring to FIG. 10b, a diagram of the construction of the SRL latch
circuit shown in FIG. 9 is illustrated. In addition, FIG. 10b illustrates
the construction of the SRL latch circuits shown in FIGS. 6, 7 (latches
30c4 and 30c8), and 8 (the FLUSH REQ latch 10d1(12) and the FLUSH GO
latch 10d1(20)).
Referring to FIG. 11, a block diagram of the clock generators 10c and 20c,
shown in FIG. 2, is illustrated. In FIG. 11, clock generator 10c includes
a processor clock module 10c1 connected to the sync circuit 10d1. A T
clock 10c2 is connected to the processor clock module 10c1. An S clock
10c3 is connected to the processor clock module 10c1. A C clock 10c4 is
connected to an oscillator 70, the oscillator 70 also being connected to
the S clock 10c3 and the T clock 10c2. The C clock is connected, at its
output, to the block circuit 30c via line 52. Clock generator 20c also
includes a processor clock module 20c1 connected to the sync circuit 20d1.
The module 20c1 is further connected to a T clock 20c2, and to an S clock
20c3, the T clock and the S clocks being connected to oscillator 70. The
oscillator 70 is further connected to a C clock 20c4 and an R clock 20c5.
An output of the R clock 20c5 is connected to an input of the S clock
20c3. An output of the S clock 20c3 is connected to IPU 20a. The trap
priority circuits 10f and 20f each receive the IPU WAIT TRAP REQUEST
signals from the sync circuits 10d1 and 20d1, respectively, and, when
appropriate, issue the DIR MISS/IPU WAIT TRAP signal to the processor
clock modules 10c1 and 20c1 causing them to undertake a clock
synchronization action.
Referring to FIG. 12, clock sequences associated with the clocks for IPU
10a, IPU 20a, and the BSM controls 30 are illustrated. Note that, at
various points along the sequence, the clock associated with the BSM
controls 30 is out-of-sync with respect to the clock associated with IPU
20a and with respect to the clock associated with IPU 10a. For example, at
one point along the sequence, the clock associated with the BSM controls
30 generates the following pulse sequence: 0, 1, 2, 3, 0; the clock
associated with the IPU 10a generates the following pulse sequence: 0, 1,
2, 3, 0; however, the clock associated with the IPU 20a generates the
following pulse sequence: 0, 1, 2, 3, 4, 5, 0. When pulse of IPU 20a is
generated, the clock associated with the BSM controls 30 is out-of-sync
with the clock associated with the IPU 20a. As will be demonstrated in the
functional description presented in the following paragraphs, the clock
associated with the IPU 20a must be synchronized with the clock associated
with the block circuit 30c of the BSM controls 30 prior to flushing a
desired page of data from the IPU 20a to the BSM.
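The pulse sequences quoted above may be compared tick by tick to show where the clocks disagree. The following is a minimal sketch: the helper `out_of_sync_ticks` is hypothetical, and both the common starting tick and the continuation of each sequence beyond the pulses quoted in the text are assumptions.

```python
def out_of_sync_ticks(reference, other):
    """Return the tick indices at which two pulse sequences disagree."""
    return [t for t, (r, o) in enumerate(zip(reference, other)) if r != o]

# Pulse number generated at each tick. The leading pulses are quoted from
# the text; the remaining pulses and the shared start are assumptions.
bsm_controls = [0, 1, 2, 3, 0, 1, 2, 3]
ipu_10a = [0, 1, 2, 3, 0, 1, 2, 3]
ipu_20a = [0, 1, 2, 3, 4, 5, 0, 1]

print(out_of_sync_ticks(bsm_controls, ipu_10a))  # []
print(out_of_sync_ticks(bsm_controls, ipu_20a))  # [4, 5, 6, 7]
```

Under these assumptions the longer cycle of IPU 20a leaves its pulse numbering offset from that of the BSM controls from the fifth tick onward, which is the condition the sync circuit is intended to correct before a flush.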
Referring to FIG. 13a, an out-of-sync situation is illustrated wherein the
clock associated with one processor (e.g., processor 10) is out-of-sync
with the clock associated with the BSM controls 30. Note that pulse zero
associated with the Proc 0 clock is out of sync with pulse zero associated
with the BSM controls clock.
Referring to FIG. 13b, an in-sync situation is illustrated wherein the
clock associated with the one processor is in-sync with the clock
associated with the BSM controls 30. Pulse zero associated with the Proc 0
clock is in sync with the pulse zero associated with the BSM controls
clock. The FIG. 13b situation represents an in-sync condition because,
when pulse 0 of the processor 10 clock energizes IPU 10a, releasing data
from cache 10b for storage in BSM 15, pulse 2 of the BSM controls clock
will energize the BSM 15 at the precise point in time (two pulse periods
from initiation of pulse 0) in order to accept the released data.
The functional operation of the present invention will be described in the
following paragraphs with reference to FIGS. 1 through 13b of the
drawings.
Referring to FIG. 1, a multiprocessor system is illustrated. In FIG. 1,
when processor 10 searches for data in its cache and fails to locate the
data, it searches for the data in the cache of processor 20. If it locates
the data in the cache of processor 20, the data is either directly
transferred to the cache of processor 10 or it is transferred to BSM 15
for use by processor 10, depending upon the type of instruction being
executed by processor 10. If the type of instruction being executed by
processor 10 requires that the data in the cache of processor 20 be
transferred to the BSM 15, prior to the transfer (or flush), the clocks of
the processor 20 must be synchronized with the clocks energizing the BSM
15. When the clocks of processor 20 are synchronized with the clocks
energizing the BSM 15, the data is flushed from processor 20 to the BSM
15. Processor 10 may then utilize the data in the execution of its
instruction. In the above sequence of functional events, the transfer or
flush of the data from processor 20 to BSM 15 is blocked temporarily until
the clocks of processor 20 are synchronized with the clocks energizing the
BSM 15. When the above referenced clocks are synchronized, the blocking
function is terminated. When the blocking function is terminated, the
transfer of the data from processor 20 to BSM 15 begins.
However, if processor 10 is executing a further instruction which does not
require a flush of the data from processor 20 to BSM 15, but instead
requires a direct transfer of the data from the cache of processor 20
to the cache of processor 10, the above referenced blocking function is
precluded from occurring; the data is directly transferred
from the cache of processor 20 to the cache of processor 10 and execution
of the further instruction begins.
In the multiprocessor system of FIG. 1, when the original instruction is
being executed by processor 10, requiring a flush operation, the blocking
function occurs thereby blocking the flush operation until the clock
synchronization operation is complete. When the synchronization is
complete, the flush operation begins. When the flush operation is
complete, the "mask" is removed and the original instruction is executed.
However, when the further instruction is being executed by processor 10,
not requiring the flush operation, there is no need for a synchronization
operation. Therefore, there is no need for a blocking function, since the
flush operation is normally blocked in order to permit the clock
synchronization to be completed. As a result, when the execution of the
further instruction is sensed, the blocking function is precluded from
occurring.
Referring to FIG. 2, a more detailed construction of the multiprocessor
system of FIG. 1 is illustrated. In FIG. 2, when IPU 20a executes an
instruction, it may need data stored in BSM 15. Consequently, the data is
retrieved from BSM 15 and stored in cache 20b via line 40 marked "data".
The data is stored in cache 20b because the length of time required to
subsequently withdraw data from cache 20b is much shorter than the length
of time required to subsequently withdraw data from BSM 15. When the
instruction is executed, the data may be modified. The modified data is
re-stored in cache 20b. The original, un-modified data still resides in
the BSM 15.
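The situation just described, in which the cache holds modified data while the BSM still holds the original, may be shown in a few lines. This is an illustrative sketch only; the function `fetch_and_modify` and the dictionary model of cache 20b and BSM 15 are assumptions.

```python
def fetch_and_modify(main_store, cache, key, new_value):
    """Fetch data into the cache, modify it there, and report staleness."""
    cache[key] = main_store[key]  # data retrieved from the BSM into the cache
    cache[key] = new_value        # modified data is re-stored in the cache only
    # True when the main store no longer matches the cached copy.
    return cache[key] != main_store[key]

bsm_15 = {"page": "original"}
cache_20b = {}
stale = fetch_and_modify(bsm_15, cache_20b, "page", "modified")
```

After the call, `cache_20b["page"]` is `"modified"` while `bsm_15["page"]` is still `"original"`, so a processor reading the BSM would not see the modification until the data is flushed.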
Assume that processor 10 must execute an instruction which requires the
utilization of the modified data stored in cache 20b of processor 20.
Further assume that the instruction to be executed by processor 10 is a
special type of instruction wherein data must be retrieved from the main
st