FIELD OF THE INVENTION
The present invention relates to the field of handling read and write
responses to processors coupled to a computer bus. More particularly, this
invention relates to allowing other processors to utilize the computer bus
during the time spent waiting for a response to a read request.
BACKGROUND OF THE INVENTION
A computer processor typically performs read and write operations to both
memory and input/output devices on a frequent basis. A write operation
usually involves transmitting the data to be written along with the
address of the location being written to. Conversely, with a read
operation, after the read command has been issued, the processor and the
bus can sit idle while waiting for the response to the read command to be
forthcoming. Although the processor may be allocated to another task in
the meantime, the bus can sit idle, unable to transmit another command or
other data, until the read command has been responded to and the bus is
free to accept another command. If the device being read is
relatively fast, then the delay may only be for a short time and the
performance degradation may be acceptable. However, if the device being
read is relatively slow, then the bus may be sitting idle for a
considerable, and unacceptable, period of time.
One prior approach to improving bus utilization on read commands is to
limit the types of devices which can be read. If only a global memory may
be read then, because there is no mechanical delay and because the global
memory is most likely directly connected to the bus, the bus is not idle
for extremely long periods of time. In this way, the bus would have a
shorter average idle time. However, the bus is still idle when a read
command is outstanding and other read commands cannot be issued during
this idle period.
Another prior approach to improving bus utilization on read commands is to
improve the speed of the bus itself. In this way, when the read response
is ready, it will be transmitted that much faster and thus free up the bus
that much sooner. Additionally, this would decrease the time the
requesting processor waits for a read response as both the read command
and the read response would be transmitted more quickly. However, merely
improving the bus speed does nothing to eliminate the idle time the bus
experiences while it is waiting for a read response. Thus, additional read
commands would still have to wait for responses to earlier read commands
to be completed before these additional read commands could be issued.
SUMMARY AND OBJECTS OF THE INVENTION
One objective of the present invention is to provide an improved method of
handling read transactions on a computer bus in a multiple processor
environment.
Another objective of the present invention is to provide an improved method
of handling read transactions so as to allow the computer bus to be
utilized for other transactions during the time between the read command
and its associated read response.
Still another objective of the present invention is to provide an improved
method of handling read transactions so as to allow the computer bus to be
utilized for other transactions during the time between read commands and
their associated read response so that relatively fast devices provide
ordered responses and relatively slow devices provide out of order
responses.
Yet another objective of the present invention is to provide a method of
providing ordered and out of order split responses to read commands in a
computer system with a command bus, a data bus, and multiple processors
wherein the command bus and the data bus may be utilized for other
commands while a processor is waiting for a read response after issuing a
read command. When a processor desires to issue a read command, the
processor performs read command steps of gaining access to the command bus
and issuing the read command on the command bus. When a processor desires
to issue a write command, the processor performs write command steps of
gaining access to the command bus, issuing the write command on the
command bus, and issuing write data on the data bus if the data bus is
available and if no other processor is outputting an ordered response
signal. When a processor desires to be able to provide ordered split
responses to read commands, the processor performs queueing steps of
adding one marker to a First-In-First-Out (FIFO) queue of the processor if
a read command acknowledgement signal is transmitted without an
out-of-order read response signal, and removing one marker from the
processor's FIFO queue if an ordered read response signal is transmitted.
When a processor desires to provide an ordered response to a read command
the processor performs ordered read response steps of outputting a read
command acknowledgement signal, marking as owned by the ordered response
processor the last entered marker in the ordered response processor's FIFO
queue, and if the ordered response processor is ready to respond to the
read command with an ordered response and if the ordered response
processor's owned marker is at the head of the ordered response
processor's FIFO queue and if no other processor is outputting an ordered
response signal then the ordered response processor outputs an ordered
response signal indicating readiness to provide an ordered response and
transmits data on the data bus if the data bus is available. When a
processor desires to provide an out-of-order response to a read command
the processor performs out-of-order read response steps of outputting a
read command acknowledgement signal, outputting an out-of-order read
response signal, and if the out-of-order response processor is ready to
respond to the read command then the out-of-order response processor gains
access to the command bus and transmits data on the data bus if the data
bus is available and if no other processor is outputting an ordered
response signal.
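The marker bookkeeping described above can be illustrated with a minimal Python sketch. It assumes that every module on the bus observes the same acknowledgement and response signals. The class name `OrderedResponseTracker`, its method names, and the default queue capacity of 8 are hypothetical and chosen only for illustration; they do not appear in the specification.

```python
from collections import deque


class OrderedResponseTracker:
    """One module's FIFO of markers for outstanding ordered read responses.

    Hypothetical illustration: all names and the capacity are assumptions.
    """

    def __init__(self, capacity=8):
        self.capacity = capacity
        # One marker per acknowledged ordered read; True means "owned by me".
        self.markers = deque()

    def is_full(self):
        # When full, a module would signal the requester to repeat its read.
        return len(self.markers) >= self.capacity

    def observe_read_ack(self, out_of_order, i_am_responder=False):
        """A read command acknowledgement was seen on the bus."""
        if out_of_order:
            return  # out-of-order responses never enter the FIFO
        self.markers.append(i_am_responder)

    def may_respond(self, other_ordered_response_active=False):
        """True when this module's owned marker is at the head of the FIFO."""
        return (
            bool(self.markers)
            and self.markers[0]
            and not other_ordered_response_active
        )

    def observe_ordered_response(self):
        """An ordered read response was seen on the bus: retire one marker."""
        self.markers.popleft()


def demo():
    a, b = OrderedResponseTracker(), OrderedResponseTracker()
    # Read 1 is acknowledged by module a, read 2 by module b (both ordered).
    for tracker, mine in ((a, True), (b, False)):
        tracker.observe_read_ack(out_of_order=False, i_am_responder=mine)
    for tracker, mine in ((a, False), (b, True)):
        tracker.observe_read_ack(out_of_order=False, i_am_responder=mine)
    # Read 3 is acknowledged out of order and is therefore not queued.
    for tracker in (a, b):
        tracker.observe_read_ack(out_of_order=True)
    before = (a.may_respond(), b.may_respond())
    for tracker in (a, b):
        tracker.observe_ordered_response()  # module a delivers read 1
    after = (a.may_respond(), b.may_respond())
    return before, after
```

In this sketch module a may respond first because its owned marker is at the head of the queue. Module b becomes eligible only after the first ordered response has been observed by everyone.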
More specifically, the read command issuing processor of the present
invention repeats the read command steps if it receives no acknowledgement
signal from another processor indicating that the command is being handled.
More specifically, the ordered read response processor in the present
invention outputs a signal indicating when the ordered read response
processor's FIFO queue is full, and the read command issuing processor
repeats the read command steps if it sees a full FIFO queue signal.
Even more specifically, the processor desiring to provide an ordered
response to a read command derives the identity of the read command
issuing processor from a combination of the read command and the process
by which the read command issuing processor gained access to the command
bus.
Still more specifically, the processor desiring to provide an out-of-order
response to a read command derives the identity of the read command
issuing processor from a combination of the read command and the process
by which the read command issuing processor gained access to the command
bus.
Other objects, features, and advantages of the present invention will be
apparent from the accompanying drawings and from the detailed description
which follows below.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and not limitation
in the figures of the accompanying drawings, in which like references
indicate similar elements, and in which:
FIG. 1 depicts a prior art multi-tasking, time-sharing architecture;
FIG. 2 depicts a prior art tightly-coupled multi-processing architecture;
FIG. 3 depicts a prior art loosely-coupled multiprocessing/functionally
partitioned architecture;
FIG. 4 depicts the architecture of the present invention;
FIG. 5 depicts a central arbitrator architecture of the prior art;
FIG. 6 depicts a functional node of the present invention;
FIG. 7 is a timing diagram of prior art arbitration as compared with the
arbitration of the present invention;
FIG. 8 depicts an arbitration state diagram;
FIG. 9 is a timing diagram of the new arbitration group formation;
FIG. 10 depicts a card and a slot connector to show the arbitration signal
lines as well as a table to illustrate the rotation of the arbitration
signals lines from one slot to the next;
FIG. 11 is a timing diagram of two write operations;
FIG. 12 is a more detailed timing diagram of two consecutive write
operations;
FIG. 13 is a timing diagram of a prior art read operation;
FIG. 14 is a timing diagram of a read operation and a write operation;
FIG. 15 depicts a cache coherency state diagram;
FIG. 16 is a flowchart of the steps taken when receiving an Interrupt
Processor Request Command;
FIG. 17 is a further flowchart of the steps taken when receiving an
Interrupt Processor Request Command;
FIG. 18 depicts the correct only mode of the ECC circuitry of the prior
art;
FIG. 19 depicts a processor card (containing up to four processor/cache
modules) with its associated bus controller and the bus interface;
FIG. 20 depicts the detect and correct mode of the ECC circuitry;
FIG. 21 is a timing chart of the prior art error detection and correction
as compared to the bus stretching protocol of the present invention;
FIG. 22 is a logic diagram of the arbitration priority determination and
resolution circuitry for the third alternative embodiment of the present
invention;
FIG. 23 depicts the backplane configuration with slot marker for the third
alternative embodiment of the present invention;
FIGS. 24 and 25 are flow charts of the split data transactions of the
present invention, wherein FIG. 24 shows the procedure with respect to the
requester and FIG. 25 shows the procedure with respect to the responder;
FIG. 26 is a flow chart of a write operation with respect to the initiator;
FIG. 27 is a flow chart of a bus stretching operation with respect to the
data sender;
FIG. 28 is a flow chart of the bus stretching operation with respect to the
data receiver.
DETAILED DESCRIPTION
In the early days of data processing, when computers were large enough to
fill a room, the standard processing environment consisted of a single
processor running a single job or task. This single task had complete
control over all available memory and input/output (I/O) devices and there
was no concern about contention for memory or I/O. Then, as processor
speed increased, the standard environment changed.
Referring now to FIG. 1, a prior art multi-tasking time-sharing
architecture can be seen. Using a higher performance processor 3, each
task could then receive a mere slice or portion of the available time 9 on
the processor 3 and, because the processor 3 could quickly switch from
running one task to running another task, each task would think that it
was getting all of the processor's time 9. Further, each task would think
that it had access to all the available memory 1 and I/O 5. However, if
one task wanted to communicate with another task, because they weren't
actually running at the same time they couldn't directly communicate.
Instead, an area 2 of the global shared memory 1 was usually reserved for
task-to-task communication which thus occurred across bus 7.
As individual tasks continued to grow in size and complexity, even the fast
time-sharing processor 3 was unable to juggle quickly enough to keep all
the tasks running. This led to new types of processing architectures in
which multiple processors 3 were used to handle the multitude of tasks
waiting to be run.
With reference to FIG. 2, a prior art architecture implemented to handle
multiple tasks with multiple processors can be seen. This multiple
processor multiple task architecture merely replaced the single large
processor with multiple smaller processors 3 connected by a common bus 7
which also connected the processors 3 to the global memory 1 and I/O
resources 5. This is known as Symmetric/Shared Memory Multi-Processing
(SMP) because the multiple processors 3 all share the same memory 1 (hence
the name global memory), which thus makes the interconnecting bus 7
predominantly a memory oriented bus. This is also known as a
tightly-coupled multi-processing architecture because the individual
processors 3 are carefully monitored by an overseeing multiprocessing
operating system which schedules the various tasks being run, handles the
communications between the tasks running on the multiple processors 3, and
synchronizes the accesses by the multiple processors 3 to both the global
shared memory 1 and the I/O resources 5 to thus avoid collisions, stale
data, and system crashes. Of course, having multiple processors 3
attempting to access the same global memory 1 and I/O 5 can create
bottlenecks on the bus 7 interconnecting them.
With reference to FIG. 3, an alternative prior art architecture implemented
to handle multiple tasks with multiple processors, generally known as
either loosely-coupled multi-processing or functional partitioning, can be
seen. Rather than have increasingly large individual tasks time-shared on
a single large computer or have individual tasks running on separate
processors all vying for the same global resources, in a functionally
partitioned environment individual tasks are separately run on functional
nodes 10, each consisting of a separate processor 3 with its own local
memory resources 11 and I/O capabilities 5 all of which are connected by a
predominantly message passing bus 7. This is also known as loosely-coupled
architecture because processing can be done within each node 10, including
accesses to local memory 11, without concern about activity in other
functional nodes 10. In other words, each functional node 10 is
essentially a separate processing environment which is running a separate
task or job and as such is not concerned about memory 11 conflicts or I/O
5 or bus 7 collisions because no other processor 3 is operating within
that particular node's 10 separate environment.
A further refinement of the loosely-coupled multi-processing environment,
which is particularly useful when a given task is too large for the single
processor 3 used in a functional node 10 of FIG. 3, is replacement of the
functional node's 10 single processor 3 with a bank of processors 3, as
can be seen in FIG. 4. This bank of processors 3, connected by a
predominantly memory oriented bus 15, shares the functional node's 10
local (yet still global in nature) memory 11. In this way, more processing
power can be allocated to the task and the functional node is still fairly
autonomous from (only loosely-coupled with) other functional nodes.
Further, because the processors 3 within each functional node 10 are
generally limited to a functional task, bus 15 contention by processors 3
running other functional tasks in other functional nodes 10 is generally
eliminated within each node 10.
Therefore, taking a functionally partitioned processing environment
(wherein a multi-tasking/timesharing environment is broken down into
functional processing partitions or nodes 10) and replacing a functional
node's processor 3 with a bank of homogeneous processors 3 to create
multiprocessing nodes 10 can provide greater processing power for that
functional node's 10 given task(s). Additionally, the "plug compatibility"
of functional partitioning (wherein each node 10 need merely concern
itself with the interface protocols and can function independently of the
format or structure of other nodes 10) can be retained while eliminating
the need for highly customized architectures within each functional node
10.
Of course, supporting multiple processors 3 requires, in addition to a
multiprocessing operating system, the ability to determine when any given
processor 3 will be able to gain access to the bus 15 connecting them in
order to access the shared memory 11 and/or I/O resources. In the
preferred embodiment of the present invention, this ability to arbitrate
between competing processors' 3 accesses to the bus 15 is implemented
through a fully distributed scheme, rather than having a centralized
arbitrator (which would require additional dedicated logic) as is known in
the prior art.
Additionally, as with all processors, an interrupt scheme is necessary to
control events other than normal branches within an executing program.
Having multiple processors 3 within a node 10 requires the further
capability of processor-to-processor interrupts so that one processor 3
can interrupt another processor 3 in order to request it to handle some
task, event, or problem situation. In the preferred embodiment, the
interrupt scheme is implemented as part of the regular command set and is
supported as a normal bus transaction.
Finally, multiple processors 3 reading and writing to a local/global shared
memory 11 can cause wasted cycles on the bus 15 connecting them unless the
"dead time" spent waiting for a response on a read operation is used
by/for another processor 3. A split transaction scheme is thus implemented
in the preferred embodiment of the present invention using both ordered
and out of order responses, depending upon the usual or expected response
time of the data storage medium holding the desired data, thus allowing
another processor 3 to utilize the dead time of a typical read operation.
The preferred embodiment of the bus 15 of the present invention provides a
high bandwidth, low latency pathway between multiple processors 3 and a
global shared memory 11. The pathway 15 handles the movement of
instructions and data blocks between the shared memory 11 and the cluster
of processors 3 as well as processor-to-processor interrupt
communications.
Three types of modules are supported on the bus 15: processor modules 3,
I/O modules, and memory modules.
1) Processor Modules
Processor modules 3 can be further broken down into two classes: General
Purpose Processors (GPP's) and I/O Processors (IOP's), both of which are
write back cache based system processing resources.
The GPP class of processors runs the operating system, provides the
computational resources, and manages the system resources. The GPP devices
are homogeneous (of the same general type or device family) with any task
being capable of execution on any of the GPP devices. This allows a single
copy of the operating system (OS) to be shared and run on any of the
GPP's.
The IOP class of processors provides an intelligent link between standard
I/O devices and the cluster of GPP's. The IOP can be any type of processor
interfacing with any type of I/O bus or I/O device. External accesses to
the cluster of computational GPP resources occur via an IOP. Any IOP in
the system can be used to boot an operating system and the boot IOP can be
changed between subsequent boot operations.
2) I/O Modules
I/O modules connect to other buses and thus provide a window into I/O
resources attached to those other buses using either I/O space or memory
mapped I/O. I/O boards are thus slaves to this bus environment and merely
provide a link from this bus to I/O boards on other buses. Windows of
memory and I/O space are mapped out of the bus address space for
accesses to these other buses. The I/O boards do not, however, provide a
direct window from other buses due to cache coherency and performance
considerations. Thus, when another bus wishes to access this bus, an IOP
is required.
3) Memory Modules
Memory modules connect to the bus to provide high bandwidth low latency
access to a global shared memory resource available to all the processors
on the bus. The shared memory provides the basis for task sharing between
the processors using semaphores, as well as passing data structures using
pointers. Local memory may exist on a GPP or IOP module, but it must be
private and neither visible to nor shared with other agents on the bus.
Both processor 3 module types, GPP's and IOP's, may contain a cache memory
facility. The implemented cache protocol supports write back or write
through caching in separate address spaces. All processors 3 connected to
the bus 15 should support cache block transfers in the write back cache
data space to perform an intervention operation, as is discussed in more
detail below.
Arbitration, address/command signals and data communication are overlapped
on the bus 15. In the preferred embodiment, the bus 15 has the capability
of splitting read transactions into two parts to avoid wasted bus 15
cycles while waiting for a read response. The first part of the split read
transaction is the read request and the second part is the resulting read
response. This split read transaction sequence allows overlapping the
delayed access time of memory 11 accesses (and other relatively lengthy
read responses) with the use of the bus 15 by others. Additionally, read
responses may occur either "in-order" or "out-of-order" with respect to
their associated requests. The in-order mechanism is optimal for
deterministic accesses such as memory 11 reads while the out-of-order
mechanism accommodates slower responders (such as bus bridges and remote
memory or I/O) while still maintaining high bus 15 utilization/effective
bandwidth.
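As a rough illustration of why slow responders are better served out of order, the following Python sketch models the cycle at which each read response becomes deliverable. It is a simplification under stated assumptions: ordered responses are spaced at least one cycle apart, contention for the data bus is otherwise ignored, and the function name `completion_times` and all latency values are hypothetical.

```python
def completion_times(requests):
    """Return the cycle at which each read response becomes deliverable.

    requests: list of (issue_cycle, latency, ordered) tuples in issue order.
    Ordered responses must return in request order; out-of-order responses
    return as soon as the responder is ready (an idealized assumption).
    """
    result = []
    last_ordered = None  # completion cycle of the previous ordered response
    for issue_cycle, latency, ordered in requests:
        ready = issue_cycle + latency
        if ordered:
            if last_ordered is not None:
                # An ordered response cannot overtake an earlier ordered one.
                ready = max(ready, last_ordered + 1)
            last_ordered = ready
        result.append(ready)
    return result


# A slow bridge read (20 cycles) followed by two memory reads (3 cycles each).
ALL_ORDERED = [(0, 20, True), (1, 3, True), (2, 3, True)]
SLOW_OUT_OF_ORDER = [(0, 20, False), (1, 3, True), (2, 3, True)]
```

With these assumed numbers, forcing the slow read into the ordered stream delays both memory reads until cycle 21 and 22, whereas marking it out of order lets them complete at cycles 4 and 5.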
Interprocessor interrupts over the bus 15 provide an urgent communication
mechanism. Interrupts can be individually directed or widely broadcast to
any or all processors 3 or processor 3 classes in the preferred
embodiment. The communication of interrupts occurs by cycle stealing
available command cycles on the address bus (part of bus 15) thus avoiding
any impact on the performance of data transfers.
Bus 15 initialization support includes test (interrupt and restart),
configuration, and bootstrap capabilities. Each module on the bus 15 can
be tested either automatically or by operator direction, including fault
detection and isolation. Modules contain information regarding their
capabilities and allow for system configuration options.
A boot strap processing (BSP) function should exist on the bus 15 to
perform the configuration operation and provide the boot of the operating
system to the bus 15. The BSP may be a subset of the capabilities of an
IOP 3 which could thus perform a BSP on the cluster of processors 3 on the
bus 15 via a processor from a connected bus.
The bus 15 architecture is thus a high bandwidth, cache coherent memory bus
and in the preferred embodiment of the present invention it is implemented
with a synchronous backplane transfer protocol that runs off a radially
distributed clock. Information on the bus 15 is transferred between boards
on the clock edges with the maximum clock rate dependent on delay from
clock to information, bus settling time and receiving latch setup time.
One of the factors that can limit the speed of backplane transfer is the
electrical length. A 10 slot backplane is used in the preferred embodiment
to minimize the bus 15 electrical length and thus maintain a high
backplane transfer speed capability. Thus, only high performance modules
should be permitted direct access to the backplane. The architecture
permits multiple processor 3 modules to exist on a single board which is
achievable using VLSI solutions for caching and bus 15 interfacing.
The protocol has been defined to maximize the percentage of useful
bandwidth as compared to raw bandwidth. This is accomplished through quick
arbitration, demultiplexed address and data paths (on bus 15) and split
transfers among other features.
Referring now to FIG. 5, a multiprocessing node using an SMP configuration
can be seen. In this particular configuration, a central arbitrator 13 is
shown whereby any processor 3 or Input/Output (IO) processor 5 wishing to
access memory 11 must first post a request to the central arbitrator 13 on
bus 15 in order to do so. The central arbitrator's 13 job is to determine
which processor gets access to the local memory 11 in what order. However,
use of a central arbitrator 13, while handling contention at memory 11,
requires additional logic to handle the specialized arbitration function.
This specialized logic can exist as a separate card connected to the bus
15 but this would take up a card space on bus 15. An alternative would be
to make arbitrator 13 a portion of the bus 15 logic itself. This, however,
makes the bus implementation more complicated and can also make diagnosis
and repair of arbitration problems more difficult the more the central
arbitrator 13 is integrated into bus 15.
Referring now to FIG. 6, memory 11, processors 3 and IO processor 5 connect
to bus 15, in the preferred embodiment of the present invention, where a
distributed arbitration scheme is used instead of a central arbitrator.
This distributed arbitration scheme allows an individual processor,
contending with other processors, to access the local memory 11 by having
each processor 3 and IO processor 5 separately and individually handle the
arbitration process. This distributed arbitration scheme, although
requiring additional logic in each processor card, distributes this
function, thereby simplifying the bus 15 implementation and avoiding the
need to use an additional slot to handle the arbitration requirements.
A further advantage of the distributed arbitration approach is the
reduction of bus traffic between processors and a central arbitrator. By
merely having those processors who wish to access memory 11 contend for
that access by handling their own arbitration in a distributed manner, the
only additional bus traffic for arbitration is that between processors.
This distributed arbitration scheme thus eliminates contention for memory
11 as well as contention for a central arbitrator. The implementation of
this distributed arbitration scheme is explained more fully below.
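The idea that each processor can resolve arbitration on its own, without a central arbitrator, can be sketched in a few lines of Python. The sketch assumes that every processor latches the same set of request signals and applies the same priority rule. The rule used here (lowest identifier wins) and the function names are hypothetical stand-ins, not the priority scheme of the preferred embodiment.

```python
def resolves_as_winner(my_id, latched_requests):
    """Decide locally whether this processor won the arbitration.

    latched_requests: set of processor ids whose request signal was latched.
    Assumed priority rule: the lowest id among the requesters wins.
    """
    return my_id in latched_requests and my_id == min(latched_requests)


def local_decisions(latched_requests, all_ids):
    """Run the same local resolution independently on every processor."""
    return {pid: resolves_as_winner(pid, latched_requests) for pid in all_ids}
```

Because all processors evaluate the same function on the same latched inputs, they reach the same conclusion and exactly one requester proceeds, with no request ever sent to a shared arbitrator.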
Referring now to FIG. 7, timing charts showing various bus implementations
can be seen. In the first timing signal, a combined bus which handles all
arbitration and any addressing or command signals is shown. This timing
signal, depicting a combined bus, shows the sequence of a first processor,
wishing to access the bus/memory, issuing command 1 in the next cycle
after arbitration 1. Next, a second processor, also wishing to access the
bus/memory, issues a command after arbitrating for access, followed by a
third, etc.
While this appears to make most efficient use of the bus as there are no
idle periods in this sequence, this serial bus arbitration command
sequence is not the most efficient methodology. This can be seen by
comparing the first time line to the second and third time lines. The
second and third time lines represent a separate arbitration bus 17 from
an address/command bus 19. In this sequence, in the next cycle after
arbitrating on the arbitration bus 17, a command may be issued on the
address/command bus 19. This is followed by a second arbitration on the
arbitration bus 17 and its associated command on the address/command bus
19.
However, merely splitting the bus into an arbitration bus 17 and an
address/command bus 19 not only creates idle times in each of the
respective buses, it also does not improve performance. Comparing each of
these two methodologies to the methodology of the present invention, which
is shown by the 4th and 5th timing signals, an improved more efficient
methodology can be seen. The 4th timing signal represents the arbitration
bus 17 and the 5th timing signal represents a combination address/command
bus 19 (which are part of the bus 15 of FIGS. 4-6). The difference is that
the arbitration bus 17 and the address/command bus 19 are now overlapped
or pipelined, as opposed to a straight sequential process as was shown by
the 2nd and 3rd timing signals. In this sequence, after the first
arbitration has begun and a processor has won that arbitration, that
processor may immediately issue a command on the address/command bus. Once
it is known that a command is about to complete, the next arbitration
cycle can begin thus allowing the next arbitration winner to issue a
command immediately following the completion of the first arbitration
winner's command, and so on vis-a-vis the third arbitration, etc. Please
note that in these timing diagrams a single cycle arbitration and a single
cycle command have been shown. In the preferred embodiment of the present
invention, arbitration is a two cycle process and command issuance
requires one or more cycles depending upon such things as the complexity
of the command being issued.
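The benefit of overlapping arbitration with command issuance can be estimated with a simple cycle count. The Python sketch below is a model under assumptions rather than a description of the actual bus timing: it treats the arbitration bus as a single resource that handles one arbitration at a time, and the function names are hypothetical.

```python
def sequential_cycles(command_lengths, arb_cycles=1):
    """Total cycles when each arbitration and its command run back to back."""
    return sum(arb_cycles + length for length in command_lengths)


def pipelined_cycles(command_lengths, arb_cycles=1):
    """Total cycles when the next arbitration overlaps the current command.

    Assumption: an arbitration cannot start before the previous one ends,
    so a command shorter than the arbitration leaves the command bus waiting.
    """
    if not command_lengths:
        return 0
    start = arb_cycles  # the first command must wait for the first arbitration
    for length in command_lengths[:-1]:
        # The next command starts once both the current command and the
        # overlapped arbitration have finished.
        start += max(length, arb_cycles)
    return start + command_lengths[-1]
```

For three single cycle commands with single cycle arbitration, as drawn in the timing diagrams, the sequential model takes 6 cycles and the pipelined model takes 4.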
Referring now to FIG. 8, the arbitration process is shown by a state
diagram. The arbitration process involves the resolution of contention for
bus access among the various processors in an arbitration group. Beginning
with an idle condition, state 27, whenever one or more processors wish to
access memory each of those processors raises its own arbitration signal.
Every processor, in state 21, then inputs all other processors'
arbitration signals thus forming an arbitration group by inputting and
latching all of the arbitration signals. Following the latch state 21,
each of the respective processors in the arbitration group compares its
priority to all the other processors in the arbitration group. This
comparison is done in the resolution state 23. The winner of this
resolution of the arbitration group is the processor which next issues a
command and/or accesses the memory. If this access command sequence is to
take more than one cycle then the wait state 25 is entered. And if this
was the last processor in the group to win the arbitration resolution and
gain access, then the bus would return to an idle state once the last
processor completes its operation. However, if there are more processors
remaining in the arbitration group, then following the resolution or wait
states all processors return to the latch state 21 and again input all
arbitration signals from the remaining processors in the arbitration
group. Note that the processors from the group who have already had access
are no longer part of the arbitration group and they would no longer be
outputting an arbitration signal to request access. The cycle is then
repeated through the resolution state, and possibly the wait state, until
there are no more processors remaining in the arbitration group. If other
processors wish to access memory while the first arbitration group is
arbitrating for access, these other processors (and those processors of
that earlier arbitration group who have already had access) must wait
until the last processor in the present group has completed the resolution
state before they can form a new group. This avoids wasting a cycle going
through the idle state.
One of the advantages of the present invention is that of providing
fairness (equal access over time) to all potential bus contenders.
Although one processor may get slightly faster access in a given
arbitration group due to that processor's arbitration priority, by using
the arbitration group approach each processor is assured of fairness. On
the average, a request would only have to wait for the duration of one
group resolution time before being granted bus access. In this way, no
processor should have to endure starvation.
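The group-based arbitration and its fairness property can be sketched as follows. This Python example is illustrative only: it assumes that a lower processor identifier means higher priority within a group, and the function name `grant_order` and its parameters are hypothetical.

```python
def grant_order(first_group, later_requests=()):
    """Return the order in which processors are granted the bus.

    first_group: ids asserting a request when the group is latched.
    later_requests: ids that raise a request while that group is being
        served; they must wait and then form the next group.
    Assumed priority rule: the lowest id in the current group wins.
    """
    order = []
    group = set(first_group)
    waiting = set(later_requests)
    while group:
        winner = min(group)    # resolution state: pick the group's winner
        order.append(winner)
        group.discard(winner)  # the winner drops its arbitration signal
        if not group:
            # The last member has resolved, so a new group may now form.
            group, waiting = waiting, set()
    return order
```

In this model a high priority processor that arrives late, such as processor 1 in `grant_order({2, 3}, {1})`, cannot overtake the members of the group already formed, which is what prevents starvation of the lower priority members.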
Referring now to FIG. 9, a timing diagram is shown which represents four
different processors all potentially contending for bus and/or memory
access. Each of the signal lines 1 through 4 represents one of the
processors. The 5th signal line is a logical representation of when the
last processor in an arbitration group is resolving its arbitration
(LastSlot). The 6th line is the address/command bus 19 whereby commands
are issued following arbitration.
Reviewing the arbitration sequence shows signal lines 1 and 3 raised in the
first cycle. This means that processor 1 and processor 3 both wish to
access the bus and/or memory and have indicated this desire by outputting
an arbitration signal. All processors then input and latch these
arbitration signals and all of those within the group, namely 1 and 3, go
through the resolution state to determine who gets access. In this case
processor 1, having a higher priority than processor 3, first gains access
through the arbitration resolution and issues a command on the
address/command bus 19. While this command is being issued, due to the
pipelined operation of this arbitration scheme, processor 3 and all the
other processors input and latch all of the arbitration signals.
Additionally, as is discussed below with reference to the interrupt
protocol, the last arbitration winner is saved by each processor.
Now, because it is only processor 3 who is outputting an arbitration
signal, processor 3 wins the following arbitration resolution and can then
issue a command in the following cycle. Simultaneously with this
resolution, due to LastSlot being logically raised as is shown by the 5th
timing signal, any other processor wishing to form an arbitration group
may then raise its arbitration signal in the next clock cycle. However,
please note that LastSlot is merely a signal generated internally by each
agent and is not actually a separate signal line as is depicted in FIG. 9.
In this example, processor 2 and processor 4, who now wish to access
memory, raise their arbitration signals and go through a latch input,
latch resolution arbitration cycle in order to issue their commands or
access memory.
Note that LastSlot is merely the condition whereby the last processor in an
arbitration group is latching in and resolving its own arbitration signal.
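LastSlot equals the logical sum of the terms of the following equation, in
which an overbar denotes an arbitration signal that is not asserted:
$$\text{LastSlot} = (1 \cdot \overline{2} \cdot \overline{3} \cdot \overline{4}) + (\overline{1} \cdot 2 \cdot \overline{3} \cdot \overline{4}) + (\overline{1} \cdot \overline{2} \cdot 3 \cdot \overline{4}) + (\overline{1} \cdot \overline{2} \cdot \overline{3} \cdot 4)$$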
Stated differently, new group formation may occur if either the arbitration
bus is idle, or the arbitration sequence is in a resolution or wait state
and the LastSlot condition exists.
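As a minimal sketch of this condition, and assuming (consistent with the description above) that LastSlot holds exactly when one arbitration signal in the group remains asserted, each agent could evaluate the following. The function names and the string labels for the states are hypothetical.

```python
from typing import Sequence


def last_slot(signals: Sequence[bool]) -> bool:
    """True when exactly one latched arbitration signal is still asserted.

    Assumption: this matches the LastSlot condition, i.e. the last
    processor of the group is resolving its own arbitration signal.
    """
    return sum(1 for s in signals if s) == 1


def may_form_new_group(state: str, signals: Sequence[bool]) -> bool:
    """New group formation is allowed when the arbitration bus is idle, or
    when the sequence is in a resolution or wait state and LastSlot holds."""
    if state == "idle":
        return True
    return state in ("resolution", "wait") and last_slot(signals)


if __name__ == "__main__":
    # Only processor 3 is still arbitrating, so a new group may form.
    print(may_form_new_group("resolution", [False, False, True, False]))
    # Processors 1 and 3 are both still arbitrating, so it may not.
    print(may_form_new_group("resolution", [True, False, True, False]))
```

Because LastSlot is computed locally from the latched arbitration signals, no additional signal line is needed to distribute it.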
In the preferred embodiment of the present invention, to support the split
transaction read response protocol as is discussed in more detail below,
there are additional considerations before an arbitration winner can issue
a command on the address/command bus 19. Once a processor has won in
arbitration, if the processor is trying to issue a command which does not
require access to the data bus (a separate bus to handle data transfers,
as is discussed below with reference to the split transaction protocol),
then the processor is free to issue the command. However, if the processor
is trying to issue a command which does require access to the data bus
then the processor must take the additional step, beyond arbitration, of
ensuring that the data bus is available. This extra step, really
comprising two checks, is first to check whether the Data Strobe
(DS) signal is asserted (thus indicating that the data bus is busy
handling a prior transaction), and second to check whether the
Ordered (ORD) response signal line is asserted (thus indicating that an
ordered split read response is pending). See the discussion below for when
and how the ORD signal gets asserted. Once both the DS signal and the ORD
signal are not asserted, the processor is free to issue a command on the
address/command bus 19 and, after asserting the DS signal line, can then
transmit data on the data bus.
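The decision just described can be summarized, for illustration only, by the following sketch. The class `BusState` and the function `may_issue_command` are hypothetical names, and the sketch models only the decision itself, not the timing of the DS and ORD signal lines.

```python
from dataclasses import dataclass


@dataclass
class BusState:
    ds_asserted: bool   # Data Strobe: data bus busy with a prior transaction
    ord_asserted: bool  # Ordered: an ordered split read response is pending


def may_issue_command(won_arbitration: bool,
                      needs_data_bus: bool,
                      bus: BusState) -> bool:
    """Return True when the arbitration winner may issue its command."""
    if not won_arbitration:
        return False
    if not needs_data_bus:
        # Commands that do not use the data bus may issue immediately.
        return True
    # Otherwise both DS and ORD must be deasserted.
    return not bus.ds_asserted and not bus.ord_asserted


if __name__ == "__main__":
    busy = BusState(ds_asserted=True, ord_asserted=False)
    print(may_issue_command(True, False, busy))  # command without data transfer
    print(may_issue_command(True, True, busy))   # command needing the data bus
```

A winner whose command needs the data bus would re-evaluate this check until both signals are deasserted, and would then assert DS before transmitting data.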
Referring now to FIG. 10, in the preferred embodiment of the present
invention multiple processors exist, each residing on a card which fits
into a slot in the back plane which makes up the bus of the system. In
order for each processor to be able to resolve arbitration, it must be
able to discern every other processor in its arbitration group. The way
this is handled in the preferred embodiment of
the present invention is for each processor to output a distinct
arbitration signal when it wishes to arbitrate to gain access to the bus
and/or memory.
To arbitrate for bus access, each processor asserts the first signal line
on the card and inputs the remaining arbitration signal lines. Then,
in one embodiment of the present invention, because the back plane
connector signal lines are rotated, each card asserts its own
respective signal line across the back plane to all the other cards on the
bus. This is shown in FIG. 10 by having processor card 29 assert signal
line number 0 when it wishes to arbitrate for access, and input lines 1
through 9 to determine which other processors are vying for access.
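The effect of rotating the connector signal lines can be sketched as simple modular arithmetic. The sketch below is an illustration under stated assumptions: ten arbitration lines, slots numbered 0 through 9, and a rotation direction such that card line i in slot s lands on back plane wire (s + i) mod 10. The function names are hypothetical.

```python
NUM_LINES = 10  # ten arbitration signal lines per connector


def backplane_wire(slot: int, card_line: int) -> int:
    """Back plane wire reached by a card-local signal line in a given slot.

    Assumption: each slot's connector is rotated one position relative to
    its neighbor, so card line i in slot s reaches wire (s + i) mod 10.
    """
    return (slot + card_line) % NUM_LINES


def card_line_for(slot: int, wire: int) -> int:
    """Card-local line on which a card in `slot` observes a back plane wire."""
    return (wire - slot) % NUM_LINES


if __name__ == "__main__":
    # Every card drives its own card line 0, yet each slot ends up driving
    # a distinct back plane wire.
    print([backplane_wire(slot, 0) for slot in range(NUM_LINES)])
    # The card in slot 3 observes the request from slot 4 on its card line 1.
    print(card_line_for(3, backplane_wire(4, 0)))
```

Under this assumed mapping, identical cards can be used in every slot, since the slot position alone determines which back plane wire a card drives.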
Processor card 29 plugs into connector 31 of back plane P2. Connector 31
has ten signal lines, each of which is rotated one position as compared to
its neighboring slot connector. Thus, in the table to the right of FIG.
10, processor card slot number 0 has the first signal line connected to
connector 31, wire 0, while processor card slot number 1 has the first
signal line connected to connector 31, wire 1, etc. In this way, each
processor/card can assert the same signal line on the processor/card yet
othe | | |