WikiPatents - Community Patent Review
Multiprocessor computer system with data bus and ordered and out-of-order split data transactions    
United States Patent 5191649
Inventor(s): Cadambi; Sudarshan B. (Beaverton, OR); Guy; Charles B. (Hillsboro, OR); Gray; David R. (Cupertino, CA); Gonzales; Mark A. (Portland, OR)
Abstract: A method of transferring data in response to a read command in a computer system having a plurality of processors coupled to an address bus, a command bus and a data bus is described. A first processor generates and sends the read command to read a first data from a second processor. The second processor then determines whether to respond to the read command with (1) the first data alone or (2) a read response command together with the first data. If the second processor determines to respond with the first data, it acknowledges receipt of the read command and performs an ordered response, in which the command and address buses are released and only the first data is later sent to the first processor via the data bus when available. If the second processor determines to respond with the read response command and the first data, it acknowledges receipt of the read command and performs an out-of-order response, in which access to the command and address buses is first released and then regained by arbitration when the first data is determined to be available in the second processor. The second processor then gains access to the data bus when the data bus is free of any data transaction. The read response command, its address, and the first data are then issued to the first processor.



Owner/Assignee: Intel Corporation (Santa Clara, CA)
Publication Date: March 2, 1993
Application Number: 07/631,892
Filing Date: December 21, 1990
US Classification: 709/225 710/100 711/158 714/15
Int'l Classification: H04L 012/40 G06F 013/42
Examiner: Shaw; Dale M.
Assistant Examiner: Barry; Lance L.
Attorney/Law Firm: Blakely, Sokoloff, Taylor & Zafman
USPTO Field of Search: 395/200 395/325 395/400
References Cited

U.S. References

Patent No.   Inventor       US Class   Date
4281380      DeMesa, III    710/119    Jul. 1981
4232366      Levy           710/121    Nov. 1980
Claims
 


What is claimed is:

1. A method of transferring data in response to a read command in a computer system having an address bus, a command bus, a data bus, and a plurality of processors coupled to the address bus, the command bus, and the data bus, comprising the steps of:

(A) generating from a first processor of the plurality of processors the read command with a first address, wherein the read command requires to read a first data from a second processor of the plurality of processors to the first processor, wherein the first address points to the second processor;

(B) gaining by an arbitration operation an access to the command bus and the address bus from the first processor, wherein the first processor is awarded the access to the command bus and the address bus during the arbitration operation when the first processor has a highest priority among a first number of the plurality of processors that are currently requesting the access to the command bus and the address bus with the first processor;

(C) issuing the read command to the command bus and the first address to the address bus after the first processor has gained the access to the command bus and the address bus, wherein the first address causes the second processor to receive the read command;

(D) receiving the read command in the second processor;

(E) determining with which one of (1) the first data and (2) a read response command and the first data that the second processor desires to respond to the read command;

(F) if the second processor determines to respond to the read command with the first data, then

(i) acknowledging receipt of the read command in the second processor to the first processor by sending a first acknowledgement signal from the second processor to the first processor via the command bus, wherein the first acknowledgement signal indicates to the first processor that the second processor desires to respond to the read command with the first data;

(ii) adding an own marker to a first-in-first-out (FIFO) queue, wherein the own marker contains information that indicates that the second processor will send the first data to the first processor via the data bus in response to the read command;

(iii) releasing the access to the command bus and the address bus by the first processor;

(iv) determining when the first data is available in the second processor;

(v) when the first data is determined to be available in the second processor, then determining when the own marker becomes first in the FIFO queue;

(vi) moving the own marker in the FIFO queue by removing a marker which is currently first in the FIFO queue from the FIFO queue when a data transaction associated with the marker is complete such that the own marker can become first in the FIFO queue, wherein the own marker moves in the FIFO queue in a first-in-first-out order;

(vii) when the own marker is determined to be first in the FIFO queue, then determining if the data bus is free of any data transaction, wherein the step (F) (vii) is repeated if there is a data transaction on the data bus;

(viii) if there is determined to be no data transaction on the data bus, then transferring the first data from the second processor to the first processor by gaining an access to the data bus and issuing the first data to the data bus, wherein the first data is directed to the first processor by the information contained in the marker;

(ix) removing the own marker from the FIFO queue;

(G) if the second processor determines to respond to the read command with the read response command and the first data, then

(i) acknowledging receipt of the read command in the second processor to the first processor by sending a second acknowledgement signal from the second processor to the first processor via the command bus, wherein the second acknowledgement signal indicates to the first processor that the second processor desires to respond to the read command with the read response command and the first data;

(ii) releasing the access to the command bus and the address bus by the first processor;

(iii) determining when the first data is available in the second processor;

(iv) when the first data is determined to be available in the second processor, then gaining by the arbitration operation the access to the command bus and the address bus for the second processor, wherein the second processor is awarded the access to the command bus and the address bus during the arbitration operation when the second processor has the highest priority among a second number of the plurality of processors that are currently requesting the access to the command bus and the address bus with the second processor;

(v) gaining the access to the data bus when the data bus is free of any transaction and a current first marker in the FIFO queue is not demanding the access to the data bus;

(vi) issuing the read response command from the second processor to the command bus and a second address from the second processor to the address bus, wherein the read response command and the second address are both destined for the first processor;

(vii) transmitting the first data from the second processor to the first processor via the data bus, wherein the first processor receives the first data in accordance with the read response command received in the first processor.

2. The method of claim 1, further comprising the step of repeating the step (C) by the first processor until one of the first and second acknowledgment signals is received in the first processor.

3. The method of claim 1, further comprising the steps of:

(H) issuing a full signal indicating the FIFO queue is full when the FIFO queue is full;

(I) causing the first processor to repeat the steps (B) through (C) if the first processor has issued the read command and the full signal has been issued.

4. The method of claim 1, wherein the first data includes a plurality of data portions, wherein the first data is sent from the second processor to the first processor via the data bus on a one-data-portion-at-a-time basis, wherein the step (F) (viii) further comprises the steps of:

(1) detecting in the second processor whether a data portion of the first data that is currently being transmitted on the data bus includes an error;

(2) if the data portion includes the error, then asserting an error flag signal from the second processor to the first processor and stopping transmitting the first data;

(3) correcting in the second processor the error in the data portion such that the data portion becomes a corrected data portion;

(4) resuming transmitting the first data by starting to resend the corrected data portion without the error.

5. The method of claim 1, wherein the first data includes a plurality of data portions, wherein the first data is sent from the second processor to the first processor via the data bus on a one-data-portion-at-a-time basis, wherein the step (G) (vii) comprises the steps of:

(1) detecting in the second processor whether a data portion of the first data that is currently being transmitted on the data bus includes an error;

(2) if the data portion includes the error, then asserting an error flag signal from the second processor to the first processor and stopping transmitting the first data;

(3) correcting in the second processor the error in the data portion such that the data portion becomes a corrected data portion;

(4) resuming transmitting the first data by starting to resend the corrected data portion without the error.
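Steps (1) through (4) of claims 4 and 5 can be sketched in Python. This is a minimal model under stated assumptions, not the patented circuitry:

- The function name and the event-log format are hypothetical.
- The `has_error` and `correct` callbacks stand in for the sender's ECC logic, which the claims do not detail.
- The toy ECC in the usage lines, where bit 8 marks a corrupted word, is purely illustrative.

```python
from typing import Callable, List, Tuple


def send_portions(portions: List[int],
                  has_error: Callable[[int], bool],
                  correct: Callable[[int], int]) -> List[Tuple[str, int]]:
    """Send data one portion at a time, following steps (1)-(4) of claims 4 and 5.

    Returns the bus events: ("data", word) for each word driven on the data
    bus and ("error_flag", index) when the sender flags an error and stops.
    has_error and correct are hypothetical stand-ins for the sender's ECC.
    """
    events: List[Tuple[str, int]] = []
    for index, word in enumerate(portions):
        events.append(("data", word))             # portion goes out on the data bus
        if has_error(word):                       # (1) error detected during transmission
            events.append(("error_flag", index))  # (2) flag the receiver, stop sending
            fixed = correct(word)                 # (3) correct the portion in the sender
            events.append(("data", fixed))        # (4) resume by resending the corrected portion
    return events


# Toy ECC (hypothetical): bit 8 marks a corrupted word; correction clears it.
CORRUPT = 0x100
log = send_portions([0x11, 0x22 | CORRUPT, 0x33],
                    has_error=lambda w: bool(w & CORRUPT),
                    correct=lambda w: w & ~CORRUPT)
print(log)
```

Only the corrupted portion is sent twice. A receiver would presumably discard the flagged copy and keep the resent one.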
Description
 


FIELD OF THE INVENTION

The present invention relates to the field of handling read and write responses to processors coupled to a computer bus. More particularly, this invention relates to allowing other processors to utilize the computer bus during the time spent waiting for a response to a read request.

BACKGROUND OF THE INVENTION

A computer processor typically performs read and write operations to both memory and input/output devices on a frequent basis. A write operation usually involves transmitting the data to be written along with the address of the location being written to. With a read operation, by contrast, after the read command has been issued, the processor and the bus can sit idle while waiting for the response to the read command. Although the processor may be assigned another task in the meantime, the bus can remain idle, unable to transmit another command or other data, until the read command has been answered and the bus can accept another command. If the device being read is relatively fast, then the delay may be short and the performance degradation may be acceptable. However, if the device being read is relatively slow, then the bus may sit idle for a considerable, and unacceptable, period of time.
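A back-of-the-envelope sketch makes the cost concrete. The cycle counts are hypothetical placeholders, not figures from the patent. In this simplified model, a bus held for the whole read is occupied for command + latency + data cycles per read. A split transaction occupies it only for the command and data phases, which leaves the latency window free for other agents.

```python
def bus_cycles_held(num_reads: int, cmd_cycles: int, latency: int,
                    data_cycles: int, split: bool) -> int:
    """Cycles during which the bus is unavailable to other agents.

    Simplified model (an assumption, not from the patent): a non-split read
    holds the bus from the command phase until its data returns, while a
    split read holds it only for the command phase and the later data phase.
    """
    per_read = cmd_cycles + data_cycles
    if not split:
        per_read += latency  # bus sits idle while the responder fetches the data
    return num_reads * per_read


# Hypothetical timings: 1-cycle command, 10-cycle latency, 4-cycle data burst.
print(bus_cycles_held(100, 1, 10, 4, split=False))  # 1500
print(bus_cycles_held(100, 1, 10, 4, split=True))   # 500
```

In this toy model, 1000 of the 1500 cycles in the non-split case are spent waiting. Splitting frees those cycles for other agents, although each individual read still takes as long to complete.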

One prior approach to improving bus utilization on read commands is to limit the types of devices which can be read. If only a global memory may be read then, because there is no mechanical delay and because the global memory is most likely directly connected to the bus, the bus is not idle for extremely long periods of time. In this way, the bus would have a shorter average idle time. However, the bus is still idle when a read command is outstanding, and other read commands cannot be issued during this idle period.

Another prior approach to improving bus utilization on read commands is to improve the speed of the bus itself. In this way, when the read response is ready, it will be transmitted that much faster and thus free up the bus that much sooner. Additionally, this would decrease the time the requesting processor waits for a read response as both the read command and the read response would be transmitted more quickly. However, merely improving the bus speed does nothing to eliminate the idle time the bus experiences while it is waiting for a read response. Thus, additional read commands would still have to wait for responses to earlier read commands to be completed before these additional read commands could be issued.

SUMMARY AND OBJECTS OF THE INVENTION

One objective of the present invention is to provide an improved method of handling read transactions on a computer bus in a multiple processor environment.

Another objective of the present invention is to provide an improved method of handling read transactions so as to allow the computer bus to be utilized for other transactions during the time between the read command and its associated read response.

Still another objective of the present invention is to provide an improved method of handling read transactions so as to allow the computer bus to be utilized for other transactions during the time between read commands and their associated read responses, so that relatively fast devices provide ordered responses and relatively slow devices provide out-of-order responses.

Yet another objective of the present invention is to provide a method of providing ordered and out of order split responses to read commands in a computer system with a command bus, a data bus, and multiple processors wherein the command bus and the data bus may be utilized for other commands while a processor is waiting for a read response after issuing a read command. When a processor desires to issue a read command, the processor performs read command steps of gaining access to the command bus and issuing the read command on the command bus. When a processor desires to issue a write command, the processor performs write command steps of gaining access to the command bus, issuing the write command on the command bus, and issuing write data on the data bus if the data bus is available and if no other processor is outputting an ordered response signal. When a processor desires to be able to provide ordered split responses to read commands, the processor performs queueing steps of adding one marker to a First-In-First-Out (FIFO) queue of the processor if a read command acknowledgement signal is transmitted without an out-of-order read response signal, and removing one marker from the processor's FIFO queue if an ordered read response signal is transmitted. 
When a processor desires to provide an ordered response to a read command, the processor performs ordered read response steps of outputting a read command acknowledgement signal, marking as owned by the ordered response processor the last entered marker in the ordered response processor's FIFO queue, and, if the ordered response processor is ready to respond to the read command with an ordered response and if the ordered response processor's owned marker is at the head of the ordered response processor's FIFO queue and if no other processor is outputting an ordered response signal, then the ordered response processor outputs an ordered response signal indicating readiness to provide an ordered response and transmits data on the data bus if the data bus is available. When a processor desires to provide an out-of-order response to a read command, the processor performs out-of-order read response steps of outputting a read command acknowledgement signal, outputting an out-of-order read response signal, and, if the out-of-order response processor is ready to respond to the read command, then the out-of-order response processor gains access to the command bus and transmits data on the data bus if the data bus is available and if no other processor is outputting an ordered response signal.
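The ordered-response queueing steps can be sketched as a small Python model. It rests on these assumptions:

- Every agent observes each acknowledgement and each ordered response, so all agents keep identical copies of the FIFO.
- Each marker carries a sequence number and the requester's ID.
- The class and method names are hypothetical.
- The ordered response signal and the data bus availability checks are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Set, Tuple


@dataclass
class Agent:
    """One agent's copy of the ordered-response FIFO (hypothetical model)."""
    agent_id: int
    fifo: Deque[Tuple[int, int]] = field(default_factory=deque)  # (marker seq, requester id)
    owned: Set[int] = field(default_factory=set)  # marker seqs this agent will answer


class OrderedFifoBus:
    """Every agent snoops acknowledgements, so all FIFO copies stay identical."""

    def __init__(self, n_agents: int) -> None:
        self.agents = [Agent(i) for i in range(n_agents)]
        self.next_seq = 0

    def ack_ordered(self, responder: int, requester: int) -> None:
        """Read acknowledged without the out-of-order signal: every agent queues a marker."""
        seq = self.next_seq
        self.next_seq += 1
        for agent in self.agents:
            agent.fifo.append((seq, requester))
        self.agents[responder].owned.add(seq)  # responder owns the last-entered marker

    def may_respond(self, responder: int) -> bool:
        """True when the responder's own marker is at the head of its FIFO."""
        agent = self.agents[responder]
        return bool(agent.fifo) and agent.fifo[0][0] in agent.owned

    def ordered_response(self, responder: int) -> int:
        """Send the data; every agent then removes one marker. Returns the requester id."""
        if not self.may_respond(responder):
            raise RuntimeError("own marker is not at the head of the FIFO")
        seq, requester = self.agents[responder].fifo[0]
        for agent in self.agents:
            agent.fifo.popleft()
        self.agents[responder].owned.discard(seq)
        return requester


bus = OrderedFifoBus(4)
bus.ack_ordered(responder=2, requester=0)  # memory module 2 accepts CPU 0's read
bus.ack_ordered(responder=3, requester=1)  # memory module 3 accepts CPU 1's read
print(bus.may_respond(3))  # False: module 2's marker is still ahead
```

Even if module 3 has its data first, it must wait until module 2 responds and every agent pops the head marker. This is what keeps ordered responses in request order.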

More specifically, the read command issuing processor of the present invention repeats the read command steps if it receives no acknowledgement signal from another processor indicating that the command is being handled.

More specifically, the ordered read response processor in the present invention outputs a signal indicating when its FIFO queue is full, and the read command issuing processor repeats the read command steps if it sees this full FIFO queue signal.
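The requester's retry behaviour described above can be sketched in Python under these assumptions:

- `try_once` is a hypothetical callback for one pass of arbitrating for the command and address buses and driving the read command.
- The `Reply` names are illustrative labels for the outcomes the text describes.
- `max_attempts` is an assumed safety bound that the text does not mention; the text simply repeats the read command steps.

```python
from enum import Enum, auto
from typing import Callable, Tuple


class Reply(Enum):
    NO_ACK = auto()            # no agent acknowledged the command
    FIFO_FULL = auto()         # responder's FIFO queue is full
    ACK_ORDERED = auto()       # data will follow as an ordered response
    ACK_OUT_OF_ORDER = auto()  # a read response command will follow later


def issue_read(try_once: Callable[[], Reply],
               max_attempts: int = 16) -> Tuple[Reply, int]:
    """Re-arbitrate and reissue a read until it is acknowledged.

    Returns the acknowledgement received and the number of attempts made.
    max_attempts is a hypothetical bound added for the sketch.
    """
    for attempt in range(1, max_attempts + 1):
        reply = try_once()
        if reply in (Reply.ACK_ORDERED, Reply.ACK_OUT_OF_ORDER):
            return reply, attempt
        # NO_ACK or FIFO_FULL: release the buses and go around again
    raise TimeoutError("read command was never acknowledged")


replies = iter([Reply.NO_ACK, Reply.FIFO_FULL, Reply.ACK_ORDERED])
result = issue_read(lambda: next(replies))  # (Reply.ACK_ORDERED, 3)
```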

Even more specifically, the processor desiring to provide an ordered response to a read command derives the identity of the read command issuing processor from a combination of the read command and the process by which the read command issuing processor gained access to the command bus.

Still more specifically, the processor desiring to provide an out-of-order response to a read command derives the identity of the read command issuing processor from a combination of the read command and the process by which the read command issuing processor gained access to the command bus.
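One way a responder might combine the read command with the arbitration process to identify the requester is sketched below. Everything here is a hypothetical model:

- Every agent is assumed to learn an ID (for example a slot number) for the winner of each command-bus arbitration.
- The read command is assumed to carry no requester ID of its own.
- The class and method names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Command:
    opcode: str   # e.g. "READ"
    address: int


class RequesterTracker:
    """Recover a requester's identity from arbitration (hypothetical model).

    Every agent observes which agent won arbitration for the command and
    address buses, so the responder can pair each command it sees with
    the agent that currently owns those buses.
    """

    def __init__(self) -> None:
        self.bus_owner: Optional[int] = None

    def on_arbitration_won(self, winner_id: int) -> None:
        self.bus_owner = winner_id

    def on_command(self, cmd: Command) -> Tuple[str, int, int]:
        if self.bus_owner is None:
            raise RuntimeError("command observed with no arbitration winner")
        return (cmd.opcode, cmd.address, self.bus_owner)


tracker = RequesterTracker()
tracker.on_arbitration_won(5)
request = tracker.on_command(Command("READ", 0x1000))  # ("READ", 4096, 5)
```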

Other objects, features, and advantages of the present invention will be apparent from the accompanying drawings and from the detailed description which follows below.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which:

FIG. 1 depicts a prior art multi-tasking, time-sharing architecture;

FIG. 2 depicts a prior art tightly-coupled multi-processing architecture;

FIG. 3 depicts a prior art loosely-coupled multiprocessing/functionally partitioned architecture;

FIG. 4 depicts the architecture of the present invention;

FIG. 5 depicts a central arbitrator architecture of the prior art;

FIG. 6 depicts a functional node of the present invention;

FIG. 7 is a timing diagram of prior art arbitration as compared with the arbitration of the present invention;

FIG. 8 depicts an arbitration state diagram;

FIG. 9 is a timing diagram of the new arbitration group formation;

FIG. 10 depicts a card and a slot connector to show the arbitration signal lines as well as a table to illustrate the rotation of the arbitration signal lines from one slot to the next;

FIG. 11 is a timing diagram of two write operations;

FIG. 12 is a more detailed timing diagram of two consecutive write operations;

FIG. 13 is a timing diagram of a prior art read operation;

FIG. 14 is a timing diagram of a read operation and a write operation;

FIG. 15 depicts a cache coherency state diagram;

FIG. 16 is a flowchart of the steps taken when receiving an Interrupt Processor Request Command;

FIG. 17 is a further flowchart of the steps taken when receiving an Interrupt Processor Request Command;

FIG. 18 depicts the correct only mode of the ECC circuitry of the prior art;

FIG. 19 depicts a processor card (containing up to four processor/cache modules) with its associated bus controller and the bus interface;

FIG. 20 depicts the detect and correct mode of the ECC circuitry;

FIG. 21 is a timing chart of the prior art error detection and correction as compared to the bus stretching protocol of the present invention;

FIG. 22 is a logic diagram of the arbitration priority determination and resolution circuitry for the third alternative embodiment of the present invention;

FIG. 23 depicts the backplane configuration with slot marker for the third alternative embodiment of the present invention;

FIGS. 24 and 25 are flow charts of the split data transactions of the present invention, wherein FIG. 24 shows the procedure with respect to the requester and FIG. 25 shows the procedure with respect to the responder;

FIG. 26 is a flow chart of a write operation with respect to the initiator;

FIG. 27 is a flow chart of a bus stretching operation with respect to the data sender;

FIG. 28 is a flow chart of the bus stretching operation with respect to the data receiver.

DETAILED DESCRIPTION

In the early days of data processing, when computers were large enough to fill a room, the standard processing environment consisted of a single processor running a single job or task. This single task had complete control over all available memory and input/output (I/O) devices and there was no concern about contention for memory or I/O. Then, as processor speed increased, the standard environment changed.

Referring now to FIG. 1, a prior art multi-tasking time-sharing architecture can be seen. Using a higher performance processor 3, each task could then receive a mere slice or portion of the available time 9 on the processor 3 and, because the processor 3 could quickly switch from running one task to running another task, each task would think that it was getting all of the processor's time 9. Further, each task would think that it had access to all the available memory 1 and I/O 5. However, if one task wanted to communicate with another task, because they weren't actually running at the same time they couldn't directly communicate. Instead, an area 2 of the global shared memory 1 was usually reserved for task-to-task communication which thus occurred across bus 7.

As individual tasks continued to grow in size and complexity, even the fast time-sharing processor 3 was unable to juggle quickly enough to keep all the tasks running. This led to new types of processing architectures in which multiple processors 3 were used to handle the multitude of tasks waiting to be run.

With reference to FIG. 2, a prior art architecture implemented to handle multiple tasks with multiple processors can be seen. This multiple processor multiple task architecture merely replaced the single large processor with multiple smaller processors 3 connected by a common bus 7, which also connected the processors 3 to the global memory 1 and I/O resources 5. This is known as Symmetric/Shared Memory Multi-Processing (SMP) because the multiple processors 3 all share the same memory 1 (hence the name global memory), which makes the interconnecting bus 7 predominantly a memory oriented bus. This is also known as a tightly-coupled multi-processing architecture because the individual processors 3 are carefully monitored by an overseeing multiprocessing operating system which schedules the various tasks being run, handles the communications between the tasks running on the multiple processors 3, and synchronizes the accesses by the multiple processors 3 to both the global shared memory 1 and the I/O resources 5 to avoid collisions, stale data, and system crashes. Of course, having multiple processors 3 attempting to access the same global memory 1 and I/O 5 can create bottlenecks on the bus 7 interconnecting them.

With reference to FIG. 3, an alternative prior art architecture implemented to handle multiple tasks with multiple processors, generally known as either loosely-coupled multi-processing or functional partitioning, can be seen. Rather than have increasingly large individual tasks time-shared on a single large computer or have individual tasks running on separate processors all vying for the same global resources, in a functionally partitioned environment individual tasks are separately run on functional nodes 10, each consisting of a separate processor 3 with its own local memory resources 11 and I/O capabilities 5, all of which are connected by a predominantly message passing bus 7. This is also known as loosely-coupled architecture because processing can be done within each node 10, including accesses to local memory 11, without concern about activity in other functional nodes 10. In other words, each functional node 10 is essentially a separate processing environment which is running a separate task or job and as such is not concerned about memory 11 conflicts or I/O 5 or bus 7 collisions because no other processor 3 is operating within that particular node's 10 separate environment.

A further refinement of the loosely-coupled multi-processing environment, which is particularly useful when a given task is too large for the single processor 3 used in a functional node 10 of FIG. 3, is replacement of the functional node's 10 single processor 3 with a bank of processors 3, as can be seen in FIG. 4. This bank of processors 3, connected by a predominantly memory oriented bus 15, shares the functional node's 10 local (yet still global in nature) memory 11. In this way, more processing power can be allocated to the task and the functional node is still fairly autonomous from (only loosely-coupled with) other functional nodes. Further, because the processors 3 within each functional node 10 are generally limited to a functional task, bus 15 contention by processors 3 running other functional tasks in other functional nodes 10 is generally eliminated within each node 10.

Therefore, taking a functionally partitioned processing environment (wherein a multi-tasking/timesharing environment is broken down into functional processing partitions or nodes 10) and replacing a functional node's processor 3 with a bank of homogeneous processors 3 to create multiprocessing nodes 10 can provide greater processing power for that functional node's 10 given task(s). Additionally, the "plug compatibility" of functional partitioning (wherein each node 10 need merely concern itself with the interface protocols and can function independently of the format or structure of other nodes 10) can be retained while eliminating the need for highly customized architectures within each functional node 10.

Of course, supporting multiple processors 3 requires, in addition to a multiprocessing operating system, the ability to determine when any given processor 3 will be able to gain access to the bus 15 connecting them in order to access the shared memory 11 and/or I/O resources. In the preferred embodiment of the present invention, this ability to arbitrate between accesses to the bus 15 by competing processors 3 is implemented through a fully distributed scheme, rather than through a centralized arbitrator (which would require additional dedicated logic) as is known in the prior art.
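The section does not spell out the preferred embodiment's arbitration circuitry, so the sketch below swaps in a common textbook technique instead: binary wired-OR arbitration. It is an illustration of how a fully distributed scheme can award the bus to the highest-priority requester without a central arbitrator. It is not the patent's logic. The function name and the 4-bit priority codes are assumptions.

```python
from typing import List


def wired_or_arbitrate(priorities: List[int], width: int) -> int:
    """Resolve a distributed arbitration; returns the winning priority code.

    Each requester drives its unique priority code onto shared wired-OR
    lines, most significant bit first. A requester that sees a 1 on a line
    where its own bit is 0 stops driving its remaining bits. The surviving
    pattern is the highest code, so every agent can decide locally, with no
    central arbiter, whether it won.
    """
    contenders = list(priorities)
    lines = 0
    for bit in reversed(range(width)):
        mask = 1 << bit
        if any(p & mask for p in contenders):  # wired-OR of this bit line
            lines |= mask
            contenders = [p for p in contenders if p & mask]  # others withdraw
    return lines


requests = [0b0101, 0b0110, 0b0011]              # hypothetical 4-bit priority codes
winner = wired_or_arbitrate(requests, width=4)   # 0b0110
won = [code == winner for code in requests]      # [False, True, False]
```

Each agent only needs to compare its own code with the settled lines to know whether it won, which is what removes the need for dedicated central arbitration logic.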

Additionally, as with all processors, an interrupt scheme is necessary to control events other than normal branches within an executing program. Having multiple processors 3 within a node 10 requires the further capability of processor-to-processor interrupts so that one processor 3 can interrupt another processor 3 in order to request it to handle some task, event, or problem situation. In the preferred embodiment, the interrupt scheme is implemented as part of the regular command set and is supported as a normal bus transaction.

Finally, multiple processors 3 reading and writing to a local/global shared memory 11 can cause wasted cycles on the bus 15 connecting them unless the "dead time" spent waiting for a response on a read operation is used by/for another processor 3. A split transaction scheme is thus implemented in the preferred embodiment of the present invention using both ordered and out of order responses, depending upon the usual or expected response time of the data storage medium holding the desired data, thus allowing another processor 3 to utilize the dead time of a typical read operation.

The preferred embodiment of the bus 15 of the present invention provides a high bandwidth, low latency pathway between multiple processors 3 and a global shared memory 11. The pathway 15 handles the movement of instructions and data blocks between the shared memory 11 and the cluster of processors 3 as well as processor-to-processor interrupt communications.

Three types of modules are supported on the bus 15: processor modules 3, I/O modules, and memory modules.

1) Processor Modules

Processor modules 3 can be further broken down into two classes: General Purpose Processors (GPP's) and I/O Processors (IOP's), both of which are write back cache based system processing resources.

The GPP class of processors run the operating system, provide the computational resources, and manage the system resources. The GPP devices are homogeneous (of the same general type or device family) with any task being capable of execution on any of the GPP devices. This allows a single copy of the operating system (OS) to be shared and run on any of the GPP's.

The IOP class of processors provide an intelligent link between standard I/O devices and the cluster of GPP's. The IOP can be any type of processor interfacing with any type of I/O bus or I/O device. External accesses to the cluster of computational GPP resources occur via an IOP. Any IOP in the system can be used to boot an operating system and the boot IOP can be changed between subsequent boot operations.

2) I/O Modules

I/O modules connect to other buses and thus provide a window into I/O resources attached to those other buses using either I/O space or memory mapped I/O. I/O boards are thus slaves to this bus environment and merely provide a link from this bus to I/O boards on other buses. Windows of memory and I/O space are mapped out of the bus address space for accesses to these other buses. The I/O boards do not, however, provide a direct window from other buses due to cache coherency and performance considerations. Thus, when another bus wishes to access this bus, an IOP is required.
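The window mechanism can be illustrated with a small address-decode sketch. All base addresses, sizes and bus labels below are hypothetical, and the `Window` type and `decode` function are illustrative names. The sketch models only the one-way mapping described above: addresses on this bus that fall inside a window are claimed by an I/O module and translated for the other bus.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Window:
    """A range of this bus's address space forwarded by an I/O module (hypothetical)."""
    base: int         # first address of the window on this bus
    size: int         # window length in bytes
    target_bus: str   # label for the other bus behind the I/O module
    target_base: int  # address on the other bus that `base` maps to

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def decode(addr: int, windows: List[Window]) -> Optional[Tuple[str, int]]:
    """Return (other bus, translated address) if an I/O module claims addr."""
    for w in windows:
        if w.contains(addr):
            return (w.target_bus, w.target_base + (addr - w.base))
    return None  # not in any window: some other agent (e.g. memory) responds


windows = [
    Window(base=0xF000_0000, size=0x0100_0000, target_bus="io-bus-A", target_base=0x0),
    Window(base=0xF100_0000, size=0x0010_0000, target_bus="io-bus-B", target_base=0x8000_0000),
]
print(decode(0xF000_0010, windows))  # ('io-bus-A', 16)
```

There is deliberately no reverse mapping. As the text notes, traffic originating on another bus must enter through an IOP.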

3) Memory Modules

Memory modules connect to the bus to provide high-bandwidth, low-latency access to a global shared memory resource available to all the processors on the bus. The shared memory provides the basis for task sharing between the processors using semaphores, as well as passing data structures using pointers. Local memory may exist on a GPP or IOP module, but it must be private and neither visible to nor shared with other agents on the bus.

Both processor 3 module types, GPP's and IOP's, may contain a cache memory facility. The implemented cache protocol supports write back or write through caching in separate address spaces. All processors 3 connected to the bus 15 should support cache block transfers in the write back cache data space to perform an intervention operation, as is discussed in more detail below.

Arbitration, address/command signals and data communication are overlapped on the bus 15. In the preferred embodiment, the bus 15 can split read transactions into two parts to avoid wasting bus 15 cycles while waiting for a read response. The first part of the split read transaction is the read request and the second part is the resulting read response. This split read transaction sequence allows the delayed access time of memory 11 accesses (and other relatively lengthy read responses) to overlap with the use of the bus 15 by others. Additionally, read responses may occur either "in-order" or "out-of-order" with respect to their associated requests. The in-order mechanism is optimal for deterministic accesses such as memory 11 reads, while the out-of-order mechanism accommodates slower responders (such as bus bridges and remote memory or I/O) and still maintains high bus 15 utilization/effective bandwidth.
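The bookkeeping behind the two response styles can be sketched as a toy model. This is an illustration under stated assumptions, not the patent's logic: the class and method names are hypothetical, arbitration and data-bus timing are ignored, and tagged (out-of-order) replies are assumed to be matched by requester and address, since the read response command carries the address.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    requester: str  # hypothetical agent name, e.g. "P1"
    address: int


class SplitReadModel:
    """Toy bookkeeping for split read transactions (illustrative only)."""

    def __init__(self):
        self.ordered_pending = deque()  # awaiting an in-order, data-only reply
        self.tagged_pending = {}        # (requester, address) -> awaiting tagged reply
        self.completed = []             # (requester, address, data) in delivery order

    def issue_read(self, req, ordered):
        # Part 1 of the split transaction: the request is acknowledged and the
        # address/command bus is released; the responder picks the reply style.
        if ordered:
            self.ordered_pending.append(req)
        else:
            self.tagged_pending[(req.requester, req.address)] = req

    def ordered_reply(self, data):
        # In-order reply: only data is sent; the oldest outstanding ordered
        # request is implied by position, so no response command is needed.
        req = self.ordered_pending.popleft()
        self.completed.append((req.requester, req.address, data))

    def out_of_order_reply(self, requester, address, data):
        # Out-of-order reply: a read response command carrying the address
        # identifies which outstanding request the data belongs to.
        req = self.tagged_pending.pop((requester, address))
        self.completed.append((req.requester, req.address, data))


bus = SplitReadModel()
bus.issue_read(ReadRequest("P1", 0x100), ordered=True)    # e.g. a memory read
bus.issue_read(ReadRequest("P2", 0x9000), ordered=False)  # e.g. a slow bus bridge
bus.ordered_reply("mem-A")
bus.issue_read(ReadRequest("P3", 0x200), ordered=True)
bus.ordered_reply("mem-B")
bus.out_of_order_reply("P2", 0x9000, "bridge-data")
print(bus.completed)
```

P2's request is answered last even though it was issued before P3's, which is the point of the out-of-order mechanism: a slow responder does not hold up faster, deterministic responders.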

Interprocessor interrupts over the bus 15 provide an urgent communication mechanism. In the preferred embodiment, interrupts can be individually directed or widely broadcast to any or all processors 3 or processor 3 classes. Interrupts are communicated by stealing available command cycles on the address bus (part of bus 15), thus avoiding any impact on the performance of data transfers.

Bus 15 initialization support includes test (interrupt and restart), configuration, and bootstrap capabilities. Each module on the bus 15 can be tested either automatically or by operator direction, including fault detection and isolation. Modules contain information regarding their capabilities and allow for system configuration options.

A bootstrap processing (BSP) function should exist on the bus 15 to perform the configuration operation and boot the operating system on the bus 15. The BSP function may be a subset of the capabilities of an IOP 3, which could thus perform the BSP function for the cluster of processors 3 on the bus 15 via a processor on a connected bus.

The bus 15 architecture is thus a high-bandwidth, cache-coherent memory bus, and in the preferred embodiment of the present invention it is implemented with a synchronous backplane transfer protocol that runs off a radially distributed clock. Information on the bus 15 is transferred between boards on the clock edges, with the maximum clock rate dependent on the delay from clock to information, the bus settling time and the receiving latch setup time.
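The dependence of the maximum clock rate on those three delays can be written as a one-line bound. This is a minimal sketch: the function name and the delay values in the example are hypothetical, and the model deliberately ignores clock skew and hold time.

```python
def max_clock_rate_hz(t_clock_to_out_s, t_settle_s, t_setup_s):
    """Upper bound on the clock rate of a synchronous backplane transfer.

    Information launched on one clock edge must be stable at the receiving
    latch before the next edge, so the clock period must be at least the sum
    of the clock-to-information delay, the bus settling time and the
    receiving latch setup time. Clock skew and hold time are ignored here.
    """
    min_period_s = t_clock_to_out_s + t_settle_s + t_setup_s
    return 1.0 / min_period_s


# Hypothetical delays (not from the patent): 4 ns + 12 ns + 4 ns = 20 ns period
print(max_clock_rate_hz(4e-9, 12e-9, 4e-9))  # roughly 50 MHz
```

Because settling time grows with the electrical length of the backplane, shortening the bus directly raises this bound, which is the motivation for the 10 slot backplane described next.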

One of the factors that can limit the speed of backplane transfer is the electrical length. A 10 slot backplane is used in the preferred embodiment to minimize the bus 15 electrical length and thus maintain a high backplane transfer speed capability. Thus, only high performance modules should be permitted direct access to the backplane. The architecture permits multiple processor 3 modules to exist on a single board which is achievable using VLSI solutions for caching and bus 15 interfacing.

The protocol has been defined to maximize the percentage of useful bandwidth as compared to raw bandwidth. This is accomplished through quick arbitration, demultiplexed address and data paths (on bus 15) and split transfers among other features.

Referring now to FIG. 5, a multiprocessing node using an SMP configuration can be seen. In this particular configuration, a central arbitrator 13 is shown whereby any processor 3 or Input/Output (IO) processor 5 wishing to access memory 11 must first post a request to the central arbitrator 13 on bus 15. The job of the central arbitrator 13 is to determine which processor gets access to the local memory 11 and in what order. However, while a central arbitrator 13 handles contention at memory 11, it requires additional logic to perform the specialized arbitration function. This specialized logic can exist as a separate card connected to the bus 15, but this would take up a card slot on bus 15. An alternative would be to make arbitrator 13 a portion of the bus 15 logic itself. This, however, makes the bus implementation more complicated, and the more tightly the central arbitrator 13 is integrated into bus 15, the more difficult diagnosis and repair of arbitration problems can become.

Referring now to FIG. 6, in the preferred embodiment of the present invention memory 11, processors 3 and IO processor 5 connect to bus 15, and a distributed arbitration scheme is used instead of a central arbitrator. This distributed arbitration scheme allows an individual processor, contending with other processors, to access the local memory 11 by having each processor 3 and IO processor 5 separately and individually handle the arbitration process. Although it requires additional logic on each processor card, this scheme distributes the arbitration function, thereby simplifying the bus 15 implementation and avoiding the need for an additional slot to handle the arbitration requirements.

A further advantage of the distributed arbitration approach is the reduction of bus traffic between processors and a central arbitrator. By merely having those processors who wish to access memory 11 contend for that access by handling their own arbitration in a distributed manner, the only additional bus traffic for arbitration is that between processors. This distributed arbitration scheme thus eliminates contention for memory 11 as well as contention for a central arbitrator. The implementation of this distributed arbitration scheme is explained more fully below.
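The key property of distributed arbitration is that every agent latches the same snapshot of the arbitration lines and applies the same deterministic rule, so all agents agree on exactly one winner without consulting a central arbitrator. A minimal sketch, assuming a hypothetical fixed priority in which a lower slot number wins (consistent with processor 1 outranking processor 3 in the FIG. 9 example below):

```python
def wins_arbitration(my_slot, latched_requests):
    """Local resolution performed independently by every agent.

    latched_requests: slots whose arbitration lines were latched this cycle.
    Assumption (hypothetical): lower slot number means higher priority.
    """
    return my_slot in latched_requests and my_slot == min(latched_requests)


latched = {1, 3}  # e.g. processors 1 and 3 raised their arbitration lines
winners = [slot for slot in range(10) if wins_arbitration(slot, latched)]
print(winners)  # [1]
```

Each of the ten agents evaluates the rule on its own, yet only one concludes that it has won; the only shared state is the set of arbitration lines.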

Referring now to FIG. 7, timing charts showing various bus implementations can be seen. In the first timing signal, a combined bus which handles all arbitration and any addressing or command signals is shown. This timing signal, depicting a combined bus, shows the sequence of a first processor, wishing to access the bus/memory, issuing command 1 in the next cycle after arbitration 1. Next, a second processor, also wishing to access the bus/memory, issues a command after arbitrating for access, followed by a third, etc.

While this appears to make most efficient use of the bus as there are no idle periods in this sequence, this serial bus arbitration command sequence is not the most efficient methodology. This can be seen by comparing the first time line to the second and third time lines. The second and third time lines represent a separate arbitration bus 17 from an address/command bus 19. In this sequence, in the next cycle after arbitrating on the arbitration bus 17, a command may be issued on the address/command bus 19. This is followed by a second arbitration on the arbitration bus 17 and its associated command on the address/command bus 19.

However, merely splitting the bus into an arbitration bus 17 and an address/command bus 19 creates idle times in each of the respective buses and does not improve performance. Comparing each of these two methodologies to the methodology of the present invention, which is shown by the 4th and 5th timing signals, reveals a more efficient methodology. The 4th timing signal represents the arbitration bus 17 and the 5th timing signal represents a combined address/command bus 19 (both of which are part of the bus 15 of FIGS. 4-6). The difference is that the arbitration bus 17 and the address/command bus 19 are now overlapped, or pipelined, as opposed to the strictly sequential process shown by the 2nd and 3rd timing signals. In this sequence, after the first arbitration has begun and a processor has won that arbitration, that processor may immediately issue a command on the address/command bus. Once it is known that a command is about to complete, the next arbitration cycle can begin, allowing the next arbitration winner to issue a command immediately after the first arbitration winner's command completes, and likewise for the third arbitration, etc. Note that these timing diagrams show a single-cycle arbitration and a single-cycle command. In the preferred embodiment of the present invention, arbitration is a two-cycle process and command issuance requires one or more cycles depending upon such things as the complexity of the command being issued.
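The gain from overlapping the two buses can be quantified with a small timing model. This is a sketch under simplifying assumptions (the function is hypothetical, every arbitration yields one winner with a command ready, and cycle counts are parameters rather than the patent's exact timing):

```python
def command_finish_cycles(n, arb_cycles=1, cmd_cycles=1, pipelined=True):
    """Completion cycle of each of n back-to-back commands (toy timing model).

    Separate arbitration and address/command buses are assumed, with every
    arbitration producing one winner that has a command ready to issue.
    """
    finishes = []
    arb_start = 0
    prev_cmd_end = 0
    for _ in range(n):
        arb_end = arb_start + arb_cycles
        # The winner issues as soon as it has won and the command bus is free.
        cmd_start = max(arb_end, prev_cmd_end)
        cmd_end = cmd_start + cmd_cycles
        finishes.append(cmd_end)
        prev_cmd_end = cmd_end
        if pipelined:
            # Begin the next arbitration so it finishes just as this command
            # completes, but never before the arbitration bus is free again.
            arb_start = max(arb_end, cmd_end - arb_cycles)
        else:
            # Strictly sequential: the next arbitration waits for this command.
            arb_start = cmd_end
    return finishes


print(command_finish_cycles(4, pipelined=False))  # [2, 4, 6, 8]
print(command_finish_cycles(4, pipelined=True))   # [2, 3, 4, 5]
```

With single-cycle arbitration and commands, the sequential split buses (2nd and 3rd timing signals) finish one command every two cycles, while the pipelined buses (4th and 5th timing signals) finish one per cycle after the first. In this model the steady-state spacing between commands is the larger of `arb_cycles` and `cmd_cycles`.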

Referring now to FIG. 8, the arbitration process is shown by a state diagram. The arbitration process involves the resolution of contention for bus access among the various processors in an arbitration group. Beginning with an idle condition, state 27, whenever one or more processors wish to access memory, each of those processors raises its own arbitration signal. Every processor, in state 21, then inputs all other processors' arbitration signals, thus forming an arbitration group by inputting and latching all of the arbitration signals. Following the latch state 21, each of the respective processors in the arbitration group compares its priority to those of all the other processors in the arbitration group. This comparison is done in the resolution state 23. The winner of this resolution of the arbitration group is the processor which next issues a command and/or accesses the memory. If this access command sequence is to take more than one cycle, then the wait state 25 is entered. If this was the last processor in the group to win the arbitration resolution and gain access, then the bus returns to the idle state once the last processor completes its operation. However, if there are more processors remaining in the arbitration group, then following the resolution or wait states all processors return to the latch state 21 and again input all arbitration signals from the remaining processors in the arbitration group. Note that the processors from the group who have already had access are no longer part of the arbitration group and no longer output an arbitration signal to request access. The cycle is then repeated through the resolution state, and possibly the wait state, until there are no more processors remaining in the arbitration group.
If other processors wish to access memory while the first arbitration group is arbitrating for access, these other processors (and those processors of that earlier arbitration group who have already had access) must wait until the last processor in the present group has completed the resolution state before they can form a new group. This avoids wasting a cycle going through the idle state.
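The group semantics described above (frozen membership, winners dropping out, latecomers waiting for the next group) can be sketched at the level of grants. This is a simplified model, not the FIG. 8 state machine itself: the latch, resolution and wait states are collapsed into one step per grant, the function name is hypothetical, and a lower slot number is assumed to mean higher priority.

```python
def grant_order(requests_by_step, num_steps):
    """Order in which agents are granted the bus under group arbitration.

    requests_by_step: dict mapping a step number to the slots that raise their
    arbitration signal at that step. One grant is made per step (latch,
    resolve and any wait states are collapsed into one step in this model).
    Assumption: a lower slot number means higher priority.
    """
    pending = set()  # requesters waiting for the next group to form
    group = set()    # members of the current arbitration group
    grants = []
    for step in range(num_steps):
        pending |= set(requests_by_step.get(step, ()))
        if not group:
            # Idle, or the previous group's last slot has resolved:
            # everyone now waiting forms the new group.
            group, pending = pending, set()
        if group:
            winner = min(group)
            group.discard(winner)  # winners leave the group and drop their signal
            grants.append((step, winner))
    return grants


# FIG. 9 style example: 1 and 3 arbitrate first; 2 and 4 request one step later.
print(grant_order({0: [1, 3], 1: [2, 4]}, num_steps=5))
# [(0, 1), (1, 3), (2, 2), (3, 4)]

# Fairness: processor 1 re-requests every step, yet processor 4 is still served.
print(grant_order({0: [1, 4], 1: [1], 2: [1], 3: [1]}, num_steps=4))
# [(0, 1), (1, 4), (2, 1), (3, 1)]
```

The second example illustrates the fairness argument: because a winner cannot rejoin the group it just won in, a high-priority agent that requests continuously still lets every lower-priority member of the current group through before it is served again.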

One of the advantages of the present invention is that of providing fairness (equal access over time) to all potential bus contenders. Although one processor may get slightly faster access in a given arbitration group due to that processor's arbitration priority, by using the arbitration group approach each processor is assured of fairness. On the average, a request would only have to wait for the duration of one group resolution time before being granted bus access. In this way, no processor should have to endure starvation.

Referring now to FIG. 9, a timing diagram is shown which represents four different processors all potentially contending for bus and/or memory access. Each of the signal lines 1 through 4 represents one of the processors. The 5th signal line is a logical representation of when the last processor in an arbitration group is resolving its arbitration (LastSlot). The 6th line is the address/command bus 19, on which commands are issued following arbitration.

Reviewing the arbitration sequence shows signal lines 1 and 3 raised in the first cycle. This means that processor 1 and processor 3 both wish to access the bus and/or memory and have indicated this desire by outputting an arbitration signal. All processors then input and latch these arbitration signals, and all of those within the group, namely 1 and 3, go through the resolution state to determine who gets access. In this case processor 1, having a higher priority than processor 3, first gains access through the arbitration resolution and issues a command on the address/command bus 19. While this command is being issued, due to the pipelined operation of this arbitration scheme, all of the processors again input and latch all of the arbitration signals. Additionally, as is discussed below with reference to the interrupt protocol, the last arbitration winner is saved by each processor.

Now, because only processor 3 is outputting an arbitration signal, processor 3 wins the following arbitration resolution and can then issue a command in the following cycle. Simultaneously with this resolution, because LastSlot is logically raised, as shown by the 5th timing signal, any other processor wishing to form an arbitration group may then raise its arbitration signal in the next clock cycle. Note, however, that LastSlot is merely a signal generated internally by each agent and is not actually a separate signal line as depicted in FIG. 9. In this example, processors 2 and 4, which now wish to access memory, raise their arbitration signals and go through the latch and resolution states of a new arbitration cycle in order to issue their commands or access memory.

Note that LastSlot is merely the condition whereby the last processor in an arbitration group is latching in and resolving its own arbitration signal. Writing $S_i$ for the arbitration signal of processor $i$ and an overbar for a negated signal, LastSlot equals the logical sum of the four terms in which exactly one arbitration signal is asserted:

$$\mathrm{LastSlot} = S_1\,\overline{S_2}\,\overline{S_3}\,\overline{S_4} + \overline{S_1}\,S_2\,\overline{S_3}\,\overline{S_4} + \overline{S_1}\,\overline{S_2}\,S_3\,\overline{S_4} + \overline{S_1}\,\overline{S_2}\,\overline{S_3}\,S_4$$

Stated differently, new group formation may occur if the arbitration bus is idle, or if the arbitration sequence is in a resolution or wait state and the LastSlot condition exists.
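The LastSlot condition and the group-formation rule translate directly into Boolean code. This sketch assumes, as the surrounding description implies, that each product term has exactly one processor's arbitration signal asserted and the other three negated (one requester left in the group). The enum and function names are hypothetical; the states mirror FIG. 8.

```python
from enum import Enum, auto


class ArbState(Enum):
    IDLE = auto()
    LATCH = auto()
    RESOLVE = auto()
    WAIT = auto()


def last_slot(s1, s2, s3, s4):
    """Sum of products: exactly one of the four arbitration signals asserted."""
    return ((s1 and not s2 and not s3 and not s4)
            or (not s1 and s2 and not s3 and not s4)
            or (not s1 and not s2 and s3 and not s4)
            or (not s1 and not s2 and not s3 and s4))


def may_form_new_group(state, signals):
    """New group formation: arbitration idle, or resolving/waiting in LastSlot."""
    return state is ArbState.IDLE or (
        state in (ArbState.RESOLVE, ArbState.WAIT) and last_slot(*signals))


print(may_form_new_group(ArbState.RESOLVE, (False, False, True, False)))  # True
print(may_form_new_group(ArbState.RESOLVE, (True, False, True, False)))   # False
```

The first call corresponds to the FIG. 9 moment when only processor 3 remains; the second corresponds to the start of the first group, when processors 1 and 3 are both still arbitrating and newcomers must wait.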

In the preferred embodiment of the present invention, to support the split transaction read response protocol discussed in more detail below, there are additional considerations before an arbitration winner can issue a command on the address/command bus 19. Once a processor has won arbitration, if the processor is trying to issue a command which does not require access to the data bus (a separate bus to handle data transfers, as is discussed below with reference to the split transaction protocol), then the processor is free to issue the command. However, if the processor is trying to issue a command which does require access to the data bus, then the processor must take the additional step, beyond arbitration, of ensuring that the data bus is available. This extra step, really comprising two steps, is first to check whether the Data Strobe (DS) signal is asserted (indicating that the data bus is busy handling a prior transaction), and second, to check whether the Ordered (ORD) response signal line is asserted (indicating that an ordered split read response is pending). See the discussion below for when and how the ORD signal gets asserted. Once neither the DS signal nor the ORD signal is asserted, the processor is free to issue a command on the address/command bus 19 and, after asserting the DS signal line, can then transmit data on the data bus.
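The issue rule for an arbitration winner reduces to a short predicate over the DS and ORD signals. The function and parameter names below are hypothetical; only the signal meanings come from the text.

```python
def may_issue_command(won_arbitration, needs_data_bus, ds_asserted, ord_asserted):
    """Decide whether an agent may drive the address/command bus this cycle.

    ds_asserted: Data Strobe is asserted, so the data bus is busy.
    ord_asserted: an ordered split read response is pending.
    """
    if not won_arbitration:
        return False
    if not needs_data_bus:
        # Commands with no data phase only need the arbitration win.
        return True
    # Commands with a data phase also need an idle data bus and no pending
    # ordered response; the agent would then assert DS and send its data.
    return not ds_asserted and not ord_asserted


print(may_issue_command(True, True, ds_asserted=False, ord_asserted=True))   # False
print(may_issue_command(True, False, ds_asserted=True, ord_asserted=True))   # True
```

Checking ORD as well as DS keeps a new data transfer from slipping in ahead of an ordered read response that has been promised but has not yet started driving the data bus.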

Referring now to FIG. 10, in the preferred embodiment of the present invention multiple processors exist, each residing on a card which fits into a slot in the backplane that makes up the bus of the system. In order for each processor to be able to resolve arbitration, it must know which other processors are in the arbitration group. Thus, each processor in an arbitration group must be able to discern each other processor in the arbitration group. This is handled in the preferred embodiment of the present invention by having each processor output a distinct arbitration signal when it wishes to arbitrate to gain access to the bus and/or memory.

To arbitrate for bus access, each processor asserts the first signal line on its card and inputs the remaining arbitration signal lines. Then, in one embodiment of the present invention, because the backplane connector signal lines are rotated, each card asserts its own respective signal line across the backplane to all the other cards on the bus. This is shown in FIG. 10 by having processor card 29 assert signal line number 0 when it wishes to arbitrate for access, and input lines 1 through 9 to determine which other processors are vying for access. Processor card 29 plugs into connector 31 of backplane P2. Connector 31 has ten signal lines, each of which is rotated one position as compared to its neighboring slot connector. Thus, in the table to the right of FIG. 10, processor card slot number 0 has the first signal line connected to connector 31, wire 0, while processor card slot number 1 has the first signal line connected to connector 31, wire 1, etc. In this way, each processor/card can assert the same signal line on the processor/card yet othe