WikiPatents - Community Patent Review
Transaction activation processor for controlling memory transaction processing in a packet switched cache coherent multiprocessor system    
United States Patent 5905998
Link to this page: http://www.wikipatents.com/5905998.html
Inventor(s): Ebrahim; Zahir (Mountain View, CA); Nishtala; Satyanarayana (Cupertino, CA); Van Loo; William C. (Palo Alto, CA); Normoyle; Kevin (San Jose, CA); Loewenstein; Paul (Palo Alto, CA); Coffin, III; Louis F. (San Jose, CA)
Abstract: A multiprocessor computer system has a multiplicity of sub-systems and a main memory coupled to a system controller. Some of the sub-systems are data processors, each having a respective cache memory that stores multiple blocks of data and a respective set of master cache tags (Etags), including one Etag for each data block stored by the cache memory. Each data processor includes an interface for sending memory transaction requests to the system controller and for receiving cache transaction requests from the system controller corresponding to memory transaction requests by other ones of the data processors. The system controller includes transaction activation logic for activating each said memory transaction request when it meets predefined activation criteria, and for blocking each said memory transaction request until the predefined activation criteria are met. An active transaction status table stores status data representing memory transaction requests that have been activated, including an address value for each activated transaction. The transaction activation logic includes comparator logic for comparing each memory transaction request with the active transaction status data for all activated memory transaction requests so as to detect whether activation of a particular memory transaction request would violate the predefined activation criteria. With certain exceptions concerning writeback transactions, an incoming transaction for accessing a data block that maps to the same cache line as a pending, previously activated transaction will be blocked until the pending transaction that maps to the same cache line is completed.
   














Inventor     Ebrahim; Zahir (Mountain View, CA); Nishtala; Satyanarayana (Cupertino, CA); Van Loo; William C. (Palo Alto, CA); Normoyle; Kevin (San Jose, CA); Loewenstein; Paul (Palo Alto, CA); Coffin, III; Louis F. (San Jose, CA)
Owner/Assignee     Sun Microsystems, Inc. (Mountain View, CA)
Publication Date     May 18, 1999
Application Number     08/858,792
Filing Date     May 19, 1997
US Classification     711/144 711/121 711/143 711/145 711/146
Int'l Classification     G06F 12/08
Examiner     Chan; Eddie P.
Assistant Examiner     Kim; Hong C.
Attorney/Law Firm     Caserza; Steven F.; Flehr Hohbach Test Albritton & Herbert
Parent Case     This application is a continuation of patent application Ser. No. 08/414,772, filed Mar. 31, 1995, now U.S. Pat. No. 5,655,100.
Priority Data    
USPTO Field of Search     711/144 711/121 711/146 711/143 711/150 711/145 711/120 711/141
References

U.S. References

Patent No.   Inventor    U.S. Class   Date
5504874      Galles      711/145      Apr 1996
5490261      Bean        711/121      Feb 1996
5442758      Slingwine   707/8        Aug 1995
5434993      Liencres    -            Jul 1995
5432918      Stamm       -            Jul 1995
5428761      Herlihy     711/130      Jun 1995
5222224      Flynn       711/144      Jun 1993
Claims
 


What is claimed is:

1. A computer system, comprising:

a system controller;

a multiplicity of sub-systems coupled to the system controller;

a main memory coupled to the system controller; and

a plurality of the sub-systems comprising data processors, a plurality of the data processors each having a respective cache memory that stores multiple blocks of data and a respective set of cache tags, including one cache tag for each data block stored by the cache memory;

each of the plurality of data processors including an interface, coupled to the system controller, for sending memory transaction requests to the system controller; the interface for each of the data processors that has a cache memory including circuitry for receiving cache transaction requests from the system controller corresponding to memory transaction requests by other ones of the data processors; each memory transaction request having an associated address value;

the system controller including:

transaction activation logic for activating each memory transaction request when it meets predefined activation criteria, and for blocking each memory transaction request until the predefined activation criteria are met; wherein the predefined activation criteria include an address conflict criterion that is a function of the address value associated with the memory transaction request and the address value of activated memory transaction requests;

an active transaction status table that stores active transaction status data representing memory transaction requests which have been activated by the transaction activation logic, the active transaction status data including data for each activated transaction representing an address value associated with the transaction; the active transaction status data including data representing memory transaction requests received from the plurality of data processors; and

memory transaction request logic for processing the memory transaction request after it has been activated by the transaction activation logic;

the transaction activation logic including parallel comparison logic for simultaneously comparing one not-yet-activated memory transaction request with the stored active transaction status data for all activated memory transaction requests so as to detect whether activation of the one memory transaction request would violate the predefined activation criteria with respect to any of the activated memory transaction requests;

wherein the transaction activation logic blocks all transaction requests by any of the data processors that violate the predefined activation criteria with respect to any memory transaction that has already been activated.

2. The computer system of claim 1,

the comparison logic blocking the transaction activation logic from activating the one memory transaction request when (A) a cache index portion of the address value associated with the memory transaction request matches a corresponding portion of the address value associated with any of the activated memory transaction requests represented in the active transaction status table, without regard to which data processors sent the activated memory transaction requests to the system controller, unless (B) the memory transaction request and the activated memory transaction request with the matching cache index comprise a read transaction request and a writeback transaction request.

3. The computer system of claim 1,

the interface for the data processor including at least two parallel outbound request queues for storing memory transaction requests to be sent to the system controller; the memory transaction requests including read requests and write requests; each memory transaction request specifying an address for an associated data block to be read or written;

wherein the system controller processes the memory transaction requests stored in the outbound request queues in accordance with resource availability such that a first of the memory transaction requests stored in one of the outbound request queues may be processed later by the system controller than a second of the memory transaction requests stored in a second of the outbound request queues even when the second memory transaction request is stored in the second outbound request queue later than when the first memory transaction request is stored in the first outbound request queue.

4. The computer system of claim 3,

the data processors each storing read requests in a first of the outbound request queues and storing writeback requests in a second of the outbound request queues such that the system controller can process in parallel a read request and a writeback request from a same one of the data processors and that both reference address values having matching cache index values.

5. A method of operating a cache coherent multiprocessor system having a system controller coupled to a main memory and to a plurality of data processors each of which has a cache memory, comprising the steps of:

sending memory transaction requests from each of the plurality of data processors to the system controller, each memory transaction request having an associated address value;

activating the memory transaction request when it meets predefined activation criteria, and blocking the memory transaction request until the predefined activation criteria are met; wherein the predefined activation criteria include an address conflict criterion that is a function of the address value associated with the memory transaction request and the address value associated with activated memory transaction requests;

storing active transaction status data representing memory transaction requests which have been activated, the active transaction status data including data for each activated transaction representing an address value associated with the transaction; the active transaction status data including data representing memory transaction requests received from the plurality of data processors; and

processing the memory transaction request after it has been activated by the transaction activation logic;

the activating step including simultaneously comparing one not-yet-activated memory transaction request with the stored active transaction status data for all activated memory transaction requests so as to detect whether activation of the one memory transaction request would violate the predefined activation criteria with respect to any of the activated memory transaction requests;

wherein the activating step includes blocking all transaction requests by any of the data processors that violate the predefined activation criteria with respect to any memory transaction that has already been activated.

6. The method of claim 5,

the comparing step blocking the activating step from activating the memory transaction request when (A) a cache index portion of the address value associated with the memory transaction request matches a corresponding portion of the address value associated with any of the activated memory transaction requests represented in the active transaction status table, without regard to which data processors sent the activated memory transaction requests to the system controller, unless (B) the memory transaction request and the activated memory transaction request with the matching cache index comprise a read transaction request and a writeback transaction request.

7. The method of claim 5,

at each data processor, storing the memory transaction requests to be sent to the system controller in at least two parallel outbound request queues associated with that data processor; the memory transaction requests including read requests and write requests; each memory transaction request specifying an address for an associated data block to be read or written;

at each data processor, storing a respective set of cache tags, including one cache tag for each data block stored in the data processor's cache memory;

the processing step includes processing the memory transaction requests stored in the outbound request queues in accordance with resource availability such that a first of the memory transaction requests stored in one of the outbound request queues may be processed later by the system controller than a second of the memory transaction requests stored in a second of the outbound request queues even when the second memory transaction request is stored in the second outbound request queue later than when the first memory transaction request is stored in the first outbound request queue.

8. The method of claim 7,

at each data processor, storing read requests in a first of the outbound request queues and storing writeback requests in a second of the outbound request queues such that the system controller can process in parallel a read request and a writeback request from a same one of the data processors and that both reference address values having matching cache index values.
Description
 


The present invention relates generally to multiprocessor computer systems in which the processors share memory resources, and particularly to a multiprocessor computer system that utilizes an interconnect architecture and cache coherence methodology to minimize memory access latency so as to maximize computational throughput.

BACKGROUND OF THE INVENTION

The need to maintain "cache coherence" in multiprocessor systems is well known. Maintaining "cache coherence" means, at a minimum, that whenever data is written into a specified location in a shared address space by one processor, the caches for any other processors which store data for the same address location are either invalidated, or updated with the new data.

There are two primary system architectures used for maintaining cache coherence. One, herein called the cache snoop architecture, requires that each data processor's cache include logic for monitoring a shared address bus and various control lines so as to detect when data in shared memory is being overwritten with new data, determining whether its data processor's cache contains an entry for the same memory location, and updating its cache contents and/or the corresponding cache tag when data stored in the cache is invalidated by another processor. Thus, in the cache snoop architecture, every data processor is responsible for maintaining its own cache in a state that is consistent with the state of the other caches.

In a second cache coherence architecture, herein called the memory reference architecture, main memory includes a set of status bits for every block of data that indicate which data processors, if any, have the data block stored in cache. The main memory's status bits may store additional information, such as which processor is considered to be the "owner" of the data block if the cache coherence architecture requires storage of such information.

The present invention utilizes a different cache coherence architecture, in which each data processor that has a cache memory maintains a master cache index and a system controller maintains a duplicate cache index for each such cache memory. For each memory transaction by a data processor in which there is a cache miss or other state change requiring communication with either main memory or other cache memories in order to maintain cache coherence, the System Controller performs a cache index lookup on all the duplicate cache indexes and, based on the results of those lookups, selects the sequence of actions needed to perform the memory transaction.
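
The duplicate cache index lookup described above can be illustrated with a minimal sketch. It is not the logic of the preferred embodiment: the names `DtagArray` and `lookup_all`, the direct-mapped organization, and the line size and line count are all assumptions chosen only to keep the example small.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64   # assumed cache line size (not stated in this section)
NUM_LINES = 8     # assumed number of lines per cache (hypothetical)


def cache_index(addr: int) -> int:
    """Cache line index for an address, assuming a direct-mapped cache."""
    return (addr // LINE_BYTES) % NUM_LINES


def cache_tag(addr: int) -> int:
    """High-order address bits that a tag would store."""
    return addr // (LINE_BYTES * NUM_LINES)


@dataclass
class DtagArray:
    """Duplicate cache index kept by the system controller for one cache."""
    tags: dict = field(default_factory=dict)  # index -> (tag, state)

    def install(self, addr: int, state: str) -> None:
        self.tags[cache_index(addr)] = (cache_tag(addr), state)

    def lookup(self, addr: int) -> str:
        entry = self.tags.get(cache_index(addr))
        if entry is None or entry[0] != cache_tag(addr):
            return "I"  # no matching entry: treat as Invalid
        return entry[1]


def lookup_all(dtags: dict, addr: int) -> dict:
    """Return {port_id: state} for every cache whose duplicate index holds the block."""
    hits = {}
    for port, array in dtags.items():
        state = array.lookup(addr)
        if state != "I":
            hits[port] = state
    return hits
```

With the result of `lookup_all` in hand, a controller could decide whether the data should come from main memory or from another cache; that decision step is not modeled here.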

In prior art "snoop bus" cache coherence systems, it is impossible for two overlapping memory transactions to simultaneously access the same address because only one cache coherent memory transaction can be performed at a time, and that transaction is broadcast to all data processors so that they can "snoop" on the address and control busses and thereby keep their local cache memories consistent with memory transactions performed by other data processors.

In prior art memory reference architecture cache coherent systems it is also impossible for two overlapping memory transactions to simultaneously access the same address because the memory reference logic inherently serializes such memory transactions.

In the "duplicate cache tag" architecture of the present invention in which multiple data processors can initiate memory transactions and in which the interconnect can process multiple memory transactions simultaneously, there needs to be a mechanism to avoid "coherence hazards." In particular, the pipelined execution of transactions in the present invention results in multiple transactions being active simultaneously in the System Controller. This would lead to coherence hazards in the system if multiple active transactions shared the same cache index in the Dtags. To avoid such hazards, the System Controller utilizes special transaction activation logic that blocks a first memory transaction from becoming active if a second memory transaction that is already active is using the same cache index as would be used by the first memory transaction. One important exception to this transaction activation blocking is that writeback transactions do not need to be blocked and do not cause other transactions to be blocked.
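
The blocking rule just described can be sketched in software as follows. This is an illustration under stated assumptions, not the circuit of FIG. 13: the class `ActivationLogic`, the transaction kind labels, and the line size and line count are hypothetical, and the sequential loop stands in for what the claims describe as a parallel comparison against all active transactions. The writeback exception follows the wording of the paragraph above (writebacks are neither blocked nor cause blocking).

```python
from dataclasses import dataclass

LINE_BYTES = 64   # assumed cache line size (hypothetical)
NUM_LINES = 8     # assumed number of cache lines (hypothetical)


def cache_index(addr: int) -> int:
    """Cache index portion of an address, assuming a direct-mapped cache."""
    return (addr // LINE_BYTES) % NUM_LINES


@dataclass(frozen=True)
class Transaction:
    port: int
    kind: str   # "read", "write" or "writeback" (labels are assumptions)
    addr: int


class ActivationLogic:
    """Software stand-in for the transaction activation logic."""

    def __init__(self) -> None:
        self.active = []  # plays the role of the active transaction status table

    def conflicts(self, txn: Transaction) -> bool:
        # Writebacks are never blocked.
        if txn.kind == "writeback":
            return False
        # Active writebacks do not block others; any other active
        # transaction with the same cache index does.
        return any(
            a.kind != "writeback" and cache_index(a.addr) == cache_index(txn.addr)
            for a in self.active
        )

    def request(self, txn: Transaction) -> bool:
        """Activate txn if allowed; return False if it must stay blocked."""
        if self.conflicts(txn):
            return False
        self.active.append(txn)
        return True

    def complete(self, txn: Transaction) -> None:
        """Retire a finished transaction so blocked requests can be retried."""
        self.active.remove(txn)
```

A blocked request is simply retried by the caller once a conflicting transaction completes; the sketch does not model how the hardware queues blocked requests.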

SUMMARY OF THE INVENTION

In summary, the present invention is a multiprocessor computer system that has a multiplicity of sub-systems and a main memory coupled to a system controller. An interconnect module interconnects the main memory and sub-systems in accordance with interconnect control signals received from the system controller.

At least two of the sub-systems are data processors, each having a respective cache memory that stores multiple blocks of data and a respective set of master cache tags (Etags), including one cache tag for each data block stored by the cache memory.

Each data processor includes a master interface for sending memory access requests to the system controller and for receiving cache access requests from the system controller corresponding to memory access requests by other ones of the data processors. The system controller includes memory access request logic for processing each memory access request by a data processor, for determining which one of the cache memories and main memory to couple to the requesting data processor, for sending corresponding interconnect control signals to the interconnect module so as to couple the requesting data processor to the determined one of the cache memories and main memory, and for sending a reply message to the requesting data processor to prompt the requesting data processor to transmit or receive one data packet to or from the determined one of the cache memories and main memory.

The system controller includes transaction activation logic for activating each memory transaction request when it meets predefined activation criteria, and for blocking each memory transaction request until the predefined activation criteria are met. An active transaction status table stores status data representing memory transaction requests that have been activated, including an address value for each activated transaction. The transaction activation logic includes comparator logic for comparing each memory transaction request with the active transaction status data for all activated memory transaction requests so as to detect whether activation of a particular memory transaction request would violate the predefined activation criteria. With certain exceptions concerning writeback transactions, an incoming transaction for accessing a data block that maps to the same cache line as a pending, previously activated transaction, will be blocked until the pending transaction that maps to the same cache line is completed.

BRIEF DESCRIPTION OF THE DRAWINGS

Additional objects and features of the invention will be more readily apparent from the following detailed description and appended claims when taken in conjunction with the drawings, in which:

FIG. 1 is a block diagram of a computer system incorporating the present invention.

FIG. 2 is a block diagram of a computer system showing the data bus and address bus configuration used in one embodiment of the present invention.

FIG. 3 depicts the signal lines associated with a port in a preferred embodiment of the present invention.

FIG. 4 is a block diagram of the interfaces and port ID register found in a port in a preferred embodiment of the present invention.

FIG. 5 is a block diagram of a computer system incorporating the present invention, depicting request and data queues used while performing data transfer transactions.

FIG. 6 is a block diagram of the System Controller Configuration register used in a preferred embodiment of the present invention.

FIG. 7 is a block diagram of a caching UPA master port and the cache controller in the associated UPA module.

FIGS. 8, 8A, 8B, 8C, and 8D show a simplified flow chart of typical read/write data flow transactions in a preferred embodiment of the present invention.

FIG. 9 depicts the writeback buffer and Dtag Transient Buffers used for handling coherent cache writeback operations.

FIGS. 10A, 10B, 10C, 10D and 10E show the data packet formats for various transaction request packets.

FIG. 11 is a state transition diagram of the cache tag line states for each cache entry in an Etag array in a preferred embodiment of the present invention.

FIG. 12 is a state transition diagram of the cache tag line states for each cache entry in a Dtag array in a preferred embodiment of the present invention.

FIG. 13 depicts the logic circuitry for activating transactions.

FIGS. 14A-14D are block diagrams of status information data structures used by the system controller in a preferred embodiment of the present invention.

FIG. 15 is a block diagram of the Dtag lookup and update logic in the system controller in a preferred embodiment of the present invention.

FIG. 16 is a block diagram of the S_Request and S_Reply logic in the system controller in a preferred embodiment of the present invention.

FIG. 17 is a block diagram of the datapath scheduler in a preferred embodiment of the present invention.

FIG. 18 is a block diagram of the S_Request and S_Reply logic in the system controller in a second preferred embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The following is a glossary of terms used in this document.

Cache Coherence: keeping all copies of each data block consistent.

Tag: a tag is a record in a cache index for indicating the status of one cache line and for storing the high order address bits of the address for the data block stored in the cache line.

Etag: the primary array of cache tags for a cache memory. The Etag array is accessed and updated by the data processor module in a UPA port.

Dtag: a duplicate array of cache tags maintained by the system controller.

Interconnect: The set of system components that interconnect data processors, I/O processors and their ports. The "interconnect" includes the system controller 110, interconnect module 112, data busses 116, address busses 114, and reply busses 120 (for S_REPLY's), 122 (for P_REPLY's) in the preferred embodiment.

Victim: a data block displaced from a cache line.

Dirty Victim: a data block that was updated by the associated data processor prior to its being displaced from the cache by another data block. Dirty victims must normally be written back to main memory, except that in the present invention the writeback can be canceled if the same data block is invalidated by another data processor prior to the writeback transaction becoming "Active."

Line: the unit of memory in a cache memory used to store a single data block.

Invalidate: changing the status of a cache line to "invalid" by writing the appropriate status value in the cache line's tag.

Master Class: an independent request queue in the UPA port for a data processor. A data processor having a UPA port with K master classes can issue transaction requests in each of the K master classes. Each master class has its own request FIFO buffer for issuing transaction requests to the System Controller as well as its own distinct inbound data buffer for receiving data packets in response to transaction requests and its own outbound data buffer for storing data packets to be transmitted.

Writeback: copying modified data from a cache memory into main memory.
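
The "Master Class" entry above can be pictured with a small sketch. It is only an illustration: the class name `UpaPort` and its methods are hypothetical, and only the per-class request FIFOs are modeled (the per-class inbound and outbound data buffers are omitted).

```python
from collections import deque


class UpaPort:
    """Hypothetical model of a port with K independent master classes."""

    def __init__(self, num_classes: int) -> None:
        # One request FIFO per master class.
        self.request_fifos = [deque() for _ in range(num_classes)]

    def issue(self, master_class: int, request: str) -> None:
        """Queue a transaction request in the given master class."""
        self.request_fifos[master_class].append(request)

    def next_request(self, master_class: int):
        """Take the oldest request of one class, or None if that class is empty."""
        fifo = self.request_fifos[master_class]
        return fifo.popleft() if fifo else None
```

Because each class has its own FIFO, order is preserved within a class, while a request in one class can be taken before an older request in another class.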

The following is a list of abbreviations used in this document:

DVMA: direct virtual memory access (same as DMA, direct memory access for purposes of this document)

DVP: dirty victim pending

I/O: input/output

IVP: Invalidate me Advisory

MOESI: the five Etag states: Exclusive Modified (M), Shared Modified (O), Exclusive Clean (E), Shared Clean (S), Invalid (I).

MOSI: the four Dtag states: Exclusive and Potentially Modified (M), Shared Modified (O), Shared Clean (S), Invalid (I).

NDP: no data tag present

PA[xxx]: physical address [xxx]

SC: System Controller

UPA: Universal Port Architecture
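
The two sets of tag states listed above can be written out as enumerations. The state names come from the MOESI and MOSI entries; the mapping from Etag states to Dtag states is an assumption inferred from the Dtag M state being described as "Exclusive and Potentially Modified," and is not stated in this section.

```python
from enum import Enum


class Etag(Enum):
    M = "Exclusive Modified"
    O = "Shared Modified"
    E = "Exclusive Clean"
    S = "Shared Clean"
    I = "Invalid"


class Dtag(Enum):
    M = "Exclusive and Potentially Modified"
    O = "Shared Modified"
    S = "Shared Clean"
    I = "Invalid"


# Assumed correspondence: both exclusive Etag states collapse into Dtag M,
# since the duplicate tags cannot tell whether an exclusive line was modified.
ETAG_TO_DTAG = {
    Etag.M: Dtag.M,
    Etag.E: Dtag.M,
    Etag.O: Dtag.O,
    Etag.S: Dtag.S,
    Etag.I: Dtag.I,
}


def dtag_for(etag: Etag) -> Dtag:
    """Dtag state that would mirror a given Etag state under the assumed mapping."""
    return ETAG_TO_DTAG[etag]
```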

Referring to FIG. 1, there is shown a multiprocessor computer system 100 incorporating the computer architecture of the present invention. The multiprocessor computer system 100 includes a set of "UPA modules." UPA modules 102 include data processors as well as slave devices such as I/O handlers and the like. Each UPA module 102 has a port 104, herein called a UPA port, where "UPA" stands for "universal port architecture." For simplicity, UPA modules and their associated ports will often be called, collectively, "ports" or "UPA ports," with the understanding that the port or UPA port being discussed includes both a port and its associated UPA module.

The system 100 further includes a main memory 108, which may be divided into multiple memory banks 109 (Bank_0 to Bank_m), a system controller 110, and an interconnect module 112 for interconnecting the ports 104 and main memory 108. The interconnect module 112, under the control of datapath setup signals from the System Controller 110, can form a datapath between any port 104 and any other port 104 or between any port 104 and any memory bank 109. The interconnect module 112 can be as simple as a single, shared data bus with selectable access ports for each UPA port and memory module, or can be a somewhat more complex crossbar switch having m ports for m memory banks and n ports for n UPA ports, or can be a combination of the two. The present invention is not dependent on the type of interconnect module 112 used, and thus the present invention can be used with many different interconnect module configurations.

A UPA port 104 interfaces with the interconnect module 112 and the system controller 110 via a packet switched address bus 114 and packet switched data bus 116 respectively, each of which operates independently. A UPA module logically plugs into a UPA port. The UPA module 102 may contain a data processor, an I/O controller with interfaces to I/O busses, or a graphics frame buffer. The UPA interconnect architecture in the preferred embodiment supports up to thirty-two UPA ports, and multiple address and data busses in the interconnect. Up to four UPA ports 104 can share the same address bus 114, and arbitrate for its mastership with a distributed arbitration protocol.

The System Controller 110 is a centralized controller and performs the following functions:

Coherence control;

Memory and Datapath control; and

Address crossbar-like connectivity for multiple address busses.

The System Controller 110 controls the interconnect module 112, and schedules the transfer of data between two UPA ports 104, or between UPA port 104 and memory 108. The architecture of the present invention supports an arbitrary number of memory banks 109. The System Controller 110 controls memory access timing in conjunction with datapath scheduling for maximum utilization of both resources.

The System Controller 110, the interconnect module 112, and memory 108 are in the "interconnect domain," and are coupled to UPA modules 102 by their respective UPA ports 104. The interconnect domain is fully synchronous with a centrally distributed system clock signal, generated by a System Clock 118, which is also sourced to the UPA modules 102. If desired, each UPA module 102 can synchronize its private internal clock with the system interconnect clock. All references to clock signals in this document refer to the system clock, unless otherwise noted.

Each UPA address bus 114 is a 36-bit bidirectional packet switched request bus, and includes 1-bit odd-parity. It carries address bits PA[40:4] of a 41-bit physical address space as well as transaction identification information.
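
The address field and parity convention just described can be illustrated as follows. This is a sketch only: the function names are hypothetical, and the way the address and transaction identification bits are packed onto the bus is not covered by this paragraph and is not modeled. PA[40:4] is 37 bits wide, so if the parity bit is counted among the 36 bus lines, a request cannot fit in a single bus cycle.

```python
PA_BITS = 41  # width of the physical address space


def pa_40_4(addr: int) -> int:
    """Return bits [40:4] of a 41-bit physical address (16-byte granularity)."""
    if not 0 <= addr < (1 << PA_BITS):
        raise ValueError("address outside the 41-bit physical address space")
    return addr >> 4


def odd_parity_bit(payload: int) -> int:
    """Parity bit that makes the total number of 1 bits (payload plus parity) odd."""
    if payload < 0:
        raise ValueError("payload must be non-negative")
    ones = bin(payload).count("1")
    return 0 if ones % 2 == 1 else 1
```

Dropping the low four address bits means requests address 16-byte units, which matches the 128-bit width of the data bus.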

Referring to FIGS. 1 and 2, there may be multiple address busses 114 in the system, with up to four UPA ports 104 on each UPA address bus 114. The precise number of UPA address busses is variable, and will generally be dependent on system speed requirements. Since putting more ports on an address bus 114 will slow signal transmissions over the address bus, the maximum number of ports per address bus will be determined by the signal transmission speed required for the address bus.
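
As a worked example of the limits stated above (up to four UPA ports per address bus, and up to thirty-two UPA ports in the preferred embodiment), the minimum number of address busses for a given port count can be computed as below. The function name is hypothetical, and the sketch gives only the lower bound; a real design might use more busses to meet signal speed requirements.

```python
MAX_PORTS_PER_BUS = 4   # from the text: up to four UPA ports per address bus
MAX_PORTS = 32          # from the text: up to thirty-two UPA ports


def min_address_busses(num_ports: int, ports_per_bus: int = MAX_PORTS_PER_BUS) -> int:
    """Smallest number of address busses that can serve num_ports ports."""
    if not 1 <= num_ports <= MAX_PORTS:
        raise ValueError("port count must be between 1 and 32")
    if not 1 <= ports_per_bus <= MAX_PORTS_PER_BUS:
        raise ValueError("ports per bus must be between 1 and 4")
    return -(-num_ports // ports_per_bus)  # ceiling division
```

For example, a fully populated system of thirty-two ports needs at least eight address busses, and lowering the number of ports per bus to speed up signaling raises that count.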

The datapath circuitry (i.e., the interconnect module 112) and the address busses 114 are independently scaleable. As a result, the number of address busses can be increased, or decreased, for a given number of processors so as to optimize the speed/cost tradeoff for the transmission of transaction requests over the address busses totally independently of decisions regarding the speed/cost tradeoffs associated with the design of the interconnect module 112.

FIG. 3 shows the full set of signals received and transmitted by a UPA port having all four interfaces (described below) of the preferred embodiment. Table 1 provides a short description of each of the signals shown in FIG. 3.

TABLE 1
UPA Port Interface Signal Definitions
______________________________________
Signal Name
    Description
______________________________________

Data Bus Signals

UPA_Databus[128]
    128-bit data bus. Depending on speed requirements and the bus technology used, a system can have as many as one 128-bit data bus for each UPA port, or each data bus can be shared by several ports.
UPA_ECC[16]
    Bus for carrying error correction codes. UPA_ECC<15:8> carries the ECC for UPA_Databus<127:64>. UPA_ECC<7:0> carries the ECC for UPA_Databus<63:0>.
UPA_ECC_Valid
    ECC valid. A unidirectional signal from the System Controller to each UPA port, driven by the System Controller to indicate whether the ECC is valid for the data on the data bus.

Address Bus Signals

UPA_Addressbus[36]
    36-bit packet switched transaction request bus. See packet format in FIGS. 9A, 9B, 9C.
UPA_Req_In[3]
    Arbitration request lines for up to three other UPA ports that might be sharing this UPA_Addressbus.
UPA_Req_Out
    Arbitration request from this UPA port.
UPA_SC_Req_In
    Arbitration request from System Controller.
UPA_Arb_Reset_L
    Arbitration Reset, asserted at the same time that UPA_Reset_L is asserted.
UPA_AddrValid
    There is a separate, bidirectional, address valid signal line between the System Controller and each UPA port. It is driven by the port which wins the arbitration or by the System Controller when it drives the address bus.
UPA_Data_Stall
    Data stall signal, driven by the System Controller to each UPA port to indicate, during transmission of a data packet, whether there is a data stall in between quad-words of a data packet.

Reply Signals

UPA_P_Reply[5]
    Port's reply packet, driven by a UPA port directly to the System Controller. There is a dedicated UPA_P_Reply bus for each UPA port.
UPA_S_Reply[6]
    System Controller's reply packet, driven by System Controller directly to the UPA port. There is a dedicated UPA_S_Reply bus for each UPA port.

Miscellaneous Signals

UPA_Port_ID[5]
    Five bit hardwired UPA Port Identification.
UPA_Reset_L
    Reset. Driven by System Controller at power-on and on any fatal system reset.
UPA_Sys_Clk[2]
    Differential UPA system clock, supplied by the system clock to all UPA ports.
UPA_CPU_Clk[2]
    Differential processor clock, supplied by the system clock controller only to processor UPA ports.
UPA_Speed[3]
    Used only for processor UPA ports, this hardwired three bit signal encodes the maximum speed at which the UPA port can operate.
UPA_IO_Speed
    Used only by IO UPA ports, this signal encodes the maximum speed at which the UPA port can operate.
UPA_Ratio
    Used only for processor UPA ports, this signal encodes the ratio of the system clock to the processor clock, and is used by the processor to internally synchronize the system clock and processor clock if it uses a synchronous internal interface.
UPA_JTAG[5]
    JTAG scan control signals, TDI, TMS, TCLK, TRST_L and TDO. TDO is output by the UPA port, the others are inputs.
UPA_Slave_Int_L
    Interrupt, for slave-only UPA ports. This is a dedicated line from the UPA port to the System Controller.
UPA_XIR_L
    XIR reset signal, asserted by the System Controller to signal XIR reset.
______________________________________

A valid packet on the UPA address bus 114 is identified by the driver (i.e., the UPA port 104 or the System Controller 110) asserting the UPA_Addr_valid signal.

The System Controller 110 is connected to each UPA address bus 114 in the system 100. The UPA ports 104 and System Controller 110 arbitrate for use of each UPA address bus 114 using a distributed arbitration protocol. The arbitration protocol is described in patent application Ser. No. 08/414,559, filed Mar. 31, 1995, now U.S. Pat. No. 5,710,891, which is hereby incorporated by reference.

UPA ports do not communicate directly with other UPA ports on a shared UPA address bus 114. Instead, when a requesting UPA port generates a request packet that requests access to an addressed UPA port, the System Controller 110 forwards a slave access to the addressed UPA port by retransmitting the request packet and qualifying the destination UPA port with its UPA_Addr_valid signal.

A UPA port also does not "snoop" on the UPA address bus to maintain cache coherence. The System Controller 110 performs snooping on behalf of those UPA ports whose respective UPA modules include cache memory, using a write-invalidate cache coherence protocol described below. The UPA address bus 114 and UPA data bus 116 coupled to any UPA port 104 are independent. An address is associated with its data through ordering rules discussed below.

The UPA data bus is a 128-bit quad-word bidirectional data bus, plus 16 additional ECC (error correction code) bits. A "word" is defined herein to be a 32-bit, 4-byte datum. A quad-word consists of four words, or 16 bytes. In some embodiments, all or some of the data busses 116 in the system 100 can be 64-bit double-word bidirectional data busses, plus 8 additional bits for ECC. The ECC bits are divided into two 8-bit halves for the 128-bit wide data bus. Although the 64-bit wide UPA data bus has half as many signal lines, it carries the same number of bytes per transaction as the 128-bit wide UPA data bus, but in twice the number of clock cycles. In the preferred embodiment, the smallest unit of coherent data transfer is 64 bytes, requiring four transfers of 16 bytes during four successive system clock cycles over the 128-bit UPA data bus.
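The cycle counts in the preceding paragraph follow directly from the bus width. A minimal sketch of that arithmetic is shown here; the function name is hypothetical, and the calculation ignores any data stalls.

```python
COHERENT_BLOCK_BYTES = 64  # smallest unit of coherent data transfer in the preferred embodiment


def cycles_per_block(bus_width_bits: int, block_bytes: int = COHERENT_BLOCK_BYTES) -> int:
    """Number of successive clock cycles needed to move one block, with no stalls."""
    if bus_width_bits not in (64, 128):
        raise ValueError("the text describes only 64-bit and 128-bit data busses")
    bytes_per_cycle = bus_width_bits // 8
    return block_bytes // bytes_per_cycle


if __name__ == "__main__":
    print(cycles_per_block(128))  # four transfers of 16 bytes
    print(cycles_per_block(64))   # same 64 bytes, twice the number of cycles
```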

A "master" UPA port, also called a UPA master port, is herein defined to be one which can initiate data transfer transactions. All data processor UPA modules must have a master UPA port 104.

Note that graphics devices, which may include some data processing capabilities, typically have only a slave interface. Slave interfaces are described below. For the purposes of this document, a "data processor" is defined to be a programmable computer or data processing device (e.g., a microprocessor) that both reads and writes data from and to main memory. Most, but not necessarily all, "data processors" have an associated cache memory. For instance, an I/O controller is a data processor and its UPA port will be a master UPA port. However, in many cases an I/O controller will not have a cache memory (or at least not a cache memory for storing data in the coherence domain).

A caching UPA master port is a master UPA port for a data processor that also has a coherent cache. The caching UPA master port participates in the cache coherence protocol.

A "slave" UPA port is herein defined to be one which cannot initiate data transfer transactions, but is the recipient of such transactions. A slave port responds to requests from the System Controller. A slave port has an address space associated with it for programmed I/O. A "slave port" within a master UPA port (i.e., a slave interface within a master UPA port) also handles copyback requests for cache blocks, and handles interrupt transactions in a UPA port which contains a data processor.

Each set of 8 ECC bits carries Shigeo Kaneda's 64-bit SEC-DED-S4ED code. The interconnect does not generate or check ECC. Each UPA port sourcing data generates the corresponding ECC bits, and the UPA port receiving the data checks the ECC bits. UPA ports with master capability support ECC. A slave-only UPA port containing a graphics framebuffer need not support ECC (see the UPA_ECC_Valid signal).
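The pairing of ECC bits with data bits given in Table 1 (ECC bits 15:8 cover data bits 127:64, ECC bits 7:0 cover data bits 63:0) can be sketched as a simple lane split. This example only separates the two lanes; it does not compute or check the SEC-DED-S4ED code itself, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def split_ecc_lanes(data128: int, ecc16: int):
    """Split one quad-word and its 16 ECC bits into (data64, ecc8) pairs.

    Returns (upper, lower), where upper pairs data bits 127:64 with ECC bits 15:8
    and lower pairs data bits 63:0 with ECC bits 7:0.
    """
    if data128 < 0 or data128 >> 128:
        raise ValueError("data128 must fit in 128 bits")
    if ecc16 < 0 or ecc16 >> 16:
        raise ValueError("ecc16 must fit in 16 bits")
    upper = ((data128 >> 64) & MASK64, (ecc16 >> 8) & 0xFF)
    lower = (data128 & MASK64, ecc16 & 0xFF)
    return upper, lower


if __name__ == "__main__":
    upper, lower = split_ecc_lanes((0xAAAA << 64) | 0xBBBB, 0x1234)
    print([hex(v) for v in upper], [hex(v) for v in lower])
```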

The UPA data bus 116 is not a globally shared common data bus. As shown in FIGS. 1 and 2, there may be more than one UPA data bus 116 in the system, and the precise number is implementation specific. Data is always transferred in units of 16 bytes per clock-cycle on the 128-bit wide UPA data bus, and in units of 16 bytes per two clock-cycles on the 64-bit wide UPA data bus.

The size of each cache line in the preferred embodiment is 64 bytes, or sixteen 32-bit words. As will be described below, 64 bytes is the minimum unit of data transfer for all transactions involving the transfer of cached data. That is, each data packet of cached data transferred via the interconnect is 64 bytes. Non-cached data transfers can carry 1 to 16 bytes within a single quad-word transmission, qualified with a 16-bit bytemask to indicate which bytes within the quad-word contain the data being transferred.
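The 16-bit bytemask for non-cached transfers can be illustrated as follows. This sketch assumes that mask bit i corresponds to the byte at offset i within the quad-word and that the transferred bytes are contiguous; neither the bit ordering nor contiguity is specified in this section, so both are assumptions, as is the function name.

```python
QUADWORD_BYTES = 16


def make_bytemask(offset: int, length: int) -> int:
    """Build a 16-bit mask with one bit set per byte carried in the quad-word.

    Assumes bit i of the mask marks the byte at offset i (an illustrative choice).
    """
    if not 1 <= length <= QUADWORD_BYTES:
        raise ValueError("a non-cached transfer carries 1 to 16 bytes")
    if offset < 0 or offset + length > QUADWORD_BYTES:
        raise ValueError("transfer must fit within a single quad-word")
    return ((1 << length) - 1) << offset


if __name__ == "__main__":
    print(f"{make_bytemask(4, 4):016b}")   # four bytes starting at byte offset 4
    print(f"{make_bytemask(0, 16):016b}")  # the full quad-word
```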

System Controller 110 schedules a data transfer on a UPA data bus 116 using a signal herein called the S_REPLY. For block transfers, if successive quad-words cannot be read or written in successive clock cycles from memory, the UPA_Data_Stall signal is asserted by System Controller 110 to the UPA port.

For coherent block read and copyback transactions of 64-byte data blocks, the quad-word (16 bytes) addressed on physical address bits PA[5:4] is delivered first, and the successive quad-words are delivered in the wrap order shown in Table 2. The addressed quad-word is delivered first so that the requesting data processor can receive and begin processing the addressed quad-word prior to receipt of the last quad-word in the associated data block. In this way, latency associated with the cache update transaction is reduced. Non-cached block reads and block writes of 64-byte data blocks are always aligned on a 64-byte block boundary (PA[5:4]=0x0).
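The role of PA[5:4] can be sketched as below. The example only extracts the index of the quad-word that is delivered first and checks the alignment condition quoted for non-cached block transfers; it deliberately does not reproduce the wrap order of the remaining quad-words, which is defined in Table 2. The function names are hypothetical.

```python
def first_quadword_index(physical_address: int) -> int:
    """Return PA[5:4], the index (0..3) of the quad-word delivered first."""
    if physical_address < 0:
        raise ValueError("physical address must be non-negative")
    return (physical_address >> 4) & 0x3


def is_block_aligned(physical_address: int) -> bool:
    """True when PA[5:4] == 0x0, the condition quoted for non-cached block transfers."""
    return first_quadword_index(physical_address) == 0


if __name__ == "__main__":
    for pa in (0x1000, 0x1020, 0x1230):
        print(hex(pa), first_quadword_index(pa), is_block_aligned(pa))
```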

Note that these 64-byte data packets are delivered without an attached address, address tag, or transaction tag. Address information and data are transmitted independently over independent busses. While this is efficient, in order to match up incoming data packets