WikiPatents - Community Patent Review
Bus control system and method that selectively generate an early address strobe    
United States Patent 5404464
Link to this page: http://www.wikipatents.com/5404464.html
Inventor(s): Bennett; Brian R. (Laguna Niguel, CA)
Abstract: An improved bus architecture system for use in a multi-processor computer system has a shared address bus and a shared data bus, and has at least two separate memory modules. The system reduces the bus latency time by allowing sequential address requests to different memory modules to begin before previous cycles are terminated. Preferably, the physical memory is mapped onto several separate memory modules which will increase the probability that concurrent address requests from different processors on the common bus are for different memory modules. The processor address determines which memory module contains the data for a new request. If the memory module addressed by the new request differs from the memory module addressed by the current request, the bus controller may issue an early address request for the new data. While the early address request for the new request is being processed, the current bus cycle for the data located in the first memory module is completed on the shared data bus. Thus, the bus latency in a tightly-coupled multi-processor system can be significantly reduced using the improved bus architecture.
   














Owner/Assignee: AST Research, Inc. (Irvine, CA)
Publication Date: April 4, 1995
Application Number: 08/016,726
Filing Date: February 11, 1993
US Classification: 710/306
Int'l Classification: G06F 012/00; G06F 013/14
Examiner: Ray; Gopal C.
Attorney/Law Firm: Knobbe, Martens, Olson & Bear
USPTO Field of Search: 395/400; 395/425; 395/325; 395/725; 395/200; 365/230.01; 365/230.03; 365/230.04; 365/230.05; 365/230.06

U.S. References

Patent No.   Inventor      US Class      Date
5261064      Wyland        711/211       Nov 1993
5226134      Aldereguia    711/5         Jul 1993
5214769      Uchida        711/151       May 1993
5197140      Balmer        711/220       Mar 1993
5140682      Okura         711/130       Aug 1992
5043883      Inouchi       711/169       Aug 1991
4969088      McAuliffe     709/241       Nov 1990
4916692      Clarke        370/451       Apr 1990
4797815      Moore         710/109       Jan 1989
4796232      House         365/189.03    Jan 1989
4669056      Waldecker     710/114       May 1987
4594657      Byrns         710/241       Jun 1986
4383297      Wheatley      710/3         May 1983
4371929      Brann         710/45        Feb 1983
4051551      Lawrie        711/168       Sep 1977
Claims
 


What is claimed is:

1. An improved bus control system for a multi-processor system, said multi-processor system having at least two memory modules, at least two processor modules, a common address bus and a common data bus connecting the memory modules to the processor modules, said control system comprising:

a bus control circuit on each of said processor modules, said bus control circuit further comprising:

a slot identification mapping circuit that determines a first slot identifier of a first one of the memory modules which contains a first address having data requested by a first data request and that determines a second slot identifier of a second one of the memory modules which contains a second address having data requested by a second data request;

a storage register that stores the first slot identifier;

a comparator that compares the first slot identifier with the second slot identifier; and

a first logic circuit that issues an early address request to the common address bus, if the first slot identifier differs from said second slot identifier.

2. A bus control system for a multi-processor system, said multi-processor system having multiple processor modules, at least two memory modules, a common address and a common data bus connecting the memory modules to the processor modules, said control system comprising:

a slot identification mapping circuit that determines a first slot identifier of a first one of the memory modules which contains a first address having data requested by a first data request and that determines a second slot identifier of a second one of the memory modules which contains a second address having data requested by a second data request;

a storage register that stores the first slot identifier;

a comparator that compares the first slot identifier with the second slot identifier;

a first logic circuit that issues an early address request to said common address bus if the first slot identifier differs from said second slot identifier; and

a second logic circuit that disables the issuance of the early address request by any one of the processor modules.

3. A method of improving the bus latency of a multi-processor system wherein multiple processors share a common address bus, a common data bus and a common control bus to multiple shared resources, said method comprising the steps of:

providing a first address on said common address bus;

determining in association with each processor that said first address is directed to a first shared resource;

initiating an access to said first shared resource;

generating a second address at a requesting processor prior to completion of said access to said first shared resource;

determining in association with said requesting processor that said second address is directed to a second shared resource different from said first shared resource, said step of determining that said second address is directed to a second shared resource different from said first shared resource further comprising the steps of:

determining a first resource identification from said first address;

saving said first resource identification;

determining a second resource identification from said second address;

comparing said second resource identification with said first resource identification; and

outputting a signal indicating that said second shared resource is different from said first shared resource when said second resource identification is different from said first resource identification; and

initiating an access to said second shared resource prior to completion of said access to said first shared resource.

4. An improved multi-processor system comprising:

at least two memory modules;

at least two processor modules, each of said processor modules comprising a local address bus, a local data bus, and a local control bus; and

a common address bus, a common data bus and a common control bus connecting the memory modules to the processor modules,

each of said processor modules further comprising:

a static random access memory with first and second data input ports and a data output port, wherein said first data input port is connected to said local address bus and said second data input port is connected to said local data bus;

a storage register with an input port and an output port, said input port connected to said data output port of said static random access memory;

a comparator with first and second input ports and an output, wherein said first input port of said comparator is connected to said output port of said static random access memory and said second input port of said comparator is connected to said output port of said storage register;

a first gate with first and second inputs and an output, wherein said first input of said first gate is connected to said output of said comparator and said second input of said first gate is connected to said output port of said static random access memory; and

a second gate with first and second inputs and an output, wherein said first input of said second gate is connected to said output of said first gate, said second input of said second gate is connected to a first control signal provided by said common control bus to indicate the availability of said common address bus and said output of said second gate is a second control signal which is sent to said common control bus to indicate that an early address request can be sent to said common address bus.

5. An improved multi-processor system comprising:

at least two memory modules;

at least two processor modules, each of said processor modules comprising a local address bus, a local data bus, and a local control bus; and

a common address bus, a common data bus and a common control bus connecting the memory modules to the processor modules,

each of said processor modules further comprising:

a static random access memory with first and second data input ports and a data output port, wherein said first data input port is connected to said local address bus and said second data input port is connected to said local data bus;

a storage register with an input port and an output port, said input port connected to said data output port of said static random access memory, wherein said output port of said static random access memory is divided into a first set and a second set of data bits, wherein said first set of data bits indicates the memory module which contains data that is addressed by the local address bus and wherein said second set of bits indicates if an early address request is desired;

a comparator with first and second input ports and an output, wherein said first input port of said comparator is connected to said output port of said static random access memory and said second input port of said comparator is connected to said output port of said storage register;

a first gate with first and second inputs and an output, wherein said first input of said first gate is connected to said output of said comparator and said second input of said first gate is connected to said output port of said static random access memory; and

a second gate with first and second inputs and an output, wherein said first input of said second gate is connected to said output of said first gate, said second input of said second gate is connected to a first control signal provided by said common control bus to indicate the availability of said common address bus and said output of said second gate is a second control signal which is sent to said common control bus to indicate that an early address request can be sent to said common address bus.

6. A bus control system for a computer system having multiple CPU modules and multiple shared resources, said control system comprising:

a shared bus for communicating address, data and control signals between said CPU modules and said shared resources;

a bus controller on each of said CPU modules for initiating accesses to said shared resources in response to addresses from said CPU modules, said bus controller comprising:

a decoder for decoding identifications of shared resources being addressed by addresses on said shared bus;

a storage device for storing a first identification of a first shared resource for which an access is in progress; and

a comparator for comparing said first identification with a second identification of a second shared resource for which an access is requested, said comparator providing an active output signal when said first identification and said second identification are different, said bus controller initiating an access to said second shared resource prior to completion of said access in progress in response to said active output signal.

7. A bus control system for a computer system having multiple CPU modules and multiple shared resources, said control system comprising:

a shared bus for communicating address, data and control signals between said CPU modules and said shared resources;

a bus controller on each of said CPU modules and said shared bus for initiating accesses to said shared resources in response to addresses from said CPU modules, said bus controller comprising:

a decoder for decoding identifications of shared resources being addressed by addresses on said shared bus, wherein said decoder comprises a memory which maps an address from said shared bus to an identification of a shared resource uniquely associated with said address;

a storage device for storing a first identification of a first shared resource for which an access is in progress; and

a comparator for comparing said first identification with a second identification of a second shared resource for which an access is requested, said comparator providing an active output signal when said first identification and said second identification are different, said bus controller initiating an access to said second shared resource prior to completion of said access in progress in response to said active output signal.
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an improved bus architecture which decreases the bus access time in a tightly coupled multi-processor system.

2. Description of the Related Art

In a tightly coupled multi-processor system, all of the processors in the system share a common address and data bus, as well as a common system memory. When a multi-processor system employs a single common bus for address and data transfers, the bus must be restricted to one transfer at a time. Therefore, when one processor is communicating with the memory, all other processors are either busy with their own internal operations or must be idle, waiting for the bus to become free. The time that the processor spends waiting idle for the bus to become available is referred to as bus latency.

In a conventional multi-processor, common-bus architecture, the address bus is needed only for a short time while the addressed memory unit decodes the memory request. The correct memory board will then latch the address from the address bus. The address bus remains idle for the remainder of the data transfer. The data transfer time may be quite long depending upon the type of memory storage unit. Once the memory delivers the data to the data bus, and the requesting device releases the system bus, the address and data busses are released and become available to the other processors.

During the period that one processor is using the bus, the other processors must wait for the data bus to become available in order to initiate a data transfer. As the number of processors increases, the number of bus accesses increases, and, therefore, the bus latency increases. Inherent in typical bus access cycles are periods during which one processor holds the bus while it waits for a reply signal. During this time, the processor is not using the address bus. Rather, it holds the bus to prevent other processors from accessing the bus until it receives a reply. The time that the bus is held but not active while waiting for an acknowledge signal is a principal cause of bus latency in multi-processor systems.

Some multi-processor systems use a split transaction bus in order to cut down on the time that the bus is being held. In the split transaction bus, the address and data bus operate independently, thus allowing multiple requests to be outstanding. The requestor of the bus activates an address request to the address bus. Once the addressed device (e.g., memory module) latches the address and provides an acknowledgement, the requestor releases the address bus for other address requests. When the data is available from the addressed device, the device acquires the bus and delivers the data to the requestor. The address bus is therefore available for other memory requests while the memory system delivers the requested data to the requestor. The split transaction bus method reduces bus latency; however, the complexity of the system is increased dramatically. The memory boards for a split transaction system require the ability to queue and possibly sort requests, and must be provided with bus controller capabilities. The queue capability requires additional memory space to store and queue the outstanding requests and additional control logic to implement the bus controller.

In addition, depending on the system protocol, the amount of time that is saved between bus requests may decrease with increased bus transaction time. The memory access cycle time in a split transaction bus is typically longer than in a single bus system because each cycle includes steps to perform the queuing and bus control functions. If the queuing and bus control steps take longer than the time saved between transactions, the benefits of the "time saving" split transaction bus can quickly diminish. Without the return of a substantial decrease in the overall system memory access time, the increase in the complexity of the system that is required to implement a split transaction bus is often not justified. Maintaining cache coherency further complicates the implementation of a split transaction bus architecture.

A seemingly simple approach to reduce bus latency would be to increase the clock speed of the bus controller. By increasing the clock speed, the time for memory access necessarily decreases. However, this is an expensive approach that may require use of emitter-coupled logic ("ECL") or other expensive materials in order to achieve the required increase in clock speeds.

Another attempt at reducing bus latency is the implementation of loosely-coupled processors. This approach has limited benefits in applications which may share common data structures. The level of bus arbitration will increase in order to resolve the multiple contention problems associated with a shared resource. The time spent on bus arbitration will reduce the overall time saved with loosely-coupled processors. Therefore, for shared resources, the system complexity increases, with little or no bus bandwidth increase.

SUMMARY OF THE INVENTION

The present invention is an improved bus architecture system for use in a tightly-coupled multi-processor system with memory storage separated into separate memory modules. The design of the present invention makes use of unused bus time and does not introduce complexities into the system which reduce the overall bandwidth increase.

The improved bus architecture of the present invention utilizes memory mapping, wherein the memory is mapped across several separate memory boards. In addition, the system provides concurrent outstanding address requests on the bus if these requests are for accesses to memory locations located on separate memory modules. The present invention decreases bus latency when any equivalent bus requests involve accesses to data from separate memory modules.

The improved bus control system of the present invention is preferably utilized in a multi-processor system having at least two memory modules, and having a common address and data bus connecting the memory modules to the processor modules. A preferred embodiment of the improved bus control system includes means for determining a first slot identifier of a first one of the memory modules which contains a first data request and means for storing the first slot identifier. In addition, the bus control system includes means for determining a second slot identifier for a second memory address request and means for comparing the first slot identifier with the second slot identifier. Preferably, a static random access memory (SRAM) is used to determine the first and second slot identifiers, and a storage register is used to store the first slot identifier. Desirably, a comparator is used to compare the first and second slot identifiers. If the first slot identifier differs from the second slot identifier, a means for issuing an early address request to the common bus is provided. Preferably, a simple logic circuit is used to issue an early address request to the common bus. Further, the preferred embodiment of the improved bus control system includes means for disabling the issuance of the early address request by any one of the processor modules if the feature is not desired.
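By way of illustration only, the decision described above may be modeled in software as follows. This is a minimal sketch and not the circuit of the invention: the class name EarlyAddressCheck, its fields and its methods are hypothetical, and the slot identifiers are assumed to have been determined already (for example, by the SRAM lookup).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyAddressCheck:
    """Hypothetical software model of the per-processor early address decision."""

    # Models the storage register: slot identifier of the request in progress.
    last_slot: Optional[int] = None
    # Models the means for disabling the feature on this processor module.
    enabled: bool = True

    def start_request(self, slot_id: int) -> None:
        """Record the slot identifier of the request now in progress."""
        self.last_slot = slot_id

    def may_issue_early(self, new_slot: int) -> bool:
        """Model the comparator: allow an early request only for a different slot."""
        if not self.enabled or self.last_slot is None:
            # Feature disabled, or nothing in progress to overlap with.
            return False
        return new_slot != self.last_slot


check = EarlyAddressCheck()
check.start_request(1)                  # first request goes to slot 1
different = check.may_issue_early(2)    # second request targets another module
same = check.may_issue_early(1)         # second request targets the same module
```

In this sketch a request to the same memory module simply waits for the current cycle to finish, which matches the conventional behavior described in the background.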

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a system block diagram of a multi-processor system which implements the bus architecture of the present invention.

FIG. 2 is a block diagram representing an individual CPU module and its communication channels with the address and data bus.

FIG. 3 is a timing diagram illustrating the normal read cycle for a snoop-miss data request on a multi-processor system without the bus architecture of the present invention.

FIG. 4 is a block diagram of one embodiment of the circuitry to implement the bus architecture of the present invention.

FIG. 5 is a timing diagram of the improved access time of a snoop-miss data request utilizing the bus architecture of the present invention.

FIG. 6 illustrates an example of a preferred cache line interleaving scheme employed in a system memory.

FIG. 7 is a block diagram of an address line exchange circuit according to the present invention.

FIG. 8 is a block diagram illustrating a memory addressing circuit to realize an interleaved memory mapping technique.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is an improved bus architecture system for use in a common-bus, multi-processor system 10 with shared memory (i.e., the processors have access to common memory and shared resources). FIG. 1 illustrates a conventional multi-processor system 10 which contains a number of CPU modules 12 (i.e., CPU MODULE #1, CPU MODULE #2, ... CPU MODULE #L) and a shared memory storage area containing a number of memory modules 14 (i.e., MEMORY MODULE #1, MEMORY MODULE #2, ... MEMORY MODULE #M). The CPU modules 12 and memory modules 14 are connected to a system address bus 16, a system data bus 18 and a system control bus 20 (collectively the "system bus" 22). The multi-processor system 10 may also include various I/O and peripheral modules (i.e., MISC I/O #1, MISC I/O #2...MISC I/O #N) 24 which are connected together along an I/O bus 26. A peripheral system controller or I/O service module (IOSM Module) 28 provides an interface between the system bus 22 and the I/O bus 26 to control the data flow between the peripheral devices 24 and the system bus 22.

In general, each memory module 14 comprises a plurality of random access memory (RAM) chips organized on a circuit board with accompanying decoder logic to read and decode the address requests. The storage capacity of each memory module 14 depends upon the number of RAM chips that are installed on the circuit board and the capacity of each RAM chip, as is well known in the art. Preferably, using standard memory mapping techniques, the system memory addresses are divided among the individual memory modules 14 based upon the memory capacity of each module 14. The memory map is generated on power-up by the system basic input/output system (BIOS). A preferred memory mapping technique for use with the bus bandwidth maximizer circuit of the present invention is described in more detail below.

FIG. 2 illustrates a block diagram of a typical CPU module 12 used in the multi-processor system 10 of FIG. 1. Preferably, the CPU module 12 comprises a microprocessor 30, a cache memory system ("CACHE") 32 with internal address decoder logic, a CPU or local processor bus 34, bus transceivers 35 and system bus controller logic 36. In accordance with the present invention, each CPU module 12 further has a bandwidth maximizer circuit 38.

In the present embodiment, the CPU modules 12 of the multi-processor system 10 utilize a typical snooping cache, as is well known in the art. When a data request is made, each cache 32 monitors the address to determine if the cache 32 contains the requested data. To allow the caches 32 to snoop each request, the bus transceivers 35 of each CPU module 12 read each address request from the system address bus 16. The address is transmitted via a local processor bus 34 to the cache 32 on the CPU module 12. If the cache 32 has the data for the requested address, a signal is sent to the system bus 22, commonly known as a snoop hit signal. When the system data bus 18 is available, the data is provided from the CPU cache 32 which detected a snoop hit to the data bus 18, as is well known in the art.

FIG. 3 illustrates a read cycle of a CPU module 12 of a multi-processor system 10 for a memory cycle with a snoop miss (i.e., the requested data was not in a cache 32). A CPU1 initiates a BUSREQUEST signal (not shown) to a system bus arbitrator. When the system data bus 18 is available, the bus arbitrator returns a BUSGRANT signal (not shown) to the CPU1. The cycle timing of FIG. 3 begins after the BUSGRANT signal. As illustrated in FIG. 3, once the CPU1 receives the BUSGRANT signal, the CPU1 holds the system address bus 16 in a first clock cycle by driving the address bus available line (ABUSAVL) 40 low. Also during clock cycle 1, the system address from CPU1 is presented to the system address bus 16 via the SADDRxx lines 42. After the CPU1 has asserted the address on the system address bus 16, the CPU1 asserts the system address strobe, SADDS- line 44, in the second clock cycle. The address is valid from clock cycle 2 through clock cycle 5. The devices which latch the address do so in response to the signal of the SADDS- signal line 44. The memory modules 14 and the CPU caches 32 then determine if they contain the requested data. While each CPU cache 32 and memory module 14 determines if it has the requested data, each CPU module 12 and memory module 14 drives the transaction hold (TRNSHLD-) signal 46 low. In other words, initially, several devices may be driving the TRNSHLD- line 46 low. Therefore, until the last device releases the TRNSHLD- line 46, the line 46 remains low. The system bus busy line (SBUSBSY-) 48 is also driven low by the CPU1 to indicate that a cache 32 transfer is in process.

Each CPU cache 32 and memory module 14 releases the transaction hold, or TRNSHLD-, signal line 46 when it determines that it does not contain the requested data. Therefore, the read cycle remains on hold until all of the memory storage areas check for the requested data. The last device to release the TRNSHLD- signal 46 will be assumed to contain the requested data. In the example of FIG. 3, the TRNSHLD- signal 46 is held low from clock cycle 3 to clock cycle 5, and the last memory storage device to release the TRNSHLD- signal line 46 is, for purposes of this example, memory module 1.
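The shared behavior of the TRNSHLD- line may be illustrated with a small model. This is a sketch under stated assumptions: the function names are hypothetical, the per-device release cycles for the caches and for memory module 2 are invented for the example, and a device is assumed to hold the line through the end of its last listed cycle. Only the low interval (clock cycles 3 through 5) and the identity of the last device (memory module 1) come from the example of FIG. 3.

```python
def trnshld_level(holders):
    """Active-low shared line: 0 (low) while any device holds it, else 1 (high)."""
    return 0 if holders else 1


def holders_at(cycle, last_held_cycle, first_cycle=3):
    """Return the devices still driving the line low in the given clock cycle."""
    if cycle < first_cycle:
        return set()
    return {dev for dev, last in last_held_cycle.items() if cycle <= last}


def assumed_owner(last_held_cycle):
    """The last device to release the line is assumed to contain the data."""
    return max(last_held_cycle, key=last_held_cycle.get)


# Hypothetical release times; only memory module 1 (cycle 5) is from the text.
last_held = {
    "cache 1": 3,
    "cache 2": 4,
    "memory module 2": 4,
    "memory module 1": 5,
}

# Line level for clock cycles 2 through 6.
levels = [trnshld_level(holders_at(c, last_held)) for c in range(2, 7)]
owner = assumed_owner(last_held)
```

The line stays low for as long as at least one device is still checking, so the read cycle cannot proceed until every cache and memory module has answered.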

After clock cycle 5, the system address bus 16 becomes available again. The signal ABUSAVL 40 returns high during clock cycle 6 in response to the release of the TRNSHLD- signal line 46.

In the present example, once the data is valid on the system data bus 18, the device providing the data, memory module 1, drives the CTERM- line 50 low during clock cycle 8. The valid data is presented to the system data bus 18 on lines SMD[63:00] 51, and is valid from clock cycle 8 through clock cycle 11. At the end of the valid data on the system data bus 18, the SEOT- line 52 is strobed low at clock cycle 12, and the CTERM- line 50 returns to a normal high level. At clock cycle 12, the system bus busy line, SBUSBSY- line 48, returns high, indicating availability of the system bus 22 for further requests. From the time the address is asserted on the system address bus 16 until the data is sent to the requesting CPU module 12, i.e., from clock cycle 3 to clock cycle 12, the SBUSBSY- line 48 is held low. During this time, all other CPU modules 12 are prevented from using the system bus 22.

For purposes of the present example, at clock cycle 13, CPU2 makes an address request by driving the ABUSAVL line 40 low, which signals that the system address bus 16 is busy with a new request, asserting the desired address on the system address lines, SADDRxx 42, and strobing the system address strobe line, SADDS- 44. The read cycle then repeats for the read request of CPU2.

As illustrated in FIG. 3, the system address bus 16 is not used from clock cycle 7 into clock cycle 11, yet the system address bus 16 is held "busy" as part of the system bus 22. This prevents another CPU module 12 from issuing a bus request and starting a new cycle. Similarly, during the interval from clock cycle 18 into clock cycle 22, the system address bus 16 is not in use. As described below, the bandwidth maximizer circuit 38 utilizes these unused intervals in a typical read cycle when the system address bus 16 is not in use but the read cycle has not been completed. In general, the bandwidth maximizer circuit 38 allows a second bus master device such as a CPU 12 to issue an early address request without interfering with an address request already in progress.
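The size of the unused interval in the first read cycle may be estimated with simple arithmetic. The sketch below is illustrative only: it assumes the first read occupies clock cycles 1 through 12 (the second request begins at clock cycle 13) and it counts clock cycles 7 through 11 as whole idle cycles, which is an approximation of "from clock cycle 7 into clock cycle 11".

```python
# First read cycle of FIG. 3: clock cycles 1..12 (assumption: whole cycles).
READ_CYCLES = range(1, 13)
# Address bus not in use but still held: clock cycles 7..11 (assumption: inclusive).
IDLE_ADDRESS_BUS = range(7, 12)

idle = len(IDLE_ADDRESS_BUS)
total = len(READ_CYCLES)
idle_fraction = idle / total

print(f"{idle} of {total} cycles have an idle address bus ({idle_fraction:.0%})")
```

Under these assumptions, a sizable share of each snoop-miss read leaves the address bus held but unused, which is the interval an early address request is intended to fill.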

FIG. 4 illustrates a block diagram of the bandwidth maximizer circuit 38 of the present invention. The preferred embodiment of the bandwidth maximizer circuit 38 comprises an A5/A6 address line substitution circuit 54, a multiplexer (MUX) 56, a memory mapped decoder 58, a slot I.D. mapping static random access memory 60, a Last Slot I.D. register 62, a Slot Comparator 64, a NOR gate 66 and an AND gate 68. The bandwidth maximizer circuit 38 is coupled to the local system bus 34 comprising a local address bus 70, the local data bus 72, the local control bus 74 and the bus controller 36 on the CPU module 12.

The slot I.D. mapping SRAM 60 is organized as 4,096 (4K) words by 5 bits per word. The slot I.D. mapping SRAM 60 stores the memory map information which is configured at power-up by the system BIOS. More particularly, the first four bits of each location in the slot I.D. mapping SRAM 60 store the slot number of the memory module 14 assigned to the address range in system memory mapped by each location in the slot I.D. mapping SRAM 60. The fifth bit of each location in the slot I.D. mapping SRAM 60 is assigned as a slot overlap disable flag for the memory range mapped by each location in the slot I.D. mapping SRAM 60. The bus controller 36 on the CPU module 12 controls the communications between each CPU module 12 and the system bus 22.
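
As a rough illustration of the word layout just described, the following Python sketch packs and unpacks one 5-bit entry. It assumes that the "first four bits" are the low-order bits 0 to 3 and that the slot overlap disable flag is bit 4; the text does not fix the bit order, and the names pack_entry and unpack_entry are hypothetical.

```python
SRAM_WORDS = 4096            # 4K locations, one per mapped address range
SLOT_MASK = 0x0F             # assumed: slot number held in bits 0-3
OVERLAP_DISABLE_BIT = 0x10   # assumed: slot overlap disable flag in bit 4


def pack_entry(slot, overlap_disable=False):
    """Build one 5-bit word for the slot I.D. mapping SRAM."""
    if not 0 <= slot <= SLOT_MASK:
        raise ValueError("slot number must fit in four bits")
    return slot | (OVERLAP_DISABLE_BIT if overlap_disable else 0)


def unpack_entry(word):
    """Split a 5-bit word into (slot number, overlap disable flag)."""
    return word & SLOT_MASK, bool(word & OVERLAP_DISABLE_BIT)
```

With this assumed layout, an entry for slot 4 with the overlap disable flag set is stored as the 5-bit value 10100.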

In the present embodiment, the MUX 56 selects either address lines A2-A13 76 or A20-A31 78 from the local address bus 70 and provides them to the slot I.D. mapping SRAM 60. The decoder 58 is responsive to write operations to the addresses assigned to the slot I.D. mapping SRAM 60 to activate an output connected to the MUX 56, and thereby cause the MUX 56 to select address lines A2-A13 76 for coupling to the outputs of the MUX 56. The outputs of the MUX 56 are coupled to the address inputs of the slot I.D. mapping SRAM 60. The decoder 58 is also responsive to the addresses assigned to the slot I.D. mapping SRAM 60 to enable an output connected to the write enable (WE) input 80 of the slot I.D. mapping SRAM 60. Therefore, when the decoder 58 detects that the address request is to an address assigned to the slot I.D. mapping SRAM 60, the decoder 58 connects address lines A2-A13 76 to the address inputs of the slot I.D. mapping SRAM 60, and simultaneously activates the write enable input 80 to the slot I.D. mapping SRAM 60, allowing the SRAM 60 contents to be altered by storing the least significant 5 bits of data from the local data bus 72 via a set of data input lines 81. Write and read operations to addresses other than the addresses assigned to the slot I.D. mapping SRAM 60 do not activate the output of the decoder 58. Accordingly, with write and read operations to addresses other than the addresses assigned to the SRAM 60, the MUX 56 selects address lines A20-A31 78 for coupling to its outputs, and in turn to the inputs to the slot I.D. mapping SRAM 60.
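
The selection performed by the MUX 56 under control of the decoder 58 can be modelled in software roughly as follows. This is only a behavioural sketch: the function name and the decoder_active parameter are hypothetical, and it assumes that A2 is the least significant of the twelve selected lines when A2-A13 are chosen.

```python
SRAM_INDEX_MASK = 0xFFF  # twelve address inputs select one of 4,096 words


def sram_address_inputs(local_address, decoder_active):
    """Return the 12-bit value the MUX presents to the SRAM address inputs.

    decoder_active models the decoder output: True for a write to an
    address assigned to the SRAM itself, False for every other access.
    """
    if decoder_active:
        return (local_address >> 2) & SRAM_INDEX_MASK    # lines A2-A13
    return (local_address >> 20) & SRAM_INDEX_MASK       # lines A20-A31
```

In this model the same SRAM location can be reached two ways: through A2-A13 when its contents are being written, and through A20-A31 when an ordinary memory address is being looked up.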

During initialization of the slot I.D. mapping SRAM 60, the computer operating system writes to the addresses assigned to the slot I.D. mapping SRAM 60. The decoder 58 selects address lines A2-A13 76 for connection to the address inputs of the slot I.D. mapping SRAM 60. In the present embodiment, the system memory is divided into blocks of 1 megabyte each. Thus, the slot numbers stored in each location of the slot I.D. mapping SRAM 60 are assigned on the basis of the 1-megabyte divisions of memory. For example, if the first megabyte of memory is mapped to a memory module 14 in slot 3 and the second megabyte of memory is mapped to a memory module 14 installed in slot 4, the computer operating system stores the identifier "3" (011 in binary) in the first location of the slot I.D. mapping SRAM 60, and the identifier "4" (100 in binary) in the second location of the slot I.D. mapping SRAM 60.

During accesses to addresses not assigned to the slot I.D. mapping SRAM locations, the decoder 58 selects address lines A20-A31 78 for transmission through the MUX 56 for connection to the address inputs of the slot I.D. mapping SRAM 60. Because A20 becomes the least significant bit of the address inputs to the slot I.D. mapping SRAM 60, the address to the slot I.D. mapping SRAM 60 only increments with each megabyte increment.
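
The per-megabyte lookup can be illustrated with a short Python sketch that uses the slot 3 and slot 4 example given above. The list slot_map and the function slot_for_address are hypothetical names, and leaving unmapped locations at 0 is an assumption made only for this illustration.

```python
MEGABYTE = 1 << 20

# Example map: first megabyte in slot 3, second megabyte in slot 4.
# Unmapped locations are left at 0 here purely for illustration.
slot_map = [0] * 4096
slot_map[0] = 3
slot_map[1] = 4


def slot_for_address(address):
    """Look up the slot number using A20-A31 as the SRAM index."""
    return slot_map[(address >> 20) & 0xFFF]
```

Every address from 0 through MEGABYTE - 1 indexes location 0 and therefore returns slot 3, while the address MEGABYTE is the first to index location 1 and return slot 4.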

In the present embodiment, the high throughput of the bandwidth maximizer circuit 38 is achieved as long as the sequential address requests are to different memory modules (slots). Hence, the benefits of the bandwidth maximizer circuit 38 are achieved with at least two memory modules. However, even with two memory modules, if the modules are mapped linearly in the 1-megabyte segments, sequential accesses will need to be in different megabyte address ranges. Therefore, to increase the likelihood of accesses going to different modules, the memory is advantageously mapped so that sequential cache lines (32 bytes in the present embodiment) are fetched from different memory modules.

This "interleaving" is depicted in FIG. 6 below. In one embodiment, in order to realize this cache-line-based interleaving, address line A5 is exchanged with address line A20 such that each 32-byte cache line is interleaved between two memory modules 14. In a further embodiment, address line A6 is further exchanged with address line A21. This leads to each sequential 32-byte cache line being interleaved between four memory modules. An A5/A6 substitution circuit 54 performs this function. An embodiment of an A5/A6 substitution circuit 54 is depicted in detail in FIG. 7, and will be described further below.
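
The effect of the address line exchange can be shown with a small Python model. It is a software sketch of the behaviour described above, not a description of the circuit of FIG. 7, and the function names are hypothetical. It assumes that the module is chosen by the 1-megabyte block index formed from A20-A31 after the exchange.

```python
def swap_bits(value, i, j):
    """Exchange bit i and bit j of value."""
    if ((value >> i) & 1) != ((value >> j) & 1):
        value ^= (1 << i) | (1 << j)
    return value


def substitute(address, four_way=False):
    """Model the A5/A20 exchange, plus the A6/A21 exchange if four_way."""
    address = swap_bits(address, 5, 20)
    if four_way:
        address = swap_bits(address, 6, 21)
    return address


def block_index(address, four_way=False):
    """Index of the 1-megabyte block selected after substitution."""
    return (substitute(address, four_way) >> 20) & 0xFFF
```

For the four sequential cache lines at addresses 0, 32, 64 and 96, block_index yields 0, 1, 0, 1 with only the A5/A20 exchange and 0, 1, 2, 3 when A6/A21 is exchanged as well, so consecutive 32-byte lines fall in different megabyte blocks and can be mapped to different memory modules.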

In a multi-processor computer system wherein each CPU module 12 has an associated snooping cache, as seen in FIG. 4, when a system bus cycle is initiated, all nonparticipating processors 30 monitor (snoop) the address to maintain cache coherency, as is well known in the art. In order to snoop the address, each processor 30 latches each address from the system address bus 16 onto the local address bus 70 for comparison by the cache 32. Therefore, with each address request on the system address bus 16, the same address is present on the local address bus 70 for each CPU module 12 or other bus master module with a snooping cache in the system. Each cache 32 then determines if it has the requested data, as is well known in the art.

The bandwidth maximizer circuit 38 utilizes each address while it is active on the local address bus 70. In general, each bandwidth maximizer circuit 38 stores the slot I.D. of each address request in the Last Slot I.D. register 62. Thus, after each address request, the Last Slot I.D. register 62 for each bandwidth maximizer circuit 38