WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Backing store access coordination in a multi-processor system    
United States Patent4136386   
Link to this pagehttp://www.wikipatents.com/4136386.html
Inventor(s)Annunziata; Eugene J. (Cary, NC); James; Robert S. (Pleasant Valley, NY); Tan; Kwang G. (Poughkeepsie, NY)
AbstractIn a multi-processor system which utilizes store-in-cache techniques, a queueing mechanism for insuring that all units that wish to interrogate a memory line will be permitted to interrogate the line. Backing store access requests for data that is contained in the cache of another processor are queued until they can be serviced. The access requests that pertain to particular data are issued from the queue on a first-in-first-out basis. No group of interrogating units will be able to lock out another unit that wishes to interrogate.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 4136386
Backing store access coordination in a multi-processor system - US Patent 4136386 Drawing
Backing store access coordination in a multi-processor system
Inventor     Annunziata; Eugene J. (Cary, NC); James; Robert S. (Pleasant Valley, NY); Tan; Kwang G. (Poughkeepsie, NY)
Owner/Assignee     International Business Machines Corporation (Armonk, NY)
Patent assignment
All assignments
Publication Date     January 23, 1979
Application Number     05/840,070
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     October 6, 1977
US Classification     711/119
Int'l Classification     G06F 009/18 G06F 013/00
Examiner     Chapnick; Melvin B.
Assistant Examiner    
Attorney/Law Firm     Gershuny; Edward S.
Address
Parent Case    
Priority Data    
USPTO Field of Search     364/200 MS File
Patent Tags     backing store access coordination multi-processor
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
3317898



[0 after 0 votes]
3469239



[0 after 0 votes]
3510844



[0 after 0 votes]
3706077



[0 after 0 votes]
3735360
Connor
379/3
Feb,2006

[0 after 0 votes]
4065809
Matsumoto
711/152
Dec,1977

[0 after 0 votes]
4056844
Izumi
711/117
Nov,1977

[0 after 0 votes]
3940743
Fitzgerald
710/305
Feb,1976

[0 after 0 votes]
3889237
Alferness
711/148
Jun,1975

[0 after 0 votes]
3886525
Brown
711/147
May,1975

[0 after 0 votes]
3848234
MacDonald
711/119
Nov,1974

[0 after 0 votes]
3845474
Lange
711/119
Oct,1974

[0 after 0 votes]
3771137
Barner
711/120
Nov,1973

[0 after 0 votes]
3761883
Alvarez
711/164
Sep,1973

[0 after 0 votes]
3618040
Iwamoto
89/1.3
Nov,1971

[0 after 0 votes]
3581291
Iwamoto
241/28
May,1971

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. For use in a data processing system which includes at least two processors, each with an associated cache, and a backing store shared by all said processors; a system control unit for coordinating backing store access requests by said processors, said system control unit comprising:

cross-interrogate means responsive to a data access request from one of said processors to produce a first signal if the requested data is in the cache associated with another processor;

queueing means for temporarily storing a plurality of data access requests;

means responsive to said first signal for causing the processor associated with the cache containing the requested data to initiate a store operation of said data into the backing store;

means preventing access to said requested data in said backing store until the store operation initiated by said another processor has been completed; and

sequence control means, associated with said queueing means, causing access requests related to said requested data to be issued from said queueing means in a first-in-first-out sequence.

2. The control unit of claim 1 wherein said sequence control means comprises:

means for providing a signal indicative of the location within said queueing means which contains the data access request that has been stored longest within said queueing means.

3. The control unit of claim 2 wherein said sequence control means further comprises:

means for retaining an indication of the locations within said queueing means which are available for storage of data access requests;

means for selecting one of said available locations for storing a data access request in said queueing means; and

means for indicating the sequence in which locations within said queueing means have been allocated to the storage of data access requests.

4. The control unit of claim 3 further including:

means for indicating whether or not a module of said backing store is busy;

means for producing a cast-out signal when storage to said backing store from said another processor has been completed; and

means responsive to said cast-out signal and to a non-busy condition in a module of backing store to cause issuance from said queueing means of the oldest data access request related to said non-busy store module.

5. The apparatus of claim 4 wherein said sequence control means further comprises:

a push-down stack of registers for storing addresses of locations in said queueing means;

the sequence of addresses within said push-down stack being indicative of the sequence in which locations within said queueing means received data access requests.

6. The apparatus of claim 5 wherein said sequence control means contains:

a plurality of registers in said push-down stack, each register for holding an address of a location in said queueing means and a flag indicating whether or not the location in said queueing means contains a data access request that has not yet been executed; and

means for resetting said flag after execution of the data access request enqueued at the indicated location within said queueing means.

7. The apparatus of claim 6 further including:

means responsive to resetting of the flag in an i.sup.th register of said push-down stack to cause shifting of the entries in said push-down stack that were stored therein subsequent to the storage of the entry in said i.sup.th register where i is an integer greater than 1.

8. In a data processing system including at least four processors, each with an associated cache, and a backing store shared by all said processors; at least two system control units, each connected to at least two processors, for coordinating backing store access requests by said at least four processors, each of said system control units comprising:

directory means identifying data in the cache associated with each processor connected to the control unit;

cross-interrogate means responsive to a data access request from any of said processors to produce a first signal if the requested data is in the cache associated with a processor connected to the control unit;

queueing means for temporarily storing data access requests from processors connected to the control unit;

means responsive to said first signal for causing the processor associated with the cache containing the requested data to initiate a store operation of said data into the backing store;

means preventing access to said requested data in said backing store until the store operation initiated by said another processor has been completed; and

sequence control means, associated with said queueing means, causing access requests related to said requested data to be issued by said queueing means in a first-in-first-out sequence;

said system further including means connecting said control units so the cross-interrogate means within one control unit can respond to a data access request from a processor connected to another control unit.

9. The system of claim 8 wherein each of said sequence control means comprises:

means for providing a signal indicative of the location within one of said queueing means which contains the data access request that has been stored longest within said queueing means.

10. The system of claim 9 wherein each of said sequence control means further comprises:

means for retaining an indication of the locations within said one of said queueing means which are available for storage of data access requests;

means for selecting one of said available locations for storing a data access request in said one of said queueing means; and

means for indicating the sequence in which locations within said one of said queueing means have been allocated to the storage of data access requests.

11. The system of claim 10 wherein each of said control units further includes:

means for indicating whether or not a module of said backing store is busy;

means for producing a cast-out signal when storage to said backing store from a processor has been completed; and

means responsive to said cast-out signal and to a non-busy condition in a module of backing store to cause issuance from said queueing means of the oldest data access request related to said non-busy storage module.

12. The system of claim 10 further including:

queue sequence means controlling the sequence in which the plurality of queueing means issue enqueued data access requests.

13. The system of claim 12 wherein:

said queue sequence means cause alternation among the plurality of queueing means in issuing data access requests.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to data processing systems and more particularly to multi-processor systems wherein each processor employs a high-speed buffer or private store in combination with a main storage device that is shared by all processors.

2. Prior Art

An article by C. J. Conti entitled "Concepts For Buffer Storage" published in the IEEE Computer Group News, March 1969, describes a hierarchical memory in which a large slow speed three dimensional core storage (backing store) operates in conjunction with a relatively small high-speed buffer storage (or cache) manufactured using integrated circuit technology. By using the buffer/backing store arrangement, the central processing unit (CPU) is able to access data at a high rate from the high-speed buffer which is matched more closely to the machine cycle of the CPU. When the CPU provides the address of desired information to the memory system, a control circuit determines whether or not the addressed data has been moved from the backing store to the buffer store. If the data is located in the buffer store, high speed access is possible from the buffer store to the CPU. If the data is not in the buffer store, controls move the data from the backing store to the high-speed buffer. A use algorithm is provided to insure that the most frequently used data is stored in the high-speed buffer. If the use algorithm is efficient, most accesses will be for data that is in the higher speed buffer store. This should result in a combined system having effective speeds approaching that of the fastest memory at a cost approaching that of the slowest and least expensive memory.

In the prior art, buffer/backing storage apparatus are transparent to the user and the buffer operation is under fixed hardware control. When a CPU initiates a fetch operation, the main storage address is presented to the memory hierarchy. Controls access a search mechanism or directory of the high-speed buffer to determine if the requested address currently resides in the high-speed buffer. If the requested information is in the buffer, it is immediately made available to the CPU. If the requested information is not currently in the buffer, a fetch operation is initiated to the main storage backing store. The buffer location to receive the information from main storage is determined by replacement logic which, in accordance with some predetermined algorithm, determines which data in the buffer store is to be replaced with new data unit from the backing store. When the fetch is initiated at the main storage, the exact word requested is first accessed and sent directly to the CPU and the buffer. The remaining words in the same "block" of data (as determined by the particular block size of the system) are also accessed from main storage and stored into the buffer.

There are currently three methods in the prior art for handling store operations. The "store through" method is used on most existing systems. In this method data is always stored immediately in the main storage and the buffer address mechanism is checked to determine if the addressed block is currently in the buffer. If the block is in the buffer, the data may also be stored in the buffer. On some systems, where I/O operations store only into main storage, the related buffer block is made invalid by the resetting of an associated "valid" bit so that any subsequent fetches to the same block will require accessing the main storage to fetch the data to the buffer.

A second method is called "store wherever." In this method, the buffer address mechanism is checked to determine if the address block is currently in the buffer. If the block is in the buffer, the data is stored directly into the buffer without further action. If the block is not in the buffer, the data is stored in the main storage.

The third method, "store-in buffer" or "store-in-cache", always stores into the buffer. If a store instruction addresses data that is not in the buffer, the system brings the block from main storage to the buffer and then stores the new data into the block in the buffer.

The above-mentioned Conti article discusses various techniques for organizing data and for accessing that data in the high-speed buffer. One such technique, which is preferred when using the present invention, is known as the "set associative" technique. An example of this technique can be found in U.S. Pat. No. 3,588,829, Ser. No. 776,858, filed Nov. 14, 1968 and which is assigned to the same assignee as this application. In this technique, the address information is broken down into books, pages and words. Depending on the memory size, there can be some predetermined number of books, having a predetermined number of pages, and each page containing a predetermined number of words. As an example, it might be determined that each book should contain 128 pages and that each page should contain some predetermined number of words. When the determination is made, it specifies that the high-speed buffer will have 128 storage sections, each section containing the number of words in a page. Associated with each of the 128 sections of high-speed storage will be a directory, or address index array, containing 128 registers. In the set associative technique of access, the corresponding page number from any of the predetermined number of books will always be found in the same storage section of the high-speed buffer. That is, page 10 from any book in the backing store will always be found in location 10 in the high-speed buffer. The associated register in the directory will be provided with an entry which identifies the particular book to which this particular page 10 belongs. The method of determining if requested data is in the high-speed buffer is to utilize the address bits specifying pages to access the directory, and simultaneously therewith, access the high-speed buffer. The entry in register 10 of the directory is compared with the applied address to determine whether or not the book value of the applied address matches the book value contained in the register. If they do compare, this indicates that the requested page 10 from the requested book is the data contained in the high-speed buffer. If the data is not from the requested book, the page 10 from the requested book is transferred from the backing store to the buffer and inserted in storage section 10 of the high-speed buffer and the identity of the requested book is then inserted in the associated register of the directory.

Another technique for which consideration has already been given in multi-processor systems, is known as the "fully associative" technique. In this technique, the high-speed buffer may be, for example, provided with 16 storage sections. Associated with each storage section will be a register. The size of each storage section may be sufficient for storing an entire book. The particular book stored in a particular one of the storage sections will be identified in the associated register. As each address is applied, the book address portion is compared with the entries in all of the registers and if a match is found, the data is specified as being in the section associated with that register. In the fully associative technique, data transferred from the backing store to the buffer store can be placed in any of the locations. When new data must be inserted, a replacement algorithm determines which of the sections should be replaced, new data is inserted in that section and the identity of the book is inserted in the associated register of the directory.

One prior art technique has been shown in a multi-processor environment which utilizes a fully associative configuration and the store-through concept. A storage protect memory, which is utilized to protect a predetermined fixed amount of data in the backing store, is provided with additional binary bits for reflecting which of the several processors has accessed data from the backing store to its associated private store. Whenever a processor stores data into the backing store, utilizing the store-through concept, the storage protect memory is interrogated and if it is determined that the data block is in another processor's private storage, the mechanism will be utilized to invalidate the data in the other processor requiring that processor to fetch the data from the backing store the next time it is utilized. This prior art technique is limited to a buffer storage configuration in which each storage section must contain the same amount of data as specified in the storage protect memory, and does not address itself to a set associative buffer configuration nor does it consider the problems arising when utilizing the store-in-buffer concept.

U.S. Pat. No. 3,735,360 describes a system wherein high-speed buffer operation is accomplished in a multi-processor system which includes a shared storage and a plurality of processing units. Each processing unit includes a private, high-speed buffer storage, an associated directory for providing an indication of the data transferred from the shared storage to the high-speed buffer, and a storage control unit which accepts signals from the associated processor, including the shared storage address of data to be operated on, and an accessing control signal which indicates that the data is to be fetched for transfer to the processor or that the processor is to store data into the operand location.

The system contains means interconnecting all the processors to perform an interlocking function. The interlocking function is accomplished by broadcasting, under certain specified conditions, address information to all other processors from a particular one of the processors in addition to the access control signal to indicate whether or not the operation is to be a fetch or a store. Utilizing the broadcasted address and access control signals, the storage control mechanism of all other processors is operated to determine further action in connection with the data requested by the particular processor.

The private high-speed buffers have a predetermined number of storage sections and an associated directory register for identifying the address of the shared storage data presently stored in the high-speed buffer. By providing various combinations of additional binary bits in the registers of the directory or index array, various forms of storage organization and access methods can be controlled by the broadcasted address and access control information to insure that each processor accesses the most current value of the operand identified in the shared storage.

If, for each directory register, only one additional control bit is provided which signifies the validity of the data in the associated storage section, each processor must broadcast address and access control information whenever data is to be stored by a particular processor. The directory of each other processor is searched to determine whether or not the same data is contained in the associated private storage. If so, the validity bit is reset to reflect that the data is no longer valid in the associated private storage. Another bit which can be provided in the directory registers is a bit called a fetch-only bit. This bit is set and reset to reflect whether or not the data in the particular one of the private storages is the only copy of the data stored in a private storage. That is, if a particular processor has fetched data from the shared storage into the private storage, and it is known that this is the only copy of the data in the private storages, the fetch-only bit will reflect this. The need for broadcasting the address and access control signals for a store operation would not exist.

Another binary bit which can be provided in the registers of the directory or index array, is a bit known as a store bit. This bit is set and reset to reflect a condition wherein the data in the high-speed buffer of a particular processor differs from the data in the shared storage. That is, when utilizing the store-in-buffer concept, all accesses to data by particular processors are made in the high-speed buffer including accesses for the storage of data. When the data has been transferred to the high-speed buffer of a particular processor, and that data is subsequently stored into the buffer, the store bit is set. Whenever a particular processor request requires transfer of new data from the shared storage to the high-speed buffer for either storing or fetching, the address and access control signals will be broadcast to the other processors. The address information of the requested data is utilized to search the directories of all other processors to determine whether or not the requested data resides in one of the other private high-speed storages and whether or not that data has been stored into. If the data has been stored into by another processor, that data must first be transferred back to the shared storage in its modified form so that the processor requesting the data will receive from the shared storage the most current value. This requirement is not necessary if the determination is made that the data in the other processor has not been stored into and therefore has the same values as the operands in the shared storge.

In a multi-processor system which utilizes the store-in-cache method, it is of great importance that fetch requests to the backing store be allowed only when the backing store contains an updated version of the requested data. In order to prevent a processor from obtaining "stale" data (i.e., data that may have been changed within the cache of another processor) fetch requests to backing store will not be serviced until the current version of the requested data is put into backing store. This may be accomplished through the use of a "spin lock" which causes the processor to periodically reissue its fetch requests until such time as appropriate system mechanisms indicate that the requested data has been updated in backing store. Typically, the original fetch request which resulted in system recognition that the requested data was resident in the cache of another processor would be used to initiate generation of a signal to the processor which owns the data, asking that processor to "cast out" the data and return the updated version to backing store. However, it is possible that two or more processors might, at substantially the same time, all be attempting to fetch the same data while that data is within the cache of yet another processor. The processor which owns the data will have received a signal telling it to cast out but, in attempting to store the updated data into the backing store, it must contend with the other processors that are trying to fetch the same data. This contention could result in a significant delay in updating the backing store, and could even conceivably result in a situation where storage contention permanently prevents backing store from being updated and processing ceases.

The primary object of this invention is to eliminate system deadlocks caused by contention for memory between a processor that is trying to update the backing store and two or more other processors that are trying to fetch data from the backing store.

BRIEF DESCRIPTION OF THE INVENTION

In accordance with a preferred embodiment of the invention, the above and other objects are accomplished within a system control unit (SCU) which, among other things, coordinates memory access by two or more processors. The SCU contains a duplicate cache directory for each of the processors associated with it, a main memory access-queue and a cast-out address register associated with each processor. Whenever a processor accesses (fetches or stores) data that is already resident in its own cache, the access will not affect the SCU. When a processor tries to access data that is not already resident in its own cache, the access request will be transmitted to the SCU where it will be placed on a queue for subsequent transmission to the backing store. When the access request is transmitted to the backing store, the address of the requested data will be compared against the other cache directories that are resident in the SCU to see whether the requested data is resident in the cache of another processor. If the requested data is not resident in the cache of another processor, the access request will be allowed. If the requested data is resident in the cache of another processor, a "cross-interrogate-hit" signal will be utilized to prevent data from being read from backing store into the cache of the requesting processor and to signal the processor which has the data in its cache to cast-out the data to backing store. The access request to backing store will be placed on a first-in-first-out (FIFO) queue from which it will again be accessed at a later time.

A preferred embodiment of this invention may be implemented on a system which utilizes two or more SCU's, each of which has two or more processors associated with it. In such a system, if each SCU contains a copy of the directories of only its associated processors, the address of the requested backing store data must be transmitted to each SCU for checking against all of its directories in order to determine whether or not the requested data is already resident within the cache of a processor. This invention also provides a mechanism for coordinating access to backing store by more than one SCU.

The most significant advantage of this invention is that it enables an orderly sequence of accesses to backing store and prevents deadlocks.

The above and other objects, features and advantages of the invention will be apparent from the following detailed description of a preferred embodiment thereof as illustrated in the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing the main elements of a system embodying this invention;

FIG. 2 is a block diagram showing portions of the System Control Unit embodying this invention;

FIG. 3 shows portions of the SCU that are most important to this invention;

FIG. 4 shows the SCU mechanism which maintains FIFO sequencing;

FIG. 5 shows status sensing logic within the SCU;

FIG. 6 shows the logic for selecting commands from the command queue of the SCU;

FIG. 7 illustrates a system utilizing a plurality of SCU's;

FIG. 8 shows logic for coordinating multiple SCU's; and

FIG. 9 shows logic for sequencing service of processors by an SCU.

DETAILED DESCRIPTION

FIG. 1 shows generally the environment of the present invention. Operands to be utilized in the system are contained in a shared main storage 10. The operands are accessed by a plurality of data processors 11 and 12. Each of the processors 11 and 12 identifies operands in shared storage 10 on address busses 13 and 14, respectively. Processors 11 and 12 have private high-speed storage 15 and 16 and data busses 17 and 18 for the transfer of data between processors and the local private storage. A request for access to locations of operands specified on the address busses 13 and 14 is signalled on access control lines 19 and 20. The access control signals will specify that the processor desires access to the operand location for the purpose of fetching data to the processor or storing data from the processor into the accessed location.

The address information provided on busses 13 and 14 is applied to local storage control units 21 and 22 for the purpose of determining whether or not the data requested is accessible in private storage 15 or 16. If the requested data is in the private storage 15 or 16, the data will be immediately transferred on data busses 17 or 18. If the storage control unit 21 or 22 determines that the requested data is not in the private storages 15 or 16 respectively, a request will be made on control lines 23 or 24 to initiate transfer of the data from shared storage 10 to private storage 15 or 16 on storage data busses 25 or 26. Whether or not the requested data is in the local private storage is determined by means of a search mechanism which includes directories 27 and 28.

In accordance with the present invention, a System Control Unit (SCU) 29 is provided to insure that each processor will access operand locations which have the most current value of an operand in view of the fact that each of the processors, independently, may be modifying the operand values. The SCU 29 will also coordinate storage accesses by the system processors. The SCU contains a copy of each of directories 27 and 28 so that it can check to see if the data from a given main storage location is in either of the caches 15 or 16.

Referring now to FIG. 2, additional details of the SCU 29 (FIG. 1) are shown. In particular, FIG. 2 shows the portions of the SCU that enable it to communicate with, and control backing store access by, the processors. After the hardware within a processor has determined that the data requested by a memory access request is not resident in the cache of that processor, the memory access request is transmitted to a processor input register PIR 31 in the SCU. During the next machine cycle, the command received from the processor will be transmitted along line 32 to a storage input register SIR 33 from which it will address a basic storage module BSM. Simultaneously with the latter data transfer, the command in PIR 31 is transmitted along line 34 to a cross-interrogate register XIR 35 and to a memory command buffer MCB 36 and into a queue array 37. The command in XIR 35 is used to interrogate the directory copies 38 that are maintained in the SCU for the cache of each processor associated with that SCU. If there is no match between the contents of XIR 35 and any of the arrays 38, the main memory will transmit the requested data to the cache of the requesting processor through storage output register SOR 39 and processor output register POR 40. In the preferred embodiment of this invention, the transmission from main storage to a cache will always be a page comprising sixteen double words.

If the contents of XIR 35 matche an entry in one of the directories 38, one of the comparators 41 will, via line 42, signal the occurrence of a "cross-interrogate hit". Since the SCU contains a directory for the cache of each processor, it recognizes which processor has the data in its cache. That processor will then receive a signal telling it to "cast-out" the data wanted by another processor, by replacing it in main storage. The cross-interrogate hit signal also is used to set a "wait" bit in the command that caused the cross-interrogate hit and to prevent the data that was read from main memory from being transmitted to the requesting processor. This latter step is necessary because, in order to improve system efficiency in the preferred embodiment, a memory access to the BSM is initiated as soon as a command is read into SIR 33.

The processor which has been ordered to cast-out will then issue a store command to store data from its cache into the backing store. When the processor has been sent a signal telling it to cast-out the data and return it to backing store, the directory associated with that processor must be corrected to show that the processor no longer owns the data. However, until the store has been successfully completed, no other processor may be permitted to access that particular data from backing store. In order to accomplish these functions, the particular entry in one of the directory arrays 38 relating to the processor which has been ordered to cast-out is transferred to a cast-out register COR 43. COR 43 has an associated comparator 44 which receives one input from COR 43 and the other input from XIR 35 (preferably via a directory register 64). Thus, comparator 44 can produce a cross-interrogate hit signal when a processor tries to access a portion of main memory that has not yet been cast-out from the cache of another processor, even though the latter processor no longer has exclusive ownership of that data. The cross-interrogate hit signal generated by comparator 44 is used exactly as is a cross-interrogate hit signal generated by one of comparators 41 except that the signal generated by comparator 44 will not produce a cast-out signal. After the system completes the cast-out operation by moving data from a processor cache into backing store, all "wait" tags of commands sitting in the MCB 36 are reset and a command will be transmitted from MCB 36 via line 45 to PIR 31 again attempt its execution.

FIG. 3 presents a simplified drawing of portions of an SCU which isolates and highlights portions of the SCU that are particularly important to this invention. A command from the processor enters PIR 31 from which it will be placed in a command queue 37 and be sent to storage via line 34. At this same time, the address of the command in PIR 31 is transmitted to comparators 41 to see if the requested data is held in the cache of another processor as indicated by one of directories 38, and the address is also transmitted to comparators 44 to see if the address is in one of the cast-out registers 43. The four-way associative directories 38 are addressed by address register 46 which receives the low-order portion of the address of the command in PIR 31. If there is no cross-interrogate hit, the memory access will proceed to completion. If there is a cross-interrogate hit, the output of one of comparators 41 or 44 will produce a signal which prevents data from going from backing store to the requesting processor, which sets the "wait" bit associated with the command, and which signals the processor whose cache contains the data to cast-out the data (if the cross-interrogate hit signal was generated by one of comparators 41). In a subsequent cycle, a command from command queue 37 will be placed in PIR 31 and the above sequence will again take place.

Referring now to FIG. 4, the primary mechanism for ensuring the FIFO nature of the command queue is shown. Associated with each of the commands in the command queue is a tag register which contains information concerning the status of the command (as will be discussed below). Each tag register contains a "valid" bit which indicates whether its associated command is still valid; that is, whether it is waiting to be executed. In this preferred embodiment, the command queue contains eight command positions, and there are therefore eight tag registers, 0, 1, 2, 3, 4, 5, 6 and 7. The signal from the "valid" position of each of the tag registers is fed to a polling mechanism 45 which generates an indication of the next free ad