Claims
What is claimed is:
1. A multiprocessor system having a data memory for storing data in a
plurality of divided block areas, and a plurality of processing elements
each having a cache memory for holding a copy of part of said data stored
in said divided block areas of said data memory in units of a data block
stored in each of said divided block areas, said multiprocessor system
comprising:
a directory memory for classifying said plurality of processing elements
into processing groups each comprising a plurality of said processing
elements, and for holding directory information indicating which of
said processing groups holds a copy of the data stored in said data memory
in units of said data block; and
control means for referring to said directory memory in response to a
request from one of said processing elements, for identifying the
processing group holding said copy of data, and for delivering a
predetermined message to processing elements belonging to the identified
processing group.
2. The multiprocessor system according to claim 1, wherein said control
means delivers successively said predetermined message to the processing
elements belonging to said identified processing group.
3. The multiprocessor system according to claim 1, wherein said control
means broadcasts said predetermined message to the processing elements
belonging to said identified processing group.
4. The multiprocessor system according to claim 1, wherein said directory
information includes a shared bit indicating that the data is held in at
least two of said processing groups.
5. The multiprocessor system according to claim 1, wherein each of said
processing elements includes a plurality of CPUs (Central Processing
Unit).
6. The multiprocessor system according to claim 1, wherein said control
means refers to said directory memory in response to a write request from
one of said processing elements, identifies the processing group holding
said copy of data, and delivers a data invalidation message to the
processing elements belonging to the identified processing group,
excluding the one processing element making the write request.
7. The multiprocessor system according to claim 6, wherein said control
means includes directory information control means and group information
control means, wherein
said group information control means finds, from a group address assigned
to the processing group sent from said directory information control
means, an end address assigned to the processing elements belonging to said
processing group, and delivers the end address to said directory
information control means, and
said directory information control means includes:
means for referring to said directory memory, in response to said write
request, for identifying the processing group holding said copy of data,
and for delivering the group address corresponding to said processing
group to said group information control means, and
means for delivering the data invalidation message to the processing
elements belonging to said identified processing group based on the end
address sent from said group information control means.
8. The multiprocessor system according to claim 1, wherein said control
means writes new directory information in said directory memory in
response to a read request from a given one of said processing elements,
and delivers a desired data block to the one processing element making the
read request.
9. The multiprocessor system according to claim 8, wherein said control
means includes directory information control means and group information
control means,
said group information control means includes:
means for identifying, from an end address sent from said directory
information control means, the processing group to which the one
processing element making the read request belongs; and
means for decoding a group address assigned to said identified processing
group, for obtaining first directory information, and for delivering said
first directory information to said directory information control means,
and
said directory information control means includes:
means for delivering said end address of the origin of the request to said
group information control means in response to said read request;
means for referring to said directory memory in response to said read
request and for reading out second directory information corresponding to
desired data; and
means for calculating logical OR of said first directory information and
said second directory information, and for changing said second directory
information in said directory information memory to a result of the
calculated logical OR.
10. The multiprocessor system according to claim 1, wherein a group address
(m-bits) assigned to said processing group is determined to satisfy the
condition:
$k \leq 2^{m}$, and $n \geq m$,
when an end address assigned to said processing element is n-bits and one
entry of the directory information is k-bits.
11. A directory management method applicable to a multiprocessing system
having a data memory for storing data in a plurality of divided block
areas, a plurality of processing elements each having a cache memory for
holding a copy of part of said data stored in said divided block areas of
said data memory in units of a data block stored in each of said divided
block areas, and a directory memory for storing information relating to a
copy of the data stored in said data memory, said directory management
method comprising the steps of:
a) classifying said plurality of processing elements into processing groups
each comprising a plurality of said processing elements, and holding
directory information indicating which of said processing groups hold the
copy of the data stored in said data memory, said directory information
being held in said directory memory in units corresponding to said data
block; and
b) referring to said directory memory in response to a request for data
from one of said processing elements, identifying the processing group
holding the requested data, and delivering a predetermined message to the
processing elements belonging to the identified processing group.
12. The method according to claim 11, wherein said step b) includes a step
of delivering successively said predetermined message to the processing
elements belonging to said identified processing group.
13. The method according to claim 11, wherein said step b) includes a step
of broadcasting said predetermined message to the processing elements
belonging to said identified processing group.
14. The method according to claim 11, wherein said step a) includes a step
of storing, in said directory memory, shared information indicating that
the copy of the data is held in at least two of said processing groups.
15. The method according to claim 11, wherein each of said processing
elements includes a plurality of CPUs (Central Processing Unit).
16. The method according to claim 11, wherein said step b) includes the
steps of:
referring to said directory memory in response to a write request for data
from a given one of said processing elements, and identifying the
processing group holding the requested data,
identifying an end address assigned to the processing elements within said
processing group based on a group address assigned to said processing
group, and
delivering a data invalidation message to the processing elements within
said identified processing group based on the end address.
17. The method according to claim 11, wherein said step b) includes the
additional steps of:
identifying, in response to a read request from one of the processing
elements, the processing group to which said processing element making the
read request belongs, the identification being made based on an end
address of the processing element making the read request, and
decoding a group address assigned to said identified processing group and
obtaining first directory information, and wherein the steps recited by
claim 11 include:
referring to said directory memory in response to said read request and
reading out second directory information corresponding to desired data,
calculating logical OR of said first directory information and said second
directory information, and changing said second directory information in
said directory information memory to a result of the calculated logical
OR, and
delivering a desired data block to the processing element making the read
request.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a shared-memory type multiprocessor system
wherein a plurality of processors commonly use a single main memory
device, and more particularly to a multiprocessor system wherein
information representing a location of copied data within a main memory
device is stored in a directory memory and data management is performed on
the basis of this information.
2. Description of the Related Art
Recently, multiprocessors have been developed wherein a plurality of
processors are operated in parallel to improve operation speed,
reliability and extensibility. In this type of system, a plurality
of processors commonly use a main memory device or the processors are
coupled by a high-speed channel, etc. For example, in a tightly coupled
multiprocessor, a single main memory device (hereinafter referred to as
"shared-memory") is commonly used by two or more processors, and the
entire system is managed by a single operating system. On the other hand,
in a loosely coupled multiprocessor, there is no commonly used memory, and
processors are respectively provided with exclusive memories (local
memories).
When a large-scale shared-memory type multiprocessor system has several
tens of processors, the speed of access to the shared memory and the band
width (bus width × bus speed) are important factors which determine
the performance of the multiprocessor. For example, in the case of a
bus-coupled multiprocessor having a single access path to a shared memory,
a plurality of processors require use of a bus. Thus, for example, when
the frequency of access to the shared memory is high, competition for the
use of the bus occurs frequently and the wait time of each processor to
use the bus increases, resulting in a lower performance. In general, the
time for access to a shared memory is much longer than the time of
processing by a processor. Consequently, the performance of the high-speed
processor cannot fully be exhibited.
A method for solving the above problem has been proposed, wherein a copy of
part of memory data stored in the shared memory is kept at a location for
allowing high-speed access by the processor (i.e. a location near the
processor). For example, according to a generally adopted method, a copy
of part of memory data is stored in a local memory or a cache memory. In
this method, a processor capable of accessing the shared memory without a
global access path is provided with a local memory, and a copy of part of
the data stored in the shared memory is held in the local memory. When a
local memory is used, as compared to a large-scale shared memory,
high-speed access is generally obtained. In addition, since a global
access path is not used, a problem of band width is partially solved.
However, in the multiprocessor adopting the above method, since copies of
data stored in the shared memory are present at a plurality of locations,
coherency of data in each cache or local memory must be maintained.
Various methods for maintaining such coherency have been proposed. In one
of these methods, a directory memory is provided along with the shared
memory, and the directory memory stores information representing which
of the processing elements have a copy of data stored in the shared memory.
FIG. 1 shows an example of the structure of a multiprocessing system
adopting the above method. As is shown in FIG. 1, the multiprocessing
system comprises eight processing elements 1 to 8, a shared memory 9 and a
coupling network 10 for coupling the elements 1 to 8 and the shared memory
9.
The processing element 1 includes a CPU 11 for controlling the entire
operations of the element 1, and a cache 21 which stores a copy of part of
data stored in the shared memory 9 and can be accessed at high speed by
the CPU 11. The other processing elements 2 to 8 have the same structure,
and include CPUs 12 to 18 and caches 22 to 28, respectively. The coupling
network 10 may be of a bus type, a cross-bar switch type, or other general
network types.
The shared memory 9 comprises a data memory 19, a directory memory 29, and
a directory information controller 39. The data memory 19 stores various
data items. The data memory 19 comprises a plurality of divided blocks,
and a copy of data is stored in the caches of the processing elements in
units of a block.
The directory memory 29 stores information representing which one of the
processing elements has a copy of data of each block (i.e. "data block")
stored in the data memory 19. Specifically, the directory memory 29 has
the same number of entries as the blocks in the data memory 19. FIG. 2
shows an example of the entry in the directory memory 29. In this example
of multiprocessor system, since eight processing elements 1 to 8 are used,
one entry in the directory memory 29 comprises 8 bits. Each bit of the
entry corresponds to each processing element. If one or more bits of the
entry have value "1", it is indicated that a copy of a data block in the
data memory 19 corresponding to this entry is stored in the cache(s) of
the processing element(s) associated with the bit(s) having value "1".
The directory memory 29 may include an attribute bit such as a modified bit
indicating the fact that the entry has been modified.
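The one-bit-per-element entry described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the mapping of bit 0 to processing element 1 is an assumption, since the actual bit order is defined by FIG. 2.

```python
NUM_PES = 8  # processing elements 1 to 8 in the FIG. 1 example


def set_holder(entry: int, pe: int) -> int:
    """Return the entry with the bit for processing element `pe` (1..8) set to "1"."""
    return entry | (1 << (pe - 1))  # assumed mapping: bit 0 <-> element 1


def holders(entry: int) -> list[int]:
    """List the processing elements whose bit is "1" in the entry."""
    return [pe for pe in range(1, NUM_PES + 1) if entry & (1 << (pe - 1))]


# Elements 2 and 5 each cache a copy of the block.
entry = set_holder(set_holder(0, 2), 5)
print(holders(entry))
```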
In the multiprocessor system having the above structure, when an
invalidating process for indicating that a copy of a certain data block is
invalid is executed, the directory information controller 39 reads out the
entry of the data block concerned from the directory memory 29. Thereby,
the processing element in which a copy of the data block is present can be
identified. A desired process can be performed for the identified
processing element by sending a predetermined message.
In the above system, however, a directory memory having a capacity
proportional to the number of processing elements is required.
Accordingly, in the case where the number of processing elements
increases, the number of entries in the directory memory increases
accordingly. As a result, a quantitative overhead (i.e. a memory capacity
occupied by an operating system and a capacity of a file employed or a
ratio thereof) increases.
For example, when 256 processing elements are provided, the capacity of the
entry in the directory memory for one data block must be 256 bits = 32
bytes. In the case of a system wherein a data block in a data memory
comprises 32 bytes, the capacity of the data memory is equal to that of
the directory memory.
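The arithmetic behind this example can be checked with a short calculation. The function name is hypothetical; the only inputs are the figures given above (one directory bit per processing element, 32-byte data blocks).

```python
def directory_overhead(num_pes: int, block_bytes: int) -> float:
    """Ratio of directory-entry size to data-block size with one bit per element."""
    entry_bytes = num_pes / 8  # one bit per processing element
    return entry_bytes / block_bytes


print(directory_overhead(256, 32))  # 1.0: the directory is as large as the data memory
print(directory_overhead(8, 32))    # 0.03125 for the eight-element system of FIG. 1
```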
In the conventional directory-type multiprocessor as described above, the
directory information (bits) corresponding to the number of processing
elements is required. Consequently, in a system having a great number of
processing elements, the capacity of the directory memory and the
quantitative overhead increase.
SUMMARY OF THE INVENTION
The object of the present invention is to provide a directory-type
multiprocessor system having a directory memory of a predetermined
capacity, irrespective of the number of processing elements, thereby
enhancing memory efficiencies.
In order to achieve the above object, according to a first aspect of the
invention, there is provided a multiprocessor system having a data memory
for storing data in a plurality of divided block areas, and a plurality of
processing elements each having a cache memory for holding a copy of part
of the data stored in the divided block areas of the data memory in units
of a data block stored in each of the divided block areas, the system
comprising:
a directory memory for classifying the plurality of processing elements
into processing groups each comprising at least one of the processing
elements, and holding directory information indicating which of the
processing groups holds a copy of the data stored in the data memory in
units of the data block; and
control means for referring to the directory memory, in response to a
request from a given one of the processing elements, identifying the
processing group holding the copy of data, and delivering a predetermined
message to the processing elements belonging to the identified processing
group.
According to a second aspect of the invention, there is provided a
directory management method applicable to a multiprocessing system having
a data memory for storing data in a plurality of divided block areas, a
plurality of processing elements each having a cache memory for holding a
copy of part of the data stored in the divided block areas of the data
memory in units of a data block stored in each of the divided block areas,
and a directory memory storing information relating to a copy of the data
stored in the data memory, the method comprising the steps of:
a) classifying the plurality of processing elements into processing groups
each comprising at least one of the processing elements, and holding in
the directory memory directory information indicating which of the
processing groups holds a copy of the data stored in the data memory in
units of the data block; and
b) referring to the directory memory, in response to a request from a given
one of the processing elements, identifying the processing group holding
the copy of data, and delivering a predetermined message to the processing
elements belonging to the identified processing group.
According to the invention having the above structure, there are provided a
data memory and a directory memory accompanying the data memory. The
directory memory stores information (directory information) indicating
which one of the processing groups holds a copy of memory data. A
plurality of processing elements are classified as one group, and each bit
of each entry of the directory corresponds to each processing group. In
this invention, it is indicated that a copy of data is present in at least
one of a plurality of processing elements belonging to the processing
group. There are provided a group information control unit for controlling
the group information and a directory information control unit.
Thus, when an invalidation process is executed for a copy of a certain
memory block, the directory information control unit reads out the
directory information from the directory memory. The group information
control unit identifies, from the directory information, the processing
group having the copy and specifies the processing elements belonging
to this group. The directory information control unit delivers a
predetermined message to each of the specified processing elements.
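The fan-out of an invalidation message can be sketched as below. This is a simplified model, not the circuit of the embodiments: processing elements and groups are numbered from 0, every group is assumed to have the same size, and the function name is hypothetical. Note that every element of a flagged group receives the message, because the entry only records that at least one element of the group holds a copy.

```python
def invalidation_targets(entry: int, group_size: int, num_groups: int) -> list[int]:
    """Indices of all processing elements in every group whose directory bit is "1"."""
    targets = []
    for g in range(num_groups):
        if entry & (1 << g):
            # All members of group g are addressed, whether or not each holds a copy.
            targets.extend(range(g * group_size, (g + 1) * group_size))
    return targets


# Groups 0 and 2 are flagged; each group has four elements.
print(invalidation_targets(0b101, group_size=4, num_groups=3))
```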
Accordingly, in a multiprocessor system, the capacity of the directory
memory can be set irrespective of the number of processing elements, and a
directory-type multiprocessor system with high memory efficiencies can be
realized. Therefore, a large-scale multiprocessing system having a number
of processing elements can easily be constructed, and a decrease in
performance of the system due to an increase in the number of processing
elements can be prevented.
Additional objects and advantages of the invention will be set forth in the
description which follows, and in part will be obvious from the
description, or may be learned by practice of the invention. The objects
and advantages of the invention may be realized and obtained by means of
the instrumentalities and combinations particularly pointed out in the
appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part
of the specification, illustrate presently preferred embodiments of the
invention, and together with the general description given above and the
detailed description of the preferred embodiments given below, serve to
explain the principles of the invention.
FIG. 1 is a diagram showing the structure of a prior-art multiprocessing
system;
FIG. 2 shows an example of an entry stored in the directory memory, shown
in FIG. 1;
FIG. 3 is a block diagram showing the structure of a multiprocessing system
according to a first embodiment of the present invention;
FIGS. 4A to 4C show bit sequences of a processing element address (FIG.
4A), a processing group address (FIG. 4B) and an entry (FIG. 4C) stored in
a directory memory;
FIGS. 5A and 5B are flow charts illustrating a data read operation in the
first embodiment;
FIGS. 6A to 6D illustrate bit sequences of the processing element address,
processing group address and entry stored in the directory memory in the
data read operation;
FIGS. 7A to 7E are flow charts for illustrating a data write operation in
the first embodiment;
FIGS. 8A to 8G illustrate bit sequences of the processing element address,
processing group address and entry stored in the directory memory in the
data write operation;
FIG. 9 is a block diagram showing another example of the processing element
in the first embodiment; and
FIG. 10 is a block diagram showing the structure of a multiprocessor system
according to a second embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Embodiments of the present invention will now be described with reference
to the accompanying drawings.
FIG. 3 shows a structure of a multiprocessor system according to a first
embodiment of the present invention. The multiprocessor system according
to the first embodiment is a shared-memory type multiprocessor system
adopting a directory system for supporting 124 processing elements.
The processing elements 100 to 223 and a shared memory (or main storage
device) 300 are interconnected via a network 400. Various data items are
transmitted via the network 400. As shown in FIG. 3, the processing
elements 100 to 223 are divided into groups each comprising four
processing elements.
Accordingly, the 124 processing elements 100 to 223 in the first
embodiment are divided into 31 processing groups 50 to 80. The grouping is
not determined by physical conditions such as location of installation of
processing elements, but it is a concept employed in group information
control (described later).
The processing elements (PE) 100 to 223 comprise, respectively, CPUs
(Central Processing Unit) 500 to 623 for controlling the entire operations
of the processor, caches 700 to 823 which can be accessed at high speed and
store copies of part of data stored in the shared memory 300, and network
access controllers 900 to 1023 for transmitting via the network 400 data
between the processing elements and the shared memory 300 or between the
processing elements.
The caches 700 to 823 store data in units of a data block in response to
access operations of the associated processing elements 100 to 223 to the
shared memory 300. Normally, when data read/write processing is executed,
the CPUs 500 to 623 refer to tags (TAG) provided in the caches 700 to 823
and determine whether data concerned is stored in the caches 700 to 823
("cache hit") or not ("cache mishit"). In the case of cache hit, data
concerned in the caches 700 to 823 is accessed and predetermined
processing is executed. In the case of cache mishit, the CPUs 500 to 623
access the shared memory 300 via the network access controllers 900 to
1023 and network 400, fetch a data block including the data
concerned into the caches 700 to 823, and execute a predetermined process
on the stored data. By doing this operation, frequently accessed data
is stored in the caches 700 to 823.
The caches 700 to 823 have status information items 700a to 823a,
respectively, for representing the state of each stored data block. There
are three statuses of data blocks: "invalid", "shared" and "modified". The
status "invalid" indicates that the data block corresponding to this
status is invalid. The term "invalid" means that the data block
corresponding to the status information "invalid" was subjected to a write
operation (i.e. updated) by a processing element other than the processing
element holding this data block (status information = "invalid"). The term
"shared" means that two or more of the processing elements 100 to 223 hold
the data block corresponding to this status information. The term
"modified" means that the data block corresponding to the status
information has been updated and is held exclusively by the processing
element having the cache holding this data block. The status information
may be designed such that it is held in the aforementioned tag.
The coupling network 400 may be of a bus type, a cross-bar switch type, or
other general network types.
The shared memory 300 comprises a data memory 301, a directory memory 302,
a directory information control unit 303, and a group information control
unit 304. The data memory 301 holds various data items. The data memory
301 comprises a plurality of divided blocks, and a copy of data is stored
in the caches 700 to 823 of the processing elements 100 to 223 in units of
a block (e.g. bytes) by the above-described copying process. Specifically,
the unit of the divided block is determined by a transfer unit of data
from the shared memory 300 to the caches 700 to 823.
The directory memory 302 stores directory information indicating which one
of the processing groups 50 to 80 holds a copy of each data block in the data
memory 301. Specifically, the directory memory 302 has the same number of
entries as the blocks in the data memory 301. In this embodiment, a
predetermined bit of the entry bits of the directory memory 302 is used as
a shared bit, and the shared bit indicates whether the associated memory
data is held in a plurality of processing elements. More specifically,
when the shared bit is "1", the associated memory data is held in two or
more of the processing elements 100 to 223. In this case, the default
value of the shared bit is "1".
The directory information control unit 303 has an arithmetic circuit 303a
for performing various arithmetic operations including a decoding
operation, and effecting transmission of data such as memory addresses and
various messages between the directory information control unit 303 and
the processing elements 100 to 223.
The group information control unit 304 finds processing elements of a
processing group from a processing group address (PE group #) assigned to
this processing group, and finds a PE group # from a processing element
address (PE#) assigned to processing elements.
The processing group will now be briefly described. In general, the
processing group can be formed by the following method.
Suppose that a bit sequence of a processing element address (PE#) assigned
to a processing element is
$a_{n-1}, a_{n-2}, \ldots, a_{1}, a_{0}$
and a bit sequence of an entry in the directory memory is
$d_{k-1}, d_{k-2}, \ldots, d_{1}, d_{0}$
where $k \leq 2^{m}$
and
$n \geq m$.
In this case, suppose that a value (0 to $2^{m}-1$) represented by a bit
sequence of m bits constituted by $a_{n-1}, a_{n-2}, \ldots, a_{n-m}$ is
referred to as "processing group number i", and each processing element
belongs to group i.
Accordingly, the bit position of the entry of the directory memory
corresponding to the processing group i is represented by $d_{i}$. If the
bit $d_{i}$ is "1", it is indicated that there is a copy of memory data
concerned within the processing group i.
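The general method above can be sketched as follows. The function names are hypothetical, and the values n = 4 and m = 2 in the usage lines are illustrative only; they are not taken from the embodiments.

```python
def group_number(pe_addr: int, n: int, m: int) -> int:
    """Processing group number i: the upper m bits of an n-bit PE#."""
    if m > n or not (0 <= pe_addr < (1 << n)):
        raise ValueError("expects n >= m and an n-bit address")
    return pe_addr >> (n - m)


def holds_copy(entry: int, i: int) -> bool:
    """True if bit d_i of the directory entry is "1"."""
    return bool((entry >> i) & 1)


# Illustrative values only: a 4-bit PE# with a 2-bit group number.
i = group_number(0b1101, n=4, m=2)
print(i, holds_copy(0b1000, i))
```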
The above-described general method will now be applied to the first
embodiment. As has been described above, in the multiprocessor system of
the first embodiment, 124 processing elements 100 to 223 are supported.
Identification of 124 processing elements requires a 7-bit information
capacity. Accordingly, a processing element address (PE#) assigned to the
processing elements 100 to 223 comprises 7 bits (n=7).
In addition, in the first embodiment, 31 processing groups 50 to 80 are
provided, and a 5-bit information capacity is required to identify these
processing groups. Accordingly, a processing group address (PE group #)
assigned to the processing groups 50 to 80 comprises 5 bits (m=5). The
m-th power of 2 is 32 (k=32), and one bit is unnecessary for 31 processing
groups. This one bit may be used to indicate other information, and in
this embodiment this bit is used as a shared bit indicating whether memory
data corresponding to an entry is shared or not.
FIGS. 4A to 4C show the aforementioned bit structures. FIG. 4A illustrates
a processing element address (PE#), FIG. 4B a processing group address (PE
group #), and FIG. 4C an entry (directory information) in the directory
memory 302. As is shown in FIG. 4C, the highest bit of the directory
information is a shared bit indicating whether memory data corresponding
to an entry is shared or not, and the other bits 0 to 30 correspond to the
processing groups 50 to 80. For example, if bit 0 is "1", it is indicated
that data corresponding to the entry is stored in at least one of the
processing elements 100 to 103 belonging to the processing group 50.
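The entry layout of FIG. 4C can be read back as in the sketch below. The function names are hypothetical, and using the reference numerals (50 to 80, 100 to 223) as identifiers is a convenience of this illustration; it assumes, as the text indicates, that each group consists of four consecutively numbered processing elements.

```python
GROUP_BITS = 31        # bits 0 to 30 <-> processing groups 50 to 80
SHARED_BIT = 1 << 31   # highest bit of the 32-bit entry
PES_PER_GROUP = 4


def groups_holding(entry: int) -> list[int]:
    """Reference numerals (50..80) of the groups whose bit is "1"."""
    return [50 + i for i in range(GROUP_BITS) if entry & (1 << i)]


def members(group: int) -> list[int]:
    """Reference numerals of the four processing elements in a group."""
    first = 100 + (group - 50) * PES_PER_GROUP
    return list(range(first, first + PES_PER_GROUP))


def is_shared(entry: int) -> bool:
    """True if the shared bit (highest bit) is "1"."""
    return bool(entry & SHARED_BIT)


entry = 1 << 0  # only bit 0 is "1"
print(groups_holding(entry), members(50), is_shared(entry))
```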
The operation of the first embodiment will now be described with respect to
the read/write operation performed by the CPU 500 in the processing
element 100. FIGS. 5A and 5B are flow charts illustrating the data read
operation. As described above, 7-bit addresses PE# are assigned to the
processing elements 100 to 223. Address PE# "0000000" is assigned to the
processing element 100, and the subsequent processing elements 101 to 223
are provided with addresses increased sequentially with an increment of 1,
and PE# "1111011" is assigned to the processing element 223.
When a data read operation is executed, the CPU 500 refers to the tag in
the cache 700 and determines whether data concerned is stored in the cache
700 ("cache hit") or not ("cache mishit") (step A1). In this embodiment,
even in the case where the data to be read is stored, if the status
information of the data block including this data is "invalid", the cache
mishit is determined.
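The hit determination of step A1 can be sketched as below. Modeling the cache as a dictionary from block address to status is an assumption made for brevity, and the names are hypothetical; the three status values are those listed earlier for the status information.

```python
from enum import Enum


class Status(Enum):
    INVALID = "invalid"
    SHARED = "shared"
    MODIFIED = "modified"


def is_read_hit(cache: dict[int, Status], block_addr: int) -> bool:
    """Step A1: a block counts as a hit only if it is present and not "invalid"."""
    status = cache.get(block_addr)  # None models a tag mismatch
    return status is not None and status is not Status.INVALID


cache = {0x40: Status.SHARED, 0x80: Status.INVALID}
print(is_read_hit(cache, 0x40), is_read_hit(cache, 0x80), is_read_hit(cache, 0xC0))
```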
In the case of cache hit ("YES" in step A1), the CPU 500 accesses the cache
and reads out data concerned (step A3).
In the case of mishit ("NO" in step A1), the CPU 500 delivers a
predetermined command to the network access controller 900 and accesses
the shared memory 300 via the network 400 (step A5). The processing
element 100 sends to the shared memory 300 a memory address of data to be
read and the PE# assigned to the processing element 100. The subsequent
processing is executed in the shared memory 300.
The directory information control unit 303 receives the memory address and
PE# sent by the processing element 100 and delivers the PE# to the group
information control unit 304 (step A7). The group information control unit
304 finds a PE group # from the received PE# and sends the PE group # to
the directory information control unit 303 (step A9). As shown in FIG. 6A,
the higher five bits of the PE# are taken out and used as PE group #.
Specifically, since PE# "0000000" is assigned to the processing element
100, the higher five bits "00000" thereof are taken out and used as PE
group # of the processing group 50 to which the processing element 100
belongs. Thus, for example, even if a read request is issued from the
processing element 101, the processing element 101 belongs to the same
processing group as the processing element 100, and therefore the PE
group # is again "00000". When a
read request is issued from the processing element 223, PE group # "11110"
is taken out from the PE# "1111011" assigned to the processing element
223.
The directory information control unit 303, which has received the PE group
# from the group information control unit 304, decodes the PE group # and
generates 31-bit directory information excluding the shared bit (step
A11). In this case, the processing group 50 including the processing
element 100 corresponds to the bit 0 of the directory information. Thus,
the directory information, which is a decoded result, is obtained, as
shown in FIG. 6B.
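The decoding of step A11 may be sketched as follows. The sketch assumes
that bit 0 of the directory information is the least significant bit of an
integer and that PE group # n corresponds to bit n; the names are
hypothetical.

```python
NUM_GROUPS = 31  # bits 0 to 30 of the directory information


def decode_group(group_number: int) -> int:
    """Return 31-bit directory information with only the group's bit set."""
    if not 0 <= group_number < NUM_GROUPS:
        raise ValueError("PE group # must be in the range 0 to 30")
    return 1 << group_number


decoded_group_50 = decode_group(0b00000)  # processing group 50 -> bit 0
decoded_group_80 = decode_group(0b11110)  # PE group # "11110" -> bit 30
```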
Subsequently, the directory information control unit 303 reads out the
entry corresponding to the memory address from the directory memory 302
(step A13). When directory information read out from the directory memory
302 is a bit sequence as shown in FIG. 6C, the shared bit is "1". It is
thus understood that the data represented by the memory address is held
by one or more of the other processing elements 101 to 223. Moreover,
since bit 1, bit 10
and bit 30 are "1", it is understood that the data represented by the
memory address is held by at least one processing element of each of the
processing groups 51, 60 and 80 corresponding to bits 1, 10 and 30,
respectively.
The directory information control unit 303 performs a logical OR operation
on the decoded directory information shown in FIG. 6B and bits 0 to 30 of
the directory information read out from the directory memory 302, and
writes the OR result in the directory memory 302 as new directory
information (see FIG. 6D) (step A15). If, however, at least one of bits 0
to 30 is "1" in the directory information read out from the directory
memory 302, the shared bit of the new directory information is set to "1"
when the write operation is effected in the directory memory 302.
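The update of step A15 may be sketched as follows. The position of the
shared bit within the 32-bit entry is not stated in this passage; placing
it at bit 31 is an assumption of this sketch, as are the names used. The
example values reproduce the case in which bits 1, 10 and 30 and the
shared bit are already set and the requester belongs to the group of
bit 0.

```python
GROUP_MASK = (1 << 31) - 1  # bits 0 to 30, one bit per processing group
SHARED_BIT = 1 << 31        # assumed position of the shared bit


def update_on_read(old_entry: int, decoded: int) -> int:
    """OR the requester's group bit into the entry and derive the shared bit."""
    old_groups = old_entry & GROUP_MASK
    new_entry = old_groups | (decoded & GROUP_MASK)
    if old_groups != 0:
        # At least one group already held the block before this read.
        new_entry |= SHARED_BIT
    return new_entry


old_entry = SHARED_BIT | (1 << 30) | (1 << 10) | (1 << 1)
new_entry = update_on_read(old_entry, 1 << 0)
first_reader_entry = update_on_read(0, 1 << 0)  # no previous holder
```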
Thereafter, the directory information control unit 303 accesses the data
memory 301 and takes out the data block including data designated by the
memory address and sends this data block to the origin of the read
request, i.e., the processing element 100 (steps A17 and A19). It should
be noted that the subsequent operation is again performed by the CPU 500.
The network access controller 900 in the CPU 500 receives the data block
via the coupling network 400 and the status information corresponding to
the area storing this data block is changed from "invalid" to "shared"
(steps A21 and A23). Thereafter, the data block is stored in the cache 700
and the tag is updated accordingly (step A25). The CPU 500 reads out the
data represented by the memory address from the written data block (step
A27). Thus, the data read process is completed.
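Steps A21 to A27 on the requesting side may be sketched as follows. This
is a minimal sketch only: the dictionary keyed by block address stands in
for the tag and data arrays of the cache 700, and the CacheLine type, the
function name and the byte offset parameter are assumptions not found in
the embodiment.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    status: str = "invalid"  # "invalid", "shared" or "modified"
    data: bytes = b""


def fill_and_read(cache: dict, block_address: int, block: bytes, offset: int) -> int:
    """Install a received data block and read one byte from it."""
    line = cache.setdefault(block_address, CacheLine())
    line.status = "shared"    # steps A21 and A23: "invalid" -> "shared"
    line.data = block         # step A25: store the block (the key acts as the tag)
    return line.data[offset]  # step A27: read the requested data


cache = {}
value = fill_and_read(cache, 0x40, b"\x10\x20\x30\x40", 2)
```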
With reference to the flow charts of FIGS. 7A to 7E, the write operation
will now be described. As in the description of the read operation,
suppose that the CPU 500 of the processing element 100 performs a write
operation on given data.
When the data write process is executed, the CPU 500 refers to the tag of
the cache 700 and determines whether the data of the memory address to be
written is held in the cache 700 (step B1). In the case of cache hit
("YES" in step B1), the CPU 500 determines whether the status information
of the data to be written is "shared" or "modified" (step B3). If the
status information is "modified", this indicates that the data concerned
is held exclusively by the processing element 100. Thus, if the status
information is "modified", the CPU 500 writes the data in the cache 700
(step B5).
If the status information is "shared", this indicates that the data to be
written is already held in a plurality of processing elements. Thus, the
CPU 500 issues a predetermined command to the network access controller
900 and accesses the shared memory 300 via the network 400 (step B7). In
this case, the processing element 100 sends to the shared memory 300 the
memory address of the data to be written and the PE# assigned to the
processing element 100. The subsequent processing is executed in the
shared memory 300.
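The branch taken on a write hit (steps B3 to B7) may be sketched as
follows. The function name and the returned action labels are hypothetical
and serve only to make the two cases described above explicit; any other
status is outside the passage above and is rejected here.

```python
def write_hit_action(status: str) -> str:
    """Choose the action for a write that hits in the cache (step B3)."""
    if status == "modified":
        # The block is held only by this processing element: write locally (step B5).
        return "write_to_cache"
    if status == "shared":
        # Other processing elements may hold copies: go to the shared memory (step B7).
        return "access_shared_memory"
    raise ValueError("status not covered by this sketch: " + status)


action_modified = write_hit_action("modified")
action_shared = write_hit_action("shared")
```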
In response to the access by the processing element 100, the directory
information control unit 303 reads out the entry (directory information,
32 bits) corresponding to the write data in the directory memory 302 (step
B9). The read-out directory information is sent to the group information
control unit 304 (step B11). The group information control unit 304
calculates, from the received directory information, one PE group # of the
processing group holding the write data (step B13). Further, the group
information control unit 304 calculates, from the calculated PE group #,
the PE# of the processing element belonging to this processing group, and
delivers the PE# to the directory information control unit 303 (step B15).
Based on the received PE#, the directory information control unit 303
delivers the memory address of the write data and an invalidation message
to this processing element (step B17). Thereafter, it is determined whether the
memory address and invalidation message have been delivered to all
processing elements that belong to the aforementioned processing group,
except for the processing element 100 that originated the request (step B19).
If this process has not been executed for all the processing elements
("NO" in step B19), the control routine returns to step B15 and the
processing of step B17 is executed for the processing elements to which
the invalidation message has not been delivered.
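The selection of the destinations of the invalidation message may be
sketched as follows. The sketch assumes that each processing group
contains four processing elements, which follows from a 7-bit PE# whose
higher five bits are the PE group #, and that bit n of the directory
information corresponds to PE group # n. Covering every group whose bit is
set in a single pass is a simplification of this sketch, and the names are
hypothetical.

```python
NUM_GROUPS = 31      # bits 0 to 30 of the directory information
PES_PER_GROUP = 4    # the lower two bits of the 7-bit PE# (assumption)


def invalidation_targets(directory_entry: int, requester_pe: int) -> list:
    """List the PE#s that receive the invalidation message."""
    targets = []
    for group in range(NUM_GROUPS):
        if not directory_entry & (1 << group):
            continue  # no processing element of this group holds the block
        for low_bits in range(PES_PER_GROUP):
            pe_number = (group << 2) | low_bits
            if pe_number != requester_pe:  # the origin of the request is skipped
                targets.append(pe_number)
    return targets


# Groups of bits 11 and 25 hold the block; the requester has PE# 0.
targets_two_groups = invalidation_targets((1 << 11) | (1 << 25), 0)
# Only the requester's own group holds the block.
targets_own_group = invalidation_targets(1 << 0, 0)
```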
For example, if the directory information as shown in FIG. 8A is read out
in step B11, bit 11 and bit 25 are "1". Thus, it is indicated that the
data to be written by the CPU 500 is held in the processing elements
belonging to the processing groups 61 and 75. In step