WikiPatents - Community Patent Review
Data transfer apparatus
United States Patent 5408613
Link to this page: http://www.wikipatents.com/5408613.html
Inventor(s): Okabayashi; Ichiro (Osaka, JP)
Abstract: A parallel processing system that enables mixed transfer of packets of different lengths is provided. The data transfer apparatus in this parallel processing system comprises four address controllers. During the first data transfer, the third address controller is used for sending and the first address controller is used for receiving data. During the second transfer, the fourth address controller is used for sending, and the second address controller is used for receiving. When the first and second transfer operations are mixed, the first and second address controllers are selectively used during receiving, and the third and fourth address controllers are selectively used during sending. Each of the address controllers changes the address only after packet transfer is completed. The header of the packet contains a packet length field, which is interpreted to enable simultaneous, dynamic handling of plural packets of different lengths. As a result, packets can be transferred without deadlocks even when packets of different lengths are mixed.
Title Information

Owner/Assignee: Matsushita Electric Industrial Co., Ltd. (Osaka, JP)
Publication Date: April 18, 1995
Application Number: 07/995,873
Filing Date: December 23, 1992
US Classification: 709/234; 709/233; 709/245
Int'l Classification: G06F 013/14
Examiner: Robertson; David L.
Attorney/Law Firm: Willian Brinks Hofer Gilson & Lione
Priority Data: Dec 24, 1991 [JP] 3-340618
USPTO Field of Search: 395/200; 395/250; 395/275; 395/325; 395/425
References

U.S. References

5315707, Seaman, 710/56, May 1994
5047917, Athas, 719/314, Sep 1991
4769771, Lippmann, 709/213, Sep 1988
4667287, Allen, 709/234, May 1987
Claims
 


What is claimed is:

1. A data transfer apparatus comprising a memory port connected to an external memory area,

a network port connected to a network,

buffers connected between the memory port and the network port for holding the data flowing between the two ports,

an interpreter and evaluation means connected to the memory port side of the buffers for interpreting the operating mode encoded in the transfer data, and

an address pointer pointing to a specific address and connected to the interpreter and evaluation means such that

when data flows from the network port to the memory port, the interpreter and evaluation means decodes the memory port buffer output;

when the memory port buffer output is interpreted to be a memory read request and an immediate response is enabled as determined by a buffer not-full state, the interpreter and evaluation means enters the corresponding memory-read sequence to read the data from the memory area specified by the memory read request;

when an immediate response is not enabled as determined by a buffer full state, the interpreter and evaluation means writes the memory read request to the memory area indicated by the address pointer so that the interpreter and evaluation means can fetch the memory read request from this same memory area when response becomes possible as determined by a buffer not-full state, interpret the read memory read request, and thus proceed with the corresponding memory-read sequence.

2. A data transfer apparatus according to claim 1 wherein the external memory area is used as a first-in, first-out (FIFO) device by means of write address pointers and read address pointers whereby

the request is written to the address indicated by the write address pointer and the write address pointer is then advanced when there is a write request, and

the request is read from the address indicated by the read address pointer and the read address pointer is then advanced when there is a read request.

3. A data transfer apparatus according to claim 2 further comprising an externally programmable register for storing the initialization values of the two address pointer sets, and information defining the allowable range of address pointer change, and

the address pointers are reset to the initialization values after being advanced to the range limit.

4. A data transfer apparatus according to claim 1, 2, or 3 wherein the data format comprises a field for identifying the transfer mode, and

a field for identifying the memory address.

5. A parallel processing system comprising plural processor elements each having a data transfer apparatus according to claim 1, 2, 3, or 4 as the interface to the network, and

a network connecting the processor elements to enable data transfer therebetween.

6. A data packet transfer apparatus comprising a memory port connected to an external memory area,

first and second network ports connected to a network,

a data selector of which the output is connected to the second network port,

a first buffer of which the input is connected to the first network port, and the output is connected to the memory port,

a second buffer of which the output is connected to a first input of the data selector, and the input is connected to the first network port,

a third buffer of which the output is connected to a second input of the data selector, and the input is connected to the memory port,

a first packet interpreter for detecting the packet length included in the data input from the first network port,

first and second address controllers for setting the network switch addresses according to the output from the first packet interpreter,

a first address selector of which the input is the output from the first and second address controllers, and the output is connected to the first network port,

a first control means for controlling the data transfer operation of the first network port,

a second packet interpreter for detecting the packet length included in the data input from the third buffer to determine the packet length,

a third address controller for setting the network switch addresses according to the output from the second packet interpreter,

a third packet interpreter for detecting the packet length included in the data input from the second buffer to determine the packet length,

a fourth address controller for setting the network switch addresses according to the output from the third packet interpreter,

a second address selector of which the input is the output from the third and fourth address controllers, and the output is connected to the second network port, and

a second control means for controlling the data transfer operation of the second network port.

7. A parallel processing system comprising an N×N matrix of processor elements arranged in a two-dimensional array with a data packet transfer apparatus comprising a memory port connected to an external memory area,

first and second network ports connected to switches of the network,

a data selector of which the output is connected to the second network port,

a first buffer of which the input is connected to the first network port, and the output is connected to the memory port,

a second buffer of which the output is connected to a first input of the data selector, and the input is connected to the first network port,

a third buffer of which the output is connected to a second input of the data selector, and the input is connected to the memory port,

a first packet interpreter for detecting the packet length included in the data input from the first network port,

first and second address controllers for setting network switch addresses according to the output from the first packet interpreter,

a first address selector of which the input is the output from the first and second address controllers, and the output is connected to the first network port,

a first control means for controlling the data transfer operation of the first network port,

a second packet interpreter for detecting the packet length included in the data input from the third buffer to determine the packet length,

a third address controller for setting the network switch addresses according to the output from the second packet interpreter,

a third packet interpreter for detecting the packet length included in the data input from the second buffer to determine the packet length,

a fourth address controller for setting the network switch addresses according to the output from the third packet interpreter,

a second address selector of which the input is the output from the third and fourth address controllers, and the output is connected to the second network port, and

a second control means for controlling the data transfer operation of the second network port functioning as the interface to the network, and

an N×N×N matrix of switches in a three-dimensional arrangement with each switch comprising two internal buffers;

when in this network the processor element in row i column j is identified as PEij and the number-k switch at row i column j is identified as SWijk where i, j, k, and l are integers greater than or equal to zero and less than or equal to (N-1),

one network port of PEij is connected in common to one terminal of N switches SWijk (k=0, 1, . . . N-1), the other network port of PEij is connected in common to one terminal of N switches SWjli (l=0, 1, . . . N-1), the other terminal of SWijk is connected to PEki, and the other terminal of SWjli is connected to PEjl, and

when plural packets comprising a header and plural data blocks are transferred from PEij to PElk, the packets are transferred from PEij through the first buffer of SWijk to PEki in the first transfer step, and from PEki through the second buffer of SWkil to PElk in the second transfer step, and

data transfer apparatus PEki specifies using the first address controller the addresses of those N switches SWijk (j=0, 1, . . . N-1) of which the first buffer is in use,

specifies using the second address controller the addresses of those N switches SWijk (j=0, 1, . . . N-1) of which the second buffer is in use,

specifies using the third address controller the addresses of those N switches SWkil (l=0, 1, . . . N-1) of which the first buffer is in use,

specifies using the fourth address controller the addresses of those N switches SWkil (l=0, 1, . . . N-1) of which the second buffer is in use,

uses the third address controller when sending data in the first transfer step, and uses the first address controller when receiving data in the first transfer step,

uses the fourth address controller when sending data in the second transfer step, and uses the second address controller when receiving data in the second transfer step, and

when the first and second transfers are mixed, switches between the first and second address controllers when receiving, and switches between the third and fourth address controllers when sending,

and the address controllers are characterized by changing the address only at the breaks between packets during the data transfer operation.

8. A parallel processing system according to claim 7 wherein the packets used comprise a field indicating the transfer mode,

a field indicating the switch address in the network,

a field indicating a broadcast, which signifies transfer to all switches, and

a field indicating the packet length.
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a means of achieving high functionality and high speed operation in the data transfer component of parallel processing systems, which are widely anticipated in the computer field for high speed processing applications.

2. Description of the Prior Art

Widespread use of large-scale mathematical simulations has significantly increased demand for higher operating speeds in computer processing systems. Parallel processing systems have been developed as one of the most promising future supercomputer technologies, and various systems have been described in the literature.

In a parallel processing architecture, however, data is transferred between processor elements with significantly greater frequency, and the performance and functionality of the data transfer operation significantly affects overall system performance. More specifically, the greatest problems faced in improving the performance of a parallel processing computer are the performance of the individual processors, the software, and the processor-processor data transfer capacity and functionality. This has led to numerous proposals relating specifically to transferring data between processors.

A typical parallel processing system according to the prior art is described below with reference to FIGS. 10 and 11, a block diagram of the conventional parallel processing system and a diagram of the data packet configuration, respectively. It is to be noted that this device has been proposed in Japanese Patent Laid-Open No. S63-124162.

As shown in FIG. 10, this device comprises row crossbar switches 50a and 50b, column crossbar switches 51a and 51b, and element processors 53a-53d. Each element processor 53 comprises input and output ports to row and column crossbar switches 50 and 51. Each data packet (FIG. 11) comprises a header, which contains two switch addresses EW and SN and a routing reset bit R, and a data area.

The operation whereby a packet is transferred from one element processor 53a to another element processor 53d in this conventional parallel processing system is described below.

The packet is transferred in sequence from the element processor 53a to the row crossbar switch 50a, element processor 53c, column crossbar switch 51b, and then to the element processor 53d. The switch addresses EW and SN specify the column and row, respectively, for this operation. In this example both addresses are set to 1. If an error is detected on this route, the routing reset bit R is set to 1, and the packet is resent. If, in this example, an error occurs in the intermediate element processor 53c, the packet is sent the next time from the element processor 53a to the column crossbar switch 51a, element processor 53b, row crossbar switch 50b, and then to the addressed element processor 53d. It is therefore possible to transfer data packets between element processors 53 even if an error occurs in one of the element processors 53 used for routing.
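The two prior-art routes just described can be modelled in a few lines. This is an illustrative sketch, not taken from the cited application: the function name route and the grid placement of element processors 53a-53d (53a at row 0, column 0; 53c at row 0, column 1; 53b at row 1, column 0; 53d at row 1, column 1) are assumptions, chosen so that row crossbar switches 50a/50b and column crossbar switches 51a/51b connect the same processors as in the routes described for FIG. 10.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column); assumed placement, see lead-in

NAMES: Dict[Coord, str] = {(0, 0): "53a", (0, 1): "53c", (1, 0): "53b", (1, 1): "53d"}
ROW_XBAR = {0: "50a", 1: "50b"}  # row crossbar switch serving each row (assumed)
COL_XBAR = {0: "51a", 1: "51b"}  # column crossbar switch serving each column (assumed)


def route(src: Coord, ew: int, sn: int, r_bit: int) -> List[str]:
    """Hops from src to the element processor at row sn, column ew."""
    row, col = src
    if r_bit == 0:
        # Normal route: row crossbar to the target column, then column crossbar.
        mid = (row, ew)
        hops = [ROW_XBAR[row], NAMES[mid], COL_XBAR[ew]]
    else:
        # Retry route after an error: column crossbar first, then row crossbar.
        mid = (sn, col)
        hops = [COL_XBAR[col], NAMES[mid], ROW_XBAR[sn]]
    return [NAMES[src], *hops, NAMES[(sn, ew)]]
```

With R = 0 the packet takes the row crossbar first; setting R = 1 after an error selects the column-first route, which bypasses the intermediate element processor 53c.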

However, the following problems are presented by this configuration.

First, Japanese Patent Laid-Open No. S63-124162 describes only how one element processor sends data to another element processor (called "storing"), and does not describe how one element processor reads data from another element processor (called "loading"). Loading, however, is easier to handle directly in software and is therefore preferable because of the greater software flexibility it allows. If both loading and storing are supported, the memory distributed among the processor elements can be freely accessed from any part of the architecture, which results in more flexible software and a system with higher general utility.

A further drawback is that, because the header contains no packet length information, a single, common packet length must be used throughout the system. Packets of different lengths therefore cannot be handled.

In addition, broadcasting data from one element processor to all other element processors is possible only by addressing the data individually to each of the other element processors.

Finally, this architecture requires measures against data locking (deadlock) when a large number of packets is transferred, and no such measures are described in the application.

SUMMARY OF THE INVENTION

Therefore, an object of the present invention is to provide a data transfer apparatus and parallel processing system which supports data loading, enables packet transmission without data locking occurring when many packets of different lengths are handled, and can enable high speed data broadcasts.

To achieve these objects, a data transfer apparatus according to the present invention comprises a memory port connected to an external memory area; a network port connected to a network; buffers connected between the memory port and the network port for holding the data flowing between the two ports; an interpreter and evaluation means connected to the memory port side of the buffers for interpreting the operating mode encoded in the transfer data; and an address pointer pointing to a specific address and connected to the interpreter and evaluation means.

When data flows from the network port to the memory port, the interpreter and evaluation means decodes the memory port buffer output. When the memory port buffer output is interpreted to be a memory read request and an immediate response is enabled as determined by a buffer not-full state, the interpreter and evaluation means enters the corresponding memory-read sequence to read the data from the memory area specified by the memory read request. When an immediate response is not enabled as determined by a buffer full state, the interpreter and evaluation means writes the memory read request to the memory area indicated by the address pointer so that the interpreter and evaluation means can fetch the memory read request from this same memory area when response becomes possible as determined by a buffer not-full state, interpret the read memory read request, and thus proceed with the corresponding memory-read sequence.

A data transfer apparatus according to an alternative embodiment of the invention comprises a memory port connected to an external memory area; first and second network ports connected to a network of switches; a data selector of which the output is connected to the second network port; a first buffer of which the input is connected to the first network port, and the output is connected to the memory port; a second buffer of which the output is connected to the first input of the data selector, and the input is connected to the first network port; a third buffer of which the output is connected to the second input of the data selector, and the input is connected to the memory port; a first packet interpreter for detecting the packet length included in the data input from the first network port; first and second address controllers for setting the network switch addresses according to the output from the first packet interpreter; a first address selector of which the input is the output from the first and second address controllers, and the output is connected to the first network port; a first control means for controlling the data transfer operation of the first network port; a second packet interpreter for detecting the packet length included in the data input from the third buffer; a third address controller for setting the network switch addresses according to the output from the second packet interpreter; a third packet interpreter for detecting the packet length included in the data input from the second buffer; a fourth address controller for setting the network switch addresses according to the output from the third packet interpreter; a second address selector of which the input is the output from the third and fourth address controllers, and the output is connected to the second network port; and a second control means for controlling the data transfer operation of the second network port.

A parallel processing system according to the invention comprises an N×N matrix of processor elements arranged in a two-dimensional array, each with the data transfer apparatus of the alternative embodiment described above functioning as its interface to the network, and an N×N×N matrix of switches in a three-dimensional arrangement, each switch comprising two internal buffers.

When in this network the processor element in row i column j is identified as PEij and the number-k switch at row i column j is identified as SWijk, where i, j, k, and l are integers greater than or equal to zero and less than or equal to (N-1), one network port of PEij is connected in common to one terminal of N switches SWijk (k=0, 1, . . . N-1), the other network port of PEij is connected in common to one terminal of N switches SWjli (l=0, 1, . . . N-1), the other terminal of SWijk is connected to PEki, and the other terminal of SWjli is connected to PEjl. When plural packets comprising a header and plural data blocks are transferred from PEij to PElk, the packets are transferred from PEij through the first buffer of SWijk to PEki in the first transfer step, and from PEki through the second buffer of SWkil to PElk in the second transfer step.
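The connection rule and the two-step route above can be checked mechanically. The sketch below is a hypothetical model, not part of the patent: a processor element PEij is represented as the tuple (i, j), a switch SWijk as (i, j, k), and the helper names (port_a_switches, port_b_switches, far_end, two_step_route) are illustrative. "Port A" here means the port wired to SWijk (k varying), and "port B" the port wired to SWjli (l varying).

```python
from typing import List, Tuple

PE = Tuple[int, int]       # (i, j) for PEij
SW = Tuple[int, int, int]  # (i, j, k) for SWijk


def port_a_switches(pe: PE, n: int) -> List[SW]:
    """Switches SWijk (k = 0..N-1) sharing one network port of PEij."""
    i, j = pe
    return [(i, j, k) for k in range(n)]


def port_b_switches(pe: PE, n: int) -> List[SW]:
    """Switches SWjli (l = 0..N-1) sharing the other network port of PEij."""
    i, j = pe
    return [(j, l, i) for l in range(n)]


def far_end(sw: SW) -> PE:
    """The other terminal of SWijk is connected to PEki."""
    i, _j, k = sw
    return (k, i)


def two_step_route(src: PE, dst: PE) -> Tuple[SW, PE, SW]:
    """PEij -> SWijk -> PEki (step 1), then PEki -> SWkil -> PElk (step 2)."""
    i, j = src
    l, k = dst
    return (i, j, k), (k, i), (k, i, l)
```

For example, with N = 3 a packet from PE01 to PE20 passes through SW010 to the relay PE00, then through SW002 to PE20; in every case the relay element PEki reaches the destination through a switch on its own port A.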

In data transfer apparatus PEki, the first address controller specifies the addresses of those N switches SWijk (j=0, 1, . . . N-1) of which the first buffer is in use, the second address controller specifies the addresses of those N switches SWijk (j=0, 1, . . . N-1) of which the second buffer is in use, the third address controller specifies the addresses of those N switches SWkil (l=0, 1, . . . N-1) of which the first buffer is in use, and the fourth address controller specifies the addresses of those N switches SWkil (l=0, 1, . . . N-1) of which the second buffer is in use. The third address controller of the data transfer apparatus PEki is used when sending data in the first transfer step, and the first address controller is used when receiving data in the first transfer step. The fourth address controller is used when sending data in the second transfer step, and the second address controller is used when receiving data in the second transfer step. When the first and second transfers are mixed, the data transfer apparatus PEki switches between the first and second address controllers when receiving, and switches between the third and fourth address controllers when sending. The address controllers are further characterized by changing the address only at the breaks between packets during the data transfer operation.

In addition, this parallel processing system uses packets comprising a field indicating the transfer mode, a field indicating the switch address in the network, a field indicating a broadcast, which signifies transfer to all switches, and a field indicating the packet length.

Operation

When the data transfer apparatus of the invention as thus described cannot respond immediately to a memory read request because the buffer is full, it stores the request in the external memory area. When the buffer is no longer full and the memory read request can be answered, the request is read back and interpreted, and the response sequence is entered. As a result, deadlocks caused by a cannot-respond state are avoided.

With this configuration, the parallel processing system of the invention operates as follows when packets of the first and second transfer steps are mixed. In the following description of this network, the processor element in row i column j in the two-dimensional array of N×N processor elements is identified as PEij, and the number-k switch at row i column j in the three-dimensional array of N×N×N switches is identified as SWijk, where i, j, k, and l are integers greater than or equal to zero and less than or equal to (N-1).

Basically, the first address controller specifies the addresses of those N switches SWijk (j=0, 1, . . . N-1) of which the first buffer is in use, the second address controller specifies the addresses of those N switches SWijk (j=0, 1, . . . N-1) of which the second buffer is in use, the third address controller specifies the addresses of those N switches SWkil (l=0, 1, . . . N-1) of which the first buffer is in use, and the fourth address controller specifies the addresses of those N switches SWkil (l=0, 1, . . . N-1) of which the second buffer is in use.

During the first transfer step, the first address controller is used when receiving data, and the third address controller is used when sending data.

During the second transfer step, the second address controller is used when receiving data, and the fourth address controller is used when sending data.

When the first and second transfers are mixed, the data transfer apparatus switches between the first and second address controllers when receiving, and switches between the third and fourth address controllers when sending. Each address controller changes the switch address only at the end of each packet being transferred.
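The controller-selection rule can be sketched as a small scheduler. Assumptions not in the source: the packets waiting for each transfer step are given as lists of lengths in words, and the apparatus alternates between the two steps round-robin; the text says only that it switches between controllers, not which policy it uses. What the sketch does take from the text is the mapping of step and direction to controller, and the rule that the selected address is held for a whole packet. The function name schedule is hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# Controller numbers as stated in the text:
#   receiving: first step -> 1, second step -> 2
#   sending:   first step -> 3, second step -> 4
CONTROLLER: Dict[Tuple[str, int], int] = {
    ("recv", 1): 1, ("recv", 2): 2,
    ("send", 1): 3, ("send", 2): 4,
}


def schedule(direction: str, step1: List[int], step2: List[int]) -> List[Tuple[int, int, int]]:
    """Return one (controller, step, packet index) entry per transferred word.

    step1 / step2 hold the lengths, in words, of packets queued for each step.
    The controller is chosen once per packet and held until the packet ends;
    round-robin alternation between the steps is an assumed policy.
    """
    queues = {1: deque(enumerate(step1)), 2: deque(enumerate(step2))}
    words: List[Tuple[int, int, int]] = []
    step = 1
    while queues[1] or queues[2]:
        if not queues[step]:
            step = 3 - step  # nothing pending for this step; use the other one
        index, length = queues[step].popleft()
        controller = CONTROLLER[(direction, step)]
        words.extend([(controller, step, index)] * length)  # address fixed for the whole packet
        step = 3 - step  # reconsider the other step at the packet boundary
    return words
```

For instance, schedule("recv", [2, 1], [3]) keeps controller 1 for both words of the first step-1 packet, then holds controller 2 for all three words of the step-2 packet before returning to controller 1.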

In addition, the packet header used in the invention comprises fields specifying the packet length and whether the packet is to be broadcast, i.e., sent to all other processor elements. By interpreting this header data, plural packets of varying lengths can be handled dynamically in real time, and packet broadcasting can be completed much more quickly. As a result, data can be transferred even when packets of varying lengths are mixed in the transfer operation.
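Why a length field lets the receiver handle mixed packet lengths can be shown with a small parser. The source names the header fields (transfer mode, switch address, broadcast, packet length) but not their widths or positions, so the 32-bit layout below is entirely hypothetical: bits 31-30 transfer mode, bit 29 broadcast, bits 28-16 switch address, bits 15-0 number of data words that follow the header. Header, pack_header, unpack_header, and split_packets are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Header:
    mode: int          # transfer mode (2 bits in this assumed layout)
    broadcast: bool    # True: deliver to all switches
    switch_addr: int   # switch address in the network (13 bits, assumed)
    length: int        # data words following the header (16 bits, assumed)


def pack_header(h: Header) -> int:
    return ((h.mode & 0x3) << 30) | (int(h.broadcast) << 29) \
        | ((h.switch_addr & 0x1FFF) << 16) | (h.length & 0xFFFF)


def unpack_header(word: int) -> Header:
    return Header(mode=(word >> 30) & 0x3,
                  broadcast=bool((word >> 29) & 1),
                  switch_addr=(word >> 16) & 0x1FFF,
                  length=word & 0xFFFF)


def split_packets(words: List[int]) -> List[Tuple[Header, List[int]]]:
    """Split a word stream into (header, data) pairs using the length field."""
    packets = []
    pos = 0
    while pos < len(words):
        h = unpack_header(words[pos])
        data = words[pos + 1 : pos + 1 + h.length]
        if len(data) < h.length:
            raise ValueError("truncated packet")
        packets.append((h, data))
        pos += 1 + h.length  # the length field locates the next header
    return packets
```

Because each header says how many data words follow, the parser never has to assume a fixed packet size, which is the limitation noted above for the prior-art format.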

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will become more fully understood from the detailed description given below and the accompanying diagrams wherein:

FIG. 1 is a block diagram of the data transfer apparatus according to the first embodiment of the invention, and the processor element with peripheral connections,

FIGS. 2(a) and 2(b) show the data formats of the memory read request packet and the memory read response packet of this embodiment, respectively,

FIG. 3 is a block diagram of a parallel processing system using the data transfer apparatus of the first embodiment,

FIG. 4 is a block diagram of the data transfer apparatus according to the second embodiment of the invention,

FIG. 5 is a block diagram of a parallel processing system using the data transfer apparatus according to the second embodiment of the invention,

FIG. 6 is a more detailed illustration of the connections used in the parallel processing system shown in FIG. 5,

FIG. 7 shows the data format of the packet used in this parallel processing system,

FIG. 8 is a flow chart of the sending algorithm used in this parallel processing system,

FIG. 9 is a flow chart of the receiving algorithm used in this parallel processing system,

FIG. 10 is a block diagram of a conventional parallel processing system, and

FIG. 11 shows the conventional packet format.

DESCRIPTION OF PREFERRED EMBODIMENTS

The preferred embodiments of the data transfer apparatus and parallel processing system according to the present invention are described hereinbelow with reference to the accompanying figures.

First Embodiment

The first embodiment is described below for the load operation, i.e., when one processor element reads data from another processor element.

FIG. 1 is a block diagram of the data transfer apparatus according to the first embodiment of the invention, and the processor element with peripheral connections. FIG. 2(a) shows the data format of the packet in a memory read request of this embodiment, and FIG. 2(b) shows the data format of the packet of the memory read response. FIG. 3 is a block diagram of a parallel processing system using the data transfer apparatus of the first embodiment.

The data transfer apparatuses 1, 1a, and 1b shown in these figures are connected to an external memory area 2, 2a, and 2b through the memory ports 3, and through the network ports 4 to the network 39. Buffers 5a, 5b, 5c, and 5d connected between the memory port 3 and network port 4 hold the data flowing between the memory and network ports, and the interpreter and evaluation means 6 is connected to the memory port side of the buffers 5. The address pointers 7a and 7b into the memory area are set according to the programmable contents of the register 15. An address bus 8, data bus 9, and selectors 16a, 16b are also provided. The request storage means 17 counts the number of stored requests from the difference between the two address pointers 7a and 7b. Reference numbers 18 and 19 denote address pointer control means and buffer control means, respectively. The data packets comprise a data area 21, operating mode 22, and memory address area 26. Each processor element (PE) 30, 30a, and 30b comprises a data transfer apparatus 1, external memory 2, and processor 35, 35a, 35b, and the buses and ports required for the connections between them.

As shown in FIG. 3, each PE 30a, 30b is connected to the network 39. The network shown here is the simplest case, a bus connecting only two processor elements. As shown in FIG. 1, the processor 35, external memory 2, and data transfer apparatus 1 of each processor element are connected to a common bus.

Referring to FIGS. 2(a) and 2(b), the possible operating modes written in the operating mode 22 of the packet include:

00: write to another processor,

10: read from another processor (request),

11: read from another processor (reply).

The operation of the data transfer apparatus described above when PE 30b reads from the memory assigned to PE 30a is described below with reference to FIGS. 1, 2(a), 2(b), and 3. This is the "load" function discussed above in the description of the prior art. In this operation, the read request is output from PE 30b to PE 30a, and the response then flows from PE 30a back to PE 30b. For simplicity, the following description focuses on the operation of PE 30a, and the apparatus shown in FIG. 1 is assumed to be PE 30a.

A read request for the external memory 2 is input from PE 30b through the network port 4. The request is stored in the buffer 5a and passed by the selector 16a to the interpreter and evaluation means 6, which decodes it. At this time the operating mode area 22 shown in FIG. 2(a) is decoded as `10`, so the packet is determined to be a memory read request.

If the request can be replied to immediately, the data is read from the external memory 2 and transferred through the buffer 5b to the network port 4. The request can be replied to immediately when the buffer 5b is not full, which is determined by the buffer control means 19. The memory read address is contained in the memory address area 26 of the read request, as shown in FIG. 2(a); the interpreter and evaluation means 6 outputs this address over the address bus 8, and the data is transferred over the data bus 9. The packet sent at this time has the format shown in FIG. 2(b), i.e., it now contains the data area 21.

It is also possible that the request cannot be responded to immediately. This occurs when the buffer 5b is full, so the data read from memory cannot be stored in it. In this case the request is temporarily stored in the external memory 2, and at the same time a flag indicating that a request is stored in the external memory 2 is set in the request storage means 17.

The address in the external memory 2 at which the request is stored is generated by the address pointer 7a under control of the address pointer control means 18 and output through the selector 16b to the address bus 8. A constant value could be used as the initialization value of this address, but in this embodiment the initialization value is set externally in the register 15 through the data bus 9.

It is important to note that, because the request-stored flag is held in the request storage means 17, the data transfer apparatus can perform other operations until it becomes possible to respond to the read request. For example, after receiving this read request, it can receive data through the network port 4 and write this data to the external memory 2 through the buffer 5a. When it becomes possible to respond to the stored read request, i.e., when the buffer 5b is no longer full, the data transfer apparatus 1 outputs the address pointer 7b through the selector 16b to the address bus 8, reads the stored request from the external memory 2, and simultaneously clears the request-stored flag in the request storage means 17. The address pointer 7b generates this address under control of the address pointer control means 18, which receives the outputs of the request storage means 17 and the buffer control means 19. The stored request is then input to the interpreter and evaluation means 6 through the selector 16a and interpreted. From then on, operation proceeds as in the case where the read request can be responded to immediately: the data is read from the external memory 2 and output through the buffer 5b to the network port 4.
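The respond-or-defer sequence can be sketched in a few lines of Python. This is a behavioral model under stated assumptions, not the circuit: `send_buffer` stands for buffer 5b, a dictionary for the data in external memory 2, and `parked` for the read requests stored in external memory 2, with the request-stored flag corresponding to `bool(parked)`. The class and method names are hypothetical, and the rule that a new request is also parked while older ones are waiting (to keep replies in arrival order) is an assumption the text does not state.

```python
from collections import deque


class ReadRequestHandler:
    """Respond-or-defer handling of memory read requests (PE 30a side, hypothetical model)."""

    def __init__(self, memory: dict, send_capacity: int):
        self.memory = memory              # data held in external memory 2
        self.send_capacity = send_capacity
        self.send_buffer = deque()        # buffer 5b
        self.parked = deque()             # read requests stored in external memory 2

    def send_full(self) -> bool:
        # Decision made by buffer control means 19.
        return len(self.send_buffer) >= self.send_capacity

    def on_read_request(self, address: int) -> None:
        # Assumption: also park while older requests wait, so replies keep arrival order.
        if self.send_full() or self.parked:
            self.parked.append(address)   # store request, flag becomes set
        else:
            self._reply(address)

    def transmit_one(self):
        """Network port 4 takes one packet from buffer 5b; parked requests are then retried."""
        packet = self.send_buffer.popleft()
        self._retry_parked()
        return packet

    def _retry_parked(self) -> None:
        # Read stored requests back (address pointer 7b) while buffer 5b has room.
        while self.parked and not self.send_full():
            self._reply(self.parked.popleft())

    def _reply(self, address: int) -> None:
        # Reply packet of FIG. 2(b): mode `11` plus the data read from memory.
        self.send_buffer.append(("11", self.memory[address]))


handler = ReadRequestHandler({0x10: 7, 0x20: 9}, send_capacity=1)
handler.on_read_request(0x10)  # buffer 5b has room: answered at once
handler.on_read_request(0x20)  # buffer 5b is full: request parked
```

While a request is parked, nothing in this model blocks the receive side, which is the property the paragraph above relies on.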

The address pointers 7a, 7b and the request storage means 17 are described in greater detail below. Note that they also work effectively when multiple requests are pending.

The first step is to set the initialization values in the register 15. The address pointer 7a is used when writing to the external memory 2, and the address pointer 7b is used when reading. In both cases, the address pointer value is advanced by the address pointer control means 18 after the operation is completed. As a result, with the address pointer 7a serving as a write pointer and the address pointer 7b as a read pointer, the external memory 2 operates as a first-in, first-out (FIFO) data storage area, and multiple requests can be buffered. The size of the buffer area is also set in the register 15. When an address pointer has advanced to the end of the buffer area, it is reset to the initialization value at the next pointer assignment. In a FIFO, the amount of stored data can be calculated as the difference between the write and read pointers. The request storage means 17 therefore determines the number of read requests stored in the external memory 2 by detecting the difference between the two address pointers 7a and 7b.
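The pointer arrangement described above is a circular FIFO. In this Python sketch `base` and `size` stand for the initialization value and area size held in register 15, `wp` and `rp` for address pointers 7a and 7b, and a dictionary for external memory 2; all names are hypothetical. Because a bare pointer difference cannot distinguish a full area from an empty one, the sketch assumes at most `size - 1` entries are stored, a common convention that the text does not specify.

```python
class RequestFifo:
    """Circular FIFO in external memory driven by a write and a read pointer (sketch)."""

    def __init__(self, memory: dict, base: int, size: int):
        self.memory = memory
        self.base = base   # initialization value (register 15)
        self.size = size   # size of the buffer area (register 15)
        self.wp = base     # address pointer 7a: write pointer
        self.rp = base     # address pointer 7b: read pointer

    def _advance(self, ptr: int) -> int:
        # Advance after the operation; past the end of the area, return to the initialization value.
        ptr += 1
        return self.base if ptr == self.base + self.size else ptr

    def pending(self) -> int:
        # Request storage means 17: stored requests = write pointer - read pointer (mod size).
        return (self.wp - self.rp) % self.size

    def push(self, request) -> None:
        if self.pending() == self.size - 1:
            raise OverflowError("request area in external memory is full")
        self.memory[self.wp] = request
        self.wp = self._advance(self.wp)

    def pop(self):
        if self.pending() == 0:
            raise IndexError("no stored request")
        request = self.memory[self.rp]
        self.rp = self._advance(self.rp)
        return request


mem = {}
fifo = RequestFifo(mem, base=0x100, size=4)
for req in ("A", "B", "C"):
    fifo.push(req)
print(fifo.pending())  # 3
print(fifo.pop())      # A
```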

Deadlocks are described below with reference to FIG. 3.

A deadlock would normally occur in the following situation: PE 30a and PE 30b are both sending data to each other, all buffers 5a, 5b, 5c, and 5d are full, and both PEs are simultaneously attempting to read the memory of the other processor element, i.e., the first output from both buffer 5a and buffer 5c is a memory read request. In this state neither request can be answered, because both buffer 5b and buffer 5d are full. In addition, buffers 5b and 5d cannot pass their data to the receiving buffers 5a and 5c, because those are full too. The full state therefore cannot be cleared, and the system is deadlocked.

The control sequence of the present invention described above prevents this deadlock by temporarily storing the read request in the external memory 2. As a result, buffers 5a and 5c can sequentially output their buffered data to the external memory 2, and the deadlock does not occur. The memory read operation is executed later by dynamically monitoring the buffer states, specifically by waiting until buffer 5b or 5d is no longer full. Because deadlock is avoided mainly by the control sequence, minimal hardware is required, making this an extremely effective means of avoiding deadlock.
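The deadlock scenario and its resolution can be reproduced with a toy simulation. In this Python sketch each PE has a receive buffer (5a or 5c), a send buffer (5b or 5d), and an unbounded area in external memory for parked read requests; every buffer starts full with a read request at the head of each receive buffer. Writing data to memory is assumed always possible, and one packet per step crosses each direction of the link. The `PE` record and `run` function are hypothetical names, and this models only the buffer dependencies, not the bus timing.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PE:
    recv: deque                                   # buffer 5a (PE 30a) or 5c (PE 30b)
    send: deque                                   # buffer 5b (PE 30a) or 5d (PE 30b)
    parked: deque = field(default_factory=deque)  # read requests stored in external memory 2


def run(defer: bool, capacity: int = 2, max_steps: int = 50) -> bool:
    """Simulate FIG. 3 from the fully loaded state; True if every buffer drains."""
    pes = [
        PE(recv=deque(["READ"] + ["DATA"] * (capacity - 1)), send=deque(["DATA"] * capacity))
        for _ in range(2)
    ]
    for _ in range(max_steps):
        progress = False
        for pe in pes:
            if pe.recv:
                head = pe.recv[0]
                if head != "READ":                # data or reply: write to memory
                    pe.recv.popleft()
                    progress = True
                elif len(pe.send) < capacity:     # answer the read request now
                    pe.recv.popleft()
                    pe.send.append("REPLY")
                    progress = True
                elif defer:                       # park the request in external memory
                    pe.parked.append(pe.recv.popleft())
                    progress = True
            while pe.parked and len(pe.send) < capacity:  # retry parked requests
                pe.parked.popleft()
                pe.send.append("REPLY")
                progress = True
        for src, dst in ((pes[0], pes[1]), (pes[1], pes[0])):  # one packet per direction
            if src.send and len(dst.recv) < capacity:
                dst.recv.append(src.send.popleft())
                progress = True
        if not progress:
            break
    return all(not pe.recv and not pe.send and not pe.parked for pe in pes)


print(run(defer=False))  # False: the fully loaded state never changes
print(run(defer=True))   # True
```

Without deferral the first step already makes no progress, which is exactly the circular wait described above; with deferral the receive buffers drain and the parked requests are answered once buffers 5b and 5d have room.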

As described above, the data transfer apparatus according to the present invention makes it possible to read data from the memory of another processor element. Because the memory assigned to each processor element can be dynamically accessed from anywhere in the system, a parallel processing system configured with this data transfer apparatus allows extremely flexible software and has wide general applicability.

It is to be noted that the number of processor elements and the network configuration of the present embodiment are not limited to those as described above.

Second Embodiment

The second embodiment of the invention is described below for the case of sending packets between arbitrary processor elements. FIG. 4 is a block diagram of a data transfer apparatus according to the second embodiment, and FIG. 5 is a block diagram of the parallel processing system using the data transfer apparatus shown in FIG. 4. FIG. 6 is a more detailed illustration of the connections used in the parallel processing system shown in FIG. 5. The connections between the processor elements and switches are the same throughout the system.

In FIG. 5 the processor elements are shown as 30a-30d, the switches (SW) as 31a-31h, and the data buses as 32a-32h.

In FIG. 6, the data bus is shown as 9, the processor as 35, the state evaluation means that judge whether the buffers provided in the respective switches are full or empty as 36a and 36b, the state buses as 38a-38d, the address buses as 37a and 37b, the network as 39, the buffer selection signal as 40, and the broadcast signal as 41. This network 39 is described in detail in the Parallel Computer of U.S. Pat. No. 4,514,807. Because it is essential to the description of FIG. 6, the network 39 is described briefly below, but U.S. Pat. No. 4,514,807 should be consulted for greater detail.

The configuration and operation of the data transfer apparatus are first described with reference to FIG. 4.

The data transfer apparatus is connected to the external memory area through the memory port 3, and to the network 39 through the first and second network ports 4a and 4b. There are three buffers 5a, 5b, and 5c. The input of the first buffer 5a is connected to the network port 4a, and its output to the memory port 3. The input of the second buffer 5b is connected to the network port 4a, and its output to the first input of the data selector 13. The input of the third buffer 5c is connected to the memory port 3, and its output to the second input of the data selector 13.

There are four address controllers 10a, 10b, 10c, and 10d. The first and second address controllers 10a and 10b set the network switch addresses according to the output from the first packet interpreter 12a. The third address controller 10c sets the network switch addresses according to the output from the second packet interpreter 12b. The fourth address controller 10d sets the network switch addresses according to the output from the third packet interpreter 12c.

The first control means 11a controls the transfer of data to the first network port 4a, and the second control means 11b controls the transfer of data to the second network port 4b.

The first packet interpreter 12a detects the packet length contained in the data input from the first network port 4a. The second packet interpreter 12b detects the packet length contained in the data received from the third buffer 5c, and the third packet interpreter 12c detects the packet length contained in the data received from the second buffer 5b.

The data selector 13 inputs are from the second buffer 5b and third buffer 5c. The first address selector 14a inputs are received from the first and second address controllers 10a, 10b, and the output is connected to the first network port 4a. The second address selector 14b inputs are received from the third and fourth address controllers 10c, 10d, and the output is connected to the second network port 4b.

The configuration of the packets used in this parallel processing system is shown in FIG. 7. Each packet comprises a header 20 and one or more data words 21a-21e. The header 20 comprises an operating mode area 22 indicating the transfer mode, a broadcast field 23 indicating that the packet is to be sent to all switches (i.e., broadcast), network address fields 24a and 24b containing the switch addresses in the network, and a packet length field 25. In the packet shown in FIG. 7 the data area 21 is five words long (21a-21e), and this length is recorded in the packet length field 25. The operating mode area 22 in FIG. 7 uses the same codes as shown in FIG. 2.
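The packet length field is what lets a packet interpreter cut a continuous word stream into packets of different lengths. FIG. 7 names the header fields but not their widths, so the Python sketch below assumes a hypothetical 16-bit header layout; `Header`, `parse_header`, and `split_packets` are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical 16-bit header layout (field widths are not given in the text):
# bits 15-14 operating mode 22 | bit 13 broadcast 23 | bits 12-9 address 24a
# | bits 8-5 address 24b | bits 4-0 packet length 25 (number of data words)


@dataclass(frozen=True)
class Header:
    mode: int
    broadcast: bool
    addr_a: int
    addr_b: int
    length: int


def parse_header(word: int) -> Header:
    return Header(
        mode=(word >> 14) & 0b11,
        broadcast=bool((word >> 13) & 0b1),
        addr_a=(word >> 9) & 0xF,
        addr_b=(word >> 5) & 0xF,
        length=word & 0x1F,
    )


def split_packets(words: list[int]) -> list[tuple[Header, list[int]]]:
    """Cut a word stream into packets of differing lengths using packet length 25."""
    packets, i = [], 0
    while i < len(words):
        header = parse_header(words[i])
        data = words[i + 1 : i + 1 + header.length]
        if len(data) < header.length:
            raise ValueError("stream ends inside a packet")
        packets.append((header, data))
        i += 1 + header.length  # the next header follows the last data word
    return packets
```

Since the interpreter always knows where the current packet ends, a five-word packet and a two-word packet can follow each other on the same port without any separator.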

In the following description of processor element operation according to this embodiment, the operating mode 22 value is set to `00` to cause a send-relay-receive operation.

Referring to FIG. 4, the data flows in three directions: (1) from network port 4a to first buffer 5a and memory port 3, (2) from network port 4a to second buffer 5b and network port 4b, and (3) from memory port 3 to third buffer 5c and network port 4b. Referring to the parallel processing system in FIG. 5, flow (1) corresponds to data receiving, flow (2) to data relay, and flow (3) to data sending.

There are two sets of address controllers 10, control means 11, and packet interpreters 12: one for sending and one for receiving data. During receiving, data input from the network port 4a is interpreted by the first packet interpreter 12a, and the packet length is set in the address controllers 10a and 10b. The first address controller 10a sets the address of SW buffer 5d during a data relay operation, and the second address controller 10b sets the address of SW buffer 5e during a data receive operation. The first control means 11a switches the first address selector 14a and thereby controls whether data is written to the first buffer 5a or the second buffer 5b.

When sending data, the output of the second buffer 5b is interpreted by the third packet interpreter 12c, which sets the packet length in the fourth address controller 10d, and the output of the third buffer 5c is interpreted by the second packet interpreter 12b, which sets the packet length in the third address controller 10c. The second control means 11b switches the data selector 13 and the second address selector 14b.
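The abstract states that each address controller changes its address only after a packet transfer is completed. A minimal Python sketch of that rule follows. The class and method names are hypothetical, and the assumption that the header word counts toward the transfer alongside the data words is illustrative rather than taken from the text.

```python
class AddressController:
    """One of the address controllers 10a-10d (hypothetical interface).

    It latches a switch address together with the packet length from the
    header and keeps that address until the whole packet has passed, so the
    words of two packets never interleave on the same switch buffer.
    """

    def __init__(self) -> None:
        self.address = None
        self.remaining = 0  # words of the current packet still to transfer

    def busy(self) -> bool:
        return self.remaining > 0

    def start_packet(self, switch_address: int, packet_length: int) -> None:
        if self.busy():
            raise RuntimeError("address may change only after the packet completes")
        self.address = switch_address
        self.remaining = 1 + packet_length  # assumption: header word + data words

    def transfer_word(self) -> int:
        """Account for one transferred word and return the switch address used."""
        if not self.busy():
            raise RuntimeError("no packet in progress")
        self.remaining -= 1
        return self.address


# A 2-word packet to switch 1 followed by a 0-word packet to switch 0.
ctrl = AddressController()
ctrl.start_packet(switch_address=1, packet_length=2)
used = [ctrl.transfer_word() for _ in range(3)]  # header + 2 data words
ctrl.start_packet(switch_address=0, packet_length=0)
used.append(ctrl.transfer_word())
print(used)  # [1, 1, 1, 0]
```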

The method of connecting the processor elements is described below with reference to FIGS. 5 and 6.

The following naming convention is used below to identify PE 30a-30d and SW 31a-31h: PE 30a, 30b, 30c, and 30d are identified as PE00, PE01, PE10, and PE11, and SW 31a, 31b, 31c, 31d, 31e, 31f, 31g, and 31h as SW000, SW001, SW010, SW011, SW100, SW101, SW110, and SW111, respectively. One PE is connected to another PE through a SW according to the relationship PEij → SWijk → PEki. For example, PE10 is connected to PE01 through SW100 (PE10 → SW100 → PE01), and to PE11 through SW101 (PE10 → SW101 → PE11). It follows that any two processor elements can communicate through the route PEij → SWijk → PEki → SWkil → PElk.
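The connection rule lends itself to a direct computation. The Python sketch below, assuming indices i, j, k, l in {0, 1} as in FIG. 5, builds the route between any two processor elements: the destination PElk fixes l and k, and the relaying processor element is then PEki. The function name is hypothetical.

```python
def route(src: tuple[int, int], dst: tuple[int, int]) -> list[str]:
    """Path PEij -> SWijk -> PEki -> SWkil -> PElk from src = (i, j) to dst = (l, k)."""
    i, j = src
    l, k = dst  # the destination PElk fixes l and k
    return [f"PE{i}{j}", f"SW{i}{j}{k}", f"PE{k}{i}", f"SW{k}{i}{l}", f"PE{l}{k}"]


print(" -> ".join(route((0, 0), (1, 1))))
# PE00 -> SW001 -> PE10 -> SW101 -> PE11
```

The second hop in the printed route (PE10 → SW101 → PE11) matches the example given in the text.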

The transfer operation is described in detail below with reference to FIG. 6.

The first transfer operation reads data from the external memory 2 of the source PE and writes it over the network 39 to the relay buffer 5b of the relaying PE. The second transfer operation reads data from the relay buffer 5b of the relaying PE and writes it over the network 39 to the external memory 2 of the destination PE.

The send-relay-receive data flow mentioned earlier is described first. The address controller used for each switch buffer is shown in parentheses after it.

SEND: external memory 2 → third buffer 5c of the data transfer apparatus 1 → buffer 5d of SW 31a (third address controller 10c)

RELAY: buffer 5d of SW 31a (first address controller 10a) → buffer 5b of the data transfer apparatus 1 → buffer 5e of SW 31a (fourth address controller 10d)

RECEIVE: buffer 5e of SW 31a (second address controller 10b) → buffer 5a of the data transfer apparatus 1 → external memory 2.

Using the above convention, communication between any two processor elements flows as follows: (PEij, external memory 2 → third buffer 5c) → (SWijk, buffer 5d) → (PEki, buffer 5b) → (SWkil, buffer 5e) → (PElk, buffer 5a → external memory 2). In other words, each switch uses buffer 5d for the first data transfer operation and buffer 5e for the second. In addition, the data transfer apparatus 1 uses the first address controller 10a and the third address controller 10c to select the switches for the first data transfer operation, and the second address controller 10b and the fourth address controller 10d to select the switches for the second data transfer operation.
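The route can be annotated with the switch buffer and address controller used at each network hop, following the assignments given above: buffer 5d with controllers 10c and 10a for the first transfer, buffer 5e with controllers 10d and 10b for the second. The `Hop` record and `hops` function are hypothetical names for this Python sketch.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    pe: str             # PE whose data transfer apparatus drives this hop
    switch: str         # switch at the other end of the hop
    switch_buffer: str  # "5d" for the first transfer, "5e" for the second
    controller: str     # address controller that selects the switch
    direction: str      # "PE->SW" or "SW->PE"


def hops(src: tuple[int, int], dst: tuple[int, int]) -> list[Hop]:
    """Network hops of PEij -> SWijk -> PEki -> SWkil -> PElk with their resources."""
    i, j = src
    l, k = dst
    first_sw, second_sw, relay = f"SW{i}{j}{k}", f"SW{k}{i}{l}", f"PE{k}{i}"
    return [
        Hop(f"PE{i}{j}", first_sw, "5d", "10c", "PE->SW"),   # out of buffer 5c
        Hop(relay, first_sw, "5d", "10a", "SW->PE"),         # into relay buffer 5b
        Hop(relay, second_sw, "5e", "10d", "PE->SW"),        # out of relay buffer 5b
        Hop(f"PE{l}{k}", second_sw, "5e", "10b", "SW->PE"),  # into buffer 5a
    ]


for hop in hops((0, 0), (1, 1)):
    print(hop)
```

Listing the hops this way makes visible that the first and second transfer operations never share a switch buffer or an address controller.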

The signal buses are described next. The state buses indicate whether data can be transferred, or more specifically the state of the buffers 5d and 5e.

When a buffer is full, data cannot be sent from the PE to the SW; when it is not full, data can be sent. When a buffer is empty, data cannot be sent from the SW to the PE; when it is not empty, data can be sent. These states are detected by the state evaluation means 36a and 36b. State bus 38a (PE → SW) and state bus 38c (SW → PE) correspond to buffer 5d, and state bus 38b (PE → SW) and state bus 38d (SW → PE) correspond to buffer 5e. Only the SW selected by the address bus drives its outputs; the outputs of the other switches are in a high-impedance state. An open-drain configuration can also be used.

When the buffer selection signal 40a, 40b is 0, buffer 5d is selected; when it is 1, buffer 5e is selected.

The address buses 37a, 37b determine which switch is selected. For example, if address bus 37a is 0, data is obtained from SW 31a; if 1, SW 31b.
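The full/empty rules, the state bus assignment, and the two selection inputs combine into a simple lookup. In this Python sketch the switch occupancy counts, the buffer capacity, and the names `SwitchBuffers` and `state` are assumptions for illustration; `address` 0 and 1 stand for SW 31a and SW 31b as in the example above.

```python
from dataclasses import dataclass

# State bus for each (buffer, direction), as listed above.
STATE_BUS = {
    ("5d", "PE->SW"): "38a",
    ("5e", "PE->SW"): "38b",
    ("5d", "SW->PE"): "38c",
    ("5e", "SW->PE"): "38d",
}


@dataclass
class SwitchBuffers:
    """Occupancy of buffers 5d and 5e in one switch (capacity is hypothetical)."""
    count_5d: int = 0
    count_5e: int = 0
    capacity: int = 4


def state(switches: list[SwitchBuffers], address: int, buffer_select: int,
          direction: str) -> tuple[str, bool]:
    """Return (state bus, ready) for the switch chosen by address bus 37a/37b.

    buffer_select models signal 40: 0 selects buffer 5d, 1 selects buffer 5e.
    """
    sw = switches[address]  # only the selected switch drives the state bus
    buffer = "5e" if buffer_select else "5d"
    count = sw.count_5e if buffer_select else sw.count_5d
    if direction == "PE->SW":
        ready = count < sw.capacity  # not full: the PE may send
    elif direction == "SW->PE":
        ready = count > 0            # not empty: the SW may send
    else:
        raise ValueError(f"unknown direction {direction!r}")
    return STATE_BUS[(buffer, direction)], ready


switches = [SwitchBuffers(count_5d=4), SwitchBuffers(count_5e=0)]
print(state(switches, 0, 0, "PE->SW"))  # ('38a', False): buffer 5d of SW 31a is full
```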

The operation of each part is traced below, starting with the send operation and focusing on PE 30a and SW 31a. FIG. 4 is referred to for the data transfer apparatus 1.

(1) TRANSMISSION

The data transfer apparatus 1 reads from the external memory 2 and stores the data in the third buffer