WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Parallel processor memory transfer system using parallel transfers between processors and staging registers and sequential transfers between staging registers and memory    
United States Patent5581777   
Link to this pagehttp://www.wikipatents.com/5581777.html
Inventor(s)Kim; Won S. (Fremont, CA); Bulfer; David M. (Mountain View, CA); Nickolls; John R. (Los Altos, CA); Blank; W. Thomas (Palo Alto, CA); Figel; Hannes (Milpitas, CA)
AbstractA massively parallel processor is provided with a plurality of clusters. Each cluster includes a plurality of processor elements ("PEs") and a cluster memory. Each PE of the cluster has associated with it an address register, a stage register, an error register, a PE enable flag, a memory flag, and a grant request flag. A cluster data bus and an error bus connects each of the stage registers and error registers of the cluster to the memory. The grant request flags of the cluster are interconnected by a polling network, which polls only one of the grant request flags at a time. In response to a signal on the polling network and the state of the associated memory flag, the grant request flag determines an I/O operation between the associated data register and the cluster memory over the cluster data bus. Both data and error bits are associated with respective processor elements. The sequential memory operations proceed in parallel with parallel processor operations. The sequential memory operations also may be queued. Addressing modes include direct and indirect. In direct address mode, a PE addresses its own address space by appending its PE number to a broadcast partial address. The broadcast partial address is furnished over a broadcast bus, and the PE number is furnished on a cluster address bus. In indirect address mode, a PE addresses either its own address space or that of other PEs in its cluster by locally calculating a partial address, then appending to it either its own PE number or that of another PE in its cluster. The full address is furnished over the cluster address bus.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5581777
Parallel processor memory transfer system using parallel transfers

     between processors and staging registers and sequential transfers

     between staging registers and memory - US Patent 5581777 Drawing
Parallel processor memory transfer system using parallel transfers between processors and staging registers and sequential transfers between staging registers and memory
Inventor     Kim; Won S. (Fremont, CA); Bulfer; David M. (Mountain View, CA); Nickolls; John R. (Los Altos, CA); Blank; W. Thomas (Palo Alto, CA); Figel; Hannes (Milpitas, CA)
Owner/Assignee     MasPar Computer Corporation (Sunnyvale, CA)
Patent assignment
All assignments
Publication Date     December 3, 1996
Application Number     08/400,411
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     March 3, 1995
US Classification     712/16 711/154
Int'l Classification     G06F 015/76 G06F 013/38
Examiner     Ellis; Richard L.
Assistant Examiner    
Attorney/Law Firm     Skjerven, Morrill, MacPherson, Franklin & Friel
Address
Parent Case     This application is a continuation of application Ser. No. 08/159,916, filed Nov. 30, 1993, which is a continuation of application Ser. No. 07/461,567, filed Jan. 05, 1990 both now abandoned.
Priority Data    
USPTO Field of Search     395/375 395/425 395/800 395/427 395/481
Patent Tags     parallel processor memory transfer parallel transfers between processors staging registers sequential transfers between staging registers memory
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
3863233



[0 after 0 votes]
5165023
Gifford
710/317
Nov,1992

[0 after 0 votes]
5060186
Barbagelata
711/2
Oct,1991

[0 after 0 votes]
5010476
Davis
711/147
Apr,1991

[0 after 0 votes]
4891787
Gifford
712/205
Jan,1990

[0 after 0 votes]
4873626
Gifford
710/120
Oct,1989

[0 after 0 votes]
4860249
Nicely
712/16
Aug,1989

[0 after 0 votes]
4852048
Morton
712/11
Jul,1989

[0 after 0 votes]
4827403
Steele, Jr.
712/13
May,1989

[0 after 0 votes]
4805173
Hillis
714/765
Feb,1989

[0 after 0 votes]
4805091
Thiel
712/12
Feb,1989

[0 after 0 votes]
4797815
Moore
710/109
Jan,1989

[0 after 0 votes]
4791641
Hillis
714/757
Dec,1988

[0 after 0 votes]
4783738
Li
712/21
Nov,1988

[0 after 0 votes]
4773038
Hillis
703/20
Sep,1988

[0 after 0 votes]
4719621
May
370/402
Jan,1988

[0 after 0 votes]
4709327
Hillis
712/14
Nov,1987

[0 after 0 votes]
4598400
Hillis
370/400
Jul,1986

[0 after 0 votes]
4591981
Kassabov
712/22
May,1986

[0 after 0 votes]
4481580
Martin
710/305
Nov,1984

[0 after 0 votes]
4462073
Grondalski
712/207
Jul,1984

[0 after 0 votes]
4447877
Grondalski
713/502
May,1984

[0 after 0 votes]
4316244
Grondalski
711/168
Feb,1982

[0 after 0 votes]
4314349
Batcher
708/230
Feb,1982

[0 after 0 votes]
3987419
Morrill
711/112
Oct,1976

[0 after 0 votes]
3936806
Batcher
712/10
Feb,1976

[0 after 0 votes]
3812467
Batcher
712/300
May,1974

[0 after 0 votes]
3800289
Batcher
711/104
Mar,1974

[0 after 0 votes]
4980854
Donaldson
710/117
Dec,1969

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claim is:

1. A method for performing a memory write operation in a parallel processor memory system that includes a processor controller, a transfer controller, a plurality of processor elements, and a memory, each of said processor elements having a processor, said method comprising the steps of:

for each of a plurality of clusters in parallel with the other of said clusters, wherein each of the clusters comprises a respective subset of the processor elements and a plurality of stage registers respectively connected to the processor elements of the subset, selecting a further subset of the processor elements therein for participation in a memory write operation;

placing said processor controller in an interrupted state; for each of said clusters in parallel with the other of said clusters, transferring data in parallel from the processor elements of said selected further subset of processor elements to the stage registers of said selected further subset of processor elements while said processor controller is in said interrupted state;

placing said processor controller in a non-interrupted state following completion of said processor-to-stage register data transferring step;

placing said transfer controller in a memory busy state following the completion of said processor-to-stage register data transferring step; for each of said clusters concurrently with the other of said clusters, transferring data sequentially from the stage registers of said selected further subset of processor elements to an associated memory while said transfer controller is in said memory bUSY state; and placing said transfer controller in a non-memory busy state following the completion of said stage register-to-memory data transferring step.

2. A method as in claim 1, wherein for each of said clusters, said selected further subset is all of the processor elements in said each cluster.

3. A method as in claim 1, wherein for each of said clusters, said selected further subset is less than all of the processor elements in said each cluster.

4. A method as in claim 1 wherein said further processor element subset selecting step comprises the step, for each of said clusters in parallel, of assigning values to respective associated data transfer enable flags of the processor elements of said each cluster for indicating which of the processor elements of said each cluster participate in said parallel data transferring step.

5. A method for performing a memory read operation in a parallel processor memory system that includes a processor controller, a transfer controller, a plurality of processor elements, and a memory, each of said processor elements having a processor, said method comprising the steps of:

for each of a plurality of clusters in parallel with the other of said clusters, wherein each of the clusters comprises a respective subset of the processor elements and a plurality of stage registers respectively connected to the processor elements of the subset, selecting a further subset of the processor elements therein for participation in a memory read operation;

placing said transfer controller in a memory busy state; for each of said clusters concurrently with the other of said clusters, transferring data sequentially from an associated memory to the stage registers of said selected further subset of processor elements while said transfer controller is in said memory state;

placing said transfer controller in a non-memory busy state following completion of said memory-to-stage register data transferring step;

placing said processor controller in an interrupted state followinq the completion of said memory-to-stage register data transferring step;

for each of said clusters in parallel with the other of said clusters, transferring data in parallel from the stage registers of said selected further subset of processor elements to the processors of said selected further subset of processor elements while said processor controller is in said interrupted state; and

placing said processor controller in a non-interrupted state following the completion of said stage register-to-processor data transferring step.

6. A method as in claim 5, wherein for each of said clusters, said selected further subset is all of the processor elements in said each cluster.

7. A method as in claim 5, wherein for each of said clusters, said selected further subset is less than all of the processor elements in said each cluster. elements of said each cluster having their respective grant request flags set and the memory associated with said each cluster.

8. A method as in claim 5 wherein said further processor element subset selecting step comprises the step, for each of said clusters in parallel, of assigning values to respective associated data transfer enable flags of the processor elements of said each cluster for indicating which of the processor elements of said each cluster participate in said parallel data transferring step.
 Description Submit all comments and votes
 


CROSS REFERENCE TO RELATED APPLICATIONS

The following commonly-assigned patent applications, which are filed on even date herewith, are hereby incorporated herein by reference: Nickolls et al., "Scalable Inter-Processor and Processor to I/O Messaging System for Parallel Processing Arrays, " Ser. No. 07/461,492, Taylor, "Network and Method for Interconnecting Router elements Within Parallel Computer System," Ser. No. 07/467,572, and Zapisek U.S. Pat. No. 5,313.590, "Router Chip with Quad-Crossbar and Hyperbar Personalities" Ser. No. 07/461,551, now U.S. Pat. No. 5,434,977.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to data transfers systems for massively parallel processors, and more specifically to data addressing and the transfer of data between a number of SIMD parallel processors arranged in a cluster and a common cluster memory.

2. Background of the Invention

Parallel processors have been developed that are based on the concurrent execution of the same instruction by a large number of relatively simple "processor elements" operating on respective data streams. These processors, known as single instruction, multiple data ("SIMD") processors, are useful in such applications as image processing, signal processing, artificial intelligence, data base operations, and simulations.

Typically, a SIMD processor includes an array of processor elements and a routing network over which the results of calculations and operandi are communicated among the processor elements and input/output ("I/O") devices. The operations of the processor elements and of the routing network are controlled by a separate control processor, in response to instructions and data furnished from a computer subsystem.

A recent SIMD processor is described in U.S. Pat. No. 4,314,349, issued Feb. 2, 1982 to Batcher. A processing element constitutes the basic building block, and each processing element is connected to a uniquely associated random access memory by a bi-directional data bus. The data bus is the main data path for the processing element. During each machine cycle, one bit of data can be transferred from any one of six sources, viz. a bit read from RAM memory, the state of the B, C, P, or S register, or the state of an equivalence function. The destination of a data bit on the data bus may be one or more of, for example, the following: the address location in the RAM memory, the A, G, or S register, the logic associated with the P register, the input to a sum-OR tree, and the input to the parity tree. It will be appreciated that during a memory I/O operation, the bus is reserved for memory data and other operations requiring bus access may not proceed.

The SIMD processor described in U.S. Pat. No. 4,805,173, issued Feb. 14, 1989 to Hillis et al. includes an error control and correction technique which operates across multiple processors and multiple computer memories. Each integrated circuit includes sixteen processors which are connected to an associated memory through a memory interface. The memory is in the form of 22 4K.times.1 bit RAMs. Each of 16 4K.times.1 slices functions as the memory for a different one of the 16 processors, and the remaining 6 4K.times.1 bit slices store parity or syndrome bits for the data stored in the memories of the 16 processors. Parallel data is read from or written to each integrated circuit at the address specified by an address decoder. In a memory read operation, the memory is read in parallel one row at a time to produce data outputs on 16 output lines and parity outputs on 6 additional output lines. These signals then are applied in parallel to error control circuitry for detection and correction of parity errors. In a memory write operation, data is written in parallel from the 16 processors into the 16 memory slices at the given address, and the 6 syndrome outputs are written into the 6 other memory slices at the same addresses and at the same time as the data used in generating the syndrome outputs are stored in the 16 memory slices. It will be appreciated that the error control and correction technique requires that all 16 memory slices be read or written in parallel.

A SIMD processor related to the Hillis et al. '173 processor appears in U.S. Pat. No. 4,791,641, issued Dec. 13, 1988 to Hillis. The error correction system of the Hillis patent treats data for plural memories associated with plural processors as a unitary data word, and generates an error code for the unitary data word. It will be appreciated that the error correction system contemplates that the unitary data word be read or written as a unit.

In parallel processor systems, the size of the memory per processor element tends to be small. Nonetheless, because of the great number of processor elements, the amount of memory required by a parallel processor is large. Unfortunately, memory such as SRAM with speeds comparable to that of even simple microprocessors tends to be expensive. Unfortunately, the less expensive memory such as DRAM is relatively slow and its use in a parallel processor would be expected to compromise performance by causing the processor elements to idle while the memory operation is completed.

SUMMARY OF THE INVENTION

The memory system architecture of our invention permits data transfer operations between memory and processors to proceed concurrently with parallel processing operations. Relatively slow discrete memory may be used without excessively degrading the overall speed of the parallel processor. In one embodiment having DRAMs, memory is utilized at or near the maximum memory bandwidth in both sequential page mode and random mode.

The memory system architecture of our invention provides for addressing by a processor element of its own memory space to be made in accordance with an address furnished to all processor elements by an array control unit, or computed locally by the processor element. Such addressing is possible even as to memory words that includes both data bits and error detection and correction bits.

These and other advantages are realized in the present invention, a memory system suitable for, for example, a parallel processor having a plurality of clusters, each having a plurality of processor elements. In one embodiment, each processor element has an enable flag. The memory system includes a number of memory flags, each associated with a respective processor element. The memory system also includes a number of stage registers, each associated with a respective processor element. A memory is provided, and a cluster data bus connects each of the stage registers of the cluster to the memory. In addition, a number of grant request flags interconnected in a polling network are provided. Each of the grant request flags is associated with a respective processor element and is responsive to a signal on the polling network and the state of the associated memory flag for determining I/O operation between the associated data register and the memory.

In one variation of the foregoing, a number of address registers are provided, each associated with a respective processor element. The address registers are connected to the memory over a cluster address bus, and each of the grant request flags is responsive to the polling network and to the state of the associated memory flag for determining an I/O operation between the associated data register and the memory in accordance with the contents of the associated address register.

In another variation, an address bus is connected to the memory from the array control unit of the parallel processor. Each of the grant request flags is responsive to the polling network and the state of the associated memory flag for determining an I/O operation between the associated data register and the memory in accordance with the contents of address bus. In yet another variation, an error bus is connected to the memory; and a number of error registers are provided. Each error register is associated with a respective processor element and is connected thereto and also to the error bus. Each of the grant request flags is responsive to the polling network and the state of the associated memory flag for determining an I/O operation between the associated error register and said memory.

In another embodiment of the memory system, a data bus of a preselected width is connected to each of the processor elements within a cluster. In addition, an error bus is connected to each of the processor elements within the cluster, and has a preselected width as well. A memory for the cluster also is provided, each word of which has a width equal to the sum of the width of the data bus and the error bus. An address bus for the cluster also is provided, wherein the cluster memory is responsive to a enable signal for performing an I/O operation on the data bus and on the error bus in accordance with the contents of the address bus. In one variation, the address bus is connected to each of the processor elements in the cluster. In another variation, the address bus is connected to an array control unit of the parallel processor. In these variations, the address bus may include an identification bus connected to each of the processor elements in the cluster. The identification bus can be driven by either a fixed or programmable register in each of the processor elements.

In another embodiment of the present invention, data is transferred in a parallel processor as follows. A common data path is provided between a cluster of processor elements and a cluster memory. Respective memory flags of the processor elements are evaluated in parallel, and the values of the memory flags are furnished to respective grant request flags. A polling signal is applied to a selected one of the grant request flags, which is responsive to the value of its respective memory flag and to the polling signal for determining an I/O operation between a data register associated with the selected grant request flag and the memory over the common data bus.

In one variation of the foregoing, an address is provided from an address register associated with the selected grant request flag to the memory over a common address bus connected to the memory. The selected grant request flag is responsive to the value of its respective memory flag and to the polling signal for determining an I/O operation between a data register associated with the selected grant request flag and the memory over the common data bus in accordance with the address provided. In another variation, an address is provided from a processor control unit to the memory over a broadcast bus. The selected grant request flag is responsive to the value of its respective memory flag and to the polling signal for determining an I/O operation between a data register associated with the selected grant request flag and the memory over the common data bus in accordance with the address provided.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, where like reference numerals indicate like parts,

FIG. 1 is a diagrammatic view of a massively parallel processor which incorporates the present invention;

FIG. 2 is a diagrammatic view of a processor element printed circuit board in accordance with the present invention;

FIG. 3 is a flowchart of execution and data bus cycles of the parallel processor of FIG. 1;

FIG. 4 is a block diagram of the data flow path for an exemplary processor element in accordance with the present invention;

FIG. 5 is a block diagram of the data flow path for the exemplary arithmetic processing unit of FIG. 4;

FIG. 6 is a block diagram of the data flow path common to all processor elements of a cluster in accordance with the present invention;

FIG. 7 is a logic schematic diagram of the cluster memory of FIG. 6;

FIG. 8 is a logic schematic diagram of the address controller of FIG. 6;

FIG. 9 is a logic schematic diagram of the error correction logic of FIG. 6;

FIG. 10A and 10B are adjacent sections of a logic schematic diagram of the stage register of FIG. 4;

FIG. 11 is a logic schematic diagram of the exponent/address register of FIGS. 4 and 5;

FIG. 12 is a logic schematic diagram of the grant request flag circuit of FIG. 4;

FIG. 13 is a logic schematic diagram of the E and M flag circuits of FIGS. 4 and 5;

FIG. 14 is a logic schematic diagram of the PE register circuit of