|
Description  |
|
|
FIELD OF THE INVENTION
This invention relates to concurrent computer architectures; and
particularly to realizing a generic capability for a fault-tolerant and
reconfigurable multiprocessor computer scalable to thousands of processor
elements.
BACKGROUND OF THE INVENTION
Concurrent computer architectures are configurations of processors under a
common control interconnected to achieve parallel processing of
information. Processors arrayed in linear strings, sometimes termed
"systolic" architectures, are an increasingly important example of
concurrent architectures. Another such architecture is the binary tree, in
which the nodes are arranged in levels beginning with a single root and
extend to two, four, eight, etc. computing nodes at successive levels.
Pattern recognition is one class of problem to which parallel processing is
especially applicable. Pattern recognition is the comparison of an unknown
signal pattern to a set of reference patterns to find a best match.
Applications include speech recognition, speaker recognition, shape
recognition of imaged objects, and identification of sonar or radar
sources.
One requirement of multiprocessor architectures important to the solution
of pattern recognition and other problems, is scalability of the hardware
and the programming environment. Scalability refers to use of the same
individual PEs, board-level modules, operating system and programming
methodology even as machine sizes grow to tens of thousands of nodes.
Although scalability has been achieved in machines adapted to pattern
recognition, its practical realization especially in larger machines has
been limited by a lack of tolerance to faults exhibited by the relatively
fixed PE lattice structures heretofore used. Schemes in the prior art
which supply fault tolerance by adding redundant processing elements and
elaborate switching details to disconnect a failed PE and substitute a
spare, are expensive and take up space.
If fault tolerance and scalability can be achieved, however, parallel
processing offers real-time execution speed even as the problem size
increases. For example, a GigaFLOP (one billion floating point operations
per second) or more of processing can be required to achieve real-time
execution of large-vocabulary speech recognition apparatus. Pattern
recognition for future speech recognition algorithms will easily require
100 to 1000 times greater throughput. In general, pattern recognition for
higher bandwidth signals, such as imagery, will require a TeraFLOP (one
trillion floating point operations per second). Fault-tolerant, scalable,
parallel computing machines having hundreds or thousands of PEs, offer a
potentially attractive choice of solution.
A property related to scale, is fast execution of communications between a
Host computer and the PE array. PE configurations assembled as a binary
tree, for example, have the advantageous property that if the number of
PEs in the tree array are doubled, the layers through which communications
must pass, increase only by one. This property, known as logarithmic
communications radius, is desirable for large-scale PE arrays, since it
adds the least additional process time for initiating and synchronizing
communications between the Host and the PEs. Scalability is served by
devising a single, basic PE port configuration as well as a basic module
of board- mounted PEs, to realize any arbitrary number of PEs in an array.
This feature is critical to controlling the manufacturing cost and to
systematically increasing the capacity of small parallel processing
machines. Prior art arrangements of high count PE configurations have not
met this need, however; and further, have tended to increase the
installation size and pin-out count for backplane connections.
TeraFLOP capacities requiring many thousands of PEs in a single system,
also currently are prohibitively expensive if realized in the inflexible
and permanent hard-wired topologies of the current art. Additionally,
fault tolerance in conventional hard-wired PE arrays has been limited
heretofore, because the PE interconnection relationships are relatively
determined by the wiring. For this same reason, hard-wired PE arrays are
not generally reconfigurable.
OBJECTS OF THE INVENTION
Accordingly one object of the invention is to increase the fault tolerance
of concurrent computer architectures.
Another object of the invention is to permit the reconfiguration of a
concurrent computer architecture into any of a multiplicity of processing
node topologies.
A further object of the invention is to achieve the foregoing objects
without appreciably adding to backplane bus connections.
A further object of the invention is to provide in a concurrent computer
architecture an interconnected lattice array of processing elements which
under software control can be reconfigured to utilize virtually every
functional node despite faults in multiple other nodes.
A further object of the invention is to achieve greater scalability in
parallel processor architectures.
SUMMARY OF THE INVENTION
This invention contemplates the use of a unique interconnection scheme
among the PEs of a multiprocessor computing architecture; and means
utilizing the unique interconnections for realizing, through PE
reconfiguration, both fault tolerance and a wide variety of different
overall topologies including binary trees and linear systolic arrays. The
reconfigurability realizable pursuant to this aspect of the invention,
allows many alternative PE network topologies to be grown or embedded in a
PE lattice having identified PE or inter-PE connection faults. Further,
under the control of the same fault-identification and route-around
routine, the invention detects and compensates for faults occurring during
operation.
In a specific illustrative embodiment, the invention is realized through
use of 4-port PEs arrayed in a square 4.times.4 rectangular lattice which
constitutes a basic 16-PE module. Each PE has four physical ports, which
connect to the similar ports of its respective neighbors. For tree
topologies, any of the four neighbors of a given PE may be selected as the
parent of the given PE; and any or all of the remaining three neighboring
PEs may be selected as the child(ren) PEs.
Typically, three of the four ports of a given PE are assigned to connect to
adjacent PEs, these being the parent and two children of the given PE. The
aggregate of unused fourth ports of each PE allow the PE lattice to be
reconfigured to effect a large number of changes in parent-child
relationships. Reconfiguration bypasses identified faults in a given
topology. Reconfiguration also creates different computing node
topologies.
The functionality of the ports of each PE, which define the neighbor
relations, may be controlled by instructions from an exterior source, such
as a Host computer. The process for routing among ports within each PE may
be software-defined.
By using a novel variant on a tree expansion scheme, the invention allows
for virtually arbitrary up-sizing of the PE count to build virtually any
size of tree network, with each size exhibiting the same high degree of
fault tolerance and reconfigurability.
In a particular aspect, the invention provides predetermined node fault
correction routines to specific node faults. This inventive feature is
particularly applicable to single-program-multiple-data binary tree (in
which a given PE communicates only with its parent and has from zero to
two children); and single-threaded linear systolic arrays. For these
topologies, the invention teaches mapping or imprinting a software-defined
and controlled topology onto the fixed hardware and avoiding identified
faults, without an expensive commitment of software overhead and run-time
loss.
The same software controlled routing and the same physical interconnection
pattern of the sub-networks as described so far, can be used to realize
fault-tolerant "systolic" networks. These networks in their simplest form
are realized with a linear string of PEs, with each PE requiring a single
input and a single output port. The 4-port PE lattice provides extensive
flexibility in realizing the linear systolic structure in any of a number
of serpentine patterns, by routing around defective PEs.
The 4-port PE, together with the flexibility in routing, also allows the
systolic structure to be expanded at particular stages of processing when
the processing tasks require.
Advantageously, by building the modules with redundant backplane busses,
and interconnecting each of the redundant backplane busses from one module
to a different module, a network of modules is created that enables
routing around an entire module which has failed.
The invention and its further objects, features, and advantages will be
further elucidated in the detained description to follow and in the
DRAWING, in which:
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 is a block diagram of a conventional process for signal pattern
recognition;
FIG. 2 is a block diagram of a conventional expansion scheme for a tree
machine;
FIG. 3 is a block diagram of a fault-tolerant tree expansion scheme;
FIG. 4 is a block diagram of a PE array of 16 elements;
FIG. 5 is a block diagram of 16-element board module showing external and
internal connections;
FIG. 6 is a schematic diagram of a PE board of the invention, with a
specific set of external leads;
FIG. 7 is a schematic diagram showing options for interconnecting
16-element PE boards;
FIG. 8 is a block diagram showing functionalities of each PE node;
FIG. 9 is a block diagram of particular PEs on a board, interconnected in a
tree structure with exemplary external linkages;
FIG. 10 is a high-level block diagram of a multiprocessor system connected
to a Host;
FIG. 11 is a schematic diagram illustrating the data paths resulting from a
particular path configuration at a node;
FIG. 12 is a block diagram illustrating the bussing for routing node
configuration commands from a Host to multiple boards;
FIG. 13 is a flow chart depicting the process for initializing the PE
arrays; and
FIG. 16, consisting of FIGS. 14 and 15, portrays in block diagram form, the
creating of PE lattices within a PE board by configuring ports.
DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
The present invention's applications will be more readily appreciated by
first considering certain prior art. Accordingly, FIG. 1 depicts a
simplified conventional process for signal pattern recognition, in which
an individual unknown pattern 1 and a specific known pattern 2 from a
library or set of known patterns are compared according to some
instruction set in unit 3, a similarity function generator. Unit 3
develops a comparison measure, for example, a distance or probability
score, which is transmitted to a filter 4. A decision rule is implemented
by filter 4, for example selecting that reference pattern which is of
minimum distance to the unknown, the result of which is a pattern
classification output 5.
The unit 3 may, for example, be a binary tree machine. These vary in size
or "depth", depending on the problem complexity. One method for expanding
the size of a binary tree machine, known as a "Leiserson Expansion",
involves repeated use of identical 4-lead modules, as depicted in FIG. 2.
Two such modules, denoted 10a, 10b, illustrate the method. Each consist of
a subtree comprising a root 11a, 11b which are "parents" respectively of
"children" 12a, 13a in module 10a; and "children" 12b, 13b in module 10b.
These children in turn are parents to further PEs. Included in each module
10a, 10b is an expansion PE denoted 14a, 14b, each having three ports 15a,
16a, 17a; and 15b, 16b, 17b respectively. The ports 18a, 18b respectively
leading from the root PE 11a, 11b, of each subtree, constitute the fourth
port of each identical module.
Two such modules may be interconnected in such a way that the four-port
convention for the resulting combination of the modules 10a, 10b, is
maintained. As shown in FIG. 2, the subtree root port 18b of module 10b is
connected to port 17b of the expansion PE 14b; and the subtree root port
18a of module 10a is connected to the port 15b of the expansion PE 14b.
The resultant two-board system has a 15-PE tree with root PE 14b and an
expansion PE 14a. This combination can now be interconnected to further
identical modules through ports 15a, 16a, 17a of expansion PE 14a of
module 10a; and the port 16b of expansion PE of module 10b, the latter
port becoming the "new root" for module 10a. The resultant network again
comprises a subtree of PEs, plus one PE--namely PE 14a of module 10a,
which is available for expansion.
Since the resultant network after the interconnection is equivalent in
number of ports to the network in the individual modules 10a, 10b prior to
interconnection, the illustrated interconnection scheme may be applied
iteratively with ever-increasing module sizes.
The preceding scheme is not, however, operationally practical for machines
comprising thousands of PEs because it is not sufficiently fault-tolerant.
For example, if the subtree PE 11b were to fail, all children would be
disconnected. Thus, failure of a single component could disproportionately
reduce the available number of PEs.
A Fault-Tolerant Expansion Scheme
The present invention, inter alia, provides a mechanism for achieving
binary tree expansion, which retains the requisite constant number of
basic module ports while also providing a substantial degree of tolerance
to PE faults.
The basic fault-tolerant PE module is depicted in FIG. 3. Two such modules,
20a, 20b illustrate the principle. Each contains a subtree 21a, 21b
respectfully, with each of the latter served by two busses 22a, 23a; and
22b, 23b. The subtrees 21a, 21b may each consist of a multiplicity of PE
modules, for example, realized by the configuration shown in FIG. 4.
Either of the two busses serving each subtree may be selected as the root
bus of that subtree.
In accordance with the invention, each of the modules 20a, 20b include two
expansion PEs, denoted 24a, 25a; and 24b, 25b respectively. Each expansion
PE has three ports labeled as follows: for PE 24a-ports 26a, 27a, 28a; for
PE 24b- ports 26b, 27b, 28b; for PE 25a- ports 29a, 30a, 31a; and for PE
25b- ports 29b, 30b, 31b.
Each of the expansion PEs thus has three off-board ports available for
expansion. The connection paths to/from each module 20a, 20b, therefore
total eight, compared to four in the non-fault- tolerant scheme of the
prior art.
In accordance with the invention, the two modules 20a, 20b may be
interconnected in a way that retains at eight the number of external
connection paths. One illustrative way, shown in FIG. 3, is to effect the
following connections of subtree busses and PE ports: bus 22a to PE port
28b; bus 23a to PE port 26a; bus 22b to PE port 28a; and bus 23b to PE
port 26b. The result is that the combination of the two 8- port modules
20a, 20b retains the eight-lead connection convention. Ports 27a and 27b
of the PEs 24a, 24b, become the "new roots"; and the ports 29a, 30a, 31a
of spare PE 25a together with the ports 29b, 30b, 31b of spare PE 25b,
constitute the eight external interconnections.
The resultant composite network is a subtree with either of two busses 27a,
27b selectable for use as the root bus, and with two PEs 25a, 25b
available for expansion. A failure of hardware associated with either of
the candidate root busses 27a, 27b may be overcome simply by selecting the
alternative root as the operational root bus.
It will be apparent hereinafter in connection with FIG. 5, that by
selectively configuring ports of PEs in an array such as an x-y matrix,
spare PEs can be created and then integrated into either subtree PE
topology.
Fault-Tolerant Lattice of PEs
As shown in FIG. 5, the PE lattice at the board level, denoted 40,
advantageously (although not necessarily) is an extended four-by-four
rectangular array of 4-port PEs, totaling sixteen for each board. Each PE
is denoted by row and column call-outs 1.1 . . . 4.4. Each of the four
internal PEs is connected to each of its four neighbors. Thus, PE 2.2 is
connected to PEs 1.2, 2.3, 3.2, and 2.1. Each of the four PEs 1.1, 1.4,
4.1, and 4.4 at the corners of the array 40, are connected to their
respective two neighboring PEs, such as PEs 2.1 and 1.2 in the case of
corner PE 1.1.
Further, each of the four corner PEs have two ports through which the board
40 can be connected to additional modules or to a Host. These external
ports are denoted a, b for PE 4.1; c, d for PE 1.1; e, f for PE 1.4; and
g, h for PE 4.4.
The remaining eight PEs are paired by four interconnection paths, denoted
j, k, l, and m, respectively to interconnect the PE pairs 1.2 with 4.3,
1.3 with 4.2, 2.1 with 3.4, and 3.1 with 2.4.
The expansion PEs 25a and 25b illustrated in FIG. 3, correspond to the PEs
1.1 and 1.4 in FIG. 5. The roots of the subtree correspond to PEs 4.1 and
4.4; so that busses 22a and 23a of module 20a in FIG. 3 correspond to
busses a and h of FIG. 5.
Growing Minimum-Depth Binary Tree in a Two-Fault Lattice
Using the concepts illustrated, a variety of alternate routing options
between PEs are available in each module. A basic 4-by-4 PE board- level
array is depicted schematically in FIG. 5, where the PEs are given the
call-outs 1.1 . . . 4.4. The depicted array is similar to that of FIG. 4,
where in addition, nomenclature is within each PE symbol to represent the
"depths", or layer distances, of various PEs from the root PE of the
array.
In the example illustrated in FIG. 4, the two PEs denoted 2.3 and 3.1, are
assumed to have faults; and the requirement is to "grow" a tree using the
remaining PEs of the array. Further, however, it is desired to create a
binary tree having a minimum depth between the PE chosen as the root and
the most remote "leaf".
First, a boundary PE is selected as the root; in this illustration, PE 4.1.
Then, in a manner described in detail hereinafter, a message is issued to
each of the PE leaves, instructing each leaf to send a message to the (at
most three) non-parent neighboring PEs, interrogating whether they are
available as potential children.
If a given neighbor PE already has a parent, a code is generated causing
the PE which sent the interrogation to not adopt that PE as a child. If a
given neighbor PE is detected as one that has a fault, or does not respond
to the interrogation, a code is generated causing the PE which sent the
interrogation to not adopt that PE as a child.
If a given neighbor PE has received only one request for adoption, a code
is generated causing the sending PE to adopt that PE as a child. Finally,
if a given neighbor PE has received more than one request for adoption,
then a code is generated causing that PE to randomly select one of the
parent candidates as its parent.
The resulting set of leaves are those most recent adoptees, plus any
previous leaves that in the above process were not able to adopt any
children. The tree thus configured consists of: PE 4.1 as the root and
first level; second-level children of root PE 4.1 consisting of: PE 4.2;
third level children of the second-level leaves consisting of: PEs 3.2 and
4.3; and fourth-level children consisting of: PEs 2.2, 3.3, and 4.4, etc.
The structure extends to six levels, as shown in FIG. 4; and bypasses the
faulty PEs 3.1 and 2.3.
The process for initializing the PE array, which includes orienting the
ports of each PE to serve as parent/child or child/parent paths, is also
depicted in FIG. 13.
This arrangement's tolerance to PE faults is highly advantageous. First,
the arrangement is tolerant to a double fault of the type illustrated. It
is also tolerant to a double fault if one of the fault-pair is "internal";
or if the pair comprises a root and an expansion PE. Even if both roots
fail, one of the expansion PEs can be configured as a root, still
providing fault tolerance.
Referring again to FIG. 3, if both of the expansion PEs 25a, 25b fail, a
degree of fault tolerance is still provided: there will always be one
board whose expansion PE is not utilized. If, for example, the fault-pair
comprises both expansion PEs on one board such as PEs 24b, 25b, a fault
avoidance strategy is to choose that board whose expansion PE is not
required.
Of course, if there are no faults, then the expansion PEs can all be
subsumed into the subtree modules 21a, 21b, thereby utilizing all
available processors. The process depicted in FIG. 13 also illustrates
this feature.
The virtual tree machine shown in FIG. 4 is skewed, in that the leaves are
of variable depths. Also, the structure is not purely binary, because the
branching factor at each leaf varies between one and three. The impact of
these conditions on total execution time is, however, negligible.
Importantly, the machine retains a logarithmic communications radius, and
uses identical and scale-invariant modules to "grow". Once realized in
hardware, the machine also offers the necessary small and constant pin-out
from the PEs and submodules.
Fault-Tolerance for Systolic Arrays
A conventional systolic topology can be configured in a variety of ways in
the 16-element array of FIG. 5. An exemplary sequence is to enter the
array at PE 1.1, then proceed through PEs 1.2, 1.3, 1.4, 2.4, 2.3, 2.2,
2.1, 3.1, 3.2, 3.3, 3.4, 4.4, 4.3 and 4.2, exiting at PE 4.1.
A systolic topology can also be implanted within the interconnected PE
configuration of FIG. 5, despite a single fault condition at any one of
the 16 PEs. For example, given a fault at the corner PE 1.1, a systolic
array may be grown by connecting the remaining 15 PEs in the following
serial sequence: entering the array at PE 4.1, then proceeding through PEs
4.2, 4.3, 4.4, 3.4, 3.3, 3.2, 3.1, 2.1, 2.2, 1.2, 1.3, 2.3 and 2.4, and
exiting at PE 1.4.
An inspection of FIG. 5 will also reveal that a 15-element systolic
configuration essentially the mirror image of the just-described systolic
configuration, can be grown if the single fault occurs at any of the other
corner PEs 4.1, 4.4, or 1.4. A fault at PE 4.1, for example is removed by
entering the array at PE 1.1, then proceeding through PEs 1.2, 2.2, 2.1,
3.1, 3.2, 4.2, 4.3, 3.3, 2.3, 1.3, 1.4, 2.4, 3.4, and exiting at PE 4.4.
The geometries of the PE interconnection paths which compensate for a
single fault at any of the corner PEs, are congruent.
Consider next a fault at PE 1.2, which along with PEs 1.3, 4.2 and 4.3 are
interior to the corner PEs and located on the north and south sides of the
array periphery. The fault-compensating path is selected by entering at PE
1.1, then proceeding through PEs 2.1, 2.2, 2.3, 1.3, 1.4, 2.4, 3.4, 3.3,
3.2, 3.1, 4.1, 4.2, 4.3, and exiting at PE 4.4. Again, an inspection of
FIG. 5 will demonstrate that a path of this same geometry will also
compensate for a single fault at any of the PEs 1.3, 4.2 and 4.3.
Consider now a fault at PE 2.1, which along with PEs 3.1, 2.4 and 3.4 are
also interior to the corner PEs and located on the east and west sides of
the array periphery. The fault-compensating path is selected by entering
at PE 1.1, then connecting through PEs 1.2, 2.2, 3.2, 3.1, 4.1, 4.2, 4.3,
3.3, 2.3, 1.3, 1.4, 2.4, 3.4, and exiting at PE 4.4. A path of this same
geometry will also compensate for a single fault at any of the PEs 3.1,
2.4, and 3.4.
Finally, consider a fault at the interior PE 2.2, one of four PEs including
PEs 2.3, 3.2 and 3.3 which in the 4.times.4 array of the illustrative
embodiment are not on the periphery of the array. The fault-compensating
path is selected by entering at PE 4.1, then proceeding through PEs 4.2,
4.3, 4.4, 3.4, 3.3, 3.2, 3.1, 2.1, 1.1, 1.2, 1.3, 2.3, 2.4, and exiting at
PE 1.4. Again, a path of this same geometry will also compensate for a
single fault at any of the PEs 2.3, 3.2 and 3.3.
The significance of the symmetries just described, is that for systolic
architectures, only four one-fault PE reconfiguration patterns are needed
to cover the possible 16 cases of single faults. Each yields full use of
the 15 functioning PEs.
Other specific reconfiguration paths of serially connected PEs may be
created besides the examples elucidated above to correct for a single PE
fault in a systolic array.
It may be generalized that the fault tolerance of the array depicted in
FIG. 5 is realized in a 4.times.4 array of PEs by providing two separate
accesses (for entering or exiting the array) at each of the four corner
PEs. One means for realizing this capability in a physical embodiment such
as depicted in FIG. 5, is to tie the metallic leads together in the
backplane in the manner shown in FIG. 6. With leads b and c tied to common
lead y, as shown, a path through lead y is afforded to both corner PEs 1.1
and 4.1. With leads a and d tied to common lead x, as shown, a path
through lead x is afforded to the same two corner PEs 1.1 and 4.1.
These redundant paths into and out of a PE board also make possible the
bypassing of an entire PE board in a multi-board arrangement. FIG. 7
illustrates boards 40, 40a, 40b and 40c. Adjacent boards are connected by
tying the x-leads created as described with respect to FIG. 6. Boards once
removed from each other are connected by tying the y-leads. A catastrophic
failure of entire board 40a, for example, is bypassed by the connection
denoted y-prime, which enables board 40 to gain access to board 40b
through either the latter's PE 1.1 or 4.1. These are the same PEs which
were connected to failed board 40a.
As seen in FIG. 5, buffers 45, 45a, respectively control the direction of
signals between lead b and the connection path for PEs 1.1, 1.2; and,
similarly, the direction of signals between lead g and the connection path
for PEs 1.4, 2.4. Each buffer 45, 45a is a conventional tri-stable,
bi-directional device available commercially. Under the direction of an
external control means such as a Host computer, buffer 45 can assume a
first state in which lead b furnishes only an input signal to the module
40. In a second state, the lead b furnishes only an output signal from
module 40. In its third state, the circuit is open and no signals pass to
or from lead b. The operation of buffer 45a is identical to that of buffer
45. The function provided by the buffers 45, 45a, is to add connection
options to and from module 40. If, for example, PE 1.1 is unable to pass
signals from/to either leads c or d, access to module 40 which bypasses PE
1.1 is afforded by lead b through switch 45a, placed in one of its
transmission states. Buffers 45, 45a are included in this realization to
provide the required variety of access ports to the lattice while limiting
the number of connections to the backplane. The buffers provide more
alternative topologies in routing around defective processor elements, and
also lead to efficiencies in backplane connections.
General Hardware and Control for Fault Recovery and Reconfiguration
A variety of control means can be envisioned to configure and reconfigure
the PEs in accordance with the invention. An illustrative description of
the structure of each node, which enables the process control set forth in
FIG. 13 to be exercised, will first be presented.
Each processing node, denoted 50 in FIG. 8, consists of a digital signal
processor 51, a memory 52 external to processor 51, and a configuration
network 53. Processor 51 may advantageously comprise a digital signal
processor chip available from American Telephone and Telegraph Company
under the name "DSP32C". Memory 52 may comprise eight 64K.times.4 static
RAM chips, each being a CY7C196 device available from Cypress
Semiconductor Corporation. Network 53 may comprise, for example, an
application-specific integrated circuit chip manufactured by AT&T as a
semi-custom 0.9 micron twin-tub CMOS device disposed in a 224-pin grid
array.
The functionalities provided by Network 53 include a crosspoint switch
array 54, an input FIFO 74, an output FIFO 75, and four neighbor-ports
denoted north, south, east, west. Data signals communicated to or from
DSP51 during the run of an application program are routed to the
neighbor-ports through the input FIFO 74 and output FIFO 75.
The four ports of each node 50, are designated N, E, S, and W. Each of the
four ports is provided with a node-parallel interface ("NPI"), labeled 61,
62, 63, and 64 respectively, as seen in FIG. 8. Data and header
information between neighboring nodes, that is, the nodes 1.1-4.4 shown in
FIG. 5, are communicated through the NPIs. Particular data/header
communication linkages between nodes are set up to establish a desired
topology. FIG. 9 illustrates a board 40 consisting (unlike the nodes of
FIG. 4) of all functioning nodes 1.1-4.4. These have been configured into
a tree topology. The solid lines connote the paths between the nodes which
are utilized in the tree structure; dotted lines connote paths which are
not used. Typically if a port N is designated as a parent, port S is
either an unused or an adopted child; port E is the parent node's
left-child; and port W is the parent node's right-child. FIG. 9 also shows
the eight exterior connections for the illustrated board 40. These are
denoted with their "direction" labels, and with the associated PE label:
S4.1, W4.1, W1.1, N1.1, N1.4, E.1.4, E4.4, S4.4. In the following section,
the role of the ports in initial configuration will be described.
Initial Tree Configuration of NPIs at Nodes
For each PE 50, a configuration message determines: (a) which of the PE
ports N,S,E,W are participating, i.e., included, in the tree either as
parent or children; (b) which port connects to the current PE's parent;
(c) whether the parent port is the input port and the children ports are
the output ports (as in the case of "BROADCAST" mode), or vice-versa (as
in the case of "REPORT" mode); and (d) which ports are currently active,
i.e., being used to read or write to a child PE; and which ports are
currently passive.
FIG. 10 broadly illustrates an overall parallel multiprocessor system which
utilizes the invention in, for example a tree topology as in FIG. 9. FIG.
10 shows a Host computer 60, which may be an AT&T PC 6300+ unit, and only
one PE board consisting of 16 PEs of which only three are shown for ease
of reading. However, the assembly can and usually will consist of a large
multiplicity of PE boards of the type illustrated in FIG. 5.
The initialization process now to be further described, is substantially
set forth in FIG. 13. Coded messages are formed in the Host 60 in
conventional fashion. A header message is formed to include fields to
specify a destination ID, indicating the ID of the PE to which it is to be
sent. The header also contains a PASS field to indicate whether, even
after reaching its destination, the message should continue to be
propagated "downstream".
Initial node port configuration control messages from Host 60 to the PEs
are sent via parallel communications control bus 56 to each PE 50 of each
board 40. These messages instruct each node as to how to establish its
ports. FIG. 11 illustrates the configuration of one of the PEs of FIG. 9,
arbitrarily PE 2.1. PE 2.1, which is one of two second-level children of
board root PE 1.1 is shown with its NPIs configured in the "Broadcast"
mode. NPI N2.1 is prepared to receive data messages from NPI S.1.1 of the
parent PE. A path denoted 76 is configured from NPI N2.1 to the DSP 51 of
PE 2.1 through the input FIFO 74. To support the data processing
operations of the PE 2.1. Two data signal routing paths, denoted 77, 78,
are provided within PE 2.1, by structures described earlier in connection
with FIG. 8. These support the data processing of the lower level PEs. As
seen in FIG. 11, input FIFO 74 is not connected when bypass paths 77, 78
are designated.
The tree "leaf" levels and the associated parent-child relationships, of
the tree topology illustrated in | | |