Description
BACKGROUND OF THE INVENTION
1. Technical Field
The invention is related to computers which out-perform supercomputers and
in particular to computers comprising arrays of interconnected field
programmable gate arrays (FPGA's).
2. Background Art
Field programmable gate arrays (FPGA's) were introduced by Xilinx, Inc. and
are sold by Xilinx, Inc. under part no. XC4000, for example, and are
described in U.S. Pat. No. 4,870,302 to Ross Freeman assigned to Xilinx,
Inc. Their use has been explored by a number of workers, including Thomas
C. Waugh of Xilinx, Inc., in "Field Programmable Gate Array Key to
Reconfigurable Array Outperforming Supercomputers," IEEE 1991 Custom
Integrated Circuits Conference, pages 6.6.1 through 6.6.3. This
publication by Waugh discloses an array of FPGAs and software for
configuring the FPGAs in the array in accordance with a user-defined
problem. It discloses a software package called LDG distributed freely by
the Supercomputing Research Center for configuring an array of Xilinx
FPGAs to perform some user-defined algorithm. Xilinx, Inc. sells software
packages, including one package entitled "The Xact Design Implementation
System" for configuring a single FPGA chip in accordance with a set of
user-defined primitives. Minc, Inc. has developed a multiple FPGA logic
partition called "PLDesigner-XL". The concept of interconnected FPGAs has
also been disclosed in U.S. Pat. No. 5,109,353 to Sample et al. and
assigned to Quickturn Systems, Inc. and by Cox et al. of IBM Research
Division, "Ganglion--A Fast Hardware Implementation of a Connectionist
Classifier," IEEE 1991 Custom Integrated Circuits Conference, pages 6.5.1
through 6.5.4 and by Bertin et al. of Digital Equipment Corporation, Paris
Research Laboratory, entitled "Programmable Active Memories: a Performance
Assessment."
The problem with performing a complex algorithm, such as a fast Fourier
transform (FFT) for example, in an array of interconnected FPGAs is that
approximately 90% of the internal resources of each FPGA is consumed by
the FPGA-to-FPGA interconnection requirements, as distinguished from the
computational tasks required by the algorithm itself. This greatly impedes
the performance and has seemed to be a fundamental limitation.
A fundamental problem in computing is that a conventional microprocessor
executes successive instructions which must be input as
instruction-defining-data into the microprocessor prior to execution
thereof. Typically, the amount of instruction-defining-data corresponding
to a single instruction exceeds the amount of data representing the
operands required by the instruction. Thus, the necessity of loading
instructions for each operation creates a fundamental barrier to
computational speed.
SUMMARY OF THE INVENTION
The present invention includes an array of FPGAs whose configurations
change successively during performance of successive algorithms or
instructions, in the manner of a computer executing successive
instructions. By configuring the array of FPGAs to perform an entire
algorithm or set of instructions, the computing machine (i.e., the array
of FPGAs) does not have to wait for each instruction to be down-loaded in
performing each computational step.
In accordance with a second aspect, the invention further includes
connecting adjacent FPGAs in an array of FPGAs through external field
programmable interconnection devices (FPINs) or cross-bar switches in
order to relieve the internal resources of the FPGAs from any external
connection tasks. This solves the problem of having to employ 90% of the
internal FPGA resources on external interconnection, thereby increasing
performance nine-fold.
In accordance with a third aspect of the invention, the array of FPGAs
forms a virtual computer, in that the array includes a processor-like
device capable of performing the computations necessary to reconfigure the
FPGAs in the array in accordance with the next algorithm to be performed.
Preferably, this processor-like device is itself a "control" array of
interconnected FPGAs which have been configured to emulate a selected
microprocessor architecture (such as the Motorola 68000 microprocessor
architecture). Thus, the virtual computer accepts user-defined primitives
corresponding to an algorithm to be performed or a logic architecture to
be emulated, and the control array of FPGAs processes the user-defined
primitives and reconfigures the FPGAs and the FPINs accordingly. Then, the
virtual computer is ready to operate on operand data.
In an initial embodiment of the invention, each FPIN is an IQ160 Field
Programmable Interconnect Device sold by I-Cube Systems, Inc., and is
configured in accordance with the FPGA-to-FPGA interconnection
configuration bits generated by the Supercomputing Research Center's LDG
software program.
In accordance with a fourth aspect of the invention, each FPIN is an array
of general routing cells (GRC's) carrying two parallel bit streams of
configuration bits. The array of GRC's permits one bit stream to shift in
serially through the array in preparation for the next configuration
change while simultaneously implementing the connections specified by the
current set of configuration bits held stationary in the array. Then, upon
change-over to the configuration of the next configuration bit stream, the
first bit stream is shifted out and a following (third) bit stream is
shifted in. In this manner, there is no delay imposed by the shifting in
and out of configuration bits, provided that the computation time of each
configuration exceeds the time required to shift the configuration bits in
and out of the array. Moreover, some algorithms may lend themselves to
alternating the array between two configurations many times, in which case
no configuration bits need be shifted during computation once both streams
of configuration bits have been loaded.
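The two-stream arrangement described above can be sketched in software as a double buffer. This is a minimal behavioral model only, not the circuit of the invention; the class name `DoubleBufferedConfig` and its methods are hypothetical names introduced for illustration.

```python
from collections import deque


class DoubleBufferedConfig:
    """Toy model of a cell array holding two configuration bit streams."""

    def __init__(self, n_cells):
        self.n_cells = n_cells
        # Stream that currently defines the connections (held stationary).
        self.active = [0] * n_cells
        # Stream being shifted in serially for the next configuration.
        self.shadow = deque([0] * n_cells, maxlen=n_cells)

    def shift_in(self, bit):
        """Shift one bit into the standby stream; the active stream is untouched."""
        # With maxlen set, the oldest bit falls off the far end (shifted out).
        self.shadow.appendleft(bit)

    def change_over(self):
        """Swap roles: the standby stream becomes the active configuration."""
        old_active = self.active
        self.active = list(self.shadow)
        self.shadow = deque(old_active, maxlen=self.n_cells)


cfg = DoubleBufferedConfig(4)
for bit in [1, 0, 1, 1]:
    cfg.shift_in(bit)          # connections stay as they were during loading
cfg.change_over()              # the new configuration takes effect at once
```

Calling `change_over()` repeatedly without any `shift_in()` calls models the case in which the array alternates between two loaded configurations with no further shifting.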
In accordance with a fifth aspect of the invention, the GRC array is
spatially symmetrical, having an equal number of horizontal and vertical
interconnect lines corresponding to interconnectable user I/O pins, with
programmable interconnections between horizontal and vertical interconnect lines.
Permanent interconnections between horizontal and vertical lines permit
incoming configuration bits to propagate through the GRC array in a
serpentine path that passes through all GRC cells, so that the first
configuration bit programs the last cell in the path while the last
configuration bit programs the first cell in the serpentine path.
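The ordering property of the serpentine path can be illustrated with a short simulation. The sketch assumes a path that runs left to right along even rows and right to left along odd rows; the actual path is the one shown in FIG. 14, and the function names here are hypothetical.

```python
def serpentine_order(rows, cols):
    """Return cell coordinates in the order the serial chain visits them."""
    order = []
    for r in range(rows):
        # Assumed path: even rows run left to right, odd rows right to left.
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order


def load_configuration(bits, rows, cols):
    """Shift bits serially into the chain; return {cell: bit} once loaded."""
    order = serpentine_order(rows, cols)
    if len(bits) != len(order):
        raise ValueError("need exactly one bit per cell")
    chain = [None] * len(order)
    for bit in bits:
        # Every cell passes its bit to the next cell; the new bit enters cell 0.
        chain = [bit] + chain[:-1]
    return dict(zip(order, chain))


cells = load_configuration(["a", "b", "c", "d"], rows=2, cols=2)
# The first bit shifted in ("a") ends up in the last cell of the path, (1, 0),
# and the last bit ("d") ends up in the first cell, (0, 0).
```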
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a simplified block diagram of an elementary cell of an array of
FPGA's and FPIN's in accordance with the invention;
FIG. 2 is a block diagram of a virtual computer embodying the invention,
including an array of FPGAs and FPINs comprising many cells of the type
illustrated in FIG. 1;
FIG. 3 is a block diagram illustrating pin connections between an FPGA chip
and adjacent FPIN chip in the cell of FIG. 1;
FIG. 4 is a block diagram of a reconfigurable control section of the
virtual computer of FIG. 2;
FIG. 5 is a block diagram of the VME interface section of the virtual
computer of FIG. 2;
FIG. 6 is a block diagram of a virtual computing system, including a host
computer temporarily connected to the system bus for initially programming
the virtual computer;
FIG. 7 is a flow diagram of a process employing configuration software
applicable to the FPGAs and the FPINs for configuring the virtual computer
of FIG. 2;
FIGS. 8a and 8b are pin diagrams illustrating one implementation of the
elementary cell of FIG. 1;
FIG. 9 is a pin diagram of the interconnection between the edge columns of
FPGAs and the dual port RAMs in the virtual computer of FIG. 2;
FIG. 10 is a block diagram of an alternative embodiment of an array of
FPGAs and FPINs;
FIG. 11 is a circuit diagram of a general routing cell of the invention;
FIG. 12 is a circuit diagram of a CMOS version of the interconnect switch
employed in the GRC cell of FIG. 11;
FIG. 13 is a block diagram illustrating horizontal and vertical
interconnections in an array of GRC cells of the type corresponding to
FIG. 11; and
FIG. 14 illustrates the propagation of configuration data throughout the
array of FIG. 13.
DETAILED DESCRIPTION OF THE INVENTION
Referring now to FIG. 1, a basic cell or building block of an array of
FPGAs includes a pair of FPGAs 101, 103, which may be thought of as being
located on opposite corners of a square, and a pair of FPINs 105, 107,
which may be thought of as being located on the other opposing corners of
the square. Each FPGA 101, 103 is connected to the two FPINs 105, 107,
while each FPIN 105, 107 is connected to the two FPGAs 101, 103.
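The connectivity of the elementary cell can be written down as a small graph. The sketch below is illustrative only; the device labels simply reuse the reference numerals of FIG. 1, and the helper name `neighbours` is hypothetical.

```python
# Device labels follow the reference numerals of FIG. 1.
FPGAS = ("FPGA101", "FPGA103")
FPINS = ("FPIN105", "FPIN107")

# In this model every FPGA links to every FPIN in the cell, and the two
# FPGAs reach each other only through an FPIN.
LINKS = {(g, p) for g in FPGAS for p in FPINS}


def neighbours(device):
    """Return the devices directly connected to `device` within the cell."""
    return sorted({b for a, b in LINKS if a == device} |
                  {a for a, b in LINKS if b == device})


# neighbours("FPGA101") gives ["FPIN105", "FPIN107"]
# neighbours("FPIN105") gives ["FPGA101", "FPGA103"]
```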
The internal logical architecture of each FPGA is reconfigurable to
implement a user-selected algorithm or a portion of such an algorithm.
Such reconfiguring may be achieved by processing a logical definition of
the algorithm with a software package such as the LDG software package
distributed by the Supercomputing Research Center (SRC) to generate a file
of configuration bits for each FPGA. Each FPGA is reconfigured by
down-loading the corresponding configuration bit file into the FPGA, as
will be described later in this specification. Each FPIN is reconfigured
by using the bits specifying FPGA-to-FPGA connections in each
configuration bit file produced by the LDG software package as an input to
a software package such as FPID Pro developed by I-Cube Systems, Inc., to
generate a file of configuration bits for each FPIN. Each FPIN is
reconfigured by down-loading the corresponding configuration bit file into
the FPIN, as will be described later in this specification. In this way,
none of the internal logic resources of the FPGAs are wasted in effecting
the FPGA-to-FPGA connections specified in the configuration bit files
generated by the LDG software package.
In one implementation, each connection is between one bank of 32 user I/O
pins of an FPGA and a corresponding bank of 32 user I/O pins of an FPIN,
although the actual number of pins is a design choice. In this
implementation, each FPGA is a Xilinx XC 4000 programmable gate array sold
by Xilinx, Inc. at 2001 Logic Drive, San Jose, Calif. and each FPIN is an
I-Cube IQ160 field programmable interconnect device sold by I-Cube
Systems, Inc. at 2328-J Walsh Avenue, Santa Clara, Calif.
FIG. 2 illustrates an array 200 of FPGAs 202 and FPINs 204 comprising many
elementary cells of the type illustrated in FIG. 1. The array 200 of FIG.
2 may be employed as a coprocessor to enhance the performance of a host
computer. However, in accordance with a further aspect of the present
invention, the array 200 is comprised within a virtual computer
illustrated in FIG. 2. The virtual computer of FIG. 2 is capable of
successively reconfiguring the FPGAs and FPINs in the array 200 to perform
successive algorithms, in analogy with a conventional microprocessor
executing a set of successive instructions. However, each one of the
successive algorithms performed by the virtual computer of FIG. 2 may
correspond to many conventional instructions to be executed by a
conventional microprocessor. Thus, the rate at which the FPGAs and FPINs in
the array 200 are successively reconfigured in performing a large
computational task is less than the rate at which corresponding individual
instructions would have to be loaded into a conventional microprocessor
attempting to perform the same computational task at the same speed. In
fact, therefore, by reconfiguring the FPGAs and FPINs in the array 200 at
the maximum possible rate, the array 200 can perform a large computational
task much faster than a conventional microprocessor.
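The rate comparison can be made explicit under simple assumptions that are not part of the original text: suppose the task amounts to N conventional instructions, is to be completed in time T, and each configuration of the array replaces on average k conventional instructions. Writing r_mu for the instruction-loading rate of the microprocessor and r_c for the reconfiguration rate of the array:

```latex
r_{\mu} = \frac{N}{T}, \qquad
r_{c} = \frac{N/k}{T} = \frac{r_{\mu}}{k}, \qquad
k > 1 \;\Rightarrow\; r_{c} < r_{\mu}
```

Under these assumptions the required reconfiguration rate falls in proportion to the number of instructions each configuration replaces.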
For this purpose, the virtual computer of FIG. 2 includes a reconfigurable
control section 206 governing the reconfiguration of all of the FPGAs and
FPINs in the array 200 and capable of running software packages such as
the LDG and FPID Pro software packages to generate the configuration bit
files. Each configuration bit file thus generated is transmitted by the
reconfigurable control section 206 to the corresponding FPGA or FPIN in
the array 200.
The array 200 is connected to its left and to its right to respective
columns 210, 212 of eight FPGAs (hereinafter, "column FPGAs"), with alternate
pairs of column FPGAs being connected to pairs of dual port RAMs whose other
ports are connected to the reconfigurable control section 206. A local bus
214 provides connection between the reconfigurable control section 206, a
bus (VME) interface 216 and a main memory 218. The bus interface 216 is
connected to a system bus 220. The 32-pin banks of the FPGAs 202 and FPINs
204 in the top row of the array 200 corresponding to the upward pointing
arrows are connected to the 32-pin banks of the FPGAs 202 and FPINs 204 in
the bottom row of the array 200 corresponding to the downward pointing
arrows.
As illustrated in FIG. 3, each FPGA 202 in the array 200 has four banks
300, 302, 304, 306 of thirty-two user I/O pins on its four sides connected
to corresponding banks of thirty-two pins of four different FPINs 204. In
addition, each FPGA 202 has a clock signal pin 308 and a configuration bit
input pin 310 on which it receives the corresponding configuration file
generated by the LDG software package executed by the reconfigurable
control section 206. Each FPIN 204 in the array 200 has four banks 312,
314, 316, 318 of thirty-two pins on its four sides connected to
corresponding banks of thirty-two pins of four different FPGAs 202. In
addition, each FPIN has a clock signal pin 320 and a configuration bit
input pin 322 on which it receives the corresponding configuration file
generated by the FPID Pro software package executed by the reconfigurable
control section 206.
FIG. 4 illustrates a preferred embodiment of the reconfigurable control
section 206. The reconfigurable control section 206 of FIG. 4 has ten
FPGAs 401-410 each identical to the FPGAs 202 in the array 200. The ten
FPGAs 401-410 are connected directly together in this embodiment without
the use of FPINs. In accordance with one aspect of the invention, the ten
FPGAs 401-410 are configured using the LDG software (run on an external
host not shown in FIG. 4) to emulate a standard microprocessor (such as a
Motorola 68000). Once this is completed, the reconfigurable control
section 206 is then used to run the LDG and FPID Pro software packages to
generate the configuration bit files for reconfiguring the FPGAs 202 and
FPINs 204 of the array 200 in accordance with a process to be described
later herein. In order to load the configuration bit streams or files into
each of the FPGAs 202 and FPINs 204 independently or simultaneously, some
of the user I/O pins 412, 414 of each of the last two FPGAs 409, 410 of
the reconfigurable control section 206 serve as configuration bit output
pins dedicated to transmitting the configuration bit files to different
ones of the FPGAs 202 and FPINs 204 in the array 200. For this purpose,
different ones of the configuration bit output pins 412, 414 are
individually connected to different configuration bit input pins 310 of
the FPGAs 202 while others of the configuration bit output pins 412, 414
are individually connected to different configuration bit input pins 322
of the FPINs 204. The reconfigurable control section 206 is programmed to
output the various configuration bit output files it has generated for the
various FPGAs 202 and FPINs 204 on corresponding ones of the configuration
bit output pins 412, 414. In the implementation illustrated in FIGS. 2, 3
and 4, the array 200 consists of exactly 24 FPGAs and 24 FPINs. In
addition (for reasons that will be discussed later in this specification),
the array 200 is bounded on its left and right (as viewed in FIG. 2) by
left and right columns 210, 212, respectively, each consisting exclusively
of exactly 8 FPGAs, for a total of 16 additional FPGAs.
Thus, there is a total of 40 FPGAs and 24 FPINs controlled by the
reconfigurable control section 206, requiring a total of 64 configuration
bit output pins. In order to meet this requirement, the configuration bit
output pins 412, 414 comprise one bank of thirty-two user I/O pins on each
one of the last two FPGAs 409, 410 of the reconfigurable control section
206.
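The pin budget stated above can be checked with a few lines of arithmetic. The constant and function names are hypothetical; the figures are the ones given in the text, with one configuration bit output pin assumed per configured device.

```python
ARRAY_FPGAS = 24       # FPGAs 202 inside the array 200
ARRAY_FPINS = 24       # FPINs 204 inside the array 200
COLUMN_FPGAS = 2 * 8   # left and right columns 210, 212
PINS_PER_BANK = 32     # one bank of user I/O pins
CONFIG_FPGAS = 2       # FPGAs 409 and 410 each contribute one bank


def config_pin_budget():
    """Return (pins required, pins available) for configuration outputs."""
    required = ARRAY_FPGAS + COLUMN_FPGAS + ARRAY_FPINS  # one pin per device
    available = CONFIG_FPGAS * PINS_PER_BANK
    return required, available


required, available = config_pin_budget()
# 40 FPGAs + 24 FPINs = 64 pins required; 2 banks x 32 pins = 64 available.
```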
In FIG. 4, the FPGAs 401 in the top row and 406 in the bottom row are
typical, but are rotated with respect to one another in the horizontal
plane by 180 degrees. Each one has five banks (labelled OUT, IN, R
(RIGHT), L (LEFT) and INNER, respectively) of thirty-two pins each. In the
reconfigurable control section 206, each one of the five banks
participates in one of five connections. Specifically, the top five FPGAs
401, 402, 403, 404, 405 have their OUT banks connected to a local bus 214
while the bottom five FPGAs 406-410 have their OUT banks connected to the
array 200. Of the latter, the OUT banks of the FPGAs 409 and 410 are the
configuration bit output pins 412 and 414 discussed above, while the
connection of the OUT banks of the FPGAs 406-408 to the array 200 will be
discussed later in this specification. All ten FPGAs 401-410 have their IN
banks of pairs of top and bottom FPGAs connected together, their INNER
banks connected to an inner bus 418 and their LEFT and RIGHT banks
connected to provide left-to-right pipeline connectivity. The inner bus
418 facilitates the reconfiguration of the array of FPGAs 401-410 in a
microprocessor architecture capable of running object code compiled for a
particular microprocessor. The LEFT bank of the FPGA 410 and the RIGHT
bank of the FPGA 406 provide data ports, as do the local bus 214 and the
inner bus 418.
The five banks of 32 pins of each FPGA 202 in the array 200 are allocated
in the manner illustrated in FIGS. 2 and 3. Specifically, the four banks
300, 302, 304, 306 provide connection to the four adjacent FPINs 204
(i.e., to the top, bottom, left and right of each FPGA 202). A fifth bank
324 of thirty-two pins is divided in half, one sixteen-pin half bank 324a
being used for interconnection to the fifth banks of all FPGAs 202 in the
same column using a column global bus 222. Thus, each column global bus is
a 16-bit bus. The other sixteen-pin half bank 324b is dedicated to control
functions listed as "Permanently Dedicated Pins" on page 34 of the Xilinx
technical manual for the XC4000 FPGA entitled Xilinx Technical Data XC
4000 Logic Cell Array Family (1990).
List of 16 I/O pins used as dedicated pins in the Array
______________________________________
TDI      USED FOR BOUNDARY SCAN
TDO      "
TCK      "
TMS      "
M0       USED FOR CONFIGURATION MODE
M1       "
M2       "
INIT     CONFIGURATION CONTROL AND STATUS
DIN      CONFIGURATION DATA IN
PGCK1-4  TIED TO SYSTEM CLOCK
SGCK1-3  GLOBALLY TIED TOGETHER FOR GLOBAL MESSAGES
______________________________________
The five banks of 32 pins of each FPIN 204 in the array 200 are allocated
in the manner illustrated in FIGS. 2 and 3. Specifically, the four banks
312, 314, 316, 318 provide connection to the four adjacent FPGAs 202
(i.e., to the top, bottom, left and right of each FPIN 204). A fifth bank
326 of thirty-two pins is used for interconnection to all fifth banks of
all FPINs 204 in the same row using a row global bus 224. Thus, each row
global bus 224 is a 32-bit bus. There are four column busses 222 and four
row busses 224. The column and row busses 222 and 224 enable global
communication among the FPGAs 202 and the FPINs 204 respectively.
As described above with reference to FIG. 4, data flow between the
reconfigurable control section 206 and the array 200 occurs at the 32-pin
OUT banks of the FPGAs 406, 407 and 408 of the control section 206. For
this purpose, array buses 226, 228 are connected to the 32-pin OUT banks
of the FPGAs 406-408 and to the left and right FPGA columns 210 and 212
via dual port RAMs 230-245. The dual port RAMs 230-245 provide buffering
between the data bursts received on the system bus 220 and the synchronous
operation of the array 200.
The five 32-pin banks of each FPGA 202 in the left and right FPGA columns
210 and 212 may be labelled L (LEFT), R (RIGHT), OUT, IN and INNER in the
same manner as the FPGA 406 of FIG. 4. With this analogy in mind, the
connections between the dual port RAMs 230-245 and the left and right FPGA
columns 210 and 212 are typified by the connection of the FPGA 202a at the
top of the right FPGA column 212, as follows. The FPGAs in each column
210, 212 are paired, the IN and INNER banks of each FPGA within a pair
being connected together and to a respective one of a pair of the dual
port RAMs. Thus, the FPGA 202a is paired with the next FPGA down in the
right FPGA column 212, namely the FPGA 202b. In the specific case of the
FPGA 202a, the LEFT bank connects to the adjacent FPIN in the same row
while the RIGHT bank is wrap-around connected to the LEFT bank of the FPGA
in the same row of the left FPGA column 210, the OUT bank is wrap-around
connected to the OUT bank of the bottom FPGA of the right FPGA column 212.
The OUT bank of the FPGA 202b is connected to the OUT bank of the next
FPGA down in the same column. Thus, the left and right FPGA columns 210
and 212 are connected around the array 200 to each other from left to
right, as indicated by the arrows extending away from the array 200. This
latter feature, along with the top to bottom wrap around connection
between the top and bottom rows of FPGAs and FPINs in the array 200
(described previously herein), connects the entire array 200 on a
three-dimensional surface like a cube or sphere.
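The effect of the wrap-around connections can be sketched with modular index arithmetic. This is an illustrative model only: it treats the devices as positions on a rectangular grid, and the function name and the 8-by-8 grid size used in the example are assumptions rather than details taken from the figures.

```python
def wrapped_neighbours(row, col, n_rows, n_cols):
    """Return the up, down, left and right neighbours with wrap-around."""
    return {
        "up": ((row - 1) % n_rows, col),
        "down": ((row + 1) % n_rows, col),
        "left": (row, (col - 1) % n_cols),
        "right": (row, (col + 1) % n_cols),
    }


# A device in the top row of the rightmost column of an assumed 8-by-8 grid:
# its "right" neighbour wraps to column 0 and its "up" neighbour to row 7.
top_right = wrapped_neighbours(0, 7, 8, 8)
```

Because every position has four neighbours under this scheme, no device sits on an edge of the grid.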
Referring to FIG. 5, the system bus 220 is a VME bus in one implementation
and the bus interface 216 includes a VME interface chip 500, namely a
Cypress VIC64 driven by a 64 MHz oscillator 502, connected to the system
bus 220 at one port and to the local bus 214 at another port. A boot-up
FPGA 504 has one set of its pins connected to the local bus 214 and
another set of its pins connected to a boot EEPROM 506. Furthermore, the
boot-up FPGA 504 has ten configuration bit output pins 508 connected to
the configuration bit input pins (corresponding to pin 310 of FIG. 3) of
the ten FPGAs 401-410 of the reconfigurable control section 206. The
EEPROM 506 contains instructions which control and configure the boot-up
FPGA 504 when power is first applied. These instructions cause the boot-up
FPGA 504 to transmit via the ten output pins 508 ten configuration files
to the configuration bit input pins of the ten FPGAs 401-410 of the
reconfigurable control section 206. In a preferred embodiment, the
information stored in the boot-up EEPROM 506 corresponds to the
configuration files necessary to configure the FPGAs 401-410 in a
microprocessor architecture. The EEPROM also enables the boot-up FPGA 504
to control the VME interface chip 500 in conformance with the
configuration of the VME system bus 220.
FIG. 6 illustrates how to configure the virtual computer 600 of FIG. 2
immediately upon completion of manufacture. Specifically, a host computer
602 is connected to the system bus 220 and a disk drive 604 is also
connected to the bus 220. Software 606 such as the LDG program for configuring an
array of Xilinx FPGAs is stored in the disk drive 604. Also, information
specifying the configuration bit output pins 508 of the boot-up FPGA 504
and their assignment among the ten FPGAs of the reconfigurable control
section 206 is stored in the disk drive 604. Finally, an array 610 of
logic primitives corresponding to a selected microprocessor logic
architecture (such as the logic architecture of the Motorola 68000
microprocessor) is stored on the disk drive 604. Then, the host computer
602 is instructed to run the LDG software package to process the array of
logic primitives and generate configuration bit files for each of the ten
FPGAs of the reconfigurable control section 206. These configuration bit
files are then stored on the disk drive 604. Thereafter, they are used to
program the boot-up EEPROM 506 so that the configuration files are
downloaded into the corresponding control section FPGAs 401-410 and the
system automatically configures the control section 206 to emulate the
selected microprocessor each time it boots up. Thereafter, the system is a
virtual computer ready to perform any user-defined algorithm.
As one option, the EEPROM 506 may store information enabling the system to
boot up into one of several (e.g. eight) predetermined microprocessor
configurations, depending upon a user-specified choice, enabling the user
to instantly switch the system from one microprocessor architecture to
another, as desired.
FIG. 7 illustrates one mode of operation of the virtual computer of the
invention. The step of block 700 of FIG. 7 is to analyze and define the
current algorithm to be performed. This step requires dividing the
algorithm into a number of sub-algorithms corresponding to the number of
FPGAs in the array 200, and defining logical connections between the
sub-algorithms. Next, the step of block 702 of FIG. 7 is to use the FPID
Pro software to produce the configuration file for the FPINs from the
logical connections defined in the step of block 700. Then, the step of block 704 of
FIG. 7 is to use the LDG software to produce an FPGA output in Xact Design
System Format for each FPGA in the array 200 from each one of the
sub-algorithms defined in the step of block 700. Next, the step of block
706 is to use the Xact Design System software to produce the configuration
files for the FPGAs. Next, the step of block 708 is to load the
configuration files to the disk drive. Then, the step of block 710 is to
send the configuration files from disk to configuration FPGAs 409, 410 in
the control section 206. Next, the step of block 712 is to output the
configuration file for each FPGA in the array on the corresponding pin in
the two banks 412, 414 of configuration bit output pins in the control
section 206. Last, the step of block 714 is to output the configuration file
for each FPIN in the array on the corresponding configuration bit output
pin. The array is then enabled to perform the algorithm. Step 700 is then
repeated for the next algorithm to be performed, and then the remaining
steps 702 et seq. are repeated. In this manner a succession of algorithms
is performed.
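The sequence of blocks 700 through 714 can be summarized as a pipeline. The sketch below is a rough outline under stated assumptions: every function is a hypothetical stub, the tool runs are replaced by a stand-in that only records which tool was applied, and the partitioning rule is a placeholder. It does not call the LDG, FPID Pro or Xact software.

```python
def partition_algorithm(algorithm, n_fpgas):
    """Block 700 (stub): split the algorithm and list logical connections."""
    subs = [f"{algorithm}:part{i}" for i in range(n_fpgas)]
    links = [(i, (i + 1) % n_fpgas) for i in range(n_fpgas)]  # placeholder
    return subs, links


def run_tool(tool, payload):
    """Stand-in for an external tool run; only records which tool was used."""
    return f"{tool}({payload})"


def configure_array(algorithm, n_fpgas, n_fpins):
    """Return configuration files keyed by device, in the order of FIG. 7."""
    subs, links = partition_algorithm(algorithm, n_fpgas)      # block 700
    fpin_files = [run_tool("FPID Pro", (j, len(links)))        # block 702
                  for j in range(n_fpins)]
    xact_inputs = [run_tool("LDG", s) for s in subs]           # block 704
    fpga_files = [run_tool("Xact", x) for x in xact_inputs]    # block 706
    disk = {"fpga": fpga_files, "fpin": fpin_files}            # block 708
    staged = dict(disk)                                        # block 710
    outputs = {}
    for i, f in enumerate(staged["fpga"]):                     # block 712
        outputs[f"FPGA{i}"] = f
    for j, f in enumerate(staged["fpin"]):                     # block 714
        outputs[f"FPIN{j}"] = f
    return outputs


files = configure_array("fft", n_fpgas=2, n_fpins=2)
```

Running `configure_array` once per algorithm mirrors the repetition of steps 700 et seq. for each algorithm in the succession.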
FIGS. 8a and 8b are pin diagrams corresponding to one implementation of the
elementary cell of FIG. 1. FIG. 9 is a pin diagram illustrating the
connection of a pair of dual-port RAMs to a pair of FPGAs in the right
FPGA column 212.
FIG. 10 illustrates an alternative embodiment of the array 200 in which
each FPGA is connected not only to the four neighboring FPINs (as in FIG.
2) but also to the four neighboring FPGAs, while each FPIN is similarly
connected not only to the four neighboring FPGAs (as in FIG. 2) but also
to the four neighboring FPINs.
As described above, the FPIN is a multi-pin device which connects any one
of its pins with any on