BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to a programmable logic device, and in
particular to a field programmable gate array in which the configurable
logic blocks and the programmable routing matrices are reconfigured
dynamically.
2. Description of Related Art
Programmable logic devices such as field programmable gate arrays ("FPGAs")
are a well-known type of integrated circuit and are of wide applicability
due to the flexibility provided by their reprogrammable nature. An FPGA
typically includes an array of configurable logic blocks (CLBs) that are
programmably interconnected to each other to provide logic functions
desired by a user (a circuit designer). An FPGA typically includes a
regular array of identical CLBs, wherein each CLB is individually
programmed to perform any one of a number of different logic functions.
The FPGA has a configurable routing structure for interconnecting the CLBs
according to the desired user circuit design. The FPGA also includes a
number of configuration memory cells which are coupled to the CLBs to
specify the function to be performed by each CLB, as well as to the
configurable routing structure to specify the coupling of the input and
output lines of each CLB. The FPGA may also include data storage memory
cells accessible by a user during operation of the FPGA. However, unless
specified otherwise, the term memory cells refers to the configuration
memory cells. The Xilinx, Inc. 1994 publication entitled "The Programmable
Logic Data Book" describes several FPGA products and is herein
incorporated by reference in its entirety.
One approach available in the prior art to increase the complexity and size
of logic circuits has been coupling multiple FPGAs (i.e. multiple chips)
by external connections. However, due to the limited number of
input/output connections, i.e. pins, between the FPGAs, not all circuits
can be implemented using this approach. Moreover, using more than one FPGA
undesirably increases power consumption, cost, and space to implement the
user circuit design.
Another known solution has been increasing the number of CLBs and
interconnect structures in the FPGA. However, for any given semiconductor
fabrication technology, there are limitations to the number of CLBs that
can be fabricated on an integrated circuit chip of practical size. Thus,
there continues to be a need to increase the number of logic gates or CLB
densities for FPGAs.
Reconfiguring an FPGA to perform different logic functions at different
times is known in the art. However, this reconfiguration requires the
time-consuming step of reloading a configuration bit stream for each
reconfiguration. Moreover, reconfiguration of a prior art FPGA generally
requires suspending the implementation of the logic functions, saving the
current state of the logic functions in a memory device external to the
FPGA, reloading the entire array of configuration memory cells, and
inputting the states of the logic functions which have been saved off chip
along with any other needed inputs. Each of these steps requires a
significant amount of time, thereby rendering reconfiguration impractical
for implementing typical circuits.
SUMMARY OF THE INVENTION
In accordance with the present invention, a programmable logic device (PLD)
includes at least one configurable element and a plurality of
configuration memory points for configuring the at least one configurable
element, wherein each of the configuration memory points includes a
plurality of memory cells. The PLD switches between configurations
sequentially, by random access, or on command from an external or internal
signal. This switching, i.e. reconfiguration, allows the PLD to function
in one of N configurations, wherein N is equal to the maximum number of
memory cells assigned to any configuration memory point. In this manner, a
PLD with a number M of actual configurable logic blocks (one example of a
configurable element) functions as if it includes M times N effective
CLBs. Thus, assuming eight configurations, the PLD implements eight times
the amount of logic that it actually contains by including the additional
configuration memory. By reconfiguring, the CLBs of the present invention
are advantageously reused dynamically, thereby reducing the number of
physical CLBs needed to implement a given number of logic functions in a
particular user's circuit design by the factor of the number of
configurations.
In the above-described configuration, the PLD is operable in a variety of
modes. In a first mode, the timeshare mode, the PLD remains at a single
configuration for a plurality of user clock cycles, wherein a user clock
cycle is defined as one period of the user's fastest clock.
In a second mode, the logic engine mode, the PLD sequences through multiple
configurations for each user cycle. In this mode, the period of time
during which a configuration is active is called a micro cycle. In the
logic engine mode, the sequencing is triggered by a signal external to
said programmable logic device or by a signal from at least one
configurable element in said PLD. In one embodiment, the sequencing of the
plurality of configurations is repeated in a next user clock cycle.
In a third mode, the static mode, multiple configurations are programmed
identically, so that the PLD performs the same function regardless of the
configuration.
The PLD of the present invention is also operable in a combination mode,
wherein part of the chip operates in one mode, for example, the static
mode, and another part of the chip operates in the logic engine mode or
the timeshare mode. In an alternative or co-existing embodiment, the PLD
operates in one configuration mode during at least one user cycle and in
another configuration mode during at least another user cycle.
In one embodiment of the present invention, the PLD further includes a
latch to hide the delay of any precharging done on a plurality of bit
lines for configuring the memory cells. In another embodiment, the
configurable elements include both combinational logic elements and
sequential logic elements.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a prior art FPGA configuration bit.
FIG. 2 shows a configuration bit-slice in accordance with the invention.
FIG. 3 illustrates a block diagram of a time-multiplexed CLB.
FIG. 3A shows the configuration select signals, the read select signals,
and the write select signals of the present invention provided to a
plurality of memory cell blocks, an output multiplexer, and a micro
register, respectively.
FIG. 4 shows a more detailed block diagram of a portion of the
time-multiplexed CLB illustrated in FIG. 3.
FIG. 5 illustrates a more detailed diagram of a portion of the CLB of FIG.
4.
FIG. 6 shows a truth table for the circuitry of FIG. 5.
FIG. 7 illustrates a two level memory hierarchy.
FIG. 7A shows an embodiment in which two local busses and two global busses
carry true and complement versions of signals to a bit set. FIG. 7B
illustrates a register configuration for providing access to the memory
cells on a CLB-by-CLB basis.
FIG. 8 shows a known four transistor memory cell.
FIG. 9 illustrates a four transistor cell memory configuration in a PLD.
FIG. 10A shows a CLB with a storage device having a fixed delay in
accordance with one embodiment of the present invention.
FIG. 10B shows another CLB with a storage device having a fixed delay in
accordance with one embodiment of the present invention.
FIG. 11 shows a block diagram of a shared memory.
FIG. 12 shows detail of the shared memory of FIG. 11.
FIG. 13 illustrates word READ timing for the shared memory.
FIG. 14 shows word WRITE timing for the shared memory.
FIG. 15 illustrates burst READ timing for the shared memory.
FIG. 16 shows burst WRITE timing for the shared memory.
FIG. 17 illustrates a plurality of configuration bits for the shared
memory.
FIG. 18 shows a configuration access timing graph for the shared memory.
FIG. 19 illustrates a prior art self-timed circuit.
FIG. 20 illustrates timing for the circuit of FIG. 19.
FIG. 21 illustrates a timing circuit for generating multiple internal
cycles for each external clock cycle.
FIG. 22A illustrates a single clock sequencer in accordance with one
embodiment of the present invention.
FIG. 22B shows an illustrative timing sequence for three configurations.
FIG. 23 illustrates a split memory in accordance with the present
invention.
FIG. 24 shows one embodiment of a layout for a CLB.
FIG. 25 illustrates the multi-function time share operating mode of a PLD.
FIG. 26 shows an implementation of the logic engine mode in a PLD.
FIG. 26A illustrates a compression method in accordance with one embodiment
of the present invention in which pairs of the levels on the critical path
are merged into a single level using the micro register bypass to fit two
LUTs serially in the same micro cycle.
FIG. 26B shows two necessary scheduling relationships between a flip-flop
and other elements in the device.
FIG. 27 shows a gated clock flip-flop.
FIG. 28 illustrates various library elements and their relationship to the
micro cycle clock.
FIG. 29 shows a clock-enabled flip-flop.
FIG. 30 illustrates the rescheduled logic of FIG. 26.
FIGS. 31 and 32 show scheduling and placement look-up tables in two and
three-dimensional space, respectively.
FIGS. 33 and 34 illustrate micro cycle sequencing in a time-multiplexed
PLD.
FIG. 35 shows all CLBs having a different configuration for each memory
cycle.
FIG. 36 shows some CLBs not having a configuration for certain micro
cycles.
FIGS. 37A and 37B illustrate two variable depth time multiplexed CLBs.
FIG. 38 shows a CLB with different numbers of micro cycles for different
inputs.
FIG. 39 illustrates a state machine which provides appropriate waveforms if
the fastest clock is implemented as the user clock, and all other clocks
are implemented with micro cycle register enable signals.
FIG. 39A shows a timing diagram of the slow clock signal, the enable
signal, and the master clock signal of FIG. 39.
FIG. 40 shows a flow chart for optimizing scheduling in accordance with the
present invention.
FIG. 41 shows an illustrative input/output block in accordance with the
present invention.
FIG. 42 illustrates a circuit subject to micro cycle interrupt simulation.
FIG. 42A shows the partitioning of the user network of FIG. 42 into
sub-networks.
FIGS. 43, 44, and 45 show further transformations of the circuit of FIG.
42.
FIGS. 46 and 47 illustrate pseudo-code translations of the circuits of
FIGS. 42 and 45, respectively.
FIGS. 48 and 49 show scheduling constraints used in conjunction with the
pseudo-code translations of FIGS. 46 and 47, respectively.
FIG. 50 illustrates one micro cycle allocation.
FIG. 51 shows a state diagram for FIG. 50.
FIG. 52 illustrates circuitry for determining an appropriate micro cycle.
FIGS. 53 and 54 show equivalent circuits with synchronized output signals.
FIG. 55 shows a time multiplexed PLD with expandable logic depth.
FIGS. 56A and 56B illustrate two CLBs having their own output micro
register and multiplexers.
FIG. 57 shows two CLBs sharing multiplexers.
FIG. 58 illustrates two CLBs sharing multiplexers and having feedback
paths.
FIG. 59 shows a portion of a PLD including interconnect.
FIG. 60 illustrates an inverter for use in the PLD of FIG. 59.
FIG. 61 shows an embodiment of the present invention in which an additional
register limits access to the memory during a memory access cycle.
FIG. 62 illustrates an embodiment of the present invention in which the
configuration data is read in two memory accesses.
FIG. 63A illustrates write select signals provided to the micro register
and configuration select signals provided to the configuration memory
which in turn controls one output multiplexer.
FIG. 63B shows another embodiment of an output multiplexer.
FIG. 63C illustrates yet another embodiment of an output multiplexer which
reduces the number of latches in comparison to the output multiplexer of
FIG. 63B.
FIG. 63D shows a table indicating the input signals for an output
multiplexer of the present invention.
FIG. 63E illustrates a truth table for a circuit included in the output
multiplexer shown in FIG. 63A.
FIG. 63F shows a timing diagram for the output multiplexer illustrated in
FIG. 63B.
FIG. 63G illustrates a detailed implementation of the circuit identified in
FIG. 63A.
FIGS. 64A and 64B show a timing diagram and circuit which exemplify a skew
problem solved by the invention.
DETAILED DESCRIPTION OF THE DRAWINGS
The detailed description is divided into topical sections which are
organized according to the following Table of Contents.
Table Of Contents Of Detailed Description

1.0 Terminology
2.0 Logic Array Architecture
    2.1 Micro Registers
        2.1a Micro Register Location
    2.2 Bus Hierarchy
3.0 Power Conservation
4.0 Shared Memory
5.0 Chip Layout
6.0 Reconfiguration
7.0 Single clock sequencer
    7.1 Configuration Sequencing
    7.2 Configuration Duration
    7.3 Micro cycle Generation for a Synchronous FPGA
8.0 Modes of Operation
    8.1 Time-Share Mode
    8.2 Logic Engine Mode
        8.2a Synchronous/Asynchronous Clocking
        8.2b Controller for Logic Engine Mode
        8.2c The Scheduler
        8.2d Scheduling Compression
        8.2e Simultaneous Scheduling and Placement
        8.2f Logic Engine Input and Output Signals
    8.3 Static Mode
    8.4 Mixed Mode
9.0 Miscellaneous
    9.1 Variable Depth CLBs
    9.2 Micro cycle Interrupt Simulation
    9.3 Micro Register Alternatives
    9.4 Alternatives for Deeper Logic
    9.5 Per-CLB Memory Access Config Bit
    9.6 Micro Register Selector Options
    9.7 Low Power Interconnect Circuitry
    9.8 Multiple Access for Configuration
    9.9 Pipelining Features Mode
    9.10 Incorporation of ROM Cells

1.0 Terminology
Three types of data (implying three types of memory or storage) are
discussed herein: configuration data, user data, and state data.
Configuration data determines the configuration of the logic blocks or
interconnect when the data is provided to those logic blocks or
interconnect. User data is data typically generated by the user logic and
stored/retrieved in memory that could otherwise be used for configuration
data storage. State data is data defining the logical values of nodes in
user logic at any specific time. Typically, state data is stored if the
values at the nodes are needed at a later time. The term "state" is used
to refer to either all of the node values at a particular time, or a
subset of those values.
2.0 Logic Array Architecture
One prior art FPGA, for example one device of the Xilinx XC4000™ family
of FPGAs which is commercially available from Xilinx, Inc., includes one
configuration memory cell to control each programming point. As shown in
FIG. 1, a conventional latch 101 (i.e. a four transistor device) plus a
select transistor 102 comprise a five transistor (5T) memory cell 100
which forms the basic unit of control for all logic functions on the FPGA
chip. U.S. Pat. No. 4,821,233 which issued on Apr. 11, 1989, and U.S. Pat.
No. 4,750,155, which issued on Jun. 7, 1988, discuss the configuration of
this 5T memory cell in detail and are incorporated by reference herein.
In accordance with the present invention and referring to FIG. 2, each
memory cell 100 (FIG. 1) is replaced with a random access memory (RAM) bit
set 200. Bit set 200 includes eight memory cells MC0-MC7. Each memory cell
MC has a latch 201 and an associated select transistor 202. Memory cells
MC0-MC7 are coupled to a common bit line 203 which provides signals to a
clocked latch 204. In another embodiment, memory cells MC0-MC7 are
conventional six transistor (6T) memory cells which are well known in the
art and therefore are not described in detail herein. All configuration bits
at the same location (for example, the third configuration bit, stored by
latch 201_2 of memory cell MC2) in different bit sets are considered
to be in a single "slice" of memory, corresponding to a single
configuration of the array.
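
The relationship between bit sets and slices can be illustrated with a short behavioral sketch. It is only a toy model and not circuitry from the specification: the names `BitSet`, `select`, and `read_slice` are hypothetical, and the latch is modeled as a plain stored value.

```python
from dataclasses import dataclass, field
from typing import List

N_CELLS = 8  # memory cells MC0-MC7 per bit set, as in FIG. 2


@dataclass
class BitSet:
    """One programming point backed by eight memory cells (hypothetical model)."""
    cells: List[int] = field(default_factory=lambda: [0] * N_CELLS)
    latch: int = 0  # stands in for clocked latch 204

    def select(self, slice_index: int) -> int:
        """Place the chosen cell on the bit line and capture it in the latch."""
        self.latch = self.cells[slice_index]
        return self.latch


def read_slice(bit_sets: List[BitSet], slice_index: int) -> List[int]:
    """Read the same cell position from every bit set: one configuration."""
    return [bs.select(slice_index) for bs in bit_sets]


# Three programming points, each holding a different value per slice.
points = [BitSet([0, 1, 0, 1, 0, 1, 0, 1]),
          BitSet([1, 1, 0, 0, 1, 1, 0, 0]),
          BitSet([0, 0, 0, 0, 1, 1, 1, 1])]
config_2 = read_slice(points, 2)  # the slice formed by cell MC2 of each bit set
config_5 = read_slice(points, 5)  # the slice formed by cell MC5 of each bit set
```

In this model a configuration is simply the list of values that appears when every bit set is asked for the same cell index.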
The additional configuration memory cells increase logic density by dynamic
re-use of the FPGA circuitry. Specifically, CLBs and interconnect are
configured to perform some defined task at one instant and are
reconfigured to perform another task at another instant. Thus, by
providing a bit set for each prior FPGA programming point, an FPGA in the
present invention "holds" eight times the amount of logic of the prior art
FPGA. By reconfiguring the CLBs, the number of function generators in the
CLB, typically conventional look up tables ("real LUTs"), needed to
implement a given number of LUTs in a user circuit ("virtual LUTs") is
reduced by a factor of the number of configurations.
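
As a rough illustration of this reduction, the following sketch computes an idealized lower bound on the number of real LUTs. The function names are hypothetical, and the calculation assumes that the virtual LUTs can be spread evenly across all configurations, which scheduling and routing constraints may prevent in practice.

```python
import math


def real_luts_needed(virtual_luts: int, configurations: int) -> int:
    """Idealized lower bound: every real LUT is reused once per configuration."""
    return math.ceil(virtual_luts / configurations)


def effective_capacity(real_luts: int, configurations: int) -> int:
    """Upper bound on virtual LUTs that a device could hold."""
    return real_luts * configurations


needed = real_luts_needed(800, 8)       # 800 virtual LUTs, eight configurations
capacity = effective_capacity(100, 8)   # 100 real LUTs, eight configurations
```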
FIG. 3 illustrates a block diagram of one embodiment of a CLB 301 in
accordance with the present invention. In this embodiment, CLB 301
includes 320 programming points, each point requiring one bit of
configuration data, wherein each bit includes an 8-bit memory. For
example, G logic function generator 302 is configured by 128 bits (16
bits × 8). The configuration bits which control logic function
generators 302, 303, and 304, the plurality of multiplexers 305-321, and
SR Control are shown as shadowed boxes which represent the eight bit
memory set "behind" each of the bits within the configuration word. For
clarity, FIG. 3 does not show the switch box and the connection boxes and
their associated configuration bits, wherein each programming point in
these boxes also includes an 8 bit memory.
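
The memory counts in this embodiment follow directly from the bit set depth. The arithmetic below is a minimal sketch with hypothetical constant names; it covers only the 320 programming points of CLB 301 and not the switch box or connection boxes, whose point counts are not given here.

```python
CELLS_PER_BIT_SET = 8          # eight memory cells behind each programming point
CLB_PROGRAMMING_POINTS = 320   # programming points in CLB 301
LUT_BITS = 16                  # configuration bits of one function generator


def memory_cells(programming_points: int, depth: int = CELLS_PER_BIT_SET) -> int:
    """Total memory cells needed to back the given programming points."""
    return programming_points * depth


g_generator_cells = memory_cells(LUT_BITS)        # 16 bits x 8
clb_cells = memory_cells(CLB_PROGRAMMING_POINTS)  # 320 points x 8
```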
During operation, all values in the same slice are read out simultaneously
to update the configuration of the CLBs and interconnect on the chip,
thereby causing the CLBs to perform different logical functions and the
interconnect to make different connections.
2.1 Micro Registers
FIG. 3 shows micro registers 324 and 325 coupled to the output terminals of
multiplexers 311 and 312. Each micro register, which stores intermediate
logic states, includes eight micro register bits, wherein each micro
register bit corresponds to one of the previously described eight memory
slices (although in one embodiment, not all bits of all micro registers
are present). Just prior to a change of configuration, the micro register
bits corresponding to the current memory slice are clocked so as to
capture the state of all CLBs (and in some embodiments IOBs). In
accordance with the present invention, the contents of micro registers 324
and 325 can be used in any configuration. During each configuration,
signals propagate through the FPGA in a conventional manner, with the
addition of paths from the micro registers through the programmed
interconnect to input terminals of look-up tables (LUTs) or CLBs.
In one embodiment, multiple selectors for each micro register are provided,
so a single configuration can either access values produced by multiple
other configurations of the CLB, or access current CLB values which bypass
micro registers 324 and 325. For example, micro register 324 is coupled to
a plurality of output selectors, i.e. multiplexers 313, 314, 315, and 316.
In a similar manner, micro register 325 is coupled to multiplexers 317,
318, 319, and 320. Note that each of the above-mentioned multiplexers
(selectors) receives signals from function generators 302, 303, 304, or
signals external to CLB 301 (i.e. signals Hi or DIN). The number of
multiplexers limits the number of signals from the micro register that can
be used at one time. For example, because there are four output
multiplexers for each micro register (i.e. multiplexers 313-316 for micro
register 324 and multiplexers 317-320 for micro register 325), a single
configuration cannot access more than four signals stored in the same
micro register in other configurations.
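
A scheduling tool could check this limit for each configuration. The following is a minimal sketch under stated assumptions: `reads_fit` is a hypothetical helper, each requested read is written as a (register, bit) pair, and repeated requests for the same bit are assumed to share one multiplexer.

```python
from collections import Counter
from typing import Iterable, Tuple

# Multiplexers 313-316 serve micro register 324; 317-320 serve micro register 325.
MUXES_PER_REGISTER = 4


def reads_fit(reads: Iterable[Tuple[int, int]]) -> bool:
    """Return True if one configuration's micro register reads fit the muxes.

    `reads` holds (register id, bit index) pairs wanted by a single configuration.
    """
    distinct = set(reads)  # assumption: duplicate requests share a multiplexer
    per_register = Counter(register for register, _bit in distinct)
    return all(count <= MUXES_PER_REGISTER for count in per_register.values())


ok = reads_fit([(324, 0), (324, 1), (324, 2), (324, 3), (325, 0)])
too_many = reads_fit([(324, bit) for bit in range(5)])
```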
Referring to FIG. 3A, register write select (RWS) signals determine which
micro register bit, i.e. bits 0-7, to write. Read select signals control,
for example, output multiplexer 313 which in turn determines which micro
register bit to read. Configuration select (CRS) signals determine which
read select signals to use from blocks 330, wherein each block includes 8
memory cells MC0-MC7. Note that the RWS signal is provided by a memory
controller (explained in further detail in reference to FIGS. 11 and 12)
for memory write operations only and by a sequencer (explained in further
detail in reference to FIGS. 22A and 52) for other operations. In
contrast, the CRS signal is provided by the sequencer for configuration
read operations and by the memory controller for other operations.
In the simplest embodiment, the RWS signal is simply the CRS signal delayed
by one μcycle (also referenced as "ucycle" and "micro cycle"). That is,
the CRS signal specifies the computation at the beginning of the
μcycle, and the RWS signal stores the result at the end of the
μcycle.
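
The delayed-write relationship can be sketched behaviorally as follows. This is an illustrative model only: `run_micro_cycles` and `compute` are hypothetical names, the micro register is a plain list, and the "computation" stands in for whatever logic a configuration implements.

```python
from typing import Callable, List, Optional, Sequence


def run_micro_cycles(crs_sequence: Sequence[int],
                     compute: Callable[[int, List[int]], int],
                     register_bits: int = 8) -> List[int]:
    """Model RWS as CRS delayed by one micro cycle."""
    micro_register = [0] * register_bits
    rws: Optional[int] = None   # no write is pending before the first micro cycle
    pending = 0
    for crs in crs_sequence:
        # End of the previous micro cycle: RWS stores that cycle's result.
        if rws is not None:
            micro_register[rws] = pending
        # Start of this micro cycle: CRS selects the computation.
        pending = compute(crs, micro_register)
        rws = crs
    if rws is not None:
        micro_register[rws] = pending  # write for the final micro cycle
    return micro_register


def compute(configuration: int, register: List[int]) -> int:
    """Toy logic: configuration 0 yields 1; others invert the previous bit."""
    return 1 if configuration == 0 else 1 - register[configuration - 1]


final_state = run_micro_cycles([0, 1, 2], compute)
```

Because each write lands before the next computation starts, configuration 1 in this example already sees the bit stored by configuration 0.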
Because the output signal, for example output signal YA, is latched into a
pipeline latch 350 with a μCLK signal, there is no need to latch the
read select signals or the CRS signals, thereby minimizing silicon area
and allowing multiplexer 313 to operate in parallel with the configuration
read process.
FIG. 4 shows a more detailed embodiment of a portion of CLB 301 (FIG. 3)
which includes micro register 324, multiplexers 305, 321, and 313-316, and
D flip-flop 322. Note that the structure shown in FIG. 4 is replicated
twice in CLB 301 because there are two sets of micro registers (i.e. micro
registers 324 and 325). In this embodiment, multiplexer 311 (FIG. 3)
comprises three multiplexers 402, 403, and 404. Multiplexers 313, 314, 315,
and 316 provide buffered output signals YA, YB, YQA, and YQB,
respectively.
The functioning of the RECIRC path is controlled by a clock enable signal.
Specifically, when a clock enable signal EC is a logic zero the previous
value of the current micro register bit may be obtained in the following
manner. First, an output signal from micro register 324 is selected with a
CRS signal and transferred via multiplexer 408 into a latch 415 with a
μCLK signal. As explained previously in reference to FIG. 3A, the CRS
signal is the address or location in the bitset of the currently-active
configuration. Second, the latched signal QOLD is then fed back into the
current micro register bit via multiplexer 402 (controlled by signal EC'),
multiplexer 403 (controlled by signal SEL (provided by a configuration
bit)), and multiplexer 404 (controlled by signal SAVE (provided by the
sequencer)).
The input signals shown in FIG. 4 generally conform to those signals
provided in the commercially available Xilinx XC4000 family of FPGAs. For
example, signal K is the clock input signal; and signal IV is the initial
value of flip-flop 322 upon power-up or reset and is a value provided by a
bit set 200 (FIG. 2).
Signal X1 is the input signal to micro register 324 (wherein signal X2 (not
shown) is the input signal to micro register 325). Note that bit set 200
(FIG. 2) controls various elements of FIG. 4. The output signal of
multiplexer 408 is also provided to (MEM I/F) Memory Interface 405 which
provides values to multiplexer 404 for preloading of micro register 324,
for power up operations, or for debugging operations, for example. In one
embodiment, micro register 324 is addressed such that each bit of the
register resides in the same address space as the configuration which
generated it, thereby dramatically reducing complexity of accessing a
state. Note that the signals (SR' and EC') provided to the set/reset (S/R)
and enable clock (EC) terminals of D flip-flop 322 also control the
operation of multiplexer 402 via lines 413 and 414.
Multiplexer 403 determines whether a signal from D flip-flop 322 or a
signal from multiplexer 402 (in one configuration, a feedback signal
RECIRC from micro register 324) is provided to multiplexer 404. Latch 407
captures the output signal from multiplexer 404 and transfers this value
to micro register 324 upon the appropriate micro cycle clock signal
μCLK. FIGS. 63A, 63B, and 63C illustrate various embodiments for
multiplexers 313-320 (FIG. 3).
FIG. 63A illustrates one embodiment of an output multiplexer, in this
example, multiplexer 313 (FIG. 4), in accordance with the present
invention which provides a CLB output signal OUT(bar) to the interconnect
structure. Note that latch 407 and register 324 are shared by multiplexers
314-316 (see FIG. 4). Register 324, receiving register write signals
RWS0-RWS7, provides signals uR0-uR7 to multiplexers 6301A-6301D. Address
bit A0 determines which of two signals to each multiplexer is then
transferred to multiplexers 6301E and 6301F. In a similar manner, address
bit A1 determines which of two signals to those multiplexers is
transferred to multiplexer 6301G. Address bit A2 determines which input
signal is inverted and transferred to multiplexer 6301H. Multiplexer 6301H
also receives an inverted register bypass signal RBYP from latch 407 and
provides (determined by select signal A3) an output signal to circuit
6302. Address bit A3 determines whether the output signal from multiplexer
6301G or a register bypass signal RBYP is subsequently provided to
multiplexer 6301I. Note that if signal RBYP is selected then multiplexer
6301H has provided the value written in the register in the previous micro
cycle. Although the RBYP signal eliminates the latency of tree multiplexer
6301, the signal may create some ambiguity as to the value in the previous
micro cycle in other than the logic engine mode.
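
The three-level selection performed by address bits A0-A2 can be sketched as a tree of two-input multiplexers. This is a simplified model under stated assumptions: the pairing of inputs (uR0 with uR1, and so on) and the select polarity are assumed, the signal inversions are omitted, and the bypass stage controlled by A3 is not modeled. The names `mux2` and `tree_mux` are hypothetical.

```python
from typing import Sequence


def mux2(select: int, in0: int, in1: int) -> int:
    """Two-input multiplexer: returns in1 when select is 1, otherwise in0."""
    return in1 if select else in0


def tree_mux(ur: Sequence[int], a0: int, a1: int, a2: int) -> int:
    """Select one of uR0-uR7 through three multiplexer levels."""
    if len(ur) != 8:
        raise ValueError("expected eight micro register signals uR0-uR7")
    # Level 1 (6301A-6301D): A0 picks one signal from each assumed pair.
    level1 = [mux2(a0, ur[i], ur[i + 1]) for i in range(0, 8, 2)]
    # Level 2 (6301E, 6301F): A1 picks between level 1 outputs.
    level2 = [mux2(a1, level1[0], level1[1]), mux2(a1, level1[2], level1[3])]
    # Level 3 (6301G): A2 picks between level 2 outputs.
    return mux2(a2, level2[0], level2[1])


signals = [0, 0, 0, 0, 0, 1, 0, 0]                 # only uR5 is high
selected = tree_mux(signals, a0=1, a1=0, a2=1)     # address 0b101 selects uR5
```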
The table illustrated in FIG. 63D indicates the input signals for each
multiplexer 313-320, wherein signal X1 is the output signal of latch 407
(i.e. the register bypass signal associated with micro register 324), and
signal X2 is the register bypass signal associated with micro register
325. Input signals SBYP0 and SBYP1 refer to sequential bypass signals that
are typically generated in the configuration logic blocks of the Xilinx
XC4000 family of devices (i.e. signals F, H, DIN, or Q).
Note that signals SBYP0 and SBYP1 are selected by address bits A1-A3.
Specifically, address bit A1 is stored in a latch 6303 which controls
multiplexer 6301J (i.e. selects between input signals SBYP0 and SBYP1),
whereas address bits A2 and A3 are provided to AND gate 6304. If both
address bits A2 and A3 are low, then a high signal is stored in latch
6305, otherwise a low signal is stored in latch 6305. The output signal of
latch 6305 controls whether multiplexer 6301I selects the output signal of
multiplexer 6301H or multiplexer 6301J (as explained in detail below).
FIG. 63B illustrates another embodiment in which latch 407 is connected to
micro register 324 which in turn is connected to latches 6311_0-6311_7,
as well as to latch 6312. Because all the above-referenced
latches are clocked by micro clock signal uClk, circuit 6315 functions as
a plurality of flip-flops with signals RWS0-RWS7 serving as the enable
signals to those flip-flops. Moreover, because the micro clock signal uClk
is distributed with low skew throughout the chip, signals RWS0-RWS7 can
have considerable slop as shown in FIG. 63F by the cross-hatched section
which indicates a "don't care" period for signal RWS. Note that to
eliminate race problems in circuit 6315, some non-overlap is provided
between micro clock signal uClk and uClk(bar) (otherwise, data may pass
through the latches during the overlap period). Note that in this
embodiment, signals SBYP0 and SBYP1, if chosen, are transferred by
multiplexer 313 irrespective of micro cycle clock uClk, whereas if a
signal from micro register 324 is chosen then such signal is sampled on
the edge of the micro cycle clock uClk.
FIG. 63C illustrates yet another embodiment in which latch 407 is connected
to micro register 324 which in turn is connected to multiplexer 313A. As
shown, this embodiment provides a multiplexer 313A for the input signals
that are latched and another multiplexer 313B for those input signals that
are not latched. Thus, latches 6311 (FIG. 63B) have been "pushed" through
multiplexer 313, thereby advantageously decreasing the number of latches
to one, i.e. latch 6317, from nine latches, i.e. latches 6311_0-6311_7
in FIG. 63B. Multiplexer 313A is controlled by 4 blocks 330
(see FIG. 3A), whereas multiplexer 313B is controlled by blocks 330 via
latch 6318. In this embodiment, a latch 6317 is provided for the output
signals from multiplexer 313A. Therefore, once a reconfiguration is
complete, the embodiment of FIG. 63C need not wait for a value to ripple
through multiplexer 313A.
FIG. 63E illustrates the truth table for circuit 6302 (FIG. 63A). For
example, if either signal SBYP0 or signal SBYP1 is selected, then address
bits A2 and A3 are zero. Thus, the output signal of gate 6304 (effectively
a NOR gate because of its inverted input terminals) is high. After a uClk
signal is detected by latch 6305, it outputs a high signal, thereby
forcing the output signal of OR gate 6306 high. That high signal
effectively makes latch 6307 transparent, thereby allowing either signal
SBYP0 or SBYP1 to ripple directly to the CLB output line. In other words,
circuit 6302 functions as a multiplexer. Note that the structures shown in
FIGS. 63B and 63C also perform the same function, but the function is
implemented in a different manner.
On the other hand, if the output signal of micro register 324 is desired,
then the output signal of latch 6305 is low and the output signal of OR
gate 6306 is the same as the micro clock. In this manner, latch 6307
performs the same function as latch 6317 (FIG. 63C). Thus, in this
configuration, circuit 6302 functions as a multiplexer coupled to a latch.
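
The two behaviors of circuit 6302 can be summarized in a small behavioral sketch. It is an assumption-laden model rather than the circuit itself: latch 6305 is treated as already holding the output of gate 6304, latch 6307 is modeled as an ideal transparent latch, and the names `latch_6307_enable` and `TransparentLatch` are hypothetical.

```python
def latch_6307_enable(a2: int, a3: int, uclk: int) -> int:
    """Enable of latch 6307: output of OR gate 6306."""
    bypass_selected = int(not (a2 or a3))  # gate 6304, as held by latch 6305
    return bypass_selected | uclk          # OR gate 6306


class TransparentLatch:
    """Ideal level-sensitive latch: follows d while enabled, else holds."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, enable: int, d: int) -> int:
        if enable:
            self.q = d
        return self.q


# Bypass case (A2 = A3 = 0): the latch stays transparent with the clock low.
bypass = TransparentLatch()
bypass_out = [bypass.step(latch_6307_enable(0, 0, 0), d) for d in (1, 0, 1)]

# Register case (A2 = 1): the latch only updates while the micro clock is high.
registered = TransparentLatch()
registered.step(latch_6307_enable(1, 0, 1), 1)          # clock high: capture 1
held = registered.step(latch_6307_enable(1, 0, 0), 0)   # clock low: hold 1
```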
FIG. 63G shows one detailed implementation for circuit 6302 which includes
transistors 6330-6333 and inverters 6334-6337.
FIG. 5 illustrates multiplexers 402, 403, and 404 and MEM I/F 405 which, in
this embodiment, are consolidated into effectively a single multiplexer
circuit 500 which reduces the delay by reducing the number of series
pass-transistors. Note that the read signal RD, write signal WR, and
memory select signal MSEL are provided by a memory controller (described
in detail in reference to FIG. 11), whereas a SAVE signal is provided by a
sequencer (described in further detail in reference to FIGS. 22A and 52)
and a select signal SEL is provided by a configuration bit. FIG. 6 is a
truth table 601 for the various input signals resulting in a particular
signal at node 501 (FIG. 5).
2.1a Micro Register Location
Micro registers 324 and 325 (FIG. 3) may be located in any of several places. In
one embodiment (shown in FIG. 3), micro registers 324, 325 are coupled to
the input terminals of output multiplexers 313-320. In a second
embodiment, the micro registers are coupled to the input terminals of
logic function generators 302 and 303. If, for example, micro register 324
is coupled to the input terminals of logic function generator 302, then
multiplexers 313-316 are simplified. Note that if two signals are
generated in the same configuration and those signals are needed on the
same pin of logic function generators on different configurations, a
conflict arises. Specifically, if the micro registers are coupled to the
input terminals of the logic function generators, two signals provided to
those micro registers cannot be provided on the same configuration.
In a third embodiment, the micro registers are located in the interconnect,
wherein signals are routed to the micro registers when available and
routed from the micro registers when needed. In one instance, the micro
registers are assigned independently of the logic function generators
doing the calculation. In this manner, a placement program can
automatically select only those micro registers having no conflict. This
embodiment provides maximum flexibility as to data storage location.
In a fourth embodiment, the micro registers are located in a storage
location independent of the configuration. The address, or part of the
address, may be given by configuration bits or by placement location. In this manner,
only those values to be kept are stored and only locations that have no
conflict are selected.
2.2 Bus Hierarchy
As described above in the Description of Related Art, each
configuration operation in a prior art FPGA is controlled by a set of
configuration memory bits. The busses used to load these configuration
bits typically form a single level of hierarchy, with vertical address
lines spanning the full height of the CLB array, and horizontal data lines
(referred to as a global bus) spanning the full array width.
In accordance with the present invention, each of the prior art
configuration memory bits is replaced by N bits. Those N bits, i.e. the
bits stored in memory cells MC0-MC7, are connected via their local busses
203 through switches 700 to a global bus 701 as shown in FIG. 7. Local
busses 203 may randomly or sequentially access memory cells MC0-MC7 to
drive a memory function device 703 (i.e. a programmable point in a CLB or
interconnect structure). In one embodiment, switch 700 is a transistor,
whereas in other embodiments, switch 700 is a conventional buffered
switch. In one embodiment, each memory cell MC is implemented using a
5-transistor memory cell 100 (FIG. 1). Other memory cell implementations
are described below in detail.
Local busses 203 are more active than global bus 701 because they carry
bits for each configuration (to latch 204), while global bus 701 is active
only when reconfiguring a plane (also referred to as a slice) or performing
a user memory operation. For power and speed reasons, the capacitance of
local busses 203 is minimized by compact layout and small transistor sizes.
Busses 205 provide configuration select (CRS) signals to transistors 202,
whereas address busses 702 provide address signals to switches 700.
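The two-level arrangement described above can be sketched as a toy behavioral model of a single bit set. This is an illustrative simplification, not a description of the circuit: the class name BitSet and its methods are hypothetical, timing is ignored, and it is assumed that a configuration select line picks one memory cell while the address line closes the switch to the global bus.

```python
class BitSet:
    """Toy model of one bit set: N memory cells sharing one local bus."""

    def __init__(self, n_cells=8):
        self.cells = [0] * n_cells  # stands in for memory cells MC0-MC7
        self.latch = 0              # stands in for the latch fed by the local bus

    def select_configuration(self, crs):
        """Selected cell drives the local bus and the latch captures the value."""
        self.latch = self.cells[crs]
        return self.latch

    def global_write(self, address_selected, crs, bit):
        """Write one bit from the global bus if the switch is closed.

        Assumption: the latch is not disturbed by a global bus access.
        """
        if address_selected:
            self.cells[crs] = bit & 1

    def global_read(self, address_selected, crs):
        """Return the selected cell's bit, or None if the switch is open."""
        return self.cells[crs] if address_selected else None


bit_set = BitSet()
bit_set.global_write(True, 3, 1)     # load a bit for configuration 3
bit_set.global_write(False, 4, 1)    # switch open: nothing is written
print(bit_set.select_configuration(3), bit_set.select_configuration(0))
```

The model makes the division of labor visible: switching configurations touches only the local bus, while the global bus is involved only when a cell is loaded or read back.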
In one embodiment, local bus 203 and global bus 701 carry true and
complement versions of signals if desired. For example, if a memory cell
MC is implemented with a conventional six-transistor (6T) memory cell
(which is well known in the art and therefore not explained in detail
herein), two local busses 203A and 203B, two switches 700A and 700B, and
two global busses 701A and 701B are typically used as shown in FIG. 7A,
thereby increasing transistor count for each bit set 200A.
In a local bus to global bus transfer, there is only one memory cell MC per
global bus 701 taking part in the transfer (thus a column of MC cells for
the CLB array). For an illustrative CLB in accordance with the present
invention having four columns and eighty bit sets per column, a 16×16 CLB
array forms an array of 64 columns with 1280 bit sets per column.
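The array dimensions quoted above follow from simple multiplication, as the short check below shows. The function name array_dimensions and its parameter names are illustrative assumptions; only the numeric values come from the text.

```python
def array_dimensions(clb_rows, clb_cols, columns_per_clb=4, bit_sets_per_column=80):
    """Return (columns, bit sets per column) for a CLB array."""
    columns = clb_cols * columns_per_clb                 # 16 * 4
    bit_sets_per_array_column = clb_rows * bit_sets_per_column  # 16 * 80
    return columns, bit_sets_per_array_column


print(array_dimensions(16, 16))  # (64, 1280)
```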
A refinement of the two-level hierarchy is shown in FIG. 17, wherein two
local busses 1702A and 1702B are multiplexed onto a single global bus
1701. The advantage of this refinement is a reduction in the number of global bus lines.
Note that in other embodiments (not shown), more than two local busses are
multiplexed onto a single global bus.
3.0 Power Conservation
Because a large number of bit sets 200, i.e. on the order of 160,000, are
provided on one chip, dynamic power consumption is significant. Note that
the bit line capacitances, voltage swings and clock cycle times of the 4T,
5T, and 6T memory cells are different. Moreover, the frequency of the
voltage swing of their respective bus lines differs. Specifically,
referring to FIGS. 8 and 9, 4T cell 801 cannot drive the signals on local
busses LB and LBB high because resistors 802 have too high a resistance.
Thus, local busses LB and LBB must be precharged (via a low precharge
signal PCHB provided to the gates of transistors 902A and 902B) each time
a configuration is read. The signal on local bus LBB is the inverse of the
signal on local bus LB, so that on every cycle either local bus LB or
local bus LBB is discharged by one of memory cells 801. Therefore, there
is one high and one low transition per cycle, and the resulting bus state
is detected by sense amplifier 901, which in turn drives memory function
device 703.
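The transition count for the precharged 4T case can be checked with a small waveform model. This is a sketch under stated assumptions: each read cycle is modeled as an evaluate phase followed by a precharge phase, a stored 1 is assumed to discharge LBB and a stored 0 to discharge LB, and the function names are hypothetical.

```python
def count_edges(wave):
    """Count level changes between consecutive samples of a waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a != b)


def precharged_waveforms(bits):
    """Build LB and LBB waveforms for a sequence of configuration reads."""
    lb, lbb = [1], [1]                  # both lines start precharged high
    for bit in bits:
        lb.append(1 if bit else 0)      # evaluate: the cell pulls one line low
        lbb.append(0 if bit else 1)
        lb.append(1)                    # precharge: both lines return high
        lbb.append(1)
    return lb, lbb


bits = [0, 0, 1, 1, 0]
lb, lbb = precharged_waveforms(bits)
total = count_edges(lb) + count_edges(lbb)
print(total, total / len(bits))  # 10 2.0
```

Under these assumptions the pair makes two transitions per cycle regardless of the data read, which matches the one high and one low transition per cycle stated above.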
In contrast, referring back to FIG. 7, a 5T memory cell can drive local bus
203 high and low, thereby eliminating the necessity of precharge. (Note
that a 6T cell also need not be precharged.) Because sequential accesses
are as likely to read the same data as different data, local bus 203 in
the 5T case makes a transition, on average, every other cycle. Note that
because the 6T cell has two busses, its average number of bus transitions
lies between those of the 4T and 5T cells. Therefore, the 5T memory cell
has one-fourth as many transitions as the 4T cell, whereas the 6T memory
cell has one-half as many transitions as the 4T cell. Because each bus
transition consumes power, the 5T cell reduces power
consumption by 75%, whereas