WikiPatents - Community Patent Review
Configuration modes for a time multiplexed programmable logic device    
United States Patent 5600263
Link to this page: http://www.wikipatents.com/5600263.html
Inventor(s): Trimberger; Stephen M. (San Jose, CA); Carberry; Richard A. (Los Gatos, CA); Johnson; Robert A. (San Jose, CA); Wong; Jennifer (Fremont, CA)
Abstract: A PLD is operable in a variety of modes. In a first mode, the timeshare mode, the PLD remains at a single configuration for a plurality of user clock cycles. In a second mode, the logic engine mode, the PLD sequences through multiple configurations for each user cycle. In this mode, the period of time during which a configuration is active is called a micro cycle. In a third mode, the static mode, multiple configurations are programmed identically, so that the PLD performs the same function regardless of the configuration. Finally, the PLD is also operable in a combination mode, wherein part of the chip operates in one mode, for example, the static mode, and another part of the chip operates in the logic engine mode or the timeshare mode. In an alternative or co-existing embodiment, the PLD operates in one configuration mode during at least one user cycle and in another configuration mode during at least another user cycle.
Owner/Assignee     Xilinx, Inc. (San Jose, CA)
Publication Date     February 4, 1997
Application Number     08/517,018
Filing Date     August 18, 1995
US Classification     326/39 326/41 326/93
Int'l Classification     H03K 019/177
Examiner     Hudspeth; David R.
Attorney/Law Firm     Harms; Jeanette S.
USPTO Field of Search     326/38 326/39 326/40 326/41 326/46 326/93
References

U.S. References
5426378   Ong       326/39    Jun. 1995
5426738   Hsieh     326/38    Jun. 1995
5155390   Hickman   326/41    Oct. 1992
4821233   Hsieh     365/203   Apr. 1989
4750155   Hsieh     365/203   Jun. 1988
4594661   Moore     712/248   Jun. 1986
Claims
 


We claim:

1. A method of time multiplexing a programmable logic device comprising the steps of:

providing at least one configurable element;

providing a plurality of configuration memory points for configuring said at least one configurable element, wherein each of said configuration memory points includes a plurality of memory cells and a latch.

2. The method of claim 1 including providing a plurality of configuration modes for said programmable logic device, wherein said programmable logic device operates in one configuration mode during at least one user cycle and in another configuration mode during at least another user cycle.

3. The method of claim 1 including providing a plurality of configuration modes for said programmable logic device, wherein a portion of said configurable elements are configured in one configuration mode and another portion of said configurable elements are configured in another configuration mode.

4. The method of claim 1 including providing a plurality of configuration modes for said programmable logic device, wherein said programmable logic device operates in one configuration mode during at least one user cycle and in another configuration mode during at least another user cycle; and

providing a plurality of configuration modes for said programmable logic device, wherein a portion of said configurable elements are configured in one configuration mode and another portion of said configurable elements are configured in another configuration mode.

5. A method of time multiplexing a programmable logic device comprising the steps of:

providing at least one configurable element including a plurality of configurable combinational logic elements and a plurality of configurable sequential logic elements;

providing a plurality of configuration memory points for configuring said plurality of configurable combinational logic elements and said plurality of configurable sequential logic elements, wherein each of said configuration memory points includes a plurality of memory cells.

6. A method of time multiplexing a programmable logic device comprising a timeshare mode wherein said programmable logic device remains at a single configuration for a plurality of user clock cycles and saves and restores a state value, wherein said step of saving and restoring is done on said programmable logic device.

7. The method of claim 6 wherein said saving and restoring is done after a configuration.

8. A method of time multiplexing a programmable logic device comprising a timeshare mode wherein said programmable logic device remains at a single configuration for a plurality of user clock cycles and recirculates a state value.

9. A method of time multiplexing a programmable logic device comprising providing a logic engine mode wherein said programmable logic device sequences through three or more configurations in one user clock cycle.

10. The method of claim 9 wherein the sequencing is triggered by a signal external to said programmable logic device.

11. The method of claim 9 wherein the sequencing is triggered by a signal generated by at least one configurable element in said programmable logic device.

12. The method of claim 9 wherein the sequencing of said plurality of configurations is repeated in a next user clock cycle.

13. The method of claim 9 further including providing a plurality of configuration modes for said programmable logic device, wherein at least one of said configuration modes includes said logic engine mode.
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to a programmable logic device, and in particular to a field programmable gate array in which the configurable logic blocks and the programmable routing matrices are reconfigured dynamically.

2. Description of Related Art

Programmable logic devices such as field programmable gate arrays ("FPGAs") are a well known type of integrated circuit and are of wide applicability due to the flexibility provided by their reprogrammable nature. An FPGA typically includes an array of configurable logic blocks (CLBs) that are programmably interconnected to each other to provide logic functions desired by a user (a circuit designer). An FPGA typically includes a regular array of identical CLBs, wherein each CLB is individually programmed to perform any one of a number of different logic functions. The FPGA has a configurable routing structure for interconnecting the CLBs according to the desired user circuit design. The FPGA also includes a number of configuration memory cells which are coupled to the CLBs to specify the function to be performed by each CLB, as well as to the configurable routing structure to specify the coupling of the input and output lines of each CLB. The FPGA may also include data storage memory cells accessible by a user during operation of the FPGA. However, unless specified otherwise, the term memory cells refers to the configuration memory cells. The Xilinx, Inc. 1994 publication entitled "The Programmable Logic Data Book" describes several FPGA products and is herein incorporated by reference in its entirety.

One approach available in the prior art to increase the complexity and size of logic circuits has been coupling multiple FPGAs (i.e. multiple chips) by external connections. However, due to the limited number of input/output connections, i.e. pins, between the FPGAs, not all circuits can be implemented using this approach. Moreover, using more than one FPGA undesirably increases power consumption, cost, and space to implement the user circuit design.

Another known solution has been increasing the number of CLBs and interconnect structures in the FPGA. However, for any given semiconductor fabrication technology, there are limitations to the number of CLBs that can be fabricated on an integrated circuit chip of practical size. Thus, there continues to be a need to increase the number of logic gates or CLB densities for FPGAs.

Reconfiguring an FPGA to perform different logic functions at different times is known in the art. However, this reconfiguration requires the time-consuming step of reloading a configuration bit stream for each reconfiguration. Moreover, reconfiguration of a prior art FPGA generally requires suspending the implementation of the logic functions, saving the current state of the logic functions in a memory device external to the FPGA, reloading the entire array of configuration memory cells, and inputting the states of the logic functions which have been saved off chip along with any other needed inputs. Each of these steps requires a significant amount of time, thereby rendering reconfiguration impractical for implementing typical circuits.

SUMMARY OF THE INVENTION

In accordance with the present invention, a programmable logic device (PLD) includes at least one configurable element and a plurality of configuration memory points for configuring the at least one configurable element, wherein each of the configuration memory points includes a plurality of memory cells. The PLD switches between configurations sequentially, by random access, or on command from an external or internal signal. This switching, i.e. reconfiguration, allows the PLD to function in one of N configurations, wherein N is equal to the maximum number of memory cells assigned to any configuration memory point. In this manner, a PLD with a number M of actual configuration logic blocks (one example of a configurable element) functions as if it includes M times N effective CLBs. Thus, assuming eight configurations, the PLD implements eight times the amount of logic that it actually contains by including the additional configuration memory. By reconfiguring, the CLBs of the present invention are advantageously reused dynamically, thereby reducing the number of physical CLBs needed to implement a given number of logic functions in a particular user's circuit design by the factor of the number of configurations.
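The capacity argument above is plain multiplication: M physical CLBs, each backed by N configurations, behave like M × N effective CLBs. The Python sketch below assumes every one of the N configuration slices is fully used and ignores scheduling, routing, and empty slices; the helper names `effective_clbs` and `physical_clbs_needed` are hypothetical.

```python
def effective_clbs(physical_clbs: int, configurations: int) -> int:
    """Effective CLB count when each physical CLB is reused in every configuration.

    Assumes all N configuration slices are fully used (an idealization).
    """
    if physical_clbs < 0 or configurations < 1:
        raise ValueError("need a non-negative CLB count and at least one configuration")
    return physical_clbs * configurations


def physical_clbs_needed(virtual_clbs: int, configurations: int) -> int:
    """Physical CLBs needed to hold `virtual_clbs` user CLBs (ceiling division)."""
    if configurations < 1:
        raise ValueError("need at least one configuration")
    return -(-virtual_clbs // configurations)


print(effective_clbs(100, 8))        # 800
print(physical_clbs_needed(800, 8))  # 100
print(physical_clbs_needed(801, 8))  # 101
```

The ceiling division reflects that a design one CLB larger than a whole multiple of N still occupies an extra physical CLB.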

In the above-described configuration, the PLD is operable in a variety of modes. In a first mode, the timeshare mode, the PLD remains at a single configuration for a plurality of user clock cycles, wherein a user clock cycle is defined as one period of the user's fastest clock.

In a second mode, the logic engine mode, the PLD sequences through multiple configurations for each user cycle. In this mode, the period of time during which a configuration is active is called a micro cycle. In the logic engine mode, the sequencing is triggered by a signal external to the PLD or by a signal from at least one configurable element in the PLD. In one embodiment, the sequencing of the plurality of configurations is repeated in a next user clock cycle.
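The ordering of micro cycles in the logic engine mode can be pictured with a short Python sketch. It assumes configurations are activated in a fixed order 0, 1, ..., N-1 within each user clock cycle and that the whole sequence repeats in the next user cycle, as in the embodiment just described. The generator name `logic_engine_schedule` is hypothetical, and the trigger source (external or internal signal) is not modeled.

```python
from typing import Iterator, Tuple


def logic_engine_schedule(num_configs: int, user_cycles: int) -> Iterator[Tuple[int, int]]:
    """Yield (user_cycle, active_configuration), one pair per micro cycle."""
    for user_cycle in range(user_cycles):
        # Each user clock cycle steps through every configuration in order;
        # the same sequence then repeats in the next user clock cycle.
        for config in range(num_configs):
            yield user_cycle, config


schedule = list(logic_engine_schedule(num_configs=3, user_cycles=2))
print(schedule)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

In the timeshare mode, by contrast, the active configuration would stay fixed across many user clock cycles instead of changing every micro cycle.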

In a third mode, the static mode, multiple configurations are programmed identically, so that the PLD performs the same function regardless of the configuration.

The PLD of the present invention is also operable in a combination mode, wherein part of the chip operates in one mode, for example, the static mode, and another part of the chip operates in the logic engine mode or the timeshare mode. In an alternative or co-existing embodiment, the PLD operates in one configuration mode during at least one user cycle and in another configuration mode during at least another user cycle.
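The combination mode can be illustrated by modeling two regions of one chip that see the same active configuration slice. In this hypothetical Python sketch, the static region has all of its slices programmed identically, so its function never changes, while the logic engine region performs a different function in each slice. The gate functions, region names, and slice count are illustrative only.

```python
from typing import Callable, Dict, List

Gate = Callable[[int, int], int]


def and_gate(a: int, b: int) -> int:
    return a & b


def xor_gate(a: int, b: int) -> int:
    return a ^ b


def or_gate(a: int, b: int) -> int:
    return a | b


N = 3  # number of configuration slices in this illustration

# Hypothetical regions of one chip; both are driven by the same active slice.
regions: Dict[str, List[Gate]] = {
    "static_region": [and_gate] * N,                        # identical in every slice
    "logic_engine_region": [and_gate, xor_gate, or_gate],   # differs per slice
}


def region_functions(active_slice: int) -> Dict[str, str]:
    """Name of the function each region performs in the given slice."""
    return {name: slices[active_slice].__name__ for name, slices in regions.items()}


for s in range(N):
    print(s, region_functions(s))
```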

In one embodiment of the present invention, the PLD further includes a latch to hide the delay of any precharging done on a plurality of bit lines for configuring the memory cells. In another embodiment, the configurable elements include both combinational logic elements and sequential logic elements.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a prior art FPGA configuration bit.

FIG. 2 shows a configuration bit-slice in accordance with the invention.

FIG. 3 illustrates a block diagram of a time-multiplexed CLB.

FIG. 3A shows the configuration select signals, the read select signals, and the write select signals of the present invention provided to a plurality of memory cell blocks, an output multiplexer, and a micro register, respectively.

FIG. 4 shows a more detailed block diagram of a portion of the time-multiplexed CLB illustrated in FIG. 3.

FIG. 5 illustrates a more detailed diagram of a portion of the CLB of FIG. 4.

FIG. 6 shows a truth table for the circuitry of FIG. 5.

FIG. 7 illustrates a two level memory hierarchy.

FIG. 7A shows an embodiment in which two local busses and two global busses carry true and complement versions of signals to a bit set.

FIG. 7B illustrates a register configuration for providing access to the memory cells on a CLB-by-CLB basis.

FIG. 8 shows a known four transistor memory cell.

FIG. 9 illustrates a four transistor cell memory configuration in a PLD.

FIG. 10A shows a CLB with a storage device having a fixed delay in accordance with one embodiment of the present invention.

FIG. 10B shows another CLB with a storage device having a fixed delay in accordance with one embodiment of the present invention.

FIG. 11 shows a block diagram of a shared memory.

FIG. 12 shows detail of the shared memory of FIG. 11.

FIG. 13 illustrates word READ timing for the shared memory.

FIG. 14 shows word WRITE timing for the shared memory.

FIG. 15 illustrates burst READ timing for the shared memory.

FIG. 16 shows burst WRITE timing for the shared memory.

FIG. 17 illustrates a plurality of configuration bits for the shared memory.

FIG. 18 shows a configuration access timing graph for the shared memory.

FIG. 19 illustrates a prior art self-timed circuit.

FIG. 20 illustrates timing for the circuit of FIG. 19.

FIG. 21 illustrates a timing circuit for generating multiple internal cycles for each external clock cycle.

FIG. 22A illustrates a single clock sequencer in accordance with one embodiment of the present invention.

FIG. 22B shows an illustrative timing sequence for three configurations.

FIG. 23 illustrates a split memory in accordance with the present invention.

FIG. 24 shows one embodiment of a layout for a CLB.

FIG. 25 illustrates the multi-function time share operating mode of a PLD.

FIG. 26 shows an implementation of the logic engine mode in a PLD.

FIG. 26A illustrates a compression method in accordance with one embodiment of the present invention in which pairs of the levels on the critical path are merged into a single level using the micro register bypass to fit two LUTs serially in the same micro cycle.

FIG. 26B shows two necessary scheduling relationships between a flip-flop and other elements in the device.

FIG. 27 shows a gated clock flip-flop.

FIG. 28 illustrates various library elements and their relationship to the micro cycle clock.

FIG. 29 shows a clock-enabled flip-flop.

FIG. 30 illustrates the rescheduled logic of FIG. 26.

FIGS. 31 and 32 show scheduling and placement look-up tables in two and three-dimensional space, respectively.

FIGS. 33 and 34 illustrate micro cycle sequencing in a time-multiplexed PLD.

FIG. 35 shows all CLBs having a different configuration for each memory cycle.

FIG. 36 shows some CLBs not having a configuration for certain micro cycles.

FIGS. 37A and 37B illustrate two variable depth time multiplexed CLBs.

FIG. 38 shows a CLB with different numbers of micro cycles for different inputs.

FIG. 39 illustrates a state machine which provides appropriate waveforms if the fastest clock is implemented as the user clock, and all other clocks are implemented with micro cycle register enable signals.

FIG. 39A shows a timing diagram of the slow clock signal, the enable signal, and the master clock signal of FIG. 39.

FIG. 40 shows a flow chart for optimizing scheduling in accordance with the present invention.

FIG. 41 shows an illustrative input/output block in accordance with the present invention.

FIG. 42 illustrates a circuit subject to micro cycle interrupt simulation.

FIG. 42A shows the partitioning of the user network of FIG. 42 into sub-networks.

FIGS. 43, 44, and 45 show further transformations of the circuit of FIG. 42.

FIGS. 46 and 47 illustrate pseudo-code translations of the circuits of FIGS. 42 and 45, respectively.

FIGS. 48 and 49 show scheduling constraints used in conjunction with the pseudo-code translations of FIGS. 46 and 47, respectively.

FIG. 50 illustrates one micro cycle allocation.

FIG. 51 shows a state diagram for FIG. 50.

FIG. 52 illustrates circuitry for determining an appropriate micro cycle.

FIGS. 53 and 54 show equivalent circuits with synchronized output signals.

FIG. 55 shows a time multiplexed PLD with expandable logic depth.

FIGS. 56A and 56B illustrate two CLBs having their own output micro register and multiplexers.

FIG. 57 shows two CLBs sharing multiplexers.

FIG. 58 illustrates two CLBs sharing multiplexers and having feedback paths.

FIG. 59 shows a portion of a PLD including interconnect.

FIG. 60 illustrates an inverter for use in the PLD of FIG. 59.

FIG. 61 shows an embodiment of the present invention in which an additional register limits access to the memory during a memory access cycle.

FIG. 62 illustrates an embodiment of the present invention in which the configuration data is read in two memory accesses.

FIG. 63A illustrates write select signals provided to the micro register and configuration select signals provided to the configuration memory which in turn controls one output multiplexer.

FIG. 63B shows another embodiment of an output multiplexer.

FIG. 63C illustrates yet another embodiment of an output multiplexer which reduces the number of latches in comparison to the output multiplexer of FIG. 63B.

FIG. 63D shows a table indicating the input signals for an output multiplexer of the present invention.

FIG. 63E illustrates a truth table for a circuit included in the output multiplexer shown in FIG. 63A.

FIG. 63F shows a timing diagram for the output multiplexer illustrated in FIG. 63B.

FIG. 63G illustrates a detailed implementation of the circuit identified in FIG. 63A.

FIGS. 64A and 64B show a timing diagram and circuit which exemplify a skew problem solved by the invention.

DETAILED DESCRIPTION OF THE DRAWINGS

The detailed description is divided into topical sections which are organized according to the following Table of Contents.

Table Of Contents Of Detailed Description

1.0 Terminology
2.0 Logic Array Architecture
  2.1 Micro Registers
    2.1a Micro Register Location
  2.2 Bus Hierarchy
3.0 Power Conservation
4.0 Shared Memory
5.0 Chip Layout
6.0 Reconfiguration
7.0 Single clock sequencer
  7.1 Configuration Sequencing
  7.2 Configuration Duration
  7.3 Micro cycle Generation for a Synchronous FPGA
8.0 Modes of Operation
  8.1 Time-Share Mode
  8.2 Logic Engine Mode
    8.2a Synchronous/Asynchronous Clocking
    8.2b Controller for Logic Engine Mode
    8.2c The Scheduler
    8.2d Scheduling Compression
    8.2e Simultaneous Scheduling and Placement
    8.2f Logic Engine Input and Output Signals
  8.3 Static Mode
  8.4 Mixed Mode
9.0 Miscellaneous
  9.1 Variable Depth CLBs
  9.2 Micro cycle Interrupt Simulation
  9.3 Micro Register Alternatives
  9.4 Alternatives for Deeper Logic
  9.5 Per-CLB Memory Access Config Bit
  9.6 Micro Register Selector Options
  9.7 Low Power Interconnect Circuitry
  9.8 Multiple Access for Configuration
  9.9 Pipelining Features Mode
  9.10 Incorporation of ROM Cells

1.0 Terminology

Three types of data (implying three types of memory or storage) are discussed herein: configuration data, user data, and state data. Configuration data determines the configuration of the logic blocks or interconnect when the data is provided to those logic blocks or interconnect. User data is data typically generated by the user logic and stored/retrieved in memory that could otherwise be used for configuration data storage. State data is data defining the logical values of nodes in user logic at any specific time. Typically, state data is stored if the values at the nodes are needed at a later time. The term "state" is used to refer to either all of the node values at a particular time, or a subset of those values.

2.0 Logic Array Architecture

One prior art FPGA, for example a device of the Xilinx XC4000™ family of FPGAs, which is commercially available from Xilinx, Inc., includes one configuration memory cell to control each programming point. As shown in FIG. 1, a conventional latch 101 (i.e. a four-transistor device) plus a select transistor 102 comprise a five-transistor (5T) memory cell 100, which forms the basic unit of control for all logic functions on the FPGA chip. U.S. Pat. No. 4,821,233, which issued on Apr. 11, 1989, and U.S. Pat. No. 4,750,155, which issued on Jun. 7, 1988, discuss the configuration of this 5T memory cell in detail and are incorporated by reference herein.

In accordance with the present invention and referring to FIG. 2, each memory cell 100 (FIG. 1) is replaced with a random access memory (RAM) bit set 200. Bit set 200 includes eight memory cells MC0-MC7. Each memory cell MC has a latch 201 and an associated select transistor 202. Memory cells MC0-MC7 are coupled to a common bit line 203, which provides signals to a clocked latch 204. In another embodiment, memory cells MC0-MC7 are conventional six-transistor (6T) memory cells, which are well known in the art and therefore are not described in detail herein. All configuration bits at the same location in different bit sets (for example, the third configuration bit, stored by latch 201₂ of memory cell MC2) are considered to be in a single "slice" of memory, corresponding to a single configuration of the array.
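The bit set and slice organization can be modeled in a few lines of Python. This is a behavioral sketch only: the class `BitSet` and the function `read_slice` are hypothetical names, bit line precharging and timing are ignored, and slice index 2 corresponds to MC2, the third configuration bit.

```python
from typing import List


class BitSet:
    """Hypothetical model of RAM bit set 200: eight memory cells sharing one
    bit line that feeds a clocked latch driving the programming point."""

    def __init__(self, cells: List[int]):
        if len(cells) != 8:
            raise ValueError("bit set 200 holds eight memory cells (MC0-MC7)")
        self.cells = list(cells)
        self.latch = 0  # value currently driving the programming point

    def clock_slice(self, slice_index: int) -> int:
        """Read the selected cell onto the bit line and capture it in the latch."""
        self.latch = self.cells[slice_index]
        return self.latch


def read_slice(bit_sets: List[BitSet], slice_index: int) -> List[int]:
    """A slice is the same cell position in every bit set; reading it
    updates every programming point at once, i.e. selects one configuration."""
    return [bs.clock_slice(slice_index) for bs in bit_sets]


points = [BitSet([0, 1, 0, 1, 0, 1, 0, 1]), BitSet([1, 1, 0, 0, 1, 1, 0, 0])]
print(read_slice(points, 2))  # [0, 0]
print(read_slice(points, 1))  # [1, 1]
```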

The additional configuration memory cells increase logic density by dynamic re-use of the FPGA circuitry. Specifically, CLBs and interconnect are configured to perform some defined task at one instant and are reconfigured to perform another task at another instant. Thus, by providing a bit set for each prior art FPGA programming point, an FPGA in accordance with the present invention "holds" eight times the amount of logic of the prior art FPGA. By reconfiguring the CLBs, the number of function generators in the CLB (typically conventional look-up tables, or "real LUTs") needed to implement a given number of LUTs in a user circuit ("virtual LUTs") is reduced by a factor equal to the number of configurations.

FIG. 3 illustrates a block diagram of one embodiment of a CLB 301 in accordance with the present invention. In this embodiment, CLB 301 includes 320 programming points, each point requiring one bit of configuration data, wherein each bit includes an 8-bit memory. For example, G logic function generator 302 is configured by 128 bits (16 bits × 8). The configuration bits which control logic function generators 302, 303, and 304, the plurality of multiplexers 305-321, and SR Control are shown as shadowed boxes which represent the eight-bit memory set "behind" each of the bits within the configuration word. For clarity, FIG. 3 does not show the switch box and the connection boxes and their associated configuration bits, wherein each programming point in these boxes also includes an 8-bit memory.
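The storage figures quoted for the FIG. 3 embodiment follow directly from multiplying by the bit set depth. A small Python check, assuming eight configurations and counting only the 320 CLB programming points (the switch box and connection boxes are excluded, as in FIG. 3); the function name `config_bits` is hypothetical.

```python
CONFIGS = 8                   # memory cells per bit set (one per configuration)
CLB_PROGRAMMING_POINTS = 320  # programming points in CLB 301 (FIG. 3 embodiment)
G_GENERATOR_POINTS = 16       # programming points of G function generator 302


def config_bits(programming_points: int, configs: int = CONFIGS) -> int:
    """Configuration memory cells needed when every point is backed by a bit set."""
    return programming_points * configs


print(config_bits(CLB_PROGRAMMING_POINTS))  # 2560
print(config_bits(G_GENERATOR_POINTS))      # 128
```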

During operation, all values in the same slice are read out simultaneously to update the configuration of the CLBs and interconnect on the chip, thereby causing the CLBs to perform different logical functions and the interconnect to make different connections.

2.1 Micro Registers

FIG. 3 shows micro registers 324 and 325 coupled to the output terminals of multiplexers 311 and 312. Each micro register, which stores intermediate logic states, includes eight micro register bits, wherein each micro register bit corresponds to one of the previously described eight memory slices (although in one embodiment, not all bits of all micro registers are present). Just prior to a change of configuration, the micro register bits corresponding to the current memory slice are clocked so as to capture the state of all CLBs (and in some embodiments IOBs). In accordance with the present invention, the contents of micro registers 324 and 325 can be used in any configuration. During each configuration, signals propagate through the FPGA in a conventional manner, with the addition of paths from the micro registers through the programmed interconnect to input terminals of look-up tables (LUTs) or CLBs.

In one embodiment, multiple selectors are provided for each micro register, so a single configuration can either access values produced by multiple other configurations of the CLB, or access current CLB values which bypass micro registers 324 and 325. For example, micro register 324 is coupled to a plurality of output selectors, i.e. multiplexers 313, 314, 315, and 316. In a similar manner, micro register 325 is coupled to multiplexers 317, 318, 319, and 320. Note that each of the above-mentioned multiplexers (selectors) receives signals from function generators 302, 303, or 304, or signals external to CLB 301 (i.e. signals H1 or DIN). The number of multiplexers limits the number of signals from the micro register that can be used at one time. For example, because there are four output multiplexers for each micro register (i.e. multiplexers 313-316 for micro register 324 and multiplexers 317-320 for micro register 325), a single configuration cannot access more than four signals stored in the same micro register in other configurations.
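The read limit on a micro register can be expressed as a simple check. In this hypothetical Python model, `MicroRegister` stores one bit per configuration slice and rejects a request for more values than there are output multiplexers (four per micro register in FIG. 3). Bypass paths, the flip-flop, and timing are not modeled.

```python
from typing import List, Optional


class MicroRegister:
    """Hypothetical model of micro register 324: one stored bit per
    configuration slice, read through a fixed number of output selectors."""

    def __init__(self, slices: int = 8, output_selectors: int = 4):
        self.bits: List[Optional[int]] = [None] * slices
        self.output_selectors = output_selectors

    def capture(self, slice_index: int, value: int) -> None:
        """Store the CLB result for the current slice just before reconfiguration."""
        self.bits[slice_index] = value

    def read(self, slice_indices: List[int]) -> List[Optional[int]]:
        """Read values produced in other configurations; each value uses one selector."""
        if len(slice_indices) > self.output_selectors:
            raise ValueError(
                f"only {self.output_selectors} output multiplexers per micro register"
            )
        return [self.bits[i] for i in slice_indices]


reg = MicroRegister()
for s, v in enumerate([1, 0, 1, 1, 0, 0, 1, 0]):
    reg.capture(s, v)
print(reg.read([0, 2, 3, 6]))  # [1, 1, 1, 1]
```

Asking for a fifth value in the same configuration, for example `reg.read([0, 1, 2, 3, 4])`, raises `ValueError`, mirroring the four-signal limit described above.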

Referring to FIG. 3A, register write select (RWS) signals determine which micro register bit, i.e. bits 0-7, to write. Read select signals control, for example, output multiplexer 313 which in turn determines which micro register bit to read. Configuration select (CRS) signals determine which read select signals to use from blocks 330, wherein each block includes 8 memory cells MC0-MC7. Note that the RWS signal is provided by a memory controller (explained in further detail in reference to FIGS. 11 and 12) for memory write operations only and by a sequencer (explained in further detail in reference to FIGS. 22A and 52) for other operations. In contrast, the CRS signal is provided by the sequencer for configuration read operations and by the memory controller for other operations.

In the simplest embodiment, the RWS signal is simply the CRS signal delayed by one μcycle (also referred to as "ucycle" and "micro cycle"). That is, the CRS signal specifies the computation at the beginning of the μcycle, and the RWS signal stores the result at the end of the μcycle.
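In this simplest embodiment, the register write select sequence is the configuration select sequence shifted by one micro cycle. A minimal Python sketch, assuming one CRS value per micro cycle sampled at the start of that micro cycle, so that RWS at micro cycle t stores the result computed under CRS at t-1; `None` marks the first micro cycle, in which no earlier result exists. The function name `rws_from_crs` is hypothetical.

```python
from typing import List, Optional


def rws_from_crs(crs_sequence: List[int]) -> List[Optional[int]]:
    """Register write select per micro cycle: the CRS value delayed by one micro cycle."""
    if not crs_sequence:
        return []
    return [None] + crs_sequence[:-1]


crs = [0, 1, 2, 0, 1, 2]   # three configurations sequenced over two user cycles
print(rws_from_crs(crs))   # [None, 0, 1, 2, 0, 1]
```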

Because the output signal, for example output signal YA, is latched into a pipeline latch 350 with a μCLK signal, there is no need to latch the read select signals or the CRS signals, thereby minimizing silicon area and allowing multiplexer 313 to operate in parallel with the configuration read process.

FIG. 4 shows a more detailed embodiment of a portion of CLB 301 (FIG. 3) which includes micro register 324, multiplexers 305, 321, and 313-316, and D flip-flop 322. Note that the structure shown in FIG. 4 is replicated twice in CLB 301 because there are two sets of micro registers (i.e. micro registers 324 and 325). In this embodiment, multiplexer 311 (FIG. 3) comprises three multiplexers 402, 403, and 404. Multiplexers 313, 314, 315, and 316 provide buffered output signals YA, YB, YQA, and YQB, respectively.

The functioning of the RECIRC path is controlled by a clock enable signal. Specifically, when a clock enable signal EC is a logic zero, the previous value of the current micro register bit may be obtained in the following manner. First, an output signal from micro register 324 is selected with a CRS signal and transferred via multiplexer 408 into a latch 415 with a μCLK signal. As explained previously in reference to FIG. 3A, the CRS signal is the address or location in the bit set of the currently active configuration. Second, the latched signal QOLD is fed back into the current micro register bit via multiplexer 402 (controlled by signal EC'), multiplexer 403 (controlled by signal SEL, which is provided by a configuration bit), and multiplexer 404 (controlled by signal SAVE, which is provided by the sequencer).
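Behaviorally, the RECIRC path makes a micro register bit act like a clock-enabled register. The Python sketch below is a simplified model under stated assumptions: multiplexers 403 (SEL) and 404 (SAVE) are assumed to pass the recirculation path, EC = 1 is assumed to mean "store the newly computed value", and the function name `next_micro_register_bit` is hypothetical.

```python
def next_micro_register_bit(ec: int, new_value: int, q_old: int) -> int:
    """Value written into the active micro register bit at the end of a micro cycle.

    Assumed behavior: with EC at logic 0, QOLD (read with CRS into latch 415)
    is recirculated; otherwise the newly computed value is stored.
    """
    return new_value if ec else q_old


micro_register = [0] * 8  # one bit per configuration slice
active_slice = 3          # CRS value of the configuration being modeled
history = []
# Each (EC, D) pair is the same configuration's micro cycle in successive user cycles.
for ec, d in [(1, 1), (0, 0), (0, 0), (1, 0)]:
    q_old = micro_register[active_slice]  # previous value, as held in latch 415
    micro_register[active_slice] = next_micro_register_bit(ec, d, q_old)
    history.append(micro_register[active_slice])
print(history)  # [1, 1, 1, 0]
```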

The input signals shown in FIG. 4 generally conform to those signals provided in the commercially available Xilinx XC4000 family of FPGAs. For example, signal K is the clock input signal; and signal IV is the initial value of flip-flop 322 upon power-up or reset and is a value provided by a bit set 200 (FIG. 2).

Signal X1 is the input signal to micro register 324 (signal X2, not shown, is the input signal to micro register 325). Note that bit set 200 (FIG. 2) controls various elements of FIG. 4. The output signal of multiplexer 408 is also provided to memory interface (MEM I/F) 405, which provides values to multiplexer 404 for preloading micro register 324, for power-up operations, or for debugging operations, for example. In one embodiment, micro register 324 is addressed such that each bit of the register resides in the same address space as the configuration which generated it, thereby dramatically reducing the complexity of accessing a state. Note that the signals (SR' and EC') provided to the set/reset (S/R) and enable clock (EC) terminals of D flip-flop 322 also control the operation of multiplexer 402 via lines 413 and 414.

Multiplexer 403 determines whether a signal from D flip-flop 322 or a signal from multiplexer 402 (in one configuration, the feedback signal RECIRC from micro register 324) is provided to multiplexer 404. Latch 407 captures the output signal of multiplexer 404 and transfers this value to micro register 324 upon the appropriate micro cycle clock signal μCLK. FIGS. 63A, 63B, and 63C illustrate various embodiments of multiplexers 313-320 (FIG. 3).

FIG. 63A illustrates one embodiment of an output multiplexer in accordance with the present invention, in this example multiplexer 313 (FIG. 4), which provides a CLB output signal OUT(bar) to the interconnect structure. Note that latch 407 and register 324 are shared by multiplexers 314-316 (see FIG. 4). Register 324, which receives register write signals RWS0-RWS7, provides signals uR0-uR7 to multiplexers 6301A-6301D. Address bit A0 determines which of the two input signals of each of these multiplexers is transferred to multiplexers 6301E and 6301F. Similarly, address bit A1 determines which of the two input signals of each of multiplexers 6301E and 6301F is transferred to multiplexer 6301G, and address bit A2 determines which input signal of multiplexer 6301G is inverted and transferred to multiplexer 6301H. Multiplexer 6301H also receives an inverted register bypass signal RBYP from latch 407 and, under control of select signal A3, provides an output signal to circuit 6302. That is, address bit A3 determines whether the output signal of multiplexer 6301G or register bypass signal RBYP is subsequently provided to multiplexer 6301I. Note that if signal RBYP is selected, then multiplexer 6301H provides the value written in the register in the previous micro cycle. Although the RBYP signal eliminates the latency of tree multiplexer 6301, it may create some ambiguity as to the value in the previous micro cycle in modes other than the logic engine mode.
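The address decoding of tree multiplexer 6301 can be sketched as a three-level binary tree followed by the bypass choice in multiplexer 6301H. This is a hypothetical behavioral model: the function name is invented, an address bit value of 0 is assumed to select the even (lower-numbered) input at each level, and A3 = 1 is assumed to select the tree output (A3 = 0 selecting RBYP), which keeps it consistent with the SBYP decoding in which A2 = A3 = 0. Both candidates of multiplexer 6301H arrive inverted, so the model returns the complement of the chosen signal.

```python
from __future__ import annotations


def tree_mux_6301(ur: list[int], a0: int, a1: int, a2: int, a3: int,
                  rbyp: int) -> int:
    """Hypothetical model of tree multiplexer 6301 feeding circuit 6302.

    Assumptions: address bit value 0 selects the even input at each tree
    level; A3 = 1 selects the tree output, A3 = 0 selects RBYP.
    """
    if len(ur) != 8:
        raise ValueError("expected eight micro register outputs uR0-uR7")
    # Multiplexers 6301A-6301D: A0 picks one signal from each adjacent pair.
    level1 = [ur[2 * i + a0] for i in range(4)]
    # Multiplexers 6301E and 6301F: A1 picks one signal from each pair.
    level2 = [level1[2 * i + a1] for i in range(2)]
    # Multiplexer 6301G: A2 picks the final bit, uR[4*A2 + 2*A1 + A0].
    tree_out = level2[a2]
    # Multiplexer 6301H: both candidates arrive inverted, so its output is
    # the complement of whichever signal A3 chooses.
    chosen = tree_out if a3 else rbyp
    return 1 - chosen
```

Under these assumptions, bits A2, A1, A0 form a binary index into uR0-uR7, so each added tree level doubles the number of micro register bits reachable at the cost of one more series multiplexer delay, which is the latency that the RBYP path avoids.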

The table illustrated in FIG. 63D indicates the input signals for each of multiplexers 313-320, wherein signal X1 is the output signal of latch 407 (i.e. the register bypass signal associated with micro register 324) and signal X2 is the register bypass signal associated with micro register 325. Input signals SBYP0 and SBYP1 refer to sequential bypass signals that are typically generated in the configuration logic blocks of the Xilinx XC4000 family of devices (i.e. signals F, H, DIN, or Q).

Note that signals SBYP0 and SBYP1 are selected by address bits A1-A3. Specifically, address bit A1 is stored in a latch 6303 that controls multiplexer 6301J (i.e. selects between input signals SBYP0 and SBYP1), whereas address bits A2 and A3 are provided to AND gate 6304, which has inverted input terminals. If both address bits A2 and A3 are low, a high signal is stored in latch 6305; otherwise, a low signal is stored in latch 6305. The output signal of latch 6305 controls whether multiplexer 6301I selects the output signal of multiplexer 6301H or that of multiplexer 6301J (as explained in detail below).
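The selection logic just described can be summarized in a few lines. This hypothetical sketch ignores latch timing (latches 6303 and 6305 are treated as already loaded), the function name is invented, and A1 = 1 selecting SBYP1 is an assumed polarity.

```python
def select_output_source(a1: int, a2: int, a3: int,
                         mux_6301h_out: int, sbyp0: int, sbyp1: int) -> int:
    """Hypothetical model of the source selection described for FIG. 63A.

    Latch timing is ignored; A1 = 1 selecting SBYP1 is an assumed polarity.
    """
    # AND gate 6304 with inverted inputs: high only when A2 and A3 are both low.
    latch_6305 = int(not a2 and not a3)
    # Multiplexer 6301J, steered by A1 (held in latch 6303).
    m6301j = sbyp1 if a1 else sbyp0
    # Multiplexer 6301I: SBYP path when latch 6305 is high, else register path.
    return m6301j if latch_6305 else mux_6301h_out
```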

FIG. 63B illustrates another embodiment in which latch 407 is connected to micro register 324, which in turn is connected to latches 6311₀-6311₇ as well as to latch 6312. Because all of these latches are clocked by micro clock signal uClk, circuit 6315 functions as a plurality of flip-flops, with signals RWS0-RWS7 serving as the enable signals of those flip-flops. Moreover, because micro clock signal uClk is distributed with low skew throughout the chip, signals RWS0-RWS7 can have considerable timing slop, as shown in FIG. 63F by the cross-hatched section that indicates a "don't care" period for signal RWS. Note that to eliminate race problems in circuit 6315, some non-overlap is provided between micro clock signal uClk and uClk(bar); otherwise, data may pass through the latches during the overlap period. Note also that in this embodiment, signals SBYP0 and SBYP1, if chosen, are transferred by multiplexer 313 irrespective of micro cycle clock uClk, whereas if a signal from micro register 324 is chosen, that signal is sampled on the edge of micro cycle clock uClk.
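The claim that circuit 6315 behaves as a set of enabled flip-flops, so that RWS0-RWS7 only matter near the uClk edge, can be illustrated with a behavioral abstraction. This is a hypothetical sketch: the latch pairs are modeled as rising-edge flip-flops sharing one data input from latch 407, and the class and method names are invented. RWS changes while uClk is stable have no effect, which mirrors the "don't care" window of FIG. 63F.

```python
from __future__ import annotations


class EnabledFlipFlops:
    """Hypothetical behavioral model of circuit 6315 (FIG. 63B).

    The latch pairs clocked by uClk are abstracted as rising-edge
    flip-flops; RWS0-RWS7 act as per-bit enables sampled only at the edge.
    """

    def __init__(self, n: int = 8) -> None:
        self.q = [0] * n
        self._prev_uclk = 0

    def step(self, uclk: int, rws: list[int], d: int) -> list[int]:
        if uclk and not self._prev_uclk:  # rising edge of uClk
            for i, enable in enumerate(rws):
                if enable:
                    self.q[i] = d
        self._prev_uclk = uclk
        return list(self.q)
```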

FIG. 63C illustrates yet another embodiment in which latch 407 is connected to micro register 324, which in turn is connected to multiplexer 313A. As shown, this embodiment provides a multiplexer 313A for the input signals that are latched and another multiplexer 313B for the input signals that are not latched. Thus, latches 6311 (FIG. 63B) have been "pushed" through multiplexer 313, thereby advantageously decreasing the number of latches to one, i.e. latch 6317, from nine latches, i.e. latches 6311₀-6311₇ in FIG. 63B. Multiplexer 313A is controlled by four blocks 330 (see FIG. 3A), whereas multiplexer 313B is controlled by blocks 330 via latch 6318. In this embodiment, latch 6317 is provided for the output signal of multiplexer 313A. Therefore, once a reconfiguration is complete, the embodiment of FIG. 63C need not wait for a value to ripple through multiplexer 313A.

FIG. 63E illustrates the truth table for circuit 6302 (FIG. 63A). For example, if either signal SBYP0 or signal SBYP1 is selected, then address bits A2 and A3 are zero. Thus, the output signal of gate 6304 (effectively a NOR gate because of its inverted input terminals) is high. When latch 6305 is clocked by uClk, it outputs a high signal, thereby forcing the output signal of OR gate 6306 high. That high signal effectively makes latch 6307 transparent, allowing either signal SBYP0 or SBYP1 to ripple directly to the CLB output line. In other words, circuit 6302 functions as a multiplexer. Note that the structures shown in FIGS. 63B and 63C perform the same function, but implement it in a different manner.

On the other hand, if the output signal of micro register 324 is desired, then the output signal of latch 6305 is low and the output signal of OR gate 6306 follows the micro clock. In this manner, latch 6307 performs the same function as latch 6317 (FIG. 63C). Thus, in this configuration, circuit 6302 functions as a multiplexer coupled to a latch.
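The two behaviors of circuit 6302 can be captured in a small behavioral model: latch 6307 is enabled by the OR of latch 6305 and uClk, so it is always transparent on the SBYP path and acts as a clocked latch on the register path. This is a hypothetical sketch; the class and method names are invented, and latch 6305 is assumed to load the gate 6304 output when `load_select` is called on uClk.

```python
class Circuit6302:
    """Hypothetical model of the latching behavior of circuit 6302 (FIG. 63E)."""

    def __init__(self) -> None:
        self.latch_6305 = 0
        self.latch_6307 = 0

    def load_select(self, a2: int, a3: int) -> None:
        # On uClk, latch 6305 captures gate 6304's output (high only if A2 = A3 = 0).
        self.latch_6305 = int(not a2 and not a3)

    def evaluate(self, uclk: int, data_in: int) -> int:
        # OR gate 6306 enables latch 6307.
        if self.latch_6305 or uclk:
            self.latch_6307 = data_in  # transparent: data ripples through
        return self.latch_6307  # otherwise the previous value is held
```

With A2 = A3 = 0 the output tracks its input regardless of uClk (multiplexer behavior); otherwise the output only follows the input while uClk is high and holds it while uClk is low (multiplexer coupled to a latch).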

FIG. 63G shows one detailed implementation for circuit 6302 which includes transistors 6330-6333 and inverters 6334-6337.

FIG. 5 illustrates multiplexers 402, 403, and 404 and MEM I/F 405, which in this embodiment are effectively consolidated into a single multiplexer circuit 500 that reduces delay by reducing the number of series pass transistors. Note that read signal RD, write signal WR, and memory select signal MSEL are provided by a memory controller (described in detail in reference to FIG. 11), whereas signal SAVE is provided by a sequencer (described in further detail in reference to FIGS. 22A and 52) and select signal SEL is provided by a configuration bit. FIG. 6 is a truth table 601 showing the signal at node 501 (FIG. 5) that results from each combination of input signals.

2.1a Micro Register Location

Micro registers 324 and 325 (FIG. 3) can be located in alternative places. In one embodiment (shown in FIG. 3), micro registers 324 and 325 are coupled to the input terminals of output multiplexers 313-320. In a second embodiment, the micro registers are coupled to the input terminals of logic function generators 302 and 303. If, for example, micro register 324 is coupled to the input terminals of logic function generator 302, then multiplexers 313-316 are simplified. Note, however, that a conflict arises if two signals are generated in the same configuration and those signals are needed on the same pin of the logic function generators in different configurations. Specifically, if the micro registers are coupled to the input terminals of the logic function generators, two signals provided to those micro registers cannot be provided in the same configuration.

In a third embodiment, the micro registers are located in the interconnect, wherein signals are routed to the micro registers when available and routed from the micro registers when needed. In one implementation, the micro registers are assigned independently of the logic function generators performing the calculation. In this manner, a placement program can automatically select only those micro registers having no conflict. This embodiment provides maximum flexibility as to data storage location.
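A placement program of the kind mentioned above could, for example, assign stored values to micro registers greedily. This is a hypothetical sketch, not the placement algorithm of the invention: it assumes, as in the addressing scheme where each micro register bit resides in the address space of the configuration that generated it, that two values generated in the same configuration conflict if placed in the same micro register. The function name and data format are invented.

```python
from __future__ import annotations


def assign_micro_registers(values: list[tuple[str, int]],
                           num_registers: int) -> dict[str, int]:
    """Greedy, hypothetical assignment of stored values to micro registers.

    values: (name, configuration that generates the value) pairs.
    Each micro register holds one bit per configuration, so two values
    generated in the same configuration conflict on the same register.
    """
    used = [set() for _ in range(num_registers)]  # configurations already occupied
    assignment = {}
    for name, config in values:
        for reg, occupied in enumerate(used):
            if config not in occupied:
                occupied.add(config)
                assignment[name] = reg
                break
        else:
            raise ValueError(f"no conflict-free micro register for {name!r}")
    return assignment
```

A greedy first-fit pass is the simplest choice; a real placement tool might also weigh routing distance when choosing among conflict-free registers.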

In a fourth embodiment, the micro registers are located in a storage location independent of the configuration. The address, or part of the address, may be derived from configuration bits or from the placement location. In this manner, only those values to be kept are stored, and only locations that have no conflict are selected.

2.2 Bus Hierarchy

As described above in the Description of the Related Art, each configuration operation in a prior art FPGA is controlled by a set of configuration memory bits. The busses used to load these configuration bits typically form a single level of hierarchy, with vertical address lines spanning the full height of the CLB array, and horizontal data lines (referred to as a global bus) spanning the full array width.

In accordance with the present invention, each of the prior art configuration memory bits is replaced by N bits. Those N bits, i.e. the bits stored in memory cells MC0-MC7, are connected via their local busses 203 through switches 700 to a global bus 701 as shown in FIG. 7. Local buses 203 may randomly or sequentially access memory cells MC0-MC7 to drive a memory function device 703 (i.e. a programmable point in a CLB or interconnect structure). In one embodiment, switch 700 is a transistor, whereas in other embodiments, switch 700 is a conventional buffered switch. In one embodiment, each memory cell MC is implemented using a 5-transistor memory cell 100 (FIG. 1). Other memory cell implementations are described below in detail.

Local busses 203 are more active because they carry bits for each configuration (to latch 204), whereas global bus 701 is active only when reconfiguring a plane (also referred to as a slice) or performing a user memory operation. For power and speed reasons, the capacitance of local busses 203 is minimized by compact layout and small transistor sizes. Busses 205 provide configuration select (CRS) signals to transistors 202, and address busses 702 provide address signals to switches 700.
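The division of labor between the two bus levels can be sketched as follows: the local bus is read every micro cycle at the CRS-selected cell, while the global bus writes one cell per bit set only when a plane is reconfigured. This is a hypothetical model; the class, method, and function names are invented, and each bit set in the list is assumed to sit on a different global bus line.

```python
from __future__ import annotations


class BitSet:
    """Hypothetical model of one bit set: N memory cells sharing local bus 203."""

    def __init__(self, n: int = 8) -> None:
        self.cells = [0] * n

    def local_read(self, crs: int) -> int:
        # Every micro cycle, the CRS-selected cell drives local bus 203 to latch 204.
        return self.cells[crs]

    def global_write(self, plane: int, value: int) -> None:
        # Only during reconfiguration: switch 700 connects the addressed
        # cell to global bus 701.
        self.cells[plane] = value


def reconfigure_plane(bit_sets: list[BitSet], plane: int, data: list[int]) -> None:
    # Assumes each bit set is on a different global bus line, so one cell
    # per line takes part in the transfer.
    for bit_set, bit in zip(bit_sets, data):
        bit_set.global_write(plane, bit)
```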

In one embodiment, local bus 203 and global bus 701 carry true and complement versions of signals if desired. For example, if a memory cell MC is implemented with a conventional six-transistor (6T) memory cell (which is well known in the art and therefore not explained in detail herein), two local busses 203A and 203B, two switches 700A and 700B, and two global busses 701A and 701B are typically used as shown in FIG. 7A, thereby increasing transistor count for each bit set 200A.

In a local bus to global bus transfer, only one memory cell MC per global bus 701 takes part in the transfer (thus a column of MC cells for the CLB array). For example, with an illustrative CLB having four columns and eighty bit sets per column in accordance with the present invention, a 16×16 CLB array forms an array of 64 columns with 1280 bit sets per column.
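The array dimensions follow directly from the per-CLB figures: 16 CLB columns × 4 columns per CLB gives 64 columns, and 16 CLB rows × 80 bit sets per column gives 1280 bit sets per column. A minimal sketch of that arithmetic, with a hypothetical function name and the illustrative CLB values as defaults:

```python
from __future__ import annotations


def memory_array_geometry(clb_rows: int, clb_cols: int,
                          cols_per_clb: int = 4,
                          bit_sets_per_col: int = 80) -> tuple[int, int, int]:
    """Return (columns, bit sets per column, total bit sets) for a CLB array."""
    columns = clb_cols * cols_per_clb
    bit_sets_per_column = clb_rows * bit_sets_per_col
    return columns, bit_sets_per_column, columns * bit_sets_per_column


print(memory_array_geometry(16, 16))  # -> (64, 1280, 81920)
```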

A refinement of the two level hierarchy is shown in FIG. 17, wherein two local busses 1702A and 1702B are multiplexed onto a single global bus 1701. The advantage of this refinement is a reduction of global bus lines. Note that in other embodiments (not shown), more than two local busses are multiplexed onto a single global bus.

3.0 Power Conservation

Because a large number of bit sets 200 (on the order of 160,000) are provided on one chip, dynamic power consumption is significant. Note that the bit line capacitances, voltage swings, and clock cycle times of the 4T, 5T, and 6T memory cells differ, as does the frequency of the voltage swings on their respective bus lines. Specifically, referring to FIGS. 8 and 9, 4T cell 801 cannot drive the signals on local buses LB and LBB high because resistors 802 have too high a resistance. Thus, local buses LB and LBB must be precharged (via a low precharge signal PCHB provided to the gates of transistors 902A and 902B) each time a configuration is read. The signal on local bus LBB is the inverse of the signal on local bus LB, so on every cycle either local bus LB or local bus LBB is discharged by one of memory cells 801. Therefore, there is one high and one low transition per cycle; sense amplifier 901 detects the transition and in turn drives memory function device 703.
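The difference in bus activity between a precharged 4T bus pair and busses that are driven both high and low can be estimated by counting transitions over a stream of read data. This is a hypothetical sketch assuming random, uncorrelated configuration data; the function name is invented, and the 6T count assumes that the true and complement busses both switch whenever the data changes.

```python
from __future__ import annotations

import random


def transitions_per_cycle(data: list[int]) -> dict[str, float]:
    """Average local-bus transitions per read cycle for each cell type."""
    cycles = len(data)
    # 4T: LB and LBB are precharged high every cycle and the cell discharges
    # one of them, giving one low and one high transition per cycle.
    four_t = 2 * cycles
    # 5T: a single bus driven high or low switches only when the data changes.
    changes = sum(prev != cur for prev, cur in zip(data, data[1:]))
    five_t = changes
    # 6T (assumed): true and complement busses both switch on a data change.
    six_t = 2 * changes
    return {"4T": four_t / cycles, "5T": five_t / cycles, "6T": six_t / cycles}


random.seed(1)
bits = [random.getrandbits(1) for _ in range(100_000)]
rates = transitions_per_cycle(bits)
```

With random data, the 5T rate comes out near 0.5 transitions per cycle (about one-fourth of the 4T rate of 2) and the 6T rate near 1 (about one-half).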

In contrast, referring back to FIG. 7, a 5T memory cell can drive local bus 203 both high and low, thereby eliminating the need for precharge. (Note that a 6T cell also need not be precharged.) Because consecutive accesses are as likely to read the same data as different data, the 5T bus transitions on average every other cycle. Because the 6T cell has two busses, its average number of bus transitions falls between that of the 4T and 5T cells. Therefore, the 5T memory cell has one-fourth the number of transitions of the 4T cell, whereas the 6T memory cell has one-half the number of transitions of the 4T cell. Because each bus transition consumes power, the 5T cell reduces power consumption by 75%, whereas