WikiPatents - Community Patent Review
Computer-implemented method of optimizing a design in a time multiplexed programmable logic device    
United States Patent 5701441
Link to this page: http://www.wikipatents.com/5701441.html
Inventor(s): Trimberger; Stephen M. (San Jose, CA)
Abstract: A computer-implemented method of optimizing a time multiplexed programmable logic device includes identifying a micro cycle, identifying all look-up tables (LUTs) from a list of LUTs of the PLD that may be scheduled in the micro cycle, ordering the LUTs in priority order, selecting the M LUTs with the highest priority (wherein M is the number of real LUTs in the PLD), labeling the M LUTs with the current micro cycle number, removing the M LUTs from the list, identifying the next micro cycle, and if unlabelled LUTs exist, then repeating all steps, otherwise exiting the computer-implemented method. The step of ordering alternatively includes identifying the LUTs with the earliest latest-possible schedule, identifying the LUTs having input nets in which all LUTs are already scheduled, identifying the LUTs that include a pin on a net that has had at least one of its destination instances scheduled, identifying the LUTs that will complete a net that may be introduced in the micro cycle by the addition of a LUT earlier in a list of the LUTs, or identifying the LUTs that include a pin on a net that may be introduced in the micro cycle by the addition of a LUT earlier in a list of the LUTs. In another embodiment, the method includes the first three steps, then further includes determining whether the number of selected LUTs is equal to M, wherein M is the number of real LUTs in the programmable logic device. The steps of labeling and removing are repeated until the number of selected LUTs is equal to M. Then, the M LUTs are labelled with the current micro cycle number and removed from the list. The next micro cycle is then identified. If unlabelled LUTs exist, then all steps are repeated.
   














Inventor     Trimberger; Stephen M. (San Jose, CA)
Owner/Assignee     Xilinx, Inc. (San Jose, CA)
Publication Date     December 23, 1997
Application Number     08/516,910
Filing Date     August 18, 1995
US Classification     716/16 326/39 326/41
Int'l Classification     G06F 017/50
Examiner     Teska; Kevin J.
Assistant Examiner     Frejd; Russell W.
Attorney/Law Firm     Harms; Jeanette S.
USPTO Field of Search     395/500 395/325 395/375 395/290 364/483 324/312 326/40 326/41 326/39 326/93

References

U.S. References

Patent No.  Inventor    U.S. Class  Date
5629637     Trimberger  326/93      May 1997
5600263     Trimberger  326/39      Feb 1997
5583450     Trimberger  326/41      Dec 1996
5469577     Eng         710/110     Nov 1995
5432388     Crafts                  Jul 1995
5426738     Hsieh       326/38      Jun 1995
5426378     Ong         326/39      Jun 1995
5426744     Sawase      712/228     Jun 1995
5377331     Drerup      710/113     Dec 1994
5144242     Zeilenga    324/312     Sep 1992
5019996     Lee         702/60      May 1991
4821233     Hsieh       365/203     Apr 1989
4750155     Hsieh       365/203     Jun 1988
Claims
 


I claim:

1. A computer-implemented method of optimizing a design in a time multiplexed programmable logic device comprising:

(a) identifying a micro cycle;

(b) identifying all look-up tables (LUTs) from a list of LUTs of said design that may be scheduled in said micro cycle;

(c) ordering said LUTs in priority order;

(d) selecting the M LUTs with the highest priority, wherein M is the number of real LUTs in said programmable logic device;

(e) labeling said M LUTs with the current micro cycle number;

(f) removing said M LUTs from said list;

(g) identifying the next micro cycle; and

(h) if un-labelled LUTs exist, then repeating (a)-(h), otherwise exiting said computer-implemented method.

2. The computer-implemented method of claim 1 wherein (c) includes identifying the LUTs with the earliest latest-possible schedule.

3. The computer-implemented method of claim 1 wherein (c) includes identifying the LUTs having input nets in which all LUTs are already scheduled.

4. The computer-implemented method of claim 1 wherein (c) includes identifying the LUTs that include a pin on a net that has had at least one of its destination instances scheduled.

5. The computer-implemented method of claim 1 wherein (c) includes identifying the LUTs that will complete a net that may be introduced in said micro cycle by the addition of a LUT earlier in a list of said LUTs.

6. The computer-implemented method of claim 1 wherein (c) includes identifying the LUTs that include a pin on a net that may be introduced in said micro cycle by the addition of a LUT earlier in a list of said LUTs.

7. A computer-implemented method of optimizing a design in a time multiplexed programmable logic device comprising:

(a) identifying a micro cycle;

(b) identifying all look-up tables (LUTs) from a list of LUTs of said design that may be scheduled in said micro cycle;

(c) ordering said LUTs in priority order;

(d) selecting a LUT with the highest priority;

(e) identifying whether the number of selected LUTs is equal to M, wherein M is the number of real LUTs in said programmable logic device;

(f) repeating (d) and (e) until the number of selected LUTs is equal to M;

(g) labeling the M LUTs with the current micro cycle number;

(h) removing said M LUTs from said list;

(i) identifying the next micro cycle; and

(j) if un-labelled LUTs exist, then repeating (a)-(j), otherwise exiting said computer-implemented method.

8. The computer-implemented method of claim 1 further including optimizing said labeling by moving at least one additional LUT into at least one micro cycle.

9. The computer-implemented method of claim 7 further including optimizing said labeling by moving at least one additional LUT into at least one micro cycle.
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to a programmable logic device, and in particular to a field programmable gate array in which the configurable logic blocks and the programmable routing matrices are reconfigured dynamically.

2. Description of Related Art

Programmable logic devices such as field programmable gate arrays ("FPGAs") are a well known type of integrated circuit and are of wide applicability due to the flexibility provided by their reprogrammable nature. An FPGA typically includes an array of configurable logic blocks (CLBs) that are programmably interconnected to each other to provide logic functions desired by a user (a circuit designer). An FPGA typically includes a regular array of identical CLBs, wherein each CLB is individually programmed to perform any one of a number of different logic functions. The FPGA has a configurable routing structure for interconnecting the CLBs according to the desired user circuit design. The FPGA also includes a number of configuration memory cells which are coupled to the CLBs to specify the function to be performed by each CLB, as well as to the configurable routing structure to specify the coupling of the input and output lines of each CLB. The FPGA may also include data storage memory cells accessible by a user during operation of the FPGA. However, unless specified otherwise, the term memory cells refers to the configuration memory cells. The Xilinx, Inc. 1994 publication entitled "The Programmable Logic Data Book" describes several FPGA products and is herein incorporated by reference in its entirety.

One approach available in the prior art to increase the complexity and size of logic circuits has been coupling multiple FPGAs (i.e. multiple chips) by external connections. However, due to the limited number of input/output connections, i.e. pins, between the FPGAs, not all circuits can be implemented using this approach. Moreover, using more than one FPGA undesirably increases power consumption, cost, and space to implement the user circuit design.

Another known solution has been increasing the number of CLBs and interconnect structures in the FPGA. However, for any given semiconductor fabrication technology, there are limitations to the number of CLBs that can be fabricated on an integrated circuit chip of practical size. Thus, there continues to be a need to increase the number of logic gates or CLB densities for FPGAs.

Reconfiguring an FPGA to perform different logic functions at different times is known in the art. However, this reconfiguration requires the time-consuming step of reloading a configuration bit stream for each reconfiguration. Moreover, reconfiguration of a prior art FPGA generally requires suspending the implementation of the logic functions, saving the current state of the logic functions in a memory device external to the FPGA, reloading the entire array of configuration memory cells, and inputting the states of the logic functions which have been saved off chip along with any other needed inputs. Each of these steps requires a significant amount of time, thereby rendering reconfiguration impractical for implementing typical circuits.

SUMMARY OF THE INVENTION

In accordance with the present invention, a computer-implemented method of optimizing a time multiplexed programmable logic device includes identifying a micro cycle, identifying all look-up tables (LUTs) from a list of LUTs of the PLD that may be scheduled in the micro cycle, ordering the LUTs in priority order, selecting the M LUTs with the highest priority (wherein M is the number of real LUTs in the PLD), labeling the M LUTs with the current micro cycle number, removing the M LUTs from the list, identifying the next micro cycle, and if unlabelled LUTs exist, then repeating all steps, otherwise exiting the computer-implemented method. The step of ordering alternatively includes identifying the LUTs with the earliest latest-possible schedule, identifying the LUTs having input nets in which all LUTs are already scheduled, identifying the LUTs that include a pin on a net that has had at least one of its destination instances scheduled, identifying the LUTs that will complete a net that may be introduced in the micro cycle by the addition of a LUT earlier in a list of the LUTs, or identifying the LUTs that include a pin on a net that may be introduced in the micro cycle by the addition of a LUT earlier in a list of the LUTs.
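
The loop summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `schedule_luts`, the `deps` map (each LUT mapped to the source LUTs that must sit in an earlier micro cycle), and the convention that a lower priority value means higher priority are all hypothetical. The sketch also assumes that a LUT "may be scheduled" once all of its source LUTs are labelled, and it selects fewer than M LUTs when fewer are ready.

```python
from typing import Callable, Dict, Iterable, List, Set


def schedule_luts(
    luts: Iterable[str],
    deps: Dict[str, Set[str]],
    m: int,
    priority: Callable[[str], int],
) -> Dict[str, int]:
    """Label each LUT with a micro cycle number, at most m LUTs per micro cycle."""
    if m < 1:
        raise ValueError("m must be at least 1")
    unlabelled: List[str] = list(luts)
    label: Dict[str, int] = {}
    cycle = 0  # identify the first micro cycle
    while unlabelled:  # repeat while unlabelled LUTs exist
        # LUTs whose source LUTs were all labelled in an earlier micro cycle
        ready = [
            lut for lut in unlabelled
            if all(src in label for src in deps.get(lut, set()))
        ]
        if not ready:
            raise ValueError("no LUT can be scheduled (cyclic or unknown dependency)")
        # order by priority; lower value = higher priority (assumed convention)
        ready.sort(key=priority)
        chosen = ready[:m]  # select up to M LUTs
        for lut in chosen:
            label[lut] = cycle  # label with the current micro cycle number
        chosen_set = set(chosen)
        unlabelled = [lut for lut in unlabelled if lut not in chosen_set]  # remove
        cycle += 1  # identify the next micro cycle
    return label


if __name__ == "__main__":
    # Hypothetical four-LUT design: c needs a and b, d needs c; two real LUTs.
    example_deps = {"c": {"a", "b"}, "d": {"c"}}
    print(schedule_luts(["a", "b", "c", "d"], example_deps, 2, lambda lut: 0))
```

A priority function returning each LUT's latest-possible micro cycle would correspond to the "earliest latest-possible schedule" ordering mentioned above; the constant priority used in the example simply keeps list order.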

In another embodiment of the present invention, the method includes the first three steps, then further includes determining whether the number of selected LUTs is equal to M, wherein M is the number of real LUTs in the programmable logic device. The steps of labeling and removing are repeated until the number of selected LUTs is equal to M. Then, the M LUTs are labelled with the current micro cycle number and removed from the list. The next micro cycle is then identified. If unlabelled LUTs exist, then all steps are repeated.
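
Claim 7 spells this embodiment out as selecting one LUT at a time and counting until M are selected. One plausible reason for selecting singly, sketched below, is that a priority can then depend on what has already been picked for the current micro cycle. That reading, the function name `fill_micro_cycle`, the two-argument `priority` callback, and the early exit when fewer than M LUTs are ready are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Sequence, Set


def fill_micro_cycle(
    ready: Sequence[str],
    m: int,
    priority: Callable[[str, List[str]], int],
) -> List[str]:
    """Pick LUTs one at a time until m are selected or none remain."""
    selected: List[str] = []
    remaining: List[str] = list(ready)
    while len(selected) < m and remaining:
        # Re-rank on every pass; lower value = higher priority (assumed).
        best = min(remaining, key=lambda lut: priority(lut, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical net membership: a and c share net n1.
    nets: Dict[str, Set[str]] = {"a": {"n1"}, "b": {"n2"}, "c": {"n1"}}

    def shares_net(lut: str, chosen: List[str]) -> int:
        # Prefer a LUT that has a pin on a net already touched in this micro cycle.
        return 0 if any(nets[lut] & nets[other] for other in chosen) else 1

    print(fill_micro_cycle(["a", "b", "c"], 2, shares_net))
```

In the example, `c` is preferred over `b` for the second slot because it shares a net with the already selected `a`.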

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a prior art FPGA configuration bit.

FIG. 2 shows a configuration bit-slice in accordance with the invention.

FIG. 3 illustrates a block diagram of a time-multiplexed CLB.

FIG. 3A shows the configuration select signals, the read select signals, and the write select signals of the present invention provided to a plurality of memory cell blocks, an output multiplexer, and a micro register, respectively.

FIG. 4 shows a more detailed block diagram of a portion of the time-multiplexed CLB illustrated in FIG. 3.

FIG. 5 illustrates a more detailed diagram of a portion of the CLB of FIG. 4.

FIG. 6 shows a truth table for the circuitry of FIG. 5.

FIG. 7 illustrates a two level memory hierarchy.

FIG. 7A shows an embodiment in which two local busses and two global busses carry true and complement versions of signals to a bit set.

FIG. 7B illustrates a register configuration for providing access to the memory cells on a CLB-by-CLB basis.

FIG. 8 shows a known four transistor memory cell.

FIG. 9 illustrates a four transistor cell memory configuration in a PLD.

FIG. 10A shows a CLB with a storage device having a fixed delay in accordance with one embodiment of the present invention.

FIG. 10B shows another CLB with a storage device having a fixed delay in accordance with one embodiment of the present invention.

FIG. 11 shows a block diagram of a shared memory.

FIG. 12 shows detail of the shared memory of FIG. 11.

FIG. 13 illustrates word READ timing for the shared memory.

FIG. 14 shows word WRITE timing for the shared memory.

FIG. 15 illustrates burst READ timing for the shared memory.

FIG. 16 shows burst WRITE timing for the shared memory.

FIG. 17 illustrates a plurality of configuration bits for the shared memory.

FIG. 18 shows a configuration access timing graph for the shared memory.

FIG. 19 illustrates a prior art self-timed circuit.

FIG. 20 illustrates timing for the circuit of FIG. 19.

FIG. 21 illustrates a timing circuit for generating multiple internal cycles for each external clock cycle.

FIG. 22A illustrates a single clock sequencer in accordance with one embodiment of the present invention.

FIG. 22B shows an illustrative timing sequence for three configurations.

FIG. 23 illustrates a split memory in accordance with the present invention.

FIG. 24 shows one embodiment of a layout for a CLB.

FIG. 25 illustrates the multi-function time share operating mode of a PLD.

FIG. 26 shows an implementation of the logic engine mode in a PLD.

FIG. 26A illustrates a compression method in accordance with one embodiment of the present invention in which pairs of the levels on the critical path are merged into a single level using the micro register bypass to fit two LUTs serially in the same micro cycle.

FIG. 26B shows two necessary scheduling relationships between a flip-flop and other elements in the device.

FIG. 27 shows a gated clock flip-flop.

FIG. 28 illustrates various library elements and their relationship to the micro cycle clock.

FIG. 29 shows a clock-enabled flip-flop.

FIG. 30 illustrates the rescheduled logic of FIG. 26.

FIGS. 31 and 32 show scheduling and placement look-up tables in two and three-dimensional space, respectively.

FIGS. 33 and 34 illustrate micro cycle sequencing in a time-multiplexed PLD.

FIG. 35 shows all CLBs having a different configuration for each memory cycle.

FIG. 36 shows some CLBs not having a configuration for certain micro cycles.

FIGS. 37A and 37B illustrate two variable depth time multiplexed CLBs.

FIG. 38 shows a CLB with different numbers of micro cycles for different inputs.

FIG. 39 illustrates a state machine which provides appropriate waveforms if the fastest clock is implemented as the user clock, and all other clocks are implemented with micro cycle register enable signals.

FIG. 39A shows a timing diagram of the slow clock signal, the enable signal, and the master clock signal of FIG. 39.

FIG. 40 shows a flow chart for optimizing scheduling in accordance with the present invention.

FIG. 41 shows an illustrative input/output block in accordance with the present invention.

FIG. 42 illustrates a circuit subject to micro cycle interrupt simulation.

FIG. 42A shows the partitioning of the user network of FIG. 42 into sub-networks.

FIGS. 43, 44, and 45 show further transformations of the circuit of FIG. 42.

FIGS. 46 and 47 illustrate pseudo-code translations of the circuits of FIGS. 42 and 45, respectively.

FIGS. 48 and 49 show scheduling constraints used in conjunction with the pseudo-code translations of FIGS. 46 and 47, respectively.

FIG. 50 illustrates one micro cycle allocation.

FIG. 51 shows a state diagram for FIG. 50.

FIG. 52 illustrates circuitry for determining an appropriate micro cycle.

FIGS. 53 and 54 show equivalent circuits with synchronized output signals.

FIG. 55 shows a time multiplexed PLD with expandable logic depth.

FIGS. 56A and 56B illustrate two CLBs having their own output micro register and multiplexers.

FIG. 57 shows two CLBs sharing multiplexers.

FIG. 58 illustrates two CLBs sharing multiplexers and having feedback paths.

FIG. 59 shows a portion of a PLD including interconnect.

FIG. 60 illustrates an inverter for use in the PLD of FIG. 59.

FIG. 61 shows an embodiment of the present invention in which an additional register limits access to the memory during a memory access cycle.

FIG. 62 illustrates an embodiment of the present invention in which the configuration data is read in two memory accesses.

FIG. 63A illustrates write select signals provided to the micro register and configuration select signals provided to the configuration memory which in turn controls one output multiplexer.

FIG. 63B shows another embodiment of an output multiplexer.

FIG. 63C illustrates yet another embodiment of an output multiplexer which reduces the number of latches in comparison to the output multiplexer of FIG. 63B.

FIG. 63D shows a table indicating the input signals for an output multiplexer of the present invention.

FIG. 63E illustrates a truth table for a circuit included in the output multiplexer shown in FIG. 63A.

FIG. 63F shows a timing diagram for the output multiplexer illustrated in FIG. 63B.

FIG. 63G illustrates a detailed implementation of the circuit identified in FIG. 63A.

FIGS. 64A and 64B show a timing diagram and circuit which exemplify a skew problem solved by the invention.

DETAILED DESCRIPTION OF THE DRAWINGS

The detailed description is divided into topical sections which are organized according to the following Table of Contents.

Table Of Contents Of Detailed Description

1.0 Terminology
2.0 Logic Array Architecture
    2.1 Micro Registers
    2.1a Micro Register Location
    2.2 Bus Hierarchy
3.0 Power Conservation
4.0 Shared Memory
5.0 Chip Layout
6.0 Reconfiguration
7.0 Single clock sequencer
    7.1 Configuration Sequencing
    7.2 Configuration Duration
    7.3 Micro cycle Generation for a Synchronous FPGA
8.0 Modes of Operation
    8.1 Time-Share Mode
    8.2 Logic Engine Mode
    8.2a Synchronous/Asynchronous Clocking
    8.2b Controller for Logic Engine Mode
    8.2c The Scheduler
    8.2d Scheduling Compression
    8.2e Simultaneous Scheduling and Placement
    8.2f Logic Engine Input and Output Signals
    8.3 Static Mode
    8.4 Mixed Mode
9.0 Miscellaneous
    9.1 Variable Depth CLBs
    9.2 Micro cycle Interrupt Simulation
    9.3 Micro Register Alternatives
    9.4 Alternatives for Deeper Logic
    9.5 Per-CLB Memory Access Config Bit
    9.6 Micro Register Selector Options
    9.7 Low Power Interconnect Circuitry
    9.8 Multiple Access for Configuration
    9.9 Pipelining Features Mode
    9.10 Incorporation of ROM Cells

1.0 Terminology

Three types of data (implying three types of memory or storage) are discussed herein: configuration data, user data, and state data. Configuration data determines the configuration of the logic blocks or interconnect when the data is provided to those logic blocks or interconnect. User data is data typically generated by the user logic and stored/retrieved in memory that could otherwise be used for configuration data storage. State data is data defining the logical values of nodes in user logic at any specific time. Typically, state data is stored if the values at the nodes are needed at a later time. The term "state" is used to refer to either all of the node values at a particular time, or a subset of those values.

2.0 Logic Array Architecture

One prior art FPGA, for example one device of the Xilinx XC4000™ family of FPGAs which is commercially available from Xilinx, Inc., includes one configuration memory cell to control each programming point. As shown in FIG. 1, a conventional latch 101 (i.e. a four transistor device) plus a select transistor 102 comprise a five transistor (5T) memory cell 100 which forms the basic unit of control for all logic functions on the FPGA chip. U.S. Pat. No. 4,821,233, which issued on Apr. 11, 1989, and U.S. Pat. No. 4,750,155, which issued on Jun. 7, 1988, discuss the configuration of this 5T memory cell in detail and are incorporated by reference herein.

In accordance with the present invention and referring to FIG. 2, each memory cell 100 (FIG. 1) is replaced with a random access memory (RAM) bit set 200. Bit set 200 includes eight memory cells MC0-MC7. Each memory cell MC has a latch 201 and an associated select transistor 202. Memory cells MC0-MC7 are coupled to a common bit line 203 which provides signals to a clocked latch 204. In another embodiment, memory cells MC0-MC7 are conventional six transistor (6T) memory cells which are well known in the art and therefore are not described in detail herein. All configuration bits at the same location in different bit sets (for example, the third configuration bit, stored by latch 201_2 of memory cell MC2) are considered to be in a single "slice" of memory, corresponding to a single configuration of the array.

The additional configuration memory cells increase logic density by dynamic re-use of the FPGA circuitry. Specifically, CLBs and interconnect are configured to perform some defined task at one instant and are reconfigured to perform another task at another instant. Thus, by providing a bit set for each prior FPGA programming point, an FPGA in the present invention "holds" eight times the amount of logic of the prior art FPGA. By reconfiguring the CLBs, the number of function generators in the CLB, typically conventional look up tables ("real LUTs"), needed to implement a given number of LUTs in a user circuit ("virtual LUTs") is reduced by a factor of the number of configurations.

FIG. 3 illustrates a block diagram of one embodiment of a CLB 301 in accordance with the present invention. In this embodiment, CLB 301 includes 320 programming points, each point requiring one bit of configuration data, wherein each bit includes an 8-bit memory. For example, G logic function generator 302 is configured by 128 bits (16 bits × 8). The configuration bits which control logic function generators 302, 303, and 304, the plurality of multiplexers 305-321, and SR Control are shown as shadowed boxes which represent the eight bit memory set "behind" each of the bits within the configuration word. For clarity, FIG. 3 does not show the switch box and the connection boxes and their associated configuration bits, wherein each programming point in these boxes also includes an 8 bit memory.
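
The storage arithmetic in this embodiment can be checked with a few lines of Python. The constant and function names are hypothetical, and `real_luts_needed` is only the idealized bound implied by the factor-of-configurations statement; it ignores scheduling dependencies between LUTs.

```python
CONFIGS_PER_BIT_SET = 8            # memory cells MC0-MC7 in each bit set
PROGRAMMING_POINTS_PER_CLB = 320   # programming points in CLB 301 (excluding switch/connection boxes)
LUT_TRUTH_TABLE_BITS = 16          # one 16-bit function generator such as G


def config_bits(points: int, configs: int = CONFIGS_PER_BIT_SET) -> int:
    """Total configuration memory cells behind `points` programming points."""
    return points * configs


def real_luts_needed(virtual_luts: int, configs: int = CONFIGS_PER_BIT_SET) -> int:
    """Idealized count of real LUTs for a design with `virtual_luts` LUTs."""
    return (virtual_luts + configs - 1) // configs  # integer ceiling


if __name__ == "__main__":
    print(config_bits(LUT_TRUTH_TABLE_BITS))        # 128, matching 16 bits x 8
    print(config_bits(PROGRAMMING_POINTS_PER_CLB))  # 2560 cells for the CLB itself
    print(real_luts_needed(1000))                   # 125
```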

During operation, all values in the same slice are read out simultaneously to update the configuration of the CLBs and interconnect on the chip, thereby causing the CLBs to perform different logical functions and the interconnect to make different connections.
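
A minimal Python model of this slice read-out is given below, assuming each bit set is simply a list of eight bits and that the active configuration is an index 0-7. The class and function names are hypothetical and the model says nothing about circuit timing.

```python
from typing import List, Sequence


class BitSet:
    """Eight memory cells (MC0-MC7) behind one programming point."""

    def __init__(self, cells: Sequence[int]) -> None:
        if len(cells) != 8:
            raise ValueError("a bit set holds exactly eight memory cells")
        self.cells: List[int] = list(cells)

    def read(self, slice_index: int) -> int:
        """Return the bit this programming point uses in the given slice."""
        return self.cells[slice_index]


def read_slice(bit_sets: Sequence[BitSet], slice_index: int) -> List[int]:
    """Read the same cell position from every bit set: one full configuration."""
    return [bs.read(slice_index) for bs in bit_sets]


if __name__ == "__main__":
    points = [
        BitSet([1, 0, 0, 0, 0, 0, 0, 0]),
        BitSet([0, 1, 0, 0, 0, 0, 0, 0]),
        BitSet([1, 1, 0, 0, 0, 0, 0, 0]),
    ]
    print(read_slice(points, 0))  # configuration held in slice 0
    print(read_slice(points, 1))  # configuration held in slice 1
```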

2.1 Micro Registers

FIG. 3 shows micro registers 324 and 325 coupled to the output terminals of multiplexers 311 and 312. Each micro register, which stores intermediate logic states, includes eight micro register bits, wherein each micro register bit corresponds to one of the previously described eight memory slices (although in one embodiment, not all bits of all micro registers are present). Just prior to a change of configuration, the micro register bits corresponding to the current memory slice are clocked so as to capture the state of all CLBs (and in some embodiments IOBs). In accordance with the present invention, the contents of micro registers 324 and 325 can be used in any configuration. During each configuration, signals propagate through the FPGA in a conventional manner, with the addition of paths from the micro registers through the programmed interconnect to input terminals of look-up tables (LUTs) or CLBs.

In one embodiment, multiple selectors for each micro register are provided, so a single configuration can either access values produced by multiple other configurations of the CLB, or access current CLB values which bypass micro registers 324 and 325. For example, micro register 324 is coupled to a plurality of output selectors, i.e. multiplexers 313, 314, 315, and 316. In a similar manner, micro register 325 is coupled to multiplexers 317, 318, 319, and 320. Note that each of the above-mentioned multiplexers (selectors) receives signals from function generators 302, 303, 304, or signals external to CLB 301 (i.e. signals H1 or DIN). The number of multiplexers limits the number of signals from the micro register that can be used at one time. For example, because there are four output multiplexers for each micro register (i.e. multiplexers 313-316 for micro register 324 and multiplexers 317-320 for micro register 325), a single configuration cannot access more than four signals stored in the same micro register in other configurations.

Referring to FIG. 3A, register write select (RWS) signals determine which micro register bit, i.e. bits 0-7, to write. Read select signals control, for example, output multiplexer 313 which in turn determines which micro register bit to read. Configuration select (CRS) signals determine which read select signals to use from blocks 330, wherein each block includes 8 memory cells MC0-MC7. Note that the RWS signal is provided by a memory controller (explained in further detail in reference to FIGS. 11 and 12) for memory write operations only and by a sequencer (explained in further detail in reference to FIGS. 22A and 52) for other operations. In contrast, the CRS signal is provided by the sequencer for configuration read operations and by the memory controller for other operations.

In the simplest embodiment, the RWS signal is simply the CRS signal delayed by one μcycle (also referenced as "ucycle" and "micro cycle"). That is, the CRS signal specifies the computation at the beginning of the μcycle, and the RWS signal stores the result at the end of the μcycle.
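
The relationship between CRS and RWS in this simplest embodiment can be modelled behaviorally as below. The sketch assumes an eight-bit micro register, a caller-supplied `compute` function standing in for the logic of each configuration, and an all-zero initial register; these names and choices are hypothetical and the model abstracts away all latch timing.

```python
from typing import Callable, List, Sequence


def run_micro_cycles(
    crs_sequence: Sequence[int],
    compute: Callable[[int, List[int]], int],
) -> List[int]:
    """Step through micro cycles, storing each result in the bit named by RWS."""
    micro_register = [0] * 8  # one bit per memory slice (assumed initial state)
    for crs in crs_sequence:
        # Start of the micro cycle: CRS selects the configuration to evaluate.
        result = compute(crs, micro_register)
        # End of the micro cycle: RWS is CRS delayed by one micro cycle, so it
        # addresses the bit belonging to the configuration that just ran.
        rws = crs
        micro_register[rws] = result
    return micro_register


if __name__ == "__main__":
    # Hypothetical logic: slice 0 produces 1; each later slice copies the
    # value stored by the slice before it.
    def chain(crs: int, reg: List[int]) -> int:
        return 1 if crs == 0 else reg[crs - 1]

    print(run_micro_cycles([0, 1, 2], chain))
```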

Because the output signal, for example output signal YA, is latched into a pipeline latch 350 with a μCLK signal, there is no need to latch the read select signals or the CRS signals, thereby minimizing silicon area and allowing multiplexer 313 to operate in parallel with the configuration read process.

FIG. 4 shows a more detailed embodiment of a portion of CLB 301 (FIG. 3) which includes micro register 324, multiplexers 305, 321, and 313-316, and D flip-flop 322. Note that the structure shown in FIG. 4 is replicated twice in CLB 301 because there are two sets of micro registers (i.e. micro registers 324 and 325). In this embodiment, multiplexer 311 (FIG. 3) comprises three multiplexers 402, 403 and 404. Multiplexers 313, 314, 315, and 316 provide buffered output signals YA, YB, YQA, and YQB, respectively.

The functioning of the RECIRC path is controlled by a clock enable signal. Specifically, when a clock enable signal EC is a logic zero, the previous value of the current micro register bit may be obtained in the following manner. First, an output signal from micro register 324 is selected with a CRS signal and transferred via multiplexer 408 into a latch 415 with a μCLK signal. As explained previously in reference to FIG. 3A, the CRS signal is the address or location in the bit set of the currently-active configuration. Second, the latched signal QOLD is then fed back into the current micro register bit via multiplexer 402 (controlled by signal EC'), multiplexer 403 (controlled by signal SEL (provided by a configuration bit)), and multiplexer 404 (controlled by signal SAVE (provided by the sequencer)).
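
The three-multiplexer feedback chain can be sketched as a purely combinational Python model. The select polarities are assumptions, since the text does not state them: here EC = 0 recirculates QOLD, SEL = 0 passes multiplexer 402 rather than the flip-flop output, and SAVE = 1 passes multiplexer 403 rather than a value from the memory interface. Set/reset handling is omitted, and all function and parameter names are hypothetical.

```python
def mux(select: int, when_zero: int, when_one: int) -> int:
    """Two-input multiplexer."""
    return when_one if select else when_zero


def micro_register_input(
    ec: int, sel: int, save: int,
    x1: int, q_old: int, ff_q: int, mem_value: int,
) -> int:
    """Value presented to the current micro register bit (assumed polarities)."""
    m402 = mux(ec, q_old, x1)         # EC low: recirculate the old bit (RECIRC)
    m403 = mux(sel, m402, ff_q)       # SEL high: take D flip-flop 322 instead
    m404 = mux(save, mem_value, m403) # SAVE low: take the memory interface value
    return m404


if __name__ == "__main__":
    # Clock enable low: the previously stored value (0) is written back.
    print(micro_register_input(ec=0, sel=0, save=1, x1=1, q_old=0, ff_q=1, mem_value=1))
    # Clock enable high: the new value X1 (1) is stored.
    print(micro_register_input(ec=1, sel=0, save=1, x1=1, q_old=0, ff_q=0, mem_value=0))
```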

The input signals shown in FIG. 4 generally conform to those signals provided in the commercially available Xilinx XC4000 family of FPGAs. For example, signal K is the clock input signal; and signal IV is the initial value of flip-flop 322 upon power-up or reset and is a value provided by a bit set 200 (FIG. 2).

Signal X1 is the input signal to micro register 324 (wherein signal X2 (not shown) is the input signal to micro register 325). Note that bit set 200 (FIG. 2) controls various elements of FIG. 4. The output signal of multiplexer 408 is also provided to Memory Interface (MEM I/F) 405, which provides values to multiplexer 404 for preloading of micro register 324, for power up operations, or for debugging operations, for example. In one embodiment, micro register 324 is addressed such that each bit of the register resides in the same address space as the configuration which generated it, thereby dramatically reducing complexity of accessing a state. Note that the signals (SR' and EC') provided to the set/reset (S/R) and enable clock (EC) terminals of D flip-flop 322 also control the operation of multiplexer 402 via lines 413 and 414.

Multiplexer 403 determines whether a signal from D flip-flop 322 or a signal from multiplexer 402 (in one configuration, a feedback signal RECIRC from micro register 324) is provided to multiplexer 404. Latch 407 captures the output signal from multiplexer 404 and transfers this value to micro register 324 upon the appropriate micro cycle clock signal μCLK. FIGS. 63A, 63B, and 63C illustrate various embodiments for multiplexers 313-320 (FIG. 3).

FIG. 63A illustrates one embodiment of an output multiplexer in accordance with the present invention, in this example multiplexer 313 (FIG. 4), which provides a CLB output signal OUT(bar) to the interconnect structure. Note that latch 407 and register 324 are shared by multiplexers 314-316 (see FIG. 4). Register 324, receiving register write signals RWS0-RWS7, provides signals uR0-uR7 to multiplexers 6301A-6301D. Address bit A0 determines which of the two input signals to each of those multiplexers is transferred to multiplexers 6301E and 6301F. In a similar manner, address bit A1 determines which of the two input signals to each of those multiplexers is transferred to multiplexer 6301G. Address bit A2 determines which input signal is inverted and transferred to multiplexer 6301H. Multiplexer 6301H also receives an inverted register bypass signal RBYP from latch 407 and provides (determined by select signal A3) an output signal to circuit 6302. Address bit A3 determines whether the output signal from multiplexer 6301G or a register bypass signal RBYP is subsequently provided to multiplexer 6301I. Note that if signal RBYP is selected, then multiplexer 6301H has provided the value written in the register in the previous micro cycle. Although the RBYP signal eliminates the latency of tree multiplexer 6301, the signal may create some ambiguity as to the value in the previous micro cycle in other than the logic engine mode.
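The selection performed by tree multiplexer 6301 may be sketched behaviorally as follows. The sketch is an illustration under stated assumptions rather than the circuit of FIG. 63A: signal inversions are omitted, a 0 on an address bit is assumed to pick the lower-numbered input, and A3 = 1 is assumed to pick the tree output while A3 = 0 picks RBYP. The function name `tree_mux_6301` is hypothetical.

```python
def tree_mux_6301(ur, a0, a1, a2, a3, rbyp):
    """Behavioral sketch of the select tree feeding circuit 6302.

    ur      : list of eight bits uR0..uR7 from micro register 324
    a0..a3  : address bits, each 0 or 1
    rbyp    : register bypass bit from latch 407
    Assumptions: inversions omitted; 0 selects the lower-numbered input;
    A3 = 1 selects the tree output and A3 = 0 selects RBYP.
    """
    # 6301A-6301D: A0 picks one signal from each adjacent pair of register bits.
    level1 = [ur[2 * i + a0] for i in range(4)]
    # 6301E-6301F: A1 picks one signal from each pair of first-level outputs.
    level2 = [level1[2 * i + a1] for i in range(2)]
    # 6301G: A2 picks one of the two second-level outputs.
    level3 = level2[a2]
    # 6301H: A3 chooses between the tree output and the bypass signal.
    return level3 if a3 else rbyp
```

Under these assumptions the three tree levels together select register bit uR(4·A2 + 2·A1 + A0), so each level resolves one bit of a three-bit register address.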

The table illustrated in FIG. 63D indicates the input signals for each multiplexer 313-320, wherein signal X1 is the output signal of latch 407 (i.e. the register bypass signal associated with micro register 324), and signal X2 is the register bypass signal associated with micro register 325. Input signals SBYP0 and SBYP1 refer to sequential bypass signals that are typically generated in the configuration logic blocks of the Xilinx XC4000 family of devices (i.e. signals F, H, DIN, or Q).

Note that signals SBYP0 and SBYP1 are selected by address bits A1-A3. Specifically, address bit A1 is stored in a latch 6303 which controls multiplexer 6301J (i.e. selects between input signals SBYP0 and SBYP1), whereas address bits A2 and A3 are provided to AND gate 6304. If both address bits A2 and A3 are low, then a high signal is stored in latch 6305, otherwise a low signal is stored in latch 6305. The output signal of latch 6305 controls whether multiplexer 6301I selects the output signal of multiplexer 6301H or multiplexer 6301J (as explained in detail below).

FIG. 63B illustrates another embodiment in which latch 407 is connected to micro register 324 which in turn is connected to latches 6311₀-6311₇, as well as to latch 6312. Because all the above-referenced latches are clocked by micro clock signal uClk, circuit 6315 functions as a plurality of flip-flops with signals RWS0-RWS7 serving as the enable signals to those flip-flops. Moreover, because the micro clock signal uClk is distributed with low skew throughout the chip, signals RWS0-RWS7 can have considerable slop as shown in FIG. 63F by the cross-hatched section which indicates a "don't care" period for signal RWS. Note that to eliminate race problems in circuit 6315, some non-overlap is provided between micro clock signal uClk and uClk(bar) (otherwise, data may pass through the latches during the overlap period). Note that in this embodiment, signals SBYP0 and SBYP1, if chosen, are transferred by multiplexer 313 irrespective of micro cycle clock uClk, whereas if a signal from micro register 324 is chosen then such signal is sampled on the edge of the micro cycle clock uClk.

FIG. 63C illustrates yet another embodiment in which latch 407 is connected to micro register 324 which in turn is connected to multiplexer 313A. As shown, this embodiment provides a multiplexer 313A for the input signals that are latched and another multiplexer 313B for those input signals that are not latched. Thus, latches 6311 (FIG. 63B) have been "pushed" through multiplexer 313, thereby advantageously decreasing the number of latches to one, i.e. latch 6317, from nine latches, i.e. latches 6311₀-6311₇ in FIG. 63B. Multiplexer 313A is controlled by four blocks 330 (see FIG. 3A), whereas multiplexer 313B is controlled by blocks 330 via latch 6318. In this embodiment, a latch 6317 is provided for the output signals from multiplexer 313A. Therefore, once a reconfiguration is complete, the embodiment of FIG. 63C need not wait for a value to ripple through multiplexer 313A.

FIG. 63E illustrates the truth table for circuit 6302 (FIG. 63A). For example, if either signal SBYP0 or signal SBYP1 is selected, then address bits A2 and A3 are zero. Thus, the output signal of gate 6304 (effectively a NOR gate because of its inverted input terminals) is high. After a uClk signal is detected by latch 6305, it outputs a high signal, thereby forcing the output signal of OR gate 6306 high. That high signal effectively makes latch 6307 transparent, thereby allowing either signal SBYP0 or SBYP1 to ripple directly to the CLB output line. In other words, circuit 6302 functions as a multiplexer. Note that the structures shown in FIGS. 63B and 63C also perform the same function, but the function is implemented in a different manner.

On the other hand, if the output signal of micro register 324 is desired, then the output signal of latch 6305 is low and the output signal of OR gate 6306 is the same as the micro clock. In this manner, latch 6307 performs the same function as latch 6317 (FIG. 63C). Thus, in this configuration, circuit 6302 functions as a multiplexer coupled to a latch.
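The two modes of circuit 6302 described above may be sketched behaviorally as follows. This is an illustration only: the class and method names are hypothetical, latch 6305 is modeled as being loaded by an explicit call rather than by a uClk edge, and latch 6307 is assumed to be transparent while its enable is high, as the description of the bypass case suggests.

```python
class Circuit6302:
    """Behavioral sketch of circuit 6302: a multiplexer followed by latch 6307."""

    def __init__(self):
        self.bypass = 0  # latch 6305: 1 when an SBYP input is selected
        self.out = 0     # latch 6307: value driven onto the CLB output line

    def load_select(self, a2, a3):
        # Gate 6304 acts as a NOR: high only when A2 and A3 are both low.
        # Latch 6305 captures that value (clocking by uClk is abstracted away).
        self.bypass = int(a2 == 0 and a3 == 0)

    def evaluate(self, uclk, reg_path, sbyp):
        # Multiplexer 6301I: SBYP path when bypass is set, register path otherwise.
        data = sbyp if self.bypass else reg_path
        # OR gate 6306: latch 6307 is transparent while bypass or uClk is high.
        if self.bypass or uclk:
            self.out = data
        return self.out
```

In the bypass case the output follows the SBYP input regardless of the micro clock, so the model behaves as a plain multiplexer. In the register case the output follows the register path only while uClk is high and holds its value otherwise, which is the multiplexer-plus-latch behavior.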

FIG. 63G shows one detailed implementation for circuit 6302 which includes transistors 6330-6333 and inverters 6334-6337.

FIG. 5 illustrates multiplexers 402, 403, and 404 and MEM I/F 405 which, in this embodiment, are consolidated into what is effectively a single multiplexer circuit 500, thereby reducing delay by reducing the number of series pass-transistors. Note that the read signal RD, write signal WR, and memory select signal MSEL are provided by a memory controller (described in detail in reference to FIG. 11), whereas a SAVE signal is provided by a sequencer (described in further detail in reference to FIGS. 22A and 52) and a select signal SEL is provided by a configuration bit. FIG. 6 is a truth table 601 for the various input signals resulting in a particular signal at node 501 (FIG. 5).

2.1a Micro Register Location

Micro registers 324 and 325 (FIG. 3) are located in alternative places. In one embodiment (shown in FIG. 3), micro registers 324, 325 are coupled to the input terminals of output multiplexers 313-320. In a second embodiment, the micro registers are coupled to the input terminals of logic function generators 302 and 303. If, for example, micro register 324 is coupled to the input terminals of logic function generator 302, then multiplexers 313-316 are simplified. Note that if two signals are generated in the same configuration and those signals are needed on the same pin of logic function generators on differ