WikiPatents - Community Patent Review
Intermediate-grain reconfigurable processing device    
United States Patent 5956518
Link to this page: http://www.wikipatents.com/5956518.html
Inventor(s): DeHon; Andre (Cambridge, MA); Mirsky; Ethan (Cambridge, MA); Knight, Jr.; Thomas F. (Belmont, MA)
Abstract: A programmable integrated circuit utilizes a large number of intermediate-grain processing elements, which are multibit processing units arranged in a configurable mesh. The coarse-grain resources, such as memory and processing, are deployable in a way that takes advantage of the opportunities for optimization present in given problems. To accomplish this, the interconnect supports three different modes of operation: a static value mode, in which a value set by the configuration data is provided to a functional unit; a static source mode, in which another functional unit serves as the value source; and a dynamic source mode, in which the source is determined by the value from another functional unit.
Owner/Assignee: Massachusetts Institute of Technology (Cambridge, MA)
Publication Date: September 21, 1999
Application Number: 08/632,371
Filing Date: April 11, 1996
US Classification: 712/15; 713/100
Int'l Classification: G06F 015/80
Examiner: Coleman; Eric
Attorney/Law Firm: Hamilton, Brook, Smith & Reynolds, P.C.
USPTO Field of Search: 395/800.1; 395/800.15; 395/200.51; 395/280; 395/653
U.S. References

Patent No.   Inventor       US Class   Date
5684980      Casselman      703/23     Nov 1997
5457644      McCollum                  Oct 1995
5426378      Ong            326/39     Jun 1995
5336950      Popli          326/39     Aug 1994
5305462      Grondalski     712/15     Apr 1994
5301340      Cook           712/24     Apr 1994
5265207      Zak            712/15     Nov 1993
5241635      Papadopoulos   712/201    Aug 1993
5239654      Ing-Simmons    712/20     Aug 1993
5233539      Agrawal        716/16     Aug 1993
5020059      Gorin          714/3      May 1991
4873626      Gifford        710/120    Oct 1989
4870302      Freeman        326/41     Sep 1989
4858113      Saccardi       710/317    Aug 1989
4754412      Deering        708/521    Jun 1988
4748585      Chiarulli      712/15     May 1988
4597041      Guyer          712/248    Jun 1986
Claims


We claim:

1. A programmable integrated circuit comprising:

logic units which perform operations on data in response to instructions of a defined set of instructions;

memories which store and retrieve data in response to received addresses;

a configurable interconnect which provides signal transmission between the logic units and memories, the interconnect being configurable from configuration control data to define data paths, originating at logic units and/or memories, through the interconnect as address inputs to memories, data inputs to memories and logic units, and instruction inputs to logic units such that the interconnect is configurable to define an interdependent functionality of the memories and logic units; and

programmable configuration storage for storing the configuration control data.

2. An integrated circuit as claimed in claim 1 wherein at least one memory provides at least part of the instructions to a logic unit.

3. An integrated circuit as claimed in claim 1 wherein the logic units are configurable to receive data directly from associated memories.

4. An integrated circuit as claimed in claim 1 wherein the configuration storage stores multiple contexts of configuration control data for reconfiguration of the programmable interconnect.

5. An integrated circuit as claimed in claim 4 wherein a context selection signal that selects among the multiple contexts is routed by the configurable interconnect.

6. An integrated circuit as claimed in claim 5 wherein a global context selection signal that selects among the multiple contexts is globally broadcast to the programmable configuration storage of the device.

7. An integrated circuit as claimed in claim 1 wherein the interconnect is configurable to provide a static value to a logic unit or to provide a variable value from a static source.

8. An integrated circuit as claimed in claim 1 wherein the memories are deployable as data memory, register files, program counters, and instruction stores for other logic units.

9. An integrated circuit as claimed in claim 1 wherein the interconnect comprises network drivers which transmit received signals between the memories and logic units.

10. A programmable integrated circuit comprising:

logic units which perform operations on data in response to instructions of a defined set of instructions;

memories which store and retrieve data in response to received addresses;

a configurable interconnect which provides signal transmission between the logic units and memories, the interconnect being configurable from configuration control data to define data paths through the interconnect as address inputs to memories, data inputs to memories and logic units, and instruction inputs to logic units such that the interconnect is configurable to define an interdependent functionality of the memories and logic units; and

programmable configuration storage for storing the configuration control data;

wherein a global context selection signal that selects among multiple contexts is globally broadcast to the programmable configuration storage of the device.

11. A programmable integrated circuit comprising:

logic units which perform operations on data in response to instructions of a defined set of instructions and which are configurable to be chained together to form wider data paths than provided by a single logic unit;

memories which store and retrieve data in response to received addresses;

a configurable interconnect which provides signal transmission between the logic units and memories, the interconnect being configurable from configuration control data to define data paths through the interconnect as address inputs to memories, data inputs to memories and logic units, and instruction inputs to logic units such that the interconnect is configurable to define an interdependent functionality of the memories and logic units; and

programmable configuration storage for storing the configuration control data.

12. A programmable integrated circuit comprising:

logic units which perform operations on data in response to instructions of a defined set of instructions;

memories which store and retrieve data in response to received addresses;

a configurable interconnect which provides signal transmission between the logic units and memories, the interconnect being configurable from configuration control data to define data paths through the interconnect as address inputs to memories, data inputs to memories and logic units, and instruction inputs to logic units such that the interconnect is configurable to define an interdependent functionality of the memories and logic units, the interconnect being configurable to provide a value to a logic unit or memory from a source, which is determined by a value from another logic unit or memory; and

programmable configuration storage for storing the configuration control data.

13. A programmable integrated circuit comprising:

logic units which perform operations on data in response to instructions of a defined set of instructions;

memories which store and retrieve data in response to received addresses, the logic units being each grouped with memories to form repeating functional units;

a configurable interconnect which provides signal transmission between the logic units and memories, the interconnect being configurable from configuration control data to define data paths through the interconnect as address inputs to memories, data inputs to memories and logic units, and instruction inputs to logic units such that the interconnect is configurable to define an interdependent functionality of the memories and logic units; and

programmable configuration storage for storing the configuration control data.

14. An integrated circuit as claimed in claim 13 further comprising programmable logic arrays on data paths between functional units which perform bit level logic operations.

15. An integrated circuit as claimed in claim 13 further comprising reduction logic which performs logic operations on the output of the logic units and passes a result to other functional units as control information.

16. An integrated reconfigurable computing device, comprising:

an array of functional units comprising:

multibit arithmetic logic units which perform operations on data in response to instructions;

memories which store and retrieve data in response to received addresses;

function switches which determine the source of the instructions to the logic units; and

address/data switches which are configurable by the other functional units and determine the source of addresses to the memories and the source of data to the logic units and memories.

17. An integrated circuit as claimed in claim 16 wherein the logic units are configurable to both operate on data from the associated memories and operate on data received from outside the functional unit via the address/data switches.

18. An integrated circuit as claimed in claim 16 wherein the function switches also determine the configuration of the memories.

19. An integrated circuit as claimed in claim 16 wherein the address/data switches are configurable to provide static values, values from other functional units, and values from sources, which are determined by other functional units.

20. An integrated circuit as claimed in claim 16 wherein the outputs from the arithmetic logic units are distributed over a local network to near-neighbor functional units.

21. An integrated circuit as claimed in claim 16 wherein the functional units comprise network drivers which transmit received signals to other functional units.

22. An integrated circuit as claimed in claim 21 wherein sources of the received signals to the network drivers are programmable by other functional units.

23. A method for organizing signal transmission within an array of logic units which perform operations on data in response to instructions and memories which store and retrieve data in response to received addresses, the method comprising:

transmitting data read from the memories as instructions or data to the logic units or addresses or data to other memories; and

transmitting data generated by the logic units as instructions or data to other logic units or addresses or data to the memories.

24. A method as claimed in claim 23 further comprising transmitting data from logic units or memories as control to other logic units or memories.

25. A method as claimed in claim 23 further comprising reorganizing the paths of the data and instructions in response to control from the array.

26. A method as claimed in claim 23 further comprising transmitting static values, values from other memories or logic units, and values from sources determined by the memories or logic units.

27. A method as claimed in claim 23 further comprising performing bit level logic operations to control data paths between memories or logic units.

28. A method as claimed in claim 23 further comprising selecting among multiple contexts using a globally broadcast context selection signal to the programmable configuration storage of the device.

29. A method as claimed in claim 23 further comprising configuring the logic units to be chained together to form wider data paths than provided by a single logic unit.

30. A method as claimed in claim 23 further comprising providing a value to a logic unit or memory from a source, which is determined by a value from another logic unit or memory.

31. A method as claimed in claim 23 further comprising grouping each of the logic units with memories to form repeating functional units.
Description


BACKGROUND OF THE INVENTION

Continuing advances in semiconductor technology have greatly increased the amount of processing that can be performed by single-chip, general-purpose computing devices. The relatively slow increase in inter-chip communication bandwidth requires that modern high performance devices use as much of the potential on-chip processing power as possible. This results in large, dense integrated circuit devices and a large design space of processing architectures.

One way of viewing this design space is in terms of granularity. Designers have the option of building very large processing units, or many smaller ones, in the same space. Traditional architectures are either very coarse grain, such as microprocessors, or very fine grain, such as field programmable gate arrays (FPGAs). Both architectures have advantages and disadvantages.

Microprocessors incorporate very few large processing units that operate on wide data-words, and each unit is hardwired to perform defined instructions on these data-words. Usually each unit is optimized for a different set of instructions, such as integer and floating point, and the units are generally hardwired to operate in parallel. The hardwired nature of these units allows instructions to execute very rapidly. In fact, a great deal of area on modern microprocessor chips is dedicated to cache memories in order to support a very high rate of instruction issue. Thus, the devices efficiently handle very dynamic instruction streams.

Very fine grain devices, such as FPGAs, incorporate a large number of very small processing elements. These elements are arranged in a configurable interconnect network. The configuration data used to define the functionality of the processing units and network can be thought of as a very large, semantically powerful, instruction word. Nearly any operation can be described and mapped to hardware.

SUMMARY OF THE INVENTION

Unfortunately, because microprocessors are highly optimized for simple, wide-word, dynamic instructions, they are relatively inefficient when performing other kinds of operations. For example, many cycles are required to build up complex operations that are not part of the processor's pre-selected instruction set. Also, when performing short-word operations, much of the processing unit is not being used, and when the instructions being issued are very regular, the large instruction caches are unnecessary. Thus, very coarse-grain microprocessors are not equipped to take the maximum advantage of these cases.

The size of the "instruction word" creates a number of problems with fine-grain FPGA devices, however. Reloading new instructions takes a relatively long time, making dynamic instruction streams very difficult for these devices. Moreover, if the operation being performed is, in fact, a wide word operation, a great deal of this "instruction word" must be dedicated to re-describing the operation for each of the small processing elements. Thus, fine grain processing elements are not well equipped to take advantage of a large number of common computing operations.

The present invention utilizes a large number of intermediate-grain processing elements which are arranged in a configurable mesh. Thus, the regularity and rapid instruction issue features of coarse-grain units are exploited, but a reconfigurable or programmable interconnect allows these units to be connected in an application-specific manner. This means that coarse-grain resources, such as memory and processing, can be deployed in a way that takes advantage of the opportunities for optimization present in any given problem. In addition, configuration memories may be deployed to take advantage of application specific redundancy.

In general, according to one aspect, the invention features a programmable integrated circuit that comprises logic units that perform operations on data in response to instructions and memories that store and retrieve addressed data. A configurable or programmable interconnect provides a mode of signal transmission between the logic units and memories. Configuration control data defines data paths through the interconnect, which can be address inputs to memories, data inputs to memories and logic units, and instruction inputs to logic units. Thus, the interconnect is configurable to define an interdependent functionality of the functional units. A programmable configuration storage stores the configuration control data.

Thus the present invention may be configured to operate according to a number of traditionally distinct computing architectures. For example, a centrally located functional unit may be assigned the role of arithmetic logic unit (ALU) with memories of surrounding functional units being configured to act as instruction caches, register files, and program counters. Wider data paths are accommodated by tying near-neighbor ALUs to each other. Wider instructions are achieved by configuring instruction memories of separate functional units as if they were a single memory. For a different problem, the same integrated circuit may be reconfigured to emulate a single instruction multiple data (SIMD) architecture. The logic units of rows of functional units are tied together to create wider data paths, and the rows perform separate serial tasks.

In specific embodiments, functional units may provide at least part of the instructions to logic units of other functional units. Also, the configuration storage may hold multiple contexts of configuration control data for reconfiguration of the programmable interconnect.

In other embodiments, the interconnect may support three different modes of operation: a static value mode, in which a value set by the configuration data is provided to a functional unit; a static source mode, in which another functional unit serves as the value source; and a dynamic source mode, in which the source is determined by the value from another functional unit.
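The three interconnect modes can be pictured with a small software model. The following Python sketch is illustrative only: the `PortConfig` fields, the `resolve_port` helper, the 8-bit masking, and the modulo wrap of the selector are assumptions made for the example and are not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class SourceMode(Enum):
    STATIC_VALUE = "static_value"      # constant taken from the configuration data
    STATIC_SOURCE = "static_source"    # fixed source chosen by the configuration data
    DYNAMIC_SOURCE = "dynamic_source"  # source chosen at run time by another unit


@dataclass
class PortConfig:
    """Hypothetical configuration word for one input port."""
    mode: SourceMode
    value: int = 0     # STATIC_VALUE: the constant itself
    source: int = 0    # STATIC_SOURCE: index of the network line to read
    selector: int = 0  # DYNAMIC_SOURCE: index of the line whose value picks the source


def resolve_port(cfg: PortConfig, lines: Sequence[int]) -> int:
    """Return the 8-bit value delivered to a port for the current line values."""
    if cfg.mode is SourceMode.STATIC_VALUE:
        return cfg.value & 0xFF
    if cfg.mode is SourceMode.STATIC_SOURCE:
        return lines[cfg.source] & 0xFF
    # DYNAMIC_SOURCE: the value on the selector line chooses which line is read.
    # Wrapping with modulo is an assumption of this sketch.
    chosen = lines[cfg.selector] % len(lines)
    return lines[chosen] & 0xFF
```

In this model, a port configured with `PortConfig(SourceMode.DYNAMIC_SOURCE, selector=0)` reads whichever line the unit driving line 0 currently names, so the data path can change without rewriting the configuration.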

In still other embodiments, each logic unit can also have programmable logic arrays on data paths between functional units which perform bit level logic operations. Additionally, reduction logic can be added that performs logic operations on the output of the logic units and passes a result to other functional units as control information. Network drivers are assigned to each unit to transmit received signals to other functional units. The sources of the signals received by the drivers may also be dynamic so that the sources are programmable by other functional units.

In general according to another aspect, the invention features an integrated reconfigurable computing device, which has functional units of multi-bit arithmetic logic units and memories. A configurable interconnect that connects the units includes function ports which determine the source of the instructions to the logic units. Network ports of the units are configurable by the functional units and determine the source of addresses to the memories and the source of data to the logic units and memories.

In general according to still another aspect, the invention can also be characterized in the context of a method for organizing signal transmission within an array of functional units. Data read from the memories of functional units may be transmitted as instructions to the logic units of other functional units. Also, data read from logic units may be transmitted as addresses for the memories of other functional units. Finally, the data read from functional units can also be used as data inputs for the logic units of other functional units.

In specific embodiments, the paths of the data and instructions are dynamic in response to control from the functional units. More specifically, static values, values from other functional units, and values from sources determined by other functional units may be transmitted between functional units.

The above and other features of the invention including various novel details of construction and combinations of parts, and other advantages, will now be more particularly described with reference to the accompanying drawings and pointed out in the claims. It will be understood that the particular method and device embodying the invention are shown by way of illustration and not as a limitation of the invention. The principles and features of this invention may be employed in various and numerous embodiments without departing from the scope of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

In the accompanying drawings, reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale; emphasis has instead been placed upon illustrating the principles of the invention. Of the drawings:

FIG. 1 shows a programmable integrated processing device of the present invention, which has been configured as an 8-bit microprocessor;

FIG. 2 shows a SIMD processor configuration for the processing device according to the invention;

FIG. 3 shows a 32-bit processor configuration for the processing device according to the invention;

FIG. 4 shows a very long instruction word (VLIW) processor configuration for the processing device according to the invention;

FIG. 5 shows a multiple instruction multiple data (MIMD) processor configuration for the processing device according to the invention;

FIG. 6 is a block diagram showing the architecture of a basic functional unit (BFU) core of the present invention;

FIG. 7 is a block diagram showing the inter-BFU connectivity provided by the level-1 network connections;

FIG. 8 is a block diagram showing the BFU interconnection provided by the level-2 network connections;

FIG. 9 is a block diagram showing the network switch architecture for a BFU of the present invention;

FIG. 10 is a block diagram illustrating the function switch architecture of the present invention;

FIG. 11 is a block diagram showing the address/data and network switch architecture of the present invention;

FIG. 12 is a block diagram illustrating the floating port architecture of the present invention;

FIG. 13 is a block diagram showing the level-1 network drivers of the present invention;

FIG. 14 shows the level-2 drivers of the present invention;

FIG. 15 shows the level-3 drivers of the present invention;

FIG. 16 shows BFU input registers of the present invention;

FIG. 17 shows the reduction logic in the BFU control architecture of the present invention;

FIG. 18 is an example of multi-BFU reduction performed by the reduction logic of the present invention;

FIG. 19 is a block diagram illustrating the operation of the distributed programmable logic array (PLA) associated with each BFU according to the invention;

FIG. 20 is a block diagram showing the control logic for a single BFU;

FIG. 21 shows an alternative embodiment of the configuration memory supporting multiple contexts;

FIG. 22 is a block diagram of the configurable logic device of the present invention in the form of an integrated chip;

FIG. 23 is a block diagram showing the input/output port architecture for the chip of the present invention;

FIG. 24 is a block diagram showing the structure of an I/O register according to the invention;

FIG. 25 is a block diagram of a programmable logic array for customizing the chip's interface;

FIG. 26 is a block diagram showing the movement of data from the BFU core off-chip according to the invention;

FIG. 27 is a block diagram of a selector switch that chooses the core outputs to be driven on an output wire according to the invention;

FIG. 28 is a block diagram showing a tri-state buffer used in the selector switch of FIG. 20;

FIG. 29 is a block diagram illustrating how data enters the BFU core from off-chip;

FIG. 30 is a block diagram showing the selector switch that selects among incoming data bytes from I/O ports and PLAs according to the invention;

FIG. 31 is a block diagram of a C/R input architecture according to the invention;

FIG. 32 is a block diagram showing the construction of the controller switches of the level-3 network lines according to the invention;

FIG. 33 is a block diagram illustrating the dynamic control of the controller switches, which is shared between pairs of controllers at each column, according to the invention;

FIG. 34 shows the architecture of one of the dynamic control switches according to the invention;

FIG. 35 is a block diagram showing the connectivity of BFUs in a systolic-type configuration according to the invention;

FIG. 36 shows the configuration of the BFUs for a microcoded-type implementation for the convolution problem according to the invention;

FIG. 37 shows the organization of the BFUs for a VLIW, horizontal microcode-type implementation according to the invention; and

FIG. 38 shows the organization of the BFUs for a VLIW/MSIMD-type implementation according to the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows a multi-bit microprocessor configuration of a reconfigurable processing device, which has been constructed and programmed according to the principles of the present invention. A two-dimensional array of basic functional units (BFUs) 100 is located in a programmable interconnect 101. Five of the BFUs 100 and the portion of the reconfigurable interconnect connecting the BFUs have been configured to operate as a microprocessor 102.

Each of the BFUs 100 preferably has addressable memory resources and logic resources, such as an 8-bit arithmetic logic unit (ALU). One of the BFUs 100, denoted ALU, utilizes its logic resources to perform the logic operations of the microprocessor 102 and utilizes its memory resources as a data store and/or extended register file. Another BFU operates as a function store F that controls the successive logic operations performed by the logic resources of the ALU. Two additional BFUs, A and B, operate as further instruction stores that control the addressing of the memory resources of the ALU. A final BFU, PC, operates as a program counter for the various instruction BFUs F, A, B.
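A rough software analogy may help in following the division of roles among the five BFUs. The Python sketch below is an assumption-laden illustration, not the patented design: the opcode names, the list-based stores, the program counter that simply increments and wraps, and the write-back of each result to the A address are all choices made for the example.

```python
# Hypothetical opcode table standing in for the ALU's instruction set.
OPS = {
    "add": lambda a, b: (a + b) & 0xFF,
    "xor": lambda a, b: a ^ b,
    "pass_b": lambda a, b: b,
}


def run_microprocessor(f_store, a_store, b_store, data_memory, steps):
    """Model the PC, F, A, B and ALU roles of the five-BFU configuration.

    f_store holds operations (function store F); a_store and b_store hold
    the addresses applied to the ALU's memory (instruction stores A and B).
    """
    pc = 0  # program counter BFU: indexes all three instruction stores
    for _ in range(steps):
        op = OPS[f_store[pc]]
        a_adr, b_adr = a_store[pc], b_store[pc]
        # Assumption: the result is written back to the A address.
        data_memory[a_adr] = op(data_memory[a_adr], data_memory[b_adr])
        pc = (pc + 1) % len(f_store)  # assumed sequential, wrapping PC
    return data_memory
```

For example, `run_microprocessor(["add", "xor"], [0, 1], [1, 0], [5, 3, 0, 0], steps=2)` first adds locations 0 and 1, then XORs location 1 with the new location 0.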

As shown in FIG. 2, however, the same reconfigurable processing array may be reprogrammed to function as a SIMD system, and, as described below, this reconfiguration can occur on a cycle-by-cycle basis. The functions of the program counter PC and instruction stores A, B and F have been again assigned to different BFUs 100, but the ALU function has been replicated into 12 BFUs. Each of the ALUs is connected via the reconfigurable interconnect 101 to operate on globally broadcast instructions from the instruction stores A, B, F. The same operations are performed by each of these ALUs, or common instructions may be broadcast on a row-by-row basis.

FIG. 3 shows how wider data paths can be constructed in the programmable device. This 32-bit microprocessor configured device has the same instruction stores A, B, F and program counter as described in connection with FIG. 1. Four BFUs, however, have been assigned an ALU operation, and the ALUs are chained together to act as a single 32-bit wide microprocessor in which the interconnect 101 supports carry-in and carry-out operations between the ALUs.
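The chaining of four 8-bit ALUs through carry-out and carry-in can be sketched in a few lines of Python. This is a minimal model under stated assumptions: `add8` and `chained_add` are hypothetical names, and the loop stands in for the carry signals that the interconnect would route between neighboring ALUs.

```python
def add8(a, b, carry_in):
    """One 8-bit ALU slice: returns (sum byte, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def chained_add(a, b, slices=4):
    """Add two (8 * slices)-bit values by rippling the carry through the slices."""
    carry = 0
    result = 0
    for i in range(slices):
        a_byte = (a >> (8 * i)) & 0xFF
        b_byte = (b >> (8 * i)) & 0xFF
        sum_byte, carry = add8(a_byte, b_byte, carry)  # carry links adjacent slices
        result |= sum_byte << (8 * i)
    return result, carry
```

With the default of four slices, `chained_add(0xFFFFFFFF, 1)` returns `(0, 1)`: the sum wraps to zero and the final carry-out is set.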

FIG. 4 shows how the device can be configured to operate as a very long instruction word (VLIW) system. The various instruction stores A, B, F are defined to encompass multiple BFUs 100 to accommodate the desired instruction word width.

FIG. 5 shows the configuration of the present system to operate as a multiple instruction multiple data (MIMD) system. The 8-bit microprocessor configuration 102 of FIG. 1 is replicated into an adjacent set of BFUs to accommodate multiple, independent processing units within the same device. Of course, wider data paths could also be accommodated by chaining ALUs of each processor 102 to each other.

1. Basic Functional Unit Architecture

FIG. 6 shows the moderately coarse grain, preferably 8-bit, BFU core. Primarily, the BFU core has memory block 110, basic ALU core 120, and configuration memory 105.

The main memory block 110 is a 256-word × 8-bit memory, which is arranged to be used in either single or dual port mode. In dual port mode, the memory size is reduced to 128 words in order to be able to perform the two simultaneous read operations without increasing the read latency of the memory. The memory mode is controlled by control logic 114 accessed through a Memory/Mux function port 112, and the write enable can be controlled either through the memory/mux function port 112 or by the control logic 134 accessed through ALU function port 132. Control logic is hardwired and also controls the ALU functions.

In single port mode, the memory 110 uses the A_ADR port for an address and outputs the selected value to both A_PORT and B_PORT. In dual port mode, the A_ADR port selects a value for A_PORT only, and the B_ADR port selects a value for B_PORT.

In either mode the read operation takes place during the first half of the clock cycle, and the values are latched for the rest of the cycle. Write operations take place on the second half of the cycle via the DATA memory port. Writes are always done to the current A_ADR address.
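
The port behavior described above can be sketched as a small behavioral model. The class name `BfuMemory`, its method names, and the wrapping of out-of-range addresses are all assumptions made for illustration; the patent does not state how addresses above 127 are treated in dual port mode.

```python
class BfuMemory:
    """Hypothetical model of memory block 110 (256 x 8, or 128 x 8 dual port)."""

    def __init__(self, dual_port=False):
        self.dual_port = dual_port
        self.depth = 128 if dual_port else 256
        self.words = [0] * self.depth

    def read(self, a_adr, b_adr=0):
        # Assumption: addresses wrap to the available depth.
        a_val = self.words[a_adr % self.depth]
        if self.dual_port:
            b_val = self.words[b_adr % self.depth]   # B_ADR selects B_PORT
        else:
            b_val = a_val                            # A_ADR drives both ports
        return a_val, b_val

    def cycle(self, a_adr, b_adr=0, data=None):
        """One clock cycle: read in the first half, optional write in the second."""
        ports = self.read(a_adr, b_adr)              # values latched for the cycle
        if data is not None:
            # Writes always go to the current A_ADR address.
            self.words[a_adr % self.depth] = data & 0xFF
        return ports
```

Because the read happens before the write within `cycle`, a cycle that writes to an address returns the old contents of that address, matching the read-then-write ordering in the text.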

A feedback path 118 shown as a dashed line may be used. The BFU core performs "A op B → A" in one cycle. Two cycles are needed to perform "A op B → C" operations. In this case, the feedback is performed by the normal Level-1 network described in more detail later.

The configuration memory 105 stores configuration words that control the configuration of the interconnect. It also stores configuration information for a control architecture. Optionally, it can also be a multi-context memory that receives a globally broadcast 2-bit global context selecting signal. The memory is addressed via network port A 122 and receives data from port B 124. The write enable WE is issued by the control logic 114.

The ALU 120 is a basic 8-bit arithmetic logic processing unit. The following operations are supported:

Input Invert--Prior to performing any of the following operations, either or both of the ALU inputs, A_in and B_in, can be inverted.

Pass--Passes either A_in or B_in to Out. With the input inversion, this operation can be a NOT.

NAND--Performs bitwise operation: (A NAND B). With input inversions this can be an OR.

NOR--Performs bitwise operation: (A NOR B). With input inversions this can be an AND.

XOR--Performs bitwise operation: (A XOR B). With input inversions this can be an XNOR.

Shift--Shifts A or B either left or right one bit.

Add--Performs (A+B+C_in). C_in can be selected from 0, 1, or the C_out of an adjacent cell. Combined with the input inversion and a carry-in of 1, a subtract can be made: (A-B)=(A+(NOT B)+1).

Multiply--Performs (A*B). Can also perform (A*B+X) and (A*B+X+Y), where X and Y are special inputs. These operations are needed to create pipelined multiply structures. Multiply operations require two cycles to fully complete. The low byte is available on the first cycle and the high byte is available on the second.
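
The way input inversion extends the operation set listed above can be checked with a short sketch. The function names `alu` and `multiply`, the string operation codes, and the argument names are hypothetical; only the arithmetic follows the text.

```python
MASK = 0xFF  # 8-bit data path


def alu(op, a, b, invert_a=False, invert_b=False, c_in=0):
    """Hypothetical 8-bit ALU model with optional input inversion."""
    a &= MASK
    b &= MASK
    if invert_a:
        a = ~a & MASK
    if invert_b:
        b = ~b & MASK
    if op == "pass_a":
        return a
    if op == "nand":
        return ~(a & b) & MASK
    if op == "nor":
        return ~(a | b) & MASK
    if op == "xor":
        return a ^ b
    if op == "add":
        return (a + b + c_in) & MASK
    raise ValueError(f"unknown op: {op}")


def multiply(a, b, x=0, y=0):
    """Return (low_byte, high_byte) of A*B+X+Y.

    The low byte corresponds to the first cycle, the high byte to the second.
    """
    product = (a & MASK) * (b & MASK) + (x & MASK) + (y & MASK)
    return product & MASK, product >> 8


if __name__ == "__main__":
    # NAND with both inputs inverted behaves as OR.
    print(hex(alu("nand", 0x0F, 0x30, invert_a=True, invert_b=True)))
    # Add with B inverted and a carry-in of 1 behaves as A - B.
    print(alu("add", 10, 3, invert_b=True, c_in=1))
```

Note that A*B+X+Y always fits in two bytes: the largest value, 255*255+255+255, is exactly 65535.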

The two network ports 122, 124 feed addresses to memory ports A_ADR and B_ADR. Data is fed to the memory 110 from network port B via the memory DATA port. A data multiplexor 126 selects either the feedback path 118 or the network port B output. The outputs of network ports A and B 122, 124 can feed directly to the ALU 120 by configuring ALU input multiplexors 128, 130. The memory function port 112 controls the operation of the data and ALU input multiplexors 126, 128, 130 via the control logic 114.

The BFU core is designed to be smoothly chained to other BFUs to form wider-word ALU structures by properly configuring the control logic 134 via the ALU function port Fa. In order to accomplish this, the user must specify the carry chain of each datapath element as it travels through multiple BFUs by setting the following bits in each of the BFUs:

LSB--Set to "1" marks the least-significant-byte position.

MSB--Set to "1" marks the most-significant-byte position.

Rightsource--Specifies the direction to the next least-significant-byte, which can also be set to receive a carry from another source.

Leftsource--Specifies the direction to the next most-significant-byte, which can also be set to receive a carry from another source.

The source selection can be one of the following:

North--North BFU.

East--East BFU.

South--South BFU.

West--West BFU.

Local--The local BFU's carry from the previous cycle.

Control Bit--The local Control Bit.

Zero--Constant Zero.

One--Constant one.

In addition, pipeline stages can be inserted into the carry chain by specifying CarryPipeline to be "1". This will register the incoming carry prior to its being used. This is important for addition operations, because the length of carry chain that can be traversed in one cycle is limited by the clock period and the speed of the adder.
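
As a rough illustration of the carry-chain settings above, the following sketch models the per-BFU configuration bits, the source selection, and the optional carry register. All class, field, and function names are hypothetical, and the one-cycle delay used for CarryPipeline is an assumed reading of "register the incoming carry".

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    NORTH = "north"
    EAST = "east"
    SOUTH = "south"
    WEST = "west"
    LOCAL = "local"              # local BFU's carry from the previous cycle
    CONTROL_BIT = "control_bit"  # the local Control Bit
    ZERO = "zero"
    ONE = "one"


@dataclass
class CarryConfig:
    lsb: bool                    # marks the least-significant-byte position
    msb: bool                    # marks the most-significant-byte position
    rightsource: Source          # toward the next least-significant byte
    leftsource: Source           # toward the next most-significant byte
    carry_pipeline: bool = False


def select_carry(source, neighbors, local_prev=0, control_bit=0):
    """Resolve a source selection to a carry bit.

    `neighbors` maps the four direction sources to the carry bits they offer.
    """
    if source is Source.ZERO:
        return 0
    if source is Source.ONE:
        return 1
    if source is Source.LOCAL:
        return local_prev
    if source is Source.CONTROL_BIT:
        return control_bit
    return neighbors[source]


class CarryStage:
    """Optional register on the incoming carry (CarryPipeline = 1)."""

    def __init__(self, pipelined):
        self.pipelined = pipelined
        self.reg = 0

    def step(self, incoming):
        if not self.pipelined:
            return incoming                   # carry used in the same cycle
        used, self.reg = self.reg, incoming   # carry used one cycle later
        return used


if __name__ == "__main__":
    cfg = CarryConfig(lsb=False, msb=False,
                      rightsource=Source.EAST, leftsource=Source.WEST,
                      carry_pipeline=True)
    neighbors = {Source.NORTH: 0, Source.EAST: 1, Source.SOUTH: 0, Source.WEST: 0}
    stage = CarryStage(cfg.carry_pipeline)
    print(stage.step(select_carry(cfg.rightsource, neighbors)))
```

Registering the carry splits a long chain into segments that each settle within one clock period, at the cost of one cycle of latency per inserted stage.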

Based on this local information, the actual Shift and Add operations of the ALU 120 have different effects. There are two main shift functions: Left and Right. Left shifts move the bits towards the MSB, and right shifts move the bits towards the LSB. Normally, the carry-in value is used to fill the newly-created opening, but if the cell is an LSB or MSB, the new bit is determined by additional information contained in the chosen shift instruction. For left shifts, the position treated differently is the LSB, while for right shifts it is the MSB. The options are:

Force Carry--This option will override the LSB/MSB setting and force the shift to use the carry-in from its designated source (Left/Rightsource). This is useful for shift-rotations.

Skip Bit--This option will keep the same LSBit/MS