Description
FIELD OF THE INVENTION
This invention relates to a system of programmable logic devices (PLDs) for
implementing a program which traditionally has been software implemented
on a general purpose computer but now can be implemented in hardware. This
invention also relates to a method of translating a source code program in
an algorithmic language into a hardware description suitable for running
on one or more programmable logic devices.
BACKGROUND OF THE INVENTION
The general purpose computer dates back to at least the 1940s, with the
ENIAC machine at the University of Pennsylvania. Numerous developments led to
semiconductor-based computers, then central-processing units (CPUs) on a
chip such as the early Intel 4040 or the more recent Intel 486, Motorola
68040, AMD 29000, and many other CPUs. A general purpose computer is
designed to implement instructions one at a time according to a program
loaded into the CPU or, more often, available in connected memory, usually
some form of random access memory (RAM).
A circuit specifically designed to process selected inputs and outputs can
be designed to be much faster than a general purpose computer when
processing the same inputs and outputs. Many products made today include
an application specific integrated circuit (ASIC) which is optimized for a
particular application. Such a circuit cannot be used for other
applications, however, and it requires considerable expense and effort to
design and build an ASIC.
To design a typical ASIC, an engineer begins with a specification which
includes what the circuit should do, what I/O is available and what
processing is required. An engineer must develop a design, program, flow
chart, or logic flow and then design a circuit to implement the
specification. This typically involves (1) analyzing the internal logic of
the design, (2) converting the logic to Boolean functions which can be
implemented in hardware logic blocks, (3) developing a schematic diagram
and net list to configure and connect the logic blocks, then (4)
implementing the circuit. There are a number of computerized tools
available to assist an engineer with this process, including simulation of
portions or all of a design, designing and checking schematics and
netlists, and laying out the final ASIC, typically a VLSI device. Finally,
a semiconductor device is created and the part can be tested. If the part
does not perform as expected or if the specification changes, some or all
of this process must be repeated and a new, revised ASIC must be designed
and created until an acceptable part can be made which meets or
approximates the specification. The entire design process is very time
consuming and requires the efforts of several engineers and assistants. It
is difficult to predict exactly what the final part will do once it is
finally manufactured and if the part does not perform as expected, a new
part must be designed and manufactured, requiring more time, resources and
money.
There are several alternatives to ASICs which may provide a solution when
balancing cost, number of units to be made, performance, and other
considerations. Field Programmable Gate Arrays (FPGAs) are high density
ASICs that provide a number of logic resources but are designed to be
configurable by a user. FPGAs can be configured in a short amount of time
and provide faster performance than a general purpose computer, although
generally not as fast as a fully customized circuit, and are available at
moderate cost. FPGAs can be manufactured in high volume, reducing cost,
since each user can select a unique configuration to run on the standard
FPGA. The configuration of a part can be changed repeatedly, allowing for
minor or even total revisions and specification changes. Other advantages
of a configurable, standard part are: faster time to implement a specification
and deliver a functional unit to market, lower inventory risks, easy
design changes, faster delivery, and availability of second sources. The
programmable nature of the FPGA allows a finished, commercial product to
be revised in the field to incorporate improvements or enhancements to the
specification or finished product.
A gate array allows higher gate densities than an FPGA plus custom circuit
design options but requires that the user design a custom interconnection
for the gate array and requires manufacturing a unique part and may
require one or more revisions if the specification was not right or if it
changes. The user must design or obtain masks for a small number of layers
which are fabricated on top of a standard gate array. The cost is less
than for fully custom ICs or standard cell devices.
One significant development in circuit design is a series of programmable
logic devices (PLDs) such as the Xilinx XC3000 Logic Cell Array Family.
Other manufacturers are beginning to make other programmable logic devices
which offer similar resources and functionality. A typical device includes
many configurable logic blocks (CLBs) each of which can be configured to
apply selected Boolean functions to the available inputs and outputs. One
type of CLB includes five logic inputs, a direct data-in line, clock
lines, reset, and two outputs. The device also includes input/output
blocks, each of which can be configured independently to be an input, an
output, or a bidirectional channel with three-state control. Typically,
most or even all pins on the device are connected to such an I/O block,
allowing considerable flexibility. Finally, the device is rich in
interconnect lines, allowing almost any two pins on the chip to be
connected. Any of these lines can also be connected elsewhere on the
device. Modern devices in the Xilinx XC 3000 series range from the XC 3020
with 2,000 gates to the XC 3090 with 9,000 gates. The XC 4000 series
includes the XC 4020 with 20,000 gates.
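By way of illustration only, one way to think about a CLB that applies a selected Boolean function to five logic inputs is as a 32-entry truth table indexed by the input pattern. The C sketch below models that idea in software. The lookup-table representation and the names clb_lut5, clb_configure, clb_eval and parity5 are assumptions made for this example and are not taken from any Xilinx documentation or from the present disclosure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of one combinational function in a CLB:
   a 32-entry truth table indexed by five logic inputs. */
typedef struct {
    uint32_t truth_table;  /* bit i holds the output for input pattern i */
} clb_lut5;

/* Evaluate the configured function for a 5-bit input pattern (0..31). */
int clb_eval(const clb_lut5 *clb, unsigned inputs)
{
    assert(clb != 0 && inputs < 32u);
    return (int)((clb->truth_table >> inputs) & 1u);
}

/* "Configure" the block by tabulating any Boolean function of five inputs. */
clb_lut5 clb_configure(int (*fn)(unsigned inputs))
{
    clb_lut5 clb = { 0u };
    for (unsigned i = 0; i < 32u; ++i) {
        if (fn(i)) {
            clb.truth_table |= (uint32_t)1u << i;
        }
    }
    return clb;
}

/* Example function to configure: odd parity of the five inputs. */
int parity5(unsigned inputs)
{
    int p = 0;
    for (unsigned b = 0; b < 5u; ++b) {
        p ^= (int)((inputs >> b) & 1u);
    }
    return p;
}
```

In this sketch, calling clb_configure(parity5) plays the role of loading a configuration, after which clb_eval returns the function output for any input pattern. Reconfiguring the block amounts to replacing the 32-bit table.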
To aid the designer, Xilinx can provide software to convert the output of a
circuit simulator or schematic editor into Xilinx netlist file (XNF)
commands which in turn can be loaded onto the FPGA to configure it. The
typical input for the design is a schematic editor, including standard CAE
software such as FutureNet, Schema, OrCAD, VIEWlogic, Mentor or Valid.
Xilinx provides programmable gate array libraries to permit design entry
using Boolean equations or standard TTL functions. Xilinx design
implementation software converts schematic netlists and Boolean equations
into efficient designs for programmable gate arrays. Xilinx also provides
verification tools to allow simulation, in-circuit design verification and
testing on an actual, operating part.
There are several hardware description languages which can be used to
design or configure PALs, PLAs or FPGAs. Two such languages are HDL and
ABEL. Cross-compilers are available to convert PALASM, HDL or ABEL code
into XNF or into code suitable for configuring other manufacturers'
devices.
An enormous quantity of software is available today to run on general
purpose computers. Essentially all of that software was originally created
in a high level language such as C, PASCAL, COBOL or FORTRAN. A compiler
can translate instructions in a high level language into machine code that
will run on a specified general purpose computer or class of computers. To
date, no one has developed a method of translating software-oriented
languages to run as a hardware configuration on an FPGA or in fact on any
other hardware-based device.
Other recent products have been introduced by Aptix, Mentor Graphics and
Quickturn. See Mohsen, U.S. Pat. No. 5,077,451 (assigned to Aptix
Corporation), Butts, et al., U.S. Pat. No. 5,036,473 (assigned to Mentor
Graphics Corporation), and Sample et al., U.S. Pat. No. 5,109,353 (assigned
to Quickturn Systems, Incorporated). These references provide background
for the present invention and related technologies.
Others have attempted to partition logical functions over multiple PLDs but
these efforts have not provided a true, full function implementation of
algorithmic source code. McDermith et al., U.S. Pat. No. 5,140,526
(assigned to Minc Incorporated), describe an automated system for
partitioning a set of Boolean logic equations onto PLDs by comparing what
resources are required to implement the logic equations with information
on what PLD devices are commercially available that have the capability to
implement the logic equations, then evaluating the cost of any optional
solutions. The disclosure focuses on part selection and does not disclose
how logic is actually to be partitioned across multiple devices.
A computer program typically includes data gathering, data comparison and
data output steps, often with many branch points. The principles of
programming are well known in the art. A programmer usually begins with a
high level perspective on what a program should do and how it should
execute the program. The programmer must consider what machine will run
the program and how to convert the desired program from an idea in the
programmer's head to a functional program running on the target machine.
Ultimately, a typical program on a general purpose computer is written in
or converted by a compiler to machine code.
A programmer will usually write in a high level language to facilitate
organizing and coding the program. Using a high level language like the C
language, a programmer can control almost any function of the computer.
This control is limited, however, to operations accessible by the
computer. In addition, the programmer must work within the constraints of
the physical system and generally cannot add to, remove or alter the
configuration of computer components, the resources available, how the
resources are connected, or other physical attributes of the computer.
In contrast, a special purpose computer can be designed to provide specific
results for a range of expected inputs. Examples include controllers for
household appliances, automobile systems control, and sophisticated
industrial applications. Many such special purpose computers are designed
into a wide range of commercial products, generally based on an ASIC.
Programming an ASIC begins with a high level description of the program,
but the program must be implemented by selecting a series of gates and
circuits to achieve the programmer's goals. This usually involves
converting the high level description into a logical description which can
be implemented in hardware. Many values are handled as specific signals
which typically originate in one circuit then are carried by a "wire" to
another circuit where the information will be used. A typical signal is
created to provide for a single logical event or combination which may
never or rarely occur in real life, but must be considered and provided
for. Each such signal must be designed into the ASIC as one or several
gates and connections. A complex program may require many such signals,
and can consume a large portion of valuable, available circuit area and
resources. A reconfigurable device could allocate resources for signals
only as needed or when there is a high probability that the signal will be
needed, dramatically reducing the resources that must be committed to a
device.
Programming a typical ASIC circuit is not easy but there are many tools
available to help a programmer design and implement a circuit. Most
programmers use silicon compilers, computer assisted engineering tools to
design schematics which will perform the desired functions. An ASIC must
be built to be tested, although many parts can be simulated with some
accuracy. Almost any ASIC design requires revisions, which means making
more parts, which is time consuming and expensive. A reconfigurable
equivalent part can be incorporated in a design, tested, and modified
with no or minimal modifications to physical hardware, essentially
eliminating manufacturing revision costs in designing special purpose
computers. Current configurable devices, however, are severely limited in
capacity and cannot be used for complex applications.
A part can be simulated in hardware using PLDs, described above in the
background section. These, however, can only be effectively programmed
using hardware description languages, which have many shortcomings. Until
now, there has been no way to convert a program of any significant
complexity from a high level software language like C to a direct hardware
implementation.
SUMMARY OF THE INVENTION
The present invention provides a video processing module designed for high
performance using economical components. A programmable logic device (PLD)
is configured to modify a data stream, in particular a video stream. The
PLD can be connected to a memory resource. In addition, the PLD can be
connected to a second PLD through an interruptable connection. The second
PLD can be optimized for bus interface communication and connected to an
external system, typically a host computer. The second PLD can take
commands from the host to prepare a processing configuration for the first
PLD and can connect when needed to download a configuration to the first
PLD through the interruptable connection. An array of these modules can be
connected in a systolic array to provide powerful, pipelined video
processing.
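For illustration only, the pipelined behavior of a chain of modules can be sketched in software as a set of stages, each holding one output register, with every stage consuming its predecessor's previous output on each clock tick. The C sketch below assumes three stages and three simple pixel operations. The stage count, the operations and the names pipeline, pipeline_tick, invert, halve and brighten are hypothetical and are not taken from the present disclosure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STAGES 3  /* illustrative number of modules in the chain */

typedef uint8_t (*stage_fn)(uint8_t pixel);

typedef struct {
    stage_fn fn[STAGES];   /* operation configured into each module */
    uint8_t  reg[STAGES];  /* output register of each module */
} pipeline;

/* Three hypothetical per-pixel operations. */
uint8_t invert(uint8_t p)   { return (uint8_t)(255 - p); }
uint8_t halve(uint8_t p)    { return (uint8_t)(p / 2); }
uint8_t brighten(uint8_t p) { return (uint8_t)(p > 245 ? 255 : p + 10); }

/* Advance one clock tick. Each stage processes the value its predecessor
   produced on the previous tick. Returns the pixel leaving the last stage. */
uint8_t pipeline_tick(pipeline *p, uint8_t in)
{
    assert(p != NULL);
    uint8_t out = p->reg[STAGES - 1];
    /* Update from the last stage backwards so that each stage still sees
       its predecessor's old register value. */
    for (size_t s = STAGES - 1; s > 0; --s) {
        p->reg[s] = p->fn[s](p->reg[s - 1]);
    }
    p->reg[0] = p->fn[0](in);
    return out;
}
```

In this model a pixel entering on one tick leaves the chain STAGES ticks later, while a new pixel can enter on every tick, which is the throughput benefit of pipelining.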
The present invention provides a configurable hardware system for
implementing an algorithmic language program, including a programmable
logic device (PLD), a hardware resource connectible to the PLD, a means
for configuring the PLD, and a programmable connection to the PLD. The
programmable connection is typically an I/O bus connectible to the PLD.
The PLD may include an AND/OR matrix device or a gate array, that is, a
programmable array logic (PAL) device and a generic array logic (GAL) device.
The hardware resource may be a DSP, a memory device, or a CPU. The
hardware system is designed to provide resources which can be configured
to implement some or all of an algorithmic language program. These
resources can be placed on a module, referred to herein as a distributed
processing unit (DPU).
One example of an algorithmic program is the classic "Hello, World!" C
program. This program could easily be modified to output that famous
message to an LED readout only when prompted by user input or perhaps to
repeat that message at selected times without input or prompting. Another
example of an algorithmic program is a digital filter which modifies an
input data stream such as a sound or video signal.
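As a minimal sketch of the second example, a digital filter can be written in C as a short function that modifies an input sample stream. The three-tap moving average below is chosen only for brevity. The filter length, the integer sample type and the name moving_average are assumptions made for this example and are not taken from the present disclosure.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 3  /* illustrative filter length, not from the source */

/* Apply a 3-tap moving-average filter to an integer sample stream.
   Samples before the start of the stream are treated as zero. */
void moving_average(const int *in, int *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        int sum = 0;
        for (size_t k = 0; k < TAPS; ++k) {
            if (i >= k) {
                sum += in[i - k];  /* current sample and up to two earlier ones */
            }
        }
        out[i] = sum / TAPS;
    }
}
```

A function of this kind uses only arithmetic operators, array accesses and loops, which is the sort of algorithmic source code the translation method is intended to accept.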
A larger system can be built to make an extensible processing unit (EPU)
from multiple DPUs plus support modules. A typical DPU includes a PLD, a
hardware resource connected to the PLD, a means for configuring the PLD,
and programmable connections to the PLD. The programmable connections are
typically an I/O bus. In addition, a typical EPU will include one or more
dedicated bus lines that serve as a configuration bus for carrying
configuration information.
One useful DPU is a VideoMod (Vmod) for processing video information. A
Vmod may be optimized for real time processing of an active video stream
or may be optimized for off-screen processing.
Each module in an EPU can be connected to other modules by one or more of
several buses. A neighbor bus (N-bus) connects a module to its nearest
neighbor, typically to the side or top or bottom in a two dimensional
wiring array. A module bus (M-bus) connects a group of modules, typically
two to eight modules, in a single bus. A host bus (H-bus) connects a
module to a host CPU, if present. A local bus (L-bus) connects components
within a single module.
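The relationship among the buses described above can be sketched, by way of example only, as a small C data structure and a rule for choosing a bus between two modules. The grid coordinates, the group identifier, the selection order and the names module_pos, is_nearest_neighbor and pick_bus are hypothetical. The H-bus is listed for completeness, but traffic to a host CPU is not modeled in this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Bus kinds named in the text, plus BUS_NONE for "no direct connection". */
typedef enum { BUS_NONE, BUS_L, BUS_N, BUS_M, BUS_H } bus_kind;

typedef struct {
    int row;    /* row in a two dimensional wiring array */
    int col;    /* column in that array */
    int group;  /* hypothetical M-bus group identifier */
} module_pos;

/* Nearest neighbors sit directly to the side, top or bottom. */
int is_nearest_neighbor(module_pos a, module_pos b)
{
    return abs(a.row - b.row) + abs(a.col - b.col) == 1;
}

/* Choose the most local bus that connects two module positions. */
bus_kind pick_bus(module_pos a, module_pos b)
{
    assert(a.group >= 0 && b.group >= 0);
    if (a.row == b.row && a.col == b.col) {
        return BUS_L;  /* same module: components talk over the L-bus */
    }
    if (is_nearest_neighbor(a, b)) {
        return BUS_N;  /* adjacent modules: N-bus */
    }
    if (a.group == b.group) {
        return BUS_M;  /* same group of modules: shared M-bus */
    }
    return BUS_NONE;   /* would need a bridge or the host in this sketch */
}
```

Preferring the L-bus, then the N-bus, then the M-bus is a design choice of this example that favors the shortest connection available. It is not a rule stated in the text.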
The invention also includes a method of translating source code in an
algorithmic language into a configuration file for implementation on a
processing device which supports execution in place. This is particularly
useful for use with the modules described above, including PLDs connected
to a hardware device such as a DSP, CPU or memory. The PLD can be
connected to a device capable of processing digital instructions. The
algorithmic language can be essentially any such language, but C is a
preferred algorithmic language for use with this invention.
The method includes four sequential phases of translation: a tokenizing
phase, a logical mapping phase, a logic optimization phase, and a device
specific mapping phase. One embodiment of the method includes translating
source code instructions selected from the group consisting of a C
operator such as a mathematical or logical operator, a C expression, a
thread control instruction, an I/O control instruction, and a hardware
implementation instruction. The translator includes a stream splitter
which selects source code which can be implemented on an available
processing device and source code which should be implemented on a host
computer connected to the processing unit. The hardware implementation
instructions can include pin assignments, handling configurable I/O buses,
communication protocols between devices, clock generation, and host/module
I/O.
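The role of the stream splitter can be sketched, purely as an illustration, as a pass that marks each fragment of source code for the processing device or for the host. The C sketch below uses a simple greedy rule: a fragment goes to the module if it can be implemented in hardware and still fits in the remaining gate budget, and otherwise stays on the host. The greedy policy, the gate estimates and the names fragment, target and split_stream are assumptions made for this example and do not describe the actual splitter.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TARGET_HOST, TARGET_MODULE } target;

typedef struct {
    const char *name;       /* label for the code fragment */
    int hw_capable;         /* nonzero if the fragment can be built in a PLD */
    unsigned gates_needed;  /* hypothetical resource estimate */
    target placed_on;       /* filled in by split_stream */
} fragment;

/* Greedy split: place each hardware-capable fragment on the module while
   the gate budget lasts; everything else stays on the host.
   Returns the number of fragments placed on the module. */
size_t split_stream(fragment *frags, size_t n, unsigned gates_available)
{
    size_t on_module = 0;
    assert(frags != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (frags[i].hw_capable && frags[i].gates_needed <= gates_available) {
            frags[i].placed_on = TARGET_MODULE;
            gates_available -= frags[i].gates_needed;
            ++on_module;
        } else {
            frags[i].placed_on = TARGET_HOST;
        }
    }
    return on_module;
}
```

A first-fit rule of this kind is easy to follow but does not guarantee the best use of the device. A practical splitter would presumably weigh communication cost between host and module as well.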
One object of the invention is to provide a high speed video processor.
Another object of the invention is to provide a systolic array of PLDs for
video processing.
Another object of this invention is to provide hardware resources to
implement an algorithmic software program in hardware.
Another object of this invention is to provide a system and method that can
implement in hardware an algorithmic software program for video
processing.
Another object of this invention is to provide a stream splitter to analyze
an algorithmic source program and implement as much of the program as
possible on the available hardware resources.
Yet another object of this invention is to provide hardware resources which
can be reconfigured in whole or in part in a relatively short time to
allow swapping of computer instructions. This allows a single set of
hardware resources to implement many different computer programs or a
large program on limited resources.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGS. 1A, 1B, 1C and 1D illustrate different views of one embodiment of a
module of this invention, in DIP package format.
FIGS. 2A, 2B, 2C and 2D illustrate different views of a second embodiment
of a module of this invention, in SIMM module format.
FIG. 3 illustrates a PLD connected to an N-bus, M-bus and L-bus.
FIG. 4 illustrates the logic symbol and main connections to a distributed
processing unit (DPU).
FIG. 5 illustrates a module with multiple PLDs, each connected to an
independent DRAM.
FIG. 6 illustrates a module with a PLD connected to a memory unit and
multiple DSP units.
FIG. 7 illustrates a different module including a PLD connected to multiple
DSP units.
FIG. 8 illustrates a bridge module.
FIG. 9 illustrates a repeater module.
FIG. 10 illustrates an extensible processing unit (EPU) and the
interconnections between distributed processing units.
FIG. 11 illustrates one pinout configuration of a DPU.
FIG. 12 illustrates a logic symbol for an EPU.
FIG. 13A illustrates a schematic view of one embodiment of an EPU assembled
on a PC board and connected to an ISA bus interface.
FIG. 13B illustrates that embodiment as laid out on a PC board.
FIG. 14 illustrates another embodiment of an EPU assembled on a PC board
and connected to an ISA bus interface.
FIGS. 15A, 15B, 15C, 15D, 15E, 15F, and 15G illustrate various views of an
embodiment of an EPU with two bridgemods, each connected to a common SCSI
interface. FIG. 15A provides a schematic representation of this
embodiment. FIG. 15B illustrates a top view of the same mods positioned
parallel to and connected to each other, on 0.3 inch (0.76 cm) centers.
FIGS. 15C and 15D illustrate a top and a bottom view, respectively, of an
EPU mod with multiple bus connectors. FIG. 15E shows that the PC board is
about 0.050" (1.27 mm) thick, the PLD is about 3 mm thick (maximum vertical
distance from PCB), the DSP is 2.5 mm, the DRAM is about 1.2 mm, the SSM
connector is 5.72 mm and the dimension between PC boards (closest edge to
closest edge) is about 0.250" (6.35 mm). FIG. 15F is another view showing
a perspective drawing of four stacked EPUs with included components. FIG.
15G is a side and top perspective view comparable to FIG. 15B. FIG. 15H
illustrates a connector. FIG. 15I illustrates possible routing of lines
between connectors on the top and bottom, respectively, of a PC board for
auto bus programming.
FIGS. 16A, 16C and 16E illustrate several different configurations of buses
and FIGS. 16B, 16D and 16F illustrate corresponding timing diagrams. FIGS.
16G-I illustrate several additional configurations of buses.
FIG. 17 illustrates the components and process of stream splitting.
FIG. 18 illustrates the location of many code elements after using the
stream splitter.
FIGS. 19A and 19B illustrate program flow of an algorithmic source code
program before (19A) and after (19B) applying the stream splitter.
FIG. 20 illustrates the program code resident on the host before and after
applying the stream splitter.
FIG. 21 illustrates major elements of the stream splitter libraries and
applications.
FIG. 22 illustrates the location and program/time flow for a program
running on several modules without stream splitting.
FIG. 23 illustrates the location and program/time flow for the program of
FIG. 22 split to run on three modules and the host.
FIGS. 24A and 24B illustrate emulation of the "C" programming language in
PLDs.
FIGS. 25A and 25B illustrate several representations of flow-through
operations as implemented in DPUs.
FIG. 26 illustrates several representations of state operations implemented
in DPUs.
FIGS. 27A, 27B and 27C illustrate implementation in a DPU of execution
domains.
FIGS. 28A, 28B and 28C illustrate implementation in a DPU of conditional
statements.
FIG. 29 illustrates implementation in a DPU of a conditional (while) loop
and a for loop.
FIG. 30 illustrates implementation in a DPU of a function call and function