Description
FIELD OF THE INVENTION
This invention relates to distribution of clock signals in high density
integrated circuits, printed circuits, and the like, and more specifically
to a method for minimizing skew in clock signals distributed to different
regions in such circuits.
BACKGROUND OF THE INVENTION
The preferred embodiment of the present invention is the minimization of
clock skew in an integrated circuit. Although the present invention may
also find application in clock distribution on a printed circuit board,
the background of the invention will focus on the preferred embodiment.
Modern high density application specific integrated circuits ("ASICs") are
required to deliver high performance in a variety of critical
applications. Many such ASICs operate from a master clock signal that is
distributed to clocked loads, often via buffers, at different locations on
the ASIC chip. Frequently these clocked loads include flip-flops, latches,
binary cells, and the like, and the term "clocked load" as used herein may
refer to any such loads.
Usually the clock signals are distributed on metal lines or traces that are
formed on one or more layers that overlie the ASIC. Vias couple the clock
signals from the metal lines to the underlying clocked loads. A given
length of such metal line will have distributed resistance and
capacitance. Understandably, clock signals coupled to loads over long metal
lines will be delayed (or skewed) relative to clock signals provided over
short metal lines. This skewing can result from differences in accumulated
resistance and capacitance, both within a single metal layer, and between
metal layers. Even if two buffers receive the same clock signal over equal
lengths of interconnect path, skew can still occur if the buffers present
substantially different load impedances to their drivers. Thus, load
impedance equalization is also an important design consideration in
minimizing skew.
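The dependence of delay on trace length described above can be illustrated with a first-order Elmore estimate for a uniform RC line. This is a minimal sketch, not part of the original disclosure: the function name and the per-micron resistance and capacitance values are hypothetical, chosen only to show that delay grows with the square of trace length when the load is ignored.

```python
def distributed_rc_delay(length_um, r_per_um, c_per_um, load_c=0.0):
    """Elmore delay (seconds) of a uniform RC line driving a lumped load.

    The line's own capacitance is charged through, on average, half of the
    line resistance; the load capacitance is charged through all of it.
    """
    r_total = r_per_um * length_um
    c_total = c_per_um * length_um
    return r_total * (c_total / 2.0 + load_c)


# Hypothetical trace parameters (assumptions, not taken from the text).
R_PER_UM = 0.05       # ohms per micron
C_PER_UM = 0.28e-15   # farads per micron

short_delay = distributed_rc_delay(1000.0, R_PER_UM, C_PER_UM)
long_delay = distributed_rc_delay(5000.0, R_PER_UM, C_PER_UM)
skew = long_delay - short_delay  # skew between the two unequal traces
```

Under this model a trace five times longer is twenty-five times slower, which is why unequal trace lengths translate so directly into skew.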
Unfortunately skewing can degrade the ASIC performance, especially at high
clock frequencies where the amount of skew can represent a substantial
percentage of the clock period. In fact, a poorly implemented clock
distribution system can render an ASIC design non-functional.
It is known in the prior art to minimize clock skew by attempting to
equalize the lengths of metal lines that distribute the clock signals. If
each such line could be made identical in length (including the length
found at each metallization layer), clock skew would be substantially
minimized. The problem, however, is how to achieve such equalization in
practical applications.
Generally, the prior art tends to defer distribution of the clock
metallization lines until after the rest of the ASIC has been laid out.
Stated differently, in the prior art, the clock metallization is fitted
more or less into whatever routes are potentially available at the end of
the layout process.
Usually the ASIC designer provides a netlist that specifies circuit
functions and interconnects, and hard grouping information that specifies
what components or functions the designer wants grouped together. Further,
the ASIC designer specifies a general floor plan or layout of the chip
that can specify what IC regions must be reserved or dedicated to specific
functions. For example, a substantial portion of the IC area may be
dedicated to fabrication of a large block of RAM, which area is not
available to locate clocked destination devices.
While commercially available tools can route and place clock metallization
lines given these inputs from the ASIC designer, the resultant lines do
not produce minimum clock skew.
One prior art approach has been the use of a fixed clock grid, wherein the
ASIC designer attempts manually to equalize clock line lengths during the
base array/floor plan definition design phase. Historically this line
routing method has had some success. However, modern high density, large
scale bipolar complementary metal-oxide-semiconductor ("BiCMOS") and CMOS
integrated circuits make use of the method increasingly difficult.
Further, because it requires a high degree of manual trial-and-error, this
routing method precludes rapid turnaround time in fabricating new
circuits.
What is needed is a method for closely controlling clock distribution
through an ASIC design such that clock skew is essentially eliminated.
Given the design of the underlying ASIC, such method should be rapidly
implemented, preferably using commercially available placing and routing
equipment, and should be essentially transparent to the ASIC designer.
The present invention provides such a method.
SUMMARY OF THE PRESENT INVENTION
In the prior art, clock trace routing is considered only after the
clock-driven components have been positioned. By contrast, the present
invention regards the clock and its distribution as having greater
importance than placement of clock-driven components in an ASIC.
According to the present invention, an on-chip clock or buffer unit is
preferably located at the center of the ASIC chip and is defined by a
first clock net. The ASIC designer's netlist, hard grouping information
and floor plan information are examined with respect to approximately
where clock-driven components ("clocked loads" or "clocked destinations")
shall be placed. Further, this input information advises which regions of
the IC chip may not be used for interconnecting clock signals.
The IC chip is then partitioned into coordinate defined regions, wherein
each region contains a buffer that is surrounded by an island of
associated clocked destinations that shall be driven by that buffer. It is
the function of the present invention to provide interconnect paths that
couple the on-chip clock/buffer unit to the buffers and their associated
clocked destinations, with minimum clock skew. Preferably the on-chip
clock/buffer unit provides a dedicated buffer for each island-surrounded
buffer.
A series of netlists is created from the on-chip clock/buffer unit to each
buffer, and the order in which the netlists shall be routed is specified.
Preferably the metallization interconnect traces from the on-chip clock to
the region buffers define "H"-shaped configurations, wherein the arms of
each "H" terminate at a region buffer position. (Conventional vias couple
the interconnect traces to the on-chip clock and region buffers.) Joiner
cells, through which a conventional placement router tool will route, are
positioned to segment the net and provide the desired "H"-shapes. The
total metallization length from the on-chip clock to each buffer position
is equal, and further metallization layer 1 and metallization layer 2
sub-lengths comprising the interconnect metallization length are also
equal. This minimizes skew and capacitive effects at a macroscopic level,
e.g., from the on-chip clock to each region buffer.
Each region buffer is then surrounded by an island of clocked destinations
within a soft group that are to be driven by that buffer. A net is
declared for each such buffer. The associated group of clocked
destinations are arranged in a donut-shaped ring around the buffer, such
that each clocked destination is substantially the same electrical
distance from its associated buffer. To help equalize skew effects, each
buffer preferably drives the same number of actual (or equivalent) clocked
loads. Additional nets are declared for each clocked load.
The placement router then routes between each buffer and the clocked loads
in the associated island. Because the clocked loads are substantially
equidistant from their associated buffer, each buffer-clocked load path
length is substantially equal within an acceptable skew error margin. This
minimizes skew at the microscopic level.
In this fashion, the present invention minimizes clock skew both at the
macroscopic level wherein on-chip clock signals propagate along equal
length metal traces, and at the microscopic level, wherein each clocked
load is substantially equidistant from the region buffer.
Other features and advantages of the invention will appear from the
following description in which the preferred embodiments have been set
forth in detail, in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a generic depiction of an integrated circuit with reduced clock
skew, according to the present invention;
FIG. 2A and FIG. 2B depict fixed clock buffer "H"-shaped patterns for
different underlying ASIC base arrays, according to the present invention;
FIG. 3A depicts a logical hierarchy for soft grouping clocked loads
assignable to a regional buffer, according to the present invention;
FIG. 3B is a detailed depiction of a clocked load island, according to the
present invention;
FIG. 4 is a schematic depiction showing macroscopic and microscopic skew
minimization, according to the present invention;
FIG. 5 is a flow diagram depicting workflow associated with generation of
an updated netlist and clocked load preplacement batch command file,
according to the present invention;
FIG. 6 is a further detailed flow diagram of the present invention;
FIG. 7 provides an overall view of the software data flow, according to the
present invention;
FIG. 8A depicts an input DEF file in ASCII format, as used by the present
invention;
FIG. 8B further depicts an input DEF file, as used by the present
invention;
FIG. 8C depicts an output DEF file, as used by the present invention; and
FIG. 8D depicts a DEF modification file, as used by the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
FIG. 1 is a simplified depiction of an integrated circuit ("IC") chip 2
that includes layers 4, 6, 8 that contain the devices comprising the ASIC
and interconnecting traces. An IC may contain more or fewer layers, but
the layers are fabricated on a semiconductor substrate 10. Layer 4 contains
what will be referred to as a base array circuit, and may also include
circuitry and components to which clock signals need not be coupled, e.g.,
region 16. Of course, there may be several such non-clocked regions and
they need not reside in a corner as shown. Of course, it is understood
that instead of representing a semiconductor IC, FIG. 1 could instead
represent layers in a multi-layer printed circuit board, wherein clock
skew is to be minimized.
Generally, the upper layers, e.g., 6 and 8 in FIG. 1, include conductive
interconnect traces 12 and 14 that are typically metal. For ease of
illustration, FIG. 1 only depicts components and traces important to an
understanding of the present invention. For example, inter-level vias that
couple portions of the various layers are not depicted. It is also
understood that the various layers comprising an IC are in close contact
with each other, rather than spaced-apart as depicted in FIG. 1.
According to the present invention, the various layers comprising the IC
are deemed partitioned into a grid-like assembly of preferably rectangular
regions. In FIG. 1, the phantom lines dividing layers 4, 6 and 8 into
regions are understood not to exist physically, but merely to serve as a
reference coordinate system.
Layer 4 includes a master clock or buffer unit 18 whose clock output
signal(s) will be coupled through vias (not shown) and the various metal
interconnect traces 12, 14 to various clock buffers or drivers, e.g., 20B,
22B and 24B. Although unit 18 is preferably disposed centrally on IC 2 for
ease of symmetry, it need not be centrally located. Preferably unit 18
includes one buffer for each of the clock buffers or drivers 20B, 22B,
24B, although unit 18 could instead include fewer buffers that have greater
load fanout capability.
Each buffer or driver 20B, 22B, 24B preferably is centrally located within
a region that is surrounded by an island, e.g., 20, 22, and 24. As best
seen in FIGS. 3A and 3B, within the islands are the various clocked loads
(e.g., 20CL, 22CL, 24CL) that are to be driven by the signal from the
on-chip master clock 18 by way of the interconnect traces.
Of course, layer 4 can in practice include thousands of clock islands,
rather than the three depicted. Also, there may be intermediate levels of
clock buffering between the master clock 18 and the region clock buffers,
for example, to enhance fanout. (See, for example, FIG. 4.) As used in
this context, "level" denotes not a semiconductor fabrication level, but
rather where in a sequence of series-coupled buffers a given buffer is to
be found.
By definition, clock skew is the discrepancy in time between the clock
signal seen directly at master clock 18, and the clock signal seen at the
various clocked loads within each clock island. It is the purpose of the
present invention to provide a method whereby such clock skew is
effectively minimized, preferably to 200 ps or less.
With reference to FIG. 1, the present invention provides a substantially
constant length metal trace interconnect path between the on-chip master
clock and the terminus node of each interconnect path. In practice, one or
more portions of a given interconnect path may be fabricated on layer 6
and/or layer 8. Traces on different layers may have different associated
capacitances and can thus influence skew differently. Thus, the present
invention requires not only that the total length for each interconnect
path be the same as the total length for other interconnect paths, but
also that the sub-lengths of each path fabricated on each metallization
layer are also equal.
In FIG. 1, the reference point for the various interconnect paths is the
center node N18, which communicates with underlying on-chip master
clock/buffer unit 18 through a via. At node N18 and indeed at other nodes
N, an inter-layer via will be present to couple signals from layer to
layer.
Consider first the interconnect path from N18 to terminus node 20N, which
overlies and is coupled to buffer 20B by a via (not shown). This
interconnect path consists of a path length on layer 6 (denoted P20-6)
plus a path length on layer 8 (denoted P20-8). Sub-length P20-6 is shown
as 5 units long, and sub-length P20-8 is 2 units long, a "unit" being the
side of a square defined by the phantom coordinate lines. Thus, the total
interconnect path from N18 to terminus node 20N is 7 units. Similarly,
from N18 to the terminus node 22N associated with buffer 22B is P22-6 (5
units) plus P22-8 (2 units) or 7 units total. Likewise, from N18 to the
terminus node 24N associated with buffer 24B is P24-6 (5 units) plus P24-8
(2 units), or 7 units total.
Note that while any of these interconnect paths could be shortened, so
doing is undesirable unless all paths can be appropriately shortened. It
is understood, for example, that a clock signal propagating along a six
unit long path can arrive at a clocked destination sooner than will the
same signal propagating along a seven unit long path.
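The equal-length requirement for the three paths of FIG. 1 can be checked mechanically. The following is a small sketch under stated assumptions: the dictionary layout and the function name `is_balanced` are hypothetical, while the unit lengths (5 units on layer 6 and 2 units on layer 8 for each path) are those given above.

```python
# Per-path sub-lengths in grid units, keyed by metallization layer number.
paths = {
    "20N": {6: 5, 8: 2},
    "22N": {6: 5, 8: 2},
    "24N": {6: 5, 8: 2},
}


def is_balanced(paths):
    """True when every path has the same total length AND the same
    sub-length on each layer, as the text requires."""
    totals = {sum(p.values()) for p in paths.values()}
    per_layer = {tuple(sorted(p.items())) for p in paths.values()}
    return len(totals) == 1 and len(per_layer) == 1


balanced = is_balanced(paths)

# A path with the same 7-unit total but a different layer split fails
# the check, because layers may differ in capacitance per unit length.
unbalanced = dict(paths, **{"24N": {6: 4, 8: 3}})
still_balanced = is_balanced(unbalanced)
```

The second case shows why equal totals alone are not sufficient: a 4 + 3 split has the same overall length as 5 + 2 but distributes it differently across layers.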
The present invention achieves a tightly controlled initial distribution of
the interconnect paths by creating a netlist that contains joiner or
driver cells (denoted J in FIG. 1). A joiner cell is simply a dummy
pattern of metal whose presence forces a placement router tool to pass
through the joiner cell and to turn 90°, thus locating a metal
interconnect trace where desired, according to the present invention.
Preferably the joiner cell occupies the same chip area as a clocked
buffer, which simplifies rebuffering the associated net, whose terminus is
a joiner cell. For example, a dummy buffer could be located at level 4
beneath what would otherwise be the location of a joiner cell at level 6.
Joiner cell positioning intentionally lengthens some interconnect paths to
force substantially equal metallization interconnect trace lengths. So
doing substantially equalizes the propagation contribution of each path
upon the master clock signal from node N18. This minimizes clock skew from
the on-chip clock to the region buffers, at least on a macroscopic level.
The placement of the joiner cells commands a generic placement router tool
such that the interconnect paths are laid out in a very predictable
manner. The preferred embodiment is practiced using the Gate Ensemble
placement router system, a commercially available package. A similar
system, known as Gate Compiler, is available commercially from Compass
Design Automation.
The present invention coerces the placement router system to behave in a
predictable manner. Conventionally, the internal algorithms associated
with such placement router systems dictate that the desired default path
is the shortest path. However, to equalize interconnect traces according
to the present invention, the desired path is generally not the shortest
path. By judicious placement of joiner cells, the present invention forces
a conventional placement router system to run traces through the joiner
cells, and to thus equalize interconnect lengths and minimize skew, at
least on a macroscopic scale.
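How a joiner cell lengthens an otherwise shortest route can be sketched with rectilinear (Manhattan) geometry. This is an illustrative simplification only: it assumes a single joiner per path, an obstacle-free grid, and ignores layer assignment; the function names are hypothetical and do not correspond to any placement router command.

```python
def manhattan(p, q):
    """Rectilinear distance between two grid points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def joiner_for_target(source, sink, target_length):
    """Return a waypoint (joiner position) such that the rectilinear route
    source -> joiner -> sink has exactly target_length.

    Placing the waypoint a distance k beyond the bounding box of source
    and sink adds 2*k to the shortest route.
    """
    extra = target_length - manhattan(source, sink)
    if extra < 0:
        raise ValueError("target is shorter than the direct route")
    overshoot = extra / 2.0
    return (source[0], max(source[1], sink[1]) + overshoot)


def routed_length(source, joiner, sink):
    """Length of the route forced through the joiner."""
    return manhattan(source, joiner) + manhattan(joiner, sink)


clock = (0, 0)
sinks = [(3, 0), (2, 5), (-1, -2)]   # direct routes of 3, 7 and 3 units
lengths = [routed_length(clock, joiner_for_target(clock, s, 7), s) for s in sinks]
```

Each sink ends up with a 7-unit route even though two of them could have been reached in 3 units, which mirrors the deliberate lengthening described above.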
Thus, as shown by FIG. 1, the present invention reduces skew between the
center node N18 and the terminus node of each interconnect path by
equalizing not only total interconnect path lengths, but also sub-lengths
on each metallization layer. The equalization process is accomplished by
grid-partitioning the various layers and by inserting joiner cells to
satisfy the length and sub-length requirements and to meet the electrical
requirements of each distribution net. FIG. 1 is a generic representation,
drawn for ease of illustration. In practice, the preferred embodiment of
the present invention utilizes "H"-shaped interconnect segments to satisfy
each net (as shown in FIGS. 2A and 2B).
As described above, the master clock signal travels equal distances to
arrive at the terminus node (20N, 22N, 24N) associated with each clocked
buffer (20B, 22B, 24B), thus minimizing skew at the macroscopic level. The
present invention next minimizes skew at the microscopic level by ensuring
that the path lengths from the terminus nodes to each associated clocked
load (e.g., the plurality of loads 20CL, 22CL and 24CL) are electrically
substantially equal within an acceptable error tolerance. (Preferably each
buffer drives the same number of equivalent loads, to further equalize
skew.)
As shown in FIG. 1 (and seen in more detail in FIG. 3A) electrically equal
region buffer-to-clocked load path lengths result from locating each group
of clocked loads within a typically donut-shaped island ring surrounding
the associated buffer. The terms "donut-shaped" or "ring-shaped" will be
used to describe the locations of the clocked loads around their
associated buffer. However, the island shape may in fact be elliptical
rather than circular. Interconnections between the clocked loads and their
associated buffers are made using metal traces 12 and 14, which traces may
not exhibit perfectly identical wiring characteristics. If, for example,
traces 12 exhibited lower resistivity than traces 14, the "donut-shaped"
island would preferably be elongated along the axis of the level 6 traces
12. Although one axis of the island traces would present a longer trace
length than the other axis, the effective electrical distances between the
clocked load and the associated buffer would still be equal electrically.
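The elongated island can be illustrated with a simple weighted-distance model. This sketch assumes, purely for illustration, that electrical distance is a weighted Euclidean distance with a per-axis cost factor; the text does not specify the model, and the names `ring_positions`, `electrical_distance`, `cost_x` and `cost_y` are hypothetical.

```python
import math


def ring_positions(center, n_loads, electrical_radius, cost_x=1.0, cost_y=1.0):
    """Place n_loads on an ellipse whose semi-axes are scaled so that every
    position has the same weighted (electrical) distance from the center.

    A lower cost along an axis (e.g., lower-resistivity traces) stretches
    the island along that axis.
    """
    cx, cy = center
    a = electrical_radius / cost_x   # semi-axis along x
    b = electrical_radius / cost_y   # semi-axis along y
    return [
        (cx + a * math.cos(2 * math.pi * i / n_loads),
         cy + b * math.sin(2 * math.pi * i / n_loads))
        for i in range(n_loads)
    ]


def electrical_distance(center, point, cost_x=1.0, cost_y=1.0):
    """Weighted Euclidean distance used as a stand-in for electrical length."""
    return math.hypot(cost_x * (point[0] - center[0]),
                      cost_y * (point[1] - center[1]))


# Assumed example: x-direction traces cost half as much per unit length.
positions = ring_positions((0.0, 0.0), 8, 300.0, cost_x=0.5, cost_y=1.0)
distances = [electrical_distance((0.0, 0.0), p, 0.5, 1.0) for p in positions]
```

With equal cost factors the ellipse reduces to a circle, which corresponds to the case where the two trace layers have substantially equal electrical characteristics.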
Because the clocked loads are thus placed substantially equidistant
electrically from the associated buffer within a donut or elliptical ring,
a conventional placement router system (e.g., the Gate Ensemble system)
is allowed to route the buffer-to-clocked loads connection paths
unassisted. The margin of error is acceptably small, and indeed
statistically there appears to be some compensation in that some
buffer-to-clocked load connection paths may be slightly longer
electrically than nominal and some such paths will be slightly shorter
electrically than nominal. It will be appreciated that while the present
invention dictates placement of the metal interconnect lines, fabrication
of the underlying ASIC is not disturbed.
FIGS. 2A and 2B depict different fixed clock buffer "H"-patterns for two
different ASICs, according to preferred embodiments of the present
invention. A master signal at pad 30 is coupled through trace 32 to the
centrally located on-chip clock or buffer unit 18, which may in fact be a
plurality of buffers. Preferably unit 18 provides a separate buffer for
each island, although fewer buffers having multiple fanout capability
might be used instead. Because the IC chip is partitioned into clocked
zones, an "H"-pattern advantageously permits trimming back an arm of the
"H" without substantially affecting load impedance due to the buffering
that is used. Preferably such trimming occurs during distribution of
regional buffers.
In FIG. 2A, sixteen regions are shown, wherein four intermediate buffers Bi
are coupled by equi-length traces Ti to the on-chip clock 18. These four
intermediate level buffers Bi may each be considered as being located at
the center of a region comprising the four adjoining square regions.
Centrally located within the regions are regional buffers Br coupled by
equi-length traces Tr to an intermediate buffer Bi. In FIG. 2A, the buffer
Br20 in the upper left-hand corner could in fact be buffer 20B in FIG. 1.
FIG. 2B depicts metal interconnect routings for a different underlying
ASIC, wherein an off-chip clock signal may be coupled through pad 30 and
trace 32 to the centrally located clock/buffer 18. In FIG. 2B, twenty-four
regions are depicted, each containing a centrally located buffer Br,
coupled by equi-length traces in an "H"-shaped configuration to
clock/buffer 18. The buffer Br20 in the upper left-hand corner may, for
example, be considered to be buffer 20B in FIG. 1. Note in FIGS. 2A and 2B
that a hierarchy of "H-shaped" configurations is present. For example, one
"H"-shaped configuration is centered about clock/buffer unit 18, whose arm
termini form the centers of other "H"-shaped configurations. There could,
of course, be fewer or more nests of "H"-shaped configurations than what
is shown in FIGS. 2A and 2B.
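The nested "H" hierarchy can be sketched as a short recursion that records the trace length accumulated on the way to each arm tip. This is a geometric illustration only, assuming idealized, symmetric "H" shapes whose dimensions halve at each level; it reproduces the sixteen-region arrangement of FIG. 2A but not the twenty-four-region arrangement of FIG. 2B. The function name and parameters are hypothetical.

```python
def h_tree_leaves(cx, cy, half_w, half_h, levels, length=0.0):
    """Return [((x, y), path_length)] for every arm tip of a nested H-tree.

    Each "H" centered at (cx, cy) runs half_w horizontally and then half_h
    vertically to reach each of its four tips; every tip is the center of
    the next, half-sized "H".
    """
    if levels == 0:
        return [((cx, cy), length)]
    leaves = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            leaves += h_tree_leaves(
                cx + sx * half_w, cy + sy * half_h,
                half_w / 2.0, half_h / 2.0,
                levels - 1, length + half_w + half_h,
            )
    return leaves


# Two nested levels on a chip spanning -8..8 in both axes: the first level
# gives four intermediate buffer sites, the second sixteen regional sites.
leaves = h_tree_leaves(0.0, 0.0, 4.0, 4.0, levels=2)
path_lengths = {round(length, 9) for _, length in leaves}
```

Every leaf is reached over the same accumulated length, which is the property the "H" configuration is chosen for.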
FIG. 3A is similar to what is shown in FIG. 2A except that the
hierarchically soft grouped clocked loads and clock islands are now
depicted. As in FIG. 2A, sixteen regions whose regional buffers Br are
coupled to an on-chip clock/buffer 18 are shown. For example, the upper
left corner of FIG. 3A shows regional buffer 20B surrounded by an island
20 that contains a plurality of clocked loads CL arranged in a donut-ring
configuration. For ease of illustration, the clocked loads are shown
placed within a circular-shaped donut-ring, although as noted the ring
could in fact be elliptical to compensate for electrical differences in
the interconnect traces at the various wafer levels.
Generally when the ASIC is designed, the ASIC designer assigns clocks to
the design, and designates how many clocked loads should be associated
with a given region. However, the ASIC designer is not concerned with
clock distribution or fanout, per se. Typically, it suffices that for each
unique clock, the ASIC designer assign a clock name and associated clocked
loads. (Although the preferred embodiment has been described with respect
to a single master clock signal, of course the present invention is also
applicable to a system having multiple master clock signals.)
It suffices for the ASIC circuit designer to commit a number of clocked
loads to what the present invention grid-designates as a region. The
present invention makes a logical hierarchical soft grouping of clocked
loads, associating soft groups with regions. The ASIC designer's netlist
is then changed to assign new clock net names for all clocked loads within
each soft group, which soft groups are shown as A, B, C, D, E, F, G and H
in FIG. 3A. The present invention generates preplacement command files for
the clocked loads, the netlist is updated, and a clocked load preplacement
batch command file is created.
FIG. 3B shows in detail a typical region, for example the region containing
regional buffer 20B and island 20. In the preferred embodiment, each
region has a side dimension $X_L$ of about 1,800 µm, although other
dimensions could be used instead. A plurality of clocked loads CL are
positioned within a donut-shaped ring centered upon regional clock/buffer
20B and comprising island 20. In the preferred embodiment, the ring has a
ring dimension $\Delta L$ of 300 µm, although other sizes could be used.
Since FIG. 3B depicts a circular-shaped ring, it is assumed that the
electrical characteristics of the traces 12 and 14 are substantially
equal.
In FIG. 3B, an intermediate buffer Bi is shown in the right-hand corner,
whose clocked signals are coupled by traces Tr to the regional
clock/buffer 20B. It is understood that clock/buffer 20B may in fact
include a plurality of buffers, whose substantially identical outputs are
coupled by traces to a preferably equal number of clocked loads CL. For
example, trace T' and trace T" each couple four clocked loads CL to
clock/buffer 20B. Of course in practice, the number of clocked loads
coupled to each buffer is substantially larger than four, more typically
two hundred or so.
Whereas a conventional placement router system was essentially constrained
by joiner cell placement to predictably route metal interconnect traces,
the placement router system is given considerable leeway in
interconnecting the various clocked loads CL to the associated region
buffer. However, this final clock interconnect distribution is actually
indirectly controlled in that the clocked loads will have been placed
within a small ring surrounding the associated regional (e.g., island)
buffer. In practice, the interconnects from the clocked loads to their
associated regional buffer will be, statistically, of substantially constant
radius electrically. In any event, the distance contributed by these
regional buffer-to-clocked load interconnects is but a tiny fraction of
the total interconnect length from the on-chip clock or buffer unit. As a
result, skew error resulting from unequal island buffer-to-clocked load
interconnects is relatively negligible.
FIG. 4 depicts macroscopic and microscopic skew minimization, over three
levels (e.g., stages) of buffering, according to the present invention. In
FIG. 4, the first three levels or stages of buffering are denoted fixed
clock buffers in that interconnect lengths are forced to be substantially
equal, in total length, in metallization layer sub-length, and in
impedance load presented, thanks to joiner cell placement. Collectively,
buffers 18, Bi and Br and the preferably "H"-shaped pattern of
interconnect traces define a fixed clock tree.
Preferably the fixed clock tree takes into account the total number of
clock buffers at the various stages, the size of the base array and
location of operating voltage busses (whose presence can deny certain
areas of layers 6 and 8 for purposes of routing "H"-shaped interconnects),
the magnitude of RAM, ROM and other block modules (whose presence at layer
4 can affect interconnect routings at levels 6 and 8), and the specific
clocked loads used in the ASIC design. This information is used to
distribute and pre-route the clock buffers and associated interconnects
such that, macroscopically, skew is as close to zero as is possible.
As noted, microscopically, the final routing of the island interconnects to
the clocked loads is done by the placement router algorithm, but only
after the various load cells are placed within donut-shaped rings
surrounding an adjacent island buffer. It is understood that skew is
affected by the degree of interconnect equalization achieved for the fixed
clock buffers, and also (but to a lesser extent) by the final routing to
the clocked loads.
In FIG. 4, the first level buffering occurs when, at pad 30, an off-chip
clock signal is received and coupled by trace 32 to the on-chip
clock/buffer 18. Traces Ti then couple the output from clock/buffer 18 to
various buffers Bi-1 through Bi-n, in the same manner as depicted in FIG.
2A and FIG. 2B. These buffers Bi comprise the second level or stage of
buffering.
For ease of illustration, FIG. 4 shows only the second buffering as
providing an output signal on traces Tr to several third stage buffers
Br-1 through Br-m. Again, the traces Tr and buffers Br are preferably
similar to what is depicted in FIGS. 2A and 2B. For ease of illustration,
only two of the Br buffers are shown coupled by traces Tx to a preferably
equal number of clocked loads CL. According to the present invention, if
the ASIC design does not provide a sufficient number of clocked loads to
ensure equal buffer loading, dummy equivalent impedance loads will be
fabricated and coupled as necessary to the Br buffers.
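The load-equalization step can be sketched as a simple count of how many dummy loads each regional buffer needs. This is a minimal sketch assuming every real or dummy load counts as one equivalent load; the buffer names and the function name are hypothetical.

```python
def dummy_loads_needed(loads_per_buffer):
    """Return, for each buffer, how many dummy equivalent loads must be added
    so that all buffers drive as many loads as the most heavily loaded one."""
    target = max(loads_per_buffer.values())
    return {name: target - count for name, count in loads_per_buffer.items()}


# Hypothetical fanout counts for three regional buffers.
fanout = {"Br-1": 200, "Br-2": 187, "Br-3": 200}
padding = dummy_loads_needed(fanout)
equalized = {name: fanout[name] + padding[name] for name in fanout}
```

In practice loads of different types would first be converted to a common equivalent-load figure before this count is taken.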
With reference to FIGS. 3A and 3B, it is understood that the Br buffers are
preferably located in the center of a donut-shaped or elliptical-shaped
configuration that contains the various associated clocked loads CL. The
traces Tx (which are analogous to traces T', T" in FIG. 3B) are positioned
by the placement router tool used with the present invention. Although the
placement router tool has some discretion in routing these traces, because
of the island configuration used, these traces will be substantially equal
in electrical length.
According to the present invention, the skew time $T_d$ is given by:
$$T_d = T_i + K\left[C_{unitgate} \cdot \Sigma LV + C_{unitgate} \cdot \Sigma FANOUT\right]$$
from which equation, it is apparent that:
$$T_d = T_i + K\left[C_{unitgate} \cdot \Sigma LV\right] + K\left[C_{unitgate} \cdot \Sigma FANOUT\right] = T_A + T_B$$
where $C_{unitgate}$ is the capacitance associated with each clocked load,
K is the ohmic loss associated with the preferably equal length
interconnect traces, and where FANOUT is the preferably equal number of
clocked loads coupled to the buffers. $T_A$ is a fixed delay time
associated with the fixed clock buffers of FIG. 4, a delay that is
substantially zero in magnitude. $T_B$ is associated with the placement
of the clocked loads CL in FIG. 4, and represents a variable delay that
typically is less than about 200 ps, according to the present invention.
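The split of the delay expression into a fixed and a variable part can be evaluated numerically. The sketch below makes two assumptions that the text does not state: that the fixed part groups the initial term with the LV term, and that the variable part is the FANOUT term. The quantity LV is not defined in the text and is treated here as an opaque list of numbers; all input values are hypothetical.

```python
def clock_delay(t_i, k, c_unitgate, lv_terms, fanout_terms):
    """Evaluate the delay expression and return (t_a, t_b, t_d).

    t_a: assumed fixed part  = t_i + k * c_unitgate * sum(lv_terms)
    t_b: assumed variable part =     k * c_unitgate * sum(fanout_terms)
    t_d: total, equal to t_a + t_b by construction
    """
    t_a = t_i + k * c_unitgate * sum(lv_terms)
    t_b = k * c_unitgate * sum(fanout_terms)
    return t_a, t_b, t_a + t_b


# Hypothetical, unitless inputs chosen only to show the arithmetic.
t_a, t_b, t_d = clock_delay(t_i=1.0, k=2.0, c_unitgate=0.5,
                            lv_terms=[1, 2], fanout_terms=[20])
```

The point of the decomposition is that only the second part depends on where the placement router puts the final buffer-to-load connections.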
In practice, if the allowed skew time (i.e., the variation in $T_d$) is,
say, 200 ps, then for FANOUT=20, it follows that:
##EQU1##
The above implies that a 0.1 pF capacitance difference is acceptable,
which is equivalent to a metal trace interconnect length difference
$\Delta L$ of about 357 µm, an acceptable and realizable variation.
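The stated equivalence between a 0.1 pF capacitance difference and a length difference of about 357 µm implies a particular trace capacitance per unit length, which can be recovered by simple division. The sketch below uses only the two figures given in the text; the variable and function names are hypothetical.

```python
DELTA_C_PF = 0.1     # acceptable capacitance difference, from the text
DELTA_L_UM = 357.0   # equivalent trace length difference, from the text

# Implied trace capacitance per micron, in femtofarads (1 pF = 1000 fF).
c_per_um_ff = DELTA_C_PF * 1000.0 / DELTA_L_UM


def length_budget_um(delta_c_pf, c_per_um_ff):
    """Trace length difference (µm) corresponding to a capacitance budget."""
    return delta_c_pf * 1000.0 / c_per_um_ff


budget = length_budget_um(DELTA_C_PF, c_per_um_ff)
```

The implied figure is roughly 0.28 fF per micron of trace, and the same helper can be reused to translate a different capacitance budget into a length tolerance.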
FIG. 5 depicts the general workflow of the present invention, wherein input
data includes input netlist 40A, floorplan information 40B, and hard
grouping information 40C. Preferably netlist 40A is in design exchange
format ("DEF"), although other formats could instead be used. DEF is an
ASCII representation that uses syntax conventions, but has the limitation
that DEF file lines are truncated beyond 2048 characters. The ASIC
designer-provided input information will include all parameters required
to evaluate clock distribution. These parameters may include, without
limitation, clock buffer specification, metal interconnect and connective
via characteristics for loading, and possibly for secondary effects such
as cross-coupling and fringe effects. Preferably the design does not
buffer the clocks, in that all clocks of the same type carry the same name
and assignment directly to their respective clocked load designations.
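Because lines longer than 2048 characters are truncated, it can be useful to check a netlist for overlong lines before processing it. The following is a small sketch that works on any list of text lines; it does not parse DEF syntax, and the function name is hypothetical.

```python
MAX_DEF_LINE = 2048  # line-length limit stated in the text


def overlong_lines(lines, limit=MAX_DEF_LINE):
    """Return the 1-based numbers of lines whose length (excluding the
    trailing newline) exceeds the limit and would therefore be truncated."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if len(line.rstrip("\n")) > limit
    ]


sample = ["a" * 2048 + "\n", "b" * 2049 + "\n", "short line\n"]
problems = overlong_lines(sample)
```

A line of exactly 2048 characters passes; only the second sample line is reported.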
At step 44, the ASIC-designer input 40A, 40B, and 40C is received into the
present invention. Using this information, at step 47 a fixed "H"-tree
pattern of traces is generated, wherein interconnect lengths and
metallization layer sub-lengths are substantially equal. As noted, such
equalization follows from placement of joiner cells J.
At step 52 the present invention quantizes the grid-defined rectangular
regions into donut-shaped regions. Any clocked device located within a
given grid area becomes a candidate for inclusion within a local region
defined by a donut-shaped ring. The invention preliminarily decides upon a
grouping of clocked loads, including a preliminary decision as to the
donut-shaped region wherein each clocked load should be located, for
example region A or region B in FIG. 3B. On the initial decision, any
clocked load found within a region is associated with an adjacent
donut-shaped island and its region buffer.
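The preliminary grouping of step 52 can be sketched as a lookup from a clocked load's coordinates to the grid region that contains it. This is a minimal sketch assuming square regions with the roughly 1,800 µm side mentioned for the preferred embodiment and an origin at one chip corner; the load names and function names are hypothetical.

```python
from collections import defaultdict

REGION_SIDE_UM = 1800.0  # region side from the preferred embodiment


def region_of(x_um, y_um, side=REGION_SIDE_UM):
    """Return the (column, row) index of the grid region containing a point."""
    return (int(x_um // side), int(y_um // side))


def group_by_region(loads, side=REGION_SIDE_UM):
    """Group clocked loads, given as {name: (x_um, y_um)}, by grid region."""
    groups = defaultdict(list)
    for name, (x_um, y_um) in sorted(loads.items()):
        groups[region_of(x_um, y_um, side)].append(name)
    return dict(groups)


# Hypothetical initial positions of three clocked loads.
loads = {"ff1": (100.0, 100.0), "ff2": (1700.0, 50.0), "ff3": (1900.0, 3700.0)}
groups = group_by_region(loads)
```

Each group then becomes the candidate population for the donut-shaped island around that region's buffer.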
At step 58, the donut-shaped rings are precisely filled. The present
invention calculates the exact location of each clocked load within a
donut-shaped ring to ensure that the path length from each final region
buffer to each clocked load is substantially the same. FIG. 3A depicts a
typical placement of clocked loads following step 58.
At step 59, the netlist (DEF) is updated to assign new and unique clock net
names for the clocked loads that have now been located within the
appropriate region donut-shaped ring. The updated netlists will be used to
accomplish the actual interconnections to the clocked loads. Steps 52, 58
and 59 are repeated for all soft groups, until every clocked load has been
placed essentially the same constant distance from an associated region
buffer.
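The net renaming of step 59 can be sketched as a mapping from each clocked load to a clock net name that is unique to its soft group. The naming scheme below (a base name plus the group label) is an assumption made for illustration; the text does not specify how the new names are formed, and the sketch does not emit DEF syntax.

```python
def rename_clock_nets(soft_groups, base="CLK"):
    """Map every clocked load to a net name unique to its soft group.

    soft_groups: {group_label: [load_name, ...]}
    Returns {load_name: net_name}.
    """
    return {
        load: f"{base}_{group}"
        for group, members in sorted(soft_groups.items())
        for load in members
    }


# Hypothetical soft groups, labeled as in FIG. 3A.
soft_groups = {"A": ["ff1", "ff2"], "B": ["ff3"]}
net_of = rename_clock_nets(soft_groups)
```

Loads in the same soft group share a net and are therefore routed to the same regional buffer, while loads in different groups never share one.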
At this juncture, the final routing has been completed, taking perhaps
three hours to complete, and time $T_B$ ideally will be less than about
200 ps. At step 61, the present invention generates the preplacement
command files for the clocked devices, and generates an updated netlist.
Preferably this information is stored in a format recognizable by a
standard placement-router system, e.g., Gate Ensemble. The generated
updated netlist (60A) and clocked device preplacement batch command file
(60B) are then stored. (This information is received and used at steps 64
and 66 in FIG. 6, following.)
FIG. 6 is a flow diagram of the present invention, wherein details beyond
what was shown in FIG. 5 are indicated. At the top of FIG. 6, the
ASIC-designer provided inputs are shown. At the bottom of the figure, the
output provided by the present invention includes a final data base 42
that contains placement data for the interconnect metal traces and
placement of the various clocked loads within the donut-shaped regions
surrounding an associated region or island buffer. The output data base is
provided as control input to a conventional placement router system.
In FIG. 6, block 44 describes how the metal interconnect traces are
located, according to the present invention. More specifically, to form
the "H"-shaped tree interconnect patterns, the number of buffers located
at the tips of the arms of the "H's" must be determined. Clearly the area
size of the underlying base array will dictate how many "H"-shaped traces
can be formed.
Thus, input to block 44 includes the number of clocks (since the present
invention may be used to route multiple clocks), information as to volta