Description
CROSS REFERENCE TO RELATED APPLICATIONS
This application relates to improvements in the inventions disclosed in the
following patents and U.S. patent applications, all of which are assigned
to Texas Instruments and all of which are incorporated by reference:
U.S. patent application Ser. No. 08/263,501 filed Jun. 21, 1994 entitled
"MULTI-PROCESSOR WITH CROSSBAR LINK OF PROCESSORS AND MEMORIES AND METHOD
OF OPERATION", a continuation of U.S. patent application Ser. No.
08/135,754 filed Oct. 12, 1993 and now abandoned, a continuation of U.S.
patent application Ser. No. 07/933,865 filed Aug. 21, 1992 and now
abandoned, which is a continuation of U.S. patent application Ser. No.
07/435,591 filed Nov. 17, 1989 and now abandoned;
U.S. Pat. No. 5,212,777, issued May 18, 1993, filed Nov. 17, 1989 and
entitled "SIMD/MIMD RECONFIGURABLE MULTI-PROCESSOR AND METHOD OF
OPERATION";
U.S. patent application Ser. No. 08/264,111 filed Jun. 22, 1994 entitled
"RECONFIGURABLE COMMUNICATIONS FOR MULTI-PROCESSOR AND METHOD OF
OPERATION," a continuation of U.S. patent application Ser. No. 07/895,565
filed Jun. 5, 1992 and now abandoned, a continuation of U.S. patent
application Ser. No. 07/437,856 filed Nov. 17, 1989 and now abandoned;
U.S. patent application Ser. No. 08/264,582 filed Jun. 22, 1994 entitled
"REDUCED AREA OF CROSSBAR AND METHOD OF OPERATION", a continuation of U.S.
patent application Ser. No. 07/437,852 filed Nov. 17, 1989 and now
abandoned;
U.S. patent application Ser. No. 08/032,530 filed Mar. 15, 1993 entitled
"SYNCHRONIZED MIMD MULTI-PROCESSING SYSTEM AND METHOD OF OPERATION," a
continuation of U.S. patent application Ser. No. 07/437,835 filed Nov. 17,
1989 and now abandoned;
U.S. Pat. No. 5,197,140 issued Mar. 23, 1993 filed Nov. 17, 1989 and
entitled "SLICED ADDRESSING MULTI-PROCESSOR AND METHOD OF OPERATION";
U.S. Pat. No. 5,339,447 issued Aug. 16, 1994 filed Nov. 17, 1989 entitled
"ONES COUNTING CIRCUIT, UTILIZING A MATRIX OF INTERCONNECTED HALF-ADDERS,
FOR COUNTING THE NUMBER OF ONES IN A BINARY STRING OF IMAGE DATA";
U.S. Pat. No. 5,239,654 issued Aug. 24, 1993 filed Nov. 17, 1989 and
entitled "DUAL MODE SIMD/MIMD PROCESSOR PROVIDING REUSE OF MIMD
INSTRUCTION MEMORIES AS DATA MEMORIES WHEN OPERATING IN SIMD MODE";
U.S. Pat. No. 5,410,649 issued Apr. 25, 1995 filed Jun. 29, 1992 entitled
"IMAGING COMPUTER AND METHOD OF OPERATION", a continuation of U.S. patent
application Ser. No. 07/437,854 filed Nov. 17, 1989 and now abandoned;
U.S. Pat. No. 5,226,125 issued Jul. 6, 1993 filed Nov. 17, 1989 and
entitled "SWITCH MATRIX HAVING INTEGRATED CROSSPOINT LOGIC AND METHOD OF
OPERATION";
U.S. patent application Ser. No. 08/160,299 filed Nov. 30, 1993 and
entitled "THREE INPUT ARITHMETIC LOGIC UNIT WITH BARREL ROTATOR";
U.S. patent application Ser. No. 08/158,742 filed Nov. 30, 1993 and
entitled "ARITHMETIC LOGIC UNIT HAVING PLURAL INDEPENDENT SECTIONS AND
REGISTER STORING RESULTANT INDICATOR BIT FROM EVERY SECTION";
U.S. patent application Ser. No. 08/160,118 filed Nov. 30, 1993 and
entitled "MEMORY STORE FROM A REGISTER PAIR CONDITIONAL";
U.S. patent application Ser. No. 08/324,323 filed Oct. 17, 1994 and
entitled "ITERATIVE DIVISION APPARATUS, SYSTEM AND METHOD FORMING PLURAL
QUOTIENT BITS PER ITERATION", a continuation of U.S. patent application Ser.
No. 08/160,115 concurrently filed with this application and now abandoned;
U.S. patent application Ser. No. 08/159,285 filed Nov. 30, 1993 and
entitled "THREE INPUT ARITHMETIC LOGIC UNIT FORMING MIXED ARITHMETIC AND
BOOLEAN COMBINATIONS";
U.S. patent application Ser. No. 08/160,119 filed Nov. 30, 1993 and
entitled "METHOD, APPARATUS AND SYSTEM FORMING THE SUM OF DATA IN PLURAL
EQUAL SECTIONS OF A SINGLE DATA WORD";
U.S. patent application Ser. No. 08/159,359 filed Nov. 30, 1993 and
entitled "HUFFMAN ENCODING METHOD, CIRCUITS AND SYSTEM EMPLOYING MOST
SIGNIFICANT BIT CHANGE FOR SIZE DETECTION";
U.S. patent application Ser. No. 08/160,296 filed Nov. 30, 1993 and
entitled "HUFFMAN DECODING METHOD, CIRCUIT AND SYSTEM EMPLOYING
CONDITIONAL SUBTRACTION FOR CONVERSION OF NEGATIVE NUMBERS";
U.S. patent application Ser. No. 08/160,112 filed Nov. 30, 1993 and
entitled "METHOD, APPARATUS AND SYSTEM FOR SUM OF PLURAL ABSOLUTE
DIFFERENCES";
U.S. patent application Ser. No. 08/160,120 filed Nov. 30, 1993 and
entitled "ITERATIVE DIVISION APPARATUS, SYSTEM AND METHOD EMPLOYING LEFT
MOST ONE'S DETECTION AND LEFT MOST ONE'S DETECTION WITH EXCLUSIVE OR";
U.S. patent application Ser. No. 08/160,114 filed Nov. 30, 1993 and
entitled "ADDRESS GENERATOR EMPLOYING SELECTIVE MERGE OF TWO INDEPENDENT
ADDRESSES";
U.S. patent application Ser. No. 08/160,116 filed Nov. 30, 1993 and
entitled "METHOD, APPARATUS AND SYSTEM METHOD FOR CORRELATION";
U.S. patent application Ser. No. 08/160,297 filed Nov. 30, 1993 and
entitled "LONG INSTRUCTION WORD CONTROLLING PLURAL INDEPENDENT PROCESSOR
OPERATIONS";
U.S. patent application Ser. No. 08/159,346 filed Nov. 30, 1993 and
entitled "ROTATION REGISTER FOR ORTHOGONAL DATA TRANSFORMATION";
U.S. patent application Ser. No. 08/159,652 filed Nov. 30, 1993 and
entitled "MEDIAN FILTER METHOD, CIRCUIT AND SYSTEM";
U.S. patent application Ser. No. 08/159,344 filed Nov. 30, 1993 and
entitled "ARITHMETIC LOGIC UNIT WITH CONDITIONAL REGISTER SOURCE
SELECTION";
U.S. patent application Ser. No. 08/160,301 filed Nov. 30, 1993 and
entitled "APPARATUS, SYSTEM AND METHOD FOR DIVISION BY ITERATION";
U.S. patent application Ser. No. 08/159,650 filed Nov. 30, 1993 and
entitled "MULTIPLY ROUNDING USING REDUNDANT CODED MULTIPLY RESULT";
U.S. patent application Ser. No. 08/159,349 filed Nov. 30, 1993 and
entitled "SPLIT MULTIPLY OPERATION";
U.S. patent application Ser. No. 08/158,741 filed Nov. 30, 1993 and
entitled "MIXED CONDITION TEST CONDITIONAL AND BRANCH OPERATIONS INCLUDING
CONDITIONAL TEST FOR ZERO";
U.S. patent application Ser. No. 08/160,302 filed Nov. 30, 1993 and
entitled "PACKED WORD PAIR MULTIPLY OPERATION";
U.S. patent application Ser. No. 08/160,573 filed Nov. 30, 1993 and
entitled "THREE INPUT ARITHMETIC LOGIC UNIT WITH SHIFTER";
U.S. patent application Ser. No. 08/159,282 filed Nov. 30, 1993 and
entitled "THREE INPUT ARITHMETIC LOGIC UNIT WITH MASK GENERATOR";
U.S. patent application Ser. No. 08/160,111 filed Nov. 30, 1993 and
entitled "THREE INPUT ARITHMETIC LOGIC UNIT WITH BARREL ROTATOR AND MASK
GENERATOR";
U.S. patent application Ser. No. 08/160,298 filed Nov. 30, 1993 and
entitled "THREE INPUT ARITHMETIC LOGIC UNIT WITH SHIFTER AND MASK
GENERATOR";
U.S. patent application Ser. No. 08/159,345 filed Nov. 30, 1993 and
entitled "THREE INPUT ARITHMETIC LOGIC UNIT FORMING THE SUM OF A FIRST
INPUT ADDED WITH A FIRST BOOLEAN COMBINATION OF A SECOND INPUT AND THIRD
INPUT PLUS A SECOND BOOLEAN COMBINATION OF THE SECOND AND THIRD INPUTS";
U.S. patent application Ser. No. 08/160,113 filed Nov. 30, 1993 and
entitled "THREE INPUT ARITHMETIC LOGIC UNIT FORMING THE SUM OF FIRST
BOOLEAN COMBINATION OF FIRST, SECOND AND THIRD INPUTS PLUS A SECOND
BOOLEAN COMBINATION OF FIRST, SECOND AND THIRD INPUTS";
U.S. patent application Ser. No. 08/159,640 filed Nov. 30, 1993 and
entitled "THREE INPUT ARITHMETIC LOGIC UNIT EMPLOYING CARRY PROPAGATE
LOGIC";
U.S. patent application Ser. No. 08/160,300 filed Nov. 30, 1993 and
entitled "DATA PROCESSING APPARATUS, SYSTEM AND METHOD FOR IF, THEN, ELSE
OPERATION USING WRITE PRIORITY";
U.S. patent application Ser. No. 08/208,413 "TRANSPARENCY AND PLANE MASKING
IN TP TRANSFER PROCESSOR";
U.S. patent application Ser. No. 08/208,161 "PIXEL BLOCK TRANSFER WITH
TRANSPARENCY";
U.S. patent application Ser. No. 08/208,171 "MESSAGE PASSING AND BLAST
INTERRUPT FROM PROCESSOR";
U.S. patent application Ser. No. 08/209,123 "GUIDED TRANSFERS WITH X,Y
DIMENSION AND VARIABLE STEPPING";
U.S. patent application Ser. No. 08/209,124 "GUIDED TRANSFER LINE DRAWING";
U.S. patent application Ser. No. 08/208,517 "TRANSFER PROCESSOR MEMORY
INTERFACE CONTROLS DIFFERENT MEMORY TYPES SIMULTANEOUSLY"; and
U.S. patent application Ser. No. 08/202,503 "ARCHITECTURE OF TP TRANSFER
PROCESSOR".
This application is also related to the following concurrently filed U.S.
patent applications, all of which are hereby incorporated by reference:
U.S. patent application Ser. No. 08/203,987 filed Mar. 8, 1993 and entitled
"MP VECTOR INSTRUCTIONS FP+LOAD/STORE"; and
U.S. patent application Ser. No. 08/207,992 filed Mar. 8, 1993 and entitled
"NORMALIZATION METHOD FOR FLOATING POINT NUMBERS".
TECHNICAL FIELD OF THE INVENTION
The technical field of this invention is the field of digital data
processing and more particularly microprocessor circuits, architectures
and methods for digital data processing especially digital image/graphics
processing.
BACKGROUND OF THE INVENTION
The inventive embodiments have many applications, some of which relate to
the field of computer graphics, discussed herein as an illustrative
background. In a field of computer graphics known as bit mapped graphics,
computer memory stores data for each individual picture element or pixel
of an image at memory locations that correspond to the location of that
pixel within the image. This image may be an image to be displayed or a
captured image to be manipulated, stored, displayed or retransmitted. The
field of bit mapped computer graphics has benefited greatly from the
lowered cost and increased capacity of dynamic random access memory (DRAM)
and the lowered cost and increased processing power of microprocessors.
These advantageous changes in the cost and performance of component parts
enable larger and more complex computer image systems to be economically
feasible.
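The correspondence between a pixel's location in the image and its memory location can be illustrated with a minimal sketch. It assumes a row-major frame buffer in which each row occupies `pitch` bytes and each pixel occupies a whole number of bytes; the function name `pixel_address` and its parameters are hypothetical and are not taken from this disclosure.

```python
def pixel_address(base: int, x: int, y: int, pitch: int, bytes_per_pixel: int) -> int:
    """Byte address of pixel (x, y) in an assumed row-major frame buffer.

    base            -- address of pixel (0, 0)
    pitch           -- bytes from the start of one row to the start of the next
    bytes_per_pixel -- storage per pixel (whole bytes only in this sketch)
    """
    return base + y * pitch + x * bytes_per_pixel


# Horizontally adjacent pixels sit bytes_per_pixel apart;
# vertically adjacent pixels sit one pitch apart.
print(hex(pixel_address(0x1000, 3, 2, 640, 1)))  # prints 0x1503
```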
The field of bit mapped graphics has undergone several stages in evolution
of the types of processing used for image data manipulation. Initially a
computer system supporting bit mapped graphics employed the system
processor for all bit mapped operations. This type of system suffered
several drawbacks. First, the computer system processor was not
particularly designed for handling bit mapped graphics. Design choices
that are very reasonable for general purpose computing are unsuitable for
bit mapped graphics systems. Consequently some routine graphics tasks
operated slowly. In addition, it was quickly discovered that the
processing needed for image manipulation of bit mapped graphics placed
such a heavy load on the computational capacity of the system processor
that other operations were also slowed.
The next step in the evolution of bit mapped graphics processing was
dedicated hardware graphics controllers. These devices can draw simple
figures, such as lines, ellipses and circles, under the control of the
system processor. Some of these devices can also do pixel block transfers
(PixBlt). A pixel block transfer is a memory move operation of image data
from one portion of memory to another. A pixel block transfer is useful
for rendering standard image elements, such as alphanumeric characters in
a particular type font, within a display by transfer from nondisplayed
memory to bit mapped display memory. This function can also be used for
tiling by transferring the same small image to the whole of bit mapped
display memory. Built-in algorithms for performing some of the most
frequently used graphics functions provide a way of improving system
performance. Also a graphics computer system may desirably include other
functions besides those few that are implemented in such a hardware
graphics controller. These additional functions might be implemented in
software by the system processor. These hardware graphics controllers will
typically allow the system processor only limited access to the bit map
memory. This limits the degree to which system software can augment the
fixed set of functions of the hardware graphics controller.
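A pixel block transfer and its use for tiling can be sketched as follows. This is a minimal illustration assuming an image is stored as a list of rows of pixel values and that the source and destination blocks do not overlap; the names `pixblt` and `tile` are hypothetical and do not describe any particular hardware graphics controller.

```python
Image = list[list[int]]  # rows of pixel values (assumed representation)


def pixblt(src: Image, sx: int, sy: int, dst: Image, dx: int, dy: int, w: int, h: int) -> None:
    """Copy the w x h block at (sx, sy) in src to (dx, dy) in dst.

    Assumes the source and destination blocks do not overlap.
    """
    for row in range(h):
        dst[dy + row][dx:dx + w] = src[sy + row][sx:sx + w]


def tile(pattern: Image, dst: Image) -> None:
    """Fill all of dst by repeated block transfers of the same small pattern."""
    ph, pw = len(pattern), len(pattern[0])
    for ty in range(0, len(dst), ph):
        for tx in range(0, len(dst[0]), pw):
            # Clip the last tile in each row and column to the destination edge.
            w = min(pw, len(dst[0]) - tx)
            h = min(ph, len(dst) - ty)
            pixblt(pattern, 0, 0, dst, tx, ty, w, h)


screen = [[0] * 5 for _ in range(4)]
tile([[1, 2], [3, 4]], screen)
for line in screen:
    print(line)  # alternating rows [1, 2, 1, 2, 1] and [3, 4, 3, 4, 3]
```

Rendering a character from a nondisplayed font area is the same `pixblt` call with a glyph block as the source.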
The graphics system processor represents yet a further step in the
evolution of bit mapped graphics processing. A graphics system processor
is a programmable device that has all the attributes of a microprocessor
and also includes special functions for bit mapped graphics. The TMS34010
and TMS34020 graphics system processors manufactured by Texas Instruments
Incorporated represent this class of devices. These graphics system
processors respond to a stored program in the same manner as a
microprocessor and include the capability of data manipulation via an
arithmetic logic unit, data storage in register files and control of both
program flow and external data memory. In addition, these devices include
special purpose graphics manipulation hardware that operates under program
control. Additional instructions within the instruction set of these
graphics system processors control the special purpose graphics hardware.
These instructions and the hardware that supports them are selected to
perform base level graphics functions that are useful in many contexts.
Thus a graphics system processor can be programmed for many differing
graphics applications using algorithms selected for the particular
problem. This provides an increase in usefulness similar to that provided
by changing from hardware controllers to programmed microprocessors.
Because such graphics system processors are programmable devices in the
same manner as microprocessors, they can operate as stand alone graphics
processors, graphics co-processors slaved to a system processor or tightly
coupled graphics controllers.
Several fields require more cost effective, powerful graphics operations
in order to become economically feasible. These include video
conferencing, multi-media computing with full motion video, high
definition television, color facsimile, smart photocopiers, image
recognition systems and digital photography, among other examples. Each of
these fields presents unique problems. Image data compression and
decompression are common themes in some of these applications. The amount
of transmission bandwidth and the amount of storage capacity required for
images, particularly full motion video, is enormous. Without efficient
video compression and decompression that result in acceptable final image
quality, these applications will be limited by the costs associated with
transmission bandwidth and storage capacity. There is also a need in the
art for a single system that can support both image processing functions
such as image recognition and graphics functions such as display control.
SUMMARY OF THE INVENTION
A floating point normalization circuit and method receives a mantissa part
and an exponent part of a floating point number. The exponent part is
encoded in a manner having a minimum expressible exponent. An exponent
decoder generates a coded multibit output corresponding to the maximum
decrease in the exponent part of the floating point number within the
minimum expressible exponent. This coded multibit output is bit-wise ORed
with the mantissa part of the floating point number. This forms a logical
OR output of each bit of the mantissa part with a corresponding bit of the
coded multibit output. A left most one circuit detects the bit position of
the most significant bit of the logical OR output having a "1". This bit
is the left most one bit. The mantissa and exponent are then normalized
according to this bit position, counted from the most significant end of
the logical OR output. The mantissa is left shifted an amount equal to the
detected bit position of the most significant bit having a "1". Lastly,
the exponent is decremented an amount equal to the detected bit position
of the most significant bit having a "1".
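The normalization steps summarized above can be modeled in software to check their behavior. This is a minimal sketch, not the disclosed circuit: it assumes an unsigned `width`-bit mantissa held in a Python integer, an exponent no smaller than the minimum expressible exponent `min_exp`, and bit positions counted from the most significant end. The function names are hypothetical, and leaving an all-zero OR result unshifted is an assumption, since the text does not say how a zero mantissa is handled.

```python
def decode_exponent(exponent: int, min_exp: int, width: int) -> int:
    """Coded multibit output: the smallest mantissa that can be fully normalized.

    A single 1 sits (exponent - min_exp) places below the MSB. When that
    distance is at least the mantissa width, every mantissa can be fully
    normalized, so the output is zero.
    """
    room = exponent - min_exp  # maximum allowed decrease of the exponent
    return 1 << (width - 1 - room) if 0 <= room < width else 0


def left_most_one(value: int, width: int) -> int:
    """Position of the most significant 1, counted from the MSB (0 = MSB)."""
    return width - value.bit_length()


def normalize(mantissa: int, exponent: int, min_exp: int, width: int) -> tuple[int, int]:
    mask = (1 << width) - 1
    ored = mantissa | decode_exponent(exponent, min_exp, width)  # bit-wise OR
    if ored == 0:
        return mantissa, exponent  # zero mantissa: left unchanged (assumption)
    shift = left_most_one(ored, width)
    return (mantissa << shift) & mask, exponent - shift


print(normalize(0b00010110, 10, 0, 8))  # (176, 7): mantissa 0b10110000
print(normalize(0b00000101, 2, 0, 8))   # (20, 0): mantissa 0b00010100
```

In the second call the mantissa has five leading zeros but the exponent may fall by only two, so the decoded bit 0b00100000 is the left most one of the OR output and the exponent stops at the minimum.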
The exponent decoder generates the coded multibit output in the form of a
mantissa equal to the minimum mantissa that can be normalized for the
input exponent part of the floating point number. This minimum mantissa is
equal to $2^{M-N}$, where M is the minimum expressible exponent and N is
the exponent. If N-M is greater than the number of bits of the mantissa,
then the floating point number can always be normalized. In this case the
exponent decoder returns a zero output. In the preferred embodiment, the
exponent decoder includes a two to four line decoder for each pair of bits
of the exponent part of the floating point number, and an AND gate
connected to selected outputs of said two to four line decoders for each
bit of said mantissa.
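The preferred exponent decoder structure, one two to four line decoder per pair of exponent bits feeding one AND gate per mantissa bit, can be modeled at the gate level. This sketch assumes an even number of exponent bits `exp_bits`, a non-negative encoded minimum exponent `min_exp`, and that AND gate k is wired to the decoder outputs that spell out the single exponent code selecting mantissa bit k; the function names and the example parameters (8-bit exponent, 24-bit mantissa, minimum code 1) are hypothetical. The final line checks the gate model against a direct arithmetic definition of the coded output.

```python
def two_to_four(pair: int) -> list[int]:
    """One-hot outputs of a two to four line decoder driven by a 2-bit input."""
    return [1 if pair == i else 0 for i in range(4)]


def gate_decode(exponent: int, min_exp: int, width: int, exp_bits: int) -> int:
    """Exponent decoder built from 2-to-4 line decoders and one AND gate per mantissa bit."""
    decoders = [two_to_four((exponent >> (2 * p)) & 0b11) for p in range(exp_bits // 2)]
    out = 0
    for k in range(width):                  # k = mantissa bit, 0 = LSB
        target = min_exp + (width - 1 - k)  # the exponent code that selects bit k
        if target >= 1 << exp_bits:
            continue                        # no exponent code reaches this bit
        bit = 1
        for p, outputs in enumerate(decoders):
            bit &= outputs[(target >> (2 * p)) & 0b11]  # inputs of AND gate k
        out |= bit << k
    return out


def reference_decode(exponent: int, min_exp: int, width: int) -> int:
    """Direct definition: a 1 at (exponent - min_exp) places below the MSB, else 0."""
    room = exponent - min_exp
    return 1 << (width - 1 - room) if 0 <= room < width else 0


# Exhaustive check for an 8-bit exponent, 24-bit mantissa, minimum code 1.
assert all(gate_decode(e, 1, 24, 8) == reference_decode(e, 1, 24) for e in range(256))
```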
The new circuit eliminates the adder and magnitude comparator required
in the prior art. This significantly reduces the
evaluation time of this circuit. The longest path of the normalization
technique of this invention is the same length as the longest path of the
prior art normalization technique. However, the improved circuit is still
faster. This is because the OR function is significantly faster than the
adder and magnitude comparator of the prior art.
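Why the OR-based circuit can drop the adder and magnitude comparator can be seen by comparing two shift-amount calculations. The `prior_art_shift` function below is an assumed formulation of the conventional approach (count leading zeros, compute the available exponent range with a subtraction, then let a magnitude comparison select the smaller value); it is not taken from FIG. 14. The exhaustive check suggests that the bit-wise OR followed by left most one detection yields the same shift for every 8-bit mantissa, under the assumption that the exponent is never below the minimum.

```python
def prior_art_shift(mantissa: int, exponent: int, min_exp: int, width: int) -> int:
    leading_zeros = width - mantissa.bit_length()
    room = exponent - min_exp  # adder (subtraction)
    return leading_zeros if leading_zeros <= room else room  # magnitude comparator


def or_shift(mantissa: int, exponent: int, min_exp: int, width: int) -> int:
    room = exponent - min_exp
    coded = 1 << (width - 1 - room) if 0 <= room < width else 0  # exponent decoder
    return width - (mantissa | coded).bit_length()               # left most one detect


WIDTH, MIN_EXP = 8, 0
assert all(
    prior_art_shift(m, e, MIN_EXP, WIDTH) == or_shift(m, e, MIN_EXP, WIDTH)
    for m in range(1 << WIDTH)
    for e in range(MIN_EXP, MIN_EXP + 2 * WIDTH)
)
```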
BRIEF DESCRIPTION OF THE FIGURES
These and other aspects of the present invention are described below
together with the Figures, in which:
FIG. 1 illustrates the system architecture of an image processing system
such as would employ this invention;
FIG. 2 illustrates the architecture of a single integrated circuit
multiprocessor that forms the preferred embodiment of this invention;
FIG. 3 illustrates the architecture of the master processor in the
preferred embodiment of this invention;
FIGS. 4a and 4b illustrate the organization of the data registers of the
master processor;
FIGS. 5a to 5g illustrate examples of the manner of storage of single
precision data in the registers of the master processor;
FIGS. 6a and 6b illustrate examples of the manner of storage of double
precision data in the registers of the master processor;
FIG. 7 illustrates the control registers of the master processor;
FIG. 8 illustrates the integer pipeline operation of the master processor;
FIGS. 9a to 9c illustrate the instruction formats of the master processor;
FIG. 10 illustrates the floating point pipeline operation of the master
processor;
FIG. 11 illustrates the encoding of the floating point status register of
the master processor;
FIG. 12 illustrates the architecture of the floating point unit of the
master processor;
FIG. 13 illustrates the architecture of the floating point arithmetic logic
unit of the master processor;
FIG. 14 illustrates an example of a prior art floating point number
normalization circuit;
FIG. 15 illustrates an inventive embodiment of a floating point
normalization circuit;
FIG. 16 illustrates the construction of a few exemplary bits of the
exponent decode circuit of FIG. 15;
FIG. 17 illustrates the encoding of the vector floating point add
instruction of the master processor;
FIG. 18 illustrates the encoding of the vector multiply accumulate
instruction of the master processor;
FIG. 19 illustrates the encoding of the vector floating point multiply
instruction of the master processor;
FIG. 20 illustrates the encoding of the vector multiply subtract
instruction of the master processor;
FIG. 21 illustrates the encoding of the vector reverse subtract instruction
of the master processor;
FIG. 22 illustrates the encoding of the vector round floating point input
instruction of the master processor;
FIG. 23 illustrates the encoding of the vector round integer input
instruction of the master processor;
FIG. 24 illustrates the encoding of the vector floating point subtract
instruction of the master processor;
FIG. 25 illustrates the encoding of the vector load instruction of the
master processor;
FIG. 26 illustrates the encoding of the vector store instruction of the
master processor;
FIG. 27 illustrates the combined add and multiply floating point pipeline
operation of the master processor;
FIG. 28 illustrates an example embodiment of a high definition television
system; and
FIG. 29 illustrates an example of a color facsimile system including a
multiprocessor integrated circuit having a single digital image/graphics
processor.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
FIG. 1 is a block diagram of an image data processing system including a
multiprocessor integrated circuit constructed for image and graphics
processing according to this invention. This data processing system
includes a host processing system 1. Host processing system 1 provides the
data processing for the host system of the data processing system of FIG. 1.
Included in the host processing system 1 are a processor, at least one
input device, a long term storage device, a read only memory, a random
access memory and at least one host peripheral 2 coupled to a host system
bus. Because of its processing functions, the host processing system 1
controls the function of the image data processing system.
Multiprocessor integrated circuit 100 provides most of the data processing
including data manipulation and computation for image operations of the
image data processing system of FIG. 1. Multiprocessor integrated circuit
100 is bi-directionally coupled to an image system bus and communicates
with host processing system 1 by way of this image system bus. In the
arrangement of FIG. 1, multiprocessor integrated circuit 100 operates
independently from the host processing system 1. The multiprocessor
integrated circuit 100, however, is responsive to host processing system
1.
FIG. 1 illustrates two image systems. Imaging device 3 represents a
document scanner, charge coupled device scanner or video camera that
serves as an image input device. Imaging device 3 supplies this image to
image capture controller 4, which serves to digitize the image and form it
into raster scan frames. This frame capture process is controlled by
signals from multiprocessor integrated circuit 100. The thus formed image
frames are stored in video random access memory 5. Video random access
memory 5 may be accessed via the image system bus permitting data transfer
for image processing by multiprocessor integrated circuit 100.
The second image system drives a video display. Multiprocessor integrated
circuit 100 communicates with video random access memory 6 for
specification of a displayed image via a pixel map. Multiprocessor
integrated circuit 100 controls the image data stored in video random
access memory 6 via the image system bus. Data corresponding to this image
is recalled from video random access memory 6 and supplied to video
palette 7. Video palette 7 may transform this recalled data into another
color space, expand the number of bits per pixel and the like. This
conversion may be accomplished through a look-up table. Video palette 7
also generates the proper video signals to drive video display 8. If these
video signals are analog signals, then video palette 7 includes suitable
digital to analog conversion. The video level signal output from the video
palette 7 may include color, saturation, and brightness information.
Multiprocessor integrated circuit 100 controls data stored within the
video palette 7, thus controlling the data transformation process and the
timing of image frames. Multiprocessor integrated circuit 100 can control
the line length and the number of lines per frame of the video display
image, the synchronization, retrace, and blanking signals through control
of video palette 7. Significantly, multiprocessor integrated circuit 100
determines and controls where graphic display information is stored in the
video random access memory 6. Subsequently, during readout from the video
random access memory 6, multiprocessor integrated circuit 100 determines
the readout sequence from the video random access memory 6, the addresses
to be accessed, and control information needed to produce the desired
graphic image on video display 8.
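The look-up table conversion performed by video palette 7 can be sketched in software. This minimal example assumes 8-bit color index pixels and a 256-entry table of 24-bit RGB values packed as 0xRRGGBB; the function names and the packing are illustrative assumptions rather than details of video palette 7.

```python
def make_lut(entries: dict[int, tuple[int, int, int]], size: int = 256) -> list[int]:
    """Build a palette table mapping a color index to a packed 0xRRGGBB value."""
    lut = [0] * size  # unspecified entries default to black
    for index, (r, g, b) in entries.items():
        lut[index] = (r << 16) | (g << 8) | b
    return lut


def palette_lookup(pixels: bytes, lut: list[int]) -> list[int]:
    """Expand each 8-bit stored pixel to 24 bits by table look-up."""
    return [lut[p] for p in pixels]


lut = make_lut({1: (255, 0, 0), 2: (0, 0, 255)})
print([f"{v:06X}" for v in palette_lookup(bytes([1, 2, 0]), lut)])
# prints ['FF0000', '0000FF', '000000']
```

Rewriting table entries changes the displayed colors without touching the pixel data in video random access memory 6, which is one way that controlling the palette data controls the transformation.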
Video display 8 produces the specified video display for viewing by the
user. There are two widely used techniques. The first technique specifies
video data in terms of color, hue, brightness, and saturation for each
pixel. For the second technique, color levels of red, blue and green are
specified for each pixel. Video palette 7 and video display 8 are designed
and fabricated to be compatible with the selected technique.
FIG. 1 illustrates an additional memory 9 coupled to the image system bus.
This additional memory may include additional video random access memory,
dynamic random access memory, static random access memory or read only
memory. Multiprocessor integrated circuit 100 may be controlled either
wholly or partially by a program stored in the memory 9. This memory 9 may
also store various types of graphic image data. In addition,
multiprocessor integrated circuit 100 preferably includes memory interface
circuits for video random access memory, dynamic random access memory and
static random access memory. Thus a system could be constructed using
multiprocessor integrated circuit 100 without any video random access
memory 5 or 6.
FIG. 1 illustrates transceiver 16. Transceiver 16 provides translation and
bidirectional communication between the image system bus and a
communications channel. One example of a system employing transceiver 16
is video conferencing. The image data processing system illustrated in
FIG. 1 employs imaging device 3 and image capture controller 4 to form a
video image of persons at a first location. Multiprocessor integrated
circuit 100 provides video compression and transmits the compressed video
signal to a similar image data processing system at another location via
transceiver 16 and the communications channel. Transceiver 16 receives a
similarly compressed video signal from the remote image data processing
system via the communications channel. Multiprocessor integrated circuit
100 decompresses this received signal and controls video random access
memory 6 and video palette 7 to display the corresponding decompressed
video signal on video display 8. Note this is not the only example where
the image data processing system employs transceiver 16. Also note that
the bidirectional communications need not be the same type signals. For
example, in an interactive cable television system the cable system head
end would transmit compressed video signals to the image data processing
system via the communications channel. The image data processing system
could transmit control and data signals back to the cable system head end
via transceiver 16 and the communications channel.
FIG. 1 illustrates multiprocessor integrated circuit 100 embodied in a
system including host processing system 1. Those skilled in the art would
realize from the disclosure of the preferred embodiments of the invention
that multiprocessor integrated circuit 100 may also be employed as the
only processor of a useful system. In such a system multiprocessor
integrated circuit 100 is programmed to perform all the functions of the
system.
This multiprocessor integrated circuit 100 is particularly useful in
systems used for image processing. Multiprocessor integrated circuit 100
preferably includes plural identical processors. Each of these processors
will be called a digital image/graphics processor. This description is a
matter of convenience only. The processor embodying this invention can be
a processor separately fabricated on a single integrated circuit or a
plurality of integrated circuits. If embodied on a single integrated
circuit, this single integrated circuit may optionally also include read
only memory and random access memory used by the digital image/graphics
processor.
FIG. 2 illustrates the architecture of the multiprocessor integrated
circuit 100. Multiprocessor integrated circuit 100 includes: two random
access memories 10 and 20, each of which is divided into plural sections;
crossbar 50; master processor 60; digital image/graphics processors 71,
72, 73 and 74; transfer controller 80, which mediates access to system
memory; and frame controller 90, which can control access to independent
first and second image memories. Multiprocessor integrated circuit 100
provides a high degree of operation parallelism, which will be useful in
image processing and graphics operations, such as in multi-media
computing. Since there are computing applications other than image and
graphics processing where these processors will be useful, reference to
processors 71, 72, 73 and 74 as image/graphics processors is for
convenience only.
Multiprocessor integrated circuit 100 includes two random access memories.
Random access memory 10 is primarily devoted to master processor 60. It
includes two instruction cache memories 11 and 12, two data cache memories
13 and 14 and a parameter memory 15. These memory sections can be
physically identical, but connected and used differently. Random access
memory 20 may be accessed by master processor 60 and each of the digital
image/graphics processors 71, 72, 73 and 74. Each digital image/graphics
processor 71, 72, 73 and 74 has five corresponding memory sections. These
include an instruction cache memory, three data memories and one parameter
memory. Thus digital image/graphics processor 71 has corresponding
instruction cache memory 21, data memories 22, 23, 24 and parameter memory
25; digital image/graphics processor 72 has corresponding instruction
cache memory 26, data memories 27, 28, 29 and parameter memory 30; digital
image/graphics processor 73 has corresponding instruction cache memory 31,
data memories 32, 33, 34 and parameter memory 35; and digital
image/graphics processor 74 has corresponding instruction cache memory 36,
data memories 37, 38, 39 and parameter memory 40. Like the sections of
random access memory 10, these memory sections can be physically identical
but connected and used differently. Each of these memory sections of
memories 10 and 20 includes 2K bytes for example, with a total memory
within multiprocessor integrated circuit 100 of 50K bytes.
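The 50K byte total follows from the section counts given above. A short worked check, using the 2K byte section size that the text offers only as an example:

```python
SECTION_BYTES = 2 * 1024  # example section size from the text

ram10_sections = 2 + 2 + 1          # instruction caches 11, 12; data caches 13, 14; parameter memory 15
per_processor_sections = 1 + 3 + 1  # instruction cache, three data memories, parameter memory
ram20_sections = 4 * per_processor_sections  # digital image/graphics processors 71 to 74

total_sections = ram10_sections + ram20_sections
total_bytes = total_sections * SECTION_BYTES
print(total_sections, total_bytes // 1024)  # prints 25 50
```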
Multiprocessor integrated circuit 100 is constructed to provide a high rate
of data transfer between processors and memory using plural independent
parallel data transfers. Crossbar 50 enables these data transfers. Each
digital image/graphics processor 71, 72, 73 and 74 has three memory ports
that may operate simultaneously each cycle. An instruction port (I) may
fetch 64 bit instruction words from the corresponding instruction cache. A
local data port (L) may read a 32 bit data word from or write a 32 bit
data word into the data memories or the parameter memory corresponding to
that digital image/graphics processor. A global data port (G) may read a
32 bit data word from or write a 32 bit data word into any of the data
memories or the parameter memories of random access memory 20. Master
processor 60 includes two memory ports. An instruction port (I) may fetch
a 32 bit instruction word from either of the instruction caches 11 and 12.
A data port (C) may read a 32 bit data word from or write a 32 bit data
word into data caches 13 or 14, parameter memory 15 of random access
memory 10 or any of the data memories, the parameter memories of random
access memory 20. Transfer controller 80 can access any of the sections of
random access memory 10 or 20 via data port (C). Thus fifteen parallel
memory accesses may be requested at any single memory cycle. Random access
memories 10 and 20 are divided into 25 memories in order to support so
many parallel accesses.
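The count of fifteen parallel memory accesses per cycle can be tallied from the ports listed above, as a simple worked check:

```python
ports_per_cycle = {
    "digital image/graphics processors 71 to 74 (I, L, G each)": 4 * 3,
    "master processor 60 (I, C)": 2,
    "transfer controller 80": 1,
}
total_requests = sum(ports_per_cycle.values())
print(total_requests)  # prints 15
```

With 25 separately accessible sections, there are more sections than possible requests, so on one reading all fifteen requests can proceed together when they address different sections; how same-section conflicts are resolved is not described in this passage.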
Crossbar 50 controls the connections of master processor 60, digital
image/graphics processors 71, 72, 73 and 74, and transfer controller 80
with memories 10 and 20. Crossbar 50 includes a plurality of crosspoints
51 disposed in rows and columns. Each column of crosspoints 51 corresponds
to a si