|
Description  |
|
|
BACKGROUND OF THE INVENTION
This invention relates generally to addressing a memory of a digital
system, and in particular to the method and apparatus for addressing a
memory by a set of parameters which specify an addressing sequence within
the memory for data arrays.
In many digital processing systems, specially programmed units control the
ordering of data access from memories while special purpose interfacing
units link up components with different data formatting requirements. The
proliferation of special purpose units results in inefficiencies, causing
high system development costs, long development times, high programming
costs, and high system maintenance costs.
The arithmetic unit (AU) in prior processors usually assisted the processor
control unit to sequence data items to the arithmetic section or transform
data items into a form appropriate for an operation to be performed; this
not only increased the complexity of the control unit, but also
interrupted data processing, causing a reduction in processor efficiency.
In addition, often the AU is idle while the next instruction is being
interpreted. It is desirable to continuously control formatting operations
over related data items, like arrays, and to let the AU perform continuous
AU functions.
Sometimes special purpose instructions are implemented in digital signal
processors to facilitate performing vectormatrix mathematics. Generally, a
series of instructions are required to perform a signal processing
algorithm using the available special purpose instructions and other
instructions for correctly indexing and dimensioning arrays. A higher
order language that eliminates the need for ancillary parameters to index
and dimension arrays is highly desirable, especially when the hardware
required to implement such a language is not prohibitive.
A digital system is provided for accessing a multidimensional array of data
in accordance with parameters of an array transformation. A method of
generating addressing sequences for accessing a multidimensional array of
data in a digital system is provided comprising the steps of interpreting
instruction commands, performing arithmetic operations based on the
instruction commands, storing data used in the arithmetic operations in a
memory means, generating addressing sequences specified by an array
transformation of a high-level programming language in accordance with a
nested series of a plurality of parameters of the array transformation,
for serially accessing all elements of the data array, and transferring
the data between the memory means and arithmetic means performing the
arithmetic operations as specified by the array transformation.
In accordance with the present invention a digital system is provided
comprising means for interpreting instruction commands, means for
performing arithmetic operations based on the instruction commands, means
for storing a multidimensional data array for the arithmetic operations,
the storing means comprising means for generating a plurality of
addressing sequences in response to an array transformation of a
high-level programming language for serially accessing all elements of the
data array, and means for transferring the data between the storing means
and the arithmetic operations performing means in accordance with the
array transformation.
In accordance with a further feature of the invention a memory is provided
comprising, means for storing a multidimensional data array, a plurality
of read/write port means, the port means comprises addressing means for
transferring all elements of the data array to and from a bus means in
accordance with serial addressing sequences specified by an array
transformation of a high-level programming language, the addressing means
comprises means for generating a plurality of multidimensional indices
specified by the array transformation, and switching network and
arbitration means coupled between the storing means and the port means for
routing data transfers between the storing means and the port means.
BRIEF DESCRIPTION OF THE DRAWINGS
The above-mentioned aspects and other features of the invention are
explained more fully in the following description taken in connection with
the accompanying drawings in which:
FIG. 1 is a block diagram of a Macro Function Signal Processor (MFSP)
utilizing an intelligent memory device of the present invention.
FIG. 2 is a block diagram of an intelligent memory and its interfacing
busses.
FIG. 3 is a block diagram of the invention comprising an intelligent memory
port.
FIG. 4A and FIG. 4B show the control word formats for specifying an array
transformation.
FIGS. 5A-5E show the sequences of factored addressing for transposing an
array specified by an array transformation.
FIG. 6 illustrates three boundary modes and functions when an address
generator displacement encounters the right edge of a matrix.
FIG. 7 illustrates directions of i, j and k displacements on a I; J; K
block.
FIG. 8 shows some initial points within a block for wrap-around boundary.
FIG. 9 shows some initial points in the zero-fill boundary mode.
FIG. 10 is a block diagram of the address generator of the present
invention.
FIG. 11 shows the location of the array transformation parameters within
the upper and lower matrix access chips.
FIG. 12 is a block diagram of one of the matrix access chips in the indices
generator of the address generator.
FIG. 13 is a block diagram of one of the index generators in the matrix
access chip.
FIG. 14 is a block diagram of one of the length counters in the matrix
access chip.
FIG. 15 illustrates a matrix having row and column boundaries and
references zones outside said boundaries.
FIG. 16 shows one illustrative sequence of elements generated by a matrix
access chip to define an output shape.
FIG. 17 illustrates index adjustment from outside array boundaries.
FIG. 18 shows a Digital Fourier Transform coefficient matrix illustrating
that the exponent of w is the product of the row and column index.
DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring to FIG. 1, there is shown a block diagram of a Macro Function
Signal Processor (MFSP) 10 illustrating an overall system in which an
intelligent memory 12 may be used comprising an array transformation
address generator implementation of the present invention. More
particularly, the intelligent memory 12 may have a plurality of
intelligent ports 14-22, although in the MFSP 10 embodiment described
herein two of the ports 14 and 22 are simply serving as direct memory
access (DMA) ports known to one skilled in the art. Port1 16, port2 18 and
port3 20 are the intelligent ports in the system of FIG. 1, primarily
because of their ability to execute addressing sequences based on an array
transformation operator while completely hiding data attribute
considerations from an arithmetic unit 38.
In addition to the intelligent memory 12, the MFSP 10 comprises the
arithmetic unit 38, a control processor 32, a node control unit 24, a
System I/O unit 40, an I Bus 26, an S Bus 28 and an A Bus 30. Intelligent
port2 18 and intelligent port3 20 of the intelligent memory 12 each have
32-bit direct connections to inputs of the arithmetic unit 38 and a 64-bit
output of the arithmetic unit 38 connecting to intelligent port1 16,
thereby providing the means for streaming data to and from the arithmetic
unit 33. The A bus 30 not only interconnects the three intelligent ports
16-20, but in addition serves as the MFSP 10 internal control bus
interconnecting a 2-port RAM 34 of control processor 32 and the arithmetic
unit 38. The control processor 32 comprises the 2-port RAM 34 and a
command interpreter 36 for interpreting instructions and setting up the
intelligent memory 12 and the arithmetic unit 38 for execution of said
instructions. The system bus, S Bus 28, interconnects a plurality of units
including the node control unit 24, port0 14, port4 22, 2-port RAM 34,
command interpreter 36, arithmetic unit 38 and system I/O unit 40. DMA
port0 14 is connected to the node control unit 24 via the I Bus 26. The
node control unit 24 provides an interface between networks of high speed
busses in a distributed multiprocessor and a processor or other device
such as the MFSP 10. DMA port4 24 has a direct 32-bit connection to the
system I/O unit 40 for system application I/O information transfer.
Referring now to FIG. 2, there is shown a block diagram of the intelligent
memory 12 comprising five memory banks 52-60, an arbitration and switching
network 62 and a plurality of ports 14-22, including the three intelligent
ports 16-20 and two DMA ports 14 and 22. The arbitration and switching
network 62 directs the flow of data between the ports 14-22 and the memory
banks 52-60; it provides an 88-bit wide, 5.times.5 crossbar switch with
arbitration logic for the five inputs (ports) and five outputs (memory
banks). Although a particular embodiment of the intelligent memory is
described here for the MFSP 10, the invention is not limited to a specific
number of memory banks nor to a specific number of intelligent ports nor
to a specific size crossbar switch.
Still referring to FIG. 2, the 5.times.5 crossbar handles 64 bits for data,
24 bits for address and control from each of the ports 14-22. The
arbitration logic resolves conflicts between the ports 14-22. If more than
one port requests the same bank of memory, the arbitration logic holds off
the lower (fixed) priority port until the higher priority port completes
its transfer. Arbitration is performed on a cycle by cycle basis. Four of
the memory banks, bank0 52, bank1 54, bank2 56 and bank3 58 are random
access memories (RAM), each organized as 64K words by 64 bits. Memory
bank4 60 is a read-only memory (ROM) organized as 16K words by 64 bits.
Two of the RAM memory banks 52, 58 are used primarily by port2 18 and
port3 20 which are primarily used as READ ports. The other two RAM memory
banks are scratch pad areas of memory and as such are used primarily for
storing intermediate values via port1 16 which is primarily used as a
WRITE port. ROM memory bank 60 stores constants and approximation tables
for use during the operation of various macro functions. The DMA ports 14
and 22 act as an interface between the I Bus 26 or the system I/O unit 40
and the intelligent memory 12 for the purpose of transferring large blocks
of data; the blocks of transferred data reside in consecutive locations in
the intelligent memory 12. The DMA ports 14 and 22 also interface with the
S Bus 28 for control and status, the I Bus 26 or the system I/O unit 40
for transfer block data accesses external to the intelligent memory 12 and
to the arbitration and switching network for transfer block data accesses
to the intelligent memory 12.
The intelligent ports, port1 16, port2 18 and port3 20, have independent
controls to address and format each data element. Each port's setup
parameters describe the shape of the packed data, where it is to be found
(i.e. base address), and the method of access (i.e. read/write,
transposed, reversed, etc.). When a port is started, it begins accessing
the first element of the described data and continues until all data is
read or written regardless of errors. The read ports and write ports are
identical except for port ID bits for determining the port's function.
Referring now to FIG. 3, there is shown a block diagram of one of the
intelligent memory ports 16-20 comprising an address generator 100, a
memory controller 102 and a data formatter 104. The address generator 100
produces addresses to access a data array in memory banks 59 so that the
data array can appear in various convenient shapes for arithmetic unit 38
manipulation. The data formatter 104 acts as a data translator between the
memory banks 59 and the arithmetic unit 38. Data is packed into 64-bit
words in the memo bank 59. Packed data is unpacked and left justified in
the data formatter 104 prior to its use in arithmetic unit 38. The memory
controller 102 provides control for the address generator 100 and the data
formatter 104, as well as providing (control line) interfaces to the
arithmetic unit 38. In addition, the memory controller 102 is coupled to
the arbitration and switching network 62 and it initiates and controls all
memory accesses of an intelligent memory port.
The intelligent memory 12 requires a small set of parameters to execute any
addressing sequence required by a high-level signal processing language
such as a Macro Function Language (MFL) described hereinbelow. These
parameters have been integrated into a single addressing operator called
an array transformation that directly specifies the hardware control
parameters from the signal processing language syntax. The address
generator 100 implements the address functions specified by the array
transformation. A pair of 16-bit control words, the displacement control
word 80 and length control word 90, as shown in FIGS. 4A and 4B and
described hereinbelow, contain array transformation parameters and
initialize address registers within the address generator 100 of any one
of the intelligent ports 16-20 which then proceeds to execute memory
address sequences specified by the array transformation.
Prior to describing further the structure and operation of the invention in
conjunction with the drawings, it is necessary at this point to describe
the array transformation operator and certain aspects of the language used
to specify the parameters in the array transformation in order to
understand the invention. The address generator 100 of the intelligent
memory 12 as previously noted is functionally specified by the array
transformation operator. The word "operator" is used here in the general
mathematical sense of an entity that transforms an input into an output
according to the definition of the operator. The input and output are both
arrays, hence the name "array transformation." This operator describes
array addressing in terms of a factored series of nested addressing
sequences. An array transformation comprises ten parameters for specifying
an operation and has the following syntax which is described below:
______________________________________
[.DELTA.4 .DELTA.3 .DELTA.2 .DELTA.1 .vertline. .DELTA.0]
[L4 L3 L2 L1 .vertline. B]
______________________________________
The language syntax corresponds directly to the parameters required to
initialize the address generator 100 component in the intelligent memory
12. As a result, the mathematical definition of the array transformation
operator serves as the hardware definition of the address generator 100.
An intelligent memory 12 becomes possible when the technique of instruction
factoring is used in conjunction with the separation of data parameters
from the processing program. Instructions are factored into control
operators, variable functions, array modifiers, and operands. Each of
these with the exception of functions has a significant effect upon memory
operation. Control operators act as an addressing control mode to
determine the sequence(s) of applying operand-data to the arithmetic unit
38. Control operators specify relationships between array transformations
such as length parameters. Variable functions specify the arithmetic and
logical operations to be performed on the elements fetched from memory.
Array modifiers alter the normal addressing mode specified by the control
operator in use. Operands refer to the specific data to be used.
When these instruction parameters are intentionally separated from the
parameters of data, the data parameters can be maintained in a data
descriptor. A data descriptor would consist of a collection of information
describing the variable operands; such as data type, format and location.
At run-time, a program references an operand through the variable's
descriptor. Dynamic changes in data "shape" can be handled with no changes
affecting the program. One important requirement of an intelligent memory
is the treatment of an entire variable data array as a single operand.
Consequently, the location of the data is determined by a base address or
an initial reference point which references the initial element of the
array of data.
Signal processing algorithms are conveniently expressed in the language of
matrix mathematics. For this reason, MFL is an array-oriented language.
Most variables in MFL programs denote many elements of related data to be
treated as single entities. Most operations are defined directly on arrays
without requiring item-by-item statements. MFL arrays take the form of
vectors, matrices and blocks. Referencing an individual element of an
array requires one, two or three numbers called indices to mark its
position in the array.
A vector is an array whose elements are selected by a single index. In
other words, a vector has one coordinate and is considered to be a
collection of elements arranged in a line. The number of elements in this
line is called the length of the vector. A null vector is a vector
containing no elements. The length of a null vector is zero.
A matrix is an array whose elements are selected by two indices. It has two
coordinates and is considered to be a collection of elements arranged in a
rectangle. The number of elements in each row is called the row length.
The position of an element along the row coordinate is called the row
position or the column number. The number of elements in each column is
called the column length. The position of an element along the column
coordinate is called the column position or row number. Together, the row
length I and column length J constitute the "shape" of the matrix. The
shape is written J;I.
In some applications the matrix has a direct correspondence to some
physical reality such as the data derived from an array of pixels which
represent a planar image. In these cases the properties of a matrix are
directly applicable to the processing. In other cases the matrix is simply
a convenience for purposes of processing. In many signal processing
operations, a matrix is merely a collection of vectors. The shape is
consistent with the number of row vectors and the length of each vector
(which is equivalent to the number of columns). The processing requires an
iterative use of the vector set using and modifying one vector at a time.
Consequently the usual interpretation of an array modifier is to apply it
to each row vector rather than to the entire array or matrix. Of course it
is sometimes necessary to modify the complete array as well. In this case
the array can be considered as a single long vector.
A block is an array whose elements are selected by three indices. It has
three coordinates and is considered to be a collection of elements
arranged in a set of matrices. A block uses the row and column terminology
defined previously for matrices. In addition, the number of matrices in
the block K is called the depth of the block. The position of an element
along this coordinate is called the depth position or the matrix number.
Together, the row length, column length and depth constitute the shape of
the block. The shape of the block is written K;J;I.
To illustrate the concept of factored addressing, the following example
qualitatively describes array transposition in terms of factored
addressing. When a matrix in row-major order is read by columns instead,
the output will be a matrix with rows consisting of columns from the input
array. The output array is of rank two, implying that the procedure must
require two displacement sequences.
Referring now to FIG. 5, let I equal the length of rows in the input array
and let J equal the length of columns. The array transformation is
described by the following sequences:
(0) Start at the upper left corner of the array.
(1) Move down one point in the column direction J-1 times to define a line
of J points. Return to the point before the first displacement in the
column direction.
(2) Move across one point in the row direction and repeat step (1) I-1
times to define an I.times.J matrix.
The sequences illustrated in the preceding example may be generalized into
a procedural definition of the array transformation:
(0) Go to the initial point delta 0 (.DELTA.0).
(1) Move by displacement delta 1 (.DELTA.1 a total of L1-1 times to define
a line of L1 points. Return to the point before the first displacement of
this sequence.
(2) Move by displacement delta 2 (.DELTA.2) and repeat step one L2-1 times
to define a matrix of L2.times.L1 points. Return to the point before the
first displacement of this sequence.
(3) Move by displacement delta 3 (.DELTA.3) and repeat steps one and two
L3-1 times to define a block of L3.times.L2.times.L1 points. Return to the
point before the first displacement of this sequence.
(4) Move by displacement delta 4 (.DELTA.4) and repeat steps one, two and
three L4-1 times and stop.
The result is a set of L4 blocks of L3.times.L2.times.L1 points. The rank
of the output array is equal to the number of displacement sequences
required to generate it. With the above definition, output arrays of up to
rank four are possible. Each nested sequence corresponds to a separate
hardware circuit. When necessary, more sequences may be added to the
definition to produce shapes of higher rank. In a Macro Function Language
(MFL), an array transformation on an input array C is specified by ten
parameters written directly below the name of the array and its data
descriptor according to the following syntax:
______________________________________
______________________________________
C16 10;20
[.DELTA.4 .DELTA.3 .DELTA.2 .DELTA.1 .vertline. .DELTA.0]
[L4 L3 L2 L1 .vertline. B]
______________________________________
The parameters fall into three categories: displacements (.DELTA.),
lengths.(L) and boundary modes (B).
Referring now to FIG. 6, the boundary mode (B) parameter of an array
transformation sets a geometrical context for the displacements and
lengths. The boundary mode determines the action of the address generator
100 if a given displacement results in an address that falls outside the
boundaries of the array. The boundary modes are as follows:
W=wrap-around. When a boundary is encountered by a displacement, the
address generator continues to read on the other side of the array. For
example, when reading from left to right along a row, the address
generator 100 moves back to the left-most element of the row if the right
edge is encountered.
Z=zero-fill. All points outside the array are assumed to be zero for a read
port and valid data from the AU 38 is dropped for a write port.
I=ignore boundaries. This suffix may be appended to either zero-fill or
wrap-around. In this case, all boundaries except the last point in the
array are ignored. So wrap-around with this suffix moves the address
pointer back to the head of the array, whereas zero-fill keeps the pointer
going.
FIG. 6 shows the effect of each boundary mode on an address generator
displacement encountering the right edge of a matrix.
Displacements of an array transformation are iteratively added to an
initial point to generate addresses in a regular sequence. The
displacements are defined as pairs of numbers representing generalized
spacing on a two-dimensional surface, thereby facilitating detection of
the endpoints of each row and column of a matrix. This representation is
appropriate for most signal processing macros. If desired, the concept may
be extended to n-tuples for general displacements on an n-dimensional
array. Displacements on arrays may be written as complex numbers where the
imaginary part is the displacement in the row direction and the real part
is the displacement in the column direction. Real number displacements are
in the row direction with no displacement in the column direction. Either
form may be used with vectors, matrices and blocks. Symbols may also be
used. As shown in FIG. 7, "i" defines a unit displacement to the right
across the row direction, "j" defines a unit displacement down the column
direction, and "k" defines a unit displacement into the depth direction.
Displacements by multiples of either i, j or k are indicated by preceding
the symbol with the size of the displacement. For example, -5j denotes a
displacement by five points in the negative column direction.
A general displacement through a block requires a triplet of numbers in a
particular form (e.g. depth; column; row). For most applications, the k
direction does not require the same level of flexibility given to the j
and i directions. Data formed into a three-dimensional block usually can
be treated as a set of matrices. In these instances, two-dimensional
displacements on each matrix along with sequential accessing of each
matrix in the block are sufficient. As a result, displacement triplets
through a block are not always directly supported in hardware. However,
devices implementing larger n-tuples might be advantageous for some
applications. A variable displacement value may be stored in a co-operand
to the array transformation. Presence of an explicit value in the
co-operand is indicated by a "d" symbol in the corresponding displacement
of the array transformation. The "d" is used for user-defined variable,
and may also be defined by a translator in order to insert a non-coded
constant into the appropriate register of the address generator 100. The
displacements notation of an array transformation along with their
interpretation in terms of .DELTA.1 through .DELTA.4 are summarized as
follows:
Xi=Moves across the row X points to the right.
Yj=Move Y points down the column.
Zk=Move Z points into the depth.
0=Do not displace--repeat the previous sequence.
d=The displacement is contained in a variable co-operand to the array
transformation.
b;a=Move "a" points along the row and "b" points down the column.
c;b;a=Move "a" points along the row, "b" points down the column, and "c"
points into the depth.
The instruction loading time is reduced by assigning codes to the most
frequently required displacements. One possibility is to choose three-bit
codes for .+-.i, .+-.j, .+-.k, or 0. The eighth code is for "d", to
signify that the displacement value is a separate complex number sent by
the command interpreter 36. All variable displacements and constants not
equal to .+-.i, .+-.j, .+-.k, or 0 are sent to the address generator 100
as "d"'s. In a two-dimensional address generator, each "d" is replaced
with a complex number denoting a general displacement in the combined i
and j directions. If either the real or imaginary part is zero, the
displacement is solely in the i or j direction respectively. "k"
displacements of the form Zk are specified as Z x Jj, on an array reshaped
to (K.times.J);I. Hardware directly supporting the full three-dimensional
representation would require "d"'s in the form of triplets. With the
three-bit codes, four displacements and the initial point can be placed
into a single 16 bit word as shown in FIG. 4A.
Referring now to FIG. 8 and FIG. 9, the initial point is a displacement
from 0;0;0, the upper left corner of the array. For negative values, its
location depends on the boundary mode. FIGS. 8 and 9 compare the different
locations of some initial points for wrap-around and zero-fill. For
wrap-around, the displacements to a few of the corners of a block are as
follows:
-i=upper right corner
-j=lower left corner
-k=upper left deep corner
-1;-1=lower right corner
0=upper left corner
For zero-fill, all of these points are clustered about the upper left
corner of the array as shown in FIG. 9. The zero-fill mode interprets the
array as having infinite extent with elements outside the K;J;I shape set
to zero.
Lengths of an array transformation are real integers indicating the number
of times a displacement is performed and the resulting shape of the array.
Several mnemonics are provided to specify output lengths in terms of the
input shape of the array. A capital letter "K" indicates the depth, "J"
indicates the column length and "I" indicates the row length of the input
array. Other numeric lengths must be written explicitly in the
transformation if they are constant and in a co-operand if they are
variable in a form similar to explicit displacements.
For most numerical applications, two ports simultaneously access data to
create a pair of input streams for a dyadic function. If a displacement
must occur as many times as necessary to match the corresponding
displacements of the other input argument to a dyadic function, the number
"1" is written. This symbol may be overwritten at run time by the command
interpreter 36 with the appropriate length to match the other argument's
array transformation or port1 16 and port2 18 can monitor the
corresponding port length to determine the replacement length dynamically.
If the two array transformations have corresponding lengths equal to one,
no displacement occurs for the sequence. This is how a control operator
becomes two coupled array transformations.
A displacement may be specified to continue until a boundary of the array
is encountered. The length S, meaning "stop on boundary," is used for this
case. In wrap-around mode, if length m is set to S, displacements by
.DELTA.m continue until a .DELTA.m encounters a boundary. In zero-fill
mode, displacements by .DELTA.m continue until any lower level
displacement encounters a boundary.
When stop on boundary is used as a length parameter of a port addressing
one of the input data streams of a dyadic function, the corresponding
length in the other port must be "1", meaning "repeat until the other
array stops on its length." The following summarizes the array
transformation length symbols:
K=depth of input array
J=column length of input array
I=row length of input array
d=the length is contained in a variable
1=repeat as necessary to match a corresponding output array shape
S=stop displacing when a boundary is encountered
These six lengths are specified in terms of three-bit codes similar to
displacements. The entire length field and boundary mode occupy a single
16 bit word as shown in FIG. 4B.
The following examples show a few simple forms of array operations possible
with array transformations. Each example is given in terms of a Macro
Function Language (MFL) syntax. The array transformation is applied to an
array B or C of some sample shape and type contained in its data
descriptor. Below the array transformation is written the data descriptor
of the output array from the transformation. The parentheses enclosing
this information indicate that it is a response from th | | |