Description
FIELD OF THE INVENTION
The present invention relates to direct memory access in a data processing system, and specifically to controlling direct memory access using a user-programmable algorithm.
BACKGROUND OF THE INVENTION
Direct Memory Access (DMA) controllers are used in computer systems to offload repetitive data movement tasks from a processor in a data processing system. As the demand for increased performance of the processor, or central processing unit
(CPU), increases, so does the need for high-throughput, flexible DMAs that work well with these processors. Original DMA controllers (DMACs) used only registers or memory storage devices to specify source, destination, and length of data to be
transferred. The DMAC was coupled to only one source device. Soon the need to carry out simultaneous block transfers led to the development of multi-channel DMACs that achieved the effect of performing several data movements simultaneously. As data
transfer rates continued to increase, the setup, service, and interrupt overhead for the DMACs became too high, especially when the DMAC was programmed for a single contiguous block of memory per interrupt.
To overcome these overhead issues, descriptor-based DMACs were introduced. As computer system complexity increased, so did the complexity of the DMACs. Today, some DMACs use a dedicated processor to perform such complex functions. The
dedicated processor, or coprocessor, is often based on a reduced instruction set computer (RISC) methodology. Such coprocessors operate on increasingly complex protocols, and often provide algorithmic support, such as digital filtering operations. The
algorithmic support is critical to many applications where data movement and calculation rates are high. This is particularly true of entertainment applications, such as video, graphic and audio applications, and is also important in areas such as audio
and visual decompression calculations. While the need for flexible algorithmic manipulation of data by the DMAC increases, the coprocessor becomes less attractive as it operates on a data-structure descriptor architecture which has limited flexibility
and it cannot achieve the high performance of the dedicated state machine of a traditional DMAC.
Therefore, there is a need for a DMAC that provides algorithmic support using descriptors that define DMA algorithms instead of data structures. Additionally, there is a need for a flexible method of programming a DMAC with simple building
blocks. Further, a need exists for a method of programming a DMAC that allows easy expansion for additional and complex data manipulations.
Still further, as imaging and entertainment applications continue to move and manipulate large amounts of data in a variety of ways, there is a need to allow the user to specify the functions done in a DMA, and a further need to increase the
throughput capabilities of a DMA to accommodate the ever increasing data sizes.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention may be more fully understood by a description of certain preferred embodiments in conjunction with the attached drawings in which:
FIGS. 1-2 illustrate, in block diagram form, prior art data processing systems having DMA controllers.
FIG. 3 illustrates, in block diagram form, a data processing system having a DMA controller in accordance with one embodiment of the present invention.
FIG. 4 illustrates, in block diagram form, a DMA controller as in FIG. 3 in accordance with one embodiment of the present invention.
FIG. 5 illustrates, in state diagram form, execution of operations within a DMA controller as in FIGS. 3-4 in accordance with one embodiment of the present invention.
FIGS. 6-10 illustrate descriptors within a DMA controller as in FIGS. 3-4 in accordance with one embodiment of the present invention.
FIG. 11 illustrates, in state diagram form, operation of a master DMA engine within a DMA controller as in FIGS. 3-4 in accordance with one embodiment of the present invention.
FIGS. 12-13 illustrate tables and pointers within a DMA controller as in FIGS. 3-4 in accordance with one embodiment of the present invention.
FIG. 14 illustrates, in block diagram and logical form, priority assignment of requesters within a DMA controller as in FIGS. 3-4 in accordance with one embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
For clarity the terms assertion and negation are used herein to avoid confusion regarding "active-low" and "active-high" signals. The term assert or assertion is used to indicate that a signal is active or true, independent of whether that level
is represented by a high or a low voltage. The term negate or negation is used to indicate that a signal is inactive or false.
In one aspect of the present invention, in a data processing system, comprising a processor and a memory coupled to the processor, a data movement engine (DME) is adapted to directly execute FOR tasks assigned by the processor, said task
including a movement of a data element from a first location in said memory to a second location in said memory. The DME includes an execution unit (EU) adapted to perform a selected one of an arithmetic operation and a logical operation, and a FOR task
controller adapted to perform said data movement and to select, in response to said FOR task, one of the arithmetic operation and the logical operation to be performed by the EU on said data element. The DME directly executes FOR tasks by a state
machine within the DME. The FOR tasks describe functions to be performed by an EU and are made up solely of C language style FOR loops. With these FOR loops, the FOR tasks may perform any number of functions including arithmetic and logical functions
for digital signal processing (DSP) operations like filtering, or various aspects of data communication protocols.
In another aspect of the present invention, in a data processing system, having a processor and a memory coupled to the processor, a method for moving data includes the steps of: the processor assigning a FOR task, the FOR task including a
movement of a data element from a first memory location to a second memory location; retrieving the data element from the first memory location; selecting one of an arithmetic operation and a logical operation to perform on the data element; performing
the selected one of the arithmetic operation and the logical operation on the data element; and storing a result of the selected one of the arithmetic operation and the logical operation on the data element to the second memory location.
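The method steps above (retrieve, select, perform, store) can be sketched in ordinary C. This is only an illustration under stated assumptions: the names eu_op_t, eu_apply, and for_task_move are hypothetical, and the choice of ADD and AND as the arithmetic and logical operations is an assumption, since the text does not fix a particular pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation selector: one arithmetic and one logical operation. */
typedef enum { OP_ADD, OP_AND } eu_op_t;

/* Models the execution unit performing the selected operation. */
uint32_t eu_apply(eu_op_t op, uint32_t data, uint32_t operand)
{
    return (op == OP_ADD) ? (data + operand) : (data & operand);
}

/* Moves n elements from src to dst, applying the selected operation in flight. */
void for_task_move(const uint32_t *src, uint32_t *dst, size_t n,
                   eu_op_t op, uint32_t operand)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t element = src[i];                        /* retrieve from first location */
        uint32_t result = eu_apply(op, element, operand); /* select and perform operation */
        dst[i] = result;                                  /* store to second location */
    }
}
```

Calling for_task_move with OP_ADD and an operand of 10 on the elements {1, 2, 255} would store {11, 12, 265} at the destination.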
In still another aspect of the present invention, in a data processing system, comprising a processor and a memory coupled to the processor, a data movement engine (DME) is adapted to directly execute tasks assigned by the processor, said task
including a movement of a data element from a first location in said memory to a second location in said memory. The DME includes a task controller adapted to perform said data movement and to select, in response to said task, one of the arithmetic
operation and the logical operation to be performed by the EU on said data element; and a priority selection unit adapted to select a requester from a plurality of requesters.
In one embodiment of the present invention, a data processing system has a processor, an execution unit, and a memory storage location for storing an instruction descriptor, where the instruction descriptor includes: a first field
identifying an operation code; a second field identifying a first operand set corresponding to the operation code; and a third field identifying a second operand set corresponding to the operation code. In one embodiment of the invention, the operand sets are
stored in successive storage locations, such as in data routing descriptors (DRDs).
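As a rough illustration of a three-field instruction descriptor, the following C sketch packs an operation code and two operand-set identifiers into one 32-bit word. The field widths (8, 12, and 12 bits), the bit positions, and all function names are assumptions made for this example; the text does not specify the actual layout.

```c
#include <stdint.h>

/* Hypothetical layout: [31:24] opcode, [23:12] operand set 1, [11:0] operand set 2. */
#define OPCODE_SHIFT 24u
#define OPSET1_SHIFT 12u
#define OPSET_MASK   0xFFFu

/* Builds a descriptor word from its three fields. */
uint32_t descriptor_pack(uint32_t opcode, uint32_t opset1, uint32_t opset2)
{
    return ((opcode & 0xFFu) << OPCODE_SHIFT)
         | ((opset1 & OPSET_MASK) << OPSET1_SHIFT)
         |  (opset2 & OPSET_MASK);
}

/* Field accessors for a packed descriptor word. */
uint32_t descriptor_opcode(uint32_t d) { return d >> OPCODE_SHIFT; }
uint32_t descriptor_opset1(uint32_t d) { return (d >> OPSET1_SHIFT) & OPSET_MASK; }
uint32_t descriptor_opset2(uint32_t d) { return d & OPSET_MASK; }
```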
The present invention allows a DMA to switch DMA requesters within a single FOR task to accomplish a complete algorithm. In this way, a FOR task may be written to encompass every step of the data flow even when the requesting device changes
throughout the task. The requester, or DMA initiator, is identified in a descriptor on a per-movement basis.
The present invention provides a DMA controller, implemented in one embodiment as a hardware block, that interfaces with on-chip peripherals having memory storage capabilities with only minimal processor intervention. Complex algorithms may be
implemented using iterative loops and multiple execution units (EUs). The descriptors describe the algorithm or task and may be used to identify a requestor associated with each step in the task. The descriptor provides a fill-in-the-blanks form for C
language type for loop constructs, referred to herein as "FOR loops," and data movements. The "FOR task" is a type of envelope, where the user fills in the specifics, but the task is according to the FOR loop format. This allows easy programming in C
language code, with the restriction that only FOR loops are used. The FOR loop provides a construct that may be used to implement "while loops" and "if" statements.
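The statement that a FOR loop can stand in for "while loops" and "if" statements can be illustrated in ordinary C. The sketch below shows only the language-level idea; it makes no claim about how the DMA encodes such loops in descriptors, and the function names are hypothetical.

```c
/* A "while (n > 1)" loop written as a FOR loop with an empty initialization. */
unsigned log2_floor(unsigned n)
{
    unsigned count = 0;
    for (; n > 1; n /= 2)
        count++;
    return count;
}

/* An "if (x < 0)" statement written as a FOR loop that runs at most once. */
int abs_via_for(int x)
{
    for (int once = 0; once < 1 && x < 0; once++)
        x = -x;
    return x;
}
```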
The DMA is responsive to a processor as well as sources, referred to as "initiators." The term is derived from their function to initiate DMA activity. The terms initiator and requester may be used interchangeably. Initiators may be
input/output (I/O) devices, or may be input only or output only. The initiation of DMA activity for I/O initiators is sensitive to threshold levels associated with first-in-first-out buffers (FIFO), where such levels indicate the presence of received
data or an empty or near-empty transmitter. Other initiators may be timer outputs, a custom co-processor, an always true source, a communication semaphore, or any other condition that initiates a DMA transfer.
In one embodiment, a priority table is implemented ranking the priority of initiators within the data processing system. The priority table selects a highest priority requestor for processing. Each DMA has a predetermined number of tasks, that
may be user defined, each having a task number. An association between requestor and specific task number indicates the task to be performed for each initiator. Once the processor assigns, or enables, a task, the associated requestor is identified by
each loop in the FOR task, and if there are no conflicts, that requestor's task is performed by the DMA.
The DMA interprets C language style FOR loop encodings, each represented as a sequence of descriptors. In one embodiment the descriptors are 32-bit descriptors. The variable initialization, termination conditions, and increment amount for each
FOR loop are encoded into loop control descriptors (LCDs). Each LCD is capable of defining variables. The functions and routing of operands, or data to implement the loop body, are encoded into data routing descriptors (DRDs). A DRD can define
operations using variables, and can define a write destination. Each DRD can be extended to multiple descriptors in order to define complex routing and operations. Loops may nest, and within an inner loop, multiple DRDs in sequence may be used to
accomplish more than one write destination per loop iteration.
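The division of labor between LCDs and DRDs might be modeled in software as follows. This is a sketch under assumptions: lcd_t holds only one index with a less-than termination test, drd_fn_t stands in for a loop body, and none of these names or layouts come from the text.

```c
#include <stdint.h>

/* Hypothetical model of a loop control descriptor: initialization,
   termination limit (loop runs while index < limit), and increment. */
typedef struct {
    int32_t init;
    int32_t limit;
    int32_t step;
} lcd_t;

/* Hypothetical model of a data routing descriptor: the loop body. */
typedef void (*drd_fn_t)(int32_t index, void *ctx);

/* Runs the body once per iteration as described by the LCD. */
void run_loop(const lcd_t *lcd, drd_fn_t body, void *ctx)
{
    for (int32_t i = lcd->init; i < lcd->limit; i += lcd->step)
        body(i, ctx);
}

/* Example body: accumulate the loop index into a total. */
void drd_accumulate(int32_t index, void *ctx)
{
    *(int32_t *)ctx += index;
}
```

With an LCD of {0, 10, 2} and drd_accumulate as the body, the loop visits indexes 0, 2, 4, 6, and 8.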
The DMA works with an assembler to parse code segments written in C language syntax. The assembler then maps them to equivalent representations of LCDs and DRDs according to mappings defined in the appropriate descriptor section. The assembler
also assigns variables in accordance with mappings associated with a Variable Table. The assembler also assigns values in accordance with the mappings associated with a Task Table. The assembler function is performed by an external software application
to encode the fill-in-the-blank bit fields of the LCDs and DRDs from human-readable C-language source code. The assembler implements an almost direct mapping from the source code to the descriptors.
The DMA has a limited programming model accessible by the user. The Task Descriptors, Task Table, and Variable Table(s) are loaded by the user prior to enabling the DMA. Each of these is described in detail hereinbelow.
The present invention will be described with reference to two prior art data processing systems, each having a direct memory access (DMA) unit as illustrated in FIGS. 1 and 2. FIG. 1 illustrates a prior art data processing system 2 having a
direct memory access unit (DMA 4), a processor 6, and a memory 8. The processor 6, the memory 8, and the DMA 4 are coupled via multiple buses 10, where data, address and control information is transmitted over the buses 10 and controlled by an arbiter,
specifically arbitrator 12. The processor 6 initiates a data transfer transaction by writing to a register in the DMA 4.
The DMA 4 includes three portions, each portion operating as an individual DMA unit. A first portion, DMA.sub.0 14, is coupled to an input/output device, I/O.sub.0 16, and is dedicated for transferring data to and from the I/O.sub.0 16. A second
portion, DMA.sub.1 18, is coupled to an input/output device, I/O.sub.1 20, and is dedicated for transferring data to and from the I/O.sub.1 20. A third portion, DMA.sub.2 22, is coupled to an input/output device, I/O.sub.2 24, and is dedicated for
transferring data to and from the I/O.sub.2 24. Each DMA portion operates as an individual DMA unit, having registers for storing source and destination addresses, length of data to be transferred, and any other information necessary to effect the
transactions. Each DMA portion is coupled to a dedicated I/O unit by a bus and is responsive to requests from the I/O unit. Note that the processor 6 includes a local cache region 26, which may include data and/or instruction portions.
In operation, data processing system 2 will initiate a transaction with the DMA 4 by initializing the channel, i.e. selecting one of the I/O devices and its associated DMA portion for data transfer. The processor 6 loads registers in the DMA
with control information, address pointers, and transfer counts. The processor 6 then starts the channel. In this way, the processor 6 enables one of the I/O devices, which then generates a request to the associated DMA.
In response to the request, the DMA transfers data until termination of a data block. During the data transfer phase, the DMA accepts requests for operand transfers and provides addressing and bus control for the transfers. The termination
phase occurs after the operation is complete, when the DMA indicates the status of the operation in a status register. Note that each DMA portion, DMA.sub.0 14, DMA.sub.1 18, and DMA.sub.2 22, is dedicated to one of the I/O devices, I/O.sub.0 16, I/O.sub.1 20, and I/O.sub.2 24. In other
words, the sources are each coupled to a dedicated DMA. Therefore each DMA portion, DMA.sub.0 14, DMA.sub.1 18, and DMA.sub.2 22, includes address registers and control registers, as well as a storage location for transfer counts. The processor 6 enables one or more of
the I/O devices, which then generates a request to the associated DMA.
In the data processing system 2 of FIG. 1, the DMA 4 is configured with dedicated channels coupling each I/O device to an associated portion of the DMA. In contrast, FIG. 2 illustrates a data processing system 28 having a multiplexer (MUX 30)
that selects one I/O device for input to DMA 32. The I/O devices include I/O.sub.0 34, I/O.sub.1 36, and I/O.sub.2 38, that are coupled to MUX 30. The data processing system 28 includes a processor 40 having a local cache 42. A register 44 is coupled
to the MUX 30, where the register 44 provides selection control for the MUX 30. The data processing system 28 includes an arbitrator 46, a memory 48, and buses 50 for transferring address, data and control information. The DMA 32, the arbitrator 46,
the processor 40, and the memory 48 are each coupled to the buses 50. In this embodiment, the processor 40 selects the I/O device for the transfers.
In contrast to the prior art DMA methods, the present invention provides a DMA unit, i.e. DMA controller, that controls data transfers and other operations using a high-level programming language construct. In one embodiment, the DMA unit uses
C-language constructs, and specifically FOR loop constructs. The C-language and the FOR loop constructs are described in detail in "The C Programming Language," by Brian W. Kernighan and Dennis M. Ritchie, published by Prentice Hall, having copyright
1988.
According to one embodiment of the present invention, the DMA unit is a user-programmable engine that interprets a series of C language FOR loop style descriptors to perform a user configurable series of data movements and manipulations. A
collection of these descriptors is much like a software program. There are two types of descriptors: Loop Control Descriptors (LCDs) and Data Routing Descriptors (DRDs). These descriptors form a C language FOR loop programming for the DMA. This adds
to the flexibility of prior art DMA units by off-loading compute resources from the processor, while increasing the ease of use for the programmer. Additionally, this improves performance as the FOR loop may be performed by highly optimized, dedicated
purpose DMA state machines. The DMA architecture is optimized for very high throughput over complete processing generality, meaning that it functions under the command of a processor.
With respect to the C-language constructs, the FOR loop has the general form:
for (<initial index value(s)>; <termination condition(s)>; <increment value(s)>) { /*loop body*/ }
where the "initial index value" initializes the loop and is therefore done before the loop proper is entered. The "termination condition" is a test that controls the loop. Note that although it is called a termination condition, it is better thought of
as a loop continuation condition, as the loop only continues while the termination condition is true. After each iteration of the loop, the increment step is executed, and the termination condition is again checked. If the termination condition is
false, then the loop terminates. The body of the loop may be any number of lines. Typically, in a FOR loop the initialization and increment are logically related.
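The evaluation order described above can be made explicit by writing the same loop twice in C: once as a FOR loop, and once expanded step by step. The function names are hypothetical and serve only this illustration.

```c
/* Sums 0..n-1 with a FOR loop. */
int sum_with_for(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}

/* The same loop with each part of the FOR statement written out. */
int sum_expanded(int n)
{
    int total = 0;
    int i = 0;          /* initial index value, executed once before the loop */
    while (i < n) {     /* condition tested before every iteration */
        total += i;     /* loop body */
        i++;            /* increment, executed after the body */
    }
    return total;
}
```

Both functions return the same value for any n, because the initialization runs once, the condition is tested before each pass, and the increment follows the body.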
As discussed hereinabove, descriptors include LCDs and DRDs. The LCDs specify the index variables, such as memory pointers, byte counters, etc. along with the termination and increment values, while the DRDs specify the nature of the loop body,
i.e. how data gets pumped to and from memory and how execution units manipulate data. Inner loops may initialize and compare their loop-index variable(s) to the variable(s) of outer loops, allowing the DMA to perform a variety of useful functions. Further,
the DMA looping structure allows it to perform indirections and additions in a loop's loop-index initialization, adding flexibility to the available functions. The DMA also supports nested looping within this programming model.
As an example, a DMA program, listed as a sequence of LCDs and DRDs, is as follows:
LCD1   for (i=0; i<3; i++) {
LCD2     for (j=0; j<i; j++)
DRD2       *j = *i;
DRD1     *i = 5;
       }
Each line in the DMA program above represents a successive memory location occupied by the indicated LCD or DRD.
In this example, LCD1 provides the initialization value, termination condition, and step size for a FOR loop. The variable i is initialized to zero (0) and the loop continues iterations while i is less than three (3). On each iteration, the
variable i is incremented. Nested inside this outer FOR loop is a FOR loop with another loop index variable, j, which is initialized to zero (0) and is incremented on each iteration of this inner loop. The DRD information is the body of the loop. On
each inner-loop iteration, variable i is used as a memory address of data that is to be moved to the memory location addressed by the variable j.
Similarly, on each outer-loop iteration, variable i holds the address of the memory location into which a value of five (5) is to be written. While this is a straightforward example, it illustrates the use of LCDs and DRDs as building blocks to
construct programs or algorithms within the DMA controller.
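The effect of the example program can be checked with a small software model. The sketch below assumes that the loop indexes i and j address consecutive words of a small array named mem; that word addressing and the function name run_example_task are assumptions for illustration only.

```c
#include <stdint.h>

/* Software model of the example task; mem stands in for the memory
   addressed by the loop indexes (hypothetical word addressing). */
void run_example_task(uint32_t *mem)
{
    for (uint32_t i = 0; i < 3; i++) {      /* LCD1 */
        for (uint32_t j = 0; j < i; j++)    /* LCD2 */
            mem[j] = mem[i];                /* DRD2: *j = *i */
        mem[i] = 5;                         /* DRD1: *i = 5 */
    }
}
```

Starting from mem = {10, 20, 30}, the model ends with {30, 30, 5}: each outer iteration first copies the word at i to every lower address and then overwrites the word at i with 5.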
The DRDs are descriptors that describe assignment statements within the body of the loop in terms of data flow and manipulation. For example, data in the body of the loop may be multiplied together, in which case the DRD will specify data flow
through a multiplier for this operation. Similarly, if a logical operation, such as an AND, is indicated, the DRD will specify a data flow through a logic unit for completing this operation. The body of the loop may include a number of levels and
combinations of these types of functions.
Using the C language constructs and structures, the present invention allows simple encoding of a broad range of applications, including but not limited to simple peripheral to memory transfers, simple memory to memory transfers, simple
one-dimensional processing of data, functions of two or more variables, filtering algorithms, such as finite impulse response (FIR), and infinite impulse response (IIR), and also scatter-gather processing via the indirection capability. Additional
processing available includes but is not limited to sample format conversion, data decompression, bit-block transfers, color conversion, as well as drawing characters. Note that program model memory regions may exist in any memory-mapped space within
the data processing system.
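As an illustration of how one of the listed applications fits the FOR-loop-only restriction, a finite impulse response filter can be written as two nested FOR loops. This is a plain C sketch, not a descriptor encoding; the function name fir_for_loops, the integer sample type, and the convention of producing n - taps + 1 outputs are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* FIR filter using only FOR loops: y[i] = sum over k of h[k] * x[i + taps - 1 - k].
   Produces n - taps + 1 output samples. */
void fir_for_loops(const int32_t *x, size_t n,
                   const int32_t *h, size_t taps, int32_t *y)
{
    for (size_t i = 0; i + taps <= n; i++) {    /* outer loop: one output sample */
        int32_t acc = 0;
        for (size_t k = 0; k < taps; k++)       /* inner loop: multiply-accumulate */
            acc += h[k] * x[i + taps - 1 - k];
        y[i] = acc;
    }
}
```

In the terms used in this description, the outer and inner loop headers would correspond to LCDs, and the multiply-accumulate and the final write would correspond to DRDs.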
To better understand the utilization of programming constructs to implement such applications, it is desirable to define a few terms. According to the present invention, a "descriptor" is a piece of information, typically a predetermined number
of bits, that describes a portion of an algorithm, or location of data, or information relating to any other function to be performed within the DMA. This is in contrast to prior art descriptors that were used to indicate memory locations for data
movement, but did not include descriptive information necessary to execute an algorithm. A "task" as used throughout this description is a collection of LCD and DRD descriptors that embodies a desired function. A task could include the steps of
gathering an Ethernet frame, performing some function on the data content, and storing the result in memory. Additionally, the task could complete by interrupting the processor. The DMA will support multiple enabled tasks simultaneously.
A "task table" is a region in memory that contains pointers to each of the DMA program model components on a per-task basis. A register within the DMA, referred to as a "TASKBAR" or task table base address register, gives the location of the
task table itself. The entries in the task table define task begin and end pointers, its variable table pointer and other task-specific information. Alternate embodiments may include a subset of this information or may include additional information.
The task table points to the tasks in a "task descriptor table," that is a task-specific region in memory containing descriptors that specify the sequence of events for each task. Each task has its own private variable table. FIG. 12 illustrates the
interaction of the TASKBAR, the task table, the task descriptor table, and the variable table according to one embodiment. Alternate embodiments may store the LCDs and DRDs in an alternate arrangement or using an alternate storage means.
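The relationship between the TASKBAR and a task table entry can be sketched as a base-plus-offset lookup. The entry layout below (four 32-bit words) and the names task_table_entry_t and task_entry_address are assumptions for illustration; the text lists the kinds of pointers an entry holds but not their sizes or order.

```c
#include <stdint.h>

/* Hypothetical task table entry: four 32-bit words. */
typedef struct {
    uint32_t desc_start;  /* task descriptor start pointer */
    uint32_t desc_end;    /* task descriptor end pointer */
    uint32_t var_table;   /* variable table pointer */
    uint32_t info;        /* other task-specific information */
} task_table_entry_t;

/* Address of the entry for a given task, relative to the TASKBAR value. */
uint32_t task_entry_address(uint32_t taskbar, unsigned task_number)
{
    return taskbar + task_number * (uint32_t)sizeof(task_table_entry_t);
}
```

Under this assumed layout, a TASKBAR value of 0x1000 would place the entry for task 3 at 0x1030.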
FIG. 3 illustrates a data processing system 52 according to one embodiment of the present invention. The data processing system 52 includes a DMA 54, referred to as a SMART DMA, that performs direct memory access transactions incorporating a
user-programmable algorithm. The DMA 54 includes a memory portion 56 for storing operational information, such as descriptor storage and/or data buffers. The DMA 54 and the memory portion 56 are both coupled to a communication bus 58, where the
communication bus 58 is also coupled to a plurality of input-output (I/O) devices each having a FIFO, including I/O.sub.0 60, I/O.sub.1 62, through I/O.sub.n 64.
Each of the I/O devices is coupled to the DMA 54. The communication bus 58 is also coupled to an arbitrator 66, where communication bus 58 is used to communicate address and control information, plus data and tag information. The DMA 54
provides information to the arbitrator 66 via a DMA master bus 68.
The data processing system 52 also includes a processor 70 coupled to the arbitrator 66 via a processor master bus 72. Address and control information, along with data information is communicated via the processor master bus 72. The processor
70 includes a local cache 74, including instruction(s) and/or data.
In the data processing system 52, the processor 70 initializes the memory regions relating to the DMA 54, including registers, and tables for storing descriptor information and task information. This involves filling the appropriate memory
locations. Initialization is performed at start-up and may also be performed at any time, such as on occurrence of an error condition or to reconfigure the system. This information may be stored in the memory portion 56 or in memory 76 or in a
combination of both.
FIG. 4 details the DMA 54 where connections to the I/O devices are indicated as REQUESTS 0, 1, through n. The DMA 54 includes a priority decoder 78 that communicates with a master DMA engine (MDE 80) and an address and data sequencer (ADS 82).
The MDE is coupled to the ADS by way of a loop control bus 84. The DMA 54 also includes a data routing pool (DRP 86) coupled to a plurality of execution units, including EU.sub.0 88, EU.sub.1 90, through EU.sub.n 92. The priority decoder 78 provides an
active requestor to the ADS 82.
A task is started by setting predetermined enable bit(s) within the DMA 54, in response to which the DMA 54 accesses the memory locations where descriptor and task information is stored. The set enable bit(s) indicate a task number corresponding
to a task to be performed. Note that in the present embodiment, multiple tasks may be identified by the set bit(s), where each of the multiple tasks is enabled. The DMA 54 first reads a register in the MDE 80, labeled as TASKBAR 94, that provides
information relating to an origin location within a task table stored in memory, either memory 76 or the memory portion 56. Note that the TASKBAR register may be located in another functional block, where the information is accessible by the MDE 80.
The MDE 80 calculates the origin location in the task table. These registers and tables are further detailed hereinbelow with respect to FIGS. 12 and 13. The task table stores multiple task-specific pointers to at least one task descriptor table. In
one embodiment, the pointers include task descriptor start and end pointers, a variable table pointer, a function descriptor base address, configuration bit(s), status information, base address for context save space, and literal-initialization LCD base
information.
The task descriptor table stores algorithmic descriptors, which are used to implement a user-programmable algorithm. In one embodiment, the algorithm is written in C-language constructs, and each task is composed of at least one FOR loop. Each
FOR loop is made up of at least one loop control descriptor (LCD) and at least one data routing descriptor (DRD). The DRD defines the body of the loop, while the LCD provides the initialization value(s), the increment(s), and the termination
condition(s) for the FOR loop.
The DMA 54 retrieves the task information from the task descriptor table corresponding to the task identified by the enable bit(s). The DMA 54 then parses the task information. Parsing involves retrieving LCD and DRD information from the task
descriptor table, reading the first loop, decoding at least a portion of the C-encoded LCDs and DRDs stored in the task information, and determining a requester. The parsing is performed within the MDE 80, which provides the decoded information to the ADS
82. The decoded information then initializes loop index, termination, and increment registers within the ADS 82. The parsed task information identifies a requestor, and the MDE 80 waits to receive a request from that requester before instructing the
ADS 82 to begin processing. Operation of the ADS 82 and the MDE 80 are further detailed hereinbelow.
Continuing with FIG. 4, one embodiment of the invention allows for dynamic request selection. Here, multiple requests are provided to the priority decoder 78 from multiple I/O devices. The priority decoder 78 selects from among the inputs. The
initiator/task registers determine which task to parse and process based on the selected requester, where the selection is made according to information contained within the DRDs of the active task.
The priority decoder 78 selects a highest priority requester for processing based on a priority table. The selection is made of those requesters that are currently making a request to the DMA 54. The priority table allows for a predetermined
number of priority levels. The priority decoder 78 includes registers that define associated task number and the priority of each request unit. The priority decoder 78 provides handshake signals in response to request inputs. The priority decoder 78
allows programming of each task for external request, priority of request, auto-start of task, interrupts, etc. In alternate embodiments, priority may be based on schemes such as round robin, time sliced, first in, fixed, last in, etc.
An association is made from a requestor to a specific task number, in the present embodiment numbers 0 to 15. The specific task is executed until the initiator removes the request. Note that while a task is executing, a higher priority
requester may interrupt the task. Interruptions occur at loop iteration boundaries.
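The selection performed by the priority decoder can be sketched as a scan over the requesters that are currently asserting a request. The following C model is hypothetical: the structure requester_cfg_t, the convention that a larger number means higher priority, and the tie-break in favor of the lower index are all assumptions not stated in the text.

```c
#include <stdint.h>

/* Hypothetical per-requester registers: priority level and associated task. */
typedef struct {
    uint8_t priority;     /* assumption: larger value means higher priority */
    uint8_t task_number;  /* 0 to 15 in the described embodiment */
} requester_cfg_t;

/* Returns the index of the highest priority requester whose bit is set in
   request_mask, or -1 if no requester is active. Ties go to the lower index. */
int select_requester(uint32_t request_mask, const requester_cfg_t *cfg, int count)
{
    int best = -1;
    for (int r = 0; r < count; r++) {
        if (!(request_mask & (1u << r)))
            continue;                       /* requester r is not requesting */
        if (best < 0 || cfg[r].priority > cfg[best].priority)
            best = r;
    }
    return best;
}
```

The task to run is then the task_number stored for the selected requester.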
Upon receiving a request from the priority decoder 78, the ADS 82 reads data according to the order specified in the DRD retrieved from the memory 76. Note that data may be retrieved from an EU, an internal loop register, or a memory read. If
the data is routed to an EU, the data is run through a predetermined data path in the DRP 86 according to descriptor information. As illustrated in FIG. 4, data flows from the DRP 86 to the appropriate one or more of the execution units. Each of the
execution units has a specific assigned function for a given DRD. In this way, they are configurable and may be user programmed by changing the information in the DRD. This adds flexibility to the DMA 54 by providing a means of implementing any
combination of these functions. From the execution unit, manipulated data flows to the DRP 86 for further routing to another of the execution units or to memory or to an internal loop register via the ADS 82.
As discussed hereinabove, the DRD descriptors provide information relating to the body of the loop in terms of data flow and manipulation. If the DRD specifies that two terms are to be multiplied together and then the result is to be logically
ANDed with another term, the ADS 82 will first route data through the DRP 86 to the particular execution unit that performs the multiplication. The output of the execution unit is provided via a data bus back to the DRP 86. The ADS 82 then directs data
via the DRP 86 to the execution unit that performs the AND operation. The result of this execution unit is then provided back to the DRP 86, which routes the result to the ADS 82. The result is then stored in the memory or in a loop register as specified
in the body of the loop. To facilitate this data flow, each execution unit, EU.sub.0 88, EU.sub.1 90, through EU.sub.n 92 is coupled to and receives data from the DRP 86 via data bus 96, data bus 98, through data bus 100 respectively. Similarly, each execution unit,
EU.sub.0 88, EU.sub.1 90, through EU.sub.n 92 is coupled to and provides data to the DRP 86 via data bus 102, data bus 104, through data bus 106 respectively.
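The multiply-then-AND data flow described above can be modeled in C as a chain of routing steps, each sending the current value through one execution unit. The types eu_fn_t and route_step_t and the function route_through_eus are hypothetical names for this sketch; function pointers stand in for the hardware execution units.

```c
#include <stdint.h>

/* Hypothetical model of an execution unit as a two-operand function. */
typedef uint32_t (*eu_fn_t)(uint32_t a, uint32_t b);

uint32_t eu_multiply(uint32_t a, uint32_t b) { return a * b; }
uint32_t eu_and(uint32_t a, uint32_t b)      { return a & b; }

/* One routing step: which execution unit to use and its second operand. */
typedef struct {
    eu_fn_t eu;
    uint32_t operand;
} route_step_t;

/* Sends data through each execution unit in turn; the output of one step
   becomes the input of the next, as when results return to the routing pool. */
uint32_t route_through_eus(uint32_t data, const route_step_t *steps, int n)
{
    for (int s = 0; s < n; s++)
        data = steps[s].eu(data, steps[s].operand);
    return data;
}
```

For example, routing the value 7 through a multiply by 6 and then an AND with 0x0F gives 42 & 15, which is 10.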
The present invention presents a data flow-through execution unit, where the function of the execution unit is assigned and then data is pumped through it. This saves processing time and adds to the flexibility of data processing.
The ADS 82 provides information to the DRP 86 via bus 108, and receives information from the DRP 86 via bus 110. The ADS 82 provides address control information to a memory interface unit 112 via bus 114. The memory interface unit 112 is
coupled to the DMA master bus 68 and the communication bus 58. The memory interface unit 112 is bidirectionally coupled to ADS 82 via bus 116. The ADS 82 reads data from and writes data to the memory interface unit 112 via bus 116. The memory
interface unit 112 provides information directly to the DRP 86 via bus 118.
The ADS 82 also includes register(s) 120 for writing control information for each task. The ADS 82 is basically the engine that pumps data through the DMA 54. Based on configuration bits set by the MDE 80 per the application program, the ADS 82
fetches as many operands as required and optionally routes them to the execution units. The ADS 82 evaluates termination conditions and stores the result in memory or elsewhere. The ADS 82 may store results internally in loop-index registers.
Similarly, results may be provided to an EU as operands. Operation of the ADS 82 is controlled by a state machine, as is operation of the MDE 80.
FIG. 5 illustrates the state machine operation of the ADS 82. Basically, the ADS 82 performs loop execution control and sequences data movement. At state 122 function descriptors are loaded into each of the execution units. The execution unit
function descriptors, or EUFDs, specify the operation to be performed. This may be a Boolean operation, a multiplication, an addition, an error check, a data compression or decompression, or any operation implementable by an execution unit. An EUFD is
loaded into each execution unit as needed, as specified by the function numbers in the DRDs. Once all of the function descriptors are loaded, the DMA 54 transitions to state 124 to read operand(s). The operands are the data values used in the body of the
FOR loop. Note that in an alternate embodiment, the EUs may store EUFD information internally, avoiding the need to load EUFDs at state 122.
From state 124, if the DRD indicates that an execution unit is to be used, the DMA 54 transitions to state 126, and the DRP 86 passes data to that execution unit. The DMA 54 loops through state 124 and state 126 until all required operands are
retrieved. For memory to memory transfers, the DMA 54 transitions from state 124 to state 128 to write data to memory. Similarly, after all required operands are retrieved, the DMA 54 transitions from state 126 to state 128. Once the iteration is
complete, the DMA 54 transitions back to state 124. Note that data may not be written at state 128, but data may be retained for later processing. Basically, the DRD describes the following cases: no action; write data to memory; just accumulate; or
write to internal register, such as a loop-index register.
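The state transitions described for FIG. 5 can be summarized as a next-state function. The C sketch below uses the state numbers from the text as enumerator values; the input flags in ads_inputs_t and all identifier names are assumptions, and conditions not described in the text (such as task completion) are left out.

```c
/* States numbered as in the description of FIG. 5. */
typedef enum {
    ADS_LOAD_EUFD    = 122,  /* load function descriptors into execution units */
    ADS_READ_OPERAND = 124,  /* read operand(s) */
    ADS_TO_EU        = 126,  /* pass data to an execution unit */
    ADS_WRITE        = 128   /* write data (or retain it) */
} ads_state_t;

/* Hypothetical conditions that drive the transitions. */
typedef struct {
    int eufds_loaded;   /* all function descriptors are loaded */
    int uses_eu;        /* the DRD indicates an execution unit is to be used */
    int operands_done;  /* all required operands have been retrieved */
} ads_inputs_t;

/* Returns the next state for the current state and inputs. */
ads_state_t ads_next_state(ads_state_t s, ads_inputs_t in)
{
    switch (s) {
    case ADS_LOAD_EUFD:
        return in.eufds_loaded ? ADS_READ_OPERAND : ADS_LOAD_EUFD;
    case ADS_READ_OPERAND:
        return in.uses_eu ? ADS_TO_EU : ADS_WRITE;  /* memory to memory goes straight to write */
    case ADS_TO_EU:
        return in.operands_done ? ADS_WRITE : ADS_READ_OPERAND;
    case ADS_WRITE:
        return ADS_READ_OPERAND;                    /* iteration complete */
    }
    return s;
}
```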
FIG. 3 describes one mode of operation referred to as "precise-mode." In precise mode, consistent with C language program execution, the loop increments are performed at the end of the loop. If there are multiple indexes, the increments are
clustered at the end of the loop. In an alternate mode, referred to as an "imprecise mode," the loop increment is performed as the index is used within the loop, thus s