WikiPatents - Community Patent Review
Direct memory access controller and method therefor
United States Patent 6421744
Link to this page: http://www.wikipatents.com/6421744.html
Inventor(s): Morrison; Gary R. (Austin, TX), Mason; Kristen L. (Austin, TX), Galloway; Frank C. (Dripping Springs, TX), Nuckolls; Charles E. (Austin, TX), McKeown; Jennifer L. (Austin, TX), Polega; Jeffrey M. (Austin, TX), Tietjen; Donald L. (Austin, TX)
Abstract: Direct memory access controller (DMAC) (54) adapted to directly execute C language style FOR tasks assigned by a processor (70), where the FOR task includes a movement of a data element from a first location to a second location in memory. The DMAC includes multiple execution units (EUs) (88, 90, 92), each to perform an arithmetic or logical operation, and a FOR task controller (80, 82, 86) to perform the data movement. The FOR task controller selects the operation to be performed by the EU. In one embodiment, the FOR task is made up of C language type FOR loops, where descriptors identify the control and body of the loop. The descriptors identify the source of operands for an EU, and the source may be changed within a FOR task. A descriptor specifies a function code for an EU and may specify multiple sets of operands for the EU.



Owner/Assignee     Motorola, Inc. (Schaumburg, IL)
Publication Date     July 16, 2002
Application Number     09/426,009
Filing Date     October 25, 1999
US Classification     710/22 710/23 710/31 710/33 710/68 711/148 711/162 711/202
Examiner     Nguyen; Than
USPTO Field of Search     710/22 710/23 710/31 710/33 710/26 710/62 711/100 711/148 711/202 711/165
Patent Tags     direct memory access controller
   

 References
 U.S. References
 Patent No.   Inventor             Date
 6223268      Paluch, Jr. et al.   Apr. 2001
 6185634      Wilcox               Feb. 2001
 6006286      Baker et al.         Dec. 1999
 5996032      Baker                Nov. 1999
 5890218      Ogawa et al.         Mar. 1999
 5857114      Kim                  Jan. 1999
 5404522      Carmon et al.        Apr. 1995
 Claims
 


What is claimed is:

1. In a data processing system, comprising a processor and a memory coupled to the processor, a data movement engine (DME) adapted to directly execute FOR tasks assigned by the processor, said task comprising a movement of a data element between a first location in said memory and a second memory storage location, the DME comprising: an execution unit (EU) adapted to perform a selected one of an arithmetic operation and a logical operation; and a FOR task controller adapted to perform said data movement and to select, in response to said FOR task, one of the arithmetic operation and the logical operation to be performed by the EU on said data element.

2. The DME as in claim 1, wherein the FOR task is defined by: at least one loop control descriptor (LCD) comprising control information used to perform the FOR task; and at least one data routing descriptor (DRD) identifying the selected one of the arithmetic operation and the logical operation to be performed by the EU.

3. The DME as in claim 2, further comprising: a plurality of execution units (EUs), each adapted to perform a selected one of the arithmetic operation and the logical operation, each of the plurality of EUs having an input and an output; and wherein the FOR task controller is adapted to select, in response to the FOR task, at least one of the plurality of EUs for performance of the selected one of the arithmetic operation and the logical operation.

4. The DME as in claim 3, further comprising: a data router coupled to input/output adapted to provide an input to each of the plurality of EUs and adapted to receive an output from each of the plurality of EUs.

5. The DME as in claim 4, wherein the data router is adapted to provide an output from one of the plurality of EUs as an input to one of the plurality of EUs.

6. The DME as in claim 2, wherein the DME provides an indicator signal to a peripheral conducting a data communication protocol directing completion of a data frame in the data communication protocol.

7. The DME as in claim 2, wherein the DME terminates a loop upon completion of a received data frame in a data communication protocol.

8. The DME as in claim 2, wherein the DME provides an interrupt signal to the processor upon termination of a loop.

9. The DME as in claim 1, further comprising: a priority selection unit adapted to select from among a plurality of requesters and assert an active requestor signal to the FOR task controller.

10. The DME as in claim 9, wherein the FOR task is defined by: at least one loop control descriptor (LCD) comprising control information used to perform the FOR task; and at least one data routing descriptor (DRD) identifying the selected one of the arithmetic operation and the logical operation to be performed by the EU; and wherein the FOR task controller assigns a requester to the FOR task based on at least one field in the DRD.

11. The DME of claim 1, wherein the FOR task comprises a finite impulse response filter (FIR) operation.

12. The DME of claim 1, wherein the FOR task comprises a digital signal processing (DSP) operation.

13. The DME as in claim 1, wherein the second memory storage location is an internal loop index register.

14. The DME as in claim 1, wherein the second memory storage location is the execution unit.

15. In a data processing system, comprising a processor and a memory coupled to the processor, a method for moving data comprising the steps of: the processor assigning a FOR task, the FOR task comprising a movement of a data element between a first location in said memory and a second memory storage location; retrieving the data element from the first memory location; selecting one of an arithmetic operation and a logical operation to perform on the data element; performing the selected one of the arithmetic operation and the logical operation on the data element; and storing a result of the selected one of the arithmetic operation and the logical operation on the data element to the second memory location.

16. The method as in claim 15, further comprising the steps of: retrieving at least one loop control descriptor (LCD) having control information used to perform the FOR task; and retrieving at least one data routing descriptor (DRD) identifying the selected one of the arithmetic operation and the logical operation to be performed by the EU.

17. The method as in claim 15, further comprising the steps of: selecting a second one of an arithmetic and a logical operation to perform on the result of performing the selected one of the arithmetic operation and the logical operation; and performing the selected second one of the arithmetic operation and the logical operation on said result.

18. The method as in claim 15, further comprising the step of: selecting from among a plurality of requestors; and asserting an active requester signal to the FOR task controller.

19. In a data processing system, comprising a processor, a data movement engine (DME), and a plurality of requesters coupled to the processor, a method for moving data comprising the steps of: assigning a task to the DME, the task comprising a movement of a data element from a first memory location to a second memory location; retrieving a first plurality of descriptors for the task, wherein each descriptor of the first plurality of descriptors identifies a different requester of the plurality of requesters; selecting one requester from among the plurality of requestors based on the plurality of descriptors for the task; and providing a first active requestor signal identifying the selected one requestor to the DME; and in response to receiving a request from the selected one requester, performing the movement of the data element from the selected one requestor.

20. The method as in claim 19, further comprising the steps of: assigning a second task to the DME, the second task comprising a movement of a second data element from a third memory location to a fourth memory location; retrieving a second plurality of descriptors for the second task, wherein each descriptor of the second plurality of descriptors identifies a different requestor of the plurality of requesters; selecting a second requestor from among the plurality of requesters based on the second plurality of descriptors for the second task; and providing a second active requestor signal identifying the selected second requester to the DME; and in response to receiving a request from the selected second requester, performing the movement of the data element from the selected second requester.

21. In a data processing system, comprising a processor and a memory coupled to the processor, a data movement engine (DME) adapted to directly execute tasks assigned by the processor, said task comprising a movement of a data element from a first location in said memory to a second location in said memory, the DME comprising: an execution unit (EU) that is capable of performing an arithmetic operation and a logic operation; a task controller adapted to perform said data movement and to select, in response to said task, one of the arithmetic operation and the logical operation to be performed by the EU on said data element; and a priority selection unit adapted to select a requestor from a plurality of requesters.

22. In a data processing system, comprising a processor and a memory coupled to the processor, a data movement engine (DME) adapted to directly execute FOR tasks assigned by the processor, said task comprising a movement of a data element from a first location in said memory to a second location in said memory, the DME comprising: an execution unit (EU) adapted to perform a selected one of a compression operation and a decompression operation and to perform an arithmetic operation and a logic operation; and a FOR task controller adapted to perform said data movement and to select, in response to said FOR task, one of the arithmetic operation and the logical operation to be performed by the EU on said data element.
 Description
 


FIELD OF THE INVENTION

The present invention relates to direct memory access in a data processing system, and specifically to controlling direct memory access using a user-programmable algorithm.

BACKGROUND OF THE INVENTION

Direct Memory Access (DMA) controllers are used in computer systems to offload repetitive data movement tasks from a processor in a data processing system. As the demand for increased performance of the processor, or central processing unit (CPU), increases so does the need for high-throughput, flexible DMAs that work well with these processors. Original DMA controllers (DMACs) used only registers or memory storage devices to specify source, destination, and length of data to be transferred. The DMAC was coupled to only one source device. Soon the need to carry out simultaneous block transfers led to the development of multi-channel DMACs that achieved the effect of performing several data movements simultaneously. As data transfer rates continued to increase, set up, service and interrupt overhead for the DMACs became too high, especially when the DMAC was programmed for a single contiguous block of memory per interrupt.

To overcome these overhead issues, descriptor-based DMACs were introduced. As computer system complexity increased, so did the complexity of the DMACs. Today, some DMACs use a dedicated processor to perform such complex functions. The dedicated processor, or coprocessor, is often based on a reduced instruction set computer (RISC) methodology. Such coprocessors operate on increasingly complex protocols, and often provide algorithmic support, such as digital filtering operations. The algorithmic support is critical to many applications where data movement and calculation rates are high. This is particularly true of entertainment applications, such as video, graphic and audio applications, and is also important in areas such as audio and visual decompression calculations. While the need for flexible algorithmic manipulation of data by the DMAC increases, the coprocessor becomes less attractive, as it operates on a data-structure descriptor architecture that has limited flexibility and it cannot achieve the high performance of the dedicated state machine of a traditional DMAC.

Therefore, there is a need for a DMAC that provides algorithmic support using descriptors that define DMA algorithms instead of data structures. Additionally, there is a need for a flexible method of programming a DMAC with simple building blocks. Further, a need exists for a method of programming a DMAC that allows easy expansion for additional and complex data manipulations.

Still further, as imaging and entertainment applications continue to move and manipulate large amounts of data in a variety of ways, there is a need to allow the user to specify the functions done in a DMA, and a further need to increase the throughput capabilities of a DMA to accommodate the ever increasing data sizes.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be more fully understood by a description of certain preferred embodiments in conjunction with the attached drawings in which:

FIGS. 1-2 illustrate, in block diagram form, prior art data processing systems having DMA controllers.

FIG. 3 illustrates, in block diagram form, a data processing system having a DMA controller in accordance with one embodiment of the present invention.

FIG. 4 illustrates, in block diagram form, a DMA controller as in FIG. 3 in accordance with one embodiment of the present invention.

FIG. 5 illustrates, in state diagram form, execution of operations within a DMA controller as in FIGS. 3-4 in accordance with one embodiment of the present invention.

FIGS. 6-10 illustrate descriptors within a DMA controller as in FIGS. 3-4 in accordance with one embodiment of the present invention.

FIG. 11 illustrates, in state diagram form, operation of a master DMA engine within a DMA controller as in FIGS. 3-4 in accordance with one embodiment of the present invention.

FIGS. 12-13 illustrate tables and pointers within a DMA controller as in FIGS. 3-4 in accordance with one embodiment of the present invention.

FIG. 14 illustrates, in block diagram and logical form, priority assignment of requesters within a DMA controller as in FIGS. 3-4 in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

For clarity the terms assertion and negation are used herein to avoid confusion regarding "active-low" and "active-high" signals. The term assert or assertion is used to indicate that a signal is active or true, independent of whether that level is represented by a high or a low voltage. The term negate or negation is used to indicate that a signal is inactive or false.

In one aspect of the present invention, in a data processing system, comprising a processor and a memory coupled to the processor, a data movement engine (DME) is adapted to directly execute FOR tasks assigned by the processor, said task including a movement of a data element from a first location in said memory to a second location in said memory. The DME includes an execution unit (EU) adapted to perform a selected one of an arithmetic operation and a logical operation, and a FOR task controller adapted to perform said data movement and to select, in response to said FOR task, one of the arithmetic operation and the logical operation to be performed by the EU on said data element. The DME directly executes FOR tasks by a state machine within the DME. The FOR tasks describe functions to be performed by an EU and are made up solely of C language style FOR loops. With these FOR loops, the FOR tasks may perform any number of functions including arithmetic and logical functions for digital signal processing (DSP) operations like filtering, or various aspects of data communication protocols.

In another aspect of the present invention, in a data processing system, having a processor and a memory coupled to the processor, a method for moving data includes the steps of: the processor assigning a FOR task, the FOR task including a movement of a data element from a first memory location to a second memory location; retrieving the data element from the first memory location; selecting one of an arithmetic operation and a logical operation to perform on the data element; performing the selected one of the arithmetic operation and the logical operation on the data element; and storing a result of the selected one of the arithmetic operation and the logical operation on the data element to the second memory location.

In still another aspect of the present invention, in a data processing system, comprising a processor and a memory coupled to the processor, a data movement engine (DME) is adapted to directly execute tasks assigned by the processor, said task including a movement of a data element from a first location in said memory to a second location in said memory. The DME includes an execution unit (EU) that is capable of performing an arithmetic operation and a logical operation; a task controller adapted to perform said data movement and to select, in response to said task, one of the arithmetic operation and the logical operation to be performed by the EU on said data element; and a priority selection unit adapted to select a requester from a plurality of requesters.

In one embodiment of the present invention, a data processing system has a processor, an execution unit, and a memory storage location for storing an instruction descriptor, where the instruction descriptor includes: a first field identifying an operation code; a second field identifying a first operand set corresponding to the operation code; and a third field identifying a second operand set corresponding to the operation code. In one embodiment of the invention, the operand sets are stored in successive storage locations, such as in data routing descriptors (DRDs).

The present invention allows a DMA to switch DMA requesters within a single FOR task to accomplish a complete algorithm. In this way, a FOR task may be written to encompass every step of the data flow even when the requesting device changes throughout the task. The requester, or DMA initiator, is identified in a descriptor on a per-movement basis.

The present invention provides a DMA controller, implemented in one embodiment as a hardware block, that interfaces with on-chip peripherals having memory storage capabilities with only minimal processor intervention. Complex algorithms may be implemented using iterative loops and multiple execution units (EUs). The descriptors describe the algorithm or task and may be used to identify a requestor associated with each step in the task. The descriptor provides a fill-in-the-blanks form for C language type for loop constructs, referred to herein as "FOR loops," and data movements. The "FOR task" is a type of envelope, where the user fills in the specifics, but the task is according to the FOR loop format. This allows easy programming in C language code, with the restriction that only FOR loops are used. The FOR loop provides a construct that may be used to implement "while loops" and "if" statements.
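The statement that a FOR loop can stand in for "while loops" and "if" statements may be illustrated in ordinary C. The following is a minimal sketch only; the function names abs_using_for and length_using_for are hypothetical and do not appear in the patent, and the sketch shows the source-level idiom rather than any descriptor encoding.

```c
#include <assert.h>
#include <stddef.h>

/* "if (value < 0) value = -value;" written as a FOR loop that runs at most
 * once: the guard is set by the condition and cleared in the increment step. */
int abs_using_for(int value)
{
    for (int guard = (value < 0); guard; guard = 0) {
        value = -value;
    }
    return value;
}

/* "while (buf[n] != 0) n++;" written as a FOR loop with an empty
 * initialization and an empty increment: only the continuation test remains. */
size_t length_using_for(const char *buf)
{
    assert(buf != NULL);
    size_t n = 0;
    for (; buf[n] != '\0'; ) {
        n++;
    }
    return n;
}
```

In this sketch a conditional becomes a loop whose body executes zero or one time, and a while loop becomes a FOR loop that keeps only its test.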

The DMA is responsive to a processor as well as sources, referred to as "initiators." The term is derived from their function, which is to initiate DMA activity. The terms initiator and requester may be used interchangeably. Initiators may be input/output (I/O) devices, or may be input only or output only. The initiation of DMA activity for I/O initiators is sensitive to threshold levels associated with first-in-first-out (FIFO) buffers, where such levels indicate the presence of received data or an empty or near-empty transmitter. Other initiators may be timer outputs, a custom co-processor, an always true source, a communication semaphore, or any other condition that initiates a DMA transfer.

In one embodiment, a priority table is implemented ranking the priority of initiators within the data processing system. The priority table selects a highest priority requestor for processing. Each DMA has a predetermined number of tasks, that may be user defined, each having a task number. An association between requestor and specific task number indicates the task to be performed for each initiator. Once the processor assigns, or enables, a task, the associated requestor is identified by each loop in the FOR task, and if there are no conflicts, that requestor's task is performed by the DMA.
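The selection of a highest priority requestor from a priority table might be modeled in software as follows. This is a sketch under stated assumptions, not the circuit of FIG. 14: the requester count, the bit-mask representation of pending requests, the convention that a larger number means higher priority, and the name select_requester are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REQUESTERS 8     /* hypothetical number of initiators */
#define NO_REQUESTER   (-1)  /* returned when no request is asserted */

/* Returns the index of the asserted requester with the highest priority.
 * Bit r of 'pending' is set when requester r is asserting a request.
 * Assumption: a larger priority[r] value means higher priority, and ties
 * are resolved in favor of the lower index. */
int select_requester(uint32_t pending, const uint8_t priority[NUM_REQUESTERS])
{
    assert(priority != NULL);
    int best = NO_REQUESTER;
    for (int r = 0; r < NUM_REQUESTERS; r++) {
        if ((pending >> r) & 1u) {
            if (best == NO_REQUESTER || priority[r] > priority[best]) {
                best = r;
            }
        }
    }
    return best;
}
```

The returned index plays the role of the active requestor signal; mapping that index to a task number would be a separate table lookup.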

The DMA interprets C language style FOR loop encodings, each represented as a sequence of descriptors. In one embodiment the descriptors are 32-bit descriptors. The variable initialization, termination conditions, and increment amount for each FOR loop are encoded into loop control descriptors (LCDs). Each LCD is capable of defining variables. The functions and routing of operands, or data to implement the loop body, are encoded into data routing descriptors (DRDs). A DRD can define operations using variables, and can define a write destination. Each DRD can be extended to multiple descriptors in order to define complex routing and operations. Loops may nest, and within an inner loop, multiple DRDs in sequence may be used to accomplish more than one write destination per loop iteration.

The DMA works with an assembler to parse code segments written in C language syntax. The assembler then maps them to equivalent representations of LCDs and DRDs according to mappings defined in the appropriate descriptor section. The assembler also assigns variables in accordance with mappings associated with a Variable Table, and assigns values in accordance with the mappings associated with a Task Table. The assembler function is performed by an external software application to encode the fill-in-the-blank bit fields of the LCDs and DRDs from human-readable C-language source code. The assembler implements an almost direct mapping from the source code to the descriptors.

The DMA has a limited programming model accessible by the user. The Task Descriptors, Task Table, and Variable Table(s) are loaded by the user prior to enabling the DMA. Each of these is described in detail hereinbelow.

The present invention will be described with reference to two prior art data processing systems, each having a direct memory access (DMA) unit as illustrated in FIGS. 1 and 2. FIG. 1 illustrates a prior art data processing system 2 having a direct memory access unit (DMA 4), a processor 6, and a memory 8. The processor 6, the memory 8, and the DMA 4 are coupled via multiple buses 10, where data, address and control information is transmitted over the buses 10 and controlled by an arbiter, specifically arbitrator 12. The processor 6 initiates a data transfer transaction by writing to a register in the DMA 4.

The DMA 4 includes three portions, each portion operating as an individual DMA unit. A first portion, DMA.sub.0 14, is coupled to an input/output device, I/O.sub.0 16, and is dedicated for transferring data to and from the I/O.sub.0 16. A second portion, DMA.sub.1 18, is coupled to an input/output device, I/O.sub.1 20, and is dedicated for transferring data to and from the I/O.sub.1 20. A third portion, DMA.sub.2 22, is coupled to an input/output device, I/O.sub.2 24, and is dedicated for transferring data to and from the I/O.sub.2 24. Each DMA portion operates as an individual DMA unit, having registers for storing source and destination addresses, length of data to be transferred, and any other information necessary to effect the transactions. Each DMA portion is coupled to a dedicated I/O unit by a bus and is responsive to requests from the I/O unit. Note that the processor 6 includes a local cache region 26, which may include data and/or instruction portions.

In operation, data processing system 2 will initiate a transaction with the DMA 4 by initializing the channel, i.e. selecting one of the I/O devices and its associated DMA portion for data transfer. The processor 6 loads registers in the DMA with control information, address pointers, and transfer counts. The processor 6 then starts the channel. In this way, the processor 6 enables one of the I/O devices, which then generates a request to the associated DMA.

In response to the request, the DMA transfers data until termination of a data block. During the data transfer phase, the DMA accepts requests for operand transfers and provides addressing and bus control for the transfers. The termination phase occurs after the operation is complete, when the DMA indicates the status of the operation in a status register. Note that each DMA portion, DMA.sub.0 14, DMA.sub.1 18, and DMA.sub.2 22, is dedicated to one of the I/O devices, I/O.sub.0 16, I/O.sub.1 20, and I/O.sub.2 24. In other words, the sources are each coupled to a dedicated DMA. Therefore each DMA portion, DMA.sub.0 14, DMA.sub.1 18, and DMA.sub.2 22, includes address registers and control registers, as well as a storage location for transfer counts. The processor 6 enables one or more of the I/O devices, each of which then generates a request to its associated DMA.
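The prior art sequence described above (initialize the channel, start it, service requests during the data transfer phase, then report status in the termination phase) can be modeled in C roughly as follows. This is an illustrative software model only; the structure dma_channel, its field names, and the three functions are hypothetical and do not correspond to any register map given in the patent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register set of one dedicated DMA portion. */
typedef struct {
    const uint8_t *source;  /* source address pointer */
    uint8_t *destination;   /* destination address pointer */
    size_t count;           /* remaining transfer count */
    int enabled;            /* control: channel has been started */
    int done;               /* status: set in the termination phase */
} dma_channel;

/* Initialization phase: the processor loads pointers and the count. */
void dma_channel_init(dma_channel *ch, const uint8_t *src, uint8_t *dst,
                      size_t count)
{
    assert(ch != NULL);
    ch->source = src;
    ch->destination = dst;
    ch->count = count;
    ch->enabled = 0;
    ch->done = 0;
}

/* The processor starts the channel. */
void dma_channel_start(dma_channel *ch)
{
    assert(ch != NULL);
    ch->enabled = 1;
}

/* Data transfer phase: one request from the I/O device moves one operand.
 * When the count reaches zero the channel records completion in 'done'. */
void dma_channel_service_request(dma_channel *ch)
{
    assert(ch != NULL);
    if (!ch->enabled || ch->done) {
        return;
    }
    if (ch->count > 0) {
        *ch->destination++ = *ch->source++;
        ch->count--;
    }
    if (ch->count == 0) {
        ch->done = 1;
    }
}
```

A caller would initialize and start the channel, then invoke dma_channel_service_request once per device request until done is set. Every transfer of this kind needs the processor to reload the registers, which is the overhead the background section describes.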

In the data processing system 2 of FIG. 1, the DMA 4 is configured with dedicated channels coupling each I/O device to an associated portion of the DMA. In contrast, FIG. 2 illustrates a data processing system 28 having a multiplexer (MUX 30) that selects one I/O device for input to DMA 32. The I/O devices include I/O.sub.0 34, I/O.sub.1 36, and I/O.sub.2 38, that are coupled to MUX 30. The data processing system 28 includes a processor 40 having a local cache 42. A register 44 is coupled to the MUX 30, where the register 44 provides selection control for the MUX 30. The data processing system 28 includes an arbitrator 46, a memory 48, and buses 50 for transferring address, data and control information. The DMA 32, the arbitrator 46, the processor 40, and the memory 48 are each coupled to the buses 50. In this embodiment, the processor 40 selects the I/O device for the transfers.

In contrast to the prior art DMA methods, the present invention provides a DMA unit, i.e. DMA controller, that controls data transfers and other operations using a high-level programming language construct. In one embodiment, the DMA unit uses C-language constructs, and specifically FOR loop constructs. The C-language and the FOR loop constructs are described in detail in "The C Programming Language," by Brian W. Kernighan and Dennis M. Ritchie, published by Prentice Hall, having copyright 1988.

According to one embodiment of the present invention, the DMA unit is a user-programmable engine that interprets a series of C language FOR loop style descriptors to perform a user configurable series of data movements and manipulations. A collection of these descriptors is much like a software program. There are two types of descriptors: Loop Control Descriptors (LCDs) and Data Routing Descriptors (DRDs). Together, these descriptors form a C language FOR loop programming model for the DMA. This adds to the flexibility of prior art DMA units by off-loading computation from the processor, while increasing the ease of use for the programmer. Additionally, this improves performance as the FOR loop may be performed by highly optimized, dedicated purpose DMA state machines. The DMA architecture is optimized for very high throughput rather than complete processing generality, meaning that it functions under the command of a processor.

With respect to the C-language constructs, the FOR loop has the general form:

for (<initial index value(s)>; <termination condition(s)>; <increment value(s)>) { /*loop body*/ }

where the "initial index value" initializes the loop and is therefore done before the loop proper is entered. The "termination condition" is a test that controls the loop. Note that although it is called a termination condition, it is better thought of as a loop continuation condition, as the loop only continues while the termination condition is true. After each iteration of the loop, the increment step is executed, and the termination condition is again checked. If the termination condition is false, then the loop terminates. The body of the loop may be any number of lines. Typically, in a FOR loop the initialization and increment are logically related.
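The order of evaluation described above can be made explicit by writing the same loop twice in C, once as a FOR loop and once with each step spelled out. The function names below are hypothetical and serve only to illustrate the standard C semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n elements using the FOR loop form. */
int sum_with_for(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* The same computation with the three FOR loop fields written out. */
int sum_with_explicit_steps(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    int sum = 0;
    size_t i = 0;            /* initial index value: done once, before the loop */
    while (i < n) {          /* termination condition: loop continues while true */
        sum += data[i];      /* loop body */
        i++;                 /* increment: executed after each iteration */
    }
    return sum;
}
```

Both functions return the same result for any input, including an empty range, where the test fails on the first check and the body never runs.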

As discussed hereinabove, descriptors include LCDs and DRDs. The LCDs specify the index variables, such as memory pointers, byte counters, etc. along with the termination and increment values, while the DRDs specify the nature of the loop body, i.e. how data gets pumped to and from memory and how execution units manipulate data. Inner loops may initialize and compare their loop-index variable(s) to outer loops variable(s), allowing the DMA to perform a variety of useful functions. Further, the DMA looping structure allows it to perform indirections and additions in a loop's loop-index initialization, adding flexibility to the available functions. The DMA also supports nested looping within this programming model.
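One way to picture an inner loop whose index is initialized by indirection through an outer loop variable is a gather over a list of memory segments. The sketch below is an assumption-laden illustration in plain C; the segment structure and the function gather_segments are hypothetical and are not taken from the descriptor formats of FIGS. 6-10.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scatter-gather list entry: a base address and a byte count. */
typedef struct {
    const unsigned char *base;
    size_t length;
} segment;

/* Copies all segments, in order, into one contiguous destination buffer.
 * Returns the number of bytes written (never more than dst_size). */
size_t gather_segments(const segment *list, size_t count,
                       unsigned char *dst, size_t dst_size)
{
    assert(list != NULL && dst != NULL);
    size_t out = 0;
    for (size_t s = 0; s < count; s++) {
        /* The inner index p is initialized by indirection through the outer
         * variable (list[s].base) and compared against a bound that is also
         * derived from the outer variable (base plus length). */
        for (const unsigned char *p = list[s].base;
             p < list[s].base + list[s].length && out < dst_size; p++) {
            dst[out++] = *p;
        }
    }
    return out;
}
```

Only FOR loops are used, in keeping with the restriction described in the text.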

As an example, a DMA program, listed as a sequence of LCDs and DRDs, is as follows:

LCD1  for (i=0; i<3; i++) {
LCD2    for (j=0; j<i; j++)
DRD2      *j = *i;
DRD1    *i = 5;
      }

Each line in the DMA program above represents a successive memory location occupied by the indicated LCD or DRD.

In this example, LCD1 provides the initialization value, termination condition, and step size for a FOR loop. The variable i is initialized to zero (0) and the loop continues iterations while i is less than three (3). On each iteration, the variable i is incremented. Nested inside this outer FOR loop is a FOR loop with another loop index variable, j, which is initialized to zero (0) and is incremented on each iteration of this inner loop. The DRD information is the body of the loop. On each inner-loop iteration, variable i is used as a memory address of data that is to be moved to the memory location addressed by the variable j.

Similarly, on each outer-loop iteration, variable i holds the address of the memory location into which a value of five (5) is to be written. While this is a straightforward example, it illustrates the use of LCDs and DRDs as building blocks to construct programs or algorithms within the DMA controller.
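The behavior of the example program can be modeled in ordinary C. The following is a sketch only: it assumes that the loop indices i and j act as word addresses into a small array named mem, and the names MEM_WORDS and run_example_program are hypothetical.

```c
#include <stddef.h>

#define MEM_WORDS 4

/* Software model of the example DMA program. The indices i and j
 * are treated as word addresses into `mem`, which is an assumption
 * made for illustration. */
void run_example_program(int mem[MEM_WORDS])
{
    for (size_t i = 0; i < 3; i++) {        /* LCD1 */
        for (size_t j = 0; j < i; j++) {    /* LCD2 */
            mem[j] = mem[i];                /* DRD2: *j = *i */
        }
        mem[i] = 5;                         /* DRD1: *i = 5 */
    }
}
```

Starting from the contents {10, 11, 12, 13}, the model leaves {12, 12, 5, 13} in the array: on the final outer iteration, the value at address 2 is copied to addresses 0 and 1 before address 2 is overwritten with 5.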

The DRDs are descriptors that describe assignment statements within the body of the loop in terms of data flow and manipulation. For example, data in the body of the loop may be multiplied together, in which case the DRD will specify data flow through a multiplier for this operation. Similarly, if a logical operation, such as an AND, is indicated, the DRD will specify a data flow through a logic unit for completing this operation. The body of the loop may include a number of levels and combinations of these type functions.

Using the C language constructs and structures, the present invention allows simple encoding of a broad range of applications, including but not limited to simple peripheral to memory transfers, simple memory to memory transfers, simple one-dimensional processing of data, functions of two or more variables, filtering algorithms such as finite impulse response (FIR) and infinite impulse response (IIR) filters, and scatter-gather processing via the indirection capability. Additional processing available includes but is not limited to sample format conversion, data decompression, bit-block transfers, color conversion, and drawing characters. Note that program model memory regions may exist in any memory-mapped space within the data processing system.

To better understand the utilization of programming constructs to implement such applications, it is desirable to define a few terms. According to the present invention, a "descriptor" is a piece of information, typically a predetermined number of bits, that describes a portion of an algorithm, or location of data, or information relating to any other function to be performed within the DMA. This is in contrast to prior art descriptors that were used to indicate memory locations for data movement, but did not include descriptive information necessary to execute an algorithm. A "task" as used throughout this description is a collection of LCD and DRD descriptors that embodies a desired function. A task could include the steps of gathering an ethernet frame, performing some function on the data content, and storing the result in memory. Additionally, the task could complete by interrupting the processor. The DMA will support multiple enabled tasks simultaneously.

A "task table" is a region in memory that contains pointers to each of the DMA program model components on a per-task basis. A register within the DMA, referred to as a "TASKBAR" or task table base address register, gives the location of the task table itself. The entries in the task table define each task's begin and end pointers, its variable table pointer, and other task-specific information. Alternate embodiments may include a subset of this information or may include additional information. The task table points to the tasks in a "task descriptor table," which is a task-specific region in memory containing descriptors that specify the sequence of events for each task. Each task has its own private variable table. FIG. 12 illustrates the interaction of the TASKBAR, the task table, the task descriptor table, and the variable table according to one embodiment. Alternate embodiments may store the LCDs and DRDs in an alternate arrangement or using an alternate storage means.
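One way to picture the relationship between the TASKBAR and the task table is the following C sketch. The structure layout, the field names, the 32-bit widths, and the function task_entry_address are all assumptions made for illustration; the description above does not fix an entry format.

```c
#include <stdint.h>

/* Hypothetical layout of one task table entry. The field names and
 * 32-bit widths are assumptions, not taken from the description. */
typedef struct {
    uint32_t desc_start;  /* pointer to the task's first descriptor */
    uint32_t desc_end;    /* pointer to the task's last descriptor  */
    uint32_t var_table;   /* pointer to the task's variable table   */
    uint32_t flags;       /* other task-specific information        */
} task_entry_t;

/* Address of the entry for task_num, given the base address held
 * in the TASKBAR register. */
uint32_t task_entry_address(uint32_t taskbar, unsigned task_num)
{
    return taskbar + task_num * (uint32_t)sizeof(task_entry_t);
}
```

With this assumed 16-byte entry, a TASKBAR value of 0x1000 places the entry for task 3 at 0x1030.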

FIG. 3 illustrates a data processing system 52 according to one embodiment of the present invention. The data processing system 52 includes a DMA 54, referred to as a SMART DMA, that performs direct memory access transactions incorporating a user-programmable algorithm. The DMA 54 includes a memory portion 56 for storing operational information, such as descriptor storage and/or data buffers. The DMA 54 and the memory portion 56 are both coupled to a communication bus 58, where the communication bus 58 is also coupled to a plurality of input-output (I/O) devices each having a FIFO, including I/O.sub.0 60, I/O.sub.1 62, through I/O.sub.n 64.

Each of the I/O devices is coupled to the DMA 54. The communication bus 58 is also coupled to an arbitrator 66, where communication bus 58 is used to communicate address and control information, plus data and tag information. The DMA 54 provides information to the arbitrator 66 via a DMA master bus 68.

The data processing system 52 also includes a processor 70 coupled to the arbitrator 66 via a processor master bus 72. Address and control information, along with data information is communicated via the processor master bus 72. The processor 70 includes a local cache 74, including instruction(s) and/or data.

In the data processing system 52, the processor 70 initializes the memory regions relating to the DMA 54, including registers, and tables for storing descriptor information and task information. This involves filling the appropriate memory locations. Initialization is performed at start-up and may also be performed at any time, such as on occurrence of an error condition or to reconfigure the system. This information may be stored in the memory portion 56 or in memory 76 or in a combination of both.

FIG. 4 details the DMA 54 where connections to the I/O devices are indicated as REQUESTS 0, 1, through n. The DMA 54 includes a priority decoder 78 that communicates with a master DMA engine (MDE 80) and an address and data sequencer (ADS 82). The MDE is coupled to the ADS by way of a loop control bus 84. The DMA 54 also includes a data routing pool (DRP 86) coupled to a plurality of execution units, including EU.sub.0 88, EU.sub.1 90, through EU.sub.n 92. The priority decoder 78 provides an active requestor to the ADS 82.

A task is started by setting predetermined enable bit(s) within the DMA 54, in response to which the DMA 54 accesses the memory locations where descriptor and task information is stored. The set enable bit(s) indicate a task number corresponding to a task to be performed. Note that in the present embodiment, multiple tasks may be identified by the set bit(s), where each of the multiple tasks is enabled. The DMA 54 first reads a register in the MDE 80, labeled as TASKBAR 94, that provides information relating to an origin location within a task table stored in memory, either memory 76 or the memory portion 56. Note that the TASKBAR register may be located in another functional block, where the information is accessible by the MDE 80. The MDE 80 calculates the origin location in the task table. These registers and tables are further detailed hereinbelow with respect to FIGS. 12 and 13. The task table stores multiple task-specific pointers to at least one task descriptor table. In one embodiment, the pointers include task descriptor start and end pointers, a variable table pointer, a function descriptor base address, configuration bit(s), status information, base address for context save space, and literal-initialization LCD base information.
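The decoding of enable bit(s) into task numbers can be sketched as follows. This is a minimal model that assumes one enable bit per task and sixteen tasks in total; the names MAX_TASKS and enabled_tasks are hypothetical.

```c
#include <stdint.h>

#define MAX_TASKS 16

/* Collects the task numbers whose enable bit is set, lowest task
 * number first, and returns how many tasks are enabled. One bit
 * per task is an assumption made for illustration. */
int enabled_tasks(uint16_t enable_bits, int task_nums[MAX_TASKS])
{
    int count = 0;
    for (int t = 0; t < MAX_TASKS; t++) {
        if (enable_bits & (1u << t)) {
            task_nums[count++] = t;
        }
    }
    return count;
}
```

For example, an enable value of 0x0005 identifies two enabled tasks, numbers 0 and 2.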

The task descriptor table stores algorithmic descriptors, which are used to implement a user-programmable algorithm. In one embodiment, the algorithm is written in C-language constructs, and each task is composed of at least one FOR loop. Each FOR loop is made up of at least one loop control descriptor (LCD) and at least one data routing descriptor (DRD). The DRD defines the body of the loop, while the LCD provides the initialization value(s), the increment(s), and the termination condition(s) for the FOR loop.
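The division of labor between an LCD and a DRD can be modeled in C as a loop-control record plus a body callback. This is a sketch under stated assumptions: the types lcd_t and drd_fn, the functions run_loop and accumulate_index, and the "index less than limit" form of the termination test are all hypothetical.

```c
/* Hypothetical software stand-in for a loop control descriptor. */
typedef struct {
    int init;   /* initialization value                      */
    int limit;  /* loop continues while index < limit        */
    int step;   /* increment applied after each iteration    */
} lcd_t;

/* Hypothetical stand-in for a data routing descriptor: the body
 * of the loop, invoked once per iteration. */
typedef void (*drd_fn)(int index, void *ctx);

/* Runs the body under control of the LCD and returns the number
 * of iterations performed. A non-positive step is rejected so the
 * model cannot loop forever. */
int run_loop(const lcd_t *lcd, drd_fn body, void *ctx)
{
    int iterations = 0;
    if (lcd->step <= 0) {
        return 0;
    }
    for (int i = lcd->init; i < lcd->limit; i += lcd->step) {
        body(i, ctx);
        iterations++;
    }
    return iterations;
}

/* Example body: adds the current index into an int accumulator. */
void accumulate_index(int index, void *ctx)
{
    *(int *)ctx += index;
}
```

With an LCD of {0, 10, 2}, the body runs five times, for index values 0, 2, 4, 6, and 8.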

The DMA 54 retrieves the task information from the task descriptor table corresponding to the task identified by the enable bit(s). The DMA 54 then parses the task information. Parsing involves retrieving LCD and DRD information from the task descriptor table, reading the first loop, decoding at least a portion of the C-encoded LCDs and DRDs stored in the task information, and determining a requestor. The parsing is performed within the MDE 80, which provides the decoded information to the ADS 82. The decoded information then initializes loop index, termination, and increment registers within the ADS 82. The parsed task information identifies a requestor, and the MDE 80 waits to receive a request from that requestor before instructing the ADS 82 to begin processing. Operation of the ADS 82 and the MDE 80 is further detailed hereinbelow.

Continuing with FIG. 4, one embodiment of the invention allows for dynamic request selection. Here, multiple requests are provided to the priority decoder 78 from multiple I/O devices. The priority decoder 78 selects from among the inputs. The initiator/task registers determine which task to parse and process based on the selected requestor, where the selection is made according to information contained within the DRDs of the active task.

The priority decoder 78 selects a highest priority requestor for processing based on a priority table. The selection is made from among those requestors that are currently making a request to the DMA 54. The priority table allows for a predetermined number of priority levels. The priority decoder 78 includes registers that define the associated task number and the priority of each request unit. The priority decoder 78 provides handshake signals in response to request inputs. The priority decoder 78 allows programming of each task for external request, priority of request, auto-start of task, interrupts, etc. In alternate embodiments, priority may be based on schemes such as round robin, time sliced, first in, fixed, last in, etc.

An association is made from a requestor to a specific task number, in the present embodiment numbers 0 to 15. The specific task is executed until the initiator removes the request. Note that while a task is executing, a higher priority requestor may interrupt the task. Interruptions occur at loop iteration boundaries.
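The selection performed by the priority decoder can be sketched in C as a scan over the active requests. The details below are assumptions made for illustration: eight requestors, a larger number meaning a higher priority, ties resolved in favor of the lowest requestor index, and the names req_config_t and select_requestor.

```c
#include <stdint.h>

#define NUM_REQUESTORS 8

/* Hypothetical per-requestor configuration registers. */
typedef struct {
    uint8_t task_num;  /* task associated with this requestor       */
    uint8_t priority;  /* larger value = higher priority (assumed)  */
} req_config_t;

/* Returns the index of the highest-priority requestor whose bit is
 * set in request_mask, or -1 if no request is active. Ties go to
 * the lowest index. */
int select_requestor(uint32_t request_mask,
                     const req_config_t cfg[NUM_REQUESTORS])
{
    int best = -1;
    for (int r = 0; r < NUM_REQUESTORS; r++) {
        if (!(request_mask & (1u << r))) {
            continue;  /* this requestor is not asserting a request */
        }
        if (best < 0 || cfg[r].priority > cfg[best].priority) {
            best = r;
        }
    }
    return best;
}
```

The task to run is then read from the task_num field of the selected requestor. In a fuller model, this selection would be re-evaluated at each loop iteration boundary so that a higher priority requestor could interrupt the running task.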

Upon receiving a request from the priority decoder 78, the ADS 82 reads data according to the order specified in the DRD retrieved from the memory 76. Note that data may be retrieved from an EU, an internal loop register, or a memory read. If the data is routed to an EU, the data is run through a predetermined data path in the DRP 86 according to descriptor information. As illustrated in FIG. 4, data flows from the DRP 86 to the appropriate one or more of the execution units. Each of the execution units has a specific assigned function for a given DRD. In this way, they are configurable and may be user programmed by changing the information in the DRD. This adds flexibility to the DMA 54 by providing a means of implementing any combination of these functions. From the execution unit, manipulated data flows to the DRP 86 for further routing to another of the execution units or to memory or to an internal loop register via the ADS 82.

As discussed hereinabove, the DRD descriptors provide information relating to the body of the loop in terms of data flow and manipulation. If the DRD specifies that two terms are to be multiplied together and then the result is to be logically ANDed with another term, the ADS 82 will first route data through the DRP 86 to the particular execution unit that performs the multiplication. The output of the execution unit is provided via a data bus back to the DRP 86. The ADS 82 then directs data via the DRP 86 to the execution unit that performs the AND operation. The result of this execution unit is then provided back to the DRP 86, which routes the result to the ADS 82. The result is then stored in the memory or to a loop register as specified in the body of the loop. To facilitate this data flow, each execution unit, EU.sub.0 88, EU.sub.1 90, through EU.sub.n 92, is coupled to and receives data from the DRP 86 via data bus 96, data bus 98, through data bus 100, respectively. Similarly, each execution unit, EU.sub.0 88, EU.sub.1 90, through EU.sub.n 92, is coupled to and provides data to the DRP 86 via data bus 102, data bus 104, through data bus 106, respectively.
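The multiply-then-AND data flow described above can be modeled in C with function pointers standing in for execution units whose functions have been assigned. This is a sketch only; the names eu_fn, eu_multiply, eu_and, and route_multiply_then_and are hypothetical, and 32-bit unsigned operands are assumed.

```c
#include <stdint.h>

/* An execution unit is modeled as a two-operand function. */
typedef uint32_t (*eu_fn)(uint32_t a, uint32_t b);

static uint32_t eu_multiply(uint32_t a, uint32_t b) { return a * b; }
static uint32_t eu_and(uint32_t a, uint32_t b)      { return a & b; }

/* Models a DRD that computes (a * b) & mask by routing data through
 * two execution units in sequence. */
uint32_t route_multiply_then_and(uint32_t a, uint32_t b, uint32_t mask)
{
    eu_fn eu0 = eu_multiply;       /* function assigned to the first EU  */
    eu_fn eu1 = eu_and;            /* function assigned to the second EU */

    uint32_t product = eu0(a, b);  /* routing pool -> first EU -> pool   */
    return eu1(product, mask);     /* pool -> second EU -> pool -> ADS   */
}
```

Routing the operands 6 and 7 with a mask of 0x0F yields 10, since 42 is 0x2A and its low four bits are 0xA.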

The present invention presents a data flow-through execution unit, where the function of the execution unit is assigned and then data is pumped through it. This saves processing time and adds to the flexibility of data processing.

The ADS 82 provides information to the DRP 86 via bus 108, and receives information from the DRP 86 via bus 110. The ADS 82 provides address control information to a memory interface unit 112 via bus 114. The memory interface unit 112 is coupled to the DMA master bus 68 and the communication bus 58. The memory interface unit 112 is bidirectionally coupled to ADS 82 via bus 116. The ADS 82 reads data from and writes data to the memory interface unit 112 via bus 116. The memory interface unit 112 provides information directly to the DRP 86 via bus 118.

The ADS 82 also includes register(s) 120 for writing control information for each task. The ADS 82 is basically the engine that pumps data through the DMA 54. Based on configuration bits set by the MDE 80 per the application program, the ADS 82 fetches as many operands as required and optionally routes them to the execution units. The ADS 82 evaluates termination conditions and stores the result in memory or elsewhere. The ADS 82 may store results internally in loop-index registers. Similarly, results may be provided to an EU as operands. Operation of the ADS 82 is controlled by a state machine, as is operation of the MDE 80.

FIG. 5 illustrates the state machine operation of the ADS 82. Basically, the ADS 82 performs loop execution control and sequences data movement. At state 122, function descriptors are loaded into each of the execution units. The execution unit function descriptors, or EUFDs, specify the operation to be performed. This may be a Boolean operation, a multiplication, an addition, an error check, a data compression or decompression, or any operation implementable by an execution unit. An EUFD is loaded into each execution unit as needed, as specified by the function numbers in the DRDs. Once all of the function descriptors are loaded, the DMA 54 transitions to state 124 to read operand(s). The operands are the data values used in the body of the FOR loop. Note that in an alternate embodiment, the EUs may store EUFD information internally, avoiding the need to load EUFDs at state 122.

From state 124, if the DRD indicates that an execution unit is to be used, the DMA 54 transitions to state 126, and the DRP 86 passes data to that execution unit. The DMA 54 loops through state 124 and state 126 until all required operands are retrieved. For memory to memory transfers, the DMA 54 transitions from state 124 to state 128 to write data to memory. Similarly, after all required operands are retrieved, the DMA 54 transitions from state 126 to state 128. Once the iteration is complete, the DMA 54 transitions back to state 124. Note that data need not be written at state 128; it may instead be retained for later processing. Basically, the DRD describes the following cases: no action; write data to memory; just accumulate; or write to an internal register, such as a loop-index register.
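The state transitions described for the ADS 82 can be summarized as a next-state function in C. This is a simplified sketch: the enumerator values merely reuse the reference numerals of the states for readability, and the input structure ads_inputs_t and the function ads_next_state are hypothetical.

```c
/* States of the model, numbered after the reference numerals used
 * in the description (an illustrative choice only). */
typedef enum {
    ADS_LOAD_EUFD    = 122,  /* load function descriptors into EUs */
    ADS_READ_OPERAND = 124,  /* read operand(s)                    */
    ADS_EU_PASS      = 126,  /* pass data to an execution unit     */
    ADS_WRITE        = 128   /* write, accumulate, or take no action */
} ads_state_t;

/* Hypothetical inputs that steer the transitions. */
typedef struct {
    int eufds_loaded;        /* nonzero once all EUFDs are loaded     */
    int uses_eu;             /* nonzero if the DRD routes through an EU */
    int operands_remaining;  /* operands still to be retrieved        */
} ads_inputs_t;

ads_state_t ads_next_state(ads_state_t state, const ads_inputs_t *in)
{
    switch (state) {
    case ADS_LOAD_EUFD:
        return in->eufds_loaded ? ADS_READ_OPERAND : ADS_LOAD_EUFD;
    case ADS_READ_OPERAND:
        /* Memory to memory transfers go straight to the write state. */
        return in->uses_eu ? ADS_EU_PASS : ADS_WRITE;
    case ADS_EU_PASS:
        /* Loop back for more operands until all are retrieved. */
        return in->operands_remaining > 0 ? ADS_READ_OPERAND : ADS_WRITE;
    case ADS_WRITE:
        /* Iteration complete: return to read the next operand(s). */
        return ADS_READ_OPERAND;
    }
    return state;
}
```

The model omits what happens inside each state, such as which of the write-state cases applies, and captures only the transitions between states.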

FIG. 5 describes one mode of operation referred to as "precise mode." In precise mode, consistent with C language program execution, the loop increments are performed at the end of the loop. If there are multiple indexes, the increments are clustered at the end of the loop. In an alternate mode, referred to as an "imprecise mode," the loop increment is performed as the index is used within the loop, thus s