Disclosed is a digital multiplier-accumulator circuit utilizing a carry save adder tree, pipeline register and carry select adder. Also disclosed is a digital multiplier circuit including a carry save adder tree and a pipeline register.
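The data flow of such a multiplier-accumulator can be sketched in software. The following is a minimal behavioral model, not the disclosed circuit: it assumes unsigned operands, models the pipeline register as a plain tuple, and uses an ordinary addition as a stand-in for the carry select adder. The names `csa`, `partial_products`, `csa_tree_reduce` and `mac` are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry save adder: three operands in, a sum word and a carry word out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def partial_products(x, y, width):
    """One shifted copy of x for every set bit of the unsigned multiplier y."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]


def csa_tree_reduce(operands):
    """Apply layers of 3:2 compression until only two operands remain."""
    ops = list(operands)
    while len(ops) > 2:
        nxt = []
        full = len(ops) - len(ops) % 3
        for i in range(0, full, 3):
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])  # leftovers pass through to the next layer
        ops = nxt
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1]


def mac(acc, x, y, width=8):
    """Return acc + x * y for unsigned x, y (y must fit in `width` bits)."""
    # Stage 1: the accumulator enters the tree alongside the partial products.
    pipeline_register = csa_tree_reduce(partial_products(x, y, width) + [acc])
    # Stage 2: a single carry-propagating addition (stand-in for carry select).
    s, carry = pipeline_register
    return s + carry


print(mac(10, 13, 11))  # 153
```

Feeding the accumulator into the tree as one more operand means only one carry-propagating addition is needed per multiply-accumulate, which is the usual motivation for a carry save structure.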
A discrete cosine transform engine is disclosed. The engine receives an input matrix of data and provides a transformed matrix of data, the input matrix and the transformed matrix each having a plurality of row locations and a plurality of column locations. The engine includes a plurality of input accumulators and a plurality of output accumulators. The plurality of input accumulators accumulate data from the input matrix of data in parallel to provide a plurality of transform coefficients. A digital signal processor receives the plurality of transform coefficients, multiplies the transform coefficients by a plurality of transform constants and provides a plurality of transform products. Each output accumulator receives the transform products and accumulates the products to provide the transformed matrix of data.
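The accumulate, multiply, accumulate structure can be illustrated with a small software model. This is a sketch under stated assumptions rather than the disclosed engine: it assumes the input accumulators form sums and differences of mirrored samples, that the transform constants are the cosines of an unnormalized DCT-II, and that the row length is even. The function names are hypothetical.

```python
import math


def dct_1d(x):
    """Unnormalized DCT-II of an even-length sequence via sums and differences."""
    n = len(x)
    half = n // 2
    # "Input accumulators" (assumed): combine mirrored samples.
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    diffs = [x[i] - x[n - 1 - i] for i in range(half)]
    out = []
    for k in range(n):
        src = sums if k % 2 == 0 else diffs
        acc = 0.0  # "output accumulator" for coefficient k
        for i in range(half):
            constant = math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            acc += src[i] * constant  # multiply by constant, then accumulate
        out.append(acc)
    return out


def dct_1d_direct(x):
    """Reference: the textbook unnormalized DCT-II definition."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def dct_2d(matrix):
    """Separable 2-D transform: rows first, then columns."""
    rows = [dct_1d(r) for r in matrix]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

The sum and difference step works because the cosine basis is symmetric about the midpoint for even k and antisymmetric for odd k, which halves the number of multiplications per coefficient.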
An inner product calculating circuit executes a calculation of an inner product on the basis of one or more vector data and one or more coefficients. The circuit comprises a selective inverter for selectively inverting individual bits of the vector data; a bit position shifter for shifting, in accordance with the coefficients, the bit positions of the vector data inverted selectively by the selective inverter; a bit supplementer for supplementing, with either "1" or "0", any vacant bit of the vector data where the bit positions have been shifted by the bit position shifter; and an accumulator for accumulating the initial values preset in conformity with the coefficients and the vector data supplemented with "1" or "0" in any vacant bit thereof by the bit supplementer. This circuit configuration eliminates the need for a powerful buffer or a low-order bit controller, thereby reducing the circuit scale while realizing a fast inner product calculation merely by presetting specific values as initial values in conformity with the coefficients.
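One way to read this arrangement is as a shift-and-add inner product in which each coefficient is a sum of signed powers of two. The sketch below is an interpretation, not the disclosed circuit: it assumes left shifts, assumes that inverted data has its vacant low-order bits filled with "1" and non-inverted data with "0", and assumes the preset initial value is the count of negative terms. The name `inner_product` and the `(sign, shift)` encoding are hypothetical.

```python
def inner_product(vectors, coeff_terms):
    """Inner product where coefficient i equals sum(sign * 2**shift) over coeff_terms[i]."""
    # Preset initial value: one "+1" correction per negative term (assumed),
    # because inverting and 1-filling yields -(x << shift) - 1.
    acc = sum(1 for terms in coeff_terms for sign, _ in terms if sign < 0)
    for x, terms in zip(vectors, coeff_terms):
        for sign, shift in terms:
            if sign < 0:
                # Selective inverter, shifter, then vacant bits filled with 1.
                term = (~x << shift) | ((1 << shift) - 1)
            else:
                # No inversion; vacant bits filled with 0.
                term = x << shift
            acc += term
    return acc


# Coefficients 7 = 8 - 1, -3 = -4 + 1, 6 = 8 - 2
terms = [[(1, 3), (-1, 0)], [(-1, 2), (1, 0)], [(1, 3), (-1, 1)]]
print(inner_product([3, -5, 7], terms))  # 78
```

Under these assumptions the correction for a negated term is always 1 regardless of the shift amount, so all corrections can be folded into a single preset constant instead of being handled by separate low-order bit logic.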
A high-speed three-to-one data dependency collapsing ALU can be used to support multiple issue of instructions. Because the computing apparatus supports multiple issue of instructions, it is useful in CISC, superscalar, superscalar RISC, and similar computer designs. The concept of the ALU is presented along with a detailed description of a design. The apparatus allows the execution of any combination of two independent or dependent arithmetic or logical instructions in a single machine cycle. The 3-1 collapsing ALU structure has a 3-2 carry save adder (CSA); a 2-1 control arithmetic logic unit (CALU) coupled for an input from the carry save adder; a first pre-adder logic block coupled with an output to the control arithmetic logic unit; a control generator; and a second controlled logic block coupled to receive an input from said control generator and having its output coupled to said control arithmetic logic unit. Instructions have an add/logical combinatorial operation which combines all four of the combinations: add-add, add-logical, logical-add, and logical-logical functions. Two or more disassociated ALU operations are specified by a single interlock collapsing ALU which responds to the parallel issuance of a plurality of separate instructions, including RISC type instructions, each of which specifies ALU operations, and the computing apparatus executes the instructions in parallel in a single machine cycle.
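The idea of collapsing a dependent instruction pair into one three-operand operation can be sketched functionally. The model below is an illustration under assumptions, not the disclosed design: it assumes a 32-bit datapath and unsigned operand encodings, models only the add-add case with an explicit 3-2 carry save adder, and evaluates the other three combinations directly without modeling the pre-adder logic or control generator. The names `csa_3_2`, `alu_op` and `collapse` are hypothetical.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath


def csa_3_2(a, b, c):
    """3-2 carry save adder: reduce three operands to a sum word and a carry word."""
    s = a ^ b ^ c
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return s, carry


def alu_op(op, a, b):
    """A single two-operand ALU operation."""
    if op == "add":
        return (a + b) & MASK
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "xor":
        return a ^ b
    raise ValueError(f"unsupported operation: {op}")


def collapse(op1, op2, a, b, c):
    """Compute op2(op1(a, b), c), the result of two dependent instructions, in one call."""
    if op1 == "add" and op2 == "add":
        # Add-add: the CSA removes the serial dependency, leaving one 2-1 addition.
        s, carry = csa_3_2(a, b, c)
        return (s + carry) & MASK
    # Add-logical, logical-add, logical-logical: functional model only.
    return alu_op(op2, alu_op(op1, a, b), c)


print(collapse("add", "add", 1, 2, 3))  # 6
```

For the add-add case the carry save adder has no carry propagation of its own, so the total delay stays close to that of a single two-operand addition, which is what makes single-cycle execution of the dependent pair plausible.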
A data processing system (10) which primarily supports fractional multiplication operations has a multiplication logic circuit (20) for executing integer multiplication functions efficiently. During an integer multiplication function, two multiplicands are multiplied together as if the multiplication function were fractional. A predetermined accumulation input is stored and shifted to the right by a Right Shift Logic circuit (32) before being added to a product of the two multiplicands. An accumulated product of the multiplication function is formed by an adder (36) and shifted to the left by a Left Shift Logic circuit (38) until the accumulated product is in integer form. Implementing an integer multiplication operation with a fractional multiplier in the data processing system requires a single software instruction.
A multiply-accumulate unit, or MAC, may achieve high throughput. The MAC need not use redundant hardware, such as multiple Wallace trees, or pipelining logic, yet may perform Wallace tree and carry look-ahead adder functions simultaneously for different operations.