Apparatus for calculating delay when executing vector tailgating instructions and using delay to facilitate simultaneous reading of operands from and writing of results to same vector register
Improved performance is obtained in computers of the type having vector registers which communicate with one or more functional units and common memory. As elements of a vector are read from a vector register for transmission to common memory or as operands to a functional unit, the vector register immediately becomes available to receive and store elements of a vector from common memory or a functional unit. The element-by-element storing takes place simultaneously with the element-by-element reading, and trails the reading by at least one element so as to not overwrite elements yet to be read. Through the use of this technique a vector register can be loaded with a vector for a subsequent operation without having to wait for the completion of the previous operation which uses the same vector register.
A system for handling distributed results in a high-performance wide-issue superscalar processor having result-forwarding capability is disclosed. The system generally includes buffer logic configured to produce write data and write information to a register file. The register file generally has a plurality of registers and is adapted to receive the write information, the write data, and read information. The register file also includes logic configured to produce the write data as read data output when the read information and the write information specify the same register. An embodiment of the disclosed register file includes multiple registers for storing data, read logic, correction logic, and muxing logic.
Vector shifting elements of a vector register by varying amounts in a single process is achieved in a vector supercomputer processor. A first vector register contains a set of operands, and a second vector register contains a set of shift counts, one shift count for each operand. Operands and shift counts are successively transferred to a vector shift functional unit, which shifts the operand by an amount equal to the value of the shift count. The shifted operands are stored in a third vector register. The vector shift functional unit also achieves word shifting of a predetermined number of vector register elements to different word locations of another vector register.
Vector shifting elements of a vector register by varying amounts in a single process is achieved in a vector supercomputer processor. A first vector register contains a set of operands, and a second vector register contains a set of shift counts, one shift count for each operand. Operands and shift counts are successively transferred to a vector shift functional unit, which shifts the operand by an amount equal to the value of the shift count. The shifted operands are stored in a third vector register. The vector shift functional unit also achieves word shifting of a predetermined number of vector register elements to different word locations of another vector register.
Method and apparatus for vector processing on a computer system. As the last element of a group of elements (called a "chunk") in a vector register is loaded from memory, the entire chunk is marked valid and thus made available for use by subsequent or pending operations. The vector processing apparatus comprises a plurality of vector registers, wherein each vector register holds a plurality of elements. For each of the vector registers, a validity indicator is provided wherein each validity indicator indicates a subset of the elements in the corresponding vector register which are valid. A chunk-validation controller is coupled to the validity indicators operable to adjust a value of the validity indicator in response to a plurality of elements becoming valid. An arithmetic logical functional unit (ALFU) is coupled to the vector registers to execute functions specified by program instructions. A vector register controller is connected to control the vector registers in response to program instructions in order to cause valid elements of a selected vector register to be successively transmitted to said ALFU, so that elements are streamed through said ALFU at a speed that is determined by the availability of valid elements from the vector registers. The ALFU optionally comprises a processor pipeline to hold operand data for operations not yet completed while receiving operands for successive operations. The ALFU also optionally comprises an address pipeline to hold element addresses corresponding to the operands for operations not yet completed while receiving element addresses corresponding to the operands for successive operations.
A plurality of special multi-element registers, called "vector registers" herein, are incorporated into a scalar computer. The vector registers are controlled to sequence the transfer of vector data between a main memory and a processing unit of the computer to occur one element at a time until an entire array of vector data has been processed. The vector registers operate concurrently with the processing unit and the main memory. A common address scheme is used between the vector registers and the scalar registers of the computer so the vector registers are visible in the scalar register address space. Pointers are used in the vector registers to keep track of the order of the array elements during processing. Vector registers are used to store intermediate results of the vector processing operations.