A technique for providing adaptive 128-bit load and store operations to support architecture extensions for computations on a 128-bit quadruple precision format, in which a single set of load and store instructions provides for save and restore operations on both 80-bit and 128-bit floating point register files. The 128-bit load and store instructions are used for moving values that are 128-bit aligned in memory. The transfer entails the movement of data between a 128-bit memory boundary and a floating point register file for register save and restore operations. In one embodiment, 80-bit registers are used and in a second embodiment 128-bit registers are used. The same instructions operate on both the 80-bit and 128-bit registers to map the content of a given register into a 128-bit boundary field in memory. A load/store unit allocates the bit positioning so that when 80-bit registers are used, the 80 bits are moved into the most significant bit positions of the 128-bit boundary field. The remaining bit positions are filled with 0s. When values are moved to memory the reverse operation is performed.
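The bit positioning described in this abstract can be illustrated with a minimal sketch. The function names, the big-endian byte order, and the choice to model the zero-filled placement on the register-to-memory path are assumptions made for illustration; the abstract itself does not fix them.

```python
REG_BITS = 80                     # width of the narrower register embodiment
FIELD_BITS = 128                  # width of the aligned memory field
PAD_BITS = FIELD_BITS - REG_BITS  # 48 low-order bits that are zero-filled


def store_80_to_128(reg_value: int) -> bytes:
    """Place an 80-bit value in the most significant bits of a 128-bit field.

    Hypothetical helper: the remaining 48 bit positions are filled with 0s.
    Big-endian byte order is assumed only to keep the hex dump readable.
    """
    if not 0 <= reg_value < (1 << REG_BITS):
        raise ValueError("value does not fit in 80 bits")
    return (reg_value << PAD_BITS).to_bytes(FIELD_BITS // 8, "big")


def load_128_to_80(field: bytes) -> int:
    """Reverse mapping: recover the 80-bit value from the 128-bit field."""
    if len(field) != FIELD_BITS // 8:
        raise ValueError("expected a 16-byte field")
    return int.from_bytes(field, "big") >> PAD_BITS


# 80-bit extended-precision encoding of 1.0 (exponent 0x3FFF, explicit leading 1)
one = 0x3FFF8000000000000000
field = store_80_to_128(one)
restored = load_128_to_80(field)
```

Because the padding is all zeros and sits below the register content, the round trip is lossless for any 80-bit value, which is what a register save and restore pair needs.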
A floating-point unit of a computer includes a floating-point computation unit, floating-point registers and a floating-point status register. The floating-point status register may include a main status field and one or more alternate status fields. Each of the status fields contains flag and control information. Different floating-point operations may be associated with different status fields. Subfields of the floating-point status register may be updated dynamically during operation. The control bits of the alternate status fields may include a trap disable bit for deferring interruptions during speculative execution. A widest range exponent control bit in the status fields may be used to prevent interruptions when the exponent of an intermediate result is within the range of the register format but exceeds the range of the memory format. The floating-point data may be stored in big endian or little endian format.
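As a rough illustration of how a main status field and alternate status fields might interact with a trap disable bit, consider the following sketch. All class names, the indexing scheme, and the exact behavior on an exception (the flag is always recorded, and the interruption is raised only when the trap disable bit is clear) are assumptions, not details given in the abstract.

```python
from dataclasses import dataclass, field


class FloatingPointTrap(Exception):
    """Hypothetical stand-in for a floating-point interruption."""


@dataclass
class StatusField:
    # Control information (assumed names for the bits the abstract mentions)
    trap_disable: bool = False
    widest_range_exponent: bool = False
    # Flag information: names of exception conditions seen so far
    flags: set = field(default_factory=set)


@dataclass
class StatusRegister:
    main: StatusField
    alternates: list

    def select(self, index: int) -> StatusField:
        """Index 0 is the main field; 1..n are the alternates (assumed)."""
        return self.main if index == 0 else self.alternates[index - 1]


def report(status: StatusField, condition: str) -> None:
    """Record a condition; interrupt only if traps are not disabled."""
    status.flags.add(condition)
    if not status.trap_disable:
        raise FloatingPointTrap(condition)


fpsr = StatusRegister(main=StatusField(),
                      alternates=[StatusField(trap_disable=True)])

# A speculative operation associated with alternate field 1 defers the trap.
report(fpsr.select(1), "overflow")

# The same condition under the main field interrupts immediately.
try:
    report(fpsr.select(0), "overflow")
    main_trapped = False
except FloatingPointTrap:
    main_trapped = True
```

Under these assumptions, a later non-speculative check could inspect the flags of the alternate field to decide whether the deferred condition needs handling.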
A software mechanism for enabling a programmer to embed selected machine instructions into program source code in a convenient fashion, and optionally restricting the re-ordering of such instructions by the compiler without making any significant modifications to the compiler processing. Using a table-driven approach, the mechanism parses the embedded machine instruction constructs and verifies syntax and semantic correctness. The mechanism then translates the constructs into low-level compiler internal representations that may be integrated into other compiler code with minimal compiler changes. When also supported by a robust underlying inter-module optimization framework, library routines containing embedded machine instructions according to the present invention can be inlined into applications. When those applications invoke such library routines, the present invention enables the routines to be optimized more effectively, thereby improving run-time application performance. A mechanism is also disclosed using a "_fpreg" data type to enable floating-point arithmetic to be programmed from a source level where the programmer gains access to the full width of the floating-point register representation of the underlying processor.
A method and apparatus for emulating an instruction on a processor. The instruction operates on an operand in a first data format and the processor operates in a second data format. The operand is converted from the first data format to the second data format. The processor then executes the instruction in the second data format to generate a result in the second data format. The result is converted from the second data format to the first data format.
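The convert, execute, convert-back sequence can be sketched as follows. The choice of formats is an assumption for illustration only: IEEE 754 binary16 bit patterns stand in for the first data format, and Python's native binary64 float stands in for the second; the function names are hypothetical.

```python
import struct


def half_bits_to_float(bits: int) -> float:
    """Convert a binary16 bit pattern (first format) to a native float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_half_bits(value: float) -> int:
    """Convert a native float (second format) back to a binary16 pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def emulate_add(a_bits: int, b_bits: int) -> int:
    """Emulate a binary16 add on a processor that works in binary64."""
    a = half_bits_to_float(a_bits)   # operand: first format -> second format
    b = half_bits_to_float(b_bits)
    result = a + b                   # instruction executes in the second format
    return float_to_half_bits(result)  # result: second format -> first format


# 0x3C00 encodes 1.0 and 0x4000 encodes 2.0 in binary16
sum_bits = emulate_add(0x3C00, 0x4000)
```

Here `sum_bits` is the binary16 encoding of 3.0. Results that do not fit the narrower format would need explicit handling in a fuller version, since `struct.pack` raises `OverflowError` for values too large for binary16.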
A method and instruction for converting a number from a floating point format to an integer format are described. Numbers are stored in the floating point format in a register of a first set of architectural registers in a packed format. At least one of the numbers in the floating point format is converted to at least one 8-bit number in the integer format. The 8-bit number in the integer format is placed in a register of a second set of architectural registers in the packed format.
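A minimal sketch of such a packed conversion is shown below. Several details are assumptions, since the abstract does not state them: the source register is modeled as four single-precision values in 16 bytes, the destination as an 8-byte register whose low four bytes receive the results, and the conversion truncates toward zero, saturates to the signed 8-bit range, and maps NaN to 0.

```python
import struct


def cvt_packed_f32_to_i8(src: bytes) -> bytes:
    """Convert four packed float32 values to four packed signed 8-bit integers.

    Hypothetical semantics: truncate toward zero, saturate to [-128, 127],
    NaN becomes 0. The upper four bytes of the 8-byte destination are zeroed.
    """
    values = struct.unpack("<4f", src)
    converted = []
    for x in values:
        if x != x:                      # NaN compares unequal to itself
            converted.append(0)
        else:
            clamped = max(-128.0, min(127.0, x))
            converted.append(int(clamped))  # int() truncates toward zero
    return struct.pack("<4b", *converted) + bytes(4)


src_reg = struct.pack("<4f", 1.9, -2.5, 300.0, -1000.0)
dst_reg = cvt_packed_f32_to_i8(src_reg)
lanes = struct.unpack("<4b", dst_reg[:4])
```

With these inputs the four lanes come out as 1, -2, 127 and -128, which shows both the truncation and the saturation at each end of the 8-bit range.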