BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates generally to the field of digital data processing
systems.
2. Description of the Prior Art
A typical digital data processing system includes three basic elements,
namely a processor element, a memory element, and an input/output element.
The memory element stores information in addressable storage locations.
This information includes both data and instructions for processing the
data. The processor element includes one or more digital data processing
units, or "processors", each of which causes information to be
transferred, or fetched, to it from the memory element, interprets the
incoming information as either instructions or data, and processes the
data in accordance with the instructions. The results are then stored in
addressed locations in the memory element.
The input/output element also communicates with the memory element in order
to transfer information into the system and to obtain the processed data
from it. Units comprising the input/output element normally operate in
accordance with control information supplied to them by the processor
element. The control information defines the operation to be performed by
the input/output unit. At least one class of operations performed by an
input/output unit is the transfer of user information, that is,
information used by a user program, between the input/output unit and the
memory element. Typical units comprising the input/output element include,
for example, printers, teletypewriters, and video display terminals, and
may also include secondary information storage devices such as disk or
tape storage units.
In addition to functioning as input/output devices, disk storage units and,
sometimes, tape storage units may also function as part of the memory
element. In particular, a memory element typically includes a main memory,
whose contents are accessible to the processor relatively quickly but
which is generally relatively high-cost storage. Modern main memories are
typically implemented using MOS or bipolar semiconductor technology and
may provide on the order of a fraction of a megabyte to several tens of
megabytes of storage.
In many digital data processing systems, the processor (assuming only one
processor), mass storage devices and other input/output devices all
communicate with a single main memory or only a few main memory modules.
This may produce contention for the main memory which can interfere with
the processor's ability to quickly obtain information from the main
memory. This, in turn, can slow the processor's ability to execute
programs. The contention problem is exacerbated if all of the units are
connected to a single input/output bus, as all information that is
transferred must be transferred over the single bus.
Accordingly, in many modern computer systems, the processor includes a
cache memory, which is a small private memory accessible only to the
processor which stores information from the most recently-requested
locations in main memory and from nearby locations. In typical data
processing systems, when the processor requests an item of information
from a location in the main memory, it will oftentimes require the
contents of adjacent locations shortly thereafter. Accordingly, when the
processor is able to request information from the main memory, it requests
more than it needs at that immediate time, with the expectation that it
will likely need at least some of the remaining information shortly
thereafter. When the processor gets the item of information it then needs, it
can immediately begin using it, and if it turns out that the processor can
use the other information that was received, it will have that information
stored in the cache, and will not have to wait until it is obtained from
the main memory.
Typically a cache memory is organized into blocks each capable of storing a
predetermined amount of information. When information has been retrieved
from main memory and loaded into a cache block, that block is assigned an
address, termed a "tag". The tag corresponds to the address of the
corresponding locations in main memory from which the information was
retrieved; thus the blocks of the cache are identified with the locations
in the main memory. When the processor requires information, the tags in
the cache can be examined to determine whether a block contains the
requested information. If one does, the information is obtained from the
cache; otherwise, the processor retrieves the information from the main
memory.
A number of problems arise if the data processing system is a
multiprocessing system, that is, if it has a number of processors each of
which has access to the memory and each of which has a cache memory. For
example, under some circumstances, it will be necessary to indicate to
other processors that one processor has updated a location in memory which
has been cached. Otherwise, the processors which may have cached the
updated data may operate on stale data from their caches.
In addition, under some circumstances data retrieved by a processor should
not be cached. For example, if the processor is performing a
read-modify-write operation, it will read the contents of a location,
modify it and write the modified data to the same location. Usually, data
retrieved during a read-modify-write operation will not be cached.
Similarly, data retrieved from input/output units should not be cached.
Since processors normally cache retrieved data, it will be desirable to
indicate to a retrieving processor when the data should not be cached.
SUMMARY OF THE INVENTION
The invention provides a new processor for use in a digital data processing
system.
The processor includes a circuit which receives a signal from external
circuitry regulating caching of the data. If the signal is asserted, the
processor does not store the received data in the cache.
BRIEF DESCRIPTION OF THE DRAWINGS
This invention is pointed out with particularity in the appended claims.
The above and further advantages of this invention may be better
understood by referring to the following description taken in conjunction
with the accompanying drawings, in which:
FIG. 1A is a general block diagram of a digital data processing system
which incorporates the invention, and FIG. 1B is an organizational block
diagram of a processor used in the system depicted in FIG. 1A;
FIG. 2, comprising FIGS. 2A through 2D, is a timing diagram useful in
understanding the invention;
FIGS. 3A, 3B and 3C, are block diagrams of a portion of the processor
depicted in FIG. 1B particularly relating to the transfer of information
through the data path;
FIG. 4A is a detailed block diagram, and FIGS. 4B-1 and 4B-2, are more
detailed circuit diagrams, of a portion of the processor depicted in FIG.
1 particularly relating to the translation of virtual addresses into
physical addresses;
FIG. 5 is a detailed block diagram of a portion of the processor depicted
in FIG. 1B particularly relating to the retrieval of data from the cache
memory; and
FIG. 6 is a detailed block diagram of a portion of the processor depicted
in FIG. 1B particularly relating to the circuits for controlling transfers
with other portions of the system.
DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
General Description
Referring to FIG. 1, a data processing system including the invention
includes, as basic elements, a central processor unit (CPU) 10, a memory
11 and one or more input/output subsystems 12 (one input/output subsystem
is shown in FIG. 1). A bus 13 interconnects the CPU 10, memory 11 and
input/output subsystems 12 in parallel. The CPU 10 executes instructions
that are stored in addressable storage locations in the memory 11. The
instructions identify operations that are to be performed on operands,
which are also stored in addressable locations in the memory unit. The
instructions and operands are fetched by the CPU 10 as they are needed,
and processed data are returned for storage in the memory 11. The CPU 10
also transmits control information to the input/output subsystems 12,
enabling them to perform selected operations, such as transmitting data to
or retrieving data from the memory 11. Such data may include instructions
or operands which may be transmitted to the memory 11 or processed data
which is retrieved from the memory 11 for storage or display.
An operator's console 14 serves as the operator's interface. It allows the
operator to examine and deposit data, halt the operation of the CPU 10 or
step the CPU 10 through a sequence of instructions and determine the
responses of the CPU 10 thereto. It also enables an operator
to initialize the system through a boot strap procedure, and perform
various diagnostic tests on the entire data processing system.
The data processing system may include several types of
input/output units 20, including disk and tape secondary storage units,
teletypewriters, video display terminals, line printers, telephone and
computer network units, and the like. All of these units communicate with
the bus 13 over a device bus 21 through one or more controllers 22. A
controller 22, the device bus 21 to which it is connected, and the
input/output units 20 which communicate with the controller define one
input/output subsystem 12.
The memory 11 includes a memory controller 15, which is connected directly
to the bus 13 and to a plurality of arrays 17. The arrays 17 contain a
plurality of addressable storage locations in which information is stored.
The memory controller 15 receives transfer requests from the CPU 10 or
from an input/output subsystem 12 over the bus 13. Several types of
transfer requests may be transmitted over bus 13, which fall into two
general categories. In one category, information is written into, or
stored in, a storage location, and in the other category, information is
retrieved, or read, from a storage location.
The system depicted in FIG. 1 also includes a write buffer 23 which
connects to bus 13 and memory controller 15 and intercepts write transfer
requests which are directed by CPU 10 to memory 11. In that system,
memory controller 15 does not respond to write requests which are
transmitted over the bus 13 by either the CPU 10 or the input/output
controller 22. In particular, the write buffer 23 buffers the write
information, including both the data to be written and the associated
addresses identifying the locations in arrays 17 into which the data is to
be stored. When the memory controller can accept a write operation, the
write buffer transmits the address and associated data over a private bus
24 to the memory controller 15, which proceeds to enable the arrays 17 to
store the data in the location identified by the address. Thus, if the
rate of transmission of write data by the CPU 10 over bus 13 becomes too
great for the memory 11 to accept, the write buffer 23 can buffer the
requests until the memory 11 can accept them. The memory controller 15 is
also connected directly to bus 13 to respond to read requests from the CPU
10 or input/output controller 22 and return read data thereto.
It will be appreciated by those skilled in the art that a write buffer 23
can be advantageously used in a uniprocessor system as depicted in FIG. 1,
but it will be most advantageously used in a multiprocessor system (not
shown). In a multiprocessor system, the memory 11 will receive read and
write requests from a number of CPUs and associated input/output
subsystems 12. To avoid delaying processing by a CPU 10 waiting to perform
a write operation, the write buffer 23 takes the write address and data
and the CPU 10 can resume processing.
The write buffer further includes circuits for monitoring read requests
over the bus 13 from the CPU 10. If the write buffer 23 determines that a
read request has been transmitted over the bus 13 which identifies data
which it is buffering and which it has not yet transferred to the memory
11, it inhibits, over its private bus 24, the memory controller from
responding to the request. Instead, the write buffer 23 transmits the
requested data over the bus 13 to complete the read operation.
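The buffering and read-monitoring behavior described for the write buffer 23 can be illustrated with a minimal Python sketch. The class name, method names, and the use of a dictionary to stand in for the arrays 17 are assumptions made for illustration only; the description does not specify the buffer's depth or internal organization.

```python
from collections import deque


class WriteBuffer:
    """Toy model of write buffer 23 (all names here are hypothetical)."""

    def __init__(self, memory):
        self.memory = memory      # dict standing in for the arrays 17
        self.pending = deque()    # (address, data) pairs, oldest first

    def write(self, address, data):
        # Intercept a write request from bus 13; the CPU may resume at once.
        self.pending.append((address, data))

    def drain_one(self):
        # Called when the memory controller can accept a write (private bus 24).
        if self.pending:
            address, data = self.pending.popleft()
            self.memory[address] = data

    def read(self, address):
        # If the buffer still holds data for this address, it supplies the
        # data itself (the memory controller is inhibited). The newest
        # buffered value wins. Otherwise memory answers the read.
        for pending_address, data in reversed(self.pending):
            if pending_address == address:
                return data
        return self.memory.get(address, 0)
```

In this sketch a read of an address with a write still pending returns the buffered value even though the memory dictionary has not yet been updated, which mirrors the case in which the write buffer completes the read operation in place of the memory controller.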
The system depicted in FIG. 1 also includes a system control circuit 25
that, under control of the CPU 10, performs arbitration operations thereby
regulating access of the various input/output subsystems 12 to the bus 13
if more than one is present in the system.
The CPU 10 includes a processor 30 and an optional floating point processor
31. As is typical, the floating point processor is an option and need not
be present in a digital data processing system or CPU 10 constructed in
accordance with the invention. The floating point processor includes
circuits which are optimized for processing instructions on selected types
of data, namely data in floating point formats. Typically, the processor
30 can process the same data, but it requires more time to perform the
processing.
A detailed functional block diagram of one processor 30 used in the system
is depicted in FIG. 1B. With reference to FIG. 1B, processor 30 includes a
bus interface circuit 33 which connects to various control lines of bus 13
(collectively indicated by reference numeral 13A) and transmits and
receives signals over the various lines of the bus as described below. The
bus interface circuit also connects to an internal IDAL bus 34 which
transfers signals to and from a cache 35, a data path 36, a memory
management unit 37, and a processor control circuit 40. A bus interface
circuit 33 for one embodiment of processor 30 will be described below in
connection with FIG. 6.
A number of registers also connect to the internal IDAL bus 34 and, under
control of the bus interface circuit 33, transfer data between the
internal IDAL bus 34 and DAL lines 50 of bus 13. Specifically, under
control of the bus interface unit 33, a write data register 250 and a
write address register 251 receive, respectively, write data and the
address of the location in memory 11 or input/output units 12 in which the
write data is to be stored. At appropriate times, as described below, the
bus interface unit 33 enables the contents of these registers to be
transmitted through a multiplexer 253 onto the DAL lines 50 to perform a
write operation. Similarly, under control of the bus interface unit 33, a
read address register 252 receives an address of a location containing
data to be read. At an appropriate time, the bus interface unit 33 enables
the contents of the read address register 252 to be coupled through
multiplexer 253 onto the DAL lines 50 to perform a read operation. The
read data is latched in an input register 254, also under control of the
bus interface unit 33. The bus interface unit 33 may enable the contents
of the input register 254 to be coupled, as RCV DAT received data signals,
onto the internal IDAL bus 34.
The processor control circuit 40 decodes program instructions which are
retrieved from the memory 11 and in successive processing cycles enables
the data path 36 to perform the arithmetic and logical operations which
are required to execute the instruction. The data path 36 includes a set
of registers 255 for storing data to be processed and arithmetic and logic
circuits 256 for performing the processing. The data path 36 will be
described in more detail below in connection with FIGS. 3A and 3B.
One embodiment of processor 30 uses virtual addresses and provides virtual
address translation circuits 37 for translating the virtual addresses to
physical addresses. The virtual address translation circuits include a set
of source registers 257 which receive the virtual addresses from other
circuits in processor 30, most notably the data path 36, and a translation
buffer 260 which includes some translation information. Translations are
performed as necessary under control of the processor control circuit 40.
Physical addresses are coupled from the translation circuits 37 onto the
internal IDAL bus 34 through a multiplexer 261. The data path 36 may also
include physical addresses, and provides a second source input for
multiplexer 261. The processor control circuit 40 controls multiplexer
261.
Cache memory 35 is a conventional information storage circuit in a CPU 10.
Cache memories are described in K. Hwang and F. Briggs, Computer
Architecture And Parallel Processing (McGraw-Hill, 1984), Section 2.4, pp.
98, et seq, and V. Hamacher, Computer Organization (McGraw-Hill, 1984),
Section 8.6, pp. 306, et seq. Cache memory 35 includes a data storage area
38 comprising a plurality of storage locations. The data storage area 38
is organized into blocks, with each block containing two storage
locations. Each storage location stores one word of information, that is,
the amount of information which may be transferred over bus 13 at one
time. In one specific embodiment, a word of information corresponds to
four bytes, or thirty-two binary digits, of information. Thus, each block
can store eight bytes of information.
As described below more fully in connection with FIG. 5, cache memory 35
includes hit/miss logic circuits 262 which determine when a physical
address generated by the virtual address translation circuits corresponds
to an address in the cache memory 35. The low order portion of the virtual
address from the source registers 257, in one embodiment the VA SRCE (8:3)
signals, is coupled through a multiplexer 264 to select one block in the
data storage area, and the associated tags 41 entry. The hit/miss logic
circuits 262 then determine whether the contents of the associated tags 41
entry correspond to the translated physical address. If there is such a
correspondence, the hit/miss logic generates an asserted HIT signal which
is transmitted to the bus interface unit 33. If the bus interface unit 33
does not receive an asserted HIT signal, it enables, in a conventional
manner, an operation over bus 13 to retrieve the contents of the addressed
location. If the HIT signal is asserted, the bus interface unit 33 does
not enable the operation over bus 13, but instead allows the data from the
cache data storage area 38 to be transmitted through a multiplexer 263
over the internal IDAL bus 34. Generally, such data will be transmitted to
the data path 36.
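The block selection and tag comparison just described can be sketched in Python as follows. The sketch assumes that address bits 8:3 are not changed by translation, so a single address is used for both the index and the tag, and it assumes that the tag holds the address bits above bit 8; neither assumption is stated in the description. The class and function names are hypothetical, and the stale flag 42 is omitted.

```python
INDEX_BITS = 6   # address bits 8:3 select one of 64 blocks


def block_index(address):
    # Bits 8:3 choose the block and its associated tag entry.
    return (address >> 3) & ((1 << INDEX_BITS) - 1)


class Cache:
    """Toy direct-mapped cache with two 4-byte words per block."""

    def __init__(self):
        blocks = 1 << INDEX_BITS
        self.tags = [None] * blocks
        self.data = [None] * blocks

    def fill(self, address, words):
        # Load a two-word block and record where it came from.
        i = block_index(address)
        self.tags[i] = address >> 9      # assumed tag: bits above bit 8
        self.data[i] = list(words)

    def lookup(self, address):
        # Returns (hit, word). On a miss the bus interface would instead
        # start a read operation over bus 13.
        i = block_index(address)
        if self.tags[i] == address >> 9:
            return True, self.data[i][(address >> 2) & 1]
        return False, None
```

Two addresses that share bits 8:3 but differ in their upper bits select the same block, so only the tag comparison distinguishes a hit from a miss.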
As will be appreciated by those skilled in the art, the information stored
in a block in the cache memory 35, when it is received from the memory
unit 11, is a copy of the information stored in the memory unit 11. Each
block in the cache memory 35 has an associated tag 41 whose contents are
established by the bus interface circuit 33 to identify the locations in
the memory unit 11 from which the information was copied. In addition,
each block includes a stale flag 42 which is reset, or cleared, by the bus
interface circuit to indicate whether or not the contents of the block are
in fact copies of the locations identified by the tag, that is, whether or
not the contents of the block are stale.
In one embodiment of cache memory 35 the data storage area 38, tags 41 and
flags 42 are dynamic memories. A refresh counter 262, under control of the
bus interface unit 33 generates refresh addresses which are coupled
through multiplexer 264 to refresh the dynamic memories.
An instruction may contain one or more operand specifiers which identify
the location of the operand in the registers in the data path 36, or which
identify an address which identifies the location of the operand in the
virtual address space. See, for example, U.S. Pat. No. 4,236,206, for a
Central Processor Unit For Executing Instructions Of Variable Length,
issued to W. D. Strecker, et al., on Nov. 25, 1980. The processor control
circuit 40, in conjunction with the data path, decodes each operand
specifier to identify the locations of the operands, and then proceeds to
obtain them from the identified locations. An operand specifier may itself
contain the operand (that is, the operand specifier may be a "literal"),
or the operand specifier may identify one of the data path's registers (not
shown) as containing the operand.
Alternatively, the operand may be in a location in the program's virtual
memory space, and the operand specifier may indicate how to determine that
location. If the operand is in the virtual memory space, the control
circuit 40 enables the memory management circuit 37 to translate the
virtual address to the physical address. After the physical address of the
operand has been obtained, the bus interface 33 obtains the operand. It
first determines whether the operand is in the cache memory 35. If the
operand is in the cache memory, the bus interface transmits the operand to
the data path 36. On the other hand, if the operand is not in the cache
memory 35, the bus interface circuit 33 transmits a read request over the
bus 13 to the memory 11 to retrieve the operand. After all of the operands
have been obtained, the data path 36 may perform the operation required by
the instruction.
The operand specifier may also identify the location into which processed
data is to be stored. The control circuit 40 and memory management circuit
37 are used in the same way as described above to determine the physical
address. If the processed data is to be stored in memory 11, the bus
interface 33 performs the required write operation over bus 13. In
addition, if the physical address corresponds to an appropriate tag in
cache 35, the bus interface 33 enables the data to be stored in the cache
35.
The bus interface unit 33 includes a state machine 270, which controls the
transfer of data over bus 13, and an IDAL state machine 271, which
controls the transfer of data over internal IDAL bus 34. The bus
interface unit also controls an FPP logic circuit 272 which, in turn,
controls communications with the floating point processor 31. The bus
interface unit 33 will be described in more detail below in connection
with FIG. 6.
Operations Over Bus 13
The bus 13 includes a number of lines for transferring signals representing
information among the various units connected to it. In particular, bus 13
includes DAL (31:0) data address lines 50, which carry DAT data and ADRS
address signals. If the CPU 10, specifically the processor 30, is
initiating a transfer, making it the bus master for the transfer, processor
30 first transmits the ADRS address signals over the DAL (31:0) data
address lines 50 and contemporaneously transmits TR TYPE (2:0) transfer
type command signals on lines 52, which indicate whether the transfer
operation is a read or a write operation. A short time later, sufficient
to allow the ADRS address signals and TR TYPE (2:0) transfer type command
signals to settle, the processor 30 then asserts an ADRS STR address
strobe signal on a line 51.
When the ADRS STR address strobe signal is asserted, all of the other units
connected to bus 13 receive and decode the ADRS address and TR TYPE (2:0)
transfer type command signals, with the unit containing the location
identified by the ADRS address signals being the responding unit, or
slave, for the transfer. (If the transfer operation is a write operation
and the ADRS address signals identify a location in the memory 11, the
write buffer 23 is the slave unit.) A selected time after the
processor 30 asserts the ADRS STR address strobe signal, it removes the
ADRS address signals and TR TYPE (2:0) transfer type command signals from
the respective lines.
If the transmitted TR TYPE (2:0) transfer type command signals define a
write operation, the master unit then transmits data signals over the
lines 50, and then asserts a DATA STR data strobe signal on a line 53. The
slave unit then receives and stores the transmitted data. When the data
has been stored, the addressed unit then asserts a RDY ready signal on a
line 54 if the operation was completed without error, or an ERR error
signal on a line 55 if an error occurred during the storage operation.
If, on the other hand, the transmitted TR TYPE (2:0) transfer type command
signals define a read operation, the slave unit retrieves the data from
the location identified by the address signals, transmits them over the
DAL (31:0) data address lines 50, and transmits an asserted RDY ready
signal over line 54. In response, the processor 30 receives the data and
transmits an asserted DATA STR data strobe signal over line 53.
In either a read or a write operation, after the slave has asserted the RDY
ready signal or the ERR error signal if an error occurred during the
transfer, the processor 30 negates the DATA STR data strobe signal. The
slave unit then negates the RDY ready or ERR error signal, and then the
processor 30 negates the ADRS STR address strobe signal to complete the
transfer.
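The ordering of signal events in a read or a write transfer, as described above, can be summarized by the following Python sketch. The function name and the wording of the event strings are illustrative assumptions, and only the error-free case, in which the slave replies with the RDY ready signal, is modeled.

```python
def transfer_steps(is_write):
    """Return the ordered signal events for one error-free transfer on bus 13."""
    steps = [
        "master drives ADRS on DAL and TR TYPE",
        "master asserts ADRS STR",
        "master removes ADRS and TR TYPE",
    ]
    if is_write:
        # Write: the master supplies the data, then the slave acknowledges.
        steps += [
            "master drives DAT on DAL",
            "master asserts DATA STR",
            "slave stores data",
            "slave asserts RDY",
        ]
    else:
        # Read: the slave supplies the data, then the master acknowledges.
        steps += [
            "slave drives DAT on DAL",
            "slave asserts RDY",
            "master receives data",
            "master asserts DATA STR",
        ]
    # The closing sequence is the same for both directions.
    steps += [
        "master negates DATA STR",
        "slave negates RDY",
        "master negates ADRS STR",
    ]
    return steps
```

The sketch makes the asymmetry explicit: in a write the DATA STR data strobe signal precedes the RDY ready signal, whereas in a read the RDY ready signal precedes the DATA STR data strobe signal.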
Units connected to bus 13 other than processor 30 may constitute bus
masters and initiate transfers with the memory 11 thereover. The
input/output subsystem 12, and in particular its input/output
controller 22, may become bus master. To become bus master, input/output
controller 22 asserts a DMR direct memory request signal over a line 56.
The processor 30 then asserts a DMG direct memory grant signal on a line
57, which is received by the input/output controller 22. At that point the
input/output controller initiates a transfer with the memory in the same
way as described above in connection with the processor 30. The
input/output controller maintains the DMR direct memory request signal
asserted until it has completed the transfer. Thus, if the input/output
controller requires multiple transfers, it may maintain the DMR direct
memory request signal asserted until it has completed the transfers. While
the DMR direct memory request signal is asserted, the processor 30 is in a
stalled condition, that is, it monitors the signals on the various lines
of bus 13, but otherwise it does not execute any instructions.
If the system includes multiple input/output subsystems 12, separate
request signals by the input/output controllers 22 to become bus master
are transmitted to the system controller, which asserts the DMR direct
memory request signal and monitors the condition of the DMG direct memory
grant signal. When the processor 30 asserts the DMG direct memory grant
signal, the system controller enables one of the input/output controllers
22 to become bus master according to any priority arbitration scheme.
Bus 13 also has a number of other lines which carry status and control
signals. A line 60 carries CLK clock signals which are used to synchronize
operations in the system. The various signals on bus 13 are timed in
response to the CLK clock signals.
A line 61 carries a CCTL cache control signal which has two functions. As
described in copending U.S. patent application Ser. No. 908,825, filed
Sept. 12, 1986, in the name of Paul Rubinfeld, for Cache Invalidate
Protocol for Digital Data Processing System, the CCTL cache control signal
is asserted by, for example, an input/output controller 22 when it is bus
master and performing a write operation to memory 11. The input/output
controller 22 asserts the CCTL signal while it is transmitting the ADRS
address signals on the DAL data address lines 50, TR TYPE transfer type
signals on lines 52 and asserting the ADRS STR address strobe signal on
line 51. When the CCTL cache control signal is asserted and the TR TYPE
transfer type signals indicate a write operation to memory 11, the bus
interface 33 checks the contents of the tags 41 of all of the cache
entries. If the ADRS signals on the DAL data address lines 50 of bus 13
correspond to the contents of a tag 41, the bus interface 33 resets the S
stale flag 42 for that cache block.
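The invalidation check described above can be sketched in Python as follows. The function name is hypothetical; the sketch assumes that each tag holds the address shifted right by three bits (one tag per eight-byte block), and it represents the resetting of the stale flag 42 as setting a per-block boolean to False. Both are assumptions about details the description leaves open.

```python
def snoop_write(tags, valid, address, cctl, is_write):
    """Mark cached copies stale when another bus master writes with CCTL asserted.

    tags  -- list of per-block tags (assumed here to hold address >> 3)
    valid -- list of per-block booleans; False means the block is stale
    Returns the number of blocks marked stale.
    """
    if not (cctl and is_write):
        return 0                        # nothing to check for this bus cycle
    marked = 0
    for i, tag in enumerate(tags):      # the tags of all cache entries are checked
        if tag == address >> 3:
            valid[i] = False
            marked += 1
    return marked
```

A write observed without the CCTL cache control signal asserted leaves every block untouched in this sketch, as does any read operation.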
The CCTL cache control signal is also asserted by memory 11 to prevent the
processor 30 from storing data in the cache 35 that was requested during a
read operation. This may be used, for example, where memory 11 is a
multiport memory, that is, if it is being shared by several processors,
with each processor accessing the memory 11 over a separate bus, and the
data being retrieved is from a set of addressable storage locations that
are available to all of the processors. It is undesirable to have such
data stored in the cache 35 since another processor may update the
contents of the shared locations and, since the updates are not over bus
13 they cannot be detected by the processor 30. If the processor 30 used
such data from the cache, it may not correspond to the contents of the
appropriate locations in memory. In connection with this use of the CCTL
cache control signal, the memory 11 asserts the CCTL cache control signal
contemporaneously with its transmission of the data over the DAL data
address lines 50, and maintains the CCTL cache control signal asserted
until it removes the data.
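This second use of the CCTL cache control signal amounts to a single conditional on the read path, as the following Python sketch suggests. The function name and the use of a dictionary as the cache are assumptions for illustration.

```python
def complete_read(cache, address, data, cctl):
    """Hand read data to the requester; cache it only if CCTL was not asserted."""
    if not cctl:
        # Ordinary read: keep a copy for later accesses.
        cache[address] = data
    # With CCTL asserted the data is used once and never enters the cache,
    # so a later update made over another bus cannot leave a stale copy.
    return data
```

In either case the processor receives the requested data; only the decision whether to retain a copy differs.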
Bus 13 also includes a line 62 which carries a CLR WRT BUF clear write
buffer signal. The CLR WRT BUF clear write buffer signal is asserted by
the processor 30 in response to certain conditions internal to processor
30 which would not be otherwise detectable outside of processor 30. For
example, the processor 30 asserts the CLR WRT BUF clear write buffer
signal when it executes an instruction which causes it to switch process
contexts or when it starts to execute an interrupt service routine or an
exception routine. The CLR WRT BUF clear write buffer signal is controlled
by a field in microinstructions that are generated by the processor
control circuit 40 while executing those instructions.
When the CLR WRT BUF clear write buffer signal is asserted, the write
buffer 23 determines whether it contains data to be stored in memory 11.
If it does not, it does nothing. However, if the write buffer 23 does
contain data to be stored in memory 11, it asserts the DMR direct memory
request signal and continues to attempt to store its remaining data in the
memory 11. In response to the asserted DMR direct memory request signal,
the processor asserts the DMG direct memory grant signal, which is ignored
by the write buffer 23, and it also stalls. The write buffer 23 maintains
the DMR direct memory request signal in the asserted condition until all
of the data which it contains has been properly stored in memory 11. If no
error occurs in the storage, the write buffer 23 then negates the DMR
direct memory request signal allowing the processor 30 to continue.
If an error does occur during a write to memory 11, the write buffer 23
signals an error to the processor, allowing the processor 30 to process
routines to locate and correct the error within the current context. This
greatly simplifies error recovery. If the processor is allowed to switch
contexts before an error is detected, it would be difficult to determine
the context which initially generated the data. Error recovery is
simplified if the context can be identified, and so the write buffer 23
prevents the processor from switching contexts until all of the data from
the current context has been properly stored in memory 11.
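The draining of the write buffer in response to the CLR WRT BUF clear write buffer signal can be sketched in Python as follows. The function name, the list of pending writes, and the `failing` set used to simulate a memory error are hypothetical; the description does not say how an error is detected or reported in detail.

```python
def clear_write_buffer(pending, memory, failing=frozenset()):
    """Drain every buffered write while the processor is stalled (DMR asserted).

    pending -- list of (address, data) pairs, oldest first
    failing -- hypothetical set of addresses whose write reports an error
    Returns (True, None) when all data is stored, else (False, address).
    """
    while pending:
        address, data = pending[0]
        if address in failing:
            # The error surfaces before any context switch takes place,
            # so it can be attributed to the current context.
            return False, address
        memory[address] = data
        pending.pop(0)
    # Buffer empty: DMR would be negated and the processor may continue.
    return True, None
```

Because the function returns only after the buffer is empty or an error has been reported, any failure is seen while the context that generated the data is still current.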
Transfers With Floating Point Processor 31
Processor 30 also is connected to floating point processor 31 to (1)
transfer the operation codes of floating point instructions to the
floating point processor 31 to indicate the operation to be performed, as
described below in connection with FIG. 2A, (2) enable operand data to be
transferred to the floating point processor 31 for processing as described
in connection with FIGS. 2B and 2C and (3) obtain processed data from the
floating point processor 31 as described in connection with FIG. 2D. The
processor 30 and floating point processor 31 are interconnected by two
sets of lines 70 and 71, lines 70 carrying CP STA (1:0) floating point
status signals and lines 71 carrying CP DAT (5:0) floating point data
signals. The floating point processor 31 is also connected to several
lines of bus 13, including DAL data address lines 50, line 60 for
receiving the CLK signals, line 51 for receiving the ADRS STR address
strobe signal, line 54 for receiving the RDY ready signal, line 55 for
receiving the ERR error signal, and line 57 for receiving the DMG direct
memory grant signal. The CP STA (1:0) floating point status signals and CP
DAT (5:0) floating point data signals are transmitted synchronously with
the CLK signals on line 60.
While it is idle, the floating point processor 31 repetitively samples,
synchronously with the CLK signal on line 60, the conditions of the
signals on the lines 70 and 71. When at least one of the lines 71 carries
an asserted level signal, the floating point processor 31 latches the
signals on those lines and the signals on lines 70. With reference to FIG.
2A, when the processor 30 transmits an instruction to the floating point
processor 31, it transmits at least a portion of the instruction's
operation code to the floating point processor 31 as CP DAT (5:0) floating
point data signals over lines 71 during an interval defined by a selected
number of ticks of the CLK clock signals. During the interval, in
synchronism with one of the ticks of the CLK clock signals, the floating
point processor 31 latches and stores the signals. At the end of the
interval, the processor 30 removes the signals from the lines 70 and 71.
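By way of illustration only, the idle sampling rule may be modelled as follows. The function name sample_idle is hypothetical, and the sketch assumes that the six CP DAT lines and two CP STA lines are represented as small integers, one bit per line.

```python
def sample_idle(cp_dat, cp_sta):
    """Model one CLK-synchronous sample taken by an idle floating point processor.

    Returns the latched (cp_dat, cp_sta) pair when at least one of the six
    CP DAT lines is asserted, and None when all of them are negated.
    """
    cp_dat &= 0x3F  # six lines 71
    cp_sta &= 0x3   # two lines 70
    if cp_dat != 0:
        return (cp_dat, cp_sta)
    return None
```

Note that the CP STA lines alone never trigger a latch in this model; only an asserted CP DAT line does.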
The CP DAT (5:0) floating point data signals transmitted over lines 71 are
sufficient to identify a floating point arithmetic operation to be
performed, and also identify the number of operands to be used in the
operation. Concurrently with the transmission of the operation information
over lines 71, other information is transmitted as the CP STA (1:0)
floating point status signals over lines 70 which provides further
information relating to floating point processing. In particular, floating
point operands may be encoded in a number of formats, termed data types,
and information as to the format of the operands is transmitted as CP STA
(1:0) floating point status signals over lines 70. In one embodiment, some
of the information as to the format of the operands is also transmitted
over the lines 71 along with the operation information.
Upon receiving the operation code, the floating point processor 31 decodes
it to determine the operation to be performed and the number of operands
which are required. The processor 30 (in response to sending the operation
code) and the floating point processor 31 (in response to receiving the
operation code) then go into a condition in which the operands are
transferred over DAL data address lines 50. The data type information is
used to identify to the floating point processor 31 the format of each of
the operands. Some operand formats require more bits than can be
accommodated by a single transfer over the DAL data address lines 50, and
so multiple transfers are required to transfer a single operand of such a
format. The data type information thus also
indicates the number of transfers over DAL data address lines 50 that are
required to transfer each operand.
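The relationship between data type and transfer count may be sketched as below. The width of the DAL data address lines and the operand sizes used here are assumptions made only for the example, and the names type_a, type_b and transfers_needed are hypothetical.

```python
DAL_WIDTH_BITS = 32  # assumed bus width, for illustration only

# Hypothetical data types and operand sizes in bits (not taken from the text).
OPERAND_BITS = {"type_a": 32, "type_b": 64}


def transfers_needed(data_type):
    """Number of DAL transfers needed for one operand of the given data type."""
    bits = OPERAND_BITS[data_type]
    return -(-bits // DAL_WIDTH_BITS)  # ceiling division
```

Under these assumptions an operand of type_b needs two transfers, while an operand of type_a fits in one.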
An operand may be stored in one of three sources, namely, in the memory 11
(FIG. 1), in the cache 35, or in the processor's registers (shown in FIG.
3A) in the data path 36. The different operands required for a single
operation may also be stored in any of the three sources. If multiple
transfers over DAL data address lines 50 are required to transfer a single
operand, however, all of the transfers are normally with respect to a
single source. FIG. 2B depicts the conditions of the signals that are
transmitted to retrieve an operand from memory and FIG. 2C depicts the
signals transmitted to transfer an operand from the cache 35 or from
registers in the data path 36. In particular, FIGS. 2B and 2C depict the
conditions of the signals to effect a single transfer over DAL data
address lines 50, and it should be recognized that multiple transfers may
be required for a single operand.
With reference to FIG. 2B, if an operand is in memory 11, the processor 30
initiates its retrieval from the memory 11. In particular, the processor
30 performs a read operation, as described above, placing the ADRS address
signals on the DAL data address lines 50 and asserting the ADRS STR address
strobe signal. Shortly thereafter, the processor 30 places CP STA (1:0)
floating point status signals on lines 70 having the binary value zero,
that is, it negates both of the CP STA (1:0) floating point status
signals. In addition, the processor 30 transmits CP DAT (5:0) floating
point data signals on lines 71 in which the CP DAT (5:4) floating point
data signals contain an address alignment code, which indicates how much
of the data transmitted over the DAL data address lines 50 is to be used
in the operand. The CP DAT (0) floating point data signal is asserted if
the operand is a short literal on the DAL (5:0) data address lines, and
otherwise the CP DAT (1) floating point data signal is asserted.
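The CP DAT (5:0) encoding used during an operand transfer may be illustrated with the following sketch. It assumes that signal CP DAT (n) corresponds to bit n of a six-bit integer; the names OperandControl, encode_operand_control and decode_operand_control are hypothetical, and the meaning of each alignment code value is not modelled.

```python
from typing import NamedTuple


class OperandControl(NamedTuple):
    alignment_code: int   # carried on CP DAT (5:4)
    short_literal: bool   # True: CP DAT (0) asserted; False: CP DAT (1) asserted


def encode_operand_control(alignment_code, short_literal):
    """Build the six-bit CP DAT value the processor places on lines 71."""
    if not 0 <= alignment_code <= 3:
        raise ValueError("alignment code must fit in two bits")
    word = alignment_code << 4
    word |= 0b000001 if short_literal else 0b000010
    return word


def decode_operand_control(cp_dat):
    """Recover the fields as the floating point processor would see them."""
    bit0 = cp_dat & 1
    bit1 = (cp_dat >> 1) & 1
    if bit0 == bit1:
        raise ValueError("exactly one of CP DAT (0) and CP DAT (1) must be asserted")
    return OperandControl((cp_dat >> 4) & 0x3, bool(bit0))
```

Because one of CP DAT (0) and CP DAT (1) is always asserted in this encoding, the value on lines 71 is never all zeros during an operand transfer.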
Since the floating point processor 31 has already received the operation
information in the procedure described above in connection with FIG. 2A,
it is in condition to receive an operand. The asserted CP DAT (5:0)
floating point data signal indicates to the floating point processor 31
that it is to sample the signals on selected lines of bus 13, in
particular the line 51 which carries the ADRS STR address strobe signal.
The floating point processor 31 uses the asserted condition of the ADRS
STR address strobe signal to determine that the operand is being retrieved
from the memory 11. If the ADRS STR address strobe signal is asserted when
it receives the asserted CP DAT (5:0) floating point data signal, the
floating point processor 31 latches the data signals on the DAL data
address lines 50 in response to the assertion by the memory 11 of the RDY
ready signal on line 54. The processor 30 responds with the DATA STR data
strobe signal to complete the transfer.
It will be appreciated that, if the memory 11 responds to a retrieval
request with an asserted ERR error signal instead of an asserted RDY ready
signal, the floating point processor 31 will not latch the transmitted
data signals on the DAL data address lines 50. The processor 30 performs
any error recovery operations, such as retries, which may be
required and repeats the operation depicted in FIG. 2B.
FIG. 2C depicts a timing diagram useful in understanding the transfer of an
operand from the processor 30 to the floating point processor 31, whether
the operand is in the cache 35 or in a register in data path 36 (described
below in connection with FIG. 3A). In either case, the processor places
data signals on the DAL data address lines 50 and CP DAT (5:0) floating
point data signals having the same encoding as described above in
connection with FIG. 2B, and negates both of the CP STA (1:0) floating
point status signals. These signals are maintained by the processor 30 for
a selected number of ticks of the CLK clock signals. During that interval,
the floating point processor 31 latches the signals on the DAL data
address lines 50. If multiple transfers are required over the DAL data
address lines 50 to transfer an entire operand, the sequence depicted in
FIG. 2C is repeated.
If an operand's data type is such that multiple transfers are required over
DAL data address lines 50 to transfer an entire operand, the processor 30,
memory 11 and floating point processor 31 repeat the operations depicted
in FIGS. 2B and 2C until a complete operand is transferred.
It will be appreciated that the sequence of operations depicted in FIG. 2B
is similar to the sequence of operations depicted in FIG. 2C, with the
following difference. If the ADRS STR address strobe signal is asserted on
line 51 when the CP DAT (5:0) floating point data signal is asserted, the
floating point processor 31 uses the asserted RDY ready signal as an
indication that the operand (or portion of the operand) is then on the DAL
data address lines 50. However, if the ADRS STR address strobe signal is
not asserted when the CP DAT (5:0) floating point data signal is asserted,
the floating point processor 31 uses the assertion of the CP DAT (5:0)
floating point data signal as an indication that the operand (or portion
of the operand) is then on the DAL data address lines 50. In both cases,
the floating point processor 31 latches the signals on the DAL data
address lines 50 in synchronism with the CLK clock signals on line 60, in
the first case after receiving the RDY ready signal and in the second case
after receiving a CP DAT (5:0) floating point data signal which is
asserted.
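The two cases may be sketched as a simple tick-by-tick model, given here only as an illustration. The Sample record and the function latch_tick are hypothetical names; each Sample stands for the signal levels seen by the floating point processor 31 at one tick of the CLK signal, and the exact tick on which the hardware latches is an assumption of the sketch.

```python
from typing import NamedTuple, Optional, Sequence


class Sample(NamedTuple):
    adrs_str: bool  # ADRS STR address strobe on line 51
    rdy: bool       # RDY ready on line 54
    err: bool       # ERR error on line 55
    cp_dat: int     # CP DAT (5:0) on lines 71, as a six-bit integer


def latch_tick(samples: Sequence[Sample]) -> Optional[int]:
    """Return the tick at which DAL is latched, or None if nothing is latched."""
    waiting_for_memory = False
    for tick, s in enumerate(samples):
        if not waiting_for_memory and s.cp_dat != 0:
            if not s.adrs_str:
                return tick            # cache or register source: latch on CP DAT
            waiting_for_memory = True  # memory source: wait for RDY or ERR
        if waiting_for_memory:
            if s.err:
                return None            # ERR instead of RDY: data is not latched
            if s.rdy:
                return tick            # memory source: latch on RDY
    return None
```

For example, a sequence in which ADRS STR is negated when CP DAT first becomes nonzero latches on that same tick, whereas a sequence with ADRS STR asserted latches only on the later tick where RDY appears.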
After the operands have been transferred, the processor 30 and floating
point processor 31 go into a condition in which the processor 30 is
prepared to receive the results when the floating point processor 31 is
prepared to send them. FIG. 2D depicts a timing diagram which details the
sequence of operations used by the processor 30 and floating point
processor 31 to transfer the processed data to the processor 30. The
processed data comprises both the condition codes, which indicate whether
the result was a negative or a zero and selected other facts concerning
the result, and data signals representing the value of the computation
performed by the floating point processor 31.
With reference to FIG. 2D, initially the processor 30 transmits a signal
code over the lines 70 and 71 indicating that it is ready to receive the
processed data. In one embodiment, the CP STA (1:0) floating point status
signals are both negated, and the CP DAT (3) floating point data signal is
asserted with the others negated. Thereafter, the floating point processor
31 may transmit over lines 70 and 71.
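The ready-to-receive code of the embodiment just described may be expressed as follows, again assuming that CP DAT (n) corresponds to bit n of a six-bit integer. The constant and function names are hypothetical.

```python
READY_CP_STA = 0b00      # both CP STA (1:0) signals negated
READY_CP_DAT = 0b001000  # only CP DAT (3) asserted


def is_ready_to_receive(cp_sta, cp_dat):
    """True when lines 70 and 71 carry the processor's ready-to-receive code."""
    return (cp_sta & 0x3) == READY_CP_STA and (cp_dat & 0x3F) == READY_CP_DAT
```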
When the floating point processor 31 is ready to transfer the processed
data, it transmits CP STA (1:0) floating point status signals representing
a code to that effect, concurrently with CP DAT (5:0) floating point data
signals representing the condition codes. The floating point processor 31
maintains these signals for a selected number of ticks of the CLK clock
signals, and then places the data signals on the DAL data address lines
50, along with a code on lines 70 and 71 to that effect. If multiple
transfers over DAL data address lines 50 are required to transfer the
processed data signals, the floating point processor 31 transfers them
synchronously with
the CLK clock signals.
While the floating point processor 31 is processing operands and before it
has transmitted the results to the processor 30, the processor 30 may
assert the DMG direct memory grant signal to allow input/output subsystem
12 to engage in a transfer with memory 11. The floating point processor 31
monitors the condition of line 57 after the processor 30 has indicated
that it is ready to receive the processed data. If the DMG direct memory
grant signal is asserted on line 57 whe