Claims
What is claimed is:
1. A unified memory system comprising:
a processor;
a memory controller;
a plurality of bus transactor circuits;
a shared memory port, including a memory address interface, a memory
control interface and a memory data interface, which are coupled to the
memory controller;
a processor bus which is coupled between the processor and the memory
controller;
a first multiple-bit, bidirectional system data bus which is coupled
between the memory data interface of the shared memory port, the memory
controller and the plurality of bus transactor circuits and which carries
memory data between the memory data interface and the plurality of bus
transactor circuits; and
a second multiple-bit, bidirectional system command bus which is coupled
between the memory controller and the plurality of bus transactor circuits
and which carries non-memory data, including requests for access to the
memory data interface over the data bus and memory addresses related to
the memory data, between the memory controller and the plurality of bus
transactor circuits.
2. The unified memory system of claim 1 wherein the plurality of bus
transactor circuits comprises:
a display controller which comprises a first bus interface unit coupled to
the data bus and the command bus;
a parallel input-output controller which comprises a second bus interface
unit coupled to the data bus and the command bus; and
a serial input-output controller which comprises a third bus interface unit
coupled to the data bus and the command bus.
3. The unified memory system of claim 2 wherein the first and second system
buses, the processor bus, the shared memory port, the processor, the
memory controller, the display controller, the parallel input-output
controller and the serial input-output controller are fabricated on a
single semiconductor integrated circuit.
4. The unified memory system of claim 1 wherein:
one of the plurality of bus transactor circuits comprises a display
controller which has a display queue for queueing an amount of display data
received from the shared memory port over the data bus and has a watermark
output which is coupled to the memory controller, wherein the watermark
output indicates whether the amount of display data queued in the display
queue is more than or less than a predetermined amount; and
the memory controller preempts memory data transfers over the data bus by
the other of the plurality of bus transactor circuits and the processor
when the watermark output indicates the amount of display data queued in
the display queue is less than the predetermined amount.
5. The unified memory system of claim 1 wherein:
one of the plurality of bus transactor circuits comprises a display
controller which has a display queue for queueing an amount of display data
received from the shared memory port over the data bus and has a watermark
output which is coupled to the memory controller, wherein the watermark
output indicates whether the amount of display data queued in the display
queue is more than or less than a predetermined amount; and
the memory controller controls access to the command bus by the processor,
the display controller and the other bus transactor circuits according to
the following priority:
the display controller has a first, highest priority when the watermark
output indicates the amount of display data queued in the display queue is
less than the predetermined amount;
the processor has a second priority which is less than the first priority;
the other bus transactor circuits have a third priority which is less than
the second priority; and
the display controller has a fourth, lowest priority which is less than the third
priority when the watermark output indicates the amount of display data
queued in the display queue is more than the predetermined amount.
6. The unified memory system of claim 1 wherein each bus transactor circuit
comprises:
a dual port random access memory (DPRAM) having first and second ports,
wherein the first port is operably coupled to the data bus and the command
bus; and
a subsystem which is operably coupled to the second port of the DPRAM.
7. The unified memory system of claim 6 wherein each bus transactor circuit
further comprises:
a bus interface circuit which is coupled between the first port and the
data bus and between the first port and the command bus; and
a subsystem interface circuit which is coupled between the second port and
the subsystem.
8. The unified memory system of claim 7 wherein:
the bus interface circuits of at least two of the plurality of bus
transactor circuits are logically and physically identical to one another;
and
the subsystem interface circuits of the at least two bus transactor
circuits are logically and physically unique to the subsystems of the
respective bus transactor circuits.
9. The unified memory system of claim 1 wherein the memory controller
comprises means for transferring the memory data between the memory data
interface of the shared memory port and the plurality of bus transactor
circuits over the data bus and for transferring the non-memory data
between the plurality of bus transactor circuits over the command bus.
10. The unified memory system of claim 1 wherein the memory controller
comprises means for controlling access by the plurality of bus transactor
circuits to the data bus independently of access to the command bus.
11. The unified memory system of claim 1 wherein the memory controller
comprises a command queue for storing memory access commands transferred
over the command bus by the plurality of bus transactor circuits and
wherein the memory controller controls access to the data bus based on the
memory access commands stored in the command queue.
12. The unified memory system of claim 1 wherein the memory controller
comprises means for enabling a data transaction by one of the plurality of
bus transactor circuits over the data bus and for simultaneously enabling
a command transaction by another of the plurality of bus transactor
circuits over the command bus.
13. The unified memory system of claim 1 wherein:
the memory controller further comprises a plurality of load data bus
control outputs and a plurality of data bus grant control outputs; and
each bus transactor circuit comprises a load data bus control input which
is coupled to a corresponding one of the load data bus control outputs and
a data bus grant control input which is coupled to a corresponding one of
the data bus grant control outputs.
14. The unified memory system of claim 1 wherein:
the memory controller further comprises a plurality of load command bus
control outputs, a plurality of command bus grant control outputs, and a
plurality of command bus request inputs; and
each bus transactor circuit comprises a load command bus control input
which is coupled to a corresponding one of the load command bus control
outputs, a command bus grant control input which is coupled to a
corresponding one of the command bus grant control outputs, and a command
bus request output which is coupled to a corresponding one of the command
bus request inputs.
15. The unified memory system of claim 1 wherein the memory controller
comprises means for receiving memory data from the shared memory port over
the data bus and passing the memory data received from the shared memory
port to the processor over the processor bus and comprises means for
receiving memory data from the processor over the processor bus and
passing the memory data received from the processor to the shared memory
port over the data bus.
16. A method of passing data between a shared memory port, a memory
controller and a plurality of bus transactor circuits, the method
comprising:
passing memory data between the shared memory port, the memory controller
and the plurality of bus transactor circuits over a multiple-bit,
bidirectional data bus;
passing non-memory data including requests for access to the shared memory
port over the data bus and memory addresses related to the memory data,
between the memory controller and the plurality of bus transactor circuits
over a multiple-bit, bidirectional command bus;
controlling access by the plurality of bus transactor circuits to the data
bus with the memory controller based on the requests for access to the
shared memory port; and
controlling access by the plurality of bus transactor circuits to the
command bus with the memory controller independently of access to the data
bus.
17. The method of claim 16 wherein controlling access to the data bus
comprises:
passing a data bus request command from a first of the bus transactor
circuits to the memory controller over the command bus;
passing a data bus grant signal from the memory controller to the first bus
transactor circuit in response to the data bus request command; and
performing the step of passing memory data between the shared memory port
and the first bus transactor circuit over the data bus in response to the
data bus grant signal.
18. The method of claim 17 wherein passing a data bus request command
comprises:
passing a command bus request signal from the first bus transactor circuit
to the memory controller;
passing a command bus grant signal from the memory controller to the first
bus transactor circuit in response to the command bus request signal; and
passing the data bus request command from the first bus transactor circuit
to the memory controller over the command bus in response to the command
bus grant signal.
19. A single integrated circuit comprising:
a processor;
a memory controller;
a plurality of bus transactor circuits;
a shared memory port, including a memory address interface, a memory
control interface and a memory data interface, which are coupled to the
memory controller;
a processor bus which is coupled between the processor and the memory
controller;
a data bus which is coupled to the memory data interface of the shared
memory port, the memory controller and the plurality of bus transactor
circuits for passing memory data between the memory data interface and the
plurality of bus transactor circuits; and
a command bus which is coupled to the memory controller and the plurality
of bus transactor circuits for passing non-memory data, including requests
for access to the memory data interface over the data bus and memory
addresses related to the memory data, between the memory controller and
the plurality of bus transactor circuits.
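Claims 4 and 5 recite a watermark-driven priority scheme for the display controller. The Python sketch below is a hypothetical behavioral model of the claim 5 ordering, not the patented circuit. The function name and the string labels for requesters are assumptions, ties among the "other" bus transactor circuits go to the earliest name in the list (the claim does not specify a tie-break), and a display queue holding exactly the predetermined amount is treated as not below the watermark.

```python
def select_command_bus_owner(requesters, display_below_watermark):
    """Return which requester is granted the command bus (claim 5 ordering).

    requesters: names of units currently requesting. "display" and
    "processor" are special; any other name stands for one of the
    "other bus transactor circuits" (hypothetical labels).
    display_below_watermark: True when the watermark output indicates the
    display queue holds less than the predetermined amount.
    """
    requesters = list(requesters)
    # 1st, highest priority: a display controller whose queue is running low.
    if "display" in requesters and display_below_watermark:
        return "display"
    # 2nd priority: the processor.
    if "processor" in requesters:
        return "processor"
    # 3rd priority: the other bus transactor circuits.
    # Assumption: ties among them go to the earliest name in the list.
    others = [r for r in requesters if r not in ("display", "processor")]
    if others:
        return others[0]
    # 4th, lowest priority: a display controller whose queue is not low.
    if "display" in requesters:
        return "display"
    return None  # nobody is requesting
```

With the serial subsystem, the processor and the display controller all requesting, the model grants the display controller when its queue is below the watermark and the processor otherwise.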
Description
BACKGROUND OF THE INVENTION
The present invention relates to integrated circuits and, in particular, to
an integrated circuit having a unified memory architecture.
Unified memory architectures have been used for various computer
applications, such as network computers, Internet appliances and mission
specific terminal applications. In a typical unified memory architecture,
all devices requiring access to memory are coupled to a common system bus.
These devices can include a processor, an input-output device or a
graphics device, for example. A memory controller arbitrates access to
memory between the various devices.
Memory latency is a common difficulty in unified memory architectures since
each device must arbitrate for access to memory over the system bus.
Latency can be reduced by requesting bursts of data from memory. For
example, graphics devices may request bursts of display data from a frame
buffer. Since graphics devices continually supply data to a screen
display, these devices have a high bandwidth requirement and cannot easily
accommodate long memory latencies. On the other hand, processors typically
request specific data from memory or another device and then wait for the
data without giving up access to the system bus. Also, processors require
a relatively high priority. This often results in contention for the
system bus between the processor and devices having high bandwidth
requirements.
A conventional system with multiple bus masters uses an address bus and a
data bus to control the memory system. Typically, both of these buses are
arbitrated for and granted to one master at a time. Many cycles of bus
time are lost to dead time between masters and to the time each master
needs to communicate its data request to the memory controller. In
addition, the processor uses the same bus for programmed input/output (PIO)
functions, which use bus bandwidth very inefficiently.
A typical system that includes a raster scan display output for graphics
uses a second memory system for this time-critical function. Not only does
this extra memory system increase cost, but it also degrades overall
system performance, because data must be copied from the processor memory
space into the display memory space.
SUMMARY OF THE INVENTION
The unified memory system of the present invention provides a high enough
bandwidth to enable a graphics and display subsystem to use the same
memory as a processor and other bus transactor circuits. The unified
memory system includes a processor, a memory controller, a plurality of
bus transactor circuits and a shared memory port. A processor bus is
coupled between the processor and the memory controller. A first
multiple-bit, bidirectional system bus is coupled between the shared
memory port, the memory controller and the plurality of bus transactor
circuits. A second multiple-bit, bidirectional system bus is coupled
between the memory controller and the plurality of bus transactor
circuits.
Another aspect of the present invention relates to a method of passing data
between a shared memory port, a memory controller and a plurality of bus
transactor circuits. The method includes: passing memory data between the
shared memory port, the memory controller and the plurality of bus
transactor circuits over a multiple-bit, bidirectional data bus; passing
non-memory data between the memory controller and the plurality of bus
transactor circuits over a multiple-bit, bidirectional command bus;
controlling access by the plurality of bus transactor circuits to the data
bus with the memory controller; and controlling access by the plurality of
bus transactor circuits to the command bus with the memory controller
independently of access to the data bus.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an integrated circuit according to one
embodiment of the present invention.
FIG. 2 is a block diagram showing the integrated circuit coupled to a
variety of external devices.
FIG. 3 is a memory map of the integrated circuit.
FIG. 4 is a more detailed block diagram of the integrated circuit,
according to one embodiment of the present invention.
FIG. 5 is a diagram illustrating inputs and outputs of a system bus
interface unit in a bus transactor circuit within the integrated circuit.
FIG. 6 is a diagram illustrating an acknowledge message format.
FIG. 7 is a diagram illustrating logical separation of a dual port RAM in
the system bus interface unit shown in FIG. 5.
FIG. 8 is a diagram illustrating a command bus message header format.
FIG. 9 is a diagram illustrating a command bus message header format for a
screen block transfer.
FIG. 10 is a table illustrating available transaction types of a command
field in the header formats of FIGS. 8 and 9.
FIG. 11 is a waveform diagram illustrating data bus timing within the
integrated circuit.
FIG. 12 is a waveform diagram illustrating command bus timing within the
integrated circuit.
FIG. 13 is a block diagram illustrating an example of a subsystem interface
to the DPRAM shown in FIG. 5.
FIG. 14 is a waveform diagram illustrating waveforms in the subsystem
interface shown in FIG. 13 during a PIO read.
FIG. 15 is a waveform diagram illustrating waveforms in the subsystem
interface shown in FIG. 13 during a PIO write.
FIG. 16 is a waveform diagram illustrating waveforms during outbound data
transfers.
FIG. 17 is a block diagram of a processor in the integrated circuit
according to one embodiment of the present invention.
FIG. 18 is a simplified block diagram illustrating connection of a memory
controller to the system blocks of integrated circuit 10.
FIG. 19 is a diagram illustrating inputs and outputs of the memory
controller shown in FIG. 18.
FIG. 20 is a block diagram of an interface between the memory controller
and external memory.
FIGS. 21A-21C together form a table of memory controller registers.
FIG. 22 is a table which defines each bit of a reset and status register.
FIG. 23 is a table which defines each bit of a system configuration
register.
FIG. 24 is a table which defines each bit of a memory configuration
register.
FIG. 25 is a table which defines each bit of a memory initialization and
refresh register.
FIG. 26 is a table which defines each bit of a frame configuration
register.
FIG. 27 is a table which defines each bit of frame starting tile address
and tile configuration registers.
FIG. 28 is a table which lists common frame resolution numbers.
FIG. 29 is a table which defines each bit of a display DMA control
register.
FIG. 30 is a table which defines each bit of a display DMA ID register.
FIG. 31 is a table which defines each bit of a display starting offset
register.
FIG. 32 is a table which defines each bit of a display screen size
register.
FIG. 33 is a table which defines each bit of a dither LUT register.
FIG. 34 is a diagram illustrating how pixel data is cached in a window
cache.
FIG. 35 is a table which defines each bit of a window starting address
register.
FIG. 36 is a table which defines each bit of a window size register.
FIG. 37 is a table which defines each bit of a load window cache register.
FIG. 38 is a table which defines each bit of a flush window cache register.
FIG. 39 is a table which defines each bit of a window cache status
register.
FIG. 40 is a table which defines a packer data register.
FIG. 41 is a table which defines each bit of a packer starting address
register.
FIG. 42 is a table which defines each bit of a packer data size register.
FIG. 43 is a table which defines each bit of display current address
registers.
FIG. 44 is a table which defines each bit of display remain size registers.
FIG. 45 is a table which defines each bit of a window current address
register.
FIG. 46 is a table which defines each bit of window remain registers.
FIG. 47 is a waveform diagram illustrating PIO read response timing.
FIG. 48 is a waveform diagram illustrating cache line fill response timing.
FIG. 49 is a waveform diagram illustrating PIO write timing.
FIG. 50 is a waveform diagram illustrating PIO read timing.
FIG. 51 is a waveform diagram illustrating DMA request timing.
FIG. 52 is a diagram illustrating interface signals to and from a graphics
and display subsystem within the integrated circuit.
FIG. 53 is a table indicating a DISP_LD[1:0] signal format.
FIG. 54 is a diagram of a DMA command header for Screen relative addressing
direct memory accesses (DMAs).
FIG. 55 is a block diagram of the graphics and display subsystem.
FIG. 56 is a diagram illustrating partitioning of a DPRAM in the graphics
and display subsystem.
FIG. 57 is a simplified block diagram of a data path of a bus interface unit
of the graphics and display subsystem.
FIG. 58 is a simplified block diagram of a subsystem interface unit of the
graphics and display subsystem.
FIG. 59 is a block diagram of a pixel pipe section of the graphics and
display subsystem.
FIG. 60 is a block diagram of a graphics BitBLT data flow through the
graphics and display subsystem.
FIG. 61 is a block diagram of a serial subsystem in the integrated circuit.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The integrated circuit of the present invention has a unified memory and
dual bus architecture which maximizes bandwidth to and from an external
memory device while minimizing latency for individual subsystems that
compete for access to the memory device.
FIG. 1 is a block diagram of the integrated circuit of the present
invention. Integrated circuit 10 includes processor 12, memory controller
14, a plurality of bus transactor circuits 15A-15C, shared memory port 20
and dual system buses 22 and 24. Processor 12 is coupled to memory
controller 14 over a bidirectional processor bus 26 which includes
processor address lines 28, processor control lines 30 and processor data
lines 32 which allow processor 12 to communicate with memory controller
14.
Memory controller 14 is coupled to shared memory port 20 and system buses
22 and 24. Shared memory port 20 includes a memory address interface 40, a
memory control interface 42 and a memory data interface 44. Memory data
interface 44 is coupled to system bus 22. Shared memory port 20 is coupled
to an external memory device 46, which can include a synchronous dynamic
random access memory (SDRAM), for example.
Bus transactor circuits 15A-15C are coupled to memory controller 14, shared
memory port 20 and to one another through multiple-bit, bidirectional
system bus 22. Bus transactor circuits 15A-15C are also coupled to one
another and to memory controller 14 through multiple-bit, bidirectional
system bus 24. System bus 22 is a data bus which carries memory data being
transmitted to and from external memory 46 by bus transactor circuits
15A-15C and processor 12 (through memory controller 14). System bus 24 is
a command bus which carries command data and programmed input-output (PIO)
data being transmitted between bus transactor circuits 15A-15C and
processor 12 (through memory controller 14).
Data bus 22 is used exclusively for transferring memory data between memory
46 and one of the bus masters. Command bus 24 is used for transferring
"requests" for memory data transfers by bus transactor circuits 15A-15C
and for PIO operations. Memory controller 14 includes a command queue for
storing the requests so that the next memory access can be started at the
earliest possible time without depending on the performance or latency of
command bus 24. Access to data bus 22 results from memory controller 14
executing one of the commands that is stored in the command queue. If the
next command in the queue is for access to memory 46, data bus 22 is
automatically granted to the requesting device. A bus transactor circuit
requesting read access to memory 46 is always ready to receive the
corresponding data, and a bus transactor circuit requesting write access
is always ready to send the data.
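A minimal Python sketch of the command queue behavior described above, assuming a simple first-in, first-out queue. The class and field names are hypothetical. The DATA_LD and DATA_GNT strobes are the BIU inputs defined in Section 2.1.1. The real controller may reorder or prioritize commands in ways this model does not capture.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryCommand:
    requester: str  # bus transactor circuit that posted the request
    op: str         # "read" (memory -> requester) or "write" (requester -> memory)
    address: int
    beats: int      # number of 64-bit data bus cycles (hypothetical field)


class CommandQueue:
    """FIFO of memory access requests received over the command bus."""

    def __init__(self):
        self._pending = deque()

    def post(self, cmd):
        # Requests are queued as soon as they arrive on the command bus,
        # independent of whatever is happening on the data bus.
        if cmd.op not in ("read", "write"):
            raise ValueError(f"unknown op {cmd.op!r}")
        self._pending.append(cmd)

    def next_data_bus_grant(self):
        """Pop the next command and return (requester, strobe, address, beats).

        For a read, the controller loads data into the requester (DATA_LD);
        for a write, it tells the requester to drive the bus (DATA_GNT).
        No further handshake is modeled because the requester is always ready.
        """
        if not self._pending:
            return None
        cmd = self._pending.popleft()
        strobe = "DATA_LD" if cmd.op == "read" else "DATA_GNT"
        return (cmd.requester, strobe, cmd.address, cmd.beats)
```

Because a requester is guaranteed to be ready, granting the data bus reduces to popping the next queued command and asserting the matching strobe.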
Each bus transactor circuit 15A-15C can include a variety of devices
requiring access to external memory 46 such as another processor, a serial
input-output (I/O) subsystem, a parallel I/O subsystem and a graphics and
display subsystem.
With two system buses, including data bus 22 and command bus 24, bus
transactor circuits 15A-15C can request access to external memory 46 and
pass memory controller 14 the address of the next block of data over
command bus 24 while data is being transferred simultaneously to another
one of the bus transactor circuits or the processor over data bus 22. Bus
transactor circuits 15A-15C do not have to wait until the end of the data
transfer to pass the address of the next block of data to be transferred.
This reduces memory latency. Also, PIO data transfers are passed over
command bus 24, which leaves data bus 22 free for higher bandwidth data
transfers and therefore reduces contention on data bus 22.
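The latency benefit can be illustrated with a simple cycle-count model. This is a hypothetical sketch, not a measurement. It assumes every transfer needs req cycles to communicate its request and data cycles on the data bus, and it ignores arbitration dead time, bus turnaround and command queue depth limits. With a single shared bus the two phases serialize. With separate command and data buses, the next request overlaps the current data transfer, giving req + (n - 1) * max(req, data) + data cycles for n transfers.

```python
def single_bus_cycles(n, req, data):
    # One shared bus: each transfer must send its request and then move its
    # data before the next transfer can begin.
    return n * (req + data)


def dual_bus_cycles(n, req, data):
    # Separate buses: requests flow back-to-back on the command bus while
    # earlier transfers occupy the data bus.
    cmd_free = 0   # cycle at which the command bus is next free
    data_free = 0  # cycle at which the data bus is next free
    for _ in range(n):
        req_done = cmd_free + req
        cmd_free = req_done                 # next request can follow immediately
        start = max(req_done, data_free)    # data waits for request and bus
        data_free = start + data
    return data_free
```

For four transfers with 2-cycle requests and 8-cycle bursts, the model gives 40 cycles on one bus and 34 cycles with two buses. The saving grows with the number of back-to-back transfers.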
The dual bus architecture of the present invention allows the system to
use a much larger fraction of the theoretical peak memory bandwidth. This
enables a graphics and display subsystem to use the same memory as the
processor and other bus transactor circuits in a unified memory system. A
second memory system for display data is not required as in conventional
computer systems. This results in a significant cost savings and
performance improvement.
In one embodiment of the present invention, integrated circuit 10 is
implemented as an Application-Specific Standard Product (ASSP) for use in
Network Computer, Internet Appliance and mission specific terminal
applications. In this embodiment, integrated circuit 10 integrates many of
the common functions associated with attaching to the Internet such that
all of the functions needed for an Internet browser box can be implemented
with only the addition of memory, such as external memory 46.
For example, FIG. 2 is a block diagram showing integrated circuit 10
coupled to a variety of external devices, including a peripheral component
interconnect (PCI) 60, an Ethernet local area network (LAN) 62, an
Integrated Services Digital Network (ISDN) 64, a keyboard 66, a
mouse 68, a monitor or LCD panel 70, an audio digital-to-analog (D/A)
converter 72, an audio analog-to-digital converter 74, SDRAM 46, a
read-only memory 76, a serial electrically erasable programmable read-only
memory (EEPROM) 78, an ISO7816 compliant SmartCard interface 80, a printer
82 and a scanner 84.
1. Physical Address Map for Integrated Circuit 10
Integrated circuit 10 has a 32-bit physical address, which allows
integrated circuit 10 to address four gigabytes of contiguous physical
memory. All internal resources, as well as system resources, are mapped
within this address space.
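The four-gigabyte figure follows directly from the address width, assuming byte addressing (as in MIPS processors): 2^32 = 4 x 2^30 bytes. A short Python check:

```python
ADDRESS_BITS = 32

# Assumption: byte addressing, i.e. each address selects one byte.
addressable_bytes = 2 ** ADDRESS_BITS
highest_address = addressable_bytes - 1

GIB = 2 ** 30
print(addressable_bytes // GIB, "GB")  # 4 GB
print(hex(highest_address))            # 0xffffffff
```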
FIG. 3 is a memory map illustrating the division of system resources. The
starting address of each block of memory is indicated at 90, where "0x"
represents a hexadecimal number. The system resource associated with each
block of memory is indicated at 92. The quantity of memory contained in
each block of memory is indicated at 94, where "M" represents megabytes
and "G" represents gigabytes.
2. Overall System Architecture
FIG. 4 is a block diagram of integrated circuit 10 according to the
above example. The same reference numerals are used in FIG. 4 as were used
in FIG. 1 for the same or similar elements. Integrated circuit 10 includes
a plurality of external pins, including serial I/O pins 100, PCI and
parallel I/O pins 102, display pins 104, and SDRAM pins which include
memory data pins 106 and memory address and control pins 108. Pins 106 and
108 form shared memory port 20.
Integrated circuit 10 further includes processor 12, memory controller 14,
bus transactor circuits 15A-15C, data bus 22 and command bus 24. In one
embodiment, processor 12 includes a CW4011 Microprocessor Core available
from LSI Logic Corporation, a Multiply/Shift Unit, an MMU/TLB, a 16K
instruction cache, an 8K data cache and a Cache Controller/Bus Interface
Unit. The CW4011 core is a MIPS® architecture processor that
implements the R4000, MIPS® II compliant 32-bit instruction set.
Other types of processors can also be used.
Processor 12 is coupled to memory controller 14 and interrupt controller
110 through processor bus interface unit 112. As memory and interrupt
functions are closely tied to processor 12, interrupt controller 110 is
coupled to processor 12 to take advantage of an arbitration scheme geared
towards maintaining processor performance. System interrupts are funneled
through interrupt controller 110 to processor 12. Interrupt controller
110 supports programmable priority assignments which provide flexibility
to the system design of integrated circuit 10. Processor 12 can read from
or write to any one of the bus transactor circuits 15A-15C directly over
command bus 24 via programmed I/O cycles. In most cases, data to and from
external memory 46 is transferred over data bus 22 via one of many on-chip
direct memory access (DMA) engines located in bus transactor circuits
15A-15C and memory controller 14, as described in more detail below. The
DMA capabilities serve to off-load data transfer duties from processor 12
as well as to ensure that data bus 22 is used most effectively by using
burst transfers whenever possible.
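Why bursts use data bus 22 more effectively can be shown with a small efficiency calculation. This is a hypothetical sketch: it assumes each transaction pays a fixed number of non-data overhead cycles (arbitration, turnaround, address setup), a figure the text does not give, and it uses the 64-bit width of data bus 22 to convert beats to bytes.

```python
from fractions import Fraction

BYTES_PER_BEAT = 64 // 8  # data bus 22 is 64 bits wide


def data_bus_efficiency(beats, overhead_cycles):
    """Fraction of data bus cycles that carry data for one transaction.

    overhead_cycles is a hypothetical fixed cost per transaction; the
    actual value depends on the memory controller and SDRAM timing.
    """
    return Fraction(beats, beats + overhead_cycles)


def bytes_moved(beats):
    """Payload carried by a transaction of the given number of beats."""
    return beats * BYTES_PER_BEAT
```

With an assumed three cycles of overhead, a single-beat access keeps the bus busy with data only one quarter of the time, while an 8-beat (64-byte) burst raises that to 8/11, roughly 73 percent.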
Memory controller 14 passes memory data between shared memory port 20,
processor 12 and bus transactor circuits 15A-15C over data bus 22. Memory
controller 14 passes non-memory data between processor 12 and bus
transactor circuits 15A-15C over command bus 24. For example, memory
controller 14 passes header data (data transfer requests) between memory
controller 14 and bus transactor circuits 15A-15C and passes programmed
input-output (PIO) data between processor 12 and bus transactor circuits
15A-15C over command bus 24.
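The division of traffic between the buses can be summarized as a lookup table. In this hypothetical Python sketch the traffic-type labels are invented for illustration. The bus paths follow the text: processor traffic reaches memory controller 14 over processor bus 26, PIO and request headers use command bus 24, and all memory data uses data bus 22.

```python
DATA_BUS = "data bus 22"
COMMAND_BUS = "command bus 24"
PROCESSOR_BUS = "processor bus 26"

# Hypothetical traffic labels mapped to the buses each kind of traffic
# crosses, in order.
ROUTES = {
    # DMA between external memory 46 and a bus transactor circuit
    "dma_memory_data": (DATA_BUS,),
    # Header data (data transfer requests) from a bus transactor circuit
    "transfer_request_header": (COMMAND_BUS,),
    # Processor PIO to a bus transactor circuit, relayed by memory controller 14
    "pio": (PROCESSOR_BUS, COMMAND_BUS),
    # Processor access to external memory 46, relayed by memory controller 14
    "processor_memory_data": (PROCESSOR_BUS, DATA_BUS),
}


def buses_for(traffic):
    """Return the ordered buses a given traffic type crosses."""
    try:
        return ROUTES[traffic]
    except KeyError:
        raise ValueError(f"unknown traffic type {traffic!r}") from None
```

Keeping PIO and request headers off data bus 22 is what leaves that bus free for high-bandwidth memory transfers.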
Bus transactor circuits 15A-15C include bus interface units (BIUs)
120A-120C, dual port RAMs (DPRAMs) 122A-122C, subsystem interface units
(SIUs) 124A-124C and subsystems 126A-126C, respectively. Subsystems
126A-126C are also referred to as "peripheral blocks".
Subsystem 126A is a serial I/O subsystem which implements a 10/100 Mbit per
second Fast Ethernet peripheral device, a four-port universal serial bus
host controller, an AC'97 AC-link audio peripheral and a set of generic
programmed I/O pins. Subsystem 126B is a PCI and parallel I/O subsystem
which includes a high performance PCI interface, an IEEE 1284 compliant
parallel port, an IDE/ATAPI disk interface, provisions for flash ROM and
PCMCIA adapters, PS/2 compatible keyboard and mouse inputs, I²C interfaces
and a SmartCard interface.
Subsystem 126C is a graphics and display subsystem which supports direct
attachment to a CRT monitor or an LCD panel, such as monitor 70, shown in
FIG. 2, through red-green-blue (RGB) and digital outputs formed by display
pins 104. External memory 46, shown in FIG. 1, is coupled to SDRAM pins
106 and 108 and is used to hold a video frame buffer for display and
graphics subsystem 126C.
Each subsystem 126A-126C uses a message passing, split transaction protocol
to transfer data and control information over data bus 22 and command bus
24. Buses 22 and 24 are 64-bit, bidirectional, tri-state buses. Each bus
transactor circuit 15A-15C has an input and output queue within DPRAMs
122A-122C for storing messages being passed to and from its subsystem and
the other bus transactor circuits 15A-15C and processor 12. Since
processor 12 requires a low latency, high speed access to memory, it has a
private port to memory controller 14 through processor bus 26 (shown in
FIG. 1).
2.1 Data and Command Bus Interfaces
Bus interface units (BIUs) 120A-120C direct traffic over buses 22 and 24 to
and from respective subsystems 126A-126C. Messages are passed between
buses 22 and 24 and subsystem interface units (SIUs) 124A-124C through bus
interface units 120A-120C and DPRAMs 122A-122C, respectively.
Typically, the operating frequency of each subsystem differs from that of
system buses 22 and 24. DPRAMs 122A-122C are the logical boundaries for
the different clock domains. In one embodiment, BIUs 120A-120C and DPRAMs
122A-122C are logically and physically identical. Although some portions
of SIUs 124A-124C are similar, subsystem specific logic is typically
required in each implementation. Thus, each SIU 124A-124C is logically and
physically unique to the corresponding subsystem.
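One way to picture a DPRAM queue acting as the boundary between clock domains is a ring buffer in which the bus side writes through one port and the subsystem side reads through the other, each side advancing only its own pointer. The sketch below is a hypothetical software model; the class name and depth are assumptions. A hardware version would also have to synchronize each pointer into the other clock domain (commonly done with Gray-coded pointers), which this model does not show.

```python
class DpramQueue:
    """Hypothetical ring-buffer queue held in a dual port RAM.

    The producer port (e.g. the BIU side) advances only `wr`; the consumer
    port (e.g. the SIU side) advances only `rd`. One slot is left empty so
    that full and empty can be told apart from the two pointers alone.
    """

    def __init__(self, depth):
        if depth < 2:
            raise ValueError("depth must be at least 2")
        self.mem = [None] * depth
        self.wr = 0
        self.rd = 0

    def empty(self):
        return self.wr == self.rd

    def full(self):
        return (self.wr + 1) % len(self.mem) == self.rd

    def push(self, word):
        """Producer port: store one word; return False if the queue is full."""
        if self.full():
            return False
        self.mem[self.wr] = word
        self.wr = (self.wr + 1) % len(self.mem)
        return True

    def pop(self):
        """Consumer port: return the oldest word, or None if empty."""
        if self.empty():
            return None
        word = self.mem[self.rd]
        self.rd = (self.rd + 1) % len(self.mem)
        return word
```

Because neither side ever writes the other side's pointer, each port can run on its own clock; only the pointer comparisons cross the boundary.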
2.1.1 Bus Interface Unit Signals
FIG. 5 is a diagram illustrating the inputs and outputs of one of the
system bus interfaces for subsystems 126A-126C. The system bus interface
includes BIU 120, DPRAM 122 and SIU 124. DPRAM 122 is divided into a
plurality of queues and forms a clock boundary 130 between BIU 120 and SIU
124. BIU 120 has the following input and output signals:
BCLK (input) is a System Bus Clock to which all bus signals are referenced.
RESET_N (input) is a System Reset signal.
DATA[63:0] (tri-state, bidirectional) is the 64-bit bidirectional data bus
22 (shown in FIGS. 1 and 4) for transferring data to and from external
memory 46.
DATA_ERR (input) is asserted by memory controller 14 when the subsystem
attempts a transaction to an invalid memory address.
DATA_LD (input) is a signal which loads the contents of data bus 22 into
DPRAM 122. This signal will be asserted by memory controller 14 when data
is to be transferred from external memory 46 to DPRAM 122. Data will be
valid on data bus 22 on the following clock. This signal is used for
direct memory access (DMA) data transfers from external memory to the
corresponding subsystem.
DATA_GNT (input) is a DATA_GRANT signal which is asserted by memory
controller 14 to the subsystem, indicating that BIU 120 should drive data
onto data bus 22 on the following clock. This signal is used for DMA data
transfers to external memory.
DATA_EOT (input) is a Data bus End Of Transfer signal which is asserted by
memory controller 14 on the clock cycle that precedes the last cycle of a
data transfer.
CMD[63:0] (tri-state, bidirectional) is the 64-bit bi-directional Command
Bus 24 for communicating command headers and CPU data transfers (PIO)
between memory controller 14 and each subsystem.
CMD_LD (input) is a Command Load signal which is asserted by memory
controller 14 when the CPU (processor 12) is requesting a PIO transfer to
the corresponding subsystem, indicating that a valid command header will be
present on command bus 24 on the following clock.
CMD_GNT (input) is a COMMAND GRANT signal which is asserted by memory
controller 14 to indicate that BIU 120 is granted command bus 24.
CMD_PWA (output) is a PIO Write Acknowledge signal which is asserted by BIU
120 to indicate to memory controller 14 that a PIO write has been
completed.
CMD_REQ[1:0] (output) is a Command Request signal which is asserted by BIU
120 to memory controller 14 to request that Command Bus data be
transferred. The Command Request signal is coded per the following Table:
TABLE 1
CMD_REQ[1:0]   Request Type
00             IDLE
01             Memory Request
10             CPU Read Reply
11             Interrupt Request
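As a minimal illustration of Table 1, the two-bit CMD_REQ code can be decoded in software, for example inside a testbench bus monitor. This is a sketch under that assumption; the names `CmdReq`, `decode_cmd_req` and `is_requesting` are hypothetical and are not taken from the specification.

```python
from enum import IntEnum


class CmdReq(IntEnum):
    """CMD_REQ[1:0] encodings from Table 1 (names are hypothetical)."""
    IDLE = 0b00
    MEMORY_REQUEST = 0b01
    CPU_READ_REPLY = 0b10
    INTERRUPT_REQUEST = 0b11


def decode_cmd_req(bits: int) -> CmdReq:
    """Decode the two-bit CMD_REQ value driven by BIU 120."""
    if not 0 <= bits <= 0b11:
        raise ValueError(f"CMD_REQ is a 2-bit field, got {bits!r}")
    return CmdReq(bits)


def is_requesting(bits: int) -> bool:
    """True when the BIU is asking memory controller 14 for a command bus transfer."""
    return decode_cmd_req(bits) is not CmdReq.IDLE
```

Any non-IDLE code is treated here as a pending request, since the text describes CMD_REQ as a request for a command bus transfer that memory controller 14 answers with CMD_GNT.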
2.1.2 Subsystem Interface Unit (SIU) Signals
Subsystem Interface Unit (SIU) 124 provides a synchronous interface between
DPRAM 122 and the subsystem hardware logic. SIU 124 has the following
input and output signals:
SCLK (input) is a Sub-system clock signal to which all SIU signals are
referenced.
SRESET_N (output) is an SIU Reset signal which provides a synchronized
system reset to the subsystem.
Dout[63:0] (output) is a 64-bit Data Out signal from SIU 124 to the
subsystem.
Din[63:0] (input) is a 64-bit Data In signal from the subsystem to SIU 124.
ADDin (input) is an Address In signal from the subsystem to SIU 124.
WCE (input) is a Write Clock Enable which is asserted by the subsystem
during the clock period when valid address and data are presented to SIU
124. Data will be written into DPRAM 122 on the rising edge of the clock
when WCE is asserted.
VALID_PIO (output) is a Valid PIO-in-Queue signal which, when
asserted, indicates that PIO information is still being held in an Input
Command Queue in DPRAM 122. The assertion of VP_ACK pops an entry off
the Input Command Queue. The signal VALID_PIO may remain asserted if
additional PIO requests have been loaded into the queue.
VP_ACK is a Valid PIO Acknowledge input which is asserted by the subsystem
to indicate that the top entry in the Input Command Queue has been used
and can be discarded. This signal will be used by Input Command Queue
pointers to advance to the next entry as well as to decrement the
VALID_PIO counter.
WRITE is an SIU write input which is asserted by the subsystem to indicate
that the PIO data has been decoded to be a write.
ACK_VLD is an ACK bus Valid output which is asserted by SIU 124 to indicate
that ACK_BUS[7:0] contains a valid acknowledge message. This signal will
be asserted when a data transfer begins.
AB_ACK is an ACK bus Acknowledge input which is asserted by the subsystem
to indicate that the current acknowledge message has been read and is no
longer needed.
PNTR_VLD is a Pointer Valid output which is asserted by SIU 124 to indicate
that ACK_BUS[7:0] contains an updated queue pointer. This signal will be
asserted when a data transfer completes.
ACK[7:0] is the Acknowledge Bus output which includes an Acknowledge
message sent from BIU 120 to SIU 124 to inform the subsystem when memory
requests have been completed and to provide the updated DPRAM address for
buffer queue management (may be used by the subsystems for FIFO control).
The Acknowledge message format is illustrated in FIG. 6, where "CMD" is a
command field which indicates a memory write, a memory read or an error
condition, "SSID" is a subsystem identification field, and "NEWRAMADR" is
a new address for DPRAM 122.
HEADER_ADD is a Header Queue Addition input which is asserted for one clock
when the subsystem has placed a header into a Request queue in DPRAM 122.
HQ_FULL is a Header Queue Full output which is asserted by SIU 124 when the
Request queue is full.
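The queue handshakes described above (VALID_PIO/VP_ACK on the Input Command Queue, and HEADER_ADD/HQ_FULL on the Request Queue) can be sketched as a simple single-clock-domain behavioral model. This is an illustration only: it ignores the clock crossing at DPRAM 122, the deque length stands in for the VALID_PIO counter, and the class names and the Request Queue `capacity` are hypothetical, since the text does not state a queue depth.

```python
from collections import deque


class InputCommandQueue:
    """Hypothetical model of the VALID_PIO / VP_ACK handshake."""

    def __init__(self) -> None:
        self._entries = deque()

    def load(self, pio_entry) -> None:
        # BIU side: a PIO request received over command bus 24 is queued.
        self._entries.append(pio_entry)

    @property
    def valid_pio(self) -> bool:
        # VALID_PIO stays asserted while any PIO entry remains queued.
        return len(self._entries) > 0

    def top(self):
        return self._entries[0]

    def vp_ack(self):
        # Subsystem asserts VP_ACK: discard the top entry and advance.
        if not self._entries:
            raise RuntimeError("VP_ACK asserted while VALID_PIO is deasserted")
        return self._entries.popleft()


class RequestQueue:
    """Hypothetical model of the HEADER_ADD / HQ_FULL handshake."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumed depth; not given in the text
        self._headers = deque()

    @property
    def hq_full(self) -> bool:
        return len(self._headers) >= self.capacity

    def header_add(self, header: int) -> None:
        # Subsystem pulses HEADER_ADD after writing a header into the queue.
        if self.hq_full:
            raise RuntimeError("HEADER_ADD asserted while HQ_FULL")
        self._headers.append(header)

    def drain(self) -> int:
        # BIU side: the oldest header is sent onto command bus 24.
        return self._headers.popleft()
```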
2.1.3 Global Signals
The following signals are global signals within integrated circuit 10,
which are not specifically shown in FIG. 5.
BIG is a Big Endian Mode signal. When asserted, BIG indicates that system
buses 22 and 24 are operating in big-endian mode (i.e. byte address 0
occupies bits 63:56).
CONFIG_ENABLE is a Configuration Mode Enable. When asserted, this signal
indicates that integrated circuit 10 is in a configuration mode and that
the power-on defaults are being shifted in through a CONFIG_DIN port.
CONFIG_CLK is the clock signal to which the serial configuration data is
referenced.
CONFIG_DINx is a serial Configuration Data signal stream which is used to
establish reset defaults. Each hierarchical block takes its Din stream,
directs it to all necessary register elements, and then provides Dout.
CONFIG_Doutx is a serial Configuration Data output.
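The effect of the BIG signal on a 64-bit bus word can be illustrated with a short byte-lane calculation. The big-endian case follows the text (byte address 0 occupies bits 63:56); the little-endian case (byte address 0 occupying bits 7:0 when BIG is deasserted) is an assumption, and the function names are hypothetical.

```python
def byte_lane(byte_address: int, big_endian: bool) -> tuple[int, int]:
    """Return the (msb, lsb) bit positions of a byte within a 64-bit bus word."""
    offset = byte_address & 0x7            # byte offset within the 8-byte word
    lane = 7 - offset if big_endian else offset
    lsb = lane * 8
    return lsb + 7, lsb


def read_byte(word: int, byte_address: int, big_endian: bool) -> int:
    """Extract the addressed byte from a 64-bit word on bus 22 or 24."""
    _, lsb = byte_lane(byte_address, big_endian)
    return (word >> lsb) & 0xFF
```

For example, `byte_lane(0, True)` returns `(63, 56)`, matching the big-endian definition of BIG.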
2.2 System Bus Transactions
To facilitate communications with system buses 22 and 24, DPRAM 122 is
logically separated as illustrated in FIG. 7. DPRAM 122 has a Data Queue
150A, a reserved section 150B, a Read Response Queue 150C, an Input
Command Queue 150D and a Request Queue 150E. The individual locations in
DPRAM 122 are shown at 152, and their corresponding hexadecimal addresses
are shown at 154.
The first 256 locations in DPRAM 122 define Data Queue 150A and are used to
store DMA data for the subsystem. Read Response Queue 150C is used to
store PIO and Cache Line Fill data from the subsystem when processor 12 is
reading data from the subsystem (a CPU read cycle). Input Command Queue
150D stores incoming PIO requests from processor 12 to the subsystem.
Request Queue 150E is used for storing subsystem messages being sent to
system command bus 24.
2.2.1 Header Format
All command bus messages which are passed through Input Command Queue 150D
or Request Queue 150E commence with a header 160 which is formatted as
shown in FIG. 8. Each field of header 160 is defined below:
ERROR (Transaction Error) is a read reply error flag. In the event that a
PIO read request cannot be completed, the subsystem will return a header
with this bit set.
CMD (Command) contains the three-bit transaction type (see FIG. 10).
BCNT(7:0)/Mask (Byte Count/Write Mask). For all read operations and burst
write transfers, this field contains the number of bytes to be
transferred. For write single commands, this field indicates the byte
lanes to be written. Bit 7 corresponds to bits 63-56 of the 64-bit word,
and bit 0 corresponds to bits 7-0 of the 64-bit word.
SSID (Subsystem ID) is used for message tracking to identify the particular
subsystem associated with the message. These bits are set by the subsystem
when a memory data transfer is requested. They are undefined for PIO
headers.
RAMADR[7:0] (Ram Address) is the address offset into Data Queue 150A which
contains the data to be used for the data transfer. The most significant
bit (MSB) of the DPRAM 122 address is implied by the type of transfer (i.e. DMA
data versus command data).
WRAP (Address Wrap Select) is the bit on which to wrap the RAM pointer. A
value of zero wraps on bit 0, resulting in a two word buffer. A value of 1
wraps on bit 1, providing a two bit address, resulting in a four word
buffer. A value of 7 wraps the address on bit 7, which yields a 256 word
buffer in Data Queue 150A.
DEC (decrementing burst direction), when set, instructs memory controller
14 that the memory addresses for a burst transfer should decrement.
ADDRESS [31-0] (System Address) is the physical address in the external
memory where the data will be transferred. This is a byte address, and
bits 1-0 are significant.
2.2.2 Screen Block Header Format
For graphics accesses, such as for graphics and display subsystem 126C, a
special command header is used to allow tile based DMA operations using
screen relative addressing. This header is used when memory controller 14
must perform address translation from a screen coordinate to a physical
memory location in the external memory. The header format for a special
command header 170 is shown in FIG. 9. The fields in header 170 are
defined as follows:
offset (bits [7-0]) define an X offset within a tile for a starting pixel.
offset (bits [15-8]) define a Y offset within a tile for the starting
pixel.
TileID (bits [23-16]) define a tile number with respect to a particular
frame buffer for the starting pixel.
Width (bits [31-24]) define a number of bytes per line.
Height (bits [36-32]) define the number of lines (5 bits).
Direction (bit [37]) 1=read; 0=write.
FrameID (bits [39-38]) is a frame buffer ID (e.g. front/back buffer or
overlay plane).
RAMADR (bits [47-40]) define the starting DPRAM address (8 bits) for
subsystem use.
Bits [49-48] are reserved.
BSize (bits [55-50]) define the burst size.
BSteer (bits [59-56]) are used by the subsystem on a read for byte
steering.
CMD (bits [62-60]) is set to "000" for this special header type.
ERROR (bit [63]) is always "0" for compatibility with other command
headers.
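Because the bit positions of special command header 170 are fully listed above, packing and unpacking it is straightforward. The following is a sketch for illustration (for example, in a software model or testbench); the class and field names are hypothetical, and the reserved bits [49-48] are simply left at zero.

```python
from dataclasses import dataclass

# (lsb, width) of each field of special command header 170, per the list above
_LAYOUT = {
    "x_offset": (0, 8),
    "y_offset": (8, 8),
    "tile_id": (16, 8),
    "width": (24, 8),       # bytes per line
    "height": (32, 5),      # number of lines
    "direction": (37, 1),   # 1 = read, 0 = write
    "frame_id": (38, 2),
    "ramadr": (40, 8),
    "bsize": (50, 6),
    "bsteer": (56, 4),
}
_CMD_LSB, _ERROR_BIT = 60, 63


@dataclass
class ScreenBlockHeader:
    x_offset: int
    y_offset: int
    tile_id: int
    width: int
    height: int
    direction: int
    frame_id: int
    ramadr: int
    bsize: int
    bsteer: int

    def pack(self) -> int:
        """Build the 64-bit header; reserved, CMD and ERROR bits stay zero."""
        word = 0
        for name, (lsb, width) in _LAYOUT.items():
            value = getattr(self, name)
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} does not fit in {width} bits")
            word |= value << lsb
        return word

    @classmethod
    def unpack(cls, word: int) -> "ScreenBlockHeader":
        """Split a 64-bit word into fields, rejecting non-screen-block headers."""
        if not 0 <= word < (1 << 64):
            raise ValueError("header is a 64-bit word")
        if ((word >> _CMD_LSB) & 0x7) != 0 or (word >> _ERROR_BIT) & 1:
            raise ValueError("not a screen block header (CMD must be 000, ERROR 0)")
        return cls(**{name: (word >> lsb) & ((1 << width) - 1)
                      for name, (lsb, width) in _LAYOUT.items()})
```

Checking CMD = "000" on unpack mirrors how the text distinguishes this header type from header 160.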
2.2.3 Transaction Types
FIG. 10 is a table which shows the transaction types supported by command
bus 24. The transaction type is defined by the CMD field in headers
160 and 170.
The dual system bus architecture of integrated circuit 10 allows for
concurrent transfers on data bus 22 and command bus 24. There are some
limitations and rules that should be adhered to, however. Concurrent
transfers on data bus 22 and command bus 24 to the same bus transactor
circuit are not supported. Memory controller 14 has the responsibility to
ensure this does not occur. One clock of bus free time is required between
data transfers into DPRAM 122 (DATA_LD or CMD_LD asserted) and data
transfers out of DPRAM 122 (assertion of DATA_GNT or CMD_GNT). BIU 120
guarantees part of this requirement by not asserting CMD_REQ[1:0] during
an ongoing data phase, assuring that CMD_GNT will not be issued. Memory
controller 14 assures that a data phase (DATA_GNT) is not started until
one clock after a CMD_LD has been issued.
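The two rules above (no concurrent data bus and command bus transfers to the same bus transactor circuit, and one free clock between transfers into and out of DPRAM 122) can be checked against a per-clock trace, for example in a simulation monitor. This is a sketch under stated assumptions: each trace entry gives the activity of one transactor on (data bus 22, command bus 24) in one clock as "in" (data being loaded after DATA_LD or CMD_LD), "out" (data being driven after DATA_GNT or CMD_GNT) or None; and the free-clock rule is applied to a change of direction in either order. The function name and trace format are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Activity = Optional[str]  # "in", "out", or None for one bus in one clock


def check_transactor_trace(
        trace: Sequence[Tuple[Activity, Activity]]) -> list[tuple[int, str]]:
    """Return (clock, problem) pairs for one bus transactor circuit."""
    problems = []
    last_dir: Activity = None   # direction of the most recent active clock
    gap = True                  # a free clock has occurred since that transfer
    for clk, (data_bus, cmd_bus) in enumerate(trace):
        for activity in (data_bus, cmd_bus):
            if activity not in (None, "in", "out"):
                raise ValueError(f"clock {clk}: unknown activity {activity!r}")
        if data_bus is not None and cmd_bus is not None:
            problems.append((clk, "concurrent data and command bus transfer"))
        active = data_bus or cmd_bus
        if active is None:
            gap = True
            continue
        if last_dir is not None and active != last_dir and not gap:
            problems.append((clk, "no free clock between DPRAM transfer in and out"))
        last_dir = active
        gap = False
    return problems
```

A trace such as `[("in", None), (None, None), ("out", None)]` passes, while `[(None, "in"), ("out", None)]` is flagged at clock 1.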
2.3 Bus Interface Unit (BIU)
BIU 120 controls transfers on system buses 22 and 24 by managing the input
and output message queues in DPRAM 122. BIU 120 is the transport mechanism
by which the subsystem communicates with memory controller 14 and processor
12. BIU 120 contains no subsystem-specific data. All DMA and PIO
functions, such as buffer allocation, address generation, and register
processing, are maintained by the corresponding subsystem.
BIU 120 reacts to messages sent by the subsystem or memory controller
14/processor 12 and manages flow control on buses 22 and 24.
2.3.1 Data Bus Timing
Data bus 22 is used exclusively for passing data between external memory
46, through shared memory port 20, and a bus master (either processor 12
or one of the bus transactor circuits 15A-15C). FIG. 11 is a waveform
diagram illustrating the timing for a four cycle data burst on data bu