WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Synchronous memory device    
United States Patent5995443   
Link to this pagehttp://www.wikipatents.com/5995443.html
Inventor(s)Farmwald; Michael (Berkeley, CA); Horowitz; Mark (Palo Alto, CA)
AbstractThe present invention includes a memory subsystem comprising at least two semiconductor devices, including at least one memory device, connected to a bus, where the bus includes a plurality of bus lines for carrying substantially all address, data and control information needed by said memory devices, where the control information includes device-select information and the bus has substantially fewer bus lines than the number of bits in a single address, and the bus carries device-select information without the need for separate device-select lines connected directly to individual devices. The present invention also includes a protocol for master and slave devices to communicate on the bus and for registers in each device to differentiate each device and allow bus requests to be directed to a single or to all devices. The present invention includes modifications to prior-art devices to allow them to implement the new features of this invention. In a preferred implementation, 8 bus data lines and an AddressValid bus line carry address, data and control information for memory addresses up to 40 bits wide.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Inventor     Farmwald; Michael (Berkeley, CA); Horowitz; Mark (Palo Alto, CA)
Owner/Assignee     Rambus Inc. (Mountain View, CA)
Patent assignment
All assignments
Publication Date     November 30, 1999
Application Number     09/262,114
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     March 4, 1999
US Classification    
Int'l Classification    
Examiner     Nguyen; Tan T.
Assistant Examiner    
Attorney/Law Firm     Steinberg; Neil A. Steinberg & Whitt, LLP
Address
Parent Case     This application is a continuation of application Ser. No. 09/196,199, filed on Nov. 20, 1998 (still pending), which is a continuation of application Ser. No. 08/798,520, filed on Feb. 10, 1997 (now U.S. Pat. No. 5,841,580), which is a division of application Ser. No. 08/448,657, filed May 24, 1995 (now U.S. Pat. No. 5,638,334), which is a division of application Ser. No. 08/222,646, filed on Mar. 31, 1994 (now U.S. Pat. No. 5,513,327); which is a continuation of application Ser. No. 07/954,945, filed on Sep. 30, 1992 (now U.S. Pat. No. 5,319,755), which is a continuation of application Ser. No. 07/510,898, filed on Apr. 18, 1990 (now abandoned).
Priority Data    
USPTO Field of Search    
Patent Tags     synchronous memory
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A synchronous memory device having a memory cell array divided into a plurality of subarrays, wherein each subarray includes a plurality of subarray sections, the memory device comprising:

clock receiver circuitry to receive an external clock signal from an external bus;

clock generation circuitry, coupled to the clock receiver circuitry, to generate a first internal clock signal having a clock edge which is synchronized with the external clock signal and to generate a second internal clock signal having a clock edge which is synchronized with the external clock signal;

a first subarray section having a first internal I/O line to access data from a first memory cell location and a second internal I/O line to access data from a second memory cell location, wherein the first and second memory cell locations are in the first subarray section;

a second subarray section having a first internal I/O line to access data from a third memory cell location and a second internal I/O line to access data from a fourth memory cell location, wherein the third and fourth memory cell locations are in the second subarray section;

output driver circuitry, including a first output driver and a second output driver, to output data onto the external bus in response to a read request; and

multiplexer circuitry, coupled to the output driver circuitry, wherein:

the multiplexer circuitry couples the first internal I/O line of the first subarray section to an input of the first output driver and couples the first internal I/O line of the second subarray section to an input of the second output driver in response to the clock edge of the first internal clock signal; and

the multiplexer circuitry couples the second internal I/O line of the first subarray section to an input of the first output driver and couples the second internal I/O line of the second subarray section to an input of the second output driver in response to the clock edge of the second internal clock signal.

2. The memory device of claim 1 wherein the memory device further includes data input receiver circuitry, coupled to the external bus, to input data from the bus in response to a write request wherein the data input receiver circuitry includes:

a first data input receiver, coupled to the external bus, to latch data from the external bus in response to the clock edge of the first internal clock signal; and

a second data input receiver, coupled to the external bus, to latch data from the bus in response to the clock edge of the second internal clock signal.

3. The memory device of claim 1 wherein the first internal clock signal is generated by a delay locked loop.

4. The memory device of claim 1 wherein the first internal clock signal and the second internal clock signal are substantially complementary.

5. The memory device of claim 4 wherein the first subarray section and the second subarray section are included in the same subarray.

6. The memory device of claim 1 further including an internal register for storing a value indicative of a number of clock cycles of the external clock signal to transpire before data is driven onto the external bus in response to a read request.

7. The memory device of claim 6 wherein the value represents a fraction or a whole number of clock cycles of the external clock.

8. The memory device of claim 1 further including a first plurality of sense amplifiers and a second plurality of sense amplifiers, wherein:

data accessed from a first memory cell location is latched in a sense amplifier of the first plurality of sense amplifiers and data accessed from the second memory cell location is latched in a sense amplifier of the second plurality of sense amplifiers; and

data accessed from a third memory cell location is latched in a sense amplifier of the first plurality of sense amplifiers and data accessed from the fourth memory cell location is latched in a sense amplifier of the second plurality of sense amplifiers.

9. The memory device of claim 8 wherein the first plurality of sense amplifiers is located proximal to a first side of the first subarray section and the second plurality of sense amplifiers is located proximal to a second side of the first subarray section.

10. The memory device of claim 9 wherein the first plurality of sense amplifiers is located proximal to a first side of the second subarray section and the second plurality of sense amplifiers is located proximal to a second side of the second subarray section.

11. The memory device of claim 1 wherein the first subarray section is automatically precharged after executing the read request.

12. The memory device of claim 1 further including a plurality of sense amplifiers located proximal to the first subarray section, wherein the plurality of sense amplifiers latch data from the first memory cell location and the second memory cell location in response to the read request.

13. The memory device of claim 12 wherein the data latched in the plurality of sense amplifiers is maintained in the sense amplifiers for a subsequent read request.

14. The memory device of claim 12 wherein the data latched in the plurality of sense amplifiers is maintained in the sense amplifiers and output by the output driver circuitry in response to the subsequent read request.

15. The memory device of claim 12 wherein the data latched in the plurality of sense amplifiers is maintained in the sense amplifiers and modified in response to a write request.

16. The memory device of claim 1 wherein the external clock signal has a fixed frequency.

17. The memory device of claim 1 wherein the clock edge of the first internal clock signal is a rising edge and the clock edge of the second internal clock signal is a rising edge.

18. The memory device of claim 1 wherein the output driver circuitry outputs a predetermined amount of data defined by block size information and wherein the block size information is a binary code.

19. The memory device of claim 18 wherein the block size information and the read request are included in a request packet.

20. The memory device of claim 19 wherein the block size information and the read request are included in the same request packet.

21. A synchronous memory device having a memory cell array divided into a plurality of subarrays, including a first subarray and a second subarray wherein each subarray includes a plurality of subarray sections, the memory device comprises:

clock receiver circuitry to receive an external clock signal from an external bus;

clock generation circuitry, coupled to the clock receiver circuitry, to generate a first internal clock signal having a clock edge which is synchronized with the external clock signal and to generate a second internal clock signal having a clock edge which is synchronized with the external clock signal;

a first subarray section of a first subarray, the first subarray section having a first internal I/O line to access data from a first memory cell location and a second internal I/O line to access data from a second memory cell location, wherein the first and second memory cell locations are in the first subarray section;

a second subarray section of a second subarray, the second subarray section having a first internal I/O line to access data from a third memory cell location and a second internal I/O line to access data from a fourth memory cell location, wherein the third and fourth memory cell locations are in the second subarray section;

output driver circuitry, including a first output driver and a second output driver, to output data onto the external bus in response to a read request; and

multiplexer circuitry, coupled to the output driver circuitry, wherein:

the multiplexer circuitry couples the first internal I/O line of the first subarray section to an input of the first output driver and couples the first internal I/O line of the second subarray section to an input of the second output driver in response to the clock edge of the first internal clock signal; and

the multiplexer circuitry couples the second internal I/O line of the first subarray section to an input of the first output driver and couples the second internal I/O line of the second subarray section to an input of the second output driver in response to the clock edge of the second internal clock signal.

22. The memory device of claim 21 wherein the first internal clock signal is generated by a delay locked loop.

23. The memory device of claim 21 wherein the first internal clock signal is generated by a first delay locked loop and the second internal clock signal is generated by a second delay locked loop.

24. The memory device of claim 21 further including a first plurality of sense amplifiers and a second plurality of sense amplifiers, wherein data accessed from a first memory cell location is latched in a sense amplifier of the first plurality of sense amplifiers and data accessed from the second memory cell location is latched in a sense amplifier of the second plurality of sense amplifiers.

25. The memory device of claim 24 wherein the first plurality of sense amplifiers is located proximal to a first side of the first subarray section and the second plurality of sense amplifiers is located proximal to a second side of the first subarray section.

26. The memory device of claim 24 further including a third plurality of sense amplifiers and a fourth plurality of sense amplifiers, wherein data accessed from a third memory cell location is latched in a sense amplifier of the third plurality of sense amplifiers and data accessed from the fourth memory cell location is latched in a sense amplifier of the fourth plurality of sense amplifiers.

27. The memory device of claim 26 wherein the third plurality of sense amplifiers is located proximal to a first side of the second subarray section and the fourth plurality of sense amplifiers is located proximal to a second side of the second subarray section.

28. The memory device of claim 21 further including a plurality of sense amplifiers located proximal to the first subarray section, wherein the plurality of sense amplifiers latch data from the first memory cell location and the second memory cell location in response to the read request.

29. A memory system having a synchronous memory device coupled to a bus, the memory device having a memory cell array divided into a plurality of subarrays, wherein each subarray includes a plurality of subarray sections, the memory system comprises:

clock receiver circuitry to receive a bus clock from the bus;

clock generation circuitry to generate a first clock edge which is synchronized with the bus clock and to generate a second clock edge which is synchronized with the bus clock;

a first subarray section having a first data line to access data from a first memory cell location and a second data line to access data from a second memory cell location, wherein the first and second memory cell locations are in the first subarray section;

a second subarray section having a first data line to access data from a third memory cell location and a second data line to access data from a fourth memory cell location, wherein the third and fourth memory cell locations are in the second subarray section;

output drivers, including a first output driver and a second output driver, to output a predetermined amount of data onto the bus in response to a read request; and

multiplexer circuitry, coupled to the output drivers, wherein:

the multiplexer circuitry couples the first data line of the first subarray section to an input of the first output driver and couples the first data line of the second subarray section to an input of the second output driver in response to the first clock edge; and

the multiplexer circuitry couples the second data line of the first subarray section to an input of the first output driver and couples the second data line of the second subarray section to an input of the second output driver in response to the second clock edge.

30. The memory system of claim 29 wherein clock generation circuitry includes a delay locked loop.

31. The memory system of claim 29 further including a register for storing bus clock information indicative of a number of clock cycles of the bus clock to transpire before data is driven onto the bus in response to a read request wherein the bus clock information represents a fraction or a whole number of clock cycles of the bus clock.

32. The memory system of claim 29 wherein the bus is external to the memory device.

33. The memory system of claim 29 wherein the bus further includes a plurality of conductors terminated by an impedance to a power source.

34. The memory system of claim 29 wherein the bus further includes a plurality of low voltage swing conductors.

35. The memory system of claim 34 wherein the low voltage swing is more than 0.25 volts and less than V.sub.DD.

36. The memory system of claim 29 wherein the first subarray section and the second subarray section are included in the same subarray.

37. The memory system of claim 29 wherein the first subarray section is included in a first subarray and the second subarray section is included in a second subarray.
 Description Submit all comments and votes
 


FIELD OF THE INVENTION

An integrated circuit bus interface for computer and video systems is described which allows high speed transfer of blocks of data, particularly to and from memory devices, with reduced power consumption and increased system reliability. A new method of physically implementing the bus architecture is also described.

BACKGROUND OF THE INVENTION

Semiconductor computer memories have traditionally been designed and structured to use one memory device for each bit, or avll group of bits, of any individual computer word, where the word size is governed by the choice of computer. Typical word sizes range from 4 to 64 bits. Each memory device typically is connected in parallel to a series of address lines and connected to one of a series of data lines. When the computer seeks to read from or write to a specific memory location, an address is put on the address lines and some or all of the memory devices are activated using a separate device select line for each needed device. One or more devices may be connected to each data line but typically only a small number of data lines are connected to a single memory device. Thus data line 0 is connected to device(s) 0, data line 1 is connected to device(s) 1, and so on. Data is thus accessed or provided in parallel for each memory read or write operation. For the system to operate properly, every single memory bit in every memory device must operate dependably and correctly.

To understand the concept of the present invention, it is helpful to review the architecture of conventional memory devices. Internal to nearly all types of memory devices (including the most widely used Dynamic Random Access Memory (DRAM), Static RAM (SRAM) and Read Only Memory (ROM) devices), a large number of bits are accessed in parallel each time the system carries out a memory access cycle. However, only a small percentage of accessed bits which are available internally each time the memory device is cycled ever make it across the device boundary to the external world.

Referring to FIG. 1, all modern DRAM, SRAM and ROM designs have internal architectures with row (word) lines 5 and column (bit) lines 6 to allow the memory cells to tile a two dimensional area 1. One bit of data is stored at the intersection of each word and bit line. When a particular word line is enabled, all of the corresponding data bits are transferred onto the bit lines. Some prior art DRAMs take advantage of this organization to reduce the number of pins needed to transmit the address. The address of a given memory cell is split into two addresses, row and column, each of which can be multiplexed over a bus only half as wide as the memory cell address of the prior art would have required.

COMPARISON WITH PRIOR ART

Prior art memory systems have attempted to solve the problem of high speed access to memory with limited success. U.S. Pat. No. 3,821,715 (Hoff et. al.), was issued to Intel Corporation for the earliest 4-bit microprocessor. That patent describes a bus connecting a single central processing unit (CPU) with multiple RAMs and ROMs. That bus multiplexes addresses and data over a 4-bit wide bus and uses point-to-point control signals to select particular RAMs or ROMs. The access time is fixed and only a single processing element is permitted. There is no block-mode type of operation, and most important, not all of the interface signals between the devices are bused (the ROM and RAM control lines and the RAM select lines are point-to-point).

In U.S. Pat. No. 4,315,308 (Jackson), a bus connecting a single CPU-to a bus interface unit is described. The invention uses multiplexed address, data, and control information over a single 16-bit wide bus. Block-mode operations are defined, with the length of the block sent as part of the control sequence. In addition, variable access-time operations using a "stretch" cycle signal are provided. There are no multiple processing elements and no capability for multiple outstanding requests, and again, not all of the interface signals are bused.

In U.S. Pat. No. 4,449,207 (Kung, et. al.), a DRAM is described which multiplexes address and data on an internal bus. The external interface to this DRAM is conventional, with separate control, address and data connections.

In U.S. Pat. Nos. 4,764,846 and 4,706,166 (Go), a 3-D package arrangement of stacked die with connections along a single edge is described. Such packages are difficult to use because of the point-to-point wiring required to interconnect conventional memory devices with processing elements. Both patents describe complex schemes for solving these problems. No attempt is made to solve the problem by changing the interface.

In U.S. Pat. No. 3,969,706 (Proebsting, et. al.), the current state-of-the-art DRAM interface is described. The address in two-way multiplexed, and there are separate pins for data and control (RAS, CAS, WE, CS). The number of pins grows with the size of the DRAM, and many of the connections must be made point-to-point in a memory system using such DRAMs.

There are many backplane buses described in the prior art, but not in the combination described or having the features of this invention. Many backplane buses multiplex addresses and data on a single bus (e.g., the NU bus). ELXSI and others have implemented split-transaction buses (U.S. Pat. Nos. 4,595,923 and 4,491,625 (Roberts)). ELXSI has also implemented a relatively low-voltage-swing current-mode ECL driver (approximately 1 V swing). Address-space registers are implemented on most backplane buses, as is some form of block mode operation.

Nearly all modern backplane buses implement some type of arbitration scheme, but the arbitration scheme used in this invention differs from each of these. U.S. Pat. Nos. 4,837,682 (Culler), 4,818,985 (Ikeda), 4,779,089 (Theus) and 4,745,548 (Blahut) describe prior art schemes. All involve either log N extra signals, (Theus, Blahut), where N is the number of potential bus requestors, or additional delay to get control of the bus (Ikeda, Culler). None of the buses described in patents or other literature use only bused connections. All contain some point-to-point connections on the backplane. None of the other aspects of this invention such as power reduction by fetching each data block from a single device or compact and low-cost 3-D packaging even apply to backplane buses.

The clocking scheme used in this invention has not been used before and in fact would be difficult to implement in backplane buses due to the signal degradation caused by connector Stubs. U.S. Pat. No. 4,247,817 (Heller) describes a clocking scheme using two clock lines, but relies on ramp-shaped clock signals in contrast to the normal rise-time signals used in the present invention.

In U.S. Pat. No. 4,646,270 (Voss), a video RAN is described which implements a parallel-load, serial-out shift register on the output of a DRAM. This generally allows greatly improved bandwidth (and has been extended to 2, 4 and greater width shift-out paths.) The rest of the interfaces to the DRAM (RAS, CAS, multiplexed address, etc.) remain the same as for conventional DRAMS.

One object of the present invention is to use a new bus interface built into semiconductor devices to support high-speed access to large blocks of data from a single memory device by an external user of the data, such as a microprocessor, in an efficient and cost-effective manner.

Another object of this invention is to provide clocking scheme to permit high speed clock signals to be sent along the bus with minimal clock skew between devices.

Another object of this invention is to allow mapping out defective memory devices or portions of memory devices.

Another object of this invention is to provide a method for distinguishing otherwise identical devices by assigning a unique identifier to each device.

Yet another object of this invention is to provide a method for transferring address, data and control information over a relatively narrow bus and to provide a method of bus arbitration when multiple devices seek to use the bus simultaneously.

Another object of this invention is to provide a method of distributing a high-speed memory cache within the DRAM chips of a memory system which is much more effective than previous cache methods.

Another object of this invention is to provide devices, especially DRAMs, suitable for use with the bus architecture of the invention.

SUMMARY OF INVENTION

The present invention includes a memory subsystem comprising at least two semiconductor devices, including at least one memory device, connected in parallel to a bus, where the bus includes a plurality of bus lines for carrying substantially all address, data and control information needed by said memory devices, where the control information includes device-select information and the bus has substantially fewer bus lines than the number of bits in a single address, and the bus carries device-select information without the need for separate device-select lines connected directly to individual devices.

Referring to FIG. 2, a standard DRAM 13, 14, ROM (or SRAM) 12, microprocessor CPU 11, I/O device, disk controller or other special purpose device such an a high speed switch is modified to use a wholly bus-based interface rather than the prior art combination of point-to-point and bus-based wiring used with conventional versions of these devices. The new bus includes clock signals, power and multiplexed address, data and control signals. In a preferred implementation, 8 bus data lines and an AddressValid bus line carry address, data and control information for memory addresses up to 40 bits wide. Persons skilled in the art will recognize that 16 bus data lines or other numbers of bus data lines can be used to implement the teaching of this invention. The new bus is used to connect elements such as memory, peripheral, switch and processing units.

In the system of this invention, DRAMs and other devices receive address and control information over the bus and transmit or receive requested data over the same bus. Each memory device contains only a single bus interface with no other signal pins. Other devices that may be included in the system can connect to the bus and other non-bus lines, such as input/output lines. The bus supports large data block transfers and split transactions to allow a user to achieve high bus utilization. This ability to rapidly read or write a large block of data to one single device at a time in an important advantage of this invention.

The DRAMs that connect to this bus differ from conventional DRAMs in a number of ways. Registers are provided which may store control information, device identification, device-type and other information appropriate for the chip such as the address range for each independent portion of the device. New bus interface circuits must be added and the internals of prior art DRAM devices need to be modified so they can provide and accept data to and from the bus at the peak data rate of the bus. This requires changes to the column access circuitry in the DRAM, with only a minimal increase in die size. A circuit is provided to generate a low skew internal device clock for devices on the bus, and other circuits provide for demultiplexing input and multiplexing output signals.

High bus bandwidth is achieved by running the bus at a very high clock rate (hundreds of MHz). This high clock rate is made possible by the constrained environment of the bus. The bus lines are controlled-impedance, doubly-terminated lines. For a data rate of 500 MHz, the maximum bus propagation time is less than 1 ns (the physical bus length is about 10 cm). In addition, because of the packaging used, the pitch of the pins can be very close to the pitch of the pads. The loading on the bus resulting from the individual devices is very small. In a preferred implementation, this generally allows stub capacitances of 1-2 pF and inductances of 0.5-2 nH. Each device 15, 16, 17, shown in FIG. 3, only has pins on one side and these pins connect directly to the bus 18. A transceiver device 19 can be included to interface multiple units to a higher order bus through pins 20.

A primary result of the architecture of this invention is to increase the bandwidth of DRAM access. The invention also reduces manufacturing and production costs, power consumption, and increases packing density and system reliability.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram which illustrates the basic 2-D organization of memory devices.

FIG. 2 is a schematic block diagram which illustrates the parallel connection of all bus lines and the serial Reset line to each device in the system.

FIG. 3 is a perspective view of a system of the invention which illustrates the 3-D packaging of semiconductor devices on the primary bus.

FIG. 4 shows the format of a request packet.

FIG. 5 shows the format of a retry response from a slave.

FIG. 6 shows the bus cycles after a request packet collision occurs on the bus and how arbitration is handled.

FIGS. 7a and 7b show the timing whereby signals from two devices can overlap temporarily and drive the bus at the same time.

FIGS. 8a and 8b show the connection and timing between bus clocks and devices on the bus.

FIG. 9 is a perspective view showing how transceivers can be used to connect a number of bus units to a transceiver bus.

FIG. 10 is a block and schematic diagram of input/output circuitry used to connect devices to the bus.

FIG. 11 is a schematic diagram of a clocked sense-amplifier used as a bus input receiver.

FIG. 12 is a block diagram showing how the internal device clock is generated from two bus clock signals using a set of adjustable delay lines.

FIG. 13 is a timing diagram showing the relationship of signals in the block diagram of FIG. 12.

FIG. 14 is timing diagram of a preferred means of implementing the reset procedure of this invention.

FIG. 15 is a diagram illustrating the general organization of a 4 Mbit DRAM divided into 8 subarrays.

FIG. 16 is a block diagram representation of a set of internal registers within each device illustrated in FIG. 2.

FIG. 17 is the block diagram of the DRAM of FIG. 15 in conjunction with the output circuitry of the input/output circuitry of FIG. 10.

DETAILED DESCRIPTION

The present invention is designed to provide a high speed, multiplexed bus for communication between processing devices and memory devices and to provide devices adapted for use in the bus system. The invention can also be used to connect processing devices and other devices, such as I/O interfaces or disk controllers, with or without memory devices on the bus. The bus consists of a relatively small number of lines connected in parallel to each device on the bus. The bus carries substantially all address, data and control information needed by devices for communication with other devices on the bus. In many systems using the present invention, the bus carries almost every signal between every device in the entire system. There is no need for separate device-select lines since device-select information for each device on the bus is carried over the bus. There is no need for separate address and data lines because address and data information can be sent over the same lines. Using the organization described herein, very large addresses (40 bits in the preferred implementation) and large data blocks (1024 bytes) can be sent over a small number of bus lines (8 plus one control line in the preferred implementation).

Virtually all of the signals needed by a computer system can be sent over the bus. Persons skilled in the art recognize that certain devices, such as CPUs, may be connected to other signal lines and possibly to independent buses, for example a bus to an independent cache memory, in addition to the bus of this invention. Certain devices, for example cross-point switches, could be connected to multiple, independent buses of this invention. In the preferred implementation, memory devices are provided that have no connections other than the bus connections described herein and CPUs are provided that use the bus of this invention as the principal, if not exclusive, connection to memory and to other devices on the bus.

All modern DRAM, SRAN and ROM designs have internal architectures with row (word) and column (bit) lines to efficiently tile a 2-D area. Referring to FIG. 1, one bit of data is stored at the intersection of each word line 5 and bit line 6. When a particular word line is enabled, all of the corresponding data bits are transferred onto the bit lines. This data, about 4000 bits at a time in a 4 MBit DRAM, is then loaded into column sense amplifiers 3 and held for use by the I/O circuits.

In the invention presented here, the data from the sense amplifiers is enabled 32 bits at a time onto an internal device bus running at approximately 125 MHz. This internal device bus moves the data to the periphery of the devices where the data is multiplexed into an 8-bit wide external bus interface, running at approximately 500 MHz.

The bus architecture of this invention connects master or bus controller devices, such as CPUS, Direct Memory Access devices (DMAS) or Floating Point Units (FPUs), and slave devices, such as DRAM, SRAM or ROM memory devices. A slave device responds to control signals; a master sends control signals. Persons skilled in the art realize that some devices may behave as both master and slave at various times, depending on the mode of operation and the state of the system. For example, a memory device will typically have only slave functions, while a DMA controller, disk controller or CPU may include both slave and master functions. Many other semiconductor devices, including I/O devices, disk controllers, or other special purpose devices such as high speed switches can be modified for use with the bus of this invention.

With reference to FIG. 16, each semiconductor device contains a set of internal registers 170, preferably including a device identification (device ID) register, 171 a device-type descriptor register, 174 control registers 175 and other registers containing other information relevant to that type of device. In a preferred implementation, semiconductor devices connected to the bus contain registers 172 which specify the memory addresses contained within that device and access-time registers 173 which store a set of one or more delay times at which the device can or should be available to send or receive data.

Most of these registers can be modified and preferably are set as part of an initialization sequence that occurs when the system is powered up or reset. During the initialization sequence each device on the bus is assigned a unique device ID number, which is stored in the device ID register 171. A bus master can then use these device ID numbers to access and set appropriate registers in other devices, including access-time registers, 173 control registers, 175 and memory registers, 172 to configure the system. Each slave may have one or several access-time registers 173 (four in a preferred embodiment). In a preferred embodiment, one access-time register in each slave is permanently or semi-permanently programmed with a fixed value to facilitate certain control functions. A preferred implementation of an initialization sequence in described below in more detail.

All information sent between master devices and slave devices is sent over the external bus, which, for example, may be 8 bits wide. This is accomplished by defining a protocol whereby a master device, such as a microprocessor, seizes exclusive control of the external bus (i.e., becomes the bus master) and initiates a bus transaction by sending a request packet (a sequence of bytes comprising address and control information) to one or more slave devices on the bus. An address can consist of 16 to 40 or more bits according to the teachings of this invention. Each slave on the bus must decode the request packet to see if that slave needs to respond to the packet. The slave that the packet is directed to must then begin any internal processes needed to carry out the requested bus transaction at the requested time. The requesting master may also need to transact certain internal processes before the bus transaction begins. After a specified access time the slave(s) respond by returning one or more bytes (8 bits) of data or by storing information made available from the bus. More than one access time can be provided to allow different types of responses to occur at different times.

A request packet and the corresponding bus access are separated by a selected number of bus cycles, allowing the bus to be used in the intervening bus cycles by the same or other masters for additional requests or brief bus accesses. Thus multiple, independent accesses are permitted, allowing maximum utilization of the bus for transfer of short blocks of data. Transfers of long blocks of data use the bus efficiently even without overlap because the overhead due to bus address, control and access times is small compared to the total time to request and transfer the block.

Device Address Mapping

Another unique aspect of this invention is that each memory device is a complete, independent memory subsystem with all the functionality of a prior art memory board in a conventional backplane-bus computer system. Individual memory devices may contain a single memory section or may be subdivided into more than one discrete memory section. Memory devices preferably include memory address registers for each discrete memory section. A failed memory device (or even a subsection of a device) can be "mapped out" with only the loss of a small fraction of the memory, maintaining essentially full system capability. Mapping out bad devices can be accomplished in two ways, both compatible with this invention.

The preferred method uses address registers in each memory device (or independent discrete portion thereof) to store information which defines the range of bus addresses to which this, memory device will respond. This is similar to prior art schemes used in memory boards in conventional backplane bus systems. The address registers can include a single pointer, usually pointing to a block of known size, a pointer and a fixed or variable block size value or two pointers, one pointing to the beginning and one to the end (or to the "top" and "bottom") of each memory block. By appropriate settings of the address registers, a series of functional memory devices or discrete memory sections can be made to respond to a contiguous range of addresses, giving the system access to a contiguous block of good memory, limited primarily by the number of good devices connected to the bus. A block of memory in a first memory device or memorysection can be assigned a certain range of addresses, then a block of memory in a next memory device or memory section can be assigned addresses starting with an address one higher (or lower, depending on the memory structure) than the last address of the previous block.

Preferred devices for use in this invention include device-type register information specifying the type of chip, including how much memory is available in what configuration on that device. A master can perform an appropriate memory test, such as reading and writing each memory cell in one or more selected orders, to test proper functioning of each accessible discrete portion of memory (based in part on information like device ID number and device-type) and write address values (up to 40 bits in the preferred embodiment, 10.sup.12 bytes), preferably contiguous, into device address-space registers. Non-functional or impaired memory sections can be assigned a special address value which the system can interpret to avoid using that memory.

The second approach puts the burden of avoiding the bad devices on the system master or masters. CPUs and DNA controllers typically have some sort of translation look-aside buffers (TLBs) which map virtual to physical (bus) addresses. With relatively simple software, the TLBS can be programmed to use only working memory (data structures describing functional memories are easily generated). For masters which don't contain TLBs (for example, a video display generator), a small, simple RAM can be used to map a contiguous range of addresses onto the addresses of the functional memory devices.

Either scheme works and permits a system to have a significant percentage of non-functional devices and still continue to operate with the memory which remains. This means that systems built with this invention will have much improved reliability over existing systems, including the ability to build systems with almost no field failures.

Bus

The preferred bus architecture of this invention comprises 11 signals: BusData[0:7]; AddrValid; Clk1 and Clk2; plus an input reference level and power and ground lines connected in parallel to each device. Signals are driven onto the bus during conventional bus cycles. The notation "signal[i:j]" refers to a specific range of signals or lines, for example, BusData[0:7] means BusData0, BusData1, . . ., BusData7. The bus lines for BusData[0:7] signals form a byte-wide, multiplexed data/address/control bus. AddrValid is used to indicate when the bus is holding a valid address request, and instructs a slave to decode the bus data as an address and, if the address is included on that slave, to handle the pending request. The two clocks together provide a synchronized, high speed clock for all the devices on the bus. In addition to the bused signals, there is one other line (ResetIn, ResetOut) connecting each device in series for use during initialization to assign every device in the system a unique device ID number (described below in detail).

To facilitate the extremely high data rate of this external bus relative to the gate delays of the internal logic, the bus cycles are grouped into pairs of even/odd cycles. Note that all devices connected to a bus should preferably use the same even/odd labeling of bus cycles and preferably should begin operations on even cycles. This is enforced by the clocking scheme.

Protocol and Bus Operation

The bus uses a relatively simple, synchronous, split-transaction, block-oriented protocol for bus transactions. One of the goals of the system in to keep the intelligence concentrated in the masters, thus keeping the slaves an simple as possible (since there are typically tany more slaves than masters). To reduce the complexity of the slaves, a slave should preferably respond to a request in a specified time, sufficient to allow the slave to begin or possibly complete a device-internal phase including any internal actions that must precede the subsequent bus access phase. The time for this bus access phase is known to all devices on the bus--each master being responsible for making sure that the bus will be free when the bus access begins. Thus the slaves never worry about arbitrating for the bus. This approach eliminates arbitration in single master systems, and also makes the slave-bus interface simpler.

In a preferred implementation of the invention, to initiate a bus transfer over the bus, a master sends out a request packet, a contiguous series of bytes containing address and control information. It is preferable to use a request packet containing an even number of bytes and also preferable to start each packet on an even bus cycle.

The device-select function is handled using the bus data lines. AddrValid is driven, which instructs all slaves to decode the request packet address, determine whether they contain the requested address, and if they do, provide the data back to the master (in the case of a read request) or accept data from the master (in the case of a write request) in a data block transfer. A master can also select a specific device by transmitting a device ID number in a request packet. In a preferred implementation, a special device ID number is chosen to indicate that the packet should be interpreted by all devices on the bus. This allows a master to broadcast a message, for example to set a selected control register of all devices with the same value.

The data block transfer occurs later at a time specified in the request packet control information, preferably beginning on an even cycle. A device begins a data block transfer almost immediately with a device-internal phase as the device initiates certain functions, such as setting up memory addressing, before the bus access phase begins. The time after which a data block is driven onto the bus lines is selected from values stored in slave access-time registers 173. The timing of data for reads and writes is preferably the same; the only difference is which device drives the bus. For reads, the slave drives the bus and the master latches the values from the bus. For writes the master drives the bus and the selected slave latches the values from the bus.

In a preferred implementation of this invention shown in FIG. 4, a request packet 22 contains 6 bytes of data--4.5 address bytes and 1.5 control bytes. Each request packet uses all nine bits of the multiplexed data/address lines (AddrValid 23+BusData[0:7] 24) for all six bytes of the request packet. Setting 23 AddrValid=1 in an otherwise unused cycle indicates the start of an request packet (control information). In a valid request packet, AddrValid 27 must be 0 in the last byte. Asserting this signal in the last byte invalidates the request packet. This is used for the collision detection and arbitration logic (described below). Bytes 25-26 contain the first 35 address bits, Address[0:35]. The last byte contains AddrValid 27 (the invalidation switch) and 28, the remaining address bits, Address[36:39], and BlockSize[0:3] (control information).

The first byte contains two 4 bit fields containing control information, AccessType[0:3], an op code (operation code) which, for example, specifies the type of access, and Master[0:3], a position reserved for the master sending the packet to include its master ID number. Only master numbers 1 through 15 are allowed--master number 0 is reserved for special system commands. Any packet with Master[0:3]=0 is an invalid or special packet and is treated accordingly.

The AccessType field specifies whether the requested operation is a read or write and the type of access, for example, whether it is to the control registers or other parts of the device, such as memory. In a preferred implementation, AccessType[0] is a Read/Write switch: if it is a 1, then the operation calls for a read from the slave (the slave to read the requested memory block and drive the memory contents onto the bus); if it is a 0, the operation calls for a write into the slave (the slave to read data from the bus and write it to memory). AccessType[1:3] provides up to 8 different access types for a slave. AccessType[1:2] preferably indicates the timing of the response, which is stored in an access-time register, AccessRegN. The choice of access-time register can be selected directly by having a certain op code select that register, or indirectly by having a slave respond to selected op codes with pre-selected access times (see table below). The remaining bit, AccessType[3] may be used to send additional information about the request to the slaves.

One special type of access is control register access, which involves addressing a selected register in a selected slave. In the preferred implementation of this invention, AccessType[1:3] equal to zero indicates a control register request and the address field of the packet indicates the desired control register. For example, the most significant two bytes can be the device ID number (specifying which slave is being addressed) and the least significant three bytes can specify a register address and may also represent or include data to be loaded into that control register. Control register accesses are used to initialize the access-time registers, so it in preferable to use a fixed response time which can be preprogrammed or even hard wired, for example the value in AccessReg0, preferably 8 cycles. Control register access can also be used to initialize or modify other registers, including address registers.

The method of this invention provides for access mode control specifically for the DRAMs. One such access mode determines whether the access is page mode or normal RAS access. In normal mode (in conventional DRAMS and in this invention), the DRAM column sense amps or latches have been precharged to a value intermediate between logical 0 and 1. This precharging allows access to a row in the RAM to begin as soon as the access request for either inputs (writes) or outputs (reads) in received and allows the column sense amps to sense data quickly. In page mode (both conventional and in this invention), the DRAM holds the data in the column sense amps or latches from the previous read or write operation. If a subsequent request to access data is directed to the same row, the DRAM does not need to wait for the data to be sensed (it has been sensed already) and access time for this data is much shorter than the normal access time. Page mode generally allows much faster access to data but to a smaller block of data (equal to the number of sense amps). However, if the requested data is not in the selected row, the access time is longer than the normal access time, since the request must wait for the RAM to precharge before the normal mode access can start. Two access-time registers in each DRAM preferably contain the access times to be used for normal and for page-mode accesses, respectively.

The access mode also determines whether the DRAM should precharge the sense amplifiers or should save the contents of the sense amps for a subsequent page mode access. Typical settings are "precharge after normal access" and "save after page mode access" but "precharge after page mode access" or "save after normal access" are allowed, selectable modes of operation. The DRAM can also be set to precharge the sense amps if they are not accessed for a selected period of time.

In page mode, the data stored in the DRAM sense amplifiers may be accessed within much less time than it takes to read out data in normal mode (.sup..about. 10-20 nS vs. 40-100 nS). This data may be kept available for long periods. However, if these sense amps (and hence bit lines) are not precharged after an access, a subsequent access to a different memory word (row) will suffer a precharge time penalty of about 40-100 nS because the sense amps must precharge before latching in a new value.

The contents of the sense amps thus may be held and used as a cache, allowing faster, repetitive access to small blocks of data. DRAM-based page-mode caches have been attempted in the prior art using conventional DRAM organizations but they are not very effective because several chips are required per computer word. Such a conventional page-mode cache contains many bits (for example, 32 chips.times.4 Kbits) but has very few independent storage entries. In other words, at any given point in time the sense amps hold only a few different blocks or memory "locales" (a single block of 4K words, in the example above). Simulations have shown that upwards of 100 blocks are required to achieve high hit rates (>90% of requests find the requested data already in cache memory) regardless of the size of each block. See, for example, Anant Agarwal, et. al., "An Analytic Cache Model," ACM Transactions on Computer Systems, Vol. 7(2), pp. 184-215 (May 1989).

The organization of memory in the present invention allows each DRAM to hold one or more (4 for 4 MBit DRAMS) separately-addressed and independent blocks of data. A personal computer or workstation with 100 such DRAMs (i.e. 400 blocks or locales) can achieve extremely high, very repeatable hit rates (98-99% on average) as compared to the lower (50-80%), widely varying hit rates using DRAMS organized in the conventional fashion. Further, because of the time penalty associated with the deferred precharge on a miss of the page-mode cache, the conventional DRAM-based page-mode cache generally has been found to work less well than no cache at all.

For DRAM slave access, the access types are preferably used in the following way:

______________________________________ AccessType[1:3] Use AccessTime ______________________________________ 0 Control Register Fixed, 8[AccessReg0] Access 1 Unused Fixed, 8[AccessReg0] 2-3 Unused AccessReg1 4-5 Page Mode DRAM AccessReg2 access 6-7 Normal DRAM access AccessReg3 ______________________________________

Persons skilled in the art will recognize that a series of available bits could be designated as switches for controlling these access modes. For example:

AccessType[2]=page-mode/normal switch

AccessType[3]=precharge/save-data switch

BlockSize[0:3] specifies the size of the data block transfer. If BlockSize[0] is 0, the remaining bits are the binary representation of the block size (0-7). If BlockSize[0] is 1, then the remaining bits give the block size as a binary power of 2, from 8 to 1024. A zero-length block can be interpreted as a special command, for example, to refresh a DRAM without returning any data, or to change the DRAM from page mode to normal access mode or vice-versa.

______________________________________ BlockSize[0:2] Number of Bytes in Block ______________________________________ 0-7 0-7 respectively 8 8 9 16 10 32 11 64 12 126 13 256 14 512 15 1024 ______________________________________

Persons skilled in the art will recognize that other block size encoding schemes or values can be used.

In most cases, a slave will respond at the selected access time by reading or writing data from or to the bus over bus lines BusData[0:7] and AddrValid will be at logical 0. In a preferred embodiment, substantially each memory access will involve only a single memory device, that is, a single block will be read from or written to a single memory device.

Retry Format

In some cases, a slave may not be able to respond correctly to a request, e.g., for a read or write. In such a situation, the slave should return an error message, sometimes called a N(o)ACK(nowledge) or retry message. The retry message can include information about the condition requiring a retry, but this increases system requirements for circuitry in both slave and masters. A simple message indicating only that an error has occurred allows for a less complex slave, and the master can take whatever action is needed to understand and correct the cause of the error.

For example, under certain conditions a slave might not be able to supply the requested data. During a page-mode access, the DRAM selected must be in page mode and the requested address must match the address of the data held in the sense amps or latches. Each DRAM can check for this match during a page-mode access. If no match