WikiPatents - Community Patent Review
Apparatus and method for optimizing access to memory    
United States Patent 6735679
Link to this page: http://www.wikipatents.com/6735679.html
Inventor(s): Herbst; Joseph (Milpitas, CA); Flippin; Allan (Brentwood, CA)
Abstract: A method and apparatus for optimizing access to memory, wherein the method includes the steps of receiving a first request for access to a memory, receiving at least two additional requests for access to the memory, and determining a first clock overhead associated with the first request for access to the memory. The method further includes the steps of determining an additional clock overhead associated with each of the at least two additional requests for access to the memory in conjunction with the first request, determining a combination of requests that can be processed together using an optimized overhead, and processing the combination of requests as a single request with the optimal overhead.



Owner/Assignee     Broadcom Corporation (Irvine, CA)
Publication Date     May 11, 2004
Application Number     09/599,525
Filing Date     June 23, 2000
US Classification     711/167 711/168 711/169
Int'l Classification     G06F 013/00
Examiner     Elmore; Reba I.
Attorney/Law Firm     Squire, Sanders & Dempsey L.L.P.
Parent Case     REFERENCE TO RELATED APPLICATIONS This application is a continuation in part of Ser. No. 09/343,409 filed Jun. 30, 1999, and claims benefit of Serial No. 60/095,972 filed Aug. 10, 1998, and claims benefit of Serial No. 60/092,220 filed Jul. 8, 1998, and a Provisional Patent Application Serial No. 60/144,097, filed on Jul. 16, 1999, U.S. Provisional Patent Application Serial No. 60/144,098, filed on Jul. 16, 1999, U.S. Provisional Patent Application Serial No. 60/144,283, filed on Jul. 16, 1999, U.S. Provisional Patent Application Serial No. 60/144,286, filed on Jul. 16, 1999, U.S. Provisional Patent Application Serial No. 60/144,284, filed on Jul. 16, 1999, and U.S. Provisional Patent Application Serial No. 60/144,094, filed on Jul. 16, 1999. The subject matter of these earlier filed applications is hereby incorporated by reference.
USPTO Field of Search     711/167 711/168 711/169

References

U.S. References
Patent No.   Inventor   Class     Date
6427196      Adiletta   711/158   Jul. 2002
6335932      Kadambi    370/391   Jan. 2002
6253297      Chauvel    711/167   Jun. 2001
5920898      Bolyn      711/167   Jul. 1999
5784582      Hughes     710/117   Jul. 1998
Claims

What is claimed is:

1. A method for optimizing access to memory, said method comprising the steps of:

receiving a first request for access to a memory;

receiving at least two additional requests for access to the memory;

determining a first clock overhead associated with the first request for access to the memory;

determining an additional clock overhead associated with each of the at least two additional requests for access to the memory in conjunction with the first request;

determining a combination of requests that can be processed together using an optimized overhead; and

processing the combination of requests as a single request with the optimized overhead.

2. A method for optimizing access to memory as recited in claim 1, wherein said optimized overhead further comprises a combination of said first clock overhead and said additional clock overhead.

3. A method for optimizing access to SDRAM in a network switch, said method comprising the steps of:

receiving a plurality of requests for access to an SDRAM; and

combining at least two of the plurality of requests for processing as a single request utilizing an optimized clock overhead, in accordance with a predetermined algorithm.

4. A method for optimizing access to SDRAM in a network switch as recited in claim 3, wherein the combining step further comprises the steps of:

determining a necessary clock overhead for a first request for access to SDRAM;

determining a necessary clock overhead for the remaining plurality of requests for access to SDRAM;

determining an optimal request from the remaining plurality of requests, wherein the optimal request is calculated to generate the optimized clock overhead when combined with the first request for access to SDRAM; and

processing the first request for access to the SDRAM simultaneously with the optimal request.

5. A method for optimizing access to SDRAM as recited in claim 4, wherein the step of determining an optimal request further comprises the steps of:

determining an overhead associated with individually processing each of the remaining plurality of requests in combination with the first request; and

determining which combination of requests uses the least clock overhead.

6. A method for optimizing SDRAM in a network switch, said method comprising the steps of:

receiving a first request for access to an SDRAM;

receiving a second, third, and fourth request for access to the SDRAM;

determining the clock overhead associated with processing the first request in conjunction with the second request;

determining a clock overhead associated with processing the first request in conjunction with the third request;

determining a clock overhead associated with processing the first request in conjunction with the fourth request;

determining an optimal request for access to SDRAM, said optimal request being calculated to yield a minimal clock overhead; and

processing the optimal request.

7. An apparatus for optimizing access to memory in a network switch, said apparatus comprising:

means for receiving a first request for access to a memory;

means for receiving at least two additional requests for access to the memory;

means for determining a first clock overhead associated with the first request for access to the memory;

means for determining an additional clock overhead associated with each of the at least two additional requests for access to the memory in conjunction with the first request;

means for determining a combination of requests that can be processed together using an optimized overhead; and

means for processing the combination of requests as a single request with the optimal overhead.

8. An apparatus for optimizing access to memory in a network switch as recited in claim 7, wherein said means for a first request for access to a memory and said means for receiving at least two additional requests further comprises a memory controller.

9. An apparatus for optimizing access to memory in a network switch as recited in claim 8, wherein said memory controller further comprises an SDRAM controller.

10. An apparatus for optimizing access to memory in a network switch as recited in claim 7, wherein said means for determining a first clock overhead and said means for determining an additional clock overhead further comprises a memory controller.

11. An apparatus for optimizing access to memory in a network switch as recited in claim 10, wherein said memory controller further comprises an SDRAM controller.

12. An apparatus for optimizing access to memory in a network switch as recited in claim 7, wherein said means for determining a combination of requests and said means for processing the combination of requests further comprises a memory controller.

13. An apparatus for optimizing access to memory in a network switch as recited in claim 12, wherein said memory controller further comprises an SDRAM controller.
Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to a method and apparatus for high performance switching in local area communications networks such as token ring, asynchronous transfer mode (ATM), ethernet, fast ethernet, and gigabit ethernet environments, generally known as local area networks (LAN). In particular, the invention relates to a new switching architecture in an integrated, modular, single chip solution, which can be implemented on a semiconductor substrate such as a silicon chip.

2. Description of the Related Art

As computer performance has increased in recent years, the demands on computer networks have significantly increased; faster computer processors and higher memory capabilities need networks with high bandwidth capabilities to enable high speed transfer of significant amounts of data. The well-known ethernet technology, which is based upon numerous Institute of Electrical and Electronics Engineers (IEEE) ethernet standards, is one example of computer networking technology which has been able to be modified and improved to remain a viable computing technology. A more complete discussion of prior art networking systems can be found, for example, in SWITCHED AND FAST ETHERNET, by Breyer and Riley (Ziff-Davis, 1996), and numerous IEEE publications relating to IEEE 802 standards. Based upon the Open Systems Interconnect (OSI) 7-layer reference model, network capabilities have grown through the development of repeaters, bridges, routers, and, more recently, "switches", which operate with various types of communication media. Thickwire, thinwire, twisted pair, and optical fiber are examples of media which have been used for computer networks. Switches, as they relate to computer networking and to ethernet, are hardware-based devices which control the flow of data packets or cells based upon destination address information which is available in each packet. A properly designed and implemented switch should be capable of receiving a packet and switching the packet to an appropriate output port at what is referred to as wirespeed or linespeed, which is the maximum speed capability of the particular network. Basic ethernet wirespeed is up to 10 megabits per second, and Fast Ethernet is up to 100 megabits per second. The newest ethernet is referred to as gigabit ethernet, and is capable of transmitting data over a network at a rate of up to 1,000 megabits per second.

As speed has increased, design constraints and design requirements have become more and more complex with respect to following appropriate design and protocol rules and providing a low cost, commercially viable solution. For example, high speed switching requires high speed memory to provide appropriate buffering of packet data; conventional Dynamic Random Access Memory (DRAM) is relatively slow, and requires hardware-driven refresh. The speed of DRAMs, therefore, as buffer memory in network switching, results in valuable time being lost, and it becomes almost impossible to operate the switch or the network at linespeed. Furthermore, external central processing unit (CPU) involvement should be avoided, since CPU involvement also makes it almost impossible to operate the switch at linespeed. Additionally, as network switches have become more and more complicated with respect to requiring rules tables and memory control, a complex multi-chip solution is necessary which requires logic circuitry, sometimes referred to as glue logic circuitry, to enable the various chips to communicate with each other. Additionally, cost/benefit tradeoffs are necessary with respect to expensive but fast static random access memory (SRAM) versus inexpensive but slow DRAMs. Additionally, DRAMs, by virtue of their dynamic nature, require refreshing of the memory contents in order to prevent losses thereof. SRAMs do not suffer from the refresh requirement, and have reduced operational overhead when compared to DRAMs, such as elimination of page misses, etc. Although DRAMs have adequate speed when accessing locations on the same page, speed is reduced when other pages must be accessed.

Referring to the OSI 7-layer reference model discussed previously, and illustrated in FIG. 7, the higher layers typically have more information. Various types of products are available for performing switching-related functions at various levels of the OSI model. Hubs or repeaters operate at layer one, and essentially copy and "broadcast" incoming data to a plurality of spokes of the hub. Layer two switching-related devices are typically referred to as multiport bridges, and are capable of bridging two separate networks. Bridges can build a table of forwarding rules based upon which MAC (media access controller) addresses exist on which ports of the bridge, and pass packets which are destined for an address which is located on an opposite side of the bridge. Bridges typically utilize what is known as the "spanning tree" algorithm to eliminate potential data loops; a data loop is a situation wherein a packet endlessly loops in a network looking for a particular address. The spanning tree algorithm defines a protocol for preventing data loops. Layer three switches, sometimes referred to as routers, can forward packets based upon the destination network address. Layer three switches are capable of learning addresses and maintaining tables thereof which correspond to port mappings. Processing speed for layer three switches can be improved by utilizing specialized high performance hardware, and off loading the host CPU so that instruction decisions do not delay packet forwarding.
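
The table-building behavior of a bridge described above can be illustrated with a minimal sketch. The class name, the FLOOD and FILTER constants, and the choice to flood packets whose destination has not yet been learned are assumptions made for illustration only and are not taken from this description.

```python
from typing import Dict

FLOOD = -1   # hypothetical marker: destination unknown, send to all other ports
FILTER = -2  # hypothetical marker: destination is on the ingress side, do not forward


class LearningBridge:
    """Toy bridge that learns which MAC addresses exist on which ports."""

    def __init__(self) -> None:
        # Forwarding table: MAC address -> port on which it was last seen.
        self.table: Dict[str, int] = {}

    def handle(self, src: str, dst: str, in_port: int) -> int:
        """Return the egress port, FLOOD, or FILTER for one packet."""
        self.table[src] = in_port            # learn the sender's location
        out_port = self.table.get(dst)
        if out_port is None:
            return FLOOD                     # assumption: unknown addresses are flooded
        if out_port == in_port:
            return FILTER                    # same side of the bridge: nothing to pass
        return out_port                      # destination is on the opposite side


bridge = LearningBridge()
print(bridge.handle("aa", "bb", in_port=1))  # bb not yet learned
print(bridge.handle("bb", "aa", in_port=2))  # aa was learned on port 1
```

Only packets whose destination lies on a different port than the one they arrived on are passed, which matches the forwarding rule stated in the paragraph above.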

SUMMARY OF THE INVENTION

The present invention is related to a method for optimizing access to memory, wherein the method includes the steps of receiving a first request for access to a memory, receiving at least two additional requests for access to the memory, and determining a first clock overhead associated with the first request for access to the memory. The method further includes the steps of determining an additional clock overhead associated with each of the at least two additional requests for access to the memory in conjunction with the first request, determining a combination of requests that can be processed together using an optimized overhead, and processing the combination of requests as a single request with the optimal overhead.

The present invention is further related to a method for optimizing access to synchronous dynamic random access memory (SDRAM) in a network switch, wherein the method includes the steps of receiving a plurality of requests for access to an SDRAM, and combining at least two of the plurality of requests for processing as a single request utilizing an optimal clock overhead, in accordance with a predetermined algorithm.

The present invention is further related to a method for optimizing SDRAM in a network switch including the steps of receiving a first, second, third, and fourth request for access to the SDRAM, determining the clock overhead associated with processing the first request in conjunction with the second request, determining a clock overhead associated with processing the first request in conjunction with the third request, and determining a clock overhead associated with processing the first request in conjunction with the fourth request. Finally, the method includes the steps of determining an optimal request for access to SDRAM, wherein the optimal request is calculated to yield a minimal clock overhead, and thereafter processing the optimal request.

The present invention is further related to an apparatus for optimizing access to memory in a network switch, wherein the apparatus includes a means for receiving a first request for access to a memory, a means for receiving at least two additional requests for access to the memory, and a means for determining a first clock overhead associated with the first request for access to the memory. Further, the apparatus includes a means for determining an additional clock overhead associated with each of the at least two additional requests for access to the memory in conjunction with the first request, a means for determining a combination of requests that can be processed together using an optimized overhead, and a means for processing the combination of requests as a single request with the optimal overhead.
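
The selection step summarized above, in which the first request is evaluated in conjunction with each additional request and the combination with the least clock overhead is chosen, can be sketched as follows. This is an illustrative sketch only: the function names, the representation of a request as a (bank, row) pair, and every clock count in toy_overhead are hypothetical and do not come from this description, which does not specify the overhead calculation at this point.

```python
from typing import Callable, Sequence, Tuple, TypeVar

R = TypeVar("R")


def pick_companion(
    first: R,
    others: Sequence[R],
    combined_overhead: Callable[[R, R], int],
) -> Tuple[int, int]:
    """Return (index, clocks) of the request in `others` whose combination
    with `first` has the smallest clock overhead."""
    if not others:
        raise ValueError("at least one additional request is required")
    # Overhead of processing `first` in conjunction with each other request.
    costs = [combined_overhead(first, req) for req in others]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


def toy_overhead(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """HYPOTHETICAL cost model: requests are (bank, row) pairs and the
    clock counts below are invented for illustration."""
    row_open = 3                       # assumed clocks to open a row
    (bank_a, row_a), (bank_b, row_b) = a, b
    if (bank_a, row_a) == (bank_b, row_b):
        return row_open                # both accesses share one open row
    if bank_a != bank_b:
        return row_open + 1            # second bank opened in parallel
    return 2 * row_open + 2            # same bank, different row: close and reopen


first = (0, 5)
others = [(0, 9), (1, 2), (0, 5)]      # second, third, and fourth requests
index, clocks = pick_companion(first, others, toy_overhead)
print(index, clocks)
```

Passing the overhead calculation in as a function keeps the selection logic independent of any particular memory timing, since the real overhead depends on the memory device and controller in use.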

BRIEF DESCRIPTION OF THE DRAWINGS

The objects and features of the invention will be more readily understood with reference to the following description and the attached drawings, wherein:

FIG. 1 is a general block diagram of elements of the present invention;

FIG. 2 is a more detailed block diagram of a network switch according to the present invention;

FIG. 3 illustrates the data flow on the cell protocol sideband (CPS) channel of a network switch according to the present invention;

FIG. 4A illustrates demand priority round robin arbitration for access to the C-channel of the network switch;

FIG. 4B illustrates access to the Cell-channel (C-channel) based upon the round robin arbitration illustrated in FIG. 4A;

FIG. 5 illustrates Protocol-channel (P-channel) message types;

FIG. 6 illustrates a message format for Sideband-channel (S-channel) message types;

FIG. 7 is an illustration of the OSI 7 layer reference model;

FIG. 8 illustrates an operational diagram of an ethernet port interface controller (EPIC) module;

FIG. 9 illustrates the slicing of a data packet on the ingress to an EPIC module;

FIG. 10 is a detailed view of elements of the pipelined memory management unit (PMMU);

FIG. 11 illustrates the common buffer manager (CBM) cell format;

FIG. 12 illustrates an internal/external memory admission flow chart;

FIG. 13 illustrates a block diagram of an egress manager 76 illustrated in FIG. 10;

FIG. 14 illustrates more details of an EPIC module;

FIG. 15 is a block diagram of a fast filtering processor (FFP);

FIG. 16 is a block diagram of the elements of CPU management interface controller (CMIC) 40;

FIG. 17 illustrates a series of steps which are used to program an FFP;

FIG. 18 is a flow chart illustrating the aging process for address resolution logic (ARL) (L2) and L3 tables;

FIG. 19 illustrates communication using a trunk group according to the present invention;

FIG. 20 is a detailed illustration of the Memory Management Unit (MMU);

FIG. 21 is a timing diagram for the MMU;

FIG. 22 is a timing diagram for the slot free address pool (SFAP) to the SDRAM Scheduler;

FIG. 23 is a timing diagram for the slot assembly unit (SAU) to the SDRAM Scheduler;

FIG. 24 is a timing diagram for the SDRAM Scheduler to the slot disassembly unit (SDU);

FIG. 25 is a timing diagram for the SDRAM Controller interface;

FIG. 26 is a timing diagram for the SDRAM Controller DATA Write first-in-first-out (FIFO);

FIG. 27 is a timing diagram for the SDRAM Controller DATA Read FIFO;

FIG. 28 illustrates the first and second word formats;

FIG. 29 illustrates the number of words within SAU and SDRAM that correspond to four possible two bit cell sizes;

FIG. 30 illustrates the SAU word format;

FIG. 31 illustrates a data storage configuration;

FIG. 32 illustrates the timing for the logical configuration shown in FIG. 31;

FIG. 33 is a flowchart for receiving a cell within status, location & budget manager (SLBM);

FIG. 34 is a flowchart of the local accrual process;

FIG. 35 is a flowchart of the global accrual process;

FIG. 36 is a flowchart of the continue local accrual process;

FIG. 37 is a flowchart of the continue global accrual process;

FIG. 38 is an illustration of flow control; and

FIG. 39 is a further illustration of the Memory Management Unit.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 illustrates a configuration wherein a switch-on-chip (SOC) 10, in accordance with the present invention, is functionally connected to external devices 11, external memory 12, fast ethernet ports 13, and gigabit ethernet ports 15. For the purposes of this embodiment, fast ethernet ports 13 will be considered low speed ethernet ports, since they are capable of operating at speeds ranging from 10 Mbps to 100 Mbps, while the gigabit ethernet ports 15, which are high speed ethernet ports, are capable of operating at 1000 Mbps. External devices 11 could include other switching devices for expanding switching capabilities, or other devices as may be required by a particular application. External memory 12 is additional off-chip memory, which is in addition to internal memory which is located on SOC 10, as will be discussed below. CPU 52 can be used as necessary to program SOC 10 with rules which are appropriate to control packet processing. However, once SOC 10 is appropriately programmed or configured, SOC 10 operates, as much as possible, in a free running manner without communicating with CPU 52. Because CPU 52 does not control every aspect of the operation of SOC 10, CPU 52 performance requirements, at least with respect to SOC 10, are fairly low. A less powerful and therefore less expensive CPU 52 can therefore be used when compared to known network switches. As also will be discussed below, SOC 10 utilizes external memory 12 in an efficient manner so that the cost and performance requirements of memory 12 can be reduced. Internal memory on SOC 10, as will be discussed below, is also configured to maximize switching throughput and minimize costs.

It should be noted that any number of fast ethernet ports 13 and gigabit ethernet ports 15 can be provided. In one embodiment, a maximum of 24 fast ethernet ports 13 and 2 gigabit ports 15 can be provided. Similarly, additional interconnect links to additional external devices 11, external memory 12, and CPUs 52 may be provided as necessary.

FIG. 2 illustrates a more detailed block diagram of the functional elements of SOC 10. As evident from FIG. 2 and as noted above, SOC 10 includes a plurality of modular systems on-chip, with each modular system, although being on the same chip, being functionally separate from other modular systems. Therefore, each module can efficiently operate in parallel with other modules, and this configuration enables a significant amount of freedom in updating and re-engineering SOC 10.

SOC 10 includes a plurality of Ethernet Port Interface Controllers (EPIC) 20a, 20b, 20c, etc., a plurality of Gigabit Port Interface Controllers (GPIC) 30a, 30b, etc., a CPU Management Interface Controller (CMIC) 40, a Common Buffer Memory Pool (CBP) 50, a Pipelined Memory Management Unit (PMMU) 70, including a Common Buffer Manager (CBM) 71, and a system-wide bus structure referred to as CPS channel 80. The PMMU 70 communicates with external memory 12, which includes a Global Buffer Memory Pool (GBP) 60. The CPS channel 80 comprises C channel 81, P channel 82, and S channel 83. The CPS channel is also referred to as the Cell Protocol Sideband Channel, and is a 17 Gbps channel which glues or interconnects the various modules together. As also illustrated in FIG. 2, other high speed interconnects can be provided, as shown as an extendible high speed interconnect. In one embodiment of the invention, this interconnect can be in the form of an interconnect port interface controller (IPIC) 90, which is capable of interfacing CPS channel 80 to external devices 11 through an extendible high speed interconnect link. As will be discussed below, each EPIC 20a, 20b, and 20c, generally referred to as EPIC 20, and GPIC 30a and 30b, generally referred to as GPIC 30, are closely interrelated with appropriate address resolution logic and layer three switching tables 21a, 21b, 21c, 31a, 31b, rules tables 22a, 22b, 22c, 32a, 32b, and virtual LAN (VLAN) tables 23a, 23b, 23c, 33a, 33b. These tables will be generally referred to as 21, 31, 22, 32, 23, 33, respectively. These tables, like other tables on SOC 10, are implemented in silicon as two-dimensional arrays.

In a preferred embodiment of the invention, each EPIC 20 supports 8 fast ethernet ports 13, and switches packets to and/or from these ports as may be appropriate. The ports, therefore, are connected to the network medium (coaxial, twisted pair, fiber, etc.) using known media connection technology, and communicate with the CPS channel 80 on the other side thereof. The interface of each EPIC 20 to the network medium can be provided through a Reduced Media Independent Interface (RMII), which enables the direct medium connection to SOC 10. As is known in the art, auto-negotiation is an aspect of fast ethernet, wherein the network is capable of negotiating a highest communication speed between a source and a destination based on the capabilities of the respective devices. The communication speed can vary, as noted previously, between 10 Mbps and 100 Mbps; auto negotiation capability, therefore, is built directly into each EPIC module. The address resolution logic (ARL) and layer three tables (ARL/L3) 21a, 21b, 21c, rules table 22a, 22b, 22c, and VLAN tables 23a, 23b, and 23c are configured to be part of or interface with the associated EPIC in an efficient and expedient manner, also to support wirespeed packet flow.

Each EPIC 20 has separate ingress and egress functions. On the ingress side, self-initiated and CPU-initiated learning of level 2 address information can occur. Address resolution logic (ARL) is utilized to assist in this task. Address aging is built in as a feature, in order to eliminate the storage of address information which is no longer valid or useful. The EPIC also carries out layer 2 mirroring. A fast filtering processor (FFP) 141 (see FIG. 14) is incorporated into the EPIC, in order to accelerate packet forwarding and enhance packet flow. The ingress side of each EPIC and GPIC, illustrated in FIG. 8 as ingress submodule 14, has a significant amount of complexity to be able to properly process a significant number of different types of packets which may come in to the port, for linespeed buffering and then appropriate transfer to the egress. Functionally, each port on each module of SOC 10 has a separate ingress submodule 14 associated therewith. From an implementation perspective, however, in order to minimize the amount of hardware implemented on the single-chip SOC 10, common hardware elements in the silicon will be used to implement a plurality of ingress submodules on each particular module. The configuration of SOC 10 discussed herein enables concurrent lookups and filtering, and therefore, processing of up to 6.6 million packets per second. Layer two lookups, Layer three lookups and filtering occur simultaneously to achieve this level of performance. On the egress side, the EPIC is capable of supporting packet polling based either as an egress management or class of service (COS) function. Rerouting/scheduling of packets to be transmitted can occur, as well as head-of-line (HOL) blocking notification, packet aging, cell reassembly, and other functions associated with ethernet port interface.
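
This description does not state how the figure of up to 6.6 million packets per second is obtained. As a rough cross-check only, the sketch below computes the worst-case packet rate for the embodiment with 24 fast ethernet ports and 2 gigabit ports, assuming every port receives minimum-size 64-byte ethernet frames, each preceded by an 8-byte preamble and start-of-frame delimiter and followed by a 12-byte interframe gap. Those three byte counts are standard ethernet values and are an assumption here, not something taken from this description.

```python
# Assumed standard ethernet per-frame byte counts (not from this description).
PREAMBLE_AND_SFD_BYTES = 8
MIN_FRAME_BYTES = 64
INTERFRAME_GAP_BYTES = 12

BITS_PER_MIN_FRAME = 8 * (PREAMBLE_AND_SFD_BYTES + MIN_FRAME_BYTES + INTERFRAME_GAP_BYTES)


def max_packets_per_second(line_rate_bps: int) -> float:
    """Worst-case packet rate on one port carrying only minimum-size frames."""
    return line_rate_bps / BITS_PER_MIN_FRAME


fast_ethernet_pps = max_packets_per_second(100_000_000)    # one 100 Mbps port
gigabit_pps = max_packets_per_second(1_000_000_000)        # one 1000 Mbps port

# Port counts from the embodiment with 24 fast ethernet and 2 gigabit ports.
total_pps = 24 * fast_ethernet_pps + 2 * gigabit_pps
print(f"{total_pps / 1e6:.2f} million packets per second")
```

Under these assumptions the aggregate is about 6.55 million packets per second, which is of the same order as the quoted figure; whether the quoted figure was derived this way is not known.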

Each GPIC 30 is similar to each EPIC 20, but supports only one gigabit ethernet port, and utilizes a port-specific ARL table, rather than utilizing an ARL table which is shared with any other ports. Additionally, instead of an RMII, each GPIC port interfaces to the network medium utilizing a gigabit media independent interface (GMII).

CMIC 40 acts as a gateway between the SOC 10 and the host CPU. The communication can be, for example, along a peripheral component interconnect (PCI) bus, or other acceptable communications bus. CMIC 40 can provide sequential direct mapped accesses between the host CPU 52 and the SOC 10. CPU 52, through the CMIC 40, will be able to access numerous resources on SOC 10, including management information base (MIB) counters, programmable registers, status and control registers, configuration registers, ARL tables, port-based VLAN tables, IEEE 802.1q VLAN tables, layer three tables, rules tables, CBP address and data memory, as well as GBP address and data memory. Optionally, the CMIC 40 can include direct memory access (DMA) support, DMA chaining and scatter-gather, as well as master and target PCI64.

Common buffer memory pool or CBP 50 can be considered to be the on-chip data memory. In one embodiment of the invention, the CBP 50 is first level high speed SRAM memory, to maximize performance and minimize hardware overhead requirements. The CBP can have a size of, for example, 720 kilobytes running at 132 MHz. Packets stored in the CBP 50 are typically stored as cells, rather than packets. As illustrated in the figure, PMMU 70 also contains the Common Buffer Manager (CBM) 71 thereupon. CBM 71 handles queue management, and is responsible for assigning cell pointers to incoming cells, as well as assigning common packet IDs (CPID) once the packet is fully written into the CBP. CBM 71 can also handle management of the on-chip free address pointer pool, control actual data transfers to and from the data pool, and provide memory budget management.

Global memory buffer pool or GBP 60 acts as a second level memory, and can be located on-chip or off chip. In the preferred embodiment, GBP 60 is located off chip with respect to SOC 10. When located off-chip, GBP 60 is considered to be a part of or all of external memory 12. As a second level memory, the GBP does not need to be expensive high speed SRAMs, and can be a slower less expensive memory such as DRAM. The GBP is tightly coupled to the PMMU 70, and operates like the CBP in that packets are stored as cells. For broadcast and multicast messages, only one copy of the packet is stored in GBP 60.

As shown in the figure, PMMU 70 is located between GBP 60 and CPS channel 80, and acts as an external memory interface. In order to optimize memory utilization, PMMU 70 includes multiple read and write buffers, and supports numerous functions including global queue management, which broadly includes assignment of cell pointers for rerouted incoming packets, maintenance of the global free address pool (FAP), time-optimized cell management, global memory budget management, GPID assignment and egress manager notification, write buffer management, read prefetches based upon egress manager/class of service requests, and smart memory control.

As shown in FIG. 2, the CPS channel 80 is actually three separate channels, referred to as the C-channel, the P-channel, and the S-channel. The C-channel is 128 bits wide, and runs at 132 MHz. Packet transfers between ports occur on the C-channel. Since this channel is used solely for data transfer, there is no overhead associated with its use. The P-channel or protocol channel is synchronous or locked with the C-channel. During cell transfers, the message header is sent via the P-channel by the PMMU. The P-channel is 32 bits wide, and runs at 132 MHz.
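As a rough check on the figures above, the raw (peak) bit rate of a channel follows from its width and clock rate. The short Python sketch below is illustrative only: the function name is hypothetical, and the result ignores arbitration and any idle cycles.

```python
def raw_bandwidth_bps(width_bits: int, clock_hz: int) -> int:
    """Peak bits per second for a channel that moves width_bits per clock."""
    return width_bits * clock_hz

CLOCK_HZ = 132_000_000  # 132 MHz, as stated for the C-channel and P-channel

c_channel = raw_bandwidth_bps(128, CLOCK_HZ)  # 16_896_000_000 bits/s
p_channel = raw_bandwidth_bps(32, CLOCK_HZ)   # 4_224_000_000 bits/s
```

Under these figures the C-channel peaks at roughly 16.9 Gbit/s and the P-channel at roughly 4.2 Gbit/s.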

The S or sideband channel runs at 132 MHz, and is 32 bits wide. The S-channel is used for functions such as conveying Port Link Status, receive port full, port statistics, ARL table synchronization, memory and register access to CPU and other CPU management functions, and global memory full and common memory full notification.

A proper understanding of the operation of SOC 10 requires a proper understanding of the operation of CPS channel 80. Referring to FIG. 3, it can be seen that in SOC 10, on the ingress, packets are sliced by an EPIC 20 or GPIC 30 into 64-byte cells. The use of cells on-chip instead of packets makes it easier to adapt the SOC to work with cell based protocols such as, for example, Asynchronous Transfer Mode (ATM). Presently, however, ATM utilizes cells which are 53 bytes long, with 48 bytes for payload and 5 bytes for header. In the SOC, incoming packets are sliced into cells which are 64 bytes long as discussed above, and the cells are further divided into four separate 16 byte cell blocks Cn0 . . . Cn3. Locked with the C-channel is the P-channel, which locks the opcode in synchronization with Cn0. A port bit map is inserted into the P-channel during the phase Cn1. The untagged bit map is inserted into the P-channel during phase Cn2, and a time stamp is placed on the P-channel in Cn3. Independent from occurrences on the C and P-channel, the S-channel is used as a sideband, and is therefore decoupled from activities on the C and P-channel.
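The relationship between the four cell blocks and the P-channel phases can be sketched as follows. This is a minimal illustration, not the SOC's implementation: the function and constant names are hypothetical, and each label names only the item that the text above associates with that phase.

```python
CELL_BYTES = 64
BLOCK_BYTES = 16  # the 128-bit C-channel moves one 16-byte block per clock

# Item the text associates with the P-channel during each block Cn0..Cn3.
P_CHANNEL_PHASES = ("opcode", "port bit map", "untagged bit map", "time stamp")

def cell_phases(cell: bytes):
    """Pair each 16-byte block Cn0..Cn3 with its P-channel phase label."""
    if len(cell) != CELL_BYTES:
        raise ValueError("a cell must be exactly 64 bytes")
    blocks = [cell[i:i + BLOCK_BYTES] for i in range(0, CELL_BYTES, BLOCK_BYTES)]
    return [(f"Cn{i}", block, P_CHANNEL_PHASES[i]) for i, block in enumerate(blocks)]
```

For example, `cell_phases(bytes(64))` yields four tuples, the first labelled `Cn0` and paired with the opcode phase.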

Cell or C-channel

Arbitration for the CPS channel occurs out of band. Every module (EPIC, GPIC, etc.) monitors the channel, and matching destination ports respond to appropriate transactions. C-channel arbitration is a demand priority round robin arbitration mechanism. If no requests are active, however, the default module, which can be selected during the configuration of SOC 10, can park on the channel and have complete access thereto. If all requests are active, the configuration of SOC 10 is such that the PMMU is granted access every other cell cycle, and EPICs 20 and GPICs 30 share equal access to the C-channel on a round robin basis. FIGS. 4A and 4B illustrate a C-channel arbitration mechanism wherein section A is the PMMU, and section B consists of two GPICs and three EPICs. The sections alternate access, and since the PMMU is the only module in section A, it gains access every other cycle. The modules in section B, as noted previously, obtain access on a round robin basis.
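The alternation between section A and section B can be modelled with a small simulation. The sketch below is a toy model under stated assumptions: the class and module names are hypothetical, section A is assumed to win the first contested cycle, and a section that is the only one requesting is assumed to be granted the channel immediately. The text does not specify these details.

```python
from collections import deque

class CChannelArbiter:
    """Toy model: section A (PMMU) alternates with section B (round robin)."""

    def __init__(self, section_b, default_module="PMMU"):
        self.section_b = deque(section_b)  # rotation order for section B
        self.default_module = default_module
        self.last_was_a = False            # which section won the last grant

    def grant(self, requests):
        """Return the module granted the next cell cycle."""
        a_wants = "PMMU" in requests
        b_wanting = [m for m in self.section_b if m in requests]
        if not a_wants and not b_wanting:
            return self.default_module     # default module parks on the channel
        # Alternate sections when both want the channel (assumed tie rule).
        if a_wants and (not b_wanting or not self.last_was_a):
            self.last_was_a = True
            return "PMMU"
        self.last_was_a = False
        winner = b_wanting[0]
        # Rotate so the winner moves to the back of the round-robin order.
        while self.section_b[0] != winner:
            self.section_b.rotate(-1)
        self.section_b.rotate(-1)
        return winner
```

With two GPICs and three EPICs in section B and every module requesting, this model grants the PMMU on every other cycle and cycles through the five section B modules in turn on the remaining cycles.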

Protocol or P-channel

Referring once again to the protocol or P-channel, a plurality of messages can be placed on the P-channel in order to properly direct flow of data flowing on the C-channel. Since P-channel 82 is 32 bits wide, and a message typically requires 128 bits, four smaller 32 bit messages are put together in order to form a complete P-channel message. The following list identifies the fields and function and the various bit counts of the 128 bit message on the P-channel.

Opcode--2 bits long--Identifies the type of message present on the C channel 81;

Internet Protocol (IP) Bit--1 bit long--This bit is set to indicate that the packet is an IP switched packet;

Internetwork Packet Exchange (IPX) Bit--1 bit long--This bit is set to indicate that the packet is an IPX switched packet;

Next Cell--2 bits long--A series of values to identify the valid bytes in the corresponding cell on the C channel 81;

Source Port Address Destination (SRC DEST Port)--6 bits long--Defines the port number which sends the message or receives the message, with the interpretation of the source or destination depending upon Opcode;

Class of Service (COS or Cos)--3 bits long--Defines class of service for the current packet being processed;

J--1 bit long--Describes whether the current packet is a jumbo packet;

S--1 bit long--Indicates whether the current cell is the first cell of the packet;

E--1 bit long--Indicates whether the current cell is the last cell of the packet;

CRC--2 bits long--Indicates whether a Cyclical Redundancy Check (CRC) value should be appended to the packet and whether a CRC value should be regenerated;

P Bit--1 bit long--Determines whether MMU should Purge the entire packet;

Len--7 bits--Identifies the valid number of bytes in current transfer;

O--2 bits--Defines an optimization for processing by the CPU 52;

Broadcast/multicast (Bc/Mc) Bitmap--28 bits--Defines the broadcast or multicast bitmap. Identifies egress ports to which the packet should be sent, regarding multicast and broadcast messages;

Untag Bits/Source Port--28/5 bits long--Depending upon Opcode, the packet is transferred from Port to MMU, and this field is interpreted as the untagged bit map. A different Opcode selection indicates that the packet is being transferred from MMU to egress port, and the last six bits of this field are interpreted as the Source Port field. The untagged bits identify the egress ports which will strip the tag header, and the source port bits identify the port number upon which the packet has entered the switch;

U Bit--1 bit long--For a particular Opcode selection (0x01), this bit being set indicates that the packet should leave the port as Untagged; in this case, tag stripping is performed by the appropriate MAC;

CPU Opcode--18 bits long--These bits are set if the packet is being sent to the CPU for any reason. Opcodes are defined based upon filter match, learn bits being set, routing bits, destination lookup failure (DLF), station movement, etc;

Time Stamp--14 bits--The system puts a time stamp in this field when the packet arrives, with a granularity of 1 μsec.

The opcode field of the P-channel message defines the type of message currently being sent. While the opcode is currently shown as having a width of 2 bits, the opcode field can be widened as desired to account for new types of messages as may be defined in the future. Graphically, however, the P-channel message type defined above is shown in FIG. 5.

An early termination message is used to indicate to CBM 71 that the current packet is to be terminated. During operation, as discussed in more detail below, the status bit (S) field in the message is set to indicate the desire to purge the current packet from memory. Also in response to the status bit all applicable egress ports would purge the current packet prior to transmission.

The Src Dest Port field of the P-channel message, as stated above, defines the destination and source port addresses, respectively. Each field is 6 bits wide and therefore allows for the addressing of sixty-four ports.

The CRC field of the message is two bits wide and defines CRC actions. Bit 0 of the field provides an indication whether the associated egress port should append a CRC to the current packet. An egress port would append a CRC to the current packet when bit 0 of the CRC field is set to a logical one. Bit 1 of the CRC field provides an indication whether the associated egress port should regenerate a CRC for the current packet. An egress port would regenerate a CRC when bit 1 of the CRC field is set to a logical one. The CRC field is only valid for the last cell transmitted as defined by the E bit field of P-channel message set to a logical one.
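The CRC field semantics described above can be expressed as a small decoder. This is a sketch under assumptions: the function name is hypothetical, and treating the field as "no action" when the E bit is not set is one possible reading of "only valid for the last cell".

```python
def crc_actions(crc_field: int, e_bit: int):
    """Decode the 2-bit CRC field; returns (append, regenerate)."""
    if not 0 <= crc_field <= 0b11:
        raise ValueError("CRC field is two bits wide")
    if e_bit != 1:
        return (False, False)  # assumed: field ignored unless this is the last cell
    append = bool(crc_field & 0b01)      # bit 0: egress port appends a CRC
    regenerate = bool(crc_field & 0b10)  # bit 1: egress port regenerates the CRC
    return (append, regenerate)
```

For instance, `crc_actions(0b01, 1)` returns `(True, False)`, meaning the egress port appends a CRC without regenerating it.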

As with the CRC field, the status bit field (st), the Len field, and the Cell Count field of the message are only valid for the last cell of a packet being transmitted as defined by the E bit field of the message.

Last, the time stamp field of the message has a resolution of 1 μs and is valid only for the first cell of the packet defined by the S bit field of the message. A cell is defined as the first cell of a received packet when the S bit field of the message is set to a logical one value.

As is described in more detail below, the C channel 81 and the P channel 82 are synchronously tied together such that data on C channel 81 is transmitted over the CPS channel 80 while a corresponding P channel message is simultaneously transmitted.

S-channel or Sideband Channel

The S channel 83 is a 32-bit wide channel which provides a separate communication path within the SOC 10. The S channel 83 is used for management by CPU 52, SOC 10 internal flow control, and SOC 10 inter-module messaging. The S channel 83 is a sideband channel of the CPS channel 80, and is electrically and physically isolated from the C channel 81 and the P channel 82. It is important to note that since the S channel is separate and distinct from the C channel 81 and the P channel 82, operation of the S channel 83 can continue without performance degradation related to the C channel 81 and P channel 82 operation. Conversely, since the C channel is not used for the transmission of system messages, but rather only data, there is no overhead associated with the C channel 81 and, thus, the C channel 81 is able to free-run as needed to handle incoming and outgoing packet information.

The S channel 83 of CPS channel 80 provides a system wide communication path for transmitting system messages, for example, providing the CPU 52 with access to the control structure of the SOC 10. System messages include port status information, including port link status, receive port full, and port statistics, ARL table 22 synchronization, CPU 52 access to GBP 60 and CBP 50 memory buffers and SOC 10 control registers, and memory full notification corresponding to GBP 60 and/or CBP 50.

FIG. 6 illustrates a message format for an S channel message on S channel 83. The message is formed of four 32-bit words; the bits of the fields of the words are defined as follows:

Opcode--6 bits long--Identifies the type of message present on the S channel;

Dest Port--6 bits long--Defines the port number to which the current S channel message is addressed;

Src Port--6 bits long--Defines the port number from which the current S channel message originated;

COS--3 bits long--Defines the class of service associated with the current S channel message;

C bit--1 bit long--Logically defines whether the current S channel message is intended for the CPU 52;

Error Code--2 bits long--Defines a valid error when the E bit is set;

DataLen--7 bits long--Defines the total number of data bytes in the Data field;

E bit--1 bit long--Logically indicates whether an error has occurred in the execution of the current command as defined by opcode;

Address--32 bits long--Defines the memory address associated with the current command as defined in opcode;

Data--0-127 bits long--Contains the data associated with the current opcode.
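The fixed-width fields listed above, other than Address and Data, total 32 bits (6+6+6+3+1+2+7+1), so one way to picture them is as a single 32-bit header word. The sketch below packs and unpacks such a word. The bit ordering, the field names, and the assumption that these fields share one word are all illustrative; FIG. 6 defines the actual layout.

```python
# Field widths from the list above; the ORDER is an assumption for illustration.
S_HEADER_FIELDS = (
    ("opcode", 6), ("dest_port", 6), ("src_port", 6), ("cos", 3),
    ("c_bit", 1), ("error_code", 2), ("data_len", 7), ("e_bit", 1),
)

def pack_s_header(**values: int) -> int:
    """Pack the named fields, most significant first, into one 32-bit word."""
    word = 0
    for name, width in S_HEADER_FIELDS:
        value = values.get(name, 0)  # unspecified fields default to zero
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack_s_header(word: int) -> dict:
    """Inverse of pack_s_header: split a 32-bit word back into fields."""
    fields = {}
    for name, width in reversed(S_HEADER_FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

Range checking each value against its width catches, for example, a port number above 63 before it can spill into a neighbouring field.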

With the configuration of CPS channel 80 as explained above, the decoupling of the S channel from the C channel and the P channel is such that the bandwidth on the C channel can be preserved for cell transfer, and that overloading of the C channel does not affect communications on the sideband channel.

SOC Operation

The configuration of the SOC 10 supports fast ethernet ports, gigabit ports, and extendible interconnect links as discussed above. The SOC configuration can also be "stacked", thereby enabling significant port expansion capability. Once data packets have been received by SOC 10, sliced into cells, and placed on CPS channel 80, stacked SOC modules can interface with the CPS channel and monitor the channel, and extract appropriate information as necessary. As will be discussed below, a significant amount of concurrent lookups and filtering occurs as the packet comes in to ingress submodule 14 of an EPIC 20 or GPIC 30, with respect to layer two and layer three lookups, and fast filtering.

Now referring to FIGS. 8 and 9, the handling of a data packet is described. For explanation purposes, ethernet data to be received will be considered to arrive at one of the ports 24a of EPIC 20a. It will be presumed that the packet is intended to be transmitted to a user on one of ports 24c of EPIC 20c. All EPICs 20 (20a, 20b, 20c, etc.) have similar features and functions, and each individually operates based on packet flow.

An input data packet 112 being applied to the port 24a is shown. The data packet 112 is, in this example, defined per the current standards for 10/100 Mbps Ethernet transmission and may have any length or structure as defined by that standard. This discussion will assume the length of the data packet 112 to be 1024 bits or 128 bytes.

When the data packet 112 is received by the EPIC module 20a, an ingress sub-module 14a, as an ingress function, determines the destination of the packet 112. The first 64 bytes of the data packet 112 are buffered by the ingress sub-module 14a and compared to data stored in the lookup tables 21a to determine the destination port 24c. Also as an ingress function, the ingress sub-module 14a slices the data packet 112 into a number of 64-byte cells; in this case, the 128-byte packet is sliced into two 64-byte cells 112a and 112b. While the data packet 112 is shown in this example to be exactly two 64-byte cells 112a and 112b, an actual incoming data packet may include any number of cells, with at least one cell of a length less than 64 bytes. Padding bytes are used to fill the cell. In such cases the ingress sub-module 14a disregards the padding bytes within the cell. Further discussions of packet handling will refer to packet 112 and/or cells 112a and 112b.
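The slicing step described above can be sketched in a few lines. This is an illustration only: the function name is hypothetical, the zero pad value is an assumption, and returning the count of valid bytes per cell is simply one way a consumer could disregard the padding.

```python
CELL_BYTES = 64

def slice_into_cells(packet: bytes, pad: bytes = b"\x00"):
    """Split a packet into 64-byte cells, padding a short final cell.

    Returns (cells, valid_lengths); valid_lengths records how many bytes
    of each cell are real data rather than padding.
    """
    cells, valid = [], []
    for start in range(0, len(packet), CELL_BYTES):
        chunk = packet[start:start + CELL_BYTES]
        valid.append(len(chunk))
        cells.append(chunk + pad * (CELL_BYTES - len(chunk)))
    return cells, valid
```

A 128-byte packet such as packet 112 yields two full cells, while a 100-byte packet yields one full cell and one cell holding 36 valid bytes followed by 28 padding bytes.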

It should be noted that each EPIC 20 (as well as each GPIC 30) has an ingress submodule 14 and egress submodule 16, which provide port specific ingress and egress functions. All incoming packet processing occurs in ingress submodule 14, and features such as the fast filtering processor, layer two (L2) and layer three (L3) lookups, layer two learning, both self-initiated and CPU 52 initiated, layer two table management, layer two switching, packet slicing, and channel dispatching occur in ingress submodule 14. After lookups, fast filter processing, and slicing into cells, as noted above and as will be discussed below, the packet is placed from ingress submodule 14 into dispatch unit 18, and then placed onto CPS channel 80 and memory management is handled by PMMU 70. A number of ingress buffers are provided in dispatch unit 18 to ensure proper handling of the packets/cells. Once the cells or cellularized packets are placed onto the CPS channel 80, the ingress submodule is finished with the packet. The ingress is not involved with dynamic memory allocation, or the specific path the cells will take toward the destination. Egress submodule 16, illustrated in FIG. 8 as submodule 16a of EPIC 20a, monitors CPS channel 80 and continuously looks for cells destined for a port of that particular EPIC 20. When the PMMU 70 receives a signal that an egress associated with a destination of a packet in memory is ready to receive cells, PMMU 70 pulls the cells associated with the packet out of the memory, as will be discussed below, and places the cells on CPS channel 80, destined for the appropriate egress submodule. A FIFO in the egress submodule 16 continuously sends a signal onto the CPS channel 80 that it is ready to receive packets, when there is room in the FIFO for packets or cells to be received. As noted previously, the CPS channel 80 is configured to handle cells, but cells of a particular packet are always handled together to avoid corruption of packets.
In order to overcome data flow degradation problems associated with overhead usage of the C channel 81, all L2 learning and L2 table managem