WikiPatents - Community Patent Review
Method and apparatus for fault tolerant call processing    
United States Patent 5974114
Link to this page: http://www.wikipatents.com/5974114.html
Inventor(s): Blum; Andrea G. (Middletown, NJ), Potochniak; Paul A. (Jackson, NJ)
Abstract: A method and apparatus for processing call data. A first server in active mode replicates call data to a second server in standby mode. The first server is monitored for a fault condition by the second server, as well as other network devices. If a fault condition is detected, the first server is switched to standby mode and the second server to active mode.

Owner/Assignee: AT&T Corp (New York, NY)
Publication Date: October 26, 1999
Application Number: 08/937,762
Filing Date: September 25, 1997
US Classification: 379/9 370/217 379/221.01 379/269 714/17 714/6
Examiner: Loomis; Paul
USPTO Field of Search: 379/1 379/9 379/10 379/14 379/15 379/32 379/34 379/219 379/220 379/221 379/268 379/269 379/279 370/216 370/217 370/220 395/182.04 395/181 395/182.07 395/182.08 395/182.09 714/5 714/6 714/10 714/11
Patent Tags: fault tolerant call processing

References

U.S. References
5883939  Friedman et al.  Mar. 1999
5848128  Frey  Dec. 1998
5661719  Townsend et al.  Aug. 1997
5182750  Bales et al.  Jan. 1993
4949373  Baker, Jr. et al.  Aug. 1990
4914572  Bitzinger et al.  Apr. 1990

Claims
 


What is claimed is:

1. A method for processing call data, comprising the steps of:

replicating call data from a first server in active mode to a second server in standby mode;

monitoring said first server by said second server and other network devices for a fault condition; and

switching said first server to standby mode and said second server to active mode if a fault condition is detected.

2. The method of claim 1, wherein said step of replicating call data comprises the steps of:

receiving the call data at said first server;

processing the call data at said first server;

updating a call data record for said first server to reflect the call data;

sending the call data to said second server; and

updating a call data record for said second server to reflect the call data.

3. The method of claim 1, wherein said step of monitoring comprises the steps of:

querying said first server by said network devices to detect a fault condition; and

sending a message from said network devices to said second server of a detected fault condition.

4. The method of claim 3, wherein said step of switching comprises the steps of:

receiving at said second server said messages;

determining whether said messages reach a predetermined threshold number, and if so:

switching said second server from standby mode to active mode; and

sending a message from said second server to said first server to switch to standby mode.

5. The method of claim 4, further comprising the step of sending a message to said network devices to redirect call data to said second server.

6. The method of claim 1, further comprising the steps of:

receiving static call data at a database;

storing said static call data in a static call data profile at said database; and

replicating said static call data to said first and second servers if said static call data is updated.

7. The method of claim 6, wherein said step of replicating comprises the steps of:

receiving said static call data at said first and second servers; and

updating a static call data profile for said first server, and a static call data profile for said second server.

8. The method of claim 7, further comprising the step of auditing said call data records and said static call data profiles on a periodic basis to ensure data synchronization.

9. A method for processing call data, comprising the steps of:

receiving the call data at a first server in an active mode;

processing the call data at said first server;

updating a call data record for said first server to reflect the call data;

replicating the call data to a second server in a standby mode;

monitoring said first server by said second server and other network devices for a fault condition; and

switching said first server to standby mode and said second server to active mode if a fault condition is detected.

10. The method of claim 9, further comprising the steps of:

receiving the replicated call data at said second server; and

updating a call data record for said second server to reflect the replicated call data.

11. The method of claim 10, further comprising the step of sending a message that said first server has switched to standby mode and said second server has switched to active mode.

12. An apparatus for processing calls, comprising:

a first call control computer in active mode for receiving call data;

a second call control computer in standby mode coupled to said first call control computer;

means for replicating said call data from said first call control computer to said second call control computer;

means for monitoring said first call control computer to detect failure of said first call control computer; and

means for switching said second call control computer to active mode and said first call control computer to standby mode if said failure occurs.

13. The apparatus of claim 12, further comprising a database coupled to said first and second call control computers.

14. The apparatus of claim 13, wherein said call information comprises static call information and dynamic call information, and said database stores said static information.

15. The apparatus of claim 14, further comprising a means for replicating said static call information on said first and second call control computers.

16. The apparatus of claim 13, wherein said means for replicating replicates static call information on said first and second call control computers whenever said static call information is modified.

17. The apparatus of claim 12, wherein said means for monitoring comprises:

means for remotely monitoring said first and second call control computers; and

means for locally monitoring said first and second call control computers.

18. The apparatus of claim 17, wherein said means for locally monitoring comprises:

means for setting said first call control computer in active mode and said second call control computer in standby mode;

means for initializing said first call control computer in active mode;

means for determining whether a set of internal processes within said first call control computer are running within normal parameters; and

means for sending a message to said second call control computer to switch from standby mode to active mode if said set of internal processes are not running within normal parameters.

19. The apparatus of claim 17, wherein said means for remotely monitoring comprises:

means for determining whether a set of internal processes within said first call control computer are running within normal parameters; and

means for sending a message to said second call control computer voting to switch said second call control computer from standby mode to active mode if said set of internal processes are not running within normal parameters.

20. The apparatus of claim 12, wherein said means for switching comprises:

means for receiving at said second server vote-to-switch messages;

means for determining whether said messages reach a predetermined threshold number, and if so:

means for switching said second server from standby mode to active mode; and

means for sending a message from said second server to said first server to switch to standby mode.

21. The apparatus of claim 20, further comprising means for sending a message to said network devices to redirect call data to said second server.

22. A computer for performing call processing, comprising:

a memory containing:

a computer program for replicating call data from a first server in active mode to a second server in standby mode;

a set of computer programs for monitoring said first server by said second server and other network devices for a fault condition;

a computer program for switching said first server to standby mode and said second server to active mode if a fault condition is detected; and

a processor for running said programs.

23. A computer-readable medium whose contents cause a computer system to perform a remote procedure call, the computer system having a computer program that when executed performs the steps of:

replicating call data from a first server in active mode to a second server in standby mode;

monitoring said first server by said second server and other network devices for a fault condition; and

switching said first server to standby mode and said second server to active mode if a fault condition is detected.
Description
 


FIELD OF THE INVENTION

The invention relates to call processing in general. More particularly, the invention relates to a method and apparatus for automatically switching call processing from an active call processor to a standby call processor in the event the active call processor fails.

BACKGROUND OF THE INVENTION

Given the current state of telephony technology, telephone calls over modern telecommunications networks are relatively reliable in terms of speed in completing a call connection, meeting quality of service requirements, and maintaining a call connection during the course of a conversation. The last category, maintaining a call connection, is provided in large part by building redundancy into the network, especially in the call processing platform. The call processing platform generally controls the set-up and shut-down of a call connection, and ensures that billing for a call is accurately maintained. This redundancy in the call processing platform ensures that a call connection is maintained even if there is a hardware failure in the equipment used to establish the call connection, and is sometimes referred to as "fault tolerant call processing."

Conventional technology and methods to build redundancy in a call processing platform, however, are less than desirable for a number of reasons. For example, a call processing platform typically has a call control computer that is responsible for implementing call flow by coordinating and assigning the resources of the other platform components, such as a switching matrix, voice response computers, and data base computers. Given its central function, the operation of the call control computer is extremely important in maintaining a call connection. Consequently, the call control computer is typically a specialized computer designed with redundant hardware components, such as a back-up microprocessor, memory, power supply, and so forth. This specialized call control computer, however, is very expensive. In addition, a single call control computer, even with redundant hardware, is susceptible to common mode failure. Common mode failure occurs when a single failure of a system component causes total system failure to occur. Further, the specialized call control computer is difficult to upgrade and maintain.

In an attempt to avoid the above problems, some call processing platforms utilize multiple call control computers, rather than a single dedicated call control computer with redundant hardware. The use of multiple call control computers, however, poses a new set of problems. Typically, one of the call control computers is designated as an active call control computer, with a second designated a standby call control computer. The active call control computer actively controls call processing functions for the call processing platform, while the standby call control computer stands ready to take over control of the call processing platform in the event the active call control computer experiences a hardware or software failure. To ensure that calls are not dropped when the active call control computer fails, it becomes necessary to duplicate all call processing data to the standby call control computer. Further, it becomes necessary to implement a monitoring scheme to monitor the active call control computer, and determine when it becomes necessary to switch over to the standby call controller.

Conventional techniques exist for duplicating call processing data from an active call control computer to a standby call control computer, such as the technique disclosed in a paper authored by Rachid Guerraoui et al. titled "Software Based Replication for Fault Tolerance," Computer Journal, IEEE, April 1997. The technique described in the Guerraoui paper, however, is unsatisfactory for a number of reasons. For example, the Guerraoui paper fails to disclose a monitoring and switch over scheme that minimizes dropped calls in the case of failure of the active call control computer. Further, the Guerraoui paper fails to disclose a means for synchronizing the call processing data across the call processing platform.

In addition, the Guerraoui paper fails to teach how to ensure that the standby computer has accurate records regarding static call data. Typically, a call processing platform requires two types of data to process a call: (1) dynamic call data; and (2) static call data. Dynamic call data is information about the caller or call connection that changes for every call. For example, a destination telephone number is considered dynamic call data since it typically changes from call to call. Static call data is information about a caller that is relatively stable, that is, it does not change on a call by call basis. An example of static call data would be a billing address for a caller, or perhaps a Personal Identification Number. The Guerraoui paper fails to discuss the duplication of static call data to the standby call control computer.

In view of the foregoing, it can be appreciated that a substantial need exists for a fault tolerant call processing method and apparatus that solves the above-discussed problems.

SUMMARY OF THE INVENTION

The present invention includes a method and apparatus for processing call data. A first server in an active mode replicates call data to a second server in a standby mode. The first server is monitored for a fault condition by the second server, as well as other network devices. If a fault condition is detected, the first server is switched to standby mode and the second server to active mode.

With these and other advantages and features of the invention that will become hereinafter apparent, the nature of the invention may be more clearly understood by reference to the following detailed description of the invention, the appended claims and to the several drawings attached herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a communications system suitable for practicing one embodiment of the invention.

FIG. 2 illustrates a call processing platform in accordance with one embodiment of the invention.

FIG. 3 is a block diagram of a call control computer in accordance with one embodiment of the invention.

FIG. 4 illustrates a block flow diagram of steps performed by a dynamic data replication module in accordance with one embodiment of the invention.

FIG. 5(a) illustrates a first block flow diagram of a High Availability Daemon (HAD) module in accordance with one embodiment of the invention.

FIG. 5(b) illustrates a second block flow diagram of a HAD module in accordance with one embodiment of the invention.

FIG. 6(a) illustrates a first block flow diagram of a Monitor Service (MON) module in accordance with one embodiment of the invention.

FIG. 6(b) illustrates a second block flow diagram of a MON module in accordance with one embodiment of the invention.

DETAILED DESCRIPTION

The invention includes a method and apparatus for fault tolerant call processing. More particularly, the invention includes a method and apparatus for automatically switching from an active call control computer to a standby call control computer in the event of a hardware or software failure of the active call control computer, without interrupting active call connections being processed by the active call control computer. Two key elements required to perform this automatic switch over are call data synchronization and communications monitoring.

One embodiment of the invention comprises a call processing platform built upon general purpose computer devices. The general purpose, non-specialized computing devices are combined with voice response units (VRUs) and a switching matrix to create a distributed, fault tolerant, easily maintained call processing platform that provides high service availability through the use of "hot" standby sparing, full data sharing, database replication and synchronization, and a software-based distributed monitoring system. It is worth noting that although the distributed monitoring system of this embodiment of the invention is implemented in software, the distributed monitoring system could be implemented in hardware or software and still fall within the scope of the invention.

The call processing platform performs call control and resource management using general purpose, non-specialized computer devices. The use of general purpose, non-specialized computer devices significantly reduces the cost of the call processing platform in general, and the call control computers in particular. This embodiment of the invention utilizes a pair of general purpose, non-specialized computer devices as call control computers, with one of the computers actively controlling call processing for the call processing platform ("active call control computer"), and with the other placed in a standby mode ("standby call control computer") and ready to assume call processing responsibilities in the event the active call control computer experiences a hardware or software failure.

Switching from the active call control computer to the standby call control computer can be performed on demand or automatically in the event of failure of the active call control computer. The on-demand "active/standby switch over" of the call control computers permits a platform administrator to request either an ON_DEMAND GRACEFUL switch over or an ON_DEMAND QUICK switch over. The ON_DEMAND GRACEFUL switch over resynchronizes the entire call processing platform by temporarily halting call processing and cleaning up all currently utilized switch resources. The ON_DEMAND QUICK switch over operates similarly to the automatic active/standby switch over described below.

The automatic active/standby switch over of the call control computers is accomplished utilizing two key elements. The first key element is platform monitoring. The second key element is synchronizing call state information.

Platform monitoring is accomplished using distributed monitors for the call control computers and other critical processes. Each call control computer is equipped with a communications monitor for monitoring the internal processes for the call control computer, as well as the health of the other network devices that are part of the call processing platform. In addition, each network device is equipped with a communications monitor for monitoring the internal processes of each network device, as well as the call control computers. Each communications monitor can detect failure of the device that is running the monitor, as well as the failure of other devices external to the device running the monitor. Thus, each network device, including the call control computers, is capable of detecting device failures and reporting the device failures to the active call control computer. Additionally, each communications monitor remote to the active call control computer can detect or confirm communication failure of the active call control computer and alert the standby call control computer of the need for a possible takeover.

In this embodiment of the invention, platform monitoring is accomplished through the use of two sets of monitoring processes. These processes monitor the platform for hardware and software failure so that call processing is maintained by activating the standby call control computer upon the failure of the active call control computer.

The first set of monitoring processes are referred to as High Availability Daemon (HAD) processes. The HAD processes run on the call control computers, with one HAD per computer. The HADs are responsible for: (1) coordinating startup and shutdown of call processing on the platform; (2) tracking the health of applications local to their own processors; (3) tracking the communication status and system state of the other platform components; and (4) monitoring the health of each other's call control computer. The HAD process is described in more detail with reference to FIGS. 3, 5(a) and 5(b).

The second set of monitoring processes are referred to as Monitor Service (MON) processes. The MON processes run on the other components of the platform, e.g., VRUs and database computer. Each component has one MON process. In general, MONs are responsible for: (1) tracking the health of the application local to their own processor; (2) reporting the state of the local processor to the two call control computers; and (3) directing call flow to the active call control computer. The MON process is described in more detail with reference to FIGS. 3, 6(a) and 6(b).
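The three MON responsibilities listed above can be pictured with a minimal sketch. It is an illustration only, not the patented implementation: the class name `MonitorService`, its method names, the message strings, and the in-memory `outbox` that stands in for LAN traffic are all assumptions introduced here.

```python
class MonitorService:
    """Hypothetical stand-in for one MON process on a VRU or database computer."""

    def __init__(self, component, call_control_computers, active_name):
        self.component = component
        self.peers = list(call_control_computers)  # names of both call control computers
        self.active_name = active_name             # where call flow is currently directed
        self.outbox = []                           # (destination, message) pairs; stands in for the LAN

    def report_health(self, healthy):
        """Report the local state to both call control computers."""
        state = "up" if healthy else "down"
        for peer in self.peers:
            self.outbox.append((peer, f"{self.component} is {state}"))

    def redirect(self, new_active):
        """Point call flow at the newly active call control computer."""
        if new_active not in self.peers:
            raise ValueError(f"unknown call control computer: {new_active}")
        self.active_name = new_active

    def route_call_data(self, payload):
        """Direct call data to whichever call control computer is active."""
        self.outbox.append((self.active_name, payload))
        return self.active_name
```

In this sketch a switch over only changes `active_name`; the component keeps reporting its state to both call control computers either way.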

If any of the monitoring processes (HADs or MONs) detects a failure that affects the call processing capabilities of the active call control computer, it registers a vote-to-switch with the standby call control computer. Upon receiving two such votes, the standby activates. First, the standby tells its (formerly active) mate call control computer to enter standby mode. Then the standby informs the other platform components to redirect the call flow to it as the newly active call control computer.
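The vote-to-switch sequence can be sketched as follows. This is a simplified model under stated assumptions, not the HAD code itself: the names `Mode`, `CallControlComputer`, `register_vote`, and `redirect_log` are hypothetical, and counting at most one vote per monitor is an assumption the description does not spell out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class CallControlComputer:
    name: str
    mode: Mode
    mate: "CallControlComputer | None" = None
    vote_threshold: int = 2                    # "two such votes" per the description
    voters: set = field(default_factory=set)
    redirect_log: list = field(default_factory=list)

    def register_vote(self, monitor_id: str) -> bool:
        """Record a vote-to-switch; return True if this vote triggered a takeover."""
        if self.mode is not Mode.STANDBY:
            return False                       # only the standby tallies votes
        self.voters.add(monitor_id)            # assumption: repeat votes from one monitor count once
        if len(self.voters) < self.vote_threshold:
            return False
        self._take_over()
        return True

    def _take_over(self) -> None:
        self.mode = Mode.ACTIVE
        if self.mate is not None:
            self.mate.mode = Mode.STANDBY      # tell the formerly active mate to stand by
        # stands in for the redirect message sent to the other platform components
        self.redirect_log.append(f"redirect call flow to {self.name}")
        self.voters.clear()
```

Requiring a second, independent vote keeps a single monitor with a faulty link from forcing a switch over on its own.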

The other key component of the automatic active/standby switch over is the fully associated synchronization of each call state data structure contained on the active call control computer with its replicated call state data structure on the standby call control computer. As part of normal operation the call control computer maintains call information on a per call basis, i.e., dynamic call data. This information deals with switch and VRU resources currently assigned to a call, and caller data such as a target number and billing instrument (e.g., calling card) data. As this information is collected by the active call control computer from the other platform elements, the data is synchronized in real time to the standby call control computer. By this method, the standby call control computer always has all call information necessary to continue call processing should the monitoring processes determine that the active has failed.

Thus, the call control computers are fully synchronized with respect to call data used for the call processing. The active call control computer immediately shares all call state updates with the hot standby so that upon the active's failure, the standby can accept re-directed call flow with minimal loss of active calls or queuing delay.
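A minimal sketch of this real-time synchronization is given below. It assumes a simple in-memory record keyed by a call identifier; the class name `CallServer`, its methods, and the use of a deep copy in place of a network message are illustrative assumptions rather than details taken from the patent.

```python
import copy


class CallServer:
    """Hypothetical call control computer holding per-call dynamic data."""

    def __init__(self, name):
        self.name = name
        self.call_records = {}   # call_id -> dict of dynamic call data
        self.standby = None      # set on the active server only

    def apply_update(self, call_id, update):
        """Merge an update into this server's record for the call."""
        self.call_records.setdefault(call_id, {}).update(update)

    def handle_call_data(self, call_id, update):
        """Active-side path: update the local record, then replicate to the standby."""
        self.apply_update(call_id, update)
        if self.standby is not None:
            # deepcopy stands in for serializing the update onto the LAN
            self.standby.apply_update(call_id, copy.deepcopy(update))
```

Because every update is forwarded as soon as it is applied locally, the standby's record for a call matches the active's at the moment a switch over is ordered, which is the property the description relies on.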

Database synchronization and replication of static call data is also performed for both call control computers. A database computer stores static call data in a static call data profile, and then replicates the static call data onto both the active and standby call control computers whenever the static call data is accessed or modified. This ensures that should data be lost on any unit, it may be easily recovered from a replication. The replication of static call data for this embodiment of the invention utilizes an Advanced Replication product provided by Oracle Corporation. The call server copies of the database are read-only, and propagated to the call servers using Oracle's Read-Only Snapshots product.

Periodic data audits of dynamic and static call data records on both call control computers are performed to confirm that all data is synchronized. This ensures that both call control computers have updated call records regarding a particular call so that the call is not dropped in the event of a failure by the active call control computer.
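One way to picture such an audit is a comparison of the two servers' record sets that reports every call whose data differs. The function below is a sketch under that assumption; the name `audit_records` and the dictionary layout are hypothetical, and the patent does not say how mismatches are detected or repaired.

```python
def audit_records(active_records, standby_records):
    """Return the call IDs whose records differ or exist on only one server."""
    all_ids = set(active_records) | set(standby_records)
    return sorted(
        call_id
        for call_id in all_ids
        if active_records.get(call_id) != standby_records.get(call_id)
    )
```

A scheduler could run this comparison at a fixed interval and resend the active server's copy of each reported record; that repair step is likewise an assumption.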

Referring now in detail to the drawings wherein like parts are designated by like reference numerals throughout, there is illustrated in FIG. 1 a communications system suitable for practicing one embodiment of the invention. As shown in FIG. 1, terminals A and B (each labeled number 7) are connected to a Public Switched Telephone Network (PSTN) 9. PSTN 9 is also connected to a Call Servicing Center (CSC) 8. A calling party initiates a telephone call from terminal A. The call is processed by CSC 8, and a call connection is completed to the called party at terminal B via PSTN 9. CSC 8 includes a call processing platform (CPP) 10 that is described in more detail with reference to FIG. 2.

FIG. 2 illustrates a call processing platform in accordance with one embodiment of the invention. A CPP 10 includes a computer controlled switching matrix 12, a first call control computer 14, a second call control computer 20, a plurality of VRUs 16, and a database computer 18.

Switching matrix 12 interfaces with a pair of call control computers via local area network (LAN) 44. Switching matrix 12 is responsible for providing all network terminations to the PSTN.

Call control computers 14 and 20 are responsible for the implementation of call flow between an origination number and a destination number. Call control computers 14 and 20 coordinate and assign the resources of the other platform components such as switch 12, VRUs 16 and database computer 18. Each call control computer has an active mode and a standby mode. A call control computer in active mode actively controls call processing for CPP 10, while the other call control computer is placed in standby mode as a back-up to the call control computer in active mode.

VRUs 16 are computers capable of providing speech and touch tone resources used to interact with the caller. VRUs 16 are connected to switching matrix 12 via a network such as an Integrated Services Digital Network Primary Rate Interface (ISDN-PRI), and to call control computer 14 over another network, such as LAN 44.

Database computer 18 is a general purpose computer containing a relational database for use in call processing. Database computer 18 is connected to the call control computers via LAN 44.

FIG. 3 is a block diagram of a call control computer in accordance with one embodiment of the invention. For purposes of clarity, the following description will make reference to call control computer 14. Call control computers 14 and 20 are similar, however, and therefore any discussion regarding one call control computer is equally applicable to the other call control computer.

Call control computer 14 comprises a main memory module 24, a central processing unit (CPU) 26, a system control module 28, a bus adapter 30, a High Availability Daemon (HAD) module 32, and a dynamic data replication module 34, each of which is connected to a CPU/memory bus 22 and an Input/Output (I/O) bus 38 via bus adapter 30. Further, call control computer 14 contains multiple I/O controllers 40, as well as an external memory 46 and a network interface 48, each of which is connected to I/O bus 38 via I/O controllers 40.

The overall functioning of call control computer 14 is controlled by CPU 26, which operates under the control of executed computer program instructions that are stored in main memory 24 or external memory 46. Both main memory 24 and external memory 46 are machine readable storage devices. The difference between main memory 24 and external memory 46 is that CPU 26 can typically access information stored in main memory 24 faster than information stored in external memory 46. Thus, for example, main memory 24 may be any type of machine readable storage device, such as random access memory (RAM), read only memory (ROM), programmable read only memory (PROM), erasable programmable read only memory (EPROM), or electronically erasable programmable read only memory (EEPROM). External memory 46 may be any type of machine readable storage device, such as magnetic storage media (e.g., a magnetic disk), or optical storage media (e.g., a CD-ROM). Further, call control computer 14 may contain various combinations of machine readable storage devices through other I/O controllers, which are accessible by CPU 26, and which are capable of storing a combination of computer program instructions and data.

CPU 26 includes any processor of sufficient processing power to perform the HAD and data replication functionality found in call control computer 14. Examples of CPUs suitable to practice the invention include the INTEL family of processors, such as the Pentium®, Pentium® Pro, and Pentium® II microprocessors.

Network interface 48 is used for communications between call control computer 14 and a communications network, such as LAN 44. Network interface 48 supports appropriate signaling and voltage levels, in accordance with techniques well known in the art.

I/O controllers 40 are used to control the flow of information between call control computer 14 and a number of devices or networks such as external memory 46 and network interface 48. System control module 28 includes human user system control, user interface, and operation. Bus adapter 30 is used for transferring data back and forth between CPU/memory bus 22 and I/O bus 38.

VRUs 16 and database computer 18 are similar to call control computer 14 described with reference to FIG. 3. VRUs 16 and database computer 18, however, replace HAD module 32 with a Monitor Service (MON) module 50 (not shown in FIG. 3). MON 50 may also be implemented on other network devices internal or external to CPP 10.

HAD 32, MON 50 and dynamic data replication module 34 implement the main functionality for this embodiment of the invention. It is noted that HAD module 32 and dynamic data replication module 34 are shown in FIG. 3 as, and MON module 50 is described as, separate functional modules. It can be appreciated, however, that the functions performed by these modules can be further separated into more modules, combined together to form one module, or be distributed throughout the system, and still fall within the scope of the invention. Further, the functionality of these modules may be implemented in hardware, software, or a combination of hardware and software, using well-known signal processing techniques.

HAD 32 and MON 50 share responsibility for four central functions: (1) coordinating startup and shutdown of call control computers 14 and 20; (2) tracking and logging communication and activity states for call control computers 14 and 20; (3) detecting and alarming any hardware, software or other failures/problems; and (4) monitoring the operations of each other.

HAD 32 runs on both call control computers 14 and 20. Call control computers 14 and 20 have two primary modes: (1) an active mode; and (2) a standby mode. When a call control computer is in active mode, it is actively controlling call processing functions for CPP 10, and is referred to as an active call control computer. Similarly, HAD 32 running on the active call control computer is referred to as an active HAD (HAD-CurrActy). When a call control computer is in standby mode, it is kept ready to take over active control of the call processing functions for CPP 10 either on-demand or automatically with minimal impact on currently active calls. A call control computer in standby mode is referred to as a standby call control computer, and HAD 32 running on the standby call control computer is referred to as a standby HAD (HAD-Stand). At any time, only one of the two call control computers may be in active control of CPP 10.
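The two-mode arrangement described above can be sketched as a small model. This is only an illustration under stated assumptions: the class `CallControlPair`, the names `ccc14` and `ccc20`, and the method `switch_over` are hypothetical and do not appear in the specification. The sketch demotes the active computer before promoting the standby one, so that at no point are both computers in active mode.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class CallControlPair:
    """Hypothetical model of call control computers 14 and 20."""

    def __init__(self, first="ccc14", second="ccc20"):
        # Assumption: the first computer starts as the active one.
        self.modes = {first: Mode.ACTIVE, second: Mode.STANDBY}

    def active(self):
        """Return the name of the active computer, or None if there is none."""
        actives = [n for n, m in self.modes.items() if m is Mode.ACTIVE]
        # Invariant from the text: at most one computer is in active control.
        assert len(actives) <= 1
        return actives[0] if actives else None

    def switch_over(self):
        """Swap roles; demote before promoting so two actives never coexist."""
        old_active = self.active()
        new_active = next(n for n in self.modes if n != old_active)
        if old_active is not None:
            self.modes[old_active] = Mode.STANDBY
        self.modes[new_active] = Mode.ACTIVE
        return new_active
```

The same method would serve both the on-demand and the automatic switch-over cases; only the trigger differs.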

HAD 32 provides the following functionality for call control computers 14 and 20:

1. Bringing up and shutting down the active call control computer's critical processes in a particular order during platform startup and shutdown.

2. Notifying the MONs running on the other network devices to bring up or shut down critical processes on the other network devices.

3. Performing on-demand or automatic switch-over of platform control between call control computers 14 and 20.

4. Recognizing, as the standby HAD, the need for automatic switch-over of platform control to the standby call control computer from a failed active call control computer, and initiating that switch-over with minimal loss of currently active calls.

5. Keeping track of the status of a call server's critical processes.

6. Keeping track of the status of other network devices' critical processes.

7. Recognizing which is the default active call control computer upon cold-start or re-start and automatically initializing the default active call control computer accordingly.

8. Responding to any MON's heartbeats or state queries from other network devices.

MON 50 runs on all network devices remote to call control computers 14 and 20, such as VRUs 16 and database computer 18. MON 50 provides the following functionality for these other network devices:

1. Recognizing which is the currently active call control computer by communication with the HAD-CurrActy.

2. Responding to either HAD's heartbeats, state queries, state change reports and state transition requests.

3. Keeping track of the status of the other network devices' critical processes.

4. Notifying the currently active HAD of any state changes or alarms.

5. Monitoring the communication status of the HAD-CurrActy and notifying the standby HAD of any problems.

To properly implement automatic switch over, the HAD 32 or MON 50 processes must detect and act upon system failures within a short period of time of their occurrence, e.g., within 5 seconds. The types of failures that may be detected by HAD 32 or MON 50 include:

1. The failure of a critical process on a call server;

2. The loss of heart beat messages from a critical process; or

3. The loss of the active call server due to network or operating system failure.
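The loss of heart beat messages within a bounded time can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `HeartbeatMonitor`, its methods, and the use of explicit timestamps are assumptions; only the 5 second figure comes from the text above.

```python
class HeartbeatMonitor:
    """Hypothetical detector for lost heart beat messages."""

    def __init__(self, timeout_s=5.0):
        # 5 seconds is the example detection window given in the text.
        self.timeout_s = timeout_s
        self.last_seen = {}  # source name -> time of last heart beat

    def record(self, source, now):
        """Note that a heart beat from `source` arrived at time `now`."""
        self.last_seen[source] = now

    def failed(self, now):
        """Return the sources whose last heart beat is older than the timeout."""
        return sorted(
            source
            for source, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

Times are passed in explicitly so the behavior is deterministic; a real daemon would presumably read a monotonic clock instead.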

Additional details for HAD 32 and MON 50 will be described later in this specification.

Dynamic data replication module 34 is responsible for replicating call data received at the active call control computer to the standby call control computer. Thus, if the active call control computer fails, the standby call control computer can take over call processing operations for CPP 10 while minimizing the number of calls dropped during the switch over process. Dynamic data replication module 34 is described in more detail with reference to FIG. 4.

FIG. 4 illustrates a block flow diagram of steps performed by a dynamic data replication module in accordance with one embodiment of the invention. As shown in FIG. 4, call data is received at step 52. At step 54, the system determines whether the active call control computer or standby call control computer is to receive the call data.

If the active call control computer is to receive the call data at step 54, the active call control computer processes the call data at step 56. The active call control computer accesses a call data record, and compares the received call data with the call data stored in the call data record at step 60. If the call data differs from the call data stored in the call data record at step 60, the call data is replicated and sent to the standby call control computer at step 62. If the call data is not different from the call data stored in the call data record at step 60, the system looks for the next set of call data at step 52.

If the standby call control computer is to receive the call data at step 54, the system determines whether the call data is from the active call control computer at step 64. If it is, the call data record for the standby call control computer is updated with the new call data at step 66. If the call data is not from the active call control computer at step 64, the system looks for the next set of call data at step 52.
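The flow of FIG. 4 can be sketched as below. The class `CallControlComputer`, its attributes, and the returned status strings are hypothetical names chosen for illustration. One assumption goes beyond the text: the active computer is taken to store the new call data in its own call data record as part of processing at step 56.

```python
import copy


class CallControlComputer:
    """Hypothetical model of one call control computer in FIG. 4."""

    def __init__(self, name, active):
        self.name = name
        self.active = active
        self.records = {}  # call id -> call data record
        self.mate = None   # the other call control computer

    def receive(self, call_id, data, sender=None):
        if self.active:
            # Steps 56-60: process the data and compare with the stored record.
            if self.records.get(call_id) != data:
                # Assumption: processing updates the active computer's record.
                self.records[call_id] = copy.deepcopy(data)
                # Step 62: replicate the changed data to the standby computer.
                if self.mate is not None:
                    self.mate.receive(call_id, data, sender=self)
                return "replicated"
            return "unchanged"
        # Step 64: the standby accepts data only from the active computer.
        if sender is not None and sender.active:
            # Step 66: update the standby's call data record.
            self.records[call_id] = copy.deepcopy(data)
            return "updated"
        return "ignored"
```

Because unchanged data is not resent, replication traffic is proportional to the number of actual changes in call state rather than to the number of messages received.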

Database computer 18 is a general purpose computer containing a relational database for use in call processing. As with the other network devices described with reference to CPP 10, database computer 18 includes a MON module 50 for monitoring call control computers 14 and 20. Database computer 18 also includes a static data replication module. The static data replication module receives static call data, and stores the static call data in a static call data profile in the relational database. Every time the static call data profile is updated, the static data replication module replicates the static call data stored in the static call data profile to call control computers 14 and 20.

CPP 10 periodically audits the call data records and the static call data profiles. The data audits help ensure data synchronization between call control computers 14 and 20.
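The specification does not say how the audit is carried out. One possible approach, shown purely as an assumption, is to compare a digest of each record on the two call control computers and report the keys that disagree. The functions `digest` and `audit` and the dictionary layout are hypothetical, and record keys are assumed to be strings.

```python
import hashlib
import json


def digest(record):
    """Stable SHA-256 digest of a JSON-serializable record."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def audit(active_records, standby_records):
    """Return the sorted keys whose records are missing or differ."""
    mismatched = []
    for key in sorted(set(active_records) | set(standby_records)):
        if key not in active_records or key not in standby_records:
            mismatched.append(key)
        elif digest(active_records[key]) != digest(standby_records[key]):
            mismatched.append(key)
    return mismatched
```

Comparing digests rather than full records would let the two computers exchange a small fixed-size value per record, with the full data resent only for the keys reported as mismatched.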

FIG. 5(a) illustrates a first block flow diagram of a High Availability Daemon (HAD) module in accordance with one embodiment of the invention. CPP 10 has two call control computers, a first call control computer and a second call control computer. Each call control computer executes a HAD process, with each HAD process in communication with the other. For purposes of clarity, a HAD process running on the first call control computer will be referred to as "the first HAD process," and a HAD process running on the second call control computer will be referred to as "the second HAD process." Similarly, a HAD process running on the active call control computer will be referred to as "the active HAD process" and a HAD process running on the standby call control computer will be referred to as "the standby HAD process."

As shown in FIG. 5(a), each HAD process executes steps 70, 72, 74, 76, 78, 80, 82 and 84. At step 70, the HAD process is initiated. Upon start up, the HAD process activates the call control computer on which it is running at step 72. At step 74, the HAD is taken out of service. At step 76, the HAD process determines whether the call control computer upon which it is running is the default active call control computer. In this embodiment of the invention, this determination is accomplished by querying stored data at step 78 and receiving a response to the query at step 76. Alternatively, other means could be implemented for choosing the default active call processor, such as through an alternating or random selection process, and still fall within the scope of the invention. At step 80, the HAD process exchanges heart beats with the internal processes running on the same call control computer that is running the HAD process. The HAD process determines whether all the internal processes are operating within normal performance parameters at step 82. If at step 82 not all the internal processes are operating according to normal performance parameters, then the HAD process is placed out of service again at step 74. If all internal processes are operating according to normal performance parameters at step 82, the HAD process is put in standby mode at step 84.

Thus at step 84, both HAD processes are placed in standby mode. The default active HAD process is initialized at step 86. The default active HAD process then determines whether the other HAD process ("HAD mate") is already in active mode at step 88. If the HAD mate is already active at step 88, the default active HAD process is placed on standby at step 84. If the HAD mate is not already active at step 88, then the default active HAD is placed into a waiting mode at step 90. At step 92, the default active HAD process activates VRU 16, and sends a MON go active message 132 to MON 50.

FIG. 5(b) illustrates a second block flow diagram of a HAD module in accordance with one embodiment of the invention. The default active HAD process determines at step 94 whether the threshold number of VRUs has been activated. If the threshold number of VRUs has not been activated at step 94, then the default active HAD is placed in standby mode at step 84. If the threshold number of VRUs has been activated at step 94, the default active HAD process checks the switch status at step 96. The default active HAD process determines whether the switch is ready to perform switching functions at step 98. If the switch is not ready at step 98, the default active HAD process is put in standby mode at step 84. If the switch is ready to perform switching functions at step 98, then the default active HAD is placed in an active mode at step 100. Once the active HAD has been placed in active mode, the HAD process announces its active status to all the other network devices at step 102.
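The startup decisions of FIGS. 5(a) and 5(b) can be condensed into a single function for illustration. The function `had_startup`, its parameters, and the returned mode strings are hypothetical; the sketch collapses the intermediate waiting mode of step 90 and treats "the threshold number of VRUs has been activated" as meaning the activated count is at least the threshold.

```python
def had_startup(is_default_active, internals_ok, mate_active,
                vrus_activated, vru_threshold, switch_ready):
    """Return the mode a HAD process settles into after startup."""
    # Step 82: internal processes must be within normal parameters.
    if not internals_ok:
        return "out_of_service"
    # Step 84: every healthy HAD process first enters standby mode;
    # only the default active HAD proceeds further (step 86).
    if not is_default_active:
        return "standby"
    # Step 88: defer to a mate that is already active.
    if mate_active:
        return "standby"
    # Step 94: enough VRUs must have been activated.
    if vrus_activated < vru_threshold:
        return "standby"
    # Step 98: the switch must be ready to perform switching functions.
    if not switch_ready:
        return "standby"
    # Step 100: the default active HAD goes active.
    return "active"
```

Every failed check falls back to standby rather than to out of service, which matches the text: only a problem with the computer's own internal processes takes the HAD process out of service.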

One function shared by the HAD processes and MON processes is to monitor the internal processes of the computer running the HAD or MON process, and also monitor network devices external to the computer running the HAD or MON process. At steps 104, 106 and 108, the active HAD process queries the internal processes of the active call control computer, as well as other network devices such as VRU 16 and Switching Matrix 12. At step 104, the active HAD process sends status queries to the internal processes, VRU 16 and Switching Matrix 12. The HAD process receives responses from the internal processes, VRU 16 and Switching Matrix 12 at step 106. At step 108, the HAD process determines whether the internal processes, VRU 16 and Switching Matrix 12 are functioning properly. If the internal processes, VRU 16 and Switching Matrix 12 are operating properly at step 108, then steps 104, 106 and 108 are repeated until the HAD process determines that one of the internal processes, VRU 16 or Switching Matrix 12 is not operating appropriately at step 108.

If a failure does occur in the internal processes, VRU 16 or Switching Matrix 12 at step 108, the HAD process determines whether it is the internal processes that have failed at step 110. If the internal processes have not failed at step 110, the HAD process determines whether the call processing platform 10 has lost a threshold number of VRUs 16 at step 112. If a threshold number of VRUs are not present at step 112, then an alarm is raised at step 114 and the active HAD process is placed on standby at step 84.

If the active HAD process determines that an internal process has failed at step 110, the active HAD process notifies the standby HAD process to activate, and then orders the active call control computer to go out of service at step 116. The active HAD process is then placed out of service at step 118.
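The failure handling at steps 110 through 118 can be sketched as a function that returns the actions taken, in order. The function `handle_failure` and the action names are hypothetical. The text does not say what happens when the internal processes are healthy and enough VRUs remain; returning no actions in that case (that is, continuing to monitor) is an assumption.

```python
def handle_failure(internal_failed, vrus_present, vru_threshold):
    """Return the ordered actions the active HAD process takes on a failure."""
    if internal_failed:
        # Steps 116-118: hand control to the standby side, then withdraw.
        return [
            "notify_standby_had_to_activate",
            "active_computer_out_of_service",
            "had_out_of_service",
        ]
    if vrus_present < vru_threshold:
        # Steps 112-114: too few VRUs remain; alarm and fall back to standby.
        return ["raise_alarm", "had_to_standby"]
    # Assumption: otherwise no state change, monitoring simply continues.
    return []
```

Notifying the standby HAD process comes first in the internal-failure branch, so the takeover can begin before the failing computer is withdrawn.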

FIG. 6(a) illustrates a first block flow diagram of a Monitor Service (MON) module in accordance with one embodiment of the invention. FIG. 6(a) shows a MON process which may be running on any of the network devices that are part of CPP 10. At step 120, a MON process is started. At step 122, the MON process activates VRU 16. At step 124, the MON process is placed out of service. The MON process then checks the status of the internal processes of the device which is running the MON process at step 126. The MON process determines whether all the internal processes are running properly at step 128. If not all the internal processes are running properly at step 128, then the MON process is placed out of service at step 124. If, however, all internal processes are running properly at step 128, then the MON process is placed in a standby mode at step 130.

At step 134, the MON process determines whether it has received a MON go active message 132. If it has not received a MON go active message 132, then the MON process remains in standby mode at step 130. If a MON go active message 132 is received at step 134, the MON process is placed in a waiting mode at step 136.
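The MON transitions of FIG. 6(a) can be sketched as a simple state function. The function `mon_next_state` and the state strings are hypothetical names used only for illustration; the sketch covers steps 124 through 136 and leaves any later state unchanged.

```python
def mon_next_state(state, internals_ok=True, go_active_received=False):
    """Return the next MON state for the transitions of FIG. 6(a)."""
    if state == "out_of_service":
        # Steps 126-130: leave out of service only if all internals run properly.
        return "standby" if internals_ok else "out_of_service"
    if state == "standby":
        # Steps 134-136: wait for a MON go active message 132.
        return "waiting" if go_active_received else "standby"
    # Later states are outside the scope of this sketch.
    return state
```

A MON process in standby ignores everything except the go active message, which is what lets the default active HAD process control when the remote devices come up.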

FIG. 6(b) illustrates a second block flow diagram of a MON module in accordance with one embodiment of the invention. At step 138, the MON process checks the status of VRU 16. At step 140, the MON process determines whether VRU 16 ports are ready. If the VRU ports are not ready at step 140, then steps 138 and 140 are repeated until the VRU ports are ready. If the VRU ports are ready at step 140, then the MON process is placed in active mode at step 14