BACKGROUND OF THE INVENTION
1. Field of the Invention
A portion of the disclosure of this patent document contains material
which is subject to copyright protection. The copyright owner has no
objection to the facsimile reproduction by anyone of the patent document
or the patent disclosure, as it appears in the patent and trademark
office, patent file or records, but otherwise reserves all copyrights
whatsoever.
This invention relates to the implementation of a managed object system for
monitoring the operation of complex electrical systems and isolating
faults therein. In particular, it relates to the generation, control, and
propagation of alarm conditions within a telecommunications network.
2. Description of Related Art
Today's complex telecommunications systems have thousands of functional
elements which are interdependent in their operation. When a fault occurs
in one of the functional elements, the fault must be detected, and the
faulty element must be isolated for replacement or repair. With thousands
of such elements in modern telecommunications systems, it is not
economically feasible to perform such monitoring and fault isolation
functions manually. For this reason, automated performance monitoring and
fault isolation systems have been developed.
In general, when a fault or a malfunction is detected in an electrical
system, the system puts out an alarm to the operator or operation support
system that is managing the system. If the system has many elements or
managed objects (MOs), there may be a chain of functional dependencies
between the various MOs. In such a case, multiple alarms may be generated
by a single fault, and the need for alarm coordination arises. For
example, if an object A is faulty, it should obviously send an alarm
notification to the operation support system. If an object B is
functionally dependent on object A, and object A is faulty, object B may
also be non-functional and should send an alarm notification as well.
This, of course, results in two alarm notifications caused by a single
fault.
In most cases involving complex telecommunications systems, multiple
objects, rather than the illustrative single object B, will be dependent
upon object A. If B_n denotes all objects that are functionally
dependent upon object A, there may also be objects C_{m,n} that are
functionally dependent upon object B_n, and so on. In such complex
systems, one fault may be detected nearly simultaneously with, or
independently of, other detections of the same fault in different parts of
the system. A serious fault in object A may create malfunction symptoms in
a great number of the B and C objects, which then report the malfunctions
by sending alarm notifications. If the number of notifications is great,
the system experiences a mass alarm situation. The operator or the
operation support system, in the case of a mass alarm situation, may be
flooded with information. The vast amount of information makes it
difficult to take proper corrective action in a reasonable time.
For existing systems, the solution to the problem described above is for
the operation support system to post-process the mass of alarm
notifications. Each network element or managed object sends alarm
notifications as they occur for any abnormalities that are detected. The
operation support system attempts to store the alarm notifications until
all notifications resulting from a particular event are generated and
received. They are then processed off-line to determine the cause of the
mass alarm situation. This approach requires, in the case of complex
telecommunication systems, an expensive, high capacity management system
with an accurate model of the supervised electrical system. Even if the
operation support system can handle the large number of alarms, the
telecommunications system remains inoperative or degraded until the
post-processing can be completed and the cause of the problem identified and
corrected.
In telecommunications systems, mass alarm conditions often lead to the
failure of the high capacity management systems, and experienced
troubleshooters are required to manually isolate the fault and effect repairs.
Such failures lead to increased cost of operation and increased amounts of
down time of the telecommunications system.
Therefore, it would be a distinct advantage within the telecommunications
industry to have a model-based alarm coordination system which is more
intelligent in its reporting of detected malfunctions in order to avoid
mass alarm situations. The system of the present invention provides such a
system.
SUMMARY OF THE INVENTION
In one sense, the present invention is a model-based alarm coordination
system for controlling the reporting of faults in complex electrical
systems having a plurality of managed objects. The coordination system
detects out-of-specification performance in the plurality of managed
objects and differentiates between managed objects which have
out-of-specification performance due to internal faults, and managed
objects which have out-of-specification performance due to faults in other
managed objects. The managed objects with internal faults are then
localized.
In another aspect, the present invention is a model-based alarm
coordination system for identifying and localizing faults in complex
electrical systems having a plurality of managed objects. The coordination
system generates primary alarm notifications within those managed objects
which are fault-causing, and secondary alarm notifications within those
managed objects which are not fault-causing, but which are affected by the
fault-causing managed objects. The system also generates requests to
coordinate the primary and secondary alarm notifications, and requests to
localize the fault-causing managed objects. Additionally, fault
identification messages are generated within the managed objects in
response to the sensing of faults by the managed objects. The fault
identification message then identifies the generating managed object and
the type of fault sensed. The system further establishes dependency
relationships between the managed objects, and transmits the fault
identification messages from managed objects which are sensing faults to
the managed objects with which the dependency relationships exist. The
fault identification message is transmitted in conjunction with the
primary alarm notifications, secondary alarm notifications, requests to
coordinate, and requests to localize.
In still another aspect, the present invention includes a method for
controlling the reporting of faults in complex electrical systems having a
plurality of managed objects. The method comprises detecting
out-of-specification performance in the plurality of managed objects and
differentiating between managed objects which have out-of-specification
performance due to internal faults, and managed objects which have
out-of-specification performance due to faults in other managed objects.
The managed objects with internal faults are then localized.
In yet another aspect, the present invention includes a method for
identifying and localizing faults in complex electrical systems having a
plurality of managed objects. The method comprises generating primary
alarm notifications within those managed objects which are fault-causing
and generating secondary alarm notifications within those managed objects
which are not fault-causing, but are affected by the fault-causing managed
objects. Requests to coordinate the primary and secondary alarm
notifications and requests to localize the fault-causing managed objects
are generated within the managed objects as well as fault identification
messages which are generated within the managed objects in response to the
managed objects sensing a fault, where the fault identification message
identifies both the generating managed object and the type of fault
sensed. Dependency relationships are established between the managed
objects and fault identification messages are transmitted from managed
objects sensing faults to managed objects with which the dependency
relationships exist, where the fault identification message is transmitted
in conjunction with the primary alarm notifications, secondary alarm
notifications, requests to coordinate, and requests to localize.
It is an object of the present invention to provide the user with the
capability to couple fault/fault symptom alarms to the faulty unit,
thereby making it possible for the user to trace the actual fault and take
corrective action.
It is another object of the invention to assemble and present coordinated
alarms to the receivers in a consistent manner.
It is still another object of the invention to automatically and
unequivocally localize faults in the system to a single replacement unit,
whether hardware or software.
It is still yet another object of the invention to identify faults in the
system by a unique fault identification message which consists of a
reference to the faulty managed object, fault number, alarm type, and
problem type. All primary and secondary alarm notifications generated by a
fault contain the same fault identification message.
It is another object of the invention for all managed objects affected by
the same fault to store an identification of the fault which may be
retrieved by the operator.
Through the above, the operator may more easily determine the consequences
of a fault. Additionally, uniform fault management and alarm reporting is
achieved throughout the system. Although the system and method of the
present invention provide a decentralized fault-management scheme, they may
be combined with more traditional centralized fault management systems,
thus increasing their capability and flexibility.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be better understood and its numerous objects and
advantages will become more apparent to those skilled in the art by
reference to the following drawings, in conjunction with the accompanying
specification, in which:
FIG. 1 is a block diagram illustrating the manner in which relationships
between managed objects (MOs) are established, when one object is
functionally dependent on the other, within the model that is part of the
system of the present invention;
FIG. 2 is a block diagram illustrating a hierarchy of dependencies between
MOs in a complex electrical system through which the operational or alarm
state is propagated in accordance with the teachings of the present
invention;
FIG. 3 is a block diagram illustrating a dependency relationship between
fault-causing objects and fault-detecting objects in one embodiment of the
system of the present invention;
FIG. 4 is a block diagram illustrating a method by which fault
identification messages, alarm notifications, and "time-out" localization
determinations are propagated throughout an electrical system in one
embodiment of the system of the present invention;
FIG. 5 is a block diagram illustrating the propagation of a coordination
request along dependency lines and through multiply dependent objects in
accordance with the teachings of the present invention;
FIG. 6 is a block diagram illustrating a recursive fault localization
scheme utilized in one embodiment of the system of the present invention
through which it is determined which server object in an electrical system
is faulty;
FIG. 7 is a block diagram illustrating a representative chain of MOs along
with their dependency relationships and propagation of a localize
function;
FIG. 8 is a graphical illustration of the interaction between MOs and
between application parts and general parts of MOs when a fault has been
discovered, in one embodiment of the system of the present invention;
FIG. 9 is a block diagram illustrating a "pool" MO and its relationship
with pool members in one embodiment of the system of the present
invention;
FIG. 10 is a block diagram illustrating a portion of a complex electrical
system where many internal MOs are functionally dependent upon a single
external resource in the system of the present invention;
FIG. 11 is a flow chart of an FHSupport Program which implements the
functions of localizing faulty MOs, coordinating affected MOs, updating of
MO fault states, and sending of alarm notifications in one embodiment of
the system of the present invention;
FIG. 12 is a flow chart illustrating the actions performed by a
functionally dependent MO upon receiving a coordinate request from a
server MO in one embodiment of the system of the present invention;
FIG. 13 is a flow chart of the steps taken by a server MO upon receipt of a
localization request from a functionally dependent MO in one embodiment of
the system of the present invention; and
FIG. 14 is a block diagram illustrating the implementation of message
sending services between two managed objects in one embodiment of the
system of the present invention.
DETAILED DESCRIPTION
The system of the present invention is a model-based alarm coordination
system which coordinates primary and secondary alarm notifications in
order to ascertain whether they are caused by a single fault, or multiple
faults, in a complex electrical system. The alarm coordination function is
part of a larger overall Fault Management Support (FMS) system. The FMS
system is a framework that, when combined with object-specific fault
management parts, offers uniform fault management functions to managed
objects (MOs) within a complex electrical system such as a
telecommunications system. Each MO is viewed as a self-contained,
functional unit, and is responsible for its own internal fault management.
Therefore, in the present system there are no global or centralized fault
management functions.
The present system for alarm coordination provides designers and
programmers of telecommunication systems with a framework for defining
functional dependencies between objects. These object relation models are
used to automatically solve the alarm coordination problem in a
generalized and standardized way. Relatively little object-specific
programming is required.
The FMS system consists of three parts: an operation support system (OSS),
fault handling support, and repair handling support. The OSS provides
overall management support and an interface for human operators. Fault
handling support performs the functions of alarm coordination, fault
localization, and alarm information packing. Repair handling support
controls the repair process for hardware units by performing multiple
functions which enable component replacement without expert skills. Repair
handling support is not a subject of this patent.
Model-based alarm coordination is required when a single fault triggers
multiple fault notifications due to the relationships between various MOs
in the electrical system. In a complex system, such as a
telecommunications exchange, the functional elements are stratified, and
as previously noted, objects from different levels may be functionally
related. As shown by the illustration in FIG. 1, two roles in the
relationship are identified: the client 10 and the server 11, where the
client 10 is functionally dependent on the server 11. One of the
consequences of this dependency relationship 12 is that the operational
state of the server 11 is propagated to its clients 10. For example, when
the server 11 is blocked (its operational state has changed to disabled),
the client 10 is said to be secondarily blocked.
FIG. 2 is a block diagram illustrating a hierarchy of dependencies between
MOs in a complex electrical system through which the operational or alarm
state is propagated in accordance with the teachings of the present
invention. When there is a hierarchy of dependencies, there is a hierarchy
of client/servers 13, where the servers on one level are clients to
servers on the next level and so on. Most complex electrical systems
contain such hierarchies of dependencies.
FIG. 3 is a block diagram illustrating a dependency relationship between a
fault-causing object A 14 and a fault-detecting object B 15 in one
embodiment of the system of the present invention. The relationship "Is
Dependent Upon" 16 means that object B is dependent upon object A to
maintain its operability. The "Is Dependent Upon" relationship 16 is used
to propagate information between the objects 14 and 15. The information
transferred is mainly fault identification messages denoting the nature
and location of the actual fault. These fault identifications are then
used by the alarm notifications. Secondary alarm notifications for the
symptoms contain the same fault identification as the primary alarm
notification for the fault cause.
In most complex electrical systems, multiple objects (B_n) are
functionally dependent upon object A; therefore, multiple alarm
notifications are generated when object A fails. The system of the present
invention coordinates the respective alarm notifications with each other
so that it is clear to the operation support system that the alarm
notification from B_n, and each object between B_n and object A,
is a consequence of object A's alarm. This coordination is based on two
conditions:
1. Fault management functions are distributed in the different managed
objects. Each MO implements its part of the fault management scheme; and
2. If an MO is functionally dependent on another MO, the nature of that
relationship is established in a model between the objects.
Coordination is mainly a matter of informing involved objects about the
fault identity. Alarm coordination is thus implemented by propagation of
fault identifications between MOs. The effect of the propagation is to
link faults and their symptoms to each other in order to show that they
are caused by the same fault in the system. The fault identification is
created by the fault-causing MO and is stored in all affected MOs. Alarm
coordination uses relationship references, mainly the "Is Dependent Upon"
relationship but also the "Is Handled By" relationship (described
later), to communicate between MOs. The "Is Dependent Upon" relationship
is bi-directional because a fault coordination function uses one direction
and a fault localizing function uses the other.
In one embodiment, an ELIN¹ specification is used to specify the "Is
Dependent Upon" relationship. In the following example, RefClient and
RefServer are used to denote two reference attributes, one from each
object in the relationship, although other names may be used:
______________________________________
ELIN-specification:
PERSISTENT ADT Server
  BASE Cofms_FHSupport;
  . . .
  ATTRIBUTES
  . . .
  RefClient: REFERENCE MANY TO Client
    INVERSE RefServer;
  . . .
END ADT Server;
PERSISTENT ADT Client
  BASE Cofms_FHSupport;
  . . .
  ATTRIBUTES
  . . .
  RefServer: REFERENCE TO Server
    INVERSE RefClient;
  . . .
END ADT Client;
© 1993 Telefonaktiebolaget L M Ericsson
______________________________________
¹ ELIN is a programming language especially developed for
telecommunications systems and is described in the publication entitled
"ELIN REFERENCE MANUAL", attached as Appendix A to copending patent
application entitled "System for Dynamic Run-Time Binding of Software
Modules in a Computer System", filed Jul. 1, 1992, Ser. No. 07/907,307 by
Kenneth Lundin et al., hereby incorporated by reference herein.
In this example, a one-to-many cardinality is specified, but other
cardinalities may be specified by inserting or omitting the keyword MANY.
The keyword INVERSE indicates that the "Is Dependent Upon" relationship is
bi-directional, as indicated above.
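By way of illustration only, the bi-directional reference pair can be sketched in Python. This is not the ELIN implementation; the class name ManagedObject, the method depends_on, and the attribute names ref_server and ref_clients are hypothetical analogues of RefServer and RefClient, chosen for this sketch.

```python
class ManagedObject:
    """Hypothetical stand-in for an MO; not the ELIN ADT itself."""

    def __init__(self, name):
        self.name = name
        self.ref_server = None   # analogue of RefServer (at most one server)
        self.ref_clients = []    # analogue of RefClient (MANY clients)

    def depends_on(self, server):
        """Record "Is Dependent Upon" and keep the inverse reference in step."""
        if self.ref_server is not None:
            # Drop the stale inverse entry before re-pointing the reference.
            self.ref_server.ref_clients.remove(self)
        self.ref_server = server
        server.ref_clients.append(self)


server = ManagedObject("A")
client_1 = ManagedObject("B1")
client_2 = ManagedObject("B2")
client_1.depends_on(server)
client_2.depends_on(server)

# One direction serves coordination (server to clients), the other
# serves localization (client to server).
print([c.name for c in server.ref_clients])
print(client_1.ref_server.name)
```

Updating both ends inside a single method is one way to mimic what the INVERSE keyword declares; the ELIN run-time may maintain the inverse differently.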
If a fault occurs, it may be detected either by the object in which the
fault resides or by a functionally dependent (client) object. The most
common case is that in which the fault is first detected in the server
object actually causing the problem. Referring again to FIG. 2, this
object is labelled S_{n+1}. The following measures are taken in the
object S_{n+1} in the normal case wherein its internal error detection
function detects a fault:
1. Primary protection is initiated. If the operational state of S_{n+1}
is "enabled," it is set to "disabled." If the operational state is already
"disabled," it remains so.
2. A self test is performed to determine the precise cause of the problem.
If, however, the self test cannot find an internal object fault, or if the
test indicates that the problem is with one of its servers, the need for
fault localization arises.
3. A fault identification message is generated which is unique to the fault
identified.
4. The operational state of S_{n+1} is propagated all the way up to
C_1. A list of references to all secondarily blocked objects may be
maintained during the propagation transaction and may then be associated
with S_{n+1}.
5. An alarm notification is generated which includes the list of
secondarily blocked objects.
Any object that receives the message that one of its servers is disabled
will perform these same activities.
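The five measures listed above can be sketched as follows. This is a simplified Python illustration, not the disclosed implementation: the names MO, block_clients, and handle_internal_fault, the tuple used as a fault identification, and the dictionary used as an alarm notification are all assumptions of this sketch. The self test is passed in as a function because its content is object-specific.

```python
import itertools

_fault_numbers = itertools.count(1)  # hypothetical source of unique fault numbers


class MO:
    def __init__(self, name):
        self.name = name
        self.clients = []        # objects that depend on this one
        self.state = "enabled"
        self.fault_id = None

    def add_client(self, client):
        self.clients.append(client)
        return client


def block_clients(mo, fault_id, blocked):
    """Measure 4: propagate the disabled state to every dependent object."""
    for client in mo.clients:
        if client.fault_id == fault_id:
            continue             # already reached along another dependency
        client.state = "disabled"
        client.fault_id = fault_id
        blocked.append(client.name)
        block_clients(client, fault_id, blocked)


def handle_internal_fault(mo, self_test):
    mo.state = "disabled"                        # measure 1: primary protection
    if not self_test(mo):                        # measure 2: self test
        return None                              # fault localization is needed
    fault_id = (mo.name, next(_fault_numbers))   # measure 3: unique identification
    mo.fault_id = fault_id
    blocked = []
    block_clients(mo, fault_id, blocked)         # measure 4: propagation
    return {"fault_id": fault_id,                # measure 5: alarm notification
            "source": mo.name,
            "secondarily_blocked": blocked}


s = MO("S")
c2 = s.add_client(MO("C2"))
c1 = c2.add_client(MO("C1"))
alarm = handle_internal_fault(s, self_test=lambda mo: True)
print(alarm["secondarily_blocked"])
```

The sketch collects the list of secondarily blocked objects during the propagation itself, which corresponds to the optional list described in measure 4.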
As mentioned above, when an object's internal error detection function
determines a fault, the object's operational state may already be in the
disabled condition. This may be caused by a fault identification
propagation from one of its servers. In this case the primary protection
phase is entered in an already secondarily blocked object. This may have
one of two explanations:
1. The detected secondary blocking is due to a fault identification
propagation from one of its servers; or
2. The detected secondary blocking is due to another, potentially
historical, problem.
The object must, through its own internal fault detection function,
determine which explanation is correct. This is done by a functional self
test. If the functional self test finds a fault in this object, it is a
new fault, and the object's operational state is set to "primary disabled"
which starts a new chain of propagation. If the self test provides no
evidence of internal faults, the system determines whether or not there is
a new fault in another object which is causing the secondary blocking, or
whether the secondary blocking is, in fact, caused by the fault which the
fault identification message indicates. This again raises the need for
fault localization.
FIG. 4 is a block diagram illustrating a method by which fault
identification messages, alarm notifications, and "time-out" localization
determinations are propagated throughout an electrical system in one
embodiment of the system of the present invention. It can be seen that a
fault identification message 17 is created and stored within the
fault-causing MO 18. The fault identification message 17 is included in a
coordination request 21 which is propagated to all functionally dependent
MOs 19, 20, where the fault identification 17 is stored. Concurrently, the
fault-causing MO sends out a primary alarm notification 22. Secondarily
affected MOs 19, 20 send out a secondary alarm notification 23 which
includes the fault identification message 17. The coordination of related
MOs is achieved by propagating requests along dependency relationship
lines 16 between the MOs, and storing the same fault identity 17. The
coordination of alarm notifications 22, 23 is accomplished by including
the same fault identity 17 in the alarm information.
FIG. 5 is a block diagram illustrating the propagation of a coordination
request 21 along dependency lines 16 and through multiple dependent
objects. It can be seen that an MO 31 with multiple lines of dependency 16
propagates the coordination request 21 along all lines to dependent MOs
32-34.
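The coordination of FIGS. 4 and 5 can be illustrated with a short Python sketch. It assumes a hypothetical MO class, a plain list standing in for the notification handler, and a string such as "A/1" standing in for the fault identification message 17; none of these names come from the disclosure.

```python
from collections import defaultdict


class MO:
    def __init__(self, name, outbox):
        self.name = name
        self.dependents = []     # MOs reached along the dependency lines
        self.fault_id = None
        self.outbox = outbox     # hypothetical stand-in for the notification handler

    def raise_fault(self, fault_id):
        """Fault-causing MO: store the identity, send a primary alarm, coordinate."""
        self.fault_id = fault_id
        self.outbox.append(("primary", self.name, fault_id))
        for dep in self.dependents:
            dep.coordinate(fault_id)

    def coordinate(self, fault_id):
        """Dependent MO: store the same identity and pass the request on."""
        if self.fault_id == fault_id:
            return               # already coordinated for this fault
        self.fault_id = fault_id
        self.outbox.append(("secondary", self.name, fault_id))
        for dep in self.dependents:
            dep.coordinate(fault_id)


def group_by_fault(alarms):
    """What a receiver can do once every alarm carries the same identity."""
    grouped = defaultdict(list)
    for kind, name, fault_id in alarms:
        grouped[fault_id].append((kind, name))
    return dict(grouped)


alarms = []
a, b, c1, c2 = (MO(n, alarms) for n in ("A", "B", "C1", "C2"))
a.dependents = [b]
b.dependents = [c1, c2]      # B has multiple lines of dependency
a.raise_fault("A/1")
print(group_by_fault(alarms))
```

Because every notification carries the same identity, the receiver can attribute all four alarms to the single fault in A without any model of the network.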
As discussed above, the need for fault localization arises when the self
test of an MO determines that the fault is outside of that object. The
fault is then known to exist in one or more of that object's servers, but
a specific server is unknown. There are two ways in which to locate the
faulty server. The two methods may be used independently, or the methods
may be combined.
The first method is a "time-out" method of localizing a faulty MO, and is
also illustrated by FIG. 4. As a design specification, any MO in the
electrical system that develops a fault for any reason must detect that
fault within a specified time, τ_1. The time for propagation of a
fault identification message from server objects to a particular MO is
designated by K, and may be estimated. If an MO 20 performs a self test
and determines that the problem is outside the object, then a propagated
secondary blocking (coordination request) 21 will arrive within a certain
time period τ_2, where τ_2 = τ_1 + K. Therefore the
system waits for a time τ_2, and if no propagation has arrived at
the MO 20, the system has "timed out," and the affected MO 20 generates an
alarm notification. The alarm notification may not point out the specific
faulty server object 18, but it indicates that the fault resides in a
server object 18, 19 for this particular MO 20.
Alternatively, if no server objects 18, 19 for this MO 20 report a
fault, the secondary alarm notification 23 from this MO may be seen by
the operation support system as an indication that this MO's self test is
insufficient, and that a primary alarm should be sent. If, however, the
propagation 21 arrives within the stipulated time, τ_2, it may
continue through the hierarchy and result in an appropriate alarm
notification in due time.
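The time-out rule reduces to a small calculation, sketched here in Python. The function names, the use of seconds as the unit, and the treatment of an arrival exactly at the deadline as being in time are assumptions of this sketch.

```python
def wait_budget(tau_1, k):
    """Time to wait for a coordination request: tau_2 = tau_1 + K."""
    return tau_1 + k


def timeout_decision(tau_1, k, coordination_arrival=None):
    """Decide what an MO does after its self test points outside the object.

    coordination_arrival is the time at which a coordination request
    arrived, measured from the moment of detection, or None if no
    request arrived at all.
    """
    tau_2 = wait_budget(tau_1, k)
    if coordination_arrival is not None and coordination_arrival <= tau_2:
        return "coordinated"     # propagation continues through the hierarchy
    return "timed_out"           # the MO generates its own alarm notification


print(wait_budget(2.0, 0.5))
print(timeout_decision(2.0, 0.5, coordination_arrival=1.0))
print(timeout_decision(2.0, 0.5))
```

In the timed-out case the resulting alarm indicates only that the fault lies in some server of the MO, not which one.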
The second method of isolating which server object is faulty is through a
recursive fault localization scheme as shown in FIG. 6. By using the same
relationship references 16 as in the coordination case to communicate
between objects, the faulty server 18 can be located by implementing a
recursive localize propagation 41 requesting the next MO up the chain of
MOs 40 to perform a functional check and report its operability. Starting
with the fault-detecting MO 42, each object 43 in the chain 40 performs a
self test, and if it finds itself affected but not faulty, it questions
the next object in the chain.
FIG. 7 is a block diagram illustrating a representative chain of MOs 50
along with their dependency relationships 16 and propagation of a localize
function 41. The propagation 41 starts with the fault-detecting MO 51 and
proceeds along the "Is Dependent Upon" relationship 16 to server MOs 52
and 18. Eventually there is an object 18 that may be pointed out as faulty
either by the answer from the object's self test or if the next object up
the chain is not affected. The fault-causing object 18 usually detects the
fault itself, but there are cases where other objects detect symptoms
prior to the causing object. Fault localization takes care of these
situations as well as the case when several objects detect symptoms at the
same time.
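The recursive scheme of FIGS. 6 and 7 can be sketched in Python as follows. The MO class, the three self-test outcomes "ok", "affected", and "faulty", and the function name localize are assumptions of this sketch; in the disclosed system the self test is object-specific and the request travels as a message between MOs rather than as a function call.

```python
class MO:
    def __init__(self, name, status, servers=()):
        self.name = name
        self.status = status           # assumed self-test outcome
        self.servers = list(servers)   # objects this MO is dependent upon


def localize(mo):
    """Return the MO pointed out as faulty, or None if mo is not affected."""
    if mo.status == "ok":
        return None                    # fully operational
    if mo.status == "faulty":
        return mo                      # the self test found an internal fault
    # Affected but not faulty: question the next objects up the chain.
    for server in mo.servers:
        culprit = localize(server)
        if culprit is not None:
            return culprit
    # No server is affected, so this object is pointed out as faulty.
    return mo


a = MO("A", "faulty")
b = MO("B", "affected", [a])
c = MO("C", "affected", [b])
print(localize(c).name)

# Case where the self test of Y finds nothing but its only server is healthy.
x = MO("X", "ok")
y = MO("Y", "affected", [x])
print(localize(y).name)
```

The second case corresponds to an object being pointed out as faulty because the next object up the chain is not affected, even though its own self test found nothing.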
Several interfaces operate within the Fault Management Support (FMS) system to
coordinate the activities of the fault handling support function of the
FMS. The following interfaces are present:
1. Fault Handling Management Interface (FH-MI). This interface between
fault handling support and the operation support system (OSS) provides the
capability for an operator to configure the fault handling support
function. Through the FH-MI interface, the MOs may be configured to stop
the sending of secondary alarm notifications. The contents of some alarm
notifications may also be changed.
2. Fault Handling Object Programmer's Interface (FH-OPI). This interface is
used to indicate that a fault/fault symptom has been detected or that a
fault situation has been cleared. By using this interface, primary and
secondary alarms are coordinated, and an unequivocal view of the alarm
situation is provided at the management level. Support for collecting
alarm event information is provided in order to report alarm events with
consistent contents and format.
3. Fault Handling--Propagation Interface (FH-PropI). This interface is used
to send requests for localization and coordination between MOs. The
purpose of the messages is to locate the faulty MO and to coordinate MOs
affected by the same fault.
Fault handling support performs the functions of alarm coordination, fault
localization, and alarm information packing. The fault handling function
is primarily aimed at supporting the implementation of interface functions
with the operation support system; it is not supporting object-specific
internal fault handling. Fault handling support provides general support
for all types of managed objects, and may be used for both hardware and
software faults.
The Fault Handling--Propagation Interface (FH-PropI), as noted above, is
used to send requests for localization and coordination between MOs. The
purpose of the messages is to locate the faulty MO and to coordinate MOs
affected by the same fault. FH-PropI uses four messages to accomplish its
functions: Localize, Coordinate, Not Faulty, and Clear Fault. The Localize
message 41 (FIG. 6) is sent from a dependent MO to a server MO, and
directs the server MO to take part in a localizing activity aimed at
finding a faulty MO. The sending MO is not fully operational, but does not
consider itself as being faulty. The response to a Localize message is
either a Coordinate or a Not Faulty message.
The Coordinate message 21 (FIG. 4) is sent from a server MO 18 to a
dependent MO 19, 20, and tells the receiving MO that the faulty MO 18 is
found and, if the receiving MO's operability is affected, it should be
coordinated with other affected MOs. The MOs are coordinated by storing
the same fault identification 17 in each. The fault identification is
included as an argument in the Coordinate message 21. There is no
responding message to a Coordinate message.
The Not Faulty message is also sent by a server MO to a dependent MO in
response to a Localize message 41. It tells the receiving MO that the
sending MO is fully operational and is not affected by any fault. There is
no responding message to a Not Faulty message.
The Clear Fault message is sent by a server MO to a dependent MO when the
sending MO is fully operational after having been affected by a fault.
There is no responding message to a Clear Fault message.
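The four FH-PropI messages and their request/response pattern can be summarized in a short Python sketch. The enumeration Msg and the helper functions are hypothetical; the sketch shows only the reply a server gives once the outcome of its own check is known, and leaves out any further Localize requests the server may send to its own servers.

```python
from enum import Enum


class Msg(Enum):
    LOCALIZE = "Localize"          # dependent MO to server MO
    COORDINATE = "Coordinate"      # server MO to dependent MO
    NOT_FAULTY = "Not Faulty"      # server MO to dependent MO
    CLEAR_FAULT = "Clear Fault"    # server MO to dependent MO


def reply_to_localize(server_is_operational, fault_id=None):
    """Reply of a server MO to a Localize message."""
    if server_is_operational:
        return (Msg.NOT_FAULTY, None)
    # The fault identification travels as an argument of Coordinate.
    return (Msg.COORDINATE, fault_id)


def expects_reply(msg):
    """Only Localize has a responding message."""
    return msg is Msg.LOCALIZE


print(reply_to_localize(True))
print(reply_to_localize(False, fault_id="A/1"))
print([m.value for m in Msg if expects_reply(m)])
```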
Fault handling support may be sub-divided into the following parts:
1. FHSupport Program. This program implements the functions of localization
of faulty MOs, coordination of affected MOs, updating of the fault state,
and sending of alarm notifications. The FHSupport Program is described in
more detail in conjunction with FIG. 11 below.
2. FHSupport Propagations. This program implements propagation services for
the coordinate and the localize functions. It utilizes the interface
FH-PropI to send and receive fault messages between related MOs. The
FHSupport Propagations program is described in more detail in conjunction
with FIG. 16 below.
3. FHSupport State Handler. This program stores fault status information
and reflects this status in the MO state. It combines MO-specific status
information with general MO status information required from all MOs. The
FHSupport State Handler program is described in more detail in conjunction
with FIG. 17 below.
4. FHSupport Alarm Handler. This program collects the information needed
for alarm notifications, and sends the notifications to a notification
handler.
The fault-causing object usually detects the fault itself, but there are
cases where other objects detect symptoms prior to the causing object.
Fault localization takes care of these situations as well as the case when
several objects detect symptoms at the same time. Faults in the electrical
system are always automatically and unequivocally localized to a single
replacement hardware or software unit.
The alarm information packing function collects the error detection
information needed to create an alarm notification. Alarm notifications
include a unique fault identity which enables the faulty MO to be
localized and repaired. The information included in the fault identity is
referred to as "attributes," and may be specified in a list of attributes
supplied by the fault handling support function. The attributes defined by
fault handling support are Event Type, Probable Cause, Severity, Threshold
Info, Proposed Repair Action, Problem Text, and Problem Comment. Event
Type, Probable Cause, and Severity are mandatory attributes which must be
reported by each MO; the rest are optional. Additional attributes may be
programmed in a specific MO to complement this list. All alarm
notifications created or utilized by the present invention comply with the
specifications of the CCITT and ANSI T1M1 standards for fault management
and managed object modelling.
The mandatory attribute Event Type tells what type of alarm event was
detected, and may comprise one of the following sub-attributes:
Communication Alarm, Quality of Service Alarm, Processing Error Alarm,
Equipment Alarm, or Environmental Alarm.
The mandatory attribute Probable Cause is used to further refine Event
Type, and may comprise one of the following sub-attributes: Loss of
Signal, Framing Error, Local Transmission Error, Remote Transmission
Error, Call Establishment Error, Response Time Excessive, Queue Size
Excessive, Bandwidth Reduced, Retransmission Rate Excessive, Reduced
Reliability, Storage Capacity Problem, Version Mismatch, Corrupt Data, CPU
Cycles Limit Exceeded, Software Error, Out of Memory, Power Problem,
Timing Problem, Trunk Card Problem, Line Card Problem, Processor Problem,
Terminal Problem, Data Set Problem, External Interface Device Problem,
Multiplexer Problem, or Switch Problem.
The mandatory attribute Severity is used to indicate the importance of an
alarm event, and may comprise one of the following sub-attributes:
Indeterminate, Critical, Major, Minor, Warning, or Clear.
The optional attribute Threshold Info contains information when the
detection is the result of a threshold crossing, and may indicate whether
an upper or a lower threshold value was crossed. The optional attribute
Proposed Repair Action is used when the system is able to suggest a
solution. The optional attribute Problem Text provides a free-form text
description of the problem detected.
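The attribute list can be pictured as a record with three mandatory and four optional fields. The following Python sketch is illustrative only: the class name AlarmInfo, the field names, and the choice to validate Event Type and Severity against the values listed above are assumptions, not part of the disclosed fault handling support.

```python
from dataclasses import dataclass
from typing import Optional

EVENT_TYPES = {
    "Communication Alarm", "Quality of Service Alarm",
    "Processing Error Alarm", "Equipment Alarm", "Environmental Alarm",
}
SEVERITIES = {"Indeterminate", "Critical", "Major", "Minor", "Warning", "Clear"}


@dataclass
class AlarmInfo:
    # Mandatory attributes, reported by every MO.
    event_type: str
    probable_cause: str
    severity: str
    # Optional attributes.
    threshold_info: Optional[str] = None
    proposed_repair_action: Optional[str] = None
    problem_text: Optional[str] = None
    problem_comment: Optional[str] = None

    def __post_init__(self):
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown Event Type: {self.event_type}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown Severity: {self.severity}")


info = AlarmInfo(
    event_type="Communication Alarm",
    probable_cause="Loss of Signal",
    severity="Major",
    problem_text="No signal on the supervised link",
)
print(info.severity, info.proposed_repair_action)
```

Probable Cause is left unvalidated here only to keep the sketch short; the same check could be applied to its listed values.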
Error detections are modelled in the MO that represents the supervised
resource, not the supervising resource. For example, when a computer
hardware unit is executing software that supervises a signalling link, a
failure of the link is reported as an error detection in the link MO, not
in the software or hardware MOs.
The operation support system receives the alarm notifications and uses them
for purposes such as:
1. Initiating repair actions such as replacing the faulty equipment
indicated in the alarm notification;
2. Initiating network reconfiguration. If an alarm notification indicates a
malfunctioning line of communication, the management system may change
routing information in the network in order to bypass the faulty line; and
3. Building a database. The management system may be designed to merely
record the status of the various communications lines for future analysis.
The alarm coordination components of