FIELD OF THE INVENTION
The present invention relates to a system for testing electronic system design and, specifically, to an artificial intelligence system for system level failure analysis.
BACKGROUND OF THE INVENTION
Changing technology in the electronics field is driving the design of avionics systems, as well as other electronics applications, towards increased performance, functionality, and sophistication. As avionics systems become more complex, it
becomes more difficult to analyze an entire system to provide a clear perspective of system behavior in the presence of faults. This difficulty in analyzing system behavior affects the areas of system design, testing, and maintenance.
In avionics systems, failure propagation and fault isolation are two important aspects of system behavior. Failure propagation refers to the effect on overall system performance of a system component failure. Fault isolation refers to the
process of locating a failed system component. A failed component normally creates a failure indication on the flight deck of an aircraft, commonly referred to as a flight deck effect. The flight deck effect is thus the clue to the faulty component.
These same types of failure analyses must be performed in other electronics systems that share certain common characteristics with avionics systems.
Avionics systems are designed as and built from subsystems. The subsystems are interconnected by signal carrying buses, connected to power sources, etc. Most subsystems contain one or more Line Replaceable Unit(s) (LRU). By design, each LRU
performs a specific task and is engineered to meet specific specification requirements.
Avionics subsystems were at one time relatively simple and tended to function somewhat independently of other subsystems. System operational characteristics and behavior in the event of equipment failures could be analyzed by a few engineers
using available analytical and test tools. With the advent of digital avionics, subsystems have become more complex, more integrated with, and more inter-dependent upon one another. The engineering resources required to analyze and test these
integrated systems are far greater than were required for the earlier, simpler systems.
One of the complex and challenging tasks facing modern avionics design/development engineers is that of defining all significant effects on the avionics system of one or more equipment failures. A comprehensive analysis requires the
consideration of multiple avionics system operating configurations. Massive software simulation and/or hardware test systems are often developed to assist engineers in performing the required design and system validation analyses. The software
simulation and hardware test systems are essentially an attempt to duplicate the entire avionics system. In order to analyze such a software simulation system in real-time, one or more large mainframe computers are required. Generally, such simulation
systems utilize conventional programming languages, e.g., FORTRAN or PASCAL. The use of these languages contributes to the immense size of the simulation systems.
One of the tasks for which these software simulations or test systems may be used is to assess failure effects of simulated system faults. These simulation and test systems take a long time to develop and the results they provide may come late
in a development program when indicated system changes are costly to make.
When an avionics system design is completed, all of the development information and performance requirement information are integrated into a package of avionics system documentation that describes an "as-built" system. The documentation
includes LRU specifications, signal bus connections, power source information, etc. The avionics system documentation is later used as a reference tool by the maintenance crews responsible for the maintenance and repair of the avionics system. Such
comprehensive system documentation is not available until the system is completed, i.e., until LRU designs and interconnections are finalized.
Modern avionics systems are designed to identify to the maintenance crew the faulty LRU(s) in the event of equipment failure(s). Many subsystems include self-diagnostic capabilities. An LRU fault may be flagged by a light on the LRU itself, or
an indication on an overall system operation indicator. Sometimes, replacing the identified LRU does not clear the fault indication. When this occurs it may be necessary for the maintenance crew to manually trace and isolate the fault using test
equipment and system design information provided in the maintenance manuals. Fault isolation may also be performed when no fault identification is provided by the avionics system but a fault is otherwise apparent.
During a fault isolation analysis, the failure response, i.e., flight deck effect, is traced "backwards" to a source LRU that may have caused the flight deck effect to be generated. During the analysis, the aircraft operating configuration at
the time the flight deck effect was generated must be taken into account when the system analyst refers to the system documentation. As with the failure propagation simulation, manually isolating faults may be a time-consuming and difficult process.
Certain characteristics that are inherent in complex avionics systems have heretofore precluded efficient comprehensive analysis of system failure response. These characteristics include: the existence of multiple component levels in a system;
the use of redundant signal sources; LRU behavior response based on system mode and external conditions; and the need for extremely fast system response to failures, generally on the order of tenths of seconds. These characteristics produce a system
that is difficult to physically test and to model for design, testing and maintenance purposes. Other complex electronic systems including some or all of these characteristics suffer from the same testing and modeling problems as avionics systems.
The effect of multiple component levels on failure analysis is illustrated by the fact that an indication that an LRU has failed may be generated due to a fault elsewhere in the system. The LRU, or a signal output by the LRU, may fail because of
a power failure to the LRU, an input bus physical failure, an input signal failure, a failed LRU component, etc. The system complexity suggests that if an LRU is only suspected of being in a failed state, it is preferable to first determine the exact
source of the failure rather than immediately replacing the LRU in an attempt to rectify the problem.
One reason for the signal source redundancies in avionics systems is the high level of functional availability required in aircraft operation. As an example of the use of signal source redundancy, important LRU signals are usually generated by
more than one source, e.g., a left and right source. The LRUs that receive multiple signals must be provided with a way of selecting an initial signal source, determining when the signal source has failed, and selecting an alternate source. In
instances where there are more than two signal sources, e.g., left, center, and right sources, a hierarchy of source selection is required. Source selection may be automatically controlled by hardware or software that is a part of an LRU or the avionics
system. Additionally, for certain subsystems, the source selection is performed manually by the flight crew in response to a flight deck effect. In order to comprehensively test avionics system behavior, it is necessary to test the system response in a
variety of operating configurations that are each defined by a set of initial source selections.
An LRU's behavior may be highly dependent upon system mode and external conditions. The system mode describes the aircraft's flight trajectory. The trajectory includes take-off, climb, cruise, descend, land, and go-around modes. Each mode is
further described by pitch, roll, and throttle, i.e., power, characteristics. External conditions include altitude, speed, etc. Thus, besides having certain behavioral characteristics based on input source selection, an LRU's behavior may also be
contingent on the system mode and/or external conditions.
Present avionics design systems and automated test equipment usually do not provide tools for testing the effects of LRU design modifications on the overall system prior to system completion. Further, once an avionics system design is completed,
it is a time consuming and tedious process to simulate the propagation of a failure or to isolate a fault in the avionics system using system documentation and test equipment. Until now there were no simulation or test systems that could satisfy in a
timely manner the needs of both design/development engineers and airline maintenance personnel in the area of failure analyses. The present invention is directed toward overcoming the problems in design, testing, and maintenance of avionics systems as
described above, as well as other problems in the prior art.
SUMMARY OF THE INVENTION
The failure analysis system of the present invention provides a method and apparatus for performing failure analysis on a simulated electronic system. The electronic system to be simulated and analyzed is composed of subsystems and can be
described in terms of subsystem interconnection, mode selection, subsystem input signal source selection, and subsystem failure response indications. A characteristic of the electronic system is that at least two of the subsystems are related in that
they transmit redundant output signals to a destination subsystem. For each multi-source subsystem, the electronic system must include at least one switching mechanism for selecting one of the redundant subsystems as the primary input signal source.
Further, the electronic system to be simulated and analyzed has one or more system modes. The operating configuration of the electronic system depends on the selection of the input signal source(s) and/or the system mode. The failure analysis system
utilizes a knowledge base including data that defines a variety of operating configurations for the electronic system.
In accordance with the present invention, the failure analysis system obtains simulation condition data from a system analyst in order to identify the electronic system operating configuration that is to be analyzed. The failure analysis system
also receives failure data, which identifies the failure to be analyzed. The failure analysis system simulates the propagation of the failure's effect through the electronic system, by analyzing the failure data via the knowledge base in accordance with
the simulation condition data. If the knowledge base describes an as-built electronic system, the failure analysis system simulates the behavior of the electronic system in the operating configuration identified by the simulation condition data. By
altering the simulation condition data, a number of operating configurations can be tested. Alternatively, during the design of the electronic system, the knowledge base can be modified to reflect modifications to the electronic system design. The
failure analysis system then simulates the propagation of a failure's effect through an electronic system built according to the modified knowledge base having the specific operating configuration described by the simulation condition data.
In accordance with further aspects of the present invention, the simulation condition data includes source selection data identifying a switching mechanism selection.
In accordance with additional aspects of the present invention, the simulation condition data includes mode selection data identifying a system mode of the electronic system.
In accordance with other aspects of the present invention, the failure analysis system is integrated into a processor including a central processing unit, a memory, and input and output devices. The failure analysis system comprises a
knowledge base, a user interface and a failure analysis means. The knowledge base describes the electronic system in a variety of operating configurations related to the status of one or more source selection switching mechanisms and/or the system mode. The knowledge base includes subsystem interconnection data and rule data. The rule data includes subsystem failure rules and source selection rules that describe the subsystem failure responses and input source selection logic. The rule data includes
operating configuration contingencies. The user interface receives simulation condition data and failure data from a system analyst or other data source. The failure analysis system identifies, from the interconnection data, which destination
subsystems are directly connected to the failed subsystem and determines, from the rule data, each destination subsystem's response to the failure. If the destination subsystem's response is to fail, then that destination subsystem is treated as the
next failed subsystem. The steps of identifying the destination subsystems and determining the destination subsystems' responses are repeated until all of the failure paths are considered. The result of the simulation is a set of subsystem failure
responses that would occur in the electronic system if the initial subsystem failure actually occurred.
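By way of illustration only, the repeated identify-and-respond steps described above may be sketched in Python as follows. The interconnection data, the rule table, and the names PATHS, RULES, and propagate are hypothetical and are not taken from any actual embodiment; operating configuration contingencies are omitted for brevity.

```python
from collections import deque

# Hypothetical interconnection data: (signal, source subsystem, destination subsystem).
PATHS = [
    ("signal-1", "LRU-1", "LRU-2"),
    ("signal-2", "LRU-2", "LRU-3"),
    ("signal-3", "LRU-2", "LRU-4"),
]

# Hypothetical failure rules: (destination, failed input signal) -> response.
# The response "fail" means the destination itself fails; any other text
# is a lesser response that does not propagate further.
RULES = {
    ("LRU-2", "signal-1"): "fail",
    ("LRU-3", "signal-2"): "fail",
    ("LRU-4", "signal-3"): "display warning flag",
}


def propagate(initial_failure):
    """Return the (subsystem, response) pairs caused by one initial failure."""
    responses = []
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        source = queue.popleft()
        # Identify the destinations directly connected to the failed subsystem.
        for signal, src, dest in PATHS:
            if src != source:
                continue
            response = RULES.get((dest, signal))
            if response is None:
                continue  # no rule covers this input failure
            responses.append((dest, response))
            # A destination that fails is treated as the next failed subsystem.
            if response == "fail" and dest not in failed:
                failed.add(dest)
                queue.append(dest)
    return responses
```

In this sketch, a call such as propagate("LRU-1") walks every failure path once; the set of already failed subsystems prevents a loop in the interconnection data from being revisited.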
In accordance with other aspects of the present invention, the failure analysis means further includes means for predicting the source of a failure response by simulating the isolation of a fault in an electronic system that is described by the
knowledge base data in accordance with the simulation condition data. The user interface receives failure response data from the system analyst or other data source. The failure response data identifies a failure. Each source subsystem which may have
generated the failure response is identified by referencing the interconnection data. Each of the failure rules that may have caused a source subsystem to generate the failure response is also identified. The responses identified by the failure rules
are then treated as the next failure responses. The steps of identifying possible source subsystems and related failure rules are repeated for each next failure response until all failure source subsystems are identified. The failure analysis means
produces a list of the possible source(s) of the initial failure response in an electronic system in an operating configuration that is described by the integration of the knowledge base with the simulation condition data.
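As a hedged illustration of the backward tracing described above, the following Python sketch starts from a subsystem exhibiting a failure response and collects every upstream subsystem whose failure could explain it. The data and the names PATHS, RULES, and isolate are assumptions made for the example, and simulation condition data is not modeled.

```python
# Hypothetical interconnection data: (signal, source subsystem, destination subsystem).
PATHS = [
    ("signal-1", "LRU-1", "LRU-2"),
    ("signal-2", "LRU-2", "LRU-3"),
    ("signal-4", "LRU-5", "LRU-3"),
]

# Hypothetical failure rules, reduced to the (subsystem, failed input signal)
# pairs for which the subsystem generates a failure response.
RULES = {
    ("LRU-2", "signal-1"),
    ("LRU-3", "signal-2"),
    ("LRU-3", "signal-4"),
}


def isolate(subsystem):
    """Return the upstream subsystems that may have caused the failure response."""
    suspects = []
    seen = {subsystem}
    pending = [subsystem]
    while pending:
        dest = pending.pop()
        for signal, src, path_dest in PATHS:
            # A source is a suspect only if a failure rule links the loss of
            # its signal to a failure response of the destination.
            if path_dest == dest and (dest, signal) in RULES and src not in seen:
                seen.add(src)
                suspects.append(src)
                pending.append(src)  # treated as the next failure response
    return suspects
```

With the sample data, isolate("LRU-3") reports LRU-2 and LRU-5 as direct suspects and LRU-1 as an indirect suspect reached through LRU-2.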
In accordance with additional aspects of the present invention, the electronic system is an avionics system and the subsystems contain Line Replaceable Unit(s) (LRU). Additionally, the failure analysis means includes expert system means for
controlling the failure propagation simulation and fault isolation analysis, and an artificial intelligence shell for interpreting the expert system means via the central processing unit. The failure analysis means is dynamic, allowing early
implementation of the test system, i.e., with only partial electronic system design information. Further, it is a straightforward task to keep the failure analysis system current with the electronic system development.
In accordance with still further aspects of the present invention, the operating configuration is also related to external conditions and the simulation condition data includes data describing external conditions such as airspeed, altitude, etc.
The failure analysis system of the present invention is an electronic system development tool that is used to identify potential subsystem design and interconnection problems as the overall system is being developed. One reason the system is
efficient is that it is designed for the specific purpose of analyzing system behavior. The system allows simulated failure effect propagation through an electronic system to be readily analyzed in a variety of operating configurations. This
approach to performing system failure analyses, beginning early in the design cycle, yields substantial benefits to system developers. The failure analysis system is also a useful maintenance and troubleshooting tool since it is capable of simulating
failure effect propagation and performing fault isolation analysis in completed electronic systems by establishing a knowledge base from the as-built system documentation and taking into account actual source selections, system mode, and external
conditions during the simulation. Further, for avionics systems, the failure analysis system generates the LRU failure information, e.g., a certification failure analysis deliverable, that is required to obtain FAA certification.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a preferred embodiment of a failure analysis system formed in accordance with the present invention;
FIG. 2 is a screen presentation of the simulation conditions that are established by the embodiment of the present invention illustrated in FIG. 1;
FIGS. 3A and 3B are screen presentations of the LRU failure rules and source selection rules that are established by the embodiment of the invention illustrated in FIG. 1;
FIG. 4 is a schematic diagram of a subsystem failure effect propagation scenario;
FIG. 5 is a flow diagram of a simulated failure effect propagation in accordance with the present invention; and
FIG. 6 is a flow diagram of a fault isolation analysis in accordance with the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
With reference to FIG. 1, the failure analysis system 10 of the present invention includes a knowledge base 12, a user interface 14 and a failure analysis engine 16. For ease of description, the failure analysis system will be described in terms
of an avionics system. However, it is to be understood that a failure analysis system formed in accordance with this invention is also suitable for use with other electronic systems that exhibit certain operating characteristics related to subsystem
connectivity, redundant signal source availability, system mode, external operating conditions, and temporal constraints on operation.
An avionics system's operating configurations are generally defined by the knowledge base 12. Taken as a whole, the knowledge base describes a range of avionics system physical configurations and behavioral characteristics for a single aircraft.
To identify a specific operating configuration for a simulation, simulation condition parameters are received by the user interface 14 from a system analyst or other source. The system analyst also identifies the type of simulation to be performed,
i.e., failure propagation or fault isolation, and selects the component failure or failure response to be analyzed via the user interface.
During the simulation of a failure effect propagation, the failure analysis engine 16 simulates the propagation of the initial failure's effect through an avionics system described by the knowledge base and the simulation condition parameters.
The result is a list of failures that would result from the initial failure. Additionally, a list of flight deck effects, which are the failure or warning indications that are presented to the flight crew via the flight deck, is generated. During a
fault isolation analysis, the failure analysis engine processes the selected failure response "backwards" through the knowledge base to simulate a fault isolation analysis in an avionics system described by the knowledge base and the simulation condition
parameters. The procedure results in a list of possible failure sources, i.e., faulty components, the failure of which may have caused the flight deck effect to be generated. Because the knowledge base along with the simulation condition parameters
represents a specific avionics system operating configuration, by altering the knowledge base and/or the simulation condition parameters, a different avionics system operating configuration is defined and can be analyzed.
In one actual embodiment of the invention, the failure analysis system was hosted on an APOLLO workstation. The workstation included a central processing unit, memory, an output screen, and an input keyboard and/or mouse. In one embodiment, the
failure analysis system is stored in the workstation memory. Alternatively, the failure analysis system may be downloaded from another source that is connected to the workstation.
The knowledge base 12 includes fact data and rule data that describe the avionics system and its behavior. In one embodiment, the fact data is subsystem interconnection data representing the physical interconnections between the Line Replaceable
Units (LRU), i.e., path data, and the signal transfer between the LRUs, i.e., signal data. In one actual embodiment of the failure analysis system, the subsystem interconnection data was extracted from avionics system documentation describing LRU hard
wire interconnections, an electrical power distribution database, and bus switching information included in a standard wiring diagram manual.
The rule data includes LRU failure and source selection rules. LRU failure rules describe how a given LRU will respond, in a specific operating configuration, if an input signal to the LRU fails. The LRU may totally or partially fail thereby
affecting the status of the LRU's output signals. A simple example of a failure rule is "if signal-1 on bus-1 fails, then LRU-2 fails." The failure rule data also describes the results of an LRU failure, i.e., the subsequent condition of the failed
LRU's output signals. Since the failure response of an LRU is considered a part of its function and is a design specification element, a major portion of the LRU failure response rules are extracted from LRU design specification information or from LRU
requirements documents. Additional information related to an LRU response is obtained from the individual design engineer(s) responsible for the specific LRU design.
The source selection rules define how each LRU chooses its input data given the operating configuration and the status of its data sources. The source selection rules describe source switching that occurs at the signal level and is software
driven. Typically, an LRU monitors input from a data source for signal validity. If the LRU determines that the input signal is invalid, its software control logic selects an alternate data source, if one is available. A simple example of a source
select rule is "if signal-1 on bus-1 fails, then LRU-2 obtains signal-1 from bus-2." Typically, each LRU employs a unique source selection scheme. Thus, unique source selection rules are established for each LRU in the system. In one actual embodiment,
the source selection rules are extracted from automatic Source Selection and Failure Detection (SSFD) logic that is included in the avionics system documentation.
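The example source select rule given above may be illustrated, under stated assumptions, by the following Python sketch. The preference table SOURCE_ORDER and the function select_source are hypothetical; actual SSFD logic is unique to each LRU and is not reproduced here.

```python
# Hypothetical preference order of buses for each (LRU, signal) pair.
SOURCE_ORDER = {
    ("LRU-2", "signal-1"): ["bus-1", "bus-2"],
}


def select_source(lru, signal, failed_buses):
    """Return the first bus in the LRU's preference order that has not failed."""
    for bus in SOURCE_ORDER.get((lru, signal), []):
        if bus not in failed_buses:
            return bus
    return None  # no valid source remains for this signal
```

When bus-1 is in the failed set, select_source("LRU-2", "signal-1", {"bus-1"}) returns "bus-2", which corresponds to the rule "if signal-1 on bus-1 fails, then LRU-2 obtains signal-1 from bus-2." A longer list would model a left, center, and right hierarchy in the same way.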
One benefit of the present invention is the inclusion in the rule data of operating configuration contingencies, i.e., simulation conditions. An operating configuration contingency appears in the "if" statement of each rule and is referred to as
the simulation condition code. Conceptually, each rule tests "if the simulation condition parameters entered by the system analyst match the simulation condition parameters in the rule, then . . . ." In other words, a rule is only considered pertinent
to a specific failure analysis if it is a rule that describes LRU behavior in the operating configuration described by the simulation condition parameters.
The user interface 14 includes a set-up module and a failure analysis module. The set-up module allows the system analyst to assign values or statuses to simulation condition parameters. The failure analysis module allows the system analyst to
select the type of simulation to be carried out and to identify a failed component or a failure response. In one actual embodiment, if failure propagation is to be simulated, the system analyst also inputs the desired level of failure propagation via
the failure analysis module.
In one embodiment, source selection, system mode, and external condition parameters make up the simulation condition parameters. The source selection parameters describe the initial LRU input source selection. For some LRUs, data source
selection is performed manually by the setting of flight deck source selection switches. A flight deck source selection switch drives a hardware relay that is internal or external to an LRU. Changing the position of a flight deck source selection
switch actually changes the path along which a signal travels. System mode parameters describe the aircraft's flight trajectory. System modes include: take-off, climb, cruise, descend, land, and go-around. In each mode, the flight trajectory is
further described by pitch, roll and autothrottle, i.e., power, parameters. The external condition parameters include speed and altitude.
With reference to FIG. 2, the set-up module provides the system analyst with a screen through which to enter the values or states of the simulation condition parameters. In one actual embodiment, the system analyst enters: aircraft
identification, Instrument Select (IS) switch position, Flight Management Computer (FMC) master switch position, and Flight Director (F/D) switch position, autoflight (system) mode selections, and external conditions values. The switch positions dictate
the data sources for specific LRUs at the beginning of the simulation. The autoflight mode selections dictate the system mode for the simulation. In one embodiment, if a parameter value is not entered by the system analyst, the user interface sets the
parameter to a default value.
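A minimal sketch of the default handling described above is given below in Python. The parameter names and default values are assumptions chosen for illustration; the text does not specify the actual defaults.

```python
# Hypothetical parameter names and default values (not specified in the text).
DEFAULTS = {
    "aircraft_id": "AIRCRAFT-1",
    "is_switch": "NORMAL",
    "fmc_master_switch": "LEFT",
    "fd_switch": "ON",
    "system_mode": "cruise",
    "altitude_ft": 0,
    "airspeed_kt": 0,
}


def set_up(entered):
    """Merge the analyst's entries over the defaults and return the result."""
    unknown = set(entered) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown simulation condition parameter(s): {sorted(unknown)}")
    conditions = dict(DEFAULTS)  # copy so the defaults are never modified
    conditions.update(entered)
    return conditions
```

Any parameter the system analyst leaves out keeps its default, so set_up({"system_mode": "climb"}) changes only the system mode.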
The failure analysis engine 16 includes an artificial intelligence (AI) shell and a failure analysis expert system. In one particular embodiment of the present invention, the AI shell comprises a commercially available PROLOG interpreter,
QUINTUS PROLOG. Other AI shells may also be used. Suitable examples and definitions of expert systems and related entities may be found in A Guide to Expert Systems by Donald A. Waterman, Addison-Wesley, 1986. The AI shell interprets or executes a
specific PROLOG program, e.g., the failure analysis expert system. Before describing the failure analysis expert system in detail, an introduction to PROLOG is provided.
PROLOG is an AI language that is based on first-order predicate logic. The language is a list manipulation language that works against a knowledge base including facts and rules. Each fact consists of a statement about objects and their
relationships. Each rule defines how to draw conclusions from given facts. The facts and rules are formatted in a manner such that it is sometimes difficult to distinguish them. For example, facts may be: ##EQU1## A rule might be: ##EQU2## In AI
terms, if a rule's criteria are matched then the rule is "fired." The firing of the rule generally generates a response such as the establishment of a related fact or simply an indication that the rule is "true." In the example, when the conditions of
the first line of the rule are met, the fact that some LRU, LRU-X, is a destination for a signal, signal Y, is established.
A system analyst may query a knowledge base. A sample PROLOG query is: ##EQU3## The expert system must determine whether the facts fit the rules in order to answer the query. In the present case, the expert system determines that according to
the rule, if LRU-2 is a destination for signal 1, then there must be an LRU-Z that is a source for signal 1 to LRU-2. The expert system then references each fact in turn to determine whether there is a fact that provides a suitable LRU-Z. The first fact
matches the information provided in the query. The value for LRU-Z is set to LRU-1 according to the fact. The rule is thus completed and fired. The firing of the rule results in the establishment of the fact that LRU-2 is a destination for signal 1.
The new fact can be used during further processing by the expert system.
A PROLOG expert system also includes a mechanism for responding to a query with a list of possible responses. When discrete values for one or more variables are not included in the query, the expert system response is a set of possible values
for the unknown variables that are extracted from the facts. For example, the query: ##EQU4## will return a list of all signals that are input signals for LRU-2 in the knowledge base. In the present example, the expert system again references the first
rule to determine the logic for LRU-to-signal relationships. Then, the facts are considered, one at a time, to determine which facts in the knowledge base match all of the information provided in the query. For each fact that matches, the unknown
variable information from the fact is presented to the system analyst. In the present query, the unknown variable is signal Y. Thus, the rule related to destination LRU's is considered in the form: ##EQU5## Applying the rule to the facts, the first fact
provides the information that LRU-2 is a destination for signal 1; the second fact provides the information that LRU-2 is a destination for signal 2. Thus, the system analyst is provided with the list response: ##EQU6## If the knowledge base included
other rules describing the logic for the relationship between LRU-2 and signals, each of the rules in turn would be analyzed in accordance with the facts. The process is repeated until all possible responses to the query, e.g., all rules and facts, are
considered.
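The two kinds of query discussed above, one with every value supplied and one with an unknown variable, may be approximated in Python as follows. This is only a sketch of the matching idea and not PROLOG itself; the facts shown are hypothetical stand-ins, and the name destination_signals is an assumption.

```python
# Hypothetical facts: (source LRU, signal, destination LRU).
FACTS = [
    ("LRU-1", "signal-1", "LRU-2"),
    ("LRU-3", "signal-2", "LRU-2"),
    ("LRU-2", "signal-3", "LRU-4"),
]


def destination_signals(lru=None, signal=None):
    """Return (LRU, signal) pairs for which the LRU is a destination of the signal.

    An argument left as None acts as an unknown variable and matches any value.
    """
    matches = []
    # Consider the facts one at a time, as the expert system does.
    for _source, fact_signal, fact_dest in FACTS:
        if lru is not None and fact_dest != lru:
            continue
        if signal is not None and fact_signal != signal:
            continue
        matches.append((fact_dest, fact_signal))
    return matches
```

Calling destination_signals("LRU-2", "signal-1") answers a yes-or-no style query with a single match, while destination_signals("LRU-2") leaves the signal unknown and returns every input signal of LRU-2 found in the facts.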
The failure analysis expert system of the present invention uses a combination of these types of queries. The failure analysis expert system is described in further detail in relationship to one actual knowledge base suitable for use with the
present invention.
With reference to TABLE 1, one actual knowledge base includes path, signal, LRU subrule, result, and source select data. TABLE 1 includes data type formats in italics and data samples in regular type. The LRU subrule and result data comprise
the LRU failure rule data.
TABLE 1
__________________________________________________________________________
path(bus ID, source LRU, destination LRU, switches)
    path(%OIL TEMP, FCC-B, AFC-AU, NONE);
    path(+28V DC DIM BUS, AIRCRAFT, EIS-P, NONE);
signal(source LRU, data ID)
    signal(CDU-L, MCDU-L-1-354);
    signal(DAA-L, DAA-L-1-006);
    signal(FCC-A, FCC-L-1-076);
subrule(LRU, rule no., sim. cond. code, failed ID list, condition, response code)
    subrule(IRU-L, -9999, NWA1/----------/-/--/---/--/--/--/--/-/-, IRU-L-i-1-999-00, c-NONE-X-003, r-IRU-L-999)
result(response code, text, data ID list, condition)
    result(r-ND-L-001, Remove MAP display and display MAP flag, IRU-L-A-3-320-00, c-NONE-X-002)
src-sel(LRU, rule no., sim. cond. code, alternate ID list, failed ID list, condition)
    src-sel(IFMC-L, -S047, NWA1/----------/-/--/---/--/--/--/--/-/-, [IRU-L-A-3-367-00, URU-L-A-3-336-00, IRU-L-A-3-310-00], □);
    src-sel(IFMC-L, -S050, NWA1/----------/-/--/---/--/--/--/--/-/-, [IRU-C-A-3-367-00, IRU-C-A-3-336-00], [IRU-L-A-3-367-00], IRU-L-A-3-366-00, IRU-L-A-3-310-00 [NCD, NO ACTIVITY])
__________________________________________________________________________
The failure analysis expert system identifies the data types by name. The data type constructions are used to format queries. For example, in order to find a list of destination LRUs for bus-1 and source LRU-1, regardless of switch status, a
query in PROLOG source code would be:
In English, the query means: generate a list of destination LRUs from path data that match bus-1 and LRU-1 in the bus identification and source LRU fields, respectively. Each query for a specific data type must include a value or a null
indicator in each field that is defined for the data type.
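The field-by-field form of such a query may be sketched in Python as shown below. The sketch is not the PROLOG query itself; the path records are hypothetical, and ANY and query_paths are illustrative names, with ANY standing in for the null indicator.

```python
ANY = None  # null indicator: the field may hold any value

# Hypothetical path data in the TABLE 1 field order:
# (bus ID, source LRU, destination LRU, switches)
PATHS = [
    ("bus-1", "LRU-1", "LRU-2", "NONE"),
    ("bus-1", "LRU-1", "LRU-3", "SW-1"),
    ("bus-2", "LRU-1", "LRU-4", "NONE"),
]


def query_paths(bus, source, destination, switches):
    """Return every path record matching the pattern; all four fields are required."""
    pattern = (bus, source, destination, switches)
    return [
        row
        for row in PATHS
        if all(want is ANY or want == have for want, have in zip(pattern, row))
    ]


# Destination LRUs for bus-1 and source LRU-1, regardless of switch status.
destinations = [row[2] for row in query_paths("bus-1", "LRU-1", ANY, ANY)]
```

Requiring all four arguments mirrors the requirement that each field of the data type carry either a value or a null indicator.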
Simulation condition parameters are included in the subrule and source selection rule data. The parameters appear in the rules as codes formed by concatenating the simulation condition parameters that must be true for the rule to be applicable.
For example, the parameters for the flight deck configuration illustrated in FIG. 2 are concatenated into the following code:
The parameters are separated in the code by a "/". If a segment of the code is filled with "-", then the value of the particular parameter is irrelevant to the rule. In order to fire a specific rule, the rule's simulation condition code must
match the simulation condition parameters that are selected for the simulation. By integrating the simulation condition code into the knowledge base rules, only the relevant portions of the knowledge base are used during a simulation. One benefit of
the present invention is that individual knowledge bases do not have to be established for each operating configuration.
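The matching of a rule's simulation condition code against the selected simulation condition parameters may be sketched in Python as follows. The three-segment codes in the usage note are hypothetical and much shorter than an actual code, and the treatment of a segment as irrelevant only when it consists entirely of "-" characters is an assumption drawn from the description above.

```python
def condition_matches(rule_code, selected_code):
    """Return True if a rule's condition code applies to the selected conditions."""
    rule_segments = rule_code.split("/")
    selected_segments = selected_code.split("/")
    if len(rule_segments) != len(selected_segments):
        return False  # codes with different layouts cannot match
    for rule_seg, selected_seg in zip(rule_segments, selected_segments):
        if rule_seg and set(rule_seg) == {"-"}:
            continue  # segment filled with "-": parameter is irrelevant to the rule
        if rule_seg != selected_seg:
            return False
    return True
```

For instance, condition_matches("AC1/-/--", "AC1/L/CR") is true because only the first segment is constrained, whereas condition_matches("AC1/R/--", "AC1/L/CR") is false because the second segment differs. A rule whose code does not match is simply never fired, which is how a single knowledge base can serve every operating configuration.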
With reference to FIGS. 3A and 3B, in one actual embodiment the user interface formats the LRU failure rules and the source selection rules into readable logical statements. The logical statements provide further explanation of how the rules are
applied. The data extracted from the rules appear in bold type in the logical statements. In order to obtain the display, the system analyst uses the user interface to select either an LRU failure rule or source selection rule, and then to identify the
LRU name, LRU location, and unique rule number. The user interface extracts the data from the knowledge base. If an LRU failure rule is selected, the user interface extracts the LRU subrule identified by the unique rule number and then extracts the
text and results for the response by matching the response code in the LRU subrule to the response code in the result data.
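The lookup that joins an LRU subrule to its result data through the response code may be illustrated by the following Python sketch. The records are hypothetical and reduced to the fields needed for the example, and the wording of the generated statement is an assumption rather than the format shown in FIGS. 3A and 3B.

```python
# Hypothetical subrule records keyed by (LRU, rule number).
SUBRULES = {
    ("LRU-2", "001"): {"failed_ids": ["signal-1"], "response_code": "r-LRU-2-001"},
}

# Hypothetical result records keyed by response code.
RESULTS = {
    "r-LRU-2-001": {"text": "LRU-2 fails", "data_ids": ["signal-2"]},
}


def describe_failure_rule(lru, rule_no):
    """Format one LRU failure rule as a readable logical statement."""
    subrule = SUBRULES[(lru, rule_no)]
    # Match the subrule's response code to the response code in the result data.
    result = RESULTS[subrule["response_code"]]
    failed = " and ".join(subrule["failed_ids"])
    affected = ", ".join(result["data_ids"])
    return f"IF {failed} fails THEN {result['text']} (affects {affected})"
```

With the sample records, describe_failure_rule("LRU-2", "001") produces the statement "IF signal-1 fails THEN LRU-2 fails (affects signal-2)".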
Before describing the failure effect propagation simulation of the failure analysis expert system, a diagrammatical example of the propagation of an LRU failure's effect is presented. With reference to FIG. 4, a partial avionics system is
illustrated, specifically a portion of the left side of the system. Thus, all of the LRUs illustrated have a right or some other counterpart system (not shown). An Air Data Computer-Left (ADC-L) failure's effect, given the simulation condition
parameters illustrated in FIG. 2, results in at least three o