|
Description  |
|
|
FIELD OF THE INVENTION
The present invention relates generally to data processing systems and,
more particularly, to the generation of Bayesian networks.
BACKGROUND OF THE INVENTION
The advent of artificial intelligence within computer science has brought
an abundance of decision-support systems. Decision-support systems are
computer systems in which decisions, typically rendered by humans, are
recommended and sometimes made. In creating decision-support systems,
computer scientists seek to provide decisions with the greatest possible
accuracy. Thus, computer scientists strive to create decision-support
systems that are equivalent to or more accurate than a human expert.
Applications of decision-support systems include medical diagnosis,
troubleshooting computer networks, or other systems wherein a decision is
based upon identifiable criteria.
One of the most promising new areas for research in decision-support
systems is Bayesian networks. A Bayesian network is a representation of
the probabilistic relationships among distinctions about the world. Each
distinction, sometimes called a variable, can take on one of a mutually
exclusive and exhaustive set of possible states. A Bayesian network is
expressed as an acyclic-directed graph where the variables correspond to
nodes and the relationships between the nodes correspond to arcs. FIG. 1
depicts an examplary Bayesian network 101. In FIG. 1 there are three
variables, X.sub.1, X.sub.2, and X.sub.3, which are represented by nodes
102, 106 and 110, respectively. This Bayesian network contains two arcs
104 and 108. Associated with each variable in a Bayesian network is a set
of probability distributions. Using conditional probability notation, the
set of probability distributions for a variable can be denoted by
p(x.sub.i.vertline..PI..sub.i, .xi.), where "p" refers to the probability
distribution, where ".PI..sub.i " denotes the parents of variable X.sub.i
and where ".xi." denotes the knowledge of the expert. The Greek letter
".xi." indicates that the Bayesian network reflects the knowledge of an
expert in a given field. Thus, this expression reads as follows: the
probability distribution for variable X.sub.i given the parents of X.sub.i
and the knowledge of the expert. For example, X.sub.1 is the parent of
X.sub.2. The probability distributions specify the strength of the
relationships between variables. For instance, if X.sub.1 has two states
(true and false), then associated with X.sub.1 is a single probability
distribution p(x.sub.1.vertline..xi.) and associated with X.sub.2 are two
probability distributions p(x.sub.2.vertline.x.sub.1 =t, .xi.) and
p(x.sub.2.vertline.x.sub.1 =f, .xi.). In the remainder of this
specification, .xi. is not specifically mentioned.
The arcs in a Bayesian network convey dependence between nodes. When there
is an arc between two nodes, the probability distribution of the first
node depends upon the value of the second node when the direction of the
arc points from the second node to the first node. For example, node 106
depends upon node 102. Therefore, nodes 102 and 106 are said to be
conditionally dependent. Missing arcs in a Bayesian network convey
conditional independencies. For example, node 102 and node 110 are
conditionally independent given node 106. However, two variables
indirectly connected through intermediate variables are conditionally
dependent given lack of knowledge of the values ("states" ) of the
intermediate variables. Therefore, if the value for node 106 is known,
node 102 and node 110 are conditionally dependent.
In other words, sets of variables X and Y are said to be conditionally
independent, given a set of variables Z, if the probability distribution
for X given Z does not depend on Y. If Z is empty, however, X and Y are
said to be "independent" as opposed to conditionally independent. If X and
Y are not conditionally independent, given Z, then X and Y are said to be
conditionally dependent given Z.
The variables used for each node may be of different types. Specifically,
variables may be of two types: discrete or continuous. A discrete variable
is a variable that has a finite or countable number of states, whereas a
continuous variable is a variable that has an uncountably infinite number
of states. All discrete variables considered in this specification have a
finite number of states. An example of a discrete variable is a Boolean
variable. Such a variable can assume only one of two states: "true" or
"false." An example of a continuous variable is a variable that may assume
any real value between -1 and 1. Discrete variables have an associated
probability distribution. Continuous variables, however, have an
associated probability density function ("density"). Where an event is a
set of possible outcomes, the density p(x) for a variable "x" and events
"a" and "b" is defined as:
##EQU1##
where p(a.ltoreq.x.ltoreq.b) is the probability that x lies between a and
b. Conventional systems for generating Bayesian networks cannot use
continuous variables in their nodes.
FIG. 2 depicts an example Bayesian network for troubleshooting automobile
problems. The Bayesian network of FIG. 2 contains many variables 202, 204,
206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, and
234, relating to whether an automobile will work properly, and arcs 236,
238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 268.
A few examples of the relationships between the variables follow. For the
radio 214 to work properly, there must be battery power 212 (arc 246).
Battery power 212, in turn, depends upon the battery working properly 208
and a charge 210 (arcs 242 and 244). The battery working properly 208
depends upon the battery age 202 (arc 236). The charge 210 of the battery
depends upon the alternator 204 working properly (arc 238) and the fan
belt 206 being intact (arc 240). The battery age variable 202, whose
values lie from zero to infinity, is an example of a continuous variable
that can contain an infinite number of values. However, the battery
variable 208 reflecting the correct operations of the battery is a
discrete variable being either true or false.
The automobile troubleshooting Bayesian network also provides a number of
examples of conditional independence and conditional dependence. The nodes
operation of the lights 216 and battery power 212 are dependent, and the
nodes operation of the lights 216 and operation of the radio 214 are
conditionally independent given battery power 212. However, the operation
of the radio 214 and the operation of the lights 216 are conditionally
dependent. The concept of conditional dependence and conditional
independence can be expressed using conditional probability notation. For
example, the operation of the lights 216 is conditionally dependent on
battery power 212 and conditionally independent of the radio 214 given the
battery power 212. Therefore, the probability of the lights working
properly 216 given both the battery power 212 and the radio 214 is
equivalent to the probability of the lights working properly given the
battery power alone, P(Lights.vertline.Battery Power,
Radio)=P(Lights.vertline.Battery Power). An example of a conditional
dependence relationship is the probability of the lights working properly
216 given the battery power 212 which is not equivalent to the probability
of the lights working properly given no information. That is,
p(Lights.vertline.Battery Power).noteq.p(Lights).
There are two conventional approaches for constructing Bayesian networks.
Using the first approach ("the knowledge-based approach"), a person known
as a knowledge engineer interviews an expert in a given field to obtain
the knowledge of the expert about the field of expertise of the expert.
The knowledge engineer and expert first determine the distinctions of the
world that are important for decision making in the field of the expert.
These distinctions correspond to the variables of the domain of the
Bayesian network. The "domain" of a Bayesian network is the set of all
variables in the Bayesian network. The knowledge engineer and the expert
next determine the dependencies among the variables (the arcs) and the
probability distributions that quantify the strengths of the dependencies.
In the second approach ("called the data-based approach"), the knowledge
engineer and the expert first determine the variables of the domain. Next,
data is accumulated for those variables, and an algorithm is applied that
creates a Bayesian network from this data. The accumulated data comes from
real world instances of the domain. That is, real world instances of
decision making in a given field. Conventionally, this second approach
exists for domains containing only discrete variables.
After the Bayesian network has been created, the Bayesian network becomes
the engine for a decision-support system. The Bayesian network is
converted into a computer-readable form, such as a file and input into a
computer system. Then, the computer system uses the Bayesian network to
determine the probabilities of variable states given observations,
determine the benefits of performing tests, and ultimately recommend or
render a decision. Consider an example where a decision-support system
uses the Bayesian network of FIG. 2 to troubleshoot automobile problems.
If the engine for an automobile did not start, the decision-based system
could request an observation of whether there was gas 224, whether the
fuel pump 226 was in working order by possibly performing a test, whether
the fuel line 228 was obstructed, whether the distributor 230 was working,
and whether the spark plugs 232 were working. While the observations and
tests are being performed, the Bayesian network assists in determining
which variable should be observed next.
U.S. application Ser. No. 08/240,019 filed May 9, 1994 entitled "Generating
Improved Belief Networks" describes an improved system and method for
generating Bayesian networks (also known as "belief networks") that
utilize both expert data received from an expert ("expert knowledge") and
data received from real world instances of decisions made ("empirical
data"). By utilizing both expert knowledge and empirical data, the network
generator provides an improved Bayesian network that is more accurate than
conventional Bayesian networks. In addition, the exemplary embodiment
facilitates the use of continuous variables in Bayesian networks and
handles missing data in the empirical data that is used to construct
Bayesian networks.
Expert knowledge consists of two components: an equivalent sample size or
sizes ("sample size"), and the prior probabilities of all possible
Bayesian-network structures ("priors on structures"). The effective sample
size is the effective number of times that the expert has rendered a
specific decision. For example, a doctor with 20 years of experience
diagnosing a specific illness may have an effective sample size in the
hundreds. The priors on structures refers to the confidence of the expert
that there is a relationship between variables (e.g., the expert is 70
percent sure that two variables are related). The priors on structures can
be decomposed for each variable-parent pair known as the "prior
probability" of the variable-parent pair. Empirical data is typically
stored in a database. An example of acquiring empirical data can be given
relative to the Bayesian network of FIG. 2. If, at a service station, a
log is maintained for all automobiles brought in for repair, the log
constitutes empirical data. The log entry for each automobile may contain
a list of the observed state of some or all of the variables in the
Bayesian network. Each log entry constitutes a case. When one or more
variables are unobserved in a case, the case containing the unobserved
variable is said to have "missing data." Thus, missing data refers to when
there are cases in the empirical data database that contain no observed
value for one or more of the variables in the domain. An assignment of one
state to each variable in a set of variables is called an "instance" of
that set of variables. Thus, a "case" is an instance of the domain. The
"database" is the collection of all cases.
An example of a case can more clearly be described relative to the Bayesian
network of FIG. 2. A case may consist of the battery age 202 being 2.132
years old, the battery working properly 208 being true, the alternator
working properly 204 being true, the fan belt being intact 206 being true,
the charge 210 being sufficient, the battery power 212 being sufficient,
the starter working properly 220 being true, the engine turning over 218
being true, the amount of gas 224 being equal to 5.3 gallons, the fuel
pump working properly 226 being true, the fuel line working properly 228
being true, the distributor working properly 230 being false, the spark
plugs working properly 232 being true and the engine starting 234 being
false. In addition, the variables for the gas gauge 222, the radio working
properly 214 and the lights working properly 216 may be unobserved. Thus,
the above-described case contains missing data.
BACKGROUND RELATIVE TO DECISION GRAPHS
Although Bayesian networks are quite useful in decision-support systems,
Bayesian networks require a significant amount of storage. For example, in
the Bayesian network 300 of FIG. 3A, the value of nodes X and Y causally
influences the value of node Z. In this example, nodes X, Y, and Z have
binary values of either 0 or 1. As such, node Z maintains a set of four
probabilities, one probability for each combination of the values of X and
Y, and stores these probabilities into a table 320 as shown in FIG. 3B.
When performing probabilistic inference, it is the probabilities in table
320 that are accessed. As can be seen from table 320, only the
probabilities for Z equaling 0 are stored; the probabilities for Z
equaling 1 need not be stored as they are easily derived by subtracting
the probability of when Z equals 0 from 1. As the number of parents of a
node increases, the table in the node that stores the probabilities
becomes multiplicatively large and requires a significant amount of
storage. For example, a node having binary values with 10 parents that
also have binary values requires a table consisting of 1,024 entries. And,
if either the node or one of its parents has more values than a binary
variable, the number of probabilities in the table increases
multiplicatively. To improve the storage of probabilities in a Bayesian
network node, some conventional systems use a tree data structure. A tree
data structure is an acyclic, undirected graph where each vertex is
connected to each other vertex via a single path. The graph is acyclic in
that there is no path that both emanates from a vertex and returns to the
same vertex, where each edge in the path is traversed only once. FIG. 3C
depicts an example tree data structure 330 that stores into its leaf
vertices 336-342 the probabilities shown in table 320 of FIG. 3B. Assuming
that a decision-support system performs probabilistic inference with X's
value being 0 and Y's value being 1, the following steps occur to access
the appropriate probability in the tree data structure 330: First, the
root vertex 332, vertex X, is accessed, and its value determines the edge
or branch to be traversed. In this example, X's value is 0, so edge 344 is
traversed to vertex 334, which is vertex Y. Second, after reaching vertex
Y, the value for this vertex determines which edge is traversed to the
next vertex. In this example, the value for vertex Y is 1, so edge 346 is
traversed to vertex 338, which is a leaf vertex. Finally, after reaching
the leaf vertex 338, which stores the probability for Z equaling 0 when
X=0 and Y =1, the appropriate probability can be accessed. As compared to
a table, a tree is a more efficient way of storing probabilities in a node
of a Bayesian network, because it requires less space. However, tree data
structures are inflexible in the sense that they can not adequately
represent relationships between probabilities. For example, because of the
acyclic nature of tree data structures, a tree cannot be used to indicate
some types of equality relationships where multiple combinations of the
values of the parent vertices have the same probability (i.e., refer to
the same leaf vertex). This inflexibility requires that multiple vertices
must sometimes store the same probabilities, which is wasteful. It is thus
desirable to improve Bayesian networks with tree distributions.
BACKGROUND RELATIVE TO COLLABORATIVE FILTERING
Collaborative filtering systems have been developed that predict the
preferences of a user. The term "collaborative filtering" refers to
predicting the preferences of a user based on known attributes of the
user, as well as known attributes of other users. For example, a
preference of a user may be whether they would like to watch the
television show "I Love Lucy" and the attributes of the user may include
their age, gender, and income. In addition, the attributes may contain one
or more of the user's known preferences, such as their dislike of another
television show. A user's preference can be predicted based on the
similarity of that user's attributes to other users. For example, if all
users over the age of 50 with a known preference happen to like "I Love
Lucy" and if that user is also over 50, then that user may be predicted to
also like "I Love Lucy" with a high degree of confidence. One conventional
collaborative filtering system has been developed that receives a database
as input. The database contains attribute-value pairs for a number of
users. An attribute is a variable or distinction, such as a user's age,
gender or income, for predicting user preferences. A value is an instance
of the variable. For example, the attribute age may have a value of 23.
Each preference contains a numeric value indicating whether the user likes
or dislikes the preference (e.g., 0 for dislike and 1 for like). The data
in the database is obtained by collecting attributes of the users and
their preferences. It should be noted that conventional collaborative
filtering systems can typically only utilize numerical attributes. As
such, the values for non-numeric | | |