|
|
|
| United States Patent | 5704017 |
| Link to this page | http://www.wikipatents.com/5704017.html |
| Inventor(s) | Heckerman; David E. (Bellevue, WA);
Breese; John S. (Mercer Island, WA);
Horvitz; Eric (Kirkland, WA);
Chickering; David Maxwell (Los Angeles, CA) |
| Abstract | The disclosed system provides an improved collaborative filtering system by
utilizing a belief network, which is sometimes known as a Bayesian
network. The disclosed system learns a belief network using both prior
knowledge obtained from an expert in a given field of decision making and
a database containing empirical data obtained from many people. The
empirical data contains attributes of users as well as their preferences
in the field of decision making. After initially learning the belief
network, the belief network is relearned at various intervals when
additional attributes are identified as having a causal effect on the
preferences and data for these additional attributes can be gathered. This
relearning allows the belief network to improve its accuracy at predicting
preferences of a user. Upon each iteration of relearning, a cluster model
is automatically generated that best predicts the data in the database.
After relearning the belief network a number of times, the belief network
is used to predict the preferences of a user using probabilistic
inference. In performing probabilistic inference, the known attributes of
a user are received and the belief network is accessed to determine the
probability of the unknown preferences of the user given the known
attributes. Based on these probabilities, the preference most likely to be
desired by the user can be predicted. |
|
|
|
Title Information  |
|
|
|
|
|
Drawing from US Patent 5704017 |
|
|
Collaborative filtering utilizing a belief network |
|
|
|
|
|
| Publication Date |
December 30, 1997 |
|
|
|
|
|
| Filing Date |
February 16, 1996 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Title Information  |
|
|
References  |
|
|
| *references marked with an asterisk below are user-added references |
|
U.S. References |
|
|
|
|
|
|
U.S. References |
|
|
Foreign References |
|
|
|
|
|
|
Foreign References |
|
|
Other References |
|
|
|
|
|
|
Other References |
|
|
|
|
|
References  |
|
|
|
|
|
| Market Size |
|
Estimate the gross annual revenues of the relevant market
sector:
|
| | |
| |
|
|
| Market Share |
|
Estimate the percentage of the relevant market sector this invention will capture:
|
| | |
| |
|
|
| Reasonable Royalty |
|
What percentage of gross sales should the inventor or assignee be paid?
|
| | |
| |
|
|
|
Public's "Guesstimation" of Royalty Value
|
| Market Size | N/A | [No votes] | | x | Market Share | N/A | [No votes] | | x | Reasonable Royalty | N/A | [No votes] |
| | N/A | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
Market Review  |
|
|
Technical Review  |
|
|
Claims  |
|
|
What is claimed is:
1. A method in a computer system having a collaborative filtering system
for predicting a desired preference of a user based on attributes of the
user, comprising the steps of:
receiving into the collaborative filtering system a belief network
containing attribute nodes and preference nodes, the attribute nodes
reflecting the attributes of the user and having a value, the preference
nodes reflecting available preferences of the user, each preference node
having a probability indicating the likelihood that the available
preference is the desired preference of the user given the values of the
attribute nodes;
receiving into the collaborative filtering system a request to determine an
available preference having a greatest likelihood of being the desired
preference, the request coutaining values for the attribute nodes;
determining by the collaborative tiltering system the available preference
having the greatest likelihood of being the desired preference by
evaluating the probabilities of the preference nodes given the values of
the attribute nodes; and
indicating by the collaborative filtering system the available preference
having the greatest likelihood of being the desired preference to the
user.
2. The method of claim 1, including the steps of, for each available
preference, calculating by the collaborative filtering system a likelihood
that the available preference is the desired preference given the values
of the attribute nodes and indicating by the collaborative filtering
system the calculated likelihood to the user.
3. A method in a computer system having a collaborative filtering system
for predicting a desired preference of a user based on an attribute of the
user, comprising the computer-implemented steps of:
creating a belief network containing an attribute node and preference
nodes, the attribute node reflecting an attribute of the user and having a
value, the preference nodes reflecting available preferences of the user,
each preference node having a probability indicating a likelihood that the
available preference is the desired preference of the user given the value
of the attribute node;
providing the created belief network to the collaborative filtering system;
receiving a request by the collaborative filtering system to determine an
available preference having a greatest likelihood of being the desired
preference, the request containing a value for the attribute node;
determining the available preference having the greatest likelihood of
being the desired preference by evaluating the probabilities of the
preference nodes given the value of the attribute node; and
indicating the available preference having the greatest likelihood of being
the desired preference to the user.
4. The method of claim 3 wherein the step of creating a belief network
includes the steps of:
receiving expert data comprising an initial belief network containing a
hidden attribute node and at least one preference node, the hidden
attribute node being an unknown factor that causally influences the
available preferences;
receiving data comprising sets of values for the preference nodes;
determining groups of sets having similar values, wherein each group has a
plurality of sets of values;
receiving an indication of a causal factor in the determined groups to
account for the similarity of the values in the sets of values in the
groups;
receiving observations for the identified causal factor and adding the
observations to the data; and
adding the causal factor to the initial belief network as an attribute node
after receiving the observations, and
wherein the step of providing the created belief network includes providing
the initial belief network to the collaborative filtering system.
5. The method of claim 4 wherein the steps of determining groups, receiving
an indication, receiving observations and adding the causal factor are
repeatedly performed until either a causal factor cannot be identified or
observations for the causal factor cannot be gathered.
6. A method in a collaborative filtering system for predicting a desired
preference of a user, comprising the computer-implemented steps of:
receiving a belief network into the collaborative filtering system, the
belief network containing a hidden attribute node and preference nodes,
the hidden attribute node reflecting an unobserved attribute of the user
that causally influences the desired preference, the preference nodes
reflecting available preferences of the user, each preference node having
a value and having a probability indicating a likelihood that the
available preference is the desired preference of the user given the
values of the preference nodes;
receiving data containing values for the preference nodes;
receiving a plurality of a number of states for the hidden attribute node;
for each number of states for the hidden attribute node,
generating a belief network having the preference nodes and the hidden
attribute node, wherein the hidden attribute node has the number of
states; and
scoring the generated belief network for goodness in predicting the desired
preference based on the data;
selecting a generated belief network having a best goodness in predicting
the desired preference based on the data;
receiving a request to determine an available preference having a greatest
likelihood of being the desired preference, the request containing a value
for at least one of the preference nodes;
determining an available preference having the greatest likelihood of being
the desired preference by evaluating the probabilities of the preference
nodes in the selected belief network given the value for the at least one
preference node; and
indicating the available preference having the greatest likelihood of being
the desired preference to the user.
7. A method in a computer system for generating a belief network containing
caused variables and unbidden causal variables, the computer system having
a decision support system, the belief network for providing decision
support in a given field of expertise, the caused variables reflecting a
decision to be rendered by the decision support system and the causal
variables reflecting factors that causally effect the decision, comprising
the computer-implemented steps of:
receiving expert data from an expert in the given field comprising a belief
network containing a hidden causal variable and caused variables and
wherein the expert data includes a plurality of a number of states for the
hidden causal variable, the hidden causal variable being an unknown factor
that causally influences the caused variables;
receiving data comprising sets of values for the caused variables after
receiving the expert data;
determining groups of sets in the received data having similar values;
receiving an indication of a causal factor in the determined groups to
account for the similarity of the values in each group;
receiving observations for the identified causal factor and adding the
observations to the data;
adding the causal factor to the belief network as an unbidden causal
variable after receiving the observations;
receiving a new plurality of a number of states for the hidden causal
variable after adding the causal factor; and
repeatedly performing the steps of determining groups, receiving an
indication, receiving observations, adding the causal factor and receiving
a new plurality of a number of states until either a causal factor cannot
be identified to account for the similarity of the values in each group or
observations for the causal factor cannot be obtained.
8. A collaborative filtering system for predicting a desired preference of
a user based on attributes of the user, comprising:
a belief network containing attribute nodes and preference nodes, the
attribute nodes reflecting the attributes of the user and having a value,
the preference nodes reflecting available preferences of the user, each
preference node having a probability indicating a likelihood that the
available preference is the desired preference of the user given the
values of the attribute nodes;
a receive component for receiving a request to determine an available
preference having a greatest likelihood of being the desired preference,
the request containing values for the attribute nodes;
a determination component for determining the available preference having
the greatest likelihood of being the desired preference by evaluating the
probabilities of the preference nodes given the values of the attribute
nodes; and
an output component for indicating the available preference having the
greatest likelihood of being the desired preference.
9. The collaborative filtering system of claim 8 wherein the attribute
nodes of the belief network contain discrete values and the preference
nodes contain discrete values.
10. The collaborative filtering system of claim 8 wherein at least one of
the attribute nodes contains a non-numerical value.
11. The collaborative filtering system of claim 8 wherein at least one of
the preference nodes contains a non-numerical value.
12. The collaborative filtering system of claim 8 wherein one of the
attribute nodes contains a known preference of the user.
13. A computer-readable memory device containing:
a data structure representing a belief network that comprises nodes, the
belief network for use in a collaborative filtering system for predicting
a desired preference of a user based on attributes of the user, the nodes
further comprising:
attribute nodes reflecting the attributes of the user, wherein each
attribute node has a value; and
preference nodes reflecting available preferences of the user, each
preference node having a probability indicating a likelihood that the
available preference is the desired preference of the user given the
values of the attribute nodes, wherein the collaborative filtering system
receives a request for determining a desired preference of a user among
the available preferences, wherein the request contains values for the
attribute nodes and in response to receiving the request, the
collaborative filtering system evaluates the probabilities of the
preference nodes to determine the available preference that is most likely
to be the desired preference given the values for the attribute nodes in
the request.
14. A computer-readable media that causes a collaborative filtering system
in a computer system to predict a desired preference of a user based on
attributes of the user, by performing the steps of:
receiving a belief network into the collaborative filtering system, the
belief network containing attribute nodes and preference nodes, the
attribute nodes reflecting the attributes of the user and having a value,
the preference nodes reflecting available preferences of the user, each
preference node having a probability indicating the likelihood that the
available preference is the desired preference of the user given the
values of the attribute nodes;
receiving a request to determine an available preference having a greatest
likelihood of being the desired preference, the request containing values
for the attribute nodes;
determining the available preference having the greatest likelihood of
being the desired preference by evaluating the probabilities of the
preference nodes given the values of the attribute nodes; and
indicating the available preference having the greatest likelihood of being
the desired preference to the user.
15. A computer-readable media that causes a collaborative filtering system
in a computer system to predict a desired preference of a user based on an
attribute of the user, by performing the steps of:
creating a belief network containing an attribute node and preference
nodes, the attribute node reflecting an attribute of the user and having a
value, the preference nodes reflecting available preferences of the user,
each preference node having a probability indicating a likelihood that the
available preference is the desired preference of the user given the value
of the attribute node;
providing the created belief network to the collaborative filtering system;
receiving a request by the collaborative filtering system to determine an
available preference having a greatest likelihood of being the desired
preference, the request containing a value for the attribute node;
determining the available preference having the greatest likelihood of
being the desired preference by evaluating the probabilities of the
preference nodes given the value of the attribute node; and
indicating the available preference having the greatest likelihood of being
the desired preference to the user.
16. A computer-readable media that causes a collaborative filtering system
in a computer system to predict a desired preference of a user, by
performing the steps of:
receiving a belief network into the collaborative filtering system, the
belief network containing a hidden attribute node and preference nodes,
the hidden attribute node reflecting an unobserved attribute of the user
that causally influence the desired preference, the preference nodes
reflecting available preferences of the user, each preference node having
a value and having a probability indicating a likelihood that the
available preference is the desired preference of the user given the
values of the preference nodes;
receiving data containing values for the preference nodes;
receiving a range of a number of states for the hidden attribute node;
for each number of states for the hidden attribute node,
generating a belief network having the preference nodes and the hidden
attribute node, wherein the hidden attribute node has the number of
states; and
scoring the generated belief network for goodness in predicting the desired
preference based on the data;
selecting a generated belief network having a best goodness in predicting
the desired preference based on the data;
receiving a request to determine an available preference having a greatest
likelihood of being the desired preference, the request containing a value
for at least one of the preference nodes;
determining an available preference having the greatest likelihood of being
the desired preference by accessing the generated belief network having
the best goodness and evaluating the probabilities of the preference nodes
given the value for the at least one preference node; and
indicating the available preference having the greatest likelihood of being
the desired preference to the user.
17. A computer-readable media that causes a computer system to generate a
belief network containing caused variables and unbidden causal variables,
the computer system having a decision support system, the belief network
for providing decision support in a given field of expertise, the caused
variables reflecting a decision to be rendered by the decision support
system and the causal variables reflecting factors that causally effect
the decision, by performing the steps of:
receiving expert data from an expert in the given field comprising a belief
network containing a hidden causal variable and caused variables and
wherein the expert data includes a plurality of a number of states for the
hidden causal variable, the hidden causal variable being an unknown factor
that causally influences the caused variables;
receiving data comprising sets of values for the caused variables after
receiving the expert data;
determining groups of sets in the received data having similar values;
receiving an indication of a causal factor in the determined groups to
account for the similarity of the values in each group;
receiving observations for the identified causal factor and adding the
observations to the data after identifying a causal factor;
adding the causal factor to the belief network as an unhidden causal
variable after receiving the observations;
receiving a new plurality of a number of states for the hidden causal
variable after adding the causal factor; and
repeatedly performing the steps of determining groups, receiving an
indication, receiving observations, adding the causal factor and receiving
a new range of a number of states until either a causal factor cannot be
identified to account for the similarity of the values in each group or
observations for the causal factor cannot be obtained.
18. A method in a collaborative filtering system for predicting a desired
preference of a user based on a characteristic of the user, comprising the
computer-implemented steps of:
receiving a belief network into the collaborative filtering system, the
belief network containing preference nodes reflecting available
preferences of the user, each preference node having a probability
indicating the likelihood that the available preference is the desired
preference of the user given a value for the characteristic of the user;
receiving a request to determine an available preference having a greatest
likelihood of being the desired preference, the request containing a value
for the characteristic of the user;
determining the available preference having the greatest likelihood of
being the desired preference by evaluating the probabilities of the
preference nodes given the value of the characteristics; and
indicating the available preference having the greatest likelihood of being
the desired preference to the user.
19. The method of claim 18 wherein the collaborative filtering system
predicts a desired song of a user, wherein the preference nodes of the
belief network contain available songs, and wherein the step of indicating
the available preference includes indicating the available song.
20. The method of claim 18 wherein the collaborative filtering system
predicts a desired music video of a user, wherein the preference nodes of
the belief network contain available music videos, and wherein the step of
indicating the available preference includes indicating the available
music video having the greatest likelihood of being the desired music
video.
21. The method of claim 18 wherein the collaborative filtering system
predicts desired music to be purchased by a user, wherein the preference
nodes of the belief network contain available music, and wherein the step
of indicating the available preference includes indicating the available
music having the greatest likelihood of being the desired music.
22. The method of claim 18 wherein the collaborative filtering system
predicts a desired television show of a user, wherein the preference nodes
of the belief network contain available television shows, and wherein the
step of indicating the available preference includes indicating the
available television show having the greatest likelihood of being the
desired television show.
23. The method of claim 18 wherein the collaborative filtering system is
used to incorporate a desired customization of a user into an electronic
newspaper, wherein the preference nodes of the belief network contain
available customizations for the electronic newspaper, and wherein the
step of indicating the available preference includes indicating the
available customization having the greatest likelihood of being the
desired customization.
24. The method of claim 18 wherein the collaborative filtering system
predicts a desired Internet page of a user, wherein the preference nodes
of the belief network contain available Internet pages, and wherein the
step of indicating the available preference includes indicating the
available Internet page having the greatest likelihood of being the
desired Internet page.
25. The method of claim 18 wherein the collaborative filtering system
predicts a desired Encarta article of a user, wherein the preference nodes
of the belief network contain available Encarta articles, and wherein the
step of indicating the available preference includes indicating the
available Encarta article having the greatest likelihood of being the
desired Encarta article.
26. The method of claim 18 wherein the collaborative filtering system
predicts a desired graphical layout of a user for a graphical presentation
program, wherein the preference nodes of the belief network contain
available graphical layouts, and wherein the step of indicating the
available preference includes indicating the available graphical layout
having the greatest likelihood of being the desired graphical layout.
27. The method of claim 18 wherein the collaborative filtering system
predicts a desired email alias of a user, wherein the preference nodes of
the belief network contain available email aliases, and wherein the step
of indicating the available preference includes indicating the available
email alias having the greatest likelihood of being the desired email
alias.
28. The method of claim 18 wherein the request includes a selection
criteria, wherein the step of determining the available preference
includes determining the available preference that has the greatest
likelihood of being the desired preference and that conforms to the
selection criteria, and wherein the step of indicating the available
preference includes indicating the available preference that has the
greatest likelihood of being the desired preference and that conforms to
the selection criteria.
29. The method of claim 18 wherein the collaborative filtering system
predicts a desired movie of a user, wherein the preference nodes of the
belief network contain available movies, and wherein the step of
indicating the available preference includes indicating the available
movie having the greatest likelihood of being the desired movie.
30. The method of claim 18 wherein the collaborative filtering system
predicts a desired wine of a user, wherein the preference nodes of the
belief network contain available wines, and wherein the step of indicating
the available preference includes indicating the available wine having the
greatest likelihood of being the desired wine.
31. The method of claim 18 wherein the collaborative filtering system
predicts a desired restaurant of a user, wherein the preference nodes of
the belief network contain available restaurants, and wherein the step of
indicating the available preference includes indicating the available
restaurant having the greatest likelihood of being the desired restaurant.
32. The method of claim 18 wherein the collaborative filtering system
predicts desired real estate of a user, wherein the preference nodes of
the belief network contain available real estate, and wherein the step of
indicating the available preference includes indicating the available real
estate having the greatest likelihood of being the desired real estate.
33. The method of claim 18 wherein the collaborative filtering system
predicts a desired advertisement of a user, wherein the preference nodes
of the belief network contain available advertisements, and wherein the
step of indicating the available preference includes indicating the
available advertisement having the greatest likelihood of being the
desired advertisement.
34. The method of claim 33 wherein the step of indicating the available
advertisement includes viewing the available advertisement having the
greatest likelihood of being the desired advertisement by the user via an
on-line system.
35. The method of claim 33 wherein the step of indicating the available
advertisement includes viewing the available advertisement having the
greatest likelihood of being the desired advertisement by the user via an
interactive television.
36. The method of claim 33 wherein the step of indicating the available
advertisement includes viewing the available advertisement having the
greatest likelihood of being the desired advertisement by a user via a
computer network.
37. The method of claim 18 wherein the collaborative filtering system
predicts desired products of a user, wherein the preference nodes of the
belief network contain available products, and wherein the step of
indicating the available preference includes indicating the available
product having the greatest likelihood of being the desired product.
38. The method of claim 18 wherein the collaborative filtering system
predicts inappropriate material that is deemed unsuitable for viewing by
children, wherein the preference nodes of the belief network contain
available material, and wherein the step of indicating the available
preference includes indicating the available material having the greatest
likelihood of being the inappropriate material.
39. The method of claim 38 wherein the available material includes music.
40. The method of claim 38 wherein the available material includes music
videos.
41. The method of claim 38 wherein the available material includes movies.
42. The method of claim 38 wherein the available material includes Internet
pages.
43. The method of claim 18 wherein the collaborative filtering system
predicts a desired user interface of a user, wherein the preference nodes
of the belief network contain available user interfaces, and wherein the
step of indicating the available preference includes indicating the
available user interface having the greatest likelihood of being the
desired user interface. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
TECHNICAL FIELD
The present invention relates generally to data processing systems and,
more particularly, to a collaborative filtering system that utilizes a
belief network.
BACKGROUND OF THE INVENTION
Collaborative filtering systems have been developed that predict the
preferences of a user. The term "collaborative filtering" refers to
predicting the preferences of a user based on known attributes of the
user, as well as known attributes of other users. For example, a
preference of a user may be whether they would like to watch the
television show "I Love Lucy" and the attributes of the user may include
their age, gender, and income. In addition, the attributes may contain one
or more of the user's known preferences, such as their dislike of another
television show. A user's preference can be predicted based on the
similarity of that user's attributes to other users. For example, if all
users over the age of 50 with a known preference happen to like "I Love
Lucy" and if that user is also over 50, then that user may be predicted to
also like "I Love Lucy" with a high degree of confidence. One conventional
collaborative filtering system has been developed that receives a database
as input. The database contains attribute-value pairs for a number of
users. An attribute is a variable or distinction, such as a user's age,
gender or income, for predicting user preferences. A value is an instance
of the variable. For example, the attribute age may have a value of 23.
Each preference contains a numeric value indicating whether the user likes
or dislikes the preference (e.g., 0 for dislike and 1 for like). The data
in the database is obtained by collecting attributes of the users and
their preferences.
It should be noted that conventional collaborative filtering systems can
typically only utilize numerical attributes. As such, the values for
non-numerical attributes, such as gender, are transposed into a numerical
value, which sometimes reduces the accuracy of the system. For example,
when a variable has three non-numerical states, such as vanilla, chocolate
and strawberry, transposing these states into a numerical value will
unintentionally indicate dissimilarity between the states. That is, if
vanilla were assigned a value of 1, chocolate 2 and strawberry 3, the
difference between each value indicates to the system how similar each
state is to each other. Therefore, the system may make predictions based
on chocolate being more similar to both vanilla and strawberry than
vanilla is similar to strawberry. Such predictions may be based on a
misinterpretation of the data and lead to a reduction in the accuracy of
the system.
In performing collaborative filtering, the conventional system first
computes the correlation of attributes between a given user "v" and each
other user "u" (except v) in the database. The computation of the
"correlation" is a well-known computation in the field of statistics.
After computing the correlation, the conventional system computes, for
example, the preference of a user "v" for a title of a television show "t"
as follows:
##EQU1##
where "pref(t, v)" is the preference of user "v" for title "t," where
"<pref(t)>" is the average preference of title "t" by all users, where
"pref(t, u)" is the preference of user "u" for title "t," where "corr(u,
v)" is the correlation of users "u" and "v," and the sums run over the
users "u" that have expressed a preference for title "t." One drawback to
this conventional system is that the entire database must be examined when
predicting preferences, which requires a significant amount of processing
time.
One way to improve upon this conventional system is to utilize a clustering
algorithm. Using this approach, a collaborative filtering system uses any
of a number of well-known clustering algorithms to divide the database
into a number of clusters. For example, the algorithms described in
KoJain, Algorithms for Clustering Data (1988) can be used. Each cluster
contains the data of users whose preferences tend to be similar. As such,
when predicting the preferences of one user in a cluster, only the
preferences of the other users in the cluster need to be examined and not
the preferences of all other users in the database. A collaborative
filtering system that utilizes a clustering algorithm receives as input a
database, as described above, a guess of the number of clusters and a
distance metric. The guess of the number of clusters is provided by an
administrator of the collaborative filtering system based on their own
knowledge of how many clusters the database can probably be divided into.
The distance metric is a metric provided by the administrator for each
user in the database that estimates how similar one user is to each other
in the database based on user's preferences and attributes. The distance
metric is a range between 0 and 1 with 0 indicating that two users are
least similar and 1 indicating that two users are most similar. This
similarity is expressed as a numerical value. Each user will have a
distance metric for every other user. Thus, the distance metrics are
conveniently represented by an N-by-N matrix, where "N" is the number of
users. After receiving the number of clusters and the distance metric, the
clustering algorithm identifies the clusters.
The clustering algorithm outputs a list of the users in the database and a
cluster number assigned to each user. To determine the preferences of a
user, the other users within that user's cluster are examined. For
example, if the system is attempting to determine whether a user would
like the television show "I Love Lucy," the other users within that
cluster are examined. If there are six other users within the cluster and
five out of the six like "I Love Lucy," then it is likely that so will the
user.
Although utilizing a clustering algorithm is an improvement over the
previously-described conventional system, it has limitations. One such
limitation is that the exact number of clusters is determined manually,
which renders the algorithm prone to human error. Another limitation is
that all attributes are numerical and as such, the values of non-numerical
attributes must be transposed into numerical values. Based upon the
above-described limitations of conventional collaborative filtering
systems, it is desirable to improve collaborative filtering systems.
SUMMARY OF THE INVENTION
The disclosed system provides an improved collaborative filtering system by
utilizing a belief network, which is sometimes known as a Bayesian
network. The disclosed system learns a belief network using both prior
knowledge obtained from an expert in a given field of decision making and
a database containing empirical data obtained from many users. The
empirical data contains attributes of users as well as their preferences
in the field of decision making. After initially learning the belief
network, the belief network is relearned at various intervals when
additional attributes are identified as having a causal effect on the
preferences and data for these additional attributes can be gathered. This
relearning allows the belief network to improve its accuracy at predicting
preferences of a user. Upon each iteration of relearning, a cluster model
is automatically generated which indicates a division of the database into
a number of clusters that best predict the data in the database. After
relearning the belief network a number of times, the belief network is
used to predict the preferences of a user using probabilistic inference.
In performing probabilistic inference, the known attributes of a user are
received and the belief network is accessed to determine the probability
of the unknown preferences of the user given the known attributes. Based
on these probabilities, the preference most likely to be desired by the
user can be predicted.
The disclosed system provides many advantages over conventional
collaborative filtering systems. First, prior knowledge is utilized from
an expert in a given field of decision making to seed the clustering,
which produces clusters that more accurately reflect the data in the
database than some conventional systems. Second, the number of clusters is
determined automatically from the data in the database, which is more
reliable than manually predicting and inputting the number of the clusters
as done in some conventional systems. Third, no distance metric is needed
by the disclosed system. This advantage helps reduce the mount of data
that must be gathered before the system can be run. Fourth, the disclosed
system can use non-numerical attributes so as to eliminate errors
introduced to the system through the transposition of non-numerical values
into numerical values. Fifth, the output of the disclosed system is a
clustering model that is easily modifiable by an administrator so that it
can be fed back into the system and improved upon by the system in an
iterative manner. This iterative approach leads to improved accuracy in
determining the preferences of a user. Other advantages of the disclosed
system will become apparent to those skilled in the art upon reading the
following description.
In accordance with a first aspect of the present invention, a method is
provided in a collaborative filtering system for predicting a desired
preference of a user based on attributes of the user. The method of the
first aspect receives a belief network into the collaborative filtering
system. The belief network contains attribute nodes and preference nodes.
The attribute nodes reflect the attributes of the user and have a value.
The preference nodes reflect available preferences of the user with each
preference node having a probability indicating the likelihood that the
available preference is the desired preference of the user. The first
aspect receives a request, determines the available preference having the
greatest likelihood of being the desired preference, and indicates the
available preference having the greatest likelihood of being the desired
preference.
In accordance with a second aspect of the present invention a collaborative
filtering system is provided for predicting a desired preference of a user
based on attributes of the user. The collaborative filtering system
comprises a belief network, a receive component, a determination
component, and an output component. The belief network contains attribute
nodes and preference nodes. The attribute nodes reflect the attributes of
the user and have a value. The preference nodes reflect available
preferences of the user with each preference node having a probability.
The receive component receives a request to determine an available
preference having the greatest likelihood of being the desired preference.
The determination component determines the available preference having the
greatest likelihood of being the desired preference by evaluating the
probabilities of the preference nodes. The output component indicates the
available preference having the greatest likelihood of being the desired
preference.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts an example of a belief network.
FIG. 2 depicts an example of a belief network for troubleshooting
automobile problems.
FIG. 3 depicts a computer system suitable for practicing the preferred
embodiment oft he present invention.
FIG. 4 depicts a belief network that reflects prior knowledge.
FIG. 5 depicts the database of FIG. 3 in greater detail.
FIG. 6 depicts the network generator of FIG. 3 in greater detail.
FIG. 7 depicts a flowchart of the steps performed when learning a belief
network to be utilized in the collaborative filtering system depicted in
FIG. 3.
FIG. 8 depicts an initial belief network to be used in the learning process
of FIG. 7.
FIG. 9 depicts a cluster model that is output by the network generator of
FIG. 3.
FIG. 10 depicts a flowchart of the steps performed by the network
constructor of FIG. 6.
FIG. 11 depicts a flowchart of the steps performed by the network scorer of
FIG. 6.
FIG. 12 depicts a flowchart of the steps performed by the collaborative
filtering system of FIG. 3.
FIG. 13 depicts a flowchart of the steps performed by an alternative
embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
A preferred embodiment of the present invention provides an improved
collaborative filtering system by utilizing a belief network, which is
sometimes known as a Bayesian network. The preferred embodiment "learns" a
belief network using both prior knowledge obtained from an expert in a
given field of decision making and a database containing empirical data
obtained from many users. The empirical data contains attributes of users
as well as their preferences in the field of decision making. After
initially learning the belief network, the belief network is relearned at
various intervals when additional attributes are identified as having a
causal effect on the preferences and data for these additional attributes
can be gathered. This relearning allows the belief network to improve its
accuracy at predicting preferences of a user. Upon each iteration of
relearning, a cluster model is automatically generated which indicates a
number of clusters that best predicts the data in the database. After
relearning the belief network a number of times, the belief network is
used to predict the preferences of a user using probabilistic inference.
In performing probabilistic inference, the known attributes of a user are
received and the belief network is accessed to determine the probability
of the unknown preferences of the user given the known attributes. Based
on these probabilities, the set of preferences most likely to be desired
by the user can be predicted.
The preferred embodiment of the present invention provides a number of
advantages over conventional systems. First, prior knowledge is utilized
from an expert in a given field of decision making to seed the clustering,
which produces clusters that more accurately reflect the data in the
database than some conventional systems. Second, the number of clusters is
determined automatically from the data in the database, which is more
reliable than manually predicting and inputting the number of the clusters
as done in some conventional systems. Third, no distance metrics are
needed by the preferred embodiment. This advantage helps reduce the mount
of data that must be gathered before the system can be run and improves
the accuracy of the system. Fourth, the preferred embodiment can use
non-numerical attributes so as to eliminate errors introduced to the
system through the transposition of non-numerical values into numerical
values. Fifth, the output of the preferred embodiment is a clustering
model that is easily modifiable by an administrator so that it can be fed
back into the system and improved upon by the system in an iterative
manner. This iterative approach leads to improved accuracy in determining
the preferences of a user. Other advantages of the preferred embodiment
will become apparent to those skilled in the art upon reading the
following description.
Before delving into the details of the preferred embodiment, an overview of
belief networks is provided so that the reader may better comprehend the
details of the preferred embodiment.
INTRODUCTION TO BELIEF NETWORKS
A belief network is a representation of the probabilistic relationships
among states of a portion of the world. The states of the world in a
belief network can change and are, therefore, called variables. A belief
network is expressed as an acyclic-directed graph where the variables
correspond to nodes and the relationships between the nodes correspond to
arcs. FIG. 1 depicts an example belief network 101. In the belief network
101 there are three variables, x.sub.1, x.sub.2, and x.sub.3, which are
represented by nodes 102, 106 and 110, respectively. The example belief
network contains two arcs 104 and 108. Associated with each variable in a
belief network is a set of probability distributions. Using conditional
probability notation, the set of probability distributions for a variable
can be denoted by p(x.sub.i .vertline..PI..sub.i, .xi.), where "p" refers
to the probability distribution, where ".PI..sub.i " denotes the parents
of variable x.sub.i and where ".xi." denotes the knowledge of the expert.
The Greek letter ".xi." indicates that the belief network reflects the
knowledge of an expert in a given field. Thus, this expression reads as
follows: the probability distribution for variable x.sub.i given the
parents of x.sub.i and the knowledge of the expert. For example, x.sub.1
is the parent of x.sub.2. The probability distributions specify the
strength of the relationships between variables. For instance, if x.sub.1
has two states (true and false), then associated with x.sub.1 is a single
probability distribution p(x.sub.1 .vertline..xi.) and associated with
x.sub.2 are two probability distributions p(x.sub.2 .vertline.x.sub.1
=t,.xi.) and p(x.sub.2 .vertline.x.sub.1 =f,.xi.).
One important concept for belief networks is the concept of dependence.
Sets of variables x and y are said to be conditionally independent, given
a set of variables z, if the probability distribution for x given z does
not depend on y. That is, if p(x.vertline.z,y)=p(x.vertline.z), x and y
are conditionally independent given z. If z is empty, however, x and y are
said to be "independent" as opposed to conditionally independent. If x and
y are not conditionally independent, given z, x and y are said to be
conditionally dependent given z.
The arcs in a belief network convey dependence between nodes. When there is
an are from a first node to a second node, the probability distribution of
the second node depends upon the value of the first node. For example,
there is an arc from node 102 to node 106. Therefore, node 106 is said to
be dependent on node 102. Missing arcs in a belief network convey
conditional independence. For example, node 102 and node 110 are
conditionally independent given node 106. That is, the values of nodes 102
and 110 are conditionally independent if the value of node 106 is known,
the condition being the observation node 106. However, two variables
indirectly connected through intermediate variables are dependent given
lack of knowledge of the values ("states" | | |