WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Collaborative filtering utilizing a belief network    
United States Patent5704017   
Link to this pagehttp://www.wikipatents.com/5704017.html
Inventor(s)Heckerman; David E. (Bellevue, WA); Breese; John S. (Mercer Island, WA); Horvitz; Eric (Kirkland, WA); Chickering; David Maxwell (Los Angeles, CA)
AbstractThe disclosed system provides an improved collaborative filtering system by utilizing a belief network, which is sometimes known as a Bayesian network. The disclosed system learns a belief network using both prior knowledge obtained from an expert in a given field of decision making and a database containing empirical data obtained from many people. The empirical data contains attributes of users as well as their preferences in the field of decision making. After initially learning the belief network, the belief network is relearned at various intervals when additional attributes are identified as having a causal effect on the preferences and data for these additional attributes can be gathered. This relearning allows the belief network to improve its accuracy at predicting preferences of a user. Upon each iteration of relearning, a cluster model is automatically generated that best predicts the data in the database. After relearning the belief network a number of times, the belief network is used to predict the preferences of a user using probabilistic inference. In performing probabilistic inference, the known attributes of a user are received and the belief network is accessed to determine the probability of the unknown preferences of the user given the known attributes. Based on these probabilities, the preference most likely to be desired by the user can be predicted.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5704017
Collaborative filtering utilizing a belief network - US Patent 5704017 Drawing
Collaborative filtering utilizing a belief network
Inventor     Heckerman; David E. (Bellevue, WA); Breese; John S. (Mercer Island, WA); Horvitz; Eric (Kirkland, WA); Chickering; David Maxwell (Los Angeles, CA)
Owner/Assignee     Microsoft Corporation (Redmond, WA)
Patent assignment
All assignments
Publication Date     December 30, 1997
Application Number     08/602,238
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     February 16, 1996
US Classification     706/12 706/45 706/52
Int'l Classification     G06F 017/00
Examiner     Hafiz; Tariq R.
Assistant Examiner     Rhodes; Jason W.
Attorney/Law Firm     Seed and Berry LLP
Address
Parent Case    
Priority Data    
USPTO Field of Search     395/62 395/61 395/77 395/50 395/51 395/10 395/11 395/12 395/13
Patent Tags     collaborative filtering utilizing belief network
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5546502
Hart
706/11
Aug,1996

[0 after 0 votes]
5410344
Graves
725/46
Apr,1995

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A method in a computer system having a collaborative filtering system for predicting a desired preference of a user based on attributes of the user, comprising the steps of:

receiving into the collaborative filtering system a belief network containing attribute nodes and preference nodes, the attribute nodes reflecting the attributes of the user and having a value, the preference nodes reflecting available preferences of the user, each preference node having a probability indicating the likelihood that the available preference is the desired preference of the user given the values of the attribute nodes;

receiving into the collaborative filtering system a request to determine an available preference having a greatest likelihood of being the desired preference, the request coutaining values for the attribute nodes;

determining by the collaborative tiltering system the available preference having the greatest likelihood of being the desired preference by evaluating the probabilities of the preference nodes given the values of the attribute nodes; and

indicating by the collaborative filtering system the available preference having the greatest likelihood of being the desired preference to the user.

2. The method of claim 1, including the steps of, for each available preference, calculating by the collaborative filtering system a likelihood that the available preference is the desired preference given the values of the attribute nodes and indicating by the collaborative filtering system the calculated likelihood to the user.

3. A method in a computer system having a collaborative filtering system for predicting a desired preference of a user based on an attribute of the user, comprising the computer-implemented steps of:

creating a belief network containing an attribute node and preference nodes, the attribute node reflecting an attribute of the user and having a value, the preference nodes reflecting available preferences of the user, each preference node having a probability indicating a likelihood that the available preference is the desired preference of the user given the value of the attribute node;

providing the created belief network to the collaborative filtering system;

receiving a request by the collaborative filtering system to determine an available preference having a greatest likelihood of being the desired preference, the request containing a value for the attribute node;

determining the available preference having the greatest likelihood of being the desired preference by evaluating the probabilities of the preference nodes given the value of the attribute node; and

indicating the available preference having the greatest likelihood of being the desired preference to the user.

4. The method of claim 3 wherein the step of creating a belief network includes the steps of:

receiving expert data comprising an initial belief network containing a hidden attribute node and at least one preference node, the hidden attribute node being an unknown factor that causally influences the available preferences;

receiving data comprising sets of values for the preference nodes;

determining groups of sets having similar values, wherein each group has a plurality of sets of values;

receiving an indication of a causal factor in the determined groups to account for the similarity of the values in the sets of values in the groups;

receiving observations for the identified causal factor and adding the observations to the data; and

adding the causal factor to the initial belief network as an attribute node after receiving the observations, and

wherein the step of providing the created belief network includes providing the initial belief network to the collaborative filtering system.

5. The method of claim 4 wherein the steps of determining groups, receiving an indication, receiving observations and adding the causal factor are repeatedly performed until either a causal factor cannot be identified or observations for the causal factor cannot be gathered.

6. A method in a collaborative filtering system for predicting a desired preference of a user, comprising the computer-implemented steps of:

receiving a belief network into the collaborative filtering system, the belief network containing a hidden attribute node and preference nodes, the hidden attribute node reflecting an unobserved attribute of the user that causally influences the desired preference, the preference nodes reflecting available preferences of the user, each preference node having a value and having a probability indicating a likelihood that the available preference is the desired preference of the user given the values of the preference nodes;

receiving data containing values for the preference nodes;

receiving a plurality of a number of states for the hidden attribute node;

for each number of states for the hidden attribute node,

generating a belief network having the preference nodes and the hidden attribute node, wherein the hidden attribute node has the number of states; and

scoring the generated belief network for goodness in predicting the desired preference based on the data;

selecting a generated belief network having a best goodness in predicting the desired preference based on the data;

receiving a request to determine an available preference having a greatest likelihood of being the desired preference, the request containing a value for at least one of the preference nodes;

determining an available preference having the greatest likelihood of being the desired preference by evaluating the probabilities of the preference nodes in the selected belief network given the value for the at least one preference node; and

indicating the available preference having the greatest likelihood of being the desired preference to the user.

7. A method in a computer system for generating a belief network containing caused variables and unbidden causal variables, the computer system having a decision support system, the belief network for providing decision support in a given field of expertise, the caused variables reflecting a decision to be rendered by the decision support system and the causal variables reflecting factors that causally effect the decision, comprising the computer-implemented steps of:

receiving expert data from an expert in the given field comprising a belief network containing a hidden causal variable and caused variables and wherein the expert data includes a plurality of a number of states for the hidden causal variable, the hidden causal variable being an unknown factor that causally influences the caused variables;

receiving data comprising sets of values for the caused variables after receiving the expert data;

determining groups of sets in the received data having similar values;

receiving an indication of a causal factor in the determined groups to account for the similarity of the values in each group;

receiving observations for the identified causal factor and adding the observations to the data;

adding the causal factor to the belief network as an unbidden causal variable after receiving the observations;

receiving a new plurality of a number of states for the hidden causal variable after adding the causal factor; and

repeatedly performing the steps of determining groups, receiving an indication, receiving observations, adding the causal factor and receiving a new plurality of a number of states until either a causal factor cannot be identified to account for the similarity of the values in each group or observations for the causal factor cannot be obtained.

8. A collaborative filtering system for predicting a desired preference of a user based on attributes of the user, comprising:

a belief network containing attribute nodes and preference nodes, the attribute nodes reflecting the attributes of the user and having a value, the preference nodes reflecting available preferences of the user, each preference node having a probability indicating a likelihood that the available preference is the desired preference of the user given the values of the attribute nodes;

a receive component for receiving a request to determine an available preference having a greatest likelihood of being the desired preference, the request containing values for the attribute nodes;

a determination component for determining the available preference having the greatest likelihood of being the desired preference by evaluating the probabilities of the preference nodes given the values of the attribute nodes; and

an output component for indicating the available preference having the greatest likelihood of being the desired preference.

9. The collaborative filtering system of claim 8 wherein the attribute nodes of the belief network contain discrete values and the preference nodes contain discrete values.

10. The collaborative filtering system of claim 8 wherein at least one of the attribute nodes contains a non-numerical value.

11. The collaborative filtering system of claim 8 wherein at least one of the preference nodes contains a non-numerical value.

12. The collaborative filtering system of claim 8 wherein one of the attribute nodes contains a known preference of the user.

13. A computer-readable memory device containing:

a data structure representing a belief network that comprises nodes, the belief network for use in a collaborative filtering system for predicting a desired preference of a user based on attributes of the user, the nodes further comprising:

attribute nodes reflecting the attributes of the user, wherein each attribute node has a value; and

preference nodes reflecting available preferences of the user, each preference node having a probability indicating a likelihood that the available preference is the desired preference of the user given the values of the attribute nodes, wherein the collaborative filtering system receives a request for determining a desired preference of a user among the available preferences, wherein the request contains values for the attribute nodes and in response to receiving the request, the collaborative filtering system evaluates the probabilities of the preference nodes to determine the available preference that is most likely to be the desired preference given the values for the attribute nodes in the request.

14. A computer-readable media that causes a collaborative filtering system in a computer system to predict a desired preference of a user based on attributes of the user, by performing the steps of:

receiving a belief network into the collaborative filtering system, the belief network containing attribute nodes and preference nodes, the attribute nodes reflecting the attributes of the user and having a value, the preference nodes reflecting available preferences of the user, each preference node having a probability indicating the likelihood that the available preference is the desired preference of the user given the values of the attribute nodes;

receiving a request to determine an available preference having a greatest likelihood of being the desired preference, the request containing values for the attribute nodes;

determining the available preference having the greatest likelihood of being the desired preference by evaluating the probabilities of the preference nodes given the values of the attribute nodes; and

indicating the available preference having the greatest likelihood of being the desired preference to the user.

15. A computer-readable media that causes a collaborative filtering system in a computer system to predict a desired preference of a user based on an attribute of the user, by performing the steps of:

creating a belief network containing an attribute node and preference nodes, the attribute node reflecting an attribute of the user and having a value, the preference nodes reflecting available preferences of the user, each preference node having a probability indicating a likelihood that the available preference is the desired preference of the user given the value of the attribute node;

providing the created belief network to the collaborative filtering system;

receiving a request by the collaborative filtering system to determine an available preference having a greatest likelihood of being the desired preference, the request containing a value for the attribute node;

determining the available preference having the greatest likelihood of being the desired preference by evaluating the probabilities of the preference nodes given the value of the attribute node; and

indicating the available preference having the greatest likelihood of being the desired preference to the user.

16. A computer-readable media that causes a collaborative filtering system in a computer system to predict a desired preference of a user, by performing the steps of:

receiving a belief network into the collaborative filtering system, the belief network containing a hidden attribute node and preference nodes, the hidden attribute node reflecting an unobserved attribute of the user that causally influence the desired preference, the preference nodes reflecting available preferences of the user, each preference node having a value and having a probability indicating a likelihood that the available preference is the desired preference of the user given the values of the preference nodes;

receiving data containing values for the preference nodes;

receiving a range of a number of states for the hidden attribute node;

for each number of states for the hidden attribute node,

generating a belief network having the preference nodes and the hidden attribute node, wherein the hidden attribute node has the number of states; and

scoring the generated belief network for goodness in predicting the desired preference based on the data;

selecting a generated belief network having a best goodness in predicting the desired preference based on the data;

receiving a request to determine an available preference having a greatest likelihood of being the desired preference, the request containing a value for at least one of the preference nodes;

determining an available preference having the greatest likelihood of being the desired preference by accessing the generated belief network having the best goodness and evaluating the probabilities of the preference nodes given the value for the at least one preference node; and

indicating the available preference having the greatest likelihood of being the desired preference to the user.

17. A computer-readable media that causes a computer system to generate a belief network containing caused variables and unbidden causal variables, the computer system having a decision support system, the belief network for providing decision support in a given field of expertise, the caused variables reflecting a decision to be rendered by the decision support system and the causal variables reflecting factors that causally effect the decision, by performing the steps of:

receiving expert data from an expert in the given field comprising a belief network containing a hidden causal variable and caused variables and wherein the expert data includes a plurality of a number of states for the hidden causal variable, the hidden causal variable being an unknown factor that causally influences the caused variables;

receiving data comprising sets of values for the caused variables after receiving the expert data;

determining groups of sets in the received data having similar values;

receiving an indication of a causal factor in the determined groups to account for the similarity of the values in each group;

receiving observations for the identified causal factor and adding the observations to the data after identifying a causal factor;

adding the causal factor to the belief network as an unhidden causal variable after receiving the observations;

receiving a new plurality of a number of states for the hidden causal variable after adding the causal factor; and

repeatedly performing the steps of determining groups, receiving an indication, receiving observations, adding the causal factor and receiving a new range of a number of states until either a causal factor cannot be identified to account for the similarity of the values in each group or observations for the causal factor cannot be obtained.

18. A method in a collaborative filtering system for predicting a desired preference of a user based on a characteristic of the user, comprising the computer-implemented steps of:

receiving a belief network into the collaborative filtering system, the belief network containing preference nodes reflecting available preferences of the user, each preference node having a probability indicating the likelihood that the available preference is the desired preference of the user given a value for the characteristic of the user;

receiving a request to determine an available preference having a greatest likelihood of being the desired preference, the request containing a value for the characteristic of the user;

determining the available preference having the greatest likelihood of being the desired preference by evaluating the probabilities of the preference nodes given the value of the characteristics; and

indicating the available preference having the greatest likelihood of being the desired preference to the user.

19. The method of claim 18 wherein the collaborative filtering system predicts a desired song of a user, wherein the preference nodes of the belief network contain available songs, and wherein the step of indicating the available preference includes indicating the available song.

20. The method of claim 18 wherein the collaborative filtering system predicts a desired music video of a user, wherein the preference nodes of the belief network contain available music videos, and wherein the step of indicating the available preference includes indicating the available music video having the greatest likelihood of being the desired music video.

21. The method of claim 18 wherein the collaborative filtering system predicts desired music to be purchased by a user, wherein the preference nodes of the belief network contain available music, and wherein the step of indicating the available preference includes indicating the available music having the greatest likelihood of being the desired music.

22. The method of claim 18 wherein the collaborative filtering system predicts a desired television show of a user, wherein the preference nodes of the belief network contain available television shows, and wherein the step of indicating the available preference includes indicating the available television show having the greatest likelihood of being the desired television show.

23. The method of claim 18 wherein the collaborative filtering system is used to incorporate a desired customization of a user into an electronic newspaper, wherein the preference nodes of the belief network contain available customizations for the electronic newspaper, and wherein the step of indicating the available preference includes indicating the available customization having the greatest likelihood of being the desired customization.

24. The method of claim 18 wherein the collaborative filtering system predicts a desired Internet page of a user, wherein the preference nodes of the belief network contain available Internet pages, and wherein the step of indicating the available preference includes indicating the available Internet page having the greatest likelihood of being the desired Internet page.

25. The method of claim 18 wherein the collaborative filtering system predicts a desired Encarta article of a user, wherein the preference nodes of the belief network contain available Encarta articles, and wherein the step of indicating the available preference includes indicating the available Encarta article having the greatest likelihood of being the desired Encarta article.

26. The method of claim 18 wherein the collaborative filtering system predicts a desired graphical layout of a user for a graphical presentation program, wherein the preference nodes of the belief network contain available graphical layouts, and wherein the step of indicating the available preference includes indicating the available graphical layout having the greatest likelihood of being the desired graphical layout.

27. The method of claim 18 wherein the collaborative filtering system predicts a desired email alias of a user, wherein the preference nodes of the belief network contain available email aliases, and wherein the step of indicating the available preference includes indicating the available email alias having the greatest likelihood of being the desired email alias.

28. The method of claim 18 wherein the request includes a selection criteria, wherein the step of determining the available preference includes determining the available preference that has the greatest likelihood of being the desired preference and that conforms to the selection criteria, and wherein the step of indicating the available preference includes indicating the available preference that has the greatest likelihood of being the desired preference and that conforms to the selection criteria.

29. The method of claim 18 wherein the collaborative filtering system predicts a desired movie of a user, wherein the preference nodes of the belief network contain available movies, and wherein the step of indicating the available preference includes indicating the available movie having the greatest likelihood of being the desired movie.

30. The method of claim 18 wherein the collaborative filtering system predicts a desired wine of a user, wherein the preference nodes of the belief network contain available wines, and wherein the step of indicating the available preference includes indicating the available wine having the greatest likelihood of being the desired wine.

31. The method of claim 18 wherein the collaborative filtering system predicts a desired restaurant of a user, wherein the preference nodes of the belief network contain available restaurants, and wherein the step of indicating the available preference includes indicating the available restaurant having the greatest likelihood of being the desired restaurant.

32. The method of claim 18 wherein the collaborative filtering system predicts desired real estate of a user, wherein the preference nodes of the belief network contain available real estate, and wherein the step of indicating the available preference includes indicating the available real estate having the greatest likelihood of being the desired real estate.

33. The method of claim 18 wherein the collaborative filtering system predicts a desired advertisement of a user, wherein the preference nodes of the belief network contain available advertisements, and wherein the step of indicating the available preference includes indicating the available advertisement having the greatest likelihood of being the desired advertisement.

34. The method of claim 33 wherein the step of indicating the available advertisement includes viewing the available advertisement having the greatest likelihood of being the desired advertisement by the user via an on-line system.

35. The method of claim 33 wherein the step of indicating the available advertisement includes viewing the available advertisement having the greatest likelihood of being the desired advertisement by the user via an interactive television.

36. The method of claim 33 wherein the step of indicating the available advertisement includes viewing the available advertisement having the greatest likelihood of being the desired advertisement by a user via a computer network.

37. The method of claim 18 wherein the collaborative filtering system predicts desired products of a user, wherein the preference nodes of the belief network contain available products, and wherein the step of indicating the available preference includes indicating the available product having the greatest likelihood of being the desired product.

38. The method of claim 18 wherein the collaborative filtering system predicts inappropriate material that is deemed unsuitable for viewing by children, wherein the preference nodes of the belief network contain available material, and wherein the step of indicating the available preference includes indicating the available material having the greatest likelihood of being the inappropriate material.

39. The method of claim 38 wherein the available material includes music.

40. The method of claim 38 wherein the available material includes music videos.

41. The method of claim 38 wherein the available material includes movies.

42. The method of claim 38 wherein the available material includes Internet pages.

43. The method of claim 18 wherein the collaborative filtering system predicts a desired user interface of a user, wherein the preference nodes of the belief network contain available user interfaces, and wherein the step of indicating the available preference includes indicating the available user interface having the greatest likelihood of being the desired user interface.
 Description Submit all comments and votes
 


TECHNICAL FIELD

The present invention relates generally to data processing systems and, more particularly, to a collaborative filtering system that utilizes a belief network.

BACKGROUND OF THE INVENTION

Collaborative filtering systems have been developed that predict the preferences of a user. The term "collaborative filtering" refers to predicting the preferences of a user based on known attributes of the user, as well as known attributes of other users. For example, a preference of a user may be whether they would like to watch the television show "I Love Lucy" and the attributes of the user may include their age, gender, and income. In addition, the attributes may contain one or more of the user's known preferences, such as their dislike of another television show. A user's preference can be predicted based on the similarity of that user's attributes to other users. For example, if all users over the age of 50 with a known preference happen to like "I Love Lucy" and if that user is also over 50, then that user may be predicted to also like "I Love Lucy" with a high degree of confidence. One conventional collaborative filtering system has been developed that receives a database as input. The database contains attribute-value pairs for a number of users. An attribute is a variable or distinction, such as a user's age, gender or income, for predicting user preferences. A value is an instance of the variable. For example, the attribute age may have a value of 23. Each preference contains a numeric value indicating whether the user likes or dislikes the preference (e.g., 0 for dislike and 1 for like). The data in the database is obtained by collecting attributes of the users and their preferences.

It should be noted that conventional collaborative filtering systems can typically only utilize numerical attributes. As such, the values for non-numerical attributes, such as gender, are transposed into a numerical value, which sometimes reduces the accuracy of the system. For example, when a variable has three non-numerical states, such as vanilla, chocolate and strawberry, transposing these states into a numerical value will unintentionally indicate dissimilarity between the states. That is, if vanilla were assigned a value of 1, chocolate 2 and strawberry 3, the difference between each value indicates to the system how similar each state is to each other. Therefore, the system may make predictions based on chocolate being more similar to both vanilla and strawberry than vanilla is similar to strawberry. Such predictions may be based on a misinterpretation of the data and lead to a reduction in the accuracy of the system.

In performing collaborative filtering, the conventional system first computes the correlation of attributes between a given user "v" and each other user "u" (except v) in the database. The computation of the "correlation" is a well-known computation in the field of statistics. After computing the correlation, the conventional system computes, for example, the preference of a user "v" for a title of a television show "t" as follows: ##EQU1## where "pref(t, v)" is the preference of user "v" for title "t," where "<pref(t)>" is the average preference of title "t" by all users, where "pref(t, u)" is the preference of user "u" for title "t," where "corr(u, v)" is the correlation of users "u" and "v," and the sums run over the users "u" that have expressed a preference for title "t." One drawback to this conventional system is that the entire database must be examined when predicting preferences, which requires a significant amount of processing time.

One way to improve upon this conventional system is to utilize a clustering algorithm. Using this approach, a collaborative filtering system uses any of a number of well-known clustering algorithms to divide the database into a number of clusters. For example, the algorithms described in KoJain, Algorithms for Clustering Data (1988) can be used. Each cluster contains the data of users whose preferences tend to be similar. As such, when predicting the preferences of one user in a cluster, only the preferences of the other users in the cluster need to be examined and not the preferences of all other users in the database. A collaborative filtering system that utilizes a clustering algorithm receives as input a database, as described above, a guess of the number of clusters and a distance metric. The guess of the number of clusters is provided by an administrator of the collaborative filtering system based on their own knowledge of how many clusters the database can probably be divided into. The distance metric is a metric provided by the administrator for each user in the database that estimates how similar one user is to each other in the database based on user's preferences and attributes. The distance metric is a range between 0 and 1 with 0 indicating that two users are least similar and 1 indicating that two users are most similar. This similarity is expressed as a numerical value. Each user will have a distance metric for every other user. Thus, the distance metrics are conveniently represented by an N-by-N matrix, where "N" is the number of users. After receiving the number of clusters and the distance metric, the clustering algorithm identifies the clusters.

The clustering algorithm outputs a list of the users in the database and a cluster number assigned to each user. To determine the preferences of a user, the other users within that user's cluster are examined. For example, if the system is attempting to determine whether a user would like the television show "I Love Lucy," the other users within that cluster are examined. If there are six other users within the cluster and five out of the six like "I Love Lucy," then it is likely that so will the user.

Although utilizing a clustering algorithm is an improvement over the previously-described conventional system, it has limitations. One such limitation is that the exact number of clusters is determined manually, which renders the algorithm prone to human error. Another limitation is that all attributes are numerical and as such, the values of non-numerical attributes must be transposed into numerical values. Based upon the above-described limitations of conventional collaborative filtering systems, it is desirable to improve collaborative filtering systems.

SUMMARY OF THE INVENTION

The disclosed system provides an improved collaborative filtering system by utilizing a belief network, which is sometimes known as a Bayesian network. The disclosed system learns a belief network using both prior knowledge obtained from an expert in a given field of decision making and a database containing empirical data obtained from many users. The empirical data contains attributes of users as well as their preferences in the field of decision making. After initially learning the belief network, the belief network is relearned at various intervals when additional attributes are identified as having a causal effect on the preferences and data for these additional attributes can be gathered. This relearning allows the belief network to improve its accuracy at predicting preferences of a user. Upon each iteration of relearning, a cluster model is automatically generated which indicates a division of the database into a number of clusters that best predict the data in the database. After relearning the belief network a number of times, the belief network is used to predict the preferences of a user using probabilistic inference. In performing probabilistic inference, the known attributes of a user are received and the belief network is accessed to determine the probability of the unknown preferences of the user given the known attributes. Based on these probabilities, the preference most likely to be desired by the user can be predicted.

The disclosed system provides many advantages over conventional collaborative filtering systems. First, prior knowledge is utilized from an expert in a given field of decision making to seed the clustering, which produces clusters that more accurately reflect the data in the database than some conventional systems. Second, the number of clusters is determined automatically from the data in the database, which is more reliable than manually predicting and inputting the number of the clusters as done in some conventional systems. Third, no distance metric is needed by the disclosed system. This advantage helps reduce the mount of data that must be gathered before the system can be run. Fourth, the disclosed system can use non-numerical attributes so as to eliminate errors introduced to the system through the transposition of non-numerical values into numerical values. Fifth, the output of the disclosed system is a clustering model that is easily modifiable by an administrator so that it can be fed back into the system and improved upon by the system in an iterative manner. This iterative approach leads to improved accuracy in determining the preferences of a user. Other advantages of the disclosed system will become apparent to those skilled in the art upon reading the following description.

In accordance with a first aspect of the present invention, a method is provided in a collaborative filtering system for predicting a desired preference of a user based on attributes of the user. The method of the first aspect receives a belief network into the collaborative filtering system. The belief network contains attribute nodes and preference nodes. The attribute nodes reflect the attributes of the user and have a value. The preference nodes reflect available preferences of the user with each preference node having a probability indicating the likelihood that the available preference is the desired preference of the user. The first aspect receives a request, determines the available preference having the greatest likelihood of being the desired preference, and indicates the available preference having the greatest likelihood of being the desired preference.

In accordance with a second aspect of the present invention a collaborative filtering system is provided for predicting a desired preference of a user based on attributes of the user. The collaborative filtering system comprises a belief network, a receive component, a determination component, and an output component. The belief network contains attribute nodes and preference nodes. The attribute nodes reflect the attributes of the user and have a value. The preference nodes reflect available preferences of the user with each preference node having a probability. The receive component receives a request to determine an available preference having the greatest likelihood of being the desired preference. The determination component determines the available preference having the greatest likelihood of being the desired preference by evaluating the probabilities of the preference nodes. The output component indicates the available preference having the greatest likelihood of being the desired preference.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts an example of a belief network.

FIG. 2 depicts an example of a belief network for troubleshooting automobile problems.

FIG. 3 depicts a computer system suitable for practicing the preferred embodiment oft he present invention.

FIG. 4 depicts a belief network that reflects prior knowledge.

FIG. 5 depicts the database of FIG. 3 in greater detail.

FIG. 6 depicts the network generator of FIG. 3 in greater detail.

FIG. 7 depicts a flowchart of the steps performed when learning a belief network to be utilized in the collaborative filtering system depicted in FIG. 3.

FIG. 8 depicts an initial belief network to be used in the learning process of FIG. 7.

FIG. 9 depicts a cluster model that is output by the network generator of FIG. 3.

FIG. 10 depicts a flowchart of the steps performed by the network constructor of FIG. 6.

FIG. 11 depicts a flowchart of the steps performed by the network scorer of FIG. 6.

FIG. 12 depicts a flowchart of the steps performed by the collaborative filtering system of FIG. 3.

FIG. 13 depicts a flowchart of the steps performed by an alternative embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

A preferred embodiment of the present invention provides an improved collaborative filtering system by utilizing a belief network, which is sometimes known as a Bayesian network. The preferred embodiment "learns" a belief network using both prior knowledge obtained from an expert in a given field of decision making and a database containing empirical data obtained from many users. The empirical data contains attributes of users as well as their preferences in the field of decision making. After initially learning the belief network, the belief network is relearned at various intervals when additional attributes are identified as having a causal effect on the preferences and data for these additional attributes can be gathered. This relearning allows the belief network to improve its accuracy at predicting preferences of a user. Upon each iteration of relearning, a cluster model is automatically generated which indicates a number of clusters that best predicts the data in the database. After relearning the belief network a number of times, the belief network is used to predict the preferences of a user using probabilistic inference. In performing probabilistic inference, the known attributes of a user are received and the belief network is accessed to determine the probability of the unknown preferences of the user given the known attributes. Based on these probabilities, the set of preferences most likely to be desired by the user can be predicted.

The preferred embodiment of the present invention provides a number of advantages over conventional systems. First, prior knowledge is utilized from an expert in a given field of decision making to seed the clustering, which produces clusters that more accurately reflect the data in the database than some conventional systems. Second, the number of clusters is determined automatically from the data in the database, which is more reliable than manually predicting and inputting the number of the clusters as done in some conventional systems. Third, no distance metrics are needed by the preferred embodiment. This advantage helps reduce the mount of data that must be gathered before the system can be run and improves the accuracy of the system. Fourth, the preferred embodiment can use non-numerical attributes so as to eliminate errors introduced to the system through the transposition of non-numerical values into numerical values. Fifth, the output of the preferred embodiment is a clustering model that is easily modifiable by an administrator so that it can be fed back into the system and improved upon by the system in an iterative manner. This iterative approach leads to improved accuracy in determining the preferences of a user. Other advantages of the preferred embodiment will become apparent to those skilled in the art upon reading the following description.

Before delving into the details of the preferred embodiment, an overview of belief networks is provided so that the reader may better comprehend the details of the preferred embodiment.

INTRODUCTION TO BELIEF NETWORKS

A belief network is a representation of the probabilistic relationships among states of a portion of the world. The states of the world in a belief network can change and are, therefore, called variables. A belief network is expressed as an acyclic-directed graph where the variables correspond to nodes and the relationships between the nodes correspond to arcs. FIG. 1 depicts an example belief network 101. In the belief network 101 there are three variables, x.sub.1, x.sub.2, and x.sub.3, which are represented by nodes 102, 106 and 110, respectively. The example belief network contains two arcs 104 and 108. Associated with each variable in a belief network is a set of probability distributions. Using conditional probability notation, the set of probability distributions for a variable can be denoted by p(x.sub.i .vertline..PI..sub.i, .xi.), where "p" refers to the probability distribution, where ".PI..sub.i " denotes the parents of variable x.sub.i and where ".xi." denotes the knowledge of the expert. The Greek letter ".xi." indicates that the belief network reflects the knowledge of an expert in a given field. Thus, this expression reads as follows: the probability distribution for variable x.sub.i given the parents of x.sub.i and the knowledge of the expert. For example, x.sub.1 is the parent of x.sub.2. The probability distributions specify the strength of the relationships between variables. For instance, if x.sub.1 has two states (true and false), then associated with x.sub.1 is a single probability distribution p(x.sub.1 .vertline..xi.) and associated with x.sub.2 are two probability distributions p(x.sub.2 .vertline.x.sub.1 =t,.xi.) and p(x.sub.2 .vertline.x.sub.1 =f,.xi.).

One important concept for belief networks is the concept of dependence. Sets of variables x and y are said to be conditionally independent, given a set of variables z, if the probability distribution for x given z does not depend on y. That is, if p(x.vertline.z,y)=p(x.vertline.z), x and y are conditionally independent given z. If z is empty, however, x and y are said to be "independent" as opposed to conditionally independent. If x and y are not conditionally independent, given z, x and y are said to be conditionally dependent given z.

The arcs in a belief network convey dependence between nodes. When there is an are from a first node to a second node, the probability distribution of the second node depends upon the value of the first node. For example, there is an arc from node 102 to node 106. Therefore, node 106 is said to be dependent on node 102. Missing arcs in a belief network convey conditional independence. For example, node 102 and node 110 are conditionally independent given node 106. That is, the values of nodes 102 and 110 are conditionally independent if the value of node 106 is known, the condition being the observation node 106. However, two variables indirectly connected through intermediate variables are dependent given lack of knowledge of the values ("states"