WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
System and method for hierarchically grouping and ranking a set of objects in a query context based on one or more relationships    
United States Patent5875446   
Link to this pagehttp://www.wikipatents.com/5875446.html
Inventor(s)Brown; Eric William (New Fairfield, CT); Chang; Rong Nickle (Ossining, NY); Ellozy; Hamed Abdelfattah (Bedford Hills, NY); Prager; John Martin (Hackensack, NJ); So; Edward Cholchin (Flushing, NY)
AbstractTopically relevant objects in an object database are first identified using any generally known methods to obtain a set of topically relevant objects (topically relevant set). Parents, and in alternative embodiments other ancestors, of one or more of the topically relevant objects are identified according to directional structural relationships that the parents have with respect to the topically relevant objects. These objects form a set of structurally relevant objects (structurally relevant set). In some embodiments, the user query identifies one or more of these structural relationships. The topically relevant objects are then organized under one or more of their respective parents to form a hierarchy level of both (topically relevant and structurally relevant) sets of objects. In some preferred embodiments, the process can iterate to create more than one hierarchy level.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Inventor     Brown; Eric William (New Fairfield, CT); Chang; Rong Nickle (Ossining, NY); Ellozy; Hamed Abdelfattah (Bedford Hills, NY); Prager; John Martin (Hackensack, NJ); So; Edward Cholchin (Flushing, NY)
Owner/Assignee     International Business Machines Corporation (Armonk, NY)
Patent assignment
All assignments
Publication Date     February 23, 1999
Application Number     08/804,599
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     February 24, 1997
US Classification     707/3 707/1 707/4 715/970.1
Int'l Classification     G06F 017/30
Examiner     Lintz; Paul R.
Assistant Examiner     Mizrahi; Diane D.
Attorney/Law Firm     Percello; Louis J. McGinn & Gibb, P.C.
Address
Parent Case    
Priority Data    
USPTO Field of Search     395/200.79 395/683 395/703 395/708 395/200.75 707/203 707/1 707/3 707/4 707/103 707/104 380/4 345/348 345/356 706/5 706/25 434/350 370/351 370/222 379/202
Patent Tags     hierarchically grouping ranking set of objects query context based one more relationships
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5765028
Gladden
706/25
Jun,1998

[0 after 0 votes]
5727175
Malone
715/763
Mar,1998

[0 after 0 votes]
5710918
Lagarde
707/10
Jan,1998

[0 after 0 votes]
5701451
Rogers
707/1
Dec,1997

[0 after 0 votes]
5692180
Lee
707/10
Nov,1997

[0 after 0 votes]
5680452
Shanton
713/167
Oct,1997

[0 after 0 votes]
5491820
Belove
707/3
Feb,1996

[0 after 0 votes]
5371852
Attanasio
709/245
Dec,1994

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


We claim:

1. A computer system having one or more memories and one or more central processing units, the computer capable of accessing one or more database memories, a plurality of objects stored in one or more of the database memories and each object identified by an index, the computer system further comprising:

a user interface for entering a query;

a hit list, stored in one or more of the memories, the hit list being a topically relevant set of a plurality of topically relevant objects, the topically relevant set being selected from one or more of the database memories and ranked by a topical search engine using the index, the rank indicating a topical relevance to the query;

a relationship data structure containing information about how two or more of the objects are related to one another by one or more structural relationships, the structural relationships being directed; and

a hierarchical view generator that selects a structurally relevant set of the objects, each object in the structurally relevant set being a candidate parent of one or more of the topically relevant objects according to one or more of the structural relationships, said hierarchical view generator dynamically determining which of the candidates for parents are selected as actual parents for said query entered by said user interface, based on said one or more structural relationships,

each of the topically relevant objects being a child of one or more of the selected parents, the hierarchical view generator organizing each of the child objects under one or more of its respective selected parents to create a directional hierarchy.

2. A computer system, as in claim 1, further comprising a display that renders a presentation of one or more of the selected parents and the child objects associated with the respective selected parent.

3. A computer system, as in claim 1, where the database memory resides on one or more connected computers that are connected to the computer system by a network.

4. A computer system, as in claim 3, where the query accesses the computer system from the network.

5. A computer system, as in claim 1, where the hierarchical view generator subselects a subset of the most relevant topically relevant objects according to rank as the topically relevant set.

6. A computer system, as in claim 1, where the information in the relationship data structure includes weights for the structural relationships and the hierarchical view generator subselects a subset of the structurally relevant objects as the structurally relevant set as a function of the weights.

7. A computer system, as in claim 1, where one or more relationship data structures defining one or more structural relationships are selected using information in the query.

8. A computer system, as in claim 1, where the structural relationships include any one or more of the following: the hyperlink relationship, the subclass or categorization relationship, the geographic location relationship, the location within a file system relationship, the conceptual relationship, and the is-a-part-of relationship.

9. The system of claim 1, wherein said hit-list is organized into a hierarchical structure of parent and children objects,

the connections in the structure corresponding to one or more of the structural relationships, and said structured hit-list allowing a user to understand a nature of the results of the query, and to find an appropriate object to satisfy the user's information need.

10. The system of claim 1, wherein a topical search, responsive to the query, comprises a content-based similarity search where the query does not define a precise set of objects,

the search engine computing the similarity of the query to each object in the database to generate a ranked list of objects ordered by relevance to the query.

11. The system of claim 1, wherein said objects comprise data objects, and said data objects are stored in a database with relationships defined between the objects,

said hierarchical view generator selecting and processing the relationships to identify structurally relevant parents from an original child set.

12. The system of claim 1, wherein said hierarchical view generator includes an iterator, said iterator processing structural relationships to build a parent hierarchy based on topical and structural relevance,

said iterator selecting structural relationships in a query context and ranks parent objects based on their structural relevance to the query.

13. The system of claim 1, wherein the hierarchical structure groups objects for answering a user's query, and provides insight into the relevant structure of the database, to satisfy the user's information need.

14. The system of claim 1, wherein the user's query selects certain relationships for finding structurally relevant objects and for organizing and displaying the result list accordingly,

wherein said objects are ranked based on degrees of topical relevance, and a subset of the topically relevant objects is selected for further relationship processing.

15. The system of claim 1, wherein said objects on the result list are organized based on their structural relationships, and display of objects at certain levels of the hierarchy is suppressed based on the relationships in which they participate,

wherein objects in the topically relevant set are ranked, and the children thereof are organized under their parents according to a structural relationship.

16. A computer system having one or more memories and one or more central processing units, the computer capable of accessing one or more database memories, a plurality of objects stored in one or more of the database memories and identified by an index, the computer system further comprising:

a user interface for entering a query;

a hit list, stored in one or more of the memories, the hit list being a topically relevant set of a plurality of topically relevant objects, the topically relevant set being selected from one or more of the database memories and ranked by a topical search engine using the index, the rank indicating a topical relevance to the query,

a relationship data structure containing information about how two or more of the objects are related to one another by one or more structural relationships, the structural relationships being directed;

a hierarchical view generator that selects a structurally relevant set of the objects, each object in the structurally relevant set being a candidate parent of one or more of the topically relevant objects according to one or more of the structural relationships, said hierarchical view generator dynamically determining which of the candidates for parents are selected as actual parents for said query entered by said user interface, based on said one or more structural relationships,

each of the topically relevant objects being a child of one or more of the selected parents, the hierarchical view generator organizing each of the child objects under one or more of its respective selected parents to create a directional hierarchy; and

an iterator in the hierarchical view generator that treats the selected parents as topically relevant objects and runs the hierarchical view generator zero or more times, each time creating a parent level in the directional hierarchy.

17. A system, as in claim 16, where a different set of one or more structural relationships is selected in one or more iterations of the iterator.

18. A computer system, as in claim 16, further comprising:

a display that renders a presentation of one or more selected parents in one of the parent levels and one or more of the objects, being displayed objects, structurally related to one of the selected parents and at a lower level than the parent level in the directional hierarchy.

19. A system, as in claim 18, where one or more of the displayed objects is suppressed in the display.

20. A system, as in claim 19, where the displayed object is suppressed due to its structural relationship with the respective selected parent.

21. A computer system having one or more memories and one or more central processing units, the computer capable of accessing one or more database memories, a plurality of objects stored in one or more of the database memories and each object identified by an index, the computer system further comprising:

means for selecting a topically relevant set of two or more of the objects;

means for ranking the objects in the topically relevant set;

means for identifying one or more structural relationships between one or more of the objects in the topically relevant set, being children, and one or more of the objects in the database memories, being parents, the structural relationships being directional from the parent to the child, said identifying means including means for dynamically determining which of the candidates for parents are selected as actual parents for said query entered by said user interface based on said one or more structural relationships; and

means for organizing one or more of the children under each of the respective selected parents in a structural hierarchy.

22. A method of hierarchically grouping a plurality of objects stored In one or more database memories of a computer system, comprising steps of:

a. selecting a topically relevant set of two or more of the objects;

b. ranking the objects in the topically relevant set;

c. identifying one or more structural relationships between one or more of the objects in the topically relevant set, being children, and one or more of the objects in the database memories, being parents, the structural relationships being directional from the parent to the child, said step of dynamically determining which of the candidates for parents are selected as actual parents for said query entered by said user interface based on said one or more structural relationships; and

d. organizing one or more of the children under each of the respective selected parents.

23. A method, as in claim 22, further comprising steps of:

e. deselecting the children and identifying the parents as children; and

f. repeating steps c and d zero or more times, each time creating a next hierarchical level.
 Description Submit all comments and votes
 


FIELD OF THE INVENTION

This invention relates to the field of searching and navigating a large object collection, particularly a hypermedia database in a networking environment. More specifically, the invention relates to a system and method for generating grouped hierarchical views (with ranking) for a set of (hypermedia) objects in a query context based on one or more relationships.

BACKGROUND OF THE INVENTION

A hypermedia object database is a collection of hypermedia objects stored electronically as files on one or more computers. Hypermedia objects contain information in the form of text, images, sound, or video. Hypermedia objects may also participate in relationships, where a relationship identifies one or more hypermedia objects that are related somehow (a hypermedia object may be related to itself). One common relationship is the hyperlink relationship. Two hypermedia objects are related by a (directed) hyperlink relationship if one of the objects contains a hyperlink pointer to the other object. A hyperlink pointer is a reference to a hypermedia object that allows the target object to be accessed directly from the source object.

Users access a hypermedia object database to locate objects of interest and retrieve those objects for processing (e.g., reading, viewing, listening, analysis). Finding objects of interest by manually inspecting every object in a large database is impractical. Instead, users typically search the database for interesting objects using a search system. A search system allows a user to express an information need in the form of a query. The system's search engine processes the query and returns to the user a hit-list of relevant objects. The user then selects interesting objects from the hit-list and retrieves those objects.

A relational database management system (RDBMS) may be used to index and search arbitrary hypermedia objects based on their attributes. Attributes include items such as size, creation date, author, and title. Searching for objects in this fashion is well known. In addition to attribute-based searching, users may want to search for hypermedia objects based on their content. The algorithms and data structures used by a content-based search system depend on the kind of object being searched. Text objects are typically searched using an information retrieval (IR) system (e.g., IBM Search Manager/2, a trademark of the IBM Corporation). Image objects are typically searched using an image indexing and retrieval system (e.g., IBM QBIC, a trademark of the IBM Corporation). Content-based search techniques for video and sound exist and have been incorporated into prototype systems, but this technology is less mature than text and image search. Objects found using an attribute-based or content-based search system are said to be "topically relevant" to the query.

Some prior art content-based search systems attempt to improve the search results for hypermedia object databases by refining object relevance scores based on the structural relationships (e.g., hyperlinks) between the objects. Three representative techniques are used by these systems. The first technique is a form of "spreading activation," where object relevance scores are propagated along outbound hyperlink pointers to neighboring objects and used to modify the relevance scores of those objects (see Cohen, P. R., and Kjeldsen, R. "Information Retrieval by Constrained Spreading Activation in Semantic Networks," Information Processing & Management, 23(2), pp. 255-268, 1987; Savoy, J. "Citation Schemes in Hypertext Information Retrieval," in M. Agosti and A. Smeaton (Eds.), Information Retrieval and Hypertext, Boston, Kluwer Academic Publishers, pp. 99-120, 1996). This procedure is typically iterated until a steady-state is reached or some terminating condition is met.

The objects are then sorted by their final relevance scores and returned on a flat hit-list (i.e., the hit-list simply enumerates the objects without describing any structural relationships).

In the second technique, it is assumed that the hypermedia objects are organized in a given hierarchy, such that every object has at most one parent and the children of a given object are explicitly identified. An object's relevance score is then calculated as a function of its content-based relevance score and the relevance scores of its children. Relevance scores must be propagated from the leaves of a hierarchy to the root (see Frisse, M. E. "Searching for Information in a Hypertext Medical Handbook," Communications of the ACM, 31(7), pp. 880-886, 1988). The objects are then sorted by their final relevance scores and returned on a flat hit-list.

In the third technique, the content of neighboring objects is added to the content of the current object when determining the relevance score for the current object (see Croft et al. "Retrieving Documents by Plausible Inference: an Experimental Study," Information Processing & Management, 25(6), pp. 599-614, 1989). Neighboring objects are those objects to which the current object contains hyperlink pointers. As in the previous two techniques, objects are sorted by their relevance scores and returned on a flat hit-list.

The above cited references are incorporated by reference in their entirety.

Regardless of the search technology being used, most search systems follow the same basic procedure for indexing and searching a hypermedia object database. First, the objects to be searched must be input to the search system for indexing. Next, attributes and/or contents are extracted from the objects and processed to create an index. An index consists of data that is used by the search system to process queries and identify relevant objects. After the index is built, queries may be submitted to the search system. The query represents the user's information need and is expressed using a query language and syntax defined by the search system. The search system processes the query using the index data for the database and a suitable similarity ranking algorithm, and returns a hit-list of topically relevant objects. The user may then select relevant objects from the hit-list for viewing and processing.

A user may also use objects on the hit-list as navigational starting points. Navigation is the process of moving from one hypermedia object to another hypermedia object by traversing a hyperlink pointer between the objects. This operation is typically facilitated by a user interface that displays hypermedia objects, highlights the hyperlinks in those objects, and provides a simple mechanism for traversing a hyperlink and displaying the referent object. One such user interface is a Web browser (see below). By navigating, a user may find other objects of interest.

In a networking environment, the components of a hypermedia object database system may be spread across multiple computers. A computer comprises a Central Processing Unit (CPU), main memory, disk storage, and software (e.g., a personal computer (PC) like the IBM ThinkPad). (ThinkPad is a trademark of the IBM Corporation.) A networking environment consists of two or more computers connected by a local or wide area network (e.g., Ethernet, Token Ring, the telephone network, and the Internet.) (See for example, U.S. Pat. No. 5,371,852 to Attanasio et al. issued on Dec. 6, 1994 which is herein incorporated by reference in its entirety.) A user accesses the hypermedia object database using a client application on the user's computer. The client application communicates with a search server (the hypermedia object database search system) on either the user's computer (e.g. a client) or another computer (e.g. one or more servers) on the network. To process queries, the search server needs to access just the database index, which may be located on the same computer as the search server or yet another computer on the network. The actual objects in the database may be located on any computer on the network.

A Web environment, such as the World Wide Web on the Internet, is a networking environment where Web servers, e.g. Netscape Enterprise Server and IBM Internet Connection Server, and browsers, e.g. Netscape Navigator and IBM WebExplorer, are used. (Netscape Navigator is a trademark of the Netscape Communications Corporation and WebExplorer is a trademark of the IBM Corporation.) Users can make hypermedia objects publicly available in a Web environment by registering the objects with a Web server. Moreover, users can create arbitrary relationships between these objects, even if the objects were created by another user. Other users in the Web environment can then retrieve these objects using a Web browser. The collection of objects retrievable in a Web networking environment can be considered as a large hypermedia object database.

To create an index for a hypermedia object database in a Web networking environment, the prior art often uses Web crawlers, also called robots, spiders, wanderers, or worms (e.g., WebCrawler, WWWWorm), to gather the available objects and submit them to the search system indexer. Web crawlers make use of the (physical) hyperlinks stored in objects. All of the objects are gathered by identifying a few key starting points, retrieving those objects for indexing, retrieving and indexing all objects referenced by the objects just indexed (via hyperlinks), and continuing recursively until all objects reachable from the starting points have been retrieved and indexed. The graph of objects in a Web environment is typically well connected, such that nearly all of the available objects can be found when appropriate starting points are chosen.

Having gathered and indexed all of the objects available in the Web environment, the index can then be used, as described above, to search for objects in the Web. Again, the index may be located independently of the objects, the client, and even the search server. A hit-list, generated as the result of searching the index, will typically identify the locations of the relevant objects on the Web, and the user will retrieve those objects directly with their Web browser.

STATEMENT OF PROBLEMS WITH THE PRIOR ART

When searching a hypermedia object database, the prior art fails to create a hierarchically structured search result based on arbitrary relationships between the objects in the database. Some prior art systems consider relationships when ranking objects, but the final result in most of these systems is still a simple hit-list (comparable to systems that ignore relationships altogether) that conveys no information about the relationships between the objects. Those prior art systems that can display a hierarchically structured search result require that a single, pre-determined hierarchy be specified for all of the objects in the hypermedia database. For large databases (e.g., the World Wide Web), this requirement is impossible to satisfy, and in general, this requirement is restrictive and inflexible.

No prior art system fully exploits the relationships between objects in a hypermedia object database to 1) produce a search result that includes both topically and structurally relevant objects ("structurally relevant" objects are objects that may not be topically relevant, but are still relevant to the query due to their structural relationships with topically relevant objects), and 2) present the search results such that the end user can easily identify and exploit the relationships between the objects on the hit-list.

The prior art performs even worse in a networked multi-user environment, e.g. the World Wide Web, where documents are created by multiple authors and are poorly cross referenced. For example, when a hypertext document is created by a single author, its pages are typically well cross referenced and provide links to next, previous, parent, index, and table of contents pages. Even though a prior art search system ignores these links and identifies only topically relevant pages in the document, since the document is well cross referenced, the user can navigate to structurally relevant pages in the document by following the cross reference links. However, the user does not have this option in a networked multi-user environment where a second, independent author could create a related document with hypertext links to the document created by the first author. A relationship now exists between these two documents, but there are no good cross references from the first document to the second document, preventing the user from navigating to the second document. If the second document is not topically relevant to the user's query, the user will fail to find the second document even though it is structurally relevant to the query due to its relationship to the first document.

OBJECTS OF THE INVENTION

An object of this invention is a system and method that generates a hierarchical grouping of topically and structurally relevant objects in a query context.

An object of this invention is a system and method that generates a ranked, hierarchical grouping of topically and structurally relevant objects in a query context.

An object of this invention is a system and method that generates a hierarchical grouping of relevant objects in a query context based on one or more structural relationships.

An object of this invention is a system and method that generates ranked, hierarchical groupings of topically and structurally relevant objects in a query context in a networking environment.

An object of this invention is a system and method that generates a hierarchical grouping of topically and structurally relevant objects in a query context by: identifying objects that are structurally relevant, organizing structurally and topically relevant objects into meaningful groups, identifying navigational starting points, and presenting the groups to a user in a meaningful way.

SUMMARY OF THE INVENTION

The present invention is a system and method for identifying and hierarchically grouping one or more (hypermedia) objects that are topically relevant to a user query and/or have a structural relation with one another.

Topically relevant objects are first identified using any generally known methods to obtain a set of topically relevant objects (topically relevant set). Parents, and in alternative embodiments other ancestors, of one or more of the topically relevant objects are identified according to directional structural relationships that the parents have with respect to the topically relevant objects. These objects form a set of structurally relevant objects (structurally relevant set). In some embodiments, the user query identifies one or more of these structural relationships. The topically relevant objects are then organized under one or more of their respective parents to form a hierarchy level of both (topically relevant and structurally relevant) sets of objects. In some preferred embodiments, the process can iterate to create more than one hierarchy level.

In alternative embodiments, subsets of the topically relevant set can be selected, e.g., by rank, as the topically relevant set. Also, subsets of the structurally relevant set, e.g., by weighting, can be selected as the structurally relevant set. For example, the weight can be based on the number of hyperlink paths that connect the ancestor to one or more of the topically relevant objects, the strengths of these hyperlink paths, and other attributes of the ancestor, such as the total number of hyperlinks originating at the ancestor and the topical relevance of the ancestor.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of preferred embodiments of the invention with reference to the drawings that include the following:

FIG. 1 is a block diagram of the computing environment in which the present invention is used in a non limiting preferred embodiment.

FIG. 2, comprising FIGS. 2A and 2B, is a block diagram of an example object collection, in particular a hypermedia object database.

FIG. 3 is a block diagram of an example hypermedia object database that is hierarchically grouped by the present invention.

FIG. 4 is a block diagram of an example hypermedia object database that is hierarchically grouped as defined by a first structural relationship.

FIG. 5 is a block diagram of an example hypermedia object database that is hierarchically grouped as defined by a second structural relationship.

FIG. 6, includes prior art FIGS. 6A and 6B, where FIG. 6A shows an object catalog, and FIG. 6B a table of named attribute values for objects in the catalog.

FIG. 7 comprises FIGS. 7A and 7B, where FIG. 7A shows a structured query and FIG. 7B shows an object hit-list.

FIG. 8 shows a relationship catalog and several of the relationship tables to which the relationship catalog refers.

FIG. 9 a children table.

FIG. 10 is a flow chart showing the steps of one preferred hierarchical view generator process executed by the present invention.

FIG. 11 is a flow chart showing the steps of a preferred process for scoring parent objects based on structural relationships.

FIG. 12, comprising FIGS. 12A and 12B, are flow charts showing the steps of a preferred process for displaying the results in a ranked hierarchical order.

FIG. 13 is a screen dump showing a sample output result of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a block diagram of the computing environment in which the present invention is used in a non limiting preferred embodiment. The figure shows some of the possible hardware, software, and networking configurations that make up the computing environment.

The computing environment or system 100 comprises one or more general purpose computers 170, 175, 180, 185, 190, and 195 interconnected by a network 105. Examples of general purpose computers include the IBM Aptiva personal computer, the IBM RISC System/6000 workstation, and the IBM POWER parallel SP2. (These are Trademarks of the IBM Corporation.) The network 105 may be a local area network (LAN), a wide area network (WAN), or the Internet. Moreover, the computers in this environment may support the Web information exchange protocol (HTTP) and be part of a local Web or the World Wide Web (WWW). Some computers (e.g., 195) may occasionally or always be disconnected 196 from the network and operate as stand-alone computers.

Hypermedia objects 140 are items such as books, articles, reports, pictures, movies, or recordings that contain text, images, video, audio, or any other multimedia object and/or information. One or more hypermedia objects are stored on one or more computers in the environment.

To find a particular hypermedia object in the environment, a query (see FIG. 7A) is submitted for processing to a topical search engine 120 running on a computer in the environment. The topical search engine uses an index 130 to identify hypermedia objects that are relevant to the query. The topical search engine creates an index by indexing a particular set of hypermedia objects in the environment, called a hypermedia object database 141. A hypermedia object database 141 may comprise hypermedia objects located anywhere in the computing environment, e.g., spread across two or more computers. The relevant hypermedia objects identified by the index are ranked and returned by the topical search engine in the form of a hit-list (see FIG. 7B). The process is well known in the prior art. Examples of topical search engines include Search Manager/2 (a trademark of the IBM corporation.)

The result of the search is further processed by submitting the query and topical search engine results to a novel hierarchical view generator 110. The hierarchical view generator 110 uses structural relationships in the index 130 (see FIG. 8) to analyze (parent selection and ranking) the relationships in the database and improve the search result. The structural relationships are stored in the index by the hierarchical view generator at indexing time. In some preferred embodiments, the hierarchical view generator selects structurally relevant objects by calculating parent ranks based on the ranks of the topically relevant objects (generated by the topical search engine) and/or the (weighted) structural relationships in which the topically relevant objects participate. Structurally relevant objects have a structural relationship (see FIG. 8) with one or more of the topically relevant objects. (Structurally relevant objects may or may not be topically relevant.) The hierarchical view generator then aggregates topically relevant objects based on their relationships and generates ranked hierarchies of both the topically relevant objects and structurally relevant objects to present to the user. For convenience, the topical search engine 120 and hierarchical view generator 10 are shown here as separate components. Note, however, that both systems may be internal components of a general hypermedia object search system.

Hypermedia objects 140 and/or indexes 130 on one computer may be accessed over the network by another computer using the Web (http) protocol, a networked file system protocol (e.g., NFS, AFS), or some other protocol. Services on one computer (e.g., topical search engine 120) may be invoked over the network by another computer using the Web protocol, a remote procedure call (RPC) protocol, or some other protocol.

A number of possible configurations for accessing hypermedia objects, indexes, and services locally or remotely are depicted in the present figure. These possibilities are described further below.

One configuration is a stand-alone workstation 195 that may or may not be connected to a network 105. The stand-alone system 195 has hypermedia objects 140 and an index 130 stored locally. The stand-alone system 195 also has a topical search engine 120 and hierarchical view generator 110 installed locally. When the system is used, a query is input to the workstation 195 and processed by the local topical search engine 120 and hierarchical view generator 110 using the index 130. The results from the topical search engine are output by the workstation 195.

A second configuration is 185, a workstation with hypermedia objects and indexes connected to a network 105. This configuration is similar to the stand-alone workstation 195, except that 185 is always connected to the network 105. Also, the local index 130 may be derived from local hypermedia objects 140 and/or remote hypermedia objects accessed via the network 105, and created by either a local topical search engine 120 and hierarchical view generator 110 or a remote topical search engine 120 and/or a remote hierarchical view generator 110 accessed via the network 105. When queries are input at the workstation 185, they may be processed locally at 185 using the local topical search engine 120, local hierarchical view generator 110, and local index 130. Alternatively, the local topical search engine 120 and hierarchical view generator 110 may access a remote index 130 (e.g. on system 175) via the network 105. Alternatively, the workstation 185 may access a remote topical search engine 120 and/or hierarchical view generator 110 via the network 105.

Another possible configuration is 175, a workstation with an index only. Computer 175 is similar to computer 185 with the exception that there are no local hypermedia objects 140. The local index 130 is derived from hypermedia objects 140 accessed via the network 105. Otherwise, as in computer 185, the index 130, topical search engine 120, and hierarchical view generator 110 may be accessed locally or remotely via the network 105 when processing queries.

Another possible configuration is computer 180, a workstation with hypermedia objects only. The hypermedia objects 140 stored locally at computer 180 may be accessed by remote topical search engines 120 and hierarchical view generators 110 via the network 105. When queries are entered at computer 180, the topical search engine 120, hierarchical view generator 110, and index 130 must all be accessed remotely via the network 105.

Another possible configuration is computer 190, a client station with no local hypermedia objects 140, index 130, topical search engine 120, or hierarchical view generator 110. When queries are entered at computer 190, the topical search engine 120, hierarchical view generator 110, and index 130 must all be accessed remotely via the network 105.

Another possible configuration is computer 170, a typical web server. Queries are entered at another workstation (e.g., 175, 180, 185, or possibly 195) or a client station (e.g., 190) and sent for processing to the web server 170 via the network 105. The web server 170 uses a remote topical search engine 120, hierarchical view generator 110, and index 130 (accessed via the network 105) to process the query. Alternatively, one or more of these functions (110, 120, and 130) can reside on the web server 170. The results are returned to the workstation or client station from which the query was originally sent.

FIGS. 2 and 3 give an intuitive description of the current invention. The current invention operates on an object collection, which consists of objects and directed relationships between those objects. (The hypermedia object database 141 in FIG. 1 is an example of an object collection.) An object is an identifiable entity that typically contains data, called the object's content. The nature of this data depends on the kind of object. For example, a document object would contain text data, while an image object would contain image data.

A directed relationship describes how objects in the object collection are related to each other. A directed relationship R consists of a set of instances of R, where an instance of relationship R, denoted R(o1,o2), indicates that relationship R holds from object o1 to object o2. Each relationship instance may optionally have a weight associated with it, e.g., R(o1,o2,w), where w is the weight of the instance of relationship R between objects o1 and o2. If R(o1,o2), then object o1 is a parent of object o2, and object o2 is a child of object o1. If object o2 can be reached from object o1 by following one or more instances of relationship R, then there is a path in relationship R from object o1 to object o2. The weight of a path is a function of, e.g. the sum total, of the weights of the relationship instances that make up the path. A directed relationship defines a structural organization of the objects in the collection, therefore a directed relationship is also called a structural relationship.

Note that an undirected relationship is supported by representing it as a directed relationship that holds in both directions, e.g., undirected relationship U between object o1 and object o2 would be represented by two instances of the directed relationship U', e.g., U'(o1,o2) and U'(o2,o1).

An example of a directed relationship is