The present invention provides a system and method for inferring a user's information need in a collection of hypermedia documents, based on the observation that a user's hypertext link traversal decisions typically reflect the nature of that information need. The system identifies the hypermedia linkage structure among the documents in the collection. The documents include content items that may be relevant to a user's information need. The system then accepts a user path item that represents the user's hypermedia link traversal history and applies a network flow model to the user path item and the hypermedia link information in order to create a document vector. The system also determines the distribution of the content items in the document collection and compares the document vector to that distribution in order to determine an inferred information need.
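The flow described above can be illustrated with a minimal sketch. It assumes spreading activation as the network flow model, which the abstract does not name, and it assumes that later documents in the path deserve more weight. The function names `spread_activation` and `infer_need`, the damping parameter `alpha`, and the toy data are all hypothetical and chosen only for illustration.

```python
def spread_activation(links, path, n_docs, alpha=0.5, steps=10):
    """Return a document vector (one activation value per document).

    links: dict mapping a document index to the list of documents it links to.
    path:  list of document indices the user visited, in order.
    alpha: fraction of a document's activation passed along its outlinks
           (an assumed damping factor, not taken from the source).
    """
    # Source activation: later path positions get more weight (assumption).
    source = [0.0] * n_docs
    for pos, doc in enumerate(path, start=1):
        source[doc] += pos / len(path)

    activation = source[:]
    for _ in range(steps):
        nxt = source[:]  # the path keeps injecting activation every step
        for doc, targets in links.items():
            if not targets:
                continue
            share = alpha * activation[doc] / len(targets)
            for target in targets:
                nxt[target] += share
        activation = nxt
    return activation


def infer_need(activation, doc_terms):
    """Compare the document vector with the content item distribution.

    doc_terms: list (indexed by document) of dicts mapping term -> weight.
    Returns (term, score) pairs, highest score first.
    """
    need = {}
    for doc, terms in enumerate(doc_terms):
        for term, weight in terms.items():
            need[term] = need.get(term, 0.0) + activation[doc] * weight
    return sorted(need.items(), key=lambda kv: kv[1], reverse=True)


# Toy collection: document 0 links to 1 and 2, which both link to 3.
links = {0: [1, 2], 1: [3], 2: [3], 3: []}
doc_terms = [
    {"home": 1.0},
    {"printer": 1.0},
    {"scanner": 1.0},
    {"printer": 0.5, "driver": 1.0},
]
doc_vector = spread_activation(links, path=[0, 1], n_docs=4)
ranked_need = infer_need(doc_vector, doc_terms)
```

In this toy run the user went from document 0 to document 1, so activation flows onward to document 3 and the terms of documents 1 and 3 dominate the inferred need. A real system would use a sparse matrix representation and a weighting scheme such as TF-IDF for the content items; both choices are left open here.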
Techniques for determining user types based on multi-modal clustering are provided. The topology, content and usage of a document collection or web site are determined. User paths are identified using longest repeating subsequence techniques, and a multi-modal information need vector is determined for each significant user path. For each document in a significant path, content, uniform resource locator (URL), inlink and outlink multi-modal vectors are determined and combined based on path position and access frequency. Multi-modal clustering is performed based on a multi-modal similarity function and a specified measure of similarity, using a type of multi-modal clustering such as K-means or wavefront clustering. The identified clusters may be further analyzed based on changes to the weighting of the corresponding content, URL, inlink and outlink multi-modal feature vectors.
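The clustering step can be sketched as follows, under several assumptions that the abstract does not confirm: the multi-modal similarity function is taken to be a weighted average of per-modality cosine similarities, each user path is already summarized as one sparse vector per modality, and K-means is seeded with the first `k` profiles so that the run is deterministic. The names `multimodal_similarity`, `kmeans`, and `MODALITIES` are hypothetical. Path extraction and the combination by path position and access frequency are not shown.

```python
import math

MODALITIES = ("content", "url", "inlinks", "outlinks")


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def multimodal_similarity(p, q, weights):
    """Weighted average of per-modality cosine similarities (assumed form)."""
    total = sum(weights[m] for m in MODALITIES)
    if total == 0:
        return 0.0
    return sum(weights[m] * cosine(p[m], q[m]) for m in MODALITIES) / total


def centroid(members):
    """Per-modality mean of a non-empty list of profiles."""
    center = {m: {} for m in MODALITIES}
    for profile in members:
        for m in MODALITIES:
            for key, value in profile[m].items():
                center[m][key] = center[m].get(key, 0.0) + value / len(members)
    return center


def kmeans(profiles, k, weights, iters=10):
    """Assign each profile to the most similar of k centroids."""
    centroids = profiles[:k]  # deterministic seeding, for illustration only
    labels = []
    for _ in range(iters):
        labels = [
            max(range(k),
                key=lambda j: multimodal_similarity(p, centroids[j], weights))
            for p in profiles
        ]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(centroid(members) if members else centroids[j])
        centroids = updated
    return labels


def profile(content, url, inlink, outlink):
    """Build a toy multi-modal profile with unit weights."""
    return {
        "content": {term: 1.0 for term in content},
        "url": {url: 1.0},
        "inlinks": {inlink: 1.0},
        "outlinks": {outlink: 1.0},
    }


profiles = [
    profile(["printer", "driver"], "/support", "home", "download"),
    profile(["careers", "jobs"], "/jobs", "home", "apply"),
    profile(["printer", "toner"], "/support", "home", "download"),
    profile(["jobs", "benefits"], "/jobs", "home", "apply"),
]
equal_weights = {m: 1.0 for m in MODALITIES}
labels = kmeans(profiles, k=2, weights=equal_weights)
```

Passing a different `weights` dictionary, for example one that emphasizes only content or only inlinks, is one way to examine how the clusters depend on each modality, in the spirit of the re-weighting analysis mentioned in the abstract.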