Verifying relevance between keywords and web site contents
   
Document Number
US Patent 7260568
Issued Date
August 21, 2007
Inventors
Li; Li (Kirkland, WA)
Li; Ying (Bellevue, WA)
Najm; Tarek (Kirkland, WA)
Abstract
Systems and methods for verifying relevance between terms and Web site contents are described. In one aspect, site contents from a bid URL are retrieved. Expanded term(s) semantically and/or contextually related to bid term(s) are calculated. Content similarity and expanded similarity measurements are calculated from respective combinations of the bid term(s), the site contents, and the expanded terms. Category similarity measurements between the expanded terms and the site contents are determined in view of a trained similarity classifier, which has been trained from mined web site content associated with directory data. A confidence value providing an objective measure of relevance between the bid term(s) and the site contents is determined by evaluating the content, expanded, and category similarity measurements in view of a trained relevance classifier model.
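The flow in the abstract can be illustrated with a minimal sketch. This is not the patented method: the abstract does not specify the similarity measure or the classifier, so the sketch assumes cosine similarity over raw term counts and substitutes a logistic function with made-up weights for the trained relevance classifier. The function names, the `weights` and `bias` values, and the `category_similarity` input (supplied directly instead of coming from a trained similarity classifier) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def confidence(bid_terms, expanded_terms, site_text, category_similarity,
               weights=(2.0, 2.0, 2.0), bias=-2.0):
    """Combine three similarity scores into one confidence value in (0, 1).

    Assumption: a logistic function with illustrative weights stands in
    for the trained relevance classifier model named in the abstract.
    Assumption: category_similarity is passed in by the caller, since the
    trained similarity classifier is outside the scope of this sketch.
    """
    site = tokenize(site_text)
    content_sim = cosine_similarity(tokenize(" ".join(bid_terms)), site)
    expanded_sim = cosine_similarity(tokenize(" ".join(expanded_terms)), site)
    scores = (content_sim, expanded_sim, category_similarity)
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    bid = ["hiking", "boots"]
    expanded = ["trail", "footwear"]  # hypothetical expanded terms
    page = "Hiking boots and trail footwear for every trail"
    print(confidence(bid, expanded, page, category_similarity=0.9))
```

In a real system the site text would be fetched from the bid URL and the weights would be learned from labeled examples; here they are fixed only so that the combination step is visible.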
Number of Claims:
41
Owner
Microsoft Corporation (Redmond, WA)
Published
August 21, 2007
Application Number
10/826,162
Filed
April 15, 2004
US Classification
707/3 707/4 707/5 707/E17.002 707/E17.071
Int'l Classification
G06F 17/30 (20060101)
USPTO Field of Search
707/1 707/2 707/3 707/4 707/5 707/6 707/7 707/8 707/9 707/10 707/1 707/2 707/3 707/4 707/5 707/6 707/7 707/8 707/9 707/10.1 707/1 707/2 707/3 707/4 707/5 707/6 707/7 707/8 707/9 707/10 382/218 382/220 382/225
Related Patents
7493293 - System and method for extracting entities of interest from text using n-gram models - Owned by International Business Machines Corporation (Armonk, NY)

A document (or multiple documents) is analyzed to identify entities of interest within that document. This is accomplished by constructing n-gram or bi-gram models that correspond to different kinds of text entities, such as chemistry-related words and generic English words. The models can be constructed from training text selected to reflect a particular kind of text entity. The document is tokenized, and the tokens are run against the models to determine, for each token, which kind of text entity is most likely to be associated with that token. The entities of interest in the document can then be annotated accordingly.
