Distributed crawling of hyperlinked documents
   
Document Number
US Patent 7305610
Issued Date
December 4, 2007
Abstract
Techniques for crawling hyperlinked documents are provided. Hyperlinked documents to be crawled are grouped by host, and the host to be crawled next is selected according to a stall time of the host. The stall time can indicate the earliest time at which the host should be crawled. Stall times can be a predetermined amount of time, can vary by host, and can be adjusted according to actual retrieval times from the host.
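The scheduling idea in the abstract can be illustrated with a minimal sketch. It is not taken from the patent's claims or description: the class name `HostScheduler`, its methods, and the rule "delay = max(min_delay, factor x retrieval time)" are all assumptions chosen for illustration. URLs are grouped into per-host queues, and a heap keyed by stall time yields the host that may be crawled earliest.

```python
import heapq
from collections import deque
from urllib.parse import urlsplit


class HostScheduler:
    """Illustrative only: group URLs by host, pick the host with the earliest stall time."""

    def __init__(self, min_delay=1.0, factor=2.0):
        self.min_delay = min_delay  # assumed predetermined delay per host, in seconds
        self.factor = factor        # assumed multiplier applied to the observed retrieval time
        self.pending = {}           # host -> deque of URLs waiting to be crawled
        self.stall = {}             # host -> earliest time the host should be crawled again
        self.heap = []              # (stall_time, host) for idle hosts that have pending URLs
        self.queued = set()         # hosts currently present in the heap
        self.busy = set()           # hosts with a fetch in progress

    def add_url(self, url):
        host = urlsplit(url).netloc
        self.pending.setdefault(host, deque()).append(url)
        self._enqueue(host)

    def _enqueue(self, host):
        # A host enters the heap only if it is idle and still has work.
        if host in self.queued or host in self.busy or not self.pending.get(host):
            return
        heapq.heappush(self.heap, (self.stall.get(host, 0.0), host))
        self.queued.add(host)

    def next_url(self, now):
        """Return a URL from the host with the earliest stall time, or None if none is due."""
        if not self.heap or self.heap[0][0] > now:
            return None
        _, host = heapq.heappop(self.heap)
        self.queued.discard(host)
        self.busy.add(host)
        return self.pending[host].popleft()

    def record_fetch(self, url, finished_at, retrieval_seconds):
        """Adjust the host's stall time according to the actual retrieval time."""
        host = urlsplit(url).netloc
        self.busy.discard(host)
        delay = max(self.min_delay, self.factor * retrieval_seconds)
        self.stall[host] = finished_at + delay
        self._enqueue(host)
```

In this sketch a slow host is stalled longer than a fast one, so a crawler calling `next_url` in a loop naturally spreads its requests across hosts instead of hammering one of them.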
Drawing
Drawing from US Patent 7305610
Number of Claims:
25
Owner
Google, Inc. (Mountain View, CA)
Published
December 4, 2007
Application Number
09/638,082
Filed
August 14, 2000
US Classification
715/205  
Int'l Classification
G06F 15/00 (20060101); G06F 17/00 (20060101)
Parent Case
This application claims the benefit of U.S. Provisional Application No. 60/195,581, filed Apr. 6, 2000, which is hereby incorporated by reference.
USPTO Field of Search
715/501.1; 715/513
Related Patents
7599931 - Web forum crawler - Owned by Microsoft Corporation (Redmond, WA)

A crawling system crawls a web site initially in a pattern detection phase and subsequently in a pattern usage phase. The pattern detection phase attempts to identify patterns of references to pages that contain informational content of interest and patterns of references to pages that contain little informational content of interest. During the pattern usage phase, the crawling system crawls the web site. When the crawling system encounters a reference contained on an accessed page, the crawling system determines whether the reference matches a reference pattern. If the reference matches a reference pattern associated with pages that contain informational content of interest, the crawling system accesses the referenced page. If, however, the reference matches a reference pattern of pages with little informational content, then the crawling system discards that reference without accessing the referenced page.
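The pattern usage phase described for the related patent can be sketched as follows. This is an illustration under stated assumptions, not that patent's method: it represents reference patterns as regular expressions, the example patterns and the function name `classify_reference` are hypothetical, the pattern detection phase is not implemented, and the "unknown" result for references matching no pattern is an added convention, since the summary does not say how such references are handled.

```python
import re

# Hypothetical patterns that a detection phase might have produced.
INTERESTING = [re.compile(r"/thread/\d+$")]
UNINTERESTING = [re.compile(r"/profile/"), re.compile(r"[?&]sort=")]


def classify_reference(url):
    """Decide what to do with a reference found on an accessed page."""
    if any(p.search(url) for p in INTERESTING):
        return "access"    # matches a pattern of pages with content of interest
    if any(p.search(url) for p in UNINTERESTING):
        return "discard"   # matches a pattern of pages with little content of interest
    return "unknown"       # assumption: no pattern matched; caller decides
```

A crawler would call `classify_reference` on each reference it encounters and fetch only those classified as "access".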
