WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Method and system for managing distributed data    

Get related patents on CD
United States Patent6182111   
Link to this pagehttp://www.wikipatents.com/6182111.html
Inventor(s)Inohara; Shigekazu (Kokubunji, JP), Kagimasa; Toyohiko (Yokohama, JP), Masuoka; Yoshimasa (Kodaira, JP), Noda; Fumio (Kodaira, JP), Min; Jinghua (Kodaira, JP)
AbstractIrregular and unstable natures of the Internet to be caused by an increase in Internet accessing users are alleviated and services of an information system more comfortable to users are provided. To this end, each servers among a plurality of servers cooperating to provide services stores the past communications line state (throughput and latency), and in accordance with the stored communications lines state, cache reference and prefetch are preformed between optional servers.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History Custom Search
Drawing from US Patent 6182111
Method and system for managing distributed data - US Patent 6182111 Drawing
Method and system for managing distributed data
Inventor     Inohara; Shigekazu (Kokubunji, JP) , Kagimasa; Toyohiko (Yokohama, JP) , Masuoka; Yoshimasa (Kodaira, JP) , Noda; Fumio (Kodaira, JP) , Min; Jinghua (Kodaira, JP)
Owner/Assignee     Hitachi, Ltd. (Tokyo, JP)
Patent assignment
All assignments
Company News
Publication Date     January 30, 2001
Application Number     09/079,151
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     May 15, 1998
US Classification     709/201 707/10 709/202 709/203 709/217 709/219 709/223
Int'l Classification    
Examiner     Harrell; Robert B.
Assistant Examiner     Barot; Bharat
Attorney/Law Firm     Antonelli, Terry, Stout & Kraus, LLP
Address
Parent Case    
Priority Data     May 15, 1997 [JP] 9-125247
USPTO Field of Search     709/225 709/226 709/233 709/235 709/238 709/239 709/241 709/244 709/200 709/201 709/202 709/203 709/204 709/200 709/201 709/202 709/203 709/204 709/200 709/201 709/202 709/203 709/204 709/200 709/201 709/202 709/203 709/204 707/1 707/9 707/10 707/100 707/104
Patent Tags     managing distributed data
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
6085239
Kubo

Jul,2000

[0 after 0 votes]
5915095
Miskowiec
709/223
Jun,1999

[0 after 0 votes]
5913041
Ramanathan
709/233
Jun,1999

[0 after 0 votes]
5864670
Hayashi
709/204
Jan,1999

[0 after 0 votes]
5774660
Brendel

Jun,1998

[0 after 0 votes]
5764908
Shoji
709/217
Jun,1998

[0 after 0 votes]
5712981
McKee
709/241
Jan,1998

[0 after 0 votes]
5548724
Akizawa
709/203
Aug,1996

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B

[0 market size comments]
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%

[0 market share comments]
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%

[0 reasonable royalty comments]
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

[0 Guesstimation of Royalty Value Comments]
License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
[0 license availability comments]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
[0 owner/assignee comments]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

[0 competitive advantage comments]
Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

[0 commercial alternatives comments]
 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A distributed data management method used by a process as a client using one or more sets of data and by two or more processes as servers providing data designated by a request from the client, in a computer system having two or more computers each executing one or more processes and interconnected by a network, the method comprising the steps of:

(1) storing in memory of each of a plurality of servers past communications history between said each server and others of said servers;

(2) receiving from a first server a request for data, and selecting one or more third servers for transmitting the requested data at high speed from second servers which store the requested data based on communications history stored in the memory of said first server; and

(3) transmitting the requested data to said first server from said one or more third servers.

2. A distributed data management method according to claim 1, wherein said storing step (1) includes a step of storing throughput representative of a transferrable data amount per unit time and latency representative of communications delay time.

3. A distributed data management method according to claim 1, wherein said storing step (1) includes a step of storing the communications history including a size of the necessary data.

4. A distributed data management method according to claim 1, wherein said storing step (1) includes a step of storing the communications history including a data storage probability predicted by said each server, said data storage probability indicating a possibility that said others of said servers store the requested data.

5. A distributed data management method according to claim 1, wherein said selecting step (2) includes a step of predicting a time taken to complete the transfer of the necessary data from the communications history and a step of selecting one or more third servers from one or more second servers if the predicted communications completion time is equal to or shorter than a predetermined value.

6. A distributed data management method according to claim 5, wherein the predicting step is performed by using an equation (the latency+(the size of the necessary data/the throughput)).

7. A distributed data management method according to claim 4, wherein said selecting step (2) includes a step of selecting one or more third servers from one or more second servers if a product of the data storage probability and the communications completion time is equal to or smaller than a predetermined value.

8. A distributed data management method according to claim 4, wherein said transmission requesting step (3) includes a step of updating the communications history of all of two or more servers stored in the memory, when communications between first and second servers is executed.

9. A distributed data management method used by two or more processes as servers in a computer system having two or more computers each executing one or more processes and interconnected by a network, comprising the steps of:

performing a process in a first server of requesting two or more second servers to transmit necessary data to said first server in accordance with a possibility of a presence of the necessary data in said two or more second servers,

wherein said process comprises the steps of:

(1) selecting by said first server one or more third servers from two or more second servers,

(2) requesting the selected third server to transmit the necessary data, and

(3) requesting the second server other than the third server to hold a transmission of the necessary data, and

wherein said selecting step (1) comprises the steps of:

storing by the first server past communications history between the first server and the second servers in a memory, and

selecting the third server from the second servers by using the past communications history.

10. A distributed data management method according to claim 9, wherein said requesting step (2) includes a step of requesting by the first server some or all of the second servers not selected as the third server to immediately transmit the necessary data, if some or all of the third servers sends a message that the necessary data cannot be transmitted, to the first server.

11. A distributed data management method according to claim 10, wherein said requesting step (2) includes a step of continuing a transmission only from one or more fourth servers and stopping a transmission from other servers, after two or more second servers starts a transmission of the necessary data.

12. A distributed data management method according to claim 11, wherein said selecting step (1) includes a step of selecting the third server and the fourth server.

13. A distributed data management method used by a process as a client using one or more sets of data and by two or more processes as servers in a computer system having two or more computers each executing one or more processes and interconnected by a network, the method comprising the steps of:

(1) storing by a first server in a memory, past communications history with time during a first time at a second time interval between the first server and two or more second servers;

(2) predicting by the first server a time when prefetch data can be acquired at high speed from one or more second servers having a possibility of possessing the prefetch data, by using the communications history with time, in order to request servers other than the first server to acquire the prefetch data before a request from the client, the prefetch data being expected to have a high possibility to be requested by the client in a future; and

(3) requesting by the first server at the predicted time the third server selected from at least some of the second servers to transmit the prefetch data.

14. A distributed data management method according to claim 13, wherein said storing step (1) includes a step of storing the communications history with time containing a history of throughput of communications during the first time at the second time interval and a history of latency of communications during the first time at the second time interval.

15. A distributed data management method used by two or more processes as servers in a computer system having two or more computers each executing one or more processes and interconnected by a network, the method comprising the steps of:

(1) storing by a first server past communications history between said first server and second servers in a memory;

(2) selecting one or more third servers from second servers associated with the first server, by using the communications history; and

(3) transmitting from the first server to the second server at least part of a list of data possessed by the first server.

16. A distributed data management method according to claim 15, wherein said transmitting step (3) includes a step of determining a data presence probability of the first server in accordance with a difference between a time when the data list is transmitted and a current time.

17. A distributed data management method according to claim 16, wherein said transmitting step (3) includes a step of lowering the data presence probability at a predetermined time interval after the data presence probability is set to 1 when the data list is transmitted.

18. A distributed data management method used by a process as a client using one or more sets of data and by two or more processes as servers, in a computer system having two or more computers each executing one or more processes and interconnected by a network, the method comprising the steps of:

(1) storing at least one of past communications history between a first server and second servers and request history from the client to the first server in a memory,

(2) determining by the first server one or more second data sets having a higher frequency of a request following a request for a first data set in accordance with the request history;

(3) storing by the first server, as reference relationship information, data representative of a combination of a name of the first data set and names of second data sets; and

(4) exchanging the reference relationship information of the first server with reference relationship information of one or more second servers.

19. A distributed data management method according to claim 18, wherein said determining step (2) includes a step of determining by the first server one or more second data sets having a higher possibility of being requested after the first data set is requested by the client, in accordance with the reference relationship information and a step of prefetching the second data sets from one or more second servers having a possibility of possessing the second data sets.

20. A distributed data management method according to claim 19, wherein said prefetching step includes a step of selecting one or more second servers from one or more third servers associated with the first server, by using the communications history.

21. A distributed data management method used by a process as a client using one or more sets of data and by two or more processes as servers, in a computer system having two or more computers each executing one or more processes and interconnected by a network, the method comprising the steps of:

(1) receiving by a first server a transmission request of a first data set from a second server;

(2) selecting one or more second servers expected to store a second data set having a possibility of being requested after the first data set; and

(3) requesting the selected one or more second servers to transmit the second data set to hierarchically prefetch the second data set from the first server to one or more second servers, and

wherein said selecting step (2) comprises a step of:

selecting one or more third servers associated with the first server from one or more second servers by using past communications history between the first server and the second servers.

22. A distributed data management method used by a process as a client using one or more sets of data and by two or more processes as servers, in a computer system having two or more computers each executing one or more processes and interconnected by a network, the method comprising the steps of:

(1) predicting a time taken to acquire from the second server two or more data sets processed by the first server;

(2) scheduling an order of discarding two or more data sets by using the predicted time, and

wherein said predicting step (1) is executed by using past communications history between servers.

23. A distributed data management method according to claim 22, wherein said scheduling step (2) is executed by using the predicted time and the number of times the data set requested by the client in a predetermined past time period.

24. A computer program product comprising:

a computer useable medium having computer readable program code means embodied therein for performing distributed data management by a process as a client using one or more sets of data and by two or more processes as servers providing data designated by a request from the client, in a computer system having two or more computers each executing one or more processes and interconnected by a network, the computer readable program code means in the computer program product comprising:

computer readable program code means for storing past communications history between first and second servers in a memory;

computer readable program code means for selecting one or more third servers from second servers by using the communications history, in accordance with a possibility that one or more second servers store necessary data for the first server; and

computer readable program code means for transmitting the necessary data from the first server to the third servers.

25. A distributed data management system comprising:

a network for interconnecting a computer as a client for using one or more data set and two or more computers as servers possessing one or more data sets for providing data designated by the client;

storage means for storing past communications history between a first server and one or more second servers;

means for selecting at least one second server having a possibility of possessing necessary data for the first server, in accordance with the past communications history; and

means for requesting the selected second server to transmit the necessary data.

26. A computer readable program product comprising:

a computer useable medium having a computer program embodied therein for performing distributed data management by two or more processes as servers in a computer system having two or more computers each executing one or more processes and interconnected by a network, said computer program comprising:

Computer program code means for requesting by a first server, two or more second servers to transmit necessary data to said first server in accordance with a possibility of a presence of the necessary data in said two or more second servers,

wherein said computer program code means comprises:

first computer program code means for selecting by said first server selecting one or more third servers from two or more second servers,

second computer program code means for requesting the selected third server to transmit the necessary data, and

third computer program code means for requesting the second server other than the third server to hold a transmission of the necessary data, and

wherein said first computer program code means comprises:

computer program code means for storing by the first server past communications history between the first server and the second servers in a memory, and

computer program code means for selecting the third server from the second servers by using the past communications history.

27. A computer readable program product comprising:

a computer useable medium having computer readable program code means embodied therein for performing distributed data management by a process as a client using one or more sets of data and by two or more processes as servers in a computer system having two or more computers each executing one or more processes and interconnected by a network, the computer readable program code means in the computer readable program product comprising:

computer readable program code means for storing by a first server in a memory, past communications history with time during a first time at a second time interval between the first server and two or more second servers;

computer readable program code means for predicting by the first server a time when prefetch data can be acquired at high speed from one or more second servers having a possibility of possessing the prefetch data, by using the communications history with time, in order to request servers other than the first server to acquire the prefetch data before a request from the client, the prefetch data being expected to have a high possibility to be requested by the client in a future; and

computer readable program code means for requesting by the first server at the predicted time the third server selected from at least some of the second servers to transmit the prefetch data.

28. A computer readable program product comprising:

a computer useable medium having computer readable program code means embodied therein for performing distributed data management by two or more processes as servers in a computer system having two or more computers each executing one or more processes and interconnected by a network, the computer readable program code means in the computer readable program product comprising:

computer readable program code means for storing by a first server past communications history between the first server and second servers in a memory;

computer readable program code means for selecting one or more third servers from second servers associated with the first server, by using the communications history; and

computer readable program code means for transmitting from the first server to the second server at least part of a list of data possessed by the first server.

29. A computer readable program product comprising:

a computer useable medium having computer readable program code means embodied therein for performing distributed data management by a process as a client using one or more sets of data and by two or more processes as servers, in a computer system having two or more computers each executing one or more processes and interconnected by a network, the computer readable program code means in the computer readable program product comprising:

computer readable program code means for storing at least one of past communications history between a first server and second servers and request history from the client to the first server in a memory,

computer readable program code means for determining by the first server one or more second data sets having a higher frequency of a request following a request for a first data set in accordance with the request history;

computer readable program code means for storing by the first server, as reference relationship information, data representative of a combination of a name of the first data set and names of second data sets; and

computer readable program code means for exchanging the reference relationship information of the first server with reference relationship information of one or more second servers.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

The present invention relates to a computer system, and more particularly to a method and system for managing distributed data, suitable particularly for the world wide web (WWW), in which a plurality of computers interconnected by a network distribute, share and exchange data in an information system.

First, several terms used in the following description will be explained.

An information system such as WWW and anonymous FTP on the Internet is configured as a "client-server system" which is one type of distributed computer systems. In the client-server system, the processes in the whole system are classified into two parts. The first part is executed by a program (hereinafter called a process) called a "server", and the second part is executed by processes called "clients". A client in the information system generally runs on a computer operated by a home user or a company user. The server in the information system stores information to be supplied to clients. The client in the information system stores new information in the server or requests information from the server.

It is common in a computer system that the same information is temporarily copied to a plurality of sites in order to access the information at high speed or increase a possibility of accessibility. Such a copy is discriminably called hint, cache, replica, stash and the like (refer to a document "Distributed Systems (1st ed.) compiled by Sape Mullender, pp. 13-15, ACM press, 1989). In the following, these copies are collectively called a "cache". To make a cache is called "cache".

A WWW server in WWW stores information to be serviced, in a unit called a "home page". Each home page has a name called URL (abbreviation for uniform resource locator). URL is a character string capable of designating a protocol used in WWW, a host name of a computer as an information source, and specific data in the information source. For example, "http://www.hitachi.co.jp/index.html" is a URL.

Generally, each URL is in correspondence with a collection of data including character and image data of a home page. In the following, a collection of such data is called "URL corresponding information" or "URL contents". A second URL contained in the first URL corresponding information is called a "hyper text link (or simply link)". That the first URL corresponding information contains a second URL is hereinafter described as "there is a link from the first URL to the second URL".

The techniques (hereinafter called prior example 1) used by WWW will be explained in the following.

A user of a WWW client supplies a URL of the home page to be accessed, to the WWW server. In the first process type between a WWW server and client, the WWW client requests the WWW server designated by the second element of URL to transmit the home page of URL. In response to this request, the WWW server supplies the home page to the WWW client.

In the second process type, instead of requesting the WWW server designated by the second element of URL supplied from the user, the WWW client requests a second server called a "proxy server" to transmit the home page. The second server acquires the home page of URL from the first WWW server or requests another proxy server to acquire the URL corresponding information. At the repetitive stage of proxy server requests, these proxy servers have parent-child relationships. Proxy servers having a parent-child relationship are described, for example, in a document "A Hierarchical Internet Object Cache" by A. Chankhunthod, et.al., 1996 USENIX Technical Conference, pp. 153-163, 1996.

A WWW client and proxy server can have caches. A cache of a client stores home pages the client acquired in the past, and can be used only by this client. A cache of a proxy server stores home pages acquired by the proxy server in response to a request from one or more clients, a request from another or more other servers, or a request from both, and can be shared by the clients using this proxy server or by this proxy server itself.

The Network News System (hereinafter called prior example 2) is described, for example, in a document "Network News Transfer Protocol: A proposed Standard for the Stream-Based Transmission of News" by B. Kantor, et.al., Network Working Group RFC-977. This system is configured by one or more servers. Generally, a user selects one of the servers by using its client. The information unit in the Network News System is called "news". Generally, a user supplies news to the server by using its client, and acquires news from the server. As the user supplies news to a first server, the first server sends a copy of the news to a second server, and the second server supplies a copy of the news to another server, and so on. Finally, the copy of the news is supplied to all the servers.

Next, the global area name service, Domain Name System (hereinafter called prior example 3, abbreviated as DNS) will be explained. DNS is described, for example, in a document "Domain Names-Implementation and Specification" by P. Mockapetris, Network Working Group RFC-1035, particularly in the second section thereof. DNS has a correspondence mainly between a symbolic host name and host related information (IP address and mail address). A plurality of DNS servers have a tree structure. A request from a client is processed by tracing the tree structure and transferring the request to a plurality of servers. A resolver which is a DNS client requests the host related information corresponding to a host name to one of the servers. This server returns the host related information corresponding to the host name back to the client, or transfers the request to a parent server of this server (a DNS server nearer to the root of the DSN server tree structure from this server). The parent server grasps which of its child servers have what host related information. Therefore, after the request is transferred to the root of the tree structure, the request is then transferred downward the tree structure to the DNS server capable of processing the request. The request finally reaches the DNS server having the host related information from which the host related information is returned to the client, or alternatively a failure is returned back to the client if every DNS server cannot supply the host related information corresponding to the host name while the request is transferred downward the tree structure.

A method (hereinafter called prior example 4) is also known in which a space of caches is shared by a plurality of computers in a distributed file system of a local area network (LAN). According to a document "Cooperative Caching: Using Remote Client Memory to Improve File System Performance" by Michael Dahlin, et.al., First USENIX Symposium on Operating Systems Design and Implementation, pp. 267-280, 1994, a client first requests for a file block to a server called a "manager". The manager grasps which file block is stored in what computer. The manager informs the client of the computer which stores the file block, or transfers the request of the client to the computer. Similar methods are known as described in a document "Implementing Global Memory Management in an Workstation Cluster" by M. J. Feeley, et.al., ACM 15th Symposium on Operating Systems Principles, pp. 201-212, 1995, or in a document "Efficient Cooperative Caching Using Hints" by P. Sarkar, et.al., Second USENIX Symposium on Operating Systems Design and Implementation, pp. 35-46, 1996. A plurality of managers can be provided. However, a correspondence between file blocks and managers is prefixed and is known by all clients and servers. This correspondence does not change during system running.

Techniques used by computers called cache-coherent non-uniform memory access (CC-NUMA) and cache only memory access (COMA) will be explained by using following prior examples 5 and 6. The CC-NUMA computer or COMA computer has a mechanism of maintaining coherency between memory fragments (cache lines) cached near at a number of processors. The following two methods are known in particular.

With the first method (hereinafter called prior example 5), a processor or data called a "home" corresponding to the "manager" grasps which memory fragment is cached to what processor. This first method is used in a system described, for example, in a document "The Stanford FLASH Multiprocessor" by Jeffrey Kuskin, et.al., Proceedings of the 21th Annual International Symposium on Computer Architecture, pp. 302-314, ACM, 1994.

With the second embodiment (hereinafter called prior example 6), some restrictions are imposed on the formation, deletion and communications of caches, to thereby ensure identification and coherence of caches during a predetermined number of communications (generally including multicast or broadcast). This second method is used in a system described, for example, in a document "The Data Diffusion Machine with a Scalable Point-to-Point Network" by Henk L. Muller, et.al., Technical Report CSTR-93-17, Department of Computer Science, University of Bristol, October 1993.

The current communications performance of the Internet is much more slower than the speed users desire, and has various unstable factors. Because of rocketing spread of the WWW, a number of wide area networks (WAN) are congested. While high speed backbone communications lines are being increased and enhanced day after day, users at homes connect to the Internet via communication lines much slower than a general LAN. The number of users of WWW servers and the Internet is increasing even at present. According to certain statistics as of January 1996, the number of computers connecting the Internet in the world is nine million, and increasing by twofold in less than six months.

These circumstances make Internet communications lines irregular and unstable. "Irregular" means congestions of various communications lines. For example, each communications line has a different throughput (communications data amount per unit time) and a different latency (communications delay time). "Unstable" means that the throughput and latency of a communications line change from time to time and at worst the communications becomes impossible. For example, the congestion degree of a communications line changes with a time zone and a day of the week, or a routing pattern changes because of some enhanced communications line so that another communications line becomes congested or becomes free of congestion.

Under such circumstances, it is required to shorten the time from when a user issues a request to a client to when the request is satisfied, in order for the information system to provide services more comfortable to users. The following issues (1) to (5) regarding such user requirements will be explained.

(1) Under the conditions that some communications line is unstable, a client request may not be processed at high speed even if another communications line operates normally.

A WWW client and a WWW proxy server of the prior example 1 communicate with a specific WWW server and a specific proxy server designated by URLs. Therefore, if the communications line to the server (or proxy server) is congested or interrupted, it takes a long time for the client (or proxy serv