WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Uniform load distributing method for use in executing parallel processing in parallel computer    

Get related patents on CD
United States Patent5535387   
Link to this pagehttp://www.wikipatents.com/5535387.html
Inventor(s)Matsuoka; Hidetoshi (Kawasaki, JP), Hirose; Fumiyasu (Kawasaki, JP)
AbstractA uniform load distributing method for use in executing a parallel processing in a parallel computer for executing a plurality of processings in parallel manner. The parallel computer including a plurality of processors for executing individual processings, a network for conducting communication between the plurality of processors, and a synchronizing mechanism for issuing an execution start command for a next step to all the processors under the condition that a completion information of the processing is received from all the processors and the inter-processor communication in the network has been completed. All the processors are synchronized with one another in each step and process calculation of a multitude of individual processings in parallel manner by executing by the respective processors only the individual the individual processings processings whose output data in a current step are used immediately in a next step when all the data necessary for the multitude of individual processings are ready in the step wherein the respective processors execute the processings; informing of the synchronizing mechanism the completion of the execution, immediately after the execution has been completed, to the synchronizing mechanism; and executing, during a waiting time until the execution start command for the next step is given from the synchronizing mechanism, by the respective processors, the individual processings whose input data required for the execution are ready and whose output data are used at an earliest timing after the next step.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History Custom Search
Drawing from US Patent 5535387
Uniform load distributing method for use in executing parallel
     processing in parallel computer - US Patent 5535387 Drawing
Uniform load distributing method for use in executing parallel processing in parallel computer
Inventor     Matsuoka; Hidetoshi (Kawasaki, JP) , Hirose; Fumiyasu (Kawasaki, JP)
Owner/Assignee     Fujitsu Limited (Kanagawa, JP)
Patent assignment
All assignments
Company News
Publication Date     July 9, 1996
Application Number     08/043,808
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     April 7, 1993
US Classification     718/105 718/106
Int'l Classification    
Examiner     Kriess; Kevin A.
Assistant Examiner    
Attorney/Law Firm     Staas & Halsey
Address
Parent Case    
Priority Data     Apr 10, 1992 [JP] 4-91016
USPTO Field of Search     395/650 395/700 364/281
Patent Tags     uniform load distributing executing parallel processing parallel computer
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5388262
Hillis

Feb,1995

[0 after 0 votes]
5317734
Gupta

May,1994

[0 after 0 votes]
5241677
Naganuma et al.

Aug,1993

[0 after 0 votes]
5202987
Bayer et al.

Apr,1993

[0 after 0 votes]
5056000
Chang

Oct,1991

[0 after 0 votes]
4920487
Baffes

Apr,1990

[0 after 0 votes]
4394725
Bienvenu et al.

Jul,1983

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B

[0 market size comments]
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%

[0 market share comments]
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%

[0 reasonable royalty comments]
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

[0 Guesstimation of Royalty Value Comments]
License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
[0 license availability comments]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
[0 owner/assignee comments]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

[0 competitive advantage comments]
Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

[0 commercial alternatives comments]
 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A uniform load distributing method for use in executing a plurality of individual processings in a parallel manner in a series of steps in a parallel computer having a plurality of processors for executing the individual processings, a network for conducting inter-processor communication, and a synchronizing mechanism to synchronize the plurality of processors by issuing an executing start command for a next step to all the processors when an indication of the completion of the individual processings performed in a current step is received from all the processors and the inter-processor communication related to the current step has been completed, said method comprising:

executing, by the plurality of processors in a current step of the series, the plurality of respective individual processings having the necessary input data and whose output data is needed in a next step of the series;

sending an indication of the completion of the execution, by each of the plurality of processors, to the synchronizing mechanism, immediately after the execution has been completed; and

executing, until the execution start command for the next step is given by the synchronizing mechanism, the individual processings whose input data is ready and whose output data will be required at an earliest time following the completion of the next step.

2. A uniform load distributing method for use in executing a plurality of individual processings in a parallel manner in a series of steps in a parallel computer having a plurality of processors for executing the individual processings, a network for conducting inter-processor communication, and a synchronizing mechanism to synchronize the plurality of processors by issuing an execution start command for a next step to all the processors when an indication of the completion of the individual processings performed in a current step is received from all the processors and the inter-processor communication related to the current step has been completed, said method comprising:

determining an input level for each individual processing based on when input data necessary for said individual processings will be ready;

determining an output level for each individual processing based on when output data of the individual processings is required for other processings;

calculating, in each processor, a current step number by counting how many execution start commands the synchronizing mechanism has issued;

executing only the individual processings whose output levels are equal to the current step number; and

executing in each processor while waiting for the other processors to complete the execution of the individual processings, the individual processings whose input levels are not larger than the current step number and whose output levels are larger than the current step number.

3. A uniform load distributing method for use in executing a plurality of individual processings in a parallel manner in a series of steps in a parallel computer having a plurality of processors for executing the individual processings, a network for conducting inter-processor communication, and a synchronizing mechanism to synchronize the plurality of processors by issuing an execution start command for a next step to all the processors when an indication of the completion of the individual processings performed in a current step is received from all the processors and the inter-processor communication related to the current step has been completed, said method comprising:

determining an input level for each individual processor based on when input data necessary for each processings will be ready;

determining an output level for each individual processing based on when output data from the individual processing is required for other processings;

estimating an execution time for each of the individual processings to be executed;

calculating a step number by counting the number of times the synchronizing mechanism has issued an execution start command;

scheduling, for each processor, each of the individual processings whose output levels are equal to the step number and which have not been scheduled yet as an individual processing in each processor in each step;

estimating an overall execution time for the current step for each processor based on the estimated execution time of each of the individual processors;

calculating the waiting times of the respective processors for synchronization in the current step, based on a maximum value of the estimated overall execution time of all the processors;

scheduling the individual processing whose input levels are not larger than the step number, whose output levels are larger than the step number, and which are not scheduled yet as individual processings for the current step in the processors during the waiting times; and

executing the scheduled individual processings while synchronizing the processors in each step.

4. A uniform load distributing method according to claim 1, wherein said method further comprises:

developing an individual processing which is to be executed repeatedly into a plurality of individual processings corresponding to a number of times the individual process is to be executed; and

executing the plurality of individual processings.

5. A method according to claim 1, wherein said method further comprises:

defining a single individual processing which is to be executed repeatedly as an individual processing; and

executing the newly defined individual processing only once, whereby the individual processing to be executed is repeatedly executed during the course of the single individual processing.

6. A method according to claim 1, wherein said method further comprises:

dividing an individual processing which is to be executed repeatedly into a plurality of individual processings; and

executing the plurality of newly defined individual processing.

7. A method according to claim 1, wherein said method further comprises:

designating a plurality of individual processings having a short execution time; and

defining and executing the plurality of the designated individual processings as a unit in an individual processing having an execution time approximately equal to the execution times of the other individual processings in the current step.

8. A method according to claim 1, wherein said method further comprises:

designating the individual processing having a long execution time; and

dividing the designated individual processing into a plurality of processings each having an execution time approximately equal to the execution times of the other individual processings in the current step;

defining and executing the plurality of divided designated individual processings as a plurality of individual processings.

9. A method according to claim 2, wherein said method further comprises:

detecting whether data of a designated output level is transferred in the network; and

starting the execution of the next step when an indication of the completion of the current processing from all the processors has been issued and no data of the designated output level is being transferred in the network.

10. A method according to claim 2, wherein said method further comprises:

detecting whether parallel computer is a a pipe-line type parallel computer having a pipe-line system to transfer the output data; and

starting the execution of the next step when no data of the designated output level is being transferred in the pipe-line system.

11. A uniform load distributing method according to claim 2, wherein said method further comprises:

developing an individual processing which is to be executed repeatedly into a plurality of individual processings corresponding to a number of times the individual process is to be executed; and

executing the plurality of individual processings.

12. A method according to claim 2, wherein said method further comprises:

defining an individual processing which is to be executed repeatedly as a single individual processing; and

executing the newly defined individual processing only once, whereby the individual processing to be executed is repeatedly executed during the course of the single individual processing.

13. A method according to claim 2, wherein said method further comprises:

dividing an individual processing which is to be executed repeatedly into a plurality of individual processings; and

executing the plurality of newly defined individual processing.

14. A method according to claim 2, wherein said method further comprises:

designating a plurality of individual processings having a short execution time; and

defining and executing the plurality of the designated individual processings as a unit in an individual processing having an execution time approximately equal to the execution times of the other individual processings in the current step.

15. A method according to claim 2, wherein said method further comprises:

designating the individual processing having a long execution time; and

dividing the designated individual processing into a plurality of processings each having an execution time approximately equal to the execution times of the other individual processings in the current step;

defining and executing the plurality of divided designated individual processings as a plurality of individual processings.

16. A uniform load distributing method according to claim 3, wherein said method further comprises:

developing an individual processing which is to be executed repeatedly into a plurality of individual processings corresponding to a number of times the individual process is to be executed; and

executing the plurality of individual processings.

17. A method according to claim 3, wherein said method further comprises:

defining an individual processing which is to be executed repeatedly as a single individual processing; and

executing the newly defined individual processing only once, whereby the individual processing to be executed is repeatedly executed during the course of the single individual processing.

18. A method according to claim 3, wherein said method further comprises:

dividing an individual processing which is to be executed repeatedly into a plurality of individual processings; and

executing the plurality of newly defined individual processing.

19. A method according to claim 3, wherein said method further comprises:

designating a plurality of individual processings having a short execution time; and

defining and executing the plurality of the designated individual processings as a unit in an individual processing having an execution time approximately equal to the execution times of the other individual processings in the current step.

20. A method according to claim 3, wherein said method further comprises:

designating the individual processing having a long execution time; and

dividing the designated individual processing into a plurality of processings each having an execution time approximately equal to the execution times of the other individual processings in the current step;

defining and executing the plurality of divided designated individual processings as a plurality of individual processings.

21. A method according to claim 1 wherein the individual processings are to be performed once.

22. A method according to claim 2 wherein the individual processings are to be performed once.

23. A method according to claim 3 wherein the individual processings are to be performed once.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a load uniforming method for use in executing a parallel processing in a parallel computer for executing a plurality of processings in a parallel manner and, particularly to a load uniforming method for use in executing a parallel processing which makes loads uniform on respective processors by allotting to the respective processors individual processings to be executed in a parallel manner so as to shorten a waiting time required for all the processors to synchronize with one another in each step.

2. Description of the Related Art

In a parallel computer, it is necessary to divide a calculation into several steps and to synchronize all the processors with one another in each step lest the calculation results of the respective processors contradict one another.

In view of this, other processors are required to wait until the slowest processor finishes a calculation, and accordingly a calculation speed of the computer is determined by that of the slowest processor. Further, even if a total calculation time is made uniform among the respective processors, some processors are caused to wait since the calculation times of the respective processors are nonuniform in each step. This can result in a long overall calculation time.

SUMMARY OF THE INVENTION

The present invention aims to solve the above-mentioned drawbacks in the prior art.

An object of the present invention is to provide a load uniforming method capable of shortening the waiting time for synchronization and improving calculation speed in a parallel processing in a parallel computer in which a synchronization mechanism detects that a range of information processing has been completed by all the processors, and all the processors are given a command to start the execution of a next step. Multiple calculations are thus carried out by synchronizing all the processors with each other.

In accordance with an aspect of the invention, there is provided a uniform load distributing method for use in executing parallel processing in a parallel computer, in the parallel computer including a plurality of processors for executing individual processings, a network for conducting communication between the plurality of processors, and a synchronizing mechanism for issuing an execution start command for a next step to all the processors under the condition that a completion information of the processing is received from all the processors and the inter-processor communication in the network has been completed, to synchronize all the processors with one another in each step and process calculation of a multitude of individual processings in a parallel manner, the method comprising the steps of: executing by the respective processors only the individual processings whose output data in a current step are used immediately in a next step when all the data necessary for the multitude of individual processings are ready in the step wherein the respective processors execute the processings; informing the completion of the execution, immediately after the execution has been completed, to said synchronizing mechanism; and executing, during a waiting time until the execution start command for the next step is given from said synchronizing mechanism, by the respective processors, the individual processings whose input data required for the execution are ready and whose output data are used at an earliest timing after the next step.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing a configuration of a parallel computer in the prior art.

FIG. 2(A) is a flowchart of preprocessing in the prior art and FIG. 2(B) is a flowchart of processing in each processor in the prior art.

FIG. 3 is a block diagram showing a processing procedure in the parallel computer in consideration of input and output levels and a mutual dependency of individual processings.

FIG. 4(A) is a view showing the prior art level sorting method.

FIG. 4(B) is a view showing execution in the present invention.

FIG. 5 is a flowchart showing a uniform load distributing method in accordance with the present invention.

FIG. 6(A) is a flowchart showing a preprocessing in a first embodiment in accordance with the present invention.

FIG. 6(B) is a flowchart showing the processing in each processor in a first embodiment.

FIG. 7(A) is a flowchart showing a preprocessing in a second embodiment in accordance with the present invention, and

FIG. 7(B) is a flowchart showing an estimation of the execution time of individual processings in a step S35 in FIG. 7(A) and a scheduling of order.

FIGS. 8 and 9 are views showing an outline of a third to seventh embodiments of the present invention.

FIG. 10 is a view showing an eighth embodiment of the present invention in which a counter is added.

FIG. 11 is a view showing a ninth embodiment of the present invention in which the eighth embodiment is applied to a pipe-line type parallel computer.

PREFERRED EMBODIMENTS OF THE INVENTION

An embodiment of the present invention will be described after suitably describing a parallel processing executed in a parallel computer according to the prior art.

FIG. 1 is a diagram showing the construction of a parallel computer.

In this figure, a group of processors 1-1 to 1-n each have an arithmetic unit, 1-1-a, a data unit 1-1b, a control unit 1-1c. Further, a synchronizing mechanism, 2 and communication network 3 is provided between processors or an inter-processor communication network (hereinafter referred to as a network). The arithmetic unit 1-1a carries out calculations based on an input data given from the data unit 1-1b in accordance with a control of the control unit 1-1c and outputs the calculation result to the data unit 1-1b and the network 3.

The processors 1-1 to 1-n are constructed identically to the processor 1-1. In this case, the n processors are arranged in parallel.

The synchronizing mechanism 2 is adapted to synchronize the respective processors 1-1 to n-1 with one another. When the operations of the processors 1-1 to 1-n are completed and the network 3 finishes communicating the data between the processors, commands representative of the start of the execution of a next processing are given to the control units 1-1 of the processors 1-1 to n-1.

The network 3 transfers the processing results between the processors 1-1 to n-1.

In the case where the arithmetic operation is carried out by the above parallel computer, calculation is divided into several steps which are allotted to the processors 1-1 to n-1.

The processors 1-1 to n-1 carry out the calculations allotted thereto in accordance with the control of the controllers 1-1c. Upon the completion of the calculation, the processors 1-1 to n-1 inform the synchronizing mechanism 2 of its completion and transfer the calculation result to the other processors through the network 3.

The synchronizing mechanism 2 gives a command representative of the start of the execution of the next calculation to the processors 1-1 to n-1 when the completion information is output from all the processors 1-1 to n-1, or the network 3 becomes empty of data and the completion of the communication is detected, thereby starting the next calculation.

FIG. 3 is a diagram showing a relationship of mutual dependency among the individual processings executed by the respective processors in the above parallel processing. Specifically, FIG. 3 shows a case where a repeat calculation is divided into individual processings a to h which are executed by two processors 1 and 2.

Out of the individual processings, the individual processings a to e are those in which the calculation is carried out in accordance with an initial input signal, and the individual processing f is the one in which the calculation is carried out using the results of the individual processings d and e as inputs.

Likewise, the individual processing g is the one in which the calculation is carried out using the results of the individual processings c, f, and the individual processing h is the one in which the calculation is carried out using the results of the individual processings b, d, g.

As shown in FIG. 3, since the output data of the individual processings are used as input data for the next individual processings, the calculation cannot be started until all the input data are ready if a certain individual processing is taken note of.

Thus, in the parallel computer, the calculation proceeds while synchronizing the processors with one another in each step in order to receive the input data for these individual processings from the other processors and to transfer the output data to the other processors.

In the parallel computer of FIG. 1, the output data of the processors 1-1 to n-1 are transferred between the processors through the network 3, and the processors are synchronized with one another through the synchronizing mechanism 2.

Incidentally, when the individual processings shown in FIG. 3 are executed by the parallel computer according to the prior art, levels are affixed to the individual processings in the order that the inputs of the individual processings are set, and the calculation proceeds while the processors are synchronized with one another for each level (hereinafter, this processing method is referred to as a level sorting method).

In FIG. 3, the levels of the individual processings, for example, the levels of the individual processing a are determined as follows. Since the input data of the individual processing a is an initial input and the output data thereof is a final calculation result, input and output levels are respectively 1, 4. If the input level=i and the output level=j are represented as [i, j], the levels of the individual processing a are represented as [1, 4] as shown in FIG. 3.

Likewise, the levels of the individual processing f is [2, 2] since the input level thereof is 2 and the output level thereof is 2. Hereafter, the levels of the individual processings a to h are determined as shown in FIG. 3 in a similar manner.

Generally, in the level sorting which is carried out when all the input data are ready, the individual processing which can be executed using only the initial data has the input level 1 when the level of the initial data is 0. Likewise, the individual processing which requires the calculation result of the input level (i-1) and can be executed using only the calculation result of the input level (i-1) or smaller. The initial data is an input which is required to carry out a series of calculations and is given initially.

In the level sorting which is carried out depending upon when the output data is required in the other individual processings to be described later, the output level of the individual processing is set at the minimum input level minus 1 out of all the individual processings which use the calculation result of this individual processing.

For instance, in the case where the output data of a certain individual processing A is used in the individual processings having the input levels 3, 4 and 5, the output level of the individual processing A is 2.

There is always established a relationship: output level.gtoreq.input level.

A conventional parallel processing is executed as shown in FIG. 2, after setting the levels in the individual processings as described above. FIG. 2(A) shows the preprocessing while FIG. 2(B) shows the processing executed in the respective processors.

In Step S91 of FIG. 2(A), the individual processings are read. In Step S92, the input levels of all the individual processings are sorted. In Step S93, all the individual processings are allotted to the respective processors.

In the respective processors, STP is initialized to 1 in Step S94, as shown in FIG. 2(B). In Step S95, out of the individual processings whose input levels=STP, all the unexecuted processings are executed. Subsequently, the completion information is issued in Step S96 and it is determined whether all the processors have issued the completion information and the network has become empty of data in Step S97.

In the case where the discrimination result is in the negative in Step S97, this routine waits in this step until the above conditions are satisfied.

When the above conditions are satisfied, this routine proceeds to Step S98 in which STP is incremented by one. In Step S99, it is determined whether the individual processings have been completed. If the individual processings have not been completed, the routine returns to Step S95 to repeat the above operations.

If it is determined that the individual processings have been completed in Step S99, the parallel processing is completed.

Here, it is assumed that two processors take partial charge of the individual processings, as shown in FIG. 3, as described above and the individual processings a, b, c, d are allotted to the processor 1 while the individual processings e, f, g, h are allotted to the processor 2. In this case, if the individual processings of FIG. 3 are level sorted in the order the inputs are set, there are five individual processings having the input level 1, one individual processing having the input level 2, one individual processing having the input level 3, and one individual processing having the input level 4.

If the parallel processing is executed in accordance with the procedure shown in FIG. 2(B), while synchronizing the processors with each other for each input level on the assumption that it takes one unit time to execute one individual processing, the processings executed in the processors 1 and 2 are as shown in FIG. 4(A).

More specifically, the processors 2, 1 are caused to wait during unit times 2, 3, 4 and during unit times 5, 6, 7, respectively, since there is no individual processing available. This is because the routine recycles a loop for the synchronization, as shown in FIG. 2.

In this example, the individual processings which need to be executed to start the individual processing having the input level 2 are only the individual processings d and e. The processor 2 is caused to wait by the processor 1 for the synchronization despite the fact that the individual processing of the level 2 can be executed without execution of the individual processings a, b and c.

Thus in the conventional level sorting method, a waiting time of the other processor(s) is extended more than necessary, thereby slowing the processing speed.

The present invention is devised in view of the aforementioned problems in the prior art, and an object thereof is to provide a load uniforming method capable of shortening a waiting time for the synchronization and improving a calculation speed in a parallel processing in a parallel computer which includes a synchronizing mechanism for detecting that the completion of processing has been informed from all the processors and giving to all the processors a command to start the execution of a next step, and proceeding with the calculations of a multitude of individual processings while synchronizing all the processors with one another for each step.

FIG. 5 is a flowchart showing the basic feature of the invention. The present invention in accordance with a first preferred embodiment is constructed as shown in FIG. 5 in order to solve the aforementioned problems. According to the first preferred embodiments of the invention, a parallel computer includes a plurality of processors for executing individual processings, a network for conducting communication between the plurality of processors, and a synchronizing mechanism for issuing an execution start command for a next step to all the processors under the condition that a completion information of the processing is received from all the processors and the inter-processor communication in the network has been completed, to synchronize all the processors with one another in each step and process calculation of a multitude of individual processings in parallel manner. The respective processors execute only the individual processings whose output data in a current step are used immediately in a next step when all the data necessary for the multitude of individual processings are ready in the step wherein the respective processors execute the processings (Step S11), and the completion of the execution is informed to the synchronizing mechanism (Step S12).

During a waiting time, until the execution start command for the next step is given from the synchronizing mechanism, the respective processors execute the individual processings whose input data required for the execution are ready and whose output data are used at an earliest timing after the next step (Step S14). This shortens the waiting times of the processors for the synchronization and to make loads between the processors in each synchronizing step uniform.

According to a second preferred embodiment of the present invention, when the respective individual processings are level sorted when the input data necessary for these processings are ready and the output data of the individual processings are level sorted when they are required for other processings to thereby cause the respective processors to execute the individual processings using the results of level sorting, each processor calculates a current step number by counting the execution start command from the synchronizing mechanism in each processor and executes only the individual processings whose output levels are equal to the calculated current step number.

During a waiting time, to wait for the other processors to complete the execution of the individual processings for the synchronization, the individual processings whose input levels are not larger than the current step number and whose output levels are larger than the current number steps are executed. This minimizes a time during which the other processors are caused to wait for the synchronization and to make the loads between the processors in each synchronizing step uniform.

According to a third preferred embodiment of the present invention, the individual processings are allotted to the respective processors by level sorting the respective individual processings when the input data necessary for these processings are ready and also by level sorting the output data of the individual processings when they are required for other processings, thereby estimating execution times of all the individual processings to be executed. Based on the estimated execution times of the individual processings, out of the individual processings whose output levels are equal to the step number, the individual processing which is not scheduled yet is scheduled as an individual processing in each processor in each step. In addition, an overall execution time in the step is estimated.

In each step, the waiting times of the respective processors for the synchronization in each step are calculated based on a maximum value of the estimated overall execution time of all the processors. Out of the individual processings whose input levels are not larger than the step number and whose output levels are larger than the step number, the individual processings which are not scheduled yet are scheduled as individual processings for this step in the processors during the waiting times.

The scheduled individual processings are executed by the processors while synchronizing the processors in each step, thereby minimizing the time to cause the other processors to wait for the synchronization and making the loads between the processors uniform in each synchronizing step.

According to a fourth preferred embodiment of the present invention, a processing for one cycle is taken from the individual processing which is executed repeatedly in the case where an individual processing to be executed repeatedly is found in an overall calculation to be processed.

The processing taken for one cycle is redefined as an individual processing. Assuming that there are the newly defined individual processings corresponding to the number of repetitions, a plurality of individual processings are executed to thereby execute the individual processings which are executed repeatedly.

According to a fifth preferred embodiment of the present invention, as in the fourth aspect thereof, in taking the processing for one cycle from an individual processing to be executed repeatedly in the entire calculation to be processed, the individual processing to be executed repeatedly is developed into individual processings corresponding to the number of repetitions to be copied.

By executing the plurality of copied individual processings, the individual processing to be executed repeatedly is executed.

According to a sixth preferred embodiment of the present invention, the individual processing to be executed repeatedly is defined as an individual processing in the case where the individual processing to be executed repeatedly is found in the entire calculation to be processed.

By executing the newly defined individual processing only once, the individual processing to be executed repeatedly is executed.

According to a seventh preferred embodiment of the present invention, in the case where the individual processing to be executed repeatedly is found in the entire calculation to be processed, the processing to be executed repeatedly is divided into a suitable number of processings and the divided repeat processings are defined as a plurality of new individual processings.

By executing the plurality of newly defined individual processings, the individual processing to be executed repeatedly is executed.

According to an eighth preferred embodiment of the present invention, a plurality of individual processings having a short execution time are taken from the individual processings in the entire calculation to be processed.

The plurality of individual processings taken are defined and executed as an individual processing in a unit so as to be even with the execution times of the other individual processings. This is designed to make the execution times of the individual processings even to be executed and to shorten the waiting time to wait for the other processors to complete the execution for the synchronization.

According to a ninth preferred embodiment of the present invention, the individual processing having a long execution time is taken from the individual processings in the entire calculation to be processed.

The individual processing taken is divided into a plurality of processings in a unit so as to be even with the execution times of the other individual processings. The plurality of divided processings are defined and executed as a plurality of individual processings, thereby making the execution times of the individual processings to be executed even, and shortening the waiting time to wait for the other processors to complete the execution for the synchronization.

According to a tenth preferred embodiment of the present invention, there is provided a means for detecting whether a data of the designated output level is transferred in the network for conducting the communication between the plurality of processors.

The execution of the next step is started under the condition that completion information of the processing from all the processors is ready and no data of the designated output level is transferred in the network.

According to an eleventh preferred embodiment of the present invention, there is provided a means for detecting whether data of the designated output level is transferred in a pipe-line system in executing the individual processing in a parallel manner in a pipe-line type parallel computer.

The execution of the next step is started under the condition that no data of the designated output level is transferred in the pipe-line system.

In the first to third embodiments of the invention, the processors are synchronized with one another to proceed with the calculation processing under the condition that the execution of all the individual processings whose output data are required in a next processing step are completed out of the individual processings whose input data are ready.

Accordingly, even a processor which has many individual processings and has to cause the other processors to wait, executes only the individual processings minimally required to start the next processing, thereby shortening the time that the other processors are caused to wait.

The processors caused to wait for the synchronization execute the individual processings whose output are needed immediately, utilizing their waiting times. This eliminates the unnecessary waiting time of the processor to shorten the processing time.

Particularly in the second embodiment, it is not necessary to estimate the time to execute the individual processing in advance. Even if the calculation time varies, the order of the individual processings is changed accordingly. Thus, the waiting time of the processing can be shortened more than in the construction according to the third aspect.

Further in the third embodiment, the execution times of all the individual processings are estimated and the order thereof is scheduled. Accordingly, this can be carried out using a conventional parallel computer as it is.

In the case where the individual processings executed by the processors include a processing to be executed repeatedly, the repeatedly executed processing is defined as a proper unit of individual processing by developing or dividing the same as in the fourth to seventh aspects.

The time to wait for the other processors to complete the execution for the synchronization can be shortened by uniting or dividing the individual processings to make the execution times of the individual processings to be executed evenly with those of the other individual processings.

In the tenth embodiment, there is provided a means for detecting whether the data of the designated output level is transferred in the network for conducting the communication between the plurality of processors. Accordingly, the execution of the next step can be started without waiting for the network 3 to become empty even if the network 3 includes the individual processing of the different output level, with the result that