Claims
What is claimed is:
1. A uniform load distributing method for use in executing a plurality of individual processings in a parallel manner in a series of steps in a parallel computer having a
plurality of processors for executing the individual processings, a network for conducting inter-processor communication, and a synchronizing mechanism to synchronize the plurality of processors by issuing an execution start command for a next step to
all the processors when an indication of the completion of the individual processings performed in a current step is received from all the processors and the inter-processor communication related to the current step has been completed, said method
comprising:
executing, by the plurality of processors in a current step of the series, the plurality of respective individual processings having the necessary input data and whose output data is needed in a next step of the series;
sending an indication of the completion of the execution, by each of the plurality of processors, to the synchronizing mechanism, immediately after the execution has been completed; and
executing, until the execution start command for the next step is given by the synchronizing mechanism, the individual processings whose input data is ready and whose output data will be required at an earliest time following the completion of
the next step.
2. A uniform load distributing method for use in executing a plurality of individual processings in a parallel manner in a series of steps in a parallel computer having a plurality of processors for executing the individual processings, a
network for conducting inter-processor communication, and a synchronizing mechanism to synchronize the plurality of processors by issuing an execution start command for a next step to all the processors when an indication of the completion of the
individual processings performed in a current step is received from all the processors and the inter-processor communication related to the current step has been completed, said method comprising:
determining an input level for each individual processing based on when input data necessary for said individual processings will be ready;
determining an output level for each individual processing based on when output data of the individual processings is required for other processings;
calculating, in each processor, a current step number by counting how many execution start commands the synchronizing mechanism has issued;
executing only the individual processings whose output levels are equal to the current step number; and
executing in each processor while waiting for the other processors to complete the execution of the individual processings, the individual processings whose input levels are not larger than the current step number and whose output levels are
larger than the current step number.
3. A uniform load distributing method for use in executing a plurality of individual processings in a parallel manner in a series of steps in a parallel computer having a plurality of processors for executing the individual processings, a
network for conducting inter-processor communication, and a synchronizing mechanism to synchronize the plurality of processors by issuing an execution start command for a next step to all the processors when an indication of the completion of the
individual processings performed in a current step is received from all the processors and the inter-processor communication related to the current step has been completed, said method comprising:
determining an input level for each individual processing based on when input data necessary for each individual processing will be ready;
determining an output level for each individual processing based on when output data from the individual processing is required for other processings;
estimating an execution time for each of the individual processings to be executed;
calculating a step number by counting the number of times the synchronizing mechanism has issued an execution start command;
scheduling, for each processor, each of the individual processings whose output levels are equal to the step number and which have not been scheduled yet as an individual processing in each processor in each step;
estimating an overall execution time for the current step for each processor based on the estimated execution time of each of the individual processings;
calculating the waiting times of the respective processors for synchronization in the current step, based on a maximum value of the estimated overall execution time of all the processors;
scheduling the individual processings whose input levels are not larger than the step number, whose output levels are larger than the step number, and which are not scheduled yet as individual processings for the current step in the processors during the waiting times; and
executing the scheduled individual processings while synchronizing the processors in each step.
4. A uniform load distributing method according to claim 1, wherein said method further comprises:
developing an individual processing which is to be executed repeatedly into a plurality of individual processings corresponding to a number of times the individual process is to be executed; and
executing the plurality of individual processings.
5. A method according to claim 1, wherein said method further comprises:
defining an individual processing which is to be executed repeatedly as a single individual processing; and
executing the newly defined individual processing only once, whereby the individual processing to be executed is repeatedly executed during the course of the single individual processing.
6. A method according to claim 1, wherein said method further comprises:
dividing an individual processing which is to be executed repeatedly into a plurality of individual processings; and
executing the plurality of newly defined individual processings.
7. A method according to claim 1, wherein said method further comprises:
designating a plurality of individual processings having a short execution time; and
defining and executing the plurality of the designated individual processings as a unit in an individual processing having an execution time approximately equal to the execution times of the other individual processings in the current step.
8. A method according to claim 1, wherein said method further comprises:
designating the individual processing having a long execution time; and
dividing the designated individual processing into a plurality of processings each having an execution time approximately equal to the execution times of the other individual processings in the current step; and
defining and executing the plurality of divided designated individual processings as a plurality of individual processings.
9. A method according to claim 2, wherein said method further comprises:
detecting whether data of a designated output level is transferred in the network; and
starting the execution of the next step when an indication of the completion of the current processing from all the processors has been issued and no data of the designated output level is being transferred in the network.
10. A method according to claim 2, wherein said method further comprises:
detecting whether the parallel computer is a pipe-line type parallel computer having a pipe-line system to transfer the output data; and
starting the execution of the next step when no data of the designated output level is being transferred in the pipe-line system.
11. A uniform load distributing method according to claim 2, wherein said method further comprises:
developing an individual processing which is to be executed repeatedly into a plurality of individual processings corresponding to a number of times the individual process is to be executed; and
executing the plurality of individual processings.
12. A method according to claim 2, wherein said method further comprises:
defining an individual processing which is to be executed repeatedly as a single individual processing; and
executing the newly defined individual processing only once, whereby the individual processing to be executed is repeatedly executed during the course of the single individual processing.
13. A method according to claim 2, wherein said method further comprises:
dividing an individual processing which is to be executed repeatedly into a plurality of individual processings; and
executing the plurality of newly defined individual processings.
14. A method according to claim 2, wherein said method further comprises:
designating a plurality of individual processings having a short execution time; and
defining and executing the plurality of the designated individual processings as a unit in an individual processing having an execution time approximately equal to the execution times of the other individual processings in the current step.
15. A method according to claim 2, wherein said method further comprises:
designating the individual processing having a long execution time; and
dividing the designated individual processing into a plurality of processings each having an execution time approximately equal to the execution times of the other individual processings in the current step; and
defining and executing the plurality of divided designated individual processings as a plurality of individual processings.
16. A uniform load distributing method according to claim 3, wherein said method further comprises:
developing an individual processing which is to be executed repeatedly into a plurality of individual processings corresponding to a number of times the individual process is to be executed; and
executing the plurality of individual processings.
17. A method according to claim 3, wherein said method further comprises:
defining an individual processing which is to be executed repeatedly as a single individual processing; and
executing the newly defined individual processing only once, whereby the individual processing to be executed is repeatedly executed during the course of the single individual processing.
18. A method according to claim 3, wherein said method further comprises:
dividing an individual processing which is to be executed repeatedly into a plurality of individual processings; and
executing the plurality of newly defined individual processings.
19. A method according to claim 3, wherein said method further comprises:
designating a plurality of individual processings having a short execution time; and
defining and executing the plurality of the designated individual processings as a unit in an individual processing having an execution time approximately equal to the execution times of the other individual processings in the current step.
20. A method according to claim 3, wherein said method further comprises:
designating the individual processing having a long execution time; and
dividing the designated individual processing into a plurality of processings each having an execution time approximately equal to the execution times of the other individual processings in the current step; and
defining and executing the plurality of divided designated individual processings as a plurality of individual processings.
21. A method according to claim 1 wherein the individual processings are to be performed once.
22. A method according to claim 2 wherein the individual processings are to be performed once.
23. A method according to claim 3 wherein the individual processings are to be performed once.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a load uniforming method for use in executing parallel processing in a parallel computer which executes a plurality of processings in a parallel manner and, particularly, to a load uniforming method which makes the loads on the respective processors uniform by allotting to the respective processors the individual processings to be executed in a parallel manner, so as to shorten the waiting time required for all the processors to synchronize with one another in each step.
2. Description of the Related Art
In a parallel computer, it is necessary to divide a calculation into several steps and to synchronize all the processors with one another in each step lest the calculation results of the respective processors contradict one another.
In view of this, other processors are required to wait until the slowest processor finishes a calculation, and accordingly a calculation speed of the computer is determined by that of the slowest processor. Further, even if a total calculation
time is made uniform among the respective processors, some processors are caused to wait since the calculation times of the respective processors are nonuniform in each step. This can result in a long overall calculation time.
SUMMARY OF THE INVENTION
The present invention aims to solve the above-mentioned drawbacks in the prior art.
An object of the present invention is to provide a load uniforming method capable of shortening the waiting time for synchronization and improving calculation speed in a parallel processing in a parallel computer in which a synchronization
mechanism detects that a range of information processing has been completed by all the processors, and all the processors are given a command to start the execution of a next step. Multiple calculations are thus carried out by synchronizing all the
processors with each other.
In accordance with an aspect of the invention, there is provided a uniform load distributing method for use in executing parallel processing in a parallel computer, in the parallel computer including a plurality of processors for executing
individual processings, a network for conducting communication between the plurality of processors, and a synchronizing mechanism for issuing an execution start command for a next step to all the processors under the condition that a completion
information of the processing is received from all the processors and the inter-processor communication in the network has been completed, to synchronize all the processors with one another in each step and process calculation of a multitude of
individual processings in a parallel manner, the method comprising the steps of: executing by the respective processors only the individual processings whose output data in a current step are used immediately in a next step when all the data necessary
for the multitude of individual processings are ready in the step wherein the respective processors execute the processings; informing the completion of the execution, immediately after the execution has been completed, to said synchronizing mechanism;
and executing, during a waiting time until the execution start command for the next step is given from said synchronizing mechanism, by the respective processors, the individual processings whose input data required for the execution are ready and whose
output data are used at an earliest timing after the next step.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram showing a configuration of a parallel computer in the prior art.
FIG. 2(A) is a flowchart of preprocessing in the prior art and FIG. 2(B) is a flowchart of processing in each processor in the prior art.
FIG. 3 is a block diagram showing a processing procedure in the parallel computer in consideration of input and output levels and a mutual dependency of individual processings.
FIG. 4(A) is a view showing the prior art level sorting method.
FIG. 4(B) is a view showing execution in the present invention.
FIG. 5 is a flowchart showing a uniform load distributing method in accordance with the present invention.
FIG. 6(A) is a flowchart showing a preprocessing in a first embodiment in accordance with the present invention.
FIG. 6(B) is a flowchart showing the processing in each processor in a first embodiment.
FIG. 7(A) is a flowchart showing a preprocessing in a second embodiment in accordance with the present invention, and
FIG. 7(B) is a flowchart showing an estimation of the execution time of individual processings in a step S35 in FIG. 7(A) and a scheduling of order.
FIGS. 8 and 9 are views showing an outline of third to seventh embodiments of the present invention.
FIG. 10 is a view showing an eighth embodiment of the present invention in which a counter is added.
FIG. 11 is a view showing a ninth embodiment of the present invention in which the eighth embodiment is applied to a pipe-line type parallel computer.
PREFERRED EMBODIMENTS OF THE INVENTION
An embodiment of the present invention will be described after suitably describing a parallel processing executed in a parallel computer according to the prior art.
FIG. 1 is a diagram showing the construction of a parallel computer.
In this figure, each of a group of processors 1-1 to 1-n has an arithmetic unit 1-1a, a data unit 1-1b, and a control unit 1-1c. Further, a synchronizing mechanism 2 and an inter-processor communication network 3 (hereinafter referred to as a network) are provided. The arithmetic unit 1-1a carries out calculations based on input data given from the data unit 1-1b under the control of the control unit 1-1c and outputs the calculation result to the data unit 1-1b and the network 3.
The processors 1-2 to 1-n are constructed identically to the processor 1-1. In this case, the n processors are arranged in parallel.
The synchronizing mechanism 2 is adapted to synchronize the respective processors 1-1 to 1-n with one another. When the operations of the processors 1-1 to 1-n are completed and the network 3 finishes communicating the data between the processors, commands representative of the start of the execution of a next processing are given to the control units 1-1c of the processors 1-1 to 1-n.
The network 3 transfers the processing results between the processors 1-1 to 1-n.
In the case where the arithmetic operation is carried out by the above parallel computer, the calculation is divided into several steps which are allotted to the processors 1-1 to 1-n.
The processors 1-1 to 1-n carry out the calculations allotted thereto under the control of the control units 1-1c. Upon the completion of the calculation, each of the processors 1-1 to 1-n informs the synchronizing mechanism 2 of the completion and transfers the calculation result to the other processors through the network 3.
The synchronizing mechanism 2 gives a command representative of the start of the execution of the next calculation to the processors 1-1 to 1-n when the completion information has been output from all the processors 1-1 to 1-n and the network 3 has become empty of data so that the completion of the communication is detected, thereby starting the next calculation.
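By way of illustration only, the condition under which the synchronizing mechanism 2 issues the start command can be sketched as follows. The class name, the method names and the in-flight message counter are assumptions made for this sketch and do not appear in the figures; the sketch models only the condition that all processors have reported completion and the network holds no data.

```python
class SynchronizingMechanism:
    """Toy model (hypothetical names): a start command is issued once every
    processor has reported completion and no data is in the network."""

    def __init__(self, n_processors):
        self.n = n_processors
        self.done = set()          # processors that reported completion
        self.in_flight = 0         # messages currently in the network
        self.commands_issued = 0   # number of start commands given so far

    def send(self):
        """A processor puts one message into the network."""
        self.in_flight += 1

    def deliver(self):
        """One message leaves the network; re-check the start condition."""
        self.in_flight -= 1
        return self._try_start()

    def report_completion(self, proc_id):
        """A processor reports that its work for the current step is done."""
        self.done.add(proc_id)
        return self._try_start()

    def _try_start(self):
        # Both conditions must hold before the next step may start.
        if len(self.done) == self.n and self.in_flight == 0:
            self.done.clear()
            self.commands_issued += 1
            return True
        return False


sync = SynchronizingMechanism(2)
sync.send()                          # a result is still being transferred
first = sync.report_completion(0)    # False: processor 1 not finished
second = sync.report_completion(1)   # False: network not yet empty
third = sync.deliver()               # True: start command is issued
```

In this sketch the count of issued commands plays the role of the step number that each processor can track by counting start commands.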
FIG. 3 is a diagram showing a relationship of mutual dependency among the individual processings executed by the respective processors in the above parallel processing. Specifically, FIG. 3 shows a case where a repeat calculation is divided into
individual processings a to h which are executed by two processors 1 and 2.
Out of the individual processings, the individual processings a to e are those in which the calculation is carried out in accordance with an initial input signal, and the individual processing f is the one in which the calculation is carried out
using the results of the individual processings d and e as inputs.
Likewise, the individual processing g is the one in which the calculation is carried out using the results of the individual processings c, f, and the individual processing h is the one in which the calculation is carried out using the results of
the individual processings b, d, g.
As shown in FIG. 3, since the output data of the individual processings are used as input data for the next individual processings, the calculation of any given individual processing cannot be started until all of its input data are ready.
Thus, in the parallel computer, the calculation proceeds while synchronizing the processors with one another in each step in order to receive the input data for these individual processings from the other processors and to transfer the output
data to the other processors.
In the parallel computer of FIG. 1, the output data of the processors 1-1 to 1-n are transferred between the processors through the network 3, and the processors are synchronized with one another through the synchronizing mechanism 2.
Incidentally, when the individual processings shown in FIG. 3 are executed by the parallel computer according to the prior art, levels are affixed to the individual processings in the order that the inputs of the individual processings are set,
and the calculation proceeds while the processors are synchronized with one another for each level (hereinafter, this processing method is referred to as a level sorting method).
In FIG. 3, the levels of the individual processings, for example those of the individual processing a, are determined as follows. Since the input data of the individual processing a is an initial input and the output data thereof is a final calculation result, the input and output levels are 1 and 4, respectively. If the input level i and the output level j are represented as [i, j], the levels of the individual processing a are represented as [1, 4] as shown in FIG. 3.
Likewise, the levels of the individual processing f are [2, 2] since the input level thereof is 2 and the output level thereof is 2. The levels of the other individual processings a to h are determined in a similar manner, as shown in FIG. 3.
Generally, in the level sorting which is carried out according to when all the input data are ready, the individual processing which can be executed using only the initial data has the input level 1 when the level of the initial data is 0. Likewise, the individual processing which requires the calculation result of the input level (i-1) and can be executed using only the calculation results of the input level (i-1) or smaller has the input level i. The initial data is an input which is required to carry out a series of calculations and is given initially.
In the level sorting which is carried out according to when the output data is required in the other individual processings, to be described later, the output level of an individual processing is set at the minimum of the input levels of all the individual processings which use the calculation result of this individual processing, minus 1.
For instance, in the case where the output data of a certain individual processing A is used in the individual processings having the input levels 3, 4 and 5, the output level of the individual processing A is 2.
There is always established a relationship: output level ≥ input level.
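The level rules above can be sketched in code as follows, by way of illustration only. The dependency table is transcribed from the description of FIG. 3 in the text; the function names are hypothetical. Treating a result that no other processing uses as a final result, and giving it the highest input level as its output level, is an assumption drawn from the example of the individual processing a, whose levels are [1, 4].

```python
# Each individual processing maps to the processings whose results it uses.
# An empty list means it needs only the initial data (level 0).
DEPS = {
    "a": [], "b": [], "c": [], "d": [], "e": [],
    "f": ["d", "e"],
    "g": ["c", "f"],
    "h": ["b", "d", "g"],
}


def input_levels(deps):
    """Input level = 1 + highest input level among the producers (0 if none)."""
    levels = {}

    def level(name):
        if name not in levels:
            levels[name] = 1 + max((level(p) for p in deps[name]), default=0)
        return levels[name]

    for name in deps:
        level(name)
    return levels


def output_levels(deps, in_lv):
    """Output level = (minimum input level among the consumers) - 1.

    Assumption: a result with no consumer is a final result and receives the
    highest input level as its output level.
    """
    last = max(in_lv.values())
    out = {}
    for name in deps:
        consumers = [c for c, producers in deps.items() if name in producers]
        out[name] = min((in_lv[c] for c in consumers), default=last + 1) - 1
    return out


in_lv = input_levels(DEPS)
out_lv = output_levels(DEPS, in_lv)
levels = {name: (in_lv[name], out_lv[name]) for name in DEPS}
print(levels["a"], levels["f"])  # (1, 4) (2, 2)
```

Under these rules the sketch reproduces the two pairs stated in the text, [1, 4] for a and [2, 2] for f; the pairs it yields for the remaining processings follow from the same rules and have not been checked against the figure itself.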
A conventional parallel processing is executed as shown in FIG. 2, after setting the levels in the individual processings as described above. FIG. 2(A) shows the preprocessing while FIG. 2(B) shows the processing executed in the respective
processors.
In Step S91 of FIG. 2(A), the individual processings are read. In Step S92, the input levels of all the individual processings are sorted. In Step S93, all the individual processings are allotted to the respective processors.
In the respective processors, STP is initialized to 1 in Step S94, as shown in FIG. 2(B). In Step S95, out of the individual processings whose input levels=STP, all the unexecuted processings are executed. Subsequently, the completion
information is issued in Step S96 and it is determined whether all the processors have issued the completion information and the network has become empty of data in Step S97.
In the case where the determination result in Step S97 is negative, the routine waits in this step until the above conditions are satisfied.
When the above conditions are satisfied, this routine proceeds to Step S98 in which STP is incremented by one. In Step S99, it is determined whether the individual processings have been completed. If the individual processings have not been
completed, the routine returns to Step S95 to repeat the above operations.
If it is determined that the individual processings have been completed in Step S99, the parallel processing is completed.
Here, it is assumed that two processors take partial charge of the individual processings, as shown in FIG. 3, as described above and the individual processings a, b, c, d are allotted to the processor 1 while the individual processings e, f, g,
h are allotted to the processor 2. In this case, if the individual processings of FIG. 3 are level sorted in the order the inputs are set, there are five individual processings having the input level 1, one individual processing having the input level
2, one individual processing having the input level 3, and one individual processing having the input level 4.
If the parallel processing is executed in accordance with the procedure shown in FIG. 2(B), while synchronizing the processors with each other for each input level on the assumption that it takes one unit time to execute one individual
processing, the processings executed in the processors 1 and 2 are as shown in FIG. 4(A).
More specifically, the processor 2 is caused to wait during unit times 2, 3 and 4, and the processor 1 during unit times 5, 6 and 7, since there is no individual processing available to them. This is because the routine loops while waiting for the synchronization, as shown in FIG. 2(B).
In this example, the individual processings which need to be executed to start the individual processing having the input level 2 are only the individual processings d and e. The processor 2 is caused to wait by the processor 1 for the
synchronization despite the fact that the individual processing of the level 2 can be executed without execution of the individual processings a, b and c.
Thus in the conventional level sorting method, a waiting time of the other processor(s) is extended more than necessary, thereby slowing the processing speed.
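The waiting pattern described above can be reproduced with a small simulation, given here as a sketch only. It assumes, as the text does, that every individual processing takes one unit time, and it further assumes that data transfer through the network takes no time. The input levels are those that follow from the level sorting rule for the dependencies of FIG. 3, and the function name is hypothetical.

```python
INPUT_LEVEL = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 2, "g": 3, "h": 4}
ALLOTMENT = {1: ["a", "b", "c", "d"], 2: ["e", "f", "g", "h"]}


def level_sort_timeline(input_level, allotment):
    """Return {processor: [task or None per unit time]} for the FIG. 2(B) loop."""
    timeline = {p: [] for p in allotment}
    for stp in range(1, max(input_level.values()) + 1):
        # Step S95: each processor runs all of its tasks with input level == STP.
        work = {p: [t for t in tasks if input_level[t] == stp]
                for p, tasks in allotment.items()}
        # Steps S96-S97: every processor waits for the slowest one in this step.
        step_len = max(len(w) for w in work.values())
        for p, w in work.items():
            timeline[p] += w + [None] * (step_len - len(w))
    return timeline


timeline = level_sort_timeline(INPUT_LEVEL, ALLOTMENT)
idle = {p: [i + 1 for i, t in enumerate(slots) if t is None]
        for p, slots in timeline.items()}
print(idle)  # {1: [5, 6, 7], 2: [2, 3, 4]}
```

Under these assumptions the whole calculation takes seven unit times, and each processor is idle for three of them.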
The present invention is devised in view of the aforementioned problems in the prior art, and an object thereof is to provide a load uniforming method capable of shortening a waiting time for the synchronization and improving a calculation speed
in a parallel processing in a parallel computer which includes a synchronizing mechanism for detecting that the completion of processing has been informed from all the processors and giving to all the processors a command to start the execution of a next
step, and proceeding with the calculations of a multitude of individual processings while synchronizing all the processors with one another for each step.
FIG. 5 is a flowchart showing the basic feature of the invention. The present invention in accordance with a first preferred embodiment is constructed as shown in FIG. 5 in order to solve the aforementioned problems. According to the first preferred embodiment of the invention, a parallel computer includes a plurality of processors for executing individual processings, a network for conducting communication between the plurality of processors, and a synchronizing mechanism for issuing an execution start command for a next step to all the processors under the condition that completion information of the processing is received from all the processors and the inter-processor communication in the network has been completed, so as to synchronize all the processors with one another in each step and process the calculation of a multitude of individual processings in a parallel manner. In the step wherein the respective processors execute the processings, out of the individual processings for which all the necessary data are ready, the respective processors execute only those whose output data in the current step are used immediately in the next step (Step S11), and the completion of the execution is informed to the synchronizing mechanism (Step S12).
During a waiting time until the execution start command for the next step is given from the synchronizing mechanism, the respective processors execute the individual processings whose input data required for the execution are ready and whose output data are used at an earliest timing after the next step (Step S14). This shortens the waiting times of the processors for the synchronization and makes the loads between the processors uniform in each synchronizing step.
According to a second preferred embodiment of the present invention, the respective individual processings are level sorted according to when the input data necessary for these processings are ready, the output data of the individual processings are level sorted according to when they are required for other processings, and the respective processors execute the individual processings using the results of the level sorting. Each processor calculates a current step number by counting the execution start commands from the synchronizing mechanism and executes only the individual processings whose output levels are equal to the calculated current step number.
During a waiting time in which a processor waits for the other processors to complete the execution of the individual processings for the synchronization, the individual processings whose input levels are not larger than the current step number and whose output levels are larger than the current step number are executed. This minimizes the time during which the other processors are caused to wait for the synchronization and makes the loads between the processors uniform in each synchronizing step.
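A minimal simulation of this rule is sketched below for illustration. It assumes unit execution times and instantaneous data transfer, and it models the waiting time of a processor as the difference between its own required work and that of the busiest processor in the step; in the actual machine the wait simply ends when the execution start command arrives. The [input level, output level] pairs are those that follow from the level rules for the dependencies of FIG. 3, and the function name is hypothetical.

```python
LEVELS = {  # name: (input level, output level)
    "a": (1, 4), "b": (1, 3), "c": (1, 2), "d": (1, 1),
    "e": (1, 1), "f": (2, 2), "g": (3, 3), "h": (4, 4),
}
ALLOTMENT = {1: ["a", "b", "c", "d"], 2: ["e", "f", "g", "h"]}


def run_with_fill(levels, allotment):
    """Required work has output level == STP; waiting time is filled with
    ready work whose output level is larger than STP."""
    pending = {p: list(tasks) for p, tasks in allotment.items()}
    timeline = {p: [] for p in allotment}
    stp = 1
    while any(pending.values()):
        must = {p: [t for t in tasks if levels[t][1] == stp]
                for p, tasks in pending.items()}
        step_len = max(len(m) for m in must.values())
        for p, m in must.items():
            for t in m:
                pending[p].remove(t)
            slots = list(m)
            # Candidates: inputs ready (input level <= STP), output needed
            # later (output level > STP), earliest-needed output first.
            fillers = sorted((t for t in pending[p]
                              if levels[t][0] <= stp and levels[t][1] > stp),
                             key=lambda t: levels[t][1])
            for t in fillers[:step_len - len(slots)]:
                pending[p].remove(t)
                slots.append(t)
            timeline[p] += slots + [None] * (step_len - len(slots))
        stp += 1
    return timeline


timeline = run_with_fill(LEVELS, ALLOTMENT)
print(timeline)  # {1: ['d', 'c', 'b', 'a'], 2: ['e', 'f', 'g', 'h']}
```

Under these assumptions, with the individual processings a to d allotted to one processor and e to h to the other, the sketch completes in four unit times with no idle slot.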
According to a third preferred embodiment of the present invention, the individual processings are allotted to the respective processors by level sorting the respective individual processings according to when the input data necessary for these processings are ready and also by level sorting the output data of the individual processings according to when they are required for other processings, and the execution times of all the individual processings to be executed are estimated. Based on the estimated execution times of the individual processings, out of the individual processings whose output levels are equal to the step number, each individual processing which is not scheduled yet is scheduled as an individual processing in each processor in each step. In addition, an overall execution time in the step is estimated for each processor.
In each step, the waiting times of the respective processors for the synchronization in each step are calculated based on a maximum value of the estimated overall execution time of all the processors. Out of the individual processings whose
input levels are not larger than the step number and whose output levels are larger than the step number, the individual processings which are not scheduled yet are scheduled as individual processings for this step in the processors during the waiting
times.
The scheduled individual processings are executed by the processors while synchronizing the processors in each step, thereby minimizing the time to cause the other processors to wait for the synchronization and making the loads between the
processors uniform in each synchronizing step.
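The scheduling of one step in this manner can be sketched as follows, for illustration only. The estimated execution times are invented for the example, the function name is hypothetical, and the policy of adding a fill-in processing only when its estimated time fits entirely within the remaining waiting time is an assumption of this sketch rather than something the text prescribes.

```python
LEVELS = {  # name: (input level, output level)
    "a": (1, 4), "b": (1, 3), "c": (1, 2), "d": (1, 1),
    "e": (1, 1), "f": (2, 2), "g": (3, 3), "h": (4, 4),
}
# Hypothetical estimated execution times (e is assumed to be slow).
EST_TIME = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 3, "f": 1, "g": 1, "h": 1}


def schedule_step(stp, levels, est_time, unscheduled):
    """Plan one step; `unscheduled` ({processor: [tasks]}) is updated in place.
    Returns {processor: [tasks in execution order]}."""
    plan, total = {}, {}
    for p, tasks in unscheduled.items():
        # Required work: output level equal to the step number.
        plan[p] = [t for t in tasks if levels[t][1] == stp]
        total[p] = sum(est_time[t] for t in plan[p])
        unscheduled[p] = [t for t in tasks if t not in plan[p]]
    longest = max(total.values())
    for p, tasks in unscheduled.items():
        wait = longest - total[p]  # estimated waiting time of processor p
        ready = sorted((t for t in tasks
                        if levels[t][0] <= stp and levels[t][1] > stp),
                       key=lambda t: levels[t][1])
        for t in ready:
            if est_time[t] <= wait:  # assumed policy: fill only if it fits
                plan[p].append(t)
                wait -= est_time[t]
        unscheduled[p] = [t for t in tasks if t not in plan[p]]
    return plan


unscheduled = {1: ["a", "b", "c", "d"], 2: ["e", "f", "g", "h"]}
plan1 = schedule_step(1, LEVELS, EST_TIME, unscheduled)
plan2 = schedule_step(2, LEVELS, EST_TIME, unscheduled)
print(plan1)  # {1: ['d', 'c', 'b'], 2: ['e']}
print(plan2)  # {1: ['a'], 2: ['f']}
```

With these invented times, processor 1 would otherwise wait two unit times in step 1 while processor 2 executes e, and the sketch uses that time for c and b.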
According to a fourth preferred embodiment of the present invention, a processing for one cycle is taken from the individual processing which is executed repeatedly in the case where an individual processing to be executed repeatedly is found in
an overall calculation to be processed.
The processing taken for one cycle is redefined as an individual processing. Assuming that there are the newly defined individual processings corresponding to the number of repetitions, a plurality of individual processings are executed to
thereby execute the individual processings which are executed repeatedly.
According to a fifth preferred embodiment of the present invention, as in the fourth embodiment thereof, in taking the processing for one cycle from an individual processing to be executed repeatedly in the entire calculation to be processed, the individual processing to be executed repeatedly is developed, by copying, into individual processings corresponding in number to the repetitions.
By executing the plurality of copied individual processings, the individual processing to be executed repeatedly is executed.
According to a sixth preferred embodiment of the present invention, the individual processing to be executed repeatedly is defined as an individual processing in the case where the individual processing to be executed repeatedly is found in the
entire calculation to be processed.
By executing the newly defined individual processing only once, the individual processing to be executed repeatedly is executed.
According to a seventh preferred embodiment of the present invention, in the case where the individual processing to be executed repeatedly is found in the entire calculation to be processed, the processing to be executed repeatedly is divided
into a suitable number of processings and the divided repeat processings are defined as a plurality of new individual processings.
By executing the plurality of newly defined individual processings, the individual processing to be executed repeatedly is executed.
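The fourth to seventh embodiments differ mainly in how many individual processings a repeated processing is turned into. As a minimal sketch, assuming the repetitions are independent and can be identified by an index, the hypothetical helper `split_repetitions` below divides `n_reps` repetitions into `n_chunks` contiguous index ranges; neither name comes from the specification.

```python
def split_repetitions(n_reps, n_chunks):
    """Divide n_reps repetitions into at most n_chunks contiguous ranges.

    Each returned range stands for one newly defined individual processing.
    Sizes differ by at most one repetition.
    """
    base, extra = divmod(n_reps, n_chunks)
    ranges = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        if size:  # skip empty chunks when n_chunks > n_reps
            ranges.append(range(start, start + size))
        start += size
    return ranges
```

With `n_chunks` equal to `n_reps`, each cycle becomes its own individual processing, as in the fourth and fifth embodiments; with `n_chunks` equal to 1, the whole repetition is a single individual processing, as in the sixth; intermediate values correspond to the division described in the seventh.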
According to an eighth preferred embodiment of the present invention, a plurality of individual processings having a short execution time are taken from the individual processings in the entire calculation to be processed.
The plurality of individual processings taken are defined and executed as a single individual processing, combined into a unit whose execution time is even with the execution times of the other individual processings. This is designed to make the
execution times of the individual processings to be executed even and to shorten the time spent waiting, for the synchronization, for the other processors to complete the execution.
According to a ninth preferred embodiment of the present invention, the individual processing having a long execution time is taken from the individual processings in the entire calculation to be processed.
The individual processing taken is divided into a plurality of processings, each in a unit whose execution time is even with the execution times of the other individual processings. The plurality of divided processings are defined and executed as a
plurality of individual processings, thereby making the execution times of the individual processings to be executed even, and shortening the time spent waiting, for the synchronization, for the other processors to complete the execution.
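The uniting of short processings (eighth embodiment) and the dividing of long ones (ninth embodiment) can be sketched together. The function `even_out`, the `target` execution time, and the assumption that a long processing can be cut into equal pieces are hypothetical and serve only as an illustration.

```python
import math

def even_out(durations, target):
    """Return execution times of the redefined individual processings.

    Short processings are united, in order, into bundles of at most
    `target`; a processing longer than `target` is divided into equal
    pieces no longer than `target`.
    """
    units = []
    bundle = 0.0
    for d in durations:
        if d > target:
            if bundle > 0:          # close the bundle collected so far
                units.append(bundle)
                bundle = 0.0
            pieces = math.ceil(d / target)
            units.extend([d / pieces] * pieces)
        else:
            if bundle > 0 and bundle + d > target:
                units.append(bundle)
                bundle = 0.0
            bundle += d
    if bundle > 0:
        units.append(bundle)
    return units
```

For example, execution times of 1, 1, 2, 8, 3 and 1 with a target of 4 yield four units of equal length: one bundle of three short processings, two halves of the long one, and one bundle of the last two.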
According to a tenth preferred embodiment of the present invention, there is provided a means for detecting whether data of the designated output level is being transferred in the network for conducting the communication between the plurality of
processors.
The execution of the next step is started under the condition that completion information of the processing from all the processors is ready and no data of the designated output level is transferred in the network.
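The start condition of the tenth embodiment can be written as a simple predicate. This is a minimal sketch that assumes every data item in the network carries a tag with its output level; the function name `may_start_next_step` and its parameters are hypothetical.

```python
def may_start_next_step(completion_flags, in_flight_levels, designated_level):
    """Decide whether the execution start command may be issued.

    completion_flags: one boolean per processor, True once that processor
        has reported completion of its processings for the current step.
    in_flight_levels: output-level tags of the data items currently being
        transferred in the network (the tagging is an assumption).
    designated_level: the output level that must have left the network.
    """
    all_done = all(completion_flags)
    level_in_flight = designated_level in in_flight_levels
    return all_done and not level_in_flight
```

Data of other output levels may still be in transit when the predicate returns true, which is why the network does not need to be empty before the next step begins.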
According to an eleventh preferred embodiment of the present invention, there is provided a means for detecting whether data of the designated output level is transferred in a pipe-line system in executing the individual processing in a parallel
manner in a pipe-line type parallel computer.
The execution of the next step is started under the condition that no data of the designated output level is transferred in the pipe-line system.
In the first to third embodiments of the invention, the processors are synchronized with one another to proceed with the calculation processing under the condition that the execution of all the individual processings whose output data are
required in a next processing step are completed out of the individual processings whose input data are ready.
Accordingly, even a processor which has many individual processings and has to cause the other processors to wait, executes only the individual processings minimally required to start the next processing, thereby shortening the time that the
other processors are caused to wait.
The processors caused to wait for the synchronization execute the individual processings whose outputs are needed soonest, utilizing their waiting times. This eliminates unnecessary waiting time of the processors and thereby shortens the processing
time.
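How a waiting processor uses its waiting time can be sketched as a loop that runs until the execution start command arrives. The dictionary keys `in_level` and `out_level`, the callbacks `start_command_given` and `run`, and the function name `fill_waiting_time` are hypothetical; a real processor would receive the command from the synchronizing mechanism rather than poll a Python callable.

```python
def fill_waiting_time(pending, step, start_command_given, run):
    """Execute ready processings until the start command for the next step.

    pending: list of dicts with keys "name", "in_level", "out_level";
        processings required by the current step are assumed to be done.
    Returns the names executed, in order. Executed items leave `pending`.
    """
    executed = []
    while not start_command_given():
        # Ready now (inputs available) but not required by this step.
        ready = [t for t in pending if t["in_level"] <= step < t["out_level"]]
        if not ready:
            break  # nothing useful to do; a real processor would just wait
        # Prefer the processing whose output is required earliest.
        nxt = min(ready, key=lambda t: t["out_level"])
        pending.remove(nxt)
        run(nxt)
        executed.append(nxt["name"])
    return executed
```

Because the choice is made at run time, no execution-time estimate is needed in this sketch: a processor simply keeps taking the most urgent ready processing until it is told to move on.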
Particularly in the second embodiment, it is not necessary to estimate the time to execute the individual processing in advance. Even if the calculation time varies, the order of the individual processings is changed accordingly. Thus, the
waiting time can be shortened more than in the construction according to the third embodiment.
Further in the third embodiment, the execution times of all the individual processings are estimated and the order thereof is scheduled. Accordingly, this can be carried out using a conventional parallel computer as it is.
In the case where the individual processings executed by the processors include a processing to be executed repeatedly, the repeatedly executed processing is defined as a proper unit of individual processing by developing or dividing the same as
in the fourth to seventh embodiments.
The time to wait for the other processors to complete the execution for the synchronization can be shortened by uniting or dividing the individual processings to make the execution times of the individual processings to be executed even with
those of the other individual processings.
In the tenth embodiment, there is provided a means for detecting whether the data of the designated output level is transferred in the network for conducting the communication between the plurality of processors. Accordingly, the execution of
the next step can be started without waiting for the network 3 to become empty even if the network 3 includes the individual processing of the different output level, with the result that