|
Description  |
|
|
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to a multiprocessor system and a
method of load balancing thereof, and more particularly to a
multiprocessor system which has a plurality of processors and a network
system and a method of load balancing processing in the multiprocessor
system in which a given computational task or load is divided into a
plurality of load segments and each of the load segments is dynamically
assigned to a predetermined processor while the multiprocessor system
operates.
2. Prior Art
The conventional multiprocessor system has a plurality of processors and a
network system. In the case where a given computational task or load
written in a logic programming Language (e.g., Prolog) is executed in
parallel in the conventional multiprocessor system, the given load (or an
initial goal) is divided into plural initial load segments which are
assigned to all of the processors at an initial load balancing stage. More
specifically, a first initial load segment is given to a first processor
wherein data representative of the processing result of the first initial
load segment is obtained, and such data must be transferred to a second
processor which starts to process a second initial load segment thereof by
use of such data. Thus, data representative of the processing result in
the presently operating first processor must be transferred to the next
processor, which is idle during the operation of the first processor, but
will start to process its initial load segment by use of the data from the
first processor. As described heretofore, the initial load segments are
sequentially assigned to the processors in turn. Hence, the conventional
multiprocessor system requires a long processing time before the given
load is executed in parallel.
At a time when the initial goal is given to one processor, all of the other
idle processors within the multiprocessor system do not operate. Hence,
one processor must divide the given initial goal into plural initial load
segments which must be assigned to the other processors. In addition, the
conventional multiprocessor system must provide the network system for
transferring information concerning the given initial goal which must be
divided. For this reason, the conventional multiprocessor system cannot
perform an initial load balancing of the initial goal with high speed.
Originally, it is possible to obtain a performance improvement due to a
parallel effect for shortening processing times (hereinafter, simply
referred to as the parallel effect) when the given load is executed in
parallel in the conventional multiprocessor system. However, the
conventional multiprocessor system suffers from a problem in that it is
not actually possible to obtain such parallel effect because of the reason
described above.
Further, the above mentioned one processor supplied with the initial goal
must transfer a certain part of the information thereof to all of other
processors so that the amount of information to be transferred is
increased. Hence, the conventional multiprocessor system suffers another
problem in that it must have the ability to transfer data at high speed
and transfer a large quantity of data for the network system.
Next, a description will be given with respect to the above-mentioned
problems in detail by considering that the logic programming language
(i.e., Prolog) is executed in the conventional multiprocessor system.
In a process for sequentially executing Prolog (shown in FIG. 2), a
predetermined priority (i.e., a depth-first-search) is given such that
branches are searched from an upper side to a lower side and from a left
side to a right side within an inference tree (or a proof tree) of Prolog.
When the system fails to find the correct branch (or the desirable branch)
while searching, the system backtracks to the preceding node and searches
all branches connected thereto so as to find the correct branch.
In another process for executing Prolog in parallel, plural processors
simultaneously search a certain section or all sections of the inference
tree so as to find the correct branches in accordance with a predetermined
breadth-first-search. Such a process is called an OR parallel execution in
which all branches within the inference tree are divided into plural
sections (hereinafter, referred to as OR processes) each having a certain
number of the branches and all of the OR processes are respectively
assigned to the idle processors when the initial goal is given to the
system. In this case, information required to execute each OR process must
be transferred to the corresponding idle processor.
As described before, one processor supplied with the initial goal must
divide the given initial goal into plural initial load segments which must
be assigned to the other idle processors at the initial load balancing
stage. Hence, the conventional system can not perform the initial load
balancing with high speed.
Meanwhile, after the load balancing is performed between the first and
second processors within the multiprocessor system, it is desirable that
the first and second processors be able to independently proceed with
their respective processes without transferring data representative of the
working environment of the first processor from that processor to the
second processor.
In order to realize the above-mentioned load balancing within the
conventional multiprocessor system, a predetermined working environment
required for the second processor must be extracted (or selected) from the
working environments which are obtained by performing predetermined
processes within the first processor, before performing the load balancing
in the first processor, and such predetermined working environment must be
transferred to the second processor.
In other words, the above predetermined working environment is identical to
the information which is obtained by performing predetermined processes
other than the load balancing process within the first processor. Such a
predetermined working environment is necessary for the second processor in
the case where a certain part of the load to be executed in the first
processor is shared with and executed by the second processor. In
addition, the amount of information representative of the working
environments increases as the system proceeds to balance the load.
Therefore, quite a large amount of information must be transferred to the
other processors when the load balancing is performed after a long process
is performed in each processor.
As described heretofore, the first processor must stop performing its
original process and extract the predetermined working environment
required for the load balancing from its working environments (at a load
generation stage), and then such predetermined working environment, which
has a large amount of information, must be transferred to the second
processor. Thereafter, the second processor must store the transferred
information (at a load storing stage) so it can proceed with its original
process. Specifically, a data conversion is required in order to transfer
such information by use of the network system. In the present
specification, the meaning of the data conversion will be considered to be
included in the meanings of the above load generation and load storing.
As shown in FIG. 1, overhead time must inevitably be provided for with the
above-mentioned load balancing in the conventional multiprocessor system.
In FIG. 1, the first processor cannot prevent a first overhead time from
occurring, and the second processor also cannot prevent a second overhead
time from occurring.
Due to the overhead time accompanying the load balancing (or due to the
stopping of the process in the first processor in particular), each
processor can not demonstrate its processing ability by every time unit.
In addition, the load balancing is required to be performed between
processors at an arbitrary and asynchronous time. Hence, the conventional
multiprocessor system suffers from the problem in that it is not possible
to demonstrate the parallel effect as described before. This parallel
effect can be evaluated by the total ability which can be obtained from
the following formula: (Total Ability)=(Processing Ability of each
processor).times.(Number of processors which are operable in parallel in
order to process the given load). Hence, the conventional system needs a
network system having a high cost to transfer the large amount of
information with arbitrary and asynchronous timing. In order to transfer
the large amount of information, the network system must be occupied for a
long time, hence, it becomes impossible to perform the load balancing
between the processors properly. Therefore, the conventional system
stiffers a problem in that a load unbalancing must occur.
Compared to an improvement in the processing speed of the processor, an
improvement in the transfer speed of the network system within the
multiprocessor system has relatively little effect. This results in a
tendency to increase the communication time of the network system more
than that of the processors. In this case, the above-mentioned problem
becomes serious. As the number of processors within the multiprocessor
system increases, such a tendency becomes rather remarkable.
Next, a description will be given with respect to the above-mentioned
problem in conjunction with FIG. 2 when Prolog is executed in parallel in
the multiprocessor system.
In the case where the first processor performs the load balancing on the
second processor in the OR parallel execution described before, the first
processor divides an OR process from all branches of the inference tree,
and the divided OR process is assigned to the second processor.
In this case, transfer data (to be transferred from the first processor to
the second processor) can be classified as first and second transfer data.
The first transfer data represent the information of the divided OR
process. The second transfer data represent the information of the divided
OR process and other information which is required to execute the divided
OR process.
The first processor must transfer the above second transfer data to the
second processor while the first and second processors independently
proceed with their respective processes after the load balancing is
performed. This happens because the second processor must refer to the
working environment of the first processor when the first processor
transfers the first transfer data to the second processor, instead of the
second transfer data.
However, the second transfer data must include data representative of the
large amount of information of the working environment of the first
processor which is necessary for executing the divided OR process. This
working environment in the Prolog execution includes "bind information"
representative of a connection relation between variables and values and
"control information" for controlling the backtracking of Prolog, for
example.
The above-mentioned working environment is produced by the first processor
before performing the load balancing. The second processor requires such
working environment to execute the divided OR process after the load
balancing is performed. Because, when the second processor independently
obtains a solution (or a processing result) of the initial goal by
performing the divided OR process, the second processor may need all of
the bind information which is produced by the first processor between a
time when the initial goal is given and a later time when the first
processor starts to perform the load balancing. In addition, the amount of
such bind information must be increased nearly in proportion to the
processing time. Therefore, the first processor must transfer quite a
large amount of information representative of its working environment to
the second processor when the first processor performs the load balancing
on the second processor after a long processing time has been passed.
Since the first processor must divide the OR process and transfer its large
amount of information representative of the working environment when every
time the first processor performs the load balancing on the second
processor, the original process of the first processor must be stopped so
it performs intermittently. On the other hand, since the second processor
receives the working environment of the first processor every time the
load balancing is performed, the original process of the second processor
must be stopped history information order to receive the large amount of
information representative of the working environment of the first
processor and to store such transferred information.
Therefore, each processor can not demonstrate its full processing ability.
In addition, the load balancing is required between the processors at
arbitrary and asynchronous times. Hence, the multiprocessor system suffers
from the problem that it is impossible to obtain the parallel effect as
described before.
Further, the conventional system requires an expensive network system to
transfer large amounts of information at arbitrary and asynchronous times.
Since the network system in this case is occupied for a long time in order
to transfer the large amount of information, it becomes almost impossible
to perform the load balancing between the processors. Therefore, the
conventional multiprocessor system suffers from the above described
problem in that the load becomes unbalanced.
The above-mentioned problem becomes serious in a recently developed
sequential inference machine (or a Prolog machine), which machine can
sequentially perform the inference by itself with high speed. When the
multiprocessor system controls one thousand or more of such machines
(i.e., the processors) in parallel, the conventional system has the
tendency to cause the improvement of the data transfer speed of the
network system to become smaller than that of the processing speed of each
machine, as described before. As the number of the processors within the
multiprocessor system increases, the above-mentioned tendency becomes even
greater.
A sequential inference machine of 1 MLIPS (i.e., one Mega Logical Inference
Per Second) produces a working environment having about 5 MW (i.e., five
Mega Word) of information (in case of 40 Bit/W). For example, a serial
link of 10 MBPS (i.e., ten Mega Bit Per Second) is actually required
between two mutually adjacent processors as the network system which
connects all one thousand of the sequential inference machines provided
within the multiprocessor system. In this case, it is possible to transfer
data of 0.25 MW per second (which is obtained by dividing 10 MBPS by 40
Bit/W) representative of the working environment between two mutually
adjacent processors.
In this case, the processing time for performing the inference divided by
the communication time of the network system becomes equal to 1/20. The
value 20 which appears in the denominator is obtained by dividing 5 MW by
0.25 MW. Due to the load balancing (or due to the transfer of the large
amount of information in particular), the sequential inference machine
(i.e., the processor) must stop performing the original inference process
for a long time. Hence, the apparent processing ability of the sequential
inference machine must be lowered.
Since the operating processors and the network system are occupied in order
to transfer the information representative of the working environments for
a long time, it becomes impossible to perform the required load balancing
so that the availability of the processor must be lowered. Thus, the
parallel effect applied to the multiprocessor system must be lowered as
described before.
SUMMARY OF THE INVENTION
It is therefore a primary object of the present invention to provide a
method of load balancing processing in a multiprocessor system which can
obtain a high parallel effect by performing the initial load balancing of
the initial goal with a high speed when the logic programming language is
execute in parallel in the multiprocessor system.
It is another object of the present invention to provide a method of load
balancing processing in a multiprocessor system which remarkably reduces
the amount of the information transferred between the processors so as to
perform the load balancing at high speed and without intermittently
stopping the execution of the original process of each processor so that a
high parallel effect can be obtained, even when a network system having a
reasonable price is used in the multiprocessor system.
In a first aspect of the invention, there is provided a multiprocessor
system for processing a given load written by a predetermined programming
language comprising:
a plurality of processors and a network system linking the processors,
each of the processors comprising
(a) first means for storing system information representing characteristics
of the multiprocessor system and
(b) second means for automatically and dynamically selecting a specific
initial load segment from the given load by use of the system information
without transferring information between the processors, whereby initial
load balancing is obtained in the multiprocessor system.
In a second aspect of the invention, there is provided a multiprocessor
system for processing a given load written by a predetermined programming
language comprising:
a plurality of processors and a network system linking the processors,
each of the processors comprising
(a) first means for generating history information, the amount of which is
smaller than that of information representative of the whole working
environment of a source processor, while processing a specific load
segment given to each processor,
(b) second means for transferring the history information to a destination
processor while operating the multiprocessor system,
(c) third means for reproducing the working environment of the source
processor by use of the history information transferred from a source
processor, and
(d) fourth means for processing the specific load segment by use of the
reproduced working environment of the source processor, whereby load
balancing is obtained in the multiprocessor system.
In a third aspect of the invention, there is provided a method of load
balancing processing in a multiprocessor system having a plurality of
processors and a network system linking the processors, comprising the
steps of:
providing a computational task or load written in a predetermined
programming language,
storing system information representative of characteristics of the
multiprocessor system,
automatically and dynamically selecting a specific initial load segment for
each processor from the given load by the use of the system information
without transferring information between the processors, and
performing the specific initial load segment processing in each processor
independently, whereby initial load balancing is obtained in the
multiprocessor system.
In a fourth aspect of the invention, there is provided a method of load
balancing processing in a multiprocessor system having a plurality of
processors and a network system linking the processors, comprising the
steps of:
providing a load written in a predetermined programming language,
generating history information in a first processor, the amount of which is
smaller than that of information representative of the whole working
environment of the first processor, while the first processor processes a
specific load segment given thereto,
transferring the history information from the first processor to a second
processor while operating the multiprocessor system,
dynamically reproducing the working environment of the first processor in
the second processor by use of the history information transferred from
the first processor, and
processing a specific load segment in the second processor by use of the
reproduced working environment of the first processor, whereby balancing
is obtained between the first and second processors and a partial load
segment of the first processor is shared with the second processor.
In a fifth aspect of the invention, there is provided a method of load
balancing processing in a multiprocessor system having a plurality of
processors and a network system linking the processors, comprising the
steps of:
providing a load written in a predetermined programming language,
storing system information representative of characteristics of the
multiprocessor system,
automatically and dynamically selecting a specific initial load segment for
each processor from the given load by use of the system information,
without transferring information between the processors,
processing the specific initial load segment in each processor
independently, whereby an initial load balancing is obtained in the
multiprocessor system,
generating history information in a first processor, the amount of which is
smaller than that of information representative of the whole working
environment of the first processor, while the first processor processes a
specific load segment given thereto,
transferring the history information from the first processor to a second
processor while operating the multiprocessor system,
dynamically reproducing the working environment of the first processor in
the second processor by use of the history information transferred from
the first processor, and
processing a specific load segment in the second processor by the use of
the reproduced working environment of the first processor, whereby load
balancing is obtained between the first and second processors and a
partial load segment of the first processor is shared with the second
processor.
BRIEF DESCRIPTION OF THE DRAWINGS
Further objects and advantages of the present invention will be apparent
from the following description, reference being had to the accompanying
drawings wherein preferred embodiments of the present invention are
clearly shown.
In the drawings:
FIG. 1 shows time charts for explaining the overhead that inevitably
accompanies load balancing in conventional multiproccesor systems;
FIG. 2 shows an example of an inference tree representing a solution of the
Prolog program;
FIG. 3 is a block diagram showing an embodiment of a multiprocessor system
according to the present invention;
FIG. 4 is a block diagram showing a first embodiment of the processor which
constitutes the multiprocessor system according to the present invention;
FIGS. 5 and 6 show inference trees for explaining the initial load
balancing of the initial goal performed in the multiprocessor system
according to the present invention;
FIGS. 7 and 8 are block diagrams both showing a second embodiment of the
processor within the multiprocessor system according to the present
invention;
FIG. 9 shows a partial inference tree of the logic programming language for
explaining a process for producing in the first processor history
information (represented by data of three words or five words) to be
transferred;
FIG. 10 shows a partial inference tree for explaining a process for
reproducing a required working environment by the use of the transferred
history information (represented by data of three words) in the second
processor;
FIG. 11 shows a partial inference tree for explaining a process for
reproducing a required working environment by use in the second processor
of the transferred history information (represented by data of five
words); and
FIG. 12 shows a partial inference tree for explaining a process of the
second processor for reproducing a required working environment by use of
the transferred history information represented by data of (1+2n) words at
time periods "n" for performing the load balancing in the first processor.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Referring now to the drawings, wherein like reference characters designate
like or corresponding parts throughout the several views, FIG. 3 is a
block diagram showing all the elements of a multiprocessor system 1
according to the present invention. This multiprocessor system 1
(surrounded by a dashed line) provides a network system (surrounded by a
dotted line) and a plurality of processors PR0 to PRn (where n denotes an
integral number). These processors PR0 to PRn are connected to each other
by the network system. The present invention is directed to the processors
rather than the network system; hence, a description of the network system
will be omitted from this specification.
[A] FIRST EMBODIMENT
Next, a description will be given with respect to the structure of the
first embodiment of a processor PRi (where i denotes an integral number
lying between 1 to n) in conjunction with FIG. 4. In FIG. 4, the processor
PRi comprises a register Rpi for storing data representative of a
processor identifier pi, a register Rpn for storing data representative of
a number pn of processors (hereinafter, referred to as assignable
processors) which are subjected to the initial load balancing of the
initial goal, a counter Cpc for counting a number pc of the branches at
each node (i.e., at each process for searching the desirable branch in the
initial load balancing), a flag section F for storing a flag representing
whether the initial load balancing has been performed or not, an operation
section 11 for performing operations and big-or-small judgment which will
be described later, a program memory 12 for storing programs written in
the logic programming language to be executed, and a working memory 13 for
storing data representative of the working environments which are used for
executing the programs.
Next, a description will be given with respect to a "strategic procedure"
for performing the OR process in each processor PRi in conjunction with
FIG. 5. FIG. 5 shows a strategic procedure for performing the initial load
balancing in the case where the number of initially assignable processors
is set to twelve. In FIG. 5, each of characters p0 to p11 designates each
of the identifiers pi of the twelve processors.
In the following description, a number id representing an inference depth
of the logic programming language varies "0", "1", "2", . . . as the
inference depth becomes deeper. In addition, the number id equals "0" at
the initial goal stage. Further, a number m (where m denotes an integral
number) of branches bm connected to a common node varies "0", "1", . . .
from the left most branch in turn.
(I) First, the multiprocessor system initializes the registers Rpi and Rpn,
the counter Cpc and the flag section F.
(II) Secondly, the assignable processors are equally assigned to the
branches of the inference tree which branches from an initial goal point.
For example, five processors having the identifiers p0, p1, p2, p3 and p4
are respectively and dynamically assigned to five branches b0, b1, b2, b3
and b4 connected to the common node (i.e., the initial goal point) from
the left most branch. Similarly, seven other processors having the
identifiers p5 to p11 are assigned to the five branches b0 to b4. Thus,
three processors having the identifiers p0, p5 and p10 are assigned to the
branch b0. In addition, the three processors having the identifiers p1, p6
and p11 are assigned to the branch b1. Similarly, the two processors
having the identifiers p4 and p9 are assigned to the branch b4.
As described above, plural processors are assigned to each of the branches
b0 to b4. The processors assigned to one branch designate the assignable
processors in a next stage.
(III) Next, the present system performs the initial load balancing in the
direction of the inference depth. Similar to the above-mentioned procedure
(II), the processors assigned to one branch are assigned to next branches
connected to that one branch. When one processor is assigned to each of
the next branches, the initial load balancing is completed. In this case,
when the number pn of the assignable processors is smaller than the number
of assignable branches, a branch next to the branch assigned by the last
processor is saved for the last processor (i.e., the right branch is saved
for the last processor), but branches next to the branches each assigned
by the other processors are not saved (i.e., the right branches are left
for the other processors).
For example, the processors having the identifiers p0, p5 and p10 are
respectively assigned to the branches b0, b1 and b2 in inference depth
id=1, and the branch b3 is saved for the processor having the identifier
p10. Similarly, other branches are assigned by the corresponding
processors as shown in FIG. 5.
The above is a diagrammatical explanation of the initial load balancing of
the initial computational task or goal. Next, a generalized descripti | | |