|
Claims  |
|
|
What is claimed is:
1. A distributed processing apparatus for executing a program having a
plurality of callable procedures, comprising:
a first processor coupled to a first local memory for executing procedures
contained in said first local memory;
a second processor coupled to a second local memory for executing
procedures contained in said second local memory;
means for communicating data between said first and second processors;
means for a first callable procedure of said program contained in said
first local memory to call a second callable procedure of said program
contained in said second local memory;
means for said second callable procedure to recursively call a callable
procedure contained in said first local memory while executing on behalf
of said first callable procedure.
2. The distributed processing apparatus of claim 1, further comprising:
means for maintaining a first program stack contained in said first local
memory, said first program stack comprising one or more activation blocks,
each activation block containing the state information of an instance of a
procedure contained in said first local memory; and
means for maintaining a second program stack contained in said second local
memory, said second program stack comprising one or more activation
blocks, each activation block containing the state information of an
instance of a procedure contained in said second local memory.
3. The distributed processing apparatus of claim 1, further comprising:
means for calling an outer procedure of said program to commence execution
of said program in said first processor;
means for automatically establishing a link, without prompting from an
external command, between said first procedure and said second procedure,
wherein said means for said first callable procedure to call said second
callable procedure of is automatically enabled; and
means for automatically establishing a link, without prompting from an
external command, between said second procedure and a callable procedure
contained in said first local memory, wherein said means for said second
callable procedure to call a callable procedure contained in said first
local memory while executing on behalf of said first callable procedure is
automatically enabled.
4. A distributed processing apparatus for executing a program having a
plurality of callable procedures, comprising:
a first processor coupled to a first local memory for executing procedures
contained in said first local memory;
a second processor coupled to a second local memory for executing
procedures contained in said second local memory;
means for communicating data between said first and second processors;
means for a first callable procedure of said program contained in said
first local memory to call a second callable procedure of said program
contained in said second local memory, said means comprising:
(a) means for said first callable procedure to issue a first local call to
a first c-stub module, said first c-stub module being contained in said
first local memory and representing said second callable procedure;
(b) means, responsive to said first local call, for said first c-stub
module to communicate data contained in said first local call to a first
s-stub module, said first s-stub module contained in said second local
memory and representing said second callable procedure;
(c) means, responsive to said means for said first c-stub module to
communicate data contained in said first local call to said first s-stub
module, for said first s-stub module to issue a second local call to said
second callable procedure on behalf of said first callable procedure; and
means for said second callable procedure to call a callable procedure
contained in said first local memory while executing on behalf of said
first callable procedure, said means comprising:
(a) means for said second callable procedure to issue a third local call to
a second c-stub module while said second callable procedure is executing
on behalf of said first callable procedure, said second c-stub module
being contained in said second local memory and representing said
procedure contained in said first local memory which may be called by said
second callable procedure while executing on behalf of said first callable
procedure;
(b) means, responsive to said third local call, for said second c-stub
module to communicate data contained in said third local call to a second
s-stub module, said second s-stub module being contained in said first
local memory and representing said procedure contained in said first local
memory which may be called by said second callable procedure while
executing on behalf of said first callable procedure;
(c) means, responsive to said means for said second c-stub module to
communicate data contained in said third local call to said second s-stub
module, for said second s-stub module to issue a fourth local call to said
procedure contained in said first local memory which may be called by said
second callable procedure while executing on behalf of said first callable
procedure wherein said fourth local call is issued on behalf of said
second callable procedure.
5. The distributed processing apparatus of claim 4, wherein said means for
said second callable procedure to call a procedure contained in said first
local memory comprises means for said second callable procedure to
recursively call said first callable procedure.
6. The distributed processing apparatus of claim 4, further comprising:
means for maintaining a first program stack contained in said first local
memory, said first program stack comprising one or more activation blocks,
wherein each instance of a procedure contained in said first local memory,
each instance of a c-stub contained in said first local memory, and each
instance of an s-stub contained in said first local memory, is represented
by a unique activation block in said first program stack containing the
state information of said instance; and
means for maintaining a second program stack contained in said second local
memory, said second program stack comprising one or more activation
blocks, wherein each instance of a procedure contained in said second
local memory, each instance of a c-stub contained in said second local
memory, and each instance of an s-stub contained in said second local
memory, is represented by a unique activation block in said first program
stack containing the state information of said instance.
7. The distributed processing apparatus of claim 4, further comprising:
means for calling an outer procedure of said program to commence execution
of said program in said first processor;
means for automatically establishing a link, without prompting from an
external command, between said first procedure and said second procedure,
wherein said means for said first callable procedure to call said second
callable procedure of is automatically enabled; and
means for automatically establishing a link, without prompting from an
external command, between said second procedure and a callable procedure
contained in said first local memory, wherein said means for said second
callable procedure to call a callable procedure contained in said first
local memory while executing on behalf of said first callable procedure is
automatically enabled.
8. A method for executing a computer program on a multi-processor system,
comprising the steps of:
allocating a first set of callable procedures contained in said program to
a first processor;
allocating a second set of callable procedures contained in said program to
a second processor;
executing, with said first processor, a first callable procedure contained
in said first set of callable procedures;
calling, from said first callable procedure, a callable procedure contained
in said second set of callable procedures while performing said step of
executing, with said first processor, a first callable procedure;
executing, with said second processor, a second callable procedure on
behalf of said first callable procedure; and
recursively calling, from said second callable procedure, a callable
procedure contained in said first set of callable procedures while
performing said step of executing, with said second processor, said second
callable procedure.
9. The method of claim 8, further comprising the steps of:
maintaining, in a first local memory of said first processor, a first
program stack, said first program stack comprising one or more activation
blocks, each activation block containing state information of an instance
of a procedure contained in said first local memory; and
maintaining, in a second local memory of said second processor, a second
program stack, said second program stack comprising one or more activation
blocks, each activation block containing state information of an instance
of a procedure contained in said second local memory.
10. The method of claim 8, further comprising the steps of:
calling an outer procedure of said program to commence execution of said
program in said first processor;
automatically establishing a link, without prompting from an external
command, between said first procedure and a callable procedure contained
in said second set, wherein said step of calling, from said first callable
procedure, a callable procedure contained in said second set of callable
procedures is automatically enabled; and
automatically establishing a link, without prompting from an external
command, between said second procedure and a callable procedure contained
in said first set of callable procedures, wherein said step of calling,
from said second callable procedure, a callable procedure contained in
said first set of callable procedures is automatically enabled.
11. A method for executing a computer program on a multi-processor system,
comprising the steps of:
allocating a first set of callable procedures contained in said program to
a first processor;
allocating a second set of callable procedures contained in said program to
a second processor;
executing, with said first processor, a first callable procedure contained
in said first set of callable procedures;
calling, from said first callable procedure, a callable procedure contained
in said second set of callable procedures while performing said step of
executing, with said first processor, a first callable procedure, wherein
said step of calling, from said first callable procedure, a callable
procedure in said second set comprises the steps of:
(a) issuing a first local call from said first callable procedure to a
first c-stub module contained in a first local memory of said first
processor;
(b) communicating said first local call to a first s-stub module contained
in a second local memory of said second processor;
(c) issuing a second local call from said first s-stub module to a callable
procedure in said second set on behalf of said first callable procedure;
executing, with said second processor, a second callable procedure on
behalf of said first callable procedure; and
calling, from said second callable procedure, a callable procedure
contained in said first set of callable procedures while performing said
step of executing, with said second processor, said second callable
procedure, wherein said step of calling, from said second callable
procedure, a callable procedure contained in said first set of callable
procedures comprises the steps of:
(d) issuing a third local call from said second callable procedure to a
second c-stub module contained in said second local memory;
(e) communicating said third local call to a second s-stub module contained
in said first local memory;
(f) issuing a fourth local call from said second s-stub module to a
callable procedure in said first set on behalf of said second callable
procedure.
12. The method of claim 11, further comprising the steps of:
maintaining, in said first local memory of said first processor, a first
program stack, said first program stack comprising one or more activation
blocks, wherein each instance of a procedure contained in said first local
memory, each instance of a c-stub contained in said first local memory,
and each instance of an s-stub contained in said first local memory, is
represented by a unique activation block in said first program stack
containing the state information of said instance; and
maintaining, in said second local memory of said second processor, a second
program stack, said second program stack comprising one or more activation
blocks, wherein each instance of a procedure contained in said second
local memory, each instance of a c-stub contained in said second local
memory, and each instance of an s-stub contained in said second local
memory, is represented by a unique activation block in said second program
stack containing the state information of said instance.
13. A method for executing a single-thread computer program having a
plurality of callable procedures, comprising the steps of:
allocating each of said plurality of callable procedures in said program to
one of a plurality of sets of callable procedures;
storing callable procedures of a first set of said plurality of sets of
callable procedures in a first local memory of a first processor of a
multi-processor computer system;
storing callable procedures of a second set of said plurality of sets of
callable procedures in a second local memory of a second processor of said
multi-processor computer system;
executing said program on said multi-processor system, wherein said
executing step comprises the steps of:
(a) executing callable procedures contained in said first set of callable
procedures on said first processor, wherein at least one procedure in said
first set calls a procedure in said second set while executing on behalf
of a procedure in said second set; and
(b) executing callable procedures contained in said second set of callable
procedures on said second processor, wherein at least one procedure in
said second set calls a procedure in said first set while executing on
behalf of a procedure in said first set.
14. The method of claim 13, wherein said step of allocating each of said
plurality of callable procedures in said program to one of a plurality of
sets of callable procedures comprises the step of:
determining, with respect to each said callable procedure, which processor
of said multi-processor system should execute the callable procedure,
wherein said determination is made without reference to a calling history
of said program.
15. The method of claim 14, wherein said first processor executes callable
procedures of a first type more efficiently than said second processor,
and said second processor executes callable procedures of a second type
more efficiently than said first processor, and wherein said determining
step determines that callable procedures of said first type should execute
on said first processor, and that callable procedures of said second type
should execute on said second processor.
16. The method of claim 15, wherein said first processor is a
general-purpose commercial transaction processor and said second processor
is a numeric-intensive processor.
17. The method of claim 13, wherein said step of executing said program on
said multi-processor system comprises the steps of:
calling an outer procedure of said program to commence execution of said
program in said first processor;
automatically establishing a link, without prompting from an external
command, between a procedure in said first set of callable procedures and
a procedure in said second set, wherein the procedure in said first set is
automatically enabled to call the procedure in said second set; and
automatically establishing a link, without prompting from an external
command, between a procedure in said second set of callable procedures and
a procedure in said first set, wherein the procedure in said first set is
automatically enabled to call the procedure in said second set.
18. A method for adapting a single-thread computer program having a
plurality of callable procedures and originally written to execute on a
single processor system to execute on a multi-processor system, said
method comprising the machine-executed steps of:
creating a first partition of said computer program, said first partition
comprising a first set of callable procedures;
creating a second partition of said computer program, said second partition
comprising a second set of callable procedures, wherein said first and
second sets are disjoint;
wherein a callable procedure in said first set of callable procedures
contains a call to a callable procedure in said second set of callable
procedures;
wherein a callable procedure in said second set of callable procedures
contains a call to a callable procedure in said first set of callable
procedures;
placing in said first partition means for a receiving a local call from a
procedure in said first set to a procedure in said second set;
placing in said second partition means for issuing a local call to a
procedure in said second set on behalf of a procedure in said first set;
placing in said second partition means for receiving a local call from a
procedure in said second set to a procedure in said second set; and
placing in said first partition means for issuing a local call to a
procedure in said first set on behalf of a procedure in said second set.
19. The method of claim 18, wherein said multi-processor system comprises a
first processor and a second processor, wherein said first processor
executes callable procedures of a first type more efficiently than said
second processor, and said second processor executes callable procedures
of a second type more efficiently than said first processor, and wherein
said steps of creating a first partition and creating a second partition
allocate procedures of said first type to said first partition and
procedures of said second type to said second partition.
20. The method of claim 19, wherein said first processor is a
general-purpose commercial transaction processor and said second processor
is a numeric-intensive processor.
21. A program product for execution on a multi-processor computer system,
said program product having a plurality of computer executable
instructions recorded on a computer readable recording medium, said
plurality of computer executable instructions representing a computer
program for execution in single-thread mode, said program product
comprising:
a first partition of said computer program for execution on a first
processor of said multi-processor system, said first partition comprising
a plurality of callable procedures executable on said first processor;
a second partition of said computer program for execution on a second
processor of said multi-processor system, said second partition comprising
a plurality of callable procedures executable on said second processor;
means in a first callable procedure contained in said first partition, when
executing on said first processor of said multi-processor system, for
issuing a call to a callable procedure in said second partition while
executing on behalf of a callable procedure in said second partition;
means in a second callable procedure contained in said second partition,
when executing on said second processor of said multi-processor system,
for issuing a call to a callable procedure contained in said first
partition while executing on behalf of said first callable procedure.
22. The program product of claim 21, wherein said second callable procedure
comprises means for recursively calling said first callable procedure.
23. The program product of claim 21, further comprising:
a first c-stub module contained in said first partition and representing a
callable procedure contained in said second partition;
a first s-stub module contained in said second partition and representing
said first callable procedure;
a second c-stub module contained in said second partition and representing
a callable procedure contained in said first partition;
a second s-stub module contained in said first partition and representing
said second callable procedure;
wherein said means, contained in said first callable procedure, for issuing
a call to a procedure in said second partition, comprises means for
issuing a first local call to said first c-stub module;
wherein said first c-stub module comprises means for sending data to said
first s-stub module in response to said first local call;
wherein said first s-stub module comprises means for issuing a second local
call to a callable procedure contained in said second partition, on behalf
of said first callable procedure;
wherein said means, contained in said second callable procedure, for
issuing a call to a callable procedure contained in said first partition
while executing on behalf of said first callable procedure, comprises
means for issuing a third local call to said second c-stub module;
wherein said second c-stub module comprises means for sending data to said
second s-stub module in response to said third local call; and
wherein said second s-stub module comprises means for issuing a fourth
local call to a callable procedure contained in said first partition on
behalf of said second callable procedure.
24. A distributed processing apparatus for executing a program having a
plurality of callable procedures, comprising:
a first processor coupled to a first local memory for executing procedures
contained in said first local memory;
a second processor coupled to a second local memory for executing
procedures contained in said second local memory;
means for communicating data between said first and second processors;
means for maintaining a first program stack contained in said first local
memory, said first program stack comprising one or more activation blocks,
each activation block containing state information of an instance of a
procedure contained in said first local memory;
means for maintaining a second program stack contained in said second local
memory, said second program stack comprising one or more activation
blocks, each activation block containing state information of an instance
of a procedure contained in said second local memory;
means for a first instance of a callable procedure of said program
contained in said first local memory to call a second instance of a
callable procedure of said program contained in said second local memory;
means for said second instance of a callable procedure to call a third
instance of a callable procedure contained in said first local memory
while executing on behalf of said first instance of a callable procedure;
and
means for said third instance of a callable procedure to call an instance
of a callable procedure contained in said second local memory while
executing on behalf of said first and second instances of callable
procedures.
25. The distributed processing apparatus of claim 24, wherein said means
for a third instance of a callable procedure to call an instance of a
callable procedure in said second local memory comprises means for said
third instance to recursively call a procedure in said second local
memory.
26. The distributed processing apparatus of claim 24, further comprising
means for instances of callable procedures contained in said first and
second local memories to alternately call instances of callable procedures
contained in the respective opposite local memory, up to and including an
Nth instance of a callable procedure contained in the (((N-1) modulo 2)+1)
ordered local memory to call an instance of a callable procedure contained
in the ((N modulo 2)+1) ordered local memory while executing on behalf of
callable procedure instances 1 through (N-1), wherein the value of N is
limited only by a maximum size of said first and second program stacks.
27. A distributed processing apparatus for executing a program having a
plurality of callable procedures, comprising:
a first processor coupled to a first local memory for executing a first set
of callable procedures of said program from said first local memory;
a second processor coupled to a second local memory for executing a second
set of callable procedures of said program from said second local memory;
a communications connection for communicating data between said first and
second processors;
means for maintaining a first program stack contained in said first local
memory, said first program stack containing state information for each
instance of a procedure of said first set;
means for maintaining a second program stack contained in said second local
memory, said second program stack containing state information for each
instance of a procedure of said second set;
a bi-directional peer-to-peer remote procedure call mechanism,
said remote procedure call mechanism enabling a callable first procedure
executing on said first processor and contained in said first set of
callable procedures to call a callable procedure in said second set of
callable procedures for execution on said second processor, while said
first procedure is executing on behalf of a procedure in said second set
of callable procedures,
said remote procedure call mechanism enabling a callable second procedure
executing on said second processor and contained in said second set of
callable procedures to call a callable procedure in said first set of
callable procedures for execution on said first processor, while said
second procedure is executing on behalf of a procedure in said first set
of callable procedures; and
wherein said means for maintaining a first program stack and said means for
maintaining a second program stack update said respective first and second
program stacks in response to a remote procedure call executed by said
remote procedure call mechanism.
28. The distributed processing apparatus of claim 27, wherein,
said first program stack is not updated in response to a local call of a
callable procedure in said second set by a callable procedure of said
second set executing on said second processor; and
said second program stack is not updated in response to a local call of a
callable procedure in said first set by a callable procedure of said first
set executing on said first processor.
29. The distributed processing apparatus of claim 27, wherein said remote
procedure call mechanism enables callable procedures contained in said
first and second local memories to alternately call instances of callable
procedures contained in the respective opposite local memory, up to and
including an Nth instance of a callable procedure contained in the (((N-1)
modulo 2)+1) ordered local memory to call an instance of a callable
procedure contained in the ((N modulo 2)+1) ordered local memory while
executing on behalf of callable procedure instances 1 through (N-1),
wherein the value of N is limited only by a maximum size of said first and
second program stacks.
30. A distributed processing apparatus for executing a program having a
plurality of callable procedures, comprising:
a first processor coupled to a first local memory for executing procedures
contained in said first local memory;
a second processor coupled to a second local memory for executing
procedures contained in said second local memory;
means for communicating data between said first and second processors;
means for a first callable procedure of said program contained in said
first local memory to make a first local call to a second callable
procedure of said program contained in said second local memory;
means for said second callable procedure to make a second local call a
callable procedure contained in said first local memory while executing on
behalf of said first callable procedure.
31. A method for executing a computer program on a multi-processor system,
comprising the steps of:
allocating a first set of callable procedures contained in said program to
a first processor of said multi-processor system;
allocating a second set of callable procedures contained in said program to
a second processor of said multi-processor system;
executing, with said first processor, a first callable procedure contained
in said first set of callable procedures;
calling, from said first callable procedure, a callable procedure contained
in said second set of callable procedures while performing said step of
executing, with said first processor, a first callable procedure, wherein
said calling step is performed via a local call using said first and
second processors;
executing, with said second processor, a second callable procedure on
behalf of said first callable procedure; and
recursively calling, from said second callable procedure, a callable
procedure contained in said first set of callable procedures while
performing said step of executing, with said second processor, said second
callable procedure, wherein said recursively calling step is performed via
a local call using said first and second processors. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
FIELD OF THE INVENTION
The present invention relates to data processing software usage, and in
particular to efficiently executing a single-thread computer program on
more than one processor.
BACKGROUND OF THE INVENTION
A modern computer system typically comprises a single central processing
unit (CPU), and other supporting hardware such as system memory,
communications busses, input/output controllers, storage devices, etc. The
CPU is the heart of the system. It executes the instructions which
comprise a computer program and directs the operation of the other system
components.
In the early years of computer development, the CPU was the most expensive
part of the system. As a result, systems were constructed around the CPU,
to optimize its usage. Multi-tasking systems, capable of serving a number
of users performing various tasks simultaneously, were a result of this
development history. Multi-tasking allows multiple users and tasks to
share the CPU. Although the system may be capable of serving a number of
users performing various tasks simultaneously, only one task can be
running in the CPU at any instant in time. If a particular task needs the
CPU and the CPU is busy, the task must wait. Thus, while multi-tasking
permits greater utilization of the CPU, it also means that the CPU is more
likely to be a bottleneck to overall system performance.
With the advent of integrated circuits, the cost of processors relative to
other system components has declined. As a result, computer systems are
being designed with more processors. For example, it has been standard for
a number of years to perform certain low level peripheral functions in
slave processors, such as disk drive controller processors, workstation
controller processors, etc. As the relative cost of such peripheral
processors has declined, system designers have expanded their use,
reducing the workload burden on the CPU.
In recent years, this availability of inexpensive processors has led to the
development of parallel and distributed processing systems, containing
multiple processors performing the functions traditionally performed by a
single CPU. The processors in such a multi-processor system have separate
address spaces, and may have their own storage and their own internal data
bus and I/O. The processors may be coupled through shared bus and shared
memory, or more loosely via communication networks or other I/O
controllers.
A special case of such a multi-processor system is the use of a
numeric-intensive co-processor with a general purpose main processor. The
architecture of the numeric-intensive co-processor is optimized for
performing applications requiring intensive computation (usually floating
point), while the main processor is optimized for handling a typical
instruction mix of data moves, compares, I/O, etc.
One of the problems with such multi-processor systems is that most programs
designed for execution on a computer system are inherently single-thread.
As used herein, "single-thread" means that the program contains a single
flow of control, whereby at any instant in time, a single sequence of
instructions is executing. Such a sequence may loop or jump to a different
point in the code, but it always follows a single path. Such a
single-thread program is to be distinguished from multiple threads of
control, in which program flow may divide, as a road does at a fork, and
proceed down both paths simultaneously. A single-thread program does not
adapt well to execution on multiple processors.
Where a single-thread program is to be executed on a multi-processor system
containing different types of processors, portions of the program must be
allocated to the different processors for execution. One alternative is to
re-write single-thread code to support a different flow of control,
enabling optimization of multiple processors. Certain computer languages
support such multi-processing, although only a small fraction of existing
computer programs are written in these languages. For example, the SIMULA
language supports the use of co-routines, which enable multiple
simultaneous threads of program execution. However, this solution is not
always possible, and even where possible, re-writing existing code tends
to be very expensive.
Another method for allocating portions of a program to multiple processors
is the client-server model, which is commonly used in distributed
processing systems. Each program part executes on some processor (the
client). When it needs the services of another processor (the server),
which has different capabilities than the client processor, it issues a
request that the server do some work on its behalf. The server returns
control to the client when done, along with the results of its work if
required. The client-server model allows different processors to cooperate
in executing a program, but the degree of cooperation is constrained. The
client must provide all information that may be needed to the server
before it begins execution, usually before it knows what information will
be needed. Existing client-server models are unidirectional in nature; the
server lacks capability to issue a call back to the client.
It is desirable to allocate different parts of a program to different
processors in a multi-processor system without extensive alteration to the
code. In particular, in the case of a system having a general purpose main
processor and a numeric-intensive co-processor, it is desirable to execute
the numeric-intensive procedures on the co-processor, and other procedures
on the main processor. Unfortunately, prior art mechanisms restrict the
ability of the system to allocate procedures in an optimal fashion.
It is therefore an object of the present invention to provide an enhanced
method and apparatus for executing programs on a multi-processor computer
system.
Another object of this invention is to provide an enhanced method and
apparatus for allocating portions of a program to different processors in
a multi-processor computer system.
Another object of this invention is to increase the flexibility of
allocating portions of a program to different processors in a
multi-processor computer system.
Another object of this invention is to increase the efficiency of processes
running on a multi-processor computer system.
Another object of this invention is to reduce the amount of alteration
required of a single-thread program to enable it to run efficiently on a
multi-processor computer system.
Another object of this invention is to reduce the cost of executing
programs on a multi-processor computer system.
Another object of this invention to provide an enhanced method and
apparatus for executing programs on a computer system comprising a general
purpose processor and a numeric-intensive co-processor.
SUMMARY OF THE INVENTION
A computer program comprising a plurality of program modules, each module
containing one or more callable procedures, executes on a multi-processor
system. Each program module executes on one of the processors in the
system, although any one processor may execute more than one module. The
locally addressable memory of each processor contains a program stack, the
object code of each module that executes on that processor, and an agent
object and data structures containing linkage information for handling
communications with the other processors. In addition, the local memory
contains a c-stub module for each procedure executable on a different
processor that can be called by one of the procedures in local memory, and
a s-stub module for each procedure in local memory that can be called by a
procedure executing on another processor. The set of program modules,
stubs, stack and agent in the local memory of a processor is called a
partition.
When a procedure P1 executing on processor A wishes to call a procedure P2
which executes in processor B, it issues a local call to the c-stub
corresponding to P2 in processor A's locally addressable memory. The P2
c-stub then invokes the agent process in processor A, which communicates
with a corresponding agent process in processor B. The agent process in
processor B causes a s-stub in processor B corresponding to procedure P2
to issue a local call to procedure P2. The return from | | |