|
Description  |
|
|
CROSS REFERENCE TO RELATED APPLICATION
Concurrently filed herewith and assigned to the same assignee as this
application are:
T. P. Bishop, et al., "InterProcessor Communication Protocol", Ser. No.
941,702.
T. P. Bishop, et al., "Virtual Execution of Programs on a Multiprocessor
System", Ser. No. 941,700 now U.S. Pat. No. 4,849,877; and
T. P. Bishop, et al., "Controlled Dynamic Load Balancing for a
Multiprocessor System", Ser. No. 941,701.
TECHNICAL FIELD
Our invention relates to computer operating systems and more particularly
to an extended process that is active on a plurality of processors
simultaneously.
BACKGROUND OF THE INVENTION
The problem of controlling the allocation of resources in a distributed or
multiprocessor system is well known. Multiprocessor systems fall into one
of two categories: tightly coupled multiprocessor systems or distributed
architecture multiprocessor systems. In a tightly coupled multiprocessor
system, the processors share common memory and kernel data structures and
schedule processes from a common pool. In a distributed architecture
multiprocessor system, the processors are pooled to allow resource sharing
but each processor retains autonomy over its own environment. Each
processor or computer is an autonomous unit consisting of a CPU, memory,
and peripherals. A computer can be used in the distributed architecture
even though it does not have local file storage. The most important
feature that distinguishes distributed systems from tightly coupled
systems is that the physical memory available to each machine is
independent of activity on other machines. Consequently, the kernels on
each machine are independent, subject only to the external constraints of
running in a distributed environment. There are three major types of
distributed systems. The first type is satellite systems which are tightly
clustered groups of machines that are centered on one machine. Normally,
the center machine is a larger machine. The satellite processors share the
process load with the center processor and refer all system calls to it.
The purpose of a satellite system is to increase system throughput and,
possibly, to allow dedicated use of a processor for one process in a UNIX
system environment. The system runs as a unit; unlike other models of the
distributed system, satellite processors do not have real autonomy except,
sometimes, in process scheduling and in local memory allocation.
Newcastle distributed systems are the second type of system. A Newcastle
distributed system allows access to remote systems by recognizing the
names of remote files in the C library. The remote files are distinguished
by special characters embedded in the path name or by special path
component sequences that proceed the file system route. This method can be
implemented without making changes to the kernel and is therefore easier
to implement then the other types of systems, but is less flexible. The
final type of distributed system is the transparent distributed system.
The latter system allows standard path names to refer to files on other
machines; the kernel recognizes that the files are remote. Path names
cross machine boundaries at mount points, much as they cross file system
mount points on disks.
The satellite architecture provides a configuration that improves system
throughput by offloading processes from the central processor and
executing them on the satellite processors. Each satellite processor has
no local peripherals except for those it needs to communicate with the
central processor. Each process on a satellite processor has an associated
stub process on the central processor. When a process on a satellite
processor makes a system call that requires services provided only by the
central processor, the satellite process communicates with its stub
process on the central processor to satisfy the request. The stub process
executes the system call and sends the results back to the satellite
processor. The stub process is created when the process is assigned to the
satellite processor. The problem with the satellite architecture is that
all system calls involving external files or devices must be handled by
the central processor thus slowing down the throughput of the system.
Further information concerning the satellite architecture may be found in
the article by Birrel, et al., "Implementing Remote Procedure Calls", ACM
Transactions on Computer Systems, Vol. 2, No. 1, Feb. 1984, pp. 39-59.
In the Newcastle architecture, the kernel does not participate in
determining that a file is remote; instead, C library functions that
provide the kernel interface detect that a file access is remote and take
the appropriate action. For both naming conventions, the C library parses
the first components of a file name to determine that a file is remote.
The problems associated with the Newcastle architecture are as follows.
System performance may be degraded. Because of the larger C library, each
process takes up more memory even though it makes no remote references;
the library duplicates kernel functions and takes up more space. Local
requests may execute more slowly because they take longer to get into the
kernel, and remote requests may also be slow because they have to do more
processing at user level to send requests across a network. Finally,
programs must be recompiled with new libraries to access remote files; old
programs and vendor supplied object modules do not work for remote files
unless recompiled. The problem then with the Newcastle architecture is
that it is not transparent to the user.
The transparent distributed architecture has a pool of server processes and
assigns them temporarily to handle each remote request as it arrives.
After handling a request, the server process re-enters the pool and is
available for assignment to other requests. The server process does not
remember the user context between system calls, because it may handle
system calls for several processes. The server processes are set up by the
system administrator at initialization time. The problem with the
transparent distributed architecture is that for each remote operation,
process-specific information must be transmitted to the server process
thus increasing the amount of information that must be communicated by
packet. Another problem is in handling flow control since a server process
is locked up waiting to finish the operation using a large number of
server processes.
SUMMARY OF THE INVENTION
This invention is directed to solving these and other problems and
disadvantages of the prior art. According to the invention, a program's
execution is controlled by an extended process that spans a plurality of
processors in a multiprocessor system. The extended process comprises an
user process on one processor for executing the object code of the program
and stub processes each on an individual one of the remaining processors
for accessing system resources required for execution of the program. Each
stub process gives the extended process access to the resources associated
with the processor executing the stub process. Further, a stub process is
unique to one particular extended process. Each stub process is
interconnected to the user process by an individual virtual communication
channel.
A method in accordance with the invention comprises the steps of:
establishing a user process on a first processor, establishing a first
stub process on a second processor to perform file, I/O, and other types
of operations for only the user process, establishing a second stub
process on a third processor to perform file, I/O, and other types of
operations for only the user process, setting up a first communication
channel between the user process and the first stub process, and setting
up a second virtual communication channel between the user process and the
second stub process.
The establishment of the user process involves in part creating a process
control block, building a channel table having data structures to identify
communication channels, building a link list to identify data structures
in the channel table, and inserting a list pointer into the process
control block to identify the link list. The setting up of the
communication channels involves loading information into the link list and
data structures of the channel table so as to identify the communication
channels.
Advantageously, the first processor maintains a user file table in the
process control block to identify files used by the user process and a
system file table and an inode file table which together identify all
files local to the first processor. Local files are accessed via the user,
system, and inode file tables. A remote file is accessed by first
identifying the communication channel that interconnects the user process
with the stub process associated with the remote file via the user file
table and channel table and then sending a packet to the stub process
requesting that the stub process read the remote file. I/O type operations
are handled in a similar manner. Other types of operations that are remote
to the user process are performed by first accessing the channel table via
the list pointer and link list to identify the communication channel that
interconnects the user process to the stub process where the remote
operation is to be performed and then sending a packet to that stub
process requesting that the remote operation be performed.
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 illustrates, in block diagram form, a multiprocessor system for
utilizing the present invention;
FIG. 2 illustrates, in flowchart form, the functions performed during the
execution of an exec system call by the multiprocessor system of FIG. 1;
FIG. 3 illustrates, in block diagram form, the interconnection of an
extended process for a subset of the processors of FIG. 1;
FIG. 4 illustrates, in greater detail, the software interconnection of FIG.
3;
FIG. 5 illustrates, in block diagram form, the file control structure for
an extended process executing on the processors of FIG. 1; and
FIG. 6 illustrates, in block diagram form, the file control structure for
accessing a.out files for an extended process executing on the processors
of FIG. 1.
DETAILED DESCRIPTION
FIG. 1 shows a multiprocessor system having a plurality of computers 101
through 106 interconnected by bus 107. Some of the computers illustrated
in FIG. 1 have particular functions. For example, computer 101 is
considered to be the host computer, and computers 105 through 106 may be
designated as computational servers or file servers. Each computer
operates under control of an operating system kernel which illustratively
is similar to the UNIX operating system described in an article by K.
Thompson, "Unix Implementation" the Bell System Technical Journal,
July-August, 1978, Volume 57, Number 6, and in a book by M. J. Bach
entitled The Design Of The Unix Operating System, Prentice-Hall, 1986,
Englewood Cliffs, N.J. Whereas the operating system kernel described by
Thompson is restricted to only a single computer, the kernels of FIG. 1
allow a process to be extended over a number of computers. This extended
process is a collection of individual special processes running on
separate computers and is described in greater detail later in this
section. These special processes are also referred to as primary or user
and auxiliary or stub processes. Each kernel associated with the extended
process maintains a distinct process necessary to allow the extended
process to function on the computer controlled by the associated kernel.
Each computer has associated memory and I/O devices; however, certain
computers may be interconnected to special I/O devices such as
telecommunication data interfaces or mass storage devices.
The initiation of a new program in the system illustrated in FIG. 1 results
in that program being automatically assigned to an unspecified computer
that has processing capacity to spare or which has special resources
required by the program. The unspecified computer may be the same computer
executing the request or a different computer. The execution of the program
can be distributed over a number of computers utilizing one computer which
has processing capacity and yet using one or more computers which have the
necessary files or special I/O capabilities. When program execution is
distributed, an extended process is created. The operation of the extended
process so as to allow the execution of a program to be performed among a
plurality of computers and yet making the existence of the extended
process transparent to the application programmer is the subject of this
invention.
In the previously referenced article by Thompson, it was noted that a
process is a software unit which requires text, data and stack areas of
memory and which is identified to the operating system by a process
control block. In the operating system described by Thompson, the process
control block is contained in one area of memory since that operating
system is executed on a uniprocessor system. In the system illustrated in
FIG. 1 for example, the process control block is distributed among all the
computers which are associated with the extended process. The extended
process comprises processes 112, 110, 111 and possibly processes located
in computer 105 and 106 after an exec system call has finished initiating
execution of the program. The extended process consists of a user process
and a number of stub processes. The user process has text, data and bss
areas of memory in addition to the process control block. A stub process
contains only a portion of the process control block relating to operating
system functions pertaining to that particular computer's operations with
respect to the extended process as required at any point in time.
The extended process is an entity that dynamically adjusts to supply the
required resources of the multiprocessor system to allow execution of the
program. New stub processes are added to the extended process as other
resources are required in different processors of the multiprocessor
system. The kernel of the processor executing the user process of the
extended process automatically detects the need to create a stub process
when a system call is made by the user process requiring resources on a
new processor. The user process's kernel then communicates with the kernel
of the new processor to establish a stub process on the new processor. The
establishment of a stub process also includes the creation of a virtual
communication channel between the user process and the new stub process.
Once established, subsequent communication flows between the user process
and the stub process via the virtual channel.
As described in the article by Thompson, the execution of a new program on
a single processor controlled by the Unix operating system is as follows.
First, a fork system call is executed which replicates the executing
process into child and parent processes. These processes share the same
program, but have different data storage. The child process then executes
an exec system call. The execution of the exec system call results in a
new program being executed. To further understand the structure of the
extended process, consider the following example which illustrates the
initiation of a new program by the execution of the exec system call. The
exec system call has been modified to allow for the execution of programs
on a multiprocessor system. Olduser process 109, on computer 102, executes
the exec system call. The end result is that the new program is eventually
executed by newuser process 111 on computer 104. Initially, the file
containing the new program is in the file system of computer 103 and is
accessed by a.out process 110. Computers 105 and 106 also have resources
that maybe utilized by newuser process 111.
Upon olduser process 109 executing the exec system call, the kernel of
computer 102 transmits a packet to computer 103 to obtain the header
portion of the a.out file via a.out process 110 so as to determine the
type of resources required to execute this program. The allocation of
resources and dynamic load balancing is performed by process manager (M)
function 108 being executed by computer 101 which is designated as the
host computer of the system of FIG. 1. The kernel of computer 102 then
transmits a packet to process manager function 108 of computer 101
requesting allocation of resources for the execution of newuser process
111. In our present example, process manager function 108 transmits back a
message designating that computer 104 is to execute newuser process 111.
Further information concerning the operations of process manager function
108 is illustrated in the copending application of Bishop et al., Ser. No.
941,701. The kernel of computer 102 then transmits process control
information to the kernel of computer 104 so that the latter kernel can
setup newuser process 111 and stub processes in computers 102, 105, and
106 for the future execution of the extended process.
Once this initialization has been performed, the kernel of computer 102
passes the execution of the exec system call to the kernel of computer
104. The latter kernel obtains the a.out file from computer 103. The
kernel of computer 104 also transmits messages to the kernels of the other
computers informing them that the user process which was initially olduser
process 109 has migrated to computer 104 and is now newuser process 111.
Olduser process 109 now becomes a stub process. The kernels of the other
computers will now direct any signals for olduser process 109 to newuser
process 111. Further, the kernel of computer 104 transmits a message to
the kernel of computer 102 to recover all signals transmitted to olduser
process 109 that arrived at computer 102 before the other computers were
informed that the extended process had migrated to computer 104. Once
newuser process 111 has been set up and begins to execute, it can utilize
the resources of the other computers as required via stub processes that
were created in these computers. If, during the execution of the program,
it is necessary to access a computer that was not initially designated as
being part of the extended process, then the operating system of computer
104 requests the creation of a stub process on that computer necessary to
continue execution of the program.
FIG. 2 illustrates in greater detail the execution of the exec system call
and creation of the extended process for the present example. Upon
execution of the exec system call by olduser process 109, decision block
202 is performed. The exec system call may specify parameters for
influencing the processor assignment. Decision block 202 determines
whether or not the file containing the a.out file is local to computer 102
or is on a remote computer. Since the file is on computer 103 in the
present example, it is remote; and if a stub process does not already
exist on computer 103 for the present extended process, a packet is sent
to create a stub process on computer 103. In response to the packet, the
kernel of computer 103 creates a.out process 110 that allows access to the
a.out file. The a.out process 110 then becomes part of the extended
process. Block 206 accesses the a.out file located on computer 103 via
a.out process 110. The header information is read from the a.out file and
is stored in the process control block of a.out process 110. The kernel of
computer 103 then transmits a subset of the header to computer 102's kernel
which stores the subset in the process control block of olduser process 109
in computer 102. The information obtained from the a.out file at this point
specifies the size of the a.out file and may specify parameters for
influencing the processor assignment decision. After obtaining the
information from the a.out file, the kernel of computer 102 transmits a
packet to the kernel of computer 101 requesting that the kernel execute
process manager function 108 to select a computer upon which newuser
process 111 is to assigned at block 208. This packet contains the
information obtained from the a.out file in block 206 and any parameters
regarding processor assignment in the exec system call. PM function 108 is
responsive to this packet to validate an explicit assignment if one existed
in the a.out or exec system call information or to perform a dynamic load
balancing for the multiprocessor system illustrated in FIG. 1 in order to
make a processor assignment for newuser process 111. In the present
example, newuser process 111 is assigned to computer 104.
Next, the kernel of computer 102 executes block 210. The execution of block
210 results in the arguments of the exec system call being read. The kernel
of computer 102 is responsive to these arguments and any environment
variables from the olduser process 109's address space to transfer this
information into a system work area formatting this information into an
initial stack for newuser process 111. Block 212 is next executed which
releases the resources of olduser process 109 back to the operating system
of computer 102. In particular, the address space of olduser process 109 is
released.
The actions just performed represent a preexecution stage of the exec
system call. If the newuser process is present on a different computer
than the olduser process, then blocks 220 through 238 are executed before
blocks 240 through 250. In the present example, the kernel of computer 102
executes blocks 220 through 228, and the kernel of computer 104 executes
blocks 230 through 238. However, if the olduser and the newuser processes
are on the same computer, then the blocks 240 through 250 illustrated in
FIG. 2 are executed at this point in time. Decision block 214 determines
whether or not the newuser and olduser processes are on different
computers. In the present example, olduser process 109 is on computer 102
and newuser process 111 is on computer 104. If a stub process does not
already exist on computer 104, the kernel of computer 102 executes block
220 which results in a packet being transmitted over to the kernel of
computer 104. This packet requests that a stub process be created which
will become newuser process 111 on computer 104. The kernel is responsive
to this request to create a skeleton stub process by performing a kernel
fork function on a prototype stub process. Each kernel of FIG. 1 maintains
a copy of the prototype stub process for the purpose of creating stub
processes. The kernel of computer 102 then executes block 222. The latter
block results in the transmission of a migration packet from computer 102
to computer 104. The packet contains the initial process control
information for newuser process 111. That information was formatted in
block 210. The migration packet contains the information necessary to
transform the stub for newuser process 111 on computer 104 into a viable
user process of an extended process. Viability is defined here to mean
that the newuser process has all the information necessary to exit or
terminate gracefully if required. A graceful exit is one where all parts
of the extended process can be removed from all the computers of FIG. 1 if
it is necessary to terminate the extended process.
The principal information contained in the migration packet is the
reconnection data for the stub processes and information defining open
files of the extended process. This data is used to reattach the stub
processes and files that had been attached to olduser process 109 to
newuser process 111. The reattachment is performed by rearranging the
virtual channels and discussed with respect to FIG. 3. Certain crucial
data from the process control block de | | |