Claims
I claim:
1. A maintenance method for updating system software for a plurality of processing units in a communication network from a first version to a second version, the processing units being
distributed among multiple nodes being linked by communication channels, each of the processing units being coupled to one or more storage units each having a status identification, the method comprising the steps of:
installing the second version in a storage unit of a source node;
transmitting the second version through the network to one or more specified storage units in other nodes, including the steps of assigning a first status to the status identification of the specified storage units; and
for at least one storage unit and a coupled processing unit in at least one of said other nodes,
changing the status identification of the specified storage unit to a second status upon successfully transmitting the second version thereto;
initiating a trial use of the second version in the processing unit of the one of said other nodes, including the step of changing the status identification of the storage unit containing the second version to a third status;
detecting in the processing unit of the one of said other nodes whether the second version operates successfully or fails to operate successfully in the one of said other nodes during the trial use;
restoring, upon detecting that the second version fails to operate successfully during the trial use, the one of said other nodes to the first version, including the step of changing the status identification of the storage units containing the
second version to the second status; and
designating, upon detecting that the second version operates successfully during the trial use, the second version as a preferred version of system software in the one of said other nodes including the step of changing the status identification
of the storage units containing the second version to a fourth status.
2. A method as in claim 1, further comprising the step of performing a consistency check on the second version of system software before the transmitting step.
3. A method as in claim 2, wherein the checking step comprises the steps of partitioning the second version into a prespecified number of modules and performing a checksum operation on each module.
4. A method as in claim 1, wherein the step of initiating a trial use comprises the steps of loading the second version from a storage unit for the one of said other nodes and executing the second version in the processing unit.
5. A method as in claim 4, wherein the detection step includes the step of monitoring for errors during the loading and execution of the second version.
6. A method as in claim 1, wherein the detecting step includes the steps of communicating one or more messages between a node and another node in the network and monitoring for errors during the communication.
7. A method as in claim 1, further comprising the steps, in response to a failure in the transmitting step, of:
locating a particular storage unit having the fourth status,
copying the contents of the particular storage unit having the fourth status to a storage unit receiving the second version, and
changing the status of the storage unit receiving the second version to a fourth status if the copying is successful.
8. A method as in claim 1, further comprising the steps, upon detecting a failure during the trial use, of:
locating a particular storage unit having a fourth status,
copying the contents of the particular storage unit having a fourth status to the storage unit containing the second version, and
changing the status identification of the storage unit containing the second version to the second status.
9. A method as in claim 1, further comprising, in a node with one or more additional processing units, the step of loading the one or more additional processing units with the second version after the second version has been designated as the
preferred operational version.
10. A method as in claim 1, further including the step, in a node with one or more additional processing units, of transferring processing from said processing unit to said one or more additional processing units in the node.
11. A method as in claim 1, wherein the step of transmitting the second version is initiated by a command entered at one of the processing units.
12. In a communication network having a plurality of processing units distributed among multiple nodes linked by communication channels where each of the processing units is coupled to at least one memory unit and where each memory unit has a
status identification, maintenance apparatus for updating system software among the processing units from a first version to a second version comprising:
first means in a source node for receiving the second version;
second means coupled to the first means for transmitting the second version through the network to one or more specified memory units in one or more other nodes, including means for assigning a first status to the specified memory units, and
means for changing the first status to a second status upon successfully communicating the second version thereto;
for at least one of said other nodes,
third means for initiating a trial use of the second version in a processing unit of the one of said other nodes, including means for changing the status of memory units containing the second version to a third status;
fourth means for detecting whether the second version operates successfully or fails to operate successfully during the trial use;
said fourth means including, in response to a detection that the second version fails to operate successfully during the trial use, means for restoring the one of said other nodes to the first version, including means for changing the memory
units containing the second version to the second status; and
fifth means, in response to a detection that the second version operates successfully during the trial use, for designating the second version as a preferred version of system software, including means for changing the status of the memory units
containing the second version to a fourth status.
13. The apparatus as in claim 12, further comprising means for performing a consistency check of the second version before transmitting the second version through the network.
14. The apparatus as in claim 13, comprising means for partitioning the second version into a number of modules and means for calculating a checksum for each module, and wherein the means for performing the consistency check further comprises
means for checking checksums of the modules in the second version.
15. The apparatus as in claim 12, wherein the means for initiating a trial use includes storage unit means for loading the second version for execution in the processing unit of the one of said other nodes.
16. The apparatus as in claim 12, wherein the detecting means includes means for monitoring for errors during the loading and execution of the second version.
17. The apparatus as in claim 12, wherein the detection means includes means for communicating at least one message between a node and another node in the network and means for monitoring for errors in communication of the message.
18. An apparatus as in claim 12, further comprising means, responsive to a failure in transmitting the second version to one of said one or more specified memory units, for locating a memory unit having a fourth status, means for copying the
second version from the memory unit having the fourth status to said one of said one or more specified memory units, and means for changing the status of said one of said one or more specified memory units to the fourth status if the copying of the
second version is successful.
19. An apparatus as in claim 12, further comprising in a node with one or more additional processing units, means for loading said one or more additional processing units with the second version after the second version has been designated as
the preferred operational version.
20. An apparatus as in claim 12, further including, in a node with one or more additional processing units, means for transferring processing from said processing unit to said one or more additional processing units in the node.
21. A network maintenance apparatus as in claim 12, further including means for receiving a command entered at one of the processing units for initiating a trial use by said third means.
Description
LIMITED COPYRIGHT WAIVER
A portion of this patent document contains material to which a claim of copyright protection is made. The copyright owner has no objection to facsimile production by anyone of the patent document, or the patent disclosure as it appears in the
United States Patent and Trademark Office patent file records, but reserves all other rights whatsoever.
FIELD OF THE INVENTION
The present invention relates to communication networks wherein a plurality of processing units are distributed among nodes being linked by communication channels. Specifically, the present invention relates to the maintenance of such
communication networks, and more specifically to method and apparatus for updating system management software in a communication network.
BACKGROUND OF THE INVENTION
Recent advances in communication and data processing technologies have given rise to a rapid development of communication networks. In addition to providing distributed data processing capability and data base sharing, modern communication
networks allow voice and data to be more efficiently transmitted from one place to another through economical topological configurations (that is, the manner in which the nodes of the network are interconnected).
Moreover, whereas the geographical confines of past communication networks, such as local area networks, were limited to closely dispersed stations interconnected by cables (for example, different offices within the same building), modern
communication networks can easily span over wide geographical areas, with the stations distributed among nodes interconnected by communication media such as satellite, microwave and fiber optic transmission, using T1 transmission or other communication
facilities.
In many modern communication networks, each node has a controller which is typically a processing unit executing system software to perform such management functions as: processing the handshaking communication protocol, routing voice and data
messages (based upon topological or other information), and performing other control operations. Because enhancements and new functions may be added to the control operations, there exists a need to periodically update the system management software
being executed in the processing units of a communication network.
In conventional local area networks, updating the system management software has been performed without too much difficulty by physically swapping out memory cards which contain the older software and replacing them with cards which contain the
new software. However, in modern wide area networks such as the Integrated Digital Network Exchange (IDNX®) products marketed by Network Equipment Technologies, the assignee of the present application, this approach suffers from many drawbacks. For
example, updating the system management software may become a costly operation both in terms of human resources and system downtime involved in the swapping process. Furthermore, when the size (that is, the number of nodes) of a network becomes large, a
considerable amount of time may need to be spent to swap the software in all the nodes. These drawbacks may be further aggravated if test runs of the new software become necessary.
In view of the foregoing, there is a need for a method whereby system management software in a communication network can be easily updated.
SUMMARY OF THE INVENTION
The present invention is a maintenance method and apparatus for updating system software among a plurality of data processing units in a communication network. The processing units in the network are distributed among multiple nodes linked by
communication channels. The method and the apparatus operate to install a second version of system software in a first node. The second version is communicated to a set of second nodes in response to an update command. A trial use of the second
version is then initiated in a subset of the second nodes. Upon detecting predefined failures during the trial use in a second node, the second node is restored to the first version. If the trial use completes successfully in a second node, the second
version is then used as a preferred version in the second node.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a communication network wherein the present invention is embodied.
FIG. 2 illustrates the logical steps of a preferred embodiment of the present invention.
FIG. 3 is a block diagram illustrating the logic structure of the master task of the preferred embodiment.
FIG. 4 is a block diagram illustrating the logic structure of the slave task of the preferred embodiment.
FIG. 5 is a block diagram illustrating the distribution of new software.
FIG. 6 is a block diagram illustrating the initiation of a trial use.
FIG. 7 is a block diagram illustrating the restoring of a node to an old version.
FIG. 8 is a block diagram illustrating the designation of a new version as the preferred version of the system software.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention is embodied in the Integrated Digital Network Exchange (IDNX) products marketed by Network Equipment Technologies as an improvement thereto. A block diagram of an IDNX wide area communication network is illustrated in FIG.
1. The network comprises a plurality of message stations MS.sub.1, MS.sub.2, . . . distributed among nodes, N.sub.1, N.sub.2, . . . N.sub.n, which are interconnected by communication channels L.sub.1, L.sub.2, . . . L.sub.n, of, for example, the T1
communication facility. The message stations MS.sub.1, MS.sub.2, . . . communicate voice and data messages with each other. Each node has a controller 100 which controls the communication of messages between its message stations and message stations
in other nodes of the network. A controller 100 also controls the routing of messages from one node to another.
The manner in which the messages are communicated and routed, including the protocols used in connection therewith, is well known in the art and need not be described in this disclosure.
Each controller 100 comprises one or more central processing units (CPUs) 101, one or more trunk ports 105 for providing interfaces to the communication channels, one or more digital voice/data/video ports 106 for interfacing with message
stations attached to it, as well as one or more non-volatile storage devices 103, such as non-volatile random access memories, for storing system management software. In the preferred embodiment, these components are coupled by a common bus 107 through
which data and control signals are communicated. In addition, a controller optionally has input/output devices (not shown) to allow an operator to control and maintain a node in the network. For example, an operator can enter a system maintenance
command to disable or enable one or more ports within the node, or a command to download data and software from an auxiliary device to a non-volatile storage device 103.
To communicate voice and data messages between message stations, communication links are established between the nodes, based upon a variety of supported topologies. The process whereby a controller establishes links to the neighbor nodes
involves processing a prespecified handshaking protocol by a CPU 101. The CPU 101 performs this function, along with other system operations, by executing instructions residing in its working memory 108. The topological information, which is used for
communicating and routing messages between nodes, also resides in a CPU's working memory 108.
The instructions for performing the above-mentioned functions are part of the system software which, along with the topological information, is stored in the non-volatile storage device 103 and loaded into the working memory 108 of a CPU 101
during an initial program load (IPL), otherwise known as a boot. IPL can be activated manually by pressing a button connected to a CPU, or it can be activated by another CPU which sends an "IPL" signal to the local CPU.
The system software is stored in a non-volatile storage device 103 as a number of modules. At a predefined location in each non-volatile storage device 103, a header is provided to indicate the number of modules of the system software stored
therein. This header can be accessed by a CPU 101 as data.
The network topological information is periodically maintained and updated to reflect changes in the configuration of the network, such as when a node goes out of service or when new links are established or deleted. Also, the system software
may be changed periodically to add new functions or enhancements.
When the system software is changed, it becomes desirable to perform a process (hereinafter called a "softload process" or "softload" ) to update the non-volatile storage devices 103 in a predetermined subset of nodes in the network so that the
new system software may be used to operate the CPU's 101 of the subset of nodes.
According to the preferred embodiment of the present invention, the non-volatile storage devices 103 in the network are classified, based upon the respective statuses of their contents, into four categories:
(a) A "new" non-volatile storage device 103 is a non-volatile storage device 103 which has been installed with a new version of system software. The new software version may be installed either by a download operation, or by the successful
completion of a distribution step of the softload process. A "new" non-volatile storage device 103 will not be updated or downloaded. Moreover, a "new" non-volatile storage device 103 will not be used to IPL a CPU 101.
(b) A "dirty" non-volatile storage device 103 is a non-volatile storage device 103 which contains contaminated or unusable data. A non-volatile storage device 103 is contaminated if, for example, an error has occurred during a distribution in
the softload process. A non-volatile storage device 103 is unusable if it is being loaded by the distribution step of the softload process. A "dirty" non-volatile storage device 103 will not be updated or downloaded. Moreover, a "dirty" non-volatile
storage device 103 will not be used to boot a CPU 101.
(c) A "trial" non-volatile storage device 103 is a non-volatile storage device 103 which is specified to be used in the cutover step of the softload process. Normally, only one "trial" non-volatile storage device 103 exists in a node; all other
non-volatile storage devices 103 in the node will be in one of the other states. If a "trial" non-volatile storage device 103 is present in a node, it will be used when a CPU 101 is IPL'ed, unless a predefined number of system crashes have already
occurred when the "trial" non-volatile storage device 103 is used to boot a CPU 101. In that case, the CPU 101 will be cutback to the old software and the "trial" non-volatile storage device 103 will be reverted to "new".
(d) An "old" non-volatile storage device 103 is a non-volatile storage device 103 which contains operative software of a node. A non-volatile storage device 103 is marked "old" only when it operates properly. When a CPU 101 is IPL'ed, the
content of an "old" non-volatile storage device 103 will be used if no "trial" non-volatile storage device 103 is present. If several "old" non-volatile storage devices 103 exist, the "old" non-volatile storage device 103 with the most recent version
number will be selected for an IPL. If the selected non-volatile storage device 103 operates improperly, then the next most recent non-volatile storage device 103 will be used.
Each of the non-volatile storage devices 103 has a status header identifying the status of the software stored therein. The status header of a non-volatile storage device 103 also contains the release number, version number and other data
associated with the software residing therein. This status header can be accessed by a CPU 101 as data.
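By way of illustration only, the four statuses and the selection of a storage device for an IPL might be sketched as follows. The names StorageStatus, StorageDevice and select_boot_device are hypothetical, the crash limit of 3 stands in for the unspecified "predefined number" of system crashes, and version numbers are assumed to compare as integers. The fallback to the next most recent "old" device after an improper IPL is not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class StorageStatus(Enum):
    NEW = "new"      # holds a new version; never used to IPL a CPU
    DIRTY = "dirty"  # contaminated, or currently being loaded
    TRIAL = "trial"  # selected for the cutover step
    OLD = "old"      # operative software of the node


@dataclass
class StorageDevice:
    slot: int
    status: StorageStatus
    version: int  # assumption: versions compare as integers


def select_boot_device(devices: List[StorageDevice],
                       trial_crashes: int,
                       max_trial_crashes: int = 3) -> Optional[StorageDevice]:
    """Return the device a CPU would IPL from, or None if none is usable."""
    trials = [d for d in devices if d.status is StorageStatus.TRIAL]
    # A "trial" device is preferred until the crash limit is reached.
    if trials and trial_crashes < max_trial_crashes:
        return trials[0]
    # Otherwise use the "old" device with the most recent version number.
    olds = [d for d in devices if d.status is StorageStatus.OLD]
    if not olds:
        return None
    return max(olds, key=lambda d: d.version)
```

In this sketch, "new" and "dirty" devices are never returned, which mirrors the rule that neither is used to IPL a CPU.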
FIG. 2 is a flow chart illustrating a softload process in which the present invention is embodied.
At the beginning of the softload process, the new version of system software is installed in one of the nodes N.sub.1, N.sub.2, . . . N.sub.n in the network (block 201). The node where the new version of system software is installed is
hereinafter called a source node N.sub.s. The new system software version may be installed in more than one way. For example, the new version of system software may be programmed into a new non-volatile storage device 103 which is then coupled to the
controller 100 of the source node N.sub.s. As another example, the new version of system software may be downloaded to a non-volatile storage device 103 in the source node N.sub.s.
After the new version of system software is installed in the source node N.sub.s, the distribution of the new system software to the other nodes is initiated. Typically, basic operation of the new system software is tested before it is
distributed. Advantageously, the distribution is initiated by a "distribution command" which is entered by the operator at an input/output device in one of the nodes N.sub.1, N.sub.2, . . . N.sub.n. The node at which a distribution command is entered,
which may or may not be the source node N.sub.s, is hereinafter called the operator node N.sub.o. The distribution command contains a source field and a destination field. The source field contains a node-card combination which identifies the source
node N.sub.s as well as the location of the non-volatile storage device 103 which contains the new system software. The destination field contains a list of node-card combinations each of which identifies one of the nodes in the network which will
receive the new version of software (a destination node N.sub.d). In the preferred embodiment, only four node-card combinations are identified in the list, but it will be obvious that different values can be used. The destination field also contains the
locations of the non-volatile storage devices 103 within a destination node N.sub.d which will be used to receive the new version of system software.
Advantageously, the system is implemented so that a distribution command is recognizable by a controller 100 only when it is issued with predefined access privileges. Furthermore, such privileges may be defined with an hierarchical structure so
that the privilege required for a distribution process depends upon the importance of the software.
When the operator node N.sub.o receives a distribution command, it relays appropriate messages to both the source node N.sub.s and the destination nodes N.sub.d.
When the source node N.sub.s receives a distribution message, a check is made to see: (1) whether the source node N.sub.s is currently being installed with a new system software version, (2) whether the source node N.sub.s has been designated as
the source node N.sub.s of another softload process, and (3) whether the source node N.sub.s has a "new" non-volatile storage device 103 at the specified location. If (1) and (2) are false and (3) is true, a "consistency check" will be performed on the
"new" non-volatile storage device 103 (block 202). In the preferred embodiment, the consistency check involves a process of making a checksum verification on the "new" software, as well as checking the actual number of modules of the system software
contained in the non-volatile storage device 103. This number of modules is compared against the number stored in the header of the non-volatile storage device 103.
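The consistency check at the source node might be sketched, purely as an example, in the following way. The disclosure does not name a checksum algorithm, so a 16-bit additive checksum is assumed here; the function names and the representation of a module as a (data, stored checksum) pair are likewise hypothetical.

```python
from typing import List, Tuple


def checksum(data: bytes) -> int:
    # Assumption: 16-bit additive checksum; the source names no algorithm.
    return sum(data) & 0xFFFF


def consistency_check(header_module_count: int,
                      modules: List[Tuple[bytes, int]]) -> bool:
    """Check module count against the header, then verify each checksum.

    modules is a list of (module data, stored checksum) pairs.
    """
    # The actual number of modules must match the number in the header.
    if len(modules) != header_module_count:
        return False
    # Every module's recomputed checksum must match its stored value.
    return all(checksum(body) == stored for body, stored in modules)
```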
Upon receiving the messages, a CPU in a destination node N.sub.d makes a consistency check to see whether there is a non-volatile storage device 103 at the specified location. The destination node N.sub.d also checks whether data can be written
into the specified non-volatile storage device 103. This check is accomplished by a read-modify-write operation. In the read-modify-write operation, predetermined data is first read from the non-volatile storage device 103. This data is then modified
(e.g., by exclusively ORing it with a binary "1") and rewritten back to the non-volatile storage device 103. The non-volatile storage device 103 is then read again to check whether it is indeed modified.
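The read-modify-write operation can be illustrated with a short sketch in which a bytearray stands in for the non-volatile storage device. The function name is_writable and the final restoring write are assumptions; the disclosure does not state whether the original data is put back after the test.

```python
def is_writable(storage: bytearray, offset: int = 0) -> bool:
    """Probe one location with a read-modify-write cycle."""
    original = storage[offset]        # read the predetermined data
    modified = original ^ 0x01        # modify: exclusive-OR with binary 1
    storage[offset] = modified        # write the modified data back
    ok = storage[offset] == modified  # read again and compare
    storage[offset] = original        # assumption: restore the original data
    return ok
```

A device that silently ignores writes fails the comparison, so the destination node can report it as not programmable.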
After the consistency checks are completed in both the source node N.sub.s and the destination nodes N.sub.d, appropriate responses will be sent from these nodes back to the operator node N.sub.o. Upon receiving the proper responses, an operator
may (1) confirm the softload process to all the available destinations, (2) limit the softload process to a subset of the non-volatile storage devices 103 originally specified, by changing the list of destinations, or (3) abort the softload process (block
203) completely.
If the softload process is not aborted, the non-volatile storage devices 103 selected as destinations to receive the new system software are first marked "dirty". The new software will then be distributed from the source node N.sub.s
to the destination non-volatile storage devices 103 (block 204). The distribution can be carried out using one of the many well known data communication techniques in the art.
In the preferred embodiment, the new system software is transmitted in one or more batches of data packets to the destination nodes N.sub.d. Each data packet has a header containing a batch-number and packet-number. The header of the first
packet in a batch also contains the total number of packets to be sent in that batch. Each packet also contains information which specifies the location of non-volatile storage devices 103 within the controller 100 and the relative location within the
non-volatile storage device 103 where the packet is to be loaded. Each packet also contains a checksum value of its data.
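One possible representation of the data packets described above is sketched below for illustration. The field names, the payload size of 4 bytes, the numbering of packets from zero and the additive checksum are all assumptions; the disclosure specifies only which items of information a packet carries.

```python
from dataclasses import dataclass
from typing import List, Optional


def checksum(data: bytes) -> int:
    # Assumption: 16-bit additive checksum; the source names no algorithm.
    return sum(data) & 0xFFFF


@dataclass
class Packet:
    batch_number: int
    packet_number: int
    total_packets: Optional[int]  # carried only by the first packet of a batch
    card_slot: int                # location of the storage device in the controller
    offset: int                   # relative location within the storage device
    data: bytes
    checksum: int


def make_batch(image: bytes, batch_number: int, card_slot: int,
               payload_size: int = 4) -> List[Packet]:
    """Split a software image into the packets of one batch."""
    chunks = [image[i:i + payload_size]
              for i in range(0, len(image), payload_size)]
    return [
        Packet(batch_number, n, len(chunks) if n == 0 else None,
               card_slot, n * payload_size, chunk, checksum(chunk))
        for n, chunk in enumerate(chunks)
    ]
```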
During the distribution, the source node N.sub.s retains information concerning the progress of distribution so that it can relay the information to an operator upon request. The distribution can be aborted any time by the operator, although it
is usually aborted when one or more errors occur. If the distribution is aborted, a loadback process (block 203) will be performed.
In the loadback process, an "old" non-volatile storage device 103 will be located and the content of the "old" non-volatile storage device 103 will be copied to the "dirty" non-volatile storage device 103 that was the destination of the softload
process. If the loadback process completes successfully, the status of the "dirty" non-volatile storage device 103 will be changed to "old". If the loadback process fails, the status of the non-volatile storage device 103 will remain "dirty" and an
alarm will be logged.
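The loadback process might be sketched as follows, as an example only. The Device record and the loadback function are hypothetical, the choice of the most recent "old" device is an assumption (the disclosure says only that an "old" device is located), and the only failure modeled is the absence of any "old" device.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    status: str   # "new", "dirty", "trial" or "old"
    version: int
    content: bytes


def loadback(devices: List[Device], target: Device, alarms: List[str]) -> bool:
    """Copy an "old" device onto the "dirty" target; log an alarm on failure."""
    olds = [d for d in devices if d.status == "old" and d is not target]
    if not olds:
        # Loadback failed: the target stays "dirty" and an alarm is logged.
        alarms.append("loadback failed: no old device available")
        return False
    source = max(olds, key=lambda d: d.version)  # assumption: most recent first
    target.content = source.content
    target.version = source.version
    target.status = "old"
    return True
```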
The data packets are received by a CPU in each destination node. When a CPU receives all of the data packets, it computes a checksum from the packets and compares this checksum against the checksum transmitted from the source node. If the
checksum values compare correctly, the destination node N.sub.d sends an acknowledgement to the source node, and loads the packet data into the specified memory location of the specified non-volatile storage device 103. The acknowledgement sent by a
destination node N.sub.d contains the packet-number of the packet received by the destination node N.sub.d. If no acknowledgement is received, the source node N.sub.s will retransmit the corresponding packet to the destination node N.sub.d.
When all the packets of a batch are sent, the source node N.sub.s will send an end-of-transmission message to the destination nodes N.sub.d. Upon receiving the end-of-transmission message, a destination node N.sub.d checks the number of packets
received thus far against the total number of packets it should receive. This total number was sent along in the first packet. If one or more packets are missing, a destination node N.sub.d will send the packet-numbers corresponding to the missing
packets to the source node N.sub.s. Thereupon, the source node N.sub.s will retransmit the non-delivered packets (block 205) to the destination node N.sub.d. A prespecified number of retries will be performed. If within the prespecified number of
retries, transmission of the new system software cannot be completed successfully, a loadback (block 206) will be performed.
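The retransmission of missing packets can be illustrated with the sketch below. The function names, the deliver callback (which stands in for sending a packet and learning whether it arrived), the numbering of packets from zero and the retry limit of 3 are assumptions made for the example.

```python
from typing import Callable, Dict, List, Set


def missing_packets(received: Set[int], total: int) -> List[int]:
    """Packet-numbers a destination would report after end-of-transmission."""
    return [n for n in range(total) if n not in received]


def distribute(packets: Dict[int, bytes],
               deliver: Callable[[int, bytes], bool],
               max_retries: int = 3) -> bool:
    """Send a batch, then retransmit missing packets up to max_retries times.

    Assumes packets are keyed 0 .. len(packets) - 1.
    """
    received: Set[int] = set()
    pending = sorted(packets)
    for _ in range(max_retries + 1):  # first pass plus the retries
        for n in pending:
            if deliver(n, packets[n]):
                received.add(n)
        pending = missing_packets(received, len(packets))
        if not pending:
            return True
    return False  # the caller would then perform a loadback
```

Only the packets reported missing are sent again on each retry, which matches the behavior described for block 205.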
This loadback process is similar to the above-described loadback process wherein an "old" non-volatile storage device 103 will be located and copied to the "dirty" non-volatile storage device 103. Also, if the loadback process completes
successfully, the status of the "dirty" non-volatile storage device 103 will be changed back to "old". If the loadback process fails, the non-volatile storage device 103 will be left "dirty" and an alarm will be logged.
Upon successful completion of the distribution, the status of the non-volatile storage device 103 will be changed to "new".
Following the distribution, a "cutover command" will be issued from an operator node N.sub.o, which may or may not be the same as the operator node N.sub.o which issued the distribution command. The cutover command has a destination field which
contains a list of node-card combinations each of which specifies a target node, as well as a non-volatile storage device 103 within the target node which is selected for the cutover process.
Upon receiving a cutover command, the target node first saves the identification of the operator node which issued the cutover command. The target node also checks: (1) whether it is in a "trial" mode, and (2) whether it is currently involved in
a distribution. If neither (1) nor (2) is true, the cutover process will continue (block 207).
In the cutover process, the non-volatile storage device 103 specified in the cutover command will be located. The target node then checks whether the non-volatile storage device 103 is a "new" non-volatile storage device 103. If so, it performs
a consistency check on the target non-volatile storage device 103. This consistency check comprises the steps of comparing the checksums and comparing the numbers of modules in the non-volatile storage device 103.
If the consistency check completes successfully, the target node will set the status of the specified non-volatile storage device 103 to "trial". After the status of the non-volatile storage device 103 has been changed, a CPU 101 in the target
node will perform an IPL.
If the target node fails to IPL correctly, a process to cutback to an "old" non-volatile storage device 103 will be performed (block 208). In the cutback process, the target node locates an "old" non-volatile storage device 103 in the
controller. The content of this "old" non-volatile storage device 103 is then copied to the "trial" non-volatile storage device 103. The status of the "trial" non-volatile storage device 103 is changed back to "new", so that it will not be used to IPL
a CPU in the future.
After the CPU 101 is IPL'ed successfully, a trial use of the new system software is performed (block 209). This trial use includes the step of initiating a dialogue between the target node and the operator node N.sub.o that initiated the cutover
command. According to the preferred embodiment, the dialogue involves one or more exchanges of acknowledgments between the target node and the operator node N.sub.o. If the dialogue fails, a cutback process is initiated (block 210).
In this cutback process, the target node checks: (1) whether it is performing a softload process, (2) whether it is running on system software from a "trial" non-volatile storage device 103, (3) whether there is only one "trial" non-volatile
storage device 103 in the target node, and (4) whether there is an "old" non-volatile storage device 103 in the target node. If (1) is false, and (2), (3), and (4) are true, the target node will locate an "old" non-volatile storage device 103 in the
target node which has the most recent version of system software. When this "old" non-volatile storage device 103 is located, the target node will change the status of the "trial" non-volatile storage device 103 to "old". After the status is changed,
the content of the "old" non-volatile storage device 103 will be copied to the "trial" non-volatile storage device 103, and the CPU 101 will perform an IPL. In a node with multiple processors, when a CPU becomes inoperative during the trial use or
because of bad "trial" system software, its functions may be taken over by another CPU in the node.
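The four checks that gate this cutback, and the selection of the most recent "old" device, may be sketched as follows. The sketch assumes a hypothetical Device record with an integer version field in which a larger value means a more recent version; the subsequent IPL is left to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    status: str
    version: int     # hypothetical: larger means more recent
    content: bytes


def trial_cutback(devices: List[Device], softloading: bool,
                  running: Device) -> Optional[Device]:
    """Restore the trial device from the newest "old" device, if allowed."""
    trials = [d for d in devices if d.status == "trial"]
    olds = [d for d in devices if d.status == "old"]
    # Checks (1) to (4): not softloading, running from a trial device,
    # exactly one trial device, and at least one old device.
    if softloading or running.status != "trial" or len(trials) != 1 or not olds:
        return None
    source = max(olds, key=lambda d: d.version)
    trial = trials[0]
    trial.status = "old"              # status is changed first
    trial.content = source.content    # then the old content is copied over
    trial.version = source.version
    return source                     # the caller would now perform an IPL
```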
If the trial use completes successfully, an update process will be performed (block 211). In the update process, the content of the "trial" non-volatile storage device 103 will be copied to all the non-volatile storage devices 103 in the node.
The status of these non-volatile storage devices 103 will then be changed to "old" so that they will be used to IPL the CPU's.
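As a minimal sketch of the update process, again using a hypothetical Device record, the trial image is copied to every device in the node and each is then marked "old". Refusing to update unless exactly one "trial" device exists is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    status: str
    content: bytes


def update_node(devices: List[Device]) -> bool:
    """Copy the single "trial" image to every device and mark them "old"."""
    trials = [d for d in devices if d.status == "trial"]
    if len(trials) != 1:
        return False   # assumption: exactly one trial image is required
    image = trials[0].content
    for d in devices:
        d.content = image
        d.status = "old"
    return True
```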
In the preferred embodiment, the softload process is implemented by a master task and a slave task running in the CPU's 101 of each node. When a "distribution command" is entered at the operator node N.sub.o, the master task at that node will
accept the command and, based upon the context thereof, relay appropriate messages to the slave task of the source node N.sub.s and the respective slave tasks of the destination nodes N.sub.d.
At the source node N.sub.s, the master task translates the messages into directives and sends them to its slave task. Responsive to the directives, the slave task obtains information about the non-volatile storage devices 103 and sends this
information back to the operator node N.sub.o for display thereat.
The slave tasks at the destination nodes N.sub.d also collect information about their non-volatile storage devices 103 (e.g. software versions, statuses, or whether a non-volatile storage device 103 can be programmed) and send this information
back to the operator node N.sub.o.
The information from the source node N.sub.s and destination nodes N.sub.d is used by the master task at the operator node N.sub.o to provide the initial checking and selection to exclude nodes that are not appropriate to be a source node for
the softload process (e.g. nodes which are currently performing a softload process, nodes currently acting as source nodes for other softload processes, or nodes that do not contain a "new" non-volatile storage device 103, etc.). The master task at the
operator node N.sub.o may also exclude nodes which are not appropriate to be destination nodes N.sub.d (e.g. nodes with no communication links connected to the source node, nodes that are not operational, or nodes which have no non-volatile storage
device 103, etc.).
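The exclusion rules listed above may be expressed as two predicates, sketched below. The NodeInfo record and its field names are hypothetical, and modeling connectivity as a direct link to the source node is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class NodeInfo:
    # Hypothetical summary of what a slave task reports about its node.
    name: str
    operational: bool = True
    softloading: bool = False
    acting_as_source: bool = False
    device_statuses: List[str] = field(default_factory=list)
    linked_to: Set[str] = field(default_factory=set)


def can_be_source(n: NodeInfo) -> bool:
    """A source must be idle and must hold a "new" storage device."""
    return (not n.softloading and not n.acting_as_source
            and "new" in n.device_statuses)


def can_be_destination(n: NodeInfo, source: NodeInfo) -> bool:
    """A destination must be up, have a device, and reach the source."""
    return (n.operational and bool(n.device_statuses)
            and source.name in n.linked_to)
```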
When the softload process is confirmed, the master task at the source node N.sub.s partitions the new system software into data packets and transfers them to the slave tasks of the destination nodes N.sub.d. The communication of data packets to
the slave tasks involves the aforementioned sequencing, flow control and retransmission (if necessary) between the master task of the source node N.sub.s and the slave tasks of the destination nodes N.sub.d.
The master task at the source node N.sub.s also retains information concerning the status of the softload process for reporting to the operator when requested.
Optionally, occurrences of such events as breakdowns of communication links or failures in a non-volatile storage device 103 may be logged by the master tasks for future use.
During the softload process, the slave tasks will update the status of a non-volatile storage device 103 as required. For example, during distribution, the status of a non-volatile storage device 103 will be initially changed to "dirty", and
later to "new" if the non-volatile storage device 103 receives the system software successfully.
During a distribution, the respective slave tasks of the destination nodes N.sub.d receive data packets from the master task of the source node N.sub.s. Each slave task loads the data packets into its specified non-volatile storage devices
103. If retransmission of data packets is required (e.g. when non-delivery of a packet is detected), the slave task of the destination node N.sub.d will send a retransmission request to the source node N.sub.s.
If the non-volatile storage device 103 in a destination node N.sub.d cannot be loaded successfully within the predefined number of retries, its slave task will request its local master task to locate "old" software, either locally or from a
neighbor node. The "old" software is then copied to the "dirty" non-volatile storage device 103. If the non-volatile storage device 103 then operates properly, its status will be changed to "old". If this loadback fails, the non-volatile storage device 103
will remain "dirty".
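The resulting status of a device after a distribution attempt with loadback may be sketched as follows. The three callbacks and the default limit of three attempts are hypothetical stand-ins for the storage and network operations, which are not detailed here.

```python
from typing import Callable, Optional


def load_with_loadback(write_new: Callable[[], bool],
                       find_old: Callable[[], Optional[bytes]],
                       write_old: Callable[[bytes], bool],
                       max_attempts: int = 3) -> str:
    """Return the final status of a device that started out "dirty"."""
    for _ in range(max_attempts):
        if write_new():
            return "new"            # new software loaded successfully
    # All attempts failed: look for "old" software, locally or at a neighbor.
    old_image = find_old()
    if old_image is not None and write_old(old_image):
        return "old"                # loadback succeeded
    return "dirty"                  # loadback failed or nothing to load back
```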
When a cutover command is received, the slave tasks of the destination nodes N.sub.d change the status of the non-volatile storage device 103 to "trial" so that it can be used to IPL a CPU 101.
In general, interactions between the master and the slave tasks involve:
(1) A request for information from the master task at the source node N.sub.s to the slave tasks at the destination nodes N.sub.d concerning the non-volatile storage devices 103 at the destination nodes N.sub.d. In response, each slave task at
the destination node N.sub.d sends the requested information back to the master task. Once the information is received, the master task can select destinations of a distribution.
(2) Once the correct destination nodes N.sub.d have been selected, the master task of the source node N.sub.s informs the slave tasks of the destination nodes N.sub.d regarding the number of packets the master task will send, the number of
packets in a batch after which it will expect an acknowledgement, the non-volatile storage device 103 to be used for distribution, and the identification (version and release number) of the system software to be distributed.
(3) The master task in the source node N.sub.s builds data packets from the new system software and sends the packets to the destination nodes N.sub.d in batches. After a batch is sent to a destination node N.sub.d, the master task at the source
node N.sub.s will wait for an acknowledgement from the destination node N.sub.d. If the batch is received successfully, the slave task in the destination node N.sub.d sends a reply message to the master task at the source node N.sub.s to acknowledge
the receipt thereof. Otherwise, the slave task will send a retransmission request accompanied by the numbers of packets not yet received, so that the master task at the source node N.sub.s can retransmit the packets that have not been received.
(4) If the operator decides to abort the softload process, the master task at the operator node N.sub.o will issue an abort request and the master task of the source node N.sub.s will stop sending data packets.
(5) The master task at the operator node N.sub.o sends a cutover request to the respective slave tasks of the destination nodes N.sub.d when cutover in these destination nodes N.sub.d is desired. The slave tasks at these nodes will acknowledge
receipt of the packet before performing the cutover.
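The batched transfer of interaction (3) may be sketched as a small simulation in which the master sends one batch, learns which packet numbers are still missing, and retransmits only those. The deliver callback, the function names, and the max_rounds limit are assumptions made for the sketch; they do not reproduce the message formats of the appendices.

```python
from typing import Callable, Dict, List


def make_packets(image: bytes, size: int) -> List[bytes]:
    """Partition the software image into fixed-size data packets."""
    return [image[i:i + size] for i in range(0, len(image), size)]


def send_in_batches(packets: List[bytes], batch_size: int,
                    deliver: Callable[[int, bytes], bool],
                    max_rounds: int = 5) -> Dict[int, bytes]:
    """Send packets batch by batch; return what the slave has stored."""
    received: Dict[int, bytes] = {}
    for start in range(0, len(packets), batch_size):
        missing = list(range(start, min(start + batch_size, len(packets))))
        rounds = 0
        while missing:
            if rounds == max_rounds:
                raise RuntimeError("batch not delivered within retry limit")
            rounds += 1
            for seq in missing:
                # deliver() returns False when the packet is lost in transit.
                if deliver(seq, packets[seq]):
                    received[seq] = packets[seq]
            # An empty list models the acknowledgement; otherwise it is the
            # retransmission request naming the packets not yet received.
            missing = [seq for seq in missing if seq not in received]
    return received
```

Waiting for a reply after each batch, rather than after each packet, limits the acknowledgement traffic while still bounding how much data must be resent after a loss.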
The master task in the preferred embodiment is implemented as a computer program executing within one or more of the CPU's 101 of the controller 100 in each node. A C-Programming Language listing of a master task of the preferred embodiment is given in
Appendix A. A block diagram of a master task is illustrated in FIG. 3.
Referring to FIG. 3, when a CPU 101 is first loaded, the master task 300 will enter into a block 301 whose computer code is given in Appendix A pages 1-2. In block 301, the master task 300 first checks a control flag SL_TRIAL_SW
within block 306 to see whether the node is in a "trial" mode. If SL_TRIAL_SW is set, the master task activates a trial handler (SLM_TrialStsHandler) 302 of page 3 in Appendix A whereby a trial use of a "trial" non-volatile storage
device 103 will be performed. This ensures that the new software version still maintains connectivity with a neighbor node.
If the trial process is not run or if the trial process is run successfully, the master task will enter into a message receiver 303 within block 301. The message receiver 303 receives operator commands 304 and messages 305 from slave tasks in
the network. In response to an operator command or a message, the message receiver 303 activates one of a plurality of processes in the message handler block 307. These processes in block 307 include a query handler 308 (SLM_QueryHandler,
Appendix A page 4), a softload handler 309 (SLM_SoftLoadHandler, Appendix A page 5), a loadback handler 310 (SLM_LoadBackHandler, Appendix A page 6), a cutover handler 312 (SLM_CutOverHandler, Appendix A page 8), an update handler 313
(SLM_UpdateHandler, Appendix A pages 9-10), and an acknowledge handler 314 (SLM_AckHandler, Appendix A pages 10-11).
The handlers 308-314 in block 307 are coupled to a plurality of master facilities, collectively illustrated as a block 315, which provide services to the handlers 308-314. An implementation of some of these facilities is given as an
example in the C-Programming Language listing of Appendix C.
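The relationship between the message receiver 303 and the handlers of block 307 may be pictured as a dispatch table, sketched below. The message layout (a dictionary with a "type" key), the type names, and the "ignored" result for unknown messages are all hypothetical; the actual C interfaces are those of the appendices.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def make_receiver(handlers: Dict[str, Handler]) -> Callable[[dict], str]:
    """Build a receiver that routes each message to the matching handler."""
    def receive(message: dict) -> str:
        handler = handlers.get(message.get("type"))
        if handler is None:
            return "ignored"   # assumption: unknown messages are dropped
        return handler(message)
    return receive
```

A receiver built this way would be given one entry per handler, for example "query", "softload", "loadback", "cutover", "update" and "ack".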
The slave task in the preferred embodiment is also implemented as a computer program executing within a CPU 101 in the controller 100 of each node. A C-Programming Language listing of the slave task of the preferred embodiment is given in Appendix
B. A block diagram of a slave task is illustrated in FIG. 4.
Referring to FIG. 4, there is shown a slave task 400 which includes a message receiver 401. The code of the message receiver according to the preferred embodiment is given in pages 1-2 of Appendix B. The message receiver 401 receives messages
from the master tasks and, in response to a message, activates one