|
|
|
| United States Patent | 4805107 |
| Link to this page | http://www.wikipatents.com/4805107.html |
| Inventor(s) | Kieckhafer; Roger M. (Ellicott City, MD);
Finn; Alan M. (Amston, CT);
Walter; Chris J. (Columbia, MD) |
| Abstract | A task scheduler for a fault tolerant multiple node processing system
having a task activity list storing a set of application tasks, a priority
scan list storing a selected portion of the set of application tasks, a
completion status list also storing the same selected portion of the set
of application tasks. A wake-up sequencer transfers the application tasks
from the task activity list to the priority scan list, and a priority
scanner transfers the application tasks ready for execution from the
priority scan list to a selection queue. A next task selector selects the
next application task that its node will execute, and a task started
register stores the identity of the application tasks completed by the
other nodes. A task interactive consistency (TIC) handler updates the
status of the application tasks stored in the task activity list, the
priority scan list, and the completion status list in response to messages
received from the other nodes identifying which nodes completed tasks. The
task interactive consistency handler checks the scheduling process of each
node by comparing the application tasks it reported to have started with
the highest priority application task scheduled for that node in its
selection queue. |
|
|
|
Title Information  |
|
|
|
|
|
Drawing from US Patent 4805107 |
|
|
Task scheduler for a fault tolerant multiple node processing system |
|
|
|
|
|
| Publication Date |
February 14, 1989 |
|
|
|
|
|
| Filing Date |
April 15, 1987 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Title Information  |
|
|
References  |
|
|
| *references marked with an asterisk below are user-added references |
|
U.S. References |
|
|
|
|
|
|
U.S. References |
|
|
Foreign References |
|
|
|
|
|
|
Foreign References |
|
|
Other References |
|
|
|
|
|
|
Other References |
|
|
|
|
|
References  |
|
|
|
|
|
| Market Size |
|
Estimate the gross annual revenues of the relevant market
sector:
|
| | |
| |
|
|
| Market Share |
|
Estimate the percentage of the relevant market sector this invention will capture:
|
| | |
| |
|
|
| Reasonable Royalty |
|
What percentage of gross sales should the inventor or assignee be paid?
|
| | |
| |
|
|
|
Public's "Guesstimation" of Royalty Value
|
| Market Size | N/A | [No votes] | | x | Market Share | N/A | [No votes] | | x | Reasonable Royalty | N/A | [No votes] |
| | N/A | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
Market Review  |
|
|
Technical Review  |
|
|
Claims  |
|
|
What is claimed is:
1. In a mutliple node fault tolerant processing system for processing a set
of application tasks in which each node has an applications processor for
executing a predetermined subset of said set of application tasks and an
operations controller for controlling its own node and scheduling the
application tasks in said predetermined subset of tasks that are to be
executed by the applications processor through an exchange of inter-node
messages containing data and operation information with each node in the
processing system, the operations controller generating at least two
timing period intervals, a fundamental timing period and a master period
which is an integer multiple of the fundamental timing period, the master
period defining a timing interval during which every task in said
predetermined subset of tasks is scheduled for execution by the
applications processor at least once, each operations controller having a
task scheduler comprising:
a task activity list containing an entry for each application task in said
multiple node processing system, each entry containing an execution
periodicity and a node allocation for that application task;
a priority scan list containing a selected portion of said application
tasks in the task activity list which are available for execution, said
selected portion of said application tasks being stored in their preferred
order of execution;
a completion status list storing said selected portion of said application
tasks stored in said priority scan list;
a selection queue storing for each node said application tasks ready for
selection in their preferred order of execution;
a period counter for counting said fundamental timing periods to generate a
period count corresponding to a number of fundamental periods which have
expired since a beginning of a new master period;
wake-up sequencer means connected to said task activity list, said priority
scan list, said completion status list and said period counter for
interrogating said task activity list to transfer to said priority scan
list and said completion status list all of said application tasks whose
periodicity is greater than said period count;
priority scan means connected between said priority scan list and said
selection queue for transferring to said selection queue for each node
entry three application tasks which are ready for execution by that node,
which have a highest priority in said preferred order of execution;
task selector means connected to said selection queue for selecting in said
preferred order of execution, said application task currently stored in
said selection queue for its own node as a next task scheduled for
execution by its own applications processor; and
a task interactive consistency handler connected to said task activity
list, said priority scan list, said completion status list and said
selection queue for updating the status of each task in said task activity
list, said priority scan list, said completion status list and said
selection queue which are identified in inter-node messages reporting the
completion of a task.
2. The task scheduler of claim 1 wherein each application task entry of
said completion status list has a completion count entry storing a 2's
complement of a number of nodes which are scheduled to execute that
application task, said task interactive consistency handler having means
for incrementing said completion count in response to inter-node messages
identifying which node completed that application task and for setting a
terminated flag when said completion count is decremented to zero
indicating that the task has been executed by each of the nodes scheduled
to execute that task.
3. The task scheduler of claim 1 wherein each task entry of said task
activity list and said priority scan list has a predecessor count entry
indicative of a number of tasks which must be terminated before it can be
executed, said task interactive consistency handler having a successor
list storing an identity of all the tasks for which a terminated task is a
predecessor, and means responsive to a termination of a task for accessing
said successor list to identify each task for which said terminated task
is a predecessor, and for decrementing the predecessor count in said task
activity list and said priority scan list for each of said identified
tasks.
4. The task scheduler of claim 3 further having an "old task" table
connected to said task interactive consistency handler, said "old task"
storing for each node the task currently being executed by that node, said
task interactive consistency handler having means for recording as "used"
in said selection queue a task having a highest priority in said preferred
order of execution currently stored for each node which reported it has
started a new task and for recording an identity of said task having said
highest priority in said "old task" table.
5. The task scheduler of claim 1 in which the inter-node messages exchanged
between said multiple nodes includes a task completed/started message and
a task interactive consistency message, said task completed/started
message is sent to each node in said multiple node processing system
whenever a node begins a new task, said task completed/started message
containing at least an identity of the task started and an identity of the
task completed by that node and said task interactive consistency message
is sent at predetermined timing intervals and contains a task completed
vector identifying each node which sent a task completed/started message,
said task completed vector being a voted composite of said task
completed/started messages received from each of said multiple nodes in a
predetermined timing interval, said task scheduler having a started task
register connected to said task interactive consistency handler for
storing said identity of the task reported as started in said task
completed/started messages received from that node and said task
interactive consistency handler responsive to said task completed vector
contained in said task interactive consistency message to compare an
identity of the task having said highest priority stored for each node
identified as having completed a task in said task completed vector with
an identity of the task stored in said task started register and to
generate a sequence error signal said identity of the task having said
highest priority is different from said identity of the task in said task
completed vector.
6. The task scheduler of claim 5 having a Byzantine voter voting on the
task completed vectors received in the task interactive consistency
messages from all the nodes to generate a Byzantine voted task completed
vector.
7. The task scheduler of claim 6 having an "old task" table connected to
said task interactive consistency handler storing for each node the task
currently being executed by that node, said task interactive consistency
handler having means for recording as "used" in said selection queue the
task having said highest priority currently stored for each node
identified in said Byzantine voted task completed vector as having
completed a task and for recording in said "old task" table for each node
identified in said Byzantine voted task completed vector, an identity of
the task just recorded as "used" in said selection queue.
8. The task scheduler of claim 7 wherein each task entry of said completion
status list has a completion count entry storing a 2's complement of a
number of nodes which are scheduled to execute that task, said task
interactive consistency handler accessing said "old task" table to obtain
an identity of a current task for each node identified as having completed
a task by said Byzantine voted task completed vector and using said
identity of said current task to access said completion status list to
increment said completion count entry, said task interactive consistency
handler including means for setting a terminated flag when said completion
count entry is decremented to zero and for generating an internal message
identifying each task whose terminated flag is set.
9. The task scheduler of claim 8 wherein each task of said set of
application tasks has an associated maximum execution time and an
associated minimum execution time and wherein said task scheduler has a
plurality of execution timers connected to said "old task" table and said
interactive consistency handler, one execution timer for each node for
storing a 2's complement of the execution time for a current task, said
task interactive consistency handler being responsive to said Byzantine
voted task completed vector to generate an execution time error for each
node identified as having completed a task whose associated maximum
execution time is greater than said execution time stored in its
associated execution timer, said task interactive consistency handler
further loading said associated execution timer with a 2's complement of
the maximum execution time for a next highest priority unused task in the
selection queue for that node, each of said execution timers being
periodically incremented and operative to generate a timing error signal
when said execution time is incremented to zero.
10. The task scheduler of claim 9 wherein each task entry of said task
activity list and said priority scan list includes a predecessor count
corresponding to a number of predecessor tasks that must be completed
before that task can be executed, and an allocation vector identifying
each node on which that task can be executed, said task activity list
further having a periodicity number corresponding to a number of
fundamental timing periods which must pass before the task can be executed
again, and said priority scan list having an iteration entry corresponding
to a number of times that task has been executed, said wake-up sequencer
comparing said period count generated by said period counter with the
periodicity stored for each task to transfer said predecessor count and
said allocation vector from said task activity list to an appropriate
entry in said priority scan list.
11. The task scheduler of claim 10 wherein said selection queue stores
three entries for each node in the processing system, each entry having a
"used" flag, an iteration count and a task identification code, said
priority scan means scans said priority scan list to transfer to said
selection queue, for each node, said three tasks having said highest
priority whose precedent count is zero and whose allocation vector
indicates the task is to be executed by that node, said tasks being stored
in said selection queue in said order of preferred execution.
12. The task scheduler of claim 11 wherein said selection queue has three
pages, a NEXT page from which said task selector means selects the tasks
scheduled for execution by the applications processor, a PREVIOUS page,
and a CHECK page, each of said three pages containing three entries for
each node, and each entry having a "used" bit which is set to true when
said task selector means selects that task, an iteration number which
specifically identifies this task from prior executions of said task, and
a task identification code which uniquely identifies that task from all
other tasks.
13. The task scheduler of claim 12 wherein said task interactive
consistency handler marks as "used" on said CHECK page, an unused task
having said highest priority for each node identified in said Byzantine
voted task completed vector as having completed a task and records the
task identification code of the entry just marked "used" in the entry for
that same node in said "old task" table, said task interactive consistency
handler further marking as "used" on said PREVIOUS and NEXT pages for each
node said task interactive consistency handler marked "used" on said CHECK
page.
14. The task scheduler of claim 13 wherein said operations controller has a
sub-atomic period timing interval which is a subdivision of said
fundamental timing period, said task interactive consistency handler
transferring each entry of said PREVIOUS page to said CHECK page and
transferring each entry of said NEXT page to said PREVIOUS page at a
beginning of each of said subatomic period timing intervals prior to said
priority scan means transferring a next set of highest priority tasks to
said NEXT page for each node.
15. The task scheduler of claim 13 wherein said operations controller has a
sub-atomic period timing interval which is a subdivision of said
fundamental timing period, and said task interactive consistency handler
has a plurality of pointers, one for each of said NEXT, PREVIOUS and CHECK
pages of said selection queue, said task interactive consistency handler
rotating said plurality of pointers such that the NEXT page becomes the
PREVIOUS page, the PREVIOUS page becomes the CHECK page, and the CHECK
page becomes the NEXT page at a beginning of each sub-atomic period prior
to said priority scan means transferring a next set of highest priority
tasks to said NEXT page for each node.
16. The task scheduler of claim 13 wherein said task interactive
consistency message further includes a branch condition bit for each node
which completed a task, said branch bit being a 1-bit identifying a first
of two possible successors for the completed task and a 0-bit identifying
a second of said two possible successors, said Byzantine voter also
performs a Byzantine vote on said branch condition bits to generate a
Byzantine voted branch byte containing voted values of said branch
condition bits for each node, and wherein each entry of said completion
status list has a completion count entry storing a number of nodes which
completed the task, a branch count entry storing a number of nodes whose
branch condition bit for that task is a 1-bit and an allocation vector,
said task interactive consistency handler obtaining for each node
identified in said Byzantine voted task completed vector as having
completed a task, an identity of a completed task from said "old task"
table, and updating the status of said completed tasks in said completion
status list by incrementing the completion count once for each node that
completed that task, by adding said Byzantine voted branch bit to said
branch count and by setting to "false" an allocation bit in said
allocation vector.
17. The task scheduler of claim 16 wherein said successor list has for each
completed task a first successor task list containing the successor tasks
to be executed when said branch condition bit is a 1-bit and a second
successor task list containing the successor tasks to be executed when
said branch condition bit is a 0-bit, said task interactive consistency
handler comparing said branch count with one-half the value of said
completion count, in response to all the allocation flags being set to
"false," to determine which branch condition bit is agreed on by a
majority of nodes which completed the task, accessing said successor task
list for that task and using the branch condition bit of said majority to
get an identification code for a successor task and using said task
identification code to access said successor task in said task activity
list and said priority scan list to increment the predecessor count for
that task.
18. The task scheduler of claim 1 wherein said operations contoller has
means for excluding from and readmitting nodes to active participation in
said multiple node processing system and said operations controller
further has means for periodically generating a system state vector
identifying each node currently excluded from participating in said
multiple node processing system, said task scheduler further having a
reconfiguration module comprising:
a system state comparator for comparing each received system state vector
with an immediately preceding system state vector to generate a delta
vector identifying each newly excluded or readmitted node;
a relevance table storing a relevance vector for each task identifying
which nodes may execute the task;
a swap table storing a predecessor count, said periodicity and a swap count
for each task, said swap count corresponding to a number of nodes relative
to that task that may be excluded or readmitted before its swap status is
to be complemented;
a task swapper connected to said swap table and said task activity list for
identifying each task to which the node identified in the delta vector was
relevant to test said swap count and to complement its swap status if said
swap count is zero, said swap status being indicative of whether the task
will be included in said task activity list or removed from said task
activity list in the system state identified by the system state vector,
said task swapper further decrementing said swap count in response to the
node identified in the delta vector being excluded and incrementing said
swap count in response to the node identified in said delta vector being
readmitted;
an allocation table storing an allocation count for each node-task
combination indicative of the current node allocation status of each task;
a preferred table storing a preferred vector for each task listing the
nodes preferred for the execution of that task;
task reallocator means connected to said allocation table, said preferred
table and said task activity list for determining for each node, from said
preferred table, if the node identified in the delta vector was more
preferred for the execution of that task than itself, for complementing
the allocation status for that node when the node is more preferred than
the node identified in the delta vector and said allocation count is zero,
for decrementing said allocation count when said node identified in said
delta vector is more preferred and is an excluded node and for
incrementing said allocation count when said node identified in said delta
vector is more preferred and is a readmitted node; and
means for recording in said task activity list said predecessor count and
the periodicity from the swap table and the allocation status of each task
from said allocation table whose swap status indicates that the task
should be active in the next system state of the system.
19. The task scheduler of claim 18 wherein said set of applications tasks
includes a first set of tasks to be executed only by said non-excluded
nodes and a second set of tasks to be executed only by the excluded nodes
and said relevance table includes an excluded flag bit which identifies
each task in said second set of tasks, said reconfiguration module further
includes a task status matcher which will compare for each task, the
status of each node identified in said next system state vector with said
excluded flag bit to generate a match status bit enabling said means for
recording to record that task in said task activity list. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
CROSS REFERENCE
This invention is related to commonly assigned, copending patent
application Ser. Nos. 038,813 and 038,818 filed on Apr. 15, 1987,
concurrently herewith.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention is related to the field of multiple node processing systems
and in particular to a task scheduler for scheduling the tasks to be
executed by the individual nodes in the multiple node processing system.
2. Description of the Prior Art
The earliest attempts to produce fault tolerant computer systems provided
redundant computers in which each computer simultaneously executed every
task required for the control operation. Voting circuits monitoring the
outputs of the multiple computers determined a majority output which was
assumed to be the correct output for the system. In this type of system, a
faulty computer may or may not be detected and the faulty computer may or
may not be turned off.
The redundant computer concept, although highly successful, is expensive
because it requires multiple computers of equivalent capabilities. These
systems require powerful computers because each computer has to perform
every task required for the operation of the system. As an alternative,
the master-slave concept was introduced in which the operation of several
computers were controlled and coordinated by a master control. The master
control designated which tasks were to be executed by the individual
computers. This reduced the execution time of the control operation
because all the computers were no longer required to execute every task,
and many of the tasks could be executed in parallel. In this type of
system when a computer is detected as faulty, the master could remove it
from active participation in the system by assigning the task that would
normally have been assigned to the faulty computer to the other computers.
The problem encountered in the master-slave concept is that the system is
totally dependent upon the health of the master and if the master fails
then the system fails. This defect may be rectified by using redundant
master controls, however, the increased cost of redundant masters limits
the applicability of these systems to situations where the user is willing
to pay for the added reliability. Typical of such situations are the
controls of nuclear power plants, space exploration and other situations
where failure of the control system would endanger lives.
Recent improvements to the master-slave and redundant execution fault
tolerant computer systems are discussed above exemplified in the October
1978 proceedings of the IEEE, Volume 66, No. 10, which is dedicated to
fault tolerant computer systems. Of particular interest are the papers
entitled "Pluribus: An Operational Fault Tolerant Microprocessor" by D.
Katuski et al., Pages 1146-1159 and "SIFT: The Design and Analysis of a
Fault Tolerant Computer for Aircraft Control" by J. H. Wensley et al.,
Pages 1240-1255. The SIFT system uses redundant execution of each system
task and of the master control functions. The Pluribus system has a master
copy of the most current information which can be lost if certain types of
faults occur.
More recently a new fault tolerant multiple computer architecture has been
disclosed by Whiteside et al, in U.S. Pat. No. 4,256,547, in which each of
the individual task execution nodes has an applications processor and an
operations controller which functions as a master for its own node. These
operations controllers, in coordination with each other through the
exchange of data and other information by means of inter-node messages,
select the task its own node's applications processor will execute. The
task selection by the individual operations controllers is made on a
distributed basis such that the execution of each task required for the
operation of the control system may be selected by more than one of the
operations controller in a fault tolerant manner. In this system each node
is assigned a subset of the tasks it is capable of selecting and executing
and no node is required to execute every task. The operations controllers
are individually capable of detecting faulty nodes and excluding them from
participation in the system. A predecessor of the multiple computer system
is described by C. J. Walter et al in their paper "MAFT: A Multicomputer
Architecture for Fault-Tolerance in Real-Time Control Systems" published
in the proceedings of the Real Time System Symposium, San Diego, Dec. 3-6,
1985.
The present invention is a task scheduler for an operations controller in a
fault tolerant multiple node processing system. This task scheduler is
comparable to the scheduler taught by Whiteside et al in U.S. Pat. No.
4,323,966 and is individually set forth in Freedman et al in U.S. Pat. No.
4,318,173.
SUMMARY OF THE INVENTION
The invention is a task scheduler for the operations controller of a
multiple node fault tolerant processing system capable of processing a set
of application tasks. Each node in the fault tolerant processing system
has an applications processor for executing a predetermined subset of the
set of application tasks and an operations controller for controlling the
operation of the node and scheduling the order in which the individual
tasks in the predetermined set of tasks are to be executed by the
applications processor through the exchange of inter-node messages with
the other nodes in the system. These messages contain data and operating
information necessary for the operations of the individual nodes and for
the execution of the predetermined subset of tasks. The operations
controllers further generate at least two timing period intervals, a
fundamental timing period and a master period which is an integer multiple
of the fundamental period. The master period defines the timing interval
during which every task in the predetermined subset of tasks is scheduled
for execution by the applications processor at least once.
The task scheduler has a task activities list containing an entry for each
active task in the multiple node processing system, each entry containing
an execution periodicity and a node allocation, a priority scan list
containing a list of tasks in their preferred order of execution, a
completion status list containing an entry for each task stored in the
priority scan list, and a selection queue storing for each node the task
ready for selection in their preferred order of execution. The task
scheduler further includes a period counter for counting the fundamental
timing periods to generate the period count corresponding to the number of
fundamental timing periods that have expired since the beginning of a new
master period, a wake-up sensor means for interrogating the task activity
list to transfer to the priority scan list and the completion status list
all of the tasks whose periodicity is greater than the period count, and
priority scan means for transferring to the selection queue, for each node
entry, the three highest priority tasks which are ready for execution by
that node. The task scheduler also includes a task selector means for
selecting for its own node the highest priority task currently stored in
the selection queue as the next task scheduled for execution by its own
applications processor, a task interactive consistency handler for
updating the status of each task in the task activity list, the priority
scan list, the task completed list and the selection queue which are
identified in inter-node messages reporting the completion of a task.
The object of the invention is a priority based, data driven task scheduler
for selecting the task to be executed by its own applications processor.
Another object of the invention is a task scheduler which concurrently
tracks the scheduling process in every other node in the system.
Another object of the invention is a scheduler which can detect when any of
the other nodes in the system makes a scheduling error.
Another object of the invention is a scheduler capable of reallocating the
task when the number of active nodes in the system changes.
Another object of the invention is a scheduler in which the reallocation of
tasks may include the deletion of some tasks and the addition of the other
tasks in response to certain changes in the number of active nodes in the
system.
These and other objects of the invention will become more apparent from a
reading of the specification in conjunction with the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of the multi-computer architecture;
FIG. 2 is a block diagram of the Operations Controller;
FIG. 3 is the master/atomic period timing diagram;
FIG. 4 is a the atomic/subatomic period timing diagram;
FIG. 5 is a block diagram of the Transmitter;
FIG. 6 is a circuit diagram of one of the interfaces;
FIG. 7 is a block diagram of the Arbitrator;
FIG. 8 shows waveforms for the Self-Test Arbitration Logic;
FIG. 9 is a block diagram of the Longitudinal Redundancy Code Generator;
FIG. 10 is a block diagram of a Receiver;
FIG. 11 is a block diagram of the Message Checker;
FIG. 12 is a block diagram of the decision logic for the Between Limits
Checker;
FIG. 13 is the format for the error status byte generated by the message
Checker;
FIG. 14 is a block diagram of the Fault Tolerator;
FIG. 15 shows the partitioning of the Fault Tolerator RAM;
FIG. 16 shows the format of the Message partition of the Fault Tolerator
RAM;
FIG. 17 shows the format of the Error Code Files partition of the Fault
Tolerator RAM;
FIG. 18 shows the format of the Group Mapping partition of the Fault
Tolerator RAM;
FIG. 19 shows the format of the Error Code Files partition of the Fault
Tolerator RAM;
FIG. 20 shwos the format of the Penalty Weight partition of the Fault
Tolerator RAM;
FIG. 21 is a block diagram of the Fault Tolerator's Message Checker
Interface;
FIG. 22 is a block diagram of the Fault Tolerator's Error Handler;
FIG. 23 is a block diagram of the Error Handler's Error Consistency
Checker;
FIG. 24 is a block diagram of the Error Handler's Validity Checker;
FIG. 25 illustrates the format of the error byte in an error message;
FIG. 26 is a timing diagram of the reconfiguration sequence;
FIG. 27 is a block diagram of the Voter Subsystem;
FIG. 28 is a flow diagram for the Upper and Lower Medial Value Sorters;
FIG. 29 is a circuit diagram of the Lower Medial Value Sorter;
FIG. 30 is a flow diagram for the Averaging Circuit;
FIG. 31 is a circuit diagram of the Averaging Circuit;
FIG. 32 is a flow diagram of the Deviance Checker;
FIG. 33 is a circuit diagram of a Deviance Checker;
FIG. 34 is a block diagram of the Scheduler;
FIG. 35 shows the data format of the Scheduler RAM;
FIG. 36 shows the data format of the Scheduler ROM;
FIG. 37 is a block diagram of the Scheduler's Task Selector Module;
FIG. 38 is a flow diagram of the Wake-Up Sequencer's operation;
FIG. 39 is a flow diagram of the Execution Timer's operation;
FIG. 40 is a flow diagram of the TIC Handler's operation;
FIG. 41 is a flow diagram of the TIC Handler's Selection Queue Update
sub-process;
FIG. 42 is a flow diagram of the TIC Handler's Completion/Termination
sub-process;
FIG. 43 is a flow diagram of the TIC Handler's Execution Timer Reset
sub-process;
FIG. 44 is a flow diagram of the TIC Handler's Priority Scan List Update
sub-process;
FIG. 45 is a flow diagram of the Priority Scanner's operation;
FIG. 46 is a flow diagram of the Next Task Selector's operation;
FIG. 47 is a block diagram of the Reconfigure Module;
FIG. 48 is a flow diagram for the Task Swapper's operation in response to a
Node being excluded from the operating set;
FIG. 49 is a flow diagram of the Task Swapper's operation in response to a
Node being readmitted to the operating set;
FIG. 50 is a flow diagram of the Task Reallocator's operation in response
to a Node being excluded from the operating set;
FIG. 51 is a flow diagram of the Task Status Matcher's operation;
FIG. 52 is a block diagram of the Task Communicator;
FIG. 53 is a partial block diagram of the Task Communicator showing the
elements associated with the operation of the Store Data Control;
FIG. 54 is a flow diagram of the Store Data Control's operation;
FIG. 55 is a partial block diagram of the Task Communicator showing the
elements associated with the operation of the DID Request Handler;
FIG. 56 is a flow diagram of the DID Request Handler's operation;
FIG. 57 is a partial block diagram of the Task Communicator showing the
elements associated with the operation of the Task Terminated Recorder;
FIG. 58 is a flow diagram of the Task Terminated Recorder's operation;
FIG. 59 is a partial block diagram of the Task Communicator showing the
elements associated with the operation of the Task Started Recorder;
FIG. 60 is a flow diagram of the Task Started Recorder's operation;
FIG. 61 is a partial block diagram of the Task Communicator showing the
elements associated with the operation of the AP Input Handler;
FIG. 62 is a flow diagram of the AP Input Handler's operation;
FIG. 63 is a partial block diagram of the Task Communicator showing the
elements associated with the operation of the AP Output Handler;
FIG. 64 is a flow diagram showing the AP Output Handler's operation;
FIG. 65 shows the format of the DID information as stored in the DID List;
FIG. 66 shows the format of the DID information with the NUDAT bit
appended;
FIG. 67 is a partial block diagram of the Task Communicator showing the
subsystems involved in "reconfiguration";
FIG. 68 is a flow diagram showing the operation of the Reconfigure Control
during reconfiguration;
FIG. 69 is a partial block diagram of the Task Communicator showing the
subsystems involved in "reset";
FIG. 70 is a flow diagram of the Reset Control during reset;
FIG. 71 is a block diagram of the Synchronizer;
FIG. 72 shows the format of the Synchronizer Memory;
FIG. 73 shows the format of the Message Memory;
FIG. 74 shows the format of the Time Stamp Memory;
FIG. 75 shows the format of the Scratch Pad Memory;
FIG. 76 shows the waveforms of the signals generated by the Timing Signal
Generator;
FIG. 77 is a block diagram of the Synchronizer Control;
FIG. 78 is a flow diagram showing the operation of the Data Handler and
Expected Message Checker;
FIG. 79 is a flow diagram showing the operation of the Within Hard Error
Window and Soft Error Window Checker and the Time Stamper;
FIG. 80 is a flow diagram for the operation of the "HEW to warning count";
FIG. 81 is a partial block diagram of the Synchronizer showing the elements
associated with the operation of the Message Generator;
FIG. 82 is a flow diagram of the operation of the Message Generator and the
Transmitter Interface;
FIG. 83 shows the waveforms of the timing signals for generating a TIC
message;
FIG. 84 shows the waveforms of the timing signals for generating a sync
System State message;
FIG. 85 shows the format of the "cold start" pre-sync message;
FIG. 86 is a flow diagram showing the operation of the Synchronizer during
a "cold start";
FIGS. 87 and 87a are flow diagrams showing the generation of the HEW to
warning signal during "cold start";
FIG. 88 is a flow diagram showing the storing of data during a "cold
start";
FIG. 89 is a flow diagram showing the operation of the Operating Condition
Detector during a "cold start";
FIG. 90 is a timing diagram used in the description of the "cold start";
FIG. 91 is a flow diagram of the operation of the Synchronizer during a
"warm start";
FIG. 92 is a timing diagram used in the description of a "warm start";
FIG. 93 is a flow diagram of the operation of the Byzantine Voter to
generate Byzantine voted task completed vector and Byzantine voted branch
condition bits for the Scheduler;
FIG. 94 is a perspective of the Byzantine Voter's three-dimensional memory;
FIG. 95 shows the two-dimensional format of ISW vectors resulting from the
first Byzantine vote on the three-dimensional ISW matrices; and
FIG. 96 is a functional circuit diagram of the Byzantine Voter.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The multi-computer architecture for fault tolerance is a distributed
multi-computer system based on the functional and physical partitioning of
the application tasks and the overhead functions, such as fault tolerance
and systems operations. As shown in FIG. 1, the multi-computer
architecture consists of a plurality of Nodes 10A through 10N, each having
an Operations Controller 12 for performing the overhead functions and an
Applications Processor 14 for executing the application tasks.
For each application, the multi-computer architecture is required to
execute a predetermined set of tasks, collectively called application
tasks. Each Node is allocated an active task set which is a subset of the
application tasks. Each Node in coordination with all of the other Nodes
is capable of selecting tasks from its active task set and executing them
in a proper sequence. The active task set for each Node may be different
from the active task set allocated to the other Nodes and each task in the
application tasks may be included in the active tasks set of two or more
Nodes depending upon how many Nodes are in the system and the importance
of the task to the particular application. In this way, the multi-computer
architecture defines a distributed multi-computer system in which no one
Node 10 is required to execute every one of the application tasks, yet the
failure of one or more Nodes need not prevent the execution of any
application task. As shall be more fully explained later on, the active
task set in each Node is static for any given system configuration or
system state and will change as the system state changes with an increase
or decrease in the number of active Nodes. This change in the active task
set called "reconfiguration," takes place automatically and assures that
every one of the important or critical application tasks will be included
in the active task set of at least one of the remaining active Nodes in
the system.
Each Node 10A through 10N is connected to every other Node in the
multi-computer architecture through its Operational Controller 12 by means
of a private communication link 16. For example, the Operations Controller
"A" is the only Operations Controller capable of transmitting on
communication link 16a. All of the other Nodes are connected to the
communication link 16a and will receive every message transmitted by the
Operations Controller "A" over communication link 16a. In a like manner,
the Operations Controller "B" of Node 10B is the only Operations
Controller capable of transmitting messages on communication link 16b, and
Operations Controller N of the Node 10N is the only Operations Controller
capable of transmitting messages on communication link 16n.
External information from sensors and manually operated devices
collectively identified as Input Devices 20 are transmitted directly to
the Applications Processors 14 of each Node through an input line 18. It
is not necessary that every Applications Processor receive information
from every sensor and/or Input Device however, each Applications Processor
14 will receive the information from every sensor and/or Input Device
which it needs in the execution of the applications task.
In a like manner, the Applications Processor 14 in each Node will transmit
data and control signals, resulting from the execution of the applications
task to one or more actuators and/or display devices collectively
identified as Output Devices 22. The data and/or control signals generated
by the Applications Processor 14 in the individual Nodes 10A through 10N
may be combined by a Combiner/Voter Network 24 before it is transmitted to
the Output Devices 22. Further, when multiple values of the same data
and/or control signals are generated by two or more of the Nodes, the
Combiner/Voter Network 24 may also be used to generate a single voted
value which is transmitted to the Output Devices 22. The use or omission
of a Combiner/Voter Network 24 is optional. It is not necessary that every
actuator or display receive the output generated by every Node in the
system. The specific actuator or display only needs to be connected to the
Node or Nodes whose Applications Processor 14 is capable of generating the
data or command signals it requires.
The network of Operations Controllers 12 is the heart of the system and is
responsible for the inter-node communications, system synchronization,
data voting, error detection, error handling, task scheduling, and
reconfiguration. The Applications Processors 14 are responsible for the
execution of the application tasks and for communications with the Input
Devices 20 and Output Devices 22. In the multi-computer architecture, the
overhead functions performed by the Operations Controllers 12 are
transparent to the operations of the Applications Processor 14. Therefore,
the structure of the Applications Processor 14 may be based solely upon
the application requirements. Because of this, dissimilar Applications
Processor 14 may be used in different Nodes without destroying the
symmetry of the multi-computer architecture.
The structural details of the Operations Controller 12 in each Node 10A
through 10N are shown in FIG. 2. Each Operations Controller 12 has a
transmitter 30 for serially transmitting messages on the Node's private
communication link 16. For discussion purposes, it will be assumed that
the Operations Controller illustrated in FIG. 2 is the Operations
Controller A as shown in FIG. 1. In this case, the Transmitter 30 will
tranmit messages on the private communication link 16a. Each Operations
Controller also has a plurality of Receivers 32a through 32n, each of
which is connected to a different private communication link. In the
preferred embodiment, the number of Receivers 32a through 32n is equal to
the number of Nodes in the multi-computer architecture. In this way, each
Operations Controller 12 will receive all of the messages transmitted by
every Node in the system including its own. Each Receiver 32a through 32n
will convert each message received over the private communiction link to
which it is connected from a serial format to a parallel format then
forward it to a Message Checker 34. Each Receiver 32a through 32n will
also check the vertical parity and the longitudinal redundancy codes
appended to each of the received messages and will generate an error
signal identifying any errors detected.
The Message Checker 34 monitors the Receivers 32a through 32n and subjects
each received message to a variety of physical and logical checks. After
completion of these physical and logical checks, the messages are sent to
a Fault Tolerator 36. Upon the detection of any errors in any message, the
Message Checker 34 will generate an error status byte which is also
transmitted to the Fault Tolerator 36.
The Fault Tolerator 36 performs five basic functions. First, the Fault
Tolerator performs further logical checks on the messages received from
the Message Checker 34 to detect certain other errors that were not
capable of being detected by the Message Checker 34. Second, the Fault
Tolerator passes error free messages to a Voter 38 which votes on the
content of all messages containing the same information to generate a
voted value. Third, it passes selected fields from the error free messages
to other subsystems as required. Fourth, the Fault Tolerator aggregates
the internal error reports from the various error detection mechanisms in
the Operations Controller and generates Error messages which are
transmitted to all of the other Nodes in the system by the transmitter 30.
Finally, the Fault Tolerator 36 monitors the health status of each Node in
the system and will initiate a local reconfiguration when a Node is added
or excluded from the current number of operating Nodes. The Fault
Tolerator 36 maintains a base penalty count table which stores the current
base penalty counts accumulated for each Node in the system. Each time a
Node transmits a message containing an error, every Node in the system,
including the one that generated the message, should detect this error and
generate an Error message identifying the Node that sent the message
containing the error, the type of error detected, and a penalty count for
the detected error or erros. Each Fault Tolerator 36 will receive these
Error messages from every other Node and will increment the base penalty
count for that Node which is currently being stored in the base penalty
count table, if the detection of the error is supported by Error messages
received from a majority of the Nodes. The magnitude of the penalty count
increment is predetermined and is proportional to the severity of the
error. If the incremented base penalty count exceeds an exclusion
threshold, as shall be discussed later, the Fault Tolerator initiates a
Node exclusion and a reconfiguration process in which the faulty Node is
excluded from active participation in the system and the active task sets
for the remaining Nodes are changed to accommodate for the reduction in
the number of active Nodes.
The Fault Tolerator 36 will also periodically decrement the base penalty
count for each Node in the system so that a Node which was previously
excluded may be readmitted into the active system. When a previously
excluded Node continues to operate in an error free manner for a
sufficient period of time, its base penalty count will be decremented
below a readmittance threshold which will initiate a Node | | |