WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Task scheduler for a fault tolerant multiple node processing system    
United States Patent4805107   
Link to this pagehttp://www.wikipatents.com/4805107.html
Inventor(s)Kieckhafer; Roger M. (Ellicott City, MD); Finn; Alan M. (Amston, CT); Walter; Chris J. (Columbia, MD)
AbstractA task scheduler for a fault tolerant multiple node processing system having a task activity list storing a set of application tasks, a priority scan list storing a selected portion of the set of application tasks, a completion status list also storing the same selected portion of the set of application tasks. A wake-up sequencer transfers the application tasks from the task activity list to the priority scan list, and a priority scanner transfers the application tasks ready for execution from the priority scan list to a selection queue. A next task selector selects the next application task that its node will execute, and a task started register stores the identity of the application tasks completed by the other nodes. A task interactive consistency (TIC) handler updates the status of the application tasks stored in the task activity list, the priority scan list, and the completion status list in response to messages received from the other nodes identifying which nodes completed tasks. The task interactive consistency handler checks the scheduling process of each node by comparing the application tasks it reported to have started with the highest priority application task scheduled for that node in its selection queue.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 4805107
Task scheduler for a fault tolerant multiple node processing system - US Patent 4805107 Drawing
Task scheduler for a fault tolerant multiple node processing system
Inventor     Kieckhafer; Roger M. (Ellicott City, MD); Finn; Alan M. (Amston, CT); Walter; Chris J. (Columbia, MD)
Owner/Assignee     Allied-Signal Inc. (Morris Township, Morris County, NJ)
Patent assignment
All assignments
Publication Date     February 14, 1989
Application Number     07/039,190
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     April 15, 1987
US Classification     714/4 714/15 718/103 719/310
Int'l Classification     G06F 015/16
Examiner     Zache; Raulfe B.
Assistant Examiner    
Attorney/Law Firm     Massung; Howard G.
Address
Parent Case    
Priority Data    
USPTO Field of Search     364/200 364/900
Patent Tags     task scheduler fault tolerant multiple node processing
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
4642756
Sherrod
718/103
Feb,1987

[0 after 0 votes]
4318173
Freedman
718/103
Mar,1982

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. In a mutliple node fault tolerant processing system for processing a set of application tasks in which each node has an applications processor for executing a predetermined subset of said set of application tasks and an operations controller for controlling its own node and scheduling the application tasks in said predetermined subset of tasks that are to be executed by the applications processor through an exchange of inter-node messages containing data and operation information with each node in the processing system, the operations controller generating at least two timing period intervals, a fundamental timing period and a master period which is an integer multiple of the fundamental timing period, the master period defining a timing interval during which every task in said predetermined subset of tasks is scheduled for execution by the applications processor at least once, each operations controller having a task scheduler comprising:

a task activity list containing an entry for each application task in said multiple node processing system, each entry containing an execution periodicity and a node allocation for that application task;

a priority scan list containing a selected portion of said application tasks in the task activity list which are available for execution, said selected portion of said application tasks being stored in their preferred order of execution;

a completion status list storing said selected portion of said application tasks stored in said priority scan list;

a selection queue storing for each node said application tasks ready for selection in their preferred order of execution;

a period counter for counting said fundamental timing periods to generate a period count corresponding to a number of fundamental periods which have expired since a beginning of a new master period;

wake-up sequencer means connected to said task activity list, said priority scan list, said completion status list and said period counter for interrogating said task activity list to transfer to said priority scan list and said completion status list all of said application tasks whose periodicity is greater than said period count;

priority scan means connected between said priority scan list and said selection queue for transferring to said selection queue for each node entry three application tasks which are ready for execution by that node, which have a highest priority in said preferred order of execution;

task selector means connected to said selection queue for selecting in said preferred order of execution, said application task currently stored in said selection queue for its own node as a next task scheduled for execution by its own applications processor; and

a task interactive consistency handler connected to said task activity list, said priority scan list, said completion status list and said selection queue for updating the status of each task in said task activity list, said priority scan list, said completion status list and said selection queue which are identified in inter-node messages reporting the completion of a task.

2. The task scheduler of claim 1 wherein each application task entry of said completion status list has a completion count entry storing a 2's complement of a number of nodes which are scheduled to execute that application task, said task interactive consistency handler having means for incrementing said completion count in response to inter-node messages identifying which node completed that application task and for setting a terminated flag when said completion count is decremented to zero indicating that the task has been executed by each of the nodes scheduled to execute that task.

3. The task scheduler of claim 1 wherein each task entry of said task activity list and said priority scan list has a predecessor count entry indicative of a number of tasks which must be terminated before it can be executed, said task interactive consistency handler having a successor list storing an identity of all the tasks for which a terminated task is a predecessor, and means responsive to a termination of a task for accessing said successor list to identify each task for which said terminated task is a predecessor, and for decrementing the predecessor count in said task activity list and said priority scan list for each of said identified tasks.

4. The task scheduler of claim 3 further having an "old task" table connected to said task interactive consistency handler, said "old task" storing for each node the task currently being executed by that node, said task interactive consistency handler having means for recording as "used" in said selection queue a task having a highest priority in said preferred order of execution currently stored for each node which reported it has started a new task and for recording an identity of said task having said highest priority in said "old task" table.

5. The task scheduler of claim 1 in which the inter-node messages exchanged between said multiple nodes includes a task completed/started message and a task interactive consistency message, said task completed/started message is sent to each node in said multiple node processing system whenever a node begins a new task, said task completed/started message containing at least an identity of the task started and an identity of the task completed by that node and said task interactive consistency message is sent at predetermined timing intervals and contains a task completed vector identifying each node which sent a task completed/started message, said task completed vector being a voted composite of said task completed/started messages received from each of said multiple nodes in a predetermined timing interval, said task scheduler having a started task register connected to said task interactive consistency handler for storing said identity of the task reported as started in said task completed/started messages received from that node and said task interactive consistency handler responsive to said task completed vector contained in said task interactive consistency message to compare an identity of the task having said highest priority stored for each node identified as having completed a task in said task completed vector with an identity of the task stored in said task started register and to generate a sequence error signal said identity of the task having said highest priority is different from said identity of the task in said task completed vector.

6. The task scheduler of claim 5 having a Byzantine voter voting on the task completed vectors received in the task interactive consistency messages from all the nodes to generate a Byzantine voted task completed vector.

7. The task scheduler of claim 6 having an "old task" table connected to said task interactive consistency handler storing for each node the task currently being executed by that node, said task interactive consistency handler having means for recording as "used" in said selection queue the task having said highest priority currently stored for each node identified in said Byzantine voted task completed vector as having completed a task and for recording in said "old task" table for each node identified in said Byzantine voted task completed vector, an identity of the task just recorded as "used" in said selection queue.

8. The task scheduler of claim 7 wherein each task entry of said completion status list has a completion count entry storing a 2's complement of a number of nodes which are scheduled to execute that task, said task interactive consistency handler accessing said "old task" table to obtain an identity of a current task for each node identified as having completed a task by said Byzantine voted task completed vector and using said identity of said current task to access said completion status list to increment said completion count entry, said task interactive consistency handler including means for setting a terminated flag when said completion count entry is decremented to zero and for generating an internal message identifying each task whose terminated flag is set.

9. The task scheduler of claim 8 wherein each task of said set of application tasks has an associated maximum execution time and an associated minimum execution time and wherein said task scheduler has a plurality of execution timers connected to said "old task" table and said interactive consistency handler, one execution timer for each node for storing a 2's complement of the execution time for a current task, said task interactive consistency handler being responsive to said Byzantine voted task completed vector to generate an execution time error for each node identified as having completed a task whose associated maximum execution time is greater than said execution time stored in its associated execution timer, said task interactive consistency handler further loading said associated execution timer with a 2's complement of the maximum execution time for a next highest priority unused task in the selection queue for that node, each of said execution timers being periodically incremented and operative to generate a timing error signal when said execution time is incremented to zero.

10. The task scheduler of claim 9 wherein each task entry of said task activity list and said priority scan list includes a predecessor count corresponding to a number of predecessor tasks that must be completed before that task can be executed, and an allocation vector identifying each node on which that task can be executed, said task activity list further having a periodicity number corresponding to a number of fundamental timing periods which must pass before the task can be executed again, and said priority scan list having an iteration entry corresponding to a number of times that task has been executed, said wake-up sequencer comparing said period count generated by said period counter with the periodicity stored for each task to transfer said predecessor count and said allocation vector from said task activity list to an appropriate entry in said priority scan list.

11. The task scheduler of claim 10 wherein said selection queue stores three entries for each node in the processing system, each entry having a "used" flag, an iteration count and a task identification code, said priority scan means scans said priority scan list to transfer to said selection queue, for each node, said three tasks having said highest priority whose precedent count is zero and whose allocation vector indicates the task is to be executed by that node, said tasks being stored in said selection queue in said order of preferred execution.

12. The task scheduler of claim 11 wherein said selection queue has three pages, a NEXT page from which said task selector means selects the tasks scheduled for execution by the applications processor, a PREVIOUS page, and a CHECK page, each of said three pages containing three entries for each node, and each entry having a "used" bit which is set to true when said task selector means selects that task, an iteration number which specifically identifies this task from prior executions of said task, and a task identification code which uniquely identifies that task from all other tasks.

13. The task scheduler of claim 12 wherein said task interactive consistency handler marks as "used" on said CHECK page, an unused task having said highest priority for each node identified in said Byzantine voted task completed vector as having completed a task and records the task identification code of the entry just marked "used" in the entry for that same node in said "old task" table, said task interactive consistency handler further marking as "used" on said PREVIOUS and NEXT pages for each node said task interactive consistency handler marked "used" on said CHECK page.

14. The task scheduler of claim 13 wherein said operations controller has a sub-atomic period timing interval which is a subdivision of said fundamental timing period, said task interactive consistency handler transferring each entry of said PREVIOUS page to said CHECK page and transferring each entry of said NEXT page to said PREVIOUS page at a beginning of each of said subatomic period timing intervals prior to said priority scan means transferring a next set of highest priority tasks to said NEXT page for each node.

15. The task scheduler of claim 13 wherein said operations controller has a sub-atomic period timing interval which is a subdivision of said fundamental timing period, and said task interactive consistency handler has a plurality of pointers, one for each of said NEXT, PREVIOUS and CHECK pages of said selection queue, said task interactive consistency handler rotating said plurality of pointers such that the NEXT page becomes the PREVIOUS page, the PREVIOUS page becomes the CHECK page, and the CHECK page becomes the NEXT page at a beginning of each sub-atomic period prior to said priority scan means transferring a next set of highest priority tasks to said NEXT page for each node.

16. The task scheduler of claim 13 wherein said task interactive consistency message further includes a branch condition bit for each node which completed a task, said branch bit being a 1-bit identifying a first of two possible successors for the completed task and a 0-bit identifying a second of said two possible successors, said Byzantine voter also performs a Byzantine vote on said branch condition bits to generate a Byzantine voted branch byte containing voted values of said branch condition bits for each node, and wherein each entry of said completion status list has a completion count entry storing a number of nodes which completed the task, a branch count entry storing a number of nodes whose branch condition bit for that task is a 1-bit and an allocation vector, said task interactive consistency handler obtaining for each node identified in said Byzantine voted task completed vector as having completed a task, an identity of a completed task from said "old task" table, and updating the status of said completed tasks in said completion status list by incrementing the completion count once for each node that completed that task, by adding said Byzantine voted branch bit to said branch count and by setting to "false" an allocation bit in said allocation vector.

17. The task scheduler of claim 16 wherein said successor list has for each completed task a first successor task list containing the successor tasks to be executed when said branch condition bit is a 1-bit and a second successor task list containing the successor tasks to be executed when said branch condition bit is a 0-bit, said task interactive consistency handler comparing said branch count with one-half the value of said completion count, in response to all the allocation flags being set to "false," to determine which branch condition bit is agreed on by a majority of nodes which completed the task, accessing said successor task list for that task and using the branch condition bit of said majority to get an identification code for a successor task and using said task identification code to access said successor task in said task activity list and said priority scan list to increment the predecessor count for that task.

18. The task scheduler of claim 1 wherein said operations contoller has means for excluding from and readmitting nodes to active participation in said multiple node processing system and said operations controller further has means for periodically generating a system state vector identifying each node currently excluded from participating in said multiple node processing system, said task scheduler further having a reconfiguration module comprising:

a system state comparator for comparing each received system state vector with an immediately preceding system state vector to generate a delta vector identifying each newly excluded or readmitted node;

a relevance table storing a relevance vector for each task identifying which nodes may execute the task;

a swap table storing a predecessor count, said periodicity and a swap count for each task, said swap count corresponding to a number of nodes relative to that task that may be excluded or readmitted before its swap status is to be complemented;

a task swapper connected to said swap table and said task activity list for identifying each task to which the node identified in the delta vector was relevant to test said swap count and to complement its swap status if said swap count is zero, said swap status being indicative of whether the task will be included in said task activity list or removed from said task activity list in the system state identified by the system state vector, said task swapper further decrementing said swap count in response to the node identified in the delta vector being excluded and incrementing said swap count in response to the node identified in said delta vector being readmitted;

an allocation table storing an allocation count for each node-task combination indicative of the current node allocation status of each task;

a preferred table storing a preferred vector for each task listing the nodes preferred for the execution of that task;

task reallocator means connected to said allocation table, said preferred table and said task activity list for determining for each node, from said preferred table, if the node identified in the delta vector was more preferred for the execution of that task than itself, for complementing the allocation status for that node when the node is more preferred than the node identified in the delta vector and said allocation count is zero, for decrementing said allocation count when said node identified in said delta vector is more preferred and is an excluded node and for incrementing said allocation count when said node identified in said delta vector is more preferred and is a readmitted node; and

means for recording in said task activity list said predecessor count and the periodicity from the swap table and the allocation status of each task from said allocation table whose swap status indicates that the task should be active in the next system state of the system.

19. The task scheduler of claim 18 wherein said set of applications tasks includes a first set of tasks to be executed only by said non-excluded nodes and a second set of tasks to be executed only by the excluded nodes and said relevance table includes an excluded flag bit which identifies each task in said second set of tasks, said reconfiguration module further includes a task status matcher which will compare for each task, the status of each node identified in said next system state vector with said excluded flag bit to generate a match status bit enabling said means for recording to record that task in said task activity list.
 Description Submit all comments and votes
 


CROSS REFERENCE

This invention is related to commonly assigned, copending patent application Ser. Nos. 038,813 and 038,818 filed on Apr. 15, 1987, concurrently herewith.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention is related to the field of multiple node processing systems and in particular to a task scheduler for scheduling the tasks to be executed by the individual nodes in the multiple node processing system.

2. Description of the Prior Art

The earliest attempts to produce fault tolerant computer systems provided redundant computers in which each computer simultaneously executed every task required for the control operation. Voting circuits monitoring the outputs of the multiple computers determined a majority output which was assumed to be the correct output for the system. In this type of system, a faulty computer may or may not be detected and the faulty computer may or may not be turned off.

The redundant computer concept, although highly successful, is expensive because it requires multiple computers of equivalent capabilities. These systems require powerful computers because each computer has to perform every task required for the operation of the system. As an alternative, the master-slave concept was introduced in which the operation of several computers were controlled and coordinated by a master control. The master control designated which tasks were to be executed by the individual computers. This reduced the execution time of the control operation because all the computers were no longer required to execute every task, and many of the tasks could be executed in parallel. In this type of system when a computer is detected as faulty, the master could remove it from active participation in the system by assigning the task that would normally have been assigned to the faulty computer to the other computers. The problem encountered in the master-slave concept is that the system is totally dependent upon the health of the master and if the master fails then the system fails. This defect may be rectified by using redundant master controls, however, the increased cost of redundant masters limits the applicability of these systems to situations where the user is willing to pay for the added reliability. Typical of such situations are the controls of nuclear power plants, space exploration and other situations where failure of the control system would endanger lives.

Recent improvements to the master-slave and redundant execution fault tolerant computer systems are discussed above exemplified in the October 1978 proceedings of the IEEE, Volume 66, No. 10, which is dedicated to fault tolerant computer systems. Of particular interest are the papers entitled "Pluribus: An Operational Fault Tolerant Microprocessor" by D. Katuski et al., Pages 1146-1159 and "SIFT: The Design and Analysis of a Fault Tolerant Computer for Aircraft Control" by J. H. Wensley et al., Pages 1240-1255. The SIFT system uses redundant execution of each system task and of the master control functions. The Pluribus system has a master copy of the most current information which can be lost if certain types of faults occur.

More recently a new fault tolerant multiple computer architecture has been disclosed by Whiteside et al, in U.S. Pat. No. 4,256,547, in which each of the individual task execution nodes has an applications processor and an operations controller which functions as a master for its own node. These operations controllers, in coordination with each other through the exchange of data and other information by means of inter-node messages, select the task its own node's applications processor will execute. The task selection by the individual operations controllers is made on a distributed basis such that the execution of each task required for the operation of the control system may be selected by more than one of the operations controller in a fault tolerant manner. In this system each node is assigned a subset of the tasks it is capable of selecting and executing and no node is required to execute every task. The operations controllers are individually capable of detecting faulty nodes and excluding them from participation in the system. A predecessor of the multiple computer system is described by C. J. Walter et al in their paper "MAFT: A Multicomputer Architecture for Fault-Tolerance in Real-Time Control Systems" published in the proceedings of the Real Time System Symposium, San Diego, Dec. 3-6, 1985.

The present invention is a task scheduler for an operations controller in a fault tolerant multiple node processing system. This task scheduler is comparable to the scheduler taught by Whiteside et al in U.S. Pat. No. 4,323,966 and is individually set forth in Freedman et al in U.S. Pat. No. 4,318,173.

SUMMARY OF THE INVENTION

The invention is a task scheduler for the operations controller of a multiple node fault tolerant processing system capable of processing a set of application tasks. Each node in the fault tolerant processing system has an applications processor for executing a predetermined subset of the set of application tasks and an operations controller for controlling the operation of the node and scheduling the order in which the individual tasks in the predetermined set of tasks are to be executed by the applications processor through the exchange of inter-node messages with the other nodes in the system. These messages contain data and operating information necessary for the operations of the individual nodes and for the execution of the predetermined subset of tasks. The operations controllers further generate at least two timing period intervals, a fundamental timing period and a master period which is an integer multiple of the fundamental period. The master period defines the timing interval during which every task in the predetermined subset of tasks is scheduled for execution by the applications processor at least once.

The task scheduler has a task activities list containing an entry for each active task in the multiple node processing system, each entry containing an execution periodicity and a node allocation, a priority scan list containing a list of tasks in their preferred order of execution, a completion status list containing an entry for each task stored in the priority scan list, and a selection queue storing for each node the task ready for selection in their preferred order of execution. The task scheduler further includes a period counter for counting the fundamental timing periods to generate the period count corresponding to the number of fundamental timing periods that have expired since the beginning of a new master period, a wake-up sensor means for interrogating the task activity list to transfer to the priority scan list and the completion status list all of the tasks whose periodicity is greater than the period count, and priority scan means for transferring to the selection queue, for each node entry, the three highest priority tasks which are ready for execution by that node. The task scheduler also includes a task selector means for selecting for its own node the highest priority task currently stored in the selection queue as the next task scheduled for execution by its own applications processor, a task interactive consistency handler for updating the status of each task in the task activity list, the priority scan list, the task completed list and the selection queue which are identified in inter-node messages reporting the completion of a task.

The object of the invention is a priority based, data driven task scheduler for selecting the task to be executed by its own applications processor.

Another object of the invention is a task scheduler which concurrently tracks the scheduling process in every other node in the system.

Another object of the invention is a scheduler which can detect when any of the other nodes in the system makes a scheduling error.

Another object of the invention is a scheduler capable of reallocating the task when the number of active nodes in the system changes.

Another object of the invention is a scheduler in which the reallocation of tasks may include the deletion of some tasks and the addition of the other tasks in response to certain changes in the number of active nodes in the system.

These and other objects of the invention will become more apparent from a reading of the specification in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the multi-computer architecture;

FIG. 2 is a block diagram of the Operations Controller;

FIG. 3 is the master/atomic period timing diagram;

FIG. 4 is a the atomic/subatomic period timing diagram;

FIG. 5 is a block diagram of the Transmitter;

FIG. 6 is a circuit diagram of one of the interfaces;

FIG. 7 is a block diagram of the Arbitrator;

FIG. 8 shows waveforms for the Self-Test Arbitration Logic;

FIG. 9 is a block diagram of the Longitudinal Redundancy Code Generator;

FIG. 10 is a block diagram of a Receiver;

FIG. 11 is a block diagram of the Message Checker;

FIG. 12 is a block diagram of the decision logic for the Between Limits Checker;

FIG. 13 is the format for the error status byte generated by the message Checker;

FIG. 14 is a block diagram of the Fault Tolerator;

FIG. 15 shows the partitioning of the Fault Tolerator RAM;

FIG. 16 shows the format of the Message partition of the Fault Tolerator RAM;

FIG. 17 shows the format of the Error Code Files partition of the Fault Tolerator RAM;

FIG. 18 shows the format of the Group Mapping partition of the Fault Tolerator RAM;

FIG. 19 shows the format of the Error Code Files partition of the Fault Tolerator RAM;

FIG. 20 shwos the format of the Penalty Weight partition of the Fault Tolerator RAM;

FIG. 21 is a block diagram of the Fault Tolerator's Message Checker Interface;

FIG. 22 is a block diagram of the Fault Tolerator's Error Handler;

FIG. 23 is a block diagram of the Error Handler's Error Consistency Checker;

FIG. 24 is a block diagram of the Error Handler's Validity Checker;

FIG. 25 illustrates the format of the error byte in an error message;

FIG. 26 is a timing diagram of the reconfiguration sequence;

FIG. 27 is a block diagram of the Voter Subsystem;

FIG. 28 is a flow diagram for the Upper and Lower Medial Value Sorters;

FIG. 29 is a circuit diagram of the Lower Medial Value Sorter;

FIG. 30 is a flow diagram for the Averaging Circuit;

FIG. 31 is a circuit diagram of the Averaging Circuit;

FIG. 32 is a flow diagram of the Deviance Checker;

FIG. 33 is a circuit diagram of a Deviance Checker;

FIG. 34 is a block diagram of the Scheduler;

FIG. 35 shows the data format of the Scheduler RAM;

FIG. 36 shows the data format of the Scheduler ROM;

FIG. 37 is a block diagram of the Scheduler's Task Selector Module;

FIG. 38 is a flow diagram of the Wake-Up Sequencer's operation;

FIG. 39 is a flow diagram of the Execution Timer's operation;

FIG. 40 is a flow diagram of the TIC Handler's operation;

FIG. 41 is a flow diagram of the TIC Handler's Selection Queue Update sub-process;

FIG. 42 is a flow diagram of the TIC Handler's Completion/Termination sub-process;

FIG. 43 is a flow diagram of the TIC Handler's Execution Timer Reset sub-process;

FIG. 44 is a flow diagram of the TIC Handler's Priority Scan List Update sub-process;

FIG. 45 is a flow diagram of the Priority Scanner's operation;

FIG. 46 is a flow diagram of the Next Task Selector's operation;

FIG. 47 is a block diagram of the Reconfigure Module;

FIG. 48 is a flow diagram for the Task Swapper's operation in response to a Node being excluded from the operating set;

FIG. 49 is a flow diagram of the Task Swapper's operation in response to a Node being readmitted to the operating set;

FIG. 50 is a flow diagram of the Task Reallocator's operation in response to a Node being excluded from the operating set;

FIG. 51 is a flow diagram of the Task Status Matcher's operation;

FIG. 52 is a block diagram of the Task Communicator;

FIG. 53 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the Store Data Control;

FIG. 54 is a flow diagram of the Store Data Control's operation;

FIG. 55 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the DID Request Handler;

FIG. 56 is a flow diagram of the DID Request Handler's operation;

FIG. 57 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the Task Terminated Recorder;

FIG. 58 is a flow diagram of the Task Terminated Recorder's operation;

FIG. 59 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the Task Started Recorder;

FIG. 60 is a flow diagram of the Task Started Recorder's operation;

FIG. 61 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the AP Input Handler;

FIG. 62 is a flow diagram of the AP Input Handler's operation;

FIG. 63 is a partial block diagram of the Task Communicator showing the elements associated with the operation of the AP Output Handler;

FIG. 64 is a flow diagram showing the AP Output Handler's operation;

FIG. 65 shows the format of the DID information as stored in the DID List;

FIG. 66 shows the format of the DID information with the NUDAT bit appended;

FIG. 67 is a partial block diagram of the Task Communicator showing the subsystems involved in "reconfiguration";

FIG. 68 is a flow diagram showing the operation of the Reconfigure Control during reconfiguration;

FIG. 69 is a partial block diagram of the Task Communicator showing the subsystems involved in "reset";

FIG. 70 is a flow diagram of the Reset Control during reset;

FIG. 71 is a block diagram of the Synchronizer;

FIG. 72 shows the format of the Synchronizer Memory;

FIG. 73 shows the format of the Message Memory;

FIG. 74 shows the format of the Time Stamp Memory;

FIG. 75 shows the format of the Scratch Pad Memory;

FIG. 76 shows the waveforms of the signals generated by the Timing Signal Generator;

FIG. 77 is a block diagram of the Synchronizer Control;

FIG. 78 is a flow diagram showing the operation of the Data Handler and Expected Message Checker;

FIG. 79 is a flow diagram showing the operation of the Within Hard Error Window and Soft Error Window Checker and the Time Stamper;

FIG. 80 is a flow diagram for the operation of the "HEW to warning count";

FIG. 81 is a partial block diagram of the Synchronizer showing the elements associated with the operation of the Message Generator;

FIG. 82 is a flow diagram of the operation of the Message Generator and the Transmitter Interface;

FIG. 83 shows the waveforms of the timing signals for generating a TIC message;

FIG. 84 shows the waveforms of the timing signals for generating a sync System State message;

FIG. 85 shows the format of the "cold start" pre-sync message;

FIG. 86 is a flow diagram showing the operation of the Synchronizer during a "cold start";

FIGS. 87 and 87a are flow diagrams showing the generation of the HEW to warning signal during "cold start";

FIG. 88 is a flow diagram showing the storing of data during a "cold start";

FIG. 89 is a flow diagram showing the operation of the Operating Condition Detector during a "cold start";

FIG. 90 is a timing diagram used in the description of the "cold start";

FIG. 91 is a flow diagram of the operation of the Synchronizer during a "warm start";

FIG. 92 is a timing diagram used in the description of a "warm start";

FIG. 93 is a flow diagram of the operation of the Byzantine Voter to generate Byzantine voted task completed vector and Byzantine voted branch condition bits for the Scheduler;

FIG. 94 is a perspective of the Byzantine Voter's three-dimensional memory;

FIG. 95 shows the two-dimensional format of ISW vectors resulting from the first Byzantine vote on the three-dimensional ISW matrices; and

FIG. 96 is a functional circuit diagram of the Byzantine Voter.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The multi-computer architecture for fault tolerance is a distributed multi-computer system based on the functional and physical partitioning of the application tasks and the overhead functions, such as fault tolerance and systems operations. As shown in FIG. 1, the multi-computer architecture consists of a plurality of Nodes 10A through 10N, each having an Operations Controller 12 for performing the overhead functions and an Applications Processor 14 for executing the application tasks.

For each application, the multi-computer architecture is required to execute a predetermined set of tasks, collectively called application tasks. Each Node is allocated an active task set which is a subset of the application tasks. Each Node in coordination with all of the other Nodes is capable of selecting tasks from its active task set and executing them in a proper sequence. The active task set for each Node may be different from the active task set allocated to the other Nodes and each task in the application tasks may be included in the active tasks set of two or more Nodes depending upon how many Nodes are in the system and the importance of the task to the particular application. In this way, the multi-computer architecture defines a distributed multi-computer system in which no one Node 10 is required to execute every one of the application tasks, yet the failure of one or more Nodes need not prevent the execution of any application task. As shall be more fully explained later on, the active task set in each Node is static for any given system configuration or system state and will change as the system state changes with an increase or decrease in the number of active Nodes. This change in the active task set called "reconfiguration," takes place automatically and assures that every one of the important or critical application tasks will be included in the active task set of at least one of the remaining active Nodes in the system.

Each Node 10A through 10N is connected to every other Node in the multi-computer architecture through its Operational Controller 12 by means of a private communication link 16. For example, the Operations Controller "A" is the only Operations Controller capable of transmitting on communication link 16a. All of the other Nodes are connected to the communication link 16a and will receive every message transmitted by the Operations Controller "A" over communication link 16a. In a like manner, the Operations Controller "B" of Node 10B is the only Operations Controller capable of transmitting messages on communication link 16b, and Operations Controller N of the Node 10N is the only Operations Controller capable of transmitting messages on communication link 16n.

External information from sensors and manually operated devices collectively identified as Input Devices 20 are transmitted directly to the Applications Processors 14 of each Node through an input line 18. It is not necessary that every Applications Processor receive information from every sensor and/or Input Device however, each Applications Processor 14 will receive the information from every sensor and/or Input Device which it needs in the execution of the applications task.

In a like manner, the Applications Processor 14 in each Node will transmit data and control signals, resulting from the execution of the applications task to one or more actuators and/or display devices collectively identified as Output Devices 22. The data and/or control signals generated by the Applications Processor 14 in the individual Nodes 10A through 10N may be combined by a Combiner/Voter Network 24 before it is transmitted to the Output Devices 22. Further, when multiple values of the same data and/or control signals are generated by two or more of the Nodes, the Combiner/Voter Network 24 may also be used to generate a single voted value which is transmitted to the Output Devices 22. The use or omission of a Combiner/Voter Network 24 is optional. It is not necessary that every actuator or display receive the output generated by every Node in the system. The specific actuator or display only needs to be connected to the Node or Nodes whose Applications Processor 14 is capable of generating the data or command signals it requires.

The network of Operations Controllers 12 is the heart of the system and is responsible for the inter-node communications, system synchronization, data voting, error detection, error handling, task scheduling, and reconfiguration. The Applications Processors 14 are responsible for the execution of the application tasks and for communications with the Input Devices 20 and Output Devices 22. In the multi-computer architecture, the overhead functions performed by the Operations Controllers 12 are transparent to the operations of the Applications Processor 14. Therefore, the structure of the Applications Processor 14 may be based solely upon the application requirements. Because of this, dissimilar Applications Processor 14 may be used in different Nodes without destroying the symmetry of the multi-computer architecture.

The structural details of the Operations Controller 12 in each Node 10A through 10N are shown in FIG. 2. Each Operations Controller 12 has a transmitter 30 for serially transmitting messages on the Node's private communication link 16. For discussion purposes, it will be assumed that the Operations Controller illustrated in FIG. 2 is the Operations Controller A as shown in FIG. 1. In this case, the Transmitter 30 will tranmit messages on the private communication link 16a. Each Operations Controller also has a plurality of Receivers 32a through 32n, each of which is connected to a different private communication link. In the preferred embodiment, the number of Receivers 32a through 32n is equal to the number of Nodes in the multi-computer architecture. In this way, each Operations Controller 12 will receive all of the messages transmitted by every Node in the system including its own. Each Receiver 32a through 32n will convert each message received over the private communiction link to which it is connected from a serial format to a parallel format then forward it to a Message Checker 34. Each Receiver 32a through 32n will also check the vertical parity and the longitudinal redundancy codes appended to each of the received messages and will generate an error signal identifying any errors detected.

The Message Checker 34 monitors the Receivers 32a through 32n and subjects each received message to a variety of physical and logical checks. After completion of these physical and logical checks, the messages are sent to a Fault Tolerator 36. Upon the detection of any errors in any message, the Message Checker 34 will generate an error status byte which is also transmitted to the Fault Tolerator 36.

The Fault Tolerator 36 performs five basic functions. First, the Fault Tolerator performs further logical checks on the messages received from the Message Checker 34 to detect certain other errors that were not capable of being detected by the Message Checker 34. Second, the Fault Tolerator passes error free messages to a Voter 38 which votes on the content of all messages containing the same information to generate a voted value. Third, it passes selected fields from the error free messages to other subsystems as required. Fourth, the Fault Tolerator aggregates the internal error reports from the various error detection mechanisms in the Operations Controller and generates Error messages which are transmitted to all of the other Nodes in the system by the transmitter 30. Finally, the Fault Tolerator 36 monitors the health status of each Node in the system and will initiate a local reconfiguration when a Node is added or excluded from the current number of operating Nodes. The Fault Tolerator 36 maintains a base penalty count table which stores the current base penalty counts accumulated for each Node in the system. Each time a Node transmits a message containing an error, every Node in the system, including the one that generated the message, should detect this error and generate an Error message identifying the Node that sent the message containing the error, the type of error detected, and a penalty count for the detected error or erros. Each Fault Tolerator 36 will receive these Error messages from every other Node and will increment the base penalty count for that Node which is currently being stored in the base penalty count table, if the detection of the error is supported by Error messages received from a majority of the Nodes. The magnitude of the penalty count increment is predetermined and is proportional to the severity of the error. If the incremented base penalty count exceeds an exclusion threshold, as shall be discussed later, the Fault Tolerator initiates a Node exclusion and a reconfiguration process in which the faulty Node is excluded from active participation in the system and the active task sets for the remaining Nodes are changed to accommodate for the reduction in the number of active Nodes.

The Fault Tolerator 36 will also periodically decrement the base penalty count for each Node in the system so that a Node which was previously excluded may be readmitted into the active system. When a previously excluded Node continues to operate in an error free manner for a sufficient period of time, its base penalty count will be decremented below a readmittance threshold which will initiate a Node