WikiPatents - Community Patent Review
Data processing system with improved work flow system and method    
United States Patent 5535322
Link to this page: http://www.wikipatents.com/5535322.html
Inventor(s): Hecht; Matthew S. (Potomac, MD)
Abstract: A Work Flow Manager (WFM), or process manager, is the software to manage and control the flow of work items from one function to the next in a well-defined application process to achieve the complete processing of those work items. Applications of WFMs include the processing of imaged or multimedia documents such as health and other insurance forms, filmless radiology, IRS tax submissions, and FBI fingerprint and voice identification. The invention WFM: a. provides an improved, scalable subsystem and method for work flow management; b. partitions the application (work flow) process into component distributed services, each represented by an OSF Distributed Computing Environment (DCE) service; c. defines the application process with a state transition diagram (STD); d. uses centralized control software with a work-in-process (WIP) manager, a work queue manager, and a WIP submission attributes data base manager; e. defines and uses a common "pull system" protocol for communication between the WFM and the component distributed services; f. distinguishes WIP submissions from archived submissions; and g. uses an "attribute-based file system" to store submissions, typically implemented with both a data base for submission attributes (including the current state of WIP submissions), and a distributed file system for submission contents files.



Inventor: Hecht; Matthew S. (Potomac, MD)
Owner/Assignee: International Business Machines Corporation (Armonk, NY)
Publication Date: July 9, 1996
Application Number: 07/967,090
Filing Date: October 27, 1992
US Classification: 705/1
Examiner: Herndon; Heather R.
Assistant Examiner: Vo; Cliff N.
Attorney/Law Firm: Hoel; John E. Seaman; Kenneth A.
USPTO Field of Search: 395/155 395/149 395/161 395/275 395/650 364/188 364/401 364/222.22
Patent Tags: data processing improved work flow
   
References

U.S. References (patent number, inventor, US classification, date)
5325478, Shelton, 715/507, Jun. 1994
5321803, Ditter, Jr., 715/703, Jun. 1994
5293475, Hennigan, 715/517, Mar. 1994
5228123, Heckel, 715/762, Jul. 1993
5121319, Fath, 700/83, Jun. 1992
5109337, Ferriter, 705/29, Apr. 1992
5054096, Beizer, 382/305, Oct. 1991
4875162, Ferriter, 705/29, Oct. 1989
4751635, Kret, 707/10, Jun. 1988
4503499, Mason, 718/101, Mar. 1985
 Foreign References
 Other References
Claims
 


What is claimed is:

1. In a data processing system, a method for managing and controlling the flow of imaged documents comprising work items from one function to the next in an application process to achieve the complete processing of said work items, comprising the steps of:

assigning in the data processing system, a unique submission identifier to each incoming submission;

establishing in the data processing system, a repository for work-in-process (WIP) submissions, having an attribute-based file system for WIP submission attributes, and an attribute-based file system for said WIP submission contents;

establishing in the data processing system, a data base of WIP submission attributes that for each submission includes a current state attribute;

storing in the data processing system, said WIP submission contents as files in a distributed file system;

establishing in the data processing system, a repository for archived submissions, having an attribute-based file system for archived submission attributes, and an attribute-based file system for archived submission contents;

defining in the data processing system, when to archive submission contents, when to archive submission attributes, and when to erase submissions from the repository of WIP submissions;

partitioning in the data processing system, an application process into distributed software services and defining a remote procedure call (RPC) interface for each component application service;

defining in the data processing system, an application work flow process with:

a state transition diagram (STD) that uses said application work flow process;

a list of the STD-using application work flow processes;

a service-specific structure of work queues;

for each service, a structure of the work queue, and

policy parameters for the assignment of work queues to services;

executing in the data processing system, a software, work flow, system architecture with functions for a WIP manager, to process incoming work items in priority order from a recoverable priority queue and to move WIP submissions through STD defined states;

a work queue manager, to manage application and service-specific, recoverable work queues and a work queue assignment table, and to assign work queues to services;

a WIP submission attributes data base manager, to maintain the state of each WIP submission; and

performing in the data processing system, a common pull system protocol for the WIP manager and work queue manager together to communicate with application services.

2. The method of claim 1 wherein said pull system protocol comprises the steps of:

a. service B notifies the work queue manager with an RPC that it is ready for work;

b. the work queue manager records this potential work assignment in its work assignment table;

c. service A completes work and so notifies the WIP manager with an RPC that adds the work completion notice to an in-box;

d. the WIP manager gets the next item of highest priority from the in-box;

e. the WIP manager updates the state of the WIP submission attributes data base;

f. the WIP manager tells the work queue manager with an RPC to enqueue work items, and in the same RPC tells the work queue manager that work completed;

g. the work queue manager enqueues work items on service-specific queues;

h. when a work queue for service B is ready, the work queue manager assigns it to a ready service B, and records the work assignment in its work queue assignment table;

i. the work queue manager notifies service B with an RPC of the assignment and the name of the work queue file;

j. service B copies the work queue file;

k. service B reads copies of objects it needs from temporary storage;

l. service B performs the work;

m. service B writes changed objects or new objects into temporary storage;

n. service B completes work and so notifies the WIP manager with an RPC that adds the work completion notice to the in-box; and

o. service B notifies the work queue manager with an RPC that it is ready for work.

3. The method of claim 1 wherein said submission represents multimedia documents.

4. The method of claim 1 wherein said method applies to a distributed architecture.

5. The method of claim 1 wherein said embedded STD captures and defines an application and its application-specific services.

6. The method of claim 1 wherein said component application service represents either a single instance of that service or multiple instances of that service.

7. The method of claim 1 wherein said component application service represents at least partially automated work, with work completion results entered and maintained on-line.

8. The method of claim 1 wherein said repository for WIP submissions and said repository for archived submissions can be combined into one repository for both WIP and archived submissions.

9. The method of claim 1 wherein said repositories for WIP and archived submissions are implemented with a single attribute-based file system using hierarchical storage with automatic caching and migration between successive levels.

10. The method of claim 1 wherein there is a small set of submission types that partitions the submissions, and the application work flow process is defined by several STDs, one for each submission type.

11. The method of claim 2 wherein said work queue manager maintains one logical work queue for each service and apportions out a batch of work items at a time, with each logical work queue implemented as a collection of files with a policy for determining the number of work items per file.

12. The method of claim 2 wherein said objects in step (k) are files that represent multimedia documents.

13. The method of claim 2 wherein said pull system protocol operates as a subordinate hierarchical method.

14. The method of claim 2 wherein said service B can selectively request and get work either (a) only when all previous work is completed, or (b) in anticipation to satisfy a fast response time requirement for the ready for work notice.

15. In a data processing system, a method for managing and controlling the flow of imaged documents comprising work items from one function to the next in an application process to achieve the complete processing of said work items, comprising the steps of:

executing in the data processing system, a work flow method including a work in process (WIP) manager, to process incoming work items in priority order from a recoverable priority queue and to move WIP submissions through state transition diagram (STD) defined states;

said work flow method further including a work queue manager, to manage application and service-specific, recoverable work queues and a work queue assignment table, and to assign work queues to services;

said work flow method further including a WIP submission attributes data base manager, to maintain the state of each WIP submission; and

performing in the data processing system, a common pull system protocol for the WIP manager and work queue manager together to communicate with application services,

said pull system protocol comprising the steps of:

a. service B notifies the work queue manager with a remote procedure call (RPC) that it is ready for work;

b. the work queue manager records said ready to work notification in its work assignment table;

c. service A completes work and so notifies the WIP manager with an RPC that adds the work completion notice to an in-box;

d. the WIP manager gets the next item of highest priority from the in-box;

e. the WIP manager updates the state of the WIP submission attributes data base;

f. the WIP manager tells the work queue manager with an RPC to enqueue work items, and tells the work queue manager that work completed;

g. the work queue manager enqueues work items on service-specific queues;

h. when a work queue for service B is ready, the work queue manager assigns it to a ready service B, and records the work assignment in its work queue assignment table;

i. the work queue manager notifies service B with an RPC of the assignment and the name of the work queue file;

j. service B copies the work queue file;

k. service B reads copies of objects it needs from temporary storage;

l. service B performs the work;

m. service B writes changed objects or new objects into temporary storage;

n. service B completes work and so notifies the WIP manager with an RPC that adds the work completion notice to the in-box; and

o. service B notifies the work queue manager with an RPC that it is ready for work.

16. In a data processing system, a method for managing and controlling the flow of imaged documents comprising work items from one function to the next in an application process to achieve the complete processing of said work items, comprising the steps of:

processing with a work in process (WIP) manager in the data processing system, work items from one function to the next in priority order from a recoverable priority queue and moving WIP submissions through state transition diagram (STD) defined states;

accessing with a work queue manager in the data processing system, application and service-specific, recoverable work queues;

maintaining with a WIP submission attributes data base manager, a state of each WIP submission in the data processing system; and

communicating with a common pull system protocol in the data processing system, between the WIP manager and work queue manager and application services and performing the common pull system protocol to move items from one function to the next to achieve the processing of work items, wherein said step of communicating with said common pull system protocol includes the steps of:

a. service B notifies the work queue manager with a remote procedure call (RPC) that it is ready for work;

b. the work queue manager records said ready to work notification in its work assignment table;

c. service A completes work and so notifies the WIP manager with an RPC that adds the work completion notice to an in-box;

d. the WIP manager gets the next item of highest priority from the in-box;

e. the WIP manager updates the state of the WIP submission attributes data base;

f. the WIP manager tells the work queue manager with an RPC to enqueue work items, and tells the work queue manager that work completed;

g. the work queue manager enqueues work items on service-specific queues;

h. when a work queue for service B is ready, the work queue manager assigns it to a ready service B, and records the work assignment in its work queue assignment table;

i. the work queue manager notifies service B with an RPC of the assignment and the name of the work queue file;

j. service B copies the work queue file;

k. service B reads copies of objects it needs from temporary storage;

l. service B performs the work;

m. service B writes changed objects or new objects into temporary storage;

n. service B completes work and so notifies the WIP manager with an RPC that adds the work completion notice to the in-box; and

o. service B notifies the work queue manager with an RPC that it is ready for work.
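
The pull system protocol recited in steps (a) through (o) can be sketched in a few lines of Python. This is only an illustrative, single-process sketch under stated assumptions: ordinary method calls stand in for the RPCs, in-memory queues stand in for the recoverable queues and work queue files, a dictionary stands in for the WIP submission attributes data base, and the state names, service names ("ocr", "review"), and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
from collections import defaultdict, deque

# Hypothetical STD: current state -> (next state, service to run next or None).
STD = {
    "new": ("awaiting_ocr", "ocr"),
    "awaiting_ocr": ("awaiting_review", "review"),
    "awaiting_review": ("done", None),
}


class WorkQueueManager:
    """Holds service-specific queues and the work queue assignment table."""

    def __init__(self):
        self.queues = defaultdict(deque)  # service name -> queued submission ids
        self.ready = defaultdict(deque)   # service name -> instances ready for work
        self.assignments = {}             # instance -> submission id being worked

    def ready_for_work(self, service, instance):  # steps (a), (b), (o)
        self.assignments.pop(instance, None)
        self.ready[service].append(instance)

    def enqueue(self, service, submission_id):    # steps (f), (g)
        self.queues[service].append(submission_id)

    def assign(self):                             # steps (h), (i)
        made = []
        for service, queue in self.queues.items():
            while queue and self.ready[service]:
                instance = self.ready[service].popleft()
                submission_id = queue.popleft()
                self.assignments[instance] = submission_id
                made.append((service, instance, submission_id))
        return made


class WIPManager:
    """Processes completion notices in priority order and tracks submission state."""

    def __init__(self, std, wqm):
        self.std, self.wqm = std, wqm
        self.inbox = []    # heap of (priority, arrival order, submission id)
        self.state = {}    # stands in for the WIP submission attributes data base
        self.arrivals = 0

    def work_completed(self, submission_id, priority=5):  # steps (c), (n)
        heapq.heappush(self.inbox, (priority, self.arrivals, submission_id))
        self.arrivals += 1

    def step(self):                                       # steps (d), (e), (f)
        if not self.inbox:
            return False
        _, _, submission_id = heapq.heappop(self.inbox)
        new_state, next_service = self.std[self.state[submission_id]]
        self.state[submission_id] = new_state
        if next_service is not None:
            self.wqm.enqueue(next_service, submission_id)
        return True


def run(wip, wqm):
    """Alternate WIP manager and work queue manager turns until nothing is left."""
    log = []
    progressed = True
    while progressed:
        progressed = wip.step()
        for service, instance, submission_id in wqm.assign():
            log.append((service, instance, submission_id))  # steps (j)-(m) happen here
            wip.work_completed(submission_id)                # step (n)
            wqm.ready_for_work(service, instance)            # step (o)
            progressed = True
    return log


wqm = WorkQueueManager()
wip = WIPManager(STD, wqm)
wqm.ready_for_work("ocr", "ocr-1")
wqm.ready_for_work("review", "review-1")
wip.state.update({"S1": "new", "S2": "new"})
wip.work_completed("S2", priority=5)   # "service A" finished S2
wip.work_completed("S1", priority=1)   # lower number = higher priority (assumption)
log = run(wip, wqm)
print(log[0])      # ('ocr', 'ocr-1', 'S1')
print(wip.state)   # {'S1': 'done', 'S2': 'done'}
```

Note that no service is ever handed work it did not ask for: an assignment is made only when a queue has items and an instance of the matching service has announced that it is ready.
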
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention disclosed broadly relates to data processing and more particularly relates to the architecture and methods of work flow management in a distributed data processing system.

2. Prior Art

The following patent application relates to the invention:

U.S. patent application Ser. No. 07/902,908 filed Jun. 22, 1992 entitled "System and Method for Establishing Work Procedures in a Work Process Management System," by Marvin Addink, et al., assigned to the IBM Corporation, now abandoned.

(Mason 1985) U.S. Pat. No. 4,503,499 by G. R. Mason, et al., entitled "Controlled Work Flow System" (Mar. 5, 1985);

(Beizer 1991) U.S. Pat. No. 5,054,096 by M. M. Beizer, entitled "Method and Apparatus for Converting Documents into Electronic Data for Transaction Processing," (Oct. 1, 1991) and the related Invention identified above.

To understand prior art, it is necessary to understand differing associated terminology.

(Mason 1985) describes an invention "system for automating office procedure to coordinate the flow of work on (imaged) documents and the transmittal of documents between office personnel," but it uses a different work flow management method than the present invention.

"In the controlled work flow process, the scheduling of work tasks in a project is controlled by the program executed by the data processor, which program is called the "work daemon." Each paperwork project involving a multiplicity of documents and a multiplicity of personnel to work on the documents is referred to as an "effort" and files referred to as RO files, each designating the schedule of paperwork for a given effort, are stored in the memory. Each effort is broken down into tasks with each task to be performed by one individual worker on one document. These tasks are called "work events." The individual workers and managers who make use of the controlled work flow system are called "users." The work daemon operates on the effort RO files to notify the users at the work stations that work events are ready to be carried out and issues BORROW, RETURN, STORE, and COPY requests to the central library facility to transmit documents between the work station processors and the central library facility. When a user indicates that he wishes to proceed with a specified work event, which involves modification of an existing document, the work daemon will issue a BORROW request to the central library facility to cause the document of the work event to be sent to the work station processor corresponding to the user for the work event. When a user indicates that he has completed a work event on a borrowed or newly created document, the work daemon will issue a RETURN or STORE request to the central library facility and cause the document, as modified, to be sent back to the central library facility for storage back in the library memory.

"An effort manager will be named to be in charge of each effort and an effort manager will usually have the responsibility for setting up the schedule of work events in a work effort. This schedule is called a "route specification" and each effort RO file is generated from the information in a route specification. To generate a route specification, the effort manager makes use of a program called the "effort program" in which he enters the data of a route specification into the computer system by means of one of the work stations. The effort program is preferably resident at each of the remote processors. The data will go first into a route specification file and, from the route specification file, it will be compiled into an effort RO file and stored in the memory."

(Beizer 1991) describes an invention for subjecting a large volume of scanned documents to transaction processing, together with routing programs to dictate the information flow, but it provides almost no detail on how the routing method works. It states the following about the routing program, which is the method for work flow management: "Basically the route program decides where the image of the document, images of related documents and data extracted by an optical character reader (OCR), which together form the "record," are to be transmitted and when. This routing program considers the document type, overall workload in the company, the capabilities of the department, and special instructions. The special instructions may be created by operator intervention so that privileged customer documents are given priority treatment." In contrast, the invention described herein provides details on a specific family of routing methods.

The related invention identified above defines a method for establishing and executing work procedures in a work process management system for either imaged documents or multimedia documents, but compared to the present invention it uses a different work flow definition method, it is based on a different system architecture, and it has a different theory of operation. It defines a "work process" as a combination of "work baskets," "decision points," "collection points," "events," and "routes." An "object" is the smallest unit within the system (e.g., single document image, voice record, video record), a "folder" is a collection of objects with a common identifier, and a "work package" is (or points to) the entity that is worked on by a work process; a work package has a "work package identifier" and a "work package instance." The method defines a relational database based on these concepts, populates it with the details of a specific application through use of a work management definition program, and uses the database to maintain the state of work package instances for that specific application through use of control programs at a host computer and workstations. In contrast to (BT9-92-005), the invention described herein partitions the work flow into a centralized control component and a distributed services component, defines the work flow with a general STD, and uses a centralized control mechanism with centralized work queues to dispatch work to those services.

Existing commercially available work flow management software products known to the applicant differ in theory of operation and other key features from the present WFM invention. IBM ImagePlus, used by USAA for automobile insurance form processing, does not use DCE, does not run on a POSIX-based operating system, and does not use the same "pull system" method, among other differences. The same holds for TASC-Flow (TASC 1992), used in a major bank's mortgage processing division. The Plexus work flow manager, used by American Express, does not use DCE and has a different theory of operation. Scale-up properties of commercially available work flow management software, while unbounded in principle, are not widely understood except for specific installations.

BACKGROUND

A work flow manager (WFM), or process manager, is the software to manage and control the flow of work items from one function to the next in a well-defined application process to achieve the complete processing of those work items. A WFM is sometimes referred to as a "router" or "traffic cop" since it manages (controls, monitors, maintains) the flow of imaged work; it typically includes a "dispatcher" to apportion out work assignments. Applications of WFMs include the processing of imaged or multimedia documents for health and other insurance forms, filmless radiology, IRS tax submissions, and FBI fingerprint and voice identification.

A combined architecture and method for a scalable WFM is needed to address emerging, huge-size, federal image processing problems. This WFM software should execute on a POSIX-based operating system (e.g., see (IEEE POSIX 1003.1 1990), (Lewine 1991)), since federal programs typically require POSIX compliance. In addition, such a WFM should be based on OSF Distributed Computing Environment (DCE) (OSF DCE 1991), an emerging de facto standard for distributed computing. Prior to this invention, this applicant knows of no such WFM.

An illustrative WFM application is for the IRS Document Processing System (DPS), an IRS Tax System Modernization (TSM) program to automate IRS Service Center operations by scanning incoming paper tax submissions and using image processing techniques to complete the work. DPS is defined in references (IRS DPS RFP 1991), (IRS DPS RFP Amendments), (IRS DPS RFP Q&A), and (IRS TSM 1991). DPS has a peak period processing requirement (in April) of about 232K tax submissions/day (or about 3.1M images/day), per IRS Service Center. The size of DPS at one IRS Service Center is estimated by some measures at over ten times the size of the USAA application of IBM ImagePlus, previously considered a large application of imaging and work flow. Unlike the USAA application, however, DPS requires production scanning and optical character recognition.
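
To put the peak-period figures in perspective, the short calculation below derives the implied images per submission and the average per-second rates. It assumes, purely for illustration, that the daily load is spread evenly over a 24-hour day; the text does not state the length of the processing day, so real peak rates could be higher.

```python
SUBMISSIONS_PER_DAY = 232_000   # from the text: about 232K submissions/day
IMAGES_PER_DAY = 3_100_000      # from the text: about 3.1M images/day
SECONDS_PER_DAY = 24 * 60 * 60  # assumption: load spread evenly over 24 hours

images_per_submission = IMAGES_PER_DAY / SUBMISSIONS_PER_DAY
submissions_per_second = SUBMISSIONS_PER_DAY / SECONDS_PER_DAY
images_per_second = IMAGES_PER_DAY / SECONDS_PER_DAY

print(f"{images_per_submission:.1f} images per submission")    # 13.4
print(f"{submissions_per_second:.1f} submissions per second")  # 2.7
print(f"{images_per_second:.1f} images per second")            # 35.9
```
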

OBJECTS OF THE INVENTION

An object of the invention is to provide an improved, scalable method for work flow management in a distributed data processing system. An intended and illustrative application of this method is for the IRS DPS.

Another object of the invention is to provide an improved method for work flow management that uses OSF Distributed Computing Environment (DCE) and that can use POSIX-based operating systems in a distributed data processing system.

A further object of the invention is to provide a common "pull system" protocol for component DCE services comprising the application work flow controlled by work flow management in a data processing system.

SUMMARY OF THE INVENTION

Problem Statement. This WFM has the following requirements and properties.

(a) Design Requirements

POSIX based--it can run on a UNIX or UNIX-like operating system;

scalable--the WFM mechanism can scale-up well to production image applications that include automated components, production workers, and knowledge workers, such as the IRS DPS; and

application flexibility--the WFM mechanism applies to the IRS DPS application, and can evolve gracefully as the DPS application evolves over time.

Invention Properties. This WFM design satisfies the above design requirements and in addition has the following properties.

(b) Software Environment Properties

is based on OSF DCE--implements functions of the application process as OSF DCE application services; executes under OSF DCE (OSF DCE 1991);

uses an "attribute-based file system" to store work-in-process; the attribute-based file system can be implemented with a database system to store the state and other attributes of work objects and with a distributed file service like OSF DCE Distributed File Service (DFS) to store work objects (e.g., files containing imaged documents and related data).
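
As a rough illustration of the attribute-based file system idea, the sketch below keeps submission attributes (including the current state) in a database and the submission contents in ordinary files. It is a sketch under assumptions, not the patented implementation: SQLite stands in for the database system, a local temporary directory stands in for a distributed file service such as DFS, and the class name, table layout, and ".img" file naming are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


class AttributeBasedStore:
    """Attributes live in a database; contents live in files (hypothetical layout)."""

    def __init__(self, root):
        self.root = Path(root)
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE submissions ("
            "id TEXT PRIMARY KEY, state TEXT NOT NULL, path TEXT NOT NULL)"
        )

    def put(self, submission_id, state, contents):
        # Write the contents file first, then record its attributes.
        path = self.root / f"{submission_id}.img"
        path.write_bytes(contents)
        with self.db:
            self.db.execute(
                "INSERT INTO submissions VALUES (?, ?, ?)",
                (submission_id, state, str(path)),
            )

    def set_state(self, submission_id, state):
        with self.db:
            self.db.execute(
                "UPDATE submissions SET state = ? WHERE id = ?",
                (state, submission_id),
            )

    def find(self, state):
        # Look submissions up by attribute rather than by file name.
        rows = self.db.execute(
            "SELECT id FROM submissions WHERE state = ? ORDER BY id", (state,)
        )
        return [row[0] for row in rows]

    def contents(self, submission_id):
        row = self.db.execute(
            "SELECT path FROM submissions WHERE id = ?", (submission_id,)
        ).fetchone()
        if row is None:
            raise KeyError(submission_id)
        return Path(row[0]).read_bytes()


store = AttributeBasedStore(tempfile.mkdtemp())
store.put("S1", "scanned", b"image-1")
store.put("S2", "scanned", b"image-2")
store.set_state("S1", "captured")
print(store.find("scanned"))   # ['S2']
print(store.contents("S1"))    # b'image-1'
```

Keeping the state in the database means a question such as "which submissions are still in state X" never requires opening the (potentially large) image files.
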

(c) Invention Properties

(c1) has application generality and flexibility--an application-specific state transition diagram (STD) defines the application process to be managed, where the STD can capture conditional work flow; during operation, the WFM administrator can alter the application process STD under certain conditions; see Chapter 5 in (Rumbaugh 1991) for a definition of a STD;

(c2) uses centralized control software--has a work-in-process manager (software) to manage the states of all work items, and has a work queue manager (software) to manage work items for the application process services;

(c3) fills the service-specific work queues with work items (e.g., unique submission identifiers and minimal service-related work item attributes) such that a work queue for a service can feed multiple copies of that service, and the contents of work queues are resilient to the failure or nonavailability of the services they feed;

(c4) uses an overall "pull system" design (application services "pull" work to do; the mechanism does not "push" work on services) to achieve simplicity of mechanism and to accommodate differences in the time to complete the various process functions (human, semi-automated, automated).
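
Property (c1), an application process defined by a state transition diagram that can capture conditional work flow, can be illustrated with a transition table keyed by (state, event). The states and events below are hypothetical stand-ins chosen for readability; they are not taken from the STD in the patent's figures.

```python
from enum import Enum


class State(Enum):
    SCANNED = "scanned"
    MANUAL_CAPTURE = "manual_capture"
    CAPTURED = "captured"
    VALIDATED = "validated"
    ARCHIVED = "archived"


# Hypothetical STD: (current state, event reported by a service) -> next state.
TRANSITIONS = {
    (State.SCANNED, "ocr_ok"): State.CAPTURED,
    (State.SCANNED, "ocr_incomplete"): State.MANUAL_CAPTURE,  # conditional flow
    (State.MANUAL_CAPTURE, "keyed"): State.CAPTURED,
    (State.CAPTURED, "valid"): State.VALIDATED,
    (State.CAPTURED, "invalid"): State.MANUAL_CAPTURE,        # loop back for correction
    (State.VALIDATED, "posted"): State.ARCHIVED,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None


def replay(events, start=State.SCANNED):
    """Apply a sequence of events to a submission and return its final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state


print(replay(["ocr_ok", "valid", "posted"]).name)                    # ARCHIVED
print(replay(["ocr_incomplete", "keyed", "valid", "posted"]).name)   # ARCHIVED
```

Because the diagram is plain data rather than code, an administrator-facing tool could in principle add or change entries without touching the services themselves, which is one way to read the remark about altering the STD during operation.
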

The insight and subtlety of this invention lie in its overall simplicity of mechanism, in the selection and integration of its design component mechanisms (i.e., DCE, attribute-based file system, embedded STD, centralized control, partitionable and resilient work queues), and in the overall "pull system" design. The applicant knows of no other WFM with the above combination of properties.
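
One way to picture a partitionable and resilient work queue is as a logical queue split into batch files, each small enough to hand to one copy of a service. The sketch below is an assumption-laden illustration: the JSON file format, the file naming, and the fixed items-per-file policy are hypothetical, and a local temporary directory stands in for shared storage.

```python
import json
import tempfile
from pathlib import Path


def write_batches(queue_dir, service, items, items_per_file):
    """Split one logical queue into files of at most `items_per_file` work items."""
    paths = []
    for n, start in enumerate(range(0, len(items), items_per_file)):
        path = Path(queue_dir) / f"{service}-{n:04d}.json"
        path.write_text(json.dumps(items[start:start + items_per_file]))
        paths.append(path)
    return paths


def read_batch(path):
    """What a service instance would do after being told the work queue file name."""
    return json.loads(Path(path).read_text())


queue_dir = tempfile.mkdtemp()
paths = write_batches(queue_dir, "ocr", ["S1", "S2", "S3", "S4", "S5"], items_per_file=2)
print(len(paths))             # 3
print(read_batch(paths[0]))   # ['S1', 'S2']
print(read_batch(paths[2]))   # ['S5']
```

Since each batch is a file rather than state held inside a service process, a batch assigned to a failed instance is still on disk and can be handed to another copy of the same service.
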

BRIEF DESCRIPTION OF THE DRAWINGS

The features and advantages of the invention will be more fully appreciated with reference to the accompanying figures. FIG. 1 to FIG. 7 help present the DPS application, a representative example, its overall architecture and work flow. FIG. 8 and FIG. 9 help illustrate DCE. FIG. 10 shows a representative DCE name space for DPS. FIG. 11, FIG. 12, FIG. 14, FIG. 15, and FIG. 16 illustrate the invention, and FIG. 13 defines the notation used in FIG. 14. FIG. 12 best illustrates the invention (method).

DESCRIPTION

FIG. 1 shows DPS operation from the worker's perspective.

FIG. 2 is a representative DPS architecture, shown as a top-level functional architecture diagram.

FIG. 3 is an example DPS distributed architecture consistent with FIG. 2.

FIG. 4 shows DPS conditional flow examples.

FIG. 5 summarizes the DPS conditional flow examples in one diagram.

FIG. 6 lists DPS and non-DPS application software structured as services.

FIG. 7, which combines FIG. 3 and FIG. 6, shows where key DPS software services execute.

FIG. 8 shows the layering of DCE and related software.

FIG. 9 is the DCE architecture.

FIG. 10 shows a representative DCE name space for DPS.

FIG. 11 is the top-level functional architecture diagram of the WFM.

FIG. 12 shows the steps in the invention's common pull protocol.

FIG. 13 explains the notation used in FIG. 14 for a state transition diagram, notation adopted from Chapter 5 of reference (Rumbaugh 1991).

FIG. 14 shows an illustrative state transition diagram of the life cycle of a DPS submission.

FIG. 15 contains a flow diagram of the method shown in FIG. 12.

FIG. 16 shows a multimedia, filmless radiology application of the WFM.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENT

1. Context of the Invention

The context of the invention is a distributed computing application with imaged documents, where the image processing software executes on a version of the UNIX operating system (e.g., AIX) with OSF DCE. This section describes an example application context first and OSF DCE context second.

2. Example Application

To illustrate the WFM invention, we use the IRS DPS application, involving the processing of imaged IRS tax submissions (e.g., 1040s), as defined in references (IRS DPS RFP 1991), (IRS DPS RFP Amendments), (IRS DPS RFP Q&A), and (IRS TSM 1991). Other potential applications to illustrate the invention include FBI fingerprint identification in references (FBI IAFIS RFC 1992), (FBI IAFIS SRD 1992), and filmless radiology in reference (Army MDIS RFP 1990). To understand the context of the invention, it is necessary to discuss application-specific work flow detail and a representative systems architecture.

Understanding DPS Pipeline Flow

IRS refers to the flow of work in a Service Center for the processing of incoming tax submissions as the pipeline process. We present DPS pipeline flow with the following topics:

(a) DPS Work Flow Overview

(b) A User's View of DPS Operations

(c) A Representative Distributed Architecture for DPS

(d) Automatic Flows

(e) The Major Flow

(f) Comments on Entity Check

(g) Flow Involving Data Perfection

(h) Conditional Flow

(i) Service Names and Locations

(j) Mandatory and Optional DPS Pipeline Services.

(a) DPS Work Flow Overview

DPS work flow involves the scanning, data capturing, data perfecting, and image archiving of paper IRS tax submissions at each of the ten IRS Service Centers. DPS is also responsible for processing EDI ASCII IRS tax submissions and facsimile IRS tax submissions, and for retrieving archived imaged submissions. DPS does not yet exist as of 1992; it is part of the overall IRS Tax Systems Modernization (TSM) initiative (IRS TSM 1991) to improve federal tax processing. The intent is that the various component TSM projects will be contracted, built, and installed in the 1990s. Although limited scanning is done today, workers at an IRS Service Center typically transcribe data manually from paper tax submissions. With DPS, images of tax submissions are moved from one function to the next with minimal human intervention. DPS will use OCR (optical character recognition) and ICR (intelligent character recognition) techniques to minimize manual data entry; however, (post-OCR and post-ICR) manual data capture and character recognition correction are still required to complete data capture, since OCR/ICR recognition rates are not 100% for all submissions (currently about 85% on average for combined hand-print and machine-print, and over about 95% for machine-print).

In general, DPS converts submissions from paper to image, and then from image to ASCII tax data records (TDRs). After validating a TDR with IRS provided software called TDPS (Tax Data Perfection System), which performs interfield addition checks among other things, DPS attempts to post a validated TDR to an external system named CAPS (Corporate Accounts Processing System). A successful post means that the transaction to add the validated TDR to the tax account of the taxpayer succeeded.

Flow of work (work flow) is conditional. Not all work moves through exactly the same functions; this depends on the current state of a submission. If the paper submission is not imageable or not OCR/ICR-able, then it goes to the manual data capture (MDC) function. If OCR/ICR is not complete, then it goes to the manual character recognition correction (CRC) functions. IRS refers to MDC and CRC together as supplemental data capture (SDC). If the submission is "white mail" (i.e., a letter) or of unrecognizable submission type, then it goes to an external, manual, white mail service for manual processing or manual submission type recognition. If the taxpayer does not use the preprinted label (about 50% of submissions), then its entity data (taxpayer name, address, . . . ) is sent to an external system for an entity check. If a TDR fails the TDPS check or the submission is not signed or the submission is missing component forms, then the submission (images and TDR) goes to the manual data perfection (DP) function where taxpayer notices are created and sent out. If appropriate, from DP the submission can be put into a suspended state, awaiting future taxpayer correspondence.
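The conditional routing rules above can be sketched as a small decision function. This is an illustrative Python sketch only, not the patent's implementation: the `Submission` field names, the function name `exception_functions`, the order of the checks, the early return for white mail, and the treatment of MDC and CRC as mutually exclusive are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submission:
    # All field names are hypothetical; they mirror conditions named in the text.
    imageable: bool = True
    ocr_icr_able: bool = True
    ocr_icr_complete: bool = True
    white_mail_or_unknown_type: bool = False
    has_preprinted_label: bool = True
    tdr_passes_tdps: bool = True
    signed: bool = True
    missing_forms: bool = False


def exception_functions(s: Submission) -> List[str]:
    """Return the exception functions a submission must visit, in check order."""
    if s.white_mail_or_unknown_type:
        # Assumption: white mail leaves the normal pipeline immediately.
        return ["WM"]
    route: List[str] = []
    if not s.imageable or not s.ocr_icr_able:
        route.append("MDC")  # manual data capture
    elif not s.ocr_icr_complete:
        route.append("CRC")  # manual character recognition correction
    if not s.has_preprinted_label:
        route.append("EC")   # external entity check
    if not s.tdr_passes_tdps or not s.signed or s.missing_forms:
        route.append("DP")   # manual data perfection
    return route
```

A submission that needs no exception handling yields an empty list, which corresponds to the fully automatic flow.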

In general, DPS can archive an imaged tax submission when its submission type is known and, either it has a preprinted label or it passed the entity check; it is not necessary to wait until the submission posts before archiving it. DPS archives the original electronic form of the submission (images, EDI ASCII, facsimile). DPS does not archive the original TDR of a submission, but it does archive a changed TDR (if modified by DP).

(b) A User's View of DPS Operations

FIG. 1 shows DPS operations from the user's (worker's) perspective; it includes non-DPS functions 110 that provide input to DPS 120. FIG. 1 shows nine steps: (1) delivering mail, (2) unloading the mail, (3) sorting standard-sized containers (envelopes) with the COMPS machines, (4a) processing remittances (e.g., checks), (4b) extracting documents and preparing them for scanning, (5) scanning documents with production scanners and exception scanners, (6) performing SDC, (7) performing DP, (8) operating the archives of imaged submissions, and (9) selectively archiving paper submissions. IRS estimates that over 98% of all paper submissions are production scannable, and that over 99% are scannable. It is expected that eventually, the paper archives, now at Federal Records Centers, will store less than 1% of incoming paper submissions.

FIG. 1 shows various worker roles in 130. The production control monitor is the name of a worker role that oversees pipeline production and runs a daily production meeting. System support worker roles include the system administrator, security administrator, and database administrator. DPS workers are grouped into teams with team leaders. DPS has a management hierarchy, not shown in FIG. 1.

(c) A Representative Distributed Architecture for DPS

FIG. 2 shows a representative top-level functional system architecture for DPS. This functional architecture diagram shows all major external interfaces to DPS (as internally labeled rectangles with curved corners around the DPS perimeter 200), all major subsystems of DPS (as internally labeled rectangles: 210, 220, 230, 240), all major functions of each subsystem (as a numbered list within each subsystem), and all material flows and data flows (with labeled arrows). This representative DPS architecture has four major subsystems:

Document Input Processing Subsystem, 210,

Storage and Retrieval Subsystem, 220,

End User Subsystem, 230, and

Processing Control and Management Reporting Subsystem, 240.

FIG. 3 shows a representative distributed architecture for DPS, consistent with FIG. 2. The architecture distributes function across multiple processors for scaling, parallelism, and availability. Each major subsystem contains processor boxes represented as labeled rectangles. Some labeled rectangles represent single processors, some processor clusters (e.g., a pair), and some multiple unclustered processors. A backbone interconnect switch 301, drawn as a tall bold rectangle, connects most of the labeled boxes. FIG. 3 does not show the number of processors of each type, only the function of that box.

The Document Input Processing Subsystem 210 has two major functional components (sub-subsystems): the Image Capture Component 311 and the Forms Processing Component 315. In the Image Capture Component, exactly one scanner 312 is connected to an Image Capture Manager 313, multiple Image Capture Managers can be connected to a FDDI Concentrator 314, and multiple FDDI Concentrators can be connected to the switch 301. In the Forms Processing Component 315, multiple Forms Processing Managers 316, Forms Recognition engines 317, Image Separation engines 318, and Character Recognition #3 engines 319 can be connected to the switch 301. ("Engine" is a synonym for "processor.") One Character Recognition #1 engine and one Character Recognition #2 engine are connected to each Forms Processing Manager 316.

The Storage and Retrieval Subsystem 220 has two major functional components (sub-subsystems): the Temporary Storage Component 321 and the Archival Storage Component 325. The Temporary Storage Component contains one Work-in-Process (WIP) Submission Index Server (processor cluster) 322 and multiple Temporary Storage Servers 323. The Archival Storage Component 325 contains one Archives Submission Index Server (processor cluster) 326, one Archives Server (processor cluster) 327, multiple Automated Disk Library Servers 328, and one Backup Archives (processor cluster) 329.

The End User Subsystem 230 contains multiple File Servers 331, with multiple Universal Workstations 333 connected to a File Server via a FDDI Concentrator 332.

The Processing Control and Management Reporting Subsystem 240 contains six functions: the Management Reporting Server 341, the System Management Server 342, the Security Server 343, the Software Distribution Server 344, the External Communications Gateway 345, and the EDI ASCII Server 346. Except for facsimile submission images, data communications to/from all DPS external interfaces leave/enter DPS via the External Communications Gateway 345; facsimile submissions enter an Image Capture Manager 313.

(d) Automatic Flows

FIG. 4 summarizes four basic automatic flows (no human intervention after any scanning), 400, as:

______________________________________
(1) IC, FP, TDPS, Post        410
(2) IC, FP, EC, TDPS, Post    420
(3) IC, FP, WM                430
(4) EDI                       440
______________________________________

(1) IC, FP, TDPS, Post. 410. In the representative architecture of DPS, each Image Capture Manager (ICM) 313 performs both submission type recognition and ICR (intelligent character recognition) on the primary form, which is that form that classifies the submission. IC refers to the Image Capture function, which includes scanning, automated quality assurance, submission type recognition, and ICR. Each ICM works on one submission at a time. After submission type recognition and primary form ICR, the ICM writes the submission images and ICR results as files into Temporary Storage 323, then sends an attributes message for this submission to the work-in-process (WIP) manager that executes on the WIP Submission Index Server 322. This WIP manager (with associated work queue manager, protocol, and administration interface) is an instance of the invention WFM. In this representative architecture, submission images have well-defined file names with a common prefix, the USID (unique submission identifier), and an image-relative suffix (e.g., <USID>.00001, <USID>.00002, . . . ). ICR results appear in a file with the recognition confidence levels.
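The image file naming convention described above (a USID prefix followed by an image-relative suffix) can be illustrated with a short Python sketch. The function name `image_file_names` and the sample USID are hypothetical; the five-digit, 1-based suffix is inferred from the example names in the text.

```python
from typing import List


def image_file_names(usid: str, image_count: int) -> List[str]:
    """Build image file names of the form <USID>.00001, <USID>.00002, ..."""
    # Suffix width (5 digits) and 1-based numbering follow the example in the text.
    return [f"{usid}.{n:05d}" for n in range(1, image_count + 1)]


# "S123" is a made-up USID used only for illustration.
names = image_file_names("S123", 3)
```

Because every file of a submission shares the USID prefix, a service that is handed only the USID and the image count can locate all of the submission's images in Temporary Storage.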

From the attributes message, the WIP manager creates an entry for this submission in the WIP submission index database, then examines the attributes to determine what to do next. As processing continues, the WIP manager will maintain the state of this and each submission in the WIP submission index database. All submissions go to automated Forms Processing (FP) 315 after IC, where FP creates the initial Tax Data Record (TDR). FP writes the TDR file (e.g., named <USID>.tdr) into Temporary Storage. In this first scenario, we assume that no Entity Check (EC) is needed (e.g., the unchanged IRS label is affixed). So, after FP, the WIP manager routes this submission's TDR to the TDPS service for TDR validation to check interfield arithmetic and other integrity assertions, and awaits the result. In this representative architecture for DPS, the TDPS service executes on each file server 331. If the TDPS result is positive, then the WIP manager routes this submission's TDR to the external Posting service (Post) 353, and awaits the result. If the Posting result is positive, then, except for archiving, we are almost done processing this submission. It is possible that archiving is done before the TDPS check.
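As a rough illustration of this first automatic flow, the following Python sketch tracks one submission's state while it moves through FP, TDPS, and Post. It is a toy model under stated assumptions: the class `WipManager`, its methods, the state strings, and the plain dictionary standing in for the WIP submission index database are all hypothetical. For brevity the sketch calls each service directly; it does not model the work queues or the pull protocol described elsewhere in this document.

```python
from typing import Callable, Dict

# Flow (1) from the text: after IC, the submission visits FP, TDPS, and Post in order.
AUTOMATIC_FLOW_1 = ["FP", "TDPS", "Post"]


class WipManager:
    """Toy work-in-process manager that records each submission's current state."""

    def __init__(self) -> None:
        # Stand-in for the WIP submission index database (USID -> state).
        self.index: Dict[str, str] = {}

    def receive_attributes(self, usid: str) -> None:
        # An attributes message from the ICM creates the index entry.
        self.index[usid] = "IC done"

    def run(self, usid: str, services: Dict[str, Callable[[str], bool]]) -> str:
        # Route to each service in turn and record its result;
        # stop at the first negative result.
        for name in AUTOMATIC_FLOW_1:
            ok = services[name](usid)
            self.index[usid] = f"{name} {'done' if ok else 'failed'}"
            if not ok:
                break
        return self.index[usid]
```

In this sketch a negative TDPS result simply stops the flow; in the text, such a submission would instead be routed to manual data perfection.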

In this representative architecture, the enabling condition to archive a submission is when we know the entity (preprinted label affixed or positive EC) and when we know the submission type. The submission type determines the archival retention class (i.e., 7 years for individual returns, 75 years for business and other returns). In this automatic flow scenario, the WIP manager knows the submission type when it first examines the submission attributes. So after the WIP manager creates a database entry for this submission, it simultaneously routes the submission's TDR to the TDPS service and submission's (USID, #images, retention class) to the DPS Archives service. The Archives service, which executes on the Archives Server 327, archives the submission's images then notifies the WIP manager in 322 when this operation completes.
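The archiving condition and retention classes stated above reduce to two small predicates. The Python sketch below is illustrative only; the function and parameter names are assumptions, and an unknown submission type is represented here as `None`.

```python
from typing import Optional


def can_archive(submission_type: Optional[str],
                has_preprinted_label: bool,
                entity_check_passed: bool) -> bool:
    """Archiving is enabled once both the entity and the submission type are known."""
    entity_known = has_preprinted_label or entity_check_passed
    return submission_type is not None and entity_known


def retention_years(is_individual_return: bool) -> int:
    """Retention class from the text: 7 years for individual returns, 75 otherwise."""
    return 7 if is_individual_return else 75
```

Note that neither predicate refers to the TDPS or posting result, which is why archiving can proceed in parallel with TDR validation.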

After TDR posting, images are archived and purged from Temporary Storage. A submission's TDR lingers in Temporary Storage for a while (about 12 days), per a DPS requirement. After a required waiting time or after early notification, the WIP manager routes the submission's USID to the File Purging Service and to the TDR Purging Service, to purge (delete) the submission from Temporary Storage.
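The purge trigger (waiting time elapsed, or early notification received) can be sketched as a single check. This Python fragment is a hedged illustration: the name `ready_to_purge`, the exact 12-day constant (the text says "about 12 days"), and the choice to measure the wait from the posting time are all assumptions.

```python
from datetime import datetime, timedelta

# "About 12 days" in the text; the exact value here is an assumption.
TDR_WAIT = timedelta(days=12)


def ready_to_purge(posted_at: datetime, now: datetime,
                   early_notification: bool = False) -> bool:
    """True once the waiting time has elapsed or an early notification has arrived."""
    return early_notification or (now - posted_at) >= TDR_WAIT
```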

Two other things happen. First, after both posting and archiving, the WIP manager uses the Submission Index Entry Moving Service, which executes on the Archives Submission Index Server 326, to move the entry for this submission from the WIP submission index database in 322 to the archives submission index database in 326. (This movement can occur during third shift.) Second, the Archived Submission Retention Management Service, which also executes on the Archives Submission Index Server, manages the retention of the archived submission, which can be extended due to court cases.

(2) IC, FP, EC, TDPS, Post. In automatic flow scenario 420, an Entity Check (EC) is required, possibly because an IRS label is not affixed. In this case, after FP, the WIP manager routes the submission to the external EC service 351, and awaits the result. If the EC is positive, then the WIP manager routes work to the TDPS and Posting services, as described in (1), assuming that all results are positive.

(3) IC, FP, WM. In automatic flow scenario 430, submission type recognition in the ICM cannot identify the submission type so the WIP manager routes it to FP, which identifies it as white mail. So, the WIP manager routes it to the external White Mail (WM) service 350. The WIP manager awaits a disposition response from WM to archive or delete the submission.

(4) EDI. 440. The Electronic Filing System (EFS) 354, external to DPS, forwards EDI ASCII submissions to DPS through the EDI ASCII Server 346. In this representative architecture, the EDI In-Boun