BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention disclosed broadly relates to data processing and more particularly relates to the architecture and methods of work flow management in a distributed data processing system.
2. Prior Art
The following patent application relates to the invention:
U.S. patent application Ser. No. 07/902,908 filed Jun. 22, 1992 entitled "System and Method for Establishing Work Procedures in a Work Process Management System," by Marvin Addink, et al., assigned to the IBM Corporation, now abandoned.
(Mason 1985) U.S. Pat. No. 4,503,499 by G. R. Mason, et al., entitled "Controlled Work Flow System" (Mar. 5, 1985);
(Beizer 1991) U.S. Pat. No. 5,054,096 by M. M. Beizer, entitled "Method and Apparatus for Converting Documents into Electronic Data for Transaction Processing," (Oct. 1, 1991) and the related Invention identified above.
To understand prior art, it is necessary to understand differing associated terminology.
(Mason 1985) describes an invention "system for automating office procedure to coordinate the flow of work on (imaged) documents and the transmittal of documents between office personnel," but it uses a different work flow management method than
the present invention.
"In the controlled work flow process, the scheduling of work tasks in a project is controlled by the program executed by the data processor, which program is called the "work daemon." Each paperwork project involving a multiplicity of documents
and a multiplicity of personnel to work on the documents is referred to as an "effort" and files referred to as RO files, each designating the schedule of paperwork for a given effort, are stored in the memory. Each effort is broken down into tasks with
each task to be performed by one individual worker on one document. These tasks are called "work events." The individual workers and managers who make use of the controlled work flow system are called "users." The work daemon operates on the effort RO
files to notify the users at the work stations that work events are ready to be carried out and issues BORROW, RETURN, STORE, and COPY requests to the central library facility to transmit documents between the work station processors and the central
library facility. When a user indicates that he wishes to proceed with a specified work event, which involves modification of an existing document, the work daemon will issue a BORROW request to the central library facility to cause the document of the
work event to be sent to the work station processor corresponding to the user for the work event. When a user indicates that he has completed a work event on a borrowed or newly created document, the work daemon will issue a RETURN or STORE request to
the central library facility and cause the document, as modified, to be sent back to the central library facility for storage back in the library memory.
"An effort manager will be named to be in charge of each effort and an effort manager will usually have the responsibility for setting up the schedule of work events in a work effort. This schedule is called a "route specification" and each
effort RO file is generated from the information in a route specification. To generate a route specification, the effort manager makes use of a program called the "effort program" in which he enters the data of a route specification into the computer
system by means of one of the work stations. The effort program is preferably resident at each of the remote processors. The data will go first into a route specification file and, from the route specification file, it will be compiled into an effort
RO file and stored in the memory."
(Beizer 1991) describes an invention for subjecting a large volume of scanned documents to transaction processing, together with routing programs to dictate the information flow, but it provides almost no detail on how the routing method works.
It states the following about the routing program, which is the method for work flow management: "Basically the route program decides where the image of the document, images of related documents and data extracted by an optical character reader (OCR),
which together form the "record," are to be transmitted and when. This routing program considers the document type, overall workload in the company, the capabilities of the department, and special instructions. The special instructions may be created
by operator intervention so that privileged customer documents are given priority treatment." In contrast, the invention described herein provides details on a specific family of routing methods.
The related invention identified above defines a method for establishing and executing work procedures in a work process management system for either imaged documents or multimedia documents, but compared to the
present invention it uses a different work flow definition method, it is based on a different system architecture, and it has a different theory of operation. It defines a "work process" as a combination of "work baskets," "decision points," "collection
points," "events," and "routes." An "object" is the smallest unit within the system (e.g., single document image, voice record, video record), a "folder" is a collection of objects with a common identifier, and a "work package" is (or points to) the
entity that is worked on by a work process; a work package has a "work package identifier" and a "work package instance." The method defines a
relational database based on these concepts, populates it with the details of a specific application through use of a work management definition program, and uses the database to maintain the state of work package instances for that specific application
through use of control programs at a host computer and workstations. In contrast to (BT9-92-005), the invention described herein partitions the work flow into a centralized control component and a distributed services component, defines the work flow
with a general STD, and uses a centralized control mechanism with centralized work queues to dispatch work to those services.
Existing commercially available work flow management software products known to the applicant differ in theory of operation and other key features from the present WFM invention. IBM ImagePlus, used by USAA for automobile insurance form
processing, does not use DCE, does not run on a POSIX-based operating system, and does not use the same "pull system" method, among other differences. The same holds for TASC-Flow (TASC 1992), used in a major bank's mortgage processing division. The
Plexus work flow manager, used by American Express, does not use DCE and has a different theory of operation. Scale-up properties of commercially available work flow management software, while unbounded in principle, are not widely understood except for
specific installations.
BACKGROUND
A work flow manager (WFM), or process manager, is the software that manages and controls the complete processing of work items. A WFM is sometimes referred to as a "router" or "traffic cop" since it manages (controls, monitors, maintains) the
flow of imaged work; it typically includes a "dispatcher" to apportion out work assignments. Applications of WFMs include the processing of imaged or multimedia documents for health and other insurance forms, filmless radiology, IRS tax submissions, and
FBI fingerprint and voice identification.
A combined architecture and method for a scalable WFM is needed to address emerging, huge-size, federal image processing problems. This WFM software should execute on a POSIX-based operating system (e.g., see (IEEE POSIX 1003.1 1990), (Lewine
1991)), since federal programs typically require POSIX compliance. In addition, such a WFM should be based on OSF Distributed Computing Environment (DCE) (OSF DCE 1991), an emerging de facto standard for distributed computing. Prior to this invention,
this applicant knows of no such WFM.
An illustrative WFM application is for the IRS Document Processing System (DPS), an IRS Tax System Modernization (TSM) program to automate IRS Service Center operations by scanning incoming paper tax submissions and using image processing
techniques to complete the work. DPS is defined in references (IRS DPS RFP 1991), (IRS DPS RFP Amendments), (IRS DPS RFP Q&A), and (IRS TSM 1991). DPS has a peak period processing requirement (in April) of about 232K tax submissions/day (or about 3.1M
images/day), per IRS Service Center. The size of DPS at one IRS Service Center is estimated by some measures at over ten times the size of the USAA application of IBM ImagePlus, previously considered a large application of imaging and work flow. Unlike
the USAA application however, DPS requires production scanning and optical character recognition.
OBJECTS OF THE INVENTION
An object of the invention is to provide an improved, scalable method for work flow management in a distributed data processing system. An intended and illustrative application of this method is for the IRS DPS.
Another object of the invention is to provide an improved method for work flow management that uses OSF Distributed Computing Environment (DCE) and that can use POSIX-based operating systems in a distributed data processing system.
A further object of the invention is to provide a common "pull system" protocol for component DCE services comprising the application work flow controlled by work flow management in a data processing system.
SUMMARY OF THE INVENTION
Problem Statement. This WFM has the following requirements and properties.
(a) Design Requirements
POSIX based--it can run on a UNIX or UNIX-like operating system;
scalable--the WFM mechanism can scale-up well to production image applications that include automated components, production workers, and knowledge workers, such as the IRS DPS; and
application flexibility--the WFM mechanism applies to the IRS DPS application, and can evolve gracefully as the DPS application evolves over time.
Invention Properties. This WFM design satisfies the above design requirements and in addition has the following properties.
(b) Software Environment Properties
is based on OSF DCE--implements functions of the application process as OSF DCE application services; executes under OSF DCE (OSF DCE 1991); and
uses an "attribute-based file system" to store work-in-process; the attribute-based file system can be implemented with a database system to store the state and other attributes of work objects and with a distributed file service like OSF DCE
Distributed File Service (DFS) to store work objects (e.g., files containing imaged documents and related data).
(c) Invention Properties
(c1) has application generality and flexibility--an application-specific state transition diagram (STD) defines the application process to be managed, where the STD can capture conditional work flow; during operation, the WFM administrator can
alter the application process STD under certain conditions; see Chapter 5 in (Rumbaugh 1991) for a definition of a STD;
(c2) uses centralized control software--has a work-in-process manager (software) to manage the states of all work items, and has a work queue manager (software) to manage work items for the application process services;
(c3) fills the service-specific work queues with work items (e.g., unique submission identifiers and minimal service-related work item attributes) such that a work queue for a service can feed multiple copies of that service, and the contents of
work queues are resilient to the failure or nonavailability of the services they feed;
(c4) uses an overall "pull system" design (application services "pull" work to do; the mechanism does not "push" work on services) to achieve simplicity of mechanism and to accommodate differences in the time to complete the various process
functions (human, semi-automated, automated).
The insight and subtlety of this invention lie in its overall simplicity of mechanism, in the selection and integration of its design component mechanisms (i.e., DCE, attribute-based file system, embedded STD, centralized control, partitionable
and resilient work queues), and in the overall "pull system" design. The applicant knows of no other WFM with the above combination of properties.
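As an illustration only, the "pull system" of properties (c3) and (c4) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the protocol of FIG. 12: the class name WorkQueue and its methods (put, pull, complete, requeue) are hypothetical names chosen for this example, and the queue is held in memory rather than in a database. The sketch shows one service-specific queue feeding two copies of a service, with a pulled item returned to the queue when a service copy fails.

```python
from collections import deque


class WorkQueue:
    """Hypothetical service-specific work queue held by the central control component."""

    def __init__(self, service_name):
        self.service_name = service_name
        self._pending = deque()   # work items waiting to be pulled
        self._in_progress = {}    # usid -> item, pulled but not yet completed

    def put(self, usid, attributes=None):
        # The control side fills the queue; it never pushes work onto a service.
        self._pending.append((usid, attributes or {}))

    def pull(self):
        # A service copy asks for work when it is ready; None means no work is queued.
        if not self._pending:
            return None
        item = self._pending.popleft()
        self._in_progress[item[0]] = item
        return item

    def complete(self, usid):
        # The service reports completion, so the item is forgotten by the queue.
        self._in_progress.pop(usid, None)

    def requeue(self, usid):
        # If a service copy fails, its item returns to the front of the queue.
        item = self._in_progress.pop(usid, None)
        if item is not None:
            self._pending.appendleft(item)

    def pending_count(self):
        return len(self._pending)


# One queue feeds two copies of the same (assumed) service.
queue = WorkQueue("TDPS")
for usid in ("S1", "S2", "S3"):
    queue.put(usid)
item_a = queue.pull()        # copy A pulls S1
item_b = queue.pull()        # copy B pulls S2
queue.requeue(item_a[0])     # copy A fails, so S1 becomes available again
queue.complete(item_b[0])    # copy B finishes S2
```

Because each service copy requests work only when it is free, a slow human-operated service and a fast automated one can share the same mechanism without any rate matching by the control component.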
BRIEF DESCRIPTION OF THE DRAWINGS
The features and advantages of the invention will be more fully appreciated with reference to the accompanying figures. FIG. 1 to FIG. 7 help present the DPS application, a representative example, its overall architecture and work flow. FIG. 8
and FIG. 9 help illustrate DCE. FIG. 10 shows a representative DCE name space for DPS. FIG. 11, FIG. 12, FIG. 14, FIG. 15, and FIG. 16 illustrate the invention, and FIG. 13 defines the notation used in FIG. 14. FIG. 12 best illustrates the invention
(method).
DESCRIPTION
FIG. 1 shows DPS operation from the worker's perspective.
FIG. 2 is a representative DPS architecture, shown as a top-level functional architecture diagram.
FIG. 3 is an example DPS distributed architecture consistent with FIG. 2.
FIG. 4 shows DPS conditional flow examples.
FIG. 5 summarizes the DPS conditional flow examples in one diagram.
FIG. 6 lists DPS and non-DPS application software structured as services.
FIG. 7, which combines FIG. 3 and FIG. 6, shows where key DPS software services execute.
FIG. 8 shows the layering of DCE and related software.
FIG. 9 is the DCE architecture.
FIG. 10 shows a representative DCE name space for DPS.
FIG. 11 is the top-level functional architecture diagram of the WFM.
FIG. 12 shows the steps in the invention's common pull protocol.
FIG. 13 explains the notation used in FIG. 14 for a state transition diagram, notation adopted from Chapter 5 of reference (Rumbaugh 1991).
FIG. 14 shows an illustrative state transition diagram of the life cycle of a DPS submission.
FIG. 15 contains a flow diagram of the method shown in FIG. 12.
FIG. 16 shows a multimedia, filmless radiology application of the WFM.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENT
1. Context of the Invention
The context of the invention is a distributed computing application with imaged documents, where the image processing software executes on a version of the UNIX operating system (e.g., AIX) with OSF DCE. This section describes an example
application context first and OSF DCE context second.
2. Example Application
To illustrate the WFM invention, we use the IRS DPS application, involving the processing of imaged IRS tax submissions (e.g., 1040s), as defined in references (IRS DPS RFP 1991), (IRS DPS RFP Amendments), (IRS DPS RFP Q&A), and (IRS TSM 1991).
Other potential applications to illustrate the invention include FBI fingerprint identification in references (FBI IAFIS RFC 1992), (FBI IAFIS SRD 1992), and filmless radiology in reference (Army MDIS RFP 1990). To understand the context of the
invention, it is necessary to discuss application-specific work flow detail and a representative systems architecture.
Understanding DPS Pipeline Flow
IRS refers to the flow of work in a Service Center for the processing of incoming tax submissions as the pipeline process. We present DPS pipeline flow with the following topics:
(a) DPS Work Flow Overview
(b) A User's View of DPS Operations
(c) A Representative Distributed Architecture for DPS
(d) Automatic Flows
(e) The Major Flow
(f) Comments on Entity Check
(g) Flow Involving Data Perfection
(h) Conditional Flow
(i) Service Names and Locations
(j) Mandatory and Optional DPS Pipeline Services.
(a) DPS Work Flow Overview
DPS work flow involves the scanning, data capturing, data perfecting, and image archiving of paper IRS tax submissions at each of the ten IRS Service Centers. DPS is also responsible for processing EDI ASCII IRS tax submissions and facsimile IRS
tax submissions, and for retrieving archived imaged submissions. DPS does not exist currently in 1992; it is part of the overall IRS Tax Systems Modernization (TSM) initiative (IRS TSM 1991) to improve federal tax processing. The intent is that the
various component TSM projects will be contracted, built, and installed in the 1990's. While limited scanning is done today, workers at an IRS Service Center typically transcribe data manually from paper tax submissions. With DPS, images of
tax submissions are moved from one function to the next, with minimal human intervention. DPS will use OCR (optical character recognition) and ICR (intelligent character recognition) techniques to minimize manual data entry; however, (post-OCR and
post-ICR) manual data capture and character recognition correction are still required to complete data capture, since OCR/ICR recognition rates are not 100% for all submissions (currently about 85% on average for combined hand-print and machine-print,
and over about 95% for machine-print).
In general, DPS converts submissions from paper to image, and then from image to ASCII tax data records (TDRs). After validating a TDR with IRS provided software called TDPS (Tax Data Perfection System), which performs interfield addition checks
among other things, DPS attempts to post a validated TDR to an external system named CAPS (Corporate Accounts Processing System). A successful post means that the transaction to add the validated TDR to the tax account of the taxpayer succeeded.
Flow of work (work flow) is conditional. Not all work moves through exactly the same functions; this depends on the current state of a submission. If the paper submission is not imageable or not OCR/ICR-able, then it goes to the manual data
capture (MDC) function. If OCR/ICR is not complete, then it goes to the manual character recognition correction (CRC) function. IRS refers to MDC and CRC together as supplemental data capture (SDC). If the submission is "white mail" (i.e., a letter)
or of unrecognizable submission type, then it goes to an external, manual, white mail service for manual processing or manual submission type recognition. If the taxpayer does not use the preprinted label (about 50% of submissions), then its entity data
(taxpayer name, address, . . . ) is sent to an external system for an entity check. If a TDR fails the TDPS check or the submission is not signed or the submission is missing component forms, then the submission (images and TDR) goes to the manual data
perfection (DP) function where taxpayer notices are created and sent out. If appropriate, from DP the submission can be put into a suspended state, awaiting future taxpayer correspondence.
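The conditional flow rules just described can be illustrated with a short Python sketch. It is offered only as an aid to reading, under stated assumptions: the Submission record, its field names, and the function conditional_functions are hypothetical, and the sketch simply lists which conditional functions apply to a submission in the order the text mentions them. It does not assert any precedence among the functions and does not model the suspended state.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical flags describing the current state of one submission."""
    imageable: bool = True
    ocr_icr_able: bool = True
    ocr_icr_complete: bool = True
    white_mail: bool = False
    type_recognized: bool = True
    preprinted_label: bool = True
    tdps_passed: bool = True
    signed: bool = True
    forms_complete: bool = True


def conditional_functions(sub):
    """Return the conditional functions that apply, in the order the text lists them."""
    needed = []
    if not sub.imageable or not sub.ocr_icr_able:
        needed.append("MDC")   # manual data capture
    if not sub.ocr_icr_complete:
        needed.append("CRC")   # manual character recognition correction
    if sub.white_mail or not sub.type_recognized:
        needed.append("WM")    # external white mail service
    if not sub.preprinted_label:
        needed.append("EC")    # external entity check
    if not sub.tdps_passed or not sub.signed or not sub.forms_complete:
        needed.append("DP")    # manual data perfection
    return needed


# A submission without the preprinted label and without a signature.
example = conditional_functions(Submission(preprinted_label=False, signed=False))
```

A submission with every flag at its default value needs none of these functions and follows the fully automatic path.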
In general, DPS can archive an imaged tax submission when its submission type is known and either it has a preprinted label or it passed the entity check; it is not necessary to wait until the submission posts before archiving it. DPS archives
the original electronic form of the submission (images, EDI ASCII, facsimile). DPS does not archive the original TDR of a submission, but it does archive a changed TDR (if modified by DP).
(b) A User's View of DPS Operations
FIG. 1 shows DPS operations from the user's (worker's) perspective; it includes non-DPS functions 110 that provide input to DPS 120. FIG. 1 shows nine steps: (1) delivering mail, (2) unloading the mail, (3) sorting standard-sized containers
(envelopes) with the COMPS machines, (4a) processing remittances (e.g., checks), (4b) extracting documents and preparing them for scanning, (5) scanning documents with production scanners and exception scanners, (6) performing SDC, (7) performing DP, (8)
operating the archives of imaged submissions, and (9) selectively archiving paper submissions. IRS estimates that over 98% of all paper submissions are production scannable, and that over 99% are scannable. It is expected that eventually, the paper
archives, now at Federal Records Centers, will store less than 1% of incoming paper submissions.
FIG. 1 shows various worker roles in 130. The production control monitor is the name of a worker role that oversees pipeline production and runs a daily production meeting. System support worker roles include the system administrator, security
administrator, and database administrator. DPS workers are grouped into teams with team leaders. DPS has a management hierarchy, not shown in FIG. 1.
(c) A Representative Distributed Architecture for DPS
FIG. 2 shows a representative top-level functional system architecture for DPS. This functional architecture diagram shows all major external interfaces to DPS (as internally labeled rectangles with curved corners around the DPS perimeter 200),
all major subsystems of DPS (as internally labeled rectangles: 210, 220, 230, 240), all major functions of each subsystem (as a numbered list within each subsystem), and all material flows and data flows (with labeled arrows). This representative DPS
architecture has four major subsystems:
Document Input Processing Subsystem, 210,
Storage and Retrieval Subsystem, 220,
End User Subsystem, 230, and
Processing Control and Management Reporting Subsystem, 240.
FIG. 3 shows a representative distributed architecture for DPS, consistent with FIG. 2. The architecture distributes function across multiple processors for scaling, parallelism, and availability. Each major subsystem contains processor boxes
represented as labeled rectangles. Some labeled rectangles represent single processors, some processor clusters (e.g., a pair), and some multiple unclustered processors. A backbone interconnect switch 301, drawn as a tall bold rectangle, connects most
of the labeled boxes. FIG. 3 does not show the number of processors of each type, only the function of that box.
The Document Input Processing Subsystem 210 has two major functional components (sub-subsystems): the Image Capture Component 311 and the Forms Processing Component 315. In the Image Capture Component, exactly one scanner 312 is connected to an
Image Capture Manager 313, multiple Image Capture Managers can be connected to a FDDI Concentrator 314, and multiple FDDI Concentrators can be connected to the switch 301. In the Forms Processing Component 315, multiple Forms Processing Managers 316,
Forms Recognition engines 317, Image Separation engines 318, and Character Recognition #3 engines 319 can be connected to the switch 301. ("Engine" is a synonym for "processor.") One Character Recognition #1 engine and one Character Recognition #2
engine are connected to each Forms Processing Manager 316.
The Storage and Retrieval Subsystem 220 has two major functional components (sub-subsystems): the Temporary Storage Component 321 and the Archival Storage Component 325. The Temporary Storage Component contains one Work-in-Process (WIP)
Submission Index Server (processor cluster) 322 and multiple Temporary Storage Servers 323. The Archival Storage Component 325 contains one Archives Submission Index Server (processor cluster) 326, one Archives Server (processor cluster) 327, multiple
Automated Disk Library Servers 328, and one Backup Archives (processor cluster) 329.
The End User Subsystem 230 contains multiple File Servers 331, with multiple Universal Workstations 333 connected to a File Server via a FDDI Concentrator 332.
The Processing Control and Management Reporting Subsystem 240 contains six functions: the Management Reporting Server 341, the System Management Server 342, the Security Server 343, the Software Distribution Server 344, the External
Communications Gateway 345, and the EDI ASCII Server 346. Except for facsimile submission images, data communications to/from all DPS external interfaces leave/enter DPS via the External Communications Gateway 345; facsimile submissions enter an Image
Capture Manager 313.
(d) Automatic Flows
FIG. 4 summarizes four basic automatic flows (no human intervention after any scanning), 400, as:
(1) IC, FP, TDPS, Post        410
(2) IC, FP, EC, TDPS, Post    420
(3) IC, FP, WM                430
(4) EDI                       440
(1) IC, FP, TDPS, Post. 410. In the representative architecture of DPS, each Image Capture Manager (ICM) 313 performs both submission type recognition and ICR (intelligent character recognition) on the primary form, which is that form that
classifies the submission. IC refers to the Image Capture function, which includes scanning, automated quality assurance, submission type recognition, and ICR. Each ICM works on one submission at a time. After submission type recognition and primary
form ICR, the ICM writes the submission images and ICR results as files into Temporary Storage 323, then sends an attributes message for this submission to the work-in-process (WIP) manager that executes on the WIP Submission Index Server 322. This WIP
manager (with associated work queue manager, protocol, and administration interface) is an instance of the invention WFM. In this representative architecture, submission images have well-defined file names with a common prefix, the USID (unique
submission identifier), and an image-relative suffix (e.g., <USID>.00001, <USID>.00002, . . . ). ICR results appear in a file with the recognition confidence levels.
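The image file naming convention above can be sketched in Python as follows. This is a minimal sketch under assumptions: the function names image_file_names and usid_of are hypothetical, the USID value shown is invented for the example, and the five-digit zero-padded suffix starting at 00001 is inferred from the two sample names given in the text rather than from a stated rule.

```python
def image_file_names(usid, image_count):
    """Build the image file names for one submission: the USID plus a numeric suffix."""
    # Assumed format: suffixes run 00001, 00002, ... with five zero-padded digits.
    return [f"{usid}.{n:05d}" for n in range(1, image_count + 1)]


def usid_of(file_name):
    """Recover the USID (the common prefix) from a file name of that form."""
    return file_name.rsplit(".", 1)[0]


names = image_file_names("U0001", 3)   # "U0001" is a made-up identifier
```

Because every file of a submission shares the USID prefix, a service that receives only the USID and the number of images can locate all of the submission's image files in Temporary Storage.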
From the attributes message, the WIP manager creates an entry for this submission in the WIP submission index database, then examines the attributes to determine what to do next. As processing continues, the WIP manager will maintain the state
of this and each submission in the WIP submission index database. All submissions go to automated Forms Processing (FP) 315 after IC, where FP creates the initial Tax Data Record (TDR). FP writes the TDR file (e.g., named <USID>.tdr)
into Temporary Storage. In this first scenario, we assume that no Entity Check (EC) is needed (e.g., the unchanged IRS label is affixed). So, after FP, the WIP manager routes this submission's TDR to the TDPS service for TDR validation to check
interfield arithmetic and other integrity assertions, and awaits the result. In this representative architecture for DPS, the TDPS service executes on each file server 331. If the TDPS result is positive, then the WIP manager routes this submission's
TDR to the external Posting service (Post) 353, and awaits the result. If the Posting result is positive, then, except for archiving, we are almost done processing this submission. It is possible that archiving is done before the TDPS check.
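The way the WIP manager maintains submission state along this first automatic flow can be sketched as a small transition table in Python. This is a sketch under stated assumptions and is not the state transition diagram of FIG. 14: the state names, event names, and the WipIndex class are all hypothetical, the index is an in-memory dictionary rather than a database, and only the positive results described above are modeled.

```python
# Hypothetical (state, event) -> next state table for automatic flow (1).
TRANSITIONS = {
    ("AWAITING_FP", "tdr_created"): "AWAITING_TDPS",
    ("AWAITING_TDPS", "tdps_positive"): "AWAITING_POST",
    ("AWAITING_POST", "post_positive"): "POSTED",
}


class WipIndex:
    """Stand-in for the WIP submission index database, keyed by USID."""

    def __init__(self):
        self._state = {}

    def create_entry(self, usid):
        # Created when the attributes message arrives after Image Capture.
        self._state[usid] = "AWAITING_FP"

    def on_event(self, usid, event):
        # Advance the submission only along transitions listed in the table.
        key = (self._state[usid], event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition for {key}")
        self._state[usid] = TRANSITIONS[key]
        return self._state[usid]

    def state_of(self, usid):
        return self._state[usid]


index = WipIndex()
index.create_entry("U0001")   # "U0001" is a made-up identifier
for event in ("tdr_created", "tdps_positive", "post_positive"):
    index.on_event("U0001", event)
```

Keeping the transitions in a table rather than in code is one way to let an administrator alter the flow without changing the control program, which matches the flexibility property stated in the summary.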
In this representative architecture, the enabling condition to archive a submission is when we know the entity (preprinted label affixed or positive EC) and when we know the submission type. The submission type determines the archival retention
class (i.e., 7 years for individual returns, 75 years for business and other returns). In this automatic flow scenario, the WIP manager knows the submission type when it first examines the submission attributes. So after the WIP manager creates a
database entry for this submission, it simultaneously routes the submission's TDR to the TDPS service and the submission's (USID, #images, retention class) to the DPS Archives service. The Archives service, which executes on the Archives Server 327,
archives the submission's images then notifies the WIP manager in 322 when this operation completes.
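The archive enabling condition and the retention class rule can be written out as a short Python sketch. The function names can_archive, retention_years, and archive_request are hypothetical, as is the use of a plain tuple for the work item; the 7-year and 75-year values and the (USID, #images, retention class) triple are taken from the text above.

```python
def can_archive(submission_type, preprinted_label, entity_check_positive):
    """True when both the submission type and the entity are known."""
    entity_known = preprinted_label or entity_check_positive
    return submission_type is not None and entity_known


def retention_years(is_individual_return):
    """Retention class: 7 years for individual returns, 75 years otherwise."""
    return 7 if is_individual_return else 75


def archive_request(usid, image_count, is_individual_return):
    """Build the (USID, #images, retention class) item routed to the Archives service."""
    return (usid, image_count, retention_years(is_individual_return))


# A labeled individual return with 12 images; "U0001" is a made-up identifier.
if can_archive("1040", preprinted_label=True, entity_check_positive=False):
    request = archive_request("U0001", 12, is_individual_return=True)
```

Note that the condition does not mention posting, consistent with the statement that archiving need not wait until the submission posts.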
After TDR posting, images are archived and purged from Temporary Storage. A submission's TDR lingers in Temporary Storage for a while (about 12 days), a DPS requirement. After a required waiting time or after early notification, the WIP manager
routes the submission's USID to the File Purging Service and to the TDR Purging Service, to purge (delete) the submission from Temporary Storage.
Two other things happen. First, after both posting and archiving, the WIP manager uses the Submission Index Entry Moving Service, which executes on the Archives Submission Index Server 326, to move the entry for this submission from the WIP
submission index database in 322 to the archives submission index database in 326. (This movement can occur during third shift.) Second, the Archived Submission Retention Management Service, which also executes on the Archives Submission Index Server,
manages the retention of the archived submission, which can be extended due to court cases.
(2) IC, FP, EC, TDPS, Post. In automatic flow scenario 420, an Entity Check (EC) is required, possibly because an IRS label is not affixed. In this case, after FP, the WIP manager routes the submission to the external EC service 351, and awaits
the result. If the EC is positive, then the WIP manager routes work to the TDPS and Posting services, as described in (1), assuming that all results are positive.
(3) IC, FP, WM. In automatic flow scenario 430, submission type recognition in the ICM cannot identify the submission type so the WIP manager routes it to FP, which identifies it as white mail. So, the WIP manager routes it to the external
White Mail (WM) service 350. The WIP manager awaits a disposition response from WM to archive or delete the submission.
(4) EDI. 440. The Electronic Filing System (EFS) 354, external to DPS, forwards EDI ASCII submissions to DPS through the EDI ASCII Server 346. In this representative architecture, the EDI In-Boun