Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the preliminary processing of documents
which reflect business transactions and particularly to the correction of
data commensurate with information entered by hand on documents evidencing
retail sales, the documents having been initially processed by optical
character recognition apparatus. More specifically, this invention is
directed to optical character recognition systems for reading, correcting
and preliminarily processing data, including handprinted information, from
documents pertaining to retail sales transactions. The present invention
also encompasses uniquely formatted documents for use in such transactions.
Accordingly, the general objects of the present invention are to provide
novel and improved methods, systems and documents of such character.
2. Description of the Prior Art
While not limited thereto in its utility, the present invention is
particularly well suited for use in the field of "remittance processing",
i.e., in the recovery and initial processing of information entered on
documents which are completed at the time a charge card holder enters into
a retail transaction with a merchant. The exceedingly high volume of such
transactions, coupled with the desire to minimize the time between the
transaction and the charging and/or crediting of the dollar amount thereof
to the appropriate accounts, imposes the requirements of speed and
accuracy on the systems and techniques for processing the related
documents.
Optical character recognition (OCR) apparatus has previously been used in
"remittance processing" to read information from documents of the type
alluded to above. The prior art OCR systems were capable of "reading",
with an acceptable speed and low rejection rate, machine printed
information found on documents completed at the time of retail sales.
Prior OCR apparatus could not, however, recognize handprinted information,
such as dollar amounts, at an acceptably low rejection rate with the
requisite document through-put rate. Accordingly, the conventional prior
technique has been to capture an image of the entire portion of each
document being processed where a dollar amount might be entered by hand.
These images were then presented to an operator sitting at a video display
terminal (VDT) so that the amounts represented by the images could be
visually observed and key entered, thus creating a complete data record for
each document. These document data records included the machine read
information, such as an account number, and the dollar amount which was
key entered. It has, in the prior art, often been necessary to key enter
additional handprinted information, such as the date of the transaction,
and such key entry of information was also necessary when the OCR
apparatus could not read the essential machine printed information on the
document. Obviously, the speed of document processing could be increased
and/or the number of VDT terminals could be reduced if the need to key
enter information for every document being processed could be eliminated.
Optical character recognition apparatus capable of recognizing handprinted
characters with a high degree of accuracy is known in the art. Such
apparatus, however, is generally characterized by a document through-put
rate which would be unacceptably slow for use in the field of "remittance
processing". Less sophisticated OCR apparatus, particularly
apparatus which attempts to recognize characters by simultaneously
matching data derived from the scanning of the characters with comparable
stored data, i.e., template masks or models, commensurate with known
characters, is capable of operation at a speed suitable for "remittance
processing". This capability, however, has not previously been employed
because characters have not been entered on the retail sales drafts and
related documents with sufficient care as to location and/or character
formation to ensure an acceptably low rejection rate. Since OCR apparatus
must read a character by looking for variations in contrast, i.e., a dark
trace against a light background, it has been the prior practice to
attempt to guide the location and formation of characters on documents
through the use of "fade out" boxes, i.e., rectangles printed in a light
color on a white background, the light color not interfering with the
operation of the scanner in the OCR apparatus. Use of such "fade out"
boxes has not proven to be successful in constraining either the size or
location of handprinted characters on documents completed at the time of
retail sales transactions.
As noted above, it has been the previous practice to capture an image of
the entire "field" in which a dollar amount will be entered on a sales
draft document or the like. This field will typically have a minimum
length of at least six character spaces and may also include a further
space for the decimal point between the dollars and cents portions of the
field. In most cases, the dollar amount actually entered on the document
comprises fewer than six characters. The captured dollar amount field
image, which is subsequently digitized, thus contains unnecessary
information, i.e., the blank character spaces, in the prior art. The
necessity of transmitting and processing such unnecessarily long digitized
image records slows down the operation of the system.
SUMMARY OF THE INVENTION
The present invention overcomes the above-discussed and other deficiencies
and disadvantages of the prior art and, in so doing, provides a novel and
improved system and technique for the processing of documents,
particularly documents which pertain to retail sales transactions. The
present invention also encompasses novel documents for use with such
system and technique. A sales draft document in accordance with the
present invention is designed so as to constrain the user thereof to print
characters within the confines of precisely defined character spaces
located within a field. This is accomplished by defining the character
spaces as uncolored "windows" within a heavy, dark, solid colored band, so
that, if the user prints outside of the window, he or she will be unable
to visually perceive the printing. The documents of the invention are
multi-page and the lowermost page, which will be machine read, does not
include any character space delimiting indicia in the field to be read.
The document copy to be processed by the OCR apparatus does, however,
include a machine readable code which will tell the apparatus precisely
where the fields of interest, i.e., the series of unbounded character
spaces, are located. Experiments have shown that this technique results in
a sufficiently high percentage of the dollar amounts handprinted on the
documents being formed with adequate care so that the entire dollar
amounts are recognizable by OCR apparatus which operates by matching data
commensurate with each character to data commensurate with a template mask
or model corresponding to plural known characters, the comparison being
done in parallel fashion.
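By way of illustration only, the template matching referred to above may be sketched as follows. The 3x3 binary masks, the names TEMPLATES, match_score and recognize, and the 0.9 acceptance threshold are all hypothetical and are not taken from the disclosure; the sequential scoring of every template merely stands in for the parallel comparison.

```python
# Hypothetical 3x3 binary template masks; real masks would be far larger.
TEMPLATES = {
    "1": ("010", "010", "010"),
    "7": ("111", "001", "001"),
    "0": ("111", "101", "111"),
}

def match_score(cell, template):
    """Fraction of pixels in which the scanned cell agrees with the template."""
    total = sum(len(row) for row in template)
    agree = sum(
        c == t
        for cell_row, tmpl_row in zip(cell, template)
        for c, t in zip(cell_row, tmpl_row)
    )
    return agree / total

def recognize(cell, templates=TEMPLATES, threshold=0.9):
    """Return the best-matching character, or None (a reject) when no
    template scores at least `threshold` (an assumed cut-off)."""
    scores = {char: match_score(cell, tmpl) for char, tmpl in templates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

A character printed with care inside its window scores highly against exactly one mask, while a carelessly located or formed character scores poorly against all of them and is rejected.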
In accordance with a preferred embodiment of the present invention, the
captured field images are "truncated", i.e., are limited to the recognized
field size. Restated, blank spaces, particularly blank spaces in the
dollar portion of an amount field, are deleted and thus do not comprise
part of the digitized record of the captured field image.
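A minimal sketch of such truncation follows, assuming that the amount field has already been divided into character cells ordered from the most significant to the least significant position and that a blank cell is represented by None. The function name truncate_field and this representation are assumptions made for illustration.

```python
def truncate_field(cells, is_blank=lambda cell: cell is None):
    """Drop the blank character spaces on the most significant side of an
    amount field, keeping everything from the first occupied cell through
    the least significant digit."""
    for index, cell in enumerate(cells):
        if not is_blank(cell):
            return cells[index:]
    return []  # the entire field is blank

# A six-space field in which only four character spaces were used.
field = [None, None, "4", "2", "9", "5"]
truncated = truncate_field(field)
```

Only the four occupied cells would then be digitized and transmitted, which shortens the image record accordingly.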
A particularly novel aspect of the present invention comprises the
"tagging" of images as "rejects", i.e., unrecognized, and as dollar
amounts and the subsequent storage of such tagged image records as a
function of their nature. Thus, by way of example, each digitized dollar
amount image will be formatted such that its preamble includes a code or
pointer which indicates that it is in fact a dollar amount image. In
addition, in each case where all of the characters comprising the dollar
image have not been recognized, the image preamble will include a second
code or pointer which indicates that it is a reject. A data record will be
generated for each document being processed and that data record will
indicate whether, for the corresponding document, an image record exists
and, if so, whether a reject image record also exists. The document data
record, dollar amount image records, and reject image records are
separately stored, i.e., separate files are provided. Only the reject
images are displayed at a VDT terminal for key entry of the visually
observed amount, i.e. the amount which the OCR apparatus could not
recognize. The key entry of the amounts from the display of the reject
images results in the updating of the document data record for the
corresponding document by insertion of the key entered amount therein.
When the "correction" of all reject images has been completed, each
account is checked to determine whether it is in balance. Only in those
cases where the account is not in balance, as indicated for example by the
total indicated by a merchant for a group of sales drafts not equaling the
computed total of the actual amounts on those sales drafts, will the images
from the dollar amount image file be displayed along with the dollar
amounts from the data records so that the operator may make a comparison
and key enter any further adjustments.
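The correction and balancing steps described above may be sketched as follows. The record layout (a dictionary with "amount" and "has_reject_image" entries) and the function names are hypothetical; the sketch shows only that reject amounts are key entered first and that the dollar amount images need be reviewed only when the account fails to balance.

```python
from decimal import Decimal

def apply_key_entry(data_record, keyed_amount):
    """Return a copy of a document data record with the operator-keyed
    amount inserted and its reject indication cleared."""
    corrected = dict(data_record)
    corrected["amount"] = Decimal(keyed_amount)
    corrected["has_reject_image"] = False
    return corrected

def in_balance(merchant_total, draft_records):
    """Compare the total stated by the merchant with the computed total of
    the amounts on the sales drafts."""
    computed = sum((record["amount"] for record in draft_records), Decimal("0"))
    return computed == Decimal(merchant_total)

drafts = [
    {"serial": 1, "amount": Decimal("12.50"), "has_reject_image": False},
    {"serial": 2, "amount": None, "has_reject_image": True},  # unrecognized amount
]

# Step 1: only the reject image is displayed, and its amount is key entered.
drafts = [
    apply_key_entry(d, "7.25") if d["has_reject_image"] else d
    for d in drafts
]

# Step 2: the dollar amount images are reviewed only if the account is out of balance.
needs_image_review = not in_balance("19.75", drafts)
```

Exact decimal arithmetic is used in the sketch so that the comparison of totals is not disturbed by binary floating point rounding.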
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention may be better understood and its numerous objects and
advantages will become apparent to those skilled in the art by reference
to the accompanying drawing wherein:
FIG. 1 is a general system block diagram of apparatus in accordance with
the invention, the apparatus including a scanner subsystem and an editing
subsystem;
FIG. 2 is a generalized flow diagram for a document scanner subsystem in
accordance with a preferred embodiment of the invention;
FIG. 3 is a schematic representation of the manner in which documents to be
scanned in accordance with the technique depicted in FIG. 2 may be
batched;
FIG. 4A is a copy of a merchant's copy of a typical deposit transmittal
form which is employed in the practice of the present invention;
FIG. 4B is the bank, i.e. machine readable, copy of the form of FIG. 4A;
FIG. 5A is a copy of a merchant's copy of a typical sales draft which is
employed in the practice of the present invention;
FIG. 5B is the cardholder copy of the document of FIG. 5A;
FIG. 5C is the bank, i.e. machine readable, copy of the document of FIG.
5A;
FIGS. 6-1 and 6-2 are a block diagram of the scanner subsystem of apparatus
in accordance with a preferred embodiment of the invention;
FIGS. 7A, 7B and 7C are a detailed flow chart for the scanner subsystem of
FIG. 6;
FIG. 8 is a logic diagram depicting the preprocessing of document related
information received from the scanner subsystem by the editing subsystem
in accordance with the preferred embodiment of the invention;
FIGS. 9A and 9B are a flow chart depicting the correction and manipulation
of the preprocessed information by the editing subsystem;
FIG. 10 is a block diagram of the editing subsystem which implements the
processing represented in FIGS. 8 and 9;
FIG. 11 is a representation of a data record for the document of FIG. 4B as
generated by the editing subsystem; and
FIG. 12 is a representation of a data record for the document of FIG. 5C as
generated by the editing subsystem.
DESCRIPTION OF THE DISCLOSED EMBODIMENT
With reference to FIG. 1, documents which are to be processed in accordance
with the present invention are initially "batched", i.e. are arranged in a
logical sequence which will be discussed below. A batch may function as a
source of, for example, up to 2,000 documents. These source documents,
indicated at 10 in FIG. 1, are delivered to a scanner subsystem 12. As
will be described in more detail in the discussion below of FIGS. 6 and 7,
the functions of the scanner subsystem include image capture and
subsequent digitization, character recognition, image encoding and
generation of data records. The encoding preferably includes compression
of digitized images. The scanner subsystem thus includes data processors
and associated memories. The scanner subsystem also includes a serializer
for imprinting serial numbers on the documents and a camera, a microfilm
camera for example, for capturing and storing images of one or both sides
of the serialized (numbered) documents. The scanner subsystem further
includes a document transport for moving the batched documents
individually and serially through an image capture module and the
serializer and camera. The document transport comprises a feeder section,
a transport section and document stackers located downstream of the camera.
If deemed necessary, the scanner subsystem can also include an
input/output device which will provide hard copies of "reports" containing
information concerning the stream of documents being processed.
The signals outputted by the scanner subsystem are delivered, via a data
link 14, to an editing subsystem which has been indicated generally at 16.
The editing subsystem 16 includes provision for storage, "correction" and
preliminary processing, i.e., account reconciliation, of the digitized
information transmitted via the data link 14. For the reasons to be
explained below, the data storage in the editing subsystem will establish
a plurality of separate files as indicated on FIG. 1. The transmitted
information for a document, in the embodiment being described, will
include a data record and may also include an image record. In accordance
with the invention, the images comprising a document image record may be
of a portion of the "field" or "fields" of interest only, i.e., only those
parts of the areas of the document where relevant data appears or should
appear will be imaged and digitized. In the case of dollar amounts
comprised of hand-printed characters, the images will be "tagged" either
as completely recognized or as reject, i.e., unreadable, images. A dollar
amount image, regardless of whether it is recognized or a reject, will be
comprised of that portion of the amount field, beginning with the least
significant digit, which includes characters or the like.
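The separate storage of tagged image records may be illustrated by the following sketch. The preamble is modeled as a set of code strings and each file as a list; these representations, the code values and the function name are assumptions. The sketch further assumes that a reject image is stored only in the reject file, a point which the present description leaves open.

```python
DOLLAR_AMOUNT = "dollar_amount"  # hypothetical preamble code
REJECT = "reject"                # hypothetical preamble code

def store_image_records(image_records):
    """Route each image record to a file according to the codes carried in
    its preamble. Returns (dollar_amount_file, reject_file)."""
    dollar_amount_file, reject_file = [], []
    for record in image_records:
        codes = record["preamble"]
        if REJECT in codes:
            reject_file.append(record)  # to be displayed for key entry
        elif DOLLAR_AMOUNT in codes:
            dollar_amount_file.append(record)  # consulted only if out of balance
    return dollar_amount_file, reject_file

records = [
    {"serial": 101, "preamble": {DOLLAR_AMOUNT}},
    {"serial": 102, "preamble": {DOLLAR_AMOUNT, REJECT}},
]
dollar_amount_file, reject_file = store_image_records(records)
```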
It is to be noted that, either alternatively or as a backup, the data
outputted from the scanner subsystem 12 may be recorded on a magnetic tape
18 or other suitable storage medium and the record then physically moved
to the editing subsystem and the recorded data transferred thereto under
the supervision of a tape controller 20 or the like.
FIG. 2 depicts the flow of the documents to and through the scanner
subsystem 12 of FIG. 1. As noted above, the documents are batched and then
serially transported through the scanner where they are "read". The
scanner will read, but will ordinarily not capture an image of, machine
printed information on each document. This machine printed information
will include indicia which identifies the document by type. In the
simplified example to be described below, the document types will comprise
block headers, deposit transmittals and sales drafts. Other document types
such as cash letters, cash advances and credit vouchers can also be
included within a batch and processed. In order to achieve maximum
efficiency, these document types must be read in the correct sequence. By
comparing the type identification of the document being read with that of
the previously read document, the scanner subsystem can determine if the
documents have been arranged in the correct order. If the documents are
out of order the transport will be stopped for problem identification. The
scanner, particularly the image capture device therein, is controllable so
as to permit the imaging of "fields" on the document. Depending on
document type, a "field" may be provided for an account number, a date, an
authorization, a transaction type identification code and a dollar amount.
With the exception of the account number and transaction code, the data in
these fields will typically be inserted by hand. The captured images are
digitized and transmitted to recognition logic which will be described in
more detail in the discussion of FIGS. 6 and 7. The document, after having
been "read", will be imprinted with identification numbers which are
serialized. The entire document, with the serial numbers imprinted
thereon, will subsequently be microfilmed or imaged in some other suitable
manner. The documents, after microfilming, will be transported to a
document stacker and, subsequent to stacking, will be removed from the
system for storage and ultimate disposition.
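The order check mentioned above, in which the type of the document being read is compared with that of the previously read document, may be sketched as follows. The table of permitted successors is an assumption inferred from the simplified batch of block headers, cash letters, deposit transmittals and sales drafts; an actual system would define its own sequencing rules.

```python
# Hypothetical table: for each previously read type, the types that may follow.
ALLOWED_NEXT = {
    None: {"block_header"},  # a batch opens with a block header
    "block_header": {"cash_letter", "deposit_transmittal"},
    "cash_letter": {"deposit_transmittal"},
    "deposit_transmittal": {"sales_draft"},
    "sales_draft": {"sales_draft", "deposit_transmittal",
                    "cash_letter", "block_header"},
}

def first_out_of_order(document_types):
    """Return the index of the first document that may not follow its
    predecessor, or None when the whole sequence is in order."""
    previous = None
    for index, doc_type in enumerate(document_types):
        if doc_type not in ALLOWED_NEXT.get(previous, set()):
            return index  # the transport would be stopped here
        previous = doc_type
    return None
```

For example, a sales draft read immediately after a block header would be reported at index 1, corresponding to the point at which the transport is stopped for problem identification.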
As noted above, the recognition logic and associated devices which comprise
the scanner subsystem will generate a data record and may also generate an
image record, i.e., digitized and compressed data commensurate with the
images of scanned fields, for each document which is read. A document data
record may include information commensurate with an account number, the
type of document, the type of transaction, a dollar amount, the assigned
serial number, an amount image pointer or pointers, the date and the
results of various tests. As will be described in greater detail below, an
amount image pointer will be a multibit flag which identifies where the
image corresponding to the amount portion of the data record is stored.
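One possible layout of such a document data record is sketched below. The field names, the use of cents for the amount and the bit assignments of the amount image pointer are hypothetical, since no bit layout has been specified at this point in the description.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import Optional

class AmountImagePointer(IntFlag):
    """Assumed multibit flag naming the file or files holding the amount image."""
    NONE = 0
    DOLLAR_IMAGE_FILE = 1
    REJECT_IMAGE_FILE = 2

@dataclass
class DocumentDataRecord:
    serial_number: int
    document_type: str
    account_number: Optional[str] = None
    transaction_type: Optional[str] = None
    amount_cents: Optional[int] = None  # None until recognized or key entered
    date: Optional[str] = None
    amount_image_pointer: AmountImagePointer = AmountImagePointer.NONE

    def needs_key_entry(self) -> bool:
        """True when the pointer indicates that a reject image record exists."""
        return bool(self.amount_image_pointer & AmountImagePointer.REJECT_IMAGE_FILE)

record = DocumentDataRecord(
    serial_number=1001,
    document_type="sales_draft",
    amount_image_pointer=(AmountImagePointer.DOLLAR_IMAGE_FILE
                          | AmountImagePointer.REJECT_IMAGE_FILE),
)
```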
FIG. 3 represents a manner in which documents might be batched for
processing in accordance with the disclosed embodiment of the present
invention. Each batch begins with a control document, i.e., a "block
header". The reading of a block header document controls the opening and
closing of batches. A block header typically carries a batch number, an
employee number, the date and possibly a department number. If a field
where this identifying information should be present is blank on the block
header, or if the characters in the field cannot be recognized, the
processing of the document data record by the editing subsystem will
result in an operator being prompted to key enter the information
appropriate to the blank field. In actual practice a cash letter document
is usually the second document in a batch and can also be found elsewhere
within the batch. The cash letter document, which will not be discussed in
detail herein, will typically be prepared by the local financial institution
at which the merchant has its account, i.e., the merchant that has engaged
in the sales represented by the deposit transmittal and sales drafts which
follow in the batch until the next cash letter is encountered. A cash letter will,
as in the case of a block header, be identified by indicia imprinted
thereon and when this indicia is read the cash letter document will
function as a control document. The cash letter will be an order for the
entity processing the documents, typically a regional bank, to credit, to
the account of the financial institution that has forwarded the batched
documents for processing, an amount which has been entered in a field on
the cash letter.
The next document in a batch, and the first document following the block
header in the simplified example to be explained herein, will consist of a
deposit transmittal form which is completed by a merchant. In actual
practice a batch will include many deposit transmittals with each such
deposit transmittal being followed by a plurality of sales drafts. FIG. 4
is a copy of an example of a summary document, i.e., a merchant's deposit
transmittal form with FIG. 4A being the MERCHANT COPY, i.e., the copy
which is retained by the merchant. FIG. 4B is the "BANK COPY" of the
deposit transmittal and is the copy which will be included within the