Description
FIELD OF THE INVENTION
This invention relates to electronic document processing systems and, more
particularly, to methods and means for more tightly coupling the usual
hardcopy output of such systems to the electronic documents from which the
human readable hardcopies are produced. The coupling afforded by this
invention may be sufficiently tight to enable printed, human readable
hardcopy documents to be employed as an essentially lossless medium for
storing and transferring digital electronic documents. Or, such coupling
may be utilized to capture otherwise unavailable or not easily discernible
information relevant to the reproduction of the electronic source document.
BACKGROUND OF THE INVENTION
Modern electronic document processing systems generally include input
scanners for electronically capturing the general appearance (i.e., the
human readable information content and the basic graphical layout) of
human readable hardcopy documents; programmed computers for enabling users
to create, edit and otherwise manipulate electronic documents; and
printers for producing hardcopy, human readable renderings of electronic
documents. These systems typically have convenient access to mass memory
for the storage and retrieval of electronic document files. Moreover, they
often are networked by local area networks (LANs), switched data links,
and the like for facilitating the interchange of digital electronic
documents and for providing multi-user access to shared system resources,
such as high speed electronic printers and electronic file servers.
The technical details pertaining to the interchangeability of electronic
documents are beyond the scope of this invention, but it should be
understood that there is not yet a "universal interchange standard" for
losslessly interchanging "structured electronic documents" (i.e.,
documents conforming to predefined rules governing their constituent
elements, the characteristics of those elements, and the
interrelationships among their elements). Plain text ASCII encoding is
becoming a de facto interchange standard, but it is of limited utility for
representing structured electronic documents. Other encoding formats
provide fuller structural representations of electronic documents, but
they usually are relatively system specific. For example, some of the more
basic document description languages (DDLs) employ embedded control codes
for supplementing ASCII encodings with variables defining the logical
structure (i.e., the sections, paragraphs, sentences, figures, figure
captions, etc.) of electronic documents, thereby permitting such documents
to be formatted in accordance with selected formatting variables, such as
selected font styles, font sizes, line and paragraph spacings, margins,
indentations, header and footer locations, and columns. Graphical DDL
encodings provide more sophisticated and complete representations of
electronic document structures because they encode both the logical
structure and the layout structure of such documents. Page description
language (PDL) encodings are related to graphical DDL encodings, but they
are designed so that they can be readily decomposed or interpreted to
define the detailed layout of the printed page in a raster scan format.
Accordingly, it will be appreciated that the transportability of
electronic documents from one document processing system to another
depends upon the ability of the receiving or "target" system to interpret,
either directly or through the use of a format converter, the encoding
format in which the document is provided by the originating or "source"
system. To simplify this disclosure, source/target encoding format
compatibility will be assumed, but it should be clearly understood that
this is a simplifying assumption.
Others previously have proposed printing digital data, including electronic
document files, on a recording medium, such as plain paper, so that
optical readers can be employed for uploading the data into electronic
document processing systems. See, for example, Brass et al U.S. Pat. No.
4,754,127, which issued Jun. 28, 1988 on "Method and Apparatus for
Transforming Digitally Encoded Data into Printed Data Strips," and Brass
et al U.S. Pat. No. 4,782,221, which issued Nov. 1, 1988 on "Printed Data
Strip Including Bit-Encoded Information and Scanner Control." In view of
the additional insights provided by the user documentation for "The Laser
Archivist," Cauzin Systems, Inc., 1987, it is believed that the so-called
"data strips" this prior work has provided are printed as physically
distinct entities. Accordingly, the user can use a standard "cut and
paste" process for attaching such data strips, if desired, to the human
readable renderings of the files to which they pertain. In this system,
the scanner used to read the printed data strips is not a general-purpose
document scanner, but rather, a special-purpose hand-held computer
peripheral optimized for reading said data strips, as specified in Brass
et al., U.S. Pat. No. 4,692,603, "Optical reader for printed bit-encoded
data and method of reading same," which issued Sep. 8, 1987. Thus this
system could not be said to close the loop between common document
production and reprographic equipment, as the present invention intends.
Drexler U.S. Pat. No. 4,665,004, which issued May 12, 1987 on "Method for
Dual Image Recording of Medical Data," also is interesting because it
proposes using a specialized optical recording system and recording medium
for optically recording the raw digital data for a computer generated
pictorial image in a form that permits the raw data (including digitized
versions of any optional written or oral annotations) to be physically
secured to the human readable, hardcopy rendering of the image. However,
that approach has the drawback of requiring the use of different recording
mechanisms for producing the machine readable digital data representation
and the human readable rendering. Moreover, the digital data is not
recorded in a form that permits it to be readily copied using ordinary
office equipment.
A commonly assigned J. J. Daniele United States patent which issued Mar. 1,
1988 as U.S. Pat. No. 4,728,984 on "Data Handling and Archiving System" is
believed to be especially noteworthy because it relates to the use of an
electronic printer for recording digital data on plain paper, together
with the use of an input scanner for scanning digital data that has been
recorded on such a recording medium to upload the data into the internal
computer of the printer. The Daniele '984 patent discusses several
subjects which are meaningful to the present invention, including the
redundant recording of digital information, the archival storage and
distribution of digital data recorded on plain paper, the compression that
can be achieved by digitally recording text and graphics, and the data
security that can be achieved by encrypting digitally recorded text and
graphics. Moreover, it discloses a typical printer and a typical input
scanner in substantial detail. Therefore, the '984 patent hereby is
incorporated by reference.
Paper documents still are a primary medium for written communications and
for record keeping. They can be replicated easily by photocopying, they
can be distributed and filed in original or photocopied form, and
facsimiles of them can be transmitted to remote locations over the public
switched telephone network. Paper and other hardcopy documents are so
pervasive that they are not only a common output product of electronic
document processing systems, but also an important source of input data
for such systems.
In recognition of the fundamental role human readable hardcopy documents
play in modern society, input scanners have been developed for uploading
them into electronic document processing systems. These scanners typically
convert the appearance of the hardcopy into a raster formatted, digital
data stream, thereby providing a bit mapped representation of the hardcopy
appearance. However, bit maps require relatively large amounts of memory
and are difficult to edit and manipulate, so substantial effort and
expense have been devoted to the development of recognition processes for
converting bit mapped document appearances into corresponding symbolic
encodings. Unfortunately, recognition processes generally are inferential
and of limited scope, so they have difficulty correlating unusual bit map
patterns with corresponding encodings and they are prone to making
inference errors even when they determine that a correlation exists.
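As a rough illustration of the memory requirement noted above, the following sketch compares the uncompressed size of a bit mapped page with the size of a plain ASCII encoding of its text. The page size, the 300 spots per inch resolution, the one bit per pixel depth and the 3,000 characters per page are assumed figures chosen only for the arithmetic; none of them is taken from this disclosure.

```python
# Rough storage comparison between a bitmap page and its symbolic encoding.
# All constants below are illustrative assumptions, not values from the text.

PAGE_WIDTH_IN = 8.5       # assumed page width (US Letter), inches
PAGE_HEIGHT_IN = 11.0     # assumed page height, inches
RESOLUTION_SPI = 300      # assumed scanner resolution, spots per inch
BITS_PER_PIXEL = 1        # assumed binary (black/white) bitmap
CHARS_PER_PAGE = 3000     # assumed amount of text on the page
BYTES_PER_CHAR = 1        # one byte per ASCII character


def bitmap_bytes(width_in, height_in, spi, bits_per_pixel):
    """Return the uncompressed size of a raster page in bytes."""
    pixels = round(width_in * spi) * round(height_in * spi)
    return pixels * bits_per_pixel // 8


def ascii_bytes(chars, bytes_per_char):
    """Return the size of a plain character encoding in bytes."""
    return chars * bytes_per_char


raster = bitmap_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, RESOLUTION_SPI, BITS_PER_PIXEL)
text = ascii_bytes(CHARS_PER_PAGE, BYTES_PER_CHAR)
ratio = raster / text

print(f"bitmap: {raster} bytes, ASCII: {text} bytes, ratio: {ratio:.1f}x")
```

Under these assumptions the bitmap occupies roughly a megabyte while the text occupies a few kilobytes, which suggests why symbolic encodings are preferred whenever they can be recovered reliably.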
Turning for a moment to the conventional hardcopy output of electronic
document processing systems, it will be evident that a hardcopy rendering
of an electronic document often is only a partial representation of the
content of the corresponding electronic document file. The appearance of a
hardcopy rendering is governed by the structure and content of the
electronic document to which it pertains, but the digital data encodings
which define the structure and content of the electronic document are not
explicitly embodied by the rendering. So-called "intelligent" input
scanners (scanners equipped with substantial image-processing software)
having sufficient knowledge of the structural encoding rules theoretically
can recover the structural encodings for at least some types of electronic
documents from hardcopy renderings of them, but the practical results
frequently do not conform to the theoretical expectations, especially if
the hardcopy is distorted (such as by a photocopying or facsimile
process), damaged or altered prior to being input scanned.
Furthermore, some types of electronic document data are virtually
impossible to infer from a hardcopy rendering. For example, electronic
spreadsheets conventionally include computational algorithms for defining
the computations which are required to compute the spreadsheet, but these
algorithms generally are not explicitly set forth in the hardcopy
rendering of the computed spreadsheet. Likewise, electronic hypertext
documents and multimedia documents ordinarily contain pointers which link
them to related electronic documents, but the links provided by those
pointers usually are not embodied in the hardcopy renderings of such
documents. Still another example is provided by computer generated
synthetic graphical images where the control points for the graphical
objects that form the image and the data defining the curves which fit
those control points normally can only be approximated from a hardcopy
rendering of such an image. As still another example, it will be
understood that prints generated by computer aided design (CAD) systems
typically are approximate representations of the high precision data of
the underlying electronic file, which often contains three dimensional
information. As a general rule, the mathematical models and the related
data from which such a system generates such prints are not fully
recoverable from a hardcopy rendering representing any single view. As a
further example, it is to be understood that the color values for objects
(such as the cyan, magenta, yellow and black values for printed four-color
images) also are difficult to ascertain with any substantial certainty
from a hardcopy color rendering, and would be impossible to recover from a
black & white copy of that color document hardcopy. There are times when
documents are printed in black and white as a result of the limited
capabilities of the available printer, even though the original electronic
source document might have been intended to provide a full color, a
functional color, or a highlight color representation. Indeed, even some
of the more fundamental attributes of electronic documents, such as their
file names, author, creation date, etc., are seldom found in the hardcopy
renderings of such documents.
Consequently, it will be evident that it would be a significant improvement
if the ordinary hardcopy output of electronic document processing systems
could be employed as an essentially lossless medium for storing all or part
of the structure and content of electronic documents and for transferring
that data from the printer of one electronic document processing system to
the input scanner of the same or another document processing system.
Hardcopy documents of that type would not only continue to function as a
convenient medium for distributing and storing human readable renderings
of electronic documents, but also would provide a convenient alternative
to the digital mass memories which customarily are used for storing
electronic documents and to the digital data links and removable digital
recording media which normally are employed for transferring electronic
documents from one location to another. Furthermore, the integration of
machine readable digital representations of electronic documents with
human readable renderings of them would permit various combinations of
human and computer information processing steps to be employed for
processing information more easily and quickly.
SUMMARY OF THE INVENTION
Therefore, in accordance with the present invention, provision is made in
electronic document processing systems for printing unfiltered or filtered
(i.e., complete or partial, uncompressed or compressed) machine readable
digital representations of electronic documents and human readable
renderings of them on the same recording media using the same printing
process. The integration of machine readable digital representations of
electronic documents with the human readable hardcopy renderings of them
may be employed, for example, not only to enhance the precision with which
the structure and content of such electronic documents can be recovered by
scanning such hardcopies into electronic document processing systems, but
also as a mechanism for enabling recipients of scanned-in versions of such
documents to identify and process annotations that were added to the
hardcopies after they were printed and/or for alerting the recipients of
the scanned-in documents to alterations that may have been made to the
original human readable content of the hardcopy renderings.
In addition to storage of a complete or partial electronic representation
of the document and/or its content, this invention may be utilized for
encoding information about the electronic representation of the document
itself, such as file name, creation and modification dates, access and
security information, and printing histories. Provision may also be made for
encoding information which is computed from the content of the document
and other information, for purposes of authentication and verification of
document integrity and for computational purposes, such as the
recomputation of a spreadsheet. Furthermore, provision may be made for the
encoding of information which relates to operations which are to be
performed depending on handwritten marks made upon a hardcopy rendering of
the document; for example, instructions controlling the action which is to
be taken when a box on a document is checked. Still further, this
invention may be employed for encoding in the hardcopy another class of
information: information about the rendering of the document specific to a
single, given hard copy, which can include a numbered copy of that print,
the identification of the machine which performed that print, the
reproduction characteristics of the printer, the screen frequency and
rotation used by the printer in rendering halftones, and the identity or
characteristics of the print medium and marking agents (such as the paper
and xerographic toner, respectively). Moreover, provision also may be made
for encoding information about the digital encoding mechanism itself, such
as information given in standard-encoded headers about subsequently
compressed or encrypted digital information.
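By way of a hedged illustration of encoding information which is computed from the content of a document for verifying its integrity, the following sketch computes a digest of the document text at print time and compares it with a digest of the text recovered at scan time. The use of SHA-256 and the function names `content_digest` and `is_unaltered` are assumptions made for this example only; this disclosure does not name any particular algorithm.

```python
import hashlib


def content_digest(text: str) -> str:
    """Compute a digest of the document text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_unaltered(recovered_text: str, recorded_digest: str) -> bool:
    """Compare the digest of the recovered text with the recorded digest."""
    return content_digest(recovered_text) == recorded_digest


# At print time: the digest would be recorded in machine readable form
# alongside the human readable rendering.
original = "Pay to the order of A. Smith: $100.00"
recorded = content_digest(original)

# At scan time: the human readable content is recovered and checked.
altered = "Pay to the order of A. Smith: $900.00"
print(is_unaltered(original, recorded))  # unchanged content
print(is_unaltered(altered, recorded))   # content changed after printing
```

A check of this kind only signals that the recovered content differs from the content that was printed; locating the alteration would require additional recorded data.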
When the electronic document includes a scanned-in image, this invention
may be utilized for supplementing the hardcopy rendering of such a
document with embedded data characterizing the input scanner and the scan
process responsible for inputting the image. Similarly, when a hardcopy is
reproduced by a light-lens or electronic copier or a facsimile system,
data characterizing the reproduction equipment and process can be embedded
in the hardcopy reproduction.
Still another possible application for the present invention relates to
augmentation of hardcopy renderings with data defining various active and
passive user aids which exist in the electronic document domain. For
example, electronic buttons, soft keys, drawing brushes, magnifying tools,
phone tools and document feed arrows could be transferred in this way.
As will be appreciated, the supplemental data may be embedded in the
hardcopy renderings in a variety of ways. For example, it may be organized
hierarchically to ensure the inclusion and robust survival of the more
important information. Some or all of the data may be redundantly recorded
on the hardcopy renderings to increase its likelihood of surviving copying
and handling. Moreover, the redundantly recorded data may aid in
recovering lower priority, non-redundantly recorded data from the human
readable content of the rendering, or the hardcopy recorded data may
include pointers to sources of backup data should a backup source be
required.
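The redundant recording described above can be sketched, under stated assumptions, as writing several identical copies of the data and recovering each byte by majority vote across the copies. The three-copy default, the byte-wise vote and the names `record_redundantly` and `recover` are illustrative assumptions; this disclosure does not prescribe a particular redundancy scheme.

```python
from collections import Counter
from typing import List


def record_redundantly(payload: bytes, copies: int = 3) -> List[bytes]:
    """Return the payload repeated `copies` times (hypothetical scheme)."""
    return [payload for _ in range(copies)]


def recover(copies: List[bytes]) -> bytes:
    """Recover the payload by byte-wise majority vote over equal-length copies."""
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")
    out = bytearray()
    for column in zip(*copies):
        # Pick the byte value that occurs most often at this position.
        value, _count = Counter(column).most_common(1)[0]
        out.append(value)
    return bytes(out)


payload = b"file=report.doc"
stored = record_redundantly(payload, copies=3)

# Simulate damage from copying and handling: one bad byte in two copies,
# at different positions, so each position still has a correct majority.
damaged = [bytearray(c) for c in stored]
damaged[0][0] = ord("?")
damaged[2][5] = ord("?")

recovered = recover([bytes(c) for c in damaged])
print(recovered == payload)
```

With three copies, the vote tolerates damage to at most one copy at any given byte position; higher priority data could be given more copies at the cost of more space on the page.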
BRIEF DESCRIPTION OF THE DRAWINGS
Additional features and advantages of this invention will become apparent
when the following detailed description is read in conjunction with the
attached drawings, in which:
FIGS. 1A and 1B combine to provide a functional schematic diagram of a
relatively fully featured, state-of-the-art, electronic document
processing system;
FIGS. 2A and 2B combine to provide another functional schematic diagram for
illustrating certain of the enhancements this invention provides for
electronic document processing systems of the same general type as shown
in FIG. 1;
FIGS. 3 and 4 depict digitally augmented documents produced in accordance
with this invention; and
FIG. 5 illustrates some of the document processing applications and
work-ways which are facilitated by this invention.
DETAILED DESCRIPTION OF THE ILLUSTRATED EMBODIMENT
While the invention is described in some detail hereinbelow with specific
reference to an illustrated embodiment and certain applications, it is to
be understood that there is no intent to limit it to that embodiment or to
those applications. On the contrary, the aim is to cover all
modifications, alternatives and equivalents falling within the spirit and
scope of the invention as defined by the appended claims.
Turning now to the drawings, and at this point especially to FIGS. 1A and
1B, existing electronic document processing systems, such as referenced
generally by 11, typically include (i) an input scanner 12 for inputting
or "uploading" human readable hardcopy documents (not shown) into the
system, (ii) a programmed computer 14, such as a personal computer or a
workstation, for creating, editing and manipulating digital electronic
documents, and (iii) a bitmap printer 15 and/or a dot matrix or fully
formed character printer 16 for outputting or "downloading" human readable
hardcopy renderings of electronic documents from the system.
There are a wide variety of known input devices which a user may employ for
creating, editing and manipulating electronic documents. For example, a
keyboard 21 ordinarily is provided for inputting typographic data,
generally together with a predetermined set of control codes.
Additionally, a pointing device, such as a mouse 22, commonly is utilized
for controlling the positioning of a cursor on a monitor (not shown) that
provides the visual feedback which assists the user to interact with the
computer 14 effectively. Modern user interfaces, such as the graphical
user interfaces that are becoming increasingly popular for personal
computers and workstations, often extend the functionality of the
mouse-like pointer 22 so that it can be employed, together with a few
keystrokes on the keyboard 21, to input a relatively rich and easily
extensible set of control codes. There are still other input devices 24,
such as stylus sensitive digitizing pads, voice digitizers and video
digitizers (not shown), which may be utilized for inputting handwritten
data (e.g., free-hand sketches, signatures, etc.), voice annotations and
video data into the document processing system 11. Furthermore, as
described in some additional detail hereinbelow, the input scanner 12 is
available for inputting hardcopy documents, including hardcopy output from
the document processing system 11 and from other electronic document
processing systems (not shown), as well as hardcopy documents created
manually and by other types of marking mechanisms, such as standard
typewriters.
Document assembly software 31 residing on the computer 14 interprets the
input data and the control codes that are fed into the computer 14 to
produce structured electronic documents 32. Each of these electronic
documents typically is identified by a locally unique file name 33 which
may be assigned to the electronic document 32 by the user, as shown, or by
the computer 14 under program control. Typically, the document assembly
software 31 is application specific, but the lines between different
applications are becoming blurred with the emergence of integrated
multi-function software, such as the Xerox Viewpoint environment. For
example, in the case of text entered via the keyboard 21, the ASCII
encodings 35 of the typographic characters are combined in the document
assembly software 31 with control codes to provide DDL encodings for
insertion into a structured text file (or, in the case of an electronic
document which permits mixed data types, into a text frame) 32. A
significant portion of the logical structure of the electronic document 32
usually is explicitly defined by its composition, without requiring any
additional intervention by the user. However, provision normally is made
for enabling the user to enter document formatting commands, as at 36 and
37, to override the default values which the document assembly software 31
otherwise would employ for defining the layout structure of the document
32.
As is known, structured electronic documents, such as the document 32, can
be interchanged between DDL compatible electronic document processing
systems, as at 41, through the use of removable digital recording media,
such as floppy disks and the like, and through the use of digital data
links. Furthermore, networked document processing systems typically are
able to interchange electronic documents, either directly by means of a
direct file transfer protocol or electronic mail as at 42, or indirectly
by means of shared electronic file servers 43.
Hardcopy renderings 45 of locally or remotely produced structured
electronic documents 32 can be printed from a DDL encoding by employing,
for example, a suitable print driver for driving a standard character
printer 16. Alternatively, a PDL encoding of the document 32 may be
composed, as at 46, to provide a PDL master 47 which, in turn, can be
decomposed, as at 48, to provide an electronic bitmap representation 49 of
the document 32 for printing by a bitmap printer 15. PDL masters, such as
the master 47, also are structured electronic documents which can be
interchanged among PDL compatible electronic document processing systems
by means of physically removable recording media as at 41, direct file
transfer protocols/electronic mail 42, and shared file servers 43.
Like any other hardcopy document, the hardcopy rendering 45 of an
electronic document 32 may be photocopied by a light/lens copier, as at
53, or by a digital copier, as at 54. Additionally, a copy of the
rendering 45 may be transmitted to or received from a remote location via
facsimile, as at 55. Standard photocopying and facsimile processes tend to
cause some distortion of the image, so the copies they produce often are
somewhat degraded, especially when the copies are several copy generations
removed from the original rendering 45.
As will be understood, the hardcopy input 61 for the input scanner 12 may
be the original or a copy of the rendering 45 or of a similar hardcopy
rendering from another electronic document processing system (not shown).
Or, the input document 61 may be the original or a copy of a document
created manually or through the use of a mechanical or electromechanical
marking mechanism, such as a standard typewriter and the like.
Additionally, the original human readable information content of the
document 61 might be supplemented by various annotations and editorial
markings. Also, changes may have been made to the original human readable
information content of the document 61, with or without any intent to
deceive.
In accordance with standard practices, to electronically capture the human
readable information content of the document 61, the input scanner 12
first converts the appearance or image of the document 61 into an
electronic bitmap 62. Recognition software 63 then usually is employed for
converting the bitmap representation 62 into elemental textual and
graphical encodings to the extent that the recognition software 63 is able
to establish a correlation between elements of the bitmap image 62 and the
features it is able to recognize. For example, state-of-the-art
recognition software 63 generally can correlate printed typographic
characters with their ASCII encodings, as at 64, with substantial success.
Additionally, the recognition software 63 sometimes is able to perform
some or all of the following tasks: (a) infer some or all of the
page-layout features of the document 61 from its bit map representation
62, thereby establishing a basis for supplying page-layout control codes
as at 65, (b) make probabilistic (e.g., "nearest-fit") determinations with
respect to the font or fonts used to print text appearing in the document
61, thereby providing a foundation for supplying font control codes as at
66, and (c) fully or partially decompose line drawings appearing in the
document 61 into "best-guess" vectors, thereby providing a basis for
supplying corresponding vector encodings as at 67. However, even with
these various recognition tools, the recognition software 63 often is
unable to recognize some of the features of the document 61, so it usually
also includes provision for inserting the bit maps for unrecognized images
into image frames. Therefore, the electronic representation of the
document 61 that the document processing sys