FIELD OF THE INVENTION
The invention relates to the field of computer-based document management,
and in particular to an integrated system for creating, distributing,
producing and managing various types of multi-component documents.
DESCRIPTION OF THE RELATED ART
Document composition, assembly and production is today often accomplished
in an automated environment. A typical word processing system, for
example, permits entry and modification of text to occur on a
host-connected terminal prior to generation of hard-copy output. A user
may perform a variety of merge, copy and transfer operations within a
document or among several documents in a straightforward and efficient
manner.
Likewise, automated publishing systems have replaced fully manual and
typesetting procedures, permitting interactive page composition and
formatting. Typically, embedded layout or formatting commands are entered
by the user along with text, graphic or image information, and then
implemented by the system. Word processing and automated publishing
systems may reside in a single terminal, be distributed on a time-share
basis or contained in a number of network-linked microcomputers.
The majority of present-day document processing systems are directed solely
to the task of creating and editing documents. However, many users require
integration of various program applications in order to merge their
outputs. In response to this need, a number of multiple-function programs
have recently emerged. These programs combine several applications, such
as word processing, data processing and spreadsheet operations, into a
single integrated system.
Function integration and multi-user capabilities, however, do not
necessarily automate the process of producing complex documents, where the
difficult aspects of document creation lie in coordinating the efforts of
a large number of creative participants, generating text that is not
readily produced by conventional data processing application software or
shepherding an evolving document through a series of sequential
procedures. A system designed for such sophisticated applications must
support an elaborate file structure capable of discriminating among users
and tracking the progress of procedural operations, allow access to
discrete document portions in order to maximize the number of users who
may simultaneously work on a document, and facilitate modular construction
of complex documents with considerable flexibility. The system must also
be compatible with an array of external application software packages that
is sufficient to support document assembly and output.
The present invention exploits a class of computer programming that
utilizes an "object-oriented" approach to document production. In an
object-oriented system, data is stored in self-contained programmatic
structures that also contain procedures for manipulating the data. The
procedures need not reside in the same area of memory as the data, nor
need the routines specifying the procedures be replicated in each object.
Rather, the object may comprise only a set of pointers to data and
procedural files that can be shared by many objects. The invention uses
objects to represent documents as collections of logical components (such
as chapters of a book or sections of a newspaper) which may be combined
and physically mapped onto a page-by-page layout when sufficient content
has been introduced to make this operation meaningful.
The use of object-oriented environments in the field of document
manipulation is not new. U.S. Pat. No. 4,739,477 and U.S. Pat. No.
4,723,209 describe an object-oriented document system that allows multiple
data sets to be assigned to a single displayable area of a document.
Furthermore, word processing systems sometimes represent documents as a
series of logical segments that contain information in order to facilitate
formatting into pages. See, e.g., U.S. Pat. No. 4,539,653, which describes
a formatting scheme in which pages are divided into named regions referred
to as "logical pages"; a user may assign text or graphics to these regions
by means of embedded commands contained in the text data stream. The
latter reference does not disclose an object-oriented system for mediating
between logical and layout document components, and neither reference is
directed toward an integrated system of document production wherein
documents are organized into DBMS-managed objects and gradually assembled
in response both to user commands and programmed procedures.
DESCRIPTION OF THE INVENTION
Objects of the Invention
Accordingly, it is an object of the present invention to provide a novel
system for creating, distributing, producing and managing various types of
complex documents.
It is another object of the invention to support coordinated, multiple user
access to various components of complex documents.
It is a further object of the invention to maintain individual document
components as discrete units that may be accessed selectively and combined
by the user or by means of external programming.
It is another object of the invention to provide a general platform which
may be customized to suit a variety of publishing, case management and
document handling applications.
It is yet another object of the invention to provide an object-oriented,
data-base-centered computational environment for the storage,
modification, organization and retrieval of documents.
Summary of the Invention
The invention decomposes a document into logical components, which are
stored as discrete "objects" in an object-oriented computational
environment. Stored objects are organized, accessed and manipulated
through a database management system (DBMS). The DBMS provides a coherent,
consistent encoding of object content, object attributes and inter-object
relationships. Ultimately, the objects are assembled into an integrated
whole when the document is to be physically produced, i.e., printed,
displayed electronically or transmitted electronically. At a minimum,
objects contain "content," that is, basic information-bearing constituents
such as text, image, voice or graphics. Objects may also contain further
data ("attributes") specifying (a) logical or physical relationships to
other objects or to the document as a whole, (b) characteristics relating
to the appearance of the content, or (c) access restrictions. For example,
a check may be divided into the simple logical objects "check number,"
"payee," "payor," "amount," "signature," and "account number." The
content of the logical object "check number" will be the representative
characters, but this object might also contain a layout attribute
indicating that it is to be placed at the upper-left-hand corner of the
check document. A character font may also be specified. In addition to
attribute data, an object can contain procedures that store, send, delete,
modify and display the object.
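By way of illustration only, the check example might be sketched as follows. The class name LogicalObject, the function make_check, the attribute keys "layout" and "font," and the font value are hypothetical and are not taken from the present embodiment.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class LogicalObject:
    """A logical object: content plus optional attributes (hypothetical names)."""
    name: str
    content: Any = None
    attributes: Dict[str, Any] = field(default_factory=dict)

    def display(self) -> str:
        # One of the procedures an object may carry; here a plain-text rendering.
        return f"{self.name}: {self.content}"


def make_check(number: str, payee: str, payor: str, amount: str,
               signature: str, account: str) -> List[LogicalObject]:
    """Decompose a check into the six logical objects named in the text."""
    return [
        # Only "check number" carries layout and font attributes in this sketch;
        # the font value is an assumption chosen for illustration.
        LogicalObject("check number", number,
                      {"layout": "upper-left", "font": "Courier"}),
        LogicalObject("payee", payee),
        LogicalObject("payor", payor),
        LogicalObject("amount", amount),
        LogicalObject("signature", signature),
        LogicalObject("account number", account),
    ]


check = make_check("1042", "A. Payee", "B. Payor", "100.00", "B. Payor", "000123")
```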
Objects in a document may be hierarchically related to one another, such
that one object may draw some or all of its content and/or attributes from
another object or objects. This permits objects to be reused, resulting in
efficient memory utilization. For example, an advertisement may be stored
as an object, but incorporate a stock photograph stored as a second
object. That photograph may be accessed by other objects within the
document, and in other documents.
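The reuse described above can be sketched, under assumed names, as two advertisement objects holding a reference to one stored photograph rather than each holding a copy. ContentObject, CompositeObject and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentObject:
    """A stored content object, e.g. a stock photograph (hypothetical)."""
    name: str
    data: bytes


@dataclass
class CompositeObject:
    """An object that draws part of its content from other objects."""
    name: str
    text: str
    parts: List[ContentObject] = field(default_factory=list)


# Placeholder bytes stand in for real image data.
stock_photo = ContentObject("stock photo", b"\x00\x01\x02")

# Both advertisements point at the same photograph; nothing is duplicated.
ad_a = CompositeObject("spring sale ad", "Save 20%", [stock_photo])
ad_b = CompositeObject("summer sale ad", "Save 30%", [stock_photo])
```

Because both advertisements refer to the same stored object, a change to the photograph is seen by each of them.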
Objects may also be organized according to class, permitting multiple
objects to inherit the same set of characteristics and attributes. For
example, a document object may be subclassified as a check document, and
all check documents may contain the same set of content objects.
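As a minimal sketch of classification, assuming a hypothetical Document base class, a subclass can fix the set of content objects that every one of its members contains:

```python
from typing import Dict, Optional, Tuple


class Document:
    """Generic document object (hypothetical base class)."""
    # Names of the content objects every member of the class contains.
    content_object_names: Tuple[str, ...] = ()

    def __init__(self, title: str) -> None:
        self.title = title
        # Each instance starts with one empty slot per inherited content object.
        self.contents: Dict[str, Optional[str]] = {
            name: None for name in self.content_object_names
        }


class CheckDocument(Document):
    """Subclass: all check documents contain the same set of content objects."""
    content_object_names = (
        "check number", "payee", "payor",
        "amount", "signature", "account number",
    )


first = CheckDocument("check 1042")
second = CheckDocument("check 1043")
```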
Documents can themselves be represented as objects, when this level of
generality is appropriate, and collected into bundles referred to
generically as "folders." Folders, too, can be represented as objects.
(Hereinafter, the terms "folder object" and "document object" will refer
to the folder or document itself, rather than the objects contained
therein.)
Objects are broadly classified as "logical" and "layout" objects. A logical
object defines the relationship between different portions of content, as
well as between documents. Layout objects specify the physical
distribution of content within the logical object, and define physical
locations on a page or within a document. Layout objects may include page
sets (e.g. sections of a newspaper or periodical); pages; frames, which
represent regions within a page; and blocks, which represent subregions.
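The nesting of layout objects named above (page sets, pages, frames and blocks) might be represented as follows. This is a sketch only; the field names, the coordinate units and the sample newspaper section are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Block:
    """A subregion of a frame; coordinates are in arbitrary assumed units."""
    name: str
    x: float
    y: float
    width: float
    height: float


@dataclass
class Frame:
    """A region within a page."""
    name: str
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Page:
    number: int
    frames: List[Frame] = field(default_factory=list)


@dataclass
class PageSet:
    """E.g. a section of a newspaper or periodical."""
    name: str
    pages: List[Page] = field(default_factory=list)


def iter_blocks(page_set: PageSet) -> Iterator[Tuple[int, str, Block]]:
    """Walk the hierarchy, yielding (page number, frame name, block)."""
    for page in page_set.pages:
        for frame in page.frames:
            for block in frame.blocks:
                yield page.number, frame.name, block


business = PageSet("Business Section", [
    Page(1, [Frame("lead story", [Block("headline", 0, 0, 6, 1),
                                  Block("body", 0, 1, 6, 8)])]),
    Page(2, [Frame("markets", [Block("table", 0, 0, 6, 9)])]),
])
```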
Separating physical layout from logical relationship permits coordination
of work activity among a large number of users, because user access can be
limited to appropriate document elements and requests for document
components prioritized to prevent simultaneous access. In addition, the
invention can control access based on a work-flow model of document
assembly, wherein a user's ability to make modifications or additions is
contingent on the occurrence of a previous event.
Prior art systems are limited in terms of multiple-user support, despite
the growing importance of distributed operations in the publishing
industry. The multiple-user support of the invention is particularly suited to
publications that are generated by teams of specialized personnel, some
responsible for various aspects of physical appearance and others
primarily concerned with content. Delaying complete merger of both content
and layout until actual production permits non-conflicting priority
requests to be implemented automatically, while reserving editorial
resources for the more difficult allocation decisions.
A document object contains pointers to content objects (which themselves
contain the basic information-bearing constituents), as well as to logical
objects and layout objects. Examples of logical objects are "First News
Story" or "First News Story Photo". Examples of layout objects are "Page
One" or a dimensional specification of a portion of Page One. Logical
objects can contain attributes specifying locational preferences within
the document, but these are not evaluated until logical objects are mapped
into layout objects during pagination. Document objects may also contain
attributes relating to the appearance of various fields within the
document (e.g. different font types for different portions of a page);
alternatively, these can be maintained within the content objects.
Objects are created, organized and accessed by means of an object-oriented
DBMS. The current embodiment of the invention can utilize any of a number
of general-purpose DBMSs, appropriately supplemented to facilitate
object-oriented operation. The selected DBMS is structured according to
the particular system application so that a basic set of "native" or
standard objects will be available to the users. The DBMS should also
accommodate new objects defined by users and integrate them within the
existing framework. Objects are organized within the DBMS by type (e.g.
Page Set) and by name identifier (e.g. Business Section). The information
contained in a content object may be stored within the DBMS or external to
the DBMS; in the latter case the DBMS contains a pointer to the memory
address of the content.
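The organization of objects by type and name, with content held either inside or outside the DBMS, can be sketched with a small relational table. The sketch uses Python's built-in sqlite3 module purely as a stand-in for the general-purpose DBMS; the table name, the column names and the sample external reference are assumptions.

```python
import sqlite3
from typing import Dict, Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory store keyed by (object type, object name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE objects (
               obj_type     TEXT NOT NULL,
               obj_name     TEXT NOT NULL,
               content      BLOB,   -- set when content is stored in the DBMS
               external_ref TEXT,   -- set when content is stored externally
               PRIMARY KEY (obj_type, obj_name)
           )"""
    )
    return conn


def put_object(conn: sqlite3.Connection, obj_type: str, obj_name: str,
               content: Optional[bytes] = None,
               external_ref: Optional[str] = None) -> None:
    """Store content inline, or store only a pointer to external content."""
    conn.execute("INSERT INTO objects VALUES (?, ?, ?, ?)",
                 (obj_type, obj_name, content, external_ref))


def get_object(conn: sqlite3.Connection, obj_type: str,
               obj_name: str) -> Dict[str, Optional[object]]:
    """Look an object up by its type and name identifier."""
    row = conn.execute(
        "SELECT content, external_ref FROM objects "
        "WHERE obj_type = ? AND obj_name = ?",
        (obj_type, obj_name),
    ).fetchone()
    if row is None:
        raise KeyError((obj_type, obj_name))
    return {"content": row[0], "external_ref": row[1]}


store = open_store()
put_object(store, "Page Set", "Business Section", content=b"inline content")
put_object(store, "Image", "Stock Photo", external_ref="external-location-17")
```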
The user obtains information about objects and accesses objects through a
variety of utility programs integrated within or callable by the system.
These access the DBMS directly and provide editing, display and output
functions which are appropriate to the particular system application.
Indeed, the DBMS is most appropriately viewed as a kernel, to which various
application packages are accorded access.
For example, users of a document management system may need only simple
text editing capabilities that permit input and modification of text,
composition functions to format text, and fonts to specify output of text
on typesetters and other output devices; by contrast, publishing systems
can require sophisticated image processing and graphics capabilities, as
well as integration of output arising from multiple sources. Objects of
the present invention are configured so as to require values for a
consistent set of parameters, thereby facilitating interface with a
variety of application programs, input devices and output devices. So long
as the application program is equipped to provide values for these
parameters, interaction with discrete objects can occur.
Direct communication between the user and the DBMS, when appropriate, may
take place in Structured Query Language (SQL). This standard language
interfaces most easily with DBMS systems. More commonly, content is
retrieved from the DBMS by an application program (e.g. an editor) through
the system's input/output system, and the user interacts with the
application program only. Application programs typically feature
menu-driven or command-driven interfaces which are more convenient to use
than SQL. After the user's session with the application program has been
completed, the modified content is passed back to the DBMS for storage.
The DBMS maintains immediate supervisory control over retrieval and
storage of files.
Access to objects can be selectively restricted by insertion of appropriate
"access" attributes within such objects. The success of a user's attempt to
gain access to an object will depend on fulfillment of the criteria specified
in the attribute. For example, an access attribute can require a proper
user identification or terminal location. The DBMS can be configured to
scan this attribute of an object prior to retrieval, and refrain from
returning the object unless a match is detected.
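A retrieval routine of this kind might be sketched as follows. The dictionary layout, the key names "users" and "terminals," and the rule that every criterion present in the attribute must match are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def retrieve(obj: Dict[str, Any], user_id: str,
             terminal: str) -> Optional[Dict[str, Any]]:
    """Return the object only if its access attribute, if any, is satisfied."""
    access = obj.get("access")
    if not access:
        return obj  # no access attribute: unrestricted
    allowed_users = access.get("users")
    allowed_terminals = access.get("terminals")
    # Assumption: each criterion that is specified must be matched.
    if allowed_users is not None and user_id not in allowed_users:
        return None
    if allowed_terminals is not None and terminal not in allowed_terminals:
        return None
    return obj


story = {
    "name": "First News Story",
    "content": "story text",
    "access": {"users": {"editor1"}, "terminals": {"newsroom-3"}},
}
```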
In addition to characteristics of the user, access can depend on a sequence
of procedures that must be performed on the desired object or related
objects. As an example, the content of a newspaper story might not be
available for layout until the author's supervising editor has approved
the text. Thus, access to or manipulation of the corresponding object must
be restricted pending completion of this prerequisite operation. This
state-transition model of object manipulation can be extended to encompass
not only preconditions for object access, but post-conditions as well. For
example, modification of an object may be restricted until the DBMS has
been given an acknowledgement of the object's receipt by an application.
Status attributes can also be included within objects to record the
performance of procedures on the objects, and the access attribute of the
desired object can be set to evaluate the status attributes of its own
object or a related object. So long as the application program or DBMS
retrieval routines have been configured to respond appropriately to status
and access attributes, the user retains complete flexibility to design
work-flow procedures and/or personnel restrictions.
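The newspaper example above, in which a story is unavailable for layout until the supervising editor has approved it, can be sketched with a status attribute that the access check evaluates. The class Story, the key "editor_approved" and the function open_for_layout are hypothetical.

```python
class Story:
    """A story object carrying a status attribute (hypothetical names)."""

    def __init__(self, text: str) -> None:
        self.text = text
        # Status attribute recording whether the prerequisite has been performed.
        self.status = {"editor_approved": False}

    def approve(self) -> None:
        """Record the supervising editor's approval."""
        self.status["editor_approved"] = True


def open_for_layout(story: Story) -> str:
    """Precondition check: refuse access until the status attribute is set."""
    if not story.status["editor_approved"]:
        raise PermissionError("story not yet approved by supervising editor")
    return story.text


draft = Story("Draft text")
```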
As noted previously, the invention separates the content and classification
of a document component from its physical location. Accordingly, the
process of building a document depends largely on its organizational
priorities. "Content-driven" documents are composed of logical components
that exhibit a consistent physical layout pattern. An office letter can be
extremely content-driven, in that its logical components always fall
within precisely describable physical locations on the printed document. A
less extreme example might be a literary novel with sequential chapters;
although the exact lengths of chapters may vary, their sequential physical
organization remains consistent. Other documents are "layout-driven,"
meaning that physical appearance is accorded first priority over content.
An example might be a catalogue, wherein the arrangement and relative
sizing of items on a page can be different for each page of the catalogue,
while the content may well be standardized or provided to the user by an
outside source. The invention is capable of accommodating both types of
document creation strategies, as discussed below.
The invention may also be utilized in environments where various procedures
are performed on documents over time by different work units; such
applications are referred to as "work-flow processing." In a typical
work-flow environment, work groups or departments discharge tasks that
require examination or modification of documents kept in files and/or
archives or which are created during the work; examples of organizations
that make use of work-flow processing are insurance claims processing
departments or advertising organizations. The documents, with work
requests, are routed to appropriate personnel within a department,
processed, and then returned to storage or to another work group for
further processing. The path followed by a particular folder or set of
folders may be determined by the individuals working on them or may follow
a pre-planned course.
In such contexts, it may become necessary to superimpose a computer
program--referred to as an "application agent"--onto the DBMS. The
application agent follows a set of rules to route objects to appropriate
personnel at the proper time or pursuant to external command, and to
perform autonomously on objects those procedures amenable to automated
discharge. The application agent can also provide status reports relating
to various objects within the DBMS. However, if the work-flow path is
simple enough, an appropriately chosen set of status and access attributes
attached to the relevant objects may be sufficient.
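A rule-following application agent might, in its simplest form, be sketched as an ordered list of rules, each pairing a condition with a destination. The rule contents, the threshold, the work-group names and the folder fields below are invented for illustration and are not drawn from the present embodiment.

```python
from typing import Any, Callable, Dict, List, Tuple

Folder = Dict[str, Any]
Rule = Tuple[Callable[[Folder], bool], str]

# Hypothetical routing rules for a claims-processing department,
# evaluated in order; the first rule whose condition holds wins.
CLAIM_RULES: List[Rule] = [
    (lambda f: not f.get("forms_complete", False), "intake"),
    (lambda f: f.get("amount", 0) > 10_000, "senior adjuster"),
    (lambda f: True, "adjuster"),
]


def route(folder: Folder, rules: List[Rule], default: str = "archive") -> str:
    """Return the destination of the first matching rule, else the default."""
    for condition, destination in rules:
        if condition(folder):
            return destination
    return default
```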
The DBMS is also configured to provide the proper utility programs to
personnel in various work groups, and provide access to another work
group's system where appropriate.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other and further objects of the invention will be
understood more readily from the following detailed description of the
invention and the present embodiment, when taken in conjunction with the
accompanying drawings, in which:
FIG. 1 depicts the basic hierarchical organization of objects according to
the present invention, in the context of a book consisting of sequential
chapters.
FIG. 2 illustrates the computational modules comprising the present
invention.
FIG. 3 details the nomenclature and basic hierarchical organization of
objects in the present embodiment of the invention.
FIG. 4 illustrates the physical structures corresponding to the layout
objects contained in the present embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
1) System Components
Ideally, the invention utilizes an object-oriented database. However, the
existing embodiment of the invention employs INGRES, a relational database
marketed by Relational Technology, Inc., augmented by a series of
additional software modules that support object-oriented operation. With
these additional modules, any DBMS capable of distributed implementation
and, preferably, providing an SQL interface can be used advantageously.
As hereinabove noted, the invention accommodates simultaneously both the
content-driven and layout-driven strategies of document creation. This is
accomplished by maintaining a computational distinction between logical
objects and layout objects.
FIG. 1 schematically depicts logical and layout objects in the organization
hereinabove described. The particular example depicted in FIG. 1 is a book
containing sequential chapters. Logical objects, which appear on the left
side of the drawing, represent the conceptual organization of the book.
Layout objects, which appear on the right, represent physical divisions of
the document. Logical objects remain separate from layout objects until
mapped thereon by document manager 16. Document manager 16 is typically a
supervisory computer program which creates a tentative initial version of
the document based on layout parameters, content object attributes and the
content itself, although these operations may be performed manually if
necessary. All objects may contain attributes and "bindings," which are
attributes that specify computational procedures; these procedures are
called and executed, either by a properly configured application program
or by the input/output system, upon the occurrence of conditions specified
as part of the attribute. Content objects are hierarchically subordinate
to logical objects, but their attributes may influence layout objects by
specifying placement parameters.
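A binding, understood as an attribute that pairs a condition with a procedure, might be sketched as follows. The names Binding, BoundObject, fire and shrink_font, and the overflow example, are assumptions; the sketch does not show how the application program or input/output system would actually invoke bindings.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]


@dataclass
class Binding:
    """A condition and the procedure to execute when it holds."""
    condition: Callable[[Attributes], bool]
    procedure: Callable[[Attributes], None]


@dataclass
class BoundObject:
    name: str
    attributes: Attributes = field(default_factory=dict)
    bindings: List[Binding] = field(default_factory=list)

    def fire(self) -> int:
        """Execute every binding whose condition holds; return how many ran."""
        fired = 0
        for binding in self.bindings:
            if binding.condition(self.attributes):
                binding.procedure(self.attributes)
                fired += 1
        return fired


def shrink_font(attrs: Attributes) -> None:
    # Hypothetical procedure: reduce the font size by one point.
    attrs["font_size"] -= 1


headline = BoundObject(
    "headline",
    {"font_size": 36, "overflow": True},
    [Binding(lambda a: a["overflow"], shrink_font)],
)
```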
FIG. 2 illustrates the basic organization of the invention. In a simple
system comprising no automated-assistance programming, a user seeking to
construct a content-driven document would enter content and attributes
through an application program 21 into native or user-defined content
objects, which collectively describe the logical document. Alternatively,
the user may add or modify content directly through DBMS 27 using SQL
commands. DBMS 27 maintains the organization of all objects and controls
access thereto, but all communication with DBMS 27 is performed through
input/output system 23 (described below).
Next, the user would similarly enter basic layout parameters (such as
margins and justifications) into the layout objects. When the user has
entered all available content and layout parameters, the objects are
submitted to document manager 16, which generates final layout parameters
based on those already provided by the user, the set of logical objects
and the amount of content. As a simple example, the author of an
interoffice memorandum bearing a standard format might specify a style of
header, its page position, the contents of header subobjects (such as the
addressee and date), and the content of the memorandum itself. Document
manager 16 produces an appropriately formatted version of the memorandum
based on these values. The paginated memorandum is then sent to
input/output system 23. In this example, input/output system 23 converts
the document into a text stream for output onto a suitable viewing device.
The user has the ability to alter format and/or content as desired, and
send the document to other output devices.
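The memorandum example can be approximated by a very small pagination routine. This is only a sketch of the idea of deriving a paged layout from header content, body content and a few layout parameters; the function name, the parameters width and lines_per_page, and the sample memorandum are assumptions and do not describe document manager 16 itself.

```python
import textwrap
from typing import List


def paginate(header_lines: List[str], body: str,
             width: int = 40, lines_per_page: int = 10) -> List[List[str]]:
    """Wrap the body to the given width and split all lines into pages."""
    # Header first, then a blank separator line, then the wrapped body.
    lines = list(header_lines) + [""] + textwrap.wrap(body, width=width)
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


memo_pages = paginate(
    ["TO: Production Staff", "DATE: (date)"],
    "Please review the attached schedule. " * 12,
    width=30,
    lines_per_page=8,
)
```

Changing width or lines_per_page alters the page breaks without touching the content.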
Alternatively, the author of a layout-driven document could enter precise
values for the layout objects, thereby defining a physical structure into
which content may be loaded. This structure is ordinarily displayed to the
viewer prior to entry of content. For example, the designer of a single-
or multi-page advertisement might specify a set of borders, angular
rotation of specific portions of text, the location of an image within a
page and a color model (which establishes the method of encoding the color
of each data element). When the format has been satisfactorily determined,
the user could enter text directly into the allowed layout spaces,
lengthening, shortening or altering the fonts as necessary to accommodate
the layout.
Complex documents such as newspapers are both content-driven and
layout-driven to various degrees, depending on editorial proclivity and the
particular locations within the newspaper. In such circumstances,
additional software support mediates the elaborate relationship among
content size, content attributes and layout objects.
2) Object Management
Content objects contain raw data that may be decoded by appropriate output
devices. Such data may take the form of floating-point values, fixed-point
numbers, alphanumeric characters, byte strings, binary values, or similar
primitive elements. For example, the data of an image typically consists
of a series of pixel encodings. A standardized series of attributes is
also associated with each object; these attributes are interpreted by the
various components of the invention as necessary. Attributes such as color
model, resolution, position and decoding procedures, stored as part of an
image object, provide necessary output information to input/output system
23. Text object attributes relating to output include style commands,
fonts and typesetter driver functions. All objects carry a name and
classification, which is utilized by DBMS 27 to facilitate access and
linkage between attributes and objects.
Document objects and folder objects consist primarily of pointers and
attributes. Document object pointers specify the logical and layout
objects associated with a particular document; when the document has been
fully composed, all content objects are bound to layout objects. Multiple
layout objects (such as portions of corresponding pages in different
versions of a newspaper) can share the same content objects. The
hierarchically superior folder object pointers specify documents.
As shown in FIG. 2, objects are output to external devices by input/output
system 23, a computational module which obtains objects from DBMS 27 and
arranges the content in display order. An input/output system suitable for
use with the present invention is described in copending application Ser.
No. 07/446,975, filed contemporaneously herewith and incorporated herein
by reference. Input/output system 23 merges and translates these objects
into a linear list of output commands to drive one or more of output
devices 25.
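The merging of objects into a linear list of output commands might be sketched as follows. The command names (MOVE, DRAW_TEXT, DRAW_IMAGE), the dictionary keys and the choice of page, then vertical position, then horizontal position as the display order are assumptions for illustration; the actual input/output system is described in the copending application.

```python
from typing import Any, Dict, List, Tuple

Command = Tuple[Any, ...]


def to_output_commands(placed: List[Dict[str, Any]]) -> List[Command]:
    """Arrange placed objects in display order and emit a flat command list."""
    commands: List[Command] = []
    # Assumed display order: by page, then top to bottom, then left to right.
    for obj in sorted(placed, key=lambda o: (o["page"], o["y"], o["x"])):
        commands.append(("MOVE", obj["page"], obj["x"], obj["y"]))
        commands.append(("DRAW_" + obj["kind"].upper(), obj["content"]))
    return commands


placed_objects = [
    {"kind": "text", "page": 1, "x": 0, "y": 50, "content": "Body text"},
    {"kind": "image", "page": 1, "x": 0, "y": 0, "content": "photo-ref"},
]
commands = to_output_commands(placed_objects)
```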
As noted above, the user is provided with one or more utility programs 21
to assist with composition, which consists of content entry, modification,
pagination and style choices. The primary utility program for most
applications relates to text processing and layout modification, and is
referred to as the "text subsystem." The text subsystem consists of a
sophisticated word processing system that allows entry of text characters,
style and format commands which determine the physical layout and
appearance of characters, and graphic shapes; such systems are well-known
in the art. The word processor should be equipped with word-wrap and
hyphenation features.
After text input, modification and storage, the text subsystem supports an
interactive, interleaved process model that facilitates imaging and
positioning of text characters as they will appear in the visual output of
the system. Input/output system 23 facilitates visual imaging or "mock-up"
(i.e. dummy renditions) of these objects, the appearance of which may be
specified by an attribute. In complex typesetting applications, the text
subsystem calculates the physical size of the text block, and permits
characters to be positioned within graphical shapes and set onto baselines
within columns or within blocks. When the typeset version is finalized,
the text subsystem formats the content into a multi-byte description of
each text character. Information concerning the relationship of text
characters to one another is stored as attributes bound to text objects.
These attributes and embedded encodings are interpreted by input/output
system 23, where they are translated into commands that will drive the
particular output device chosen.
A second utility program, useful in publishing applications, is the "image
subsystem." This utility program facilitates input of image data for
designation by the DBMS as an object and user manipulation of the entered
image. The image subsystem formats the completed image into pixel values
and associated attributes that are interpreted by input/output system 23.
Image input may be accomplished in a number of ways, e.g. through a
conventional optical scanner, which digitally encodes tonal values of a
flat image passed beneath an electronic detector, or through communication
channels of equipment capable of encoding a subject in pixel format. The
application program that interacts with the input device is responsible
for furnishing values for the invention's set of image attributes.
The image subsystem operates on images resident in system memory. Editing
functions permit the user to access and modify the position, color and
density of image pixel values. A suitable image processing system should
also be capable of point manipulation, as well as manipulation (e.g.
rotation, cropping and scaling, and color modification) of an image
component or the entire image. Again, such systems are known and readily
available to those skilled in the art.
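Two of the manipulations mentioned above, cropping and rotation, can be illustrated on an image held as rows of pixel values. The representation as nested lists and the function names are assumptions; a practical image subsystem would operate on far larger encodings.

```python
from typing import List

Image = List[List[int]]


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the sub-image of the given size starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def rotate_clockwise(image: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    # Reversing the row order and transposing yields a clockwise rotation.
    return [list(column) for column in zip(*image[::-1])]


pixels = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
```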
Output of an image by input/output system 23 requires retrieval of the
actual data representing the final image or image rendition, translation
of this data and associated image attributes into display data, and
transmission to the selected output device.
DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
A particularly useful application of the present invention is in an
integrated newspaper publishing environment. This complex application
draws heavily on the system's power to accommodate large numbers of users
simultaneously adding or modifying content or layout constraints; multiple
document versions that contain much of the same content; and intricate,
recursive relationships among layout, content and independent user choic