Claims
What is claimed is:
1. A computer-implemented method for creating a plurality of versions of
custom documents from a document source file, comprising the steps of:
defining a source file having a plurality of encapsulated data elements,
wherein each encapsulated data element includes document information
including text or graphics, wherein the plurality of encapsulated data
elements includes a plurality of first encapsulated data elements and a
plurality of second encapsulated data elements;
defining a first class, wherein the first class includes a plurality of
first class variation names and wherein the plurality of first class
variation names includes a first and a second variation name;
tagging the plurality of first encapsulated data elements with the first
variation name in the first class;
tagging the plurality of second encapsulated data elements with the second
variation name in the first class;
selecting encapsulated data elements from the plurality of encapsulated
data elements wherein the step of selecting includes the step of choosing
a set of variation names including the first variation name; and
filtering the source file with the set of variation names, wherein the step
of filtering comprises forming a filtered source file comprising the
selected encapsulated data elements.
2. The method according to claim 1 wherein the step of selecting further
comprises the step of adding the second variation name to the set of
variation names.
3. The method according to claim 2 wherein the plurality of encapsulated
data elements includes a plurality of common encapsulated data elements
and wherein the step of forming the filtered source file includes the step
of placing the plurality of common encapsulated data elements into the
filtered source file.
4. The method according to claim 1 wherein the method further comprises the
step of producing a custom document output from the filtered source file.
5. The method according to claim 4 wherein the step of producing a custom
document output comprises the steps of:
formatting the filtered source file to produce an electronic document which
can be read on a display device; and
storing the electronic document in a machine-readable format such that the
electronic document can be searched and displayed via search and display
software.
6. The method according to claim 1 wherein the method further comprises the
steps of:
providing a second class, wherein the second class includes a plurality of
second class variation names and wherein the plurality of second class
variation names includes a third variation name; and
tagging one of the plurality of encapsulated data elements with the third
variation name; and
wherein the step of choosing a set of variation names comprises the steps
of:
selecting a first class variation name;
selecting a second class variation name; and
adding the selected variation names to the set of variation names.
7. The method according to claim 6 wherein the encapsulated data element
tagged with the third variation name is one of the plurality of first
encapsulated data elements.
8. The method according to claim 6 wherein the plurality of encapsulated
data elements includes a plurality of common encapsulated data elements
and wherein the step of forming the filtered source file includes the step
of placing the plurality of common encapsulated data elements into the
filtered source file.
9. The method according to claim 6 wherein the method further comprises the
step of producing a custom document output from the filtered source file.
10. The method according to claim 9 wherein the step of producing a custom
document output comprises the steps of:
formatting the filtered source file to produce an electronic document which
can be read on a display device; and
storing the electronic document in a machine-readable format such that the
electronic document can be searched and displayed via search and display
software.
11. A computer-implemented method of generating a version of a document
from a document database, comprising the steps of:
providing a plurality of data objects, wherein each data object includes
document information including text or graphics, wherein the plurality of
data objects comprises a plurality of first data objects, a plurality of
second data objects and a plurality of common data objects;
defining a first class having a first variation name and a second variation
name;
associating the first variation name with the plurality of first data
objects;
associating the second variation name with the plurality of second data
objects;
selecting the data objects associated with a set of variation names,
wherein the step of selecting includes the step of adding the first
variation name to the set of variation names; and
forming an output document wherein the step of forming includes the step of
removing the unselected second data objects.
12. The computer-implemented method of claim 11 wherein the step of
selecting further comprises the step of adding the second variation name
to the set of variation names.
13. The computer-implemented method of claim 11 wherein the step of forming
comprises the step of formatting the output document in a predefined
format.
14. The computer-implemented method of claim 11 wherein the plurality of
data objects includes a third data object and wherein the method further
comprises the steps of:
defining a second class having a third variation name;
associating the third variation name with the third data object; and
wherein the step of selecting further comprises the step of adding the
third variation name to the set of variation names.
15. The computer-implemented method of claim 14 wherein the step of forming
comprises the step of formatting the output document in a predefined
format.
16. The computer-implemented method of claim 11 wherein the method further
comprises the steps of:
defining a second class having a third variation name;
associating the third variation name with one of the second data objects;
and
wherein the step of selecting further comprises the step of adding the
third variation name to the set of variation names.
17. The computer-implemented method of claim 16 wherein the step of forming
comprises the step of formatting the output document in a predefined
format.
18. A document generation system for generating a variety of documents from
a common document database, comprising:
authoring means for entering a document having a plurality of data objects,
wherein the plurality of data objects includes a plurality of first data
objects, a plurality of second data objects and a plurality of common data
objects, wherein each data object includes document information including
text or graphics, wherein the authoring means includes first class
assigning means for assigning a first class having a first and a second
variation name to each of the first and second data objects, wherein the
first class assigning means comprises means for associating the plurality
of first data objects with the first variation name and means for
associating the plurality of second data objects with the second variation
name;
document validation means for determining that the document is in a
predetermined format; and
document filtering means for removing data objects associated with the
second variation name.
19. The system according to claim 18, wherein the plurality of first data
objects includes a third data object and wherein:
the authoring means further includes second class assigning means for
assigning a second class having a third variation name, wherein the second
class assigning means comprises means for associating the third data
object with the third variation name; and
the document filtering means includes means for removing data objects
associated with the third variation name.
20. The system according to claim 18, wherein the plurality of data objects
further include a third and a fourth data object and wherein:
the authoring means further includes second class assigning means for
assigning a second class having a third and a fourth variation name,
wherein the second class assigning means comprises means for associating
the third data object with the third variation name and means for
associating the fourth data object with the fourth variation name; and
the document filtering means includes means for removing data objects
associated with the fourth variation name.
21. The system according to claim 18, wherein the system further comprises
formatting means, connected to the document filtering means, for placing
the document in a predefined format.
22. A computer implemented method of creating multiple variations of
documentation, comprising the steps of:
defining a source file having a plurality of document elements, wherein the
document elements include document information including text or graphics;
tagging predetermined ones of said plurality of document elements as first
variation document elements;
tagging predetermined ones of said plurality of document elements as second
variation document elements;
tagging predetermined ones of said plurality of document elements as common
document elements;
selecting a first variation;
scanning said source file for selected document elements, wherein the step
of scanning includes the step of scanning said source file for first
variation document elements and for common document elements; and
generating an output document from the document information contained in
the selected document elements.
23. The method of claim 22 further comprising the step of selecting a
second variation; wherein the step of scanning further includes the step
of scanning said source file for second variation document elements and
wherein the step of generating comprises the steps of marking as first
variation elements the document information from said first variation
document elements and as second variation elements the document
information from said second variation document elements.
24. A computer-implemented method of generating a version of a document
from a document database, comprising the steps of:
providing a plurality of document section objects, wherein the plurality of
document section objects includes document section objects having one or
more paragraphs and document section objects having one or more
illustrations;
dividing the plurality of document section objects into a plurality of
first document section objects, a plurality of second document section
objects and a plurality of common document section objects;
defining a first class having a first variation name and a second variation
name;
associating the first variation name with the plurality of first document
section objects;
associating the second variation name with the plurality of second document
section objects;
selecting the document section objects associated with a set of variation
names, wherein the step of selecting includes the step of adding the first
variation name to the set of variation names; and
filtering the document database to form an output document comprising the
common document section objects and the selected document section objects.
25. The computer-implemented method of claim 24 wherein the step of
selecting further comprises the step of adding the second variation name
to the set of variation names.
26. The computer-implemented method of claim 24 wherein the step of
filtering comprises the step of formatting the output document in a
predefined format.
27. The computer-implemented method of claim 24 wherein the method further
comprises the steps of:
defining a second class having a third variation name;
associating the third variation name with one of the second document
section objects; and
wherein the step of selecting further comprises the step of adding the
third variation name to the set of variation names.
28. The computer-implemented method of claim 27 wherein the step of
filtering comprises the step of formatting the output document in a
predefined format.
29. A document generation system for generating a variety of documents from
a common document database, comprising:
an input/output device, wherein the input/output device includes authoring
means for entering a document having a plurality of encapsulated
paragraphs and means for grouping each of the plurality of encapsulated
paragraphs into first, second and common encapsulated paragraphs, wherein
the authoring means includes first class assigning means for assigning a
first class having a first and a second variation name to each of the
first and second encapsulated paragraphs, wherein the first class
assigning means comprises means for associating the first encapsulated
paragraphs with the first variation name and means for associating the
second encapsulated paragraphs with the second variation name;
document validation means, connected to the input/output device, for
determining that the document is in a predetermined format;
storage means, connected to the document validation means, for storing the
document; and
document filtering means, connected to the storage means, for removing
encapsulated paragraphs associated with the second variation name.
30. The system according to claim 29, wherein:
the authoring means further includes second class assigning means for
assigning a second class having a third variation name, wherein the second
class assigning means comprises means for associating one of the first
encapsulated paragraphs with the third variation name; and
the document filtering means includes means for removing data objects
associated with the third variation name.
31. The system according to claim 29, wherein the system further comprises
formatting means, connected to the document filtering means, for placing
the document in a predefined format.
32. The system according to claim 29, wherein the document further includes
one or more encapsulated illustrations associated with the first variation
name and one or more encapsulated illustrations associated with the second
variation name, wherein the document filtering means includes means for
removing encapsulated illustrations associated with the second variation
name.
Description
FIELD OF THE INVENTION
The present invention relates to computer-implemented methods of document
production, and in particular to a system and method for producing a
variety of documents from a common document database.
BACKGROUND
Documenting software products and other technical subjects has always been
difficult. The technical writer must learn the subject and write a
document that helps the reader understand the subject, find information
quickly, and troubleshoot problems. The more complex the subject, the more
difficult the writer's task. Each significant advance in modern technology
brings a new level of complexity and challenge for the writer of technical
documentation.
A Common Problem: Documenting Multiple Variations of a Subject
For example, consider the case of a set of user manuals for a software
product. When a new product is first released, it typically operates on
only one hardware platform, such as a personal computer (PC), a
workstation or a specific mainframe system. One set of user manuals is
created for the product. During the software product life cycle, however,
the product will generally become increasingly sophisticated and will be
upgraded to run on additional hardware platforms. Now the documentation
must not only track the increased sophistication but also adapt to
variations in operation introduced by the new hardware platforms.
In the past, technical writers have used one of two approaches to handle
variations (such as multiple hardware platforms) in a product offering. In
the first approach, each variation is documented separately. That is, the
writers will write documentation for the initial product and then modify
or rewrite that document for each product variation. The result is a
sequence of technical documents having a common ancestor, but no more.
Changes to one document are not necessarily reflected in another document.
The separate variations of the documents quickly become inconsistent and
hard to manage. The redundant documents are costly for the companies to
produce and maintain and are less usable for customers who want to compare
related products.
In the second approach, all variations are documented together in a single
document, with variation-specific information intermixed with common
information. For customers who want information about only one variation,
such an approach can be confusing. They may, for instance, find it
difficult to separate the information relating to their specific variation
from information relating to the other variations.
This problem of documenting multiple variations of information is a common
one. For software documentation, other examples include products that
implement multiple variations of a standard or run on multiple graphical
user interfaces. And the problem is not limited to software documentation.
For example, a piece of equipment, such as a television, lawn mower, or
kitchen appliance, often comes in several models, each with its own
variation of instructions. A course workbook might be developed for a
subject that will be taught to novice, intermediate, or advanced students,
each with variations of similar information. Even a cookbook might include
recipes with variations for low-cholesterol, low-sodium, or diabetic
diets.
Existing Document Generation Systems and Their Limitations
To produce documentation, in both printed and electronic forms, technical
writers use document generation systems. Most document generation systems
today use a form of procedural (or physical) markup within the source data
files. Procedural markup specifies the presentation of a particular region
of text in an output document, such as the use of bold or italics for
emphasis. This type of markup is specific to one particular output format.
Procedural markup has an inherent limitation: It inhibits the reuse of
text. Because it specifies formatting characteristics, such as bold or
shading, procedural markup does not facilitate reuse of text for different
output media. For example, the formatting command that specifies a shaded
region of text for printed output has no corresponding presentation
technique for on-screen presentation. Or a table structure with horizontal
and vertical boundaries may need to be displayed differently on the screen
than it is on paper. And font styles and sizes might be different for
paper and on-screen display.
Another reason procedural markup inhibits reuse is that it is often
system-dependent. The formatting codes used for one document generation
system are generally incompatible with other document generation systems.
When writers using different document generation systems need to share
text, they often must convert the files from the format of the sender's
document generation system to a plain text, or ASCII format, which removes
all procedural markup codes. Then the receiver must insert the procedural
markup codes of his/her document generation system into the ASCII file.
This is obviously labor-intensive, time-consuming, and costly.
Another limitation of existing document generation systems is their
inability to encapsulate information about text. For example, a given
paragraph or table might apply to a given variation (such as hardware
platform), but there is no inherent way to attach this information to the
text so that variation-specific documents can be generated.
SGML: A Standard That Promotes Reuse
To address the inherent limitations with existing document generation
systems, the ISO 8879 Standard Generalized Markup Language (SGML) was
published in 1986, and it has become increasingly popular in the industry.
SGML is a language for describing the structure and content of a document.
It is a structural, not procedural (or physical) markup language. In
structural markup languages, the markup used within the source data files
identifies the kind of information stored in each data element (such as
heading, paragraph, table), rather than the physical presentation of that
element (such as typeface or table format). Therefore, text authored with
SGML is highly reusable. The same text can be reused for various output
media (such as printed documents and on-screen help text), and it is
system-independent so it can be shared by writers using different document
generation systems.
To understand SGML, it is helpful to briefly examine the SGML tags and how
they are used in a source data file. Each tag is enclosed in less-than
and greater-than symbols (<>); for example, a tag that specifies the
beginning of a paragraph might look like <p>. In an SGML source data file,
each of the various elements is clearly distinguished with beginning and
ending tags. The ending tags are preceded by a forward slash (/)
character. Authors can use commercially available software tools to insert
the required tags into the data file, or they can code the markup directly
using an ASCII editor.
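The tagging convention described above can be illustrated with a minimal sketch. The helper names start_tag, end_tag, and element are hypothetical and are not part of SGML or of any SGML tool; the sketch only shows how a beginning tag and an ending tag (with its forward slash) delimit one element.

```python
def start_tag(name: str) -> str:
    """Return a beginning tag, e.g. <p> (hypothetical helper)."""
    return f"<{name}>"


def end_tag(name: str) -> str:
    """Return an ending tag, which adds a forward slash, e.g. </p>."""
    return f"</{name}>"


def element(name: str, content: str) -> str:
    """Delimit content with matching beginning and ending tags."""
    return start_tag(name) + content + end_tag(name)


print(element("p", "This is a paragraph."))
# prints: <p>This is a paragraph.</p>
```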
The specific names used for the tags within an SGML data file and the
hierarchical relationships between the various elements are based on a set
of rules, called a Document Type Definition (DTD). The DTD is written
according to a rigorous language defined by the ISO 8879 standard. The
rules described by the DTD follow standards that have been defined for a
particular type of information data element (e.g., a bullet list must
contain more than one bullet, a second level heading must precede a third
level heading).
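The two example rules just mentioned can be sketched in code. This is only an illustration under stated assumptions: a real DTD is a declarative rule set written in the ISO 8879 syntax and enforced by a validating parser, not a Python function, and the tag names "list", "item", "h2", and "h3" as well as the Element class and check_rules function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A parsed document element: a tag name plus child elements."""
    tag: str
    children: list = field(default_factory=list)


def check_rules(root: Element) -> list:
    """Return a list of rule violations found in document order."""
    errors = []
    seen_h2 = False  # becomes True once a second-level heading is seen

    def visit(el: Element) -> None:
        nonlocal seen_h2
        if el.tag == "list":
            # Rule 1: a bullet list must contain more than one bullet.
            items = [c for c in el.children if c.tag == "item"]
            if len(items) < 2:
                errors.append("list must contain more than one item")
        if el.tag == "h2":
            seen_h2 = True
        if el.tag == "h3" and not seen_h2:
            # Rule 2: a second-level heading must precede a third-level one.
            errors.append("h3 appears before any h2")
        for child in el.children:
            visit(child)

    visit(root)
    return errors
```

A document with a third-level heading and no earlier second-level heading, or with a one-item list, would be reported as invalid under these assumed rules.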
In a typical SGML implementation, the SGML source file and its associated
DTD are read into a validation software program. The validation program
parses the source file and determines whether the file conforms to the
rules defined by the DTD. If any rules are violated, an error is detected.
Errors can range from syntax errors, such as misspellings, to missing
elements. When validation is successful, all elements needed to make up a
document are present in the correct order. The data is then ready for
production (formatting), a stage that is handled by software applications
that format the SGML data elements for specific output mediums. More
information on SGML and on the use of SGML data files and Document Type
Definitions can be found in Standard Generalized Markup Language (SGML)
International Standard (ISO) 8879, First Edition in SGML (ISO) 8879:
1986/Amendment 1, both published in 1986. Both publications are available
from Graphic Communications Association (GCA) Publications and Resources
in Alexandria, Va.
SGML and Object-Oriented Information Management
The concept of object orientation has become popular in the field of
computer programming because it enables reuse of programming code objects.
Similarly, the concept of object orientation can be applied to technical
writing, where "objects" of documentation (including text, graphics, and
other forms) can be reused in many ways for many different documents. This
object-oriented information management addresses many of the problems
associated with authoring increasingly complex technical documentation.
By definition, SGML provides a mechanism for object-oriented integrated
information management. Under an object-oriented information management
strategy, individual snippets (or objects) of data become part of an
organization's database of information. Documents are then formed by
piecing together pertinent objects. SGML provides a mechanism for
encapsulating data within information objects. For example, data
describing an object's purpose or links to other objects may be
encapsulated within an information object.
As a standard language specification, however, SGML defines only the way
information is described in the source data files. SGML does not define
how to manipulate the data elements or how to generate output documents
from the source data files. Although SGML provides the ability to
encapsulate data within information objects, it does not define a method
or system for manipulating the information objects in order to produce a
variety of documents from a single document file.
There is a need in the art for a method of authoring a single source
document that contains multiple variations of a subject such that the
single source document can be used to generate a variety of documents
based on these variations. In addition, there is a need in the art for a
system for using that source document to generate a document tailored to
each variation or documents containing combinations of variations, where
variation-specific information is clearly identified.
Finally, there is a need in the art for a system and method of producing
documents that can identify data encapsulated within an information object
as specific to a particular variation, that can manipulate the
encapsulated data objects, that can generate multiple output documents
from a single source file that contains information for multiple
variations, and that permits variety in the presentation of the
encapsulated variation data in the output documents.
SUMMARY
The present invention is a system and method for producing a variety of
documents from a common document database. A document is partitioned into
a number of encapsulated data elements. One or more classes of variations
are defined and variation names are associated with each class. Data
elements within the document are tagged with one or more variation names
and placed within a document database; the resulting document database can
be filtered and formatted to form variation-specific documents.
According to another aspect of the present invention, a method of creating
custom documentation is described. A document having a plurality of
encapsulated data elements and a set of rules detailing a relationship
between the encapsulated data elements is provided. Each encapsulated data
element may be associated with one or more classes, wherein each class
includes a plurality of variation names. A document is created by
selecting a set of the variation names of data elements to be included in
the document and then collecting a set of data elements having those
attributes as a filtered data file. The filtered data file is formatted and
output as a custom document.
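The selection and filtering steps summarized above can be sketched as follows. This is a simplified illustration, not the filtering procedure of the preferred embodiment: the DataElement class and filter_source function are hypothetical names, an element with no variation names is assumed to be common, and an element is assumed to be kept when it carries at least one selected variation name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """An encapsulated data element: content plus its variation names."""
    content: str
    variations: frozenset = frozenset()  # empty set = common element (assumption)


def filter_source(elements, selected):
    """Keep common elements and elements tagged with a selected name."""
    selected = set(selected)
    return [
        e for e in elements
        if not e.variations or (e.variations & selected)
    ]


source = [
    DataElement("Introduction"),
    DataElement("PC installation", frozenset({"PC"})),
    DataElement("UNIX installation", frozenset({"UNIX"})),
]
filtered = filter_source(source, {"PC"})
print([e.content for e in filtered])
# prints: ['Introduction', 'PC installation']
```

Adding "UNIX" to the selected set would keep all three elements, which corresponds to a combined document.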
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings, where like numerals refer to like components throughout
the several views,
FIG. 1 shows a conceptual model of a document generation system according
to the present invention;
FIG. 2 shows an example document database implemented using a document
generation system according to the present invention;
FIG. 3 is an information data model of a document generation system
according to the present invention;
FIG. 4 is an information flow model of a document generation system
according to the present invention;
FIGS. 5a and 5b are functional block diagrams of two embodiments of the
document generation system;
FIGS. 6a and 6b show a document segment and its corresponding SGML data
file;
FIGS. 7a and 7b show a representative Document Type Definition and its
hierarchy;
FIG. 8a is a key to FIGS. 8aa-8ab;
FIGS. 8aa-8ab, 8b, and 8c show a Document Type Definition, its hierarchy,
and a corresponding SGML data file, respectively, for one embodiment of
the CDS configuration as defined by the present invention;
FIGS. 9a and 9b show a representative output document segment and its
corresponding SGML data file for one embodiment of the present invention;
FIGS. 10a, 10b, and 10c show an SGML data file containing two classes of
variations and two corresponding output documents according to one
embodiment of the present invention;
FIGS. 11a-c show the steps executed by a computer in validating an SGML data
file that uses CDS;
FIGS. 12a and 12b show the steps executed by a computer in filtering an
SGML data file that uses CDS; and
FIGS. 13a-c show the steps executed by a computer in formatting a filtered
SGML data file that uses CDS.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
In the following detailed description of the preferred embodiment,
reference is made to the accompanying drawings which form a part hereof,
and in which is shown by way of illustration specific embodiments in which
the inventions may be practiced. It is to be understood that other
embodiments may be utilized without departing from the spirit and scope of
the present inventions. The following detailed description is, therefore,
not to be taken in a limiting sense. Instead the scope of the present
inventions is defined by the appended claims.
Glossary
This application is complex and the following terms will be abbreviated to
preserve the flow of the document:
CDS--Common Documentation System: An authoring methodology and information
data model that focuses on combining multiple variations of information
into a single document source database. From this database, many different
types of printed or electronic documents can be produced.
DTD--Document Type Definition: A set of rules which define the relationship
between data elements in an SGML source file.
PC--personal computer
SGML--Standard Generalized Markup Language: SGML is a language for
describing the structure and content of a document. It is a structural
markup language published as ISO 8879 in 1986.
Definitions
The following terms and concepts are used in the description of the
invention:
author: A person who creates or modifies a database used to produce a CDS
document.
class: An author-specified category of variations used within Common
Documentation System (CDS) documents.
custom document: A document, generated from a source file containing
multiple variations of information, that contains a subset of the
variations included in the source file. The custom document may be
generated by the document author or by the end user.
database: A collection of source files created by authors and used to
produce documents. With the CDS methodology, a single database is used for
multiple variations of information.
document: A printed document, electronic document, courseware,
computer-based training package, or any deliverable that describes or trains
people in a product, service, procedure, or concept.
encapsulated data element: A unit of information that makes up a document,
consisting of information (such as a paragraph or illustration), along
with data about that information (such as its purpose, the variations it
applies to, or links to other encapsulated data elements). Also commonly
referred to as information objects.
end user: A person who uses documents, such as a programmer who uses a
programming reference manual.
information data model: A conceptual aid used to describe concepts of an
information (document) database. It includes a description of encapsulated
data elements within a database and the structure and relationships of
those elements. The CDS information data model describes the information
included in the databases to enable programmatic application of the output
and suppression of variation-specific information.
methodology: A system of principles, processes, and practices applied to a
specific branch of knowledge (which, in this paper, is the authoring of
documents).
platform: Hardware or operating system, such as a specific personal
computer, midrange, or mainframe system.
variation name: An identifier for a variation within a class as defined by
the CDS information data model.
The Common Documentation System
The COMMON DOCUMENTATION SYSTEM (hereinafter "CDS") defines an authoring
methodology and information data model that focuses on combining multiple
variations of document data into a single document source database. From
this database, many different types of printed or electronic output
documents can be generated to suit the specific needs of the users as well
as the creators of the documentation. From a single source database, the
system generates a combined document containing information for all
variations, multiple documents containing information for combinations of
variations, and/or a separate document for each variation. (COMMON
DOCUMENTATION SYSTEM and CDS are trademarks of Unisys Corporation, the
assignee of the present invention.)
A conceptual model of a CDS document generation system is shown in FIG. 1.
As shown in FIG. 1, a document database 10 contains multiple variations of
information (represented in the figure as a triangle and a circle). A
document database processor 12 processes this database, filtering out any
variations specified by the author, and generates tailored output
documents. Depending on the input from the author, the system generates
one output document 14 containing one variation (represented by the
triangle), another output document 16 containing the other variation
(represented by a circle), and/or another output document 18 containing
both variations.
Here is an example of how the CDS document generation system might be used.
Suppose a software product runs on a PC, a UNIX system, and a mainframe.
The software basically operates the same on all three platforms, but there
are significant differences that must be addressed in the user
documentation. A typical documentation group would create, generate, and
deliver three separate sets of documentation, one for each platform. This
is inefficient, redundant, and costly for the documentation group. And
customers who use the product on two or three of the platforms must use
separate, redundant document sets.
FIG. 2 shows how CDS can solve these problems. Using CDS, the author
creates a single document database 20 containing information for all three
platforms. Depending on the author's input, the system generates any of
the following documents: A document 22 containing information for the PC
only, a document 24 for UNIX only, a document 26 for the mainframe only, a
document 28 that contains all three platforms, a document 30 that contains
PC and UNIX only, a document 32 that contains UNIX and mainframe only,
and/or a document 34 that contains PC and mainframe only. In one
embodiment, in documents (e.g., 28, 30, 32, 34) containing more than one
variation, the information specific to each platform is visually
identified in the text of the output document.
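The seven documents of FIG. 2 correspond to the non-empty subsets of the three platform variations, since 2^3 - 1 = 7. A minimal sketch that enumerates them is shown below; the function name variation_sets and the PLATFORMS list are assumptions introduced only for illustration.

```python
from itertools import combinations

PLATFORMS = ["PC", "UNIX", "mainframe"]  # variation names from the example


def variation_sets(names):
    """Return every non-empty subset of names, smallest subsets first."""
    return [
        set(combo)
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]


for chosen in variation_sets(PLATFORMS):
    print(sorted(chosen))
print(len(variation_sets(PLATFORMS)))  # prints: 7
```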
In another embodiment, for documents delivered in electronic form (such as
CD-ROM) that contain multiple variations (such as the three platforms),
the vendor uses search and retrieval software to suppress information
specific to one or more variations. For example, while viewing an
electronic document containing information for the PC, UNIX, and