Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to structured database systems, and
more particularly to a structured database system that uses structured
information in documents to manage the documents. In general, information
in documents is innately structured. Examples of documents having such
structured information include those used in the activities of companies,
such as drawings and specifications.
2. Description of the Prior Art
Generally, information in documents has a tree structure as shown in FIGS.
1A and 1B. FIG. 1B schematically shows the tree structure shown in FIG.
1A. Information in documents is structured in terms of structural units or
elements such as a group of documents, documents, chapters, sections and
paragraphs. The tree structure of a document may be dynamically changed.
For example, the tree structure may be expanded by adding a new unit after
an existing unit of tree structure or grouping a number of existing units.
For example, items are defined in paragraphs, drawings and tables, and are
then collected to form a group that follows an existing section.
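The dynamic expansion described above can be illustrated with a minimal
Python sketch, assuming the tree is held in memory. The `Unit` class and its
`add_after` and `group` methods are hypothetical names chosen for
illustration. The sketch inserts new units after an existing section and then
collects them into a group that follows that section.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare units by identity, not by field values
class Unit:
    """One structural unit of a document, e.g. a chapter, section or paragraph."""
    kind: str
    title: str = ""
    children: list[Unit] = field(default_factory=list)

    def add_after(self, existing: Unit, new: Unit) -> None:
        """Expand the tree by inserting `new` directly after an existing child."""
        self.children.insert(self.children.index(existing) + 1, new)

    def group(self, units: list[Unit], kind: str, title: str = "") -> Unit:
        """Collect existing child units into a new group that takes their place."""
        position = self.children.index(units[0])
        for unit in units:
            self.children.remove(unit)
        grp = Unit(kind, title, list(units))
        self.children.insert(position, grp)
        return grp


# A chapter with one section; items are added after it and then grouped.
chapter = Unit("chapter", "Chapter 1")
section = Unit("section", "1.1")
chapter.children.append(section)
para, drawing, table = Unit("paragraph"), Unit("drawing"), Unit("table")
chapter.add_after(section, para)
chapter.add_after(para, drawing)
chapter.add_after(drawing, table)
items = chapter.group([para, drawing, table], "group", "items")
print([u.kind for u in chapter.children])  # ['section', 'group']
```

Identity comparison (`eq=False`) is used so that two units with equal
contents are still treated as distinct positions in the tree.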
The structured database handles electronic information of structured
documents. The electronic information of documents can be in the form of
text data, graphic data (image data and vector data), source code
(normally character data), the internal code (normally vector data) of a
CAD (Computer Assisted Design) system and so on.
Conventionally, a word processor, a DTP (Desk Top Publishing) system, a CAP
(Computer Assisted Publishing) system, and a CAD system are known as
devices for creating and managing the electronic data of documents.
Further, existing database systems, such as an RDB (Relational DataBase),
can be used to store and manage documents.
The devices mentioned above are classified into two types: a first type in
which a document is handled as groups of symbols such as characters,
control symbols and graphic symbols, and a second type in which a mark
called a "tag" is added to elements in a document. The devices of the first
type handle a document as simple data and therefore have difficulty in
managing and reusing the information structure. For example, information
retrieval must be performed in order to find specific information in a
specific document or the history of modified portions. Generally, it is
very difficult to correctly obtain all of the necessary information by
means of information retrieval for this purpose. Even where a document
management table is electronically linked with the documents, it is only
possible to retrieve the storage area in which the target document is
stored; the necessary information cannot be correctly obtained from the
target document unless the operator actually views its contents.
The devices of the second type are capable of performing management based
on the structures of documents. However, the devices of the second type
still handle document files as groups of data blocks independent of the
document structures, and hence need a particular mechanism, such as the
document management table, in order to develop, manage and reuse documents
(including groups of mutually associated documents) and to perform
information retrieval. As in the case of paper documents, this mechanism
is not directly related to the information bodies themselves. Hence, the
devices of the second type do not have sufficient efficiency and
reliability in information retrieval and so on.
The existing database systems have structures that are optimized for
specific operations and lack functions for efficiently and effectively
supporting document structures. Hence, the existing database systems have
the following disadvantages, particularly regarding the way they are used.
When a document is stored and managed in an existing database system, the
document must be arranged on the basis of the structure of that database
system. For example, when a document is stored in an RDB system, the
document is required to be arranged and stored in the form of a table.
On the other hand, if an existing database system is modified to match the
structure of a document to be stored and managed, definitions that were not
originally provided may have to be added to the existing database system.
For example, it is required to define a pointer for accessing a file
and/or a free field for each field of the RDB system. Such additional
definitions may degrade the original performance of the existing database
system, particularly regarding the efficiency of information retrieval and
storage capacity. In some cases, the additional definitions may prevent
use of the original access method, such as the standard query language for
the RDB system. Such a problem further degrades the efficiency of accessing
the database and sometimes requires a particular remedy, i.e., a dedicated
access program.
The structure of documents is flexible. For example, the numbers of
structural units or elements of documents, such as chapters and sections,
are variable, and the document structure may be expanded. Normally, the
structure definition (schema definition) of an existing database system is
determined before data is actually stored. Hence, it is very troublesome
to modify the structure of an active database system while it is in use.
When the active database system is modified, a data backup process is
needed, and the saved data may have to be loaded into the system again
after the modification is complete in order to match the saved data with
the modified database structure.
The database system must always store the latest information regarding
documents. When a document is revised, a revised version or edition of the
document is issued. In some cases, it is necessary to save not only the
revised version but also the previous versions. Hence, documents having a
number of versions must be managed efficiently.
The conventional database systems can easily manage the latest version but
must save the previous versions separately from the latest version. In
this case, a particular mechanism such as a register system is needed to
manage the correspondence between the latest version and the previous
versions. Hence, it is necessary to save all the versions and to manage and
update the correspondence among them.
However, it is practically impossible to manage the correspondence among
the versions by means of the register mechanism. For example, if an error
found in one version must be corrected in the other versions, it is very
difficult to efficiently locate that error in each of the other versions.
Further, the error may not be completely corrected in some of the other
versions.
In some cases, a document is required to be written in a number of
languages. When a document written in a particular language is developed
or modified, the versions written in the other languages must be developed
or modified so as to have the same contents as the document originally
developed or modified, for each of the structural units such as chapters,
sections and paragraphs. For example, when a Japanese document is
translated into English, information inherent in Japanese may be omitted,
or one paragraph may be divided into a number of parts such as paragraphs.
In this case, the information elements of the Japanese document and those
of the English translation do not have a direct one-to-one correspondence.
Even in this case, the correspondence between the Japanese document and
its English version must be managed for each information element.
Further, when either the Japanese or the English version is modified, it
may be very troublesome to modify the other version even if the relevant
portions of the version to be modified are easily identified. If the
Japanese version is greatly modified, it may be necessary to translate the
modified Japanese version again in order to prepare an English version
that corresponds exactly to it.
It will be noted that the contents of documents in the form of paper can be
easily seen, while documents stored as electronic information cannot be
seen directly. In the form of paper, the location of information can be
seen and information retrieval is facilitated. However, such useful cues
are not available for electronically stored information. As the amount of
electronically stored information increases, more useful tools, such as
tables of contents and indexes, are required to facilitate information
retrieval in addition to improvements in the structure of the database
system.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a structured database
system capable of performing retrieval at high speeds and in accordance
with the document structures.
The above object of the present invention is achieved by a structured
database system comprising: first means for obtaining a structure
definition frame of a document showing a structure of the document; and
second means for storing body data of the document in a database together
with the structure definition frame.
BRIEF DESCRIPTION OF THE DRAWINGS
Other objects, features and advantages of the present invention will become
more apparent from the following detailed description when read in
conjunction with the accompanying drawings, in which:
FIGS. 1A and 1B are illustrative and symbolic diagrams respectively which
show a structure of a document;
FIG. 2 is a block diagram of a structured database system according to an
embodiment of the present invention;
FIG. 3 is a diagram showing a structure definition frame and a document
stored in a database according to the embodiment of the present invention;
FIGS. 4 and 5 are flowcharts of processes for converting information
concerning a document into information suitable for the database according
to the embodiment of the present invention;
FIG. 6 is a diagram of an example of information stored in the database
according to the embodiment of the present invention;
FIG. 7 is a flowchart of a process for creating a table of contents
according to the embodiment of the present invention;
FIG. 8 is a diagram showing an example of the table-of-contents creating
process shown in FIG. 7;
FIG. 9 is a diagram showing how data is stored in the database according to
the embodiment of the present invention;
FIG. 10 is a flowchart of a process for modifying a document according to
the embodiment of the present invention;
FIG. 11 is a diagram showing a check-in/check-out process according to
the embodiment of the present invention;
FIG. 12 is a diagram for explaining links between documents according to
the embodiment of the present invention;
FIGS. 13A and 13B together in combination are flowcharts of a link
modifying process according to the embodiment of the present invention;
FIG. 14 is a diagram showing a link modification according to the
embodiment of the present invention;
FIG. 15 is a block diagram of an editing device used in the embodiment of
the present invention;
FIG. 16 is a flowchart of a read process executed by the editing device
shown in FIG. 15;
FIG. 17 is a flowchart of an editing management process executed by the
editing device shown in FIG. 15;
FIG. 18 is a diagram for explaining the relationship among an editing range
management table, an editing management table and buffers in the editing
device shown in FIG. 15;
FIG. 19 is a flowchart of an editing starting process executed by the
editing device shown in FIG. 15;
FIG. 20 is a flowchart of an editing enable process executed by the editing
device shown in FIG. 15;
FIG. 21 is a flowchart of an editing process executed by the editing device
shown in FIG. 15;
FIG. 22 is a block diagram showing how FIGS. 22A and 22B are combined
together;
FIGS. 22A and 22B together are flowcharts of a version number management
process executed by the editing device shown in FIG. 15;
FIG. 23 is a flowchart of a difference process executed by the editing
device shown in FIG. 15;
FIG. 24 is a flowchart of an instance specifying process executed by the
editing device shown in FIG. 15;
FIG. 25 is a flowchart of a write process executed by the editing device
shown in FIG. 15;
FIG. 26 is a flowchart of a new index creating process executed by the
editing device shown in FIG. 15;
FIGS. 27A and 27B are sequential diagrams for explaining an index table
used in the editing device shown in FIG. 15;
FIG. 28 is a block diagram showing how FIGS. 28A and 28B are combined
together;
FIGS. 28A and 28B together are flowcharts of an index creating process
executed by the editing device shown in FIG. 15; and
FIG. 29 is a flowchart of a retrieval process executed by the editing
device shown in FIG. 15.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIG. 2 is a block diagram of a structured database system according to an
embodiment of the present invention. A document converted into electronic
information by an input unit 10 is converted into structural data and body
data, according to the structure of the document, by a collecting unit 12
of a processing system 11. The converted structural data and body data are
stored in a database 14 functioning as a storage medium.
According to the embodiment of the present invention, the infrastructure of
the structured database is designed so that the structure of a document
itself is defined as the structure of the database system. More
particularly, as shown in FIG. 3, the structure of a document is defined
by using a language, and a document structure definition frame 3A is
thereby created. In the example shown in FIG. 3, the structure definition
frame 3A is named "structure 1" in its first line. The second line of the
structure definition frame 3A shows that sections and paragraphs are
included in chapters, and the third line shows that paragraphs, drawings
and tables are included in sections. Further, the fourth line of the
structure definition frame 3A shows that drawings are external files.
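The text does not give the syntax of the definition language, so the
following Python sketch merely restates the four lines of the structure
definition frame 3A as data. The `StructureFrame` class, its field names and
the `allows` helper are assumptions made for illustration only.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class StructureFrame:
    """Hypothetical in-memory form of a structure definition frame."""
    name: str
    contains: dict[str, set[str]] = field(default_factory=dict)  # parent -> allowed children
    external: set[str] = field(default_factory=set)  # elements kept as external files

    def allows(self, parent: str, child: str) -> bool:
        """True if the frame permits `child` directly inside `parent`."""
        return child in self.contains.get(parent, set())


# The four lines of frame 3A in FIG. 3, restated as data.
structure_1 = StructureFrame(
    name="structure 1",                                   # line 1
    contains={
        "chapter": {"section", "paragraph"},              # line 2
        "section": {"paragraph", "drawing", "table"},     # line 3
    },
    external={"drawing"},                                 # line 4
)
print(structure_1.allows("chapter", "section"))  # True
```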
As shown in FIG. 3, a document 3B is given the same name as that of the
structure definition frame 3A and is connected with the structure
definition frame 3A. Further, tags such as <chapter>, <section>,
<paragraph>, and <drawing> are provided in order to indicate the structure
of the document 3B.
The structure definition frame 3A and the document 3B, input to the input
unit 10, are supplied to the collecting unit 12, which performs processes
shown in FIGS. 4 and 5. FIG. 4 is a flowchart of a process for creating a
database of the structure definition frame 3A. In step S1 of the flowchart
shown in FIG. 4, the hierarchical relationship between the elements of the
structure definition frame 3A, such as the chapters, sections, paragraphs,
drawings and tables, is interpreted and checked. In step S2, it is
determined, based on the check results, whether or not there is any
inconsistency in the hierarchical relationship. If an inconsistency is
found in the hierarchical relationship, an alarm is generated in step S3
to prompt modification of the structure definition frame 3A.
If no inconsistency is found in step S2, the hierarchical relationship
between the elements of the structure definition frame 3A is converted
into the form of a tree structure and stored in the database 14 in step
S4. In step S5, the name of the structure definition frame 3A is
registered in a management register in the database 14. Then, the process
shown in FIG. 4 is ended.
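Steps S1 to S5 of FIG. 4 can be sketched as follows. The text does not say
what counts as an inconsistency, so this sketch assumes one example: a
containment cycle, i.e., an element that directly or indirectly contains
itself. The alarm of step S3 is modeled as an exception, and a plain
dictionary stands in for the database 14 and its management register. All
names (`find_cycle`, `register_frame`, `InconsistentFrame`) are hypothetical.

```python
from __future__ import annotations


class InconsistentFrame(Exception):
    """Stands in for the alarm of step S3."""


def find_cycle(contains: dict[str, set[str]]) -> list[str] | None:
    """Steps S1-S2: return one containment cycle such as ['a', 'b', 'a'], or None."""
    state: dict[str, str] = {}  # "open" while on the search path, "done" afterwards
    path: list[str] = []

    def visit(node: str) -> list[str] | None:
        state[node] = "open"
        path.append(node)
        for child in sorted(contains.get(node, ())):
            if state.get(child) == "open":
                return path[path.index(child):] + [child]
            if child not in state:
                found = visit(child)
                if found:
                    return found
        path.pop()
        state[node] = "done"
        return None

    for element in sorted(contains):
        if element not in state:
            found = visit(element)
            if found:
                return found
    return None


def register_frame(db: dict, name: str, contains: dict[str, set[str]]) -> dict:
    cycle = find_cycle(contains)
    if cycle:
        raise InconsistentFrame(" -> ".join(cycle))          # step S3

    def subtree(element: str) -> dict:
        return {child: subtree(child) for child in sorted(contains.get(element, ()))}

    contained = {c for children in contains.values() for c in children}
    tree = {root: subtree(root) for root in sorted(set(contains) - contained)}
    db.setdefault("trees", {})[name] = tree                  # step S4: store the tree
    db.setdefault("register", []).append(name)               # step S5: register the name
    return tree


db: dict = {}
tree = register_frame(db, "structure 1", {
    "chapter": {"section", "paragraph"},
    "section": {"paragraph", "drawing", "table"},
})
print(tree["chapter"]["section"])  # {'drawing': {}, 'paragraph': {}, 'table': {}}
```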
FIG. 5 is a flowchart of a process for converting a document into
information in the form of a database. In step S11 shown in FIG. 5, the
document 3B is read from the leading end thereof until a tag is found. In
step S12, data between the above tag and the subsequent tag is segmented.
In step S13, the tag attached to the leading end of the segmented data is
compared with the related structure definition frame 3A specified in the
document 3B.
In step S14, it is determined whether or not the comparison performed in
step S13 reveals an inconsistency. If an inconsistency is found in step
S14, an alarm is issued in step S15 to prompt modification of the
document. If there is no inconsistency, the tag attached to the leading
end of the segmented data is registered in the tree structure in step S16.
In step S17, the data following the registered tag is stored in the
database 14, and a pointer indicating the area of the database 14 in which
the segmented data is stored is given to the tag registered in the tree
structure.
Thereafter, in step S18, it is determined whether or not the document 3B
has another tag. When the result of this determination is affirmative, the
process returns to step S11. When it is negative, the process shown in
FIG. 5 is ended.
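A minimal sketch of the loop of FIG. 5 follows, under stated assumptions:
the document uses start tags only, such as `<chapter>`; the parent of each
tag is inferred from the containment rules of the structure definition
frame; the alarm of step S15 is modeled as a `ValueError`; and a Python list
stands in for the storage areas of the database 14, so a pointer is simply a
list index. The function name `load_document` is hypothetical.

```python
import re

TAG = re.compile(r"<([a-z]+)>")


def load_document(text, contains, roots, storage):
    """Return registered tags as (tag, parent_index, pointer) tuples."""
    tree = []    # the document's tree structure (step S16)
    open_ = []   # indices of units that may still receive children
    matches = list(TAG.finditer(text))
    for i, m in enumerate(matches):                      # S11 / S18: next tag
        tag = m.group(1)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        data = text[m.end():end].strip()                 # S12: segment the data
        # S13: climb to the nearest open unit whose frame rule allows this tag
        while open_ and tag not in contains.get(tree[open_[-1]][0], ()):
            open_.pop()
        if not open_ and tag not in roots:               # S14: inconsistency
            raise ValueError(f"<{tag}> is not allowed at this position")  # S15
        storage.append(data)                             # S17: store the data
        parent = open_[-1] if open_ else None
        tree.append((tag, parent, len(storage) - 1))     # S16-S17: tag with pointer
        open_.append(len(tree) - 1)
    return tree


frame = {"chapter": {"section", "paragraph"}, "section": {"paragraph", "drawing", "table"}}
storage = []
doc = "<chapter>Overview<section>Scope<paragraph>Text A<drawing>fig1.tif<section>Terms"
tree = load_document(doc, frame, {"chapter"}, storage)
print([parent for _, parent, _ in tree])  # [None, 0, 1, 1, 0]
```

Inferring nesting from the frame, rather than from closing tags, mirrors the
start-tag-only markup shown for the document 3B; a markup with explicit end
tags would make the `while` loop unnecessary.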
By executing the processes shown in FIGS. 4 and 5, the structure definition
frame 3A and the document 3B are converted into tree structures 30 and 31,
respectively, as shown in FIG. 6, and are then registered in the database
14.
In the above-mentioned manner, structure definition frames for documents
are provided, and the names of the structure definition frames are given
to the respective documents in order to connect the structure definition
frames with the documents. With this arrangement, RDBs and spreadsheets
can be defined as tables, and all or part of each document can then be
handled as data within the existing database systems. The interfaces with
these database systems are realized by creating interface routines for
database access languages such as SQL (the standard access interface for
RDBs). Hence, the database system according to the embodiment of the
present invention can be accessed by means of standard languages having
standard database access interfaces.
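As an illustration of such an interface routine, the sketch below exposes
tag/content pairs through Python's built-in `sqlite3` module so that they
can be queried with ordinary SQL. The single `node` table and its columns
are a hypothetical layout, not one taken from the embodiment.

```python
import sqlite3


def open_interface(rows):
    """Load (id, parent_id, tag, body) rows into an in-memory SQL table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, body TEXT)")
    con.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)
    return con


rows = [
    (0, None, "chapter", "Overview"),
    (1, 0, "section", "Scope"),
    (2, 1, "paragraph", "Text A"),
    (3, 0, "section", "Terms"),
]
con = open_interface(rows)
# Standard SQL: section titles under the chapter titled "Overview".
titles = [body for (body,) in con.execute(
    "SELECT s.body FROM node AS s JOIN node AS c ON s.parent = c.id "
    "WHERE c.tag = 'chapter' AND c.body = ? AND s.tag = 'section' ORDER BY s.id",
    ("Overview",),
)]
print(titles)  # ['Scope', 'Terms']
```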
The structure definition frames and the documents are stored separately,
and the connections between them are established solely by information
indicating which structure definition frame is used by each document.
Hence, even if the document structure is modified, it is not necessary to
modify the document itself as long as the modification retains the
previous structure. When documents have an identical structure, they can
share that structure, and only the tag names need to be changed if
different tags are used. That is, the documents are arranged in the form
of the tree structure, and an identical structure can be used in common
for many documents. Further, in some cases one document can use a number
of structure definition frames; that is, one piece of information is
stored and utilized in a number of database structures.
Further, since the documents are arranged in the form of the tree
structure, and pairs of tags and the document contents related to these
tags are stored separately, it becomes possible to perform information
retrieval reflecting the document structure by means of a path based on
the document structure (for example, chapter name → section name →
sub-section name → paragraph number). As a result, the document parts
containing target information can be easily obtained by tracing the path
rather than by searching for the target information.
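Path-based retrieval can be sketched as a walk down the tree, one step per
structural level. Here the document is assumed to be held as nested
dictionaries keyed by unit name, with paragraphs in a list addressed by a
1-based paragraph number; `retrieve` is a hypothetical helper.

```python
def retrieve(document, path):
    """Follow a structural path; list levels are addressed by 1-based number."""
    node = document
    for step in path:
        node = node[step - 1] if isinstance(node, list) else node[step]
    return node


manual = {
    "1 Overview": {
        "1.1 Scope": {
            "1.1.1 Terms": ["First paragraph.", "Second paragraph."],
        },
    },
}
print(retrieve(manual, ["1 Overview", "1.1 Scope", "1.1.1 Terms", 2]))  # Second paragraph.
```

Each step is a direct lookup, so the cost grows with the depth of the path
rather than with the size of the document.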
Further, by defining layout information, that is, by describing the
formats of printing and display in the same format as the documents,
modifications, versions and editions of the layout information can easily
be managed.
Furthermore, the following advantages can be obtained by processing tag
parts separately (the contents of the tags are not changed) and by
indicating the structure and the contents of information to be modified by
means of an editor (structured editor) for operating the database. When
the cursor is moved on the display of the editor to point to a portion to
be modified and is located at a tag, deletion or modification of
characters at the display position of the tag can be inhibited. When the
cursor is located at a position in which information to be modified is
displayed, the information can be added, deleted and modified, and the
editor can display the position of the cursor in the document as well as
the information (tag information and so on) that can be input at that
position.
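The editing rules above can be sketched as checks on the cursor position.
The display is assumed to be a sequence of `("tag", ...)` and
`("text", ...)` segments, and the input candidates are taken from the
containment rules of a structure definition frame. The function names and
data layout are hypothetical.

```python
def segment_at(segments, cursor):
    """Return (kind, text, enclosing_tag) for the segment under the cursor, or None."""
    pos, enclosing = 0, None
    for kind, text in segments:
        if kind == "tag":
            enclosing = text.strip("<>")
        if pos <= cursor < pos + len(text):
            return kind, text, enclosing
        pos += len(text)
    return None


def can_modify(segments, cursor):
    """Characters displayed for a tag may not be deleted or modified."""
    seg = segment_at(segments, cursor)
    return seg is not None and seg[0] == "text"


def input_candidates(segments, cursor, contains):
    """Tags that the structure definition frame allows at the cursor position."""
    seg = segment_at(segments, cursor)
    if seg is None or seg[0] != "text":
        return []
    return sorted(contains.get(seg[2], ()))


display = [("tag", "<section>"), ("text", "Scope"), ("tag", "<paragraph>"), ("text", "Body")]
frame = {"chapter": {"section", "paragraph"}, "section": {"paragraph", "drawing", "table"}}
print(can_modify(display, 3))                 # False: inside "<section>"
print(can_modify(display, 10))                # True: inside "Scope"
print(input_candidates(display, 10, frame))   # ['drawing', 'paragraph', 'table']
```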
Since the document structure is handled separately from the document
itself, the document can be accessed without requiring a particular
remedy, i.e., a dedicated access program.
FIG. 7 is a flowchart of the process for creating a table of contents of a
document. FIG. 8 shows an example of the process shown in FIG. 7. In step
S21, a table-of-contents extracting control message is interpreted. In the
table-of-contents control message, labels such as <chapter> and
<paragraph> in the database shown in FIG. 6 are specified. In step S22,
the structure definition frame 30 specified in the document 31 is compared
with the labels included in the table-of-contents control message, and it
is determined which data of the document 31 should be extracted.
In step S23, the tree structure describing the structure of the document 31
is traced by means of a path based on the structure, and the storage
position (pointer) storing the data related to the labels indicated as
requiring extraction is identified. In step S24, the identified data is
extracted. In step S25, if necessary, a character string indicating the
chapter, section and so on is added to the extracted data, which is then
output.
For example, when the labels <chapter> and <section> are specified by the
table-of-contents control message with respect to the document shown in
FIG. 6, [chapter title], [section title], [next chapter title] are
extracted as items of the table of contents.
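Steps S21 to S25 can be sketched as follows. The sketch assumes that the
document tree is available as `(tag, title)` pairs in document order, that
the control message lists its labels from the highest level down, and that
the character string added in step S25 is a hierarchical number such as
`1.2`. These choices, and the function name, are illustrative assumptions.

```python
def table_of_contents(nodes, labels, frame_elements):
    """nodes: (tag, title) pairs in document order; labels: highest level first."""
    wanted = [label for label in labels if label in frame_elements]  # S21-S22
    counters = []
    toc = []
    for tag, title in nodes:                          # S23: trace the tree in order
        if tag not in wanted:
            continue
        level = wanted.index(tag)
        del counters[level + 1:]                      # deeper numbering restarts
        while len(counters) <= level:
            counters.append(0)
        counters[level] += 1
        number = ".".join(str(n) for n in counters)   # S25: added character string
        toc.append(f"{number} {title}")               # S24: extracted title
    return toc


nodes = [
    ("chapter", "Overview"), ("section", "Scope"), ("paragraph", "Body text"),
    ("section", "Terms"), ("chapter", "Design"), ("section", "Storage"),
]
frame_elements = {"chapter", "section", "paragraph", "drawing", "table"}
print(table_of_contents(nodes, ["chapter", "section"], frame_elements))
# ['1 Overview', '1.1 Scope', '1.2 Terms', '2 Design', '2.1 Storage']
```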
Turning now back to FIG. 2, a document stored in the database 14 is
checked out (extracted) and sent to a processing unit 15, which edits the
checked-out document. After editing, the edited document is checked in
(returned) to the database 14. The document in the database 14 is read by
a delivery unit 16 and output to an output unit 17, which prints and copies
the output document or displays it directly by means of an on-line viewer.
The database 14 is managed by a management system 20.
Data is stored in the database 14 in the form shown in FIG. 9. For
management of documents, a version control table (VCT) is defined for each
of the versions of each of the documents. Further, in order to manage the
elements of each document, a logical management unit EB (Edit Block) and a
physical management unit SB (Storage Block) are defined. The edit blocks
EB indicating the elements of the corresponding document are registered in
the version control table VCT, and each edit block EB registers the
storage blocks SB contained therein.
The version control table VCT is a table in which pointers indicating
connections to the edit blocks EB are defined. Each of the edit blocks EB
is a unit in the structure of the document. For example, one edit block EB
is a chapter, a section, a paragraph, or each individual item of itemized
information. Edit blocks EB may be at different hierarchical levels (edit
blocks EB in units of chapters and edit blocks EB in units of list items
may be mixed). The edit block EB
is a table in which pointers indicating connections to the storage blocks
SB are defined. The storage block SB is a table storing pointers
indicating individual element units (located with tags indicating the
structure) in blocks of information.
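The chain of pointers from the version control table through the edit
blocks to the storage blocks can be sketched with three small Python
dataclasses. The class and field names are hypothetical, and a list of
strings stands in for the stored element units.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class StorageBlock:
    """Physical management unit: pointers to element units in the stored data."""
    pointers: list[int]


@dataclass
class EditBlock:
    """Logical management unit, e.g. a chapter, section, paragraph or list item."""
    unit: str
    storage_blocks: list[StorageBlock]


@dataclass
class VersionControlTable:
    """One table per version of a document, pointing to its edit blocks."""
    document: str
    version: int
    edit_blocks: list[EditBlock] = field(default_factory=list)


def element_units(vct: VersionControlTable, data: list[str]) -> list[str]:
    """Follow VCT -> EB -> SB -> pointer to recover the element units in order."""
    return [data[p] for eb in vct.edit_blocks for sb in eb.storage_blocks for p in sb.pointers]


data = ["<chapter>Overview", "<paragraph>Body text", "<chapter>Design"]
vct = VersionControlTable("manual", 1, [
    EditBlock("chapter 1", [StorageBlock([0, 1])]),
    EditBlock("chapter 2", [StorageBlock([2])]),
])
print(element_units(vct, data) == data)  # True
```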
It is possible to register the storage blocks SB in the version control
table VCT. The process in this case can be performed in the same manner as
the process in the case shown in FIG. 9. However, it is rare for all the
element units to be changed each time a document is revised. Hence, it is
convenient to store the edit blocks EB in the version control table VCT,
because management, including management of the structure, can then be
performed easily and efficiently. Each document in the course of
development is managed by a table VCT' equivalent to the version control
table VCT. When a document is specified at the commencement of creating
the first version (edition) or of revising the latest version, the version
control table VCT' is created. At the commencement of revising the latest
version, the version control table VCT' is a copy of the version control
table VCT. When the development of the first version or the revision of
the latest version is completed, the edit blocks registered in the version
control table VCT' are collected in the authorization process, and the
version control table VCT for the revised version is formed while the old
version control table VCT is saved. Hence, the documents of the old and
new versions, including their contents, are managed in parallel.
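The copy-then-promote handling of VCT' can be sketched as follows, assuming
a VCT is a dictionary whose `edit_blocks` list holds references to edit
blocks. Because VCT' starts as a shallow copy, unchanged edit blocks are
shared between versions while replaced ones diverge. The function names are
hypothetical.

```python
def start_revision(vct):
    """Create VCT' as a copy of the latest VCT when a revision begins."""
    return {
        "document": vct["document"],
        "version": vct["version"] + 1,
        "edit_blocks": list(vct["edit_blocks"]),  # shallow copy: blocks are shared
    }


def authorize(history, vct_prime):
    """Promote VCT' to the VCT of the new version; older VCTs remain saved."""
    history.append(vct_prime)
    return vct_prime


chapter_1 = {"unit": "chapter 1", "text": "Overview"}
chapter_2 = {"unit": "chapter 2", "text": "Design"}
history = [{"document": "manual", "version": 1, "edit_blocks": [chapter_1, chapter_2]}]

draft = start_revision(history[-1])
draft["edit_blocks"][1] = {"unit": "chapter 2", "text": "Revised design"}  # replace one EB
authorize(history, draft)
print(history[0]["edit_blocks"][1]["text"])        # Design
print(history[1]["edit_blocks"][0] is chapter_1)   # True
```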
The contents of the information bodies are contained in each of the storage
blocks SB. The contents of each information body consist of control
information and an instance (a character string indicating real
information, or an external file name). The control information includes an identifier ID
identifying the contents of the information body of concern, the version
number of the contents of the information body, a data type indicating
whether the data indicates the structure of the document or a character
string of the document, a link destination, a link source, and the
attribute of the link. The contents of the information body can be
identified together with the control information. The contents of the
information body are arranged so that pairs of the structure of the
document and the body data of the document are sequentially continued.
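The layout of a storage block's contents can be sketched as a sequence of
control-information/instance records in which structure and body data
alternate. The field names and the `"structure"`/`"text"` data-type labels
are assumptions; the text lists the kinds of control information but not
their encoding.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Control:
    id: str
    version: int
    data_type: str                    # "structure" (a tag) or "text" (body data)
    link_destination: str | None = None
    link_source: str | None = None
    link_attribute: str | None = None


@dataclass
class Content:
    control: Control
    instance: str                     # a character string or an external file name


def structure_body_pairs(contents):
    """Read alternating structure/body records as (tag, body) pairs."""
    records = iter(contents)
    pairs = []
    for tag, body in zip(records, records):
        if tag.control.data_type != "structure" or body.control.data_type != "text":
            raise ValueError(f"unexpected record order at {tag.control.id}")
        pairs.append((tag.instance, body.instance))
    return pairs


storage_block = [
    Content(Control("e1", 1, "structure"), "chapter"),
    Content(Control("e2", 1, "text"), "Overview"),
    Content(Control("e3", 1, "structure"), "drawing"),
    Content(Control("e4", 1, "text", link_source="e1"), "fig1.tif"),
]
print(structure_body_pairs(storage_block))  # [('chapter', 'Overview'), ('drawing', 'fig1.tif')]
```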
When a document is modified, the check-out and check-in operations are
carried out in units of edit blocks EB. Information is added, modified and
deleted in such a manner that the extracted part remains closed within the
check-out destination. In the master database, the contents of the
information bodies are not modified except through check-out.
The check-out/check-in operation is performed for a unit that forms a
complete part of the document structure (for example, a chapter, section,
paragraph and so on). One edit block is created for one complete part of
the document at the time of check-out. At the time of check-in, the
modified part of the edit block EB is traced and the edit block EB in the
master database is updated.
At the time of check-in, the edit block, a group of storage blocks linked
thereto, and the contents of the information bodies are simultaneously
checked in. When the check-in of data is completed, the checked-in edit
block EB containing a modification is compared with the edit block EB
before the check out is performed. The difference between the new version
and the old version is retained as the edit block EB for the old version
and the version control table VCT' is updated. Hence, the speed of
accessing a document in the course of development or modification can be
improved, and a number of old versions can be saved without any
inconsistency.
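The check-in comparison can be sketched as follows, assuming an edit block
is modeled as a dictionary from element identifiers to instance text. The
entries of the pre-check-out edit block that were changed or deleted form
the difference retained for the old version. The function name and the
`old_edit_blocks` key are hypothetical.

```python
def check_in(master_eb, before, after, vct_prime):
    """Compare the checked-in edit block with its state before check-out."""
    # Entries of the old version that were changed or deleted: the difference
    # retained as the edit block for the old version.
    old_version_eb = {k: v for k, v in before.items() if after.get(k) != v}
    master_eb.clear()
    master_eb.update(after)            # the master now holds the modified edit block
    vct_prime.setdefault("old_edit_blocks", []).append(old_version_eb)
    return old_version_eb


before = {"p1": "Install.", "p2": "Operate.", "p3": "Dispose."}
master = dict(before)
after = {"p1": "Install.", "p2": "Operate safely.", "p4": "Recycle."}
vct_prime = {"version": 2}
print(check_in(master, before, after, vct_prime))  # {'p2': 'Operate.', 'p3': 'Dispose.'}
```

Storing only the difference keeps unchanged entries in one place, which is
what allows many old versions to be retained cheaply.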
From the viewpoint of the check-out destination, the checked-out edit
blocks serve as a version control table VCT. The edit blocks can in turn
be passed from the first check-out destination to second and third
check-out destinations where further check-out operations are performed.
This makes it possible to reflect, without any inconsistency, the
activities in the course of document development and version revision in
the database system. For example, a process can easily be managed in which
a primary outside order for modifying edit blocks EB is issued in edit
block units, and a secondary outside order for modifying some of the edit
blocks EB contained in the primary outside order is then issued.
By separating the edit block EB which is the logical management unit from
the storage block SB which is the physical management unit, it becomes
possible to flexibly form the physical structure of the database and
improve efficiency in use of the storage medium independent of data stored
in the database. For example, it is possible to perform a tuning process
in which the physical performance of the database is matched to the
physical performance of the storage medium. For instance, each storage
block SB may be formed in units equivalent to a paragraph, chapter or
section, or one storage block SB may be formed for the whole document.
The version number is managed in units of instances, and hence a document
can be formed by collecting instances having arbitrary version numbers. As
a result, even when a user operates a system having old components, it
becomes easy to provide the user with a manual for such a system and with
the materials kept by the user. That is, information concerning the
document structure is dynamically utilized for managing information.
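Assembling a document from instances of arbitrary version numbers can be
sketched as one lookup per instance. The data layout (each instance ID
mapped to its versions) and the rule that unpinned instances use their
latest version are assumptions made for illustration; `assemble` is a
hypothetical name.

```python
def assemble(instances, pinned):
    """instances: {instance_id: {version: text}} in document order.
    pinned: {instance_id: version}; other instances use their latest version."""
    return [versions[pinned.get(iid, max(versions))] for iid, versions in instances.items()]


instances = {
    "p1": {1: "Install unit model 1.", 2: "Install unit model 2."},
    "p2": {1: "Operate the unit."},
}
print(assemble(instances, {"p1": 1}))  # ['Install unit model 1.', 'Operate the unit.']
```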
The method of performing, in the storage unit, the management of versions
regarding the structure of the document and body data is realized by using
an editor as follows.
FIG. 10 is a flowchart of a process for updating a document, executed by
the processing unit 15. The process shown in FIG. 10 is initiated when the
processing unit 15 is informed of the start of a modification by an
application program executed in a terminal 15a of the processing unit 15.
When work on an instance is completed, the document updating process shown
in FIG. 10 is started. In step S31, it is determined whether or not a
modification has been made. When a modification has been made, a modified
instance is created and added to the tail end of the storage block. When
no modification has been made, the updating process is ended. For example,
if instance A is modified, a modified version of instance A is created when
work shifts to instance B. The modified version of instance A is added to
the tail end of the storage block SB, and a mark indicating that the
modification is completed is added to the identifier ID of the instance
before the modification. Further, the pointer from the related edit block
EB is changed to point to the modified instance.
When the same instance is modified a number of times, the instance is
checked out for modification by following the pointer from the edit block.
Hence, the latest instance can always be picked up.
When it is determined in step S33 that a sequence of modifications has
been completed, that is, when the application program issues an
instruction to save the modified edit blocks, step S34 is executed. In
step S34, for each instance, the identifier ID and the version number of
the oldest instance given the mark indicating invalidity are retained, and
the version number of the latest modified instance is incremented.
Further, the other modified instances are deleted in step S34. Then, the
process shown in FIG. 10 is ended.
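The update and save steps of FIG. 10 can be sketched over a storage block
modeled as a list of instance records and an edit block modeled as a map
from instance ID to record index. The `state` field (`current`, `invalid`,
`draft`, `superseded`) is a hypothetical encoding of the invalidity mark, and
the function names are illustrative. This sketch keeps the text of the
invalidated instance, although that text could also be discarded to save
storage.

```python
def finish_instance(sb, eb, inst_id, new_text):
    """Steps S31-S32: called when work on an instance is completed."""
    cur = sb[eb[inst_id]]
    if new_text == cur["text"]:
        return                                   # no modification: nothing to do
    # The pre-modification instance gets the invalidity mark; an earlier draft
    # from the same editing sequence is merely superseded.
    cur["state"] = "invalid" if cur["state"] == "current" else "superseded"
    sb.append({"id": inst_id, "version": cur["version"], "text": new_text, "state": "draft"})
    eb[inst_id] = len(sb) - 1                    # re-point the edit block to the tail


def save(sb, eb):
    """Steps S33-S34: keep the invalid original, version the latest, drop the rest."""
    kept = [r for r in sb if r["state"] != "superseded"]
    for r in kept:
        if r["state"] == "draft":
            r["version"] += 1
            r["state"] = "current"
    sb[:] = kept
    eb.clear()
    eb.update({r["id"]: i for i, r in enumerate(sb) if r["state"] == "current"})


sb = [
    {"id": "A", "version": 1, "text": "old A", "state": "current"},
    {"id": "B", "version": 1, "text": "B", "state": "current"},
]
eb = {"A": 0, "B": 1}
finish_instance(sb, eb, "A", "new A")     # work shifts from A to B
finish_instance(sb, eb, "A", "newer A")   # A is modified again
finish_instance(sb, eb, "B", "B")         # B unchanged: nothing happens
save(sb, eb)
print(sb[eb["A"]])  # {'id': 'A', 'version': 2, 'text': 'newer A', 'state': 'current'}
```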
In the above manner, the contents of the information body (old version)
for which modification or deletion is carried out are given the mark
indicating invalidity, and the identifier ID and the version number of
those contents are saved. At this time, the instance may be deleted in
order to improve efficiency in the use of the storage area.