Claims
We claim:
1. A method for a computer for storing successive versions of successively
changed data files stored in a computer accessed data storage device and
for searching for a group of said versions, the method comprising the
steps of:
creating and storing a version history record when the content of one of
said data files undergoes a change, the version history record comprising
a set of instructions for recreating a version of the content of said one
data file existing immediately prior to the change by inserting data into
and deleting data from said one data file existing immediately following
the change, one such version history record being created and stored each
time a data file is changed such that a version history record is
associated with each successive version of the data file;
assigning to each version history record and storing a time parameter value
established according to the time the version associated with the version
history record was first stored as the content of a data file;
storing node records, each of said node records being associated with a
corresponding one of said version history records and comprising first
data indicating at least one file attribute parameter and second data
indicating a value for said at least one file attribute parameter, said
file attribute parameter value representing an attribute of a content of a
data file recreated by the set of instructions comprising the associated
version history record; and
searching for a group of said version history records in response to an
input command identifying a particular file attribute parameter value and
a first particular time, wherein each version history record of said group
comprises the set of instructions recreating a data file as it existed as
of said first particular time and is associated with a node record and
wherein the step of searching comprises the substeps of:
reading data comprising the stored node records;
identifying from the node record data read a first subset of version
history records associated with node records indicating said particular
file attribute parameter value,
reading stored time parameter values assigned to version history records to
determine a second subset of version history records comprising
instructions recreating a data file as it existed as of said first
particular time, and
identifying said group of version history records as all version history
records included in both said first and second subsets.
2. A method for a computer for storing successive versions of successively
changed data files stored in a computer accessed data storage device and
for searching for a group of said versions, the method comprising the
steps of:
creating and storing a version history record when the content of one of
said data files undergoes a change, the version history record comprising
a set of instructions for recreating a version of the content of said one
data file existing immediately prior to the change by inserting data into
and deleting data from said one data file existing immediately following
the change, one such version history record being created and stored each
time a data file is changed such that a version history record is
associated with each successive version of the data file;
assigning to each version history record and storing a time parameter value
established according to the time the version associated with the version
history record was first stored as the content of a data file;
storing link records, each link record comprising third data version
history records, fourth data indicating at least one link attribute
parameter, and fifth data indicating at least one link attribute parameter
value for said at least one link attribute parameter, said at least one
link attribute parameter value representing an attribute of a relationship
between a pair of data file versions;
assigning to each link record and storing a second time parameter value
established according to a time the link record was stored;
storing node records, each of said node records being associated with a
corresponding one of said version history records and comprising first
data indicating at least one file attribute parameter and second data
indicating a value for said at least one file attribute parameter, said
file attribute parameter value representing an attribute of a content of a
data file recreated by the set of instructions comprising the associated
version history record; and
searching for a group of version history records in response to an input
command identifying a particular data file, a particular file attribute
parameter value, a particular link attribute parameter value, and a
particular time, wherein each version history record of said group
comprises instructions recreating a data file having a relationship with
said particular data file represented by said particular link attribute
parameter value as of said particular time and is associated with a node
record indicating said particular file attribute parameter value;
wherein the step of searching comprises the substeps of:
reading data comprising the stored link records;
reading stored time parameter values associated with said link records; and
determining from the link record data and time parameter values read said
group of version history records.
3. An apparatus for storing and identifying successive versions of
successively changed content of data files stored in a computer accessed
data storage device, the apparatus comprising:
first means for creating and storing a version history record when the
content of one of said data files undergoes a change, the version history
record comprising a set of instructions for recreating a version of the
content of said one data file existing immediately prior to the change by
inserting data into and deleting data from the content of said one data
file existing immediately following the change, one such version history
record being created and stored each time the data file is changed such
that a version history record is associated with each successive version
of the content of a file, and for creating and storing node records, each
data file and each version history being associated with one of said node
records, each said node record comprising first data indicating at least
one file attribute parameter and second data indicating a value for said
at least one file attribute parameter, the value of said file attribute
parameter representing a file content attribute, each node record being
assigned a time parameter value established according to the time the file
content version associated with the node record was first stored as the
content of a data file; and
second means for identifying a first group of said node records wherein
each node record of said first group comprises data indicating a
user-selected file attribute parameter value and having an assigned time
parameter value indicating the file content version associated with the
node record was stored in a file as of a user-selected time.
4. An apparatus for storing and identifying successive versions of
successively changed content of data files stored in a computer accessed
data storage device, the apparatus comprising:
means for creating and storing a version history record when the content of
one of said data files undergoes a change, the version history record
comprising a set of instructions for recreating a version of the content
of said one data file existing immediately prior to the change by
inserting data into and deleting data from the content of said one data
file existing immediately after the change, one such version history
record being created and stored each time the data file is changed such
that a version history record is associated with each successive version
of the content of a file;
means for creating and storing node records, each data file and each
version history being associated with one of said node records, each said
node record comprising first data indicating at least one file attribute
parameter and second data indicating a value for said at least one file
attribute parameter, the value of said file attribute parameter
representing a file content attribute;
means for creating and storing link records, each link record comprising
third data identifying a pair of node records, fourth data indicating at
least one link attribute parameter, and fifth data indicating a value for
said at least one link attribute parameter, said link attribute parameter
value representing an attribute of a relationship between a pair of data
file content versions associated with said pair of data files;
each of said node records being assigned a time parameter value established
according to the time the file content version associated with the node
record was first stored as the content of a data file, and each said link
record being assigned a time parameter value established according to the
time the link record was stored; and
search means responsive to an input command for reading said node and link
records and for identifying a group of said data files from data
comprising the node and link records read, wherein the content of each one
of said group of data files as of a particular time is related to the
content of a particular data file as of said particular time according to
a particular link attribute parameter value, said command referencing said
particular time, said particular data file and said particular link
attribute parameter value.
Description
BACKGROUND OF THE INVENTION
The present invention relates to computerized data storage and retrieval
systems and in particular to a system for linking separate data files
according to user-definable relationships.
Typically one of the most difficult aspects of large engineering or similar
projects is record keeping. For instance in the design and construction of
a nuclear plant, massive numbers of documents are generated, including
preliminary studies, drawings, specifications, letters, reports and the
like. These documents must be stored in a logical fashion so that they can
be retrieved when needed. When the number of such documents becomes very
large it is often difficult to find them once they are stored,
particularly when only the nature of a document to be retrieved, and not a
name or a reference number under which it is filed, is known.
In addition to problems associated with storing and retrieving documents,
there are also considerations associated with the "ripple" effect that
changing one project document may have on other project documents. For
instance if a design drawing is changed, other drawings of a specification
which relate to the drawing might also have to be altered. For a very
complex project it is not easy to determine the other documents affected.
Also it is often important to keep records of document changes, including
not only prior versions of a document but in addition records as to why a
document was changed and who changed it.
The use of computerized data base systems is well known. Data base systems
permit documents to be characterized according to various attributes of
the document such as "author", "document type", "subject matter", etc. For
instance, to characterize a specification for a pump written by Smith, a
user may assign character strings "Smith", "specification", and "pump" as
the values of author, document type and subject matter attributes. Such
data base systems typically include search routines for locating documents
identified by assigned attribute values matching a list of selected
attribute values provided by a user, thereby enabling a user to easily
locate all documents sharing a common set of attributes such as all pump
specifications written by Smith.
More recently, the advent of rapid access bulk data storage devices and
multiple computer networks has permitted computer systems to actually
create and electronically store the documents as files in a bulk storage
device in addition to keeping track of documents. To be effective for use
in storing and retrieving documents associated with a large project, such
file management systems should be capable not only of locating groups of
files containing documents having common attributes, but also of finding
groups of files which are related to a given file in some definable way.
For instance, once a user has located a particular file containing Smith's
pump specification, he may then wish to locate other files which contain
reviewer's comments regarding the pump specification or which may contain
pump drawings related to the pump specification. There is continuing
activity in the area of computerized data storage and retrieval systems
pertaining to "hypertext" systems which enable users to establish "links"
between file pairs indicating that two files are related in some way. (The
article "Reading and Writing the Electronic Book", by Nicole Yankelovich
and Norman Meyrowitz, pages 15-30 in the October, 1985 issue of Computer,
published by the Institute of Electrical and Electronics Engineers, is a
good summary of prior art relating to systems of this type and is
incorporated herein by reference.) A "link" can be visualized as a pointer
from a first file to a second file indicating the second file is related
to the first file. A link is implemented as a stored record containing
data identifying the linked first and second files and containing link
attribute data defining the nature of the relationship between the two
files. For instance, when the first file is a specification and the second
file is a comment regarding the specification, a link record may be
created which identifies the first and second files and which contains
link attribute data indicating that the relationship between the files is
one of "comment". A separate link record is provided for every pair of
linked files. If three comments have been written about a particular
specification and stored in separate files, three link records may be
created, each indicating a "comment" relationship between the
specification file and a corresponding one of the comment files. Link
records may be grouped according to the files they link so that once a
particular file is identified, such as the specification file, all other
files, such as the comment files, to which the particular file is linked,
can be quickly determined by reviewing only the link records associated
with the particular file.
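The grouping of link records by file described above can be sketched as follows. This is a minimal illustration only, not the implementation of any prior art system; the names `LinkRecord` and `LinkIndex` and the file identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRecord:
    """One stored record per pair of linked files (hypothetical layout)."""
    from_file: str   # the file the link points from
    to_file: str     # the file the link points to
    attribute: str   # nature of the relationship, e.g. "comment"


class LinkIndex:
    """Groups link records by the file they point from."""

    def __init__(self):
        self._by_file = defaultdict(list)

    def add(self, link):
        self._by_file[link.from_file].append(link)

    def linked_files(self, file_id, attribute):
        # Only the records grouped under file_id are reviewed.
        return [link.to_file for link in self._by_file.get(file_id, [])
                if link.attribute == attribute]


index = LinkIndex()
for comment in ("comment_1", "comment_2", "comment_3"):
    index.add(LinkRecord("pump_spec", comment, "comment"))
index.add(LinkRecord("pump_spec", "pump_drawing", "drawing"))
```

Here `index.linked_files("pump_spec", "comment")` reviews only the four records stored under the specification file and returns the three comment files.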
Files and links between files both may have assigned attributes
characterizing the nature of the file or link, but the concept of a file
attribute, as known in the art, differs somewhat from the concept of a
link attribute. Although separate files may be related by file attributes,
the relationship between such files is one of commonality and is
non-directed in that the relationship does not involve a pointing from one
file to another file. For instance, all files created by Smith are related
by virtue of having a common author and this common feature of each such
file may be denoted by using "Smith" as the value of an "author" file
attribute for each file. In contrast, links describe relationships between
pairs of files in a directed fashion, in the sense that a link leads a
user "from" a first document "to" another document for a particular
reason, the link attribute being descriptive of that reason. Thus the
relationship indicated by a link attribute is not one of commonality but
is rather one of "connectivity". For instance, the relationship between a
first file containing a drawing and a second file containing a comment
about the drawing cannot be easily described in terms of what both files
have in common (i.e., by a file attribute) since the files differ; one
file is a "drawing" and the other file is a "comment". But if the concept
of "comment" is used to describe a link between the two files, rather than
to describe the nature of one of the files, the relationship between the
files is clearly specified.
Even though a file may be thought of as containing a "comment", and
therefore may be assigned a file attribute value "comment", it is not
particularly useful to do so since users are typically not interested in
finding the group of all files which contain comments. Instead, users are
usually more interested in finding files containing comments about a
particular user-identified file. It is therefore much more useful to
establish a link between two files where the link has a "comment"
attribute.
Links give a collection of files structure by connecting file pairs to form
a "web" or a "graph" wherein each file may be thought of as a "node"
interconnected to other nodes by links. Some systems of the prior art are
adapted to display a representation of the graph enabling a user to
visualize how sets of files are interrelated in much the same way that a
map depicts how towns are interconnected by roads or rivers. Thus, for
instance, when a user decides to change one file ("node"), he may quickly
determine all of the other files which might be affected by the change by
inspecting other nodes which are linked to the node to be changed.
However, if the number of files associated with a project is large, the
graphs become complex, difficult to display and difficult for a user to
utilize. Therefore systems typically enable a user to reduce the size of a
graph to be displayed by specifying to the system the attributes of
various files of interest. The system then displays a "subgraph" which
contains only nodes characterized by the special attributes. For instance
when a user is only interested in files relating to pumps, the system
displays the nodes representing pump-related files along with their
interconnecting links, thereby reducing the number of files the user might
have to inspect in order to find a particular file of interest.
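The subgraph selection just described might be sketched as below. The function name `subgraph`, the dictionary layout of the nodes, and the tuple layout of the links are assumptions made for illustration.

```python
def subgraph(nodes, links, wanted):
    """Keep nodes whose attributes match `wanted`, plus links between them.

    nodes:  dict mapping file id -> dict of attribute name -> value
    links:  list of (from_file, to_file, link_attribute) tuples
    wanted: dict of attribute name -> required value
    """
    kept = {
        file_id for file_id, attrs in nodes.items()
        if all(attrs.get(name) == value for name, value in wanted.items())
    }
    # A link survives only if both of its end nodes survive.
    kept_links = [l for l in links if l[0] in kept and l[1] in kept]
    return kept, kept_links


nodes = {
    "pump_spec": {"subject": "pump", "type": "specification"},
    "pump_drawing": {"subject": "pump", "type": "drawing"},
    "valve_spec": {"subject": "valve", "type": "specification"},
}
links = [
    ("pump_drawing", "pump_spec", "specification"),
    ("pump_spec", "valve_spec", "reference"),
]
pump_nodes, pump_links = subgraph(nodes, links, {"subject": "pump"})
```

In this example the valve specification and the link leading to it are dropped, leaving only the two pump-related nodes and the link between them.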
While prior art systems help a user to organize and retrieve stored data,
these systems still leave certain record keeping problems unresolved. One
problem relates to the difficulty of preselecting the types of file or
link attributes which may be most advantageous. In order for a system to
be useful, the attributes and their values which a user can use to
describe files and links must provide an appropriate basis for searches to
be performed by the system. However, only a limited number of attributes
is usually contemplated.
Hypertext systems also would be more useful if they included provisions for
maintaining comprehensive records of how project documentation changes
with time. Some computerized data storage systems store old versions of a
document but for large projects documents often undergo so many revisions
that it becomes impractical or impossible to store every version of every
document. It would also be desirable to maintain a history of changes to
file and link attributes. For instance a file attribute may indicate the
name of a person responsible for approving changes to that document, and
when another person assumes that responsibility the corresponding
attribute value must be changed. But in doing so the identity of the
person previously responsible for approving changes is lost unless some
means can be provided for maintaining the information. An ideal system
would be able to recreate the entire graph of a system as it existed at
any previous time, including the contents of files at the time, the file
attributes assigned to the files, the links between the files existing at
the time and the link attributes existing at the time. This feature would
be very useful in determining the cause of problems that arise in the
course of a project but implementation thereof is generally impractical in
prior art systems.
Another problem associated with the use of multi-user systems occurs when
two people independently attempt to change the same file at the same time.
A conflict arises as to which new version should be considered the latest
version. Some systems prevent the conflict by blocking access to a file by
more than one person at a time, but this approach inefficiently utilizes
multiple user capability, particularly when one user wants only to read
the file rather than change it.
Finally, it would be desirable to provide a data storage and retrieval
system capable of notifying a user when another user attempts to access or
change a file, a link, or other features of a graph. Such capability would
facilitate informing interested parties when a file or other aspect of a
graph has been, or is about to be, changed.
SUMMARY OF THE INVENTION
According to one aspect of the present invention, a data file management
machine enables a user to characterize stored data files ("nodes")
according to user-definable "file attributes". (Hereinafter the terms
"file" and "node" will be used interchangeably.) Each file attribute is a
variable having a user-defined name such as "author", or "subject matter",
and a user may assign a value to the file attribute for each file. The
values that may be assigned to a file attribute may comprise
user-defined character strings, such as "Smith" or "pump specification",
or may be an integer. The machine stores data representing file attributes
and their values in a set of "node records", each node record comprising a
collection of data associated with a file.
Since the file attributes are user-definable, the user can establish a new
attribute whenever he recognizes a new distinguishing feature about a
file. For instance, new text files may be created which are written in
different languages than previously stored files, and a user may wish to
classify the new files according to the language in which they are
written. The present invention enables the user to establish a new file
attribute named with the character string "language". He may then assign
the character string "French" as the value of the language attribute for
every file written in French and he may assign the character string
"English" as the value of language attribute for every file written in
English. Thus unlike data management systems of the prior art, the present
invention enables a user to establish a new file attribute whenever the
need arises and the user is not limited to selecting from among a fixed
number of predefined attributes.
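A node record with user-definable attributes could be modeled roughly as follows. This is a sketch under stated assumptions: the class `NodeRecord`, the helper `find`, and the file identifiers are hypothetical and are not taken from the described machine.

```python
class NodeRecord:
    """Attribute data kept for one file; attribute names are not fixed."""

    def __init__(self, file_id):
        self.file_id = file_id
        self.attributes = {}   # attribute name -> string or integer value

    def set_attribute(self, name, value):
        # Values are character strings or integers, as in the text above.
        if isinstance(value, bool) or not isinstance(value, (str, int)):
            raise TypeError("attribute values are strings or integers")
        self.attributes[name] = value


def find(records, **wanted):
    """Return ids of files whose attributes match every wanted value."""
    return [r.file_id for r in records
            if all(r.attributes.get(k) == v for k, v in wanted.items())]


report_fr = NodeRecord("report_fr")
report_en = NodeRecord("report_en")
report_fr.set_attribute("author", "Smith")
report_en.set_attribute("author", "Smith")
# A new attribute is introduced simply by naming it; nothing is predefined.
report_fr.set_attribute("language", "French")
report_en.set_attribute("language", "English")
```

Because the attribute names are ordinary keys rather than fixed fields, adding "language" requires no change to records that do not use it.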
The data file management machine of the present invention further enables a
user to establish "links" between related file pairs. A "link" is a
user-defined relationship between two files, the link being evidenced by a
stored "link record", a collection of data including references to the two
files being linked and including data describing "link attributes" and
their assigned values. A "link attribute" is a variable comprising a
user-defined character string as a name. The user can also assign an
attribute value to the variable to characterize a relationship between two
files. This attribute value may also be a user-defined character string or
may be an integer. As an example of the utilization of links, when a first
file contains a drawing for a pump, a second file contains a specification
for the pump shown in the drawing, and a third file contains comments
regarding the drawing, a user may define two links, a first linking the
first and second files and a second linking the first and third files.
Each link may be characterized by a link attribute which the user may, for
instance, name "reference". For the link relating the drawing file to the
specification file, the user may assign a value to the "reference" link
attribute which the user calls "specification". For the link relating the
drawing file to the designer comments file, the user may assign the
character string "comment" as the value of the "reference" link attribute.
The machine of the present invention creates the appropriate link records
based on the user's input regarding the files to be linked and the names
and values of the link attributes.
The present invention enables a user to define not only the values of file
and link attributes but also to create new links between files based on
new, user-defined link attributes. In systems of the prior art which
permit the use of file and link attributes, the number and names of file
and link attributes are fixed and the user can only change values of
limited file and link attributes already associated with particular files
and links. The user cannot undertake to establish new file or link
attributes.
The machine of the present invention is also adapted to locate all files
and links characterized by user selected combinations of file and link
attribute values. The machine finds the appropriate files and links by
searching through the node and link records. The ability of a user to
define new link and file attributes provides the user with more flexible
control over file and link selection than is possible when the number and
nature of assignable file and link attributes are predetermined and fixed.
According to another aspect of the invention, the machine is adapted to
perform a "traversal" search whereby a user provides the machine with the
identification of a first node along with a predicate for the files and a
predicate for the links which describe the set of attributes and their
values which are desired. The machine then identifies all nodes connected
to the first node through intermediate nodes and links, wherein the
intermediate nodes and links are all characterized by the selected node or
link attribute values. This traversal search is useful when links are
employed to indicate a progression of files, as for instance when each
section of a specification is separately stored and wherein links having
"next section" attributes connect each successive section. Thus a user
need only identify the first section, and provide the machine with a
"specification" file attribute value and a "next section" link attribute
value. The machine will then find every section of the specification and
identify them to the user in the order they occur.
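The traversal search described above can be illustrated with the following sketch. It assumes a depth-first walk, includes the starting node in the result, and represents the two predicates as Python callables; the function name `traverse` and the data layout are hypothetical.

```python
def traverse(start, nodes, links, node_pred, link_pred):
    """Follow matching links from `start`, returning nodes in visit order.

    nodes: dict mapping file id -> dict of file attributes
    links: list of (from_file, to_file, link_attributes) tuples
    """
    outgoing = {}
    for src, dst, attrs in links:
        outgoing.setdefault(src, []).append((dst, attrs))

    order, seen, stack = [], {start}, [start]
    while stack:
        current = stack.pop()
        order.append(current)
        # Push in reverse so links are followed in their stored order.
        for dst, attrs in reversed(outgoing.get(current, [])):
            if dst not in seen and link_pred(attrs) and node_pred(nodes[dst]):
                seen.add(dst)
                stack.append(dst)
    return order


nodes = {
    "sec1": {"type": "specification"},
    "sec2": {"type": "specification"},
    "sec3": {"type": "specification"},
    "note": {"type": "comment"},
}
links = [
    ("sec1", "note", {"reference": "comment"}),
    ("sec1", "sec2", {"reference": "next section"}),
    ("sec2", "sec3", {"reference": "next section"}),
]
sections = traverse(
    "sec1", nodes, links,
    node_pred=lambda a: a.get("type") == "specification",
    link_pred=lambda a: a.get("reference") == "next section",
)
```

Only records reachable from the first section through matching links are inspected, and the sections come back in the order in which the "next section" links connect them.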
Hypertext systems of the prior art perform "query" searches wherein the
user provides a list of selected file and link attribute values but does
not provide a starting node. The system then identifies all files
characterized by the selected file attribute values and all links
characterized by the selected link attribute values which interconnect the
identified files. However, the traversal search of the present invention
always identifies the files in the proper order whereas the query search
returns files in arbitrary order. Moreover, for a query search to be
adequately selective, the file attributes must usually be more precisely
defined than for the traversal search. For instance, if more than one
specification is stored in the system, additional file attribute values
may be necessary to distinguish between files associated with different
specifications. Otherwise the query search would return all specification
files and not just the desired specification file. A traversal search can
be much faster than the query search because in a query search every node
record and every link record is inspected whereas in a traversal search
only those node and link records are inspected which are encountered in
the course of the search by passing through ("traversing") intermediate
nodes and links having the selected attribute values.
According to still another aspect of the invention, the machine is adapted
to transmit user-definable character strings ("demons") to a computer
operating system within which the machine functions. A demon is
transmitted on occurrence of any one of a set of events affecting a graph,
such as a user request to modify a file, delete a link, or change a file
or link attribute value. The demon character strings can be chosen so that
the operating system identifies them as commands, such as commands to run
a program. Demons are useful, for instance, to initiate a user-provided
program which sends a message over an electronic mail system to a person
responsible for approving changes to a file. The machine of the present
invention can be instructed to transmit the demon whenever someone
attempts to modify the contents of the file, thereby initiating a warning
message to the responsible person.
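One way to picture the demon mechanism is the sketch below. The class `DemonTable`, the event names, and the command string `notify_approver pump_spec` are all hypothetical; the `transmit` callable stands in for whatever interface hands a string to the operating system, so the sketch runs without executing real commands.

```python
class DemonTable:
    """Maps (event, object id) to a user-defined command string."""

    def __init__(self, transmit):
        # `transmit` passes a character string to the operating system.
        # It is injected here so that no real command is executed.
        self._transmit = transmit
        self._demons = {}

    def attach(self, event, object_id, command):
        self._demons[(event, object_id)] = command

    def fire(self, event, object_id):
        command = self._demons.get((event, object_id))
        if command is not None:
            self._transmit(command)


sent = []
demons = DemonTable(transmit=sent.append)
demons.attach("modify_file", "pump_spec", "notify_approver pump_spec")
demons.fire("modify_file", "pump_spec")   # demon attached: string is sent
demons.fire("delete_link", "pump_spec")   # no demon attached: nothing sent
```

The table itself attaches no meaning to the string; whether it runs a mail program or anything else is left to the operating system that receives it.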
According to a further aspect of the invention, the machine identifies a
node according to the time (the "version time") the node was created. When
the machine modifies a node in response to user input, the version time
identifying the node is updated to the current time. When a user attempts
to change the contents of a node, the user indicates the version time of
the node he wishes to change, and if the version time indicated by the
user is not the current version time for the node, then the machine knows
that the user's changes are based on an outdated version of the node. That
is, the contents of the node have been changed since the last time the
user acquired them. In such case the machine prevents the user from
modifying the node and notifies the user of the problem. This aspect of
the invention prevents conflicts which can arise when more than one user
independently accesses and attempts to modify a node at the same time.
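The version time check described above might look like the following sketch. A simple counter stands in for the current time, and the names `Node`, `StaleVersionError`, and `modify` are assumptions made for illustration.

```python
import itertools


class StaleVersionError(Exception):
    """Raised when a change is based on an outdated version of a node."""


class Node:
    _clock = itertools.count(1)   # stand-in for the current time

    def __init__(self, contents):
        self.contents = contents
        self.version_time = next(Node._clock)

    def read(self):
        return self.contents, self.version_time

    def modify(self, new_contents, based_on_version_time):
        # Reject the change if the node was modified after the user read it.
        if based_on_version_time != self.version_time:
            raise StaleVersionError("node changed since it was last read")
        self.contents = new_contents
        self.version_time = next(Node._clock)


node = Node("draft A")
_, seen_by_user1 = node.read()
_, seen_by_user2 = node.read()
node.modify("draft B", seen_by_user1)       # accepted; version time advances
try:
    node.modify("draft C", seen_by_user2)   # based on the outdated version
    conflict = False
except StaleVersionError:
    conflict = True
```

Both users may read the node at once; only the second write is refused, so reading is never blocked.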
According to a still further aspect of the invention, the machine is
adapted to determine the states of files, links, attribute values and
demons as they existed at any previous time. Whenever a user changes the
contents of a file, thereby creating a new "version" of the file, the
machine creates and stores a "version history" record containing a set of
instructions for converting the new version back to the previous version
that it replaced. The entire previous file is not stored. The new version
of the file is identified by the version time, adjusted to indicate the
time at which the new version was created. The version history record for
the previous version is also identified by a version time indicating the
time that the previous version of the file was created. When a user wishes
to inspect the contents of a file as it existed at a particular time, the
user indicates the particular time to the machine and the machine can
determine from the version time identifiers wh