Claims
I claim:
1. In a data processing system, a method for archiving image objects in a
document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a text object and
substantially adjacent image object into said system;
generating a first key word for said text object from said text object and
adding said first key word to said index;
automatically determining that said text object is substantially adjacent
to said substantially adjacent image object and in response thereto,
generating a second key word for said substantially adjacent image object
from said text object and adding said second key word to said index;
storing said document architecture envelope in said system;
storing said index including said first and second key words in said
system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said substantially adjacent image object if said second key word
is found in said comparing step.
2. The method of claim 1, wherein said second key word is generated from a
caption word string in said text object.
3. The method of claim 1, wherein said second key word is generated from
highlighting a word string in said text object.
4. The method of claim 1, wherein said second key word is generated from
typing a word string into said system.
5. In a data processing system, a method for archiving graphics objects in
a document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a text object and a
graphics object containing embedded text into said system;
generating a first key word for said text object from said text object, and
adding said first key word to said index;
automatically determining if there is embedded text in said graphics
object, and in response thereto, extracting graphics data including
embedded text from said graphics object;
generating a second key word for said graphics object from said embedded
text and adding said second key word to said index;
storing said document architecture envelope in said system;
storing said index including said first and second key words in said
system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said graphics object if said second key word is found in said
comparing step.
6. The method of claim 5, which further comprises:
generating a third key word for said graphics object from a caption word
string in said text object and adding said third key word to said index;
storing said index including said first, second and third key words in said
system.
7. The method of claim 5, which further comprises:
generating a third key word for said graphics object from highlighting a
word string in said text object and adding said third key word to said
index;
storing said index including said first, second and third key words in said
system.
8. The method of claim 5, which further comprises:
generating a third key word for said graphics object from typing a word
string into said system and adding said third key word to said index;
storing said index including said first, second and third key words in said
system.
9. In a data processing system, a method for archiving image objects and
graphics objects in a document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a text object, an
image object and a graphics object into said system;
generating a first key word for said text object from said text object and
adding said first key word to said index;
automatically generating a second key word for said image object from said
text object and adding said second key word to said index;
extracting graphics data including embedded text from said graphics object;
generating a third key word for said graphics object from said embedded
text and adding said third key word to said index;
storing said document architecture envelope in said system;
storing said index including said first, second and third key words in said
system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said image object if said second key word is found in said
comparing step.
10. The method of claim 9, wherein said second key word is generated from a
caption word string in said text object.
11. The method of claim 9, wherein said second key word is generated from
highlighting a word string in said text object.
12. The method of claim 9, wherein said second key word is generated from
typing a word string into said system.
13. The method of claim 9, which further comprises:
generating a fourth key word for said graphics object from a caption word
string in said text object and adding said fourth key word to said index;
storing said index including said first, second, third and fourth key words
in said system.
14. The method of claim 9, which further comprises:
generating a fourth key word for said graphics object from highlighting a
word string in said text object and adding said fourth key word to said
index;
storing said index including said first, second, third and fourth key words
in said system.
15. The method of claim 9, which further comprises:
generating a fourth key word for said graphics object from typing a word
string into said system and adding said fourth key word to said index;
storing said index including said first, second, third and fourth key words
in said system.
16. In a data processing system, a method for archiving non-text objects in
a document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a non-text object and
comment text into said system;
automatically generating a first key word for said non-text object from
said comment text and adding said first key word to said index;
storing said document architecture envelope in said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said non-text object if said first key word is found in said
comparing step.
17. The method of claim 16, wherein said first key word is generated from
comment text contained in said non-text object.
18. The method of claim 16, wherein said first key word is generated from
displaying and highlighting a word string in said comment text.
19. The method of claim 16, wherein said first key word is generated from
typing a word string into said system.
20. In a data processing system, a method for archiving graphics objects in
a document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a graphics object
containing embedded text into said system;
extracting graphics data including embedded text from said graphics object;
automatically determining if there is embedded text in said graphics object
and in response thereto, generating a first key word for said graphics
object from said embedded text and adding said first key word to said
index;
storing said document architecture envelope in said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said graphics object if said first key word is found in said
comparing step.
21. The method of claim 20, which further comprises:
generating a second key word for said graphics object from a caption word
string in a text object in said document and adding said second key word
to said index;
storing said index including said first and second key words in said
system.
22. The method of claim 20, which further comprises:
generating a second key word for said graphics object from highlighting a
word string in a text object in said document and adding said second key
word to said index;
storing said index including said first and second key words in said
system.
23. The method of claim 20, which further comprises:
generating a second key word for said graphics object from typing a word
string into said system and adding said second key word to said index;
storing said index including said first and second key words in said
system.
24. In a data processing system, a method for archiving image objects in a
document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including an image object and a
substantially adjacent text object into said system;
automatically determining if there is a text object substantially adjacent
to said image object, and in response thereto, generating a first key word
for said image object from said text object;
generating a link for said first key word to said text object;
adding said first key word and said link to said index;
storing said document architecture envelope in said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said image object if said first key word is found in said
comparing step.
25. The method of claim 24, wherein said first key word is generated from a
caption word string in said text object.
26. The method of claim 24, wherein said first key word is generated from
highlighting a word string in said text object.
27. The method of claim 24, wherein said first key word is generated from
typing a word string into said system.
28. In a data processing system, a method for archiving graphics objects in
a document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a graphics object and
a text object into said system;
extracting graphics data including embedded text from said graphics object;
automatically generating a first key word for said graphics object from
said embedded text;
generating a link for said first key word to said text object;
adding said first key word and said link to said index;
storing said document architecture envelope in said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said graphics object if said first key word is found in said
comparing step.
29. The method of claim 28, which further comprises:
generating a second key word for said graphics object from a caption word
string in said text object;
generating a second link for said second key word to said text object;
adding said second key word to said index;
storing said index including said first and second key words in said
system.
30. The method of claim 28, which further comprises:
generating a second key word for said graphics object from highlighting a
word string in said text object;
generating a second link for said second key word to said text object;
adding said second key word to said index;
storing said index including said first and second key words in said
system.
31. The method of claim 28, which further comprises:
generating a second key word for said graphics object from typing a word
string into said system;
generating a second link for said second key word to said text object;
adding said second key word to said index;
storing said index including said first and second key words in said
system.
32. A data processing system for archiving image objects in a document,
comprising:
means for loading an existing index into a data processing system;
means for inputting a document architecture envelope including a text
object and an image object into said system;
means coupled to said loading means and said inputting means for generating
a first key word for said text object from said text object and adding
said first key word to said index;
said generating means automatically generating a second key word for said
image object from said text object and adding said second key word to said
index;
means coupled to said inputting means for storing said document
architecture envelope in said system;
means coupled to said generating means for storing said index including
said first and second key words in said system;
means for entering a search term into said data processing system;
means for comparing said search term with candidate key words in said
index; and
means for retrieving said image object if said second key word is found in
said means for comparing.
33. The system of claim 32, wherein said second key word is generated from
a caption word string in said text object.
34. The system of claim 32, wherein said second key word is generated from
highlighting a word string in said text object.
35. The system of claim 32, wherein said second key word is generated from
typing a word string into said system.
36. A data processing system for archiving graphics objects in a document,
comprising:
means for loading an existing index into a data processing system;
means for inputting a document architecture envelope including a text
object and a graphics object into said system;
first generating means coupled to said loading means and said inputting
means for generating a first key word for said text object from said text
object and adding said first key word to said index;
means coupled to said loading means and said inputting means for extracting
graphics data including embedded text from said graphics object;
second generating means coupled to said extracting means for automatically
generating a second key word for said graphics object from said embedded
text and adding said second key word to said index;
means coupled to said inputting means for storing said document
architecture envelope in said system;
means coupled to said first and said second generating means for storing
said index including said first and second key words in said system;
means for entering a search term into said data processing system;
means for comparing said search term with candidate key words in said
index; and
means for retrieving said graphics object if said second key word is found
in said means for comparing.
37. The system of claim 36, which further comprises:
said first generating means generating a third key word for said graphics
object from a caption word string in said text object and adding said
third key word to said index;
said index including said first, second and third key words in said system.
38. The system of claim 36, which further comprises:
third generating means coupled to said loading means and said inputting
means for generating a third key word for said graphics object from
highlighting a word string in said text object and adding said third key
word to said index;
said index including said first, second and third key words in said system.
39. The system of claim 36, which further comprises:
third generating means coupled to said loading means and said inputting
means for generating a third key word for said graphics object from typing
a word string into said system and adding said third key word to said
index;
said index including said first, second and third key words in said system.
40. In a data processing system, a method for archiving image objects in a
document, comprising the steps of:
loading an existing index into a data processing system;
inputting an image object file into said system;
inputting a document architecture envelope including a text object and a
pointer to said image object file, into said system;
automatically generating a first key word for said image object from said
text object;
generating a link for said first key word to said text object;
adding said first key word and said link to said index;
storing said document architecture envelope and said image object file in
said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said image object if said first key word is found in said
comparing step.
41. The method of claim 40, wherein said first key word is generated from a
caption word string in said text object.
42. The method of claim 40, wherein said first key word is generated from
highlighting a word string in said text object.
43. The method of claim 40, wherein said first key word is generated from
typing a word string into said system.
44. In a data processing system, a method for archiving graphics objects in
a document, comprising the steps of:
loading an existing index into a data processing system;
inputting a graphics object file into said system;
inputting a document architecture envelope including a text object and a
pointer to said graphics object file into said system;
extracting graphics data including embedded text from said graphics object;
automatically generating a first key word for said graphics object from
said embedded text;
generating a link for said first key word to said text object;
adding said first key word and said link to said index;
storing said document architecture envelope and said graphics object file
in said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said graphics object if said first key word is found in said
comparing step.
45. The method of claim 44, which further comprises:
generating a second key word for said graphics object from a caption word
string in said text object;
generating a second link for said second key word to said text object;
adding said second key word to said index;
storing said index including said first and second key words in said
system.
46. The method of claim 44, which further comprises:
generating a second key word for said graphics object from highlighting a
word string in said text object;
generating a second link for said second key word to said text object;
adding said second key word to said index;
storing said index including said first and second key words in said
system.
47. The method of claim 44, which further comprises:
generating a second key word for said graphics object from typing a word
string into said system;
generating a second link for said second key word to said text object;
adding said second key word to said index;
storing said index including said first and second key words in said
system.
48. In a data processing system, a method for archiving voice objects in a
document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a text object and a
voice object into said system;
generating a first key word for said text object from said text object and
adding said first key word to said index;
automatically generating a second key word for said voice object from said
text object and adding said second key word to said index;
storing said document architecture envelope in said system;
storing said index including said first and second key words in said
system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said voice object if said second key word is found in said
comparing step.
49. The method of claim 48, wherein said second key word is generated from
a caption word string in said text object.
50. The method of claim 48, wherein said second key word is generated from
highlighting a word string in said text object.
51. The method of claim 48, wherein said second key word is generated from
typing a word string into said system.
52. In a data processing system, a method for archiving voice objects in a
document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a voice object into
said system;
automatically generating a first key word for said voice object from a text
object in said document and adding said first key word to said index;
storing said document architecture envelope in said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said voice object if said first key word is found in said
comparing step.
53. The method of claim 52, wherein said first key word is generated from a
caption word string in said text object.
54. The method of claim 52, wherein said first key word is generated from
highlighting a word string in said text object.
55. The method of claim 52, wherein said first key word is generated from
typing a word string into said system.
56. In a data processing system, a method for archiving voice objects in a
document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a voice object and a
text object into said system;
automatically generating a first key word for said voice object from said
text object;
automatically generating a link for said first key word to said text
object;
adding said first key word and said link to said index;
storing said document architecture envelope in said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said voice object if said first key word is found in said
comparing step.
57. The method of claim 56, wherein said first key word is generated from a
caption word string in said text object.
58. The method of claim 56, wherein said first key word is generated from
highlighting a word string in said text object.
59. The method of claim 56, wherein said first key word is generated from
typing a word string into said system.
60. A data processing system for archiving voice objects in a document,
comprising:
means for loading an existing index into a data processing system;
means for inputting a document architecture envelope including a text
object and a voice object into said system;
means coupled to said loading means and said inputting means for generating
a first key word for said text object from said text object and adding
said first key word to said index;
said generating means automatically generating a second key word for said
voice object from said text object and adding said second key word to said
index;
means coupled to said inputting means for storing said document
architecture envelope in said system;
means coupled to said generating means for storing said index including
said first and second key words in said system;
means for entering a search term into said data processing system;
means for comparing said search term with candidate key words in said
index; and
means for retrieving said voice object if said second key word is found in
said means for comparing.
61. The system of claim 60, wherein said second key word is generated from
a caption word string in said text object.
62. The system of claim 60, wherein said second key word is generated from
highlighting a word string in said text object.
63. The system of claim 60, wherein said second key word is generated from
typing a word string into said system.
64. In a data processing system, a method for archiving non-text objects in
a document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a non-text object and
containing embedded text into said system;
extracting said embedded text;
automatically generating a first key word for said non-text object from
said embedded text and adding said first key word to said index;
storing said document architecture envelope in said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said non-text object if said first key word is found in said
comparing step.
65. The method of claim 64, wherein said non-text object is a graphics object.
66. The method of claim 64, wherein said non-text object is a formatted
data object.
67. The method of claim 64, wherein at least a portion of said non-text
object is a separate file which is referenced by a pointer in said
envelope.
68. In a data processing system, a method for archiving non-text objects in
a document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a non-text object
into said system;
automatically generating a first key word for said non-text object from a
text object in said document and adding said first key word to said index;
storing said document architecture envelope in said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said non-text object if said first key word is found in said
comparing step.
69. The method of claim 68, wherein said first key word is generated from a
caption word string in said text object.
70. The method of claim 68, wherein said first key word is generated from
highlighting a word string in said text object.
71. The method of claim 68, wherein said first key word is generated from
typing a word string into said system.
72. In a data processing system, a method for archiving non-text objects in
a document, comprising the steps of:
loading an existing index into a data processing system;
inputting a document architecture envelope including a non-text object and
a text object into said system;
automatically generating a first key word for said non-text object from
said text object;
generating a link for said first key word to said text object;
adding said first key word and said link to said index;
storing said document architecture envelope in said system;
storing said index including said first key word in said system;
entering a search term into said data processing system;
comparing said search term with candidate key words in said index; and
retrieving said non-text object if said first key word is found in said
comparing step.
73. The method of claim 72, wherein said first key word is generated from a
caption word string in said text object.
74. The method of claim 72, wherein said first key word is generated from
highlighting a word string in said text object.
75. The method of claim 72, wherein said first key word is generated from
typing a word string into said system.
76. A data processing system for archiving non-text objects in a document,
comprising:
means for loading an existing index into a data processing system;
means for inputting a document architecture envelope including a text
object and a non-text object into said system;
means coupled to said loading means and said inputting means for generating
a first key word for said text object from said text object and adding
said first key word to said index;
said generating means automatically generating a second key word for said
non-text object from said text object and adding said second key word to
said index;
means coupled to said inputting means for storing said document
architecture envelope in said system;
means coupled to said generating means for storing said index including
said first and second key words in said system;
means for entering a search term into said data processing system;
means for comparing said search term with candidate key words in said
index; and
means for retrieving said non-text object if said second key word is found
in said means for comparing.
77. The system of claim 76, wherein said second key word is generated from
a caption word string in said text object.
78. The system of claim 76, wherein said second key word is generated from
highlighting a word string in said text object.
79. The system of claim 76, wherein said second key word is generated from
typing a word string into said system.
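As an illustration only, the flow recited in the independent method claims (index, store, search, retrieve) might be sketched as follows. The sketch is not taken from the disclosure: `DocObject`, `index_envelope` and `retrieve` are hypothetical names, the envelope is modelled as an ordered list of objects, and "substantially adjacent" is assumed to mean the immediately preceding or following object in that list.

```python
import re
from dataclasses import dataclass


@dataclass
class DocObject:
    object_id: str
    kind: str          # "text", "image", ... (hypothetical labels)
    content: str = ""  # text content; left empty for non-text objects


def words(text):
    """Split text into lowercased candidate key words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_envelope(envelope, index):
    """Add key words for every object in the envelope to `index`.

    Text objects are indexed from their own words. Image objects are
    indexed from the words of any text object directly before or after
    them (the assumed reading of "substantially adjacent").
    """
    for pos, obj in enumerate(envelope):
        if obj.kind == "text":
            for w in words(obj.content):
                index.setdefault(w, set()).add(obj.object_id)
        elif obj.kind == "image":
            neighbours = envelope[max(pos - 1, 0):pos] + envelope[pos + 1:pos + 2]
            for n in neighbours:
                if n.kind == "text":
                    for w in words(n.content):
                        index.setdefault(w, set()).add(obj.object_id)
    return index


def retrieve(index, term):
    """Compare the search term with the key words and return matching object ids."""
    return sorted(index.get(term.lower(), set()))
```

With an envelope of a text object, an image object and a caption-like text object, a search for a word from the caption returns the image as well as the caption. No stop-word filtering is applied here, so every word of the adjacent text becomes a key word for the image.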
Description
BACKGROUND OF THE INVENTION
1. Technical Field
The invention disclosed broadly relates to data processing technology and
more particularly relates to improvements in information retrieval.
2. Background Art
For the last two decades the retrieval of documents using a computer has
been a prominent application in both business and library science. Two
methods of preparing and retrieving documents have become established in
the state of the art. They are:
Manual Generation of Key Word: At the time of document archival, operator
intervention is required to manually attach to the document a set of terms
that, in the opinion of the operator, describe the content or theme of the
document being stored. The words or phrases may or may not occur within
the document and represent a subjective judgement by the operator as to
how the document may be queried in the future.
Contextual: Prior to document archival, each word in the document text is
reviewed and based on a criterion or set of criteria, words and phrases
are chosen as being retrieval terms for the subject document. In its
simplest form, each word in the document text can be viewed as a retrieval
term. Alternatively, elaborate grammatical criteria can be used to scale
down the selection of key words from the document text to more specific
words which, based on linguistic and information science methodology, are
determined to have a greater level of specificity and to be of more use in
later retrieval.
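The simplest contextual approach described above, in which each word of the document text is a candidate retrieval term, might be sketched as follows. This is a minimal sketch and not the method of any of the products named below: the function name `contextual_key_words` and the small stop-word list are assumptions, with stop-word removal standing in for the more elaborate grammatical criteria.

```python
import re

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on"}


def contextual_key_words(text, stop_words=STOP_WORDS):
    """Return the distinct lowercased words of `text`, in first-seen order,
    skipping any word found in `stop_words`."""
    selected = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word not in stop_words and word not in selected:
            selected.append(word)
    return selected
```

Passing an empty set as `stop_words` gives the simplest form, where every distinct word becomes a retrieval term.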
An example of the manually generated key word retrieval system is the IBM
PROFS System and an example of a contextual system for document text is
the IBM STAIRS program product. Both of these are examples of host
computer based information retrieval systems. An example of a contextual
information retrieval system for document text, which operates on a
personal computer or a local area network is the IBM Search Vision
product.
The prior art has not provided an efficient means for archiving documents
having mixed object types of both text and non-text objects. In the prior
art, if an archivist were attempting to archive a document which included
images or graphics, the archivist would manually add descriptive terms as
key words from his own judgement as to words which most appropriately
describe the image or graphic. The prior art has failed to provide a
contextual approach to archiving documents having non-text objects.
Furthermore, non-text objects contained within a document are not
independently accessible in prior art information retrieval systems.
OBJECTS OF THE INVENTION
It is therefore an object of the invention to provide an improved
information retrieval system.
It is another object of the invention to provide an improved information
retrieval system which is capable of archiving documents containing
non-text objects using a contextual method.
It is still a further object of the invention to provide an information
retrieval system which enables the independent accessing of non-text
objects from documents archived in the system.
It is still a further object of the invention to provide an information
retrieval system capable of accessing documents containing non-text
objects, using a query term which matches to a key word which was derived
solely from the non-text object.
SUMMARY OF THE INVENTION
These and other objects, features and advantages are accomplished by the
non-text object storage and retrieval invention disclosed herein. A
program, method and system are disclosed which sense the presence of a
non-text object in a mixed object, multimedia document to be archived in
an information retrieval system. In addition to text objects, a mixed
object document can contain non-text objects such as image objects,
graphics objects, formatted data objects, font objects, voice objects,
video objects and animation objects. The invention enables the creation of
key words which characterize the non-text object, for incorporation in the
inverted file index of the data base, thereby enabling the later retrieval
of either the entire document or the independent retrieval of the non-text
object through the use of such key words. Three different approaches are
described for creating the key words for the non-text objects. The first
method is by presenting to the archivist the non-text object within the
context of the descriptive text of the document. The archivist may then
input key words through the keyboard and by pointing to the object with a
mouse, for example, associating those key words with the non-text object
in the document. Later, when the inverted file index is prepared, the key
word, the document's storage address and the location of the non-text
object within the document are associated with one another. In this
manner, during later retrieval where the key word is the query term, not
only can the entire document be accessed, but the non-text object can be
independently accessed and displayed.
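The association of key word, document storage address and object location can be pictured with a small inverted file index. The sketch below is an assumption about one possible layout, not the structure used by the invention: `Posting`, `InvertedIndex` and the integer `object_location` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Posting(NamedTuple):
    doc_address: str                # where the document is stored
    object_location: Optional[int]  # position of the non-text object in the
                                    # document; None means the whole document


class InvertedIndex:
    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, key_word, doc_address, object_location=None):
        """Associate a key word with a document and, optionally, an object in it."""
        posting = Posting(doc_address, object_location)
        entries = self._postings[key_word.lower()]
        if posting not in entries:
            entries.append(posting)

    def search(self, term):
        """Return every posting whose key word matches the query term."""
        return list(self._postings.get(term.lower(), []))
```

A posting that carries an object location lets a retrieval session display the non-text object on its own, while the document address still allows the whole document to be fetched.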
A second method for creating key words employs the display to the archivist
of the non-text object within the context of the text of the document. The
archivist is provided with a pointing device to highlight or mark those
portions of the text in the document which relate to the non-text object.
The pointing device is also employed to identify the non-text object to
which the highlighted text refers. Then, in a manner similar to the first
method, the highlighted words are used as key words which are associated
with the storage address of the document and with the non-text object, in
the formation of the inverted
file index. Then later, during an information retrieval session, not only
can the document be retrieved by the use of such key words, but also the
non-text object can be independently retrieved and displayed.
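As a rough illustration of the highlighting approach, the sketch below turns highlighted character ranges into key words tied to an object the archivist has pointed at. The representation of a highlight as a `(start, end)` character span, the function name `key_words_from_highlights` and the `object_id` string are all assumptions made for the example.

```python
import re


def key_words_from_highlights(text, spans, object_id):
    """Return (key word, object id) pairs for each word inside the
    highlighted character spans, in order and without duplicates."""
    pairs = []
    for start, end in spans:
        for word in re.findall(r"[A-Za-z0-9]+", text[start:end]):
            pair = (word.lower(), object_id)
            if pair not in pairs:
                pairs.append(pair)
    return pairs
```

The resulting pairs could then be added to the inverted file index together with the document's storage address.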
A third method for the formation of key words for non-text objects is by
automatic key word extraction. There are several types of non-text
objects, such as image, graphics, formatted data, fonts, voice, video, and
animation objects. An image object is a bit mapped image which typically
contains an array of picture elements or pels which may or may not be
compressed when stored in the data base. Usually, there is no text
contained within a bit mapped image, however text may be associated with
the bit mapped image in an architected data stream such as a Mixed Object
Document Content Architecture (MO:DCA) data stream or alternatively an
Office Document Architecture (ODA) data stream. A second type of non-text
object is a graphics object wherein a two-dimensional picture is
represented by a set of vector representations of straight lines, arcs and
other graphical elements. In a graphic object, text can be