WikiPatents - Community Patent Review
Document structure retrieval apparatus utilizing partial tag-restored structure    
United States Patent 5649218
Inventor(s): Saito; Kazuo (Kanagawa, JP)
Abstract: A document data storing section stores document data which incorporates tags that separate the document data into data portions to express its structure. Part of the tags are omissible. A type storing section stores a pattern of the document data structure expressed by the tags. An essential structure searching means identifies a minimum necessary range of the document data in which range omitted tags should be restored, based on a structure retrieving instruction including an object structure. A structure restoring section restores the omitted tags in the minimum necessary range to thereby produce a partial retrieved data. A structure retrieving section retrieves a tag of the object structure from the partial retrieved data.

Title Information
Owner/Assignee: Fuji Xerox Co., Ltd. (Tokyo, JP)
Publication Date: July 15, 1997
Application Number: 08/503,691
Filing Date: July 18, 1995
US Classification: 715/513 715/514 715/531 715/533
Int'l Classification: G06F 017/30 G06F 017/21
Examiner: Kulik; Paul V.
Attorney/Law Firm: Finnegan, Henderson, Farabow, Garrett & Dunner, L.L.P.
Priority Data: Jul 19, 1994 [JP] 6-187866
USPTO Field of Search: 395/774 395/761 395/776 395/779 395/784 395/785 395/935 395/943 395/601 395/613

Claims

What is claimed is:

1. A structure retrieval apparatus comprising:

data storing means for storing data which incorporates tags each discriminating a portion of the data to express a structure of the data, part of the tags being omissible;

type storing means for storing a pattern of the data structure expressed by the tags;

essential structure searching means for identifying a minimum necessary range of the data in which range omitted tags should be restored, based on a structure retrieving instruction including an object structure;

structure restoring means for restoring the omitted tags in the minimum necessary range to thereby produce a partial retrieved data; and

structure retrieving means for retrieving a tag of the object structure from the partial retrieved data.

2. The structure retrieval apparatus of claim 1, wherein when the tag of the object structure is omissible, the essential structure searching means searches the pattern of the data structure for a non-omissible tag of a higher rank than the tag of the object structure.

3. The structure retrieval apparatus of claim 1, wherein the partial retrieved data is substituted for a corresponding partial data of the data stored in the data storing means.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a structure retrieval apparatus that searches, at high speed, the structure of data into which tags, some of which are omissible, are inserted to discriminate portions of the data and thereby express its structure. For example, the invention is applicable to an apparatus for searching the structure of a structured document in which tags are inserted into a text to divide it into document elements.

2. Description of the Related Art

Conventionally, in document editing apparatuses such as workstation word processors, attempts have been made to prepare documents efficiently by structuring and editing them: a plurality of document parts, such as headers and paragraphs, are prepared in advance, and the relationships among these parts are determined.

Known examples of structured documents, which incorporate the concept of a structure into a document, are those conforming to the international standards ODA (ISO 8613: Open Document Architecture) and SGML (ISO 8879: Standard Generalized Markup Language). For an example of a document processing method using a structured document conforming to ODA, reference is made to Japanese Unexamined Patent Publication No. Hei. 5-135054, entitled "Document Processing Method."

Structured documents conforming to SGML, which have a high affinity with conventional text processing systems, have come into widespread use, principally in the United States, and have already reached the stage of practical use. The reason is that a conventional text processing system can adequately handle an SGML structured document: marks called tags are inserted into the document text to classify parts of it (e.g., to divide it into document parts), the document is structured by defining relationships among these divisions, and a tree-structured document structure is thereby represented.

Next, taking a structured document conforming to SGML as an example, a description will be given of how a structured document provided with marks is processed. In a structured document conforming to SGML, a pattern of the document structure is provided in advance, and the structure of the document is constrained to that pattern. In SGML, such a pattern of the document structure is called a document type definition (DTD).

In a structured document conforming to SGML, a document type definition is first set forth to regulate the structure of the document. Next, to represent the structure, marks called tags are inserted in the document text, and the document text is partially classified by the tags. For example, one paragraph in a document is represented as shown below by using a tag <para> having a name "para."

"<para>This is one paragraph.</para>"

The tag <para> here means a start of the paragraph, and is called a start tag. The tag </para> means an end of the paragraph, and is called an end tag. That is, in this example, the paragraph is marked by using two tags, the start tag <para> and the end tag </para> having a name "para," and a part of the document text is thereby partially classified. In other words, the portion of the text sandwiched by the two tags indicates the content portion of the structure indicated by the tags.

Tags are distinguished from one another by their names, and their structural functions are defined in the document type definition. In this sense, a tag represents a structure of the document. Accordingly, hereafter, a structure of a structured document (an SGML-conforming document) is treated as synonymous with a tag wherever no confusion arises.

In addition, some tags are omissible in a structured document conforming to SGML (hereafter abbreviated as an SGML document). Whether a tag may be omitted is designated in the document type definition (DTD), independently for each start tag and each end tag. For example, if the document type definition so designates, the end tag </para> is omissible, in which case the above example can be written as

"<para>This is one paragraph."

The document type definition of an SGML document is written, for example, as shown in FIG. 13. In the document structure constrained by the document type definition 130 shown in FIG. 13, the following tags are defined as omissible: the start tag and the end tag of "header," and the end tags of "para," "fig," and "fig_body."

Next, the contents of the document type definition 130 shown in FIG. 13 will be described specifically. The document type definition (DTD) here is written in the SGML notation. The symbol "<!" at the beginning of each line is a markup declaration delimiter, and the keyword "ELEMENT" that immediately follows it, without a space, is an element declaration keyword. In other words, "<!ELEMENT" at the start of a line is a reserved word indicating that the content of the element (its lower structure) is declared by the description that follows. The names that come next (doc, chap, header, para, fig, and fig_body) are the names of the tags being declared.

The ensuing symbols ("- -", "- O", "O O", and so on) indicate whether the start tag and the end tag of the declared element, in that order, are omissible. The symbol "-" means that the tag is not omissible, and the symbol "O" means that the tag is omissible. For instance, "- O" on a given line means that the start tag is not omissible and the end tag is omissible.
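To make the declaration syntax concrete, the following Python sketch parses element declarations of the form "<!ELEMENT name start-flag end-flag content>" into omissibility flags. FIG. 13 itself is not reproduced in this text, so the string `HYPOTHETICAL_DTD` is a reconstruction from the prose (in particular, the content of "doc" is assumed to be "(chap*)"), and the names `ElementDecl` and `parse_dtd` are illustrative only. The regular expression handles only this simple one-line form, not the full SGML declaration syntax.

```python
import re
from typing import NamedTuple


class ElementDecl(NamedTuple):
    name: str
    start_omissible: bool   # True if the start-tag flag is "O"
    end_omissible: bool     # True if the end-tag flag is "O"
    content: str            # lower structure, e.g. "(#PCDATA)" or "EMPTY"


# Matches one-line declarations: <!ELEMENT name start-flag end-flag content>
DECL_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+([-O])\s+([-O])\s+(.*?)\s*>")

# Hypothetical reconstruction of document type definition 130 (FIG. 13).
HYPOTHETICAL_DTD = """
<!ELEMENT doc      - - (chap*)>
<!ELEMENT chap     - - (header, (para|fig)*, chap*)>
<!ELEMENT header   O O (#PCDATA)>
<!ELEMENT para     - O (#PCDATA)>
<!ELEMENT fig      - O (header, fig_body)>
<!ELEMENT fig_body - O EMPTY>
"""


def parse_dtd(text: str) -> dict[str, ElementDecl]:
    """Map each declared element name to its omissibility flags and content model."""
    decls = {}
    for m in DECL_RE.finditer(text):
        name, start_flag, end_flag, content = m.groups()
        decls[name] = ElementDecl(name, start_flag == "O", end_flag == "O", content)
    return decls


if __name__ == "__main__":
    for decl in parse_dtd(HYPOTHETICAL_DTD).values():
        print(decl.name, decl.start_omissible, decl.end_omissible, decl.content)
```

Keeping the two flags separate mirrors the text: omissibility is designated independently for the start tag and the end tag.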

The items that follow define the lower structure of the element. The symbol "," means that the items (tags) appear in the given order, the symbol "|" means that any one of the items suffices, and the symbol "*" means that the item (tag) is repeated zero or more times. In addition, the symbol "?" means that the item (tag) may or may not be present.

Accordingly, if the lower structure of an element is defined as "(chap_header, para*, chap*)," the definition means "first there is a chapter header, followed by zero or more paragraphs, followed by zero or more chapters." To cite a specific example, where the lower structure is defined as "(header, (para|fig)*, chap*)," as in the second line of the document type definition 130 shown in FIG. 13, the definition means "there is a header, followed by zero or more paragraphs or figures, followed by zero or more chapters."
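The content-model operators behave much like regular-expression operators over the sequence of child element names. The sketch below illustrates this correspondence; it is not a claim about how an SGML parser must work. The function names are hypothetical, and only element-only models built from names, parentheses, ",", "|", "*", "?", and "+" (one or more, which SGML also provides) are handled. "#PCDATA", "EMPTY", and SGML's "&" connector are not handled. Each child name is matched as a space-terminated token.

```python
import re

OPERATORS = {"(", ")", "|", "*", "?", "+"}


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only SGML content model into a compiled regex."""
    parts = []
    for tok in re.findall(r"[A-Za-z_][\w.]*|[(),|*?+]", model):
        if tok == ",":
            continue                      # "," (sequence) is plain concatenation
        elif tok == "(":
            parts.append("(?:")           # open a non-capturing group
        elif tok in OPERATORS:
            parts.append(tok)             # ")", "|", "*", "?", "+" carry over
        else:
            parts.append(f"(?:{re.escape(tok)} )")   # element name as a token
    return re.compile("".join(parts))


def conforms(model: str, children: list[str]) -> bool:
    """Check whether a sequence of child element names satisfies the model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


if __name__ == "__main__":
    chap_model = "(header, (para|fig)*, chap*)"
    print(conforms(chap_model, ["header", "para", "fig", "chap"]))
    print(conforms(chap_model, ["para", "fig"]))   # header is missing
```

With the "chap" model, a header followed by paragraphs, figures, and chapters conforms, while a sequence lacking the leading header, or one with a paragraph after a nested chapter, does not.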

In addition, "#PCDATA," which appears as the lower structure in the third and fourth lines, is one of the reserved words of SGML and means that the content of the structure is character data. Accordingly, in the document type definition 130, "#PCDATA" means that the "header" and the "para" that make up a "chap" contain character data.

That is, under the pattern of the document structure defined by the document type definition (DTD) shown in FIG. 13, a document starts with the tag "<doc>" and consists of repeated "chap" (chapter) elements. Each "chap" contains a "header," followed by zero or more "para" (paragraph) or "fig" (figure) elements, followed by zero or more "chap" elements. In this example, the "header" and the "para" consist of character data.

As more detailed rules, the content of a "fig" consists of a "header" followed by a "fig_body" (figure body), and the "fig_body" is defined as having no lower structure ("EMPTY") because, for instance, it refers to an external image file. As for omissibility, the tags of "doc" and "chap" are not omissible; for "para," "fig," and "fig_body," only the end tags are omissible; and for "header," both tags are omissible.

An example of an actual document conforming to such a document type definition (hereafter referred to as an object document) is the SGML document 140 shown in FIG. 14. In FIG. 14, the indentation varies with the depth of the document structure; this is only to make the example easier to read, and actual documents are often not indented.

Referring to FIG. 14, in the SGML document 140 of this example, neither the start tag nor the end tag of the "header" in the lower structure of a "chap" appears in the document. In fact, however, the start tag "<header>" has been omitted between the tag "<chap>" and its content "What is SGML?" on the second line. Whether such a tag has been omitted cannot be determined without referring to the document type definition 130. Accordingly, the exact structure of the object document cannot be understood unless the document is considered together with the document type definition to which it always conforms.

Since tags are omitted in this way, processing an SGML document first requires analysis of its document structure (syntactic analysis using an SGML parser). This analysis mainly consists of collating the object document with its document type definition while parsing it, and restoring the omitted tags. The syntactic analysis performed in actual document processing also includes other steps, such as restoring attributes and expanding entities. Since attention is focused here on structure restoration alone, the following description assumes that syntactic analysis is simply equivalent to structure restoration.

If restoration processing of tags (structures) is carried out on the SGML document 140 shown in FIG. 14 as the object document, an SGML document 150 such as the one shown in FIG. 15 is obtained. In the SGML document 150, the underlined portions indicate restored tags (structures). The omitted tags are restored by collation with the document type definition 130 shown in FIG. 13. That is, according to the rule for the structure of "chap," the tag "<header>" must always follow the tag "<chap>," so "<header>" is first restored next to "<chap>." Similarly, since "<header>" must always follow "<fig>," it is restored next to "<fig>." In addition, since the end tags are omitted, the respective end tags "</header>," "</para>," and so on are restored after their content portions, just before the tags that follow them. The respective tags (structures) are thus restored as underlined in the drawing.
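The restoration just described can be sketched for this particular DTD. The following Python function is a simplified, hypothetical illustration, not a general SGML parser. Instead of interpreting FIG. 13 directly, it hard-codes rules read from the prose: which elements may contain "chap," "para," "fig," and "fig_body"; that an omitted "header" is the first child of "chap" and "fig"; and that "fig_body" is EMPTY and therefore gets no end tag. It assumes well-formed input in which every non-omissible end tag is present. Apart from the heading "What is SGML?", the sample text is invented.

```python
import re

# Hypothetical rules read from the prose description of DTD 130 (FIG. 13).
ALLOWED_PARENTS = {            # element -> elements that may directly contain it
    "chap": {"doc", "chap"},
    "para": {"chap"},
    "fig": {"chap"},
    "fig_body": {"fig"},
}
HAS_HEADER = {"chap", "fig"}   # first child is a "header" whose tags are omissible
EMPTY = {"fig_body"}           # declared EMPTY: no content and no end tag

TOKEN_RE = re.compile(r"</?[\w.]+>|[^<]+")   # a tag, or a run of character data


def restore_tags(text: str) -> str:
    """Return text with omitted <header> start tags and omitted end tags restored."""
    out: list[str] = []
    stack: list[list] = []     # open elements as [name, header_already_seen]

    def close_top() -> None:
        name, _ = stack.pop()
        out.append(f"</{name}>")          # restored (previously omitted) end tag

    for tok in TOKEN_RE.findall(text):
        if tok.startswith("</"):          # explicit end tag closes any omitted ones
            name = tok[2:-1]
            while stack[-1][0] != name:
                close_top()
            stack.pop()
            out.append(tok)
        elif tok.startswith("<"):         # explicit start tag
            name = tok[1:-1]
            parents = ALLOWED_PARENTS.get(name, set())
            while parents and stack and stack[-1][0] not in parents:
                close_top()               # close elements that cannot contain it
            if name == "header" and stack:
                stack[-1][1] = True       # an explicit header counts as present
            out.append(tok)
            if name not in EMPTY:
                stack.append([name, False])
        elif tok.strip() and stack and stack[-1][0] in HAS_HEADER and not stack[-1][1]:
            stack[-1][1] = True           # character data where a header must be
            out.append("<header>")        # restored (previously omitted) start tag
            stack.append(["header", False])
            out.append(tok)
        else:
            out.append(tok)               # other character data or whitespace
    while stack:
        close_top()
    return "".join(out)


if __name__ == "__main__":
    sample = ("<doc><chap>What is SGML?<para>SGML is a markup language."
              "<fig>An SGML document<fig_body><para>See the figure.</chap></doc>")
    print(restore_tags(sample))
```

Because an input that already contains all its tags passes through unchanged, the same function can safely be applied to restored text again.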

Next, processing for retrieving a structure in the SGML document 150, in which the tags have been restored and the structure is represented, will be described. When a structured document is edited, retrieval that makes use of the document structure, not merely retrieval of character strings in the text, becomes an important operation, because edit processing of structured documents actively exploits the document structure.

In retrieval from a structured document, retrieval that actively uses the structures is effective in addition to conventional character-string retrieval (text retrieval). For example, to retrieve a figure related to SGML in a document, conventional retrieval processing performs full-text retrieval (character-string retrieval) on the document and locates the related figure from character strings in the text.

However, if the document structure itself is used for retrieval, a query can point to a structure in the document, as in "a figure whose title includes SGML" or "a header in the lower structure of a figure," so the object of retrieval can be narrowed down. Moreover, since the range searched is narrowed in accordance with the document structure, retrieval processing becomes more efficient.

As described before, since the SGML document has a document architecture of a type in which tags for marking are embedded in the text, its affinity with a conventional text processing system is high. That is, since the structures are represented by tags for marking, it is unnecessary to use a special apparatus or processing program when retrieving the structure, and it is possible to retrieve the document structure by using character-string retrieval for retrieving a character string representing the symbols of the tags. In other words, the SGML document can be prepared by using a conventional text processing apparatus (such as a document editor), and structure retrieval can be basically carried out by retrieving the start tags and their corresponding end tags by using the conventional text retrieving technique for character-string retrieval in which character strings of the tags are retrieved.

As described above, tags (structures) in an SGML document may be omitted as designated by the document type definition, so an omissible tag may itself be designated as the object of retrieval. Taking the SGML document 140 shown in FIG. 14 as an example, suppose the user designates the tag "<header>" in order to retrieve the content of the structure called "header." Because this tag is omitted in the original SGML document, the conventional text retrieval method cannot be used unless the tag is restored beforehand.
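This difficulty can be shown with plain string search, which is the conventional text retrieval referred to above. In the sketch below, the helper `find_element_contents` is hypothetical and assumes that the searched element is not nested inside another element of the same name. It finds nothing in a fragment in the style of FIG. 14, where "<header>" is omitted, and succeeds once the tags have been restored as in FIG. 15. Both fragments are abbreviated, invented stand-ins for the figures.

```python
def find_element_contents(text: str, name: str) -> list[str]:
    """Return the content of every <name>...</name> pair, using only str.find."""
    start_tag, end_tag = f"<{name}>", f"</{name}>"
    results, pos = [], 0
    while (i := text.find(start_tag, pos)) != -1:
        j = text.find(end_tag, i + len(start_tag))
        if j == -1:
            break                                  # unmatched start tag: stop
        results.append(text[i + len(start_tag):j])
        pos = j + len(end_tag)
    return results


omitted = "<doc><chap>What is SGML?<para>SGML is a markup language.</chap></doc>"
restored = ("<doc><chap><header>What is SGML?</header>"
            "<para>SGML is a markup language.</para></chap></doc>")

print(find_element_contents(omitted, "header"))    # [] : the tag is not in the text
print(find_element_contents(restored, "header"))   # ['What is SGML?']
```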

Accordingly, in the structure retrieval of a representation format such as an SGML document, structure restoration processing for restoring the omitted structure (tag) generally becomes indispensable. That is, before the structure subject to retrieval is searched for, the structure (tag) is restored by conducting the structure restoration processing with respect to the entire object document, and processing for searching for the tag subject to retrieval is then carried out by text retrieval.

Since the structure restoration processing must refer to the document type definition of the object document, it is complicated and time-consuming, and retrieval is slowed accordingly. The longer the object document, the longer the restoration takes and the longer retrieval becomes. This presents a practical problem when SGML documents are handled as structured documents.

In contrast, when, for instance, the object document (SGML document) is stored, it is possible to use a technique whereby the results of restoration processing of the tags omitted are stored in advance after being converted into an internal data structure. If this technique is used, the structure restoration processing during retrieval of the structure becomes unnecessary, so that the retrieval processing speed can be improved.

With the above-described technique, however, the object document must be stored after conversion into the internal data structure, so a large storage area, for example on an external storage device, is required. In addition, when documents are frequently exchanged with external parties, the conversion into internal data structures (structure restoration processing) is in fact required on every such occasion, so the overall throughput cannot be improved. Namely, although this technique is effective for a large-scale document database that manages object documents in one place, it cannot generally be said to be effective when a group of small-scale structured documents is processed.

SUMMARY OF THE INVENTION

To solve the above problems, it is an object of the present invention to provide a structure retrieval apparatus that searches, at high speed, the structure of data into which tags, some of which are omissible, are inserted to discriminate portions of the data and thereby express a structure.

To this end, in accordance with a first aspect of the present invention, there is provided a structure retrieval apparatus comprising: data storing means (11) for storing data in which tags are partially omissible when the tags are inserted in the data, and the data is partially discriminated by the tags so as to represent a structure; type storing means (12) for storing a pattern of the structure represented by the tags; restoration processing means (13) for restoring an omitted portion of the tag in the data stored in the data storing means on the basis of the pattern of the structure; and structure retrieving means (14) for controlling the restoration processing means when a designated structure is retrieved, for effecting processing of partially restoring the structure with respect to necessary and minimum partial data concerning the tag of the structure subject to retrieval, and for retrieving the tag of the structure subject to retrieval on the basis of the restored partial data.
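A minimal sketch of the first aspect's idea follows. It relies on assumptions that hold for the FIG. 13 example but not in general: the object structure is "header"; a header can occur only as the first child of "chap" or "fig"; the start tags of those two elements are never omitted; and a header holds character data only. Under these assumptions, the necessary and minimum partial data for each header is the text between such a start tag and the next tag, so only that fragment is restored and then searched. The function name and the sample text are hypothetical.

```python
import re


def partial_restore_and_retrieve(text: str, enclosing: set[str]) -> list[str]:
    """Retrieve header contents by restoring tags only in minimal fragments."""
    names = "|".join(re.escape(n) for n in sorted(enclosing))
    results = []
    for m in re.finditer(rf"<(?:{names})>", text):
        nxt = text.find("<", m.end())
        end = nxt if nxt != -1 else len(text)
        segment = text[m.end():end]
        if not segment.strip():
            continue   # nothing to restore; explicit <header> tags are not handled here
        # Partial retrieved data: only this fragment gets its omitted tags restored.
        fragment = f"<header>{segment}</header>"
        # The object tag can now be found by ordinary string search in the fragment.
        begin = fragment.find("<header>") + len("<header>")
        results.append(fragment[begin:fragment.find("</header>")])
    return results


document = ("<doc><chap>What is SGML?<para>SGML is a markup language."
            "<fig>An SGML document<fig_body></chap></doc>")
print(partial_restore_and_retrieve(document, {"chap", "fig"}))
```

The paragraphs and the rest of the document are never touched, which is the source of the time saving the aspect describes.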

In accordance with a second aspect of the present invention, there is provided a structure retrieval apparatus comprising: data storing means (111) for storing data in which tags are partially omissible when the tags are inserted in the data, and the data is partially discriminated by the tags so as to represent a structure; type storing means (112) for storing a pattern of the structure represented by the tags; restoration processing means (113) for restoring an omitted portion of the tag in the data stored in the data storing means on the basis of the pattern of the structure; structure retrieving means (114) for retrieving a tag of a designated structure; essential-structure searching means (115) for searching the structure of the pattern for a tag which is located at a higher level than that of the structure subject to retrieval and is not omissible, in a case where the tag concerning the structure subject to retrieval is omissible; and control means (116) for controlling the restoration processing means so as to effect partial structure restoration processing on the basis of necessary and minimum partial data concerning the tag by using the tag found by the essential-structure searching means, and for controlling the structure retrieving means so as to retrieve the structure subject to retrieval.
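The essential-structure search can be sketched as a walk upward through the type definition. Starting from the omissible object structure, the walk collects the elements whose content models mention it, keeps climbing past any whose start tag is itself omissible, and stops at those whose start tag is not. The table `DTD` below is a hypothetical reconstruction of FIG. 13 that keeps only start-tag omissibility and content models, and the function names are illustrative rather than taken from the patent.

```python
import re

# Hypothetical reconstruction of FIG. 13: name -> (start tag omissible?, content model)
DTD = {
    "doc": (False, "(chap*)"),
    "chap": (False, "(header, (para|fig)*, chap*)"),
    "header": (True, "(#PCDATA)"),
    "para": (False, "(#PCDATA)"),
    "fig": (False, "(header, fig_body)"),
    "fig_body": (False, "EMPTY"),
}


def parents_of(name: str) -> list[str]:
    """Elements whose content model mentions `name` as a possible child."""
    return [p for p, (_, model) in DTD.items()
            if name in re.findall(r"[\w.]+", model)]


def essential_structures(target: str) -> set[str]:
    """Nearest ancestor elements of `target` whose start tags are never omitted."""
    if not DTD[target][0]:
        return {target}            # the target's own start tag is always present
    found: set[str] = set()
    frontier, seen = [target], {target}
    while frontier:
        for parent in parents_of(frontier.pop()):
            if parent in seen:
                continue
            seen.add(parent)
            if DTD[parent][0]:     # parent start tag omissible too: keep climbing
                frontier.append(parent)
            else:
                found.add(parent)
    return found


if __name__ == "__main__":
    print(sorted(essential_structures("header")))
```

For "header," the search stops at "chap" and "fig," whose start tags always appear in the text and can therefore be located by plain string search to bound the ranges that need restoration.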

In accordance with a third aspect of the present invention, there is provided a structure retrieval apparatus comprising: data storing means (121) for storing data in which tags are partially omissible when the tags are inserted in the data, and the data is partially discriminated by the tags so as to represent a structure; type storing means (122) for storing a pattern of the structure represented by the tags; restoration processing means (123) for restoring an omitted portion of the tag in the data stored in the data storing means on the basis of the pattern of the structure; structure retrieving means (124) for retrieving a tag of a designated structure on the basis of the data stored in the data storing means; essential-structure searching means (125) for searching the structure of the pattern for a tag which is located at a higher level than that of the structure subject to retrieval and is not omissible, in a case where the tag concerning the structure subject to retrieval is omissible; and control means (126) for controlling the restoration processing means so as to effect structure restoration processing with respect to necessary and minimum partial data concerning the tag by using the tag found by the essential-structure searching means, and for replacing corresponding data stored in the data storing means by restored data.
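The third aspect adds one step: the restored fragments are written back into the stored data, so a later retrieval may find the tags already present and need no restoration. The class below is a hypothetical sketch of that write-back for the same FIG. 13 example. As before, the object structure is assumed to be "header," and the enclosing start tags "chap" and "fig" are assumed never to be omitted. A counter records how often restoration actually changed the stored text.

```python
import re


class DocumentStore:
    """Hypothetical data store whose text is replaced by partially restored data."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.restorations = 0      # how many calls actually modified the stored text

    def _restore_headers(self, enclosing: set[str]) -> None:
        names = "|".join(re.escape(n) for n in sorted(enclosing))
        # An enclosing start tag followed directly by non-blank character data
        # marks a place where the <header> tags were omitted.
        pattern = re.compile(rf"(<(?:{names})>)([^<]*[^<\s][^<]*)")
        new_text, count = pattern.subn(r"\1<header>\2</header>", self.text)
        if count:
            self.text = new_text   # replace the stored data with the restored data
            self.restorations += 1

    def find_headers(self, enclosing: set[str]) -> list[str]:
        self._restore_headers(enclosing)   # leaves the text unchanged once restored
        return re.findall(r"<header>(.*?)</header>", self.text)


store = DocumentStore("<doc><chap>What is SGML?<para>SGML is a markup language."
                      "<fig>An SGML document<fig_body></chap></doc>")
print(store.find_headers({"chap", "fig"}), store.restorations)
print(store.find_headers({"chap", "fig"}), store.restorations)
```

On the second call, the pattern no longer matches because each enclosing start tag is now followed by "<header>". The stored data therefore stays as it is, and the counter does not increase.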

In the structure retrieval apparatus in accordance with the first aspect of the present invention, the data storing means (11) stores data in which tags are partially omissible when the tags are inserted in the data, and the data is partially discriminated by the tags so as to represent a structure. In addition, the type storing means (12) stores a pattern of the structure represented by the tags. The restoration processing means (13) restores an omitted portion of the tag in the data stored in the data storing means on the basis of the pattern of the structure.

When a structure designated by, for instance, a user is retrieved with respect to the data stored in the data storing means, the structure retrieving means (14) controls the restoration processing means, effects processing of partially restoring the structure with respect to necessary and minimum partial data concerning the tag of the structure subject to retrieval, and retrieves the tag of the structure subject to retrieval on the basis of the restored partial data.

As a result, since the structure retrieving means (14) retrieves the tag subject to retrieval by performing the processing for partially restoring the structure with respect to only necessary and minimum partial data concerning the tag of the structure subject to retrieval, the substantial retrieval time can be shortened. For this reason, it is possible to effect structure retrieval at high speed.

In the structure retrieval apparatus in accordance with the second aspect of the present invention, the data storing means (111) similarly stores data in which tags are partially omissible when the tags are inserted in the data, and the data is partially discriminated by the tags so as to represent a structure. The type storing means (112) stores a pattern of the structure represented by the tags. The restoration processing means (113) restores an omitted portion of the tag in the data stored in the data storing means on the basis of the pattern of the structure.

When a structure designated by, for instance, a user is retrieved from the data stored in the data storing means, the structure retrieving means (114) retrieves a tag of the designated structure, and at that time the essential-structure searching means (115) searches the structure of the pattern for a tag which is located at a higher level than that of the structure subject to retrieval and is not omissible, in a case where the tag concerning the structure subject to retrieval is omissible. Then, the control means (116) controls the restoration processing means so as to effect partial structure restoration processing on the basis of necessary and minimum partial data concerning the tag by using the tag found by the essential-structure searching means, and controls the structure retrieving means so as to retrieve the structure subject to retrieval.

Since the tag subject to retrieval can be retrieved as the control means (116) controls the restoration processing means (113) by using the tag found by the essential-structure searching means (115) and by merely effecting partial structure restoration processing on the basis of necessary and minimum partial data concerning the tag, the substantial retrieval time can be shortened. For this reason, it is possible to effect structure retrieval at high speed.

In the structure retrieval apparatus in accordance with the third aspect of the present invention, the data storing means (121) similarly stores data in which tags are partially omissible when the tags are inserted in the data, and the data is partially discriminated by the tags so as to represent a structure. The type storing means (122) stores a pattern of the structure represented by the tags. The restoration processing means (123) restores an omitted portion of the tag in the data stored in the data storing means on the basis of the pattern of the structure.

The structure retrieving means (124) retrieves a tag of a designated structure on the basis of the data stored in the data storing means (121). In doing so, if the tag of the structure subject to retrieval is omissible, the essential-structure searching means (125) searches the pattern of the structure for a tag that is located at a higher level than the structure subject to retrieval and is not omissible. The control means (126) then uses the tag found by the essential-structure searching means to control the restoration processing means (123) so that structure restoration processing is effected on the minimum necessary partial data concerning that tag, and replaces the corresponding data stored in the data storing means with the restored data.

As a result, the data in which the tags subject to retrieval have been partially restored successively replaces the original data in the data storing means (121), and the structure retrieving means (124) retrieves the tag of the designated structure on the basis of the data thus stored. In this case as well, because the control means (126) uses the tag found by the essential-structure searching means (125) to have the restoration processing means (123) restore only the minimum necessary partial data concerning that tag, the substantial retrieval time is shortened and structure retrieval can be effected at high speed. In addition, because the partially restored data is retained in the data storing means (121), structure restoration processing may be unnecessary in some subsequent retrievals, which further shortens the substantial retrieval time.
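The write-back behavior of the third aspect amounts to caching restored ranges in the data store itself. A minimal sketch follows, assuming that the document data is held as a list of independently restorable segments and that the restoration routine is passed in as a function; the class `DocumentStore`, its `restore_calls` counter, and the toy `close_p` restorer are hypothetical illustrations, not details from the patent.

```python
from __future__ import annotations

from typing import Callable


class DocumentStore:
    """Hypothetical store holding document data as restorable segments."""

    def __init__(self, segments: list[str]) -> None:
        self.segments = list(segments)
        self.restored: set[int] = set()  # segments already tag-restored
        self.restore_calls = 0

    def get_restored(self, index: int, restore: Callable[[str], str]) -> str:
        """Return segment `index` with its omitted tags restored.

        The first request restores the segment and writes the result back
        over the original data; later requests reuse the stored result.
        """
        if index not in self.restored:
            self.segments[index] = restore(self.segments[index])
            self.restored.add(index)
            self.restore_calls += 1
        return self.segments[index]


# Toy restoration function for illustration: close one omitted </p>.
def close_p(segment: str) -> str:
    return segment + "</p>"


store = DocumentStore(["<p>first", "<p>second"])
store.get_restored(0, close_p)
store.get_restored(0, close_p)      # served from the store, no restoration
print(store.segments, store.restore_calls)  # ['<p>first</p>', '<p>second'] 1
```

The counter is there only to make the saving visible: a second retrieval that touches the same segment performs no restoration, while untouched segments stay in their original, tag-omitted form.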

Thus, the present invention provides a structure retrieval apparatus in which tags are inserted in data, the data is partially delimited by the tags so as to represent a structure, and the structure can be retrieved at high speed from data in which some of the tags are omissible. In addition, when the data is document text, the invention provides a structure retrieval apparatus in which tags are inserted into the text, the text is divided into document elements, and the structure can be retrieved at high speed from the resulting structured document.

The above and other objects, features and advantages of the present invention will become more apparent from the following detailed description of the invention when read in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating a configuration of a structure retrieval apparatus in accordance with a first embodiment of the present invention;

FIG. 2 is a diagram illustrating an example of an operating screen in a case where a structure retrieving instruction is given;

FIG. 3 is a flowchart explaining structure retrieval processing by a structure retrieving section;

FIG. 4 is a flowchart explaining processing for retrieving an object structure from a partially restored structure;

FIG. 5 is a flowchart illustrating a processing flow of an essential-structure searching section;

FIG. 6 is a flowchart illustrating a processing flow of a structure restoring section;

FIG. 7 is a diagram specifically explaining the manner in which a structure of a structured document is partially restored in a case where a corresponding end tag is omitted;

FIG. 8 is a diagram illustrating another form of document type definition stored in a type storing section;

FIG. 9 is a diagram illustrating another example of the structure retrieving instruction;

FIG. 10 is a diagram illustrating still another example of the structure retrieving instruction;

FIG. 11 is a block diagram illustrating a configuration of a structure retrieval apparatus in accordance with a second embodiment of the present invention;

FIG. 12 is a block diagram illustrating a configuration of a structure retrieval apparatus in accordance with a third embodiment of the present invention;

FIG. 13 is a diagram illustrating an example of the document type definition (DTD) of SGML;

FIG. 14 is a diagram explaining an example of an SGML document in which tags are omitted; and

FIG. 15 is a diagram explaining an example of the SGML document in which the omitted tags are restored.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to the accompanying drawings, a description will be given of the embodiments of the present invention. FIG. 1 is a block diagram illustrating a configuration of a structure retrieval apparatus in accordance with a first embodiment of the present invention. In FIG. 1, reference numeral 11 denotes a document-data storing section; 12, a type storing section; 13, a structure restoring section; 14, a structure retrieving section; and 15, an essential-structure searching section. Numeral 16 denotes a structure retrieving instruction for retrieving a structure, and 17 denotes a retrieved result.

Document data for obtaining a structured document is stored in the document-data storing section 11, wherein the structured document is formed such that tags are inserted in the text data of a document, and the document text is partially distinguished (as document parts) by the tags. This document data is similar to, for instance, the SGML document 140 (FIG. 14) described before, and is the document data of a document architecture in which the structure of the document is represented by tags, and the tags are partially omissible. As a pattern of the structure represented by tags, a document type definition 130, such as the one shown in FIG. 13, is stored in the type storing section 12 in correspondence with the document data of the structured document. In addition, the structure restoring section 13 restores omitted portions of the tags in the document data stored in the document-data storing section 11, on the basis of the pattern (document type definition) stored in the type storing section 12.
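To make the restoration step concrete, the following is a hedged sketch that restores omitted end tags only; omitted start tags, which SGML also permits, are not handled. It assumes a hypothetical content model (`ALLOWED_CHILDREN`) listing the elements each element may directly contain, and a hypothetical set `END_OMISSIBLE` of elements whose end tags may be left out; these tables stand in for the document type definition and are not taken from FIG. 13. The rule applied is that a start tag the currently open element cannot contain implies that the open element's omitted end tag belongs just before it.

```python
import re

# Hypothetical content model standing in for a DTD: the elements that
# each element may directly contain.
ALLOWED_CHILDREN = {
    "doc": {"title", "sec"},
    "sec": {"title", "p"},
    "title": set(),
    "p": set(),
}
END_OMISSIBLE = {"sec", "p"}  # elements whose end tags may be omitted

# A start tag, an end tag, or a run of character data.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def restore_end_tags(text: str) -> str:
    """Re-insert omitted end tags using the content model above."""
    out = []
    stack = []  # currently open elements

    def close_top() -> None:
        out.append(f"</{stack.pop()}>")

    for m in TOKEN.finditer(text):
        slash, name, chars = m.group(1), m.group(2), m.group(3)
        if chars is not None:
            out.append(chars)
        elif slash:
            # An explicit end tag also closes any inner elements left open.
            if name in stack:
                while stack[-1] != name:
                    close_top()
                close_top()
        else:
            # A start tag the open element cannot contain means the open
            # element's omissible end tag was left out just before it.
            while (stack and name not in ALLOWED_CHILDREN[stack[-1]]
                   and stack[-1] in END_OMISSIBLE):
                close_top()
            stack.append(name)
            out.append(f"<{name}>")
    while stack:  # end of data closes everything still open
        close_top()
    return "".join(out)


print(restore_end_tags("<doc><title>T</title><sec><p>a<p>b<sec><p>c</doc>"))
```

For the sample input, the sketch yields `<doc><title>T</title><sec><p>a</p><p>b</p></sec><sec><p>c</p></sec></doc>`: each new `<p>` closes the previous paragraph, and the second `<sec>` closes both the open paragraph and the first section. The sketch does no validation; a document that violates the model is restored as far as the rule allows.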

Upon receiving the structure retrieving instruction 16 from a user or another apparatus, the structure retrieving section 14 effects retrieval processing of the structure with respect to the document data stored in the document-data storing section 11. At that time, by controlling the structure restoring section 13, the structure retrieving section 14 partially restores the structure of only the minimum necessary partial data concerning the tags of the structure subject to retrieval, and retrieves those tags from the restored partial structure.

The structure retrieving section 14 includes the essential-structure searching section 15 as part of its processing function, and the essential-structure searching section 15 searches for the minimum necessary portion concerning the tags of the structure subject to retrieval. The structure restoring section 13 then partially restores the structure of the document data thus found, and the object structure (tags) is searched for in the restored partial structure. The pattern (document type definition) stored in the type storing section 12 is referred to both by the structure restoring section 13 during structure restoration processing and by the essential-structure searching section 15 when it searches for the minimum necessary portion concerning the tags of the structure subject to retrieval.
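Because the tags of the essential structure are never omitted, the minimum ranges can be located by plain scanning before any restoration takes place. The sketch below shows that flow under several assumptions: the essential element has already been determined, neither it nor the object element nests within itself, and tags appear as plain `<name>` and `</name>` without attributes. The functions `minimum_ranges` and `retrieve`, and the toy restorer `close_paragraphs`, are hypothetical names chosen for this illustration.

```python
import re
from typing import Callable, Iterator, List


def minimum_ranges(text: str, essential: str) -> Iterator[str]:
    """Yield each span bounded by the essential element's literal tags.

    Those tags are never omitted, so they can be found by plain scanning
    without restoring anything. Self-nesting is not handled here.
    """
    tag = re.escape(essential)
    for m in re.finditer(rf"<{tag}>.*?</{tag}>", text, re.DOTALL):
        yield m.group(0)


def retrieve(text: str, essential: str, target: str,
             restore: Callable[[str], str]) -> List[str]:
    """Restore only the minimum ranges, then collect the target elements."""
    tag = re.escape(target)
    target_re = re.compile(rf"<{tag}>.*?</{tag}>", re.DOTALL)
    hits: List[str] = []
    for span in minimum_ranges(text, essential):
        hits.extend(target_re.findall(restore(span)))
    return hits


# Toy restorer for illustration only: closes <p> elements whose end tag
# was omitted (assumes each <p> holds plain text and has no end tag).
def close_paragraphs(span: str) -> str:
    return re.sub(r"<p>([^<]*)", r"<p>\1</p>", span)


doc = "<chap><p>a<p>b</chap><chap><p>c</chap>"
print(retrieve(doc, "chap", "p", close_paragraphs))
```

Any part of the document outside the located ranges is never passed to the restorer, which is where the time saving described above comes from.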

In the above-described manner, the structure retrieving section 14 partially restores the structure of only the minimum necessary partial data concerning the tags of the structure subject to retrieval, retrieves the tags (structure) subject to retrieval, and outputs the retrieved result 17. Consequently, the substantial time required to retrieve a structure from the document data of the structured document is shortened, and structure retrieval can be effected at high speed.

FIG. 2 is a diagram illustrating an example of an operating screen in the case where a structure retrieving instruction is given. A structure retrieving request from the user, another apparatus, or the like opens a subwindow 20 for the structure retrieving property, in which a retrieving instruction is given by designating constraints on the structure to be retrieved. The object structure is designated in a field 21 for designating an object structure, a constraint on the content is designated in a field 22 for designating a content constraint, and a prescribed condition constraining the structure is designated in a field 23 for designating a structural constraint. Not all of the constraints in these three fields need be designated; retrieval of the structure may be executed with only some of them designated. In that case, however, the desired structure may not be sufficiently narrowed down, and many structures satisfying the conditions may be retrieved.
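The three fields of FIG. 2 map naturally onto an instruction record in which every constraint is optional. The following sketch is illustrative only: the classes `Element` and `RetrievalInstruction` and the function `select` are hypothetical, and it assumes that the content constraint is a substring of the element's text and that the structural constraint names an element that must enclose the match, since the text leaves the form of both constraints open. With no fields filled in, every candidate matches, which mirrors the loose-narrowing case described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical restored element: name, text, and enclosing element names."""
    name: str
    text: str
    ancestors: tuple[str, ...] = ()


@dataclass
class RetrievalInstruction:
    """Constraints of the property sheet; any field may be left blank."""
    object_structure: Optional[str] = None  # field 21
    content: Optional[str] = None           # field 22 (assumed: substring)
    structural: Optional[str] = None        # field 23 (assumed: enclosing element)

    def matches(self, e: Element) -> bool:
        if self.object_structure and e.name != self.object_structure:
            return False
        if self.content and self.content not in e.text:
            return False
        if self.structural and self.structural not in e.ancestors:
            return False
        return True


def select(elements: list[Element], instr: RetrievalInstruction) -> list[Element]:
    """Keep the candidate elements that satisfy every designated constraint."""
    return [e for e in elements if instr.matches(e)]


elements = [
    Element("title", "Overview", ("doc", "sec")),
    Element("p", "An overview of the method", ("doc", "sec")),
    Element("title", "Results", ("doc", "appendix")),
]
print(len(select(elements, RetrievalInstruction())))  # 3: nothing narrowed
print([e.text for e in select(elements, RetrievalInstruction("title", structural="sec"))])
```

Adding constraints only ever removes candidates, so each designated field narrows the result set independently of the others.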

When the structure retrieving instruction 16 having such constraints is given, the structure retrieving section 14 starts the retrieval processing. The processing by the structure retrieving section 14 will now be described in more detail. In its basic processing, the structure retrieving section 14 accepts the structure (tags) subject to retrieval from the user or another apparatus, requests processing from the essential-structure searching section 15 and the structure restoring section 13, searches the restored partial structure of the document for the structure (tags) subject to retrieval, and returns the retrieved result to the user or the apparatus.

The structure retrieving instruction 16 delivered to the structure retrieving section 14 from the user or another apparatus takes a form suited to the apparatus. In an apparatus with which the user directly edits a document, such as a word processor or a document editor, the instruction is given as a structure retrieving property sheet (FIG. 2) or as a command. In an apparatus connected to a network, such as a retrieval server, the structure retrieving instruction is given from another apparatus through the network by a predetermined protocol. In either case, the content of the instruction (the constraints on the structure) is the same.

Likewise, the methods for presenting the retrieved result are similar in the respective cases. In an apparatus with which the user directly edits a document, such as a word processor or a document editor, the retrieved result is outputted in a form suited to that apparatus. In