WikiPatents - Community Patent Review
Document processing system deciding apparatus provided with selection functions    
United States Patent 4876665
Inventor(s): Iwai; Isamu (Kawasaki, JP); Okamoto; Toshio (Bunkyo, JP); Doi; Miwako (Kawasaki, JP)
Abstract: A document processing apparatus has a heading word dictionary, a heading extractor, a heading rule dictionary, a heading decision section, a document architecture rule dictionary, and a document architecture decision section for deciding a logical document architecture. The apparatus further comprises a rule application decision section and a candidate selection indication section that allow the operator to select any desired document architecture when the document architecture decision section decides, in accordance with the document architecture rules, that plural document architecture candidates exist, thus improving the operability of the system. Further, past rule application record information (priority order) is stored and updated so as to provide a learning function for better operability.
Owner/Assignee     Kabushiki Kaishi Toshiba (Kawasaki, JP)
Publication Date     October 24, 1989
Application Number     06/947,091
Filing Date     December 29, 1986
US Classification     707/200 715/514 715/531
Int'l Classification     G06F 003/14 G06F 015/21 G06F 015/40 G06F 015/62
Examiner     Williams Jr.; Archie E.
Assistant Examiner     Harrell; Robert B.
Attorney/Law Firm     Foley & Lardner, Schwartz, Jeffery, Schwaab, Mack, Blumenthal & Evans
Priority Data     Apr 18, 1986 [JP] 61-88065
USPTO Field of Search     364/200 MS File 364/900 MS File 364/518
Claims

What is claimed is:

1. A document processing apparatus comprising:

processor means for controlling document processing operations;

input means connected to said processor means, for inputting document data;

heading dictionary storing means for storing words and phrases frequently used as headings arranged in a column direction;

heading candidate extraction means connected to said processor means and said heading dictionary storing means, for extracting, as a heading candidate, one of a plurality of words and phrases, which corresponds to one of the headings stored in said heading dictionary storing means, from the document data input at said input means;

heading rule dictionary means for storing rules used in determining the headings;

heading deciding means connected to said processor means and said heading rule dictionary means, for checking whether the heading candidate extracted by said heading candidate extracting means is a heading or a non-heading according to the heading rules stored in said heading rule dictionary means;

document architecture rule dictionary means for storing rules associated with document logical architectures; and

document architecture deciding means connected to said processor means and said document architecture rule dictionary means, for deciding document logical architecture candidates of the heading by checking whether each of the heading and the non-heading decided by said heading deciding means is a chapter heading, a section heading or a paragraph, in accordance with the document architecture rules stored in said document architecture rule dictionary means;

the apparatus further comprising document architecture selecting and indicating means for allowing an operator to select at least one desired document architecture when said document architecture deciding means decides a plurality of document architecture candidates in accordance with document architecture rules,

wherein said document architecture selecting and indicating means comprises:

(a) rule application deciding means accessible to said document architecture rule dictionary means when plural document logical architecture candidates are decided by said document architecture deciding means, for checking a rule name requesting candidate selection to retrieve flags corresponding to the rule name from an application rule table; and

(b) candidate selecting and indicating means responsive to a candidate selecting key, provided in said document input means, for allowing the operator to update the flags by selecting at least one desired document architecture through the candidate selecting key, said document architecture rule dictionary means storing rule application record information indicative of past rule application situations and said document architecture deciding means deciding a document architecture rule to be applied with reference to the stored rule application record information in order to facilitate document architecture selection dependent upon a learning function.

2. The document processing apparatus as set forth in claim 1, wherein said document architecture deciding means updates the rule application record information.

3. The document processing apparatus as set forth in claim 1, wherein the rule application record information includes the number of times a rule has been applied and the states in which the rule was applied.
Description

BACKGROUND OF THE INVENTION

(1) Field of the Invention

The present invention relates generally to a document processing apparatus, and more specifically to a document processing apparatus which can form a logical document architecture with respect to chapters, itemized statements, paragraphs, etc. of a document.

(2) Description of the Prior Art

In general, a document is divided into a plurality of blocks, and headings are assigned to the respective blocks to make the document easier to read. Each block is further divided into subblocks, and subheadings are assigned to the respective subblocks. The headings and subheadings consist of short sentences, and heading symbols such as "Chapter 1" or "Section 3" are often added to them. When documents with such a hierarchical structure are processed by a computer, the following problem arises. Conventional document processing systems process document data in units of display frames or printed pages, so when a given chapter is to be replaced with another, both the start and end positions of the document data to be moved must be designated with the cursor. If the given chapter is long, the display screen must be scrolled many times between the start position and the end position. This scrolling is troublesome and tends to cause operational errors.

When an operator drafts a document, he often wishes to review earlier sentences, for instance to check their contents or the kinds of heading symbols already used. In this case, he must guess the page and line position of the sentence or heading symbol to be checked and then search for it. This search operation is troublesome and greatly degrades document drafting efficiency.

To solve the above-mentioned problems, the same applicant and inventors have already filed an application for a novel document processing apparatus comprising document data inputting means, heading dictionary means, heading rule dictionary means, heading deciding means, document architecture rule dictionary means, and document architecture deciding means.

This document processing apparatus can prepare a list of the logical hierarchical architecture of a document by handling the document in units of items, so that the operator can readily designate any given heading, itemized statement, or paragraph, which makes document editing easier.

In this apparatus, however, the document architecture is decided in accordance with the document architecture dictionary alone, so headings may be decided erroneously. For instance, the heading "2.2 Class Training" can be interpreted either as the combination of "2.2" (heading symbol) and "Class Training" (heading word) or as the combination of "2" (heading symbol) and "2 Class Training" (heading word). Further, where itemized statements "1 . . . ", "2 . . . ", "3 . . . ", and "4 . . . " exist under a chapter heading "4 . . . " and are followed by "5 . . . ", the "5 . . . " can be decided either as a chapter heading or as an itemized heading.
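As a minimal illustration of the first ambiguity, the following Python sketch enumerates every way a heading line can be split into a leading heading symbol and a heading word. The rule that a heading symbol is a run of digit groups joined by periods, and the function name `symbol_word_splits`, are assumptions for illustration only, not the apparatus's actual heading rules.

```python
import re

# Hypothetical rule: a heading symbol is one or more digit groups joined by periods.
SYMBOL = re.compile(r"\d+(?:\.\d+)*")


def symbol_word_splits(line: str) -> list[tuple[str, str]]:
    """Return every (heading symbol, heading word) split of a heading line."""
    splits = []
    for end in range(1, len(line) + 1):
        symbol = line[:end]
        # Only prefixes that are complete heading symbols qualify.
        if not SYMBOL.fullmatch(symbol):
            continue
        # A period or space right after the symbol acts as a separator.
        word = line[end:].lstrip(" .")
        if word:
            splits.append((symbol, word))
    return splits


print(symbol_word_splits("2.2 Class Training"))
```

For "2.2 Class Training" the sketch returns both `("2", "2 Class Training")` and `("2.2", "Class Training")`, whereas "1. Introduction" yields only `("1", "Introduction")`. A procedure that must commit to a single split can therefore pick the unintended one.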

Therefore, when the document architecture uniquely decided by the computer differs from the one intended by the user, the user must modify it to the intended architecture, which impairs operability.

SUMMARY OF THE INVENTION

With these problems in mind, the primary object of the present invention is to provide the document processing apparatus with an additional function by which a plurality of document architecture candidates can be decided and the operator can readily select the desired candidate, thereby improving operability.

To achieve the above-mentioned object, the document processing apparatus according to the present invention comprises:

a unit for inputting document data; a heading dictionary unit for storing words and phrases frequently used as headings; a heading candidate extraction unit for extracting, as a heading candidate, a word or phrase corresponding to a heading stored in said heading dictionary unit from the document data input through said input unit; a heading rule dictionary unit for storing rules for deciding headings; a heading deciding unit for checking the heading candidate extracted by said heading candidate extraction unit against the heading rules stored in said heading rule dictionary unit and deciding whether the heading candidate is a heading; a document architecture rule dictionary unit for storing rules associated with document logical architectures; a document architecture deciding unit for deciding, in accordance with the document architecture rules stored in said document architecture rule dictionary unit, the document architecture of a heading decided by said heading deciding unit and of a sentence decided as a non-heading; and, in particular, a document architecture selecting and indicating unit for allowing an operator to select a desired document architecture when the document architecture deciding unit decides a plurality of document architecture candidates in accordance with the document architecture rules.

The document architecture selecting and indicating unit comprises a rule application deciding unit and a candidate selecting and indicating unit. The rule application deciding unit accesses the document architecture rule dictionary to check the name of a rule requesting candidate selection and to retrieve the flags corresponding to that rule name from a table. The candidate selecting and indicating unit responds to a candidate selecting key provided in the document input unit and updates the flags so that any desired document architecture can be selected.

Further, the document architecture rule dictionary stores rule application record information indicating past rule application situations, and the document architecture deciding unit decides which architecture rule to apply with reference to this stored record information. The record information can be updated, which gives the apparatus a learning function.
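One way to picture this learning function is the hypothetical Python sketch below. It keeps only a per-rule application count (claim 3 also mentions recording the states in which a rule was applied, which is omitted here) and orders competing candidate rules so that the most frequently applied rule comes first. The class and method names, and the rule names "d1" and "d2", are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RuleApplicationRecord:
    """Hypothetical record of past rule applications (counts only)."""
    counts: Counter = field(default_factory=Counter)

    def rank(self, candidate_rules: list[str]) -> list[str]:
        # Most frequently applied rule first; sorted() is stable,
        # so rules with equal counts keep their original order.
        return sorted(candidate_rules, key=lambda rule: -self.counts[rule])

    def record(self, applied_rule: str) -> None:
        # Learning step: update the record each time a rule is applied.
        self.counts[applied_rule] += 1


record = RuleApplicationRecord()
record.record("d2")
record.record("d2")
record.record("d1")
print(record.rank(["d1", "d2"]))
```

After "d2" has been applied twice and "d1" once, the ranking puts "d2" ahead of "d1", so the rule the operator has chosen most often would be offered first.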

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the document processing apparatus according to the present invention;

FIG. 2 is an example of a document;

FIG. 3 is an exemplary heading word dictionary;

FIGS. 4A to 4D show an exemplary heading rule dictionary;

FIG. 5 is a flowchart showing the operation procedure of the apparatus shown in FIG. 1;

FIGS. 6A to 6F show an exemplary sequence of logical document architecture lists stored in the logical architecture storage;

FIGS. 7A to 7C show a few examples of application of the heading rules shown in FIGS. 4A to 4D to the document shown in FIG. 2;

FIG. 8 shows an example of stored rule tables including flags corresponding to rule names; and

FIG. 9 is a flowchart showing the operation procedure of the rule application decision section.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference to the attached drawings, the document processing apparatus according to the present invention will be described in detail hereinbelow:

With reference to FIG. 1, a document processor 1 is connected to an input device 2, which includes a keyboard 2a, and centrally handles and processes input documents. The document processor 1 is also connected to an original document storage 3 for storing input original documents and to a display controller 4, which causes a display 5 to show the input original document read out from the storage 3. The document processor 1 is further connected to a heading extractor 6, a heading decision section 8, a document architecture decision section 9, and a logical architecture storage 10. The heading extractor 6 is connected to a heading word dictionary 7 that stores many types of words representing headings. The heading decision section 8 includes a heading rule dictionary 8a, and the document architecture decision section 9 includes a document architecture rule dictionary 9a.

The document processor 1 sequentially detects document data segmentation codes stored in the original document storage 3, such as line return codes, extracts the sentences delimited by these codes, and measures the length of each sentence. The extracted sentences are sent one by one to the heading extractor 6, which decides heading words by comparing each input sentence with the heading words stored in the heading word dictionary 7 and by taking the sentence length into account.

The heading word dictionary 7 stores frequently used words, phrases, and symbols, all of which are defined as heading words. They are classified into categories, as shown in FIG. 3, and registered in the dictionary 7 in advance. Words such as "introduction" and "abstract" are registered in the category "reserved heading word". Frequently used numerals and symbols are also registered as heading words, each in its respective category.

The heading extractor 6 decides whether the number of characters of an extracted sentence is less than a predetermined number. It also decides whether each extracted word (a word, phrase, numeral, and/or symbol represented as a code string) corresponds to one of the words registered in the dictionary 7. If a correspondence is detected, the extracted word is recognized as the corresponding heading word.

The words decided by the extractor 6 to be heading words are input one by one to the heading decision section 8 under the control of the processor 1. The heading decision section 8 decides, in accordance with the heading rules (FIGS. 4A to 4D) stored in the dictionary 8a, whether each recognized heading word actually functions as a heading word or as another word.

Each word classified by the heading decision section 8 as a heading word or as another word is input to the document architecture decision section 9 under the control of the processor 1. The architecture decision section 9 decides whether the sentence or word sent from the heading decision section 8 is a chapter heading, a section heading, or a paragraph, in accordance with the document architecture rules (shown below) stored in the document architecture rule dictionary 9a:

TABLE 1: Rules for Heading

Condition 1: A reserved word is not included.
  Condition 1-1: A heading word is included.
    Condition 1-1-1: A reserved heading word is included.
      Condition 1-1-1-1: A chapter heading is not included in the previous part.
        (Result) Indicates a chapter heading. → A symbol portion, an alphanumeric portion, a punctuation portion, or a tail symbol is defined as a main heading pattern.
    Condition 1-1-2: A reserved heading word is not included.
      Condition 1-1-2-2: A chapter heading is present in the previous part.
        Condition 1-1-2-2-1: Matching with a chapter heading pattern is successful.
          (Result) Indicates a chapter heading. → The order of the chapter heading pattern is incremented by one.
        Condition 1-1-2-2-2: This heading pattern does not match the previous chapter heading.
          Condition 1-1-2-2-2-1: An itemized pattern is not present in the previous part.
            (Result) Indicates an itemized pattern candidate.
          Condition 1-1-2-2-2-2: An itemized pattern is present in the previous part.
            Condition 1-1-2-2-2-2-1: This heading pattern matches the itemized pattern candidate.
              (Result) Indicates an itemized pattern. The order of the itemized pattern is incremented by one.
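The decision tree of Table 1 can be read as a function of a few yes/no observations about the current heading. The Python sketch below is one hypothetical encoding; the parameter names are assumptions, and branches that Table 1 does not cover return `UNDECIDED`.

```python
from enum import Enum


class Decision(Enum):
    CHAPTER = "chapter heading"
    ITEMIZED_CANDIDATE = "itemized pattern candidate"
    ITEMIZED = "itemized pattern"
    UNDECIDED = "not covered by Table 1"


def table1_decide(*, reserved_word: bool, heading_word: bool,
                  reserved_heading_word: bool, chapter_before: bool,
                  matches_chapter: bool = False, itemized_before: bool = False,
                  matches_itemized: bool = False) -> Decision:
    """Hypothetical encoding of the Table 1 rules for headings."""
    # Conditions 1 and 1-1: no reserved word, but a heading word is present.
    if reserved_word or not heading_word:
        return Decision.UNDECIDED
    # Condition 1-1-1: a reserved heading word such as "Introduction".
    if reserved_heading_word:
        # Condition 1-1-1-1: no chapter heading appears earlier.
        return Decision.UNDECIDED if chapter_before else Decision.CHAPTER
    # Condition 1-1-2-2: a chapter heading appears earlier.
    if not chapter_before:
        return Decision.UNDECIDED
    if matches_chapter:          # Condition 1-1-2-2-1
        return Decision.CHAPTER
    if not itemized_before:      # Condition 1-1-2-2-2-1
        return Decision.ITEMIZED_CANDIDATE
    if matches_itemized:         # Condition 1-1-2-2-2-2-1
        return Decision.ITEMIZED
    return Decision.UNDECIDED


print(table1_decide(reserved_word=False, heading_word=True,
                    reserved_heading_word=True, chapter_before=False))
```

For heading B1 ("1. Introduction") in the walkthrough of FIG. 2, the call above yields `Decision.CHAPTER`, in line with conditions (1), (1-1), (1-1-1), and (1-1-1-1).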

TABLE 2: Rules for Matching with Heading Patterns

Condition 1-1: An alphanumeric portion is included.
  Condition 2-1: The alphanumeric portions are of the same kind.
    Condition 3-1: The order of the alphanumeric portion is higher by one than that of the heading pattern.
      Condition 4-1: The symbol portion, the factors other than the order of the alphanumeric portion, the punctuation portion, the tail symbol, and the presence/absence of parentheses in the heading word are the same as those of the heading pattern.
        (Result) Indicates successful matching.
      Condition 4-2: The symbol portion, the factors other than the order of the alphanumeric portion, the punctuation portion, the tail symbol, and the presence/absence of parentheses in the heading word are the same within the range of the error pattern rules.
        (Result) Indicates successful matching.
      Condition 4-3: The symbol portion, the factors other than the order of the alphanumeric portion, the punctuation portion, the tail symbol, and the presence/absence of parentheses in the heading word are not the same as those of the heading pattern.
        (Result) Indicates failed matching.
    Condition 3-2: The order of the alphanumeric portion is equal to that of the heading pattern or higher by two.
      Condition 4-1: The symbol portion, the factors other than the order of the alphanumeric portion, the punctuation portion, the tail symbol, and the presence/absence of parentheses in the heading word are the same within the range of the error pattern rules.
        (Result) Indicates successful matching.
      Condition 4-2: The symbol portion, the factors other than the order of the alphanumeric portion, the punctuation portion, the tail symbol, and the presence/absence of parentheses in the heading word are not the same as those of the heading pattern.
        (Result) Indicates failed matching.
Condition 1-2: An alphanumeric portion is not included.
  Condition 2-1: The symbol portion, the factors other than the order of the alphanumeric portion, the punctuation portion, the tail symbol, and the presence/absence of parentheses in the heading word are the same as those of the heading pattern.
    (Result) Indicates successful matching.
  Condition 2-2: The symbol portion, the factors other than the order of the alphanumeric portion, the punctuation portion, the tail symbol, and the presence/absence of parentheses in the heading word are not the same as those of the heading pattern.
    (Result) Indicates failed matching.
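Table 2 can likewise be sketched as a matching function. In the hypothetical Python model below, a heading is reduced to the factors named in the table. The error pattern rules are not specified in this text, so they are represented by a caller-supplied predicate that by default rejects everything, and a mismatch in the kind of alphanumeric portion (for which Table 2 gives no explicit result) is treated as a failure. These are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class HeadingPattern:
    symbol: str          # symbol portion, e.g. "Chapter" or ""
    kind: Optional[str]  # kind of alphanumeric portion, e.g. "arabic"; None if absent
    order: int           # order of the alphanumeric portion, e.g. 2 for "2."
    punctuation: str     # punctuation portion, e.g. "."
    tail: str            # tail symbol, e.g. ")"
    parenthesized: bool  # presence/absence of parentheses


ErrorRule = Callable[[HeadingPattern, HeadingPattern], bool]


def same_factors(word: HeadingPattern, pattern: HeadingPattern) -> bool:
    """True if every factor other than the alphanumeric order agrees."""
    return ((word.symbol, word.punctuation, word.tail, word.parenthesized)
            == (pattern.symbol, pattern.punctuation, pattern.tail, pattern.parenthesized))


def matches(word: HeadingPattern, pattern: HeadingPattern,
            within_error_rules: ErrorRule = lambda w, p: False) -> bool:
    if word.kind is not None:                     # Condition 1-1
        if word.kind != pattern.kind:             # Condition 2-1 not met (assumed failure)
            return False
        if word.order == pattern.order + 1:       # Condition 3-1
            # Conditions 4-1 or 4-2 succeed; otherwise Condition 4-3 fails.
            return same_factors(word, pattern) or within_error_rules(word, pattern)
        if word.order in (pattern.order, pattern.order + 2):  # Condition 3-2
            return within_error_rules(word, pattern)
        return False
    # Condition 1-2: no alphanumeric portion (Conditions 2-1 / 2-2).
    return same_factors(word, pattern)


chapter1 = HeadingPattern("", "arabic", 1, ".", "", False)  # "1. Introduction"
chapter2 = HeadingPattern("", "arabic", 2, ".", "", False)  # "2. Features of System"
print(matches(chapter2, chapter1))
```

With these models, "2." matches the pattern of "1." through Conditions 1-1, 2-1, 3-1, and 4-1, which is how heading B2 is related to chapter heading C1 in the walkthrough.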

TABLE 3: Paragraph Associated Format

Condition 1-1: A heading is not included.
  (Result) Indicates a paragraph.

TABLE 4: Conjunction Associated Format

Condition 1-1: A paragraph.
  Access the rule application decision section.
  Condition 2-1: The applied flag information is X1.
    (Result d1) Set the level of the current heading to that of the previous heading.
  Condition 2-2: The applied flag information is X2.
    (Result d2) Set the level of the current heading to that of the previous chapter heading.
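The two results of Table 4 differ only in which earlier heading supplies the level. The hypothetical Python sketch below shows this; encoding levels as integers (1 for a chapter heading, 2 for an itemized heading) is an assumption made for illustration.

```python
CHAPTER_LEVEL, ITEM_LEVEL = 1, 2  # assumed numeric encoding of heading levels


def conjunction_level(flag: str, previous_heading_level: int,
                      previous_chapter_level: int = CHAPTER_LEVEL) -> int:
    """Level given to a paragraph under Table 4, selected by the applied flag."""
    if flag == "X1":
        # Result d1: same level as the previous heading.
        return previous_heading_level
    if flag == "X2":
        # Result d2: same level as the previous chapter heading.
        return previous_chapter_level
    raise ValueError(f"unknown flag {flag!r}")


print(conjunction_level("X1", ITEM_LEVEL), conjunction_level("X2", ITEM_LEVEL))
```

For a paragraph that follows an itemized heading, flag X1 keeps it at the item's level while X2 lifts it to the level of the enclosing chapter heading, so the flag alone decides between two otherwise equally valid architectures.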

The logical architecture of a sentence or word, as determined by the document architecture decision section 9 in accordance with the above rules, is stored in the logical architecture storage 10.

The display controller 4 controls the display 5 to display the document data according to the document logical architecture stored in the logical architecture storage 10.

The operation of the document processing apparatus will now be described with reference to the flowchart shown in FIG. 5. When document data is input through the input device 2 (step a), it is sequentially stored in the original document storage 3. At the same time, the document processor 1 segments the input document data into a plurality of blocks, as shown in FIG. 2. In this segmentation processing, line return codes and similar codes serve as segmentation codes, and the input document data is divided into blocks at these codes. The length of each segmented sentence is measured by counting its characters. If the count is within a predetermined value (e.g., 40 characters), the sentence is determined as possibly being a heading sentence.
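Step a and the length check can be pictured with the short Python sketch below. It assumes that the newline character is the only segmentation code and uses the 40-character example threshold; the names `segment`, `LINE_RETURN`, and `MAX_HEADING_CHARS` are hypothetical.

```python
LINE_RETURN = "\n"       # assumed segmentation code
MAX_HEADING_CHARS = 40   # example threshold given in the text


def segment(document: str) -> list[tuple[str, bool]]:
    """Split at line-return codes and flag short segments as possible headings."""
    segments = []
    for sentence in document.split(LINE_RETURN):
        if not sentence.strip():
            continue  # ignore empty segments
        # Measure the segment length by counting characters.
        possible_heading = len(sentence) <= MAX_HEADING_CHARS
        segments.append((sentence, possible_heading))
    return segments


text = "1. Introduction\n" + "This paragraph is far too long to be treated as a heading line."
for sentence, possible_heading in segment(text):
    print(possible_heading, sentence)
```

Only segments flagged `True` would go on to the dictionary lookup of step b; longer segments are handled as non-headings.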

If a segmented sentence is determined, from its character count, as possibly being a heading sentence, the heading extractor 6 decides whether the character strings (words, phrases, or symbols) constituting it are registered in the heading word dictionary 7 (step b). For example, when the sentence "1. Introduction" is extracted from the input document data, it is checked against the heading word dictionary 7. In this case, "1", ".", and "Introduction" are retrieved from the dictionary 7, and the sentence is determined to be heading candidate A (step c).
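A minimal sketch of step b in Python follows. The dictionary entries are a small hypothetical sample in the spirit of FIG. 3, the tokenizer is an assumption, and a sentence is treated as a heading candidate when at least one of its tokens is registered; the text does not state the exact criterion, so this too is an assumption.

```python
import re

# Hypothetical sample of a heading word dictionary (entry -> category).
HEADING_DICTIONARY = {
    "introduction": "reserved heading word",
    "abstract": "reserved heading word",
    ".": "punctuation",
    ")": "tail symbol",
}
# Frequently used numerals are registered as well.
HEADING_DICTIONARY.update({str(n): "numeral" for n in range(1, 21)})

# A token is a run of digits, a word, or a single non-space symbol.
TOKEN = re.compile(r"\d+|\w+|[^\w\s]")


def heading_candidate(sentence: str):
    """Return (token, category) pairs, or None if no token is registered."""
    tagged = [(tok, HEADING_DICTIONARY.get(tok.lower(), "unregistered"))
              for tok in TOKEN.findall(sentence)]
    if all(category == "unregistered" for _, category in tagged):
        return None
    return tagged


print(heading_candidate("1. Introduction"))
print(heading_candidate("Okawa Taro"))
```

"1. Introduction" comes back tagged as numeral, punctuation, and reserved heading word, while "Okawa Taro" returns `None` because neither word is registered.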

Once a heading candidate has been decided, the heading decision section 8 accesses the heading rule dictionary 8a to determine whether candidate A is a heading word (step d). If candidate A is defined by any of the rules shown in FIGS. 4A to 4D, it is determined to be heading word B (step e). The type of heading word is determined according to the heading rule applied.

If the sentence segmented by the document processor 1 does not correspond to any heading word registered in the dictionary 7, or if it is a heading candidate but does not satisfy any heading rule, the segmented sentence is determined to be a non-heading sentence (step f).

Both the sentences determined to be headings and those determined not to be headings are input to the document architecture decision section 9, which determines their document architecture. To do so, the decision section 9 checks whether the sentence architectures correspond to the document architecture rules (Tables 1 to 4) stored in the rule dictionary 9a (step g). If the architecture of the input sentence is defined by one of these rules, the document architecture data corresponding to that rule is stored in the storage 10 (steps h and i).

The above method of determining the document architecture will now be described in further detail with reference to the segmented sentences shown in FIG. 2. The sentence of the first line, "document understanding system", and the sentence of the second line, "Okawa Taro", are not stored in the dictionary 7, so the extractor 6 decides that they are not heading words. However, the sentence of the first line is defined by a rule representing a noun phrase appearing at the head of the document, so the decision section 9 decides that "document understanding system" is a title. The sentence of the second line, "Okawa Taro", is a proper noun representing a male name. Since the male name follows the title, it is determined to be the author's name.

The results of this document architecture decision are stored in the logical architecture storage 10 in the form shown in FIG. 6A.

In the sentence of the third line, i.e., "1. Introduction", three words, i.e., "1", ".", and "Introduction" are stored in the dictionary 7. Therefore, this sentence is determined as being a heading candidate sentence A1 (See FIG. 7A). At the same time, the categories constituting this sentence are recognized as a numeric portion, a punctuation portion, and a heading candidate word, respectively.

The heading decision section 8 accesses the heading rule dictionary 8a to determine whether the sentence determined as being heading candidate A1 is defined by the heading rules. In this case, the order of the categories constituting candidate word A1 is analyzed. The decision section 8 determines whether the order satisfies any one of the conditions in FIGS. 4A to 4D. The first numeral "1" is defined by the rule d shown in FIG. 4D. The numeral "1" and punctuation portion "." are defined by the rule b shown in FIG. 4B. Therefore, "1." is determined as being a heading symbol according to the rule b shown in FIG. 4B. "Introduction" is defined by the rule c shown in FIG. 4C, and is determined as being a heading word. The relationship between the heading symbol and the heading word is defined by the rule a shown in FIG. 4A. The heading candidate A1 is thus decided as heading B1. The above decision process is shown in FIG. 7A.
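FIGS. 4A to 4D are not reproduced in this text, so the following Python sketch only mirrors the roles described for rules a to d in this walkthrough: a numeral (rule d) followed by punctuation (rule b) forms the heading symbol, the following words form the heading word (rule c), and the two together form a heading (rule a). The input format of (token, category) pairs and all names are assumptions.

```python
def apply_heading_rules(tagged):
    """tagged: list of (token, category) pairs. Returns (symbol, word) or None."""
    # Rule d (as described): the heading starts with a numeral.
    if not tagged or tagged[0][1] != "numeral":
        return None
    end = 1
    # Rule b (as described): a numeral followed by punctuation is a heading symbol.
    if end < len(tagged) and tagged[end][1] == "punctuation":
        end += 1
    symbol = "".join(tok for tok, _ in tagged[:end])
    # Rule c (as described): the remaining tokens form the heading word.
    rest = [tok for tok, _ in tagged[end:]]
    if not rest:
        return None
    # Rule a: a heading symbol followed by a heading word constitutes a heading.
    return symbol, " ".join(rest)


a1 = [("1", "numeral"), (".", "punctuation"), ("Introduction", "reserved heading word")]
print(apply_heading_rules(a1))
```

Applied to candidate A1, the sketch returns the heading symbol "1." and the heading word "Introduction", the same split that FIG. 7A attributes to heading B1; a sequence that does not start with a numeral is rejected.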

In this decision process, if the categories are not defined by rules a, b, c, and d shown in FIGS. 4A to 4D, heading candidate A1 is determined not to be a heading word.

The document architecture decision section 9 then determines the document architecture of heading B1 in accordance with the rules in Tables 1 to 4. At this point, the logical architecture analyzed so far is stored in the storage 10 as shown in FIG. 6A, and it contains no chapter heading. Heading B1, "1. Introduction", satisfies conditions (1), (1-1), (1-1-1), and (1-1-1-1) in Table 1, so it is determined to constitute chapter heading C1, as shown in FIG. 7A. According to this decision, the logical architecture containing the chapter heading is stored in the logical architecture storage 10, as shown in FIG. 6B.

Since the number of characters in the sentence of the fourth and fifth lines shown in FIG. 2 exceeds the threshold for possibly being a heading, this sentence is determined not to be a heading. By the rule in Table 3, it is determined to be a sentence constituting a paragraph.

The sentence of the sixth line, "2. Features of System", is recognized as heading candidate A2 by the same procedure as heading candidate A1. It is analyzed by the steps in FIG. 7B and determined to be heading B2. Heading B2 is then compared with the rules in Table 2 to determine whether it coincides with any of them. Heading B2 satisfies conditions (1-1), (2-1), (3-1), and (4-1), and is therefore determined as possibly having the same level as chapter heading C1, "1. Introduction". Next, it is determined whether heading B2 is defined by the rules in Table 1. "2. Features of System" satisfies conditions (1), (1-1), (1-1-2), and (1-1-2-2-1), so heading B2 is determined to constitute chapter heading C2. The resulting logical architecture data is stored in the storage 10 as shown in FIG. 6C.

The same processing as described above is performed for the sentences of the seventh and subsequent lines, and the document architectures of these sentences are stored in the storage 10, as shown in FIGS. 6D and 6E. More specifically, for the sentence of the seventh line, heading candidate A3 is analyzed, as shown in FIG. 7C, and then is determined as being heading B3 according to the rules shown in FIGS. 4A to 4D.

In the document architecture decision section 9, heading B3 is compared with the rules in Table 2. Since the pattern of heading B3 has not appeared previously, matching is unsuccessful, and heading B3 is determined to be a heading whose level differs from those of the previous headings. Checked against the document architecture rules in Table 1, heading B3 satisfies conditions (1), (1-1), (1-1-2), (1-1-2-2), and (1-1-2-2-2-1). Therefore, heading B3 is determined to be itemized heading C3.

Similarly, since the sentence of the eighth line satisfies conditions (1-1), (2-1), (3-1), and (4-1) in Table 2, its heading is determined as possibly having the same level as the itemized heading of the seventh line. The sentence of the eighth line satisfies conditions (1), (1-1), (1-1-2), (1-1-2-2), (1-1-2-2-2), (1-1-2-2-2-2), and (1-1-2-2-2-2-1) in Table 1, so it is determined to be an itemized heading and is stored as shown in FIG. 6D.

The ninth line, "This system is . . . ", admits two candidate interpretations. In the first, the ninth line is part of the itemized heading of the eighth line, "2 High recognition rate"; in the second, it is a paragraph at the same level as the chapter heading of the sixth line, "2. Features of System".

Therefore, the apparatus according to the present invention is configured to allow the operator to select any one of the candidates.

To achieve the above-mentioned object, the apparatus further comprises a rule application decision section 12 and a candidate selection indication section 14 as depicted in FIG. 1.

Whenever two or more candidates are decided, the rule application decision section 12 accesses the document architecture rule dictionary 9a to check the name of the rule requesting candidate selection and retrieves the flags corresponding to that rule name from a table (not shown). The candidate selection indication section 14 responds to a candidate selection key arranged on the document input device 2 and updates the flags so that any desired document architecture can be selected.
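The interplay between the rule application decision section 12 and the candidate selection indication section 14 can be sketched in Python as below. The table layout, the rule name "conjunction", and the choice to model each press of the candidate selection key as advancing to the next candidate are all assumptions; the actual table of FIG. 8 is not reproduced in this text.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationRuleTable:
    """Hypothetical rule table: rule name -> candidate flags and applied flag."""
    candidates: dict[str, list[str]] = field(default_factory=dict)
    applied: dict[str, str] = field(default_factory=dict)

    def flags_for(self, rule_name: str) -> list[str]:
        # Rule application decision: look up the flags for a rule that
        # requested candidate selection.
        return self.candidates[rule_name]

    def press_select_key(self, rule_name: str) -> str:
        # Candidate selection (assumed behavior): each key press moves the
        # applied flag to the next candidate, wrapping around at the end.
        flags = self.candidates[rule_name]
        current = self.applied.get(rule_name, flags[0])
        new_flag = flags[(flags.index(current) + 1) % len(flags)]
        self.applied[rule_name] = new_flag
        return new_flag


table = ApplicationRuleTable(candidates={"conjunction": ["X1", "X2"]},
                             applied={"conjunction": "X1"})
print(table.flags_for("conjunction"), table.press_select_key("conjunction"))
```

In this model, one key press switches the ninth line of FIG. 2 from the X1 interpretation to the X2 interpretation, and a second press switches it back.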

In FIG. 5, when a candidate does not match a single document architecture rule, or when plural candidates are created (step g), control passes to the rule application decision section 12, which accesses the document architecture rule dictionary 9a.

This candidate selection function is the characteristic feature of the present invention.

As already explained, the document architecture decision section 9 determines whether the sentence architectures correspond to the document architecture rules (Tables 1 to 4) stored in the document architecture rule dictionary 9a. A heading candidate word may, however, match a plurality of rules, in which case the document architecture cannot be determined uniquely. In this case, a plurality of architecture candidates are written in the logic