WikiPatents - Community Patent Review
Method for reading a document and a document reading apparatus utilizing an image buffer    
United States Patent 4811416
Inventor(s): Nakamura; Yoshikathu (Yokosuka, JP)
Abstract: In a document reading apparatus, a document is subdivided into a plurality of information fields into which character and/or image information has been recorded in accordance with predetermined format data for the document, and an image memory has a plurality of write regions, the number of which is smaller than the number of information fields. A system control means inspects the currently writable regions of the image memory, namely those from which the stored character/image data has already been read out, thereby permitting the document to be transported before the succeeding information field of the document is read.
[Drawing from US Patent 4811416]
Owner/Assignee: Kabushiki Kaisha Toshiba (Kawasaki, JP)
Publication Date: March 7, 1989
Application Number: 06/911,438
Filing Date: September 25, 1986
US Classification: 382/317, 358/443, 358/482, 358/486
Int'l Classification: G06K 009/20
Examiner: Blum; Theodore M.
Attorney/Law Firm: Oblon, Fisher, Spivak, McClelland & Maier
Priority Data: Sep 27, 1985 [JP] 60-21546
USPTO Field of Search: 382/61, 382/41, 358/280
Claims
What is claimed is:

1. A document reading apparatus comprising:

means for transporting a document subdivided into a plurality of image pattern fields, each having at least one image pattern recorded thereon, in accordance with a predetermined format of the document;

means for reading the image pattern from the image pattern fields of the document, to derive image information corresponding to the image pattern while the document is intermittently transported by the transporting means in accordance with the format of the document;

memory means including a plurality of write regions corresponding to the image pattern fields of the document, for writing the image information into the predetermined write regions, based upon a predetermined document format data corresponding to the format of the document, and for reading image information therefrom, said memory means having a memory capacity which is capable of storing an amount larger than a maximum amount of the image information;

means for recognizing the image information read out from the write regions of the memory means;

system control means for storing, in advance, the document format data of the document to be read, and for driving the document transporting means to intermittently transport the document, prior to the reading of the succeeding one of the image pattern fields of the document, said system control means including means for updating the writable area while the image information is being recognized by said recognizing means, means for calculating an amount of writable regions of the memory means, on the basis of the whole memory capacity of the memory means and the write regions in which the image information has been written, and means for continuously monitoring present writable regions within the memory means, from which the image information stored has been read out; and

means for driving the reading means to read the document when the calculating means calculates a predetermined amount of writable regions corresponding to an amount of the image information which corresponds to at least one of the pattern fields.

2. An apparatus as claimed in claim 1, wherein the system control means further previously stores attribute data representative of sorts of the image pattern recorded on the image pattern fields of the document.

3. An apparatus as claimed in claim 1, wherein the transporting means transports the document in a first direction and reading operation by the reading means is performed in a second direction perpendicular to the first direction.

4. An apparatus as claimed in claim 3, wherein the transporting means transports fields of the document other than the image pattern fields thereof at higher speed than in the image pattern fields.

5. An apparatus as claimed in claim 1, wherein the apparatus further comprises format data buffer means interposed between the recognition means and the system control means, the format data being derived from the format data buffer means in one unit of the image pattern fields of the document into the recognition means, thereby permitting the image information to be recognized in one unit of the image pattern fields of the document.

6. An apparatus as claimed in claim 1, wherein the reading means is a photoelectric line sensor having 2048 bits.

7. An apparatus as claimed in claim 1, wherein the memory means has a 2,048 × 2,048-bit memory capacity, which is subdivided into 16 memory regions.

8. An apparatus as claimed in claim 1, wherein the image pattern fields of the document contain at least hand-written characters.

9. A method of reading a document, comprising the steps of:

transporting a document having a plurality of image pattern fields, each having at least one image pattern recorded thereon, in accordance with a predetermined format of the document;

reading the image pattern from the predetermined image pattern fields of the document, to derive image information corresponding to the image pattern while the document is intermittently transported in accordance with the format of the document;

writing the image information into memory means having a memory capacity capable of storing an amount larger than a maximum amount of the image information;

recognizing the image information read out from the memory means;

storing, in advance, document format data corresponding to the format of the document to be read;

calculating an amount of writable regions of the memory means, on the basis of a whole memory capacity of the memory means and the write regions in which the image information has been written;

updating the writable area while the step of recognizing of the image information is being performed;

monitoring present writable regions within the memory means, wherefrom the image information was read out;

transporting intermittently the document prior to the reading of the succeeding one of the image pattern fields of the document; and

reading the document in response to a predetermined amount of writable regions corresponding to an amount of the image information which corresponds to at least one of the image pattern fields.

10. A document reading apparatus comprising:

means for transporting a document having a plurality of image pattern fields, each having at least one image pattern including at least one of characters and picture recorded thereon, in accordance with a predetermined format for the document;

means for reading the image pattern from the image pattern fields of the document, to derive image information corresponding to the image pattern while the document is intermittently transported by the transporting means in accordance with the format of the document;

memory means for storing, in advance, format data corresponding to the format of the document;

memory means including a plurality of write regions corresponding to the image pattern fields of the document, for writing the image information into the predetermined write regions, in accordance with the format data, and for reading the image information therefrom, said memory means having a memory capacity capable of storing an amount larger than a maximum amount of the image information;

means for recognizing the image information read out from the write regions of the memory means which stores the image information;

system control means including means for calculating an amount of writable regions of the memory means, on the basis of the whole memory capacity of the memory means and the write regions in which the image information has been written, means for updating the writable area while the image information is being recognized by said recognizing means, and means for monitoring present writable regions within the memory means which stores the image information, wherefrom the stored image information was read out; and

means for driving the reading means for reading the document when the calculating means obtains a predetermined amount of writable regions corresponding to an amount of the image information which corresponds to at least one of the image pattern fields.
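Claims 6 and 7 give concrete sizes: a 2048-bit line sensor and a 2,048 × 2,048-bit image memory subdivided into 16 regions. The arithmetic below is a small sketch that assumes one sensor line fills one 2,048-bit memory row and that the 16 regions are equal in size. The description notes that unit regions generally differ in size, so the equal split is only an illustration.

```python
# Sizes stated in claims 6 and 7.
SENSOR_BITS_PER_LINE = 2048   # claim 6: 2048-bit photoelectric line sensor
MEMORY_ROWS = 2048            # claim 7: 2,048 x 2,048-bit image memory
NUM_REGIONS = 16              # claim 7: memory subdivided into 16 regions

MEMORY_BITS = SENSOR_BITS_PER_LINE * MEMORY_ROWS

# Assumption for illustration: one scan line fills one memory row and
# the 16 regions are equal-sized (real unit regions may differ in size).
lines_per_region = MEMORY_ROWS // NUM_REGIONS
bits_per_region = MEMORY_BITS // NUM_REGIONS

print(MEMORY_BITS)        # 4194304 bits
print(MEMORY_BITS // 8)   # 524288 bytes, i.e. 512 KiB
print(lines_per_region)   # 128 scan lines per region
print(bits_per_region)    # 262144 bits per region
```

Under these assumptions, each region holds a field up to 128 scan lines tall across the full sensor width.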
Description
BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a document reading method and apparatus that read character and image information recorded on documents more efficiently for image processing and recognition purposes.

2. Description of Prior Art

In recent years, optical character readers (OCRs) have been widely used as a means of inputting information into electronic computers. The character sets read by such OCRs include not only printed alphanumeric characters but also hand-written alphanumeric characters, hand-written KATAKANA characters, typed KANJI characters, and hand-written KANJI characters. As reading and recognition techniques have developed, OCRs have come to read an increasing variety of character sets.

A prior-art optical character reading device is disclosed, for instance, in Japanese Patent Publication No. 60-20785 (1985).

As shown schematically in FIG. 1, in such a conventional OCR, a document 1 is scanned under the control of a document controller 2, and the character and image information written on document 1 is read by a photoelectric transducer 3 and stored in an image buffer 4. A recognition unit 5 reads the stored information out of image buffer 4 and performs character/image recognition, including segmentation and feature extraction of the characters and images. A reading controller 6 controls units 2 to 5.

Image buffer 4 plays an important role in efficiently coupling the document scanning system with the recognition processing system.

As shown in FIG. 2, photoelectric transducer 3 (for example, a line sensor 3a) photoelectrically converts the character and image information on document 1, imaged through a lens 3c, line by line at a predetermined resolution in the direction perpendicular to the conveying direction, as document controller 2 conveys document 1 past readout scanning position 3b of line sensor 3a. In this manner, the information (character strings) 1a and 1b written on document 1 is read. Image buffer 4 consists of, for example, two RAMs (random access memories) 4a and 4b connected in parallel. The information read from document 1 is stored in RAMs 4a and 4b in units of, for instance, one line of character string. After character strings 1a and 1b, each corresponding to one line, have been stored, image buffer 4 communicates with recognition unit 5, which subjects character strings 1a and 1b to recognition processes including segmentation, feature extraction, discrimination, and identification of the characters.

From the viewpoint of processing efficiency, it is highly disadvantageous for the writing of character and image information into image buffer 4 to be interrupted while image buffer 4 is communicating with recognition unit 5. Moreover, if the scanning of document 1 is interrupted, the scanned data may be damaged or lost.

For this reason, as mentioned above, RAMs 4a and 4b of image buffer 4 are operated in parallel, so that writing the data read from the document and reading data out for the recognition process can proceed concurrently, with the two RAMs alternating roles.
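The alternating use of RAMs 4a and 4b corresponds to what is commonly called double (ping-pong) buffering. The sketch below models it with two one-line buffers; the class and method names are hypothetical, and timing is ignored. It shows only that the scanner can write a new line when one of the two RAMs has been emptied by the recognizer.

```python
from typing import List, Optional

class PingPongBuffer:
    """Two one-line buffers (cf. RAMs 4a and 4b) used alternately:
    the scanner fills one while the recognizer empties the other."""

    def __init__(self) -> None:
        self.slots: List[Optional[str]] = [None, None]
        self.write_idx = 0   # RAM the scanner writes next
        self.read_idx = 0    # RAM the recognizer reads next

    def can_write(self) -> bool:
        return self.slots[self.write_idx] is None

    def write_line(self, line: str) -> None:
        if not self.can_write():
            raise RuntimeError("both RAMs hold unrecognized lines; scanning must pause")
        self.slots[self.write_idx] = line
        self.write_idx ^= 1  # switch to the other RAM

    def read_line(self) -> Optional[str]:
        line = self.slots[self.read_idx]
        if line is not None:
            self.slots[self.read_idx] = None   # RAM is free for the scanner again
            self.read_idx ^= 1
        return line


buf = PingPongBuffer()
buf.write_line("string 1a")
buf.write_line("string 1b")
print(buf.can_write())   # False: both RAMs are occupied
print(buf.read_line())   # string 1a
print(buf.can_write())   # True: the first RAM can be rewritten
```

With only two slots, any line whose recognition is slow leaves the scanner with nowhere to write.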

Meanwhile, the demands for better reading performance from such OCRs are ever-increasing. They include not only more character categories to be read, but also greater freedom in writing hand-written characters (that is, greater tolerance of variations in character style), less restrictive document formats, higher processing speed, and the like. However, the conventional OCR described above has the following problems.

First, the time required to recognize the character and image information stored in image buffer 4 varies considerably with the character category. Printed alphanumeric characters and printed KATAKANA characters can be recognized relatively simply and quickly; conversely, hand-written KANJI characters take a long time to recognize, because their pattern structure is complicated and there are many character categories and similar-looking characters.

This can be seen in the example document shown in FIG. 3, which contains character strings from different character sets, namely KATAKANA characters, HIRAGANA characters, KANJI characters, numerals, and Roman characters, as well as a map. These strings are arranged one after another in the scanning order, perpendicular to the line-scanning direction, and are scanned at a constant speed. As a result, the recognition time of the character and image information inevitably becomes longer than the reading time. In this case, even though RAMs 4a and 4b of image buffer 4 shown in FIGS. 1 and 2 are connected in parallel, reading of document 1 must be temporarily interrupted, because neither RAM 4a nor RAM 4b can accept further data. Processing efficiency drops as a result.
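The stall described above can be reproduced with a small timing model. This is a sketch under stated assumptions, not the patent's timing: lines are scanned back to back, each line claims a buffer when its scan starts and releases it when its recognition ends, and lines are recognized in order. The function name and the times (arbitrary units) are hypothetical.

```python
from typing import List

def scanner_stall_time(scan_time: float, recog_times: List[float], slots: int) -> float:
    """Total time the scanner waits for a free line buffer.

    Illustrative model (an assumption): lines are scanned back to back;
    each line claims a buffer when its scan starts and releases it when
    its recognition ends; lines are recognized in order, each starting
    once it is fully scanned and the recognizer is idle.
    """
    recog_end: List[float] = []
    stall = 0.0
    prev_scan_end = 0.0
    for i, r in enumerate(recog_times):
        start = prev_scan_end
        if i >= slots:
            # The buffer used by line i - slots must be released first.
            start = max(start, recog_end[i - slots])
        stall += start - prev_scan_end
        scan_end = start + scan_time
        recog_start = max(scan_end, recog_end[-1] if recog_end else 0.0)
        recog_end.append(recog_start + r)
        prev_scan_end = scan_end
    return stall


# Hypothetical times: two slow lines (e.g. hand-written KANJI) among fast ones.
times = [1.0, 4.0, 4.0, 1.0]
print(scanner_stall_time(1.0, times, slots=2))   # 3.0 with two RAMs
print(scanner_stall_time(1.0, times, slots=4))   # 0.0 with four buffers
```

In this model the two-RAM arrangement forces the scanner to wait whenever two slow lines occupy both RAMs, while more buffering absorbs the same variation.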

Moreover, if a step 1c exists between lines of character strings (the KATAKANA and HIRAGANA strings in FIG. 3), the strings cannot be written alternately into RAMs 4a and 4b with one line per RAM. In such a case, both RAMs 4a and 4b must be written simultaneously, which is a further disadvantage: to write into both RAMs at once, scanning of document 1 must be suspended until recognition of the information already stored in RAMs 4a and 4b has been completed.

Second, if the character strings are laid out parallel to the feed direction of document 1, as shown in FIG. 4, the readout control described above cannot be applied. In general, the capacity of image buffer 4 is designed so that one line of character string can be stored with the accuracy required for recognition. However, when document 1 is fed with a skew relative to the feed direction, the readout area for one line of character and image information exceeds the buffer size, so that the whole line cannot be stored.
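Elementary geometry shows why a skewed line overflows a buffer sized for one line. When a rectangular string band of length L and height h is rotated by an angle t, it spans h·cos(t) + L·sin(t) across its nominal direction. The sketch below applies this formula to a hypothetical string 100 mm long and 5 mm high; the dimensions and the function name are illustrative only.

```python
import math

def skewed_extent(length: float, height: float, skew_deg: float) -> float:
    """Extent, measured across the string's nominal direction, of a
    rectangular string band (length x height) rotated by skew_deg
    (valid for skews up to 90 degrees): height*cos(t) + length*sin(t)."""
    t = math.radians(abs(skew_deg))
    return height * math.cos(t) + length * math.sin(t)


# Hypothetical string: 100 mm long, 5 mm high.
print(round(skewed_extent(100.0, 5.0, 0.0), 2))   # 5.0
print(round(skewed_extent(100.0, 5.0, 2.0), 2))   # 8.49
```

Even a 2-degree skew makes this band need about 70% more scan lines than its nominal height, so a buffer sized tightly for one line cannot hold it.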

To prevent this problem, the conventional OCR detects the amount of skew in advance from the edge of the conveyed document 1. If the skew exceeds a predetermined value, transport of the document is treated as an error, and the operator is instructed to re-feed the document into the OCR. Such measures, however, reduce processing efficiency when large numbers of documents are read continuously.
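The text says only that skew is detected "from the edge" of the document. One plausible way to do this, assumed here purely for illustration, is to use two edge sensors placed a known distance apart across the transport path and convert the difference in their detection times into an angle. The sensor arrangement, the function names, and the numbers below are all hypothetical.

```python
import math

def edge_skew_deg(t_left: float, t_right: float, speed: float, sensor_gap: float) -> float:
    """Skew of the leading edge, from the times two edge sensors placed
    sensor_gap apart across the transport path detect it, with the
    document moving at `speed` (same length unit as sensor_gap)."""
    offset = (t_right - t_left) * speed   # along-feed offset between the two edge points
    return math.degrees(math.atan2(offset, sensor_gap))

def accept_feed(skew_deg: float, limit_deg: float) -> bool:
    """Conventional rule: a feed whose skew exceeds the limit is an error."""
    return abs(skew_deg) <= limit_deg


skew = edge_skew_deg(0.000, 0.001, speed=1000.0, sensor_gap=100.0)   # hypothetical values
print(round(skew, 3))           # 0.573
print(accept_feed(skew, 0.5))   # False: operator must re-feed the document
```

Every rejection in this scheme costs an operator intervention, which is the efficiency loss the text points out.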

Third, when a plurality of documents 1 are processed continuously, the document convey paths are switched according to the recognition result so that documents 1 are sorted and collected. To allow the convey paths to be switched, successive documents 1 are generally fed with a predetermined interval between them, and this interval is not negligible compared with the length of document 1.

In the prior-art OCR, the processing unit time for a single document may be set to the time required to convey document 1 over a distance equal to its length plus the feed interval (that is, a time longer than that needed to scan one document alone). Even so, because image buffer 4 is controlled as described above, the time available for the recognition process is limited to the time needed to scan document 1. The processing unit time therefore cannot be used effectively, and long idle times result.
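The time budget in this paragraph can be written out explicitly. The unit time is (document length + feed interval) / speed, but in the conventional OCR recognition can use only the scan time, length / speed, so the feed-interval share is idle. The figures below (a 297 mm document, a 100 mm interval, 500 mm/s transport) are hypothetical, chosen only to show the arithmetic.

```python
from typing import Dict

def unit_time_budget(doc_length: float, feed_gap: float, speed: float) -> Dict[str, float]:
    """Per-document time budget in the conventional OCR described above."""
    unit = (doc_length + feed_gap) / speed   # processing unit time per document
    scan = doc_length / speed                # time usable for recognition when tied to scanning
    idle = unit - scan                       # feed interval left unused
    return {"unit": unit, "scan": scan, "idle": idle, "idle_fraction": idle / unit}


# Hypothetical: 297 mm document, 100 mm feed interval, 500 mm/s transport.
budget = unit_time_budget(297.0, 100.0, 500.0)
print({k: round(v, 3) for k, v in budget.items()})
# unit ~0.794 s, scan ~0.594 s, idle ~0.2 s (about 25% of the unit time)
```

The idle fraction equals feed interval / (length + interval), so it is independent of transport speed.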

As described above, the conventional OCR has various problems that hinder improvements in document reading and recognition efficiency.

One proposed solution to these drawbacks is to replace the line-by-line recognition control using line buffer RAMs 4a and 4b with a page buffer memory whose capacity is sufficient to cover the entire document.

However, writing all of the information on the document into such a page buffer memory creates other problems: the reading scan is necessarily slow, and very complicated processing is needed to segment the desired character strings from the stored information. High-speed processing therefore cannot be expected.

The above-described problems of the conventional OCR will now be summarized as follows.

First, the image buffer memory in this kind of OCR serves the important function of a buffer that matches the scanning unit (2, 3) to the recognition unit (5).

In an OCR employing two line buffer memories operated alternately in parallel, the time required for recognition is strongly affected by the degree of freedom allowed in handwriting, as well as by the document format and skew.

Conversely, these problems can be solved to some extent by using an image buffer memory large enough to cover a document of the maximum size. This, however, introduces another drawback: all of the unnecessary information on the document must be scanned and stored, and the necessary information must then be segmented from the whole. As a result, the overall processing time is prolonged, and high-speed reading cannot be expected.

There is therefore a need for an optical character reader that uses a buffer memory of relatively small capacity and yet achieves high performance under a constant document feed, for example in recognizing hand-written KANJI characters, which previously only conventional high-performance OCRs could achieve.

The present invention has been made in view of these circumstances, and an object of the invention is to provide an apparatus for reading characters and images that allows greater freedom in designing document formats, absorbs fluctuations in recognition processing time across character categories, and processes documents efficiently at high speed.

More specifically, another object of the invention is to provide a document reading apparatus by which a plurality of documents can be fed continuously at a substantially constant speed during the reading process, even if these documents contain hand-written characters and/or images that take a long time to recognize.

Still another object of the invention is to provide a document reading apparatus that can use a simple recognition arrangement even when a plurality of documents are fed at a substantially constant speed during reading, because the image memory, operated under scroll control, functions as a buffer or damper memory.

SUMMARY OF THE INVENTION

The above and other objects of the invention can be realized by providing a document reading apparatus comprising:

means for transporting a document subdivided into a plurality of information fields into which character and/or image information has been recorded in accordance with predetermined format data for the document;

reading means for reading the character and/or image information from the predetermined information fields of the document to derive character/image data while the document is transported by the transporting means based upon the format data of the document;

memory means including a plurality of write regions corresponding to the information fields of the document, for writing the character/image data into the predetermined write regions, based upon the predetermined format data, and for reading the character/image data therefrom;

means for recognizing the character/image data read out from the write regions of the memory means; and

system control means for storing, in advance, the format data of the document to be read, and for inspecting present writable regions within the image memory means from which the stored character/image data has been read out, so as to permit the document to be intermittently transported by the document transporting means prior to the reading of the succeeding information field of the document.

Furthermore, these objects of the invention can be accomplished by providing a method of reading a document comprising the steps of:

reading character and/or image information from predetermined information fields of the document in accordance with predetermined format data for the document to derive electronic character/image data;

storing the electronic character/image data into predetermined write regions of memory means corresponding to the information fields of the document;

reading out the electronic character/image data from the predetermined write regions of the memory means;

recognizing the electronic character/image data read out from the predetermined write regions of the memory means; and

inspecting present writable regions within the memory means from which the electronic character/image data stored has been read out, thereby allowing the document to be intermittently read prior to the reading of the succeeding information field of the document.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of these and other objects of the present invention, reference is made to the following detailed description of the invention to be read in conjunction with the following drawings, wherein:

FIG. 1 is a schematic block diagram of a conventional document reading apparatus;

FIG. 2 is an illustration for explaining a relationship between the reading area of the document and the image memory of the reading apparatus shown in FIG. 1;

FIGS. 3 and 4 show examples of document formats;

FIG. 5 is a schematic block diagram of a document reading apparatus according to one preferred embodiment;

FIG. 6A shows two documents to be read;

FIG. 6B illustrates memory regions of the image memory;

FIG. 7 is an illustration for explaining a relationship between the reading fields of the document;

FIG. 8 illustrates the document convey and reading operations;

FIGS. 9A and 9B show transfer pulses and sensor drive pulses;

FIG. 10 illustrates format data for the reading field of the document;

FIG. 11 illustrates format data of the character;

FIG. 12 is a schematic block diagram of the document controller shown in FIG. 5;

FIG. 13 illustrates the data transfer conditions between the document controller and the recognition control unit shown in FIG. 5;

FIG. 14 illustrates the format data of the reading field of the document;

FIG. 15 is a schematic block diagram of the address controller shown in FIG. 5;

FIG. 16 shows control modes for the image memory shown in FIG. 5;

FIG. 17 shows a flowchart of the overall operations of the document reading apparatus shown in FIG. 5; and

FIG. 18 shows a flowchart of the interrupt process employed in the overall operation process shown in FIG. 17.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

BASIC IDEA

A basic idea of the document reading apparatus according to the invention will now be summarized.

The present invention is directed to a character and image reading apparatus that reads characters and images written on a document by scanning the document, stores the read character and image information in an image memory, and then performs recognition and image processing. The apparatus employs an image memory having a memory capacity in excess of the total scanning region of the document, and the operation of the image memory is controlled in a scroll manner, as explained below.

This scroll control of the image memory can be understood in terms of the following three functions:

(1) The document is selectively scanned line by line in accordance with given format information. The character and image information is read from the reading area on the document designated by the format information, and this character and image data is written into the writing unit area of the image memory that is preset to correspond to the selective scanning of the document. Area information about the reading area from which the data was obtained, and attribute information about the characters and images, are added to the character and image data stored in the writing unit area, after which the data is subjected to recognition and image processing.

(2) The capacity (i.e., the number of unit regions) of the writing unit areas into which character and image data has been written is subtracted from the total capacity (i.e., the total number of unit regions) of the image memory to obtain the capacity of the writable area (i.e., the number of writable unit regions) in the image memory.

The term "unit regions" in this specification should be understood as follows. Because a document generally contains readout fields of various sizes, the corresponding unit regions of the image memory have different sizes (capacities). If a document contains only readout fields of the same size, the corresponding unit regions of the image memory all have the same size.

Meanwhile, the number of writable unit regions is continuously updated while the recognition process is performed: the capacity of each writing unit area whose character and image data has already been read out and recognized is added back to the capacity of the writable area.

(3) Before the next readout region of the document is scanned, the capacity required for that reading area, as designated by the format information, is compared with the capacity of the latest (updated) writable area in the image memory, and either document scanning or a temporary standby mode is selected accordingly.
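Functions (2) and (3) amount to simple capacity bookkeeping: writable capacity is the total minus what has been written, recognized areas are added back, and the next field is scanned only if it fits entirely. The sketch below is a minimal model of that bookkeeping, not the patent's control logic. Capacities are counted in arbitrary units (scan lines here), and the class, its methods, and the sizes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ScrollMemory:
    """Capacity bookkeeping for functions (2) and (3) of the scroll control.
    Capacities are in arbitrary units (here: scan lines)."""
    total: int
    written: Dict[int, int] = field(default_factory=dict)  # field id -> capacity held

    @property
    def writable(self) -> int:
        # (2) writable capacity = whole capacity - capacity already written
        return self.total - sum(self.written.values())

    def can_scan(self, size: int) -> bool:
        # (3) scan the next field only if it fits completely; otherwise stand by
        return size <= self.writable

    def write(self, field_id: int, size: int) -> None:
        if not self.can_scan(size):
            raise RuntimeError("writable area too small: document must stand by")
        self.written[field_id] = size

    def recognized(self, field_id: int) -> None:
        # (2), continued: a recognized field's capacity becomes writable again
        del self.written[field_id]


mem = ScrollMemory(total=2048)
mem.write(1, 300)
mem.write(2, 1200)
print(mem.writable)         # 548
print(mem.can_scan(800))    # False -> temporary standby
mem.recognized(1)
print(mem.writable)         # 848
print(mem.can_scan(800))    # True -> scan the next field
```

The check happens before a field is scanned, never during it. This mirrors the requirement that a scan, once started, must not be interrupted.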

The features of the invention are briefly summarized below.

First, once scanning of a reading region on the document has begun, it cannot be interrupted; otherwise, problems such as image distortion and mechanical damage to the document may occur. Therefore, according to the invention, the remaining writable area in the image memory is checked before reading of the next readout field (region) on the document begins, and depending on the amount remaining, either the document feed is suspended or the document is scanned.

Because recognition of the read character and image data proceeds concurrently with the readout operation, the memory areas holding data that has already been recognized become empty and can be reused as new writable memory areas. Since the writable memory area is thus continually renewed, a buffer memory of relatively small capacity can achieve the same effect as a large-capacity memory. This way of using the memory areas is called the "scroll operation" of the memory in this invention.
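One plausible way to realize this "scroll operation" at the address level is a circular allocator. Writing unit areas of varying size are placed one after another, wrap back to address 0, and are freed in the order they were written. This is an assumption made for illustration: the address controller of FIG. 5 is not described in this part of the text, and the class below, its names, and the sizes are hypothetical. An area that would straddle the end of memory starts at address 0 instead, and the skipped tail is counted as occupied until that area is freed.

```python
from collections import deque
from typing import Deque, Optional, Tuple

class RingRegionAllocator:
    """Circular placement of variable-size writing unit areas.

    Assumptions: each area must be contiguous, areas are placed one after
    another and wrap to address 0, and recognition frees them in the order
    they were written (FIFO).
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.head = 0   # next address to write
        self.used = 0   # occupied units, including skipped tail padding
        self.regions: Deque[Tuple[int, int, int]] = deque()  # (start, length, pad)

    def allocate(self, length: int) -> Optional[int]:
        if not self.regions:
            self.head = 0                       # memory empty: restart at address 0
        # An area that would run past the end starts at 0; the tail is skipped.
        pad = self.size - self.head if self.head + length > self.size else 0
        if self.used + pad + length > self.size:
            return None                         # not enough room yet: stand by
        start = 0 if pad else self.head
        self.regions.append((start, length, pad))
        self.used += pad + length
        self.head = (start + length) % self.size
        return start

    def release_oldest(self) -> None:
        _, length, pad = self.regions.popleft()  # oldest area has been recognized
        self.used -= pad + length


ring = RingRegionAllocator(100)
print(ring.allocate(40), ring.allocate(40))   # 0 40
print(ring.allocate(30))                      # None: must wait
ring.release_oldest()
print(ring.allocate(30))                      # 0: wraps around to the front
```

Keeping areas contiguous simplifies the recognizer's reads, at the cost of occasionally wasting the tail of memory. A real design might instead split an area across the wrap point.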

Briefly stated, the document reading apparatus according to the invention is characterized as follows. A document contains at least one readout field (region) in which the character and image information to be read is written. Before the information in such a field is read, the current writable capacity of the buffer memory is checked to decide whether to execute the readout operation or to enter a standby condition; that is, it is checked in advance whether the character and image information of that field can be stored completely in the latest writable memory regions.

According to the invention, writing unit areas for storing the character and image data are allocated in the image memory in accordance with the reading areas, or fields, of the document designated by the format information, and each item of character and image data is written into its allocated writing unit area. This eliminates the restriction of the conventional apparatus that reading fields be arranged at a fixed pitch, so document formats with a high degree of freedom can be used.

In addition, the writing unit areas can be set in the image memory while the writable area in the image memory is continuously monitored. The image memory therefore acts as a continuous damper between the scanning of the document and the recognition of the character and image data. Even if the time required to recognize the character and image data varies greatly, this variation is effectively absorbed and the document is scanned efficiently. In other words, because the image memory is operated under scroll control as a buffer, or damper, memory, a plurality of documents can be fed continuously at a substantially constant speed during reading, even if they contain hand-written characters and/or images that take a long time to recognize. Since the document is transferred smoothly during the reading process, the document convey unit can be simpler in design and lower in cost.
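A toy simulation can make the damper effect concrete. It is a hypothetical discrete-time model, not the patent's control logic: a whole field is assumed to be written in a single tick, recognition times are invented, and `count_standby_ticks` is an illustrative name. It counts how many ticks the document feed must stand by.

```python
from collections import deque


def count_standby_ticks(field_lines, recog_ticks, capacity):
    """Toy discrete-time model of the image memory acting as a damper.

    Each tick, the scanner writes the whole next field if it fits in the
    writable area; otherwise the feed stands by for that tick. The recognizer
    spends recog_ticks[i] ticks on field i (FIFO) and frees its lines when
    done. Returns the number of ticks the document feed spent in standby.
    """
    if any(lines > capacity for lines in field_lines):
        raise ValueError("every field must fit in the image memory")
    if any(t < 1 for t in recog_ticks):
        raise ValueError("recognition takes at least one tick")
    used, next_field, remaining, standby = 0, 0, 0, 0
    pending = deque()  # fields written but not yet recognized, oldest first
    while next_field < len(field_lines) or pending:
        # Scanner: a field is started only if it fits completely.
        if next_field < len(field_lines):
            if field_lines[next_field] <= capacity - used:
                used += field_lines[next_field]
                pending.append(next_field)
                next_field += 1
            else:
                standby += 1
        # Recognizer: work on the oldest field; release its lines when done.
        if pending:
            if remaining == 0:
                remaining = recog_ticks[pending[0]]
            remaining -= 1
            if remaining == 0:
                used -= field_lines[pending.popleft()]
    return standby


fields = [40, 40, 40, 40]
recog = [1, 5, 1, 1]  # the second field is slow to recognize
print(count_standby_ticks(fields, recog, capacity=40))   # 4
print(count_standby_ticks(fields, recog, capacity=120))  # 0
```

With room for only one field, the slow second field stalls the feed for four ticks; with room for three fields, the same variation is absorbed and the feed never stops.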

The same effects are obtained not only for character recognition but also for reading figures in an arbitrary area and marks of an arbitrary format.

ARRANGEMENT OF DOCUMENT READING APPARATUS

Referring now to FIG. 5, a document reading apparatus 100 according to one preferred embodiment will be described. Document reading apparatus 100 mainly includes: a document feed controller 11; a document convey, or feed, unit 12; a recognition control unit 13; a photoelectric transducer 14; an image memory 15; an address controller 16; a format data buffer 17; and a reading controller 18 serving as a host computer.

The function of each unit will now be described. Document feed controller 11 controls the transportation of the document, delivers the recognized character and/or image data to external devices, receives control data supplied from external devices (not shown in detail), and so on. Under control of document feed controller 11, document convey unit 12 actually feeds the document (shown in FIGS. 6A and 6B) and receives and outputs various statuses, or conditions, associated with the document feed. Photoelectric transducer 14 converts the character image on the document into an electric signal, thereby reading the character image. The binarized character image data output from photoelectric transducer 14 is written into image memory 15 at write addresses generated by address controller 16 under control of document feed controller 11.

Format data buffer 17 stores the format data, i.e., information indicating in which field, or region, of image memory 15 each item of read character and image data has been stored. In general, each document has its own predetermined format, so formats may vary with the type of document. On the basis of the information stored in format data buffer 17, recognition control unit 13 determines whether the readout field to be recognized exists in image memory 15, and also in which region of image memory 15 that readout field was stored. In accordance with the format data, recognition control unit 13 reads the character image of the readout field to be recognized from image memory 15. The character image is then subjected to processes such as character segmentation and discrimination. The recognized data is returned to document feed controller 11 one readout field at a time.

The above describes the fundamental processing function of each functional block.

DOCUMENT FORMAT/READING FIELD

FIG. 6A shows character information, for example, KANJI information, written on two documents 19A and 19B that are conveyed in succession. FIG. 6B shows the relationship between the readout field, or region, units and the memory areas in image memory 15 when the information is read out in readout field units f1 to f6 and stored into image memory 15 (FIG. 5) in the feed sequence.

As shown in FIG. 6B, in document reading apparatus 100, the characters and images written on documents 19A and 19B are read and input sequentially, using, for example, one character line as the unit of a readout field, or region. The read character and image data are sequentially written into the respective unit writing areas that are set, or allocated, in image memory 15 for the corresponding readout field units of the document.

The reading of the character/image information from the documents and the writing of the read character/image data into image memory 15 will now be described in detail.

FIG. 7 is a schematic diagram showing the positional relationship between a document 21 in document convey unit 12 shown in FIG. 5 and the principal functional devices involved in the recognition process. Document 21 is conveyed with its upper and left sides serving as the base lines in the X and Y directions. Readout fields (character writing frames) 22 and 23 of document 21 are located by their distances from each of these base lines; this is the so-called "framing".

As previously described, readout fields 22 and 23 are either measured before document 21 is read by document reading apparatus 100 or determined in advance as a predetermined format. The character subset to be recognized in each of readout fields 22 and 23, together with other format information, is given for each field.

READING OPERATION

Although not shown in FIG. 5, a plurality of documents 21 are loaded into a hopper unit and sent out onto the convey path one by one at regular intervals by a take-out mechanism. Each document is conveyed synchronously with a transfer timing signal supplied from document feed controller 11. Under this transport control, document 21 is fed to a reading position 24 of a photoelectric converting sensor.

A sensor 25, arranged a distance Y0 ahead of reading position 24, detects the edge of the incoming document 21. The readout timings of the character and image information written in readout fields 22 and 23 of document 21 are controlled on the basis of this detection signal.

When the document detection signal is obtained, as shown in FIG. 8, the positions of non-readout field 26 and readout field 27 of document 21 relative to reading position 24, taken as the reference position, are known at that moment. Therefore, if document 21 is fed at high speed by a distance (Y0 + Y1) after the document detection signal is obtained and the reading operation of photoelectric transducer 14 is then started, the information in readout field 22 can be read out. This reading of readout field 22 continues while document 21 is conveyed by the distance Y2. Subsequently, after document 21 has been fed a further distance Y3, reading of readout field 23 is started in the same way.
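The start and end of each reading window follow from simple sums of these distances. The sketch below assumes that each readout field is described by its distance YS from the Y base line (so YS equals Y1 for field 22 and Y1 + Y2 + Y3 for field 23) and by its extent in the feed direction; `read_windows` and the sample distances are hypothetical.

```python
def read_windows(y0, fields):
    """Return (start, end) feed distances, measured from the moment sensor 25
    detects the document edge, over which each readout field is read.

    y0:     distance from sensor 25 to reading position 24
    fields: list of (ys, height), where ys is the field's distance from the
            Y base line and height is its extent in the feed direction
    """
    return [(y0 + ys, y0 + ys + height) for ys, height in fields]


# Hypothetical distances: Y0 = 30, Y1 = 20, Y2 = 15, Y3 = 25; field 23 is
# assumed to be 10 long (the text does not give its height).
y0, y1, y2, y3 = 30, 20, 15, 25
print(read_windows(y0, [(y1, y2), (y1 + y2 + y3, 10)]))  # [(50, 65), (90, 100)]
```

Reading of field 22 starts after the document has moved Y0 + Y1 = 50, and reading of field 23 starts after a further Y2 + Y3, at 90.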

It should be noted that once the reading operation has started, the writing into image memory 15 cannot be interrupted. Therefore, as explained below, the document reading apparatus performs the reading and writing operations only after confirming that enough writable memory area to hold the entire information of readout field 22 or 23 has been prepared, i.e., is available.

READOUT TIMING CONTROL

FIG. 9 shows the reading control timing for document 21 described above. Document 21 is fed through non-readout field 26 (FIG. 8) at twice the speed used for readout field 27, which raises the overall document feed speed. FIG. 9A shows the transfer timing signal for readout field 27, and FIG. 9B shows the transfer timing signal for non-readout field 26. As these timing charts show, the period of the transfer pulses for the non-readout field is set to 1/2 of that for the readout field. Accordingly, the period of the drive pulse for photoelectric transducer 14 is varied so that the transducer is driven synchronously with the transportation of document 21.
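The benefit of halving the pulse period over non-readout fields can be estimated with a short calculation. This sketch assumes one transfer pulse per scan line and uses made-up line counts and a unit pulse period; `transport_time` is a hypothetical helper.

```python
def transport_time(units, read_period):
    """Total feed time for a sequence of transfer units.

    units:       list of (rf, lines); rf is True for a readout field and False
                 for a non-readout field, lines is its length in scan lines
    read_period: transfer pulse period while a readout field is scanned
    Non-readout fields use half the pulse period, i.e. double feed speed.
    """
    total = 0.0
    for rf, lines in units:
        period = read_period if rf else read_period / 2
        total += lines * period
    return total


# Hypothetical document: gap, field 22, gap, field 23, trailing gap.
units = [(False, 200), (True, 150), (False, 250), (True, 100), (False, 300)]
print(transport_time(units, 1.0))  # 625.0, versus 1000.0 at reading speed throughout
```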

FIG. 10 shows an example of the transfer control information for document 21. This information, i.e., the format data, is produced by document feed controller 11 (FIG. 5) on the basis of the information on readout fields 22 and 23 of document 21 input from an external apparatus (not shown in detail). The transfer timings and the drive pulses of photoelectric transducer 14 are each controlled in accordance with this format information.

In FIG. 10, (n) denotes the sequence number of the transfer unit. In this example, units (0) to (5) are set because two readout fields 22 and 23 are present on one sheet of document 21.

(EF) is a flag indicating the final transfer unit of document 21. In this case, the flag is set for transfer unit (n=5).

(RF) indicates whether the transfer unit is for readout field 27 or for non-readout field 26. For example, "1" is set for a readout field and "0" for a non-readout field. Namely, flag (RF) designates the transfer mode.

(YS) is a value proportional to the distance of the readout field from the Y base line (FIG. 7), as will be explained in detail hereinafter. The value Ys1 for readout field 22 is the same as the value Y1 mentioned before, and the value Ys2 for readout field 23 is equal to (Y1 + Y2 + Y3).

(YL) indicates each transfer unit from the Y base line. The maximum transfer distance is specified by "FFF". The value YL is designated when the first of the documents is conveyed. The value YL is also designated at the end of transfer of the last readout field on the document. This process is performed to identify all of the coordinates by detecting the Y base line of document 21 irrespective of the size of document 21.

(XS) and (XL) denote, respectively, the distance of the readout field from the X base line (FIG. 7) and the width of the readout field.

(FC) designates the character subset written in each of readout fields 22 and 23 that is to be read. For example, as shown in FIG. 11, FC consists of eight-bit data and indicates the character subset through the information assigned to each bit position.

In the example of FIG. 11, the bit positions are assigned, starting from the leftmost bit, to (hand-written/type), (KANJI characters), (alphabets), (numerals), (KATAKANA characters), ..., and the FC data is "10010000". This data therefore means that the character subset to be read from the readout field is hand-written numerals.
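Decoding FC amounts to testing each bit position against the attribute list of FIG. 11. The sketch assumes FC is handled as an eight-character binary string, leftmost bit first; only the five attributes named in the text are mapped, and the meaning of the remaining three bits is left open. `decode_fc` and `FC_BITS` are hypothetical names.

```python
# Attributes for the FC bit positions, leftmost bit first, as listed for
# FIG. 11. The last three bit positions are not spelled out in the text.
FC_BITS = ["hand-written", "KANJI", "alphabets", "numerals", "KATAKANA"]


def decode_fc(fc: str) -> list[str]:
    """Return the named attributes whose bits are set in an 8-bit FC string."""
    if len(fc) != 8 or set(fc) - {"0", "1"}:
        raise ValueError("FC must be eight binary digits")
    # zip stops after the five named positions; unnamed bits are ignored.
    return [name for bit, name in zip(fc, FC_BITS) if bit == "1"]


print(decode_fc("10010000"))  # ['hand-written', 'numerals']
```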

In FIG. 10, the hatched portions denote data that is meaningless as format information. In an actual document reading apparatus, therefore, the data format can be identified by flags EF and RF, and the format information may be handled with this meaningless data removed.
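One row of the FIG. 10 transfer control information can be modeled as a small record. This is an illustrative sketch, not the patent's data layout: the field widths (8-bit FC, 12-bit YL with "FFF" as the maximum) are inferred from the text, YL is interpreted here as a transfer length, and `TransferUnit` and `check_sequence` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferUnit:
    """One transfer unit of the format data, following the FIG. 10 columns."""
    n: int          # sequence number of the transfer unit
    ef: bool        # EF: True only for the final transfer unit of the document
    rf: bool        # RF: True = readout field, False = non-readout field
    ys: int = 0     # YS: distance of the readout field from the Y base line
    yl: int = 0     # YL: interpreted as transfer length; 0xFFF is the maximum
    xs: int = 0     # XS: distance of the readout field from the X base line
    xl: int = 0     # XL: width of the readout field
    fc: int = 0     # FC: eight-bit character-subset code

    def __post_init__(self):
        if not 0 <= self.fc <= 0xFF:
            raise ValueError("FC is eight-bit data")
        if not 0 <= self.yl <= 0xFFF:
            raise ValueError("YL must not exceed 0xFFF")


def check_sequence(units):
    """Check that n runs 0, 1, 2, ... and that EF is set on the last unit only."""
    for i, unit in enumerate(units):
        if unit.n != i:
            raise ValueError(f"unit {i} has sequence number {unit.n}")
        if unit.ef != (i == len(units) - 1):
            raise ValueError("EF must be set on the final transfer unit only")


# Six units (0) to (5), two of them readout fields; the layout is hypothetical.
units = [TransferUnit(n=i, ef=(i == 5), rf=(i in (1, 3))) for i in range(6)]
check_sequence(units)  # passes: EF is set only on unit n=5
```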

CIRCUIT ARRANGEMENT OF DOCUMENT CONTROLLER

Referring to FIG. 12, a description will now be made of a circuit arrangement of document feed controller 11 shown in FIG. 5.

A program controller 31 receives the document format data (including data indicating the readout fields, data representing the character subset to be recognized, and the like) from reading controller (host computer) 18 and operates in accordance with a control program stored in program controller 31. Program controller 31 sets information determining the reading or transfer mode into a first flip-flop (1st F/F) 32, and sets information instructing whether to stop document feeding ("1" to continue transporting the document and "0" to stop it) into a second flip-flop (2nd F/F) 33.

A timing generator (TG) 34 is provided to generate a drive clock pulse to drive photoelectric transducer 14 (FIG. 5). A counter (CTR) 35 receives the drive clock pulse and generates a transfer timing signal of document 21. Counter 35 is made operative or inoperative on the basis of the initialization data which is selectively input through a multiplexer (MPX) 36.

For example, when the line sensor (not shown in detail) of photoelectric transducer 14 has 2048 bits (pixels), counter 35 may be a 12-bit counter. During reading and transport, the counter is initialized with the constant data "7FF" (given as a complement value) and then counts. When a non-readout field is scanned, as previously described with reference to FIG. 9B, the counter is initialized with the constant data "BFF" and counts so that document 21 is transferred at double the scanning speed.
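The two preset values can be checked with a short simulation. It rests on two assumptions not stated in the text: that "7FF" and "BFF" are the 12-bit ones' complements of the desired counts (2048 and 1024), and that the carry at the terminal count FFF reloads the preset immediately. Under these assumptions the carry, and hence the transfer pulse, occurs every 2048 clocks when reading and every 1024 clocks over a non-readout field, which is the 2:1 speed ratio of FIG. 9. The helper names are hypothetical.

```python
COUNTER_MAX = 0xFFF  # terminal count of a 12-bit counter


def preset_for(count: int) -> int:
    """Preset that reaches the terminal count after `count` clocks: the ones'
    complement of `count` within 12 bits (assumed interpretation)."""
    return ~count & COUNTER_MAX


def clocks_to_carry(preset: int) -> int:
    """Step the counter from `preset` and count clocks until it reaches FFF,
    assuming the carry at FFF immediately reloads the preset."""
    value, clocks = preset, 0
    while value != COUNTER_MAX:
        value = (value + 1) & COUNTER_MAX
        clocks += 1
    return clocks


print(hex(preset_for(2048)), clocks_to_carry(0x7FF))  # 0x7ff 2048
print(hex(preset_for(1024)), clocks_to_carry(0xBFF))  # 0xbff 1024
```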

The initial data is loaded into counter 35 by the carry output of counter 35, and multiplexer 36 selects which initial data is loaded on the basis of the output of first F/F 32.

Through the operation of counter 35, the synchronization signals for photoelectric transducer 14 and document convey unit 12 are produced.

The output of second F/F 33, which is used to interrupt the transportation of document 21, is input to an AND circuit 37 and ANDed with the carry output of counter 35. AND circuit 37 thus produces the transfer timing signal for document convey unit 12.
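The gating by AND circuit 37 can be expressed clock by clock. In this sketch, each list element stands for one carry of counter 35 and the corresponding output of second F/F 33; while the flip-flop holds "0" the counter keeps running, but no transfer pulses reach document convey unit 12, so the document waits in standby. The function name is hypothetical.

```python
def transfer_pulses(carries, continue_feed):
    """Gate the carry pulses of counter 35 with the output of second F/F 33.

    carries:       carry outputs of counter 35 (True = carry pulse)
    continue_feed: F/F 33 output at the same instants (True = "1", continue
                   transport; False = "0", stop transport)
    Returns the transfer timing signal sent to document convey unit 12.
    """
    return [c and f for c, f in zip(carries, continue_feed)]


carries = [True, False, True, False, True, False]
ff33 = [True, True, False, False, True, True]  # feed stopped in the middle
print(transfer_pulses(carries, ff33))  # [True, False, False, False, True, False]
```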

Other AND circuits 38 and 39 operate in response to the outputs of first and second flip-flops 32 and 33, thereby obtaining address control signals (XINC, YINC) to control the write address of the image data into image memory 15. The address control signals are supplied to address controller 16.

The Y address value controlled by these signals must be initialized to zero before a series of documents is read continuously. For this purpose, program controller 31 supplies an output signal YCLR to address controller 16.

OPERATION OF DOCUMENT CONTROLLER

The transportation of document 21, the synchronization of the image sensor, and the control of the image memory are all performed by document feed controller 11 arranged as described above. Thus, the document image data in each readout field of document 21 is written continuously into image memory 15, one readout field per unit, as shown in FIG. 6B.

According to the preferred embodiment, because the total capacity (i.e., the total memory region) of image memory 15 is relatively small, image memory 15 must be operated in a scrolling manner in order to write the document images of the readout fields into it sequentially. That is, recognition control unit 13 must be started in turn to recognize the character and image data written in the memory regions of image memory 15 before the succeeding characters and images are read into the image memory. Further, the memory regions whose character and image data have undergone recognition must be returned to the writable areas, or memory regions, of memory 15. In other words, the writable areas are those whose character and image data have already been read and recognized. Therefore, information indicating the existence of writable storage areas in the image memory, i.e., areas holding no pending data, is essential before the readout fields of a document are read and scanned, because the reading of the character and image information written in a readout field cannot be interrupted.

The document reading apparatus is designed so that the readout fields written into image memory 15 are allocated sequentially starting from Y address "0". Since readout-field information is written sequentially and repeatedly, the same readout field on different documents is stored at different addresses in image memory 15. Therefore, to initialize recognition