BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method and a document reading apparatus
capable of reading character and image information recorded on documents
at a higher efficiency for image processing and recognition purposes.
2. Description of Prior Art
In recent years, a number of optical character readers (OCR) have been used
as means for inputting information into electronic computers. Character
subsets to be read by such a kind of OCR include not only the printed
alphanumeric subset but also hand-written alphanumeric characters,
hand-written KATAKANA characters, typed KANJI characters, and hand-written
KANJI characters. With the development of recognition techniques, reading
such a variety of character subsets has become common practice.
The prior art optical character reading device is disclosed, for instance,
in Japanese Patent Publication No. 60-20785 (1985).
As fundamentally shown in FIG. 1, in such a kind of conventional OCR, a
document 1 is scanned under control of a document controller 2 and the
character and image information written on document 1 is read out by a
photoelectric transducer 3 and stored into an image buffer 4. The
character and image information stored in image buffer 4 is read out by a
recognition unit 5 and subjected to character/image recognition, including
segmentation, feature extraction, and the like of the characters and
images. A reading controller 6 is also provided to control units 2
to 5.
Image buffer 4 plays a significant role in efficiently coupling the
document readout scanning system with the recognition processing system.
As shown in FIG. 2, document 1 is conveyed by document controller 2 to a
readout scanning position 3b of photoelectric transducer 3 (for example, a
line sensor 3a is employed). The transducer photoelectrically converts the
character and image information on document 1 (optically scanned through a
lens 3c) on a line by line basis, at a predetermined resolution, in the
direction perpendicular to the conveying direction.
In this manner, the information (character strings) 1a and 1b written on
document 1 are read out. Image buffer 4 is constituted by, for example,
two RAMs (random access memories) 4a and 4b, connected in parallel. The
information read out of document 1 in this manner is stored into
respective RAMs 4a and 4b on the basis of a unit of, for instance, the
character string of one line. After completion of the storage of character
strings 1a and 1b each corresponding to one line, image buffer 4
communicates with recognition unit 5, thereby subjecting the information
of character strings 1a and 1b to the recognition processes, including
segmentation, feature extraction, discrimination and identification,
and the like of the characters.
From a viewpoint of processing efficiency, it is very disadvantageous that
during the time interval when image buffer 4 communicates with recognition
unit 5, the writing operation of the character and image information into
image buffer 4 is interrupted. Namely, when scanning of document 1 is
interrupted, the scanned information data is potentially damaged, or lost.
Therefore, as mentioned above, RAMs 4a and 4b constituting image buffer 4
are operated in a parallel structure, thereby allowing the writing
operation by the readout of the document and the reading operation for the
recognition process to be alternately executed in parallel.
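The alternating use of RAMs 4a and 4b described above can be illustrated by
a minimal Python sketch. The class and method names are hypothetical and
are chosen only for illustration; each RAM is assumed to hold exactly one
line, and the sketch is not taken from the cited prior art.

```python
class PingPongBuffer:
    """Toy model of two line RAMs (4a, 4b) that are used alternately."""

    def __init__(self):
        self.rams = [None, None]   # None means the RAM is free
        self.write_idx = 0         # RAM that receives the next scanned line
        self.read_idx = 0          # RAM that the recognition unit reads next

    def can_write(self):
        return self.rams[self.write_idx] is None

    def write_line(self, line):
        """Store one scanned line; return False if scanning must stall."""
        if not self.can_write():
            return False
        self.rams[self.write_idx] = line
        self.write_idx ^= 1        # switch to the other RAM
        return True

    def read_line(self):
        """Hand the oldest stored line to recognition and free its RAM."""
        line = self.rams[self.read_idx]
        if line is None:
            return None
        self.rams[self.read_idx] = None
        self.read_idx ^= 1
        return line


buf = PingPongBuffer()
# Two lines fit; a third cannot be stored until recognition frees a RAM.
results = [buf.write_line("1a"), buf.write_line("1b"), buf.write_line("1c")]
first = buf.read_line()
retry = buf.write_line("1c")
```

In this model the third write is refused while both RAMs are occupied,
which corresponds to the situation in which the scan of document 1 would
have to be interrupted.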
On the other hand, requirements to improve the reading performance of such
kinds of OCR are ever-increasing. For example, these requirements include
not only an increase in character categories to be read out, but also an
increase in the degree of freedom in writing hand-written characters
(namely, a degree of freedom in modification of character styles),
liberalization of the document formats, realization of a high data
processing speed and the like. However, the conventional OCR as described
above has the following problems.
First, it is apparent that the time required to recognize the
character and image information stored in image buffer 4 varies
considerably, depending on the character categories. That is to say,
printed alphanumeric characters and printed KATAKANA characters can be
recognized relatively simply and at a higher speed; conversely, in the
case of hand-written KANJI characters, a long time is required for the
recognition process, since the character pattern structure is complicated
and there are many character subsets and many similar characters.
This can be seen from the example document shown in FIG. 3. In this
case, document 1 contains character strings having different character
subsets, namely, KATAKANA characters, HIRAGANA characters, KANJI
characters, numerals, Roman characters, and a map. These characters are
sequentially arranged in accordance with the scanning order perpendicular
to the scanning direction and are scanned at a constant speed. As a
result, the recognition time of the character and image information is
necessarily prolonged as compared with the reading time. In this case,
even if RAMs 4a and 4b of image buffer 4 shown in FIGS. 1 and 2 are
parallel-connected, the readout operation of document 1 must be
temporarily interrupted. This is because no further readout data can be
stored in both RAMs 4a and 4b, resulting in a lower processing efficiency.
Moreover, as shown in FIG. 3, if a step 1c exists between the lines of the
character strings (KATAKANA and HIRAGANA characters), these characters
cannot be alternately written into two RAMs 4a and 4b in such a manner
that the character string of each line is separately written as a unit. In
such a case, there is the further disadvantage that simultaneous write
control is required for both image buffer RAMs 4a and 4b. Moreover, in
order to write the character information into image buffer RAMs 4a and 4b
simultaneously, the scan of document 1 needs to be interrupted until the
recognition process for the character and image information stored in RAMs
4a and 4b is completed.
Secondly, if the character strings are formatted in the same direction as
the document feed direction of document 1 as shown in FIG. 4, the
foregoing readout control cannot be applied thereto. In general, the
buffer memory capacity of image buffer 4 is designed such that the
information of the character string written in one line can be
sufficiently stored with a desired accuracy necessary for the recognition
process. However, when document 1 is fed with a skew in the document feed
direction, the readout area of the character and image information of one
line is out of the image buffer size, so that all information of the
character string of one line cannot be stored.
To prevent such a problem, according to the conventional OCR, the amount of
skew is detected in advance from the edge portion of document 1 being
conveyed. If the skew amount exceeds a predetermined value, the
transportation of this document is regarded as an error and thus an
instruction is given to the operator to re-enter the document into the
OCR. However, the execution of such measures impedes the processing
efficiency when continuously reading a large quantity of documents.
Thirdly, as a method of continuously processing a plurality of documents 1,
the document convey paths are switched in accordance with the result of
the recognition, and documents 1 are sorted and collected. In general, to
switch over the document convey paths, documents 1 are continuously
conveyed with a predetermined time interval between the successive
documents to be continuously fed. This document feed time interval is not
negligible, compared with the length of document 1.
In the prior art OCR, the period of time required to convey document 1 by
the distance of the sum of the length of document 1 and the document feed
interval may be set as the processing unit time for a single document
(namely, the time longer than the unit time necessary to process only one
document). In spite of such a compromise, in the conventional OCR, image
buffer 4 is controlled as mentioned above. Therefore, the time which can
be allocated for the recognition process must be defined by the time
necessary to scan document 1. Thus, the defined processing unit time
cannot be effectively used, resulting in a long idle time.
As described above, the conventional OCR has various problems that hinder
improvement of the document reading and recognition efficiencies.
To solve such drawbacks, one solution has been proposed in which, instead
of performing the line-to-line recognition control by line buffer RAMs 4a
and 4b, a page buffer memory having a capacity sufficient to cover the
entire document size is employed.
However, when all of the information contained in the document is written
into such a page buffer memory, there is another problem: not only is a
slow reading scan necessarily required, but very complicated processes
also need to be executed to segment the desired character strings
from the information. Accordingly, a high-speed process cannot be
expected.
The above-described problems of the conventional OCR will now be summarized
as follows.
First, it is obvious that the image buffer memory in this kind of OCR has
a significant function as a buffer for matching the scanning unit (2, 3)
with the recognition unit (5).
In the OCR employing two line buffer memories alternately operable in
parallel, the time required for the information recognition is greatly
affected by the degree of freedom in the writing operation, as well as by
the document format and skew.
Conversely, the above problems may be solved to some extent by use of the
image buffer memory having capacity sufficient to cover a document of the
maximum size. However, another drawback then occurs. All of the
unnecessary information written on the document must be scanned and stored
while at the same time, the necessary information needs to be segmented
from the entire information. As a result, the whole processing time is
prolonged and a high-speed reading process cannot be expected.
Therefore, there is a need for an optical character reader with a
relatively small capacity buffer memory that can execute, with high
performance and under a constant document feed, for example, the
recognition of hand-written KANJI characters, which has so far been
achievable only by the conventional high-performance OCR.
The present invention is made in consideration of such circumstances and an
object of the invention is to provide an apparatus for reading characters
and images in which the degree of freedom in design of the document format
can be improved, the fluctuation in recognition processing time for
various kinds of character categories can be absorbed, and the document
can be efficiently processed at a high speed.
More specifically, another object of the invention is to provide a document
reading apparatus by which a plurality of documents can be continuously
fed during the reading process at a substantially constant feeding speed,
even if these documents contain hand-written characters and/or images that
take much time for recognition.
Still another object of the invention is to provide a document reading
apparatus which employs simple recognition arrangements, even if a
plurality of documents are substantially constantly fed in the reading
process, because the image memory operable under the scroll control can
function as a buffer or damper memory.
SUMMARY OF THE INVENTION
The above and other objects of the invention can be realized by providing a
document reading apparatus comprising:
means for transporting a document subdivided into a plurality of
information fields into which character and/or image information has been
recorded in accordance with predetermined format data for the document;
reading means for reading the character and/or image information from the
predetermined information fields of the document to derive character/image
data while the document is transported by the transporting means based
upon the format data of the document;
memory means including a plurality of write regions corresponding to the
information fields of the document, for writing the character/image data
into the predetermined write regions, based upon the predetermined format
data, and for reading the character/image data therefrom;
means for recognizing the character/image data read out from the write
regions of the memory means; and,
system control means for previously storing the format data of the document
to be read, and for inspecting present writable regions within the image
memory means from which the character/image data stored has been read out
so as to permit the document to be intermittently transported by the
document transporting means prior to the reading of the succeeding
information field of the document.
Furthermore, these objects of the invention can be accomplished by
providing a method of reading a document comprising steps of:
reading character and/or image information from predetermined information
fields of the document in accordance with predetermined format data for
the document to derive electronic character/image data;
storing the electronic character/image data into predetermined write
regions of memory means corresponding to the information fields of the
document;
reading out the electronic character/image data from the predetermined
write regions of the memory means;
recognizing the electronic character/image data read out from the
predetermined write regions of the memory means; and
inspecting present writable regions within the memory means from which the
electronic character/image data stored has been read out, thereby allowing
the document to be intermittently read prior to the reading of the
succeeding information field of the document.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of these and other objects of the present
invention, reference is made to the following detailed description of the
invention to be read in conjunction with the following drawings, wherein:
FIG. 1 is a schematic block diagram of a conventional document reading
apparatus;
FIG. 2 is an illustration for explaining a relationship between the reading
area of the document and the image memory of the reading apparatus shown
in FIG. 1;
FIGS. 3 and 4 are document formats;
FIG. 5 is a schematic block diagram of a document reading apparatus
according to one preferred embodiment;
FIG. 6A shows two sheets of documents to be read;
FIG. 6B illustrates memory regions of the image memory;
FIG. 7 is an illustration for explaining a relationship between the reading
fields of the document;
FIG. 8 illustrates the document convey and reading operations;
FIGS. 9A and 9B show transfer pulses and sensor drive pulses;
FIG. 10 illustrates format data for the reading field of the document;
FIG. 11 illustrates format data of the character;
FIG. 12 is a schematic block diagram of the document controller shown in
FIG. 5;
FIG. 13 illustrates the data transfer conditions between the document
controller and the recognition control unit shown in FIG. 5;
FIG. 14 illustrates the format data of the reading field of the document;
FIG. 15 is a schematic block diagram of the address controller shown in
FIG. 5;
FIG. 16 shows control modes for the image memory shown in FIG. 5;
FIG. 17 shows a flowchart of the overall operations of the document reading
apparatus shown in FIG. 5; and
FIG. 18 shows a flowchart of the interrupt process employed in the overall
operation process shown in FIG. 17.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
BASIC IDEA
A basic idea of the document reading apparatus according to the invention
will now be summarized.
The present invention is directed to a character and image reading
apparatus for reading characters and images written on a document by
scanning the document, storing the read character and image information
into an image memory, and thereby performing the recognition and image
processes, wherein the image memory having a memory capacity in excess of
the total scanning region of the document is employed and the image memory
operation is controlled in a scroll manner as will be explained
hereinbelow.
This scroll control for the image memory may be understood by the following
three functions:
(1) The document is selectively scanned line by line in accordance with
given format information. The character and image information is read out
of the reading area on the document which is designated by the format
information. Then, this character and image data is written into the
writing unit area in the image memory which is preset in correspondence to
the selective scanning of the document. Area information regarding the
reading area from which the information was obtained, and attribute
information concerning the characters and images, are added to the
character and image data stored in the writing unit area, and thereafter
this data is subjected to the recognition and image processes.
(2) The capacity (i.e., the number of the unit regions) of the writing unit
area into which the character and image data has been written is
subtracted from the whole capacity (i.e., the total number of the unit
regions) of the image memory, thereby obtaining the capacity of the
writable area (i.e., the number of the writable unit regions) in the image
memory.
The definition of the "unit regions" in the specification should be
understood as follows. In general, since a document contains various-sized
readout fields, the corresponding unit regions of the image memory have
different sizes, or capacities. It is however apparent that if a document
contains the same-sized readout fields only, the corresponding unit
regions of the image memory have the same sizes, or capacities.
On the other hand, the number of the writable unit regions is continuously
updated while the recognition process is performed. That is to say, the
capacity of the writing unit area from which the character and image data
had already been read and recognized is added to the capacity of the
writable region.
(3) When the succeeding readout region of the document is scanned, the
capacity of the reading area on the document, designated by the above
format information is compared with that of the updated, or latest
writable area in the image memory prior to scanning this region, thereby
selecting either the document scanning, or the temporary standby mode.
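The three functions above can be sketched in Python as a simple
bookkeeping model. This is only an illustrative sketch under stated
assumptions: capacities are counted in abstract integer units, fields are
recognized in the order in which they were written, and the physical
address layout of the image memory is ignored. The class and method names
are hypothetical.

```python
from collections import deque


class ScrollMemory:
    """Toy model of writable-capacity bookkeeping for the image memory."""

    def __init__(self, total_units):
        self.total_units = total_units
        self.writable_units = total_units
        self.pending = deque()     # (field_id, size) awaiting recognition

    def can_scan(self, field_size):
        # Function (3): compare the next field with the latest writable area.
        return field_size <= self.writable_units

    def write_field(self, field_id, field_size):
        # Functions (1) and (2): allocate a writing unit area and subtract
        # its capacity from the writable capacity.
        if not self.can_scan(field_size):
            return "standby"
        self.writable_units -= field_size
        self.pending.append((field_id, field_size))
        return "scan"

    def recognize_next(self):
        # Recognition frees the oldest area; its capacity is added back.
        field_id, field_size = self.pending.popleft()
        self.writable_units += field_size
        return field_id


mem = ScrollMemory(total_units=10)
log = [mem.write_field("f1", 4), mem.write_field("f2", 5), mem.write_field("f3", 3)]
done = mem.recognize_next()          # frees the area used by "f1"
log.append(mem.write_field("f3", 3)) # the retry now fits
```

In this model the third field is held in the standby mode until the
recognition of the first field returns its capacity to the writable area.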
The features of the invention will be briefly summarized.
First, once the scanning operation of the reading region of the document
commences, the scanning cannot be interrupted. Otherwise, problems may
occur such as distortion of the image and mechanical damage of the
document. Therefore, according to the invention, prior to commencement of
reading the next readout field (region) on the document, the residual
amount of the writable area in the image memory is checked. Depending upon
the residual amount checked, the interruption of the document feed, or the
document scanning is selectively performed.
Since the recognition operation of the read character and image data is
performed simultaneously with the readout operation, the
memory areas in which the character and image data has been stored so far
become empty. Thus, these memory areas can be used as new writable memory
areas. Namely, since the memory area can always be updated, the same
effect as that achieved with a memory of a large capacity can be obtained
even if a buffer memory of a relatively small capacity is used. Such a
utilization of the memory areas is defined as a "scroll operation" of the
memory in the invention.
Briefly stated, the document reading apparatus according to the invention
is characterized in that at least one readout field or region exists in a
document and the character and image information to be read is written in
this field. Prior to reading the information in this field, the current
writable capacity of the buffer memory is checked to determine whether the
readout operation is executed, or brought into a standby condition. That
is, a check is made beforehand as to whether the character and image
information can be completely stored in the latest writable memory regions.
According to the invention, in accordance with the reading areas or fields
of the document which are designated by the format information or data,
the writing unit areas to store the character and image data are allocated
in the image memory and the respective character and image data is written
into this allocated writing unit area. Therefore, it is possible to
eliminate the limitations of the reading field at a fixed pitch which has
been specified in the conventional apparatus. Thus, a document format
having a high degree of freedom can be utilized.
In addition, the writable unit areas can be set in the image memory while
continuously monitoring the writable area in the image memory. Therefore,
a continuous damper function can be established for the scan of the
document and also for the recognition processes of the character and image
data. Even if the time necessary for the recognition processes of the
character and image data greatly varies, this variation can be effectively
absorbed and the document is efficiently scanned. In other words, a
plurality of documents can be continuously fed during the document reading
process at a substantially constant speed, even if these documents contain
hand-written characters and/or images that take much recognition time,
because the image memory can be operated under the scroll control and as a
buffer or damper memory. According to the invention, since the document
can be smoothly transferred during the reading process, the document
convey unit can be of a simpler design and a low cost.
Not only the process to recognize the characters but also the function to
read out the figures in an arbitrary area and the mark of an arbitrary
format can be attained similarly to the foregoing effects.
ARRANGEMENT OF DOCUMENT READING APPARATUS
Referring now to FIG. 5, a document reading apparatus 100 according to one
preferred embodiment will be described. Document reading apparatus 100
mainly includes: a document feed controller 11, a document convey or feed
unit 12; a recognition control unit 13; a photoelectric transducer 14; an
image memory 15; an address controller 16; a format data buffer 17; and a
reading controller 18 as a host computer.
The function of each unit will now be described. Document feed controller
11 controls the transportation of the document, delivers the recognized
character and/or image data to the outside, receives the control data
supplied from the outside devices (not shown in detail), and so on.
Document convey unit 12 practically controls the feeding of the document
(shown in FIGS. 6A and 6B) and receives and outputs various statuses, or
conditions in association with the document feed under control of document
feed controller 11. Photoelectric transducer 14 photoelectrically converts
the character image on the document into the electric signal, thereby
reading and receiving the character image. The character image data which
has been binary-digitized and derived from photoelectric transducer 14 is
written into image memory 15. The character image data is written into
image memory 15 under write-address control by address controller
16, which operates under the control of document feed controller 11.
Format data buffer 17 stores the format data, i.e., the information
indicating which read character and image data has been stored in which
field, or region, of image memory 15. In general, a document has its
own predetermined format, and thus, the formats may be varied in
accordance with the sort of document. Recognition control unit 13
determines whether the readout field to be recognized exists in image
memory 15 or not on the basis of the above-defined information stored in
format data buffer 17. In addition, control unit 13 also determines into
which region in image memory 15 the readout field was stored. Recognition
control unit 13 in accordance with the format data reads the character
image of the readout field to be recognized from image memory 15. The
character image is then subjected to the processes for segmentation,
discrimination, and the like of the characters. The recognized data is
returned to document feed controller 11 on a reading field unit basis.
The above description is the fundamental processing function of each
functional block.
DOCUMENT FORMAT/READING FIELD
FIG. 6A shows the character information, for example, KANJI information
written on two sheets of documents 19A and 19B which are continuously
conveyed. FIG. 6B shows the relationship between the unit for the reading
field, or region and memory area in image memory 15 when the information
is read out on a readout field unit basis of f.sub.1 to f.sub.6 and stored
into image memory 15 (FIG. 5) in accordance with the feed sequence.
As shown in FIG. 6B, in document reading apparatus 100, the characters and
images written on documents 19A and 19B are sequentially read out and
input in a manner such that, for example, the character line is read out
and used as a unit of the readout field, or region. The read character and
image data are sequentially written into the respective unit writing areas
which are set, or allocated in the image memory 15 corresponding to the
reading field unit of the document.
The reading operation of the character/image information from the documents
and the writing operation of the read character/image data into image
memory 15 will now be described in detail hereinbelow.
FIG. 7 is a schematic diagram to clarify the positional relationship
between a document 21 in document convey unit 12 shown in FIG. 5 and the
principal functional device concerned with the recognition process.
Document 21 is conveyed such that the upper and left sides are used as the
base lines in the X and Y directions. Readout fields (character writing
frames) 22 and 23 in document 21 are measured as the distances from each
of the base lines, thereby performing a so-called "framing".
As previously described, readout fields 22 and 23 are measured before
document 21 is read by document reading apparatus 100 or determined as a
predetermined format in advance. The character subsets to be recognized,
which are written in readout fields 22 and 23 and other format information
are respectively given for each of readout fields 22 and 23.
READING OPERATION
Although not shown in FIG. 5, a plurality of documents 21 are input into
the hopper unit and sent out to the convey path one by one with regular
intervals by the convey take-out mechanism. In this case, the document is
conveyed synchronously with a transfer timing signal which is given from
document feed controller 11. Document 21 is fed to a reading position 24
of a photoelectric converting sensor under such document transportation
control.
A sensor 25 arranged in front of reading position 24 at a distance Y.sub.0
detects the edge of document 21 to be fed. The readout timings of the
character and image information written in readout fields 22 and 23 of
document 21 are controlled on the basis of the detection signal.
When a document detection signal is derived as shown in FIG. 8, the
positional relationships of a non-readout field 26 and a readout field 27
of document 21 relative to reading position 24, used as a reference
position, are known at that time. Therefore, if document 21 is fed at a
high speed by a distance of (Y.sub.0 +Y.sub.1) after the document
detection signal was obtained and the reading operation by photoelectric
transducer 14 is then started, the information in readout field 22 can be
read out. This reading operation from readout field 22 is executed over
the period of time during which document 21 is conveyed by the distance
(Y.sub.2). Subsequently, after document 21 has been further fed by the
distance (Y.sub.3), the reading operation from readout field 23 is
similarly started.
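The feed distances at which each reading operation starts and ends can be
worked out as in the following Python sketch. The function name and all
numeric values are hypothetical and serve only as an illustration; in
particular, the length of readout field 23 is not given a symbol in the
text and is represented here by an assumed constant.

```python
def read_windows(y0, segments):
    """Return (start, end) feed distances, measured from the moment the
    document edge is detected, for each readout field.

    segments is a list of (gap_before_field, field_length) pairs.
    """
    windows = []
    position = y0                  # distance from sensor 25 to reading position 24
    for gap, length in segments:
        start = position + gap     # feed distance at which reading starts
        end = start + length       # feed distance at which reading ends
        windows.append((start, end))
        position = end
    return windows


# Illustrative values only (arbitrary units); not taken from the specification.
Y0, Y1, Y2, Y3, FIELD_23_LENGTH = 10, 20, 15, 5, 15
windows = read_windows(Y0, [(Y1, Y2), (Y3, FIELD_23_LENGTH)])
```

With these assumed values, reading of readout field 22 starts after a feed
of Y.sub.0 +Y.sub.1 and lasts for Y.sub.2, and reading of readout field 23
starts after a further feed of Y.sub.3.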
It should be noted here that when the reading operation has once been
started, the writing operation into image memory 15 cannot be interrupted.
Therefore, as will be explained hereinafter, the document reading
apparatus is designed in such a manner that the reading and writing
operations are performed after confirming that enough memory area, i.e.,
writable memory region into which the information of readout fields 22 and
23 can be fully written has previously been prepared, or is available.
READOUT TIMING CONTROL
FIGS. 9A and 9B show the reading control timing of document 21 as
mentioned above. The document feed speed of non-readout field 26 shown in
FIG. 8 is set to be twice as high as that of readout field 27, thereby
realizing the high document feed speed. FIG. 9A shows a transfer timing
signal for readout field 27. FIG. 9B shows a transfer timing signal for
non-readout field 26. As shown in these timing charts, the period of the
transfer pulses for non-readout field 26 is set to 1/2 of that for readout
field 27. Thus, the period of a drive pulse for photoelectric transducer
14 is varied and this transducer is driven synchronously with the
transportation of document 21.
FIG. 10 shows an example of the transfer control information regarding the
example of document 21. This information, i.e., the format data is
produced by document feed controller 11 (FIG. 5) on the basis of the
information of readout fields 22 and 23 of document 21 which was input
from an external apparatus (not shown in detail). In accordance with the
format information, the transfer timings and the drive pulse of
photoelectric transducer 14 are controlled, respectively.
In FIG. 10, (n) denotes a number representative of the sequence of the
transfer unit. In this example, (0) to (5) are preset because two readout
fields 22 and 23 are present on a sheet of document 21.
(EF) represents a flag indicative of the final transfer unit of document
21. In this case, the flag is set to the transfer unit of (n=5).
(RF) denotes whether the transfer unit is for readout field 27 or for
non-readout field 26. For example, "1" is set for the readout field and
"0" is set for the non-readout field. Namely, flag (RF) designates the
transfer mode.
(YS) is a value proportional to the distance from the base line (FIG. 7) in
the Y direction regarding the readout field, as will be explained in
detail hereinafter. The value, namely, the value of Y.sub.s1 in readout
field 22 is the same as the value of Y.sub.1 mentioned before. The value
of Y.sub.s2 in readout field 23 is equal to (Y.sub.1 +Y.sub.2 +Y.sub.3).
(YL) indicates each transfer unit from the Y base line. The maximum
transfer distance is specified by "FFF". The value YL is designated when
the first of the documents is conveyed. The value YL is also designated at
the end of transfer of the last readout field on the document. This
process is performed to identify all of the coordinates by detecting the Y
base line of document 21 irrespective of the size of document 21.
(XS) and (XL) denote, respectively, the distance from the X base line
(FIG. 7) to the readout field in the document and the width of the readout
field.
(FC) designates the subset of the character written in each of readout
fields 22 and 23 to be read out, respectively. For example, as shown in
FIG. 11, the FC consists of eight-bit data and indicates the character
subset on the basis of the information allocated to each bit position.
In the example of FIG. 11, the bit information is allocated in a manner
such that (hand-written/type), (KANJI characters), (alphabets),
(numerals), (KATAKANA characters), . . . , and the FC data is "10010000".
Therefore, this data means that the character subset to be read out of the
readout field is hand-written numerals.
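The decoding of the FC data can be sketched in Python as follows. This is
a minimal illustration, assuming that the bit positions are read from left
to right in the order listed above and that a leading "1" denotes
hand-written characters, as the example "10010000" suggests. The three
positions that are not named in the text are left undecoded, and the
function name is hypothetical.

```python
# Bit positions after the first one, left to right, as listed for FIG. 11.
# The remaining three positions are not named in the text.
FC_SUBSETS = ["KANJI", "alphabets", "numerals", "KATAKANA"]


def decode_fc(fc_bits):
    """Decode an 8-character bit string such as "10010000"."""
    if len(fc_bits) != 8 or set(fc_bits) - {"0", "1"}:
        raise ValueError("FC must be an 8-bit string of 0s and 1s")
    # First bit: hand-written ("1") or typed ("0"), per the example above.
    style = "hand-written" if fc_bits[0] == "1" else "typed"
    # Following bits: one flag per character subset.
    subsets = [name for name, bit in zip(FC_SUBSETS, fc_bits[1:]) if bit == "1"]
    return style, subsets


style, subsets = decode_fc("10010000")
```

For the FC data "10010000" this sketch yields the hand-written style with
the numerals subset, in agreement with the example above.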
In FIG. 10, the hatched portions denote data that is not used as format
information. Therefore, in the actual document reading apparatus, the data
format is specified using flags EF and RF, and the format information may
also be handled in a form from which the unused data has been removed.
CIRCUIT ARRANGEMENT OF DOCUMENT CONTROLLER
Referring to FIG. 12, a description will now be made of a circuit
arrangement of document feed controller 11 shown in FIG. 5.
A program controller 31 receives the document format data (containing the
data indicative of the readout field, data representative of the character
subset to be recognized, and the like) from reading controller (host
computer) 18 and operates in accordance with the control program stored in
program controller 31. Program controller 31 sets the information that
determines the reading or transfer mode into a first flip-flop (1st F/F)
32, and sets the information that instructs whether document feeding is to
be stopped ("1" to continue the document transportation and "0" to stop
the document transportation) into a second flip-flop (2nd F/F) 33.
A timing generator (TG) 34 is provided to generate a drive clock pulse to
drive photoelectric transducer 14 (FIG. 5). A counter (CTR) 35 receives
the drive clock pulse and generates a transfer timing signal of document
21. Counter 35 is made operative or inoperative on the basis of the
initialization data which is selectively input through a multiplexer (MPX)
36.
For example, when the line sensor (not shown in detail) of photoelectric
transducer 14 consists of 2048 bits (pixels), counter 35 may be realized by
a 12-bit (2048 notation) counter. In the case of reading and transporting,
this counter is initialized with the two's-complement constant "7FF" and
then operates. When the non-readout field is scanned, as previously
described in conjunction with FIG. 9B, the counter is initialized with the
constant "BFF" and operates so as to transfer document 21 at double the
scanning speed.
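The effect of the two preset constants can be checked with a small
simulation. This is a sketch only. The function name clocks_until_carry is
hypothetical, and it is assumed that the carry is asserted when the 12-bit
counter reaches all ones ("FFF"); the exact carry timing of counter 35 is
not stated in the text.

```python
def clocks_until_carry(preset, width_bits=12):
    """Count drive clock pulses from a preset load up to the terminal count.

    Assumption: the carry is asserted when the counter reaches all ones
    ("FFF" for 12 bits), and the preset is reloaded on that carry.
    """
    terminal = (1 << width_bits) - 1
    if not 0 <= preset <= terminal:
        raise ValueError("preset does not fit in the counter")
    count, clocks = preset, 0
    while count != terminal:
        count += 1
        clocks += 1
    return clocks

read_period = clocks_until_carry(0x7FF)   # readout field
skip_period = clocks_until_carry(0xBFF)   # non-readout field
print(read_period, skip_period)           # 2048 1024
```

Under this assumption the preset "7FF" yields one carry per 2048 drive
clock pulses, which matches the 2048 pixels of the line sensor, and the
preset "BFF" yields one carry per 1024 pulses, so that the transfer timing
signal occurs twice as often.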
The initial data is set into counter 35 by use of the carry output of
counter 35. Multiplexer 36 selects the initial data on the basis of the
output of first F/F 32.
By the operation of counter 35, the operation synchronization signals to
photoelectric transducer 14 and document convey unit 12 are produced.
The output of second F/F 33, which serves to interrupt the transportation
of document 21, is input to an AND circuit 37, where it is ANDed with the
carry output of counter 35. AND circuit 37 thereby produces the transfer
timing signal for document convey unit 12.
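The gating performed by AND circuit 37 can be sketched as follows. The
function name transfer_timing and the sample pulse sequences are
hypothetical; only the AND relation between the carry output and the output
of second F/F 33 is taken from the description.

```python
def transfer_timing(carry, continue_feed):
    """AND circuit 37: pass the counter carry only while feeding continues."""
    return carry & continue_feed

carry_pulses = [0, 1, 0, 1, 0, 1]   # carry output of counter 35
second_ff    = [1, 1, 1, 0, 0, 1]   # "1" = continue feeding, "0" = stop
timing = [transfer_timing(c, f) for c, f in zip(carry_pulses, second_ff)]
print(timing)  # [0, 1, 0, 0, 0, 1]
```

The carry that occurs while second F/F 33 holds "0" is suppressed, so that
no transfer timing signal reaches document convey unit 12 during that
period.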
Other AND circuits 38 and 39 operate in response to the outputs of first
and second flip-flops 32 and 33, thereby obtaining address control signals
(XINC, YINC) to control the write address of the image data into image
memory 15. The address control signals are supplied to address controller
16.
The value of the address control signal regarding the Y component needs to
be initialized to zero before a series of documents is continuously read
out. For this purpose, an output of program controller 31 is supplied as
YCLR to address controller 16.
OPERATION OF DOCUMENT CONTROLLER
The transportation of document 21, the synchronized control of the image
sensor, and the control of the image memory are performed by document feed
controller 11 arranged as described above. Thus, the document image data
in each readout field on document 21 is continuously written into image
memory 15, as shown in FIG. 6B, with each readout field treated as a unit.
According to the preferred embodiment, in order to sequentially write the
document images of the readout fields into image memory 15, image memory
15 must be operated in a scrolling manner, since the total capacity (i.e.,
the total memory regions) of image memory 15 is relatively small. That is
to say, recognition control unit 13 needs to be sequentially initialized
so as to recognize the character and image data written in memory regions
of image memory 15 before reading the succeeding character and image data
written in the image memory. Further, the memory regions whose character
and image data have been subjected to the recognition processes need to be
returned to the writable areas of image memory 15. Namely, the writable
areas denote the areas whose character and image data have already been
read and recognized. Therefore, information indicating whether writable
storage areas holding no stored data exist in the image memory is
essential, and it is needed before the readout fields of the document are
read and scanned. This is because the reading of the character and image
information written in a readout field cannot be interrupted.
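The scrolling use of the image memory can be modeled as a circular buffer
whose writable capacity is checked before each readout field is scanned.
The following is a toy sketch under stated assumptions and is not the
circuit of the embodiment. The class name ScrollingImageMemory, its
methods, and the capacity of 100 lines are hypothetical, and it is assumed
that fields are recognized and released in the order in which they were
written.

```python
class ScrollingImageMemory:
    """Toy model of an image memory used as a circular (scrolling) buffer."""

    def __init__(self, total_lines):
        self.total_lines = total_lines
        self.write_y = 0              # next Y address to be written
        self.free_lines = total_lines

    def can_accept(self, field_lines):
        # Checked before scanning starts, because the scan of a readout
        # field cannot be interrupted.
        return field_lines <= self.free_lines

    def write_field(self, field_lines):
        if not self.can_accept(field_lines):
            raise RuntimeError("no writable area: hold the document feed")
        start_y = self.write_y
        self.write_y = (self.write_y + field_lines) % self.total_lines
        self.free_lines -= field_lines
        return start_y

    def release(self, field_lines):
        # Called after recognition of the oldest stored field is finished.
        self.free_lines = min(self.total_lines, self.free_lines + field_lines)


mem = ScrollingImageMemory(total_lines=100)
a = mem.write_field(60)          # starts at Y address 0
ok = mem.can_accept(60)          # False: only 40 lines are writable
mem.release(60)                  # first field recognized, area reusable
b = mem.write_field(60)          # starts at Y address 60 and wraps around
print(a, ok, b, mem.write_y)     # 0 False 60 20
```

In this model a second field of 60 lines is refused until the first field
has been recognized and its region returned to the writable area, which
corresponds to holding the document feed until writable storage exists.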
The document reading apparatus is designed in a manner such that the
readout fields to be written into image memory 15 are allocated
sequentially, starting from Y address "0". Since the information of the
readout fields is written sequentially and repeatedly, the information of
the same readout field is stored at different addresses in image memory 15
when the documents are different.
Therefore, to initialize recognition | | |