| United States Patent | 5657402 |
| Link to this page | http://www.wikipatents.com/5657402.html |
| Inventor(s) | Bender; Walter R. (Auburndale, MA);
Teodosio; Laura A. (Derby, CT) |
| Abstract | The invention is a method for generating a still image, comprising the
steps of producing a plurality of images, each of the plurality having
been produced at a distinct focal length, scaling each of the images to a
common focal length and combining each of the scaled images to a final
image of a single focal length, portions of which are of a relatively high
resolution, as compared to the images of the original sequence. The
invention also includes combining a sequence of still images of varying
fields of view into a panoramic image of an overall field of view, which
overall field of view includes all of the fields of view of the sequence.
In addition to combining images generated at different fields of view, the
method of the invention can be used to combine images generated with
respect to different fields of view of an overall scene, such as a
panoramic scene into a combined panoramic field of view. This aspect of
the invention may also be combined with the varying focal length aspect.
Even without varying the focal length or the field of view, the invention
can be used to produce a composite image of enhanced resolution relative
to the resolution of any of the images of the original sequence. The
invention is also an apparatus for generating a still image, comprising
means for producing a plurality of images, each of the plurality having
been produced at a distinct focal length, the focal lengths differing from
each other, means for scaling each of the plurality of images to a common
focal length and means for combining each of the scaled images into a
single image of a single focal length. The apparatus of the invention also
includes apparatus to combine images generated with respect to different
fields of view of an overall scene into a combined panoramic field of
view. |
Title Information  |
Method of creating a high resolution still image using a plurality of
images and apparatus for practice of the method |
| Publication Date |
August 12, 1997 |
| Filing Date |
October 30, 1992 |
| Parent Case |
This is a continuation in part of commonly owned U.S. patent application
Ser. No. 786,698, "Method of Creating a High Resolution Still Image Using
a Plurality of Images of Varied Focal Length or Varied Field of View and
Apparatus for Practice of the Method," filed on Nov. 1, 1991 in the names
of Walter R. Bender and Laura A. Teodosio, and assigned to the
Massachusetts Institute of Technology, now abandoned, which is
incorporated fully herein by reference. |
Claims  |
Having described the invention, what is claimed is:
1. A method for generating a signal corresponding to a still, perceptible
image representing a physical situation, comprising the steps of:
a. using electromagnetic radiation, capturing a plurality of at least three
images of said situation, each of said plurality having been captured at a
distinct focal length, said focal lengths differing from each other and
each of at least three overlapping images of said plurality sharing an
overlap region that corresponds to the same portion of said situation;
b. generating, for each of said plurality of images, an electromagnetic
signal representing said image, resulting in at least three
electromagnetic signals, each signal designated an overlap signal, each
one of said overlap signals representing said overlap region of one of
said three overlapping images;
c. transforming each overlap signal so that it represents the respective
image scaled to a common focal length and aligned to a common field of
view, said transforming step comprising the steps of, for each said
overlap signal:
i. applying to said signal at least one affine transformation comprising
the steps of:
A. ordering said plurality of overlap signals in a sequence;
B. for each sequentially adjacent pair of overlap signals in said sequence,
determining a set of affine parameters substantially defining a
transformation of the image represented by a first of said pair to the
image represented by the second of said pair;
C. for at least one of said plurality of overlap signals, combining a
plurality of said sets of affine parameters into a composite set of affine
parameters; and
D. applying an affine transformation to said at least one overlap signal
using said respective composite set of affine parameters; and
ii. generating a signal that represents said transformed overlap signal;
and
d. combining each of said transformed overlap signals into a resultant
signal that represents the combination of each of said scaled images into
a single image of said situation of a single focal length using an aspect
of each of said at least three overlap signals.
2. A method for generating a signal corresponding to a still, perceptible
image representing a physical situation, comprising the steps of:
a. using electromagnetic radiation, capturing a plurality of at least three
images of said situation, each of said plurality having been captured at a
distinct focal length, said focal lengths differing from each other and
each of at least three overlapping images of said plurality sharing an
overlap region that corresponds to the same portion of said situation;
b. generating, for each of said plurality of images, an electromagnetic
signal representing said image, resulting in at least three
electromagnetic signals, each signal designated an overlap signal, each
one of said overlap signals representing said overlap region of one of
said three overlapping images;
c. transforming each overlap signal so that it represents the respective
image, scaled to a common focal length and aligned to a common field of
view;
d. combining each of said filtered, transformed overlap signals into a
resultant signal that represents the combination of each of said scaled
images into a single image of said situation of a single focal length by
applying a temporal median filter to each transformed signal and using an
aspect of each of said at least three overlap signals.
3. The method of claim 2, said step of applying a temporal median filter
comprising the step of applying a weighted temporal median filter to each
transformed signal.
4. The method of claim 3, said weighted temporal median filter comprising a
filter that assigns more weight to overlap signals that represent images
that were produced at a longer focal length than to overlap signals that
represent images that were produced at a relatively shorter focal length.
5. A method for generating a signal corresponding to a still, perceptible
image representing a physical situation, comprising the steps of:
a. using electromagnetic radiation, capturing a plurality of at least three
images of said situation, each of said plurality having been captured at a
distinct focal length, said focal lengths differing from each other and
each of at least three overlapping images of said plurality sharing an
overlap region that corresponds to the same portion of said situation;
b. generating, for each of said plurality of images, an electromagnetic
signal representing said image, resulting in at least three
electromagnetic signals, each signal designated an overlap signal, each
one of said overlap signals representing said overlap region of one of
said three overlapping images;
c. transforming each overlap signal so that it represents the respective
image, scaled to a common focal length and aligned to a common field of
view; and
d. identifying differences between pairs of signals representing pairs of
images, which signal differences are due to relative motions between pairs
of images that are due to causes other than the fact that the two images
were produced at different focal lengths, said step of identifying
differences between pairs of signals that are due to relative motions
comprising the steps of:
A. estimating a first relative motion of a first pattern portion of both
signals of a pair;
B. using said estimated first motion to determine a second relative motion
of a second pattern portion of both signals;
C. repeating the following steps until a desired resolution of relative
motion is achieved:
α. using said second relative motion to specify more precisely said
first relative motion of said first pattern portion; and
β. using said more precise specification of said first relative motion
to specify more precisely said second relative motion of said second
pattern portion; and
e. combining each of said transformed overlap signals into a resultant
signal that represents the combination of each of said scaled images into
a single image of said situation of a single focal length using an aspect
of each of said at least three overlap signals.
6. A method for generating a signal corresponding to a still, perceptible
image of a physical situation, comprising the steps of:
a. using electromagnetic radiation, capturing a plurality of images of said
situation, each of said plurality having been produced covering a distinct
field of view, said fields of view differing from each other and being
members of an overall field of view, each field of view overlapping at
least one other field of view, each pair of overlapping images of said
plurality sharing an overlap region that corresponds to the same portion
of said situation, and said overall field of view corresponding to a
greater extent of said situation than any single image of said plurality;
b. generating, for each of an overlapping pair of said plurality of images,
an electromagnetic signal representing said image, each signal designated
an overlap signal;
c. transforming each overlap signal so that it represents the respective
image aligned to said overall field of view, said transformation being
conducted without reference to the locations of features relative to said
overall field of view or said physical situation; and
d. combining each of said transformed overlap signals into a resultant
signal that represents the combination of each of said aligned images into
a single image of said situation of said overall field of view by applying
a temporal median filter to said transformed overlap signals representing
said aligned images.
7. A method for generating a signal corresponding to a still, perceptible
image of a physical situation, comprising the steps of:
a. using electromagnetic radiation, capturing a plurality of images of said
situation, each of said plurality having been produced covering a distinct
field of view, said fields of view differing from each other and being
members of an overall field of view, each field of view overlapping at
least one other field of view, each pair of overlapping images of said
plurality sharing an overlap region that corresponds to the same portion
of said situation, and said overall field of view corresponding to a
greater extent of said situation than any single image of said plurality;
b. generating, for each of an overlapping pair of said plurality of images,
an electromagnetic signal representing said image, each signal designated
an overlap signal;
c. without reference to the locations of features relative to said overall
field of view or said physical situation, transforming each overlap signal
so that it represents the respective image aligned to said overall field
of view, said transformation being conducted by the steps of, for each
said overlap signal:
i. applying to said overlap signal at least one affine transformation by
conducting the steps of:
A. ordering said plurality of overlap signals in a sequence;
B. for each sequentially adjacent pair of overlap signals in said sequence,
determining a set of affine parameters substantially defining a
transformation of said image represented by a first of said pair of
overlap signals to the image represented by the second of said pair of
signals;
C. for at least one of said plurality of overlap signals, combining a
plurality of said sets of affine parameters into a composite set of affine
parameters; and
D. applying an affine transformation to said at least one overlap signal
using said respective composite set of affine parameters; and
ii. generating a signal that represents said transformed overlap signal;
and
d. combining each of said transformed overlap signals into a resultant
signal that represents the combination of each of said aligned images into
a single image of said situation of said overall field of view.
8. An apparatus for generating a signal corresponding to a still,
perceptible image of a situation comprising:
a. means for capturing a plurality of images of said situation, using
electromagnetic radiation, each of said plurality having been produced at
a distinct field of view, said fields of view differing from each other
and being members of an overall field of view, each field of view
overlapping at least one other field of view, each pair of overlapping
images of said plurality sharing an overlap region that corresponds to the
same portion of said situation, and said overall field of view
corresponding to a greater extent of said situation than any single image
of said plurality;
b. transducer means for transducing each of said images into an
electromagnetic signal representative of said image, each signal so
transduced designated an overlap signal;
c. signal processing means for transforming each overlap signal so that it
represents the respective image aligned to said overall field of view,
without reference to said overall field of view or said physical
situation; and
d. signal processing means for combining each of said transformed overlap
signals into a resultant signal that represents the combination of each of
said aligned images into a single image of said situation of said overall
field of view, said signal processing means comprising means for applying
a temporal median filter to each of said transformed overlap signals.
9. The apparatus of claim 8, said means for capturing a plurality of images
comprising a video recording device.
10. A method for generating a signal that represents a still, perceptible
image of a physical situation comprising the steps of:
a. establishing a plurality of different sampling lattices bearing no
predetermined spatial relationship to one another;
b. using electromagnetic radiation, capturing a plurality of images of said
situation, each of said plurality having been captured at a distinct time
with a different of said plurality of different sampling lattices, a
region of each of said plurality of images constituting an image of the
same portion of said situation as is captured by at least one other of
said images;
c. generating, for each of said images, an electromagnetic signal
representative of said image;
d. transforming each signal so that it represents the respective image,
aligned to a common field of view; and
e. combining at least two of said transformed signals, using a sampling
lattice of higher resolution than any sampling lattice of any image of
said plurality into a resultant signal that represents the combination of
at least two of said images into a single image having an enhanced
resolution over any of said original images.
11. The method of claim 10, said step of transforming each of said signals
comprising the steps of:
a. applying to said signal at least one affine transformation; and
b. generating a signal that represents said transformed signal.
12. The method of claim 11, said step of applying at least one affine
transformation comprising the step of generating a plurality of signals
that represent a sequence of modified image frames which have been reduced
in resolution and sampling, and applying to said plurality of signals
representing said modified frames at least one affine transformation.
13. The method of claim 11, said step of applying at least one affine
transformation comprising the steps of:
a. ordering said plurality of signals in a sequence;
b. for each pair of signals in said sequence, determining a set of affine
parameters substantially defining a transformation of the image
represented by a first of said pair to the image represented by a second
of said pair;
c. for each of said plurality of signals, combining a plurality of said
sets of affine parameters into a composite set of affine parameters; and
d. applying an affine transformation to each said signal using said
respective composite set of affine parameters.
14. The method of claim 13, said step of combining comprising the steps of
applying a temporal median filter to the corresponding signal representing
each aligned image.
15. A method for generating a signal representing a still, perceptible
image of a physical situation comprising the steps of:
a. establishing a plurality of different space time sampling lattices
bearing no predetermined space time relationship to one another;
b. using electromagnetic radiation, capturing a plurality of images, each
of said plurality having been captured at a distinct and different space
time coordinate with a different one of said plurality of different space
time sampling lattices;
c. generating, for each of said images, an electromagnetic signal
representative of said image;
d. transforming each signal so that it represents the respective image,
aligned to a common field of view; and
e. combining each of said transformed signals into a resultant signal that
represents the combination of each of said images into a single image of a
common field of view having a higher resolution than any of the original
plurality of images.
16. A method for generating a signal corresponding to a still, perceptible
image representing a physical situation, comprising the steps of:
a. using electromagnetic radiation, capturing a plurality of at least three
images of said situation, each of said plurality having been captured at a
distinct focal length, said focal lengths differing from each other and
each of at least three overlapping images of said plurality sharing an
overlap region that corresponds to the same portion of said situation;
b. generating, for each of said plurality of images, an
electromagnetic signal representing said image, resulting in at least
three electromagnetic signals, each signal designated an overlap signal,
each one of said overlap signals representing said overlap region of one
of said three overlapping images;
c. transforming each overlap signal so that it represents the respective
image, scaled to a common focal length and aligned to a common field of
view, said transforming step comprising the steps of:
i. ordering said plurality of overlap signals into a sequence;
ii. for at least two of said overlap signals, applying to said at least two
signals at least one affine transformation;
iii. wherein for at least one of said at least two of said overlap signals,
said at least one affine transformation transforms said signal so that it
represents the respective image aligned to a common field of view with an
image of said sequence that is distant from said respective image in said
sequence; and
iv. generating a signal that represents said transformed overlap signal;
and
d. combining each of said transformed overlap signals into a resultant
signal that represents the combination of each of said scaled images into
a single image of said situation of a single focal length, said single
image being arranged according to a combination sampling lattice that
defines a plurality of pixels, using an aspect of each of said at least
three overlap signals at each pixel of said single image that represents
said combined overlap signals.
17. A method for generating a signal corresponding to a still, perceptible
image representing a physical situation, comprising the steps of:
a. using electromagnetic radiation, capturing a plurality of at least three
images of said situation, each of said plurality having been captured at a
distinct focal length, said focal lengths differing from each other and
each of at least three overlapping images of said plurality sharing an
overlap region that corresponds to the same portion of said situation;
b. generating, for each of said plurality of images, an electromagnetic
signal representing said image, resulting in at least three
electromagnetic signals, each signal designated an overlap signal, each
one of said overlap signals representing said overlap region of one of
said three overlapping images;
c. transforming each overlap signal so that it represents the respective
image, scaled to a common focal length and aligned to a common field of
view, said transforming step comprising the steps of, for each said
overlap signal:
i. applying to said signal at least one affine transformation comprising
the steps of:
A. ordering said plurality of overlap signals in a sequence;
B. for each sequentially adjacent pair of overlap signals in said sequence,
determining a set of affine parameters substantially defining a
transformation of the image represented by a first of said pair to the
image represented by the second of said pair;
C. for at least one of said plurality of overlap signals, applying a first
affine transformation to said at least one overlap signal using said
affine parameters determined with respect to said overlap signal as the
first of a pair and an adjacent overlap signal as the second of said pair
to generate a first transformed overlap signal; and
D. applying to said first transformed overlap signal a second affine
transformation using said affine parameters determined with respect to a
second pair of overlap signals that comprise:
α. said adjacent overlap signal as the first signal of said second
pair; and
β. another overlap signal, different from said at least one overlap
signal, as the second signal of said second pair; and
ii. generating a signal that represents said transformed overlap signal;
and
d. combining each of said transformed overlap signals into a resultant
signal that represents the combination of each of said scaled images into
a single image of said situation of a single focal length using an aspect
of each of said at least three overlap signals.
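Several of the claims above (for example claims 1, 7, 13 and 17) recite combining pairwise sets of affine parameters into a composite set, and claims 2 to 4 recite a temporal median filter that may be weighted. The following is a minimal sketch of both ideas, not the patented implementation. It assumes a six-parameter affine map stored as a tuple, assumes every image is aligned to the last image of the sequence, and uses one common definition of a weighted median. All names (`compose`, `composites_to_last`, `apply_affine`, `weighted_median`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed representation: x' = a*x + b*y + c, y' = d*x + e*y + f,
# stored as the tuple (a, b, c, d, e, f).
Affine = Tuple[float, float, float, float, float, float]

IDENTITY: Affine = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)


def compose(first: Affine, second: Affine) -> Affine:
    """Return the single affine map equal to applying `first`, then `second`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a2 * a1 + b2 * d1,
        a2 * b1 + b2 * e1,
        a2 * c1 + b2 * f1 + c2,
        d2 * a1 + e2 * d1,
        d2 * b1 + e2 * e1,
        d2 * c1 + e2 * f1 + f2,
    )


def composites_to_last(pairwise: Sequence[Affine]) -> List[Affine]:
    """pairwise[i] maps image i onto image i + 1 (adjacent pairs in sequence).

    Returns, for every image, the composite map onto the last image.
    """
    count = len(pairwise) + 1
    composites = [IDENTITY] * count
    # Walk backwards so each composite reuses the one already built after it.
    for i in range(count - 2, -1, -1):
        composites[i] = compose(pairwise[i], composites[i + 1])
    return composites


def apply_affine(t: Affine, x: float, y: float) -> Tuple[float, float]:
    """Map the point (x, y) through the affine transformation t."""
    a, b, c, d, e, f = t
    return (a * x + b * y + c, d * x + e * y + f)


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Smallest value whose cumulative weight reaches half the total weight.

    With equal weights and an odd number of values this is the plain median.
    """
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value
    raise ValueError("values and weights must be non-empty with positive total")


# Hypothetical three-image sequence: image 0 -> 1 doubles the scale,
# image 1 -> 2 shifts by (5, -3).
pairwise = [(2.0, 0.0, 0.0, 0.0, 2.0, 0.0), (1.0, 0.0, 5.0, 0.0, 1.0, -3.0)]
composites = composites_to_last(pairwise)
point_in_last = apply_affine(composites[0], 1.0, 1.0)

# Co-located samples from three aligned frames; the outlier 200 is rejected.
pixel = weighted_median([10.0, 200.0, 12.0], [1.0, 1.0, 1.0])
```

Giving a sample from a longer focal length a larger weight, in the spirit of claim 4, pulls the result toward that sample: under this definition `weighted_median([10, 12, 30], [1, 1, 3])` returns 30.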
Description  |
BACKGROUND OF THE INVENTION
The present invention relates in general to a method for creating a high
resolution still image, using a plurality of images and an apparatus
therefor. In particular, the invention relates to a method for creating a
still high resolution, fixed focal length image, using a plurality of
images of various focal lengths, such as a zoom video sequence. The
invention also relates to creating a still panoramic image from a
plurality of images of a field of view less than that of the still
panoramic image. The invention also relates to creating a high resolution
still image from a plurality of images of the same scene, taken over a
period of time during which some portions of the scene do not change.
In the field of image processing, it is often desirable to create a still
image of a scene. In a typical case, the image will be of a certain
resolution, which depends on the coarseness of the recording medium and
the focal length of the equipment by which the image is captured. Video
equipment is now relatively inexpensive and simple enough for many people
to use. Video recording equipment has certain advantages over still image
rendering, such as still photography. An activated video camera will
capture all events within its field of focus, rather than only those that
the photographer chooses to capture by operating a shutter. Thus, in fast
moving situations, such as sporting events, or unpredictable situations,
such as weddings and news stories, it is often beneficial to set up a
video camera to be constantly recording, and then choose selected still
shots at a later time. Unfortunately, the resolution of even a very good
video signal is only on the order of 480 lines per picture height by 640
samples per picture width. (A video signal is, itself, continuous across a
scanline. However, for display, it is sampled along the length of a
scanline.) This resolution is inadequate for a quality rendering in many
cases, particularly if the original image is shot at a relatively short
focal length. If the image were to be blown up, it would be relatively
blurry. Similarly, other image capturing techniques, such as moving film,
involve a specific degree of resolution. Blowing up the image necessarily
entails loss of resolution per unit area over the entire scene.
For instance, an image may be desired of a solo instrumentalist on stage in
front of a piano, playing to an audience, with the audience also shown. If the
image capturing device is a video device, the wide angle image showing the
audience will be resolved at the video standard mentioned above. The
resolution over the entire image is the same. Thus, the rendering of the
soloist will be as coarse as the rendering of the rest of the scene. For
example, if the soloist takes up a space of one sixteenth of the image, it
will be rendered using 120 lines in the vertical direction and 160 samples
in the horizontal direction. Less important aspects of the scene, for
instance empty chairs in the back row, will be rendered at the same
resolution. FIG. 1 shows schematically the focusing of a scene on a focal
plane in connection with two different focal lengths. The full width of
image 2 is focused on focal plane 4, if the focal length f_w is
relatively short.
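The one-sixteenth figure above follows from simple geometry: a region that covers one sixteenth of the frame area, with the same aspect ratio as the frame, spans one quarter of the lines and one quarter of the samples. A minimal sketch of that arithmetic, assuming the 480 by 640 sampling mentioned earlier (the function name `region_resolution` is illustrative and does not come from the patent):

```python
import math
from typing import Tuple


def region_resolution(lines: int, samples: int, area_fraction: float) -> Tuple[float, float]:
    """Lines and samples spanning a region that occupies `area_fraction` of the frame.

    Assumes the region has the same aspect ratio as the full frame, so each
    linear dimension scales with the square root of the area fraction.
    """
    if not 0.0 < area_fraction <= 1.0:
        raise ValueError("area_fraction must be in (0, 1]")
    linear = math.sqrt(area_fraction)
    return (lines * linear, samples * linear)


# A soloist filling one sixteenth of a 480-line by 640-sample frame.
soloist = region_resolution(480, 640, 1 / 16)
```

Here `soloist` evaluates to 120 lines by 160 samples, matching the figures in the text.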
It is, of course, possible to render the soloist at a higher resolution
(i.e. a greater number of lines in the vertical direction and more pixels
in the horizontal direction), by "zooming in" on the soloist and capturing
the image of the soloist at a longer focal length. As shown in FIG. 1, the
focal length f_T is longer than f_w. However, only the central
portion 6 of image 2 is focused on focal plane 4. Much of the scene is
lost, because it focuses outside of the scope of the focal plane. The
image of the soloist is enlarged to fill more space, and some of the
perimeter of the former image is not captured.
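The trade-off described above can be made concrete with a small calculation. This is a sketch under a stated assumption that is not spelled out in the text: an idealized pinhole model with a distant scene and a focal plane of fixed size, in which the image scale on the focal plane grows in proportion to the focal length. The function name `zoom_effect` and the example focal lengths are hypothetical.

```python
from typing import Tuple


def zoom_effect(f_wide: float, f_tele: float) -> Tuple[float, float]:
    """Return (fraction of the wide view's width still captured, sampling gain).

    Assumes image scale on the focal plane is proportional to focal length
    (idealized pinhole model, distant scene, fixed focal plane size).
    """
    if f_wide <= 0 or f_tele < f_wide:
        raise ValueError("expected 0 < f_wide <= f_tele")
    magnification = f_tele / f_wide
    captured_fraction = 1.0 / magnification
    return (captured_fraction, magnification)


# Hypothetical focal lengths: zooming from 25 to 100 (same units).
captured, gain = zoom_effect(25.0, 100.0)

# A soloist spanning 160 samples in the wide shot spans this many after zooming.
soloist_samples = 160 * gain
```

With these example values only the central quarter of the wide view's width remains on the focal plane, while the soloist who spanned 160 samples now spans all 640.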
It is known to enhance pictorial data by combining two channels of data; a
first channel having a high spatial resolution (i.e. relatively many
picture elements per inch) and a relatively low temporal resolution (i.e.
relatively few frames per second) and a second channel having a lower
spatial resolution and a higher temporal resolution. The resultant
combination achieves a spatial and temporal resolution approaching the
higher of both, while requiring the transfer of less information than
would ordinarily be required to transmit a single image sequence of high
temporal and spatial resolutions. See Claman, Lawrence N., A Two-Channel
Spatio-Temporal Encoder, B. S. Thesis submitted to the Department of
Electrical Engineering and Computer Science at The Massachusetts Institute
of Technology, May 1988.
The known techniques are not conducive to the task at hand, namely
enhancing the resolution of various spatial portions of a still figure
beyond that available in the rendering captured at the shortest focal
length. The Claman disclosure uses fixed focal length images and vector
quantization, and results in a still frame of resolution and field of view
no greater than that of the original high spatial resolution images.
A related problem arises in connection with capturing the maximum amount of
information available from a scene and generating a signal representative
of that information, and later recovering the maximum available amount of
information from the signal. It is desirable to be able to provide the
highest resolution image possible.
It is also desirable to be able to provide a panoramic view of a scene,
maintaining a substantially common focal length from one portion of the
panoramic view to another. The known way to do this is to move a video
camera from one side of a panoramic scene to another, essentially taking
many frames that each differ only slightly from the preceding and
following frames. Relative to its adjacent neighbors, each frame differs
only in that the left and right edges are different. Most of the image
making up the frame is identical to a portion of the image in the
neighboring frames. Storage and navigation through these various images
that make up a panoramic scene requires a huge amount of data storage and
data access. This known technique is undesirable for the obvious reasons
that data storage and access are expensive. It is further undesirable,
because most of the data stored and accessed is redundant. Image capture
devices that are currently used to capture panoramic spaces include a
moving globuscope camera or a Volpi lens.
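To give a sense of how redundant such a pan sequence is, the following sketch estimates the fraction of stored sample columns that merely repeat scene content already stored in an earlier frame. It assumes a purely horizontal pan with a constant shift per frame. The function name `pan_redundancy` and the example numbers (a 16-column shift over 100 frames) are hypothetical and are not taken from the patent.

```python
def pan_redundancy(frame_width: int, shift_per_frame: int, frame_count: int) -> float:
    """Fraction of stored sample columns that duplicate earlier frames.

    Assumes a horizontal pan in which each frame is offset from the previous
    one by `shift_per_frame` columns, with 0 < shift_per_frame <= frame_width.
    """
    if frame_count < 1 or not 0 < shift_per_frame <= frame_width:
        raise ValueError("invalid pan parameters")
    stored = frame_width * frame_count
    # The first frame is entirely new; each later frame adds only its shift.
    unique = frame_width + shift_per_frame * (frame_count - 1)
    return 1.0 - unique / stored


# 100 frames, each 640 samples wide, panning 16 columns per frame.
redundant_fraction = pan_redundancy(640, 16, 100)
```

For these example values roughly 96.5 percent of the stored columns are redundant, which illustrates why storing every frame of a pan is wasteful.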
It is also desirable to be able to both pan from one location in a scene
to another, and to zoom at the same time. The drawbacks of known methods
certainly create an undesirable situation with respect to such a
combination.
OBJECTS OF THE INVENTION
Thus, the several objects of the invention include to provide a method and
apparatus for creating a relatively high resolution still image that: does
not require capturing information at the high resolution over the range of
the entire image; that can produce an image of higher resolution
than any image in a sequence used to compose the high resolution image;
that does not require collecting information with respect to large parts
of the image that are of only minor interest; that can take as an input a
sequence of standard video images of varying focal length or field of
view; that can take as an input a sequence of standard film images; that
allows enhancing the resolution of any desired portion of the image; and
which can be implemente | | |