|
Description  |
|
|
BACKGROUND OF THE INVENTION
1. Field of Invention
The present invention relates generally to an improved system for creating
a full 360-degree virtual reality panorama from rectilinear images.
A panorama is a compact representation of the environment viewed from a 3D
position. While an ordinary image can capture only a small portion of the
environment, a panorama can capture it all, or any portion of it,
depending on the geometry in which the panoramas are represented. Recently
there has been an explosive popularity of panoramas on the world wide web
and in multimedia as an effective tool to present a photo-realistic
virtual reality. However, creating high-quality panoramas, especially
those that completely enclose space, has been difficult
2. Description of Related Art
Various systems have been proposed for simulating a virtual reality
environment using photographic quality images. Many virtual reality
environments use 3D models or mathematical equations to create a simulated
world. The user explores this simulation in real time. Though 3D modeling
via equations has certain advantages, such as a depiction of a scene from
any arbitrary vantage point, creating images from equations generated by a
computer is seriously limited by the speed of the computer. To avoid this
problem, technology such as QuickTime.TM. VR from Apple Corporation uses
images that have already been produced, either photographically or
generated by a 3D modeling program, and stored in secondary memory.
Software only has to read the image files from a disk and display the
scene as needed, rather than calculating the scene from mathematical
models. However, a limitation of the QuickTime.TM. VR program is that it
requires that the view direction for all photos reside in a single plane,
such as that obtained by rotating a camera on a tripod. It also requires
that the vertical field of view (or equivalently, the focal length) be
known, and that there be roughly equal angular increments between one
photo and the next.
Further, a panoramic movie or image can be created using specialized
hardware, such as with a panoramic camera or a fisheye lens camera.
However, such hardware is inconvenient for the average novice
photographer. In the alternative, software can be used to simulate a
panorama. This obviates the need for specialized hardware.
Though various software programs have been proposed to simulate panoramas
without the use of special hardware, these programs have certain serious
drawbacks that have not been successfully overcome to date. These include,
but are not limited to, unrealistic representations of images, lack of
proper registration and calibration of images, lack of proper blending of
images, and slow speed in registering, calibrating and blending images to
create a panorama.
SUMMARY OF THE INVENTION
Accordingly, one aspect of the present invention is to provide an improved
system and method for overcoming the drawbacks of prior techniques
discussed above.
Another aspect of the present invention is to provide for the registration,
calibration and global optimization of images, preferably captured from a
substantially single nodal position. The solution to creating a full
360-degree panorama quickly and seamlessly is divided into three steps.
The first step registers all overlapping images projectively. A
combination of a gradient-based optimization method and a
correlation-based linear search has proved to be robust in cases of
drastic exposure differences and small amount of parallax. The second step
takes the projective matrices and their associated Hessian matrices as
inputs, and calibrates the internal and external parameters of every image
through a global optimization. The objective is to minimize the overall
image discrepancies in all overlap regions while converting projective
matrices into camera parameters such as focal length, aspect ratio, image
center, 3D orientation and the like. Improved techniques for global
optimization are disclosed that give order of magnitude improvements over
prior systems of optimization. The third step re-projects all images onto
a panorama by a method employing Laplacian-pyramid based blending using a
Gaussian blend mask generated by the grassfire transform. The purpose of
the blending is to provide a smooth transition between images and
eliminate small residues of misalignments resulting from parallax or
imperfect pairwise registrations. The invention further provides for human
interaction, where necessary, for initialization, feedback and manual
options.
Further, the present invention, unlike some of the prior art, allows for
multiple views, from multiple planes and rows of images, and allows for
the arbitrary orientation of photographic images to be constructed into a
panorama, without specialized hardware such as a tripod or fisheye lens.
In addition, the present system and method can be several orders of
magnitude faster than the prior art.
The numerous aspects of the invention described herein result in a system
for registration, calibration and blending that creates high quality
panoramas from rectilinear images that is up to several orders of
magnitude faster than prior systems. In one calculation, the present
invention is up to 100,000 times faster than prior techniques. As a
consequence, the present invention could be used to construct panoramas
much quicker than previous methods. These panoramas can be used in
applications where real-time image rendering is important, such as in
real-time 3D virtual reality, the construction of background images,
computer animation, multimedia, and the like.
The above described, and many other features and attendant advantages of
the present invention will become apparent from a consideration of the
following detailed description when considered in conjunction with the
accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Detailed description of preferred embodiments of the invention will be made
with reference to the accompanying drawings.
FIGS. 1(a) and 1(b) is an artist's rendition of a composite photograph
before and after the application of the present invention.
FIG. 2 is a generalized flowchart of the various function modules
comprising one embodiment of the present invention.
FIG. 3 is a generalized flowchart of the method of operation of the
Pairwise Registration function module of one embodiment of the present
invention.
FIG. 4 is a generalized flowchart for the method of operation of the
Calibration and Global Optimization function module of one embodiment of
the present invention.
FIG. 5 is a generalized flowchart for the method of operation of the
blending function module of one embodiment of the present invention.
FIG. 6 is a screen shot of a user interface dialog window for the user
interface of one embodiment of the present invention.
FIG. 7 is a conceptual illustration on the problem of finding the proper
Laplacian pyramid level using the minor axis of an inertial ellipse.
FIG. 8 is a graphical illustration of the transition lengths for different
frequency image components (low, middle and high) used in one embodiment
of the blending function module of the present invention.
FIG. 9 conceptually illustrates the weighted average method for blending.
FIGS. 10(a) and (b) illustrate the blend mask used for blending two images
during the blending phase of one embodiment of the present invention.
FIGS. 11(a) and (b) illustrate a particular problem overcome during
blending in one embodiment of the present invention.
FIG. 12 illustrates a particular virtual reality orientation of images for
the user interface for one embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Disclosed herein is a detailed description of the best presently known mode
of carrying out the invention. This description is not to be taken in a
limiting sense, but is made merely for the purpose of illustrating the
general principles of the invention. The section titles and overall
organization of the present detailed description are for the purpose of
convenience only and are not intended to limit the present invention.
Turning to FIG. 1, there is shown a simulation of different overlapping
rectilinear images, or 2D photographs, framed by dashed lines during the
authoring portion of the present invention, as indicated by dashed lines
110. Where the images overlap there is potential for misalignment when
constructing a 3D panorama, as indicated by blurry lines 112, for a
variety of reasons, including the arbitrary position of the camera, errors
in internal and external camera parameters, and, distortions that occur
when warping a 2D image to construct a 3D image space. The present
invention is designed to calibrate and align all such 2D rectilinear
images with respect to one another and globally, blend the images where
they overlap, and construct a reconstructed and relatively error free 3D
panorama image, shown conceptually in 2D form as FIG. 1(b), for any
arbitrary geometry.
FIG. 2 discloses a generalized flowchart for overall operation of the
invention. The invention is part of a system 210 comprising a computer,
having all necessary hardware, such as processing unit 212, e.g., a G3
microprocessor chip; I/O 214, such as a keyboard 216, video monitor 218,
and mouse 217; memory 220, which may be any sort of memory buffer,
preferably primary memory, e.g. RAM, that can be cached to secondary
memory, such as a hard drive. The system 210 is controlled by a program
residing in system memory 220, which also stores output data and other
data. The program is preferably written in the C or C++ language, using
classes, structures, functions, calls, translation units, headers,
subroutines, modules, and other features of structured programming, where
appropriate, of both data and source code, suitably compiled and in
executable form, in accordance with the teachings of the present
disclosure to practice the invention. The present invention is also
suitable for implementation with an interpreted language such as Java.
In constructing a panorama from rectilinear images, the system finds
solutions to three sub-problems: (1) the projective registrations of
overlapping images (shown as the "local pairwise registration" box 222 in
FIG. 2), (2) calibration and global optimization of these images, a
self-calibration in which 2D image planes are positioned as 3D planes in
space (shown as the "calibration and global optimization" box 224 in FIG.
2), and (3) the composing or blending problem in which images are ready to
be reprojected to a 3D environment map with pixels in overlap regions
being composed from multiple images, to smooth any transitional
discontinuities (shown as the "blending" box 226 in FIG. 2). Finally,
there is the projection or construction of the assembled panorama onto a
3D geometry surface, such as a cylinder, cube or sphere (defined as the
"projection" box 228 in FIG. 2).
The solutions to these sub-problems are performed by software function
modules 222, 224, 226, 228 residing in memory 220 and operating the
processor 212. The modules are designated, as explained further herein,
the pairwise registration function module 222, the calibration and global
optimization function module 224, the blending function module 226, and
the projection function module 228. A user interface module 230, also
residing in memory 220, may interact with the other modules to pass data
to and from the modules, and accept input from a human user of the system.
The modules may receive data from memory, manipulate that data as
described herein, and output the data to other modules. The three modules
222, 224 and 226 may perform feedback to pass data back to previous
modules, as indicated by arrows 233, and as described below. Although in a
preferred embodiment the modules are programmed as separate software
routines or classes, the modules may be combined into one module
performing all the designated tasks performed by separate modules.
As a final step, the fourth module, the projection function module 228,
constructs a panoramic scene by projecting the blended image onto any
designated geometry view surface, typically a cubic, polyhedral,
cylindrical or spherical surface. The projection module may be controlled
through the user interface 230 as well, to allow a user to select what
geometry will be projected onto and to control and modify other factors,
including the use of photo re-touching software such as PhotoShop.TM. for
modifying the final panorama.
Generally, the local registration, self-calibration and global
optimization, and blending involve a multi-step procedure.
First, regarding the initial local registration, and referring generally to
the generalized flowchart of FIG. 3, the system 210 reads in each
overlapping rectilinear image into main memory 220, as indicated by step
312. The images are assumed to roughly share a common nodal point (i.e.,
that point in the three-space where all rays of light converge through a
lens) with other overlapping rectilinear images. The object of the program
during local registration is to register the locally overlapping images,
by comparing common overlapping areas between overlapping images at
certain predetermined resolution levels on a Gaussian pyramid representing
the overlapped images. Different combinations of overlapping areas are
tried to achieve the optimal overlap between images (or, equivalently, the
smallest error in the error function or pairwise objective function
described herein) using the steps described herein, which generally
minimizes the average squared pixel intensity (e.g., brightness and
contrast) difference with respect to certain transformation parameters.
Initial values for parameters used in optimizing the pairwise objective
function are assumed by the computer, as indicated in step 314. The
initial values may optionally be input by a user, e.g., with a user
interface 230 as in FIG. 2, and in response to a user dialog window such
as of the kind shown in FIG. 6. Besides the global orientation (pan, tilt
and roll) the other parameters that are most likely to give instability in
convergence of the error function are bad initial estimates of the
brightness and contrast, as well as of the geometric image center of
projection of the overlapping rectilinear images. Certain parameters most
likely to create instability in the convergence of the local error
functions can be controlled (e.g., progressively dampened at different
levels of the Gaussian pyramid) to ensure convergence, as indicated by
step 318. The overlapping images are then perturbed and the local error
function with respect to these and other variables is calculated until a
minimal local error function is found, as indicated by step 320. The
minimal local error function is then stored for a particular level of the
Gaussian pyramid, as indicated in step 322, for each pairwise
registration, and is saved and later used to compute a global error
function for all the overlapping images. The local pairwise registration
module 222 iterates until the entire Gaussian pyramid is traversed,
starting from the coarsest level of the pyramid (sometimes called the
bottom, where the pyramid can be standing on its inverted top) and working
to the finest level resolution, as indicated in decision box 324. It
should be noted that at any stage throughout the registration, and
throughout the invention in general, the system may check for a user
interruption, through the user interface, that would require immediate
attention from the processor, such as to allow the user to interactively
adjust the parameters to avoid divergence or convergence to an undesired
local minimum.
As indicated in box 316 of FIG. 3, it must be determined at what level in
the Gaussian pyramid to start the local pairwise registration. One way to
find the lowest level is to select the resolution level at which it is
found that the images share at least some arbitrary number of overlapping
pixels, e.g., preferably about 30 pixels from each side, e.g., preferably
no less than 30 pixels across the overlapping area. If greater than 60
pixels of overlap is found in these areas, the size (resolution) of the
overlap region is decreased by half (going deeper into the pyramid) and
the procedure of the present invention is reiterated again. If, on the
other hand, the overlap is less than 30 pixels, then the size (resolution)
of the overlap region is increased by doubling. By utilizing
multi-resolution registration of overlapping images by way of the Gaussian
pyramid, convergence to the desired optimum is accelerated, and false
local minima are avoided.
On occasion, it may be visually apparent to a user that during registration
the images are not converging optimally. In this case user input may
manually abort the pairwise registration procedure, and the user may
manually help align the images closer before resuming automatic
registration, as before. This manual intervention is true for all aspects
of the invention. Nevertheless, the present invention is surprisingly
robust, and manual intervention is not a prerequisite for the invention to
work.
Non-optimal convergence or divergence has sometimes been found to be the
case whenever images for a spherical projection are used, especially those
in the "pole" regions of the sphere (though in general the invention can
adjust quite nicely for images that wrap around the poles). Divergence
sometimes results when the initial default parameters chosen are wildly
off or not suitable for convergence. During such instability, the images
will appear to a user to "run away" from each other. In this case, and
throughout the invention, provision may be provided in the user interface
230 of the embodiment of FIG. 2 of the present invention for manual
intervention, such as to abort the program, for the manual selection and
relative positioning of the images to be pairwise registered, and for the
selection and relative positioning of overlapping images for blending.
The iterative method of moving down a pyramid when an overlap region is
greater than, say, 30 pixels, is an attempt to prevent instability in the
error function due to problematic parameters, such as initial value errors
in the image center of projection of the images being registered, and
errors in setting initial brightness and contrast values. Techniques of
damping and annealing of problematic parameters (with damping
progressively diminished and finally set to zero as one moves up the
pyramid to finer levels) can be used to stabilize the local error function
for these problematic terms, as explained further herein.
One improvement over prior techniques has been to save the local error
function values and use them to compute and optimize the global error
function needed for optimization. This improvement also avoids having to
evaluate the entire global error function (global objective function) from
scratch. The pairwise objective functions (local error functions) are
approximated by a quadratic Taylor series, and, together with the chain
rule, the global objective function (global error function) is minimized.
Calculation of the global error function is greatly speeded up by this
procedure.
Further regarding rectilinear images taken in a non-arbitrary manner (such
as from a tripod that is rotated, or a photographer who manually "pans" a
field of view), the number of pyramid levels and optimal direction for the
blending of images overlapping in a region can be computed by the present
invention by computing the minimum eigenvalue of the 2.times.2 inertial
tensor of the overlapping region between two images. It has been found in
practice that for an arbitrary polygon representing an overlapping region,
the optimal direction for blending, as well as the width of the blending
region (which determines the level in the pyramid at which to start the
method of registration and optimization) is found along the minor axis of
the inertial ellipse found from solving for the inertial tensor of the
overlapping images. A similar method of finding the proper pyramid level
is by solving for the smallest eigenvalue of an inertial tensor of the
overlap region between images. Conceptually, such an ellipse is shown in
FIG. 7. The blending region in an arbitrarily shaped polygon region 710,
which represents the area of overlap between overlapping images, lies
along the width and direction of the minor axis 712 of the ellipse 714,
which is calculated from the inertial tensor of the overlapping images
forming the polygon.
Thus, the results from computing the inertial tensor are used to determine
the pyramid level, blending width, and blending direction. The smallest
inertial eigenvalue is used to determine the number of pyramid levels. One
could also use the eigenvalue vector (eigenvector) to determine the
direction, or, preferably, use a blending mask, as explained herein, that
yields a grayscale ramp, which defines direction in a direction field from
taking the grayscale ramp gradient.
Next, after pairwise local registration, global optimization is used to
remove any inconsistencies. The parameters found at the local registration
level, generally from six to ten parameters per overlapping image pair,
are optimized globally. Various constraints ensure optimization whenever
there are suitable overlapping image pairs, as the number of independent
parameters is usually less than the constraints. Regarding optimization in
general, to ensure the best chance for convergence of a solution a
combination of simulated annealing and progressive damping is used, as
described herein.
Global optimization is necessary because noise in the images will yield
inconsistencies in cyclically overlapping sets of images (e.g., that A is
pairwise registered with B, B with C, and C with A, does not necessarily
mean A and C are properly globally registered with C and B). During the
global optimization phase, the discrepancies are distributed among all
image pairs in such a way as to minimize the increase in total error. The
Hessian matrix, computed in the local registration phase, as described
further herein, provides a description of the shape of the error function
landscape in the neighborhood of each pairwise optimum. Parametric
perturbations occur along the valleys of this landscape (as indicated by
the Hessian), where the increase in error is minimal. One way of looking
at this solution is to say that knowledge gathered from the pairwise
registration optimization is saved and used for global optimization,
avoiding additional computation.
Turning attention to FIG. 4, there is shown a generalized flowchart for the
calibration and global optimization module of the present invention. Data
from the local pairwise registration module 222 is provided to the
calibration and global optimization module 224, as indicated by box step
402. Such data can include the pairwise objective functions (local error
function values) for each pairwise registration found previously. The
global objective function is calculated from such data, as indicated by
boxes 404, 406, and as described more fully below. The alignment of images
globally is checked to ensure global registration, as in decision boxes
410 and 412, which may accept manual input from a user, via a user
interface module. If there is alignment, the system proceeds from the
calibration and global optimization function module to the blending of
images. Otherwise, as illustrated by boxes 418 and 420, the pairwise
registration module parameters (such as shown by box 314 in FIG. 3) may be
re-initialized and the pairwise registration module reexecuted to
recompute the pairwise registration of images, using better, updated
camera parameters as determined from the global registration procedure as
shown in FIG. 4.
During the blending step of the procedure utilized by the system of the
present invention, the image overlap regions are "blended" or the images
are "stitched" together, so that the high frequencies (e.g., sharp lines
of contrast, such as edge boundaries, analogous to the high frequency
signals associated with a square wave) are blended over a narrow blend
region range, and the low frequencies (e.g., illumination variations,
analogous to DC baseband signals) are blended over a wide range. In this
way the images are seamlessly integrated to form a panorama in an
aesthetic manner.
In one preferred embodiment of the invention, as illustrated by the steps
in FIG. 5, blending is performed by determining the coarsest level in a
Laplacian pyramid for which to begin blending of images I, J (box 504),
and constructing a Laplacian pyramid at this level (box 506). One of the
two images may be comprised of previously blended images. A blend mask
boundary is generated (box 508), defining the boundary over which blending
is to occur, preferably using the "grassfire" transform method. Next, a
blend mask is generated by a Gaussian pyramid method (box 508). In one
preferred embodiment, the overlapping images I and J that are to be
blended are put into a Laplacian pyramid and multiplied by the blending
mask, or its compliment, respectively (box 510). The resulting product is
repeatedly added together to each level of the Laplacian pyramid (box
512), moving up to the finest level resolution of the pyramid in a
sequential fashion (decision box 514), until a blended image is achieved
for the two images. A similar procedure is performed for all other images
that overlap, as indicated by decision box 516. This preferred technique
has the advantage over prior techniques in that low frequency component
images are blended over a wider area region, giving a smoothing effect, as
desired for low frequency components, while high frequency components
(such as sharp edges) are blended over a smaller blend region, giving a
"sharpening" effect for these high frequency components, exactly as
desired. This is illustrated conceptually by FIG. 8.
Authoring a panorama from 2D images can be thought of as divided into two
different phases: (1) orientation of originally 2D images into 3D space,
and (2) the projection of a panorama onto a particular 3D geometry, that
can later be used to project views of the panorama onto a 2D viewing
plane. A series of preferably overlapping photographs are analyzed to
determine what orientation the photographs were taken in order to
establish a common ground for subsequent operations, including the
construction of a panorama. The panorama is not ordinarily meant to be
viewed by a user, only the subsequent projection of the panorama onto a
viewing plane is viewed by the user. The panorama is constructed on a
particular geometry that will best facilitate the subsequent step
(sometimes termed rendering) of the projection of the panorama from the
particular geometry onto a chosen viewing plane for viewing by a user.
Typical geometries in the system of the present invention on which
panaoramas are formed include: cubic, polyhedral, cylindrical and
spherical geometries. However, any type of geometry may be used, such as
two frusto-conical cones joined at their base with the apexes pointing
away from one another; any quadric surface, and any and all of the
geometries that employ the following projections: equidistant,
equiangular, ellipsoid, Mercator (and all derivatives thereof, e.g.,
transverse Mercator, Oblique Mercator, and the like), cylindrical
equal-area, Miller cylindrical, equidistant cylindrical, Cassini (e.g.,
both for spherical and ellipsoid projections, and the like), all conic map
projections, e.g., Albers equal-area, Lambert conformal conic, equidistant
conic, bipolar oblique conic conformal, polyconic, Bonne, all azimuthal
and related projections, e.g., orthographic, sterographic, gnomonic,
general perspective, Lambert azimuthal equal-area, azimuthal equidistant,
modified-stereographic conformal, all space map projections, including
space oblique Mercator and satellite-tracking projections, all
pseudocylindrical and other miscellaneous projections, including Van der
Grinten, sinusoidal, Mollweide and Eckert IV and VI projections. The
foregoing list is meant to be illustrative and not exhaustive of the
geometries and projections possible during the construction and employment
of panoramas using the system of the present invention.
Further, the present invention can be employed in future systems that are
fast enough to eliminate the need for a projection function module 228,
and proceed directly from pairwise registration, calibration and global
optimization to the viewing and blending of the panorama on a chosen
viewing plane, without loss of generality. Presently, however, there is
not sufficient computing power in most desktop computers for this to be
feasible for real time applications.
Furthermore, the system of the present invention has means for a user
interface for all phases of the invention. A user may select, among other
things, which images are to be registered, and at what arbitrary image
plane. The user interface, suitable for display on a computer monitor and
with input from a keyboard, mouse pointer, or other I/O device, has fields
for any and all internal and external parameters of the projection matrix
of the images, including aspect ratio, number of rows of images, the tilt
between rows, the angle between photos within a row, the roll of each
image taken (e.g., landscape mode), as well as fields for how many
horizontal rows of images are to be registered (typically two or more),
image center position, focal length of camera, camera orientation with
respect to a common reference frame, such as camera pan, tilt, roll and
skew, and the brightness and contrast of images. The user interface may
have the ability to adjust the aforementioned parameters for each image
individually, or may have the ability to adjust parameters for images
captured with a particular methodology, such as equal angular increments
in latitude and longitude.
Thus, turning attention now to FIG. 6, there is shown a screen shot of a
user interface for the present system, suitable for initialization
parameters for an ensemble of images, captured with an equal angular
increment methodology that can be facilitated by use of a tripod. The user
interface is particularly useful for authoring panoramas when a user
wishes to adjust the automatic default orientation generated by the
computer for the present invention. A more complex user interface may be
provided to accommodate the initial orientation and placement of a
free-form set of images, i.e. a set of images captured without any
particular methodology, such as that captured using a hand-held camera. A
plurality of parameters of the kind described herein may be manually
entered into the dialog box fields by the user (if known) to aid in the
pairwise registration, calibration and global optimization and blending of
images. Further, the particular images selected for pairwise registration
and calibration and global optimization may be aborted during
non-convergence run-away conditions. Some of the parameters that may be
explicitly specified by a user include (referring to FIG. 6) the number of
rows 602 (assuming a panning of photos are taken, with a number of
overlapping rows of photos taken about a 360 degree arc), initial pan 604,
initial tilt 606, pan increment 608 and initial roll 610 for the first row
of photos (rows are preferably used, but vertical columns of photos are
also contemplated); and, for inter-row parameters, the pan increment 612,
roll increment 614 and tilt increment 616. Zoom lens distortion factors
and other miscellaneous factors may be entered in dialog fields such as
field 620. Other parameters may be specified for overriding the computer
defaults and for better guaranteeing convergence, such as camera focal
length or f-stop, pixel aspect ratio, and the like. In addition, the user
interface may allow for the selection, arrangement and relative
positioning of photos to be composed into a panorama, preferably in a row
by row layout, with preferably at least one or more rows of photos to be
made into the panorama.
Further, while one preferred embodiment of the present system is designed
for overlapping image pairs to share a common nodal position, in general
the system is forgiving of nodal point changes, provided the changes are
not excessive, e.g., a 1% shift in nodal point should still allow
satisfactory panoramas to be formed. The one-percent shift is not overly
cumbersome when taking long range photos, e.g., from the height of the 300
m Eiffel Tower a 1% shift will allow up to 3 meters (nearly 10 ft) of
shifting of camera nodal position when taking photos at the base, which
can accommodate an amateur photographer taking pictures without a tripod.
Turning attention again to the three modules labeled Pairwise Registration,
Calibration and Global Optimization and Blending, as illustrated in FIG.
2, the authoring aspect of the invention will be further described.
I. Pairwise Registration
To find a solution to the first sub-problem posed in constructing a
panorama from rectilinear images, finding the projective registrations of
overlapping images, one must pairwise register the two images. Pairwise
registration can be thought of as synonymous to finding an estimate of the
projective transformation relating two given overlapping rectilinear
images. The projective transformation may be represented by a particular
parametrized projective matrix, that is parametrized by a typical
canonical number (usually 8 or 9) projective parameters, e.g., 3D rotation
parameters (pan, tilt roll), center of projection of images, ratio of
focal lengths, and the like. The projective matrix can be defined as a
particular case of a three-dimensional affine transformation, a
transformation that effects rotation, scaling, shear and translation, with
the restriction that camera motions are rotational. As is known per se,
transformations are important tools in generating three-dimensional
scenes, in moving objects around in an environment, and in constructing a
two-dimensional view of the environment.
The mathematics described herein are but one representation for the method
carried out by the apparatus of the present system for one or more
preferred embodiments of the invention. In mathematics the same phenomena
can be represented in different symbolic notation--which often appear to
the untrained eye as being radically different from one another--without
changing the nature of the phenomena described. For example, as explained
above and represented further below, the projective transformation of a
pair of images can be more particularly characterized as a projective
transformation that is particularized as a projective matrix parametrized
by a certain projective matrix parameters (typically having 8 or 9
projective parameters, as explained herein, such as pan, tilt, roll,
center of projection of the images, ratio of focal lengths, and the like).
However, this particular representation does not preclude the projective
transformation from being reduced to practice using the teachings of the
present invention by alternate equivalent methods or other
representations, other than as represented by a particular parametric
matrix representation, without loss of generality from the way the
invention is described herein. Further, and concomitantly, the
transformations involved with the present invention may be described in
alternate notation, using for example Euler angles or quaternions, without
detracting from the spirit and scope of the invention. It is to be
understood from the teachings of the present disclosure that the
description of a projective matrix also includes these other
representations. By the same token, programming constructs such as data
structures and classes are typically realized in binary code, rather than
abstract mathematical notations, and, as such, constitute the machine
readable representations of the constructs. The representation of such
constructs in this form do not result in any loss of generality of the
representation of the underlying invention as described herein.
Regarding local pairwise registration in general, if one restricts camera
motions to be rotational only, the 2D warping between images i, j, is
strictly projective in absence of lens distortions, and given by, [Eq.
(1)]
##EQU1##
where:[x.sub.i y.sub.i z.sub.i ].sup.T are the homogeneous coordinates of
pixel locations (with the convention that column vectors represent
three-dimensional points). In the following description, the vector
X.sub.i represents the homogeneous coordinates, and the matrix M.sub.ij
represents the matrix that transforms coordinates from image j to image i.
Due to the scale ambiguity in the projective matrix, the last parameter
m.sub.8 in the projective matrices is set to equal 1.
The objective of local pairwise registration is to estimate the projective
matrix given two overlapping images. The projective matrix is initialized
by the camera internal and external parameters, e.g., [Eq. (2)]
M.sub.ij =T.sup.-1 (p.sub.i,q.sub.i)T(p.sub.j,q.sub.j) (2)
[Eq. (3)]
where
##EQU2##
where [C.sub.x.sup.i,C.sub.y.sup.i ],a.sub.i,f.sub.i are the image center
position, the aspect ratio and the focal length, respectively;
p.sub.i =[a.sub.i, f.sub.i, C.sub.x.sup.i, C.sub.y.sup.i ].sup.T
is the internal parameters vector;
q.sub.i =represents the camera orientation with respect to a common
reference frame; and
R( )=represents the 3.times.3 rotation matrix computed from the orientation
parameters q.sub.i.
Camera internal and external parameters are initialized either
automatically by the computer assuming default values, or manually with
user input.
There are ten parameters in the projective registration: eight independent
parameters in the projective matrix and two parameters to compensate for
brightness and contrast difference between the two images. The
gradient-based optimization minimizes the following objective, as
suggested in box 320 in FIG. 3, by instructing the processor 212 to
perturb the overlapping images stored in memory with various combinations
of overlapping pixels until the below local registration error function
has the smallest value [Eq. 4]:
##EQU3##
where s.sub.ij and b.sub.ij, the exposure parameters, represent the
exposure difference, I.sub.i ( ) and I.sub.j ( ) are pixel intensity
values from the two images, and A.sub. | | |