|
Description  |
|
|
BACKGROUND OF THE INVENTION
1. Technical Field
The invention is related to computer systems for constructing and rendering panoramic mosaic images from a sequence of still images, video images or scanned photographic images or the like.
2. Background Art
Image-based rendering is a popular way to simulate a visually rich tele-presence or virtual reality experience. Instead of building and rendering a complete 3D model of the environment, a collection of images is used to render the scene while
supporting virtual camera motion. For example, a single cylindrical image surrounding the viewer enables the user to pan and zoom inside an environment created from real images. More powerful image-based rendering systems can be built by adding a depth
map to the image or by using a larger collection of images.
The present invention is particularly directed to image-based rendering systems without any depth information, i.e., those which only support user panning, rotation, and zoom. Most of the commercial products based on this idea (such as QuickTime
VR) use cylindrical images with a limited vertical field of view, although newer systems support full spherical maps (e.g., Interactive Pictures, and RealVR). A number of techniques have been developed for capturing panoramic images of real-world
scenes. One way is to record an image onto a long film strip using a panoramic camera to directly capture a cylindrical panoramic image. Another way is to use a lens with a very large field of view such as a fisheye lens. Mirrored pyramids and
parabolic mirrors can also be used to directly capture panoramic images. A less hardware-intensive method for constructing full view panoramas is to take many regular photographic or video images in order to cover the whole viewing space. These images
must then be aligned and composited into complete panoramic images using an image mosaic or "stitching" method. Most stitching systems require a carefully controlled camera motion (pure pan), and only produce cylindrical images.
Cylindrical panoramas are commonly used because of their ease of construction.
To build a cylindrical panorama, a sequence of images is taken by a camera mounted on a leveled tripod. If the camera focal length or field of view is known, each perspective image can be warped into cylindrical coordinates. To build a
cylindrical panorama, 3D world coordinates are mapped to 2D cylindrical screen coordinates using ##EQU1## where .theta. is the panning angle and v is the scanline. Similarly, 3D world coordinates can be mapped into 2D spherical coordinates
.theta.,.phi. using ##EQU2##
Once each input image has been warped, constructing the panoramic mosaics for a leveled camera undergoing a pure panning motion becomes a pure translation problem.
Ideally, to build a cylindrical or spherical panorama from a horizontal panning sequence, only the unknown panning angles need to be recovered. In practice, small vertical translations are needed to compensate for vertical jitter and optical
twist. Therefore, both a horizontal translation t.sub.x and a vertical translation t.sub.y are estimated for each input image.
To recover the translational motion, the incremental translation .delta.t=(.delta.t.sub.x,.delta.t.sub.y) is estimated by minimizing the intensity error between two images, I.sub.0, I.sub.1, ##EQU3## are corresponding points in the two images,
and t=(t.sub.x, t.sub.y) is the global translational motion field which is the same for all pixels. After a first order Taylor series expansion, the above equation becomes ##EQU4## where e.sub.i =I.sub.1 (x'.sub.1)-I.sub.0 (x.sub.i) is the current
intensity or color error, and g.sub.i.sup.T =.gradient.I.sub.1 (x'.sub.1) is the image gradient of I.sub.1 at x'.sub.i. This minimization problem has a simple least-squares solution, ##EQU5## To handle larger initial displacements, a hierarchical
coarse-to-fine optimization scheme is used. To reduce discontinuities in intensity and color between the images being composited, a simple feathering process is applied in which the pixels in each image are weighted proportionally to their distance to
the edge (or more precisely, their distance to the nearest invisible pixel). Once registration is finished, the ends are clipped (and optionally the top and bottom), and a single panoramic image is produced.
Creating panoramas in cylindrical or spherical coordinates has several limitations. First, it can only handle the simple case of pure panning motion. Second, even though it is possible to convert an image to 2D spherical or cylindrical
coordinates for a known tilting angle, ill-sampling at north pole and south pole causes big registration errors. (Note that cylindrical coordinates become undefined as you tilt your camera toward north or south pole.) Third, it requires knowing the
focal length (or equivalently, field of view). While focal length can be carefully calibrated in the lab, estimating the focal length of the lens by registering two or more images using conventional techniques is not very accurate.
The automatic construction of large, high-resolution image mosaics is an active area of research in the fields of photogrammetry, computer vision, image processing, and computer graphics. Image mosaics can be used for many different
applications. The most traditional application is the construction of large aerial and satellite photographs from collections of images. More recent applications include scene stabilization and change detection, video compression and video indexing,
increasing the field of view and resolution of a camera, and even simple photo editing. A particularly popular application is the emulation of traditional film-based panoramic photography with digital panoramic mosaics, for applications such as the
construction of virtual environments and virtual travel. In computer vision, image mosaics are part of a larger recent trend, namely the study of visual scene representations. The complete description of visual scenes and scene models often entails the
recovery of depth or parallax information as well.
In computer graphics, image mosaics play an important role in the field of image-based rendering, which aims to rapidly render photorealistic novel views from collections of real (or pre-rendered) images. For applications such as virtual travel
and architectural walkthroughs, it is desirable to have complete (full view) panoramas, i.e., mosaics which cover the whole viewing sphere and hence allow the user to look in any direction. Unfortunately, most of the results to date have been limited to
cylindrical panoramas obtained with cameras rotating on leveled tripods with carefully designed stages adjusted to minimize motion parallax. This has limited the users of mosaic building ("stitching") to researchers and professional photographers who
can afford such specialized equipment. Present techniques are difficult because generally they require special camera equipment which provides pure panning motion with no motion parallax.
Problems to be Solved by the Invention:
It would be desirable for any user to be able to "paint" a full view panoramic mosaic with a simple hand-held camera or camcorder. However, this requires several problems to be overcome.
First, cylindrical or spherical coordinates should be avoided for constructing the mosaic, since these representations introduce singularities near the poles of the viewing sphere.
Second, accumulated misregistration errors need to be corrected, which are always present in any large image mosaic. For example, if one registers a sequence of images using pairwise alignments, there is usually a gap between the last image and
the first one even if these two images are the same. A simple "gap closing" technique is introduced in the specification below which forces the first and last image to be the same and distributes the resulting corrections across the image sequence.
Unfortunately, this approach works only for pure panning motions with uniform motion steps, a significant limitation.
Third, any deviations from the pure parallax-free motion model or ideal pinhole (perspective) camera model may result in local misregistrations, which are visible as a loss of detail or multiple images (ghosting).
SUMMARY OF THE INVENTION
The specification discloses how to avoid using cylindrical or spherical coordinates for constructing the mosaic by associating a rotation matrix (and optionally a focal length) with each input image, and performing registration in the input
image's coordinate system. Such mosaics are referred to herein as rotational mosaics. The system can use a postprocessing stage to project such mosaics onto a convenient viewing surface, i.e., to create an environment map represented as a
texture-mapped polyhedron surrounding the origin.
The specification discloses a global optimization process, which solves the problem of accumulated misregistration errors, to find the optimal overall registration.
The deghosting method and apparatus claimed in this application solves the problem of loss of detail or image ghosting is solved by computing local motion estimates (block-based optical flow) between pairs of overlapping images, and using these
estimates to warp each input image so as to reduce the misregistration. This is less ambitious than actually recovering a perspective depth value for each pixel, but has the advantage of being able to simultaneously model other effects such as radial
lens distortions and small movements in the image.
The invention is embodied in a deghosting method and apparatus which locally aligns individual images in a set of overlapping images of a mosaic. This is accomplished by determining, at plural predetermined pixel locations of each one of the
images, motions between the one image and other images of the set, combining the motions to produce an estimated motion at each of the plural predetermined pixel locations of the one image, and then warping the one image in accordance with the estimated
motions. Preferably, it is first determined which of the images of the set overlies the one image. This determination is made by determining alignment transformations relating the images to a 3-dimensional coordinate system and then inferring mutual
overlap between images from the transformations. The images are resampled in accordance with these alignment transformations.
The warping of each image is accomplished by constructing a mapping of warped pixel locations from the estimated motions and then resampling the one image at the warped pixel locations. The mapping is preferably a reverse mapping of pixels in an
unwarped version of the one image.
Preferably, the plural predetermined pixels are a sparse set of pixel locations of the one image and the mapping is constructed by computing an estimated motion for a dense set of pixel locations in the one image based upon the estimated motions
of the sparse set of predetermined pixel locations.
In the preferred embodiment, the image is divided into plural patches, and a center pixel location of each patch is one of the predetermined (sparse) pixel locations. The dense set of pixel locations preferably constitutes all pixel locations in
the one image.
The estimated motion for the dense set of pixel locations is computed by computing, for each pixel location of the dense set of pixel locations, an average of estimated motions of neighboring ones of the plural predetermined pixel locations
weighted in accordance with their proximity to the one pixel location. Combining the motions to produce an estimated motion is accomplished by computing a normalized average of the motions. Preferably, the process includes downweighting the motions in
the average by a fraction proportional to the number n of the motions in the average. This fraction is selected to avoid or minimize overcorrection, and is preferably 1/(n+1). The foregoing process is performed for all of the images simultaneously.
The invention can be used to create full view panoramic mosaics from image sequences. Unlike current panoramic stitching methods, which usually require pure horizontal camera panning, the disclosed system does not require any controlled motions
or constraints on how the images are taken (as long as there is no strong motion parallax). For example, images taken from a hand-held digital camera can be stitched seamlessly into panoramic mosaics.
By taking as many images as needed, image mosaics can be constructed which cover as large a field of view as desired, e.g., a complete environment map. Because the image mosaics is represented in the invention using a set of transforms, there
are no singularity problems such as those existing at the top and bottom of cylindrical or spherical maps. This method is fast and robust because it directly recovers 3D rotations instead of general 8 parameter planar perspective transforms. By mapping
the mosaic onto an arbitrary texture-mapped polyhedron surrounding the origin, the virtual environment can be viewed or explored using standard 3D graphics viewers and hardware without requiring special-purpose players. Once a mosaic has been
constructed, it can, of course, be mapped into cylindrical or spherical coordinates, and displayed using a special purpose viewer. Such specialized representations are not necessary, and represent just a particular choice of geometry and texture
coordinate embedding. Instead, using a mapping process described herein, the mosaic can be converted to an environment map, i.e., by mapping the mosaic onto any texture-mapped polyhedron surrounding the origin. This allows the use of standard 3D
graphics APIs and 3D model formats, and allows the use of 3D graphics accelerators for texture mapping. The mapping process can be employed in a rendering process that can be just as fast as special-purpose viewers. Furthermore, the mapping process can
take advantage of 3D graphics (rendering) hardware, support multi-resolution textures, and allow for easy integration with regular 3D graphics.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating the operation of a system embodying the invention.
FIGS. 2A and 2B is a block diagram illustrating apparatus embodying a system of the invention.
FIG. 3 illustrates apparatus including a camera employed for the acquisition of images to form a panoramic image.
FIG. 4 is a detailed view of a portion of the apparatus of FIG. 3.
FIG. 5 illustrates a perspective warping and image.
FIG. 6 illustrates a sequence of images obtained in an incremental image alignment method.
FIG. 7 illustrates a pixel resampling problem encountered in the incremental image alignment method.
FIG. 8 is a flow diagram of the incremental alignment method.
FIG. 9 is a block diagram illustrating the signal flow in the incremental alignment method of FIG. 8.
FIG. 10 is a flow diagram illustrating a coarse-to-fine resolution hierarchy employed in one embodiment.
FIG. 11 is a flow diagram of the incremental 3D rotation alignment method.
FIG. 12 is a block diagram illustrating the signal flow in the incremental 3D rotation alignment method of FIG. 11.
FIG. 13 is a flow diagram illustrating a focal length estimation method which can be employed in carrying out the rotation alignment method.
FIG. 14 is a signal flow diagram illustrating a modification of the signal flow of FIG. 11 for carrying out a patch-based alignment.
FIG. 15 illustrates adjustment of a bundle of direction rays in accordance with a first approach to a global block alignment method.
FIG. 16 illustrates adjustment of a bundle of direction rays in accordance with a second approach (corresponding to Equation 33 below) to the global block alignment method.
FIG. 17 illustrates pair-wise adjustment of a bundle of direction rays in accordance with the preferred embodiment (corresponding to Equation 36 below) of the global block adjustment method.
FIG. 18 is a flow diagram illustrating the global block adjustment method.
FIG. 19 illustrates pair-wise motion estimation between pixels and average motion computation in accordance with a deghosting method.
FIG. 20 is a flow diagram illustrating a preferred embodiment of the deghosting method.
FIGS. 21 and 22 illustrate complementary warpings of each image in an overlapping pair of images in accordance with the deghosting method of FIG. 20.
FIG. 23 illustrates the averaging of plural pair-wise motion estimates in accordance with one aspect of the deghosting method.
FIG. 24 illustrates an array of patch-based motion estimates in accordance with the deghosting method.
FIG. 25A illustrates an inverse mapping method in accordance with a preferred embodiment of the deghosting method in which per-patch motion estimates are obtained.
FIG. 25B illustrates how the per-patch motion estimates are interpolated to produce a finer grid of per-pixel motion estimates.
FIG. 26 is a signal flow diagram corresponding to one embodiment for carrying out the deghosting method.
FIGS. 27 and 28 illustrate a cubic 3D model and corresponding 2D texture map for carrying out one embodiment of a texture mapping method.
FIG. 29 illustrates a mapping between triangular vertices of a 3D model and a 2D texture map employed in carrying out one embodiment of the texture mapping.
FIGS. 30 and 31 illustrate a spherical 3D model and corresponding 2D texture map for carrying out one embodiment of the texture mapping method.
FIG. 32 is a flow diagram illustrating the texture map computation process.
FIGS. 33A-37D and 38 are panoramic images obtained using different aspects of the present invention in accordance with respective working examples.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS INTRODUCTION:
This specification describes how to associate a rotation matrix (and optionally a focal length) with each input image, rather than explicitly projecting all of the images onto a common surface (e.g., a cylinder). In order to reduce accumulated
registration errors, a global alignment (block adjustment) is applied to the whole sequence of images, which results in an optimal image mosaic (in the least-squares sense). To compensate for small amoints of motion parallax introduced by translations
of the camera and other unmodeled distortions, a local alignment (deghosting) method is employed which warps each image based on the results of pairwise local image registrations. Combining both global and local alignment significantly improves the
quality of the image mosaics, thereby enabling the creation of full view panoramic mosaics with hand-held cameras.
The overall flow of processing in the system is illustrated in FIG. 1. First, if the camera intrinsic parameters are unknown, the user creates a small mosaic using a planar perspective motion model, from which a rough estimate of the focal
length is computed (block 110 of FIG. 1). Next, a complete initial panoramic mosaic is assembled sequentially by adding one image at a time and adjusting its position using the rotational motion model (block 120 of FIG. 1). Then, global alignment
(block adjustment) is invoked to modify each image's transformation (and focal length) such that the global error across all possible overlapping image pairs is minimized (block 130 of FIG. 1). This stage also removes any large inconsistencies in the
mosaic, e.g., the "gaps" that might be present in a panoramic mosaic assembled using the sequential method. Then, a local alignment (deghosting) method is invoked to reduce any local misregistration errors (block 140 of FIG. 1). The final mosaic can be
stored as a collection of images with associated transformations, or optionally converted into a texture-mapped polyhedron (environment map) (block 150 of FIG. 1). By mapping the mosaic onto an arbitrary texture-mapped polyhedron surrounding the origin,
the virtual environment is viewed using standard 3D graphics viewers and hardware (block 160) without requiring special purpose players.
Exemplary Operating Environment
FIG. 2A and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the invention may be implemented. Although not required, the invention will be described in the general
context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement
particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable
consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a
communications network. In a distributed computing environment, program modules may be located both local and remote memory storage devices.
With reference to FIG. 2A, an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional personal computer 20, including a processing unit 21, a system memory 22, and a system bus 23
that couples various system components including the system memory to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a
variety of bus architectures. The system memory includes read only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system 26 (BIOS), containing the basic routines that helps to transfer information between elements within the
personal computer 20, such as during start-up, is stored in ROM 24. The personal computer 20 further includes a hard disk drive 27 for reading from and writing to a hard disk, not shown, a magnetic disk drive 28 for reading from or writing to a
removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD ROM or other optical media. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the
system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions,
data structures, program modules and other data for the personal computer 20. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, it should be appreciated by those
skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read only
memories (ROM), and the like, may also be used in the exemplary operating environment.
A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35, one or more application programs 36, other program modules 37, and program data 38. A user may
enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These
and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or a universal serial bus (USB). A
monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as
speakers and printers.
The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another personal computer, a server, a router, a network PC, a
peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in FIG. 2A. The logical connections depicted
in FIG. 2A include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and Internet.
When used in a LAN networking environment, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54
or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program
modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link
between the computers may be used.
The hardware or apparatus of FIG. 2A may be modified to include additional features in order to carry out the present invention, as illustrated in FIG. 2B. In FIG. 2B, a camera 210 (such as a digital/electronic still or video camera or film or
photographic scanner) captures a sequence of images 220 which are then stored in an image memory 230. The method or processes of the invention are stored as a sequence of program instructions in a program memory 240 for execution by a microprocessor
250. The microprocessor 250, in carrying out the instructions it fetches from the program memory 240, fetches individual ones of the images stored in the image memory 230 and computes a transformation or camera matrix M for each image. The collection
of camera matrices M generated by the microprocessor 250 are stored in an image warp memory 260. From these, the microprocessor, in carrying out other program instructions stored in the program memory 240, constructs an environment model-based texture
map which is stored in an environment/texture map memory 270. The microprocessor 250 can then employ standard graphics techniques in taking the texture maps stored in the texture map memory 270 to create panoramic images which it displays in an image
display device or medium 280.
ALIGNMENT FRAMEWORK AND MOTION MODELS:
Incremental Alignment
In the invention, image mosaics are represented as collections of images with associated geometrical transformations. The first stage of the mosaic construction method computes an initial estimate for the transformation associated with each
input image. This is done by processing each input image in turn, and finding the best alignment between this image and the mosaic constructed from all previous images. (To speed up this part, one option is to register with only the previous image in
the sequence.) This reduces the problem to that of parametric motion estimation. A well-known hierarchical motion estimation framework is employed consisting of four parts: (i) pyramid construction, (ii) motion estimation, (iii) image warping, and (iv)
coarse-to-fine refinement. An important element of this framework, which the present invention exploits, is to perform the motion estimation between the current new input image and a warped (resampled) version of the mosaic. This allows us to estimate
only incremental deformations of images (or equivalently, instantaneous motion), which greatly simplifies the computations required in the gradient descent method.
In the invention, the camera motion, unlike many prior art methods, is not at all restricted to a simple panning motion, but is permitted to be any suitable three-dimensional rotation, a significant advantage. However, for the sake of clarity of
illustration, FIG. 3 illustrates a tutorial example in which the camera motion is restricted to a simple pure panning motion. Referring to FIGS. 3 and 4, a camera 310 having its optical center fixed at point C (FIG. 3) captures a sequence of 2D still
images I.sub.0, I.sub.1, I.sub.2, I.sub.3, . . . as it pans, the center rays of these images being focused on 3D points P.sub.0, P.sub.1, P.sub.2, P.sub.3, . . . at a focal length f from the optical center point C. The points P.sub.i are defined
relative to a fixed 3D world coordinate system P.sub.x, P.sub.y, P.sub.z indicated in the drawings. Each 2D image I has its own 2D x-y coordinate system whose origin (x,y)=(0,0) may be considered to be at the center of the image. Thus, for example,
image I.sub.0 may have coordinates x=(xy) while image I.sub.1 may have coordinates x'=(x',y'). The object is to register all of the images so that a panorama 320 may be constructed. An example of how perspective transforms would warp one of the images
510 to register it with another image 520 with which it overlaps is illustrated in FIG. 5. FIG. 5 illustrates the "keystoning" effect of the warping of the image 510 required to produce the desired registration between the two images. The problem is
that the transformation (or the camera motion) between the two images I.sub.0 and I.sub.1 is unknown so that a transformation between x and x' must be inferred from the images themselves before the images can be registered.
In order to register the two images I.sub.0 (x) and I.sub.1 (x'), where x' is computed using some parametric motion model m, i.e., x'=f(x;m), the warped image is first computed in accordance with
In the current implementation, bilinear pixel resampling is employed as illustrated in FIG. 7. The trick is then to find a deformation of I.sub.1 (x) which brings it into closer registration with I.sub.0 (x) and which can also be used to update
the parameter m. The warp/register/update loop can then be repeated. This specification describes how this can be done for two different transformation models, namely 8-parameter planar perspective transformations, and 3D rotations, and how this can be
generalized to other motion models and parameters.
Given two images taken from the same viewpoint (optical center) but in potentially different directions (and/or with different intrinsic parameters), the relationship between two overlapping images can be described by a planar perspective motion
model. The planar perspective transformation warps an image into another using ##EQU6## where x=(x,y,1) and x'=(x',y',1) are homogeneous or perspective coordinates, and the symbol indicates equality up to scale. (Since the M matrix is invariant to
scaling, there are only 8 independent parameters.) (It may be instructive to note here that the vector m referred to in Equation 1 is a column vector consisting of the elements m.sub.0, m.sub.1, . . . , m.sub.8 of the matrix M of Equation 2.) Equation 2
can be re-written as ##EQU7## To recover the parameters, the process iteratively updates the transformation matrix using
where ##EQU8## Resampling image I.sub.1 (x') with the new transformation x'.about.(I+D)Mx is the same as warping the resampled image I(x) by x".about.(I+D)x, (ignoring errors introduced by the double resampling operation), i.e., ##EQU9##
The matrix M may be thought of as producing a large change in the image as suggested in FIG. 6 in which a relatively large right-to-left corrective movement is observed from I.sub.1 (x') to I.sub.1 (x), while the subsequent incremental
deformation D induces an additional relatively small right-to-left movement from I.sub.1 (x) to I.sub.1 (x'). If this deformation by D provides the final correction, then the placement of a fixed object 610 relative to the second image's re-warped
coordinate system (x",y") is the about same as the object's placement relative to the first image's coordinate system (x,y), as indicated in FIG. 6.
The transformation of the second image's coordinate system is not likely to shift it by an integral number of image pixels. More likely, each shifted pixel coordinate of the second image will lie somewhere between neighboring pixel coordinates
of the first image. FIG. 7 illustrates how a pixel being sampled at a non-integral location can be computed as a bilinear blend of the original pixels. In order to warp images, the pixels of the two images are best resampled using techniques well-known
in the art, such as bilinear resampling of the type disclosed in G. Wolberg, Digital Image Warping, IEEE Computer Society Press, Los Alamitos, Calif., 1990.
In order to compute an appropriate value of the deformation parameter d, the process minimizes the squared error metric ##EQU10## where e.sub.i =I.sub.1 (x.sub.i)-I.sub.0 (x.sub.i) is the intensity or color error (one can use either intensity or
three channels of color errors), g.sub.i.sup.T =.quadrature.I.sub.1 (x.sub.i) is the image gradient of I.sub.1 at x.sub.i, d=(d.sub.0, . . . , d.sub.8) is the incremental motion parameter vector, and J.sub.i =J.sub.d (x.sub.i), where ##EQU11## is the
Jacobian of the resampled point coordinate x" with respect to d. (The entries in the Jacobian correspond to the optical flow induced by the instantaneous motion of a plane in 3D. It should be noted that while the Jacobian of Equation 11 is the Jacobian
of the warped or resampled coordinate system, its expression on the right-harid side of Equation 11 is entirely in terms of x or (x,y), which is the original or unwarped coordinate system, and therefore it is a function of the unwarped coordinates.) This
least-squares problem has a simple solution through the normal equations
where ##EQU12## is the accumulated gradient or residual. These equations can be solved using a symmetric positive definite (SPD) solver such as Cholesky decomposition of the type described in Press et al., Numerical Recipes in C: The Art of
Scientific Computing, Cambridge University Press, Cambridge, England, second edition, 1992. Note that for this problem, the matrix A is singular without elimination of one of the three parameters d.sub.0, d.sub.4, d.sub.8. In practice, we set d.sub.8
=0, and therefore only solve an 8-by-8 system. Translational motion is a special case of the general 8-parameter transformation where J is a 2-by-2 identity matrix because only the two parameters m.sub.2 and m.sub.5 are used. The translational motion
model can be used to construct cylindrical and spherical panoramas if each image is warped to cylindrical or spherical coordinates image using a known focal length. Details of how to warp and construct cylindrical and spherical panoramas are well-known
in the art.
The foregoing method may be summarized with respect to the steps of FIG. 8 and the signal flow diagram of FIG. 9 as follows. A pair of overlapping images 905, 910 are input (step 810) and a transform matrix 915 of the second image is initialized
(step 820).
An incremental deformation of the second image to improve the registration between the two images is computed (block 830) as will be described below in more detail with reference to FIG. 9. A matrix multiplier 917 updates the transformation
matrix with the deformation (step 840). A matrix multiplier 920 transforms the current image coordinates with the transform matrix to new coordinates with which a warp operator 925 resamples the second image to produce a warped image (860).
In order to compute the deformation, a subtractor 930 subtracts the warped image from the first image to produce an error. A gradient operator 935 produces the gradient with respect to the new coordinates of the warped image while an operator
940 computes the Jacobian matrix of the new or warped coordinate system with respect to the elements of the deformation matrix. As noted previously herein, the Jacobian of the warped coordinate system is expressed as a function of the unwarped
coordinates, and therefore in FIG. 9 the Jacobian-computing operator 940 receives the unwarped coordinate or vector x as an input. One advantage of this feature is that the Jacobian operator is a function of the current (unwarped) coordinate system
instead of the new coordinate system, which reduces the computational burden. A multiplier 945 multiplies each error by the product of the corresponding gradients and Jacobians, and a summer 950 accumulates the resulting products to produce a residual
vector b. A multiplier 955 obtains products of the gradients and Jacobians which are transposed by a transposer 960, after which another multiplier 965 combines the transposed and untransposed gradient-Jacobian products. These combined products are
accumulated in a summer 970 to produce the Hessian matrix A. A normal equation solver computes the individual elements of the deformation matrix, which is then reconstructed by a reconstructor 980.
If a stopping criteria has been reached (YES branch of block 870), then the latest versions of the warped second image and of the transformation matrix are stored or output. Otherwise (NO branch of block 870), the next iteration begins starting
| | |