WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Deghosting method and apparatus for construction of image mosaics    
United States Patent5986668   
Link to this pagehttp://www.wikipatents.com/5986668.html
Inventor(s)Szeliski; Richard (Bellevue, WA); Shum; Heung-Yeung (Bellevue, WA)
AbstractThe invention is embodied in a deghosting method and apparatus which locally aligns individual images in a set of overlapping images of a mosaic. This is accomplished by determining, at plural predetermined pixel locations of each one of the images, motions between the one image and other images of the set, combining the motions to produce an estimated motion at each of the plural predetermined pixel locations of the one image, and then warping the one image in accordance with the estimated motions. Preferably, it is first which of the images of the set overlies the one image. This determination is made by determining alignment transformations relating the images to a 3-dimensional coordinate system and then inferring mutual overlap between images from the transformations. The images are resampled in accordance with these alignment transformations. The warping of each image is accomplished by constructing a mapping of warped pixel locations from the estimated motions and then resampling the one image at the warped pixel locations. The mapping is preferably a reverse mapping of pixels in an unwarped version of the one image.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Inventor     Szeliski; Richard (Bellevue, WA); Shum; Heung-Yeung (Bellevue, WA)
Owner/Assignee     Microsoft Corporation (Redmond, WA)
Patent assignment
All assignments
Publication Date     November 16, 1999
Application Number     08/905,103
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     August 1, 1997
US Classification    
Int'l Classification    
Examiner     Nguyen; Phu K.
Assistant Examiner    
Attorney/Law Firm     Michaelson; Peter L. Michaelson & Wallace Wallace; Robert M.
Address
Parent Case    
Priority Data    
USPTO Field of Search    
Patent Tags     deghosting construction image mosaics
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5708826
Ikeda
715/501.1
Jan,1998

[0 after 0 votes]
5553221
Reimer
715/720
Sep,1996

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A method for locally aligning individual images in a set of overlapping images of a mosaic, comprising:

selecting one of said images;

determining, at plural predetermined pixel locations of said one image, motions between said one image and other images of said set;

combining said motions to produce an estimated motion at each of said plural predetermined pixel locations of said one image; and

warping said one image in accordance with the estimated motions.

2. The method of claim 1 wherein the determining step is preceded by determining which of said images of said set overlies said one image.

3. The method of claim 2 wherein the step of determining which of said images overlies said one image comprises:

determining alignment transformations relating said images to a 3-dimensional coordinate system; and

inferring mutual overlap between images from said transformations.

4. The method of claim 3 further comprising:

initially resampling said images in accordance with said alignment transformations.

5. The method of claim 3 wherein the step of determining alignment transformations comprises performing an incremental alignment process.

6. The method of claim 5 wherein said incremental alignment process comprises a patch-based alignment process.

7. The method of claim 6 wherein said a patch-based alignment process comprises computing an incremental deformation of said one image.

8. The method of claim 7 wherein said incremental deformation comprises a planar perspective transform.

9. The method of claim 7 wherein said incremental deformation comprises a 3-dimensional rotational transform.

10. The method of claim 1 wherein the step of warping comprises:

constructing a mapping of warped pixel locations from said estimated motions; and

resampling said one image at said warped pixel locations.

11. The method of claim 10 wherein said mapping comprises a reverse mapping of pixels in an unwarped version of said one image.

12. The method of claim 10 wherein said plural predetermined pixels are a sparse set of pixel locations of said one image and wherein the step of constructing a mapping comprises:

computing an estimated motion for a dense set of pixel locations in said one image based upon the estimated motions of said sparse set of predetermined pixel locations.

13. The method of claim 12 further comprising dividing said one image into plural patches, and assigning a center pixel location in each patch to be a corresponding one of said plural predetermined pixel locations.

14. The method of claim 13 wherein said dense set of pixel locations comprise all pixel locations in said one image.

15. The method of claim 12 wherein the step of computing an estimated motion for said dense set of pixel locations comprises:

computing, for each pixel location of said dense set of pixel locations, an average of estimated motions of neighboring ones of said plural predetermined pixel locations weighted in accordance with their proximity to the one pixel location.

16. The method of claim 1 wherein the step of combining said motions to produce an estimated motion comprises computing a normalized average of said motions.

17. The method of claim 16 further comprising downweighting said motions in said average by a fraction proportional to the number n of said motions in said average.

18. The method of claim 17 wherein said fraction is 1/(n+1).

19. The method of claim 1 wherein said steps of determining, combining and warping are performed for all of said images simultaneously.

20. A system for locally aligning individual images in a set of overlapping images of a mosaic, comprising:

a set of relative motion estimators which estimate relative motions between respective pairs of said images;

for each one of said images, an averager which computes an average of a set of said relative motions said one image; and

for each one of said averagers, a local warper which warps said one image in accordance with said average.

21. The system of claim 20 wherein said averager comprises

summer which adds said relative motions to produce a sum; and

a divider which divides said sum by a normalization factor.

22. The system of claim 21 wherein said normalization factor is proportional to the number n of images comprising said respective pairs.

23. The system of claim 22 wherein said fraction is 1/(n+1).

24. The system of claim 23 wherein said sparse data interpolator comprises a weighter which weights said relative motions in accordance with the proximities of the corresponding sparse set of pixel locations to the one pixel location of said dense set of pixel locations.

25. The system of claim 20 wherein said relative motions are of a sparse set of pixel locations in said one image.

26. The system of claim 25 further comprising:

a sparse data interpolator for computing, from said relative motions of said sparse set of pixel locations, a motion estimate for each one of a dense set of pixel locations, and wherein said local warper receives as its input the motion estimates for said dense set of pixel locations and said one image.

27. The system of claim 26 wherein said one image is divisible into contiguous patches and wherein each pixel location of said sparse set of pixel locations lies in a respective one of said patches.

28. The system of claim 27 wherein each pixel location of said sparse set of pixel locations is a center pixel location of a respective one of said patches.

29. A method for aligning individual images in a set of overlapping images of a mosaic, comprising:

determining pair-wise estimates of motions between said one image and other images of said set; and

warping said one image in accordance with said pair-wise estimates of motions.

30. The method of claim 29 wherein the warping step comprises:

combining said motions to produce an estimated motion of said one image; and

computing a warp of said one image from said estimate motion.

31. The method of claim 30 wherein the step of computing a warp comprises:

constructing a mapping of warped pixel locations from said estimated motion of said one image; and

resampling said one image at said warped pixel locations.

32. The method of claim 36 wherein said mapping comprises a reverse mapping of pixels in an unwarped version of said one image.

33. The method of claim 30 wherein the step of combining said motions to produce an estimated motion comprises computing a normalized average of said motions.

34. The method of claim 33 further comprising downweighting said motions in said average by a fraction proportional to the number n of said motions in said average.

35. The method of claim 34 wherein said fraction is 1/(n+1).

36. The method of claim 29 wherein the determining step is preceded by determining which of said images of said set overlies said one image.

37. The method of claim 36 wherein the step of determining which of said images overlies said one image comprises:

determining alignment transformations relating said images to a 3-dimensional coordinate system; and

inferring mutual overlap between images from said transformations.

38. The method of claim 37 further comprising:

initially resampling said images in accordance with said alignment transformations.

39. The method of claim 37 wherein the step of determining alignment transformations comprises performing an incremental alignment process.

40. The method of claim 39 wherein said incremental deformation comprises a planar perspective transform.

41. The method of claim 39 wherein said incremental deformation comprises a 3-dimensional rotational transform.

42. The method of claim 39 wherein said incremental alignment process comprise dividing said one image into plural patches and performing a patch-based alignment process.

43. The method of claim 42 wherein the step of determining motions between said one image and other images of said set determines said motions at plural predetermined pixel locations of said one image, said plural predetermined pixel location being center pixel locations in respective ones of said patches.

44. The method of claim 43 wherein the step of warping comprises:

constructing a mapping of warped pixel locations from said estimated motions; and

resampling said one image at said warped pixel locations.

45. The method of claim 44 wherein the step of constructing a mapping comprises:

computing an estimated motion for a dense set of pixel locations in said one image based upon the estimated motions of said sparse set of predetermined pixel locations.

46. The method of claim 45 wherein said dense set of pixel locations comprise substantially all pixel locations in said one image.

47. The method of claim 45 wherein the step of computing an estimated motion for said dense set of pixel locations comprises:

computing, for each pixel location of said dense set of pixel locations, an average of estimated motions of neighboring ones of said plural predetermined pixel locations weighted in accordance with their proximity to the one pixel location.

48. The apparatus of claim 45 wherein said processor computes an estimated motion for said dense set of pixel locations in that said processor:

computes, for each pixel location of said dense set of pixel locations, an average of estimated motions of neighbors ones of said plural predetermined pixel locations weighted in accordance with their proximity to the one pixel location.

49. The method of claim 29 wherein the steps of determining and warping are performed for all of said images simultaneously.

50. Apparatus for use in aligning individual images in a set of overlapping images of a mosaic, comprising:

a processor;

memory having executable instructions stored therein; and,

wherein the processor, in response to the instructions stored in the memory:

determines pair-wise estimates of motions between said one image and other images of said set; and

warps said one image in accordance with said pair-wise estimates of motions.

51. The apparatus of claim 50 wherein said processor warps said one image by:

combining said motions to produce an estimated motion of said one image.

52. The apparatus of claim 50 wherein the processor, in further response to said instructions, determines which of said images of said set overlies said one image prior to determining said pair-wise estimates.

53. The apparatus of claim 52 wherein said processor determines which of said images overlies said one image in that said processor:

determines alignment transformations relating said images to a 3-dimensional coordinate system; and

infers mutual overlap between images from said transformations.

54. The apparatus of claim 53 wherein said processor in further response to said instructions:

initially resamples said images in accordance with said alignment transformations.

55. The apparatus of claim 50 wherein said processor warps said one image in that said processor:

constructs a mapping of warped pixel locations from said estimated motions; and

resamples said one image at said warped pixel locations.

56. The apparatus of claim 55 wherein said mapping comprises a reverse mapping of pixels in an unwarped version of said one image.

57. The apparatus of claim 50 wherein said processor combines said motions to produce an estimated motion by computing a normalized average of said motions.

58. The apparatus of claim 57 wherein said processor in further response to said instructions downweights said motions in said average by a fraction proportional to the number n of said motions in said average.

59. The apparatus of claim 58 wherein said fraction compensates for overcorrection in said average.

60. The apparatus of claim 59 wherein said fraction is 1/(n+1).

61. The apparatus of claim 50 wherein said processor in further response to said instructions selects a next one of said images and, for said next one image, determines said pair-wise motions and warps said one image.

62. The apparatus of claim 50 said processor determines motions between said one image and other images of said set by determining said motions at plural predetermined pixel locations of said one image.

63. The apparatus of claim 62 wherein said processor warps said one image in that said processor:

constructs a mapping of warped pixel locations from said estimated motions; and

resamples said one image at said warped pixel locations.

64. The apparatus of claim 63 wherein said processor constructs a mapping by computing an estimated motion for a dense set of pixel locations in said one image based upon the estimated motions of said sparse set of predetermined pixel locations.

65. The apparatus of claim 64 wherein said dense set of pixel locations comprise substantially all pixel locations in said one image.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

1. Technical Field

The invention is related to computer systems for constructing and rendering panoramic mosaic images from a sequence of still images, video images or scanned photographic images or the like.

2. Background Art

Image-based rendering is a popular way to simulate a visually rich tele-presence or virtual reality experience. Instead of building and rendering a complete 3D model of the environment, a collection of images is used to render the scene while supporting virtual camera motion. For example, a single cylindrical image surrounding the viewer enables the user to pan and zoom inside an environment created from real images. More powerful image-based rendering systems can be built by adding a depth map to the image or by using a larger collection of images.

The present invention is particularly directed to image-based rendering systems without any depth information, i.e., those which only support user panning, rotation, and zoom. Most of the commercial products based on this idea (such as QuickTime VR) use cylindrical images with a limited vertical field of view, although newer systems support full spherical maps (e.g., Interactive Pictures, and RealVR). A number of techniques have been developed for capturing panoramic images of real-world scenes. One way is to record an image onto a long film strip using a panoramic camera to directly capture a cylindrical panoramic image. Another way is to use a lens with a very large field of view such as a fisheye lens. Mirrored pyramids and parabolic mirrors can also be used to directly capture panoramic images. A less hardware-intensive method for constructing full view panoramas is to take many regular photographic or video images in order to cover the whole viewing space. These images must then be aligned and composited into complete panoramic images using an image mosaic or "stitching" method. Most stitching systems require a carefully controlled camera motion (pure pan), and only produce cylindrical images.

Cylindrical panoramas are commonly used because of their ease of construction.

To build a cylindrical panorama, a sequence of images is taken by a camera mounted on a leveled tripod. If the camera focal length or field of view is known, each perspective image can be warped into cylindrical coordinates. To build a cylindrical panorama, 3D world coordinates are mapped to 2D cylindrical screen coordinates using ##EQU1## where .theta. is the panning angle and v is the scanline. Similarly, 3D world coordinates can be mapped into 2D spherical coordinates .theta.,.phi. using ##EQU2##

Once each input image has been warped, constructing the panoramic mosaics for a leveled camera undergoing a pure panning motion becomes a pure translation problem.

Ideally, to build a cylindrical or spherical panorama from a horizontal panning sequence, only the unknown panning angles need to be recovered. In practice, small vertical translations are needed to compensate for vertical jitter and optical twist. Therefore, both a horizontal translation t.sub.x and a vertical translation t.sub.y are estimated for each input image.

To recover the translational motion, the incremental translation .delta.t=(.delta.t.sub.x,.delta.t.sub.y) is estimated by minimizing the intensity error between two images, I.sub.0, I.sub.1, ##EQU3## are corresponding points in the two images, and t=(t.sub.x, t.sub.y) is the global translational motion field which is the same for all pixels. After a first order Taylor series expansion, the above equation becomes ##EQU4## where e.sub.i =I.sub.1 (x'.sub.1)-I.sub.0 (x.sub.i) is the current intensity or color error, and g.sub.i.sup.T =.gradient.I.sub.1 (x'.sub.1) is the image gradient of I.sub.1 at x'.sub.i. This minimization problem has a simple least-squares solution, ##EQU5## To handle larger initial displacements, a hierarchical coarse-to-fine optimization scheme is used. To reduce discontinuities in intensity and color between the images being composited, a simple feathering process is applied in which the pixels in each image are weighted proportionally to their distance to the edge (or more precisely, their distance to the nearest invisible pixel). Once registration is finished, the ends are clipped (and optionally the top and bottom), and a single panoramic image is produced.

Creating panoramas in cylindrical or spherical coordinates has several limitations. First, it can only handle the simple case of pure panning motion. Second, even though it is possible to convert an image to 2D spherical or cylindrical coordinates for a known tilting angle, ill-sampling at north pole and south pole causes big registration errors. (Note that cylindrical coordinates become undefined as you tilt your camera toward north or south pole.) Third, it requires knowing the focal length (or equivalently, field of view). While focal length can be carefully calibrated in the lab, estimating the focal length of the lens by registering two or more images using conventional techniques is not very accurate.

The automatic construction of large, high-resolution image mosaics is an active area of research in the fields of photogrammetry, computer vision, image processing, and computer graphics. Image mosaics can be used for many different applications. The most traditional application is the construction of large aerial and satellite photographs from collections of images. More recent applications include scene stabilization and change detection, video compression and video indexing, increasing the field of view and resolution of a camera, and even simple photo editing. A particularly popular application is the emulation of traditional film-based panoramic photography with digital panoramic mosaics, for applications such as the construction of virtual environments and virtual travel. In computer vision, image mosaics are part of a larger recent trend, namely the study of visual scene representations. The complete description of visual scenes and scene models often entails the recovery of depth or parallax information as well.

In computer graphics, image mosaics play an important role in the field of image-based rendering, which aims to rapidly render photorealistic novel views from collections of real (or pre-rendered) images. For applications such as virtual travel and architectural walkthroughs, it is desirable to have complete (full view) panoramas, i.e., mosaics which cover the whole viewing sphere and hence allow the user to look in any direction. Unfortunately, most of the results to date have been limited to cylindrical panoramas obtained with cameras rotating on leveled tripods with carefully designed stages adjusted to minimize motion parallax. This has limited the users of mosaic building ("stitching") to researchers and professional photographers who can afford such specialized equipment. Present techniques are difficult because generally they require special camera equipment which provides pure panning motion with no motion parallax.

Problems to be Solved by the Invention:

It would be desirable for any user to be able to "paint" a full view panoramic mosaic with a simple hand-held camera or camcorder. However, this requires several problems to be overcome.

First, cylindrical or spherical coordinates should be avoided for constructing the mosaic, since these representations introduce singularities near the poles of the viewing sphere.

Second, accumulated misregistration errors need to be corrected, which are always present in any large image mosaic. For example, if one registers a sequence of images using pairwise alignments, there is usually a gap between the last image and the first one even if these two images are the same. A simple "gap closing" technique is introduced in the specification below which forces the first and last image to be the same and distributes the resulting corrections across the image sequence. Unfortunately, this approach works only for pure panning motions with uniform motion steps, a significant limitation.

Third, any deviations from the pure parallax-free motion model or ideal pinhole (perspective) camera model may result in local misregistrations, which are visible as a loss of detail or multiple images (ghosting).

SUMMARY OF THE INVENTION

The specification discloses how to avoid using cylindrical or spherical coordinates for constructing the mosaic by associating a rotation matrix (and optionally a focal length) with each input image, and performing registration in the input image's coordinate system. Such mosaics are referred to herein as rotational mosaics. The system can use a postprocessing stage to project such mosaics onto a convenient viewing surface, i.e., to create an environment map represented as a texture-mapped polyhedron surrounding the origin.

The specification discloses a global optimization process, which solves the problem of accumulated misregistration errors, to find the optimal overall registration.

The deghosting method and apparatus claimed in this application solves the problem of loss of detail or image ghosting is solved by computing local motion estimates (block-based optical flow) between pairs of overlapping images, and using these estimates to warp each input image so as to reduce the misregistration. This is less ambitious than actually recovering a perspective depth value for each pixel, but has the advantage of being able to simultaneously model other effects such as radial lens distortions and small movements in the image.

The invention is embodied in a deghosting method and apparatus which locally aligns individual images in a set of overlapping images of a mosaic. This is accomplished by determining, at plural predetermined pixel locations of each one of the images, motions between the one image and other images of the set, combining the motions to produce an estimated motion at each of the plural predetermined pixel locations of the one image, and then warping the one image in accordance with the estimated motions. Preferably, it is first determined which of the images of the set overlies the one image. This determination is made by determining alignment transformations relating the images to a 3-dimensional coordinate system and then inferring mutual overlap between images from the transformations. The images are resampled in accordance with these alignment transformations.

The warping of each image is accomplished by constructing a mapping of warped pixel locations from the estimated motions and then resampling the one image at the warped pixel locations. The mapping is preferably a reverse mapping of pixels in an unwarped version of the one image.

Preferably, the plural predetermined pixels are a sparse set of pixel locations of the one image and the mapping is constructed by computing an estimated motion for a dense set of pixel locations in the one image based upon the estimated motions of the sparse set of predetermined pixel locations.

In the preferred embodiment, the image is divided into plural patches, and a center pixel location of each patch is one of the predetermined (sparse) pixel locations. The dense set of pixel locations preferably constitutes all pixel locations in the one image.

The estimated motion for the dense set of pixel locations is computed by computing, for each pixel location of the dense set of pixel locations, an average of estimated motions of neighboring ones of the plural predetermined pixel locations weighted in accordance with their proximity to the one pixel location. Combining the motions to produce an estimated motion is accomplished by computing a normalized average of the motions. Preferably, the process includes downweighting the motions in the average by a fraction proportional to the number n of the motions in the average. This fraction is selected to avoid or minimize overcorrection, and is preferably 1/(n+1). The foregoing process is performed for all of the images simultaneously.

The invention can be used to create full view panoramic mosaics from image sequences. Unlike current panoramic stitching methods, which usually require pure horizontal camera panning, the disclosed system does not require any controlled motions or constraints on how the images are taken (as long as there is no strong motion parallax). For example, images taken from a hand-held digital camera can be stitched seamlessly into panoramic mosaics.

By taking as many images as needed, image mosaics can be constructed which cover as large a field of view as desired, e.g., a complete environment map. Because the image mosaics is represented in the invention using a set of transforms, there are no singularity problems such as those existing at the top and bottom of cylindrical or spherical maps. This method is fast and robust because it directly recovers 3D rotations instead of general 8 parameter planar perspective transforms. By mapping the mosaic onto an arbitrary texture-mapped polyhedron surrounding the origin, the virtual environment can be viewed or explored using standard 3D graphics viewers and hardware without requiring special-purpose players. Once a mosaic has been constructed, it can, of course, be mapped into cylindrical or spherical coordinates, and displayed using a special purpose viewer. Such specialized representations are not necessary, and represent just a particular choice of geometry and texture coordinate embedding. Instead, using a mapping process described herein, the mosaic can be converted to an environment map, i.e., by mapping the mosaic onto any texture-mapped polyhedron surrounding the origin. This allows the use of standard 3D graphics APIs and 3D model formats, and allows the use of 3D graphics accelerators for texture mapping. The mapping process can be employed in a rendering process that can be just as fast as special-purpose viewers. Furthermore, the mapping process can take advantage of 3D graphics (rendering) hardware, support multi-resolution textures, and allow for easy integration with regular 3D graphics.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the operation of a system embodying the invention.

FIGS. 2A and 2B is a block diagram illustrating apparatus embodying a system of the invention.

FIG. 3 illustrates apparatus including a camera employed for the acquisition of images to form a panoramic image.

FIG. 4 is a detailed view of a portion of the apparatus of FIG. 3.

FIG. 5 illustrates a perspective warping and image.

FIG. 6 illustrates a sequence of images obtained in an incremental image alignment method.

FIG. 7 illustrates a pixel resampling problem encountered in the incremental image alignment method.

FIG. 8 is a flow diagram of the incremental alignment method.

FIG. 9 is a block diagram illustrating the signal flow in the incremental alignment method of FIG. 8.

FIG. 10 is a flow diagram illustrating a coarse-to-fine resolution hierarchy employed in one embodiment.

FIG. 11 is a flow diagram of the incremental 3D rotation alignment method.

FIG. 12 is a block diagram illustrating the signal flow in the incremental 3D rotation alignment method of FIG. 11.

FIG. 13 is a flow diagram illustrating a focal length estimation method which can be employed in carrying out the rotation alignment method.

FIG. 14 is a signal flow diagram illustrating a modification of the signal flow of FIG. 11 for carrying out a patch-based alignment.

FIG. 15 illustrates adjustment of a bundle of direction rays in accordance with a first approach to a global block alignment method.

FIG. 16 illustrates adjustment of a bundle of direction rays in accordance with a second approach (corresponding to Equation 33 below) to the global block alignment method.

FIG. 17 illustrates pair-wise adjustment of a bundle of direction rays in accordance with the preferred embodiment (corresponding to Equation 36 below) of the global block adjustment method.

FIG. 18 is a flow diagram illustrating the global block adjustment method.

FIG. 19 illustrates pair-wise motion estimation between pixels and average motion computation in accordance with a deghosting method.

FIG. 20 is a flow diagram illustrating a preferred embodiment of the deghosting method.

FIGS. 21 and 22 illustrate complementary warpings of each image in an overlapping pair of images in accordance with the deghosting method of FIG. 20.

FIG. 23 illustrates the averaging of plural pair-wise motion estimates in accordance with one aspect of the deghosting method.

FIG. 24 illustrates an array of patch-based motion estimates in accordance with the deghosting method.

FIG. 25A illustrates an inverse mapping method in accordance with a preferred embodiment of the deghosting method in which per-patch motion estimates are obtained.

FIG. 25B illustrates how the per-patch motion estimates are interpolated to produce a finer grid of per-pixel motion estimates.

FIG. 26 is a signal flow diagram corresponding to one embodiment for carrying out the deghosting method.

FIGS. 27 and 28 illustrate a cubic 3D model and corresponding 2D texture map for carrying out one embodiment of a texture mapping method.

FIG. 29 illustrates a mapping between triangular vertices of a 3D model and a 2D texture map employed in carrying out one embodiment of the texture mapping.

FIGS. 30 and 31 illustrate a spherical 3D model and corresponding 2D texture map for carrying out one embodiment of the texture mapping method.

FIG. 32 is a flow diagram illustrating the texture map computation process.

FIGS. 33A-37D and 38 are panoramic images obtained using different aspects of the present invention in accordance with respective working examples.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS INTRODUCTION:

This specification describes how to associate a rotation matrix (and optionally a focal length) with each input image, rather than explicitly projecting all of the images onto a common surface (e.g., a cylinder). In order to reduce accumulated registration errors, a global alignment (block adjustment) is applied to the whole sequence of images, which results in an optimal image mosaic (in the least-squares sense). To compensate for small amoints of motion parallax introduced by translations of the camera and other unmodeled distortions, a local alignment (deghosting) method is employed which warps each image based on the results of pairwise local image registrations. Combining both global and local alignment significantly improves the quality of the image mosaics, thereby enabling the creation of full view panoramic mosaics with hand-held cameras.

The overall flow of processing in the system is illustrated in FIG. 1. First, if the camera intrinsic parameters are unknown, the user creates a small mosaic using a planar perspective motion model, from which a rough estimate of the focal length is computed (block 110 of FIG. 1). Next, a complete initial panoramic mosaic is assembled sequentially by adding one image at a time and adjusting its position using the rotational motion model (block 120 of FIG. 1). Then, global alignment (block adjustment) is invoked to modify each image's transformation (and focal length) such that the global error across all possible overlapping image pairs is minimized (block 130 of FIG. 1). This stage also removes any large inconsistencies in the mosaic, e.g., the "gaps" that might be present in a panoramic mosaic assembled using the sequential method. Then, a local alignment (deghosting) method is invoked to reduce any local misregistration errors (block 140 of FIG. 1). The final mosaic can be stored as a collection of images with associated transformations, or optionally converted into a texture-mapped polyhedron (environment map) (block 150 of FIG. 1). By mapping the mosaic onto an arbitrary texture-mapped polyhedron surrounding the origin, the virtual environment is viewed using standard 3D graphics viewers and hardware (block 160) without requiring special purpose players.

Exemplary Operating Environment

FIG. 2A and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located both local and remote memory storage devices.

With reference to FIG. 2A, an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional personal computer 20, including a processing unit 21, a system memory 22, and a system bus 23 that couples various system components including the system memory to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system 26 (BIOS), containing the basic routines that helps to transfer information between elements within the personal computer 20, such as during start-up, is stored in ROM 24. The personal computer 20 further includes a hard disk drive 27 for reading from and writing to a hard disk, not shown, a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD ROM or other optical media. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 20. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read only memories (ROM), and the like, may also be used in the exemplary operating environment.

A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35, one or more application programs 36, other program modules 37, and program data 38. A user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or a universal serial bus (USB). A monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.

The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in FIG. 2A. The logical connections depicted in FIG. 2A include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and Internet.

When used in a LAN networking environment, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

The hardware or apparatus of FIG. 2A may be modified to include additional features in order to carry out the present invention, as illustrated in FIG. 2B. In FIG. 2B, a camera 210 (such as a digital/electronic still or video camera or film or photographic scanner) captures a sequence of images 220 which are then stored in an image memory 230. The method or processes of the invention are stored as a sequence of program instructions in a program memory 240 for execution by a microprocessor 250. The microprocessor 250, in carrying out the instructions it fetches from the program memory 240, fetches individual ones of the images stored in the image memory 230 and computes a transformation or camera matrix M for each image. The collection of camera matrices M generated by the microprocessor 250 are stored in an image warp memory 260. From these, the microprocessor, in carrying out other program instructions stored in the program memory 240, constructs an environment model-based texture map which is stored in an environment/texture map memory 270. The microprocessor 250 can then employ standard graphics techniques in taking the texture maps stored in the texture map memory 270 to create panoramic images which it displays in an image display device or medium 280.

ALIGNMENT FRAMEWORK AND MOTION MODELS:

Incremental Alignment

In the invention, image mosaics are represented as collections of images with associated geometrical transformations. The first stage of the mosaic construction method computes an initial estimate for the transformation associated with each input image. This is done by processing each input image in turn, and finding the best alignment between this image and the mosaic constructed from all previous images. (To speed up this part, one option is to register with only the previous image in the sequence.) This reduces the problem to that of parametric motion estimation. A well-known hierarchical motion estimation framework is employed consisting of four parts: (i) pyramid construction, (ii) motion estimation, (iii) image warping, and (iv) coarse-to-fine refinement. An important element of this framework, which the present invention exploits, is to perform the motion estimation between the current new input image and a warped (resampled) version of the mosaic. This allows us to estimate only incremental deformations of images (or equivalently, instantaneous motion), which greatly simplifies the computations required in the gradient descent method.

In the invention, the camera motion, unlike many prior art methods, is not at all restricted to a simple panning motion, but is permitted to be any suitable three-dimensional rotation, a significant advantage. However, for the sake of clarity of illustration, FIG. 3 illustrates a tutorial example in which the camera motion is restricted to a simple pure panning motion. Referring to FIGS. 3 and 4, a camera 310 having its optical center fixed at point C (FIG. 3) captures a sequence of 2D still images I.sub.0, I.sub.1, I.sub.2, I.sub.3, . . . as it pans, the center rays of these images being focused on 3D points P.sub.0, P.sub.1, P.sub.2, P.sub.3, . . . at a focal length f from the optical center point C. The points P.sub.i are defined relative to a fixed 3D world coordinate system P.sub.x, P.sub.y, P.sub.z indicated in the drawings. Each 2D image I has its own 2D x-y coordinate system whose origin (x,y)=(0,0) may be considered to be at the center of the image. Thus, for example, image I.sub.0 may have coordinates x=(xy) while image I.sub.1 may have coordinates x'=(x',y'). The object is to register all of the images so that a panorama 320 may be constructed. An example of how perspective transforms would warp one of the images 510 to register it with another image 520 with which it overlaps is illustrated in FIG. 5. FIG. 5 illustrates the "keystoning" effect of the warping of the image 510 required to produce the desired registration between the two images. The problem is that the transformation (or the camera motion) between the two images I.sub.0 and I.sub.1 is unknown so that a transformation between x and x' must be inferred from the images themselves before the images can be registered.

In order to register the two images I.sub.0 (x) and I.sub.1 (x'), where x' is computed using some parametric motion model m, i.e., x'=f(x;m), the warped image is first computed in accordance with

In the current implementation, bilinear pixel resampling is employed as illustrated in FIG. 7. The trick is then to find a deformation of I.sub.1 (x) which brings it into closer registration with I.sub.0 (x) and which can also be used to update the parameter m. The warp/register/update loop can then be repeated. This specification describes how this can be done for two different transformation models, namely 8-parameter planar perspective transformations, and 3D rotations, and how this can be generalized to other motion models and parameters.

Given two images taken from the same viewpoint (optical center) but in potentially different directions (and/or with different intrinsic parameters), the relationship between two overlapping images can be described by a planar perspective motion model. The planar perspective transformation warps an image into another using ##EQU6## where x=(x,y,1) and x'=(x',y',1) are homogeneous or perspective coordinates, and the symbol indicates equality up to scale. (Since the M matrix is invariant to scaling, there are only 8 independent parameters.) (It may be instructive to note here that the vector m referred to in Equation 1 is a column vector consisting of the elements m.sub.0, m.sub.1, . . . , m.sub.8 of the matrix M of Equation 2.) Equation 2 can be re-written as ##EQU7## To recover the parameters, the process iteratively updates the transformation matrix using

where ##EQU8## Resampling image I.sub.1 (x') with the new transformation x'.about.(I+D)Mx is the same as warping the resampled image I(x) by x".about.(I+D)x, (ignoring errors introduced by the double resampling operation), i.e., ##EQU9##

The matrix M may be thought of as producing a large change in the image as suggested in FIG. 6 in which a relatively large right-to-left corrective movement is observed from I.sub.1 (x') to I.sub.1 (x), while the subsequent incremental deformation D induces an additional relatively small right-to-left movement from I.sub.1 (x) to I.sub.1 (x'). If this deformation by D provides the final correction, then the placement of a fixed object 610 relative to the second image's re-warped coordinate system (x",y") is the about same as the object's placement relative to the first image's coordinate system (x,y), as indicated in FIG. 6.

The transformation of the second image's coordinate system is not likely to shift it by an integral number of image pixels. More likely, each shifted pixel coordinate of the second image will lie somewhere between neighboring pixel coordinates of the first image. FIG. 7 illustrates how a pixel being sampled at a non-integral location can be computed as a bilinear blend of the original pixels. In order to warp images, the pixels of the two images are best resampled using techniques well-known in the art, such as bilinear resampling of the type disclosed in G. Wolberg, Digital Image Warping, IEEE Computer Society Press, Los Alamitos, Calif., 1990.

In order to compute an appropriate value of the deformation parameter d, the process minimizes the squared error metric ##EQU10## where e.sub.i =I.sub.1 (x.sub.i)-I.sub.0 (x.sub.i) is the intensity or color error (one can use either intensity or three channels of color errors), g.sub.i.sup.T =.quadrature.I.sub.1 (x.sub.i) is the image gradient of I.sub.1 at x.sub.i, d=(d.sub.0, . . . , d.sub.8) is the incremental motion parameter vector, and J.sub.i =J.sub.d (x.sub.i), where ##EQU11## is the Jacobian of the resampled point coordinate x" with respect to d. (The entries in the Jacobian correspond to the optical flow induced by the instantaneous motion of a plane in 3D. It should be noted that while the Jacobian of Equation 11 is the Jacobian of the warped or resampled coordinate system, its expression on the right-harid side of Equation 11 is entirely in terms of x or (x,y), which is the original or unwarped coordinate system, and therefore it is a function of the unwarped coordinates.) This least-squares problem has a simple solution through the normal equations

where ##EQU12## is the accumulated gradient or residual. These equations can be solved using a symmetric positive definite (SPD) solver such as Cholesky decomposition of the type described in Press et al., Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, England, second edition, 1992. Note that for this problem, the matrix A is singular without elimination of one of the three parameters d.sub.0, d.sub.4, d.sub.8. In practice, we set d.sub.8 =0, and therefore only solve an 8-by-8 system. Translational motion is a special case of the general 8-parameter transformation where J is a 2-by-2 identity matrix because only the two parameters m.sub.2 and m.sub.5 are used. The translational motion model can be used to construct cylindrical and spherical panoramas if each image is warped to cylindrical or spherical coordinates image using a known focal length. Details of how to warp and construct cylindrical and spherical panoramas are well-known in the art.

The foregoing method may be summarized with respect to the steps of FIG. 8 and the signal flow diagram of FIG. 9 as follows. A pair of overlapping images 905, 910 are input (step 810) and a transform matrix 915 of the second image is initialized (step 820).

An incremental deformation of the second image to improve the registration between the two images is computed (block 830) as will be described below in more detail with reference to FIG. 9. A matrix multiplier 917 updates the transformation matrix with the deformation (step 840). A matrix multiplier 920 transforms the current image coordinates with the transform matrix to new coordinates with which a warp operator 925 resamples the second image to produce a warped image (860).

In order to compute the deformation, a subtractor 930 subtracts the warped image from the first image to produce an error. A gradient operator 935 produces the gradient with respect to the new coordinates of the warped image while an operator 940 computes the Jacobian matrix of the new or warped coordinate system with respect to the elements of the deformation matrix. As noted previously herein, the Jacobian of the warped coordinate system is expressed as a function of the unwarped coordinates, and therefore in FIG. 9 the Jacobian-computing operator 940 receives the unwarped coordinate or vector x as an input. One advantage of this feature is that the Jacobian operator is a function of the current (unwarped) coordinate system instead of the new coordinate system, which reduces the computational burden. A multiplier 945 multiplies each error by the product of the corresponding gradients and Jacobians, and a summer 950 accumulates the resulting products to produce a residual vector b. A multiplier 955 obtains products of the gradients and Jacobians which are transposed by a transposer 960, after which another multiplier 965 combines the transposed and untransposed gradient-Jacobian products. These combined products are accumulated in a summer 970 to produce the Hessian matrix A. A normal equation solver computes the individual elements of the deformation matrix, which is then reconstructed by a reconstructor 980.

If a stopping criteria has been reached (YES branch of block 870), then the latest versions of the warped second image and of the transformation matrix are stored or output. Otherwise (NO branch of block 870), the next iteration begins starting