|
Claims  |
|
|
What is claimed is:
1. A method for constructing, from a set of overlapping images, a texture map divisible into plural faces, comprising:
for each one of said plural faces, computing a texture mapping transform which maps between pixel locations in said texture map and a three-dimensional coordinate system;
for each image of said set, computing an image transform which maps between pixel locations in said image and said three-dimensional coordinate system;
for each image of said set, combining said texture mapping transform and said image transform to produce a texture map-to-image space transform mapping between pixel locations in said image and pixel locations in said texture map;
for each image of said set and for each one of the pixel locations in said one face of said texture map, computing from said texture map-to-image space transform a pixel value of a pixel location in said image corresponding to said one pixel
location in said one face of said texture map, to produce a set of image pixel values from said set of overlapping images for said one pixel location in said texture space; and
blending said set of image pixel values to produce a composite pixel value for said one pixel location in said one face of said texture map.
2. The method of claim 1 further comprising:
obtaining a user-defined mapping between plural object faces in said three-dimensional coordinate system and said plural faces of said texture map;
defining space vectors locating said plural object faces in said three-dimensional coordinate system and texture vectors locating said plural faces in said texture map; and
wherein the step of computing said texture map transform comprises:
concatentating the texture vectors of said one face of said texture map to form a first matrix,
concatenating the space vectors of a corresponding face of said three-dimensional space to form a second matrix, and
combining said first matrix and said second matrix.
3. The method of claim 2 wherein said faces in said texture map and said object faces have plural vertices, and wherein said texture vectors locate the vertices of said faces in said texture map and said space vectors locate the vertices of said
object faces.
4. The method of claim 3 wherein said faces of said texture map and said object faces have the same number of vertices n, wherein n is an integer, and wherein:
the step of defining space vectors comprises defining n space vectors each locating a corresponding one of said n vertices of said object faces; and
the step of defining texture vectors comprises defining n texture vectors each locating a corresponding one of said n vertices of said faces of said texture map.
5. The method of claim 4 wherein the step of combining said first and second matrices comprises matrix multiplying said first matrix by an inverse of said second matrix.
6. The method of claim 4 wherein n=3 and said faces are triangles.
7. The method of claim 1 wherein the step of combining said texture and image transforms comprises matrix-multiplying said transforms.
8. The method of claim 1 further comprising placing said composite pixel values into corresponding pixel locations in said texture map.
9. The method of claim 8 further comprising assigning a color identification tag to each triangle in said texture map and to each pixal in the triangle, and storing each color identification tag in a buffer at a location in said buffer
associated with the respective triangle.
10. The method of claim 9 wherein said texture map comprises visible regions mapped to said three-dimensional coordinate system and invisible regions, said method further comprising propagating the color id tags of said visible regions of said
texture map into neighboring portions of said invisible regions.
11. The method of claim 10 wherein the step of blending is carried out in a bounding box surrounding said one face of said texture map.
12. The method of claim 10 wherein the step of placing said composite pixel values in corresponding pixel locations in said texture map comprises:
determining for each of said corresponding pixel locations in said texture map a corresponding color identification tag; and
comparing color identification tags of said corresponding pixel locations in said texture map with the color identification tags stored in said buffer memory and placing said composite pixel values in the corresponding pixel locations for which a
color identification tag match is found.
13. The method of claim 1 wherein the step of blending comprises averaging said image pixel values.
14. The method of claim 13 wherein the step of averaging comprises deweighting each image pixel value in accordance to its proximity to an edge of the corresponding image.
15. The method of claim 14 wherein said pixel values represent colors.
16. The method of claim 1 wherein the texture-to-image transform comprises an 8-parameter perspective transformation.
17. A computer-readable medium having computer-executable instructions for performing the steps recited in claim 1.
18. A method for constructing, from a set of overlapping images, a texture map divisible into plural faces, comprising:
associating each one of said overlapping images with a first transform between said one image and three-dimensional space;
computing a second transform which maps between said texture map and said three-dimensional space;
determining, from said first and second transforms, image pixel values in each image corresponding to pixel locations in one of said plural faces of said texture space; and
blending said image pixel values to produce a composite pixel value for each pixel location in said texture space.
19. The method of claim 18 wherein the step of determining image pixel values for each pixel location in texture space comprises:
combining said first and second to produce a texture space-to-image space transform mapping between pixel locations in said image and pixel locations in said texture space; and
for each image of said set and for each one of the pixel locations in said texture space, computing from said texture space-to-image space transform a pixel value of a pixel location in said image corresponding to said one pixel location in said
texture space, to produce from said set of overlapping images image pixel values corresponding to said one pixel location in said texture space.
20. The method of claim 18 further comprising:
obtaining a user-defined mapping between plural object faces in said three-dimensional coordinate system and said plural faces of said texture map;
defining space vectors locating said plural object faces in said three-dimensional coordinate system and texture vectors locating said plural faces in said texture map; and
wherein the step of computing said second transform comprises:
concatentating the texture vectors of said one face of said texture map to form a first matrix,
concatenating the space vectors of a corresponding face of said three-dimensional space to form a second matrix, and
combining said first matrix and said second matrix.
21. The method of claim 20 wherein said faces in said texture map and said object faces have plural vertices, and wherein said texture vectors locate the vertices of said faces in said texture map and said space vectors locate the vertices of
said object faces.
22. The method of claim 21 wherein said faces of said texture map and said object faces have the same number of vertices n, wherein n is an integer, and wherein:
the step of defining space vectors comprises defining n space vectors each locating a corresponding one of said n vertices of said object faces; and
the step of defining texture vectors comprises defining n texture vectors each locating a corresponding one of said n vertices of said faces of said texture map.
23. The method of claim 22 wherein the step of combining said first and second matrices comprises matrix multiplying said first matrix by an inverse of said second matrix.
24. The method of claim 22 wherein n=3 and said faces are triangles.
25. The method of claim 18 wherein the step of combining said first and second transforms comprises matrix-multiplying said first transform by an inverse of said second transform.
26. The method of claim 18 further comprising placing said composite pixel values into corresponding pixel locations in said texture map.
27. The method of claim 26 further comprising assigning a color identification tag to each face in said texture map and to each pixal in the face, and storing each color identification tag in a buffer at a location in said buffer associated with
the respective face.
28. The method of claim 27 wherein said texture map comprises visible regions mapped to said three-dimensional coordinate system and invisible regions, said method further comprising propagating the color id tags of said visible regions of said
texture map into neighboring portions of said invisible regions.
29. The method of claim 28 wherein the step of blending is carried out in a bounding box surrounding said one face of said texture map.
30. The method of claim 28 wherein the step of placing said composite pixel values in corresponding pixel locations in said texture map comprises:
determining for each of said corresponding pixel locations in said texture map a corresponding color identification tag; and
comparing color identification tags of said corresponding pixel locations in said texture map with the color identification tags stored in said buffer memory and placing said composite pixel values in the corresponding pixel locations for which a
color identification tag match is found.
31. The method of claim 18 wherein the step of blending comprises averaging said image pixel values.
32. The method of claim 31 wherein the step of averaging comprises deweighting each image pixel value in accordance to its proximity to an edge of the corresponding image.
33. The method of claim 32 wherein said pixel values represent colors.
34. A computer-readable medium having computer-executable instructions for performing the steps recited in claim 18.
35. Apparatus for use in constructing, from a set of overlapping images, a texture map divisible into plural faces, said apparatus comprising:
a processor;
memory having executable instructions stored therein; and
wherein the processor, in response to the instructions stored in the memory:
for each one of said plural faces, computes a texture mapping transform which maps between pixel locations in said texture map and a three-dimensional coordinate system;
for each image of said set, computes an image transform which maps between pixel locations in said image and said three-dimensional coordinate system;
for each image of said set, combines said texture mapping transform and said image transform to produce a texture map-to-image space transform mapping between pixel locations in said image and pixel locations in said texture map;
for each image of said set and for each one of the pixel locations in said one face of said texture map, computes from said texture map-to-image space transform a pixel value of a pixel location in said image corresponding to said one pixel
location in said one face of said texture map, to produce a set of image pixel values from said set of overlapping images for said one pixel location in said texture space; and
blends said set of image pixel values to produce a composite pixel value for said one pixel location in said one face of said texture map.
36. The apparatus of claim 35 wherein said processor, in further response to said instructions:
obtains a user-defined mapping between plural object faces in said three-dimensional coordinate system and said plural faces of said texture map;
defines space vectors locating said plural object faces in said three-dimensional coordinate system and texture vectors locating said plural faces in said texture map; and
wherein said processor computes said texture map transform in that said processor:
concatentates the texture vectors of said one face of said texture map to form a first matrix,
concatenates the space vectors of a corresponding face of said three-dimensional space to form a second matrix, and
combines said first matrix and said second matrix.
37. The apparatus of claim 36 wherein said faces in said texture map and said object faces have plural vertices, and wherein said texture vectors locate the vertices of said faces in said texture map and said space vectors locate the vertices of
said object faces.
38. The apparatus of claim 37 wherein said faces of said texture map and said object faces have the same number of vertices n, wherein n is an integer, and wherein:
said processor defines space vectors by defining n space vectors each locating a corresponding one of said n vertices of said object faces; and
said processor defines the texture vectors by defining n texture vectors each locating a corresponding one of said n vertices of said faces of said texture map.
39. The apparatus of claim 38 wherein said processor combines said first and second matrices by matrix multiplying said first matrix by an inverse of said second matrix.
40. The apparatus of claim 38 wherein n=3 and said faces are triangles.
41. The apparatus of claim 35 wherein said processor, in further response to said instructions, places said composite pixel values into corresponding pixel locations in said texture map.
42. The apparatus of claim 41 wherein said processor, in further response to said instructions, assigns a color identification tag to each triangle in said texture map and to each pixal in the triangle, and stores each color identification tag
in a buffer at a location in said buffer associated with the respective triangle.
43. The apparatus of claim 42 wherein said texture map comprises visible regions mapped to said three-dimensional coordinate system and invisible regions, and wherein said processor, in further response to said instructions, propagates the color
id tags of said visible regions of said texture map into neighboring portions of said invisible regions.
44. The apparatus of claim 43 wherein said processor blends said image pixel values in that said processor selects the corresponding pixel locations in a bounding box surrounding said one face of said texture map.
45. The apparatus of claim 44 wherein said processor places said composite pixel values in corresponding pixel locations in said texture map in that said processor:
determines for each of said corresponding pixel locations in said texture map a corresponding color identification tag; and
compares color identification tags of said corresponding pixel locations in said texture map with the color identification tags stored in said buffer memory and places said composite pixel values in the corresponding pixel locations for which a
color identification tag match is found.
46. The apparatus of claim 35 wherein said processor blends said image pixel values by averaging said image pixel values.
47. The apparatus of claim 46 wherein said processor, in blending said image pixel values, deweights each image pixel value in accordance to its proximity to an edge of the corresponding image.
48. The apparatus of claim 47 wherein said pixel values represent colors.
49. The apparatus of claim 35 wherein the texture-to-image transform comprises an 8-parameter perspective transformation.
50. Apparatus for use in constructing, from a set of overlapping images, a texture map divisible into plural faces, said apparatus comprising:
a processor;
memory having executable instructions stored therein; and
wherein the processor, in response to the instructions stored in the memory:
associates each one of said overlapping images with a first transform between said one image and three-dimensional space;
computes a second transform which maps between said texture map and said three-dimensional space;
determines, from said first and second transforms, image pixel values in each image corresponding to pixel locations in one of said plural faces of said texture space; and
blends said image pixel values to produce a composite pixel value for each pixel location in said texture space.
51. The apparatus of claim 50 wherein said processor determines image pixel values for each pixel location in texture space in that said processor:
combines said first and second to produce a texture space-to-image space transform mapping between pixel locations in said image and pixel locations in said texture space; and
for each image of said set and for each one of the pixel locations in said texture space, computes from said texture space-to-image space transform a pixel value of a pixel location in said image corresponding to said one pixel location in said
texture space, to produce from said set of overlapping images image pixel values corresponding to said one pixel location in said texture space.
52. The apparatus of claim 50 wherein said processor, in further response to said instructions:
obtains a user-defined mapping between plural object faces in said three-dimensional coordinate system and said plural faces of said texture map;
defines space vectors locating said plural object faces in said three-dimensional coordinate system and texture vectors locating said plural faces in said texture map; and
wherein said processor computes said texture map transform in that said processor:
concatentates the texture vectors of said one face of said texture map to form a first matrix,
concatenates the space vectors of a corresponding face of said three-dimensional space to form a second matrix, and
combines said first matrix and said second matrix.
53. The apparatus of claim 52 wherein said faces in said texture map and said object faces have plural vertices, and wherein said texture vectors locate the vertices of said faces in said texture map and said space vectors locate the vertices of
said object faces.
54. The apparatus of claim 53 wherein said faces of said texture map and said object faces have the same number of vertices n, wherein n is an integer, and wherein:
said processor defines space vectors by defining n space vectors each locating a corresponding one of said n vertices of said object faces; and
said processor defines the texture vectors by defining n texture vectors each locating a corresponding one of said n vertices of said faces of said texture map.
55. The apparatus of claim 54 wherein said processor combines said first and second matrices by matrix multiplying said first matrix by an inverse of said second matrix.
56. The apparatus of claim 54 wherein n=3 and said faces are triangles.
57. The apparatus of claim 50 wherein said processor, in further response to said instructions, places said composite pixel values into corresponding pixel locations in said texture map.
58. The apparatus of claim 57 wherein said processor, in further response to said instructions, assigns a color identification tag to each triangle in said texture map and to each pixal in the triangle, and stores each color identification tag
in a buffer at a location in said buffer associated with the respective triangle.
59. The apparatus of claim 58 wherein said texture map comprises visible regions mapped to said three-dimensional coordinate system and invisible regions, and wherein said processor, in further response to said instructions, propagates the color
id tags of said visible regions of said texture map into neighboring portions of said invisible regions.
60. The apparatus of claim 59 wherein said processor blends said image pixel values in that said processor selects the corresponding pixel locations in a bounding box surrounding said one face of said texture map.
61. The apparatus of claim 60 wherein said processor places said composite pixel values in corresponding pixel locations in said texture map in that said processor:
determines for each of said corresponding pixel locations in said texture map a corresponding color identification tag; and
compares color identification tags of said corresponding pixel locations in said texture map with the color identification tags stored in said buffer memory and places said composite pixel values in the corresponding pixel locations for which a
color identification tag match is found.
62. The apparatus of claim 50 wherein said processor blends said image pixel values by averaging said image pixel values.
63. The apparatus of claim 62 wherein said processor, in blending said image pixel values, deweights each image pixel value in accordance to its proximity to an edge of the corresponding image.
64. The apparatus of claim 63 wherein said pixel values represent colors.
65. The apparatus of claim 50 wherein the texture-to-image transform comprises an 8-parameter perspective transformation. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
BACKGROUND OF THE INVENTION
1. Technical Field
The invention is related to computer systems for constructing and rendering panoramic mosaic images from a sequence of still images, video images or scanned photographic images or the like.
2. Background Art
Image-based rendering is a popular way to simulate a visually rich tele-presence or virtual reality experience. Instead of building and rendering a complete 3D model of the environment, a collection of images is used to render the scene while
supporting virtual camera motion. For example, a single cylindrical image surrounding the viewer enables the user to pan and zoom inside an environment created from real images. More powerful image-based rendering systems can be built by adding a depth
map to the image or by using a larger collection of images.
The present invention is particularly directed to image-based rendering systems without any depth information, i.e., those which only support user panning, rotation, and zoom. Most of the commercial products based on this idea (such as QuickTime
VR) use cylindrical images with a limited vertical field of view, although newer systems support full spherical maps (e.g., Interactive Pictures, and RealVR). A number of techniques have been developed for capturing panoramic images of real-world
scenes. One way is to record an image onto a long film strip using a panoramic camera to directly capture a cylindrical panoramic image. Another way is to use a lens with a very large field of view such as a fisheye lens. Mirrored pyramids and
parabolic mirrors can also be used to directly capture panoramic images. A less hardware-intensive method for constructing full view panoramas is to take many regular photographic or video images in order to cover the whole viewing space. These images
must then be aligned and composited into complete panoramic images using an image mosaic or "stitching" method. Most stitching systems require a carefully controlled camera motion (pure pan), and only produce cylindrical images.
Cylindrical panoramas are commonly used because of their ease of construction. To build a cylindrical panorama, a sequence of images is taken by a camera mounted on a leveled tripod. If the camera focal length or field of view is known, each
perspective image can be warped into cylindrical coordinates. To build a cylindrical panorama, 3D world coordinates are mapped to 2D cylindrical screen coordinates using ##EQU1## where .theta. is the panning angle and .nu. is the scanline. Similarly,
3D world coordinates can be mapped into 2D spherical coordinates .theta., .phi. using ##EQU2##
Once each input image has been warped, constructing the panoramic mosaics for a leveled camera undergoing a pure panning motion becomes a pure translation problem. Ideally, to build a cylindrical or spherical panorama from a horizontal panning
sequence, only the unknown panning angles need to be recovered. In practice, small vertical translations are needed to compensate for vertical jitter and optical twist. Therefore, both a horizontal translation t.sub.x and a vertical translation t.sub.y
are estimated for each input image. To recover the translational motion, the incremental translation .delta.t=(.delta.t.sub.x,.delta.t.sub.y) is estimated by minimizing the intensity error between two images, I.sub.0, I.sub.1, ##EQU3## are corresponding
points in the two images, and t=(t.sub.x,t.sub.y) is the global translational motion field which is the same for all pixels. After a first order Taylor series expansion, the above equation becomes ##EQU4## where e.sub.i =I.sub.1 (x'.sub.1)-I.sub.0
(x.sub.i) is the current intensity or color error, and g.sub.i.sup.T =.gradient.I.sub.1 (x'.sub.1) is the image gradient of I.sub.1 at x'.sub.i. This minimization problem has a simple least-squares solution, ##EQU5## To handle larger initial
displacements, a hierarchical coarse-to-fine optimization scheme is used. To reduce discontinuities in intensity and color between the images being composited, a simple feathering process is applied in which the pixels in each image are weighted
proportionally to their distance to the edge (or more precisely, their distance to the nearest invisible pixel). Once registration is finished, the ends are clipped (and optionally the top and bottom), and a single panoramic image is produced.
Creating panoramas in cylindrical or spherical coordinates has several limitations. First, it can only handle the simple case of pure panning motion. Second, even though it is possible to convert an image to 2D spherical or cylindrical
coordinates for a known tilting angle, ill-sampling at north pole and south pole causes big registration errors. (Note that cylindrical coordinates become undefined as you tilt your camera toward north or south pole.) Third, it requires knowing the
focal length (or equivalently, field of view). While focal length can be carefully calibrated in the lab, estimating the focal length of the lens by registering two or more images using conventional techniques is not very accurate.
The automatic construction of large, high-resolution image mosaics is an active area of research in the fields of photogrammetry, computer vision, image processing, and computer graphics. Image mosaics can be used for many different
applications. The most traditional application is the construction of large aerial and satellite photographs from collections of images. More recent applications include scene stabilization and change detection, video compression and video indexing,
increasing the field of view and resolution of a camera, and even simple photo editing. A particularly popular application is the emulation of traditional film-based panoramic photography with digital panoramic mosaics, for applications such as the
construction of virtual environments and virtual travel. In computer vision, image mosaics are part of a larger recent trend, namely the study of visual scene representations. The complete description of visual scenes and scene models often entails the
recovery of depth or parallax information as well.
In computer graphics, image mosaics play an important role in the field of image-based rendering, which aims to rapidly render photorealistic novel views from collections of real (or pre-rendered) images. For applications such as virtual travel
and architectural walkthroughs, it is desirable to have complete (full view) panoramas, i.e., mosaics which cover the whole viewing sphere and hence allow the user to look in any direction. Unfortunately, most of the results to date have been limited to
cylindrical panoramas obtained with cameras rotating on leveled tripods with carefully designed stages adjusted to minimize motion parallax. This has limited the users of mosaic building ("stitching") to researchers and professional photographers who
can afford such specialized equipment. Present techniques are difficult because generally they require special camera equipment which provides pure panning motion with no motion parallax.
Problems to be Solved by the Invention:
It would be desirable for any user to be able to "paint" a full view panoramic mosaic with a simple hand-held camera or camcorder. However, this requires several problems to be overcome.
First, cylindrical or spherical coordinates should be avoided for constructing the mosaic, since these representations introduce singularities near the poles of the viewing sphere.
Second, accumulated misregistration errors need to be corrected, which are always present in any large image mosaic. For example, if one registers a sequence of images using pairwise alignments, there is usually a gap between the last image and
the first one even if these two images are the same. A simple "gap closing" technique is introduced in the specification below which forces the first and last image to be the same and distributes the resulting corrections across the image sequence.
Unfortunately, this approach works only for pure panning motions with uniform motion steps, a significant limitation.
Third, any deviations from the pure parallax-free motion model or ideal pinhole (perspective) camera model may result in local misregistrations, which are visible as a loss of detail or multiple images (ghosting).
SUMMARY OF THE INVENTION
The specification discloses how to avoid using cylindrical or spherical coordinates for constructing the mosaic by associating a rotation matrix (and optionally a focal length) with each input image, and performing registration in the input
image's coordinate system. Such mosaics are referred to herein as rotational mosaics. The system can use a postprocessing stage to project such mosaics onto a convenient viewing surface, i.e., to create an environment map represented as a
texture-mapped polyhedron surrounding the origin.
The specification discloses a global optimization process, which solves the problem of accumulated misregistration errors, to find the optimal overall registration.
The specification discloses how to solve the problem of loss of detail or image ghosting is solved by computing local motion estimates (block-based optical flow) between pairs of overlapping images, and using these estimates to warp each input
image so as to reduce the misregistration. This is less ambitious than actually recovering a perspective depth value for each pixel, but has the advantage of being able to simultaneously model other effects such as radial lens distortions and small
movements in the image.
The texture map construction method and apparatus claimed in this application constructs, from a set of overlapping images, a texture map divisible into plural faces. This is accomplished for each one of the plural faces by computing a texture
mapping transform which maps between pixel locations in the texture map and a three-dimensional coordinate system. For each image of the set, an image transform is computed which maps between pixel locations in the image and the three-dimensional
coordinate system. For each image of the set, the texture mapping transform and the image transform are combined to produce a texture map-to-image space transform mapping between pixel locations in the image and pixel locations in the texture map. For
each one of the pixel locations in the one face of the texture map, the system computes from the texture map-to-image space transform a pixel value of a pixel location in the image corresponding to the one pixel location in the one face of the texture
map. This produces a set of image pixel values from the set of overlapping images for the one pixel location in the texture space. Finally, the set of image pixel values are blended to produce a composite pixel value for the one pixel location in the
one face of the texture map.
A user-defined mapping between plural object faces in the three-dimensional coordinate system and the plural faces of the texture map is obtained. From this, space vectors are defined locating the plural object faces in the three-dimensional
coordinate system, and texture vectors are defined locating the plural faces in the texture map. The texture map transform is computed concatentating the texture vectors of the one face of the texture map to form a first matrix, concatenating the space
vectors of a corresponding face of the three-dimensional space to form a second matrix, and then combining the first matrix and the second matrix.
The faces in the texture map and the object faces have plural vertices, and the texture vectors locate the vertices of the faces in the texture map while the space vectors locate the vertices of the object faces. Preferably, the faces of the
texture map and the object faces have the same number of vertices n, and n space vectors are defined each locating a corresponding one of the n vertices of the object faces, while n texture vectors are defined each locating a corresponding one of the n
vertices of the faces of the texture map. Preferably, n=3 and the faces are triangles.
In accordance with one embodiment of the invention, the composite pixel values are placed into corresponding pixel locations in the texture map using a color stenciling feature in a unique color identification tag is assigned to each triangle in
the texture map and to each pixal in the triangle. This color tag is stored in a buffer at a location in the buffer associated with the respective triangle. The texture map typically has visible regions mapped to the three-dimensional coordinate system
and invisible regions. The color id tags of the visible regions of the texture map are propagated or grown into neighboring portions of the invisible regions. The blending of the pixel values is carried out in a bounding box surrounding the face of the
texture map. This bounding box can include pixel locations into which the color id tags have been grown.
Placing the composite pixel values in corresponding pixel locations in the texture map is then accomplished by determining for each of the corresponding pixel locations in the texture map a corresponding color identification tag, comparing color
identification tags of the corresponding pixel locations in the texture map with the color identification tags stored in the buffer memory. Each composite pixel value is placed in a corresponding pixel location for which a color identification tag match
is found.
Blending of the pixel values is accomplished by averaging the image pixel values. Preferably, each image pixel value is downweighted in accordance to its proximity to an edge of the corresponding image.
The disclosed system can be used to create full view panoramic mosaics from image sequences. Unlike current panoramic stitching methods, which usually require pure horizontal camera panning, the disclosed system does not require any controlled
motions or constraints on how the images are taken (as long as there is no strong motion parallax). For example, images taken from a hand-held digital camera can be stitched seamlessly into panoramic mosaics.
By taking as many images as needed, image mosaics can be constructed which cover as large a field of view as desired, e.g., a complete environment map. Because the image mosaics is represented in the invention using a set of transforms, there
are no singularity problems such as those existing at the top and bottom of cylindrical or spherical maps. This method is fast and robust because it directly recovers 3D rotations instead of general 8 parameter planar perspective transforms. By mapping
the mosaic onto an arbitrary texture-mapped polyhedron surrounding the origin, the virtual environment can be viewed or explored using standard 3D graphics viewers and hardware without requiring special-purpose players. Once a mosaic has been
constructed, it can, of course, be mapped into cylindrical or spherical coordinates, and displayed using a special purpose viewer. Such specialized representations are not necessary, and represent just a particular choice of geometry and texture
coordinate embedding. Instead, using the texture mapping method and apparatus claimed in this application, the mosaic can be converted to an environment map, i.e., by mapping the mosaic onto any texture-mapped polyhedron surrounding the origin. This
allows the use of standard 3D graphics APIs and 3D model formats, and allows the use of 3D graphics accelerators for texture mapping. The mapping process can be employed in a rendering process that can be just as fast as special-purpose viewers.
Furthermore, the mapping process can take advantage of 3D graphics (rendering) hardware, support multi-resolution textures, and allow for easy integration with regular 3D graphics.
BRIEF DESCRIPTION OF THE DRAWINGS
The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawings will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.
FIG. 1 is a block diagram illustrating the operation of a system embodying the invention.
FIGS. 2A and 2B is a block diagram illustrating apparatus embodying a system of the invention.
FIG. 3 illustrates apparatus including a camera employed for the acquisition of images to form a panoramic image.
FIG. 4 is a detailed view of a portion of the apparatus of FIG. 3.
FIG. 5 illustrates a perspective warping and image.
FIG. 6 illustrates a sequence of images obtained in an incremental image alignment method.
FIG. 7 illustrates a pixel resampling problem encountered in the incremental image alignment method.
FIG. 8 is a flow diagram of the incremental alignment method.
FIG. 9 is a block diagram illustrating the signal flow in the incremental alignment method of FIG. 8.
FIG. 10 is a flow diagram illustrating a coarse-to-fine resolution hierarchy employed in one embodiment.
FIG. 11 is a flow diagram of the incremental 3D rotation alignment method.
FIG. 12 is a block diagram illustrating the signal flow in the incremental 3D rotation alignment method of FIG. 11.
FIG. 13 is a flow diagram illustrating a focal length estimation method which can be employed in carrying out the rotation alignment method.
FIG. 14 is a signal flow diagram illustrating a modification of the signal flow of FIG. 11 for carrying out a patch-based alignment.
FIG. 15 illustrates adjustment of a bundle of direction rays in accordance with a first approach to a global block alignment method.
FIG. 16 illustrates adjustment of a bundle of direction rays in accordance with a second approach (corresponding to Equation 33 below) to the global block alignment method.
FIG. 17 illustrates pair-wise adjustment of a bundle of direction rays in accordance with the preferred embodiment (corresponding to Equation 36 below) of the global block adjustment method.
FIG. 18 is a flow diagram illustrating the global block adjustment method.
FIG. 19 illustrates pair-wise motion estimation between pixels and average motion computation in accordance with a deghosting method.
FIG. 20 is a flow diagram illustrating a preferred embodiment of the deghosting method.
FIGS. 21 and 22 illustrate complementary warpings of each image in an overlapping pair of images in accordance with the deghosting method of FIG. 20.
FIG. 23 illustrates the averaging of plural pair-wise motion estimates in accordance with one aspect of the deghosting method.
FIG. 24 illustrates an array of patch-based motion estimates in accordance with the deghosting method.
FIG. 25A illustrates an inverse mapping method in accordance with a preferred embodiment of the deghosting method in which per-patch motion estimates are obtained.
FIG. 25B illustrates how the per-patch motion estimates are interpolated to produce a finer grid of per-pixel motion estimates.
FIG. 26 is a signal flow diagram corresponding to one embodiment for carrying out the deghosting method.
FIGS. 27 and 28 illustrate a cubic 3D model and corresponding 2D texture map for carrying out one embodiment of a texture mapping method.
FIG. 29 illustrates a mapping between triangular vertices of a 3D model and a 2D texture map employed in carrying out one embodiment of the texture mapping.
FIGS. 30 and 31 illustrate a spherical 3D model and corresponding 2D texture map for carrying out one embodiment of the texture mapping method.
FIG. 32 is a flow diagram illustrating the texture map computation process.
FIGS. 33-38 are panoramic images obtained using different aspects of the present invention in accordance with respective working examples.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Introduction:
This specification describes how to associate a rotation matrix (and optionally a focal length) with each input image, rather than explicitly projecting all of the images onto a common surface (e.g., a cylinder). In order to reduce accumulated
registration errors, a global alignment (block adjustment) is applied to the whole sequence of images, which results in an optimal image mosaic (in the least-squares sense). To compensate for small amounts of motion parallax introduced by translations
of the camera and other unmodeled distortions, a local alignment (deghosting) method is employed which warps each image based on the results of pairwise local image registrations. Combining both global and local alignment significantly improves the
quality of the image mosaics, thereby enabling the creation of full view panoramic mosaics with hand-held cameras.
The overall flow of processing in the system is illustrated in FIG. 1. First, if the camera intrinsic parameters are unknown, the user creates a small mosaic using a planar perspective motion model, from which a rough estimate of the focal
length is computed (block 110 of FIG. 1). Next, a complete initial panoramic mosaic is assembled sequentially by adding one image at a time and adjusting its position using the rotational motion model (block 120 of FIG. 1). Then, global alignment
(block adjustment) is invoked to modify each image's transformation (and focal length) such that the global error across all possible overlapping image pairs is minimized (block 130 of FIG. 1). This stage also removes any large inconsistencies in the
mosaic, e.g., the "gaps" that might be present in a panoramic mosaic assembled using the sequential method. Then, a local alignment (deghosting) method is invoked to reduce any local misregistration errors (block 140 of FIG. 1). The final mosaic can be
stored as a collection of images with associated transformations, or optionally converted into a texture-mapped polyhedron (environment map) (block 150 of FIG. 1). By mapping the mosaic onto an arbitrary texture-mapped polyhedron surrounding the origin,
the virtual environment is viewed using standard 3D graphics viewers and hardware (block 160) without requiring special purpose players.
Exemplary Operating Environment
FIG. 2A and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the invention may be implemented. Although not required, the invention will be described in the general
context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement
particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable
consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a
communications network. In a distributed computing environment, program modules may be located both local and remote memory storage devices.
With reference to FIG. 2A, an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional personal computer 20, including a processing unit 21, a system memory 22, and a system bus 23
that couples various system components including the system memory to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a
variety of bus architectures. The system memory includes read only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system 26 (BIOS), containing the basic routines that helps to transfer information between elements within the
personal computer 20, such as during start-up, is stored in ROM 24. The personal computer 20 further includes a hard disk drive 27 for reading from and writing to a hard disk, not shown, a magnetic disk drive 28 for reading from or writing to a
removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD ROM or other optical media. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the
system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions,
data structures, program modules and other data for the personal computer 20. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, it should be appreciated by those
skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read only
memories (ROM), and the like, may also be used in the exemplary operating environment.
A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35, one or more application programs 36, other program modules 37, and program data 38. A user may
enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These
and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or a universal serial bus (USB). A
monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as
speakers and printers.
The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another personal computer, a server, a router, a network PC, a
peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in FIG. 2A. The logical connections depicted
in FIG. 2A include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and Internet.
When used in a LAN networking environment, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54
or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program
modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link
between the computers may be used.
The hardware or apparatus of FIG. 2A may be modified to include additional features in order to carry out the present invention, as illustrated in FIG. 2B. In FIG. 2B, a camera 210 (such as a digital/electronic still or video | | |