
Image-based texture replacement using multiview images

2008, Proceedings of the 2008 ACM symposium on Virtual reality software and technology


Image-Based Texture Replacement Using Multiview Images

Doron Tal∗, Technion – Israel Institute of Technology
Ilan Shimshoni†, University of Haifa
Ayellet Tal‡, Technion – Israel Institute of Technology

∗ e-mail: [email protected]  † e-mail: [email protected]  ‡ e-mail: [email protected]

Abstract

Augmented reality is concerned with combining real-world data, such as images, with artificial data. Texture replacement is one such task. It is the process of painting a new texture over an existing textured image patch, such that depth cues are maintained. This paper proposes a general and automatic approach for performing texture replacement, which is based on multiview stereo techniques that produce depth information at every pixel. The use of several images allows us to address the inherent limitation of previous studies, which are constrained to specific texture classes, such as textureless or near-regular textures. To be able to handle general textures, a modified dense correspondence estimation algorithm is designed and presented.

Keywords: Texture replacement, multiview stereo

Figure 1: Texture replacement. Panels: reference image, left angle, right angle, and output. Note that the input texture is neither smooth nor near-regular.

1 Introduction

A basic problem in many augmented reality systems is creating new artificial images given real images [Azuma 1997; Berger et al. 1999; Debevec 1998; Keren et al. 2004]. Texture replacement is a challenging example of this type, which can also be utilized in other applications, such as post-production in the entertainment industry. It is the process of painting a new texture over an existing textured image patch, in a manner that preserves the 3D appearance of the original patch. The 3D appearance is conveyed by the deformation of the texture over the 3D body in two ways. First, the texture pattern should be smaller on the more distant parts of the object. Second, as the angle between the surface normal and the viewing direction increases, the area of the texture in the image decreases (foreshortening).

Several texture replacement approaches have been suggested in recent years. The key distinction between them is the shape extraction method they use. Some of the techniques rely on shape-from-texture methods [Liu et al. 2004; Scholz and Magnor 2006], while others are based on shape from shading, where the luminance of the object is used, assuming a Lambertian reflectance model [Fang and Hart 2004; Zelinka et al. 2005]. In both cases a single image is given, and thus the techniques are limited to specific texture types: near-regular in the first case and textureless surfaces in the latter.

The limitation to certain texture types severely hinders these approaches. Moreover, determining in advance whether a given texture fits the restrictions of a given method might in itself be complicated. The key challenge is to design a texture replacement approach that can handle all varieties of input textures. Since this cannot be achieved using a single image, we propose to use a few images. Given a set of multiview images of an object, one of which is the reference image, the goal is to replace the texture of a given patch in this image by some other texture, using the other images for shape reconstruction.

This paper makes the following contributions. First, a general, end-to-end, automatic approach for accurate texture replacement, which is based on multiview stereo for shape reconstruction, is presented.
Second, a method specifically tailored for dense correspondence estimation between textured images is described. Figure 1 gives an example of the results of our approach.

2 Related Work

In the following we describe previous texture replacement algorithms. Then, we discuss multiview stereo methods, which have not been utilized before for texture replacement.

Texture replacement: The first class of texture replacement approaches is based on shape from texture. These methods use texture deformations to extract the geometric characteristics of the surface. One of the first papers on texture replacement is [Tsin et al. 2001], which focuses on decoupling the illumination and the texture. The new texture is recoupled with the original illumination to create pleasing results. This method is limited to regular textured inputs and to fronto-parallel planes.

In [Liu et al. 2004] a wider set of textures is handled. This algorithm uses near-regular textures both for the source and for the target textures. The deformation field is extracted from the source texture interactively. Using the deformation field, this texture is dewarped into a regular texture. Finally, a regular texture coupled with the original illumination is deformed back into the same geometric structure.

In [Scholz and Magnor 2006] a technique for texture replacement of garments in a monocular video sequence is presented. A special coded print on the garment enables the surface coordinate system to be extracted from it with relative ease.

The other class of methods is based on shape from shading, where the reflection model of the surface is usually assumed to be near-Lambertian. The method proposed in [Fang and Hart 2004] is based on automatic extraction of a putative normal field, followed by a manual correction stage. The corrected normal field is then approximated using 2D planes, and a texture is synthesized for each plane separately. Finally, the textured regions are seamed together. The method is limited to textureless input surfaces. The method described in [Zelinka et al. 2005] extends the class of textures to include piecewise-constant textures. It does so by an initial stage of image segmentation into different colors. Then, the surface normals of the most reliable color are extracted and later propagated into less reliable regions. The texture synthesis is done using jump maps [Zelinka and Garland 2004], which produce lower-quality results than alternative methods but are very fast.

The advantage of the above approaches is that they are based on a single image. This comes at the hefty price of constraining the texture types and assuming an ideal reflection model. Since these assumptions are usually not precisely satisfied, user interaction is often required. To overcome these limitations, we propose to perform texture replacement using multiple images.

Multiview stereo: There are numerous stereo vision methods, where two or more images of the same object, taken from different directions, are used to reconstruct the 3D shape of the object. [Scharstein and Szeliski 2002] surveys narrow-baseline two-frame stereo methods, i.e., methods using two fairly similar images of the object. [Seitz et al. 2006] surveys multiview stereo algorithms.
It is demonstrated there that multiview algorithms produce significantly better 3D reconstruction results than the single-image approaches mentioned above. None of these algorithms has taken the process onward to include texture replacement. Moreover, these algorithms have usually been applied to textureless surfaces, implying that with high probability, image gradients correspond to depth discontinuities. This assumption does not hold for textured images. Our proposed algorithm handles this case.

3 General Approach

Our approach requires, as input, a set of images, one of which is regarded as the reference image, in which the texture patch should be replaced. The approach consists of four stages: sparse matching, bundle adjustment, dense reconstruction, and texture mapping, as described below.

3.1. Sparse matching: In this stage we acquire an initial match between two images at a time. This match is composed of a sparse set of point correspondences and the fundamental matrix that relates corresponding points in the two images [Hartley and Zisserman 2004]. This step consists of three stages: feature extraction from each image; creation of an initial putative list of matches; and identification of the correct matches (inliers).

The features extracted by our algorithm are the widely used scale-invariant feature transform (SIFT) features [Lowe 2004; Lowe]. Feature vectors obtained by SIFT are invariant to image scale and rotation, and partially invariant (i.e., robust) to changes in viewpoint and illumination. Given two sets of feature points, one for each image, a putative match is constructed by matching each descriptor of the first set to its nearest-neighbor descriptor in the other set. From this set of matches, the pairs selected are those whose distance to the closest neighbor is much smaller than the distance to the second-closest neighbor. (In our implementation, the ratio threshold is 0.6.) Many incorrect matches are thus removed.

The process described above yields a set of putative matches, many of which are still incorrect. The goal is to identify the inlier matches. This is done using the RANSAC (Random Sample Consensus) approach [Fischler and Bolles 1981]. RANSAC estimates the parameters of a mathematical model from a set of observed data that contains outliers. In our case, the mathematical model is the fundamental matrix, the observed data are the point matches, and the outliers are the incorrect matches. RANSAC achieves its goal by iteratively applying the following procedure. A minimal random subset of the original data points is selected. From this subset a model is computed. This model will be close to the correct model if these points are inliers. This hypothesis is then tested by checking each of the data points for agreement with the model. If sufficiently many points are classified as inliers relative to the estimated model, then the model is reasonably good. This iterative procedure is repeated a fixed number of times, each time producing either a rejected model, because too few points are classified as inliers, or a refined model together with a corresponding error measure. The algorithm selects the fundamental matrix with which the largest number of matches conform. In our implementation, we use a variant of the algorithm known as MSAC, in which the score is based not only on the number of inlier matches, but also on the actual distances computed for them [Torr and Zisserman 2000]. In our implementation, 1000 cycles of RANSAC are run. Assuming that 50% of the matches are inliers, 1000 cycles reflect a probability of 99.999% that an error-free minimal set of seven matches will be chosen. Figure 2 shows a result in which the blue points indicate matched features.

Figure 2: Sparse matching. The matched features are marked in blue.
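To make the sparse-matching stage concrete, the following Python sketch chains the three stages described above using OpenCV. It is an illustration under assumptions, not the authors' implementation: only the 0.6 ratio threshold comes from the text, the function name and remaining thresholds are hypothetical, and OpenCV's FM_RANSAC hypothesize-and-verify loop stands in for the MSAC variant cited above.

```python
import cv2
import numpy as np

def sparse_match(path1, path2, ratio=0.6):
    img1 = cv2.imread(path1, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(path2, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Putative matches: nearest neighbor in descriptor space, kept only if
    # its distance is well below that of the second-nearest neighbor
    # (the 0.6 ratio threshold quoted in the text).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    putative = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            putative.append(pair[0])

    pts1 = np.float32([kp1[m.queryIdx].pt for m in putative])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in putative])

    # Robust fundamental-matrix estimation; the returned mask flags the
    # putative matches that agree with the best model (the inliers).
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    inliers = mask.ravel() == 1
    return F, pts1[inliers], pts2[inliers]
```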
3.2. Bundle Adjustment: Once sparse stereo matching has been accomplished for all image pairs, the next step is to build a consistent model of all the images. Bundle adjustment performs this task of visual reconstruction to produce jointly optimal estimates of the 3D structure and the viewing parameters (camera pose and/or calibration). Optimal refers to minimizing a cost function that quantifies the model-fitting error. Jointly refers to a solution that is simultaneously optimal with respect to both structure and camera parameters [Triggs et al. 1999].

In a nutshell, the problem can be formulated as follows. Assume that n 3D points are seen in m views. Furthermore, assume that each camera j is parameterized by a vector a_j, which represents the rotation and translation parameters of the camera in vector form. Let x_ij be the projection of the ith point on the jth image. Bundle adjustment amounts to refining the initial camera and structure parameter estimates in order to find the set of parameters that most accurately predicts the locations of the observed n points in the set of m available images. It minimizes the reprojection error with respect to all 3D points and camera parameters. General-purpose solutions of non-linear problems are computationally demanding when employed to minimize functions that depend on many parameters. Fortunately, when solving minimization problems in bundle adjustment, the absence of interaction among parameters for different 3D points and cameras can be exploited, as done in [Lourakis and Argyros 2004].
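As a rough illustration of this formulation, the sketch below sets up the reprojection-error minimization with SciPy, parameterizing each camera j by a 6-vector a_j (Rodrigues rotation plus translation) and exposing the block sparsity that [Lourakis and Argyros 2004] exploit. All names, the shared intrinsic matrix K, and the use of scipy.optimize.least_squares are assumptions of this sketch, not the paper's implementation.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares
from scipy.sparse import lil_matrix

def residuals(params, n_cams, n_pts, cam_idx, pt_idx, obs, K):
    # params packs m cameras (Rodrigues rotation + translation, 6 each)
    # followed by n scene points (3 each); residuals are reprojection errors.
    cams = params[:6 * n_cams].reshape(n_cams, 6)
    pts = params[6 * n_cams:].reshape(n_pts, 3)
    res = np.empty((len(obs), 2))
    for r in range(len(obs)):
        j, i = cam_idx[r], pt_idx[r]
        proj, _ = cv2.projectPoints(pts[i].reshape(1, 3),
                                    cams[j, :3], cams[j, 3:], K, None)
        res[r] = proj.ravel() - obs[r]      # predicted x_ij minus observed
    return res.ravel()

def sparsity(n_cams, n_pts, cam_idx, pt_idx):
    # Each residual depends on one camera block and one point block only;
    # this is the lack of interaction that sparse bundle adjustment exploits.
    A = lil_matrix((2 * len(cam_idx), 6 * n_cams + 3 * n_pts), dtype=int)
    for r, (j, i) in enumerate(zip(cam_idx, pt_idx)):
        A[2 * r:2 * r + 2, 6 * j:6 * j + 6] = 1
        col = 6 * n_cams + 3 * i
        A[2 * r:2 * r + 2, col:col + 3] = 1
    return A

# Usage sketch: refine initial estimates x0 jointly over structure and cameras.
# sol = least_squares(residuals, x0, jac_sparsity=sparsity(...), method='trf',
#                     args=(n_cams, n_pts, cam_idx, pt_idx, obs, K))
```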
3.3. Dense Reconstruction: Once accurate sparse matching has been accomplished and the 3D structure and viewing parameters have been found for this sparse set of points, they are used to perform the multiview dense reconstruction. This step calculates the depth of every pixel in the reference image. Our algorithm, which is specifically tailored for textured images, is described in Section 4.

3.4. Texture Mapping: The last stage of the framework is casting a new textured image onto the acquired depth map. The aim is to perform isometric texture mapping, i.e., a distance-preserving mapping. There are several known algorithms for approximately isometric surface parametrization [Sorkine et al. 2002; Zigelman et al. 2002]. Our algorithm follows the general approach of the latter.

The key idea is to calculate the geodesic distances on the mesh [Kimmel and Sethian 1998; Mitchell et al. 1987; Surazhsky et al. 2005] and then find an optimal embedding in a 2D plane that preserves these distances. This is done using Multi-Dimensional Scaling (MDS) [Kruskal and Wish 1978; Cox and Cox 1994]. We use a simple MDS method called classical scaling, which uses the first two eigenvectors of the distance matrix to project the points to the plane. After the 2D isometric embedding has been extracted, it is possible to "paint" the 3D mesh. This is done by first laying the texture image on the plane of the 2D embedding. Then, by traversing the original image, each pixel is associated with a triangle on the mesh. The sparse isometric mapping is interpolated using barycentric coordinates. Eventually, a bilinear interpolation is performed in order to find the texture color painted over the original pixel position, as illustrated in Figure 3. Since the depth map is large and complex (usually more than 100,000 pixels), texture mapping is preceded by mesh simplification [Garland and Heckbert 1997; Garland].

Figure 3: Texture mapping. To paint a new texture over the original image (bottom left), each pixel in the image is mapped to the 2D mesh embedding (bottom right). The triangle that includes the pixel is found in the simplified mesh (top). Then, using barycentric coordinates, the mapping to the 2D mesh embedding is computed.
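The classical-scaling step admits a short sketch. Given a matrix of pairwise geodesic distances on the simplified mesh, standard classical MDS double-centers the squared distances and projects onto the two leading eigenvectors; the code below is a generic illustration of that computation under these assumptions, not the paper's implementation.

```python
import numpy as np

def classical_mds_2d(G):
    """2D embedding preserving the pairwise geodesic distances in G (n x n)."""
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ (G ** 2) @ J             # double-centered squared distances
    evals, evecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:2]       # two largest eigenpairs
    scale = np.sqrt(np.maximum(evals[idx], 0.0))
    return evecs[:, idx] * scale            # (n, 2) planar coordinates
```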
4 Dense Reconstruction

Dense reconstruction is the process that calculates the depths of all the pixels. Assume that we are given m images I_i, i = 1...m, which associate each 2D coordinate x with intensity values I_i(x). The images are taken with a set of cameras for which the internal and external calibrations are known, as a result of an internal calibration procedure [Strobl et al.] and bundle adjustment. Our aim is to estimate the depth map D_1 that assigns a depth value D_1(x) to every pixel location in the reference image I_1. In this multiview stereo problem, the information from all the images contributes to the computation of each pixel of the map D_1.

To enable free image acquisition using a hand-held digital camera, the algorithm should handle wide baselines. However, inherent to the wide-baseline setting is the problem of occlusion, in which not all parts of the scene visible in a particular image are also visible in the other images. Moreover, because of the large difference in viewpoint, pixels in different images that are projections of the same point in the scene might have different color values.

We are seeking a multiview reconstruction algorithm that satisfies the following requirements. (1) Occlusion handling. Occlusion is a difficult problem because, when a pixel in the reference image is occluded in one of the other images, their photometric values will usually differ. The algorithm should distinguish between this case and the case where the pixels differ because there is an error in the depth estimation (i.e., the pixels do not match). (2) Textured-surface handling, i.e., reconstructing objects regardless of their texture. In a given image of a textured surface, the algorithm may detect texture edges or it may detect edges that appear as a result of depth discontinuities. An algorithm that uses the edge cue as a depth-discontinuity cue in the 3D reconstruction has to be able to distinguish between the two types of edges.

We base our algorithm on [Strecha et al. 2004], which is a probability-based optimization approach for multiview reconstruction. Though this algorithm generally produces accurate results, it has a couple of drawbacks. First, the algorithm's performance depends largely on a good initialization; otherwise it might get stuck in local minima. Textured images are, by their very nature, not photometrically smooth, because a small error in the depth of a pixel can cause a non-smooth change in its color. Thus, this algorithm often fails to converge for our examples. Second, its assumption that the photometric properties of all occluded pixels are similar is beneficial in the case of specularities, but is insufficient for dealing with real occlusions. In particular, not only need occluded pixels not have the same color, but in textured images their colors vary significantly.

In our approach, the first problem is overcome by extracting many feature points in the previous stages. A key observation is that this can be achieved by taking several narrow-baseline image pairs, such that for each image pair many matches can easily be found (e.g., Figure 5(a)-(f)). Occlusion and foreshortening are handled through the use of a few such pairs, rather than a single stereo pair, which are stitched together via bundle adjustment. The depth of these points, which has been estimated accurately by bundle adjustment, is kept fixed throughout the algorithm. These points are used as anchor points, and the depth has to be estimated only in the regions between them. The second problem is solved by changing the way occlusion is handled. Intuitively, rather than assuming that all occluded pixels have a similar color, we base the visibility update on the previous pixel-likelihood estimation. Namely, we assume that pixels which are very similar to their counterparts in the reference image are probably visible, as explained below.

The rest of this section outlines the algorithm. Given the camera calibrations and a depth value D_1(x_1) for a position x_1 in I_1, the corresponding pixel location in the ith image can be computed by:

$$\lambda_i x_i = D_1(x_1)\, K_i R_i^T R_1^{-1} x_1 + K_i R_i^T (t_1 - t_i), \qquad (1)$$

where K_i, R_i and t_i are the internal calibration matrix, rotation, and translation of the ith camera, respectively. The overall mapping is denoted as x_i = l_i(x_1, D_1(x_1)), or x_i = l_i(x_1) for short. Each input image I_i, i = 2,...,m, is regarded as a noisy measurement of the reference image I_1:

$$I_i(l_i(x_1)) = I_1(x_1) + \varepsilon, \qquad (2)$$

where ε ∼ N(0, Σ) is the image noise, assumed to be normally distributed with zero mean and covariance matrix Σ.

Occlusion is modeled by a set of visibility maps V_i(x_1), which specify whether a scene point X that projects onto x_1 in I_1 is also visible in image I_i. Every element of V_i(x_1) is a binary random variable, which is either 1 or 0, corresponding to visibility or occlusion, respectively. The set V_i contains the hidden variables, and their values must be inferred from the input images. Note that, as the first image was chosen as the reference image, all pixels in the first image are visible and therefore V_1(x_1) = 1.

Under the assumption that the image noise is independent and identically distributed for all pixels in all views, the data likelihood L can be written as the product of all individual pixel probabilities:

$$L = \prod_{i=1}^{m} \prod_{x \in I_1} N\big(I_1(x) - I_i(l_i(x));\, 0, \Sigma\big). \qquad (3)$$

The algorithm assumes that the surface is smooth nearly everywhere (smoothness prior). The smoothness prior is modeled as an exponential density distribution of the form exp(−R(I_1, D_1)/λ). Here, λ is a parameter that reflects our prior belief as to how smooth D_1 is, and R(I_1, D_1) is a data-driven regularizer that relates image gradients to depth discontinuities. A high image gradient at a particular point x_1 causes the algorithm to assume that a large depth discontinuity at that point and in that direction is a priori more likely. The following regularizer is used:

$$R(I_1, D_1) = \nabla D_1^T\, T(\nabla I_1)\, \nabla D_1. \qquad (4)$$

Here, T(∇I_1) is a diffusion tensor defined by:

$$T(\nabla I_1) = \frac{1}{|\nabla I_1|^2 + 2\nu^2}\left(\nabla I_1^{\perp} \nabla I_1^{\perp T} + \nu^2 \mathbf{1}\right), \qquad (5)$$

where **1** is the identity matrix and ν is a parameter controlling the degree of anisotropy; namely, it controls the level of alignment between the gradients of D_1 and I_1. The downside of this data-driven regularizer appears when attempting to recover the depth map of a textured scene. Textures usually do not exhibit depth discontinuities; however, they do contain significant intensity gradients. Using the data-driven regularizer may therefore lead to increased noise within the texture elements. In our algorithm, we adjust the anisotropy parameter ν according to the scene at hand.
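The diffusion tensor of Equation 5 is simple to write down explicitly. The following numpy sketch evaluates T(∇I_1) and the pointwise regularizer contribution of Equation 4 at a single pixel; the function names are assumptions, and the per-pixel formulation is an illustrative simplification of the field-wide computation.

```python
import numpy as np

def diffusion_tensor(grad_I, nu):
    # T(grad I1) of Eq. (5): smoothing is free along the image edge
    # (the direction perpendicular to grad I1) and damped across it.
    gx, gy = grad_I
    g_perp = np.array([-gy, gx])          # gradient rotated by 90 degrees
    return (np.outer(g_perp, g_perp) + nu**2 * np.eye(2)) / \
           (gx**2 + gy**2 + 2.0 * nu**2)

def regularizer_term(grad_D, grad_I, nu):
    # Pointwise contribution grad(D1)^T T(grad I1) grad(D1) of Eq. (4).
    return grad_D @ diffusion_tensor(grad_I, nu) @ grad_D
```

Raising ν makes the tensor more isotropic, which is exactly the adjustment the text proposes for textured scenes, where intensity edges should not be read as depth discontinuities.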
Minimizing the negative logarithm leads to the following energy formulation:

$$E(D_1, V) = \sum_{i=2}^{m} \sum_{x \in I_1} V_i(x)\left[\frac{1}{2}\, d_i(x)^T \Sigma^{-1} d_i(x) + \log\!\left(\sqrt{2\pi\,|\Sigma|}\right)\right] + \frac{1}{\lambda} R(I_1, D_1), \qquad (6)$$

where d_i(x) = I_1(x) − I_i(l_i(x)). Each term in the sum is weighted by the probability of the pixel being visible.

To minimize the energy, an expectation-maximization (EM) procedure is used. This procedure iterates between the estimation of V (the visibility maps) and the minimization of E(D_1, V) (Equation 6), as follows.

E-step: In the (k+1)th iteration, the hidden variables V_i(x_1) are replaced by their conditional expectation, given the current estimate D_1^(k) of D_1. The question is how to update V_i. Basing the update on a global estimation of the probability that a pixel of a certain color is occluded, as done in [Strecha et al. 2004], is insufficient for producing accurate visibility maps. This is so because it focuses mostly on photometric problems such as specular reflections, where all specular pixels have the color of the light source. Occluded pixels, however, are usually not of the same color. Instead, we base the visibility update on the previous pixel-likelihood estimation, assuming that a visible pixel in image i has a color similar to that of its corresponding pixel in the reference image. We use the following update step:

$$V_i(x) \leftarrow \frac{N(d_i(x); 0, \Sigma)}{\max_{y \in I_1} N(d_i(y); 0, \Sigma)}. \qquad (7)$$

M-step: In the M-step the intent is to compute values of D_1 that minimize Equation 6, given the current estimates of V_i. To minimize E with respect to D_1, we follow a gradient-descent approach. Applying the Euler-Lagrange formalism yields:

$$\frac{\partial E}{\partial D_1} = \sum_{i=2}^{m} -2\, V_i\, \big(I_1 - I_i(l_i)\big)^T \Sigma^{-1}\, \nabla I_i(l_i)\, \partial l_i + \frac{1}{\lambda}\, \operatorname{div}\!\big(T(\nabla I_1)\, \nabla D_1\big). \qquad (8)$$

Image I_1 is excluded from the sum because l_1(x_1) is the identity transformation. The derivative ∂l_i is a 2-vector, whose expression is derived from Equation 1.
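A heavily simplified schematic of one EM iteration is sketched below, assuming grayscale images, a scalar noise variance in place of Σ, and precomputed warps I_i(l_i(x)) together with their derivatives with respect to depth (the ∇I_i(l_i) ∂l_i chain of Equation 8, collapsed into one array). The smoothness term div(T ∇D_1) is omitted for brevity; this is an illustration of Equations 7 and 8 under these assumptions, not the authors' code.

```python
import numpy as np

def gaussian(d, sigma2):
    # N(d; 0, sigma2), a scalar-variance stand-in for the covariance Sigma.
    return np.exp(-0.5 * d**2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def e_step(I1, warped, sigma2):
    # Eq. (7): visibility from the previous likelihood estimate, normalized
    # by the best-matching pixel so that V_i(x) lies in (0, 1].
    V = []
    for Ii_warp in warped:              # I_i resampled at l_i(x) for every x
        N = gaussian(I1 - Ii_warp, sigma2)
        V.append(N / N.max())
    return V

def m_step(D1, I1, warped, dI_dD, V, sigma2, step=0.1):
    # One gradient-descent step on Eq. (8); dI_dD[i] holds the precomputed
    # chain grad(I_i)(l_i) * dl_i/dD_1. The div(T grad D_1) smoothness term
    # is omitted here for brevity.
    dE = np.zeros_like(D1)
    for Ii_warp, dIi, Vi in zip(warped, dI_dD, V):
        dE += -2.0 * Vi * (I1 - Ii_warp) * dIi / sigma2
    return D1 - step * dE
```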
5 Results

The input images were taken using an off-the-shelf 3MP Canon A70 camera. As a preprocessing stage, the radial distortion and internal calibration of the camera are extracted [Strobl et al.]. Then, the images are dewarped to eliminate the camera's radial distortion. Figure 4 illustrates the 3D reconstruction results of our running example.

Figure 4: Reconstruction results. (a) Depth map. (b) Textured depth maps of the reconstructed scene from different views.

We have built an interactive utility, in which the user can select the scale, position, and rotation of the new texture. This is done during the final step of the algorithm, by placing the new texture on the flattened surface (i.e., the result of the MDS step described above). This utility provides immediate feedback on how the final result would look.

Figures 1, 5, and 6 illustrate our results. Note that these results cannot be compared to previous results, since previous work assumed that the input is a single image and the texture is constrained, whereas in our case the input consists of a few images and the texture is arbitrary. To generate the result shown in Figure 1, three images are used and a total of 1996 correspondences are automatically extracted. For Figure 5, six images are used and a total of 3765 correspondences are extracted. There are three pairs of close images, each regarded as a narrow-baseline stereo pair. The advantage of this image configuration is that it solves the main problem in the dense reconstruction – not having enough sparse points. A narrow-baseline image pair produces many correspondences for 3D points whose normals point roughly towards the camera. In order to extract correspondences for points whose normals point in other directions, two additional narrow-baseline image pairs are used.

Figure 5: Chubby statue result. Input images consisting of three narrow-baseline image pairs – reference image, left angle, right angle, straight angle, left angle, right angle (a–f) – reconstructed mesh (g), and texture replacement result (h).

A similar configuration is used in Figure 6 (top), for replacing a bread texture using 880 correspondences. For the results shown in Figure 6 (bottom), three 1.5MP images are used and 1269 correspondences are extracted.

Figure 6: Texture replacement results of bread (top) and sofa (bottom) images. (a) Reference image. (b) Texture replacement.

The algorithm was implemented in unoptimized Matlab and ran on a 1.86GHz Intel Core 2. Sparse matching takes 6.5 minutes using [Lowe], which can be accelerated to under 10 seconds using more efficient data structures [Lowe 2004; Goshen and Shimshoni 2006]. Bundle adjustment and dense reconstruction together take 3 minutes. Texture mapping takes 1.5 minutes, mostly devoted to fast marching.

6 Conclusions

This paper presents an automatic, end-to-end approach for texture replacement, which is based on state-of-the-art techniques for shape reconstruction from multiview images. The dense reconstruction stage of our approach is modified to deal with any kind of textured surface. As the results show, the major advantage of this method over previous methods is its ability to paint a new texture on an image containing any type of texture. The drawback of the approach is that more than a single image has to be used. The benefits of taking several images outweigh the drawbacks of the single-image techniques, which are inaccurate, and thus require user interaction, and are constrained to special classes of textures.

In the future we wish to design a method that can handle video streams of deforming textures, rather than utilizing a small set of images of a static surface. We would also like to develop methods for illumination extraction for general textured images, which would complement this work.

Acknowledgments: This work was partially supported by the European FP6 NoE grant 506766 (AIM@SHAPE), by the Israeli Ministry of Science, Culture & Sports, grant 3-3421, by the Dr. I. Libling Fund, by the S. and N. Grand Research Fund, and by the A.M.N. Foundation.

References

Azuma, R. 1997. A survey of augmented reality. Presence: Teleoperators and Virtual Environments 6, 4, 355–385.

Berger, M.-O., Wrobel-Dautcourt, B., Petitjean, S., and Simon, G. 1999. Mixing synthetic and video images of an outdoor urban environment. Machine Vision and Applications 11, 3, 145–159.

Cox, M. A. A., and Cox, T. 1994. Multidimensional Scaling. Chapman and Hall.

Debevec, P. 1998. Rendering synthetic objects into real scenes: Bridging traditional and image-based graphics with global illumination and high dynamic range photography. In SIGGRAPH, 189–198.

Fang, H., and Hart, J. C. 2004. Textureshop: Texture synthesis as a photograph editing tool. ACM Transactions on Graphics 23, 3, 354–359.

Fischler, M. A., and Bolles, R. C. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24, 6, 381–395.
Garland, M. QSlim software package. http://graphics.cs.uiuc.edu/~garland/software/qslim.html.

Garland, M., and Heckbert, P. 1997. Surface simplification using quadric error metrics. In Proceedings of SIGGRAPH '97, 209–215.

Goshen, L., and Shimshoni, I. 2006. Balanced exploration and exploitation model search for efficient epipolar geometry estimation. In European Conference on Computer Vision, II: 151–164.

Hartley, R. I., and Zisserman, A. 2004. Multiple View Geometry in Computer Vision, second ed. Cambridge University Press.

Keren, S., Shimshoni, I., and Tal, A. 2004. Placing three-dimensional models in an uncalibrated single image of an architectural scene. Presence 13, 6, 692–707.

Kimmel, R., and Sethian, J. 1998. Computing geodesic paths on manifolds. Proceedings of the National Academy of Sciences of the United States of America 95, 15, 8431–8435.

Kruskal, J. B., and Wish, M. 1978. Multidimensional Scaling. Sage.

Liu, Y., Lin, W.-C., and Hays, J. H. 2004. Near-regular texture analysis and manipulation. ACM Transactions on Graphics 23, 3, 368–376.

Lourakis, M., and Argyros, A. 2004. The design and implementation of a generic sparse bundle adjustment software package based on the Levenberg-Marquardt algorithm. Tech. Rep. 340, Institute of Computer Science – FORTH, Heraklion, Crete, Greece, Aug.

Lowe, D. G. Demo software: SIFT keypoint detector. http://www.cs.ubc.ca/~lowe/keypoints/.

Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 2 (Nov.), 91–110.

Mitchell, J., Mount, D., and Papadimitriou, C. 1987. The discrete geodesic problem. SIAM Journal on Computing 16, 4, 647–668.

Scharstein, D., and Szeliski, R. 2002. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision 47, 1–3, 7–42.

Scholz, V., and Magnor, M. 2006. Texture replacement of garments in monocular video sequences. In Eurographics Symposium on Rendering, 305–312.

Seitz, S. M., Curless, B., Diebel, J., Scharstein, D., and Szeliski, R. 2006. A comparison and evaluation of multi-view stereo reconstruction algorithms. In Proc. IEEE Conf. Comp. Vision Patt. Recog., I: 519–528.

Sorkine, O., Cohen-Or, D., Goldenthal, R., and Lischinski, D. 2002. Bounded-distortion piecewise mesh parameterization. In Proceedings of IEEE Visualization, 355–362.

Strecha, C., Fransens, R., and Van Gool, L. J. 2004. Wide-baseline stereo from multiple views: A probabilistic account. In Proc. IEEE Conf. Comp. Vision Patt. Recog., I: 552–559.

Strobl, K., Sepp, W., Fuchs, S., Paredes, C., and Arbter, K. Camera calibration toolbox for Matlab. http://www.vision.caltech.edu/bouguetj/calib_doc/.

Surazhsky, V., Surazhsky, T., Kirsanov, D., Gortler, S., and Hoppe, H. 2005. Fast exact and approximate geodesics on meshes. ACM Transactions on Graphics 24, 3, 553–560.

Torr, P. H. S., and Zisserman, A. 2000. MLESAC: A new robust estimator with application to estimating image geometry. Computer Vision and Image Understanding 78, 1 (Apr.), 138–156.

Triggs, B., McLauchlan, P., Hartley, R. I., and Fitzgibbon, A. W. 1999. Bundle adjustment: A modern synthesis. In Vision Algorithms Workshop: Theory and Practice, 298–372.

Tsin, Y. H., Liu, Y., and Ramesh, V. 2001. Texture replacement in real images. In Proc. IEEE Conf. Comp. Vision Patt. Recog., II: 539–544.

Zelinka, S., and Garland, M. 2004. Jump map-based interactive texture synthesis. ACM Transactions on Graphics 23, 4, 930–962.

Zelinka, S., Fang, H., Garland, M., and Hart, J. C. 2005. Interactive material replacement in photographs. In Graphics Interface, K. Inkpen and M. van de Panne, Eds., 227–232.
Zigelman, G., Kimmel, R., and Kiryati, N. 2002. Texture mapping using surface flattening via multidimensional scaling. IEEE Transactions on Visualization and Computer Graphics 8, 2, 198–207.