
Multiview Photometric Stereo

2008, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 548-554. DOI: 10.1109/TPAMI.2007.70820.

This paper addresses the problem of obtaining complete, detailed reconstructions of textureless shiny objects. We present an algorithm which uses silhouettes of the object, as well as images obtained under changing illumination conditions. In contrast with previous photometric stereo techniques, ours is not limited to a single viewpoint but produces accurate reconstructions in full 3D. A number of images of the object are obtained from multiple viewpoints, under varying lighting conditions. Starting from the silhouettes, the algorithm recovers camera motion and constructs the object's visual hull. This is then used to recover the illumination and initialize a multiview photometric stereo scheme to obtain a closed surface reconstruction. There are two main contributions in this paper: First, we describe a robust technique to estimate light directions and intensities and, second, we introduce a novel formulation of photometric stereo which combines multiple viewpoints and, hence, allows closed surface reconstructions. The algorithm has been implemented as a practical model acquisition system. Here, a quantitative evaluation of the algorithm on synthetic data is presented together with complete reconstructions of challenging real objects. Finally, we show experimentally how, even in the case of highly textured objects, this technique can greatly improve on correspondence-based multiview stereo results.

Carlos Hernández, Member, IEEE, George Vogiatzis, Member, IEEE, and Roberto Cipolla, Member, IEEE

C. Hernández and G. Vogiatzis are with the Computer Vision Group, Toshiba Research Europe, 208 Cambridge Science Park, Cambridge, CB4 0GZ, UK. E-mail: {carlos.hernandez, george}@crl.toshiba.co.uk. R. Cipolla is with the University of Cambridge, Cambridge, CB2 1PZ, UK.

Index Terms—Photometric stereo, multiple views, light calibration, silhouette.

1 INTRODUCTION

Digital archiving of 3D objects is a key area of interest in cultural heritage preservation. While laser range scanning is one of the most popular techniques, it has a number of drawbacks, namely, the need for specialized, expensive hardware and also the requirement of exclusive access to an object for significant periods of time. Also, for a large class of shiny objects such as porcelain or glazed ceramics, 3D scanning with lasers is challenging [1].

Recovering 3D shape from photographic images is an efficient, cost-effective way to generate accurate 3D scans of objects. Several solutions have been proposed for this long-studied problem. When the object is well textured, its shape can be obtained by densely matching pixel locations across multiple images and triangulating [2]; however, the results typically exhibit high-frequency noise. Alternatively, photometric stereo is a well-established technique which uses the shading cue and can provide very detailed, but partial, 2.5D reconstructions [3].

In this paper, we propose an elegant and practical method for acquiring a complete and accurate 3D model from a number of images taken around the object, captured under changing light conditions (see Fig. 1). The changing (but otherwise unknown) illumination conditions uncover the fine geometric detail of the object surface, which is obtained by a generalized photometric stereo scheme. The object's reflectance is assumed to follow Lambert's law, i.e., points on the surface keep their appearance constant irrespective of viewpoint.
The method can, however, tolerate isolated specular highlights, typically observed in glazed surfaces such as porcelain. We also assume that a single, distant light source illuminates the object and that it can be changed arbitrarily between image captures. Finally, it is assumed that the object can be segmented from the background and silhouettes extracted automatically.

2 RELATED WORK

This paper addresses the problem of shape reconstruction from images and is therefore related to a vast body of computer vision research. We draw inspiration from the recent work of [4], where the authors explore the possibility of using photometric stereo with images from multiple views when correspondence between views is not initially known. Picking an arbitrary viewpoint as a reference image, a depth-map with respect to that view serves as the source of approximate correspondences between frames. This depth-map is initialized from a Delaunay triangulation of sparse 3D features located on the surface. Using this depth-map, their algorithm performs a photometric stereo computation, obtaining normal directions for each depth-map location. When these normals are integrated, the resulting depth-map is closer to the true surface than the original. The paper presents high-quality reconstructions and gives a theoretical argument justifying the convergence of the scheme. The method, however, relies on the existence of distinct features on the object surface which are tracked to obtain camera motion and initialize the depth-map. In the class of textureless objects we are considering, it may be impossible to locate such surface features and, indeed, our method has no such requirement. Also, the surface representation is still depth-map based and, consequently, the models produced are 2.5D.

A similar approach of extending photometric stereo to multiple views and more complex BRDFs was presented in [5], with the limitation of almost planar 2.5D reconstructed surfaces. Our method is based on the same fundamental principle of bootstrapping photometric stereo with approximate correspondences, but we use a general volumetric framework which allows complete 3D reconstructions from multiple views.

Quite related to this idea is the work of [6] and [7], where photometric stereo information is combined with 3D range scan data. In [6], the photometric information is simply used as a normal map texture for visualization purposes. In [7], a very good initial approximation to the object surface is obtained using range scanning technology which, however, is shown to suffer from high-frequency noise. By applying a fully calibrated 2.5D photometric stereo technique, normal maps are estimated which are then integrated to produce an improved, almost noiseless surface geometry.

Our acquisition technique is different from [7] in the following respects: 1) We only use standard photographic images and simple light sources, 2) our method is fully uncalibrated—all necessary information is extracted from the object's contours, and 3) we completely avoid the time-consuming and error-prone process of merging 2.5D range scans.

The use of the silhouette cue is inspired by the work of [8], where a scheme for the recovery of illumination information, surface reflectance, and geometry is described. The algorithm makes use of frontier points, a geometric feature of the object obtained from the silhouettes. Frontier points are points of the visual hull where two contour generators intersect and, hence, are guaranteed to be on the object surface. Furthermore, the local surface orientation is known at these points, which makes them suitable for various photometric computations such as the extraction of reflectance and illumination information. Our method generalizes the idea by examining a much richer superset of frontier points, namely, the set of contour generator points. We overcome the difficulty of localizing contour generators by a robust random sampling strategy. The price we pay is that a considerably simpler reflectance model must be used.

Fig. 1. Our acquisition setup. The object is rotated on a turntable in front of a camera and a point light source. A sequence of images is captured, while the light source changes position between consecutive frames. No knowledge of the camera or light source positions is assumed.

Although solving a different type of problem, the work of [9] is also highly related, mainly because the class of objects addressed is similar to ours. While the energy term defined and optimized in their paper bears strong similarity to ours, their reconstruction setup keeps the lights fixed with respect to the object so, in fact, an entirely different problem is solved and, hence, a performance comparison between the two techniques is difficult. However, the results presented in [9] at first glance seem to be lacking in detail, especially in concavities, while our technique considerably improves on the visual hull.

Finally, there is a growing volume of work on using specularities for calibrating photometric stereo (see [10] for a detailed literature survey). This is an example of a different cue used for performing uncalibrated photometric stereo on objects of the same class as the one considered here. However, the methods proposed so far have only been concerned with the fixed-view case.

3 ALGORITHM

In this paper, we reconstruct the complete geometry of 3D objects by exploiting the powerful silhouette and shading cues. We modify classic photometric stereo and cast it in a multiview framework where the camera is allowed to circumnavigate the object and illumination is allowed to vary. First, the object's silhouettes are used to recover camera motion using the technique presented in [11] and, via a novel robust estimation scheme, they allow us to accurately estimate the light directions and intensities in every image. Second, the object surface, which is parameterized by a mesh and initialized from the visual hull, is evolved until its predicted appearance matches the captured images. The advantages of our approach are the following:

1. It is fully uncalibrated: No light or camera pose calibration object needs to be present in the scene.
Both camera pose and illumination are estimated from the object's silhouettes.
2. The full 3D geometry of a complex, textureless, multialbedo object is accurately recovered, something not previously possible by any other method.
3. It is practical and efficient, as evidenced by our simple acquisition setup.

3.1 Robust Estimation of Light Sources from the Visual Hull

For an image of a Lambertian object with varying albedo, under a single distant light source, and assuming no self-occlusion, each surface point projects to a point of intensity given by

$$ i = \rho\, \mathbf{l}^{\top} \mathbf{n}, \qquad (1) $$

where $\mathbf{l}$ is a 3D vector directed toward the light source and scaled by the light source intensity, $\mathbf{n}$ is the surface unit normal at the object location, and $\rho$ is the albedo at that location. Equation (1) provides a single constraint on the three coordinates of the product $\rho\mathbf{l}$. Then, given three points $\mathbf{x}_a$, $\mathbf{x}_b$, $\mathbf{x}_c$ with an unknown but equal albedo $\rho$, their noncoplanar normals $\mathbf{n}_a$, $\mathbf{n}_b$, $\mathbf{n}_c$, and the corresponding three image intensities $i_a$, $i_b$, $i_c$, we can construct three such equations that uniquely determine $\rho\mathbf{l}$ as

$$ \rho\,\mathbf{l} = \left[\,\mathbf{n}_a \ \mathbf{n}_b \ \mathbf{n}_c\,\right]^{-\top} \begin{bmatrix} i_a \\ i_b \\ i_c \end{bmatrix}. \qquad (2) $$

For multiple images, these same three points can provide the light directions and intensities in each image up to a global unknown scale factor $\rho$. The problem is then how to obtain three such points. Our approach is to use the powerful silhouette cue. The observation on which this is based is the following: When the images have been calibrated for camera motion, the object's silhouettes allow the construction of the visual hull [12], which is defined as the maximal volume that projects inside the silhouettes (see Fig. 2).

Fig. 2. The visual hull for light estimation. The figure shows a 2D example of an object which is photographed from two viewpoints. The visual hull (gray quadrilateral) is the largest volume that projects inside the silhouettes of the object. While the surface of the visual hull is generally quite far from the true object surface, there is a set of points where the two surfaces are tangent and, moreover, share the same local orientation (these points are denoted here with the four dots and arrows). In the full 3D case, three points, with their surface normals, are enough to fix an illumination hypothesis against which all other points can be tested for agreement. This suggests a robust random sampling scheme, described in the main text, via which the correct illumination can be obtained.

A fundamental property of the visual hull is that its surface coincides with the real surface of the object along a set of 3D curves, one for each silhouette, known as contour generators [13]. Furthermore, for all points on those curves, the surface orientation of the visual hull is equal to the orientation of the object surface. Therefore, if we could detect points on the visual hull that belong to contour generators and have equal albedo, we could use their surface normal directions and projected intensities to estimate lighting. Unfortunately, contour generator points with equal albedo cannot be directly identified within the set of all points of the visual hull. Light estimation, however, can be viewed as robust model fitting where the inliers are the contour generator points of some constant albedo and the outliers are the rest of the visual hull points. The albedo of the inliers will be the dominant albedo, i.e., the color of the majority of the contour generator points.
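
To make the hypothesis-generation step concrete, the following numpy sketch solves (2): given three sampled visual-hull points with noncoplanar normals and their observed intensities, it recovers the scaled light vector $\rho\mathbf{l}$ from the resulting 3x3 linear system. This is a minimal sketch, not the authors' implementation; the function name and array conventions are ours.

```python
import numpy as np

def light_hypothesis(normals, intensities):
    """Solve (2) for the scaled light vector rho*l from three
    visual-hull points of (assumed equal) albedo rho.

    normals:     3x3 array, one unit normal per row (must be noncoplanar).
    intensities: length-3 array of observed image intensities.
    """
    # Each point gives one equation  n_m . (rho*l) = i_m, so stacking
    # the three normals as rows yields a square linear system.
    N = np.asarray(normals, dtype=float)
    i = np.asarray(intensities, dtype=float)
    return np.linalg.solve(N, i)  # = [n_a n_b n_c]^{-T} [i_a, i_b, i_c]^T
```
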
One can expect that the outliers do not generate consensus in favor of any particular illumination model, while the inliers do so in favor of the correct model. This observation motivates us to use a robust RANSAC scheme [14] to separate inliers 550 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 3, MARCH 2008 Fig. 3. Shape of illumination consensus. For different illumination configurations, we have plotted the consensus as a function of light direction. For each direction, consensus has been maximized with respect to light intensity. Red values denote big consensus. The shape of the maxima of this cost function as well as the lack of local optima implies a stable optimization problem. Top: Six different illuminations of a single albedo object. Bottom: Four different illuminations of a multialbedo object. Although the presence of multiple albedos degrades the quality of the light estimation (the peak is broader), it is still a clear single optimum. from outliers and estimate illumination direction and intensity. The scheme can be summarized as follows: 1. 2. Pick three points on the visual hull and from their image intensities and normals estimate an illumination hypothesis for l. Every point on the visual hull xm will now vote for this hypothesis if its predicted image intensity is within a given threshold  of the observed image intensity im , i.e., lT  nm  im < ; 3. ð3Þ where  allows for quantization errors, image noise, etc. Repeat 1 and 2 a set number of times always keeping the illumination hypothesis with the largest number of votes. The shape of the actual function being optimized by the RANSAC scheme described above was explored graphically for a porcelain object in Fig. 3. The number of points voting for a light direction (maximized with respect to light intensity) was plotted as a 2D function of latitude and longitude of the light direction. These graphical representations, obtained for six different illuminations, show the lack of local optima and the presence of clearly defined maxima. This simple method can also be extended in the case where the illumination is kept fixed with respect to the camera for K frames. This corresponds to K illumination vectors R1 l; . . . ; RK l, where Rk are 3  3 rotation matrices that rotate the fixed illumination vector l with respect to the object. In that case, a point on the visual hull xm with normal nm will vote for l if it is visible in the kth image where its intensity is im;k and ðRk lÞT  nm  im;k < : ð4Þ A point is allowed to vote more than once if it is visible in more than one image. Even though, in theory, the single image case suffices for independently recovering illumination in each image, in our acquisition setup, light can be kept fixed over more than one frame. This allows us to use the extended scheme in order to further improve our estimates. A performance comparison between the single view and the multiple view case is provided through simulations with synthetic data in Section 4. An interesting and very useful byproduct of the robust RANSAC scheme is that any deviations from our assumptions of a Lambertian surface of uniform albedo are rejected as outliers. This provides the light estimation algorithm with a degree of tolerance to sources of error such as highlights or local albedo variations. The next section describes the second part of the algorithm, which uses the estimated illumination directions and intensities to recover the object surface. 
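
The voting scheme of steps 1-3 can be sketched as below. This is an illustration under our own naming and defaults (the iteration count and threshold are placeholders, not values from the paper): each iteration draws three visual-hull points, forms a hypothesis via (2), and counts the consensus defined by (3).

```python
import numpy as np

def ransac_light(normals, intensities, n_iters=1000, tau=8.0, seed=None):
    """Robust single-image light estimation over visual-hull points.

    normals:     Mx3 array of unit normals of the visual-hull points.
    intensities: length-M array of their observed intensities in one image.
    tau:         inlier threshold of (3), in gray levels (illustrative).
    Returns the winning scaled light vector rho*l and its vote count.
    """
    rng = np.random.default_rng(seed)
    M = len(intensities)
    best_l, best_votes = None, -1
    for _ in range(n_iters):
        # Step 1: three random points define an illumination hypothesis (2).
        idx = rng.choice(M, size=3, replace=False)
        try:
            l = np.linalg.solve(normals[idx], intensities[idx])
        except np.linalg.LinAlgError:
            continue  # coplanar normals: degenerate sample, redraw
        # Step 2: count points whose predicted intensity agrees within tau.
        votes = np.sum(np.abs(normals @ l - intensities) < tau)
        # Step 3: keep the hypothesis with the largest consensus.
        if votes > best_votes:
            best_l, best_votes = l, votes
    return best_l, best_votes
```

The fixed-light extension of (4) follows the same pattern: a point contributes one vote per image in which it is visible, with the hypothesis rotated by the corresponding $\mathbf{R}_k$ before the comparison.
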
3.2 Multiview Photometric Stereo

Having estimated the distant light-source directions and intensities for each image, our goal is to find a closed 3D surface that is photometrically consistent with the images and the estimated illumination, i.e., its appearance as predicted by the Lambertian model and the estimated illumination matches the captured images. To achieve this, we use an optimization approach where a cost function penalizing the discrepancy between images and predicted appearance is minimized. Our algorithm optimizes a surface S that is represented as a mesh with vertices $\mathbf{x}_1, \ldots, \mathbf{x}_M$, triangular faces $f = 1, \ldots, F$, and corresponding albedos $\rho_1, \ldots, \rho_F$. We denote by $\mathbf{n}_f$ and $A_f$ the mesh normal and the surface area at face f. Also, let $i_{f,k}$ be the intensity of face f in image k and let $V_f$ be the set of images (a subset of $\{1, \ldots, K\}$) from which face f is visible. The light direction and intensity of the kth image will be denoted by $\mathbf{l}_k$.

Fig. 4. The multiview reconstruction algorithm.

We use a scheme similar to the ones used in [9], [15], where the authors introduce a decoupling between the mesh normals $\mathbf{n}_1, \ldots, \mathbf{n}_F$ and the direction vectors used in the Lambertian model equation. We call these new direction vectors $\mathbf{v}_1, \ldots, \mathbf{v}_F$ photometric normals; they are independent of the mesh normals. The minimization cost is then composed of two terms, where the first term $E_v$ links the photometric normals to the observed image intensities,

$$ E_v\!\left(\mathbf{v}_{1,\ldots,F},\ \rho_{1,\ldots,F},\ \mathbf{x}_{1,\ldots,M}\right) = \sum_{f=1}^{F} \sum_{k \in V_f} \left( \mathbf{l}_k^{\top} \rho_f \mathbf{v}_f - i_{f,k} \right)^2, \qquad (5) $$

and the second term $E_m$ brings the mesh normals close to the photometric normals:

$$ E_m\!\left(\mathbf{x}_{1,\ldots,M},\ \mathbf{v}_{1,\ldots,F}\right) = \sum_{f=1}^{F} \left\| \mathbf{n}_f - \mathbf{v}_f \right\|^2 A_f. \qquad (6) $$

This decoupled energy function is optimized by iterating the following two steps:

1. Photometric normal optimization. The vertex locations are kept fixed while $E_v$ is optimized with respect to the photometric normals and albedos. This is achieved by solving the following independent minimization problem for each face f:

$$ (\mathbf{v}_f, \rho_f) = \arg\min_{\mathbf{v},\, \rho} \sum_{k \in V_f} \left( \mathbf{l}_k^{\top} \rho \mathbf{v} - i_{f,k} \right)^2 \quad \text{s.t.} \quad \|\mathbf{v}\| = 1. \qquad (7) $$

2. Vertex optimization. The photometric normals are kept fixed while $E_m$ is optimized with respect to the vertex locations using gradient descent.

These two steps are interleaved until convergence, which takes about 20 steps for the sequences we experimented with. Typically, each integration phase takes about 100 gradient descent iterations.
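
Each per-face subproblem in (7) reduces to ordinary least squares: substituting $\mathbf{m} = \rho\mathbf{v}$ removes the unit-norm constraint, and $\rho$ and $\mathbf{v}$ are recovered by factoring the minimizer. A minimal numpy sketch under that substitution (names are illustrative, not from the paper):

```python
import numpy as np

def fit_photometric_normal(L, i):
    """One per-face subproblem of (7): find albedo rho and unit
    photometric normal v minimizing  sum_k (l_k . (rho v) - i_k)^2.

    L: Kx3 array whose rows are the light vectors l_k of the images in V_f.
    i: length-K array of observed face intensities i_{f,k}.
    """
    # Unconstrained least squares in m = rho*v ...
    m, *_ = np.linalg.lstsq(L, i, rcond=None)
    # ... then split the minimizer into magnitude (albedo) and direction.
    rho = np.linalg.norm(m)
    v = m / rho if rho > 0 else m
    return v, rho
```
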
Note that, for the second step described above, i.e., evolving the mesh until the surface normals converge to some set of target orientations, a variety of solutions is possible. A slightly different solution to the same geometric optimization problem has recently been proposed in [7], where the target orientations are assigned to each vertex rather than to each face as we do here. That formulation lends itself to a closed-form solution with respect to the position of a single vertex. An iteration of these local vertex displacements yields the desired convergence. As both formulations offer similar performance, the choice between them should be made depending on whether the target orientations are given on a per-vertex or per-facet basis.

The visibility map $V_f$ is the set of images in which we can measure the intensity of face f. It excludes images in which face f is occluded, using the current surface estimate as the occluding volume, as well as images where face f lies in shadow. Shadows are detected by a simple thresholding mechanism, i.e., face f is assumed to be in shadow in image k if $i_{f,k} < \tau_{\text{shadow}}$, where $\tau_{\text{shadow}}$ is a sufficiently low intensity threshold. Due to the inclusion of a significant number of viewpoints in $V_f$ (normally at least four), the system is quite robust to the choice of $\tau_{\text{shadow}}$. For all of the experiments presented here, the value $\tau_{\text{shadow}} = 5$ was used (for intensities in the range 0-255).

As for the highlights, we also define a threshold $\tau_{\text{highlight}}$ such that a face f is assumed to be on a highlight in image k if $i_{f,k} > \tau_{\text{highlight}}$. In order to compute $\tau_{\text{highlight}}$, we need to distinguish between single-albedo and multialbedo objects. Single-albedo objects are easily handled since the light calibration step gives us the light intensity. Hence, under the Lambertian assumption, no point on the surface can produce an intensity higher than the light intensity, i.e., $\tau_{\text{highlight}} = \|\mathbf{l}\|$. In the multialbedo case, $\rho$ can also vary and it is likely that the albedo picked by the robust light estimation algorithm is not the brightest one present on the object. As a result, we prefer to use a global threshold to segment the highlights in the images. This approach works for the porcelain objects because their highlights are very strong and localized, so a simple sensor saturation test is enough to find them, i.e., $\tau_{\text{highlight}} = 254$.
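
Putting the two thresholds together, the sets $V_f$ can be assembled as boolean masks once face intensities and the geometric occlusion test have been computed elsewhere. The sketch below assumes exactly that precomputation and is illustrative only; the array layout and names are ours.

```python
import numpy as np

def visibility_map(I, unoccluded, tau_shadow=5.0, tau_highlight=254.0):
    """Assemble the visibility sets V_f of Section 3.2 as a boolean mask.

    I:          FxK array of face intensities i_{f,k}.
    unoccluded: FxK boolean array from the geometric occlusion test,
                computed with the current surface estimate (assumed given).
    Image k is kept for face f only if the face is unoccluded, not in
    shadow (intensity above tau_shadow), and not saturated by a highlight
    (intensity below tau_highlight).
    """
    return unoccluded & (I > tau_shadow) & (I < tau_highlight)
```
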
4 EXPERIMENTS

The setup used to acquire the 3D model of the object is quite simple (see Fig. 1). It consists of a turntable onto which the object is mounted, a 60 W halogen lamp, and a digital camera. The object rotates on the turntable and 36 images (i.e., a constant angle step of 10 degrees) of the object are captured by the camera while the position of the lamp is changed. In our experiments, we used three different light positions, which means that the position of the lamp was changed after 12 and again after 24 frames. The distant light source assumption is satisfied if an object of 15 cm extent is placed 3-4 m away from the light.

The algorithm (see Fig. 4) was tested on five challenging shiny objects: two porcelain figurines, shown in Fig. 5; two fine-relief Chinese Qing-dynasty porcelain vases, shown in Fig. 6; and one textured jade Buddha figurine, shown in Fig. 7. Thirty-six 3,456 x 2,304 images of each of the objects were captured under three different illuminations. The object silhouettes were extracted by intensity thresholding and were used to estimate camera motion and construct the visual hull (second row of Fig. 5). The visual hull was processed by the robust light estimation scheme of Section 3.1 to recover the distant light-source directions and intensities in each image. The photometric stereo scheme of Section 3.2 was then applied.

Fig. 5. Reconstructing porcelain figurines. Two porcelain figurines reconstructed from a sequence of 36 images each (some of the input images are shown in (a)). The object moves in front of the camera and the illumination (a 60 W halogen lamp) changes direction twice during the image capture process. (a) Input images. (b) Visual hull reconstruction. (c) Our results. (d) Close-up views of the porcelains. (e) Close-up views of the reconstructed models.

The results in Fig. 6 show reconstructions of porcelain vases with very fine relief. The reconstructed relief (especially for the vase on the right) is less than a millimeter, while their height is approximately 15-20 cm.

Fig. 6. Reconstructing Chinese Qing-dynasty porcelain vases. (a) Sample of input images. (b) Proposed method. The resulting surface captures all of the fine details present in the images, even in the presence of strong highlights.

Fig. 7 shows a detailed reconstruction of a Buddha figurine made of polished jade. This object is actually textured, which implies that classic stereo algorithms could be applied. Using the camera motion information and the captured images, a state-of-the-art multiview stereo algorithm [16] was executed. The results are shown in the second row of Fig. 7. It is evident that, while the low-frequency component of the geometry of the figurine is correctly recovered, the high-frequency detail obtained by [16] is noisy. The reconstructed model appears bumpy even though the actual object is quite smooth. Our results do not exhibit surface noise while capturing very fine details such as surface cracks.

Fig. 7. Reconstructing colored jade. (a) Two input images. (b) Model obtained by the multiview stereo method of [16]. (c) Proposed method. The resulting surface is filtered from noise, while new high-frequency geometry is revealed (note the reconstructed surface cracks in the middle of the figurine's back).

To quantitatively analyze the performance of the multiview photometric stereo scheme presented here against ground truth, an experiment on a synthetic scene was performed (Fig. 8). A 3D model of a sculpture (digitized via a different technique) was rendered from 36 viewpoints with uniform albedo, using the Lambertian reflectance model. The 36 frames were split into three sets of 12 and, within each set, the single distant illumination source was held constant. Silhouettes were extracted from the images and the visual hull was constructed. This was then used to estimate the illumination direction and intensity as described in Section 3.1. In 1,000 runs of the illumination estimation method for the synthetic scene, the mean light direction estimate was 0.75 degrees away from the true direction with a standard deviation of 0.41 degrees. The model obtained by our algorithm was compared to the ground truth surface by measuring the distance of each point on our model from the closest point in the ground truth model. This distance was found to be about 0.5 mm when the length of the biggest diagonal of the bounding box volume was defined to be 1 m. Even though this result was obtained from perfect noiseless images, it is quite significant since it implies that any loss of accuracy can only be attributed to violations of our assumptions rather than to the optimization methods themselves. Many traditional multiview stereo methods would not be able to achieve this due to the strong regularization that must be imposed on the surface. By contrast, our method requires no regularization when faced with perfect noiseless images.
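
The accuracy figure above amounts to a mean closest-point distance, normalized by the bounding-box diagonal. The paper does not specify its implementation, so the following scipy sketch, including the point sampling and the use of a k-d tree, reflects our own choices:

```python
import numpy as np
from scipy.spatial import cKDTree

def mean_surface_error(reconstructed_pts, ground_truth_pts):
    """Mean distance from each reconstructed point to its closest
    ground-truth point, with the ground-truth bounding-box diagonal
    normalized to unit length (an assumed reading of the paper's metric).

    reconstructed_pts, ground_truth_pts: Nx3 and Px3 arrays of surface
    points sampled from the two meshes (sampling assumed done elsewhere).
    """
    lo = ground_truth_pts.min(axis=0)
    hi = ground_truth_pts.max(axis=0)
    diagonal = np.linalg.norm(hi - lo)
    dists, _ = cKDTree(ground_truth_pts).query(reconstructed_pts)
    return dists.mean() / diagonal
```
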
Finally, we investigated the effect of the number of frames during which the illumination is held constant with respect to the camera frame. Our algorithm can, in theory, obtain the illumination direction and intensity in every image independently. However, keeping the lighting fixed over two or more frames and supplying that knowledge to the algorithm can significantly improve the estimates. The next experiment was designed to test this improvement by performing light estimation over K images in which the light has been kept fixed with respect to the camera. The results are plotted in Fig. 8b and show the improvement in the accuracy of the recovered lighting directions as K increases from 1 to 12. The metric used was the angle between the ground truth light direction and the estimated light direction over 1,000 runs of the robust estimation scheme. For K = 1, the algorithm achieves a mean error of 1.57 degrees with a standard deviation of 0.88 degrees, while, for K = 12, it achieves 0.75 degrees with a standard deviation of 0.41 degrees. The decision in selecting a value for K should weigh the trade-off between practicality and maximizing the total number of different illuminations in the sequence, which is M/K, where M is the total number of frames.

Fig. 8. Synthetic evaluation. (a) The accuracy of the algorithm was evaluated using an image sequence synthetically generated from a 3D computer model of a sculpture. This allowed us to compare the quality of the reconstructed model against the original 3D model as well as to measure the accuracy of the light estimation. The figure shows the reconstruction results obtained below the images of the synthetic object. The mean distance of all points of the reconstructed model from the ground truth was found to be about 0.5 mm if the bounding volume's diagonal is 1 m. (b) The effect of varying the length of the frame subsequences that have constant light. The angle between the recovered light direction and the ground truth has been measured over 1,000 runs of the RANSAC scheme for each number of frames under constant lighting. With just a single frame per illumination, the algorithm achieves a mean error of 1.57 degrees with a standard deviation of 0.88 degrees. With 12 frames sharing the same illumination, the mean error drops to 0.75 degrees with a standard deviation of 0.41 degrees.

5 CONCLUSION

This paper has presented a novel reconstruction technique using silhouettes and the shading cue to reconstruct Lambertian objects in the presence of highlights. The main contribution of the paper is a robust, fully self-calibrating, efficient setup for the reconstruction of such objects, which allows the recovery of a detailed 3D model viewable from 360 degrees. We have demonstrated that the powerful silhouette cue, previously known to give camera motion information, can also be used to extract photometric information. In particular, we have shown how the silhouettes of a Lambertian object are sufficient to recover an unknown illumination direction and intensity in every image. Apart from the theoretical importance of this fact, it also has practical significance for a variety of techniques which assume a precalibrated light source and which could use the silhouettes for this purpose, thus eliminating the need for special calibration objects and the time-consuming manual calibration process.

REFERENCES

[1] M. Levoy, "Why Is 3D Scanning Hard?" Proc. 3D Processing, Visualization, Transmission, invited address, 2002.
[2] S. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, "A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms," Proc. IEEE Conf.
Computer Vision and Pattern Recognition, vol. 1, pp. 519-528, 2006.
[3] R. Woodham, "Photometric Method for Determining Surface Orientation from Multiple Images," Optical Eng., vol. 19, no. 1, pp. 139-144, 1980.
[4] J. Lim, J. Ho, M. Yang, and D. Kriegman, "Passive Photometric Stereo from Motion," Proc. IEEE Int'l Conf. Computer Vision, vol. 2, pp. 1635-1642, Oct. 2005.
[5] J. Paterson, D. Claus, and A. Fitzgibbon, "BRDF and Geometry Capture from Extended Inhomogeneous Samples Using Flash Photography," Proc. Eurographics '05, vol. 24, no. 3, pp. 383-391, 2005.
[6] F. Bernardini, H. Rushmeier, I. Martin, J. Mittleman, and G. Taubin, "Building a Digital Model of Michelangelo's Florentine Pietà," IEEE Computer Graphics and Applications, vol. 22, no. 1, pp. 59-67, Jan./Feb. 2002.
[7] D. Nehab, S. Rusinkiewicz, J. Davis, and R. Ramamoorthi, "Efficiently Combining Positions and Normals for Precise 3D Geometry," Proc. ACM SIGGRAPH, pp. 536-543, 2005.
[8] G. Vogiatzis, P. Favaro, and R. Cipolla, "Using Frontier Points to Recover Shape, Reflectance and Illumination," Proc. IEEE Int'l Conf. Computer Vision, pp. 228-235, 2005.
[9] H. Jin, D. Cremers, A. Yezzi, and S. Soatto, "Shedding Light in Stereoscopic Segmentation," Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 36-42, 2004.
[10] O. Drbohlav and M. Chantler, "Can Two Specular Pixels Calibrate Photometric Stereo?" Proc. IEEE Int'l Conf. Computer Vision, pp. 1850-1857, 2005.
[11] C. Hernández, F. Schmitt, and R. Cipolla, "Silhouette Coherence for Camera Calibration under Circular Motion," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 2, pp. 343-349, Feb. 2007.
[12] A. Laurentini, "The Visual Hull Concept for Silhouette-Based Image Understanding," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 2, pp. 150-162, Feb. 1994.
[13] R. Cipolla and P. Giblin, Visual Motion of Curves and Surfaces. Cambridge Univ. Press, 1999.
[14] M. Fischler and R. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Comm. ACM, vol. 24, no. 6, pp. 381-395, 1981.
[15] G. Vogiatzis, C. Hernández, and R. Cipolla, "Reconstruction in the Round Using Photometric Normals and Silhouettes," Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 1847-1854, 2006.
[16] C. Hernández and F. Schmitt, "Silhouette and Stereo Fusion for 3D Object Modeling," Computer Vision and Image Understanding, vol. 96, no. 3, pp. 367-392, Dec. 2004.