
Variational Regularization and Fusion of Surface Normal Maps

2014 Second International Conference on 3D Vision (3DV)

In this work we propose an optimization scheme for variational, vectorial denoising and fusion of surface normal maps. These are common outputs of shape from shading, photometric stereo or single image reconstruction methods, but tend to be noisy and require post-processing for further usage. Processing of normal maps, which do not provide knowledge about the underlying scene depth, is complicated by their unit length constraint, which renders the optimization non-linear and non-convex. The presented approach builds upon a linearization of the constraint to obtain a convex relaxation, while guaranteeing convergence. Experimental results demonstrate that our algorithm generates more consistent representations from estimated and potentially complementary normal maps.

Bernhard Zeisl
Computer Vision and Geometry Group, ETH Zurich, Switzerland

Christopher Zach
Toshiba Research Europe, Cambridge, UK

Marc Pollefeys
Computer Vision and Geometry Group, ETH Zurich, Switzerland

I. INTRODUCTION

Noisy vectorial images representing surface normals appear in several computer vision applications, and subsequent processing of normal fields relies on cleaner estimates with much of the noise removed. While the denoising of standard single-channel intensity or multi-channel color images, and also of depth maps, has received a lot of interest in the literature (e.g. [1]), removing noise from surface normals is usually addressed via mesh processing on surfaces (e.g. [2]), but typically not by image denoising applied to normal maps. However, in many applications a surface normal is estimated per pixel, and a natural representation is then based on normal maps, i.e. a mapping from a regular image grid to surface normals, without knowledge of the actual 3D scene geometry. Noisy normal maps are the intermediate result of shape-from-shading [3], [4] and photometric stereo techniques (e.g. [5], [6]).
Further, surface orientations are the natural output of single-image relative depth estimation [7], [8] and of direct regression of normals from a single image [9], [10], [11]; a research direction which has gained quite some interest recently [12], also because the estimated, coarse information about the underlying scene geometry has been shown to help other computer vision tasks such as object detection, semantic reasoning or scene understanding. In addition to the denoising task, regularized fusion of normal maps is important if several complementary estimates of surface normals are given, which should be merged to return one consistent normal map.

978-1-4799-7000-1/14 $31.00 © 2014 IEEE. DOI 10.1109/3DV.2014.92

Figure 1: Exemplary result for our total variation based fusion of estimated surface normals (cf. Fig. 5 for color coding). Panels: input image, dense predictions, sparse primitives [9], and our fused result.

In contrast to multi-channel image denoising [1], where the unknown vector per pixel is either unconstrained or has simple box constraints (max. and min. value), the processing of surface normals is more difficult due to the unit length constraint on normals. This non-linear and non-convex constraint means that a global minimum of a denoising objective over surface normals is hard to find. In this work we propose a vectorial optimization scheme and demonstrate that by linearizing the constraint we can guarantee a decrease of the objective in each iteration, and consequently that our optimization method is convergent. The observation that man-made environments exhibit predominantly piece-wise planar geometries is incorporated as a total variation based regularization, which favors piece-wise constant normals (or piece-wise smooth normals, as discussed in Sec. II).

The remainder of our manuscript is structured as follows. In Sec. II we state our problem formulation and show how to cope with the norm constraint on normals. Then Sec. III details our iterative optimization approach, while particular considerations for 3D surface normals and the implementation itself are discussed in Sec. IV and V, respectively. Finally, experiments and their results are presented in Sec. VI.

Figure 2: Denoising of a 2D normal map. (a) Ground truth orientation field. (b) The same orientation field perturbed by noise, i.e. n′ = d/‖d‖₂ with d = n + (r_x, r_y)ᵀ, where the r_i are random variables drawn from N(0, 0.4). (c) Denoised result with TV regularization. (d) Denoised result with Huber-TV regularization; note the smoother transition of the direction vectors compared to (c), e.g. in the highlighted region. For both denoised results the weight λ of the data term was set to 0.5.

II. DENOISING OF ORIENTATION FIELDS

We first present our algorithm for the general n-dimensional case, i.e. the optimization of surface normals of hyperplanes, parameterized as points lying on the respective hypersphere. In Sec. IV we will then concentrate on the important case of 3D surface normals and point out related geometric properties.

A normal vector could be efficiently represented in spherical coordinates; however, this choice of parameterization leads to non-linearities in any optimization procedure, due to the periodicity of the angular measure. We take a different approach and choose an (over-)parameterization in vector space. Such vector fields occur e.g. as 2D displacement fields in optical flow or as multi-channel color values. However, surface normals are constrained to be of unit length, which is a difficult non-convex constraint. In the following we linearize this constraint and demonstrate that the resulting convex formulation is a majorizer of the original non-convex objective.
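The periodicity problem of an angular parameterization mentioned above can be made concrete with a short sketch (an added illustration, not taken from the paper): two nearly identical 2D directions straddling the ±π boundary appear far apart in angle space, but close under the vector (over-)parameterization.

```python
import numpy as np

# Two nearly identical directions on either side of the +/- pi boundary.
t1, t2 = np.pi - 0.05, -np.pi + 0.05

# A naive angular difference suggests a large jump ...
angular_gap = abs(t1 - t2)            # ~ 6.18 rad

# ... while the vector (over-)parameterization sees a small one.
v1 = np.array([np.cos(t1), np.sin(t1)])
v2 = np.array([np.cos(t2), np.sin(t2)])
vector_gap = np.linalg.norm(v1 - v2)  # ~ 0.10
```

Any optimization acting directly on angles would have to handle this wrap-around explicitly, whereas the vector representation trades it for the unit-length constraint discussed in the following.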
Based on this majorization property, the majorize-minimize principle [13] can be applied to determine an (approximate) minimizer. Via this relaxed representation we implicitly obtain a minimal parameterization of our optimization problem.

Let us start by stating the original objective function to minimize in n-dimensional space. Given an image domain Ω, we search for surface normals n which minimize the objective function

  E(n) = ∫_Ω ( λ Σ_l φ_l(x, n(x)) + Σ_{d∈{h,v}} w_d(x) ψ(∇_d n(x)) ) dx   s.t. ‖n(x)‖₂ = 1,   (1)

where {h, v} denote the horizontal and vertical direction in the image domain. We use a (weighted) anisotropic regularizer, Σ_d w_d ψ(∇_d n), where ψ : R₀⁺ → R is a monotonically increasing convex function. The weights w_d ≥ 0 are induced by texture gradients of the underlying image, which serve as strong cues for geometric edges. By this we aim for a discontinuity preserving, image-driven regularization: the weighting facilitates regularization within homogeneous image regions, but suppresses regularization along texture gradients. The data term φ_l : Ω × Rⁿ → R is discussed below. In Eq. (1) we explicitly stated the dependence of functions and unknowns on the image location x, but for clarity we will drop the respective argument in the following. We will also use n etc. to denote both a single normal and an entire normal map; the meaning should be clear from the context. We abbreviate the regularization part of Eq. (1) by

  R(n) := ∫_Ω Σ_{d∈{h,v}} w_d ψ(∇_d n) dx.   (2)

Our choice of regularization is motivated by the assumption that we expect piecewise smooth normal fields. A convex regularizer favoring such piecewise smooth solutions is given by a "smoothed" total variation, where the non-differentiable norm is replaced by the Huber cost [14],

  ψ(∇_d n) = ‖∇_d n‖₂² / (2ε)   if ‖∇_d n‖₂ < ε,   and   ψ(∇_d n) = ‖∇_d n‖₂ − ε/2   else.   (3)

Clearly, this regularizer approaches the total variation for ε → 0. For generality we will thus just consider the Huber norm in the remainder of this work; piecewise constant regularization behavior is achieved by setting ε = 0 (cf. Fig. 2 for an example of 2D normal field denoising).

The data terms φ_l(x, n) measure the data fidelity between the measured observations and the optimized normals n. The exact form of φ_l depends on the assumed noise model for the observations, but in this work we assume convex φ_l. We allow several data terms per pixel (indexed by l) in order to allow several estimates of normals to be merged. Suitable choices in our application are the squared and the unsquared L2 distance between the observation f_l and the unknown n,

  (i) φ_l(x, n) = (w_l(x)/2) ‖n − f_l(x)‖₂²   or   (ii) φ_l(x, n) = w_l(x) ‖n − f_l(x)‖₂.   (4)

The unsquared distance is known to be more robust to outlier measurements due to the heavier-than-Gaussian tails of the underlying noise model. It has been used e.g. in image [15] and shape denoising [16]. For normalized vectors it holds that

  (w_l/2) ‖n − f_l‖₂² = w_l (1 − nᵀf_l).   (5)

Therefore the squared L2 distance equals the (weighted) cosine distance. If we regard the data fidelity in terms of the angular error θ = arccos(nᵀf_l), Fig. 3 illustrates that our considered choices of error measurements are robust cost functions w.r.t. the angular deviation. In the remainder of this work we will focus on φ_l(n) = w_l ‖n − f_l‖₂ due to its increased robustness. With our choice of convex φ_l, the only source of non-convexity in the overall formulation is the pixel-wise unit-length constraint on the normals, which is addressed below.

Figure 3: The two considered choices for the data fidelity term φ_l have the desired property of being robust in the angular error θ (solid lines): (i) φ(θ) = ½‖n − f‖₂² = 1 − cos(θ) and (ii) φ(θ) = ‖n − f‖₂ = √(2 − 2cos(θ)). In the linearized domain (dashed lines, with n replaced by h(s)) the cost functions are well approximated in the locality of the working point (for the plots f = n₀ is assumed).

Figure 4: Illustration of the locally linearized norm constraint in 2D. The resulting optimization takes place in the tangent space T with minimal parameterization in s.

A. Linearization of the norm constraint

As pointed out previously, the objective function of Eq. (1) poses a non-convex optimization problem, solely due to the non-convex constraint on the normal vectors. One can obtain an obvious convex relaxation by replacing the non-convex set {n : ‖n‖₂ = 1} with its convex hull, {n : ‖n‖₂ ≤ 1}, but such an approach would suffer from a strong tendency to shrink the unknown normals due to the regularization term. As an alternative we propose to use the tangent plane approximation of the non-linear manifold {n : ‖n‖₂ = 1}, and thus the non-linear constraint is replaced by a linear one. The tangent plane at a linearization point n₀ is given by

  ‖n‖₂² ≈ n₀ᵀn₀ + 2 n₀ᵀ(n − n₀) = 1.   (6)

Since the unit sphere is a smooth manifold, the resulting tangent plane equation n₀ᵀn = 1 gives a very good approximation of the original constraint near n₀. Fig. 4 illustrates the linearization for the 2D case. Effectively, we have transformed the feasible set from a hypersphere to the hyper-(tangent-)plane. This allows us to perform the optimization in a local parameterization within the tangent space T := {n : n₀ᵀn = 1}. The orthogonal complement of the working point n₀ builds a basis B ∈ R^{n×(n−1)} for T; it corresponds to the null space of n₀ᵀ, i.e. B = null(n₀ᵀ). Consequently, the minimal parameterization for surface normals under a linearized constraint is given by s ∈ R^{n−1}, where s lives in an (n−1)-dimensional subspace.¹ The needed transformations from the local parameterization to global normals and vice versa are given by the linear maps

  n = h(s) = B s + n₀   and   s = h⁻¹(n) = Bᵀ(n − n₀).   (7)

For the 2- and 3-dimensional case a basis B is directly computed via orthogonal vector and cross product expansion, respectively. In the n-dimensional case the Gram-Schmidt process is an efficient algorithm, compared to a full singular value decomposition, to retrieve the null space. What remains is our transformed convex, but non-smooth objective function

  min_s ∫_Ω λ Σ_l φ_l(h(s)) dx + R(h(s)),   (8)

which is now optimized within the coordinates s in the tangent space T of the current valid linearization.

¹The authors in [17] propose to obtain a minimal parameterization for 3D normals by considering only 2 of the components. However, we claim that such a representation leads to a non-uniform, data-dependent regularization, because the resulting gradient, and thus the regularization strength, correlates with the normal direction, which is an undesired property.

B. Majorize-minimize framework

Following standard practice in non-linear optimization with non-linear constraints, we solve a sequence of surrogate problems in order to minimize the original objective. In contrast to very general approaches such as sequential quadratic programming, we have the guarantee that the objective does not increase when solving the convex surrogate problem. This is a consequence of

  Ĕ(n) := ∫_Ω λ Σ_l ‖n − f_l‖₂ dx + R(n)   s.t. nᵀn₀ = 1   (9)

being a majorizer [13] of our objective

  E(n) := ∫_Ω λ Σ_l ‖n − f_l‖₂ dx + R(n)   s.t. ‖n‖₂ = 1.   (10)

This can be seen by noting that (i) the tangent plane {n : nᵀn₀ = 1} lies outside the unit sphere (which means that the projection onto the set {n : ‖n‖₂ ≤ 1} is always on its boundary {n : ‖n‖₂ = 1}), and (ii) that in general the projection onto a convex set is non-expansive.
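As a concrete sketch of the local parameterization in Eqs. (6)-(8), the following NumPy fragment builds a tangent basis for a 3D working point and checks the linearized constraint. It is an added illustration under simplifying assumptions: the basis is obtained via an SVD null space purely for brevity (the text above suggests cross products or Gram-Schmidt), and all numeric values are arbitrary examples.

```python
import numpy as np

def tangent_basis(n0):
    """Orthonormal basis B with B^T n0 = 0; the columns span the
    tangent space T at the working point n0 (null space of n0^T)."""
    _, _, vt = np.linalg.svd(n0.reshape(1, -1))
    return vt[1:].T                      # shape (n, n-1)

def h(s, B, n0):                         # Eq. (7): local coords -> normal
    return B @ s + n0

def h_inv(n, B, n0):                     # Eq. (7): normal -> local coords
    return B.T @ (n - n0)

n0 = np.array([0.0, 0.6, 0.8])           # unit-length working point
B = tangent_basis(n0)                     # B in R^{3x2}

s = np.array([0.3, -0.2])                 # minimal parameterization s
n = h(s, B, n0)

# n satisfies the linearized constraint n0^T n = 1 (Eq. 6) ...
on_tangent_plane = np.isclose(n0 @ n, 1.0)
# ... and the round trip through the minimal parameterization is exact.
round_trip_ok = np.allclose(h_inv(n, B, n0), s)
```

For normal maps, one such basis would be computed per pixel and rebuilt after every re-linearization of the constraint.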
More formally, define the sets

  B := {n : ‖n‖₂ ≤ 1}   and   ∂B := {n : ‖n‖₂ = 1},

and the projection onto B,

  Π_B(n̂) = argmin_{n∈B} ‖n − n̂‖₂ = n̂ if ‖n̂‖₂ ≤ 1, and n̂/‖n̂‖₂ otherwise.   (11)

It is well known that projections onto convex sets are (firmly) non-expansive (see e.g. [18]); hence Π_B is non-expansive due to the convexity of the set B. Thus we have

  ‖Π_B(n̂) − Π_B(n̂′)‖ ≤ ‖n̂ − n̂′‖   and   f(‖Π_B(n̂) − Π_B(n̂′)‖) ≤ f(‖n̂ − n̂′‖)   (12)

for any monotonically increasing function f. Therefore we have for the discretized gradient

  ψ(∇_d n̂) = ψ(‖n̂(x) − n̂(x′)‖) ≥ ψ(‖Π_B(n̂(x)) − Π_B(n̂(x′))‖)

for suitable neighboring pixels x and x′ and our choice of a monotonically increasing ψ(·) in Eq. (3). For the L2 data fidelity term we obtain analogously

  ‖n̂(x) − f_l(x)‖₂ ≥ ‖Π_B(n̂(x)) − Π_B(f_l(x))‖₂ = ‖Π_B(n̂(x)) − f_l(x)‖₂,

since the noisy observation f_l(x) has unit length. Let n̂* be a minimizer of our relaxed objective function in Eq. (9) and n* = Π_B(n̂*) its pixel-wise projection. By applying the above facts term-wise, we see that inserting n* into the original objective of Eq. (10) yields a value lower than or equal to that of the objective in Eq. (9) at n̂*, and the objective values coincide at n₀. Overall, by repeatedly minimizing Eq. (9) and subsequently re-linearizing the norm constraint, we obtain a non-increasing sequence of objective values (which converges, since the energy is non-negative, i.e. bounded from below).

III. OPTIMIZATION

For the optimization of our objective function in Eq. (8) we build upon the primal-dual algorithm proposed in [19]. In order to apply it, the primal problem needs to be transformed into a convex-concave saddle-point problem, which can be shown (using convex conjugacy [20]) to have the form

  min_s max_{p,q} ∫_Ω Σ_{d∈{h,v}} ( ⟨p_d, ∇_d h(s)⟩ − (ε/2) ‖p_d‖₂² ) + Σ_l ⟨q_l, h(s) − f_l⟩ dx   (13)

subject to the constraints ‖p_d‖₂ ≤ w_d and ‖q_l‖₂ ≤ λ w_l (applied pixel-wise); for the L2 data fidelity term the inner products read ⟨q_l, Bs + n₀ − f_l⟩. The optimization follows an iterative procedure exhibiting gradient descent steps for the primal variables and gradient ascent steps for the dual variables, followed by appropriate projection steps. The projection operators for the dual variables are defined as

  P_P(p̂_d) = (p̂_d / (1 + σε)) / max{1, ‖p̂_d‖₂ / ((1 + σε) w_d)}   and   P_Q(q̂_l) = q̂_l / max{1, ‖q̂_l‖₂ / (λ w_l)},   (14)

where σ denotes the dual step size. In each iteration the dual variables are updated via

  p_d^{k+1} = P_P( p_d^k + σ ∇_d h(s̄) )   (15)
  q_l^{k+1} = P_Q( q_l^k + σ (h(s̄) − f_l) ).   (16)

The update of the primal variables follows the gradients of the regularization and data terms,

  s^{k+1} = s^k + τ Bᵀ ( Σ_{d∈{h,v}} ∇_dᵀ p_d − Σ_l q_l ),   (17)

where τ is the primal step size. The parameterization in the tangent space has the effect that both sets of dual variables are inversely mapped to the local coordinate system of s. In addition to s^k, the algorithm maintains an over-relaxation of the primal variables, updated via s̄^{k+1} = 2 s^{k+1} − s^k and n̄^{k+1} = h(s̄^{k+1}), respectively. This extrapolation step from the previous iteration enables faster convergence of the procedure.

For a given linearization point, and thus current local parameterization, the algorithm is run for a number of inner iterations. Then we apply a projection

  n₀ ← n^{k+1} = h(s^{k+1}) / ‖h(s^{k+1})‖₂   (18)

onto the hypersphere, such that the original norm constraint is fulfilled. As indicated, this result serves as the new working point n₀, and a new linearization, local parameterization and orthogonal complement for the tangent space are computed. The overall optimization is run until global convergence.

Figure 5: Denoising of a 3D normal map. (a) Noisy surface normals with σ = 0.4. (b) Mask indicating infeasible normal directions in the camera coordinate system. (c) Result of our variational regularization, which produces better results, while (d) bilateral filtering (BF) yields bumpy surfaces (with the optimal filter size w = 5 according to Fig. 6). (e) Coloring of the normals used throughout this work; each normal denotes a point on the feasible half-sphere.
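The inner primal-dual iterations (Eqs. 15-17) together with the outer majorize-minimize loop and re-projection (Eq. 18) can be sketched as follows. This is a simplified single-observation NumPy illustration, not the paper's multi-threaded C++ implementation: it assumes unit regularization and data weights (w_d = w_l = 1), omits the visibility handling of Sec. IV and the coarse-to-fine scheme, and the parameter and step-size values are example choices only.

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary; u: (H, W, C)."""
    gh, gv = np.zeros_like(u), np.zeros_like(u)
    gh[:, :-1] = u[:, 1:] - u[:, :-1]
    gv[:-1, :] = u[1:, :] - u[:-1, :]
    return gh, gv

def div(ph, pv):
    """Divergence (negative adjoint of grad), backward differences."""
    d = np.zeros_like(ph)
    d[:, 0] += ph[:, 0]; d[:, 1:] += ph[:, 1:] - ph[:, :-1]
    d[0, :] += pv[0, :]; d[1:, :] += pv[1:, :] - pv[:-1, :]
    return d

def bases(n0):
    """Per-pixel orthonormal tangent bases B: (H, W, 3, 2), B^T n0 = 0."""
    a = np.where(np.abs(n0[..., :1]) < 0.9,
                 np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
    b1 = np.cross(n0, a)
    b1 /= np.linalg.norm(b1, axis=-1, keepdims=True)
    b2 = np.cross(n0, b1)
    return np.stack([b1, b2], axis=-1)

def proj_ball(p, r):
    """Pixel-wise projection onto the ball of radius r (cf. Eq. 14)."""
    return p / np.maximum(1.0, np.linalg.norm(p, axis=-1, keepdims=True) / r)

def denoise_normals(f, lam=0.5, eps=0.05, outer=10, inner=40,
                    sigma=0.25, tau=0.25):
    """Huber-TV denoising of a unit normal map f: (H, W, 3)."""
    n0 = f / np.linalg.norm(f, axis=-1, keepdims=True)
    ph, pv, q = np.zeros_like(f), np.zeros_like(f), np.zeros_like(f)
    for _ in range(outer):                      # majorize-minimize loop
        B = bases(n0)                           # local parameterization
        s = np.zeros(f.shape[:2] + (2,))
        s_bar = s.copy()
        for _ in range(inner):                  # primal-dual inner loop
            n_bar = np.einsum('hwij,hwj->hwi', B, s_bar) + n0    # h(s_bar)
            gh, gv = grad(n_bar)
            # dual ascent, Huber prox and projections (Eqs. 14-16)
            ph = proj_ball((ph + sigma * gh) / (1.0 + sigma * eps), 1.0)
            pv = proj_ball((pv + sigma * gv) / (1.0 + sigma * eps), 1.0)
            q = proj_ball(q + sigma * (n_bar - f), lam)
            # primal step, mapped to tangent coordinates by B^T (Eq. 17)
            step = div(ph, pv) - q
            s_new = s + tau * np.einsum('hwij,hwi->hwj', B, step)
            s_bar = 2.0 * s_new - s             # over-relaxation
            s = s_new
        n = np.einsum('hwij,hwj->hwi', B, s) + n0
        n0 = n / np.linalg.norm(n, axis=-1, keepdims=True)   # Eq. (18)
    return n0
```

The step sizes are chosen so that στ stays below 1/L² for the combined gradient-plus-identity operator, and the field is re-normalized after every outer pass, so the returned normals are unit length by construction.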
IV. APPLICATION TO 3D SURFACE NORMALS

In the 3-dimensional case we are able to consider visibility information and thereby constrain the feasible set of possible normal directions. Without loss of generality, let us assume that surface normals are defined in the local camera coordinate system and that they point towards the camera. For a given image location x = (x, y), the point ray originating from the camera center and passing through x is defined as r = K⁻¹(x, y, 1)ᵀ, where K defines the camera intrinsics. This setup limits surface normals to lie on a half-sphere, since only surfaces which are oriented towards the camera are visible. As a result, the constraint

  nᵀr ≤ 0   ∀x ∈ Ω   (19)

must be fulfilled for each image location individually. One can utilize this prior information in a preprocessing step for the orientation field. It is encoded via the data term weights, i.e. w_l = 1 if the observed surface normal is visible and 0 otherwise. As a result, the optimization procedure will fill in the missing values. Fig. 5(b) illustrates an exemplary mask for infeasible normals.

Visibility consistent regularization: The previous preprocessing makes a hard decision about feasible normals. If the constraint is just slightly violated due to noise in the normals, but the direction is assumed to be largely correct, we can encode the visibility constraint of Eq. (19) directly in the optimization. This has the effect that the observation still exerts a driving force, but the optimized normals are guaranteed to be feasible, i.e. visible. In order to achieve this, the iterative optimization procedure of Sec. III needs only a slight modification: the original primal problem in Eq. (8), and accordingly the primal-dual formulation of Eq. (13), are augmented by the visibility constraint. Within the optimization procedure it is simply handled by an additional projection step. In each iteration, after the primal update and the projection step of Eq. (18), an additional projection

  n ← P_N(n) = n   if nᵀr ≤ 0,   and   n ← P_N(n) = (r × n × r) / ‖r × n × r‖₂   else,   (20)

onto the feasible half-sphere, defined by the point ray r for normal n, is performed. The cross products create a 3D vector which is closest to n and orthogonal to r, and thus lies on the boundary of the feasible set.

V. IMPLEMENTATION

So far our optimization has been formulated in the continuous domain. For the implementation, operation on a discrete image grid is required. In this regard, derivative operators have been discretized as forward differences with reflecting (Neumann) boundary conditions. As a result, the divergence operator is based on backward differences with Dirichlet boundary conditions. For algorithm initialization each normal is assigned the respective data term, i.e. n = f₀. Infeasible 3D normals in the data terms are initialized analogously to Eq. (20), or, in case of missing data, defined to point towards the camera. For orientation fields with large portions of missing data we have created a coarse-to-fine framework for faster convergence.

Image gradients are computed via 3 × 3 Sobel filters in the horizontal and vertical image direction. In addition we pre-smooth the image with a Gaussian kernel with a standard deviation of 1.2 pixels to suppress image noise. The algorithm itself has been implemented as a multi-threaded C++ application. Average run-times for the 3D surface normal denoising results in Sec. VI (640 × 480 images, 1000 iterations) are around 5 seconds on a current desktop computer with 8 cores. Due to the decomposition into independent per-pixel computations for both the local parameterization and the primal and dual update steps, the algorithm would lend itself to a highly parallel implementation on a GPU.

VI. EXPERIMENTS AND RESULTS

A. Denoising Performance

In a first experiment we evaluate the denoising performance of our algorithm. The required ground truth measurements have been obtained from 4 different synthetic 3D models (one shown in Fig. 5). We rendered several depth maps from the models and computed point-wise normals via a least squares kernel within a RANSAC scheme to preserve geometry edges. In this way 9 ground truth normal maps have been created for evaluation. Each of them has then been distorted by noise of varying strength; i.e. a noisy normal is computed as n′ = d/‖d‖₂, where d = n + (r_x, r_y, r_z)ᵀ and the r_i are Gaussian random variables with σ ∈ [0.1, ..., 0.9].

We compare our results to cross bilateral filtering [21] (which also exhibits edge-preserving properties) adapted to orientation fields. There, each normal is replaced by a weighted average of normals from nearby pixels. The weight itself is based on two Gaussian distributions over the Euclidean distance of pixels and the intensity difference of the underlying image, respectively. The corresponding standard deviations were set to σ_S = (5/3) w, where w is half the window size of the filter, and σ_I = 0.1 for image intensities in the range [0, 1]. The weighted average of normals is computed as the extrinsic mean (i.e. the normalized Euclidean mean).

Quantitative performance is given in Fig. 6. We measure the deviation of our denoised results as the mean squared error (MSE) between normals on the unit sphere; as pointed out in Eq. (5), this coincides with the cosine distance. The reported peak signal-to-noise ratios are computed as PSNR = 10 log₁₀(d²_max / MSE), where d_max = 2 denotes the maximal error between normals. As can be seen in Fig. 6, the performance of bilateral filtering is quite sensitive to the chosen filter size (reported are the top results among several tested kernel sizes). In contrast, our approach achieves superior results with one set of parameters over the analyzed noise range.

Figure 6: Average denoising performance measured as PSNR (in dB) over the noise level σ (legend: TV with λ = 1, ε = 0.05; BF with w = 3, 5, 7). The curves represent averaged results for 9 normal maps, which were obtained from synthetic 3D models and perturbed by increasing artificial noise.

B. Regularization and Fusion of Normal Estimates

In a second experiment we explore the task of regularization and fusion of predicted 3D surface normal estimates from a single image. For denoising we utilize state-of-the-art prediction results [10] on the NYU2 [22] dataset. Fig. 7 lists several examples of our denoising results. The initial predictions describe the observed scene geometry well, but are noisy and may contain visually infeasible estimates. The goal of the regularization is to obtain cleaner, more simplistic results, which is accomplished by our algorithm. It produces a consistent labeling within homogeneous regions and follows boundaries well. The parameter settings for this experiment were λ = 0.1 and ε = 0.05. The decreased weight on the data term compared to the previous experiment is explained by the weaker quality of the observations, requiring stronger regularization. Typical examples of partially infeasible predictions are highlighted in the examples; our method resolves those areas to more consistent estimates.

We do not report a PSNR value, because no accurate ground truth normals are available for the NYU2 dataset. The training data used in [9] or [10] is quite noisy as well, especially for distant structures. However, we evaluated our results equivalently to [10] to obtain an estimate of the (rough) correctness of the normals. Table I lists the scores and indicates that we do well. To ensure a fair comparison we also applied bilateral filtering with varying kernel sizes and only report the best performance, obtained for w = 7.

Table I: Ratio of pixels within different angular errors, evaluated over the whole image for the NYU2 test set.

  angular error ≤      10°     20°     30°     40°     50°
  Local coding [10]    0.227   0.427   0.563   0.660   0.732
  TV (ours)            0.229   0.436   0.577   0.678   0.753
  Bilateral filter     0.224   0.428   0.570   0.672   0.748

For the fusion of normal fields we follow the idea of combining several, potentially complementary predictions to obtain a merged result with increased accuracy. In this regard, Fouhey et al. [9] have presented an interesting approach. Their method aims at extracting both visually-discriminative and geometrically-informative shape primitives from RGB-D training data. During inference the learned detector fires only at sparse positions and reasons about the underlying scene geometry by means of the learned shape primitives. The obtained primitives are typically located at 3D corner- or edge-like structures and are a strong cue for the local normal direction. We fuse these sparse predictions with estimates obtained from a boosted classifier, which was trained in a similar way to [10], but without super-pixel features, to obtain estimates which do not follow boundaries. Thereby we can demonstrate the applicability of our approach for the fusion of normal maps; though the obtained results are likely to be worse than those in Fig. 7, simply because the initial estimates are less detailed. Still, the exemplary fusion results in Fig. 8 demonstrate the increased accuracy we obtain over the input estimates. In this regard, we look forward to seeing new, complementary estimation methods being published, which can then be merged by our approach to obtain even more accurate models.

Figure 7: Qualitative denoising results on normal estimates [10] for images from the NYU2 [22] dataset (cf.
Table I). Typical infeasible normal predictions are marked for images in the bottom row and are correctly handled and resolved by our approach.

Figure 8: Exemplary results for the fusion of different normal estimates. "Prediction" denotes pixel-wise predictions from a boosted classifier (similar to [10]) and "3DP" are the detected primitives from [9].

VII. CONCLUSION

In this work we have presented a convex optimization scheme for variational, vectorial regularization and fusion of surface normal maps where the underlying scene depth is unknown. As normals have unit length, optimization over them is non-linear and non-convex. We have introduced a convex relaxation of the original objective function and have proven that the resulting scheme is guaranteed to converge. The performed experiments demonstrate that our algorithm is able to generate more consistent normal maps. It is expected that results will improve further with the introduction of complementary surface normal estimation methods. Future work will concentrate on the adaptive fusion of normal estimates and the use of the obtained normals for object detection and scene understanding, or as a prior in 3D modeling.

REFERENCES

[1] P. Blomgren and T. F. Chan, "Color TV: Total variation methods for restoration of vector-valued images," IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 304-309, 1998.
[2] T. Tasdizen, R. Whitaker, P. Burchard, and S. Osher, "Geometric surface smoothing via anisotropic diffusion of normals," in IEEE Visualization, 2002, pp. 125-132.
[3] B. K. Horn and M. J. Brooks, "The variational approach to shape from shading," Computer Vision, Graphics, and Image Processing, vol. 33, no. 2, pp. 174-208, 1986.
[4] R. Zhang, P.-S. Tsai, J. E. Cryer, and M. Shah, "Shape-from-shading: a survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 8, pp. 690-706, 1999.
[5] K. Ikeuchi, "Determining surface orientations of specular surfaces by using the photometric stereo method," IEEE Trans. Pattern Anal. Mach. Intell., no. 6, pp. 661-669, 1981.
[6] D. B. Goldman, B. Curless, A. Hertzmann, and S. M. Seitz, "Shape and spatially-varying BRDFs from photometric stereo," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 6, pp. 1060-1071, 2010.
[7] D. Hoiem, A. A. Efros, and M. Hebert, "Recovering surface layout from an image," IJCV, vol. 75, no. 1, pp. 151-172, Feb. 2007.
[8] A. Saxena, M. Sun, and A. Y. Ng, "Make3D: Learning 3D scene structure from a single still image," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 5, pp. 824-840, 2009.
[9] D. F. Fouhey, A. Gupta, and M. Hebert, "Data-driven 3D primitives for single image understanding," in ICCV, 2013, pp. 3392-3399.
[10] L. Ladicky, B. Zeisl, and M. Pollefeys, "Discriminatively trained dense surface normal estimation," in ECCV, Zurich, 2014, pp. 468-484.
[11] D. Fouhey, A. Gupta, and M. Hebert, "Unfolding an indoor origami world," in ECCV, 2014, pp. 687-702.
[12] N. Silberman, R. Urtasun, A. Geiger et al., "Reconstruction meets recognition challenge 2014," 2014, [accessed 8-July-2014]. [Online]. Available: http://cs.nyu.edu/~silberman/rmrc2014/indoor.php
[13] K. Lange, D. R. Hunter, and I. Yang, "Optimization transfer using surrogate objective functions," J. Comput. Graphical Stat., vol. 9, pp. 1-20, 2000.
[14] P. J. Huber, "Robust regression: asymptotics, conjectures and Monte Carlo," The Annals of Statistics, pp. 799-821, 1973.
[15] J.-F. Aujol, G. Gilboa, T. Chan, and S. Osher, "Structure-texture image decomposition: modeling, algorithms, and parameter selection," IJCV, vol. 67, no. 1, pp. 111-136, 2006.
[16] C. Zach, T. Pock, and H. Bischof, "A globally optimal algorithm for robust TV-L1 range image integration," in ICCV, 2007, pp. 1-8.
[17] P. Heise, S. Klose, B. Jensen, and A. Knoll, "PM-Huber: PatchMatch with Huber regularization for stereo matching," in ICCV, 2013, pp. 2360-2367.
[18] P. L. Combettes and V. R. Wajs, "Signal recovery by proximal forward-backward splitting," Multiscale Modeling and Simulation, vol. 4, no. 4, pp. 1168-1200, 2005.
[19] A. Chambolle and T. Pock, "A first-order primal-dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120-145, Dec. 2010.
[20] R. T. Rockafellar, Convex Analysis. Princeton University Press, 1997.
[21] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Proc. ICCV, 1998, pp. 839-846.
[22] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," in ECCV, 2012, pp. 1-14.