2014 Second International Conference on 3D Vision
Variational Regularization and Fusion of Surface Normal Maps
Bernhard Zeisl
Computer Vision and Geometry Group
ETH Zurich, Switzerland
Christopher Zach
Toshiba Research Europe
Cambridge, UK
Abstract—In this work we propose an optimization scheme for variational, vectorial denoising and fusion of surface normal maps. Such maps are common outputs of shape-from-shading, photometric stereo or single-image reconstruction methods, but tend to be noisy and require post-processing for further usage. Processing of normal maps, which do not provide knowledge about the underlying scene depth, is complicated by their unit-length constraint, which renders the optimization non-linear and non-convex. The presented approach builds upon a linearization of the constraint to obtain a convex relaxation, while guaranteeing convergence. Experimental results demonstrate that our algorithm generates more consistent representations from estimated and potentially complementary normal maps.
I. INTRODUCTION
Noisy vectorial images representing surface normals appear in several computer vision applications, and subsequent
processing of normal fields relies on cleaner estimates with
much of the noise removed. While denoising of standard
single-channel intensity or multi-channel color images and
also depth maps received a lot of interest in the literature
(e.g. [1]), removing noise from surface normals is usually
addressed via mesh processing on surfaces (e.g. [2]) but
typically not by image denoising applied on normal maps.
However, in many applications a surface normal is estimated per pixel, and a natural representation is based on normal maps, i.e. a mapping from a regular image grid to surface normals, without knowledge of the actual 3D scene geometry.
Noisy normal maps are the intermediate result of shape-from-shading [3], [4] and photometric stereo techniques
(e.g. [5], [6]). Further, surface orientations are the natural
output of single-image relative depth estimation [7], [8]
and direct regression of normals from a single image [9],
[10], [11]; a research direction which has gained quite some
interest recently [12], also because the estimated, coarse information about the underlying scene geometry has been shown to help other computer vision tasks such as object detection, semantic reasoning or scene understanding. In addition to
the denoising task, regularized fusion of normal maps is
important if several complementary estimates of surface
normals are given, which should be merged to return one
consistent normal map. In contrast to multi-channel image
denoising [1], where the unknown vector per pixel is either
unconstrained or has simple box constraints (max. and min.
978-1-4799-7000-1/14 $31.00 © 2014 IEEE, DOI 10.1109/3DV.2014.92

Marc Pollefeys
Computer Vision and Geometry Group
ETH Zurich, Switzerland

Figure 1: Exemplary result for our total variation based fusion of estimated surface normals from pixel-wise dense predictions and detected sparse primitives [9], for a given input image (cf. Fig. 5 for color coding).
value), processing of surface normals is more difficult due to
the unit length constraint on normals. This non-linear and
non-convex constraint means that a global minimum of a
denoising objective over surface normals is hard to find.
In this work we propose a vectorial optimization and demonstrate that, by linearizing the constraint, we can guarantee a decrease in the objective in each iteration, and consequently that our optimization method is convergent. The observation that man-made environments exhibit predominantly piecewise planar geometries is incorporated as a total variation based regularization, which favors piecewise constant normals (or piecewise smooth normals, as discussed in Sec. II).
The remainder of our manuscript is structured as follows.
In Sec. II we state our problem formulation and show
how to cope with the norm constraint on normals. Then
Sec. III details our iterative optimization approach, while
particular considerations for 3D surface normals and the
implementation itself are discussed in Sec. IV and V, respectively. Finally, experiments and their results are presented in
Sec. VI.
Figure 2: Denoising of a 2D normal map. (a) Ground truth orientation field. (b) Same orientation field perturbed by noise, i.e. n′ = d/‖d‖₂ with d = n + (r_x, r_y)ᵀ, where the r_i are random variables drawn from N(0, 0.4). (c) Denoised result with TV regularization. (d) Denoised result with Huber-TV regularization. Note the smoother transition of direction vectors compared to (c), e.g. in the highlighted region. For both denoised results the weight λ of the data term was set to 0.5.
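The noise model of Fig. 2(b) and the Huber cost of Eq. (3) can be sketched as follows (an illustrative pure-Python sketch; the function names are ours and not part of the paper's implementation):

```python
import math
import random

def perturb(n, sigma=0.4, rng=random):
    # Fig. 2(b): d = n + (r_x, r_y)^T with r_i ~ N(0, sigma), then n' = d / ||d||_2.
    d = [n[0] + rng.gauss(0.0, sigma), n[1] + rng.gauss(0.0, sigma)]
    norm = math.hypot(d[0], d[1])
    return [d[0] / norm, d[1] / norm]

def huber(g, eps):
    # Huber cost psi of Eq. (3): quadratic near zero, linear (TV-like) otherwise.
    norm = math.sqrt(sum(v * v for v in g))
    if norm < eps:
        return norm * norm / (2.0 * eps)
    return norm - eps / 2.0
```

For ε → 0 the quadratic region vanishes and `huber` reduces to the plain (total variation) norm, matching the text.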
II. DENOISING OF ORIENTATION FIELDS
We will first present our algorithm for the general n-dimensional case, i.e. the optimization of surface normals of hyperplanes, parameterized as points lying on the respective hypersphere. In Sec. IV we will then concentrate on the important case of 3D surface normals and point out related geometric properties.
A normal vector could be compactly represented in spherical coordinates; however, this choice of parameterization leads to non-linearities in any optimization procedure, due to the periodicity of the angular measure. We take a different approach and choose an (over-)parameterization in vector space. Such vector fields occur e.g. as 2D displacement fields in optical flow or as multi-channel color values. However, surface normals are constrained to be of unit length, which is a difficult non-convex constraint. In the following we will linearize this constraint and demonstrate that the resulting convex formulation is a majorizer of the original non-convex objective. Thus, the majorize-minimize principle [13] can be applied to determine an (approximate) minimizer. Via this relaxed representation we implicitly obtain a minimal parameterization of our optimization problem.
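A two-line check makes the non-convexity of the unit-length constraint concrete: the midpoint of two unit normals is no longer a unit vector, so the feasible set {n : ‖n‖₂ = 1} is not convex (illustrative sketch, not from the paper):

```python
import math

a = [1.0, 0.0]   # two feasible unit normals
b = [0.0, 1.0]
# Their convex combination (the midpoint) leaves the unit sphere:
mid = [(a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0]
norm_mid = math.hypot(mid[0], mid[1])   # sqrt(0.5) < 1
```

This is exactly why a relaxation or linearization of the constraint is needed before convex optimization machinery applies.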
Let us start by stating the original objective function to minimize in n-dimensional space. Given an image domain Ω we search for surface normals n which minimize the objective function

E(n) = ∫_Ω [ λ Σ_l φ_l(x, n(x)) + Σ_{d∈{h,v}} w_d(x) ψ(∇_d n(x)) ] dx   s.t. ‖n(x)‖₂ = 1,   (1)

where {h, v} denote the horizontal and vertical direction in the image domain. We use a (weighted) anisotropic regularizer, Σ_d w_d ψ(∇_d n), where ψ : R⁺₀ → R is a monotonically increasing convex function. The weights w_d ≥ 0 are induced by texture gradients (which serve as strong cues for geometric edges) of the underlying image. By this we aim for a discontinuity-preserving, image-driven regularization. The weighting has the effect of facilitating regularization within homogeneous image regions, but suppressing regularization along texture gradients. The data term φ_l : Ω × Rᵈ → R is discussed below.

In Eq. 1 we explicitly stated the dependence of functions and unknowns on the image location x, but for clarity we will drop the respective argument in the following. We will also use n etc. to denote both a single normal and an entire normal map; the meaning should be clear from the context. We abbreviate the regularization part of Eq. 1 by

R(n) := ∫_Ω Σ_{d∈{h,v}} w_d ψ(∇_d n) dx.   (2)

Our choice of regularization is motivated by the assumption that we expect piecewise smooth normal fields. A convex regularizer favoring such piecewise smooth solutions is given by a "smoothed" total variation, where the non-differentiable norm is replaced by the Huber cost [14],

ψ(∇_d n) = { ‖∇_d n‖₂² / (2ε)   if ‖∇_d n‖₂ < ε ;  ‖∇_d n‖₂ − ε/2   otherwise.   (3)

Clearly, this regularizer approaches the total variation for ε → 0. Thus, for generality, in the remainder of this work we will just consider the Huber norm; piecewise constant regularization behavior is achieved by setting ε = 0 (cf. Fig. 2 for an example of 2D normal field denoising).

The data terms φ_l(x, n) measure the data fidelity between measured observations and the optimized normals n. The exact form of φ_l depends on the assumed noise model for the observations, but in this work we assume convex φ_l. We allow several data terms per pixel (indexed by l) in order to allow several estimates of normals to be merged. Suitable choices in our application are the squared and unsquared L2 distance between the observation f_l and the unknown n,

(i) φ_l(x, n) = (w_l(x)/2) ‖n − f_l(x)‖₂²   or   (ii) φ_l(x, n) = w_l(x) ‖n − f_l(x)‖₂.   (4)

The unsquared distance is known to be more robust to outlier measurements due to the heavier-than-Gaussian tails in the underlying noise model. It has been used e.g. in image [15]
and shape denoising [16]. For normalized vectors it holds that

(w_l / 2) ‖n − f_l‖₂² = w_l (1 − nᵀ f_l).   (5)

Therefore the squared L2 distance equals the (weighted) cosine distance. If we regard the data fidelity in terms of the angular error θ = arccos(nᵀ f_l), Fig. 3 illustrates that our considered choices of error measurements are robust cost functions w.r.t. the angular deviation. In the remainder of this work we will focus on φ_l(n) = w_l ‖n − f_l‖₂ due to the increased robustness. With our choice of convex φ_l the only source of non-convexity in the overall formulation comes from the pixel-wise unit-length constraint on the normals, which is addressed below.

Figure 3: The two considered choices for the data fidelity term φ_l, (i) φ(θ) = ½ ‖n − f‖₂² = 1 − cos(θ) and (ii) φ(θ) = ‖n − f‖₂ = √(2 − 2 cos(θ)), have the desired property of being robust in the angular error θ (solid lines), but are convex functions in n. In the linearized domain (dashed lines, φ(θ) = ½ ‖h(s) − f‖₂² and φ(θ) = ‖h(s) − f‖₂) the cost functions are well approximated in the locality of the working point (for the plots f = n₀ is assumed).

A. Linearization of norm constraint

As pointed out previously, the objective function of Eq. (1) is a non-convex optimization problem, solely due to the non-convex constraint on the normal vectors. One can obtain an obvious convex relaxation by replacing the non-convex set {n : ‖n‖₂ = 1} by its convex hull, {n : ‖n‖₂ ≤ 1}, but such an approach would suffer from a strong tendency to shrink the unknown normals due to the regularization term.

As an alternative we propose to use the tangent plane approximation of the non-linear manifold {n : ‖n‖₂ = 1}, and thus the non-linear constraint is replaced by a linear one. The tangent plane at a linearization point n₀ is given by

‖n‖₂² ≈ n₀ᵀ n₀ + 2 n₀ᵀ (n − n₀) = 1.   (6)

Since the unit sphere is a smooth manifold, the resulting tangent plane equation n₀ᵀ n = 1 gives a very good approximation of the original constraint near n₀. Fig. 4 illustrates the linearization for the 2D case. Effectively we have transformed the feasible set from a hyper-sphere to the hyper-(tangent)-plane.

Figure 4: Illustration of the locally linearized norm constraint in 2D. The resulting optimization takes place in the tangent space T with minimal parameterization in s.

It allows us to perform the optimization in a local parameterization within the tangent space T := {n : n₀ᵀ n = 1}. The orthogonal complement to the working point n₀ builds a basis B ∈ R^{n×(n−1)} for T. It corresponds to the null space of n₀ᵀ, i.e. B = null(n₀ᵀ). Consequently, the minimal parameterization for surface normals under a linearized constraint is given by s ∈ R^{n−1}, where s lives in an (n−1)-dimensional subspace.¹ The needed transformation from local parameterization to global normals and vice versa is given by the linear maps

n = h(s) = B s + n₀   and   s = h⁻¹(n) = Bᵀ (n − n₀).   (7)

For the 2- and 3-dimensional case a basis B is directly computed via orthogonal vector and cross product expansion, respectively. In the n-dimensional case the Gram-Schmidt process is an efficient algorithm, compared to a full singular value decomposition, to retrieve the null space.

What remains is our transformed linear, convex but non-smooth objective function

min_s ∫_Ω λ Σ_l φ_l(h(s)) dx + R(h(s)),   (8)

which is now optimized within the coordinates s in the tangent space T of the current valid linearization.

B. Majorize-minimize method framework

According to standards in non-linear optimization with non-linear constraints, we also solve a sequence of surrogate problems in order to minimize the original objective. In contrast to very general approaches such as sequential quadratic programming, we have the guarantee of a non-increasing objective when solving the convex surrogate problem. This is the consequence of

Ĕ(n) := ∫_Ω λ Σ_l ‖n − f_l‖₂ dx + R(n)   s.t. nᵀ n₀ = 1   (9)

being a majorizer [13] of our objective

E(n) := ∫_Ω λ Σ_l ‖n − f_l‖₂ dx + R(n)   s.t. ‖n‖₂ = 1.   (10)

¹ The authors in [17] propose to obtain a minimal parameterization for 3D normals by considering only 2 of the components. However, we claim that such a representation leads to a non-uniform, data-dependent regularization, because the resulting gradient and thus the regularization strength correlates with the normal direction, which is an undesired property.
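For the 3D case, the tangent-space basis B and the maps h, h⁻¹ of Eq. (7) can be sketched as follows (a minimal pure-Python illustration with our own helper names, not the paper's code; B is built by cross-product expansion as described in the text):

```python
import math

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def normalize(v):
    s = math.sqrt(sum(x * x for x in v))
    return [x / s for x in v]

def tangent_basis(n0):
    # Orthonormal basis (two vectors) of the null space of n0^T for unit n0.
    aux = [1.0, 0.0, 0.0] if abs(n0[0]) < 0.9 else [0.0, 1.0, 0.0]
    b1 = normalize(cross(n0, aux))
    b2 = cross(n0, b1)          # unit length already, since n0 and b1 are orthonormal
    return b1, b2

def h(s, B, n0):                # n = B s + n0
    b1, b2 = B
    return [n0[i] + s[0] * b1[i] + s[1] * b2[i] for i in range(3)]

def h_inv(n, B, n0):            # s = B^T (n - n0)
    b1, b2 = B
    d = [n[i] - n0[i] for i in range(3)]
    return [sum(b1[i] * d[i] for i in range(3)),
            sum(b2[i] * d[i] for i in range(3))]
```

Any n = h(s) then satisfies the linearized constraint n₀ᵀ n = 1 by construction, and h⁻¹(h(s)) = s since the basis is orthonormal.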
This can be seen by noting that (i) the tangent plane {n : nᵀ n₀ = 1} lies outside the unit sphere (which means that the projection into the set {n : ‖n‖₂ ≤ 1} is always on its boundary {n : ‖n‖₂ = 1}), and (ii) that in general the projection into a convex set is non-expansive. More formally, define the sets

B := {n : ‖n‖₂ ≤ 1}   and   ∂B := {n : ‖n‖₂ = 1},

and the projection into B,

Π_B(n̂) = argmin_{n∈B} ‖n − n̂‖₂ = { n̂   if ‖n̂‖₂ ≤ 1 ;  n̂ / ‖n̂‖₂   otherwise.   (11)

It is well known that projections into convex sets are (firmly) non-expansive (see e.g. [18]), hence Π_B is non-expansive due to the convexity of the set B. Thus, we have

‖Π_B(n̂) − Π_B(n̂′)‖ ≤ ‖n̂ − n̂′‖   and   f(‖Π_B(n̂) − Π_B(n̂′)‖) ≤ f(‖n̂ − n̂′‖)   (12)

for any monotonically increasing function f. Therefore, we have for the discretized gradient

ψ(∇_d n̂) = ψ(‖n̂(x) − n̂(x′)‖) ≥ ψ(‖Π_B(n̂(x)) − Π_B(n̂(x′))‖)

for suitable neighboring pixels x and x′ and our choice of a monotonically increasing ψ(·) in Eq. 3. For the L2 data fidelity term we obtain analogously

‖n̂(x) − f_l(x)‖₂ ≥ ‖Π_B(n̂(x)) − Π_B(f_l(x))‖₂ = ‖Π_B(n̂(x)) − f_l(x)‖₂,

since the noisy observation f_l(x) has unit length.

Let n̂* be a minimizer of our relaxed objective function in Eq. 9 and n* = Π_B(n̂*) its pixel-wise projection. By applying the above facts term-wise, we see that inserting n* into the original objective of Eq. 10 yields a lower or equal value than the objective in Eq. 9 at n̂*, and the objective values coincide at n₀. Overall, by repeatedly minimizing Eq. 9 and subsequently linearizing the norm constraint we obtain a non-increasing sequence of objective values (which converges, since the energy is always non-negative, i.e. bounded from below).

III. OPTIMIZATION

For the optimization of our objective function in Eq. (8) we build upon the primal-dual algorithm proposed in [19]. In order to apply it, the primal problem needs to be transformed into a convex-concave saddle-point problem, which can be shown (using convex conjugacy [20]) to have the form

min_s max_{p,q} ∫_Ω Σ_{d∈{h,v}} [ ⟨p_d, ∇_d h(s)⟩ − (w_d ε / 2) ‖p_d‖₂² ] + Σ_l ⟨q_l, h(s) − f_l⟩ dx   (13)

= min_s max_{p,q} ∫_Ω Σ_{d∈{h,v}} [ −⟨∇_dᵀ p_d, B s + n₀⟩ − (w_d ε / 2) ‖p_d‖₂² ] + Σ_l ⟨q_l, B s + n₀ − f_l⟩ dx,   (14)

subject to the constraints ‖p_d‖₂ ≤ w_d and ‖q_l‖₂ ≤ λ w_l (applied pixel-wise). The optimization follows an iterative procedure exhibiting gradient descent steps for the primal variables and gradient ascent steps for the dual variables, followed by appropriate projection steps. As such, in each iteration the dual variables are updated via

p_d^{k+1} = P_P( p_d^k + σ ∇_d h(s̄) )   and   q_l^{k+1} = P_Q( q_l^k + σ (h(s̄) − f_l) ),   (15)

where σ denotes the dual step size. The projection operators for the dual variables are defined as

P_P(p̂_d) = (p̂_d / (1 + σε)) / max{1, ‖p̂_d / (1 + σε)‖₂ / w_d}   and   P_Q(q̂_l) = q̂_l / max{1, ‖q̂_l‖₂ / (λ w_l)}.   (16)

The update of the primal variables follows the gradients of the regularization and data terms to

s^{k+1} = s^k + τ Bᵀ ( Σ_{d∈{h,v}} ∇_dᵀ p_d − Σ_l q_l ).   (17)

τ is the primal step size. The parameterization in tangent space has the effect that both sets of dual variables are inversely mapped to the local coordinate system of s. The algorithm maintains an overrelaxation of the primal variables in addition to s^k, which is updated via s̄^{k+1} = 2 s^{k+1} − s^k, and n̄^{k+1} = h(s̄^{k+1}) respectively. This extrapolation step from the previous iteration enables faster convergence of the procedure.

For a given linearization point, and thus the current local parameterization, the algorithm is run for a few iterations until convergence. Then we apply a projection

n₀ ← n^{k+1} = h(s^{k+1}) / ‖h(s^{k+1})‖₂   (18)

onto the hypersphere such that the original norm constraint is fulfilled. As indicated, this result then serves as the new working point n₀, and a new linearization, local parameterization and orthogonal complement for the tangent space are computed. The overall optimization is run until global convergence.
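The pixel-wise projection operators of Eq. (16) and the dual ascent step of Eq. (15) can be sketched for a single pixel and one gradient direction as follows (an illustrative pure-Python sketch with our own names and step sizes, not the paper's multithreaded C++ implementation):

```python
import math

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

def project_P(p_hat, w_d, sigma, eps):
    # P_P of Eq. (16): shrink by (1 + sigma*eps), then clip to the ball of radius w_d.
    q = [x / (1.0 + sigma * eps) for x in p_hat]
    scale = max(1.0, norm2(q) / w_d)
    return [x / scale for x in q]

def project_Q(q_hat, lam, w_l):
    # P_Q of Eq. (16): clip to the ball of radius lambda * w_l.
    scale = max(1.0, norm2(q_hat) / (lam * w_l))
    return [x / scale for x in q_hat]

def dual_step_p(p, grad_n_bar, w_d, sigma, eps):
    # Eq. (15): gradient ascent on p_d followed by the projection.
    return project_P([p[i] + sigma * grad_n_bar[i] for i in range(len(p))],
                     w_d, sigma, eps)
```

For ε = 0 (plain TV) `project_P` reduces to simple clipping onto the ball of radius w_d, consistent with Eq. (3) degenerating to the total variation.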
Figure 5: Denoising of a 3D normal map. (a) Noisy surface normals with σ = 0.4. (b) Mask indicating infeasible normal directions in the camera coordinate system. (c) Result with our variational regularization. It produces better results, while (d) bilateral filtering (BF) yields bumpy surfaces (with optimal filter size w = 5 according to Fig. 6). (e) shows the coloring of normals we are using throughout this work; i.e. each normal denotes a point on the feasible half-sphere.

IV. APPLICATION TO 3D SURFACE NORMALS

In the 3-dimensional case we are able to consider visibility information and thereby constrain the feasible set of possible normal directions. Without loss of generality, let us assume that surface normals are defined in the local camera coordinate system and that they point towards the camera. For a given image location x = (x, y) the point ray originating from the camera center and passing through x is defined as r = K⁻¹(x, y, 1)ᵀ, where K denotes the camera intrinsics. This setup limits surface normals to lie on a half-sphere, since only surfaces which are oriented towards the camera are visible. As a result the constraint

nᵀ r ≤ 0   ∀ x ∈ Ω   (19)

must be fulfilled for each image location individually.

One can utilize this prior information as a preprocessing step for the orientation field. It is encoded via the data term weights, i.e. w_l = 1 if the observed surface normal is visible and 0 otherwise. As a result the optimization procedure will fill in the missing values. Fig. 5(b) illustrates an exemplary mask for infeasible normals.

Visibility consistent regularization: The previous preprocessing makes a hard decision about feasible normals. If the constraint is just slightly violated due to noise in the normals, but the direction is assumed to be largely correct, we can encode the visibility constraint of Eq. (19) directly in the optimization. It has the effect that the observation still exerts a driving force, but the optimized normals are guaranteed to be feasible, i.e. visible. In order to achieve this effect the iterative optimization procedure of Sec. III just needs to be modified slightly: The original primal problem in Eq. (8) and accordingly the primal-dual formulation of Eq. (13) are augmented by the visibility constraint. Within the optimization procedure it is simply handled by an additional projection step. In each iteration, after the primal update and projection step of Eq. (18), an additional projection

n ← P_N(n) = { n   if nᵀ r ≤ 0 ;  (r × n × r) / ‖r × n × r‖₂   else }   (20)

onto the feasible half-sphere, defined by the point ray r for normal n, is performed. The cross products create a 3D vector which is closest to n and orthogonal to r and thus on the boundary of the feasible set.

V. IMPLEMENTATION

So far our optimization has been formulated in the continuous domain. For the implementation, operation on a discrete image grid is required. In this regard, derivative operators have been discretized as forward differences with reflecting (Neumann) boundary conditions. As a result the divergence operator is based on backward differences with Dirichlet boundary conditions. For algorithm initialization each normal is assigned its respective data term, i.e. n = f₀. For infeasible 3D normals in the data terms, n is initialized analogously to Eq. (20), or in case of missing data it is defined to point towards the camera. In case large portions of missing data are present in the orientation field, we have created a coarse-to-fine framework for faster convergence.

The algorithm itself has been implemented as a multithreaded C++ application. Average run-times for the 3D surface normal denoising results in Sec. VI (640 × 480 images, 1000 iterations) are around 5 seconds on a current desktop computer with 8 cores. Due to the decomposition into independent per-pixel computations for both the local parameterization and the primal and dual update steps, the algorithm would lend itself to highly parallel implementations on a GPU. Image gradients are computed via 3 × 3 Sobel filters in horizontal and vertical image direction. In addition we pre-smooth the image with a Gaussian kernel of standard deviation 1.2 pixels to suppress image noise.
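The visibility projection of Eq. (20) can be sketched as follows (pure-Python illustration with our own helper names; r × n × r is read as (r × n) × r):

```python
import math

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_visible(n, r):
    # Eq. (20): keep n if it satisfies the visibility constraint n^T r <= 0;
    # otherwise replace it by the closest direction orthogonal to the ray r,
    # i.e. (r x n) x r, renormalized to unit length.
    if dot(n, r) <= 0.0:
        return n
    v = cross(cross(r, n), r)
    s = math.sqrt(dot(v, v))
    return [x / s for x in v]
```

An infeasible normal is thus mapped onto the boundary of the feasible half-sphere (nᵀ r = 0) rather than being discarded.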
VI. EXPERIMENTS AND RESULTS

A. Denoising Performance

In a first experiment we evaluate the denoising performance of our algorithm. Required ground truth measurements have been obtained from 4 different synthetic 3D models (one shown in Fig. 5). We rendered several depth maps from the models and computed point-wise normals via a least squares kernel within a RANSAC scheme to preserve
geometry edges. In this way 9 ground truth normal maps have been created for evaluation. Each of them has then been distorted by noise of varying strength, i.e. a noisy normal is computed as n′ = d/‖d‖₂, where d = n + (r_x, r_y, r_z)ᵀ and the r_i are Gaussian random variables with σ ∈ [0.1, ..., 0.9].

We compare our results to cross bilateral filtering [21] (which also exhibits edge-preserving properties) adapted to orientation fields. There each normal is replaced by a weighted average of normals from nearby pixels. The weight itself is based on two Gaussian distributions over the Euclidean distance of pixels and the intensity difference of the underlying image, respectively. The corresponding standard deviations were set to σ_S = 5/3 w, where w is half the window size of the filter, and σ_I = 0.1 for image intensities in the range [0, 1]. The weighted average of normals is computed as the extrinsic mean (i.e. the normalized Euclidean mean).

Quantitative performance is given in Fig. 6. We measure the deviation of our denoised results as the mean squared error (MSE) between normals on the unit sphere; as pointed out in Eq. (5) this coincides with the cosine distance. The reported peak signal-to-noise ratios are computed as PSNR = 10 log₁₀(d²_max / MSE), where d_max = 2 denotes the maximal error between normals. As can be seen in Fig. 6, the performance of bilateral filtering is quite sensitive to the chosen filter size (reported are the top results among several tested kernel sizes). In contrast, our approach achieves superior results with one set of parameters over the analyzed noise range.

Figure 6: Average denoising performance measured as PSNR (in dB) versus noise level σ. The curves represent averaged results for 9 normal maps, which were obtained from synthetic 3D models and perturbed by increasing artificial noise (curves: TV with λ = 1, ε = 0.05; BF with w = 3, 5, 7).

B. Regularization and Fusion of Normal Estimates

In a second experiment we explore the task of regularization and fusion of predicted 3D surface normal estimates from a single image. For denoising we utilize state-of-the-art prediction results [10] on the NYU2 [22] dataset. Fig. 7 lists several examples of our denoising results. Initial predictions describe the observed scene geometry well, but are noisy and may contain visually infeasible estimates. The goal of the regularization is to obtain cleaner, more simplistic results, which is accomplished by our algorithm. It produces a consistent labeling within homogeneous regions and follows boundaries well. The parameter settings for this experiment were λ = 0.1 and ε = 0.05. The decreased weight on the data term compared to the previous experiment is explained by the weaker quality of observations, requiring stronger regularization. Typical examples of partially infeasible predictions are highlighted in the examples. Our method resolves those areas to more consistent estimates. We do not report a PSNR value, because there are no accurate ground truth normals available for the NYU2 dataset. The training data used in [9] or [10] is quite noisy as well, especially for distant structures. However, we evaluated our results equivalently to [10] to obtain an estimate of the (rough) correctness of normals. Table I lists the scores and indicates that we do well. To ensure a fair comparison we also applied bilateral filtering with varying kernel sizes and only report the best performance, obtained for w = 7.

Table I: Ratio of pixels within different angular errors evaluated over the whole image for the NYU2 test set.

angular error ≤      10°     20°     30°     40°     50°
Local coding [10]   0.227   0.427   0.563   0.660   0.732
TV (ours)           0.229   0.436   0.577   0.678   0.753
Bilateral filter    0.224   0.428   0.570   0.672   0.748

For fusion of normal fields we follow the idea of combining several, potentially complementary predictions to obtain a merged result with increased accuracy. In this regard, Fouhey et al. [9] have presented an interesting work. Their method aims at extracting both visually-discriminative and geometrically-informative shape primitives from RGB-D training data. During inference the learned detector will fire only at sparse positions and reason about the underlying scene geometry by means of the learned shape primitives. Obtained primitives are typically located at 3D corner- or edge-like structures and are a strong cue for the local normal direction. We fuse these sparse predictions with estimates obtained from a boosted classifier, which was trained in a similar way to [10] but without super-pixel features, to obtain estimates which do not follow boundaries. Thereby we can demonstrate the applicability of our approach for fusion of normal maps; though, obtained results are likely to be worse than those in Fig. 7, simply because the initial estimates are less detailed. Still, the exemplary fusion results in Fig. 8 demonstrate the increased accuracy we obtain over the input estimates. In this regard, we look forward to seeing new, complementary estimation methods being published, which can then be merged by our approach to obtain even more accurate models.
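The evaluation measures used in this section (the PSNR over the unit sphere and the angular-error ratios of Table I) can be sketched as follows (pure-Python illustration; by Eq. (5) the Euclidean MSE between unit normals corresponds to the cosine distance, and d_max = 2):

```python
import math

def angular_error_deg(n, f):
    # Angle between two unit normals; the dot product is clamped for numerical safety.
    c = max(-1.0, min(1.0, sum(a * b for a, b in zip(n, f))))
    return math.degrees(math.acos(c))

def psnr(normals, ground_truth, d_max=2.0):
    # PSNR = 10 log10(d_max^2 / MSE), with the MSE of Euclidean distances on the sphere.
    mse = sum(sum((a - b) ** 2 for a, b in zip(n, g))
              for n, g in zip(normals, ground_truth)) / len(normals)
    return 10.0 * math.log10(d_max ** 2 / mse)

def ratio_within(normals, ground_truth, threshold_deg):
    # Table I metric: fraction of pixels whose angular error is below a threshold.
    ok = sum(1 for n, g in zip(normals, ground_truth)
             if angular_error_deg(n, g) <= threshold_deg)
    return ok / len(normals)
```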
Figure 7: Qualitative denoising results (image, prediction, denoised) on normal estimates [10] for images from the NYU2 [22] dataset (cf. Table I). Typical infeasible normal predictions are marked for images in the bottom row and correctly handled and resolved by our approach.

Figure 8: Exemplary results for fusion of different normal estimates (image, prediction, 3DP, fusion). "Prediction" denotes pixel-wise predictions from a boosted classifier (similar to [10]) and "3DP" are the detected primitives from [9].

VII. CONCLUSION

In this work we have presented a convex optimization scheme for variational, vectorial regularization and fusion of surface normal maps where the underlying scene depth is unknown. As normals have constant length, optimization over them is non-linear and non-convex. We have introduced a convex relaxation of the original objective function and proven guaranteed convergence. The performed experiments demonstrate that our algorithm is able to generate more consistent normal maps. It is expected that with the introduction of complementary surface normal estimation methods results will improve further.

Future work will concentrate on adaptive fusion of normal estimates and the use of obtained normals for object detection and scene understanding, or as a prior in 3D modeling.
REFERENCES

[1] P. Blomgren and T. F. Chan, "Color TV: Total variation methods for restoration of vector-valued images," IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 304–309, 1998.
[2] T. Tasdizen, R. Whitaker, P. Burchard, and S. Osher, "Geometric surface smoothing via anisotropic diffusion of normals," in IEEE Visualization, 2002, pp. 125–132.
[3] B. K. Horn and M. J. Brooks, "The variational approach to shape from shading," Computer Vision, Graphics, and Image Processing, vol. 33, no. 2, pp. 174–208, 1986.
[4] R. Zhang, P.-S. Tsai, J. E. Cryer, and M. Shah, "Shape-from-shading: a survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 8, pp. 690–706, 1999.
[5] K. Ikeuchi, "Determining surface orientations of specular surfaces by using the photometric stereo method," IEEE Trans. Pattern Anal. Mach. Intell., no. 6, pp. 661–669, 1981.
[6] D. B. Goldman, B. Curless, A. Hertzmann, and S. M. Seitz, "Shape and spatially-varying BRDFs from photometric stereo," TPAMI, vol. 32, no. 6, pp. 1060–1071, 2010.
[7] D. Hoiem, A. A. Efros, and M. Hebert, "Recovering Surface Layout from an Image," IJCV, vol. 75, no. 1, pp. 151–172, Feb. 2007.
[8] A. Saxena, M. Sun, and A. Y. Ng, "Make3D: Learning 3D Scene Structure from a Single Still Image," TPAMI, vol. 31, no. 5, pp. 824–840, 2009.
[9] D. F. Fouhey, A. Gupta, and M. Hebert, "Data-Driven 3D Primitives for Single Image Understanding," in ICCV, 2013, pp. 3392–3399.
[10] L. Ladicky, B. Zeisl, and M. Pollefeys, "Discriminatively Trained Dense Surface Normal Estimation," in ECCV, Zurich, 2014, pp. 468–484.
[11] D. Fouhey, A. Gupta, and M. Hebert, "Unfolding an Indoor Origami World," in ECCV, 2014, pp. 687–702.
[12] N. Silberman, R. Urtasun, A. Geiger et al., "Reconstruction meets recognition challenge 2014," 2014, [accessed 8-July-2014]. [Online]. Available: http://cs.nyu.edu/~silberman/rmrc2014/indoor.php
[13] K. Lange, D. R. Hunter, and I. Yang, "Optimization transfer using surrogate objective functions," J. Comput. Graphical Stat., vol. 9, pp. 1–20, 2000.
[14] P. J. Huber, "Robust regression: asymptotics, conjectures and Monte Carlo," The Annals of Statistics, pp. 799–821, 1973.
[15] J.-F. Aujol, G. Gilboa, T. Chan, and S. Osher, "Structure-texture image decomposition: modeling, algorithms, and parameter selection," IJCV, vol. 67, no. 1, pp. 111–136, 2006.
[16] C. Zach, T. Pock, and H. Bischof, "A Globally Optimal Algorithm for Robust TV-L1 Range Image Integration," in ICCV, 2007, pp. 1–8.
[17] P. Heise, S. Klose, B. Jensen, and A. Knoll, "PM-Huber: PatchMatch with Huber regularization for stereo matching," in ICCV, 2013, pp. 2360–2367.
[18] P. L. Combettes and V. R. Wajs, "Signal recovery by proximal forward-backward splitting," Multiscale Modeling and Simulation, vol. 4, no. 4, pp. 1168–1200, 2005.
[19] A. Chambolle and T. Pock, "A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, Dec. 2010.
[20] R. T. Rockafellar, Convex Analysis. Princeton University Press, 1997.
[21] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Proc. ICCV, 1998, pp. 839–846.
[22] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," in ECCV, 2012, pp. 1–14.