A VARIATIONAL FRAMEWORK FOR
EXEMPLAR-BASED IMAGE INPAINTING
By
Pablo Arias
Gabriele Facciolo
Vicent Caselles
and
Guillermo Sapiro
IMA Preprint Series # 2306
( April 2010 )
INSTITUTE FOR MATHEMATICS AND ITS APPLICATIONS
UNIVERSITY OF MINNESOTA
400 Lind Hall
207 Church Street S.E.
Minneapolis, Minnesota 55455–0436
Phone: 612-624-6066
Fax: 612-626-7370
URL: http://www.ima.umn.edu
Form Approved
OMB No. 0704-0188
Report Documentation Page
Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and
maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information,
including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington
VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it
does not display a currently valid OMB control number.
1. REPORT DATE
3. DATES COVERED
2. REPORT TYPE
APR 2010
00-00-2010 to 00-00-2010
4. TITLE AND SUBTITLE
5a. CONTRACT NUMBER
A Variational Framework for Exemplar-Based Image Inpainting
5b. GRANT NUMBER
5c. PROGRAM ELEMENT NUMBER
6. AUTHOR(S)
5d. PROJECT NUMBER
5e. TASK NUMBER
5f. WORK UNIT NUMBER
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
University of Minnesota,Institute for Mathematics and Its
Application,207 Church Street SE,Minneapolis,MN,55455-0436
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
8. PERFORMING ORGANIZATION
REPORT NUMBER
10. SPONSOR/MONITOR’S ACRONYM(S)
11. SPONSOR/MONITOR’S REPORT
NUMBER(S)
12. DISTRIBUTION/AVAILABILITY STATEMENT
Approved for public release; distribution unlimited
13. SUPPLEMENTARY NOTES
14. ABSTRACT
Non-local methods for image denoising and inpainting have gained considerable attention in recent years.
This is in part due to their superior performance in textured images, a known weakness of purely local
methods. Local methods on the other hand have demonstrated to be very appropriate for the recovering of
geometric structures such as image edges. The synthesis of both types of methods is a trend in current
research. Variational analysis in particular is an appropriate tool for a unified treatment of local and
non-local methods. In this work we propose a general variational framework non-local image inpainting,
from which important and representative previous inpainting schemes can be derived, in addition to
leading to novel ones. We explicitly study some of these, relating them to previous work and showing
results on synthetic and real images.
15. SUBJECT TERMS
16. SECURITY CLASSIFICATION OF:
a. REPORT
b. ABSTRACT
c. THIS PAGE
unclassified
unclassified
unclassified
17. LIMITATION OF
ABSTRACT
18. NUMBER
OF PAGES
Same as
Report (SAR)
26
19a. NAME OF
RESPONSIBLE PERSON
Standard Form 298 (Rev. 8-98)
Prescribed by ANSI Std Z39-18
A Variational Framework for Exemplar-Based Image
Inpainting
Pablo Arias · Gabriele Facciolo · Vicent Caselles · Guillermo Sapiro
Abstract Non-local methods for image denoising and
inpainting have gained considerable attention in recent
years. This is in part due to their superior performance
in textured images, a known weakness of purely local methods. Local methods on the other hand have
demonstrated to be very appropriate for the recovering
of geometric structures such as image edges. The synthesis of both types of methods is a trend in current
research. Variational analysis in particular is an appropriate tool for a unified treatment of local and non-local
methods. In this work we propose a general variational
framework non-local image inpainting, from which important and representative previous inpainting schemes
can be derived, in addition to leading to novel ones. We
explicitly study some of these, relating them to previous
work and showing results on synthetic and real images.
Keywords Inpainting · Variational methods · Selfsimilarity · Non-local methods · Exemplar-based
methods
1 Introduction
Image inpainting, also known as image completion or
disocclusion, is an active research area in the image
processing field. The purpose of inpainting is to obtain a visually plausible image interpolation in a region
in which data are missing due to damage or occlusion.
P. Arias, G. Facciolo, V. Caselles
Dept. of Information and Communication Technologies, Universitat Pompeu Fabra
Barcelona, Spain
E-mail: {pablo.arias,gabriele.facciolo,vicent.caselles}@upf.edu
G. Sapiro
Dept. of Electrical and Computer Engineering, University of Minnesota
Minneapolis, USA E-mail:
[email protected]
Usually, to solve this problem, the only available data
is the image outside the region to be inpainted. In addition to its theoretical interest, image inpainting has
applications to image and video editing and restoration.
Most inpainting methods found in the literature can
be classified into two groups: geometry- and textureoriented methods. We now briefly review the developments in both types of approaches, with emphasis in
texture-oriented methods. This review will helpful for
motivating the proposed formulation.
Geometry-oriented methods. Images are modeled as functions with some degree of smoothness, expressed for instance in terms of the curvature of the level lines or the
total variation of the image. The interpolation is performed by continuing and imposing this model inside
the inpainting domain, usually by means of a partial
differential equation (PDE). Such PDE can be derived
from variational principles, as for instance in [6,19,20,
28,44,45], or inspired by physical processes [8,11,54].
These methods show good performance in propagating
smooth level lines or gradients, but fail in the presence
of texture. They are often referred to as structure or
cartoon inpainting.
Geometry-oriented methods are local in the sense
that the associated PDEs only involve interactions between neighboring pixels on the image grid. An implication of this is that among all the data available in the
image, these methods only use that around the boundary of the inpainting domain.
Texture-oriented methods. Texture-oriented inpainting
was born as an application of texture synthesis, e.g., [26,
35]. Its recent development was triggered in part by the
works of Efros and Leung [26] and Wei and Levoy [55]
using non-parametric sampling techniques (parametric
models have also been considered, e.g. [40]). In these
2
works texture is modeled as a two dimensional probabilistic graphical model , in which the value of each pixel
is conditioned by its neighborhood. These approaches
rely directly on a sample of the desired texture to perform the synthesis.
In practice these methods work progressively by expanding a region of synthesized texture. The value of
a target pixel x is copied from the center of a (square)
patch in the sample image, chosen among those that
best match the available portion of the patch centered
at x. Levina and Bickel [41] provided a probabilistic
theoretical justification for this strategy.
This method (as explained above or with some modifications) has been extensively used for inpainting [9,
10,22,25,26,48]. As opposed to geometry-oriented inpainting, these so-called exemplar-based approaches, are
non-local : To determine the value at x, the whole image
may be scanned in the search for a matching patch.
Since these texture approaches are greedy procedures (each hole pixel is visited only once), the results
are very sensitive to the order in which pixels are processed [22]. This issue was addressed in [39,56] where
the inpainting problem is stated as a probabilistic inference problem in a graphical model. In both cases, the
image is modeled using a pair-wise Markov Random
Field (MRF). In [39] an energy considering the overlap error of adjacent patches is minimized using belief
propagation. The method in [56] can be seen as an approximated Expectation-Maximization (EM) method.
A similar scheme was further explored in [38], however
with a different variational justification, closely related
to the models addressed in this work.
The algorithms of [38,56] require an initial completion which, as it turns out, has a great impact on
the final result. In both cases the authors resort to a
multiscale approach: The image is filtered and subsampled, possibly several times. Starting from the coarsest,
the inpainting algorithm is then applied sequentially
to each scale, using the upsampled result of the previous scale as initialization. This so-called multiscale
approach yields very good results, and has been also
applied successfully to other exemplar-based methods
[30].
Another variational justification for texture-based
methods was presented in [24], where the inpainting
problem is reformulated as that of finding a correspondence map Γ : O → Oc , O being the inpainting domain
and Oc its complement w.r.t. the image domain. Denoting the image by u, the inpainted value at position
x ∈ O is then given by u(x) = u(Γ (x)), being Γ (·)
the correspondence map. The authors proposed a continuous energy functional in which the unknown is the
correspondence map itself:
Z Z
(u(Γ (x − y)) − u(Γ (x) − y))2 dydx,
E(Γ ) =
O
Ωp
where Ωp is the patch domain (centered at (0, 0)). Thus
Γ should map a pixel x and its neighbors in such a way
that the resulting patch is close to the one centered at
Γ (x). This model has been the subject of further analysis by Aujol et al. [4], where extensions are proposed
and the existence of a solution in the set of piecewise
roto-translation maps Γ is proved. The authors also
discuss the use of different regularizers which include
a regularization of the curvature of the level lines (or
surfaces) of the image. These approaches are theoretical and no numerical optimization scheme is available
so far.
A different variational model was presented in [49].
Images are modeled as ensembles of patches on a given
patch manifold. For inpainting, the patch manifold can
be learned from the set of patches in the hole’s complement. The method is iterative, with each iteration
having two steps. First, the patches in the hole are projected onto the manifold. Since this is done for each
patch independently, the projected patches are not necessarily coherent with each other, i.e. overlapping patches
may have different values in the overlap region. Therefore, in the second step, an image is computed by averaging the patches in the ensemble.
Another front of activity is given by the techniques
based on the sparseland model [1,16], in which the image is restricted to have a sparse representation over an
overcomplete basis or dictionary [1,27,43]. The main
difference between dictionary-based and exemplar-based
methods lies in where the missing information is obtained from. Dictionary based methods look for the
missing data in the dictionary (as a linear combination of a few atoms), whereas exemplar-based methods
assume that the information needed lies elsewhere in
the image itself (or in a database of images [33]). These
methods perform well in problems with small or scattered inpainting domains, but fail with a large holes.
Exemplar-based methods provide impressive results
in recovering textures and repetitive structures. However, their ability to recreate the geometry without any
example is limited and not well understood. Therefore,
different strategies have been proposed which combine
geometry and texture inpainting [9,17,25,37]. These
methods usually decompose the image in some sort of
structure and texture components. Structure is reconstructed using some geometry-oriented scheme, and this
is used to guide the texture inpainting.
Contributions. Despite these combined methods, geometry and texture inpainting are still quite separate fields,
3
each one with its own analysis and implementation tools.
Variational models as the one introduced in this paper can provide common tools allowing a unified treatment of both trends. We therefore propose a variational
framework for non-local image inpainting as a contribution to the modeling and analysis of texture-oriented
methods.
Like the non-local means denoising algorithm [5,
14] we encode the image redundancy and self-similarity
(measured as patch similarity) as a non-local weight
function w : O×Oc → R. This function serves as a fuzzy
correspondence, and differs from the works [4,24], although a (eventually multivalued) correspondence map
can be approximated as a limit of our model. The proposed formulation is rather general and different inpainting schemes can be derived naturally from it, via
the selection of the appropriate patch similarity criterion. In this work we present four of them, patch NLmeans, -medians, -Poisson and -gradient medians, corresponding to similarity criterions based on L2 - and
L1 -norms between patches or their gradients. We provide an comprehensive empirical comparison on real
and synthetic problems, showing the benefits and limitations of each model.
The patch NL-means is related to the method of [38,
56] and can be interpreted in terms of the mean shift
[21] and the manifold models of [49]. The other schemes
are, to the best of our knowledge, novel. Both gradientbased methods, patch NL-Poisson and -gradient medians combine the exempar-based interpolation with
local PDE-based diffusion schemes. This results in a
smoother continuation of the information across the
boundary and inside the inpainting domain, and in a
better propagation of structures.
Finally, we also present an initial discussion along
the line of providing a formal modelling of the multiscale scheme. We argue that this approach is not just
some heuristic to find a good minimum of the single
scale inpainting functional, but constitutes an inpainting criterion itself.
Although the focus of this work lies on inpainting,
the framework we are introducing can be adapted for
its application in other contexts. In particular it is related to graph-based non-local regularizers. Inspired by
graph regularization techniques [57], these approaches
model the image as a graph characterized by the similarity weights [32,42]. Recently, our formalism [3] was
extended by Peyré et al. in [51] to provide a variational
justification to the case in which the graph is adaptive.
On the other hand the patch NL-means scheme provides a model for an iterative version of the non-local
means algorithm with adaptive similarity weights. Similar approaches have been applied to denoising [5,52],
super-resolution [53], texture denoising [12], and demosaicing [15] among others. We will discuss the relation
with these models in the text.
The rest of this document is organized as follows. In
Section 2 we introduce the proposed variational framework, together with the derivation and the empirical
comparison of the different inpainting schemes. The
links with related work are discussed in Section 3. In
Section 4 we present and discuss the multiscale approach. And in Section 5 we present experimental results on real images allowing to compare our results
with the state of the art. Concluding remarks and future work is discussed in Section 6.
We note that a preliminary version of this work has
been presented in [3].
Notation. Images are denoted as functions u : Ω → R,
where Ω denotes the image domain, usually a rectangle
in R2 . Pixel positions are denoted by x, x̂, z, ẑ or y,
the latter for positions inside the patch. A patch of u
centered at x is denoted by pu (x) = pu (x, ·) : Ωp → R,
where Ωp is a rectangle centered at (0, 0). The patch is
defined by pu (x, y) = u(x + y), with y ∈ Ωp . O ⊂ Ω
is the hole or inpainting domain, and Oc = Ω \ O. We
still denote by u the part of the image u inside the hole,
while û is the part of u in Oc : û = u|Oc . Additional
notation will be introduced in the text.
2 Variational framework
Our variational framework is inspired by the following
non-local functional
Z Z
w(x, x̂)(u(x) − û(x̂))2 dx̂dx
(1)
Fw (u) =
O
Oc
where w : O × Oc → R+ is a weight function that
measures the similarity between patches centered in the
inpainting domain and in its complement.
Let us assume for the moment that the weights are
known. The minimum of (1) should have a low pixel
error (u(x) − û(x̂))2 whenever the similarity w(x, x̂) is
high. In this way the similarity weights drive the information transfer from known to unknown pixels. A
similar functional was proposed in [31] as a non-local
regularization energy in the context of image denoising. It models the non-local means filter [5,14] when
the weights are Gaussian
1
2
w(x, x̂) ∝ exp − kpu (x) − pû (x̂)k ,
h
where k·k is a weighted L2 -norm in the space of patches
and h is a parameter that determines the selectivity of
the weigths w.
4
In [31] the weights are considered known and remain
fixed through all the iterations. While this might be appropriate in applications where they can be estimated
from the noisy image, in the image inpainting scenario
here addressed, the weights are not available and have
to be inferred together with the image (as in [50,53]).
One of the novelties of the proposed framework is the
inclusion of adaptive weights in a variational setting.
For this reason, we will consider the weight function
w as an additional unknown. Instead of prescribing explicitly the Gaussian functional dependence of w w.r.t.
u, we will do it implicitly, as a component of the optimization process. In doing so, we obtain a simpler functional, avoiding to deal with the complex, non-linear dependence between w and u. In our formulation, w(x, ·)
is a probability density function,
Z
w(x, x̂)dx̂ = 1,
Oc
and can be seen as a relaxation of the one-to-one correspondence map of [4,24], providing a fuzzy correspondence between each x ∈ O and the complement of the
inpainting domain.
In this setting, we propose an energy which contains
two terms, one of them is inspired by (1) and measures
the coherence between the pixels in O and those in Oc ,
for a given similarity weight function w. This permits
the estimation of the image u from the weights w. The
second term allows us to compute the weights given the
image. The complete proposed functional is
Z
1
Hw (x)dx,
(2)
E(u, w) = Few (u) −
h
e
O
where
Few (u) =
Z Z
e
O
ec
O
w(x, x̂)ε(pu (x) − pû (x̂))dx̂dx,
(3)
ε(·) is an error function for image patches, and
Z
w(x, x̂) log w(x, x̂)dx̂,
Hw (x) = −
ec
O
is the entropy of the probability w(x, ·).
e the extended inpainting domain, as the
We take O,
set of centers of patches that intersect the hole, i.e.
e = O+Ωp = {x ∈ Ω : (x+Ωp )∩O 6= ∅}. Thus, patches
O
e c are entirely outside O (Figure
pû (x̂) centered in x̂ ∈ O
1). Accordingly, we consider that
the weight function
R
c
e
e
w is defined over O × O and Oec w(x, x̂)dx̂ = 1. For a
e + Ωp ⊆ Ω,
simplified presentation, we assume that O
e supports a patch centered on it
i.e. every pixel in O
e c to
and contained in Ω. Analogously, we also shrink O
c
e
have O + Ωp ⊆ Ω.
Let us now make some additional comments on the
functional. We observe that the term (u(x) − û(x̂))2 in
(1), that penalizes differences between pixels, is substituted in (3) by the patch error function ε(pu (x)−pû (x̂)).
This has two consequences. First, minimizing (3) with
respect to the image will force patches pu (x) to be similar to pu (x̂) for each pair x, x̂ such that w(x, x̂) is large.
The other implication has to be understood together
with the inclusion of the second term, which integrates
e For a
the entropy of each probability w(x, ·) over O.
e
given completion u, and for each x ∈ O, the optimum
weights minimize the mean patch error for pu (x), given
by
Z
w(x, x̂)ε(pu (x) − pû (x̂))dx̂,
ec
O
while maximizing the entropy. For instance taking ε as
the squared L2 -norm of the patch, then the resulting
weights are Gaussian. This can be related to the principle of maximum entropy [36], widely used for inference
of probability distributions. The parameter h controls
the trade-off between both terms and is also the selectivity parameter of the Gaussian weights. The trivial
minima of E with w(x, x̂) = 0 everywhere is discarded
by restricting w(x, ·) to be a probability.
The patch error function. Patches are functions defined
on Ωp , and the error function ε : RΩp → R+ is defined
as the weighted sum of pixel-wise errors e : R → R+
Z
ga (y)e(u(x + y) − û(x̂ + y))dy,
ε(pu (x) − pû (x̂)) :=
Ωp
(4)
where the intra-patch weight function ga is a Gaussian
centered at the origin with standard deviation a. We
will consider the L1 - and the squared L2 -norms as particular cases of ε(·), with e(·) = | · | and e(·) = | · |2
respectively.
We will also consider patch error functions involving the gradient of the image. In an abuse of notation
we will denote the gradient’s patch and pixel-wise error
Ω
functions as ε : R2 p → R+ and e : R2 → R+ , respectively. It will be clear from the argument which case
Ris intended, as in this example ε(p∇u (x) − p∇u (x̂)) :=
Ωpga (y)e(∇u(x + y) − ∇û(x̂ + y))dy.
Additionally, we can impose an additive penalization of the distance ϕ(x − x̂) as in [38] by modifying
the patch distance function εd (pu (x) − pu (x̂), x, x̂) =
ε(pu (x) − pu (x̂)) + ϕ(x − x̂) to penalize the use of distant patches. We will not consider this modification in
the current work.
As it will be discussed below, the patch error function determines not only the similarity criterion but
5
also the image synthesis, and thus is a key element in
the proposed framework.
1
ec
δ(x̂ − n(x)), where n(x) ⊆ O
as limh→0 w(x, x̂) = |n(x)|
is the set of nearest neighbors of x, defined as
n(x) = arg min ε(pu (x) − pû (x̂)).
e
x̂∈O
2.1 Minimization of E
We have formulated the inpainting problem as the constrained optimization
(u∗ , w∗ ) = arg min E(u, w)
u,w
Z
w(x, x̂)dx̂ = 1
subject to
ec
O
(5)
e
∀x ∈ O,
where E is the inpainting energy defined in (2). For energies involving image gradients we will consider Dirichlet conditions at the boundary between the inpainting
domain and the image, and Neumann conditions at the
boundary of the image, i.e. u(x) = û(x) in ∂O \ ∂Ω and
∇u(x) · n(x) = 0 for x ∈ ∂O ∩ ∂Ω.
To minimize the energy E, we use an alternate minimization algorithm. At each iteration, two optimization steps are solved: The constrained minimization of
E with respect to w while keeping u fixed; and the
minimization of E with respect to u with w fixed. This
procedure yields the following iterative scheme:
Algorithm 1 Alternate minimization of E(u, w).
Require: Initial Condition u0 (x) with x ∈ O.
1: repeat
2:
Update Weights: wk =R arg minw E(uk , w),
s.t. Oec w(x, x̂)dx̂ = 1.
3:
Update Image: uk+1 = arg minu E(u, wk ).
4: until Stop Criterion: kuk+1 − uk k < tolerance.
In the weights update step, the minimization of E
w.r.t. w yields:
1
1
wk (x, x̂) =
exp − ε(puk (x) − pû (x̂)) ,
(6)
q(x)
h
R
where q(x) = Oec exp − h1 ε(puk (x) − pû (x̂)) is a normalization factor that makes w(x, ·) a probability. For
the functionals defined with gradient patches, the weight
update equation is analogous to (6) replacing the patch
error function with ε(p∇uk (x) − p∇û (x̂)).
The parameter h determines the selectivity of the
similarity. If h is large, maximizing the entropy becomes more relevant, yielding weights which are less
selective. In the limit, when h → ∞, w(x, ·) becomes
ec . On the other hand, a
a uniform distribution over O
small h yields weights which concentrate on the patches
similar to pu (x). When h → 0, we compute the weights
That is w(x, ·) can be considered as an approximation to a multivalued correspondence. For simplicity,
in practice we assume that |n(x)| = 1, i.e. the nearest
neighbor is unique.
The image update step deserves more attention and
is described next.
2.1.1 Image update step.
In this section we present the derivation of the image update step corresponding to the four patch error functions mentioned earlier. First we will present
the cases when image patches are compared using the
squared L2 -norm and the L1 -norm. We refer to the resulting algorithms as patch-wise non-local means (patch
NL-means), and medians (patch NL-medians). Then
we consider functionals involving the gradients of the
patches both with the squared L2 -norm and the L1 norm. These methods will be referred as patch-wise
non-local Poisson (patch NL-Poisson), and gradient medians (patch NL-GM).
Before moving to the derivation of the these schemes,
let us remark that with the change of variables z =
x + y, ẑ = x + y ′ , the image energy term can be expressed as an accumulation of pixel-wise errors:
Z Z
w(x, x̂)
Few (u) =
e O
ec
O
Z
ga (y)e(u(x + y) − û(x̂ + y))dydx̂dx
=
Z Z
O
Ωp
m(z, ẑ)e(u(z) − û(ẑ))dẑdz + C,
(7)
Oc
where C is a constant term. We have introduced the
pixel-wise influence weights m : O × Oc → R+ defined
as
Z
χOec (ẑ − y)ga (y)w(z − y, ẑ − y)dy. (8)
m(z, ẑ) :=
Ωp
ec and 0 on O.
e
The function χOec takes the value 1 on O
An analogous expression can be computed for gradient
patch error functions. This rewriting simplifies the following derivations and provides some insights on the
implications of using patch-wise errors.
For each pair of pixels (z, ẑ) ∈ O × Oc , m(z, ẑ)
weights the effective contribution of the pixel-wise error
between u(z) and û(ẑ) in the total value of the energy.
The quantity m(z, ẑ) is computed by integrating the
similarity w(z−y, ẑ−y) between all patches that overlap
6
Fig. 1 Patch-wise non-local inpainting. The value at z ∈ O
is computed using contributions from all the patches that overlap
e such that z = x + y
it, these are patches centered at x ∈ O
with y ∈ Ωp . The influence function m(z, ẑ) accumulates all the
contributions w(z − y, ẑ − y) from patches centered at ẑ − y to
z − y.
ẑ and those that overlap z in the same relative position
(shown in Figure 1). It tells us how much evidence there
is supporting a correspondence between
R the locations z
and ẑ. Also observe that for each z, Oc m(z, ẑ)dẑ = 1,
then m(z, ·) can also be interpreted as a probability
density function. Note that the energy (7) corresponds
to (1), with the patch similarity weights w being substituted with the pixel-wise influence weights m, a sort
of spacial convolution of w with kernel ga .
Patch non-local means. If we use a weighted squared
2
L
ε(pu (x) − pû (x̂)) :=
R -norm as a patch error function
2
g
(y)|u(x
+
y)
−
û(x̂
+
y)|
dy
in
(3) then the image
Ωp a
energy term (7) is quadratic on u, and its minimum for
fixed weights w can be computed explicitly as a nonlocal average:
Z
1
m(z, ẑ)û(ẑ)dẑ,
(9)
u(z) =
c(z) Oc
Rfor z ∈ O, where the normalization constant c(z) :=
Oc m(z, ẑ)dẑ. Although c(z) = 1 with the current definition of w and ga , we will keep a generic notation in
the following derivations.
The formal similarity with the non-local means equation hides some important differences, which are a direct consequence of the use of a patch error function in
the image energy term. To obtain more insight about
this let us expand m to obtain:
Z
Z
1
w(z − y, x̂)û(x̂ + y)dx̂dy.
ga (y)
u(z) =
c(z) Ωp
ec
O
There are two averaging processes involved in the synthesis. The outer integral goes through all patches pu (z−
y) overlapping the target pixel z. Each patch suggests
a value for z resulting from the inner sum: A non-local
average of the pixel at position y in all patches pû (x̂)
ec . This sum is weighted by the similarity between
in O
the patch pu (z − y) and each pû (x̂).
Therefore, we can distinguish two types of pixel
interactions. Interactions due to the patch overlap of
nearby pixels in the image lattice and non-local interactions driven by the similarity weights. The latter can
be controlled by the selectivity parameter h, but the
extent of the overlap interactions is given by the patch
size. In particular when h → 0 Eq. (9) yields
Z
ga (y)û(n(z − y) + y)dy
u(z) =
Ωp
(recall we are assuming a unique nearest neighbor). For
e the nearest neighbor of pu (x) is centered
each x ∈ O,
at x and all these patches are averaged according to ga .
This blending may cause some blur, which leads us to
consider the L1 -norm in the search of a more robust
image synthesis.
In Appendix A we prove the existence of minima for
the Patch NL-Means energy in the continuous setting.
Patch non-local medians. The L1 -norm patch error function in the image energy term corresponds to taking
e(x) = |x| in (7). The Euler equation for u, given the
influence function m, can be formally written as
Z
sign[u(z) − û(ẑ)]m(z, ẑ)dẑ ∋ 0.
[δu Few (u)](z) =
Oc
This expression is multivalued, since sign(r) = r/|r| if
|r| > 0 and sign(r) ∈ [−1, 1] if r = 0. Its solution for
each u(z), z ∈ O is obtained as a weighted median of
the pixels of the complement Oc , with weights m(z, ·).
Both schemes presented so far perform inpainting
by transferring (by averages or medians) known gray
levels into the inpainting domain. As we will see next,
using a patch error function based on the gradient of
the image yields methods which transfer gradients and
compute the resulting image as the solution of a PDE.
This results in better continuation properties of the solution, in particular at the boundary of the inpainting
domain.
Patch non-local Poisson. The squared L2 -norm of the
gradient in the image energy term (3) corresponds to
taking e(·) = k · k2 in (7), where k · k is the Euclidean
norm in R2 . The energy term becomes
Z Z
Few (u) =
m(z, ẑ)k∇u(z) − ∇û(ẑ)k2 dẑdz. (10)
O
Oc
Recall that we consider Dirichlet conditions at the boundary between the inpainting domain and the known data
region, and Neumann conditions at the boundary of the
image, i.e. u(x) = û(x) in ∂O \ ∂Ω and ∇u(x)·n(x) = 0
for x ∈ ∂O ∩ ∂Ω. The Euler equation w.r.t. u is given
7
by a non-local Poisson equation, i.e. a Poisson equation
with non-local coefficients:
∇ · [c(z)∇u(z)] = ∇ · [c(z)v(z)],
for all z ∈ O, where u|Oc = û, c(z) =
the field v : O → R2 is given by
Z
1
m(z, ẑ)∇û(ẑ)dẑ.
v(z) =
c(z) Oc
(11)
R
Oc
m(z, ẑ)dẑ and
(12)
The solution of (11) is computed with a conjugate gradient algorithm.
Observe that the solutions of (11) are minimizers of
Z
c(z)k∇u(z) − v(z)k2 dz.
O
Therefore, u is computed as the image with the closest
gradient (in the L2 sense) to the guiding vector field
v, which corresponds to a non-local weighted average
of the gradients in the complement. See [47] for further
uses of the Poisson equation in image editing.
Patch non-local gradient medians. To complete our study
of the image energy term we consider the L1 -norm of
the gradient, that is
Z Z
m(z, ẑ)k∇u(z) − ∇û(ẑ)kdẑdz,
Few (u) =
O
Oc
with the same boundary conditions as with the patchwise NL-Poisson. To minimize it we perform an implicit
gradient descent:
Z Z
min
m(z, ẑ)k∇ut+1 (z) − ∇û(ẑ)kdẑdz+
t+1
u
O
Oc
+
1
kut+1 − ut k2 , (13)
2δt
where at each step ut+1 is computed using the fixed
point algorithm described in the Appendix B, based on
the projection method presented in [18].
As patch NL-Poisson, this scheme transfers gradients and interpolates the gray levels using the boundary conditions. With the use of the L1 error function,
we expect the solution of patch NL-GM to retain more
small scale detail than that of patch NL-Poisson.
Notice that for both gradient-based methods the
patch similarity weights w are computed based only
on the gradients (and thus also the pixel-wise influence weights m). In most cases however, the gradient
is not a good feature for measuring the patch similarity, and it would be desirable to consider also the gray
level data. This can easily be achieved within our variational framework by modifying the patch error function to take into account both gradients and gray level
patches.
Functionals combining intensity and gradients. We can
use any of the gradient-based energies in conjunction
with the patch NL-means energy by considering a linear
combination of the corresponding patch error functions.
This yields
Z Z
Feλ (u) =
m(z, ẑ)[(1 − λ)k∇u(z) − ∇û(ẑ)kp +
O
Oc
+ λ|u(z) − û(ẑ)|2 ]dẑdz
(14)
where the parameter λ ∈ (0, 1) controls the mixture
and p is either 1 or 2. The resulting algorithms update
the weights w considering both the intensity and the
gradient, therefore improving their selectivity.
Regarding the image update, notice that (14) can
be rewritten as
Z Z
Feλ (u) ∝
m(z, ẑ)k∇u(z) − ∇û(ẑ)kp dẑdz
O Oc
Z
λ
c(z)|u(z) − f (z)|2 dz,
+
1−λ O
R
where f (z) = c(z)−1 Oc m(z, ẑ)û(ẑ)dẑ is the solution
of the patch NL-means image update step. Thus we see
that the combination with the squared L2 patch error
function translates into a patch NL-means attachment
term.
For the case of patch NL-Poisson the Euler equation
w.r.t. u becomes:
λ
c(z)u(z) =
(1 − λ)
λ
= ∇ · [c(z)v(z)] −
c(z)f (z), (15)
(1 − λ)
∇ · [c(z)∇u(z)] −
which is linear and can be solved with a conjugate gradient scheme.
The patch NL-GM combined functional has basically the same form as (13) and therefore can also be
solved with the fixed point scheme described in Appendix B.
2.2 Comparison of proposed schemes
Let us advance some results on synthetic problems to
highlight the main characteristics of the proposed methods. First we consider the inpainting of a regular texture
(shown in Figure 2) with two different mean intensities,
where the inpainting domain hides all patches on the
boundary between the dark and bright textures. With
this example we can test the ability of each method
to create an interface between both regions. Situations
like these are common in real inpainting problems, for
instance due to inhomogeneous lighting conditions. We
have also added Gaussian noise with standard deviation
8
Fig. 2 Inpainting of a synthetic texture. The initial condition is shown in the first column. The other four columns show a zoom
(region in the red rectangle) of the results of patch NL-means, -medians, -Poisson and -GM. Top row, h = 0, bottom row h = 200,
h = 14, h = 400 and h = 20, respectively. The intra-patch weight kernel ga is shown in the bottom right corner of the initial condition,
it has a standard deviation a = 5 and the patch size is s = 15.
σ = 10 to show the influence of the selectivity parameter h. Each column of Figure 2 shows the results of
the four methods described in the previous section. We
have tested each method with h = 0 (top row), and
h > 0, chosen approximately to match the expected
deviation of each patch error due to the noise.
The first notorious difference is on how the methods handled the transition between the dark and bright
textures. Patch NL-means produces a smooth transition whereas a sharp step is obtained with the patch
NL-medians. On the other hand, both gradient based
methods yield a much smoother shading of the texture.
This is due to the fact that the image update step is
computed as the solution of a PDE which diffuses the
intensity values present at the boundary of the inpainting domain. These PDEs are driven by a vector field estimated non-locally and therefore combine non-local exemplar based inpainting with local interpolation PDEs.
For the case of patch NL-Poisson this interpolation is
linear, since this is a solution of the homogeneous Poisson equation (i.e. Laplace equation).
250
patch NL−means
patch NL−medians
patch NL−Poisson
patch NL−GM
known image
200
150
100
40
50
60
70
80
250
90
100
110
patch NL−means
patch NL−medians
patch NL−Poisson
patch NL−GM
known image
200
150
100
40
50
60
70
80
90
100
110
Fig. 3 Profiles of the results in Figure 2. The profiles are
taken from an horizontal line going between the circles in Figure
2. Top: results with h = 0 and bottom: results with h > 0.
As expected, the results using a higher h show some
denoising, since for larger h the patch similarity weights
are less selective. This effect can be better appreciated
in the profiles shown in Figure 3, which depict the image
values for a horizontal line between the circles. In the
usual context of inpainting, in which the available data
is not perturbed by noise, this denoising translates into
an undesirable loss of texture quality. For that reason
we use h = 0.
Recall that in the limit when h = 0, the weights
w(x, x̂) converge to a Dirac’s delta function at the set
n(x) of nearest neighbors of pu (x). Even if we assume
that the nearest neighbor is unique, the value of a pixel
is still computed from a population obtained from those
nearest neighbors of all patches that overlap the pixel.
In the case of periodic patterns, once the minimization
has reached a stable state, all values in the population
will be basically the same: patches surrounding a pixel
agree on its value, and in this case all schemes behave
similarly. Differences arise when neighboring patches
cannot agree on their suggested values. Such is the case
of the step in Fig. 2. For non-periodic patterns and random textures this disagreements will be common, which
may affect the perceptual similarity of the synthesized
texture with the original.
Figure 4 studies this effect. The image consists of
Gaussian noise. The top row shows the resulting completions, and the bottom row shows an estimate of the
local variance (computed by smoothing the image of the
squared differences w.r.t. the mean). The red curves require a brief explanation. Following [7], we use the term
Nearest Neighbor Field (NNF) to refer to the vector
e where n(x) ∈ O
ec is the
field n(x) − x, defined over O,
position of the (assumed) unique nearest neighbor of
pu (x). In Figure 4 we show in red the boundaries of the
regions with constant NNF. In those regions patches
are translated rigidly from somewhere in the complement. We can observe that, for all inpainting schemes,
the resulting completion consists basically of a patchwork of large regions of rigid translation. The interior of
these regions, away from their boundaries (red curves),
9
Fig. 4 Inpainting of random texture. From left to right: patch- NL-means, NL-medians, NL-Poisson and NL-GM. Top row:
inpainting results. Bottom row: local variance estimate of the noise, superimposed with the boundaries between regions of constant
NNF (red curves). Notice the attenuation of the variance over the red curves, specially for L2 -based methods.
Fig. 5 Inpainting of structured texture. From left to right: Initial condition, result of path NL-means, -medians, -Poisson and
-GM. Results with s = 15, a = 5 and h = 0. The treatment of color images is described in Section 5.
reproduces an exact copy of the source, and thus the
variance is preserved. The variance decreases along the
red curves, where pixel values are synthesized using incoherent contributions. Both for the intensity models or
the gradient models, the L2 -norm causes a higher decay
of the variance than with the L1 -norm. This effect can
also be observed in figures 2 and 3, where it can be seen
that patch NL-medians and -GM preserve more of the
Gaussian noise than the methods based on L2 -norms.
We will refer to the regions of constant NNF as copy
fronts. Recall however that copy occurs only away from
their boundaries.
In Figure 5 we show results on a non-periodic, structured texture. We can see that patch NL-means and Poisson show some smoothing, whereas both L1 -based
schemes obtained sharper results. Both intensity based
methods show discontinuities at the boundary of the
inpainting domain. Notice also that for the patch NLmedians algorithm it is easy to identify the regions in
the complement that have been replicated. Where two
copy fronts meet a seam is produced. With the patch
NL-means the spatial averaging of overlapping patches
creates a smooth blending of the copy fronts.
2.2.1 Combined schemes
Gradient based methods produce smooth interpolations
and enforce the continuity of the image at the boundary
of the inpainting domain, which are generally desirable
features. For this to be done variationally the similar-
ity weights w have to be computed using patches of
the gradient, which in most cases does not provide a
reliable measure of patch similarity. In practice better
results are obtained by combining these methods with
patch NL-means, allowing to take the image values into
account for the computation of the patch error, and to
synthesize the image with a diffusion PDE. This adds
a parameter λ, which controls the mixture.
In Figure 6 we show some results corresponding to
the combination of both gradient schemes with patch
NL-means while varying the mixture coefficient λ. The
image shows a periodic pattern with an illuminance gradient. Most of the dark exemplars are incomplete, and
thus only bright exemplars from the bottom of the image are available. The rightmost detail shows the result
of patch NL-means: the image has been completed using bright patches and presents a discontinuity on the
upper side of the hole. On the other hand, a completion
using gradients only (see result of the patch NL-GM)
manages to interpolate both the texture and the shading. The small images on the right show results of both
gradient methods with different values for λ. The value
of the mixing parameter λ should be carefully selected
since it mixes two different magnitudes (norms of gradients and gray levels). With λ ∼ 0.1 for patch NLPoisson + NL-means and λ ∼ 0.01 for patch NL-GM
+ NL-means, some of the good continuation properties
are preserved and enough color information is added to
the patch metric.
10
Fig. 6 Linear combination of gradient based methods with NL-means. First image starting from the left: Initialization. The
gray rectangle is the inpainting domain. Only patches centered outside the red area are available. Second image: Result obtained with
patch NL-GM, using λ = 0. A similar result is obtained with the patch NL-Poisson with λ = 0 (not shown). Details: Top, from left
to right: results with patch NL-Poisson with λ = 0.01, 0.05, 0.1, 1 (the latter corresponds to patch NL-means). Bottom, patch NL-GM
with λ = 0.001, 0.005, 0.01. All details have been linearly stretched for display. Differences are most noticeable in the top and in the
left of the images.
GAP
2.2.2 Geometric interpolation
To evaluate the ability of each method to continue the
geometry of the structures at the boundary of the inpainting domain, we consider a very simple image with
a gap (shown in Table 1). The inpainting region is initialized with the background color. For this evaluation
we fix the size of the patch and increase the width of
the gap. For narrow gaps, the method will be able to
join both ends of the green vertical line. When increasing the gap, at a certain width the method is no longer
capable of recovering the vertical line. In these cases
the gray initialization prevails, showing a completion
in which the vertical line is interrupted. The first column of Table 1 shows the maximum height of the gap
for each method. Observe that the combined schemes
improve the prolongation of geometric structures, and
that the optimal mixing parameter λ for this purpose is
near to the values that we have proposed in the previous
section. Basically these schemes have two propagation
mechanisms: A local one, by diffusion of the intensity
values by the PDE, and a non-local one by transference
of gradients from Oc . When λ > 0 these mechanisms
reinforce each other: the diffused values allow a better
estimation of the weights, and therefore the transference of more appropriate gradients, which will help to
diffuse the intensity values further. On the other hand,
intensity based methods depend only on the iteration
of weights computation and image update to propagate
information.
The second column of Table 1 shows the results obtained by incorporating the confidence mask later described in Section 2.3. An alternative way to prolong
the geometric structures is to increase the patch size as
will be discussed in Section 4.
Method
P+Mλ=0
P + M λ = 0.1
P + M λ = 0.5
P+Mλ=1
Md
GM + M λ = 0
GM + M λ = 0.01
GM + M λ = 0.001
13
16
11
9
7
15
21
40
GAP
tc = 5
40
46
34
29
42
> 56
> 56
> 56
Table 1 Geometric interpolation. The inpainting domain is
shown in white and the patch in the lower right corner (9 × 9
pixels). The table reports the maximum gap width for which the
algorithm is capable of recovering the vertical line. P, M, Md and
GM stand for patch NL-Poisson, -Means, -Medians and -Gradient
Median respectively. The rightmost column shows the maximal
gaps obtained with the use of a confidence masks with decay
tc = 5 (see Section 2.3).
2.3 Extensions
Color images. An energy for color images can be obtained by defining a patch error function for color patches
as the sum of the error functions of the three scalar
components:
ε(pu (x) − pû (x̂)) =
3
X
ε(pui (x) − pûi (x̂)),
i=1
where u : Ω → R3 is the color image, and ui , with
i = 1, 2, 3, its components (analogously for gradientbased errors). Given the weights, each channel is updated using the corresponding scheme for scalar images.
All channels are updated using the same weights.
Confidence mask. For large inpainting domains, it is
useful to introduce a mask κ : Ω → (0, 1] which assigns
11
a confidence value to each pixel, depending on the certainty of its information (see also [22,39]). This will help
in guiding the flow of information from the boundary
towards the interior of the hole, eliminating some local
minima and reducing the effect of the initial condition.
The resulting image energy term takes the form
Z Z
Few (u) =
κ(x)w(x, x̂)ε(pu (x) − pû (x̂))dx̂dx,
e
O
ec
O
where κ modulates the penalization of the incoherences
between w and the error ε between patches.
The effect of κ on the image update step can be seen
on the pixel-wise influence weights m
Z
χOec (ẑ − y)ga (y)κ(z − y)w(z − y, ẑ − y)dy.
m(z, ẑ) =
Ωp
Thus, the contribution of the patch pu (z − y) to the
evidence function is now weighted by its confidence.
Patches with higher confidence will have a stronger influence. Notice that with the inclussion of the confidence mask, the normalization coefficient c(z) becomes:
Z
Z
ga (y)κ(z − y)dy.
c(z) =
m(z, ẑ)dẑ =
Oc
Ωp
This does not affect intensity-based methods, but has
implications on the image update step for gradientbased ones. For example, for patch NL-Poisson the errors with respect to the non-local guiding vector field
k∇u(z) − v(z)k are penalized according to c(z).
On the similarity weights, the confidence mask has
the effect of modifying the selectivity parameter h by
a locally varying h/κ(x). If the confidence is high, the
effective selectivity h/κ(x) will be lower, thus increasing
the selectivity of the similarity measure. When h → 0
the weights tend to a Dirac’s delta independently of
κ. The same reasoning applies to the gradient based
energies.
For the experiments shown in this paper, the confidence mask was set to
(
(1 − κ0 ) exp − d(x,∂O)
+ κ0 if x ∈ O,
tc
κ(x) =
1
if x ∈ Oc ,
which shows an exponential decay w.r.t. the distance
to the boundary inside the hole d(·, ∂O). Here tc > 0
is the decay time and κ0 > 0 determines the asymptotic value reached far away from the boundary. Setting
tc = 0 amounts to using a constant confidence mask.
Table 1 shows the effect of using a confidence mask with
tc = 5 and κ0 = 0.1, allowing the restoration of the vertical line for much wider gaps, and thus alleviating the
dependence with the gray initial condition.
Other shapes of the confidence mask could be used
for controlling different aspects of the dynamics of the
completion algorithm. For instance controlling the decay of the mask from certain points of the boundary
allow to privilege the continuation of structures from
them.
Interpolation of sparsely sampled images. In this work
we have implicitly assumed the existence of a set of
e c , constituting the set of exemplars.
complete patches O
This assumption can be removed by generalizing the
patch error function ε(·) so that it depends on the positions of the patches. This implies a potentially different
error function for each pair of patches. This has been
explored in [29], applied to the case of interpolation of
sparsely sampled images. In that case the patch error
function was designed to consider only those locations
known in at least one of the two patches.
3 Discussion and connections
In this section we present various issues related to the
variational framework and its connections with other
models.
3.1 Probabilistic-geometric model interpretation
The proposed energy has an interpretation in terms of
a probabilistic model in the space of patches, which
becomes apparent when rewritten using the generalized Kullback-Leibler divergence [23]. Given two positive and integrable functions p, q defined over a certain
measure space X , the generalized Kullback-Leibler divergence is given by:
KL(p, q) =
Z
p(s) log
X
p(s)
q(s)
ds−
Z
Z
q(s)ds,
p(s)ds +
−
X
X
assuming that the integrals exist. With this notation
(and taking into account that w(x, ·) is a probability)
the functional E can be written as
E(u, w) =
Z
e
O
KL (w(x, ·), r(x, ·)) dx−
Z Z
−
e
O
ec
O
r(x, x̂)dx̂dx,
where r is the unnormalized Gaussian weight function
1
r(x, x̂) = exp − ε(pu (x) − pû (x̂)) .
h
12
The first term integrates the divergence between the
e The second
functions w(x, ·) and r(x, ·), for each x ∈ O.
term can be interpreted by noticing that
Z
r(x, x̂)dx̂
(16)
q̃(x) =
ec
O
is a density estimate (in the patch space) of the set of
ec
patches in Oc : The higher the amount of patches in O
close to pu (x) (according to the scale parameter h), the
higher the value of qe.
The minimizers (u∗ , w∗ ) are obtained when for all
e×O
e c , w∗ (x, x̂) = r∗ (x, x̂)/q̃ ∗ (x), (Gaussian
(x, x̂) ∈ O
weights normalized by (16)), and the patches of the inpainted image are in regions of high density in the patch
space. This provides a geometric intuitive interpretation of our variational formulation. The image is considered as an ensemble of overlapping patches. Known
ec are fixed, forming a patch density model
patches in O
e The richness of the
used to estimate the patches in O.
framework is given in part by the fact that different
norms in the patch space induce inpainting schemes of
different nature.
3.2 Connections with statistical mechanics
The proposed energy can be given another interpretation in terms of statistical mechanics, as the Helmholtz
free energy of a system of particles. In this context
we consider that for each “particle” x ∈ Õ there is a
set of possible configurations indexed by the parameter
x̂ ∈ Õc with a probability density w(x, x̂). If we consider that each configuration has an energy U (x, x̂) =
ε(pu (x) − pu (x̂)), then the probability that the particle
at x is in the state x̂ is proportional to
e−βU(x,x̂) /q(x)
with the normalization factor q given by the partition
function
X
q(x) =
e−βU(x,x̂) .
x̂
Then the Helmholtz free energy for a particle x is
H(x) := U (x) + β −1 E(x) :=
X
X
:=
U (x, x̂)w(x, x̂) + β −1
w(x, x̂) log w(x, x̂).
x̂
x̂
Then the total Helmoltz free energy for the system of
particles is
X
XX
H :=
H(x) =
U (x, x̂)w(x, x̂)+
x
x
x̂
β −1
XX
x
x̂
w(x, x̂) log w(x, x̂),
where the sums are extended to x ∈ Õ and x̂ ∈ Õc . We
have used the notation β as h1 .
3.3 Revisiting related work
In this section we will briefly review the connections of
our work with other inpainting algorithms and also with
existing variational models of non-local regularization
which have been proposed in contexts such as image
denoising.
The method in [56] is closely related to the patch
NL-means scheme of Eq. (9). The key difference lies in
the underlying theoretical model. The problem is addressed as a MRF, where pixels outside the hole are
observable variables, missing pixels in the hole are the
parameters, and the hidden variables are given by the
correspondence Γ : O → Oc , which assigns a patch outside the hole to each x in O. The method can be seen
as an approximate EM algorithm for maximizing the
log-likelihood w.r.t. the pixels in O, and some approximations have to be taken to make the optimization
tractable. Based on heuristics, the authors also propose to use more robust estimators than the mean for
the synthesis of pixels. Within the framework here proposed, robust estimators (as the median) naturally result from particular choices of the patch error functions
ε(·).
In [38] an energy is presented which can be seen as
a limit case of the patch NL-means energy when h → 0.
The authors propose modifications of the energy which
improve the results, such as some spatial localization of
the similarity weights and brightness invariance. The
latter is achieved by introducing a multiplicative constant that matches the mean illuminance between each
pair of patches.
The patch NL-means algorithm is also related to the
manifold image models of [49]. Eq. (9) can be split into
two steps which are analog to the manifold and image
projection steps used in [49]. First, for each patch cene we compute a new patch as a weighted avertered in O
age of all patches in the complement,
R according to the
patch similarity weights pMS
ec w(z, ẑ)pû (ẑ)dẑ
u (z) := O
e Doing this for each hole position yields an
with z ∈ O.
incoherent ensemble of patches. The imageR is obtained
1
by averaging these patches: u(z) = A(Ω
pMS
u (z −
p ) Ωp
y, y)dy. We use a density model, instead of the manifold
model of [49]. Indeed, pMS
u (x) is the mean shift operator applied to pu (x). It is known that the iteration of
this operator corresponds to an adaptive gradient ascent of the Parzen estimate of a PDF [21], which in
this case is generated by the set of patches in the complement of the hole. The use of a density model entails
13
some advantages, mainly from the computational point
of view, learning a manifold model is computationally
costly. Furthermore, the assumption that patches lie on
a manifold is questionable (one could think for instance
in a stratification as a more realistic model), and its dimension is hard to determine for real images.
In the following we will comment on the relation of
this model with recent works on non-local regularization.
The UINTA algorithm, presented in [5] is a nonlocal denoising algorithm that minimizes the entropy
of the patches in the image. Casting this idea to the
context of inpainting the UINTA’s entropy is estimated
as the sample mean
Z
Z
1
2
EU (u) = − log exp(− kpu (x) − pû (x̂)k )dx̂ dx
h
e
ec
O
O
where the inner integral is the probability of occurrence
of the patch pu (x) obtained as a Parzen density estimate. The corresponding Euler-Lagrange equation can
be solved with a fixed point iteration which coincides
with the patch NL-means scheme (9). In [5] this energy is minimized by considering all patches as independent (disregarding the overlap between neighboring
patches), and evolving each of them according to a gradient descent of EU . After this, an image is formed with
the centers of these new patches. The repetition of this
process results in an iterative application of pixel-wise
NL-means.
In [13] the authors use a variational principle for
deriving the iterated pixel NL-means regularizer, and
show its application to the restoration of texture. The
underlying energy corresponds to the quadratic penalty
between the solution image u, and a pixel NL-means
type average of the noisy input image û. The weights for
this average are computed using u. Due to the dependence of the weights with the regularized image u, the
minimizer is no longer a weighted average as NL-means,
but the solution of a nonlinear optimization problem.
It is shown that if the derivative of the nonlinear component is neglected, the resulting Euler-Lagrange equation matches the proposed fixed point algorithm: The
iterated NL-means regularizer.
In [52] the authors presented a variational framework for image denoising consisting in non-local regularization and data adjustment terms. Inpainting could
be performed by considering only the data term as follows:
Z Z
exp(ε(pu (x) − pû (x̂)))dx̂dx
EP (u) = −
e
O
ec
O
This energy is the same as the one adapted from the
UINTA algorithm EU , without the logarithm. In [52]
the Euler-Lagrange equation is solved with a fixed point
iteration. This model has two differences with our framework. First it allows to use a more general nonlinearity
for the computation of the weights other than the exponential. Second, even in the case of the exponential, the
methods differ in the normalization, for instance, when
ε is the squared L2 -norm, the resulting scheme is as the
patch NL-means, with the unnormalized weights.
After its introduction in [3], our model has been
interpreted as a non-local self-similarity regularizer in
[51], where in conjunction with appropriate data fitting terms it has been applied to the solution of inverse problems, including inpainting, super-resolution
and compressive sensing. In [51] a different patch-error
function ε is used, namely the L2 -norm between patches
(without squaring it). This choice is motivated as a
patch-wise version of their work [50] on non-local Total Variation [32,42,57] with adaptive weights. This
patch-wise non-local TV is defined as the L1 -norm of
the non-local gradient of the patch valued image pu :
Ω → RΩp . The non-local gradient is defined as a function ∇w pu : Ω × Ω → RΩp given by ∇w pu (x, x̂) =
w(x, x̂)(pu (x) − pu (x̂)). Thus, the patch-wise non-local
TV reads
Z Z
w(x, x̂)kpu (x) − pu (x̂)k2 dx̂dx.
k∇w pu k :=
Ω
Ω
Note that in this sense, the model of the patch-NL medians corresponds an anisotropic version of the nonlocal TV where the 2-norm in the integral is replaced
by the 1-norm. Our work and the work of [51] are complementary. In [51] the regularization term is fixed, and
the authors focus on the possibilities given by different data terms suited for different applications. On the
other hand in this work we focus on the regularization
term exploring its properties with different patch error
functions ε, and applying them to a problem in which
the data term plays no role at all, since there is no data
to adjust to.
4 Multiscale scheme
Exemplar-based inpainting methods show a critical dependence with the size of the patch. In Figure 7, we
show completions obtained with patch NL-means using
different patch sizes: Two results with a small patch
(a = 4) and one result with a large patch (a = 19). The
latter is able to reproduce the periodic pattern of the
lamps, but the completion is blurry due to the spacial
overlap of the patches and presents many discontinuities at the boundary of the hole.
The results with the small patch do not show these
artifacts, but one of them has failed to reproduce the
14
Fig. 7 Mitama Matsuri. Left column: inpainting domain and initial condition. For the rest of the columns, from left to right single
scale inpainting with a 9 × 9 patch with a = 4, single scale inpainting with a 43 × 43 patch with a = 19 and multi-scale inpainting, with
S = 2, covering a patch size from 9 × 9 with a = 4 to 43 × 43 with a = 19. All results have been computed with the patch NL-means
scheme. The bottom row shows the regions in which the nearest neighbor offset is constant.
lamps. The only difference between both is the initialization. One of them was initialized with the original
image shown in the bottom left, whereas the other one
with the result obtained with the multiscale approach
described in this section.
Figure 8 shows a synthetic problem with similar
characteristics. The result with a large patch recovers
the shape of the incomplete rectangle, but does not continue its texture correctly. Three different small scale
results are shown. They are all minima of the energy
with minimal energy, E = 0. Patches are too small to
capture that these completions are different. Which one
is obtained will depend on the initialization.
As done by almost all state of the art exemplar
based inpainting methods (e.g. [38,39,56]), we will incorporate a multiscale scheme, described below. This is
usually motivated as an heuristic to avoid local minima, to find a good initialization and/or to alleviate
the computational cost. We believe however (following
[34]) that, as these examples suggest, inpainting is inherently a multiscale problem: Images have structures
of different sizes, ranging from large objects to fine scale
textures and edges. The multiscale scheme responds to
the fact that several patch sizes are needed to reproduce
all these structures properly.
4.1 Multiscale scheme
In the following we describe the multiscale method we
adopted in this work, which goes along the lines of what
is customary in the literature [30,38,56]. It consists on
sequentially applying the inpainting scheme on a Gaussian image pyramid, starting from the coarsest scale.
The result at each scale is upsampled and used as ini-
tialization for the next finer scale. The patch size is
constant through scales.
Let us consider S scales, the finest denoted with
s = 0. We will specify the size of the image at the
coarsest level AS−1 . Denoting the size of the image at
the finest scale by A0 , we compute the sampling rate
as r := (A0 /AS−1 )1/(S−1) ∈ (0, 1). The width of the
Gaussian filtering is associated to the subsampling factor as in [46]. Let a0 be the size of the patch and Ea0
the corresponding energy. We will add the superindex
s = 0, . . . , S − 1 to the variables u and w to denote
the scale. As before, the subindex 0 refers the initial
condition, i.e. us0 is the initial condition at scale s.
Algorithm 2 Multiscale scheme.
Require: uS0 , S, a0 and AS−1
1: Initialize: (uS−1 , wS−1 ) = arg min(u,w) Ea0 (u, w)
2: for each scale s = S − 2, . . . , 0 do
3:
Upsample us+1 to obtain us0
4:
(us , ws ) = arg min(u,w) Ea0 (u, w)
5: end for
The upsampling from s + 1 to s is obtained as in
[56]. The coarse weights ws+1 are first interpolated to
the finer image size, yielding w0s . These weights are
then used to solve an image update step at the new
scale: us0 = minu Ea0 (u, w0s ). More conventional upsampling schemes by local interpolation (such as bilinear or splines) introduce a bias towards low-frequency
non-textured regions. This exemplar-based upsampling
avoids this bias.
Notice that keeping the patch size constant while
filtering and reducing the image, is almost equivalent
to enlarging the patch domain and filtering an image
15
Fig. 8 Single scale inpainting. Synthetic inpainting problem (the red rectangle is the inpainting domain) and several local minima
of a single scale inpainting energy with a small patch.
of constant size. The process can thus be seen as the
sequential minimization of a series of inpainting energies with varying patch size given by as = 1/rs a0 ,
s = 0, . . . , S − 1, over a corresponding series of filtered
images.
The scheme starts by the coarsest scale S − 1, in
which a larger portion of the inpainting domain is covered by partially known patches, which makes the inpainting task easier, and less dependent on the initialization. The energy at this scale should have fewer local
minima. The dependency of the minimization process
on the initial condition ensures also that each single
scale solution corresponds to a local minimum close to
the coarse scale initialization. Therefore the multiscale
algorithm exploits this dependency to obtain an image
u0 which is approximately self-similar for all scales (or
equivalently, for all patch sizes).
Figure 7 shows a comparison between single and
multiscale results with the patch NL-means scheme.
As expected the multiscale result shows the benefits
of large and small patch sizes. The missing lamps have
been completed with the correct shape and spacing by
the coarser stages, and the fine details are also taken
care off by the finer scales. The bottom row shows the
regions of constant NNF. The single scale result with
a large patch shows an NNF partitioned in relatively
large regions, showing that with a large patch size the
copying is more rigid. The result with a small patch
size shows that the synthesis has been performed based
on small regions. The multiscale’s NNF shows an intermediate partition, with some large regions inside of
the hole and a smaller ones around its boundary. The
finer inpaintings work by refining the coarse partition
obtained at coarser scales.
5 Experimental results
In this section we further demonstrate the performance
of the proposed schemes on real inpainting problems.
The images used were obtained from Komodakis and
Tziritas [39] and from the 100 images benchmark proposed by Kawai et al. [38], available at http://yokoya.
naist.jp/research/inpainting/. Due to the lack of
space we focus on the methods presented in this work.
Please refer to [38,39] for a comparison with their approaches.
5.1 Experimental setting
We consider four inpainting methods, namely patch
NL-means, -medians, -Poisson and -GM. Both gradient
based methods are always combined with patch NLmeans with mixing parameter λ, as in (14). In all cases
we use the multiscale approach. As stated before, to
prevent blurring we set h, the selectivity of the similarity weights w, to h → 0. In this case, the weights select
e We show rethe nearest neighbors of each patch in O.
sults with the CIE La*b* color space.
The calculation of the weights dominates the computational load of the algorithms. With an exhaustive
search for the exact nearest neighbor, the cost of each iteration is O(A(O)× A(Oc )× s2 ). However, a significant
speed-up can be obtained with approximate searches,
almost without any noticeable decrement in the quality
of the results (see [13] and references therein).
In our implementation we use a modified version
of the PatchMatch algorithm introduced in [7]. PatchMatch is an iterative algorithm that estimates the NNF
jointly for all patches in the inpainting domain by exploiting the coherence of natural images. The modifi-
16
cation we implemented allows the estimation of a lists
of the L first nearest neighbors of each patch. The algorithm has a computational cost of O(A(O) × s2 × L)
per iteration. Typically between 5 and 10 iterations are
sufficient to obtain results comparable with the exhaustive search algorithm when using lists of size L ≃ 10.
In Appendix C we describe the modified PatchMatch.
The image update step is fast for the patch NLmeans, -median and -Poisson, but can be time consuming for patch NL-GM, in particular for small values of
the mixing parameter λ. See Appendix B for more details.
The results are shown in figures 9, 10 and 11. The
first column shows the initialization. The rest of the
columns show the results obtained with patch NL-means,
-medians, -Poisson and -GM in that order. Obtaining
good results requires fixing the following parameters:
Patch size. For almost all experiments we used patches
of size s between 3 × 3 and 9 × 9. We did not use
Gaussian intra-patch weights to shape the patch,
since they need a larger support, which implies less
available exemplars.
Multiscale parameters. The multiscale scheme has two
parameters: the size of the coarsest image AS−1
and the number of scales S. From these, AS−1 is
the most critical parameter. Smaller images with
smaller inpainting domains are easier to complete.
Care must be taken, however, since small images
also imply less available exemplars to copy from. In
these experiments AS−1 was set to a 20% of the
original size when possible. Some cases required less
subsampling. The number of scales S was set such
that the subsampling rate r = (A0 /AS−1 )1/(S−1) ≈
(1/2)1/3 ≈ 0.8 as in [56].
Confidence mask. The confidence mask has two parameters, the asymptotic value c0 and the decay time tc .
For all experiments we fix c0 = 0.1 and used a decay
time tc = 5 except for small inpainting domains, in
which we set tc = 1.
For the mixing coefficient λ of gradient-based methods we tested two configurations: Low-λ corresponding
to λ = 0.01 and λ = 0.001 for patch-NL Poisson and GM, and high-λ corresponding to λ = 0.1 and λ = 0.01
for patch-NL Poisson and -GM. Recall that lower values of λ give a higher weight to the gradient component of the energy. This is appropriate for structured
images with strong edges. In almost each row of figures
9, 10 and 11 we used the same λ configuration for both
gradient-based schemes.
The rest of the parameters are also the same for
each row in figures 10 and 11. Instead, for many images
in Fig. 9 we used different parameters for intensity- and
gradient-based methods. In these cases, image-based
methods required larger patches than gradient-based
ones.
5.2 Observations and comments
Gradient vs. intensity. Gradient-based methods show a
better reconstruction of structure (Fig. 9) and periodic
patterns (Fig. 11). Intensity-based methods present discontinuities at the boundary of the hole and between
copy fronts. Furthermore, gradient-based methods facilitate the prolongation of structures and edges, due to
the reinforcement of local PDE diffusion and non-local
propagation (see Section 2.2). This allows to use smaller
patches, which alleviates the computational load and
reflects in more available exemplars and less blending
due to patch overlap (either by averages or medians).
On the other hand, gradient-based methods show
problems with random textures (Fig. 10) and in some
occasions between copy fronts. In these cases, the nearest neighbors of overlapping patches disagree on the
gradient of a certain pixel. This may result in the attenuation or the omission of gradients, causing a “spilling
effect”, particularly for the patch NL-Poisson (e.g. rows
1,3 and 4 of Fig. 9).
Some failures of gradient-based methods on random
textures result from the use of gradients in the patch error function, as in rows 4 and 6 of Fig. 10. In the former,
for instance, segments of the sky have been reproduced
in the snow. Fig. 12 shows results with an inpainting
scheme in which the weights are computed based only
on the image values (with the squared L2 -norm), and
the image is updated using patch NL-Poisson or -GM
with a low value of λ. Such scheme is non-variational
and its convergence its not guaranteed.
We should point out that in many other cases, considering gradients in the comparison criterion improves
the results. See for instance rows 2 and 4 of Fig. 11.
Means vs. medians. It is notorious that L1 -based functionals perform better at the reproduction of fine texture. The results of the L2 methods are smoothed by
the spatial averaging of overlapping patches. On the
other hand, patch NL-medians creates sharp discontinuities as in Fig. 2 when different copy fronts meet (e.g.
rows 1, 3, 5 and 7 in Fig. 9). These discontinuities are
very noticeable and in these cases some smoothing is
desirable. For the patch NL-GM method, discontinuities between copy fronts occur at the gradient level,
and are eliminated after its integration. The fact that
the gradient is estimated robustly helps in avoiding the
spilling effect of patch NL-Poisson.
17
Fig. 9 Results on structured images. From left to right: Original image and mask, patch NL-means, -medians, -Poisson and -GM.
18
Fig. 10 Results on random textures. From left to right: Original image and mask, patch NL-means, -medians, -Poisson and -GM.
All images in each row have been generated with the same parameters.
We have observed that L1 methods are more reluctant to make changes during the minimization process.
The same robustness that allows a better performance
with textures and avoids the spilling effect makes them
more greedy. Once a set of neighboring patches have
settled on a locally stable solution (typically a region
of constant NNF), it is hard for the algorithm to change
that local decision. To understand why, imagine a copy
front advancing from the boundary carrying correct information which meets an already settled copy front,
which has taken an undesirable decision based on a
bad initialization. Pixels on the boundary of the mistaken front start receiving contributions from patches in
the advancing front. Initially these contributions will be
outliers in the distribution from which the pixel value is
estimated. The median will discard these outliers, and
19
Fig. 11 Results on periodic textures. From left to right: Original image and mask, patch NL-means, -medians, -Poisson and
-GM. All images in each row have been generated with the same parameters.
the pixel value will not change unless of course, patches
in the advancing front outnumber the mistaken ones.
Although the confidence mask diminishes this effect, L1 -based methods (specially patch NL-medians)
still show more dependence on the initialization. A result of this are some misalignments in straight lines,
due to subsampling artifacts of the multiscale scheme.
Also patch NL-medians generally requires the use of
larger patches, particularly for structured images. This
is not always possible, as in row 6 of Fig. 9, where we
could not find a proper set of parameters. However we
found a good result by after 3 iterations of the multiscale scheme.
The focus of this work lies more on introducing and
exploring the variational framework than in presenting
a single inpainting algorithm. Still, we would like to
comment on which would be the best method among
the ones presented. Based on the previous observations,
it is clear that the answer to this question will depend
on the characteristics of the inpainting problem. However, a priori, the patch NL-Poisson seems a reasonable
compromise between quality and computational cost. It
is also able to provide good results for a wider range of
parameters. For cases in which an accurate reproduction of random textures is important, patch NL-GM
20
production of textures. These schemes are however,
more greedy, or “stiff,” and therefore more dependent on the initialization. Conversely the squared
L2 -norm introduces some blur, and show less stiffness.
2. When gradients are considered in the patch error
function, the image update step is computed as the
solution of a (local) diffusion PDE, driven by a nonlocally estimated gradient field. This improves the
synthesis in terms of continuity at the boundary and
attenuation of seams. Also the local interpolation
helps in the propagation of structures, such as edges
at the boundary of the hole.
Fig. 12 Results with a non variational algorithm. The
gradient methods sometimes introduce blurry areas in the solution. Here we show the results using a non variational version
of the patch NL-Poisson (left column) and patch NL-GM (right
column) for some of those images.
may be a better option, however at a higher computational cost.
6 Conclusions and future work
In this work we presented a variational framework for
exemplar-based non-local image inpainting. The proposed energy lends itself to intuitive interpretations in
terms of probabilistic models and has connections with
mean shift and statistical mechanics. We minimize it
using a coordinate descent algorithm, alternating between similarity weights and the image updates. These
processes are coupled through the patch error function,
i.e. the criterion used to compare patches.
We derived from this framework four different inpainting schemes, corresponding to error functions based
on the combinations of L1 - and L2 -norms, with image
or gradient patches. After testing these schemes on synthetic and real problems we arrived to the following
conclusions.
1. The robustness of the L1 -norm (patch NL-medians
and NL-GM) makes it more appropriate for the re-
The proposed functional shows a critical dependence
with the patch size. Furthermore, it is nonconvex and
has many local minima, particularly for small patch
sizes. To tackle these issues we use a multiscale approach. This is customary in the literature, usually motivated as a way to obtain a good initialization and
for computational reasons. We believe however that inpainting is inherently a multiscale problem, and that
underlying the multiscale approach lies an inpainting
criterion. We are currently working on a variational formulation of multiscale inpainting. Other topic of current research is the use of other patch error functions
based on the comparison of structure tensors, which
could provide a more robust estimation of the morphological structure of the image.
A Existence of minima for the functional (2)
Let us consider the model
Z Z
1
w(x, x̂)kpu (x) − pu (x̂)k2 dx̂dx+
E(u, w) =
e O
ec
h O
Z Z
+
w(x, x̂) log w(x, x̂)dx̂dx.
e
O
(17)
ec
O
In this section we consider the squared L2 patch error function
kpu (x) − pu (x̂)k2 =
Z
ga (y)(u(x + y) − u(x̂ + y))2 dy
Ωp
where ga denotes a smooth integrable kernel whose gradient is
also integrable. Let
e×O
ec → R :
P := {w : O
Z
ec
O
e
w(x, x̂) dx̂ = 1 a.e. x ∈ O}.
Let us consider the admissible class of functions
AM := {(u, w) : u = û in O c , |u(x)| ≤ M
w ∈ P}.
Proposition 1 Assume that û ∈ BV (O c ). There exists a solution of the variational problem
min
(u,w)∈AM
E(u, w).
(18)
21
Proof Let (un , wn ) be a minimizing sequence of (18), i.e. a sequence such that E(un , wn ) → inf (u,w)∈AM E(u, w). Since Ω is
a bounded domain we have that
Z Z
χ{w>1} wn (x, x̂) log wn (x, x̂)dx̂dx
e
O
−
=
ec
O
is bounded. Hence wn (1 + log+ wn ) is bounded in L1 . Then the
sequence wn is relatively weakly compact in L1 and by extracting a subsequence, if necessary, we may assume that wn weakly
e×O
e c ) to some w ∈ P. Let us now prove that
converges in L1 (O
Z
ga (y)(u(x + y) − u(x̂ + y))2 dy
∇x+x̂
+2
∇x−x̂
Ωp
=2
Z
Z
∇y ga (y)u(x + y)dy.
Ωp
Ωp
and
Z
ga (y)(u(x + y) − u(x̂ + y))2 dy
Ωp
=2
ga (y)(u(x + y) − u(x̂ + y))
Ωp
(∇y u(x + y) − ∇y u(x̂ + y))dy
Z
∇y ga (y)(u(x + y) − u(x̂ + y))2 dy,
Ωp
the L1 integrability of ∇y ga (y) and the boundedness of u. We are
also assumming that ga vanishes at the boundary of Ωp . Now,
Z
∇x−x̂
ga (y)(u(x + y) − u(x̂ + y))2 dy
Ωp
=2
=2
=
Z
Z Z
e
O
ec
O
w(x, x̂)F 2 +
Z Z
e
O
ec
O
w(x, x̂) log w(x, x̂)dx̂dx
≤ lim inf E(un , wn ).
(∇x u(x + y) − ∇x̂ u(x̂ + y))dy
Z
=−
are uniformly bounded when |u| ≤ M . Thus, modulo the extraction of a subsequence, we may assume that un → u weakly in
all Lp , 1 ≤ p < +∞ and ga ∗ (un (x + ·) − un (x̂ + ·))2 converges
strongly in all Lp spaces and also in the dual of L Log + L to some
function F . Then by passing to the limit as n → ∞ we have
1
h
ga (y)(u(x + y) − u(x̂ + y))
Ωp
Z
ga (y)∇y u(x̂ + y) u(x + y)dy −
Ωp
are uniformly bounded when |u| ≤ M . Indeed, the boundedness
of the first integral is a consequence of
Z
ga (y)(u(x + y) − u(x̂ + y))2 dy
∇x+x̂
+2
∇y ga (y)(u(x̂ + y) + 1)u(x + y)dy
Ωp
From our assumptions, the boundedness of the previous integral
holds. Thus, the functions
Z
ga (y)(u(x + y) − u(x̂ + y))2 dy
∇x
∇x̂
ga (y)(u(x + y) − u(x̂ + y))2 dy
∇y ga (y)u(x + y)dy
Ωp
Ωp
Ωp
and
Z
Z
Z
Z
Z
Z
ga (y)(u(x + y) − u(x̂ + y))
Ωp
(∇x u(x + y) + ∇x̂ u(x̂ + y))dy
ga (y)(u(x + y) − u(x̂ + y))
Ωp
(∇y u(x + y) + ∇y u(x̂ + y))dy
ga (y)(∇y u(x + y)2 − ∇y u(x̂ + y)2 )dy
Ωp
ga (y)(u(x + y)∇y u(x̂ + y) − u(x̂ + y)∇y u(x + y))dy
Ωp
Observe that
Z
ga (y)(u(x + y)∇y u(x̂ + y) − u(x̂ + y)∇y u(x + y))dy
Ωp
=−
Z
ga (y)((u(x̂ + y) + 1)∇y u(x + y)
Ωp
− u(x + y)∇y u(x̂ + y))dy +
=−
Z
ga (y)∇y
Ωp
−
=
Z
Ωp
Z
„
Z
ga (y)∇y u(x + y)dy
Ωp
«
u(x + y)
(u(x̂ + y) + 1)
(u(x̂ + y) + 1)2 dy
∇y ga (y)u(x + y)dy
Ωp
∇y (ga (y)(u(x̂ + y) + 1)2 )
u(x + y)
dy
(u(x̂ + y) + 1)
n
e×O
e c and using
Taking test functions ϕ(x, x̂), integrating in O
the convexity of the square function, we have
Z
ga (y)(u(x + y) − u(x̂ + y))2 dy ≤ F.
Ωp
Then E(u, w) ≤ lim inf n E(un , wn ).
⊔
⊓
B Algorithm for solving Equation (13)
Following [2, 18] we derive here a fixed point algorithm for a discrete version of (13).
Discrete formulation. We consider discrete images defined over a
rectangular bounded domain Ω = {0, . . . , N }2 . We are given an
inpainting domain O ⊂ Ω and the known portion of an image
û : O c → R. Let us define Oe as the set of pixels with at least
one neighbor in O (according to the 4-connectivity). If we think
as the discrete image as a lattice of nodes joined by edges, all
variable edges are between nodes in Oe . We will use the notation
[A → B] = {f : A → B}, the set of functions from A to B.
We define ∇ : [Ω → R] → [Ω → R2 ] as
0
if i = N ,
∇u1i,j :=
ui+1,j − ui,j if i < N ,
and similarly for the component ∇u2i,j . The notation z = (i, j)
refers to for locations on the image. Let us also define a divergence
operator ∇· : [Ω → R2 ] → [Ω → R],
8
8
1
2
>
>
if i = 0,
if j = 0,
< pi,j
< pi,j
if i = N , + −p2i,j−1
if j = N ,
∇·pi,j := −p1i−1,j
>
>
: p1 − p1
: p2 − p2
i,j
i−1,j otherwise,
i,j
i,j−1 otherwise.
These operators incorporate Neumann boundary conditions on
the boundary of Ω, and are dual operators, i.e. denoting h·, ·i
the usual scalar products in [Ω → R2 ] and [Ω → R2 ], h∇u, pi =
−hu, ∇ · pi, for all u ∈ [Ω → R], p ∈ [Ω → R2 ].
Let us also define U := [Oe → R] and V := [Oe → R2 ].
In these spaces we will consider the usual scalar products and
22
will denote them as h·, ·iU and h·, ·iV . Notice that, with these
definitions, the gradient and divergence are not dual operators
when restricted to Oe , i.e. in general hu, ∇ · piU 6= −h∇u, piV .
b := {u ∈ U : uz =
We are looking for a completion u ∈ U
ûz , ∀z ∈ O c } as the solution of the following problem:
min
X X
b
u∈U
z∈Oe ẑ∈D
mz ẑ k∇uz − vẑ k +
1 X
cz (uz − fz )2 ,
2δt z∈O
(19)
where v : D → R2 , m : Oe × D → R+ ∪ {0}, f : O → R and c :
Oe → R+ are given. For the sake of generality we consider a
generic domain D instead of the complement Oec . This generalization is relevant, due for instance to computational considerations
as will be discussed shortly. Let us remark that, since cz > 0 for
z =∈ Oe , the energy is convex and the minimum is unique.
For simplicity we extend f over Oe by defining fz = ûz for
z ∈ Oe \O. We use the fact that for any η ∈ R2 , kηk = supkεk<1 η·
ε, and write the energy as:
min
X X
sup
b kε k61
u∈U
z ẑ
z∈Oe ẑ∈D
mz ẑ εz ẑ ·(∇uz −vẑ )+
1 X
cz (uz −fz )2 .
2δt z∈O
e
We have defined the unit field ε ∈ W := {ε : Oe × D → R2 }.
Interchanging the minimum with the supremum, we obtain
the following expression for u in the inpainting domain
u∗z = fz + δtc−1
z ∇ · Lεz ,
z ∈ O.
(20)
where
we have defined the linear operator L : W → V as Lεz =
P
ẑ∈D mz ẑ εz ẑ .
We can extend expression (20) to Oe , by defining a new diO
vergence operator ∇· : V → U as ∇ · εz := χO
z ∇ · εz . Here χz = 1
O
if z ∈ O and χz = 0 otherwise. We also define a modified gradient ∇· : U → V as the negative adjoint operator of ∇· , given
by ∇uz := ∇[χO
z uz ]. Besides allowing to extend (20) these operators are relevant since, as opposed to the usual gradient and
divergence operators, they are still dual when they are restricted
to functions over Oe : h∇ · p, uiU = −hp, ∇uiV .
Substituting expression (20) in the energy, we arrive to the
following dual formulation:
min
kεz ẑ k61
−
«
„
˜
ˆ
1
δt Lεz · ∇ c−1 ∇ · Lε z + c−1 [∇ · Lεz ]2
2
z∈Oe
3
2
X
X
4Lεz · ∇fz −
mz ẑ εz ẑ · vẑ 5 .
−
X
z∈Oe
ẑ∈D
Using that ∇[c−1 ∇ · Lε]z = ∇[c−1 ∇ · Lε]z for all z ∈ Ω, we can
rewrite the dual problem with the more compact expression
min
kεz ẑ k61
X
z∈Oe
2
X
1 + ν mz ẑ k − ∇fz + vẑ − δt∇[c−1 ∇ · Lεt ]z k
(24)
O \O
where hz ẑ := mz ẑ (vẑ − ∇[χz e fz ]) and h·, ·iW is the usual
scalar product in W . This expression shows that, since cz > 0,
there is a unique minimum value for ∇ · Lε. We prove now that
the sequence defined in (23) converges to that minimum.
P
Proposition 2 If cz = ẑ∈D mz ẑ 6 1 for z ∈ Oe and the time
1
step ν verifies ν < 4δtK
, with
K := max
i,j∈Oe
χO
i,j +
ci,j
ci+1,j
O
χO
i+1,j ; χi,j +
ci,j
ci,j+1
χO
i,j+1
ff
,
(25)
the sequence (εt ) defined in (23) converges to a minimum of the
energy (24).
Proof For reasons of space, we only sketch the main steps of
the proof. The derivations are similar to those in [2, 18]. Let us
1
, the energy J ∗ decreases. Defining
first show that if ν < 4δtK
−1
t+1
t
η := ν (ε
− ε ) for a given t > 0, we have that the variation
in the energy can be bounded by
1
ν(νδtkc−1/2 ∇ · Lηk2U − kηk2W )
2
1
6J ∗ (εt ) + kηk2W ν(4νδtK − 1),
2
J ∗ (εt+1 ) 6J ∗ (εt ) +
(26)
where K is a constant such that 4Kkηk2W > kc−1/2 ∇ · Lηk2U for
all η ∈ W . We will derive this constant later.
Therefore, by choosing ν < (4δtK)−1 the energy decreases
with εt . Let m = limt→∞ J ∗ (εt ) > 0. Let us prove now that
εt also converges. Since kεtz ẑ k 6 1 for all z ∈ Oe , ẑ ∈ D, t >
0, there exists a subsequence (εtk ) converging to ε. From (23)
we see that (εtk +1 ) converges as well. Let ε′ = limtk →∞ εtk +1 .
Defining η := (ε′ − ε)/ν, we see that in the limit when tk → ∞
J ∗ (ε′ ) 6 J ∗ (ε) + 12 kηk2W ν(4νδtK − 1). Since J ∗ (ε′ ) = J ∗ (ε) = m
we can conclude that η = 0 and therefore ε = ε′ . This implies
that the whole sequence (εt ) converges. Furthermore the limit ε
verifies the Euler equation (22), and thus is a minimizer of the
dual energy J ∗ .
for operators A : V → U . For any η ∈ W we can write
(22)
,
δt −1/2
kc
∇ · Lε + c1/2 f /δtk2U + hh, εiW ,
2
(21)
The Karush-Kuhn-Tucker Theorem yields the existence of the
Lagrange multipliers with values λz ẑ = mz ẑ k − ∇fz + vẑ −
δt∇(c−1 ∇ · Lε)z k. The solution of (21) is computed with the
semi-implicit scheme
εtz ẑ + ν mz ẑ [−∇fz + vẑ − δt∇[c−1 ∇ · Lεt ]z ]
J ∗ (ε) =
We now sketch the derivation of the expression for K. Let us
define the following c-weighted norms: kuk2c := hc−1 u, uiU for u ff
∈
Adding the Lagrange
multipliers corresponding to the restricP
tions kεz ẑ k 6 1, z,ẑ∈Oe ×D λz ẑ (kεz ẑ k2 − 1), and deriving with
respect to ε we get the Euler equation
εt+1
z ẑ =
Convergence. Let us first observe that ∇fz − ∇fz = ∇[(1 −
Oe \O
χO
fz ] for all z ∈ Ω, which allows to express
z )fz ] = ∇[χz
the dual energy (21) as
3
2
4 δt c−1
mz ẑ εz ẑ · vẑ 5 .
z [∇ · Lεz ] − Lεz · ∇fz +
2
ẑ∈D
mz ẑ [−∇fz + vẑ − δt∇[c−1 ∇ · Lε]z ] + λz ẑ εz ẑ = 0.
where ν is a time step small enough for assuring the convergence
of the fixed point iteration. Lastly the solution to the primal
problem is recovered with (20).
(23)
U , kξk2c := hc−1 ξ, ξiV for ξ ∈ V and kAk2cc := supξ∈V
kAξk2
c
kξk2
c
kc−1/2 ∇ · Lηk2U = k∇ · Lηk2c 6 k∇ · k2cc kLηk2c .
On one hand we have that
9
8
=
<
X
2
2
−1
2
kLηkc 6 kηkW sup cz
mz ẑ 6 kηk2W ,
;
:
z∈Oe
ẑ∈D
where first inequality is an application of Cauchy-Schwarz, then
P 2
P
2
we have used that
i ai 6 ( i |ai |) , and the fact that cz =
P
ẑ∈D mz ẑ 6 1.
On the other hand, from the definition of the divergence op⊔
⊓
erator, it can be shown that k∇ · k2cc 6 4K.
23
800
Error
Computational considerations. Let us remark that for the case
of inpainting (D = Oec ) the domain of the dual variable ε is
Oe × Oec ⊂ R4 , and therefore the direct implementation of this
algorithm is prohibitive (for large images with large inpainting
domains ε will not fit in memory). To circumvent this problem
we threshold the pixel-wise influence weights mz ẑ , so as to keep
for each z only the M highest contributions. This reduces the
size of ε to M |Oe |. Indeed, when h → 0 the m funcion is already
sparse w.r.t. ẑ, being the number of non-zero elements less or
equal than the patch size (in pixels). In our experiments we set
M to the size of the patch, capturing exactly the function m.
1
2
10
30
600
400
1
2
10
10
Time (seconds)
Fig. 13 Performance of the modified PatchMatch. The
graph shows the evolution of the nearest neighbor’s error with the
iterations of the algorithm. Each graph corresponds to a different
size L of the sets (L = 1 corresponds to the original PatchMatch
[7]). The dots indicate the completion of an iteration.
C Computation of the Nearest Neighbor Field
The computation of the nearest neighbor (or of the weight function w) is the most time consuming step of an exemplar-based
algorithm. Two key observations allow to reduce the computational load for this task. The first one is that, the nearest neighbor
search can be approximated without compromising the quality of
the output, allowing to trade precision for speed (see for instance
[13] and references therein). The second one, as noted in [7], is
that due to the coherence of natural images, nearest neighbors
of patches centered on nearby pixels are likely to be located at
nearby positions.
PatchMatch is a very efficient algorithm for computing approximate nearest neighbors proposed in [7]. The authors in [7]
consider two regions O and O c of the same or different images.
The objective is to find for each patch centered in x ∈ O the
location of the best match n(x) ∈ O c . The authors define the
Nearest Neighbor Field (NNF) as the function x 7→ (n(x) − x)
defined over O. Instead of searching for the nearest neighbor of
each query patch independently, PatchMatch computes the NNF
simultaneously for all query patches. It exploits the fact that since
query patches overlapp, the offset n(x) − x of a good match for
at x is likely to lead to a good match for the adjacent points
of x as well. It is an iterative algorithm which starting from a
random initialization, alternates between steps of propagation of
good offsets and random search (see [7] for details). For most applications with natural images, a few iterations after a random
initialization are often sufficient.
We extended this algorithm following a suggestion in [7]. Instead of storing a single offset for each query patch, we store
queues of L offsets in an L-Nearest Neighbors Field (L-NNF).
Thus the number of initial random guesses gets multiplied by L.
During the propagation step adjacent queues are merged in to
a temporal queue of length 2L from which only the best L offsets are kept. This allows to preserve and transfer non optimal
offsets that may in turn be optimal for farther positions. Our
implementation of the random search is essentially as in [7].
The use of queues represents a significant improvement of
performance with respect to the original PatchMatch. Figure 13
shows the evolution of the nearest neighbor’s error measured as
P
2
x∈O kpu (x) − pu (n(x))k with time, using queues of different
lengths. Even L = 2 allows to achieve results of similar quality
in less time. Clearly L cannot grow unbounded: with L = 30 the
first iteration got to a higher precision but it took more than 10
seconds. In practice we set L ≤ 10 and perform a small number
of iterations.
Let us remark that each queue is an approximated neighborhood of the query patch in the patch space. This neighborhood
can be interpreted as a truncated representation of the function
w(x, ·) for each pixel x, which is useful in the non-degenerated
case when h > 0 in (6).
There is an interesting parallelism between this algorithm
and the mechanic of Genetic Algorithms. The L-NNF problem
can be seen as the simultaneous minimization of many fitness
functions, one for each pixel in O. Each fitness function is defined
as the sum of the patch errors over the corresponding queue. If
we identify the individuals with the queues for each pixel in O,
then the population for single problem is composed by the queue
at the pixel and its immediate neighbors (5 individuals). The random search can be interpreted as mutations, the propagation as
an optimal crossover between individuals (queues). The population overlap of neighboring problems responds to the fact that
adjacent problems are likely to the have similar solutions.
Acknowledgments
The authors thank Pau Gargallo for his insightful comments.
PA is supported by the FPI grant BES-2007-14451 from the
Spanish MICINN. PA, GF, and VC acknowledge partial support
by MICINN project, reference MTM2009-08171, and by GRC
reference 2009 SGR 773. VC also acknowledges partial support
by IP project ”2020 3D Media: Spacial Sound and Vision”, financed by EC, and by ”ICREA Acadèmia” prize for excellence
in research funded both by the Generalitat de Catalunya. GS is
partially supported by NSF, ONR, DARPA, NGA, and ARO.
References
1. Aharon, M., Elad, M., Bruckstein, A.: The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse
representatio. IEEE Trans. on Signal Processing 54(11),
4311–22 (2006)
2. Almansa, A., Caselles, V., Haro, G., Rougé, B.: Restoration and zoom of irregularly sampled, blurred, and noisy
images by accurate total variation minimization with local
constraints. Multiscale Modeling & Simulation 5(1), 235–
272 (2006)
3. Arias, P., Caselles, V., Sapiro, G.: A variational framework for non-local image inpainting. In: Energy Minimization Methods in Computer Vision and Pattern Recognition,
pp. 345–358. Springer Berlin Heidelberg, Berlin, Heidelberg
(2009)
4. Aujol, J.F., Ladjal, S., Masnou, S.: Exemplar-based inpainting from a variational point of view. Submitted (2008)
5. Awate, S., Whitaker, R.: Unsupervised, informationtheoretic, adaptive image filtering for image restoration.
IEEE Trans. on PAMI 28(3), 364–376 (2006)
6. Ballester, C., Bertalmı́o, M., Caselles, V., Sapiro, G.,
Verdera, J.: Filling-in by joint interpolation of vector fields
and gray levels. IEEE Trans. on IP 10(8), 1200–11 (2001)
24
7. Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D.B.:
Patchmatch: a randomized correspondence algorithm for
structural image editing. In: Proc. of SIGGRAPH, pp. 1–
11. ACM, New York, NY, USA (2009)
8. Bertalmı́o, M., Sapiro, G., Caselles, V., Ballester, C.: Image
inpainting. In: Proc. of SIGGRAPH (2000)
9. Bertalmı́o, M., Vese, L., Sapiro, G., Osher, S.: Simultaneous structure and texture inpainting. IEEE Trans. on Image
Processing 12(8), 882–89 (2003)
10. Bornard, R., Lecan, E., Laborelli, L., Chenot, J.H.: Missing
data correction in still images and image sequences. In: Proc.
ACM Int. Conf. on Multimedia (2002)
11. Bornemann, F., März, T.: Fast image inpainting based on
coherence transport. J. of Math. Imag. and Vis. 28(3), 259–
78 (2007)
12. Brox, T., Cremers, D.: Iterated non-local means for texture
restoration. pp. 13–24. Springer, Germany (2007)
13. Brox, T., Kleinschmidt, O., Cremers, D.: Efficient nonlocal
means for denoising of textural patterns. IEEE Trans. on IP
17(7), 1057–92 (2008)
14. Buades, A., Coll, B., Morel, J.: A non local algorithm for
image denoising. In: Proc. of the IEEE Conf. on CVPR,
vol. 2, pp. 60–65 (2005)
15. Buades, A., Coll, B., Morel, J.M., Sbert, C.: Self-similarity
driven color demosaicking. Image Processing, IEEE Transactions on 18(6), 1192–1202 (2009)
16. Candes, E.J., Wakin, M.B.: An introduction to compressive
sampling. Signal Processing Magazine, IEEE 25(2), 21–30
(2008)
17. Cao, F., Gousseau, Y., Masnou, S., Pérez, P.: Geometricallyguided exemplar-based inpainting. Submitted (2008)
18. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1-2), 89–97
(2004)
19. Chan, T., Kang, S.H., Shen, J.H.: Euler’s elastica and curvature based inpaintings. SIAM J. App. Math. 63(2), 564–92
(2002)
20. Chan, T., Shen, J.H.: Mathematical models for local nontexture inpaintings. SIAM J. App. Math. 62(3), 1019–43 (2001)
21. Cheng, Y.: Mean shift, mode seeking and clustering. IEEE
Trans. on PAMI 17(8), 790–99 (1995)
22. Criminisi, A., Pérez, P., Toyama, K.: Region filling and object
removal by exemplar-based inpainting. IEEE Trans. on IP
13(9), 1200–1212 (2004)
23. Csiszár, I.: Axiomatic characterizations of information measures. Entropy 10(3), 261–73 (2008)
24. Demanet, L., Song, B., Chan, T.: Image inpainting by correspondence maps: a deterministic approach. Tech. rep., UCLA
(2003)
25. Drori, I., Cohen-Or, D., Yeshurun, H.: Fragment-based image
completion. ACM Trans. on Graphics. Special issue: Proc.
of ACM SIGGRAPH 22(3), 303–12 (2003)
26. Efros, A., Leung, T.: Texture synthesis by non-parametric
sampling. In: Proc. of the IEEE Conf. on CVPR, pp. 1033–
1038 (1999)
27. Elad, M., Starck, J., Querre, P., Donoho, D.: Simultaneous
cartoon and texture image inpainting using morphological
component analysis (MCA). Applied and Computational
Harmonic Analysis 19(3), 340–358 (2005)
28. Esedoglu, S., Shen, J.H.: Digital image inpainting by the
Mumford-Shah-Euler image model. European J. App. Math.
13, 353–70 (2002)
29. Facciolo, G., Arias, P., Caselles, V., Sapiro, G.: Exemplarbased interpolation of sparsely sampled images. In: Energy Minimization Methods in Computer Vision and Pattern
Recognition, pp. 331–344. Springer Berlin Heidelberg (2009)
30. Fang, C.W., Lien, J.J.J.: Rapid image completion system
using multiresolution patch-based directional and nondirectional approaches. Trans. Img. Proc. 18(12), 2769–2779
(2009)
31. Gilboa, G., Osher, S.: Nonlocal linear image regularization
and supervised segmentation. SIAM Mult. Mod. and Sim.
6(2), 595–630 (2007)
32. Gilboa, G., Osher, S.: Nonlocal operators with application to
image processing. Tech. rep., UCLA (2007)
33. Hays, J., Efros, A.: Scene completion using millions of photographs (2008)
34. Holtzman-Gazit, M., Yavneh, I.: A scale-consistent approach
to image completion. Int. J. Multiscale Comput. Eng. 6(6),
617–628 (2008)
35. Igehy, H., Pereira, L.: Image replacement through texture
synthesis. In: Proc. of the IEEE Conf. on CVPR (1997)
36. Jaynes, E.T.: Information theory and statistical mechanics.
Physical Review 106(4), 620–30 (1957)
37. Jia, J., Tang, C.K.: Inference of segmented color and texture
description by tensor voting. IEEE Trans. on PAMI 26(6),
771–86 (2004)
38. Kawai, N., Sato, T., Yokoya, N.: Image inpainting considering brightness change and spatial locality of textures and its
evaluation. In: T. Wada, F. Huang, S. Lin (eds.) Advances
in Image and Video Technology, Lecture Notes in Computer
Science, vol. 5414, chap. 24, pp. 271–282. Springer Berlin
Heidelberg, Berlin, Heidelberg (2009)
39. Komodakis, N., Tziritas, G.: Image completion using efficient belief propagation via priority scheduling and dynamic
pruning. IEEE Trans. on IP 16(11), 2649–61 (2007)
40. Levin, A., Zomet, A., Weiss, Y.: Learning how to inpaint
from global image statistics. In: Proc. of ICCV (2003)
41. Levina, E., Bickel, P.: Texture synthesis and non-parametric
resampling of random fields. Annals of Statistics 34(4)
(2006)
42. Lezoray, O., Elmoataz, A., Bougleux, S.: Graph regularization for color image processing. Comput. Vis. Image Underst.
107(1-2), 38–55 (2007)
43. Mairal, J., Sapiro, G., Elad, M.: Learning multiscale sparse
representations for image and video restoration. SIAM MMS
7(1), 214–241 (2008)
44. Masnou, S.: Disocclusion: a variational approach using level
lines. IEEE Trans. on IP 11(2), 68–76 (2002)
45. Masnou, S., Morel, J.M.: Level lines based disocclusion. In:
Proc. of IEEE ICIP (1998)
46. Morel, J.M., Yu, G.: On the consistency of the SIFT Method.
Preprint, CMLA 26 (2008)
47. Pérez, P., Gangnet, M., Blake, A.: Poisson image editing. In:
Proc. of SIGGRAPH (2003)
48. Pérez, P., Gangnet, M., Blake, A.: PatchWorks: Examplebased region tiling for image editing. Tech. rep., Microsoft
Research (2004)
49. Peyré, G.: Manifold models for signals and images. Comp.
Vis. and Im. Unders. 113(2), 249–260 (2009)
50. Peyré, G., Bougleux, S., Cohen, L.: Non-local regularization
of inverse problems. In: ECCV ’08, pp. 57–68. SpringerVerlag, Berlin, Heidelberg (2008)
51. Peyré, G., Bougleux, S., Cohen, L.D.: Non-local regularization of inverse problems. Preprint Hal-00419791 (2009)
52. Pizarro, L., Mrázek, P., Didas, S., Grewenig, S., Weickert,
J.: Generalised nonlocal image smoothing. Technical Report
No. 248, Department of Mathematics, Saarland University,
Saarbrücken, Germany (2009)
53. Protter, M., Elad, M., Takeda, H., Milanfar, P.: Generalizing the non-local-means to super-resolution reconstruction.
IEEE Trans. on IP 18(1), 36–51 (2009)
25
54. Tschumperlé, D., Deriche, R.: Vector-valued image regularization with PDE’s: a common framework for different applications. IEEE Trans. on PAMI 27(4) (2005)
55. Wei, L.Y., Levoy, M.: Fast texture synthesis using treestructured vector quantization. In: Proc. of the SIGGRAPH
(2000)
56. Wexler, Y., Shechtman, E., Irani, M.: Space-time completion
of video. IEEE Trans. on PAMI 29(3), 463–476 (2007)
57. Zhou, D., Schölkopf, B.: Regularization on discrete spaces.
In: Proceedings of the 27th DAGM Symposium, pp. 361–368.
Springer (2005)