A Variational Framework for Exemplar-Based Image Inpainting

G. Facciolo

A Variational Framework for Exemplar-Based Image Inpainting

G. Facciolo

2011, International Journal of Computer Vision

visibility

…

description

27 pages

link

1 file

Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number.

A VARIATIONAL FRAMEWORK FOR EXEMPLAR-BASED IMAGE INPAINTING By Pablo Arias Gabriele Facciolo Vicent Caselles and Guillermo Sapiro IMA Preprint Series # 2306 ( April 2010 ) INSTITUTE FOR MATHEMATICS AND ITS APPLICATIONS UNIVERSITY OF MINNESOTA 400 Lind Hall 207 Church Street S.E. Minneapolis, Minnesota 55455–0436 Phone: 612-624-6066 Fax: 612-626-7370 URL: http://www.ima.umn.edu Form Approved OMB No. 0704-0188 Report Documentation Page Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. 1. REPORT DATE 3. DATES COVERED 2. REPORT TYPE APR 2010 00-00-2010 to 00-00-2010 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER A Variational Framework for Exemplar-Based Image Inpainting 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) University of Minnesota,Institute for Mathematics and Its Application,207 Church Street SE,Minneapolis,MN,55455-0436 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION REPORT NUMBER 10. SPONSOR/MONITOR’S ACRONYM(S) 11. SPONSOR/MONITOR’S REPORT NUMBER(S) 12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release; distribution unlimited 13. SUPPLEMENTARY NOTES 14. ABSTRACT Non-local methods for image denoising and inpainting have gained considerable attention in recent years. This is in part due to their superior performance in textured images, a known weakness of purely local methods. Local methods on the other hand have demonstrated to be very appropriate for the recovering of geometric structures such as image edges. The synthesis of both types of methods is a trend in current research. Variational analysis in particular is an appropriate tool for a unified treatment of local and non-local methods. In this work we propose a general variational framework non-local image inpainting, from which important and representative previous inpainting schemes can be derived, in addition to leading to novel ones. We explicitly study some of these, relating them to previous work and showing results on synthetic and real images. 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: a. REPORT b. ABSTRACT c. THIS PAGE unclassified unclassified unclassified 17. LIMITATION OF ABSTRACT 18. NUMBER OF PAGES Same as Report (SAR) 26 19a. NAME OF RESPONSIBLE PERSON Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 A Variational Framework for Exemplar-Based Image Inpainting Pablo Arias · Gabriele Facciolo · Vicent Caselles · Guillermo Sapiro Abstract Non-local methods for image denoising and inpainting have gained considerable attention in recent years. This is in part due to their superior performance in textured images, a known weakness of purely local methods. Local methods on the other hand have demonstrated to be very appropriate for the recovering of geometric structures such as image edges. The synthesis of both types of methods is a trend in current research. Variational analysis in particular is an appropriate tool for a unified treatment of local and non-local methods. In this work we propose a general variational framework non-local image inpainting, from which important and representative previous inpainting schemes can be derived, in addition to leading to novel ones. We explicitly study some of these, relating them to previous work and showing results on synthetic and real images. Keywords Inpainting · Variational methods · Selfsimilarity · Non-local methods · Exemplar-based methods 1 Introduction Image inpainting, also known as image completion or disocclusion, is an active research area in the image processing field. The purpose of inpainting is to obtain a visually plausible image interpolation in a region in which data are missing due to damage or occlusion. P. Arias, G. Facciolo, V. Caselles Dept. of Information and Communication Technologies, Universitat Pompeu Fabra Barcelona, Spain E-mail: {pablo.arias,gabriele.facciolo,vicent.caselles}@upf.edu G. Sapiro Dept. of Electrical and Computer Engineering, University of Minnesota Minneapolis, USA E-mail: [email protected] Usually, to solve this problem, the only available data is the image outside the region to be inpainted. In addition to its theoretical interest, image inpainting has applications to image and video editing and restoration. Most inpainting methods found in the literature can be classified into two groups: geometry- and textureoriented methods. We now briefly review the developments in both types of approaches, with emphasis in texture-oriented methods. This review will helpful for motivating the proposed formulation. Geometry-oriented methods. Images are modeled as functions with some degree of smoothness, expressed for instance in terms of the curvature of the level lines or the total variation of the image. The interpolation is performed by continuing and imposing this model inside the inpainting domain, usually by means of a partial differential equation (PDE). Such PDE can be derived from variational principles, as for instance in [6,19,20, 28,44,45], or inspired by physical processes [8,11,54]. These methods show good performance in propagating smooth level lines or gradients, but fail in the presence of texture. They are often referred to as structure or cartoon inpainting. Geometry-oriented methods are local in the sense that the associated PDEs only involve interactions between neighboring pixels on the image grid. An implication of this is that among all the data available in the image, these methods only use that around the boundary of the inpainting domain. Texture-oriented methods. Texture-oriented inpainting was born as an application of texture synthesis, e.g., [26, 35]. Its recent development was triggered in part by the works of Efros and Leung [26] and Wei and Levoy [55] using non-parametric sampling techniques (parametric models have also been considered, e.g. [40]). In these 2 works texture is modeled as a two dimensional probabilistic graphical model , in which the value of each pixel is conditioned by its neighborhood. These approaches rely directly on a sample of the desired texture to perform the synthesis. In practice these methods work progressively by expanding a region of synthesized texture. The value of a target pixel x is copied from the center of a (square) patch in the sample image, chosen among those that best match the available portion of the patch centered at x. Levina and Bickel [41] provided a probabilistic theoretical justification for this strategy. This method (as explained above or with some modifications) has been extensively used for inpainting [9, 10,22,25,26,48]. As opposed to geometry-oriented inpainting, these so-called exemplar-based approaches, are non-local : To determine the value at x, the whole image may be scanned in the search for a matching patch. Since these texture approaches are greedy procedures (each hole pixel is visited only once), the results are very sensitive to the order in which pixels are processed [22]. This issue was addressed in [39,56] where the inpainting problem is stated as a probabilistic inference problem in a graphical model. In both cases, the image is modeled using a pair-wise Markov Random Field (MRF). In [39] an energy considering the overlap error of adjacent patches is minimized using belief propagation. The method in [56] can be seen as an approximated Expectation-Maximization (EM) method. A similar scheme was further explored in [38], however with a different variational justification, closely related to the models addressed in this work. The algorithms of [38,56] require an initial completion which, as it turns out, has a great impact on the final result. In both cases the authors resort to a multiscale approach: The image is filtered and subsampled, possibly several times. Starting from the coarsest, the inpainting algorithm is then applied sequentially to each scale, using the upsampled result of the previous scale as initialization. This so-called multiscale approach yields very good results, and has been also applied successfully to other exemplar-based methods [30]. Another variational justification for texture-based methods was presented in [24], where the inpainting problem is reformulated as that of finding a correspondence map Γ : O → Oc , O being the inpainting domain and Oc its complement w.r.t. the image domain. Denoting the image by u, the inpainted value at position x ∈ O is then given by u(x) = u(Γ (x)), being Γ (·) the correspondence map. The authors proposed a continuous energy functional in which the unknown is the correspondence map itself: Z Z (u(Γ (x − y)) − u(Γ (x) − y))2 dydx, E(Γ ) = O Ωp where Ωp is the patch domain (centered at (0, 0)). Thus Γ should map a pixel x and its neighbors in such a way that the resulting patch is close to the one centered at Γ (x). This model has been the subject of further analysis by Aujol et al. [4], where extensions are proposed and the existence of a solution in the set of piecewise roto-translation maps Γ is proved. The authors also discuss the use of different regularizers which include a regularization of the curvature of the level lines (or surfaces) of the image. These approaches are theoretical and no numerical optimization scheme is available so far. A different variational model was presented in [49]. Images are modeled as ensembles of patches on a given patch manifold. For inpainting, the patch manifold can be learned from the set of patches in the hole’s complement. The method is iterative, with each iteration having two steps. First, the patches in the hole are projected onto the manifold. Since this is done for each patch independently, the projected patches are not necessarily coherent with each other, i.e. overlapping patches may have different values in the overlap region. Therefore, in the second step, an image is computed by averaging the patches in the ensemble. Another front of activity is given by the techniques based on the sparseland model [1,16], in which the image is restricted to have a sparse representation over an overcomplete basis or dictionary [1,27,43]. The main difference between dictionary-based and exemplar-based methods lies in where the missing information is obtained from. Dictionary based methods look for the missing data in the dictionary (as a linear combination of a few atoms), whereas exemplar-based methods assume that the information needed lies elsewhere in the image itself (or in a database of images [33]). These methods perform well in problems with small or scattered inpainting domains, but fail with a large holes. Exemplar-based methods provide impressive results in recovering textures and repetitive structures. However, their ability to recreate the geometry without any example is limited and not well understood. Therefore, different strategies have been proposed which combine geometry and texture inpainting [9,17,25,37]. These methods usually decompose the image in some sort of structure and texture components. Structure is reconstructed using some geometry-oriented scheme, and this is used to guide the texture inpainting. Contributions. Despite these combined methods, geometry and texture inpainting are still quite separate fields, 3 each one with its own analysis and implementation tools. Variational models as the one introduced in this paper can provide common tools allowing a unified treatment of both trends. We therefore propose a variational framework for non-local image inpainting as a contribution to the modeling and analysis of texture-oriented methods. Like the non-local means denoising algorithm [5, 14] we encode the image redundancy and self-similarity (measured as patch similarity) as a non-local weight function w : O×Oc → R. This function serves as a fuzzy correspondence, and differs from the works [4,24], although a (eventually multivalued) correspondence map can be approximated as a limit of our model. The proposed formulation is rather general and different inpainting schemes can be derived naturally from it, via the selection of the appropriate patch similarity criterion. In this work we present four of them, patch NLmeans, -medians, -Poisson and -gradient medians, corresponding to similarity criterions based on L2 - and L1 -norms between patches or their gradients. We provide an comprehensive empirical comparison on real and synthetic problems, showing the benefits and limitations of each model. The patch NL-means is related to the method of [38, 56] and can be interpreted in terms of the mean shift [21] and the manifold models of [49]. The other schemes are, to the best of our knowledge, novel. Both gradientbased methods, patch NL-Poisson and -gradient medians combine the exempar-based interpolation with local PDE-based diffusion schemes. This results in a smoother continuation of the information across the boundary and inside the inpainting domain, and in a better propagation of structures. Finally, we also present an initial discussion along the line of providing a formal modelling of the multiscale scheme. We argue that this approach is not just some heuristic to find a good minimum of the single scale inpainting functional, but constitutes an inpainting criterion itself. Although the focus of this work lies on inpainting, the framework we are introducing can be adapted for its application in other contexts. In particular it is related to graph-based non-local regularizers. Inspired by graph regularization techniques [57], these approaches model the image as a graph characterized by the similarity weights [32,42]. Recently, our formalism [3] was extended by Peyré et al. in [51] to provide a variational justification to the case in which the graph is adaptive. On the other hand the patch NL-means scheme provides a model for an iterative version of the non-local means algorithm with adaptive similarity weights. Similar approaches have been applied to denoising [5,52], super-resolution [53], texture denoising [12], and demosaicing [15] among others. We will discuss the relation with these models in the text. The rest of this document is organized as follows. In Section 2 we introduce the proposed variational framework, together with the derivation and the empirical comparison of the different inpainting schemes. The links with related work are discussed in Section 3. In Section 4 we present and discuss the multiscale approach. And in Section 5 we present experimental results on real images allowing to compare our results with the state of the art. Concluding remarks and future work is discussed in Section 6. We note that a preliminary version of this work has been presented in [3]. Notation. Images are denoted as functions u : Ω → R, where Ω denotes the image domain, usually a rectangle in R2 . Pixel positions are denoted by x, x̂, z, ẑ or y, the latter for positions inside the patch. A patch of u centered at x is denoted by pu (x) = pu (x, ·) : Ωp → R, where Ωp is a rectangle centered at (0, 0). The patch is defined by pu (x, y) = u(x + y), with y ∈ Ωp . O ⊂ Ω is the hole or inpainting domain, and Oc = Ω \ O. We still denote by u the part of the image u inside the hole, while û is the part of u in Oc : û = u|Oc . Additional notation will be introduced in the text. 2 Variational framework Our variational framework is inspired by the following non-local functional Z Z w(x, x̂)(u(x) − û(x̂))2 dx̂dx (1) Fw (u) = O Oc where w : O × Oc → R+ is a weight function that measures the similarity between patches centered in the inpainting domain and in its complement. Let us assume for the moment that the weights are known. The minimum of (1) should have a low pixel error (u(x) − û(x̂))2 whenever the similarity w(x, x̂) is high. In this way the similarity weights drive the information transfer from known to unknown pixels. A similar functional was proposed in [31] as a non-local regularization energy in the context of image denoising. It models the non-local means filter [5,14] when the weights are Gaussian 1 2 w(x, x̂) ∝ exp − kpu (x) − pû (x̂)k , h where k·k is a weighted L2 -norm in the space of patches and h is a parameter that determines the selectivity of the weigths w. 4 In [31] the weights are considered known and remain fixed through all the iterations. While this might be appropriate in applications where they can be estimated from the noisy image, in the image inpainting scenario here addressed, the weights are not available and have to be inferred together with the image (as in [50,53]). One of the novelties of the proposed framework is the inclusion of adaptive weights in a variational setting. For this reason, we will consider the weight function w as an additional unknown. Instead of prescribing explicitly the Gaussian functional dependence of w w.r.t. u, we will do it implicitly, as a component of the optimization process. In doing so, we obtain a simpler functional, avoiding to deal with the complex, non-linear dependence between w and u. In our formulation, w(x, ·) is a probability density function, Z w(x, x̂)dx̂ = 1, Oc and can be seen as a relaxation of the one-to-one correspondence map of [4,24], providing a fuzzy correspondence between each x ∈ O and the complement of the inpainting domain. In this setting, we propose an energy which contains two terms, one of them is inspired by (1) and measures the coherence between the pixels in O and those in Oc , for a given similarity weight function w. This permits the estimation of the image u from the weights w. The second term allows us to compute the weights given the image. The complete proposed functional is Z 1 Hw (x)dx, (2) E(u, w) = Few (u) − h e O where Few (u) = Z Z e O ec O w(x, x̂)ε(pu (x) − pû (x̂))dx̂dx, (3) ε(·) is an error function for image patches, and Z w(x, x̂) log w(x, x̂)dx̂, Hw (x) = − ec O is the entropy of the probability w(x, ·). e the extended inpainting domain, as the We take O, set of centers of patches that intersect the hole, i.e. e = O+Ωp = {x ∈ Ω : (x+Ωp )∩O 6= ∅}. Thus, patches O e c are entirely outside O (Figure pû (x̂) centered in x̂ ∈ O 1). Accordingly, we consider that the weight function R c e e w is defined over O × O and Oec w(x, x̂)dx̂ = 1. For a e + Ωp ⊆ Ω, simplified presentation, we assume that O e supports a patch centered on it i.e. every pixel in O e c to and contained in Ω. Analogously, we also shrink O c e have O + Ωp ⊆ Ω. Let us now make some additional comments on the functional. We observe that the term (u(x) − û(x̂))2 in (1), that penalizes differences between pixels, is substituted in (3) by the patch error function ε(pu (x)−pû (x̂)). This has two consequences. First, minimizing (3) with respect to the image will force patches pu (x) to be similar to pu (x̂) for each pair x, x̂ such that w(x, x̂) is large. The other implication has to be understood together with the inclusion of the second term, which integrates e For a the entropy of each probability w(x, ·) over O. e given completion u, and for each x ∈ O, the optimum weights minimize the mean patch error for pu (x), given by Z w(x, x̂)ε(pu (x) − pû (x̂))dx̂, ec O while maximizing the entropy. For instance taking ε as the squared L2 -norm of the patch, then the resulting weights are Gaussian. This can be related to the principle of maximum entropy [36], widely used for inference of probability distributions. The parameter h controls the trade-off between both terms and is also the selectivity parameter of the Gaussian weights. The trivial minima of E with w(x, x̂) = 0 everywhere is discarded by restricting w(x, ·) to be a probability. The patch error function. Patches are functions defined on Ωp , and the error function ε : RΩp → R+ is defined as the weighted sum of pixel-wise errors e : R → R+ Z ga (y)e(u(x + y) − û(x̂ + y))dy, ε(pu (x) − pû (x̂)) := Ωp (4) where the intra-patch weight function ga is a Gaussian centered at the origin with standard deviation a. We will consider the L1 - and the squared L2 -norms as particular cases of ε(·), with e(·) = | · | and e(·) = | · |2 respectively. We will also consider patch error functions involving the gradient of the image. In an abuse of notation we will denote the gradient’s patch and pixel-wise error Ω functions as ε : R2 p → R+ and e : R2 → R+ , respectively. It will be clear from the argument which case Ris intended, as in this example ε(p∇u (x) − p∇u (x̂)) := Ωpga (y)e(∇u(x + y) − ∇û(x̂ + y))dy. Additionally, we can impose an additive penalization of the distance ϕ(x − x̂) as in [38] by modifying the patch distance function εd (pu (x) − pu (x̂), x, x̂) = ε(pu (x) − pu (x̂)) + ϕ(x − x̂) to penalize the use of distant patches. We will not consider this modification in the current work. As it will be discussed below, the patch error function determines not only the similarity criterion but 5 also the image synthesis, and thus is a key element in the proposed framework. 1 ec δ(x̂ − n(x)), where n(x) ⊆ O as limh→0 w(x, x̂) = |n(x)| is the set of nearest neighbors of x, defined as n(x) = arg min ε(pu (x) − pû (x̂)). e x̂∈O 2.1 Minimization of E We have formulated the inpainting problem as the constrained optimization (u∗ , w∗ ) = arg min E(u, w) u,w Z w(x, x̂)dx̂ = 1 subject to ec O (5) e ∀x ∈ O, where E is the inpainting energy defined in (2). For energies involving image gradients we will consider Dirichlet conditions at the boundary between the inpainting domain and the image, and Neumann conditions at the boundary of the image, i.e. u(x) = û(x) in ∂O \ ∂Ω and ∇u(x) · n(x) = 0 for x ∈ ∂O ∩ ∂Ω. To minimize the energy E, we use an alternate minimization algorithm. At each iteration, two optimization steps are solved: The constrained minimization of E with respect to w while keeping u fixed; and the minimization of E with respect to u with w fixed. This procedure yields the following iterative scheme: Algorithm 1 Alternate minimization of E(u, w). Require: Initial Condition u0 (x) with x ∈ O. 1: repeat 2: Update Weights: wk =R arg minw E(uk , w), s.t. Oec w(x, x̂)dx̂ = 1. 3: Update Image: uk+1 = arg minu E(u, wk ). 4: until Stop Criterion: kuk+1 − uk k < tolerance. In the weights update step, the minimization of E w.r.t. w yields: 1 1 wk (x, x̂) = exp − ε(puk (x) − pû (x̂)) , (6) q(x) h R where q(x) = Oec exp − h1 ε(puk (x) − pû (x̂)) is a normalization factor that makes w(x, ·) a probability. For the functionals defined with gradient patches, the weight update equation is analogous to (6) replacing the patch error function with ε(p∇uk (x) − p∇û (x̂)). The parameter h determines the selectivity of the similarity. If h is large, maximizing the entropy becomes more relevant, yielding weights which are less selective. In the limit, when h → ∞, w(x, ·) becomes ec . On the other hand, a a uniform distribution over O small h yields weights which concentrate on the patches similar to pu (x). When h → 0, we compute the weights That is w(x, ·) can be considered as an approximation to a multivalued correspondence. For simplicity, in practice we assume that |n(x)| = 1, i.e. the nearest neighbor is unique. The image update step deserves more attention and is described next. 2.1.1 Image update step. In this section we present the derivation of the image update step corresponding to the four patch error functions mentioned earlier. First we will present the cases when image patches are compared using the squared L2 -norm and the L1 -norm. We refer to the resulting algorithms as patch-wise non-local means (patch NL-means), and medians (patch NL-medians). Then we consider functionals involving the gradients of the patches both with the squared L2 -norm and the L1 norm. These methods will be referred as patch-wise non-local Poisson (patch NL-Poisson), and gradient medians (patch NL-GM). Before moving to the derivation of the these schemes, let us remark that with the change of variables z = x + y, ẑ = x + y ′ , the image energy term can be expressed as an accumulation of pixel-wise errors: Z Z w(x, x̂) Few (u) = e O ec O Z ga (y)e(u(x + y) − û(x̂ + y))dydx̂dx = Z Z O Ωp m(z, ẑ)e(u(z) − û(ẑ))dẑdz + C, (7) Oc where C is a constant term. We have introduced the pixel-wise influence weights m : O × Oc → R+ defined as Z χOec (ẑ − y)ga (y)w(z − y, ẑ − y)dy. (8) m(z, ẑ) := Ωp ec and 0 on O. e The function χOec takes the value 1 on O An analogous expression can be computed for gradient patch error functions. This rewriting simplifies the following derivations and provides some insights on the implications of using patch-wise errors. For each pair of pixels (z, ẑ) ∈ O × Oc , m(z, ẑ) weights the effective contribution of the pixel-wise error between u(z) and û(ẑ) in the total value of the energy. The quantity m(z, ẑ) is computed by integrating the similarity w(z−y, ẑ−y) between all patches that overlap 6 Fig. 1 Patch-wise non-local inpainting. The value at z ∈ O is computed using contributions from all the patches that overlap e such that z = x + y it, these are patches centered at x ∈ O with y ∈ Ωp . The influence function m(z, ẑ) accumulates all the contributions w(z − y, ẑ − y) from patches centered at ẑ − y to z − y. ẑ and those that overlap z in the same relative position (shown in Figure 1). It tells us how much evidence there is supporting a correspondence between R the locations z and ẑ. Also observe that for each z, Oc m(z, ẑ)dẑ = 1, then m(z, ·) can also be interpreted as a probability density function. Note that the energy (7) corresponds to (1), with the patch similarity weights w being substituted with the pixel-wise influence weights m, a sort of spacial convolution of w with kernel ga . Patch non-local means. If we use a weighted squared 2 L ε(pu (x) − pû (x̂)) := R -norm as a patch error function 2 g (y)|u(x + y) − û(x̂ + y)| dy in (3) then the image Ωp a energy term (7) is quadratic on u, and its minimum for fixed weights w can be computed explicitly as a nonlocal average: Z 1 m(z, ẑ)û(ẑ)dẑ, (9) u(z) = c(z) Oc Rfor z ∈ O, where the normalization constant c(z) := Oc m(z, ẑ)dẑ. Although c(z) = 1 with the current definition of w and ga , we will keep a generic notation in the following derivations. The formal similarity with the non-local means equation hides some important differences, which are a direct consequence of the use of a patch error function in the image energy term. To obtain more insight about this let us expand m to obtain: Z Z 1 w(z − y, x̂)û(x̂ + y)dx̂dy. ga (y) u(z) = c(z) Ωp ec O There are two averaging processes involved in the synthesis. The outer integral goes through all patches pu (z− y) overlapping the target pixel z. Each patch suggests a value for z resulting from the inner sum: A non-local average of the pixel at position y in all patches pû (x̂) ec . This sum is weighted by the similarity between in O the patch pu (z − y) and each pû (x̂). Therefore, we can distinguish two types of pixel interactions. Interactions due to the patch overlap of nearby pixels in the image lattice and non-local interactions driven by the similarity weights. The latter can be controlled by the selectivity parameter h, but the extent of the overlap interactions is given by the patch size. In particular when h → 0 Eq. (9) yields Z ga (y)û(n(z − y) + y)dy u(z) = Ωp (recall we are assuming a unique nearest neighbor). For e the nearest neighbor of pu (x) is centered each x ∈ O, at x and all these patches are averaged according to ga . This blending may cause some blur, which leads us to consider the L1 -norm in the search of a more robust image synthesis. In Appendix A we prove the existence of minima for the Patch NL-Means energy in the continuous setting. Patch non-local medians. The L1 -norm patch error function in the image energy term corresponds to taking e(x) = |x| in (7). The Euler equation for u, given the influence function m, can be formally written as Z sign[u(z) − û(ẑ)]m(z, ẑ)dẑ ∋ 0. [δu Few (u)](z) = Oc This expression is multivalued, since sign(r) = r/|r| if |r| > 0 and sign(r) ∈ [−1, 1] if r = 0. Its solution for each u(z), z ∈ O is obtained as a weighted median of the pixels of the complement Oc , with weights m(z, ·). Both schemes presented so far perform inpainting by transferring (by averages or medians) known gray levels into the inpainting domain. As we will see next, using a patch error function based on the gradient of the image yields methods which transfer gradients and compute the resulting image as the solution of a PDE. This results in better continuation properties of the solution, in particular at the boundary of the inpainting domain. Patch non-local Poisson. The squared L2 -norm of the gradient in the image energy term (3) corresponds to taking e(·) = k · k2 in (7), where k · k is the Euclidean norm in R2 . The energy term becomes Z Z Few (u) = m(z, ẑ)k∇u(z) − ∇û(ẑ)k2 dẑdz. (10) O Oc Recall that we consider Dirichlet conditions at the boundary between the inpainting domain and the known data region, and Neumann conditions at the boundary of the image, i.e. u(x) = û(x) in ∂O \ ∂Ω and ∇u(x)·n(x) = 0 for x ∈ ∂O ∩ ∂Ω. The Euler equation w.r.t. u is given 7 by a non-local Poisson equation, i.e. a Poisson equation with non-local coefficients: ∇ · [c(z)∇u(z)] = ∇ · [c(z)v(z)], for all z ∈ O, where u|Oc = û, c(z) = the field v : O → R2 is given by Z 1 m(z, ẑ)∇û(ẑ)dẑ. v(z) = c(z) Oc (11) R Oc m(z, ẑ)dẑ and (12) The solution of (11) is computed with a conjugate gradient algorithm. Observe that the solutions of (11) are minimizers of Z c(z)k∇u(z) − v(z)k2 dz. O Therefore, u is computed as the image with the closest gradient (in the L2 sense) to the guiding vector field v, which corresponds to a non-local weighted average of the gradients in the complement. See [47] for further uses of the Poisson equation in image editing. Patch non-local gradient medians. To complete our study of the image energy term we consider the L1 -norm of the gradient, that is Z Z m(z, ẑ)k∇u(z) − ∇û(ẑ)kdẑdz, Few (u) = O Oc with the same boundary conditions as with the patchwise NL-Poisson. To minimize it we perform an implicit gradient descent: Z Z min m(z, ẑ)k∇ut+1 (z) − ∇û(ẑ)kdẑdz+ t+1 u O Oc + 1 kut+1 − ut k2 , (13) 2δt where at each step ut+1 is computed using the fixed point algorithm described in the Appendix B, based on the projection method presented in [18]. As patch NL-Poisson, this scheme transfers gradients and interpolates the gray levels using the boundary conditions. With the use of the L1 error function, we expect the solution of patch NL-GM to retain more small scale detail than that of patch NL-Poisson. Notice that for both gradient-based methods the patch similarity weights w are computed based only on the gradients (and thus also the pixel-wise influence weights m). In most cases however, the gradient is not a good feature for measuring the patch similarity, and it would be desirable to consider also the gray level data. This can easily be achieved within our variational framework by modifying the patch error function to take into account both gradients and gray level patches. Functionals combining intensity and gradients. We can use any of the gradient-based energies in conjunction with the patch NL-means energy by considering a linear combination of the corresponding patch error functions. This yields Z Z Feλ (u) = m(z, ẑ)[(1 − λ)k∇u(z) − ∇û(ẑ)kp + O Oc + λ|u(z) − û(ẑ)|2 ]dẑdz (14) where the parameter λ ∈ (0, 1) controls the mixture and p is either 1 or 2. The resulting algorithms update the weights w considering both the intensity and the gradient, therefore improving their selectivity. Regarding the image update, notice that (14) can be rewritten as Z Z Feλ (u) ∝ m(z, ẑ)k∇u(z) − ∇û(ẑ)kp dẑdz O Oc Z λ c(z)|u(z) − f (z)|2 dz, + 1−λ O R where f (z) = c(z)−1 Oc m(z, ẑ)û(ẑ)dẑ is the solution of the patch NL-means image update step. Thus we see that the combination with the squared L2 patch error function translates into a patch NL-means attachment term. For the case of patch NL-Poisson the Euler equation w.r.t. u becomes: λ c(z)u(z) = (1 − λ) λ = ∇ · [c(z)v(z)] − c(z)f (z), (15) (1 − λ) ∇ · [c(z)∇u(z)] − which is linear and can be solved with a conjugate gradient scheme. The patch NL-GM combined functional has basically the same form as (13) and therefore can also be solved with the fixed point scheme described in Appendix B. 2.2 Comparison of proposed schemes Let us advance some results on synthetic problems to highlight the main characteristics of the proposed methods. First we consider the inpainting of a regular texture (shown in Figure 2) with two different mean intensities, where the inpainting domain hides all patches on the boundary between the dark and bright textures. With this example we can test the ability of each method to create an interface between both regions. Situations like these are common in real inpainting problems, for instance due to inhomogeneous lighting conditions. We have also added Gaussian noise with standard deviation 8 Fig. 2 Inpainting of a synthetic texture. The initial condition is shown in the first column. The other four columns show a zoom (region in the red rectangle) of the results of patch NL-means, -medians, -Poisson and -GM. Top row, h = 0, bottom row h = 200, h = 14, h = 400 and h = 20, respectively. The intra-patch weight kernel ga is shown in the bottom right corner of the initial condition, it has a standard deviation a = 5 and the patch size is s = 15. σ = 10 to show the influence of the selectivity parameter h. Each column of Figure 2 shows the results of the four methods described in the previous section. We have tested each method with h = 0 (top row), and h > 0, chosen approximately to match the expected deviation of each patch error due to the noise. The first notorious difference is on how the methods handled the transition between the dark and bright textures. Patch NL-means produces a smooth transition whereas a sharp step is obtained with the patch NL-medians. On the other hand, both gradient based methods yield a much smoother shading of the texture. This is due to the fact that the image update step is computed as the solution of a PDE which diffuses the intensity values present at the boundary of the inpainting domain. These PDEs are driven by a vector field estimated non-locally and therefore combine non-local exemplar based inpainting with local interpolation PDEs. For the case of patch NL-Poisson this interpolation is linear, since this is a solution of the homogeneous Poisson equation (i.e. Laplace equation). 250 patch NL−means patch NL−medians patch NL−Poisson patch NL−GM known image 200 150 100 40 50 60 70 80 250 90 100 110 patch NL−means patch NL−medians patch NL−Poisson patch NL−GM known image 200 150 100 40 50 60 70 80 90 100 110 Fig. 3 Profiles of the results in Figure 2. The profiles are taken from an horizontal line going between the circles in Figure 2. Top: results with h = 0 and bottom: results with h > 0. As expected, the results using a higher h show some denoising, since for larger h the patch similarity weights are less selective. This effect can be better appreciated in the profiles shown in Figure 3, which depict the image values for a horizontal line between the circles. In the usual context of inpainting, in which the available data is not perturbed by noise, this denoising translates into an undesirable loss of texture quality. For that reason we use h = 0. Recall that in the limit when h = 0, the weights w(x, x̂) converge to a Dirac’s delta function at the set n(x) of nearest neighbors of pu (x). Even if we assume that the nearest neighbor is unique, the value of a pixel is still computed from a population obtained from those nearest neighbors of all patches that overlap the pixel. In the case of periodic patterns, once the minimization has reached a stable state, all values in the population will be basically the same: patches surrounding a pixel agree on its value, and in this case all schemes behave similarly. Differences arise when neighboring patches cannot agree on their suggested values. Such is the case of the step in Fig. 2. For non-periodic patterns and random textures this disagreements will be common, which may affect the perceptual similarity of the synthesized texture with the original. Figure 4 studies this effect. The image consists of Gaussian noise. The top row shows the resulting completions, and the bottom row shows an estimate of the local variance (computed by smoothing the image of the squared differences w.r.t. the mean). The red curves require a brief explanation. Following [7], we use the term Nearest Neighbor Field (NNF) to refer to the vector e where n(x) ∈ O ec is the field n(x) − x, defined over O, position of the (assumed) unique nearest neighbor of pu (x). In Figure 4 we show in red the boundaries of the regions with constant NNF. In those regions patches are translated rigidly from somewhere in the complement. We can observe that, for all inpainting schemes, the resulting completion consists basically of a patchwork of large regions of rigid translation. The interior of these regions, away from their boundaries (red curves), 9 Fig. 4 Inpainting of random texture. From left to right: patch- NL-means, NL-medians, NL-Poisson and NL-GM. Top row: inpainting results. Bottom row: local variance estimate of the noise, superimposed with the boundaries between regions of constant NNF (red curves). Notice the attenuation of the variance over the red curves, specially for L2 -based methods. Fig. 5 Inpainting of structured texture. From left to right: Initial condition, result of path NL-means, -medians, -Poisson and -GM. Results with s = 15, a = 5 and h = 0. The treatment of color images is described in Section 5. reproduces an exact copy of the source, and thus the variance is preserved. The variance decreases along the red curves, where pixel values are synthesized using incoherent contributions. Both for the intensity models or the gradient models, the L2 -norm causes a higher decay of the variance than with the L1 -norm. This effect can also be observed in figures 2 and 3, where it can be seen that patch NL-medians and -GM preserve more of the Gaussian noise than the methods based on L2 -norms. We will refer to the regions of constant NNF as copy fronts. Recall however that copy occurs only away from their boundaries. In Figure 5 we show results on a non-periodic, structured texture. We can see that patch NL-means and Poisson show some smoothing, whereas both L1 -based schemes obtained sharper results. Both intensity based methods show discontinuities at the boundary of the inpainting domain. Notice also that for the patch NLmedians algorithm it is easy to identify the regions in the complement that have been replicated. Where two copy fronts meet a seam is produced. With the patch NL-means the spatial averaging of overlapping patches creates a smooth blending of the copy fronts. 2.2.1 Combined schemes Gradient based methods produce smooth interpolations and enforce the continuity of the image at the boundary of the inpainting domain, which are generally desirable features. For this to be done variationally the similar- ity weights w have to be computed using patches of the gradient, which in most cases does not provide a reliable measure of patch similarity. In practice better results are obtained by combining these methods with patch NL-means, allowing to take the image values into account for the computation of the patch error, and to synthesize the image with a diffusion PDE. This adds a parameter λ, which controls the mixture. In Figure 6 we show some results corresponding to the combination of both gradient schemes with patch NL-means while varying the mixture coefficient λ. The image shows a periodic pattern with an illuminance gradient. Most of the dark exemplars are incomplete, and thus only bright exemplars from the bottom of the image are available. The rightmost detail shows the result of patch NL-means: the image has been completed using bright patches and presents a discontinuity on the upper side of the hole. On the other hand, a completion using gradients only (see result of the patch NL-GM) manages to interpolate both the texture and the shading. The small images on the right show results of both gradient methods with different values for λ. The value of the mixing parameter λ should be carefully selected since it mixes two different magnitudes (norms of gradients and gray levels). With λ ∼ 0.1 for patch NLPoisson + NL-means and λ ∼ 0.01 for patch NL-GM + NL-means, some of the good continuation properties are preserved and enough color information is added to the patch metric. 10 Fig. 6 Linear combination of gradient based methods with NL-means. First image starting from the left: Initialization. The gray rectangle is the inpainting domain. Only patches centered outside the red area are available. Second image: Result obtained with patch NL-GM, using λ = 0. A similar result is obtained with the patch NL-Poisson with λ = 0 (not shown). Details: Top, from left to right: results with patch NL-Poisson with λ = 0.01, 0.05, 0.1, 1 (the latter corresponds to patch NL-means). Bottom, patch NL-GM with λ = 0.001, 0.005, 0.01. All details have been linearly stretched for display. Differences are most noticeable in the top and in the left of the images. GAP 2.2.2 Geometric interpolation To evaluate the ability of each method to continue the geometry of the structures at the boundary of the inpainting domain, we consider a very simple image with a gap (shown in Table 1). The inpainting region is initialized with the background color. For this evaluation we fix the size of the patch and increase the width of the gap. For narrow gaps, the method will be able to join both ends of the green vertical line. When increasing the gap, at a certain width the method is no longer capable of recovering the vertical line. In these cases the gray initialization prevails, showing a completion in which the vertical line is interrupted. The first column of Table 1 shows the maximum height of the gap for each method. Observe that the combined schemes improve the prolongation of geometric structures, and that the optimal mixing parameter λ for this purpose is near to the values that we have proposed in the previous section. Basically these schemes have two propagation mechanisms: A local one, by diffusion of the intensity values by the PDE, and a non-local one by transference of gradients from Oc . When λ > 0 these mechanisms reinforce each other: the diffused values allow a better estimation of the weights, and therefore the transference of more appropriate gradients, which will help to diffuse the intensity values further. On the other hand, intensity based methods depend only on the iteration of weights computation and image update to propagate information. The second column of Table 1 shows the results obtained by incorporating the confidence mask later described in Section 2.3. An alternative way to prolong the geometric structures is to increase the patch size as will be discussed in Section 4. Method P+Mλ=0 P + M λ = 0.1 P + M λ = 0.5 P+Mλ=1 Md GM + M λ = 0 GM + M λ = 0.01 GM + M λ = 0.001 13 16 11 9 7 15 21 40 GAP tc = 5 40 46 34 29 42 > 56 > 56 > 56 Table 1 Geometric interpolation. The inpainting domain is shown in white and the patch in the lower right corner (9 × 9 pixels). The table reports the maximum gap width for which the algorithm is capable of recovering the vertical line. P, M, Md and GM stand for patch NL-Poisson, -Means, -Medians and -Gradient Median respectively. The rightmost column shows the maximal gaps obtained with the use of a confidence masks with decay tc = 5 (see Section 2.3). 2.3 Extensions Color images. An energy for color images can be obtained by defining a patch error function for color patches as the sum of the error functions of the three scalar components: ε(pu (x) − pû (x̂)) = 3 X ε(pui (x) − pûi (x̂)), i=1 where u : Ω → R3 is the color image, and ui , with i = 1, 2, 3, its components (analogously for gradientbased errors). Given the weights, each channel is updated using the corresponding scheme for scalar images. All channels are updated using the same weights. Confidence mask. For large inpainting domains, it is useful to introduce a mask κ : Ω → (0, 1] which assigns 11 a confidence value to each pixel, depending on the certainty of its information (see also [22,39]). This will help in guiding the flow of information from the boundary towards the interior of the hole, eliminating some local minima and reducing the effect of the initial condition. The resulting image energy term takes the form Z Z Few (u) = κ(x)w(x, x̂)ε(pu (x) − pû (x̂))dx̂dx, e O ec O where κ modulates the penalization of the incoherences between w and the error ε between patches. The effect of κ on the image update step can be seen on the pixel-wise influence weights m Z χOec (ẑ − y)ga (y)κ(z − y)w(z − y, ẑ − y)dy. m(z, ẑ) = Ωp Thus, the contribution of the patch pu (z − y) to the evidence function is now weighted by its confidence. Patches with higher confidence will have a stronger influence. Notice that with the inclussion of the confidence mask, the normalization coefficient c(z) becomes: Z Z ga (y)κ(z − y)dy. c(z) = m(z, ẑ)dẑ = Oc Ωp This does not affect intensity-based methods, but has implications on the image update step for gradientbased ones. For example, for patch NL-Poisson the errors with respect to the non-local guiding vector field k∇u(z) − v(z)k are penalized according to c(z). On the similarity weights, the confidence mask has the effect of modifying the selectivity parameter h by a locally varying h/κ(x). If the confidence is high, the effective selectivity h/κ(x) will be lower, thus increasing the selectivity of the similarity measure. When h → 0 the weights tend to a Dirac’s delta independently of κ. The same reasoning applies to the gradient based energies. For the experiments shown in this paper, the confidence mask was set to ( (1 − κ0 ) exp − d(x,∂O) + κ0 if x ∈ O, tc κ(x) = 1 if x ∈ Oc , which shows an exponential decay w.r.t. the distance to the boundary inside the hole d(·, ∂O). Here tc > 0 is the decay time and κ0 > 0 determines the asymptotic value reached far away from the boundary. Setting tc = 0 amounts to using a constant confidence mask. Table 1 shows the effect of using a confidence mask with tc = 5 and κ0 = 0.1, allowing the restoration of the vertical line for much wider gaps, and thus alleviating the dependence with the gray initial condition. Other shapes of the confidence mask could be used for controlling different aspects of the dynamics of the completion algorithm. For instance controlling the decay of the mask from certain points of the boundary allow to privilege the continuation of structures from them. Interpolation of sparsely sampled images. In this work we have implicitly assumed the existence of a set of e c , constituting the set of exemplars. complete patches O This assumption can be removed by generalizing the patch error function ε(·) so that it depends on the positions of the patches. This implies a potentially different error function for each pair of patches. This has been explored in [29], applied to the case of interpolation of sparsely sampled images. In that case the patch error function was designed to consider only those locations known in at least one of the two patches. 3 Discussion and connections In this section we present various issues related to the variational framework and its connections with other models. 3.1 Probabilistic-geometric model interpretation The proposed energy has an interpretation in terms of a probabilistic model in the space of patches, which becomes apparent when rewritten using the generalized Kullback-Leibler divergence [23]. Given two positive and integrable functions p, q defined over a certain measure space X , the generalized Kullback-Leibler divergence is given by: KL(p, q) = Z p(s) log X p(s) q(s) ds− Z Z q(s)ds, p(s)ds + − X X assuming that the integrals exist. With this notation (and taking into account that w(x, ·) is a probability) the functional E can be written as E(u, w) = Z e O KL (w(x, ·), r(x, ·)) dx− Z Z − e O ec O r(x, x̂)dx̂dx, where r is the unnormalized Gaussian weight function 1 r(x, x̂) = exp − ε(pu (x) − pû (x̂)) . h 12 The first term integrates the divergence between the e The second functions w(x, ·) and r(x, ·), for each x ∈ O. term can be interpreted by noticing that Z r(x, x̂)dx̂ (16) q̃(x) = ec O is a density estimate (in the patch space) of the set of ec patches in Oc : The higher the amount of patches in O close to pu (x) (according to the scale parameter h), the higher the value of qe. The minimizers (u∗ , w∗ ) are obtained when for all e×O e c , w∗ (x, x̂) = r∗ (x, x̂)/q̃ ∗ (x), (Gaussian (x, x̂) ∈ O weights normalized by (16)), and the patches of the inpainted image are in regions of high density in the patch space. This provides a geometric intuitive interpretation of our variational formulation. The image is considered as an ensemble of overlapping patches. Known ec are fixed, forming a patch density model patches in O e The richness of the used to estimate the patches in O. framework is given in part by the fact that different norms in the patch space induce inpainting schemes of different nature. 3.2 Connections with statistical mechanics The proposed energy can be given another interpretation in terms of statistical mechanics, as the Helmholtz free energy of a system of particles. In this context we consider that for each “particle” x ∈ Õ there is a set of possible configurations indexed by the parameter x̂ ∈ Õc with a probability density w(x, x̂). If we consider that each configuration has an energy U (x, x̂) = ε(pu (x) − pu (x̂)), then the probability that the particle at x is in the state x̂ is proportional to e−βU(x,x̂) /q(x) with the normalization factor q given by the partition function X q(x) = e−βU(x,x̂) . x̂ Then the Helmholtz free energy for a particle x is H(x) := U (x) + β −1 E(x) := X X := U (x, x̂)w(x, x̂) + β −1 w(x, x̂) log w(x, x̂). x̂ x̂ Then the total Helmoltz free energy for the system of particles is X XX H := H(x) = U (x, x̂)w(x, x̂)+ x x x̂ β −1 XX x x̂ w(x, x̂) log w(x, x̂), where the sums are extended to x ∈ Õ and x̂ ∈ Õc . We have used the notation β as h1 . 3.3 Revisiting related work In this section we will briefly review the connections of our work with other inpainting algorithms and also with existing variational models of non-local regularization which have been proposed in contexts such as image denoising. The method in [56] is closely related to the patch NL-means scheme of Eq. (9). The key difference lies in the underlying theoretical model. The problem is addressed as a MRF, where pixels outside the hole are observable variables, missing pixels in the hole are the parameters, and the hidden variables are given by the correspondence Γ : O → Oc , which assigns a patch outside the hole to each x in O. The method can be seen as an approximate EM algorithm for maximizing the log-likelihood w.r.t. the pixels in O, and some approximations have to be taken to make the optimization tractable. Based on heuristics, the authors also propose to use more robust estimators than the mean for the synthesis of pixels. Within the framework here proposed, robust estimators (as the median) naturally result from particular choices of the patch error functions ε(·). In [38] an energy is presented which can be seen as a limit case of the patch NL-means energy when h → 0. The authors propose modifications of the energy which improve the results, such as some spatial localization of the similarity weights and brightness invariance. The latter is achieved by introducing a multiplicative constant that matches the mean illuminance between each pair of patches. The patch NL-means algorithm is also related to the manifold image models of [49]. Eq. (9) can be split into two steps which are analog to the manifold and image projection steps used in [49]. First, for each patch cene we compute a new patch as a weighted avertered in O age of all patches in the complement, R according to the patch similarity weights pMS ec w(z, ẑ)pû (ẑ)dẑ u (z) := O e Doing this for each hole position yields an with z ∈ O. incoherent ensemble of patches. The imageR is obtained 1 by averaging these patches: u(z) = A(Ω pMS u (z − p ) Ωp y, y)dy. We use a density model, instead of the manifold model of [49]. Indeed, pMS u (x) is the mean shift operator applied to pu (x). It is known that the iteration of this operator corresponds to an adaptive gradient ascent of the Parzen estimate of a PDF [21], which in this case is generated by the set of patches in the complement of the hole. The use of a density model entails 13 some advantages, mainly from the computational point of view, learning a manifold model is computationally costly. Furthermore, the assumption that patches lie on a manifold is questionable (one could think for instance in a stratification as a more realistic model), and its dimension is hard to determine for real images. In the following we will comment on the relation of this model with recent works on non-local regularization. The UINTA algorithm, presented in [5] is a nonlocal denoising algorithm that minimizes the entropy of the patches in the image. Casting this idea to the context of inpainting the UINTA’s entropy is estimated as the sample mean Z Z 1 2 EU (u) = − log exp(− kpu (x) − pû (x̂)k )dx̂ dx h e ec O O where the inner integral is the probability of occurrence of the patch pu (x) obtained as a Parzen density estimate. The corresponding Euler-Lagrange equation can be solved with a fixed point iteration which coincides with the patch NL-means scheme (9). In [5] this energy is minimized by considering all patches as independent (disregarding the overlap between neighboring patches), and evolving each of them according to a gradient descent of EU . After this, an image is formed with the centers of these new patches. The repetition of this process results in an iterative application of pixel-wise NL-means. In [13] the authors use a variational principle for deriving the iterated pixel NL-means regularizer, and show its application to the restoration of texture. The underlying energy corresponds to the quadratic penalty between the solution image u, and a pixel NL-means type average of the noisy input image û. The weights for this average are computed using u. Due to the dependence of the weights with the regularized image u, the minimizer is no longer a weighted average as NL-means, but the solution of a nonlinear optimization problem. It is shown that if the derivative of the nonlinear component is neglected, the resulting Euler-Lagrange equation matches the proposed fixed point algorithm: The iterated NL-means regularizer. In [52] the authors presented a variational framework for image denoising consisting in non-local regularization and data adjustment terms. Inpainting could be performed by considering only the data term as follows: Z Z exp(ε(pu (x) − pû (x̂)))dx̂dx EP (u) = − e O ec O This energy is the same as the one adapted from the UINTA algorithm EU , without the logarithm. In [52] the Euler-Lagrange equation is solved with a fixed point iteration. This model has two differences with our framework. First it allows to use a more general nonlinearity for the computation of the weights other than the exponential. Second, even in the case of the exponential, the methods differ in the normalization, for instance, when ε is the squared L2 -norm, the resulting scheme is as the patch NL-means, with the unnormalized weights. After its introduction in [3], our model has been interpreted as a non-local self-similarity regularizer in [51], where in conjunction with appropriate data fitting terms it has been applied to the solution of inverse problems, including inpainting, super-resolution and compressive sensing. In [51] a different patch-error function ε is used, namely the L2 -norm between patches (without squaring it). This choice is motivated as a patch-wise version of their work [50] on non-local Total Variation [32,42,57] with adaptive weights. This patch-wise non-local TV is defined as the L1 -norm of the non-local gradient of the patch valued image pu : Ω → RΩp . The non-local gradient is defined as a function ∇w pu : Ω × Ω → RΩp given by ∇w pu (x, x̂) = w(x, x̂)(pu (x) − pu (x̂)). Thus, the patch-wise non-local TV reads Z Z w(x, x̂)kpu (x) − pu (x̂)k2 dx̂dx. k∇w pu k := Ω Ω Note that in this sense, the model of the patch-NL medians corresponds an anisotropic version of the nonlocal TV where the 2-norm in the integral is replaced by the 1-norm. Our work and the work of [51] are complementary. In [51] the regularization term is fixed, and the authors focus on the possibilities given by different data terms suited for different applications. On the other hand in this work we focus on the regularization term exploring its properties with different patch error functions ε, and applying them to a problem in which the data term plays no role at all, since there is no data to adjust to. 4 Multiscale scheme Exemplar-based inpainting methods show a critical dependence with the size of the patch. In Figure 7, we show completions obtained with patch NL-means using different patch sizes: Two results with a small patch (a = 4) and one result with a large patch (a = 19). The latter is able to reproduce the periodic pattern of the lamps, but the completion is blurry due to the spacial overlap of the patches and presents many discontinuities at the boundary of the hole. The results with the small patch do not show these artifacts, but one of them has failed to reproduce the 14 Fig. 7 Mitama Matsuri. Left column: inpainting domain and initial condition. For the rest of the columns, from left to right single scale inpainting with a 9 × 9 patch with a = 4, single scale inpainting with a 43 × 43 patch with a = 19 and multi-scale inpainting, with S = 2, covering a patch size from 9 × 9 with a = 4 to 43 × 43 with a = 19. All results have been computed with the patch NL-means scheme. The bottom row shows the regions in which the nearest neighbor offset is constant. lamps. The only difference between both is the initialization. One of them was initialized with the original image shown in the bottom left, whereas the other one with the result obtained with the multiscale approach described in this section. Figure 8 shows a synthetic problem with similar characteristics. The result with a large patch recovers the shape of the incomplete rectangle, but does not continue its texture correctly. Three different small scale results are shown. They are all minima of the energy with minimal energy, E = 0. Patches are too small to capture that these completions are different. Which one is obtained will depend on the initialization. As done by almost all state of the art exemplar based inpainting methods (e.g. [38,39,56]), we will incorporate a multiscale scheme, described below. This is usually motivated as an heuristic to avoid local minima, to find a good initialization and/or to alleviate the computational cost. We believe however (following [34]) that, as these examples suggest, inpainting is inherently a multiscale problem: Images have structures of different sizes, ranging from large objects to fine scale textures and edges. The multiscale scheme responds to the fact that several patch sizes are needed to reproduce all these structures properly. 4.1 Multiscale scheme In the following we describe the multiscale method we adopted in this work, which goes along the lines of what is customary in the literature [30,38,56]. It consists on sequentially applying the inpainting scheme on a Gaussian image pyramid, starting from the coarsest scale. The result at each scale is upsampled and used as ini- tialization for the next finer scale. The patch size is constant through scales. Let us consider S scales, the finest denoted with s = 0. We will specify the size of the image at the coarsest level AS−1 . Denoting the size of the image at the finest scale by A0 , we compute the sampling rate as r := (A0 /AS−1 )1/(S−1) ∈ (0, 1). The width of the Gaussian filtering is associated to the subsampling factor as in [46]. Let a0 be the size of the patch and Ea0 the corresponding energy. We will add the superindex s = 0, . . . , S − 1 to the variables u and w to denote the scale. As before, the subindex 0 refers the initial condition, i.e. us0 is the initial condition at scale s. Algorithm 2 Multiscale scheme. Require: uS0 , S, a0 and AS−1 1: Initialize: (uS−1 , wS−1 ) = arg min(u,w) Ea0 (u, w) 2: for each scale s = S − 2, . . . , 0 do 3: Upsample us+1 to obtain us0 4: (us , ws ) = arg min(u,w) Ea0 (u, w) 5: end for The upsampling from s + 1 to s is obtained as in [56]. The coarse weights ws+1 are first interpolated to the finer image size, yielding w0s . These weights are then used to solve an image update step at the new scale: us0 = minu Ea0 (u, w0s ). More conventional upsampling schemes by local interpolation (such as bilinear or splines) introduce a bias towards low-frequency non-textured regions. This exemplar-based upsampling avoids this bias. Notice that keeping the patch size constant while filtering and reducing the image, is almost equivalent to enlarging the patch domain and filtering an image 15 Fig. 8 Single scale inpainting. Synthetic inpainting problem (the red rectangle is the inpainting domain) and several local minima of a single scale inpainting energy with a small patch. of constant size. The process can thus be seen as the sequential minimization of a series of inpainting energies with varying patch size given by as = 1/rs a0 , s = 0, . . . , S − 1, over a corresponding series of filtered images. The scheme starts by the coarsest scale S − 1, in which a larger portion of the inpainting domain is covered by partially known patches, which makes the inpainting task easier, and less dependent on the initialization. The energy at this scale should have fewer local minima. The dependency of the minimization process on the initial condition ensures also that each single scale solution corresponds to a local minimum close to the coarse scale initialization. Therefore the multiscale algorithm exploits this dependency to obtain an image u0 which is approximately self-similar for all scales (or equivalently, for all patch sizes). Figure 7 shows a comparison between single and multiscale results with the patch NL-means scheme. As expected the multiscale result shows the benefits of large and small patch sizes. The missing lamps have been completed with the correct shape and spacing by the coarser stages, and the fine details are also taken care off by the finer scales. The bottom row shows the regions of constant NNF. The single scale result with a large patch shows an NNF partitioned in relatively large regions, showing that with a large patch size the copying is more rigid. The result with a small patch size shows that the synthesis has been performed based on small regions. The multiscale’s NNF shows an intermediate partition, with some large regions inside of the hole and a smaller ones around its boundary. The finer inpaintings work by refining the coarse partition obtained at coarser scales. 5 Experimental results In this section we further demonstrate the performance of the proposed schemes on real inpainting problems. The images used were obtained from Komodakis and Tziritas [39] and from the 100 images benchmark proposed by Kawai et al. [38], available at http://yokoya. naist.jp/research/inpainting/. Due to the lack of space we focus on the methods presented in this work. Please refer to [38,39] for a comparison with their approaches. 5.1 Experimental setting We consider four inpainting methods, namely patch NL-means, -medians, -Poisson and -GM. Both gradient based methods are always combined with patch NLmeans with mixing parameter λ, as in (14). In all cases we use the multiscale approach. As stated before, to prevent blurring we set h, the selectivity of the similarity weights w, to h → 0. In this case, the weights select e We show rethe nearest neighbors of each patch in O. sults with the CIE La*b* color space. The calculation of the weights dominates the computational load of the algorithms. With an exhaustive search for the exact nearest neighbor, the cost of each iteration is O(A(O)× A(Oc )× s2 ). However, a significant speed-up can be obtained with approximate searches, almost without any noticeable decrement in the quality of the results (see [13] and references therein). In our implementation we use a modified version of the PatchMatch algorithm introduced in [7]. PatchMatch is an iterative algorithm that estimates the NNF jointly for all patches in the inpainting domain by exploiting the coherence of natural images. The modifi- 16 cation we implemented allows the estimation of a lists of the L first nearest neighbors of each patch. The algorithm has a computational cost of O(A(O) × s2 × L) per iteration. Typically between 5 and 10 iterations are sufficient to obtain results comparable with the exhaustive search algorithm when using lists of size L ≃ 10. In Appendix C we describe the modified PatchMatch. The image update step is fast for the patch NLmeans, -median and -Poisson, but can be time consuming for patch NL-GM, in particular for small values of the mixing parameter λ. See Appendix B for more details. The results are shown in figures 9, 10 and 11. The first column shows the initialization. The rest of the columns show the results obtained with patch NL-means, -medians, -Poisson and -GM in that order. Obtaining good results requires fixing the following parameters: Patch size. For almost all experiments we used patches of size s between 3 × 3 and 9 × 9. We did not use Gaussian intra-patch weights to shape the patch, since they need a larger support, which implies less available exemplars. Multiscale parameters. The multiscale scheme has two parameters: the size of the coarsest image AS−1 and the number of scales S. From these, AS−1 is the most critical parameter. Smaller images with smaller inpainting domains are easier to complete. Care must be taken, however, since small images also imply less available exemplars to copy from. In these experiments AS−1 was set to a 20% of the original size when possible. Some cases required less subsampling. The number of scales S was set such that the subsampling rate r = (A0 /AS−1 )1/(S−1) ≈ (1/2)1/3 ≈ 0.8 as in [56]. Confidence mask. The confidence mask has two parameters, the asymptotic value c0 and the decay time tc . For all experiments we fix c0 = 0.1 and used a decay time tc = 5 except for small inpainting domains, in which we set tc = 1. For the mixing coefficient λ of gradient-based methods we tested two configurations: Low-λ corresponding to λ = 0.01 and λ = 0.001 for patch-NL Poisson and GM, and high-λ corresponding to λ = 0.1 and λ = 0.01 for patch-NL Poisson and -GM. Recall that lower values of λ give a higher weight to the gradient component of the energy. This is appropriate for structured images with strong edges. In almost each row of figures 9, 10 and 11 we used the same λ configuration for both gradient-based schemes. The rest of the parameters are also the same for each row in figures 10 and 11. Instead, for many images in Fig. 9 we used different parameters for intensity- and gradient-based methods. In these cases, image-based methods required larger patches than gradient-based ones. 5.2 Observations and comments Gradient vs. intensity. Gradient-based methods show a better reconstruction of structure (Fig. 9) and periodic patterns (Fig. 11). Intensity-based methods present discontinuities at the boundary of the hole and between copy fronts. Furthermore, gradient-based methods facilitate the prolongation of structures and edges, due to the reinforcement of local PDE diffusion and non-local propagation (see Section 2.2). This allows to use smaller patches, which alleviates the computational load and reflects in more available exemplars and less blending due to patch overlap (either by averages or medians). On the other hand, gradient-based methods show problems with random textures (Fig. 10) and in some occasions between copy fronts. In these cases, the nearest neighbors of overlapping patches disagree on the gradient of a certain pixel. This may result in the attenuation or the omission of gradients, causing a “spilling effect”, particularly for the patch NL-Poisson (e.g. rows 1,3 and 4 of Fig. 9). Some failures of gradient-based methods on random textures result from the use of gradients in the patch error function, as in rows 4 and 6 of Fig. 10. In the former, for instance, segments of the sky have been reproduced in the snow. Fig. 12 shows results with an inpainting scheme in which the weights are computed based only on the image values (with the squared L2 -norm), and the image is updated using patch NL-Poisson or -GM with a low value of λ. Such scheme is non-variational and its convergence its not guaranteed. We should point out that in many other cases, considering gradients in the comparison criterion improves the results. See for instance rows 2 and 4 of Fig. 11. Means vs. medians. It is notorious that L1 -based functionals perform better at the reproduction of fine texture. The results of the L2 methods are smoothed by the spatial averaging of overlapping patches. On the other hand, patch NL-medians creates sharp discontinuities as in Fig. 2 when different copy fronts meet (e.g. rows 1, 3, 5 and 7 in Fig. 9). These discontinuities are very noticeable and in these cases some smoothing is desirable. For the patch NL-GM method, discontinuities between copy fronts occur at the gradient level, and are eliminated after its integration. The fact that the gradient is estimated robustly helps in avoiding the spilling effect of patch NL-Poisson. 17 Fig. 9 Results on structured images. From left to right: Original image and mask, patch NL-means, -medians, -Poisson and -GM. 18 Fig. 10 Results on random textures. From left to right: Original image and mask, patch NL-means, -medians, -Poisson and -GM. All images in each row have been generated with the same parameters. We have observed that L1 methods are more reluctant to make changes during the minimization process. The same robustness that allows a better performance with textures and avoids the spilling effect makes them more greedy. Once a set of neighboring patches have settled on a locally stable solution (typically a region of constant NNF), it is hard for the algorithm to change that local decision. To understand why, imagine a copy front advancing from the boundary carrying correct information which meets an already settled copy front, which has taken an undesirable decision based on a bad initialization. Pixels on the boundary of the mistaken front start receiving contributions from patches in the advancing front. Initially these contributions will be outliers in the distribution from which the pixel value is estimated. The median will discard these outliers, and 19 Fig. 11 Results on periodic textures. From left to right: Original image and mask, patch NL-means, -medians, -Poisson and -GM. All images in each row have been generated with the same parameters. the pixel value will not change unless of course, patches in the advancing front outnumber the mistaken ones. Although the confidence mask diminishes this effect, L1 -based methods (specially patch NL-medians) still show more dependence on the initialization. A result of this are some misalignments in straight lines, due to subsampling artifacts of the multiscale scheme. Also patch NL-medians generally requires the use of larger patches, particularly for structured images. This is not always possible, as in row 6 of Fig. 9, where we could not find a proper set of parameters. However we found a good result by after 3 iterations of the multiscale scheme. The focus of this work lies more on introducing and exploring the variational framework than in presenting a single inpainting algorithm. Still, we would like to comment on which would be the best method among the ones presented. Based on the previous observations, it is clear that the answer to this question will depend on the characteristics of the inpainting problem. However, a priori, the patch NL-Poisson seems a reasonable compromise between quality and computational cost. It is also able to provide good results for a wider range of parameters. For cases in which an accurate reproduction of random textures is important, patch NL-GM 20 production of textures. These schemes are however, more greedy, or “stiff,” and therefore more dependent on the initialization. Conversely the squared L2 -norm introduces some blur, and show less stiffness. 2. When gradients are considered in the patch error function, the image update step is computed as the solution of a (local) diffusion PDE, driven by a nonlocally estimated gradient field. This improves the synthesis in terms of continuity at the boundary and attenuation of seams. Also the local interpolation helps in the propagation of structures, such as edges at the boundary of the hole. Fig. 12 Results with a non variational algorithm. The gradient methods sometimes introduce blurry areas in the solution. Here we show the results using a non variational version of the patch NL-Poisson (left column) and patch NL-GM (right column) for some of those images. may be a better option, however at a higher computational cost. 6 Conclusions and future work In this work we presented a variational framework for exemplar-based non-local image inpainting. The proposed energy lends itself to intuitive interpretations in terms of probabilistic models and has connections with mean shift and statistical mechanics. We minimize it using a coordinate descent algorithm, alternating between similarity weights and the image updates. These processes are coupled through the patch error function, i.e. the criterion used to compare patches. We derived from this framework four different inpainting schemes, corresponding to error functions based on the combinations of L1 - and L2 -norms, with image or gradient patches. After testing these schemes on synthetic and real problems we arrived to the following conclusions. 1. The robustness of the L1 -norm (patch NL-medians and NL-GM) makes it more appropriate for the re- The proposed functional shows a critical dependence with the patch size. Furthermore, it is nonconvex and has many local minima, particularly for small patch sizes. To tackle these issues we use a multiscale approach. This is customary in the literature, usually motivated as a way to obtain a good initialization and for computational reasons. We believe however that inpainting is inherently a multiscale problem, and that underlying the multiscale approach lies an inpainting criterion. We are currently working on a variational formulation of multiscale inpainting. Other topic of current research is the use of other patch error functions based on the comparison of structure tensors, which could provide a more robust estimation of the morphological structure of the image. A Existence of minima for the functional (2) Let us consider the model Z Z 1 w(x, x̂)kpu (x) − pu (x̂)k2 dx̂dx+ E(u, w) = e O ec h O Z Z + w(x, x̂) log w(x, x̂)dx̂dx. e O (17) ec O In this section we consider the squared L2 patch error function kpu (x) − pu (x̂)k2 = Z ga (y)(u(x + y) − u(x̂ + y))2 dy Ωp where ga denotes a smooth integrable kernel whose gradient is also integrable. Let e×O ec → R : P := {w : O Z ec O e w(x, x̂) dx̂ = 1 a.e. x ∈ O}. Let us consider the admissible class of functions AM := {(u, w) : u = û in O c , |u(x)| ≤ M w ∈ P}. Proposition 1 Assume that û ∈ BV (O c ). There exists a solution of the variational problem min (u,w)∈AM E(u, w). (18) 21 Proof Let (un , wn ) be a minimizing sequence of (18), i.e. a sequence such that E(un , wn ) → inf (u,w)∈AM E(u, w). Since Ω is a bounded domain we have that Z Z χ{w>1} wn (x, x̂) log wn (x, x̂)dx̂dx e O − = ec O is bounded. Hence wn (1 + log+ wn ) is bounded in L1 . Then the sequence wn is relatively weakly compact in L1 and by extracting a subsequence, if necessary, we may assume that wn weakly e×O e c ) to some w ∈ P. Let us now prove that converges in L1 (O Z ga (y)(u(x + y) − u(x̂ + y))2 dy ∇x+x̂ +2 ∇x−x̂ Ωp =2 Z Z ∇y ga (y)u(x + y)dy. Ωp Ωp and Z ga (y)(u(x + y) − u(x̂ + y))2 dy Ωp =2 ga (y)(u(x + y) − u(x̂ + y)) Ωp (∇y u(x + y) − ∇y u(x̂ + y))dy Z ∇y ga (y)(u(x + y) − u(x̂ + y))2 dy, Ωp the L1 integrability of ∇y ga (y) and the boundedness of u. We are also assumming that ga vanishes at the boundary of Ωp . Now, Z ∇x−x̂ ga (y)(u(x + y) − u(x̂ + y))2 dy Ωp =2 =2 = Z Z Z e O ec O w(x, x̂)F 2 + Z Z e O ec O w(x, x̂) log w(x, x̂)dx̂dx ≤ lim inf E(un , wn ). (∇x u(x + y) − ∇x̂ u(x̂ + y))dy Z =− are uniformly bounded when |u| ≤ M . Thus, modulo the extraction of a subsequence, we may assume that un → u weakly in all Lp , 1 ≤ p < +∞ and ga ∗ (un (x + ·) − un (x̂ + ·))2 converges strongly in all Lp spaces and also in the dual of L Log + L to some function F . Then by passing to the limit as n → ∞ we have 1 h ga (y)(u(x + y) − u(x̂ + y)) Ωp Z ga (y)∇y u(x̂ + y) u(x + y)dy − Ωp are uniformly bounded when |u| ≤ M . Indeed, the boundedness of the first integral is a consequence of Z ga (y)(u(x + y) − u(x̂ + y))2 dy ∇x+x̂ +2 ∇y ga (y)(u(x̂ + y) + 1)u(x + y)dy Ωp From our assumptions, the boundedness of the previous integral holds. Thus, the functions Z ga (y)(u(x + y) − u(x̂ + y))2 dy ∇x ∇x̂ ga (y)(u(x + y) − u(x̂ + y))2 dy ∇y ga (y)u(x + y)dy Ωp Ωp Ωp and Z Z Z Z Z Z ga (y)(u(x + y) − u(x̂ + y)) Ωp (∇x u(x + y) + ∇x̂ u(x̂ + y))dy ga (y)(u(x + y) − u(x̂ + y)) Ωp (∇y u(x + y) + ∇y u(x̂ + y))dy ga (y)(∇y u(x + y)2 − ∇y u(x̂ + y)2 )dy Ωp ga (y)(u(x + y)∇y u(x̂ + y) − u(x̂ + y)∇y u(x + y))dy Ωp Observe that Z ga (y)(u(x + y)∇y u(x̂ + y) − u(x̂ + y)∇y u(x + y))dy Ωp =− Z ga (y)((u(x̂ + y) + 1)∇y u(x + y) Ωp − u(x + y)∇y u(x̂ + y))dy + =− Z ga (y)∇y Ωp − = Z Ωp Z „ Z ga (y)∇y u(x + y)dy Ωp « u(x + y) (u(x̂ + y) + 1) (u(x̂ + y) + 1)2 dy ∇y ga (y)u(x + y)dy Ωp ∇y (ga (y)(u(x̂ + y) + 1)2 ) u(x + y) dy (u(x̂ + y) + 1) n e×O e c and using Taking test functions ϕ(x, x̂), integrating in O the convexity of the square function, we have Z ga (y)(u(x + y) − u(x̂ + y))2 dy ≤ F. Ωp Then E(u, w) ≤ lim inf n E(un , wn ). ⊔ ⊓ B Algorithm for solving Equation (13) Following [2, 18] we derive here a fixed point algorithm for a discrete version of (13). Discrete formulation. We consider discrete images defined over a rectangular bounded domain Ω = {0, . . . , N }2 . We are given an inpainting domain O ⊂ Ω and the known portion of an image û : O c → R. Let us define Oe as the set of pixels with at least one neighbor in O (according to the 4-connectivity). If we think as the discrete image as a lattice of nodes joined by edges, all variable edges are between nodes in Oe . We will use the notation [A → B] = {f : A → B}, the set of functions from A to B. We define ∇ : [Ω → R] → [Ω → R2 ] as  0 if i = N , ∇u1i,j := ui+1,j − ui,j if i < N , and similarly for the component ∇u2i,j . The notation z = (i, j) refers to for locations on the image. Let us also define a divergence operator ∇· : [Ω → R2 ] → [Ω → R], 8 8 1 2 > > if i = 0, if j = 0, < pi,j < pi,j if i = N , + −p2i,j−1 if j = N , ∇·pi,j := −p1i−1,j > > : p1 − p1 : p2 − p2 i,j i−1,j otherwise, i,j i,j−1 otherwise. These operators incorporate Neumann boundary conditions on the boundary of Ω, and are dual operators, i.e. denoting h·, ·i the usual scalar products in [Ω → R2 ] and [Ω → R2 ], h∇u, pi = −hu, ∇ · pi, for all u ∈ [Ω → R], p ∈ [Ω → R2 ]. Let us also define U := [Oe → R] and V := [Oe → R2 ]. In these spaces we will consider the usual scalar products and 22 will denote them as h·, ·iU and h·, ·iV . Notice that, with these definitions, the gradient and divergence are not dual operators when restricted to Oe , i.e. in general hu, ∇ · piU 6= −h∇u, piV . b := {u ∈ U : uz = We are looking for a completion u ∈ U ûz , ∀z ∈ O c } as the solution of the following problem: min X X b u∈U z∈Oe ẑ∈D mz ẑ k∇uz − vẑ k + 1 X cz (uz − fz )2 , 2δt z∈O (19) where v : D → R2 , m : Oe × D → R+ ∪ {0}, f : O → R and c : Oe → R+ are given. For the sake of generality we consider a generic domain D instead of the complement Oec . This generalization is relevant, due for instance to computational considerations as will be discussed shortly. Let us remark that, since cz > 0 for z =∈ Oe , the energy is convex and the minimum is unique. For simplicity we extend f over Oe by defining fz = ûz for z ∈ Oe \O. We use the fact that for any η ∈ R2 , kηk = supkεk<1 η· ε, and write the energy as: min X X sup b kε k61 u∈U z ẑ z∈Oe ẑ∈D mz ẑ εz ẑ ·(∇uz −vẑ )+ 1 X cz (uz −fz )2 . 2δt z∈O e We have defined the unit field ε ∈ W := {ε : Oe × D → R2 }. Interchanging the minimum with the supremum, we obtain the following expression for u in the inpainting domain u∗z = fz + δtc−1 z ∇ · Lεz , z ∈ O. (20) where we have defined the linear operator L : W → V as Lεz = P ẑ∈D mz ẑ εz ẑ . We can extend expression (20) to Oe , by defining a new diO vergence operator ∇· : V → U as ∇ · εz := χO z ∇ · εz . Here χz = 1 O if z ∈ O and χz = 0 otherwise. We also define a modified gradient ∇· : U → V as the negative adjoint operator of ∇· , given by ∇uz := ∇[χO z uz ]. Besides allowing to extend (20) these operators are relevant since, as opposed to the usual gradient and divergence operators, they are still dual when they are restricted to functions over Oe : h∇ · p, uiU = −hp, ∇uiV . Substituting expression (20) in the energy, we arrive to the following dual formulation: min kεz ẑ k61 − « „ ˜ ˆ 1 δt Lεz · ∇ c−1 ∇ · Lε z + c−1 [∇ · Lεz ]2 2 z∈Oe 3 2 X X 4Lεz · ∇fz − mz ẑ εz ẑ · vẑ 5 . − X z∈Oe ẑ∈D Using that ∇[c−1 ∇ · Lε]z = ∇[c−1 ∇ · Lε]z for all z ∈ Ω, we can rewrite the dual problem with the more compact expression min kεz ẑ k61 X z∈Oe 2 X 1 + ν mz ẑ k − ∇fz + vẑ − δt∇[c−1 ∇ · Lεt ]z k (24) O \O where hz ẑ := mz ẑ (vẑ − ∇[χz e fz ]) and h·, ·iW is the usual scalar product in W . This expression shows that, since cz > 0, there is a unique minimum value for ∇ · Lε. We prove now that the sequence defined in (23) converges to that minimum. P Proposition 2 If cz = ẑ∈D mz ẑ 6 1 for z ∈ Oe and the time 1 step ν verifies ν < 4δtK , with K := max i,j∈Oe  χO i,j + ci,j ci+1,j O χO i+1,j ; χi,j + ci,j ci,j+1 χO i,j+1 ﬀ , (25) the sequence (εt ) defined in (23) converges to a minimum of the energy (24). Proof For reasons of space, we only sketch the main steps of the proof. The derivations are similar to those in [2, 18]. Let us 1 , the energy J ∗ decreases. Defining first show that if ν < 4δtK −1 t+1 t η := ν (ε − ε ) for a given t > 0, we have that the variation in the energy can be bounded by 1 ν(νδtkc−1/2 ∇ · Lηk2U − kηk2W ) 2 1 6J ∗ (εt ) + kηk2W ν(4νδtK − 1), 2 J ∗ (εt+1 ) 6J ∗ (εt ) + (26) where K is a constant such that 4Kkηk2W > kc−1/2 ∇ · Lηk2U for all η ∈ W . We will derive this constant later. Therefore, by choosing ν < (4δtK)−1 the energy decreases with εt . Let m = limt→∞ J ∗ (εt ) > 0. Let us prove now that εt also converges. Since kεtz ẑ k 6 1 for all z ∈ Oe , ẑ ∈ D, t > 0, there exists a subsequence (εtk ) converging to ε. From (23) we see that (εtk +1 ) converges as well. Let ε′ = limtk →∞ εtk +1 . Defining η := (ε′ − ε)/ν, we see that in the limit when tk → ∞ J ∗ (ε′ ) 6 J ∗ (ε) + 12 kηk2W ν(4νδtK − 1). Since J ∗ (ε′ ) = J ∗ (ε) = m we can conclude that η = 0 and therefore ε = ε′ . This implies that the whole sequence (εt ) converges. Furthermore the limit ε verifies the Euler equation (22), and thus is a minimizer of the dual energy J ∗ . for operators A : V → U . For any η ∈ W we can write (22) , δt −1/2 kc ∇ · Lε + c1/2 f /δtk2U + hh, εiW , 2 (21) The Karush-Kuhn-Tucker Theorem yields the existence of the Lagrange multipliers with values λz ẑ = mz ẑ k − ∇fz + vẑ − δt∇(c−1 ∇ · Lε)z k. The solution of (21) is computed with the semi-implicit scheme εtz ẑ + ν mz ẑ [−∇fz + vẑ − δt∇[c−1 ∇ · Lεt ]z ] J ∗ (ε) = We now sketch the derivation of the expression for K. Let us define the following c-weighted norms: kuk2c := hc−1 u, uiU for u ﬀ ∈ Adding the Lagrange multipliers corresponding to the restricP tions kεz ẑ k 6 1, z,ẑ∈Oe ×D λz ẑ (kεz ẑ k2 − 1), and deriving with respect to ε we get the Euler equation εt+1 z ẑ = Convergence. Let us first observe that ∇fz − ∇fz = ∇[(1 − Oe \O χO fz ] for all z ∈ Ω, which allows to express z )fz ] = ∇[χz the dual energy (21) as 3 2 4 δt c−1 mz ẑ εz ẑ · vẑ 5 . z [∇ · Lεz ] − Lεz · ∇fz + 2 ẑ∈D mz ẑ [−∇fz + vẑ − δt∇[c−1 ∇ · Lε]z ] + λz ẑ εz ẑ = 0. where ν is a time step small enough for assuring the convergence of the fixed point iteration. Lastly the solution to the primal problem is recovered with (20). (23) U , kξk2c := hc−1 ξ, ξiV for ξ ∈ V and kAk2cc := supξ∈V kAξk2 c kξk2 c kc−1/2 ∇ · Lηk2U = k∇ · Lηk2c 6 k∇ · k2cc kLηk2c . On one hand we have that 9 8 = < X 2 2 −1 2 kLηkc 6 kηkW sup cz mz ẑ 6 kηk2W , ; : z∈Oe ẑ∈D where first inequality is an application of Cauchy-Schwarz, then P 2 P 2 we have used that i ai 6 ( i |ai |) , and the fact that cz = P ẑ∈D mz ẑ 6 1. On the other hand, from the definition of the divergence op⊔ ⊓ erator, it can be shown that k∇ · k2cc 6 4K. 23 800 Error Computational considerations. Let us remark that for the case of inpainting (D = Oec ) the domain of the dual variable ε is Oe × Oec ⊂ R4 , and therefore the direct implementation of this algorithm is prohibitive (for large images with large inpainting domains ε will not fit in memory). To circumvent this problem we threshold the pixel-wise influence weights mz ẑ , so as to keep for each z only the M highest contributions. This reduces the size of ε to M |Oe |. Indeed, when h → 0 the m funcion is already sparse w.r.t. ẑ, being the number of non-zero elements less or equal than the patch size (in pixels). In our experiments we set M to the size of the patch, capturing exactly the function m. 1 2 10 30 600 400 1 2 10 10 Time (seconds) Fig. 13 Performance of the modified PatchMatch. The graph shows the evolution of the nearest neighbor’s error with the iterations of the algorithm. Each graph corresponds to a different size L of the sets (L = 1 corresponds to the original PatchMatch [7]). The dots indicate the completion of an iteration. C Computation of the Nearest Neighbor Field The computation of the nearest neighbor (or of the weight function w) is the most time consuming step of an exemplar-based algorithm. Two key observations allow to reduce the computational load for this task. The first one is that, the nearest neighbor search can be approximated without compromising the quality of the output, allowing to trade precision for speed (see for instance [13] and references therein). The second one, as noted in [7], is that due to the coherence of natural images, nearest neighbors of patches centered on nearby pixels are likely to be located at nearby positions. PatchMatch is a very efficient algorithm for computing approximate nearest neighbors proposed in [7]. The authors in [7] consider two regions O and O c of the same or different images. The objective is to find for each patch centered in x ∈ O the location of the best match n(x) ∈ O c . The authors define the Nearest Neighbor Field (NNF) as the function x 7→ (n(x) − x) defined over O. Instead of searching for the nearest neighbor of each query patch independently, PatchMatch computes the NNF simultaneously for all query patches. It exploits the fact that since query patches overlapp, the offset n(x) − x of a good match for at x is likely to lead to a good match for the adjacent points of x as well. It is an iterative algorithm which starting from a random initialization, alternates between steps of propagation of good offsets and random search (see [7] for details). For most applications with natural images, a few iterations after a random initialization are often sufficient. We extended this algorithm following a suggestion in [7]. Instead of storing a single offset for each query patch, we store queues of L offsets in an L-Nearest Neighbors Field (L-NNF). Thus the number of initial random guesses gets multiplied by L. During the propagation step adjacent queues are merged in to a temporal queue of length 2L from which only the best L offsets are kept. This allows to preserve and transfer non optimal offsets that may in turn be optimal for farther positions. Our implementation of the random search is essentially as in [7]. The use of queues represents a significant improvement of performance with respect to the original PatchMatch. Figure 13 shows the evolution of the nearest neighbor’s error measured as P 2 x∈O kpu (x) − pu (n(x))k with time, using queues of different lengths. Even L = 2 allows to achieve results of similar quality in less time. Clearly L cannot grow unbounded: with L = 30 the first iteration got to a higher precision but it took more than 10 seconds. In practice we set L ≤ 10 and perform a small number of iterations. Let us remark that each queue is an approximated neighborhood of the query patch in the patch space. This neighborhood can be interpreted as a truncated representation of the function w(x, ·) for each pixel x, which is useful in the non-degenerated case when h > 0 in (6). There is an interesting parallelism between this algorithm and the mechanic of Genetic Algorithms. The L-NNF problem can be seen as the simultaneous minimization of many fitness functions, one for each pixel in O. Each fitness function is defined as the sum of the patch errors over the corresponding queue. If we identify the individuals with the queues for each pixel in O, then the population for single problem is composed by the queue at the pixel and its immediate neighbors (5 individuals). The random search can be interpreted as mutations, the propagation as an optimal crossover between individuals (queues). The population overlap of neighboring problems responds to the fact that adjacent problems are likely to the have similar solutions. Acknowledgments The authors thank Pau Gargallo for his insightful comments. PA is supported by the FPI grant BES-2007-14451 from the Spanish MICINN. PA, GF, and VC acknowledge partial support by MICINN project, reference MTM2009-08171, and by GRC reference 2009 SGR 773. VC also acknowledges partial support by IP project ”2020 3D Media: Spacial Sound and Vision”, financed by EC, and by ”ICREA Acadèmia” prize for excellence in research funded both by the Generalitat de Catalunya. GS is partially supported by NSF, ONR, DARPA, NGA, and ARO. References 1. Aharon, M., Elad, M., Bruckstein, A.: The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representatio. IEEE Trans. on Signal Processing 54(11), 4311–22 (2006) 2. Almansa, A., Caselles, V., Haro, G., Rougé, B.: Restoration and zoom of irregularly sampled, blurred, and noisy images by accurate total variation minimization with local constraints. Multiscale Modeling & Simulation 5(1), 235– 272 (2006) 3. Arias, P., Caselles, V., Sapiro, G.: A variational framework for non-local image inpainting. In: Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 345–358. Springer Berlin Heidelberg, Berlin, Heidelberg (2009) 4. Aujol, J.F., Ladjal, S., Masnou, S.: Exemplar-based inpainting from a variational point of view. Submitted (2008) 5. Awate, S., Whitaker, R.: Unsupervised, informationtheoretic, adaptive image filtering for image restoration. IEEE Trans. on PAMI 28(3), 364–376 (2006) 6. Ballester, C., Bertalmı́o, M., Caselles, V., Sapiro, G., Verdera, J.: Filling-in by joint interpolation of vector fields and gray levels. IEEE Trans. on IP 10(8), 1200–11 (2001) 24 7. Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D.B.: Patchmatch: a randomized correspondence algorithm for structural image editing. In: Proc. of SIGGRAPH, pp. 1– 11. ACM, New York, NY, USA (2009) 8. Bertalmı́o, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. In: Proc. of SIGGRAPH (2000) 9. Bertalmı́o, M., Vese, L., Sapiro, G., Osher, S.: Simultaneous structure and texture inpainting. IEEE Trans. on Image Processing 12(8), 882–89 (2003) 10. Bornard, R., Lecan, E., Laborelli, L., Chenot, J.H.: Missing data correction in still images and image sequences. In: Proc. ACM Int. Conf. on Multimedia (2002) 11. Bornemann, F., März, T.: Fast image inpainting based on coherence transport. J. of Math. Imag. and Vis. 28(3), 259– 78 (2007) 12. Brox, T., Cremers, D.: Iterated non-local means for texture restoration. pp. 13–24. Springer, Germany (2007) 13. Brox, T., Kleinschmidt, O., Cremers, D.: Efficient nonlocal means for denoising of textural patterns. IEEE Trans. on IP 17(7), 1057–92 (2008) 14. Buades, A., Coll, B., Morel, J.: A non local algorithm for image denoising. In: Proc. of the IEEE Conf. on CVPR, vol. 2, pp. 60–65 (2005) 15. Buades, A., Coll, B., Morel, J.M., Sbert, C.: Self-similarity driven color demosaicking. Image Processing, IEEE Transactions on 18(6), 1192–1202 (2009) 16. Candes, E.J., Wakin, M.B.: An introduction to compressive sampling. Signal Processing Magazine, IEEE 25(2), 21–30 (2008) 17. Cao, F., Gousseau, Y., Masnou, S., Pérez, P.: Geometricallyguided exemplar-based inpainting. Submitted (2008) 18. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1-2), 89–97 (2004) 19. Chan, T., Kang, S.H., Shen, J.H.: Euler’s elastica and curvature based inpaintings. SIAM J. App. Math. 63(2), 564–92 (2002) 20. Chan, T., Shen, J.H.: Mathematical models for local nontexture inpaintings. SIAM J. App. Math. 62(3), 1019–43 (2001) 21. Cheng, Y.: Mean shift, mode seeking and clustering. IEEE Trans. on PAMI 17(8), 790–99 (1995) 22. Criminisi, A., Pérez, P., Toyama, K.: Region filling and object removal by exemplar-based inpainting. IEEE Trans. on IP 13(9), 1200–1212 (2004) 23. Csiszár, I.: Axiomatic characterizations of information measures. Entropy 10(3), 261–73 (2008) 24. Demanet, L., Song, B., Chan, T.: Image inpainting by correspondence maps: a deterministic approach. Tech. rep., UCLA (2003) 25. Drori, I., Cohen-Or, D., Yeshurun, H.: Fragment-based image completion. ACM Trans. on Graphics. Special issue: Proc. of ACM SIGGRAPH 22(3), 303–12 (2003) 26. Efros, A., Leung, T.: Texture synthesis by non-parametric sampling. In: Proc. of the IEEE Conf. on CVPR, pp. 1033– 1038 (1999) 27. Elad, M., Starck, J., Querre, P., Donoho, D.: Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA). Applied and Computational Harmonic Analysis 19(3), 340–358 (2005) 28. Esedoglu, S., Shen, J.H.: Digital image inpainting by the Mumford-Shah-Euler image model. European J. App. Math. 13, 353–70 (2002) 29. Facciolo, G., Arias, P., Caselles, V., Sapiro, G.: Exemplarbased interpolation of sparsely sampled images. In: Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 331–344. Springer Berlin Heidelberg (2009) 30. Fang, C.W., Lien, J.J.J.: Rapid image completion system using multiresolution patch-based directional and nondirectional approaches. Trans. Img. Proc. 18(12), 2769–2779 (2009) 31. Gilboa, G., Osher, S.: Nonlocal linear image regularization and supervised segmentation. SIAM Mult. Mod. and Sim. 6(2), 595–630 (2007) 32. Gilboa, G., Osher, S.: Nonlocal operators with application to image processing. Tech. rep., UCLA (2007) 33. Hays, J., Efros, A.: Scene completion using millions of photographs (2008) 34. Holtzman-Gazit, M., Yavneh, I.: A scale-consistent approach to image completion. Int. J. Multiscale Comput. Eng. 6(6), 617–628 (2008) 35. Igehy, H., Pereira, L.: Image replacement through texture synthesis. In: Proc. of the IEEE Conf. on CVPR (1997) 36. Jaynes, E.T.: Information theory and statistical mechanics. Physical Review 106(4), 620–30 (1957) 37. Jia, J., Tang, C.K.: Inference of segmented color and texture description by tensor voting. IEEE Trans. on PAMI 26(6), 771–86 (2004) 38. Kawai, N., Sato, T., Yokoya, N.: Image inpainting considering brightness change and spatial locality of textures and its evaluation. In: T. Wada, F. Huang, S. Lin (eds.) Advances in Image and Video Technology, Lecture Notes in Computer Science, vol. 5414, chap. 24, pp. 271–282. Springer Berlin Heidelberg, Berlin, Heidelberg (2009) 39. Komodakis, N., Tziritas, G.: Image completion using efficient belief propagation via priority scheduling and dynamic pruning. IEEE Trans. on IP 16(11), 2649–61 (2007) 40. Levin, A., Zomet, A., Weiss, Y.: Learning how to inpaint from global image statistics. In: Proc. of ICCV (2003) 41. Levina, E., Bickel, P.: Texture synthesis and non-parametric resampling of random fields. Annals of Statistics 34(4) (2006) 42. Lezoray, O., Elmoataz, A., Bougleux, S.: Graph regularization for color image processing. Comput. Vis. Image Underst. 107(1-2), 38–55 (2007) 43. Mairal, J., Sapiro, G., Elad, M.: Learning multiscale sparse representations for image and video restoration. SIAM MMS 7(1), 214–241 (2008) 44. Masnou, S.: Disocclusion: a variational approach using level lines. IEEE Trans. on IP 11(2), 68–76 (2002) 45. Masnou, S., Morel, J.M.: Level lines based disocclusion. In: Proc. of IEEE ICIP (1998) 46. Morel, J.M., Yu, G.: On the consistency of the SIFT Method. Preprint, CMLA 26 (2008) 47. Pérez, P., Gangnet, M., Blake, A.: Poisson image editing. In: Proc. of SIGGRAPH (2003) 48. Pérez, P., Gangnet, M., Blake, A.: PatchWorks: Examplebased region tiling for image editing. Tech. rep., Microsoft Research (2004) 49. Peyré, G.: Manifold models for signals and images. Comp. Vis. and Im. Unders. 113(2), 249–260 (2009) 50. Peyré, G., Bougleux, S., Cohen, L.: Non-local regularization of inverse problems. In: ECCV ’08, pp. 57–68. SpringerVerlag, Berlin, Heidelberg (2008) 51. Peyré, G., Bougleux, S., Cohen, L.D.: Non-local regularization of inverse problems. Preprint Hal-00419791 (2009) 52. Pizarro, L., Mrázek, P., Didas, S., Grewenig, S., Weickert, J.: Generalised nonlocal image smoothing. Technical Report No. 248, Department of Mathematics, Saarland University, Saarbrücken, Germany (2009) 53. Protter, M., Elad, M., Takeda, H., Milanfar, P.: Generalizing the non-local-means to super-resolution reconstruction. IEEE Trans. on IP 18(1), 36–51 (2009) 25 54. Tschumperlé, D., Deriche, R.: Vector-valued image regularization with PDE’s: a common framework for different applications. IEEE Trans. on PAMI 27(4) (2005) 55. Wei, L.Y., Levoy, M.: Fast texture synthesis using treestructured vector quantization. In: Proc. of the SIGGRAPH (2000) 56. Wexler, Y., Shechtman, E., Irani, M.: Space-time completion of video. IEEE Trans. on PAMI 29(3), 463–476 (2007) 57. Zhou, D., Schölkopf, B.: Regularization on discrete spaces. In: Proceedings of the 27th DAGM Symposium, pp. 361–368. Springer (2005)

Log In

A Variational Framework for Exemplar-Based Image Inpainting

Related papers

Related papers

Related topics