SparseDA ICNSC2013
SparseDA ICNSC2013
SparseDA ICNSC2013
Abstract—In this paper, we focus on sparse signal recovery techniques either do not explicitly incorporate model dynamics
methods for data assimilation in groundwater models. The or make additional assumptions about the system. In one of
objective of this work is to exploit the commonly understood the earliest examples of such considerations, the work by
spatial sparsity in hydrodynamic models and thereby reduce
the number of measurements to image a dynamic groundwater Jafarpour et al [6] focuses specifically on hydrogeological
profile. To achieve this we employ a Bayesian compressive sensing reservoir systems. It uses a Discrete Cosine Transform (DCT)
framework that lets us adaptively select the next measurement parametrization for states as such systems naturally exhibit
to reduce the estimation error. An extension to the Bayesian a sparse representation in the spatial domain. They use the l1
compressive sensing framework is also proposed which incorpo- norm optimization technique to promote sparse signal recovery
rates the additional model information to estimate system states
from even lesser measurements. Instead of using cumulative and performs data assimilation by minimizing the l2 norm of
imaging-like measurements, such as those used in standard the observations with the estimated states. History matching
compressive sensing, we use sparse binary matrices. This choice is done by placing weights and regularization parameters
of measurements can be interpreted as randomly sampling only a on each of the terms and the collective cost function thus
small subset of dug wells at each time step, instead of sampling obtained is minimized to estimate the system state vector.
the entire grid. Therefore, this framework offers groundwater
surveyors a significant reduction in surveying effort without However, there is no procedure defined for setting or adjusting
compromising the quality of the survey. regularization coefficients and weights and the paper mentions
that their value can be set from prior belief. Another example
I. I NTRODUCTION of using compressive sensing like ideas is [13] which studies
The focus of this paper is to investigate techniques for the application of the compressive sensing procedure for soil
state estimation using a small number of measurements and moisture levels.
to propose a framework for data assimilation using those The framework introduced in [10] also exploits the sparsity
techniques, with a special emphasis groundwater monitoring. of a given dynamical system in some transform domain and
For many geophysical processes, groundwater being a case in merges it with compressed sensing framework. It works by
point, the number of available measurements is limited. Data defining a support set of the system state as the indices at
collection for groundwater models is especially cumbersome which the transform of the state vector is non zero. This
because the measurement process for groundwater level, flow framework then runs a reduced order Kalman filter for the non
or contamination involves digging of wells, collection of water zero elements of the state transform coefficients corresponding
samples and subsequent chemical analysis. Digging wells for to the support set. As the support set changes over time, this
the entire field may not be an economically justified option. change is detected by the error between the estimate and the
Even if the wells are already dug (as is the case for agricultural observations. The change is then estimated by performing the
lands in many provinces of Pakistan and India), collecting data l1 norm minimization of the vector that represents the supports
still is an expensive and time consuming exercise. Hence a data set, based on the assumption that the sparsity pattern of the
assimilation technique that makes the most of the limited data state vector changes slowly over time and thus the vector
available is most desirable. Another desirable feature of such representing change in the non zero indices will be sparse.
a technique would be the adaptive nature of its measurements Thus this framework makes the additional assumption about
so that we can theoretically predict which observations points the system that the sparsity pattern of the system states changes
are better than others to conduct the next round of survey such slowly with time. Moreover, it also requires knowledge of the
that the new observations would reduce the estimation error support set for the initial state.
the most. This work differs from all previous approaches in two
Recently, techniques have been introduced for signal re- respects. First, we have tackled the issue of point measure-
construction that exploit a sparse representation in some ments instead of imaging-like cumulative measurements used
transform domain. Most of such techniques have been inspired in conventional compressive sensing. This opens the avenue
by the Compressive Sensing (CS) paradigm. However, these for many other geophysical applications where sweep mea-
surements are not available but where sensors probe the field We now state the problem as follows. What is the minimum
directly at sample points. Secondly, we have given a Bayesian number of measurements M to estimate the joint pdf of
framework that tracks not just point estimates (as in the case of parameters and model state, given a dynamical model with
l1 -optimization in CS) but also their probability distribution. known uncertainties and that for a fixed time, the field ψ(x, .)
This enables one to study adaptation (between and during field is spatially sparse?
measurements) as well as a formal way to incorporate sparse
III. R ECONSTRUCTING S PARSE F IELDS
measurements into a Bayesian filtering framework. Although
the later has not been been fully developed in this paper, The compressive sensing paradigm works by exploiting the
glimpses of both adaptation and model incorporation have prior knowledge that the signal to be recovered is sparse or
been demonstrated on two examples via simulation results. compressible in some transform domain. Consider an N × 1
In both examples, one a toy 1D-heat equation and the other a column vector x in RN . The signal x can be represented in
2D contamination spread model in groundwater, we have ex- another basis of N × 1 vectors {Bi }N
i=1 as
ploited the diffusive nature of the dynamics and demonstrated N
X
adaptation to changing sparsity and model incorporation. x= si Bi , or x = Bs, (3)
i=1
II. P ROBLEM F ORMULATION
where s is the column vector of the weights si = BiT x, and
We now formulate the problem following the standard B is the N × N transformation matrix with Bi as its columns.
data assimilation terminology [5]. Given a domain D with Let y be the M × 1 column vector of measurements given by
boundary ∂D parameterized by coordinates x ∈ Rd , the
spatio-temporal dynamics of a groundwater-like geophysical y = Φx = ΦBs = θs, (4)
model are given by where Φ is the M ×N measurement matrix. In our setup, x can
∂ψ(x, t) be thought of as a grid representation of a static field ψ(x, .)
= G(ψ(x, t), α(x)) + u(x, t). (1) and Φ is a matrix model for the measurement functional M
∂t
introduced in Equation (2). A good choice of transform B
Here, the field to be estimated at each time step is ψ(x, t) ∈ to model spatial sparsity is the DCT. A vector representation
RN , where N is the number of model state variables de- of a signal s is called K-sparse if it has only K non-zero
scribing the field, each variable dependent on time and space. coefficients. This is measured by the l0 norm of the signal
G(ψ, α) is the nonlinear model operator, α(x) are spatially ksk0 .
varying parameters and u is a deterministic control input or
a noisy forcing function. Appropriate initial and boundary A. Standard Compressive Sensing Framework
conditions are also needed to solve the model. In most data The general problem of reconstructing x from y with the
assimilation problems, parameters α(x) are also unknown measurement matrix Φ is equivalent to the problem of recover-
or at best poorly known. In groundwater flow models these ing s from y with the transformed M ×N measurement matrix
parameters correspond to ground porosities or other geological θ. Thus to perform reconstruction using the compressive
structures. Here, we do not use this distinction between state sensing framework, we choose a transform domain in which s
and parameters and augment the state if such a situation arises. has a highly sparse representation. The problem of recovering
The observation process is defined generically as a measure- N × 1 column vector s from the M << N measurements
ment functional M[ψ]. In this paper, we focus on measurement in y is in general under-determined and a unique solution
functionals of the form to the problem does not exist. However, if the signal to be
Z Z recovered is sparse, and the indices of the non zero elements
Mi [ψ] = ψ T (x, t)δψi δ(t − ti )δ(x − xi )dtdx, (2) in the signal are known, then recovery is possible if the
number of measurements is greater than the number of non
which model point-sampling the field at location xi and time zero elements in the signal to be recovered. A necessary and
ti ). To simply notation, we will denote such measurements as sufficient condition for this to hold is that multiplication of θ
an M -dimensional vector y ∈ RM . Assumptions on sensor with any vector v that has the same non zero elements as s
noise can also be built into this simplified model. should not alter the length of v:
We make one additional assumption on this standard data
||θv||2
assimilation framework. In many geophysical problems and 1−≤ ≤ 1 + ,
in particular due to the diffusive properties of fluid flows, it ||v||2
is reasonable to assume that the field to be estimated will for a small > 0. If the indices of the non zero elements
not have very sharp spatial variations. This can be technically in the vector to be recovered are not known, then a sufficient
captured by saying that for a fixed time, the field ψ(x, .) is condition for recovery is that (5) should hold for any arbitrary
spatially sparse in a known transform domain. Specifically, if vector with three times the number of non zero elements as in
we take a Fourier-like transform of ψ(x, .), then the field can the original signal. This is known as the Restricted Isometric
be reconstructed using the knowledge of only a fraction of the property [3], [4]. Another requirement for recovery of the
transform coefficients. signal is that the columns of B should not be combinations of
a small number of rows of Φ and also the other way round. In constructed in this way with a different transformation basis,
other words, the two matrices should be incoherent and this the reconstruction results that they have reported indicate that
condition is referred to as incoherence [3], [4]. such matrices should work well in practice with the right
To construct a measurement matrix which satisfies the amount of available data.
above conditions would mean checking (5) for all possible
arrangements of the non zero elements in the RN vector C. Bayesian Compressive Sensing
space. However, we can simply select Φ as a random matrix The sparse signal recovery problem discussed earlier can
whose entries are chosen from independent and identically also be viewed from a Bayesian viewpoint. The framework is
distributed Gaussian variables [3], [4]. demonstrate that such inspired by work on Relevance Vector machines (RVM) [9]
a matrix would satisfy both the RIP and the incoherence and provides the probability distributions for the unknowns
property with high probability. The recovery algorithm itself instead of the point estimates provided by l1 norm based
is usually via an optimization program that minimizes the optimization techniques. Thus, this framework provides addi-
ksk0 while obeying the basic measurement equation (4). The tional information about the unknown signal and the resultant
breakthrough provided by CS framework is that this difficult probability distributions can also be used for further analysis.
combinatorial problem can be solved by a relaxation using It starts from the same linear regression problem
the l1 norm instead [2]. The success of the CS framework
also lies in the fact that such a recovery is possible with M of y = θs + n, (5)
the order of K log(N/K), for a K-sparse signal which is an where n is additive sensor noise. The RVM works by assuming
astonishingly low number compared to measuring x directly a prior distribution on s for a sparse estimate and gives the
using N measurements. distributions for the non zero weights in s.
B. CS with Sparse Measurement Matrices Assuming the additive noise n to be zero mean Gaussian
with variance σ 2 , the distribution for y can be written as [7]
In the standard framework described above, typical CS
measurements created from Gaussian random matrices are 1
p(y|s, σ 2 ) = (2πσ 2 )−K/2 exp(− ||y − θs||2 ), (6)
themselves dense. This means that each measurement is a 2σ 2
cumulative or weighted average of all the states variables. This where K, as before, counts the number of non zero elements
goes against the requirement of point measurements that are in s or its sparsity. This is the Gaussian distribution for the
themselves often very sparse for groundwater monitoring, as training data y using the Bayesian approach.
explained above. Thus the measurement matrix itself needs Now we introduce prior knowledge that s is sparse by
to be sparse, in addition to satisfying RIP and incoherence placing a prior distribution on s that promotes sparsity. A
properties. hierarchical prior is imposed on s which has been observed
The construction of sparse matrices can be done in multiple to yield sparse representations for s [9]. First, a zero mean
ways. Although, we are forced to place all zeros in the columns Gaussian prior is defined on s
for which the data points are unavailable, we are free to place
N
non-zero values in the rest of the columns in any way we like. Y
p(s|α) = N (si |0, αi−1 ), (7)
For example, we can randomly place a certain number of ones
i=1
in each of these columns at a random row numbers. We can
also place a single one in each of these columns in a different with αi as the inverse of variance of each element. The αi are
row number for every column. This would mean there is only a further drawn from a Gamma distribution. Next we obtain an
single one in each row and the number of rows will be equal to expression for the posterior p(s, α, σ 2 |y) by decomposing it
the number of elements of x that are available for observation. as
Such a matrix is more sparse than the one which randomly p(s, α, σ 2 |y) = p(s|y, α, σ 2 )p(α, σ 2 |y). (8)
places ones in the free columns but is certainly more desirable
The first part of (8) can be computed as
for storage purposes. Our simulation results have shown that,
for the missing data case, it does not make a difference if the p(y|s, σ 2 )p(s|α)
p(s|y, α, σ 2 ) = , (9)
binary matrix contains multiple ones in the columns for which p(y|α, σ 2 )
the data is available or if it contains a single one in a different
1
row in each column. We borrow this construction from [1]. = Cexp(− (s − µ)T Σ−1 (s − µ)), (10)
Thus, in our simulation and experiments, we use sparse binary 2
matrices for measurement which have as many rows as the where the mean and covariance are given by
number of points whose observation is available and contain
a single one in each row for each of the available data points. µ = σ −2 Σθ T y, (11)
It is worth noting here that the Binary matrices constructed Σ = (σ −2 θ T θ + B)−1 , (12)
as defined above have also been used in [13] which studies
the application of the compressive sensing procedure for soil where B = diag(α1 , α2 , ..., αN ). The second part of (8)
moisture levels. Although, the paper uses the binary matrices p(α, σ 2 |y) is approximated by a delta distribution at its
1
most probable values [9]. This would mean searching for the = − log |B + σ −2 θ T θ| + const. (18)
maximum values of p(α, σ 2 |y) and since 2
This expression depends on the measurement matrix θ with M
p(α, σ 2 |y) ∝ p(y|α, σ 2 )p(α)p(σ 2 ), (13) rows, the estimates for α and σ −2 and some other constants.
Now suppose we wish to add another row rM +1 to θ.
we maximize p(y|α, σ 2 ) as the distributions p(α) and p(σ 2 )
The effect of adding this new row would be to change the
are considered uniform because of the choice of parameters
differential entropy of x to [7]
for their priors. Now p(y|α, σ 2 ) is computable and given by
1
hM+1 (x) = hM (x) − log(1 + σ −2 rMT
Z
+1 ΣrM +1 ). (19)
p(y|α, σ ) = p(y|s, σ 2 )p(s|α)ds,
2
(14) 2
To minimize the uncertainty in x, we maximizing the term
1
= C|σ −2 I+θB −1 θ T |−1/2 exp(−y T ( (σ −2 I+θB −1 θ T )−1 y) T
rM
2 +1 ΣrM +1 = Var(yM +1 ), (20)
(15)
Now we need to obtain values for α and σ 2 which maximize which shows that the next projection rM +1 should be con-
the function in (15). As described in [9], this can be done by structed in such a way that the next measurement yM +1 comes
an iterative algorithm. Finally, the posterior distribution for the from the most uncertain subspace. This can be done simply
T
original signal x is also a multivariate Gaussian given by by an eigen-decomposition of Σ and selecting for rM +1 the
eigenvector with the largest eigenvalue [7].
E[x] = Bµ, Cov[x] = BΣB T . (16) This procedure can be directly adopted in the standard
CS framework with dense measurement matrices. However,
Recent work on the theoretical analysis of Relevance Vector this creates a problem for constructing sparse binary matrices
machines has shown that RVM in fact provides a tighter adaptively because the eigen-analysis of Σ will almost always
approximation to the l0 norm based optimization than the give dense eigenvectors. To take care of this, we look for the
l1 norm optimization technique[11], [12]. The iterative RVM nearest sparse binary approximation to the suggested rM +1
algorithm has also been reported to perform better or at using standard dot products to the basis frame. We have
least as good as other l1 norm based optimization algorithms. not done a formal analysis on performance loss due to to
Further, the Bayesian framework gives us a handle on the this approximation but in practise this works well for most
entire distribution rather than a point estimate. This means problems we have done simulations on.
that the RVM based compressive sensing is more suitable
alternative to the more popular l1 norm minimization based B. Incorporating Model Dynamics
techniques. This also opens the door to investigating how the The Compressive sensing framework and its Bayesian coun-
future measurements can be designed to improve the estimates terpart are used to estimate an individual sparse signal from
of the more uncertain points. its limited measurements. In order to estimate a sequence of
IV. A DAPTATION AND DYNAMICS signals varying over time, when the dynamics of the changing
system acting on the signal are also known, the framework
The measurement matrix Φ (or θ in the transform domain) needs to extended to incorporate the model dynamics as well.
for the compressive sensing techniques described earlier is As seen in the section on Bayesian Compressive sensing,
describes a predetermined and fixed sensing strategy and the framework imposes the sparsity constraint on the signal
does not involve adaptation between individual measurements. by assuming a hierarchical prior. The known model dynamics
Since the measurement matrices that we use are not dense but can be viewed as additional information about the signal. The
binary (signifying a temporal sequence), we explore whether standard way to proceed from here would be to use the Bayes
such matrices can be adaptively designed, i.e. adaptively Filter algorithm [5] to incorporate this prior. The filter can be
adding rows to the matrix while ensuring that the estimate implemented using an appropriate estimation technique such
obtained keeps getting more accurate. The advantage of such as a variant of the Kalman filter or the non-parametric particle
a framework is twofold. First it lets us adaptively design filter. For details of this formal development see [8].
our survey points during a static reconstruction of the field. In this paper, we take a more practical view point and outline
Secondly, it gives us clue on how to incorporate previously a method that works for our groundwater-like problems. Sup-
learned state history in designing new measurements, after pose that the grid model of the field evolves from xt := ψ(x, t)
the field has changed due to its model dynamics. to xt+1 := ψ(x, t+∆t) following Equation 1. Further, assume
that this evolution is a linear transformation of the form
A. Adaptive Bayesian Compressive Sensing
The Bayesian CS framework produces a multivariate Gaus- xt+1 = Axt + qt , (21)
sian posterior on the signal x and as measure of the uncertainty where qt models the process noise in model dynamics. Let us
in the signal, the differential entropy of x is given by [7] also assume that initially xt has been reconstructed as mean x̂
and covariance Σt using the Bayesian compressive framework
Z
hM (x) = − p(x) log p(x)dx (17) described above. The standard Bayes’s filtering methodology
would be to first create a prediction x̂pt+1 = Ax̂t and then
correct using the measurement pdf evaluated for the next time
step. We instead use the following procedure which follows
this predict and correct methodology in spirit but is more
explicit in using the adaptation introduced in earlier sections.
We first create pseudo-measurements from the predicted
state by
p
yt+1 = Φt Ax̂t . (22)