Decoding by Linear Programming (Candès and Tao, 2005)
Abstract—This paper considers a natural error correcting problem with real-valued input/output. We wish to recover an input vector $f \in \mathbb{R}^n$ from corrupted measurements $y = Af + e$. Here, $A$ is an $m$ by $n$ (coding) matrix and $e$ is an arbitrary and unknown vector of errors. Is it possible to recover $f$ exactly from the data $y$?

We prove that under suitable conditions on the coding matrix $A$, the input $f$ is the unique solution to the $\ell_1$-minimization problem ($\|x\|_{\ell_1} := \sum_i |x_i|$)

$$\min_{g \in \mathbb{R}^n} \|y - Ag\|_{\ell_1}$$

provided that the support of the vector of errors is not too large, $\|e\|_{\ell_0} := |\{i : e_i \neq 0\}| \leq \rho \cdot m$ for some $\rho > 0$. In short, $f$ can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program). In addition, numerical experiments suggest that this recovery procedure works unreasonably well; $f$ is recovered exactly even in situations where a significant fraction of the output is corrupted.

This work is related to the problem of finding sparse solutions to vastly underdetermined systems of linear equations. There are also significant connections with the problem of recovering signals from highly incomplete measurements. In fact, the results introduced in this paper improve on our earlier work. Finally, underlying the success of $\ell_1$ is a crucial property we call the uniform uncertainty principle that we shall describe in detail.

Index Terms—Basis pursuit, decoding of (random) linear codes, duality in optimization, Gaussian random matrices, $\ell_1$ minimization, linear codes, linear programming, principal angles, restricted orthonormality, singular values of random matrices, sparse solutions to underdetermined systems.

I. INTRODUCTION

A. Decoding of Linear Codes

In its abstract form, our problem is equivalent to the classical error correcting problem arising in coding theory, as we may think of $A$ as a linear code; a linear code is a given collection of codewords which are vectors $a_1, \ldots, a_n \in \mathbb{R}^m$—the columns of the matrix $A$. We would like to emphasize, however, that there is a clear distinction between our real-valued setting and the finite alphabet one which is more common in the information theory literature.

Given a vector $f \in \mathbb{R}^n$ (the "plaintext") we can then generate a vector $Af$ in $\mathbb{R}^m$ (the "ciphertext"); if $A$ has full rank, then one can clearly recover the plaintext $f$ from the ciphertext $Af$. But now we suppose that the ciphertext $Af$ is corrupted by an arbitrary vector $e \in \mathbb{R}^m$, giving rise to the corrupted ciphertext $y = Af + e$. The question is then: given the coding matrix $A$ and $y$, can one recover $f$ exactly?

As is well known, if the fraction of the corrupted entries is too large, then of course we have no hope of reconstructing $f$ from $y$; for instance, assume $m = 2n$ and consider two distinct plaintexts $f$, $f'$, and form a vector $g \in \mathbb{R}^m$ by concatenating the first half of $Af$ together with the second half of $Af'$. Then $g = Af + e = Af' + e'$, where both $e$ and $e'$ are supported on sets of size at most $n = m/2$. This simple example shows that accurate decoding is impossible when the size of the support of the error vector is greater than or equal to a half of that of the output $Af$. Therefore, a common assumption in the literature is to assume that only a small fraction of the entries are actually damaged:

$$\|e\|_{\ell_0} := |\{i : e_i \neq 0\}| \leq \rho \cdot m. \tag{1.1}$$

For which values of $\rho$ can we hope to reconstruct $e$ with practical algorithms? That is, with algorithms whose complexity is at most polynomial in the length $m$ of the code $A$?

To reconstruct $f$, note that it is obviously sufficient to reconstruct the vector $e$ since knowledge of $y = Af + e$ together with $e$ gives $Af$, and consequently $f$, since $A$ has full rank. Our approach is then to apply a matrix $F$ annihilating $A$ on the left, i.e., obeying $FA = 0$; the data $\tilde{y} := Fy = Fe$ no longer depend on the plaintext, and the decoding problem becomes that of recovering the sparse vector of errors $e$ from the underdetermined system of equations $Fd = \tilde{y}$.
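To fix ideas, here is a small numerical sketch of this setup (ours, not the paper's; the dimensions, the corruption fraction, and the random seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 128, 256          # plaintext length and codeword length (illustrative)
rho = 0.10               # fraction of corrupted entries (illustrative)

A = rng.standard_normal((m, n))      # m-by-n Gaussian coding matrix
f = rng.standard_normal(n)           # plaintext
support = rng.choice(m, size=int(rho * m), replace=False)
e = np.zeros(m)
e[support] = rng.standard_normal(support.size)   # arbitrary errors on a small support

y = A @ f + e            # corrupted ciphertext: y = Af + e
print("corrupted entries:", support.size, "out of", m)
```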
B. Sparse Solutions to Underdetermined Systems

Finding sparse solutions to underdetermined systems of linear equations is in general NP-hard. For example, the sparsest solution is given by

$$(P_0)\qquad \min_{d \in \mathbb{R}^m} \|d\|_{\ell_0} \quad \text{subject to} \quad Fd = \tilde{y} \tag{1.3}$$

and to the best of our knowledge, solving this problem essentially requires exhaustive searches over all subsets of columns of $F$, a procedure which clearly is combinatorial in nature and has exponential complexity.

Formally, given an integer matrix $F$ and an integer vector $\tilde{y}$, the problem of deciding whether there is a vector $d$ with rational entries such that $Fd = \tilde{y}$, and with fewer than a fixed number of nonzero entries, is NP-complete, see [2, Problem MP5]. In fact, the $\ell_0$-minimization problem contains the subset-sum problem. Assume, for instance, that we are given $2m$ vectors in $\mathbb{R}^{m+1}$ of the form

$$v_j := e_j + a_j\, e_{m+1}, \qquad v_{m+j} := e_j, \qquad j = 1, \ldots, m,$$

where $e_1, \ldots, e_{m+1}$ are the usual basis vectors and $a_1, \ldots, a_m$ are integers. Now let $s$ be another integer and consider the problem of deciding whether

$$\tilde{y} := e_1 + \cdots + e_m + s\, e_{m+1}$$

is an $m$-sparse linear combination of the above vectors ($(m-1)$-sparse is impossible). This is exactly the subset-sum problem, i.e., whether one can write $s$ as a sum of a subset of $\{a_1, \ldots, a_m\}$. It is well known that this is an NP-complete problem.

This computational intractability has recently led researchers to develop alternatives to $(P_0)$, and a frequently discussed approach considers a similar program in the $\ell_1$-norm which goes by the name of basis pursuit [3]

$$(P_1)\qquad \min_{d \in \mathbb{R}^m} \|d\|_{\ell_1} \quad \text{subject to} \quad Fd = \tilde{y}. \tag{1.4}$$

A series of works has shown that, under appropriate conditions on $F$, the two programs have the same solution provided that the number of nonzero entries in the vector $d$ is not too large [1], [11]. In the special case where $F$ is an $(m-n)$ by $m$ random matrix with independent standard normal entries, [12] proved that the number of nonzero entries may be as large as $\rho \cdot m$, where $\rho$ is some very small and unspecified positive constant independent of $m$.

C. Innovations

This paper introduces the concept of a restrictedly almost orthonormal system—a collection of vectors which behaves like an almost orthonormal system but only for sparse linear combinations. Thinking about these vectors as the columns of the matrix $F$, we show that this condition allows for the exact reconstruction of sparse linear combinations of these vectors, i.e., exact recovery of the error vector $e$. Our results are significantly different from those mentioned above as they are deterministic and do not involve any kind of randomization, although they can of course be specialized to random matrices. For instance, we shall see that a Gaussian matrix with independent entries sampled from the standard normal distribution is restrictedly almost orthonormal with overwhelming probability, and that minimizing the $\ell_1$-norm recovers sparse decompositions with a number of nonzero entries of size $\rho \cdot m$; we shall actually give numerical values for $\rho$.

We presented the connection with sparse solutions to underdetermined systems of linear equations merely for pedagogical reasons. There is a more direct approach. To recover $f$ from corrupted data $y = Af + e$, we consider solving the following $\ell_1$-minimization problem:

$$(P_1')\qquad \min_{g \in \mathbb{R}^n} \|y - Ag\|_{\ell_1}. \tag{1.5}$$

Now $f$ is the unique solution of $(P_1')$ if and only if $e$ is the unique solution of $(P_1)$. In other words, $(P_1')$ and $(P_1)$ are equivalent programs. To see why this is true, observe on the one hand that since $y = Af + e$, we may decompose $g$ as $g = f + h$ so that

$$\min_{g \in \mathbb{R}^n} \|y - Ag\|_{\ell_1} = \min_{h \in \mathbb{R}^n} \|e - Ah\|_{\ell_1}.$$
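The program (1.5) can be recast as a linear program by introducing slack variables $t \in \mathbb{R}^m$ bounding $|y - Ag|$ componentwise. The following sketch is an illustration (not the authors' code): it uses `scipy.optimize.linprog` to solve $\min_{g,t} \sum_i t_i$ subject to $-t \le y - Ag \le t$, reusing the matrix $A$, data $y$, and plaintext $f$ built in the previous snippet.

```python
import numpy as np
from scipy.optimize import linprog

def l1_decode(A, y):
    """Solve min_g ||y - A g||_1 by linear programming (a sketch of program (1.5))."""
    m, n = A.shape
    # Variables: x = [g (n entries), t (m entries)]; minimize sum(t).
    c = np.concatenate([np.zeros(n), np.ones(m)])
    I = np.eye(m)
    # Encode -t <= y - A g <= t as two blocks of inequalities A_ub x <= b_ub.
    A_ub = np.block([[-A, -I], [A, -I]])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

# A, y, f as constructed in the previous sketch.
f_hat = l1_decode(A, y)
print("exact recovery:", np.allclose(f_hat, f, atol=1e-6))
```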
The condition (1.10) is roughly "three times as strict" as the condition required for Lemma 1.2.

Theorem 1.3 is inspired by our previous work [1], see also [10], [11], but unlike those earlier results, our results here are deterministic, and thus do not have a nonzero probability of failure, provided of course one can ensure that the system verifies the condition (1.10). By virtue of the previous discussion, we have the companion result.

Theorem 1.4: Suppose $F$ is such that $FA = 0$ and let $S \geq 1$ be a number obeying the hypothesis of Theorem 1.3. Set $y = Af + e$, where $e$ is a real vector supported on a set of size at most $S$. Then $f$ is the unique minimizer to

$$(P_1')\qquad \min_{g \in \mathbb{R}^n} \|y - Ag\|_{\ell_1}.$$

F. Gaussian Random Matrices

In this special case, one may think of the annihilating matrix $F$ (such that $FA = 0$) as a matrix with independent Gaussian entries. Observe that the range of $A$ is a random space of dimension $n$ embedded in $\mathbb{R}^m$, so that the data $\tilde{y} = Fe$ is the projection of $e$ on a random space of dimension $m - n$. The range of an $(m-n)$ by $m$ matrix with independent Gaussian entries precisely is a random subspace of dimension $m - n$, which justifies the claim.

We would like to point out that the numerical bounds we derived in this paper are overly pessimistic. We are confident that finer arguments and perhaps new ideas will allow us to derive versions of Theorem 1.5 with better bounds. The discussion section will enumerate several possibilities for improvement.

G. A Notion of Capacity

We now develop a notion of "capacity;" that is, an upper bound on the support size of the error vector beyond which recovery is information-theoretically impossible. Define the cospark of the matrix $A$ as

$$\operatorname{cospark}(A) := \min_{f \neq 0} \|Af\|_{\ell_0}.$$

With probability one, a Gaussian matrix $A$ allows one to theoretically recover any plaintext if and only if the fraction of errors obeys

$$\rho \cdot m < \tfrac{1}{2}\operatorname{cospark}(A) = \tfrac{1}{2}(m - n + 1). \tag{1.13}$$

In other words, given a fraction of error $\rho$, accurate decoding is possible provided that the "rate" $R := n/m$ obeys $R \leq 1 - 2\rho$. This may help to establish a link between this work and other work in the information theory literature.

Proof: It is clear that $\operatorname{cospark}(A) \leq m - n + 1$ as one can find an $f \neq 0$ so that the first $n - 1$ entries of $Af$ vanish. Further, all the $n$ by $n$ submatrices of $A$ are invertible with probability $1$. This follows from the fact that each $n$ by $n$ submatrix is invertible with probability one, and that there is a finite number of them. With probability one, therefore, it is impossible to find $f \neq 0$ such that $Af$ has $n$ or more vanishing entries. This gives $\operatorname{cospark}(A) = m - n + 1$.
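For very small sizes, the two facts used in this proof can be checked by brute force. The sketch below is illustrative only (the exhaustive loop over submatrices is exponential in $m$, and the sizes are ours): it verifies that every $n$-by-$n$ submatrix of a tiny Gaussian $A$ has nonzero determinant, and exhibits an $f$ for which the first $n-1$ entries of $Af$ vanish.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
m, n = 10, 4                         # tiny sizes so the exhaustive check is feasible
A = rng.standard_normal((m, n))

# Every n-by-n submatrix should be invertible (nonzero determinant) ...
min_abs_det = min(abs(np.linalg.det(A[list(rows), :]))
                  for rows in combinations(range(m), n))
print("smallest |det| over all n-by-n submatrices:", min_abs_det)

# ... hence Af cannot have n or more zero entries unless f = 0, while one can
# always zero out n-1 chosen entries, giving cospark(A) = m - n + 1 here.
rows = list(range(n - 1))
f = np.linalg.svd(A[rows, :])[2][-1]   # unit vector in the null space of the first n-1 rows
print("entries of Af below 1e-10:", int(np.sum(np.abs(A @ f) < 1e-10)))
```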
H. Organization of the Paper

The paper is organized as follows. Section II proves our main claim, namely, Theorem 1.3 (and hence Theorem 1.4) while Section III introduces elements from random matrix theory to establish Theorem 1.5. In Section IV, we present numerical experiments which suggest that in practice, $\ell_1$-minimization works unreasonably well and recovers $f$ exactly from $y$ provided that the fraction of the corrupted entries be less than about 17% in the case where $m = 2n$ and less than about 34% in the case where $m = 4n$. Section V explores the consequences of our results for the recovery of signals from highly incomplete data and ties our findings with some of our earlier work. Finally, we conclude with a short discussion section whose main purpose is to outline areas for improvement.
II. PROOF OF MAIN RESULTS

Our main result, namely, Theorem 1.3, is proved by duality. As we will see in Section II-A, $e$ is the unique minimizer of $(P_1)$ if the matrix $F_T$ has full rank and if one can find a vector $w$ with the two properties

i) $\langle w, v_j \rangle = \operatorname{sgn}(e_j)$, for all $j \in T$;
ii) $|\langle w, v_j \rangle| < 1$, for all $j \notin T$;

where $v_j$ denotes the $j$th column of $F$, $T$ is the support of $e$, and $\operatorname{sgn}(e_j)$ is the sign of $e_j$ ($\operatorname{sgn}(e_j) = 0$, for $e_j = 0$). The two conditions above say that a specific dual program is feasible and is called the exact reconstruction property in [1], see also [10]. The reader might find it helpful to see how these conditions arise and we first informally sketch the argument (a rigorous proof follows in Section II-A). Suppose we wish to minimize $J(d)$ subject to $Fd = \tilde{y}$, and that $J$ is differentiable. Together with the constraint, the classical Lagrange multiplier optimality condition (see [13]) asserts that there exists $w$ ($w$ is a Lagrange multiplier) such that

$$\nabla J(d) = F^* w.$$

In a nutshell, for $d$ to be a solution, we must have $\nabla J(d) \in \operatorname{range}(F^*)$. In our case $J(d) = \|d\|_{\ell_1}$, and so the derivative of $|d_j|$ is $\operatorname{sgn}(d_j)$ for $d_j \neq 0$, but the absolute value is otherwise not smooth at zero. In this case, we ask that $F^* w$ be a subgradient of the $\ell_1$-norm at the point $d$ [13], which here means

$$(F^* w)_j = \operatorname{sgn}(d_j) \ \text{ if } d_j \neq 0, \qquad |(F^* w)_j| \leq 1 \ \text{ otherwise.}$$

Since by definition $e$ is supported on $T$, conditions i) and ii) are now natural.

A. Proof of Theorem 1.3

Observe first that standard convex arguments give that there exists at least one minimizer $d$ to the problem $(P_1)$. We need to prove that $d = e$. Since $e$ obeys the constraints of this problem, $d$ obeys

$$\|d\|_{\ell_1} \leq \|e\|_{\ell_1}. \tag{2.14}$$

Now take a $w$ obeying properties i) and ii) (see the remark following Lemma 2.2). Using the fact that the inner product $\langle w, v_j \rangle$ is equal to the sign of $e_j$ on $T$ and has absolute value strictly less than one on the complement, we then compute

$$\|d\|_{\ell_1} = \sum_{j \in T} |d_j| + \sum_{j \notin T} |d_j| \geq \sum_{j} \langle w, v_j \rangle\, d_j = \langle w, Fd \rangle = \langle w, Fe \rangle = \sum_{j \in T} \operatorname{sgn}(e_j)\, e_j = \|e\|_{\ell_1}.$$

Comparing this with (2.14) we see that all the inequalities in the above computation must in fact be equalities. Since $|\langle w, v_j \rangle|$ was strictly less than one for all $j \notin T$, this in particular forces $d_j = 0$ for all $j \notin T$. Thus,

$$F_T d_T = Fd = Fe = F_T e_T.$$

Applying (1.7) (and noting from the hypothesis that $\delta_{|T|} < 1$) we conclude that $d_j = e_j$ for all $j \in T$. Thus, $d = e$ as claimed.

For $|T|$ obeying the hypothesis of Theorem 1.3, $F_T$ has full rank since $\delta_{|T|} < 1$ and thus, the proof simply consists in constructing a dual vector $w$; this is the object of the next section. This concludes the proof of our theorem.

B. Exact Reconstruction Property

We now examine the exact reconstruction property and begin with Lemma 2.1 which establishes that the coefficients $\langle w, v_j \rangle$, $j \notin T$, are suitably bounded except for an exceptional set. Lemma 2.2 strengthens this result by eliminating the exceptional set.
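Conditions i) and ii) can also be tested numerically for a given support. A natural candidate is the least-squares vector $w$ that matches the signs on $T$ exactly, which is essentially the construction used in Lemma 2.1 below. The sketch that follows is an illustration with hypothetical sizes (not the paper's code): it builds this $w$ for a random Gaussian $F$ and checks both conditions.

```python
import numpy as np

rng = np.random.default_rng(2)
p, m, S = 120, 256, 10                         # hypothetical sizes
F = rng.standard_normal((p, m)) / np.sqrt(p)   # columns have roughly unit norm

T = rng.choice(m, size=S, replace=False)
signs = rng.choice([-1.0, 1.0], size=S)        # a sign pattern on T

# Least-squares dual vector: chosen so that F_T^* w equals the signs exactly.
FT = F[:, T]
w = FT @ np.linalg.solve(FT.T @ FT, signs)

coeffs = F.T @ w
off_T = np.setdiff1d(np.arange(m), T)
print("condition i) holds:", np.allclose(coeffs[T], signs))
print("max |<w, v_j>| off T:", np.max(np.abs(coeffs[off_T])))   # condition ii) wants this < 1
```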
Lemma 2.1 (Dual Reconstruction Property, $\ell_2$ version): Let $S, S' \geq 1$ be such that $\delta_S < 1$, and let $e$ be a real vector supported on a set $T$ such that $|T| \leq S$. Then there exists a vector $w$ such that $\langle w, v_j \rangle = e_j$ for all $j \in T$. Furthermore, there is an "exceptional set" $E$ which is disjoint from $T$, of size at most

$$|E| \leq S' \tag{2.15}$$

and with the properties

$$|\langle w, v_j \rangle| \leq \frac{\theta}{(1-\delta)\sqrt{S'}}\, \|e\| \quad \text{for all } j \notin T \cup E$$

and

$$\Big(\sum_{j \in E} |\langle w, v_j \rangle|^2\Big)^{1/2} \leq \frac{\theta}{1-\delta}\, \|e\|,$$

where $\delta := \delta_S$ and $\theta := \theta_{S,S'}$ for short. In addition, $\|w\| \leq K\|e\|$ for some constant $K$ only depending upon $\delta$.

Proof: Recall that $F_T : \mathbb{R}^T \to \mathbb{R}^p$ is the linear transformation $F_T c_T := \sum_{j \in T} c_j v_j$ where $c_T \in \mathbb{R}^T$ (we use the subscript $T$ in $c_T$ to emphasize that the input is a $|T|$-dimensional vector), and let $F_T^*$ be the adjoint transformation

$$F_T^* w := \big(\langle v_j, w \rangle\big)_{j \in T}.$$

Property (1.7) gives

$$1 - \delta \leq \lambda_{\min}(F_T^* F_T) \leq \lambda_{\max}(F_T^* F_T) \leq 1 + \delta,$$

where $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum eigenvalues of the positive-definite operator $F_T^* F_T$. In particular, since $\delta < 1$, we see that $F_T^* F_T$ is invertible with

$$\|(F_T^* F_T)^{-1}\| \leq \frac{1}{1-\delta}. \tag{2.16}$$

Also note that $\|F_T (F_T^* F_T)^{-1}\| \leq (1-\delta)^{-1/2}$, and set $w$ to be the vector

$$w := F_T (F_T^* F_T)^{-1} e_T;$$

it is then clear that $F_T^* w = e_T$, i.e., $\langle w, v_j \rangle = e_j$ for all $j \in T$. In addition, $\|w\| \leq K\|e\|$ with $K := (1-\delta)^{-1/2}$.

Finally, if $E$ is any set in $\{1, \ldots, m\}$ disjoint from $T$ with $|E| \leq S'$ and $(c_j)_{j \in E}$ is any sequence of real numbers, then (1.8) and (2.16) give

$$\Big|\Big\langle w, \sum_{j \in E} c_j v_j \Big\rangle\Big| = \big|\big\langle (F_T^* F_T)^{-1} e_T,\ F_T^* F_E c \big\rangle\big| \leq \frac{\theta}{1-\delta}\, \|e\|\, \|c\|.$$

In other words,

$$\Big(\sum_{j \in E} |\langle w, v_j \rangle|^2\Big)^{1/2} \leq \frac{\theta}{1-\delta}\, \|e\| \tag{2.17}$$

whenever $E \cap T = \emptyset$ and $|E| \leq S'$. In particular, if we set

$$E := \Big\{ j \notin T : |\langle w, v_j \rangle| > \frac{\theta}{(1-\delta)\sqrt{S'}}\, \|e\| \Big\},$$

then $E$ must obey $|E| \leq S'$, since otherwise we could contradict (2.17) by taking a subset of $E$ of cardinality $S'$. The claims now follow.

We now derive a solution with better control on the sup norm of $\langle w, v_j \rangle$ outside of $T$, by iterating away the exceptional set (while keeping the values on $T$ fixed). The proof of the following lemma is given in the Appendix.

Lemma 2.2 (Dual Reconstruction Property, $\ell_\infty$ version): Let $S, S' \geq 1$ be such that $\delta + \theta < 1$, and let $e$ be a real vector supported on $T$ obeying $|T| \leq S$. Then there exists a vector $w$ such that $\langle w, v_j \rangle = e_j$ for all $j \in T$. Furthermore, $\langle w, v_j \rangle$ obeys the uniform bound (2.18) for all $j \notin T$, in which the right-hand side is a multiple of $\|e\|/\sqrt{S'}$ with a factor depending only on $\delta$ and $\theta$.

Lemma 2.2 actually solves the dual recovery problem. Indeed, our result states that one can find a vector $w$ obeying both properties i) and ii) stated at the beginning of the section. To see why ii) holds, observe that when $e$ is taken to be the sign vector of the error (so that $\|e\| \leq \sqrt{S}$), (2.18) gives for all $j \notin T$ a bound on $|\langle w, v_j \rangle|$ which is strictly less than one provided that the hypothesis of Theorem 1.3 holds.

It is likely that one may push the condition of Theorem 1.3 a little further. The key idea is as follows. Each vector in the iteration scheme used to prove Lemma 2.2 was designed to annihilate the influence of the previous iterate on the exceptional set. But annihilation is too strong of a goal. It would be just as suitable to design the next iterate to moderate the influence of the previous one enough so that the inner product with elements in the exceptional set is small rather than zero. However, we have not pursued such refinements as the arguments would become considerably more complicated than the calculations presented here.

C. Approximate Orthogonality

Lemma 1.1 gives control of the size of the principal angle between subspaces of dimension $S$ and $S'$, respectively. This is useful because it allows us to guarantee exact reconstruction from the knowledge of the $\delta$ numbers only.

Proof of Lemma 1.1: We first show that $\theta_{S,S'} \leq \delta_{S+S'}$. By homogeneity it will suffice to show that

$$|\langle F_T c, F_{T'} c' \rangle| \leq \delta_{S+S'}$$

whenever $\|c\| = \|c'\| = 1$, $T$ and $T'$ are disjoint, $|T| \leq S$, and $|T'| \leq S'$. Now (1.7) gives

$$2(1 - \delta_{S+S'}) \leq \|F_T c + F_{T'} c'\|^2 \leq 2(1 + \delta_{S+S'})$$

together with

$$2(1 - \delta_{S+S'}) \leq \|F_T c - F_{T'} c'\|^2 \leq 2(1 + \delta_{S+S'}),$$

and the claim now follows from the parallelogram identity

$$\langle F_T c, F_{T'} c' \rangle = \tfrac{1}{4}\big( \|F_T c + F_{T'} c'\|^2 - \|F_T c - F_{T'} c'\|^2 \big)$$

so that $|\langle F_T c, F_{T'} c' \rangle| \leq \delta_{S+S'}$.

It remains to show that $\delta_{S+S'} \leq \theta_{S,S'} + \max(\delta_S, \delta_{S'})$. Again by homogeneity, it suffices to establish that

$$\big|\, \|F c\|^2 - \|c\|^2 \,\big| \leq \big(\theta_{S,S'} + \max(\delta_S, \delta_{S'})\big)\, \|c\|^2$$

for every $c$ supported on $T \cup T'$ with $T$, $T'$ disjoint, $|T| \leq S$, $|T'| \leq S'$. Writing $c = c_T + c_{T'}$, we have

$$\|F(c_T + c_{T'})\|^2 = \|F_T c_T\|^2 + \|F_{T'} c_{T'}\|^2 + 2\langle F_T c_T, F_{T'} c_{T'} \rangle.$$

Hence,

$$\big|\, \|F c\|^2 - \|c\|^2 \,\big| \leq \max(\delta_S, \delta_{S'})\big( \|c_T\|^2 + \|c_{T'}\|^2 \big) + 2\theta_{S,S'} \|c_T\|\,\|c_{T'}\| \leq \big(\max(\delta_S, \delta_{S'}) + \theta_{S,S'}\big)\|c\|^2$$

as claimed. (We note that it is possible to optimize this bound a little further but will not do so here.)

III. GAUSSIAN RANDOM MATRICES

Restricted isometry constants must hold for all sets $T$ of cardinality less than or equal to $S$, and we shall make use of concentration inequalities to develop such a uniform bound. Note that for $T' \subset T$, we obviously have $\delta_{|T'|} \leq \delta_{|T|}$ and, therefore, attention may be restricted to matrices of size $S$, i.e., to the submatrices $F_T$ with $|T| = S$. Now, there are large deviation results about the singular values of Gaussian matrices [18]. For example, letting $\sigma_{\max}(F_T)$ (resp., $\sigma_{\min}(F_T)$) be the largest (resp., smallest) singular value of $F_T$, Ledoux [19] applies the concentration inequality for Gaussian measures, and for each fixed $t > 0$, obtains the deviation bounds

$$\mathbb{P}\big( \sigma_{\max}(F_T) > 1 + \sqrt{S/p} + o(1) + t \big) \leq e^{-p t^2/2} \tag{3.19}$$

and

$$\mathbb{P}\big( \sigma_{\min}(F_T) < 1 - \sqrt{S/p} + o(1) - t \big) \leq e^{-p t^2/2},$$

where $o(1)$ denotes a small term tending to zero as $p$ grows.
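These concentration bounds are easy to visualize empirically: for a $p$-by-$S$ Gaussian matrix normalized by $1/\sqrt{p}$, the extreme singular values cluster near $1 \pm \sqrt{S/p}$. The sketch below (an illustration with arbitrary sizes, not tied to the paper's constants) samples a few matrices and compares.

```python
import numpy as np

rng = np.random.default_rng(3)
p, S, trials = 400, 40, 200

smax, smin = [], []
for _ in range(trials):
    G = rng.standard_normal((p, S)) / np.sqrt(p)
    sv = np.linalg.svd(G, compute_uv=False)
    smax.append(sv[0])
    smin.append(sv[-1])

print("predicted edges:", 1 - np.sqrt(S / p), 1 + np.sqrt(S / p))
print("observed means :", np.mean(smin), np.mean(smax))
print("observed spread:", np.std(smin), np.std(smax))
```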
Lemma 3.1: For each $S$, the restricted isometry constant $\delta_S$ of a $p$ by $m$ Gaussian matrix obeys, for $p$ and $S$ large enough, the bound (3.21), in which the right-hand side depends only on the ratios $S/p$ and $m/p$ through the entropy function $H(q) := -q \log q - (1-q)\log(1-q)$, defined for $0 < q < 1$.

Proof: As discussed above, we may restrict our attention to sets $T$ such that $|T| = S$. Let $\bar\sigma_{\max} := \max_T \sigma_{\max}(F_T)$, where the maximum is over all sets $T$ of size $S$, and similarly $\bar\sigma_{\min} := \min_T \sigma_{\min}(F_T)$. There are at most $\binom{m}{S} \leq e^{m H(S/m)}$ such sets, and suppose $S$ and $p$ are large enough so that the $o(1)$ terms in (3.19) are negligible. Then the union bound combined with the deviation inequalities of the previous section shows that, with overwhelming probability, $\bar\sigma_{\max}$ exceeds $1 + \sqrt{S/p}$ by at most a term of the order of $\sqrt{2 H(S/m)\, m/p}$, and $\bar\sigma_{\min}$ falls below $1 - \sqrt{S/p}$ by at most the same amount; since $1 + \delta_S = \bar\sigma_{\max}^2$ and $1 - \delta_S = \bar\sigma_{\min}^2$, this gives the claim. For $p$ sufficiently large, the residual term is less than any prescribed tolerance, which concludes the proof.

Ignoring the $o(1)$'s, Lemma 3.1 states that with overwhelming probability $\delta_S$ is bounded by the explicit quantity in (3.22). A similar conclusion holds for $\delta_{S+S'}$ and $\theta_{S,S'}$ and, therefore, we established that with very high probability the quantity appearing in the hypothesis of Theorem 1.3 is bounded by an explicit function of the ratios $S/m$ and $p/m$, as in (3.23). In conclusion, Lemma 1.1 shows that the hypothesis of our main theorem holds provided that the ratio $r := S/m$ is small enough that this bound is less than one. In other words, in the limit of large samples, $\rho$ may be taken as any value obeying the resulting inequality, which we used to give numerical values in Theorem 1.5. Fig. 1 graphs the function for several values of the ratio $p/m$.

Fig. 1. Behavior of the upper bound as a function of $r$ for three values of the ratio $p/m$, namely, $p/m = 3/4$, $2/3$, $1/2$.

IV. NUMERICAL EXPERIMENTS

This section investigates the practical ability of $(P_1')$ to recover an object $f \in \mathbb{R}^n$ from corrupted data $y = Af + e$ (or equivalently, to recover the sparse vector of errors $e$ from the underdetermined system of equations $Fd = \tilde{y}$). The goal here is to evaluate empirically the location of the breakpoint so as to get an accurate sense of the performance one might expect in practice. In order to do this, we performed a series of experiments designed as follows (a compact script along these lines is sketched after the list):

1) select $n$ (the size of the input signal) and $m$ so that, with the same notations as before, $A$ is an $m$ by $n$ matrix; sample $A$ with independent Gaussian entries;
2) select $S$ as a percentage of $m$;
3) select a support set $T \subset \{1, \ldots, m\}$ of size $|T| = S$ uniformly at random, and sample a vector $e$ on $T$ with i.i.d. Gaussian entries;¹
4) make $y = Af + e$ (the choice of $f$ does not matter as is clear from the discussion and here, $f$ is also selected at random), solve $(P_1')$ and obtain $f^\star$;
5) compare $f$ to $f^\star$;
6) repeat 100 times for each $S$ and $m$;
7) repeat for various sizes of $n$ and $m$.

¹ Just as in [11], the results presented here do not seem to depend on the actual distribution used to sample the errors.
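A compact version of steps 1)–7) can be scripted as follows; this is a sketch with small illustrative sizes and fewer trials than the 100 used in the paper, and it assumes the `l1_decode` routine from the sketch in Section I is in scope.

```python
import numpy as np

rng = np.random.default_rng(4)

def success_rate(n, m, frac, trials=20):
    """Fraction of trials in which l1_decode recovers f exactly (up to numerical tolerance)."""
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((m, n))
        f = rng.standard_normal(n)
        T = rng.choice(m, size=int(frac * m), replace=False)
        e = np.zeros(m)
        e[T] = rng.standard_normal(T.size)
        f_hat = l1_decode(A, A @ f + e)        # l1_decode as sketched in Section I
        hits += np.allclose(f_hat, f, atol=1e-5)
    return hits / trials

for frac in (0.05, 0.15, 0.25, 0.35):
    print(f"m = 2n = 128, corrupted fraction {frac:.0%}: "
          f"success rate {success_rate(64, 128, frac):.2f}")
```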
Fig. 2. $\ell_1$-recovery of an input signal from $y = Af + e$ with $A$ an $m$ by $n$ matrix with independent Gaussian entries. In this experiment, we "oversample" the input signal by a factor 2 so that $m = 2n$. (a) Success rate of $(P_1')$ for $m = 512$. (b) Success rate of $(P_1')$ for $m = 1024$. Observe the similar pattern and cutoff point. In these experiments, exact recovery occurs as long as about 17% or less of the entries are corrupted.

Fig. 3. $\ell_1$-recovery of an input signal from $y = Af + e$ with $A$ an $m$ by $n$ matrix with independent Gaussian entries. In this experiment, we "oversample" the input signal by a factor 4 so that $m = 4n$. In these experiments, exact recovery occurs as long as about 34% or less of the entries are corrupted.

The results are presented in Figs. 2 and 3. Fig. 2 examines the situation in which the length of the code is twice that of the input vector, for $m = 512$ and $m = 1024$. Our experiments show that one recovers the input vector all the time as long as the fraction of the corrupted entries is below 17%. This holds for $m = 512$ (Fig. 2(a)) and $m = 1024$ (Fig. 2(b)). In Fig. 3, we investigate how these results change as the length of the codewords increases compared to the length of the input, and examine the situation in which $m = 4n$. Our experiments show that one recovers the input vector all the time as long as the fraction of the corrupted entries is below 34%.

It might be tempting to compare the empirical cutoff observed in the figures with the theoretical limit calculated in Corollary 1.8. For example, for $m = 4n$, Fig. 3 suggests a cutoff near 35% while Corollary 1.8 gives a theoretical limit of only 37.5%. One needs to apply caution, however, as these two quantities refer to two distinct situations. On the one hand, the limit guarantees that all signals may be decoded exactly. On the other hand, the numerical simulations examine the case where both $f$ and the error vector $e$ are chosen randomly, and do not search for the worst possible pair. In this sense, the simulations are not really "testing" Theorem 1.5. Under a probability model for the error pattern $e$, say, it is clear that information theoretically, one can tolerate a larger fraction of errors if one wishes to be able to decode accurately for most as opposed to all error vectors.
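For orientation, and assuming the capacity bound of Section I-G is read in the limit of large $m$ as $\rho^* \approx \tfrac{1}{2}(1 - n/m)$ (a paraphrase of ours; the exact statement is Corollary 1.8), the two oversampling regimes used in the experiments give

$$m = 2n:\ \ \rho^* \approx \tfrac{1}{2}\Big(1 - \tfrac{1}{2}\Big) = 25\%, \qquad m = 4n:\ \ \rho^* \approx \tfrac{1}{2}\Big(1 - \tfrac{1}{4}\Big) = 37.5\%,$$

to be compared with the empirical $\ell_1$ cutoffs of roughly 17% and 34–35%, respectively.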
V. OPTIMAL SIGNAL RECOVERY

Our recent work [1] developed a set of ideas showing that it is surprisingly possible to reconstruct interesting classes of signals accurately from highly incomplete measurements. The results in this paper are inspired by and improve upon this earlier work, and we now elaborate on this connection. Suppose we wish to reconstruct an object $\alpha \in \mathbb{R}^m$ from the linear measurements

$$y_k = \langle \alpha, \phi_k \rangle, \quad k = 1, \ldots, p, \qquad \text{or} \qquad y = F\alpha \tag{5.24}$$

with $\phi_k$, the $k$th row of the matrix $F$. Of special interest is the vastly underdetermined case, $p \ll m$, where there are many more unknowns than observations. We choose to formulate the problem abstractly but for concreteness, we might think of $\alpha$ as the coefficients of a digital signal or image in some nice orthobasis, e.g., a wavelet basis, so that the information about the signal is of the form (5.24).

Suppose now that the object of interest is compressible in the sense that the reordered entries of $\alpha$ decay like a power law; concretely, suppose that the entries of $\alpha$, rearranged in decreasing order of magnitude, $|\alpha|_{(1)} \geq |\alpha|_{(2)} \geq \cdots \geq |\alpha|_{(m)}$, obey

$$|\alpha|_{(k)} \leq C\, k^{-r} \tag{5.25}$$

for some $r \geq 1$. We will denote by $\mathcal{F}_r(C)$ the class of all signals obeying (5.25). The claim is that it is possible to reconstruct compressible signals from only a small number of random measurements.

Theorem 5.1: Let $F$ be the measurement matrix as in (5.24) and consider the solution $\alpha^\sharp$ to

$$\min_{g} \|g\|_{\ell_1} \quad \text{subject to} \quad Fg = y. \tag{5.26}$$

Let $S$ be such that $F$ obeys the hypothesis of Theorem 1.3 and set $r$ as in (5.25). Then $\alpha^\sharp$ obeys the approximation error bound (5.27), which is of the order of $S^{-(r - 1/2)}$.

To appreciate the content of the theorem, suppose one would have available an oracle letting us know which coefficients $\alpha_k$, $1 \leq k \leq m$, are large (e.g., in the scenario we considered earlier, the oracle would tell us which wavelet coefficients of the signal are large). Then we would acquire information about the $K$ largest coefficients and obtain a truncated version $\alpha_K$ of $\alpha$ obeying

$$\|\alpha - \alpha_K\| \asymp K^{-(r - 1/2)}$$

for generic elements taken from $\mathcal{F}_r(C)$. Now (5.27) says that, not knowing anything about the location of the largest coefficients, one can essentially obtain the same approximation error by nonadaptive sampling, provided the number of measurements is increased by a constant oversampling factor. The larger the fraction of errors the decoder can tolerate, the smaller the oversampling factor, and hence, the connection with the decoding problem. Such considerations make clear that Theorem 5.1 supplies a very concrete methodology for recovering a compressible object from limited measurements and, as such, it may have a significant bearing on many fields of science and technology. We refer the reader to [1] and [21] for a discussion of its implications.

Suppose, for example, that $F$ is a Gaussian random matrix as in Section III. We will assume the same special normalization so that the variance of each individual entry is equal to $1/p$. Calculations identical to those from Section III give that with overwhelming probability, $F$ obeys the hypothesis of the theorem provided that $S \leq \rho \cdot p$ for some positive constant $\rho > 0$. Now consider the statement of the theorem; there is a way to invoke linear programming and obtain a reconstruction based upon on the order of $K$ measurements only, which is at least as good as that one would achieve by knowing all the information about $\alpha$ and selecting the $K$ largest coefficients. In fact, this is an optimal statement as (5.27) correctly identifies the minimum number of measurements needed to obtain a given precision. In short, it is impossible to obtain a precision of about $K^{-(r - 1/2)}$ with substantially fewer measurements, see [1], [21].

Theorem 5.1 is stronger than our former result, namely, Theorem 1.4 in [1]. To see why this is true, recall the former claim: [1] introduced two conditions, the uniform uncertainty principle (UUP) and the exact reconstruction principle (ERP). In a nutshell, a random matrix $F$ obeys the UUP with oversampling factor $\lambda$ if the restricted isometry bound (5.28) holds with overwhelmingly large probability (one minus a negative power of the dimension), for some fixed positive constant. Second, a measurement matrix $F$ obeys the ERP with oversampling factor $\lambda$ if, for each fixed subset $T$ of size obeying (5.28) and each "sign" vector $c$ defined on $T$, there exists with the same overwhelmingly large probability a vector $w$ with the following properties:

i) $\langle w, v_j \rangle = c_j$, for all $j \in T$;
ii) $|\langle w, v_j \rangle| \leq \tfrac{1}{2}$, for all $j$ not in $T$.

Note that these are the conditions listed at the beginning of Section II except for the factor $\tfrac{1}{2}$ on the complement of $T$. Fix $\alpha \in \mathcal{F}_r(C)$. [1] argued that if a random matrix obeyed the UUP and the ERP both with oversampling factor $\lambda$, then a reconstruction bound analogous to (5.27) would hold, with inequality holding with the same probability as before.

Against this background, several comments are now in order.
• First, the new statement is more general as it applies to all matrices, not just random matrices.
• Second, whereas our previous statement argued that for each $\alpha$, one would be able—with high probability—to reconstruct $\alpha$ accurately, it did not say anything about the worst case error for a fixed measurement matrix $F$. This is an instance where the order of the quantifiers plays a role. Do we need different $F$'s for different objects? Theorem 5.1 answers this question unambiguously; the same $F$ will provide an optimal reconstruction for all the objects in the class.
• Third, Theorem 5.1 says that the ERP condition is redundant, and hence the hypothesis may be easier to check in practice. In addition, eliminating the ERP isolates the real reason for success as it ties everything down to the UUP. In short, the ability to recover an object from limited measurements depends on how close $F$ is to an orthonormal system, but only when restricting attention to sparse linear combinations of columns.
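Theorem 5.1 can be exercised numerically: generate a compressible $\alpha$, take $p$ random Gaussian measurements, and solve the equality-constrained $\ell_1$ program (5.26). The sketch below is illustrative only (the sizes, the decay exponent, and the solver choice are ours, not the paper's), again via `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
m_dim, p = 256, 100                      # signal length and number of measurements (illustrative)
r = 1.5                                  # power-law decay exponent (illustrative)

# A compressible object: reordered entries decay like k^(-r), random signs and positions.
alpha = (np.arange(1, m_dim + 1) ** (-r)) * rng.choice([-1.0, 1.0], size=m_dim)
rng.shuffle(alpha)

F = rng.standard_normal((p, m_dim)) / np.sqrt(p)
y = F @ alpha

# min ||x||_1 subject to F x = y, written with the split x = u - v, u, v >= 0.
c = np.ones(2 * m_dim)
res = linprog(c, A_eq=np.hstack([F, -F]), b_eq=y, bounds=(0, None), method="highs")
alpha_sharp = res.x[:m_dim] - res.x[m_dim:]

K = p // 4                               # compare against keeping the K largest entries
print("l2 error of l1 reconstruction :", np.linalg.norm(alpha - alpha_sharp))
print("l2 error of best K-term approx:", np.linalg.norm(np.sort(np.abs(alpha))[::-1][K:]))
```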
We will not prove this theorem as this is a minor modification of that of Theorem 1.4 in the aforementioned reference. The key point is to observe that if $F$ obeys the hypothesis of our theorem, then by definition $F$ obeys the UUP with probability one, but $F$ also obeys the ERP, again with probability one, as this is the content of Lemma 2.2. Hence, both the UUP and ERP hold and therefore, the conclusion of Theorem 5.1 follows. (The fact that the ERP actually holds for all sign vectors of size less than $S$ is the reason why (5.27) holds uniformly over all elements taken from the class, see [1].)

VI. DISCUSSION

Our analysis of Gaussian matrices relied on bounds for the extreme singular values of the submatrices $F_T$, and we developed a large deviation bound to quantify the departure from the right-hand side. Now let $T$ and $T'$ be two disjoint sets of respective sizes $S$ and $S'$ and consider the cosine of the principal angle between the two random subspaces spanned by the columns of $F_T$ and $F_{T'}$, respectively; formally

$$\cos \angle(F_T, F_{T'}) := \sup\big\{ \langle u, u' \rangle : u \in \operatorname{span}(F_T),\ u' \in \operatorname{span}(F_{T'}),\ \|u\| = \|u'\| = 1 \big\}.$$

We remark that this quantity plays an important role in statistical analysis because of its use to test the significance of correlations between two sets of measurements; compare the literature on canonical correlation analysis [26]. Among other things, it is known [27] that this cosine converges to an explicit limit when the dimensions grow proportionally, which suggests that our bounds on the $\theta$ numbers could be sharpened.

It is also expected that one could use other matrices and still obtain similar results; namely, that $(P_1')$ recovers $f$ exactly provided that the number of corrupted entries does not exceed $\rho \cdot m$. In fact, our previous work suggests that partial Fourier matrices would enjoy similar properties [1], [11]. Other candidates might be the so-called noiselets of Coifman, Geshwind, and Meyer [29]. These alternatives might be of great practical interest because they would come with fast algorithms for applying $A$ or $A^*$ to an arbitrary vector and, hence, speed up the computations to find the $\ell_1$-minimizer.

APPENDIX
PROOF OF LEMMA 2.2

We may normalize $\|e\| = 1$ and write $T_0 := T$. Using Lemma 2.1, we can find a vector $w_1$ and a set $T_1 \subset \{1, \ldots, m\}$, disjoint from $T_0$, such that $\langle w_1, v_j \rangle = e_j$ for all $j \in T_0$, the coefficients $\langle w_1, v_j \rangle$ are uniformly small outside $T_0 \cup T_1$, and their $\ell_2$ norm over the exceptional set $T_1$ is controlled. Applying Lemma 2.1 again, this time to the vector of coefficients of $w_1$ on $T_1$ (extended by zero to $T_0$), produces a vector $w_2$ and a new exceptional set $T_2$ with analogous properties, and so on; at each step the coefficients of $w_{n+1}$ reproduce those of $w_n$ on $T_n$ while vanishing on $T_0$. On the other hand, if $j$ lies outside $T_n$ and $T_{n+1}$, then the corresponding coefficients stay uniformly small and decay geometrically in $n$, which by the triangle inequality and the geometric series formula gives a uniform bound off $T_0$. By hypothesis, the decay factor is less than one. Thus, if we set

$$w := \sum_{n \geq 1} (-1)^{n-1} w_n$$

then the series is absolutely convergent and, therefore, $w$ is a well-defined vector in $\mathbb{R}^p$. We now study the coefficients

$$\langle w, v_j \rangle = \sum_{n \geq 1} (-1)^{n-1} \langle w_n, v_j \rangle \tag{1.30}$$

for $j \in \{1, \ldots, m\}$. Consider first $j \in T_0$: it follows from the construction that $\langle w_1, v_j \rangle = e_j$ and $\langle w_{n+1}, v_j \rangle = 0$ for all $n \geq 1$, and hence, $\langle w, v_j \rangle = e_j$ for all $j \in T_0$. Second, fix $j$ with $j \notin T_0$ and let $I_j := \{ n \geq 1 : j \in T_n \}$. Since $T_n$ and $T_{n+1}$ are disjoint, we see that the integers in the set $I_j$ are spaced at least two apart. Now if $j \in T_n$, then by definition $\langle w_{n+1}, v_j \rangle = \langle w_n, v_j \rangle$ and, therefore, the $n$ and $n+1$ terms in (1.30) cancel each other out. Thus, only uniformly small terms remain in (1.30), and summing the geometric series gives the bound (2.18), which concludes the proof.
REFERENCES

[1] E. J. Candes and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies?," IEEE Trans. Inf. Theory. Available on the arXiv preprint server: math.CA/0410542, submitted for publication.
[2] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco, CA: Freeman, 1979.
[3] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, pp. 33–61, 1998.
[4] P. Bloomfield and W. Steiger, Least Absolute Deviations: Theory, Applications, and Algorithms. Boston, MA: Birkhäuser, 1983.
[5] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2845–2862, Nov. 2001.
[6] R. Gribonval and M. Nielsen, "Sparse representations in unions of bases," IEEE Trans. Inf. Theory, vol. 49, no. 12, pp. 3320–3325, Dec. 2003.
[7] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization," Proc. Nat. Acad. Sci. USA, vol. 100, 2003, pp. 2197–2202.
[8] M. Elad and A. M. Bruckstein, "A generalized uncertainty principle and sparse representation in pairs of bases," IEEE Trans. Inf. Theory, vol. 48, no. 9, pp. 2558–2567, Sep. 2002.
[9] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Inf. Theory, vol. 50, no. 10, pp. 2231–2242, Oct. 2004.
[10] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory. Available on the arXiv preprint server: math.GM/0409186, submitted for publication.
[11] E. J. Candes and J. Romberg, "Quantitative robust uncertainty principles and optimally sparse decompositions," Found. Comput. Math. Available on the arXiv preprint server: math.CA/0411273, to be published.
[12] D. L. Donoho, "For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution," unpublished manuscript, Sep. 2004.
[13] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[14] V. A. Marchenko and L. A. Pastur, "Distribution of eigenvalues in certain sets of random matrices" (in Russian), Mat. Sbornik (N.S.), vol. 72, pp. 407–535, 1967.
[15] S. Geman, "A limit theorem for the norm of random matrices," Ann. Probab., vol. 8, pp. 252–261, 1980.
[16] J. W. Silverstein, "The smallest eigenvalue of a large dimensional Wishart matrix," Ann. Probab., vol. 13, pp. 1364–1368, 1985.
[17] Z. D. Bai and Y. Q. Yin, "Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix," Ann. Probab., vol. 21, pp. 1275–1294, 1993.
[18] S. J. Szarek, "Condition numbers of random matrices," J. Complexity, vol. 7, pp. 131–149, 1991.
[19] M. Ledoux, The Concentration of Measure Phenomenon. Providence, RI: Amer. Math. Soc., 2001, vol. 89, Mathematical Surveys and Monographs.
[20] N. El Karoui, "New results about random covariance matrices and statistical applications," Ph.D. dissertation, Stanford Univ., Stanford, CA, 2004.
[21] D. L. Donoho, "Compressed sensing," unpublished manuscript, Sep. 2004.
[22] J. Feldman, "Decoding error-correcting codes via linear programming," Ph.D. dissertation, MIT, Cambridge, MA, 2003.
[23] J. Feldman, "LP decoding achieves capacity," presented at the ACM-SIAM Symp. Discrete Algorithms (SODA), Vancouver, BC, Canada, Jan. 2005.
[24] J. Feldman, T. Malkin, C. Stein, R. A. Servedio, and M. J. Wainwright, "LP decoding corrects a constant fraction of errors," in Proc. IEEE Int. Symp. Information Theory (ISIT), Chicago, IL, Jun./Jul. 2004, p. 68.
[25] J. Feldman, M. J. Wainwright, and D. R. Karger, "Using linear programming to decode binary linear codes," IEEE Trans. Inf. Theory, vol. 51, no. 3, pp. 954–972, Mar. 2005.
[26] R. J. Muirhead, Aspects of Multivariate Statistical Theory. New York: Wiley, 1982, Wiley Series in Probability and Mathematical Statistics.
[27] K. W. Wachter, "The limiting empirical measure of multiple discriminant ratios," Ann. Statist., vol. 8, pp. 937–957, 1980.
[28] I. M. Johnstone, "Large covariance matrices (Third 2004 Wald Lecture)," presented at the 6th World Congress of the Bernoulli Society and 67th Annual Meeting of the IMS, Barcelona, Spain, 2004.
[29] R. Coifman, F. Geshwind, and Y. Meyer, "Noiselets," Appl. Comput. Harmon. Anal., vol. 10, pp. 27–44, 2001.