Decoding by Linear Programming (Candès and Tao, 2005)
Abstract—This paper considers a natural error correcting problem with real-valued input/output. We wish to recover an input vector $f \in \mathbb{R}^n$ from corrupted measurements $y = Af + e$. Here, $A$ is an $m$ by $n$ (coding) matrix and $e$ is an arbitrary and unknown vector of errors. Is it possible to recover $f$ exactly from the data $y$?

We prove that under suitable conditions on the coding matrix $A$, the input $f$ is the unique solution to the $\ell_1$-minimization problem ($\|x\|_{\ell_1} := \sum_i |x_i|$)

$$\min_{g \in \mathbb{R}^n} \|y - Ag\|_{\ell_1}$$

provided that the support of the vector of errors is not too large, $\|e\|_{\ell_0} := |\{i : e_i \neq 0\}| \leq \rho \cdot m$ for some $\rho > 0$. In short, $f$ can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program). In addition, numerical experiments suggest that this recovery procedure works unreasonably well; $f$ is recovered exactly even in situations where a significant fraction of the output is corrupted.

This work is related to the problem of finding sparse solutions to vastly underdetermined systems of linear equations. There are also significant connections with the problem of recovering signals from highly incomplete measurements. In fact, the results introduced in this paper improve on our earlier work. Finally, underlying the success of $\ell_1$ is a crucial property we call the uniform uncertainty principle that we shall describe in detail.

Index Terms—Basis pursuit, decoding of (random) linear codes, duality in optimization, Gaussian random matrices, $\ell_1$ minimization, linear codes, linear programming, principal angles, restricted orthonormality, singular values of random matrices, sparse solutions to underdetermined systems.

I. INTRODUCTION

A. Decoding of Linear Codes

In its abstract form, our problem is equivalent to the classical error correcting problem arising in coding theory, as we may think of $A$ as a linear code; a linear code is a given collection of codewords which are vectors $a_1, \ldots, a_n \in \mathbb{R}^m$—the columns of the matrix $A$. We would like to emphasize, however, that there is a clear distinction between our real-valued setting and the finite alphabet one which is more common in the information theory literature.

Given a vector $f \in \mathbb{R}^n$ (the "plaintext") we can then generate a vector $Af$ in $\mathbb{R}^m$ (the "ciphertext"); if $A$ has full rank, then one can clearly recover the plaintext $f$ from the ciphertext $Af$. But now we suppose that the ciphertext $Af$ is corrupted by an arbitrary vector $e \in \mathbb{R}^m$, giving rise to the corrupted ciphertext $y = Af + e$. The question is then: given the coding matrix $A$ and $y$, can one recover $f$ exactly?

As is well known, if the fraction of the corrupted entries is too large, then of course we have no hope of reconstructing $f$ from $y$; for instance, assume $m = 2n$ and consider two distinct plaintexts $f$, $f'$, and form a vector $g \in \mathbb{R}^m$ by concatenating the first half of $Af$ together with the second half of $Af'$. Then $g = Af + e = Af' + e'$, where both $e$ and $e'$ are supported on sets of size at most $n = m/2$. This simple example shows that accurate decoding is impossible when the size of the support of the error vector is greater than or equal to a half of that of the output $Af$. Therefore, a common assumption in the literature is to assume that only a small fraction of the entries are actually damaged:

$$\|e\|_{\ell_0} := |\{i : e_i \neq 0\}| \leq \rho \cdot m. \tag{1.1}$$

For which values of $\rho$ can we hope to reconstruct $e$ with practical algorithms? That is, with algorithms whose complexity is at most polynomial in the length $m$ of the code $A$?

To reconstruct $f$, note that it is obviously sufficient to reconstruct the vector $e$ since knowledge of $y = Af + e$ together with $e$ gives $Af$, and consequently $f$, since $A$ has full rank. Our approach is then to apply a matrix $F$ annihilating $A$ on the left, i.e., obeying $FA = 0$; the data $\tilde{y} := Fy = Fe$ no longer depend on the plaintext, and the decoding problem becomes that of recovering the sparse vector of errors $e$ from the underdetermined system of equations $Fd = \tilde{y}$.
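To fix ideas, here is a small numerical sketch of this setup (ours, not the paper's; the dimensions, the corruption fraction, and the random seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 128, 256          # plaintext length and codeword length (illustrative)
rho = 0.10               # fraction of corrupted entries (illustrative)

A = rng.standard_normal((m, n))      # m-by-n Gaussian coding matrix
f = rng.standard_normal(n)           # plaintext
support = rng.choice(m, size=int(rho * m), replace=False)
e = np.zeros(m)
e[support] = rng.standard_normal(support.size)   # arbitrary errors on a small support

y = A @ f + e            # corrupted ciphertext: y = Af + e
print("corrupted entries:", support.size, "out of", m)
```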
B. Sparse Solutions to Underdetermined Systems

Finding sparse solutions to underdetermined systems of linear equations is in general NP-hard. For example, the sparsest solution is given by

$$(P_0)\qquad \min_{d \in \mathbb{R}^m} \|d\|_{\ell_0} \quad \text{subject to} \quad Fd = \tilde{y} \tag{1.3}$$

and to the best of our knowledge, solving this problem essentially requires exhaustive searches over all subsets of columns of $F$, a procedure which clearly is combinatorial in nature and has exponential complexity.

Formally, given an integer matrix $F$ and an integer vector $\tilde{y}$, the problem of deciding whether there is a vector $d$ with rational entries such that $Fd = \tilde{y}$, and with fewer than a fixed number of nonzero entries, is NP-complete, see [2, Problem MP5]. In fact, the $\ell_0$-minimization problem contains the subset-sum problem. Assume, for instance, that we are given $2m$ vectors in $\mathbb{R}^{m+1}$ of the form

$$v_j := e_j + a_j\, e_{m+1}, \qquad v_{m+j} := e_j, \qquad j = 1, \ldots, m,$$

where $e_1, \ldots, e_{m+1}$ are the usual basis vectors and $a_1, \ldots, a_m$ are integers. Now let $s$ be another integer and consider the problem of deciding whether

$$\tilde{y} := e_1 + \cdots + e_m + s\, e_{m+1}$$

is an $m$-sparse linear combination of the above vectors ($(m-1)$-sparse is impossible). This is exactly the subset-sum problem, i.e., whether one can write $s$ as a sum of a subset of $\{a_1, \ldots, a_m\}$. It is well known that this is an NP-complete problem.

This computational intractability has recently led researchers to develop alternatives to $(P_0)$, and a frequently discussed approach considers a similar program in the $\ell_1$-norm which goes by the name of basis pursuit [3]

$$(P_1)\qquad \min_{d \in \mathbb{R}^m} \|d\|_{\ell_1} \quad \text{subject to} \quad Fd = \tilde{y}. \tag{1.4}$$

A series of works has shown that, under appropriate conditions on $F$, the two programs have the same solution provided that the number of nonzero entries in the vector $d$ is not too large [1], [11]. In the special case where $F$ is an $(m-n)$ by $m$ random matrix with independent standard normal entries, [12] proved that the number of nonzero entries may be as large as $\rho \cdot m$, where $\rho$ is some very small and unspecified positive constant independent of $m$.

C. Innovations

This paper introduces the concept of a restrictedly almost orthonormal system—a collection of vectors which behaves like an almost orthonormal system but only for sparse linear combinations. Thinking about these vectors as the columns of the matrix $F$, we show that this condition allows for the exact reconstruction of sparse linear combinations of these vectors, i.e., exact recovery of the error vector $e$. Our results are significantly different from those mentioned above as they are deterministic and do not involve any kind of randomization, although they can of course be specialized to random matrices. For instance, we shall see that a Gaussian matrix with independent entries sampled from the standard normal distribution is restrictedly almost orthonormal with overwhelming probability, and that minimizing the $\ell_1$-norm recovers sparse decompositions with a number of nonzero entries of size $\rho \cdot m$; we shall actually give numerical values for $\rho$.

We presented the connection with sparse solutions to underdetermined systems of linear equations merely for pedagogical reasons. There is a more direct approach. To recover $f$ from corrupted data $y = Af + e$, we consider solving the following $\ell_1$-minimization problem:

$$(P_1')\qquad \min_{g \in \mathbb{R}^n} \|y - Ag\|_{\ell_1}. \tag{1.5}$$

Now $f$ is the unique solution of $(P_1')$ if and only if $e$ is the unique solution of $(P_1)$. In other words, $(P_1')$ and $(P_1)$ are equivalent programs. To see why this is true, observe on the one hand that since $y = Af + e$, we may decompose $g$ as $g = f + h$ so that

$$\min_{g \in \mathbb{R}^n} \|y - Ag\|_{\ell_1} = \min_{h \in \mathbb{R}^n} \|e - Ah\|_{\ell_1}.$$
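The program (1.5) can be recast as a linear program by introducing slack variables $t \in \mathbb{R}^m$ bounding $|y - Ag|$ componentwise. The following sketch is an illustration (not the authors' code): it uses `scipy.optimize.linprog` to solve $\min_{g,t} \sum_i t_i$ subject to $-t \le y - Ag \le t$, reusing the matrix $A$, data $y$, and plaintext $f$ built in the previous snippet.

```python
import numpy as np
from scipy.optimize import linprog

def l1_decode(A, y):
    """Solve min_g ||y - A g||_1 by linear programming (a sketch of program (1.5))."""
    m, n = A.shape
    # Variables: x = [g (n entries), t (m entries)]; minimize sum(t).
    c = np.concatenate([np.zeros(n), np.ones(m)])
    I = np.eye(m)
    # Encode -t <= y - A g <= t as two blocks of inequalities A_ub x <= b_ub.
    A_ub = np.block([[-A, -I], [A, -I]])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

# A, y, f as constructed in the previous sketch.
f_hat = l1_decode(A, y)
print("exact recovery:", np.allclose(f_hat, f, atol=1e-6))
```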
The condition (1.10) is roughly "three times as strict" as the condition required for Lemma 1.2.

Theorem 1.3 is inspired by our previous work [1], see also [10], [11], but unlike those earlier results, our results here are deterministic, and thus do not have a nonzero probability of failure, provided of course one can ensure that the system verifies the condition (1.10). By virtue of the previous discussion, we have the companion result.

Theorem 1.4: Suppose $F$ is such that $FA = 0$ and let $S \geq 1$ be a number obeying the hypothesis of Theorem 1.3. Set $y = Af + e$, where $e$ is a real vector supported on a set of size at most $S$. Then $f$ is the unique minimizer to

$$(P_1')\qquad \min_{g \in \mathbb{R}^n} \|y - Ag\|_{\ell_1}.$$

F. Gaussian Random Matrices

In this special case, one may think of the annihilating matrix $F$ (such that $FA = 0$) as a matrix with independent Gaussian entries. Observe that the range of $A$ is a random space of dimension $n$ embedded in $\mathbb{R}^m$, so that the data $\tilde{y} = Fe$ is the projection of $e$ on a random space of dimension $m - n$. The range of an $(m-n)$ by $m$ matrix with independent Gaussian entries precisely is a random subspace of dimension $m - n$, which justifies the claim.

We would like to point out that the numerical bounds we derived in this paper are overly pessimistic. We are confident that finer arguments and perhaps new ideas will allow us to derive versions of Theorem 1.5 with better bounds. The discussion section will enumerate several possibilities for improvement.

G. A Notion of Capacity

We now develop a notion of "capacity;" that is, an upper bound on the support size of the error vector beyond which recovery is information-theoretically impossible. Define the cospark of the matrix $A$ as

$$\operatorname{cospark}(A) := \min_{f \neq 0} \|Af\|_{\ell_0}.$$

With probability one, a Gaussian matrix $A$ allows one to theoretically recover any plaintext if and only if the fraction of errors obeys

$$\rho \cdot m < \tfrac{1}{2}\operatorname{cospark}(A) = \tfrac{1}{2}(m - n + 1). \tag{1.13}$$

In other words, given a fraction of error $\rho$, accurate decoding is possible provided that the "rate" $R := n/m$ obeys $R \leq 1 - 2\rho$. This may help to establish a link between this work and other work in the information theory literature.

Proof: It is clear that $\operatorname{cospark}(A) \leq m - n + 1$ as one can find an $f \neq 0$ so that the first $n - 1$ entries of $Af$ vanish. Further, all the $n$ by $n$ submatrices of $A$ are invertible with probability $1$. This follows from the fact that each $n$ by $n$ submatrix is invertible with probability one, and that there is a finite number of them. With probability one, therefore, it is impossible to find $f \neq 0$ such that $Af$ has $n$ or more vanishing entries. This gives $\operatorname{cospark}(A) = m - n + 1$.
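For very small sizes, the two facts used in this proof can be checked by brute force. The sketch below is illustrative only (the exhaustive loop over submatrices is exponential in $m$, and the sizes are ours): it verifies that every $n$-by-$n$ submatrix of a tiny Gaussian $A$ has nonzero determinant, and exhibits an $f$ for which the first $n-1$ entries of $Af$ vanish.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
m, n = 10, 4                         # tiny sizes so the exhaustive check is feasible
A = rng.standard_normal((m, n))

# Every n-by-n submatrix should be invertible (nonzero determinant) ...
min_abs_det = min(abs(np.linalg.det(A[list(rows), :]))
                  for rows in combinations(range(m), n))
print("smallest |det| over all n-by-n submatrices:", min_abs_det)

# ... hence Af cannot have n or more zero entries unless f = 0, while one can
# always zero out n-1 chosen entries, giving cospark(A) = m - n + 1 here.
rows = list(range(n - 1))
f = np.linalg.svd(A[rows, :])[2][-1]   # unit vector in the null space of the first n-1 rows
print("entries of Af below 1e-10:", int(np.sum(np.abs(A @ f) < 1e-10)))
```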
H. Organization of the Paper

The paper is organized as follows. Section II proves our main claim, namely, Theorem 1.3 (and hence Theorem 1.4) while Section III introduces elements from random matrix theory to establish Theorem 1.5. In Section IV, we present numerical experiments which suggest that in practice, $\ell_1$-minimization works unreasonably well and recovers $f$ exactly from $y$ provided that the fraction of the corrupted entries be less than about 17% in the case where $m = 2n$ and less than about 34% in the case where $m = 4n$. Section V explores the consequences of our results for the recovery of signals from highly incomplete data and ties our findings with some of our earlier work. Finally, we conclude with a short discussion section whose main purpose is to outline areas for improvement.
II. PROOF OF MAIN RESULTS

Our main result, namely, Theorem 1.3, is proved by duality. As we will see in Section II-A, $e$ is the unique minimizer of $(P_1)$ if the matrix $F_T$ has full rank and if one can find a vector $w$ with the two properties

i) $\langle w, v_j \rangle = \operatorname{sgn}(e_j)$, for all $j \in T$;
ii) $|\langle w, v_j \rangle| < 1$, for all $j \notin T$;

where $v_j$ denotes the $j$th column of $F$, $T$ is the support of $e$, and $\operatorname{sgn}(e_j)$ is the sign of $e_j$ ($\operatorname{sgn}(e_j) = 0$, for $e_j = 0$). The two conditions above say that a specific dual program is feasible and is called the exact reconstruction property in [1], see also [10]. The reader might find it helpful to see how these conditions arise and we first informally sketch the argument (a rigorous proof follows in Section II-A). Suppose we wish to minimize $J(d)$ subject to $Fd = \tilde{y}$, and that $J$ is differentiable. Together with the constraint, the classical Lagrange multiplier optimality condition (see [13]) asserts that there exists $w$ ($w$ is a Lagrange multiplier) such that

$$\nabla J(d) = F^* w.$$

In a nutshell, for $d$ to be a solution, we must have $\nabla J(d) \in \operatorname{range}(F^*)$. In our case $J(d) = \|d\|_{\ell_1}$, and so the derivative of $|d_j|$ is $\operatorname{sgn}(d_j)$ for $d_j \neq 0$, but the absolute value is otherwise not smooth at zero. In this case, we ask that $F^* w$ be a subgradient of the $\ell_1$-norm at the point $d$ [13], which here means

$$(F^* w)_j = \operatorname{sgn}(d_j) \ \text{ if } d_j \neq 0, \qquad |(F^* w)_j| \leq 1 \ \text{ otherwise.}$$

Since by definition $e$ is supported on $T$, conditions i) and ii) are now natural.

A. Proof of Theorem 1.3

Observe first that standard convex arguments give that there exists at least one minimizer $d$ to the problem $(P_1)$. We need to prove that $d = e$. Since $e$ obeys the constraints of this problem, $d$ obeys

$$\|d\|_{\ell_1} \leq \|e\|_{\ell_1}. \tag{2.14}$$

Now take a $w$ obeying properties i) and ii) (see the remark following Lemma 2.2). Using the fact that the inner product $\langle w, v_j \rangle$ is equal to the sign of $e_j$ on $T$ and has absolute value strictly less than one on the complement, we then compute

$$\|d\|_{\ell_1} = \sum_{j \in T} |d_j| + \sum_{j \notin T} |d_j| \geq \sum_{j} \langle w, v_j \rangle\, d_j = \langle w, Fd \rangle = \langle w, Fe \rangle = \sum_{j \in T} \operatorname{sgn}(e_j)\, e_j = \|e\|_{\ell_1}.$$

Comparing this with (2.14) we see that all the inequalities in the above computation must in fact be equalities. Since $|\langle w, v_j \rangle|$ was strictly less than one for all $j \notin T$, this in particular forces $d_j = 0$ for all $j \notin T$. Thus,

$$F_T d_T = Fd = Fe = F_T e_T.$$

Applying (1.7) (and noting from the hypothesis that $\delta_{|T|} < 1$) we conclude that $d_j = e_j$ for all $j \in T$. Thus, $d = e$ as claimed.

For $|T|$ obeying the hypothesis of Theorem 1.3, $F_T$ has full rank since $\delta_{|T|} < 1$ and thus, the proof simply consists in constructing a dual vector $w$; this is the object of the next section. This concludes the proof of our theorem.

B. Exact Reconstruction Property

We now examine the exact reconstruction property and begin with Lemma 2.1 which establishes that the coefficients $\langle w, v_j \rangle$, $j \notin T$, are suitably bounded except for an exceptional set. Lemma 2.2 strengthens this result by eliminating the exceptional set.
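Conditions i) and ii) can also be tested numerically for a given support. A natural candidate is the least-squares vector $w$ that matches the signs on $T$ exactly, which is essentially the construction used in Lemma 2.1 below. The sketch that follows is an illustration with hypothetical sizes (not the paper's code): it builds this $w$ for a random Gaussian $F$ and checks both conditions.

```python
import numpy as np

rng = np.random.default_rng(2)
p, m, S = 120, 256, 10                         # hypothetical sizes
F = rng.standard_normal((p, m)) / np.sqrt(p)   # columns have roughly unit norm

T = rng.choice(m, size=S, replace=False)
signs = rng.choice([-1.0, 1.0], size=S)        # a sign pattern on T

# Least-squares dual vector: chosen so that F_T^* w equals the signs exactly.
FT = F[:, T]
w = FT @ np.linalg.solve(FT.T @ FT, signs)

coeffs = F.T @ w
off_T = np.setdiff1d(np.arange(m), T)
print("condition i) holds:", np.allclose(coeffs[T], signs))
print("max |<w, v_j>| off T:", np.max(np.abs(coeffs[off_T])))   # condition ii) wants this < 1
```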
Lemma 2.1 (Dual Reconstruction Property, $\ell_2$ version): Let $S, S' \geq 1$ be such that $\delta_S < 1$, and let $e$ be a real vector supported on a set $T$ such that $|T| \leq S$. Then there exists a vector $w$ such that $\langle w, v_j \rangle = e_j$ for all $j \in T$. Furthermore, there is an "exceptional set" $E$ which is disjoint from $T$, of size at most

$$|E| \leq S' \tag{2.15}$$

and with the properties

$$|\langle w, v_j \rangle| \leq \frac{\theta}{(1-\delta)\sqrt{S'}}\, \|e\| \quad \text{for all } j \notin T \cup E$$

and

$$\Big(\sum_{j \in E} |\langle w, v_j \rangle|^2\Big)^{1/2} \leq \frac{\theta}{1-\delta}\, \|e\|,$$

where $\delta := \delta_S$ and $\theta := \theta_{S,S'}$ for short. In addition, $\|w\| \leq K\|e\|$ for some constant $K$ only depending upon $\delta$.

Proof: Recall that $F_T : \mathbb{R}^T \to \mathbb{R}^p$ is the linear transformation $F_T c_T := \sum_{j \in T} c_j v_j$ where $c_T \in \mathbb{R}^T$ (we use the subscript $T$ in $c_T$ to emphasize that the input is a $|T|$-dimensional vector), and let $F_T^*$ be the adjoint transformation

$$F_T^* w := \big(\langle v_j, w \rangle\big)_{j \in T}.$$

Property (1.7) gives

$$1 - \delta \leq \lambda_{\min}(F_T^* F_T) \leq \lambda_{\max}(F_T^* F_T) \leq 1 + \delta,$$

where $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum eigenvalues of the positive-definite operator $F_T^* F_T$. In particular, since $\delta < 1$, we see that $F_T^* F_T$ is invertible with

$$\|(F_T^* F_T)^{-1}\| \leq \frac{1}{1-\delta}. \tag{2.16}$$

Also note that $\|F_T (F_T^* F_T)^{-1}\| \leq (1-\delta)^{-1/2}$, and set $w$ to be the vector

$$w := F_T (F_T^* F_T)^{-1} e_T;$$

it is then clear that $F_T^* w = e_T$, i.e., $\langle w, v_j \rangle = e_j$ for all $j \in T$. In addition, $\|w\| \leq K\|e\|$ with $K := (1-\delta)^{-1/2}$.

Finally, if $E$ is any set in $\{1, \ldots, m\}$ disjoint from $T$ with $|E| \leq S'$ and $(c_j)_{j \in E}$ is any sequence of real numbers, then (1.8) and (2.16) give

$$\Big|\Big\langle w, \sum_{j \in E} c_j v_j \Big\rangle\Big| = \big|\big\langle (F_T^* F_T)^{-1} e_T,\ F_T^* F_E c \big\rangle\big| \leq \frac{\theta}{1-\delta}\, \|e\|\, \|c\|.$$

In other words,

$$\Big(\sum_{j \in E} |\langle w, v_j \rangle|^2\Big)^{1/2} \leq \frac{\theta}{1-\delta}\, \|e\| \tag{2.17}$$

whenever $E \cap T = \emptyset$ and $|E| \leq S'$. In particular, if we set

$$E := \Big\{ j \notin T : |\langle w, v_j \rangle| > \frac{\theta}{(1-\delta)\sqrt{S'}}\, \|e\| \Big\},$$

then $E$ must obey $|E| \leq S'$, since otherwise we could contradict (2.17) by taking a subset of $E$ of cardinality $S'$. The claims now follow.

We now derive a solution with better control on the sup norm of $\langle w, v_j \rangle$ outside of $T$, by iterating away the exceptional set (while keeping the values on $T$ fixed). The proof of the following lemma is given in the Appendix.

Lemma 2.2 (Dual Reconstruction Property, $\ell_\infty$ version): Let $S, S' \geq 1$ be such that $\delta + \theta < 1$, and let $e$ be a real vector supported on $T$ obeying $|T| \leq S$. Then there exists a vector $w$ such that $\langle w, v_j \rangle = e_j$ for all $j \in T$. Furthermore, $\langle w, v_j \rangle$ obeys the uniform bound (2.18) for all $j \notin T$, in which the right-hand side is a multiple of $\|e\|/\sqrt{S'}$ with a factor depending only on $\delta$ and $\theta$.

Lemma 2.2 actually solves the dual recovery problem. Indeed, our result states that one can find a vector $w$ obeying both properties i) and ii) stated at the beginning of the section. To see why ii) holds, observe that when $e$ is taken to be the sign vector of the error (so that $\|e\| \leq \sqrt{S}$), (2.18) gives for all $j \notin T$ a bound on $|\langle w, v_j \rangle|$ which is strictly less than one provided that the hypothesis of Theorem 1.3 holds.

It is likely that one may push the condition of Theorem 1.3 a little further. The key idea is as follows. Each vector in the iteration scheme used to prove Lemma 2.2 was designed to annihilate the influence of the previous iterate on the exceptional set. But annihilation is too strong of a goal. It would be just as suitable to design the next iterate to moderate the influence of the previous one enough so that the inner product with elements in the exceptional set is small rather than zero. However, we have not pursued such refinements as the arguments would become considerably more complicated than the calculations presented here.

C. Approximate Orthogonality

Lemma 1.1 gives control of the size of the principal angle between subspaces of dimension $S$ and $S'$, respectively. This is useful because it allows us to guarantee exact reconstruction from the knowledge of the $\delta$ numbers only.

Proof of Lemma 1.1: We first show that $\theta_{S,S'} \leq \delta_{S+S'}$. By homogeneity it will suffice to show that

$$|\langle F_T c, F_{T'} c' \rangle| \leq \delta_{S+S'}$$

whenever $\|c\| = \|c'\| = 1$, $T$ and $T'$ are disjoint, $|T| \leq S$, and $|T'| \leq S'$. Now (1.7) gives

$$2(1 - \delta_{S+S'}) \leq \|F_T c + F_{T'} c'\|^2 \leq 2(1 + \delta_{S+S'})$$

together with

$$2(1 - \delta_{S+S'}) \leq \|F_T c - F_{T'} c'\|^2 \leq 2(1 + \delta_{S+S'}),$$

and the claim now follows from the parallelogram identity

$$\langle F_T c, F_{T'} c' \rangle = \tfrac{1}{4}\big( \|F_T c + F_{T'} c'\|^2 - \|F_T c - F_{T'} c'\|^2 \big)$$

so that $|\langle F_T c, F_{T'} c' \rangle| \leq \delta_{S+S'}$.

It remains to show that $\delta_{S+S'} \leq \theta_{S,S'} + \max(\delta_S, \delta_{S'})$. Again by homogeneity, it suffices to establish that

$$\big|\, \|F c\|^2 - \|c\|^2 \,\big| \leq \big(\theta_{S,S'} + \max(\delta_S, \delta_{S'})\big)\, \|c\|^2$$

for every $c$ supported on $T \cup T'$ with $T$, $T'$ disjoint, $|T| \leq S$, $|T'| \leq S'$. Writing $c = c_T + c_{T'}$, we have

$$\|F(c_T + c_{T'})\|^2 = \|F_T c_T\|^2 + \|F_{T'} c_{T'}\|^2 + 2\langle F_T c_T, F_{T'} c_{T'} \rangle.$$

Hence,

$$\big|\, \|F c\|^2 - \|c\|^2 \,\big| \leq \max(\delta_S, \delta_{S'})\big( \|c_T\|^2 + \|c_{T'}\|^2 \big) + 2\theta_{S,S'} \|c_T\|\,\|c_{T'}\| \leq \big(\max(\delta_S, \delta_{S'}) + \theta_{S,S'}\big)\|c\|^2$$

as claimed. (We note that it is possible to optimize this bound a little further but will not do so here.)

III. GAUSSIAN RANDOM MATRICES

Restricted isometry constants must hold for all sets $T$ of cardinality less than or equal to $S$, and we shall make use of concentration inequalities to develop such a uniform bound. Note that for $T' \subset T$, we obviously have $\delta_{|T'|} \leq \delta_{|T|}$ and, therefore, attention may be restricted to matrices of size $S$, i.e., to the submatrices $F_T$ with $|T| = S$. Now, there are large deviation results about the singular values of Gaussian matrices [18]. For example, letting $\sigma_{\max}(F_T)$ (resp., $\sigma_{\min}(F_T)$) be the largest (resp., smallest) singular value of $F_T$, Ledoux [19] applies the concentration inequality for Gaussian measures, and for each fixed $t > 0$, obtains the deviation bounds

$$\mathbb{P}\big( \sigma_{\max}(F_T) > 1 + \sqrt{S/p} + o(1) + t \big) \leq e^{-p t^2/2} \tag{3.19}$$

and

$$\mathbb{P}\big( \sigma_{\min}(F_T) < 1 - \sqrt{S/p} + o(1) - t \big) \leq e^{-p t^2/2},$$

where $o(1)$ denotes a small term tending to zero as $p$ grows.
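These concentration bounds are easy to visualize empirically: for a $p$-by-$S$ Gaussian matrix normalized by $1/\sqrt{p}$, the extreme singular values cluster near $1 \pm \sqrt{S/p}$. The sketch below (an illustration with arbitrary sizes, not tied to the paper's constants) samples a few matrices and compares.

```python
import numpy as np

rng = np.random.default_rng(3)
p, S, trials = 400, 40, 200

smax, smin = [], []
for _ in range(trials):
    G = rng.standard_normal((p, S)) / np.sqrt(p)
    sv = np.linalg.svd(G, compute_uv=False)
    smax.append(sv[0])
    smin.append(sv[-1])

print("predicted edges:", 1 - np.sqrt(S / p), 1 + np.sqrt(S / p))
print("observed means :", np.mean(smin), np.mean(smax))
print("observed spread:", np.std(smin), np.std(smax))
```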
Lemma 3.1: For each $S$, the restricted isometry constant $\delta_S$ of a $p$ by $m$ Gaussian matrix obeys, for $p$ and $S$ large enough, the bound (3.21), in which the right-hand side depends only on the ratios $S/p$ and $m/p$ through the entropy function $H(q) := -q \log q - (1-q)\log(1-q)$, defined for $0 < q < 1$.

Proof: As discussed above, we may restrict our attention to sets $T$ such that $|T| = S$. Let $\bar\sigma_{\max} := \max_T \sigma_{\max}(F_T)$, where the maximum is over all sets $T$ of size $S$, and similarly $\bar\sigma_{\min} := \min_T \sigma_{\min}(F_T)$. There are at most $\binom{m}{S} \leq e^{m H(S/m)}$ such sets, and suppose $S$ and $p$ are large enough so that the $o(1)$ terms in (3.19) are negligible. Then the union bound combined with the deviation inequalities of the previous section shows that, with overwhelming probability, $\bar\sigma_{\max}$ exceeds $1 + \sqrt{S/p}$ by at most a term of the order of $\sqrt{2 H(S/m)\, m/p}$, and $\bar\sigma_{\min}$ falls below $1 - \sqrt{S/p}$ by at most the same amount; since $1 + \delta_S = \bar\sigma_{\max}^2$ and $1 - \delta_S = \bar\sigma_{\min}^2$, this gives the claim. For $p$ sufficiently large, the residual term is less than any prescribed tolerance, which concludes the proof.

Ignoring the $o(1)$'s, Lemma 3.1 states that with overwhelming probability $\delta_S$ is bounded by the explicit quantity in (3.22). A similar conclusion holds for $\delta_{S+S'}$ and $\theta_{S,S'}$ and, therefore, we established that with very high probability the quantity appearing in the hypothesis of Theorem 1.3 is bounded by an explicit function of the ratios $S/m$ and $p/m$, as in (3.23). In conclusion, Lemma 1.1 shows that the hypothesis of our main theorem holds provided that the ratio $r := S/m$ is small enough that this bound is less than one. In other words, in the limit of large samples, $\rho$ may be taken as any value obeying the resulting inequality, which we used to give numerical values in Theorem 1.5. Fig. 1 graphs the function for several values of the ratio $p/m$.

Fig. 1. Behavior of the upper bound as a function of $r$ for three values of the ratio $p/m$, namely, $p/m = 3/4$, $2/3$, $1/2$.

IV. NUMERICAL EXPERIMENTS

This section investigates the practical ability of $(P_1')$ to recover an object $f \in \mathbb{R}^n$ from corrupted data $y = Af + e$ (or equivalently, to recover the sparse vector of errors $e$ from the underdetermined system of equations $Fd = \tilde{y}$). The goal here is to evaluate empirically the location of the breakpoint so as to get an accurate sense of the performance one might expect in practice. In order to do this, we performed a series of experiments designed as follows (a compact script along these lines is sketched after the list):

1) select $n$ (the size of the input signal) and $m$ so that, with the same notations as before, $A$ is an $m$ by $n$ matrix; sample $A$ with independent Gaussian entries;
2) select $S$ as a percentage of $m$;
3) select a support set $T \subset \{1, \ldots, m\}$ of size $|T| = S$ uniformly at random, and sample a vector $e$ on $T$ with i.i.d. Gaussian entries;¹
4) make $y = Af + e$ (the choice of $f$ does not matter as is clear from the discussion and here, $f$ is also selected at random), solve $(P_1')$ and obtain $f^\star$;
5) compare $f$ to $f^\star$;
6) repeat 100 times for each $S$ and $m$;
7) repeat for various sizes of $n$ and $m$.

¹ Just as in [11], the results presented here do not seem to depend on the actual distribution used to sample the errors.
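A compact version of steps 1)–7) can be scripted as follows; this is a sketch with small illustrative sizes and fewer trials than the 100 used in the paper, and it assumes the `l1_decode` routine from the sketch in Section I is in scope.

```python
import numpy as np

rng = np.random.default_rng(4)

def success_rate(n, m, frac, trials=20):
    """Fraction of trials in which l1_decode recovers f exactly (up to numerical tolerance)."""
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((m, n))
        f = rng.standard_normal(n)
        T = rng.choice(m, size=int(frac * m), replace=False)
        e = np.zeros(m)
        e[T] = rng.standard_normal(T.size)
        f_hat = l1_decode(A, A @ f + e)        # l1_decode as sketched in Section I
        hits += np.allclose(f_hat, f, atol=1e-5)
    return hits / trials

for frac in (0.05, 0.15, 0.25, 0.35):
    print(f"m = 2n = 128, corrupted fraction {frac:.0%}: "
          f"success rate {success_rate(64, 128, frac):.2f}")
```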
Fig. 2. $\ell_1$-recovery of an input signal from $y = Af + e$ with $A$ an $m$ by $n$ matrix with independent Gaussian entries. In this experiment, we "oversample" the input signal by a factor 2 so that $m = 2n$. (a) Success rate of $(P_1')$ for $m = 512$. (b) Success rate of $(P_1')$ for $m = 1024$. Observe the similar pattern and cutoff point. In these experiments, exact recovery occurs as long as about 17% or less of the entries are corrupted.

Fig. 3. $\ell_1$-recovery of an input signal from $y = Af + e$ with $A$ an $m$ by $n$ matrix with independent Gaussian entries. In this experiment, we "oversample" the input signal by a factor 4 so that $m = 4n$. In these experiments, exact recovery occurs as long as about 34% or less of the entries are corrupted.

The results are presented in Figs. 2 and 3. Fig. 2 examines the situation in which the length of the code is twice that of the input vector, for $m = 512$ and $m = 1024$. Our experiments show that one recovers the input vector all the time as long as the fraction of the corrupted entries is below 17%. This holds for $m = 512$ (Fig. 2(a)) and $m = 1024$ (Fig. 2(b)). In Fig. 3, we investigate how these results change as the length of the codewords increases compared to the length of the input, and examine the situation in which $m = 4n$. Our experiments show that one recovers the input vector all the time as long as the fraction of the corrupted entries is below 34%.

It might be tempting to compare the empirical cutoff observed in the figures with the theoretical limit calculated in Corollary 1.8. For example, for $m = 4n$, Fig. 3 suggests a cutoff near 35% while Corollary 1.8 gives a theoretical limit of only 37.5%. One needs to apply caution, however, as these two quantities refer to two distinct situations. On the one hand, the limit guarantees that all signals may be decoded exactly. On the other hand, the numerical simulations examine the case where both $f$ and the error vector $e$ are chosen randomly, and do not search for the worst possible pair. In this sense, the simulations are not really "testing" Theorem 1.5. Under a probability model for the error pattern $e$, say, it is clear that information theoretically, one can tolerate a larger fraction of errors if one wishes to be able to decode accurately for most as opposed to all error vectors.
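For orientation, and assuming the capacity bound of Section I-G is read in the limit of large $m$ as $\rho^* \approx \tfrac{1}{2}(1 - n/m)$ (a paraphrase of ours; the exact statement is Corollary 1.8), the two oversampling regimes used in the experiments give

$$m = 2n:\ \ \rho^* \approx \tfrac{1}{2}\Big(1 - \tfrac{1}{2}\Big) = 25\%, \qquad m = 4n:\ \ \rho^* \approx \tfrac{1}{2}\Big(1 - \tfrac{1}{4}\Big) = 37.5\%,$$

to be compared with the empirical $\ell_1$ cutoffs of roughly 17% and 34–35%, respectively.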
V. OPTIMAL SIGNAL RECOVERY

Our recent work [1] developed a set of ideas showing that it is surprisingly possible to reconstruct interesting classes of signals accurately from highly incomplete measurements. The results in this paper are inspired by and improve upon this earlier work, and we now elaborate on this connection. Suppose we wish to reconstruct an object $\alpha \in \mathbb{R}^m$ from the linear measurements

$$y_k = \langle \alpha, \phi_k \rangle, \quad k = 1, \ldots, p, \qquad \text{or} \qquad y = F\alpha \tag{5.24}$$

with $\phi_k$, the $k$th row of the matrix $F$. Of special interest is the vastly underdetermined case, $p \ll m$, where there are many more unknowns than observations. We choose to formulate the problem abstractly but for concreteness, we might think of $\alpha$ as the coefficients of a digital signal or image in some nice orthobasis, e.g., a wavelet basis, so that the information about the signal is of the form (5.24).

Suppose now that the object of interest is compressible in the sense that the reordered entries of $\alpha$ decay like a power law; concretely, suppose that the entries of $\alpha$, rearranged in decreasing order of magnitude, $|\alpha|_{(1)} \geq |\alpha|_{(2)} \geq \cdots \geq |\alpha|_{(m)}$, obey

$$|\alpha|_{(k)} \leq C\, k^{-r} \tag{5.25}$$

for some $r \geq 1$. We will denote by $\mathcal{F}_r(C)$ the class of all signals obeying (5.25). The claim is that it is possible to reconstruct compressible signals from only a small number of random measurements.

Theorem 5.1: Let $F$ be the measurement matrix as in (5.24) and consider the solution $\alpha^\sharp$ to

$$\min_{g} \|g\|_{\ell_1} \quad \text{subject to} \quad Fg = y. \tag{5.26}$$

Let $S$ be such that $F$ obeys the hypothesis of Theorem 1.3 and set $r$ as in (5.25). Then $\alpha^\sharp$ obeys the approximation error bound (5.27), which is of the order of $S^{-(r - 1/2)}$.

To appreciate the content of the theorem, suppose one would have available an oracle letting us know which coefficients $\alpha_k$, $1 \leq k \leq m$, are large (e.g., in the scenario we considered earlier, the oracle would tell us which wavelet coefficients of the signal are large). Then we would acquire information about the $K$ largest coefficients and obtain a truncated version $\alpha_K$ of $\alpha$ obeying

$$\|\alpha - \alpha_K\| \asymp K^{-(r - 1/2)}$$

for generic elements taken from $\mathcal{F}_r(C)$. Now (5.27) says that, not knowing anything about the location of the largest coefficients, one can essentially obtain the same approximation error by nonadaptive sampling, provided the number of measurements is increased by a constant oversampling factor. The larger the fraction of errors the decoder can tolerate, the smaller the oversampling factor, and hence, the connection with the decoding problem. Such considerations make clear that Theorem 5.1 supplies a very concrete methodology for recovering a compressible object from limited measurements and, as such, it may have a significant bearing on many fields of science and technology. We refer the reader to [1] and [21] for a discussion of its implications.

Suppose, for example, that $F$ is a Gaussian random matrix as in Section III. We will assume the same special normalization so that the variance of each individual entry is equal to $1/p$. Calculations identical to those from Section III give that with overwhelming probability, $F$ obeys the hypothesis of the theorem provided that $S \leq \rho \cdot p$ for some positive constant $\rho > 0$. Now consider the statement of the theorem; there is a way to invoke linear programming and obtain a reconstruction based upon on the order of $K$ measurements only, which is at least as good as that one would achieve by knowing all the information about $\alpha$ and selecting the $K$ largest coefficients. In fact, this is an optimal statement as (5.27) correctly identifies the minimum number of measurements needed to obtain a given precision. In short, it is impossible to obtain a precision of about $K^{-(r - 1/2)}$ with substantially fewer measurements, see [1], [21].

Theorem 5.1 is stronger than our former result, namely, Theorem 1.4 in [1]. To see why this is true, recall the former claim: [1] introduced two conditions, the uniform uncertainty principle (UUP) and the exact reconstruction principle (ERP). In a nutshell, a random matrix $F$ obeys the UUP with oversampling factor $\lambda$ if the restricted isometry bound (5.28) holds with overwhelmingly large probability (one minus a negative power of the dimension), for some fixed positive constant. Second, a measurement matrix $F$ obeys the ERP with oversampling factor $\lambda$ if, for each fixed subset $T$ of size obeying (5.28) and each "sign" vector $c$ defined on $T$, there exists with the same overwhelmingly large probability a vector $w$ with the following properties:

i) $\langle w, v_j \rangle = c_j$, for all $j \in T$;
ii) $|\langle w, v_j \rangle| \leq \tfrac{1}{2}$, for all $j$ not in $T$.

Note that these are the conditions listed at the beginning of Section II except for the factor $\tfrac{1}{2}$ on the complement of $T$. Fix $\alpha \in \mathcal{F}_r(C)$. [1] argued that if a random matrix obeyed the UUP and the ERP both with oversampling factor $\lambda$, then a reconstruction bound analogous to (5.27) would hold, with inequality holding with the same probability as before.

Against this background, several comments are now in order.
• First, the new statement is more general as it applies to all matrices, not just random matrices.
• Second, whereas our previous statement argued that for each $\alpha$, one would be able—with high probability—to reconstruct $\alpha$ accurately, it did not say anything about the worst case error for a fixed measurement matrix $F$. This is an instance where the order of the quantifiers plays a role. Do we need different $F$'s for different objects? Theorem 5.1 answers this question unambiguously; the same $F$ will provide an optimal reconstruction for all the objects in the class.
• Third, Theorem 5.1 says that the ERP condition is redundant, and hence the hypothesis may be easier to check in practice. In addition, eliminating the ERP isolates the real reason for success as it ties everything down to the UUP. In short, the ability to recover an object from limited measurements depends on how close $F$ is to an orthonormal system, but only when restricting attention to sparse linear combinations of columns.
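Theorem 5.1 can be exercised numerically: generate a compressible $\alpha$, take $p$ random Gaussian measurements, and solve the equality-constrained $\ell_1$ program (5.26). The sketch below is illustrative only (the sizes, the decay exponent, and the solver choice are ours, not the paper's), again via `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
m_dim, p = 256, 100                      # signal length and number of measurements (illustrative)
r = 1.5                                  # power-law decay exponent (illustrative)

# A compressible object: reordered entries decay like k^(-r), random signs and positions.
alpha = (np.arange(1, m_dim + 1) ** (-r)) * rng.choice([-1.0, 1.0], size=m_dim)
rng.shuffle(alpha)

F = rng.standard_normal((p, m_dim)) / np.sqrt(p)
y = F @ alpha

# min ||x||_1 subject to F x = y, written with the split x = u - v, u, v >= 0.
c = np.ones(2 * m_dim)
res = linprog(c, A_eq=np.hstack([F, -F]), b_eq=y, bounds=(0, None), method="highs")
alpha_sharp = res.x[:m_dim] - res.x[m_dim:]

K = p // 4                               # compare against keeping the K largest entries
print("l2 error of l1 reconstruction :", np.linalg.norm(alpha - alpha_sharp))
print("l2 error of best K-term approx:", np.linalg.norm(np.sort(np.abs(alpha))[::-1][K:]))
```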
We will not prove this theorem as this is a minor modification of that of Theorem 1.4 in the aforementioned reference. The key point is to observe that if $F$ obeys the hypothesis of our theorem, then by definition $F$ obeys the UUP with probability one, but $F$ also obeys the ERP, again with probability one, as this is the content of Lemma 2.2. Hence, both the UUP and ERP hold and therefore, the conclusion of Theorem 5.1 follows. (The fact that the ERP actually holds for all sign vectors of size less than $S$ is the reason why (5.27) holds uniformly over all elements taken from the class, see [1].)

VI. DISCUSSION

Our analysis of Gaussian matrices relied on bounds for the extreme singular values of the submatrices $F_T$, and we developed a large deviation bound to quantify the departure from the right-hand side. Now let $T$ and $T'$ be two disjoint sets of respective sizes $S$ and $S'$ and consider the cosine of the principal angle between the two random subspaces spanned by the columns of $F_T$ and $F_{T'}$, respectively; formally

$$\cos \angle(F_T, F_{T'}) := \sup\big\{ \langle u, u' \rangle : u \in \operatorname{span}(F_T),\ u' \in \operatorname{span}(F_{T'}),\ \|u\| = \|u'\| = 1 \big\}.$$

We remark that this quantity plays an important role in statistical analysis because of its use to test the significance of correlations between two sets of measurements; compare the literature on canonical correlation analysis [26]. Among other things, it is known [27] that this cosine converges to an explicit limit when the dimensions grow proportionally, which suggests that our bounds on the $\theta$ numbers could be sharpened.

It is also expected that one could use other matrices and still obtain similar results; namely, that $(P_1')$ recovers $f$ exactly provided that the number of corrupted entries does not exceed $\rho \cdot m$. In fact, our previous work suggests that partial Fourier matrices would enjoy similar properties [1], [11]. Other candidates might be the so-called noiselets of Coifman, Geshwind, and Meyer [29]. These alternatives might be of great practical interest because they would come with fast algorithms for applying $A$ or $A^*$ to an arbitrary vector and, hence, speed up the computations to find the $\ell_1$-minimizer.

APPENDIX
PROOF OF LEMMA 2.2

We may normalize $\|e\| = 1$ and write $T_0 := T$. Using Lemma 2.1, we can find a vector $w_1$ and a set $T_1 \subset \{1, \ldots, m\}$, disjoint from $T_0$, such that $\langle w_1, v_j \rangle = e_j$ for all $j \in T_0$, the coefficients $\langle w_1, v_j \rangle$ are uniformly small outside $T_0 \cup T_1$, and their $\ell_2$ norm over the exceptional set $T_1$ is controlled. Applying Lemma 2.1 again, this time to the vector of coefficients of $w_1$ on $T_1$ (extended by zero to $T_0$), produces a vector $w_2$ and a new exceptional set $T_2$ with analogous properties, and so on; at each step the coefficients of $w_{n+1}$ reproduce those of $w_n$ on $T_n$ while vanishing on $T_0$. On the other hand, if $j$ lies outside $T_n$ and $T_{n+1}$, then the corresponding coefficients stay uniformly small and decay geometrically in $n$, which by the triangle inequality and the geometric series formula gives a uniform bound off $T_0$. By hypothesis, the decay factor is less than one. Thus, if we set

$$w := \sum_{n \geq 1} (-1)^{n-1} w_n$$

then the series is absolutely convergent and, therefore, $w$ is a well-defined vector in $\mathbb{R}^p$. We now study the coefficients

$$\langle w, v_j \rangle = \sum_{n \geq 1} (-1)^{n-1} \langle w_n, v_j \rangle \tag{1.30}$$

for $j \in \{1, \ldots, m\}$. Consider first $j \in T_0$: it follows from the construction that $\langle w_1, v_j \rangle = e_j$ and $\langle w_{n+1}, v_j \rangle = 0$ for all $n \geq 1$, and hence, $\langle w, v_j \rangle = e_j$ for all $j \in T_0$. Second, fix $j$ with $j \notin T_0$ and let $I_j := \{ n \geq 1 : j \in T_n \}$. Since $T_n$ and $T_{n+1}$ are disjoint, we see that the integers in the set $I_j$ are spaced at least two apart. Now if $j \in T_n$, then by definition $\langle w_{n+1}, v_j \rangle = \langle w_n, v_j \rangle$ and, therefore, the $n$ and $n+1$ terms in (1.30) cancel each other out. Thus, only uniformly small terms remain in (1.30), and summing the geometric series gives the bound (2.18), which concludes the proof.
REFERENCES

[1] E. J. Candes and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies?," IEEE Trans. Inf. Theory. Available on the arXiv preprint server: math.CA/0410542, submitted for publication.
[2] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco, CA: Freeman, 1979.
[3] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, pp. 33–61, 1998.
[4] P. Bloomfield and W. Steiger, Least Absolute Deviations: Theory, Applications, and Algorithms. Boston, MA: Birkhäuser, 1983.
[5] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2845–2862, Nov. 2001.
[6] R. Gribonval and M. Nielsen, "Sparse representations in unions of bases," IEEE Trans. Inf. Theory, vol. 49, no. 12, pp. 3320–3325, Dec. 2003.
[7] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization," Proc. Nat. Acad. Sci. USA, vol. 100, 2003, pp. 2197–2202.
[8] M. Elad and A. M. Bruckstein, "A generalized uncertainty principle and sparse representation in pairs of bases," IEEE Trans. Inf. Theory, vol. 48, no. 9, pp. 2558–2567, Sep. 2002.
[9] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Inf. Theory, vol. 50, no. 10, pp. 2231–2242, Oct. 2004.
[10] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory. Available on the arXiv preprint server: math.GM/0409186, submitted for publication.
[11] E. J. Candes and J. Romberg, "Quantitative robust uncertainty principles and optimally sparse decompositions," Found. Comput. Math. Available on the arXiv preprint server: math.CA/0411273, to be published.
[12] D. L. Donoho, "For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution," unpublished manuscript, Sep. 2004.
[13] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[14] V. A. Marchenko and L. A. Pastur, "Distribution of eigenvalues in certain sets of random matrices" (in Russian), Mat. Sbornik (N.S.), vol. 72, pp. 407–535, 1967.
[15] S. Geman, "A limit theorem for the norm of random matrices," Ann. Probab., vol. 8, pp. 252–261, 1980.
[16] J. W. Silverstein, "The smallest eigenvalue of a large dimensional Wishart matrix," Ann. Probab., vol. 13, pp. 1364–1368, 1985.
[17] Z. D. Bai and Y. Q. Yin, "Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix," Ann. Probab., vol. 21, pp. 1275–1294, 1993.
[18] S. J. Szarek, "Condition numbers of random matrices," J. Complexity, vol. 7, pp. 131–149, 1991.
[19] M. Ledoux, The Concentration of Measure Phenomenon. Providence, RI: Amer. Math. Soc., 2001, vol. 89, Mathematical Surveys and Monographs.
[20] N. El Karoui, "New results about random covariance matrices and statistical applications," Ph.D. dissertation, Stanford Univ., Stanford, CA, 2004.
[21] D. L. Donoho, "Compressed sensing," unpublished manuscript, Sep. 2004.
[22] J. Feldman, "Decoding error-correcting codes via linear programming," Ph.D. dissertation, MIT, Cambridge, MA, 2003.
[23] J. Feldman, "LP decoding achieves capacity," presented at the ACM-SIAM Symp. Discrete Algorithms (SODA), Vancouver, BC, Canada, Jan. 2005.
[24] J. Feldman, T. Malkin, C. Stein, R. A. Servedio, and M. J. Wainwright, "LP decoding corrects a constant fraction of errors," in Proc. IEEE Int. Symp. Information Theory (ISIT), Chicago, IL, Jun./Jul. 2004, p. 68.
[25] J. Feldman, M. J. Wainwright, and D. R. Karger, "Using linear programming to decode binary linear codes," IEEE Trans. Inf. Theory, vol. 51, no. 3, pp. 954–972, Mar. 2005.
[26] R. J. Muirhead, Aspects of Multivariate Statistical Theory. New York: Wiley, 1982, Wiley Series in Probability and Mathematical Statistics.
[27] K. W. Wachter, "The limiting empirical measure of multiple discriminant ratios," Ann. Statist., vol. 8, pp. 937–957, 1980.
[28] I. M. Johnstone, "Large covariance matrices (Third 2004 Wald Lecture)," presented at the 6th World Congress of the Bernoulli Society and 67th Annual Meeting of the IMS, Barcelona, Spain, 2004.
[29] R. Coifman, F. Geshwind, and Y. Meyer, "Noiselets," Appl. Comput. Harmon. Anal., vol. 10, pp. 27–44, 2001.