Electronic Journal of Statistics
ISSN: 1935-7524
On the Post Selection Inference constant
under Restricted Isometry Properties
François Bachoc
Institut de Mathématiques de Toulouse;
UMR 5219; Université de Toulouse; CNRS
UPS, F-31062 Toulouse Cedex 9, France
Gilles Blanchard
Universität Potsdam, Institut für Mathematik
Karl-Liebknecht-Straße 24-25 14476 Potsdam, Germany
Pierre Neuvial
Institut de Mathématiques de Toulouse;
UMR 5219; Université de Toulouse; CNRS
UPS, F-31062 Toulouse Cedex 9, France
Abstract: Uniformly valid confidence intervals post model selection in regression can be constructed based on Post-Selection Inference (PoSI) constants. PoSI constants are minimal for orthogonal design matrices, and can
be upper bounded as a function of the sparsity of the set of models under
consideration, for generic design matrices.
In order to improve on these generic sparse upper bounds, we consider
design matrices satisfying a Restricted Isometry Property (RIP) condition.
We provide a new upper bound on the PoSI constant in this setting. This
upper bound is an explicit function of the RIP constant of the design matrix,
thereby giving an interpolation between the orthogonal setting and the
generic sparse setting. We show that this upper bound is asymptotically
optimal in many settings by constructing a matching lower bound.
MSC 2010 subject classifications: 62J05, 62J15, 62F25.
Keywords and phrases: Inference post model-selection, Confidence intervals, PoSI constants, Linear Regression, High-dimensional Inference, Sparsity, Restricted Isometry Property.
1. Introduction
Fitting a statistical model to data is often preceded by a model selection step.
The construction of valid statistical procedures in such post model selection
situations is quite challenging (cf. [21, 22, 23], [17] and [25], and the references
given in that literature), and has recently attracted a considerable amount of
attention. Among various recent references in this context, we can mention
those addressing sparse high dimensional settings with a focus on lasso-type
model selection procedures [4, 5, 29, 31], those aiming for conditional coverage
properties for polyhedral-type model selection procedures [14, 19, 20, 27, 28]
and those achieving valid post selection inference universally over the model
selection procedure [1, 2, 6].
In this paper, we shall focus on the latter type of approach and adopt the
setting introduced in [6]. In that work, a linear Gaussian regression model is
considered, based on an n×p design matrix X. A model M ⊂ {1, ..., p} is defined
as a subset of indices of the p covariates. For a family M ⊂ {M |M ⊂ {1, . . . , p}}
of admissible models, it is shown in [6] that a universal coverage property is
achievable (see Section 2) by using a family of confidence intervals whose sizes
are proportional to a constant K(X, M) > 0. This constant K(X, M) is called a
PoSI (Post-Selection Inference) constant in [6]. This setting was later extended
to prediction problems in [1] and to misspecified non-linear settings in [2].
The focus of this paper is on the order of magnitude of the PoSI constant
K(X, M) for large p. We shall consider n ≥ p for simplicity of exposition in
the rest of this section (and asymptotics n, p → ∞). It is shown in [6] that
K(X, M) = Ω(√(log p)); this rate is reached in particular when X has orthogonal columns. On the other hand, in full generality K(X, M) = O(√p) for
all X. It can also be shown, as discussed in an intermediary version of [32],
that when M is composed of s-sparse submodels, the sharper upper bound
K(X, M) = O(√(s log(p/s))) holds. Hence, intuitively, design matrices that are
close to orthogonal and consideration of sparse models yield smaller PoSI constants.
In this paper, we obtain additional quantitative insights for this intuition,
by considering design matrices X satisfying restricted isometry property (RIP)
conditions. RIP conditions have become central in high dimensional statistics
and compressed sensing [8, 10, 15]. In the s-sparse setting and for design matrices
X that satisfy a RIP property of order s with RIP constant δ → 0, we show that
K(X, M) = O(√(log p) + δ√(s log(p/s))). This corresponds to the intuition that
for such matrices, any subset of s columns of X is “approximately orthogonal”.
Thus, under the RIP condition we improve the upper bound of [32] for the
s-sparse case, by up to a factor δ → 0. We show that our upper bound is
complementary to the bounds recently proposed in [18]. In addition, we obtain
lower bounds on K(X, M) for a class of design matrices that extends the equicorrelated design matrix in [6]. From these lower bounds, we show that the new
upper bound we provide is optimal, in a large range of situations.
While the main interest of our results is theoretical, our suggested upper
bound can be practically useful in cases where it is computable whereas the PoSI
constant K(X, M) is not. The only challenge for computing our upper bound is
to find a value δ for which the design matrix X satisfies a RIP property. While
this is currently challenging in general for large p, we discuss, in this paper,
specific cases where it is feasible.
The rest of the paper is organized as follows. In Section 2 we introduce
in more detail the setting and the PoSI constant K(X, M). In Section 3 we
introduce the RIP condition, provide the upper bound on K(X, M) and discuss
its theoretical comparison with [18] and its applicability. In Section 4 we provide
the lower bound and the optimality result for the upper bound. All the proofs
are given in the appendix.
2. Settings and notation
2.1. PoSI confidence intervals
We consider and review briefly the framework introduced by [6] for which the
so-called PoSI constant plays a central role. The goal is to construct post-model
selection confidence intervals that are agnostic with respect to the model selection method used. The authors of [6] assume a Gaussian vector of observations

Y = µ + ǫ,    (1)

where the n × 1 mean vector µ is fixed and unknown, and ǫ follows the N(0, σ^2 I_n)
distribution where σ^2 > 0 is unknown. Consider an n × p fixed design matrix X,
whose columns correspond to explanatory variables for µ. It is not necessarily
assumed that µ belongs to the image of X or that n ≥ p.
A model M corresponds to a subset of selected variables in {1, . . . , p}. A
set of models of interest M ⊂ Mall = {M |M ⊂ {1, . . . , p}} is supposed to be
given. Following [6], for any M ∈ M, the projection-based vector of regression
coefficients βM is a target of inference, with

β_M := Arg Min_{β ∈ R^{|M|}} ‖µ − X_M β‖^2 = (X_M^t X_M)^{-1} X_M^t µ,    (2)
where XM is the submatrix of X formed of the columns of X with indices in M ,
and where we assume that for each M ∈ M, XM has full rank and M is nonempty. We refer to [6] for an interpretation of the vector βM and a justification
for considering it as a target of inference. In [6], a family of confidence intervals
(CIi,M ; i ∈ M ∈ M) for βM is introduced, containing the targets (βM )M∈M
simultaneously with probability at least 1 − α. The confidence intervals take the
form
CI_{i,M} := (β̂_M)_{i.M} ± σ̂ ‖v_{M,i}‖ K(X, M, α, r);    (3)
the different quantities involved, which we now define, are standard ingredients
for univariate confidence intervals for regression coefficients in the Gaussian
model, except for the last factor (the “PoSI constant”) which will account for
multiplicity of covariates and models, and their simultaneous coverage. The
confidence interval is centered at β̂_M := (X_M^t X_M)^{-1} X_M^t Y, the ordinary least
squares estimator of βM ; also, if M = {j1 , . . . , j|M| } with j1 < . . . < j|M| ,
for i ∈ M we denote by i.M the number k ∈ N for which jk = i, that is,
the rank of the i-th element in the subset M. The quantity σ̂^2 is an unbiased
estimator of σ^2; more specifically, it is assumed to be an observable random
variable such that σ̂^2 is independent of P_X Y and is distributed as σ^2/r times a
chi-square distributed random variable with r degrees of freedom (PX denoting
the orthogonal projection onto the column space of X). We allow for r = ∞
corresponding to σ̂ = σ, i.e., the case of known variance (also called Gaussian
limiting case). In [6], it is assumed that σ̂ exists and it is shown that this indeed
holds in some specific situations. A further analysis of the existence of σ̂ is
provided in [1, 2].
The next quantity to define is
v_{M,i}^t := (e_{i.M}^{|M|})^t G_M^{-1} X_M^t ∈ R^n,    (4)

where e_a^b is the a-th base column vector of R^b, and G_M := X_M^t X_M is the
|M| × |M| Gram matrix formed from the columns of X_M. Observe that v_{M,i} is
nothing more than the row corresponding to covariate i in the estimation matrix
G_M^{-1} X_M^t; in other words, (β̂_M)_{i.M} = v_{M,i}^t Y.
Finally, K(X, M, α, r) is called a PoSI constant and we turn to its definition.
We shall occasionally write for simplicity K(X, M, α, r) = K(X, M). Furthermore, if the value of r is not specified in K(X, M), it is implicit that r = ∞.
Definition 2.1. Let M ⊂ Mall for which each M ∈ M is non-empty, and so
that XM has full rank. Let also
w_{M,i} = v_{M,i}/‖v_{M,i}‖ if ‖v_{M,i}‖ ≠ 0, and w_{M,i} = 0 ∈ R^n otherwise.
Let ξ be a Gaussian vector with zero mean vector and identity covariance matrix
on Rn . Let N be a random variable, independent of ξ, and so that rN 2 follows a
chi-square distribution with r degrees of freedom. If r = ∞, then we let N = 1.
For α ∈ (0, 1), K(X, M, α, r) is defined as the 1 − α quantile of
γ_{M,r} := (1/N) max_{M ∈ M, i ∈ M} |w_{M,i}^t ξ|.    (5)
We remark that K(X, M, α, r) is the same as in [6]. For j = 1, . . . , p, let X_j
be the column j of X. We also remark, from [6], that the vector v_{M,i}/‖v_{M,i}‖^2
in (4) is the residual of the regression of Xi with respect to the variables
{j|j ∈ M \ {i}}; in other words, it is the component of the vector Xi orthogonal
to Span{Xj |j ∈ M \ {i}}. It is shown in [6] that we have, with probability larger
than 1 − α,
∀M ∈ M, ∀i ∈ M, (β_M)_{i.M} ∈ CI_{i,M}.    (6)
Hence, the PoSI confidence intervals guarantee a simultaneous coverage of all
the projection-based regression coefficients, over all models M in the set M.
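As an illustration of Definition 2.1, the following R sketch (ours, not from [6]; the name posi_constant_mc is hypothetical) estimates K(X, Ms, α, ∞) by Monte Carlo for small p and s, by enumerating the unit vectors w_{M,i} and simulating γ_{Ms,∞}.

## Minimal Monte-Carlo sketch (ours): estimate K(X, Ms, alpha, r = Inf) of
## Definition 2.1 for Ms = {models of size <= s}; only intended for small p, s.
posi_constant_mc <- function(X, s, alpha = 0.05, B = 1e4) {
  n <- nrow(X); p <- ncol(X)
  W <- NULL                                  # columns will be the unit vectors w_{M,i}
  for (m in seq_len(s)) {
    for (M in combn(p, m, simplify = FALSE)) {
      XM <- X[, M, drop = FALSE]
      for (k in seq_along(M)) {
        others <- XM[, -k, drop = FALSE]     # columns of X_M other than X_i
        v <- if (ncol(others) == 0L) XM[, k] else
          XM[, k] - others %*% qr.solve(others, XM[, k])   # residual of X_i
        nv <- sqrt(sum(v^2))
        if (nv > 0) W <- cbind(W, v / nv)
      }
    }
  }
  ## Monte-Carlo draws of gamma_{Ms,infinity} = max_{M, i} |w_{M,i}' xi|
  gam <- replicate(B, max(abs(crossprod(W, rnorm(n)))))
  unname(quantile(gam, probs = 1 - alpha))   # empirical (1 - alpha) quantile
}

## toy example with a small random design
set.seed(1)
X <- matrix(rnorm(50 * 5), 50, 5)
posi_constant_mc(X, s = 2)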
For a square symmetric non-negative matrix A, we let
corr(A) = (diag(A)^†)^{1/2} A (diag(A)^†)^{1/2},
where diag(A) is obtained by setting all the non-diagonal elements of A to zero
and where B † is the Moore-Penrose pseudo-inverse of B. Then we show in the
following lemma that K(X, M) depends on X only through corr(X t X).
Lemma 2.2. Let X and Z be two n × p and m × p matrices satisfying the
relation corr(X t X) = corr(Z t Z). Then K(X, M, α, r) = K(Z, M, α, r).
2.2. Order of magnitude of the PoSI constant
The confidence intervals in (3) are similar in form to the standard confidence
intervals that one would use for a single fixed model M and a fixed i ∈ M .
For a standard interval, K(X, M) would be replaced by a standard Gaussian
or Student quantile. Of course, the standard intervals do not account for multiplicity and do not have uniform coverage over i ∈ M ∈ M (see [1, 2]). Hence
K(X, M) is the inflation factor or correction over standard intervals to get uniform coverage; it must go to infinity as p → ∞ [6]. Studying the asymptotic
order of magnitude of K(X, M) is thus an important problem, as this order of
magnitude corresponds to the price one has to pay in order to obtain universally
valid post model selection inference.
We now present the existing results on the asymptotic order of magnitude of
K(X, M). Let us define
γ_{M,∞} := max_{M ∈ M, i ∈ M} |w_{M,i}^t ξ|,    (7)

so that γ_{M,r} = γ_{M,∞}/N, where we recall that rN² follows a chi-square distribution with r degrees of freedom.
We can relate the quantiles of γM,r (which coincide with the PoSI constants
K(X, M)) to the expectation E[γM,∞ ] by the following argument based on
Gaussian concentration (see Appendix A):
Proposition 2.3. Let T (µ, r, α) denote the α-quantile of a noncentral T distribution with r degrees of freedom and noncentrality parameter µ. Then
K(X, M, α, r) ≤ T (E[γM,∞ ], r, 1 − α/2).
To be more concrete, we observe that we can get a rough estimate of the latter
quantile via
T(E[γ_{M,∞}], r, 1 − α/2) ≤ (E[γ_{M,∞}] + √(2 log(4/α))) / (1 − 2√(2 log(4/α)/r))_+ ;
furthermore, as r → +∞, this quantile reduces to the (1 − α/2) quantile of a
Gaussian distribution with mean E[γM,∞ ] and unit variance.
The point of the above estimate is that the dependence on the set of models
M is only present through E[γM,∞ ]. Therefore, we will focus in this paper on
the problem of bounding E[γM,∞ ], which is nothing more than the Gaussian
width [15, chapter 9] of the set ΓM = {±wM,i |M ∈ M, i ∈ M }.
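In practice, Proposition 2.3 turns any bound on E[γ_{M,∞}] into a bound on the PoSI constant via a noncentral T quantile; a minimal R sketch of this conversion (ours; the name posi_from_width is hypothetical) is the following.

## Sketch (ours) of the conversion in Proposition 2.3: an upper bound on
## E[gamma_{M,infinity}] yields an upper bound on K(X, M, alpha, r) as the
## (1 - alpha/2) quantile of a noncentral T distribution.
posi_from_width <- function(gamma_mean, r = Inf, alpha = 0.05) {
  if (is.infinite(r)) {
    qnorm(1 - alpha / 2, mean = gamma_mean, sd = 1)  # Gaussian limit r = Inf
  } else {
    qt(1 - alpha / 2, df = r, ncp = gamma_mean)      # noncentral T quantile
  }
}
posi_from_width(gamma_mean = 3, r = 20)   # sigma estimated with 20 degrees of freedom
posi_from_width(gamma_mean = 3, r = Inf)  # known-variance case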
When n ≥ p, it is shown in [6] that E[γ_{M,∞}] is no smaller than √(2 log(2p))
and asymptotically no larger than √p. These lower and upper bounds are
reached by orthogonal design matrices and equi-correlated design matrices,
respectively (see [6]).
We now concentrate on s-sparse models. For s ≤ p, let us define Ms =
{M |M ⊂ {1, . . . , p}, |M | ≤ s}. In this case, using a direct argument based on
cardinality, one gets the following generic upper bound (proved in Appendix B).
Lemma 2.4. For any s, n, p ∈ N, with s ≤ n, we have
E[γ_{Ms,∞}] ≤ √(2s log(6p/s)).    (8)
We remark that an asymptotic version of the bound in Lemma 2.4 (as p and
s go to infinity) appears in an intermediary version of [32].
3. Upper bound under RIP conditions
3.1. Main result
We recall the definition and a property of the RIP constant κ(X, s) associated
to a design matrix X and a sparsity condition s given in [15, Chap.6]:
κ(X, s) = sup_{|M| ≤ s} ‖X_M^t X_M − I_{|M|}‖_op.    (9)
Letting κ = κ(X, s), we have for any subset M ⊂ {1, . . . , p} such that |M | ≤ s:
∀β ∈ R^{|M|},   (1 − κ)_+ ‖β‖^2 ≤ ‖X_M β‖^2 ≤ (1 + κ)‖β‖^2.    (10)
Remark 3.1. The RIP condition may also be stated between norms instead of
squared norms in (10). Following [15, Chap.6] we will consider the formulation
in terms of squared norms, which is more convenient here.
Since the PoSI constant K(X, M) only depends on corr(X^t X) (see Lemma
2.2), we shall rather consider the RIP constant associated to corr(X^t X). We let

δ(X, s) = sup_{|M| ≤ s} ‖corr(X_M^t X_M) − I_{|M|}‖_op.    (11)
Any upper bound for κ(X, s) yields an upper bound for δ(X, s) as shown in
the following lemma.
Lemma 3.2. Let κ = κ(X, s). If κ ∈ [0, 1), then
δ(X, s) ≤ 2κ/(1 − κ).
The next theorem is the main result of the paper. It provides a new upper
bound on the PoSI constant, under RIP conditions and with sparse submodels.
We remark that in this theorem, we do not necessarily assume that n ≥ p.
Theorem 3.3. Let X be an n × p matrix with n, p ∈ N. Let δ = δ(X, s). We
have

E[γ_{Ms,∞}] ≤ √(2 log(2p)) + 2δ (√(1 + δ)/(1 − δ)) √(2s log(6p/s)).
This upper bound is of the form
URIP (p, s, δ) = Uorth (p) + 2δc(δ)Usparse (p, s),
where:
• U_orth(p) = √(2 log(2p)) is the upper bound in the orthogonal case;
• U_sparse(p, s) is the right-hand side of (8), corresponding to the cardinality-based upper bound in the sparse case;
• c(δ) = √(1 + δ)/(1 − δ) satisfies c(δ) ≥ 0, c(δ) → 1 as δ → 0, and c is increasing.
We observe that if δ → 0, our bound U_RIP is o(U_sparse). Moreover, when
δ√s · √(1 − log s/log p + 1/log p) → 0, then U_RIP is even asymptotically equivalent to U_orth. In particular, this is the case if δ√s → 0.
We now consider the specific case where X is a subgaussian random matrix,
that is, X has independent subgaussian entries [15, Definition 9.1]. We discuss
in which situations δ = δ(X, s) → 0. The estimate of κ in [15, Theorem 9.2]
combined with Lemma 3.2 yields
δ = O_P(√(s log(ep/s)/n)),    (12)
so that δ → 0 as soon as n/(s log(ep/s)) → +∞.
3.2. Comparison with upper bounds based on Euclidean norms
We now compare our upper bound in Theorem 3.3 to upper bounds recently
and independently obtained in [18]. Recall the notation Y , µ, βM and β̂M
from Section 2 and let r = ∞ for simplicity of exposition. The authors in
[18] address the case where X is random (random design) and consider deviations of β̂_M from β̄_M = E[X_M^t X_M]^{-1} E[X_M^t Y], the population version of the
regression coefficients β_M, assuming that the rows of X are independent random vectors in dimension p. They derive uniform bounds over M ∈ Ms for
‖β̄_M − β̂_M‖_2. They also consider briefly (Remark 4.3 in [18]) the fixed design
case with β_M = (X_M^t X_M)^{-1} X_M^t µ as in the present paper. This target β_M can
be interpreted as the random design model, conditionally on X. They assume that
the individual coordinates of X and Y have exponential moments bounded by
a constant independently of n, p (thus their setting is more general than the
Gaussian regression setting, but for the purpose of this discussion we assume
Gaussian noise).
Let us additionally assume that the RIP property κ(X/√n, s) ≤ κ is satisfied
(on an event of probability tending to 1) and for κ restricted to a compact subset of
[0, 1) independently of n, p; note that we used the rescaling of X by √n, which
is natural in the random design case. Then some simple estimates obtained as
a consequence of Theorems 3.1 and 4.1 in [18]¹ lead to

sup_{M ∈ Ms} ‖β_M − β̂_M‖_2 = O_P(σ √(s log(ep/s)/n)),    (13)

as p, n → ∞ and assuming s log² p = o(n). (¹ The technical conditions assumed by [18] imply a slightly weaker version of the RIP property κ(X/√n, s) ≤ κ < 1.) On our side, under the same assumptions we have that
sup_{M ∈ Ms, i ∈ M} ((X_M^t X_M / n)^{-1})_{i.M, i.M}

is bounded on an event of probability tending to 1. This leads to ‖v_{M,i}‖ =
O_P(1/√n) uniformly for all M ∈ Ms, i ∈ M. Hence, from Theorem 3.3, (3) and
(6), we obtain

sup_{M ∈ Ms} ‖β_M − β̂_M‖_∞ = O_P(σ (√(log(p)/n) + δ √(s log(ep/s)/n))).    (14)
Thus, if δ = Ω(1), since the Euclidean norm upper bounds the supremum norm,
the results of [18] imply ours (at least in the sense of these asymptotic considerations). On the other hand, in the case where δ → 0, which is the case we are
specifically interested in, we obtain a sharper bound (in the weaker supremum
norm).
In particular, if X is a subgaussian random matrix (as discussed in the previous section), due to (12) we obtain
sup_{M ∈ Ms} ‖β_M − β̂_M‖_∞ = O_P(σ (√(log(p)/n) + s log(ep/s)/n)).    (15)
This improves over the estimate deduced from (13) as soon as s log(ep/s) = o(n),
which corresponds to the case where (13) tends to 0. Conversely, in this situation
our bound (15) yields for the Euclidean norm (using ‖w‖_2 ≤ ‖w‖_0^{1/2} ‖w‖_∞):

sup_{M ∈ Ms} ‖β_M − β̂_M‖_2 = O_P(σ (√(s log(p)/n) + s^{3/2} log(ep/s)/n)).    (16)
Assuming s = O(p^λ) for some λ < 1 for ease of interpretation, we see that (16)
is of the same order as (13) when s² log(p) = O(n), and is of a strictly larger
order otherwise. In this sense, it seems that (14) and (13) are complementary
to each other since we are using a weaker norm, but obtain a sharper bound in
the case δ → 0.
3.3. Applicability
While the main interest of our results is theoretical, we now discuss the applicability of our bound. For any δ ≥ δ(X, s), Theorem 3.3 combined with
Proposition 2.3 provides a bound of the form Ū_RIP(p, s, δ) ≥ K(X, Ms), with

Ū_RIP(p, s, δ) = T(√(2 log(2p)) + 2δ (√(1 + δ)/(1 − δ)) √(2s log(6p/s)), r, 1 − α/2).
This bound can be used in practice in situations where δ(X, s) (or an upper
bound of it) can be computed, whereas K(X, Ms ) cannot because the number
of inner products in (5) is too large. Indeed, for a given δ, it is immediate to
compute Ū_RIP(p, s, δ).
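For illustration, a possible R implementation of this computation (ours; the function names are hypothetical, and r = ∞ corresponds to the known-variance case) combines the bound of Theorem 3.3 with Proposition 2.3.

## Sketch (ours): the bound bar-U_RIP(p, s, delta) of this section, and the
## cardinality-based bound bar-U_sparse(p, s), both as noncentral T quantiles.
U_RIP_bar <- function(p, s, delta, r = Inf, alpha = 0.05) {
  ncp <- sqrt(2 * log(2 * p)) +
    2 * delta * sqrt(1 + delta) / (1 - delta) * sqrt(2 * s * log(6 * p / s))
  if (is.infinite(r)) qnorm(1 - alpha / 2, mean = ncp)
  else qt(1 - alpha / 2, df = r, ncp = ncp)
}
U_sparse_bar <- function(p, s, r = Inf, alpha = 0.05) {
  ncp <- sqrt(2 * s * log(6 * p / s))
  if (is.infinite(r)) qnorm(1 - alpha / 2, mean = ncp)
  else qt(1 - alpha / 2, df = r, ncp = ncp)
}
U_RIP_bar(p = 1000, s = 10, delta = 0.05)   # RIP-based bound
U_sparse_bar(p = 1000, s = 10)              # cardinality-based bound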
Upper bounding the RIP constant. When n ≥ p, we have δ(X, s) ≤
δ(X, p) and δ(X, p) can be computed in practice for a given X. Specifically,
δ(X, p) is the largest eigenvalue of corr(X^t X) − I_p in absolute value. When X is
a subgaussian random matrix, δ(X, p) ∼ √(p/n) [3, 24]. Thus, if n is large enough
compared to p, the computable upper bound Ū_RIP(p, s, δ(X, p)) will improve on
the sparsity-based upper bound Ū_sparse(p, s) = T((2s log(6p/s))^{1/2}, r, 1 − α/2) ≥
K(X, Ms), see Proposition 2.3 and Lemma 2.4.
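For instance, the following R sketch (ours) computes δ(X, p) for a given design with n ≥ p, as the largest absolute eigenvalue of corr(X^t X) − I_p.

## Sketch (ours): delta(X, p) for the case n >= p; it upper bounds delta(X, s).
rip_full <- function(X) {
  G <- crossprod(X)                       # X'X
  d <- sqrt(diag(G))
  C <- G / outer(d, d)                    # corr(X'X): unit-norm rescaled Gram matrix
  max(abs(eigen(C - diag(ncol(X)), symmetric = TRUE,
                only.values = TRUE)$values))
}
set.seed(1)
X <- matrix(rnorm(5000 * 50), 5000, 50)   # subgaussian design, n much larger than p
rip_full(X)                               # roughly of order sqrt(p/n) here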
On the other hand, when n < p, it is typically too costly to compute δ(X, s)
(or an upper bound of it) for a large p. Nevertheless, if one knows that X is
a subgaussian random matrix, one can compute an upper bound δ̃ satisfying
δ̃ ≥ δ(X, s) with high probability, as in [15, Chapter 9]. We remark that using
the values of δ̃ currently available in the literature, one would need n to be very
large for Ū_RIP(p, s, δ̃) to improve on Ū_sparse(p, s).
Alternative upper bound on the PoSI constant. For any δ ≥ δ(X, s),
we now show how to compute an alternative bound of the form ŨRIP (p, s, δ) ≥
K(X, Ms ). Our numerical experiments suggest that this alternative bound is
generally sharper than Ū_RIP(p, s, δ). For q, r, ρ ∈ N and ℓ ∈ (0, 1), let B_ℓ(q, r, ρ)
be defined as the smallest t > 0 so that

H_{q,ρ}(t) := E_G[ min(1, ρ(1 − F_{Beta,1/2,(q−1)/2}(t²/G²))) ] ≤ ℓ,

where G²/q follows a Fisher distribution with q and r degrees of freedom, and
FBeta,a,b denotes the cumulative distribution function of the Beta(a, b) distribution. In the case r = +∞, Bℓ is also defined and further described in [2, Section
2.5.2].
It can be seen from the proof of Theorem 3.3 (see specifically (22) which also
holds without the expectation operators), and from the arguments in [1], that
we have
K(X, Ms, α) ≤ B_{tα}(n ∧ p, r, p) + 2δc(δ) B_{(1−t)α}(n ∧ p, r, |Ms|)
for any t ∈ (0, 1). This upper bound can be minimized with respect to t, yielding
ŨRIP (p, s, δ).
The quantity Bℓ (q, r, ρ) can be easily approximated numerically, as it is simply the quantile of the tail distribution Hq,ρ , which only involves standard distributions. Algorithm E.3 in the supplementary materials of [1] can be used to
compute Bℓ (q, r, ρ). An implementation of this algorithm in R [26] is available
in Appendix C. Hence, the upper bound ŨRIP (p, s, δ) can be computed for large
values of p for a given δ.
4. Lower bound
4.1. Equi-correlated design matrices
The goal of this section is to find a matching lower bound for Theorem 3.3. For
this we extend ideas of [6, Example 6.2] and, following that reference, we restrict
our study to design matrices X for which n ≥ p. The lower bound is based on
the p × p matrix Z^(c,k) = (e_1^p, e_2^p, . . . , e_{p−1}^p, x_k(c)), where

x_k(c) = (c, . . . , c, 0, . . . , 0, √(1 − kc²))^t

(the value c appearing k times and 0 appearing p − 1 − k times), where we assume
k < p, and the constant c satisfies c² < 1/k, so that Z^(c,k) has full rank. By
definition, the correlation between any of the first k columns of Z^(c,k) and the
last one is c, and Z^(c,k) restricted to its first p − 1 columns is the identity matrix
I_{p−1}. The case where k = p − 1 is studied in [6, Example 6.2]:
Theorem 6.2 in [6] implies that the PoSI constant K(X, M), where X is an n × p
matrix such that X^t X = (Z^(c,k))^t Z^(c,k), is of the order of √p when k = p − 1
and M = Mall. The Gram matrix of Z^(c,k) is the 3 × 3 block matrix with sizes
(k, p − k − 1, 1) × (k, p − k − 1, 1) defined by

(Z^(c,k))^t Z^(c,k) = ( I_k  [0]        [c]
                        [0]  I_{p−k−1}  [0]
                        [c]  [0]         1  ),    (17)
where [a] means that all the entries of the corresponding block are identical to a.
We begin by studying the RIP coefficient δ(X, s) for design matrices X yielding
the Gram matrix (17). Since this Gram matrix has full rank p, there exists a
design matrix satisfying this condition if and only if n ≥ p.
Lemma 4.1. Let X be an n × p matrix for which X^t X is given by (17) with
kc² < 1. Then for s ≤ k ≤ p − 1, we have κ(X, s) = δ(X, s) ≤ c√(s − 1).
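As a numerical check of Lemma 4.1 (ours, with hypothetical helper names), one can build the Gram matrix (17) and compute the corresponding RIP quantity by brute force for small p.

## Sketch (ours): Gram matrix (17) of Z^(c,k) and a brute-force computation of
## sup_{|M| = s} ||G_M - I_s||_op from X'X; for (17) this equals delta(X, s).
gram_equicorr <- function(p, k, c) {
  G <- diag(p)
  G[1:k, p] <- c                            # correlation c between the first k
  G[p, 1:k] <- c                            # columns and the last one
  G
}
rip_from_gram <- function(G, s) {
  p <- ncol(G)
  max(apply(combn(p, s), 2, function(M)
    max(abs(eigen(G[M, M, drop = FALSE] - diag(s), symmetric = TRUE,
                  only.values = TRUE)$values))))
}
p <- 8; k <- 6; c <- 0.2; s <- 4
rip_from_gram(gram_equicorr(p, k, c), s)    # matches c * sqrt(s - 1) here
c * sqrt(s - 1)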
4.2. A matching lower bound
In the following proposition, we provide a lower bound of K(X, Ms ) for matrices
X yielding the Gram matrix (17).
Proposition 4.2. For any s ≤ k < p, c² < 1/k and α ≤ 1/2, let X be an n × p
matrix for which X^t X is given by (17) with kc² < 1. We have

K(X, Ms, α, ∞) ≥ A (c(s − 1)/√(1 − (s − 1)c²)) √(log⌊k/s⌋) − √(2 log 2),
where A > 0 is a universal constant.
From the previous proposition, we now show that the upper bound of Theorem
3.3 is optimal (up to a multiplicative constant) for a large range of behavior
of s and δ relatively to p. As discussed after Theorem 3.3, in the case where
δ√s √(1 − log s/log p + 1/log p) = O(1), the upper bound we obtain is optimal,
since it can be written as O(√(log p)). In the next Corollary, we show that the upper bound of Theorem 3.3 is also optimal when δ√s √(1 − log s/log p + 1/log p)
tends to +∞, and when δ = O(p^{−λ}) for some λ > 0.
Corollary 4.3 (Optimality of the RIP-PoSI bound). Let (sp , δp )p≥0 be sequences of values such that sp < p, δp > 0, δp → 0 and satisfying:
lim_{p→∞} δ_p √s_p √(1 − log s_p/log p + 1/log p) = +∞.

Then Theorem 3.3 implies

sup_{n ∈ N, s ≤ s_p, X ∈ R^{n×p} s.t. δ(X,s) ≤ δ_p} K(X, M_{s_p}) ≤ B δ_p √s_p √(log(6p/s_p)),    (18)

where B is a constant. Moreover, there exists a sequence of design matrices X_p
such that δ(X_p, s_p) ≤ δ_p and

K(X_p, M_{s_p}) ≥ A δ_p √s_p √(log min(1/δ_p², ⌊(p − 1)/s_p⌋)),    (19)

where A is a constant.
In particular, if δ_p = O(p^{−λ}) for some λ > 0 and if ⌊(p − 1)/s_p⌋ ≥ 2, then
the above upper and lower bounds have the same rate.
Therefore, the upper bound in Theorem 3.3 is optimal in most configurations
of sp and δp , except if δp goes to 0 slower than any inverse power of p.
5. Concluding remarks
In this paper, we have proposed an upper bound on PoSI constants in s-sparse
situations where the n × p design matrix X satisfies a RIP condition. As the
value of the RIP constant δ increases from 0, this upper bound provides an
interpolation between the case of an orthogonal X and an existing upper bound
only based on sparsity and cardinality. We have shown that our upper bound is
asymptotically optimal for many configurations of (s, δ, p) by giving a matching
lower bound. In the case of random design matrices with independent entries,
since δ decreases with n, our upper bound compares increasingly more favorably
to the cardinality-based upper bound as n gets larger. It is also complementary
to the bounds recently proposed in [18]. The interest and various applications
of the RIP property are well-known in the high-dimensional statistics literature,
in particular for statistical risk analysis or support recovery. Our analysis puts
into light an additional interest of the RIP property for agnostic post-selection
inference (uncertainty quantification).
The PoSI constant corresponds to confidence intervals on β_M in (2). In Section 3.2 we also mention another target of interest in the case of random X,
β̄_M = E[X_M^t X_M]^{-1} E[X_M^t Y]. This quantity depends on the distribution of X
rather than on its realization, which is a desirable property as discussed in [1, 18]
where the same target has also been considered. In [1], it is shown that valid
confidence intervals for βM are also asymptotically valid for β̄M , provided that
p is fixed. These results require that µ belongs to the column space of X and
hold for models M such that µ is close to the column space of XM . It would be
interesting to study whether assuming RIP conditions on X makes it possible to alleviate
these assumptions.
The purpose of post-selection inference based on the PoSI constant K(X, M)
is to achieve the coverage guarantee (6). The guarantee (6) implies that, for any
model selection procedure M̂ : R^n → M, with probability larger than 1 − α,
for all i ∈ M̂, (β_{M̂})_{i.M̂} ∈ CI_{i,M̂}. Hence, there is in general no need to make
assumptions about the model selection procedure when using PoSI constants.
On the other hand, the RIP condition that we study here is naturally associated
to specific model selection procedures, namely the lasso or the Dantzig selector
[9, 10, 30, 33]. Hence, it is natural to ask whether the results in this paper could
help post-selection inference specifically for such procedures. We believe that the
answer could be positive in some situations. Indeed, if the lasso model selector
is used in conjunction with a design matrix X satisfying a RIP property, then
asymptotic guarantees exist on the sparsity of the selected model [8]. Thus, one
could investigate the combination of bounds on the size of selected models (of
the form |M̂ | ≤ S and holding with high probability) with our upper bound, by
replacing s by S.
In the case of the lasso model selector, we have referred, in the introduction, to the post-selection intervals of [19], which achieve conditional coverage
specifically for this selector. These intervals are simple to compute
(when the conditioning is on the signs, see [19]). Generally speaking, in comparison with confidence intervals based on PoSI constants, the confidence intervals
of [19] have the benefit of guaranteeing a coverage level conditionally on the selected model. On the other hand the confidence intervals in [19] can be large, and
can provide small coverage rates when the regularization parameter of the lasso
is data-dependent [1]. It would be interesting to study whether these general
conclusions would be modified in the special case of design matrices satisfying
RIP properties.
Finally, the focus of this paper is on PoSI constants in the context of linear
regression. Recently, [2] extended the PoSI approach to more general settings
(for instance generalized linear models), provided a joint asymptotic normality
property holds between model dependent targets and estimators. This extension
was suggested in the case of asymptotics for fixed dimension and fixed number of
models. In the high-dimensional case, an interesting direction would be to apply
the results of [12], that provide Gaussian approximations for maxima of sums
of high-dimensional random vectors. This opens the perspective of applying our
results to various high-dimensional post model selection settings, beyond linear
regression.
Acknowledgements
This work has been supported by ANR-16-CE40-0019 (SansSouci). The second
author acknowledges the support from the German DFG, under the Research
Unit FOR-1735 “Structural Inference in Statistics - Adaptation and Efficiency”,
and under the Collaborative Research Center SFB-1294 “Data Assimilation”.
Appendix
Appendix A: Gaussian concentration
To relate the expectation of a supremum of Gaussian variables to its quantiles,
we use the following classical Gaussian concentration inequality [13] (see e.g.
[16], Section B.2.2. for a short exposition):
Theorem A.1 (Cirel’son, Ibragimov, Sudakov). Assume that F : Rd → R is
a 1-Lipschitz function (w.r.t. the Euclidean norm of its input) and Z follows
the N(0, σ²I_d) distribution. Then, there exist two one-dimensional standard
Gaussian variables ζ, ζ′ such that

E[F(Z)] − σ|ζ′| ≤ F(Z) ≤ E[F(Z)] + σ|ζ|.    (20)
It is known that in certain situations one can expect an even tighter concentration, through the phenomenon known as superconcentration [11]. While such
situations are likely to be relevant for the setting considered in this paper, we
leave such improvements as an open issue for future work.
We use the previous property in our setting as follows:
Proposition A.2. Let C be a finite family of unit vectors of R^n, ξ a standard
Gaussian vector in R^n and N an independent nonnegative random variable so
that rN² follows a chi-squared distribution with r degrees of freedom. Define the
random variable

γ_{C,r} := (1/N) max_{v ∈ C} |v^t ξ|.
Then the (1 − α) quantile of γC,r is upper bounded by the (1 − α/2) quantile of a
noncentral T distribution with r degrees of freedom and noncentrality parameter
E[max_{v ∈ C} |v^t ξ|].
Proof. Observe that ξ ↦ max_{v ∈ C} |v^t ξ| is 1-Lipschitz since the vectors of C are
unit vectors. Therefore we conclude by Theorem A.1 that there exists a standard
normal variable ζ (which is independent of N since N is independent of ξ) so
that the following holds:

γ_{C,r} ≤ (1/N)(E[max_{v ∈ C} |v^t ξ|] + |ζ|).
We can represent the above right-hand side as max(T+ , T− ) where
T_± = (1/N)(E[max_{v ∈ C} |v^t ξ|] ± ζ),
i.e., T_+, T_− are two (dependent) noncentral T random variables with r degrees of
freedom and noncentrality parameter E[max_{v ∈ C} |v^t ξ|]. Finally, since
P[max(T+ , T− ) > t] ≤ P[T+ > t] + P[T− > t] = 2P[T+ > t],
we obtain the claim.
Since a noncentral T distribution is (stochastically) increasing in its noncentrality parameter, any bound obtained for E[max_{v ∈ C} |v^t ξ|] will result in a corresponding bound on the quantiles of the corresponding noncentral T distribution
and therefore of those of γC . In the limit r → ∞, the quantiles of the noncentral T distribution reduce to those of a shifted Gaussian distribution with unit
variance.
Here is a naive bound on (some) quantiles of a noncentral T :
Lemma A.3. The 1 − α quantile of a noncentral T distribution with r degrees
of freedom and noncentrality parameter µ ≥ 0 is upper bounded by:
(µ + √(2 log(2/α))) / (1 − 2√(2 log(2/α)/r))_+.
Proof. Let
T = (µ + ζ)/√(V/r),

where ζ ∼ N(0, 1) and V ∼ χ²(r). We have (as a consequence of e.g. [7], Lemma
8.1), for any η ∈ (0, 1]:

P[√V ≤ √r − 2√(2 log η^{-1})] ≤ η,

as well as the classical bound

P[ζ ≥ √(2 log η^{-1})] ≤ η.

It follows that

P[T ≥ (µ + √(2 log η^{-1}))/(1 − 2√(2 log(η^{-1})/r))_+] ≤ 2η.
The claimed estimate follows.
Appendix B: Proofs
Proof of Lemma 2.2. With the notation of Definition 2.1, K(X, M, α, r) is the
1 − α quantile of (1/N )kzk∞ where z = (zM,i , M ∈ M, i ∈ M) is a Gaussian
vector, independent of N , with mean vector zero and covariance matrix corr(Σ),
where Σ is defined by, for i ∈ M ∈ M and i′ ∈ M ′ ∈ M,
Σ_{(M,i),(M′,i′)} = v_{M,i}^t v_{M′,i′} = (e_{i.M}^{|M|})^t (X_M^t X_M)^{-1} X_M^t X_{M′} (X_{M′}^t X_{M′})^{-1} e_{i′.M′}^{|M′|}.
Hence, Σ depends on X only through X t X. Also, if X is replaced by XD, where
D is a diagonal matrix with positive components, Σ becomes the matrix Λ with
for i ∈ M ∈ M and i′ ∈ M ′ ∈ M,
Λ_{(M,i),(M′,i′)} = (e_{i.M}^{|M|})^t D_{M,M}^{-1} (X_M^t X_M)^{-1} X_M^t X_{M′} (X_{M′}^t X_{M′})^{-1} D_{M′,M′}^{-1} e_{i′.M′}^{|M′|}
                  = D_{i,i}^{-1} D_{i′,i′}^{-1} Σ_{(M,i),(M′,i′)}.
Hence, corr(Σ) = corr(Λ). This shows that corr(Σ) depends on X only through
corr(X^t X) (we remark that because ∪_{M ∈ M} M = {1, . . . , p} and each X_M^t X_M is
invertible, we have that ‖X_i‖ > 0 for i = 1, . . . , p). Hence K(X, M, α, r) depends on X only through corr(X^t X).
Proof of Lemma 2.4. Using a direct cardinality-based bound we have the well-known inequality E[γ_{Ms,∞}] ≤ √(2 log(2 |{(M, i) : i ∈ M ∈ Ms}|)), hence

E[γ_{Ms,∞}] ≤ √( 2 log( 2 Σ_{i=1}^{s} i \binom{p}{i} ) ),

moreover

Σ_{i=1}^{s} i \binom{p}{i} ≤ s Σ_{i=0}^{s} \binom{p}{i} ≤ s (pe/s)^s,

the last inequality being classical and due to

(s/p)^s Σ_{i=0}^{s} \binom{p}{i} ≤ Σ_{i=0}^{s} \binom{p}{i} (s/p)^i ≤ (1 + s/p)^p ≤ e^s.

Since log s ≤ s/e, and using e^{1+2/e} ≤ 6, we obtain

log( 2 Σ_{i=1}^{s} i \binom{p}{i} ) ≤ log(2s) + s log(pe/s) ≤ s log(e^{1+2/e} p/s) ≤ s log(6p/s),

implying (8).
Proof of Lemma 3.2. Put κ = κ(X, s) < 1. Then, ‖X_i‖ ≥ (1 − κ)^{1/2} for i =
1, ..., p so that for i ∈ M ∈ Ms, corr(X_M^t X_M) = D_M X_M^t X_M D_M where D_M is a
|M| × |M| matrix defined by [D_M]_{i.M,i.M} = 1/‖X_i‖. Hence ‖D_M‖_op ≤ 1/√(1 − κ).
We have, by applications of the triangle inequality and since ‖.‖_op is a matrix
norm,

‖corr(X_M^t X_M) − I_{|M|}‖_op
  = ‖(D_M − I_{|M|}) X_M^t X_M D_M + X_M^t X_M (D_M − I_{|M|}) + X_M^t X_M − I_{|M|}‖_op
  ≤ ‖D_M − I_{|M|}‖_op ‖X_M^t X_M‖_op ‖D_M‖_op + ‖D_M − I_{|M|}‖_op ‖X_M^t X_M‖_op + ‖X_M^t X_M − I_{|M|}‖_op
  = ‖D_M − I_{|M|}‖_op ‖X_M^t X_M‖_op (‖D_M‖_op + 1) + ‖X_M^t X_M − I_{|M|}‖_op.    (21)
From (9)-(10), we have for all M ∈ Ms: ‖X_M^t X_M‖_op ≤ 1 + κ, as well as

‖D_M − I_{|M|}‖_op ≤ max_{i=1,...,p} |1/‖X_i‖ − 1|
  ≤ max(1 − 1/√(1 + κ), 1/√(1 − κ) − 1)
  = 1/√(1 − κ) − 1.

Plugging this into (21), we obtain

δ(X, s) ≤ (1/√(1 − κ) − 1)(1 + κ)(1/√(1 − κ) + 1) + κ = 2κ/(1 − κ).
Proof of Theorem 3.3. From Lemma 2.2, it is sufficient to treat the case where,
for any M, G_M = X_M^t X_M has ones on the diagonal; in that case δ(X, s) =
κ(X, s). We have

v_{M,i}^t = (e_{i.M}^{|M|})^t G_M^{-1} X_M^t
          = (e_{i.M}^{|M|})^t I_{|M|} X_M^t + (e_{i.M}^{|M|})^t (G_M^{-1} − I_{|M|}) X_M^t
          = X_i^t + r_{M,i}^t,

say. We have
r_{M,i}^t r_{M,i} = (e_{i.M}^{|M|})^t (G_M^{-1} − I_{|M|}) G_M (G_M^{-1} − I_{|M|}) e_{i.M}^{|M|}
                  ≤ ‖e_{i.M}^{|M|}‖^2 ‖G_M^{-1} − I_{|M|}‖_op^2 ‖G_M‖_op.
From (10), the eigenvalues of GM are all between (1 − δ) and (1 + δ), hence we
have
r_{M,i}^t r_{M,i} ≤ (δ/(1 − δ))^2 (1 + δ),

so that, letting c(δ) = √(1 + δ)/(1 − δ),

‖r_{M,i}‖ ≤ δ c(δ),

and

‖w_{M,i} − X_i‖ = ‖ v_{M,i}/‖v_{M,i}‖ − X_i ‖ = ‖ (v_{M,i}/‖v_{M,i}‖)(1 − ‖v_{M,i}‖) + v_{M,i} − X_i ‖ ≤ 2 ‖r_{M,i}‖,
from two applications of the triangle inequality, and using that kXi k = 1 since
we assumed that GM has ones on its diagonal for all M . Hence, we have
E[γ_{Ms,∞}] = E[ sup_{M ∈ Ms; i ∈ M} |w_{M,i}^t ξ| ]
            ≤ E[ sup_{M ∈ Ms; i ∈ M} |X_i^t ξ| ] + E[ sup_{M ∈ Ms; i ∈ M} |(w_{M,i} − X_i)^t ξ| ]
            ≤ E[ sup_{i=1,...,p} |X_i^t ξ| ] + 2δc(δ) E[ sup_{M ∈ Ms; i ∈ M} |((w_{M,i} − X_i)/‖w_{M,i} − X_i‖)^t ξ| ]
            ≤ √(2 log(2p)) + 2δc(δ) √(2s log(6p/s)),    (22)
where in the last step we have used Lemma 2.4.
Proof of Lemma 4.1. Since ‖X_i‖ = 1 for i = 1, ..., p we have corr(X^t X) = X^t X
and so κ(X, s) = δ(X, s). The Gram matrix in (17) can be written as I_p + c U_{p,k},
where U_{p,k} is the 3 × 3 block matrix with sizes (k, p − k − 1, 1) × (k, p − k − 1, 1)
defined by

U_{p,k} = ( [0]  [0]  [1]
            [0]  [0]  [0]
            [1]  [0]   0  ).
Consider a model M with |M | = s ≤ k ≤ p − 1, and denote by GM its
Gram matrix. If p ∉ M, then G_M = I_s and ‖G_M − I_s‖_op = 0. If p ∈ M, then
G_M = I_s + c U_{s,m}, where m = m(M) = |(M \ {p}) ∩ {1, . . . , k}| ≤ s − 1. The
operator norm of G_M − I_s is the square root of the largest eigenvalue of (c U_{s,m})^2,
where U_{s,m}^2 is a 3 × 3 block matrix with sizes (m, s − m − 1, 1) × (m, s − m − 1, 1)
defined by

U_{s,m}^2 = ( [1]  [0]  [0]
              [0]  [0]  [0]
              [0]  [0]   m  ).
The first block is an m × m matrix with all entries equal to 1, hence its only
non-null eigenvalue is m. This is also the (only) eigenvalue of the last block (a
1 × 1 matrix). Thus, the largest eigenvalue of U_{s,m}^2 is m. Therefore, as m ≤ s − 1,
we have ‖G_M − I_s‖_op ≤ c√(s − 1) for all M such that |M| = s ≤ k ≤ p − 1, which
concludes the proof.
Proof of Proposition 4.2. Without loss of generality (by Lemma 2.2) we can
assume that X = Z^(c,k), where Z^(c,k) is the p × p matrix defined at the beginning
of Section 4.1. The proof is an extension of the proof of [6, Theorem 6.2]. For
m ≥ 0, consider a model M such that M ∋ p, M ∩ {k + 1, . . . , p − 1} = ∅,
and |M | = m + 1; in other words, M = {i1 , . . . , im , p} such that i1 , . . . , im
are elements of {1, . . . , k}. Denote by M_{m:k}^{+p} the set of all such models. Let
u_{M,p} = Z_p − P_{M\{p}}(Z_p), where Z_p is the last column of Z^(c,k), and where
P_{M\{p}}(Z_p) is the orthogonal projection of Z_p onto the span of the columns
with indices M \ {p}. Observe that the column i_j of Z^(c,k) is the i_j-th base
column vector of R^p that we write e_{i_j}, therefore

P_{M\{p}}(Z_p) = Σ_{j=1}^{m} (e_{i_j}^t Z_p) e_{i_j} = c(e_{i_1} + . . . + e_{i_m}).
Hence, we have, for M ∈ M_{m:k}^{+p},

[u_{M,p}]_j = 0             for j = k + 1, . . . , p − 1,
            = 0             for j = 1, . . . , k; j ∈ M,
            = c             for j = 1, . . . , k; j ∉ M,
            = √(1 − kc²)    for j = p.
Recall that we have w_{M,p} = u_{M,p}/‖u_{M,p}‖. Hence, for M ∈ M_{m:k}^{+p},

[w_{M,p}]_j = 0                          for j = k + 1, . . . , p − 1,
            = 0                          for j = 1, . . . , k; j ∈ M,
            = c/√(1 − mc²)               for j = 1, . . . , k; j ∉ M,
            = √(1 − kc²)/√(1 − mc²)      for j = p.
Hence, we have
E[γ_{Ms,∞}] = E[ max_{|M| ≤ s, i ∈ M} |w_{M,i}^t ξ| ]
            ≥ E[ max_{M ∈ M_{(s−1):k}^{+p}} w_{M,p}^t ξ ]
            = E[ (√(1 − kc²)/√(1 − (s − 1)c²)) ξ_p + (c/√(1 − (s − 1)c²)) Σ_{j=1}^{k−s+1} ξ_{k−j:k} ],
where ξ1:k ≤ . . . ≤ ξk:k are the order statistics of ξ1 , . . . , ξk . Hence, since s − 1 <
k, we obtain
E[γ_{Ms,∞}] ≥ 0 + (c/√(1 − (s − 1)c²)) E[ Σ_{j=1}^{k} ξ_j − Σ_{j=1}^{s−1} ξ_{j:k} ]
            = (c/√(1 − (s − 1)c²)) E[ Σ_{j=1}^{s−1} ξ_{k−j:k} ]
            ≥ (c/√(1 − (s − 1)c²)) E[ Σ_{j=1}^{s−1} max_{l=1,...,⌊k/s⌋} ξ_{(j−1)⌊k/s⌋+l} ].
In the above display, each maximum has mean value larger than A√(log⌊k/s⌋),
with A > 0 a universal constant (see e.g. Lemma A.3 in [11]). Hence, we have

E[γ_{Ms,∞}] ≥ A (c(s − 1)/√(1 − (s − 1)c²)) √(log⌊k/s⌋).

Finally, a consequence of Gaussian concentration (Theorem A.1) is that the mean
and median of γ_{Ms,∞} are within √(2 log 2) of each other. Since we assumed α ≤ 1/2,
K(Z^(c,k), Ms, α, ∞) ≥ E[γ_{Ms,∞}] − √(2 log 2), which concludes the proof.
Proof of Corollary 4.3. When δ_p √s_p √(1 − log s_p/log p + 1/log p) → ∞, one can
see that in Theorem 3.3, the first term is negligible compared to the second one.
Since δ_p → 0, the first result (18) follows from Theorem 3.3.
We now apply Proposition 4.2 with c_p = δ_p/√(s_p − 1) and k_p = min(p −
1, ⌊1/c_p² − 1⌋). From Lemma 4.1, δ(Z^(c_p,k_p), s_p) ≤ c_p √(s_p − 1) = δ_p. We then
have, with two positive constants A′ and A,

K(Z^(c_p,k_p), M_{s_p}, α, ∞) ≥ A′ δ_p √s_p √( log( min(p − 1, ⌊1/c_p² − 1⌋)/s_p ) )
                              ≥ A δ_p √s_p √( log min(⌊(p − 1)/s_p⌋, 1/δ_p²) ).
This concludes the proof of (19).
Appendix C: Code for computing Bℓ (q, r, ρ)
Bl <- function(q, r, rho, l, I = 1000) {
  ##
  ## Compute an upper bound for the quantile 1-l of
  ##   max_{i=1,...,rho} (1/N) | w_i' V |
  ## where:
  ##   - the w_1,...,w_{rho} are unit vectors
  ##   - V follows N(0, I_q)
  ##   - r*N^2 follows X^2(r)
  ##
  ## Adapted from K4 in Bachoc, Leeb, Poetscher 2018
  ##
  ## Parameters:
  ## q.......: dimension of the Gaussian vector
  ## r.......: degrees of freedom for the variance estimator
  ## rho.....: number of unit vectors
  ## l.......: type I error rate (1 - confidence level)
  ## I.......: numerical precision
  ##
  ## Value:
  ##   A numerical approximation of the upper bound
  ##
  ## vector of quantiles of Beta distribution:
vC <- qbeta(p = seq(from = 0, to = 1/rho, length = I),
shape1 = 1/2, shape2 = (q-1)/2,
lower.tail = FALSE)
## Monte-Carlo evaluation of confidence level
## for a constant K
fconfidence <- function(K){
prob <- pf(q = K^2/vC/q, df1 = q,
df2 = r, lower.tail = FALSE)
mean(prob) - l
}
quant <- qf(p = l, df1 = q, df2 = r, lower.tail = FALSE)
Kmax <- sqrt(quant) * sqrt(q)
uniroot(fconfidence, interval = c(1, 2*Kmax))$root
}
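As a usage sketch (ours, not from [1]), the alternative bound Ũ_RIP(p, s, δ) of Section 3.3 can then be obtained by calling Bl twice and minimizing over t; the function name U_RIP_tilde below is hypothetical.

## Sketch (ours): compute tilde-U_RIP(p, s, delta) of Section 3.3 by combining
## two calls to Bl() above and minimizing over t in (0, 1).
U_RIP_tilde <- function(n, p, s, delta, r, alpha = 0.05) {
  cdelta <- sqrt(1 + delta) / (1 - delta)           # c(delta) of Theorem 3.3
  n_models <- sum(choose(p, 1:s))                   # |Ms| (fine for moderate p, s)
  f <- function(t) {
    Bl(q = min(n, p), r = r, rho = p,        l = t * alpha) +
      2 * delta * cdelta *
        Bl(q = min(n, p), r = r, rho = n_models, l = (1 - t) * alpha)
  }
  optimize(f, interval = c(0.01, 0.99))$objective
}
U_RIP_tilde(n = 200, p = 100, s = 5, delta = 0.1, r = 50)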
References
[1] F. Bachoc, H. Leeb, and B. M. Pötscher. Valid confidence intervals for
post-model-selection predictors. The Annals of Statistics (forthcoming),
2018.
[2] F. Bachoc, D. Preinerstorfer, and L. Steinberger. Uniformly valid confidence intervals post-model-selection. arXiv:1611.01043, 2016.
[3] Z. Bai and J. W. Silverstein. Spectral analysis of large dimensional random
matrices, volume 20. Springer, 2010.
[4] A. Belloni, V. Chernozhukov, and C. Hansen.
Inference for high-dimensional sparse econometric models. Advances in Economics and
Econometrics. 10th World Congress of the Econometric Society, Volume
III, pages 245–295, 2011.
[5] A. Belloni, V. Chernozhukov, and C. Hansen. Inference on treatment effects
after selection among high-dimensional controls. The Review of Economic
Studies, 81:608–650, 2014.
[6] R. Berk, L. Brown, A. Buja, K. Zhang, and L. Zhao. Valid post-selection
inference. The Annals of Statistics, 41(2):802–837, 2013.
[7] L. Birgé. An alternative point of view on Lepski’s method. In State of the
art in probability and statistics (Leiden, 1999), volume 36 of IMS Lecture
Notes Monogr. Ser., pages 113–133. Inst. Math. Statist., 2001.
[8] P. Bühlmann and S. Van De Geer. Statistics for high-dimensional data:
methods, theory and applications. Springer Science & Business Media, 2011.
[9] E. Candes, T. Tao, et al. The Dantzig selector: Statistical estimation when
p is much larger than n. The Annals of Statistics, 35(6):2313–2351, 2007.
[10] E. J. Candes and T. Tao. Decoding by linear programming. IEEE transactions on information theory, 51(12):4203–4215, 2005.
[11] S. Chatterjee. Superconcentration and related topics. Springer, 2014.
[12] V. Chernozhukov, D. Chetverikov, and K. Kato. Gaussian approximations
and multiplier bootstrap for maxima of sums of high-dimensional random
vectors. The Annals of Statistics, 41(6):2786–2819, 2013.
[13] B. S. Cirel’son, I. A. Ibragimov, and V. N. Sudakov. Norm of Gaussian
sample functions. In Proceedings of the 3rd Japan-U.S.S.R. Symposium
on Probability Theory (Tashkent, 1975), volume 550 of Lecture Notes in
Mathematics, pages 20–41. Springer, 1976.
[14] W. Fithian, D. Sun, and J. Taylor. Optimal inference after model selection.
arXiv:1410.2597, 2015.
[15] S. Foucart and H. Rauhut. A mathematical introduction to compressive
sensing. Basel: Birkhäuser, 2013.
[16] C. Giraud. Introduction to high-dimensional statistics, volume 139 of
Monographs on Statistics and Applied Probability. CRC Press, 2015.
[17] P. Kabaila and H. Leeb. On the large-sample minimal coverage probability of confidence intervals after model selection. Journal of the American
Statistical Association, 101:619–629, 2006.
[18] A. K. Kuchibhotla, L. D. Brown, A. Buja, E. I. George, and L. Zhao. A
model free perspective for linear regression: Uniform-in-model bounds for
post selection inference. arXiv preprint arXiv:1802.05801, 2018.
[19] J. D. Lee, D. L. Sun, Y. Sun, and J. E. Taylor. Exact post-selection inference, with application to the lasso. The Annals of Statistics, 44(3):907–927,
2016.
[20] J. D. Lee and J. E. Taylor. Exact post model selection inference for marginal
screening. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence,
and K. Q. Weinberger, editors, Advances in Neural Information Processing
Systems 27, pages 136–144. Curran Associates, Inc., 2014.
[21] H. Leeb and B. M. Pötscher. Model selection and inference: Facts and
fiction. Econometric Theory, 21:21–59, 2005.
[22] H. Leeb and B. M. Pötscher. Performance limits for estimators of the
risk or distribution of shrinkage-type estimators, and some general lower
risk-bound results. Econometric Theory, 22:69–97, 2 2006.
[23] H. Leeb and B. M. Pötscher. Model selection. In T. G. Andersen, R. A.
Davis, J.-P. Kreiß, and T. Mikosch, editors, Handbook of Financial Time
Series, pages 785–821, New York, NY, 2008. Springer.
[24] V. A. Marčenko and L. A. Pastur. Distribution of eigenvalues for some sets
of random matrices. Mathematics of the USSR-Sbornik, 1(4):457, 1967.
[25] B. M. Pötscher. Confidence sets based on sparse estimators are necessarily
large. Sankhya, 71:1–18, 2009.
[26] R Core Team. R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria, 2018.
[27] R. J. Tibshirani, A. Rinaldo, R. Tibshirani, and L. Wasserman. Uniform
asymptotic inference and the bootstrap after model selection. The Annals
of Statistics, forthcoming, 2015.
[28] R. J. Tibshirani, J. Taylor, R. Lockhart, and R. Tibshirani. Exact postselection inference for sequential regression procedures. Journal of the
American Statistical Association, 111(514):600–620, 2016.
[29] S. van de Geer, P. Bühlmann, Y. Ritov, and R. Dezeure. On asymptotically
optimal confidence regions and tests for high-dimensional models. The
Annals of Statistics, 42:1166–1202, 2014.
[30] S. A. Van De Geer, P. Bühlmann, et al. On the conditions used to prove
oracle results for the lasso. Electronic Journal of Statistics, 3:1360–1392,
2009.
[31] C.-H. Zhang and S. Zhang. Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical
Society B, 76:217–242, 2014.
[32] K. Zhang. Spherical cap packing asymptotics and rank-extreme detection.
IEEE Transactions on Information Theory, 63(7), 2017.
[33] P. Zhao and B. Yu. On model selection consistency of lasso. Journal of
Machine Learning Research, 7(Nov):2541–2563, 2006.