2014 Serra

Robust integral compounding criteria for
trend and correlation structures
M. Stehlík, J. López-Fidalgo, V. Casero-

Alonso & Elena Bukina
Stochastic Environmental Research

and Risk Assessment
ISSN 1436-3240
Stoch Environ Res Risk Assess

DOI 10.1007/s00477-014-0892-5
1 23
Your article is protected by copyright and
all rights are held exclusively by Springer-
Verlag Berlin Heidelberg. This e-offprint is
for personal use only and shall not be self-
archived in electronic repositories. If you wish
to self-archive your article, please use the
accepted manuscript version for posting on
your own website. You may further deposit
the accepted manuscript version in any
repository, provided it is only made publicly
available 12 months after official publication
or later and provided acknowledgement is
given to the original source of publication
and a link is inserted to the published article
on Springer's website. The link must be
accompanied by the following text: "The final
publication is available at link.springer.com”.
1 23
Author's personal copy
DOI 10.1007/s00477-014-0892-5
ORIGINAL PAPER
Robust integral compounding criteria for trend and correlation

structures
M. Stehlı́k • J. López-Fidalgo • V. Casero-Alonso •
Elena Bukina
Ó Springer-Verlag Berlin Heidelberg 2014
Abstract Optimal design is a crucial issue in Environ- integral compound criterion with respect to a density from
mental measurement with typical time–space correlated a given parametric family of distributions is optimized. We
observations. A modified Arrhenius model with a particular also discuss some general conditions around the behavior
correlation structure will be applied to the methane of the introduced approach for comparing the FIMs and
removal in the atmosphere, a very important environmental provide computing methods.
issue at this moment. We introduce a class of integrated
compound criteria for obtaining robust designs. In partic- Keywords Correlated errors Efficiency Equidistant
ular, the paper provides an insight into the relationship of a design Experimental design Fredholm equation
compound D-optimality criterion for both the trend and Parameterized covariance functions Regularization.
covariance parameters, and the Integrated Mean Squared
Prediction Error (IMSPE) criterion. In general, if there are
two or more approaches of a given problem, e.g. two rival 1 Introduction
models or two different parts of a model, an integral
relationship may be constructed with the aim of finding a Comparison of FIMs from different models have been
suitable compromise between them. The Fisher informa- considered in different ways in the statistical literature [see
tion matrix (FIM) will be used in both cases. Then the e.g. Ahmadi and Arghami (2003), Hofmann (2004) or
Alshunnar et al. (2012)]. If there are two or more approa-
ches of a given problem, e.g. two rival models or two
M. Stehlı́k (&)
different parts of a model, an integral relationship may be
Departamento de Matemática, Universidad Técnica Federico
Santa Marı́a, Casilla 110-V, Valparaı́so, Chile constructed with the aim of finding a suitable compromise
e-mail: [email protected]; [email protected] between them. The purpose of this is to obtain robust
designs for both matrices. The usual compound criteria
M. Stehlı́k
may not be powerful enough to do this in a proper way.
Department of Applied Statistics, Johannes Kepler University in
Linz, Linz, Austria Observations from various environmental measurements
e-mail: [email protected] are often approximated as realizations of correlated random
fields. Such an approach is interesting for assessing e.g.
J. López-Fidalgo V. Casero-Alonso
drought and flood risks [see e.g. Unami et al. (2010)]. The
Department of Mathematics, University of Castilla-La Mancha,
Ciudad Real, Spain robust designs obtained in the current paper can be of
e-mail: [email protected] interest for irregular sampling, studied e.g. in Tandeo et al.
V. Casero-Alonso (2011) for the case of aggregation of many meteorological
e-mail: [email protected] and oceanographic variables from satellites. Rodrı́guez-
Dı́az et al. (2012) gave optimal designs for a modified
E. Bukina
Arrhenius model, used for modeling a flux of methane in
Laboratoire I3S, CNRS/Université de Nice-Sophia Antipolis,
Sophia Antipolis, France troposphere. This has an important impact in greenhouse
e-mail: [email protected] gas emission. Kinetics of chemical reactions are usually
123
considered at the Earth. Thus, this problem provides an notation hereafter. A popular criterion is D-optimality,
unusual environment for kinetic reactions, where the cor- based on the maximization of the determinant of the
relation structure and the modified Arrhenius equation can information matrix, that is minimization of the determinant
play a crucial role. In particular, it was applied to the of the covariance function of the estimators of the param-
methane removal in the atmosphere, a very important issue eters, at least asymptotically under some conditions [see
at this moment. This case is used in Sect. 5.1 to illustrate e.g. Pázman (2010)].
the method. The existence of error correlation structure introduces a
In a model with correlated observations, where the trend number of issues, which require special techniques in the
and the covariance structure have different parameters the traditional optimal design theory [for a recent discussion
FIM is a block diagonal matrix. Putting both matrices at the see Müller and Stehlı́k (2009)]. The statistical model we
same level may not be appropriate at all. The method pro- consider in this section is a random field
posed in this paper deals with this situation producing robust
designs for estimating both parts of the model. In this Section Y ð xÞ ¼ gðx; h1 Þ þ eh2 ð xÞ
a motivating example is given where the usual compound with design points (coordinates of monitoring sites) nn ¼
criterion never pays attention to one part of it. In order to ðx1 ; . . .; xn Þ taken from a compact design space
introduce properly the motivating example some background X ¼ X n ; X ¼ ½a; b; 1\a\b\1. The parameters h1
on optimal experimental design is given in the next Section. are unknown and the variance-covariance structure of the
errors depends on parameters h2 . However, if distributional
1.1 Optimal experimental design background assumptions are made, one can employ the Maximum
Likelihood Estimators. If there is not overlapping between
Let y be the response variable and assume x is a vector of h1 and h2 the information matrix exhibits a block diagonal
explanatory variables. Assume the experiment is realized n form
times for n values, x1 ; x2 ; . . .; xn , of x. Let Y1 ðx1 Þ
y1 ; Y2 ðx2 Þ y2 ; . . .; Yn ðxn Þ yn be the corresponding out- M1 ðhÞ 0
:
comes. Let be a general model describing the relationship 0 M2 ðhÞ
between both groups of variables through a parametric
Outputs from various environmental measurements are
family of distributions defined by the probability density
often approximated as realizations of correlated random
functions (pdf’s),
fields. Two approaches are considered to design experi-
fhðy1 ; y2 ; . . .; yn jx1 ; x2 ; . . .; xn ; h1 ; h2 ; . . .; hm Þ ments for a correlated random field when the objective is to
j h ¼ ðh1 ; h2 ; . . .; hm ÞT 2 Hg: obtain precise predictions over the whole experimental
domain. The first one corresponds to a compound Da -
An experimental condition x can be chosen on a compact optimality criterion for both the trend and the covariance
design space, v. An exact design of size n is defined by parameters introduced by Müller and Stehlı́k (2010),
n ¼ ðx1 ; x2 ; . . .; xn Þ, where some of them may be repeated.
Thus, a probability measure can be defined with support on /1 ðh; M1 ðhÞ; M2 ðhÞÞ ¼ jM1 ðhÞjh3 jM2 ðhÞj1h3 ; 0 h3 1;
the different points and weights proportional to the number
of repetitions, say nðxÞ for each x 2 v. The FIM, under where h3 plays here the role of a and M1 ðhÞ and M2 ðhÞ are
typical regularity assumptions, for a particular experi- the FIMs for the parameters h1 and h2 .
mental condition x is The second one relies on an approximation of the mean
2 squared prediction error already proposed in the literature.
oLðx; hÞ oLðx; hÞ o Lðx; hÞ
Iðx; hÞ ¼ E h ¼Eh : In particular, Müller and Pronzato (2009) conjectured and
oh ohT oh2 showed on an example that for some particular settings
both approaches yield similar optimal designs, thereby
The expectation E h is computed here with respect to the
revealing a sort of equivalence theorem for random fields.
distribution with pdf hðyjx; hÞ and Lðx; hÞ is the log–like-
For estimations of spatial fields a classical criterion is the
lihood for this distribution. This matrix is a measure of the
Empirical Kriging (EK) prediction error. Here we have to
information provided by x to the estimation of the model.
b
minimize the so-called Kriging variance Var½ YðxjnÞ ¼
The associated FIM will be
X b 2
E½ð YðxjnÞ YðxÞÞ MSPEðxjnÞ (Mean Squared Pre-
MðhÞ ¼ nðxÞIðx; hÞ; diction Error), where Y b ðxjnÞ denotes the best linear unbi-
x2v
ased predictor (BLUP) of YðxÞ based on a particular design
where for simplicity of notation the dependence of the FIM n. Thus, the EK-optimal design minimizes the criterion
on an experimental design, n, will be omitted in the function
123
bd takes to optimize the simple average of the kernel on the

wðnÞ ¼ max Var½ YðxjnÞ:
x2X parameters. Other typical distributions are the uniform or
the Beta family.
However, this criterion is difficult to manage. A much
As the following results state some kernels are equivalent,
easier criterion is Da -optimality.
that is, they produce the same spectrum of optimal designs. Let /
and /0 two concave optimality criteria defined on an information
1.2 Integral compounding matrices space, let k 2 ½0; 1 and /k ¼ k/ þ ð1 kÞ/0 . The
design maximizing /k will be called nk .
Let consider two models, which may differ in their statis- Lemma 1 Let u : R ! R be a continuous increasing
tical representation. Then we can construct two different function. Then the criteria /~ ¼ l/ þ ð1 lÞu /0 and
l
FIMs. This is the case of the trend and the covariance
/k are equivalent in the sense that for any k 2 ½0; 1 there
models. Thus, let M1 ðhÞ; M2 ðhÞ be two FIMs with h 2 H:
Let u : H ! R be a function from a set D, eventually a exists l 2 ½0; 1 such that nk ¼ n~l , where the later is a
parametric family of functions, e.g. pdf’s. We will consider ~ .
design maximizing /l
a compound integral criterion
Proof Since /0 is concave and u is increasing then u /0
Z
is concave. From Cook and Wong (1994) given k 2 ½0; 1
Lu ¼ /ðh; M1 ðhÞ; M2 ðhÞÞuðhÞdh; ð1Þ there exist a constant Ck such that nk ¼ arg max
H
f/ðnÞj/0 ðnÞ Ck g. But /0 ðnÞ Ck if and only if
where / is a particular kernel. It is important to mention u½/0 ðnÞ uðCk Þ. Applying again the results of Cook and
that the parameter vector here includes model parameters Wong (1994) to / ~ , given uðCk Þ there exists l 2 ½0; 1
l
and other parameters or constants related to the procedure, ~ ðnÞ ¼ n~l .
such that nk ¼ arg maxn / h
l
e.g. new parameters defining the kernel.
The aim is to find an appropriate function u , e.g. Corollary 1
maximizing or minimizing Lu, or solving the Fredholm
1. Let u; u0 : R ! R continuous and increasing then the
integral equation Lu ¼ w, where w is a given criterion.
criteria /~ ¼ lu / þ ð1 lÞu0 /0 and / are
Then a design will be constructed from this function. This l k
will provide a suitable robust optimal design. Notice, that equivalent.

optimizing Lu may be different than optimizing its kernel ~~ l 01l
2. Criterion / l ¼/ / is equivalent to
/. ~ 0
/ ¼ l log / þ ð1 lÞ log / , which is equivalent to
l
Typical examples for the kernel, /, are
/k .
1. /1 ðh; M1 ðhÞ; M2 ðhÞÞ ¼ jM1 ðhÞjh3 =m1 jM2 ðhÞjð1h3 Þ=m2 . This means in particular that the same optimal design is
2. /2 ðh; M1 ðhÞ; M2 ðhÞÞ ¼ mh31 log jM1 ðhÞj þ 1h
m2 log jM2 ðhÞj.
3
obtained for the two criteria for different values k and l in
3. /2 ðh; M1 ðhÞ; M2 ðhÞÞ ¼ h3 log jM1 ðhÞj þ ð1 h3 Þ log each criterion.
jM2 ðhÞj. If the function uðhÞ does not depend on the parameters
1=m1 1=m2 of the model(s) then we get the traditional compound
jM1 ðhÞj jM2 ðhÞj
4. /3 ðh; M1 ðhÞ; M2 ðhÞÞ ¼ h3 jM1 ðhÞj þð1 h3 Þ jM2 ðhÞj ,
design as shown in the following Lemma.
where jMi ðhÞj corresponds to the maximum determinant of R1
jMi ðhÞj, i ¼ 1;2. Lemma 2 If uðhÞ depends only on k then Lu ¼ 0 ½k/ þ

5. /3 ðh; M1 ðhÞ; M2 ðhÞÞ ¼ h3 jM1 ðhÞj
þ ð1 h3 Þ jM2 ðhÞj
. ð1 kÞ/0 uðkÞdk is equivalent to the compound criterion
jM1 ðhÞj jM2 ðhÞj R1
kuðkÞdk
Here j j stands either for the determinant or the absolute /l ¼ l/ þ ð1 lÞ/0 with l ¼ R0 1 .
uðkÞdk
value, hT ¼ ðh1 ; h2 ; h3 Þ, 0 h3 0 and mi is the dimension
0
of sub-vector hi of h, i ¼ 1; 2. After finding an appropriate 1.3 Motivating example: misspecification of linear

function u an optimal design is searched by other means as and exponential correlation
minimizing or maximizing LuðnÞ, or else constructing the
design using a function u. Let us consider the model Yðxi Þ ¼ eðxi Þ; xi 2 ½0; 1 where
The function u may be fixed from the beginning if there errors are correlated according to either an exponential,
is some interest or knowledge on the parameters of this expðhdÞ, or a linear, 1 hd, covariance function. Here
problem. Trivial examples are the Dirac measure, which d ¼ jxi xj j; and h [ 0 is an unknown parameter of
leads to locally optimizing the kernel, or u ¼ 1, which interest. Notice that expðhdÞ ¼ 1 hd þ oðhdÞ; where o
123
is the ‘‘little-o’’ Landau symbol defined as follows: f ðxÞ ¼ (collapsing again) for any h 2 ½0; 1. In the situation
oðgðxÞÞ means that f =g ! 0 as x ! 0: For simplicity, we when the exponential covariance is the true one this
consider exact two-point optimal designs, ðx; x þ dÞ, for design will have only about 15 % of efficiency.
the covariance parameter h: Typical approaches in case of Now we will illustrate the potential of regularized integral
covariance uncertainty consider individual models and compound criteria. Taking kernel
compounding. In the next a, b, c we show how integrated
compound criteria may regularize collapsing of the design /1 ðh; Mexp ðhÞ; Mlin ðhÞÞ ¼ jMexp ðhÞjh jMlin ðhÞj1h ;
points. By collapsing we understand that the criterion
the integral compound criterion has the form
attains its maximum at d ¼ 0.
Z1
(a) Optimal design for exponential correlation The FIM
Lu ¼ /1 ðh; Mexp ðhÞ; Mlin ðhÞÞuðhÞdh:
for h under exponential correlation has the form
2 0
Mexp ðhÞ ¼ d expð2hdÞð1þexpð2hdÞÞ
ð1expð2hdÞÞ2
: The maximum is
Let us take u ðhÞ ¼ 1 (i.e. the density of the uniform dis-
obtained for d ¼ 0; i.e. the optimal design collapses.
tribution over ½0; 1). Then maximum of Lu for h ¼ 1 is
(b) Optimal design for linear correlation This covari-
received at d ¼ 1; i.e. it is not collapsing and it has effi-
ance structure is defined by a linear semi-variogram
ciency 100 % for the linear covariance on the design space
cðdÞ 12 var ðYðxÞ Yðx þ dÞÞ ¼ hd with zero
½0; 1, as we can seen in Fig. 1 (left). If we use non-uniform
nugget, that is, there is not additional term for the prior on parameter h, e.g. having density 2h, the effect is
variance. For regularity we assume that even more significant as we can see in Fig. 1 (right).
hd 2; h [ 0 (thus, negative correlation is allowed; Detailed proofs of the assertions above are included in
to avoid this, we need hd\1). Then the maximal the ‘‘Appendix’’.
FIM is obtained for maximal d: To verify this, we
obtain the FIM 1.4 Structure of the paper
2 2
2hd þ h d þ 2
Mlin ðhÞ ¼ : The paper is organized as follows. In Sect. 2 we study some
h2 ðhd 2Þ2
properties of the class of integrated compound criteria for
We have IMSPE approximation. We provide (by means of Fredholm
oMlin ðhÞ 2d integral operators) the theoretical background for existence
¼ : of specific criteria, related to IMSPE. In Sect. 3 theoretical
od ð2 hdÞ3
arguments are provided for the need of the regularization of
So Mlin ðhÞ is an increasing function for every the Fredholm equation. In Sect. 4 we illustrate the meth-
(acceptable) 0\d\ minf2=h; 1g, otherwise it would odology by numerical experiments. This approach is
be decreasing. Thus d ¼ minf2=h; 1g. applied to the simultaneous estimation. In Sect. 5 we
(c) Compound criterion and its regularization by an integral compare the FIM for both cases of simultaneous equations
compound criterion The criterion h log Mexp ðhÞ þ ð1 and conditionally restricted designs and we optimize the
hÞ log Mlin ðhÞ attains the maximum for d ¼ 0 functional (1) with respect to the density from a given
Fig. 1 Information gain for two

integrated compound designs
123
parametric family (namely Beta). This example considers 2.1 Integrated compound criteria mimics the nugget
two different approaches to a model with an explanatory effect
variable that is not under the control of the experimenter,
but it can only be observed after the experiment is realized. As was seen in the motivating example in the Introduction
The two approaches are simultaneous equations models on a non-constant u may regularize a ‘‘collapsing’’ effect [see
the one hand and conditionally restricted designs on the e.g. Kiseľák and Stehlı́k (2008)]. Thus, it may be said that
other [Casero-Alonso and López-Fidalgo (2014), López– integrating /1 ðh; M1 ðhÞ; M2 ðhÞÞ with non-constant u
Fidalgo and Garcet–Rodrı́guez (2004)]. A Discussion sec- works, in this sense, similarly as the so called ‘‘nugget
tion concludes the paper. effect’’.
Let us consider an Ornstein–Uhlenbeck (OU) process
for a 2-point design where the estimation of the correlation
2 Compound criteria for IMSPE parameter is of interest,
E ½YðxÞ ¼ h1 ; corr ðYðxÞ; Yðx þ dÞÞ ¼ expðh2 dÞ; x 2 ½0; 1:
The main aim of the kriging technique consists in pre-
dicting output of a simulator on the experimental region, The information matrix for both parameters is diagonal
and for any untried location x the estimation procedure is with these two elements in it, apart from the variance,
focused on the BLUP, Y b ðxjnÞ: Thus natural criteria will
minimize suitable functionals of the MSPE. Since often 2 expðh2 dÞ d2 ð1 þ expð2h2 dÞÞ
M1 ðhÞ ¼ and M2 ðhÞ ¼ :
the prediction accuracy is related to the entire prediction 1 þ expðh2 dÞ ð1 expð2h2 dÞÞ2
region v ¼ X n ; a very practical design criterion is the
Integrated MSPE given by IMSPEðxjnÞ ¼ The kernel /1 is considered here with a nominal value
R 2 h2 ¼ 1. If two–point exact designs are searched for each
X n r MSPEðxjnÞdnðxÞ: IMSPE was used in several
papers [see Sacks et al. (1989), Crary (2002)] for con- value of the parameters maximizing the kernel, it can be
struction of optimal designs. seen that the optimal value for the distance is
8
Some of the properties of criterion (1) for /1 will be >
<0 h3 2=3;
studied in this paper, which is given by the direct inte-
d ¼ 2 ð0; 1Þ 2=3\h3 \0:7627;
gration of Da -optimality criterion. There are several pos- >
:
1 h3 0:7627:
sible extensions. One possible modification is related to the
region for h: There is no particular reason for restriction If the tuning parameter h3 is assumed equal to the param-
h 2 ½0; 1: However, we assume in this paper that h belongs eter h2 , say h h3 ¼ h2 , then a similar result is obtained,
to a compact interval and thus ‘for the sake of simplicity’ 8
>
<0 h 2=3;
we normed this interval to be ½0; 1: An important question
is the dimensionality of h: For designs with large number d ¼ 2 ð0; 1Þ 2=3\h\0:7215;
>
:
of points a multivariate function uðh3 ; . . .; hm Þ may be more 1 h 0:7215:
appropriate. Then
Z In both cases for h3 2=3 there is the so called ‘‘collapsing
Lu ¼ jM1 ðhÞjh3 . . .jM1 ðhÞjhm jM2 ðhÞj1h3 . . . effect’’, reported recently by Crary (2002) and Müller and
Stehlı́k (2009). Let us consider the approach introduced in
½0;1m2
this paper.
jM2 ðhÞj1hm uðh3 ; . . .; hm Þdh3 . . .dhm :
Lemma 3
An alternative integrated criterion is given by /1 and the
1. If h h3 ¼ h2 and uðhÞ ¼ 1 then
Fredholm equation w ¼ Lu. From numerical reasons (ill–
conditioned matrix) we use the criterion /1 ðh3 ; Z1
M1 ðhÞ; M2 ðhÞÞ ¼ mh31 log jM1 ðhÞj þ 1h arg maxðLuÞðdÞ ¼ arg max /1 ðh; M1 ðhÞ;
m2 log jM2 ðhÞj: Notice,
3
d d
that in the latter case as stated in Lemma 2 we obtain a 0
criterion of the form mA1 log jM1 ðhÞj þ 1A M2 ðhÞÞuðhÞ dh ¼ 0:

m2 log jM2 ðhÞj,
R1
where A ¼ 0 h3 uðh3 Þdh3 ; i.e. A is the mean of the random Thus, there is not improvement in the collapsing effect
variable h3 (since from practical reasons we later will at all. Let consider other functions.
regularize it by u 0; jjujj ¼ 1Þ: Moreover, this approach 2. If uðhÞ ¼ h; h2 ; eh ; sin h then
relates to a general compound criterion treated in the lit-
d ¼ arg max ðLuÞðdÞ [ 0:
erature [see e.g. McGree et al. (1988)]. d2½0;1
123
In particular, d ¼ 1 for all and d ¼ of XðuÞ and the error jjLu wjj2 [see e.g. Tikhonov and
1:1411; 1:5280; 1:8121; 1:0645 respectively if the Arsenin (1977) for a seminal paper]. The so-called Tik-
design space is large enough. honov regularization has been extensively studied,
Z X r
This means an important improvement since collapsing XðuÞ ¼ ai jf ðiÞ j2 ;
does not occur anymore. By this we have obtained the i¼0
regularization of collapsing by integral compound criteria.
where ai is a real constant and f ðiÞ the i-th derivative.
The other way of regularizing of collapsing is to employ a
Wahba (1977) developed the generalized cross valida-
nugget [Müller and Stehlı́k (2009)].
tion method to estimate k directly from the data. Hansen
(1992) stated the need of regularization in the discretized
2.2 Integrated compound criteria mimics IMSPE
situation. The numerical experiments given in Sect. 4
illustrate the need of regularization in this setup.
Let consider again the Ornstein–Uhlenbeck model with
In what follows the minimization is considered reformu-
constant trend and n design points: x; x þ d1 ;
lating the problem of equivalence of two optimality criteria
x þ d1 þ d2 ; . . .; x þ d1 þ þ dn1 . Then [Baldi Antog-
as a quadratic optimization problem. Numerical methods
nini and Zagoraiou (2010)]
will be used to solve it. The equivalence conjecture states
n1 CðnÞ there exists a function uH ðhÞ that the following holds
wðnÞ ¼ IMSPEðxjnÞ ¼ 1 þ 2AðnÞ ;
h2 BðnÞ Z1
where wðnÞ ’ /ðh; M1 ðhÞ; M2 ðhÞÞuH ðhÞ dh:
0
X
n1
di
AðnÞ ¼ ; Here wðnÞ denotes a criterion related to prediction and
i¼1
expð2h 2 di Þ 1
/ðnjhÞ /ðh; M1 ðhÞ; M2 ðhÞÞ denotes a kernel.
n1
X
expðh2 di Þ 1 The problem is how to find such a function uðhÞ, so that
BðnÞ ¼ di þ ;
i¼1
expðh2 di Þ þ 1 the two criteria are equivalent. Thus, we are interested in
" # the minimization of the following function
X
n1
3ð1 expð2h2 di ÞÞ þ 2h2 di expðh2 di Þ 2 32
CðnÞ ¼ di þ ; Z1
h2 ðexpðh2 di Þ þ 1Þ2
i¼1 fn ¼ 4wðnÞ /ðh; M1 ðhÞ; M2 ðhÞÞuðhÞ dh5 ;
0
and the kernel is
!h3 with respect to uðhÞ.
X
n1
expðh2 di Þ 1 For the sake of simplicity, consider that uðhÞ is of the
/1 ðh; M1 ðhÞ; M2 ðhÞÞ ¼ 1 þ
i¼1
expðh2 di Þ þ 1 form
!1h3 Xr
X
n1 2
d ðexpð2h2 di Þ þ 1Þ uðhÞ ¼ wi dhðiÞ ðhÞ; ð2Þ
i
:
i¼1 ðexpð2h2 di Þ 1Þ2 i¼1
In order to compute optimal designs some methods are where w ¼ ðw1 ; . . .; wr ÞT are some non negative weights,
adapted and evaluated in the next section. H ¼ ðhð1Þ ; . . .; hðrÞ ÞT is a partition of h with 0 hð1Þ
\hð2Þ hðrÞ 1, and dhðiÞ is the Kronecker function.
Designs nj ; j ¼ 1; . . .; q are chosen at this step and then our
3 Numerical solution of the problem problem can be written as
" #2
X q X q X
r
ðiÞ
The problem of solving numerically a Fredholm equation f ¼ fnj ¼ wðnj Þ wi /ðnj jh Þ ;
Lu ¼ w of the first kind is complicated by the fact that the j¼1 j¼1 i¼1
inversion operator is not in general continuous so that X

q
2
¼ wðnj Þ /T ðnj jHÞw ;
numerical stability is a major problem. A common j¼1
approach is to find a regularized solution by minimizing the !
X
q X
q
functional jjLu wjj2 þ kXðuÞ over a suitable set of ¼ 2
w ðnj Þ þ w T T
/ðnj jHÞ/ ðnj jHÞ w
functions D, where X is a stabilizing non-negative func- j¼1 j¼1
tional chosen to be small for u having desirable properties X
q
(typically smoothness) and k is a regularization parameter 2wT wðnj Þ/ðnj jHÞ;

j¼1
giving an appropriate balance (trade-off) between the value
123
Thus the problem of minimizing function f is equivalent to and then solving a corresponding (equivalent) uncon-
minimizing the following function strained problem.
P
1 In our case first we rewrite the constraint ri¼1 wi ¼ 1 in
f~ ¼ wT Aw bT w þ c; ð3Þ the form J T w ¼ 1, where J ¼ ð1; . . .; 1ÞT , and then we
2
define
where
X
q X
q fr ðxÞ ¼ f~ þ rðJ T w 1Þ2 ¼ f~ þ rðwT JJ T w 2wT J þ 1Þ;
A¼ /ðnj jHÞ/T ðnj jHÞ; b ¼ wðnj Þ/ðnj jHÞ; 1
j¼1 j¼1 ¼ wT Aw wT b þ c þ rðwT JJ T w 2wT J þ 1Þ
2
1 Xq
c¼ w2 ðnj Þ: 1
2 ¼ wT ðA þ 2rJJ T Þw wT ðb þ 2rJÞ þ r þ c:
j¼1 2
Hence, our aim is to find the optimum of a quadratic Now we can solve this unconstrained problem with some
function f~. The Hessian A of the objective function f~ is a large penalty term r instead of the original constrained
symmetric positive semi-definite matrix. Actually, for a problem.
fixed n, /ðnjHÞ is a column vector
0 1
/ðnjhð1Þ Þ 3.2 Weighted case
B .. C
/ðnjHÞ ¼ B @ .
C;
A
We also can be interested in minimization of the weighted
/ðnjhðrÞ Þ function
and thus /j ðnjHÞ/Tj ðnjHÞ is a symmetric matrix
0 1
/2j ðnjhð1Þ Þ ... /j ðnjhð1Þ Þ/j ðnjhðrÞ Þ
B C
B .. .. .. C
/j ðnjHÞ/Tj ðnjHÞ ¼ B . . . C:
@ A
/j ðnjhðrÞ Þ/j ðnjhð1Þ Þ ... /2j ðnjhðrÞ Þ
To construct the matrix A we sum up to q (number of the 2 32

Z1
chosen designs n) matrices of this form, so we also get a
fna ¼ 4wðnÞ þ a /ðnjhÞuðhÞ dh5 ;
symmetric matrix.
0
P
3.1 Constrained case where a is some scaling factor. We shall minimize qi¼1 fnai
R1
with respect to uðhÞ 0, such that 0 uðhÞdh ¼ 1. Then we
So far we considered that there are no constraints on the
minimize the resulting function with respect to the scaling
function uðhÞ. However, it can be useful to require uðhÞ to
factor a. Thus, we have
have some certain properties. For instance, we can impose
the following constraints on uðhÞ 0 X
q
min min fnaj :
Z1 a R1 j¼1
uðhÞ: uðhÞdh¼1
uðhÞdh ¼ 1: 0
0
Again, " #2
This corresponds to a constrained quadratic optimization X
q X
q X
r
a ðiÞ
problem. Since we assumed that uðhÞ is of the form fa ¼ fnj ¼ wðnj Þ þ a wi /ðnj jh Þ
P j¼1 j¼1 i¼1
uðhÞ ¼ ri¼1 wi dhðiÞ , the constraint here can be simply !
P X
q X
q
rewritten as ri¼1 wi ¼ 1. ¼ ðwðnj Þ þ aÞ þ w 2 T T
/ðnj jHÞ/ ðnj jHÞ w
The usual trick when solving a constrained quadratic j¼1 j¼1
optimization problem is a penalty approach. It consists in X
q
2wT ðwðnj Þ þ aÞ/ðnj jHÞ;
introducing some penalty term r to the original function f~ j¼1
123
R1 Gradient Descent), nowadays regarded as an algorithm

subject to 0 uðhÞdh ¼ 1.
with poor rate of convergence. The Barzilai and Borwein
The problem of minimization of the function fa is
method introduced by Barzilai and Borwein (1988) has
equivalent to minimization of the following function
much better convergence rate, but it is completely non-
1 monotonic. For more details on the construction of
f~a ¼ wT Aw bT w þ c;
2 sequence of step lengths of this type and the behavior of
where associated gradient methods see Zhigljavsky et al.
(2013).
X
q X
q
A¼ /ðnj jHÞ/T ðnj jHÞ; b ¼ ðwðnj Þ þ aÞ/ðnj jHÞ;
j¼1 j¼1
1X
q 4 Illustrative examples
c¼ ðwðnj Þ þ aÞ2 :
2 j¼1 In this Section we present several numerical examples to
illustrate the approach. Using iterative gradient methods,
To minimize with respect to a we use the MATLAB we are trying to minimize objective function (3) for wðnÞ ¼
function fminbnd within the interval ½10; 10. To min- IMSPEðxjnÞ and
imize fa for each fixed a we shall use some recent gradient
methods described in the next section. /1 ðnjhÞ ¼ jM1 ðhÞjh3 jM2 ðhÞj1h3 ; 0 h3 1:
For this criterion the resulting matrix A is usually very ill-

3.3 Gradient algorithms
conditioned. This can lead to a slow convergence of gra-
dient methods. The speed of convergence depends strongly
Consider a problem of minimizing a quadratic function in
on the spectrum of A, especially on the condition number
the following form
(ratio of the smallest and the largest eigenvalues of the
1 matrix). In this case the smallest eigenvalue is very close to
f ðxÞ ¼ xT Ax xT b ; ð4Þ
2 zero and the largest one is usually of magnitude 105 . It is
where x 2 Rr is an unknown vector, b is a given vector in known that when the condition number is large, numerical
Rr and A is a r r symmetric positive-definite matrix such methods converge slowly.
that When using other criterion function
0\m ¼ Tinf zT Az\M ¼ sup zT Az\1; /2 ðnjhÞ ¼ h3 log jM1 ðhÞj þ ð1 h3 Þ log jM2 ðhÞj; 0 h3 1;
z z¼1 zT z¼1
The gradient of the function (4) at point xk is the condition number of A is still quite large, however, it is
rf ðxk Þ ¼ gk ¼ Axk b. The solution to the optimization smaller than in the previous case. Thus, for our computa-
problem (4) is x ¼ A1 b. However, usually it is a very tions we have used a log-based criterion function.
difficult computational task to compute the inverse matrix For the examples below we choose the dimension of the
A1 , especially when the dimension of the problem, r, is partition H, which is also the dimension of the problem, to
large. In this case some numerical methods can be used to be r ¼ 100. Thus, matrix A is a symmetric r r matrix
determine the solution x . A big class of numerical opti- with the largest eigenvalue and the smallest eigenvalue
mization methods is the class of gradient algorithms. very close to zero. For the gradient algorithm the initial
The general gradient method for solving problem (4) vector is wð0Þ ¼ ð1; 1; . . .; 1Þ and the stopping criterion is
corresponds to the following iterative process. Let x0 2 Rr kgk2 tol with tol ¼ 104 . For each example we plot the
be a starting vector and g0 ¼ Ax0 b 2 Rr be the initial resulting function given by (2), and the two criteria as a
residual vector. At step k ¼ 1; 2. . ., let function of d12 ¼ jnð1Þ nð2Þ j, where nð1Þ and nð2Þ are the
xkþ1 ¼ xk ak gk ; ð5Þ first and second components of the corresponding design.
We have used 2, 3, 4, 5 and 6-point designs n
where is and ak [ 0, the step-size at iteration k, is deter- ðjÞ
of the following type: nj ¼ ð0; x2 Þ, nj ¼ ð0; x2 ; 1Þ, nj ¼
ðjÞ
mined by some rule. The iterations (5) can also be rewritten ðjÞ ðjÞ ðjÞ ðjÞ ðjÞ ðjÞ
ð0; 12 x2 ; x2 ; 1Þ, nj ¼ ð0; 13 x2 ; 23 x2 ; x2 ; 1Þ, nj ¼ ð0; 14 x2 ;
in terms of residuals (gradients) gk as
1 ðjÞ 3 ðjÞ ðjÞ ðjÞ
with 1 j 200, where x2 is a random
2 x2 ; 4 x2 ; x2 ; 1Þ
gkþ1 ¼ gk ak Agk :
vector. In each example the parameter of the covariance of
There exist numerous algorithms of gradient-type, the most the process h2 is equal to one. For the sake of brevity we
popular is the Steepest Descent method (also known as plot only results for 2 and 6-point designs.
123
7 −0.3
6 −0.4
5 −0.5 ψ (ξ)
D (ξ)
α
Criterion value
4 −0.6
3 −0.7
u(
2 −0.8
1 −0.9
0 −1
−1 −1.1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Fig. 2 Values of wi , i ¼ 1; . . .; r, against H ¼ ðhð1Þ ; . . .; hðrÞ Þ (left) and wðnÞ and Da ðnÞ as a function of d12 ¼ jnð1Þ nð2Þ j (right) for 2-point
designs.
2 0
−0.05
1
−0.1
0
Criterion value
−0.15
ψ (ξ)
−1 D (ξ)
α
−0.2
−2
−0.25
−3
−0.3
−4 −0.35
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.05 0.1 0.15 0.2 0.25
designs.
4.1 Unconstrained case example Da ðnÞ is very close to wðnÞ, even for the case of
2-point designs.
First, we do not impose any additional constraints on uðhÞ.
For the illustration see Figs. 2 and 3. We plot the resulting 4.2 Constrained case
values of wi , i ¼ 1; . . .; r, against H, and the two criteria
wðnÞ and Da ðnÞ versus the distances of the first two points Now we consider that the function uðhÞ is constrained so
R1 that
of the designs. In the figures Da denotes 0 /ðnjhÞuðhÞ dh,
where as uðhÞ we use the optimal function that we have Z1
found. In this case, approximations are successful: in each uðhÞdh ¼ 1;
0
123
1.5 0
1
−0.05
0.5
−0.1
0
Criterion value
−0.5 −0.15
ψ (ξ)
D (ξ)
α
−1 −0.2
−1.5
−0.25
−2
−0.3
−2.5
−3 −0.35
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.05 0.1 0.15 0.2 0.25
designs.
P
or simply ri¼1 wi ¼ 1. In this case we apply the gradient the weight a, for which fa is reaching the minimum value, is
method to minimize the function fr described in the pre- chosen. Thus, minimization with respect to a consists in a
vious Section (see Sect. 3.1 for more details), with the sequential solution of unconstrained problems and then
penalty term r ¼ 10. See Fig. 4 for illustration. choosing the best of them. As a result, the computational time
In the constrained case, we were not able to construct is increased but the precision of approximation is better.
reasonable approximations of wðnÞ. Perhaps, this happened It can be seen from the figures that for each optimal
because of the constraints imposed on uðhÞ. For the optimal scaling factor (Table 1), the two criteria wðnÞ and Da ðnÞ are
uðhÞ which satisfies the constraints and minimizes the very close to each other.
function fr , the relative distance (the value of fr at the
optimum uðhÞ) is not small enough to provide a good fitting 4.4 Weighted and constrained case
of the two criteria. This problem can be fixed by intro-
ducing a scaling factor, a, as described in Sect. 3.2. This Let us now illustrate the weighted and constrained case, see
approach is implemented in the upcoming sections. Figs. 7 and 8. Unlike in the simple constrained case, we
were able to construct reasonable approximations when
some weight was added to the objective function. Thus, our
4.3 Weighted and unconstrained case
experiments showed that scaling factor a ¼ 0 is not always
optimal. Table 1 summarizes the optimal values of scaling
In this section we present the results of the minimization of
factors for the different cases that we have considered.
a function fa with a scaling factor a, see Sect. 3.2 for the
Thus, introduction of the weight allowed us to approximate
explanation. First, we shall illustrate the weighted and
wðnÞ even in the case when some constraints on uðhÞ were
unconstrained case. It corresponds to the case described in
imposed.
the Sect. 3.2 when the constraints on uðhÞ are omitted. See
The value of the minimized function at the optimum can
Figs. 5 and 6 for the results.
be treated as a relative distance between the two criteria,
As in the unconstrained case we were able to obtain
see Table 2. Smaller this distance is better the
reasonable approximations of wðnÞ. Moreover, in this case
approximation.
minimization was also done with respect to the scaling
factor a, thus we have slightly better results than in the
4.5 Degenerated case
simple case. This is due to the more complex computations
as well as an additional minimization with respect to the
weight a. Of course, this minimization requires some extra In all the previous examples the partition H ¼
calculations. For each fixed a the corresponding uncon- ðhð1Þ ; . . .; hðrÞ Þ was taking as a random vector. However,
strained problem is solved. From this point of view, the first even if we take a partition H in a very simple form, say
example in the Sect. 4.1 can be regarded as a unconstrained H ¼ ð0; . . .; 0; 1Þ, we still can obtain reasonable approxi-
problem that corresponds to the scaling factor a ¼ 0. Then, mation of the criterion wðnÞ ¼ IMSPEðxjnÞ. One of the
123
3.5 0.4
ψ (ξ)
3 0.3 Dα (ξ)
2.5 0.2
Criterion value
2 0.1
1.5 0
1 −0.1
0.5 −0.2
0 −0.3
−0.5 −0.4
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
designs.
2 −0.1
−0.15
1
−0.2
0
Criterion value
−0.25
ψ (ξ)
−1 Dα (ξ)
−0.3
−2
−0.35
−3
−0.4
−4 −0.45
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.05 0.1 0.15 0.2 0.25
designs.
possible explanations for this phenomenon is the relative the control of the experimenter and it can only be observed
simplicity of the chosen criterion IMSPE. Perhaps, when after the experiment is realized, are considered. There is
using a more sophisticated criterion, say MSPE, a simple h another variable, x, that has to be designed. The condi-
will not be reliable. This aspect is left for further tionally restricted (CR) approach [López–Fidalgo and
investigation. Garcet–Rodrı́guez (2004), Martı́n-Martı́n et al. (2007)]
considers a joint design of the two variables, ðx; zÞ taking
into account that the second can not be designed and is
5 Simultaneous equations and conditionally restricted going to be observed after the experiment. The restriction
designs is in fixing a conditional design for the first variable given
the second, say nzjx ðjxÞ. That is, what is expected to hap-
In this section an example with two different approaches to pen with the first variable if the experiment is this or that.
a problem with an explanatory variable, z, that is not under This design is assumed completely known and has to be
123
Table 1 Values of the optimal scaling factor a Eðyjx; zÞ ¼ g1 ðyjx; z; h1 Þ ¼ a þ bx þ cz

Designs n Unconstrained Constrained
for both approaches. The unknown variable z is designed
2-Point designs 0.6612 1.1438 (CR) / modeled (SE) by the probability of z ¼ 1 given a
3-Point designs 1.6488 1.4385 value x, say rx . In this example the value of the probability
4-Point designs 1.5914 0.9933 of z ¼ 1 is assumed known for x ¼ 0, r0 ¼ 0. This means
5-Point designs 0.8209 0.3862 no action and so no reaction at all. Meanwhile the proba-
6-Point designs -0.1064 -0.3097 bility of z ¼ 1 for x ¼ 1, r1 , is to be estimated through the
experimentation. Thus, the vector of parameters is hT ¼
ðhT1 ; h2 Þ with hT1 ¼ ða; b; cÞ 2 R3 and h2 ¼ r1 2 ½0; 1. For
Table 2 Distances between the two criteria simplicity, the variance will be assumed known, r2y ¼ 1.
Designs n No constraints Weight Constraints and A general design for v is a two–point design,
weight
0 1
2-Point 3.7700 9 10-2 1.5450 9 10-4 2.03300 9 10-2 n¼ ; 0 p 1:
designs
1p p
3-Point 1.2204 9 10-5 1.2750 9 10-8 2.1008 9 10-7 Since the model is not linear the FIM depends on
designs
the parameter h2 ¼ r1 . Computing the FIMs in the usual
4-Point 1.3000 9 10-3 1.3660 9 10-6 1.8261 9 10-4
way
designs
5-Point 2.0676 9 10-4 4.7275 9 10-6 6.1369 9 10-5
0 1
1 p pr1 0
designs B C
6.8409 9 10-6 3.5032 9 10-6 1.5682 9 10-5
B p p pr1 0 C
6-Point B C
designs MSE ðnÞ ¼ B pr1 pr1 pr1 0 C
B C
@ p A
0 0 0
r1 ð1 r1 Þ
guessed from the experience. Under this approach an
MCR ðnÞ 0
information matrix, MCR , is obtained. From the simulta- ¼ ;
0 MII ðnÞ
neous equations models (SE) point of view [see e.g. Con-
lisk (1979)] the uncontrolled variable is modeled through a where the subscripts of the matrices stand for the two
new probability distribution, which has to be fitted as well. different approaches considered.
In this approach one explanatory variable (exogenous) of Now we use the operators given in the Introduction to
the main equation, z, is then the response variable compute compound designs following the procedure given
(endogenous) of a second equation. In both equations there in this paper. For /1 the determinants cancel and a criterion
is a controllable variable, x. Thus, another FIM, MSE , is defined on the determinant of the second diagonal matrix
obtained. Casero-Alonso and López-Fidalgo (2014) pro- jMII j is obtained. The function ua;b ðr1 Þ will be considered
vided details about all this. In this case the correlation as a prior Beta distribution with parameters a and b. We
structure is not defined for the different observations but will consider the following kernel, based on the
for the two potential response variables in the model. efficiencies,
In particular, the response y is modeled through a
jMCR ðhÞj 1=3 jMII ðhÞj
probability distribution with a pdf depending on a variable /3 ðh; MCR ðhÞ; MII ðhÞÞ ¼ r1 ðhÞj þð1 r1 Þ ;
jMCR jMII ðhÞj
z (only known once the experiment is realized) and the
designable variable x, gðyjx; z; h1 Þ, with mean g1 ðx; z; h1 Þ
where MCR ðhÞ and MII ðhÞ are the information matrices
and variance r2y . Now two ways of dealing with z are
maximizing the corresponding determinants. In particular,
1. CR There is prior information about the unknown the maximum of
values of z given x through a completely known
jMCR ðhÞj ¼ ð1 pÞp2 r1 ð1 r1 Þ
conditional probability design, n~zjx ðjxÞ.

2. SE A second model for z through a marginal pdf is jMCR 4
ðhÞj ¼ 27 r1 ð1 r1 Þ at pCR ¼ 2=3 and the maximum
hðzjx; h2 Þ. In this case there are two sets of parameters, of
h ¼ ðhT1 ; hT2 ÞT . p
jMII ðhÞj ¼
Let the design space be v ¼ f0; 1g, let Z ¼ f0; 1g be the r1 ð1 r1 Þ
space of possible outcomes of z, and assume a normal
is jMII ðhÞj ¼ r1 ð1r
1
1Þ
at pII ¼ 1. Then
distribution for the response with
123
1.2 0.9
1
0.8
0.8
0.7
0.6
Criterion value
0.6
0.4
ψ (ξ)
0.2 0.5 Dα (ξ)
0
0.4
−0.2
0.3
−0.4
0.2
−0.6
−0.8 0.1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
designs.
1=3
27p2 ð1 pÞ The efficiencies of the integrated compound design are
/3 ðh; MCR ðhÞ; MII ðhÞÞ ¼ r1 þð1 r1 Þp
4 quite robust for both matrices.
and 5.1 A methane case study
Z1 Rodrı́guez-Dı́az et al. (2012) considered a case study on

LuðpÞ ¼ /3 ðr1 ; MCR ðr1 Þ; MII ðr1 ÞÞua;b ðr1 Þdr1 methane (CH4 ) removal in the atmosphere, a very impor-
0 tant issue at this moment. The reaction of methane with the
3
bp þ 22=3 a½p2 ð1 pÞ1=3 hydroxyl radical can be expressed in the Modified–Arrhe-
¼ : nius (MA) form as
aþb
1 b
It will be assumed that the probability of z ¼ 1 (e.g. kðtÞ ¼ a m exp ;
t t
desaturation in blood) in the experimental group (e.g. doing
an exercise, x ¼ 1) is 0.4 on average with a variance of with covariance structure cov ðti ; tj ; rÞ ¼ r2 expðdrÞ ¼
0.04. This means that a ¼ 2 and b ¼ 3. Then the value of p r2 expðjti tj jrÞ; r [ 0. Fitting the parameters of this
p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffi model resulted in a0 ¼ 2:80 1014 , m0 ¼ 0:667 and
maximizing LuðpÞ is p ¼ 13 2 p
3
3
ffiffiffiffiffiffiffiffiffi
1
pffiffi þ 1 þ 2 ¼
1þ 2 b0 ¼ 1575 for temperatures between 295 and 660 K. For
0:8654. Efficiencies are the calculations the Generalized Exponential model, which
is equivalent to the MA model through the change of variable
2=3
r1 ð1r1 Þ ti ¼ 1=xi , is considered. Let us assume m known and con-
Eff II ðpCR Þ ¼ ¼ 2=3 ¼ 0:6667
1
r1 ð1r1 Þ
sider two point-designs of the form ðx; x þ dÞ. The FIM for
0:8654 both parameters of interest h ¼ ðh1 ; rÞ ¼ ða; b; rÞ is
Eff II ðp Þ ¼
r1 ð1r1 Þ
¼ 0:8654
1 Mh1 ðhÞ 0
r1 ð1r1 Þ MðhÞ ¼ :

0 Mr ðhÞ
Eff CR ðpII Þ ¼0
!1=3 As in the previous Sect. 5 the kernel /3 is considered
ð1 0:8654Þ0:86542 r1 ð1 r1 Þ !1=2
Eff CR ðp Þ¼ 4
27 r1 ð1 r1 Þ jMh1 ðhÞj jMr ðhÞj
/3 ðh; Mh1 ðhÞ; Mr ðhÞÞ ¼ r þð1 rÞ :
¼ 0:879551 jMh1 ðhÞj jMr ðhÞj
123
2 −0.3
1 −0.35
0 −0.4
Criterion value
−1 −0.45
ψ (ξ)
D (ξ)
α
−2 −0.5
−3 −0.55
−4 −0.6
−5 −0.65
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.05 0.1 0.15 0.2 0.25
designs.
2 0
1.5
−0.05
1
−0.1
0.5
Criterion value
0 −0.15
ψ (ξ)
Dα (ξ)
−0.5 −0.2
−1
−0.25
−1.5
−0.3
−2
−2.5 −0.35
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.05 0.1 0.15 0.2 0.25
designs.
The maximum of is reached at dr ¼ 0. Then

!1=2
a2 d 2 exp½2dr 2bð2x þ dÞx2m ðx þ dÞ2m jMh1 ðhÞj jMr ðhÞj
jMðh1 Þj ¼ /3 ðh; Mh1 ðhÞ; Mr ðhÞÞ ¼ r þð1 rÞ :
expð2drÞ 1 jMh1 ðhÞj jMr ðhÞj
for the nominal values of the parameters is reached at the
maximum distance dh1 ¼ 1=660 1=295 ¼ 0:00187468: For the function uðrÞ any distribution with r [ 0 would be
On the other hand, the maximum of considered. For the calculations the distributions Weibull
and Gamma for different values of their parameters are
d2 ½1 þ expð2drÞ considered. The same results are obtained. The value of d
jMðrÞj ¼
½expð2drÞ 12 maximizing
123
Z1 by information aggregation, given by a simple additive

LuðdÞ ¼ /3 ðr; Mh1 ðrÞ; Mr ðrÞua;b ðrÞdr; Lebesgue integral.
0
In Sects. 2 and 3 the operator approximation for learning
about the relationship between IMSPE and the compound
is the maximum distance d ¼ 0:001875. Then the IMSPE criterion Da was given. First we introduced the integral
optimal design matches the D-optimal design for the h1 relationship between IMSPE and the compounding kernel.
parameters. This design has a high efficiency, near 1, with We provide numerical implementation and construct the
respect to the nominal value of r since d is rather small. solution u of a given Fredholm integral equation of the first
kind. Several numerical examples illustrates the methods
provided.
6 Discussion In general, it is a difficult task to make a link between
two optimality criteria: one based on prediction, another
Integral compounding of two matrices can help to construct based on parameter estimation, especially in the case of
designs that are similarly informative for two different correlated processes. In the uncorrelated case it was done
models. In Sect. 5 a binary design space is considered and a by Kiefer and Wolfowitz (1960). The correlated case,
procedure is proposed in order to find compound designs in however, is not fully investigated. The first attempt to
an easy and efficient way. The method is based on the relate two criteria was done in Müller and Pronzato (2009).
minimization of a function /, which measures discrepancy In this paper we have concentrated on the correlated case
between information matrices of the two models. Different and we have reformulated the conjecture of equivalence in
functions are considered. The comparison of rival models an integral form.
does not necessarily mean that they share any of the Initially, we reformulated the problem of equivalence as
parameters. One example of this is the Ornstein–Uhlenbeck a quadratic optimization problem. Then we solved this
process for which the trend and the correlation structure problem by using new gradient methods introduced in
have different parameters. Amo-Salas et al. (2013) studied Zhigljavsky et al. (2013) for quadratic optimization. These
the case when common parameters exists, in particular for methods are advantageous for several reasons: they are
covariance structures proportional to the trend. They much faster than standard gradient methods, have small
showed this is not a standard case and they gave conditions computational cost, they are easily parallelized and
and tools to model in practice the covariance in such a way. monotonic at almost every iteration. We tried the proposed
In the case of two rival models, it might happen that approach on several test problems and we presented the
parameters are the same. However we have not studied obtained results in the paper. In terms of required com-
such a case in the present paper and this is a valuable putational effort it is less expensive to generate optimal
further research direction. A common weighting is used to designs for parameter estimation rather than optimal
regularize the discrepancy through a density defined on designs for prediction. Some particular structure of the
parameter p. Such a compounding in the limit case may relationship between two criteria would allow to substitute
cover classical compound information criteria. The process a costly computation by a much less costly one.
is illustrated with relevant examples. Assuming two dif- A thorough study of other regularizations will be of
ferent approaches to deal with an explanatory variable that interest. As we have seen in Sect. 4 we need regularization
is only known after the experiment is performed. The of Lu ¼ w; as it is usual for Fredholm equations of the first
designs obtained in both examples have high efficiencies. kind. We have regularized it by jjujj ¼ 1; u [ 0: This is a
In this paper we have considered the case of simple two natural form of regularization, since we have u in the form
point design measure. This can be generalized for arbitrary of density. Regularization by maximum entropy (Amato
n-point designs using a Dirichlet density for the design and Hughes (1991)) and regularization by reproducing
weights, however further research will be needed to fulfill kernel Hilbert space norm, (Wahba (1977)) would be worth
this aims. further investigation.
The procedure was applied to a real example modeling a Table 1 shows the optimal values for the scaling
flux of methane in troposphere. The Kyoto Protocol aimed parameter. A sensitivity analysis with respect to this
to reduce emissions of methane. The fall in human-induced parameter would be of much interest, but this is a rather
methane emissions in the 1990s was only transitory Le- difficult issue. One problem is that a theoretical proof of
lieveld (2006). We should be aware of deterministic, sto- weak/strong solvability is difficult, since we cope with
chastic and chaotic parts of physical and biological systems Fredholm integral equation of the 1st kind with asymmetric
and we need robust designs for measurement of such a kernel. We would prefer to make a thorough theoretical
complex environmental systems. This paper aims to pro- study of the problem before a sensitivity study, since too
vide a new way for construction of such designs, motivated many imprecisions are involved (e.g. calibration, chaos in
123
numerics or stochastics of data). In this paper the value of 1 o2 log jRðhÞj 2d2 expð2dhÞ
‘‘a’’ is the output of the fminbnd function for the particular 2
¼
2 oh ð1 expð2dhÞÞ2
settings and its calibration is not completely stochastic,
since chaotical aspects of optimization are also present. and
Criteria for discrimination between rival models, e.g. T- !
or KL-optimality [Atkinson and Fedorov (1975), Atkinson 1 o2 RðhÞ1 d2 expð2dhÞðexpð2dhÞ þ 3Þ
E vT v ¼
and Fedorov (1975), López–Fidalgo et al. (2007)] may be 2 oh2 ð1 expð2dhÞÞ2
included under this approach with some generalizations not
and finally
considered in this paper. Assuming two rival models with
pdf’s f1 ðy; x; hÞ and f2 ðy; x; hÞ the kernel will be now a d2 expð2hdÞð1 þ expð2hdÞÞ
Mh ¼ :
function which depend on the FIM’s of each model only ð1 expð2hdÞÞ2
intrinsically,
1
Z Note that for every h [ 0 the maximum 2h2
is attained for
f1 ðy; x; hÞ
/ðh; M1 ðhÞ; M2 ðhÞÞ ¼ f1 ðy; x; hÞ log dy; d ¼ 0.
f2 ðy; x; hÞ
Proposition 2 Let us have
and the function u vanished everywhere except for
h ¼ arg maxh2H /ðh; M1 ðhÞ; M2 ðhÞÞ, where uðh Þ ¼ 1. Yðxi Þ ¼ gðxi ; #Þ þ eðxi Þ; i ¼ 1; 2; x1 ; x2 2 ½0; 1
Here this approach does not add much, but theoretically it
may be considered as a particular case.
cov ðx1 ; x2 Þ ¼ 1 hjx1 x2 j;
Acknowledgments This work was partially done while V. Casero-
Alonso visited the Institute of Statistics of Johannes Kepler University. and only covariance parameter h is parameter of interest.
He wants to thank their hospitality and ideas. This work has been For regularity assumption we suppose that hd\2; h [ 0:
supported by Ministerio de Educación y Ciencia and Fondos FEDER Then the maximal FIM is obtained for maximal d:
MTM2010-20774-C03-01 and Junta de Comunidades de Castilla la
Mancha PEII10-0291-1850 and Amadee, Project Nr. FR 11/2010. The Proof We have jRðhÞj ¼ hdð2 hdÞ;
work of E. Bukina was partially supported by the EU through a Marie-
Curie Fellowship (EST-SIGNAL program: http://est-signal.i3s.unice. 1 o2 log jRðhÞj 2hd þ h2 d 2 þ 2
fr) under the contract Nb. MEST-CT-2005-021175. Milan Stehlı́k was 2
¼
2 oh h2 ðhd 2Þ2
supported by ANR project Desire FWF I 833-N18. The authors are
thankful for helpful comments of Werner G. Müller, Luc Pronzato and and
Joao Rendas. We also thank the editor and reviewers, whose insightful
!
comments helped us to sharpen the paper considerably.
1 o2 RðhÞ1 2hd þ h2 d2 þ 2
E vT v ¼2
2 oh2 h2 ðhd 2Þ2
Appendix: Proofs and technicalities and finally
Proposition 1 Let us have 2hd þ h2 d2 þ 2

Mh ¼ :
Yðxi Þ ¼ gðxi ; #Þ þ eðxi Þ; i ¼ 1; 2; x1 ; x2 2 ½0; 1; h2 ðhd 2Þ2
cðdÞ ¼ 1 expðhdÞ; We have
where c stands for the semi-variogram and only the oMh 2d
covariance parameter h is of interest. Then the maximal ¼
od ð2 hdÞ3
FIM is obtained for d ¼ 0:
So Mh is increasing function for every (acceptable)
Proof We have the log-likelihood function L¼ 0\d\ minf2h ; 1g:
K 12 log jRðhÞj 12 vT RðhÞ1 v; where v ¼ ðYðx1 Þ
T Proposition 3 By direct integration we obtain the
gðx1 ; #Þ; Yðx2 Þ gðx2 ; #ÞÞ : The FIM for the covariance
following:
parameter h is
2 !
oL 1 o2 log jRðhÞj 1 2 1 Z1
T o RðhÞ
Mh ¼ E 2 ¼ þ E v v ;
Lu ¼ uðMexp ðhÞ; Mlin ðhÞ; hÞu ðhÞdh
oh 2 oh2 2 oh2
0
2
o fA g o A2 ¼ A=B;
and we have oh2i;j f oh2i;j g and jRðhÞj ¼ 1 expð2hdÞ:
Further we have where
123
A ¼ 6 expð2hdÞh2 d2 þ 4 expð2hdÞhd h4 d 4 expð2hdÞ Hofmann G (2004) Comparing the Fisher information in record data
and random observations. Stat Pap 45(4):517–528
þ 4h3 d 3 expð2hdÞ 2hd expð4hdÞ þ h2 d 2 expð4hdÞ Kiefer J, Wolfowitz J (1960) The equivalence of two extremum
problems. Can J Math 12:363–366
þ 2 þ 2 expð4hdÞ 4 expð2hdÞ h4 d4 þ 4h3 d3 Kiseľák J, Stehlı́k M (2008) Equidistant D-optimal designs for
3h2 d 2 2hd; parameters of Ornstein–Uhlenbeck process. Stat Probab Lett
78:1388–1396
B ¼ð logðð2hd þ h2 d2 þ 2Þ=h2 =ðh2 d2 4hd þ 4ÞÞh2 d2 Lelieveld J (2006) A nasty surprise in the greenhouse. Nature
443:405–406
þ 4 logðð2hd þ h2 d2 þ 2Þ=h2 =ðh2 d 2 4hd þ 4ÞÞhd López-Fidalgo J, Garcet-Rodrı́guez S (2004) Optimal experimental
4 logðð2hd þ h2 d2 þ 2Þ=h2 =ðh2 d 2 4hd þ 4ÞÞ designs when some independent variables are not subject to
control. J Am Stat Assoc 99:1190–1199
þ logðd2 ðexpð2hdÞ þ 1Þ=ð expð4hdÞ þ 2 expð2hdÞ López-Fidalgo J, Tommasi C, Trandafir PC (2007) An optimal
experimental design criterion for discriminating between non-
1ÞÞh2 d2 4 logðd 2 ðexpð2hdÞ þ 1Þ=ð expð4hdÞ normal models. J R Stat Soc B 69(2):231–242
þ 2 expð2hdÞ 1ÞÞhd þ 4 logðd2 ðexpð2hdÞ þ 1Þ= Martı́n-Martı́n R, Torsney B, López-Fidalgo J (2007) Construction of
marginally and conditionally restricted designs using multipli-
ð expð4hdÞ þ 2 expð2hdÞ 1ÞÞÞ: cative algorithms. Comput Stat Data Anal 51:5547–5561
McGree JM, Eccleston JA, Duffull SB (1988) Compound optimal
design criteria for nonlinear models. J Biopharm Stat
18(4):646–661
Müller WG, Pronzato L (2009) Towards an optimal design equiva-
lence theorem for random fields? IFAS report Nr. 45 of the
References Department for Applied Statistics of the Johannes Kepler
University in Linz
Ahmadi J, Arghami NR (2003) Comparing the Fisher information in Müller WG, Stehlı́k M (2009) Issues in the optimal design of
record values and iid observations. Stat A J Theor Appl Stat computer simulation experiments. Appl Stoch Models Bus Ind
37(5):435–441 25:163–177
Alshunnar FS, Raqab MZ, Kundu D (2012) On the comparison of the Müller WG, Stehlı́k M (2010) Compound optimal spatial designs.
Fisher information of the log-normal and generalized Rayleigh Environmetrics 21:354–364
distributions. J Appl Stat 37(3):391–404 Pázman A (2010) Information contained in design points of
Amato U, Hughes W (1991) Maximum entropy regularization of experiments with correlated observations. Kybernetika
Fredholm integral equations of the first kind. Inverse Probl 46(4):771–783
7:793–808 Rodrı́guez-Dı́az JM, Santos-Martı́n MT, Waldl H, Stehlı́k M (2012)
Amo-Salas M, López-Fidalgo J, Porcu E (2013) Optimal designs for Filling and D-optimal designs for the correlated generalized
some stochastic processes whose covariance is a function of the exponential models. Chemometr Intell Lab Syst 114:10–18
mean. Test 22:159–181 Sacks J, Schiller SB, Welch WJ (1989) Design for computer
Atkinson AC, Fedorov VV (1975) The designs of experiments for experiments. Technometrics 31(1):41–47
discriminating between two rival models. Biometrika 62:57–70 Tandeo P, Ailliot P, Autret E (2011) Linear Gaussian state-space
Atkinson AC, Fedorov VV (1975) Optimal design: experiments for model with irregular sampling: application to sea surface
discriminating between several models. Biometrika 62:289–303 temperature. Stoch Environ Res Risk Assess 25:793–804
Baldi Antognini A, Zagoraiou M (2010) Exact optimal designs for Tikhonov AN, Arsenin VY (1977) Solutions of ill-pared problems.
computer experiments via Kriging metamodelling. J Stat Plan Wiley, New York
Inference 140:2607–2617 Unami K, Abagale FK, Yangyuoru M, Alam AHMB, Kranjac-
Barzilai J, Borwein JM (1988) Two-point step size gradient methods. Berisavljevic G (2010) A stochastic differential equation model
IMA J Numer Anal 8:141–148 for assessing drought and flood risks. Stoch Environ Res Risk
Casero-Alonso V, López-Fidalgo J (2014) Experimental designs in Assess 24:725–733
triangular simultaneous equations models. Stat Pap (in press) Wahba G (1977) Practical approximate solutions to linear operator
Conlisk J (1979) Design for simultaneous equations. J Econom equations when the data are noisy. SIAM J Numer Anal
11(1):63–76 14:651–667
Cook RD, Wong WK (1994) On the equivalence of contrained and Zhigljavsky AA, Pronzato L, Bukina E (2013) An asymptotically
compound optimal designs. J Am Stat Assoc 89(426):687–692 optimal gradient algorithm for quadratic optimization with low
Crary SB (2002) Design of computer experiments for metamodel computational cost. Optim Lett 7(6):1047–1059
generation. Analog Integr Circuits Signal Process 32:7–16
Hansen PC (1992) Numerical tools for analysis and solution of
Fredholm integral equations of the first kind. Inverse Probl
8:849–872
123

2014 Serra

Uploaded by

Copyright:

Available Formats

2014 Serra

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

2014 Serra

Uploaded by

Copyright:

Available Formats

Robust integral compounding criteria for

trend and correlation structures

M. Stehlík, J. López-Fidalgo, V. Casero-

Stochastic Environmental Research

Stoch Environ Res Risk Assess

Robust integral compounding criteria for trend and correlation

Ó Springer-Verlag Berlin Heidelberg 2014

bd takes to optimize the simple average of the kernel on the

will provide a suitable robust optimal design. Notice, that equivalent.

of sub-vector hi of h, i ¼ 1; 2. After finding an appropriate 1.3 Motivating example: misspecification of linear

Fig. 1 Information gain for two

criterion of the form mA1 log jM1 ðhÞj þ 1A M2 ðhÞÞuðhÞ dh ¼ 0:

inversion operator is not in general continuous so that X

(typically smoothness) and k is a regularization parameter 2wT wðnj Þ/ðnj jHÞ;

To construct the matrix A we sum up to q (number of the 2 32

R1 Gradient Descent), nowadays regarded as an algorithm

For this criterion the resulting matrix A is usually very ill-

Table 1 Values of the optimal scaling factor a Eðyjx; zÞ ¼ g1 ðyjx; z; h1 Þ ¼ a þ bx þ cz

and 5.1 A methane case study

Z1 Rodrı́guez-Dı́az et al. (2012) considered a case study on

The maximum of is reached at dr ¼ 0. Then

Z1 by information aggregation, given by a simple additive

Proposition 1 Let us have 2hd þ h2 d2 þ 2

You might also like

The maximum of is reached at dr ¼ 0. Then