Open Research Online
The Open University’s repository of research publications
and other research outputs
Elicitation of Subjective Probability Distributions
Thesis
How to cite:
Elfadaly, Fadlalla Ghaly Hassan Mohamed (2012). Elicitation of Subjective Probability Distributions. PhD thesis, The Open University.
© 2012 The Author
https://creativecommons.org/licenses/by-nc-nd/4.0/
Version: Version of Record
Link(s) to article on publisher’s website:
http://dx.doi.org/doi:10.21954/ou.ro.0000f119
Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright
owners. For more information on Open Research Online’s data policy on reuse of materials please consult the policies
page.
oro.open.ac.uk
Elicitation of Subjective Probability
Distributions
By
Fadlalla Ghaly Hassan Mohamed Elfadaly
BSc., Cairo University. MSc., Cairo University.
A thesis submitted for the Degree of Doctor of Philosophy in Statistics
Department of Mathematics and Statistics
The Open University, UK
April 2012
Date of Submission: 23 April 2012
Acknowledgements
Faithful gratitude, sincere thanks, and appreciation are due to Prof. Paul Garthwaite, The
Open University, UK, for suggesting the research topic, his supervision, guidance, valuable
advice, encouragement, kindness, deep interest and continuous help during the preparation
of this thesis.
I would also like to thank my co-supervisor Dr. Robin Laney, The Open
University, UK, for his advice, direction and continuous willingness to help. I would like
to express my deepest gratitude to my viva examination panel, Prof. Jim Smith, Warwick
University, UK, Prof. Kevin McConway and Dr. Karen Vines, The Open University, UK, for
their valuable comments, constructive criticism and helpful suggestions.
Also, many thanks are due to the experts whose opinions were quantified in the examples
of this thesis. I am very grateful to Dr. Neville Calleja, Ministry of Health, the Elderly and
Community Care, Malta, for quantifying his opinion in the obesity misclassification example,
and to Dr. Stephen Burnley and Dr. James Warren, The Open University, UK, for quantifying
their opinions in the waste collection and transport preferences examples, respectively.
I wish to thank all members of the Statistics Group, The Open University, UK. They
all helped me a lot in a very cooperative and supportive research environment that leads
to continuous progress and achievement. Special gratitude to the previous PhD students,
Dr. Yoseph Araya, Dr. David Jenkinson, Dr. Swarup De, Dr. Youssef Elaziz, Dr. Steffen
Unkel, Dr. Angela Noufaily and Dr. Doyo Gragn and also to the current PhD students,
Mr. Osvaldo Anacleto-Junior, Mr. Yonas Weldeselassie, Mr. Alexandre Santos and Miss Sofia
Villers. They formed a great academic and social atmosphere for effective work.
True gratitude and deep appreciation are due to Prof. Abdel-Hamid Nigm, Cairo University,
Egypt, for suggesting, encouraging, and making fruitful efforts to help me undertake my
PhD in the UK. I am also very grateful to Prof. Sanaa El Gayar, Cairo University, Egypt,
for her faithful guidance and support during my studies for BSc and MSc degrees. She really
guided my first steps in an academic career. I am highly indebted to Dr. Osama Saleh, Cairo
University, Egypt, for being such a sincere, supportive and helpful friend.
Heartfelt and earnest thanks are due to the soul of my late father Mr. Ghaly Elfadaly, my
caring mother Mrs. Aziza Belal and my two kind sisters Dr. Hanan and Mrs. Fadila Elfadaly,
for their faithful wishes and prayers. I am truly and heartily grateful to my beloved wife,
Mrs. Nehal Marghany, for her steady love, care and support, and to our son, Master Malek
Elfadaly, who has lit up our life with cheer, happiness and innocence.
Abstract
To incorporate expert opinion into a Bayesian analysis, it must be quantified as a prior distribution
through an elicitation process that asks the expert meaningful questions whose answers determine
this distribution. The aim of this thesis is to fill some gaps in the available techniques for eliciting
prior distributions for Generalized Linear Models (GLMs) and multinomial models.
A general method for quantifying opinion about GLMs was developed in Garthwaite and Al-Awadhi
(2006). They model the relationship between each continuous predictor and the dependent
variable as a piecewise-linear function with a regression coefficient at each of its dividing points.
However, coefficients were assumed a priori independent if associated with different predictors. We relax
this simplifying assumption and propose three new methods for eliciting positive-definite
variance-covariance matrices of a multivariate normal prior distribution. In addition, we extend the method of
Garthwaite and Dickey (1988) for eliciting an inverse chi-squared conjugate prior for the error variance
in normal linear models. We also propose a novel method for eliciting a lognormal prior distribution
for the scale parameter of a gamma GLM.
For multinomial models, novel methods are proposed that quantify expert opinion about a conjugate
Dirichlet distribution and, additionally, about three more general and flexible prior distributions.
First, an elicitation method is proposed for the generalized Dirichlet distribution that was introduced
by Connor and Mosimann (1969). Second, a method is developed for eliciting the Gaussian copula as
a multivariate distribution with marginal beta priors. Third, a further novel method is constructed
that quantifies expert opinion about the most flexible alternate prior, the logistic normal distribution
(Aitchison, 1986). This third method is extended to the case of multinomial models with explanatory
covariates.
All proposed methods in this thesis are designed to be used with interactive Prior Elicitation
Graphical Software (PEGS) that is freely available at http://statistics.open.ac.uk/elicitation.
Contents
1 Introduction
2 Literature review
  2.1 Introduction
  2.2 Psychological aspects in eliciting opinion
  2.3 Prior elicitation for normal linear models
  2.4 Prior elicitation for GLMs
  2.5 Prior elicitation for multinomial models
  2.6 Other general graphical elicitation software
  2.7 Concluding comments
3 The piecewise-linear model for prior elicitation in GLMs
  3.1 Introduction
  3.2 The elicitation method for piecewise-linear models (GA method)
    3.2.1 The piecewise-linear model
    3.2.2 Eliciting the hyperparameters of the multivariate normal prior
    3.2.3 Computing values for the suggested assessments
  3.3 Assessment tasks and software description
    3.3.1 Defining the model
    3.3.2 Defining the response variable and covariates
    3.3.3 Initial medians assessments
    3.3.4 The feedback stage
    3.3.5 Conditional medians assessments
    3.3.6 Conditional quartiles assessments
  3.4 Concluding comments
4 Eliciting a covariance matrix for dependant coefficients in GLMs
  4.1 Introduction
  4.2 A proposed method for eliciting the variance-covariance matrix of a pair of correlated vectors of coefficients
    4.2.1 Notations and theoretical framework
    4.2.2 Assessment tasks and software description
    4.2.3 On the positive-definiteness of the elicited covariance matrix
  4.3 Another elicitation method for the variance-covariance matrix of correlated coefficients
    4.3.1 The case of two vectors of correlated coefficients
    4.3.2 The case of various vectors of correlated coefficients
    4.3.3 Assessment tasks
  4.4 A general flexible elicitation method for correlated coefficients
  4.5 Concluding comments
5 Eliciting prior distributions for extra parameters of some GLMs
  5.1 Introduction
  5.2 Eliciting a prior distribution for the error variance in normal GLMs
    5.2.1 The mathematical framework and notations
    5.2.2 Implementation and assessment tasks
  5.3 Eliciting a prior distribution for the scale parameter in gamma GLMs
    5.3.1 GLMs with a gamma distributed response variable
    5.3.2 Assessment tasks
  5.4 Concluding comments
6 Eliciting Dirichlet priors for multinomial models
  6.1 Introduction
  6.2 Eliciting beta parameters using quartiles
    6.2.1 Introduction
    6.2.2 Normal approximations for beta elicitation
    6.2.3 Least-squares optimizations for beta parameters
  6.3 Eliciting a Dirichlet prior for a multinomial model
    6.3.1 Introduction
    6.3.2 The multinomial and Dirichlet distributions
    6.3.3 The marginal approach
    6.3.4 The conditional approach
  6.4 Concluding comments
7 Eliciting more flexible priors for multinomial models
  7.1 Introduction
  7.2 Eliciting a generalized Dirichlet prior for a multinomial model
    7.2.1 Connor-Mosimann distribution
    7.2.2 Assessment tasks
    7.2.3 Marginal quartiles of the generalized Dirichlet distribution
  7.3 Example: Obesity misclassification
  7.4 Constructing a copula function for the prior distribution
    7.4.1 Gaussian copula function
    7.4.2 Assessment tasks
    7.4.3 Eliciting a positive-definite correlation matrix R
  7.5 Example: Waste collection
  7.6 Concluding comments
8 Eliciting logistic normal priors for multinomial models
  8.1 Introduction
  8.2 The additive logistic normal distribution
    8.2.1 Approximate distribution of the lognormal sum
  8.3 Assessment tasks
    8.3.1 Assessing initial medians
    8.3.2 Assessing conditional quartiles
    8.3.3 Assessing conditional medians
  8.4 Eliciting prior hyperparameters
    8.4.1 Eliciting a mean vector
    8.4.2 Eliciting a variance-covariance matrix
  8.5 Feedback using marginal quartiles of the logistic normal prior
  8.6 Example: Transport preferences
  8.7 Concluding comments
9 Eliciting multinomial models with covariates
  9.1 Introduction
  9.2 The base-line multinomial logit model
  9.3 Notation and theoretical framework
  9.4 Eliciting the mean vector
  9.5 Eliciting the variance matrix
    9.5.1 Eliciting the variance-covariance sub-matrices
    9.5.2 Assessing conditional quartiles
    9.5.3 Assessing conditional medians
    9.5.4 Eliciting the covariance matrix E ajjg
  9.6 Concluding comments
10 Concluding comments
List of Figures
3.1 A piecewise-linear relationship given by median assessments
3.2 A bar chart relationship for a factor given by median assessments
3.3 The dialogue box for defining the model
3.4 The feedback screen
3.5 Conditional median assessments for the continuous covariate “Weight”
3.6 Quartile assessments for a continuous covariate
3.7 Quartile assessments for a factor
3.8 Assessing quartiles conditioning on two fixed points
3.9 Assessing conditional quartiles for the last level of a factor
4.1 Assessments needed in the first phase for correlated covariates
4.2 Assessments needed in the second phase for correlated covariates
4.3 Assessments needed for two correlated variables
4.4 Assessments needed for five correlated variables
4.5 Assessments needed for various correlated variables
5.1 Three dimension plots of d {q^jqj)jdv against v and Cj for various sample sizes k(j)
5.2 Assessing a median value conditioning on a set of data
5.3 The output table showing the elicited hyperparameters
5.4 Changes in quartile values with the change of A at different mean values
5.5 The main software panel for assessing gamma parameter
6.1 Assessing probability quartiles of each category
6.2 A feedback screen showing 2 different quartile options
6.3 Assessing conditional quartiles for Dirichlet elicitation
6.4 Assessing conditional quartiles with scaled beta feedback
6.5 The feedback graph presenting marginal quartiles
7.1 Medians and quartiles assessments
7.2 Assessing conditional medians
7.3 Assessing conditional quartiles
7.4 Assessing conditional quartiles for copula elicitation
7.5 Assessing conditional medians for copula elicitation
7.6 Software suggestions for conditional medians
7.7 The initially assessed marginal medians and quartiles
7.8 The coherent assessments suggested by the software
7.9 Assessing conditional quartiles
7.10 Assessing conditional quartiles for the last two categories
7.11 Assessing conditional medians
8.1 Assessing probability medians for logistic normal elicitation
8.2 Assessing conditional quartiles with lognormal feedback
8.3 Assessing conditional medians for logistic normal elicitation
8.4 Software suggestions for initial medians
8.5 Assessing conditional quartiles
8.6 Revised conditional medians
8.7 Software suggestions for marginal medians and quartiles
9.1 Assessing probability medians at age = 40 years
9.2 Assessing conditional medians at age = 40 years
9.3 Assessing conditional medians given changes at the reference point
10.1 Options for assessing correlations between regression coefficients
10.2 A flowchart of the prior elicitation software for multinomial models
List of Tables
7.1 Probability assessments for different elicited priors
7.2 Expert’s assessments of medians and quartiles
7.3 Expert’s assessments of conditional quartiles
7.4 Expert’s assessments of conditional medians
7.5 The elicited hyperparameters of marginal beta distributions
7.6 The elicited covariance matrix of the Gaussian copula prior
7.7 Probability means and variances from marginal beta distributions
8.1 The elicited mean vector of a logistic normal prior
8.2 The elicited variance-covariance matrix of a logistic normal prior
Chapter 1
Introduction
In many situations there is a substantial amount of information that is only recorded in
the experience and knowledge of experts. To efficiently use this knowledge as an input to a
statistical analysis, the experts must be asked meaningful questions whose answers determine
a probability distribution. This process is referred to as elicitation and different forms of
probability model require different elicitation methods.
Bayesian statistics offers an approach in which data and expert opinion are combined
at the modelling stage, yielding probabilities that are a synthesis of the survey data and
the expert’s opinion. To incorporate expert opinion into a Bayesian analysis, it must be
quantified as a prior distribution. This should be accomplished through an elicitation process
that asks the expert to perform various assessment tasks. These tasks include questions that
the expert is able to comprehend and answer accurately according to her prior knowledge,
without needing to know about the mathematical and statistical coherence that is required in
her assessments.
The elicitation of prior beliefs has been studied extensively in the statistical, psychological,
decision and risk analysis literature. Elicitation techniques have been proposed for
many probabilistic models including both univariate and multivariate probability distributions.
However, achieving accurate elicitation is not an easy task, even for single events or
univariate distributions. The difficulty increases for multivariate distributions, in which many
constraints must be imposed on the expert’s assessments for them to be statistically coherent. Due to
this complexity, relatively little literature deals with elicitation techniques for multivariate
distributions. O’Hagan et al. (2006) argued that the lack of elicitation methods for multivariate
models and the lack of user-friendly elicitation software to implement them constitute
remarkable deficiencies in the existing elicitation research.
The aim of this thesis is to fill some gaps in the available techniques for eliciting prior
distributions for multivariate models. We are mainly interested in eliciting prior distributions
for the parameters of Generalized Linear Models (GLMs) and multinomial models. We
extend some of the available methods of prior elicitation for GLM parameters and propose
some novel methods for eliciting different prior distributions for the parameters of
multinomial models. All proposed methods in this thesis are designed to be used with
interactive graphical software that is written in Java and tailored to the specific requirements
of each method. These pieces of software are freely available as Prior Elicitation Graphical
Software (PEGS) at http://statistics.open.ac.uk/elicitation.
The elicitation methods for GLMs that are available in the literature focus mainly on
logistic regression. A more general elicitation method for quantifying opinion about a logistic
regression model was developed in Garthwaite and Al-Awadhi (2006). The method is very
general and flexible and can be generalized to GLMs with any link function. The same authors
proposed this generalization in an unpublished paper, Garthwaite and Al-Awadhi (2011). In
their method, the relationship between each continuous predictor and the dependent variable
is modelled as a piecewise-linear function and each of its dividing points is accompanied by a
regression coefficient. However, a simplifying assumption was made regarding independence
between these coefficients, in the sense that regression coefficients were a priori independent
if associated with different predictors. One of the main purposes of this thesis is to relax
the independence assumption between coefficients of different variables. Then the
variance-covariance matrix of the prior distribution is no longer block-diagonal. Different elicitation
methods for this more complex case are proposed and it is shown that the resulting
variance-covariance matrix is positive-definite. The method of Garthwaite and Al-Awadhi (2006)
was designed to be used with the aid of interactive graphical software. It has been used in
practical case studies to quantify the opinions of ecologists and medical doctors (Al-Awadhi
and Garthwaite, 2006; Garthwaite et al., 2008). The software is revised and extended
further in this thesis to handle the case of GLMs with correlated pairs of covariates.
Available methods of prior elicitation for GLMs all concentrate on the task of quantifying
opinion about regression coefficients. For some GLMs, such as logistic regression, this
determines the prior distribution completely. But with some other common GLMs, such as
the normal linear model and gamma GLMs, prior opinion about an extra parameter must
also be quantified in order to obtain a prior distribution for all model parameters. For this
reason, we extend the method of Garthwaite and Dickey (1988) for eliciting an inverse
chi-squared conjugate prior for the error variance in normal linear models. We also propose a
novel method for eliciting a lognormal prior distribution for the scale parameter of a gamma GLM.
The other multivariate model for which we develop original elicitation methods in this
thesis is the multinomial model. Multinomial models consist of items that belong to a number
of complementary and mutually exclusive categories. These models arise in many scientific
disciplines and industrial applications. Multinomial data are well described using the
multinomial distribution, say with parameter vector p. In Bayesian analysis of multinomial
models, an important assessment task is to elicit an informative joint prior distribution for
the multinomial probabilities p. It is well-known that the Dirichlet distribution is a conjugate
prior for the parameters of multinomial models. A limited number of attempts have
been made to introduce elicitation methods for Dirichlet parameters. However, the Dirichlet
distribution has been criticized as insufficiently flexible to represent prior information about
the parameters of multinomial models [e.g. Aitchison (1986), O’Hagan and Forster (2004)].
Its main drawback is that it has a limited number of parameters. A k-variate Dirichlet
distribution is specified by just k parameters that determine all means, variances and covariances.
Dirichlet variates are always negatively correlated, which may not represent prior belief.
Several authors have been interested in constructing new families of sampling distributions
to model proportions. Some of these distributions can be used as prior distributions for
the probabilities of multinomial models. See, for example, Forster and Skene (1994) and
Wong (1998). However, elicitation methods that give these more flexible families as prior
distributions for multinomial models have not been proposed. It is tricky, in the case of
multinomial models, to elicit assessments that satisfy all the necessary constraints. Some
of these constraints are obvious; the probabilities of each category must be non-negative
and sum to one, for example. Others are less obvious. For example, if there are only two
categories, the lower quartile for one category and the upper quartile of the other category
must add to one. As the number of categories increases, the constraints that must be satisfied
increase and become less intuitive.
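Two of the points made above can be checked numerically: the negative covariances of a Dirichlet distribution and the quartile constraint for two categories. The sketch below is only an illustration and is not one of the elicitation methods of this thesis. The parameter values are arbitrary assumptions, the function names are hypothetical, and the covariance expression is the standard result for a Dirichlet distribution.

```python
import random

def dirichlet_covariance(a, i, j):
    """Covariance of p_i and p_j (i != j) under a Dirichlet(a) distribution.

    Standard result: Cov(p_i, p_j) = -a_i * a_j / (A**2 * (A + 1)), A = sum(a).
    """
    total = sum(a)
    return -a[i] * a[j] / (total ** 2 * (total + 1))

def quantile(values, prob):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = prob * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight

# Hypothetical Dirichlet parameters, chosen only for illustration.
a = [2.0, 3.0, 5.0]
covariances = [dirichlet_covariance(a, i, j)
               for i in range(len(a)) for j in range(len(a)) if i != j]

# Two categories: p2 = 1 - p1, with p1 drawn from an arbitrary beta prior.
rng = random.Random(1)
p1 = [rng.betavariate(2.0, 5.0) for _ in range(10001)]
p2 = [1.0 - x for x in p1]
lower_quartile_p1 = quantile(p1, 0.25)
upper_quartile_p2 = quantile(p2, 0.75)
quartile_sum = lower_quartile_p1 + upper_quartile_p2
```

Every off-diagonal covariance is negative whatever positive parameters are chosen, and `quartile_sum` equals one up to rounding because the upper quartile of 1 - p1 is one minus the lower quartile of p1.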
Partly because of these difficulties, no doubt, elicitation methods and software for
multinomial sampling seem to have been constructed only for modelling opinion by a Dirichlet
distribution. In this thesis, we propose novel methods that quantify expert opinion about
a Dirichlet distribution and additionally about three more general and flexible prior
distributions. First, an elicitation method is proposed for a generalized Dirichlet distribution
as a more flexible prior distribution. The generalized Dirichlet distribution, introduced by
Connor and Mosimann (1969), has a more general covariance structure than the standard
Dirichlet distribution and a larger number of parameters. Second, another method elicits the
Gaussian copula as a multivariate prior that expresses the dependence structure between the
marginal beta priors of multinomial probabilities using a multivariate normal distribution.
Third, a further novel method quantifies expert opinion about the most flexible alternative
prior, the logistic normal distribution (Aitchison, 1986). With this distribution, the
multinomial probabilities are transformed to variables that (by assumption) follow a multivariate
normal distribution, using a multivariate form of the logistic transformation. These different
elicitation methods are each implemented in interactive graphical software.
The logistic normal distribution has a large number of parameters and gives a prior
distribution with a much more flexible dependence structure. Moreover, assuming a logistic
normal prior for multinomial models enables us to extend the elicitation method to the case
of multinomial models with explanatory covariates. For these models, we propose a method
for eliciting a multivariate normal prior distribution for the regression coefficients based on
the multivariate logistic transformation.
The assessment tasks and the task structure implemented in all the proposed methods
lead to coherent assessments without the expert having to be conscious of coherence
constraints. Using the interactive software, the expert is only required to assess conditional
and/or unconditional medians and quartiles for the elements of the probability vector p. For
each of the available prior distributions, the expert does not need to be conscious of the
constraints on her assessments. Instead, through the software we suggest coherent values that
are close to her initial assessments, which she may accept or modify.
This thesis consists of 10 chapters. After this introductory chapter, Chapter 2 first gives a
brief review of the main findings and considerations from psychological literature that should
influence the construction of elicitation methods. Then the most relevant methods of eliciting
prior distributions for normal linear models and GLMs are reviewed and discussed. Interactive
computer software for these purposes is also listed with some of the different applications for
which it has been used. In addition, the limited literature of prior elicitation methods
for multinomial models is also reviewed, together with its implementing software. We also
discuss some recent interactive graphical computer programs that have been reported in the
literature for some other problems.
In Chapter 3, the piecewise-linear model of Garthwaite and Al-Awadhi (2006) for eliciting
multivariate normal priors for regression coefficients in GLMs is reviewed in detail and the
assessment tasks that the expert performs to quantify her opinion are discussed. Also, we
describe the software that implements it and detail improvements to the implementation that
were made by the author of this thesis.
As mentioned earlier, the elicitation method of Garthwaite and Al-Awadhi (2006) makes
the simplifying assumption that the regression coefficients associated with different predictors
are independent in the prior distribution. In Chapter 4, we propose three new methods for eliciting
positive-definite variance-covariance matrices of a multivariate normal prior for regression
coefficients that do not require this simplifying assumption. Each method is a trade-off
between flexibility and the number of assessments that must be made by the expert.
The first method proposed in Chapter 4 is an extension to the method of Garthwaite
and Al-Awadhi (2006). It is the most flexible of the methods but it needs a large number of
assessments. The second method requires fewer assessments but assumes a restricted
correlation pattern between regression coefficients. The third method first uses one of the other two
methods to obtain the correlations between the regression coefficients of two predictors. Then
all other correlations are induced through some assessed weights that reflect the magnitude
of correlations relative to each other. The expert assesses these weights and then the
implementing software presents interactive graphs that help her review and revise assessments to
her satisfaction.
In Chapter 5, we introduce two elicitation methods that aim to complete the prior
structure of the normal and gamma GLMs. The methods quantify expert opinion about prior
distributions for the extra parameters of these models. The first proposed method elicits
a conjugate inverted chi-squared prior distribution for the error variance in normal models.
Our proposed method is based on the expert's assessments of medians and conditional
medians of the absolute difference between two observed values of the response variable at the
same design point. It extends the method of Garthwaite and Dickey (1988) by using more
than one data set of hypothetical future samples.
The second proposed method in Chapter 5 is a novel method for eliciting a lognormal
prior distribution for the scale parameter of gamma GLMs. Given the mean value of a gamma
distributed response variable, the method is based on conditional quartile assessments. It
can also be used to quantify an expert's opinion about the prior distribution for the shape
parameter of any gamma random variable, if the mean of the distribution has been elicited
or is assumed to be known.
Chapter 6 proposes two methods for eliciting a standard Dirichlet prior distribution for
multinomial probabilities, using either a marginal or a conditional approach. The main
difference between the two proposed approaches is in the assessment tasks that they require.
In the marginal approach, the expert assesses unconditional medians and quartiles for each
multinomial probability $p_i$. Then we use these quartiles to obtain a marginal beta distribution
for each $p_i$. The parameters of these marginal betas are reconciled to form a standard Dirichlet
distribution. Three different forms of reconciliation are used, each based on least-squares
optimizations. For each optimization method, the medians and quartiles of the consequent
Dirichlet distribution are computed and graphically presented to the expert, who chooses
which of the Dirichlet distributions best represents her opinion. She is also offered the option
to change the medians and quartiles if none of the offered sets is an adequate representation
of her opinions.
The other approach proposed in Chapter 6 is the conditional approach. Using this
approach, the expert is asked to assess the median and quartiles of the first probability. For
each of the remaining probabilities, she assesses conditional medians and quartiles, where
the conditions state values for the preceding probabilities that the expert should treat as
correct when making her assessments. These conditional assessments are then used to form
conditional beta distributions that are also reconciled into a standard Dirichlet distribution.
New elicitation methods for two more general prior distributions for multinomial models
are proposed in Chapter 7. The first method uses the same conditional assessments, as
obtained in Chapter 6, to elicit a flexible generalized Dirichlet prior, a Connor-Mosimann
distribution, through its conditional beta distributions. The flexibility of the generalized
Dirichlet distribution means that the elicited parameters of these conditional betas are exactly
the same hyperparameters of the elicited generalized Dirichlet prior; no reconciliation is
required. This elicitation method and the elicitation methods proposed in Chapter 6 are
compared in an example in Section 7.3. In the example, a prominent medical expert in Malta
quantified his prior opinions about obesity misclassification in health surveys in Malta.
The second proposed method in Chapter 7 elicits a Gaussian copula prior for the
multinomial probabilities. To do this, marginal beta distributions for the multinomial probabilities
are obtained from their assessed unconditional medians and quartiles. Then the
correlations between the multinomial probabilities are elicited using extra sets of assessments of
their conditional medians and quartiles. The proposed Gaussian copula prior assumes that
the dependence structure between the multinomial probabilities can be represented by a
multivariate normal distribution, where the marginal prior distribution of each multinomial
probability is still expressed as a beta distribution. In Section 7.5, the proposed elicitation
method and its implementing software are used by an environmental engineering expert to
quantify his opinion about the fuel used by waste collection vehicles in the UK.
In Chapter 8, a novel method is proposed for eliciting a logistic normal prior distribution
for the probabilities of a multinomial distribution. The method requires conditional medians
and quartiles of multinomial probabilities to be assessed. No beta distribution is elicited;
instead, a monotonic multivariate logistic transformation is used to transform these
assessments into medians and quartiles of a multivariate normal vector. Then a mean vector and a
positive-definite covariance matrix of the multivariate normal are determined using the
transformed quartiles. The adopted structural method of getting assessments guarantees that the
elicited variance-covariance matrix is positive-definite. Chapter 8 also gives an illustrative
example in which prior knowledge of a transport expert is quantified to elicit a logistic normal
prior distribution for a multinomial model about a transportation problem.
The elicitation method proposed in Chapter 8 for logistic normal priors of multinomial
distributions is extended further in Chapter 9 to handle multinomial models that contain
explanatory covariates. Our extended method in Chapter 9 elicits a multivariate normal
prior distribution for the regression coefficients associated with different covariates in a form
of the base-line multinomial logit model. For $k$ categories and $m$ covariates, the model that
contains a constant term has exactly $(k - 1)(m + 1)$ free parameters. In Chapter 9, we show
that the same assessment tasks of Chapter 8 can be repeated for each covariate to elicit a
mean vector and a positive-definite variance-covariance matrix of a multivariate normal prior
distribution for the $(k - 1)(m + 1)$ regression coefficients.
Concluding comments are given in Chapter 10 where some directions for future research
are also considered.
Chapter 2
Literature review
2.1 Introduction
Relatively recent comprehensive reviews of the elicitation of probability distributions, covering
theory, methods, techniques, software, applications and case studies, are found in Garthwaite et al.
(2005), O'Hagan et al. (2006) and Jenkinson (2007). The aim of this chapter is to review
the recent literature on quantifying expert opinion that is most relevant to eliciting prior
distributions for Bayesian GLMs and multinomial models. The emphasis here is on the
different statistical formulations of elicitation models as well as on the design of the software
available in the literature as elicitation tools.
A brief review of some important elicitation topics, ideas and psychological aspects is
given in Section 2.2. The important elicitation method of Kadane et al. (1980) for normal
linear models is reviewed in Section 2.3, where some other elicitation methods for these
models are also reviewed briefly. Important and recent elicitation methods and software
tools available in the literature for the prior distributions of Bayesian GLMs are reviewed in
Section 2.4. However, most of these methods and their accompanying computer programs
were devoted to prior elicitation of the Bayesian logistic regression models with anticipated
extensions to the more general family of GLMs. Section 2.5 reviews available methods and
computer programs for quantifying expert opinion about priors for multinomial models. As
expected, the majority of these methods and tools quantify opinions about the simple
conjugate prior, the Dirichlet distribution. Some of the recent graphical interactive software
that quantifies expert opinion about problems other than GLMs and multinomial
priors is reviewed in Section 2.6.
2.2 Psychological aspects in eliciting opinion
Psychological research on human performance in assessing probabilities dates back to the
1960s. Peterson and Beach (1967) in their paper “Man as an Intuitive Statistician” studied
human statistical inference for estimating proportions, means, variances and correlations.
Their results conclude that man can use probability theory and statistics intuitively in
performing these inferential tasks. In the same year, Winkler (1967) stated that, in assessing
a prior distribution for Bayesian analysis, the expert has no ‘true’ built-in prior distribution
that can be elicited. Instead, an elicitation process only “helps to draw out an assessment
of a prior distribution from the prior knowledge”. This prior distribution is affected by both
the assessor and the assessment techniques.
Garthwaite et al. (2005) reviewed a body of psychological literature about some of the
main mental operations, heuristics, that an expert may perform in his mind to give a specific
numeric assessment and biases that may influence these operations. A recent comprehensive
review of psychological research on assessing probabilities including heuristics and biases is
given by Kynn (2008). She also provided some guidelines for eliciting expert knowledge
based on human biases and inadequacies in assessing probabilities given in the psychological
literature. Other useful discussions on psychological aspects in the elicitation context may
be found in Hogarth (1975), Wallsten and Budescu (1983) and O'Hagan et al. (2006).
The main interest of this thesis is to elicit multivariate probability distributions.
Multivariate distributions require more quantities to be elicited than univariate distributions.
Besides the usual summaries of each random variable, the dependence structure between all
variables must also be assessed. In the rest of this section, we briefly review psychological
aspects involved in assessing quantities required for multivariate distributions.
As a measure of central tendency for each random variable, we have decided to elicit
its median value from the expert. Experimental work in the literature reveals that people
are better at assessing medians than means, especially for skewed distributions. See
Garthwaite et al. (2005) and references therein. The median value can be assessed through
one step of the bisection method, see for example Winkler (1967), Stael von Holstein (1971)
and Pratt et al. (1995). The expert is asked to determine her median as the value that the
random variable is equally likely to be less than or greater than. For more discussion about
bisection tasks and their usage, see for example Garthwaite and Dickey (1985), Hora et al.
(1992) and Fischer (2001).
To elicit variances, we have chosen to assess the two quartile values of each univariate
distribution. By assuming a smooth unimodal distribution, such as the normal or approximate
normal distribution, quartiles are transformed to elicit the variances. Quartiles can be easily
assessed using the bisection method, which is also called the successive subdivision method,
as follows. The upper quartile is assessed by asking the expert to assume that the random
variable is above her assessed median value. She is then asked to assess her upper quartile as
the value that the random variable is equally likely to be less than or greater than. Similarly,
the lower quartile is assessed as the value that divides the range below the median into two
equally likely ranges.
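As an illustration of how assessed quartiles can be transformed into a variance, the sketch below fits a normal distribution to a median and two quartiles. It is a simplified assumption for illustration, not the procedure implemented in the thesis software: the mean is set to the assessed median, and the standard deviation to the assessed interquartile range divided by the interquartile range of a standard normal distribution. The function name and the numerical assessments are hypothetical.

```python
from statistics import NormalDist


def normal_from_quartiles(lower, median, upper):
    """Fit a normal distribution to an assessed median and quartiles.

    Assumption for illustration: the mean equals the assessed median and
    the standard deviation is the assessed interquartile range divided by
    the interquartile range of a standard normal (about 1.349).
    """
    if not lower < median < upper:
        raise ValueError("assessments must satisfy lower < median < upper")
    z75 = NormalDist().inv_cdf(0.75)  # upper quartile of N(0, 1), about 0.6745
    sd = (upper - lower) / (2.0 * z75)
    return NormalDist(mu=median, sigma=sd)


# Hypothetical assessments from one bisection exercise.
fitted = normal_from_quartiles(lower=8.0, median=10.0, upper=12.0)
variance = fitted.variance
```

Using both quartiles, rather than only the upper one, lets asymmetric assessments influence the fitted spread.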
The assessed quartiles represent a central 50% credible interval. People can perform the
task of assessing credible intervals reasonably well. However, there is a clear tendency for
people to be overconfident in assessing central credible intervals; they tend to give shorter
intervals [Garthwaite et al. (2005)]. Some other quantiles were found to reduce the degree of
overconfidence, such as the 33rd and 67th percentiles. O'Hagan (1998) suggested using the central
66% interval, and mentioned that experimental work about different quantile assessments had
not revealed any single choice to be the best in all cases. For more details, see Hora et al.
(1992), Garthwaite and O'Hagan (2000) and Kynn (2005, 2006).
To complete the elicitation process of a multivariate distribution for dependent variables,
summaries of dependence structure must be elicited. Typically, determining correlations
is the trickiest part in a multivariate elicitation, especially when there are more than two
random variables and a variance-covariance matrix must be assessed. Such a matrix must
be positive-definite for mathematical coherence. We will make extensive use of the method
of Kadane et al. (1980) to elicit positive-definite variance-covariance matrices. The method
is described in the next section. It relies on assessing conditional medians and quartiles
to compute conditional variances and covariances. Conditional quartiles are assessed in a
structural way that guarantees positive-definiteness.
Assessing conditional quartiles is not, however, the only way to elicit correlations. Other
methods were suggested in Clemen and Reilly (1999) and Clemen et al. (2000). These
methods include direct assessment of a correlation coefficient, and assessing conditional percentiles
or probabilities of one variable given percentiles or probabilities of the other variable, either
for one or two items from the population. These assessments were used to calculate
Pearson, Spearman and Kendall's $\tau$ correlation coefficients. Although Clemen and Reilly (1999)
discussed building copula functions as joint distributions that can be elicited using marginal
distributions and elicited correlations, they did not attempt to obtain a positive-definite
variance-covariance matrix for multivariate distributions.
In summary, in building our proposed elicitation methods throughout this thesis, we take
into account the following considerations. These were mentioned by Kadane and Wolfson
(1998) as the points of agreement among most of the statistical literature on how elicitation
should be carried out.
1. Expert opinion is the most worthwhile to elicit.
2. Experts should be asked to assess only observable quantities, conditioning
only on covariates (which are also observable) or other observable quantities.
3. Experts should not be asked to estimate moments of a distribution (except
possibly the first moment); they should be asked to assess quantiles or
probabilities of the predictive distribution.
4. Frequent feed-back should be given to the expert during the elicitation
process.
5. Experts should be asked to give assessments both unconditionally and
conditionally on hypothetical observed data.
2.3 Prior elicitation for normal linear models
Although it was introduced as an elicitation method for the parameters of a normal linear
model, the work of Kadane et al. (1980) has been an important step towards eliciting prior
distributions for GLMs, and even for eliciting many other multivariate distributions. See, for
example, Dickey et al. (1986), Al-Awadhi and Garthwaite (1998), Garthwaite and Al-Awadhi
(2001, 2006). The ideas of Kadane et al. (1980) are utilized, modified and implemented
extensively throughout this thesis. A detailed review of their elicitation method is given
below.
Suppose the normal linear model is given by
\[ Y = \underline{X}'\underline{\beta} + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2), \tag{2.1} \]
where $\underline{X} = (x_1, \dots, x_r)'$ is a vector of $r$ explanatory variables, and
$\underline{\beta} = (\beta_1, \dots, \beta_r)'$
is the vector of regression coefficients. Kadane et al. (1980) introduced an elicitation method
for the natural conjugate prior distribution structure of the parameters in model (2.1) as
(2 .2 )
w5
(2.3)
The hyperparameters to be elicited are thus a mean vector $\underline{b}$, the two positive scalars
$\delta$, $w$ and a positive-definite matrix $R$. The expert cannot be asked about these quantities
directly as they are not observable. Instead, the prior distributions are induced from expert
assessments about the response variable $Y$, which is an observable quantity, at some given
values of the explanatory variables. Hence, a number $m$ of realizations
$\underline{X}_1, \dots, \underline{X}_m$ are selected.
Kadane and Wolfson (1998) discussed how these design points can be selected efficiently.
At each design point $\underline{X}_i$, $i = 1, \dots, m$, the expert assesses a median value
$y_{i,0.5}$, an upper quartile $y_{i,0.75}$ and a 0.9375 quantile $y_{i,0.9375}$ of the response
variable $Y_i$. The quantile $y_{i,0.9375}$ can be obtained using two bisection iterations above
$y_{i,0.75}$. These assessments were used by Kadane et al. (1980) to elicit $\underline{b}$ and
$\delta$ as follows.
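To elicit the mean vector $\underline{b}$, the assessed medians were treated as observations of $Y$, and
$\underline{b}$ was elicited as the least-squares estimate
\[ \underline{b} = (X'X)^{-1} X' \underline{y}_{0.5}, \tag{2.4} \]
where $\underline{y}_{0.5} = (y_{1,0.5}, y_{2,0.5}, \dots, y_{m,0.5})'$, and $X$ is the design matrix, which is given by
$X = (\underline{X}_1, \dots, \underline{X}_m)'$.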
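The least-squares step can be sketched in a few lines. The code below is an illustrative assumption rather than the implementation of Kadane et al. (1980): it forms the normal equations and solves them by Gaussian elimination, and the design points and assessed medians are hypothetical numbers.

```python
def solve_linear(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix does not have full column rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def elicit_mean_vector(design, medians):
    """Least-squares estimate b = (X'X)^{-1} X' y_0.5 from assessed medians."""
    r = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(r)]
           for j in range(r)]
    xty = [sum(row[j] * y for row, y in zip(design, medians))
           for j in range(r)]
    return solve_linear(xtx, xty)


# Hypothetical design points (constant term, one covariate) and medians.
design = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
medians = [2.0, 4.1, 5.9, 8.0]
b = elicit_mean_vector(design, medians)
```

Treating the medians as observations means that more design points than coefficients (m > r) simply average out inconsistencies between the expert's assessments.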
Under the prior structure in (2.2) and (2.3), the predictive distribution of $(Y \mid \underline{X})$ is a
multivariate t distribution with $\delta$ degrees of freedom. To elicit $\delta$, Kadane et al. (1980)
pointed out that the ratios
\[ a_i(\underline{X}_i) = \frac{y_{i,0.9375} - y_{i,0.5}}{y_{i,0.75} - y_{i,0.5}} \tag{2.5} \]
depend only on $\delta$ as a measure of the thickness of the distribution tails. Since the standard
normal distribution has the minimum value of this ratio as 2.27, Kadane et al. (1980) used
$a_i^*$ instead of $a_i$ to elicit $\delta$, where $a_i^*(\underline{X}_i) = \max\{a_i(\underline{X}_i), 2.27\}$. Then $\delta$ was elicited as the
value of the degrees of freedom that gives the closest ratio $t_\delta(0.9375)/t_\delta(0.75)$ to
\[ \bar{a}^* = \frac{1}{m} \sum_{i=1}^{m} a_i^*(\underline{X}_i). \tag{2.6} \]
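The matching of the averaged ratio to a degrees-of-freedom value can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the t quantiles are approximated in pure Python by Simpson integration and bisection, the search is restricted to integer degrees of freedom in a hypothetical candidate range, and all function names are invented for the example.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=200):
    """CDF for x >= 0 by composite Simpson integration of the density."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def elicit_df(ratios, candidates=range(1, 31), floor=2.27):
    """Degrees of freedom whose tail ratio is closest to the mean of a_i*."""
    target = sum(max(a, floor) for a in ratios) / len(ratios)

    def gap(df):
        return abs(t_quantile(0.9375, df) / t_quantile(0.75, df) - target)

    return min(candidates, key=gap)
```

Ratios near 2.27 indicate near-normal tails and lead to the largest candidate value, whereas large ratios indicate heavy tails and a small number of degrees of freedom.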
We propose a different method for eliciting a degrees of freedom hyperparameter in
Chapter 5 of this thesis. Our proposed method is an extension of the approach given by Garthwaite
and Dickey (1988), which is described later in Chapter 5.
Although the method of Kadane et al. (1980), for eliciting a positive-definite matrix $R$ and
a value for $w$, is complicated and requires substantial mathematical notation and details, we
review it here because its structural elicitation approach is essential in our proposed methods
for eliciting positive-definite matrices throughout this thesis.
The method is based on the properties of the multivariate t distribution. The center
and spread of the distribution are defined as follows. For any constant vector $\underline{a}$, and any
constant matrix $B$, if $\underline{Y}$ has a standard multivariate t distribution, then the center of the
vector $\underline{Z} = \underline{a} + B\underline{Y}$ is defined as $C(\underline{Z}) = \underline{a}$. The spread of $\underline{Z}$ is defined as $S(\underline{Z}) = BB'$. If
$\delta > 1$, then the mean exists and $E(\underline{Z}) = C(\underline{Z})$. If $\delta > 2$, then the variance exists, and is
given by $\mathrm{Var}(\underline{Z}) = \frac{\delta}{\delta - 2} S(\underline{Z})$. Expert's assessments were used to compute centers and spreads
to elicit $R$ and $w$, as detailed below.
The conditional elicitation structure suggested by Kadane et al. (1980), for $i = 2, \dots, m$,
involved assessing conditional medians and upper quartiles of $Y_i$ given sequences of
hypothetical values $y_1^0, \dots, y_{i-1}^0$. The conditions that were imposed on these hypothetical values
ensured discrepancy between conditional and unconditional centers, in the sense that
\[ y_1^0 \neq C(Y_1), \tag{2.7} \]
\[ y_i^0 \neq C(Y_i \mid y_1^0, \dots, y_{i-1}^0), \qquad i = 2, \dots, m. \tag{2.8} \]
These conditions guarantee the existence of the elicited positive-definite matrix $R$, as will
be shown later.
Centers and conditional centers were assessed using medians and conditional medians.
For example, $C(Y_1)$ was taken as the unconditional median assessment $y_{1,0.5}$. For $j < i$,
$C(Y_i \mid y_1^0, \dots, y_j^0)$ were taken as the conditional medians of $Y_i$ given that $Y_1 = y_1^0, \dots, Y_j = y_j^0$,
which are denoted by $(y_{i,0.5} \mid y_1^0, \dots, y_j^0)$. Similarly, conditional upper quartiles of $Y_i$ given
$y_1^0, \dots, y_j^0$ are denoted by $(y_{i,0.75} \mid y_1^0, \dots, y_j^0)$. Spreads and conditional spreads were computed
by dividing the assessed semi-interquartile range by the corresponding semi-interquartile
range $t(\delta, 0.75)$ of a standard multivariate t distribution with $\delta$ degrees of freedom. This
gives
\[ S(Y_1) = \left[ \frac{y_{1,0.75} - y_{1,0.5}}{t(\delta, 0.75)} \right]^2 \tag{2.9} \]
and, for $i = 1, 2, \dots, m - 1$,
\[ S(Y_{i+1} \mid y_1^0, \dots, y_i^0) = \left[ \frac{(y_{i+1,0.75} \mid y_1^0, \dots, y_i^0) - (y_{i+1,0.5} \mid y_1^0, \dots, y_i^0)}{t(\delta + i, 0.75)} \right]^2. \tag{2.10} \]
To elicit a positive-definite matrix $R$, the approach of Kadane et al. (1980) is to successively
elicit the spread matrices $U_i$ of $(Y_1, \dots, Y_i)$ in a way that guarantees the positive-definiteness
of the final matrix, $U_m$. The value of $U_1$ equals $S(Y_1) > 0$ as given in (2.9). Then, supposing
that $U_i$ has been estimated as a positive-definite matrix, the aim now is to elicit $U_{i+1}$, and
show it is positive-definite. $U_{i+1}$ is partitioned as
\[ U_{i+1} = \begin{pmatrix} U_i & U_i \underline{g}_{i+1} \\ \underline{g}_{i+1}' U_i & S(Y_{i+1}) \end{pmatrix}. \tag{2.11} \]
Conditional median assessments were used to estimate $\underline{g}_{i+1}$ as follows. The partition in
(2.11), with the properties of the multivariate t distribution, gives
\[ C(Y_{i+1} \mid y_1^0, \dots, y_i^0) - C(Y_{i+1}) = \bigl( y_1^0 - C(Y_1), \dots, y_i^0 - C(Y_i) \bigr)\, \underline{g}_{i+1}. \tag{2.12} \]
Moreover, for $j < i$, taking the center of both sides of (2.12) given that $Y_1 = y_1^0, \dots, Y_j = y_j^0$
gives
\[ C(Y_{i+1} \mid y_1^0, \dots, y_j^0) - C(Y_{i+1}) =
\begin{pmatrix}
y_1^0 - C(Y_1) \\ \vdots \\ y_j^0 - C(Y_j) \\
C(Y_{j+1} \mid y_1^0, \dots, y_j^0) - C(Y_{j+1}) \\ \vdots \\
C(Y_i \mid y_1^0, \dots, y_j^0) - C(Y_i)
\end{pmatrix}' \underline{g}_{i+1}. \tag{2.13} \]
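Since $j = 1, 2, \dots, i$, Kadane et al. (1980) ended up with a system of $i$ equations of the form
\[ M_{i+1}\, \underline{g}_{i+1} = \underline{h}_{i+1}, \tag{2.14} \]
where
\[ \underline{h}_{i+1} =
\begin{pmatrix}
C(Y_{i+1} \mid y_1^0) - C(Y_{i+1}) \\
C(Y_{i+1} \mid y_1^0, y_2^0) - C(Y_{i+1}) \\
\vdots \\
C(Y_{i+1} \mid y_1^0, \dots, y_i^0) - C(Y_{i+1})
\end{pmatrix} \tag{2.15} \]
and
\[ M_{i+1} =
\begin{pmatrix}
y_1^0 - C(Y_1) & C(Y_2 \mid y_1^0) - C(Y_2) & \cdots & C(Y_i \mid y_1^0) - C(Y_i) \\
y_1^0 - C(Y_1) & y_2^0 - C(Y_2) & \cdots & C(Y_i \mid y_1^0, y_2^0) - C(Y_i) \\
\vdots & \vdots & \ddots & \vdots \\
y_1^0 - C(Y_1) & y_2^0 - C(Y_2) & \cdots & y_i^0 - C(Y_i)
\end{pmatrix}. \tag{2.16} \]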
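Multiplying both sides of (2.14) from the left by the matrix
\[ Q_{i+1} =
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
-1 & 1 & 0 & \cdots & 0 \\
0 & -1 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & -1 & 1
\end{pmatrix} \tag{2.17} \]
gives an upper triangular system that can be solved for $\underline{g}_{i+1}$ as follows,
\[ \underline{g}_{i+1} =
\begin{pmatrix}
y_1^0 - C(Y_1) & C(Y_2 \mid y_1^0) - C(Y_2) & \cdots & C(Y_i \mid y_1^0) - C(Y_i) \\
0 & y_2^0 - C(Y_2 \mid y_1^0) & \cdots & C(Y_i \mid y_1^0, y_2^0) - C(Y_i \mid y_1^0) \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & y_i^0 - C(Y_i \mid y_1^0, \dots, y_{i-1}^0)
\end{pmatrix}^{-1} \underline{q}_{i+1}, \tag{2.18} \]
where $\underline{q}_{i+1} = Q_{i+1} \underline{h}_{i+1}$.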
Under conditions (2.7) and (2.8), the upper triangular matrix in (2.18) is nonsingular and hence
a unique solution for $\underline{g}_{i+1}$ exists. It remains now to elicit the value of the spread $S(Y_{i+1})$ in
(2.11). Kadane et al. (1980) used the elicited conditional spread, with the properties of the
conditional spread of the multivariate t distribution, to get a formula for $S(Y_{i+1})$ as follows,
\[ S(Y_{i+1}) = \frac{S(Y_{i+1} \mid y_1^0, \dots, y_i^0)\,[1 + i/\delta]}{1 + H_i/\delta} + \underline{g}_{i+1}' U_i\, \underline{g}_{i+1}, \tag{2.19} \]
where
\[ H_i = \bigl( y_1^0 - C(Y_1), \dots, y_i^0 - C(Y_i) \bigr)\, U_i^{-1}\, \bigl( y_1^0 - C(Y_1), \dots, y_i^0 - C(Y_i) \bigr)'. \]
Using the Schur complement, the matrix $U_{i+1}$, as partitioned in (2.11), is positive-definite if
and only if $U_i$ is positive-definite and
\[ S(Y_{i+1}) - \underline{g}_{i+1}' U_i\, \underline{g}_{i+1} > 0, \tag{2.20} \]
which is guaranteed from (2.19). Then, using mathematical induction, the final matrix $U_m$
is positive-definite.
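The induction argument can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the original software: it builds each new spread matrix from the previous one, a vector g and a positive Schur complement (the quantity required to be positive in (2.20)), and then confirms positive-definiteness with a Cholesky factorization. The function names and all numerical values are hypothetical.

```python
import math


def cholesky(matrix):
    """Return the lower Cholesky factor; raise if not positive-definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def extend_spread(u, g, schur):
    """Build U_{i+1} from U_i, g_{i+1} and a positive Schur complement.

    schur plays the role of S(Y_{i+1}) - g' U_i g in (2.20).
    """
    if schur <= 0.0:
        raise ValueError("Schur complement must be positive")
    n = len(u)
    ug = [sum(u[r][c] * g[c] for c in range(n)) for r in range(n)]
    corner = schur + sum(g[r] * ug[r] for r in range(n))
    return [u[r] + [ug[r]] for r in range(n)] + [ug + [corner]]


u1 = [[2.0]]                                  # U_1 = S(Y_1) > 0
u2 = extend_spread(u1, g=[0.5], schur=1.0)    # hypothetical assessments
u3 = extend_spread(u2, g=[0.3, -0.4], schur=0.25)
factor = cholesky(u3)                         # succeeds, so U_3 is positive-definite
```

Because every step only requires a positive scalar, the expert's assessments never need to be checked jointly for coherence; positive-definiteness follows from the structure of the questions.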
To elicit $R$ using $U_m$, properties of the multivariate t distribution were used to yield the
following formula
\[ R^{-1} = \frac{1}{w} (X'X)^{-1} X' (U_m - w I_m) X (X'X)^{-1}, \tag{2.21} \]
where $I_m$ is the identity matrix of order $m$. See Kadane et al. (1980) for details.
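The formula requires $w$ to be elicited first. To elicit $w$, the expert is asked to suppose that
two independent observations $Y_i$ and $Y_i^*$ are taken at the same design point $\underline{X} = \underline{X}_i$. Given
$y_1^0, \dots, y_{i-1}^0$, the expert assesses the median of $Y_i$ which is used to estimate $C(Y_i \mid y_1^0, \dots, y_{i-1}^0)$.
Then the expert is given a hypothetical value $y_i^0$ for $Y_i$ and is asked to assess the conditional
median of $Y_i^*$ given $y_1^0, \dots, y_i^0$ to be used as an estimate of $C(Y_i^* \mid y_1^0, \dots, y_i^0)$. The conditional
distribution of the two observations is a bivariate t, and its properties were used to elicit $w_i$
as
\[ w_i = \bigl[ S(Y_i \mid y_1^0, \dots, y_{i-1}^0) - K_i \bigr] \frac{\delta + i - 1}{\delta + L_i}, \tag{2.22} \]
where
\[ K_i = \bigl[ C(Y_i^* \mid y_1^0, \dots, y_i^0) - C(Y_i \mid y_1^0, \dots, y_{i-1}^0) \bigr] \frac{S(Y_i \mid y_1^0, \dots, y_{i-1}^0)}{y_i^0 - C(Y_i \mid y_1^0, \dots, y_{i-1}^0)} \]
and
\[ L_i = \bigl( y_1^0 - \underline{X}_1'\underline{b}, \dots, y_{i-1}^0 - \underline{X}_{i-1}'\underline{b} \bigr)\, U_{i-1}^{-1}\, \bigl( y_1^0 - \underline{X}_1'\underline{b}, \dots, y_{i-1}^0 - \underline{X}_{i-1}'\underline{b} \bigr)'. \]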
Different values $w_1, \dots, w_m$ were then averaged to get a final elicited value $w$. Our
extension of the method suggested by Garthwaite and Dickey (1988) for eliciting $w$, as proposed
in Chapter 5, makes the same assumption of getting two independent observations at the
same design point. But we require a median assessment of the difference between the two
observations, which is due only to the random variation.
The method of Kadane et al. (1980) has been extensively reviewed in the literature. See
for example Kadane and Wolfson (1998) and Daneshkhah and Oakley (2010), where two
extra examples for its implementation were also discussed. Two drawbacks of the method
were mentioned by Garthwaite et al. (2005). The assessments it uses are likely to be biased
by conservatism as the expert is asked to revise her opinion based on hypothetical data.
Eliciting the spread using the median and upper quartile may not reflect both halves of the
distribution, hence masking any asymmetry of expert opinion.
Some other alternate methods for eliciting the parameters of normal linear models are
available in the literature. See, for example, Oman (1985), Garthwaite and Dickey (1988,
1992) and Ibrahim and Laud (1994). Oman (1985) used empirical Bayes methods to estimate
both $\delta$ and $R$ instead of eliciting them from the expert. The method of Garthwaite and
Dickey (1988) is similar to that of Kadane et al. (1980) in that both of them make use of
repeated assessments that are reconciled and utilize a structural set of conditional questions
to guarantee the positive-definiteness of the covariance matrix.
However, instead of asking about $Y_i$, Garthwaite and Dickey (1988) suggested asking the
expert about the mean $\bar{Y}_i$ of $Y$ that may be observed in a large number of experiments at
the design point $\underline{X}_i$. In this way, the expert's assessments do not include random variation.
On the other hand, the design points that are used in Garthwaite and Dickey (1988) are to
be selected by the expert. This enabled the method to be extended to the variable selection
problem in linear models, see Garthwaite and Dickey (1992). Nevertheless, the method of
Kadane et al. (1980) is more flexible than that of Garthwaite and Dickey (1988). The latter
is not designed to handle categorical explanatory variables nor polynomial regression models
that contain interactions between explanatory variables. A more detailed review of normal
linear models elicitation can be found in Garthwaite et al. (2005) or O'Hagan et al. (2006).
2.4 Prior elicitation for GLMs
Starting from the idea that it is more efficient and easier to elicit expert opinion about
observable quantities, rather than about parameter values, Bedrick et al. (1996) were the first
to elicit priors for some arbitrary generalized linear models. Their work moved from normal
linear regression elicitation (Kadane et al. (1980); Garthwaite and Dickey (1988); Garthwaite
and Dickey (1992)) to GLMs. Their specification of informative prior distributions for the
regression coefficients of a GLM is based on expanding the idea of conditional means priors
(CMP).
The idea of the CMP is that the expert is asked to give his assessment of the mean
of potential observations conditional on given values at some carefully chosen points in the
explanatory variable space. This information is used to specify a prior distribution at each
location point. These priors are conveniently assumed to be independent for the various
locations. A prior distribution for the regression coefficient vector is then induced from the
CMP.
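To clarify this idea, consider for example the binomial GLMs, with $n$ independent
observations $Y_i$, each with a corresponding vector $\underline{X}_i$ of $p$ explanatory variables. Let
$N_i Y_i \mid \underline{X}_i \sim \mathrm{Binomial}(N_i, \mu_i)$, hence $\mu_i = E(Y_i \mid \underline{X}_i)$. The probability of success $\mu_i$ is related to the vector
$\underline{X}_i$ through a monotonic increasing link function $g(\cdot)$ as
\[ g(\mu_i) = \underline{X}_i'\underline{\beta}, \tag{2.23} \]
where $\underline{\beta}$ is a $p$ vector of regression coefficients. Common choices for the link function $g(\cdot)$
yield logistic, probit and complementary log-log regressions. The likelihood function for $\underline{\beta}$ is
given by
\[ L(\underline{\beta}) \propto \prod_{i=1}^{n} \bigl[ g^{-1}(\underline{X}_i'\underline{\beta}) \bigr]^{N_i y_i} \bigl[ 1 - g^{-1}(\underline{X}_i'\underline{\beta}) \bigr]^{N_i (1 - y_i)}. \tag{2.24} \]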
Bedrick et al. (1996) induced the prior on $\underline{\beta}$ from a CMP on $\mu_i = E(Y_i \mid \underline{X}_i)$, the success
probability for a “potentially observable” response $Y_i$ at the vector $\underline{X}_i$ of explanatory
variables. They assume that the $p$ vectors $\underline{X}_i$ are linearly independent and that
$$\mu_i \sim \text{beta}(a_{1,i}, a_{2,i}). \tag{2.25}$$
Hence, from independence, the prior on $\underline{\mu}$ is given by
$$\pi(\underline{\mu}) \propto \prod_{i=1}^{p} \mu_i^{a_{1,i}-1} (1 - \mu_i)^{a_{2,i}-1}. \tag{2.26}$$
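Under the independence assumption and from (2.23) and (2.26), they gave the induced prior
on $\underline{\beta}$ as
$$\pi(\underline{\beta}) \propto \prod_{i=1}^{p} \left[g^{-1}(\underline{X}_i'\underline{\beta})\right]^{a_{1,i}-1} \left[1 - g^{-1}(\underline{X}_i'\underline{\beta})\right]^{a_{2,i}-1} d\left[g^{-1}(\underline{X}_i'\underline{\beta})\right], \tag{2.27}$$
where $d[g^{-1}(\cdot)]$ denotes the derivative of $g^{-1}$ with respect to its argument.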
Although the above example is only valid for binomial GLMs, Bedrick et al. (1996) gave
generalizations and examples in which their method is applicable to common GLMs, including
Poisson and exponential regression. However, for normal and gamma regression models they
were only interested in eliciting priors on the regression coefficients $\underline{\beta}$, assuming that the
dispersion parameters of these models are known.
The strength of this approach, as they stated, is that “it is much easier to elicit information
about success probabilities such as $E(Y \mid X) = \mu$, which are on the same scale as the data,
than to attempt the extremely difficult task of eliciting prior knowledge about $\beta$.”
In their work, the use of data augmentation priors (DAP) was also proposed to induce
priors on $\underline{\beta}$. They showed that DAPs are closely related to CMPs and can be induced by
particular cases of CMPs. A DAP on $\underline{\beta}$ has the same functional form as the likelihood and can
be obtained by specifying “prior observations” and their weights. These prior observations
must be taken at specific locations in the predictor space. Hence a DAP, like a CMP, needs some
locations in the predictor space to be specified.
Good locations in the predictor space should lie in the expected range of $X$ and be
spread widely enough that the corresponding probabilities can reasonably be assumed to be
independent; they should also be acceptable to the expert. It is straightforward, however,
to let the field expert choose these locations. Bedrick et al. (1996) noted that independence
in CMPs does not mean that the components of the $\underline{\beta}$ vector will be independent too.
After selecting a proper $\underline{X}_i$, $i = 1, \cdots, p$, the value of $Y_i$ in a DAP must be determined; it can
be thought of as a typical prior observation associated with $\underline{X}_i$. For example, in binomial
GLMs, it can be thought of as a prior estimate of the mean number of successes at $\underline{X}_i$. If
the beta prior in (2.25) is reparameterized such that
$$a_{1,i} = w_i Y_i \quad \text{and} \quad a_{2,i} = w_i (1 - Y_i), \tag{2.28}$$
then, for the logistic model, the CMP in (2.27) is exactly a DAP since it takes the same
functional form as the likelihood in (2.24). The CMP in (2.27) induces a DAP for the logistic
model because the logit link function is such that
$$d\left[g^{-1}(\underline{X}'\underline{\beta})\right] = g^{-1}(\underline{X}'\underline{\beta}) \left[1 - g^{-1}(\underline{X}'\underline{\beta})\right]. \tag{2.29}$$
The induced DAP in (2.27), using (2.28) and (2.29), is proportional to a likelihood based on
the “prior observations” $(Y_i, \underline{X}_i, w_i : i = 1, \cdots, p)$. The weight parameter $w_i$ in (2.28) can
be interpreted as the prior number of observations associated with $Y_i$. Consequently, large
values of $w_i$ reflect more confidence in the prior belief, which means that the prior is relatively
more informative. However, these extra parameters need to be quantified, which
may make the CMP the easier of the two to elicit.
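The relation between the CMP and the DAP for the logistic model can be checked numerically. The sketch below is only an illustration under stated assumptions: it evaluates the unnormalised log of the induced prior (2.27) for the logit link, using the fact that the derivative of the inverse logit is $\mu(1 - \mu)$ as in (2.29), and compares it with a weighted binomial log-likelihood of the “prior observations”. The function names, locations, prior observations and weights are hypothetical and are not taken from Bedrick et al. (1996).

```python
import math


def inv_logit(eta):
    """Inverse of the logit link, g^{-1}(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def cmp_log_prior(beta, locations, a1, a2):
    """Unnormalised log of the induced prior (2.27) for the logit link."""
    total = 0.0
    for x, s, t in zip(locations, a1, a2):
        mu = inv_logit(sum(xj * bj for xj, bj in zip(x, beta)))
        # Beta kernel evaluated at mu_i = g^{-1}(x_i' beta).
        total += (s - 1.0) * math.log(mu) + (t - 1.0) * math.log(1.0 - mu)
        # Log of the derivative of the inverse logit, mu * (1 - mu), as in (2.29).
        total += math.log(mu * (1.0 - mu))
    return total


def dap_log_prior(beta, locations, prior_obs, weights):
    """Weighted binomial log-likelihood of the 'prior observations' (Y_i, x_i, w_i)."""
    total = 0.0
    for x, y, w in zip(locations, prior_obs, weights):
        mu = inv_logit(sum(xj * bj for xj, bj in zip(x, beta)))
        total += w * y * math.log(mu) + w * (1.0 - y) * math.log(1.0 - mu)
    return total


# Hypothetical example: intercept plus one covariate, assessed at p = 2 locations.
locations = [[1.0, -1.0], [1.0, 1.0]]
prior_obs = [0.2, 0.7]   # prior guesses Y_i of the success probability
weights = [5.0, 10.0]    # prior sample sizes w_i

# Reparameterization (2.28): a_{1,i} = w_i Y_i and a_{2,i} = w_i (1 - Y_i).
a1 = [w * y for w, y in zip(weights, prior_obs)]
a2 = [w * (1.0 - y) for w, y in zip(weights, prior_obs)]

beta = [0.3, -0.8]
difference = (cmp_log_prior(beta, locations, a1, a2)
              - dap_log_prior(beta, locations, prior_obs, weights))
```

For any value of `beta`, `difference` should be zero up to rounding error, which is the sense in which the CMP takes the functional form of the likelihood.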
Although the resulting priors are not necessarily members of any specific family of
distributions, Bedrick et al. (1996) argued that the CMP and DAP priors lead to tractable
posteriors for GLMs through importance sampling and Gibbs sampling techniques.
Another approach for eliciting different classes of priors for GLM parameters started with
the work of Ibrahim and Laud (1994) for normal linear models. Their work was then extended
to prior elicitation and variable selection for logistic regression models by Chen et al. (1999).
A further extension to GLMs was given by Chen et al. (2000), who proposed the class of
power priors for GLMs.
The main idea of the above series of papers is that a prior prediction vector $Y_0$ can be
specified for the response vector $Y$, using either historical data or an expert’s opinion. A
scalar $0 < a_0 < 1$ also needs to be elicited to quantify the expert’s confidence in her
best guess $Y_0$ relative to the actual data. Hence the scalar $a_0$ reflects the contribution of the
prior information to the posterior relative to the information given by the current experiment.
Together with the design matrix $X$, $Y_0$ and $a_0$ are used to specify an informative prior for the
regression coefficients.
In the class of power priors, the prior density is raised to the power $a_0$, which is considered
as a precision parameter that controls the heaviness of the tails of the prior distribution. For a
random $a_0$, a beta distribution was assumed by Chen et al. (2000) as a prior for $a_0$. Although
the class of power priors cannot be expressed in a closed form, Chen et al. (2000) discussed
its theoretical properties and propriety together with its required computations.
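To illustrate the role of the scalar $a_0$, the following sketch evaluates an unnormalised log power prior for a logistic regression. It assumes the usual form of the power prior, in which the likelihood of the prior prediction $Y_0$ is raised to the power $a_0$ and multiplied by an initial prior; the function names, the optional `log_initial_prior` argument and the numerical values are hypothetical, and the sketch is not the computational scheme of Chen et al. (2000).

```python
import math


def logistic_log_lik(beta, X, y):
    """Log-likelihood of responses y (0/1 values or proportions) under a logit link."""
    total = 0.0
    for x, yi in zip(X, y):
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        # y * log(mu) + (1 - y) * log(1 - mu) simplifies to y * eta - log(1 + exp(eta)).
        total += yi * eta - math.log1p(math.exp(eta))
    return total


def log_power_prior(beta, X, y0, a0, log_initial_prior=None):
    """Unnormalised log power prior: a0 * log L(beta | y0, X) + log initial prior."""
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie between 0 and 1")
    value = a0 * logistic_log_lik(beta, X, y0)
    if log_initial_prior is not None:
        value += log_initial_prior(beta)
    return value


# Hypothetical design matrix (intercept and one covariate) and prior prediction y0.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y0 = [0.1, 0.5, 0.9]
beta = [-2.0, 2.0]

full = log_power_prior(beta, X, y0, a0=1.0)   # prior guess weighted like real data
half = log_power_prior(beta, X, y0, a0=0.5)   # prior guess discounted by one half
```

On the log scale the discounting is linear, so `half` equals `0.5 * full` when the initial prior is flat; smaller values of `a0` flatten the prior and thicken its tails.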
Different extensions to this class of priors have been proposed in the literature. For example,
based on the same ideas, Chen and Ibrahim (2003) proposed a class of conjugate priors
for GLMs and discussed its elicitation. Moreover, Chen et al. (2003) introduced an informative
class of priors for generalized linear mixed models. Extensions to variable selection were
suggested by Meyer and Laud (2002), Chen and Dey (2003) and Chen et al. (2008).
Garthwaite and Al-Awadhi (2006) developed an elicitation method for piecewise-linear
logistic regression. The method is also valid for other GLMs, and Garthwaite and Al-Awadhi
(2011) extend the idea to GLMs with any link function. They assumed a multivariate
normal distribution for the regression coefficients; its parameters can be determined from the
expert assessments. One of the main aims of this thesis is to extend this piecewise-linear
elicitation method in the context of GLMs to treat the case of correlated regression
coefficients. The method is reviewed in detail in Chapter 3 and the proposed extensions are
given in Chapters 4 and 5.
The piecewise-linear elicitation method was designed to be used with the aid of interactive
graphical software written for this purpose. Older prototypes of the software were used
in practical case studies for threatened species in Garthwaite (1998) and Al-Awadhi and
Garthwaite (2006). A more recent version of the software was written by Jenkinson
(2007); this version has been reviewed, modified and extended further in
Chapters 3, 4 and 5 of the current thesis.
Another prototype of interactive graphical software was given by Kynn (2005, 2006)
to elicit expert opinion for the Bayesian logistic regression model. The software is called
ELICITOR and appeared as an add-on to WinBUGS. Kynn extended the program written by
Garthwaite (1998) and rewrote it in a more robust programming language. The software was
originally developed as a user-friendly tool for quantifying environmental experts’ knowledge
while studying the presence or absence of endangered species. It adopted the same approach
as Al-Awadhi and Garthwaite (2006).
Following Garthwaite (1998), the elicitation scheme adopted in ELICITOR is based on
the logistic regression model, in which the presence of an endangered species
is represented by a Bernoulli distribution whose probability is related to a number of
environmental variables via a logit function. The expert is asked to give conditional probability
assessments at the preferred or optimum site of species presence as the intercept. Then
assessments are made at other sub-optimum levels of each covariate.
The choice of the “optimum” value or level of each covariate to be its intercept, also called
the reference value, was made by Garthwaite (1998) and thoroughly justified in Kynn (2006).
She argued that it is psychologically meaningful for the expert to be asked about conditional
probabilities given that all, or all except one, of the covariates are at their optimum level.
In this case, conditioning on all other covariates can be translated in the expert’s mind as
conditioning on one event where everything is optimal. Kynn also mentioned some ecological
considerations that make the optimum point a good selection; a notable one is that the
distribution of species responses is usually considered to be unimodal. However, in our
extensions to the piecewise-linear model, the expert freely chooses the reference level,
although she is advised to select the optimum one.
While categorical covariates are related to the probability of presence, or generally of
success, through a bar chart in both ELICITOR and the prototype and its extensions,
the representation of continuous covariates is clearly different. ELICITOR not only assumes
a piecewise-linear relation between continuous covariates and the presence probability, but
also offers the options of linear and quadratic functions to model this relation. Nevertheless,
Kynn (2006) stated that the fully linear form is not realistic and that the quadratic form can
be too restrictive. We believe that the piecewise-linear relation is a very general form that
can model many other forms as special cases.
The main weakness of the statistical model of ELICITOR is that the regression
coefficients are assumed to be independent a priori, an assumption that may not hold in
many situations. Thus, only univariate normal priors were elicited and no attempt was made
to elicit covariances, even for the coefficients at the dividing points of the same piecewise-linear
curve or at the different levels of a single categorical covariate.
The idea of successive sub-division, also called the bisection method, as a technique to
assess the three quartiles from an expert, has been generally accepted as a comparatively
easy task for the expert to perform. The prototype software in Garthwaite (1998) and its
extensions apply the bisection method to obtain the expert’s assessments. Kynn (2006)
gives a detailed discussion of the available alternatives for assessing percentiles, and cites
results of studies comparing these methods. In designing ELICITOR, however, she decided to use
a quite different technique: the expert gives the two boundaries of a credible interval and then
gives the probability of this interval. Despite being easy to perform, this method does not
seem to have been adequately tested or justified.
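The three quartiles obtained by the bisection method can be turned into the parameters of a distribution. As a minimal sketch, assuming that the assessed quantity is treated as normally distributed, the median gives the mean and the interquartile range gives the standard deviation. This is one common conversion and is not necessarily the one implemented in any of the software reviewed here; the function name and the assessed values are hypothetical.

```python
from statistics import NormalDist


def normal_from_quartiles(lower_quartile, median, upper_quartile):
    """Return (mean, sd) of a normal distribution matching the assessed quartiles.

    The mean is set to the median, and the standard deviation is the
    interquartile range divided by twice the 0.75 quantile of a standard normal.
    """
    z75 = NormalDist().inv_cdf(0.75)  # approximately 0.6745
    sd = (upper_quartile - lower_quartile) / (2.0 * z75)
    return median, sd


# Hypothetical assessments for some quantity on an unbounded scale.
mean, sd = normal_from_quartiles(-0.4, 0.1, 0.6)
```

If the assessed quartiles are far from symmetric about the median, a normal distribution fits poorly, and the expert may be asked to reconsider the assessments or a different family may be used.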
Rather than assessing probabilities as numbers, the users of ELICITOR have more interactive
visualizations for estimating probabilities. These include a probability wheel, a
probability bar and other visualizations to help experts assess probabilities closer to their
knowledge. The feedback provided after the assessment process consists of alternative credible
intervals and probability distribution functions for the intercept and categorical variables.
ELICITOR was intended to be extended to encompass other GLMs, not only logistic regression,
with flexible options for the link functions and prior distributions. The software
documentation mentioned that this and other extensions were being tested, but we do not
know of any version of the software in which these extensions have been implemented. For more
details on ELICITOR see Kynn (2005), Kynn (2006) and O’Leary et al. (2009), although the
software and its documentation no longer seem to be available as open source on the web.
Denham and Mengersen (2007) introduced a method and developed software to elicit
expert opinion based on maps and geographic data for logistic regression models. Eliciting
information on observable quantities, such as values of the dependent variable at given
values of the predictors (referred to as the predictive procedure), is usually preferred to, and
easier than, direct assessment of the regression parameters (the structural procedure). However,
they argued that each procedure is more convenient for a specific type of expert. For example,
they considered two types of ecological experts. The ‘physiologist’, who has a good
understanding of the physical requirements of each species, is more likely to respond well to
a structural elicitation. The ‘field ecologist’, who has more knowledge about the places where
each species exists, may be better at responding to a predictive elicitation. Denham
and Mengersen (2007) proposed a new approach that combines both strategies. In their
combination approach, the expert may use either method, or the two methods simultaneously,
with each variable, according to his preference and background.
They adopted the usual logistic regression for species modelling,
$$Y_i \sim \text{Binomial}(n_i, \mu_i),$$
with the logit link function $Y_i = g(\mu_i) = \log(\mu_i/(1 - \mu_i))$, and $\underline{Y} = X\underline{\beta}$, where $Y_i$ is the
number of observations of a species at site $i$, and $X$ is the matrix of explanatory variables.
The aim is to quantify the expert’s opinion about the prior distribution of $\underline{\beta}$
in the form
$$\underline{\beta} \sim \text{MVN}(b, \Sigma).$$
They stated that the methods of Kadane et al. (1980) and Garthwaite and Dickey (1988,
1992) can be used in this context to estimate the hyperparameters $b$ and $\Sigma$ by asking the
expert to assess some quantile information for the value of $Y$ at particular values of $X$.
However, they referred to the difficulty of this predictive elicitation procedure for ‘field
ecologists’, who may have knowledge about the presence of a specific species at a site located
on a map rather than about the explanatory variables affecting this presence.
To help this type of expert, Denham and Mengersen (2007) suggested two alternatives.
The method of Kadane et al. (1980) can be used, with the expert choosing the design points
based on location, without specific reference to explanatory variables. Alternatively, the design
points could be selected as in the method of Kadane et al. (1980), and then transformed to
map locations that are displayed on the map for the expert.
Their proposed combination approach is not only a hybrid
that combines the predictive and structural procedures, but it also
offers the opportunity to use either or both of the two procedures for each single
variable. The basis of their method is to use the standard elicitation method with maps, as
discussed above, to derive a “first pass” elicitation of $b$. A structural elicitation procedure is
then applied. The latter is implemented by presenting a univariate graph for each of the $p$
explanatory variables. In each graph, they fix all the other $p - 1$ variables at their mean or
median value, i.e. for the $j$th variable, $j = 1, \cdots, p$, they display the graph of
$$Y = b_0 + b_j X_j + \sum_{k=1, k \neq j}^{p} b_k \bar{X}_k.$$
These univariate graphs are automatically updated once the expert updates the map
by adding new points or editing values. Moreover, the expert can directly manipulate the
graphs, which causes the map to change automatically as well. The expert is meant to keep
changing the map and/or the graphs until they all represent her prior knowledge. To elicit
$\Sigma$, the expert is asked to provide a 95% “envelope” around the displayed regression lines by
assessing upper and lower 95% quantiles.
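The univariate graph for a single explanatory variable can be computed directly from the current elicited coefficients. The sketch below is only an illustration: it holds every variable other than the chosen one at a fixed value (its mean or median) and returns the linear predictor over a grid of values of the chosen variable. The function name, the argument names and the numbers are hypothetical, and the variable index is zero-based here, unlike the $j = 1, \cdots, p$ of the text.

```python
def univariate_response(b0, b, fixed_values, j, x_grid):
    """Linear predictor as a function of variable j, with the others held fixed.

    b0           : intercept
    b            : coefficients of the p explanatory variables
    fixed_values : mean or median of each explanatory variable
    j            : zero-based index of the variable being displayed
    x_grid       : values of variable j at which the line is evaluated
    """
    # Contribution of the intercept and of the variables that are held fixed.
    offset = b0 + sum(bk * xk
                      for k, (bk, xk) in enumerate(zip(b, fixed_values))
                      if k != j)
    return [offset + b[j] * x for x in x_grid]


# Hypothetical first-pass coefficients for two explanatory variables.
line = univariate_response(b0=1.0, b=[2.0, 3.0], fixed_values=[10.0, 20.0],
                           j=0, x_grid=[0.0, 1.0])
```

In this example the line for the first variable has intercept 1 + 3 × 20 = 61 and slope 2, so any change the expert makes to a coefficient can be redrawn immediately.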
To apply this approach, Denham and Mengersen (2007) developed elicitation software
under a Geographic Information System (GIS), in which design points were actual locations
on interactive maps. They listed the benefits of elicitation using the software
with interactive maps over the usual elicitation with paper maps. The new procedure is more
flexible: it allows the expert to access information at any point in a convenient manner. The
scale dependency of hard-copy maps can be removed by zooming in
and out. Using the software allows the responses to be visualized and feedback to be provided
to the expert, who can then revisit and/or modify any previous assessment
on the interactive map.
Denham and Mengersen (2007) applied their software in two case studies: modeling
the median house prices in an Australian city and predicting the distribution of an
endangered species in Queensland. In their first case study, they modelled the median house
prices using a piecewise-linear regression to attain flexibility while maintaining the simplicity
of linear regression. They chose the dividing knots of the piecewise-linear relations as the
0.33 and 0.66 quantiles of each explanatory variable. Their model takes the form
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X'_{i1} + \beta_3 X''_{i1} + \beta_4 X_{i2} + \beta_5 X'_{i2} + \beta_6 X''_{i2},$$
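where $X_1$ is the distance from the city center in kilometers and $X_2$ is the distance from the river
in kilometers. For $j = 1, 2$, they defined $X'_{ij}$ and $X''_{ij}$ as
$$X'_{ij} = \begin{cases} X_{ij} - X_{0.33,j} & \text{if } X_{ij} > X_{0.33,j}, \\ 0 & \text{otherwise}, \end{cases}$$
and
$$X''_{ij} = \begin{cases} X_{ij} - X_{0.66,j} & \text{if } X_{ij} > X_{0.66,j}, \\ 0 & \text{otherwise}, \end{cases}$$
where $X_{0.33,j}$ and $X_{0.66,j}$ are the 0.33 and 0.66 quantiles of $X_j$, respectively.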
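The piecewise-linear terms defined above are straightforward to construct. The following sketch builds one row of the design matrix for the two-variable model, with knots at the 0.33 and 0.66 quantiles of each variable. The function names are hypothetical, and the empirical quantiles use the inclusive interpolation rule of Python's `statistics.quantiles`, which is an assumption and may differ from the rule used by Denham and Mengersen (2007).

```python
from statistics import quantiles


def hinge(x, knot):
    """Return x - knot if x exceeds the knot, and 0 otherwise."""
    return x - knot if x > knot else 0.0


def knots_from_data(values):
    """Empirical 0.33 and 0.66 quantiles of one explanatory variable."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
    return cuts[32], cuts[65]


def design_row(x1, x2, knots1, knots2):
    """Row (1, X1, X1', X1'', X2, X2', X2'') of the piecewise-linear model."""
    row = [1.0]
    for x, (low, high) in ((x1, knots1), (x2, knots2)):
        row += [x, hinge(x, low), hinge(x, high)]
    return row


# Hypothetical distances in kilometers, with knots supplied directly.
row = design_row(5.0, 1.0, knots1=(2.0, 4.0), knots2=(2.0, 4.0))
```

Each coefficient on a hinge term is the change in slope at the corresponding knot, which is why the fitted relation remains continuous.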
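They meant to simplify the Bayesian prior structure of the model, compared to that of
Kadane et al. (1980) or Garthwaite and Dickey (1992), to be of the form
$$Y \mid \underline{X}, \underline{\beta}, \sigma^2 \sim N(\underline{X}'\underline{\beta}, \sigma^2),$$
$$\underline{\beta} \sim \text{MVN}(b, \Sigma),$$
$$\sigma^2 \sim \text{Inverted Gamma}(\nu_0/2, \nu_0 S_0/2).$$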
In this case study, they specified a prior for the regression parameters $\underline{\beta}$. However, it does
not seem that they implemented any procedure to elicit the two extra hyperparameters
$\nu_0$ and $S_0$. The results suggested that the experts managed to give quantifications of their
opinions about house prices in the city that were consistent with the actual house prices. The
priors appeared to be relatively consistent. All participating experts in this case study reported
that they preferred the combined approach over the map or the standard approach. Most experts
elicited slightly different priors under the different elicitation methods they used.
The second case study in Denham and Mengersen (2007) was devoted to eliciting two
experts’ opinions about the distribution of the brush-tailed rock-wallaby in Queensland. The
explanatory variables were chosen by one of the experts to be $X_1$: a measure of terrain, $X_2$:
a moisture index, $X_3$: aspect and $X_4$: a 4-category variable representing the rock type. They
were interested in the following logistic model
$$Y_i \sim \text{Bernoulli}(p_i),$$
$$\text{logit}(p_i) \sim N(\mu_i, \sigma^2),$$
Hi = P0 + Pi X u + /?2 ^ 2 z + PzX\i + ^ 4 X 3 ^
+ PsX^i + peX^u + P-jX^2i + Ps X ^ u
$$\underline{\beta} \sim \text{MVN}(b, \Sigma).$$
They aimed to elicit the multivariate normal prior of $\underline{\beta}$. The experts were allowed to
choose the design points. The expert chooses a design point by clicking on a map; an
interactive dialogue then pops up giving a plot of a beta distribution for the probability of
presence at the selected design point. The plot has three adjustable points, at the median and the
0.05 and 0.95 quantiles. The expert is asked to adjust the three quantiles, or the computed
beta parameters, until the presented beta curve is the best representation of the expert’s
belief about the probability of the species’ presence at the selected point. This procedure is
repeated for a number of design points.
Once the expert has selected a minimum number of points, a logistic regression model
is fitted by the software to the design points. Then the univariate relation between the
probability of presence and each of the explanatory variables is presented to the expert in a
separate graph, a response curve. Each curve is drawn assuming that the other variables are
kept fixed at their means. The categorical variable $X_4$ is represented by box-plots rather than
a curve. The expert can review and modify the design points and see the automatic impact
on the response curves. The beta distributions elicited at the design points can be used to
elicit the multivariate normal distribution of the regression parameters $\underline{\beta}$ through weighted
logistic regression or a simulation-based approach; see Denham and Mengersen (2007) for more
details. They stated that the priors elicited from the experts were reasonably informative,
with corresponding posteriors that are clearly different from those obtained from
a uniform improper prior.
Although the software is specially designed for geographical data elicitation for a logistic
regression model, they indicated that the concepts can be generalized to any GLM. However,
Denham and Mengersen (2007) wrote the software explicitly for each of the two case studies
separately, tailored to the given cases and sets of explanatory variables. In its present form
their software is thus limited and cannot be used as a general elicitation tool. Moreover,
they used the R language to code statistical functions, with Visual Basic and other software
for interactive graphs embedded in the GIS system. The latter limits the usability of their
software.
Jenkinson (2007) re-wrote the software of Garthwaite and Al-Awadhi (2006) in Java
to provide a more portable and stable version. He gave a detailed description and
documentation of both the software and the piecewise-linear theoretical model behind it
[Jenkinson (2007), p.215-251]. Further modifications of the theoretical model and the software
are given in this thesis in Chapters 3, 4 and 5.
An important medical application of the GLM elicitation software is given in a case
study reported in Garthwaite et al. (2008). With the aim of estimating the costs and benefits of
current and alternative bowel cancer services in England, a pathway model was developed
whose transition parameters depend on covariates such as patient characteristics. Data to
estimate some parameters were lacking and expert opinion was elicited for these parameters,
using the indicated software and under the assumption that the quantity of interest was
related to covariates by the generalized piecewise-linear model given by Garthwaite and
Al-Awadhi (2006). The assessments were used to determine a multivariate normal distribution to
represent the expert’s opinions about the regression coefficients of that model. One conclusion
of this work was that quantifying and using expert judgement can be acceptable in real
problems of practical importance, provided that the elicitation is carefully conducted and
reported in detail.
A thorough comparison was conducted by O’Leary et al. (2009) of three
relatively recent elicitation tools for logistic regression. The comparison included the interactive
graphical tool of Kynn (2005) and Kynn (2006), the geographically assisted tool under
GIS of Denham and Mengersen (2007) and a third, simple direct questionnaire tool with no
software. These tools were compared in an elicitation workshop (see O’Leary et al. (2009)
for more details on the third method). The paper discusses and gives a detailed description
of each of the three methods used, showing the advantages and disadvantages of each.
Methods were compared according to their differences in the type of elicitation, the proposed
prior model, the elicitation tool and the requirement of a facilitator to help the expert. Prior
knowledge of two experts was elicited to model the habitat suitability of the endangered
Australian brush-tailed rock-wallaby. The comparison revealed that the elicitation method
influences the expert-based prior, to the extent that the three methods gave substantially
different priors for one of the experts. Some guidelines were also given for proper selection of
the elicitation method. This work of O’Leary et al. (2009) is part of a large body of applied
research which shows the importance of eliciting expert knowledge when modeling rare event
data; see also Kynn (2005), Al-Awadhi and Garthwaite (2006), Low Choy et al. (2009) and
Low Choy et al. (2010).
Although they are interested mainly in designing the elicitation process for ecological
applications, Low Choy et al. (2009) give a framework for the statistical design of expert
elicitation processes for informative priors which may be valid for Bayesian modeling in any
field. The proposed design consists of six steps, namely, determining the purpose and motivation
for using prior information; specifying the relevant expert knowledge available; formulating the
statistical model; designing effective and efficient numerical encoding; managing uncertainty;
and designing a practical elicitation protocol. Other important stages in the elicitation
process may be found in Garthwaite et al. (2005), Jenkinson (2007) and Kynn (2008). Low Choy
et al. (2009) validated these six steps in a detailed discussion and comparison of five case
studies, revisiting the principles of successful elicitation in a modern context.
The recent work of James et al. (2010) is of particular interest in the current
review for two reasons. First, it introduces and describes a general elicitation tool for
quantifying opinion in logistic regression using interactive graphical stand-alone software, called
Elicitator. Second, the software is based on a novel statistical methodology to elicit a normal
prior distribution for regression parameters.
Their work is an extension of that of Denham and Mengersen (2007), as applied to normal
prior elicitation for logistic regression in a geographically-based ecological context. As
mentioned before, Denham and Mengersen (2007) did not introduce a general purpose tool;
their software was tailored to the requirements of specific case studies. Motivated by that,
James et al. (2010) developed the Elicitator software as a stand-alone elicitation tool that
can be used for a wide range of applications.
Although the Elicitator software is based on the same interface and protocol as its prototype
in Denham and Mengersen (2007), the statistical method adopted to transform assessed
values into elicited priors is a novel one inspired by the CMP ideas of Bedrick et al.
(1996). James et al. (2010) argued that the CMP is more tractable and more applicable in
general compared to the predictive approach used by Kadane et al. (1980) and Denham and
Mengersen (2007). The novel modification in the Elicitator design to the approach of Bedrick
et al. (1996) is that it relaxes the assumption that the number of chosen points at which the
expert assesses her priors is exactly equal to the number $p$ of explanatory variables in the
logistic model. This is the assumption that leads to the induced prior on $\underline{\beta}$ as in (2.27).
Relaxing this assumption allows the number of elicitation points, say $k$, to exceed the
number $p$ of explanatory variables, a situation that is commonly encountered. Although the
prior on $\underline{\beta}$ can no longer be induced as in (2.27), James et al. (2010) proposed a measurement
error model in which elicitation points represent data in a beta regression model. In this
sense, increasing the number $k$ of elicitation points will lead to a more accurate prior.
Specifically, they assume a standard logistic regression model with a Bernoulli distribution
and a logit link function, as used by Bedrick et al. (1996). A main criticism is that they
assume that the coefficients of the explanatory variables are independent a priori, in the sense
that independent univariate normal priors were assumed for $\underline{\beta}$, i.e.
$$\beta_j \sim N(b_j, \sigma_j^2), \quad j = 1, \cdots, p. \tag{2.30}$$
Although they mentioned the possibility of assuming a multivariate normal prior distribution,
no attempt has been made to implement it in Elicitator.
For $i = 1, \cdots, k$, the expert assesses information about the probability of success $\mu_i$ at
a geographical site $i$, selected by the expert, with a known combination of the explanatory
variables $X_{1,i}, X_{2,i}, \cdots, X_{p,i}$. For example, the expert may assess information about the
probability of presence of a species at a known combination of environmental predictors at
site $i$. Following Bedrick et al. (1996), the expert’s assessments are used to elicit a beta prior
on $\mu_i$ as in (2.25). However, in situations where $k > p$, a beta prior on $\mu_i$ would not help
induce the normal prior for $\underline{\beta}$. Instead, James et al. (2010) assumed a beta prior on the
expert’s probability of success, say $Z_i$, which is different from the actual probability $\mu_i$. As
in a measurement error model, $\mu_i$ is the conditional expectation of $Z_i$ in the sense that
$$\text{logit}(\mu_i) = \underline{X}_i'\underline{\beta},$$
$$Z_i \mid \mu_i \sim \text{beta}(a_{1,i}, a_{2,i}), \tag{2.31}$$
$$E(Z_i \mid \mu_i) = \mu_i.$$
James et al. (2010) discussed the expert’s assessments about $Z_i$ that are required to
elicit beta distributions as in (2.31). They argued that the best estimate of the
probability $Z_i$ required in the measurement error model is the arithmetic mean, but that it is
difficult to assess. They were also against the idea of assessing the median, claiming that it
needs more effort from the expert. Hence, Elicitator requires the mode of $Z_i$ as its
best estimate. Then, following the well-established practice of assessing several quantiles for
beta elicitation, Elicitator requires the four bounds of the 50% and 95% credible intervals.
Although two assessments are mathematically sufficient for eliciting the two beta parameters,
it is better to elicit more assessments and reconcile them, especially for skewed distributions.
A simple numerical procedure is used to elicit the beta parameters from the mode and either two
or four assessed quantiles.
To elicit the hyperparameters $b_j$ and $\sigma_j^2$, $j = 1, \cdots, p$, in (2.30) using the elicited beta
parameters $a_{1,i}$ and $a_{2,i}$, $i = 1, \cdots, k$, in (2.31), James et al. (2010) proceed as follows.
In principle, the beta regression in (2.31) is performed using the expert’s data on $Z_i$ and
the known values of the explanatory variables to provide the expert-defined estimates of $\underline{\beta}$.
However, due to difficulties in implementing any beta regression package in Elicitator, the
beta regression problem has been approximated by its discrete version, a binomial regression.
An R software package is used to perform the binomial regression, from which point estimates
$\hat{\beta}_j$ and their corresponding standard errors $\text{s.e.}(\hat{\beta}_j)$ are obtained. The prior distributions in
(2.30) are finally elicited using these estimates as
$$\beta_j \sim N\left(\hat{\beta}_j, \text{s.e.}(\hat{\beta}_j)^2\right), \quad j = 1, \cdots, p. \tag{2.32}$$
Two criticisms of the proposed measurement error model in this context are as follows.
First, it adds an additional source of uncertainty, namely, the discrepancy between the expert’s
probability $Z_i$ and the conceptual probability $\mu_i$. Second, it imposes difficulties in
computation and implementation in the software, requiring a binomial regression approximation.
However, these drawbacks do not seem to be a high price to pay for the increased accuracy
gained by increasing the number of elicitation points of CMPs. Moreover, the use of beta or
binomial regression makes it easy to present standard regression diagnostics to the expert
as feedback.
The interactive graphs that Elicitator gives to the expert as feedback fall into three
main groups. The first group includes a box-plot, a pdf curve and some numeric statistics
of the elicited beta prior at each site. These are all interactive in the sense that they are
automatically modified if the expert changes her assessments of the mode value or the credible
interval bounds of the probability of success at each site.
The second group involves the univariate graphs that highlight the main effect of each
explanatory variable associated with each of the elicitation sites. These graphs plot the
elicited probability against the value of the site predictor with a standard regression fit. The
categorical predictors are drawn as bars to emphasize their discrete nature. Various regression
diagnostic graphs are given in the third group. These graphs help the expert consider how
the estimated prior model elicited from her assessments corresponds to her knowledge overall.
The Elicitator software is written in Java and uses open source libraries. It does not
require a commercial GIS, in contrast to the prototype of Denham and Mengersen (2007).
All statistical calculations are performed using the R statistical package. Elicitator uses a
Java package to communicate with R, without needing to run an actual instance of the R
software. This greatly increases the generality and flexibility of Elicitator as a stand-alone
tool that can be used by a wide range of experts with different backgrounds.
According to James et al. (2010), Elicitator is highly extensible, and one of the main extensions they intend to make is the ability to implement GLMs other than the logistic regression model. However, they did not discuss how this can be done for other distributions and link functions under their proposed model for measurement error.
2.5 Prior elicitation for multinomial models
An early attempt to elicit a Dirichlet prior distribution for multinomial parameters was suggested by Bunn (1978). He argued that the usual fractile assessment procedure that has been used for eliciting beta priors may be difficult and tedious to apply to their multivariate extensions, the Dirichlet priors, where more conditions and restrictions must be taken into consideration. As will be shown in Chapter 6 of this thesis, developments in computing techniques and tools make it easy to implement fractile procedures in user-friendly software that assesses quartiles and elicits Dirichlet priors effectively and interactively.
However, the approach suggested by Bunn (1978) as an alternative to the fractile method for Dirichlet elicitation was the method of ‘imaginary results’. He used two versions of this method, namely, the Equivalent Prior Samples (EPS) and the Hypothetical Future Sample (HFS), to quantify opinions about a Dirichlet prior. Specifically, let $p = (p_1, p_2, \dots, p_k)$ be the vector of multinomial probabilities, with a Dirichlet prior distribution of the form
$$f(p) = \frac{\Gamma\left(\sum_{i=1}^{k} a_i\right)}{\prod_{i=1}^{k} \Gamma(a_i)} \prod_{i=1}^{k} p_i^{a_i - 1}, \qquad \sum_{i=1}^{k} p_i = 1, \quad a_i > 0. \tag{2.33}$$
It can be shown that the posterior mean of $p_i$, say $\hat{p}_i$, after sampling $N$ data is given by
$$\hat{p}_i = \frac{a_i + n_i}{N + \sum_{j=1}^{k} a_j}, \tag{2.34}$$
where $n_i$ is the number of items, out of $N$, that fall in category $i$.
In the EPS method, the expert is asked to assess a set of prior means $p_i^*$, $i = 1, 2, \dots, k$. She also assesses the equivalent sample size of her subjective belief that would empirically give this set of probabilities. This sample size gives direct information on $\sum_{i=1}^{k} a_i$. Thus, the prior hyperparameters can be elicited as
$$a_i = p_i^* \sum_{j=1}^{k} a_j. \tag{2.35}$$
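The EPS calculation in (2.35) amounts to scaling the prior means by the assessed equivalent sample size. A minimal Python sketch follows; the function name and the illustrative numbers are assumptions made here, not part of Bunn (1978).

```python
def eps_hyperparameters(prior_means, equivalent_sample_size):
    """Dirichlet hyperparameters a_i = p_i* times the assessed total, as in (2.35).

    Hypothetical helper: the name and the input checks are illustrative only.
    """
    if abs(sum(prior_means) - 1.0) > 1e-9:
        raise ValueError("prior means must sum to one")
    if equivalent_sample_size <= 0:
        raise ValueError("equivalent sample size must be positive")
    return [p * equivalent_sample_size for p in prior_means]


# Illustrative (made-up) assessments: three categories, equivalent sample size 20.
print(eps_hyperparameters([0.5, 0.3, 0.2], 20))
```

The hyperparameters inherit any error in the assessed sample size proportionally, which is one way to see why that single assessment matters so much.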
The main criticism of the use of the EPS method here is that the expert cannot easily give an assessment for $\sum_{i=1}^{k} a_i$ directly. The assessed value does not necessarily represent her opinion accurately and may contain sources of assessment bias. Therefore, Bunn (1978) proposed the alternative HFS method, in which the expert also assesses the set of prior expectations $p_i^*$, $i = 1, 2, \dots, k$, but, in addition, she is asked to assess her posterior expectations, say $p_i^{**}$, $i = 1, 2, \dots, k$, given that a hypothetical future sample of size $M$ has resulted in $m_i$ items in category $i$, where $1 \le m_i \le M$. Hence, the hyperparameters can be elicited, using (2.34) and (2.35), as
$$a_i = p_i^* \, \frac{m_i - M p_i^{**}}{p_i^{**} - p_i^*}. \tag{2.36}$$
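The step from (2.34) and (2.35) to (2.36) can be sketched as follows, writing $A = \sum_{j} a_j$ as a shorthand introduced here for convenience (it is not notation from the source):

```latex
\begin{align*}
p_i^{**} &= \frac{a_i + m_i}{M + A}, \qquad a_i = p_i^{*} A, \\
p_i^{**}\,(M + A) &= p_i^{*} A + m_i, \\
A\,(p_i^{**} - p_i^{*}) &= m_i - M p_i^{**}, \\
A &= \frac{m_i - M p_i^{**}}{p_i^{**} - p_i^{*}}, \qquad a_i = p_i^{*} A .
\end{align*}
```

The last line shows that a single category with $p_i^{**} \neq p_i^{*}$ is enough to determine $A$, and hence every $a_i$.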
The main source of bias in the HFS method is ‘conservatism’: in response to the new data, the expert tends to revise her probabilistic beliefs from prior expectations to posterior expectations ‘insufficiently’ compared with the revision indicated by Bayes’ theorem. The strong assumptions of the HFS method, that the expert can be an ‘intuitive Bayesian’ and can modify her prior beliefs in the light of new data sets, turned out to be poorly satisfied in the case study of Bunn (1978) and other studies mentioned therein. For example, in eliciting beta priors, Winkler (1967) found that the methods of imaginary results gave greater bias than the usual fractile methods.
Another problem with the two methods suggested by Bunn (1978) is that probability means are directly elicited from the expert. We believe that medians are easier to assess and that, by using the bisection method, the expert will represent her beliefs more accurately. Although the unit-sum constraint on the probability assessments can be directly fulfilled by assessments of means (the means must sum to one), median assessments of these probabilities can be elicited for beta marginal or conditional distributions. Methods for reconciling elicited beta distributions into a Dirichlet prior are proposed in Chapter 6.
For the HFS method, Bunn (1978) did not give any suggestion about the selection of the hypothetical sample. Instead, in a case study, he used an actual sample based on a survey, and called his method an Actual Future Sample (AFS). To investigate the feasibility of this method and its possible biases and subjective inconsistencies, the AFS method was implemented in a case study reported in Bunn (1978). In this study, a publisher quantified his opinion about the expected market attitudes towards a new product. Different possible attitude events were summarized in three categories, for which he assessed their expected prior probabilities as
$$p_1^* = 0.20, \quad p_2^* = 0.30, \quad p_3^* = 0.50. \tag{2.37}$$
From his EPS assessment, $\sum_{i=1}^{3} a_i$ was set equal to 10.
Then, a survey of 20 customers revealed that the numbers of customers in the three categories were 6, 7, 7, respectively. Based on this survey, the publisher was asked to revise his prior probability expectations. He gave the following posterior expectations
$$p_1^{**}(A) = 0.25, \quad p_2^{**}(A) = 0.30, \quad p_3^{**}(A) = 0.45. \tag{2.38}$$
To investigate the conservatism of the publisher, the posterior expected probabilities were computed as in (2.34). Since $a_1 = 2$, $a_2 = 3$, $a_3 = 5$, the computed posterior expectations given by Bayes’ theorem are
$$p_1^{**}(C) = 0.27, \quad p_2^{**}(C) = 0.33, \quad p_3^{**}(C) = 0.40. \tag{2.39}$$
Comparing the assessed posterior probabilities $p_i^{**}(A)$ in (2.38) to the computed ones $p_i^{**}(C)$ in (2.39) reveals the conservatism of the publisher, who did not revise his prior probabilities by as much as Bayes’ theorem would revise them.
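The comparison in this case study can be reproduced with a few lines of Python. This is only a sketch: the function name is an assumption made here, while the numbers are those quoted above from Bunn (1978).

```python
def dirichlet_posterior_means(a, counts):
    """Posterior means (a_i + n_i) / (N + sum_j a_j), following (2.34)."""
    if len(a) != len(counts):
        raise ValueError("a and counts must have the same length")
    denom = sum(counts) + sum(a)
    return [(ai + ni) / denom for ai, ni in zip(a, counts)]


prior_means = [0.20, 0.30, 0.50]   # (2.37)
a = [2, 3, 5]                      # prior means times the EPS total of 10
survey_counts = [6, 7, 7]          # survey of 20 customers
assessed = [0.25, 0.30, 0.45]      # publisher's assessed posteriors, (2.38)

computed = dirichlet_posterior_means(a, survey_counts)

# Conservatism shows up as an assessed shift from the prior that is
# no larger than the shift implied by Bayes' theorem.
for p0, pa, pc in zip(prior_means, assessed, computed):
    print(f"prior={p0:.2f}  assessed={pa:.2f}  computed={pc:.2f}")
```

In every category the assessed value lies between the prior mean and the computed posterior mean, or coincides with the prior mean, which is the pattern described as conservatism.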
Bunn (1978) discussed the possible reasons for the revealed bias and inconsistency in using the methods of imaginary results for eliciting a Dirichlet prior. He argued that the expert should complete several iterations with these methods to achieve consistent results. However, he did not discuss how this might be done through feedback given to the expert, nor did he suggest any method of reconciliation. These drawbacks of the imaginary results methods suggest that a fractile method is to be preferred, especially in multivariate cases where more inconsistency can be expected.
Using the same idea as the HFS method, and consequently the same forms of equation as in Bunn (1978), Dickey et al. (1983) reintroduced the elicitation method with a different case study. The mathematical formulation of the two methods is identical. However, two main differences in the elicitation process can be identified.
In assessing the expected prior probabilities $p_i^*$, $i = 1, 2, \dots, k$, Bunn (1978) assumed that the expert is coherently aware that these assessed expected probabilities must sum to one. In contrast, in the work of Dickey et al. (1983), the expert was free to assess the expected probabilities without being conscious of any probabilistic constraints. Instead, Dickey et al. (1983) suggested normalizing the initial assessed probabilities to get the following normalized set
(2.40)
that is guaranteed to add up to one. We use this simple normalization procedure extensively for our proposed logistic normal prior in Chapters 8 and 9. An important property of a good elicitation method is that the expert is not overly conscious of the mathematical constraints on her assessments. Methods that include normalization and reconciliation procedures are generally better than those that ask the expert to make assessments that meet specified constraints.
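A minimal sketch of such a normalization step is given below. It assumes the simplest rule, dividing each raw assessment by the total; the function name and the example values are illustrative and are not taken from Dickey et al. (1983).

```python
def normalize(assessments):
    """Rescale non-negative raw assessments so that they sum to one.

    Assumed rule: divide each assessment by the sum of all assessments.
    """
    if any(p < 0 for p in assessments):
        raise ValueError("assessments must be non-negative")
    total = sum(assessments)
    if total <= 0:
        raise ValueError("at least one assessment must be positive")
    return [p / total for p in assessments]


# Made-up raw assessments that sum to 1.2 rather than 1.
raw = [0.3, 0.5, 0.4]
print(normalize(raw))
```

The rescaling keeps the ratios between categories unchanged, so the expert's relative judgements are preserved while the unit-sum constraint is enforced by the software rather than by the expert.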
The second difference between the elicitation procedure of Bunn (1978) and that of Dickey et al. (1983) regards the reconciliation of an expert’s assessments. As mentioned before, given the hypothetical sample, one expected posterior probability suffices to elicit the full vector of the Dirichlet hyperparameters. But it is usually better to assess several posterior probabilities and then reconcile the different results. Bunn (1978) regarded discrepancies in results as inconsistency on the part of the expert and suggested asking the expert to resolve inconsistency by doing many iterations of the elicitation process. On the other hand, Dickey et al. (1983) suggested reconciling different hyperparameter values by averaging them. They also advised that large discrepancies may indicate that the Dirichlet distribution is not a suitable prior.
The case study in Dickey et al. (1983) quantified a social psychologist’s opinion about the attitudes of potential jurors in law trials where the death penalty was available. Their attitudes were classified into 4 categories, and the psychologist’s assessments of the prior probabilities of the categories were:
$$p_1^* = 0.02, \quad p_2^* = 0.08, \quad p_3^* = 0.15, \quad p_4^* = 0.75. \tag{2.41}$$
The psychologist was then told that a hypothetical sample of 200 potential jurors had been distributed between the four categories as 16, 20, 32, 132. Given this information, the expert revised her prior probabilities and gave the following expected posterior probabilities:
$$p_1^{**} = 0.05, \quad p_2^{**} = 0.09, \quad p_3^{**} = 0.16, \quad p_4^{**} = 0.70. \tag{2.42}$$
Using each of these values in (2.36) gives an initial value of $a_i$, which can then be used in (2.35), together with the corresponding prior probability, to get an estimate of $\sum_{i=1}^{4} a_i$. These estimates were averaged in Dickey et al. (1983) and gave a value of 140. This gives the final elicited hyperparameter values, again from (2.35), as $a_1 = 2.8$, $a_2 = 11.2$, $a_3 = 21$, $a_4 = 105$.
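The averaging step can be traced numerically. The sketch below applies (2.36) category by category to the assessments in (2.41) and (2.42) and then averages the implied totals; the helper name is an assumption made here for illustration.

```python
def hfs_total(prior_mean, posterior_mean, count, sample_size):
    """Implied sum of the a_i from one category: (m_i - M p_i**) / (p_i** - p_i*)."""
    if posterior_mean == prior_mean:
        raise ValueError("no revision: the total is not identified from this category")
    return (count - sample_size * posterior_mean) / (posterior_mean - prior_mean)


prior = [0.02, 0.08, 0.15, 0.75]       # (2.41)
posterior = [0.05, 0.09, 0.16, 0.70]   # (2.42)
counts = [16, 20, 32, 132]             # hypothetical sample of M = 200
M = 200

# One estimate of the total per category; these come out at roughly 200, 200, 0 and 160.
totals = [hfs_total(p0, p1, m, M) for p0, p1, m in zip(prior, posterior, counts)]
mean_total = sum(totals) / len(totals)

# Final hyperparameters from (2.35), using the averaged total.
a = [p0 * mean_total for p0 in prior]
print(round(mean_total, 6), [round(ai, 6) for ai in a])
```

The four category-level estimates differ considerably, which illustrates why some form of reconciliation is needed before a single Dirichlet prior can be reported.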
In contrast to the case study in Bunn (1978), the expert here was not conservative; her posterior probabilities were closer to the relative frequencies of the hypothetical data, 0.08, 0.10, 0.16, 0.66, than to her prior probabilities. A lack of conservatism is also shown by the small value of $\sum_{i=1}^{4} a_i = 140$, compared to the hypothetical sample size of $N = 200$. Using (2.35), the posterior probabilities in (2.34) can be considered as a weighted average of the prior probabilities and the relative frequencies of the hypothetical sample, since
$$p_i^{**} = \frac{\sum_{j=1}^{k} a_j}{N + \sum_{j=1}^{k} a_j} \, p_i^* + \frac{N}{N + \sum_{j=1}^{k} a_j} \, \frac{n_i}{N}. \tag{2.43}$$
If the expert assesses $\sum_{i=1}^{k} a_i$ to be less than the hypothetical sample size $N$, then she gives more weight to the relative frequency of the hypothetical sample. If $\sum_{i=1}^{k} a_i = N$, then the expert has given her prior opinion and the data equal weight. As in Bunn (1978), Dickey et al. (1983) did not suggest a way to generate the hypothetical sample.
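The weighting interpretation can be checked directly. The following sketch, with function names assumed here for illustration, computes the weight placed on the prior mean and confirms that the weighted average agrees with the direct posterior-mean formula (2.34).

```python
def prior_weight(total_a, sample_size):
    """Weight on the prior mean: sum(a) / (N + sum(a))."""
    return total_a / (sample_size + total_a)


def posterior_mean_weighted(prior_mean, count, total_a, sample_size):
    """Posterior mean as a weighted average of prior mean and relative frequency."""
    w = prior_weight(total_a, sample_size)
    return w * prior_mean + (1.0 - w) * (count / sample_size)


# Values from the Dickey et al. (1983) case study: sum(a) = 140, N = 200.
w = prior_weight(140, 200)
fourth = posterior_mean_weighted(0.75, 132, 140, 200)
print(f"weight on prior = {w:.3f}, posterior mean for category 4 = {fourth:.3f}")
```

A total of 140 against a sample of 200 puts less than half of the weight on the prior, which matches the reading that the expert was not conservative.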
Another method for eliciting a Dirichlet prior distribution was developed by Chaloner and Duncan (1987) as an extension of their method for eliciting beta distributions (Chaloner and Duncan, 1983). Their approach relied on assessing the mode vector for the predictive distribution, and some probabilities for other vectors around the mode. These assessments were used to elicit a Dirichlet-multinomial predictive distribution that was then used to induce a Dirichlet prior distribution for multinomial sampling. The approach thus differs from other Dirichlet elicitation methods in using mode assessments and in utilizing the predictive distribution rather than the prior distribution.
The predictive distribution of a multinomial likelihood and a conjugate Dirichlet prior is a Dirichlet mixture of multinomial distributions. This distribution is referred to as a Dirichlet-multinomial distribution and its probability mass function takes the form
$$f(x_1, x_2, \dots, x_k) = \frac{\Gamma(n+1)\,\Gamma(N) \prod_{i=1}^{k} \Gamma(x_i + a_i)}{\Gamma(n+N) \prod_{i=1}^{k} \Gamma(x_i + 1) \prod_{i=1}^{k} \Gamma(a_i)}, \qquad x_i \ge 0, \quad \sum_{i=1}^{k} x_i = n, \quad a_i > 0, \quad \sum_{i=1}^{k} a_i = N. \tag{2.44}$$
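For readers who want to evaluate (2.44) numerically, a short Python sketch is given below. It works on the log scale with `math.lgamma` to avoid overflow in the gamma functions; the function name and the example parameter values are assumptions made here.

```python
from itertools import product
from math import exp, lgamma


def dirichlet_multinomial_pmf(x, a):
    """Dirichlet-multinomial pmf of (2.44), computed via log-gamma terms."""
    if len(x) != len(a):
        raise ValueError("x and a must have the same length")
    if any(xi < 0 for xi in x) or any(ai <= 0 for ai in a):
        raise ValueError("counts must be non-negative and parameters positive")
    n = sum(x)          # sample size
    big_n = sum(a)      # N in (2.44), the sum of the a_i
    log_p = lgamma(n + 1) + lgamma(big_n) - lgamma(n + big_n)
    for xi, ai in zip(x, a):
        log_p += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return exp(log_p)


# Illustrative check: probabilities over all count vectors with n = 4 sum to one.
a = [2, 3, 5]
support = [x for x in product(range(5), repeat=3) if sum(x) == 4]
total = sum(dirichlet_multinomial_pmf(x, a) for x in support)
print(total)
```

With a single observation, the probability of category $i$ reduces to $a_i / N$, which is a convenient sanity check on any implementation.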
Chaloner and Duncan (1987) proved that the Dirichlet-multinomial predictive distribution in (2.44) is a unimodal distribution for large values of $n$. They also gave sufficient conditions under which a vector, with components greater than or equal to one, is the unique mode of the Dirichlet-multinomial distribution. These conditions are mainly related to the probabilities of a set of vectors that are coordinate adjacent to the mode vector. Moreover, the identifiability of the Dirichlet prior distribution from the Dirichlet-multinomial predictive distribution was also proved.
The above results were used in an elicitation scheme that was implemented in a computer program, in Chaloner and Duncan (1987), as follows. The expert specifies a large value of $n$ as the sample size. Then she specifies a mode vector $m = (m_1, m_2, \dots, m_k)$ that satisfies $\sum_{i=1}^{k} m_i = n$ and $m_i \ge 1$. The computer program then uses a multinomial probability vector of $n^{-1} m$ to compute probabilities at some points that are component adjacent to the mode vector. These probabilities are presented to the expert and she is given the option of changing them if they do not represent her opinion adequately. The modified set of probabilities, together with the mode vector $m$, determine an initial value for the parameter vector $a$ of the Dirichlet-multinomial predictive distribution. This is also taken as the elicited parameter vector for the Dirichlet prior distribution.
The elicitation scheme of Chaloner and Duncan (1987) does not stop there. Instead, they chose to use the initially elicited vector $a$ to compute the Dirichlet-multinomial probabilities at the same points where assessments had been elicited and give them as feedback to the expert, offering her the possibility of revising them to more closely represent her opinion. Moreover, Chaloner and Duncan (1987) believed that more replications were required. Therefore, the expert was to repeat the whole process again for a number $S$ of different sample sizes $n_1, n_2, \dots, n_S$. The resulting parameter vectors $a_1, a_2, \dots, a_S$ were to be reconciled to give one final elicited vector of parameters. Chaloner and Duncan (1987) argued that it might be “dangerous” to use a specific automatic reconciliation method; instead, they recommended that the expert should examine the inconsistencies and “reconcile them introspectively”.
However, the method requires direct assessment of the sample size $n$, which might lead to improper representation of an expert’s opinion and incur more bias [Bunn (1978)]. Moreover, Chaloner and Duncan (1987) did not mention how large the assessed value $n$ should be, nor did they discuss whether the expert should keep in mind the constraint $\sum_{i=1}^{k} m_i = n$ on the mode vector $m$, or whether it may be corrected by the program if necessary. Nevertheless, it seems from their reluctance to apply any reconciliation that they preferred to leave it to the expert to make sure that the constraints were satisfied. Repeating the elicitation process for $S$ different sample sizes may constitute an extra burden on the expert, especially if she is responsible for the final reconciliation. Unfortunately, the computer program implementing their method does not seem to be available for reviewing and testing.
Instead of using means or modes, van Dorp and Mazzuchi (2000, 2003, 2004) introduced a numerical algorithm and software to specify the parameters of a beta distribution and its Dirichlet extensions using quantiles. The motivation for their work was to quantify expert opinion as beta and Dirichlet distributions for subjective Bayesian analyses. They favored assessing quantiles rather than means or modes, as betting strategies can be used by the expert to make their assessments. They started by solving for the two parameters of a beta distribution using two quantiles, as follows.
First, to ease the generalization to Dirichlet extensions, the beta distribution with two parameters $a$ and $b$ was reparameterized in terms of a location parameter $\mu = a/(a+b)$ and a shape parameter $N = a + b$. Given the values of any two quantiles, say $L$ and $U$, $L < U$, the two parameters $\mu$ and $N$ can be obtained, although solving for these two parameters involves the use of the incomplete beta function, so that no closed-form solution can be obtained. Van Dorp and Mazzuchi (2000) utilized the limiting forms of a beta distribution as $N$ tends to 0 and $\infty$ to prove the existence of at least one solution for the beta parameters in terms of any two quantiles.
They gave a numerical algorithm to determine the beta parameters using a bisection method as a numerical search procedure. If multiple solutions were found, the algorithm selects the solution with the lowest value of $N$, i.e. with the highest level of uncertainty. The algorithm was implemented in software called BETA-CALCULATOR that takes any two beta quantiles as input and outputs the corresponding values of the beta parameters.
To extend the numerical algorithm to Dirichlet parameters, van Dorp and Mazzuchi (2003, 2004) used quantiles that were assessed through direct specification of marginal beta distributions. A Dirichlet distribution as given in (2.33) was also reparameterized in terms of its mean values $\mu_i = a_i/N$, as location parameters, and $N = \sum_{i=1}^{k} a_i$ as a shape parameter. The extended algorithm was designed to use two quantiles for one of the Dirichlet variates, say $L_i$ and $U_i$, $L_i < U_i$, for the $i$th variate, and just one quantile for each of the remaining variates, say $Q_j$, $j \ne i$. Hence, the number $k$ of quantile equations that they had is exactly equal to the number of required parameters.
Following similar lines to their arguments for the beta distribution, van Dorp and Mazzuchi (2003, 2004) showed theoretically that at least one solution of the resulting system of equations always exists. The two quantiles $L_i$ and $U_i$ were first used to elicit the marginal distribution of the $i$th Dirichlet variate as $X_i \sim \text{beta}(\mu_i, N)$. The value of $N$ is then used with the quantiles $Q_j$ to elicit the remaining beta marginal distributions as $X_j \sim \text{beta}(\mu_j, N)$, $j \ne i$. If more than one solution exists, they decided to choose the solution with the smallest $N$, which is again the solution with maximum Dirichlet variance, hence giving the highest level of uncertainty. In addition to the Dirichlet distribution, they also gave another numerical algorithm for the ordered Dirichlet distribution, which differs from the Dirichlet in the domain of its variates; see Wilks (1962).
A criticism of the algorithm regards the selection of the Dirichlet variate for which two quantiles are assessed. No comment regarding the selection of this special variate was given in the published paper. The importance of its choice is that it determines the value of $N$ for all other variates and hence determines the variances of the Dirichlet distribution. If there is substantial bias in assessing these two quantiles, all elicited parameters will be highly affected as a result.
In addition, to get a better representation of an expert’s opinion in the elicitation context, it is better to use over-fitting (Kadane and Wolfson (1998)). We believe that it is preferable to assess more quantiles than the minimum necessary and then apply a reconciliation technique to estimate parameters. The expert may then be given feedback and questioned as to whether the feedback corresponds to her opinion, with re-assessment made when necessary.
A possible general multivariate distribution that can serve as a prior distribution for multinomial models is constructed using a multivariate copula function. A copula is defined as a function that represents a multivariate cumulative distribution in terms of one-dimensional marginal cumulative distribution functions. Hence, it joins marginal distributions into a multivariate distribution that has those marginals. The importance of the copula function is due to Sklar’s Theorem, which states that any joint distribution can be written in a copula form. The marginal distributions can thus be chosen independently from the dependence structure that is represented by the copula function. For an introduction to copulas, see for example Joe (1997), Frees and Valdez (1998) and Nelsen (1999).
The use of copula functions to elicit multivariate distributions has been considered in the literature; see Jouini and Clemen (1996), Clemen and Reilly (1999) and Kurowicka and Cooke (2006), among others. The joint distribution can be elicited by first assessing each marginal distribution. Then the dependence structure is elicited through the copula function. Different families and classes of copula functions have been defined for both bivariate and multivariate distributions. Jouini and Clemen (1996) used bivariate and multivariate Archimedean and Frank’s families of copulae to aggregate multiple experts’ opinions about a random quantity. However, the simplest and most intuitive family of copulae is the inversion copula [Nelsen (1999)], of the form
$$C[G_1(x_1), \dots, G_k(x_k)] = F_{(1,\dots,k)}\left\{F_1^{-1}[G_1(x_1)], \dots, F_k^{-1}[G_k(x_k)]\right\}, \tag{2.45}$$
where $G_i$ are the known marginal distribution functions, $F_{(1,\dots,k)}$ is the assumed multivariate distribution function and its marginals are $F_i$. Hence, the marginal functions $G_i$ are coupled through $F_{(1,\dots,k)}$ into a new multivariate distribution given by the copula function $C$.
The distribution $F_{(1,\dots,k)}$ is usually selected as a multivariate normal distribution, which gives a Gaussian copula [Clemen and Reilly (1999)]. It has also been taken as a multivariate $t$ distribution [Demarta and McNeil (2005)], or even as a Dirichlet distribution [Lewandowski (2008)]. The Gaussian copula function is given by
$$C[G_1(x_1), \dots, G_k(x_k)] = \Phi_{k,R}\left\{\Phi^{-1}[G_1(x_1)], \dots, \Phi^{-1}[G_k(x_k)]\right\}, \tag{2.46}$$
where $\Phi_{k,R}$ is the cdf of a $k$-variate normal distribution with zero means, unit variances, and a correlation matrix $R$ that reflects the desired dependence structure. $\Phi$ is the standard univariate normal cdf.
For eliciting a multivariate distribution, the Gaussian copula is the most appealing, see Clemen and Reilly (1999), as it is parameterized by the correlation matrix $R$ of the multivariate normal distribution; hence it only requires pairwise correlations among the variables. To elicit the Gaussian copula, any assessed positive-definite correlation matrix $R$ can be used together with the elicited marginal distributions $G_1(x_1), \dots, G_k(x_k)$. As with any other inversion copula, any univariate distributions are allowed as marginal distributions $G_i$ in the Gaussian copula.
To elicit $R$, Clemen and Reilly (1999) suggested that a pairwise rank-order correlation between each $X_i$ and $X_j$, such as Spearman’s $\rho_{ij}$ or Kendall’s $\tau_{ij}$, should be assessed. Then properties of the multivariate normal distribution are used to transform them into the product-moment Pearson correlation $r_{ij}$ as follows:
$$r_{ij} = 2 \sin(\pi \rho_{ij} / 6), \qquad \text{or} \qquad r_{ij} = \sin(\pi \tau_{ij} / 2). \tag{2.47}$$
Then the product-moment correlation matrix $R$ is formed from the elements $r_{ij}$.
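The two transformations in (2.47) are simple enough to sketch in Python. The function names below are assumptions made here for illustration; the formulas are those stated above for the multivariate normal case.

```python
from math import pi, sin


def pearson_from_spearman(rho):
    """Product-moment correlation implied by Spearman's rho under normality."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return 2.0 * sin(pi * rho / 6.0)


def pearson_from_kendall(tau):
    """Product-moment correlation implied by Kendall's tau under normality."""
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1]")
    return sin(pi * tau / 2.0)


# Illustrative (made-up) assessed rank correlations.
for value in (0.0, 0.3, 0.6, 1.0):
    print(value, round(pearson_from_spearman(value), 4), round(pearson_from_kendall(value), 4))
```

Both maps send 0 to 0 and 1 to 1, so they change only the intermediate values; the Spearman adjustment is small, while the Kendall adjustment is more pronounced.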
Clemen and Reilly (1999) suggested that only rank-order correlations should be elicited, not product-moment Pearson correlations, as the latter cannot necessarily be transformed through the function $\Phi^{-1}[G_i(\cdot)]$, while rank-order correlations transform regardless of the choice of the marginal distribution function $G_i(\cdot)$. To elicit these correlations, Clemen and Reilly (1999) mentioned three methods that can be used either separately or together. The first method involved the direct assessment of the correlation coefficient. Although people are not good at such direct assessment (Kadane and Wolfson, 1998), experimental evidence in Clemen et al. (2000) suggested that it can be a reasonable approach. The other two methods were based on assessed conditional probabilities or conditional quantiles that can be used to compute Kendall’s $\tau$ or Spearman’s $\rho$ correlation coefficients, respectively.
The method proposed by Clemen and Reilly (1999) for eliciting a correlation matrix is not guaranteed to yield a positive-definite matrix. They cited two other studies in which dependence measures were assessed in a hierarchical way using dependence trees that require fewer assessments. These studies use entropy maximization to guarantee the positive-definiteness of the resulting correlation matrix. However, Clemen and Reilly (1999) criticized this approach for the relatively constrained nature of its dependence structure modelling. Instead, they suggested that the expert should be asked to revise her assessments if the resulting correlation matrix is not positive-definite. For large problems with many variables, this revision method would generally be very tedious and confusing.
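Whether a set of pairwise assessments yields a positive-definite matrix can be checked mechanically. The sketch below attempts a Cholesky factorization in plain Python, which succeeds only for positive-definite input; it assumes a symmetric matrix, and the function name, tolerance and example matrices are illustrative choices made here rather than anything from Clemen and Reilly (1999).

```python
from math import sqrt


def is_positive_definite(matrix, tol=1e-12):
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    k = len(matrix)
    lower = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:       # non-positive pivot: not positive-definite
                    return False
                lower[i][j] = sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


# Made-up pairwise assessments that are individually valid but jointly incoherent.
incoherent = [[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]]
coherent = [[1.0, 0.5, 0.3],
            [0.5, 1.0, 0.4],
            [0.3, 0.4, 1.0]]
print(is_positive_definite(incoherent), is_positive_definite(coherent))
```

The first matrix shows how three plausible-looking pairwise correlations can fail jointly, which is the situation in which the expert would be asked to revise her assessments.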
In Chapter 7, we propose a method for eliciting a Gaussian copula function as a prior distribution for multinomial models. Our approach overcomes two problems of the method of Clemen and Reilly (1999) simultaneously. First, we transform the assessed conditional quartiles of $X_i$ and $X_j$ through $\Phi^{-1}[G_i(\cdot)]$, so that product-moment correlations can be computed on the normal scale with no need for the rank-order correlations. Second, the conditional quartiles are assessed according to the structural elicitation procedure of Kadane et al. (1980), which guarantees that the elicited correlation matrix is positive-definite.
Copula functions have been used extensively in the literature for building multivariate distributions based on known marginals. This includes, of course, building joint prior distributions for Bayesian analysis using copulae. For example, Yi and Bier (1998) utilized some copula families to construct a joint prior distribution that reflects inter-system dependencies between accident precursors in a Bayesian study to estimate accident frequencies. A Gaussian copula has not been widely used in the literature as a prior distribution for multinomial models. However, the need for a flexible joint prior distribution that effectively combines the marginal beta prior distributions of multinomial probabilities makes the Gaussian copula an attractive choice, as it gives a more general dependence structure than the usual Dirichlet distribution. An applied Bayesian study by Palomo et al. (2007) used a Gaussian copula to model external risk in project management. In one of their adopted scenarios, they assumed that any of $k$ potential disruptive events might occur, one at a time, according to a multinomial distribution. The multinomial probabilities were assigned beta marginals, and a Gaussian copula function was used as a multivariate distribution to parameterize the dependence structure between these probabilities.
2.6 Other general graphical elicitation software
This section reviews other interactive graphical elicitation software that has been reported in the literature. The software projects reviewed below cover general elicitation problems apart from those for GLMs and multinomial models, which have already been reviewed in Sections 2.4 and 2.5.
Chaloner et al. (1993) aimed to quantify experts’ opinion in the form of a prior distribution about regression coefficients in a proportional hazards regression model. In a clinical trial, prior distributions from five AIDS experts were elicited. To compare two treatments with a placebo, experts were asked to elicit the joint and marginal distributions of the survival probability under each treatment. This could be done by assessing some probabilities and quantiles to elicit a joint extreme value prior distribution for the proportional hazards model parameters.
For this purpose, they developed an interactive computer program that uses interactive graphs to elicit experts’ opinion and give them feedback. The curves of the two marginal distributions and the contour representing the joint distribution were presented to the experts. This feedback was given in the form of dynamic graphical displays of probability distributions that can be adjusted freehand.
Some of the main “lessons” learned about this elicitation process, as stated by Chaloner et al. (1993), can be summarized as follows. They stressed the importance of the dynamic graphical displays in helping experts to visualize probability distributions and in giving useful instant feedback. They also noted that it is necessary to have a clear, well-defined outline and explanation of the questions that will be addressed to the expert. In cases where an expert had to assess her best guess of a specific probability, they wanted her also to report her uncertainty about it. In assessing approximate bounds, experts found extreme percentiles easier to think about than quartiles. However, there is substantial empirical evidence that people are poor at assessing extreme quantiles [e.g. Winkler (1967); Hora et al. (1992)] and we believe that quartiles provide a more faithful representation of an expert’s opinion, especially if they are assessed using the bisection method.
A comparatively simple elicitation computer program was developed by Kadane et al. (2006) for the generalized Poisson distribution. In their paper, they explored the properties of the Conway-Maxwell-Poisson (COM-Poisson) distribution, in particular, the conjugate family of prior distributions associated with it. A computer application has been created to elicit the hyperparameters of the conjugate prior distribution of the COM-Poisson parameters.
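The COM-Poisson distribution is a two-parameter generalization of the Poisson distribution that allows for over- and under-dispersion. It has the following probability function
$$\Pr\{X = x \mid \lambda, \nu\} = \frac{\lambda^x}{(x!)^{\nu}} \, \frac{1}{Z(\lambda, \nu)}, \qquad x = 0, 1, 2, \dots,$$
where
$$Z(\lambda, \nu) = \sum_{j=0}^{\infty} \frac{\lambda^j}{(j!)^{\nu}}.$$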
The distribution indicates over-dispersion (under-dispersion) if $\nu$ is less (greater) than 1. It is the usual Poisson distribution if $\nu = 1$. Since the COM-Poisson distribution is a member of the exponential family, it has a conjugate prior of the form
$$h(\lambda, \nu) = \lambda^{a-1} e^{-\nu b} Z(\lambda, \nu)^{-c} \, \kappa(a, b, c), \qquad \lambda > 0, \quad \nu \ge 0,$$
where $\kappa(a, b, c)$ is the integration constant.
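To make the role of $\nu$ concrete, the COM-Poisson probability function above can be evaluated with a truncated normalizing sum. The Python sketch below is an illustration only: the function name, the truncation at 200 terms and the log-scale rescaling are choices made here, not part of the program of Kadane et al. (2006).

```python
from math import exp, lgamma, log


def com_poisson_pmf(x, lam, nu, terms=200):
    """Approximate Pr{X = x | lam, nu}, truncating Z(lam, nu) after `terms` terms.

    The truncation is adequate only when the series terms decay quickly,
    e.g. nu > 0, or nu = 0 with lam < 1.
    """
    if lam <= 0 or nu < 0:
        raise ValueError("require lam > 0 and nu >= 0")
    if not 0 <= x < terms:
        raise ValueError("x must lie in 0, ..., terms - 1")
    # log of lam**j / (j!)**nu for j = 0, ..., terms - 1
    logs = [j * log(lam) - nu * lgamma(j + 1) for j in range(terms)]
    shift = max(logs)                      # rescale to avoid overflow
    z_scaled = sum(exp(v - shift) for v in logs)
    return exp(logs[x] - shift) / z_scaled


# With nu = 1 the values should agree with the ordinary Poisson distribution.
print(com_poisson_pmf(0, 2.0, 1.0), exp(-2.0))
```

Setting $\nu = 1$ recovers the Poisson probabilities, while $\nu = 0$ with $\lambda < 1$ gives a geometric distribution, so both limiting cases can serve as checks.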
The computer program, available at http://www.stat.cmu.edu/COM-Poisson/, is designed to elicit the values of the hyperparameters $a$, $b$ and $c$ from the field expert. It computes and plots the histogram of the predictive distribution at allowable selected values of $a$, $b$ and $c$. Specifically,
$$\Pr\{X = x \mid a, b, c\} = \kappa(a, b, c) \int_0^{\infty} \int_0^{\infty} \lambda^{a+x-1} e^{-\nu (b + \log(x!))} Z(\lambda, \nu)^{-(c+1)} \, d\lambda \, d\nu.$$
Kadane et al. (2006) pointed out that it may be difficult for the expert to give meaningful values for the hyperparameters $a$, $b$ and $c$, since the distribution is likely to be new to her. They assumed that the expert may have some knowledge about $\Pr\{X = x\}$. Thus, the program plots the predictive distribution as feedback to the expert. She can type in or modify the values of $a$, $b$ and $c$ using sliders and see the direct impact on the predictive histogram.
However, it does not seem that the expert will be able to adjust three values simultaneously to assess a histogram that represents her prior belief. Also, some combinations are not allowed because of mathematical incoherence, and some others need large numbers of iterations to produce the histogram. A lot of adjustment may be needed before the expert is happy with the histogram, since no specific combination of the hyperparameter values is known in advance for any intended appearance of the histogram.
2.7 Concluding comments
In this chapter, we have reviewed some of the relevant research work on eliciting prior distributions for the Bayesian analysis of GLMs and multinomial models. We have also discussed and reviewed the main psychological aspects that are usually involved in making the assessments to elicit these prior distributions. In addition, we commented on some of the recent interactive graphical software that has been reported in the literature for implementing and facilitating the elicitation processes in some other statistical problems. However, this review has been restricted to work that is directly relevant to the elicitation methods proposed in this thesis. There is a huge body of research that handles elicitation problems and techniques in general. As noted earlier, psychological concerns and recommendations for efficient elicitation techniques will be taken into consideration while developing the elicitation methods proposed in this thesis. Available elicitation techniques and computer software will feed into the methods developed in the next chapters and will help in building the software to implement these proposed methods.
52
Chapter 3
The piecewise-linear model for prior elicitation in GLMs

3.1 Introduction
Generalized linear models (GLMs) constitute a natural generalization of classical linear models, where the linear predictor part is linked to the mean of the dependent variable through some link function. The distribution of the dependent variable is not necessarily assumed to be normal. The model is determined by a combination of the link function and the family of distributions to which the dependent variable belongs (see McCullagh and Nelder (1989) for an introduction to GLMs). Being very common in both frequentist and Bayesian data analysis, GLMs have attracted much research.
An important task in the Bayesian analysis of GLMs is to specify an informative prior distribution for model parameters. Suitable elicitation methods play a key role in this task of representing expert knowledge as a prior distribution (see, for example, Bedrick et al. (1996) and O'Leary et al. (2009)).
A method of quantifying opinion about a logistic regression model was developed by Garthwaite and Al-Awadhi (2006). They mentioned that the method is very flexible and can be generalized to GLMs with any link function, not just the logistic link. This generalization has been introduced by the same authors in an unpublished paper, Garthwaite and Al-Awadhi (2011). Their method has been used to quantify the opinions of ecologists (Al-Awadhi and Garthwaite (2006)) and medical doctors (Jenkinson (2007); Garthwaite et al. (2008)). However, the method makes simplifying assumptions regarding independence between the regression coefficients. One purpose of the current thesis is to extend the elicitation method so that these assumptions are unnecessary. Different methods for this extension are proposed in Chapter 4. This will significantly increase the range of situations where the method is useful.
The original method for logistic regression was developed and implemented in user-friendly interactive software. The software was re-written in Java by Jenkinson (2007), who also extended it to elicit expert opinion about some other GLMs. It has been modified and extended further by the author of the current thesis.
The software is interactive, requiring the expert to either type in assessments or plot points on graphs and bar-charts using interactive graphics. A stand-alone version of the current software is available as a Java executable (jar) file and as a Windows executable file (with .exe extension). The stand-alone versions, together with the user manual and the source code, are freely available as Prior Elicitation Graphical Software for Generalized Linear Models (PEGS-GLM) at http://statistics.open.ac.uk/elicitation. The software is intended to be executable on any machine, regardless of its operating system and without the need for any other software packages.
The current modified version of the software is more flexible in determining the options available for the user, especially for data input and results output. Some important modifications involve broadening the scope of available models and the range of the link functions, and giving the user many suggestions, help notes and video clips, questions, warning messages and directions aimed at making the software more interactive and easy to use for non-statistical experts. Useful feedback has also been added.
In this current chapter, the piecewise-linear model of Garthwaite and Al-Awadhi (2006) is reviewed, and we describe the elicitation method they propose together with the above modifications. The assessment tasks that the expert performs quantify her opinion about the regression coefficients as a multivariate normal prior distribution. The largest extension to the current version of the software is a new section for assessing expert knowledge about correlated covariates. This will be introduced in Chapter 4. Important options have been added to the method that quantify opinion about the extra parameter in GLMs that involve gamma and normal distributions. The theoretical derivation and implementation of these options are proposed in Chapter 5.
3.2 The elicitation method for piecewise-linear models (GA method)
To quantify an expert's opinion about GLMs, Garthwaite and Al-Awadhi (2011) proposed a method to elicit expert opinion about the prior distribution of regression coefficients and its hyperparameters. As mentioned before, the method, which will be referred to here as GA, is a generalization of the same authors' piecewise-linear model that they used for quantifying opinion for logistic regression (Garthwaite and Al-Awadhi (2006)).
In their work, the relationship between each continuous predictor variable and the link function (assuming all other variables are held fixed) was modeled as a piecewise-linear function. Figure 3.1 illustrates a piecewise-linear relationship between the quantity of interest, $Y$, and a continuous covariate "Weight"; the relationship corresponds to a sequence of straight lines that form a continuous line. The endpoints of the straight lines are referred to as knots.
[Software screenshot: window "Eliciting Medians of Y for values of Weight", with Weight on the horizontal axis.]
Figure 3.1: A piecewise-linear relationship given by median assessments
If the predictor variable is a categorical covariate, it is referred to as a factor. Its relationship with $Y$ corresponds to a bar chart as in Figure 3.2, where the factor takes four levels: Very large, Large, Normal and Small.
The aim of the elicitation process is to quantify opinion about the slopes of the straight lines (for continuous variables) and the heights of the bars (for factors). In the GA method, a multivariate normal distribution was used to represent prior knowledge about the regression coefficients. These coefficients were allowed to be dependent if associated with a single variable. A detailed discussion of their model is given next.
[Software screenshot: bar chart "Eliciting Medians of Y for values of X1", with Y from 0.00 to 0.95 on the vertical axis and the factor levels Very large, Large, Normal and Small on the horizontal axis.]
Figure 3.2: A bar chart relationship for a factor given by median assessments
3.2.1 The piecewise-linear model
Consider a response variable $\zeta$, with $m$ continuous covariates $R_1, R_2, \cdots, R_m$ and $n$ categorical variables (factors) $R_{m+1}, R_{m+2}, \cdots, R_{m+n}$. Each variable $R_i$ has $\delta(i) + 1$ knots, $r_{i,0}, r_{i,1}, \cdots, r_{i,\delta(i)}$, where $r_{i,j-1} < r_{i,j}$ for $j = 1, 2, \cdots, \delta(i)$ and $i = 1, 2, \cdots, m+n$. These knots represent the dividing points of the piecewise-linear relation for the continuous variables, or levels for factors, with $r_{i,0}$ taken as the reference point of each continuous covariate $R_i$, $i = 1, \cdots, m$, or the reference level of each factor $R_i$, $i = m+1, \cdots, m+n$.
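Let $\underline{r}_0$ be the overall reference point, where all variables are at their reference values, i.e.
\[ \underline{r}_0 = (r_{1,0}, r_{2,0}, \cdots, r_{m+n,0})'. \tag{3.1} \]
For the response variable $\zeta$, the expert is asked about its mean values given points on the space of the explanatory variables, i.e. about
\[ \mu(\underline{r}) = E(\zeta \mid \underline{R} = \underline{r}), \tag{3.2} \]
where $\underline{R} = (R_1, \cdots, R_{m+n})'$ and $\underline{r}$ is any specific value of $\underline{R}$.
Let
\[ Y = g[\mu(\underline{r})] = \alpha + \underline{\beta}_1' \underline{X}_1 + \underline{\beta}_2' \underline{X}_2 + \cdots + \underline{\beta}_{m+n}' \underline{X}_{m+n}, \tag{3.3} \]
where $g(.)$ is any monotonic increasing link function. If $g(.)$ is monotonic decreasing we multiply it by $-1$, then change the sign of the resulting regression coefficients. We put
\[ \underline{X}_i = (X_{i,1}, X_{i,2}, \cdots, X_{i,\delta(i)})', \qquad i = 1, 2, \cdots, m+n, \tag{3.4} \]
\[ \underline{\beta}_i = (\beta_{i,1}, \beta_{i,2}, \cdots, \beta_{i,\delta(i)})', \qquad i = 1, 2, \cdots, m+n. \tag{3.5} \]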
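The relation between $R_i$ and $\underline{X}_i$, for continuous covariates, is that
\[ X_{i,j} = \begin{cases} 0 & \text{if } R_i \le r_{i,j-1}, \\ R_i - r_{i,j-1} & \text{if } r_{i,j-1} < R_i \le r_{i,j}, \\ d_{i,j} & \text{if } r_{i,j} < R_i, \end{cases} \tag{3.6} \]
for $i = 1, 2, \cdots, m$, and $j = 1, 2, \cdots, \delta(i)$, where
\[ d_{i,j} = r_{i,j} - r_{i,j-1}. \tag{3.7} \]
For factors, $X_{i,j}$ is defined by
\[ X_{i,j} = \begin{cases} 1 & \text{if } R_i = r_{i,j}, \\ 0 & \text{otherwise}, \end{cases} \tag{3.8} \]
for $i = m+1, m+2, \cdots, m+n$, and $j = 1, 2, \cdots, \delta(i)$.
Note that, if $R_i = r_{i,0}$, then $\underline{X}_i$ is a zero vector ($i = 1, \cdots, m+n$).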
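The coding of a covariate value into the vector $\underline{X}_i$ can be illustrated with a short sketch. It is not taken from PEGS-GLM; the function names, the knot values and the factor levels are hypothetical, and the first entry of each list is assumed to be the reference knot or level, as in the text.

```python
def code_continuous(value, knots):
    """Code a continuous covariate value into (X_{i,1}, ..., X_{i,delta(i)}).

    knots = [r_{i,0}, r_{i,1}, ..., r_{i,delta(i)}], strictly increasing,
    with knots[0] assumed to be the reference knot.
    """
    coded = []
    for j in range(1, len(knots)):
        lower, upper = knots[j - 1], knots[j]
        if value <= lower:
            coded.append(0.0)              # value lies below this segment
        elif value <= upper:
            coded.append(value - lower)    # value lies inside this segment
        else:
            coded.append(upper - lower)    # value lies past it: d_{i,j}
    return coded


def code_factor(level, levels):
    """Code a factor level as indicators; levels[0] is the reference level."""
    return [1.0 if level == other else 0.0 for other in levels[1:]]


weight_knots = [50.0, 60.0, 80.0, 100.0]   # hypothetical knots
size_levels = ["Normal", "Small", "Large", "Very large"]  # hypothetical order

x_weight = code_continuous(65.0, weight_knots)
x_size = code_factor("Large", size_levels)
```

With this coding, a covariate at its reference value gives a zero vector, so the linear predictor reduces to the constant term.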
The method concentrates on an expert's opinion about each covariate $R_i$ separately, one at a time, assuming that all other covariates are kept at their reference values. Hence, for any specific value $r$, $Y_i(r)$ is defined as
\[ Y_i(r) = g[\mu_i(r)], \tag{3.9} \]
where
\[ \mu_i(r) = \mu(r_{1,0}, \cdots, r_{i-1,0}, r, r_{i+1,0}, \cdots, r_{m+n,0}) \tag{3.10} \]
denotes the mean value of $\zeta$ when $R_i$ has a value of $r$, and $R_j = r_{j,0}$, $j \ne i$.
Then
\[ Y_i(r) = \alpha + \underline{\beta}_i' \underline{X}_i, \qquad i = 1, 2, \cdots, m+n. \tag{3.11} \]
Now, for $i = 1, \cdots, m+n$, $j = 1, \cdots, \delta(i)$, let
\[ Y_{i,j} = Y_i(r_{i,j}). \tag{3.12} \]
For $\underline{\beta}_i$ as in (3.5), if $R_i$ is a factor and $r = r_{i,j}$, then, in view of (3.8),
\[ Y_{i,j} = \alpha + \beta_{i,j}, \tag{3.13} \]
hence, for factors, where $i = m+1, \cdots, m+n$, $j = 1, \cdots, \delta(i)$, we have
\[ \beta_{i,j} = Y_{i,j} - Y_{i,0}. \tag{3.14} \]
For continuous covariates, from (3.6) and (3.7), for $i = 1, \cdots, m$, $j = 1, \cdots, \delta(i)$,
\[ \beta_{i,j} = \frac{Y_{i,j} - Y_{i,j-1}}{d_{i,j}}. \tag{3.15} \]
The values of $\beta_{i,j}$ are the slopes of the piecewise-linear relation in Figure 3.1.
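The prior distribution of $\alpha$ and $\underline{\beta} = (\underline{\beta}_1', \underline{\beta}_2', \cdots, \underline{\beta}_{m+n}')'$ is assumed to be the following multivariate normal distribution,
\[ \begin{pmatrix} \alpha \\ \underline{\beta} \end{pmatrix} \sim \text{MVN}\left( \begin{pmatrix} b_0 \\ \underline{b} \end{pmatrix}, \begin{pmatrix} \sigma_{0,0} & \underline{\sigma}_1' \\ \underline{\sigma}_1 & \Sigma \end{pmatrix} \right). \tag{3.16} \]
The elicitation of the hyperparameters $b_0$, $\underline{b}$, $\sigma_{0,0}$, $\underline{\sigma}_1$ and $\Sigma$ is reviewed in the next section. The matrix $\Sigma$ is assumed to have a block-diagonal structure, as the vectors $\underline{\beta}_1, \underline{\beta}_2, \cdots, \underline{\beta}_{m+n}$ are assumed to be independent a priori. We propose three elicitation methods that relax this assumption in the next chapter.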
3.2.2 Eliciting the hyperparameters of the multivariate normal prior
The assessments that are required for eliciting all the prior hyperparameters are only medians and quartiles of $\mu_i(r)$. The monotone increasing function $g(.)$ in (3.9) is then used to transform these assessments into medians and quartiles of $Y_i(r)$. Two main properties of the assumed normal distribution of $Y$ are used extensively to elicit the hyperparameters from medians and quartiles. Namely, these properties are equating means to medians and getting variances from interquartile ranges.
It is well-known that, for normally distributed $Y$,
\[ \text{Var}(Y) = \left( \frac{Q_3 - Q_1}{1.349} \right)^2, \tag{3.17} \]
where $Q_1$ and $Q_3$ are the lower and upper quartiles of $Y$, respectively, as 1.349 is the interquartile range of a standard normal distribution.
Using the above approach, the elicitation of each hyperparameter is detailed below.
Eliciting $b_0$ and $\sigma_{0,0}$
Let $m_{0,0.5}$, $m_{0,0.25}$ and $m_{0,0.75}$ be the median, lower and upper quartiles, respectively, of $\mu(\underline{r}_0)$. Recall that $\underline{r}_0$ is defined in (3.1) as the reference point of all variables, in which case $Y$ is equal to the constant term $\alpha$. The expert assesses $m_{0,0.5}$, $m_{0,0.25}$ and $m_{0,0.75}$, which are then transformed into the corresponding quartiles of $Y$, using the monotone increasing link function $g(.)$ in (3.3), as
\[ y_{0,q} = g(m_{0,q}), \qquad \text{for } q = 0.25, 0.5, 0.75. \tag{3.18} \]
So, $b_0$ and $\sigma_{0,0}$ are elicited, in view of (3.17), as
\[ b_0 = y_{0,0.5}, \tag{3.19} \]
\[ \sigma_{0,0} = \left( \frac{y_{0,0.75} - y_{0,0.25}}{1.349} \right)^2. \tag{3.20} \]
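As a small worked illustration of (3.18) to (3.20), the sketch below assumes a logit link and hypothetical assessments of 0.30, 0.40 and 0.50 for the lower quartile, median and upper quartile of the mean response at the reference point. The function names are illustrative only and are not part of the software.

```python
import math

IQR_STANDARD_NORMAL = 1.349  # interquartile range of N(0, 1), as in (3.17)


def logit(p):
    """Logit link, used here only as an example of a monotone increasing g."""
    return math.log(p / (1.0 - p))


def elicit_b0_sigma00(m25, m50, m75, link=logit):
    """Return (b_0, sigma_{0,0}) from quartile assessments at the reference point."""
    y25, y50, y75 = link(m25), link(m50), link(m75)      # (3.18)
    b0 = y50                                             # (3.19)
    sigma00 = ((y75 - y25) / IQR_STANDARD_NORMAL) ** 2   # (3.20)
    return b0, sigma00


# Hypothetical assessments on the probability scale.
b0, sigma00 = elicit_b0_sigma00(0.30, 0.40, 0.50)
```

Note that the quartiles need not be symmetric about the median on the original scale; only the interquartile range on the link scale enters the variance.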
Eliciting $\underline{b}$
The expert is asked to assume that her previously assessed value $m_{0,0.5}$ is the true value of the mean of $\zeta$ at the reference point $r_{i,0}$, i.e. assume that $\mu_i(r_{i,0}) = m_{0,0.5}$, for each covariate $i$ in turn, $i = 1, 2, \cdots, m+n$. Given this information, she then assesses the conditional median of $\mu_i(r)$ at all other knots of $R_i$. These conditional medians are denoted by $m_{i,j,0.5}$, for $j = 1, 2, \cdots, \delta(i)$. Hence
\[ m_{i,j,0.5} = \text{The Median of } [\mu_i(r_{i,j}) \mid \mu_i(r_{i,0}) = m_{0,0.5}]. \tag{3.21} \]
The use of the software to assess these conditional medians is reviewed in detail in Section 3.3.3.
From (3.16),
\[ \underline{b} = E(\underline{\beta}) = E(\underline{\beta} \mid \alpha = b_0), \tag{3.22} \]
but, from (3.1), (3.10), (3.18) and (3.19), we have
\[ \underline{b} = E[\underline{\beta} \mid \mu_i(r_{i,0}) = m_{0,0.5}]. \tag{3.23} \]
From the conformal partitioning in (3.16), each element of $\underline{b}$ in (3.23) is of the form
\[ b_{i,j} = E[\beta_{i,j} \mid \mu_i(r_{i,0}) = m_{0,0.5}]. \tag{3.24} \]
Applying $g(.)$ on both sides of (3.21), in view of (3.9) and (3.12), we get
\[ \text{The Median of } [Y_{i,j} \mid \mu_i(r_{i,0}) = m_{0,0.5}] = y_{i,j,0.5}, \tag{3.25} \]
where
\[ y_{i,j,0.5} = g(m_{i,j,0.5}). \tag{3.26} \]
Now, from (3.24) and (3.25), $b_{i,j}$ can be elicited for factors, in view of (3.14), as
\[ b_{i,j} = y_{i,j,0.5} - y_{0,0.5}, \tag{3.27} \]
for $i = m+1, \cdots, m+n$, $j = 1, \cdots, \delta(i)$, and for continuous covariates, in view of (3.15), as
\[ b_{i,j} = \frac{y_{i,j,0.5} - y_{i,j-1,0.5}}{d_{i,j}}, \tag{3.28} \]
for $i = 1, \cdots, m$, $j = 1, \cdots, \delta(i)$.
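Equations (3.27) and (3.28) amount to simple differences of the transformed medians. A minimal sketch follows, with hypothetical function names and values; the medians are assumed to be already on the link scale, and the first knot is assumed to be the reference knot, at which the median is $y_{0,0.5}$.

```python
def elicit_b_factor(y_medians, y0_median):
    """(3.27): b_{i,j} = y_{i,j,0.5} - y_{0,0.5} for each non-reference level."""
    return [y - y0_median for y in y_medians]


def elicit_b_continuous(y_medians, y0_median, knots):
    """(3.28): slopes between consecutive knots.

    knots = [r_{i,0}, ..., r_{i,delta(i)}]; y_medians holds y_{i,j,0.5} for
    j = 1, ..., delta(i); the median at the reference knot is y0_median.
    """
    ys = [y0_median] + list(y_medians)
    return [
        (ys[j] - ys[j - 1]) / (knots[j] - knots[j - 1])  # divide by d_{i,j}
        for j in range(1, len(knots))
    ]


# Hypothetical transformed medians.
b_factor = elicit_b_factor([0.5, -0.2], 0.1)
b_weight = elicit_b_continuous([1.0, 2.0], 0.0, [0.0, 10.0, 30.0])
```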
Eliciting $\underline{\sigma}_1$
For any value $\alpha^*$ satisfying $\alpha^* \ne b_0$, it can be seen, from (3.16) and the theory of the multivariate normal distribution, that
\[ E(\underline{\beta} \mid \alpha = \alpha^*) = \underline{b} + \underline{\sigma}_1 \sigma_{0,0}^{-1} (\alpha^* - b_0), \tag{3.29} \]
from which
\[ \underline{\sigma}_1 = \frac{[E(\underline{\beta} \mid \alpha = \alpha^*) - \underline{b}]\, \sigma_{0,0}}{\alpha^* - b_0}. \tag{3.30} \]
So, $\underline{\sigma}_1$ can be elicited using assessments of $\underline{b}|\alpha^* = E(\underline{\beta} \mid \alpha = \alpha^*)$, or equivalently, the expert is asked to assess
\[ m_{i,j,0.5}|\alpha^* = \text{The Median of } [\mu_i(r_{i,j}) \mid \mu_i(r_{i,0}) = g^{-1}(\alpha^*)]. \tag{3.31} \]
Following the same approach as in (3.27) and (3.28), equation (3.31) implies, for factors, that
\[ b_{i,j}|\alpha^* = (y_{i,j,0.5}|\alpha^*) - \alpha^*, \tag{3.32} \]
for $i = m+1, \cdots, m+n$, $j = 1, \cdots, \delta(i)$, while for continuous covariates it implies that
\[ b_{i,j}|\alpha^* = \frac{(y_{i,j,0.5}|\alpha^*) - (y_{i,j-1,0.5}|\alpha^*)}{d_{i,j}}, \tag{3.33} \]
for $i = 1, \cdots, m$, $j = 1, \cdots, \delta(i)$, where
\[ y_{i,j,0.5}|\alpha^* = g(m_{i,j,0.5}|\alpha^*). \tag{3.34} \]
Using the interactive software, $\alpha^*$ is taken as $y_{0,0.75}$, and the task of assessing $m_{i,j,0.5}|y_{0,0.75}$ is detailed in Section 3.3.5.
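Equation (3.30) can be sketched in a few lines. The function name and the numerical values are hypothetical; `b_cond` stands for the conditional means $b_{i,j}|\alpha^*$ obtained as in (3.32) and (3.33).

```python
def elicit_sigma1(b_cond, b, alpha_star, b0, sigma00):
    """(3.30): covariances between alpha and each regression coefficient.

    b_cond holds the conditional means b_{i,j} | alpha*, and b the
    unconditional means b_{i,j}, in the same order.
    """
    if alpha_star == b0:
        raise ValueError("alpha* must differ from b_0")
    scale = sigma00 / (alpha_star - b0)
    return [(bc - bj) * scale for bc, bj in zip(b_cond, b)]


# Hypothetical values: the first coefficient shifts when alpha is raised
# to alpha*, the second does not.
sigma1 = elicit_sigma1([0.2, 0.05], [0.1, 0.05], alpha_star=1.0, b0=0.0, sigma00=0.5)
```

A coefficient whose conditional mean equals its unconditional mean receives a zero covariance with $\alpha$.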
Eliciting $\Sigma$
For eliciting the variance-covariance matrix $\Sigma$ of the multivariate normal prior distribution of $\underline{\beta}$, the method of GA adopts a structured approach that recursively elicits conditional lower and upper quartiles given incremented sets of the previously assessed median values. The aim of using this structural elicitation is to ensure that assessments yield a matrix $\Sigma$ that is positive-definite, as required for mathematical coherence.
The idea is that assessed conditional quartiles are transformed, under the normality assumption, into sets of conditional variances that determine all elements of $\Sigma$. The positive-definiteness of $\Sigma$ is guaranteed under a very logical condition that is quite simple to recognize and which the expert can fulfill during the elicitation process. Specifically, the expert is asked to keep reducing her uncertainty as the set of conditional values is increased. Conditioning on more information should increase her confidence in her assessed values, especially as the conditions say that her previous median assessments were accurate.
In what follows, we review the method of GA for eliciting $\Sigma$, using the same notation and equations as Garthwaite and Al-Awadhi (2006). In the next chapter, we propose a generalization of the method for the case of correlated vectors of regression coefficients.
Let the conditions that $\mu_i(r_{i,0}) = m_{0,0.5}$ and $\mu_i(r_{i,j}) = m_{i,j,0.5}$ be denoted by $m^0_{i,0}$ and $m^0_{i,j}$, respectively, for $i = 1, 2, \cdots, m+n$, $j = 1, 2, \cdots, \delta(i)$.
For each covariate $R_i$, $i = 1, 2, \cdots, m+n$, the assessment process consists of $\delta(i)$ steps. At step $k$, $k = 1, 2, \cdots, \delta(i)$, the expert is asked to assume that conditions $m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k-1}$ hold. Given this information, she assesses the conditional lower and upper quartiles of $\mu_i(r_{i,j})$, denoted by $m_{i,j,0.25}|m^0_{i,0}, \cdots, m^0_{i,k-1}$ and $m_{i,j,0.75}|m^0_{i,0}, \cdots, m^0_{i,k-1}$, respectively, for $j = k, k+1, \cdots, \delta(i)$.
The use of the interactive software to obtain the assessments of these conditional quartiles is discussed in Section 3.3.6.
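For $i = 1, 2, \cdots, m+n$, $k = 1, 2, \cdots, \delta(i)$, $j = k, k+1, \cdots, \delta(i)$, using (3.17), the assessed conditional quartiles are used to elicit the conditional variance:
\[ \text{Var}(Y_{i,j} \mid y^0_{i,0}, \cdots, y^0_{i,k-1}) = \left[ \frac{g(m_{i,j,0.75}|m^0_{i,0}, \cdots, m^0_{i,k-1}) - g(m_{i,j,0.25}|m^0_{i,0}, \cdots, m^0_{i,k-1})}{1.349} \right]^2, \tag{3.35} \]
where $y^0_{i,j}$ denotes the condition that $Y_{i,j} = y_{i,j,0.5}$, which is equivalent to $m^0_{i,j}$ from (3.10), (3.12) and (3.26).
For mathematical coherence, conditioning on more values at each further step must reduce the value of the conditional variance in (3.35). Consequently, the expert must steadily reduce her uncertainty when she moves from one step to another. In view of (3.35), this means that the assessment of the interquartile range in step $k$ must be less than that in step $k-1$, which guarantees that
\[ \text{Var}(Y_{i,j} \mid y^0_{i,0}, \cdots, y^0_{i,k}) < \text{Var}(Y_{i,j} \mid y^0_{i,0}, \cdots, y^0_{i,k-1}). \tag{3.36} \]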
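For $i = 1, 2, \cdots, m+n$, $k = 0, 1, \cdots, \delta(i)-1$, let the conditional variance-covariance matrix $A_{i,k}$ be defined as
\[ A_{i,k} = \text{Var}(Y_{i,k+1}, \cdots, Y_{i,\delta(i)} \mid y^0_{i,0}, \cdots, y^0_{i,k}). \tag{3.37} \]
To elicit the full matrix $A_{i,0}$ in the last step and investigate its positive definiteness, mathematical induction is used to obtain a positive-definite matrix $A_{i,k-1}$ from $A_{i,k}$ that has the same property.
To achieve this, let
\[ A_{i,k-1} = \begin{pmatrix} \phi_{i,k,k} & \underline{\phi}_{i,k}' \\ \underline{\phi}_{i,k} & \Phi_{i,k} \end{pmatrix}, \tag{3.38} \]
for $k = 1, 2, \cdots, \delta(i)$, where $\phi_{i,k,k}$ is a scalar, $\underline{\phi}_{i,k}$ is a vector and $\Phi_{i,k}$ is a square matrix.
In particular, the scalar $\phi_{i,k,k}$ in (3.38) is given by
\[ \phi_{i,k,k} = \text{Var}(Y_{i,k} \mid y^0_{i,0}, \cdots, y^0_{i,k-1}). \tag{3.39} \]
The scalar $\phi_{i,k,k}$ can thus be directly elicited using (3.35).
The vector $\underline{\phi}_{i,k}$ takes the form:
\[ \underline{\phi}_{i,k} = (\phi_{i,k,k+1}, \phi_{i,k,k+2}, \cdots, \phi_{i,k,\delta(i)})'. \tag{3.40} \]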
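From the theory of conditional multivariate normal distributions, and for $j = k+1, \cdots, \delta(i)$, we have
\[ \text{Var}(Y_{i,j} \mid y^0_{i,0}, \cdots, y^0_{i,k}) = \text{Var}(Y_{i,j} \mid y^0_{i,0}, \cdots, y^0_{i,k-1}) - \phi_{i,k,j}^2\, \phi_{i,k,k}^{-1}. \tag{3.41} \]
Hence, from (3.36) and (3.41), $\phi_{i,k,j}$, $j = k+1, \cdots, \delta(i)$, in (3.40) is given by
\[ \phi_{i,k,j} = \left\{ \phi_{i,k,k} \left[ \text{Var}(Y_{i,j} \mid y^0_{i,0}, y^0_{i,1}, \cdots, y^0_{i,k-1}) - \text{Var}(Y_{i,j} \mid y^0_{i,0}, y^0_{i,1}, \cdots, y^0_{i,k}) \right] \right\}^{1/2}. \tag{3.42} \]
What is left to be elicited in (3.38) is the matrix $\Phi_{i,k}$, which can be computed, using the conditional multivariate normal theory, as
\[ \Phi_{i,k} = A_{i,k} + \underline{\phi}_{i,k}\, \phi_{i,k,k}^{-1}\, \underline{\phi}_{i,k}'. \tag{3.43} \]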
Hence, the matrix $A_{i,k-1}$ in (3.38) can be obtained from $A_{i,k}$, for $k = 1, 2, \cdots, \delta(i)-1$.
Finally, $A_{i,0}$ is the result of applying the same routine recursively, starting with $A_{i,\delta(i)-1}$ as
\[ A_{i,\delta(i)-1} = \text{Var}(Y_{i,\delta(i)} \mid y^0_{i,0}, y^0_{i,1}, \cdots, y^0_{i,\delta(i)-1}). \tag{3.44} \]
It can be seen, from (3.35) and (3.44), that
\[ A_{i,\delta(i)-1} > 0. \tag{3.45} \]
From (3.38) and (3.43), we can write the determinant of $A_{i,k-1}$ as
\[ |A_{i,k-1}| = \phi_{i,k,k}\, |\Phi_{i,k} - \underline{\phi}_{i,k}\, \phi_{i,k,k}^{-1}\, \underline{\phi}_{i,k}'| = \phi_{i,k,k}\, |A_{i,k}|. \tag{3.46} \]
Hence, from (3.45) and (3.46), $A_{i,0}$ is positive-definite.
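The backward recursion from $A_{i,\delta(i)-1}$ to $A_{i,0}$ can be sketched as follows. This is an illustration under stated assumptions rather than the software's implementation: the input is assumed to be a nested dictionary `v` with `v[k][j]` equal to the elicited $\text{Var}(Y_{i,j} \mid y^0_{i,0}, \cdots, y^0_{i,k})$ for $k = 0, \cdots, \delta(i)-1$ and $j = k+1, \cdots, \delta(i)$, and the numbers are hypothetical.

```python
import math


def build_A0(v):
    """Build A_{i,0} from conditional variances via (3.38)-(3.44)."""
    delta = len(v)
    A = [[v[delta - 1][delta]]]                  # starting scalar, (3.44)
    for k in range(delta - 1, 0, -1):            # obtain A_{i,k-1} from A_{i,k}
        phi_kk = v[k - 1][k]                     # (3.39)
        phi = []
        for j in range(k + 1, delta + 1):
            drop = v[k - 1][j] - v[k][j]         # reduction in variance
            if drop < 0:
                raise ValueError("conditioning must not increase the variance")
            phi.append(math.sqrt(phi_kk * drop))  # (3.42)
        n = len(phi)
        Phi = [[A[a][b] + phi[a] * phi[b] / phi_kk for b in range(n)]
               for a in range(n)]                # (3.43)
        A = [[phi_kk] + phi] + [[phi[a]] + Phi[a] for a in range(n)]  # (3.38)
    return A


# Hypothetical conditional variances for a covariate with delta(i) = 3.
v = {
    0: {1: 1.0, 2: 1.5, 3: 2.0},
    1: {2: 1.0, 3: 1.6},
    2: {3: 1.2},
}
A0 = build_A0(v)
```

In this example the diagonal of the result reproduces the first-step variances `v[0][j]`, and by (3.46) its determinant is the product of the scalars $\phi_{i,k,k}$ and the starting variance, so it is positive whenever every elicited variance is positive.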
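Under the independence assumption between the elements of different vectors of regression coefficients, the matrix $A$ can be defined as
\[ A = \begin{pmatrix} D_1^{-1} A_{1,0} (D_1^{-1})' & \cdots & O & O & \cdots & O \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ O & \cdots & D_m^{-1} A_{m,0} (D_m^{-1})' & O & \cdots & O \\ O & \cdots & O & A_{m+1,0} & \cdots & O \\ \vdots & & \vdots & \vdots & \ddots & \vdots \\ O & \cdots & O & O & \cdots & A_{m+n,0} \end{pmatrix}, \tag{3.47} \]
where, for $i = 1, 2, \cdots, m$, each $D_i$ is a lower triangular matrix given by
\[ D_i = \begin{pmatrix} d_{i,1} & 0 & 0 & \cdots & 0 \\ d_{i,1} & d_{i,2} & 0 & \cdots & 0 \\ d_{i,1} & d_{i,2} & d_{i,3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d_{i,1} & d_{i,2} & d_{i,3} & \cdots & d_{i,\delta(i)} \end{pmatrix}. \tag{3.48} \]
With $d_{i,j}$ as defined in (3.7), $d_{i,j} > 0$, and hence $D_i^{-1}$ exists. Since, for continuous covariates, from (3.15), we have
\[ (Y_{i,1}, Y_{i,2}, \cdots, Y_{i,\delta(i)})' = (\alpha, \alpha, \cdots, \alpha)' + D_i \underline{\beta}_i, \tag{3.49} \]
then
\[ \text{Var}(D_i \underline{\beta}_i \mid \alpha) = \text{Var}((Y_{i,1}, Y_{i,2}, \cdots, Y_{i,\delta(i)})' \mid \alpha) = A_{i,0}, \qquad \text{for } i = 1, 2, \cdots, m. \tag{3.50} \]
Hence,
\[ \text{Var}(\underline{\beta}_i \mid \alpha) = \begin{cases} D_i^{-1} A_{i,0} (D_i^{-1})', & \text{for } i = 1, 2, \cdots, m, \\ A_{i,0}, & \text{for } i = m+1, m+2, \cdots, m+n. \end{cases} \tag{3.51} \]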
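In view of (3.16), the matrix $\Sigma$, as the unconditional variance of $\underline{\beta}$, can be given by
\[ \Sigma = A + \underline{\sigma}_1\, \sigma_{0,0}^{-1}\, \underline{\sigma}_1'. \tag{3.52} \]
The full variance-covariance matrix of $(\alpha, \underline{\beta}')'$ is thus positive-definite, from (3.16), (3.47) and (3.52), since
\[ \begin{vmatrix} \sigma_{0,0} & \underline{\sigma}_1' \\ \underline{\sigma}_1 & \Sigma \end{vmatrix} = \sigma_{0,0}\, |\Sigma - \underline{\sigma}_1\, \sigma_{0,0}^{-1}\, \underline{\sigma}_1'| = \sigma_{0,0}\, |A|. \tag{3.53} \]
The assessment tasks needed to elicit all the hyperparameters $b_0$, $\underline{b}$, $\sigma_{0,0}$, $\underline{\sigma}_1$ and $\Sigma$ are given in detail with the software description in Section 3.3.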
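The assembly of $\Sigma$ from the per-covariate matrices can be sketched in plain Python. The helper names and all numerical values are hypothetical. The sketch uses the fact that, for the lower triangular matrix $D_i$ built from the knot spacings $d_{i,j}$, the inverse $D_i^{-1}$ has $1/d_{i,j}$ on the diagonal and $-1/d_{i,j}$ immediately below the diagonal's left neighbour, i.e. it differences consecutive $Y$ values and divides by the spacing.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[r][k] * Q[k][c] for k in range(len(Q)))
             for c in range(len(Q[0]))] for r in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def d_inverse(d):
    """Inverse of D_i: row j has 1/d_j on the diagonal, -1/d_j just before it."""
    n = len(d)
    T = [[0.0] * n for _ in range(n)]
    for j in range(n):
        T[j][j] = 1.0 / d[j]
        if j > 0:
            T[j][j - 1] = -1.0 / d[j]
    return T


def var_beta_continuous(A0, d):
    """Var(beta_i | alpha) = D^{-1} A_{i,0} (D^{-1})' for a continuous covariate."""
    T = d_inverse(d)
    return matmul(matmul(T, A0), transpose(T))


def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for r, row in enumerate(b):
            for c, val in enumerate(row):
                out[offset + r][offset + c] = val
        offset += len(b)
    return out


def sigma_matrix(A, sigma1, sigma00):
    """Sigma = A + sigma_1 sigma_{0,0}^{-1} sigma_1'."""
    n = len(A)
    return [[A[r][c] + sigma1[r] * sigma1[c] / sigma00 for c in range(n)]
            for r in range(n)]


# Hypothetical inputs: one continuous covariate (two segments) and one
# factor with a single non-reference level.
spacings = [10.0, 20.0]
A_cont = var_beta_continuous([[1.0, 0.5], [0.5, 2.0]], spacings)
A = block_diag([A_cont, [[0.8]]])
Sigma = sigma_matrix(A, sigma1=[0.02, 0.0, 0.1], sigma00=0.4)
```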
3.2.3 Computing values for the suggested assessments
For larger elicitation problems, with many covariates or large numbers of knots per covariate, the number of required assessments increases and may overload the expert. To reduce this number of assessed quantities and help the expert to go through the elicitation process more easily, the method of GA suggests some values of assessments that can be presented by the software to the expert, as a guide for her possible assessed conditional medians and quartiles.
The expert may accept these suggestions if she finds them a reasonable representation of her opinion. Or, instead, she may change or modify them to the best of her knowledge and experience. The method of GA chooses values to suggest by extrapolation from the previously assessed medians and quartiles, assuming some patterns of dependence or independence at different knots of each covariate. The derivations of these suggestions are reviewed below.
Suggesting conditional medians
Assuming independence between $\alpha$ and $\underline{\beta}$, the conditional medians $m_{i,j,0.5}|\alpha^*$ in (3.31) that are required for eliciting $\underline{\sigma}_1$ can be suggested as follows.
Conditioning on $\alpha = \alpha^*$, and under the independence assumption, we have
\[ b_{i,j}|\alpha^* = b_{i,j}, \qquad \forall\, i, j. \tag{3.54} \]
Taking $\alpha^* = y_{0,0.75}$, and equating the right hand sides of (3.27), (3.28) to those of (3.32), (3.33), respectively, equation (3.54) implies that
\[ (y_{i,j,0.5}|y_{0,0.75}) - y_{0,0.75} = y_{i,j,0.5} - y_{0,0.5} \tag{3.55} \]
for $i = m+1, \cdots, m+n$, $j = 1, 2, \cdots, \delta(i)$, and
\[ (y_{i,j,0.5}|y_{0,0.75}) - (y_{i,j-1,0.5}|y_{0,0.75}) = y_{i,j,0.5} - y_{i,j-1,0.5} \tag{3.56} \]
for $i = 1, 2, \cdots, m$, $j = 1, 2, \cdots, \delta(i)$.
Now, from both (3.55) and (3.56), we have
\[ (y_{i,j,0.5}|y_{0,0.75}) - y_{i,j,0.5} = y_{0,0.75} - y_{0,0.5}, \tag{3.57} \]
for $i = 1, 2, \cdots, m+n$, $j = 1, 2, \cdots, \delta(i)$.
Hence, from (3.34), (3.57) and the independence assumption, a reasonable suggestion, denoted by $\tilde{m}_{i,j,0.5}|y_{0,0.75}$, for $m_{i,j,0.5}|y_{0,0.75}$ is given by
\[ \tilde{m}_{i,j,0.5}|y_{0,0.75} = g^{-1}(y_{0,0.75} - y_{0,0.5} + y_{i,j,0.5}), \tag{3.58} \]
for $i = 1, 2, \cdots, m+n$, $j = 1, 2, \cdots, \delta(i)$.
All the components in the right hand side of (3.58) can be computed from the previous assessments as in (3.18) and (3.26). Of course, if the expert accepts these suggested medians, the resulting value of $\underline{\sigma}_1$ is a zero vector.
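The suggestion in (3.58) is a shift on the link scale followed by a back-transformation. A minimal sketch, assuming a logit link and hypothetical assessments on the probability scale (the function names are illustrative only):

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(y):
    return 1.0 / (1.0 + math.exp(-y))


def suggest_conditional_median(m_ij_50, m0_50, m0_75, link=logit, inv_link=expit):
    """(3.58): shift the transformed median by y_{0,0.75} - y_{0,0.5}."""
    return inv_link(link(m0_75) - link(m0_50) + link(m_ij_50))


# Hypothetical assessments: median 0.40 and upper quartile 0.50 at the
# reference point, and a median of 0.60 at knot j of covariate i.
suggested = suggest_conditional_median(0.60, 0.40, 0.50)
```

Because the shift is the same at every knot, the suggested conditional graph is parallel, on the link scale, to the graph of the initial medians.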
Suggesting conditional quartiles for factors
The simple idea here is to assume that the expert's opinion at one factor level is independent of her opinion at other levels. This leads to conditional quartiles that are unchanged as the number of conditions increases.
In particular, let $\tilde{m}_{i,j,0.25}|m^0_{i,0}, \cdots, m^0_{i,k}$ and $\tilde{m}_{i,j,0.75}|m^0_{i,0}, \cdots, m^0_{i,k}$ be the suggested values of the conditional lower and upper quartiles, $m_{i,j,0.25}|m^0_{i,0}, \cdots, m^0_{i,k}$ and $m_{i,j,0.75}|m^0_{i,0}, \cdots, m^0_{i,k}$, respectively, as required in (3.35), for $i = m+1, \cdots, m+n$, $k = 1, 2, \cdots, \delta(i)-1$ and $j = k+1, k+2, \cdots, \delta(i)$.
Under the independence assumption, the suggested values are
(3.59)
and
(3.60)
for $i = m+1, \cdots, m+n$, $k = 1, 2, \cdots, \delta(i)-1$ and $j = k+1, k+2, \cdots, \delta(i)$.
Again, the expert can change any of these suggestions should she wish.
Suggesting conditional quartiles for continuous covariates
In offering suggestions for the conditional quartiles, $m_{i,j,0.25}|m^0_{i,0}, \cdots, m^0_{i,k}$ and $m_{i,j,0.75}|m^0_{i,0}, \cdots, m^0_{i,k}$, as required in (3.35), the method of GA distinguishes between two cases, the case where $k = 0$, and the case where $k > 0$.
In the case of $k = 0$, the assumption is that the relation between $Y$ and $R_i$ is approximately linear, instead of being piecewise-linear. Hence, we may imagine three lines emerging from $y_{0,0.5}$ at the reference knot $r_{i,0}$. The middle line connects all the medians $y_{i,j,0.5}$, while the lower (upper) line connects all the lower (upper) quartiles $g(m_{i,j,0.25}|m^0_{i,0})$ ($g(m_{i,j,0.75}|m^0_{i,0})$), at all other knots, $r_{i,j}$, for $j = 1, 2, \cdots, \delta(i)$.
The linearity assumption ensures that the slopes of each of these three lines are equal at all knots $r_{i,j}$, $j = 1, 2, \cdots, \delta(i)$. This implies that, for any value $l = 1, 2, \cdots, \delta(i)$, $l \ne j$,
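\[ \frac{y_{i,j,0.5} - g(m_{i,j,0.25}|m^0_{i,0})}{r_{i,j} - r_{i,0}} = \frac{y_{i,l,0.5} - g(m_{i,l,0.25}|m^0_{i,0})}{r_{i,l} - r_{i,0}} \tag{3.61} \]
and
\[ \frac{g(m_{i,j,0.75}|m^0_{i,0}) - y_{i,j,0.5}}{r_{i,j} - r_{i,0}} = \frac{g(m_{i,l,0.75}|m^0_{i,0}) - y_{i,l,0.5}}{r_{i,l} - r_{i,0}}. \tag{3.62} \]
Once the expert has assessed one conditional quartile, $m_{i,l,0.25}|m^0_{i,0}$ or $m_{i,l,0.75}|m^0_{i,0}$, equation (3.61) or (3.62) can be used to suggest conditional quartiles as
\[ \tilde{m}_{i,j,0.25}|m^0_{i,0} = g^{-1}\left\{ y_{i,j,0.5} - \left[ y_{i,l,0.5} - g(m_{i,l,0.25}|m^0_{i,0}) \right] \frac{r_{i,j} - r_{i,0}}{r_{i,l} - r_{i,0}} \right\} \tag{3.63} \]
or
\[ \tilde{m}_{i,j,0.75}|m^0_{i,0} = g^{-1}\left\{ y_{i,j,0.5} - \left[ y_{i,l,0.5} - g(m_{i,l,0.75}|m^0_{i,0}) \right] \frac{r_{i,j} - r_{i,0}}{r_{i,l} - r_{i,0}} \right\}, \tag{3.64} \]
respectively, for $j = 1, 2, \cdots, \delta(i)$, $j \ne l$.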
Suggestions for all conditional lower (upper) quartiles are extrapolated from only one assessed value of the conditional lower (upper) quartile. This saves the expert considerable time and effort during the elicitation process.
For the remaining assessment tasks, where $k = 1, 2, \cdots, \delta(i)-1$, a new assumption is imposed to obtain the suggested quartiles $\tilde{m}_{i,j,0.25}|m^0_{i,0}, \cdots, m^0_{i,k}$ and $\tilde{m}_{i,j,0.75}|m^0_{i,0}, \cdots, m^0_{i,k}$. The conditional correlation coefficient between $Y_{i,j}$ and $Y_{i,k}$, for $j = k+1, k+2, \cdots, \delta(i)$, is assumed to be of the form
Corr (Yt J , Yi<k\ yl 0, ■■■, y ^ ) =
(3.65)
From which, using theory of bivariate normal distributions, the conditional variance is given
by
V ar(Y y|2/?,o, • • • , & _ „ ! & ) = (1 -
• ■, 3 / ^ ) , '
(3.66)
for $j = k+1, k+2, \cdots, \delta(i)$.
Once the expert has assessed both a lower and an upper conditional quartile at any one knot, say $r_{i,k+1}$, the value of $\text{Var}(Y_{i,k+1} \mid y^0_{i,0}, \cdots, y^0_{i,k-1}, y^0_{i,k})$ can be elicited from equation (3.35). Since $\text{Var}(Y_{i,k+1} \mid y^0_{i,0}, \cdots, y^0_{i,k-1})$ has already been elicited in step $k-1$, the value of $\rho_{i,k-1}$ can be computed from (3.66) for $j = k+1$.
Substituting $\rho_{i,k-1}$ again in (3.66), and using the already elicited values of $\text{Var}(Y_{i,j} \mid y^0_{i,0}, \cdots, y^0_{i,k-1})$, for $j = k+2, \cdots, \delta(i)$, the value of $\text{Var}(Y_{i,j} \mid y^0_{i,0}, \cdots, y^0_{i,k-1}, y^0_{i,k})$ can be obtained for all $j = k+2, \cdots, \delta(i)$.
After the value of $\text{Var}(Y_{i,j} \mid y^0_{i,0}, \cdots, y^0_{i,k})$ has been elicited, we can solve the following two equations for $(y_{i,j,0.25}|y^0_{i,0}, \cdots, y^0_{i,k})$ and $(y_{i,j,0.75}|y^0_{i,0}, \cdots, y^0_{i,k})$,
\[ \text{Var}(Y_{i,j} \mid y^0_{i,0}, \cdots, y^0_{i,k}) = \left[ \frac{(y_{i,j,0.75}|y^0_{i,0}, \cdots, y^0_{i,k}) - (y_{i,j,0.25}|y^0_{i,0}, \cdots, y^0_{i,k})}{1.349} \right]^2 \tag{3.67} \]
and
\[ \frac{(y_{i,j,0.75}|y^0_{i,0}, \cdots, y^0_{i,k}) - y_{i,j,0.5}}{y_{i,j,0.5} - (y_{i,j,0.25}|y^0_{i,0}, \cdots, y^0_{i,k})} = \frac{g(m_{i,j,0.75}|m^0_{i,0}, \cdots, m^0_{i,k-1}) - y_{i,j,0.5}}{y_{i,j,0.5} - g(m_{i,j,0.25}|m^0_{i,0}, \cdots, m^0_{i,k-1})}. \tag{3.68} \]
The use of equation (3.68) aims to ensure that the asymmetry of the suggested quartiles around the median at step $k$ is the same as any asymmetry of the assessed quartiles at step $k-1$.
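Finally, in view of (3.26), the suggested quartiles are given by
\[ \tilde{m}_{i,j,0.25}|m^0_{i,0}, \cdots, m^0_{i,k} = g^{-1}(y_{i,j,0.25}|y^0_{i,0}, \cdots, y^0_{i,k}) \tag{3.69} \]
and
\[ \tilde{m}_{i,j,0.75}|m^0_{i,0}, \cdots, m^0_{i,k} = g^{-1}(y_{i,j,0.75}|y^0_{i,0}, \cdots, y^0_{i,k}) \tag{3.70} \]
for $i = 1, 2, \cdots, m$, $k = 1, 2, \cdots, \delta(i)-1$ and $j = k+1, k+2, \cdots, \delta(i)$.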
3.3 Assessment tasks and software description
The assessment procedure divides naturally into five stages, which are described in turn. A description of the method and theory for using the assessments to estimate the hyperparameters of the prior distribution was reviewed in Section 3.2.2.
3.3.1 Defining the model
The modified version of the software, PEGS-GLM, offers the expert different options for the model to be fitted. The choices available are ordinary linear regression, logistic regression, Poisson regression and any other user defined model. Ordinary linear regression assumes a normal distribution for the response variable with the identity link function. For the logistic regression the assumed distribution is Bernoulli with the logit link function. Poisson regression assumes a Poisson distribution with the logarithm link function.
The expert can choose to define any other model, in which case she will be asked to give a distribution and a link function. Available distributions are the normal, Poisson, binomial, gamma, inverse normal (inverse Gaussian), negative binomial, Bernoulli, geometric and exponential. The user is also asked for some parameters of the selected distribution where appropriate. However, the expert has the option to elicit the extra parameters of the normal and gamma distributions. Novel methods for eliciting these parameters are proposed in Chapter 5.
Available link functions are the canonical, identity, logarithm, logit, reciprocal, square root, probit, log-log, complementary log-log, power, log ratio and user defined link function. For a detailed definition of these link functions see McCullagh and Nelder (1989). For the power link function the software expects the exponent of the power function to be entered by the expert; a value of (-2) is suggested as a default. On choosing the distribution, the software suggests the suitable canonical link function so as to help the expert (see Figure 3.3).
[Software screenshot: dialogue box with fields for the number of covariates in the model, the regression model, the distribution and its second parameter, the link function, the exponent value, and a user-written link function (y = log(x)) with its inverse (x = exp(y)).]
Figure 3.3: The dialogue box for defining the model
An important modification to the software (made by the author) is that it offers a large range of GLMs. It also lets the expert write her own link function and its inverse. The program can parse both formulas and check their validity as mathematical expressions. Moreover, the program can help by checking whether the functions are valid inverses of each other.
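How the software performs the inverse check is not described here. As one possible illustration only, the sketch below tests numerically whether a candidate inverse undoes a link function on a grid of points; the function name, the grid and the tolerance are assumptions, and passing such a check on a finite grid does not prove that the two functions are inverses everywhere.

```python
import math


def are_inverses_on_grid(g, g_inv, grid, tol=1e-8):
    """Return True if g_inv(g(x)) reproduces x at every grid point."""
    for x in grid:
        try:
            back = g_inv(g(x))
        except (ValueError, OverflowError, ZeroDivisionError):
            return False  # one of the functions is undefined at this point
        if abs(back - x) > tol * max(1.0, abs(x)):
            return False
    return True


# Hypothetical user-defined link y = log(x) with inverse x = exp(y).
grid = [0.1, 0.5, 1.0, 5.0, 50.0]
log_ok = are_inverses_on_grid(math.log, math.exp, grid)
log_bad = are_inverses_on_grid(math.log, lambda y: y ** 2, grid)
```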
3.3.2 Defining the response variable and covariates
The expert determines the dependent variable with its minimum and maximum values in a dialogue box. The modified version of the software suggests the maximum and minimum values of the response variable whenever possible. The expert may still change them, but, in the light of the chosen model with the specified link function, invalid values are not accepted, and the expert is shown a warning message (for example, the range for a binomial proportion must not extend outside the interval (0,1)).
A set of explanatory variables (covariates) is chosen by the expert. Each covariate is treated as either a continuous random variable or a factor. Continuous covariates are specified with their minimum and maximum; factors are specified with their levels. For each continuous covariate, knots are chosen by the expert or suggested by the software. A reference point is chosen for each covariate, while the origin is the setting for which every covariate is at its reference point. After determining the number, names and types (continuous covariate or categorical factor) of the variables, the expert has only to give the maximum and minimum for each of her continuous covariates together with the value of its reference knot, and the modified software then suggests a suitable number of knots and the position of the reference knot relative to the other knots. The software can then divide the range and give the value of each knot. This process is done automatically to reduce the burden of data entry, but, again, the expert can change any of these.
The fractional part of each numeric value is always rounded to four decimal places, so as to avoid long decimal numbers that are neither easily readable nor suitable for graph axes. If higher precision is needed, measurement units can be modified so that data values have no more than four decimal places. For categorical factors, the expert gives the value of each level. In some cases, when the factor levels are ordinal data, for example, the expert may wish to keep the order of the factor levels, while still being able to select any level as the reference level. The author's modification of the software gives an option to select the reference level of each factor without restricting it to be the first knot (see Figure 3.2).
Using a dialogue box, the lower quartile, median and upper quartile of $\mu(\underline{r}_0)$ at the
origin are assessed, namely, $m_{0,0.25}$, $m_{0,0.5}$ and $m_{0,0.75}$, as denoted in Section 3.2.2.
These values must be inside the previously specified range of the response variable; if not, the
software warns the expert and asks her to resolve this conflict. In the expert’s opinion, the true
value of $\mu(\underline{r}_0)$ is equally likely to be bigger or smaller than the assessed median.
Together with the median, these quartiles should divide the range into four equally likely intervals.
The expert is encouraged to modify her median and quartile assessments until they divide the range
into four intervals that each seem equally likely to her. These assessed values are used as in
equations (3.18), (3.19) and (3.20), in Section 3.2.2, to estimate $b_0$ and $\sigma_{0,0}$.
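As a rough numerical illustration of how the three assessments at the origin might be turned into a mean and a variance, the sketch below assumes a logit link and the normal-quartile relation that appears later in equation (4.9): the mean is taken as the link-transformed median, and the standard deviation as the link-scale interquartile range divided by 1.349. The function names and the example assessments are hypothetical; equations (3.18) to (3.20) remain the reference for the exact form used by the method.

```python
import math

def logit(p):
    """Logit link g(p) = log(p / (1 - p)); an assumed choice of link."""
    return math.log(p / (1.0 - p))

def origin_hyperparameters(q25, median, q75, link=logit):
    """Return (b0, sigma00) from the quartile assessments at the origin.

    Assumes the link-transformed value is normally distributed, so that its
    interquartile range equals 1.349 standard deviations.
    """
    if not (q25 < median < q75):
        raise ValueError("assessments must satisfy q25 < median < q75")
    b0 = link(median)
    sd = (link(q75) - link(q25)) / 1.349
    return b0, sd ** 2

# Hypothetical assessments of a probability at the origin.
b0, sigma00 = origin_hyperparameters(0.20, 0.30, 0.45)
print(round(b0, 4), round(sigma00, 4))
```

The ordering check mirrors the software's requirement that the assessments be coherent before they are used.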
3.3.3 Initial medians assessments
In the remainder of the elicitation procedure, the expert is separately questioned about
each covariate in turn. She is asked to assume the other covariates are at their reference
values/levels and forms a piecewise-linear graph or bar chart to represent her opinion about
each separate covariate.
The previous stage elicited the expert’s median estimate, $m_{0,0.5}$, of $\mu(\underline{r}_0)$ at the
origin $\underline{r} = \underline{r}_0$. The software plots this value on the reference vertical line and
the expert is told to treat it as being correct. The expert then plots her median estimates,
$m_{i,j,0.5}$, of $\mu(\underline{r}_{i,j})$, as given in equation (3.21), to form the remainder of the graph.
She does this by using the computer mouse to ‘click’ points on the vertical lines. Straight lines are
drawn by the computer between the ‘clicked’ points, which the expert can change until she feels the
graph corresponds to her opinions.
As an illustration, Figure 3.1 shows a software graph for the variable “Weight” . The
horizontal axis gives values for the variable and the vertical axis gives values of Y. Thus the
graph plots the effect on Y as the value of “Weight” varies. The expert is told that, if the
graph is fairly flat, then the variable has less influence on Y than if the graph is more curved.
The axes and vertical lines are drawn by the software.
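The graph that the expert builds can be read as a piecewise-linear function through the clicked points. A minimal sketch of that reading is given below; it interpolates directly on the plotted scale, and the knot values and median heights are hypothetical numbers chosen only for illustration, not assessments taken from the thesis.

```python
from bisect import bisect_right

def piecewise_linear(knots, medians, x):
    """Linearly interpolate the clicked medians at covariate value x."""
    if len(knots) != len(medians) or len(knots) < 2:
        raise ValueError("need one median per knot and at least two knots")
    if not knots[0] <= x <= knots[-1]:
        raise ValueError("x lies outside the covariate range")
    # Index of the segment [knots[i], knots[i + 1]] that contains x.
    i = min(bisect_right(knots, x) - 1, len(knots) - 2)
    t = (x - knots[i]) / (knots[i + 1] - knots[i])
    return medians[i] + t * (medians[i + 1] - medians[i])

weight_knots = [10.0, 23.0, 43.3333, 71.6667, 100.0]  # illustrative knot values
clicked = [0.20, 0.30, 0.45, 0.60, 0.70]               # hypothetical median assessments
print(piecewise_linear(weight_knots, clicked, 16.5))
```

Re-clicking a point corresponds to changing one entry of `clicked`; only the two segments that meet at that knot are affected.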
For factors, bar charts are formed to represent the expert’s opinion. The value of Y has
been elicited earlier for the reference level and this gives the height of the reference bar. The
expert is told to assume that this bar is correct and to judge the appropriate heights for
other bars relative to it. These heights give the value of Y for each level when the other
covariates are at their reference values/levels. The software draws thin vertical lines for each
level and the expert specifies the height of a bar by clicking on the line with the mouse. This
is illustrated in Figure 3.2 where all bars have been specified.
The expert can change an assessment by re-clicking on a line. These median assessments,
$m_{i,j,0.5}$, for the continuous covariates and factors yield estimates of the hyperparameter
$\underline{b}$, the mean of the regression coefficient vector $\underline{\beta}$. The theoretical
derivation of this estimation is given in detail in Section 3.2.2, equations (3.26), (3.27) and (3.28).
3.3.4 The feedback stage
It is important to help the expert check that her assessments have resulted in a prior
distribution that is a reasonable representation of her opinion. This is done through a feedback
stage, in which the expert is informed of some other measurements that are inferred from her
assessments. She can review and revise her original assessments, in the light of this feedback,
if necessary. The current elicitation method has quantified the relationship between the
response variable and each covariate in turn, while assuming that all other covariates are at
their reference points. Hence, it is important that the expert has feedback that shows her
implied predictions of the response variable when all covariates are simultaneously changed
from their reference points.
The software computes the values of the response variable at some suggested design points
and presents these values to the expert to check that they are a reasonable representation of her
opinion about the response variable at each suggested design point. Figure 3.4 illustrates a
feedback screen, in which the software suggests 6 design points, each of which is a combination
of the values and levels of all covariates. Combinations 1 and 4 are the covariate values that
give the minimum and maximum response values, respectively. Combinations 2 and 3 consist
of the values that divide each covariate range into one-third and two-thirds, respectively.
Minimum and maximum values of each covariate are suggested in combinations 5 and 6,
respectively. The expert is asked to specify other design points of interest and to revise any
design points offered by the computer that are unrealistic combinations of covariates.
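For a single continuous covariate, the values entering combinations 2, 3, 5 and 6 can be sketched as follows. The function name and the example range are hypothetical, and the rounding to four decimal places follows the convention described earlier in this section. Combinations 1 and 4 are not covered, since they depend on the fitted medians rather than on the range alone.

```python
def suggested_points(lo, hi):
    """Values of one continuous covariate used in combinations 2, 3, 5 and 6."""
    span = hi - lo
    return {
        "one_third": round(lo + span / 3.0, 4),         # combination 2
        "two_thirds": round(lo + 2.0 * span / 3.0, 4),  # combination 3
        "minimum": lo,                                   # combination 5
        "maximum": hi,                                   # combination 6
    }

print(suggested_points(15.0, 100.0))
```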
[Screenshot of the feedback screen: one column per design point, Combination (1) to Combination (6), with rows “Graph values of Y” and “Scaled values of Y”, an “Overall scale factor” control, and buttons to apply the scaling factor and review the graphs, to apply it and go to the next section, or to go to the next section without scaling.]
Figure 3.4: The feedback screen
The expert is asked to check that the row of “Graph values of Y”, as given in Figure 3.4, is
an acceptable representation of her opinion at each design point. These values are predicted
from the graphs of medians that were assessed by the expert in Section 3.3.3. The values
that are outside the range of the response variable, which was specified at the start of the
elicitation process, are flagged in red. The expert can change the unacceptable values by
varying the “Overall scale factor” until the row of the “Scaled values of Y”, in Figure 3.4,
represents her opinion reasonably well in terms of the predicted values at each design point.
The scaled values of Y are computed by multiplying all regression coefficients, except the
constant term, by the selected value of the overall scale factor.
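The effect of the overall scale factor on a single predicted value can be sketched as below. The sketch assumes a logit link and assumes that a graph value of Y is the inverse link of a linear predictor whose constant term is the link-transformed value at the origin; multiplying every other coefficient by the scale factor then shrinks or stretches the distance from the origin on the link scale. All names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def scaled_prediction(y_origin, y_graph, scale, link=logit, inv_link=inv_logit):
    """Rescale one predicted value of Y, leaving the constant term unchanged."""
    eta0 = link(y_origin)  # constant term: value at the origin on the link scale
    eta = link(y_graph)    # constant term plus all covariate effects
    return inv_link(eta0 + scale * (eta - eta0))

print(round(scaled_prediction(0.30, 0.60, 0.5), 4))
```

A scale factor of 1 returns the graph value unchanged, and a factor of 0 collapses every design point to the value at the origin.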
The expert may choose to review and revise the scaled median assessments again as in
Section 3.3.3. Then she will be shown an updated feedback screen. The process will continue
until the expert is happy with the graph values of Y as presented in the feedback.
3.3.5 Conditional medians assessments
During this stage the expert is asked to assess her conditional medians, $m_{i,j,0.5}|m_{0,0.75}$, for
each covariate in turn, $i = 1, 2, \cdots, m+n$. This is done by changing the conditioning value
at the reference point from the median, $m_{0,0.5}$, to the upper quartile, $m_{0,0.75}$. See Figure 3.5
in which median assessments made in the previous stage are given together with the upper
quartile at the reference point. The expert assumes that the true value of Y at the reference
point is the given upper quartile and she is asked to change the median values at other points
to assess $m_{i,j,0.5}|m_{0,0.75}$ in the light of this new conditioning value. Conditional medians for
all values have been assessed by the expert in Figure 3.5.
These assessments are needed to elicit a part of the covariance matrix A, namely, $\underline{\sigma}_1$,
the covariances between $\alpha$ and each of the components of $\underline{\beta}$; see equations (3.32),
(3.33) and (3.34), in Section 3.2.2. Suggested values of these conditional medians are
given by the software, assuming that $\alpha$ and the components of $\underline{\beta}$ are independent; see
equation (3.58) in Section 3.2.3. The expert can change these suggested values if she wishes.
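One plausible reading of the independence assumption is sketched below. If the link-transformed value at the reference point is independent of the remaining coefficients, then moving the reference value from its median to its upper quartile shifts every other median by the same amount on the link scale. This is an illustration under that assumption and a logit link only; equation (3.58) is not reproduced here and may take a different form. All names and numbers are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def suggested_conditional_medians(medians, m0_median, m0_upper,
                                  link=logit, inv_link=inv_logit):
    """Shift each median by the link-scale move of the reference value."""
    shift = link(m0_upper) - link(m0_median)
    return [inv_link(link(m) + shift) for m in medians]

medians = [0.20, 0.30, 0.45, 0.60]  # hypothetical medians; 0.30 is the reference value
suggested = suggested_conditional_medians(medians, m0_median=0.30, m0_upper=0.40)
print([round(v, 4) for v in suggested])
```

Under this reading the suggested value at the reference point equals the upper quartile itself, which matches the conditioning described above.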
[Screenshot: interactive graph titled “Eliciting Conditional Medians of Y for values of Weight”, with Y on the vertical axis (0.00 to 0.95) and Weight on the horizontal axis (knots at 10.0, 23.0, 43.3333, 71.6667 and 100.0).]
Figure 3.5: Conditional median assessments for the continuous covariate “Weight”
3.3.6 Conditional quartiles assessments
The median assessments provide point estimates of the relationship between different
covariates and the variable Y. The remaining task is to quantify the expert’s confidence in these
estimates and their interrelationship, i.e. how accurate she believes the estimates to be and
the correlations between them for each covariate individually. Correlations between
coefficients of different covariates are estimated by three different methods proposed in Chapter 4.
In this stage, assessments of conditional lower and upper quartiles, $m_{i,j,0.25}|m^0_{i,0}$ and
$m_{i,j,0.75}|m^0_{i,0}$, respectively, are elicited. Assessing quartiles is a harder task for an expert
than assessing medians, and quite a large number of quartile assessments are required. To
assist the expert, the software suggests some quartile values by extrapolating from other
quartile assessments of the expert. The theoretical procedure for getting these suggested
values, as reviewed in Section 3.2.3, was programmed into the software to help the expert
during the current stage. The expert can change these assessments and commonly does so but,
even then, a starting value to consider seems to make the task easier.
For each continuous covariate in turn, the software displays the graph of the medians
that had been assessed earlier, $m_{i,j,0.5}$, and then sets of conditional quartile assessments,
$m_{i,j,0.25}|m^0_{i,0}$ and $m_{i,j,0.75}|m^0_{i,0}$, are elicited. For this first set of assessments, the
condition is that the value of Y at the reference value/level equals the median assessment, i.e.
$\mu(\underline{r}_{i,0}) = m_{0,0.5}$.
[Screenshot: interactive graph titled “Eliciting Quartiles of Y for values of Weight”, with Y on the vertical axis (0.00 to 0.95) and Weight on the horizontal axis (knots at 10.0, 23.0, 43.3333, 71.6667 and 100.0).]
Figure 3.6: Quartile assessments for a continuous covariate
In an interactive graph like Figures 3.6 and 3.7, the expert is asked to give her lower
and upper quartiles for Y at one point on each side of the medians for each value/level of
the covariate except for the reference value/level. The lines joining quartiles look similar
to confidence intervals and it is emphasized to the expert that there should only be a 50%
chance that the value of Y is between the lines at any point. The expert uses the computer
mouse to make assessments or change values suggested by the software.
[Screenshot: interactive bar chart for a factor, including the levels “Large” and “Very large”.]
Figure 3.7: Quartile assessments for a factor
For the second set of conditional assessments, the expert is asked to assume that the
median estimates of Y are correct at both the reference value/level and the nearest points
on each side of it, i.e. conditions $m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k}$ in Section 3.2.2. The expert
gives lower and upper quartiles at another point, $r_{i,k+1}$, and the software suggests quartiles,
$m_{i,j,0.25}|m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k}$ and $m_{i,j,0.75}|m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k}$, for the
remaining points, $j = k+2, \cdots, \delta(i)$. In Figure 3.8 lower quartiles have been assessed while
upper quartiles are to be assessed. The expert modifies quartile values so as to represent her
opinion, subject to the restriction that the current values must be within the previous set of
quartile assessments, $m_{i,j,0.25}|m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k-1}$ and
$m_{i,j,0.75}|m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k-1}$. The idea is that as
conditions increase, uncertainty should reduce. As detailed in Section 3.2.2, this condition
guarantees that the covariance matrix of the regression coefficients is positive definite.
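The restriction on each new set of quartiles can be expressed as a simple check, sketched below. The function name and the example numbers are hypothetical; the check requires each new pair of quartiles to bracket its median and to stay inside the pair assessed under fewer conditions.

```python
def quartiles_are_nested(new_lower, new_upper, medians, prev_lower, prev_upper):
    """True if every new quartile pair brackets its median and lies
    within the previous (less conditioned) quartile pair."""
    rows = zip(new_lower, new_upper, medians, prev_lower, prev_upper)
    return all(pl <= nl < med < nu <= pu for nl, nu, med, pl, pu in rows)

medians = [0.45, 0.60]
prev_lower, prev_upper = [0.35, 0.48], [0.56, 0.72]

print(quartiles_are_nested([0.38, 0.52], [0.52, 0.68], medians, prev_lower, prev_upper))
print(quartiles_are_nested([0.30, 0.52], [0.52, 0.68], medians, prev_lower, prev_upper))
```

The second call fails because its first lower quartile, 0.30, falls below the previous lower quartile, 0.35, which would correspond to uncertainty increasing after extra conditioning.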
Figure 3.8 illustrates the graph formed at that stage. The two red lines (the outer lines)
represent the previous set of quartile assessments, the second highest (black) line gives the
median assessments, and the second lowest (blue) line joins the new lower quartile
assessments. The black line joining the medians at the right two bold points represents the condition
that these medians should be treated as being correct. In assessing quartiles, the expert is
told to consider the points to which she thinks the blue line may reasonably extend.
[Screenshot: interactive graph titled “Eliciting Quartiles of Y for values of Height”.]
Figure 3.8: Assessing quartiles conditioning on two fixed points
Conditional assessments are also needed for factors. The software displays the bar chart
that was formed during the assessment of medians. Conditional on the value of the bar at the
reference level being correct, i.e. on $m^0_{i,0}$, the expert assesses a lower and an upper quartile,
$m_{i,j,0.25}|m^0_{i,0}$ and $m_{i,j,0.75}|m^0_{i,0}$, respectively, for other factor levels.
For each further set of conditional assessments, for both continuous covariates and factors,
the expert is asked to assume that a further median given by another value/level is correct
and to give her opinion about quartiles for the remaining values/levels. This is continued
until the condition includes all but one of the values/levels at one side or one at both sides,
when the expert gives her opinion about just the last one or two values/levels (see Figure 3.9).
[Screenshot: interactive bar chart for the last step of a factor, including the levels “Large” and “Very large”.]
Figure 3.9: Assessing conditional quartiles for the last level of a factor
As in other parts of the elicitation procedure, the expert uses the mouse to make
assessments. Figure 3.9 illustrates the bar chart when conditioning values are specified (indicated
by the solid squares); quartiles for the last level are marked with short horizontal blue lines
(the inner two lines), while the highest and lowest (red) lines represent the previous quartiles
conditioning on fewer medians. Again, current conditional quartiles are not allowed to lie
outside these red lines. The conditional quartile assessments,
$m_{i,j,0.25}|m^0_{i,0}, \cdots, m^0_{i,k}$ and $m_{i,j,0.75}|m^0_{i,0}, \cdots, m^0_{i,k}$,
yield estimates of the variance, $\Sigma$, of the regression coefficient vector $\underline{\beta}$; see
Section 3.2.2.
The conditional assessments complete the elicitation procedure for the case of independent
coefficients as required in Section 3.2.2.
3.4 Concluding comments
The piecewise-linear elicitation method for logistic regression introduced by Garthwaite and
Al-Awadhi (2006), as reviewed in this chapter, is widely applicable for GLMs with any
monotonic increasing link function. The method only requires conditional and unconditional
medians and quartiles to be assessed from the expert; these assessment tasks are easy to
perform using the bisection method. The number of assessed quantities is sufficient to elicit
a mean vector and a positive-definite variance-covariance matrix for a multivariate normal
prior distribution of the regression parameters of any GLM. The available modified software
has increased the applicability of the method and made its implementation easier for the
expert. However, the independence assumption between different regression coefficients that
is imposed by the method is sometimes unrealistic and needs to be relaxed. Extended methods
that relax this assumption are proposed in the next chapter.
Chapter 4
Eliciting a covariance matrix for dependent coefficients in GLMs
4.1 Introduction
For quantifying an expert’s opinion about generalized linear models (GLMs), Garthwaite and
Al-Awadhi (2011) proposed a method of eliciting opinion about the prior distribution of the
regression coefficients. This method, which will be referred to here as GA, is a generalization
of the same authors’ piecewise-linear model that they used for quantifying opinion for logistic
regression (Garthwaite and Al-Awadhi (2006)). A detailed description of their method has
been given in the previous chapter.
In their work, the relationship between each continuous predictor variable and the
dependent variable (assuming all other variables are held fixed) was modeled as a piecewise-linear
function. They used a multivariate normal distribution to represent prior knowledge about
the regression coefficients. These coefficients were allowed to be dependent if they were
associated with a single variable. However, they assumed that there was no interaction between
any variables, in the sense that regression coefficients were a priori independent if associated
with different variables.
Our aim in this chapter is to relax the independence assumption between coefficients of
different variables. In fact, in many practical situations, it may be thought that regression
coefficients of different variables should be related in the prior distribution, if the prior
distribution is to give a reasonable representation of the expert’s opinion. The expert may be
asked to state which variables this applies to. We propose three different elicitation methods
that are implemented in interactive graphical software. The software is freely available as
PEGS-GLM (Correlated Coefficients) at http://statistics.open.ac.uk/elicitation.
In the first method, after assessing additional conditional quartiles, GA’s method of
estimating the variance-covariance matrix is generalized and used to estimate the
variance-covariance matrix in generalized linear models where pairs of correlated vectors of
coefficients are not necessarily independent in the prior distribution. The second method is
designed to require a smaller number of assessments. Its generalization to the case of
various vectors of correlated coefficients is straightforward, where the required conditions for
positive-definiteness can be easily investigated. A third flexible method is proposed in which
the expert assesses the relative correlation structure for all pairs of vectors, then chooses one
of the other two methods to specify the coefficient for the highest correlated vectors. This
method automatically fulfils the requirement that the whole variance-covariance matrix must
be positive-definite. The three proposed methods are detailed below.
4.2 A proposed method for eliciting the variance-covariance matrix of a pair of correlated vectors of coefficients
In this section, we propose an elicitation method that generalizes the method of GA to
handle correlated coefficients in GLMs. We start by generalizing the equations given in
the previous chapter to make them applicable to the case of correlated coefficients. The
underlying mathematical framework is given in Section 4.2.1. The equations given there show
how the required conditional assessments are mathematically treated to elicit the
variance-covariance matrix. Our approach to assessing these conditional quartiles from the expert using
interactive software is detailed in Section 4.2.2.
4.2.1 Notations and theoretical framework
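Consider the piecewise-linear GLM of GA, with m continuous covariates $R_1, R_2, \cdots, R_m$ and
n categorical variables (factors) $R_{m+1}, R_{m+2}, \cdots, R_{m+n}$. The model has been defined in
Chapter 3, equations (3.1) to (3.15). Recall that the prior distribution of $\alpha$ and
$\underline{\beta} = (\underline{\beta}_1', \underline{\beta}_2', \cdots, \underline{\beta}_{m+n}')'$ is assumed to be
a multivariate normal distribution
\[
\begin{pmatrix} \alpha \\ \underline{\beta} \end{pmatrix}
\sim \text{MVN}\left( \begin{pmatrix} b_0 \\ \underline{b} \end{pmatrix},
\begin{pmatrix} \sigma_{0,0} & \underline{\sigma}_1' \\ \underline{\sigma}_1 & \Sigma \end{pmatrix} \right).
\tag{4.1}
\]
The elicitation of the hyperparameters $b_0$, $\underline{b}$, $\sigma_{0,0}$, $\underline{\sigma}_1$ and $\Sigma$ has been reviewed in Section 3.2.2.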
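Equation (3.52) states that $\Sigma = A + \underline{\sigma}_1 \sigma_{0,0}^{-1} \underline{\sigma}_1'$, where A has been assumed to have the
block-diagonal structure
\[
A = \begin{pmatrix}
D_1^{-1}A_{1,0}(D_1^{-1})' & \cdots & O & O & \cdots & O \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
O & \cdots & D_m^{-1}A_{m,0}(D_m^{-1})' & O & \cdots & O \\
O & \cdots & O & A_{m+1,0} & \cdots & O \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
O & \cdots & O & O & \cdots & A_{m+n,0}
\end{pmatrix},
\tag{4.2}
\]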
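where, for $i = 1, 2, \cdots, m$, each $D_i$ is a lower triangular matrix given by
\[
D_i = \begin{pmatrix}
d_{i1} & 0 & 0 & \cdots & 0 \\
d_{i1} & d_{i2} & 0 & \cdots & 0 \\
d_{i1} & d_{i2} & d_{i3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_{i1} & d_{i2} & d_{i3} & \cdots & d_{i\delta(i)}
\end{pmatrix}.
\tag{4.3}
\]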
Hence, for continuous covariates,
\[
\text{Var}(D_i\underline{\beta}_i|\alpha) = \text{Var}((Y_{i,1}, Y_{i,2}, \cdots, Y_{i,\delta(i)})'|\alpha) = A_{i,0},
\quad \text{for } i = 1, 2, \cdots, m,
\]
where
\[
Y_{i,j} = g[\mu(r_{1,0}, \cdots, r_{i-1,0}, r_{i,j}, r_{i+1,0}, \cdots, r_{m+n,0})].
\]
As required, Y is a continuous piecewise-linear function of the variable $R_i$, if all other
variables are kept at their reference values. Hence,
\[
\text{Var}(\underline{\beta}_i|\alpha) =
\begin{cases}
D_i^{-1}A_{i,0}(D_i^{-1})', & \text{for } i = 1, 2, \cdots, m, \\
A_{i,0}, & \text{for } i = m+1, m+2, \cdots, m+n.
\end{cases}
\tag{4.4}
\]
Formulae for $A_{i,0}$ are given in GA as reviewed in the previous chapter; see equations (3.37)
to (3.44).
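Instead of assuming the block-diagonal structure given by (4.2), we will conformally
partition A as
\[
A = \begin{pmatrix}
\Sigma_{1,1} & \Sigma_{1,2} & \cdots & \Sigma_{1,m+n} \\
\Sigma_{2,1} & \Sigma_{2,2} & \cdots & \Sigma_{2,m+n} \\
\vdots & \vdots & \ddots & \vdots \\
\Sigma_{m+n,1} & \Sigma_{m+n,2} & \cdots & \Sigma_{m+n,m+n}
\end{pmatrix},
\tag{4.5}
\]
where
\[
\Sigma_{i,i} = \text{Var}(\underline{\beta}_i|\alpha), \quad \text{for } i = 1, 2, \cdots, m+n,
\tag{4.6}
\]
and the submatrices $\Sigma_{s,t}$ are not necessarily zero matrices ($s = 1, 2, \cdots, m+n$,
$t = 1, 2, \cdots, m+n$ and $s \neq t$). We will estimate the $\Sigma_{s,t}$ matrices in (4.5) by
generalizing the method of GA.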
Assume that the expert believes that $\underline{\beta}_s$ and $\underline{\beta}_t$ are correlated. For $s < t$, we must
estimate the upper off-diagonal covariance submatrix $V_{s,t}$ of V, where
\[
V = \text{Var}[(\underline{\beta}_s', \underline{\beta}_t')'|\alpha] =
\begin{pmatrix} V_{s,s} & V_{s,t} \\ V_{t,s} & V_{t,t} \end{pmatrix}.
\tag{4.7}
\]
As a variance-covariance matrix is symmetric, $V_{t,s} = V_{s,t}'$.
The correlation relationships are handled one pair at a time. Suppose we are currently
interested only in the pair $\underline{\beta}_s$, $\underline{\beta}_t$, and that these are correlated in the prior distribution.
(The same procedure can be followed for each pair that is correlated.)
For $s = 1, 2, \cdots, m+n$, $t = 1, 2, \cdots, m+n$, and $s < t$, let $\delta_{st} = \delta(s) + \delta(t)$, and for
$k = 0, 1, \cdots, \delta_{st} - 1$, put
\[
A_{st,k} =
\begin{cases}
\text{Var}(Y_{s,k+1}, \cdots, Y_{s,\delta(s)}, Y_{t,1}, \cdots, Y_{t,\delta(t)} \,|\, y^0_{s,0}, \cdots, y^0_{s,k}),
& \text{for } 0 \le k \le \delta(s) - 1, \\
\text{Var}(Y_{t,k-\delta(s)+1}, \cdots, Y_{t,\delta(t)} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,k-\delta(s)}),
& \text{for } \delta(s) \le k \le \delta_{st} - 1.
\end{cases}
\]
Specifying conditional values $y^0_{i,j}$ is equivalent to conditioning on the corresponding assessed
medians $m^0_{i,j}$, as detailed in the previous chapter.
We start with
\[
A_{st,\delta_{st}-1} = \text{Var}(Y_{t,\delta(t)} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,\delta(t)-1}),
\tag{4.8}
\]
which can be computed from the conditional quartile assessments of the covariate $R_t$ at
knot $\delta(t)$. The conditioning specifies the values of Y at all previous knots of $R_t$ and all knots of
$R_s$ as well. Given these conditions, the expert assesses conditional quartiles $m_{t,\delta(t),0.25}$ and
$m_{t,\delta(t),0.75}$. The method of assessing these quartiles is detailed in Section 4.2.2. The formula
for computing the variance ensures that $A_{st,\delta_{st}-1} > 0$, since
\[
A_{st,\delta_{st}-1} = \left[ \frac{
g(m_{t,\delta(t),0.75} \,|\, m^0_{s,0}, \cdots, m^0_{s,\delta(s)}, m^0_{t,0}, \cdots, m^0_{t,\delta(t)-1})
- g(m_{t,\delta(t),0.25} \,|\, m^0_{s,0}, \cdots, m^0_{s,\delta(s)}, m^0_{t,0}, \cdots, m^0_{t,\delta(t)-1})
}{1.349} \right]^2.
\tag{4.9}
\]
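We put
\[
A_{st,k-1} = \begin{pmatrix}
\phi_{st,k,k} & \underline{\phi}_{st,k}' \\
\underline{\phi}_{st,k} & \Phi_{st,k}
\end{pmatrix},
\tag{4.10}
\]
for $k = 1, 2, \cdots, \delta_{st}$, where $\phi_{st,k,k}$ is a scalar, $\underline{\phi}_{st,k}$ is a vector and
$\Phi_{st,k}$ is a square matrix. In particular, the scalar $\phi_{st,k,k}$ in (4.10) is given by:
\[
\phi_{st,k,k} =
\begin{cases}
\text{Var}(Y_{s,k} \,|\, y^0_{s,0}, \cdots, y^0_{s,k-1}),
& \text{for } 1 \le k \le \delta(s), \\
\text{Var}(Y_{t,k-\delta(s)} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,k-\delta(s)-1}),
& \text{for } \delta(s)+1 \le k \le \delta_{st}.
\end{cases}
\tag{4.11}
\]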
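Recall from the previous chapter that, for $j = k+1, \cdots, \delta(i)$,
\[
\text{Var}(Y_{i,j} \,|\, y^0_{i,0}, \cdots, y^0_{i,k})
= \text{Var}(Y_{i,j} \,|\, y^0_{i,0}, \cdots, y^0_{i,k-1}) - \phi_{i,k,k}^{-1}\phi_{i,k,j}^2,
\tag{4.12}
\]
as a result of the theory about conditional multivariate normal distributions. Equation (4.12)
can be generalized for the case where there are two correlated vectors of coefficients. Then,
the vector $\underline{\phi}_{st,k}$ in (4.10) takes the form:
\[
\underline{\phi}_{st,k} = (\phi_{st,k,k+1}, \cdots, \phi_{st,k,\delta_{st}})',
\]
where
\[
\phi_{st,k,j} =
\begin{cases}
\left[\phi_{st,k,k}\{\text{Var}(Y_{s,j} \,|\, y^0_{s,0}, \cdots, y^0_{s,k-1})
- \text{Var}(Y_{s,j} \,|\, y^0_{s,0}, \cdots, y^0_{s,k})\}\right]^{1/2}, \\
\qquad \text{for } 1 \le k \le \delta(s), \; j = k+1, \cdots, \delta(s), \\[4pt]
\left[\phi_{st,k,k}\{\text{Var}(Y_{t,j-\delta(s)} \,|\, y^0_{s,0}, \cdots, y^0_{s,k-1})
- \text{Var}(Y_{t,j-\delta(s)} \,|\, y^0_{s,0}, \cdots, y^0_{s,k})\}\right]^{1/2}, \\
\qquad \text{for } 1 \le k \le \delta(s), \; j = \delta(s)+1, \cdots, \delta_{st}, \\[4pt]
\left[\phi_{st,k,k}\{\text{Var}(Y_{t,j-\delta(s)} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,k-\delta(s)-1})
- \text{Var}(Y_{t,j-\delta(s)} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,k-\delta(s)})\}\right]^{1/2}, \\
\qquad \text{for } \delta(s)+1 \le k \le \delta_{st}, \; j = k+1, \cdots, \delta_{st}.
\end{cases}
\tag{4.13}
\]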
The main constraint needed here is that conditioning on more values at each further step must
reduce the value of a conditional variance. The expert must therefore reduce her uncertainty
as the elicitation process progresses. It means that her assessments of each interquartile range
must steadily decrease. This will ensure that, for $i = 1, 2, \cdots, m+n$, $j > k$, $1 \le k \le \delta_{st}$,
\[
\text{Var}(Y_{i,j} \,|\, y^0_{i,0}, \cdots, y^0_{i,k-1}) > \text{Var}(Y_{i,j} \,|\, y^0_{i,0}, \cdots, y^0_{i,k}).
\tag{4.14}
\]
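Conditional variances in (4.11) and (4.13) can be written in terms of the assessed conditional
quartiles as
\[
\text{Var}(Y_{s,j} \,|\, y^0_{s,0}, \cdots, y^0_{s,k}) = \left[ \frac{
g(m_{s,j,0.75} \,|\, m^0_{s,0}, \cdots, m^0_{s,k}) - g(m_{s,j,0.25} \,|\, m^0_{s,0}, \cdots, m^0_{s,k})
}{1.349} \right]^2,
\quad \text{for } 0 \le k \le \delta(s), \; j = k+1, \cdots, \delta(s),
\tag{4.15}
\]
\[
\text{Var}(Y_{t,j} \,|\, y^0_{s,0}, \cdots, y^0_{s,k}) = \left[ \frac{
g(m_{t,j,0.75} \,|\, m^0_{s,0}, \cdots, m^0_{s,k}) - g(m_{t,j,0.25} \,|\, m^0_{s,0}, \cdots, m^0_{s,k})
}{1.349} \right]^2,
\quad \text{for } 0 \le k \le \delta(s), \; j = 1, \cdots, \delta(t),
\tag{4.16}
\]
\[
\text{Var}(Y_{t,j} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,k}) = \left[ \frac{
g(m_{t,j,0.75} \,|\, m^0_{s,0}, \cdots, m^0_{s,\delta(s)}, m^0_{t,0}, \cdots, m^0_{t,k})
- g(m_{t,j,0.25} \,|\, m^0_{s,0}, \cdots, m^0_{s,\delta(s)}, m^0_{t,0}, \cdots, m^0_{t,k})
}{1.349} \right]^2,
\quad \text{for } 1 \le k \le \delta(t), \; j = 1, \cdots, \delta(t).
\tag{4.17}
\]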
What is left to be estimated in (4.10) is the matrix $\Phi_{st,k}$, which can be computed, using the
conditional multivariate normal theory, as
\[
\Phi_{st,k} = A_{st,k} + \underline{\phi}_{st,k}\,\phi_{st,k,k}^{-1}\,\underline{\phi}_{st,k}'.
\tag{4.18}
\]
Hence, the matrix $A_{st,k-1}$ in (4.10) can be obtained from $A_{st,k}$, for $k = 1, 2, \cdots, \delta_{st} - 1$.
Finally, $A_{st,0}$ is the result of applying the same routine recursively, starting with $A_{st,\delta_{st}-1}$ as
in (4.8).
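The backward recursion can be illustrated with a small, self-contained sketch. It uses a single run of variables $Y_0, \cdots, Y_{n-1}$ rather than the two-vector indexing of the text, and it takes the positive square root as in (4.13), so it assumes non-negative conditional covariances. The input layout `cond_var` and the function name are hypothetical: `cond_var[k][i]` holds the variance of $Y_{k+i}$ given the first k variables, which is the kind of quantity obtained from conditional quartiles via (4.15) to (4.17).

```python
import math

def rebuild_covariance(cond_var):
    """Rebuild the full covariance matrix from conditional variances.

    cond_var[k][i] = Var(Y_{k+i} | Y_0, ..., Y_{k-1}), so cond_var[0]
    holds the variances before any of the Y's is conditioned on.
    """
    n = len(cond_var)
    # Start from the last step: a 1 x 1 matrix, as in (4.8).
    lam = [[cond_var[n - 1][0]]]
    for k in range(n - 2, -1, -1):
        phi_kk = cond_var[k][0]
        # Drop in each later variance when Y_k joins the conditioning set.
        drops = [cond_var[k][i + 1] - cond_var[k + 1][i] for i in range(n - k - 1)]
        if phi_kk <= 0 or min(drops) < 0:
            raise ValueError("variances must be positive and non-increasing")
        phi = [math.sqrt(phi_kk * d) for d in drops]  # analogue of (4.13)
        size = len(phi)
        big_phi = [[lam[a][b] + phi[a] * phi[b] / phi_kk  # analogue of (4.18)
                    for b in range(size)] for a in range(size)]
        # Assemble the larger matrix, analogue of (4.10).
        lam = [[phi_kk] + phi] + [[phi[a]] + big_phi[a] for a in range(size)]
    return lam

# Hypothetical conditional variances for three knots.
cond_var = [[1.0, 1.0, 1.0], [0.75, 0.9375], [0.75]]
for row in rebuild_covariance(cond_var):
    print(row)
```

The `ValueError` branch corresponds to a violation of constraint (4.14): if an assessed variance increases after extra conditioning, the square root in (4.13) is undefined.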
If $A_{st,0}$ is conformally partitioned as
\[
A_{st,0} = \begin{pmatrix} A_{s,s} & A_{s,t} \\ A_{s,t}' & A_{t,t} \end{pmatrix},
\tag{4.19}
\]
then its submatrices can be used to obtain the required conformally partitioned matrix in
(4.7), as follows. Take
\[
V = \begin{pmatrix} V_{s,s} & V_{s,t} \\ V_{s,t}' & V_{t,t} \end{pmatrix},
\]
where $V_{s,s}$ is the variance of $\underline{\beta}_s$ given $\alpha$. Clearly, $V_{s,s} = \Sigma_{s,s}$ of equation (4.5), also
$A_{s,0} = A_{s,s}$ of equation (4.19). Hence, from (4.2),
\[
V_{s,s} =
\begin{cases}
D_s^{-1}A_{s,0}(D_s^{-1})', & \text{for } s = 1, 2, \cdots, m, \\
A_{s,0}, & \text{for } s = m+1, m+2, \cdots, m+n.
\end{cases}
\]
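The submatrix $V_{s,t}$ is the covariance of $\underline{\beta}_s$ and $\underline{\beta}_t$ given $\alpha$, of the form
\[
V_{s,t} =
\begin{cases}
D_s^{-1}A_{s,t}(D_t^{-1})', & \text{for } s = 1, 2, \cdots, m, \; t = 1, 2, \cdots, m, \\
D_s^{-1}A_{s,t}, & \text{for } s = 1, 2, \cdots, m, \; t = m+1, m+2, \cdots, m+n, \\
A_{s,t}, & \text{for } s = m+1, m+2, \cdots, m+n, \; t = m+1, m+2, \cdots, m+n.
\end{cases}
\]
Noting that $A_{t,t}$ in (4.19) is the conditional variance of $\underline{\beta}_t$ given $\underline{\beta}_s$ and $\alpha$, another version
conditional only on $\alpha$ can be taken as
\[
V_{t,t} =
\begin{cases}
D_t^{-1}A_{t,t}(D_t^{-1})' + V_{t,s}V_{s,s}^{-1}V_{s,t}, & \text{for } t = 1, 2, \cdots, m, \\
A_{t,t} + V_{t,s}V_{s,s}^{-1}V_{s,t}, & \text{for } t = m+1, m+2, \cdots, m+n.
\end{cases}
\]
With this construction, in Section 4.2.3 below, the matrix V is shown to be positive-definite.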
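A small numerical check of this construction is sketched below, with a 2 x 2 block $V_{s,s}$, a 2 x 1 block $V_{s,t}$ and a scalar conditional variance standing in for the first term of $V_{t,t}$. All numbers and function names are hypothetical, and the Cholesky routine is a plain textbook implementation used only to test positive-definiteness.

```python
import math

def cholesky(a):
    """Return the lower Cholesky factor of a, or None if a is not positive-definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                if s <= 0.0:
                    return None
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def assemble_v(v_ss, v_st, cond_tt):
    """Build V from a 2 x 2 block V_ss, a 2 x 1 block V_st and a scalar
    conditional variance, using V_tt = cond_tt + V_ts V_ss^{-1} V_st."""
    (a, b), (_, d) = v_ss
    x, y = v_st
    det = a * d - b * b
    quad = (d * x * x - 2.0 * b * x * y + a * y * y) / det  # V_ts V_ss^{-1} V_st
    return [[a, b, x], [b, d, y], [x, y, cond_tt + quad]]

v = assemble_v([[1.0, 0.3], [0.3, 2.0]], [0.9, 1.2], 0.05)
print(cholesky(v) is not None)
```

Lowering the bottom-right entry of `v` by more than the conditional variance makes the factorization fail, which is why $V_{t,t}$ cannot simply be replaced by an arbitrary assessed matrix.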
4.2.2 Assessment tasks and software description
The modified elicitation software PEGS-GLM (Correlated Coefficients), which is freely
available at http://statistics.open.ac.uk/elicitation, elicits the expert’s conditional quartiles that
are needed to estimate the covariance matrix of a correlated pair of covariates. The
mathematical details have been given in Section 4.2.1. The expert is asked whether the regression
coefficients of any pair of covariates are dependent in her prior distribution. If so, she will be
asked to name the two variables that have such dependence. Then she will be shown a panel
that simultaneously displays two graphs (see Figure 4.1 or Figure 4.2).
[Screenshot: upper graph titled “Previous median values of Height” (knots at 120.0, 190.0, 260.0 and 330.0); lower graph titled “Eliciting Quartiles of Y for values of Weight conditional on previously assessed values of Height” (knots at 10.0, 23.0, 43.3333, 71.6667 and 100.0).]
Figure 4.1: Assessments needed in the first phase for correlated covariates
The upper graph of the panel is for one variable of the correlated pair. It shows the
previously assessed median values for that variable, denoted by $m^0_{i,j}$, $i = 1, 2, \cdots, m+n$,
$j = 1, 2, \cdots, \delta(i)$, as in equations (4.9) and (4.15)-(4.17). The expert is asked to assume
that these median values are the correct values of Y at the given knots. That is, they are
accurate estimates of the mean response for the specified covariate values. Conditional on
this information, the expert clicks on the lower interactive graph to assess new conditional
quartile values, denoted by $m_{i,j,0.25}$ and $m_{i,j,0.75}$, $i = 1, 2, \cdots, m+n$, $j = 1, 2, \cdots, \delta(i)$, in
equations (4.9) and (4.15)-(4.17).
The procedure consists of two phases; in the first phase the expert assesses quartile
values for the variable in the lower graph given sets of medians for the variable in the upper
graph. Specifically, these medians are denoted by $m^0_{s,0}, \cdots, m^0_{s,k}$ in equation (4.16). The set
of conditioning values of the first variable in the upper graph is incremented by one extra
value at each new step. The expert is asked to take account of the additional information and
re-assess conditional quartiles. This gives the assessments denoted by $m_{t,j,0.25}$ and $m_{t,j,0.75}$
in equation (4.16).
Step 1 of the first phase is shown in Figure 4.1, where the expert is asked to assess
conditional quartiles for different knots of the “Weight” variable in the lower graph conditioning
on the previously assessed medians $m^0_{s,0}$, $m^0_{s,1}$ of the “Height” variable at its reference knot
and one other knot. These two medians are connected by the rightmost (black) line in the
upper graph. The conditioning set also includes the median of the “Weight” variable at its
reference knot (23.0).
The upper and lower (red) curves in Figures 4.1 and 4.2 represent the previous quartile
assessments conditioning on fewer medians. Current conditional quartiles are not allowed
to lie outside these red lines. This fulfils condition (4.14), which guarantees the positive
definiteness of the variance-covariance matrix, as discussed before. Specifying these conditions
by drawing boundary lines on the graph makes it easier for the expert to absorb what the
conditional values are and what they imply. This helps her apply the idea of reducing
uncertainty as conditions increase.
The second phase starts after conditioning on the median values at all knots in the top
graph, denoted by $m^0_{s,0}, \cdots, m^0_{s,\delta(s)}$ in equation (4.17). Each further step in this second phase
adds an extra median value from the lower graph to the conditioning set. These additional
values are $m^0_{t,0}, \cdots, m^0_{t,k}$ in equation (4.17). Further conditional quartiles $m_{t,j,0.25}$ and $m_{t,j,0.75}$
are assessed in the lower graph and used in equation (4.17).
[Screenshot: upper graph titled “Previous median values of Height” (knots at 120.0, 190.0, 260.0 and 330.0); lower graph titled “Eliciting Quartiles of Y for values of Weight conditional on previously assessed values of Height” (knots at 10.0, 23.0, 43.3333, 71.6667 and 100.0).]
Figure 4.2: Assessments needed in the second phase for correlated covariates
This phase is very similar to the assessment of conditional quartiles in the GA method, as
reviewed in the previous chapter, where incremented sets of medians of the same variable are
used as conditioning sets for assessing conditional quartiles. However, in this phase previously
assessed median values at knots for a different variable ($R_s$) are also taken into consideration
when assessing conditional quartiles of $R_t$, where $s < t$.
One of the steps of the second phase is shown in Figure 4.2. In this step, the expert
is asked to assess conditional quartiles $m_{t,j,0.25}$ and $m_{t,j,0.75}$, for $j = 1, \cdots, 4$, for different
knots of the “Weight” variable in the lower graph. Some of the conditioning values are the
previously assessed medians, $m^0_{s,0}, \cdots, m^0_{s,3}$, of the “Height” variable at all of its four knots.
These are connected by the black line in the upper graph. The other conditioning value is
the median, $m^0_{t,0}$, of the “Weight” variable at its reference knot (23.0).
Suggested conditional quartiles are computed by extrapolating from other quartile assessments
in the same manner as in the GA method; see the previous chapter. The middle (green)
lines in the lower graph in Figure 4.2 represent these suggested values.
On finishing all phases of the assessment for this pair of explanatory variables, the user is
asked about other correlated pairs, and the process starts again for the new pair, if there is
one. The modified software outputs data in three different files, one containing the basic setup
data, the second containing all assessments made by the expert, and the third containing the
resulting mean vector and covariance matrix of the hyperparameter vector, which are in a
form suitable for further Bayesian analysis.
4.2.3
On the positive-definiteness of the elicited covariance matrix
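After generalizing GA’s method, as shown in Section 4.2.1 above, to estimate the variance-covariance
matrix of $\underline{\beta}_s$ and $\underline{\beta}_t$, we ended up with
$$V = \mathrm{Var}\big((\underline{\beta}_s'\ \ \underline{\beta}_t')' \mid \alpha\big) = \begin{pmatrix} \Sigma_{s,s} & V_{s,t} \\ V_{s,t}' & V_{t,t} \end{pmatrix}, \tag{4.20}$$
where $\Sigma_{s,s}$ is estimated using the method of GA. Now $V_{t,t} \neq \Sigma_{t,t}$. Instead,
$$V_{t,t} = \Sigma^*_{t,t} + V_{s,t}'\,\Sigma_{s,s}^{-1}\,V_{s,t},$$
with
$$\Sigma^*_{t,t} = \mathrm{Var}(\underline{\beta}_t \mid \underline{\beta}_s, \alpha) = \begin{cases} D_t^{-1} A_{t,t} (D_t^{-1})', & \text{for } t = 1, 2, \cdots, m, \\ A_{t,t}, & \text{for } t = m+1, m+2, \cdots, m+n. \end{cases}$$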
To check the positive-definiteness of the variance-covariance matrix $\mathrm{Var}((\underline{\beta}_s'\ \ \underline{\beta}_t')' \mid \alpha)$, we
proceed as follows. First, we will show that $V$ in (4.20) is positive-definite. Then we will find
a transformation to replace the sub-matrix $V_{t,t}$ of $V$ by the directly elicited unconditional
variance matrix $\Sigma_{t,t}$. This transformation replaces $V$ with a new matrix, say $A$, which will
be shown to be positive-definite.
Now, in the matrix $V$, we have:
• From (4.4) and (4.6), $\Sigma_{s,s}$ is positive-definite, since $A_{s,s}$ is positive-definite as shown in
the previous chapter, and from (4.3), $D_s$ are lower triangular for $s = 1, 2, \cdots, m$.
• $\Sigma^*_{t,t}$ is positive-definite since it was computed in the manner of $\Sigma_{s,s}$, above.
• Since $\Sigma_{s,s}$ is positive-definite, so is $\Sigma_{s,s}^{-1}$, and $V_{s,t}'\Sigma_{s,s}^{-1}V_{s,t}$ is sure to be positive
semi-definite. In fact, $\forall\, \underline{x} \neq \underline{0}$,
$$\underline{x}'\,V_{s,t}'\,\Sigma_{s,s}^{-1}\,V_{s,t}\,\underline{x} = (V_{s,t}\underline{x})'\,\Sigma_{s,s}^{-1}\,(V_{s,t}\underline{x}) \geq 0,$$
from the positive-definiteness of $\Sigma_{s,s}^{-1}$.
• $V_{t,t}$ is thus the sum of a positive-definite and a positive semi-definite matrix, hence $V_{t,t}$
is positive-definite.
For $V$ to be positive-definite, we use the Schur complement (Abadir and Magnus, 2005,
p.228) to show that
$$V_{t,t} - V_{s,t}'\,\Sigma_{s,s}^{-1}\,V_{s,t} = \Sigma^*_{t,t}$$
is positive-definite, which is the case.
We believe that the submatrix $\Sigma_{t,t}$ is better than $V_{t,t}$ as an estimate of $\mathrm{Var}(\underline{\beta}_t \mid \alpha)$. Note
that $V_{t,t}$ was computed by conditioning on both $\alpha$ and $\underline{\beta}_s$.
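Our aim now is to introduce a new matrix, $A$, conformally partitioned as
$$A = \begin{pmatrix} \Sigma_{s,s} & A_{s,t} \\ A_{s,t}' & \Sigma_{t,t} \end{pmatrix},$$
to replace $V$, where we believe $A$ will generally be a better estimate of the variance-covariance
matrix of $(\underline{\beta}_s'\ \ \underline{\beta}_t')' \mid \alpha$.
To this end, put
$$B = \begin{pmatrix} I & O \\ O & \Sigma_{t,t}^{1/2} V_{t,t}^{-1/2} \end{pmatrix}$$
and take $A = BVB'$. Then
$$A = \begin{pmatrix} I & O \\ O & \Sigma_{t,t}^{1/2} V_{t,t}^{-1/2} \end{pmatrix} \begin{pmatrix} \Sigma_{s,s} & V_{s,t} \\ V_{s,t}' & V_{t,t} \end{pmatrix} \begin{pmatrix} I & O \\ O & V_{t,t}^{-1/2} \Sigma_{t,t}^{1/2} \end{pmatrix} = \begin{pmatrix} \Sigma_{s,s} & V_{s,t} V_{t,t}^{-1/2} \Sigma_{t,t}^{1/2} \\ \Sigma_{t,t}^{1/2} V_{t,t}^{-1/2} V_{s,t}' & \Sigma_{t,t}^{1/2} V_{t,t}^{-1/2} V_{t,t} V_{t,t}^{-1/2} \Sigma_{t,t}^{1/2} \end{pmatrix} = \begin{pmatrix} \Sigma_{s,s} & A_{s,t} \\ A_{s,t}' & \Sigma_{t,t} \end{pmatrix},$$
with $A_{s,t} = V_{s,t} V_{t,t}^{-1/2} \Sigma_{t,t}^{1/2}$. We next investigate whether $A$ is necessarily positive-definite.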
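Since $\Sigma_{s,s}$ and $\Sigma_{t,t}$ are positive-definite, $A$ is positive-definite, using the Schur complement
again, if and only if $\Sigma_{s,s} - A_{s,t}\Sigma_{t,t}^{-1}A_{s,t}'$ is positive-definite. But
$$\begin{aligned} \Sigma_{s,s} - A_{s,t}\Sigma_{t,t}^{-1}A_{s,t}' &= \Sigma_{s,s} - \big(V_{s,t}V_{t,t}^{-1/2}\Sigma_{t,t}^{1/2}\big)\,\Sigma_{t,t}^{-1}\,\big(\Sigma_{t,t}^{1/2}V_{t,t}^{-1/2}V_{s,t}'\big) \\ &= \Sigma_{s,s} - V_{s,t}V_{t,t}^{-1/2}\big(\Sigma_{t,t}^{1/2}\Sigma_{t,t}^{-1}\Sigma_{t,t}^{1/2}\big)V_{t,t}^{-1/2}V_{s,t}' \\ &= \Sigma_{s,s} - V_{s,t}V_{t,t}^{-1}V_{s,t}'. \end{aligned}$$
Thus $\Sigma_{s,s} - A_{s,t}\Sigma_{t,t}^{-1}A_{s,t}'$ is positive-definite from the positive-definiteness of the matrix $V$. It
can be simply seen also from the matrix equation $A = BVB'$ that $A$ is positive-definite since
$V$ is positive-definite, and $B$ is non-singular (Abadir and Magnus, 2005, p.221).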
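The transformation $A = BVB'$ can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `sym_sqrt` and `replace_lower_block` and the numerical values of `V` and `sigma_tt` are hypothetical illustrations, not part of the elicitation software.

```python
import numpy as np

def sym_sqrt(m):
    """Symmetric positive-definite square root from the eigenvalue decomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(vals)) @ vecs.T

def replace_lower_block(V, p, sigma_tt):
    """Return A = B V B', whose trailing block equals sigma_tt.

    V is partitioned with a leading p x p block (Sigma_ss); sigma_tt must
    have the same size as the trailing block V_tt.
    """
    V_tt = V[p:, p:]
    B = np.eye(V.shape[0])
    # Lower-right block of B is Sigma_tt^{1/2} V_tt^{-1/2}.
    B[p:, p:] = sym_sqrt(sigma_tt) @ np.linalg.inv(sym_sqrt(V_tt))
    return B @ V @ B.T

# Illustrative (made-up) positive-definite V and unconditional Sigma_tt.
V = np.array([[2.0, 0.3, 0.5, 0.1],
              [0.3, 1.5, 0.2, 0.4],
              [0.5, 0.2, 3.0, 0.6],
              [0.1, 0.4, 0.6, 2.5]])
sigma_tt = np.array([[1.0, 0.2],
                     [0.2, 0.8]])
A = replace_lower_block(V, 2, sigma_tt)
```

In this sketch the leading block of `A` is left unchanged, the trailing block becomes `sigma_tt`, and the smallest eigenvalue of `A` stays positive, in line with the argument above.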
Now, although each variance-covariance matrix $A$ for any pair of correlated vectors of
coefficients has been shown to be positive-definite, some extra conditions must be imposed for
the whole variance-covariance matrix $A$ in (4.5) to be positive-definite. For that, a structural
elicitation method would have to be applied to the whole matrix, in which case a huge number of
conditional assessments would be needed to inter-relate all pairs, even though many of them
may be only slightly correlated. This puts an extra assessment burden on the expert and there
may be no real gain.
However, the power of this method is apparent when only one pair of vectors is highly
correlated. Another good situation for its application is when there are only a few correlated
pairs and the whole variance-covariance matrix can be re-arranged so that these are 2 x 2
partitioned matrices on the main diagonal and off-diagonal covariance matrices are zeros.
The whole matrix is sure to be positive-definite in this case. The expert should, of course,
be willing to use the proposed method to elicit each main diagonal 2 x 2 partitioned matrix
by assessing all the required conditional quartiles.
Although the variance-covariance matrix cannot be guaranteed to be positive-definite
when there are many correlated pairs of vectors, it can still be checked for positive-definiteness.
The expert may be asked to review her assessments, if needed, to fulfil the property. However,
we propose another elicitation method in the next section that not only fulfils the
positive-definiteness of $A$ in (4.5), but which also requires a smaller number of assessments.
We also combine the two methods to give a flexible approach in which the expert assesses
the variance-covariance matrix for the most highly correlated pair of vectors using the current
method. She then assesses the relative correlation of other pairs of vectors in comparison
with the most highly correlated pair of vectors. These relative correlations are scaled to give
the whole matrix. The details of this approach are presented in the next two sections.
4.3
Another elicitation method for the variance-covariance
matrix of correlated coefficients
One possible drawback of the elicitation method proposed in Section 4.2 is that the number
of conditional quartiles that the expert must assess will become uncomfortably large if
many pairs of covariates are thought to be correlated. For such situations, another method
is proposed here to elicit the off-diagonal covariance matrices. It uses a small number of
coefficients to reflect the pattern of correlation between pairs of vectors and this reduces the
number of assessments that are required. At the same time, the assessments can be used to
induce all the elements of the covariance matrix and, under suitable conditions, the resulting
variance-covariance matrix is positive-definite. These conditions can be translated into allowable
ranges shown to the expert on an interactive graph; the expert will be asked to restrict
her assessments so that conditional medians lie inside these ranges. The mathematical details
of the proposed method are given in Sections 4.3.1 and 4.3.2 below. The required assessments
for the equations in these two sections are discussed in detail in Section 4.3.3, where the use
of the interactive software to obtain the conditional medians is also discussed.
4.3.1
The case of two vectors of correlated coefficients
To reduce the number of required assessments for estimating the covariance matrix of any
correlated vectors $\underline{\beta}_s$ and $\underline{\beta}_t$, we assume a fixed pattern of correlation between the elements
of these two vectors. We must make some simplifying assumptions about the correlation
between these vectors. If the variance-covariance matrix of $\underline{\beta}_s$ were the identity matrix and
the same were true for $\underline{\beta}_t$, then it might be reasonable to assume that any component of
$\underline{\beta}_s$ had the same correlation with each component of $\underline{\beta}_t$, and vice-versa. Of course, the
variances of $\underline{\beta}_s$ and $\underline{\beta}_t$ are not identity matrices. Instead, we transform $\underline{\beta}_s$
and $\underline{\beta}_t$ into $\underline{\xi}_s$ and $\underline{\xi}_t$, respectively, for which $\mathrm{Var}(\underline{\xi}_s) = I_{\delta(s)}$ and
$\mathrm{Var}(\underline{\xi}_t) = I_{\delta(t)}$. Then we assume that the
correlation coefficient between any element $\xi_{s,i}$ ($i = 1, 2, \cdots, \delta(s)$) of $\underline{\xi}_s$ and any element $\xi_{t,j}$
($j = 1, 2, \cdots, \delta(t)$) of $\underline{\xi}_t$ is a fixed number, $c_{s,t}$. We elicit the value of $c_{s,t}$ using a small
number of conditional assessments.
The matrices $\mathrm{Var}(\underline{\beta}_s) = \Sigma_{s,s}$ and $\mathrm{Var}(\underline{\beta}_t) = \Sigma_{t,t}$ may be estimated using the method
of GA that was reviewed in Chapter 3. These matrices are positive-definite, so there exist
non-singular matrices $A$ and $B$ such that
$$A\,\Sigma_{s,s}\,A' = I_{\delta(s)}, \qquad B\,\Sigma_{t,t}\,B' = I_{\delta(t)}.$$
In fact, we take $A$ and $B$ as the inverses of the two unique symmetric positive-definite square
roots that can be obtained from the eigenvalue decomposition of $\Sigma_{s,s}$ and $\Sigma_{t,t}$, respectively,
i.e.
$$A = \Sigma_{s,s}^{-1/2}, \qquad B = \Sigma_{t,t}^{-1/2}.$$
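The choice $A = \Sigma_{s,s}^{-1/2}$ can be illustrated with a short sketch, assuming NumPy; the helper name `inv_sym_sqrt` and the matrix `sigma_ss` are hypothetical examples rather than elicited values.

```python
import numpy as np

def inv_sym_sqrt(sigma):
    """Inverse of the symmetric positive-definite square root of sigma."""
    vals, vecs = np.linalg.eigh(sigma)      # sigma = vecs diag(vals) vecs'
    return (vecs / np.sqrt(vals)) @ vecs.T  # vecs diag(vals^{-1/2}) vecs'

# Illustrative (made-up) positive-definite variance matrix.
sigma_ss = np.array([[4.0, 1.0, 0.5],
                     [1.0, 3.0, 0.2],
                     [0.5, 0.2, 2.0]])
A_mat = inv_sym_sqrt(sigma_ss)
whitened = A_mat @ sigma_ss @ A_mat.T  # should reproduce the identity matrix
```

Because the symmetric square root is used, `A_mat` is itself symmetric, so the transformed vector has identity variance without introducing an arbitrary rotation.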
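Let $\underline{\xi}_s = A\underline{\beta}_s$ and $\underline{\xi}_t = B\underline{\beta}_t$, then
$$\mathrm{Var}(\underline{\xi}_s) = I_{\delta(s)}, \qquad \mathrm{Var}(\underline{\xi}_t) = I_{\delta(t)}.$$
We assume that
$$\mathrm{Cov}(\underline{\xi}_s, \underline{\xi}_t) = C_{s,t} = \begin{pmatrix} c_{s,t} & \cdots & c_{s,t} \\ \vdots & \ddots & \vdots \\ c_{s,t} & \cdots & c_{s,t} \end{pmatrix}_{\delta(s)\times\delta(t)} = c_{s,t}\,\underline{1}_{\delta(s)}\underline{1}_{\delta(t)}'. \tag{4.21}$$
So that
$$\begin{pmatrix} \underline{\xi}_s \\ \underline{\xi}_t \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} A\underline{b}_s \\ B\underline{b}_t \end{pmatrix}, \begin{pmatrix} I_{\delta(s)} & C_{s,t} \\ C_{s,t}' & I_{\delta(t)} \end{pmatrix} \right). \tag{4.22}$$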
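Assume further that
$$E(\underline{\xi}_t \mid \underline{\xi}_s = A\underline{b}_s + \underline{\eta}_s) = B\underline{b}_t + \underline{\theta}_t, \tag{4.23}$$
where
$$\underline{\eta}_s = (\eta_s\ \ \eta_s\ \cdots\ \eta_s)' = \eta_s \underline{1}, \quad \text{for an arbitrary chosen value } \eta_s > 0,$$
$$\underline{\theta}_t = (\theta_t\ \ \theta_t\ \cdots\ \theta_t)' = \theta_t \underline{1}.$$
But it is known, from the conditional multivariate normal theory, that
$$E(\underline{\xi}_t \mid \underline{\xi}_s = A\underline{b}_s + \underline{\eta}_s) = B\underline{b}_t - C_{s,t}'\,[A\underline{b}_s - (A\underline{b}_s + \underline{\eta}_s)] = B\underline{b}_t + C_{s,t}'\,\underline{\eta}_s. \tag{4.24}$$
Thus, from (4.23) and (4.24), we get
$$\underline{\theta}_t = C_{s,t}'\,\underline{\eta}_s. \tag{4.25}$$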
The expert will be asked to determine the conditional mean of $\underline{\xi}_t$ given a specific value of
$\underline{\xi}_s$, hence the value of $\theta_t$ will be computed from the expert’s assessment of $E(\underline{\xi}_t \mid \underline{\xi}_s)$. In
fact, the expert assesses only conditional medians of $Y$, which are then transformed, under
the normality assumption, into conditional means of the slopes of the piecewise-linear relation,
or bar heights for factors, as will be detailed in Section 4.3.3.
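From (4.25), the value of $c_{s,t}$ is simply estimated as
$$c_{s,t} = \frac{\theta_t}{\delta(s) \times \eta_s}. \tag{4.26}$$
It will be shown that $\mathrm{Var}((\underline{\beta}_s'\ \ \underline{\beta}_t')')$ is a positive-definite matrix if, and only if,
$$|c_{s,t}| < \frac{1}{\sqrt{\delta(s) \times \delta(t)}}. \tag{4.27}$$
Using (4.26), this condition can be written in terms of $\theta_t$, as
$$|\theta_t| < \eta_s \sqrt{\frac{\delta(s)}{\delta(t)}}. \tag{4.28}$$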
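Equations (4.26) to (4.28) can be sketched as a small routine that turns an assessed $\theta_t$ into $c_{s,t}$ and rejects values outside the allowable range. This is only an illustration using the Python standard library; the function names and the numbers below are hypothetical.

```python
import math

def theta_bound(eta_s, delta_s, delta_t):
    """Upper bound on |theta_t| from (4.28)."""
    return eta_s * math.sqrt(delta_s / delta_t)

def constant_correlation(theta_t, eta_s, delta_s, delta_t):
    """Estimate c_st from (4.26), enforcing the range in (4.28)."""
    if abs(theta_t) >= theta_bound(eta_s, delta_s, delta_t):
        raise ValueError("theta_t lies outside the allowable range (4.28)")
    return theta_t / (delta_s * eta_s)

# Made-up example: delta(s) = 4, delta(t) = 3, eta_s = 0.5, assessed theta_t = 0.3.
c_st = constant_correlation(0.3, 0.5, 4, 3)
```

For these illustrative inputs the bound on $|\theta_t|$ is about 0.577, and the resulting $c_{s,t} = 0.15$ satisfies (4.27), whose limit is $1/\sqrt{12} \approx 0.289$.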
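To prove (4.27), note that
$$V = \mathrm{Var}\begin{pmatrix} \underline{\beta}_s \\ \underline{\beta}_t \end{pmatrix} = \mathrm{Var}\begin{pmatrix} \Sigma_{s,s}^{1/2}\,\underline{\xi}_s \\ \Sigma_{t,t}^{1/2}\,\underline{\xi}_t \end{pmatrix} = \begin{pmatrix} \Sigma_{s,s} & \Sigma_{s,s}^{1/2} C_{s,t} \Sigma_{t,t}^{1/2} \\ \Sigma_{t,t}^{1/2} C_{s,t}' \Sigma_{s,s}^{1/2} & \Sigma_{t,t} \end{pmatrix}.$$
Since $\Sigma_{s,s}$ and $\Sigma_{t,t}$ are both positive-definite matrices, $V$ is positive-definite, using the Schur
complement, if and only if
$$\Sigma_{s,s} - \Sigma_{s,s}^{1/2} C_{s,t} C_{s,t}' \Sigma_{s,s}^{1/2} = \Sigma_{s,s}^{1/2} \big(I_{\delta(s)} - C_{s,t} C_{s,t}'\big) \Sigma_{s,s}^{1/2} \tag{4.29}$$
is positive-definite, or equivalently
$$\Sigma_{t,t} - \Sigma_{t,t}^{1/2} C_{s,t}' C_{s,t} \Sigma_{t,t}^{1/2} = \Sigma_{t,t}^{1/2} \big(I_{\delta(t)} - C_{s,t}' C_{s,t}\big) \Sigma_{t,t}^{1/2} \tag{4.30}$$
is positive-definite.
In other words, from (4.29) or (4.30), $V$ is positive-definite if and only if
$$\begin{pmatrix} I_{\delta(s)} & C_{s,t} \\ C_{s,t}' & I_{\delta(t)} \end{pmatrix}$$
is positive-definite.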
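Let $F = I_{\delta(s)} - C_{s,t}C_{s,t}'$ and $G = \frac{1}{\delta(s)}\,\underline{1}_{\delta(s)}\underline{1}_{\delta(s)}'$,
and note that $G$ is a symmetric idempotent matrix with $\mathrm{rank}(G) = \mathrm{trace}(G) = 1$. Then $F$
can be written as
$$\begin{aligned} F &= I_{\delta(s)} - C_{s,t}C_{s,t}' \\ &= I_{\delta(s)} - \delta(s)\delta(t)c_{s,t}^2\,G \\ &= I_{\delta(s)} - G + G - \delta(s)\delta(t)c_{s,t}^2\,G \\ &= (I_{\delta(s)} - G) + (1 - \delta(s)\delta(t)c_{s,t}^2)\,G \\ &= \alpha_1 (I_{\delta(s)} - G) + \alpha_2\,G, \end{aligned}$$
with
$$\alpha_1 = 1, \qquad \alpha_2 = (1 - \delta(s)\delta(t)c_{s,t}^2).$$
As both $G$ and $(I_{\delta(s)} - G)$ are idempotent matrices summing up to $I_{\delta(s)}$, the eigenvalues of
$F$ are $\alpha_1 = 1$, with multiplicity $\mathrm{rank}(I_{\delta(s)} - G) = \mathrm{trace}(I_{\delta(s)} - G) = \delta(s) - 1$, and
$\alpha_2 = (1 - \delta(s)\delta(t)c_{s,t}^2)$ with multiplicity one. Hence, the necessary and sufficient condition for the
matrix $F$, and consequently for $V$, to be positive-definite is that both $\alpha_1$ and $\alpha_2$ must be
positive. Since $\alpha_1 = 1$, the matrix $V$ is positive-definite if and only if $(1 - \delta(s)\delta(t)c_{s,t}^2) > 0$,
which gives the condition (4.27).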
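The eigenvalue argument can be verified numerically. The sketch below, which assumes NumPy, builds $F = I_{\delta(s)} - C_{s,t}C_{s,t}'$ for illustrative values of $\delta(s)$, $\delta(t)$ and $c_{s,t}$ and compares its eigenvalues with $\alpha_1$ and $\alpha_2$; all function names and numbers are hypothetical.

```python
import numpy as np

def constant_block(c, rows, cols):
    """Matrix C_st with every element equal to c, as in (4.21)."""
    return np.full((rows, cols), float(c))

def f_matrix(c, delta_s, delta_t):
    """F = I - C C' for a constant block C of order delta_s x delta_t."""
    C = constant_block(c, delta_s, delta_t)
    return np.eye(delta_s) - C @ C.T

def joint_is_positive_definite(c, delta_s, delta_t):
    """Check positive-definiteness of [[I, C], [C', I]] directly."""
    C = constant_block(c, delta_s, delta_t)
    joint = np.block([[np.eye(delta_s), C], [C.T, np.eye(delta_t)]])
    return bool(np.linalg.eigvalsh(joint).min() > 0)

delta_s, delta_t, c = 4, 3, 0.2
eigenvalues = np.sort(np.linalg.eigvalsh(f_matrix(c, delta_s, delta_t)))
alpha_2 = 1 - delta_s * delta_t * c**2  # the single non-unit eigenvalue
```

With $\delta(s) = 4$ and $\delta(t) = 3$ the limit in (4.27) is $1/\sqrt{12} \approx 0.2887$, so `joint_is_positive_definite` returns true for $c = 0.28$ and false for $c = 0.30$.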
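The same condition can also be deduced from the quadratic form of the matrix $F$. First,
recall, from Cauchy’s inequality, that
$$\left(\sum_{i=1}^{n} x_i\right)^2 \leq n \sum_{i=1}^{n} x_i^2,$$
with equality when all the $x_i$ are equal. Then $\forall\, \underline{x} \neq \underline{0}$,
$$\begin{aligned} \underline{x}'F\underline{x} &= \sum_{i=1}^{\delta(s)} (1 - \delta(t)c_{s,t}^2)\,x_i^2 + \sum_{i \neq j} (-\delta(t)c_{s,t}^2)\,x_i x_j \\ &= \sum_{i=1}^{\delta(s)} x_i^2 - \delta(t)c_{s,t}^2 \left[ \sum_{i=1}^{\delta(s)} x_i^2 + \sum_{i \neq j} x_i x_j \right] \\ &= \sum_{i=1}^{\delta(s)} x_i^2 - \delta(t)c_{s,t}^2 \left( \sum_{i=1}^{\delta(s)} x_i \right)^2 \\ &\geq \sum_{i=1}^{\delta(s)} x_i^2 - \delta(t)\delta(s)c_{s,t}^2 \sum_{i=1}^{\delta(s)} x_i^2 \\ &= (1 - \delta(s)\delta(t)c_{s,t}^2) \sum_{i=1}^{\delta(s)} x_i^2. \end{aligned}$$
Since $\sum_{i=1}^{\delta(s)} x_i^2 > 0$, and the lower bound is attained when all the $x_i$ are equal, $F$ is
positive-definite if and only if $(1 - \delta(s)\delta(t)c_{s,t}^2) > 0$.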
4.3.2
The case of various vectors of correlated coefficients
When there are more than two correlated explanatory variables, the method given in Section
4.3.1 is still valid. We next obtain a set of $n(n-1)/2$ conditions that are necessary
and sufficient for the full variance-covariance matrix to be positive-definite, for any number
$n > 2$ of correlated explanatory variables. The number of assessments required for eliciting
a variance-covariance matrix using this proposed method when $n > 2$ is only $n(n-1)/2$.
The case of $n = 2$ has been considered already. For $n = k > 2$ explanatory variables, let
$$\underline{e}_i = E\begin{pmatrix} \underline{\beta}_1 \\ \vdots \\ \underline{\beta}_i \end{pmatrix}, \qquad V_i = \mathrm{Var}\begin{pmatrix} \underline{\beta}_1 \\ \vdots \\ \underline{\beta}_i \end{pmatrix}, \qquad \text{for } i = 1, 2, \ldots, k.$$
Assume that $V_i$, $i = 1, 2, \ldots, k-1$, have been obtained and that they are known to be
positive-definite matrices.
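Let
$$\underline{\zeta}_i = V_i^{-1/2}\begin{pmatrix} \underline{\beta}_1 \\ \vdots \\ \underline{\beta}_i \end{pmatrix}, \quad \text{for } i = 1, 2, \ldots, k-1, \qquad \underline{\xi}_k = \Sigma_{k,k}^{-1/2}\,\underline{\beta}_k,$$
with
$$\Sigma_{k,k} = \mathrm{Var}(\underline{\beta}_k).$$
We assume that
$$\mathrm{Cov}(\underline{\zeta}_{k-1}, \underline{\xi}_k) = C_k = \begin{pmatrix} C_{k,1} \\ C_{k,2} \\ \vdots \\ C_{k,k-1} \end{pmatrix}, \tag{4.31}$$
where $C_k$ is a matrix of order $\big(\sum_{i=1}^{k-1}\delta(i)\big) \times \delta(k)$, and that each $C_{k,i}$ is a submatrix of order
$\delta(i) \times \delta(k)$, taking the form
$$C_{k,i} = \begin{pmatrix} c_{k,i} & \cdots & c_{k,i} \\ \vdots & \ddots & \vdots \\ c_{k,i} & \cdots & c_{k,i} \end{pmatrix}, \quad \text{for } i = 1, 2, \ldots, k-1. \tag{4.32}$$
Then
$$\begin{pmatrix} \underline{\zeta}_{k-1} \\ \underline{\xi}_k \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} V_{k-1}^{-1/2}\,\underline{e}_{k-1} \\ \Sigma_{k,k}^{-1/2}\,\underline{b}_k \end{pmatrix}, \begin{pmatrix} I & C_k \\ C_k' & I_{\delta(k)} \end{pmatrix} \right). \tag{4.33}$$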
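Now suppose that
$$E(\underline{\xi}_k \mid \underline{\zeta}_i = V_i^{-1/2}\underline{e}_i + \underline{r}_{k,i}) = \Sigma_{k,k}^{-1/2}\,\underline{b}_k + \underline{\theta}_{k,i}, \quad \text{for } i = 1, 2, \ldots, k-1, \tag{4.34}$$
where
$$\underline{r}_{k,i} = (\underline{\eta}_{k,1}'\ \ \underline{\eta}_{k,2}'\ \cdots\ \underline{\eta}_{k,i}')',$$
$$\underline{\eta}_{k,j} = (\eta_{k,j}\ \ \eta_{k,j}\ \cdots\ \eta_{k,j})' = \eta_{k,j}\,\underline{1}, \quad j = 1, 2, \ldots, i, \quad \text{for arbitrary chosen } \eta_{k,j} > 0,$$
$$\underline{\theta}_{k,i} = (\theta_{k,i}\ \ \theta_{k,i}\ \cdots\ \theta_{k,i})' = \theta_{k,i}\,\underline{1}.$$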
The process will consist of $k-1$ steps. At the $i$th step, an elicited value of
$E(\underline{\xi}_k \mid \underline{\zeta}_i = V_i^{-1/2}\underline{e}_i + \underline{r}_{k,i})$ will be obtained. This can be done by asking the expert to assess conditional median
values of $Y$ that can be transformed, under the normality assumption, to conditional means
of the slopes of the piecewise-linear relation, or bar heights for factors, as will be discussed
in Section 4.3.3. We can then obtain the conditional means $E(\underline{\xi}_k \mid \underline{\zeta}_i = V_i^{-1/2}\underline{e}_i + \underline{r}_{k,i})$ in
(4.34) from $E(\underline{\beta}_k \mid \underline{\beta}_1, \underline{\beta}_2, \cdots, \underline{\beta}_i)$. The conditional values of $\underline{\beta}_j$ are displayed
through a set of $i$ graphs, each of which gives a value for a different $\underline{\beta}_j$, $j = 1, 2, \ldots, i$.
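Moreover, from the conditional multivariate normal theory, equation (4.33) gives
$$E(\underline{\xi}_k \mid \underline{\zeta}_i = V_i^{-1/2}\underline{e}_i + \underline{r}_{k,i}) = \Sigma_{k,k}^{-1/2}\,\underline{b}_k + (C_{k,1}'\ \ C_{k,2}'\ \cdots\ C_{k,i}')\,\underline{r}_{k,i}. \tag{4.35}$$
Then, from (4.34) and (4.35), we get
$$\underline{\theta}_{k,i} = (C_{k,1}'\ \ C_{k,2}'\ \cdots\ C_{k,i}')\,\underline{r}_{k,i}. \tag{4.36}$$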
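Hence, after finishing the $k-1$ steps, the following system of equations can be formed
$$\begin{aligned} \theta_{k,1} &= \delta(1)c_{k,1}\eta_{k,1}, \\ \theta_{k,2} &= \delta(1)c_{k,1}\eta_{k,1} + \delta(2)c_{k,2}\eta_{k,2}, \\ &\ \ \vdots \\ \theta_{k,k-1} &= \delta(1)c_{k,1}\eta_{k,1} + \delta(2)c_{k,2}\eta_{k,2} + \cdots + \delta(k-1)c_{k,k-1}\eta_{k,k-1}. \end{aligned}$$
To solve for $c_{k,i}$, $i = 1, 2, \ldots, k-1$, the system can be written as
$$\begin{pmatrix} \theta_{k,1} \\ \theta_{k,2} \\ \vdots \\ \theta_{k,k-1} \end{pmatrix} = \Pi \begin{pmatrix} c_{k,1} \\ c_{k,2} \\ \vdots \\ c_{k,k-1} \end{pmatrix}, \tag{4.37}$$
where
$$\Pi = \begin{pmatrix} \delta(1)\eta_{k,1} & 0 & \cdots & 0 \\ \delta(1)\eta_{k,1} & \delta(2)\eta_{k,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \delta(1)\eta_{k,1} & \delta(2)\eta_{k,2} & \cdots & \delta(k-1)\eta_{k,k-1} \end{pmatrix}. \tag{4.38}$$
Provided that $\eta_{k,i} \neq 0$, $\forall\, i = 1, 2, \ldots, k-1$, the matrix $\Pi$ is non-singular and hence
$$\begin{pmatrix} c_{k,1} \\ c_{k,2} \\ \vdots \\ c_{k,k-1} \end{pmatrix} = \Pi^{-1} \begin{pmatrix} \theta_{k,1} \\ \theta_{k,2} \\ \vdots \\ \theta_{k,k-1} \end{pmatrix}. \tag{4.39}$$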
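The lower-triangular system in (4.37) to (4.39) can be solved directly. The following is a minimal sketch, assuming NumPy; the function name `solve_correlations` and the input values are hypothetical, with position $i-1$ of each array holding the quantity indexed by $i$.

```python
import numpy as np

def solve_correlations(theta, eta, delta):
    """Solve Pi c = theta for c_{k,1}, ..., c_{k,k-1} as in (4.39).

    theta[i-1], eta[i-1] and delta[i-1] hold theta_{k,i}, eta_{k,i} and delta(i).
    """
    theta = np.asarray(theta, dtype=float)
    scale = np.asarray(delta, dtype=float) * np.asarray(eta, dtype=float)
    if np.any(scale == 0):
        raise ValueError("every eta_{k,i} must be non-zero")
    # Row i of Pi is (delta(1) eta_{k,1}, ..., delta(i) eta_{k,i}, 0, ..., 0).
    Pi = np.tril(np.tile(scale, (len(scale), 1)))
    return np.linalg.solve(Pi, theta)

# Made-up assessments for k = 4 (three conditioning steps).
c_k = solve_correlations(theta=[0.10, 0.22, 0.17],
                         eta=[0.5, 0.4, 0.5],
                         delta=[2, 3, 2])
```

Since $\Pi$ is lower triangular with identical entries down each column, the solution is equivalent to $c_{k,i} = (\theta_{k,i} - \theta_{k,i-1}) / (\delta(i)\eta_{k,i})$ with $\theta_{k,0} = 0$; for the illustrative inputs this gives 0.1, 0.1 and -0.05.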
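Now, the variance-covariance matrix $V_k$ can be estimated as follows:
$$V_k = \mathrm{Var}\begin{pmatrix} \underline{\beta}_1 \\ \vdots \\ \underline{\beta}_k \end{pmatrix} = \mathrm{Var}\begin{pmatrix} V_{k-1}^{1/2}\,\underline{\zeta}_{k-1} \\ \Sigma_{k,k}^{1/2}\,\underline{\xi}_k \end{pmatrix} = \begin{pmatrix} V_{k-1} & V_{k-1}^{1/2} C_k \Sigma_{k,k}^{1/2} \\ \Sigma_{k,k}^{1/2} C_k' V_{k-1}^{1/2} & \Sigma_{k,k} \end{pmatrix}. \tag{4.40}$$
We define the matrices $\Sigma_{i,k}$, for $i = 1, 2, \cdots, k-1$, that conformally partition $V_{k-1}^{1/2} C_k \Sigma_{k,k}^{1/2}$
as
$$\begin{pmatrix} \Sigma_{1,k} \\ \Sigma_{2,k} \\ \vdots \\ \Sigma_{k-1,k} \end{pmatrix} = V_{k-1}^{1/2} \begin{pmatrix} C_{k,1} \\ C_{k,2} \\ \vdots \\ C_{k,k-1} \end{pmatrix} \Sigma_{k,k}^{1/2}. \tag{4.41}$$
Following the same steps as in the case $n = 2$, and since $V_{k-1}$ and $\Sigma_{k,k}$ are positive-definite
matrices, equations similar to (4.29) and (4.30) show that $V_k$ is positive-definite if and only
if the matrix
$$\begin{pmatrix} I & C_k \\ C_k' & I_{\delta(k)} \end{pmatrix}$$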
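is positive-definite. Putting
$$F_k = I_{\delta(k)} - C_k' C_k = I_{\delta(k)} - \sum_{i=1}^{k-1} \delta(k)\delta(i)c_{k,i}^2\, G_k,$$
where $G_k$ is idempotent of rank 1, it can be shown that
$$F_k = (I_{\delta(k)} - G_k) + \left(1 - \sum_{i=1}^{k-1} \delta(k)\delta(i)c_{k,i}^2\right) G_k$$
is positive-definite if and only if
$$1 - \sum_{i=1}^{k-1} \delta(k)\delta(i)c_{k,i}^2 > 0. \tag{4.42}$$
This condition implies $k-1$ conditions for $c_{k,i}$, $i = 1, 2, \ldots, k-1$, of the form
$$c_{k,i}^2 < \frac{1 - \sum_{j=1}^{i-1} \delta(k)\delta(j)c_{k,j}^2}{\delta(i) \times \delta(k)}. \tag{4.43}$$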
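These $k-1$ conditions guarantee that the elicited matrix $V_k$ in (4.40) is positive-definite,
provided that $V_{k-1}$ is positive-definite. Since $V_2$ is known to be positive-definite from Section
4.3.1, we can use mathematical induction to prove that the full variance-covariance
matrix $V_n$ is positive-definite, as follows. For any number ($n > 2$) of correlated vectors $\underline{\beta}_1$,
$\underline{\beta}_2, \cdots, \underline{\beta}_n$, the whole matrix
$$V_n = \mathrm{Var}\begin{pmatrix} \underline{\beta}_1 \\ \underline{\beta}_2 \\ \vdots \\ \underline{\beta}_n \end{pmatrix} = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} & \cdots & \Sigma_{1,n} \\ \Sigma_{2,1} & \Sigma_{2,2} & \cdots & \Sigma_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma_{n,1} & \Sigma_{n,2} & \cdots & \Sigma_{n,n} \end{pmatrix}$$
is certain to be positive-definite if (a) the $n-1$ conditions in (4.43) hold and (b) $V_{n-1}$ is
positive-definite. This imposes an extra $i-1$ conditions on each matrix $V_i$ ($i = 2, \cdots, n-1$),
so that each $V_i$ is positive-definite. Then $V_n$ is positive-definite under a number of
$\sum_{k=2}^{n}(k-1) = \sum_{k=1}^{n-1} k = n(n-1)/2$ conditions of the form:
$$c_{k,i}^2 < \frac{1 - \sum_{j=1}^{i-1} \delta(k)\delta(j)c_{k,j}^2}{\delta(i) \times \delta(k)}, \quad \text{for } i = 1, 2, \ldots, k-1,\ k = 2, 3, \ldots, n. \tag{4.44}$$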
Using these conditions, the range of each $\theta_{k,i}$, for $i = 1, 2, \ldots, k-1$, $k = 2, 3, \ldots, n$, can be
computed and shown to the expert who can ensure that her assessed values fall within these
ranges. This will guarantee that the estimated variance-covariance matrix is positive-definite.
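The sequential conditions can be sketched as a bound on $|c_{k,i}|$ given the values already fixed for $c_{k,1}, \ldots, c_{k,i-1}$. The sketch assumes that the conditions take the form $c_{k,i}^2 < \big(1 - \sum_{j=1}^{i-1}\delta(k)\delta(j)c_{k,j}^2\big) / (\delta(i)\delta(k))$, which is our reading of (4.43); the function name and numbers are hypothetical.

```python
import math

def correlation_bound(c_prev, delta_prev, delta_i, delta_k):
    """Upper bound on |c_{k,i}| given c_{k,1..i-1} (assumed form of the condition).

    c_prev and delta_prev list c_{k,j} and delta(j) for j = 1, ..., i-1.
    """
    used = sum(delta_k * d * c**2 for c, d in zip(c_prev, delta_prev))
    remaining = 1.0 - used
    if remaining <= 0:
        return 0.0  # no admissible value is left for c_{k,i}
    return math.sqrt(remaining / (delta_i * delta_k))

# Made-up example with delta(k) = 2, delta(1) = 2, delta(2) = 3.
first_bound = correlation_bound([], [], delta_i=2, delta_k=2)
second_bound = correlation_bound([0.3], [2], delta_i=3, delta_k=2)
```

In this illustration the first coefficient may range up to 0.5 in absolute value; once $c_{k,1} = 0.3$ is fixed, the bound for $c_{k,2}$ shrinks to $\sqrt{0.64/6} \approx 0.327$, showing how earlier assessments tighten later ranges.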
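For $i = 1, 2, \ldots, k-1$, from (4.36) and (4.43), the range of $\theta_{k,i}$ is given by
$$\sum_{j=1}^{i-1} \delta(j)c_{k,j}\eta_{k,j} \ \pm\ \eta_{k,i} \sqrt{\frac{\delta(i)\left[1 - \sum_{j=1}^{i-1}\delta(k)\delta(j)c_{k,j}^2\right]}{\delta(k)}}. \tag{4.45}$$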
This formula for the allowable range of $\theta_{k,i}$ has a drawback: we cannot calculate these ranges
until quite late in the assessment procedure, so the expert may sometimes be asked to revise
assessments that she made some time earlier. Hence, we decided to find a different approach
that gives a more direct range for each $\theta_{k,i}$, and which only asks the expert to modify recent
assessments that she has made. At step $i$, when conditioning on the value of $\underline{\xi}_i$, the expert
may be asked to modify the assessment she has made in step $i-1$, but she will not be asked
to modify assessments she gave at stages before that. This can be formulated as follows.
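Instead of equation (4.34), let
$$E(\underline{\xi}_k \mid \underline{\zeta}_i = V_i^{-1/2}\underline{e}_i + \underline{r}_{k,i}) = \Sigma_{k,k}^{-1/2}\,\underline{b}_k + \sum_{j=1}^{i} \underline{\theta}_{k,j}, \quad \text{for } i = 1, 2, \ldots, k-1, \tag{4.46, 4.47}$$
where
$$\underline{r}_{k,i} = (\underline{\eta}_{k,1}'\ \cdots\ \underline{\eta}_{k,i}')', \qquad \underline{\eta}_{k,j} = (\eta_{k,j}\ \ \eta_{k,j}\ \cdots\ \eta_{k,j})' = \eta_{k,j}\,\underline{1}, \quad j = 1, 2, \ldots, i,$$
$$\underline{\theta}_{k,i} = (\theta_{k,i}\ \ \theta_{k,i}\ \cdots\ \theta_{k,i})' = \theta_{k,i}\,\underline{1},$$
for arbitrary chosen $\eta_{k,j} > 0$.
In this case, using standard results of conditional expectations, we get
$$\Sigma_{k,k}^{-1/2}\,\underline{b}_k + \sum_{j=1}^{i} \underline{\theta}_{k,j} = \Sigma_{k,k}^{-1/2}\,\underline{b}_k + (C_{k,1}'\ \cdots\ C_{k,i}')\,\underline{r}_{k,i}, \quad \text{for } i = 1, 2, \ldots, k-1. \tag{4.48}$$
Then equation (4.36) becomes
$$\sum_{j=1}^{i} \underline{\theta}_{k,j} = \sum_{j=1}^{i} C_{k,j}'\,\underline{\eta}_{k,j}, \tag{4.49}$$
which gives
$$\theta_{k,i} = \delta(i)\,c_{k,i}\,\eta_{k,i}.$$
Hence
$$c_{k,i} = \frac{\theta_{k,i}}{\delta(i) \times \eta_{k,i}}. \tag{4.50}$$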
The positive-definiteness of the whole variance-covariance matrix $V_n$ is still guaranteed under
the same conditions in (4.44). But the allowable range for each $\theta_{k,i}$ ($i = 1, 2, \ldots, k-1$;
$k = 2, 3, \ldots, n$) has the simplified form,
$$|\theta_{k,i}| < \eta_{k,i} \sqrt{\frac{\delta(i)\left[1 - \sum_{j=1}^{i-1}\delta(k)\delta(j)c_{k,j}^2\right]}{\delta(k)}}. \tag{4.51}$$
This represents a simple range for $\theta_{k,i}$, in comparison with (4.45). The range in (4.51)
depends only on the change $\eta_{k,i}$ in the $i$th variable, $\underline{\xi}_i$, not on the changes $\eta_{k,j}$ in all variables
$\underline{\xi}_j$, $j = 1, \ldots, i$, as in (4.45).
4.3.3
Assessment tasks
• The current assessment tasks start after eliciting all variance matrices $\Sigma_{i,i}$ ($i = 1, 2, \ldots, k$).
• For any pair of correlated vectors ($\underline{\beta}_s$, $\underline{\beta}_t$), we assume that
$$\mathrm{Cov}(\underline{\beta}_s, \underline{\beta}_t) = \Sigma_{s,s}^{1/2}\,C_{s,t}\,\Sigma_{t,t}^{1/2}, \tag{4.52}$$
where $C_{s,t}$ is given in (4.21) and $\Sigma_{s,s}$ and $\Sigma_{t,t}$ are the variances of $\underline{\beta}_s$ and $\underline{\beta}_t$, respectively.
• The expert will be shown a panel that simultaneously displays two graphs and a slider
(see Figure 4.3). For continuous covariates, the upper graph of the panel shows the
piecewise-linear relation between $Y$ and $X_s$. The slopes of the black (lower) curve
represent $\underline{b}_s = E(\underline{\beta}_s)$, while the slopes of the blue (upper) curve represent the change
of $E(\underline{\beta}_s)$ by $\Sigma_{s,s}^{1/2}\underline{\eta}_s$, i.e. the slopes of the blue (upper) curve are $\underline{b}_s + \Sigma_{s,s}^{1/2}\underline{\eta}_s$. The
black (lower) lines represent the expert’s original median assessments but she is asked
to suppose that the correct values are actually the blue (upper) lines. Given this
information, the expert is asked to use the slider to change the position of the black
(middle) curve in the lower panel so that it gives her new opinion about the median
value that $Y$ will take as $X_t$ varies. The magnitude and direction of the change reflect
the correlation between $\underline{\beta}_s$ and $\underline{\beta}_t$.
Figure 4.3: Assessments needed for two correlated variables
• The two red (outer) piecewise-linear curves in the lower panel of Figure 4.3 represent the
allowable boundaries for the change of $\underline{\beta}_t$; these boundaries ensure that the resulting
variance-covariance matrix is positive-definite. The boundaries are calculated from
the condition given in equation (4.28). Moving the slider simultaneously changes the
position of all the medians of $Y$ in the lower panel. When the expert is happy with the
new position of the curve on the lower panel, the corresponding value of the slider is
used to compute $c_{s,t}$, as will be shown later.
• The expert is asked to assume that the slopes of $X_s$, in the upper panel of Figure 4.3,
have changed from $\underline{b}_s$ to $\underline{b}_s + \Sigma_{s,s}^{1/2}\underline{\eta}_s$. Conditional on this information, she revises the
slopes of $X_t$, in the lower panel, changing them from $\underline{b}_t$ to $\underline{b}_t + \Sigma_{t,t}^{1/2}\underline{\theta}_t$. The expert
changes all the slopes simultaneously using the slider.
• The size of the change, $\eta_s$, in the conditioning variable, $X_s$, in the upper panel, is
chosen such that the vertical distances between the two piecewise-linear curves in the
upper graph do not exceed the upper quartile at any of the knots of $X_s$. This ensures
that the new conditioning values $\underline{b}_s + \Sigma_{s,s}^{1/2}\underline{\eta}_s$ are not too far from $\underline{b}_s$, as they have to
be values that the expert finds plausible. This choice is also not too close to $\underline{b}_s$, so it
should prompt a measurable change in $\underline{b}_t$ in the lower panel of Figure 4.3.
The software calculates medians to draw a piecewise-linear curve with slopes $\underline{b}_s + \Sigma_{s,s}^{1/2}\underline{\eta}_s$.
For $i = 1, 2, \cdots, \delta(s)$, the median value of $Y$ at each knot $i$, $m_{s,i,0.5}$, is changed to
$m^*_{s,i,0.5}$, as follows.
First, let
$$m^*_{s,0,0.5} = m_{s,0,0.5},$$
and
$$d_{i,i-1} = r_{s,i} - r_{s,i-1}.$$
Then, for $i = 1, 2, \cdots, \delta(s)$, we put
$$\frac{m^*_{s,i,0.5} - m^*_{s,i-1,0.5}}{d_{i,i-1}} = \frac{m_{s,i,0.5} - m_{s,i-1,0.5}}{d_{i,i-1}} + \eta_s\,(\Sigma_{s,s}^{1/2})_i,$$
where $(\Sigma_{s,s}^{1/2})_i$ is the sum of the elements of the $i$th row of $\Sigma_{s,s}^{1/2}$.
Hence,
$$m^*_{s,i,0.5} = m^*_{s,i-1,0.5} + m_{s,i,0.5} - m_{s,i-1,0.5} + \eta_s\,d_{i,i-1}\,(\Sigma_{s,s}^{1/2})_i. \tag{4.53}$$
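The recursion in (4.53) amounts to a cumulative sum of adjusted increments. The sketch below assumes NumPy, takes the first knot as the reference knot (index 0), and reads the recursion as $m^*_{s,i,0.5} = m^*_{s,i-1,0.5} + m_{s,i,0.5} - m_{s,i-1,0.5} + \eta_s d_{i,i-1}(\Sigma_{s,s}^{1/2})_i$; the function name and data are hypothetical.

```python
import numpy as np

def shifted_medians(knots, medians, sigma_ss, eta_s):
    """Medians of the curve whose slopes are b_s + Sigma_ss^{1/2} eta_s 1.

    knots and medians have delta(s) + 1 entries (reference knot first);
    sigma_ss is the delta(s) x delta(s) variance matrix of the slopes.
    """
    knots = np.asarray(knots, dtype=float)
    medians = np.asarray(medians, dtype=float)
    vals, vecs = np.linalg.eigh(sigma_ss)
    root = (vecs * np.sqrt(vals)) @ vecs.T        # symmetric square root
    row_sums = root.sum(axis=1)                   # (Sigma_ss^{1/2})_i
    d = np.diff(knots)                            # d_{i,i-1}
    increments = np.diff(medians) + eta_s * d * row_sums
    return np.concatenate(([medians[0]], medians[0] + np.cumsum(increments)))

# Made-up example with two slopes (three knots).
new_medians = shifted_medians(knots=[0.0, 1.0, 3.0],
                              medians=[10.0, 12.0, 18.0],
                              sigma_ss=np.diag([4.0, 9.0]),
                              eta_s=0.5)
```

Here the original slopes are 2 and 3; with $\Sigma_{s,s}^{1/2} = \mathrm{diag}(2, 3)$ and $\eta_s = 0.5$ the shifted slopes become 3 and 4.5, so the medians move from (10, 12, 18) to (10, 13, 22).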
If $X_s$ is a factor, then
$$m^*_{s,i,0.5} = m_{s,i,0.5} + \eta_s\,(\Sigma_{s,s}^{1/2})_i. \tag{4.54}$$
In view of (4.53) and (4.54), $\eta_s$ can be chosen as
$$\eta_s = \min_i \left\{ \frac{m_{s,i,0.75} - m_{s,i,0.5}}{\sum_{j=1}^{i} d_{j,j-1}\,(\Sigma_{s,s}^{1/2})_j} \right\}, \tag{4.55}$$
for continuous covariates. For factors, it can be chosen as
$$\eta_s = \min_i \left\{ \frac{m_{s,i,0.75} - m_{s,i,0.5}}{(\Sigma_{s,s}^{1/2})_i} \right\}. \tag{4.56}$$
• In order to draw the red (outer) boundaries in Figure 4.3, we require upper and lower
bounds, $m^U_{t,i,0.5}$ and $m^L_{t,i,0.5}$. From (4.28), if $X_t$ is a continuous covariate, we put
$$m^U_{t,i,0.5} = m^U_{t,i-1,0.5} + m_{t,i,0.5} - m_{t,i-1,0.5} + \eta_s \sqrt{\frac{\delta(s)}{\delta(t)}}\; d_{i,i-1}\,(\Sigma_{t,t}^{1/2})_i, \tag{4.57}$$
and
$$m^L_{t,i,0.5} = m^L_{t,i-1,0.5} + m_{t,i,0.5} - m_{t,i-1,0.5} - \eta_s \sqrt{\frac{\delta(s)}{\delta(t)}}\; d_{i,i-1}\,(\Sigma_{t,t}^{1/2})_i. \tag{4.58}$$
If $X_t$ is a factor, we put
$$m^U_{t,i,0.5} = m_{t,i,0.5} + \eta_s \sqrt{\frac{\delta(s)}{\delta(t)}}\,(\Sigma_{t,t}^{1/2})_i, \tag{4.59}$$
and
$$m^L_{t,i,0.5} = m_{t,i,0.5} - \eta_s \sqrt{\frac{\delta(s)}{\delta(t)}}\,(\Sigma_{t,t}^{1/2})_i. \tag{4.60}$$
Using the slider, in view of (4.27), the expert changes the value of $c_{s,t}$ between its two
boundaries, $\pm 1/\sqrt{\delta(s) \times \delta(t)}$. To be interpretable by the expert, the slider presents a
scaled range between -1 and 1 as a measure of correlation between $\underline{\beta}_s$ and $\underline{\beta}_t$. Hence
$c_{s,t} = \text{(the slider value)} / \sqrt{\delta(s) \times \delta(t)}$.
The corresponding new curve, say $m'_{t,i,0.5}$, changes interactively with each movement
of the slider. For continuous covariates, $m'_{t,i,0.5}$ is computed after $m'_{t,i-1,0.5}$ has been
calculated:
$$m'_{t,i,0.5} = m'_{t,i-1,0.5} + m_{t,i,0.5} - m_{t,i-1,0.5} + \delta(s)\eta_s c_{s,t}\,d_{i,i-1}\,(\Sigma_{t,t}^{1/2})_i, \tag{4.61}$$
where $\delta(s)\eta_s c_{s,t} = \theta_t$ by (4.26), so that the curve reaches the boundaries (4.57) and (4.58) when the
slider is at its limits. For factors
$$m'_{t,i,0.5} = m_{t,i,0.5} + \delta(s)\eta_s c_{s,t}\,(\Sigma_{t,t}^{1/2})_i. \tag{4.62}$$
When the expert is happy with the new position of the curve, the value of $c_{s,t}$ is used
in (4.21) and (4.52) to calculate the covariances between $\underline{\beta}_s$ and $\underline{\beta}_t$.
• For $k > 2$ correlated vectors of coefficients, the process will consist of $k-1$ steps. At the
$i$th step, the expert will be asked to change the conditional medians of $(\underline{\beta}_k \mid \underline{\beta}_1, \underline{\beta}_2, \cdots, \underline{\beta}_i)$
by a value of $\theta_{k,i}$ given a set of $i$ graphs, each of which shows a change with a different
fixed value $\eta_j$ for each $\underline{\beta}_j$, $j = 1, 2, \ldots, i$.
• However, we choose not to offer this general case as an option in the interactive software.
Although it has been shown to have a consistent mathematical framework and
adequate theoretical properties as proposed in Section 4.3.2, its practical implementation
may raise some critical issues in the elicitation process. Conditioning on simultaneous
changes in many graphs for different variables gives too much information for an
expert to readily absorb. She may not be able to assess the direct conditional impact
of these changes on the variable of concern.
• Another difficulty arises in choosing the different values $\eta_j$, $j = 1, 2, \cdots, i$, that control
the change in the conditioning set used in step $i$. These values must be carefully
specified so that the resulting simultaneous change represents a valid combination of
values that is acceptable to the expert to condition on.
• A general problem in successive increment of variables in the conditioning set is that the
allowable range of medians at the variable of concern gets tighter as we approach the
last variable in the list. This problem is not only a practical one, but it has also been
shown that variances, and hence covariances, of the last variables in the list are usually
overestimated by the expert due to incremental conditioning (Garthwaite, 1994). These
drawbacks constitute the motivation for the third elicitation method proposed in the
next section.
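The slider scaling used in the two-vector assessment task described above can be sketched as follows, using only the Python standard library. The function names and inputs are hypothetical; the relation $\theta_t = \delta(s)\eta_s c_{s,t}$ is taken from (4.26).

```python
import math

def slider_to_correlation(slider_value, delta_s, delta_t):
    """Map a slider value in (-1, 1) to c_st, so that (4.27) always holds."""
    if not -1.0 < slider_value < 1.0:
        raise ValueError("slider value must lie strictly between -1 and 1")
    return slider_value / math.sqrt(delta_s * delta_t)

def implied_theta(c_st, eta_s, delta_s):
    """Change theta_t in the transformed slopes implied by c_st, from (4.26)."""
    return delta_s * eta_s * c_st

# Made-up example: slider at 0.5 with delta(s) = 4, delta(t) = 3, eta_s = 0.5.
c_st = slider_to_correlation(0.5, 4, 3)
theta_t = implied_theta(c_st, 0.5, 4)
```

A slider value $v$ therefore corresponds to $\theta_t = v\,\eta_s\sqrt{\delta(s)/\delta(t)}$, which is the bound in (4.28) scaled by $v$; this is why keeping the slider inside $(-1, 1)$ keeps the lower-panel curve between the red boundaries.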
4.4
A general flexible elicitation method for correlated coefficients
The aim here is to form an elicitation method suitable for GLMs that contain a large number
of correlated vectors. We propose the following elicitation method as a promising new
approach for eliciting the whole variance-covariance matrix. It uses only a small number of
assessments that directly reflect the pattern of correlations between all pairs of vectors.
The method avoids the previously mentioned disadvantages of using incremented conditioning
sets of variables. Instead, the method treats all variables symmetrically. As with
the method proposed in Section 4.3.1, it assumes a fixed correlation structure for the elements
of each pair of vectors. The current method differs from the generalization proposed
in Section 4.3.2, in that it avoids incremented conditioning and assesses all covariances
simultaneously.
The main idea is that the expert assesses the relative magnitudes of the average correlations
between each pair of vectors. She is asked to ensure that these weights reflect the
strength of the average correlation of each pair relative to each other pair. The expert need
not be conscious of conditions that are required for mathematical coherence. Instead, the
assessed relative weights will be scaled to ensure that the assessed variance-covariance matrix
is positive-definite.
The current method can be used alone or together with one of the two methods proposed
before in this chapter. In the latter case, the current method needs an assessment of the
correlation of only one pair of vectors, then all other correlations are computed using the
relative weights. This correlation assessment may be obtained using the method proposed
in Section 4.2 or the method proposed in Section 4.3.1. With the latter method the expert
might use a slider to adjust the slopes of one vector of a highly correlated pair.
In what follows, the method is introduced in detail and the scaling needed to obtain a
positive-definite matrix will also be investigated.
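Assuming that all the $k$ covariates are correlated, let
$$\underline{\xi}_i = \Sigma_{i,i}^{-1/2}\,\underline{\beta}_i, \quad i = 1, 2, \cdots, k, \tag{4.63}$$
then
$$\mathrm{Var}(\underline{\xi}_i) = I_{\delta(i)}, \quad i = 1, 2, \cdots, k. \tag{4.64}$$
For all $i = 1, 2, \cdots, k$, $j = 1, 2, \cdots, k$, $i \neq j$, we assume that
$$\mathrm{Cov}(\underline{\xi}_i, \underline{\xi}_j) = C_{i,j} = \begin{pmatrix} c_{i,j} & \cdots & c_{i,j} \\ \vdots & \ddots & \vdots \\ c_{i,j} & \cdots & c_{i,j} \end{pmatrix}_{\delta(i)\times\delta(j)}, \tag{4.65}$$
with $c_{j,i} = c_{i,j}$.
Then
$$\mathrm{Cov}(\underline{\beta}_i, \underline{\beta}_j) = \Sigma_{i,i}^{1/2}\,C_{i,j}\,\Sigma_{j,j}^{1/2}, \tag{4.66}$$
and hence
$$V = \mathrm{Var}\begin{pmatrix} \underline{\beta}_1 \\ \vdots \\ \underline{\beta}_k \end{pmatrix} = \Lambda^{1/2}_{\Sigma_{i,i}}\; C\; \Lambda^{1/2}_{\Sigma_{i,i}}, \tag{4.67}$$
where $\Lambda^{1/2}_{\Sigma_{i,i}}$ is a block-diagonal matrix with $\Sigma_{i,i}^{1/2}$ as the $i$th main diagonal block and
$$C = \begin{pmatrix} I_{\delta(1)} & C_{1,2} & \cdots & C_{1,k} \\ C_{2,1} & I_{\delta(2)} & \cdots & C_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ C_{k,1} & \cdots & C_{k,k-1} & I_{\delta(k)} \end{pmatrix}, \tag{4.68}$$
with $C_{j,i} = C_{i,j}'$.
Since each $\Sigma_{i,i}^{1/2}$ is positive-definite, so is $\Lambda^{1/2}_{\Sigma_{i,i}}$. Hence, we can state that $V$ in (4.67) is
positive-definite if and only if $C$ in (4.68) is positive-definite.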
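The construction in (4.66) to (4.68) can be sketched as follows, assuming NumPy. The function names, the block variances and the values of $c_{i,j}$ are hypothetical; the sketch only illustrates how $V$ is assembled from $C$ and the square roots of the diagonal blocks.

```python
import numpy as np

def sym_sqrt(m):
    """Symmetric positive-definite square root from the eigenvalue decomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(vals)) @ vecs.T

def assemble_covariance(sigmas, c):
    """Build V = L C L as in (4.67), with L block-diagonal in Sigma_ii^{1/2}.

    sigmas is a list of the k positive-definite blocks Sigma_ii;
    c[i][j] holds c_ij for i != j (diagonal entries are ignored).
    """
    sizes = [s.shape[0] for s in sigmas]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    n = int(offsets[-1])
    C = np.eye(n)
    L = np.zeros((n, n))
    for i, sigma in enumerate(sigmas):
        a, b = offsets[i], offsets[i + 1]
        L[a:b, a:b] = sym_sqrt(sigma)
        for j in range(len(sigmas)):
            if j != i:
                C[a:b, offsets[j]:offsets[j + 1]] = c[i][j]
    return L @ C @ L, C

# Made-up example: two vectors with delta(1) = 2 and delta(2) = 1.
sigmas = [np.diag([1.0, 4.0]), np.array([[2.0]])]
c = [[0.0, 0.3], [0.3, 0.0]]
V, C = assemble_covariance(sigmas, c)
```

In this illustration the diagonal blocks of `V` reproduce the supplied $\Sigma_{i,i}$, and `V` and `C` are positive-definite together, as stated above.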
For $i = 1, 2, \cdots, k$, $j = 1, 2, \cdots, k$, $i \neq j$, let
$$c_{i,j} = c\,w_{i,j}, \tag{4.69}$$
where $w_{i,j}$ are the relative weights to be assessed by the expert and $c > 0$ is a scaling
constant that is adjusted to ensure that $C$ is positive-definite.
The main assessment task with this method consists of one dialogue box. An example
is shown in Figure 4.4. The expert assesses the relative magnitudes (weights) and signs
of different correlations between all pairs of vectors. Since the correlation matrix must be
symmetric, we just require the elements below the main diagonal to be assessed. Hence, when
there are $n$ vectors of coefficients, we require $n(n-1)/2$ assessments for this stage. The main
diagonal elements are necessarily set equal to ones, as $C$ is a correlation matrix.
[Figure 4.4 dialogue box: "Enter your relative correlations:", a lower-triangular grid of entry fields for the covariates X1 to X5 with ones on the main diagonal, and "Next" and "Help" buttons.]
Figure 4.4: Assessments needed for five correlated variables
The relative weights that are assessed in this task need not be coherent correlation
coefficients. For example, they are not necessarily restricted to be between -1 and 1. Instead,
any assessed numbers are accepted; they must simply reflect the magnitude of the correlation
between any pair of vectors relative to other pairs. Negative values are allowed and are
appropriate when an expert believes a correlation is negative. The expert is asked to assess a
single weight for each pair of vectors. The weight should reflect her opinion about the average
correlation between all pairs of elements in that pair of vectors.
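The relative weights assessed in Figure 4.4 will be denoted by $w^*_{i,j}$, where $w^*_{i,j}$ corresponds
to the fixed average correlation between all elements of $\underline{\beta}_i$ and $\underline{\beta}_j$. The expert is asked to
ensure that the relative magnitudes of $w^*_{i,j}$, $i = 1, 2, \cdots, k$, $j = 1, 2, \cdots, k$, $i > j$, model her
opinion about the relative correlation of each pair compared to the others. As mentioned
before, $w^*_{i,j}$ will be scaled later to attain mathematically coherent values of correlations.
For mathematical simplicity, we use the weights, $w_{i,j}$, of correlations between $\underline{\xi}_i$ and $\underline{\xi}_j$
when investigating the conditions required for the scaling constant $c$ in (4.69). However, we
assess the weights $w^*_{i,j}$ in terms of $\underline{\beta}_i$ and $\underline{\beta}_j$, as the expert cannot think about correlations
between the transformed vectors $\underline{\xi}_i$ and $\underline{\xi}_j$. Hence, we need an explicit relationship between
$w_{i,j}$ and $w^*_{i,j}$. We obtain one as follows.
For $i = 1, 2, \cdots, k$, $j = 1, 2, \cdots, k$, $i > j$, let
$$c^*_{i,j} = c\,w^*_{i,j} \tag{4.70}$$
be the scaled average correlation between $\underline{\beta}_i$ and $\underline{\beta}_j$. Then
$$c^*_{i,j} = \frac{\sum_{r=1}^{\delta(i)} \sum_{s=1}^{\delta(j)} \big[\mathrm{Cov}(\beta_{i,r}, \beta_{j,s}) / \sigma_r \sigma_s\big]}{\delta(i)\,\delta(j)}, \tag{4.71}$$
where, as in (4.66), $\mathrm{Cov}(\beta_{i,r}, \beta_{j,s})$ is the $(r, s)$ element of $\mathrm{Cov}(\underline{\beta}_i, \underline{\beta}_j)$, and $\sigma_r$ and $\sigma_s$ are the
square roots of the $r$th and $s$th main diagonal elements of $\Sigma_{i,i}$ and $\Sigma_{j,j}$, respectively.
Hence, from (4.65), (4.66) and (4.71),
$$c^*_{i,j} = c_{i,j}\,\frac{\sum_{r=1}^{\delta(i)} \sum_{s=1}^{\delta(j)} \big[a_{r,s} / \sigma_r \sigma_s\big]}{\delta(i)\,\delta(j)}, \tag{4.72}$$
where $a_{r,s}$ is the $(r, s)$ element of $\Sigma_{i,i}^{1/2}\,\underline{1}_{\delta(i)}\underline{1}_{\delta(j)}'\,\Sigma_{j,j}^{1/2}$, i.e.
$$c_{i,j} = c^*_{i,j}\,\frac{\delta(i)\,\delta(j)}{\sum_{r=1}^{\delta(i)} \sum_{s=1}^{\delta(j)} \big[a_{r,s} / \sigma_r \sigma_s\big]}. \tag{4.73}$$
So, in view of (4.69) and (4.70), we have
$$w_{i,j} = w^*_{i,j}\,\frac{\delta(i)\,\delta(j)}{\sum_{r=1}^{\delta(i)} \sum_{s=1}^{\delta(j)} \big[a_{r,s} / \sigma_r \sigma_s\big]}. \tag{4.74}$$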
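The conversion from an assessed weight $w^*_{i,j}$ to the weight $w_{i,j}$ on the transformed scale can be sketched as below, assuming NumPy and reading (4.74) as $w_{i,j} = w^*_{i,j}\,\delta(i)\delta(j) / \sum_r\sum_s [a_{r,s}/\sigma_r\sigma_s]$. The function names and matrices are hypothetical illustrations.

```python
import numpy as np

def sym_sqrt(m):
    """Symmetric positive-definite square root from the eigenvalue decomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(vals)) @ vecs.T

def transformed_weight(w_star, sigma_ii, sigma_jj):
    """Convert an assessed weight w*_ij into w_ij (assumed form of (4.74))."""
    root_i, root_j = sym_sqrt(sigma_ii), sym_sqrt(sigma_jj)
    # a = Sigma_ii^{1/2} 1 1' Sigma_jj^{1/2}: outer product of the row sums.
    a = np.outer(root_i.sum(axis=1), root_j.sum(axis=0))
    sd_i = np.sqrt(np.diag(sigma_ii))
    sd_j = np.sqrt(np.diag(sigma_jj))
    total = (a / np.outer(sd_i, sd_j)).sum()
    return w_star * a.size / total  # a.size equals delta(i) * delta(j)

# Made-up example: an equicorrelated pair of slopes against a single coefficient.
w = transformed_weight(0.4,
                       np.array([[1.0, 0.5], [0.5, 1.0]]),
                       np.array([[1.0]]))
```

When both variance matrices are diagonal the ratio is one and the weight is unchanged; in the illustrative example the row sums of $\Sigma_{i,i}^{1/2}$ are $\sqrt{1.5}$, so $w_{i,j} = 0.4/\sqrt{1.5} \approx 0.327$.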
It remains now to investigate the allowable range for the positive scaling constant $c$, so that
$C$ in (4.68), and consequently $V$ in (4.67), are positive-definite.
First, from (4.69), we write $C$ in (4.68) as
$$ C = I + cW, \tag{4.75} $$
where $I$ is the identity matrix of order $\sum_{j=1}^{k} \delta(j)$, $W$ is a conformally partitioned matrix with main diagonal zero block matrices, and all the elements of each $(i,j)$ off-diagonal block are equal to $w_{i,j}$.
Let $\lambda_{W,i}$, $i = 1, 2, \cdots, \sum_{j=1}^{k} \delta(j)$, be the eigenvalues of $W$. We have that
$$ \min_i(\lambda_{W,i}) < 0, $$
since if not, $W$ with zero main diagonal elements will be a nonnegative-definite matrix, in which case
$$ w_{i,j}^2 \le w_{i,i}\, w_{j,j} = 0, \qquad \forall\, i \ne j, $$
which is true if and only if $W$ is a zero matrix.
Since $I$ and $W$ are symmetric, $C$ in (4.75) is positive-definite if and only if all its eigenvalues, say $\lambda_{C,i}$, $i = 1, 2, \cdots, \sum_{j=1}^{k} \delta(j)$, are strictly greater than zero.
But
$$ \lambda_{C,i} = 1 + c\,\lambda_{W,i}, \qquad i = 1, 2, \cdots, \sum_{j=1}^{k} \delta(j). \tag{4.76} $$
Consequently, $C$ is positive-definite if and only if
$$ \min_i(\lambda_{C,i}) > 0, $$
i.e. if and only if
$$ c < \frac{-1}{\min_i(\lambda_{W,i})}. \tag{4.77} $$
The condition in (4.77) guarantees that $C$ and $V$ are positive-definite, and also that $c_{i,j} = c\,w_{i,j}$, $i \ne j$, are coherent correlation values, since, from the positive-definiteness of $C$,
$$ c_{i,j}^2 < c_{i,i}\, c_{j,j} = 1. $$
The software obtains the value of $\min_i(\lambda_{W,i})$ using the eigenvalue decomposition of the matrix $W$. Then the boundary of $c$ in (4.77) is computed.
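As an illustration only, the bound in (4.77) can be sketched in a few lines of Python. This is not the code of the elicitation software: the function names, the dictionary layout for the weights and the use of a hand-written cyclic Jacobi routine for the eigenvalues are all assumptions made here so that the example needs nothing beyond the standard library.

```python
import math

def jacobi_eigenvalues(matrix, max_sweeps=100, tol=1e-24):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that annihilates the (p, q) element.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for i in range(n):  # rotate columns p and q
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p], a[i][q] = c * aip - s * aiq, s * aip + c * aiq
                for i in range(n):  # rotate rows p and q
                    api, aqi = a[p][i], a[q][i]
                    a[p][i], a[q][i] = c * api - s * aqi, s * api + c * aqi
    return sorted(a[i][i] for i in range(n))

def build_w(block_sizes, weights):
    """W has zero diagonal blocks; every element of block (i, j) equals weights[(i, j)], i < j."""
    owner = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(owner)
    w = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for s in range(n):
            i, j = owner[r], owner[s]
            if i != j:
                w[r][s] = weights[(min(i, j), max(i, j))]
    return w

def upper_bound_for_c(block_sizes, weights):
    """Least upper bound on c from (4.77): c < -1 / min eigenvalue of W."""
    lam_min = jacobi_eigenvalues(build_w(block_sizes, weights))[0]
    if lam_min >= 0.0:
        raise ValueError("W is a zero matrix, so c is unbounded")
    return -1.0 / lam_min

# Two coefficient vectors of sizes 2 and 1 with a single relative weight of 1.
bound = upper_bound_for_c([2, 1], {(0, 1): 1.0})
```

For this small case the eigenvalues of $W$ are $-\sqrt{2}$, $0$ and $\sqrt{2}$, so any positive $c$ below $1/\sqrt{2}$ keeps $C = I + cW$ positive-definite.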
With the software, different options are available to the expert for assessing a value of $c$ that fulfils condition (4.77). The default option is to use a slider. The expert chooses the value of $c$ that represents her opinion on the basis of interactive graphs. Specifically, the software displays a panel with $k$ graphs, as illustrated in Figure 4.5.
[Figure 4.5: software screenshot. Instruction shown on screen: "Given the changes of (Y) on the upper panel, give your new assessments on all the lower panels using the slider." Lower-panel title: "Medians of (Y) at values of (X4) conditional on the above changes of (X1)."]
Figure 4.5: Assessments needed for various correlated variables
The upper graph shows the slopes for one continuous covariate after each of its slopes has been changed by a fixed amount, $\eta$. This covariate is one of the most highly correlated pair of vectors. In the same manner as in Section 4.3.3, the expert is asked to assess the new medians of all other $k - 1$ covariates (factors) given the change in the above graph. Apart from the condition in (4.77), other equations needed for drawing the graphs are exactly as in Section 4.3.3.
Instead of using the slider and all graphs in Figure 4.5, another two options are also available to the expert after assessing the relative weights $w^*_{i,j}$. As the first option, the expert can choose to use the method proposed in Section 4.2, to elicit different covariances for the elements of the most highly correlated pair, say $\underline{\beta}_s$ and $\underline{\beta}_t$. An averaging argument as in (4.71) is then used to get $c^*_{s,t}$. As the second option, the expert might use the method proposed in Section 4.3.1 to obtain $c^*_{s,t}$. In both cases, the value of $c$ may be taken, for a small $\epsilon > 0$, as
c = mm
min(Aw;i)
(4.78)
The expert may choose the option that suits her most. For example, the option that combines the current method with the one in Section 4.2.1 is flexible although it requires more conditional assessments. However, we favour the default option as it gives the expert a good chance to see how all the other covariates are affected by her choice of $c$.
The expert can, of course, go back in the software to change her assessed values of $w^*_{i,j}$, if she finds that the allowable range of $c$ is not a reasonable representation of her opinion.
4.5
Concluding comments
Three different methods for eliciting expert opinion about the variance-covariance matrix of correlated coefficients in GLMs have been proposed.
The first method is the most flexible for modelling correlations between pairs of vectors; it is a good method if correlations are only substantial between a few pairs of variables, while the other correlations are near zero. However, it needs many assessments if many variables are inter-related, and the number may become uncomfortably large. The positive-definiteness of the resulting matrix has only been investigated in the case of two vectors of correlated coefficients. No clear conditions have been investigated for the positive-definiteness of the whole matrix if many vectors of coefficients are thought to be correlated.
The second proposed method requires fewer assessments and has been shown to be a valid method for any number of vectors of correlated coefficients. Also, the required conditions for positive-definiteness of the covariance matrix in this method have been investigated. These were translated into boundaries for conditional assessments on the interactive graphs, which help the expert fulfil the conditions. The disadvantage of the method is that it makes strong assumptions about the correlation structure between two vectors of coefficients, and sometimes the assumptions will be inappropriate.
The third proposed method requires a smaller number of assessments. For $n > 2$ correlated vectors of coefficients, the expert is required to make only $n(n-1)/2$ assessments of relative magnitudes of correlations between pairs of vectors. This leads to coherent estimates of correlations and a scaled variance-covariance matrix that is guaranteed to be positive-definite. The needed conditional medians can be easily elicited from the expert by the movement of one slider using the available user-friendly software. The method has been shown to give flexible options to the expert as an extension of the first or the second proposed methods. This third method is very promising. It also avoids incremented conditioning and treats all covariates symmetrically.
Chapter 5
Eliciting prior distributions for extra parameters of some GLMs
5.1
Introduction
So far, we have completed the process of eliciting the multivariate prior distribution for the vector of regression coefficients of any GLM. However, in some common GLMs, such as the normal and gamma regression models, the regression parameters are not the only parameters in the sampling model. The other parameters in these GLMs must be either assumed known or expert opinion about them must be quantified in a suitable way.
In normal GLMs, prior opinion about regression coefficients can be quantified using the methods discussed in the previous two chapters. However, prior opinion about the error variance in normal GLMs must also be quantified to complete the prior distribution of all the model parameters.
A limited number of elicitation methods for the error variance in normal linear models have been proposed in the literature. See, for example, Kadane et al. (1980), Garthwaite and Dickey (1988) and Ibrahim and Laud (1994). However, these available methods have been criticized for using assessment tasks that the expert may not be very good at performing (Garthwaite et al., 2005).
The method of Garthwaite and Dickey (1988) elicits a conjugate inverted chi-squared prior distribution for the error variance through conditional assessments that depend only upon the experimental error. The expert is required to assess her median of the absolute difference between two observed values of the response variable at the same design point. Then a conditional median of the same difference is assessed given a set of hypothetical data. These two assessments are sufficient to elicit the two hyperparameters of the inverted chi-squared prior of the normal error variance. However, it is better to specify several data sets and get a conditional median for each data set, so that the different assessments can be reconciled to elicit the two hyperparameters. In this chapter, we propose an elicitation method based on more than one data set of hypothetical future samples.
The second task addressed in this chapter is to assess prior distributions for the shape parameter of a gamma distribution and the scale parameter of gamma GLMs. Prior distributions for these parameters have been proposed in the literature [see, for example, Miller (1980), West (1985) or Chen and Ibrahim (2003)], but no prior elicitation method for these parameters has been suggested. To fill this gap, we propose a new method for eliciting lognormal prior distributions for such parameters. The proposed method is based on conditional quartile assessments given that the mean of the gamma distribution is known or has already been elicited.
In Section 5.2 of this chapter, we extend the method of Garthwaite and Dickey (1988) for eliciting the variance of random errors in normal GLMs. A novel method for eliciting a lognormal prior distribution for the scale parameter in gamma GLMs is proposed in Section 5.3. The two methods have been implemented as extra options in our elicitation software PEGS-GLM (Correlated Coefficients) that is freely available at http://statistics.open.ac.uk/elicitation.
5.2
Eliciting a prior distribution for the error variance in normal GLMs
The method of Garthwaite and Dickey (1988) is based on conditional assessments that depend only on the random error to elicit a conjugate inverted chi-squared prior distribution for the normal error variance. In their method, the expert is asked to assume that two observations are taken at the same design point. Then she assesses her median of their absolute difference; the two observations differ only because of random variation.
The method has also been used to quantify experts' opinion about multivariate normal distributions [Al-Awadhi and Garthwaite (1998, 2001), Garthwaite and Al-Awadhi (2001)]. However, it has been criticized for eliciting only the minimum number of assessments that are required to determine the hyperparameters. To overcome this, Garthwaite et al. (2005) suggested that it is a good idea to elicit more than one estimate of the hyperparameters and to then reconcile these estimates in some way.
The aim of this section is to extend the method of Garthwaite and Dickey (1988) by increasing the size and frequency of the hypothetical (virtual) sample data that are used as the conditioning set on which the expert modifies her opinion. Our extended method is designed to elicit a conjugate prior for the error variance in normal GLMs. This will complete the prior distribution structure of these models when the prior distribution of their regression coefficients is elicited using the piecewise-linear model discussed in the previous chapters. However, the method developed here can be used to elicit the prior distribution of the error variance in any normal model where the prior distribution of its regression coefficients is totally known or has been elicited using any other elicitation method.
The theoretical derivation of the proposed extension is detailed in Section 5.2.1. The implementation of the method has been programmed as a new option in the PEGS-GLM (Correlated Coefficients) software. The assessment tasks and the description of the procedure that implements our proposed method are discussed in Section 5.2.2.
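5.2.1
The mathematical framework and notation
The normal GLM assumes that the link function $g(.)$ in (3.3) is the identity link function, which means, in view of (3.2), that
$$ \zeta = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{m+n} X_{m+n} + \epsilon, \tag{5.1} $$
where $\epsilon$ is assumed to be a normal random error with zero mean and an unknown variance $\sigma_\epsilon^2$, i.e.
$$ \epsilon \sim N(0, \sigma_\epsilon^2). \tag{5.2} $$
A conjugate prior for $\sigma_\epsilon^2$ is the inverted chi-squared distribution [see, for example, Pratt et al. (1995), Kadane et al. (1980) or Garthwaite and Dickey (1988)]. Equivalently, we assume that
$$ \sigma_\epsilon^2 \sim \text{Inverted Gamma}(\nu/2, \nu w/2), \tag{5.3} $$
with pdf
$$ f(\sigma_\epsilon^2 \mid \nu, w) = \frac{(\nu w/2)^{\nu/2}}{\Gamma(\nu/2)} \left( \frac{1}{\sigma_\epsilon^2} \right)^{\frac{\nu}{2}+1} \exp\left\{ -\frac{\nu w}{2\sigma_\epsilon^2} \right\}, \qquad \sigma_\epsilon^2, \nu, w > 0. \tag{5.4} $$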
The aim now is to elicit the values of the hyperparameters $\nu$ and $w$ of the pdf in (5.4). To attain this, the expert should preferably be asked to assess values that depend only on the random variation. For that, the method of Garthwaite and Dickey (1988) requires the expert to assess a median value, say $q_0$, of the absolute difference, $|\zeta_1 - \zeta_2|$, between two observed values of the response variable $\zeta$ at the same design point $(X_1, X_2, \cdots, X_{m+n})$.
The expert is then asked to assume that the true value of this absolute difference is a suggested value $z$. Given this piece of information, she gives her new median assessment, say $q_1$, of the absolute difference between two observations for any new hypothetical experiments at the same design point $(X_1, X_2, \cdots, X_{m+n})$. The difference between $q_0$ and the new median assessment, $q_1$, reflects the expert's confidence in her first median assessment $q_0$. Then both $q_0$ and $q_1$ were used in Garthwaite and Dickey (1988) to calculate the two hyperparameters $\nu$ and $w$.
To extend their method, instead of conditioning on only one hypothetical datum $z$, we repeat the assessment of the conditional median for a number of $s$ steps. At each step, the condition is on a steadily increasing set of hypothetical data representing the response differences for pairs of experiments at the same design point.
At each step $j$, $j = 1, 2, \cdots, s$, the expert is asked to assume that a number $k(j) = 2^{j-1}$ of experiment pairs at the same design point has given a hypothetical data set of absolute differences, $z_1, z_2, \cdots, z_{k(j)}$. She is then asked to give her conditional median $q_j$ of the absolute response difference of a new pair of experiments at the same design point. In what follows, we show how to use these assessments to estimate a number of elicited values that can be reconciled to give a better assessment of $\nu$ and $w$.
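For $i = 1, \cdots, k$, where $k \ge 1$ is any integer, let $Z_i$ be the difference between the two observed values, $\zeta_{i,1}$ and $\zeta_{i,2}$, of the response variable $\zeta$ in any two experiments at the same design point $(X_1, \cdots, X_{m+n})$, i.e. $Z_i = \zeta_{i,1} - \zeta_{i,2}$.
Clearly, from (5.1) and (5.2), given $\sigma_\epsilon^2$, the random variables $Z_1, \cdots, Z_k$ are independent and identically distributed normal variates, i.e. for $i = 1, 2, \cdots, k$,
$$ Z_i \mid \sigma_\epsilon^2 \sim N(0, 2\sigma_\epsilon^2), \tag{5.5} $$
with the joint distribution
$$ f(z_1, \cdots, z_k \mid \sigma_\epsilon^2) = (4\pi\sigma_\epsilon^2)^{-k/2} \exp\left\{ -\frac{1}{4\sigma_\epsilon^2} \sum_{i=1}^{k} z_i^2 \right\}, \qquad -\infty < z_i < \infty, \quad \sigma_\epsilon^2 > 0. \tag{5.6} $$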
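From (5.4) and (5.6), the joint distribution of $Z_1, \cdots, Z_k$ and $\sigma_\epsilon^2$ is given by
$$ f(z_1, \cdots, z_k, \sigma_\epsilon^2 \mid \nu, w) = \frac{(\nu w/2)^{\nu/2}}{\Gamma(\nu/2)\,(4\pi)^{k/2}} \left( \frac{1}{\sigma_\epsilon^2} \right)^{\frac{\nu+k}{2}+1} \exp\left\{ -\frac{1}{4\sigma_\epsilon^2} \left[ \sum_{i=1}^{k} z_i^2 + 2\nu w \right] \right\}, $$
$$ -\infty < z_i < \infty, \qquad \sigma_\epsilon^2, \nu, w > 0. \tag{5.7} $$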
Integrating $\sigma_\epsilon^2$ out from the RHS of (5.7), we get
$$ f(z_1, \cdots, z_k \mid \nu, w) = \frac{\Gamma((\nu+k)/2)}{\Gamma(\nu/2)\,[\pi\nu(2w)]^{k/2}} \left[ 1 + \frac{\sum_{i=1}^{k} z_i^2}{\nu(2w)} \right]^{-\frac{\nu+k}{2}}, \qquad -\infty < z_i < \infty, \quad \nu, w > 0, \tag{5.8} $$
which is the $k$-variate version of the general three-parameter Student-$t$ distribution with $\nu$ degrees of freedom, zero mean vector and a diagonal scale matrix $2wI_k$, where $I_k$ is the identity matrix of order $k$, i.e.
$$ (Z_1, \cdots, Z_k)' \sim t_\nu(\underline{0}, 2wI_k). \tag{5.9} $$
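Now, the conditional distribution of $\sigma_\epsilon^2$ given $Z_1 = z_1, \cdots, Z_k = z_k$, can be obtained by dividing the RHS of (5.7) by that of (5.8) to get
$$ f(\sigma_\epsilon^2 \mid Z_1 = z_1, \cdots, Z_k = z_k; \nu, w) = \frac{\left[ \frac{1}{4}\left( \sum_{i=1}^{k} z_i^2 + 2\nu w \right) \right]^{(\nu+k)/2}}{\Gamma((\nu+k)/2)} \left( \frac{1}{\sigma_\epsilon^2} \right)^{\frac{\nu+k}{2}+1} \exp\left\{ -\frac{1}{4\sigma_\epsilon^2} \left[ \sum_{i=1}^{k} z_i^2 + 2\nu w \right] \right\}, $$
$$ \sigma_\epsilon^2, \nu, w > 0. \tag{5.10} $$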
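Since the inverted gamma distribution is a conjugate prior for $\sigma_\epsilon^2$, comparing (5.10) with (5.4), we can write
$$ (\sigma_\epsilon^2 \mid Z_1 = z_1, \cdots, Z_k = z_k) \sim \text{Inverted Gamma}\left( \frac{\nu+k}{2}, \frac{(\nu+k)\,w_k}{2} \right), \tag{5.11} $$
where
$$ w_k = \frac{\nu w + \frac{1}{2} \sum_{i=1}^{k} z_i^2}{\nu + k}. \tag{5.12} $$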
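The conjugate update in (5.11) and (5.12) is easy to check numerically. The sketch below is illustrative only: the function name is hypothetical, and it assumes the reading $w_k = (\nu w + \frac{1}{2}\sum z_i^2)/(\nu + k)$ of (5.12), which is the form that agrees with (5.7) and with the ratio used later in (5.19).

```python
def posterior_hyperparameters(nu, w, z):
    """Update (nu, w) after observing response differences z_1..z_k."""
    k = len(z)
    nu_post = nu + k                                                  # degrees of freedom grow by k
    w_post = (nu * w + 0.5 * sum(zi * zi for zi in z)) / nu_post      # w_k, assumed form of (5.12)
    return nu_post, w_post

nu_post, w_post = posterior_hyperparameters(nu=4.0, w=2.0, z=[2.0, -2.0])

# Parameters of the updated inverted gamma distribution in (5.11).
shape, scale = nu_post / 2.0, nu_post * w_post / 2.0
```

In this example the scale equals $(\sum z_i^2 + 2\nu w)/4 = 6$, which matches the exponent of (5.7).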
For $j = 0, 1, \cdots, s$, define a new set, $Z_{(j)} = \zeta_{(j),1} - \zeta_{(j),2}$, of the response variable differences for two further experiments at the same design point $(X_1, \cdots, X_{m+n})$. The variates in this new set are iid with the same normal distribution as in (5.5).
The conditional distribution of $(Z_{(j)} \mid Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)})$, with $k(j) = 2^{j-1}$, for $j = 1, \cdots, s$, is given by
$$ f(z_{(j)} \mid Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)}) = \int_{\sigma_\epsilon^2 = 0}^{\infty} f(z_{(j)} \mid \sigma_\epsilon^2) \times f(\sigma_\epsilon^2 \mid Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)})\, d\sigma_\epsilon^2. \tag{5.13} $$
Using the normal distribution in (5.5), and putting $k = k(j)$ in (5.11), the integrand in (5.13) is similar to the RHS of (5.7) with $k$ set equal to 1, $\nu$ replaced by $\nu + k(j)$ and $w$ replaced by $w_{k(j)}$, with $k$ set equal to $k(j)$ in (5.12).
As in (5.8) and (5.9), integrating $\sigma_\epsilon^2$ out from (5.13) gives
$$ (Z_{(j)} \mid Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)}) \sim t_{\nu+k(j)}(0, 2w_{k(j)}), \tag{5.14} $$
for $j = 1, \cdots, s$.
Similarly, for $j = 0$, the marginal unconditional distribution of $Z_{(0)}$ is obtained, from (5.4) and (5.5), as
$$ Z_{(0)} \sim t_\nu(0, 2w). \tag{5.15} $$
As will be discussed in the next section, under reasonable choices of the conditioning values $z_1, \cdots, z_{k(j)}$, the expert assesses her median of the absolute value for each of the Student-$t$ distributions in (5.14) and (5.15). These are exactly the upper quartiles of the $t$-variates, from symmetry about zero.
Let the assessed upper quartile of $Z_{(0)}$ and $(Z_{(j)} \mid Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)})$, for $j = 1, \cdots, s$, be denoted by $q_0$ and $q_j$, respectively. If we denote the upper quartile of a standard Student-$t$ distribution with $\nu$ degrees of freedom by $Q_\nu$, then we have
$$ q_0 = (2w)^{1/2}\, Q_\nu, \tag{5.16} $$
and
$$ q_j = (2w_{k(j)})^{1/2}\, Q_{\nu+k(j)}, \tag{5.17} $$
for $j = 1, \cdots, s$.
The aim now is to solve the above pairs of equations for $\nu$ and $w$. By division, for each pair, we get
$$ \frac{q_0}{q_j} = \frac{Q_\nu}{Q_{\nu+k(j)}} \left[ \frac{w}{w_{k(j)}} \right]^{1/2}. \tag{5.18} $$
Using (5.12), (5.16), we can eliminate $w$ from (5.18), to get
$$ \frac{q_0}{q_j} = \frac{Q_\nu}{Q_{\nu+k(j)}} \left[ \frac{\nu + k(j)}{\nu + Q_\nu^2 \sum_{i=1}^{k(j)} (z_i/q_0)^2} \right]^{1/2}, \tag{5.19} $$
for $j = 1, \cdots, s$.
For each value of $j$, the assessed ratio $q_0/q_j$ is used by the software to search for the value of the degrees of freedom $\nu$, say $\nu_j$, that solves equation (5.19).
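The search for $\nu_j$ can be sketched as a one-dimensional root-finding problem. The code below is an assumption-laden illustration, not the search routine of the software: it takes the right-hand side of (5.19) to be $\frac{Q_\nu}{Q_{\nu+k(j)}}\big[\frac{\nu+k(j)}{\nu+Q_\nu^2\sum(z_i/q_0)^2}\big]^{1/2}$, computes Student-$t$ quartiles by Simpson integration and bisection so that only the standard library is needed, and uses bisection on $\nu$ over a hypothetical search interval $[1, 200]$, relying on the ratio being decreasing in $\nu$.

```python
import math

def t_pdf(x, nu):
    """Density of the standard Student-t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, n=100):
    """CDF for x >= 0 via composite Simpson integration of the density on [0, x]."""
    h = x / n
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def upper_quartile(nu):
    """Q_nu, the 0.75 quantile, by bisection; valid for nu >= 1 where 0.67 < Q_nu <= 1."""
    lo, hi = 0.5, 1.5
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < 0.75:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ratio(nu, z, q0):
    """Assumed right-hand side of (5.19): q0/qj as a function of nu."""
    k = len(z)
    c = sum((zi / q0) ** 2 for zi in z)
    q_nu = upper_quartile(nu)
    return q_nu / upper_quartile(nu + k) * math.sqrt((nu + k) / (nu + q_nu ** 2 * c))

def solve_nu(q0, qj, z, nu_min=1.0, nu_max=200.0):
    """Bisection on nu; returns an endpoint if the target ratio is out of range."""
    target = q0 / qj
    lo, hi = nu_min, nu_max
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if ratio(mid, z, q0) > target:   # ratio decreases in nu, so nu is still too small
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: build the q_j implied by nu = 4 and recover nu from it.
q0 = 10.0
z = [q0 / 2]
qj = q0 / ratio(4.0, z, q0)
nu_hat = solve_nu(q0, qj, z)
```

The round trip recovers $\nu \approx 4$; with a real assessment $q_j$ the same call would return the elicited $\nu_j$, provided the monotonicity condition discussed next holds.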
To guarantee the existence of a unique solution for $\nu$ using this approach, two conditions must be imposed on the function in (5.19). It must be strictly monotonic in $\nu$ on the interval of concern. For statistical coherence, the assessed quartile, $q_j$, must also be above a lower limit, say $a_j$, for $j = 1, 2, \cdots, s$.
To satisfy the latter condition, we assume that there is a reasonable minimum value of the elicited degrees of freedom, say $\min(\nu)$. Since $q_0$ has already been assessed, using the extreme value $\min(\nu)$ in the RHS of (5.19) gives the lower limit of $q_j$, as follows:
$$ a_j = q_0\, \frac{Q_{\min(\nu)+k(j)}}{Q_{\min(\nu)}} \left[ \frac{\min(\nu) + Q_{\min(\nu)}^2 \sum_{i=1}^{k(j)} (z_i/q_0)^2}{\min(\nu) + k(j)} \right]^{1/2}, \tag{5.20} $$
for $j = 1, 2, \cdots, s$.
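Setting this limit, we can now investigate the monotonicity condition. In fact, the monotonicity of (5.19) as a function of $\nu$ is required to ensure that there exists a unique value $\nu_j > \min(\nu)$ that satisfies (5.19) for $q_j > a_j$, $j = 1, 2, \cdots, s$.
In (5.19), if we put
$$ C_j = \sum_{i=1}^{k(j)} (z_i/q_0)^2, \tag{5.21} $$
then the first derivative of $q_0/q_j$ with respect to $\nu$ will take the form
$$ \frac{\partial (q_0/q_j)}{\partial \nu} = \frac{C_j Q_\nu^3 \left[ Q_{\nu+k(j)} - 2(\nu+k(j))\, Q'_{\nu+k(j)} \right] + 2\nu(\nu+k(j)) \left( Q'_\nu Q_{\nu+k(j)} - Q_\nu Q'_{\nu+k(j)} \right) - k(j)\, Q_\nu Q_{\nu+k(j)}}{2\,(\nu+k(j))^{1/2}\, (C_j Q_\nu^2 + \nu)^{3/2}\, Q_{\nu+k(j)}^2}, $$
where $Q'_\nu$ denotes the derivative of $Q_\nu$ with respect to $\nu$.
So, for all $\nu \ge \min(\nu)$,
$$ \frac{\partial (q_0/q_j)}{\partial \nu} < 0 $$
if and only if
$$ C_j < \min_\nu \left\{ \frac{k(j)\, Q_\nu Q_{\nu+k(j)} - 2\nu(\nu+k(j)) \left( Q'_\nu Q_{\nu+k(j)} - Q_\nu Q'_{\nu+k(j)} \right)}{Q_\nu^3 \left[ Q_{\nu+k(j)} - 2(\nu+k(j))\, Q'_{\nu+k(j)} \right]} \right\} = C_{j,0}. \tag{5.22} $$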
Since there does not exist a closed form for the derivative of a Student-$t$ quantile with respect to its degrees of freedom, the values of $C_{j,0}$ cannot be found analytically. Instead, these values have been computed numerically using Maple 14 software, for $s = 5$, $\nu \in [1, 50]$. Figure 5.1 lists these values of $C_{j,0}$, where the derivative $\partial(q_0/q_j)/\partial\nu$ is plotted against $\nu$ and $C_j$, for $j = 1, 2, \cdots, 5$.
[Figure 5.1: five three-dimensional surface plots of $\partial(q_0/q_j)/\partial\nu$ against $\nu$ and $C_j$, one per sample size. Panel annotations: for $k(1) = 1$, $C_{1,0} = 1.626$; for $k(2) = 2$, $C_{2,0} = 3.367$; for $k(3) = 4$, $C_{3,0} = 6.950$; for $k(4) = 8$, $C_{4,0} = 14.222$; for $k(5) = 16$, $C_{5,0} = 28.846$. For $j = 1, 2, \cdots, 5$, $C_{j,0}$ is such that $\partial(q_0/q_j)/\partial\nu < 0$, for all $1 \le \nu \le 50$, if and only if $C_j < C_{j,0}$.]
Figure 5.1: Three-dimensional plots of $\partial(q_0/q_j)/\partial\nu$ against $\nu$ and $C_j$ for various sample sizes $k(j)$.
It can be seen from Figure 5.1 that
$$ C_{1,0} < \frac{C_{j,0}}{k(j)}, \qquad \text{for } j = 2, 3, 4, 5. \tag{5.23} $$
Now, from (5.21), (5.22), (5.23) and Figure 5.1, we can state that the function in (5.19) is strictly monotonic decreasing in $\nu$, for all $1 \le \nu \le 50$ and $j = 1, 2, \cdots, 5$, if and only if
$$ \frac{\sum_{i=1}^{k(j)} (z_i/q_0)^2}{k(j)} < 1.626. \tag{5.24} $$
Although we have not examined the case where $\nu > 50$, Figure 5.1 suggests that (5.24) holds for $\nu \ge 1$.
In the implementation of the method, the software generates the values of $z_i$ that satisfy (5.24). Hence, for $j = 1, 2, \cdots, 5$, a unique solution $\nu_j$ can be obtained from (5.19), then the corresponding $w_j$ can be obtained by substituting $\nu_j$ for $\nu$ in (5.16). We then reconcile the five different values of the degrees of freedom parameter $\nu$ by taking their geometric mean.
When averaging different assessments of a degrees of freedom parameter, taking their geometric mean is favored, by empirical evidence, rather than their arithmetic mean. See, for example, Al-Awadhi (1997), Al-Awadhi and Garthwaite (1998) or Garthwaite et al. (2005). The elicited value of $w$ can then be obtained from (5.16) by substituting for $\nu$ with the geometric mean of $\nu_1, \cdots, \nu_5$.
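The reconciliation step is a plain geometric mean. As a small check, the sketch below applies it to the five step values of the degrees of freedom reported in the output table of Figure 5.3; the helper name is ours, not the software's.

```python
import math

# Elicited degrees of freedom for steps 1 to 5, as listed in Figure 5.3.
nus = [3.6740, 3.0870, 4.5280, 4.3840, 2.7370]

def geometric_mean(values):
    """exp of the mean log; defined for strictly positive values only."""
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))

nu_bar = geometric_mean(nus)           # reconciled degrees of freedom
arithmetic = sum(nus) / len(nus)       # shown only for comparison
```

The geometric mean is about 3.61, in line with the "Average" row of Figure 5.3, and lies below the arithmetic mean, so it is less influenced by unusually large step values.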
Finally, we assume that the vector of regression coefficients $\underline{\beta} = (\alpha, \beta_1, \cdots, \beta_{m+n})$ is independent of $\sigma_\epsilon^2$ a priori, and give the full prior structure of the normal GLM as
$$ f(\underline{\beta}, \sigma_\epsilon^2) = f_1(\underline{\beta})\, f_2(\sigma_\epsilon^2), \tag{5.25} $$
where $f_1(\underline{\beta})$ can be taken as the multivariate normal prior distribution elicited in the previous chapters, and $f_2(\sigma_\epsilon^2) = f(\sigma_\epsilon^2 \mid \nu, w)$ as given in (5.4) with the elicited hyperparameters $\nu$ and $w$.
5.2.2
Implementation and assessment tasks
The elicitation method proposed in the previous section has been programmed into the PEGS-GLM (Correlated Coefficients) software by the author of this thesis. The option of eliciting the prior distribution of the random error variance is given to the expert once she selects her model as an "ordinary linear regression" model. The same procedure has also been programmed in a separate piece of software that can be used as an add-on to any other elicitation software for normal models. This developed software is freely available as PEGS-Normal at http://statistics.open.ac.uk/elicitation.
In a dialogue box, the expert is asked to assume that two independent experiments have been conducted at the same design point, i.e. at the same values of the explanatory variables. She then assesses her median value, $q_0$, of the absolute difference, $|Z_{(0)}|$, between the observed values of the response variable after these two virtual experiments.
Since the distribution of $Z_{(0)}$ is symmetric about zero, see (5.15), the assessed median $q_0$ of $|Z_{(0)}|$ is exactly the upper quartile of $Z_{(0)}$. In fact, $\Pr\{|Z_{(0)}| < q_0\} = 0.5$ implies that $\Pr\{-q_0 < Z_{(0)} < q_0\} = 0.5$, which implies from symmetry that $\Pr\{Z_{(0)} < q_0\} = 0.75$. Similarly, from (5.14), each upper quartile $q_j$, for $j = 1, \cdots, s$, will be assessed as the median of the absolute difference $|Z_{(j)}|$ given that $Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)}$.
In assessing the remaining conditional medians $q_j$, the choice of the conditioning values $z_1, z_2, \cdots, z_{k(j)}$, for $j = 1, 2, \cdots, 5$, is an important issue. As mentioned before, the method of Garthwaite and Dickey (1988) uses only one hypothetical data point $z_1$, for which they suggested a value of $z_1 = q_0/2$. They argued that this choice will give a conditioning value that is not too close to $q_0$, so as to prompt a significant change in the expert's opinion in assessing $q_1$. This value of $z_1$ is, at the same time, not too far from $q_0$, so as to represent an acceptable value for the expert to condition on.
In our implementation of the extended method, the above two criteria will be considered in choosing values for $z_i$, $i > 1$. This means that the values should result in a considerable change in the expert's opinion, while the expert still finds them plausible values. To attain this, we take $z_1 = q_0/2$, following Garthwaite and Dickey (1988). Then we generate four extra sets of hypothetical data, for $j = 2, \cdots, 5$; the $j$th set consists of $k(j) = 2^{j-1}$ data points. The first $2^{j-2}$ data points of each set, namely $z_1, \cdots, z_{2^{j-2}}$, are taken as the same elements of the previous data set, while the new extra elements $z_{2^{j-2}+1}, \cdots, z_{2^{j-1}}$, are generated as follows.
For $i = 2^{j-2}+1, \cdots, 2^{j-1}$, we generate $z_i$ as random variates from a population with a median of $q_0/2$. Hence, we choose each $z_i$ as the absolute value of a normal variate with zero mean and a variance of $(q_0/1.349)^2$. Thus, the interquartile range of this normal distribution is $q_0$, and the upper quartile of the signed variates, which is also the median of the unsigned ones, is exactly $q_0/2$.
For any data set $j$, $j = 2, \cdots, 5$, if the generated values fail to satisfy condition (5.26), we resample the new elements $z_{2^{j-2}+1}, \cdots, z_{2^{j-1}}$ from the same normal distribution, until (5.26) is satisfied. This guarantees that the generated data should prompt the expert to revise her opinion by a substantial amount.
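The generation of the nested hypothetical data sets can be sketched as follows. This is a hedged illustration rather than the software's generator: the function name, the seed argument and the retry cap are assumptions, and the acceptance check used here is the monotonicity condition (5.24), i.e. $\sum (z_i/q_0)^2 / k(j) < 1.626$; the further condition (5.26) is not reproduced.

```python
import random

def generate_data_sets(q0, steps=5, bound=1.626, seed=None, max_tries=10_000):
    """Nested sets of absolute differences with sizes 1, 2, 4, ..., 2**(steps - 1)."""
    rng = random.Random(seed)
    sigma = q0 / 1.349          # normal IQR equals q0, so the median of |N(0, sigma^2)| is q0/2
    data = [q0 / 2.0]           # step 1 uses the single value z_1 = q0/2
    sets = [list(data)]
    for j in range(2, steps + 1):
        n_new = 2 ** (j - 2)    # as many new points as old ones, so the set doubles
        for _ in range(max_tries):
            new = [abs(rng.gauss(0.0, sigma)) for _ in range(n_new)]
            candidate = data + new
            # Accept only if the monotonicity condition (5.24) holds for the whole set.
            if sum((z / q0) ** 2 for z in candidate) / len(candidate) < bound:
                break
        else:
            raise RuntimeError("no acceptable data set found")
        data = candidate
        sets.append(list(data))
    return sets

sets = generate_data_sets(q0=10.0, seed=1)
```

Each set keeps the previous one as its first half, which mirrors the way earlier and newly generated points are displayed together at each step.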
To implement the proposed procedure, the expert is asked to perform an assessment task that consists of $s = 5$ steps. In each step $j$, for $j = 1, \cdots, 5$, the software presents an interactive graph to the expert. The graph in Figure 5.2 is an example of the graph presented to the expert by the software at step $j = 3$.
[Figure 5.2: software screenshot; the legible part of the window title reads "the distribution of the normal error variance". On-screen instructions: "The thick black line marks your original assessment of the median difference between two responses at the same design point. But suppose two experiments were conducted at each of a number of design points. The difference between each pair of responses is marked by an arrow in the diagram: the new differences are marked by green arrows and earlier ones (or the earlier one) by black arrows. The median value of these arrows is also marked by a downward-pointing arrow. If we again ran another two experiments at one design point, their values are again likely to differ; how big do you think their difference would be? Please give your median assessment by clicking on the horizontal line. (Your assessment should be between the red marks.)" Graph title: "Eliciting Conditional Medians of the Absolute Difference of two responses at the same design point". Horizontal axis: Absolute Difference, from 0.0 to 12.0. Annotations: "Median of 4 data points"; "Assessed median at 4.8926".]
Figure 5.2: Assessing a median value conditioning on a set of data
This graph shows the expert's first unconditional median $q_0$ drawn as the thick black long line and the most recently assessed median, from the second step, as the other thick long line, drawn in green. The graph also shows a number of generated data points $z_1, \cdots, z_4$, represented by upward arrows, together with a downward arrow that shows the sample median of this virtual data set.
The upward arrows of the data points from the previous set of hypothetical data, $z_1$ and $z_2$, are shown in the green color, while the upward arrows of the new generated data points, $z_3$ and $z_4$, are shown in the black color.
Given the virtual data set (displayed as arrows), the expert is asked to assess her current median value $q_3$ by clicking on the horizontal line between the two short red lines. These are the lower limit $a_3$ computed as in (5.20) with $\min(\nu) = 1$ and the initial assessment $q_0$. The expert's median must lie between the red boundaries, otherwise she will get a warning message asking her to re-assess her median and satisfy this condition.
To assess $q_3$, the expert has two obvious strategies. The first strategy (the black one) is to look at the black line that shows her initial assessment $q_0$, and decide where to revise this value in the light of the new information given by the black downward arrow that shows the median of the whole hypothetical data set $z_1, \cdots, z_4$. The other strategy (the green one) is for the expert to look at the green line that shows her most recent median assessment, which has been based on the hypothetical data set in green arrows, $z_1$ and $z_2$. She then decides where to revise this median assessment in the light of the new generated points $z_3$ and $z_4$ shown as the black arrows.
With both of these strategies, if the expert is confident about her previous assessment, then her new median assessment should be near to this value rather than near to the new hypothetical data. When the expert gives her new median assessment $q_3$, its value is first used by the software to compute $\nu_3$ from (5.19), and then to compute $w_3$ from (5.16) using $\nu_3$.
The final output of the procedure, as illustrated in Figure 5.3, gives the five different elicited pairs of $\nu$ and $w$, together with the geometric mean of $\nu$ and its corresponding value of $w$. The expert is asked to check whether the different elicited values are close to each other and represent her opinion well. If not, she has the option to change any of them by going back to reassess a specific $q_j$ through pressing the corresponding 'Change' button for this step, see Figure 5.3.
[Figure 5.3: software screenshot, window "GLM ELICITATION (eliciting the distribution of the normal error variance)". Each step row has a 'Change' button.]
Step      Elicited value of DF    Elicited value of W
1         3.6740                  32.2546
2         3.0870                  30.9910
3         4.5280                  33.3489
4         4.3840                  33.3635
5         2.7370                  30.0095
Average   3.6136                  32.1419
Figure 5.3: The output table showing the elicited hyperparameters
After the expert has finished making any revision, the hyperparameters $\nu$ and $w$ are set equal to the two values in the last row of the table illustrated in Figure 5.3.
5.3
Eliciting a prior distribution for the scale parameter in gamma GLMs
In this section, we propose a novel method for eliciting a lognormal prior distribution for the scale parameter of a gamma GLM. It is well-known that the scale parameter of a gamma GLM, which is the reciprocal of the dispersion parameter, is in fact the shape parameter of the gamma distribution. Our new method is a valid means of eliciting the shape parameter of any gamma distribution once the distribution's mean has been elicited (or the mean is assumed to be known).
Bayesian methods have been developed for analyzing data to estimate the shape parameters of a gamma distribution, or the scale parameters of a gamma GLM. Miller (1980) proposed a general conjugate class of priors for the two parameters of the gamma distribution, but he gave no method of eliciting its hyperparameters. Sweeting (1981) introduced some suggestions for the Bayesian estimation of the scale parameters in exponential families. The problem of unknown scale parameters in GLMs was examined by West (1985). In his work, he discussed general ideas concerning scale parameters and variance functions in non-normal models including gamma GLMs (see also West et al. (1985)). However, there does not seem to be a good method of eliciting a prior distribution for such parameters. Ibrahim and Laud (1991) suggested a Jeffreys's prior for the regression coefficients and an independent marginal informative prior on the scale parameter of a gamma GLM, but they did not suggest any family of distributions for this informative prior. The method of Bedrick et al. (1996), which is considered to be the first elicitation method of informative prior distributions for GLMs, assumed the scale parameter to be known and elicited priors only for the regression coefficients. Chen and Ibrahim (2003) proposed a novel class of conjugate priors for GLMs. They also discussed elicitation issues and strategies for these conjugate priors. Their proposed prior structure involves the dispersion parameter as well. However, no explicit elicitation method was introduced for the dispersion parameter.
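5.3.1
GLMs with a gamma distributed response variable
For a continuous, positive, skewed distributed response variable $\zeta$ in a GLM of the form,
$$ Y = g(\mu) = g(E(\zeta \mid \underline{X})) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m, \tag{5.27} $$
the observations are often assumed to follow a gamma distribution, say Gamma$(\lambda, \theta)$, where $\lambda$ and $\theta$ depend on $\underline{X}$. Its pdf is
$$ f(\zeta \mid \lambda, \theta) = \frac{\theta^\lambda}{\Gamma(\lambda)}\, \zeta^{\lambda-1} e^{-\theta\zeta}, \qquad \zeta, \lambda, \theta > 0, \tag{5.28} $$
where $\lambda$ is the shape parameter, $\theta$ is the rate parameter or the inverse of the scale parameter. It is well-known that
$$ \mu = E(\zeta) = \lambda/\theta, \qquad \sigma^2 = \mathrm{Var}(\zeta) = \lambda/\theta^2. \tag{5.29} $$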
For the gamma GLM in (5.27), with any monotone increasing link function $g(.)$, the methods discussed in Chapters 3 and 4 can be used to elicit the prior distribution of the regression coefficients
$$ \underline{\beta} = (\alpha, \beta_1, \beta_2, \cdots, \beta_m)', \tag{5.30} $$
which represents the prior distribution of $\mu$, i.e. reflects the prior knowledge about the ratio $\lambda/\theta$. We assume that the prior distribution of this ratio has already been elicited as
$$ g(\lambda/\theta) \sim N(\underline{X}_0' \underline{b},\ \underline{X}_0' \Sigma \underline{X}_0), \tag{5.31} $$
where $\underline{b} = E(\underline{\beta})$, $\Sigma = \mathrm{Var}(\underline{\beta})$, have been assessed using methods given in the previous chapters, and the vector $\underline{X}_0$ denotes all explanatory variables to be at their reference points.
Having elicited this prior for the ratio $\lambda/\theta$, the expert's prior opinion about one of the
hyperparameters $\lambda$ and $\theta$ must be quantified to complete the prior structure of the gamma
GLM. In what follows, expert opinion about the scale parameter $\lambda$ is modelled by
a lognormal prior distribution and we propose an assessment method for determining the
hyperparameters of this distribution. As discussed before, the proposed method can also be
used to elicit the shape parameter $\lambda$ of any gamma distribution.
We base our method on a gamma distribution with $\lambda$ as the only unknown parameter,
assuming $\mu$ to be already assessed or completely known. For gamma GLMs, the elicited
vector $\underline{b}$ can be used to obtain a single value of $\mu$, say $\mu_0$, from (5.31). As we assume that
the link function $g(.)$ is monotonic increasing, the median value of $\lambda/\theta$ is then
$$\mu_0 = g^{-1}(\underline{X}_0' \underline{b}). \tag{5.32}$$
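We take the gamma distributed random variable $\zeta$ defined in (5.28) and change parameters
by putting $\theta = \lambda/\mu$ as in (5.29). This gives
$$f(\zeta \mid \lambda, \mu) = \frac{(\lambda/\mu)^{\lambda}}{\Gamma(\lambda)}\, \zeta^{\lambda-1} e^{-\lambda\zeta/\mu}, \qquad \zeta, \lambda, \mu > 0. \tag{5.33}$$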
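We let
$$W = \frac{\zeta}{\mu}, \tag{5.34}$$
and then the pdf of $W$ will depend only on $\lambda$, i.e. $W \sim \text{Gamma}(\lambda, \lambda)$. This has the form
$$f(w \mid \lambda) = \frac{\lambda^{\lambda}}{\Gamma(\lambda)}\, w^{\lambda-1} e^{-\lambda w}, \qquad w, \lambda > 0. \tag{5.35}$$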
Our aim now is to find some meaningful strictly monotonic function in $\lambda$, such that the
expert can quantify her opinion about this function effectively. The expert cannot answer
questions about $\lambda$ directly, as a gamma distribution parameter has little meaning to an
expert because it is not an observable quantity. Instead, the expert should be asked about
an observable quantity that directly relates to the observable gamma variate, and which can
be monotonically transformed to $\lambda$. The expert can thus be asked about any quantile of
the gamma distribution as an observable quantity, provided that it is a strictly monotonic
function in $\lambda$. In what follows, we show that quantifying the expert opinion about the lower
quartile of the gamma distribution in (5.35) will lead to a full prior distribution for $\lambda$, and
that this quartile is a strictly monotonic function in $\lambda$.
To check the monotonicity of different quantiles in $\lambda$, let $F(w, \lambda, \lambda)$ be the cdf of $W$; then
it can be written in the form of a regularized gamma function as follows
$$F(w, \lambda, \lambda) = \frac{\gamma(\lambda, \lambda, w)}{\Gamma(\lambda)}, \tag{5.36}$$
where $\gamma(\lambda, \lambda, w)$ is a form of the lower incomplete gamma function,
$$\gamma(\lambda, \lambda, w) = \int_{t=0}^{w} \lambda^{\lambda} t^{\lambda-1} e^{-\lambda t}\, dt. \tag{5.37}$$
Note that it differs from the usual lower incomplete gamma function $\gamma(\lambda, w)$ in that the latter
does not contain $\lambda^{\lambda}$ in the integrand.
It is clear that the function $F(w, \lambda, \lambda)$, as a cdf of $W$, is strictly monotonic increasing in
$w$. But, as a function in $\lambda$, the usual cdf
$$F(w, \lambda) = \frac{\gamma(\lambda, w)}{\Gamma(\lambda)}, \tag{5.38}$$
as a regularized gamma function is strictly monotonic decreasing in $\lambda$. The proof of this fact
is given in Tricomi (1952), see also Gautschi (1998).
We next show that the same type of monotonicity is true for the function $F(w, \lambda, \lambda)$ in
(5.36). This helps in finding a range of quantiles that are monotonic functions in $\lambda$.
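In fact, following the note of Koornwinder (2008) for $F(w, \lambda)$, we can write
$$F(w, \lambda, \lambda) = \frac{\gamma(\lambda, \lambda, w)}{\Gamma(\lambda)} = \frac{\gamma(\lambda, \lambda, w)}{\gamma(\lambda, \lambda, w) + \Gamma(\lambda, \lambda, w)}, \tag{5.39}$$
where $\gamma(\lambda, \lambda, w)$ takes the form in (5.37), and $\Gamma(\lambda, \lambda, w)$ is a form of the upper incomplete
gamma function, i.e.
$$\Gamma(\lambda, \lambda, w) = \int_{t=w}^{\infty} \lambda^{\lambda} t^{\lambda-1} e^{-\lambda t}\, dt. \tag{5.40}$$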
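Differentiating (5.39) with respect to $\lambda$, we have
$$\frac{\partial F(w, \lambda, \lambda)}{\partial \lambda} = \frac{-1}{\Gamma^2(\lambda)} \left\{ \gamma(\lambda, \lambda, w) \frac{\partial \Gamma(\lambda, \lambda, w)}{\partial \lambda} - \Gamma(\lambda, \lambda, w) \frac{\partial \gamma(\lambda, \lambda, w)}{\partial \lambda} \right\}. \tag{5.41}$$
The quantity in curly braces can be written, after getting the derivatives, as
$$\int_{u=0}^{w} \int_{t=w}^{\infty} \lambda^{2\lambda} (tu)^{\lambda-1} e^{-\lambda(t+u)} \left[ \log(t/u) - (t-u) \right] dt\, du. \tag{5.42}$$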
So, the function $F(w, \lambda, \lambda)$ is monotonic decreasing in $\lambda$ if $\log(t/u) - (t - u) > 0$ in the
integration domain, i.e. if
$$\int_{t=w}^{\infty} t e^{-t}\, dt > \int_{t=0}^{w} t e^{-t}\, dt. \tag{5.43}$$
Apparently, the above condition is fulfilled if
$$w < \text{median of Gamma}(2, 1) = 1.678. \tag{5.44}$$
Hence, from the positive skewness of a gamma distribution, and for all $0 < \alpha < 0.5$,
$$w_{\alpha} < w_{0.5} < E(W) = 1 < 1.678, \qquad \forall \lambda > 0, \tag{5.45}$$
where $w_{\alpha}$ is the $\alpha$-quantile of $W$.
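The constant 1.678 in (5.44) can be checked numerically. The following is a minimal sketch, not part of the thesis software: it uses the closed-form cdf of a Gamma(2, 1) variable, $1 - (1 + w)e^{-w}$, and a plain bisection search. The function names are illustrative assumptions.

```python
import math


def gamma21_cdf(w: float) -> float:
    """Cdf of a Gamma(shape=2, rate=1) variable: 1 - (1 + w) * exp(-w)."""
    return 1.0 - (1.0 + w) * math.exp(-w)


def bisect_root(func, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Root of an increasing function on [lo, hi], found by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The median solves cdf(w) = 0.5, i.e. the two integrals in (5.43) are equal.
median_gamma21 = bisect_root(lambda w: gamma21_cdf(w) - 0.5, 0.0, 10.0)
print(round(median_gamma21, 3))  # prints 1.678
```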
From (5.44) and (5.45) we can see that $F(w, \lambda, \lambda)$ is strictly monotonic decreasing in $\lambda$
for all quantiles $w$ such that $w < w_{0.5}$. However, we believe that the expert can
quantify her opinion about quartiles more easily by using the bisection method, see for
example Pratt et al. (1995). So, we choose the lower quartile, $w_{0.25}$, as a monotonic function
in $\lambda$, since the function $F(w_{0.25}, \lambda, \lambda)$ is decreasing in $\lambda$. Note that the opposite is not true, i.e.
if $w > w_{0.5}$ then $w$ is not necessarily greater than 1.678, and no monotonicity is guaranteed
for $w_{0.75}$, for example.
Another reason for choosing the lower quartile and not the upper quartile, besides
monotonicity as discussed above, is that the lower quartile is more sensitive than the upper quartile
to changes in the shape parameter $\lambda$ at any fixed value of the mean. Figure 5.4 illustrates
this fact; it shows the changes in both the lower and upper quartiles of gamma distributions
due to the change of the parameter value $\lambda$, for different fixed mean values at 0.5, 5, 50, and
500. It can be seen from Figure 5.4 that the lower quartile is more sensitive than the upper
quartile to the changes in $\lambda$ at fixed mean values.
[Figure 5.4: four panels (Mean=0.5, Mean=5, Mean=50, Mean=500), each plotting the lower and upper quartiles against Lambda.]
Figure 5.4: Changes in quartile values with the change of $\lambda$ at different mean values.
Now, since $F(w, \lambda, \lambda)$ is strictly monotonic increasing in $w$ and strictly monotonic decreasing
in $\lambda$, for $w < w_{0.5}$, then fixing $F(w, \lambda, \lambda) = 0.25$, the lower quartile $w_{0.25}$ is an implicit
monotonic increasing function in $\lambda$, say
$$w_{0.25} = h^*(\lambda). \tag{5.46}$$
Hence, from (5.34), we have
$$Q_1 = \mu h^*(\lambda) = h(\lambda), \tag{5.47}$$
where $Q_1$ is the lower quartile of $\zeta$, and $h(.)$ is a monotonic increasing function of $\lambda$.
The expert will be asked to assess three quartiles of her prior distribution for $Q_1$. Then,
from the monotonicity of $h(.)$ in (5.47), these quartiles can be transformed into the
corresponding three quartiles of $\lambda$. We assume that the prior distribution of $\lambda$ is a lognormal
distribution, and use the three transformed quartiles to solve for the two parameters of the
lognormal distribution. The required assessment tasks to implement this method using
interactive graphical software are detailed in the next section.
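The mapping between the shape parameter and the lower quartile can be sketched numerically. The code below is only an illustration under stated assumptions, not the PEGS-Gamma implementation: the regularized incomplete gamma function is evaluated by its power series, $h(\lambda)$ in (5.47) is found by bisection on the cdf, and $h^{-1}$ is found by a second bisection that relies on the monotonicity of $h$. The function names and the search bounds `lam_lo` and `lam_hi` are hypothetical choices.

```python
import math


def reg_lower_gamma(shape: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(shape, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= x / (shape + n)
        total += term
    return total * math.exp(shape * math.log(x) - x - math.lgamma(shape))


def lower_quartile(lam: float, mu: float) -> float:
    """h(lam): lower quartile of a gamma variable with shape lam and mean mu."""
    lo, hi = 0.0, mu  # the lower quartile lies below the mean, see (5.45)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        # cdf of Gamma(shape=lam, rate=lam/mu) at mid equals F(mid/mu, lam, lam)
        if reg_lower_gamma(lam, lam * mid / mu) < 0.25:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shape_from_lower_quartile(q1: float, mu: float,
                              lam_lo: float = 0.05, lam_hi: float = 1000.0) -> float:
    """Invert h by bisection on a log scale; the bounds are illustrative assumptions."""
    for _ in range(60):
        mid = math.sqrt(lam_lo * lam_hi)
        if lower_quartile(mid, mu) < q1:
            lam_lo = mid
        else:
            lam_hi = mid
    return math.sqrt(lam_lo * lam_hi)


# The lower quartile grows with the shape parameter at a fixed mean.
mu0 = 10.0
for lam in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(lam, round(lower_quartile(lam, mu0), 4))
```

For shape 1 the gamma distribution is exponential, so the lower quartile at mean 10 should equal $-10\ln(0.75)$, which gives a simple check of the sketch.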
5.3.2
Assessment tasks
The expert is questioned about the lower quartile of the gamma distribution, $Q_1$ say. However,
she is not simply asked to give a point estimate of $Q_1$; she is asked to give assessments that
quantify her uncertainty about it. Specifically, she is asked to give her lower and upper
quartiles for $Q_1$ in addition to her median assessment of its value. Questions that make this
a meaningful task that an expert can reasonably be asked to perform are suggested later.
• Three quartiles of $Q_1$ will be assessed by the expert, say $Q_{1,1}$, $Q_{1,2}$ and $Q_{1,3}$, where the
median $Q_{1,2}$ is a point estimate of $Q_1$, and $Q_{1,3} - Q_{1,1}$ is its interquartile range. Details
on how to ask about these quartiles are given later.
• Under the monotonicity of $h(.)$ in (5.47), the three assessed quartiles $Q_{1,1}$, $Q_{1,2}$ and
$Q_{1,3}$ of $Q_1$ can be transformed to the three corresponding quartiles of $\lambda \mid \mu$, say $Q_{\lambda,1}$,
$Q_{\lambda,2}$ and $Q_{\lambda,3}$, respectively.
• Hence, we obtain the three quartiles $Q_{\lambda,1}$, $Q_{\lambda,2}$ and $Q_{\lambda,3}$ of the prior distribution of $\lambda$
given $\mu$, as
$$Q_{\lambda,i} = h^{-1}(Q_{1,i}), \qquad i = 1, 2, 3, \tag{5.48}$$
where $h^{-1}(.)$ can be implemented by numerically inverting the incomplete gamma function
$F(w, \lambda, \lambda)$ via a simple search procedure.
• From (5.47) and (5.48), if the three assessed values $Q_{1,1}$, $Q_{1,2}$ and $Q_{1,3}$ are the three
quartiles of $Q_1$, then $Q_{\lambda,1}$, $Q_{\lambda,2}$ and $Q_{\lambda,3}$ are the three corresponding quartiles of $\lambda \mid \mu$,
respectively. Clearly
$$\Pr\{Q_1 < Q_{1,i}\} = \Pr\{(\lambda \mid \mu) < h^{-1}(Q_{1,i})\} = \Pr\{(\lambda \mid \mu) < Q_{\lambda,i}\} = 0.25\,i, \qquad i = 1, 2, 3. \tag{5.49}$$
• We assume that the prior distribution of $\lambda$ given $\mu$ is a lognormal distribution with two
hyperparameters $a$ and $b$ of the form
$$f(\lambda \mid \mu) = \frac{1}{\lambda b \sqrt{2\pi}} \exp\left\{ -\frac{(\ln \lambda - a)^2}{2b^2} \right\}, \qquad \lambda > 0. \tag{5.50}$$
The properties of the normal distribution are used to estimate $a$ and $b$ from the
transformed assessments $Q_{\lambda,i}$, $i = 1, 2, 3$.
• Since, from the assumed lognormal prior distribution in (5.50), we have
$$(\ln \lambda \mid \mu) \sim N(a, b^2), \tag{5.51}$$
and using the fact that $b = \text{IQR}/1.349$, then clearly
$$a = \ln(Q_{\lambda,2}), \qquad b = \frac{\ln(Q_{\lambda,3}) - \ln(Q_{\lambda,1})}{1.349}. \tag{5.52}$$
• The prior structure of the gamma GLM parameters takes the form
$$f(\mu, \lambda) = f(\mu) \times f(\lambda \mid \mu), \tag{5.53}$$
where $f(\mu)$ can be obtained from (5.31), and $f(\lambda \mid \mu)$ is given as lognormal$(a, b)$.
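The step from transformed quartiles to lognormal hyperparameters in (5.52) amounts to a few lines of arithmetic. The sketch below is illustrative only: the function names and the numerical quartiles are hypothetical, and $b$ is treated as the standard deviation of $\ln \lambda$, as implied by $b = \text{IQR}/1.349$.

```python
import math

Z_UPPER_QUARTILE = 0.6745  # upper quartile of the standard normal; 2 * 0.6745 = 1.349


def lognormal_from_quartiles(q_lam_1: float, q_lam_2: float, q_lam_3: float):
    """Hyperparameters (a, b) from the three quartiles of lambda, as in (5.52)."""
    a = math.log(q_lam_2)
    b = (math.log(q_lam_3) - math.log(q_lam_1)) / 1.349
    return a, b


def coherent_quartiles(a: float, b: float):
    """Exact quartiles of a lognormal distribution with parameters a and b."""
    return (math.exp(a - Z_UPPER_QUARTILE * b),
            math.exp(a),
            math.exp(a + Z_UPPER_QUARTILE * b))


# Hypothetical transformed quartiles of lambda, chosen only for illustration.
a, b = lognormal_from_quartiles(2.0, 3.0, 4.5)
print(a, b)
print(coherent_quartiles(a, b))
```

When the assessed quartiles are not symmetric about the median on the log scale, `coherent_quartiles` returns a symmetric set that keeps the median and the log-scale interquartile range.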
This elicitation method has been implemented in user-friendly graphical software that
automatically estimates the two hyperparameters of the lognormal distribution. The software
has been developed as an add-on to the PEGS-GLM (Correlated Coefficients) software
for eliciting the scale parameter $\lambda$ of the gamma GLM. It is also freely available at
http://statistics.open.ac.uk/elicitation as a stand-alone version, PEGS-Gamma, for eliciting
the shape parameter $\lambda$ of a gamma distribution with a known mean.
In the former case, the median $\mu_0$ and the lower quartile $Q_1$ of the response variable
$\zeta$ at the reference point have already been elicited, see (5.32). For the latter case, the expert
is asked, in a dialogue box, to assess her mean value $\mu_0$ and the lower quartile $Q_1$ of the
gamma random variable. In both cases, these two assessments represent the first assessment
step, from which the software suggests reasonable initial values for the other two required
assessments.
The median value $Q_{1,2}$ is set equal to the assessed value of $Q_1$, while the other two quartile
values $Q_{1,1}$ and $Q_{1,3}$ are suggested as
$$Q_{1,1} = Q_{1,2} - \tfrac{1}{2} \min(Q_{1,2},\, \mu_0 - Q_{1,2}), \tag{5.54}$$
These initial suggested values are used in (5.47) and (5.49) to get the three quartiles
$Q^0_{\lambda,1}$, $Q^0_{\lambda,2}$ and $Q^0_{\lambda,3}$ of the parameter $\lambda$, respectively. The inversion of (5.47) is done by the software
through a simple search procedure.
As in (5.52), these quartiles are used to compute the two hyperparameters $a$ and $b$ of the
assumed lognormal distribution of $\lambda$. Using $a$ and $b$, the mean value of $\lambda$, say $\mu_{\lambda}$, is computed
from the lognormal distribution of $\lambda$:
$$\mu_{\lambda} = \exp\left(a + \tfrac{1}{2} b^2\right). \tag{5.56}$$
Then $\mu_{\lambda}$ is used with the assessed mean value $\mu_0$ to draw the pdf graph of the gamma
distribution, $\text{Gamma}(\mu_{\lambda}, \mu_{\lambda}/\mu_0)$. A main panel is presented to the expert showing this pdf
graph; see the upper graph of Figure 5.5. The thick black line on this graph represents the
mean value $\mu_0$.
[Figure 5.5: screenshot of the software panel "Eliciting Quartiles for Q1 of a Gamma distribution". Upper graph: pdf of the response variable Y. Lower graph: distribution of the lower quartile Q1. Right-hand panel: quartiles of Q1, quartiles of lambda, and hyperparameters of the lognormal.]
Figure 5.5: The main software panel for assessing gamma parameter
For statistical coherence of the assumed normal distribution of $\ln(\lambda)$, the two normal
quartiles $\ln(Q^0_{\lambda,1})$ and $\ln(Q^0_{\lambda,3})$ should be symmetrical around the normal mean, $a = \ln(Q_{\lambda,2})$.
To attain this, we assume that the expert is always more confident in assessing the median
value than in assessing the other two quartiles. So we treat her original and transformed
medians $Q_{1,2}$ and $Q_{\lambda,2}$, respectively, as being correct. Then we suggest two coherent sets
of quartiles $Q_{1,1}$, $Q_{1,3}$ and $Q_{\lambda,1}$, $Q_{\lambda,3}$ to replace the initial assessments $Q^0_{1,1}$, $Q^0_{1,3}$ and
$Q^0_{\lambda,1}$, $Q^0_{\lambda,3}$, respectively, as follows. First, $Q_{\lambda,1}$, $Q_{\lambda,3}$ are computed as the actual first and third
quartiles, respectively, of a lognormal distribution with the two elicited parameters $a$ and $b$.
Then $Q_{1,1}$, $Q_{1,3}$ are computed from $Q_{\lambda,1}$, $Q_{\lambda,3}$, respectively, using (5.47) and (5.49).
The first group of values in the right-hand side panel of Figure 5.5 gives the values of the
three suggested coherent quartiles $Q_{1,1}$, $Q_{1,2}$ and $Q_{1,3}$. These quartiles are also drawn as the
three blue lines in the upper and lower pdf graphs of Figure 5.5. The second group of values
gives the three quartiles of $\lambda$, $Q_{\lambda,1}$, $Q_{\lambda,2}$ and $Q_{\lambda,3}$. The elicited values of $a$ and $b$ are shown
as the third group of values in the same panel.
The lower graph in Figure 5.5 represents the elicited distribution of the lower quartile
$Q_1$, with the three vertical blue lines representing $Q_{1,1}$, $Q_{1,2}$ and $Q_{1,3}$. The graph is intended
to help the expert check that the distribution is a reasonable representation of her prior
knowledge of $Q_1$. Although we do not assume any specific family of distributions for $Q_1$,
the pdf graph is drawn using pointwise numerical derivatives of the cdf of $Q_1$. This cdf is
obtained as in (5.49), not only for the three quartile points, but also for a sufficiently large
number of points. A set of 1000 points covering the whole range of $Q_1$ has been used.
Hence, Figure 5.5 shows all the assessed and suggested quartiles of $Q_1$ and $\lambda$, with the
two corresponding values of $a$ and $b$. The two pdf graphs of $\lambda$ and $Q_1$ are also presented
to the expert to show her the impact of these quartile values and hyperparameters on the
two distributions. The main assessment task that the expert is asked to perform uses the
following type of question. Let us suppose that the variable that has the gamma distribution
is the period of time that a patient with some medical disorder may stay in hospital. Then
the expert will be asked to consider the length of time that a hypothetical patient, John,
will spend in hospital. She is told, “John has this disorder and will spend a time in hospital.
Suppose he is fortunate and does not spend as long as most people in hospital. Specifically,
suppose exactly 25% of patients with John’s disorder spend a shorter time in hospital than
John. Give your median assessment for the length of time that John spends in hospital. Now
give your lower and upper quartiles for this length of time.”
The expert will be shown suggested coherent assessments and graphs. If she finds the
suggestions a reasonable representation of her opinion, she can accept them, which finishes
the assessment procedure. If they do not represent her opinion adequately, she has the
option of directly reviewing the median value $Q_{1,2}$ of $Q_1$, or indirectly reviewing the quartiles
$Q_{1,1}$ and $Q_{1,3}$ by changing the value of the hyperparameter $b$. As discussed before, for
statistical coherence, changes must be made first to the value of $b$ and then transformed into
corresponding coherent changes in $Q_{1,1}$ and $Q_{1,3}$.
In principle, the expert can change $Q_{1,2}$ to any value in $(0, \mu_0)$, and she can change $b$ to
any positive value. However, to get a unimodal distribution for $Q_1$, some restrictions must
be imposed on the values of $a$ and $b$, as detailed below.
Although the relation between $Q_1$ and $\lambda$, as given in (5.47), is strictly monotonic increasing
for all $\lambda > 0$, the numerical second derivative of $h(\lambda)$ reveals a critical point, where it is zero, at
$\lambda = 0.5045$. Therefore, the pdf of $Q_1$ is not guaranteed to be unimodal if the elicited values
of $a$ and $b$ lead to a non-negligible probability of $\lambda < 0.5045$.
To avoid an undesirable appearance of the pdf of $Q_1$, we restrict the elicited lognormal
hyperparameters $a$ and $b$ to satisfy
(5.57)
This condition ensures (from the standard normal distribution) that
(5.58)
i.e. it guarantees that
$$\Pr(\lambda < 0.5045) < 0.001. \tag{5.59}$$
If condition (5.57) is not satisfied, the right-hand side panel in Figure 5.5 will only allow
the expert to increase the value of $Q_{1,2}$, hence increasing $a = \ln(Q_{\lambda,2})$, or directly decrease
the value of $b$.
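The guarantee in (5.59) can be checked directly for any candidate pair of hyperparameters. The sketch below is an assumption-laden illustration rather than the software's own test: it takes $\ln \lambda \sim N(a, b^2)$ with $b$ as the standard deviation and evaluates the lognormal tail probability through the standard normal cdf. The function names are hypothetical.

```python
import math

LAMBDA_CRITICAL = 0.5045  # point where the second derivative of h vanishes
MAX_TAIL_PROB = 0.001     # tail probability allowed in (5.59)


def std_normal_cdf(z: float) -> float:
    """Standard normal cdf written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def prob_lambda_below(threshold: float, a: float, b: float) -> float:
    """Pr(lambda < threshold) when ln(lambda) ~ N(a, b^2)."""
    return std_normal_cdf((math.log(threshold) - a) / b)


def satisfies_restriction(a: float, b: float) -> bool:
    """True when Pr(lambda < 0.5045) < 0.001 for the lognormal(a, b) prior."""
    return prob_lambda_below(LAMBDA_CRITICAL, a, b) < MAX_TAIL_PROB


# Hypothetical hyperparameter values, for illustration only.
print(satisfies_restriction(0.9, 0.2))  # concentrated prior, far above 0.5045
print(satisfies_restriction(0.0, 1.0))  # diffuse prior with much mass below 0.5045
```

Since the 0.001 quantile of the standard normal is about $-3.09$, the check is equivalent to requiring $\ln(0.5045)$ to lie more than roughly $3.09\,b$ below $a$.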
A ‘Reset’ button is available for the expert to return at any time to the initial coherent set
of suggestions and graphs and review them again if she needs to. When the expert is happy
with the quartile values and the corresponding pdf graphs, she clicks ‘Done’ and obtains the
two corresponding hyperparameters $a$ and $b$ as the output of her assessments.
5.4
Concluding comments
To elicit an informative prior distribution for normal and gamma GLMs, expert opinion
must be quantified about both the regression coefficients and the extra parameters in these
models. In this chapter, two elicitation methods have been proposed to quantify an expert’s
opinion about a prior distribution of the random error variance in normal GLMs, and a prior
distribution for the scale parameter in gamma GLMs.
A method of assessing a conjugate inverted chi-squared prior distribution for the error
variance in normal models has been proposed. The method quantifies an expert’s opinions
through assessments of a median and conditional medians of the absolute difference
between two observations of the response variable at the same design point. Conditional
assessments have been based on various sets of hypothetical future samples. These assessments
depend only on the random error and have been used to elicit the inverted chi-squared
distribution. A computer program that implements the method is available as an option
in the PEGS-GLM (Correlated Coefficients) software and also as an add-on to any other
elicitation software for normal models, PEGS-Normal. Both versions are freely available at
http://statistics.open.ac.uk/elicitation.
A novel method for eliciting a lognormal prior distribution for the scale parameter of a
gamma GLM, or the shape parameter of any gamma distribution, has also been proposed.
The method depends only on quantifying an expert’s opinion about the lower quartile of
a gamma distributed random variable. This lower quartile is itself a random variable, for
which the expert assesses a median value as a point estimate and an interquartile range. An
example of questions that can be addressed to the expert has been given. The interactive
graphical PEGS-Gamma software implementing this method is user-friendly. It gives coherent
suggestions for all the required assessments and presents instant graphical feedback. To the
best of the author’s knowledge, this is the first piece of interactive software that is designed
for eliciting a prior distribution of the shape parameter of a gamma distribution or the scale
parameter of a gamma GLM.
Chapter 6
Eliciting Dirichlet priors for multinomial models
6.1
Introduction
Multinomial models, consisting of items that belong to a number of complementary and
mutually exclusive categories, arise in many scientific disciplines and industrial applications.
For example, they are frequently encountered in geology for different compositions of rocks, in
microeconomics for patterns of consumer selection preferences, and in political science for voting
behavior. Other application areas include medicine, psychology and biology.
For mathematical coherence, the probabilities of each category must be non-negative and
satisfy a unit-sum constraint. The multinomial distribution describes this model as a direct
generalization of the binomial distribution to more than two categories.
It is well-known that the Dirichlet distribution is a conjugate prior for the parameters
of multinomial models. The distribution preserves the unit-sum constraint of multinomial
probabilities and imposes a simple Dirichlet pattern of dependency between them. This
structure gives negative correlations between the probabilities of categories, as will be shown
later.
A different way of thinking about prior distributions for multinomial models is to use the
multivariate normal distribution as a large sample approximation to the Dirichlet distribution
or to the distribution of the log contrasts of the multinomial probabilities. Another option is
to estimate the exact distribution of log contrasts using a Monte Carlo sample. Generalized,
nested or mixed forms of the Dirichlet distribution have also been introduced and suggested
as suitable priors for multinomial models. For more details on possible prior distributions for
multinomial models see, for example, O’Hagan and Forster (2004).
Eliciting parameters of multivariate distributions is not, in general, an easy task. It is
even more complex when the variates are not independent, in which case summaries of the
marginal distributions should be assessed, together with effective and reliable summaries of
the dependence structure of the joint distribution [O’Hagan et al. (2006)]. In this chapter, our
proposed method makes use of assessments of marginal beta distributions. Decomposition
of the Dirichlet elicitation process into the assessment of several marginal beta distributions
helps reduce the complexity of eliciting a multivariate distribution.
In Section 6.2, we develop a method of quantifying opinion about a beta prior distribution
by the assessment of three quartiles. The method will be generalized to elicit a Dirichlet
distribution in Section 6.3. The elicited univariate beta distribution will also be used to
construct more flexible distributions in the next chapter, including the generalized Dirichlet
prior and a Gaussian copula function for the prior distribution.
6.2
Eliciting beta parameters using quartiles
6.2.1
Introduction
The beta distribution is widely used in Bayesian analysis as a conjugate prior for the
probability of success in Bernoulli trials. The domain of definition for the beta distribution of the
first type is the interval [0,1], which is appropriate for the probability parameter of Bernoulli
and binomial distributions. Moreover, the beta distribution is also a conjugate prior for
Bernoulli and binomial sampling distributions, so that the posterior distribution is obtained
through simple arithmetic. The wide range of valid values of the two hyperparameters of the
beta prior gives it great flexibility and its pdf has varied shapes. In this sense, the beta
distribution is more likely to be a reasonable model of the expert’s opinion compared with other
priors such as the uniform distribution over the interval [0,1] or the triangular distribution
suggested by van Dorp and Kotz (2002).
It seems that eliciting beta parameters is the most studied elicitation problem to date,
whether it is a beta prior for Bernoulli or binomial sampling distributions, a distribution
of a probability of an event, or a proportion that ranges between zero and one. There
are many methods available in the literature for eliciting beta distribution parameters. A
comprehensive literature review may be found in Hughes and Madden (2002), Jenkinson
(2007) or O’Hagan et al. (2006).
The available methods for beta elicitation can be classified into two general classes of
elicitation methods, variable interval and fixed interval. In the variable interval methods, the
probability is fixed and the expert assesses an interval that gives this probability. In the fixed
interval methods, the interval is fixed and the expert assesses the probability that the event
of interest will be in that interval. Asking about quartiles is an example of the first class of methods,
while assessing probabilities is an example of the second class of methods.
Beta elicitation methods vary in the quantities that the expert must assess. She may be
asked to assess a location value such as the mean, the median or the mode. Also, a scale
value must be assessed, such as the probability of being in an interval, the boundaries of an
interval, or the mean absolute deviation about a location value. These quantities may be
converted into the hyperparameters in exact forms or through numerical approximation.
Regarding the number of required assessments, most of the available methods use only
two assessed quantities, usually one for location and the other for scale. These give estimates
of the two beta parameters. Although only two assessments are mathematically needed to
elicit two unique parameters, some methods use over-fitting through assessing three or more
quantities, followed by some sort of averaging or reconciliation.
In this section we propose a new method of eliciting the parameters of a beta prior
distribution for the binomial success probability. Assessments of the median and two quartiles
are elicited. A compromise is needed to reconcile these three assessments into two unique
parameters. We use a normal approximation to the beta distribution to estimate initial
values of the beta parameters, followed by a least-squares technique to optimize the two
initial values. According to the classifications given above, the proposed method is a variable
interval method that uses three assessments, a median and two quartiles.
We believe that it is better to elicit a median as a location value and quartiles for scale,
than, say, to elicit a mean and other quantiles. The median and quartiles are easier for an
expert to assess as they are obtained by the first two steps of equally likely subdivisions
(bisection method). The expert can be asked about the median as the value that the probability
of success is equally likely to be above or below. Then we ask the expert to sub-divide the
interval above the median into two equally likely intervals for the probability; her assessed
value is her upper quartile. The same concept is used for the interval below the median in
order to obtain her lower quartile.
van Dorp and Mazzuchi (2000, 2003, 2004) introduced a numerical algorithm and software
to specify the parameters of the beta distribution and its Dirichlet extension using quantiles.
They used the median as a measure of central tendency with any other single quantile as
a measure of dispersion. Although they proved that this suffices mathematically for the
existence of a unique solution for beta parameters, it is more useful in elicitation contexts to
use over-fitting as a means towards better representation of an expert’s opinion.
6.2.2
Normal approximations for beta elicitation
To estimate the two parameters of the beta distribution using three assessed quartiles, we
propose a two-step approach. In the first step, a normal approximation for the beta
distribution is used to transform and reconcile the three assessed quartiles as two initial values for
the beta parameters. In the second step, a numerical least-squares method is applied to the
initial parameter values so as to optimize them. The aim is to find parameter values that give
nominal quartiles that are as close as possible to the assessed values. This section is devoted
to the proposed normal approximation, while the least-squares optimization is discussed in
Section 6.2.3 below.
A method that directly fits a beta distribution to the assessed median and two quartiles
is given in Pratt et al. (1995). They used a normal approximation for the beta distribution
together with averaging. The method was also used as the main assessment method in a study
of the effect of feedback and learning on the assessment of subjective probability distributions
(Stael von Holstein, 1971). Our proposed method adopts the technique of Pratt et al. (1995),
but with a different normal approximation and a new compromise to get initial parameter
values. We also add a least-squares optimization technique. In what follows, we summarize
the argument of Pratt et al. (1995) and then propose a different normal approximation and
a different compromise.
Let $p$ be the success probability of concern, and assume that $p$ has a conjugate standard
beta prior distribution of the form
$$f(p) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, p^{a-1} (1-p)^{b-1}, \qquad 0 < p < 1,\; a > 0,\; b > 0. \tag{6.1}$$
Pratt et al. (1995) stated that the transformation
$$Z = 2\left\{ [p(b - 1/3)]^{1/2} - [(1-p)(a - 1/3)]^{1/2} \right\} \tag{6.2}$$
has approximately a standard normal distribution. Let $q_i$ be the $i$th quartile of $p$ that is
assessed by the expert, for $i = 1, 2, 3$. Using the assessed lower quartile $q_1$ and the assessed
median $q_2$, we get the following two equations from (6.2):
$$\Pr\left\{ Z < 2\left\{ [q_1(b - 1/3)]^{1/2} - [(1-q_1)(a - 1/3)]^{1/2} \right\} \right\} = 0.25, \tag{6.3}$$
$$\Pr\left\{ Z < 2\left\{ [q_2(b - 1/3)]^{1/2} - [(1-q_2)(a - 1/3)]^{1/2} \right\} \right\} = 0.5. \tag{6.4}$$
Solving (6.3) and (6.4) for $a$ and $b$ gives
$$a_1 = c_1 q_2 + \frac{1}{3} \tag{6.5}$$
and
$$b_1 = c_1 (1 - q_2) + \frac{1}{3}, \tag{6.6}$$
where
$$c_1 = 0.112 \left\{ [q_2(1 - q_1)]^{1/2} - [q_1(1 - q_2)]^{1/2} \right\}^{-2}.$$
Similarly, the assessed upper quartile, $q_3$, gives the equation
$$\Pr\left\{ Z < 2\left\{ [q_3(b - 1/3)]^{1/2} - [(1-q_3)(a - 1/3)]^{1/2} \right\} \right\} = 0.75. \tag{6.7}$$
Solving (6.4) and (6.7) for $a$ and $b$ gives
$$a_2 = c_2 q_2 + \frac{1}{3} \tag{6.8}$$
and
$$b_2 = c_2 (1 - q_2) + \frac{1}{3}, \tag{6.9}$$
where
$$c_2 = 0.112 \left\{ [q_2(1 - q_3)]^{1/2} - [q_3(1 - q_2)]^{1/2} \right\}^{-2}.$$
The compromise of Pratt et al. (1995) is simply to estimate $a$ and $b$ as the average of (6.5),
(6.6), (6.8) and (6.9), i.e.
$$a = \frac{a_1 + a_2}{2}, \qquad b = \frac{b_1 + b_2}{2}. \tag{6.10}$$
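As a hedged illustration of the averaging compromise, equations (6.5) to (6.10) can be coded as follows. This is a sketch written for this text, not code from Pratt et al. (1995); the function name and the example quartiles are hypothetical.

```python
import math


def pratt_beta_from_quartiles(q1: float, q2: float, q3: float):
    """Average the two solutions (6.5)-(6.6) and (6.8)-(6.9), as in (6.10)."""

    def scale(q_other: float) -> float:
        # c_1 or c_2: the median q2 paired with one of the outer quartiles
        diff = math.sqrt(q2 * (1.0 - q_other)) - math.sqrt(q_other * (1.0 - q2))
        return 0.112 / diff ** 2

    c1, c2 = scale(q1), scale(q3)
    a1, b1 = c1 * q2 + 1.0 / 3.0, c1 * (1.0 - q2) + 1.0 / 3.0
    a2, b2 = c2 * q2 + 1.0 / 3.0, c2 * (1.0 - q2) + 1.0 / 3.0
    return 0.5 * (a1 + a2), 0.5 * (b1 + b2)


# Hypothetical assessed quartiles of a success probability.
print(pratt_beta_from_quartiles(0.2, 0.3, 0.45))
```

For quartiles that are symmetric about 0.5 the two scale constants coincide, so the sketch returns equal values for the two parameters.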
However, Pratt et al. (1995) did not mention the theoretical derivation of the
approximation in (6.2), nor its accuracy. So, we tried to use another approximation that is still
mathematically tractable, but whose justification and accuracy have been investigated.
Patel and Read (1982) give a good review of some accurate normal approximations to beta
variables. They describe the following normal approximation as a simple yet accurate
approximation.
If $p$ has a beta distribution of the form in (6.1), then the transformation
$$Z = 2\left\{ [p(b - 1/4)]^{1/2} - [(1-p)(a - 1/4)]^{1/2} \right\}, \tag{6.11}$$
has an approximate standard normal distribution. The absolute error of this approximation
is of order
We adopt the approximation (6.11) to propose a new elicitation method for the beta
parameters $a$ and $b$ using the three assessed quartiles $q_i$, $i = 1, 2, 3$.
Instead of direct averaging, we introduce a new compromise, making use of the
characteristics of the normal distribution. In fact, it is well-known that
$$z_{0.75} - z_{0.25} = 1.349, \tag{6.12}$$
where $z_{0.25}$ and $z_{0.75}$ are the lower and upper quartiles of the standard normal distribution,
respectively.
In view of the approximation (6.11), we have
$$[q_2(b - 1/4)]^{1/2} - [(1 - q_2)(a - 1/4)]^{1/2} = 0, \tag{6.13}$$
$$z_{0.25} = 2\left\{ [q_1(b - 1/4)]^{1/2} - [(1 - q_1)(a - 1/4)]^{1/2} \right\}, \tag{6.14}$$
$$z_{0.75} = 2\left\{ [q_3(b - 1/4)]^{1/2} - [(1 - q_3)(a - 1/4)]^{1/2} \right\}. \tag{6.15}$$
Substituting (6.14) and (6.15) in (6.12), we get the new compromise between $q_1$ and $q_3$ as
$$\left\{ [q_3(b - 1/4)]^{1/2} - [(1 - q_3)(a - 1/4)]^{1/2} \right\} - \left\{ [q_1(b - 1/4)]^{1/2} - [(1 - q_1)(a - 1/4)]^{1/2} \right\} = \frac{1.349}{2}. \tag{6.16}$$
Solving (6.13) and (6.16) for $a$ and $b$, we get
$$a = c q_2 + \frac{1}{4} \tag{6.17}$$
and
$$b = c(1 - q_2) + \frac{1}{4}, \tag{6.18}$$
where
$$c = \left( \frac{1.349}{2} \right)^2 \left\{ [q_2(1 - q_1)]^{1/2} - [q_1(1 - q_2)]^{1/2} + [q_3(1 - q_2)]^{1/2} - [q_2(1 - q_3)]^{1/2} \right\}^{-2}.$$
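The closed-form initial values in (6.17) and (6.18) can be sketched and checked against the equations they are meant to solve. The code below is an illustration with hypothetical function names and example quartiles; `z_approx` evaluates the transformation (6.11), so the median equation (6.13) and the compromise (6.16) can be verified numerically.

```python
import math


def initial_beta_from_quartiles(q1: float, q2: float, q3: float):
    """Initial beta parameters (a, b) from three quartiles, as in (6.17)-(6.18)."""
    spread = (math.sqrt(q2 * (1.0 - q1)) - math.sqrt(q1 * (1.0 - q2))
              + math.sqrt(q3 * (1.0 - q2)) - math.sqrt(q2 * (1.0 - q3)))
    c = (1.349 / 2.0) ** 2 / spread ** 2
    return c * q2 + 0.25, c * (1.0 - q2) + 0.25


def z_approx(p: float, a: float, b: float) -> float:
    """Approximately standard normal transformation of p, as in (6.11)."""
    return 2.0 * (math.sqrt(p * (b - 0.25)) - math.sqrt((1.0 - p) * (a - 0.25)))


# Hypothetical assessed quartiles of a success probability.
q1, q2, q3 = 0.2, 0.3, 0.45
a0, b0 = initial_beta_from_quartiles(q1, q2, q3)
print(a0, b0)
print(z_approx(q2, a0, b0))                          # median equation (6.13): zero
print(z_approx(q3, a0, b0) - z_approx(q1, a0, b0))   # compromise (6.16): 1.349
```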
We argue that our method preserves the assessed median value and the only compromise
is between the two quartiles. We believe this will represent the expert’s opinion better. The
expert usually assesses her median with more certainty and less bias than her lower and
upper quartiles. By using the new compromise of quartiles in (6.16) and keeping the median
equation (6.13) fixed, we reflect the probable greater accuracy of the median assessment.
According to the accuracy of the normal approximation, the proposed initial values of the
beta parameters, given in (6.17) and (6.18), lead to nominal values for the beta quartiles that
are close to the assessed quartiles. However, they are not guaranteed to be the parameter
values that minimize the differences between nominal and assessed quartile values. This is
not ideal, so we just treat equations (6.17) and (6.18) as giving initial parameter values that
can be improved upon.
6.2.3
Least-squares optimizations for beta parameters
Oakley (2010) gave a least-squares m ethod for choosing beta param eters a and b th a t minimize
Q = [F(qi,a,b) — 0.25f + [F(q2, a, 6) - 0.5]2 + [F(q3, a, b) - 0.75]2 ,
(6.19)
where F ( x , a, b) is the cdf of a b eta distribution with param eters a and b at the point x.
The same approach has been implemented in the SHELF elicitation framework developed
in Oakley and O ’Hagan (2010). They introduced an R package of tem plates and software
for conducting elicitation, within which minimizing Q in (6.19) was used to estim ate beta
param eters from assessed quartiles. However, they do not use any explicit normal approx
im ation to a b eta distribution when deriving the initial estim ates of the beta param eters.
Instead, they just transform the assessed b e ta quartiles into the mean and variance of a nor
mal distribution, as if the quartiles were assessed for a normal distribution. The mean and
variance are then assumed to be those of a beta distribution, from which initial values for
the param eters can be computed.
Our accompanying elicitation software, PEGS-Dirichlet, implements programs written by Flanagan (2011) for the Java scientific library. These numerically minimize (6.19), which cannot be minimized analytically. They use a multidimensional technique called the downhill simplex method. The method was introduced by Nelder and Mead (1965) as a quick multidimensional minimization method that uses only function evaluations, not derivatives.
To constrain the beta parameters to be positive, we transform them to a logarithmic scale. Hence we actually minimize
\[ Q = \{F[q_1, \exp(a^*), \exp(b^*)] - 0.25\}^2 + \{F[q_2, \exp(a^*), \exp(b^*)] - 0.5\}^2 + \{F[q_3, \exp(a^*), \exp(b^*)] - 0.75\}^2, \]
for $a^*$ and $b^*$, with initial values as in (6.17) and (6.18), but on the logarithmic scale, i.e. $\log(a)$ and $\log(b)$. The final resulting beta parameter values are thus $\exp(a^*)$ and $\exp(b^*)$.
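The log-scale minimization described above can be sketched in Python with SciPy. This is an illustration only: PEGS-Dirichlet uses Flanagan's Java library, whereas the sketch below assumes SciPy's Nelder-Mead implementation, and the function name `fit_beta_to_quartiles` and the tolerance settings are assumptions made for the example.

```python
import math

from scipy.optimize import minimize
from scipy.stats import beta


def fit_beta_to_quartiles(q1, q2, q3, a0, b0):
    """Minimize Q of (6.19) over (log a, log b) with the downhill simplex method.

    a0 and b0 are starting values, e.g. those given by (6.17) and (6.18).
    Returns the fitted (a, b) and the attained value of Q.
    """
    def objective(theta):
        # Work on the log scale so that a and b stay positive.
        a, b = math.exp(theta[0]), math.exp(theta[1])
        cdf = beta.cdf([q1, q2, q3], a, b)
        return ((cdf[0] - 0.25) ** 2 + (cdf[1] - 0.5) ** 2
                + (cdf[2] - 0.75) ** 2)

    result = minimize(objective, x0=[math.log(a0), math.log(b0)],
                      method="Nelder-Mead",
                      options={"xatol": 1e-10, "fatol": 1e-14, "maxiter": 2000})
    return math.exp(result.x[0]), math.exp(result.x[1]), float(result.fun)
```

Because only function evaluations of the beta cdf are needed, no derivatives of $Q$ have to be supplied, which is the property of the simplex method noted in the text.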
Our elicitation software, PEGS-Dirichlet, presents an interactive graph to the expert showing the previously assessed probability medians of all categories. The expert is asked to assess a lower and an upper probability quartile for each category by clicking on the graph. Once the two required quartiles are assessed for any single category, the proposed method of beta parameter elicitation is implemented by the software on the probability of this category. A pop-up window opens showing the pdf graph of the elicited beta distribution with the location of the three assessed quartiles. This gives instant feedback to the expert; see Figure 6.1.
[Screenshot: the PEGS-Dirichlet window "Eliciting Quartiles of the probabilities of each category", with a pop-up graph titled "The Beta Distribution of P2".]
Figure 6.1: Assessing probability quartiles of each category
If she is not satisfied with the fitted beta distribution, the expert can simply change her assessments of the two quartiles. The whole elicitation process is applied again whenever the expert changes her quartile assessments, and the pdf curve changes interactively to show the direct impact of changing the quartiles.
On finishing the elicitation process for all categories, the beta parameters are then reconciled to estimate the Dirichlet hyperparameter vector, as discussed in Section 6.3 below.
6.3
Eliciting a Dirichlet prior for a multinomial model
6.3.1
Introduction
A limited number of attempts have been made to develop elicitation methods for Dirichlet parameters; see Chapter 2 for more details. Jenkinson (2007) and O’Hagan et al. (2006) discussed two methods for Dirichlet elicitation, namely the method of Dickey et al. (1983) and that of Chaloner and Duncan (1987).
The elicitation method suggested by Dickey et al. (1983) starts by assessing the probability of each category directly from the expert. She is then given a hypothetical future sample of a fixed size and told the number of items in each category. She is asked to re-assess the probabilities given this hypothetical sample. The equivalent sample size that corresponds to her prior knowledge can thus be estimated using Bayes’ theorem.
Chaloner and Duncan (1983) give a method for eliciting a beta distribution. Chaloner and Duncan (1987) generalize this method and give an interactive graphical tool for Dirichlet elicitation. This is based on assessing the sample size and the modal values of Dirichlet variates, and then giving feedback to adjust the parameter values.
As mentioned before, van Dorp and Mazzuchi (2003, 2004) introduced a numerical algorithm that yields the Dirichlet parameters from quantile assessments. Their algorithm uses $k$ quantile assessments to estimate all the parameters of a $k$-dimensional Dirichlet distribution. However, we believe that it is better to assess more than $k$ quantiles and then apply some form of reconciliation to estimate the parameters.
Assuming a Dirichlet prior for the success probabilities is one way of reconciling separate marginal beta prior distributions. Eliciting a Dirichlet prior by using assessed beta marginal distributions was outlined in Bunn (1978, 1979). However, his elicitation method used the hypothetical future sample technique. He stated that the application of the usual univariate quantile methods may generally be difficult and tedious in practice because of the multivariate nature of the Dirichlet distribution. However, the availability of interactive graphs and efficient computing enables us to use the quantile method in an elicitation method that is quick and easy for the assessor.
In what follows, we propose some reconciliation methods, based on the Dirichlet distribution, for combining beta marginals that have already been assessed using the method introduced in Section 6.2.
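6.3.2
The multinomial and Dirichlet distributions
Let the random vector $X = (X_1, X_2, \cdots, X_k)$ be multinomially distributed with $k$ categories, $n$ trials and a vector of probabilities $\mathbf{p} = (p_1, p_2, \cdots, p_k)'$. Its probability mass function has the form
\[ f(x_1, x_2, \cdots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}, \tag{6.20} \]
\[ 0 \le x_i \le n, \quad \sum_{i=1}^{k} x_i = n, \quad 0 \le p_i \le 1, \quad \sum_{i=1}^{k} p_i = 1, \]
or, equivalently, the form
\[ f(x_1, x_2, \cdots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_{k-1}^{x_{k-1}} (1 - p_1 - p_2 - \cdots - p_{k-1})^{x_k}, \tag{6.21} \]
\[ 0 \le x_i \le n, \quad \sum_{i=1}^{k} x_i = n, \quad 0 \le p_i \le 1, \quad \sum_{i=1}^{k-1} p_i \le 1. \]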
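A conjugate prior for the parameter vector $\mathbf{p}$ is the Dirichlet distribution, which has the form
\[ \pi(p_1, p_2, \cdots, p_k) = \frac{\Gamma(N)}{\Gamma(a_1)\Gamma(a_2) \cdots \Gamma(a_k)}\; p_1^{a_1 - 1} p_2^{a_2 - 1} \cdots p_k^{a_k - 1}, \tag{6.22} \]
\[ 0 \le p_i \le 1, \quad \sum_{i=1}^{k} p_i = 1, \quad a_i > 0, \quad N = \sum_{i=1}^{k} a_i, \]
or, equivalently, the form
\[ \pi(p_1, p_2, \cdots, p_{k-1}) = \frac{\Gamma(N)}{\Gamma(a_1)\Gamma(a_2) \cdots \Gamma(a_k)}\; p_1^{a_1 - 1} p_2^{a_2 - 1} \cdots p_{k-1}^{a_{k-1} - 1} (1 - p_1 - p_2 - \cdots - p_{k-1})^{a_k - 1}, \tag{6.23} \]
\[ 0 \le p_i \le 1, \quad \sum_{i=1}^{k-1} p_i \le 1, \quad a_i > 0, \quad N = \sum_{i=1}^{k} a_i. \]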
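It is well-known that the expectations, variances and covariances of the Dirichlet variates $p_i$, for $i = 1, 2, \cdots, k$, are given by
\[ E(p_i) = \frac{a_i}{N}, \tag{6.24} \]
\[ \text{Var}(p_i) = \frac{a_i (N - a_i)}{N^2 (N + 1)}, \tag{6.25} \]
\[ \text{Cov}(p_i, p_j) = -\frac{a_i a_j}{N^2 (N + 1)}, \quad i \ne j. \tag{6.26} \]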
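For illustration, the standard Dirichlet moment formulae (mean $a_i/N$, variance $a_i(N - a_i)/[N^2(N+1)]$ and covariance $-a_i a_j/[N^2(N+1)]$) can be evaluated with a short helper. The function name `dirichlet_moments` is an assumption made for this sketch.

```python
def dirichlet_moments(a):
    """Means, variances and covariance matrix of a Dirichlet(a_1, ..., a_k)."""
    if any(ai <= 0 for ai in a):
        raise ValueError("all Dirichlet parameters must be positive")
    n_total = sum(a)                          # N = a_1 + ... + a_k
    scale = n_total ** 2 * (n_total + 1)      # common denominator N^2 (N + 1)
    means = [ai / n_total for ai in a]
    variances = [ai * (n_total - ai) / scale for ai in a]
    # Off-diagonal entries are always negative; diagonal entries are the variances.
    cov = [[variances[i] if i == j else -a[i] * a[j] / scale
            for j in range(len(a))] for i in range(len(a))]
    return means, variances, cov
```

Each row of the covariance matrix sums to zero, reflecting the unit-sum constraint on the probabilities, and every off-diagonal entry is negative whatever the parameter values.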
To elicit the vector of hyperparameters $a = (a_1, \cdots, a_k)$, we use the direct relation between the Dirichlet distribution and its special univariate case, the beta distribution. We have already developed, in Section 6.2, a method of eliciting the two hyperparameters of a beta distribution. The hyperparameters of the Dirichlet distribution can be induced from those of the univariate beta distributions through some form of reconciliation. This can be done using either the standard marginal beta distributions of the multinomial probabilities, or the conditional scaled beta distribution of each of them. In what follows, these two proposed approaches are given in detail.
6.3.3
The marginal approach
Consider the form in (6.20) for the multinomial distribution with the conjugate prior Dirichlet distribution in (6.22). It is well-known that, from (6.20), the marginal distribution of each $X_i$ is a binomial distribution with the two parameters $n_i = n$ and $p_i$, $i = 1, 2, \cdots, k$. It is straightforward to show, using the Dirichlet pdf in (6.22), that the marginal distribution of each $p_i$ is a beta distribution:
\[ p_i \sim \text{beta}(\alpha_i, \beta_i), \quad \text{for } i = 1, 2, \cdots, k, \tag{6.27} \]
where
\[ \alpha_i = a_i, \qquad \beta_i = N - a_i. \tag{6.28} \]
Assessment tasks
Exploiting the beta marginal distributions, the elicitation process may be divided into $k$ steps. At each step, the expert will be asked to assess three quartiles for $p_i$, the binomial probability of category $i$ ($i = 1, 2, \cdots, k$). See Figure 6.1, where the lower and upper quartiles have already been elicited for the first two categories. These quartiles can then be used to estimate the two hyperparameters $\alpha_i$ and $\beta_i$ of the beta prior distribution of $p_i$, as proposed in Section 6.2. Since we use the marginal approach, the categories here are interchangeable: neither the starting category nor the order of the categories matters.
To reconcile these separate marginal beta distributions into a Dirichlet distribution, we use a least-squares technique as follows.
Least-squares techniques
The system of equations in (6.28) will not, in general, have a consistent solution $a = (a_1\ a_2\ \cdots\ a_k)'$. From (6.28), each marginal step of the elicitation process provides estimates of $a_i$ and $N_i$, namely
\[ a_i = \alpha_i, \quad \text{for } i = 1, 2, \cdots, k, \tag{6.29} \]
and
\[ N_i = \alpha_i + \beta_i, \quad \text{for } i = 1, 2, \cdots, k. \tag{6.30} \]
The estimated hyperparameters must fulfill the unit sum constraint of the probability expectations, i.e. they must satisfy
\[ \sum_{i=1}^{k} \mu_i = 1, \quad \text{where } \mu_i = \frac{\alpha_i}{N_i}, \quad i = 1, 2, \cdots, k. \tag{6.31} \]
Lindley et al. (1979) investigated the reconciliation of assessments that are inconsistent with the laws of probability (incoherent). They developed least-squares procedures as reconciliation tools that may be used for any expert’s incoherent assessments. Following their approach, we propose the following options for reconciling different incoherent estimates of $\mu_i$ and $N$, yielding coherent estimates $\mu_i^*$ and $N^*$, respectively.
Options for $\mu_i^*$:
1. Normalize each $\mu_i$, as required for the Dirichlet distribution, giving
\[ \mu_i^* = \frac{\mu_i}{\sum_{j=1}^{k} \mu_j}, \quad i = 1, 2, \cdots, k. \tag{6.32} \]
2. Minimize the sum of squares of differences between $\mu_i^*$ and $\mu_i$, $i = 1, 2, \cdots, k$, subject to the constraint $\sum_{i=1}^{k} \mu_i^* = 1$. This can be done using Lagrangian multipliers to minimize $Q$ as follows.
\[ \text{Minimize } Q = \sum_{i=1}^{k} (\mu_i^* - \mu_i)^2 + \lambda \Big( \sum_{i=1}^{k} \mu_i^* - 1 \Big). \tag{6.33} \]
Solve for $\mu_i^*$, giving
\[ \mu_i^* = \mu_i + \frac{1}{k} \Big( 1 - \sum_{j=1}^{k} \mu_j \Big), \quad i = 1, 2, \cdots, k. \tag{6.34} \]
However, the values of $\mu_i^*$ computed here using Lagrangian optimization are not guaranteed to be positive. If negative values are found, we replace the Lagrangian multipliers method with a numerical restricted minimization technique. The downhill simplex method of Nelder and Mead (1965) can also perform restricted minimization as follows.
\[ \text{Minimize } Q = \sum_{i=1}^{k} (\mu_i^* - \mu_i)^2, \tag{6.35} \]
such that
\[ 0 < \mu_i^* < 1, \quad i = 1, 2, \cdots, k, \tag{6.36} \]
\[ \sum_{i=1}^{k} \mu_i^* = 1. \tag{6.37} \]
To solve this restricted optimization problem for $\mu_i^*$, $i = 1, 2, \cdots, k$, our elicitation software, PEGS-Dirichlet, implements a program for minimization written by Flanagan (2011). The initial values for this method are obtained from (6.32).
3. The option in (6.34) changes each value of $\mu_i$ by adding a fixed amount. However, the precision of each estimate, i.e. the inverse of its variance, can be used as a weight to reflect the expert’s confidence in each of her assessments [Lindley et al. (1979)]. A constrained weighted least-squares procedure can be formulated as follows.
\[ \text{Minimize } Q = \sum_{i=1}^{k} w_i (\mu_i^* - \mu_i)^2 + \lambda \Big( \sum_{i=1}^{k} \mu_i^* - 1 \Big), \tag{6.38} \]
where
\[ w_i = [\text{Var}(p_i)]^{-1} = \frac{(\alpha_i + \beta_i)^2 (\alpha_i + \beta_i + 1)}{\alpha_i \beta_i}, \quad i = 1, 2, \cdots, k. \tag{6.39} \]
Solving for $\mu_i^*$ gives
\[ \mu_i^* = \mu_i + \frac{1/w_i}{\sum_{j=1}^{k} 1/w_j} \Big( 1 - \sum_{j=1}^{k} \mu_j \Big), \quad i = 1, 2, \cdots, k. \]
Again, the restricted downhill simplex minimization is used if negative values of $\mu_i^*$ are found:
\[ \text{Minimize } Q = \sum_{i=1}^{k} w_i (\mu_i^* - \mu_i)^2, \tag{6.40} \]
under the same constraints given by (6.36) and (6.37), using initial values as in (6.32).
Options for $N^*$:
1. Since no constraints are imposed on $N^*$, minimizing the sum of squares
\[ \text{Minimize } Q = \sum_{i=1}^{k} (N^* - N_i)^2 \]
gives the average
\[ N^* = \frac{1}{k} \sum_{i=1}^{k} N_i. \tag{6.41} \]
2. Using the same weights as in (6.39) gives the weighted average
\[ N^* = \frac{\sum_{i=1}^{k} w_i N_i}{\sum_{i=1}^{k} w_i} \tag{6.42} \]
as a solution of
\[ \text{Minimize } Q = \sum_{i=1}^{k} w_i (N^* - N_i)^2. \]
Estimating $\mu_i^*$ and $N^*$, using any of the options listed above, makes it easy to estimate $a_i$ by $a_i^*$, where
\[ a_i^* = \mu_i^* N^*, \quad i = 1, 2, \cdots, k. \]
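The reconciliation options can be sketched in a few lines of Python. The sketch below covers only the closed-form solutions (normalization, the Lagrangian least-squares solution and its weighted version); the restricted simplex search used when a closed form gives a non-positive mean is not implemented here and is only signalled by an error. The function names and the `option` labels are assumptions made for this example.

```python
def beta_variance(alpha, beta):
    """Variance of a beta(alpha, beta) distribution."""
    s = alpha + beta
    return alpha * beta / (s ** 2 * (s + 1))


def reconcile_marginals(alphas, betas, option="normalize"):
    """Combine k marginal beta(alpha_i, beta_i) fits into Dirichlet parameters a*."""
    k = len(alphas)
    n_i = [al + be for al, be in zip(alphas, betas)]   # N_i = alpha_i + beta_i
    mu = [al / n for al, n in zip(alphas, n_i)]        # mu_i = alpha_i / N_i
    shortfall = 1.0 - sum(mu)                          # amount needed to reach unit sum
    weights = [1.0 / beta_variance(al, be) for al, be in zip(alphas, betas)]
    if option == "normalize":
        total = sum(mu)
        mu_star = [m / total for m in mu]
        n_star = sum(n_i) / k
    elif option == "least_squares":
        mu_star = [m + shortfall / k for m in mu]      # equal additive correction
        n_star = sum(n_i) / k
    elif option == "weighted":
        inv_w = [1.0 / w for w in weights]
        total_inv_w = sum(inv_w)
        # Less precise assessments (small w_i) absorb more of the correction.
        mu_star = [m + iw / total_inv_w * shortfall for m, iw in zip(mu, inv_w)]
        n_star = sum(w * n for w, n in zip(weights, n_i)) / sum(weights)
    else:
        raise ValueError("unknown option: " + option)
    if min(mu_star) <= 0:
        raise ValueError("closed form gave a non-positive mean; "
                         "a restricted numerical minimization would be needed")
    return [m * n_star for m in mu_star]               # a_i* = mu_i* N*
```

When the marginal fits are already coherent, for instance `alphas = [2, 3, 5]` with `betas = [8, 7, 5]`, all three options return the same vector, since there is nothing to reconcile.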
Implementation and feedback
We use three different combinations of the options given above as follows:
1. Direct normalization of $\mu_i$ as in (6.32) and the average $N^*$ in (6.41).
2. Least-squares optimization for $\mu_i^*$ as in (6.33) or (6.35), and for $N^*$ as in (6.41).
3. Weighted least-squares optimization for $\mu_i^*$ as in (6.38) or (6.40), and for $N^*$ as in (6.42).
The software produces three hyperparameter vectors of the Dirichlet distribution, one vector for each of the above combinations. Each vector is then used to compute the corresponding pairs of marginal beta parameters as given in (6.28). Three quartiles for each beta marginal are computed numerically for each different Dirichlet hyperparameter vector. The three sets of quartiles are then displayed to the expert and she is asked to select the set of quartiles that best represents her opinion. The vector with the selected set of quartiles will be taken as the final elicited hyperparameter vector of the Dirichlet prior. See Figure 6.2, where the first two combinations are shown and the expert has selected the second one.
[Screenshot: the PEGS-Dirichlet feedback window "Unconditional Medians and quartiles already assessed for Each Category", headed "Here are your unconditional assessments, you may change any of them!".]
Figure 6.2: A feedback screen showing 2 different quartile options
The expert is still able, however, to modify any or all of the selected set of quartiles, in which case beta parameters are computed again as in Section 6.2, and the final Dirichlet hyperparameter vector is computed according to equations (6.29)-(6.32) and (6.41).
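6.3.4
The conditional approach
Consider the form of multinomial distribution given in (6.21), with the form of conjugate Dirichlet distribution given in (6.23). If
\[ \mathbf{p}_{k-1} = (p_1\ p_2\ \cdots\ p_{k-1})' \sim \text{Dirichlet}(a_1, a_2, \cdots, a_k), \]
then it can be shown [e.g. Wilks (1962)] that the marginal distribution of any subset of $\mathbf{p}_{k-1}$ is again a Dirichlet distribution, e.g.
\[ \mathbf{p}_r = (p_1\ p_2\ \cdots\ p_r)' \sim \text{Dirichlet}\Big(a_1, a_2, \cdots, a_r, \sum_{i=r+1}^{k} a_i\Big), \quad 1 \le r \le k - 1. \]
For $1 < r \le k - 1$, we can get the following conditional scaled beta distributions
\[ \pi(p_r \mid p_1, p_2, \cdots, p_{r-1}) = \frac{\pi(p_1, p_2, \cdots, p_r)}{\pi(p_1, p_2, \cdots, p_{r-1})} \tag{6.43} \]
\[ = \frac{\Gamma\big(\sum_{i=r}^{k} a_i\big)}{\Gamma(a_r)\, \Gamma\big(\sum_{i=r+1}^{k} a_i\big)} \cdot \frac{p_r^{a_r - 1} \big(1 - \sum_{i=1}^{r} p_i\big)^{\sum_{i=r+1}^{k} a_i - 1}}{\big(1 - \sum_{i=1}^{r-1} p_i\big)^{\sum_{i=r}^{k} a_i - 1}}, \tag{6.44} \]
which are the scaled beta distributions over the intervals $(0, 1 - \sum_{i=1}^{r-1} p_i)$. The distributions in (6.44) are also known as three parameter beta distributions, i.e.
\[ (p_r \mid p_1, p_2, \cdots, p_{r-1}) \sim \text{beta}\Big(a_r, \sum_{i=r+1}^{k} a_i, 1 - \sum_{i=1}^{r-1} p_i\Big), \quad \text{for } 1 < r \le k - 1. \]
Applying the transformation
\[ p_r^* = \begin{cases} p_1, & \text{for } r = 1, \\ \dfrac{p_r}{1 - \sum_{i=1}^{r-1} p_i}, & \text{for } r = 2, 3, \cdots, k - 1, \end{cases} \]
gives
\[ p_r^* \sim \text{beta}\Big(a_r, \sum_{i=r+1}^{k} a_i\Big), \quad \text{for } r = 1, 2, \cdots, k - 1. \tag{6.45} \]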
Assessment tasks
The elicitation process is conducted as follows:
• The expert chooses the most convenient category to start with; we denote its probability as $p_1$.
• The expert assesses three quartiles for $p_1$, which are then converted into estimates of the two hyperparameters $\alpha_1$ and $\beta_1$ of the beta distribution, beta($\alpha_1, \beta_1$).
• The expert is asked to assume that the median value she gave in the first step is the correct value of $p_1$, and she then assesses three quartiles for $p_2$. Figure 6.3 shows the graph after the median and lower quartile of the second category have been assessed by the expert, given the median of the first category as shown by the red bar.
[Screenshot: the PEGS-Dirichlet window "Eliciting Quartiles of the probabilities of Category (Category 2)"; a bar chart of Categories 1 to 4 with a probability axis running from 0.00 to 0.95. Status line: "You assessed the lower quartile probability of category (Category 2) to be (0.213)."]
Figure 6.3: Assessing conditional quartiles for Dirichlet elicitation
• Dividing each of the three quartiles of $p_2$ by $1 - p_1$, we get the quartiles of $p_2^*$. Hence we obtain estimates of the hyperparameters $\alpha_2$ and $\beta_2$ of the marginal beta distribution of $p_2^*$.
• The process is repeated for each category except for the last one. For $r = 3, 4, \cdots, k - 1$, the expert gives quartiles for $(p_r \mid p_1, p_2, \cdots, p_{r-1})$. Dividing by $1 - \sum_{i=1}^{r-1} p_i$ gives the three quartiles of $p_r^*$, which are used to estimate the two hyperparameters $\alpha_r$ and $\beta_r$ of its marginal distribution. (We do not require the marginal distribution of $p_k$.)
• To help the expert during this task, the software presents an interactive graph showing the pdf curve of the conditional beta distribution of $(p_r \mid p_1, \cdots, p_{r-1})$, for $r = 2, 3, \cdots, k - 1$; see Figure 6.4. The expert is able to change her assessed conditional quartiles of $p_r$ until she finds the conditional pdf curve an acceptable representation of her opinion.
[Screenshot: a pop-up graph titled "The Conditional Scaled Beta Distribution of P2", shown over the window "Eliciting Quartiles of the probabilities of Category (Category 2)".]
Figure 6.4: Assessing conditional quartiles with scaled beta feedback
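The rescaling step in the assessment tasks, dividing conditional quartiles by one minus the sum of the conditioning values, can be sketched as follows. The function name and the numbers in the usage note are made up for illustration and do not come from the thesis.

```python
def rescale_conditional_quartiles(cond_quartiles, conditioning_values):
    """Map quartiles of (p_r | p_1, ..., p_{r-1}) to quartiles of p_r* on (0, 1).

    conditioning_values are the values assumed for p_1, ..., p_{r-1}
    (the expert's earlier medians in the conditional approach).
    """
    remaining = 1.0 - sum(conditioning_values)
    if remaining <= 0:
        raise ValueError("conditioning values leave no probability mass")
    scaled = [q / remaining for q in cond_quartiles]
    if not 0.0 < scaled[0] < scaled[1] < scaled[2] < 1.0:
        raise ValueError("rescaled quartiles must be increasing and inside (0, 1)")
    return scaled
```

For instance, if the first category is fixed at 0.4 and the expert gives conditional quartiles 0.21, 0.30 and 0.37 for the second category, the rescaled median is 0.30 / 0.60 = 0.5. The rescaled quartiles can then be passed to the beta fitting method of Section 6.2.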
Eliciting the hyperparameter vector
Using (6.45), we get the following system of equations
\[ \alpha_r = a_r, \qquad \beta_r = \sum_{i=r+1}^{k} a_i, \qquad \text{for } r = 1, 2, \cdots, k - 1. \tag{6.46} \]
Each elicited beta distribution has its own different estimate of $N$, given by
\[ N_r = \sum_{i=1}^{r-1} \alpha_i + \alpha_r + \beta_r, \tag{6.47} \]
based on $\alpha_i$, $i = 1, 2, \cdots, r - 1$, which have been estimated in previous steps.
The system of equations in (6.46), as in the marginal approach, might not be consistent nor have a unique solution for $a = (a_1, a_2, \cdots, a_k)$. So, we try to find a way of averaging this system to get a vector of estimates $a^* = (a_1^*, a_2^*, \cdots, a_k^*)$ that is a good representation of the expert’s opinion. We believe that keeping the mean value fixed, where possible, while moving from different beta distributions to a Dirichlet distribution may be a sensible approach.
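Using (6.24), put
\[ \mu_r = \frac{a_r}{N}, \quad r = 1, 2, \cdots, k. \]
Hence, in view of (6.46) and (6.47),
\[ \mu_r = \begin{cases} \dfrac{\alpha_r}{\sum_{i=1}^{r} \alpha_i + \beta_r}, & r = 1, 2, \cdots, k - 1, \\[2ex] \dfrac{\beta_{k-1}}{\sum_{i=1}^{k-1} \alpha_i + \beta_{k-1}}, & r = k. \end{cases} \tag{6.48} \]
Since, for the Dirichlet distribution, it is required that
\[ \sum_{r=1}^{k} \mu_r = 1, \]
we normalize the set of $\mu_r$, for $r = 1, 2, \cdots, k$, to get
\[ \mu_r^* = \frac{\mu_r}{\sum_{j=1}^{k} \mu_j}, \quad r = 1, 2, \cdots, k. \]
Moreover, let $N^*$ be an estimate of $N$ and take
\[ a_r^* = \mu_r^* N^*, \quad r = 1, 2, \cdots, k. \]
It remains now to find a proper estimate $N^*$. We take this as the average of all the denominators in (6.48):
\[ N^* = \frac{1}{k} \left[ \sum_{r=1}^{k-1} \Big( \sum_{i=1}^{r} \alpha_i + \beta_r \Big) + \sum_{i=1}^{k-1} \alpha_i + \beta_{k-1} \right]. \]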
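A minimal sketch of this conditional reconciliation is given below. It assumes the reading of (6.48) in which the mean of category $r$ is $\alpha_r$ divided by the sum of $\alpha_1, \cdots, \alpha_r$ and $\beta_r$, with the last category using $\beta_{k-1}$ over the final denominator, and it takes $N^*$ as the average of the $k$ denominators. The function name is an assumption made for this example.

```python
def dirichlet_from_conditional_betas(alphas, betas):
    """Dirichlet parameters a* from the k-1 beta fits of the conditional approach.

    alphas[r], betas[r] are the fitted parameters of p*_{r+1}, r = 0, ..., k-2.
    """
    if len(alphas) != len(betas) or not alphas:
        raise ValueError("need matching, non-empty parameter lists")
    denominators = []
    running_alpha = 0.0
    for al, be in zip(alphas, betas):
        running_alpha += al
        denominators.append(running_alpha + be)   # sum_{i<=r} alpha_i + beta_r
    mu = [al / d for al, d in zip(alphas, denominators)]
    # The last category reuses the final denominator, with beta_{k-1} on top.
    mu.append(betas[-1] / denominators[-1])
    denominators.append(denominators[-1])
    total = sum(mu)
    mu_star = [m / total for m in mu]             # normalize to unit sum
    n_star = sum(denominators) / len(denominators)
    return [m * n_star for m in mu_star]          # a_r* = mu_r* N*
```

If the conditional fits happen to be exactly consistent with a Dirichlet distribution, the original parameters are recovered: the fits beta(2, 8) and beta(3, 5) correspond to a Dirichlet(2, 3, 5) distribution, and the function returns that vector.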
Changing the expert’s selection of the first category, as well as the order of conditioning categories at each step, will lead to different estimates of $a$. To overcome this, one possibility is to repeat the whole process several times, using different starting categories and orderings. This will give several estimates of $a^*$, and a simple average of them might give a suitable choice for $a^*$. However, showing the quartiles of the marginal beta distributions as feedback to the expert, and offering her the option of changing them, seems another sensible option.
Feedback
The feedback process for the conditional approach is similar to that for the marginal approach. The main difference is in the relationship between the Dirichlet hyperparameters and beta parameters in the two approaches. To present the quartiles of each probability $p_i$, $i = 1, 2, \cdots, k$, as feedback to the expert after applying the conditional elicitation approach, we must first compute the parameters of the marginal beta distributions.
The two parameters $\alpha_i$ and $\beta_i$ of each marginal beta distribution of $p_i$, $i = 1, 2, \cdots, k$, can be simply computed from the already elicited hyperparameter vector $a^*$ of the Dirichlet distribution:
\[ \alpha_i = a_i^*, \qquad \beta_i = \sum_{j=1,\, j \ne i}^{k} a_j^*. \]
These can be used to compute numerically the three quartiles of each beta marginal distribution. The computed quartiles are then presented to the expert as feedback; see Figure 6.5.
[Screenshot: the PEGS-Dirichlet feedback window "Unconditional Medians and quartiles already assessed for Each Category", headed "Here are your unconditional assessments, you may change any of them!", with a "Change Quartiles" option.]
Figure 6.5: The feedback graph presenting marginal quartiles
The expert is asked to change any of the quartiles that do not satisfactorily represent her opinion. If any (or all) of these marginal quartiles are changed by the expert, we apply the marginal approach to re-elicit the Dirichlet hyperparameters as follows.
The new set of modified marginal beta quartiles is used to elicit new pairs of beta parameters as proposed in Section 6.2. Using these new parameters, together with equations (6.29), (6.30) and (6.31), we apply the first combination proposed in the marginal approach in Section 6.3.3. We implement the first combination, which uses simple averaging, as a quick and straightforward way to recompute the Dirichlet hyperparameter vector using the new set of modified quartiles. The whole process can be applied repeatedly until the expert is satisfied with the quartiles in the feedback.
6.4
Concluding comments
A reasonable method for eliciting beta parameters using quartiles has been proposed. The method combines two different approaches that have been used separately in the literature. A normal approximation was used to compute initial parameter values, which have then been optimized using a least-squares technique. In order to elicit the hyperparameter vector of the Dirichlet distribution, we made use of both the marginal and conditional beta distributions in two different approaches. The two approaches are programmed in the PEGS-Dirichlet software that is freely available at http://statistics.open.ac.uk/elicitation.
As it is the simplest conjugate prior distribution for multinomial models, the Dirichlet distribution is very tractable. However, its lack of flexibility limits its usefulness as a prior distribution. In the next chapter, we discuss the drawbacks of the Dirichlet distribution and propose new elicitation methods that give more flexible prior distributions for multinomial models.
Chapter 7
Eliciting more flexible priors for multinomial models
7.1
Introduction
Being a conjugate prior for multinomial models, the standard Dirichlet distribution is widely used for its tractability and mathematical simplicity. However, the Dirichlet distribution in its standard form has been criticized as insufficiently flexible to represent prior information about the parameters of multinomial models [e.g. Good (1976), Aitchison (1986), O’Hagan and Forster (2004), Wong (2007)].
The main criticisms of the Dirichlet distribution can be summarized as follows.
1. It has a limited number of parameters. A $k$-variate Dirichlet distribution is specified by only $k$ parameters. These determine all the $k$ means, $k$ variances and the $k(k-1)/2$ covariances, as given in (6.24)-(6.26).
2. The relative magnitudes of the $a_i$ determine the prior means, while only the overall magnitude $N = \sum a_i$ determines all the variances and covariances if the means are kept fixed.
3. Consequently, the dependence structure between Dirichlet variates cannot be determined independently of its mean values.
4. Dirichlet variates are always negatively correlated, as can be seen from the covariance formulae in (6.26), which may not represent prior belief.
5. Dirichlet variates that have the same mean necessarily have equal variances.
Motivated by these deficiencies, many authors have been interested in constructing new families of distributions for proportions to allow more general dependence structures [e.g. Leonard (1975), Aitchison (1982), Albert and Gupta (1982), Krzysztofowicz and Reese (1993), Rayens and Srinivasan (1994), Tian et al. (2010)].
Some of these new distributions are direct generalizations of the standard Dirichlet distribution [e.g. Dickey (1968, 1983), Connor and Mosimann (1969), Grunwald et al. (1993), Hankin (2010)]. We select one of them and develop a method of eliciting its hyperparameters as a prior distribution for the multinomial model. The selected generalized Dirichlet distribution shares some of the desirable properties of the standard Dirichlet distribution. It is conjugate, reasonably tractable and can be elicited via the beta elicitation procedure proposed in Chapter 6. The method of eliciting a generalized Dirichlet distribution is given in Section 7.2 and an example illustrating its use is given in Section 7.3. A Gaussian copula function is proposed in Section 7.4 as a flexible multivariate distribution that combines the marginal beta distributions that an expert has assessed.
7.2
Eliciting a generalized Dirichlet prior for a multinomial model
7.2.1
Connor-Mosimann distribution
Connor and Mosimann (1969) introduced a form of the generalized Dirichlet distribution that is also known as the Connor-Mosimann distribution. It has a more general covariance structure than the standard Dirichlet distribution and a larger number of parameters, $2(k - 1)$.
Its properties have been investigated by Lochner (1975) and by Wong (1998), who also used it as a prior distribution in a real life engineering application in Wong (2005) and addressed its maximum likelihood estimation in Wong (2010). The density function can be written in the form [Connor and Mosimann (1969)],
\[ \pi(p_1, p_2, \cdots, p_k) = p_k^{b_{k-1} - 1} \prod_{i=1}^{k-1} \left[ \frac{\Gamma(a_i + b_i)}{\Gamma(a_i)\Gamma(b_i)}\; p_i^{a_i - 1} \Big( \sum_{j=i}^{k} p_j \Big)^{b_{i-1} - (a_i + b_i)} \right], \tag{7.1} \]
\[ 0 \le p_i \le 1, \quad \sum_{i=1}^{k} p_i = 1, \quad a_i > 0, \quad b_i > 0, \quad b_0 \text{ is arbitrary}. \]
Or, equivalently, in the form [Lochner (1975)]
\[ \pi(p_1, p_2, \cdots, p_{k-1}) = \prod_{i=1}^{k-1} \left[ \frac{\Gamma(a_i + b_i)}{\Gamma(a_i)\Gamma(b_i)}\; p_i^{a_i - 1} (1 - p_1 - p_2 - \cdots - p_i)^{\gamma_i} \right], \tag{7.2} \]
\[ 0 \le p_i \le 1, \quad \sum_{i=1}^{k-1} p_i \le 1, \quad a_i > 0, \quad b_i > 0, \]
where $\gamma_i = b_i - (a_{i+1} + b_{i+1})$, for $i = 1, 2, \cdots, k - 2$, and $\gamma_{k-1} = b_{k-1} - 1$.
The standard Dirichlet distribution is a special case of the Connor-Mosimann distribution when $b_i = a_{i+1} + b_{i+1}$, for $i = 1, 2, \cdots, k - 2$. Moreover, it is also a conjugate prior to the multinomial distribution. See, for example, Wong (1998).
This generalized Dirichlet distribution can be obtained by transforming $(k - 1)$ independent beta variates $Z_1, Z_2, \cdots, Z_{k-1}$, each with parameters $a_i$ and $b_i$, for $i = 1, 2, \cdots, k - 1$, as follows
\[ p_j = \begin{cases} Z_1, & \text{for } j = 1, \\ Z_j \prod_{i=1}^{j-1} (1 - Z_i), & \text{for } j = 2, \cdots, k - 1. \end{cases} \tag{7.3} \]
The remaining variable $p_k$ can also be given, in terms of $Z_1, Z_2, \cdots, Z_k$, as
\[ p_k = Z_k \prod_{i=1}^{k-1} (1 - Z_i), \tag{7.4} \]
where, by definition, $Z_k = 1$.
The inverse transformations are given by
\[ Z_j = \begin{cases} p_1, & \text{for } j = 1, \\ \dfrac{p_j}{1 - \sum_{i=1}^{j-1} p_i}, & \text{for } j = 2, \cdots, k. \end{cases} \tag{7.5} \]
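The transformation from independent beta variates to the probabilities, and its inverse, can be sketched as follows. The function names are assumptions made for this example, and sampling uses the standard library's `random.betavariate`.

```python
import random


def connor_mosimann_from_betas(z):
    """Map Z_1, ..., Z_{k-1} in (0, 1) to p_1, ..., p_k as in (7.3) and (7.4)."""
    p, remaining = [], 1.0
    for zj in z:
        p.append(zj * remaining)   # p_j = Z_j * prod_{i<j} (1 - Z_i)
        remaining *= 1.0 - zj
    p.append(remaining)            # p_k takes whatever mass is left (Z_k = 1)
    return p


def betas_from_connor_mosimann(p):
    """Inverse map (7.5), returning Z_1, ..., Z_{k-1}."""
    z, remaining = [], 1.0
    for pj in p[:-1]:
        z.append(pj / remaining)   # Z_j = p_j / (1 - sum_{i<j} p_i)
        remaining -= pj
    return z


def sample_generalized_dirichlet(a, b, rng=random):
    """Draw one probability vector using independent beta(a_i, b_i) variates."""
    z = [rng.betavariate(ai, bi) for ai, bi in zip(a, b)]
    return connor_mosimann_from_betas(z)
```

The construction is a form of sequential splitting: each $Z_j$ is the share that category $j$ takes of the probability mass not yet allocated to earlier categories, which is why the resulting vector always sums to one.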
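The first two moments of the generalized Dirichlet variates can be computed, in view of (7.3) and (7.4), as
\[ S_j = E(p_j) = \begin{cases} E(Z_1), & \text{for } j = 1, \\ E(Z_j) \prod_{i=1}^{j-1} E(1 - Z_i), & \text{for } j = 2, \cdots, k, \end{cases} \tag{7.6} \]
and
\[ T_j = E(p_j^2) = \begin{cases} E(Z_1^2), & \text{for } j = 1, \\ E(Z_j^2) \prod_{i=1}^{j-1} E[(1 - Z_i)^2], & \text{for } j = 2, \cdots, k. \end{cases} \tag{7.7} \]
Hence, using well-known formulae for the first two moments of the standard beta distribution, and since $Z_k = 1$, we write
\[ S_j = \begin{cases} \dfrac{a_1}{a_1 + b_1}, & \text{for } j = 1, \\[2ex] \dfrac{a_j}{a_j + b_j} \prod_{i=1}^{j-1} \dfrac{b_i}{a_i + b_i}, & \text{for } j = 2, \cdots, k - 1, \\[2ex] \prod_{i=1}^{k-1} \dfrac{b_i}{a_i + b_i}, & \text{for } j = k, \end{cases} \tag{7.8} \]
\[ T_j = \begin{cases} \dfrac{a_1(a_1 + 1)}{(a_1 + b_1)(a_1 + b_1 + 1)}, & \text{for } j = 1, \\[2ex] \dfrac{a_j(a_j + 1)}{(a_j + b_j)(a_j + b_j + 1)} \prod_{i=1}^{j-1} \dfrac{b_i(b_i + 1)}{(a_i + b_i)(a_i + b_i + 1)}, & \text{for } j = 2, \cdots, k - 1, \\[2ex] \prod_{i=1}^{k-1} \dfrac{b_i(b_i + 1)}{(a_i + b_i)(a_i + b_i + 1)}, & \text{for } j = k, \end{cases} \tag{7.9} \]
and
\[ \text{Var}(p_j) = T_j - S_j^2, \quad \text{for } j = 1, 2, \cdots, k. \tag{7.10} \]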
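The moment formulae for the generalized Dirichlet variates lend themselves to a single pass with two running products, one for $E(1 - Z_i)$ and one for $E[(1 - Z_i)^2]$. The sketch below is illustrative; the function name is an assumption.

```python
def generalized_dirichlet_moments(a, b):
    """Return lists (S, T, Var) for p_1, ..., p_k given a_1..a_{k-1}, b_1..b_{k-1}."""
    s, t = [], []
    prod_first, prod_second = 1.0, 1.0   # products of E(1 - Z_i) and E[(1 - Z_i)^2]
    for ai, bi in zip(a, b):
        n = ai + bi
        s.append(ai / n * prod_first)
        t.append(ai * (ai + 1) / (n * (n + 1)) * prod_second)
        prod_first *= bi / n
        prod_second *= bi * (bi + 1) / (n * (n + 1))
    # Last category: Z_k = 1, so only the products remain.
    s.append(prod_first)
    t.append(prod_second)
    var = [tj - sj ** 2 for sj, tj in zip(s, t)]
    return s, t, var
```

As a check on the special case, `a = [2, 3]` and `b = [8, 5]` satisfy $b_1 = a_2 + b_2$, so the result should coincide with the moments of a standard Dirichlet(2, 3, 5) distribution, with means 0.2, 0.3 and 0.5.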
Regarding covariances, Connor and Mosimann (1969) showed that
\[ \text{Cov}(p_1, p_j) = -\frac{E(p_j)}{E(1 - p_1)}\, \text{Var}(p_1), \quad \text{for } j = 2, \cdots, k, \tag{7.11} \]
\[ \text{Cov}(p_j, p_{j+1}) = E(Z_{j+1})\, E[Z_j(1 - Z_j)] \prod_{i=1}^{j-1} E[(1 - Z_i)^2] - E(p_j) E(p_{j+1}), \quad \text{for } j = 2, \cdots, k - 1, \tag{7.12} \]
and
\[ \text{Cov}(p_j, p_m) = \frac{E(Z_m)}{E(Z_{j+1})} \prod_{i=j+1}^{m-1} E(1 - Z_i)\; \text{Cov}(p_j, p_{j+1}), \quad \text{for } 1 < j < m \le k. \tag{7.13} \]
Therefore, $p_1$ is always negatively correlated with all other variates. However, any other two successive variates can be positively correlated, as can be seen from equation (7.12). Moreover, the correlation between any $p_j$ and $p_m$, for $1 < j < m \le k$, has the same sign as that of $\text{Cov}(p_j, p_{j+1})$. In this sense, the generalized Dirichlet distribution has a more flexible dependence structure than the standard Dirichlet, which always imposes negative correlations between all pairs of variables, as mentioned before. Similar results were found by Lochner (1975), while Wong (2005) used these properties to select a generalized Dirichlet prior for sorting probabilities of microelectronic chips that tend to be positively correlated.
As in the case of the standard Dirichlet distribution, the conditional distributions of the generalized Dirichlet variates are still scaled beta distributions. This can be shown, using the marginal distributions of the generalized Dirichlet distribution, as follows.
If $\mathbf{p}_{k-1} = (p_1, p_2, \cdots, p_{k-1})$ has a generalized Dirichlet distribution of the form (7.2), then the marginal distribution of any subset from $\mathbf{p}_{k-1}$, say $\mathbf{p}_r = (p_1, p_2, \cdots, p_r)$, $r = 2, 3, \cdots, k - 1$, is again a generalized Dirichlet distribution with the corresponding parameters [e.g. Wong (1998)].
The conditional distributions of $p_r \mid p_1, p_2, \cdots, p_{r-1}$, for $r = 2, 3, \cdots, k - 1$, can be computed from (7.2) as follows
\[ \pi(p_r \mid p_1, p_2, \cdots, p_{r-1}) = \frac{\pi(\mathbf{p}_r;\, a_1, a_2, \cdots, a_r, b_1, b_2, \cdots, b_r)}{\pi(\mathbf{p}_{r-1};\, a_1, a_2, \cdots, a_{r-1}, b_1, b_2, \cdots, b_{r-1})} \tag{7.14} \]
\[ = \frac{\Gamma(a_r + b_r)}{\Gamma(a_r)\Gamma(b_r)} \cdot \frac{p_r^{a_r - 1} \big(1 - \sum_{i=1}^{r} p_i\big)^{b_r - 1}}{\big(1 - \sum_{i=1}^{r-1} p_i\big)^{a_r + b_r - 1}}, \tag{7.15} \]
which are scaled beta distributions over the intervals $(0, 1 - \sum_{i=1}^{r-1} p_i)$. They are also known as three parameter beta distributions, i.e.
\[ (p_r \mid p_1, p_2, \cdots, p_{r-1}) \sim \text{beta}\Big(a_r, b_r, 1 - \sum_{i=1}^{r-1} p_i\Big), \quad \text{for } r = 2, 3, \cdots, k - 1. \]
As in Section 6.3.4, applying the transformation
\[ p_r^* = \begin{cases} p_1, & \text{for } r = 1, \\ \dfrac{p_r}{1 - \sum_{i=1}^{r-1} p_i}, & \text{for } r = 2, 3, \cdots, k - 1, \end{cases} \]
gives
\[ p_r^* \sim \text{beta}(a_r, b_r), \quad \forall\, r = 1, 2, \cdots, k - 1. \tag{7.16} \]
7.2.2
Assessment tasks
The elicitation process given before in the conditional approach for the standard Dirichlet case in Section 6.3.4 is still valid here. The main difference in the current case is that the generalized Dirichlet hyperparameters $(a_1, a_2, \cdots, a_{k-1}, b_1, b_2, \cdots, b_{k-1})$ are exactly the parameters $(a_r, b_r)$ of the beta distribution of $p_r^*$ in (7.16), for $r = 1, 2, \cdots, k - 1$. Hence, the generalized Dirichlet hyperparameters are directly estimated using beta parameters that can be elicited using conditional assessments as in Section 6.3.4. Note that no compromise or averaging is needed here, since the total number of hyperparameters that are elicited is equal to the number of hyperparameters in the generalized Dirichlet distribution, namely, $2(k - 1)$. This extended number of parameters does not eliminate the benefits of feedback, but it gives the generalized Dirichlet distribution a more flexible structure than the standard one.
Positive correlations can occur in this generalized case, as discussed before, making it more useful and practical in quantifying an expert’s opinion. However, Aitchison (1986) criticized the class of generalized Dirichlet distributions as being intractable, particularly with respect to statistical analysis. He also noted that, despite having a more general dependence structure than the standard Dirichlet, the class still retains a strong independence structure.
7.2.3
Marginal quartiles of the generalized Dirichlet distribution
It is always useful to give feedback to the expert based on her elicited hyperparameters. This feedback makes the elicited quantities a better representation of the expert’s opinion. For the generalized Dirichlet prior, where the assessed probability quartiles are all conditional except for the first category, it is helpful to inform the expert of the corresponding marginal probability quartiles of each category. She should be given the opportunity to modify them so that they are closer to her opinion, and the elicitation method should change the hyperparameter vector according to these modifications.
Unfortunately, marginal distributions of the generalized Dirichlet are not directly of the beta type. However, we make use of the independent beta random variables given in (7.5) to approximate the distribution of each $p_j$, $j = 1, 2, \cdots, k$, as a standard beta distribution. Detail is given in the remainder of this section.
An approximate distribution for the product of independent beta variates
Fan (1991) introduced a beta approximation to the product of a finite number of independent beta random variables. His method is described in Johnson et al. (1994) and Gupta and Nadarajah (2004), who report favorably on the method based on Fan's comparison of the first ten approximate and exact moments. The method equates the first two moments of the approximate beta distribution to the corresponding product moments of the independent beta random variables.

In what follows, we use the method of Fan (1991) to derive the marginal approximate beta distribution of each $p_j$, $j = 1, 2, \ldots, k$, from which the marginal quartiles are computed. The method can also be inverted to give a new elicited hyperparameter vector of the generalized Dirichlet distribution, based on the marginal quartiles, if any have been modified by the expert.
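For $j = 1, 2, \ldots, k$, using the method of Fan (1991), the distribution of each $p_j$ can be approximated by

$$p_j \sim \mathrm{beta}(\alpha_j, \beta_j), \tag{7.17}$$

where

$$\alpha_j = \frac{S_j (S_j - T_j)}{T_j - S_j^2}, \qquad \text{and} \qquad \beta_j = \frac{(1 - S_j)(S_j - T_j)}{T_j - S_j^2},$$

with $S_j$ and $T_j$ as given by equations (7.8) and (7.9), respectively.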
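The moment matching behind (7.17) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `fan_beta_params` is hypothetical, and it assumes that $S_j$ and $T_j$ are the first and second moments $E(p_j)$ and $E(p_j^2)$, which is how (7.18) and (7.19) use them.

```python
def fan_beta_params(s, t):
    """Return (alpha, beta) of the beta distribution whose first moment is s
    and whose second moment is t (hypothetical helper, moment matching only)."""
    variance = t - s * s
    if not (0.0 < s < 1.0) or variance <= 0.0 or s - t <= 0.0:
        raise ValueError("s and t are not valid beta moments")
    common = (s - t) / variance          # equals alpha + beta
    return s * common, (1.0 - s) * common


# Round trip on an exact beta(2, 5): its moments should give back (2, 5).
alpha, beta = 2.0, 5.0
s = alpha / (alpha + beta)
t = alpha * (alpha + 1.0) / ((alpha + beta) * (alpha + beta + 1.0))
alpha_hat, beta_hat = fan_beta_params(s, t)
```

For an exact beta distribution the mapping is lossless; for a product of independent beta variates it gives the approximating beta distribution only.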
Feedback
The three quartiles of the distributions in (7.17) are numerically computed and presented to the expert. She is invited to modify some or all of them as she thinks necessary, in which case the modified quartiles are converted, in the same manner as proposed in Section 6.2, to give modified pairs of parameters $(\alpha_j^*, \beta_j^*)$.
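The modified two moments of each $p_j$, for $j = 1, 2, \ldots, k$, are computed as follows:

$$S_j^* = \frac{\alpha_j^*}{\alpha_j^* + \beta_j^*}, \tag{7.18}$$

and

$$T_j^* = \frac{\alpha_j^* (\alpha_j^* + 1)}{(\alpha_j^* + \beta_j^*)(\alpha_j^* + \beta_j^* + 1)}. \tag{7.19}$$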
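After obtaining $S_j^*$ and $T_j^*$, they are transformed into normalized values $S_j^{**}$ and $T_j^{**}$, respectively, such that $\sum_{j=1}^{k} S_j^{**} = 1$.

In the manner of (7.8) and (7.9), we can write the two modified moments of each $Z_j$, denoted by $u_j = E^*(Z_j)$ and $w_j = E^*(Z_j^2)$, for $j = 1, 2, \ldots, k-1$, as

$$u_j = \frac{a_j^*}{a_j^* + b_j^*} =
\begin{cases}
S_1^{**}, & \text{for } j = 1, \\[4pt]
\dfrac{S_j^{**}}{\prod_{l=1}^{j-1} \dfrac{b_l^*}{a_l^* + b_l^*}}, & \text{for } j = 2, \ldots, k-1,
\end{cases}$$

and

$$w_j = \frac{a_j^* (a_j^* + 1)}{(a_j^* + b_j^*)(a_j^* + b_j^* + 1)} =
\begin{cases}
T_1^{**}, & \text{for } j = 1, \\[4pt]
\dfrac{T_j^{**}}{\prod_{l=1}^{j-1} \dfrac{b_l^* (b_l^* + 1)}{(a_l^* + b_l^*)(a_l^* + b_l^* + 1)}}, & \text{for } j = 2, \ldots, k-1.
\end{cases}$$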
The above system of equations can be recursively solved for the modified hyperparameters of the generalized Dirichlet distribution, $a_j^*$ and $b_j^*$, for $j = 1, 2, \ldots, k-1$, to give

$$a_j^* = \frac{u_j (u_j - w_j)}{w_j - u_j^2}, \qquad b_j^* = \frac{(1 - u_j)(u_j - w_j)}{w_j - u_j^2}.$$

These modified hyperparameters of the generalized Dirichlet distribution represent the final output of the method.
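The recursion can be illustrated with a short Python sketch. It rests on assumptions that are not stated in this section: the marginal moments are taken to be the usual product moments of independent beta variates, $p_j = Z_j \prod_{l<j}(1 - Z_l)$ with $Z_l \sim \mathrm{beta}(a_l, b_l)$, and both function names are hypothetical. The forward function is included only so that the inversion can be checked by a round trip.

```python
def gd_marginal_moments(a, b):
    """First and second moments of p_1, ..., p_{k-1} for a generalized
    Dirichlet with hyperparameters a, b (assumed product-moment form)."""
    s, t = [], []
    prod1 = prod2 = 1.0   # running products of E(1 - Z_l) and E((1 - Z_l)^2)
    for aj, bj in zip(a, b):
        n = aj + bj
        s.append(prod1 * aj / n)
        t.append(prod2 * aj * (aj + 1.0) / (n * (n + 1.0)))
        prod1 *= bj / n
        prod2 *= bj * (bj + 1.0) / (n * (n + 1.0))
    return s, t


def gd_from_marginal_moments(s, t):
    """Recursively recover (a_j, b_j) from the marginal moments of p_j."""
    a, b = [], []
    prod1 = prod2 = 1.0
    for sj, tj in zip(s, t):
        u = sj / prod1                    # first moment of Z_j
        w = tj / prod2                    # second moment of Z_j
        common = (u - w) / (w - u * u)    # equals a_j + b_j
        aj, bj = u * common, (1.0 - u) * common
        a.append(aj)
        b.append(bj)
        n = aj + bj
        prod1 *= bj / n
        prod2 *= bj * (bj + 1.0) / (n * (n + 1.0))
    return a, b


# Round trip with illustrative hyperparameters.
a0, b0 = [19.29, 4.41, 0.91], [10.23, 3.15, 0.54]
s0, t0 = gd_marginal_moments(a0, b0)
a_hat, b_hat = gd_from_marginal_moments(s0, t0)
```

Each step uses only hyperparameters already recovered at earlier steps, which is why the system can be solved one category at a time.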
7.3 Example: Obesity misclassification
Obesity and being overweight are serious public health problems whose adverse consequences can include diabetes, high blood pressure and cardiovascular disease. Obesity can be measured using the Body Mass Index (BMI) of adults, which is defined as body weight (in kilograms) divided by body height (in meters) squared. Obesity is defined as a BMI of over 30 and overweight is a BMI over 25. Looking at the situation in Europe, it is estimated that 50% of adults between 35 and 65 years of age are overweight, of whom 10-25% are obese.

Malta reportedly has one of the highest levels of overweight people in Europe. According to the European Health Interview Survey (EHIS), November 2011, Malta recorded the highest proportion of obese men (24.7%) and women (21.1%) amongst the 19 EU Member States for which data are available. The EHIS reports 36.3% of adults in Malta being overweight and a further 22.3% being obese. Obesity in Malta is indeed a major public health challenge and it is targeted as a priority action in Malta's Strategy for Sustainable Development.
In interview surveys, the heights and weights of participating subjects are not measured. Self-reported values of these variables are normally used instead. However, self-reported values are less precise and have no guarantee of accuracy, especially when they are converted into BMI (Shields et al. (2008)). Indeed, the prevalence of overweight and obesity is generally underestimated when calculated from self-reported data as compared with measured data. Adults have been shown to systematically overestimate their height and underestimate their weight. The extent of weight underreporting increases with increasing measured weight (Shields et al. (2008)). As a result, significant misclassification occurs when BMI categories are estimated from self-reported data. Correcting interview data for this misclassification bias is desirable, but data to estimate the bias are lacking. Instead, quantified expert opinion might be used to estimate the bias.
One aspect of the obesity misclassification problem in Malta was formulated in a multinomial model as follows. It relates to Maltese adults (16+) who self-report themselves as having a normal weight (18.5<BMI<25). Their actual clinical BMI classification may fall in one of the following multinomial categories: Underweight (BMI<18.5), Normal (18.5<BMI<25), Overweight (25<BMI<30) or Obese (BMI>30). A health information expert, Dr. Neville Calleja, used our PEGS-Dirichlet elicitation software to quantify his opinion about this model, first giving two separate sets of assessments, each of which determines the parameters of a Dirichlet distribution, so that his opinion could be represented by a Dirichlet prior distribution. The second set of assessments was also used to determine the parameters of a generalized Dirichlet distribution, so that his opinion could be modelled by a more flexible prior distribution. Dr. Calleja has been responsible for all health surveys in Malta for the last 10 years. Currently, he is the director of the Department of Health Information and Research in the Ministry of Health, the Elderly and Community Care, Malta. His department leads the collection, analysis and delivery of health related information in Malta.
To elicit a Dirichlet prior based on unconditional beta marginals, the expert ordered the four categories as Normal, Overweight, Obese, Underweight. His unconditional median assessments for these categories were 0.65, 0.20, 0.10, 0.04, respectively. Then he gave his unconditional lower (upper) quartile assessments as 0.55, 0.15, 0.06, 0.02 (0.70, 0.30, 0.14, 0.07), respectively. See Figure 7.1. The four beta marginals were then reconciled into a Dirichlet distribution in three different ways: direct normalizing and averaging, least-squares optimization, and weighted least-squares. Since the expert's assessed medians nearly sum to one, the three different ways gave sets of reconciled quartiles that were very close to each other. He selected marginal medians and quartiles that were computed by direct normalizing and averaging. The elicited hyperparameters of the Dirichlet prior distribution were obtained as $a_1 = 13.23$, $a_2 = 4.71$, $a_3 = 2.18$, $a_4 = 1.08$, with their sum $N = 21.20$.
Figure 7.1: Medians and quartiles assessments (software screen "Eliciting quartiles of the probabilities of each category"; horizontal axis: Categories)
Based on conditional beta distributions, the expert quantified his opinion again to elicit another Dirichlet prior for the same problem, but using a different elicitation method. His three quartile assessments of the first category were 0.60, 0.65, 0.72. Then, he was asked to assume that the probability value of the first category is exactly 0.65; given this information he gave his three quartiles for the second category as 0.17, 0.20, 0.25. Finally, conditioning on the probabilities of the first two categories being 0.65, 0.20, he gave the three quartiles of the third category to be 0.07, 0.09, 0.15. The three quartiles of the fourth category were automatically computed and shown to the expert as 0.01, 0.06, 0.08.
Figure 7.2: Assessing conditional medians (software screen "Eliciting median probability of category (Obese)"; vertical axis from 0.00 to 0.95; horizontal axis: Categories, showing Normal, Overweight, Obese, Underweight)
Figure 7.2 is a screen shot after the expert had assessed his median for the third category. The median probability of the third category is in blue (it was assessed), while the fourth median is in yellow (it was calculated from other assessments). Figure 7.3 shows the conditional quartiles that the expert assessed for the third category and the conditional quartiles that were calculated for the fourth category. The elicited hyperparameters of the Dirichlet distribution using this method were $a_1 = 19.91$, $a_2 = 5.00$, $a_3 = 1.11$, $a_4 = 0.65$, with a sum of $N = 26.67$.
Figure 7.3: Assessing conditional quartiles
On finishing the elicitation process using conditional assessments, the expert was shown a software message offering him the possibility of using the same conditional assessments to elicit a generalized Dirichlet distribution. The expert chose to elicit this more general distribution as well. The following hyperparameters of the generalized Dirichlet prior distribution were elicited: $a_1 = 19.29$, $a_2 = 4.41$, $a_3 = 0.91$, $b_1 = 10.23$, $b_2 = 3.15$, $b_3 = 0.54$.
To compare the three prior distributions elicited in this example, expected values and variances of multinomial probabilities were computed for each distribution as shown in Table 7.1. The means and variances of the Dirichlet distribution were computed using the elicited values of the hyperparameters in formulae (6.24) and (6.25), respectively. The same was done for the generalized Dirichlet using formulae (7.6) to (7.10).
Table 7.1: Probability assessments for different elicited priors

        Marginal assessments        Conditional assessments     Generalized Dirichlet
        Median   E(p_i)   V(p_i)    Median   E(p_i)   V(p_i)    E(p_i)   V(p_i)
p_1     0.65     0.624    0.012     0.65     0.746    0.007     0.653    0.008
p_2     0.20     0.222    0.008     0.20     0.187    0.006     0.202    0.006
p_3     0.10     0.103    0.004     0.09     0.042    0.001     0.091    0.004
p_4     0.04     0.051    0.002     0.06     0.024    0.001     0.054    0.003
It can be seen from Table 7.1 that the first Dirichlet prior, which was elicited using marginal assessments, and the generalized Dirichlet prior both gave expected values of the multinomial probabilities that are close to the assessed medians. The second Dirichlet prior, which was elicited using conditional assessments, gave a relatively higher mean value for the first probability than its assessed median, combined with a reduction in the expected values of all other probabilities. This is a little surprising, as the generalized Dirichlet utilized the conditional assessments that give the second Dirichlet distribution, yet its expected values are similar to those from the method that uses marginal assessments. The two elicited values of the hyperparameter $N$ were relatively close to each other, 21.20 and 26.67, in the two elicited standard Dirichlet priors. (There is no single value for $N$ with the generalized Dirichlet.) Moreover, variances of the multinomial probabilities were all small and also close to each other in the three elicited prior distributions.
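As a check on the expected values in Table 7.1, the sketch below recomputes them from the reported hyperparameters. It assumes the standard Dirichlet mean $a_i / N$ and the usual generalized Dirichlet mean built from independent beta variates; formulae (6.24) and (7.6) to (7.10) are not reproduced in this section, so these forms are assumptions, and the function names are hypothetical. Because the hyperparameters are reported to two decimals, agreement to about 0.001 is all that should be expected.

```python
def dirichlet_means(alpha):
    """Expected values a_i / N of a standard Dirichlet distribution."""
    total = sum(alpha)
    return [a / total for a in alpha]


def generalized_dirichlet_means(a, b):
    """Expected values of p_1, ..., p_k for a generalized Dirichlet,
    assuming p_j = Z_j * prod_{l<j} (1 - Z_l) with Z_l ~ beta(a_l, b_l)."""
    means, remaining = [], 1.0
    for aj, bj in zip(a, b):
        means.append(remaining * aj / (aj + bj))
        remaining *= bj / (aj + bj)
    means.append(remaining)   # the last category takes what is left
    return means


marginal = dirichlet_means([13.23, 4.71, 2.18, 1.08])
conditional = dirichlet_means([19.91, 5.00, 1.11, 0.65])
generalized = generalized_dirichlet_means([19.29, 4.41, 0.91],
                                          [10.23, 3.15, 0.54])
```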
After eliciting each of the three Dirichlet prior distributions discussed above, the software showed the suggested marginal medians and quartiles of each pair to the expert. He accepted the suggested marginal quartile values, saying that the suggested values were very close to his initial beliefs. Keeping the unit sum constraint in his mind, the expert remarked that assessing conditional medians and quartiles was easier than assessing marginal quartiles. He stated that he could not think about marginal assessments for each category independently of the others. However, he noted at the same time that the elicited generalized Dirichlet distribution may be the most flexible prior of the three.
7.4 Constructing a copula function for the prior distribution
Using the marginal elicitation process given before, we obtain a number of marginal beta distributions. Rather than assume these stem from a Dirichlet distribution, we would like to allow a more flexible dependence structure via their joint distribution, with the aim of better representing the expert's opinion. A flexible tool for this task is given by the copula function, which allows us to choose the marginal distributions independently from the dependence structure between them. The latter structure is given by the copula.

A copula is best described as a multivariate distribution function that is used to bind together marginal distribution functions so as to form a joint distribution. The copula parameterizes the dependence between the marginals, while the parameters of each marginal distribution function can be assessed separately. See, for example, Joe (1997), Nelsen (1999) and Kurowicka and Cooke (2006).
There are many types and classes of copula functions, but the most intuitive ones use inverted distribution functions as arguments in known multivariate distributions [Nelsen (1999)]. The general inversion form of a copula function $C$ is given by

$$C[G_1(x_1), \ldots, G_k(x_k)] = F_{(1,\ldots,k)}\left\{F_1^{-1}[G_1(x_1)], \ldots, F_k^{-1}[G_k(x_k)]\right\},$$

where $G_i$ are the known marginal distribution functions, and $F_{(1,\ldots,k)}$ and $F_i$ ($i = 1, \ldots, k$) are the assumed joint and marginal distribution functions, respectively. The copula function $C$ works as the cdf of the multivariate distribution that "couples" the given marginal distributions.
7.4.1 Gaussian copula function
The best-known example of the inversion method is the Gaussian copula [Clemen and Reilly (1999)], which is given by

$$C[G_1(x_1), \ldots, G_k(x_k)] = \Phi_{k,R}\left\{\Phi^{-1}[G_1(x_1)], \ldots, \Phi^{-1}[G_k(x_k)]\right\}. \tag{7.20}$$

Here $\Phi_{k,R}$ is the cdf of a $k$-variate normal distribution with zero means, unit variances, and a correlation matrix $R$ that reflects the desired dependence structure. $\Phi$ is the marginal standard univariate normal cdf.
Since $\Phi_{k,R}$ and $\Phi$ are differentiable, the Gaussian copula density function can be simply obtained by differentiating (7.20) with respect to $x_i$, $i = 1, 2, \ldots, k$, giving

$$f(x_1, x_2, \ldots, x_k \mid R) = \frac{g_1(x_1) \times \cdots \times g_k(x_k)}{|R|^{1/2}} \exp\left\{-\tfrac{1}{2}\, \underline{Y}_k' \left(R^{-1} - I_k\right) \underline{Y}_k\right\}, \tag{7.21}$$

where

$$\underline{Y}_k' = \left(\Phi^{-1}[G_1(x_1)], \Phi^{-1}[G_2(x_2)], \ldots, \Phi^{-1}[G_k(x_k)]\right),$$

$g_i(\cdot)$ is the density function corresponding to $G_i(\cdot)$, $i = 1, 2, \ldots, k$, and $I_k$ is the identity matrix of order $k$.
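To make the role of the exponential term in (7.21) concrete, the following sketch evaluates the dependence factor $|R|^{-1/2} \exp\{-\tfrac{1}{2} \underline{Y}'(R^{-1} - I)\underline{Y}\}$ for the bivariate case only, where $R$ has a single off-diagonal element $\rho$. The function name and arguments are hypothetical; the inputs `u1` and `u2` stand for the marginal cdf values $G_1(x_1)$ and $G_2(x_2)$. Multiplying the factor by the marginal densities $g_1(x_1) g_2(x_2)$ would give the joint density.

```python
from math import exp, sqrt
from statistics import NormalDist


def gaussian_copula_factor(u1, u2, rho):
    """Dependence factor of the bivariate Gaussian copula density,
    with u_i = G_i(x_i) in (0, 1) and -1 < rho < 1."""
    std_normal = NormalDist()
    y1, y2 = std_normal.inv_cdf(u1), std_normal.inv_cdf(u2)
    det = 1.0 - rho * rho                        # |R| for a 2 x 2 correlation matrix
    # y' R^{-1} y minus y' I y
    quad = (y1 * y1 - 2.0 * rho * y1 * y2 + y2 * y2) / det - (y1 * y1 + y2 * y2)
    return exp(-0.5 * quad) / sqrt(det)


independent = gaussian_copula_factor(0.3, 0.8, 0.0)   # no dependence
at_medians = gaussian_copula_factor(0.5, 0.5, 0.5)    # both marginals at their medians
```

With $\rho = 0$ the factor equals one, so the joint density reduces to the product of the marginals; the dependence enters only through $R$.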
To construct a Gaussian copula function in the case of a multinomial model, we can think of each marginal distribution as a beta distribution whose two hyperparameters have been assessed. Then we can construct a Gaussian copula function for the multivariate distribution of $p_1, p_2, \ldots, p_{k-1}$. According to the unit sum constraint, the remaining variable, $p_k = 1 - \sum_{i=1}^{k-1} p_i$, can be treated as a redundant variable that may be removed from the multivariate distribution to avoid singularity problems. Using the Gaussian copula function, the dependence structure of the multivariate distribution will have high flexibility rather than the limited dependence structure imposed by the Dirichlet distribution.

The Gaussian copula function is indexed by the correlation matrix $R$, which needs to be elicited effectively and must be a positive-definite matrix. In what follows we introduce a method, inspired by Kadane et al. (1980), to elicit a correlation matrix $R$ that is sure to be positive-definite.
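Let $G_i(p_i)$ be the cdf of the beta distribution of $p_i$ with hyperparameters $\alpha_i$ and $\beta_i$, $i = 1, 2, \ldots, k-1$, and assume that the joint density of $p_1, p_2, \ldots, p_{k-1}$ is given by a Gaussian copula density, such that

$$f(p_1, p_2, \ldots, p_{k-1} \mid R) = \frac{g_1(p_1) \times \cdots \times g_{k-1}(p_{k-1})}{|R|^{1/2}} \exp\left\{-\tfrac{1}{2}\, \underline{Y}_{k-1}' \left(R^{-1} - I_{k-1}\right) \underline{Y}_{k-1}\right\}, \tag{7.22}$$

where

$$\underline{Y}_{k-1}' = \left(\Phi^{-1}[G_1(p_1)], \Phi^{-1}[G_2(p_2)], \ldots, \Phi^{-1}[G_{k-1}(p_{k-1})]\right),$$

and $g_i(\cdot)$ is the beta density of $p_i$, $i = 1, 2, \ldots, k-1$.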
Note that the marginal distributions of this joint density are still the desired beta marginals. Since the hyperparameters of each beta distribution of $p_i$, $i = 1, 2, \ldots, k-1$, have already been elicited, the prior distribution is totally known except for the matrix $R$. Although the above density is not multivariate normal for $p_1, p_2, \ldots, p_{k-1}$ and the matrix $R$ is not their correlation matrix, we can still use the multivariate normal properties to elicit a positive-definite matrix $R$ by considering the following normalizing transformations,

$$Y_i = \Phi^{-1}[G_i(p_i)], \qquad i = 1, 2, \ldots, k. \tag{7.23}$$

We should stress that with this copula function, the marginal distributions of the $p_i$ are beta distributions that can be fixed independently of $R$. Thus the ability to specify $R$ gives added flexibility. The aim is to choose $R$ so as to model the expert's opinion about the dependence between the $p_i$.
According to the main assumption of the Gaussian copula construction, and from (7.23), the vector $\underline{Y}_{k-1}' = (Y_1, Y_2, \ldots, Y_{k-1})$ has a multivariate normal distribution with zero means, unit variances and a correlation matrix $R$, i.e.

$$\underline{Y}_{k-1} \sim \mathrm{MVN}(\underline{0}, R).$$

Following this assumption, together with the unit sum constraint on the elements of $\underline{p}$, the full vector $\underline{Y}' = (Y_1, Y_2, \ldots, Y_k)$ has what is known as a singular multivariate normal distribution, which will be discussed in more detail in the next chapter. However, for the rest of this chapter we will be interested in eliciting a non-singular correlation matrix $R$ for the Gaussian copula function only for $p_1, p_2, \ldots, p_{k-1}$.

Keep in mind that the Pearson correlation coefficients, as elements of $R$, are not transformation respecting, i.e. they are not invariant even under strictly monotone increasing transformations as in (7.23). We therefore do not attempt to elicit any correlations between the elements of $\underline{p}$. Even if a correlation matrix for $\underline{p}$ has been elicited, it may be of no use in estimating $R$, as no explicit relationship between the two matrices is available. Moreover, the density function in (7.22) is indexed by $R$, the correlation matrix of $\underline{Y}_{k-1}$, not the correlation matrix of $\underline{p}$.
An alternative method of estimating $R$ that has been proposed in the literature was reviewed in Chapter 2. In that approach, a non-parametric measure of correlation that respects transformations, such as Kendall's $\tau$ or Spearman's $\rho$, is computed for $\underline{p}$. The monotonicity of a transformation like (7.23) is then used to impose the same correlations on $\underline{Y}_{k-1}$. Pearson's correlations are calculated using approximate relations between different correlation coefficients for the normal distribution. For more details see, for example, Clemen and Reilly (1999), Palomo et al. (2007) or Daneshkhah and Oakley (2010).

In our proposed approach, the matrix $R$ is elicited as a covariance or correlation matrix of a multivariate normal random vector $\underline{Y}_{k-1}$. However, we still utilize the monotone increasing property of the transformations in (7.23). We may assess conditional quartiles of $\underline{p}$, then transform them into those of $\underline{Y}$ using (7.23). Correlation coefficients between the elements of $\underline{Y}_{k-1}$ can then be estimated using their conditional quartiles and utilizing the properties of the multivariate normal distribution. This is described in Sections 7.4.2 and 7.4.3.

Although the elicitation method of Kadane et al. (1980) was designed to elicit the covariance matrix of a multivariate t-distribution as a conjugate prior for the hyperparameters of a normal multiple linear regression model, their method can be useful in a variety of multivariate elicitation problems that require eliciting positive-definite matrices [Garthwaite et al. (2005)]. The method is modified here to elicit the correlation matrix $R$ of the Gaussian copula function.
7.4.2 Assessment tasks
Since the transformations in (7.23) are strictly monotonic increasing from $\underline{p}$ to $\underline{Y}$, we can establish a one-to-one correspondence between medians and quartiles of these two vectors. The required assessments are as follows.
Assessing initial medians and quartiles
1. To elicit each marginal beta distribution, the expert has already assessed a lower quartile, a median and an upper quartile for $p_i$, $i = 1, 2, \ldots, k$, say $L^*_{i,0}$, $m^*_{i,0}$ and $U^*_{i,0}$, respectively. The method proposed in Section 6.2 can be used to determine the two parameters $\alpha_i$ and $\beta_i$ of each marginal beta distribution, for $i = 1, 2, \ldots, k$.
2. To help the expert assess the medians and quartiles in (1), the PEGS-Copula software presents an interactive graph showing the pdf curve of the beta distribution of $p_r$, for $r = 1, 2, \ldots, k$. The expert is able to change her assessed quartiles of $p_r$ until its pdf curve represents her opinion to her satisfaction, see Figure 6.1.
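3. To attain the unit sum constraint, the mean values of the elicited beta marginals must sum to one. The elicited parameters $\alpha_i$ and $\beta_i$ are thus modified to fulfill this condition, as follows.
   The mean values $\mu_i$ are computed as
   $$\mu_i = \frac{\alpha_i}{\alpha_i + \beta_i}, \qquad \text{for } i = 1, 2, \ldots, k.$$
   The normalized mean values $\mu_i^*$ are given by
   $$\mu_i^* = \frac{\mu_i}{\sum_{j=1}^{k} \mu_j}, \qquad i = 1, 2, \ldots, k. \tag{7.24}$$
   We keep the variances fixed as
   $$\sigma_i^2 = \frac{\alpha_i \beta_i}{(\alpha_i + \beta_i)^2 (\alpha_i + \beta_i + 1)}, \qquad \text{for } i = 1, 2, \ldots, k. \tag{7.25}$$
   Equations (7.24) and (7.25) give the modified set of parameters $\alpha_i^*$ and $\beta_i^*$, for $i = 1, 2, \ldots, k$:
   $$\alpha_i^* = \mu_i^* \left[\frac{\mu_i^* (1 - \mu_i^*)}{\sigma_i^2} - 1\right], \qquad \beta_i^* = (1 - \mu_i^*) \left[\frac{\mu_i^* (1 - \mu_i^*)}{\sigma_i^2} - 1\right].$$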
4. Before going further, the modified parameters of each marginal beta distribution are used to compute the corresponding quartiles numerically. These quartiles are presented as feedback to the expert, who is still able to change some or all of them, in which case the process is repeated until the modified sets of quartiles are accepted by the expert.
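The adjustment in step 3 can be sketched as follows. The function names and the four beta parameter pairs are hypothetical and purely illustrative; the sketch rescales the means as in (7.24), keeps the variances of (7.25), and converts back to beta parameters. It also guards against the case where a fixed variance is too large to be compatible with the normalized mean, a situation the text does not discuss.

```python
def beta_variance(a, b):
    """Variance of a beta(a, b) distribution, as in (7.25)."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))


def normalize_beta_means(params):
    """params: list of (alpha_i, beta_i). Returns modified pairs whose means
    sum to one while each variance is unchanged."""
    means = [a / (a + b) for a, b in params]
    total = sum(means)
    modified = []
    for (a, b), mu in zip(params, means):
        mu_star = mu / total                                         # (7.24)
        scale = mu_star * (1.0 - mu_star) / beta_variance(a, b) - 1.0
        if scale <= 0.0:
            raise ValueError("variance incompatible with the normalized mean")
        modified.append((mu_star * scale, (1.0 - mu_star) * scale))
    return modified


# Hypothetical marginal beta parameters whose means sum to slightly more than one.
params = [(13.0, 7.0), (4.0, 14.0), (2.0, 16.0), (1.0, 20.0)]
modified = normalize_beta_means(params)
```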
Assessing conditional quartiles
5. To estimate the correlation matrix $R$, the expert is asked to assume that $p_1 = m^*_{1,0}$ and gives a lower quartile $L_2^*$ and an upper quartile $U_2^*$ for $p_2$. For each remaining $p_j$, $j = 3, \ldots, k-1$, she assesses the two quartiles $L_j^*$ and $U_j^*$ given that $p_1 = m^*_{1,0}$, $p_2 = m^*_{2,0}$, $\ldots$, $p_{j-1} = m^*_{j-1,0}$. Figure 7.4 shows the process of assessing conditional quartiles, where the expert has already assessed the lower quartile of the third category, conditional on the median values of the first two categories, which are shown by the red bars.

Figure 7.4: Assessing conditional quartiles for copula elicitation
6. The lower (upper) quartile $L_k^*$ ($U_k^*$) of $p_k$ will be automatically shown to the expert once she assesses the upper (lower) quartile $U^*_{k-1}$ ($L^*_{k-1}$) of $p_{k-1}$. The two quartiles $L_k^*$ and $U_k^*$ are shown to the expert as a guide to help her choose $L^*_{k-1}$ and $U^*_{k-1}$. See Figure 7.4, where the software has shown the upper quartile of the fourth category after the expert assessed the lower quartile of the third category. In fact, $L_k^*$ ($U_k^*$) is the lower (upper) quartile of $(p_k \mid p_1 = m^*_{1,0}, \ldots, p_{k-2} = m^*_{k-2,0})$ instead of $(p_k \mid p_1 = m^*_{1,0}, \ldots, p_{k-1} = m^*_{k-1,0})$, as the two quartiles in the latter case should be just equal to $m^*_{k,0}$, because of the unit sum constraint.
Assessing conditional medians
7. Here we assume that the median of $p_1$ has been changed from $m^*_{1,0}$ into $m^*_{1,1} = m^*_{1,0} + \eta_1^*$. Given this information, the expert will be asked to change her previous medians $m^*_{j,0}$ of each $p_j$ to be $m^*_{j,1}$. We put
   $$m^*_{j,1} = m^*_{j,0} + \theta^*_{j,1}, \qquad \text{for } j = 2, \ldots, k. \tag{7.26}$$
8. In each successive step $i$, for $i = 2, 3, \ldots, k-2$, the expert will be asked to suppose that the median values of $p_1, p_2, \ldots, p_i$ are $m^*_{1,1} = m^*_{1,0} + \eta_1^*$, $m^*_{2,2} = m^*_{2,1} + \eta_2^*$, $\ldots$, $m^*_{i,i} = m^*_{i,i-1} + \eta_i^*$, respectively. Given this information, she will be asked to update her assessed medians from the most recent previous step, $m^*_{i+1,i-1}, m^*_{i+2,i-1}, \ldots, m^*_{k,i-1}$. The updated assessments are $m^*_{i+1,i} = m^*_{i+1,i-1} + \theta^*_{i+1,i}$, $m^*_{i+2,i} = m^*_{i+2,i-1} + \theta^*_{i+2,i}$, $\ldots$, $m^*_{k,i} = m^*_{k,i-1} + \theta^*_{k,i}$, respectively. In other words, for $i = 1, 2, \ldots, k-2$, $j = i+1, i+2, \ldots, k$, we can write
   $$m^*_{j,i} = m^*_{j,i-1} + \theta^*_{j,i} \text{ is the median of } (p_j \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i}). \tag{7.27}$$
   On an interactive graph produced by the PEGS-Copula software, see Figure 7.5, the conditioning set of median values are shown as red bars. The conditional medians of the remaining categories at the most recent previous step are shown as black lines. The expert is asked to assess how her new median values will change based on the new conditioning set.
Figure 7.5: Assessing conditional medians for copula elicitation
9. For mathematical coherence, as will be proved in Lemma 7.1, we require
   $$\sum_{j=1}^{i} m^*_{j,j} + \sum_{j=i+1}^{k} m^*_{j,i} = 1, \qquad i = 1, 2, \ldots, k-2.$$
   The expert has the option of changing her initial set of assessments $m'_{i+1,i}, m'_{i+2,i}, \ldots, m'_{k,i}$ until she feels that the suggested normalized set $m^*_{i+1,i}, m^*_{i+2,i}, \ldots, m^*_{k,i}$ gives an adequate representation of her opinion. The software suggests each normalized conditional median $m^*_{j,i}$, given by yellow bars in Figure 7.6, as
   $$m^*_{j,i} = m'_{j,i}\, \frac{1 - \sum_{r=1}^{i} m^*_{r,r}}{\sum_{r=i+1}^{k} m'_{r,i}}, \qquad \text{for } i = 1, \ldots, k-2, \quad j = i+1, \ldots, k.$$
10. The current assessment task stops at step $k-2$, as we do not ask for any conditional assessments for the last remaining category $p_k$. Since the condition of summing to one should always be fulfilled, conditioning on specific values of all $p_1, p_2, \ldots, p_{k-1}$ gives a fixed value for $p_k$. In this case no upper or lower quartiles can be assessed for $p_k$, as mentioned before.

Figure 7.6: Software suggestions for conditional medians
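The normalization that the software applies to the expert's raw conditional medians in step 9 can be sketched as below. The function name and the numbers are hypothetical; `conditioning` holds the medians $m^*_{1,1}, \ldots, m^*_{i,i}$ that are being conditioned on and `raw` holds the expert's unnormalized assessments $m'_{i+1,i}, \ldots, m'_{k,i}$.

```python
def normalize_conditional_medians(conditioning, raw):
    """Rescale the raw conditional medians so that, together with the
    conditioning medians, they sum to one."""
    remaining = 1.0 - sum(conditioning)
    total = sum(raw)
    if remaining <= 0.0 or total <= 0.0:
        raise ValueError("no probability mass left to share out")
    return [m * remaining / total for m in raw]


# Hypothetical step i = 1 with k = 4: the first median is moved to 0.70 and the
# expert gives raw medians for the other three categories.
suggested = normalize_conditional_medians([0.70], [0.18, 0.09, 0.05])
```

The rescaling preserves the ratios between the expert's raw medians, which is one reason she may accept the suggested values without revising them.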
7.4.3 Eliciting a positive-definite correlation matrix R
The normalizing one-to-one functions in (7.23) are used to transform the assessed conditional quartiles of $\underline{p}$ into conditional quartiles of $\underline{Y}$, and hence into conditional expectations, variances and covariances of the multivariate normal variables. In particular, letting $M(X)$ denote the median function of the random variable $X$, we proceed as follows.

For $i = 1, 2, \ldots, k$, let $m_{i,0} = \Phi^{-1}[G_i(m^*_{i,0})]$.

For $i = 1, 2, \ldots, k-2$, and $j = i+1, \ldots, k-1$, let

$$m_{j,i} = E(Y_j \mid p_1 = m^*_{1,0} + \eta_1^*,\; p_2 = m^*_{2,1} + \eta_2^*,\; \ldots,\; p_i = m^*_{i,i-1} + \eta_i^*). \tag{7.28}$$

For $i = 1, 2, \ldots, k-2$, define $\eta_i$ by letting $\eta_i = Y_i - m_{i,i-1}$ when $p_i = m^*_{i,i-1} + \eta_i^*$. Then

$$m_{j,i} = E(Y_j \mid Y_1 = m_{1,0} + \eta_1,\; Y_2 = m_{2,1} + \eta_2,\; \ldots,\; Y_i = m_{i,i-1} + \eta_i), \tag{7.29}$$

and

$$\eta_i = \Phi^{-1}[G_i(m^*_{i,i-1} + \eta_i^*)] - \Phi^{-1}[G_i(m^*_{i,i-1})], \qquad \text{for } i = 1, 2, \ldots, k-2.$$

Analogous to $m^*_{i,i} = m^*_{i,i-1} + \eta_i^*$, define

$$m_{i,i} = m_{i,i-1} + \eta_i, \qquad \text{for } i = 1, 2, \ldots, k-2, \tag{7.30}$$

so that $Y_i = m_{i,i}$ when $p_i = m^*_{i,i}$.

For $i = 1, 2, \ldots, k-2$, and $j = i+1, \ldots, k-1$, analogous to $\theta^*_{j,i} = m^*_{j,i} - m^*_{j,i-1}$, define

$$\theta_{j,i} = m_{j,i} - m_{j,i-1},$$

so that

$$\theta_{j,i} = \Phi^{-1}[G_j(m^*_{j,i-1} + \theta^*_{j,i})] - \Phi^{-1}[G_j(m^*_{j,i-1})].$$

For $i = 1, 2, \ldots, k-2$, and $j = i+1, \ldots, k-1$, let

$$V_{j,i} = \mathrm{Var}(Y_j \mid Y_1 = m_{1,0},\; Y_2 = m_{2,0},\; \ldots,\; Y_i = m_{i,0}),$$

so that

$$V_{j,j-1} = \left(\frac{U_j - L_j}{1.349}\right)^2, \qquad \text{for } j = 2, 3, \ldots, k-1, \tag{7.31}$$

with

$$U_j = \Phi^{-1}[G_j(U_j^*)], \qquad L_j = \Phi^{-1}[G_j(L_j^*)].$$
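The conversion in (7.31) from assessed conditional quartiles of $p_j$ to a conditional variance of $Y_j$ can be sketched as follows. The function name is hypothetical, and the marginal cdf $G_j$ is passed in as a callable because the Python standard library has no beta cdf. The uniform cdf used in the example is the cdf of a beta(1, 1) distribution and serves only as a simple stand-in. The constant 1.349 is the interquartile range of a standard normal distribution.

```python
from statistics import NormalDist


def conditional_variance(lower, upper, cdf):
    """Variance of Y_j implied by the quartiles (lower, upper) of p_j, where
    cdf is the marginal distribution function G_j of p_j."""
    std_normal = NormalDist()
    l_y = std_normal.inv_cdf(cdf(lower))   # L_j on the normal scale
    u_y = std_normal.inv_cdf(cdf(upper))   # U_j on the normal scale
    return ((u_y - l_y) / 1.349) ** 2


def uniform_cdf(p):
    """cdf of beta(1, 1), used here only as an illustrative marginal."""
    return p


v_unconditional = conditional_variance(0.25, 0.75, uniform_cdf)
v_narrower = conditional_variance(0.40, 0.60, uniform_cdf)
```

When the assessed quartiles coincide with the marginal quartiles, the implied variance is essentially one; conditional quartiles that are closer together than the marginal ones give a conditional variance below one.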
Having defined the above quantities, we are ready now to state and prove the following lemma.

Lemma 7.1. Under the unit sum constraint of $\underline{p}$, and the multivariate normality of $\underline{Y}$,

$$\sum_{j=1}^{i} m^*_{j,j} + \sum_{j=i+1}^{k} m^*_{j,i} = 1, \qquad i = 1, 2, \ldots, k-2.$$

Proof

A property of conditional expectations of singular multivariate normal distributions is given by equation (8a.2.11) in Rao (2002, p. 522). Using this property, for $i = 1, 2, \ldots, k-2$, we have

$$E[Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}] = E[Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i},\; Y_{i+1} = E(Y_{i+1} \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}),\; \ldots,\; Y_{k-1} = E(Y_{k-1} \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i})],$$

then, from equations (7.29) and (7.30),

$$M(Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}) = M(Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i},\; Y_{i+1} = m_{i+1,i}, \ldots, Y_{k-1} = m_{k-1,i}).$$

Hence

$$M\{\Phi^{-1}[G_k(p_k)] \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i}\} = M\{\Phi^{-1}[G_k(p_k)] \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i},\; p_{i+1} = m^*_{i+1,i}, \ldots, p_{k-1} = m^*_{k-1,i}\},$$

which, utilizing equations (7.26) and (7.27), gives

$$\Phi^{-1}[G_k(m^*_{k,i})] = \Phi^{-1}\{G_k[M(p_k \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i},\; p_{i+1} = m^*_{i+1,i}, \ldots, p_{k-1} = m^*_{k-1,i})]\},$$

i.e.

$$m^*_{k,i} = M(p_k \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i},\; p_{i+1} = m^*_{i+1,i}, \ldots, p_{k-1} = m^*_{k-1,i}).$$

Since the condition in the RHS of the above equation is on all the $p_i$s except $p_k$, applying the unit sum constraint gives the conditional median in the form of the following complement

$$m^*_{k,i} = 1 - \sum_{j=1}^{i} m^*_{j,j} - \sum_{j=i+1}^{k-1} m^*_{j,i},$$

which ends the proof of Lemma 7.1.
• To elicit a positive-definite correlation matrix $R$, let $R_i = \mathrm{Var}(\underline{Y}_i)$, with

  $$\underline{Y}_i' = (Y_1, Y_2, \ldots, Y_i), \qquad i = 1, 2, \ldots, k-1,$$

  where $R_1 = \mathrm{Var}(Y_1) = 1$ and the final matrix $R = R_{k-1}$.

  Suppose that $R_{i-1}$ has been estimated as a positive-definite matrix. We aim now to elicit $R_i$ and show it is positive-definite. $R_i$ can be partitioned as follows

  $$R_i = \begin{pmatrix} R_{i-1} & R_{i-1}\underline{r}_i \\ \underline{r}_i' R_{i-1} & V_i \end{pmatrix}, \tag{7.32}$$

  where

  $$R_{i-1}\underline{r}_i = \mathrm{Cov}(\underline{Y}_{i-1}, Y_i), \qquad V_i = \mathrm{Var}(Y_i).$$

  Although the Gaussian copula function implies that $\mathrm{Var}(Y_i) = 1$, we will find another estimate for $V_i$ using the conditional variance of $Y_i$ elicited in (7.31). The reason for this, as will be shown later, is to follow the approach of Kadane et al. (1980) so as to ensure the positive-definiteness of the matrix $R_i$. In what follows, we use the conditional median assessments to estimate $\underline{r}_i$.
Using the partition (7.32), it is well-known from m ultivariate normal distribution theory,
since E(Y_) = 0, th a t
(7.33)
Moreover, for j < i — 1, taking the conditional expectation of both sides of (7.33), given
th a t y_. — im ifi +
+ *72 ,- • • ,
E
gi ves
= y}] =
= y.) U ,
(7.34)
i.e
^ « K J-= 1 /J.) = (y1> . . . ,
Vj, E (Y j+ 1 \Y j )i
E iY i^lY j)) n .
(7.35)
From (7.29) and (7.35) we get
m i,j = (mift + 771,
m 2)i +
772,
m j+ 1:j,
• • • , r r i j j - i + r}j t
• • • , r r i i - i j ) U-
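Since $j = 1, 2, \dots, i-1$, we end up with a system of $i-1$ equations of the form
\[ \underline{T}_i = Q_{i-1}\,\underline{r}_i, \tag{7.36} \]
where
\[ \underline{T}_i = (m_{i,1}, m_{i,2}, \dots, m_{i,i-1})' \]
and
\[ Q_{i-1} = \begin{pmatrix}
\eta_1 & m_{2,1} & m_{3,1} & \cdots & m_{i-1,1} \\
\eta_1 & m_{2,1} + \eta_2 & m_{3,2} & \cdots & m_{i-1,2} \\
\eta_1 & m_{2,1} + \eta_2 & m_{3,2} + \eta_3 & \cdots & m_{i-1,3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\eta_1 & m_{2,1} + \eta_2 & m_{3,2} + \eta_3 & \cdots & m_{i-1,i-2} + \eta_{i-1}
\end{pmatrix}. \]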
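Since $m_{i,j} - m_{i,j-1} = \theta_{i,j}$, $j = 1, 2, \dots, i-1$, and $m_{i,0} = 0$, after multiplying
both sides of (7.36) from the left by the matrix
\[ M_{i-1} = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
-1 & 1 & 0 & \cdots & 0 \\
0 & -1 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 1
\end{pmatrix}, \]
the system can be written as
\[ \begin{pmatrix} \theta_{i,1} \\ \theta_{i,2} \\ \vdots \\ \theta_{i,i-1} \end{pmatrix}
= \begin{pmatrix}
\eta_1 & \theta_{2,1} & \cdots & \theta_{i-1,1} \\
0 & \eta_2 & \cdots & \theta_{i-1,2} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \eta_{i-1}
\end{pmatrix} \underline{r}_i. \]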
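Provided that $\eta_j \neq 0$, $j = 1, 2, \dots, i-1$, the upper triangular matrix $M_{i-1}Q_{i-1}$ is
non-singular. Hence
\[ \underline{r}_i = \begin{pmatrix}
\eta_1 & \theta_{2,1} & \cdots & \theta_{i-1,1} \\
0 & \eta_2 & \cdots & \theta_{i-1,2} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \eta_{i-1}
\end{pmatrix}^{-1}
\begin{pmatrix} \theta_{i,1} \\ \theta_{i,2} \\ \vdots \\ \theta_{i,i-1} \end{pmatrix}. \]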
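• Since
\[ \mathrm{Var}(Y_i \mid \underline{Y}_{i-1}) = \mathrm{Var}(Y_i) - \underline{r}_i' R_{i-1}\,\underline{r}_i, \]
we can now use the assessed conditional variance given by $V_{i|i-1}$ in (7.31) to estimate
the unconditional variance $V_i$ as follows
\[ V_i = V_{i|i-1} + \underline{r}_i' R_{i-1}\,\underline{r}_i. \]
Using the Schur complement, the matrix $R_i$ is positive-definite if and only if
\[ V_i - \underline{r}_i' R_{i-1}\,\underline{r}_i > 0, \]
which is guaranteed from (7.31) since $V_{i|i-1} > 0$.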
• Choosing the arbitrary values $\eta_j \neq 0$, $j = 1, 2, \dots, i-1$, guarantees the existence of a
unique solution for $\underline{r}_i$. It can be seen from the relation
V j
th a t r]j
=
+
, ; ) ]
-
0 as rjj ^ 0, j = 1 , 2, • • • , i — 1 .
• With the proposed method, $R_i$ is a positive-definite matrix if $R_{i-1}$ is positive-definite
($i = 2, 3, \dots, k-1$). Since $R_1 = 1 > 0$, by mathematical induction, the full correlation
matrix $R = R_{k-1}$ is guaranteed to be positive-definite.
We have to note that, according to this method of elicitation, the variances on the
main diagonal of $R$, say $r_{i,i}$, $i = 1, 2, \dots, k-1$, will seldom equal one, except for the
first element $r_{1,1}$. It is easy, however, to transform $R$ into $R^*$, where $R^*$ is a suitable
correlation matrix for the Gaussian copula function, satisfying both the unit variances
and positive-definiteness. $R^*$ can be obtained from $R$ using the transformation
\[ R^* = A R A, \]
where
\[ A = \mathrm{diag}\left(\frac{1}{\sqrt{r_{1,1}}}, \frac{1}{\sqrt{r_{2,2}}}, \dots, \frac{1}{\sqrt{r_{k-1,k-1}}}\right). \]
The unit variances in the correlation matrix $R^*$ ensure that each marginal distribution
$G_i(p_i)$ is still a beta distribution with the same marginal hyperparameters $a_i$ and $b_i$
that were elicited before ($i = 1, 2, \dots, k$).
• The accompanying software outputs the elicited pairs of beta parameters $a_i$ and $b_i$, for
$i = 1, 2, \dots, k$, together with the elicited covariance matrix, $R^*$.
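The recursion described in the bullets above can be sketched numerically as follows. This is only an illustrative sketch, not the PEGS-Copula implementation: the function names (`back_substitute`, `extend`, `to_correlation`, `is_positive_definite`) and all numbers are hypothetical. It solves the upper triangular system for $\underline{r}_i$, appends a row and column to $R_{i-1}$ using $V_i = V_{i|i-1} + \underline{r}_i' R_{i-1}\underline{r}_i$, and finally rescales the result to unit diagonal.

```python
import math

def back_substitute(upper, rhs):
    """Solve upper @ x = rhs for an upper triangular matrix with non-zero diagonal."""
    n = len(rhs)
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(upper[row][col] * x[col] for col in range(row + 1, n))
        x[row] = (rhs[row] - known) / upper[row][row]
    return x

def extend(r_prev, r_vec, cond_var):
    """Build R_i from R_{i-1}, r_i and the assessed conditional variance V_{i|i-1}."""
    n = len(r_prev)
    cov = [sum(r_prev[a][b] * r_vec[b] for b in range(n)) for a in range(n)]  # R_{i-1} r_i
    var = cond_var + sum(r_vec[a] * cov[a] for a in range(n))                 # V_i
    return [r_prev[a] + [cov[a]] for a in range(n)] + [cov + [var]]

def to_correlation(mat):
    """Rescale to unit diagonal: R* = A R A with A = diag(1 / sqrt(r_ii))."""
    n = len(mat)
    scale = [1.0 / math.sqrt(mat[a][a]) for a in range(n)]
    return [[scale[a] * mat[a][b] * scale[b] for b in range(n)] for a in range(n)]

def is_positive_definite(mat):
    """Attempt a Cholesky factorisation; it succeeds only for positive-definite input."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1):
            s = mat[a][b] - sum(low[a][c] * low[b][c] for c in range(b))
            if a == b:
                if s <= 0.0:
                    return False
                low[a][a] = math.sqrt(s)
            else:
                low[a][b] = s / low[b][b]
    return True

# Hypothetical shifts (eta), median changes (theta) and conditional variances.
R1 = [[1.0]]
r2 = back_substitute([[0.5]], [-0.2])                        # eta_1 = 0.5, theta_{2,1} = -0.2
R2 = extend(R1, r2, cond_var=0.9)
r3 = back_substitute([[0.5, -0.2], [0.0, 0.4]], [0.1, 0.12])
R3 = extend(R2, r3, cond_var=0.8)
R_star = to_correlation(R3)
```

Because each conditional variance is positive, every Schur complement is positive and the Cholesky check succeeds at each stage, which mirrors the induction argument in the text.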
7.5 Example: Waste collection
The Environmental Agency in the UK is currently interested in the fuel consumption of
waste collection vehicles. It is thought that substantial quantities of fuel are used to collect
recyclable waste and that local authorities are insufficiently aware of the amounts involved.
In this example, a waste management expert (Dr. Stephen Burnley, The Open University)
used the PEGS-Copula elicitation software to quantify his opinion about the proportions of
waste collection trips according to the type of recyclable waste. Dr. Burnley is a fellow of
the Chartered Institution of Waste Management. He advised that two main types of
waste are considered: urban recycle and rural recycle. Each of them may contain bins, sacks,
garden waste and recycle waste. Hence, each collection trip is arranged by the local authority
for only one of eight different waste types. Considering the proportions of collection trips for
waste in each category, the problem can be formulated as a multinomial model with eight
categories. Our method and software were used to quantify the expert's opinion about a
Gaussian copula prior for the parameters of this multinomial model.
After initializing the software and defining the model, the expert assessed his medians of
the proportion of collection trips for each of the following 8 types of waste: urban-bins,
urban-sacks, urban-garden, rural-bins, rural-sacks, rural-garden, urban-recycle and rural-recycle.
Then the expert assessed lower and upper quartiles for the proportion of each category. His
assessed medians and quartiles are shown as blue bars and short dark blue horizontal lines,
respectively, in Figure 7.7. These assessments are also given in Table 7.2 below.
[Software screenshot: "Eliciting Quartiles of the probabilities of each category"; bar chart of probabilities against Categories.]
Figure 7.7: The initially assessed marginal medians and quartiles
Table 7.2: Expert's assessments of medians and quartiles
                  p_1    p_2    p_3    p_4    p_5    p_6    p_7    p_8
Lower quartile    0.25   0.05   0.13   0.05   0.01   0.02   0.18   0.07
Median            0.30   0.08   0.20   0.07   0.03   0.05   0.25   0.09
Upper quartile    0.35   0.12   0.28   0.15   0.05   0.07   0.30   0.25
These assessments were used to elicit a marginal beta prior distribution for the proportion
of trips in each category. For mathematical coherence, the expected values of these elicited
beta priors must sum to 1, so the software used the initial assessments to elicit beta
distributions that satisfy this condition. The median values and quartiles of the coherent beta
distributions were computed and presented to the expert as feedback in Figure 7.8. During
this feedback stage he was invited to accept or revise these quantities. The initial median
values given by the expert have a sum that is nearly equal to one, so the coherent medians
and quartiles suggested by the software in Figure 7.8 were close to his assessments and he
naturally accepted them as representatives of his opinions.
[Software screenshot: "Unconditional Medians and quartiles already assessed for Each Category"; bar chart of probabilities against Categories.]
Figure 7.8: The coherent assessments suggested by the software
To elicit a correlation matrix for the Gaussian copula prior, the expert gave conditional
assessments that quantified his opinion about the dependence structure between the marginal
beta distributions. To do that, he assessed conditional quartile values, under the condition
that the assessed medians for the previous categories were actually the true values. For
example, he assessed his conditional quartiles of the proportion for the fourth category, given
that the median values for the first three categories equalled their true values. This is
illustrated in Figure 7.9.
[Software screenshot: "Eliciting Quartiles of the probabilities of Category (R. bins)"; bar chart of probabilities against Categories.]
Figure 7.9: Assessing conditional quartiles
The expert's seven pairs of assessments for the lower and upper conditional quartiles are
given in Table 7.3. The quartiles for the last category are shown in bold typeface in Table 7.3
as they were automatically computed by the software when the expert assessed two quartiles
for the seventh category. This is also illustrated in Figure 7.10.
Table 7.3: Expert's assessments of conditional quartiles
                  p_2    p_3    p_4    p_5    p_6    p_7    p_8
Lower quartile    0.03   0.10   0.03   0.01   0.02   0.20   0.19*
Upper quartile    0.13   0.23   0.08   0.04   0.08   0.28   0.27*
(* bold in the original: computed automatically by the software)
[Software screenshot: bar chart of probabilities against Categories (U. bins, U. sacks, U. garden, R. bins, R. sacks, R. garden, U. recycle, R. recycle).]
Figure 7.10: Assessing conditional quartiles for the last two categories
Next, conditional on the proportion for the first category being 0.12, the expert gave
conditional median assessments for the proportions of the seven remaining categories. The
number of conditions was then increased in stages. For example, in Figure 7.11, the expert
has assessed the conditional medians for the last five categories given that the proportions
for the first three categories are 0.12, 0.04 and 0.08, respectively. Table 7.4 gives all the
conditional median assessments, where the underlined values constitute the conditioning set
at each stage.
[Software screenshot: "Eliciting conditional medians of Probabilities for Each Category"; bar chart of probabilities against Categories (U. bins, U. sacks, U. garden, R. bins, R. sacks, R. garden, U. recycle, R. recycle).]
Figure 7.11: Assessing conditional medians
Table 7.4: Expert's assessments of conditional medians
Stage    p_1    p_2    p_3    p_4    p_5    p_6    p_7    p_8
1        0.12   0.09   0.16   0.10   0.06   0.12   0.15   0.14
2        0.12   0.04   0.16   0.14   0.06   0.14   0.14   0.20
3        0.12   0.04   0.08   0.14   0.06   0.18   0.14   0.22
4        0.12   0.04   0.08   0.07   0.10   0.20   0.14   0.23
5        0.12   0.04   0.08   0.07   0.05   0.22   0.15   0.26
6        0.12   0.04   0.08   0.07   0.05   0.11   0.22   0.33
(At stage s, the first s entries of the row form the conditioning set; these are underlined in the original.)
This was the last assessment task, after which the software output the elicited
hyperparameters of the marginal beta prior distributions as in Table 7.5. The dependence structure
between these beta marginals was quantified as a multivariate Gaussian copula function with
an elicited covariance matrix as given in Table 7.6.
Table 7.5: The elicited hyperparameters of marginal beta distributions
      p_1       p_2       p_3      p_4      p_5       p_6       p_7      p_8
a     3.7607    1.0661    1.6536   0.6951   0.5731    0.6344    1.2489   0.4545
b     12.0047   14.8133   8.6742   8.6012   19.5493   14.8684   3.3578   3.3669
Table 7.6: The elicited covariance matrix of the Gaussian copula prior
       Y_1       Y_2       Y_3       Y_4       Y_5       Y_6       Y_7
Y_1     1        -0.1279   -0.2601   -0.7773   -0.55     -0.6192    0.5414
Y_2    -0.1279    1         0.1328    0.1479   -0.4842   -0.3304    0.3326
Y_3    -0.2601    0.1328    1        -0.082     0.042    -0.03     -0.0618
Y_4    -0.7773    0.1479   -0.082     1         0.2358    0.4632   -0.4406
Y_5    -0.55     -0.4842    0.042     0.2358    1         0.5664   -0.5812
Y_6    -0.6192   -0.3304   -0.03      0.4632    0.5664    1        -0.8354
Y_7     0.5414    0.3326   -0.0618   -0.4406   -0.5812   -0.8354    1
The elicited matrix in Table 7.6 does not give covariances between the beta distributed
proportions, $p_1, \dots, p_8$. Instead, it gives the covariances between the transformed normal
variates, $Y_1, \dots, Y_7$. The eighth transformed normal variate is omitted so as to avoid the
singularity of the elicited matrix, as discussed before. The Gaussian copula multivariate
distribution is parameterized by both the marginal beta parameters and the covariance matrix
in Table 7.6. The software produces a WinBUGS file with the Gaussian copula prior
distribution. Marginal beta parameters can also be used to compute the expected value and variance
of the proportions of each category. These are given in Table 7.7, where the expected values
are very close to the coherent median assessments in Figure 7.8, and even closer to the initial
median assessments in Table 7.2 and Figure 7.7.
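As a quick check of how the marginal moments follow from the beta hyperparameters, the sketch below applies the standard beta formulas, mean $a/(a+b)$ and variance $ab/[(a+b)^2(a+b+1)]$, to the values reported in Table 7.5. The function names are illustrative; only the hyperparameters are taken from the text.

```python
# Hyperparameters copied from Table 7.5 (categories p_1, ..., p_8).
a = [3.7607, 1.0661, 1.6536, 0.6951, 0.5731, 0.6344, 1.2489, 0.4545]
b = [12.0047, 14.8133, 8.6742, 8.6012, 19.5493, 14.8684, 3.3578, 3.3669]

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))

means = [beta_mean(x, y) for x, y in zip(a, b)]
variances = [beta_variance(x, y) for x, y in zip(a, b)]

for idx, (m, v) in enumerate(zip(means, variances), start=1):
    print(f"p_{idx}: mean = {m:.3f}, variance = {v:.3f}")
print(f"sum of means = {sum(means):.3f}")
```

Rounded to three decimals, these values agree with Table 7.7, and the means sum to one, which is the coherence condition imposed on the marginal beta priors.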
The elicitation process took about an hour to complete. The expert stressed the
importance of the convenient order of categories when conditioning. During the task of giving
conditional assessments based on an increasing number of conditions, he commented that
ordering the categories in a suitable sequence made it easier for him to think about these
conditions according to his knowledge.
Table 7.7: Probability means and variances from marginal beta distributions
          p_1     p_2     p_3     p_4     p_5     p_6     p_7     p_8
E(p_i)    0.239   0.067   0.160   0.075   0.028   0.041   0.271   0.119
V(p_i)    0.011   0.004   0.012   0.007   0.001   0.002   0.035   0.022
7.6 Concluding comments
The elicitation methods for beta parameters proposed in the previous chapter have been
used in this chapter as the main tools for eliciting two more flexible prior distributions
for multinomial models. A novel elicitation method for the generalized Dirichlet
distribution has been introduced. The method makes use of the fact that the conditional
distributions of the generalized Dirichlet variates are beta distributions. The method has
been implemented in user-friendly software that is freely available as PEGS-Dirichlet at
http://statistics.open.ac.uk/elicitation.
The elicitation of copula functions for multinomial models faces two obstacles, as noted
in the literature. The usual correlations cannot be transformed through the assumed
copula transformation, which is one obstacle, and the need to elicit a positive-definite
variance-covariance matrix is the other. Our proposed elicitation method for the Gaussian copula prior
has overcome both problems. The assessed conditional quartiles could be transformed through
the normalizing one-to-one transformation, making it possible to elicit correlations. Moreover,
the method of Kadane et al. (1980) has been modified to elicit a positive-definite
variance-covariance matrix for the Gaussian copula. The method has been implemented in the
user-friendly PEGS-Copula software that is freely available at http://statistics.open.ac.uk/elicitation.
Chapter 8
Eliciting logistic normal priors for multinomial models
8.1 Introduction
tions (Aitchison, 1986). The constrained proportions are obtained by transform ing normally
distributed unconstrained variables on the real space using some one-to-one transform ation.
Different m ultivariate logistic transform ations are given in the literature, see for example
Aitchison (1986). The most well-known and widely used logistic transform ation, specially for
multinomial logit models, is the additive logistic transform ation.
We propose a m ethod for quantifying opinion about a logistic normal prior for multinomial
models. Our proposed m ethod has been implemented in interactive graphical user-friendly
software developed in Java. This is freely available as PEGS-Logistic at h ttp ://sta tistic s.o p en .
ac.uk/elicitation. The elicitation m ethod proposed here is generalized in C hapter 9 to handle
the case of multinomial models with covariates, or w hat are known as the multinomial logit
models.
In Section 8.2 we define the logistic normal prior to be used and consider its assumptions.
The required assessments with our structural procedure to elicit them using the software are
given in Section 8.3. The use of these assessments to elicit the hyperparam eters of the logistic
normal prior distribution is proposed in Section 8.4. A m ethod to obtain the prior’s marginal
quartiles, which are useful as feedback, is proposed in Section 8.5. We finish this chapter by
giving an example in Sections 8.6 and some concluding comments in Section 8.7.
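8.2 The additive logistic normal distribution
The additive logistic transformation from $\underline{Y}^*$ to $\underline{p}$ is defined by
\[ p_1 = \frac{1}{1 + \sum_{j=2}^{k} \exp(Y_j)}, \qquad p_i = \frac{\exp(Y_i)}{1 + \sum_{j=2}^{k} \exp(Y_j)}, \quad i = 2, 3, \dots, k, \tag{8.1} \]
with inverse transformation
\[ Y_i = \log\left(\frac{p_i}{p_1}\right) = \log\left(\frac{p_i}{1 - p_2 - p_3 - \cdots - p_k}\right), \qquad i = 2, 3, \dots, k, \tag{8.2} \]
where
\[ \underline{Y}^* = (Y_2, Y_3, \dots, Y_k)' \sim \mathrm{MVN}(\underline{\mu}_{k-1}, \Sigma_{k-1}). \tag{8.3} \]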
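• The transformation is one-to-one from the $(k-1)$-dimensional random vector $\underline{Y}^*$ into the
$k$-dimensional random vector $\underline{p}$. The definition of an extra random variable $Y_1$ will be
given later.
• For any values $Y_2, \dots, Y_k$, (8.1) gives $\sum_{i=1}^{k} p_i = 1$.
• The matrix $\Sigma_{k-1}$ is non-singular.
• The transformation is not symmetric in the $p_i$, as we choose a fill-up variable
$p_1 = 1 - p_2 - p_3 - \cdots - p_k$.
• The transformation is used in the multinomial logit regression model when
$Y_i = \underline{X}\,\underline{\beta}_i$.
• If (8.3) applies, the elements of the vector $\underline{p}$ are said to have the multivariate logistic
normal distribution. Their joint density has the form
\[ f(\underline{p} \mid \underline{\mu}_{k-1}, \Sigma_{k-1}) = \frac{1}{(2\pi)^{(k-1)/2}\,|\Sigma_{k-1}|^{1/2}\,(p_1 \times p_2 \times \cdots \times p_k)} \exp\left\{-\frac{1}{2}\left[\log(\underline{p}_{-1}/p_1) - \underline{\mu}_{k-1}\right]' \Sigma_{k-1}^{-1} \left[\log(\underline{p}_{-1}/p_1) - \underline{\mu}_{k-1}\right]\right\}, \]
where
\[ \underline{p}_{-1}' = (p_2 \;\; p_3 \;\; \cdots \;\; p_k), \qquad 0 < p_i < 1, \qquad \sum_{i=1}^{k} p_i = 1. \]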
• This additive logistic normal distribution is said to be permutation invariant. That is,
whatever the ordering of the elements of the vector $\underline{p}$, the density function given
above is invariant. For a theoretical proof of this property see Aitchison (1986). Under
the permutation invariance, any order of the elements of $\underline{p}$ can be considered.
Consequently, the choice of the fill-up variable is arbitrary. Usually it is chosen as the
probability of the most common category, the first category, or the last category. To
elicit a logistic normal prior, we favour choosing the most common category as the first
category and making $p_1$ the fill-up variable. This is more convenient for our method
because of the order of conditioning we adopt later.
• For sampling compositional data, the problem of zero components has been reported by
Aitchison (1986) as a critical irregular case that needs special attention in dealing with
the logistic normal distribution. Clearly, the log transformation cannot be applied with
zero components. However, we need not worry about this problem in our elicitation
method, as categories with assessed zero probabilities can simply be removed from the
analysis at the first step without any loss.
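The additive logistic transformation and its inverse, with $p_1$ as the fill-up variable, can be sketched in a few lines. The function names and the input values below are hypothetical and serve only to show that the mapping is one-to-one and that the resulting proportions always sum to one.

```python
import math

def to_proportions(y_star):
    """Map (Y_2, ..., Y_k) to (p_1, ..., p_k), with p_1 as the fill-up variable."""
    denom = 1.0 + sum(math.exp(y) for y in y_star)
    return [1.0 / denom] + [math.exp(y) / denom for y in y_star]

def to_log_ratios(p):
    """Inverse map: Y_i = log(p_i / p_1) for i = 2, ..., k."""
    return [math.log(p_i / p[0]) for p_i in p[1:]]

y_star = [-0.5, 0.3, -1.2]      # hypothetical values of Y_2, Y_3, Y_4
p = to_proportions(y_star)      # four proportions that sum to one
recovered = to_log_ratios(p)    # equals y_star up to rounding error
```

A proportion of exactly zero would make `math.log` fail, which is the zero-component problem noted in the last bullet.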
We assume that prior opinion about $\underline{Y}^*$ can be represented by the multivariate normal
distribution in (8.3). As will be shown later, for the assessments of $\underline{p}$ to be fully transformable
to $\underline{Y}^*$, a further normalizing transformation must be defined on the fill-up variable $p_1$. We
define an extra variable $Y_1$ such that
\[ Y_1 = \log\left(\frac{p_1}{1 - p_1}\right). \tag{8.4} \]
Based on the normality assumption of $\underline{Y}^*$ in (8.3) and the unit sum constraint of $\underline{p}$, the
random variable $e^{-Y_1}$ can be represented as a sum of $k-1$ lognormally distributed random
variables, since
\[ e^{-Y_1} = \frac{1 - p_1}{p_1} = \sum_{j=2}^{k} \frac{p_j}{p_1} = \sum_{j=2}^{k} e^{Y_j}. \]
Although the sum of lognormal random variables has no simple exact distribution, it is
common to approximate its distribution by another lognormal distribution. This is discussed
in the next section.
8.2.1 Approximate distribution of the lognormal sum
Fenton (1960) considered the numerical convolution of lognormal distributions and showed
that the sum of such distributions is a distribution that approximately follows the lognormal
law. He added that the sum of two (or more) lognormal distributions can be assumed, as a
first approximation, to have another lognormal distribution. Later, Schwartz and Yeh (1982)
mentioned that there is an accumulated body of evidence indicating that the distribution of
the sum of a finite number of lognormal random variables is well approximated, at least to
first order, by another lognormal distribution.
Several approximations have been introduced for the sum of lognormal random variables.
Although the idea of approximating their sum using another lognormal distribution has been
common in many studies, methods differ in approximating the moments of the lognormal
distribution of the sum. Fenton (1960) matches the first two moments of the sum of lognormal
random variables to the first two moments of an equivalent lognormal random variable.
Schwartz and Yeh (1982) follow the same approach but compute the exact first two moments
for the sum of two lognormal random variables; the procedure is then iteratively applied
for the sum of more than two lognormal random variables. Their method of computing the
distribution of a sum of independent lognormal random variables was extended to the case
of correlated lognormal random variables by Safak (1993).
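To make the moment-matching idea concrete, the following sketch matches the mean and variance of a sum of independent lognormal variables to those of a single lognormal variable, in the spirit of Fenton (1960). It is a textbook version under stated assumptions: the components are taken to be independent, the function name `fenton_wilkinson` and the inputs are hypothetical, and it is not part of the elicitation software, which elicits the mean and variance of $Y_1$ directly.

```python
import math

def fenton_wilkinson(mus, sigmas):
    """Return (mu, sigma) of the lognormal whose first two moments match
    those of a sum of independent lognormals with log-scale parameters
    mus and sigmas."""
    mean = sum(math.exp(m + 0.5 * s * s) for m, s in zip(mus, sigmas))
    var = sum((math.exp(s * s) - 1.0) * math.exp(2.0 * m + s * s)
              for m, s in zip(mus, sigmas))
    sigma_sq = math.log(1.0 + var / mean ** 2)
    return math.log(mean) - 0.5 * sigma_sq, math.sqrt(sigma_sq)

# Hypothetical log-scale means and standard deviations of three components.
mu_z, sigma_z = fenton_wilkinson([0.0, -0.5, 0.2], [0.4, 0.6, 0.3])
```

With a single component the function returns the input parameters unchanged, which is a simple sanity check on the formulas.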
Recently, based on approximating the distribution of the sum of lognormal random
variables by another lognormal distribution, much work has been devoted to giving various
approximation methods. For example, Beaulieu and Xie (2004) use a linearizing transform
with a linear minimax approximation to determine an optimal lognormal approximation to a
lognormal sum distribution. Tellambura and Senaratne (2010) use classical complex
integration techniques to approximate the moment generating function of the sum. Mahmoud
(2010) approximates the characteristic function and the cumulative distribution function of
the lognormal sum by exploiting the recent Hermite-Gauss quadrature-based approximation.
It is thus natural to approximate the distribution of $Y_1$ by a normal distribution with
elicited mean and variance. (We do not require any approximations to obtain its parameters.)
We can then state our main assumption:
\[ \underline{Y}_k = (Y_1, Y_2, \dots, Y_k)' \sim \mathrm{MVN}(\underline{\mu}_k, \Sigma_k). \tag{8.5} \]
The unit sum constraint of $\underline{p}$ will always lead to a singular matrix $\Sigma_k$. However, we assume
that there is only one condition on the elements of $\underline{p}$, namely the unit sum. In particular,
we assume that there does not exist any subset of categories such that the sum of their
probabilities is known with certainty.
Although no density function can be defined for the singular multivariate normal
distribution, its theoretical properties and numerical results have been investigated in the literature.
See, for example, Bland and Owen (1966), Kwong and Iglewicz (1996), Albajar and Fidalgo
(1997) or Genz and Kwong (1999).
Usage of the singular normal is thus feasible and has been exploited in numerous
multivariate methods. Khatri (1968) used the notion of a generalized inverse to utilize the
singular normal distribution in multivariate regression. Styan (1970) discussed the
distribution of quadratic forms in singular normal variables. West and Harrison (1997) defined the
covariance matrix of the multivariate normal distribution as a non-negative definite matrix.
In Chapter 8 of his book on linear statistical inference, Rao (2002) did not use the density
function to define the multivariate normal distribution. Instead, he characterized it by the
property that every linear function of its elements has a univariate normal distribution.
He could then list properties and characterizations of the multivariate normal distribution
without using the pdf. The singular normal distribution is thus a special case of the standard
normal distribution, and has similar properties, but with the usual inverse of the covariance
matrix replaced by its generalized inverse. Conditional properties of the singular normal
distribution have been extensively used in the current chapter for eliciting a logistic normal
distribution.
To this end, using (8.5), we assume that the prior distribution of $\underline{p}$ is the logistic normal
distribution induced by the vector
\[ \underline{Y}^* = A\,\underline{Y}_k \sim \mathrm{MVN}(\underline{\mu}_{k-1}, \Sigma_{k-1}), \tag{8.6} \]
where
\[ A = (\underline{0} \mid I_{k-1}), \]
\[ \underline{\mu}_{k-1} = A\,\underline{\mu}_k, \tag{8.7} \]
\[ \Sigma_{k-1} = A\,\Sigma_k A'. \tag{8.8} \]
We start by eliciting $\underline{\mu}_k$ and a matrix $\Sigma_k$ of rank $k-1$. In our approach, we modify the
method of Kadane et al. (1980) and add a special treatment for the $k$th row and column.
This will give the $\underline{\mu}_{k-1}$ and $\Sigma_{k-1}$ in equations (8.7) and (8.8). The matrix $\Sigma_k$ is singular
of rank $k-1$, given that no other constraint can be imposed on subsets of probabilities
except the unit sum. However, the matrix $\Sigma_{k-1}$ is shown to be positive-definite of full rank
$k-1$, since it is simply $\Sigma_k$ with its first row and column removed. A formal proof of the
positive-definiteness of $\Sigma_{k-1}$ will be given later in Section 8.4.2.
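The effect of the selection matrix $A = (\underline{0} \mid I_{k-1})$ can be seen in a small numerical sketch. The matrix and vector below are hypothetical stand-ins (they are not elicited values, and the stand-in for $\Sigma_k$ is not singular); the point is only that $A\Sigma_k A'$ removes the first row and column and $A\underline{\mu}_k$ removes the first element.

```python
def select_matrix(k):
    """A = (0 | I_{k-1}): a (k-1) x k matrix that drops the first coordinate."""
    return [[1.0 if col == row + 1 else 0.0 for col in range(k)]
            for row in range(k - 1)]

def matmul(left, right):
    """Plain matrix product of two lists of rows."""
    return [[sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
            for row in left]

def transpose(mat):
    return [list(col) for col in zip(*mat)]

# Hypothetical symmetric 3 x 3 matrix and mean vector standing in for Sigma_k and mu_k.
sigma_k = [[0.50, -0.20, -0.10],
           [-0.20, 0.90, 0.30],
           [-0.10, 0.30, 0.70]]
mu_k = [[-1.0], [0.2], [-0.4]]

A = select_matrix(3)
mu_sub = matmul(A, mu_k)                                 # mu_k without its first element
sigma_sub = matmul(matmul(A, sigma_k), transpose(A))     # sigma_k without first row/column
```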
8.3 Assessment tasks
Since the transformations in (8.2) and (8.4) are strictly monotonic increasing from $\underline{p}$ to $\underline{Y}_k$,
we can establish a one-to-one correspondence between the medians and quartiles of these two
vectors. The required assessments are detailed as follows.
8.3.1 Assessing initial medians
• The choice of a category to start with is arbitrary, as discussed earlier. Hence it may
be chosen by the expert as the most common category and its probability is denoted
$p_1$. A median value $m_1$ for $p_1$ will be assessed as a first step. Then the expert assesses
median values $m_j$, $j = 2, \dots, k$, for all the remaining categories. These assessed values
are shown by the blue bars in Figure 8.1.
217
~
~
uaa
Now, you have finished with th is fram e. A ccept o r modify suggestions to sum to one!
: ...........
Jja&
S3e E tt Tods IHp
Eliciting M edians o f P robabilities fo r Each C ategory
1
4
3
2
C a te g o r ie s
im S S r l
r
- t
- |
; A c c e p t S u g g e stio n s
'l
rtet > |
r^ W f!
Figure 8.1: Assessing probability medians for logistic normal elicitation
• The normality assumption of $\underline{Y}_k$, together with the unit sum constraint of $\underline{p}$, can be
used in Lemma 8.1 and Theorem 8.1 (which are given in Section 8.4) to show that
the unit sum constraint must also be fulfilled by the $m_j$. That is, $\sum_{j=1}^{k} m_j = 1$. To
attain mathematical coherence, the software suggests a normalized set of assessments,
given by the yellow bars in Figure 8.1, as follows. Suppose the initial assessments were
$m'_1, \dots, m'_k$. Then the coherent assessments that are suggested for the $m_j$ are given
by
\[ m_j = \frac{m'_j}{\sum_{i=1}^{k} m'_i}, \qquad \text{for } j = 1, 2, \dots, k. \]
With our software, the expert can keep changing her assessed values until she is happy
with the normalized values that are suggested.
A ssessing conditional quartiles
• In this assessment task, the expert is asked to assess a lower quartile L \ and an upper
quartile U£ for p\. She is then asked to assume th a t p\ = m \ and gives a lower quartile
L \ and an upper quartile U2 for P2 - For each remaining pj, j = 3, • • • , k — 1, she
218
assesses the two quartiles L*j and Uj given th a t pi — m i, P2 = m 2 , • • •, P j-i = rrij- 1 .
See Figure 8.2, where the expert has assessed the two quartiles of p% conditional on the
m edian values of p \ and P2 as given by the red bars.
• The lower (upper) quartile L k (Uk ) of pk is autom atically shown to the expert once
she assesses the upper (lower) quartile Uk - 1 ( L ^ j ) of Pk-i, see Figure 8.2. The two
quartiles L k and Uk are also shown to the expert as a guide to help her choose L k_ x and
Uk_ v In fact, L k (Uk) is the lower (upper) quartile of
as
+ Uk =
(pk\pi = m i , • • • ,P k- 2 = m k - 2)
+ L k = 1 —m i — • • • —m k - 2 , from the unit sum constraint.
[Software screenshots: "The Conditional Distribution of P3" (pdf curve) and "Eliciting Quartiles of the probabilities of category (3)" (bar chart of probabilities against Categories 1 to 4).]
Figure 8.2: Assessing conditional quartiles with lognormal feedback
• To help the expert during this current task, the software presents an interactive graph
showing the pdf curve of the lognormal distribution of
\[ (p_j \mid p_1 = m_1, \dots, p_{j-1} = m_{j-1}), \]
for $j = 2, 3, \dots, k-1$, see Figure 8.2. The expert is able to change her assessed
conditional quartiles of $p_j$ until the conditional pdf curve forms an acceptable representation
of her opinion. With the aid of the lognormal curve, the expert is advised to make
sure that her assessed interquartile range gives an almost zero probability of $p_j$
exceeding $1 - \sum_{i=1}^{j-1} m_i$. This boundary is given by the red vertical line on the pdf graph of
Figure 8.2. See Lemma 8.2 for the formal validity of the above results.
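The check on the conditional quartiles can be sketched as below. This is an assumption-laden illustration rather than the software's actual computation: it fits a lognormal distribution to two assessed quartiles by taking the midpoint of the log-quartiles as the log-scale mean and their half-distance divided by the standard normal upper quartile as the log-scale standard deviation, then evaluates the probability of exceeding the unit-sum boundary. The function names and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)   # upper quartile of the standard normal, about 0.6745

def lognormal_from_quartiles(lower, upper):
    """(mu, sigma) of log(p_j) for a lognormal with the given lower and upper quartiles."""
    mu = 0.5 * (math.log(lower) + math.log(upper))
    sigma = (math.log(upper) - math.log(lower)) / (2.0 * Z75)
    return mu, sigma

def prob_exceeding(bound, mu, sigma):
    """P(p_j > bound) under the fitted lognormal."""
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(bound))

# Hypothetical inputs: p_1 = 0.40 and p_2 = 0.25 fixed at their medians,
# conditional quartiles 0.12 and 0.18 assessed for p_3, with k = 4.
bound = 1.0 - (0.40 + 0.25)
mu, sigma = lognormal_from_quartiles(0.12, 0.18)
tail = prob_exceeding(bound, mu, sigma)

# Complementary quartiles of the last category p_4, from the unit sum constraint.
lower_last, upper_last = bound - 0.18, bound - 0.12
```

A small value of `tail` indicates that the assessed interquartile range is compatible with the boundary; a large value would suggest that the expert should narrow or lower her quartiles.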
8.3.3 Assessing conditional medians
• Here, the expert is asked to assume that the median of $p_1$ has been changed from $m_1$
to $m^*_{1,1} = m_1 + \eta^*_1$. Given this information, the expert will be asked to change her
previous medians $m_j$ of each $p_j$. Her new assessment, $m^*_{j,1}$, may be written as
\[ m^*_{j,1} = m_j + \theta^*_{j,1}, \qquad \text{for } j = 2, \dots, k. \tag{8.9} \]
• In each successive step $i$, for $i = 2, 3, \dots, k-2$, the expert will be asked to suppose that
the median values of $p_1, p_2, \dots, p_i$ are $m^*_{1,1} = m_1 + \eta^*_1$, $m^*_{2,2} = m^*_{2,1} + \eta^*_2$, $\dots$,
$m^*_{i,i} = m^*_{i,i-1} + \eta^*_i$, respectively, shown as red bars in Figure 8.3. Given this information, she
will be asked to change her assessed medians of the most recent previous step $m^*_{i+1,i-1}$,
$m^*_{i+2,i-1}, \dots, m^*_{k,i-1}$, shown by black lines in Figure 8.3. Her new assessments are
\[ m^*_{i+1,i} = m^*_{i+1,i-1} + \theta^*_{i+1,i}, \quad m^*_{i+2,i} = m^*_{i+2,i-1} + \theta^*_{i+2,i}, \quad \dots, \quad m^*_{k,i} = m^*_{k,i-1} + \theta^*_{k,i}, \]
respectively, which are shown as the blue bars in Figure 8.3. For $i = 2, 3, \dots, k-2$,
and $j = i+1, i+2, \dots, k$, we can write
\[ m^*_{j,i} = m^*_{j,i-1} + \theta^*_{j,i} \text{ is the median of } (p_j \mid p_1 = m^*_{1,1}, \dots, p_i = m^*_{i,i}). \tag{8.10} \]
• For mathematical coherence, as will be proved in Lemma 8.3, we have to make sure
that
\[ \sum_{j=1}^{i} m^*_{j,j} + \sum_{j=i+1}^{k} m^*_{j,i} = 1, \qquad i = 1, 2, \dots, k-2. \]
The expert has the option of changing her initial set of assessments $m'_{i+1,i}, m'_{i+2,i}, \dots,
m'_{k,i}$, the blue bars on Figure 8.3, until she feels that the suggested normalized set $m^*_{i+1,i},
\dots, m^*_{k,i}$, shown as yellow bars on Figure 8.3, gives the best representation of
her opinion. The software suggests each normalized conditional median $m^*_{j,i}$ as
\[ m^*_{j,i} = \left[\frac{1 - \sum_{r=1}^{i} m^*_{r,r}}{\sum_{r=i+1}^{k} m'_{r,i}}\right] m'_{j,i}, \qquad \text{for } i = 1, \dots, k-2, \quad j = i+1, \dots, k. \]
[Software screenshot: "Eliciting conditional medians of Probabilities for Each Category"; bar chart of probabilities against Categories.]
Figure 8.3: Assessing conditional medians for logistic normal elicitation
• The current assessment task stops at step $k-2$, as we do not ask for any conditional
assessments for the last remaining category $p_k$. As the condition of summing to one
must be fulfilled, conditioning on specific values of all $p_1, p_2, \dots, p_{k-1}$ gives a fixed value
for $p_k$. Then no upper or lower quartiles can be assessed for $p_k$, as mentioned before.
Conditional medians of $Y_k$ given specific values of $Y_1, Y_2, \dots, Y_{k-1}$ can be automatically
computed when needed, as will be shown later.
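The normalization of conditional medians described in this subsection can be sketched as follows: the probability mass left over after the conditioning values is shared among the remaining categories in proportion to the expert's assessed conditional medians. The function name and the numbers are hypothetical.

```python
def normalize_conditional(fixed, assessed):
    """Rescale assessed conditional medians so that the conditioning values
    plus the rescaled medians sum to one."""
    remaining = 1.0 - sum(fixed)
    total = sum(assessed)
    return [remaining * value / total for value in assessed]

fixed = [0.35, 0.20]            # hypothetical conditioning values for p_1 and p_2 (step i = 2)
assessed = [0.25, 0.15, 0.10]   # hypothetical assessed medians for p_3, p_4, p_5
coherent = normalize_conditional(fixed, assessed)
```

With these inputs the remaining mass is 0.45 and the assessed medians total 0.50, so each assessment is scaled by 0.9.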
8.4 Eliciting prior hyperparameters
The normalizing one-to-one functions in equations (8.2) and (8.4) are used to transform the
assessed conditional quartiles of $\underline{p}$ into conditional quartiles of $\underline{Y}_k$ and, hence, into conditional
expectations, variances and covariances of the multivariate normal variables. In particular,
letting $M(X)$ denote the median function of the random variable $X$, we proceed as follows.
Let
\[ m^*_{j,0} = \begin{cases} m_1, & \text{for } j = 1, \\ M(p_j \mid p_1 = m_1), & \text{for } j = 2, 3, \dots, k. \end{cases} \tag{8.11} \]
Since the normal variates, $Y_j = \log(p_j/p_1)$, $j = 2, \dots, k$, depend on the fill-up probability
$p_1$, eliciting prior hyperparameters for $\underline{Y}^*$ is tractable if we condition on $p_1$. That is why we
define the extra normal variate $Y_1$ as in (8.4) and the conditional medians, $m^*_{j,0}$, as in (8.11).
These conditional medians are required instead of the assessed unconditional medians, $m_j$,
to elicit the hyperparameters of the logistic normal prior distribution. However, we chose
to elicit the unconditional medians as they are easier to assess than conditional medians.
Fortunately, under the normality assumption of $\underline{Y}_k$ and the unit sum constraint of $\underline{p}$, we will
show in Theorem 8.1 below that the marginal unconditional medians, $m_j$, are identical to the
conditional medians, $m^*_{j,0}$, of $p_j$, for $j = 1, 2, \dots, k$, respectively, provided the lognormal sum
is adequately approximated by another lognormal random variable.
For $i = 1, 2, \dots, k$, let
\[ m_{i,0} = E(Y_i). \tag{8.12} \]
Remark 8.1
It is worth noting that $Y_1 = m_{1,0}$ when $p_1 = m^*_{1,0}$, but $Y_i = m_{i,0}$ when both $p_i = m^*_{i,0}$ and
$p_1 = m^*_{1,0}$, for $i = 2, 3, \dots, k$.
Extensive use is made of the fact that each $Y_i$ follows a symmetric distribution (each has
a normal distribution), so $E(Y_i) = M(Y_i)$. This is a key assumption in proving the following
lemma, which states an important result that is needed in the proof of Theorem 8.1.
Lemma 8.1. Under the unit sum constraint of $\underline{p}$, and the multivariate normality of $\underline{Y}_k$,
\[ \sum_{i=1}^{k} m^*_{i,0} = 1. \]
Proof
As given by Rao (2002, p. 522), the conditional distribution of any subset of singular normal
random variables is normally distributed with the usual conditional mean and variance, but
with generalized inverses of matrices. This property enables us to write, as in the non-singular
case,
\[ E(Y_k) = E[Y_k \mid Y_1 = E(Y_1)] = E[Y_k \mid Y_1 = E(Y_1), Y_2 = E(Y_2), \dots, Y_{k-1} = E(Y_{k-1})]. \]
Then, replacing means by medians and using (8.12), we get
\[ M(Y_k \mid Y_1 = m_{1,0}) = M(Y_k \mid Y_1 = m_{1,0}, Y_2 = m_{2,0}, \dots, Y_{k-1} = m_{k-1,0}). \]
Hence, from Remark 8.1,
\[ M[\log(p_k) - \log(p_1) \mid p_1 = m^*_{1,0}] = M[\log(p_k) - \log(p_1) \mid p_1 = m^*_{1,0}, \dots, p_{k-1} = m^*_{k-1,0}], \]
which gives
\[ \log(m^*_{k,0}) - \log(m^*_{1,0}) = \log[M(p_k \mid p_1 = m^*_{1,0}, \dots, p_{k-1} = m^*_{k-1,0})] - \log(m^*_{1,0}), \]
i.e.
\[ m^*_{k,0} = M(p_k \mid p_1 = m^*_{1,0}, p_2 = m^*_{2,0}, \dots, p_{k-1} = m^*_{k-1,0}) = 1 - \sum_{i=1}^{k-1} m^*_{i,0}. \]
This is the unit sum constraint, which completes the proof of Lemma 8.1.
The main idea in Theorem 8.1 is that the fill-up category can be changed from the first category to any other category, and the same assumptions are still valid. We first give some relations and notations needed for the proof of the theorem.
Let $\underline{Y}_{(1)} = \underline{Y}^*$, and denote the mean vector and variance-covariance matrix of the multivariate normal distribution of $\underline{Y}_{(1)}$ by $\underline{\mu}_{(1)} = \underline{\mu}_{k-1}$ and $\Sigma_{(1)} = \Sigma_{k-1}$. We suppose $\underline{\mu}_{(1)}$ and $\Sigma_{(1)}$ have already been assessed. Moreover, let $Y_{1,1} = \log(p_1) - \log(1 - p_1)$, with $E_1 = E(Y_{1,1})$ and $V_1 = \mathrm{Var}(Y_{1,1})$.
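To change the fill-up category from the first category to any other category $j$, for $j = 2, 3, \ldots, k$, let
$$\underline{Y}_{(j)} = \begin{pmatrix} Y_{1,j} \\ \vdots \\ Y_{j-1,j} \\ Y_{j+1,j} \\ \vdots \\ Y_{k,j} \end{pmatrix} = \begin{pmatrix} \log(p_1) - \log(p_j) \\ \vdots \\ \log(p_{j-1}) - \log(p_j) \\ \log(p_{j+1}) - \log(p_j) \\ \vdots \\ \log(p_k) - \log(p_j) \end{pmatrix}, \qquad Y_{j,j} = \log(p_j) - \log(1 - p_j),$$
with
$$\underline{\mu}_{(j)} = E(\underline{Y}_{(j)}), \quad \Sigma_{(j)} = \mathrm{Var}(\underline{Y}_{(j)}), \quad E_j = E(Y_{j,j}), \quad V_j = \mathrm{Var}(Y_{j,j}),$$
and
$$\mu_{i,j} = E(Y_{i,j}), \quad \sigma^2_{i,j} = \mathrm{Var}(Y_{i,j}), \quad i, j = 1, 2, \ldots, k, \quad i \neq j.$$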
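It can easily be shown that, for $j = 2, \ldots, k$,
$$\underline{Y}_{(j)} = F_j \underline{Y}_{(j-1)}, \tag{8.13}$$
where $F_j$ is the identity matrix of degree $k-1$ with the $j$th column replaced by a column of $-1$. From the normality assumption of $\underline{Y}_{(1)}$, and in view of (8.13), we have
$$\underline{Y}_{(j)} \sim \mathrm{MVN}(\underline{\mu}_{(j)}, \Sigma_{(j)}),$$
with
$$\underline{\mu}_{(j)} = F_j \underline{\mu}_{(j-1)}, \qquad \Sigma_{(j)} = F_j \Sigma_{(j-1)} F_j'.$$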
Approximate normality of each $Y_{j,j}$, for $j = 2, 3, \ldots, k$, is thus induced from the normality assumption of $\underline{Y}_{(j)}$ in a manner exactly similar to that for $Y_{1,1}$. Hence, for each $j = 1, 2, \ldots, k$, we can also assume that the $k$ random variables $Y_{i,j}$, for $i = 1, 2, \ldots, k$, are multivariate normally distributed. Moreover, using the normality assumption of $Y_{1,1}$, we assume that the $k + 1$ random variables $Y_{1,1}$ and $Y_{i,j}$, for $i = 1, 2, \ldots, k$, are also multivariate normally distributed for each $j = 1, 2, \ldots, k$.
Theorem 8.1. For any $j = 2, 3, \ldots, k$, under the unit sum constraint of $\underline{p}$, and the multivariate normality of $\underline{Y}_{(j)}$,
$$m_j = M(p_j) = M(p_j \mid p_1 = m^*_{1,0}) = m^*_{j,0}.$$
Proof
Let
$$m_{i,(j)} = M[\log(p_i) - \log(p_j)], \quad i = 1, 2, \ldots, k, \quad i \neq j,$$
then
$$m_{i,(j)} = E(Y_{i,j}) = E(Y_{i,j} \mid Y_{j,j} = E_j) = M[\log(p_i) - \log(p_j) \mid p_j = M(p_j)].$$
Hence, exponentiating both sides of the above relation, we get
$$M[p_i \mid p_j = M(p_j)] = M(p_j) \exp(m_{i,(j)}). \tag{8.14}$$
As in Lemma 8.1, we put
$$M(p_j) + \sum_{i \neq j}^{k} M[p_i \mid p_j = M(p_j)] = 1. \tag{8.15}$$
Solving (8.14) and (8.15) for $M(p_j)$, we get
$$M(p_j) = \frac{1}{1 + \sum_{i \neq j}^{k} \exp(m_{i,(j)})}. \tag{8.16}$$
On the other hand, for $j \neq 1$, since
$$\Pr\{p_j < m^*_{j,0} \mid p_1 = m^*_{1,0}\} = 0.5,$$
then
$$\Pr\{(p_j/p_1) < (m^*_{j,0}/m^*_{1,0}) \mid p_1 = m^*_{1,0}\} = 0.5,$$
and
$$\Pr\{\log(p_1/p_j) < \log(m^*_{1,0}/m^*_{j,0}) \mid Y_{1,1} = E_1\} = 0.5.$$
So, we can write
$$\log(m^*_{1,0}/m^*_{j,0}) = M(Y_{1,j} \mid Y_{1,1} = E_1) = E(Y_{1,j} \mid Y_{1,1} = E_1) = E(Y_{1,j}) = m_{1,(j)}. \tag{8.17}$$
Moreover, for $j \neq i \neq 1$, since
$$m_{i,(j)} = E(Y_{i,j} \mid Y_{1,1} = E_1, Y_{j,j} = E(Y_{j,j} \mid Y_{1,1} = E_1)) = M[\log(p_i/p_j) \mid p_1 = m^*_{1,0}, p_j = m^*_{j,0}],$$
we have that
$$\Pr\{\log(p_i/p_j) < m_{i,(j)} \mid p_1 = m^*_{1,0}, p_j = m^*_{j,0}\} = 0.5,$$
and
$$\Pr\{p_i < m^*_{j,0} \exp(m_{i,(j)}) \mid p_1 = m^*_{1,0}\} = 0.5.$$
So,
$$m^*_{i,0} = m^*_{j,0} \exp(m_{i,(j)}),$$
which gives
$$m_{i,(j)} = \log(m^*_{i,0}/m^*_{j,0}). \tag{8.18}$$
Substituting (8.17) and (8.18) into (8.16) shows that $M(p_j)$ is as stated in Theorem 8.1.
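The final substitution in the proof can be checked numerically. The sketch below is illustrative only: it assumes the log-ratios take the form log(m*_i / m*_j), as in (8.17) and (8.18), and it uses a hypothetical set of conditional medians that sum to one. The function name `marginal_median` is not from the thesis.

```python
import math

def marginal_median(cond_medians, j):
    """Evaluate (8.16) for category j (0-based index).

    Assumes m_{i,(j)} = log(m*_i / m*_j), following (8.17)-(8.18).
    """
    m_j = cond_medians[j]
    log_ratios = [math.log(m_i / m_j)
                  for i, m_i in enumerate(cond_medians) if i != j]
    return 1.0 / (1.0 + sum(math.exp(r) for r in log_ratios))

# Hypothetical conditional medians satisfying the unit sum constraint.
medians = [0.50, 0.28, 0.04, 0.06, 0.12]
recovered = [marginal_median(medians, j) for j in range(len(medians))]
```

Under the unit sum constraint each recovered value coincides with the corresponding conditional median, which is the content of Theorem 8.1.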
8.4.1 Eliciting a mean vector
To elicit a mean vector $\underline{\mu}_k = (m_{1,0}, m_{2,0}, \ldots, m_{k,0})'$, we put
$$m_{1,0} = E(Y_1) = M(Y_1) \tag{8.19}$$
$$= M(\log(p_1) - \log(1 - p_1)) = \log(m^*_{1,0}) - \log(1 - m^*_{1,0}). \tag{8.20}$$
For $i = 2, 3, \ldots, k$, put
$$m_{i,0} = E(Y_i) = E[Y_i \mid Y_1 = E(Y_1)] \tag{8.21}$$
$$= M(Y_i \mid Y_1 = m_{1,0}) = M[\log(p_i) - \log(p_1) \mid p_1 = m^*_{1,0}] = \log(m^*_{i,0}) - \log(m^*_{1,0}). \tag{8.22}$$
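Equations (8.20) and (8.22) translate directly into a few lines of code. The following is a minimal sketch, not the PEGS software itself; the function name `elicit_mean_vector` and the input medians are hypothetical.

```python
import math

def elicit_mean_vector(medians):
    """Map medians (m*_1, ..., m*_k) to (m_{1,0}, ..., m_{k,0}).

    The first entry is the logit of m*_1, as in (8.20); the remaining
    entries are log-ratios against m*_1, as in (8.22).
    """
    m1 = medians[0]
    mean = [math.log(m1) - math.log(1.0 - m1)]
    mean += [math.log(m) - math.log(m1) for m in medians[1:]]
    return mean

# Hypothetical medians that sum to one.
mean_vector = elicit_mean_vector([0.5, 0.25, 0.25])
```

Dropping the first entry of the result leaves the mean vector of the $k-1$ log-ratio variates.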
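8.4.2 Eliciting a variance-covariance matrix
For $i = 1, 2, \ldots, k-2$, and $j = i+1, \ldots, k-1$, let
$$m^*_{j,i} = M(p_j \mid p_1 = m^*_{1,0} + \eta^*_1,\ p_2 = m^*_{2,1} + \eta^*_2,\ \ldots,\ p_i = m^*_{i,i-1} + \eta^*_i).$$
Then
$$m_{j,i} = \log\left(\frac{m^*_{j,i}}{m^*_{1,1}}\right). \tag{8.23}$$
For $i = 1, 2, \ldots, k-2$, define $\eta_i$ by letting $Y_i = m_{i,i-1} + \eta_i$ when $p_i = m^*_{i,i-1} + \eta^*_i$. Then
$$m_{j,i} = E(Y_j \mid Y_1 = m_{1,0} + \eta_1,\ Y_2 = m_{2,1} + \eta_2,\ \ldots,\ Y_i = m_{i,i-1} + \eta_i), \tag{8.24}$$
and
$$\eta_i = \begin{cases} \log\left(\dfrac{m^*_{1,0} + \eta^*_1}{1 - (m^*_{1,0} + \eta^*_1)}\right) - \log\left(\dfrac{m^*_{1,0}}{1 - m^*_{1,0}}\right), & \text{for } i = 1, \\[2ex] \log\left(\dfrac{m^*_{i,i-1} + \eta^*_i}{m^*_{1,1}}\right) - \log\left(\dfrac{m^*_{i,i-1}}{m^*_{1,1}}\right), & \text{for } i = 2, 3, \ldots, k-2. \end{cases}$$
Analogous to $m^*_{i,i} = m^*_{i,i-1} + \eta^*_i$, define
$$m_{i,i} = m_{i,i-1} + \eta_i, \quad \text{for } i = 1, 2, \ldots, k-2, \tag{8.25}$$
so $Y_i = m_{i,i}$ when $p_i = m^*_{i,i}$.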
For
2
= 1,2, • • • , k - 2, and j =
2
+ 1, • • ■,k — l, analogous to 0Y = mjh -
define
so th at
6 j,i
- log
/ '’’ft? t-1 + 0? *\
Im
" \
"
-lo g
V
m ?.i
/
V " ‘ 1,1
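To elicit a (singular) variance-covariance matrix $\Sigma_k$ of rank $k - 1$, let
$$V_1 = \mathrm{Var}(Y_1) = \left(\frac{U_1 - L_1}{1.349}\right)^2, \tag{8.26}$$
where $U_1$ and $L_1$ are the upper and lower quartiles of $Y_1$, respectively. We have that
$$U_1 = \log\left(\frac{U^*_1}{1 - U^*_1}\right), \qquad L_1 = \log\left(\frac{L^*_1}{1 - L^*_1}\right).$$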
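For $i = 1, 2, \ldots, k-2$, and $j = i+1, \ldots, k-1$, let
$$V_{j,i} = \mathrm{Var}(Y_j \mid Y_1 = m_{1,0}, Y_2 = m_{2,0}, \ldots, Y_i = m_{i,0}),$$
so that
$$V_{j,j-1} = \left(\frac{U_j - L_j}{1.349}\right)^2, \quad \text{for } j = 2, 3, \ldots, k-1, \tag{8.27}$$
with
$$U_j = \log\left(\frac{U^*_j}{m^*_{1,0}}\right), \qquad L_j = \log\left(\frac{L^*_j}{m^*_{1,0}}\right).$$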
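As an illustration of (8.26) and (8.27), the sketch below converts assessed quartiles of a probability into a variance on the transformed scale, assuming (as is standard for a normal variate) that the variance is the squared ratio of the interquartile range to 1.349. The function names are hypothetical; the input values are the quartiles quoted in the transport example of Section 8.6 and are used purely for illustration.

```python
import math

IQR_STD_NORMAL = 1.349  # interquartile range of a standard normal variate

def variance_first_category(upper_q, lower_q):
    """V_1 from quartiles of p_1, mapped to the logit scale as in (8.26)."""
    u = math.log(upper_q / (1.0 - upper_q))
    l = math.log(lower_q / (1.0 - lower_q))
    return ((u - l) / IQR_STD_NORMAL) ** 2

def conditional_variance(upper_q, lower_q, m1_median):
    """V_{j,j-1} from conditional quartiles of p_j, as in (8.27).

    Quartiles are mapped to the log-ratio scale against the median of p_1.
    """
    u = math.log(upper_q / m1_median)
    l = math.log(lower_q / m1_median)
    return ((u - l) / IQR_STD_NORMAL) ** 2

v1 = variance_first_category(0.62, 0.43)
v2 = conditional_variance(0.36, 0.18, 0.49)
```

Note that the median of $p_1$ cancels in the difference $U_j - L_j$, so the conditional variance depends only on the ratio of the two assessed quartiles.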
Having defined the above quantities, we are ready to state and prove the following two lemmas.
Lemma 8.2. Under the assumptions of Lemma 8.1, for $i = 2, \ldots, k-1$,
1. $(p_i \mid p_1 = m^*_{1,0}, p_2 = m^*_{2,0}, \ldots, p_{i-1} = m^*_{i-1,0}) \sim \mathrm{Lognormal}(\mu^*_i, V^*_i)$, where
$$\mu^*_i = m_{i,0} + \log(m^*_{1,0}) = \log(m^*_{i,0}),$$
and
$$V^*_i = V_{i,i-1} = \left(\frac{U_i - L_i}{1.349}\right)^2.$$
2. $\Pr\left\{p_i > 1 - \sum_{j=1}^{i-1} m^*_{j,0}\right\} < \alpha$ if and only if
$$\frac{U^*_i}{L^*_i} < \exp\left\{\frac{1.349}{z_{1-\alpha}} \log\left(\frac{1 - \sum_{j=1}^{i-1} m^*_{j,0}}{m^*_{i,0}}\right)\right\},$$
where $z_\alpha$ is the $\alpha$ quantile of the standard normal distribution.
Proof
From the normality of $\underline{Y}_k$ together with property (v) of the singular normal distribution in (Rao, 2002, p 522), we have
$$(Y_i \mid Y_1 = m_{1,0}, \ldots, Y_{i-1} = m_{i-1,0}) \sim N(m_{i,0}, V_{i,i-1}).$$
Then for known fixed $m^*_{1,0}$,
$$(Y_i + \log(m^*_{1,0}) \mid Y_1 = m_{1,0}, \ldots, Y_{i-1} = m_{i-1,0}) \sim N(m_{i,0} + \log(m^*_{1,0}), V_{i,i-1}).$$
The one-to-one transformations in (8.2) and (8.4) then give
$$\left(\frac{p_i}{p_1} m^*_{1,0} \,\Big|\, p_1 = m^*_{1,0}, \ldots, p_{i-1} = m^*_{i-1,0}\right) = (p_i \mid p_1 = m^*_{1,0}, \ldots, p_{i-1} = m^*_{i-1,0}) \sim \mathrm{Lognormal}(m_{i,0} + \log(m^*_{1,0}), V_{i,i-1}).$$
Using equation (8.22), the first statement of the lemma is proved.
To prove the second statement, we use standard normal distribution theory and the first statement of this lemma to state that
$$\Pr\left\{\frac{\log(p_i) - \mu^*_i}{\sqrt{V^*_i}} > \frac{\log\left(1 - \sum_{j=1}^{i-1} m^*_{j,0}\right) - \mu^*_i}{\sqrt{V^*_i}}\right\} < \alpha,$$
if and only if
$$\frac{\log\left(1 - \sum_{j=1}^{i-1} m^*_{j,0}\right) - \mu^*_i}{\sqrt{V^*_i}} > z_{1-\alpha},$$
or, equivalently, if and only if
$$\sqrt{V^*_i} < \frac{\log\left(1 - \sum_{j=1}^{i-1} m^*_{j,0}\right) - \mu^*_i}{z_{1-\alpha}}.$$
This proves the second statement.
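The second statement of Lemma 8.2 bounds how far apart the conditional quartiles may be. The sketch below is an illustration under the lognormal result of the first statement; the function names and numerical inputs are hypothetical, and it requires $\alpha < 0.5$ and a remaining probability mass larger than the median.

```python
import math
from statistics import NormalDist

def max_quartile_ratio(alpha, remaining_mass, median_i):
    """Largest admissible U*_i / L*_i such that Pr(p_i > remaining_mass) < alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return math.exp((1.349 / z) * math.log(remaining_mass / median_i))

def exceed_probability(upper_q, lower_q, remaining_mass, median_i):
    """Pr(p_i > remaining_mass) for a lognormal with the given median and quartiles."""
    sd = math.log(upper_q / lower_q) / 1.349
    z = (math.log(remaining_mass) - math.log(median_i)) / sd
    return 1.0 - NormalDist().cdf(z)

# Hypothetical values: median 0.28 for category i, with 0.51 of the
# probability mass left after the earlier categories are fixed.
bound = max_quartile_ratio(0.05, 0.51, 0.28)
prob_at_bound = exceed_probability(0.28 * math.sqrt(bound),
                                   0.28 / math.sqrt(bound), 0.51, 0.28)
```

Quartiles whose ratio sits exactly at the bound give an exceedance probability equal to $\alpha$; any narrower pair gives a smaller probability.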
Lemma 8.3. Under the assumptions of Lemma 8.1,
$$m^*_{k,i} = 1 - \sum_{j=1}^{i} m^*_{j,j} - \sum_{j=i+1}^{k-1} m^*_{j,i}.$$
Proof
Using equation (8a.2.11) of (Rao, 2002, p 522), for $i = 1, 2, \ldots, k-2$, we can state that
$$E[Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}] = E[Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}, Y_{i+1} = E(Y_{i+1} \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}), \ldots, Y_{k-1} = E(Y_{k-1} \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i})].$$
Then, from definitions (8.24) and (8.25),
$$M(Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}) = M(Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}, Y_{i+1} = m_{i+1,i}, \ldots, Y_{k-1} = m_{k-1,i}).$$
Hence
$$M[\log(p_k) - \log(p_1) \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i}] = M[\log(p_k) - \log(p_1) \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i}, p_{i+1} = m^*_{i+1,i}, \ldots, p_{k-1} = m^*_{k-1,i}],$$
which, utilizing equations (8.9) and (8.10), gives
$$\log(m^*_{k,i}) - \log(m^*_{1,1}) = \log[M(p_k \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i}, p_{i+1} = m^*_{i+1,i}, \ldots, p_{k-1} = m^*_{k-1,i})] - \log(m^*_{1,1}),$$
i.e.
$$m^*_{k,i} = M(p_k \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i}, p_{i+1} = m^*_{i+1,i}, \ldots, p_{k-1} = m^*_{k-1,i}).$$
Since the condition in the RHS of the above equation relates to all $p$s except $p_k$, applying the unit sum constraint gives the conditional median in the form of the following complement:
$$m^*_{k,i} = 1 - \sum_{j=1}^{i} m^*_{j,j} - \sum_{j=i+1}^{k-1} m^*_{j,i},$$
which ends the proof of Lemma 8.3.
Now, we modify the method of Kadane et al (1980) to show that the quantities in (8.24)-(8.27) are sufficient to elicit a positive-definite variance-covariance matrix $V_{k-1}$ for $\underline{Y}_{k-1} = (Y_1, \ldots, Y_{k-1})'$. Then, based on the condition $\sum_{i=1}^{k} p_i = 1$, and assuming that it is the only constraint on sums of these probabilities, we add a $k$th row and column to get $\Sigma_k$ as a singular variance-covariance matrix for all the elements of $\underline{Y}_k$. Removing the first row and column of $\Sigma_k$ will lead to the desired positive-definite variance-covariance matrix $\Sigma_{k-1}$ of $\underline{Y}^*$.
For $i = 1, 2, \ldots, k-1$, let
$$\underline{Y}_i = (Y_1, \ldots, Y_i)',$$
and
$$V_i = \mathrm{Var}(\underline{Y}_i),$$
with $V_1$ as defined in (8.26). Suppose that $V_{i-1}$ has been estimated as a positive-definite matrix. We aim now to elicit $V_i$ and investigate its positive-definiteness.
$V_i$ can be partitioned as
$$V_i = \begin{pmatrix} V_{i-1} & V_{i-1}\underline{u}_i \\ \underline{u}_i' V_{i-1} & \sigma_i^2 \end{pmatrix}, \tag{8.28}$$
where
$$V_{i-1}\underline{u}_i = \mathrm{Cov}(\underline{Y}_{i-1}, Y_i),$$
and
$$\sigma_i^2 = \mathrm{Var}(Y_i).$$
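It is well-known from multivariate normal distribution theory that
$$E(Y_i \mid \underline{Y}_{i-1}) - E(Y_i) = [\underline{Y}_{i-1} - E(\underline{Y}_{i-1})]' V_{i-1}^{-1} V_{i-1} \underline{u}_i = [\underline{Y}_{i-1} - E(\underline{Y}_{i-1})]' \underline{u}_i. \tag{8.29}$$
Moreover, for $j \le i - 1$, taking the conditional expectation of both sides of (8.29), given that
$$\underline{Y}_j = \underline{y}_j = (m_{1,0} + \eta_1,\ m_{2,1} + \eta_2,\ \ldots,\ m_{j,j-1} + \eta_j)',$$
gives
$$E[E(Y_i \mid \underline{Y}_{i-1}) \mid \underline{Y}_j = \underline{y}_j] - E(Y_i) = E\{[\underline{Y}_{i-1} - E(\underline{Y}_{i-1})] \mid \underline{Y}_j = \underline{y}_j\}' \underline{u}_i, \tag{8.30}$$
i.e.
$$E(Y_i \mid \underline{Y}_j = \underline{y}_j) - E(Y_i) = \big(y_1 - E(Y_1),\ y_2 - E(Y_2),\ \ldots,\ y_j - E(Y_j),\ E(Y_{j+1} \mid \underline{Y}_j = \underline{y}_j) - E(Y_{j+1}),\ \ldots,\ E(Y_{i-1} \mid \underline{Y}_j = \underline{y}_j) - E(Y_{i-1})\big)\, \underline{u}_i. \tag{8.31}$$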
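From (8.24) and (8.31) we get
$$m_{i,j} - m_{i,0} = \big(\eta_1,\ m_{2,1} - m_{2,0} + \eta_2,\ \ldots,\ m_{j,j-1} - m_{j,0} + \eta_j,\ m_{j+1,j} - m_{j+1,0},\ \ldots,\ m_{i-1,j} - m_{i-1,0}\big)\, \underline{u}_i.$$
This holds for $j = 1, 2, \ldots, i-1$, so we have a system of $i - 1$ equations of the form
$$\underline{T}_i = Q_{i-1}\underline{u}_i, \tag{8.32}$$
where
$$\underline{T}_i = \begin{pmatrix} m_{i,1} - m_{i,0} \\ m_{i,2} - m_{i,0} \\ \vdots \\ m_{i,i-1} - m_{i,0} \end{pmatrix},$$
and
$$Q_{i-1} = \begin{pmatrix} \eta_1 & m_{2,1} - m_{2,0} & m_{3,1} - m_{3,0} & \cdots & m_{i-1,1} - m_{i-1,0} \\ \eta_1 & m_{2,1} - m_{2,0} + \eta_2 & m_{3,2} - m_{3,0} & \cdots & m_{i-1,2} - m_{i-1,0} \\ \eta_1 & m_{2,1} - m_{2,0} + \eta_2 & m_{3,2} - m_{3,0} + \eta_3 & \cdots & m_{i-1,3} - m_{i-1,0} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \eta_1 & m_{2,1} - m_{2,0} + \eta_2 & m_{3,2} - m_{3,0} + \eta_3 & \cdots & m_{i-1,i-2} - m_{i-1,0} + \eta_{i-1} \end{pmatrix}.$$
Since $m_{i,j} - m_{i,j-1} = \theta_{i,j}$, $j = 1, 2, \ldots, i-1$, multiplying both sides of (8.32) from the left by the matrix
$$M_{i-1} = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 & 1 \end{pmatrix}$$
gives
$$\begin{pmatrix} \theta_{i,1} \\ \theta_{i,2} \\ \vdots \\ \theta_{i,i-1} \end{pmatrix} = \begin{pmatrix} \eta_1 & \theta_{2,1} & \cdots & \theta_{i-1,1} \\ 0 & \eta_2 & \cdots & \theta_{i-1,2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \eta_{i-1} \end{pmatrix} \underline{u}_i.$$
Provided that
$$\eta_j \neq 0, \quad j = 1, 2, \ldots, i-1,$$
the upper triangular matrix $M_{i-1}Q_{i-1}$ is non-singular and hence
$$\underline{u}_i = \begin{pmatrix} \eta_1 & \theta_{2,1} & \cdots & \theta_{i-1,1} \\ 0 & \eta_2 & \cdots & \theta_{i-1,2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \eta_{i-1} \end{pmatrix}^{-1} \begin{pmatrix} \theta_{i,1} \\ \theta_{i,2} \\ \vdots \\ \theta_{i,i-1} \end{pmatrix}.$$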
• Since
$$\mathrm{Var}(Y_i \mid \underline{Y}_{i-1}) = \mathrm{Var}(Y_i) - \underline{u}_i' V_{i-1} \underline{u}_i,$$
we can now use the assessed conditional variance given by $V_{i,i-1}$ in (8.27) to estimate the unconditional variance of $Y_i$ as follows:
$$\sigma_i^2 = V_{i,i-1} + \underline{u}_i' V_{i-1} \underline{u}_i.$$
Using the Schur complement, the matrix $V_i$ is positive-definite if and only if
$$\sigma_i^2 - \underline{u}_i' V_{i-1} \underline{u}_i > 0,$$
which is guaranteed from (8.27) since $V_{i,i-1} > 0$.
Choosing the arbitrary values $\eta^*_j \neq 0$, $j = 1, 2, \ldots, i-1$, guarantees the existence of a unique solution for $\underline{u}_i$. It can be seen from the relation
$$\eta_j = \begin{cases} \log\left(\dfrac{m^*_{1,0} + \eta^*_1}{1 - (m^*_{1,0} + \eta^*_1)}\right) - \log\left(\dfrac{m^*_{1,0}}{1 - m^*_{1,0}}\right), & \text{for } j = 1, \\[2ex] \log\left(\dfrac{m^*_{j,j-1} + \eta^*_j}{m^*_{1,1}}\right) - \log\left(\dfrac{m^*_{j,j-1}}{m^*_{1,1}}\right), & \text{for } j = 2, 3, \ldots, i-1, \end{cases}$$
that $\eta_j = 0$ if and only if $\eta^*_j = 0$, $j = 1, 2, \ldots, i-1$.
• So far, the proposed method estimates $V_i$ as a positive-definite matrix, assuming that $V_{i-1}$ is positive-definite. Since $V_1 > 0$, the method yields a positive-definite matrix $V_{k-1}$, by mathematical induction.
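One step of the induction can be sketched in code. The example below is a pure-Python illustration, not the thesis software: it assumes the upper triangular matrix $M_{i-1}Q_{i-1}$ (diagonal entries $\eta_j$, off-diagonal entries $\theta$) and the right-hand side of $\theta_{i,j}$ values have already been formed, and all names and numbers are hypothetical.

```python
def solve_upper_triangular(A, b):
    """Back-substitution for A u = b, with A upper triangular."""
    n = len(b)
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        if A[r][r] == 0.0:
            raise ValueError("each eta_j on the diagonal must be non-zero")
        tail = sum(A[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (b[r] - tail) / A[r][r]
    return u

def extend_covariance(V_prev, A, theta_i, cond_var):
    """Build V_i from V_{i-1}, following the partition in (8.28).

    A        : upper triangular matrix M_{i-1} Q_{i-1}
    theta_i  : vector (theta_{i,1}, ..., theta_{i,i-1})
    cond_var : assessed conditional variance V_{i,i-1} from (8.27)
    """
    n = len(theta_i)
    u = solve_upper_triangular(A, theta_i)
    Vu = [sum(V_prev[r][c] * u[c] for c in range(n)) for r in range(n)]
    sigma2 = cond_var + sum(u[r] * Vu[r] for r in range(n))
    V_new = [row[:] + [Vu[r]] for r, row in enumerate(V_prev)]
    V_new.append(Vu + [sigma2])
    return V_new, u

# Hypothetical inputs for the step from V_1 to V_2.
V2, u2 = extend_covariance([[0.3]], [[0.2]], [0.1], 0.5)
schur = V2[1][1] - V2[0][1] ** 2 / V2[0][0]
```

The Schur complement of the new matrix equals the assessed conditional variance, so positive-definiteness is preserved whenever that variance is positive.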
Estimating the last row and column of $\Sigma_k$
Let $\Sigma_k$ be partitioned as follows
$$\Sigma_k = \begin{pmatrix} V_{k-1} & V_{k-1}\underline{u}_k \\ \underline{u}_k' V_{k-1} & \sigma_k^2 \end{pmatrix}, \tag{8.33}$$
where
$$V_{k-1}\underline{u}_k = \mathrm{Cov}(\underline{Y}_{k-1}, Y_k),$$
and
$$\sigma_k^2 = \mathrm{Var}(Y_k).$$
Note that, according to the condition that elements of $\underline{p}$ must sum to one, the conditional variance of $Y_k$, given any specific value for $\underline{Y}_{k-1}$, has a fixed value of zero. Hence, using the standard theory of the multivariate normal distribution, we estimate $\sigma_k^2$ as
$$\sigma_k^2 = \underline{u}_k' V_{k-1} \underline{u}_k.$$
• To estimate $\underline{u}_k$ we write, as in (8.29),
$$E(Y_k \mid \underline{Y}_{k-1}) - E(Y_k) = [\underline{Y}_{k-1} - E(\underline{Y}_{k-1})]' \underline{u}_k. \tag{8.34}$$
Exploiting the condition that $\sum_{i=1}^{k} p_i = 1$, we can obtain $k - 1$ estimates of $E(Y_k \mid \underline{Y}_{k-1})$ from $k - 1$ different sets of conditioning values for $\underline{Y}_{k-1}$. More precisely, let
m k>o
=
E[Yk \Yi =
m fc,i
= E \Y k \Yi
m i )0, Y2 =
-
m k)i = E[Yk \Yi =
• • • , Yk- i
m i , i , Y2
, , Y2=
7771 1
m 2,o,
1
>0],
rn2)0, ■■ •
=
, , • • • , T i-
7772 2
= m k-
1
=
777
Yi+1 = 777i+i,i, • • • , Yfc_2 =
; _ i , ; _ i , Yi
, Y fc- i
=
777
= ^ U fc-i.o ],
*,*—! ,
= "7fc-l,fc-l],
for i = 2,3, • • • , k —2,
Wlfc.jfe-l = £?[Yfc|Yi = 7771,1,^2 = 7^2,2, • • • , Yfc-2 = ^fc-2,fc-2, ^fc-1 = ^fc-l,fc-l],
where m k- i ik- \ is an arbitrary value, which will be chosen such th a t
l,fc—1 7^ ^fc—1,0*
We require 777.fc—i,fc_i
7
^
^ fc - 1,0 ln order to solve the resulting system of equations, as
will be shown later.
This gives the system of k — 1 equations,
Tfc = Q k- \ u k,
(8.35)
where
1
m k ,0
2
rnk,Q
Tk =
'W'k^k—1
'm,k,Q
m
0
0
0
•••
0
m
m 2 ,l
m 3,2
''’
m 'k-2,k-3
m k-l,k-l
m
™ 2 ,2
m 3,2
■ ■•
m 'k-2,k-3
m 'k-l,k-l
m
m 2,2
m 3,3
m k —2 , k —3
m k-l,k-l
m 'k-2,k-2
m 'k-l,k-l
Q k —1 —
m
m 2,2
r r iij -
rrn t
m 3,3
and
m[ j
=
i = 2, 3, • • • , k - 1,
o,
j = i - 1, i.
We multiply both sides of (8.35) from the left by the m atrix M k~i, which has a different
structure from
(i < k), taking the form
1
M k -1 —
0
0
0 -1
1
0
0
0
0
-1
0
0
1
0
0
0
-1
The system of equations can then be w ritten as
n^k, i
rrikfi
m
. m k,3 — m k,2
0
T]2
0
(8.36)
'nr1/k,k—1
^ k ,0
mkjk —2
TYlk,k—l
0
-m
~ m 2,2
0
Vk- 2
•••
~ m k - 2 ,k- 2
Vk- 1_
where
™>iti = rrii,i ~ m , o,
Vk-i — m k -
i=
2
,3, • • • , k — 2 ,
i,o -
Provided th a t
rij t^O,
j = l , 2 , k — 1,
the lower triangular m atrix M k - i Q k - i is non-singular and hence
- l
r
mk, i
m
0
-|
p
0
T)2
m k ,o
m k fi ~ m k ,2
U.k =
0
0
-m
— m
m k , k —i
V k-2
- m k -2 ,k -2
2 ,2
r]k-i
m k ,o
m k , k —2
m k , k —l
Positive-definiteness of the variance-covariance matrix
As mentioned before, the inverse of the additive logistic transformation is applied to the $k$ dimension random vector $\underline{p}$, transforming it into the $k - 1$ dimension random vector $\underline{Y}^* = (Y_2, Y_3, \ldots, Y_k)'$. We are interested in the hyperparameter $\Sigma_{k-1}$ as this is the variance-covariance matrix of $\underline{Y}^*$. Although the whole matrix $\Sigma_k$ is clearly a singular matrix, we will show that the submatrix $\Sigma_{k-1}$ is sure to be a positive-definite matrix, provided that no subset of categories has a known fixed sum of probabilities.
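Consider the following partition of the singular multivariate normally distributed $\underline{Y}_k$:
$$\underline{Y}_k = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_{k-1} \\ Y_k \end{pmatrix} = \begin{pmatrix} \log(p_1) - \log(1 - p_1) \\ \log(p_2) - \log(p_1) \\ \log(p_3) - \log(p_1) \\ \vdots \\ \log(p_{k-1}) - \log(p_1) \\ \log(p_k) - \log(p_1) \end{pmatrix} = \begin{pmatrix} Y_1 \\ \underline{Y}^{**} \\ Y_k \end{pmatrix}.$$
Recall that, by definition,
$$\underline{Y}^* = \begin{pmatrix} Y_2 \\ Y_3 \\ \vdots \\ Y_{k-1} \\ Y_k \end{pmatrix} = \begin{pmatrix} \underline{Y}^{**} \\ Y_k \end{pmatrix}.$$
Let $\Sigma_k$ be conformally partitioned as
$$\Sigma_k = \begin{pmatrix} V_1 & \underline{a}' & b \\ \underline{a} & V^* & \underline{c} \\ b & \underline{c}' & \sigma_k^2 \end{pmatrix},$$
where
$V^*$ is a $(k-2) \times (k-2)$ square matrix,
$\underline{a}$ and $\underline{c}$ are $(k-2) \times 1$ vectors,
$V_1$, $\sigma_k^2$ and $b$ are scalars.
The method we used to estimate
$$V_{k-1} = \begin{pmatrix} V_1 & \underline{a}' \\ \underline{a} & V^* \end{pmatrix}$$
guarantees its positive-definiteness, hence $V^*$ is also positive-definite.
The matrix $\Sigma_{k-1}$ is then partitioned as
$$\Sigma_{k-1} = \begin{pmatrix} V^* & \underline{c} \\ \underline{c}' & \sigma_k^2 \end{pmatrix}.$$
For $\Sigma_{k-1}$ to be positive-definite, we must show that
$$\sigma_k^2 > \underline{c}'(V^*)^{-1}\underline{c}.$$
In fact, using the inverse of a partitioned matrix, and for $d = V_1 - \underline{a}'(V^*)^{-1}\underline{a}$, we may write
$$\sigma_k^2 = \begin{pmatrix} b & \underline{c}' \end{pmatrix} \begin{pmatrix} V_1 & \underline{a}' \\ \underline{a} & V^* \end{pmatrix}^{-1} \begin{pmatrix} b \\ \underline{c} \end{pmatrix} = \begin{pmatrix} b & \underline{c}' \end{pmatrix} \begin{pmatrix} d^{-1} & -d^{-1}\underline{a}'(V^*)^{-1} \\ -(V^*)^{-1}\underline{a}\,d^{-1} & (V^*)^{-1} + (V^*)^{-1}\underline{a}\,d^{-1}\underline{a}'(V^*)^{-1} \end{pmatrix} \begin{pmatrix} b \\ \underline{c} \end{pmatrix}$$
$$= \underline{c}'(V^*)^{-1}\underline{c} + \frac{1}{d}\left\{b^2 - 2b[\underline{a}'(V^*)^{-1}\underline{c}] + [\underline{c}'(V^*)^{-1}\underline{a}][\underline{a}'(V^*)^{-1}\underline{c}]\right\}$$
$$= \underline{c}'(V^*)^{-1}\underline{c} + \frac{1}{d}\left[b - \underline{a}'(V^*)^{-1}\underline{c}\right]^2.$$
So, $\Sigma_{k-1}$ is positive-definite if and only if
$$b - \underline{a}'(V^*)^{-1}\underline{c} \neq 0. \tag{8.37}$$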
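The method used to estimate $\Sigma_k$ automatically guarantees the fulfilment of such a condition. In fact, using the following partition of $\underline{u}_k$,
$$\underline{u}_k = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_{k-1} \end{pmatrix} = \begin{pmatrix} u_1 \\ \underline{u}_2 \end{pmatrix},$$
gives
$$\begin{pmatrix} b \\ \underline{c} \end{pmatrix} = V_{k-1}\underline{u}_k = \begin{pmatrix} V_1 & \underline{a}' \\ \underline{a} & V^* \end{pmatrix} \begin{pmatrix} u_1 \\ \underline{u}_2 \end{pmatrix} = \begin{pmatrix} V_1 u_1 + \underline{a}'\underline{u}_2 \\ \underline{a}\,u_1 + V^*\underline{u}_2 \end{pmatrix}.$$
Condition (8.37) thus holds if and only if
$$[V_1 - \underline{a}'(V^*)^{-1}\underline{a}]\, u_1 \neq 0.$$
But $V_1 - \underline{a}'(V^*)^{-1}\underline{a} > 0$ from the positive-definiteness of $V_{k-1}$, and hence $\Sigma_{k-1}$ is positive-definite if and only if $u_1 \neq 0$.
It can be seen from (8.36) that
$$u_1 = \frac{m_{k,1} - m_{k,0}}{\eta_1}.$$
This condition is sure to be fulfilled since
$$m_{k,0} = \log\left(\frac{1 - \sum_{j=1}^{k-1} m^*_{j,0}}{m^*_{1,0}}\right),$$
and
$$m_{k,1} = \log\left(\frac{m^*_{k,1}}{m^*_{1,1}}\right),$$
from which
$$m_{k,1} \neq m_{k,0},$$
unless
$$m^*_{1,1} = m^*_{1,0},$$
which can never occur since
$$\eta^*_1 \neq 0.$$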
So, the proposed method for eliciting the matrix $\Sigma_k$ ensures that $\Sigma_{k-1}$ is positive-definite, even though $\Sigma_k$ is itself singular.
Once $\underline{\mu}_k$ and $\Sigma_k$ have been estimated, equations (8.6)-(8.8) give the hyperparameters $\underline{\mu}_{k-1}$ and $\Sigma_{k-1}$ of the logistic normal prior distribution of $\underline{p}$ based on the normalizing transformations given by $\underline{Y}^*$.
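The role of the condition $u_1 \neq 0$ can be illustrated numerically. The sketch below is a hypothetical pure-Python example: it assembles $\Sigma_k$ from a given $V_{k-1}$ and $\underline{u}_k$ with $\sigma_k^2 = \underline{u}_k' V_{k-1} \underline{u}_k$, drops the first row and column, and tests positive-definiteness with a small Cholesky routine. The matrices and helper names are invented for the illustration.

```python
import math

def assemble_sigma_k(V, u):
    """Append the last row and column, with sigma_k^2 = u' V u."""
    n = len(u)
    Vu = [sum(V[r][c] * u[c] for c in range(n)) for r in range(n)]
    sigma2 = sum(u[r] * Vu[r] for r in range(n))
    Sigma = [row[:] + [Vu[r]] for r, row in enumerate(V)]
    Sigma.append(Vu + [sigma2])
    return Sigma

def drop_first(A):
    """Remove the first row and column of a square matrix."""
    return [row[1:] for row in A[1:]]

def is_positive_definite(A, tol=1e-12):
    """Cholesky-based check; pivots at or below tol count as failure."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = A[r][c] - sum(L[r][t] * L[c][t] for t in range(c))
            if r == c:
                if s <= tol:
                    return False
                L[r][c] = math.sqrt(s)
            else:
                L[r][c] = s / L[c][c]
    return True

V = [[0.3, 0.15], [0.15, 0.575]]                # hypothetical V_{k-1}
sigma_k = assemble_sigma_k(V, [-1.0, -0.5])     # u_1 != 0
sigma_k_bad = assemble_sigma_k(V, [0.0, -0.5])  # u_1 == 0
```

With these inputs the full matrix is singular in both cases, while the submatrix obtained by dropping the first row and column is positive-definite only when the first element of $\underline{u}_k$ is non-zero.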
8.5 Feedback using marginal quartiles of the logistic normal prior
After eliciting the mean vector $\underline{\mu}_{k-1}$ and the variance-covariance matrix $\Sigma_{k-1}$ of $\underline{Y}^*$, the software calculates marginal medians and quartiles of the probability of each category and displays their values as feedback to the expert. Since the initially assessed quartiles were all conditional, it is useful to inform the expert of the marginal quartiles and give her the option of changing them if she wants.
To add this feedback option to the software, we had to develop a reliable technique for estimating marginal quartiles from the elicited hyperparameters $\underline{\mu}_{k-1}$ and $\Sigma_{k-1}$. Moreover, we must correspondingly modify the elicited hyperparameters once the marginal quartiles have been changed by the expert during the feedback stage.
A simple direct method for estimating the marginal moments, or quartiles, of the logistic normal distribution in closed form does not seem to exist in the literature. Aitchison (1986) suggested using Hermitian numerical integration methods to obtain marginal moments. However, he argued that the main practical interest is in the ratios of components, not in the components themselves. This is not the case here, as we are mainly interested in marginal probabilities, not in their ratios. Another approach, based on the Gibbs sampling technique, has been used by Forster and Skene (1994) to accurately approximate the posterior marginal densities and other summaries for a broad class of prior distributions including the Dirichlet and logistic normal distributions. However, the method approximates the marginal densities of the posterior distribution rather than the prior distribution.
Under the normality assumption of $\underline{Y}^*$ and the unit sum constraint, it has been proved in Theorem 8.1 that the marginal unconditional medians of $p_j$, $m_j$, are equal to their conditional medians, $m^*_{j,0}$, for $j = 1, 2, \ldots, k$.
Moreover, the same assumptions make it possible to estimate marginal lower and upper quartiles for each $p_j$, for $j = 1, 2, \ldots, k$. In the following lemma we formally state and prove the above results. Then, we propose a method of revising the estimates of $\underline{\mu}_{k-1}$ and $\Sigma_{k-1}$ to reflect any change made by the expert to the marginal quartiles.
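Lemma 8.4. For any $j = 1, 2, \ldots, k$, under the assumptions of Theorem 8.1,
$$V_j = 2\left[\log(M_j) + E_j\right], \quad \text{where} \quad M_j = \sum_{i \neq j}^{k} \exp\left(\mu_{i,j} + \tfrac{1}{2}\sigma^2_{i,j}\right), \tag{8.38}$$
and $V_j$ is guaranteed to be strictly greater than zero.
Proof
Since
$$(p_i/p_j) \sim \mathrm{Lognormal}(\mu_{i,j}, \sigma^2_{i,j}),$$
with known $\mu_{i,j}$, $\sigma^2_{i,j}$, the expected value of the lognormal distribution of $(p_i/p_j)$ is given by
$$E(p_i/p_j) = \exp\left(\mu_{i,j} + \tfrac{1}{2}\sigma^2_{i,j}\right).$$
On the other hand, by the assumption of approximate normality for $Y_{j,j}$, we have
$$\log\left(\frac{p_j}{1 - p_j}\right) \sim N(E_j, V_j),$$
so
$$\log\left(\frac{1 - p_j}{p_j}\right) \sim N(-E_j, V_j),$$
and
$$M_j = E\left[\frac{1 - p_j}{p_j}\right] = \exp\left(-E_j + \tfrac{1}{2}V_j\right). \tag{8.39}$$
We take $M_j$ as in (8.38), and Theorem 8.1 gives
$$E_j = \log\left(\frac{m^*_{j,0}}{1 - m^*_{j,0}}\right). \tag{8.40}$$
Equation (8.39) can be solved for $V_j$ to give the first statement of Lemma 8.4.
Substituting $m^*_{j,0}$ for $M(p_j)$ in equation (8.16) and putting $m_{i,(j)} = \mu_{i,j}$, gives
$$E_j = -\log\left(\sum_{i \neq j}^{k} \exp(\mu_{i,j})\right). \tag{8.41}$$
This guarantees that $V_j > 0$ in (8.39), since by comparing the RHSs of (8.38) and (8.41), we can see clearly that
$$M_j > \exp(-E_j).$$
This ends the proof of Lemma 8.4.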
The two unconditional quartiles of $p_j$ can be obtained from
$$Q_1(p_j) = \frac{\exp[Q_1(Y_{j,j})]}{1 + \exp[Q_1(Y_{j,j})]},$$
and
$$Q_3(p_j) = \frac{\exp[Q_3(Y_{j,j})]}{1 + \exp[Q_3(Y_{j,j})]},$$
with
$$Q_1(Y_{j,j}) = E_j + \sqrt{V_j}\, \Phi^{-1}(0.25),$$
$$Q_3(Y_{j,j}) = E_j + \sqrt{V_j}\, \Phi^{-1}(0.75),$$
where $\Phi$ is the cdf of the standard normal distribution.
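The feedback calculation can be sketched as follows. This is an illustration rather than the software's implementation: it assumes that $M_j$ is the sum over $i \neq j$ of the lognormal means $\exp(\mu_{i,j} + \sigma^2_{i,j}/2)$ and that $E_j$ follows (8.41), and it derives $\mu_{i,j}$ and $\sigma^2_{i,j}$ from the mean and covariance of the log-ratios against the first category. The function name is hypothetical; the inputs are the values reported in Tables 8.1 and 8.2 of Section 8.6.

```python
import math
from statistics import NormalDist

def marginal_quartiles(mu, sigma, j):
    """Return (Q1, median, Q3) of p_j; j is a 0-based category index.

    mu, sigma : mean vector and covariance matrix of Y_i = log(p_i / p_1),
                i = 2, ..., k. Category 0 is the fill-up category.
    """
    k = len(mu) + 1
    # Pad with a zero variate for the fill-up category: log(p_1 / p_1) = 0.
    mean = [0.0] + list(mu)
    cov = [[0.0] * k] + [[0.0] + list(row) for row in sigma]
    others = [i for i in range(k) if i != j]
    mu_ij = [mean[i] - mean[j] for i in others]
    var_ij = [cov[i][i] + cov[j][j] - 2.0 * cov[i][j] for i in others]
    M_j = sum(math.exp(m + 0.5 * v) for m, v in zip(mu_ij, var_ij))
    E_j = -math.log(sum(math.exp(m) for m in mu_ij))
    V_j = 2.0 * (math.log(M_j) + E_j)
    z = NormalDist().inv_cdf(0.75)
    expit = lambda y: 1.0 / (1.0 + math.exp(-y))
    spread = z * math.sqrt(V_j)
    return expit(E_j - spread), expit(E_j), expit(E_j + spread)

mu = [-0.5058, -2.4517, -2.0639, -1.5043]
sigma = [[0.3414, 0.1511, 0.1598, -0.3035],
         [0.1511, 0.9087, 0.3677, -0.5551],
         [0.1598, 0.3677, 1.0906, -1.9076],
         [-0.3035, -0.5551, -1.9076, 3.4683]]
feedback = [marginal_quartiles(mu, sigma, j) for j in range(5)]
```

The medians returned for the five categories sum to one, in line with Theorem 8.1, whereas the quartiles carry no such constraint.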
The unconditional quartiles $Q_1(p_j)$ and $Q_3(p_j)$ are presented to the expert as feedback with the unconditional median $M(p_j)$, for $j = 1, 2, \ldots, k$. The expert has the option of changing any of the unconditional medians and/or quartiles. The changes are reflected in estimates of the hyperparameters $\underline{\mu}_{k-1}$ and $\Sigma_{k-1}$, using the following approach.
• Let $m'(p_j)$ denote the values of $M(p_j)$ after re-assessment ($j = 1, 2, \ldots, k$). We revise $\mu_{j,1}$ to
$$\mu^*_{j,1} = E^*(Y_{j,1}) = \begin{cases} \log(m^*(p_j)) - \log(1 - m^*(p_j)), & \text{for } j = 1, \\ \log(m^*(p_j)) - \log(m^*(p_1)), & \text{for } j = 2, \ldots, k, \end{cases}$$
with a new normalized set of medians $m^*(p_j)$, where
$$m^*(p_j) = \frac{m'(p_j)}{\sum_{i=1}^{k} m'(p_i)}, \quad j = 1, 2, \ldots, k.$$
• Suppose one or more of the marginal unconditional quartiles $Q_1(p_j)$ and/or $Q_3(p_j)$ are re-assessed as $Q'_1(p_j)$ and/or $Q'_3(p_j)$, respectively, for $j = 1, \ldots, k$. Then we change the variance-covariance matrix $\Sigma_{(1)}$ to
$$\Sigma^*_{(1)} = \mathrm{Var}^*(\underline{Y}_{(1)}) = D\, \Sigma_{(1)}\, D, \tag{8.42}$$
where $D$ is a diagonal matrix with diagonal elements
$$d_i = \frac{\sigma^*_{i,1}}{\sigma_{i,1}}, \quad i = 2, 3, \ldots, k,$$
and $\sigma^{*2}_{i,1}$ is defined by
$$\sigma^{*2}_{i,1} = \mathrm{Var}^*(\log(p_i) - \log(p_1)) = \mathrm{Var}^*(\log(p_i)) + \mathrm{Var}^*(\log(p_1)) - 2\,\mathrm{Cov}^*(\log(p_i), \log(p_1)). \tag{8.43}$$
The modified variances and covariances, $\mathrm{Var}^*$ and $\mathrm{Cov}^*$, respectively, are determined as follows.
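As $Y_{j,j}$ is assumed to have an approximate normal distribution, let
$$V'_j = \mathrm{Var}^*\left[\log\left(\frac{p_j}{1 - p_j}\right)\right] = \left\{\frac{1}{1.349}\left[\log\left(\frac{Q'_3(p_j)}{1 - Q'_3(p_j)}\right) - \log\left(\frac{Q'_1(p_j)}{1 - Q'_1(p_j)}\right)\right]\right\}^2,$$
so,
$$Y_{j,j} \sim N(E_j, V'_j).$$
Using a simple numerical integration technique on the normal pdf of $Y_{j,j}$, we can get the expectations, for $j = 1, 2, \ldots, k$, in the RHS of the following equation,
$$\mathrm{Var}^*(\log(p_j)) = E\left\{\left[\log\left(\frac{\exp(Y_{j,j})}{1 + \exp(Y_{j,j})}\right)\right]^2\right\} - \left\{E\left[\log\left(\frac{\exp(Y_{j,j})}{1 + \exp(Y_{j,j})}\right)\right]\right\}^2.$$
To attain a strictly positive value of $\sigma^{*2}_{i,1}$ as in (8.43), we modify $\mathrm{Cov}(\log(p_i), \log(p_1))$ by putting
$$\mathrm{Cov}^*(\log(p_i), \log(p_1)) = w_i\, \mathrm{Cov}(\log(p_i), \log(p_1)), \quad i = 2, 3, \ldots, k,$$
where
$$w_i = \sqrt{\frac{\mathrm{Var}^*(\log(p_i))\, \mathrm{Var}^*(\log(p_1))}{\mathrm{Var}(\log(p_i))\, \mathrm{Var}(\log(p_1))}}, \quad i = 2, 3, \ldots, k.$$
In (8.42) we use the diagonal matrix,
$$D = \mathrm{diag}(d_2, d_3, \ldots, d_k),$$
so as to change the variances of $\underline{Y}_{(1)}$, while preserving correlations and also preserving the positive-definiteness for $\Sigma^*_{(1)}$.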
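The effect of pre- and post-multiplying by a diagonal matrix of standard deviation ratios can be seen in a short sketch. The code is illustrative only: `rescale_covariance` is a hypothetical helper, the input is the top-left block of Table 8.2, and the revised variance is an invented value.

```python
import math

def rescale_covariance(sigma, new_variances):
    """Return D * sigma * D, with d_i the ratio of new to old standard deviation."""
    n = len(sigma)
    d = [math.sqrt(new_variances[i] / sigma[i][i]) for i in range(n)]
    return [[d[r] * sigma[r][c] * d[c] for c in range(n)] for r in range(n)]

def correlation(sigma, r, c):
    """Correlation implied by a covariance matrix."""
    return sigma[r][c] / math.sqrt(sigma[r][r] * sigma[c][c])

sigma_old = [[0.3414, 0.1511], [0.1511, 0.9087]]
# Hypothetical revision: the first variance is reduced, the second is kept.
sigma_new = rescale_covariance(sigma_old, [0.25, 0.9087])
```

Because each covariance is scaled by the product of the two standard deviation ratios, every correlation is unchanged, which is why positive-definiteness carries over to the revised matrix.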
Another feedback window is available on request for the expert, should she need to see the influence of changing one or more of the marginal quartile values. If this option is taken and further re-assessment made, then the method given in Lemma 8.4 is applied again on the modified matrix $\Sigma^*_{(1)}$, to give a new set of marginal quartiles. These can be changed again by the expert if she does not find them a satisfactory representation of her opinion.
We should mention that the new set of marginal quartiles does not necessarily have the same values as the modified quartiles. The unit sum condition of $\underline{p}$, with the normality assumption of each $Y_{j,j}$, for $j = 1, 2, \ldots, k$, always forces the marginal interquartile range for a single probability to partly depend on the other probabilities, as shown in Lemma 8.4. Hence, for mathematical coherence, the resulting set of marginal quartiles will not correspond exactly to the expert's assessments. The proposed approach that uses Lemma 8.4 and continuous feedback enables the expert to adjust the quartiles until she is happy with the feedback values.
8.6 Example: Transport preferences
In designing transport systems for the future, one ingredient is the relative importance of factors a person may consider in selecting the mode of transport for different journeys. Estimates of these preferences help in planning rail services, roads and other transport infrastructure. Such estimates are also of interest from the environmental point of view, because of the impact of transport emissions.
For a preparatory environmental study, estimates about factors affecting transport preferences in 2020 were needed. In this example, a transport expert quantified his opinion about the factors affecting the choice of transport for a hundred mile journey across the UK in that year. Primary interests of the expert (Dr. James Warren, The Open University) include modelling energy and emissions to gain a better understanding of transport systems and the potential effects of transportation policy and technology on the environment. He specified five quantities as the main factors a passenger would consider in choosing the means of transport for such a journey. These factors are: cost, journey time, environmental impact, comfort, and convenience. Interest focuses on the relative frequency with which each of these quantities is the most important factor: For what proportion of people would cost be the most important factor in choosing the mode of transport for the journey? For what proportion would it be journey time? And so on. The problem can thus be described as a multinomial model with five categories, one for each factor. Our method and PEGS-Logistic software were used by the expert to quantify his opinion about a logistic normal prior for the parameters of this multinomial model.
After initializing the software and defining the model, the expert assessed his medians of the proportion of people for whom Cost / Time / EcoImpact / Comfort / Convenience would be the most important factor. These median assessments were 0.61, 0.25, 0.04, 0.06, 0.10, respectively, and they are the blue bars in Figure 8.4. These values did not sum to 1 and the software suggested values (yellow bars) that did. Rather than accepting these suggestions, the expert revised his initial median assessments to be 0.49, 0.28, 0.04, 0.06, 0.11, respectively. As their sum is nearly equal to one, the medians suggested next were very close to his assessments and the expert accepted them as representative of his opinions.
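The text here does not spell out how the software forms its suggestions. One simple possibility, shown below purely as an assumption, is proportional rescaling so that the medians sum to one; the function name `suggest_medians` is hypothetical and the inputs are the expert's two sets of assessments quoted above.

```python
def suggest_medians(assessed):
    """Rescale assessed medians proportionally so that they sum to one.

    Proportional rescaling is an assumption of this sketch, not a
    documented rule of the PEGS software.
    """
    total = sum(assessed)
    return [m / total for m in assessed]

initial = [0.61, 0.25, 0.04, 0.06, 0.10]   # sums to 1.06
revised = [0.49, 0.28, 0.04, 0.06, 0.11]   # sums to 0.98
suggested_initial = suggest_medians(initial)
suggested_revised = suggest_medians(revised)
```

Under this rule the second set of suggestions moves each assessment by only about two percent, which is consistent with the expert finding them acceptable.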
[Screenshot: "Eliciting Medians of Probabilities for Each Category". Bar chart of assessed and suggested median probabilities (vertical axis from 0.00 to 0.95) for the categories Cost, Time, EcoImpact, Comfort and Convenience.]
Figure 8.4: Software suggestions for initial medians
The expert then gave his assessed upper and lower quartile values for the probability of the first category; these were 0.62 and 0.43 respectively. Then conditioning on his assessed medians for previous categories, he assessed his conditional quartile values. The four conditional lower quartiles were 0.18, 0.03, 0.03, 0.10, respectively, while the four conditional upper quartiles were 0.36, 0.10, 0.08, 0.15, respectively. See Figure 8.5, in which the expert has given his two quartiles of the fourth category conditional on the probabilities of the first three categories. The quartiles of the last category follow automatically. Although the expert is not a statistician, he had no problems in assessing quartiles after a brief discussion about the method of bisection.
[Screenshot: software window for eliciting quartiles of the probabilities of a category, showing proportion by category.]
Figure 8.5: Assessing conditional quartiles
Next, the expert gave conditional median assessments of 0.41, 0.16, 0.12, 0.33 for the remaining four categories, conditional on the probability of the first category being 0.25. The number of conditions was then increased in stages. Conditional on 0.25 and 0.20 being the probabilities for the first and second categories, respectively, the expert revised his probability median assessments for the last three categories to 0.13, 0.18 and 0.25, respectively. See Figure 8.6. Finally, he gave the conditional medians of 0.19, 0.30 for the last two categories given that the probabilities of the first three categories were 0.25, 0.20 and 0.07, respectively.
[Screenshot: "Eliciting conditional medians of Probabilities for Each Category". Bar chart for the categories Cost, Time, EcoImpact, Comfort and Convenience.]
Figure 8.6: Revised conditional medians
It is worth mentioning that the suggestions given by the software played a crucial role in helping the expert choose medians that satisfy the unit sum constraint. During the elicitation process, the sums of the expert's assessments never equalled one exactly. When suggestions were offered by the software, he normally revised one assessment and then accepted the second round of offered suggestions. After making his conditional median assessments, the expert was then shown the unconditional medians and unconditional quartiles that were implied by all his assessments. See Figure 8.7. During this feedback stage he was invited to accept or revise these quantities. The unconditional medians that were offered were accepted by the expert as an adequate representation of his opinion. However, he decided to use the "Change Quartiles" button to revise the unconditional quartiles and then reduced the interquartile range of the last category.
[Screenshot: feedback window showing the unconditional assessments, with options to change medians or change quartiles. Chart of marginal medians and quartiles (vertical axis from 0.00 to 0.95) for the categories Cost, Time, EcoImpact, Comfort and Convenience.]
Figure 8.7: Software suggestions for marginal medians and quartiles
The elicitation process took about 20 minutes to complete. The expert commented that although the elicitation problem was quite tricky, the software gave a helpful form of visualization. He also mentioned that he had found it hard to make his median assessments sum to one, so that the software's suggestions had been very welcome. He also advised that it would be helpful if the different categories were ordered according to their importance, i.e. in a descending order according to their median probability values. He thought that this order would make it easier for him to think about conditional assessments.
The software output the following elicited hyperparameters of the logistic normal prior as in Tables 8.1 and 8.2.
Table 8.1: The elicited mean vector of a logistic normal prior
Y2 = log(p2/p1)    Y3 = log(p3/p1)    Y4 = log(p4/p1)    Y5 = log(p5/p1)
-0.5058            -2.4517            -2.0639            -1.5043
Table 8.2: The elicited variance-covariance matrix of a logistic normal prior
                   Y2 = log(p2/p1)    Y3 = log(p3/p1)    Y4 = log(p4/p1)    Y5 = log(p5/p1)
Y2 = log(p2/p1)     0.3414             0.1511             0.1598            -0.3035
Y3 = log(p3/p1)     0.1511             0.9087             0.3677            -0.5551
Y4 = log(p4/p1)     0.1598             0.3677             1.0906            -1.9076
Y5 = log(p5/p1)    -0.3035            -0.5551            -1.9076             3.4683
This output gives the mean vector and variance-covariance matrix of a multivariate normal distribution of degree 4 for $Y_2, Y_3, Y_4, Y_5$. However, the marginal moments of each $p_i$ are not given as output. Instead, marginal medians and quartiles are presented to the expert during the feedback stage as discussed before, see Figure 8.7. The multivariate normal distribution of $Y_2, Y_3, Y_4, Y_5$ may be used as a prior distribution in a Bayesian analysis. Details of the additive logistic transformations are also needed:
$$p_i = \begin{cases} \dfrac{1}{1 + \sum_{j=2}^{5} \exp(Y_j)}, & \text{for } i = 1, \\[2ex] \dfrac{\exp(Y_i)}{1 + \sum_{j=2}^{5} \exp(Y_j)}, & \text{for } i = 2, 3, \ldots, 5. \end{cases}$$
Of course, the extra variable $Y_1$ is omitted as it is a redundant variable due to the unit sum constraint on $\underline{p}$. The software has an option to implement this prior distribution in a WinBUGS file. After the sample data are obtained, the software produces a file for a WinBUGS model that contains sample data, a multinomial likelihood and a complete specification of the logistic normal prior distribution that the expert assessed.
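The additive logistic transformation above is straightforward to apply in code. The following is a minimal sketch with a hypothetical function name; it evaluates the transformation at the elicited mean vector of Table 8.1, which under the normality assumption gives the implied medians of the five probabilities.

```python
import math

def inverse_additive_logistic(y):
    """Map (Y_2, ..., Y_k) to (p_1, ..., p_k), with p_1 the fill-up category."""
    denom = 1.0 + sum(math.exp(v) for v in y)
    return [1.0 / denom] + [math.exp(v) / denom for v in y]

mean_vector = [-0.5058, -2.4517, -2.0639, -1.5043]  # Table 8.1
p_at_mean = inverse_additive_logistic(mean_vector)
```

The same function can be applied to draws from the multivariate normal prior of $Y_2, \ldots, Y_5$ to obtain prior draws of the probability vector.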
8.7 Concluding comments
In Chapters 6 and 7, we introduced elicitation methods for Dirichlet, generalized Dirichlet and Gaussian copula as prior distributions for the parameter vector $\underline{p}$ of the multinomial model. Hence the logistic normal distribution is our fourth suggested prior distribution for this model. Among these priors, the logistic normal prior gives the most general correlation structure. The PEGS-Multinomial software, which is freely available at http://statistics.open.ac.uk/elicitation, offers the option of eliciting any of these four prior distributions.
As noted earlier, it is tricky to elicit assessments that satisfy all the necessary requirements for multinomial models. For example, if there are only two categories, the lower probability quartile of one category and the upper quartile of the other must add up to one. As the number of categories increases, the number of requirements that must be satisfied also increases. In our proposed elicitation method, we chose assessment tasks and a structure that led to a coherent set of assessments, without the expert having to be conscious of the requirements.
Chapter 9
Eliciting multinomial models with covariates
9.1 Introduction
With multinomial models, the membership probabilities of the different categories may depend
on one or more continuous or categorical explanatory variables (covariates). The simplest
well-known example in this context is logistic regression, where the probability of being in
one of only two categories is related to a set of explanatory variables through the logit link
function.
Suppose there are $k$ categories, let $p_1, p_2, \ldots, p_k$ denote the membership probabilities
and let $\underline{X} = (X_1, X_2, \ldots, X_m)$ be a vector of $m$ explanatory variables. Relating $\underline{X}$ to each
probability $p_i$ using separate logit link functions is not the best choice. The inverse link
functions give
$$p_i(\underline{X}) = \frac{\exp(\alpha_i + \underline{X}'\underline{\beta}_i)}{1 + \exp(\alpha_i + \underline{X}'\underline{\beta}_i)}, \qquad i = 1, 2, \ldots, k, \tag{9.1}$$
in which case it will not be easy to investigate the conditions under which the constraint
$\sum_{i=1}^{k} p_i(\underline{X}) = 1$ is fulfilled. Some other link functions are available in the literature [e.g.
Aitchison (1986)]. However, the additive multinomial logistic link function is the most
convenient, as it automatically accounts for the unit sum constraint. It links the classification
probabilities to linear predictors in the form
$$p_i(\underline{X}) = \begin{cases} \dfrac{1}{1 + \sum_{j=2}^{k} \exp(\alpha_j + \underline{X}'\underline{\beta}_j)}, & i = 1, \\[2ex] \dfrac{\exp(\alpha_i + \underline{X}'\underline{\beta}_i)}{1 + \sum_{j=2}^{k} \exp(\alpha_j + \underline{X}'\underline{\beta}_j)}, & i = 2, \ldots, k. \end{cases} \tag{9.2}$$
Expressing the model in the form of (9.2) helps to generalize results obtained in the previous
chapter to the current case.
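The contrast between (9.1) and (9.2) can be illustrated numerically. The following Python sketch uses hypothetical coefficient values (they are not taken from the thesis) and illustrative function names: probabilities built from separate logit links generally do not sum to one, whereas those from the additive logistic link always do.

```python
import math

def separate_logit_probs(linear_predictors):
    """Form (9.1): an independent logistic curve for every category."""
    return [math.exp(v) / (1.0 + math.exp(v)) for v in linear_predictors]

def additive_logit_probs(x, alphas, betas):
    """Form (9.2): alphas and betas hold the coefficients of categories 2, ..., k."""
    linear = [a + sum(b_r * x_r for b_r, x_r in zip(b, x))
              for a, b in zip(alphas, betas)]
    exps = [math.exp(v) for v in linear]
    denom = 1.0 + sum(exps)
    return [1.0 / denom] + [e / denom for e in exps]

# Hypothetical values: k = 3 categories, m = 2 covariates.
x = [1.0, 2.0]
alphas = [0.2, -0.5]                  # alpha_2, alpha_3
betas = [[0.3, -0.1], [0.05, 0.4]]    # beta_2, beta_3
p_additive = additive_logit_probs(x, alphas, betas)
p_separate = separate_logit_probs([0.0, 0.3, 0.35])
print(sum(p_additive), sum(p_separate))
```

With all coefficients set to zero, the additive form returns equal probabilities $1/k$ for every category.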
For the Bayesian analysis of the multinomial logit model, a multivariate normal prior may
be assumed [e.g. O'Hagan and Forster (2004)] for the parameter vector $\underline{\beta}^*$,
where the vectors of coefficients, $(\alpha_i, \underline{\beta}_i')'$, are category specific, for $i = 2, \ldots, k$, i.e. each
category has its own vector of regression coefficients. We select the first category as the fill-up
category; hence, for identifiability, its regression coefficients, $(\alpha_1, \underline{\beta}_1')'$, are not included in the
prior distribution.
In this chapter we propose a method for eliciting a mean vector and a positive-definite
variance-covariance matrix of the normal prior distribution of $\underline{\beta}^*$. Our proposed method is
based on the results obtained in the previous chapter for the logistic normal prior distribution
of the multinomial model. The proposed method has been implemented in the PEGS-Multinomial
with Covariates software, which is freely available at http://statistics.open.ac.uk/elicitation.
In Section 9.2, we define the underlying model, namely the base-line multinomial logit
model, in terms of the additive logistic transformation. The required assumptions, notation
and theoretical framework are discussed in Section 9.3. Elicitation methods and assessment
tasks required for eliciting a mean vector and a positive-definite variance-covariance matrix
for the regression coefficients are proposed in Sections 9.4 and 9.5. Concluding comments
on this chapter are given in Section 9.6.
9.2 The base-line multinomial logit model
The model that uses the link function in (9.2) is known as the multinomial logistic (logit)
model, since it has multinomial responses with $k > 2$ categories. The model in
(9.2) is usually given in the more general form
$$p_i(\underline{X}) = \frac{\exp(\alpha_i + \underline{X}'\underline{\beta}_i)}{\sum_{j=1}^{k} \exp(\alpha_j + \underline{X}'\underline{\beta}_j)}, \qquad i = 1, 2, \ldots, k, \tag{9.3}$$
which is called the base-line multinomial logit model. See, for example, Agresti (2002) or
Powers and Xie (2000). In the rest of this chapter, for ease of notation, each classification
probability $p_i(\underline{X})$, as defined in (9.2), will be denoted simply by $p_i$, for $i = 1, 2, \ldots, k$.
To attain the unit sum constraint in the base-line model, an identifiability constraint must
be imposed by setting the coefficients of the "base-line" category to zero. The selection of
the base-line category is arbitrary. If we select the first category as the base-line category,
then, under the identifiability constraint $(\alpha_1, \underline{\beta}_1')' = \underline{0}$, it can easily be shown that the model
in (9.3) is equivalent to that in (9.2). Thus, the model has exactly $(k-1)(m+1)$ free
parameters.
From (9.2), the linear predictor, $Y_j = \alpha_j + \underline{X}'\underline{\beta}_j$, can be written in terms of the logistic
transformations of the classification probabilities as
$$Y_j = \alpha_j + \underline{X}'\underline{\beta}_j = \log(p_j) - \log(p_1), \qquad \text{for } j = 2, 3, \ldots, k, \tag{9.4}$$
where the regression coefficients for the $j$th category are $\underline{\beta}_j = (\beta_{1,j}, \beta_{2,j}, \ldots, \beta_{m,j})'$.
We define an extra variable, $Y_1$, as
$$Y_1 = \log(p_1) - \log(1 - p_1). \tag{9.5}$$
This extra variable is required as a conditioning value in the elicitation process,
as shown in the previous chapter. We do not assume $Y_1$ to be a linear predictor, since the
trivial parameters, $\alpha_1$ and $\underline{\beta}_1$, will not appear in the elicited prior distribution. We adopt
the conventions $\alpha_1 = 0$, $\underline{\beta}_1 = \underline{0}$, for identifiability of the base-line model.
9.3 Notation and theoretical framework
We assume that the prior opinion about the linear predictors $Y_2, \ldots, Y_k$ can be adequately
represented by a multivariate normal distribution of dimension $k - 1$. Then, from equations
(9.4) and (9.5) and Section 8.2.1, $Y_1$ has an approximate normal distribution. In addition, the
classification probabilities, $p_1, p_2, \ldots, p_k$, have a logistic normal distribution as defined in
Section 8.2. Following O'Hagan and Forster (2004), we assume a multivariate normal prior
distribution for the regression coefficients.
For tractability in the elicitation process, the expert is asked to give her assessments
for the classification probabilities, $p_1, \ldots, p_k$, and consequently for $Y_2, \ldots, Y_k$, for only one
covariate at a time. All other covariates are assumed to be at their reference values/levels. By
doing this for each covariate in turn, the expert can concentrate on revising her assessments
as a result of the change in just one explanatory covariate.
The relationship between each $Y_j$ and each continuous covariate $X_r$ is not necessarily
linear. A piecewise-linear relationship, as discussed in Chapters 3 and 4, might be a reasonable
choice here, as it can model many types of relationship. However, in dealing with $k$ categories
and $m$ explanatory covariates, a piecewise-linear relationship will seldom be practical, as it
imposes a large number of dividing points (knots) at which the expert must give assessments.
This would lead to a lengthy elicitation process. So, to simplify the elicitation process, we
assume that relationships are linear. Specifically, we assume a linear relationship between
each continuous covariate $X_r$, $r = 1, 2, \ldots, m$, and each $Y_j$, $j = 2, \ldots, k$, of the form
$$Y_j = \alpha_j + X_r \beta_{r,j}, \qquad r = 1, \ldots, m, \quad j = 2, \ldots, k, \tag{9.6}$$
given that all other covariates are fixed at their reference values/levels. That is, equation
(9.6) holds when $X_i = x_{i,0}$, for $i = 1, 2, \ldots, m$, $i \neq r$, where $x_{i,0}$ is the reference value/level
of $X_i$. If all covariates are at their reference values/levels, i.e. $X_i = x_{i,0}$ for $i = 1, 2, \ldots, m$,
then
$$Y_j = \alpha_j, \qquad j = 2, \ldots, k. \tag{9.7}$$
To achieve this, for $r = 1, 2, \ldots, m$, if the covariate $X_r$ is a factor (categorical variable), with
a reference level $x_{r,0}$ and any number $\delta(r)$ of levels, $x_{r,1}, x_{r,2}, \ldots, x_{r,\delta(r)}$, then $X_r$ is split into
$\delta(r)$ new factors, $X_{r,i}$, defined as
$$X_{r,i} = \begin{cases} 1 & \text{if } X_r = x_{r,i}, \\ 0 & \text{otherwise}, \end{cases} \qquad \text{for } i = 1, 2, \ldots, \delta(r). \tag{9.8}$$
If $X_r$ is a continuous covariate with a reference value $x_{r,0}$, then we define a new variable
$X_r^*$ as
$$X_r^* = X_r - x_{r,0}, \qquad \text{for } r = 1, 2, \ldots, m. \tag{9.9}$$
With the new covariates defined by (9.8) and (9.9), the value of each covariate is equal to
zero at its reference value.
Hence, if the $m$ covariates consist of $m_1$ factors and $m_2$ continuous covariates, we get a new
set of, say, $m^*$ explanatory variables, where
$$m^* = \sum_{j=1}^{m_1} \delta(j) + m_2.$$
To simplify the notation, with no loss of generality, we keep the notation $X_1, X_2, \ldots, X_m$
for the set of covariates, while keeping in mind that $m$ actually denotes $m^*$ and that each
$X_r$ is of the form of (9.8) for a factor or (9.9) for a continuous covariate. In this sense, the
models in (9.6) and (9.7) are equivalent to (9.4).
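The recoding in (9.8) and (9.9) can be sketched in a few lines of Python. The covariate names, levels and reference values below are hypothetical, and the function names are illustrative only; the point is that every recoded covariate equals zero at the reference point.

```python
def recode_factor(value, reference, levels):
    """(9.8): one 0/1 indicator per non-reference level of a factor."""
    return [1 if value == level else 0 for level in levels if level != reference]

def recode_continuous(value, reference):
    """(9.9): centre a continuous covariate at its reference value."""
    return value - reference

def recoded_dimension(deltas, m2):
    """m* = sum of delta(j) over the factors plus the number of continuous covariates."""
    return sum(deltas) + m2

# Hypothetical covariates: a three-level factor and a continuous age variable.
levels = ["low", "medium", "high"]
at_reference = recode_factor("low", "low", levels) + [recode_continuous(30.0, 30.0)]
changed = recode_factor("high", "low", levels) + [recode_continuous(40.0, 30.0)]
m_star = recoded_dimension([2], 1)
print(at_reference, changed, m_star)
```

Here the factor has two non-reference levels, so it contributes two indicator variables, and together with the single continuous covariate this gives $m^* = 3$.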
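It is convenient to rearrange the regression coefficients into a matrix, say $\beta$, of the form
$$\beta = \begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_k \\ \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,k} \\ \vdots & \vdots & & \vdots \\ \beta_{m,1} & \beta_{m,2} & \cdots & \beta_{m,k} \end{pmatrix} = \begin{pmatrix} \underline{\alpha}' \\ \underline{\beta}_{(1)}' \\ \vdots \\ \underline{\beta}_{(m)}' \end{pmatrix}. \tag{9.10}$$
Then we define the new set of vectors $\underline{\alpha}$, $\underline{\beta}_{(r)}$, for $r = 1, 2, \ldots, m$, as the rows of $\beta$, of the
form
$$\underline{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_k)', \tag{9.11}$$
$$\underline{\beta}_{(r)} = (\beta_{r,1}, \beta_{r,2}, \ldots, \beta_{r,k})', \tag{9.12}$$
and the same set with the first zero elements removed, as
$$\underline{\alpha}^1 = (\alpha_2, \alpha_3, \ldots, \alpha_k)', \tag{9.13}$$
$$\underline{\beta}^1_{(r)} = (\beta_{r,2}, \beta_{r,3}, \ldots, \beta_{r,k})'. \tag{9.14}$$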
Since each column of the $\beta$ matrix in (9.10) contains regression coefficients that correspond
to one category, it is more convenient to work with the rows, each of which corresponds to one
covariate. In this case, elements of a single row correspond to classification probabilities, and
hence these elements must be inter-related in a way that reflects the unit sum constraint
of the probabilities. Therefore, we assume that the elements of $\underline{\alpha}$ are correlated, and that
the elements of each $\underline{\beta}_{(r)}$ are also correlated, hence statistically dependent, a priori, for all
$r = 1, 2, \ldots, m$. Elements from different rows of $\beta$, which correspond to different
covariates, are assumed to be independent a priori, so as to simplify the elicitation process
and obtain a block-diagonal variance-covariance matrix.
If we let $\underline{\beta}^1 = (\underline{\beta}^{1\prime}_{(1)}, \underline{\beta}^{1\prime}_{(2)}, \ldots, \underline{\beta}^{1\prime}_{(m)})'$, then the multivariate normal prior distribution
to be elicited is thus of the form
$$\begin{pmatrix} \underline{\alpha}^1 \\ \underline{\beta}^1 \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \underline{\mu}_\alpha \\ \underline{\mu}_\beta \end{pmatrix}, \begin{pmatrix} \Sigma_\alpha & \Sigma_{\alpha,\beta} \\ \Sigma_{\alpha,\beta}' & \Sigma_\beta \end{pmatrix} \right). \tag{9.15}$$
9.4 Eliciting the mean vector
To elicit the mean vectors $\underline{\mu}_\alpha$ and $\underline{\mu}_\beta$ in (9.15), we proceed as follows.
• The expert is asked to assume that all covariates are at their reference values/levels, i.e.
$X_r = 0$, $r = 1, 2, \ldots, m$. We call this situation the reference point. She then assesses
a median value, say $m^*_{1,0,0}$, for the probability $p_1$ of the first category. As discussed
in the previous chapter, since the choice of the first category is arbitrary, it is chosen
by the expert as the most common category. Then the expert assesses median values
$m^*_{j,0,0}$, $j = 2, \ldots, k$, for all the remaining categories.
• As proved in Theorem 8.1 in the previous chapter, these unconditional median
assessments are equal to the conditional medians of $(p_j \mid p_1 = m^*_{1,0,0})$ for $j = 2, 3, \ldots, k$.
For convenience, we denote both conditional and unconditional medians by $m^*_{j,0,0}$,
$j = 2, \ldots, k$. Lemma 8.1 in the previous chapter states that median assessments must
sum to one, so they are normalized by the PEGS-Multinomial with Covariates software
to fulfill this condition.
• For each covariate in turn, the expert is asked to assume a specific value of the current
covariate, say $X_r = x_r$, while all other covariates are assumed to be at their reference
values/levels. Under these assumptions the expert starts by assessing a median value
for $p_1$, say $m^*_{1,0,r}$. Then she assesses a new set of median values, say $m^*_{j,0,r}$ for
$j = 2, 3, \ldots, k$, for all the remaining categories. Again, these assessments are normalized
to satisfy the unit sum constraint. This process is repeated for each covariate, i.e. for
$r = 1, 2, \ldots, m$.
• Figure 9.1 shows the assessed probability medians when only one of the covariates, age,
has changed from its reference value to a new value (40 years). To help the expert during
this stage, the software gives the previously assessed medians when all covariates were
at their reference values/levels. These are presented in the upper right graph of Figure 9.1.
The reference value/level of each continuous covariate/factor is also listed in the upper
left table of Figure 9.1.
[Software screenshot. Upper right graph: probability medians at the reference point. Main graph: medians of probabilities for each category for the covariate age.]
Figure 9.1: Assessing probability medians at age = 40 years
Now, let the conditional median of $Y_j$, given that all covariates are at their reference
levels, be denoted by $m_{j,0,0}$, for $j = 1, 2, \ldots, k$. Also, let the conditional median of $Y_j$, given
that $X_r = x_r$ and all other covariates are at their reference levels, be denoted by $m_{j,0,r}$, for
$j = 1, 2, \ldots, k$, and $r = 1, 2, \ldots, m$.
As the transformations in (9.4) and (9.5) are monotonic increasing, they map medians and
conditional medians to medians and conditional medians. Hence we can write, for
$r = 0, 1, 2, \ldots, m$, and $j = 2, 3, \ldots, k$,
$$m_{1,0,r} = \log(m^*_{1,0,r}) - \log(1 - m^*_{1,0,r}), \tag{9.16}$$
$$m_{j,0,r} = \log(m^*_{j,0,r}) - \log(m^*_{1,0,r}). \tag{9.17}$$
It is worth mentioning here that the validity of (9.17) is a result of defining $m^*_{j,0,r}$ as the
conditional median of $(p_j \mid p_1 = m^*_{1,0,r})$, which implies that $m_{j,0,r}$ is a conditional median of
$(Y_j \mid Y_1 = m_{1,0,r})$. That is why we need the redundant variable, $Y_1$, defined in (9.5).
The computed assessments from (9.16) and (9.17), together with the linearity assumptions
in (9.6) and (9.7), enable us to determine $\mu_j = E(\alpha_j)$, for $j = 2, \ldots, k$, as
$$\mu_j = E(Y_j \mid X_i = 0, \forall i = 1, 2, \ldots, m) = m_{j,0,0}. \tag{9.18}$$
We must determine $\mu_{r,j} = E(\beta_{r,j})$ for $r = 1, 2, \ldots, m$, $j = 2, \ldots, k$. If $X_r$ is a factor, then
from (9.6) and (9.7), and utilizing the assessments in (9.16) and (9.17), we put
$$\mu_{r,j} = E(Y_j \mid X_r = 1, X_i = 0, \forall i \neq r) - E(Y_j \mid X_i = 0, \forall i = 1, 2, \ldots, m) = m_{j,0,r} - m_{j,0,0}. \tag{9.19}$$
If $X_r$ is a continuous covariate, then $\beta_{r,j}$ is the slope of the linear relation in (9.6), so
$$\mu_{r,j} = [E(Y_j \mid X_r = x_r, X_i = 0, \forall i \neq r) - E(Y_j \mid X_i = 0, \forall i = 1, 2, \ldots, m)]/x_r = [m_{j,0,r} - m_{j,0,0}]/x_r, \tag{9.20}$$
for $r = 1, 2, \ldots, m$, and $j = 2, \ldots, k$.
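The computations in (9.17) to (9.20) can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the PEGS implementation: the median assessments are hypothetical, the function names are invented for the example, and normalization is done by simple division by the total.

```python
import math

def normalize(medians):
    """Rescale assessed probability medians so that they sum to one."""
    total = sum(medians)
    return [m / total for m in medians]

def predictor_medians(p_medians):
    """(9.17): m_{j,0,r} = log(m*_{j,0,r}) - log(m*_{1,0,r}) for j = 2, ..., k."""
    return [math.log(p) - math.log(p_medians[0]) for p in p_medians[1:]]

def coefficient_means(ref_medians, new_medians, x_r=None):
    """Return (mu_alpha, mu_beta_r); x_r=None treats X_r as a factor, as in (9.19)."""
    m_ref = predictor_medians(normalize(ref_medians))   # (9.18)
    m_new = predictor_medians(normalize(new_medians))
    scale = 1.0 if x_r is None else x_r                 # (9.20) divides by x_r
    return m_ref, [(a - b) / scale for a, b in zip(m_new, m_ref)]

# Hypothetical assessments for k = 3 categories; age is 10 years above its reference.
mu_alpha, mu_beta_age = coefficient_means([0.5, 0.3, 0.2], [0.4, 0.35, 0.25], x_r=10.0)
print(mu_alpha, mu_beta_age)
```

Calling `coefficient_means` without `x_r` gives the factor case, where the mean of the coefficient is simply the difference between the two transformed medians.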
Finally, we put
$$\underline{\mu}_\alpha = (\mu_2, \mu_3, \ldots, \mu_k)' \tag{9.21}$$
and
$$\underline{\mu}_\beta = (\mu_{1,2}, \ldots, \mu_{1,k}, \mu_{2,2}, \ldots, \mu_{2,k}, \ldots, \mu_{m,2}, \ldots, \mu_{m,k})'. \tag{9.22}$$
9.5 Eliciting the variance matrix
To elicit a positive-definite matrix for the multivariate normal prior distribution of the
regression coefficients in (9.15), we proceed as follows.
9.5.1 Eliciting the variance-covariance sub-matrices
We denote $\Sigma_\alpha = \mathrm{Var}(\underline{\alpha}^1)$ by $\Sigma_0$, and put
$$\Sigma_{r|\alpha} = \mathrm{Var}(\underline{\beta}^1_{(r)} \mid \underline{\alpha}^1) \tag{9.23}$$
and
$$\Sigma_{\beta|\alpha} = \mathrm{Var}(\underline{\beta}^1 \mid \underline{\alpha}^1). \tag{9.24}$$
In order to develop a method for eliciting positive-definite matrices $\Sigma_0$ and $\Sigma_{r|\alpha}$
($r = 1, \ldots, m$), we proceed as follows.
From (9.7) we put
$$\Sigma_0 = \mathrm{Var}(\underline{Y}^1 \mid X_i = 0, \forall i = 1, 2, \ldots, m) = V_0, \tag{9.25}$$
where $\underline{Y}^1 = (Y_2, Y_3, \ldots, Y_k)'$.
For continuous covariates, if we assume that $X_r = x_r$ and $X_i = 0$, for $i = 1, 2, \ldots, m$, $i \neq r$,
we have from (9.6) that
$$\mathrm{Var}(\underline{Y}^1 \mid X_r = x_r, \underline{\alpha}^1 = \underline{\mu}_\alpha) = x_r^2 \, \mathrm{Var}(\underline{\beta}^1_{(r)} \mid \underline{\alpha}^1 = \underline{\mu}_\alpha) = V_r. \tag{9.26}$$
Hence, for $r = 1, 2, \ldots, m$,
$$\Sigma_{r|\alpha} = x_r^{-2} V_r. \tag{9.27}$$
For factors, (9.27) reduces to
$$\Sigma_{r|\alpha} = V_r. \tag{9.28}$$
Each matrix $V_r$ ($r = 0, 1, \ldots, m$) can be elicited as a positive-definite matrix in the way
used to obtain the variance matrix of the logistic normal prior in Chapter 8.
Remark 9.1
The main difference between this chapter and Chapter 8 is that here the process of assessing
the conditional medians and quartiles must be repeated $m + 1$ times. In the initial step, the
expert is asked to assume that $X_i = 0$, $\forall i = 1, 2, \ldots, m$. Then, in each successive step
$r$, for $r = 1, 2, \ldots, m$, the expert is asked to assume that the $r$th covariate has changed from
0 to $x_r$, i.e. $X_r = x_r$, while all other covariates are at their reference values, i.e. $X_i = 0$,
for $i = 1, 2, \ldots, m$, $i \neq r$. During these remaining $m$ steps, another key assumption is made:
the expert is also conditioning on $\underline{\alpha}^1 = \underline{\mu}_\alpha$.
Under these main assumptions at step $r$, $r = 0, 1, \ldots, m$, the assessment tasks can be
detailed as follows.
9.5.2 Assessing conditional quartiles
• Under the assumptions listed in Remark 9.1, the expert is asked to assess a lower quartile
$L^*_{1,r}$ and an upper quartile $U^*_{1,r}$ for $p_1$. She is then asked to assume that $p_1 = m^*_{1,0,r}$
and to give a lower quartile $L^*_{2,r}$ and an upper quartile $U^*_{2,r}$ for $p_2$.
• For each remaining $p_j$, $j = 3, \ldots, k - 1$, she assesses the two quartiles $L^*_{j,r}$ and $U^*_{j,r}$
given that $p_1 = m^*_{1,0,r}$, $p_2 = m^*_{2,0,r}$, ..., $p_{j-1} = m^*_{j-1,0,r}$.
• Using the interactive PEGS-Multinomial with Covariates software, and due to the unit
sum constraint, the lower (upper) quartile $L^*_{k,r}$ ($U^*_{k,r}$) of $p_k$ is automatically shown to
the expert once she assesses the upper (lower) quartile $U^*_{k-1,r}$ ($L^*_{k-1,r}$) of $p_{k-1}$.
• With the aid of a lognormal curve produced by the software, the expert is advised to
make sure that her assessed interquartile range gives an almost zero probability of $p_j$
exceeding $1 - \sum_{i=1}^{j-1} m^*_{i,0,r}$. For more details on this, see Section 8.3.2 in the previous
chapter.
9.5.3 Assessing conditional medians
• Under the assumptions listed in Remark 9.1, for $r = 0, 1, \ldots, m$, the expert is asked to
assume that the median of $p_1$ has been changed from $m^*_{1,0,r}$ to $m^*_{1,1,r} = m^*_{1,0,r} + \eta^*_{1,r}$.
Given this information, the expert is asked to change her previous medians $m^*_{j,0,r}$ of
each $p_j$. Her new assessment, $m^*_{j,1,r}$, may be written as
$$m^*_{j,1,r} = m^*_{j,0,r} + \theta^*_{j,1,r}, \qquad \text{for } j = 2, \ldots, k. \tag{9.29}$$
• In each successive step $i$, for $i = 2, 3, \ldots, k - 2$, the expert is asked to suppose that the
median values of $p_1, p_2, \ldots, p_i$ are $m^*_{1,1,r} = m^*_{1,0,r} + \eta^*_{1,r}$, $m^*_{2,2,r} = m^*_{2,1,r} + \eta^*_{2,r}$, ...,
$m^*_{i,i,r} = m^*_{i,i-1,r} + \eta^*_{i,r}$, respectively. These are shown as red bars in Figure 9.2.
Given this information, she is asked to revise the medians that she assessed at the
most recent previous step, $m^*_{i+1,i-1,r}, m^*_{i+2,i-1,r}, \ldots, m^*_{k,i-1,r}$, shown by black lines in
Figure 9.2. Her new assessments are denoted $m^*_{i+1,i,r} = m^*_{i+1,i-1,r} + \theta^*_{i+1,i,r}$,
$m^*_{i+2,i,r} = m^*_{i+2,i-1,r} + \theta^*_{i+2,i,r}$, ..., $m^*_{k,i,r} = m^*_{k,i-1,r} + \theta^*_{k,i,r}$, respectively, which are shown as
the blue bars in the main graph of Figure 9.2. In other words, for $i = 1, 2, \ldots, k - 2$,
and $j = i + 1, i + 2, \ldots, k$, we can write
$$m^*_{j,i,r} = m^*_{j,i-1,r} + \theta^*_{j,i,r} \text{ is the median of } (p_j \mid p_1 = m^*_{1,1,r}, \ldots, p_i = m^*_{i,i,r}). \tag{9.30}$$
• For mathematical coherence, as proved in Lemma 8.3, Section 8.4.2 in the previous
chapter, we have to make sure that
$$\sum_{j=1}^{i} m^*_{j,j,r} + \sum_{j=i+1}^{k} m^*_{j,i,r} = 1, \qquad i = 1, 2, \ldots, k - 2.$$
The software suggests new normalized conditional medians satisfying the above
constraint.
• As mentioned in Remark 9.1, the expert assesses her conditional medians assuming that
only one of the covariates, age, has changed from its reference value to 40 years, and
assuming at the same time that her previously assessed medians at the reference point
are correct. Probability medians at the reference point are presented to the expert in
the upper right graph of Figure 9.2. The expert is asked to assume these medians are
the true values while assessing her conditional medians on the main graph of Figure 9.2.
Figure 9.2: Assessing conditional medians at age = 40 years
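The unit sum requirement on the conditional medians can be illustrated with a short Python sketch. The rescaling rule used here, proportional adjustment of the revised medians while the conditioning values stay fixed, is an assumption made for illustration; the rule actually used by the software is the one described in Chapter 8 and is not reproduced here. The function name and the numbers are hypothetical.

```python
def normalize_conditional_medians(fixed, revised):
    """Rescale the revised medians so that fixed + revised sums to one.

    fixed:   conditioning values m*_{1,1,r}, ..., m*_{i,i,r}, kept unchanged
    revised: assessed medians m*_{i+1,i,r}, ..., m*_{k,i,r}
    """
    remaining = 1.0 - sum(fixed)
    if remaining <= 0.0:
        raise ValueError("conditioning values must sum to less than one")
    total = sum(revised)
    # Assumed rule: keep the ratios between the revised medians unchanged.
    return [m * remaining / total for m in revised]

# Hypothetical step i = 1 with k = 4 categories.
fixed = [0.45]
suggested = normalize_conditional_medians(fixed, [0.30, 0.20, 0.10])
print(suggested)
```

Proportional rescaling preserves the relative sizes of the expert's revised medians, which is why it is a natural first choice for a suggestion that the expert can then accept or adjust.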
The assessment tasks in Sections 9.5.2 and 9.5.3 are repeated $m + 1$ times, for
$r = 0, 1, \ldots, m$. Then, as detailed in the previous chapter, the normalizing one-to-one functions
in (9.4) and (9.5) are used to transform the assessed conditional quartiles of $\underline{p}$ into
conditional quartiles of $\underline{Y}$ and, hence, into conditional expectations, variances and covariances of
the multivariate normal elements.
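As a hedged illustration of the first of these transformations, the Python sketch below converts assessed quartiles of $p_1$ into quartiles of $Y_1$ through (9.5) and then into a normal standard deviation, using the fact that the quartiles of a normal variable lie 0.674 standard deviations either side of its mean. The quartile values and function names are hypothetical. The sketch covers only $Y_1$; the conditional assessments for the later categories, and the way they are combined into $V_r$, are described in Chapter 8 and are not reproduced here.

```python
import math

Z_UPPER_QUARTILE = 0.674  # upper quartile of the standard normal, as used in (9.36)

def logit(p):
    """(9.5): Y_1 = log(p_1) - log(1 - p_1)."""
    return math.log(p) - math.log(1.0 - p)

def normal_sd_from_quartiles(lower, upper):
    """Standard deviation of a normal variable with the given quartiles."""
    return (upper - lower) / (2.0 * Z_UPPER_QUARTILE)

# Hypothetical quartiles assessed for p_1.
lower_p1, upper_p1 = 0.40, 0.60
sd_y1 = normal_sd_from_quartiles(logit(lower_p1), logit(upper_p1))
var_y1 = sd_y1 ** 2
print(sd_y1, var_y1)
```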
The method of Kadane et al. (1980) is modified, as in the previous chapter, to estimate
a positive-definite variance-covariance matrix $V_r$ for $\underline{Y}^1 \mid X_r$, from the assessed conditional
medians and quartiles. Because of the unit sum constraint, each positive-definite matrix
$V_r$ is of order $(k - 1)$.
Under the assumptions leading to (9.15), and in view of (9.23) and (9.24), the diagonal
blocks of the block-diagonal matrix $\Sigma_{\beta|\alpha}$ are $\Sigma_{r|\alpha}$, where each $\Sigma_{r|\alpha}$ is given by (9.27), for
$r = 1, 2, \ldots, m$. Hence, $\Sigma_{\beta|\alpha}$ is a positive-definite matrix. The unconditional
variance-covariance matrix $\Sigma_\beta$ will be obtained from $\Sigma_{\beta|\alpha}$ using the covariance matrix $\Sigma_{\alpha,\beta}$. The
latter is elicited as follows.
9.5.4 Eliciting the covariance matrix $\Sigma_{\alpha,\beta}$
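The covariance matrix of $\underline{\alpha}^1$ and $\underline{\beta}^1$ is the matrix $\Sigma_{\alpha,\beta}$ of order $(k - 1) \times m(k - 1)$. To elicit
this matrix, it is convenient to conformally partition $\Sigma_{\alpha,\beta}$ as
$$\Sigma_{\alpha,\beta} = (\Sigma_{\alpha,\beta_1}, \Sigma_{\alpha,\beta_2}, \ldots, \Sigma_{\alpha,\beta_m}), \tag{9.31}$$
where, for $r = 1, 2, \ldots, m$,
$$\Sigma_{\alpha,\beta_r} = \mathrm{Cov}(\underline{\alpha}^1, \underline{\beta}^1_{(r)}). \tag{9.32}$$
We denote the rows of each $\Sigma_{\alpha,\beta_r}$ by $\underline{\sigma}'_{\alpha,\beta_r,t}$, for $t = 2, \ldots, k$, where
$$\underline{\sigma}_{\alpha,\beta_r,t} = \mathrm{Cov}(\alpha_t, \underline{\beta}^1_{(r)}). \tag{9.33}$$
For any specific value $\alpha_t^0$ satisfying $\alpha_t^0 \neq \mu_t$, for $t = 2, \ldots, k$, it can be seen from (9.15),
(9.32), (9.33) and the theory of the multivariate normal distribution that
$$\underline{\mu}_{\beta_r|\alpha_t} = E(\underline{\beta}^1_{(r)} \mid \alpha_t = \alpha_t^0) = \underline{\mu}_{\beta_r} + \frac{\alpha_t^0 - \mu_t}{\mathrm{Var}(\alpha_t)} \, \underline{\sigma}_{\alpha,\beta_r,t}, \tag{9.34}$$
where $\underline{\mu}_{\beta_r} = (\mu_{r,2}, \ldots, \mu_{r,k})'$ is the block of $\underline{\mu}_\beta$ in (9.22) that corresponds to $\underline{\beta}^1_{(r)}$. From this,
$$\underline{\sigma}_{\alpha,\beta_r,t} = \frac{\mathrm{Var}(\alpha_t)}{\alpha_t^0 - \mu_t} \left( \underline{\mu}_{\beta_r|\alpha_t} - \underline{\mu}_{\beta_r} \right). \tag{9.35}$$
Since $\mathrm{Var}(\alpha_t)$ is the $(t - 1)$th element of the main diagonal of $\Sigma_0$ as in (9.25), then, from
(9.32) and (9.33), $\Sigma_{\alpha,\beta_r}$ can be elicited using $(k - 1)$ assessments of $\underline{\mu}_{\beta_r|\alpha_t}$, for $t = 2, \ldots, k$.
Under the normality assumptions, these conditional means of the regression coefficients can
be computed from the conditional median assessments of the classification probabilities. This
can be detailed as follows. For each covariate $X_r$ ($r = 1, 2, \ldots, m$) in turn, the expert is asked
to assume that each single $\alpha_t$ ($t = 2, \ldots, k$) in turn has changed from $\mu_t$ to $\alpha_t^0$, i.e. she is
asked to assume that the true value of $(p_j \mid X_i = 0, \forall i = 1, 2, \ldots, m)$ has changed from $m^*_{j,0,0}$
to a new specific value, $m^*_{j,0,0|\alpha_t}$. This is shown by the change from the black lines to the red
bars in the upper right graph of Figure 9.3. Given this information, the expert then assesses
her median of $(p_j \mid X_r = x_r, X_i = 0, i = 1, 2, \ldots, m, i \neq r)$, which we denote by $m^*_{j,0,r|\alpha_t}$, for
$j = 2, \ldots, k$. These are assessed as the blue bars in the main graph of Figure 9.3.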
[Software screenshot. Upper right graph: conditioning probabilities at the reference point. Main graph: conditional medians for each category for the covariate age.]
Figure 9.3: Assessing conditional medians given changes at the reference point
The choice of the specific values $\alpha_t^0$ is arbitrary, provided that $\alpha_t^0 \neq \mu_t$. However, we
select each of them to be the upper quartile of the normally distributed variable $\alpha_t$, namely,
$$\alpha_t^0 = \mu_t + 0.674\sqrt{\mathrm{Var}(\alpha_t)}, \qquad \text{for } t = 2, \ldots, k. \tag{9.36}$$
This leads, from (9.4), (9.5) and (9.7), to sets of conditioning probabilities, $m^*_{j,0,0|\alpha_t}$, that are
given by
$$m^*_{j,0,0|\alpha_t} = \frac{\exp(a_j)}{1 + \sum_{i=2}^{k} \exp(a_i)}, \qquad \text{for } j = 1, \ldots, k, \tag{9.37}$$
where $a_1 = 0$, $a_t = \alpha_t^0$ and $a_j = \mu_j$, for $j \neq t$.
Since, as in (9.34), we condition on changing $\alpha_t$, for $t = 2, \ldots, k$, one at a time, we have
to compute the resulting conditioning probabilities from this change as in (9.37). If we had
chosen to first change the conditioning probabilities, the desired change for $\alpha_t$ would not have
been guaranteed.
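A minimal Python sketch of (9.36) and (9.37) follows. The means and variances are hypothetical, and the function name is illustrative: one $\alpha_t$ is moved to its upper quartile, and the implied conditioning probabilities at the reference point are then computed through the additive logistic transformation.

```python
import math

def conditioning_probabilities(mu_alpha, var_alpha, t, z=0.674):
    """(9.36)-(9.37): probabilities at the reference point when alpha_t is moved
    to its upper quartile; mu_alpha and var_alpha list categories 2, ..., k."""
    a = [0.0] + list(mu_alpha)                                     # a_1 = 0, a_j = mu_j
    a[t - 1] = mu_alpha[t - 2] + z * math.sqrt(var_alpha[t - 2])   # a_t = alpha_t^0
    exps = [math.exp(v) for v in a[1:]]
    denom = 1.0 + sum(exps)
    return [1.0 / denom] + [e / denom for e in exps]

# Hypothetical elicited values for k = 3 categories.
mu_alpha = [math.log(0.6), math.log(0.4)]
var_alpha = [0.25, 0.36]
probs_t2 = conditioning_probabilities(mu_alpha, var_alpha, t=2)
print(probs_t2)
```

Computing the probabilities from the shifted $\alpha_t$, rather than shifting the probabilities directly, guarantees that $\log(p_t/p_1)$ equals $\alpha_t^0$ exactly while the other log-ratios stay at their means.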
As in (9.17), the corresponding median assessments for $Y_j$ can be computed, for
$j = 2, \ldots, k$, $r = 0, 1, \ldots, m$, and $t = 2, \ldots, k$, as
$$m_{j,0,r|\alpha_t} = \log(m^*_{j,0,r|\alpha_t}) - \log(m^*_{1,0,r|\alpha_t}). \tag{9.38}$$
We denote $E(\beta_{r,j} \mid \alpha_t = \alpha_t^0)$ by $\mu_{r,j|\alpha_t}$ and compute it as follows.
If $X_r$ is a factor, then as in (9.19), we put
$$\mu_{r,j|\alpha_t} = m_{j,0,r|\alpha_t} - m_{j,0,0|\alpha_t}. \tag{9.39}$$
If $X_r$ is a continuous covariate, then as in (9.20), we put
$$\mu_{r,j|\alpha_t} = \frac{m_{j,0,r|\alpha_t} - m_{j,0,0|\alpha_t}}{x_r}, \tag{9.40}$$
for $r = 1, 2, \ldots, m$, $j = 2, \ldots, k$, and $t = 2, \ldots, k$.
Putting
$$\underline{\mu}_{\beta_r|\alpha_t} = (\mu_{r,2|\alpha_t}, \mu_{r,3|\alpha_t}, \ldots, \mu_{r,k|\alpha_t})',$$
all the components of $\underline{\sigma}_{\alpha,\beta_r,t}$ as in (9.35), and hence of $\Sigma_{\alpha,\beta_r}$ as in (9.32), are elicited. Then
$\Sigma_{\alpha,\beta}$ as in (9.31) is fully determined.
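The step from conditional means to a row of $\Sigma_{\alpha,\beta_r}$ in (9.35) can be sketched as below, taking $\alpha_t^0 - \mu_t = 0.674\sqrt{\mathrm{Var}(\alpha_t)}$ as in (9.36). All numerical values and the function name are hypothetical and serve only to show the arithmetic.

```python
import math

def covariance_row(mu_beta_r, mu_beta_r_given, var_alpha_t, z=0.674):
    """(9.35) with alpha_t^0 - mu_t = z * sqrt(Var(alpha_t)), as in (9.36)."""
    shift = z * math.sqrt(var_alpha_t)
    return [var_alpha_t / shift * (c - u) for c, u in zip(mu_beta_r_given, mu_beta_r)]

# Hypothetical unconditional and conditional means of (beta_{r,2}, beta_{r,3}).
mu_beta_r = [0.037, 0.069]
mu_beta_r_given_alpha2 = [0.045, 0.066]
sigma_row = covariance_row(mu_beta_r, mu_beta_r_given_alpha2, var_alpha_t=0.25)
print(sigma_row)
```

Substituting the result back into (9.34) reproduces the conditional means, which gives a quick consistency check on the computed row.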
After obtaining the covariance matrix $\Sigma_{\alpha,\beta}$, and utilizing the elicited matrix $\Sigma_{\beta|\alpha}$, we get
$\Sigma_\beta$ from the conditional variance
$$\Sigma_{\beta|\alpha} = \Sigma_\beta - \Sigma_{\alpha,\beta}' \Sigma_\alpha^{-1} \Sigma_{\alpha,\beta}, \tag{9.42}$$
which gives
$$\Sigma_\beta = \Sigma_{\beta|\alpha} + \Sigma_{\alpha,\beta}' \Sigma_\alpha^{-1} \Sigma_{\alpha,\beta}. \tag{9.43}$$
Since $\Sigma_{\beta|\alpha}$ and $\Sigma_\alpha$ are positive-definite, so is $\Sigma_\beta$. Also, from (9.43) and using the Schur
complement, the full variance-covariance matrix of the multivariate normal prior distribution
in (9.15) is positive-definite. It is of order $(k - 1)(m + 1)$ and does not contain variances
or covariances of $\alpha_1$, nor of the elements of $\underline{\beta}_1$. This is equivalent to the usual identifiability
assumption of base-line multinomial logit models, where the regression coefficients of the
base-line category are set equal to zero.
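The assembly of the full matrix in (9.15) from (9.43) can be checked numerically. The Python sketch below uses hypothetical matrices for $k = 3$ and $m = 1$, so the full matrix is of order $(k-1)(m+1) = 4$. It is written in plain Python with illustrative helper names; a successful Cholesky factorization is used as the test of positive-definiteness.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor; raises ValueError if a is not positive-definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def solve_spd(a, b):
    """Solve a X = b for a symmetric positive-definite a, column by column."""
    low = cholesky(a)
    n, cols = len(a), len(b[0])
    x = [[0.0] * cols for _ in range(n)]
    for c in range(cols):
        z = [0.0] * n
        for i in range(n):            # forward substitution: low z = b[:, c]
            z[i] = (b[i][c] - sum(low[i][p] * z[p] for p in range(i))) / low[i][i]
        for i in reversed(range(n)):  # back substitution: low' x = z
            x[i][c] = (z[i] - sum(low[p][i] * x[p][c] for p in range(i + 1, n))) / low[i][i]
    return x

def matmul(a, b):
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def full_prior_covariance(sigma_a, sigma_ab, sigma_b_given_a):
    """(9.43) followed by the block layout of (9.15)."""
    sigma_ab_t = transpose(sigma_ab)
    correction = matmul(sigma_ab_t, solve_spd(sigma_a, sigma_ab))
    n = len(sigma_b_given_a)
    sigma_b = [[sigma_b_given_a[i][j] + correction[i][j] for j in range(n)] for i in range(n)]
    top = [ra + rab for ra, rab in zip(sigma_a, sigma_ab)]
    bottom = [rt + rb for rt, rb in zip(sigma_ab_t, sigma_b)]
    return sigma_b, top + bottom

# Hypothetical elicited blocks for k = 3 categories and m = 1 covariate.
sigma_a = [[0.34, 0.15], [0.15, 0.91]]
sigma_ab = [[0.02, -0.01], [0.00, 0.03]]
sigma_b_given_a = [[0.004, 0.001], [0.001, 0.006]]
sigma_b, full = full_prior_covariance(sigma_a, sigma_ab, sigma_b_given_a)
factor = cholesky(full)  # succeeds only if the full matrix is positive-definite
```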
9.6 Concluding comments
A novel method has been introduced for eliciting a multivariate normal prior distribution
for the regression coefficients in a multinomial logit model with explanatory covariates. The
method is an extension of our proposed method in Chapter 8 for eliciting a logistic normal
prior for classification probabilities in a multinomial model. Specifically, under a base-line
multinomial logit model containing $k$ categories and $m$ explanatory covariates, the assessment
tasks of a standard multinomial model are repeated $m + 1$ times. The expert assesses
conditional medians and quartiles for the multinomial probabilities at specific values of each
explanatory covariate. This determines a mean vector and a positive-definite
variance-covariance matrix of a multivariate normal prior distribution for the $(k - 1)(m + 1)$
regression coefficients.
Chapter 10
Concluding comments
This chapter summarizes the main results and conclusions of the thesis. We give a brief
review of the elicitation methods proposed throughout this thesis, commenting on the main
assumptions, strengths and weaknesses of each proposed method. In addition, the
inter-relationships between related methods are clarified. The proposed methods
divide naturally into two groups: methods of quantifying expert opinion for GLMs and methods
of prior elicitation for multinomial models. The proposed methods in each group are briefly
discussed in order. Some extensions for future research are given.
The method proposed by Garthwaite and Al-Awadhi (2006) and its extension in Garthwaite
and Al-Awadhi (2011) can be considered a general tool for eliciting a multivariate
normal prior for the regression coefficients in any GLM. In their method, opinion about the
relationship between each continuous predictor variable and the response variable is modeled
by a piecewise-linear function. This gives a flexible model that can represent a wide variety of
opinion. Expert opinion about each categorical predictor variable (factor) is elicited through
a bar-chart. Each slope of the piecewise-linear relationships and each level of the factors has
a corresponding regression coefficient. The expert assesses conditional medians and
quartiles of the response variable at different selected design points. In this sense, the method
applies the idea of the conditional means prior proposed by Bedrick et al. (1996). Conditional
assessments are transformed, under the normality assumption of regression coefficients, to
estimate a mean vector and a variance-covariance matrix for the multivariate normal prior
distribution. Conditional quartiles are assessed in a structured way that ensures that the
resulting matrix is positive-definite.
The method proposed by Garthwaite and Al-Awadhi (2011) has been implemented in
interactive, graphical, user-friendly software, in which the expert draws piecewise-linear curves
and bar-charts by clicking on interactive graphs on a computer screen to give her assessments.
The software computes and offers suggestions to the expert to help reduce the burden of
making assessments. A prototype of this software was written in Java by Jenkinson (2007)
and has been modified and extended in the current thesis to be more flexible and to include
more options. A detailed description of the method and the current modifications to the
software has been given in Chapter 3. Previously the software could only handle logistic
regression, but now it handles a wide range of GLMs. As noted earlier, an important feedback
option has been added to the software. As each covariate is assessed separately, this feedback
option is very useful for helping the expert see the joint impact of all explanatory covariates
that her assessments imply.
A simplifying assumption in the method of Garthwaite and Al-Awadhi (2011), which has
been relaxed in this thesis, is that regression coefficients are independent, a priori, if they
are attached to different explanatory variables. This yielded a block-diagonal
variance-covariance matrix and reduced the number of assessments required for its elicitation.
However, this independence assumption can be unrealistic in many practical situations.
We proposed three elicitation methods for a multivariate normal prior distribution that do
not impose this simplifying assumption. The proposed methods elicit full variance-covariance
matrices, but additional assessments are needed in order to estimate the off-diagonal elements.
As noted earlier, the three proposed methods differ in their flexibility and in the number
of additional assessments that they require. The first method is a direct extension of the
method of Garthwaite and Al-Awadhi (2011). It is the most flexible method among the three
and permits different correlations between regression coefficients attached to the same pair of
covariates. Consequently, it requires a large number of conditional assessments, but it should
prove useful when there are only a few pairs of variables that, a priori, have highly correlated
regression coefficients.
The second proposed method uses only one assessment to model the correlation between
all regression coefficients attached to any specific pair of explanatory covariates. This
assumption, of fixed correlations for all elements belonging to the same pair of vectors of coefficients,
is useful as it reduces the assessment tasks to just one task. The expert is asked to use a
slider to determine the correlation between two vectors of regression coefficients. This can
be attractive as an easy and quick method for eliciting correlations if only two vectors of
regression coefficients are thought to be correlated. Moreover, for the case where more than
two vectors have correlated regression coefficients, we extended the method and showed that it
will yield a full variance-covariance matrix that is positive-definite.
The third method we proposed is suitable for GLMs that contain a large number of
correlated vectors. It uses a few assessments that directly reflect the pattern of correlations
between all pairs of vectors. In a dialogue box, the expert assesses the relative magnitudes
and signs of the average correlations between each pair of vectors. Hence, for $n$ vectors of
coefficients, $n(n - 1)/2$ assessments are needed. These relative magnitudes should reflect the
strength of the average correlation of each pair relative to other pairs. It is a comparatively
easy task for the expert as these assessments need not be coherent correlation coefficients; they
are scaled later to attain statistical coherence. The method avoids incremented conditioning
and assesses all covariances simultaneously.
After the relative magnitudes have been assessed using the PEGS-GLM (Correlated Coefficients)
software, the third method can be used alone or together with one of the other two proposed
methods to obtain correlations. The default option, which implements this method alone, is to
use one slider to determine correlation coefficients based on simultaneous interactive graphs
that show the changes of different variables according to their assessed relative magnitudes.
The other two options need an assessment of the correlation of only one pair of
vectors; all other correlation coefficients are then computed from this assessment using the
relative magnitudes. The correlation assessment for one of the highly correlated pairs may
be obtained using one of the other two proposed methods. The first of them needs more
assessments, while the second method assumes a fixed correlation structure for the elements of
the highly correlated pair of vectors. Figure 10.1 shows the different options available to the
expert for choosing which method to use when she is assessing correlations between regression
coefficients in GLMs. These are the options offered by our PEGS-GLM (Correlated
Coefficients) software, which is freely available at http://statistics.open.ac.uk/elicitation.
[Flowchart. Decision nodes: "Any correlated vectors?", "One pair or more?", "Fixed correlations?", "Weight by a pair?". Outcomes: eliciting a block-diagonal matrix, Method 1, Method 2, Method 3, Finish.]
Figure 10.1: Options for assessing correlations between regression coefficients
To complete the prior structure of GLMs with normal and gamma response variables, we
proposed two methods of eliciting prior distributions for the extra parameters in these models.
One of these methods elicits a conjugate chi-squared prior distribution for the random error
variance in normal linear models. The expert is asked to revise her assessments conditional
on various sets of hypothetical future samples. A number of sets of hypothetical data are used
in order to obtain several estimates of the hyperparameter that is most difficult to assess,
namely, the degrees of freedom parameter of the chi-squared distribution. Reconciliation
of these estimates, using the geometric mean, yields an overall estimate of the number of
degrees of freedom. The second hyperparameter of the chi-squared prior distribution is also
determined from the same assessments. The use of interactive graphical software greatly
facilitates the tasks that the expert must perform.
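The reconciliation step by geometric mean can be illustrated with a short sketch. The individual estimates below are hypothetical placeholders for the values obtained from the separate sets of hypothetical data; the function name is ours and is not part of the software.

```python
import math


def reconcile_degrees_of_freedom(estimates):
    """Combine several positive estimates of the degrees of freedom
    into one overall estimate using their geometric mean."""
    if not estimates or any(e <= 0 for e in estimates):
        raise ValueError("estimates must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(e) for e in estimates) / len(estimates))


# Hypothetical estimates, one per set of hypothetical future data.
overall = reconcile_degrees_of_freedom([4.0, 9.0, 6.0])
```

Compared with the arithmetic mean, the geometric mean is less influenced by a single unusually large estimate, which may be one reason to prefer it for a positive-valued quantity of this kind.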
For a gamma response variable, the additional parameter that must be assessed is the
scale parameter. We assumed that prior opinion about this positive-valued parameter can be
reasonably quantified as a lognormal distribution. To determine the hyperparameters of the
lognormal prior distribution, the expert is asked to give a point estimate and an interquartile
range for the lower quartile of the gamma response variable. We proved that the lower
quartile is a monotonic increasing function of the scale parameter. The expert’s assessments
are thus transformed to quartiles of the lognormal distribution, and hence to the mean and
variance of the lognormal distribution. An example of the questions that can be asked in
order to obtain the expert’s assessments has been given. As noted earlier, no other reasonable
elicitation methods for the scale parameters of gamma GLMs seem to be available in the
literature.
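The final step, from quartiles of a lognormal distribution to its hyperparameters, follows from standard properties of the lognormal and can be sketched as below. The sketch assumes that the lower and upper quartiles of the scale parameter have already been obtained from the expert's assessments through the monotonic transformation; that transformation itself is not reproduced here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_from_quartiles(lower_q, upper_q):
    """Return (mu, sigma, mean, variance) of the lognormal distribution
    whose lower and upper quartiles are lower_q and upper_q.

    mu and sigma are the mean and standard deviation on the log scale.
    """
    if not 0 < lower_q < upper_q:
        raise ValueError("quartiles must satisfy 0 < lower_q < upper_q")
    z75 = NormalDist().inv_cdf(0.75)  # upper quartile of N(0, 1), about 0.6745
    log_lo, log_hi = math.log(lower_q), math.log(upper_q)
    mu = 0.5 * (log_lo + log_hi)  # log of the implied median
    sigma = (log_hi - log_lo) / (2.0 * z75)
    mean = math.exp(mu + 0.5 * sigma ** 2)
    variance = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
    return mu, sigma, mean, variance
```

Two quartiles determine the lognormal distribution exactly, because the logarithm of the parameter is normal and the normal quartiles lie at the mean plus or minus about 0.6745 standard deviations.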
Eliciting flexible prior distributions for the classification probabilities in multinomial
models has been another important interest of this thesis. In this context, we started by proposing
two elicitation methods for the natural conjugate Dirichlet prior. The first method is based
on marginal quartile assessments of the classification probabilities. These assessments were
used to elicit separate marginal beta distributions of the Dirichlet prior distribution. A
normal approximation and least-squares techniques have been used to obtain beta parameters
from the quartile assessments. From three reconciliations of beta distributions into a
Dirichlet prior distribution, the expert is asked to select the reconciliation that best describes her
opinions, based on graphical feedback. The second method elicits conditional quartile
assessments for the classification probabilities. These conditional assessments are used to determine
conditional beta distributions that are averaged to obtain a Dirichlet prior distribution.
The same marginal and conditional quartile assessments for classification probabilities
have been used to elicit two other flexible prior distributions for multinomial models.
Conditional quartile assessments were used to elicit conditional beta distributions of a generalized
Dirichlet prior distribution. As noted earlier, this distribution is more flexible than the
standard Dirichlet distribution for quantifying expert opinion. It has the same number of
hyperparameters as the total number of parameters in the conditional beta distributions that
determine it. Hence no reconciliation is needed. The generalized Dirichlet distribution has a
more general dependence structure than the standard Dirichlet. For example, its correlation
structure allows positive correlations between classification probabilities.
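The link between the conditional beta distributions and the generalized Dirichlet distribution can be made concrete with a small sampling sketch, following the construction of Connor and Mosimann (1969): each conditional beta variate takes a share of the probability mass that remains after the earlier categories. For k categories there are k - 1 beta distributions and hence 2(k - 1) hyperparameters. The parameter values and the function name below are hypothetical.

```python
import random


def sample_generalized_dirichlet(beta_params, rng):
    """Draw one probability vector from a generalized Dirichlet distribution.

    beta_params holds one (a, b) pair per conditional beta distribution;
    the result has len(beta_params) + 1 components that sum to one.
    """
    remaining = 1.0  # probability mass not yet allocated
    probs = []
    for a, b in beta_params:
        share = rng.betavariate(a, b)  # conditional proportion of what remains
        probs.append(share * remaining)
        remaining *= 1.0 - share
    probs.append(remaining)  # last category receives the leftover mass
    return probs


rng = random.Random(2012)
# Three categories, so two conditional beta distributions (hypothetical values).
p = sample_generalized_dirichlet([(2.0, 5.0), (3.0, 3.0)], rng)
```

Because the elicited conditional beta parameters enter this construction directly, no reconciliation step is required.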
Marginal assessments were used to elicit marginal beta distributions for multinomial
probabilities. Then, instead of assuming a Dirichlet prior, the beta marginals were used in a
Gaussian copula function to model the joint prior distribution of multinomial probabilities.
This required further conditional quartile assessments to describe the correlation structure
between these probabilities. The monotonicity of the Gaussian copula transformation allowed
conditional quartiles of the multinomial probabilities to be transformed into normal quartiles.
The latter were used to obtain product-moment correlations for normal variates. This
powerful technique of transforming quartiles avoids the difficulties encountered when transforming
product-moment correlations. Structural assessment of the conditional quartiles has been
used to ensure that the elicited variance-covariance matrix is positive-definite.
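The quartile transformation can be illustrated in the simplest bivariate case. The sketch below is a simplified illustration rather than the full structured procedure of the thesis. It assumes that the conditioning value of the first probability and the expert's conditional median of the second have already been converted to marginal probabilities `u_conditioning` and `u_conditional_median` through the elicited marginal distribution functions (both names are hypothetical). Monotone transformations preserve quantiles, and for a bivariate standard normal pair the conditional median of the second variate is the correlation times the conditioning value, which gives the correlation as a ratio.

```python
from statistics import NormalDist


def copula_correlation(u_conditioning, u_conditional_median):
    """Correlation of the underlying normal variates in a bivariate Gaussian copula.

    u_conditioning: marginal CDF value of the first variable at the value the
        expert is asked to condition on (must not be 0.5).
    u_conditional_median: marginal CDF value of the second variable at the
        expert's conditional median.
    """
    std_normal = NormalDist()
    z1 = std_normal.inv_cdf(u_conditioning)
    z2 = std_normal.inv_cdf(u_conditional_median)
    if z1 == 0.0:
        raise ValueError("conditioning on the median carries no information")
    rho = z2 / z1  # conditional median of Z2 given Z1 = z1 equals rho * z1
    if not -1.0 < rho < 1.0:
        raise ValueError("assessments imply a correlation outside (-1, 1)")
    return rho
```

For example, conditioning on the upper quartile of the first probability (`u_conditioning = 0.75`) and assessing a conditional median of the second probability that sits at its own marginal median (`u_conditional_median = 0.5`) gives a correlation of zero.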
The conditional quartile assessments that were used to elicit correlations for a Gaussian
copula prior were also used in a new method for eliciting a logistic normal prior distribution for
multinomial probabilities. Quantifying expert opinion as a logistic normal prior raised some
interesting points that do not seem to have arisen in elicitation contexts before. We made
use of the natural approximation of the lognormal sum by another lognormally distributed
random variable. In addition, our proposed method has made extensive use of the notion of a singular
multivariate normal distribution; the available literature shows that the conditional properties of
the singular normal distribution are nearly identical to the corresponding properties of the
standard normal distribution. These results were used to prove that the medians, not only the
means, of multinomial probabilities must sum to one, assuming they follow a logistic normal
distribution. This was critical in building the elicitation method as it enables assistance to
be given to the expert that leads to statistically coherent assessments.
The four proposed prior distributions are interrelated regarding the assessments that they
use. Each type of assessment can be used to elicit more than one prior distribution. The
Prior Elicitation Graphical Software package for Multinomial models, PEGS-Multinomial,
which is freely available at http://statistics.open.ac.uk/elicitation, arranges the assessment
tasks that are required for the four proposed prior distributions. Software is also available
that elicits each of the prior distributions separately. The flowchart in Figure 10.2 shows the
options for prior distributions that are available in PEGS-Multinomial and the corresponding
assessments that they require. For example, it shows that a Gaussian copula prior is elicited
using two types of assessments, and that a standard Dirichlet prior is elicited using either
marginal or conditional assessments, as discussed before. Since conditional beta assessments
can be used to elicit both the standard and generalized Dirichlet distributions, the software
gives the option of eliciting both of them using the same conditional quartiles.
[Flowchart. Starting node: "Which Prior?" with the options Standard Dirichlet, Generalized Dirichlet, Gaussian Copula and Logistic Normal. Decision nodes: "Marginal or Conditional?", "Generalized Dirichlet?", "Standard Dirichlet?". Assessment boxes: Marginal Beta Assessments, Conditional Beta Assessments, Conditional Quartiles Assessments. Output boxes: Standard Dirichlet Prior, Generalized Dirichlet Prior, Gaussian Copula Prior, Logistic Normal Prior, Finish.]
Figure 10.2: A flowchart of the prior elicitation software for multinomial models
All the proposed prior elicitation methods for multinomial models and their implementing
software have been used in examples by real experts. In all examples, the experts suggested
the problem according to their fields of expertise. They understood the multinomial
formulation and were keen to participate in the elicitation process. After a brief discussion about the
ideas of the bisection method and conditional assessments, they had no problem in assessing
quartiles and conditional quartiles. All the experts expressed the view that visualization of
the problem had helped them considerably in quantifying their opinions. They also made use of the
coherent suggestions given by the software and used the feedback options to revise some of
their assessments. Thus the software proved important in providing visualization, coherent
suggestions and feedback. It also helped the experts review and revise their assessments, and
reduced the time taken by the elicitation processes.
Future research in assessment methods for GLMs may include eliciting prior
distributions for the overdispersion parameters in binomial and Poisson GLMs. In these important
GLMs, it is common that the data show a greater variability than the theoretical variability
assumed by the model. However, no elicitation method has been proposed in the literature
for quantifying opinion about overdispersion parameters. A reasonable approach might be
to assume a generalized binomial distribution or a generalized Poisson distribution for the
response variable, instead of the standard binomial or Poisson distributions. These
generalized distributions have extra parameters that allow for overdispersion. Methods of assessing
suitable prior distributions for these extra parameters need to be developed.
Another extension of the proposed method for GLM elicitation concerns the proportional
hazard model. This model, also known as the Cox regression model, is often used to model
survival data in medical research; see, for example, Collett (1994). Owing to its wide practical
importance, a large body of research has been devoted to investigating both theoretical and
applied aspects of Bayesian analysis of a proportional hazard model. See Ibrahim and Chen
(1998) and Zuashkiani et al. (2008), among others. Quantifying opinion about these models
has also attracted some attention. See, for example, Chaloner et al. (1993) and Henschel
et al. (2009). The current GLM elicitation methods need to be adapted to handle a
proportional hazard model.
The method of eliciting logistic normal prior distributions for multinomial models has
already been extended further in Chapter 9. The extended method treats the case of
multinomial models in which classification probabilities are influenced by explanatory covariates.
Specifically, we proposed a method that quantifies opinion about the parameters of a
baseline multinomial logit model as a multivariate normal prior distribution. The method uses
conditional median and quartile assessments for the classification probabilities at different
combinations of the explanatory variables. These assessments have been obtained in a
structured way that yields the mean vector and positive-definite variance-covariance matrix of
the prior multivariate normal distribution. Another desirable extension would be to elicit a
logistic normal prior distribution for the cell probabilities of contingency tables. The logistic
normal distribution is considered a reasonable prior for contingency tables; see, for example,
Goutis (1993). Hence, our proposed elicitation method for a logistic normal prior promises
to be useful in further contexts.
Other models for which elicitation methods still need to be developed include time series
analysis, extreme values analysis and modelling the spread of infectious diseases. These
models sometimes investigate cases for which data are scarce, the events are rare, or situations
are new and uncontrollable. Expert opinion is highly important in such situations, so the
need for appropriate elicitation methods is clear.
Bibliography
Abadir, K. M. and Magnus, J. R. (2005). Matrix Algebra. Cambridge University Press, New
York.
Agresti, A. (2002). Categorical Data Analysis. Wiley Series in Probability and Statistics.
John Wiley & Sons, Inc., New Jersey, second edition.
Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal
Statistical Society, Series B, 44, 139-177.
Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman and Hall,
London.
Al-Awadhi, S. A. (1997). Elicitation of Prior Distribution for a Multivariate Normal
Distribution. Ph.D. thesis, University of Aberdeen, UK.
Al-Awadhi, S. A. and Garthwaite, P. H. (1998). An elicitation method for multivariate normal
distributions. Communications in Statistics-Theory and Methods, 27, 1123-1142.
Al-Awadhi, S. A. and Garthwaite, P. H. (2001). Prior distribution assessment for a
multivariate normal distribution: An experimental study. Journal of Applied Statistics, 28,
5-23.
Al-Awadhi, S. A. and Garthwaite, P. H. (2006). Quantifying expert opinion for modelling
fauna habitat distributions. Computational Statistics, 21, 121-140.
Albajar, R. A. and Fidalgo, J. F. L. (1997). Characterizing the general multivariate normal
distribution through the conditional distributions. Extracta Mathematica, 12, 15-18.
Albert, J. H. and Gupta, A. K. (1982). Mixtures of Dirichlet distributions and estimation in
contingency tables. The Annals of Statistics, 10, 1261-1268.
Beaulieu, N. C. and Xie, Q. (2004). An optimal lognormal approximation to lognormal sum
distributions. IEEE Transactions on Vehicular Technology, 53, 479-489.
Bedrick, E. J., Christensen, R., and Johnson, W. (1996). A new perspective on priors for
generalized linear models. Journal of the American Statistical Association, 91, 1450-1460.
Bland, R. P. and Owen, D. B. (1966). A note on singular normal distributions. Annals of
the Institute of Statistical Mathematics, 18, 113-116.
Bunn, D. W. (1978). Estimation of a Dirichlet prior distribution. Omega, 6, 371-373.
Bunn, D. W. (1979). Estimation of subjective probability distributions in forecasting and
decision making. Technological Forecasting and Social Change, 14, 205-216.
Chaloner, K. and Duncan, G. T. (1983). Assessment of a beta prior distribution: PM
elicitation. The Statistician, 32, 174-180.
Chaloner, K. and Duncan, G. T. (1987). Some properties of the Dirichlet-multinomial
distribution and its use in prior elicitation. Communications in Statistics-Theory and Methods,
16, 511-523.
Chaloner, K., Church, T., Louis, T. A., and Matts, J. P. (1993). Graphical elicitation of a
prior distribution for a clinical trial. The Statistician, 42, 341-353.
Chen, M.-H. and Dey, D. K. (2003). Variable selection for multivariate logistic regression
models. Journal of Statistical Planning and Inference, 111, 37-55.
Chen, M.-H. and Ibrahim, J. G. (2003). Conjugate priors for generalized linear models.
Statistica Sinica, 13, 461-476.
Chen, M.-H., Ibrahim, J. G., and Yiannoutsos, C. (1999). Prior elicitation, variable selection
and Bayesian computation for logistic regression models. Journal of the Royal Statistical
Society, Series B, 61, 223-242.
Chen, M.-H., Ibrahim, J. G., and Shao, Q.-M. (2000). Power prior distributions for generalized
linear models. Journal of Statistical Planning and Inference, 84, 121-137.
Chen, M.-H., Ibrahim, J. G., Shao, Q.-M., and Weiss, R. E. (2003). Prior elicitation for
model selection and estimation in generalized linear mixed models. Journal of Statistical
Planning and Inference, 111, 57-76.
Chen, M.-H., Huang, L., Ibrahim, J. G., and Kim, S. (2008). Bayesian variable selection
and computation for generalized linear models with conjugate priors. Bayesian Analysis,
3, 585-614.
Clemen, R. T. and Reilly, T. (1999). Correlations and copulas for decision and risk analysis.
Management Science, 45, 208-224.
Clemen, R. T., Fischer, G. W., and Winkler, R. L. (2000). Assessing dependence: Some
experimental results. Management Science, 46, 1100-1115.
Collett, D. (1994). Modelling Survival Data in Medical Research. Chapman and Hall, London.
Connor, R. J. and Mosimann, J. E. (1969). Concepts of independence for proportions with a
generalization of the Dirichlet distribution. Journal of the American Statistical Association,
64, 194-206.
Daneshkhah, A. and Oakley, J. (2010). Eliciting multivariate probability distributions. In
K. Bocker, editor, Rethinking Risk Measurement and Reporting: Volume I. Risk Books,
London.
Demarta, S. and McNeil, A. J. (2005). The t copula and related copulas. International
Statistical Review, 73, 111-129.
Denham, R. and Mengersen, K. (2007). Geographically assisted elicitation of expert opinion
for regression models. Bayesian Analysis, 2, 99-136.
Dickey, J. M. (1968). Three multidimensional-integral identities with Bayesian applications.
The Annals of Mathematical Statistics, 39, 1615-1627.
Dickey, J. M. (1983). Multiple hypergeometric functions: Probabilistic interpretations of
statistical uses. Journal of the American Statistical Association, 78, 628-637.
Dickey, J. M., Jiang, J. M., and Kadane, J. B. (1983). Bayesian methods for multinomial
sampling with noninformatively missing data. Technical Report 6/83 - #15, State University
of New York at Albany, Department of Mathematics and Statistics.
Dickey, J. M., Dawid, A. P., and Kadane, J. B. (1986). Subjective probability assessment
methods for multivariate-t and matrix-t models. In P. Goel and A. Zellner, editors,
Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pages
177-195. North-Holland, Amsterdam.
Fan, D. Y. (1991). The distribution of the product of independent beta variables.
Communications in Statistics-Theory and Methods, 20, 4043-4052.
Fenton, L. F. (1960). The sum of log-normal probability distributions in scatter transmission
systems. IRE Transactions on Communications Systems, CS-8, 57-67.
Fischer, M. H. (2001). Cognition in the bisection task. TRENDS in Cognitive Sciences, 5,
460-462.
Flanagan, M. T. (2011). Michael Thomas Flanagan’s Java scientific library.
http://www.ee.ucl.ac.uk/~mflanaga/java/. [Accessed 9 March 2011].
Forster, J. J. and Skene, A. M. (1994). Calculation of marginal densities for parameters of
multinomial distributions. Statistics and Computing, 4, 279-286.
Frees, E. W. and Valdez, E. A. (1998). Understanding relations using copulas. North
American Actuarial Journal, 2, 1-25.
Garthwaite, P. H. (1994). Assessment of prior distributions for regression models: An
experimental study. Communications in Statistics-Simulation and Computation, 23, 871-895.
Garthwaite, P. H. (1998). Quantifying expert opinion for modelling habitat distributions.
Technical report, Sustainable Forest Management 1998/02, Department of Natural
Resources Queensland.
Garthwaite, P. H. and Al-Awadhi, S. A. (2001). Non-conjugate prior distribution assessment
for multivariate normal sampling. Journal of the Royal Statistical Society, Series B, 63,
95-110.
Garthwaite, P. H. and Al-Awadhi, S. A. (2006). Quantifying opinion about a logistic
regression using interactive graphics. Technical Report 06/07, Statistics Group, The Open
University, UK.
Garthwaite, P. H. and Al-Awadhi, S. A. (2011). Quantifying subjective opinion about
generalized linear and piecewise-linear models. In preparation.
Garthwaite, P. H. and Dickey, J. M. (1985). Double- and single-bisection methods for
subjective probability assessment in a location-scale family. Journal of Econometrics, 29,
149-163.
Garthwaite, P. H. and Dickey, J. M. (1988). Quantifying expert opinion in linear regression
problems. Journal of the Royal Statistical Society, Series B, 50, 462-474.
Garthwaite, P. H. and Dickey, J. M. (1992). Elicitation of prior distributions for variable
selection problems in regression. Annals of Statistics, 20, 1697-1719.
Garthwaite, P. H. and O’Hagan, A. (2000). Quantifying expert opinion in UK water industry:
An experimental study. The Statistician, 49, 455-477.
Garthwaite, P. H., Kadane, J. B., and O’Hagan, A. (2005). Statistical methods for eliciting
probability distributions. Journal of the American Statistical Association, 100, 680-701.
Garthwaite, P. H., Chilcott, J. B., Jenkinson, D. J., and Tappenden, P. (2008). Use of expert
knowledge in evaluating costs and benefits of alternative service provisions: A case study.
International Journal of Technology Assessment in Health Care, 24, 350-357.
Gautschi, W. (1998). The incomplete gamma functions since Tricomi. In Tricomi’s Ideas
and Contemporary Applied Mathematics, Atti dei Convegni Lincei, n. 147, Accademia
Nazionale dei Lincei, Roma, pages 203-237.
Genz, A. and Kwong, K. (1999). Numerical evaluation of singular multivariate normal
distributions. Journal of Statistical Computation and Simulation, 68, 1-21.
Good, I. J. (1976). On the application of symmetric Dirichlet distributions and their mixtures
to contingency tables. The Annals of Statistics, 4, 1159-1189.
Goutis, C. (1993). Bayesian estimation methods for contingency tables. Journal of the Italian
Statistical Society, 2, 35-54.
Grunwald, G. K., Raftery, A. E., and Guttorp, P. (1993). Time series of continuous
proportions. Journal of the Royal Statistical Society, Series B, 55, 103-116.
Gupta, A. K. and Nadarajah, S. (2004). Products and linear combinations. In A. K. Gupta
and S. Nadarajah, editors, Handbook of Beta Distribution and Its Applications. Marcel
Dekker, Inc., New York.
Hankin, R. K. S. (2010). A generalization of the Dirichlet distribution. Journal of Statistical
Software, 33, 1-18.
Henschel, V., Engel, J., Holzel, D., and Mansmann, U. (2009). A semiparametric Bayesian
proportional hazards model for interval censored data with frailty effects. BMC Medical
Research Methodology, 9:9. Available at http://www.biomedcentral.com/1471-2288/9/9
[Accessed 27th February 2009].
Hogarth, R. M. (1975). Cognitive processes and the assessment of subjective probability
distributions. Journal of the American Statistical Association, 70, 271-289.
Hora, S. C., Hora, J. A., and Dodd, N. G. (1992). Assessment of probability distributions
for continuous random variables: A comparison of the bisection and fixed value methods.
Organizational Behavior and Human Decision Processes, 51, 133-155.
Hughes, G. and Madden, L. V. (2002). Some methods for eliciting expert knowledge of plant
disease epidemics and their application in cluster sampling for disease incidence. Crop
Protection, 21, 203-215.
Ibrahim, J. G. and Chen, M.-H. (1998). Prior distributions and Bayesian computation for
proportional hazards models. Sankhya: The Indian Journal of Statistics, Spl Series,
48-64.
Ibrahim, J. G. and Laud, P. W. (1991). On Bayesian analysis of generalized linear models
using Jeffreys’s prior. Journal of the American Statistical Association, 86, 981-986.
Ibrahim, J. G. and Laud, P. W. (1994). A predictive approach to the analysis of designed
experiments. Journal of the American Statistical Association, 89, 309-319.
James, A., Low Choy, S., and Mengersen, K. L. (2010). Elicitator: an expert elicitation tool
for regression in ecology. Environmental Modelling & Software, 25, 129-145.
Jenkinson, D. J. (2007). Quantifying Expert Opinion as a Probability Distribution. Ph.D.
thesis, The Open University, UK.
Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London.
Johnson, N., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate Distributions,
volume 1. Wiley, New York, second edition.
Jouini, M. N. and Clemen, R. T. (1996). Copula models for aggregating expert opinions.
Operations Research, 44, 444-457.
Kadane, J. B. and Wolfson, L. J. (1998). Experiences in elicitation. The Statistician, 47,
3-19.
Kadane, J. B., Dickey, J. M., Winkler, R., Smith, W., and Peters, S. (1980). Interactive
elicitation of opinion for a normal linear model. Journal of the American Statistical
Association, 75, 845-854.
Kadane, J. B., Shmueli, G., Minka, T. P., Borle, S., and Boatwright, P. (2006). Conjugate
analysis of the Conway-Maxwell-Poisson distribution. Bayesian Analysis, 1, 363-374.
Khatri, C. G. (1968). Some results for the singular normal multivariate regression models.
Sankhya, 30, 267-280.
Koornwinder, T. H. (2008). On a monotonicity property of the normalized incomplete
gamma function. http://staff.science.uva.nl/~thk/art/comment/. [Accessed 5 December
2011].
Krzysztofowicz, R. and Reese, S. (1993). Stochastic bifurcation processes and distributions
of fractions. Journal of the American Statistical Association, 88, 345-354.
Kurowicka, D. and Cooke, R. (2006). Uncertainty Analysis with High Dimensional
Dependence Modelling. Wiley Series in Probability and Statistics. John Wiley & Sons Ltd,
Chichester.
Kwong, K. and Iglewicz, B. (1996). On singular multivariate normal distribution and its
applications. Computational Statistics and Data Analysis, 22, 271-285.
Kynn, M. (2005). Eliciting Expert Knowledge for Bayesian Logistic Regression in Species
Habitat Modelling in Natural Resources. Ph.D. thesis, Queensland University of
Technology, Australia.
Kynn, M. (2006). Designing elicitor: Software to graphically elicit priors for logistic
regression models in ecology. Available at
http://www.winbugs-development.org.uk/elicitor/files/designing.elicitor.pdf
[Accessed 10th October 2008].
Kynn, M. (2008). The ‘heuristics and biases’ bias in expert elicitation. Journal of the Royal
Statistical Society, Series A, 171, 239-264.
Leonard, T. (1975). Bayesian estimation methods for two-way contingency tables. Journal
of the Royal Statistical Society, Series B, 37, 23-37.
Lewandowski, D. (2008). High Dimensional Dependence: Copulae, Sensitivity, Sampling.
Ph.D. thesis, Delft University of Technology, Netherlands.
Lindley, D. V., Tversky, A., and Brown, R. V. (1979). On the reconciliation of probability
assessments. Journal of the Royal Statistical Society, Series A, 142, 146-180.
Lochner, R. H. (1975). A generalized Dirichlet distribution in Bayesian life testing. Journal
of the Royal Statistical Society, Series B, 37, 103-113.
Low Choy, S., O’Leary, R., and Mengersen, K. (2009). Elicitation by design in ecology: Using
expert opinion to inform priors for Bayesian statistical models. Ecology, 90, 265-277.
Low Choy, S., James, A., Murray, J., and Mengersen, K. (2010). Indirect elicitation from
ecological experts: From methods and software to habitat modelling and rock-wallabies.
In A. O’Hagan and M. West, editors, The Oxford Handbook of Applied Bayesian Analysis.
Oxford University Press, Inc., New York.
Mahmoud, A. S. H. (2010). New quadrature-based approximations for the characteristic
function and the distribution function of sums of lognormal random variables. IEEE
Transactions on Vehicular Technology, 59, 3364-3372.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman and Hall,
London, second edition.
Meyer, M. C. and Laud, P. W. (2002). Predictive variable selection in generalized linear
models. Journal of the American Statistical Association, 97, 859-871.
Miller, R. B. (1980). Bayesian analysis of the two-parameter gamma distribution.
Technometrics, 22, 65-69.
Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. Computer
Journal, 7, 308-313.
Nelsen, R. B. (1999). An Introduction to Copulas. Lecture Notes in Statistics, 139.
Springer-Verlag, New York.
Oakley, J. (2010). Eliciting univariate probability distributions. In K. Bocker, editor,
Rethinking Risk Measurement and Reporting: Volume I. Risk Books, London.
Oakley, J. E. and O’Hagan, A. (2010). SHELF: the Sheffield elicitation framework
(version 2.0). School of Mathematics and Statistics, University of Sheffield, UK.
http://tonyohagan.co.uk/shelf. [Accessed 9 March 2011].
O’Hagan, A. (1998). Eliciting expert beliefs in substantial practical applications. The
Statistician, 47, 21-35.
O’Hagan, A. and Forster, J. (2004). Bayesian Inference, volume 2B of Kendall’s Advanced
Theory of Statistics. Arnold, London, second edition.
O’Hagan, A., Buck, C. E., Daneshkhah, A., Eiser, J. R., Garthwaite, P. H., Jenkinson, D. J.,
Oakley, J. E., and Rakow, T. (2006). Uncertain Judgements: Eliciting Expert Probabilities.
John Wiley, Chichester.
O’Leary, R. A., Low Choy, S., Murray, J. V., Kynn, M., Denham, R., Martin, T. G., and
Mengersen, K. (2009). Comparison of three expert elicitation methods for logistic regression
on predicting the presence of the threatened brush-tailed rock-wallaby Petrogale penicillata.
Environmetrics, 20, 379-398.
Oman, S. D. (1985). Specifying a prior distribution in structured regression problems. Journal
of the American Statistical Association, 80, 190-195.
Palomo, J., Insua, D. R., and Ruggeri, F. (2007). Modeling external risks in project
management. Risk Analysis, 27, 961-978.
Patel, J. K. and Read, C. B. (1982). Handbook of the Normal Distribution. Marcel Dekker,
Inc., New York.
Peterson, C. R. and Beach, L. R. (1967). Man as an intuitive statistician. Psychological
Bulletin, 68, 29-46.
Powers, D. A. and Xie, Y. (2000). Statistical Methods for Categorical Data Analysis.
Academic Press, San Diego, CA.
Pratt, J. W., Raiffa, H., and Schlaifer, R. (1995). Introduction to Statistical Decision Theory.
The MIT Press, London.
Rao, C. R. (2002). Linear Statistical Inference and its Applications. John Wiley & Sons,
Inc., New York, second edition.
Rayens, W. S. and Srinivasan, C. (1994). Dependence properties of generalized Liouville
distributions on the simplex. Journal of the American Statistical Association, 89, 1465-1470.
Safak, A. (1993). Statistical analysis of the power sum of multiple correlated log-normal
components. IEEE Transactions on Vehicular Technology, 42, 58-61.
Schwartz, S. C. and Yeh, Y. S. (1982). On the distribution function and moments of power
sums with log-normal components. The Bell System Technical Journal, 61, 1441-1462.
Shields, M., Gorber, S. C., and Tremblay, M. S. (2008). Effects of measurement on obesity
and morbidity. Health Reports, Statistics Canada, Catalogue 82-003, 19, 1-8.
Stael von Holstein, C. A. S. (1971). The effect of learning on the assessment of subjective
probability distributions. Organizational Behavior and Human Performance, 6, 304-315.
Styan, G. P. H. (1970). Notes on the distribution of quadratic forms in singular normal
variables. Biometrika, 57, 567-572.
Sweeting, T. (1981). Scale param eters: a Bayesian treatm ent. Journal of the Royal Statistical
Society, Series B , 43, 333-338.
Tellambura, C. and Senaratne, D. (2010). Accurate computation of the MGF of the lognormal
distribution and its application to sum of lognormals. IEEE Transactions on
Communications, 58, 1568-1577.
Tian, G. L., Tang, M. L., Yuen, K. C., and Ng, K. W. (2010). Further properties and
new applications of the nested Dirichlet distribution. Computational Statistics and Data
Analysis, 54, 394-405.
Tricomi, F. G. (1952). Sulla funzione gamma incompleta. Annali di Matematica Pura ed
Applicata, 31, 263-279.
van Dorp, J. R. and Kotz, S. (2002). A novel extension of the triangular distribution and its
parameter estimation. The Statistician, 51, 63-79.
van Dorp, J. R. and Mazzuchi, T. A. (2000). Solving for the parameters of a beta distribution
under two quantile constraints. Journal of Statistical Computation and Simulation, 67,
189-201.
van Dorp, J. R. and Mazzuchi, T. A. (2003). Parameter specification of the beta distribution
and its Dirichlet extensions utilizing quantiles. Beta Distributions and Its Applications,
29, 1-37.
van Dorp, J. R. and Mazzuchi, T. A. (2004). Parameter specification of the beta distribution
and its Dirichlet extensions utilizing quantiles. In A. K. Gupta and S. Nadarajah, editors,
Handbook of Beta Distribution and Its Applications. Marcel Dekker, Inc., New York.
Wallsten, T. S. and Budescu, D. V. (1983). Encoding subjective probabilities: A psychological
and psychometric review. Management Science, 29, 151-173.
West, M. (1985). Generalized linear models: Scale parameters, outlier accommodation and
prior distributions. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith,
editors, Bayesian Statistics 2, pages 531-558. Elsevier, North-Holland.
West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models. Springer-Verlag,
New York, second edition.
West, M., Harrison, P. J., and Migon, H. S. (1985). Dynamic generalized linear models and
Bayesian forecasting. Journal of the American Statistical Association, 80, 73-83.
Wilks, S. S. (1962). Mathematical Statistics. John Wiley & Sons, Inc., New York.
Winkler, R. L. (1967). The assessment of prior distributions in Bayesian analysis. Journal
of the American Statistical Association, 62, 776-800.
Wong, T. T. (1998). Generalized Dirichlet distribution in Bayesian analysis. Applied
Mathematics and Computation, 97, 165-181.
Wong, T. T. (2005). A Bayesian approach employing generalized Dirichlet priors in predicting
microchip yields. Journal of the Chinese Institute of Industrial Engineers, 22, 210-217.
Wong, T. T. (2007). Perfect aggregation of Bayesian analysis on compositional data.
Statistical Papers, 48, 265-282.
Wong, T. T. (2010). Parameter estimation of generalized Dirichlet distributions from the
sample estimates of the first and the second moments of random variables. Computational
Statistics and Data Analysis, 54, 1756-1765.
Yi, W. and Bier, V. M. (1998). An application of copulas to accident precursor analysis.
Management Science, 44, S257-S270.
Zuashkiani, A., Banjevic, D., and Jardine, A. (2008). Estimating parameters
of proportional hazards model based on expert knowledge and statistical
data. Journal of the Operational Research Society, pages 1-16. Available
at http://www.palgrave-journals.com/jors/journal/vaop/ncurrent/full/jors2008119a.html
[Accessed 17th June 2009].