Elicitation of Subjective Probability Distributions

2012

To incorporate expert opinion into a Bayesian analysis, it must be quantified as a prior distribution through an elicitation process that asks the expert meaningful questions whose answers determine this distribution. The aim of this thesis is to fill some gaps in the available techniques for eliciting prior distributions for Generalized Linear Models (GLMs) and multinomial models. A general method for quantifying opinion about GLMs was developed in Garthwaite and Al-Awadhi (2006). They model the relationship between each continuous predictor and the dependent variable as a piecewise-linear function with a regression coefficient at each of its dividing points. However, coefficients were assumed a priori independent if associated with different predictors. We relax this simplifying assumption and propose three new methods for eliciting positive-definite variance-covariance matrices of a multivariate normal prior distribution. In addition, we extend the method of Garthwaite and Dickey...

Open Research Online
The Open University’s repository of research publications and other research outputs

Elicitation of Subjective Probability Distributions
Thesis

How to cite:
Elfadaly, Fadlalla Ghaly Hassan Mohamed (2012). Elicitation of Subjective Probability Distributions. PhD thesis The Open University.

© 2012 The Author
https://creativecommons.org/licenses/by-nc-nd/4.0/
Version: Version of Record
Link(s) to article on publisher’s website: http://dx.doi.org/doi:10.21954/ou.ro.0000f119
Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright owners.
ProQuest Number: 13835951

Elicitation of Subjective Probability Distributions

By
Fadlalla Ghaly Hassan Mohamed Elfadaly
BSc., Cairo University. MSc., Cairo University.

A thesis submitted for the Degree of Doctor of Philosophy in Statistics

Department of Mathematics and Statistics
The Open University, UK
April 2012

Acknowledgements

Faithful gratitude, sincere thanks, and appreciation are due to Prof. Paul Garthwaite, The Open University, UK, for suggesting the research topic, his supervision, guidance, valuable advice, encouragement, kindness, deep interest and continuous help during the preparation of this thesis. I would also like to thank my co-supervisor Dr. Robin Laney, The Open University, UK, for his advice, directions and his continuous willingness to help.

I would like to express my deepest gratitude to my viva examination panel, Prof. Jim Smith, Warwick University, UK, Prof. Kevin McConway and Dr. Karen Vines, The Open University, UK, for their valuable comments, constructive criticism and helpful suggestions. Also, many thanks are due to the experts whose opinions were quantified in the examples of this thesis. I am very grateful to Dr. Neville Calleja, Ministry of Health, the Elderly and Community Care, Malta, for quantifying his opinion in the obesity misclassification example, and to Dr. Stephen Burnley and Dr. James Warren, The Open University, UK, for quantifying their opinions in the waste collection and transport preferences examples, respectively.

I wish to thank all members of the Statistics Group, The Open University, UK. They all helped me a lot in a very cooperative and supportive research environment that leads to continuous progress and achievement. Special gratitude to the previous PhD students, Dr. Yoseph Araya, Dr. David Jenkinson, Dr. Swarup De, Dr. Youssef Elaziz, Dr. Steffen Unkel, Dr. Angela Noufaily and Dr. Doyo Gragn and also to the current PhD students, Mr. Osvaldo Anacleto-Junior, Mr. Yonas Weldeselassie, Mr. Alexandre Santos and Miss Sofia Villers. They formed a great academic and social atmosphere for effective work.

True gratitude and deep appreciation are due to Prof.
Abdel-Hamid Nigm, Cairo University, Egypt, for suggesting, encouraging, and making fruitful efforts to help me undertake my PhD in the UK. I am also very grateful to Prof. Sanaa El Gayar, Cairo University, Egypt, for her faithful guidance and support during my studies for BSc and MSc degrees. She really guided my first steps on an academic career. I am highly indebted to Dr. Osama Saleh, Cairo University, Egypt, for being such a sincere, supportive and helpful friend.

Hearty and earnest thanks go to the soul of my late father Mr. Ghaly Elfadaly, my caring mother Mrs. Aziza Belal and my two kind sisters Dr. Hanan and Mrs. Fadila Elfadaly, for their faithful wishes and prayers. I am truly and heartily grateful to my beloved wife, Mrs. Nehal Marghany, for her steady love, care and support, and to our son, Master Malek Elfadaly, who lightened up our life with cheer, happiness and innocence.

Abstract

To incorporate expert opinion into a Bayesian analysis, it must be quantified as a prior distribution through an elicitation process that asks the expert meaningful questions whose answers determine this distribution. The aim of this thesis is to fill some gaps in the available techniques for eliciting prior distributions for Generalized Linear Models (GLMs) and multinomial models. A general method for quantifying opinion about GLMs was developed in Garthwaite and Al-Awadhi (2006). They model the relationship between each continuous predictor and the dependent variable as a piecewise-linear function with a regression coefficient at each of its dividing points. However, coefficients were assumed a priori independent if associated with different predictors. We relax this simplifying assumption and propose three new methods for eliciting positive-definite variance-covariance matrices of a multivariate normal prior distribution.
In addition, we extend the method of Garthwaite and Dickey (1988) for eliciting an inverse chi-squared conjugate prior for the error variance in normal linear models. We also propose a novel method for eliciting a lognormal prior distribution for the scale parameter of a gamma GLM.

For multinomial models, novel methods are proposed that quantify expert opinion about a conjugate Dirichlet distribution and, additionally, about three more general and flexible prior distributions. First, an elicitation method is proposed for the generalized Dirichlet distribution that was introduced by Connor and Mosimann (1969). Second, a method is developed for eliciting the Gaussian copula as a multivariate distribution with marginal beta priors. Third, a further novel method is constructed that quantifies expert opinion about the most flexible alternate prior, the logistic normal distribution (Aitchison, 1986). This third method is extended to the case of multinomial models with explanatory covariates.

All proposed methods in this thesis are designed to be used with interactive Prior Elicitation Graphical Software (PEGS) that is freely available at http://statistics.open.ac.uk/elicitation.

Contents

1 Introduction
2 Literature review
  2.1 Introduction
  2.2 Psychological aspects in eliciting opinion
  2.3 Prior elicitation for normal linear models
  2.4 Prior elicitation for GLMs
  2.5 Prior elicitation for multinomial models
  2.6 Other general graphical elicitation software
  2.7 Concluding comments
3 The piecewise-linear model for prior elicitation in GLMs
  3.1 Introduction
  3.2 The elicitation method for piecewise-linear models (GA method)
    3.2.1 The piecewise-linear model
    3.2.2 Eliciting the hyperparameters of the multivariate normal prior
    3.2.3 Computing values for the suggested assessments
  3.3 Assessment tasks and software description
    3.3.1 Defining the model
    3.3.2 Defining the response variable and covariates
    3.3.3 Initial medians assessments
    3.3.4 The feedback stage
    3.3.5 Conditional medians assessments
    3.3.6 Conditional quartiles assessments
  3.4 Concluding comments
4 Eliciting a covariance matrix for dependent coefficients in GLMs
  4.1 Introduction
  4.2 A proposed method for eliciting the variance-covariance matrix of a pair of correlated vectors of coefficients
    4.2.1 Notations and theoretical framework
    4.2.2 Assessment tasks and software description
    4.2.3 On the positive-definiteness of the elicited covariance matrix
  4.3 Another elicitation method for the variance-covariance matrix of correlated coefficients
    4.3.1 The case of two vectors of correlated coefficients
    4.3.2 The case of various vectors of correlated coefficients
    4.3.3 Assessment tasks
  4.4 A general flexible elicitation method for correlated coefficients
  4.5 Concluding comments
5 Eliciting prior distributions for extra parameters of some GLMs
  5.1 Introduction
  5.2 Eliciting a prior distribution for the error variance in normal GLMs
    5.2.1 The mathematical framework and notations
    5.2.2 Implementation and assessment tasks
  5.3 Eliciting a prior distribution for the scale parameter in gamma GLMs
    5.3.1 GLMs with a gamma distributed response variable
    5.3.2 Assessment tasks
  5.4 Concluding comments
6 Eliciting Dirichlet priors for multinomial models
  6.1 Introduction
  6.2 Eliciting beta parameters using quartiles
    6.2.1 Introduction
    6.2.2 Normal approximations for beta elicitation
    6.2.3 Least-squares optimizations for beta parameters
  6.3 Eliciting a Dirichlet prior for a multinomial model
    6.3.1 Introduction
    6.3.2 The multinomial and Dirichlet distributions
    6.3.3 The marginal approach
    6.3.4 The conditional approach
  6.4 Concluding comments
7 Eliciting more flexible priors for multinomial models
  7.1 Introduction
  7.2 Eliciting a generalized Dirichlet prior for a multinomial model
    7.2.1 Connor-Mosimann distribution
    7.2.2 Assessment tasks
    7.2.3 Marginal quartiles of the generalized Dirichlet distribution
  7.3 Example: Obesity misclassification
  7.4 Constructing a copula function for the prior distribution
    7.4.1 Gaussian copula function
    7.4.2 Assessment tasks
    7.4.3 Eliciting a positive-definite correlation matrix R
  7.5 Example: Waste collection
  7.6 Concluding comments
8 Eliciting logistic normal priors for multinomial models
  8.1 Introduction
  8.2 The additive logistic normal distribution
    8.2.1 Approximate distribution of the lognormal sum
  8.3 Assessment tasks
    8.3.1 Assessing initial medians
    8.3.2 Assessing conditional quartiles
    8.3.3 Assessing conditional medians
  8.4 Eliciting prior hyperparameters
    8.4.1 Eliciting a mean vector
    8.4.2 Eliciting a variance-covariance matrix
  8.5 Feedback using marginal quartiles of the logistic normal prior
  8.6 Example: Transport preferences
  8.7 Concluding comments
9 Eliciting multinomial models with covariates
  9.1 Introduction
  9.2 The base-line multinomial logit model
  9.3 Notation and theoretical framework
  9.4 Eliciting the mean vector
  9.5 Eliciting the variance matrix
    9.5.1 Eliciting the variance-covariance sub-matrices
    9.5.2 Assessing conditional quartiles
    9.5.3 Assessing conditional medians
    9.5.4 Eliciting the covariance matrix Σ_{α,β}
  9.6 Concluding comments
10 Concluding comments

List of Figures

3.1 A piecewise-linear relationship given by median assessments
3.2 A bar chart relationship for a factor given by median assessments
3.3 The dialogue box for defining the model
3.4 The feedback screen
3.5 Conditional median assessments for the continuous covariate “Weight”
3.6 Quartile assessments for a continuous covariate
3.7 Quartile assessments for a factor
3.8 Assessing quartiles conditioning on two fixed points
3.9 Assessing conditional quartiles for the last level of a factor
4.1 Assessments needed in the first phase for correlated covariates
4.2 Assessments needed in the second phase for correlated covariates
4.3 Assessments needed for two correlated variables
4.4 Assessments needed for five correlated variables
4.5 Assessments needed for various correlated variables
5.1 Three dimension plots of d {q^jqj)jdv against v and Cj for various sample sizes k(j)
5.2 Assessing a median value conditioning on a set of data
5.3 The output table showing the elicited hyperparameters
5.4 Changes in quartile values with the change of A at different mean values
5.5 The main software panel for assessing gamma parameter
6.1 Assessing probability quartiles of each category
6.2 A feedback screen showing 2 different quartile options
6.3 Assessing conditional quartiles for Dirichlet elicitation
6.4 Assessing conditional quartiles with scaled beta feedback
6.5 The feedback graph presenting marginal quartiles
7.1 Medians and quartiles assessments
7.2 Assessing conditional medians
7.3 Assessing conditional quartiles
7.4 Assessing conditional quartiles for copula elicitation
7.5 Assessing conditional medians for copula elicitation
7.6 Software suggestions for conditional medians
7.7 The initially assessed marginal medians and quartiles
7.8 The coherent assessments suggested by the software
7.9 Assessing conditional quartiles
7.10 Assessing conditional quartiles for the last two categories
7.11 Assessing conditional medians
8.1 Assessing probability medians for logistic normal elicitation
8.2 Assessing conditional quartiles with lognormal feedback
8.3 Assessing conditional medians for logistic normal elicitation
8.4 Software suggestions for initial medians
8.5 Assessing conditional quartiles
8.6 Revised conditional medians
8.7 Software suggestions for marginal medians and quartiles
9.1 Assessing probability medians at age = 40 years
9.2 Assessing conditional medians at age = 40 years
9.3 Assessing conditional medians given changes at the reference point
10.1 Options for assessing correlations between regression coefficients
10.2 A flowchart of the prior elicitation software for multinomial models

List of Tables

7.1 Probability assessments for different elicited priors
7.2 Expert’s assessments of medians and quartiles
7.3 Expert’s assessments of conditional quartiles
7.4 Expert’s assessments of conditional medians
7.5 The elicited hyperparameters of marginal beta distributions
7.6 The elicited covariance matrix of the Gaussian copula prior
7.7 Probability means and variances from marginal beta distributions
8.1 The elicited mean vector of a logistic normal prior
8.2 The elicited variance-covariance matrix of a logistic normal prior

Chapter 1
Introduction

In many situations there is a substantial amount of information that is only recorded in the experience and knowledge of experts. To efficiently use this knowledge as an input to a statistical analysis, the experts must be asked meaningful questions whose answers determine a probability distribution. This process is referred to as elicitation and different forms of probability model require different elicitation methods.

Bayesian statistics offers an approach in which data and expert opinion are combined at the modelling stage, yielding probabilities that are a synthesis of the survey data and the expert’s opinion. To incorporate expert opinion into a Bayesian analysis, it must be quantified as a prior distribution. This should be accomplished through an elicitation process that asks the expert to perform various assessment tasks. These tasks include questions that the expert is able to comprehend and answer accurately according to her prior knowledge, without needing to know about mathematical and statistical coherence that is required in her assessments.

The elicitation of prior beliefs has been studied extensively in the statistical, psychological, decision and risk analysis literature. Elicitation techniques have been proposed for many probabilistic models including both univariate and multivariate probability distributions. However, achieving accurate elicitation is not an easy task, even for single events or univariate distributions.
The difficulty increases for multivariate distributions, for which many constraints must be imposed on the expert’s assessments for them to be statistically coherent. Due to this complexity, relatively little literature deals with elicitation techniques for multivariate distributions. O’Hagan et al. (2006) argued that the lack of elicitation methods for multivariate models and the lack of user-friendly elicitation software to implement them constitute remarkable deficiencies in the existing elicitation research.

The aim of this thesis is to fill some gaps in the available techniques for eliciting prior distributions for multivariate models. We are mainly interested in eliciting prior distributions for the parameters of Generalized Linear Models (GLMs) and multinomial models. We extend some of the available methods of prior elicitation for GLM parameters and propose some novel methods for eliciting different prior distributions for the parameters of multinomial models. All proposed methods in this thesis are designed to be used with interactive graphical software that is written in Java and tailored to the specific requirements of each method. These pieces of software are freely available as Prior Elicitation Graphical Software (PEGS) at http://statistics.open.ac.uk/elicitation.

The elicitation methods for GLMs that are available in the literature focus mainly on logistic regression. A more general elicitation method for quantifying opinion about a logistic regression model was developed in Garthwaite and Al-Awadhi (2006). The method is very general and flexible and can be generalized to GLMs with any link function. The same authors proposed this generalization in an unpublished paper, Garthwaite and Al-Awadhi (2011). In their method, the relationship between each continuous predictor and the dependent variable is modelled as a piecewise-linear function and each of its dividing points is accompanied with a regression coefficient.
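As a rough illustration of the piecewise-linear idea described above, the sketch below evaluates a relationship that is fixed by its values at the dividing points and is linear in between. The function name `piecewise_linear`, the dividing points and the values are hypothetical choices made for this illustration; they are not taken from Garthwaite and Al-Awadhi (2006), whose parameterization in terms of regression coefficients may differ.

```python
from bisect import bisect_right


def piecewise_linear(x, knots, values):
    """Linearly interpolate between (knot, value) pairs.

    knots  : strictly increasing dividing points of the predictor
    values : value of the relationship at each dividing point
    (Both are hypothetical inputs used only for illustration.)
    """
    if len(knots) != len(values) or len(knots) < 2:
        raise ValueError("need at least two knots with one value each")
    if any(b <= a for a, b in zip(knots, knots[1:])):
        raise ValueError("knots must be strictly increasing")
    if not knots[0] <= x <= knots[-1]:
        raise ValueError("x lies outside the range covered by the knots")

    # Index of the right-hand end of the segment containing x;
    # the min() keeps x == knots[-1] inside the last segment.
    i = min(bisect_right(knots, x), len(knots) - 1)
    x0, x1 = knots[i - 1], knots[i]
    y0, y1 = values[i - 1], values[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Made-up example: a continuous predictor with three dividing points.
knots = [50.0, 70.0, 90.0]
values = [-1.0, 0.0, 0.5]
midpoint_value = piecewise_linear(60.0, knots, values)
```

In this sketch each dividing point carries one number, which mirrors the idea that each dividing point is associated with one regression coefficient.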
However, a simplifying assumption was made regarding independence between these coefficients, in the sense that regression coefficients were a priori independent if associated with different predictors. One of the main purposes of this thesis is to relax the independence assumption between coefficients of different variables. Then the variance-covariance matrix of the prior distribution is no longer block-diagonal. Different elicitation methods for this more complex case are proposed and it is shown that the resulting variance-covariance matrix is positive-definite.

The method of Garthwaite and Al-Awadhi (2006) was designed to be used with the aid of interactive graphical software. It has been used in practical case studies to quantify the opinions of ecologists and medical doctors (Al-Awadhi and Garthwaite, 2006; Garthwaite et al., 2008). The software is revised and extended further in this thesis to handle the case of GLMs with correlated pairs of covariates.

Available methods of prior elicitation for GLMs all concentrate on the task of quantifying opinion about regression coefficients. For some GLMs, such as logistic regression, this determines the prior distribution completely. But with some other common GLMs, such as the normal linear model and gamma GLMs, prior opinion about an extra parameter must also be quantified in order to obtain a prior distribution for all model parameters. For this reason, we extend the method of Garthwaite and Dickey (1988) for eliciting an inverse chi-squared conjugate prior for the error variance in normal linear models. We also propose a novel method for eliciting the scale parameter of a gamma GLM.

The other multivariate model for which we develop original elicitation methods in this thesis is the multinomial model. Multinomial models consist of items that belong to a number of complementary and mutually exclusive categories.
These models arise in many scientific disciplines and industrial applications. The multinomial data are well described using the multinomial distribution, say with parameter vector p. In Bayesian analysis of multinomial models, an important assessment task is to elicit an informative joint prior distribution for the multinomial probabilities p. It is well-known that the Dirichlet distribution is a conjugate prior for the parameters of multinomial models. A limited number of attempts have been made to introduce elicitation methods for Dirichlet parameters. However, the Dirichlet distribution has been criticized as insufficiently flexible to represent prior information about the parameters of multinomial models [e.g. Aitchison (1986), O’Hagan and Forster (2004)]. Its main drawback is that it has a limited number of parameters. A k-variate Dirichlet distribution is specified by just k parameters that determine all means, variances and covariances. Dirichlet variates are always negatively correlated, which may not represent prior belief.

Several authors have been interested in constructing new families of sampling distributions to model proportions. Some of these distributions can be used as prior distributions for the probabilities of multinomial models. See, for example, Forster and Skene (1994) and Wong (1998). However, elicitation methods that give these more flexible families as prior distributions for multinomial models have not been proposed.

It is tricky, in the case of multinomial models, to elicit assessments that satisfy all the necessary constraints. Some of these constraints are obvious; the probabilities of each category must be non-negative and sum to one, for example. Others are less obvious. For example, if there are only two categories, the lower quartile for one category and the upper quartile of the other category must add to one.
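The two points made above, that k parameters fix every moment of a Dirichlet distribution and that two-category quartile assessments are tied together, can be illustrated with a short sketch. It uses the standard closed-form Dirichlet moments; the function names, the example parameter vector and the tolerance are assumptions made for this illustration and do not come from the elicitation software.

```python
from itertools import combinations


def dirichlet_moments(alpha):
    """Closed-form means, variances and covariances of Dirichlet(alpha).

    With a0 = sum(alpha):
      E[p_i]        = alpha_i / a0
      Var[p_i]      = alpha_i * (a0 - alpha_i) / (a0**2 * (a0 + 1))
      Cov[p_i, p_j] = -alpha_i * alpha_j / (a0**2 * (a0 + 1)),  i != j
    """
    a0 = sum(alpha)
    denom = a0 * a0 * (a0 + 1.0)
    means = [a / a0 for a in alpha]
    variances = [a * (a0 - a) / denom for a in alpha]
    covariances = {
        (i, j): -alpha[i] * alpha[j] / denom
        for i, j in combinations(range(len(alpha)), 2)
    }
    return means, variances, covariances


def two_category_quartiles_coherent(lower_q_first, upper_q_second, tol=1e-9):
    """Check the two-category constraint: since p2 = 1 - p1, the lower
    quartile of p1 and the upper quartile of p2 must add to one."""
    return abs(lower_q_first + upper_q_second - 1.0) <= tol


# Hypothetical parameter vector for a three-category model.
alpha = [2.0, 3.0, 5.0]
means, variances, covariances = dirichlet_moments(alpha)
```

Every covariance returned here is negative whatever positive values are placed in `alpha`, which is the inflexibility referred to in the text.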
As the number of categories increases, the constraints that must be satisfied increase and become less intuitive.

Partly because of these difficulties, no doubt, elicitation methods and software for multinomial sampling seem to have been constructed only for modelling opinion by a Dirichlet distribution. In this thesis, we propose novel methods that quantify expert opinion about a Dirichlet distribution and additionally about three more general and flexible prior distributions. First, an elicitation method is proposed for a generalized Dirichlet distribution as a more flexible prior distribution. The generalized Dirichlet distribution, introduced by Connor and Mosimann (1969), has a more general covariance structure than the standard Dirichlet distribution and a larger number of parameters. Second, another method elicits the Gaussian copula as a multivariate prior that expresses the dependence structure between the marginal beta priors of multinomial probabilities using a multivariate normal distribution. Third, a further novel method quantifies expert opinion about the most flexible alternate prior, the logistic normal distribution (Aitchison, 1986). With this distribution, the multinomial probabilities are transformed to variables that (by assumption) follow a multivariate normal distribution, using a multivariate form of the logistic transformation. These different elicitation methods are each implemented in interactive graphical software.

The logistic normal distribution has a large number of parameters and gives a prior distribution with a much more flexible dependence structure. Moreover, assuming a logistic normal prior for multinomial models enables us to extend the elicitation method to the case of multinomial models with explanatory covariates. For these models, we propose a method for eliciting a multivariate normal prior distribution for the regression coefficients based on the multivariate logistic transformation.
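The multivariate logistic transformation mentioned above can be sketched as follows. This is a minimal illustration of the additive log-ratio form associated with Aitchison (1986), assuming the last category is used as the reference. The choice of reference category, the function names and the example probabilities are assumptions made for illustration rather than details taken from the thesis.

```python
import math


def additive_log_ratio(p):
    """Map k probabilities (positive, summing to one) to k - 1
    unconstrained values y_i = log(p_i / p_k), with the last
    category taken as the reference (an assumption of this sketch)."""
    reference = p[-1]
    return [math.log(p_i / reference) for p_i in p[:-1]]


def additive_log_ratio_inverse(y):
    """Map k - 1 real values back to k probabilities:
    p_i = exp(y_i) / (1 + sum_j exp(y_j)),  p_k = 1 / (1 + sum_j exp(y_j))."""
    exps = [math.exp(v) for v in y]
    total = 1.0 + sum(exps)
    return [e / total for e in exps] + [1.0 / total]


# Hypothetical probabilities for a three-category multinomial model.
p = [0.2, 0.3, 0.5]
y = additive_log_ratio(p)            # two unconstrained values
p_back = additive_log_ratio_inverse(y)
```

Placing a multivariate normal distribution on `y` induces a logistic normal distribution on the probability vector; the inverse map guarantees that the recovered probabilities are positive and sum to one, so those constraints never have to be imposed directly.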
The assessment tasks and the task structure implemented in all the proposed methods lead to coherent assessments without the expert having to be conscious of coherence constraints. Using the interactive software, the expert is only required to assess conditional and/or unconditional medians and quartiles for the elements of the probability vector p. For each of the available prior distributions, the expert does not need to be conscious of the constraints on her assessments. Instead, the software suggests coherent values that are close to her initial assessments, which she may accept or modify.

This thesis consists of 10 chapters. After this introductory chapter, Chapter 2 first gives a brief review of the main findings and considerations from the psychological literature that should influence the construction of elicitation methods. Then the most relevant methods of eliciting prior distributions for normal linear models and GLMs are reviewed and discussed. Interactive computer software for these purposes is also listed, with some of the different applications for which it has been used. In addition, the limited literature on prior elicitation methods for multinomial models is reviewed, together with its implementing software. We also discuss some recent interactive graphical computer programs that have been reported in the literature for some other problems.

In Chapter 3, the piecewise-linear model of Garthwaite and Al-Awadhi (2006) for eliciting multivariate normal priors for regression coefficients in GLMs is reviewed in detail, and the assessment tasks that the expert performs to quantify her opinion are discussed. Also, we describe the software that implements it and detail improvements to the implementation that were made by the author of this thesis.
As mentioned earlier, the elicitation method of Garthwaite and Al-Awadhi (2006) makes the simplifying assumption that the regression coefficients associated with different predictors are independent in the prior distribution. In Chapter 4, we propose three new methods for eliciting positive-definite variance-covariance matrices of a multivariate normal prior for regression coefficients that do not require this simplifying assumption. Each method is a trade-off between flexibility and the number of assessments that must be made by the expert.

The first method proposed in Chapter 4 is an extension to the method of Garthwaite and Al-Awadhi (2006). It is the most flexible of the methods but it needs a large number of assessments. The second method requires fewer assessments but assumes a restricted correlation pattern between regression coefficients. The third method first uses one of the other two methods to obtain the correlations between the regression coefficients of two predictors. Then all other correlations are induced through some assessed weights that reflect the magnitude of correlations relative to each other. The expert assesses these weights and then the implementing software presents interactive graphs that help her review and revise assessments to her satisfaction.

In Chapter 5, we introduce two elicitation methods that aim to complete the prior structure of the normal and gamma GLMs. The methods quantify expert opinion about prior distributions for the extra parameters of these models. The first proposed method elicits a conjugate inverted chi-squared prior distribution for the error variance in normal models. Our proposed method is based on the expert’s assessments of medians and conditional medians of the absolute difference between two observed values of the response variable at the same design point.
It extends the method of Garthwaite and Dickey (1988) by using more than one data set of hypothetical future samples. The second proposed method in Chapter 5 is a novel method for eliciting a lognormal prior distribution for the scale parameter of gamma GLMs. Given the mean value of a gamma distributed response variable, the method is based on conditional quartile assessments. It can also be used to quantify an expert’s opinion about the prior distribution for the shape parameter of any gamma random variable, if the mean of the distribution has been elicited or is assumed to be known.

Chapter 6 proposes two methods for eliciting a standard Dirichlet prior distribution for multinomial probabilities, using either a marginal or a conditional approach. The main difference between the two proposed approaches is in the assessment tasks that they require. In the marginal approach, the expert assesses unconditional medians and quartiles for each multinomial probability $p_i$. Then we use these quartiles to obtain a marginal beta distribution for each $p_i$. The parameters of these marginal betas are reconciled to form a standard Dirichlet distribution. Three different forms of reconciliation are used, each based on least-squares optimization. For each optimization method, the medians and quartiles of the consequent Dirichlet distribution are computed and graphically presented to the expert, who chooses which of the Dirichlet distributions best represents her opinion. She is also offered the option to change the medians and quartiles if none of the offered sets is an adequate representation of her opinions.

The other approach proposed in Chapter 6 is the conditional approach. Using this approach, the expert is asked to assess the median and quartiles of the first probability.
For each of the remaining probabilities, she assesses conditional medians and quartiles, where the conditions state values for the preceding probabilities that the expert should treat as correct when making her assessments. These conditional assessments are then used to form conditional beta distributions that are also reconciled into a standard Dirichlet distribution.

New elicitation methods for two more general prior distributions for multinomial models are proposed in Chapter 7. The first method uses the same conditional assessments as obtained in Chapter 6 to elicit a flexible generalized Dirichlet prior, a Connor-Mosimann distribution, through its conditional beta distributions. The flexibility of the generalized Dirichlet distribution means that the elicited parameters of these conditional betas are exactly the hyperparameters of the elicited generalized Dirichlet prior; no reconciliation is required. This elicitation method and the elicitation methods proposed in Chapter 6 are compared in an example in Section 7.3. In the example, a prominent medical expert in Malta quantified his prior opinions about obesity misclassification in health surveys in Malta.

The second proposed method in Chapter 7 elicits a Gaussian copula prior for the multinomial probabilities. To do this, marginal beta distributions for the multinomial probabilities are obtained from their assessed unconditional medians and quartiles. Then the correlations between the multinomial probabilities are elicited using extra sets of assessments of their conditional medians and quartiles. The proposed Gaussian copula prior assumes that the dependence structure between the multinomial probabilities can be represented by a multivariate normal distribution, where the marginal prior distribution of each multinomial probability is still expressed as a beta distribution.
In Section 7.5, the proposed elicitation method and its implementing software are used by an environmental engineering expert to quantify his opinion about the fuel used by waste collection vehicles in the UK.

In Chapter 8, a novel method is proposed for eliciting a logistic normal prior distribution for the probabilities of a multinomial distribution. The method requires conditional medians and quartiles of multinomial probabilities to be assessed. No beta distribution is elicited; instead, a monotonic multivariate logistic transformation is used to transform these assessments into medians and quartiles of a multivariate normal vector. Then a mean vector and a positive-definite covariance matrix of the multivariate normal are determined using the transformed quartiles. The structured way in which the assessments are obtained guarantees that the elicited variance-covariance matrix is positive-definite. Chapter 8 also gives an illustrative example in which the prior knowledge of a transport expert is quantified to elicit a logistic normal prior distribution for a multinomial model of a transportation problem.

The elicitation method proposed in Chapter 8 for logistic normal priors of multinomial distributions is extended further in Chapter 9 to handle multinomial models that contain explanatory covariates. Our extended method in Chapter 9 elicits a multivariate normal prior distribution for the regression coefficients associated with different covariates in a form of the base-line multinomial logit model. For k categories and m covariates, the model that contains a constant term has exactly $(k-1)(m+1)$ free parameters. In Chapter 9, we show that the same assessment tasks of Chapter 8 can be repeated for each covariate to elicit a mean vector and a positive-definite variance-covariance matrix of a multivariate normal prior distribution for the $(k-1)(m+1)$ regression coefficients.
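The parameter count above can be made concrete with a small sketch of a baseline-category multinomial logit model. The code assumes the last category is the baseline and stores one row of m + 1 coefficients (constant first) for each of the other k - 1 categories; the function name `baseline_logit_probs` and the numerical values are hypothetical and serve only as an illustration.

```python
import math


def baseline_logit_probs(coefs, x):
    """Category probabilities under a baseline-category logit model.

    coefs: k-1 rows, each holding m+1 coefficients (constant term first).
    x:     the m covariate values.
    Assumption: the last category is the baseline, with linear predictor 0.
    """
    design = [1.0] + list(x)
    etas = [sum(b * v for b, v in zip(row, design)) for row in coefs]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas] + [1.0 / denom]


k, m = 3, 2                      # three categories, two covariates
coefs = [[0.5, -1.0, 0.2],       # category 1 versus baseline
         [-0.3, 0.4, 0.1]]       # category 2 versus baseline
n_free = sum(len(row) for row in coefs)   # number of free coefficients
probs = baseline_logit_probs(coefs, [1.0, 2.0])
```

Here `n_free` equals $(k-1)(m+1) = 6$, and it is this set of coefficients that would receive the multivariate normal prior.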
Concluding comments are given in Chapter 10, where some directions for future research are also considered.

Chapter 2

Literature review

2.1 Introduction

Relatively recent comprehensive reviews of the elicitation of probability distributions, covering theory, methods, techniques, software, applications and case studies, are found in Garthwaite et al. (2005), O’Hagan et al. (2006) and Jenkinson (2007). The aim of this chapter is to review the recent literature on quantifying expert opinion that is most relevant to eliciting prior distributions for Bayesian GLMs and multinomial models. The emphasis here is on the different statistical formulations of elicitation models as well as on the design of the software available in the literature as elicitation tools.

A brief review of some important elicitation topics, ideas and psychological aspects is given in Section 2.2. The important elicitation method of Kadane et al. (1980) for normal linear models is reviewed in Section 2.3, where some other elicitation methods for these models are also reviewed briefly. Important and recent elicitation methods and software tools available in the literature for the prior distributions of Bayesian GLMs are reviewed in Section 2.4. However, most of these methods and their accompanying computer programs were devoted to prior elicitation for Bayesian logistic regression models, with anticipated extensions to the more general family of GLMs.

Section 2.5 reviews available methods and computer programs for quantifying an expert’s opinion about priors for multinomial models. As expected, the majority of these methods and tools quantify opinions about the simple conjugate prior, the Dirichlet distribution. Some of the recent graphical interactive software that quantifies expert opinion about problems other than GLMs and multinomial priors is reviewed in Section 2.6.
2.2 Psychological aspects in eliciting opinion

Psychological research on human performance in assessing probabilities dates back to the 1960s. Peterson and Beach (1967), in their paper “Man as an Intuitive Statistician”, studied human statistical inference for estimating proportions, means, variances and correlations. Their results conclude that man can use probability theory and statistics intuitively in performing these inferential tasks. In the same year, Winkler (1967) stated that, in assessing a prior distribution for Bayesian analysis, the expert has no ‘true’ built-in prior distribution that can be elicited. Instead, an elicitation process only “helps to draw out an assessment of a prior distribution from the prior knowledge”. This prior distribution is affected by both the assessor and the assessment techniques.

Garthwaite et al. (2005) reviewed a body of psychological literature about some of the main mental operations, or heuristics, that an expert may perform in his mind to give a specific numeric assessment, and the biases that may influence these operations. A recent comprehensive review of psychological research on assessing probabilities, including heuristics and biases, is given by Kynn (2008). She also provided some guidelines for eliciting expert knowledge based on the human biases and inadequacies in assessing probabilities reported in the psychological literature. Other useful discussions of psychological aspects in the elicitation context may be found in Hogarth (1975), Wallsten and Budescu (1983) and O’Hagan et al. (2006).

The main interest of this thesis is to elicit multivariate probability distributions. Multivariate distributions require more quantities to be elicited than univariate distributions. Besides the usual summaries of each random variable, the dependence structure between all variables must also be assessed.
In the rest of this section, we briefly review psychological aspects involved in assessing the quantities required for multivariate distributions.

As a measure of central tendency for each random variable, we have decided to elicit its median value from the expert. Experimental work in the literature reveals that people are better at assessing medians than means, especially for skewed distributions. See Garthwaite et al. (2005) and references therein. The median value can be assessed through one step of the bisection method; see, for example, Winkler (1967), Stael von Holstein (1971) and Pratt et al. (1995). The expert is asked to determine her median as the value that the random variable is equally likely to be less than or greater than. For more discussion about bisection tasks and their usage, see for example Garthwaite and Dickey (1985), Hora et al. (1992) and Fischer (2001).

To elicit variances, we have chosen to assess the two quartile values of each univariate distribution. By assuming a smooth unimodal distribution, such as the normal or an approximately normal distribution, quartiles are transformed to elicit the variances. Quartiles can be easily assessed using the bisection method, which is also called the successive subdivision method, as follows. The upper quartile is assessed by asking the expert to assume that the random variable is above her assessed median value. She is then asked to assess her upper quartile as the value that the random variable is equally likely to be less than or greater than. Similarly, the lower quartile is assessed as the value that divides the range below the median into two equally likely ranges.

The assessed quartiles represent a central 50% credible interval. People can perform the task of assessing credible intervals reasonably well. However, there is a clear tendency for people to be overconfident in assessing central credible intervals; they tend to give intervals that are too short [Garthwaite et al.
(2005)]. Some other quantiles were found to reduce the degree of overconfidence, such as the 33rd and 67th percentiles. O’Hagan (1998) suggested using the central 66% interval, and mentioned that experimental work about different quantile assessments had not revealed any single choice to be the best in all cases. For more details, see Hora et al. (1992), Garthwaite and O’Hagan (2000) and Kynn (2005, 2006).

To complete the elicitation process of a multivariate distribution for dependent variables, summaries of the dependence structure must be elicited. Typically, determining correlations is the trickiest part of a multivariate elicitation, especially when there are more than two random variables and a variance-covariance matrix must be assessed. Such a matrix must be positive-definite for mathematical coherence. We will make extensive use of the method of Kadane et al. (1980) to elicit positive-definite variance-covariance matrices. The method is described in the next section. It relies on assessing conditional medians and quartiles to compute conditional variances and covariances. Conditional quartiles are assessed in a structured way that guarantees positive-definiteness.

Assessing conditional quartiles is not, however, the only way to elicit correlations. Other methods were suggested in Clemen and Reilly (1999) and Clemen et al. (2000). These methods include direct assessment of a correlation coefficient, and assessing conditional percentiles or probabilities of one variable given percentiles or probabilities of the other variable, either for one or two items from the population. These assessments were used to calculate Pearson, Spearman and Kendall’s $\tau$ correlation coefficients.
Although Clemen and Reilly (1999) discussed building copula functions as joint distributions that can be elicited using marginal distributions and elicited correlations, they did not attempt to obtain a positive-definite variance-covariance matrix for multivariate distributions.

In summary, in building our proposed elicitation methods throughout this thesis, we take into account the following considerations. These were mentioned by Kadane and Wolfson (1998) as the points of agreement among most of the statistical literature on how elicitation should be carried out.

1. Expert opinion is the most worthwhile to elicit.
2. Experts should be asked to assess only observable quantities, conditioning only on covariates (which are also observable) or other observable quantities.
3. Experts should not be asked to estimate moments of a distribution (except possibly the first moment); they should be asked to assess quantiles or probabilities of the predictive distribution.
4. Frequent feedback should be given to the expert during the elicitation process.
5. Experts should be asked to give assessments both unconditionally and conditionally on hypothetical observed data.

2.3 Prior elicitation for normal linear models

Although it was introduced as an elicitation method for the parameters of a normal linear model, the work of Kadane et al. (1980) has been an important step towards eliciting prior distributions for GLMs, and even for eliciting many other multivariate distributions. See, for example, Dickey et al. (1986), Al-Awadhi and Garthwaite (1998) and Garthwaite and Al-Awadhi (2001, 2006). The ideas of Kadane et al. (1980) are utilized, modified and implemented extensively throughout this thesis. A detailed review of their elicitation method is given below.

Suppose the normal linear model is given by

\[
Y = \underline{X}'\underline{\beta} + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2), \tag{2.1}
\]

where $\underline{X} = (x_1, \ldots, x_r)'$ is a vector of r explanatory variables, and $\underline{\beta} = (\beta_1, \ldots, \beta_r)'$ is the vector of regression coefficients. Kadane et al. (1980) introduced an elicitation method for the natural conjugate prior distribution structure of the parameters in model (2.1) as

\[
\underline{\beta} \mid \sigma^2 \sim N\!\left(\underline{b},\, \sigma^2 R^{-1}\right), \tag{2.2}
\]

\[
\frac{w\delta}{\sigma^2} \sim \chi^2_{\delta}. \tag{2.3}
\]

The hyperparameters to be elicited are thus a mean vector $\underline{b}$, the two positive scalars $\delta$ and $w$, and a positive-definite matrix $R$.

The expert cannot be asked about these quantities directly as they are not observable. Instead, the prior distributions are induced from expert assessments about the response variable $Y$, which is an observable quantity, at some given values of the explanatory variables. Hence, a number $m$ of realizations $\underline{X}_1, \ldots, \underline{X}_m$ are selected. Kadane and Wolfson (1998) discussed how these design points can be selected efficiently. At each design point $\underline{X}_i$, $i = 1, \ldots, m$, the expert assesses a median value $y_{i,0.5}$, an upper quartile $y_{i,0.75}$ and a 0.9375 quantile $y_{i,0.9375}$ of the response variable $Y_i$. The quantile $y_{i,0.9375}$ can be obtained using two bisection iterations above $y_{i,0.75}$. These assessments were used by Kadane et al. (1980) to elicit $\underline{b}$ and $\delta$ as follows.

To elicit the mean vector $\underline{b}$, the assessed medians were treated as observations of $Y$, and $\underline{b}$ was elicited as the least-squares estimate

\[
\underline{b} = (X'X)^{-1} X' \underline{y}_{0.5}, \tag{2.4}
\]

where $\underline{y}_{0.5} = (y_{1,0.5}, y_{2,0.5}, \ldots, y_{m,0.5})'$, and $X$ is the design matrix, which is given by $X = (\underline{X}_1, \ldots, \underline{X}_m)'$.

Under the prior structure in (2.2) and (2.3), the predictive distribution of $(\underline{Y} \mid X)$ is a multivariate t distribution with $\delta$ degrees of freedom. To elicit $\delta$, Kadane et al. (1980) pointed out that the ratios

\[
a_i(\underline{X}_i) = \frac{y_{i,0.9375} - y_{i,0.5}}{y_{i,0.75} - y_{i,0.5}} \tag{2.5}
\]

depend only on $\delta$, as a measure of the thickness of the distribution tails. Since this ratio takes its minimum value, 2.27, under the standard normal distribution, Kadane et al. (1980) used $a_i^*$ instead of $a_i$ to elicit $\delta$, where $a_i^*(\underline{X}_i) = \max\{a_i(\underline{X}_i),\, 2.27\}$.
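The tail-thickness ratio and its lower bound can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the function names `tail_ratio` and `floored_tail_ratio` and the assessed values are hypothetical, and only the normal bound of about 2.27 is computed here (no t quantiles are needed for this step).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# Ratio of the 0.9375 quantile to the 0.75 quantile of a standard normal:
# the smallest value the assessed ratio can take under a t distribution.
NORMAL_RATIO = _STD_NORMAL.inv_cdf(0.9375) / _STD_NORMAL.inv_cdf(0.75)


def tail_ratio(median, upper_quartile, q9375):
    """Ratio (y_0.9375 - y_0.5) / (y_0.75 - y_0.5) at one design point."""
    if not median < upper_quartile < q9375:
        raise ValueError("assessments must satisfy median < quartile < 0.9375 quantile")
    return (q9375 - median) / (upper_quartile - median)


def floored_tail_ratio(median, upper_quartile, q9375):
    """The ratio truncated below at the normal value, as in a*_i."""
    return max(tail_ratio(median, upper_quartile, q9375), NORMAL_RATIO)


# Hypothetical assessments (median, upper quartile, 0.9375 quantile)
# at two design points.
assessments = [(10.0, 12.0, 15.0), (20.0, 23.0, 26.0)]
floored = [floored_tail_ratio(*a) for a in assessments]
```

In this example the first ratio (2.5) is kept, while the second (2.0) falls below the normal value and is replaced by it, since no t distribution has tails that thin.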
Then $\delta$ was elicited as the number of degrees of freedom whose ratio $t_\delta(0.9375)/t_\delta(0.75)$ is closest to

\[
\frac{1}{m}\sum_{i=1}^{m} a_i^*(\underline{X}_i). \tag{2.6}
\]

We propose a different method for eliciting a degrees of freedom hyperparameter in Chapter 5 of this thesis. Our proposed method is an extension of the approach given by Garthwaite and Dickey (1988), which is described later in Chapter 5.

Although the method of Kadane et al. (1980) for eliciting a positive-definite matrix $R$ and a value for $w$ is complicated and requires substantial mathematical notation and detail, we review it here because its structural elicitation approach is essential in our proposed methods for eliciting positive-definite matrices throughout this thesis.

The method is based on the properties of the multivariate t distribution. The center and spread of the distribution are defined as follows. For any constant vector $\underline{a}$ and any constant matrix $B$, if $\underline{Y}$ has a standard multivariate t distribution, then the center of the vector $\underline{Z} = \underline{a} + B\underline{Y}$ is defined as $C(\underline{Z}) = \underline{a}$. The spread of $\underline{Z}$ is defined as $S(\underline{Z}) = BB'$. If $\delta > 1$, then the mean exists and $E(\underline{Z}) = C(\underline{Z})$. If $\delta > 2$, then the variance exists, and is given by $\mathrm{Var}(\underline{Z}) = \frac{\delta}{\delta - 2} S(\underline{Z})$. The expert’s assessments were used to compute centers and spreads to elicit $R$ and $w$, as detailed below.

The conditional elicitation structure suggested by Kadane et al. (1980), for $i = 2, \ldots, m$, involved assessing conditional medians and upper quartiles of $Y_i$ given sequences of hypothetical values $y_1^0, \ldots, y_{i-1}^0$. The conditions that were imposed on these hypothetical values ensured a discrepancy between conditional and unconditional centers, in the sense that

\[
y_1^0 \neq C(Y_1), \tag{2.7}
\]

\[
y_i^0 \neq C(Y_i \mid y_1^0, \ldots, y_{i-1}^0), \qquad i = 2, \ldots, m. \tag{2.8}
\]

These conditions guarantee the existence of the elicited positive-definite matrix $R$, as will be shown later.

Centers and conditional centers were assessed using medians and conditional medians.
For example, $C(Y_1)$ was taken as the unconditional median assessment $y_{1,0.5}$. For $j < i$, $C(Y_i \mid y_1^0, \ldots, y_j^0)$ were taken as the conditional medians of $Y_i$ given that $Y_1 = y_1^0, \ldots, Y_j = y_j^0$, which are denoted by $(y_{i,0.5} \mid y_1^0, \ldots, y_j^0)$. Similarly, conditional upper quartiles of $Y_i$ given $y_1^0, \ldots, y_j^0$ are denoted by $(y_{i,0.75} \mid y_1^0, \ldots, y_j^0)$.

Spreads and conditional spreads were computed by dividing the assessed semi-interquartile range by the corresponding semi-interquartile range $t(\delta, 0.75)$ of a standard multivariate t distribution with $\delta$ degrees of freedom. This gives

\[
S(Y_1) = \left[\frac{y_{1,0.75} - y_{1,0.5}}{t(\delta, 0.75)}\right]^2 \tag{2.9}
\]

and, for $i = 1, 2, \ldots, m-1$,

\[
S(Y_{i+1} \mid y_1^0, \ldots, y_i^0) = \left[\frac{(y_{i+1,0.75} \mid y_1^0, \ldots, y_i^0) - (y_{i+1,0.5} \mid y_1^0, \ldots, y_i^0)}{t(\delta + i, 0.75)}\right]^2. \tag{2.10}
\]

To elicit a positive-definite matrix $R$, the approach of Kadane et al. (1980) is to successively elicit the spread matrices $U_i$ of $(Y_1, \ldots, Y_i)$ in a way that guarantees the positive-definiteness of the final matrix, $U_m$. The value of $U_1$ equals $S(Y_1) > 0$ as given in (2.9). Then, supposing that $U_i$ has been estimated as a positive-definite matrix, the aim now is to elicit $U_{i+1}$ and show it is positive-definite. $U_{i+1}$ is partitioned as

\[
U_{i+1} = \begin{pmatrix} U_i & U_i\,\underline{g}_{i+1} \\ \underline{g}_{i+1}'\,U_i & S(Y_{i+1}) \end{pmatrix}. \tag{2.11}
\]

Conditional median assessments were used to estimate $\underline{g}_{i+1}$ as follows. The partition in (2.11), with the properties of the multivariate t distribution, gives

\[
C(Y_{i+1} \mid y_1^0, \ldots, y_i^0) - C(Y_{i+1}) = \left(y_1^0 - C(Y_1), \ldots, y_i^0 - C(Y_i)\right)\underline{g}_{i+1}. \tag{2.12}
\]

Moreover, for $j < i$, taking the center of both sides of (2.12) given that $Y_1 = y_1^0, \ldots, Y_j = y_j^0$ gives

\[
C(Y_{i+1} \mid y_1^0, \ldots, y_j^0) - C(Y_{i+1}) =
\begin{pmatrix}
y_1^0 - C(Y_1) \\
\vdots \\
y_j^0 - C(Y_j) \\
C(Y_{j+1} \mid y_1^0, \ldots, y_j^0) - C(Y_{j+1}) \\
\vdots \\
C(Y_i \mid y_1^0, \ldots, y_j^0) - C(Y_i)
\end{pmatrix}'
\underline{g}_{i+1}. \tag{2.13}
\]

Since $j = 1, 2, \ldots, i$, Kadane et al. (1980) ended up with a system of $i$ equations of the form

\[
M_{i+1}\,\underline{g}_{i+1} = \underline{h}_{i+1}, \tag{2.14}
\]

where

\[
\underline{h}_{i+1} =
\begin{pmatrix}
C(Y_{i+1} \mid y_1^0) - C(Y_{i+1}) \\
C(Y_{i+1} \mid y_1^0, y_2^0) - C(Y_{i+1}) \\
\vdots \\
C(Y_{i+1} \mid y_1^0, \ldots, y_i^0) - C(Y_{i+1})
\end{pmatrix} \tag{2.15}
\]

and

\[
M_{i+1} =
\begin{pmatrix}
y_1^0 - C(Y_1) & C(Y_2 \mid y_1^0) - C(Y_2) & \cdots & C(Y_i \mid y_1^0) - C(Y_i) \\
y_1^0 - C(Y_1) & y_2^0 - C(Y_2) & \cdots & C(Y_i \mid y_1^0, y_2^0) - C(Y_i) \\
\vdots & \vdots & & \vdots \\
y_1^0 - C(Y_1) & y_2^0 - C(Y_2) & \cdots & y_i^0 - C(Y_i)
\end{pmatrix}. \tag{2.16}
\]

Multiplying both sides of (2.14) from the left by the matrix

\[
Q_{i+1} =
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
-1 & 1 & 0 & \cdots & 0 \\
0 & -1 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 1
\end{pmatrix} \tag{2.17}
\]

gives an upper triangular system that can be solved for $\underline{g}_{i+1}$ as follows,

\[
\underline{g}_{i+1} =
\begin{pmatrix}
y_1^0 - C(Y_1) & C(Y_2 \mid y_1^0) - C(Y_2) & \cdots & C(Y_i \mid y_1^0) - C(Y_i) \\
0 & y_2^0 - C(Y_2 \mid y_1^0) & \cdots & C(Y_i \mid y_1^0, y_2^0) - C(Y_i \mid y_1^0) \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & y_i^0 - C(Y_i \mid y_1^0, \ldots, y_{i-1}^0)
\end{pmatrix}^{-1}
\underline{q}_{i+1}, \tag{2.18}
\]

where $\underline{q}_{i+1} = Q_{i+1}\underline{h}_{i+1}$. Under conditions (2.7) and (2.8), the upper triangular matrix in (2.18) is nonsingular and hence a unique solution for $\underline{g}_{i+1}$ exists.

It remains now to elicit the value of the spread $S(Y_{i+1})$ in (2.11). Kadane et al. (1980) used the elicited conditional spread, with the properties of the conditional spread of the multivariate t distribution, to get a formula for $S(Y_{i+1})$ as follows,

\[
S(Y_{i+1}) = S(Y_{i+1} \mid y_1^0, \ldots, y_i^0)\,\left[1 + i/\delta\right]\left[1 + H_i/\delta\right]^{-1} + \underline{g}_{i+1}'\,U_i\,\underline{g}_{i+1}, \tag{2.19}
\]

where

\[
H_i = \left(y_1^0 - C(Y_1), \ldots, y_i^0 - C(Y_i)\right) U_i^{-1} \left(y_1^0 - C(Y_1), \ldots, y_i^0 - C(Y_i)\right)'.
\]

Using the Schur complement, the matrix $U_{i+1}$, as partitioned in (2.11), is positive-definite if and only if $U_i$ is positive-definite and

\[
S(Y_{i+1}) - \underline{g}_{i+1}'\,U_i\,\underline{g}_{i+1} > 0, \tag{2.20}
\]

which is guaranteed from (2.19). Then, using mathematical induction, the final matrix $U_m$ is positive-definite.

To elicit $R$ using $U_m$, properties of the multivariate t distribution were used to yield the following formula

\[
R^{-1} = \frac{1}{w}\,(X'X)^{-1} X' \left(U_m - w I_m\right) X (X'X)^{-1}, \tag{2.21}
\]

where $I_m$ is the identity matrix of order $m$. See Kadane et al. (1980) for details. The formula requires $w$ to be elicited first.

To elicit $w$, the expert is asked to suppose that two independent observations $Y_i$ and $Y_i^*$ are taken at the same design point $\underline{X} = \underline{X}_i$. Given $y_1^0, \ldots, y_{i-1}^0$, the expert assesses the median of $Y_i$, which is used to estimate $C(Y_i \mid y_1^0, \ldots, y_{i-1}^0)$. Then the expert is given a hypothetical value $y_i^0$ for $Y_i$ and is asked to assess the conditional median of $Y_i^*$ given $y_1^0, \ldots, y_i^0$, to be used as an estimate of $C(Y_i^* \mid y_1^0, \ldots, y_i^0)$.
The conditional distribution of the two observations is a bivariate t, and its properties were used to elicit $w_i$ as

\[
w_i = \left[S(Y_i \mid y_1^0, \ldots, y_{i-1}^0) - K_i\right]\frac{\delta + i - 1}{\delta + L_i}, \tag{2.22}
\]

where

\[
K_i = \left[C(Y_i^* \mid y_1^0, \ldots, y_i^0) - C(Y_i \mid y_1^0, \ldots, y_{i-1}^0)\right]\frac{S(Y_i \mid y_1^0, \ldots, y_{i-1}^0)}{y_i^0 - C(Y_i \mid y_1^0, \ldots, y_{i-1}^0)}
\]

and

\[
L_i = \left(y_1^0 - \underline{X}_1'\underline{b}, \ldots, y_{i-1}^0 - \underline{X}_{i-1}'\underline{b}\right) U_{i-1}^{-1} \left(y_1^0 - \underline{X}_1'\underline{b}, \ldots, y_{i-1}^0 - \underline{X}_{i-1}'\underline{b}\right)'.
\]

The different values $w_1, \ldots, w_m$ were then averaged to get a final elicited value $w$. Our extension of the method suggested by Garthwaite and Dickey (1988) for eliciting $w$, as proposed in Chapter 5, makes the same assumption of getting two independent observations at the same design point. But we require a median assessment of the difference between the two observations, which is due only to the random variation.

The method of Kadane et al. (1980) has been extensively reviewed in the literature. See for example Kadane and Wolfson (1998) and Daneshkhah and Oakley (2010), where two extra examples of its implementation were also discussed. Two drawbacks of the method were mentioned by Garthwaite et al. (2005). First, the assessments it uses are likely to be biased by conservatism, as the expert is asked to revise her opinion based on hypothetical data. Second, eliciting the spread using the median and upper quartile may not reflect both halves of the distribution, hence masking any asymmetry of expert opinion.

Some other alternative methods for eliciting the parameters of normal linear models are available in the literature. See, for example, Oman (1985), Garthwaite and Dickey (1988, 1992) and Ibrahim and Laud (1994). Oman (1985) used empirical Bayes methods to estimate both $\delta$ and $R$ instead of eliciting them from the expert. The method of Garthwaite and Dickey (1988) is similar to that of Kadane et al. (1980) in that both make use of repeated assessments that are reconciled, and both utilize a structural set of conditional questions to guarantee the positive-definiteness of the covariance matrix.
However, instead of asking about $Y_i$, Garthwaite and Dickey (1988) suggested asking the expert about the mean $\bar{Y}_i$ of $Y$ that may be observed in a large number of experiments at the design point $\underline{X}_i$. In this way, the expert’s assessments do not include random variation. On the other hand, the design points that are used in Garthwaite and Dickey (1988) are to be selected by the expert. This enabled the method to be extended to the variable selection problem in linear models; see Garthwaite and Dickey (1992). Nevertheless, the method of Kadane et al. (1980) is more flexible than that of Garthwaite and Dickey (1988). The latter is not designed to handle categorical explanatory variables or polynomial regression models that contain interactions between explanatory variables. A more detailed review of elicitation for normal linear models can be found in Garthwaite et al. (2005) or O’Hagan et al. (2006).

2.4 Prior elicitation for GLMs

Starting from the idea that it is more efficient and easier to elicit expert opinion about observable quantities, rather than about parameter values, Bedrick et al. (1996) were the first to elicit priors for some arbitrary generalized linear models. Their work moved from elicitation for normal linear regression (Kadane et al. (1980); Garthwaite and Dickey (1988); Garthwaite and Dickey (1992)) to GLMs. Their specification of informative prior distributions for the regression coefficients of a GLM is based on expanding the idea of conditional means priors (CMP).

The idea of the CMP is that the expert is asked to give his assessment of the mean of potential observations conditional on given values at some carefully chosen points in the explanatory variable space. This information is used to specify a prior distribution at each location point. These priors are conveniently assumed to be independent for the various locations. A prior distribution for the regression coefficient vector is then induced from the CMP.
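To clarify this idea, consider for example the binomial GLMs, with n independent observations $Y_i$, each with a corresponding vector $\underline{X}_i$ of p explanatory variables. Let $N_i Y_i \mid \underline{X}_i \sim \mathrm{Binomial}(N_i, \mu_i)$, hence $\mu_i = E(Y_i \mid \underline{X}_i)$. The probability of success $\mu_i$ is related to the vector $\underline{X}_i$ through a monotonic increasing link function $g(\cdot)$ as

\[
g(\mu_i) = \underline{X}_i'\underline{\beta}, \tag{2.23}
\]

where $\underline{\beta}$ is a p vector of regression coefficients. Common choices for the link function $g(\cdot)$ yield logistic, probit and complementary log-log regressions. The likelihood function for $\underline{\beta}$ is given by

\[
L(\underline{\beta}) \propto \prod_{i=1}^{n} \left[g^{-1}(\underline{X}_i'\underline{\beta})\right]^{N_i y_i}\left[1 - g^{-1}(\underline{X}_i'\underline{\beta})\right]^{N_i(1 - y_i)}. \tag{2.24}
\]

Bedrick et al. (1996) induced the prior on $\underline{\beta}$ from a CMP on $\tilde{\mu}_i = E(\tilde{Y}_i \mid \underline{\tilde{X}}_i)$, the success probability for a “potentially observable” response $\tilde{Y}_i$ at the vector $\underline{\tilde{X}}_i$ of explanatory variables. They assume that the p vectors $\underline{\tilde{X}}_i$ are linearly independent and assume that

\[
\tilde{\mu}_i \sim \mathrm{beta}(a_{1,i}, a_{2,i}). \tag{2.25}
\]

Hence, from independence, the prior on $\underline{\tilde{\mu}}$ is given by

\[
\pi(\underline{\tilde{\mu}}) \propto \prod_{i=1}^{p} \tilde{\mu}_i^{\,a_{1,i} - 1}\left(1 - \tilde{\mu}_i\right)^{a_{2,i} - 1}. \tag{2.26}
\]

Under the independence assumption and from (2.23) and (2.26), they gave the induced prior on $\underline{\beta}$ as

\[
\pi(\underline{\beta}) \propto \prod_{i=1}^{p} \left[g^{-1}(\underline{\tilde{X}}_i'\underline{\beta})\right]^{a_{1,i} - 1}\left[1 - g^{-1}(\underline{\tilde{X}}_i'\underline{\beta})\right]^{a_{2,i} - 1} d\!\left[g^{-1}(\underline{\tilde{X}}_i'\underline{\beta})\right], \tag{2.27}
\]

where $d[g^{-1}(\cdot)]$ denotes the derivative of $g^{-1}$ with respect to its argument, which arises from the change of variables from $\underline{\tilde{\mu}}$ to $\underline{\beta}$.

Although the above example is only valid for binomial GLMs, Bedrick et al. (1996) gave generalizations and examples where their method is applicable to common GLMs including Poisson and exponential regression. However, for normal and gamma regression models they were only interested in eliciting priors on the regression coefficients $\underline{\beta}$, assuming that the dispersion parameters of these models are known. The power of this approach, as they stated, is that “it is much easier to elicit information about success probabilities such as $E(Y \mid \underline{X}) = \mu$, which are on the same scale as the data, than to attempt the extremely difficult task of eliciting prior knowledge about $\underline{\beta}$.”

In their work, the use of data augmentation priors (DAP) was also proposed to induce priors on $\underline{\beta}$.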
They showed that DAPs are closely related to CMPs and can be induced by particular cases of CMPs. A DAP on $\underline{\beta}$ has the same functional form as the likelihood and can be obtained by specifying "prior observations" and their weights. These prior observations must be taken at specific locations in the predictor space. Hence, a DAP also needs some locations in the predictor space to be specified, as in the case of a CMP. Good locations in the predictor space should lie in the expected range of $\underline{X}$, be spread out enough that the corresponding probabilities can reasonably be assumed to be independent, and be accepted by the expert. It is straightforward, however, to let the field expert choose these locations. Bedrick et al. (1996) noted that the independence in CMPs does not mean that the components of the vector $\underline{\beta}$ will be independent too.

After selecting a proper $\underline{X}_i$, $i = 1, \cdots, p$, the value of $Y_i$ to be determined in a DAP can be thought of as a typical prior observation associated with $\underline{X}_i$. For example, in binomial GLMs, it can be thought of as a prior estimate of the mean number of successes at $\underline{X}_i$. If the beta prior in (2.25) is reparameterized such that

$$a_{1,i} = w_i Y_i \quad \text{and} \quad a_{2,i} = w_i (1 - Y_i), \tag{2.28}$$

then, for the logistic model, the CMP in (2.27) is exactly a DAP since it takes the same functional form as the likelihood in (2.24). The CMP in (2.27) induces a DAP for the logistic model as the logit link function is such that

$$d\left[g^{-1}(\underline{X}'\underline{\beta})\right] = g^{-1}(\underline{X}'\underline{\beta}) \left[1 - g^{-1}(\underline{X}'\underline{\beta})\right]. \tag{2.29}$$

The induced DAP in (2.27), using (2.28) and (2.29), is proportional to a likelihood based on the "prior observations" $(Y_i, \underline{X}_i, w_i : i = 1, \cdots, p)$. The weight parameter $w_i$ in (2.28) can be interpreted as the prior number of observations associated with $Y_i$. Consequently, large values of $w_i$ reflect more confidence in the prior belief, which means that the prior is relatively more informative.
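The equivalence between the CMP and the DAP under the logit link can be checked numerically. The following is a minimal sketch, not code from Bedrick et al. (1996): the function names, the two design vectors, the prior observations and the weights are all hypothetical values chosen for illustration. It evaluates the log of the induced prior, using the reparameterization (2.28) and the logit derivative (2.29), and compares it with the log of a likelihood of the form (2.24) evaluated at the prior observations.

```python
import math

def expit(t):
    """Inverse logit link g^{-1}."""
    return 1.0 / (1.0 + math.exp(-t))

def linear_predictor(beta, x):
    """Inner product x' beta for one design vector."""
    return sum(b * v for b, v in zip(beta, x))

def log_cmp_prior(beta, X, a1, a2):
    """Log of the prior induced on beta by independent beta priors (up to a constant)."""
    total = 0.0
    for x_i, a1_i, a2_i in zip(X, a1, a2):
        mu = expit(linear_predictor(beta, x_i))
        jacobian = mu * (1.0 - mu)  # derivative of the inverse logit, as in (2.29)
        total += ((a1_i - 1.0) * math.log(mu)
                  + (a2_i - 1.0) * math.log(1.0 - mu)
                  + math.log(jacobian))
    return total

def log_dap(beta, X, y, w):
    """Log of a binomial-type likelihood at prior observations y with weights w."""
    total = 0.0
    for x_i, y_i, w_i in zip(X, y, w):
        mu = expit(linear_predictor(beta, x_i))
        total += w_i * y_i * math.log(mu) + w_i * (1.0 - y_i) * math.log(1.0 - mu)
    return total

# Hypothetical inputs: p = 2 linearly independent design vectors,
# prior observations y (proportions) and prior weights w.
X = [(1.0, -1.0), (1.0, 2.0)]
y = [0.3, 0.8]
w = [5.0, 10.0]
a1 = [w_i * y_i for w_i, y_i in zip(w, y)]          # a_{1,i} = w_i * Y_i
a2 = [w_i * (1.0 - y_i) for w_i, y_i in zip(w, y)]  # a_{2,i} = w_i * (1 - Y_i)

beta = (0.4, -0.7)
gap = log_cmp_prior(beta, X, a1, a2) - log_dap(beta, X, y, w)
print(abs(gap) < 1e-9)
```

The Jacobian term contributes one extra power of each of $\mu_i$ and $1 - \mu_i$, which is what turns the exponents $a_{1,i} - 1$ and $a_{2,i} - 1$ into $w_i Y_i$ and $w_i(1 - Y_i)$.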
However, these extra parameters need to be quantified, which may make the CMP the easier of the two to elicit.

Although the resulting priors are not necessarily members of any specific family of distributions, Bedrick et al. (1996) argued that the CMP and DAP priors lead to tractable posteriors for GLMs through importance sampling and Gibbs sampling techniques.

Another approach for eliciting different classes of priors for GLM parameters started with the work of Ibrahim and Laud (1994) for normal linear models. Their work was then extended to prior elicitation and variable selection for logistic regression models by Chen et al. (1999). A further extension to GLMs was given by Chen et al. (2000), who proposed the class of power priors for GLMs.

The main idea of the above series of papers is that a prior prediction vector $\underline{Y}_0$ can be specified for the response vector $\underline{Y}$, either using historical data or an expert's opinion. A scalar $0 < a_0 < 1$ also needs to be elicited to quantify the expert's confidence in her best guess $\underline{Y}_0$ relative to the actual data. Hence the scalar $a_0$ reflects the contribution of the prior information in the posterior relative to the information given by the current experiment. Together with the design matrix $X$, $\underline{Y}_0$ and $a_0$ are used to specify an informative prior for regression coefficients.

In the class of power priors, the prior density is raised to the power $a_0$, which is considered as a precision parameter that controls the heaviness of the tails of the prior distribution. For a random $a_0$, a beta distribution was assumed by Chen et al. (2000) as a prior for $a_0$. Although the class of power priors cannot be expressed in a closed form, Chen et al. (2000) discussed its theoretical properties and propriety together with its required computations. Different extensions to this class of priors have been proposed in the literature.
For example, based on the same ideas, Chen and Ibrahim (2003) proposed a class of conjugate priors for GLMs and discussed its elicitation. Moreover, Chen et al. (2003) introduced an informative class of priors for generalized linear mixed models. Extensions to variable selection were suggested by Meyer and Laud (2002), Chen and Dey (2003) and Chen et al. (2008).

Garthwaite and Al-Awadhi (2006) developed an elicitation method for piecewise-linear logistic regression. The method is also valid for other GLMs, and Garthwaite and Al-Awadhi (2011) extend the idea to GLMs with any link function. They assumed a multivariate normal distribution for the regression coefficients; its parameters can be determined from the expert assessments. One of the main aims of this current thesis is to extend this piecewise-linear elicitation method in the context of GLMs to treat the case of correlated regression coefficients. The method is reviewed in detail in Chapter 3 and the proposed extensions are given in Chapters 4 and 5.

The piecewise-linear elicitation method was designed to be used with the aid of interactive graphical software written for this purpose. Older prototypes of the software were used in practical case studies for threatened species in Garthwaite (1998) and Al-Awadhi and Garthwaite (2006). A more recent version of the software has been written by Jenkinson (2007); this version of the software has been reviewed, modified and extended further in Chapters 3, 4 and 5 of the current thesis.

Another prototype of the interactive graphical software was given by Kynn (2005, 2006) to elicit expert opinion for the Bayesian logistic regression model. The software is called ELICITOR and appeared as an add-on to WinBUGS. Kynn extended the program written by Garthwaite (1998) and rewrote it in a more robust programming language.
The software was originally developed as a user-friendly tool for quantifying environmental experts' knowledge while studying the presence or absence of endangered species. It adopted the same approach as Al-Awadhi and Garthwaite (2006).

Following Garthwaite (1998), the elicitation scheme adopted in ELICITOR is based on the logistic regression model in which the probability of the presence of an endangered species is represented by a Bernoulli distribution and can be related to a number of environmental variables via a logit function. The expert is asked to give conditional probability assessments at the preferred or optimum site of species presence as the intercept. Then assessments are made at other sub-optimum levels of each other covariate.

The choice of the "optimum" value or level of each covariate to be its intercept, also called the reference value, is made by Garthwaite (1998) and thoroughly justified in Kynn (2006). She argued that it is psychologically meaningful to the expert to be asked about conditional probabilities given that all, or all except one, of the covariates are at their optimum level. In this case, conditioning on all other covariates can be translated in the expert's mind as conditioning on one event where everything is optimal. Kynn also mentioned some ecological concerns that make the optimum point a good selection; a notable one is that the distribution of species responses is usually considered to be unimodal. However, in our extensions to the piecewise-linear model, the expert freely chooses the reference level, although she is advised to select the optimum one.

While categorical covariates are related to the probability of presence, or generally of success, through a bar chart in both ELICITOR and the prototype and its extensions, representing continuous covariates is clearly different.
ELICITOR does not only assume a piecewise-linear relation between continuous covariates and the presence probability, but it also offers the options of linear and quadratic functions to model this relation. Nevertheless, Kynn (2006) stated that the fully linear form is not realistic and that the quadratic form can be too restrictive. We believe that the piecewise-linear relation is a very general form that can model many other forms as special cases.

The main critical point in the statistical model of ELICITOR is that the regression coefficients are assumed to be independent a priori, an assumption that may not be true in many situations. Thus, only univariate normal priors were elicited and no attempt was made to elicit covariances, even for the coefficients at the dividing points of the same piecewise-linear curve or at the different levels of each single categorical covariate.

The idea of successive sub-division, also called the bisection method, as a technique to assess the three quartiles from an expert, has been generally accepted as a comparatively easy task for the expert to perform. The prototype software in Garthwaite (1998) and its extensions apply the bisection method to obtain the expert's assessments. However, Kynn (2006) has a detailed discussion about available alternatives to assess percentiles, and cites results of studies comparing these methods. But in designing ELICITOR, she decided to use a quite different technique by letting the expert give the two boundaries of a credible interval, then give the probability of this interval. Despite being easy to perform, this method does not seem to have been adequately tested or justified.

Rather than assessing probabilities as numbers, the users of ELICITOR have more interactive visualizations for estimating probabilities. These include a probability wheel, a probability bar and other visualizations to help experts assess probabilities closer to their knowledge.
The feedback provided after the assessment process consists of alternative credible intervals and probability distribution functions for the intercept and categorical variables.

ELICITOR was intended to be extended to encompass other GLMs, with flexible options for the link functions and prior distributions, not only the logistic regression. The software documentation mentioned that this and other extensions were being tested, but we do not know of any version of the software where these extensions have been implemented. For more details on ELICITOR see Kynn (2005); Kynn (2006) and O'Leary et al. (2009), although the software and its documentation no longer seem to be openly available on the web.

Denham and Mengersen (2007) introduced a method and developed software to elicit expert opinion based on maps and geographic data for logistic regression models. Eliciting information on observable quantities, such as values of the dependent variable at given values of the predictors (referred to as the predictive procedure), is usually preferred and easier than direct assessment of the regression parameters (the structural procedure). However, they argued that each procedure is more convenient for a specific type of expert. For example, they considered two types of ecological experts: the 'physiologist', who has a good understanding of the physical requirements of each species, is more likely to respond well to a structural elicitation. The 'field ecologist', who has more knowledge about the places of existence for each species, may be better at responding to a predictive elicitation.

Denham and Mengersen (2007) proposed a new approach that combines both strategies. In their combination approach, the expert may use either method, or the two methods simultaneously, with each variable, according to his preference and background.
They adopted the usual logistic regression for species modelling, $Y_i \sim \text{Binomial}(n_i, \mu_i)$, with the logit link function $Y_i = g(\mu_i) = \log(\mu_i / (1 - \mu_i))$, and $\underline{Y} = X\underline{\beta}$, where $Y_i$ is the number of observations of a species at site $i$, and $X$ is the matrix of explanatory variables. The aim is to quantify the expert's opinion about the prior distribution of $\underline{\beta}$ in the form $\underline{\beta} \sim \text{MVN}(\underline{b}, \Sigma)$. They stated that the methods of Kadane et al. (1980) and Garthwaite and Dickey (1988, 1992) can be used in this context to estimate the hyperparameters $\underline{b}$ and $\Sigma$ by asking the expert to assess some quantile information for the value of $Y$ at particular values of $\underline{X}$. However, they referred to the difficulty of this predictive elicitation procedure for the 'field ecologists', who may have knowledge about the presence of a specific species at a site located on a map rather than about the explanatory variables affecting this presence.

To help this type of expert, Denham and Mengersen (2007) suggested two alternatives. The method of Kadane et al. (1980) can be used, with the expert choosing the design points based on location, without specific reference to explanatory variables. Or, instead, the design points could be selected as in the method of Kadane et al. (1980), and then transformed to map locations that are displayed on the map for the expert.

Their proposed combination approach as an elicitation method is not only a hybrid approach that combines both the predictive and structural procedures together, but it also offers the opportunity to use either of the two procedures simultaneously for each single variable. The basis of their method is to use the standard elicitation method with maps, as discussed above, to derive a "first pass" elicitation of $\underline{b}$. A structural elicitation procedure is then applied. The latter is implemented by presenting a univariate graph for each of the $p$ explanatory variables.
In each graph, they fix all the other $p - 1$ variables at their mean or median value, i.e. for the $j$th variable, $j = 1, \cdots, p$, they display the graph of

$$Y = b_0 + b_j X_j + \sum_{k=1, k \neq j}^{p} b_k \bar{X}_k.$$

These univariate graphs are automatically updated once the expert updates the map by adding new points or editing values. Moreover, the expert can directly manipulate the graphs, which causes the map to change automatically as well. The expert is meant to keep changing the map and/or the graphs until they all represent her prior knowledge. To elicit $\Sigma$, the expert is asked to provide a 95% "envelope" around the displayed regression lines by assessing upper and lower 95% quantiles.

To apply this approach, Denham and Mengersen (2007) developed elicitation software under a Geographic Information System (GIS), in which design points were actual locations on interactive maps. They listed the benefits of the elicitation procedure using the software with interactive maps over the usual elicitation with paper maps. The new procedure is more flexible: it allows the expert to access information at any point in a convenient manner. The scale dependency of hard copy maps can be removed by zooming in and out. Using the software allows the responses to be visualized and provides feedback to the expert, who can then revisit and/or modify any previous assessment on the interactive map.

Denham and Mengersen (2007) implemented their software in two case studies, one modeling the median house prices in an Australian city and one predicting the distribution of an endangered species in Queensland. In their first case study, they modelled the median house prices using a piecewise-linear regression to attain flexibility and maintain the simplicity of the linear regression. They chose the dividing knots of the piecewise-linear relations as the 0.33 and 0.66 quantiles of each explanatory variable.
Their model takes the form

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X'_{i1} + \beta_3 X''_{i1} + \beta_4 X_{i2} + \beta_5 X'_{i2} + \beta_6 X''_{i2},$$

where $X_1$ is the distance from the city center in kilometers and $X_2$ is the distance from the river in kilometers. For $j = 1, 2$, they defined $X'_{ij}$ and $X''_{ij}$ as

$$X'_{ij} = \begin{cases} X_{ij} - X_{0.33j} & \text{if } X_{ij} > X_{0.33j}, \\ 0 & \text{otherwise,} \end{cases}$$

and

$$X''_{ij} = \begin{cases} X_{ij} - X_{0.66j} & \text{if } X_{ij} > X_{0.66j}, \\ 0 & \text{otherwise,} \end{cases}$$

where $X_{0.33j}$ and $X_{0.66j}$ are the 0.33 and 0.66 quantiles of $X_j$, respectively. They meant to simplify the Bayesian prior structure of the model compared to that of Kadane et al. (1980) or Garthwaite and Dickey (1992), to be of the form

$$Y \mid \underline{X}, \underline{\beta}, \sigma^2 \sim N(\underline{X}'\underline{\beta}, \sigma^2),$$
$$\underline{\beta} \sim \text{MVN}(\underline{b}, \Sigma),$$
$$\sigma^2 \sim \text{Inverted Gamma}(\nu_0/2, \nu_0 S_0/2).$$

In this case study, they specify a prior for the regression parameters $\underline{\beta}$. However, it does not seem that they implemented any procedure to elicit the two extra hyperparameters $\nu_0$ and $S_0$.

The results suggested that the experts managed to give quantifications of their opinions of the house prices in the city that were consistent with the actual house prices. The priors appeared to be relatively consistent. All participating experts in this case study reported that they preferred the combined approach over the map or the standard approach. Most experts elicited slightly different priors under the different elicitation methods they used.

The second case study in Denham and Mengersen (2007) was devoted to eliciting two experts' opinions about the distribution of the brush-tailed rock-wallaby in Queensland. The explanatory variables were chosen by one of the experts to be $X_1$: a measure of terrain, $X_2$: a moisture index, $X_3$: aspect and $X_4$: a 4-category variable representing the rock type. They were interested in a logistic model of the form

$$Y_i \sim \text{Bernoulli}(p_i),$$
$$\text{logit}(p_i) \sim N(\mu_i, \sigma^2),$$
$$\mu_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots,$$
$$\underline{\beta} \sim \text{MVN}(\underline{b}, \Sigma),$$

in which the linear predictor $\mu_i$ has nine coefficients $\beta_0, \cdots, \beta_8$ on terms in $X_1, \cdots, X_4$. They aimed to elicit the multivariate normal prior of $\underline{\beta}$.
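The hinge terms $X'_{ij}$ and $X''_{ij}$ of the piecewise-linear house price model in the first case study can be built as follows. This is a minimal sketch rather than the code of Denham and Mengersen (2007): the helper names and the example distances are hypothetical, and the linear-interpolation quantile rule is an assumption, since the source does not say which sample quantile definition was used.

```python
def quantile(values, q):
    """Sample quantile by linear interpolation (assumed convention)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])

def hinge(x, knot):
    """x - knot above the knot, zero otherwise."""
    return x - knot if x > knot else 0.0

def piecewise_row(x1, x2, knots1, knots2):
    """Design row (1, X1, X1', X1'', X2, X2', X2'') for one observation."""
    row = [1.0]
    for x, (k33, k66) in ((x1, knots1), (x2, knots2)):
        row += [x, hinge(x, k33), hinge(x, k66)]
    return row

# Hypothetical distances (km) used only to place the knots.
distances = [float(v) for v in range(101)]
knots = (quantile(distances, 0.33), quantile(distances, 0.66))

# One site 50 km from the center and 10 km from the river.
row = piecewise_row(50.0, 10.0, knots, knots)
print(row)
```

Each coefficient on a hinge term is then a change in slope at the corresponding knot, which is how the model keeps the simplicity of linear regression while allowing the slope to vary.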
The experts were allowed to choose the design points. The expert chooses a design point by clicking on a map, then an interactive dialogue pops up giving a plot of a beta distribution of the probability of presence at the selected design point. The given plot has three adjustable points at the median and the 0.05 and 0.95 quantiles. The expert is asked to adjust the three quantiles, or the computed beta parameters, until the presented beta curve is the best representation of the expert's belief about the probability of the species' presence at the selected point. This procedure is repeated for a number of design points.

Once the expert has selected a minimum number of points, a logistic regression model is fitted by the software at each design point. Then the univariate relation between the probability of presence and each of the explanatory variables is presented to the expert in a separate graph, a response curve. Each curve is drawn assuming that the other variables are kept fixed at their means. The categorical variable $X_4$ is represented by box-plots rather than a curve. The expert can review and modify the design points and see the automatic impact on the response curves.

The elicited beta distributions at the design points could be used to elicit the multivariate normal distribution of the regression parameters $\underline{\beta}$ through weighted logistic regression or a simulation-based approach, see Denham and Mengersen (2007) for more details. They stated that the priors elicited from the experts were reasonably informative, with corresponding posteriors that are clearly different from those obtained from a uniform improper prior.

Although the software is specially designed for geographical data elicitation of a logistic regression model, they indicated that the concepts can be generalized to any GLM.
However, Denham and Mengersen (2007) wrote the software explicitly for each of the two case studies separately, tailored for the given cases and sets of explanatory variables. In its present form their software is thus limited and cannot be used as a general elicitation tool. Moreover, they used the R language to code statistical functions, with Visual Basic and other software for interactive graphs embedded in the GIS system. The latter limits the usability of their software.

Jenkinson (2007) re-wrote the software of Garthwaite and Al-Awadhi (2006) in Java to provide a more transportable and stable version. He gave a detailed description and documentation of both the software and the piecewise-linear theoretical model behind it [Jenkinson (2007), pp. 215-251]. Further modifications of the theoretical model and the software are given in this current thesis in Chapters 3, 4 and 5.

An important medical application of the GLM elicitation software is given in a case study reported in Garthwaite et al. (2008). Aiming to estimate the costs and benefits of current and alternative bowel cancer service in England, a pathway model was developed, whose transition parameters depend on covariates such as patient characteristics. Data to estimate some parameters were lacking and expert opinion was elicited for these parameters, using the indicated software and under the assumption that the quantity of interest was related to covariates by the generalized piecewise-linear model given by Garthwaite and Al-Awadhi (2006). The assessments were used to determine a multivariate normal distribution to represent the expert's opinions about the regression coefficients of that model. One conclusion of this work was that quantifying and using expert judgement can be acceptable in real problems of practical importance, provided that the elicitation is carefully conducted and reported in detail.

A thorough detailed comparison has been conducted by O'Leary et al.
(2009) for three relatively recent elicitation tools for logistic regression. The comparison included the interactive graphical tool of Kynn (2005) and Kynn (2006), the geographically assisted tool under GIS of Denham and Mengersen (2007) and a third simple direct questionnaire tool with no software. These tools were compared in an elicitation workshop (see O'Leary et al. (2009) for more details on the third method). The paper discusses and gives a detailed description of each of the three methods used, showing advantages and disadvantages of each of them.

Methods were compared according to their differences in the type of elicitation, the proposed prior model, the elicitation tool and the requirement of a facilitator to help the expert. Prior knowledge of two experts was elicited to model the habitat suitability of the endangered Australian brush-tailed rock-wallaby. The comparison revealed that the elicitation method influences the expert-based prior, to the extent that the three methods gave substantially different priors for one of the experts. Some guidelines were also given for proper selection of the elicitation method.

This work of O'Leary et al. (2009) is part of a large body of applied research which shows the importance of eliciting expert knowledge when modeling rare event data, see also Kynn (2005); Al-Awadhi and Garthwaite (2006); Low Choy et al. (2009) and Low Choy et al. (2010).

Although they are interested mainly in designing the elicitation process for ecological applications, Low Choy et al. (2009) give a framework for statistical design of expert elicitation processes for informative priors which may be valid for Bayesian modeling in any field.
The proposed design consists of six steps, namely, determining the purpose and motivation for using prior information; specifying the relevant expert knowledge available; formulating the statistical model; designing effective and efficient numerical encoding; managing uncertainty; and designing a practical elicitation protocol. Other important stages in the elicitation process may be found in Garthwaite et al. (2005), Jenkinson (2007) and Kynn (2008). Low Choy et al. (2009) validated these six steps in a detailed discussion and comparison of five case studies, revisiting the principles of successful elicitation in a modern context.

The recent work of James et al. (2010) is important to the current review in two respects. First, it introduces and describes a general elicitation tool for quantifying opinion in logistic regression using interactive graphical stand-alone software, called Elicitator. Second, the software is based on a novel statistical methodology to elicit a normal prior distribution for regression parameters.

Their work is an extension of that of Denham and Mengersen (2007) as applied to normal prior elicitation for logistic regression in a geographically-based ecological context. As mentioned before, Denham and Mengersen (2007) did not introduce a general purpose tool; their software was tailored to the requirements of specific case studies. Motivated by that, James et al. (2010) developed the Elicitator software as a stand-alone elicitation tool that can be used for a wide range of applications.

Although the Elicitator software is based on the same interface and protocol as its prototype in Denham and Mengersen (2007), the statistical method adopted to transform assessed values into elicited priors is a novel one inspired by the CMP ideas of Bedrick et al. (1996). James et al.
(2010) argued that the CMP is more tractable and more generally applicable than the predictive approach used by Kadane et al. (1980) and Denham and Mengersen (2007).

The novel modification in the Elicitator design to the approach of Bedrick et al. (1996) is that it relaxes the assumption that the number of chosen points at which the expert assesses her priors is exactly equal to the number $p$ of explanatory variables in the logistic model. This is the assumption that leads to the induced prior on $\underline{\beta}$ as in (2.27). Relaxing this assumption allows the number of elicitation points, say $k$, to exceed the number $p$ of explanatory variables, a situation that is commonly encountered. Although the prior on $\underline{\beta}$ can no longer be induced as in (2.27), James et al. (2010) proposed a measurement error model in which elicitation points represent data in a beta regression model. In this sense, increasing the number $k$ of elicitation points will lead to a more accurate prior.

Specifically, they assume a standard logistic regression model with a Bernoulli distribution and a logit link function as used by Bedrick et al. (1996). A main criticism is that they assume that the regression coefficients are independent a priori, in the sense that independent univariate normal priors were assumed for the components of $\underline{\beta}$, i.e.

$$\beta_j \sim N(b_j, \sigma_j^2), \quad j = 1, \cdots, p. \tag{2.30}$$

Although they mentioned the possibility of assuming a multivariate normal prior distribution, no attempt has been made to implement it in Elicitator.

For $i = 1, \cdots, k$, the expert assesses information about the probability of success $\mu_i$ at a geographical site $i$, selected by the expert, with a known combination of the explanatory variables $X_{1,i}, X_{2,i}, \cdots, X_{p,i}$. For example, the expert may assess information about the probability of presence of a species at a known combination of environmental predictors at site $i$. Following Bedrick et al.
(1996), the expert's assessments are used to elicit a beta prior on $\mu_i$ as in (2.25). However, in situations where $k > p$, a beta prior on $\mu_i$ would not help induce the normal prior for $\underline{\beta}$. Instead, James et al. (2010) assumed a beta prior on the expert's probability of success, say $Z_i$, which is different from the actual probability $\mu_i$. As in a measurement error model, $\mu_i$ is the conditional expectation of $Z_i$ in the sense that

$$\text{logit}(\mu_i) = \underline{X}_i'\underline{\beta},$$
$$Z_i \mid \mu_i \sim \text{beta}(a_{1,i}, a_{2,i}), \tag{2.31}$$
$$E(Z_i \mid \mu_i) = \mu_i.$$

James et al. (2010) discussed the expert's assessments about $Z_i$ that are required to elicit beta distributions as in (2.31). They argued that the required best estimate of the probability $Z_i$ in the measurement error model is the arithmetic mean, which however is difficult to assess. They were also against the idea of assessing the median, claiming that it needs more effort from the expert to assess. Hence, Elicitator requires the mode of $Z_i$ as its best estimate. Then, following the well-established practice of assessing several quantiles for beta elicitation, Elicitator requires the four bounds of the 50% and 95% credible intervals. Although two assessments are mathematically sufficient for eliciting the two beta parameters, it is better to elicit more assessments and reconcile them, especially for skewed distributions. A simple numerical procedure is used to elicit beta parameters from the mode and either two or four assessed quantiles.

To elicit the hyperparameters $b_j$ and $\sigma_j^2$, $j = 1, \cdots, p$, in (2.30) using the elicited beta parameters $a_{1,i}$ and $a_{2,i}$, $i = 1, \cdots, k$, in (2.31), James et al. (2010) proceed as follows. In principle, the beta regression in (2.31) is performed using the expert's data on $Z_i$ and the known values of the explanatory variables to provide the expert-defined estimates of $\underline{\beta}$.
However, due to difficulties in implementing any beta regression package in Elicitator, the beta regression problem has been approximated by its discrete version, a binomial regression. An R software package is used to perform the binomial regression, where point estimates $\hat{\beta}_j$ and their corresponding standard errors $\text{s.e.}(\hat{\beta}_j)$ are obtained. The prior distributions in (2.30) are finally elicited using these estimates as

$$\beta_j \sim N(\hat{\beta}_j, \text{s.e.}(\hat{\beta}_j)^2), \quad j = 1, \cdots, p. \tag{2.32}$$

Two criticisms of the proposed measurement error model in this context are as follows. First, it adds additional sources of uncertainty, namely, the discrepancy between the expert's probability $Z_i$ and the conceptual probability $\mu_i$. Second, it imposes difficulties in computation and implementation in the software, requiring a binomial regression approximation. However, these drawbacks do not seem to be a high price to pay for the increased accuracy gained by increasing the number of elicitation points of CMPs. Moreover, the use of beta or binomial regression makes it easy to present standard regression diagnostics to the expert as feedback.

Interactive graphs that are given by Elicitator to the expert as feedback fall into three main groups. The first group includes a box-plot, a pdf curve and some numeric statistics of the elicited beta prior at each site. These are all interactive in the sense that they are automatically modified if the expert changes her assessments of the mode value or the credible interval bounds of the probability of success at each site. The second group involves the univariate graphs that highlight the main effect of each explanatory variable associated with each of the elicitation sites. These graphs plot the elicited probability against the value of the site predictor with a standard regression fit. The categorical predictors are drawn as bars to emphasize their discrete nature. Various regression diagnostics graphs are given in the third group.
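As a rough illustration of the regression step that leads to (2.32), the sketch below fits a logit model to pseudo-binomial data by Newton-Raphson and converts the estimates and standard errors into normal prior parameters. It is not the Elicitator implementation, which calls R. The way the beta assessments are discretized here is an assumption: each expert probability $Z_i$ is paired with a hypothetical pseudo-sample size $n_i$, so that $n_i Z_i$ plays the role of a success count. The function names and data are also hypothetical, and the model is restricted to an intercept and one covariate to keep the linear algebra explicit.

```python
import math

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def score_and_information(b0, b1, x, z, n):
    """Score vector and Fisher information entries for the pseudo-binomial logit model."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x_i, z_i, n_i in zip(x, z, n):
        mu = expit(b0 + b1 * x_i)
        resid = n_i * (z_i - mu)        # pseudo-successes minus expected successes
        var = n_i * mu * (1.0 - mu)     # binomial variance weight
        g0 += resid
        g1 += resid * x_i
        h00 += var
        h01 += var * x_i
        h11 += var * x_i * x_i
    return g0, g1, h00, h01, h11

def fit_binomial_logit(x, z, n, iterations=25):
    """Newton-Raphson fit; returns (estimates, standard errors)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, x, z, n)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # first row of inverse information times score
        b1 += (h00 * g1 - h01 * g0) / det   # second row
    _, _, h00, h01, h11 = score_and_information(b0, b1, x, z, n)
    det = h00 * h11 - h01 * h01
    return (b0, b1), (math.sqrt(h11 / det), math.sqrt(h00 / det))

# Hypothetical expert probabilities lying exactly on a logit curve.
x = [-1.0, 0.0, 1.0, 2.0]
true_beta = (-0.5, 0.8)
z = [expit(true_beta[0] + true_beta[1] * x_i) for x_i in x]
n = [10.0, 10.0, 10.0, 10.0]   # assumed pseudo-sample sizes

estimates, std_errors = fit_binomial_logit(x, z, n)
# Independent normal priors in the spirit of (2.32): (mean, variance) per coefficient.
priors = [(b, s ** 2) for b, s in zip(estimates, std_errors)]
print(priors)
```

Under this assumed discretization, larger pseudo-sample sizes shrink the standard errors and hence the prior variances, so the choice of $n_i$ directly controls how informative the resulting priors are.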
These graphs help the expert consider how the estimated prior model elicited from her assessments corresponds to her knowledge overall.

The Elicitator software is written in Java and uses open source libraries. It does not require a commercial GIS, in contrast to the prototype of Denham and Mengersen (2007). All statistical calculations are performed using the R statistical package. Elicitator uses a Java package to communicate with R, without needing to run an actual instance of the R software. This greatly increases the generality and flexibility of Elicitator as a stand-alone tool that can be used by a wide range of experts with different backgrounds.

According to James et al. (2010), Elicitator is highly extensible and one of the main extensions they are willing to undertake is the ability to implement more GLMs rather than only the logistic regression model. But they did not mention or discuss how this can be done for other distributions and link functions under their proposed model for measurement error.

2.5 Prior elicitation for multinomial models

An early attempt to elicit a Dirichlet prior distribution for multinomial parameters was suggested by Bunn (1978). He argued that the usual fractile assessment procedure that has been used for eliciting beta priors may be difficult and tedious to apply to their multivariate extensions, the Dirichlet priors, when more conditions and restrictions must be taken into consideration. As will be shown in Chapter 6 of this thesis, developments in computing techniques and tools make it easy to implement fractile procedures in user-friendly software that assesses quartiles and elicits Dirichlet priors effectively and interactively. However, the approach suggested by Bunn (1978) as an alternative to the fractile method for Dirichlet elicitation was the method of 'imaginary results'.
He used two versions of this method, namely, the Equivalent Prior Samples (EPS) and the Hypothetical Future Sample (HFS), to quantify opinions about a Dirichlet prior. Specifically, let $p = (p_1, p_2, \ldots, p_k)$ be the vector of multinomial probabilities, with a Dirichlet prior distribution of the form

$$f(p) = \frac{\Gamma\left(\sum_{i=1}^{k} a_i\right)}{\prod_{i=1}^{k} \Gamma(a_i)} \prod_{i=1}^{k} p_i^{a_i - 1}, \quad \sum_{i=1}^{k} p_i = 1, \quad a_i > 0. \tag{2.33}$$

It can be shown that the posterior mean of $p_i$, say $\hat{p}_i$, after sampling $N$ data is given by

$$\hat{p}_i = \frac{a_i + n_i}{N + \sum_{j=1}^{k} a_j}, \tag{2.34}$$

where $n_i$ is the number of items, out of $N$, that fall in category $i$.

In the EPS method, the expert is asked to assess a set of prior means $p_i^*$, $i = 1, 2, \ldots, k$. She also assesses the equivalent sample size of her subjective belief that would empirically give this set of probabilities. This sample size gives direct information on $\sum_{i=1}^{k} a_i$. Thus, the prior hyperparameters can be elicited as

$$a_i = p_i^* \sum_{j=1}^{k} a_j. \tag{2.35}$$

The main criticism of the EPS method here is that the expert cannot easily give an assessment for $\sum_{i=1}^{k} a_i$ directly. The assessed value does not necessarily represent her opinion accurately and may contain sources of assessment bias. Therefore, Bunn (1978) proposed the alternative HFS method, in which the expert also assesses the set of prior expectations $p_i^*$, $i = 1, 2, \ldots, k$, but, in addition, she is asked to assess her posterior expectations, say $p_i^{**}$, $i = 1, 2, \ldots, k$, given that a hypothetical future sample of size $M$ has resulted in $m_i$ items in category $i$, where $1 \le m_i \le M$. Hence, the hyperparameters can be elicited, using (2.34) and (2.35), as

$$a_i = p_i^* \, \frac{m_i - M p_i^{**}}{p_i^{**} - p_i^*}. \tag{2.36}$$

The main source of bias in the HFS method is 'conservatism': the expert tends to revise her probabilistic beliefs from prior expectations to posterior expectations, in the light of the new data, by less than the revision indicated by Bayes theorem.
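The arithmetic behind (2.34) to (2.36) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not taken from Bunn (1978) or Dickey et al. (1983). The inputs are the assessments from the two case studies reported later in this section.

```python
def eps_hyperparameters(prior_means, total):
    """Equation (2.35): a_i = p_i^* times the assessed sum of the a_j."""
    return [p * total for p in prior_means]


def posterior_means(a, counts):
    """Equation (2.34): (a_i + n_i) / (N + sum of a_j), with N = sum of counts."""
    denom = sum(counts) + sum(a)
    return [(a_i + n_i) / denom for a_i, n_i in zip(a, counts)]


def hfs_total_estimates(prior_means, post_means, counts):
    """One estimate of sum(a_j) per category, i.e. (2.36) divided by p_i^*.

    Undefined for a category whose assessed posterior mean equals its
    prior mean (division by zero), so such categories must be excluded.
    """
    m_total = sum(counts)
    return [(m - m_total * q) / (q - p)
            for p, q, m in zip(prior_means, post_means, counts)]


# Bunn (1978): prior means, EPS total of 10, survey counts 6, 7, 7.
a_bunn = eps_hyperparameters([0.20, 0.30, 0.50], 10)
bayes_bunn = posterior_means(a_bunn, [6, 7, 7])
# bayes_bunn is approximately [0.267, 0.333, 0.400], as in (2.39)

# Dickey et al. (1983): prior means, assessed posterior means, hypothetical counts.
prior = [0.02, 0.08, 0.15, 0.75]
post = [0.05, 0.09, 0.16, 0.70]
counts = [16, 20, 32, 132]
totals = hfs_total_estimates(prior, post, counts)
avg_total = sum(totals) / len(totals)            # approximately 140
a_dickey = eps_hyperparameters(prior, avg_total)  # approximately 2.8, 11.2, 21, 105
```

The four per-category estimates of the total differ widely (roughly 200, 200, 0 and 160), which is the kind of discrepancy that a reconciliation step has to absorb.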
The strong assumptions of the HFS method, that the expert can be an 'intuitive Bayesian' and can modify her prior beliefs in the light of new data sets, turned out to be poorly satisfied in the case study of Bunn (1978) and other studies mentioned therein. For example, in eliciting beta priors, Winkler (1967) found that the methods of imaginary results gave greater bias than the usual fractile methods.

Another problem with the two methods suggested by Bunn (1978) is that probability means are directly elicited from the expert. We believe that medians are easier to assess and that, by using the bisection method, the expert will represent her beliefs more accurately. Although the unit sum of the probability assessments can be directly fulfilled by assessments of means (the means must sum to one), median assessments of these probabilities can be elicited for beta marginal or conditional distributions. Methods for reconciliation of elicited beta distributions into a Dirichlet prior are proposed in Chapter 6.

Bunn (1978) gave no guidance on how to select the hypothetical sample in the HFS method. Instead, in a case study, he used an actual sample based on a survey, and called his method an Actual Future Sample (AFS). To investigate the feasibility of this method and its possible biases and subjective inconsistencies, the AFS method was implemented in a case study reported in Bunn (1978). In this study, a publisher quantified his opinion about the expected market attitudes towards a new product. Different possible attitude events were summarized in three categories, for which he assessed their expected prior probabilities as

$$p_1^* = 0.20, \quad p_2^* = 0.30, \quad p_3^* = 0.50. \tag{2.37}$$

From his EPS assessment, $\sum_{i=1}^{3} a_i$ was set equal to 10. Then, a survey of 20 customers revealed that the numbers of customers in the three categories were 6, 7 and 7, respectively. Based on this survey, the publisher was asked to revise his prior probability expectations.
He gave the following posterior expectations

$$p_1^{**}(A) = 0.25, \quad p_2^{**}(A) = 0.30, \quad p_3^{**}(A) = 0.45. \tag{2.38}$$

To investigate the conservatism of the publisher, the posterior expected probabilities were computed as in (2.34). Since $a_1 = 2$, $a_2 = 3$, $a_3 = 5$, the computed posterior expectations given by Bayes theorem are

$$p_1^{**}(C) = 0.27, \quad p_2^{**}(C) = 0.33, \quad p_3^{**}(C) = 0.40. \tag{2.39}$$

Comparing the assessed posterior probabilities $p_i^{**}(A)$ in (2.38) to the computed ones $p_i^{**}(C)$ in (2.39) reveals the conservatism of the publisher, who did not revise his prior probabilities by as much as Bayes theorem would revise them.

Bunn (1978) discussed the possible reasons for the revealed bias and inconsistency in using the methods of imaginary results for eliciting a Dirichlet prior. He argued that the expert should complete several iterations with these methods to achieve consistent results. However, he did not discuss how this might be done through feedback given to the expert, nor did he suggest any method of reconciliation. These drawbacks of the imaginary results methods suggest that a fractile method is to be preferred, especially in multivariate cases where more inconsistency can be expected.

Using the same idea as the HFS method, and consequently the same forms of equation as in Bunn (1978), Dickey et al. (1983) reintroduced the elicitation method with a different case study. The mathematical formulation of the two methods is identical. However, two main differences in the elicitation process can be identified. In assessing the expected prior probabilities $p_i^*$, $i = 1, 2, \ldots, k$, Bunn (1978) assumed that the expert is coherently aware that these assessed expected probabilities must sum to one. In contrast, in the work of Dickey et al. (1983), the expert was free to assess the expected probabilities without being conscious of any probabilistic constraints. Instead, Dickey et al.
(1983) suggested normalizing the initial assessed probabilities, dividing each by their total, to get a normalized set (2.40) that is guaranteed to add up to one. We use this simple normalization procedure extensively for our proposed logistic normal prior in Chapters 8 and 9. An important property of a good elicitation method is that the expert is not overly conscious of the mathematical constraints on her assessments. Methods that include normalization and reconciliation procedures are generally better than those that ask the expert to make assessments that meet specified constraints.

The second difference between the elicitation procedure of Bunn (1978) and that of Dickey et al. (1983) regards the reconciliation of an expert's assessments. As mentioned before, given the hypothetical sample, one expected posterior probability suffices to elicit the full vector of the Dirichlet hyperparameters. But it is usually better to assess several posterior probabilities and then reconcile the different results. Bunn (1978) regarded discrepancies in results as inconsistency on the part of the expert and suggested asking the expert to resolve inconsistency by doing many iterations of the elicitation process. On the other hand, Dickey et al. (1983) suggested reconciling different hyperparameter values by averaging them. They also advised that large discrepancies may indicate that the Dirichlet distribution is not a suitable prior.

The case study in Dickey et al. (1983) quantified a social psychologist's opinion about the attitudes of potential jurors in law trials where the death penalty was available. Their attitudes were classified into four categories, and the psychologist's assessments of the prior probabilities of the categories were:

$$p_1^* = 0.02, \quad p_2^* = 0.08, \quad p_3^* = 0.15, \quad p_4^* = 0.75. \tag{2.41}$$

The psychologist was then told that a hypothetical sample of 200 potential jurors had been distributed between the four categories as 16, 20, 32, 132.
Given this information, the expert revised her prior probabilities and gave the following expected posterior probabilities:

$$p_1^{**} = 0.05, \quad p_2^{**} = 0.09, \quad p_3^{**} = 0.16, \quad p_4^{**} = 0.70. \tag{2.42}$$

Using each of these values in (2.36) gives an initial value of $a_i$, which can then be used in (2.35), together with the corresponding prior probability, to get an estimate of $\sum_{i=1}^{4} a_i$. These estimates were averaged in Dickey et al. (1983) and gave a value of 140. This gives the final elicited hyperparameter values, again from (2.35), as $a_1 = 2.8$, $a_2 = 11.2$, $a_3 = 21$, $a_4 = 105$.

In contrast to the case study in Bunn (1978), the expert here was not conservative; her posterior probabilities were closer to the relative frequencies of the hypothetical data, 0.08, 0.10, 0.16, 0.66, than to her prior probabilities. A lack of conservatism is also shown by the small value of $\sum_{i=1}^{4} a_i = 140$, compared to the hypothetical sample size of $N = 200$. Using (2.35), the posterior probabilities in (2.34) can be considered as a weighted average of the prior probabilities and the relative frequencies of the hypothetical sample, since

$$p_i^{**} = \frac{N}{N + \sum_{j=1}^{k} a_j} \cdot \frac{n_i}{N} + \frac{\sum_{j=1}^{k} a_j}{N + \sum_{j=1}^{k} a_j} \, p_i^*. \tag{2.43}$$

If the expert assesses $\sum_{i=1}^{k} a_i$ to be less than the hypothetical sample size $N$, then she gives more weight to the relative frequency of the hypothetical sample. If $\sum_{i=1}^{k} a_i = N$, then the expert has given her prior opinion and the data equal weight. As in Bunn (1978), Dickey et al. (1983) did not suggest a way to generate the hypothetical sample.

Another method for eliciting a Dirichlet prior distribution was developed by Chaloner and Duncan (1987) as an extension of their method for eliciting beta distributions (Chaloner and Duncan, 1983). Their approach relied on assessing the mode vector for the predictive distribution, and some probabilities for other vectors around the mode.
These assessments were used to elicit a Dirichlet-multinomial predictive distribution that was then used to induce a Dirichlet prior distribution for multinomial sampling. The approach thus differs from other Dirichlet elicitation methods in using mode assessments and in utilizing the predictive distribution rather than the prior distribution.

The predictive distribution of a multinomial likelihood and a conjugate Dirichlet prior is a Dirichlet mixture of multinomial distributions. This distribution is referred to as a Dirichlet-multinomial distribution and its probability mass function takes the form

$$f(x_1, x_2, \ldots, x_k) = \frac{\Gamma(n + 1)\,\Gamma(N)}{\Gamma(n + N)} \cdot \frac{\prod_{i=1}^{k} \Gamma(x_i + a_i)}{\left[\prod_{i=1}^{k} \Gamma(x_i + 1)\right]\left[\prod_{i=1}^{k} \Gamma(a_i)\right]}, \quad x_i \ge 0, \quad \sum_{i=1}^{k} x_i = n, \quad a_i > 0, \quad \sum_{i=1}^{k} a_i = N. \tag{2.44}$$

Chaloner and Duncan (1987) proved that the Dirichlet-multinomial predictive distribution in (2.44) is a unimodal distribution for large values of $n$. They also gave sufficient conditions under which a vector, with components greater than or equal to one, is the unique mode of the Dirichlet-multinomial distribution. These conditions are mainly related to the probabilities of a set of vectors that are coordinate adjacent to the mode vector. Moreover, the identifiability of the Dirichlet prior distribution from the Dirichlet-multinomial predictive distribution was also proved.

The above results were used in an elicitation scheme that was implemented in a computer program, in Chaloner and Duncan (1987), as follows. The expert specifies a large value of $n$ as the sample size. Then she specifies a mode vector $m = (m_1, m_2, \ldots, m_k)$ that satisfies $\sum_{i=1}^{k} m_i = n$ and $m_i \ge 1$. The computer program then uses a multinomial probability vector of $n^{-1} m$ to compute probabilities at some points that are coordinate adjacent to the mode vector.
These probabilities are presented to the expert and she is given the option of changing them if they do not represent her opinion adequately. The modified set of probabilities, together with the mode vector $m$, determine an initial value for the parameter vector $a$ of the Dirichlet-multinomial predictive distribution. This is also taken as the elicited parameter vector for the Dirichlet prior distribution.

The elicitation scheme of Chaloner and Duncan (1987) does not stop there. Instead, they chose to use the initially elicited vector $a$ to compute the Dirichlet-multinomial probabilities at the same points where assessments had been elicited and give them as feedback to the expert, offering her the possibility of revising them to more closely represent her opinion. Moreover, Chaloner and Duncan (1987) believed that more replications were required. Therefore, the expert was to repeat the whole process for $S$ different sample sizes $n_1, n_2, \ldots, n_S$. The resulting parameter vectors $a^1, a^2, \ldots, a^S$ were to be reconciled to give one final elicited vector of parameters. Chaloner and Duncan (1987) argued that it might be "dangerous" to use a specific automatic reconciliation method; instead, they recommended that the expert should examine the inconsistencies and "reconcile them introspectively".

However, the method requires direct assessment of the sample size $n$; this might lead to improper representation of an expert's opinion and incur more bias (Bunn, 1978). Moreover, Chaloner and Duncan (1987) did not mention how large the assessed value $n$ should be, nor did they discuss whether the expert should keep in mind the constraint $\sum_{i=1}^{k} m_i = n$ on the mode vector $m$, or whether it may be corrected by the program if necessary. Nevertheless, it seems from their reluctance to apply any reconciliation that they preferred to leave it to the expert to make sure that the constraints were satisfied.
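The Dirichlet-multinomial probabilities in (2.44), which such a program would feed back to the expert, are straightforward to evaluate on the log scale. The sketch below is not the Chaloner and Duncan (1987) program, which does not appear to be available. In particular, the helper `neighbours` encodes one possible reading of "coordinate adjacent" (move a single unit from one category to another) and should be treated as an assumption.

```python
from math import exp, lgamma


def dirichlet_multinomial_pmf(x, a):
    """Probability of the count vector x under (2.44) with parameter vector a."""
    n, big_n = sum(x), sum(a)
    log_p = lgamma(n + 1) + lgamma(big_n) - lgamma(n + big_n)
    for x_i, a_i in zip(x, a):
        log_p += lgamma(x_i + a_i) - lgamma(x_i + 1) - lgamma(a_i)
    return exp(log_p)


def neighbours(m):
    """Vectors reached from m by moving one unit between two categories.

    Assumed interpretation of 'coordinate adjacent'; the original paper
    should be consulted for the exact definition.
    """
    result = []
    for i in range(len(m)):
        for j in range(len(m)):
            if i != j and m[i] >= 1:
                v = list(m)
                v[i] -= 1
                v[j] += 1
                result.append(tuple(v))
    return result


# Hypothetical illustration: mode vector (2, 3, 5) for n = 10 and a = (2, 3, 5).
mode = (2, 3, 5)
a = (2.0, 3.0, 5.0)
feedback = {v: dirichlet_multinomial_pmf(v, a) for v in [mode] + neighbours(mode)}
```

Working with `lgamma` rather than the gamma function itself avoids overflow when $n$ is large, which matters here because the scheme asks the expert for a large sample size.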
Repeating the elicitation process for $S$ different sample sizes may constitute an extra burden on the expert, especially if she is responsible for the final reconciliation. Unfortunately, the computer program implementing their method does not seem to be available for reviewing and testing.

Instead of using means or modes, van Dorp and Mazzuchi (2000, 2003, 2004) introduced a numerical algorithm and software to specify the parameters of a beta distribution and its Dirichlet extensions using quantiles. The motivation for their work was to quantify expert opinion as beta and Dirichlet distributions for subjective Bayesian analyses. They favored assessing quantiles rather than means or modes, as betting strategies can be used by the expert to make their assessments.

They started by solving for the two parameters of a beta distribution using two quantiles, as follows. First, to ease the generalization to Dirichlet extensions, the beta distribution with two parameters $a$ and $b$ was reparameterized in terms of a location parameter $\mu = a/(a + b)$ and a shape parameter $N = a + b$. Given the values of any two quantiles, say $L$ and $U$, $L < U$, the two parameters $\mu$ and $N$ can be obtained, although solving for these two parameters involves the use of the incomplete beta function, so that no closed form solution can be obtained.

van Dorp and Mazzuchi (2000) utilized the limiting forms of a beta distribution as $N$ tends to 0 and $\infty$ to prove the existence of at least one solution for the beta parameters in terms of any two quantiles. They gave a numerical algorithm to determine the beta parameters using a bisection method as a numerical search procedure. If multiple solutions were found, the algorithm selects the solution with the lowest value of $N$, i.e. with the highest level of uncertainty. The algorithm was implemented in software called BETA-CALCULATOR that inputs any two beta quantiles to output the corresponding values of the beta parameters.
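To make the idea concrete, the following is a minimal sketch of a nested bisection search for $(\mu, N)$ from two quantiles. It is not the BETA-CALCULATOR algorithm: the function names, search brackets and iteration counts are assumptions chosen for illustration, and when several solutions exist the search returns whichever one it converges to rather than the one with the smallest $N$. The incomplete beta function is evaluated with a standard continued-fraction expansion so that only the Python standard library is needed.

```python
from math import exp, lgamma, log, sqrt


def _betacf(a, b, x, max_iter=2000, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function, i.e. the Beta(a, b) cdf at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def mean_for_quantile(x, prob, n_shape, iters=60):
    """For fixed N, bisect on mu so that the cdf at x equals prob."""
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if beta_cdf(x, mid * n_shape, (1.0 - mid) * n_shape) > prob:
            lo = mid  # too much mass below x, so the mean must move up
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_from_two_quantiles(x_low, x_up, p_low, p_up,
                            n_min=1e-2, n_max=1e4, iters=60):
    """Return (mu, N) matching P(X <= x_low) = p_low and P(X <= x_up) = p_up."""
    lo, hi = n_min, n_max
    for _ in range(iters):
        mid = sqrt(lo * hi)  # bisect on the log scale
        mu = mean_for_quantile(x_low, p_low, mid)
        if beta_cdf(x_up, mu * mid, (1.0 - mu) * mid) < p_up:
            lo = mid  # distribution too diffuse, so N must increase
        else:
            hi = mid
    n_shape = sqrt(lo * hi)
    return mean_for_quantile(x_low, p_low, n_shape), n_shape
```

For example, `beta_from_two_quantiles(0.2, 0.4, 0.25, 0.75)` searches for the beta distribution whose lower and upper quartiles are 0.2 and 0.4; the original parameters are then recovered as $a = \mu N$ and $b = (1 - \mu) N$. The outer search relies on the same limiting behaviour used in the existence argument: with the lower quantile held fixed, the cdf at the upper point is close to the lower probability for very small $N$ and close to one for very large $N$.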
To extend the numerical algorithm to Dirichlet parameters, van Dorp and Mazzuchi (2003, 2004) used quantiles that were assessed through direct specification of marginal beta distributions. A Dirichlet distribution as given in (2.33) was also reparameterized in terms of its mean values $\mu_i = a_i / N$, as location parameters, and $N = \sum_{i=1}^{k} a_i$ as a shape parameter. The extended algorithm was designed to use two quantiles for one of the Dirichlet variates, say $L_i$ and $U_i$, $L_i < U_i$, for the $i$th variate, and just one quantile for each of the remaining variates, say $Q_j$, $j \ne i$. Hence, the number $k$ of quantile equations that they had is exactly equal to the number of required parameters.

Following similar lines to their arguments for the beta distribution, van Dorp and Mazzuchi (2003, 2004) showed theoretically that at least one solution of the resulting system of equations always exists. The two quantiles $L_i$ and $U_i$ were first used to elicit the marginal distribution of the $i$th Dirichlet variate as $X_i \sim \text{beta}(\mu_i, N)$. The value of $N$ is then used with the quantiles $Q_j$ to elicit the remaining beta marginal distributions as $X_j \sim \text{beta}(\mu_j, N)$, $j \ne i$. If more than one solution exists, they decided to choose the solution with the smallest $N$, which is again the solution with maximum Dirichlet variance, hence giving the highest level of uncertainty. In addition to the Dirichlet distribution, they also gave another numerical algorithm for the ordered Dirichlet distribution, which differs from the Dirichlet in the domain of its variates; see Wilks (1962).

A criticism of the algorithm regards the selection of the Dirichlet variate for which two quantiles are assessed. No comment regarding the selection of this special variate was given in the published paper. The importance of its choice is that it determines the value of $N$ for all other variates and hence determines the variances of the Dirichlet distribution.
If there is substantial bias in assessing these two quantiles, all elicited parameters will be highly affected as a result.

In addition, to get a better representation of an expert's opinion in the elicitation context, it is better to use over-fitting (Kadane and Wolfson, 1998). We believe that it is preferable to assess more quantiles than the minimum necessary and then apply a reconciliation technique to estimate parameters. The expert may then be given feedback and questioned as to whether the feedback corresponds to her opinion, with re-assessment made when necessary.

A possible general multivariate distribution that can serve as a prior distribution for multinomial models is constructed using a multivariate copula function. A copula is defined as a function that represents a multivariate cumulative distribution in terms of one-dimensional marginal cumulative distribution functions. Hence, it joins marginal distributions into a multivariate distribution that has those marginals. The importance of the copula function is due to Sklar's Theorem, which states that any joint distribution can be written in a copula form. The marginal distributions can thus be chosen independently from the dependence structure that is represented by the copula function. For an introduction to copulas, see for example Joe (1997), Frees and Valdez (1998) and Nelsen (1999).

The use of copula functions to elicit multivariate distributions has been considered in the literature; see Jouini and Clemen (1996), Clemen and Reilly (1999) and Kurowicka and Cooke (2006), among others. The joint distribution can be elicited by first assessing each marginal distribution. Then the dependence structure is elicited through the copula function. Different families and classes of copula functions have been defined for both bivariate and multivariate distributions.
Jouini and Clemen (1996) used bivariate and multivariate Archimedean and Frank's families of copulae to aggregate multiple experts' opinions about a random quantity. However, the simplest and most intuitive family of copulae is the inversion copula (Nelsen, 1999), of the form

$$C[G_1(x_1), \ldots, G_k(x_k)] = F_{(1,\ldots,k)}\left\{F_1^{-1}[G_1(x_1)], \ldots, F_k^{-1}[G_k(x_k)]\right\}, \tag{2.45}$$

where $G_i$ are the known marginal distribution functions, $F_{(1,\ldots,k)}$ is the assumed multivariate distribution function and its marginals are $F_i$. Hence, the marginal functions $G_i$ are coupled through $F_{(1,\ldots,k)}$ into a new multivariate distribution given by the copula function $C$.

The distribution $F_{(1,\ldots,k)}$ is usually selected as a multivariate normal distribution, which gives a Gaussian copula (Clemen and Reilly, 1999). It has also been taken as a multivariate $t$ distribution (Demarta and McNeil, 2005), or even as a Dirichlet distribution (Lewandowski, 2008). The Gaussian copula function is given by

$$C[G_1(x_1), \ldots, G_k(x_k)] = \Phi_{k,R}\left\{\Phi^{-1}[G_1(x_1)], \ldots, \Phi^{-1}[G_k(x_k)]\right\}, \tag{2.46}$$

where $\Phi_{k,R}$ is the cdf of a $k$-variate normal distribution with zero means, unit variances, and a correlation matrix $R$ that reflects the desired dependence structure, and $\Phi$ is the standard univariate normal cdf.

For eliciting a multivariate distribution, the Gaussian copula is the most appealing (see Clemen and Reilly, 1999), as it is parameterized by the correlation matrix $R$ of the multivariate normal distribution; hence it only requires pairwise correlations among the variables. To elicit the Gaussian copula, any assessed positive-definite correlation matrix $R$ can be used together with the elicited marginal distributions $G_1(x_1), \ldots, G_k(x_k)$. As with any other inversion copula, any univariate distributions are allowed as marginal distributions $G_i$ in the Gaussian copula. To elicit $R$, Clemen and Reilly (1999) suggested that a pairwise rank-order correlation between each $X_i$ and $X_j$, such as Spearman's $\rho_{ij}$ or Kendall's $\tau_{ij}$, should be assessed.
Then properties of the multivariate normal distribution are used to transform them into the product-moment Pearson correlation $r_{ij}$ as follows:

$$r_{ij} = 2 \sin(\pi \rho_{ij} / 6), \quad \text{or} \quad r_{ij} = \sin(\pi \tau_{ij} / 2). \tag{2.47}$$

Then the product-moment correlation matrix $R$ is formed from the elements $r_{ij}$. Clemen and Reilly (1999) suggested that only rank-order correlations should be elicited, not product-moment Pearson correlations, as the latter cannot necessarily be transformed through the function $\Phi^{-1}[G_i(\cdot)]$, while rank-order correlations transform regardless of the choice of the marginal distribution function $G_i(\cdot)$. To elicit these correlations, Clemen and Reilly (1999) mentioned three methods that can be used either separately or together. The first method involved the direct assessment of the correlation coefficient. Although people are not good at such direct assessment (Kadane and Wolfson, 1998), experimental evidence in Clemen et al. (2000) suggested that it can be a reasonable approach. The other two methods were based on assessed conditional probabilities or conditional quantiles that can be used to compute Kendall's $\tau$ or Spearman's $\rho$ correlation coefficients, respectively.

The method proposed by Clemen and Reilly (1999) for eliciting a correlation matrix is not guaranteed to yield a positive-definite matrix. They cited two other studies in which dependence measures were assessed in a hierarchical way using dependence trees that require fewer assessments. These studies use entropy maximization to guarantee the positive-definiteness of the resulting correlation matrix. However, Clemen and Reilly (1999) criticized this approach for the relatively constrained nature of its dependence structure modelling. Instead, they suggested that the expert should be asked to revise her assessments if the resulting correlation matrix is not positive-definite.
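The transformation in (2.47) and the positive-definiteness check that triggers a request for revision can be sketched as follows. The function names are illustrative, the assessed rank correlations are hypothetical, and the Cholesky test is simply one standard way of checking positive-definiteness; none of this is taken from Clemen and Reilly (1999).

```python
from math import pi, sin, sqrt


def pearson_from_spearman(rho):
    """First form of (2.47): r = 2 sin(pi * rho / 6)."""
    return 2.0 * sin(pi * rho / 6.0)


def pearson_from_kendall(tau):
    """Second form of (2.47): r = sin(pi * tau / 2)."""
    return sin(pi * tau / 2.0)


def correlation_matrix(rank_corr, convert):
    """Apply a conversion to every off-diagonal entry; the diagonal is 1."""
    k = len(rank_corr)
    return [[1.0 if i == j else convert(rank_corr[i][j]) for j in range(k)]
            for i in range(k)]


def is_positive_definite(matrix):
    """Attempt a Cholesky factorization; fail on a non-positive pivot."""
    k = len(matrix)
    chol = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(chol[i][t] * chol[j][t] for t in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    return False
                chol[i][j] = sqrt(pivot)
            else:
                chol[i][j] = (matrix[i][j] - s) / chol[j][j]
    return True


# Hypothetical Spearman assessments for three variables.
coherent = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
incoherent = [[1.0, 0.9, 0.9], [0.9, 1.0, -0.9], [0.9, -0.9, 1.0]]
r_ok = correlation_matrix(coherent, pearson_from_spearman)
r_bad = correlation_matrix(incoherent, pearson_from_spearman)
needs_revision = not is_positive_definite(r_bad)
```

In the second set of assessments each pairwise value is individually admissible, yet the three together are incoherent, which illustrates why pairwise elicitation gives no guarantee about the matrix as a whole.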
For large problems with many variables, this revision method would generally be very tedious and confusing.

In Chapter 7, we propose a method for eliciting a Gaussian copula function as a prior distribution for multinomial models. Our approach overcomes two problems of the method of Clemen and Reilly (1999) simultaneously. First, we transform the assessed conditional quartiles of $X_i$ and $X_j$ through $\Phi^{-1}[G_i(\cdot)]$, so that product-moment correlations can be computed on the normal scale with no need for the rank-order correlations. Second, the conditional quartiles are assessed according to the structural elicitation procedure of Kadane et al. (1980), which guarantees that the elicited correlation matrix is positive-definite.

Copula functions have been used extensively in the literature for building multivariate distributions based on known marginals. This includes, of course, building joint prior distributions for Bayesian analysis using copulae. For example, Yi and Bier (1998) utilized some copula families to construct a joint prior distribution that reflects inter-system dependencies between accident precursors in a Bayesian study to estimate accident frequencies. A Gaussian copula has not been widely used in the literature as a prior distribution for multinomial models. However, the need for a flexible joint prior distribution that effectively combines the marginal beta prior distributions of multinomial probabilities makes the Gaussian copula an attractive choice, as it gives a more general dependence structure than the usual Dirichlet distribution.

An applied Bayesian study by Palomo et al. (2007) used a Gaussian copula to model external risk in project management. In one of their adopted scenarios, they assumed that any of $k$ potential disruptive events might occur, one at a time, according to a multinomial distribution.
The multinomial probabilities were assigned beta marginals, and a Gaussian copula function was used as a multivariate distribution to parameterize the dependence structure between these probabilities.

2.6 Other general graphical elicitation software

This section reviews other interactive graphical elicitation software that has been reported in the literature. The software projects reviewed below cover general elicitation problems apart from those for GLMs and multinomial models, which have already been reviewed in Sections 2.4 and 2.5.

Chaloner et al. (1993) aimed to quantify experts' opinion in the form of a prior distribution about regression coefficients in a proportional hazards regression model. In a clinical trial, prior distributions from five AIDS experts were elicited. To compare two treatments with a placebo, the joint and marginal distributions of the survival probability under each treatment were elicited from the experts. This could be done by assessing some probabilities and quantiles to elicit a joint extreme value prior distribution for the proportional hazards model parameters.

For this purpose, they developed an interactive computer program that uses interactive graphs to elicit experts' opinion and give them feedback. The curves of the two marginal distributions and the contour representing the joint distribution were presented to the experts. This feedback was given in the form of dynamic graphical displays of probability distributions that can be adjusted freehand.

Some of the main "lessons" learned about this elicitation process, as stated by Chaloner et al. (1993), can be summarized as follows. They stressed the importance of the dynamic graphical displays in helping experts to visualize probability distributions and in giving useful instant feedback. They also noted that it is necessary to have a clear, well-defined outline and explanation of the questions that will be addressed to the expert.
In cases where an expert had to assess her best guess of a specific probability, they wanted her also to report her uncertainty about it. In assessing approximate bounds, experts found extreme percentiles easier to think about than quartiles. However, there is substantial empirical evidence that people are poor at assessing extreme quantiles (e.g. Winkler (1967); Hora et al. (1992)) and we believe that quartiles provide a more faithful representation of an expert's opinion, especially if they are assessed using the bisection method.

A comparatively simple elicitation computer program was developed by Kadane et al. (2006) for the generalized Poisson distribution. In their paper, they explored the properties of the Conway-Maxwell-Poisson (COM-Poisson) distribution, in particular, the conjugate family of prior distributions associated with it. A computer application has been created to elicit the hyperparameters of the conjugate prior distribution of the COM-Poisson parameters.

The COM-Poisson distribution is a two-parameter generalization of the Poisson distribution that allows for over- and under-dispersion. It has the following probability function

$$\Pr\{X = x \mid \lambda, \nu\} = \frac{\lambda^x}{(x!)^{\nu}} \cdot \frac{1}{Z(\lambda, \nu)}, \quad x = 0, 1, 2, \ldots,$$

where

$$Z(\lambda, \nu) = \sum_{j=0}^{\infty} \frac{\lambda^j}{(j!)^{\nu}}.$$

The distribution indicates over-dispersion (under-dispersion) if $\nu$ is less (greater) than 1. It is the usual Poisson distribution if $\nu = 1$. Since the COM-Poisson distribution is a member of the exponential family, it has a conjugate prior of the form

$$h(\lambda, \nu) = \lambda^{a-1} e^{-\nu b} Z(\lambda, \nu)^{-c} \kappa(a, b, c), \quad \lambda > 0, \quad \nu \ge 0,$$

where $\kappa(a, b, c)$ is the integration constant.

The computer program, available at http://www.stat.cmu.edu/COM-Poisson/, is designed to elicit the values of the hyperparameters $a$, $b$ and $c$ from the field expert. It computes and plots the histogram of the predictive distribution at allowable selected values of $a$, $b$ and $c$. Specifically,

$$\Pr\{X = x \mid a, b, c\} = \kappa(a, b, c) \int_0^{\infty} \int_0^{\infty} \lambda^{a+x-1} e^{-\nu(b + \log(x!))} Z(\lambda, \nu)^{-(c+1)} \, d\lambda \, d\nu.$$
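The COM-Poisson probability function and its dispersion behaviour can be checked numerically with a short sketch. This is not the program of Kadane et al. (2006): the truncation rule for the infinite sum $Z(\lambda, \nu)$, the function names and the parameter values are assumptions made for illustration, and the code is intended only for moderate $\lambda$ with $\nu > 0$.

```python
from math import exp, lgamma, log


def com_poisson_z(lam, nu, tol=1e-12, max_terms=100000):
    """Truncated normalizing constant Z(lam, nu) = sum_j lam^j / (j!)^nu.

    Terms are added until they are decreasing and negligible relative to
    the running total. Assumes lam > 0 and nu > 0.
    """
    total = 0.0
    for j in range(max_terms):
        term = exp(j * log(lam) - nu * lgamma(j + 1))
        total += term
        if lam < (j + 1) ** nu and term < tol * total:
            break
    return total


def com_poisson_pmf(x, lam, nu):
    """Pr{X = x | lam, nu} for a non-negative integer x."""
    return exp(x * log(lam) - nu * lgamma(x + 1)) / com_poisson_z(lam, nu)


def com_poisson_moments(lam, nu, upper=200):
    """Mean and variance from the pmf summed over x = 0, ..., upper - 1."""
    z = com_poisson_z(lam, nu)
    probs = [exp(x * log(lam) - nu * lgamma(x + 1)) / z for x in range(upper)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var


mean_over, var_over = com_poisson_moments(2.0, 0.5)    # nu < 1
mean_pois, var_pois = com_poisson_moments(2.0, 1.0)    # nu = 1, ordinary Poisson
mean_under, var_under = com_poisson_moments(2.0, 2.0)  # nu > 1
```

With these values the variance exceeds the mean for $\nu = 0.5$, equals it for $\nu = 1$ and falls below it for $\nu = 2$, in line with the over- and under-dispersion described above.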
Kadane et al. (2006) pointed out that it may be difficult for the expert to give meaningful values for the hyperparameters $a$, $b$ and $c$, since the distribution is likely to be new to her. They assumed that the expert may have some knowledge about $\Pr\{X = x\}$. Thus, the program plots the predictive distribution as feedback to the expert. She can type in or modify the values of $a$, $b$ and $c$ using sliders and see the direct impact on the predictive histogram.

However, it does not seem that the expert will be able to adjust three values simultaneously to assess a histogram that represents her prior belief. Also, some combinations are not allowed because of mathematical incoherence, and some others need large numbers of iterations to produce the histogram. A lot of adjustment may be needed before the expert is happy with the histogram, since no specific combination of the hyperparameter values is known in advance for any intended appearance of the histogram.

2.7 Concluding comments

In this chapter, we have reviewed some of the relevant research work on eliciting prior distributions for the Bayesian analysis of GLMs and multinomial models. We have also discussed and reviewed the main psychological aspects that are usually involved in making the assessments to elicit these prior distributions. In addition, we commented on some of the recent interactive graphical software that has been reported in the literature for implementing and facilitating the elicitation processes in some other statistical problems. However, this review has been restricted to work that is directly relevant to the elicitation methods proposed in this thesis. There is a huge body of research that handles elicitation problems and techniques in general. As noted earlier, psychological concerns and recommendations for efficient elicitation techniques will be taken into consideration while developing the elicitation methods proposed in this thesis.
Available elicitation techniques and computer software will feed into the methods developed in the next chapters and will help in building the software to implement these proposed methods.

Chapter 3

The piecewise-linear model for prior elicitation in GLMs

3.1 Introduction

Generalized linear models (GLMs) constitute a natural generalization of classical linear models, where the linear predictor part is linked to the mean of the dependent variable through some link function. The distribution of the dependent variable is not necessarily assumed to be normal. The model is determined by a combination of the link function and the family of distributions to which the dependent variable belongs (see McCullagh and Nelder (1989) for an introduction to GLMs). Being very common in both frequentist and Bayesian data analysis, GLMs have attracted much research.

An important task in the Bayesian analysis of GLMs is to specify an informative prior distribution for model parameters. Suitable elicitation methods play a key role in this task of representing expert knowledge as a prior distribution (see, for example, Bedrick et al. (1996) and O'Leary et al. (2009)).

A method of quantifying opinion about a logistic regression model was developed by Garthwaite and Al-Awadhi (2006). They mentioned that the method is very flexible and can be generalized to GLMs with any link function, not just the logistic link. This generalization has been introduced by the same authors in an unpublished paper, Garthwaite and Al-Awadhi (2011). Their method has been used to quantify the opinions of ecologists (Al-Awadhi and Garthwaite (2006)) and medical doctors (Jenkinson (2007); Garthwaite et al. (2008)).

However, the method makes simplifying assumptions regarding independence between the regression coefficients. One purpose of the current thesis is to extend the elicitation method so that these assumptions are unnecessary.
Different methods for this extension are proposed in Chapter 4. This will significantly increase the range of situations where the method is useful.

The original method for logistic regression was developed and implemented in user-friendly interactive software. The software was rewritten in Java by Jenkinson (2007), who also extended it to elicit expert opinion about some other GLMs. It has been modified and extended further by the author of the current thesis. The software is interactive, requiring the expert to either type in assessments or plot points on graphs and bar charts using interactive graphics.

An executable stand-alone version of the current software is available as a Java executable (jar) file and as a Windows executable file (with .exe extension). The stand-alone versions, together with the user manual and the source code, are freely available as Prior Elicitation Graphical Software for Generalized Linear Models (PEGS-GLM) at http://statistics.open.ac.uk/elicitation. The software is intended to run on any machine, regardless of its operating system and without the need for any other software packages.

The current modified version of the software is more flexible in determining the options available to the user, especially for data input and results output. Some important modifications involve broadening the scope of available models and the range of link functions, and giving the user many suggestions, help notes and video clips, questions, warning messages and directions aimed at making the software more interactive and easier to use for non-statistical experts. Useful feedback has also been added.

In this chapter, the piecewise-linear model of Garthwaite and Al-Awadhi (2006) is reviewed, and we describe the elicitation method they propose together with the above modifications. The assessment tasks that the expert performs quantify her opinion about the regression coefficients as a multivariate normal prior distribution.
The largest extension to the current version of the software is a new section for assessing expert knowledge about correlated covariates. This will be introduced in Chapter 4. Important options have been added to the method that quantify opinion about the extra parameter in GLMs that involve gamma and normal distributions. The theoretical derivation and implementation of these options are proposed in Chapter 5.

3.2 The elicitation method for piecewise-linear models (GA method)

For quantifying an expert's opinion about GLMs, Garthwaite and Al-Awadhi (2011) proposed a method to elicit expert opinion about the prior distribution of the regression coefficients and its hyperparameters. As mentioned before, the method, which will be referred to here as GA, is a generalization of the same authors' piecewise-linear model that they used for quantifying opinion for logistic regression (Garthwaite and Al-Awadhi (2006)).

In their work, the relationship between each continuous predictor variable and the link function (assuming all other variables are held fixed) was modeled as a piecewise-linear function. Figure 3.1 illustrates a piecewise-linear relationship between the quantity of interest $Y$ and a continuous covariate "Weight"; the relationship corresponds to a sequence of straight lines that form a continuous line. The endpoints of the straight lines are referred to as knots.

[Figure 3.1: A piecewise-linear relationship given by median assessments. Software screen titled "Eliciting Medians of Y for values of Weight"; horizontal axis: Weight.]

If the predictor variable is a categorical covariate, it is referred to as a factor. Its relationship with $Y$ corresponds to a bar chart as in Figure 3.2, where the factor takes four levels: Very large, Large, Normal and Small.
The aim of the elicitation process is to quantify opinion about the slopes of the straight lines (for continuous variables) and the heights of the bars (for factors). In the GA method, a multivariate normal distribution was used to represent prior knowledge about the regression coefficients. These coefficients were allowed to be dependent if associated with a single variable. A detailed discussion of their model is given next.

[Figure 3.2: A bar chart relationship for a factor given by median assessments. Software screen titled "Eliciting Medians of Y for values of X1"; vertical axis: Y; factor levels: Very large, Large, Normal, Small.]

3.2.1 The piecewise-linear model

Consider a response variable $\zeta$, with $m$ continuous covariates $R_1, R_2, \dots, R_m$ and $n$ categorical variables (factors) $R_{m+1}, R_{m+2}, \dots, R_{m+n}$. Each variable $R_i$ has $\delta(i)+1$ knots, $r_{i,0}, r_{i,1}, \dots, r_{i,\delta(i)}$, where $r_{i,j-1} < r_{i,j}$ for $j = 1, 2, \dots, \delta(i)$ and $i = 1, 2, \dots, m+n$. These knots represent the dividing points of the piecewise-linear relation for the continuous variables, or levels for factors, with $r_{i,0}$ taken as the reference point of each continuous covariate $R_i$, $i = 1, \dots, m$, or the reference level of each factor $R_i$, $i = m+1, \dots, m+n$.

Let $\underline{r}_0$ be the overall reference point, where all variables are at their reference values, i.e.

$$\underline{r}_0 = (r_{1,0},\, r_{2,0},\, \dots,\, r_{m+n,0}). \tag{3.1}$$

For the response variable $\zeta$, the expert is asked about its mean values given points on the space of the explanatory variables, i.e. about

$$\mu(\underline{r}) = E(\zeta \mid \underline{R} = \underline{r}), \tag{3.2}$$

where $\underline{R} = (R_1, \dots, R_{m+n})'$ and $\underline{r}$ is any specific value of $\underline{R}$. Let

$$Y = g[\mu(\underline{r})] = \alpha + \underline{\beta}_1'\underline{X}_1 + \underline{\beta}_2'\underline{X}_2 + \cdots + \underline{\beta}_{m+n}'\underline{X}_{m+n}, \tag{3.3}$$

where $g(\cdot)$ is any monotonic increasing link function. If $g(\cdot)$
is monotonic decreasing, we multiply it by $-1$ and then change the sign of the resulting regression coefficients. We put

$$\underline{X}_i = (X_{i,1}, X_{i,2}, \dots, X_{i,\delta(i)})', \qquad i = 1, 2, \dots, m+n, \tag{3.4}$$

$$\underline{\beta}_i = (\beta_{i,1}, \beta_{i,2}, \dots, \beta_{i,\delta(i)})', \qquad i = 1, 2, \dots, m+n. \tag{3.5}$$

The relation between $R_i$ and $\underline{X}_i$, for continuous covariates, is that

$$X_{i,j} = \begin{cases} 0 & \text{if } R_i \le r_{i,j-1}, \\ R_i - r_{i,j-1} & \text{if } r_{i,j-1} < R_i \le r_{i,j}, \\ d_{i,j} & \text{if } r_{i,j} < R_i, \end{cases} \tag{3.6}$$

for $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, \delta(i)$, where

$$d_{i,j} = r_{i,j} - r_{i,j-1}. \tag{3.7}$$

For factors, $X_{i,j}$ is defined by

$$X_{i,j} = \begin{cases} 1 & \text{if } R_i = r_{i,j}, \\ 0 & \text{otherwise}, \end{cases} \tag{3.8}$$

for $i = m+1, m+2, \dots, m+n$ and $j = 1, 2, \dots, \delta(i)$. Note that, if $R_i = r_{i,0}$, then $\underline{X}_i$ is a zero vector ($i = 1, \dots, m+n$).

The method concentrates on an expert's opinion about each covariate $R_i$ separately, one at a time, assuming that all other covariates are kept at their reference values. Hence, for any specific value $r$, $Y_i(r)$ is defined as

$$Y_i(r) = g[\mu_i(r)], \tag{3.9}$$

where

$$\mu_i(r) = \mu(r_{1,0}, \dots, r_{i-1,0},\, r,\, r_{i+1,0}, \dots, r_{m+n,0}) \tag{3.10}$$

denotes the mean value of $\zeta$ when $R_i$ has a value of $r$, and $R_j = r_{j,0}$, $j \ne i$. Then

$$Y_i(r) = \alpha + \underline{\beta}_i'\underline{X}_i, \qquad i = 1, 2, \dots, m+n. \tag{3.11}$$

Now, for $i = 1, \dots, m+n$, $j = 1, \dots, \delta(i)$, let

$$Y_{i,j} = Y_i(r_{i,j}). \tag{3.12}$$

For $\underline{\beta}_i$ as in (3.5), if $R_i$ is a factor and $r = r_{i,j}$, then, in view of (3.8),

$$Y_{i,j} = \alpha + \beta_{i,j}, \tag{3.13}$$

hence, for factors, where $i = m+1, \dots, m+n$, $j = 1, \dots, \delta(i)$, we have

$$\beta_{i,j} = Y_{i,j} - Y_{i,0}. \tag{3.14}$$

For continuous covariates, from (3.6) and (3.7), for $i = 1, \dots, m$, $j = 1, \dots, \delta(i)$,

$$\beta_{i,j} = \frac{Y_{i,j} - Y_{i,j-1}}{d_{i,j}}. \tag{3.15}$$

The values of $\beta_{i,j}$ are the slopes of the piecewise-linear relation in Figure 3.1.

The prior distribution of $\alpha$ and $\underline{\beta} = (\underline{\beta}_1', \underline{\beta}_2', \dots, \underline{\beta}_{m+n}')'$ is assumed to be the following multivariate normal distribution,

$$\begin{pmatrix} \alpha \\ \underline{\beta} \end{pmatrix} \sim \text{MVN}\left( \begin{pmatrix} b_0 \\ \underline{b} \end{pmatrix},\; \begin{pmatrix} \sigma_{0,0} & \underline{\sigma}_1' \\ \underline{\sigma}_1 & \Sigma \end{pmatrix} \right). \tag{3.16}$$

The elicitation of the hyperparameters $b_0$, $\underline{b}$, $\sigma_{0,0}$, $\underline{\sigma}_1$ and $\Sigma$ is reviewed in the next section.
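The coding in (3.6) to (3.8) and the slope relation (3.15) can be illustrated with a short Python sketch. The function names are hypothetical and are not taken from PEGS-GLM; the sketch also assumes that the reference knot $r_{i,0}$ is the smallest knot and that the reference level is listed first.

```python
from typing import List, Sequence


def continuous_design(r: float, knots: Sequence[float]) -> List[float]:
    """X_{i,1..delta} of (3.6) for a value r; knots = [r_0, r_1, ..., r_delta], increasing."""
    x = []
    for lo, hi in zip(knots[:-1], knots[1:]):
        if r <= lo:
            x.append(0.0)          # r has not reached this segment
        elif r <= hi:
            x.append(r - lo)       # r lies inside this segment
        else:
            x.append(hi - lo)      # segment fully traversed: d_{i,j} of (3.7)
    return x


def factor_design(level: str, levels: Sequence[str]) -> List[float]:
    """Indicator vector of (3.8); levels[0] is assumed to be the reference level."""
    return [1.0 if level == lv else 0.0 for lv in levels[1:]]


def slopes_from_knot_values(y: Sequence[float], knots: Sequence[float]) -> List[float]:
    """beta_{i,j} of (3.15) from the values Y_{i,0}, ..., Y_{i,delta} at the knots."""
    return [(y[j] - y[j - 1]) / (knots[j] - knots[j - 1]) for j in range(1, len(knots))]


if __name__ == "__main__":
    knots = [50.0, 60.0, 80.0]
    beta = slopes_from_knot_values([1.0, 2.0, 1.0], knots)
    x = continuous_design(70.0, knots)
    # alpha + beta'X reproduces linear interpolation between the knots, as in (3.11)
    print(1.0 + sum(b * xj for b, xj in zip(beta, x)))
```

With this coding, $\alpha + \underline{\beta}_i'\underline{X}_i$ interpolates linearly between the values at adjacent knots, which is why each $\beta_{i,j}$ can be read as the slope of one segment.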
The matrix $\Sigma$ is assumed to have a block-diagonal structure, as the vectors $\underline{\beta}_1, \underline{\beta}_2, \dots, \underline{\beta}_{m+n}$ are assumed to be independent a priori. We propose three elicitation methods that relax this assumption in the next chapter.

3.2.2 Eliciting the hyperparameters of the multivariate normal prior

The assessments that are required for eliciting all the prior hyperparameters are only medians and quartiles of $\mu_i(r)$. The monotone increasing function $g(\cdot)$ in (3.9) is then used to transform these assessments into medians and quartiles of $Y_i(r)$.

Two main properties of the assumed normal distribution of $Y$ are used extensively to elicit the hyperparameters from medians and quartiles, namely equating means to medians and getting variances from interquartile ranges. It is well known that, for normally distributed $Y$,

$$\operatorname{Var}(Y) = \left[\frac{Q_3 - Q_1}{1.349}\right]^2, \tag{3.17}$$

where $Q_1$ and $Q_3$ are the lower and upper quartiles of $Y$, respectively, as 1.349 is the interquartile range of a standard normal distribution. Using the above approach, the elicitation of each hyperparameter is detailed below.

Eliciting $b_0$ and $\sigma_{0,0}$

Let $m_{0,0.5}$, $m_{0,0.25}$ and $m_{0,0.75}$ be the median, lower and upper quartiles, respectively, of $\mu(\underline{r}_0)$. Recall that $\underline{r}_0$ is defined in (3.1) as the reference point of all variables, in which case $Y$ is equal to the constant term $\alpha$. The expert assesses $m_{0,0.5}$, $m_{0,0.25}$ and $m_{0,0.75}$, which are then transformed into the corresponding quartiles of $Y$, using the monotone increasing link function $g(\cdot)$ in (3.3), as

$$y_{0,q} = g(m_{0,q}), \qquad \text{for } q = 0.25, 0.5, 0.75. \tag{3.18}$$

So, $b_0$ and $\sigma_{0,0}$ are elicited, in view of (3.17), as

$$b_0 = y_{0,0.5}, \tag{3.19}$$

$$\sigma_{0,0} = \left[\frac{y_{0,0.75} - y_{0,0.25}}{1.349}\right]^2. \tag{3.20}$$

Eliciting $\underline{b}$

The expert is asked to assume that her previously assessed value $m_{0,0.5}$ is the true value of the mean of $\zeta$ at the reference point $r_{i,0}$, i.e. to assume that $\mu_i(r_{i,0}) = m_{0,0.5}$, for each covariate $i$ in turn, $i = 1, 2, \dots, m+n$.
Given this information, she then assesses the conditional median of $\mu_i(r)$ at all other knots of $R_i$. These conditional medians are denoted by $m_{i,j,0.5}$, for $j = 1, 2, \dots, \delta(i)$. Hence

$$m_{i,j,0.5} = \text{the median of } [\mu_i(r_{i,j}) \mid \mu_i(r_{i,0}) = m_{0,0.5}]. \tag{3.21}$$

The use of the software to assess these conditional medians is reviewed in detail in Section 3.3.3. From (3.16),

$$\underline{b} = E(\underline{\beta}) = E(\underline{\beta} \mid \alpha = b_0), \tag{3.22}$$

but, from (3.1), (3.10), (3.18) and (3.19), we have

$$\underline{b} = E[\underline{\beta} \mid \mu_i(r_{i,0}) = m_{0,0.5}]. \tag{3.23}$$

From the conformable partitioning in (3.16), each element of $\underline{b}$ in (3.23) is of the form

$$b_{i,j} = E[\beta_{i,j} \mid \mu_i(r_{i,0}) = m_{0,0.5}]. \tag{3.24}$$

Applying $g(\cdot)$ on both sides of (3.21), in view of (3.9) and (3.12), we get

$$E[Y_{i,j} \mid \mu_i(r_{i,0}) = m_{0,0.5}] = y_{i,j,0.5}, \tag{3.25}$$

where

$$y_{i,j,0.5} = g(m_{i,j,0.5}). \tag{3.26}$$

Now, from (3.24) and (3.25), $b_{i,j}$ can be elicited for factors, in view of (3.14), as

$$b_{i,j} = y_{i,j,0.5} - y_{0,0.5}, \tag{3.27}$$

for $i = m+1, \dots, m+n$, $j = 1, \dots, \delta(i)$, and for continuous covariates, in view of (3.15), as

$$b_{i,j} = \frac{y_{i,j,0.5} - y_{i,j-1,0.5}}{d_{i,j}}, \tag{3.28}$$

for $i = 1, \dots, m$, $j = 1, \dots, \delta(i)$, where $y_{i,0,0.5} = y_{0,0.5}$.
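As a worked illustration of (3.18) to (3.20) and of (3.26) to (3.28), the following Python sketch turns assessed medians and quartiles into $b_0$, $\sigma_{0,0}$ and the elements of $\underline{b}$. It is only a sketch under stated assumptions: the logit link is used as an example, and all function names are hypothetical rather than part of PEGS-GLM.

```python
import math
from typing import Callable, List, Sequence, Tuple

IQR_STD_NORMAL = 1.349  # interquartile range of a standard normal, as in (3.17)


def logit(p: float) -> float:
    """Example of a monotone increasing link g(.)."""
    return math.log(p / (1.0 - p))


def elicit_b0_sigma00(m25: float, m50: float, m75: float,
                      g: Callable[[float], float] = logit) -> Tuple[float, float]:
    """b_0 of (3.19) and sigma_{0,0} of (3.20) from quartiles assessed at the origin."""
    y25, y50, y75 = g(m25), g(m50), g(m75)   # (3.18)
    return y50, ((y75 - y25) / IQR_STD_NORMAL) ** 2


def elicit_b_factor(medians: Sequence[float],
                    g: Callable[[float], float] = logit) -> List[float]:
    """(3.27); medians[0] is m_{0,0.5}, medians[j] is m_{i,j,0.5}."""
    y = [g(m) for m in medians]
    return [yj - y[0] for yj in y[1:]]


def elicit_b_continuous(medians: Sequence[float], knots: Sequence[float],
                        g: Callable[[float], float] = logit) -> List[float]:
    """(3.28); medians and knots are ordered from the reference knot upwards."""
    y = [g(m) for m in medians]
    return [(y[j] - y[j - 1]) / (knots[j] - knots[j - 1]) for j in range(1, len(knots))]


if __name__ == "__main__":
    print(elicit_b0_sigma00(0.4, 0.5, 0.6))
    print(elicit_b_continuous([0.5, 0.6, 0.7], [50.0, 60.0, 80.0]))
```

Because the assessments are transformed by $g(\cdot)$ before any arithmetic, the same functions apply to other links by passing a different `g`.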
Eliciting $\underline{\sigma}_1$

For any value $\alpha^*$ satisfying $\alpha^* \ne b_0$, it can be seen, from (3.16) and the theory of the multivariate normal distribution, that

$$E(\underline{\beta} \mid \alpha = \alpha^*) = \underline{b} + \underline{\sigma}_1 \sigma_{0,0}^{-1} (\alpha^* - b_0), \tag{3.29}$$

from which

$$\underline{\sigma}_1 = \frac{[E(\underline{\beta} \mid \alpha = \alpha^*) - \underline{b}]\, \sigma_{0,0}}{\alpha^* - b_0}. \tag{3.30}$$

So, $\underline{\sigma}_1$ can be elicited using assessments of $\underline{b}|\alpha^* = E(\underline{\beta} \mid \alpha = \alpha^*)$, or equivalently, the expert is asked to assess

$$m_{i,j,0.5}|\alpha^* = \text{the median of } [\mu_i(r_{i,j}) \mid \mu_i(r_{i,0}) = g^{-1}(\alpha^*)]. \tag{3.31}$$

Following the same approach as in (3.27) and (3.28), equation (3.31) implies, for factors, that

$$b_{i,j}|\alpha^* = y_{i,j,0.5}|\alpha^* - \alpha^*, \tag{3.32}$$

for $i = m+1, \dots, m+n$, $j = 1, \dots, \delta(i)$, while for continuous covariates it implies that

$$b_{i,j}|\alpha^* = \frac{y_{i,j,0.5}|\alpha^* - y_{i,j-1,0.5}|\alpha^*}{d_{i,j}}, \tag{3.33}$$

for $i = 1, \dots, m$, $j = 1, \dots, \delta(i)$, where

$$y_{i,j,0.5}|\alpha^* = g(m_{i,j,0.5}|\alpha^*). \tag{3.34}$$

Using the interactive software, $\alpha^*$ is taken as $y_{0,0.75}$, and the task of assessing $m_{i,j,0.5}|y_{0,0.75}$ is detailed in Section 3.3.5.

Eliciting $\Sigma$

For eliciting the variance-covariance matrix $\Sigma$ of the multivariate normal prior distribution of $\underline{\beta}$, the method of GA adopts a structured approach that recursively elicits conditional lower and upper quartiles given incremented sets of the previously assessed median values. The aim of using this structured elicitation is to ensure that the assessments yield a matrix $\Sigma$ that is positive-definite, as required for mathematical coherence.

The idea is that assessed conditional quartiles are transformed, under the normality assumption, into sets of conditional variances that determine all elements of $\Sigma$. The positive-definiteness of $\Sigma$ is guaranteed under a very logical condition that is quite simple to recognize and which the expert can fulfill during the elicitation process. Specifically, the expert is asked to keep reducing her uncertainty as the set of conditioning values is increased.
Conditioning on more information should increase her confidence in her assessed values, especially as the conditions say that her previous median assessments were accurate.

In what follows, we review the method of GA for eliciting $\Sigma$, using the same notation and equations as Garthwaite and Al-Awadhi (2006). In the next chapter, we propose a generalization of the method for the case of correlated vectors of regression coefficients.

Let the conditions that $\mu_i(r_{i,0}) = m_{0,0.5}$ and $\mu_i(r_{i,j}) = m_{i,j,0.5}$ be denoted by $m^0_{i,0}$ and $m^0_{i,j}$, respectively, for $i = 1, 2, \dots, m+n$, $j = 1, 2, \dots, \delta(i)$.

For each covariate $R_i$, $i = 1, 2, \dots, m+n$, the assessment process consists of $\delta(i)$ steps. At step $k$, $k = 1, 2, \dots, \delta(i)$, the expert is asked to assume that conditions $m^0_{i,0}, m^0_{i,1}, \dots, m^0_{i,k-1}$ hold. Given this information, she assesses the conditional lower and upper quartiles of $\mu_i(r_{i,j})$, denoted by $m_{i,j,0.25}|m^0_{i,0}, \dots, m^0_{i,k-1}$ and $m_{i,j,0.75}|m^0_{i,0}, \dots, m^0_{i,k-1}$, respectively, for $j = k, k+1, \dots, \delta(i)$. The use of the interactive software to obtain the assessments of these conditional quartiles is discussed in Section 3.3.6.

For $i = 1, 2, \dots, m+n$, $k = 1, 2, \dots, \delta(i)$, $j = k, k+1, \dots, \delta(i)$, using (3.17), the assessed conditional quartiles are used to elicit the conditional variance:

$$\operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k-1}) = \left[\frac{g(m_{i,j,0.75}|m^0_{i,0}, \dots, m^0_{i,k-1}) - g(m_{i,j,0.25}|m^0_{i,0}, \dots, m^0_{i,k-1})}{1.349}\right]^2, \tag{3.35}$$

where $y^0_{i,j}$ denotes the condition that $Y_{i,j} = y_{i,j,0.5}$, which is equivalent to $m^0_{i,j}$ from (3.10), (3.12) and (3.26).

For mathematical coherence, conditioning on more values at each further step must reduce the value of the conditional variance in (3.35). Consequently, the expert must steadily reduce her uncertainty when she moves from one step to another.
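In view of (3.35), this means that the assessment of the interquartile range in step $k$ must be less than that in step $k-1$, which guarantees that

$$\operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k}) < \operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k-1}). \tag{3.36}$$

For $i = 1, 2, \dots, m+n$, $k = 0, 1, \dots, \delta(i)-1$, let the conditional variance-covariance matrix $A_{i,k}$ be defined as

$$A_{i,k} = \operatorname{Var}(Y_{i,k+1}, Y_{i,k+2}, \dots, Y_{i,\delta(i)} \mid y^0_{i,0}, \dots, y^0_{i,k}). \tag{3.37}$$

To elicit the full matrix $A_{i,0}$ in the last step and investigate its positive-definiteness, mathematical induction is used to obtain a positive-definite matrix $A_{i,k-1}$ from $A_{i,k}$ that has the same property. To achieve this, let

$$A_{i,k-1} = \begin{pmatrix} \phi_{i,k,k} & \underline{\phi}_{i,k}' \\ \underline{\phi}_{i,k} & \Phi_{i,k} \end{pmatrix}, \tag{3.38}$$

for $k = 1, 2, \dots, \delta(i)$, where $\phi_{i,k,k}$ is a scalar, $\underline{\phi}_{i,k}$ is a vector and $\Phi_{i,k}$ is a square matrix.

In particular, the scalar $\phi_{i,k,k}$ in (3.38) is given by

$$\phi_{i,k,k} = \operatorname{Var}(Y_{i,k} \mid y^0_{i,0}, \dots, y^0_{i,k-1}). \tag{3.39}$$

The scalar $\phi_{i,k,k}$ can thus be directly elicited using (3.35). The vector $\underline{\phi}_{i,k}$ takes the form

$$\underline{\phi}_{i,k} = (\phi_{i,k,k+1}, \phi_{i,k,k+2}, \dots, \phi_{i,k,\delta(i)})'. \tag{3.40}$$

From the theory of conditional multivariate normal distributions, and for $j = k+1, \dots, \delta(i)$, we have

$$\operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k}) = \operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k-1}) - \phi_{i,k,j}^2\, \phi_{i,k,k}^{-1}. \tag{3.41}$$

Hence, from (3.36) and (3.41), $\phi_{i,k,j}$, $j = k+1, \dots, \delta(i)$, in (3.40) is given by

$$\phi_{i,k,j} = \left\{\phi_{i,k,k}\left[\operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, y^0_{i,1}, \dots, y^0_{i,k-1}) - \operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, y^0_{i,1}, \dots, y^0_{i,k})\right]\right\}^{1/2}. \tag{3.42}$$

What is left to be elicited in (3.38) is the matrix $\Phi_{i,k}$, which can be computed, using the conditional multivariate normal theory, as

$$\Phi_{i,k} = A_{i,k} + \underline{\phi}_{i,k}\, \phi_{i,k,k}^{-1}\, \underline{\phi}_{i,k}'. \tag{3.43}$$

Hence, the matrix $A_{i,k-1}$ in (3.38) can be obtained from $A_{i,k}$, for $k = 1, 2, \dots, \delta(i)-1$. Finally, $A_{i,0}$ is the result of applying the same routine recursively, starting with $A_{i,\delta(i)-1}$ as

$$A_{i,\delta(i)-1} = \operatorname{Var}(Y_{i,\delta(i)} \mid y^0_{i,0}, y^0_{i,1}, \dots, y^0_{i,\delta(i)-1}). \tag{3.44}$$

It can be seen, from (3.35) and (3.44), that

$$A_{i,\delta(i)-1} > 0. \tag{3.45}$$

From (3.38) and (3.43), we can write the determinant of $A_{i,k-1}$ as

$$|A_{i,k-1}| = \phi_{i,k,k}\, |\Phi_{i,k} - \underline{\phi}_{i,k}\, \phi_{i,k,k}^{-1}\, \underline{\phi}_{i,k}'| = \phi_{i,k,k}\, |A_{i,k}|. \tag{3.46}$$

Hence, from (3.45) and (3.46), $A_{i,0}$ is positive-definite.

Under the independence assumption between the elements of different vectors of regression coefficients, the matrix $A$ can be defined as

$$A = \begin{pmatrix} D_1^{-1} A_{1,0} (D_1^{-1})' & O & \cdots & O & O & \cdots & O \\ O & \ddots & & \vdots & \vdots & & \vdots \\ O & \cdots & & D_m^{-1} A_{m,0} (D_m^{-1})' & O & \cdots & O \\ O & \cdots & & O & A_{m+1,0} & \cdots & O \\ \vdots & & & \vdots & \vdots & \ddots & \vdots \\ O & \cdots & & O & O & \cdots & A_{m+n,0} \end{pmatrix}, \tag{3.47}$$

where, for $i = 1, 2, \dots, m$, each $D_i$ is a lower triangular matrix given by

$$D_i = \begin{pmatrix} d_{i,1} & 0 & 0 & \cdots & 0 \\ d_{i,1} & d_{i,2} & 0 & \cdots & 0 \\ d_{i,1} & d_{i,2} & d_{i,3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d_{i,1} & d_{i,2} & d_{i,3} & \cdots & d_{i,\delta(i)} \end{pmatrix}. \tag{3.48}$$

With $d_{i,j}$ as defined in (3.7), $d_{i,j} > 0$, and hence $D_i^{-1}$ exists. Since, for continuous covariates, from (3.15), we have

$$(Y_{i,1}, Y_{i,2}, \dots, Y_{i,\delta(i)})' = (\alpha, \dots, \alpha)' + D_i \underline{\beta}_i, \tag{3.49}$$

then

$$\operatorname{Var}(D_i \underline{\beta}_i \mid \alpha) = \operatorname{Var}((Y_{i,1}, Y_{i,2}, \dots, Y_{i,\delta(i)})' \mid \alpha) = A_{i,0}, \qquad \text{for } i = 1, 2, \dots, m. \tag{3.50}$$

Hence,

$$\operatorname{Var}(\underline{\beta}_i \mid \alpha) = \begin{cases} D_i^{-1} A_{i,0} (D_i^{-1})', & \text{for } i = 1, 2, \dots, m, \\ A_{i,0}, & \text{for } i = m+1, m+2, \dots, m+n. \end{cases} \tag{3.51}$$

In view of (3.16), the matrix $\Sigma$, as the unconditional variance of $\underline{\beta}$, can be given by

$$\Sigma = A + \underline{\sigma}_1\, \sigma_{0,0}^{-1}\, \underline{\sigma}_1'. \tag{3.52}$$

The full variance-covariance matrix of $(\alpha, \underline{\beta}')'$ is thus positive-definite, from (3.16), (3.47) and (3.52), since

$$\begin{vmatrix} \sigma_{0,0} & \underline{\sigma}_1' \\ \underline{\sigma}_1 & \Sigma \end{vmatrix} = \sigma_{0,0}\, |\Sigma - \underline{\sigma}_1\, \sigma_{0,0}^{-1}\, \underline{\sigma}_1'| = \sigma_{0,0}\, |A|. \tag{3.53}$$

The assessment tasks needed to elicit all the hyperparameters $b_0$, $\underline{b}$, $\sigma_{0,0}$, $\underline{\sigma}_1$ and $\Sigma$ are given in detail with the software description in Section 3.3.

3.2.3 Computing values for the suggested assessments

For larger elicitation problems, with many covariates or large numbers of knots per covariate, the number of required assessments increases and may overload the expert. To reduce the number of assessed quantities and help the expert to go through the elicitation process more easily, the method of GA suggests some values of assessments that can be presented by the software to the expert, as a guide for her possible assessed conditional medians and quartiles. The expert may accept these suggestions if she finds them a reasonable representation of her opinion. Or, instead, she may change or modify them to the best of her knowledge and experience.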
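Returning briefly to the construction of $\Sigma$ in Section 3.2.2, the backward recursion that builds $A_{i,0}$ from the elicited conditional variances, as described around (3.38) to (3.44), can be sketched in plain Python as follows. This is a minimal sketch and not the PEGS-GLM implementation: the function name and the dictionary layout `v[(k, j)]`, holding $\operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k})$ for $0 \le k < j \le \delta(i)$, are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def build_A0(v: Dict[Tuple[int, int], float], delta: int) -> List[List[float]]:
    """Build A_{i,0} for one covariate from conditional variances.

    v[(k, j)] is the elicited Var(Y_j | y_0^0, ..., y_k^0), 0 <= k < j <= delta
    (hypothetical layout). Returns a delta x delta matrix as nested lists.
    """
    A = [[v[(delta - 1, delta)]]]                    # starting scalar, as in (3.44)
    for k in range(delta - 1, 0, -1):                # build A_{k-1} from A_k
        phi_kk = v[(k - 1, k)]                       # Var(Y_k | y_0^0..y_{k-1}^0)
        phi = []
        for j in range(k + 1, delta + 1):
            drop = v[(k - 1, j)] - v[(k, j)]         # reduction in variance
            if drop < 0:
                raise ValueError("conditional variance must not increase")
            phi.append(math.sqrt(phi_kk * drop))     # positive root of the product
        n = len(phi)
        # Phi = A_k + phi phi' / phi_kk
        Phi = [[A[a][b] + phi[a] * phi[b] / phi_kk for b in range(n)] for a in range(n)]
        # bordered matrix [[phi_kk, phi'], [phi, Phi]]
        A = [[phi_kk] + phi] + [[phi[a]] + Phi[a] for a in range(n)]
    return A


if __name__ == "__main__":
    variances = {(0, 1): 4.0, (0, 2): 9.0, (1, 2): 5.0}
    print(build_A0(variances, delta=2))
```

In the two-knot example, the determinant of the result equals $\phi_{i,1,1}$ times the starting scalar, in line with the determinant identity used to argue positive-definiteness.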
The method of GA chooses values to suggest by extrapolation from the previously assessed medians and quartiles, assuming some patterns of dependence or independence at different knots of each covariate. The derivations of these suggestions are reviewed below.

Suggesting conditional medians

Assuming independence between $\alpha$ and $\underline{\beta}$, the conditional medians $m_{i,j,0.5}|\alpha^*$ in (3.31) that are required for eliciting $\underline{\sigma}_1$ can be suggested as follows. Conditioning on $\alpha = \alpha^*$, and under the independence assumption, we have

$$b_{i,j}|\alpha^* = b_{i,j}, \qquad \forall\, i, j. \tag{3.54}$$

Taking $\alpha^* = y_{0,0.75}$, and equating the right-hand sides of (3.27), (3.28) to those of (3.32), (3.33), respectively, equation (3.54) implies that

$$(y_{i,j,0.5}|y_{0,0.75}) - y_{0,0.75} = y_{i,j,0.5} - y_{0,0.5} \tag{3.55}$$

for $i = m+1, \dots, m+n$, $j = 1, 2, \dots, \delta(i)$, and

$$(y_{i,j,0.5}|y_{0,0.75}) - (y_{i,j-1,0.5}|y_{0,0.75}) = y_{i,j,0.5} - y_{i,j-1,0.5} \tag{3.56}$$

for $i = 1, 2, \dots, m$, $j = 1, 2, \dots, \delta(i)$.

Now, from both (3.55) and (3.56), we have

$$(y_{i,j,0.5}|y_{0,0.75}) - y_{i,j,0.5} = y_{0,0.75} - y_{0,0.5}, \tag{3.57}$$

for $i = 1, 2, \dots, m+n$, $j = 1, 2, \dots, \delta(i)$. Hence, from (3.34), (3.57) and the independence assumption, a reasonable suggestion, denoted by $\tilde{m}_{i,j,0.5}|y_{0,0.75}$, for $m_{i,j,0.5}|y_{0,0.75}$ is given by

$$\tilde{m}_{i,j,0.5}|y_{0,0.75} = g^{-1}(y_{0,0.75} - y_{0,0.5} + y_{i,j,0.5}), \tag{3.58}$$

for $i = 1, 2, \dots, m+n$, $j = 1, 2, \dots, \delta(i)$. All the components on the right-hand side of (3.58) can be computed from the previous assessments as in (3.18) and (3.26). Of course, if the expert accepts these suggested medians, the resulting value of $\underline{\sigma}_1$ is a zero vector.

Suggesting conditional quartiles for factors

The simple idea here is to assume that the expert's opinion at one factor level is independent of her opinion at other levels. This leads to conditional quartiles that are unchanged as the number of conditions increases.
In particular, let $\tilde{m}_{i,j,0.25}|m^0_{i,0}, \dots, m^0_{i,k}$ and $\tilde{m}_{i,j,0.75}|m^0_{i,0}, \dots, m^0_{i,k}$ be the suggested values of the conditional lower and upper quartiles, $m_{i,j,0.25}|m^0_{i,0}, \dots, m^0_{i,k}$ and $m_{i,j,0.75}|m^0_{i,0}, \dots, m^0_{i,k}$, respectively, as required in (3.35), for $i = m+1, \dots, m+n$, $k = 1, 2, \dots, \delta(i)-1$ and $j = k+1, k+2, \dots, \delta(i)$. Under the independence assumption, the suggested values are

$$\tilde{m}_{i,j,0.25}|m^0_{i,0}, \dots, m^0_{i,k} = m_{i,j,0.25}|m^0_{i,0} \tag{3.59}$$

and

$$\tilde{m}_{i,j,0.75}|m^0_{i,0}, \dots, m^0_{i,k} = m_{i,j,0.75}|m^0_{i,0} \tag{3.60}$$

for $i = m+1, \dots, m+n$, $k = 1, 2, \dots, \delta(i)-1$ and $j = k+1, k+2, \dots, \delta(i)$. Again, the expert can change any of these suggestions should she wish.

Suggesting conditional quartiles for continuous covariates

In offering suggestions for the conditional quartiles, $m_{i,j,0.25}|m^0_{i,0}, \dots, m^0_{i,k}$ and $m_{i,j,0.75}|m^0_{i,0}, \dots, m^0_{i,k}$, as required in (3.35), the method of GA distinguishes between two cases, the case where $k = 0$ and the case where $k > 0$.

In the case of $k = 0$, the assumption is that the relation between $Y$ and $R_i$ is approximately linear, instead of being piecewise-linear. Hence, we may imagine three lines emerging from $y_{0,0.5}$ at the reference knot $r_{i,0}$. The middle line connects all the medians $y_{i,j,0.5}$, while the lower (upper) line connects all the lower (upper) quartiles $g(m_{i,j,0.25}|m^0_{i,0})$ ($g(m_{i,j,0.75}|m^0_{i,0})$), at all other knots $r_{i,j}$, for $j = 1, 2, \dots, \delta(i)$. The linearity assumption ensures that the slopes of each of these three lines are equal at all knots $r_{i,j}$, $j = 1, 2, \dots, \delta(i)$. This implies that, for any value $l = 1, 2, \dots, \delta(i)$, $l \ne j$,

$$\frac{y_{i,j,0.5} - g(m_{i,j,0.25}|m^0_{i,0})}{[\ldots]} = \frac{y_{i,l,0.5} - g(m_{i,l,0.25}|m^0_{i,0})}{[\ldots]} \tag{3.61}$$

and

$$\frac{g(m_{i,j,0.75}|m^0_{i,0}) - y_{i,j,0.5}}{[\ldots]} = \frac{g(m_{i,l,0.75}|m^0_{i,0}) - y_{i,l,0.5}}{[\ldots]}. \tag{3.62}$$

Once the expert has assessed one conditional quartile, $m_{i,l,0.25}|m^0_{i,0}$ or $m_{i,l,0.75}|m^0_{i,0}$, equation (3.61) or (3.62) can be used to suggest conditional quartiles as

$$\tilde{m}_{i,j,0.25}|m^0_{i,0} = g^{-1}\left\{ y_{i,j,0.5} - [y_{i,l,0.5} - g(m_{i,l,0.25}|m^0_{i,0})]\, [\ldots] \right\} \tag{3.63}$$

or

$$\tilde{m}_{i,j,0.75}|m^0_{i,0} = g^{-1}\left\{ y_{i,j,0.5} - [y_{i,l,0.5} - g(m_{i,l,0.75}|m^0_{i,0})]\, [\ldots] \right\}, \tag{3.64}$$

respectively, for $j = 1, 2, \dots, \delta(i)$, $j \ne l$.
Suggestions for all conditional lower (upper) quartiles are extrapolated from only one assessed value of the conditional lower (upper) quartile. This saves the expert considerable time and effort during the elicitation process.

For the remaining assessment tasks, where $k = 1, 2, \dots, \delta(i)-1$, a new assumption is imposed to obtain the suggested quartiles $\tilde{m}_{i,j,0.25}|m^0_{i,0}, \dots, m^0_{i,k}$ and $\tilde{m}_{i,j,0.75}|m^0_{i,0}, \dots, m^0_{i,k}$. The conditional correlation coefficient between $Y_{i,j}$ and $Y_{i,k}$, for $j = k+1, k+2, \dots, \delta(i)$, is assumed to be of the form

$$\operatorname{Corr}(Y_{i,j}, Y_{i,k} \mid y^0_{i,0}, \dots, y^0_{i,k-1}) = [\ldots]. \tag{3.65}$$

From this, using the theory of bivariate normal distributions, the conditional variance is given by

$$\operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k-1}, y^0_{i,k}) = (1 - [\ldots])\, \operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k-1}), \tag{3.66}$$

for $j = k+1, k+2, \dots, \delta(i)$.

Once the expert has assessed both a lower and an upper conditional quartile at any one knot, say $r_{i,k+1}$, the value of $\operatorname{Var}(Y_{i,k+1} \mid y^0_{i,0}, \dots, y^0_{i,k-1}, y^0_{i,k})$ can be elicited from equation (3.35). Since $\operatorname{Var}(Y_{i,k+1} \mid y^0_{i,0}, \dots, y^0_{i,k-1})$ has already been elicited in step $k-1$, the value of $\rho_{i,k-1}$ can be computed from (3.66) for $j = k+1$. Substituting $\rho_{i,k-1}$ again in (3.66), and using the already elicited values of $\operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k-1})$, for $j = k+2, \dots, \delta(i)$, the value of $\operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k-1}, y^0_{i,k})$ can be obtained for all $j = k+2, \dots, \delta(i)$.

After the value of $\operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k})$ has been elicited, we can solve the following two equations for $y_{i,j,0.25}|y^0_{i,0}, \dots, y^0_{i,k}$ and $y_{i,j,0.75}|y^0_{i,0}, \dots, y^0_{i,k}$:

$$\operatorname{Var}(Y_{i,j} \mid y^0_{i,0}, \dots, y^0_{i,k}) = \left[\frac{(y_{i,j,0.75}|y^0_{i,0}, \dots, y^0_{i,k}) - (y_{i,j,0.25}|y^0_{i,0}, \dots, y^0_{i,k})}{1.349}\right]^2 \tag{3.67}$$

and

$$\frac{(y_{i,j,0.75}|y^0_{i,0}, \dots, y^0_{i,k}) - y_{i,j,0.5}}{y_{i,j,0.5} - (y_{i,j,0.25}|y^0_{i,0}, \dots, y^0_{i,k})} = \frac{g(m_{i,j,0.75}|m^0_{i,0}, \dots, m^0_{i,k-1}) - y_{i,j,0.5}}{y_{i,j,0.5} - g(m_{i,j,0.25}|m^0_{i,0}, \dots, m^0_{i,k-1})}. \tag{3.68}$$

The use of equation (3.68) aims to ensure that the asymmetry of the suggested quartiles around the median at step $k$ is the same as any asymmetry of the assessed quartiles at step $k-1$.
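The last step, solving (3.67) and (3.68) for the two conditional quartiles, has a closed form that the following Python sketch makes explicit. It assumes that (3.67) is the interquartile-range relation (3.17) applied to the conditional variance at step $k$, and that (3.68) fixes the ratio of the upper to the lower quartile distance from the median; the function name and argument names are hypothetical.

```python
import math
from typing import Tuple

IQR_STD_NORMAL = 1.349  # interquartile range of a standard normal, as in (3.17)


def solve_conditional_quartiles(cond_var: float, y_median: float,
                                prev_q25: float, prev_q75: float) -> Tuple[float, float]:
    """Quartiles on the Y scale with a given variance and a preserved asymmetry.

    cond_var  : conditional variance of Y at step k (assumed already obtained)
    y_median  : y_{i,j,0.5}
    prev_q25, prev_q75 : step k-1 quartiles on the Y scale, i.e. after applying g(.)
    """
    if not prev_q25 < y_median < prev_q75:
        raise ValueError("previous quartiles must bracket the median")
    ratio = (prev_q75 - y_median) / (y_median - prev_q25)   # asymmetry to preserve
    iqr = IQR_STD_NORMAL * math.sqrt(cond_var)              # new interquartile range
    lower_gap = iqr / (1.0 + ratio)                         # median minus lower quartile
    upper_gap = ratio * lower_gap                           # upper quartile minus median
    return y_median - lower_gap, y_median + upper_gap


if __name__ == "__main__":
    # previous quartiles are twice as far above the median as below it
    v = (1.5 / IQR_STD_NORMAL) ** 2
    print(solve_conditional_quartiles(v, 0.0, -1.0, 2.0))
```

Applying $g^{-1}(\cdot)$ to the two returned values would then move the quartiles back to the scale on which the expert makes her assessments.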
Finally, in view of (3.26), the suggested quartiles are given by

$$\tilde{m}_{i,j,0.25}|m^0_{i,0}, \dots, m^0_{i,k} = g^{-1}(y_{i,j,0.25}|y^0_{i,0}, \dots, y^0_{i,k}) \tag{3.69}$$

and

$$\tilde{m}_{i,j,0.75}|m^0_{i,0}, \dots, m^0_{i,k} = g^{-1}(y_{i,j,0.75}|y^0_{i,0}, \dots, y^0_{i,k}) \tag{3.70}$$

for $i = 1, 2, \dots, m$, $k = 1, 2, \dots, \delta(i)-1$ and $j = k+1, k+2, \dots, \delta(i)$.

3.3 Assessment tasks and software description

The assessment procedure divides naturally into five stages, which are described in turn. A description of the method and theory for using the assessments to estimate the hyperparameters of the prior distribution was reviewed in Section 3.2.2.

3.3.1 Defining the model

The modified version of the software, PEGS-GLM, offers the expert different options for the model to be fitted. The choices available are ordinary linear regression, logistic regression, Poisson regression and any other user-defined model. Ordinary linear regression assumes a normal distribution for the response variable with the identity link function. For logistic regression the assumed distribution is Bernoulli with the logit link function. Poisson regression assumes a Poisson distribution with the logarithm link function.

The expert can choose to define any other model, in which case she will be asked to give a distribution and a link function. Available distributions are the normal, Poisson, binomial, gamma, inverse normal (inverse Gaussian), negative binomial, Bernoulli, geometric and exponential. The user is also asked for some parameters of the selected distribution where appropriate. However, the expert has the option to elicit the extra parameters of the normal and gamma distributions. Novel methods for eliciting these parameters are proposed in Chapter 5.

Available link functions are the canonical, identity, logarithm, logit, reciprocal, square root, probit, log-log, complementary log-log, power, log ratio and a user-defined link function. For a detailed definition of these link functions see McCullagh and Nelder (1989).
For the power link function the software expects the exponent of the power function to be entered by the expert; a value of $-2$ is suggested as a default. On choosing the distribution, the software suggests the suitable canonical link function so as to help the expert (see Figure 3.3).

[Figure 3.3: The dialogue box for defining the model]

An important modification to the software (made by the author) is that it offers a large range of GLMs. It also lets the expert write her own link function and its inverse. The program can parse both formulas and check their validity as mathematical expressions. Moreover, the program can help by checking whether the functions are valid inverses of each other.

3.3.2 Defining the response variable and covariates

The expert determines the dependent variable with its minimum and maximum values in a dialogue box. The modified version of the software suggests the maximum and minimum values of the response variable whenever possible. The expert may still change them, but, in the light of the chosen model with the specified link function, invalid values are not accepted, and the expert is shown a warning message (for example, the range for a binomial proportion must not extend outside the interval (0,1)).

A set of explanatory variables (covariates) is chosen by the expert. Each covariate is treated as either a continuous random variable or a factor. Continuous covariates are specified with their minimum and maximum; factors are specified with their levels. For each continuous covariate, knots are chosen by the expert or suggested by the software.
A reference point is chosen for each covariate, while the origin is the setting for which every covariate is at its reference point.

After determining the number, names and types (continuous covariate or categorical factor) of the variables, the expert has only to give the maximum and minimum for each of her continuous covariates together with the value of its reference knot, and the modified software then suggests a suitable number of knots and the position of the reference knot relative to the other knots. The software can then divide the range and give the value of each knot. This process is done automatically to reduce the burden of data entry, but, again, the expert can change any of these.

The fractional part of each numeric value is always rounded to four decimal places, so as to avoid long decimal numbers that are neither easily readable nor suitable for graph axes. If higher precision is required, the measurement units can be changed so that data values need no more than four decimal places.

For categorical factors, the expert gives the value of each level. In some cases, when the factor levels are ordinal data, for example, the expert may wish to keep the order of the factor levels, while still being able to select any level as the reference level. The author's modification of the software gives an option to select the reference level of each factor without restricting it to be the first knot (see Figure 3.2).

Using a dialogue box, the median, lower and upper quartiles of $\mu(\underline{r}_0)$ at the origin are assessed, namely $m_{0,0.5}$, $m_{0,0.25}$ and $m_{0,0.75}$, as denoted in Section 3.2.2. These values must be inside the previously specified range of the response variable; if not, the software warns the expert and asks her to resolve this conflict. In the expert's opinion, the true value of $\mu(\underline{r}_0)$ is equally likely to be bigger or smaller than the assessed median.
Together with the median, these quartiles should divide the range into four equally likely intervals. The expert is encouraged to modify her median and quartile assessments until they divide the range into four intervals that each seem equally likely to her. These assessed values are used as in equations (3.18), (3.19) and (3.20), in Section 3.2.2, to estimate $b_0$ and $\sigma_{0,0}$.

3.3.3 Initial medians assessments

In the remainder of the elicitation procedure, the expert is separately questioned about each covariate in turn. She is asked to assume the other covariates are at their reference values/levels and forms a piecewise-linear graph or bar chart to represent her opinion about each separate covariate.

The previous stage elicited the expert's median estimate, $m_{0,0.5}$, of $\mu(\underline{r}_0)$ at the origin $\underline{r} = \underline{r}_0$. The software plots this value on the reference vertical line and the expert is told to treat it as being correct. The expert then plots her median estimates, $m_{i,j,0.5}$, of $\mu_i(r_{i,j})$, as given in equation (3.21), to form the remainder of the graph. She does this by using the computer mouse to 'click' points on the vertical lines. Straight lines are drawn by the computer between the 'clicked' points, which the expert can change until she feels the graph corresponds to her opinions.

As an illustration, Figure 3.1 shows a software graph for the variable "Weight". The horizontal axis gives values for the variable and the vertical axis gives values of $Y$. Thus the graph plots the effect on $Y$ as the value of "Weight" varies. The expert is told that, if the graph is fairly flat, then the variable has less influence on $Y$ than if the graph is more curved. The axes and vertical lines are drawn by the software.

For factors, bar charts are formed to represent the expert's opinion. The value of $Y$ has been elicited earlier for the reference level and this gives the height of the reference bar.
The expert is told to assume that this bar is correct and to judge the appropriate heights for other bars relative to it. These heights give the value of $Y$ for each level when the other covariates are at their reference values/levels. The software draws thin vertical lines for each level and the expert specifies the height of a bar by clicking on the line with the mouse. This is illustrated in Figure 3.2, where all bars have been specified. The expert can change an assessment by re-clicking on a line.

These median assessments, $m_{i,j,0.5}$, for the continuous covariates and factors yield estimates of the hyperparameter $\underline{b}$, the mean of the regression coefficient vector $\underline{\beta}$. The theoretical derivation of this estimation is given in detail in Section 3.2.2, equations (3.26), (3.27) and (3.28).

3.3.4 The feedback stage

It is important to help the expert check that her assessments have resulted in a prior distribution that is a reasonable representation of her opinion. This is done through a feedback stage, in which the expert is informed of some other measurements that are inferred from her assessments. She can review and revise her original assessments, in the light of this feedback, if necessary.

The current elicitation method has quantified the relationship between the response variable and each covariate in turn, while assuming that all other covariates are at their reference points. Hence, it is important that the expert has feedback that shows her implied predictions of the response variable when all covariates are simultaneously changed from their reference points. The software computes the values of the response variable at some suggested design points and presents these values to the expert to check that they are a reasonable representation of her opinion about the response variable at each suggested design point.
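The implied prediction at a design point follows from the linear predictor (3.3) evaluated at the elicited means $b_0$ and $\underline{b}$. The Python sketch below shows one way this could be computed for continuous covariates only; it is not the PEGS-GLM code. The function names, the dictionary layout and the optional `scale` argument (mimicking the overall scale factor of the feedback screen, which multiplies every coefficient except the constant term) are assumptions for illustration, and the reference knot is again assumed to be the smallest knot.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def continuous_design(r: float, knots: Sequence[float]) -> List[float]:
    """Piecewise-linear coding of (3.6); knots = [r_0, ..., r_delta], increasing."""
    x = []
    for lo, hi in zip(knots[:-1], knots[1:]):
        x.append(0.0 if r <= lo else (r - lo if r <= hi else hi - lo))
    return x


def predict_at_design_point(b0: float,
                            slopes: Dict[str, Sequence[float]],
                            knots: Dict[str, Sequence[float]],
                            values: Dict[str, float],
                            inv_link: Callable[[float], float],
                            scale: float = 1.0) -> Tuple[float, float]:
    """Return (Y, g^{-1}(Y)) at one design point, using the prior means b_0 and b.

    `scale` multiplies all coefficients except the constant term b_0.
    """
    y = b0
    for name, r in values.items():
        x = continuous_design(r, knots[name])
        y += scale * sum(b * xj for b, xj in zip(slopes[name], x))
    return y, inv_link(y)


def inv_logit(y: float) -> float:
    """Inverse of the logit link, used here only as an example."""
    return 1.0 / (1.0 + math.exp(-y))


if __name__ == "__main__":
    slopes = {"Weight": [0.1, -0.05]}
    knots = {"Weight": [50.0, 60.0, 80.0]}
    print(predict_at_design_point(0.0, slopes, knots, {"Weight": 70.0}, inv_logit))
```

A factor would contribute the single coefficient $b_{i,j}$ of the chosen level (or nothing at the reference level); that case is omitted here to keep the sketch short.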
Figure 3.4 illustrates a feedback screen, in which the software suggests 6 design points, each of which is a combination of the values and levels of all covariates. Combinations 1 and 4 are the covariate values that give the minimum and maximum response values, respectively. Combinations 2 and 3 consist of the values that divide each covariate range into one-third and two-thirds, respectively. Minimum and maximum values of each covariate are suggested in combinations 5 and 6, respectively. The expert is asked to specify other design points of interest and to revise any design points offered by the computer that are unrealistic combinations of covariates.

Figure 3.4: The feedback screen

The expert is asked to check that the row of "Graph values of Y", as given in Figure 3.4, is an acceptable representation of her opinion at each design point. These values are predicted from the graphs of medians that were assessed by the expert in Section 3.3.3.
The values that are outside the range of the response variable, which was specified at the start of the elicitation process, are flagged in red. The expert can change the unacceptable values by varying the "Overall scale factor" until the row of the "Scaled values of Y", in Figure 3.4, represents her opinion reasonably well in terms of the predicted values at each design point. The scaled values of Y are computed by multiplying all regression coefficients, except the constant term, by the selected value of the overall scale factor. The expert may choose to review and revise the scaled median assessments again as in Section 3.3.3. Then she will be shown an updated feedback screen. The process will continue until the expert is happy with the graph values of Y as presented in the feedback.

3.3.5 Conditional medians assessments

During this stage the expert is asked to assess her conditional medians, $m_{i,j,0.5}\,|\,m_{0,0.75}$, for each covariate in turn, $i = 1, 2, \cdots, m+n$. This is done by changing the conditioning value at the reference point from the median, $m_{0,0.5}$, to the upper quartile, $m_{0,0.75}$. See Figure 3.5, in which median assessments made in the previous stage are given together with the upper quartile at the reference point. The expert assumes that the true value of Y at the reference point is the given upper quartile and she is asked to change the median values at other points to assess $m_{i,j,0.5}\,|\,m_{0,0.75}$ in the light of this new conditioning value. Conditional medians for all values have been assessed by the expert in Figure 3.5. These assessments are needed to elicit a part of the prior covariance matrix, namely $\underline{\sigma}_1$, the covariances between $\alpha$ and each of the components of $\underline{\beta}$; see equations (3.32), (3.33) and (3.34), in Section 3.2.2. Suggested values of these conditional medians are given by the software, assuming that $\alpha$ and the components of $\underline{\beta}$ are independent; see equation (3.58) in Section 3.2.3.
The expert can change these suggested values if she wishes.

Figure 3.5: Conditional median assessments for the continuous covariate "Weight"

3.3.6 Conditional quartiles assessments

The median assessments provide point estimates of the relationship between different covariates and the variable Y. The remaining task is to quantify the expert's confidence in these estimates and their interrelationship, i.e. how accurate she believes the estimates to be and the correlations between them for each covariate individually. Correlations between coefficients of different covariates are estimated by three different methods proposed in Chapter 4.

In this stage, assessments of conditional lower and upper quartiles, $m_{i,j,0.25}\,|\,m^0_{i,0}$ and $m_{i,j,0.75}\,|\,m^0_{i,0}$, respectively, are elicited. Assessing quartiles is a harder task for an expert than assessing medians, and quite a large number of quartile assessments are required. To assist the expert, the software suggests some quartile values by extrapolating from other quartile assessments of the expert. The theoretical procedure for getting these suggested values, as reviewed in Section 3.2.3, was programmed into the software to effectively help the expert during the current stage. The expert can change these assessments and commonly does so but, even then, a starting value to consider seems to make the task easier.

For each continuous covariate in turn, the software displays the graph of the medians that had been assessed earlier, $m_{i,j,0.5}$, and then sets of conditional quartile assessments, $m_{i,j,0.25}\,|\,m^0_{i,0}$ and $m_{i,j,0.75}\,|\,m^0_{i,0}$, are elicited. For this first set of assessments, the condition is that the value of Y at the reference value/level equals the median assessment, i.e. $\mu(\underline{r}_0) = m_{0,0.5}$.

Figure 3.6: Quartile assessments for a continuous covariate

In an interactive graph like Figures 3.6 and 3.7, the expert is asked to give her lower and upper quartiles for Y at one point on each side of the medians for each value/level of the covariate except for the reference value/level. The lines joining quartiles look similar to confidence intervals and it is emphasized to the expert that there should only be a 50% chance that the value of Y is between the lines at any point. The expert uses the computer mouse to make assessments or change values suggested by the software.

Figure 3.7: Quartile assessments for a factor

For the second set of conditional assessments, the expert is asked to assume that the median estimates of Y are correct at both the reference value/level and the nearest points on each side of it, i.e. conditions $m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k}$ in Section 3.2.2. The expert gives lower and upper quartiles at another point, $r_{i,k+1}$, and the software suggests quartiles, $m_{i,j,0.25}\,|\,m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k}$ and $m_{i,j,0.75}\,|\,m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k}$, for the remaining points, $j = k+2, \cdots, \delta(i)$. In Figure 3.8 lower quartiles have been assessed while upper quartiles are to be assessed.
The expert modifies quartile values so as to represent her opinion, subject to the restriction that the current values must be within the previous set of quartile assessments, $m_{i,j,0.25}\,|\,m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k-1}$ and $m_{i,j,0.75}\,|\,m^0_{i,0}, m^0_{i,1}, \cdots, m^0_{i,k-1}$. The idea is that as conditions increase, uncertainty should reduce. As detailed in Section 3.2.2, this condition guarantees that the covariance matrix of the regression coefficients is positive-definite.

Figure 3.8 illustrates the graph formed at that stage. The two red lines (the outer lines) represent the previous set of quartile assessments, the second highest (black) line gives the median assessments, and the second lowest (blue) line joins the new lower quartile assessments. The black line joining the medians at the two bold points on the right represents the condition that these medians should be treated as being correct. In assessing quartiles, the expert is told to consider the points to which she thinks the blue line may reasonably extend.

Figure 3.8: Assessing quartiles conditioning on two fixed points

Conditional assessments are also needed for factors. The software displays the bar chart that was formed during the assessment of medians. Conditional on the value of the bar at the reference level being correct, i.e. on $m^0_{i,0}$, the expert assesses a lower and an upper quartile, $m_{i,j,0.25}\,|\,m^0_{i,0}$ and $m_{i,j,0.75}\,|\,m^0_{i,0}$, respectively, for other factor levels.

For each further set of conditional assessments, for both continuous covariates and factors, the expert is asked to assume that a further median given by another value/level is correct and to give her opinion about quartiles for the remaining values/levels.
This is continued until the condition includes all but one of the values/levels on one side, or all but one on each side, when the expert gives her opinion about just the last one or two values/levels (see Figure 3.9).

Figure 3.9: Assessing conditional quartiles for the last level of a factor

As in other parts of the elicitation procedure, the expert uses the mouse to make assessments. Figure 3.9 illustrates the bar chart when conditioning values are specified (indicated by the solid squares); quartiles for the last level are marked with short horizontal blue lines (the inner two lines), while the highest and lowest (red) lines represent the previous quartiles conditioning on fewer medians. Again, current conditional quartiles are not allowed to lie outside these red lines.

The conditional quartile assessments, $m_{i,j,0.25}\,|\,m^0_{i,0}, \cdots, m^0_{i,k}$ and $m_{i,j,0.75}\,|\,m^0_{i,0}, \cdots, m^0_{i,k}$, yield estimates of the variance, $\Sigma$, of the hyperparameter $\underline{\beta}$; see Section 3.2.2. The conditional assessments complete the elicitation procedure for the case of independent coefficients as required in Section 3.2.2.

3.4 Concluding comments

The piecewise-linear elicitation method for logistic regression introduced by Garthwaite and Al-Awadhi (2006), as reviewed in this chapter, is widely applicable for GLMs with any monotonic increasing link function. The method only requires conditional and unconditional medians and quartiles to be assessed from the expert; these assessment tasks are easy to perform using the bisection method. The number of assessed quantities is sufficient to elicit a mean vector and a positive-definite variance-covariance matrix for a multivariate normal prior distribution of the regression parameters of any GLM.
The available modified software has increased the applicability of the method and made its implementation easier for the expert. However, the independence assumption between different regression coefficients that is imposed by the method is sometimes unrealistic and needs to be relaxed. Extended methods that relax this assumption are proposed in the next chapter.

Chapter 4

Eliciting a covariance matrix for dependent coefficients in GLMs

4.1 Introduction

To quantify an expert's opinion about generalized linear models (GLMs), Garthwaite and Al-Awadhi (2011) proposed a method of eliciting opinion about the prior distribution of the regression coefficients. This method, which will be referred to here as GA, is a generalization of the same authors' piecewise-linear model that they used for quantifying opinion for logistic regression (Garthwaite and Al-Awadhi (2006)). A detailed description of their method has been given in the previous chapter.

In their work, the relationship between each continuous predictor variable and the dependent variable (assuming all other variables are held fixed) was modeled as a piecewise-linear function. They used a multivariate normal distribution to represent prior knowledge about the regression coefficients. These coefficients were allowed to be dependent if they were associated with a single variable. However, they assumed that there was no interaction between any variables, in the sense that regression coefficients were a priori independent if associated with different variables.

Our aim in this chapter is to relax the independence assumption between coefficients of different variables. In fact, in many practical situations, it may be thought that regression coefficients of different variables should be related in the prior distribution, if the prior distribution is to give a reasonable representation of the expert's opinion. The expert may be asked to state which variables this applies to.
We propose three different elicitation methods that are implemented in interactive graphical software. The software is freely available as PEGS-GLM (Correlated Coefficients) at http://statistics.open.ac.uk/elicitation.

In the first method, after assessing additional conditional quartiles, GA's method of estimating the variance-covariance matrix is generalized and used to estimate the variance-covariance matrix in generalized linear models where pairs of correlated vectors of coefficients are not necessarily independent in the prior distribution. The second method is designed to require a smaller number of assessments. Its generalization to the case of various vectors of correlated coefficients is straightforward, where the required conditions for positive-definiteness can be easily investigated. A third flexible method is proposed in which the expert assesses the relative correlation structure for all pairs of vectors, then chooses one of the other two methods to specify the coefficient for the highest correlated vectors. This method automatically fulfils the requirement that the whole variance-covariance matrix must be positive-definite. The three proposed methods are detailed below.

4.2 A proposed method for eliciting the variance-covariance matrix of a pair of correlated vectors of coefficients

In this section, we propose an elicitation method that generalizes the method of GA to handle correlated coefficients in GLMs. We start by generalizing the equations given in the previous chapter to make them applicable to the case of correlated coefficients. The underlying mathematical framework is given in Section 4.2.1. The equations given there show how the required conditional assessments are mathematically treated to elicit the variance-covariance matrix. Our approach to assessing these conditional quartiles from the expert using interactive software is detailed in Section 4.2.2.
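4.2.1 Notations and theoretical framework

Consider the piecewise-linear GLM of GA, with $m$ continuous covariates $R_1, R_2, \cdots, R_m$ and $n$ categorical variables (factors) $R_{m+1}, R_{m+2}, \cdots, R_{m+n}$. The model has been defined in Chapter 3, equations (3.1) to (3.15). Recall that the prior distribution of $\alpha$ and $\underline{\beta} = (\underline{\beta}_1', \underline{\beta}_2', \cdots, \underline{\beta}_{m+n}')'$ is assumed to be a multivariate normal distribution

$$\begin{pmatrix} \alpha \\ \underline{\beta} \end{pmatrix} \sim \text{MVN}\left( \begin{pmatrix} b_0 \\ \underline{b} \end{pmatrix}, \begin{pmatrix} \sigma_{0,0} & \underline{\sigma}_1' \\ \underline{\sigma}_1 & \Sigma \end{pmatrix} \right). \tag{4.1}$$

The elicitation of the hyperparameters $b_0$, $\underline{b}$, $\sigma_{0,0}$, $\underline{\sigma}_1$ and $\Sigma$ has been reviewed in Section 3.2.2. Equation (3.52) states that $\Sigma = \Delta + \underline{\sigma}_1 \sigma_{0,0}^{-1} \underline{\sigma}_1'$, where $\Delta$ has been assumed to have the block-diagonal structure

$$\Delta = \begin{pmatrix} D_1^{-1}\Lambda_{1,0}(D_1^{-1})' & O & \cdots & \cdots & \cdots & O \\ O & \ddots & & & & \vdots \\ \vdots & & D_m^{-1}\Lambda_{m,0}(D_m^{-1})' & & & \vdots \\ \vdots & & & \Lambda_{m+1,0} & & \vdots \\ \vdots & & & & \ddots & O \\ O & \cdots & \cdots & \cdots & O & \Lambda_{m+n,0} \end{pmatrix}, \tag{4.2}$$

where, for $i = 1, 2, \cdots, m$, each $D_i$ is a lower triangular matrix given by

$$D_i = \begin{pmatrix} d_{i1} & 0 & 0 & \cdots & 0 \\ d_{i1} & d_{i2} & 0 & \cdots & 0 \\ d_{i1} & d_{i2} & d_{i3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d_{i1} & d_{i2} & d_{i3} & \cdots & d_{i\delta(i)} \end{pmatrix}. \tag{4.3}$$

Hence, for continuous covariates,

$$\text{Var}(D_i \underline{\beta}_i \,|\, \alpha) = \text{Var}\big((Y_{i,1}, Y_{i,2}, \cdots, Y_{i,\delta(i)})' \,|\, \alpha\big) = \Lambda_{i,0}, \quad \text{for } i = 1, 2, \cdots, m,$$

where $Y_{i,j} = g[\mu(r_{1,0}, \cdots, r_{i-1,0}, r_{i,j}, r_{i+1,0}, \cdots, r_{m+n,0})]$. As required, $Y$ is a continuous piecewise-linear function of the variable $R_i$, if all other variables are kept at their reference values. Hence,

$$\text{Var}(\underline{\beta}_i \,|\, \alpha) = \begin{cases} D_i^{-1}\Lambda_{i,0}(D_i^{-1})', & \text{for } i = 1, 2, \cdots, m, \\ \Lambda_{i,0}, & \text{for } i = m+1, m+2, \cdots, m+n. \end{cases} \tag{4.4}$$

Formulae for $\Lambda_{i,0}$ are given in GA as reviewed in the previous chapter; see equations (3.37) to (3.44).

Instead of assuming the block-diagonal structure given by (4.2), we will conformally partition $\Delta$ as

$$\Delta = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} & \cdots & \Sigma_{1,m+n} \\ \Sigma_{2,1} & \Sigma_{2,2} & \cdots & \Sigma_{2,m+n} \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma_{m+n,1} & \Sigma_{m+n,2} & \cdots & \Sigma_{m+n,m+n} \end{pmatrix}, \tag{4.5}$$

where

$$\Sigma_{i,i} = \text{Var}(\underline{\beta}_i \,|\, \alpha), \tag{4.6}$$

for $i = 1, 2, \cdots, m+n$, and the submatrices $\Sigma_{s,t}$ are not necessarily zero matrices ($s = 1, 2, \cdots, m+n$, $t = 1, 2, \cdots, m+n$ and $s \neq t$). We will estimate the $\Sigma_{s,t}$ matrices in (4.5) by generalizing the method of GA.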
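The change of variables behind (4.3) and (4.4) can be sketched numerically. This is an illustration only: the function names are hypothetical, the entries of `d` are arbitrary non-zero numbers standing in for the constants defined in Chapter 3, and NumPy is assumed to be available.

```python
import numpy as np

def build_D(d):
    """Lower-triangular matrix of (4.3): row j holds d_1, ..., d_j, then zeros."""
    d = np.asarray(d, dtype=float)
    return np.tril(np.tile(d, (d.size, 1)))

def coef_cov_from_knot_cov(d, lam):
    """Covariance of the coefficients from the covariance of the knot values.

    Applies the continuous-covariate case of (4.4): D^{-1} Lambda (D^{-1})'.
    All entries of `d` must be non-zero so that D is invertible.
    """
    D_inv = np.linalg.inv(build_D(d))
    return D_inv @ np.asarray(lam, dtype=float) @ D_inv.T
```

Because the transformation is a congruence by a non-singular matrix, a positive-definite input covariance always gives a positive-definite output, which is the property used repeatedly in this chapter.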
Assume that the expert believes that $\underline{\beta}_s$ and $\underline{\beta}_t$ are correlated. For $s < t$, we must estimate the upper diagonal covariance submatrix $V_{s,t}$ of $V$, where

$$V = \text{Var}[(\underline{\beta}_s' \ \underline{\beta}_t')' \,|\, \alpha] = \begin{pmatrix} V_{s,s} & V_{s,t} \\ V_{t,s} & V_{t,t} \end{pmatrix}. \tag{4.7}$$

As a variance-covariance matrix is symmetric, $V_{t,s} = V_{s,t}'$. The correlation relationships are handled one pair at a time. Suppose we are currently interested only in the pair $\underline{\beta}_s$, $\underline{\beta}_t$, and that these are correlated in the prior distribution. (The same procedure can be followed for each pair that is correlated.)

For $s = 1, 2, \cdots, m+n$, $t = 1, 2, \cdots, m+n$, and $s < t$, let $\delta_{st} = \delta(s) + \delta(t)$, and for $k = 0, 1, \cdots, \delta_{st} - 1$, put

$$\Lambda_{st,k} = \begin{cases} \text{Var}(Y_{s,k+1}, \cdots, Y_{s,\delta(s)}, Y_{t,1}, \cdots, Y_{t,\delta(t)} \,|\, y^0_{s,0}, \cdots, y^0_{s,k}), & \text{for } 0 \leq k \leq \delta(s) - 1, \\ \text{Var}(Y_{t,k-\delta(s)+1}, \cdots, Y_{t,\delta(t)} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,k-\delta(s)}), & \text{for } \delta(s) \leq k \leq \delta_{st} - 1. \end{cases}$$

Specifying conditional values $y^0_{i,j}$ is equivalent to conditioning on the corresponding assessed medians as detailed in the previous chapter. We start with

$$\Lambda_{st,\delta_{st}-1} = \text{Var}(Y_{t,\delta(t)} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,\delta(t)-1}), \tag{4.8}$$

which can be computed from the conditional quartile assessments of the covariate $R_t$ at knot $\delta(t)$. The conditioning specifies the values of $Y$ at all previous knots of $R_t$ and all knots of $R_s$ as well. Given these conditions, the expert assesses conditional quartiles $m_{t,\delta(t),0.25}$ and $m_{t,\delta(t),0.75}$. The method of assessing these quartiles is detailed in Section 4.2.2. The formula for computing the variance ensures that $\Lambda_{st,\delta_{st}-1} > 0$, since

$$\Lambda_{st,\delta_{st}-1} = \left[ \frac{g(m_{t,\delta(t),0.75} \,|\, m^0_{s,0}, \cdots, m^0_{s,\delta(s)}, m^0_{t,0}, \cdots, m^0_{t,\delta(t)-1}) - g(m_{t,\delta(t),0.25} \,|\, m^0_{s,0}, \cdots, m^0_{s,\delta(s)}, m^0_{t,0}, \cdots, m^0_{t,\delta(t)-1})}{1.349} \right]^2. \tag{4.9}$$

We put

$$\Lambda_{st,k-1} = \begin{pmatrix} \phi_{st,k,k} & \underline{\phi}_{st,k}' \\ \underline{\phi}_{st,k} & \Phi_{st,k} \end{pmatrix}, \tag{4.10}$$

for $k = 1, 2, \cdots, \delta_{st}$, where $\phi_{st,k,k}$ is a scalar, $\underline{\phi}_{st,k}$ is a vector and $\Phi_{st,k}$ is a square matrix.
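The conversion in (4.9) from a pair of assessed quartiles to a variance can be sketched as follows. The sketch assumes a logit link for $g$; the function names are hypothetical. The constant 1.349 is the interquartile range of a standard normal distribution, so the bracketed ratio is the standard deviation of a normal variable on the link scale.

```python
import math

IQR_STD_NORMAL = 1.349  # interquartile range of N(0, 1), as used in (4.9)

def logit(p):
    return math.log(p / (1.0 - p))

def variance_from_quartiles(q25, q75, link=logit):
    """Variance on the link scale implied by lower and upper quartiles of Y.

    Requires q25 < q75, which makes the result strictly positive for any
    strictly increasing link function.
    """
    if not q25 < q75:
        raise ValueError("the lower quartile must be below the upper quartile")
    return ((link(q75) - link(q25)) / IQR_STD_NORMAL) ** 2
```

A narrower pair of quartiles gives a smaller variance, which is how the requirement that new quartiles lie inside the previous ones translates into decreasing conditional variances.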
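In particular, the scalar $\phi_{st,k,k}$ in (4.10) is given by:

$$\phi_{st,k,k} = \begin{cases} \text{Var}(Y_{s,k} \,|\, y^0_{s,0}, \cdots, y^0_{s,k-1}), & \text{for } 1 \leq k \leq \delta(s), \\ \text{Var}(Y_{t,k-\delta(s)} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,k-\delta(s)-1}), & \text{for } \delta(s)+1 \leq k \leq \delta_{st}. \end{cases} \tag{4.11}$$

Recall from the previous chapter that, for $j = k+1, \cdots, \delta(i)$,

$$\text{Var}(Y_{i,j} \,|\, y^0_{i,0}, \cdots, y^0_{i,k}) = \text{Var}(Y_{i,j} \,|\, y^0_{i,0}, \cdots, y^0_{i,k-1}) - \phi_{i,k,k}^{-1}\phi_{i,k,j}^2, \tag{4.12}$$

as a result of the theory about conditional multivariate normal distributions. Equation (4.12) can be generalized for the case where there are two correlated vectors of coefficients. Then, the vector $\underline{\phi}_{st,k}$ in (4.10) takes the form:

$$\underline{\phi}_{st,k} = (\phi_{st,k,k+1}, \cdots, \phi_{st,k,\delta_{st}})',$$

where

$$\phi_{st,k,j} = \begin{cases} \big[\phi_{st,k,k}\{\text{Var}(Y_{s,j} \,|\, y^0_{s,0}, \cdots, y^0_{s,k-1}) - \text{Var}(Y_{s,j} \,|\, y^0_{s,0}, \cdots, y^0_{s,k})\}\big]^{1/2}, & \text{for } 1 \leq k \leq \delta(s),\ j = k+1, \cdots, \delta(s), \\ \big[\phi_{st,k,k}\{\text{Var}(Y_{t,j-\delta(s)} \,|\, y^0_{s,0}, \cdots, y^0_{s,k-1}) - \text{Var}(Y_{t,j-\delta(s)} \,|\, y^0_{s,0}, \cdots, y^0_{s,k})\}\big]^{1/2}, & \text{for } 1 \leq k \leq \delta(s),\ j = \delta(s)+1, \cdots, \delta_{st}, \\ \big[\phi_{st,k,k}\{\text{Var}(Y_{t,j-\delta(s)} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,k-\delta(s)-1}) - \text{Var}(Y_{t,j-\delta(s)} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,k-\delta(s)})\}\big]^{1/2}, & \text{for } \delta(s)+1 \leq k \leq \delta_{st},\ j = k+1, \cdots, \delta_{st}. \end{cases} \tag{4.13}$$

The main constraint needed here is that conditioning on more values at each further step must reduce the value of a conditional variance. The expert must therefore reduce her uncertainty as the elicitation process progresses. It means that her assessments of each interquartile range must steadily decrease. This will ensure that, for $i = 1, 2, \cdots, m+n$, $j > k$, $1 \leq k \leq \delta_{st}$,

$$\text{Var}(Y_{i,j} \,|\, y^0_{i,0}, \cdots, y^0_{i,k-1}) > \text{Var}(Y_{i,j} \,|\, y^0_{i,0}, \cdots, y^0_{i,k}). \tag{4.14}$$

Conditional variances in (4.11) and (4.13) can be written in terms of the assessed conditional quartiles as

$$\text{Var}(Y_{s,j} \,|\, y^0_{s,0}, \cdots, y^0_{s,k}) = \left[ \frac{g(m_{s,j,0.75} \,|\, m^0_{s,0}, \cdots, m^0_{s,k}) - g(m_{s,j,0.25} \,|\, m^0_{s,0}, \cdots, m^0_{s,k})}{1.349} \right]^2, \quad \text{for } 0 \leq k \leq \delta(s),\ j = k+1, \cdots, \delta(s), \tag{4.15}$$

$$\text{Var}(Y_{t,j} \,|\, y^0_{s,0}, \cdots, y^0_{s,k}) = \left[ \frac{g(m_{t,j,0.75} \,|\, m^0_{s,0}, \cdots, m^0_{s,k}) - g(m_{t,j,0.25} \,|\, m^0_{s,0}, \cdots, m^0_{s,k})}{1.349} \right]^2, \quad \text{for } 0 \leq k \leq \delta(s),\ j = 1, \cdots, \delta(t), \tag{4.16}$$

$$\text{Var}(Y_{t,j} \,|\, y^0_{s,0}, \cdots, y^0_{s,\delta(s)}, y^0_{t,0}, \cdots, y^0_{t,k}) = \left[ \frac{g(m_{t,j,0.75} \,|\, m^0_{s,0}, \cdots, m^0_{s,\delta(s)}, m^0_{t,0}, \cdots, m^0_{t,k}) - g(m_{t,j,0.25} \,|\, m^0_{s,0}, \cdots, m^0_{s,\delta(s)}, m^0_{t,0}, \cdots, m^0_{t,k})}{1.349} \right]^2, \quad \text{for } 1 \leq k \leq \delta(t),\ j = 1, \cdots, \delta(t). \tag{4.17}$$

What is left to be estimated in (4.10) is the matrix $\Phi_{st,k}$, which can be computed, using the conditional multivariate normal theory, as

$$\Phi_{st,k} = \Lambda_{st,k} + \underline{\phi}_{st,k}\,\phi_{st,k,k}^{-1}\,\underline{\phi}_{st,k}'. \tag{4.18}$$

Hence, the matrix $\Lambda_{st,k-1}$ in (4.10) can be obtained from $\Lambda_{st,k}$, for $k = 1, 2, \cdots, \delta_{st} - 1$. Finally, $\Lambda_{st,0}$ is the result of applying the same routine recursively, starting with $\Lambda_{st,\delta_{st}-1}$ as in (4.8).

If $\Lambda_{st,0}$ is conformally partitioned as

$$\Lambda_{st,0} = \begin{pmatrix} \Lambda_{s,s} & \Lambda_{s,t} \\ \Lambda_{t,s} & \Lambda_{t,t} \end{pmatrix}, \tag{4.19}$$

then its submatrices can be used to obtain the required conformally partitioned matrix in (4.7), as follows. Take

$$V = \begin{pmatrix} V_{s,s} & V_{s,t} \\ V_{t,s} & V_{t,t} \end{pmatrix},$$

where $V_{s,s}$ is the variance of $\underline{\beta}_s$ given $\alpha$. Clearly, $V_{s,s} = \Sigma_{s,s}$ of equation (4.5), also $\Lambda_{s,0} = \Lambda_{s,s}$ of equation (4.19). Hence, from (4.2),

$$V_{s,s} = \begin{cases} D_s^{-1}\Lambda_{s,0}(D_s^{-1})', & \text{for } s = 1, 2, \cdots, m, \\ \Lambda_{s,0}, & \text{for } s = m+1, m+2, \cdots, m+n. \end{cases}$$

The submatrix $V_{s,t}$ is the covariance of $\underline{\beta}_s$ and $\underline{\beta}_t$ given $\alpha$, of the form

$$V_{s,t} = \begin{cases} D_s^{-1}\Lambda_{s,t}(D_t^{-1})', & \text{for } s = 1, 2, \cdots, m,\ t = 1, 2, \cdots, m, \\ D_s^{-1}\Lambda_{s,t}, & \text{for } s = 1, 2, \cdots, m,\ t = m+1, m+2, \cdots, m+n, \\ \Lambda_{s,t}, & \text{for } s = m+1, m+2, \cdots, m+n,\ t = m+1, m+2, \cdots, m+n. \end{cases}$$

Noting that $\Lambda_{t,t}$ in (4.19) is the conditional variance given $\underline{\beta}_s$ and $\alpha$, another version conditional only on $\alpha$ can be taken as

$$V_{t,t} = \begin{cases} D_t^{-1}\Lambda_{t,t}(D_t^{-1})' + V_{s,t}'V_{s,s}^{-1}V_{s,t}, & \text{for } t = 1, 2, \cdots, m, \\ \Lambda_{t,t} + V_{s,t}'V_{s,s}^{-1}V_{s,t}, & \text{for } t = m+1, m+2, \cdots, m+n. \end{cases}$$

With this construction, in Section 4.2.3 below, the matrix $V$ is shown to be positive-definite.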
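One step of the recursion in (4.10) and (4.18) can be sketched as below, assuming NumPy. The function names are hypothetical. Note one simplification that is an assumption of this sketch: `phi_magnitude` returns only the non-negative root in (4.13), so it recovers the size of a covariance but not its sign.

```python
import math
import numpy as np

def phi_magnitude(phi_kk, var_before, var_after):
    """Size of an off-diagonal element as in (4.13).

    var_before and var_after are the conditional variances of the same
    quantity before and after one extra conditioning value is added.
    """
    drop = var_before - var_after
    if drop < 0:
        raise ValueError("extra conditioning must not increase the variance")
    return math.sqrt(phi_kk * drop)

def expand_step(lam_k, phi_kk, phi):
    """Build the matrix on the left of (4.10) from the matrix one step later."""
    phi = np.asarray(phi, dtype=float).reshape(-1, 1)
    big_phi = np.asarray(lam_k, dtype=float) + (phi @ phi.T) / phi_kk  # (4.18)
    return np.block([[np.array([[phi_kk]]), phi.T], [phi, big_phi]])
```

Starting from the scalar in (4.8) and calling `expand_step` once per conditioning value, in reverse order, grows the matrix by one row and column each time until the full matrix of (4.19) is obtained. Each step preserves positive-definiteness as long as `phi_kk` is positive, because the input matrix is the Schur complement of `phi_kk` in the output.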
4.2.2 Assessment tasks and software description

The modified elicitation software PEGS-GLM (Correlated Coefficients), which is freely available at http://statistics.open.ac.uk/elicitation, elicits the expert's conditional quartiles that are needed to estimate the covariance matrix of a correlated pair of covariates. The mathematical details have been given in Section 4.2.1.

The expert is asked whether the regression coefficients of any pair of covariates are dependent in her prior distribution. If so, she will be asked to name the two variables that have such dependence. Then she will be shown a panel that simultaneously displays two graphs (see Figure 4.1 or Figure 4.2).

Figure 4.1: Assessments needed in the first phase for correlated covariates

The upper graph of the panel is for one variable of the correlated pair. It shows the previously assessed median values for that variable, denoted by $m^0_{i,j}$, $i = 1, 2, \cdots, m+n$, $j = 1, 2, \cdots, \delta(i)$, as in equations (4.9) and (4.15)-(4.17). The expert is asked to assume that these median values are the correct values of Y at the given knots. That is, they are accurate estimates of the mean response for the specified covariate values. Conditional on this information, the expert clicks on the lower interactive graph to assess new conditional quartile values, denoted by $m_{i,j,0.25}$ and $m_{i,j,0.75}$, $i = 1, 2, \cdots, m+n$, $j = 1, 2, \cdots, \delta(i)$, in equations (4.9) and (4.15)-(4.17).
The procedure consists of two phases. In the first phase the expert assesses quartile values for the variable in the lower graph given sets of medians for the variable in the upper graph. Specifically, these medians are denoted by $m^0_{s,0}, \cdots, m^0_{s,k}$ in equation (4.16). The set of conditioning values of the first variable in the upper graph is incremented by one extra value at each new step. The expert is asked to take account of the additional information and re-assess conditional quartiles. This gives the assessments denoted by $m_{t,j,0.25}$ and $m_{t,j,0.75}$ in equation (4.16).

Step 1 of the first phase is shown in Figure 4.1, where the expert is asked to assess conditional quartiles for different knots of the "Weight" variable in the lower graph conditioning on the previously assessed medians, $m^0_{s,0}$ and $m^0_{s,1}$, of the "Height" variable at its reference knot and one other knot. These two medians are connected by the rightmost (black) line in the upper graph. The conditioning set includes also the median of the "Weight" variable at its reference knot (23.0).

The upper and lower (red) curves in Figures 4.1 and 4.2 represent the previous quartile assessments conditioning on fewer medians. Current conditional quartiles are not allowed to lie outside these red lines. This fulfils condition (4.14), which guarantees the positive-definiteness of the variance-covariance matrix, as discussed before. Specifying these conditions by drawing boundary lines on the graph makes it easier for the expert to absorb what the conditional values are and what they imply. This helps her apply the idea of reducing uncertainty as conditions increase.

The second phase starts after conditioning on the median values at all knots in the top graph, denoted by $m^0_{s,0}, \cdots, m^0_{s,\delta(s)}$ in equation (4.17). Each further step in this second phase adds an extra median value from the lower graph to the conditioning set. These additional values are $m^0_{t,0}, \cdots, m^0_{t,k}$ in equation (4.17).
Further conditional quartiles $m_{t,j,0.25}$ and $m_{t,j,0.75}$ are assessed in the lower graph and used in equation (4.17).

Figure 4.2: Assessments needed in the second phase for correlated covariates

This phase is very similar to the assessment of conditional quartiles in the GA method, as reviewed in the previous chapter, where incremented sets of medians of the same variable are used as conditioning sets for assessing conditional quartiles. However, in this phase previously assessed median values at knots for a different variable ($R_s$) are also taken into consideration when assessing conditional quartiles of $R_t$, where $s < t$.

One of the steps of the second phase is shown in Figure 4.2. In this step, the expert is asked to assess conditional quartiles $m_{t,j,0.25}$ and $m_{t,j,0.75}$, for $j = 1, \cdots, 4$, for different knots of the "Weight" variable in the lower graph. Some of the conditioning values are the previously assessed medians, $m^0_{s,0}, \cdots, m^0_{s,3}$, of the "Height" variable at all of its four knots. These are connected by the black line in the upper graph. The other conditioning value is the median, $m^0_{t,0}$, of the "Weight" variable at its reference knot (23.0).

Suggested conditional quartiles are computed by extrapolating from other quartile assessments in the same manner as in the GA method; see the previous chapter. The middle (green) lines in the lower graph in Figure 4.2 represent these suggested values.
On finishing all phases of the assessment for this pair of explanatory variables, the user is asked about other correlated pairs, and the process starts again for the new pair, if there is one. The modified software outputs data in three different files: one containing the basic setup data, the second containing all assessments made by the expert, and the third containing the resulting mean vector and covariance matrix of the hyperparameter vector, which are in a form suitable for further Bayesian analysis.

4.2.3 On the positive-definiteness of the elicited covariance matrix

After generalizing GA's method, as shown in Section 4.2.1 above, to estimate the variance-covariance matrix of $\underline{\beta}_s$ and $\underline{\beta}_t$, we ended up with

$$V = \text{Var}((\underline{\beta}_s' \ \underline{\beta}_t')' \,|\, \alpha) = \begin{pmatrix} \Sigma_{s,s} & V_{s,t} \\ V_{s,t}' & V_{t,t} \end{pmatrix}, \tag{4.20}$$

where $\Sigma_{s,s}$ is estimated using the method of GA. Now $V_{t,t} \neq \Sigma_{t,t}$. Instead,

$$V_{t,t} = \Sigma^*_{t,t} + V_{s,t}'\Sigma_{s,s}^{-1}V_{s,t},$$

with

$$\Sigma^*_{t,t} = \text{Var}(\underline{\beta}_t \,|\, \underline{\beta}_s, \alpha) = \begin{cases} D_t^{-1}\Lambda_{t,t}(D_t^{-1})', & \text{for } t = 1, 2, \cdots, m, \\ \Lambda_{t,t}, & \text{for } t = m+1, m+2, \cdots, m+n. \end{cases}$$

To check the positive-definiteness of the variance-covariance matrix $\text{Var}((\underline{\beta}_s' \ \underline{\beta}_t')' \,|\, \alpha)$, we proceed as follows. First, we will show that $V$ in (4.20) is positive-definite. Then we will find a transformation to replace the sub-matrix $V_{t,t}$ of $V$ by the directly elicited unconditional variance matrix $\Sigma_{t,t}$. This transformation replaces $V$ with a new matrix, say $A$, which will be shown to be positive-definite.

Now, in the matrix $V$, we have:

• From (4.4) and (4.6), $\Sigma_{s,s}$ is positive-definite, since $\Lambda_{s,0}$ is positive-definite as shown in the previous chapter, and from (4.3), $D_s$ are lower triangular for $s = 1, 2, \cdots, m$.

• $\Sigma^*_{t,t}$ is positive-definite since it was computed in the manner of $\Sigma_{s,s}$, above.

• Since $\Sigma_{s,s}$ is positive-definite, so is $\Sigma_{s,s}^{-1}$, and $V_{s,t}'\Sigma_{s,s}^{-1}V_{s,t}$ is sure to be positive semi-definite. In fact, $\forall\, \underline{x} \neq \underline{0}$, $\underline{x}'V_{s,t}'\Sigma_{s,s}^{-1}V_{s,t}\underline{x} = (V_{s,t}\underline{x})'\Sigma_{s,s}^{-1}(V_{s,t}\underline{x}) \geq 0$, from the positive-definiteness of $\Sigma_{s,s}^{-1}$.

• $V_{t,t}$ is thus the sum of a positive-definite and a positive semi-definite matrix, hence $V_{t,t}$ is positive-definite.
For $V$ to be positive-definite, we use the Schur complement (Abadir and Magnus, 2005, p.228) to show that $V_{t,t} - V_{s,t}'\Sigma_{s,s}^{-1}V_{s,t} = \Sigma^*_{t,t}$ is positive-definite, which is the case.

We believe that the submatrix $\Sigma_{t,t}$ is better than $V_{t,t}$ as an estimate of $\text{Var}(\underline{\beta}_t \,|\, \alpha)$. Note that $V_{t,t}$ was computed by conditioning on both $\alpha$ and $\underline{\beta}_s$. Our aim now is to introduce a new matrix, $A$, conformally partitioned as

$$A = \begin{pmatrix} \Sigma_{s,s} & A_{s,t} \\ A_{s,t}' & \Sigma_{t,t} \end{pmatrix},$$

to replace $V$, where we believe $A$ will generally be a better estimate of the variance-covariance matrix of $(\underline{\beta}_s' \ \underline{\beta}_t')' \,|\, \alpha$. To this end, put

$$B = \begin{pmatrix} I & O \\ O & \Sigma_{t,t}^{1/2}V_{t,t}^{-1/2} \end{pmatrix},$$

and take $A = BVB'$. Then

$$A = \begin{pmatrix} I & O \\ O & \Sigma_{t,t}^{1/2}V_{t,t}^{-1/2} \end{pmatrix} \begin{pmatrix} \Sigma_{s,s} & V_{s,t} \\ V_{s,t}' & V_{t,t} \end{pmatrix} \begin{pmatrix} I & O \\ O & V_{t,t}^{-1/2}\Sigma_{t,t}^{1/2} \end{pmatrix} = \begin{pmatrix} \Sigma_{s,s} & V_{s,t}V_{t,t}^{-1/2}\Sigma_{t,t}^{1/2} \\ \Sigma_{t,t}^{1/2}V_{t,t}^{-1/2}V_{s,t}' & \Sigma_{t,t}^{1/2}V_{t,t}^{-1/2}V_{t,t}V_{t,t}^{-1/2}\Sigma_{t,t}^{1/2} \end{pmatrix} = \begin{pmatrix} \Sigma_{s,s} & A_{s,t} \\ A_{s,t}' & \Sigma_{t,t} \end{pmatrix},$$

with $A_{s,t} = V_{s,t}V_{t,t}^{-1/2}\Sigma_{t,t}^{1/2}$. We next investigate whether $A$ is necessarily positive-definite.

Since $\Sigma_{s,s}$ and $\Sigma_{t,t}$ are positive-definite, $A$ is positive-definite, using the Schur complement again, if and only if $\Sigma_{s,s} - A_{s,t}\Sigma_{t,t}^{-1}A_{s,t}'$ is positive-definite. But

$$\Sigma_{s,s} - A_{s,t}\Sigma_{t,t}^{-1}A_{s,t}' = \Sigma_{s,s} - \big(V_{s,t}V_{t,t}^{-1/2}\Sigma_{t,t}^{1/2}\big)\Sigma_{t,t}^{-1}\big(\Sigma_{t,t}^{1/2}V_{t,t}^{-1/2}V_{s,t}'\big) = \Sigma_{s,s} - V_{s,t}V_{t,t}^{-1}V_{s,t}'.$$

Thus $\Sigma_{s,s} - A_{s,t}\Sigma_{t,t}^{-1}A_{s,t}'$ is positive-definite from the positive-definiteness of the matrix $V$. It can be simply seen also from the matrix equation $A = BVB'$ that $A$ is positive-definite since $V$ is positive-definite and $B$ is non-singular (Abadir and Magnus, 2005, p.221).

Now, although each variance-covariance matrix $A$ for any pair of correlated vectors of coefficients has been shown to be positive-definite, some extra conditions must be imposed for the whole variance-covariance matrix $\Delta$ in (4.5) to be positive-definite. For that, a structural elicitation method should be applied to the whole matrix, in which case a huge number of conditional assessments will be needed to inter-relate all pairs, even though many of them may be only slightly correlated. This puts an extra assessment burden on the expert and there may be no real gain.
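The transformation by $B$ can be checked numerically with a short sketch, assuming NumPy and symmetric matrix square roots obtained from an eigendecomposition. The function names and the argument `ns` (the dimension of the first vector of coefficients) are hypothetical.

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, Q = np.linalg.eigh(np.asarray(M, dtype=float))
    return (Q * np.sqrt(w)) @ Q.T

def replace_lower_block(V, sigma_tt, ns):
    """Congruence B V B' that swaps the lower-right block of V for sigma_tt.

    V        : positive-definite covariance matrix of the pair of vectors
    sigma_tt : positive-definite matrix to install as the lower-right block
    ns       : number of rows (and columns) in the upper-left block
    """
    V = np.asarray(V, dtype=float)
    nt = V.shape[0] - ns
    T = sym_sqrt(sigma_tt) @ np.linalg.inv(sym_sqrt(V[ns:, ns:]))
    B = np.block([[np.eye(ns), np.zeros((ns, nt))],
                  [np.zeros((nt, ns)), T]])
    return B @ V @ B.T
```

The upper-left block is left unchanged, the lower-right block becomes `sigma_tt`, and the result stays positive-definite because `B` is non-singular.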
However, the power of this method is apparent when only one pair of vectors is highly correlated. Another good situation for its application is when there are only a few correlated pairs and the whole variance-covariance matrix can be re-arranged so that these are $2 \times 2$ partitioned matrices on the main diagonal and the off-diagonal covariance matrices are zeros. The whole matrix is sure to be positive-definite in this case. The expert should, of course, be willing to use the proposed method to elicit each main diagonal $2 \times 2$ partitioned matrix by assessing all the required conditional quartiles.

Although the variance-covariance matrix cannot be guaranteed to be positive-definite when there are many correlated pairs of vectors, it can still be checked for positive-definiteness. The expert may be asked to review her assessments, if needed, to fulfil the property. However, we propose another elicitation method in the next section that not only fulfils the positive-definiteness of $A$ in (4.5), but which also requires a smaller number of assessments. We also combine the two methods to give a flexible approach in which the expert assesses the variance-covariance matrix for the most highly correlated pair of vectors using the current method. She then assesses the relative correlation of other pairs of vectors in comparison with the most highly correlated pair of vectors. These relative correlations are scaled to give the whole matrix. The details of this approach are presented in the next two sections.

4.3 Another elicitation method for the variance-covariance matrix of correlated coefficients

One possible drawback of the elicitation method proposed in Section 4.2 is that the number of conditional quartiles that the expert must assess will become uncomfortably large if many pairs of covariates are thought to be correlated. For such situations, another method is proposed here to elicit the off-diagonal covariance matrices.
It uses a small number of coefficients to reflect the pattern of correlation between pairs of vectors and this reduces the number of assessments that are required. At the same time, the assessments can be used to induce all the elements of the covariance matrix and, under suitable conditions, the resulting variance-covariance matrix is positive-definite. These conditions can be translated into allowable ranges shown to the expert on an interactive graph; the expert will be asked to restrict her assessments so that conditional medians lie inside these ranges.

The mathematical details of the proposed method are given in Sections 4.3.1 and 4.3.2 below. The required assessments for the equations in these two sections are discussed in detail in Section 4.3.3, where the use of the interactive software to obtain the conditional medians is also discussed.

4.3.1 The case of two vectors of correlated coefficients

To reduce the number of required assessments for estimating the covariance matrix of any correlated vectors $\underline{\beta}_s$ and $\underline{\beta}_t$, we assume a fixed pattern of correlation between the elements of these two vectors. We must make some simplifying assumptions about the correlation between these vectors. If the variance-covariance matrix of $\underline{\beta}_s$ were the identity matrix and the same were true for $\underline{\beta}_t$, then it might be reasonable to assume that any component of $\underline{\beta}_s$ had the same correlation with each component of $\underline{\beta}_t$, and vice-versa.

Of course, the variances of $\underline{\beta}_s$ and $\underline{\beta}_t$ are not identity matrices. Instead, we transform $\underline{\beta}_s$ and $\underline{\beta}_t$ into $\underline{\xi}_s$ and $\underline{\xi}_t$, respectively, for which $\mathrm{Var}(\underline{\xi}_s) = I_{\delta(s)}$ and $\mathrm{Var}(\underline{\xi}_t) = I_{\delta(t)}$. Then we assume that the correlation coefficient between any element $\xi_{s,i}$ ($i = 1, 2, \ldots, \delta(s)$) of $\underline{\xi}_s$ and any element $\xi_{t,j}$ ($j = 1, 2, \ldots, \delta(t)$) of $\underline{\xi}_t$ is a fixed number, $c_{s,t}$. We elicit the value of $c_{s,t}$ using a small number of conditional assessments.
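The matrices $\mathrm{Var}(\underline{\beta}_s) = \Sigma_{s,s}$ and $\mathrm{Var}(\underline{\beta}_t) = \Sigma_{t,t}$ may be estimated using the method of GA that was reviewed in Chapter 3. These matrices are positive-definite, so there exist non-singular matrices $A$ and $B$ such that

\[ A\Sigma_{s,s}A' = I_{\delta(s)}, \qquad B\Sigma_{t,t}B' = I_{\delta(t)}. \]

In fact, we take $A$ and $B$ as the inverses of the two unique symmetric positive-definite square roots that can be obtained from the eigenvalue decomposition of $\Sigma_{s,s}$ and $\Sigma_{t,t}$, respectively, i.e.

\[ A = \Sigma_{s,s}^{-1/2}, \qquad B = \Sigma_{t,t}^{-1/2}. \]

Let $\underline{\xi}_s = A\underline{\beta}_s$ and $\underline{\xi}_t = B\underline{\beta}_t$; then $\mathrm{Var}(\underline{\xi}_s) = I_{\delta(s)}$ and $\mathrm{Var}(\underline{\xi}_t) = I_{\delta(t)}$. We assume that

\[ \mathrm{Cov}(\underline{\xi}_s, \underline{\xi}_t) = C_{s,t} = \begin{pmatrix} c_{s,t} & \cdots & c_{s,t} \\ \vdots & & \vdots \\ c_{s,t} & \cdots & c_{s,t} \end{pmatrix}_{\delta(s) \times \delta(t)}, \tag{4.21} \]

so that

\[ \begin{pmatrix} \underline{\xi}_s \\ \underline{\xi}_t \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} A\underline{b}_s \\ B\underline{b}_t \end{pmatrix}, \begin{pmatrix} I_{\delta(s)} & C_{s,t} \\ C_{s,t}' & I_{\delta(t)} \end{pmatrix} \right). \tag{4.22} \]

Assume further that

\[ E(\underline{\xi}_t \mid \underline{\xi}_s = A\underline{b}_s + \underline{\eta}_s) = B\underline{b}_t + \underline{\theta}_t, \tag{4.23} \]

where $\underline{\eta}_s = (\eta_s\ \eta_s\ \cdots\ \eta_s)' = \eta_s \underline{1}$, for an arbitrarily chosen value $\eta_s > 0$, and $\underline{\theta}_t = (\theta_t\ \theta_t\ \cdots\ \theta_t)' = \theta_t \underline{1}$.

But it is known, from conditional multivariate normal theory, that

\[ E(\underline{\xi}_t \mid \underline{\xi}_s = A\underline{b}_s + \underline{\eta}_s) = B\underline{b}_t + C_{s,t}'\big[(A\underline{b}_s + \underline{\eta}_s) - A\underline{b}_s\big] = B\underline{b}_t + C_{s,t}'\underline{\eta}_s. \tag{4.24} \]

Thus, from (4.23) and (4.24), we get

\[ \underline{\theta}_t = C_{s,t}'\underline{\eta}_s. \tag{4.25} \]

The expert will be asked to determine the conditional mean of $\underline{\xi}_t$ given a specific value of $\underline{\xi}_s$, hence the value of $\theta_t$ will be computed from the expert's assessment of $E(\underline{\xi}_t \mid \underline{\xi}_s)$. In fact, the expert assesses only conditional medians of $Y$, which are then transformed, under the normality assumption, into conditional means of the slopes of the piecewise-linear relation, or bar heights for factors, as will be detailed in Section 4.3.3.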
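The standardizing transformation and equation (4.25) can be illustrated with a short numerical sketch. It assumes NumPy; the function name `inv_sym_sqrt` and the matrices and values used are hypothetical, chosen only to make the example concrete.

```python
import numpy as np

def inv_sym_sqrt(sigma):
    """Inverse of the symmetric positive-definite square root of sigma."""
    vals, vecs = np.linalg.eigh(sigma)
    return (vecs / np.sqrt(vals)) @ vecs.T

# Hypothetical elicited variance matrices with delta(s) = 2 and delta(t) = 3.
Sigma_ss = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
Sigma_tt = np.array([[1.5, 0.3, 0.0],
                     [0.3, 1.0, 0.2],
                     [0.0, 0.2, 0.8]])

A = inv_sym_sqrt(Sigma_ss)   # A Sigma_ss A' = I
B = inv_sym_sqrt(Sigma_tt)   # B Sigma_tt B' = I

# Constant-correlation block C_{s,t} as in (4.21), with an assumed c_{s,t}.
c_st = 0.2
C_st = np.full((2, 3), c_st)

# Equation (4.25): theta_t = C_{s,t}' eta_s, with eta_s = eta * 1.
eta = 0.5
theta_t = C_st.T @ (eta * np.ones(2))
```

Every element of `theta_t` equals $\delta(s)\, c_{s,t}\, \eta_s$, so a single assessed value $\theta_t$ is enough to recover $c_{s,t}$ once $\eta_s$ has been fixed.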
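From (4.25), the value of $c_{s,t}$ is simply estimated as

\[ c_{s,t} = \frac{\theta_t}{\delta(s) \times \eta_s}. \tag{4.26} \]

It will be shown that $\mathrm{Var}\big((\underline{\xi}_s'\ \underline{\xi}_t')'\big)$ is a positive-definite matrix if, and only if,

\[ |c_{s,t}| < \frac{1}{\sqrt{\delta(s) \times \delta(t)}}. \tag{4.27} \]

Using (4.26), this condition can be written in terms of $\theta_t$ as

\[ |\theta_t| < \eta_s \sqrt{\frac{\delta(s)}{\delta(t)}}. \tag{4.28} \]

To prove (4.27), note that

\[ V = \mathrm{Var}\begin{pmatrix} \underline{\beta}_s \\ \underline{\beta}_t \end{pmatrix} = \mathrm{Var}\begin{pmatrix} \Sigma_{s,s}^{1/2}\,\underline{\xi}_s \\ \Sigma_{t,t}^{1/2}\,\underline{\xi}_t \end{pmatrix} = \begin{pmatrix} \Sigma_{s,s} & \Sigma_{s,s}^{1/2} C_{s,t} \Sigma_{t,t}^{1/2} \\ \Sigma_{t,t}^{1/2} C_{s,t}' \Sigma_{s,s}^{1/2} & \Sigma_{t,t} \end{pmatrix}. \]

Since $\Sigma_{s,s}$ and $\Sigma_{t,t}$ are both positive-definite matrices, $V$ is positive-definite, using the Schur complement, if and only if

\[ \Sigma_{s,s} - \Sigma_{s,s}^{1/2} C_{s,t} C_{s,t}' \Sigma_{s,s}^{1/2} = \Sigma_{s,s}^{1/2}\big(I_{\delta(s)} - C_{s,t}C_{s,t}'\big)\Sigma_{s,s}^{1/2} \tag{4.29} \]

is positive-definite, or equivalently

\[ \Sigma_{t,t} - \Sigma_{t,t}^{1/2} C_{s,t}' C_{s,t} \Sigma_{t,t}^{1/2} = \Sigma_{t,t}^{1/2}\big(I_{\delta(t)} - C_{s,t}'C_{s,t}\big)\Sigma_{t,t}^{1/2} \tag{4.30} \]

is positive-definite. In other words, from (4.29) or (4.30), $V$ is positive-definite if and only if

\[ \begin{pmatrix} I_{\delta(s)} & C_{s,t} \\ C_{s,t}' & I_{\delta(t)} \end{pmatrix} \]

is positive-definite. Let $F = I_{\delta(s)} - C_{s,t}C_{s,t}'$ and $G = \frac{1}{\delta(s)}\,\underline{1}\,\underline{1}'$, and note that $G$ is a symmetric idempotent matrix with $\mathrm{rank}(G) = \mathrm{trace}(G) = 1$. Then $F$ can be written as

\[ F = I_{\delta(s)} - C_{s,t}C_{s,t}' = I_{\delta(s)} - \delta(s)\delta(t)c_{s,t}^2\,G = I_{\delta(s)} - G + G - \delta(s)\delta(t)c_{s,t}^2\,G = \big(I_{\delta(s)} - G\big) + \big(1 - \delta(s)\delta(t)c_{s,t}^2\big)G = \alpha_1\big(I_{\delta(s)} - G\big) + \alpha_2 G, \]

with $\alpha_1 = 1$ and $\alpha_2 = 1 - \delta(s)\delta(t)c_{s,t}^2$. As both $G$ and $(I_{\delta(s)} - G)$ are idempotent matrices summing to $I_{\delta(s)}$, the eigenvalues of $F$ are $\alpha_1 = 1$, with multiplicity $\mathrm{rank}(I_{\delta(s)} - G) = \mathrm{trace}(I_{\delta(s)} - G) = \delta(s) - 1$, and $\alpha_2 = 1 - \delta(s)\delta(t)c_{s,t}^2$, with multiplicity one.

Hence, the necessary and sufficient condition for the matrix $F$, and consequently for $V$, to be positive-definite is that both $\alpha_1$ and $\alpha_2$ must be positive. Since $\alpha_1 = 1$, the matrix $V$ is positive-definite if and only if $1 - \delta(s)\delta(t)c_{s,t}^2 > 0$, which gives the condition (4.27).

The same condition can also be deduced from the quadratic form of the matrix $F$. First, recall, from Cauchy's inequality, that

\[ \Big(\sum_{i=1}^{n} x_i\Big)^2 \leq n \sum_{i=1}^{n} x_i^2. \]

Then, for all $\underline{x} \neq \underline{0}$,

\[ \begin{aligned} \underline{x}'F\underline{x} &= \sum_{i=1}^{\delta(s)} \big(1 - \delta(t)c_{s,t}^2\big)x_i^2 + \sum_{i \neq j} \big(-\delta(t)c_{s,t}^2\big)x_i x_j \\ &= \sum_{i=1}^{\delta(s)} x_i^2 - \delta(t)c_{s,t}^2 \sum_{i=1}^{\delta(s)}\sum_{j=1}^{\delta(s)} x_i x_j \\ &= \sum_{i=1}^{\delta(s)} x_i^2 - \delta(t)c_{s,t}^2 \Big(\sum_{i=1}^{\delta(s)} x_i\Big)^2 \\ &\geq \sum_{i=1}^{\delta(s)} x_i^2 - \delta(t)\delta(s)c_{s,t}^2 \sum_{i=1}^{\delta(s)} x_i^2 \\ &= \big(1 - \delta(s)\delta(t)c_{s,t}^2\big) \sum_{i=1}^{\delta(s)} x_i^2, \end{aligned} \]

with equality when all the $x_i$ are equal. Since $\sum_{i=1}^{\delta(s)} x_i^2 > 0$, $F$ is positive-definite if and only if $1 - \delta(s)\delta(t)c_{s,t}^2 > 0$.

4.3.2 The case of various vectors of correlated coefficients

When there are more than two correlated explanatory variables, the method given in Section 4.3.1 is still valid. We next obtain a set of $n(n-1)/2$ conditions that are necessary and sufficient for the full variance-covariance matrix to be positive-definite, for any number $n > 2$ of correlated explanatory variables. The number of assessments required for eliciting a variance-covariance matrix using this proposed method when $n > 2$ is only $n(n-1)/2$.

The case of $n = 2$ has been considered already. For $n = k > 2$ explanatory variables, let

\[ \underline{e}_i = E\begin{pmatrix} \underline{\beta}_1 \\ \vdots \\ \underline{\beta}_i \end{pmatrix}, \qquad V_i = \mathrm{Var}\begin{pmatrix} \underline{\beta}_1 \\ \vdots \\ \underline{\beta}_i \end{pmatrix}, \qquad \text{for } i = 1, 2, \ldots, k. \]

Assume that $V_i$, $i = 1, 2, \ldots, k-1$, have been obtained and that they are known to be positive-definite matrices. Let

\[ \underline{\zeta}_i = V_i^{-1/2}\begin{pmatrix} \underline{\beta}_1 \\ \vdots \\ \underline{\beta}_i \end{pmatrix}, \quad \text{for } i = 1, 2, \ldots, k-1, \qquad \underline{\xi}_k = \Sigma_{k,k}^{-1/2}\,\underline{\beta}_k, \]

with $\Sigma_{k,k} = \mathrm{Var}(\underline{\beta}_k)$. We assume that

\[ \mathrm{Cov}(\underline{\zeta}_{k-1}, \underline{\xi}_k) = C_k = \begin{pmatrix} C_{k,1} \\ C_{k,2} \\ \vdots \\ C_{k,k-1} \end{pmatrix}, \tag{4.31} \]

where $C_k$ is a matrix of order $\big(\sum_{i=1}^{k-1}\delta(i)\big) \times \delta(k)$, and each $C_{k,i}$ is a submatrix of order $\delta(i) \times \delta(k)$, taking the form

\[ C_{k,i} = \begin{pmatrix} c_{k,i} & \cdots & c_{k,i} \\ \vdots & & \vdots \\ c_{k,i} & \cdots & c_{k,i} \end{pmatrix}, \quad \text{for } i = 1, 2, \ldots, k-1. \tag{4.32} \]

Then

\[ \begin{pmatrix} \underline{\zeta}_{k-1} \\ \underline{\xi}_k \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} V_{k-1}^{-1/2}\,\underline{e}_{k-1} \\ \Sigma_{k,k}^{-1/2}\,\underline{b}_k \end{pmatrix}, \begin{pmatrix} I & C_k \\ C_k' & I_{\delta(k)} \end{pmatrix} \right). \tag{4.33} \]

Now suppose that

\[ E\big(\underline{\xi}_k \mid \underline{\zeta}_i = V_i^{-1/2}\underline{e}_i + \underline{\eta}_{k,i}\big) = \Sigma_{k,k}^{-1/2}\,\underline{b}_k + \underline{\theta}_{k,i}, \quad \text{for } i = 1, 2, \ldots, k-1, \tag{4.34} \]

where $\underline{\eta}_{k,i} = (\underline{\eta}_{k,1}'\ \cdots\ \underline{\eta}_{k,i}')'$, with $\underline{\eta}_{k,j} = (\eta_{k,j}\ \eta_{k,j}\ \cdots\ \eta_{k,j})' = \eta_{k,j}\underline{1}$, $j = 1, 2, \ldots, i$, for arbitrarily chosen $\eta_{k,j} > 0$, and $\underline{\theta}_{k,i} = (\theta_{k,i}\ \theta_{k,i}\ \cdots\ \theta_{k,i})' = \theta_{k,i}\underline{1}$.

The process will consist of $k-1$ steps; at the $i$th step, an elicited value of $E(\underline{\xi}_k \mid \underline{\zeta}_i = V_i^{-1/2}\underline{e}_i + \underline{\eta}_{k,i})$ will be obtained. This can be done by asking the expert to assess conditional median values of $Y$ that can be transformed, under the normality assumption, to conditional means of the slopes of the piecewise-linear relation, or bar heights for factors, as will be discussed in Section 4.3.3.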
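We can then obtain the conditional means $E(\underline{\xi}_k \mid \underline{\zeta}_i = V_i^{-1/2}\underline{e}_i + \underline{\eta}_{k,i})$ in (4.34) from $E(\underline{\beta}_k \mid \underline{\beta}_1, \underline{\beta}_2, \ldots, \underline{\beta}_i)$. The conditional values of $\underline{\beta}_1, \ldots, \underline{\beta}_i$ are presented through a set of $i$ graphs, each of which gives a value for a different displayed $\underline{\beta}_j$, $j = 1, 2, \ldots, i$.

Moreover, from conditional multivariate normal theory, equation (4.33) gives

\[ E\big(\underline{\xi}_k \mid \underline{\zeta}_i = V_i^{-1/2}\underline{e}_i + \underline{\eta}_{k,i}\big) = \Sigma_{k,k}^{-1/2}\,\underline{b}_k + \big(C_{k,1}'\ C_{k,2}'\ \cdots\ C_{k,i}'\big)\,\underline{\eta}_{k,i}. \tag{4.35} \]

Then, from (4.34) and (4.35), we get

\[ \underline{\theta}_{k,i} = \big(C_{k,1}'\ C_{k,2}'\ \cdots\ C_{k,i}'\big)\,\underline{\eta}_{k,i}. \tag{4.36} \]

Hence, after finishing the $k-1$ steps, the following system of equations can be formed:

\[ \begin{aligned} \theta_{k,1} &= \delta(1)c_{k,1}\eta_{k,1}, \\ \theta_{k,2} &= \delta(1)c_{k,1}\eta_{k,1} + \delta(2)c_{k,2}\eta_{k,2}, \\ &\ \ \vdots \\ \theta_{k,k-1} &= \delta(1)c_{k,1}\eta_{k,1} + \delta(2)c_{k,2}\eta_{k,2} + \cdots + \delta(k-1)c_{k,k-1}\eta_{k,k-1}. \end{aligned} \]

To solve for $c_{k,i}$, $i = 1, 2, \ldots, k-1$, the system can be written as

\[ \Pi \begin{pmatrix} c_{k,1} \\ c_{k,2} \\ \vdots \\ c_{k,k-1} \end{pmatrix} = \begin{pmatrix} \theta_{k,1} \\ \theta_{k,2} \\ \vdots \\ \theta_{k,k-1} \end{pmatrix}, \tag{4.37} \]

where

\[ \Pi = \begin{pmatrix} \delta(1)\eta_{k,1} & 0 & \cdots & 0 \\ \delta(1)\eta_{k,1} & \delta(2)\eta_{k,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \delta(1)\eta_{k,1} & \delta(2)\eta_{k,2} & \cdots & \delta(k-1)\eta_{k,k-1} \end{pmatrix}. \tag{4.38} \]

Provided that $\eta_{k,i} \neq 0$ for all $i = 1, 2, \ldots, k-1$, the matrix $\Pi$ is non-singular and hence

\[ \begin{pmatrix} c_{k,1} \\ c_{k,2} \\ \vdots \\ c_{k,k-1} \end{pmatrix} = \Pi^{-1} \begin{pmatrix} \theta_{k,1} \\ \theta_{k,2} \\ \vdots \\ \theta_{k,k-1} \end{pmatrix}. \tag{4.39} \]

Now, the variance-covariance matrix $V_k$ can be estimated as follows:

\[ V_k = \mathrm{Var}\begin{pmatrix} \underline{\beta}_1 \\ \vdots \\ \underline{\beta}_k \end{pmatrix} = \mathrm{Var}\begin{pmatrix} V_{k-1}^{1/2}\,\underline{\zeta}_{k-1} \\ \Sigma_{k,k}^{1/2}\,\underline{\xi}_k \end{pmatrix} = \begin{pmatrix} V_{k-1} & V_{k-1}^{1/2} C_k \Sigma_{k,k}^{1/2} \\ \Sigma_{k,k}^{1/2} C_k' V_{k-1}^{1/2} & \Sigma_{k,k} \end{pmatrix}. \tag{4.40} \]

We define the matrices $\Sigma_{i,k}$, for $i = 1, 2, \ldots, k-1$, that conformally partition $V_{k-1}^{1/2} C_k \Sigma_{k,k}^{1/2}$ as

\[ \begin{pmatrix} \Sigma_{1,k} \\ \Sigma_{2,k} \\ \vdots \\ \Sigma_{k-1,k} \end{pmatrix} = V_{k-1}^{1/2} \begin{pmatrix} C_{k,1} \\ C_{k,2} \\ \vdots \\ C_{k,k-1} \end{pmatrix} \Sigma_{k,k}^{1/2}. \tag{4.41} \]

Following the same steps as in the case $n = 2$, and since $V_{k-1}$ and $\Sigma_{k,k}$ are positive-definite matrices, equations similar to (4.29) and (4.30) show that $V_k$ is positive-definite if and only if the matrix

\[ \begin{pmatrix} I & C_k \\ C_k' & I_{\delta(k)} \end{pmatrix} \]

is positive-definite.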
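Because the matrix in (4.38) is lower triangular with identical columns below the diagonal, the system (4.37) can be solved by forward substitution. The following is a minimal sketch in plain Python; the function name `solve_correlations` and the example numbers are hypothetical.

```python
def solve_correlations(theta, delta, eta):
    """Solve the triangular system (4.37) for c_{k,1}, ..., c_{k,k-1}.

    theta[i] is the assessed theta_{k,i+1}, delta[i] is delta(i+1) and
    eta[i] is eta_{k,i+1}. Row i of the system reads
        theta[i] = sum_{j <= i} delta[j] * c[j] * eta[j].
    """
    c = []
    previous_theta = 0.0
    for th, d, e in zip(theta, delta, eta):
        if e == 0:
            raise ValueError("each eta must be non-zero for the system to be solvable")
        # Subtracting the previous row leaves only the new term delta * c * eta.
        c.append((th - previous_theta) / (d * e))
        previous_theta = th
    return c

# Hypothetical example with two conditioning vectors of sizes 2 and 3.
delta = [2, 3]
eta = [0.5, 0.4]
true_c = [0.1, -0.05]
theta = [
    delta[0] * true_c[0] * eta[0],
    delta[0] * true_c[0] * eta[0] + delta[1] * true_c[1] * eta[1],
]
recovered_c = solve_correlations(theta, delta, eta)
```

Each coefficient depends only on the difference between two successive assessed values, which is why no explicit matrix inverse is needed in this sketch.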
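Putting

\[ F_k = I_{\delta(k)} - C_k'C_k = I_{\delta(k)} - \delta(k)\sum_{i=1}^{k-1}\delta(i)c_{k,i}^2\,G_k, \]

where $G_k$ is idempotent of rank 1, it can be shown that

\[ F_k = \big(I_{\delta(k)} - G_k\big) + \Big(1 - \sum_{i=1}^{k-1}\delta(k)\delta(i)c_{k,i}^2\Big)G_k \]

is positive-definite if and only if

\[ 1 - \sum_{i=1}^{k-1}\delta(k)\delta(i)c_{k,i}^2 > 0. \tag{4.42} \]

This condition implies $k-1$ conditions for $c_{k,i}$, $i = 1, 2, \ldots, k-1$, of the form

\[ |c_{k,i}| < \sqrt{\frac{1 - \sum_{j=1}^{i-1}\delta(k)\delta(j)c_{k,j}^2}{\delta(i) \times \delta(k)}}. \tag{4.43} \]

These $k-1$ conditions guarantee that the elicited matrix $V_k$ in (4.40) is positive-definite, provided that $V_{k-1}$ is positive-definite. Since $V_2$ is known to be positive-definite from Section 4.3.1, we can use mathematical induction to prove that the full variance-covariance matrix $V_n$ is positive-definite, as follows.

For any number ($n > 2$) of correlated vectors $\underline{\beta}_1, \underline{\beta}_2, \ldots, \underline{\beta}_n$, the whole matrix

\[ V_n = \mathrm{Var}\begin{pmatrix} \underline{\beta}_1 \\ \underline{\beta}_2 \\ \vdots \\ \underline{\beta}_n \end{pmatrix} = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} & \cdots & \Sigma_{1,n} \\ \Sigma_{2,1} & \Sigma_{2,2} & \cdots & \Sigma_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma_{n,1} & \Sigma_{n,2} & \cdots & \Sigma_{n,n} \end{pmatrix} \]

is certain to be positive-definite if (a) the $n-1$ conditions in (4.43) hold and (b) $V_{n-1}$ is positive-definite. This imposes an extra $i-1$ conditions on each matrix $V_i$ ($i = 2, \ldots, n-1$), so that each $V_i$ is positive-definite. Then $V_n$ is positive-definite under a number of $\sum_{k=2}^{n}(k-1) = \sum_{k=1}^{n-1}k = n(n-1)/2$ conditions of the form:

\[ |c_{k,i}| < \sqrt{\frac{1 - \sum_{j=1}^{i-1}\delta(k)\delta(j)c_{k,j}^2}{\delta(i) \times \delta(k)}}, \quad i = 1, 2, \ldots, k-1, \quad k = 2, 3, \ldots, n. \tag{4.44} \]

Using these conditions, the range of each $\theta_{k,i}$, for $i = 1, 2, \ldots, k-1$, $k = 2, 3, \ldots, n$, can be computed and shown to the expert, who can ensure that her assessed values fall within these ranges. This will guarantee that the estimated variance-covariance matrix is positive-definite.

For $i = 1, 2, \ldots, k-1$, from (4.36) and (4.43), the range of $\theta_{k,i}$ is given by

\[ \sum_{j=1}^{i-1}\delta(j)c_{k,j}\eta_{k,j} \ \pm\ \eta_{k,i}\sqrt{\frac{\delta(i)\Big(1 - \sum_{j=1}^{i-1}\delta(k)\delta(j)c_{k,j}^2\Big)}{\delta(k)}}. \tag{4.45} \]

This formula for the allowable range of $\theta_{k,i}$ has a drawback: we cannot calculate these ranges until quite late in the assessment procedure, so the expert may sometimes be asked to revise assessments that she made some time earlier.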
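Condition (4.42) can be compared numerically with a direct positive-definiteness check of the block matrix built from $C_k$. The sketch below assumes NumPy; the helper names (`stacked_c`, `condition_442`, `block_is_pd`) and the block sizes and coefficients are hypothetical.

```python
import numpy as np

def stacked_c(c, delta, delta_k):
    """Stack the constant blocks C_{k,i} (delta(i) x delta(k)) into C_k."""
    return np.vstack([np.full((d, delta_k), ci) for ci, d in zip(c, delta)])

def condition_442(c, delta, delta_k):
    """Left-hand side of (4.42): 1 - sum_i delta(k) * delta(i) * c_{k,i}^2."""
    return 1.0 - sum(delta_k * d * ci ** 2 for ci, d in zip(c, delta))

def block_is_pd(c, delta, delta_k):
    """Check positive-definiteness of [[I, C_k], [C_k', I]] via its eigenvalues."""
    C = stacked_c(c, delta, delta_k)
    n = C.shape[0]
    M = np.block([[np.eye(n), C], [C.T, np.eye(delta_k)]])
    return bool(np.linalg.eigvalsh(M).min() > 0)

delta, delta_k = [2, 3], 2          # hypothetical block sizes
c_good = [0.2, 0.2]                 # satisfies (4.42)
c_bad = [0.4, 0.3]                  # violates (4.42)

good_value = condition_442(c_good, delta, delta_k)
bad_value = condition_442(c_bad, delta, delta_k)

# Smallest eigenvalue of F_k = I - C_k' C_k for the admissible coefficients.
C_good = stacked_c(c_good, delta, delta_k)
f_min = np.linalg.eigvalsh(np.eye(delta_k) - C_good.T @ C_good).min()
```

For the admissible coefficients the smallest eigenvalue of $F_k$ coincides with the left-hand side of (4.42), while the second set of coefficients fails both the scalar condition and the eigenvalue check.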
Hence, we decided to find a different approach that gives a more direct range for each $\theta_{k,i}$, and which only asks the expert to modify recent assessments that she has made. At step $i$, when conditioning on the value of $\underline{\xi}_i$, the expert may be asked to modify the assessment she has made in step $i-1$, but she will not be asked to modify assessments she gave at stages before that. This can be formulated as follows. Instead of equation (4.34), let

[Equations (4.46) and (4.47)]

where $\underline{\eta}_{k,j} = (\eta_{k,j}\ \eta_{k,j}\ \cdots\ \eta_{k,j})' = \eta_{k,j}\underline{1}$, $j = 1, 2, \ldots, i$, for arbitrarily chosen $\eta_{k,j} > 0$, and $\underline{\theta}_{k,i} = (\theta_{k,i}\ \theta_{k,i}\ \cdots\ \theta_{k,i})' = \theta_{k,i}\underline{1}$.

In this case, using standard results of conditional expectations, we get

[Equation (4.48), for $i = 1, 2, \ldots, k-1$]

Then equation (4.36) becomes

[Equation (4.49)]

which gives

[Equation (4.50)]

The positive-definiteness of the whole variance-covariance matrix $V_n$ is still guaranteed under the same conditions in (4.44). But the allowable range for each $\theta_{k,i}$ ($i = 1, 2, \ldots, k-1$; $k = 2, 3, \ldots, n$) has the simplified form

[Equation (4.51)]

This represents a simple range for $\theta_{k,i}$, in comparison with (4.45). The range in (4.51) depends only on the change $\eta_{k,i}$ in the $i$th variable, $\underline{\xi}_i$, not on the changes $\eta_{k,j}$ in all variables $\underline{\xi}_j$, $j = 1, 2, \ldots, i$, as in (4.45).

4.3.3 Assessment tasks

• The current assessment tasks start after eliciting all variance matrices $\Sigma_{i,i}$ ($i = 1, 2, \ldots, k$).

• For any pair of correlated vectors $(\underline{\beta}_s, \underline{\beta}_t)$, we assume that

\[ \mathrm{Cov}(\underline{\beta}_s, \underline{\beta}_t) = \Sigma_{s,s}^{1/2}\,C_{s,t}\,\Sigma_{t,t}^{1/2}, \tag{4.52} \]

where $C_{s,t}$ is given in (4.21) and $\Sigma_{s,s}$ and $\Sigma_{t,t}$ are the variances of $\underline{\beta}_s$ and $\underline{\beta}_t$, respectively.

• The expert will be shown a panel that simultaneously displays two graphs and a slider (see Figure 4.3). For continuous covariates, the upper graph of the panel shows the piecewise-linear relation between $Y$ and $X_s$. The slopes of the black (lower) curve represent $\underline{b}_s = E(\underline{\beta}_s)$, while the slopes of the blue (upper) curve represent the change of $E(\underline{\beta}_s)$ by $\Sigma_{s,s}^{1/2}\underline{\eta}_s$, i.e.
the slopes of the blue (upper) curve are $\underline{b}_s + \Sigma_{s,s}^{1/2}\underline{\eta}_s$. The black (lower) lines represent the expert's original median assessments but she is asked to suppose that the correct values are actually the blue (upper) lines. Given this information, the expert is asked to use the slider to change the position of the black (middle) curve in the lower panel so that it gives her new opinion about the median value that $Y$ will take as $X_t$ varies. The magnitude and direction of the change reflect the correlation between $\underline{\beta}_s$ and $\underline{\beta}_t$.

Figure 4.3: Assessments needed for two correlated variables

• The two red (outer) piecewise-linear curves in the lower panel of Figure 4.3 represent the allowable boundaries for the change of $\underline{\beta}_t$; these boundaries ensure that the resulting variance-covariance matrix is positive-definite. The boundaries are calculated from the condition given in equation (4.28). Moving the slider simultaneously changes the position of all the medians of $Y$ in the lower panel. When the expert is happy with the new position of the curve on the lower panel, the corresponding value of the slider is used to compute $c_{s,t}$, as will be shown later.

• The expert is asked to assume that the slopes of $X_s$, in the upper panel of Figure 4.3, have changed from $\underline{b}_s$ to $\underline{b}_s + \Sigma_{s,s}^{1/2}\underline{\eta}_s$. Conditional on this information, she revises the slopes of $X_t$, in the lower panel, changing them from $\underline{b}_t$ to $\underline{b}_t + \Sigma_{t,t}^{1/2}\underline{\theta}_t$. The expert changes all the slopes simultaneously using the slider.

• The size of the change, $\eta_s$, in the conditioning variable, $X_s$, in the upper panel, is chosen such that the vertical distances between the two piecewise-linear curves in the upper graph do not exceed the upper quartile at any of the knots of $X_s$. This ensures that the new conditioning values $\underline{b}_s + \Sigma_{s,s}^{1/2}\underline{\eta}_s$ are not too far from $\underline{b}_s$, as they have to be values that the expert finds plausible.
This choice is also not too close to $\underline{b}_s$, so it should prompt a measurable change in $\underline{b}_t$ in the lower panel of Figure 4.3.

The software calculates medians to draw a piecewise-linear curve with slopes $\underline{b}_s + \Sigma_{s,s}^{1/2}\underline{\eta}_s$. For $i = 1, 2, \ldots, \delta(s)$, the median value of $Y$ at each knot $i$, $m_{s,i,0.5}$, is changed to $m^{*}_{s,i,0.5}$, as follows. First, let

\[ m^{*}_{s,0,0.5} = m_{s,0,0.5}, \qquad d_{i,i-1} = r_{s,i} - r_{s,i-1}. \]

Then, for $i = 1, 2, \ldots, \delta(s)$, we put

\[ \frac{m^{*}_{s,i,0.5} - m^{*}_{s,i-1,0.5}}{d_{i,i-1}} = \frac{m_{s,i,0.5} - m_{s,i-1,0.5}}{d_{i,i-1}} + \eta_s\big(\Sigma_{s,s}^{1/2}\big)_i, \]

where $(\Sigma_{s,s}^{1/2})_i$ is the sum of the elements of the $i$th row of $\Sigma_{s,s}^{1/2}$. Hence,

\[ m^{*}_{s,i,0.5} = m^{*}_{s,i-1,0.5} + m_{s,i,0.5} - m_{s,i-1,0.5} + \eta_s\,d_{i,i-1}\big(\Sigma_{s,s}^{1/2}\big)_i. \tag{4.53} \]

If $X_s$ is a factor, then

\[ m^{*}_{s,i,0.5} = m_{s,i,0.5} + \eta_s\big(\Sigma_{s,s}^{1/2}\big)_i. \tag{4.54} \]

In view of (4.53) and (4.54), $\eta_s$ can be chosen as

\[ \eta_s = \min_i \left( \frac{m_{s,i,0.75} - m_{s,i,0.5}}{\sum_{j=1}^{i} d_{j,j-1}\big(\Sigma_{s,s}^{1/2}\big)_j} \right), \tag{4.55} \]

for continuous covariates. For factors, it can be chosen as

\[ \eta_s = \min_i \left( \frac{m_{s,i,0.75} - m_{s,i,0.5}}{\big(\Sigma_{s,s}^{1/2}\big)_i} \right). \tag{4.56} \]

In order to draw the red (outer) boundaries in Figure 4.3, we require upper and lower bounds, $m^{U}_{t,i,0.5}$ and $m^{L}_{t,i,0.5}$. From (4.28), if $X_t$ is a continuous covariate, we put

\[ m^{U}_{t,i,0.5} = m^{U}_{t,i-1,0.5} + m_{t,i,0.5} - m_{t,i-1,0.5} + \eta_s\sqrt{\frac{\delta(s)}{\delta(t)}}\;d_{i,i-1}\big(\Sigma_{t,t}^{1/2}\big)_i, \tag{4.57} \]

and

\[ m^{L}_{t,i,0.5} = m^{L}_{t,i-1,0.5} + m_{t,i,0.5} - m_{t,i-1,0.5} - \eta_s\sqrt{\frac{\delta(s)}{\delta(t)}}\;d_{i,i-1}\big(\Sigma_{t,t}^{1/2}\big)_i. \tag{4.58} \]

If $X_t$ is a factor, we put

\[ m^{U}_{t,i,0.5} = m_{t,i,0.5} + \eta_s\sqrt{\frac{\delta(s)}{\delta(t)}}\,\big(\Sigma_{t,t}^{1/2}\big)_i, \tag{4.59} \]

and

\[ m^{L}_{t,i,0.5} = m_{t,i,0.5} - \eta_s\sqrt{\frac{\delta(s)}{\delta(t)}}\,\big(\Sigma_{t,t}^{1/2}\big)_i. \tag{4.60} \]

Using the slider, in view of (4.27), the expert changes the value of $c_{s,t}$ between its two boundaries, $\pm 1/\sqrt{\delta(s) \times \delta(t)}$. To be interpretable by the expert, the slider presents a scaled range between $-1$ and $1$ as a measure of correlation between $\underline{\beta}_s$ and $\underline{\beta}_t$. Hence

\[ c_{s,t} = \frac{\text{the slider value}}{\sqrt{\delta(s) \times \delta(t)}}. \]

The corresponding new curve, say $m'_{t,i,0.5}$, changes interactively with each movement of the slider.
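The recursion (4.53) and the slider scaling described above can be sketched in a few lines of plain Python. The function names `shifted_medians` and `slider_to_c`, and the medians, knots and row sums used, are hypothetical values chosen only for illustration; the row sums stand in for the row sums of the square-root variance matrix.

```python
import math

def shifted_medians(medians, knots, row_sums, eta_s):
    """Apply the recursion (4.53) for a continuous covariate.

    medians[i] and knots[i] are the median of Y and the knot position at
    knot i = 0, ..., n; row_sums[i - 1] is the row sum attached to slope i.
    Each slope is increased by eta_s * row_sums[i - 1].
    """
    new = [medians[0]]                      # the curve is anchored at knot 0
    for i in range(1, len(medians)):
        d = knots[i] - knots[i - 1]
        new.append(new[i - 1] + medians[i] - medians[i - 1]
                   + eta_s * d * row_sums[i - 1])
    return new

def slider_to_c(slider, delta_s, delta_t):
    """Map a slider value in (-1, 1) to c_{s,t}, keeping it inside (4.27)."""
    if not -1.0 < slider < 1.0:
        raise ValueError("slider value must lie strictly between -1 and 1")
    return slider / math.sqrt(delta_s * delta_t)

# Hypothetical assessments at three knots.
new_medians = shifted_medians(medians=[10.0, 12.0, 15.0],
                              knots=[0.0, 1.0, 3.0],
                              row_sums=[1.0, 0.5],
                              eta_s=0.4)
c_st = slider_to_c(0.5, delta_s=2, delta_t=8)
```

Restricting the slider to the open interval keeps the resulting coefficient strictly inside the bound in (4.27), which is the design choice the interactive boundaries are meant to enforce.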
For continuous covariates, $m'_{t,i,0.5}$ is computed after $m'_{t,i-1,0.5}$ has been calculated:

\[ m'_{t,i,0.5} = m'_{t,i-1,0.5} + m_{t,i,0.5} - m_{t,i-1,0.5} + \delta(s)\,\eta_s\,c_{s,t}\,d_{i,i-1}\big(\Sigma_{t,t}^{1/2}\big)_i. \tag{4.61} \]

For factors,

\[ m'_{t,i,0.5} = m_{t,i,0.5} + \delta(s)\,\eta_s\,c_{s,t}\,\big(\Sigma_{t,t}^{1/2}\big)_i. \tag{4.62} \]

When the expert is happy with the new position of the curve, the value of $c_{s,t}$ is used in (4.21) and (4.52) to calculate the covariances between $\underline{\beta}_s$ and $\underline{\beta}_t$.

• For $k > 2$ correlated vectors of coefficients, the process will consist of $k-1$ steps. At the $i$th step, the expert will be asked to change the conditional medians of $(\underline{\beta}_k \mid \underline{\beta}_1, \underline{\beta}_2, \ldots, \underline{\beta}_i)$ by a value of $\theta_{k,i}$ given a set of $i$ graphs, each of which shows a change with a different fixed value $\eta_j$ for each $\underline{\beta}_j$, $j = 1, 2, \ldots, i$.

• However, we choose not to offer this general case as an option in the interactive software. Although it has been shown to have a consistent mathematical framework and adequate theoretical properties, as proposed in Section 4.3.2, its practical implementation may raise some critical issues in the elicitation process. Conditioning on simultaneous changes in many graphs for different variables gives too much information for an expert to readily absorb. She may not be able to assess the direct conditional impact of these changes on the variable of concern.

• Another difficulty arises in choosing the different values $\eta_j$, $j = 1, 2, \ldots, i$, that control the change in the conditioning set used in step $i$. These values must be carefully specified so that the resulting simultaneous change represents a valid combination of values that the expert finds acceptable to condition on.

• A general problem with successively enlarging the conditioning set is that the allowable range of medians for the variable of concern gets tighter as we approach the last variable in the list.
This problem is not only a practical one; it has also been shown that variances, and hence covariances, of the last variables in the list are usually overestimated by the expert due to incremental conditioning (Garthwaite, 1994).

These drawbacks constitute the motivation for the third elicitation method proposed in the next section.

4.4 A general flexible elicitation method for correlated coefficients

The aim here is to form an elicitation method suitable for GLMs that contain a large number of correlated vectors. We propose the following elicitation method as a promising new approach for eliciting the whole variance-covariance matrix. It uses only a small number of assessments that directly reflect the pattern of correlations between all pairs of vectors. The method avoids the previously mentioned disadvantages of using incremented conditioning sets of variables. Instead, the method treats all variables symmetrically. As with the method proposed in Section 4.3.1, it assumes a fixed correlation structure for the elements of each pair of vectors. The current method differs from the generalization proposed in Section 4.3.2 in that it avoids incremented conditioning and assesses all covariances simultaneously.

The main idea is that the expert assesses the relative magnitudes of the average correlations between each pair of vectors. She is asked to ensure that these weights reflect the strength of the average correlation of each pair relative to each other pair. The expert need not be conscious of conditions that are required for mathematical coherence. Instead, the assessed relative weights will be scaled to ensure that the assessed variance-covariance matrix is positive-definite.

The current method can be used alone or together with one of the two methods proposed before in this chapter.
In the latter case, the current method needs an assessment of the correlation of only one pair of vectors; all other correlations are then computed using the relative weights. This correlation assessment may be obtained using the method proposed in Section 4.2 or the method proposed in Section 4.3.1. With the latter method the expert might use a slider to adjust the slopes of one vector of a highly correlated pair. In what follows, the method is introduced in detail and the scaling needed to obtain a positive-definite matrix is also investigated.

Assuming that all the $k$ covariates are correlated, let

\[ \underline{\xi}_i = \Sigma_{i,i}^{-1/2}\,\underline{\beta}_i, \quad i = 1, 2, \ldots, k, \tag{4.63} \]

then

\[ \mathrm{Var}(\underline{\xi}_i) = I_{\delta(i)}, \quad i = 1, 2, \ldots, k. \tag{4.64} \]

For all $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, k$, $i \neq j$, we assume that

\[ \mathrm{Cov}(\underline{\xi}_i, \underline{\xi}_j) = C_{i,j} = \begin{pmatrix} c_{i,j} & \cdots & c_{i,j} \\ \vdots & & \vdots \\ c_{i,j} & \cdots & c_{i,j} \end{pmatrix}_{\delta(i) \times \delta(j)}, \tag{4.65} \]

with $c_{j,i} = c_{i,j}$. Then

\[ \mathrm{Cov}(\underline{\beta}_i, \underline{\beta}_j) = \Sigma_{i,i}^{1/2}\,C_{i,j}\,\Sigma_{j,j}^{1/2}, \tag{4.66} \]

and hence

\[ V = \mathrm{Var}\begin{pmatrix} \underline{\beta}_1 \\ \vdots \\ \underline{\beta}_k \end{pmatrix} = \Lambda_{\Sigma}^{1/2}\,C\,\Lambda_{\Sigma}^{1/2}, \tag{4.67} \]

where $\Lambda_{\Sigma}^{1/2}$ is a block-diagonal matrix with $\Sigma_{i,i}^{1/2}$ as the $i$th main diagonal block and

\[ C = \begin{pmatrix} I_{\delta(1)} & C_{1,2} & \cdots & C_{1,k} \\ C_{2,1} & I_{\delta(2)} & \ddots & \vdots \\ \vdots & \ddots & \ddots & C_{k-1,k} \\ C_{k,1} & \cdots & C_{k,k-1} & I_{\delta(k)} \end{pmatrix}, \tag{4.68} \]

with $C_{j,i} = C_{i,j}'$.

Since each $\Sigma_{i,i}^{1/2}$ is positive-definite, so is $\Lambda_{\Sigma}^{1/2}$. Hence, we can state that $V$ in (4.67) is positive-definite if and only if $C$ in (4.68) is positive-definite.

For $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, k$, $i \neq j$, let

\[ c_{i,j} = c\,w_{i,j}, \tag{4.69} \]

where $w_{i,j}$ are the relative weights to be assessed from the expert and $c > 0$ is a fixed scaling constant that is adjusted to ensure that $C$ is positive-definite.

The main assessment task with this method consists of one dialogue box. An example is shown in Figure 4.4. The expert assesses the relative magnitudes (weights) and signs of different correlations between all pairs of vectors. Since the correlation matrix must be symmetric, we just require the elements below the main diagonal to be assessed. Hence, when there are $n$ vectors of coefficients, we require $n(n-1)/2$ assessments for this stage.
The main diagonal elements are necessarily set equal to one, as $C$ is a correlation matrix.

[Figure 4.4: Assessments needed for five correlated variables. Dialogue box headed "Enter your relative correlations:", showing a lower-triangular grid of entry boxes for the covariates X1 to X5.]

The relative weights that are assessed in this task need not be coherent correlation coefficients. For example, they are not necessarily restricted to be between $-1$ and $1$. Instead, any assessed numbers are accepted; they must simply reflect the magnitude of the correlation between any pair of vectors relative to other pairs. Negative values are allowed and are appropriate when an expert believes a correlation is negative. The expert is asked to assess a single weight for each pair of vectors. The weight should reflect her opinion about the average correlation between all pairs of elements in that pair of vectors.

The relative weights assessed in Figure 4.4 will be denoted by $w^{*}_{i,j}$, where $w^{*}_{i,j}$ corresponds to the fixed average correlation between all elements of $\underline{\beta}_i$ and $\underline{\beta}_j$. The expert is asked to ensure that the relative magnitudes of $w^{*}_{i,j}$, $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, k$, $i > j$, model her opinion about the relative correlation of each pair compared to the others. As mentioned before, $w^{*}_{i,j}$ will be scaled later to attain mathematically coherent values of correlations.

For mathematical simplicity, we use the weights, $w_{i,j}$, of correlations between $\underline{\xi}_i$ and $\underline{\xi}_j$ when investigating the conditions required for the scaling constant $c$ in (4.69). However, we assess the weights $w^{*}_{i,j}$ in terms of $\underline{\beta}_i$ and $\underline{\beta}_j$, as the expert cannot think about correlations between the transformed vectors $\underline{\xi}_i$ and $\underline{\xi}_j$. Hence, we need an explicit relationship between $w_{i,j}$ and $w^{*}_{i,j}$. We obtain one as follows.

For $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, k$, $i > j$, let

\[ c^{*}_{i,j} = c\,w^{*}_{i,j} \tag{4.70} \]

be the scaled average correlation between $\underline{\beta}_i$ and $\underline{\beta}_j$.
Then

\[ c^{*}_{i,j} = \frac{\sum_{r=1}^{\delta(i)}\sum_{s=1}^{\delta(j)}\big[\mathrm{Cov}(\beta_{i,r}, \beta_{j,s})/\sigma_r\sigma_s\big]}{\delta(i)\delta(j)}, \tag{4.71} \]

where, as in (4.66), $\mathrm{Cov}(\beta_{i,r}, \beta_{j,s})$ is the $(r,s)$ element of $\mathrm{Cov}(\underline{\beta}_i, \underline{\beta}_j)$, and $\sigma_r$ and $\sigma_s$ are the square roots of the $r$th and $s$th main diagonal elements of $\Sigma_{i,i}$ and $\Sigma_{j,j}$, respectively. Hence, from (4.65), (4.66) and (4.71),

\[ c^{*}_{i,j} = c_{i,j}\,\frac{\sum_{r=1}^{\delta(i)}\sum_{s=1}^{\delta(j)}\big[a_{r,s}/\sigma_r\sigma_s\big]}{\delta(i)\delta(j)}, \tag{4.72} \]

where $a_{r,s}$ is the $(r,s)$ element of $\Sigma_{i,i}^{1/2}\,\underline{1}_{\delta(i)}\underline{1}_{\delta(j)}'\,\Sigma_{j,j}^{1/2}$, i.e.

\[ c_{i,j} = c^{*}_{i,j}\,\frac{\delta(i)\delta(j)}{\sum_{r=1}^{\delta(i)}\sum_{s=1}^{\delta(j)}\big[a_{r,s}/\sigma_r\sigma_s\big]}. \tag{4.73} \]

So, in view of (4.69) and (4.70), we have

\[ w_{i,j} = w^{*}_{i,j}\,\frac{\delta(i)\delta(j)}{\sum_{r=1}^{\delta(i)}\sum_{s=1}^{\delta(j)}\big[a_{r,s}/\sigma_r\sigma_s\big]}. \tag{4.74} \]

It remains now to investigate the allowable range for the positive scaling constant $c$, so that $C$ in (4.68), and consequently $V$ in (4.67), are positive-definite.

First, from (4.69), we write $C$ in (4.68) as

\[ C = I + cW, \tag{4.75} \]

where $I$ is the identity matrix of order $\sum_{j=1}^{k}\delta(j)$, $W$ is a conformally partitioned matrix with zero block matrices on the main diagonal, and all the elements of each $(i,j)$ off-diagonal block are equal to $w_{i,j}$.

Let $\lambda_{W,i}$, $i = 1, 2, \ldots, \sum_{j=1}^{k}\delta(j)$, be the eigenvalues of $W$. We have that

\[ \min_i(\lambda_{W,i}) < 0, \]

since if not, $W$, with zero main diagonal elements, will be a nonnegative-definite matrix, in which case

\[ w_{i,j}^2 \leq w_{i,i}w_{j,j} = 0, \quad \forall i \neq j, \]

which is true if and only if $W$ is a zero matrix.

Since $I$ and $W$ are symmetric, $C$ in (4.75) is positive-definite if and only if all its eigenvalues, say $\lambda_{C,i}$, $i = 1, 2, \ldots, \sum_{j=1}^{k}\delta(j)$, are strictly greater than zero. But

\[ \lambda_{C,i} = 1 + c\,\lambda_{W,i}, \quad i = 1, 2, \ldots, \sum_{j=1}^{k}\delta(j). \tag{4.76} \]

Consequently, $C$ is positive-definite if and only if $\min_i(\lambda_{C,i}) > 0$, i.e. if and only if

\[ c < \frac{-1}{\min_i(\lambda_{W,i})}. \tag{4.77} \]

The condition in (4.77) guarantees that $C$ and $V$ are positive-definite, and also that $c_{i,j} = c\,w_{i,j}$, $i \neq j$, are coherent correlation values, since, from the positive-definiteness of $C$,

\[ c_{i,j}^2 < c_{i,i}\,c_{j,j} = 1. \]

The software obtains the value of $\min_i(\lambda_{W,i})$ using the eigenvalue decomposition of the matrix $W$.
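The eigenvalue step and the resulting bound (4.77) can be sketched as follows. The code assumes NumPy; the function names `weight_matrix` and `c_upper_bound`, the block sizes and the weights are hypothetical, and the sketch is not the thesis software.

```python
import numpy as np

def weight_matrix(w, delta):
    """Build W: zero diagonal blocks and constant off-diagonal blocks w[i][j]."""
    k = len(delta)
    blocks = [[np.zeros((delta[i], delta[i])) if i == j
               else np.full((delta[i], delta[j]), w[i][j])
               for j in range(k)] for i in range(k)]
    return np.block(blocks)

def c_upper_bound(W):
    """Upper bound on the scaling constant c from (4.77)."""
    lam_min = np.linalg.eigvalsh(W).min()
    if lam_min >= 0:
        raise ValueError("W must be a non-zero symmetric matrix with zero diagonal blocks")
    return -1.0 / lam_min

# Hypothetical symmetric weights for k = 3 vectors of sizes 2, 1 and 2.
delta = [2, 1, 2]
w = [[0.0, 1.0, -2.0],
     [1.0, 0.0, 0.5],
     [-2.0, 0.5, 0.0]]

W = weight_matrix(w, delta)
bound = c_upper_bound(W)

# Any c strictly below the bound gives a positive-definite C = I + cW.
n = W.shape[0]
min_eig_inside = np.linalg.eigvalsh(np.eye(n) + 0.99 * bound * W).min()
min_eig_outside = np.linalg.eigvalsh(np.eye(n) + 1.01 * bound * W).min()
```

Scaling just inside the bound leaves the smallest eigenvalue of $C$ positive, while scaling just outside makes it negative, which is the behaviour that (4.76) predicts.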
Then the boundary of $c$ in (4.77) is computed.

With the software, different options are available to the expert for assessing a value of $c$ that fulfils condition (4.77). The default option is to use a slider. The expert chooses the value of $c$ that represents her opinion on the basis of interactive graphs. Specifically, the software displays a panel with $k$ graphs, as illustrated in Figure 4.5.

[Figure 4.5: Assessments needed for various correlated variables. Panel instruction: "Given the changes of (Y) on the upper panel, give your new assessments on all the lower panels using the slider." Example lower-panel title: "Medians of (Y) at values of (X4) conditional on the above changes of (X1)."]

The upper graph shows the slopes for one continuous covariate after each of its slopes has been changed by a fixed amount, $\eta$. This covariate is one of the most highly correlated pair of vectors. In the same manner as in Section 4.3.3, the expert is asked to assess the new medians of all other $k-1$ covariates (factors) given the change in the above graph. Apart from the condition in (4.77), other equations needed for drawing the graphs are exactly as in Section 4.3.3.

Instead of using the slider and all graphs in Figure 4.5, another two options are also available to the expert after assessing the relative weights $w^{*}_{i,j}$. As the first option, the expert can choose to use the method proposed in Section 4.2 to elicit different covariances for the elements of the most highly correlated pair, say $\underline{\beta}_s$ and $\underline{\beta}_t$. An averaging argument as in (4.71) is then used to get $c^{*}_{s,t}$. As the second option, the expert might use the method proposed in Section 4.3.1 to obtain $c^{*}_{s,t}$. In both cases, the value of $c$ may be taken, for a small $\epsilon > 0$, as

\[ c = \min\left( \frac{c^{*}_{s,t}}{w^{*}_{s,t}},\ \frac{-1}{\min_i(\lambda_{W,i})} - \epsilon \right). \tag{4.78} \]

The expert may choose the option that suits her most. For example, the option that combines the current method with the one in Section 4.2.1 is flexible, although it requires more conditional assessments.
However, we favour the default option as it gives the expert a good chance to see how all the other covariates are affected by her choice of $c$. The expert can, of course, go back in the software to change her assessed values of $w^{*}_{i,j}$, if she finds that the allowable range of $c$ is not a reasonable representation of her opinion.

4.5 Concluding comments

Three different methods for eliciting expert opinion about the variance-covariance matrix of correlated coefficients in GLMs have been proposed. The first method is the most flexible for modelling correlations between pairs of vectors; it is a good method if correlations are only substantial between a few pairs of variables, while the other correlations are near zero. However, it needs many assessments if there are many variables that are inter-related, and the number may become uncomfortably large. The positive-definiteness of the resulting matrix has only been investigated in the case of two vectors of correlated coefficients. No clear conditions have been investigated for the positive-definiteness of the whole matrix if many vectors of coefficients are thought to be correlated.

The second proposed method requires fewer assessments and has been shown to be a valid method for any number of vectors of correlated coefficients. Also, the required conditions for positive-definiteness of the covariance matrix in this method have been investigated. These were translated into boundaries for conditional assessments on the interactive graphs, which helps the expert fulfil the conditions. The disadvantage of the method is that it makes strong assumptions about the correlation structure between two vectors of coefficients, and sometimes the assumptions will be inappropriate.

The third proposed method requires a smaller number of assessments.
For $n > 2$ correlated vectors of coefficients, the expert is required to make only $n(n-1)/2$ assessments of relative magnitudes of correlations between pairs of vectors. This leads to coherent estimates of correlations and a scaled variance-covariance matrix that is guaranteed to be positive-definite. The conditional medians that are needed can easily be assessed by the expert by moving a single slider in the available user-friendly software. The method has been shown to give flexible options to the expert as an extension of the first or the second proposed methods. This third method is very promising. It also avoids incremented conditioning and treats all covariates symmetrically.

Chapter 5

Eliciting prior distributions for extra parameters of some GLMs

5.1 Introduction

So far, we have completed the process of eliciting the multivariate prior distribution for the vector of regression coefficients of any GLM. However, in some common GLMs, such as the normal and gamma regression models, the regression parameters are not the only parameters in the sampling model. The other parameters in these GLMs must either be assumed known, or expert opinion about them must be quantified in a suitable way.

In normal GLMs, prior opinion about regression coefficients can be quantified using the methods discussed in the previous two chapters. However, prior opinion about the error variance in normal GLMs must also be quantified to complete the prior distribution of all the model parameters. Only a limited number of elicitation methods for the error variance in normal linear models have been proposed in the literature. See, for example, Kadane et al. (1980), Garthwaite and Dickey (1988) and Ibrahim and Laud (1994). However, these available methods have been criticized for using assessment tasks that the expert may not be very good at performing (Garthwaite et al., 2005).
The method of Garthwaite and Dickey (1988) elicits a conjugate inverted chi-squared prior distribution for the error variance through conditional assessments that depend only upon the experimental error. The expert is required to assess her median of the absolute difference between two observed values of the response variable at the same design point. Then a conditional median of the same difference is assessed given a set of hypothetical data. These two assessments are sufficient to elicit the two hyperparameters of the inverted chi-squared prior of the normal error variance. However, it is better to specify several data sets and obtain a conditional median for each data set, so that the different assessments can be reconciled to elicit the two hyperparameters. In this chapter, we propose an elicitation method based on more than one data set of hypothetical future samples.

The second task addressed in this chapter is to assess prior distributions for the shape parameter of a gamma distribution and the scale parameter of gamma GLMs. Prior distributions for these parameters have been proposed in the literature [see for example Miller (1980), West (1985) or Chen and Ibrahim (2003)], but no prior elicitation method for these parameters has been suggested. To fill this gap, we propose a new method for eliciting lognormal prior distributions for such parameters. The proposed method is based on conditional quartile assessments given that the mean of the gamma distribution is known or has already been elicited.

In Section 5.2 of this chapter, we extend the method of Garthwaite and Dickey (1988) for eliciting the variance of random errors in normal GLMs. A novel method for eliciting a lognormal prior distribution for the scale parameter in gamma GLMs is proposed in Section 5.3.
The two methods have been implemented as extra options in our elicitation software PEGS-GLM (Correlated Coefficients) that is freely available at http://statistics.open.ac.uk/elicitation.

5.2 Eliciting a prior distribution for the error variance in normal GLMs

The method of Garthwaite and Dickey (1988) is based on conditional assessments that depend only on the random error to elicit a conjugate inverted chi-squared prior distribution for the normal error variance. In their method, the expert is asked to assume that two observations are taken at the same design point. Then she assesses her median of their absolute difference; the two observations differ only because of random variation. The method has also been used to quantify experts' opinion about multivariate normal distributions [Al-Awadhi and Garthwaite (1998, 2001), Garthwaite and Al-Awadhi (2001)]. However, it has been criticized for eliciting only the minimum number of assessments that are required to determine the hyperparameters. To overcome this, Garthwaite et al. (2005) suggested that it is a good idea to elicit more than one estimate of the hyperparameters and then to reconcile these estimates in some way.

The aim of this section is to extend the method of Garthwaite and Dickey (1988) by increasing the size and frequency of the hypothetical (virtual) sample data that are used as the conditioning set on which the expert modifies her opinion. Our extended method is designed to elicit a conjugate prior for the error variance in normal GLMs. This will complete the prior distribution structure of these models when the prior distribution of their regression coefficients is elicited using the piecewise-linear model discussed in the previous chapters.
However, the method developed here can be used to elicit the prior distribution of the error variance in any normal model where the prior distribution of its regression coefficients is totally known or has been elicited using any other elicitation method.

The theoretical derivation of the proposed extension is detailed in Section 5.2.1. The implementation of the method has been programmed as a new option in the PEGS-GLM (Correlated Coefficients) software. The assessment tasks and the description of the procedure that implements our proposed method are discussed in Section 5.2.2.

5.2.1 The mathematical framework and notations

The normal GLM assumes that the link function $g(\cdot)$ in (3.3) is the identity link function, which means, in view of (3.2), that

$$\zeta = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{m+n} X_{m+n} + \epsilon, \tag{5.1}$$

where $\epsilon$ is assumed to be a normal random error with zero mean and an unknown variance $\sigma_\epsilon^2$, i.e.

$$\epsilon \sim N(0, \sigma_\epsilon^2). \tag{5.2}$$

A conjugate prior for $\sigma_\epsilon^2$ is the inverted chi-squared distribution [see, for example, Pratt et al. (1995), Kadane et al. (1980) or Garthwaite and Dickey (1988)]. Equivalently, we assume that

$$\sigma_\epsilon^2 \sim \text{Inverted Gamma}(\nu/2, \nu w/2), \tag{5.3}$$

with probability density function

$$f(\sigma_\epsilon^2) = \frac{(\nu w/2)^{\nu/2}}{\Gamma(\nu/2)} \left(\frac{1}{\sigma_\epsilon^2}\right)^{\frac{\nu}{2}+1} \exp\left\{-\frac{\nu w}{2\sigma_\epsilon^2}\right\}, \quad \sigma_\epsilon^2, \nu, w > 0. \tag{5.4}$$

The aim now is to elicit the values of the hyperparameters $\nu$ and $w$ of the pdf in (5.4). To attain this, the expert should preferably be asked to assess values that depend only on the random variation. For that, the method of Garthwaite and Dickey (1988) requires the expert to assess a median value, say $q_0$, of the absolute difference, $|\zeta_1 - \zeta_2|$, between two observed values of the response variable $\zeta$ at the same design point $(X_1, X_2, \cdots, X_{m+n})$. The expert is then asked to assume that the true value of this absolute difference is a suggested value $z$. Given this piece of information, she gives her new median assessment, say $q_1$, of the absolute difference between two observations for any new hypothetical experiments at the same design point $(X_1, X_2, \cdots, X_{m+n})$.
The difference between $q_0$ and the new median assessment, $q_1$, reflects the expert's confidence in her first median assessment $q_0$. Then both $q_0$ and $q_1$ were used in Garthwaite and Dickey (1988) to calculate the two hyperparameters $\nu$ and $w$.

To extend their method, instead of conditioning on only one hypothetical datum $z$, we repeat the assessment of the conditional median over $s$ steps. At each step, the condition is on a steadily increasing set of hypothetical data representing the response differences for pairs of experiments at the same design point. At each step $j$, $j = 1, 2, \cdots, s$, the expert is asked to assume that a number $k(j) = 2^{j-1}$ of experiment pairs at the same design point has given a hypothetical data set of absolute differences, $z_1, z_2, \cdots, z_{k(j)}$. She is then asked to give her conditional median $q_j$ of the absolute response difference of a new pair of experiments at the same design point. In what follows, we show how to use these assessments to estimate a number of elicited values that can be reconciled to give a better assessment of $\nu$ and $w$.

For $i = 1, \cdots, k$, where $k \geq 1$ is any integer, let $Z_i$ be the difference between the two observed values, $\zeta_{i,1}$ and $\zeta_{i,2}$, of the response variable $\zeta$ in any two experiments at the same design point $(X_1, \cdots, X_{m+n})$, i.e. $Z_i = \zeta_{i,1} - \zeta_{i,2}$. Clearly, from (5.1) and (5.2), given $\sigma_\epsilon^2$, the random variables $Z_1, \cdots, Z_k$ are independent and identically distributed normal variates, i.e. for $i = 1, 2, \cdots, k$,

$$(Z_i \mid \sigma_\epsilon^2) \sim N(0, 2\sigma_\epsilon^2), \tag{5.5}$$

with the joint distribution

$$f(z_1, \cdots, z_k \mid \sigma_\epsilon^2) = (4\pi\sigma_\epsilon^2)^{-k/2} \exp\left\{-\frac{1}{4\sigma_\epsilon^2}\sum_{i=1}^{k} z_i^2\right\}, \quad -\infty < z_i < \infty,\ \sigma_\epsilon^2 > 0. \tag{5.6}$$

From (5.4) and (5.6), the joint distribution of $Z_1, \cdots, Z_k$ and $\sigma_\epsilon^2$ is given by

$$f(z_1, \cdots, z_k, \sigma_\epsilon^2; \nu, w) = \frac{(\nu w/2)^{\nu/2}}{\Gamma(\nu/2)(4\pi)^{k/2}} \left(\frac{1}{\sigma_\epsilon^2}\right)^{\frac{\nu+k}{2}+1} \exp\left\{-\frac{1}{4\sigma_\epsilon^2}\left[\sum_{i=1}^{k} z_i^2 + 2\nu w\right]\right\}, \tag{5.7}$$

for $-\infty < z_i < \infty$ and $\sigma_\epsilon^2, \nu, w > 0$. Integrating $\sigma_\epsilon^2$ out from the RHS of (5.7), we get

$$f(z_1, \cdots, z_k \mid \nu, w) = \frac{\Gamma((\nu+k)/2)}{\Gamma(\nu/2)\,[\pi\nu(2w)]^{k/2}} \left[1 + \frac{1}{\nu(2w)}\sum_{i=1}^{k} z_i^2\right]^{-\frac{\nu+k}{2}}, \quad -\infty < z_i < \infty,\ \nu, w > 0, \tag{5.8}$$

which is the $k$-variate version of the general three-parameter Student-t distribution with $\nu$ degrees of freedom, zero mean vector and a diagonal scale matrix $2wI_k$, where $I_k$ is the identity matrix of order $k$, i.e.

$$(Z_1, \cdots, Z_k) \sim t_\nu(\mathbf{0}, 2wI_k). \tag{5.9}$$

Now, the conditional distribution of $\sigma_\epsilon^2$ given $Z_1 = z_1, \cdots, Z_k = z_k$ can be obtained by dividing the RHS of (5.7) by that of (5.8) to get

$$f(\sigma_\epsilon^2 \mid Z_1 = z_1, \cdots, Z_k = z_k; \nu, w) = \frac{\left[\tfrac{1}{2}\left(\nu w + \tfrac{1}{2}\sum_{i=1}^{k} z_i^2\right)\right]^{(\nu+k)/2}}{\Gamma((\nu+k)/2)} \left(\frac{1}{\sigma_\epsilon^2}\right)^{\frac{\nu+k}{2}+1} \exp\left\{-\frac{1}{2\sigma_\epsilon^2}\left[\nu w + \frac{1}{2}\sum_{i=1}^{k} z_i^2\right]\right\}, \quad \sigma_\epsilon^2, \nu, w > 0. \tag{5.10}$$

Since the inverted gamma distribution is a conjugate prior for $\sigma_\epsilon^2$, comparing (5.10) with (5.4), we can write

$$(\sigma_\epsilon^2 \mid Z_1 = z_1, \cdots, Z_k = z_k) \sim \text{Inverted Gamma}\left(\frac{\nu+k}{2}, \frac{(\nu+k)\,w_k}{2}\right), \tag{5.11}$$

where

$$w_k = \frac{\nu w + \frac{1}{2}\sum_{i=1}^{k} z_i^2}{\nu + k}. \tag{5.12}$$

For $j = 0, 1, \cdots, s$, define a new variate, $Z_{(j)} = \zeta_{(j),1} - \zeta_{(j),2}$, as the difference in the response variable for two further experiments at the same design point $(X_1, \cdots, X_{m+n})$. These new variates are iid with the same normal distribution as in (5.5). The conditional distribution of $(Z_{(j)} \mid Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)})$, with $k(j) = 2^{j-1}$, for $j = 1, \cdots, s$, is given by

$$f(z_{(j)} \mid Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)}) = \int_{\sigma_\epsilon^2 = 0}^{\infty} f(z_{(j)} \mid \sigma_\epsilon^2) \times f(\sigma_\epsilon^2 \mid Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)})\, d\sigma_\epsilon^2. \tag{5.13}$$

Using the normal distribution in (5.5), and putting $k = k(j)$ in (5.11), the integrand in (5.13) is similar to the RHS of (5.7) with $k$ set equal to 1, $\nu$ replaced by $\nu + k(j)$ and $w$ replaced by $w_{k(j)}$, where $w_{k(j)}$ is obtained by setting $k$ equal to $k(j)$ in (5.12). As in (5.8) and (5.9), integrating $\sigma_\epsilon^2$ out from (5.13) gives

$$(Z_{(j)} \mid Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)}) \sim t_{\nu+k(j)}(0, 2w_{k(j)}), \tag{5.14}$$

for $j = 1, \cdots, s$. Similarly, for $j = 0$, the marginal unconditional distribution of $Z_{(0)}$ is obtained, from (5.4) and (5.5), as

$$Z_{(0)} \sim t_\nu(0, 2w). \tag{5.15}$$

As will be discussed in the next section, under reasonable choices of the conditioning values $z_1, \cdots, z_{k(j)}$, the expert assesses her median of the absolute value for each of the Student-t distributions in (5.14) and (5.15).
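The conjugate update in (5.11) and (5.12) can be illustrated with a small sketch. The class and function names below are hypothetical, and the numerical values are illustrative only; none of them come from the thesis software.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class InvGammaPrior:
    """sigma_eps^2 ~ Inverted Gamma(nu/2, nu*w/2), as in (5.3)."""
    nu: float  # degrees of freedom
    w: float   # scale hyperparameter


def update(prior: InvGammaPrior, z: Sequence[float]) -> InvGammaPrior:
    """Condition on observed differences z_1..z_k, following (5.11)-(5.12)."""
    k = len(z)
    w_k = (prior.nu * prior.w + 0.5 * sum(zi * zi for zi in z)) / (prior.nu + k)
    return InvGammaPrior(nu=prior.nu + k, w=w_k)


def predictive_scale(prior: InvGammaPrior) -> float:
    """Squared scale 2*w of the Student-t distribution of a new difference."""
    return 2.0 * prior.w


prior = InvGammaPrior(nu=4.0, w=2.0)    # illustrative hyperparameters
posterior = update(prior, [1.0, 2.0])   # two hypothetical differences
print(posterior)                        # InvGammaPrior(nu=6.0, w=1.75)
```

Updating on the two differences one at a time gives the same hyperparameters as updating on both at once, which is what allows the conditioning sets to grow from step to step.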
These are exactly the upper quartiles of the t-variates, from symmetry about zero.

Let the assessed upper quartiles of $Z_{(0)}$ and $(Z_{(j)} \mid Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)})$ be denoted by $q_0$ and $q_j$, for $j = 1, \cdots, s$, respectively. If we denote the upper quartile of a standard Student-t distribution with $\nu$ degrees of freedom by $Q_\nu$, then we have

$$q_0 = (2w)^{1/2}\, Q_\nu, \tag{5.16}$$

and

$$q_j = (2w_{k(j)})^{1/2}\, Q_{\nu+k(j)}, \tag{5.17}$$

for $j = 1, \cdots, s$. The aim now is to solve the above pairs of equations for $\nu$ and $w$. By division, for each pair, we get

$$\frac{q_0}{q_j} = \frac{Q_\nu}{Q_{\nu+k(j)}} \left[\frac{w}{w_{k(j)}}\right]^{1/2}. \tag{5.18}$$

Using (5.12) and (5.16), we can eliminate $w$ from (5.18) to get

$$\frac{q_0}{q_j} = \frac{Q_\nu}{Q_{\nu+k(j)}} \left[\frac{\nu + k(j)}{\nu + Q_\nu^2 \sum_{i=1}^{k(j)} (z_i/q_0)^2}\right]^{1/2}, \tag{5.19}$$

for $j = 1, \cdots, s$. For each value of $j$, the assessed ratio $q_0/q_j$ is used by the software to search for the value of the degrees of freedom $\nu$, say $\nu_j$, that solves equation (5.19).

To guarantee the existence of a unique solution for $\nu$ using this approach, two conditions must be imposed on the function in (5.19). It must be strictly monotonic in $\nu$ on the interval of concern. For statistical coherence, the assessed quartile, $q_j$, must also be above a lower limit, say $a_j$, for $j = 1, 2, \cdots, s$. To satisfy the latter condition, we assume that there is a reasonable minimum value of the elicited degrees of freedom, say $\min(\nu)$. Since $q_0$ has already been assessed, using the extreme value $\min(\nu)$ in the RHS of (5.19) gives the lower limit of $q_j$, as follows:

$$a_j = q_0\, \frac{Q_{\min(\nu)+k(j)}}{Q_{\min(\nu)}} \left[\frac{\min(\nu) + Q_{\min(\nu)}^2 \sum_{i=1}^{k(j)} (z_i/q_0)^2}{\min(\nu) + k(j)}\right]^{1/2}, \tag{5.20}$$

for $j = 1, 2, \cdots, s$.

Setting this limit, we can now investigate the monotonicity condition. In fact, the monotonicity of (5.19) as a function of $\nu$ is required to ensure that there exists a unique value $\nu_j > \min(\nu)$ that satisfies (5.19) for $q_j > a_j$, $j = 1, 2, \cdots, s$.
In (5.19), if we put

$$C_j = \sum_{i=1}^{k(j)} (z_i/q_0)^2, \tag{5.21}$$

and write $Q'_\nu = \partial Q_\nu / \partial \nu$, then the first derivative of $q_0/q_j$ with respect to $\nu$ will take the form

$$\frac{\partial (q_0/q_j)}{\partial \nu} = \frac{C_j Q_\nu^3 \left[Q_{\nu+k(j)} - 2(\nu+k(j))\, Q'_{\nu+k(j)}\right] + 2\nu(\nu+k(j)) \left(Q'_\nu Q_{\nu+k(j)} - Q_\nu Q'_{\nu+k(j)}\right) - k(j)\, Q_\nu Q_{\nu+k(j)}}{2\,(\nu+k(j))^{1/2} \left(\nu + C_j Q_\nu^2\right)^{3/2} Q_{\nu+k(j)}^2}.$$

So, for all $\nu > \min(\nu)$,

$$\frac{\partial (q_0/q_j)}{\partial \nu} < 0$$

if and only if

$$C_j < \min_{\nu} \left\{ \frac{k(j)\, Q_\nu Q_{\nu+k(j)} + 2\nu(\nu+k(j)) \left(Q_\nu Q'_{\nu+k(j)} - Q'_\nu Q_{\nu+k(j)}\right)}{Q_\nu^3 \left[Q_{\nu+k(j)} - 2(\nu+k(j))\, Q'_{\nu+k(j)}\right]} \right\} = C_{j,0}. \tag{5.22}$$

Since there does not exist a closed form for the derivative of a Student-t quantile with respect to its degrees of freedom, the values of $C_{j,0}$ cannot be found analytically. Instead, these values have been computed numerically using Maple 14 software, for $s = 5$, $\nu \in [1, 50]$. Figure 5.1 lists these values of $C_{j,0}$, where the derivative $\partial(q_0/q_j)/\partial\nu$ is plotted against $\nu$ and $C_j$, for $j = 1, 2, \cdots, 5$.

[Figure 5.1: Three-dimensional plots of $\partial(q_0/q_j)/\partial\nu$ against $\nu$ and $C_j$ for various sample sizes $k(j)$. Panel values: for $k(1) = 1$, $C_{1,0} = 1.626$; for $k(2) = 2$, $C_{2,0} = 3.367$; for $k(3) = 4$, $C_{3,0} = 6.950$; for $k(4) = 8$, $C_{4,0} = 14.222$; for $k(5) = 16$, $C_{5,0} = 28.846$. For $j = 1, 2, \cdots, 5$, $C_{j,0}$ is such that $\partial(q_0/q_j)/\partial\nu < 0$ for all $1 \leq \nu \leq 50$ if and only if $C_j < C_{j,0}$.]

It can be seen from Figure 5.1 that

$$C_{1,0} < \frac{C_{j,0}}{k(j)}, \quad \text{for } j = 2, 3, 4, 5. \tag{5.23}$$

Now, from (5.21), (5.22), (5.23) and Figure 5.1, we can state that the function in (5.19) is strictly monotonic decreasing in $\nu$, for all $1 \leq \nu \leq 50$ and $j = 1, 2, \cdots, 5$, if

$$\frac{1}{k(j)} \sum_{i=1}^{k(j)} (z_i/q_0)^2 < 1.626. \tag{5.24}$$

Although we have not examined the case where $\nu > 50$, Figure 5.1 suggests that (5.24) holds for $\nu \geq 1$. In the implementation of the method, the software generates the values of $z_i$ that satisfy (5.24). Hence, for $j = 1, 2, \cdots, 5$, a unique solution $\nu_j$ can be obtained from (5.19), and the corresponding $w_j$ can then be obtained by substituting $\nu_j$ for $\nu$ in (5.16).
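The search for $\nu_j$ in (5.19), followed by the computation of $w_j$ from (5.16), can be sketched as follows. This is only an illustration under stated assumptions, not the PEGS-GLM implementation: all function names are hypothetical, the Student-t quartile is obtained by a simple Simpson integration plus bisection so that the sketch needs only the standard library, and the search interval $[1, 50]$ is the range examined in Figure 5.1. A library quantile function, for example `scipy.stats.t.ppf`, could replace the `upper_quartile` helper.

```python
import math


def t_pdf(x: float, nu: float) -> float:
    """Density of the standard Student-t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_cdf(x: float, nu: float, n: int = 100) -> float:
    """P(T <= x) for x >= 0, by composite Simpson integration on [0, x] (n even)."""
    h = x / n
    s = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + s * h / 3


def upper_quartile(nu: float) -> float:
    """Q_nu by bisection on [0, 2]; Q_nu <= 1 whenever nu >= 1."""
    lo, hi = 0.0, 2.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < 0.75:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ratio(nu: float, z, q0: float) -> float:
    """Right-hand side of (5.19) for conditioning data z and first median q0."""
    k = len(z)
    c = sum((zi / q0) ** 2 for zi in z)  # C_j of (5.21)
    q_nu = upper_quartile(nu)
    return q_nu / upper_quartile(nu + k) * math.sqrt((nu + k) / (nu + c * q_nu ** 2))


def solve_nu(q0: float, qj: float, z, nu_min: float = 1.0, nu_max: float = 50.0) -> float:
    """Bisection for nu_j in (5.19); relies on the ratio being decreasing in nu."""
    target = q0 / qj
    if not ratio(nu_max, z, q0) <= target <= ratio(nu_min, z, q0):
        raise ValueError("q_j is not attainable for nu in [nu_min, nu_max]")
    lo, hi = nu_min, nu_max
    for _ in range(30):
        mid = 0.5 * (lo + hi)
        if ratio(mid, z, q0) > target:
            lo = mid  # ratio too large, so nu is too small
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_w(q0: float, nu: float) -> float:
    """w from (5.16): q0 = sqrt(2 w) * Q_nu."""
    return q0 ** 2 / (2 * upper_quartile(nu) ** 2)
```

The range check in `solve_nu` plays the role of the lower limit $a_j$ in (5.20) when `nu_min` is set to $\min(\nu)$: an assessed $q_j$ below that limit makes the target ratio exceed its value at `nu_min`, and the sketch rejects it.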
We then reconcile the five different values of the degrees of freedom parameter $\nu$ by taking their geometric mean. When averaging different assessments of a degrees of freedom parameter, empirical evidence favours taking their geometric mean rather than their arithmetic mean. See, for example, Al-Awadhi (1997), Al-Awadhi and Garthwaite (1998) or Garthwaite et al. (2005). The elicited value of $w$ can then be obtained from (5.16) by substituting for $\nu$ the geometric mean of $\nu_1, \cdots, \nu_5$.

Finally, we assume that the vector of regression coefficients $\boldsymbol{\beta} = (\alpha, \beta_1, \cdots, \beta_{m+n})'$ is independent of $\sigma_\epsilon^2$ a priori, and give the full prior structure of the normal GLM as

$$f(\boldsymbol{\beta}, \sigma_\epsilon^2) = f_1(\boldsymbol{\beta})\, f_2(\sigma_\epsilon^2), \tag{5.25}$$

where $f_1(\boldsymbol{\beta})$ can be taken as the multivariate normal prior distribution elicited in the previous chapters, and $f_2(\sigma_\epsilon^2)$ is as given in (5.4) with the elicited hyperparameters $\nu$ and $w$.

5.2.2 Implementation and assessment tasks

The elicitation method proposed in the previous section has been programmed into the PEGS-GLM (Correlated Coefficients) software by the author of this thesis. The option of eliciting the prior distribution of the random error variance is given to the expert once she selects her model as an "ordinary linear regression" model. The same procedure has also been programmed in a separate piece of software that can be used as an add-on to any other elicitation software for normal models. This software is freely available as PEGS-Normal at http://statistics.open.ac.uk/elicitation.

In a dialogue box, the expert is asked to assume that two independent experiments have been conducted at the same design point, i.e. at the same values of the explanatory variables. She then assesses her median value, $q_0$, of the absolute difference, $|Z_{(0)}|$, between the observed values of the response variable after these two virtual experiments.
Since the distribution of $Z_{(0)}$ is symmetric about zero, see (5.15), the assessed median $q_0$ of $|Z_{(0)}|$ is exactly the upper quartile of $Z_{(0)}$. In fact, $\Pr\{|Z_{(0)}| < q_0\} = 0.5$ implies that $\Pr\{-q_0 < Z_{(0)} < q_0\} = 0.5$, which implies from symmetry that $\Pr\{Z_{(0)} < q_0\} = 0.75$. Similarly, from (5.14), each upper quartile $q_j$, for $j = 1, \cdots, s$, will be assessed as the median of the absolute difference $|Z_{(j)}|$ given that $Z_1 = z_1, \cdots, Z_{k(j)} = z_{k(j)}$.

In assessing the remaining conditional medians $q_j$, the choice of the conditioning values $z_1, z_2, \cdots, z_{k(j)}$, for $j = 1, 2, \cdots, 5$, is an important issue. As mentioned before, the method of Garthwaite and Dickey (1988) uses only one hypothetical data point $z_1$, for which they suggested a value of $z_1 = q_0/2$. They argued that this choice gives a conditioning value that is not too close to $q_0$, so as to prompt a significant change in the expert's opinion in assessing $q_1$. This value of $z_1$ is, at the same time, not too far from $q_0$, so as to represent an acceptable value for the expert to condition on.

In our implementation of the extended method, the above two criteria will be considered in choosing values for $z_i$, $i > 1$. This means that the values should result in a considerable change in the expert's opinion, while the expert still finds them plausible. To attain this, we take $z_1 = q_0/2$, following Garthwaite and Dickey (1988). Then we generate four extra sets of hypothetical data, for $j = 2, \cdots, 5$, where the $j$th set consists of $k(j) = 2^{j-1}$ data points. The first $2^{j-2}$ data points of each set, namely $z_1, \cdots, z_{2^{j-2}}$, are taken as the same elements of the previous data set, while the new extra elements $z_{2^{j-2}+1}, \cdots, z_{2^{j-1}}$ are generated as follows. For $i = 2^{j-2}+1, \cdots, 2^{j-1}$, we generate $z_i$ as random variates from a population with a median of $q_0/2$. Hence, we choose each $z_i$ as the absolute value of a normal variate with zero mean and a variance of $(q_0/1.349)^2$.
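The nested data sets just described can be sketched as follows. The names are hypothetical, and the acceptance rule used here is only the monotonicity condition (5.24); any further resampling rule applied by the software is not reproduced in this sketch.

```python
import random
from statistics import NormalDist

C_MAX = 1.626  # the bound C_{1,0} appearing in (5.24)


def satisfies_5_24(z, q0):
    """Monotonicity condition (5.24): the mean of (z_i/q0)^2 must be below 1.626."""
    return sum((zi / q0) ** 2 for zi in z) / len(z) < C_MAX


def generate_sets(q0, steps=5, rng=None, max_tries=1000):
    """Nested hypothetical data sets of sizes 1, 2, 4, ..., 2**(steps - 1)."""
    rng = rng or random.Random()
    sd = q0 / 1.349              # the median of |N(0, sd^2)| is then about q0/2
    sets = [[q0 / 2]]            # step 1: the single point z_1 = q0/2
    for j in range(2, steps + 1):
        previous = sets[-1]
        n_new = 2 ** (j - 1) - len(previous)
        for _ in range(max_tries):
            new = [abs(rng.gauss(0.0, sd)) for _ in range(n_new)]
            candidate = previous + new       # earlier points are kept unchanged
            if satisfies_5_24(candidate, q0):
                sets.append(candidate)
                break
        else:
            raise RuntimeError("could not generate data satisfying (5.24)")
    return sets


q0 = 10.0                                            # illustrative first median
data_sets = generate_sets(q0, rng=random.Random(0))
median_abs = NormalDist(0.0, q0 / 1.349).inv_cdf(0.75)   # median of |z|, close to q0/2
```

Passing a seeded `random.Random` makes the generated sets reproducible, which is convenient when the same hypothetical data must be shown again after the expert asks to revise a step.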
Thus, the interquartile range of this normal distribution is $q_0$, and the upper quartile of the signed variates, which is also the median of the unsigned ones, is $q_0/2$. For any data set $j$, $j = 2, \cdots, 5$, if the generated values fail to satisfy condition (5.26), we resample the new elements $z_{2^{j-2}+1}, \cdots, z_{2^{j-1}}$ from the same normal distribution until (5.26) is satisfied. This guarantees that the generated data should prompt the expert to revise her opinion by a substantial amount.

To implement the proposed procedure, the expert is asked to perform an assessment task that consists of $s = 5$ steps. In each step $j$, for $j = 1, \cdots, 5$, the software presents an interactive graph to the expert. The graph in Figure 5.2 is an example of the graph presented to the expert by the software at step $j = 3$.

[Figure 5.2: Assessing a median value conditioning on a set of data. Dialogue title: "Eliciting Conditional Medians of the Absolute Difference of two responses at the same design point". Horizontal axis: Absolute Difference, from 0.0 to 12.0. Markers shown: median of 4 data points; assessed median at 4.8926.]

This graph shows the expert's first unconditional median $q_0$ drawn as the thick black long line and her most recently assessed median as the second, green, long line. The graph also shows a number of generated data points $z_1, \cdots, z_4$, represented by upward arrows, together with a downward arrow that shows the sample median of this virtual data set. The upward arrows of the data points from the previous set of hypothetical data, $z_1$ and $z_2$, are shown in green, while the upward arrows of the newly generated data points, $z_3$ and $z_4$, are shown in black.

Given the virtual data set (displayed as arrows), the expert is asked to assess her current median value $q_3$ by clicking on the horizontal line between the two short red lines. These are the lower limit $a_3$, computed as in (5.20) with $\min(\nu) = 1$, and the initial assessment $q_0$. The expert's median must lie between the red boundaries, otherwise she will get a warning message asking her to re-assess her median so as to satisfy this condition.

To assess $q_3$, the expert has two obvious strategies. The first strategy (the black one) is to look at the black line that shows her initial assessment $q_0$, and decide where to revise this value in the light of the new information given by the black downward arrow that shows the median of the whole hypothetical data set $z_1, \cdots, z_4$.
The other strategy (the green one) is for the expert to look at the green line that shows her most recent median assessment, which has been based on the hypothetical data set shown as the green arrows, $z_1$ and $z_2$. She then decides where to revise this median assessment in the light of the newly generated points $z_3$ and $z_4$, shown as the black arrows. With both of these strategies, if the expert is confident about her previous assessment, then her new median assessment should be near to this value rather than near to the new hypothetical data.

When the expert gives her new median assessment $q_3$, its value is first used by the software to compute $\nu_3$ from (5.19), and then to compute $w_3$ from (5.16) using $\nu_3$.

The final output of the procedure, as illustrated in Figure 5.3, gives the five different elicited pairs of $\nu$ and $w$, together with the geometric mean of $\nu$ and its corresponding value of $w$. The expert is asked to check whether the different elicited values are close to each other and represent her opinion well. If not, she has the option to change any of them by going back to reassess a specific $q_j$ through pressing the corresponding 'Change' button for this step, see Figure 5.3.

Step       Elicited value of DF    Elicited value of W
1          3.6740                  32.2546
2          3.0870                  30.9910
3          4.5280                  33.3489
4          4.3840                  33.3635
5          2.7370                  30.0095
Average    3.6136                  32.1419

Figure 5.3: The output table showing the elicited hyperparameters (each step row has its own 'Change' button)

After the expert has finished making any revision, the hyperparameters $\nu$ and $w$ are set equal to the two values in the last row of the table illustrated in Figure 5.3.

5.3 Eliciting a prior distribution for the scale parameter in gamma GLMs

In this section, we propose a novel method for eliciting a lognormal prior distribution for the scale parameter of a gamma GLM.
It is well-known that the scale parameter of a gamma GLM, which is the reciprocal of the dispersion parameter, is in fact the shape parameter of the gamma distribution. Our new method is a valid means of eliciting the shape parameter of any gamma distribution once the distribution's mean has been elicited (or the mean is assumed to be known).

Bayesian methods have been developed for analyzing data to estimate the shape parameters of a gamma distribution, or the scale parameters of a gamma GLM. Miller (1980) proposed a general conjugate class of priors for the two parameters of the gamma distribution, but he gave no method of eliciting its hyperparameters. Sweeting (1981) introduced some suggestions for the Bayesian estimation of the scale parameters in exponential families.

The problem of unknown scale parameters in GLMs was examined by West (1985). In his work, he discussed general ideas concerning scale parameters and variance functions in non-normal models including gamma GLMs (see also West et al. (1985)). However, there does not seem to be a good method of eliciting a prior distribution for such parameters. Ibrahim and Laud (1991) suggested a Jeffreys's prior for the regression coefficients and an independent marginal informative prior on the scale parameter of a gamma GLM, but they did not suggest any family of distributions for this informative prior. The method of Bedrick et al. (1996), which is considered to be the first elicitation method of informative prior distributions for GLMs, assumed the scale parameter to be known and elicited priors only for the regression coefficients.

Chen and Ibrahim (2003) proposed a novel class of conjugate priors for GLMs. They also discussed elicitation issues and strategies of these conjugate priors. Their proposed prior structure involves the dispersion parameter as well. However, no explicit elicitation method was introduced for the dispersion parameter.
5.3.1 GLMs with a gamma distributed response variable

For a continuous, positive, skewed response variable $\zeta$ in a GLM of the form

$$Y = g(\mu) = g(E(\zeta \mid \boldsymbol{X})) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m, \tag{5.27}$$

the observations are often assumed to follow a gamma distribution, say Gamma($\lambda, \theta$), where $\lambda$ and $\theta$ depend on $\boldsymbol{X}$. Its pdf is

$$f(\zeta \mid \lambda, \theta) = \frac{\theta^\lambda}{\Gamma(\lambda)}\, \zeta^{\lambda-1} e^{-\theta\zeta}, \quad \zeta, \lambda, \theta > 0, \tag{5.28}$$

where $\lambda$ is the shape parameter and $\theta$ is the rate parameter, or the inverse of the scale parameter. It is well-known that

$$\mu = E(\zeta) = \lambda/\theta, \qquad \sigma^2 = \mathrm{Var}(\zeta) = \lambda/\theta^2. \tag{5.29}$$

For the gamma GLM in (5.27), with any monotone increasing link function $g(\cdot)$, the methods discussed in Chapters 3 and 4 can be used to elicit the prior distribution of the regression coefficients

$$\boldsymbol{\beta} = (\alpha, \beta_1, \beta_2, \cdots, \beta_m)', \tag{5.30}$$

which represents the prior distribution of $\mu$, i.e. reflects the prior knowledge about the ratio $\lambda/\theta$. We assume that the prior distribution of this ratio has already been elicited as

$$g(\lambda/\theta) \sim N(\boldsymbol{X}_0'\boldsymbol{b},\ \boldsymbol{X}_0'\Sigma\boldsymbol{X}_0), \tag{5.31}$$

where $\boldsymbol{b} = E(\boldsymbol{\beta})$ and $\Sigma = \mathrm{Var}(\boldsymbol{\beta})$ have been assessed using the methods given in the previous chapters, and the vector $\boldsymbol{X}_0$ denotes all explanatory variables set at their reference points.

Having elicited this prior for the ratio $\lambda/\theta$, the expert's prior opinion about one of the parameters $\lambda$ and $\theta$ must be quantified to complete the prior structure of the gamma GLM. In what follows, expert opinion about the scale parameter $\lambda$ is modelled by a lognormal prior distribution and we propose an assessment method for determining the hyperparameters of this distribution. As discussed before, the proposed method can also be used to elicit the shape parameter $\lambda$ of any gamma distribution.

We base our method on a gamma distribution with $\lambda$ as the only unknown parameter, assuming $\mu$ to be already assessed or completely known. For gamma GLMs, the elicited vector $\boldsymbol{b}$ can be used to obtain a single value of $\mu$, say $\mu_0$, from (5.31). As we assume that the link function $g(\cdot)$
is monotonic increasing, the median value of $\lambda/\theta$ is then

$$\mu_0 = g^{-1}(\boldsymbol{X}_0'\boldsymbol{b}). \tag{5.32}$$

We take the gamma distributed random variable $\zeta$ defined in (5.28) and change parameters by putting $\theta = \lambda/\mu$ as in (5.29). This gives

$$f(\zeta \mid \lambda, \mu) = \frac{1}{\Gamma(\lambda)} \left(\frac{\lambda}{\mu}\right)^{\lambda} \zeta^{\lambda-1} e^{-\lambda\zeta/\mu}, \quad \zeta, \lambda, \mu > 0. \tag{5.33}$$

We let

$$W = \frac{\zeta}{\mu}, \tag{5.34}$$

and then the pdf of $W$ will depend only on $\lambda$, i.e. $W \sim \text{Gamma}(\lambda, \lambda)$. This has the form

$$f(w \mid \lambda) = \frac{\lambda^\lambda}{\Gamma(\lambda)}\, w^{\lambda-1} e^{-\lambda w}, \quad w, \lambda > 0. \tag{5.35}$$

Our aim now is to find some meaningful strictly monotonic function of $\lambda$, such that the expert can quantify her opinion about this function effectively. The expert cannot answer questions about $\lambda$ directly, as a gamma distribution parameter has little meaning to an expert because it is not an observable quantity. Instead, the expert should be asked about an observable quantity that directly relates to the observable gamma variate, and which can be monotonically transformed to $\lambda$. The expert can thus be asked about any quantile of the gamma distribution as an observable quantity, provided that it is a strictly monotonic function of $\lambda$. In what follows, we show that quantifying the expert's opinion about the lower quartile of the gamma distribution in (5.35) will lead to a full prior distribution for $\lambda$, and that this quartile is a strictly monotonic function of $\lambda$.

To check the monotonicity of different quantiles in $\lambda$, let $F(w, \lambda, \lambda)$ be the cdf of $W$. It can be written in the form of a regularized gamma function as follows:

$$F(w, \lambda, \lambda) = \frac{\gamma(\lambda, \lambda, w)}{\Gamma(\lambda)}, \tag{5.36}$$

where $\gamma(\lambda, \lambda, w)$ is a form of the lower incomplete gamma function,

$$\gamma(\lambda, \lambda, w) = \int_{t=0}^{w} \lambda^\lambda t^{\lambda-1} e^{-\lambda t}\, dt. \tag{5.37}$$

Note that it differs from the usual lower incomplete gamma function $\gamma(\lambda, w)$ in that the latter does not contain $\lambda^\lambda$ in the integrand. It is clear that the function $F(w, \lambda, \lambda)$, as a cdf of $W$, is strictly monotonic increasing in $w$. But, as a function of $\lambda$, the usual cdf

$$F(w, \lambda) = \frac{\gamma(\lambda, w)}{\Gamma(\lambda)}, \tag{5.38}$$

as a regularized gamma function, is strictly monotonic decreasing in $\lambda$.
The proof of this fact is given in Tricomi (1952), see also Gautschi (1998). We next show that the same type of monotonicity is true for the function $F(w, \lambda, \lambda)$ in (5.36). This helps in finding a range of quantiles that are monotonic functions of $\lambda$.

In fact, following the note of Koornwinder (2008) for $F(w, \lambda)$, we can write

$$F(w, \lambda, \lambda) = \frac{\gamma(\lambda, \lambda, w)}{\Gamma(\lambda)} = \frac{\gamma(\lambda, \lambda, w)}{\gamma(\lambda, \lambda, w) + \Gamma(\lambda, \lambda, w)}, \tag{5.39}$$

where $\gamma(\lambda, \lambda, w)$ takes the form in (5.37), and $\Gamma(\lambda, \lambda, w)$ is a form of the upper incomplete gamma function, i.e.

$$\Gamma(\lambda, \lambda, w) = \int_{t=w}^{\infty} \lambda^\lambda t^{\lambda-1} e^{-\lambda t}\, dt. \tag{5.40}$$

Differentiating (5.39) with respect to $\lambda$, we have

$$\frac{\partial F(w, \lambda, \lambda)}{\partial \lambda} = \frac{-1}{\Gamma(\lambda)^2} \left\{ \gamma(\lambda, \lambda, w)\, \frac{\partial \Gamma(\lambda, \lambda, w)}{\partial \lambda} - \Gamma(\lambda, \lambda, w)\, \frac{\partial \gamma(\lambda, \lambda, w)}{\partial \lambda} \right\}. \tag{5.41}$$

The quantity in curly braces can be written, after getting the derivatives, as

$$\int_{u=0}^{w} \int_{t=w}^{\infty} \lambda^{2\lambda} (tu)^{\lambda-1} e^{-\lambda(t+u)} \left[\log(t/u) - (t-u)\right] dt\, du. \tag{5.42}$$

So, the function $F(w, \lambda, \lambda)$ is monotonic decreasing in $\lambda$ if $\log(t/u) - (t-u) > 0$ in the integration domain, i.e. if

$$\int_{t=w}^{\infty} t e^{-t}\, dt > \int_{t=0}^{w} t e^{-t}\, dt. \tag{5.43}$$

Apparently, the above condition is fulfilled if

$$w < \text{median of Gamma}(2, 1) = 1.678. \tag{5.44}$$

Hence, from the positive skewness of a gamma distribution, and for all $0 < \alpha \leq 0.5$,

$$w_\alpha \leq w_{0.5} < E(W) = 1 < 1.678, \quad \forall \lambda > 0, \tag{5.45}$$

where $w_\alpha$ is the $\alpha$-quantile of $W$. From (5.44) and (5.45) we can see that $F(w, \lambda, \lambda)$ is strictly monotonic decreasing in $\lambda$ for all quantiles $w$ such that $w \leq w_{0.5}$. However, we believe that the expert can quantify her opinion about quartiles more easily than about other quantiles, by using the bisection method, see for example Pratt et al. (1995). So, we choose the lower quartile, $w_{0.25}$, as a monotonic function of $\lambda$, since the function $F(w_{0.25}, \lambda, \lambda)$ is decreasing in $\lambda$. Note that the opposite is not true, i.e. if $w > w_{0.5}$ then $w$ is not necessarily greater than 1.678, and no monotonicity is guaranteed for $w_{0.75}$, for example.
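The monotonic behaviour of the lower quartile of $W \sim \text{Gamma}(\lambda, \lambda)$, and the fact that it can therefore be inverted for $\lambda$ by a simple search, can be checked numerically. The sketch below is an illustration under stated assumptions rather than the thesis software: the function names are hypothetical, the cdf uses the standard power series for the regularized lower incomplete gamma function, and the search bounds for $\lambda$ are arbitrary illustrative choices.

```python
import math


def gamma_cdf(w: float, lam: float) -> float:
    """F(w, lam, lam): cdf of W ~ Gamma(shape=lam, rate=lam), by power series."""
    if w <= 0.0:
        return 0.0
    a, x = lam, lam * w
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def lower_quartile(lam: float) -> float:
    """w_0.25: solve F(w, lam, lam) = 0.25 on (0, 1) by bisection."""
    lo, hi = 0.0, 1.0                 # w_0.25 < median < mean = 1
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, lam) < 0.25:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shape_from_quartile(q1: float, mu: float, lo: float = 0.05, hi: float = 500.0) -> float:
    """Find lam such that mu * w_0.25(lam) = q1, using that w_0.25 increases with lam."""
    target = q1 / mu
    for _ in range(60):
        mid = math.sqrt(lo * hi)      # bisection on the log scale
        if lower_quartile(mid) < target:
            lo = mid                  # quartile too small, so lam is too small
        else:
            hi = mid
    return math.sqrt(lo * hi)


quartiles = [lower_quartile(lam) for lam in (0.5, 1.0, 2.0, 5.0, 10.0, 50.0)]
```

For $\lambda = 1$ the distribution is exponential with unit rate, so the lower quartile should equal $-\log(0.75)$, which gives a convenient check on the series.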
Another reason for choosing the lower quartile and not the upper quartile, besides monotonicity as discussed above, is that the lower quartile is more sensitive than the upper quartile to changes in the shape parameter $\lambda$ at any fixed value of the mean. Figure 5.4 illustrates this fact; it shows the changes in both the lower and upper quartiles of gamma distributions as the parameter value $\lambda$ changes, for fixed mean values of 0.5, 5, 50 and 500. It can be seen from Figure 5.4 that the lower quartile is more sensitive than the upper quartile to changes in $\lambda$ at fixed mean values.

Figure 5.4: Changes in quartile values with the change of $\lambda$ at different mean values. (Panels: Mean = 0.5, Mean = 5, Mean = 50, Mean = 500; horizontal axis: Lambda.)

Now, since $F(w, \lambda, \lambda)$ is strictly monotonic increasing in $w$ and strictly monotonic decreasing in $\lambda$ for $w < w_{0.5}$, then, fixing $F(w, \lambda, \lambda) = 0.25$, the lower quartile $w_{0.25}$ is an implicit monotonic increasing function of $\lambda$, say

\[ w_{0.25} = h^{*}(\lambda). \tag{5.46} \]

Hence, from (5.34), we have

\[ Q_1 = \mu\, h^{*}(\lambda) = h(\lambda), \tag{5.47} \]

where $Q_1$ is the lower quartile of $\zeta$, and $h(.)$ is a monotonic increasing function of $\lambda$.

The expert will be asked to assess three quartiles of her prior distribution for $Q_1$. Then, from the monotonicity of $h(.)$ in (5.47), these quartiles can be transformed into the corresponding three quartiles of $\lambda$. We assume that the prior distribution of $\lambda$ is a lognormal distribution, and use the three transformed quartiles to solve for the two parameters of the lognormal distribution. The assessment tasks required to implement this method using interactive graphical software are detailed in the next section.

5.3.2 Assessment tasks

The expert is questioned about the lower quartile of the gamma distribution, $Q_1$ say.
However, she is not simply asked to give a point estimate of $Q_1$; she is asked to give assessments that quantify her uncertainty about it. Specifically, she is asked to give her lower and upper quartiles for $Q_1$ in addition to her median assessment of its value. Questions that make this a meaningful task that an expert can reasonably be asked to perform are suggested later.

• Three quartiles of $Q_1$ will be assessed by the expert, say $Q_{1,1}$, $Q_{1,2}$ and $Q_{1,3}$, where the median $Q_{1,2}$ is a point estimate of $Q_1$, and $Q_{1,3} - Q_{1,1}$ is its interquartile range. Details on how to ask about these quartiles are given later.

• Under the monotonicity of $h(.)$ in (5.47), the three assessed quartiles $Q_{1,1}$, $Q_{1,2}$ and $Q_{1,3}$ of $Q_1$ can be transformed to the three corresponding quartiles of $\lambda \mid \mu$, say $Q_{\lambda,1}$, $Q_{\lambda,2}$ and $Q_{\lambda,3}$, respectively.

• Hence, we obtain the three quartiles $Q_{\lambda,1}$, $Q_{\lambda,2}$ and $Q_{\lambda,3}$ of the prior distribution of $\lambda$ given $\mu$, as

\[ Q_{\lambda,i} = h^{-1}(Q_{1,i}), \qquad i = 1, 2, 3, \tag{5.48} \]

where $h^{-1}(.)$ can be implemented by numerically inverting the incomplete gamma function $F(w, \lambda, \lambda)$ via a simple search procedure.

• From (5.47) and (5.48), if the three assessed values $Q_{1,1}$, $Q_{1,2}$ and $Q_{1,3}$ are the three quartiles of $Q_1$, then $Q_{\lambda,1}$, $Q_{\lambda,2}$ and $Q_{\lambda,3}$ are the three corresponding quartiles of $\lambda \mid \mu$, respectively. Clearly

\[ \Pr\{Q_1 < Q_{1,i}\} = \Pr\{(\lambda \mid \mu) < h^{-1}(Q_{1,i})\} = \Pr\{(\lambda \mid \mu) < Q_{\lambda,i}\} = 0.25\,i, \qquad i = 1, 2, 3. \tag{5.49} \]

• We assume that the prior distribution of $\lambda$ given $\mu$ is a lognormal distribution with two hyperparameters $a$ and $b$, of the form

\[ f(\lambda \mid \mu) = \frac{1}{\lambda b \sqrt{2\pi}} \exp\left\{ -\frac{(\ln \lambda - a)^{2}}{2 b^{2}} \right\}, \qquad \lambda > 0. \tag{5.50} \]

The properties of the normal distribution are used to estimate $a$ and $b$ from the transformed assessments $Q_{\lambda,i}$, $i = 1, 2, 3$.
• Since, from the assumed lognormal prior distribution in (5.50), we have

\[ (\ln \lambda \mid \mu) \sim N(a, b^{2}), \tag{5.51} \]

and using the fact that the standard deviation of a normal distribution equals its interquartile range divided by 1.349, i.e. $b = \text{IQR}/1.349$, then clearly

\[ a = \ln(Q_{\lambda,2}), \qquad b = \frac{\ln(Q_{\lambda,3}) - \ln(Q_{\lambda,1})}{1.349}. \tag{5.52} \]

• The prior structure of the gamma GLM parameters takes the form

\[ f(\mu, \lambda) = f(\mu) \times f(\lambda \mid \mu), \tag{5.53} \]

where $f(\mu)$ can be obtained from (5.31), and $f(\lambda \mid \mu)$ is given as lognormal$(a, b)$.

This elicitation method has been implemented in user-friendly graphical software that automatically estimates the two hyperparameters of the lognormal distribution. The software has been developed as an add-on to the PEGS-GLM (Correlated Coefficients) software for eliciting the scale parameter $\lambda$ of the gamma GLM. It is also freely available at http://statistics.open.ac.uk/elicitation as a stand-alone version, PEGS-Gamma, for eliciting the shape parameter $\lambda$ of a gamma distribution with a known mean. In the former case, the median $\mu_0$ and the lower quartile $Q_1$ of the response variable $\zeta$ at the reference point have already been elicited, see (5.32). In the latter case, the expert is asked, in a dialogue box, to assess her mean value $\mu_0$ and the lower quartile $Q_1$ of the gamma random variable.

In both cases, these two assessments represent the first assessment step, from which the software suggests reasonable initial values for the other two required assessments. The median value $Q_{1,2}$ is set equal to the assessed value of $Q_1$, while the other two quartile values $Q_{1,1}$ and $Q_{1,3}$ are suggested as

\[ Q_{1,1} = Q_{1,2} - \ldots \min(Q_{1,2},\ \mu_0 - Q_{1,2}), \tag{5.54} \]

\[ Q_{1,3} = \ldots \tag{5.55} \]

These initial suggested values are used in (5.47) and (5.49) to get the three quartiles $Q^{0}_{\lambda,1}$, $Q^{0}_{\lambda,2}$ and $Q^{0}_{\lambda,3}$ of the parameter $\lambda$, respectively. The inversion of (5.47) is done by the software through a simple search procedure. As in (5.52), these quartiles are used to compute the two hyperparameters $a$ and $b$ of the assumed lognormal distribution of $\lambda$.
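The transformation from assessed quartiles of $Q_1$ to the lognormal hyperparameters can be sketched as follows. This is a minimal illustration under stated assumptions, not the PEGS-Gamma implementation: the "simple search procedure" is taken here to be plain bisection, the search range for $\lambda$ (0.05 to 500) is an arbitrary choice, and all function names are hypothetical.

```python
import math

def reg_lower_gamma(s, x, tol=1e-15, max_terms=100000):
    """Regularized lower incomplete gamma P(s, x), from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / s
    total = term
    n = 0
    while term > tol * total and n < max_terms:
        n += 1
        term *= x / (s + n)
        total += term
    return total * math.exp(s * math.log(x) - x - math.lgamma(s))

def lower_quartile_W(lam, iterations=100):
    """h*(lam): solves F(w, lam, lam) = 0.25 for w by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0  # the lower quartile lies below the mean E(W) = 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(lam, lam * mid) < 0.25:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def h(lam, mu):
    """Lower quartile Q1 of a gamma variable with mean mu and shape lam, as in (5.47)."""
    return mu * lower_quartile_W(lam)

def h_inverse(q, mu, lam_lo=0.05, lam_hi=500.0, iterations=60):
    """Inverts the increasing function h by bisection over an assumed range of lam."""
    for _ in range(iterations):
        mid = 0.5 * (lam_lo + lam_hi)
        if h(mid, mu) < q:
            lam_lo = mid
        else:
            lam_hi = mid
    return 0.5 * (lam_lo + lam_hi)

def lognormal_hyperparameters(lambda_quartiles):
    """(a, b) from the three quartiles of lambda, as in (5.52)."""
    q1, q2, q3 = lambda_quartiles
    a = math.log(q2)
    b = (math.log(q3) - math.log(q1)) / 1.349
    return a, b

def elicit(mu0, q1_quartiles):
    """Maps three assessed quartiles of Q1 to lambda quartiles and then to (a, b)."""
    lambda_quartiles = [h_inverse(q, mu0) for q in q1_quartiles]
    return lambda_quartiles, lognormal_hyperparameters(lambda_quartiles)
```

A call such as `elicit(10.0, (4.0, 4.8, 5.6))` would return the three transformed quartiles of $\lambda$ together with the pair $(a, b)$; the input values are made up for illustration.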
Using $a$ and $b$, the mean value of $\lambda$, say $\mu_{\lambda}$, is computed from the lognormal distribution of $\lambda$:

\[ \mu_{\lambda} = \exp\left(a + \tfrac{1}{2} b^{2}\right). \tag{5.56} \]

Then $\mu_{\lambda}$ is used with the assessed mean value $\mu_0$ to draw the pdf graph of the gamma distribution, Gamma$(\mu_{\lambda}, \mu_{\lambda}/\mu_0)$. A main panel is presented to the expert showing this pdf graph; see the upper graph of Figure 5.5. The thick black line on this graph represents the mean value $\mu_0$.

Figure 5.5: The main software panel for assessing gamma parameter

For statistical coherence of the assumed normal distribution of $\ln(\lambda)$, the two normal quartiles $\ln(Q^{0}_{\lambda,1})$ and $\ln(Q^{0}_{\lambda,3})$ should be symmetrical around the normal mean, $a = \ln(Q_{\lambda,2})$. To attain this, we assume that the expert is always more confident in assessing the median value than in assessing the other two quartiles. So we treat her original and transformed medians, $Q_{1,2}$ and $Q_{\lambda,2}$ respectively, as being correct. Then we suggest two coherent sets of quartiles, $Q_{1,1}$, $Q_{1,3}$ and $Q_{\lambda,1}$, $Q_{\lambda,3}$, to replace the initial assessments $Q^{0}_{1,1}$, $Q^{0}_{1,3}$ and $Q^{0}_{\lambda,1}$, $Q^{0}_{\lambda,3}$, respectively, as follows. First, $Q_{\lambda,1}$ and $Q_{\lambda,3}$ are computed as the actual first and third quartiles, respectively, of a lognormal distribution with the two elicited parameters $a$ and $b$. Then $Q_{1,1}$ and $Q_{1,3}$ are computed from $Q_{\lambda,1}$ and $Q_{\lambda,3}$, respectively, using (5.47) and (5.49).
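As a small illustration of this coherence step, the sketch below computes the lognormal quartiles of $\lambda$ that are symmetric about $a$ on the log scale, together with the mean in (5.56). The function names are hypothetical, and the example values of $a$ and $b$ are arbitrary.

```python
import math
from statistics import NormalDist

# Upper quartile of the standard normal distribution, about 0.6745
Z75 = NormalDist().inv_cdf(0.75)

def coherent_lambda_quartiles(a, b):
    """First, second and third quartiles of a lognormal(a, b) distribution."""
    return (math.exp(a - Z75 * b), math.exp(a), math.exp(a + Z75 * b))

def lognormal_mean(a, b):
    """Mean of lambda under the lognormal prior, as in (5.56)."""
    return math.exp(a + 0.5 * b * b)

# Example with arbitrary hyperparameters: a = ln(3), b = 0.5
q_lam_1, q_lam_2, q_lam_3 = coherent_lambda_quartiles(math.log(3.0), 0.5)
mu_lambda = lognormal_mean(math.log(3.0), 0.5)
```

By construction the median is kept at $\exp(a)$, and $\ln(Q_{\lambda,1})$ and $\ln(Q_{\lambda,3})$ lie at equal distances on either side of $a$.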
The first group of values in the right-hand side panel of Figure 5.5 gives the values of the three suggested coherent quartiles $Q_{1,1}$, $Q_{1,2}$ and $Q_{1,3}$. These quartiles are also drawn as the three blue lines in the upper and lower pdf graphs of Figure 5.5. The second group of values gives the three quartiles of $\lambda$: $Q_{\lambda,1}$, $Q_{\lambda,2}$ and $Q_{\lambda,3}$. The elicited values of $a$ and $b$ are shown as the third group of values in the same panel.

The lower graph in Figure 5.5 represents the elicited distribution of the lower quartile $Q_1$, with the three vertical blue lines representing $Q_{1,1}$, $Q_{1,2}$ and $Q_{1,3}$. The graph is intended to help the expert check that the distribution is a reasonable representation of her prior knowledge of $Q_1$. Although we do not assume any specific family of distributions for $Q_1$, the pdf graph is drawn using pointwise numerical derivatives of the cdf of $Q_1$. This cdf is obtained as in (5.49), not only for the three quartile points, but also for a sufficiently large number of points. A set of 1000 points covering the whole range of $Q_1$ has been used.

Hence, Figure 5.5 shows all the assessed and suggested quartiles of $Q_1$ and $\lambda$, with the two corresponding values of $a$ and $b$. The two pdf graphs of $\lambda$ and $Q_1$ are also presented to the expert to show her the impact of these quartile values and hyperparameters on the two distributions.

The main assessment task that the expert is asked to perform uses the following type of question. Let us suppose that the variable that has the gamma distribution is the period of time that a patient with some medical disorder may stay in hospital. Then the expert will be asked to consider the length of time that a hypothetical patient, John, will spend in hospital. She is told, “John has this disorder and will spend a time in hospital. Suppose he is fortunate and does not spend as long as most people in hospital.
Specifically, suppose exactly 25% of patients with John’s disorder spend a shorter time in hospital than John. Give your median assessment for the length of time that John spends in hospital. Now give your lower and upper quartiles for this length of time.”

The expert will be shown suggested coherent assessments and graphs. If she finds the suggestions a reasonable representation of her opinion, she can accept them, which finishes the assessment procedure. If they do not represent her opinion adequately, she has the option of directly revising the median value $Q_{1,2}$ of $Q_1$, or indirectly revising the quartiles $Q_{1,1}$ and $Q_{1,3}$ by changing the value of the hyperparameter $b$. As discussed before, for statistical coherence, changes must be made first to the value of $b$ and then transformed into corresponding coherent changes in $Q_{1,1}$ and $Q_{1,3}$.

In principle, the expert can change $Q_{1,2}$ to any value in $(0, \mu_0)$, and she can change $b$ to any positive value. However, to get a unimodal distribution for $Q_1$, some restrictions must be imposed on the values of $a$ and $b$, as detailed below.

Although the relation between $Q_1$ and $\lambda$, as given in (5.47), is strictly monotonic increasing for all $\lambda > 0$, the numerical second derivative of $h(\lambda)$ reveals a critical point of zero at $\lambda = 0.5045$. Therefore, the pdf of $Q_1$ is not guaranteed to be unimodal if the elicited values of $a$ and $b$ lead to a non-negligible probability of $\lambda < 0.5045$. To avoid an undesirable appearance of the pdf of $Q_1$, we restrict the elicited lognormal hyperparameters $a$ and $b$ to satisfy

\[ \frac{\ln(0.5045) - a}{b} < -3.09. \tag{5.57} \]

This condition ensures (from the standard normal distribution) that

\[ \Pr\left\{ \frac{\ln \lambda - a}{b} < \frac{\ln(0.5045) - a}{b} \right\} < 0.001, \tag{5.58} \]

i.e. it guarantees that

\[ \Pr(\lambda < 0.5045) < 0.001. \tag{5.59} \]

If condition (5.57) is not satisfied, the right-hand side panel of Figure 5.5 will only allow the expert to increase the value of $Q_{1,2}$, hence increasing $a = \ln(Q_{\lambda,2})$, or to directly decrease the value of $b$.
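The restriction on $a$ and $b$ can be checked directly from the normal distribution of $\ln \lambda$. The sketch below is only an illustration with hypothetical function names; it evaluates $\Pr(\lambda < 0.5045)$ under the lognormal prior and compares it with the bound 0.001 in (5.59).

```python
import math
from statistics import NormalDist

CRITICAL_LAMBDA = 0.5045  # critical point reported in the text

def prob_lambda_below_critical(a, b):
    """Pr(lambda < 0.5045) when ln(lambda) ~ N(a, b^2)."""
    return NormalDist(mu=a, sigma=b).cdf(math.log(CRITICAL_LAMBDA))

def restriction_satisfied(a, b, max_prob=0.001):
    """True when the hyperparameters keep Pr(lambda < 0.5045) below max_prob."""
    return prob_lambda_below_critical(a, b) < max_prob
```

For instance, with the made-up values $a = \ln(2.45)$ and $b = 0.385$ the restriction holds, whereas $a = 0$ and $b = 1$ put roughly a quarter of the prior mass below 0.5045 and would be rejected.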
A ‘Reset’ button is available for the expert to return at any time to the initial coherent set of suggestions and graphs and review them again if she needs to. When the expert is happy with the quartile values and the corresponding pdf graphs, she clicks ‘Done’ and obtains the two corresponding hyperparameters $a$ and $b$ as the output of her assessments.

5.4 Concluding comments

To elicit an informative prior distribution for normal and gamma GLMs, expert opinion must be quantified about both the regression coefficients and the extra parameters in these models. In this chapter, two elicitation methods have been proposed to quantify an expert’s opinion about a prior distribution for the random error variance in normal GLMs, and a prior distribution for the scale parameter in gamma GLMs.

A method of assessing a conjugate inverted chi-squared prior distribution for the error variance in normal models has been proposed. The method quantifies an expert’s opinions through assessments of a median and conditional medians of the absolute difference between two observations of the response variable at the same design point. Conditional assessments have been based on various sets of hypothetical future samples. These assessments depend only on the random error and have been used to elicit the inverted chi-squared distribution. A computer program that implements the method is available as an option in the PEGS-GLM (Correlated Coefficients) software and also as an add-on to any other elicitation software for normal models, PEGS-Normal. Both versions are freely available at http://statistics.open.ac.uk/elicitation.

A novel method for eliciting a lognormal prior distribution for the scale parameter of a gamma GLM, or the shape parameter of any gamma distribution, has also been proposed. The method depends only on quantifying an expert’s opinion about the lower quartile of a gamma distributed random variable.
This lower quartile is itself a random variable, for which the expert assesses a median value as a point estimate and an interquartile range. An example of questions that can be addressed to the expert has been given. The interactive graphical PEGS-Gamma software implementing this method is user-friendly. It gives coherent suggestions for all the required assessments and presents instant graphical feedback. To the best of the author’s knowledge, this is the first piece of interactive software that is designed for eliciting a prior distribution for the shape parameter of a gamma distribution or the scale parameter of a gamma GLM.

Chapter 6

Eliciting Dirichlet priors for multinomial models

6.1 Introduction

Multinomial models, consisting of items that belong to a number of complementary and mutually exclusive categories, arise in many scientific disciplines and industrial applications. For example, they are frequently encountered in geology for different compositions of rocks, in microeconomics for patterns of consumer selection preferences, and in political science for voting behavior. Other application areas include medicine, psychology and biology.

For mathematical coherence, the probabilities of the categories must be non-negative and satisfy a unit-sum constraint. The multinomial distribution describes this model as a direct generalization of the binomial distribution to more than two categories.

It is well known that the Dirichlet distribution is a conjugate prior for the parameters of multinomial models. The distribution preserves the unit-sum constraint of multinomial probabilities and imposes a simple Dirichlet pattern of dependency between them. This structure gives negative correlations between the probabilities of categories, as will be shown later.
A different way of thinking about prior distributions for multinomial models is to use the multivariate normal distribution as a large-sample approximation to the Dirichlet distribution or to the distribution of the log contrasts of the multinomial probabilities. Another option is to estimate the exact distribution of log contrasts using a Monte Carlo sample. Generalized, nested or mixed forms of the Dirichlet distribution have also been introduced and suggested as suitable priors for multinomial models. For more details on possible prior distributions for multinomial models see, for example, O’Hagan and Forster (2004).

Eliciting parameters of multivariate distributions is not, in general, an easy task. It is even more complex when the variates are not independent, in which case summaries of the marginal distributions should be assessed, together with effective and reliable summaries of the dependence structure of the joint distribution [O’Hagan et al. (2006)]. In this chapter, our proposed method makes use of assessments of marginal beta distributions. Decomposing the Dirichlet elicitation process into the assessment of several marginal beta distributions helps reduce the complexity of eliciting a multivariate distribution.

In Section 6.2, we develop a method of quantifying opinion about a beta prior distribution by the assessment of three quartiles. The method will be generalized to elicit a Dirichlet distribution in Section 6.3. The elicited univariate beta distribution will also be used to construct more flexible distributions in the next chapter, including the generalized Dirichlet prior and a Gaussian copula function for the prior distribution.

6.2 Eliciting beta parameters using quartiles

6.2.1 Introduction

The beta distribution is widely used in Bayesian analysis as a conjugate prior for the probability of success in Bernoulli trials.
The domain of definition for the beta distribution of the first type is the interval [0,1], which is appropriate for the probability parameter of Bernoulli and binomial distributions. Moreover, the beta distribution is also a conjugate prior for Bernoulli and binomial sampling distributions, so that the posterior distribution is obtained through simple arithmetic. The wide range of valid values of the two hyperparameters of the beta prior gives it great flexibility, and its pdf can take varied shapes. In this sense, the beta distribution is more likely to be a reasonable model of the expert’s opinion than other priors, such as the uniform distribution over the interval [0,1] or the triangular distribution suggested by van Dorp and Kotz (2002).

It seems that eliciting beta parameters is the most studied elicitation problem to date, whether it is a beta prior for Bernoulli or binomial sampling distributions, a distribution of a probability of an event, or a proportion that ranges between zero and one. There are many methods available in the literature for eliciting beta distribution parameters. A comprehensive literature review may be found in Hughes and Madden (2002), Jenkinson (2007) or O’Hagan et al. (2006).

The available methods for beta elicitation can be classified into two general classes of elicitation methods, variable interval and fixed interval. In the variable interval methods, the probability is fixed and the expert assesses an interval that gives this probability. In the fixed interval methods, the interval is fixed and the expert assesses the probability that the event of interest will be in that interval. Asking about quartiles is an example of the first class of methods, while assessing probabilities is an example of the second.

Beta elicitation methods vary in the quantities that the expert must assess. She may be asked to assess a location value such as the mean, the median or the mode.
Also, a scale value must be assessed, such as the probability of being in an interval, the boundaries of an interval, or the mean absolute deviation about a location value. These quantities may be converted into the hyperparameters in exact forms or through numerical approximation.

Regarding the number of required assessments, most of the available methods use only two assessed quantities, usually one for location and the other for scale. These give estimates of the two beta parameters. Although only two assessments are mathematically needed to elicit two unique parameters, some methods use over-fitting through assessing three or more quantities, followed by some sort of averaging or reconciliation.

In this section we propose a new method of eliciting the parameters of a beta prior distribution for the binomial success probability. Assessments of the median and two quartiles are elicited. A compromise is needed to reconcile these three assessments into two unique parameters. We use a normal approximation to the beta distribution to estimate initial values of the beta parameters, followed by a least-squares technique to optimize the two initial values.

According to the classifications given above, the proposed method is a variable interval method that uses three assessments, a median and two quartiles. We believe that it is better to elicit a median as a location value and quartiles for scale than, say, to elicit a mean and other quantiles. The median and quartiles are easier for an expert to assess as they are obtained by the first two steps of equally likely subdivisions (the bisection method). The expert can be asked about the median as the value that the probability of success is equally likely to be above or below. Then we ask the expert to sub-divide the interval above the median into two equally likely intervals for the probability; her assessed value is her upper quartile.
The same concept is used for the interval below the median in order to obtain her lower quartile.

van Dorp and Mazzuchi (2000, 2003, 2004) introduced a numerical algorithm and software to specify the parameters of the beta distribution and its Dirichlet extension using quantiles. They used the median as a measure of central tendency with any other single quantile as a measure of dispersion. Although they proved that this suffices mathematically for the existence of a unique solution for the beta parameters, it is more useful in elicitation contexts to use over-fitting as a means towards better representation of an expert’s opinion.

6.2.2 Normal approximations for beta elicitation

To estimate the two parameters of the beta distribution using three assessed quartiles, we propose a two-step approach. In the first step, a normal approximation to the beta distribution is used to transform and reconcile the three assessed quartiles into two initial values for the beta parameters. In the second step, a numerical least-squares method is applied to the initial parameter values so as to optimize them. The aim is to find parameter values that give nominal quartiles that are as close as possible to the assessed values. This section is devoted to the proposed normal approximation, while the least-squares optimization is discussed in Section 6.2.3 below.

A method that directly fits a beta distribution to the assessed median and two quartiles is given in Pratt et al. (1995). They used a normal approximation to the beta distribution together with averaging. The method was also used as the main assessment method in a study of the effect of feedback and learning on the assessment of subjective probability distributions (Stael von Holstein, 1971).

Our proposed method adopts the technique of Pratt et al. (1995), but with a different normal approximation and a new compromise to get initial parameter values.
We also add a least-squares optimization technique. In what follows, we summarize the argument of Pratt et al. (1995) and then propose a different normal approximation and a different compromise.

Let $p$ be the success probability of concern, and assume that $p$ has a conjugate standard beta prior distribution of the form

\[ f(p) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, p^{a-1} (1-p)^{b-1}, \qquad 0 < p < 1,\ a > 0,\ b > 0. \tag{6.1} \]

Pratt et al. (1995) stated that the transformation

\[ Z = 2\left\{ [p(b - 1/3)]^{1/2} - [(1-p)(a - 1/3)]^{1/2} \right\} \tag{6.2} \]

has approximately a standard normal distribution.

Let $q_i$ be the $i$th quartile of $p$ that is assessed by the expert, for $i = 1, 2, 3$. Using the assessed lower quartile $q_1$ and the assessed median $q_2$, we get the following two equations from (6.2):

\[ \Pr\left\{ Z < 2\left\{ [q_1(b - 1/3)]^{1/2} - [(1-q_1)(a - 1/3)]^{1/2} \right\} \right\} = 0.25, \tag{6.3} \]

\[ \Pr\left\{ Z < 2\left\{ [q_2(b - 1/3)]^{1/2} - [(1-q_2)(a - 1/3)]^{1/2} \right\} \right\} = 0.5. \tag{6.4} \]

Solving (6.3) and (6.4) for $a$ and $b$ gives

\[ a_1 = c_1 q_2 + \tfrac{1}{3} \tag{6.5} \]

and

\[ b_1 = c_1 (1 - q_2) + \tfrac{1}{3}, \tag{6.6} \]

where

\[ c_1 = 0.112 \left\{ [q_2(1-q_1)]^{1/2} - [q_1(1-q_2)]^{1/2} \right\}^{-2}. \]

Similarly, the assessed upper quartile, $q_3$, gives the equation

\[ \Pr\left\{ Z < 2\left\{ [q_3(b - 1/3)]^{1/2} - [(1-q_3)(a - 1/3)]^{1/2} \right\} \right\} = 0.75. \tag{6.7} \]

Solving (6.4) and (6.7) for $a$ and $b$ gives

\[ a_2 = c_2 q_2 + \tfrac{1}{3} \tag{6.8} \]

and

\[ b_2 = c_2 (1 - q_2) + \tfrac{1}{3}, \tag{6.9} \]

where

\[ c_2 = 0.112 \left\{ [q_2(1-q_3)]^{1/2} - [q_3(1-q_2)]^{1/2} \right\}^{-2}. \]

The compromise of Pratt et al. (1995) is simply to estimate $a$ and $b$ as the averages of (6.5) and (6.8), and of (6.6) and (6.9), respectively, i.e.

\[ a = \frac{a_1 + a_2}{2}, \qquad b = \frac{b_1 + b_2}{2}. \tag{6.10} \]

However, Pratt et al. (1995) did not mention the theoretical derivation of the approximation in (6.2), nor its accuracy. So, we tried to use another approximation that is still mathematically tractable, but whose justification and accuracy have been investigated. Patel and Read (1982) give a good review of some accurate normal approximations to beta variables.
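The averaging compromise in (6.5) to (6.10) is simple enough to write down directly. The sketch below is only an illustration of those formulas; the function name is hypothetical, and the constant 0.112 is taken as given in the text.

```python
import math

def pratt_beta_parameters(q1, q2, q3):
    """Beta parameters from three assessed quartiles by averaging, as in (6.5)-(6.10)."""
    c1 = 0.112 * (math.sqrt(q2 * (1 - q1)) - math.sqrt(q1 * (1 - q2))) ** -2
    c2 = 0.112 * (math.sqrt(q2 * (1 - q3)) - math.sqrt(q3 * (1 - q2))) ** -2
    a1, b1 = c1 * q2 + 1 / 3, c1 * (1 - q2) + 1 / 3   # from lower quartile and median
    a2, b2 = c2 * q2 + 1 / 3, c2 * (1 - q2) + 1 / 3   # from upper quartile and median
    return (a1 + a2) / 2, (b1 + b2) / 2

# Example with symmetric, made-up assessments
a_avg, b_avg = pratt_beta_parameters(0.4, 0.5, 0.6)
```

With symmetric assessments around 0.5 the two sets of estimates coincide, so the averaging has no effect and the fitted distribution is symmetric.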
They describe the following normal approximation as a simple yet accurate approximation. If $p$ has a beta distribution of the form in (6.1), then the transformation

\[ Z = 2\left\{ [p(b - 1/4)]^{1/2} - [(1-p)(a - 1/4)]^{1/2} \right\} \tag{6.11} \]

has an approximate standard normal distribution. The absolute error of this approximation is of order …

We adopt the approximation (6.11) to propose a new elicitation method for the beta parameters $a$ and $b$ using the three assessed quartiles $q_i$, $i = 1, 2, 3$.

Instead of direct averaging, we introduce a new compromise, making use of the characteristics of the normal distribution. In fact, it is well known that

\[ z_{0.75} - z_{0.25} = 1.349, \tag{6.12} \]

where $z_{0.25}$ and $z_{0.75}$ are the lower and upper quartiles of the standard normal distribution, respectively.

In view of the approximation (6.11), we have

\[ [q_2(b - 1/4)]^{1/2} - [(1-q_2)(a - 1/4)]^{1/2} = 0, \tag{6.13} \]

\[ z_{0.25} = 2\left\{ [q_1(b - 1/4)]^{1/2} - [(1-q_1)(a - 1/4)]^{1/2} \right\}, \tag{6.14} \]

\[ z_{0.75} = 2\left\{ [q_3(b - 1/4)]^{1/2} - [(1-q_3)(a - 1/4)]^{1/2} \right\}. \tag{6.15} \]

Substituting (6.14) and (6.15) in (6.12), we get the new compromise between $q_1$ and $q_3$ as

\[ \left\{ [q_3(b - 1/4)]^{1/2} - [(1-q_3)(a - 1/4)]^{1/2} \right\} - \left\{ [q_1(b - 1/4)]^{1/2} - [(1-q_1)(a - 1/4)]^{1/2} \right\} = \frac{1.349}{2}. \tag{6.16} \]

Solving (6.13) and (6.16) for $a$ and $b$, we get

\[ a = c\, q_2 + \tfrac{1}{4} \tag{6.17} \]

and

\[ b = c\,(1 - q_2) + \tfrac{1}{4}, \tag{6.18} \]

where

\[ c = \left( \frac{1.349}{2} \right)^{2} \left\{ [q_2(1-q_1)]^{1/2} - [q_1(1-q_2)]^{1/2} + [q_3(1-q_2)]^{1/2} - [q_2(1-q_3)]^{1/2} \right\}^{-2}. \]

We argue that our method preserves the assessed median value, and that the only compromise is between the two quartiles. We believe this will represent the expert’s opinion better. The expert usually assesses her median with more certainty and less bias than her lower and upper quartiles. By using the new compromise of quartiles in (6.16) and keeping the median equation (6.13) fixed, we reflect the probable greater accuracy of the median assessment.
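The closed-form initial values in (6.17) and (6.18) can be sketched as below, together with a check that they satisfy the median equation (6.13) and the quartile-difference equation (6.16). The function names are hypothetical and the quartile values in the example are made up.

```python
import math

def initial_beta_parameters(q1, q2, q3):
    """Initial beta parameters from (6.17) and (6.18)."""
    s = (math.sqrt(q2 * (1 - q1)) - math.sqrt(q1 * (1 - q2))
         + math.sqrt(q3 * (1 - q2)) - math.sqrt(q2 * (1 - q3)))
    c = (1.349 / 2) ** 2 / s ** 2
    return c * q2 + 0.25, c * (1 - q2) + 0.25

def z_value(q, a, b):
    """Approximate standard normal score of q under the transformation (6.11)."""
    return 2 * (math.sqrt(q * (b - 0.25)) - math.sqrt((1 - q) * (a - 0.25)))

# Example with asymmetric, made-up assessments
a0, b0 = initial_beta_parameters(0.2, 0.3, 0.45)
median_score = z_value(0.3, a0, b0)                            # zero by (6.13)
quartile_gap = z_value(0.45, a0, b0) - z_value(0.2, a0, b0)    # 1.349 by (6.16)
```

The median is matched exactly under the approximation, while the two quartiles share the discrepancy: their approximate normal scores are 1.349 apart but need not equal -0.6745 and 0.6745 individually.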
Given the accuracy of the normal approximation, the proposed initial values of the beta parameters, given in (6.17) and (6.18), lead to nominal values for the beta quartiles that are close to the assessed quartiles. However, they are not guaranteed to be the parameter values that minimize the differences between nominal and assessed quartile values. This is not ideal, so we just treat equations (6.17) and (6.18) as giving initial parameter values that can be improved upon.

6.2.3 Least-squares optimizations for beta parameters

Oakley (2010) gave a least-squares method for choosing beta parameters $a$ and $b$ that minimize

\[ Q = [F(q_1, a, b) - 0.25]^{2} + [F(q_2, a, b) - 0.5]^{2} + [F(q_3, a, b) - 0.75]^{2}, \tag{6.19} \]

where $F(x, a, b)$ is the cdf of a beta distribution with parameters $a$ and $b$ at the point $x$. The same approach has been implemented in the SHELF elicitation framework developed in Oakley and O’Hagan (2010). They introduced an R package of templates and software for conducting elicitation, within which minimizing $Q$ in (6.19) was used to estimate beta parameters from assessed quartiles. However, they do not use any explicit normal approximation to a beta distribution when deriving the initial estimates of the beta parameters. Instead, they just transform the assessed beta quartiles into the mean and variance of a normal distribution, as if the quartiles were assessed for a normal distribution. The mean and variance are then assumed to be those of a beta distribution, from which initial values for the parameters can be computed.

Our accompanying elicitation software, PEGS-Dirichlet, implements programs written by Flanagan (2011) for the Java scientific library. These numerically minimize (6.19), which cannot be minimized analytically. They use a multidimensional technique called the downhill simplex method.
The method was introduced by Nelder and Mead (1965) as a quick multidimensional minimization method that uses only function evaluations, not derivatives. To constrain the beta parameters to be positive, we transform them to a logarithmic scale. Hence we actually minimize

\[ Q = \{F[q_1, \exp(a^{*}), \exp(b^{*})] - 0.25\}^{2} + \{F[q_2, \exp(a^{*}), \exp(b^{*})] - 0.5\}^{2} + \{F[q_3, \exp(a^{*}), \exp(b^{*})] - 0.75\}^{2} \]

for $a^{*}$ and $b^{*}$, with initial values as in (6.17) and (6.18), but on the logarithmic scale, i.e. $\log(a)$ and $\log(b)$. The final resulting beta parameter values are thus $\exp(a^{*})$ and $\exp(b^{*})$.

Our elicitation software, PEGS-Dirichlet, presents an interactive graph to the expert showing the previously assessed probability medians of all categories. The expert is asked to assess a lower and an upper probability quartile for each category by clicking on the graph. Once the two required quartiles are assessed for any single category, the proposed method of beta parameter elicitation is implemented by the software on the probability of this category. A pop-up window opens showing the pdf graph of the elicited beta distribution with the location of the three assessed quartiles. This gives instant feedback to the expert, see Figure 6.1.

Figure 6.1: Assessing probability quartiles of each category

If she is not satisfied with the fitted beta distribution, the expert can simply change her assessments of the two quartiles. The whole elicitation process is applied again whenever the expert changes her quartile assessments.
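The two-step fit (closed-form initial values, then a downhill simplex search on the logarithmic scale) can be sketched in a self-contained way. This is not the Flanagan (2011) Java code used by PEGS-Dirichlet: the beta cdf is computed with the standard continued-fraction expansion, the simplex routine is a simplified Nelder-Mead variant written for this illustration, and all function names, the simplex step and the tolerances are assumptions.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    """F(x, a, b): cdf of a beta(a, b) distribution at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def initial_beta_parameters(q1, q2, q3):
    """Initial values from (6.17) and (6.18)."""
    s = (math.sqrt(q2 * (1 - q1)) - math.sqrt(q1 * (1 - q2))
         + math.sqrt(q3 * (1 - q2)) - math.sqrt(q2 * (1 - q3)))
    c = (1.349 / 2) ** 2 / s ** 2
    return c * q2 + 0.25, c * (1 - q2) + 0.25

def make_objective(q1, q2, q3):
    """Q of (6.19) as a function of (log a, log b)."""
    def objective(log_params):
        a, b = math.exp(log_params[0]), math.exp(log_params[1])
        return sum((beta_cdf(q, a, b) - p) ** 2
                   for q, p in zip((q1, q2, q3), (0.25, 0.5, 0.75)))
    return objective

def nelder_mead(f, x0, step=0.1, tol=1e-14, max_iter=1000):
    """Simplified downhill simplex search (reflect, expand, contract, shrink)."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        def along(t):
            # Point centroid + t * (centroid - worst)
            return [centroid[j] + t * (centroid[j] - worst[j]) for j in range(n)]
        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every vertex towards the best one
                simplex = [best] + [[best[j] + 0.5 * (p[j] - best[j]) for j in range(n)]
                                    for p in simplex[1:]]
    simplex.sort(key=f)
    return simplex[0]

def fit_beta(q1, q2, q3):
    """Least-squares beta parameters for three assessed quartiles."""
    a0, b0 = initial_beta_parameters(q1, q2, q3)
    best = nelder_mead(make_objective(q1, q2, q3), [math.log(a0), math.log(b0)])
    return math.exp(best[0]), math.exp(best[1])
```

Because the starting point is one vertex of the initial simplex and the best vertex is never discarded, the returned parameters cannot give a larger value of $Q$ than the initial values from (6.17) and (6.18).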
The pdf curve changes interactively to show the direct impact of changing quartiles.

On finishing the elicitation process for all categories, the beta parameters are then reconciled to estimate the Dirichlet hyperparameter vector, as discussed in Section 6.3 below.

6.3 Eliciting a Dirichlet prior for a multinomial model

6.3.1 Introduction

A limited number of attempts have been made to develop elicitation methods for Dirichlet parameters; see Chapter 2 for more details. Jenkinson (2007) and O’Hagan et al. (2006) discussed two methods for Dirichlet elicitation, namely the method of Dickey et al. (1983) and that of Chaloner and Duncan (1987).

The elicitation method suggested by Dickey et al. (1983) starts by assessing the probability of each category directly from the expert. She will then be given a hypothetical future sample of a fixed size and told the number of items in each category. She is asked to re-assess the probabilities given this hypothetical sample. The equivalent sample size that corresponds to her prior knowledge can thus be estimated using Bayes’ theorem.

Chaloner and Duncan (1983) give a method for eliciting a beta distribution. Chaloner and Duncan (1987) generalize this method and give an interactive graphical tool for Dirichlet elicitation. This is based on assessing the sample size and the modal values of Dirichlet variates, and then giving feedback to adjust the parameter values.

As mentioned before, van Dorp and Mazzuchi (2003, 2004) introduced a numerical algorithm that yields the Dirichlet parameters from quantile assessments. Their algorithm uses $k$ quantile assessments to estimate all the parameters of a $k$-dimensional Dirichlet distribution. However, we believe that it is better to assess more than $k$ quantiles and then apply some form of reconciliation to estimate the parameters.
Assuming a Dirichlet prior for the success probabilities is one way of reconciling separate marginal beta prior distributions. Eliciting a Dirichlet prior by using assessed beta marginal distributions was outlined in Bunn (1978, 1979). However, his elicitation method used the hypothetical future sample technique. He stated that the application of the usual univariate quantile methods may generally be difficult and tedious in practice because of the multivariate nature of the Dirichlet distribution. However, the availability of interactive graphs and efficient computing enables us to use the quantile method in an elicitation method that is quick and easy for the assessor.

In what follows, we propose some reconciliation methods, based on the Dirichlet distribution, for combining beta marginals that have already been assessed using the method introduced in Section 6.2.

6.3.2 The multinomial and Dirichlet distributions

Let the random vector $\mathbf{X} = (X_1, X_2, \dots, X_k)$ be multinomially distributed with $k$ categories, $n$ trials and a vector of probabilities $\mathbf{p} = (p_1, p_2, \dots, p_k)$, with probability mass function of the form

\[
f(x_1, x_2, \dots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}, \tag{6.20}
\]
\[
0 \le x_i \le n, \quad \sum_{i=1}^{k} x_i = n, \quad 0 \le p_i \le 1, \quad \sum_{i=1}^{k} p_i = 1,
\]

or, equivalently, in the form

\[
f(x_1, x_2, \dots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_{k-1}^{x_{k-1}} (1 - p_1 - p_2 - \cdots - p_{k-1})^{x_k}, \tag{6.21}
\]
\[
0 \le x_i \le n, \quad \sum_{i=1}^{k} x_i = n, \quad 0 \le p_i \le 1, \quad \sum_{i=1}^{k-1} p_i \le 1.
\]

A conjugate prior for the parameter vector $\mathbf{p}$ is the Dirichlet distribution, which has the form

\[
\pi(p_1, p_2, \dots, p_k) = \frac{\Gamma(N)}{\Gamma(a_1)\Gamma(a_2) \cdots \Gamma(a_k)}\; p_1^{a_1 - 1} p_2^{a_2 - 1} \cdots p_k^{a_k - 1}, \tag{6.22}
\]
\[
0 \le p_i \le 1, \quad \sum_{i=1}^{k} p_i = 1, \quad a_i > 0, \quad N = \sum_{i=1}^{k} a_i,
\]

or, equivalently, the form

\[
\pi(p_1, p_2, \dots, p_{k-1}) = \frac{\Gamma(N)}{\Gamma(a_1)\Gamma(a_2) \cdots \Gamma(a_k)}\; p_1^{a_1 - 1} p_2^{a_2 - 1} \cdots p_{k-1}^{a_{k-1} - 1} (1 - p_1 - p_2 - \cdots - p_{k-1})^{a_k - 1}, \tag{6.23}
\]
\[
0 \le p_i \le 1, \quad \sum_{i=1}^{k-1} p_i \le 1, \quad a_i > 0, \quad N = \sum_{i=1}^{k} a_i.
\]

It is well known that the expectations, variances and covariances of the Dirichlet variates $p_i$, for $i = 1, 2, \dots, k$, are given by

\[
E(p_i) = \frac{a_i}{N}, \tag{6.24}
\]
\[
\mathrm{Var}(p_i) = \frac{a_i (N - a_i)}{N^2 (N + 1)}, \tag{6.25}
\]
\[
\mathrm{Cov}(p_i, p_j) = -\frac{a_i a_j}{N^2 (N + 1)}, \quad i \ne j. \tag{6.26}
\]

To elicit the vector of hyperparameters $\mathbf{a} = (a_1, a_2, \dots, a_k)$, we use the direct relation between the Dirichlet distribution and its special univariate case, the beta distribution. We have already developed, in Section 6.2, a method of eliciting the two hyperparameters of a beta distribution. The hyperparameters of the Dirichlet distribution can be obtained from those of the univariate beta distributions through some form of reconciliation. This can be done using either the standard marginal beta distributions of the multinomial probabilities, or the conditional scaled beta distribution of each of them. In what follows, these two proposed approaches are given in detail.

6.3.3 The marginal approach

Consider the form in (6.20) for the multinomial distribution with the conjugate prior Dirichlet distribution in (6.22). It is well known that, from (6.20), the marginal distribution of each $X_i$ is a binomial distribution with parameters $n$ and $p_i$, $i = 1, 2, \dots, k$. It is straightforward to show, using the Dirichlet pdf in (6.22), that the marginal distribution of each $p_i$ is a beta distribution:

\[
p_i \sim \mathrm{beta}(\alpha_i, \beta_i), \quad \text{for } i = 1, 2, \dots, k, \tag{6.27}
\]

where

\[
\alpha_i = a_i, \qquad \beta_i = N - a_i, \quad \text{for } i = 1, 2, \dots, k. \tag{6.28}
\]

Assessment tasks

Exploiting the beta marginal distributions, the elicitation process may be divided into $k$ steps. At each step, the expert is asked to assess three quartiles for $p_i$, the binomial probability of category $i$ ($i = 1, 2, \dots, k$). See Figure 6.1, where the lower and upper quartiles have already been elicited for the first two categories. These quartiles can then be used to estimate the two hyperparameters $\alpha_i$ and $\beta_i$ of the beta prior distribution of $p_i$, as proposed in Section 6.2.

Since we use the marginal approach, the categories here are interchangeable: neither the starting category nor the order of the categories matters. To reconcile these separate marginal beta distributions into a Dirichlet distribution, we use a least-squares technique as follows.
Least-squares techniques

The system of equations in (6.28) generally does not have a consistent solution $\mathbf{a} = (a_1, a_2, \dots, a_k)'$. From (6.28), each marginal step of the elicitation process provides estimates of $a_i$ and $N_i$, namely

\[
a_i = \alpha_i, \quad \text{for } i = 1, 2, \dots, k, \tag{6.29}
\]

and

\[
N_i = \alpha_i + \beta_i, \quad \text{for } i = 1, 2, \dots, k. \tag{6.30}
\]

The estimated hyperparameters must fulfil the unit-sum constraint of the probability expectations, i.e. they must satisfy

\[
\sum_{i=1}^{k} \mu_i = 1, \quad \text{where } \mu_i = \frac{a_i}{N_i}, \quad i = 1, 2, \dots, k. \tag{6.31}
\]

Lindley et al. (1979) investigated the reconciliation of assessments that are inconsistent with the laws of probability (incoherent). They developed least-squares procedures as reconciliation tools that may be used for any expert's incoherent assessments. Following their approach, we propose the following options for reconciling the incoherent estimates $\mu_i$ and $N_i$, yielding coherent estimates $\mu_i^*$ and $N^*$, respectively.

Options for $\mu_i^*$:

1. Normalize each $\mu_i$, as required for the Dirichlet distribution, giving

\[
\mu_i^* = \frac{\mu_i}{\sum_{j=1}^{k} \mu_j}, \quad i = 1, 2, \dots, k. \tag{6.32}
\]

2. Minimize the sum of squares of differences between $\mu_i^*$ and $\mu_i$, $i = 1, 2, \dots, k$, subject to the constraint $\sum_{i=1}^{k} \mu_i^* = 1$. This can be done using Lagrange multipliers to minimize $Q$ as follows:

\[
\text{Minimize } Q = \sum_{i=1}^{k} (\mu_i^* - \mu_i)^2 + \lambda \Big( \sum_{i=1}^{k} \mu_i^* - 1 \Big). \tag{6.33}
\]

Solving for $\mu_i^*$ gives

\[
\mu_i^* = \mu_i + \frac{1}{k} \Big( 1 - \sum_{j=1}^{k} \mu_j \Big), \quad i = 1, 2, \dots, k. \tag{6.34}
\]

However, the values of $\mu_i^*$ computed here using Lagrangian optimization are not guaranteed to be positive. If negative values are found, we replace the Lagrange multiplier method with a numerical constrained minimization technique. The downhill simplex method of Nelder and Mead (1965) can also perform constrained minimization as follows:

\[
\text{Minimize } Q = \sum_{i=1}^{k} (\mu_i^* - \mu_i)^2, \tag{6.35}
\]

such that

\[
0 < \mu_i^* < 1, \quad i = 1, 2, \dots, k, \tag{6.36}
\]
\[
\sum_{i=1}^{k} \mu_i^* = 1. \tag{6.37}
\]

To solve this constrained optimization problem for $\mu_i^*$, $i = 1, 2, \dots, k$, our elicitation software, PEGS-Dirichlet, uses a minimization program written by Flanagan (2011).
The initial values for this method are obtained from (6.32).

3. The option in (6.34) changes each value of $\mu_i$ by adding a fixed amount. However, the precision of each estimate, i.e. the inverse of its variance, can be used as a weight to reflect the expert's confidence in each of her assessments [Lindley et al. (1979)]. A constrained weighted least-squares procedure can be formulated as follows:

\[
\text{Minimize } Q = \sum_{i=1}^{k} w_i (\mu_i^* - \mu_i)^2 + \lambda \Big( \sum_{i=1}^{k} \mu_i^* - 1 \Big), \tag{6.38}
\]

where

\[
w_i = [\mathrm{Var}(p_i)]^{-1} = \left[ \frac{\alpha_i \beta_i}{(\alpha_i + \beta_i)^2 (\alpha_i + \beta_i + 1)} \right]^{-1}, \quad i = 1, 2, \dots, k. \tag{6.39}
\]

Solving for $\mu_i^*$ gives

\[
\mu_i^* = \mu_i + \frac{1}{w_i} \cdot \frac{1 - \sum_{j=1}^{k} \mu_j}{\sum_{j=1}^{k} (1 / w_j)}, \quad i = 1, 2, \dots, k.
\]

Again, the constrained downhill simplex minimization is used if negative values of $\mu_i^*$ are found:

\[
\text{Minimize } Q = \sum_{i=1}^{k} w_i (\mu_i^* - \mu_i)^2, \tag{6.40}
\]

under the same constraints given by (6.36) and (6.37), using initial values as in (6.32).

Options for $N^*$:

1. Since no constraints are imposed on $N^*$, minimizing the sum of squares

\[
Q = \sum_{i=1}^{k} (N^* - N_i)^2
\]

gives the average

\[
N^* = \frac{1}{k} \sum_{i=1}^{k} N_i. \tag{6.41}
\]

2. Using the same weights as in (6.39) gives the weighted average

\[
N^* = \frac{\sum_{i=1}^{k} w_i N_i}{\sum_{i=1}^{k} w_i} \tag{6.42}
\]

as the solution of

\[
\text{Minimize } Q = \sum_{i=1}^{k} w_i (N^* - N_i)^2.
\]

Having estimated $\mu_i^*$ and $N^*$ using any of the options listed above, we estimate $a_i$ by $a_i^*$, where

\[
a_i^* = \mu_i^* N^*, \quad i = 1, 2, \dots, k.
\]

Implementation and feedback

We use three different combinations of the options given above:

1. Direct normalization for $\mu_i^*$ as in (6.32), and the average $N^*$ in (6.41).

2. Least-squares optimization for $\mu_i^*$ as in (6.33) or (6.35), and for $N^*$ as in (6.41).

3. Weighted least-squares optimization for $\mu_i^*$ as in (6.38) or (6.40), and for $N^*$ as in (6.42).

The software computes three hyperparameter vectors of the Dirichlet distribution, one vector for each of the above combinations. Each vector is then used to compute the corresponding pairs of marginal beta parameters as given in (6.28).
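As a minimal sketch of the three combinations listed above, assuming the closed-form Lagrangian solutions apply (no negative values arise) and that the $k$ marginal beta fits are already available, the reconciliation might be written as follows. The function name `reconcile_marginal` and the numerical inputs are illustrative assumptions and are not taken from PEGS-Dirichlet; the constrained simplex fallback is not shown.

```python
from typing import List, Tuple


def reconcile_marginal(betas: List[Tuple[float, float]], option: int = 1) -> List[float]:
    """Reconcile k marginal beta(alpha_i, beta_i) fits into a Dirichlet vector a*.

    option 1: normalisation of the means with the plain average of N_i
    option 2: unweighted Lagrangian least squares with the plain average
    option 3: precision-weighted least squares with the weighted average
    (hypothetical helper written for illustration only)
    """
    k = len(betas)
    mu = [a / (a + b) for a, b in betas]   # marginal means mu_i
    n = [a + b for a, b in betas]          # N_i = alpha_i + beta_i
    # Weights w_i = 1 / Var(p_i) for a beta(alpha_i, beta_i) variate.
    w = [(a + b) ** 2 * (a + b + 1) / (a * b) for a, b in betas]
    gap = 1.0 - sum(mu)                    # shortfall from the unit-sum constraint

    if option == 1:
        mu_star = [m / sum(mu) for m in mu]
        n_star = sum(n) / k
    elif option == 2:
        mu_star = [m + gap / k for m in mu]
        n_star = sum(n) / k
    elif option == 3:
        inv_w_sum = sum(1.0 / wi for wi in w)
        mu_star = [m + (1.0 / wi) * gap / inv_w_sum for m, wi in zip(mu, w)]
        n_star = sum(wi * ni for wi, ni in zip(w, n)) / sum(w)
    else:
        raise ValueError("option must be 1, 2 or 3")

    if any(m <= 0.0 for m in mu_star):
        # A constrained numerical search would be needed here; not sketched.
        raise ValueError("closed-form solution gave a non-positive mean")
    return [m * n_star for m in mu_star]


if __name__ == "__main__":
    fits = [(12.0, 8.0), (5.0, 17.0), (2.0, 16.0)]  # hypothetical beta fits
    for opt in (1, 2, 3):
        print(opt, [round(x, 3) for x in reconcile_marginal(fits, opt)])
```

When the input fits are already coherent (their means sum to one and all $N_i$ agree), the three options return the same vector, which is a quick sanity check on any implementation.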
Three quartiles for each beta marginal are computed numerically for each different Dirichlet hyperparameter vector. The three sets of quartiles are then displayed to the expert and she is asked to select the set of quartiles that best represents her opinion. The vector with the selected set of quartiles is taken as the final elicited hyperparameter vector of the Dirichlet prior. See Figure 6.2, where the first two combinations are shown and the expert has selected the second one.

Figure 6.2: A feedback screen showing 2 different quartile options

The expert is still able, however, to modify any or all of the selected set of quartiles, in which case beta parameters are computed again as in Section 6.2, and the final Dirichlet hyperparameter vector is computed according to equations (6.29)-(6.32) and (6.41).

6.3.4 The conditional approach

Consider the form of the multinomial distribution given in (6.21), with the form of the conjugate Dirichlet distribution given in (6.23). If

\[
\mathbf{p}_{k-1} = (p_1, p_2, \dots, p_{k-1}) \sim \mathrm{Dirichlet}(a_1, a_2, \dots, a_k),
\]

then it can be shown [e.g. Wilks (1962)] that the marginal distribution of any subset of $\mathbf{p}_{k-1}$ is again a Dirichlet distribution, e.g.

\[
\mathbf{p}_r = (p_1, p_2, \dots, p_r) \sim \mathrm{Dirichlet}\Big(a_1, a_2, \dots, a_r, \sum_{i=r+1}^{k} a_i\Big), \quad 1 \le r \le k - 1.
\]

For $1 < r \le k - 1$, we can get the following conditional scaled beta distributions

\[
\pi(p_r \mid p_1, p_2, \dots, p_{r-1}) = \frac{\pi(p_1, p_2, \dots, p_r)}{\pi(p_1, p_2, \dots, p_{r-1})} \tag{6.43}
\]
\[
= \frac{\Gamma\big(\sum_{i=r}^{k} a_i\big)}{\Gamma(a_r)\, \Gamma\big(\sum_{i=r+1}^{k} a_i\big)} \cdot \frac{p_r^{a_r - 1} \big(1 - \sum_{i=1}^{r} p_i\big)^{\sum_{i=r+1}^{k} a_i - 1}}{\big(1 - \sum_{i=1}^{r-1} p_i\big)^{\sum_{i=r}^{k} a_i - 1}}, \tag{6.44}
\]

which are scaled beta distributions over the intervals $(0, 1 - \sum_{i=1}^{r-1} p_i)$. The distributions in (6.44) are also known as three-parameter beta distributions, i.e.

\[
(p_r \mid p_1, p_2, \dots, p_{r-1}) \sim \mathrm{beta}\Big(a_r, \sum_{i=r+1}^{k} a_i, 1 - \sum_{i=1}^{r-1} p_i\Big), \quad \text{for } 1 < r \le k - 1.
\]

Applying the transformation

\[
p_r^* = \begin{cases} p_1, & \text{for } r = 1, \\[4pt] \dfrac{p_r}{1 - \sum_{i=1}^{r-1} p_i}, & \text{for } r = 2, 3, \dots, k - 1, \end{cases}
\]

gives

\[
p_r^* \sim \mathrm{beta}\Big(a_r, \sum_{i=r+1}^{k} a_i\Big), \quad \text{for } r = 1, 2, \dots, k - 1. \tag{6.45}
\]

Assessment tasks

The elicitation process is conducted as follows:

• The expert chooses the most convenient category to start with; we denote its probability as $p_1$.

• The expert assesses three quartiles for $p_1$, which are then converted into estimates of the two hyperparameters $\alpha_1$ and $\beta_1$ of the beta distribution, beta($\alpha_1$, $\beta_1$).

• The expert is asked to assume that the median value she gave in the first step is the correct value of $p_1$, and she then assesses three quartiles for $p_2$. Figure 6.3 shows the graph after the median and lower quartile of the second category have been assessed by the expert, given the median of the first category as shown by the red bar.

Figure 6.3: Assessing conditional quartiles for Dirichlet elicitation

• Dividing each of the three quartiles of $p_2$ by $1 - p_1$, we get the quartiles of $p_2^*$. Hence we obtain estimates of the hyperparameters $\alpha_2$ and $\beta_2$ of the marginal beta distribution of $p_2^*$.

• The process is repeated for each category except for the last one. For $r = 3, 4, \dots, k - 1$, the expert gives quartiles for $(p_r \mid p_1, p_2, \dots, p_{r-1})$. Dividing by $1 - \sum_{i=1}^{r-1} p_i$ gives the three quartiles of $p_r^*$, which are used to estimate the two hyperparameters $\alpha_r$ and $\beta_r$ of its marginal distribution. (We do not require the marginal distribution of $p_k$.)

• To help the expert during this task, the software presents an interactive graph showing the pdf curve of the conditional beta distribution of $(p_r \mid p_1, p_2, \dots, p_{r-1})$ for $r = 2, 3, \dots, k - 1$, see Figure 6.4.
The expert is able to change her assessed conditional quartiles of $p_r$ until she finds the conditional pdf curve an acceptable representation of her opinion.

Figure 6.4: Assessing conditional quartiles with scaled beta feedback

Eliciting the hyperparameter vector

Using (6.45), we get the following system of equations:

\[
\alpha_r = a_r, \qquad \beta_r = \sum_{i=r+1}^{k} a_i, \quad \text{for } r = 1, 2, \dots, k - 1. \tag{6.46}
\]

Each elicited beta distribution has its own estimate of $N$, given by

\[
N_r = \sum_{i=1}^{r-1} \alpha_i + \alpha_r + \beta_r, \tag{6.47}
\]

based on $\alpha_i$, $i = 1, 2, \dots, r - 1$, which have been estimated in previous steps.

The system of equations in (6.46), as in the marginal approach, might not be consistent nor have a unique solution for $\mathbf{a} = (a_1, a_2, \dots, a_k)$. So, we try to find a way of averaging this system to get a vector of estimates $\mathbf{a}^* = (a_1^*, a_2^*, \dots, a_k^*)$ that is a good representation of the expert's opinion. We believe that keeping the mean value fixed, where possible, while moving from different beta distributions to a Dirichlet distribution may be a sensible approach. Using (6.24), put

\[
\mu_r = \frac{a_r}{N}, \quad r = 1, 2, \dots, k.
\]

Hence, in view of (6.46) and (6.47),

\[
\mu_r = \begin{cases} \dfrac{\alpha_r}{\sum_{i=1}^{r-1} \alpha_i + \alpha_r + \beta_r}, & r = 1, 2, \dots, k - 1, \\[10pt] \dfrac{\beta_{k-1}}{\sum_{i=1}^{k-1} \alpha_i + \beta_{k-1}}, & r = k. \end{cases} \tag{6.48}
\]

Since, for the Dirichlet distribution, it is required that

\[
\sum_{r=1}^{k} \mu_r = 1,
\]

we normalize the set of $\mu_r$, for $r = 1, 2, \dots, k$, to get

\[
\mu_r^* = \frac{\mu_r}{\sum_{j=1}^{k} \mu_j}.
\]

Moreover, let $N^*$ denote the reconciled estimate of $N$, and take

\[
a_r^* = \mu_r^* N^*, \quad r = 1, 2, \dots, k.
\]

It remains now to find a proper estimate $N^*$. We take this as the average of all the denominators in (6.48):

\[
N^* = \frac{1}{k} \left[ \sum_{r=1}^{k-1} \Big( \sum_{i=1}^{r-1} \alpha_i + \alpha_r + \beta_r \Big) + \sum_{i=1}^{k-1} \alpha_i + \beta_{k-1} \right].
\]

Changing the expert's selection of the first category, as well as the order of conditioning categories at each step, will lead to different estimates of $\mathbf{a}$.
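The normalization and averaging steps above can be sketched as follows. This is a minimal illustration under the assumption that the $k - 1$ pairs $(\alpha_r, \beta_r)$ have already been elicited in conditioning order; the function name `reconcile_conditional` and the example numbers are hypothetical and do not come from the PEGS-Dirichlet software.

```python
from typing import List, Tuple


def reconcile_conditional(betas: List[Tuple[float, float]]) -> List[float]:
    """Turn k-1 elicited beta(alpha_r, beta_r) fits for p*_r into a Dirichlet a*.

    Hypothetical helper: follows the mean-preserving normalisation and the
    averaging of the denominators described in the text.
    """
    k = len(betas) + 1
    mu, denominators = [], []
    running = 0.0                       # sum of alpha_i over earlier steps
    for alpha, beta in betas:
        n_r = running + alpha + beta    # this step's own estimate of N
        mu.append(alpha / n_r)          # mean of category r
        denominators.append(n_r)
        running += alpha
    # Last category: a_k is estimated by beta_{k-1}.
    last_beta = betas[-1][1]
    n_k = running + last_beta
    mu.append(last_beta / n_k)
    denominators.append(n_k)

    total = sum(mu)
    mu_star = [m / total for m in mu]   # enforce the unit-sum constraint
    n_star = sum(denominators) / k      # average of all the denominators
    return [m * n_star for m in mu_star]


if __name__ == "__main__":
    # Hypothetical, slightly incoherent assessments for k = 3 categories.
    print([round(x, 3) for x in reconcile_conditional([(2.0, 8.0), (3.0, 4.0)])])
```

Because each step's denominator accumulates the earlier $\alpha_i$, the output depends on which category is assessed first and on the order of conditioning.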
To overcome this, one possibility is to repeat the whole process several times, using different starting categories and orderings. This gives several estimates $\mathbf{a}^*$, for which a simple average might be a suitable choice. However, showing the quartiles of the marginal beta distributions as feedback to the expert, and offering her the option of changing them, seems another sensible option.

Feedback

The feedback process for the conditional approach is similar to that for the marginal approach. The main difference is in the relationship between the Dirichlet hyperparameters and the beta parameters in the two approaches. To present the quartiles of each probability $p_i$, $i = 1, 2, \dots, k$, as feedback to the expert after applying the conditional elicitation approach, we must first compute the parameters of the marginal beta distributions.

The two parameters $\alpha_i$ and $\beta_i$ of each marginal beta distribution of $p_i$, $i = 1, 2, \dots, k$, can be simply computed from the already elicited hyperparameter vector $\mathbf{a}^*$ of the Dirichlet distribution:

\[
\alpha_i = a_i^*, \qquad \beta_i = \sum_{j=1,\, j \ne i}^{k} a_j^*.
\]

These can be used to compute numerically the three quartiles of each beta marginal distribution. The computed quartiles are then presented to the expert as feedback, see Figure 6.5.

Figure 6.5: The feedback graph presenting marginal quartiles

The expert is asked to change any of the quartiles that do not satisfactorily represent her opinion. If any (or all) of these marginal quartiles are changed by the expert, we apply the marginal approach to re-elicit the Dirichlet hyperparameters as follows.
The new set of modified marginal beta quartiles is used to elicit new pairs of beta parameters as proposed in Section 6.2. Using these new parameters, together with equations (6.29), (6.30) and (6.31), we apply the first combination proposed in the marginal approach in Section 6.3.3. We implement the first combination, which uses simple averaging, as a quick and straightforward way to recompute the Dirichlet hyperparameter vector using the new set of modified quartiles. The whole process can be repeated until the expert is satisfied with the quartiles in the feedback.

6.4 Concluding comments

A reasonable method for eliciting beta parameters using quartiles has been proposed. The method combines two different approaches that have been used separately in the literature. A normal approximation was used to compute initial parameter values, which were then optimized using a least-squares technique.

In order to elicit the hyperparameter vector of the Dirichlet distribution, we made use of both the marginal and conditional beta distributions in two different approaches. The two approaches are programmed in the PEGS-Dirichlet software that is freely available at http://statistics.open.ac.uk/elicitation.

As it is the simplest conjugate prior distribution for multinomial models, the Dirichlet distribution is very tractable. However, its lack of flexibility limits its usefulness as a prior distribution. In the next chapter, we discuss the drawbacks of the Dirichlet distribution and propose new elicitation methods that give more flexible prior distributions for multinomial models.

Chapter 7

Eliciting more flexible priors for multinomial models

7.1 Introduction

Being a conjugate prior for multinomial models, the standard Dirichlet distribution is widely used for its tractability and mathematical simplicity.
However, the Dirichlet distribution in its standard form has been criticized as insufficiently flexible to represent prior information about the parameters of multinomial models [e.g. Good (1976), Aitchison (1986), O'Hagan and Forster (2004), Wong (2007)]. The main criticisms of the Dirichlet distribution can be summarized as follows.

1. It has a limited number of parameters. A $k$-variate Dirichlet distribution is specified by only $k$ parameters. These determine all the $k$ means, $k$ variances and the $k(k-1)/2$ covariances, as given in (6.24)-(6.26).

2. The relative magnitudes of the $a_i$ determine the prior means, while only the overall magnitude $N = \sum a_i$ determines all the variances and covariances if the means are kept fixed.

3. Consequently, the dependence structure between Dirichlet variates cannot be determined independently of its mean values.

4. Dirichlet variates are always negatively correlated, as can be seen from the covariance formulae in (6.26), which may not represent prior belief.

5. Dirichlet variates that have the same mean necessarily have equal variances.

Motivated by these deficiencies, many authors have been interested in constructing new families of distributions for proportions to allow more general dependence structures [e.g. Leonard (1975), Aitchison (1982), Albert and Gupta (1982), Krzysztofowicz and Reese (1993), Rayens and Srinivasan (1994), Tian et al. (2010)].

Some of these new distributions are direct generalizations of the standard Dirichlet distribution [e.g. Dickey (1968, 1983), Connor and Mosimann (1969), Grunwald et al. (1993), Hankin (2010)]. We select one of them and develop a method of eliciting its hyperparameters as a prior distribution for the multinomial model. The selected generalized Dirichlet distribution shares some of the desirable properties of the standard Dirichlet distribution. It is conjugate, reasonably tractable and can be elicited via the beta elicitation procedure proposed in Chapter 6.
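Several of the criticisms listed above can be checked numerically from the standard Dirichlet moment formulas, $E(p_i) = a_i/N$, $\mathrm{Var}(p_i) = a_i(N - a_i)/\{N^2(N+1)\}$ and $\mathrm{Cov}(p_i, p_j) = -a_i a_j/\{N^2(N+1)\}$. The following is only an illustrative sketch; the helper name `dirichlet_moments` and the hyperparameter values are assumptions chosen for the demonstration.

```python
from typing import List, Tuple


def dirichlet_moments(a: List[float]) -> Tuple[List[float], List[List[float]]]:
    """Return the mean vector and covariance matrix of a Dirichlet(a).

    Hypothetical helper using the standard closed-form moments.
    """
    n = sum(a)
    scale = n * n * (n + 1.0)
    mean = [ai / n for ai in a]
    cov = [
        [ai * (n - ai) / scale if i == j else -ai * aj / scale
         for j, aj in enumerate(a)]
        for i, ai in enumerate(a)
    ]
    return mean, cov


if __name__ == "__main__":
    a = [2.0, 2.0, 6.0]                       # hypothetical hyperparameters
    mean, cov = dirichlet_moments(a)
    mean10, cov10 = dirichlet_moments([10.0 * x for x in a])
    # Same means, but every variance and covariance shrinks together.
    print(mean, mean10)
    print(cov[0][0], cov10[0][0])
    # Equal means (categories 1 and 2) force equal variances.
    print(cov[0][0] == cov[1][1])
    # Every off-diagonal entry is negative.
    print(all(cov[i][j] < 0 for i in range(3) for j in range(3) if i != j))
```

Scaling all $a_i$ by a common factor is the only way to change the spread without moving the means, which is the lack of flexibility described in points 2 and 3.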
The method of eliciting a generalized Dirichlet distribution is given in Section 7.2 and an example illustrating its use is given in Section 7.3. A Gaussian copula function is proposed in Section 7.4 as a flexible multivariate distribution that combines the marginal beta distributions that an expert has assessed.

7.2 Eliciting a generalized Dirichlet prior for a multinomial model

7.2.1 Connor-Mosimann distribution

Connor and Mosimann (1969) introduced a form of the generalized Dirichlet distribution that is also known as the Connor-Mosimann distribution. It has a more general covariance structure than the standard Dirichlet distribution and a larger number of parameters, $2(k - 1)$. Its properties have been investigated by Lochner (1975) and Wong (1998), who used it as a prior distribution in a real-life engineering application in Wong (2005) and addressed its maximum likelihood estimation in Wong (2010).

The density function can be written in the form [Connor and Mosimann (1969)]

\[
\pi(p_1, p_2, \dots, p_k) = \left[ \prod_{i=1}^{k-1} \frac{\Gamma(a_i + b_i)}{\Gamma(a_i)\Gamma(b_i)}\; p_i^{a_i - 1} \Big( \sum_{j=i}^{k} p_j \Big)^{b_{i-1} - (a_i + b_i)} \right] p_k^{b_{k-1} - 1}, \tag{7.1}
\]
\[
0 \le p_i \le 1, \quad \sum_{i=1}^{k} p_i = 1, \quad a_i > 0, \quad b_i > 0, \quad b_0 \text{ is arbitrary},
\]

or, equivalently, in the form [Lochner (1975)]

\[
\pi(p_1, p_2, \dots, p_{k-1}) = \prod_{i=1}^{k-1} \frac{\Gamma(a_i + b_i)}{\Gamma(a_i)\Gamma(b_i)}\; p_i^{a_i - 1} (1 - p_1 - p_2 - \cdots - p_i)^{\gamma_i}, \tag{7.2}
\]
\[
0 \le p_i \le 1, \quad \sum_{i=1}^{k-1} p_i \le 1, \quad a_i > 0, \quad b_i > 0,
\]

where $\gamma_i = b_i - (a_{i+1} + b_{i+1})$, for $i = 1, 2, \dots, k - 2$, and $\gamma_{k-1} = b_{k-1} - 1$.

The standard Dirichlet distribution is a special case of the Connor-Mosimann distribution when $b_i = a_{i+1} + b_{i+1}$, for $i = 1, 2, \dots, k - 2$. Moreover, it is also a conjugate prior to the multinomial distribution. See, for example, Wong (1998).

This generalized Dirichlet distribution can be obtained by transforming $(k - 1)$ independent beta variates $Z_1, Z_2, \dots, Z_{k-1}$, each with parameters $a_i$ and $b_i$, for $i = 1, 2, \dots, k - 1$, as follows:

\[
p_j = \begin{cases} Z_1, & \text{for } j = 1, \\[4pt] Z_j \displaystyle\prod_{i=1}^{j-1} (1 - Z_i), & \text{for } j = 2, \dots, k - 1. \end{cases} \tag{7.3}
\]

The remaining variable $p_k$ can also be given, in terms of $Z_1, Z_2, \dots, Z_k$, as

\[
p_k = Z_k \prod_{i=1}^{k-1} (1 - Z_i), \tag{7.4}
\]

where, by definition, $Z_k = 1$. The inverse transformations are given by

\[
Z_j = \begin{cases} p_1, & \text{for } j = 1, \\[4pt] \dfrac{p_j}{1 - \sum_{i=1}^{j-1} p_i}, & \text{for } j = 2, \dots, k. \end{cases} \tag{7.5}
\]

The first two moments of the generalized Dirichlet variates can be computed, in view of (7.3) and (7.4), as

\[
S_j = E(p_j) = \begin{cases} E(Z_1), & \text{for } j = 1, \\[4pt] E(Z_j) \displaystyle\prod_{i=1}^{j-1} E(1 - Z_i), & \text{for } j = 2, \dots, k, \end{cases} \tag{7.6}
\]

and

\[
T_j = E(p_j^2) = \begin{cases} E(Z_1^2), & \text{for } j = 1, \\[4pt] E(Z_j^2) \displaystyle\prod_{i=1}^{j-1} E[(1 - Z_i)^2], & \text{for } j = 2, \dots, k. \end{cases} \tag{7.7}
\]

Hence, using well-known formulae for the first two moments of the standard beta distribution, and since $Z_k = 1$, we write

\[
S_j = \begin{cases} \dfrac{a_1}{a_1 + b_1}, & \text{for } j = 1, \\[10pt] \dfrac{a_j}{a_j + b_j} \displaystyle\prod_{i=1}^{j-1} \frac{b_i}{a_i + b_i}, & \text{for } j = 2, \dots, k - 1, \\[10pt] \displaystyle\prod_{i=1}^{k-1} \frac{b_i}{a_i + b_i}, & \text{for } j = k, \end{cases} \tag{7.8}
\]

\[
T_j = \begin{cases} \dfrac{a_1 (a_1 + 1)}{(a_1 + b_1)(a_1 + b_1 + 1)}, & \text{for } j = 1, \\[10pt] \dfrac{a_j (a_j + 1)}{(a_j + b_j)(a_j + b_j + 1)} \displaystyle\prod_{i=1}^{j-1} \frac{b_i (b_i + 1)}{(a_i + b_i)(a_i + b_i + 1)}, & \text{for } j = 2, \dots, k - 1, \\[10pt] \displaystyle\prod_{i=1}^{k-1} \frac{b_i (b_i + 1)}{(a_i + b_i)(a_i + b_i + 1)}, & \text{for } j = k, \end{cases} \tag{7.9}
\]

and

\[
\mathrm{Var}(p_j) = T_j - S_j^2, \quad \text{for } j = 1, 2, \dots, k. \tag{7.10}
\]

Regarding covariances, Connor and Mosimann (1969) showed that

\[
\mathrm{Cov}(p_1, p_j) = -\frac{E(p_j)}{E(1 - p_1)}\, \mathrm{Var}(p_1), \quad \text{for } j = 2, \dots, k, \tag{7.11}
\]

\[
\mathrm{Cov}(p_j, p_{j+1}) = E(Z_{j+1})\, E[Z_j (1 - Z_j)] \prod_{i=1}^{j-1} E[(1 - Z_i)^2] - E(p_j) E(p_{j+1}), \quad \text{for } j = 2, \dots, k - 1, \tag{7.12}
\]

and

\[
\mathrm{Cov}(p_j, p_m) = \frac{E(Z_m)}{E(Z_{j+1})} \prod_{i=j+1}^{m-1} E(1 - Z_i)\; \mathrm{Cov}(p_j, p_{j+1}), \quad \text{for } 1 \le j < m \le k. \tag{7.13}
\]

Therefore, $p_1$ is always negatively correlated with all other variates. However, any other two successive variates can be positively correlated, as can be seen from equation (7.12). Moreover, the correlation between any $p_j$ and $p_m$, for $1 < j < m \le k$, has the same sign as that of $\mathrm{Cov}(p_j, p_{j+1})$. In this sense, the generalized Dirichlet distribution has a more flexible dependence structure than the standard Dirichlet, which always imposes negative correlations between all pairs of variables, as mentioned before.
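The moment and covariance formulae above lend themselves to a short numerical check. The sketch below is illustrative only: the function names `gd_moments` and `cov_successive` and the parameter values are assumptions, and the beta moments $E(Z) = a/(a+b)$, $E(Z^2) = a(a+1)/\{(a+b)(a+b+1)\}$ and $E[Z(1-Z)] = ab/\{(a+b)(a+b+1)\}$ are the standard ones.

```python
from typing import List, Sequence, Tuple


def gd_moments(a: Sequence[float], b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (S, T): first and second moments of p_1..p_k for a generalized
    Dirichlet with parameters a_1..a_{k-1}, b_1..b_{k-1} (hypothetical helper)."""
    s, t = [], []
    prod1 = 1.0   # running product of E(1 - Z_i)
    prod2 = 1.0   # running product of E[(1 - Z_i)^2]
    for ai, bi in zip(a, b):
        tot = ai + bi
        s.append(ai / tot * prod1)
        t.append(ai * (ai + 1) / (tot * (tot + 1)) * prod2)
        prod1 *= bi / tot
        prod2 *= bi * (bi + 1) / (tot * (tot + 1))
    s.append(prod1)   # j = k, using Z_k = 1
    t.append(prod2)
    return s, t


def cov_successive(a: Sequence[float], b: Sequence[float], j: int) -> float:
    """Cov(p_j, p_{j+1}) for 1-based j with j + 1 <= k (hypothetical helper)."""
    s, _ = gd_moments(a, b)
    aj, bj = a[j - 1], b[j - 1]
    e_zz = aj * bj / ((aj + bj) * (aj + bj + 1))            # E[Z_j (1 - Z_j)]
    e_next = 1.0 if j == len(a) else a[j] / (a[j] + b[j])   # E(Z_{j+1}), Z_k = 1
    prod2 = 1.0
    for ai, bi in zip(a[: j - 1], b[: j - 1]):
        prod2 *= bi * (bi + 1) / ((ai + bi) * (ai + bi + 1))
    return e_next * e_zz * prod2 - s[j - 1] * s[j]


if __name__ == "__main__":
    # Standard Dirichlet(2, 3, 5) written as a special case: b_1 = a_2 + b_2.
    print(gd_moments([2.0, 3.0], [8.0, 5.0])[0], cov_successive([2.0, 3.0], [8.0, 5.0], 2))
    # A diffuse Z_1 with a concentrated Z_2 (hypothetical values).
    print(cov_successive([1.0, 5.0], [1.0, 5.0], 2))
```

In the first call the means reduce to $a_j/N$ and the covariance is negative, as it must be for a standard Dirichlet; in the second, $p_2$ and $p_3$ both scale with $1 - Z_1$ and their covariance comes out positive.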
Similar results were found by Lochner (1975), while Wong (2005) used these properties to select a generalized Dirichlet prior for sorting probabilities of microelectronic chips that tend to be positively correlated.

As in the case of the standard Dirichlet distribution, the conditional distributions of the generalized Dirichlet variates are still scaled beta distributions. This can be shown, using the marginal distributions of the generalized Dirichlet distribution, as follows.

If $\mathbf{p}_{k-1} = (p_1, p_2, \dots, p_{k-1})$ has a generalized Dirichlet distribution of the form (7.2), then the marginal distribution of any subset from $\mathbf{p}_{k-1}$, say $\mathbf{p}_r = (p_1, p_2, \dots, p_r)$, $r = 2, 3, \dots, k - 1$, is again a generalized Dirichlet distribution with the corresponding parameters [e.g. Wong (1998)].

The conditional distributions of $p_r \mid p_1, p_2, \dots, p_{r-1}$, for $r = 2, 3, \dots, k - 1$, can be computed from (7.2) as follows:

\[
\pi(p_r \mid p_1, p_2, \dots, p_{r-1}) = \frac{\pi(\mathbf{p}_r)}{\pi(\mathbf{p}_{r-1})} \tag{7.14}
\]
\[
= \frac{\Gamma(a_r + b_r)}{\Gamma(a_r)\Gamma(b_r)} \cdot \frac{p_r^{a_r - 1} \big(1 - \sum_{i=1}^{r} p_i\big)^{b_r - 1}}{\big(1 - \sum_{i=1}^{r-1} p_i\big)^{a_r + b_r - 1}}, \tag{7.15}
\]

which are scaled beta distributions over the intervals $(0, 1 - \sum_{i=1}^{r-1} p_i)$. They are also known as three-parameter beta distributions, i.e.

\[
(p_r \mid p_1, p_2, \dots, p_{r-1}) \sim \mathrm{beta}\Big(a_r, b_r, 1 - \sum_{i=1}^{r-1} p_i\Big), \quad \text{for } r = 2, 3, \dots, k - 1.
\]

As in Section 6.3.4, applying the transformation

\[
p_r^* = \begin{cases} p_1, & \text{for } r = 1, \\[4pt] \dfrac{p_r}{1 - \sum_{i=1}^{r-1} p_i}, & \text{for } r = 2, 3, \dots, k - 1, \end{cases}
\]

gives

\[
p_r^* \sim \mathrm{beta}(a_r, b_r), \quad \text{for } r = 1, 2, \dots, k - 1. \tag{7.16}
\]

7.2.2 Assessment tasks

The elicitation process given before in the conditional approach for the standard Dirichlet case in Section 6.3.4 is still valid here. The main difference in the current case is that the generalized Dirichlet hyperparameters $(a_1, a_2, \dots, a_{k-1}, b_1, b_2, \dots, b_{k-1})$ are exactly the parameters $(a_r, b_r)$ of the beta distribution of $p_r^*$ in (7.16), for $r = 1, 2, \dots, k - 1$.
Hence, the generalized Dirichlet hyperparameters are directly estimated using beta parameters that can be elicited using conditional assessments as in Section 6.3.4. Note that no compromise or averaging is needed here, since the total number of hyperparameters that are elicited is equal to the number of hyperparameters in the generalized Dirichlet distribution, namely $2(k - 1)$.

This extended number of parameters does not eliminate the benefits of feedback, but it gives the generalized Dirichlet distribution a more flexible structure than the standard one. Positive correlations can occur in this generalized case, as discussed before, making it more useful and practical in quantifying an expert's opinion. However, Aitchison (1986) criticized the class of generalized Dirichlet distributions as being intractable, particularly with respect to statistical analysis. He also noted that, despite having a more general dependence structure than the standard Dirichlet, the class still retains a strong independence structure.

7.2.3 Marginal quartiles of the generalized Dirichlet distribution

It is always useful to give feedback to the expert based on her elicited hyperparameters. This feedback makes the elicited quantities a better representation of the expert's opinion. For the generalized Dirichlet prior, where the assessed probability quartiles are all conditional except for the first category, it is helpful to inform the expert of the corresponding marginal probability quartiles of each category. She should be given the opportunity to modify them so that they are closer to her opinion, and the elicitation method should change the hyperparameter vector according to these modifications.

Unfortunately, marginal distributions of the generalized Dirichlet are not directly of the beta type.
However, we make use of the independent beta random variables given in (7.5) to approximate the distribution of each $p_j$, $j = 1, 2, \dots, k$, as a standard beta distribution. Details are given in the remainder of this section.

An approximate distribution for the product of independent beta variates

Fan (1991) introduced a beta approximation to the product of a finite number of independent beta random variables. His method is described in Johnson et al. (1994) and Gupta and Nadarajah (2004), who report favorably on the method based on Fan's comparison of the first ten approximate and exact moments. The method equates the first two moments of the approximate beta distribution to the corresponding product moments of the independent beta random variables.

In what follows, we use the method of Fan (1991) to derive the marginal approximate beta distribution of each $p_j$, $j = 1, 2, \dots, k$, from which the marginal quartiles are computed. The method can also be inverted to give a new elicited hyperparameter vector of the generalized Dirichlet distribution, based on the marginal quartiles, if any have been modified by the expert.

For $j = 1, 2, \dots, k$, using the method of Fan (1991), the distribution of each $p_j$ can be approximated by

\[
p_j \sim \mathrm{beta}(\alpha_j, \beta_j), \tag{7.17}
\]

where

\[
\alpha_j = \frac{S_j (S_j - T_j)}{T_j - S_j^2} \quad \text{and} \quad \beta_j = \frac{(1 - S_j)(S_j - T_j)}{T_j - S_j^2},
\]

with $S_j$ and $T_j$ as given by equations (7.8) and (7.9), respectively.

Feedback

The three quartiles of the distributions in (7.17) are numerically computed and presented to the expert. She is invited to modify some or all of them as she thinks necessary, in which case the modified quartiles are converted in the same manner as proposed in Section 6.2, to give modified pairs of parameters $(\alpha_j^*, \beta_j^*)$.

The modified two moments of each $p_j$, for $j = 1, 2, \dots, k$, are computed as follows:

\[
S_j' = \frac{\alpha_j^*}{\alpha_j^* + \beta_j^*} \tag{7.18}
\]

and

\[
T_j' = \frac{\alpha_j^* (\alpha_j^* + 1)}{(\alpha_j^* + \beta_j^*)(\alpha_j^* + \beta_j^* + 1)}. \tag{7.19}
\]

After obtaining $S_j'$ and $T_j'$, they are transformed into normalized values $S_j^*$ and $T_j^*$, respectively, such that $\sum_{j=1}^{k} S_j^* = 1$.

In the manner of (7.8) and (7.9), we can write the two modified moments of each $Z_j$, denoted by $U_j = E^*(Z_j)$ and $W_j = E^*(Z_j^2)$, for $j = 1, 2, \dots, k - 1$, as

\[
U_j = \frac{a_j^*}{a_j^* + b_j^*} = \begin{cases} S_1^*, & \text{for } j = 1, \\[6pt] S_j^* \Big/ \displaystyle\prod_{i=1}^{j-1} \frac{b_i^*}{a_i^* + b_i^*}, & \text{for } j = 2, \dots, k - 1, \end{cases}
\]

and

\[
W_j = \frac{a_j^* (a_j^* + 1)}{(a_j^* + b_j^*)(a_j^* + b_j^* + 1)} = \begin{cases} T_1^*, & \text{for } j = 1, \\[6pt] T_j^* \Big/ \displaystyle\prod_{i=1}^{j-1} \frac{b_i^* (b_i^* + 1)}{(a_i^* + b_i^*)(a_i^* + b_i^* + 1)}, & \text{for } j = 2, \dots, k - 1. \end{cases}
\]

The above system of equations can be solved recursively for the modified hyperparameters of the generalized Dirichlet distribution, $a_j^*$ and $b_j^*$, for $j = 1, 2, \dots, k - 1$, to give

\[
a_j^* = \frac{U_j (U_j - W_j)}{W_j - U_j^2}, \qquad b_j^* = \frac{(1 - U_j)(U_j - W_j)}{W_j - U_j^2}.
\]

These modified hyperparameters of the generalized Dirichlet distribution represent the final output of the method.

7.3 Example: Obesity misclassification

Obesity and being overweight are serious public health problems whose adverse consequences can include diabetes, high blood pressure and cardiovascular disease. Obesity can be measured using the Body Mass Index (BMI) of adults, which is defined as body weight (in kilograms) divided by body height (in meters) squared. Obesity is defined as a BMI of over 30 and overweight as a BMI of over 25. In Europe, it is estimated that 50% of adults between 35 and 65 years of age are overweight, of whom 10-25% are obese.

Malta reportedly has one of the highest levels of overweight people in Europe. According to the European Health Interview Survey (EHIS), November 2011, Malta recorded the highest proportion of obese men (24.7%) and women (21.1%) amongst the 19 EU Member States for which data are available. The EHIS reports 36.3% of adults in Malta being overweight and a further 22.3% being obese.
Obesity in Malta is indeed a major public health challenge and it is targeted as a priority action in Malta's Strategy for Sustainable Development.

In interview surveys, the heights and weights of participating subjects are not measured. Self-reported values of these variables are normally used instead. However, self-reported values are less precise and have no guarantee of accuracy, especially when they are converted into BMI [Shields et al. (2008)]. Indeed, the prevalence of overweight and obesity is generally underestimated when calculated from self-reported data as compared with measured data. Adults have been shown to systematically overestimate their height and underestimate their weight. The extent of weight underreporting increases with increasing measured weight [Shields et al. (2008)]. As a result, significant misclassification occurs when BMI categories are estimated from self-reported data. Correcting interview data for this misclassification bias is desirable, but data from which to estimate the bias are lacking. Instead, quantified expert opinion might be used to estimate the bias.

One aspect of the obesity misclassification problem in Malta was formulated as a multinomial model as follows. It relates to Maltese adults (16+) who self-report themselves as having a normal weight (18.5 ≤ BMI < 25). Their actual clinical BMI classification may fall in one of the following multinomial categories: Underweight (BMI < 18.5), Normal (18.5 ≤ BMI < 25), Overweight (25 ≤ BMI < 30) or Obese (BMI ≥ 30). A health information expert, Dr. Neville Calleja, used our PEGS-Dirichlet elicitation software to quantify his opinion about this model, first giving two separate sets of assessments, each of which determines the parameters of a Dirichlet distribution, so that his opinion could be represented by a Dirichlet prior distribution.
The second set of assessments was also used to determine the parameters of a generalized Dirichlet distribution, so that his opinion could be modelled by a more flexible prior distribution.

Dr. Calleja has been responsible for all health surveys in Malta for the last 10 years. Currently, he is the director of the Department of Health Information and Research in the Ministry of Health, the Elderly and Community Care, Malta. His department leads the collection, analysis and delivery of health related information in Malta.

To elicit a Dirichlet prior based on unconditional beta marginals, the expert ordered the four categories as Normal, Overweight, Obese, Underweight. His unconditional median assessments for these categories were 0.65, 0.20, 0.10, 0.04, respectively. Then he gave his unconditional lower (upper) quartile assessments as 0.55, 0.15, 0.06, 0.02 (0.70, 0.30, 0.14, 0.07), respectively. See Figure 7.1. The four beta marginals were then reconciled into a Dirichlet distribution in three different ways: direct normalizing and averaging, least-squares optimization, and weighted least-squares. Since the expert's assessed medians nearly sum to one, the three ways gave sets of reconciled quartiles that were very close to each other. He selected the marginal medians and quartiles that were computed by direct normalizing and averaging. The elicited hyperparameters of the Dirichlet prior distribution were obtained as $a_1 = 13.23$, $a_2 = 4.71$, $a_3 = 2.18$, $a_4 = 1.08$, with their sum $N = 21.20$.

Figure 7.1: Medians and quartiles assessments

Based on conditional beta distributions, the expert quantified his opinion again to elicit another Dirichlet prior for the same problem, but using a different elicitation method.
His three quartile assessments of the first category were 0.60, 0.65, 0.72. Then, he was asked to assume that the probability value of the first category is exactly 0.65; given this information he gave his three quartiles for the second category as 0.17, 0.20, 0.25. Finally, conditioning on the probabilities of the first two categories being 0.65 and 0.20, he gave the three quartiles of the third category to be 0.07, 0.09, 0.15. The three quartiles of the fourth category were automatically computed and shown to the expert as 0.01, 0.06, 0.08.

Figure 7.2: Assessing conditional medians

Figure 7.2 is a screen shot after the expert had assessed his median for the third category. The median probability of the third category is in blue (it was assessed), while the fourth median is in yellow (it was calculated from other assessments). Figure 7.3 shows the conditional quartiles that the expert assessed for the third category and the conditional quartiles that were calculated for the fourth category. The elicited hyperparameters of the Dirichlet distribution using this method were $a_1 = 19.91$, $a_2 = 5.00$, $a_3 = 1.11$, $a_4 = 0.65$, with a sum of $N = 26.67$.
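As a quick check on the two sets of Dirichlet hyperparameters reported so far, the sketch below computes the implied means and variances using the standard Dirichlet moment formulas, $E(p_i) = a_i/N$ and $V(p_i) = a_i(N - a_i)/[N^2(N+1)]$. The function name `dirichlet_moments` is illustrative only and is not part of the PEGS software; the means can be compared with the $E(p_i)$ entries of Table 7.1, with small differences expected because the hyperparameters are quoted to two decimals.

```python
def dirichlet_moments(a):
    """Return (means, variances) of a Dirichlet distribution with parameters a."""
    n = sum(a)
    means = [ai / n for ai in a]
    # V(p_i) = a_i (N - a_i) / (N^2 (N + 1)) = mean (1 - mean) / (N + 1)
    variances = [m * (1.0 - m) / (n + 1.0) for m in means]
    return means, variances


# Hyperparameters quoted in the text for the two elicited Dirichlet priors.
marginal_prior = [13.23, 4.71, 2.18, 1.08]     # from unconditional assessments
conditional_prior = [19.91, 5.00, 1.11, 0.65]  # from conditional assessments

for label, a in [("marginal", marginal_prior), ("conditional", conditional_prior)]:
    means, variances = dirichlet_moments(a)
    print(label, "N =", round(sum(a), 2))
    for i, (m, v) in enumerate(zip(means, variances), start=1):
        print(f"  p{i}: mean = {m:.3f}, variance = {v:.4f}")
```

The means depend only on the ratios $a_i/N$, whereas $N$ controls how concentrated the prior is, which is why the two priors can share similar medians yet differ in spread.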
Figure 7.3: Assessing conditional quartiles

On finishing the elicitation process using conditional assessments, the expert was shown a software message offering him the possibility of using the same conditional assessments to elicit a generalized Dirichlet distribution. The expert chose to elicit this more general distribution as well. The following hyperparameters of the generalized Dirichlet prior distribution were elicited: $a_1 = 19.29$, $a_2 = 4.41$, $a_3 = 0.91$, $b_1 = 10.23$, $b_2 = 3.15$, $b_3 = 0.54$.

To compare the three prior distributions elicited in this example, expected values and variances of multinomial probabilities were computed for each distribution as shown in Table 7.1. The means and variances of the Dirichlet distribution were computed using the elicited values of the hyperparameters in formulae (6.24) and (6.25), respectively. The same was done for the generalized Dirichlet using formulae (7.6) to (7.10).

Table 7.1: Probability assessments for different elicited priors

        Marginal assessments        Conditional assessments     Generalized Dirichlet
        Median   E(p_i)   V(p_i)    Median   E(p_i)   V(p_i)    E(p_i)   V(p_i)
p_1     0.65     0.624    0.012     0.65     0.746    0.007     0.653    0.008
p_2     0.20     0.222    0.008     0.20     0.187    0.006     0.202    0.006
p_3     0.10     0.103    0.004     0.09     0.042    0.001     0.091    0.004
p_4     0.04     0.051    0.002     0.06     0.024    0.001     0.054    0.003

It can be seen from Table 7.1 that the first Dirichlet prior, which was elicited using marginal assessments, and the generalized Dirichlet prior both gave expected values of the multinomial probabilities that are close to the assessed medians.
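The generalized Dirichlet column of Table 7.1 can be checked in the same spirit. The sketch below assumes the Connor-Mosimann form of the generalized Dirichlet, in which $p_1 = Z_1$ and $p_j = Z_j \prod_{i<j}(1 - Z_i)$ with independent $Z_j \sim \mathrm{Beta}(a_j, b_j)$; that parameterization is an assumption here, since formulae (7.6) to (7.10) are not reproduced in this section. The function name is hypothetical.

```python
def generalized_dirichlet_means(a, b):
    """Means of p_1..p_k under an assumed Connor-Mosimann generalized Dirichlet.

    p_1 = Z_1 and p_j = Z_j * prod_{i<j} (1 - Z_i), with independent
    Z_j ~ Beta(a_j, b_j); the last category receives the remaining mass.
    """
    means = []
    remaining = 1.0  # running value of E[prod_{i<j} (1 - Z_i)]
    for aj, bj in zip(a, b):
        means.append(remaining * aj / (aj + bj))
        remaining *= bj / (aj + bj)
    means.append(remaining)  # mean of the final category p_k
    return means


# Hyperparameters quoted in the text for the generalized Dirichlet prior.
a = [19.29, 4.41, 0.91]
b = [10.23, 3.15, 0.54]
for i, m in enumerate(generalized_dirichlet_means(a, b), start=1):
    print(f"E(p{i}) = {m:.3f}")
```

Under this assumption the computed means agree with the generalized Dirichlet $E(p_i)$ column of Table 7.1 to within rounding of the quoted hyperparameters.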
The second Dirichlet prior that was elicited using conditional assessments gave a relatively higher mean value for the first probability than its assessed median, combined with a reduction in the expected values of all other probabilities. This is a little surprising as the generalized Dirichlet utilized the conditional assessments that give the second Dirichlet distribution, yet its hyperparameters are similar to the method that uses marginal assessments.

The two elicited values of the hyperparameter $N$ were relatively close to each other, 21.20 and 26.67, in the two elicited standard Dirichlet priors. (There is no single value for $N$ with the generalized Dirichlet.) Moreover, variances of the multinomial probabilities were all small and also close to each other in the three elicited prior distributions.

After eliciting each of the three Dirichlet prior distributions discussed above, the software showed the suggested marginal medians and quartiles of each pair to the expert. He accepted the suggested marginal quartile values, saying that the suggested values were very close to his initial beliefs.

Keeping the unit sum constraint in his mind, the expert remarked that assessing conditional medians and quartiles was easier than assessing marginal quartiles. He stated that he could not think about marginal assessments for each category independently of the others. However, he noted at the same time that the elicited generalized Dirichlet distribution may be the most flexible prior of the three.

7.4 Constructing a copula function for the prior distribution

Using the marginal elicitation process given before, we obtain a number of marginal beta distributions. Rather than assume these stem from a Dirichlet distribution, we would like to allow a more flexible dependence structure via their joint distribution, with the aim of better representing the expert's opinion.
A flexible tool for this task is given by the copula function, which allows us to choose the marginal distributions independently from the dependence structure between them. The latter structure is given by the copula.

A copula is best described as a multivariate distribution function that is used to bind together marginal distribution functions so as to form a joint distribution. The copula parameterizes the dependence between the marginals, while the parameters of each marginal distribution function can be assessed separately. See, for example, Joe (1997), Nelsen (1999) and Kurowicka and Cooke (2006).

There are many types and classes of copula functions, but the most intuitive ones use inverted distribution functions as arguments in known multivariate distributions [Nelsen (1999)]. The general inversion form of a copula function $C$ is given by

\[
C[G_1(x_1), \ldots, G_k(x_k)] = F_{(1,\ldots,k)}\left\{ F_1^{-1}[G_1(x_1)], \ldots, F_k^{-1}[G_k(x_k)] \right\},
\]

where $G_i$ are the known marginal distribution functions, and $F_{(1,\ldots,k)}$ and $F_i$ ($i = 1, \ldots, k$) are the assumed joint and marginal distribution functions, respectively. The copula function $C$ works as the cdf of the multivariate distribution that "couples" the given marginal distributions.

7.4.1 Gaussian copula function

The best-known example of the inversion method is the Gaussian copula [Clemen and Reilly (1999)], which is given by

\[
C[G_1(x_1), \ldots, G_k(x_k)] = \Phi_{k,R}\left\{ \Phi^{-1}[G_1(x_1)], \ldots, \Phi^{-1}[G_k(x_k)] \right\}. \tag{7.20}
\]

Here $\Phi_{k,R}$ is the cdf of a $k$-variate normal distribution with zero means, unit variances, and a correlation matrix $R$ that reflects the desired dependence structure. $\Phi$ is the marginal standard univariate normal cdf. Since $\Phi_{k,R}$ and $\Phi$ are differentiable, the Gaussian copula density function can be simply obtained by differentiating (7.20) with respect to $x_i$, $i = 1, 2, \ldots, k$, giving

\[
f(x_1, x_2, \ldots, x_k \mid R) = \frac{g_1(x_1) \cdots g_k(x_k)}{|R|^{1/2}} \exp\left\{ -\tfrac{1}{2}\, \underline{X}_k' \left( R^{-1} - I_k \right) \underline{X}_k \right\}, \tag{7.21}
\]

where $\underline{X}_k' = \left( \Phi^{-1}[G_1(x_1)], \Phi^{-1}[G_2(x_2)], \ldots, \Phi^{-1}[G_k(x_k)] \right)$, $g_i(\cdot)$ is the density function corresponding to $G_i(\cdot)$, $i = 1, 2, \ldots, k$, and $I_k$ is the identity matrix of order $k$.

To construct a Gaussian copula function in the case of a multinomial model, we can think of each marginal distribution as a beta distribution whose two hyperparameters have been assessed. Then we can construct a Gaussian copula function for the multivariate distribution of $p_1, p_2, \ldots, p_{k-1}$. According to the unit sum constraint, the remaining variable, $p_k = 1 - \sum_{i=1}^{k-1} p_i$, can be treated as a redundant variable that may be removed from the multivariate distribution to avoid singularity problems. Using the Gaussian copula function, the dependence structure of the multivariate distribution will have high flexibility rather than the limited dependence structure imposed by the Dirichlet distribution.

The Gaussian copula function is indexed by the correlation matrix $R$, which needs to be elicited effectively and must be a positive-definite matrix. In what follows we introduce a method, inspired by Kadane et al. (1980), to elicit a correlation matrix $R$ that is sure to be positive-definite.

Let $G_i(p_i)$ be the cdf of the beta distribution of $p_i$ with hyperparameters $\alpha_i$ and $\beta_i$, $i = 1, 2, \ldots, k-1$, and assume that the joint density of $p_1, p_2, \ldots, p_{k-1}$ is given by a Gaussian copula density, such that

\[
f(p_1, p_2, \ldots, p_{k-1} \mid R) = \frac{g_1(p_1) \times \cdots \times g_{k-1}(p_{k-1})}{|R|^{1/2}} \exp\left\{ -\tfrac{1}{2}\, \underline{Y}_{k-1}' \left( R^{-1} - I_{k-1} \right) \underline{Y}_{k-1} \right\}, \tag{7.22}
\]

where $\underline{Y}_{k-1}' = \left( \Phi^{-1}[G_1(p_1)], \Phi^{-1}[G_2(p_2)], \ldots, \Phi^{-1}[G_{k-1}(p_{k-1})] \right)$, and $g_i(\cdot)$ is the beta density of $p_i$, $i = 1, 2, \ldots, k-1$. Note that the marginal distributions of this joint density are still the desired beta marginals. Since the hyperparameters of each beta distribution of $p_i$, $i = 1, 2, \ldots, k-1$, have already been elicited, the prior distribution is totally known except for the matrix $R$.
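To make the structure of the copula density concrete, the following sketch evaluates the bivariate special case of (7.22), that is $k - 1 = 2$ with $R = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$. It uses only the Python standard library; the beta cdf is computed with the usual continued-fraction expansion of the incomplete beta function. All function names are hypothetical, and this is not the PEGS-Copula implementation, only an illustration under those assumptions.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi^{-1}


def beta_pdf(x, a, b):
    """Density g(x) of Beta(a, b) for 0 < x < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1.0) * log(x) + (b - 1.0) * log(1.0 - x))


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last (odd-step) update is negligible
            break
    return h


def beta_cdf(x, a, b):
    """Distribution function G(x) of Beta(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def gaussian_copula_density_2d(p, params, rho):
    """Bivariate case of (7.22): p = (p1, p2), params = ((a1, b1), (a2, b2))."""
    # Normalizing transformation y_i = Phi^{-1}[G_i(p_i)].
    y = [_STD_NORMAL.inv_cdf(beta_cdf(pi, a, b)) for pi, (a, b) in zip(p, params)]
    det = 1.0 - rho * rho  # |R| for the 2x2 correlation matrix
    # y' (R^{-1} - I) y written out for R = [[1, rho], [rho, 1]].
    quad = (rho * rho * (y[0] ** 2 + y[1] ** 2) - 2.0 * rho * y[0] * y[1]) / det
    marginals = beta_pdf(p[0], *params[0]) * beta_pdf(p[1], *params[1])
    return marginals * exp(-0.5 * quad) / sqrt(det)
```

Setting `rho = 0` reduces the joint density to the product of the two beta densities, which illustrates that the marginals are fixed separately from $R$ and that $R$ only controls the dependence.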
Although the above density is not multivariate normal for $p_1, p_2, \ldots, p_{k-1}$ and the matrix $R$ is not their correlation matrix, we can still use the multivariate normal properties to elicit a positive-definite matrix $R$ by considering the following normalizing transformations,

\[
Y_i = \Phi^{-1}[G_i(p_i)], \qquad i = 1, 2, \ldots, k. \tag{7.23}
\]

We should stress that with this copula function, the marginal distributions of the $p_i$ are beta distributions that can be fixed independently of $R$. Thus the ability to specify $R$ gives added flexibility. The aim is to choose $R$ so as to model the expert's opinion about the dependence between the $p_i$.

According to the main assumption of the Gaussian copula construction, and from (7.23), the vector $\underline{Y}_{k-1}' = (Y_1, Y_2, \ldots, Y_{k-1})$ has a multivariate normal distribution with zero means, unit variances and a correlation matrix $R$, i.e. $\underline{Y}_{k-1} \sim \mathrm{MVN}(\underline{0}, R)$. Following this assumption, together with the unit sum constraint of the elements of $\underline{p}$, the full vector $\underline{Y}' = (Y_1, Y_2, \ldots, Y_k)$ has what is known as a singular multivariate normal distribution, which will be discussed in more detail in the next chapter. However, we will be interested, during the rest of this chapter, in eliciting a non-singular correlation matrix $R$ for the Gaussian copula function only for $p_1, p_2, \ldots, p_{k-1}$.

Keep in mind that the Pearson correlation coefficients, as elements of $R$, are not transformation respecting, i.e. they are not invariant even under strictly monotone increasing transformations as in (7.23). We therefore do not attempt to elicit any correlations between the elements of $\underline{p}$. Even if a correlation matrix for $\underline{p}$ has been elicited it may be of no use in estimating $R$, as no explicit relationship between the two matrices is available. Moreover, the density function in (7.22) is indexed by $R$, the correlation matrix of $\underline{Y}_{k-1}$, not the correlation matrix of $\underline{p}$.
An alternative method of estimating $R$ that has been proposed in the literature was reviewed in Chapter 2. In that approach, a transformation-respecting non-parametric measure of correlation, such as Kendall's $\tau$ or Spearman's $\rho$, is computed for $\underline{p}$. The monotonicity of a transformation like (7.23) is then used to impose the same correlations on $\underline{Y}_{k-1}$. Pearson's correlations are calculated using approximate relations between different correlation coefficients for the normal distribution. For more details see, for example, Clemen and Reilly (1999), Palomo et al. (2007) or Daneshkhah and Oakley (2010).

In our proposed approach, the matrix $R$ is elicited as a covariance or correlation matrix of a multivariate normal random vector $\underline{Y}_{k-1}$. However, we still utilize the monotone increasing property of the transformations in (7.23). We may assess conditional quartiles of $\underline{p}$, then transform them into those of $\underline{Y}$ using (7.23). Correlation coefficients between the elements of $\underline{Y}_{k-1}$ can then be estimated using their conditional quartiles and utilizing the properties of the multivariate normal distribution. This is described in Sections 7.4.2 and 7.4.3.

Although the elicitation method of Kadane et al. (1980) was designed to elicit the covariance matrix of a multivariate t-distribution as a conjugate prior for the hyperparameters of a normal multiple linear regression model, their method can be useful in a variety of multivariate elicitation problems that require eliciting positive-definite matrices [Garthwaite et al. (2005)]. The method is modified here to elicit the correlation matrix $R$ of the Gaussian copula function.

7.4.2 Assessment tasks

Since the transformations in (7.23) are strictly monotonic increasing from $\underline{p}$ to $\underline{Y}$, we can establish a one-to-one correspondence between medians and quartiles of these two vectors. The required assessments are as follows.

Assessing initial medians and quartiles

1. To elicit each marginal beta distribution, the expert has already assessed a lower quartile, a median and an upper quartile for $p_i$, $i = 1, 2, \ldots, k$, say $L^*_{i,0}$, $m^*_{i,0}$ and $U^*_{i,0}$, respectively. The method proposed in Section 6.2 can be used to determine the two parameters $\alpha_i$ and $\beta_i$ of each marginal beta distribution, for $i = 1, 2, \ldots, k$.

2. To help the expert assess the medians and quartiles in (1), the PEGS-Copula software presents an interactive graph showing the pdf curve of the beta distribution of $p_r$, for $r = 1, 2, \ldots, k$. The expert is able to change her assessed quartiles of $p_r$ until its pdf curve represents her opinion to her satisfaction, see Figure 6.1.

3. To attain the unit sum constraint, the mean values of the elicited beta marginals must sum to one. The elicited parameters $\alpha_i$ and $\beta_i$ are thus modified to fulfill this condition, as follows. The mean values $\mu_i$ are computed as

\[
\mu_i = \frac{\alpha_i}{\alpha_i + \beta_i}, \qquad \text{for } i = 1, 2, \ldots, k.
\]

The normalized mean values $\mu_i^*$ are given by

\[
\mu_i^* = \frac{\mu_i}{\sum_{j=1}^{k} \mu_j}, \qquad i = 1, 2, \ldots, k. \tag{7.24}
\]

We keep the variances fixed as

\[
\sigma_i^2 = \frac{\alpha_i \beta_i}{(\alpha_i + \beta_i)^2 (\alpha_i + \beta_i + 1)}, \qquad \text{for } i = 1, 2, \ldots, k. \tag{7.25}
\]

Equations (7.24) and (7.25) give the modified set of parameters $\alpha_i^*$ and $\beta_i^*$, for $i = 1, 2, \ldots, k$:

\[
\alpha_i^* = \mu_i^* \left[ \frac{\mu_i^* (1 - \mu_i^*)}{\sigma_i^2} - 1 \right], \qquad
\beta_i^* = (1 - \mu_i^*) \left[ \frac{\mu_i^* (1 - \mu_i^*)}{\sigma_i^2} - 1 \right].
\]

4. Before going further, the modified parameters of each marginal beta distribution are used to compute the corresponding quartiles numerically. These quartiles are presented as feedback to the expert, who is still able to change some or all of them, in which case the process is repeated until the modified sets of quartiles are accepted by the expert.

Assessing conditional quartiles

5. To estimate the correlation matrix $R$, the expert is asked to assume that $p_1 = m^*_{1,0}$ and gives a lower quartile $L^*_2$ and an upper quartile $U^*_2$ for $p_2$. For each remaining $p_j$, $j = 3, \ldots, k-1$, she assesses the two quartiles $L^*_j$ and $U^*_j$ given that $p_1 = m^*_{1,0}$, $p_2 = m^*_{2,0}$, $\ldots$, $p_{j-1} = m^*_{j-1,0}$. Figure 7.4 shows the process of assessing conditional quartiles, where the expert has already assessed the lower quartile of the third category, conditional on the median values of the first two categories, which are shown by the red bars.

Figure 7.4: Assessing conditional quartiles for copula elicitation

6. The lower (upper) quartile $L^*_k$ ($U^*_k$) of $p_k$ will be automatically shown to the expert once she assesses the upper (lower) quartile $U^*_{k-1}$ ($L^*_{k-1}$) of $p_{k-1}$. The two quartiles $L^*_k$ and $U^*_k$ are shown to the expert as a guide to help her choose $L^*_{k-1}$ and $U^*_{k-1}$. See Figure 7.4, where the software has shown the upper quartile of the fourth category after the expert assessed the lower quartile of the third category. In fact, $L^*_k$ ($U^*_k$) is the lower (upper) quartile of $(p_k \mid p_1 = m^*_{1,0}, \ldots, p_{k-2} = m^*_{k-2,0})$ instead of $(p_k \mid p_1 = m^*_{1,0}, \ldots, p_{k-1} = m^*_{k-1,0})$, as the two quartiles in the latter case should be just equal to $m^*_{k,0}$, because of the unit sum constraint.

Assessing conditional medians

7. Here we assume that the median of $p_1$ has been changed from $m^*_{1,0}$ into $m^*_{1,1} = m^*_{1,0} + \eta^*_1$. Given this information, the expert will be asked to change her previous median $m^*_{j,0}$ of each $p_j$ to be $m^*_{j,1}$. We put

\[
m^*_{j,1} = m^*_{j,0} + \theta^*_{j,1}, \qquad \text{for } j = 2, \ldots, k. \tag{7.26}
\]

8. In each successive step $i$, for $i = 2, 3, \ldots, k-2$, the expert will be asked to suppose that the median values of $p_1, p_2, \ldots, p_i$ are $m^*_{1,1} = m^*_{1,0} + \eta^*_1$, $m^*_{2,2} = m^*_{2,1} + \eta^*_2$, $\ldots$, $m^*_{i,i} = m^*_{i,i-1} + \eta^*_i$, respectively. Given this information, she will be asked to update her assessed medians from the most recent previous step, $m^*_{i+1,i-1}, m^*_{i+2,i-1}, \ldots, m^*_{k,i-1}$. The updated assessments are $m^*_{i+1,i} = m^*_{i+1,i-1} + \theta^*_{i+1,i}$, $m^*_{i+2,i} = m^*_{i+2,i-1} + \theta^*_{i+2,i}$, $\ldots$, $m^*_{k,i} = m^*_{k,i-1} + \theta^*_{k,i}$, respectively. In other words, for $i = 1, 2, \ldots, k-2$, $j = i+1, i+2, \ldots, k$, we can write

\[
m^*_{j,i} = m^*_{j,i-1} + \theta^*_{j,i}, \tag{7.27}
\]

where $m^*_{j,i}$ is the median of $(p_j \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i})$.

On an interactive graph produced by the PEGS-Copula software, see Figure 7.5, the conditioning set of median values is shown as red bars. The conditional medians of the remaining categories at the most recent previous step are shown as black lines. The expert is asked to assess how her new median values will change based on the new conditioning set.

Figure 7.5: Assessing conditional medians for copula elicitation

9. For mathematical coherence, as will be proved in Lemma 7.1, we require

\[
\sum_{j=1}^{i} m^*_{j,j} + \sum_{j=i+1}^{k} m^*_{j,i} = 1, \qquad i = 1, 2, \ldots, k-2.
\]

The expert has the option of changing her initial set of assessments $m'_{i+1,i}, \ldots, m'_{k,i}$ until she feels that the suggested normalized set $m^*_{i+1,i}, m^*_{i+2,i}, \ldots, m^*_{k,i}$ gives an adequate representation of her opinion. The software suggests each normalized conditional median $m^*_{j,i}$, given by yellow bars in Figure 7.6, as

\[
m^*_{j,i} = m'_{j,i}\, \frac{1 - \sum_{r=1}^{i} m^*_{r,r}}{\sum_{r=i+1}^{k} m'_{r,i}}, \qquad \text{for } i = 1, \ldots, k-2, \; j = i+1, \ldots, k.
\]

10. The current assessment task stops at step $k-2$, as we do not ask for any conditional assessments for the last remaining category $p_k$. Since the condition of summing to one should always be fulfilled, conditioning on specific values of all $p_1, p_2, \ldots, p_{k-1}$ gives a fixed value for $p_k$. In this case no upper or lower quartiles can be assessed for $p_k$, as mentioned before.

Figure 7.6: Software suggestions for conditional medians

7.4.3 Eliciting a positive-definite correlation matrix R

The normalizing one-to-one functions in (7.23) are used to transform the assessed conditional quartiles of $\underline{p}$ into conditional quartiles of $\underline{Y}$, and hence, into conditional expectations, variances and covariances of the multivariate normal variables. In particular, letting $M(X)$ denote the median function of the random variable $X$, we proceed as follows.

For $i = 1, 2, \ldots, k$, let $m_{i,0} = \Phi^{-1}[G_i(m^*_{i,0})]$.
For $i = 1, 2, \ldots, k-2$, and $j = i+1, \ldots, k-1$, let $m_{j,i} = \Phi^{-1}[G_j(m^*_{j,i})]$. Then

\[
m_{j,i} = E(Y_j \mid p_1 = m^*_{1,0} + \eta^*_1,\; p_2 = m^*_{2,1} + \eta^*_2,\; \ldots,\; p_i = m^*_{i,i-1} + \eta^*_i). \tag{7.28}
\]

For $i = 1, 2, \ldots, k-2$, define $\eta_i$ by letting $\eta_i = Y_i - m_{i,i-1}$ when $p_i = m^*_{i,i-1} + \eta^*_i$. Then

\[
m_{j,i} = E(Y_j \mid Y_1 = m_{1,0} + \eta_1,\; Y_2 = m_{2,1} + \eta_2,\; \ldots,\; Y_i = m_{i,i-1} + \eta_i), \tag{7.29}
\]

and

\[
\eta_i = \Phi^{-1}[G_i(m^*_{i,i-1} + \eta^*_i)] - \Phi^{-1}[G_i(m^*_{i,i-1})], \qquad \text{for } i = 1, 2, \ldots, k-2. \tag{7.30}
\]

Analogous to $m^*_{i,i} = m^*_{i,i-1} + \eta^*_i$, define $m_{i,i} = m_{i,i-1} + \eta_i$, so that $Y_i = m_{i,i}$ when $p_i = m^*_{i,i}$, for $i = 1, 2, \ldots, k-2$.

For $i = 1, 2, \ldots, k-2$, and $j = i+1, \ldots, k-1$, analogous to $\theta^*_{j,i} = m^*_{j,i} - m^*_{j,i-1}$, define $\theta_{j,i} = m_{j,i} - m_{j,i-1}$, so that

\[
\theta_{j,i} = \Phi^{-1}[G_j(m^*_{j,i-1} + \theta^*_{j,i})] - \Phi^{-1}[G_j(m^*_{j,i-1})].
\]

For $i = 1, 2, \ldots, k-2$, and $j = i+1, \ldots, k-1$, let $V_{j|i} = \mathrm{Var}(Y_j \mid Y_1 = m_{1,0}, Y_2 = m_{2,0}, \ldots, Y_i = m_{i,0})$, so that

\[
V_{j|j-1} = \left( \frac{U_j - L_j}{1.349} \right)^2, \qquad \text{for } j = 2, 3, \ldots, k-1, \tag{7.31}
\]

with $U_j = \Phi^{-1}[G_j(U^*_j)]$, $L_j = \Phi^{-1}[G_j(L^*_j)]$.

Having defined the above quantities, we are now ready to state and prove the following lemma.

Lemma 7.1. Under the unit sum constraint of $\underline{p}$, and the multivariate normality of $\underline{Y}$,

\[
\sum_{j=1}^{i} m^*_{j,j} + \sum_{j=i+1}^{k} m^*_{j,i} = 1, \qquad i = 1, 2, \ldots, k-2.
\]

Proof. A property of conditional expectations of singular multivariate normal distributions is given by equation (8a.2.11) in (Rao, 2002, p 522). Using this property, for $i = 1, 2, \ldots, k-2$, we have

\[
E[Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}] = E[Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i},\; Y_{i+1} = E(Y_{i+1} \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}),\; \ldots,\; Y_{k-1} = E(Y_{k-1} \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i})],
\]

then, from equations (7.29) and (7.30),

\[
M(Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i}) = M(Y_k \mid Y_1 = m_{1,1}, \ldots, Y_i = m_{i,i},\; Y_{i+1} = m_{i+1,i}, \ldots, Y_{k-1} = m_{k-1,i}).
\]
Hence

\[
M\{\Phi^{-1}[G_k(p_k)] \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i}\} = M\{\Phi^{-1}[G_k(p_k)] \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i},\; p_{i+1} = m^*_{i+1,i}, \ldots, p_{k-1} = m^*_{k-1,i}\},
\]

which, utilizing equations (7.26) and (7.27), gives

\[
\Phi^{-1}[G_k(m^*_{k,i})] = \Phi^{-1}\{G_k[M(p_k \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i},\; p_{i+1} = m^*_{i+1,i}, \ldots, p_{k-1} = m^*_{k-1,i})]\},
\]

i.e.

\[
m^*_{k,i} = M(p_k \mid p_1 = m^*_{1,1}, \ldots, p_i = m^*_{i,i},\; p_{i+1} = m^*_{i+1,i}, \ldots, p_{k-1} = m^*_{k-1,i}).
\]

Since the condition in the RHS of the above equation is on all the $p_i$'s except $p_k$, applying the unit sum constraint gives the conditional median in the form of the following complement

\[
m^*_{k,i} = 1 - \sum_{j=1}^{i} m^*_{j,j} - \sum_{j=i+1}^{k-1} m^*_{j,i},
\]

which ends the proof of Lemma 7.1.

• To elicit a positive-definite correlation matrix $R$, let $\underline{Y}_i = (Y_1, Y_2, \ldots, Y_i)'$ and $R_i = \mathrm{Var}(\underline{Y}_i)$, $i = 1, 2, \ldots, k-1$, where $R_1 = \mathrm{Var}(Y_1) = 1$ and the final matrix $R = R_{k-1}$. Suppose that $R_{i-1}$ has been estimated as a positive-definite matrix; we aim now to elicit $R_i$, and show it is positive-definite. $R_i$ can be partitioned as follows

\[
R_i = \begin{pmatrix} R_{i-1} & R_{i-1}\underline{r}_i \\ \underline{r}_i' R_{i-1} & V_i \end{pmatrix}, \tag{7.32}
\]

where $R_{i-1}\underline{r}_i = \mathrm{Cov}(\underline{Y}_{i-1}, Y_i)$, $V_i = \mathrm{Var}(Y_i)$.

Although the Gaussian copula function implies that $\mathrm{Var}(Y_i) = 1$, we will find another estimate for $V_i$ using the conditional variance of $Y_i$ elicited in (7.31). The reason for this, as will be shown later, is to follow the approach of Kadane et al. (1980) so as to ensure the positive-definiteness of the matrix $R_i$.

In what follows, we use the conditional median assessments to estimate $\underline{r}_i$. Using the partition (7.32), it is well-known from multivariate normal distribution theory, since $E(\underline{Y}) = \underline{0}$, that

\[
E(Y_i \mid \underline{Y}_{i-1}) = \underline{Y}_{i-1}'\, \underline{r}_i. \tag{7.33}
\]

Moreover, for $j < i-1$, taking the conditional expectation of both sides of (7.33), given that $\underline{Y}_j = (m_{1,0} + \eta_1, m_{2,1} + \eta_2, \ldots, m_{j,j-1} + \eta_j)' \equiv \underline{y}_j$, gives

\[
E[E(Y_i \mid \underline{Y}_{i-1}) \mid \underline{Y}_j = \underline{y}_j] = E(\underline{Y}_{i-1}' \mid \underline{Y}_j = \underline{y}_j)\, \underline{r}_i, \tag{7.34}
\]

i.e.

\[
E(Y_i \mid \underline{Y}_j = \underline{y}_j) = \left( y_1, \ldots, y_j,\; E(Y_{j+1} \mid \underline{Y}_j = \underline{y}_j), \ldots, E(Y_{i-1} \mid \underline{Y}_j = \underline{y}_j) \right) \underline{r}_i. \tag{7.35}
\]
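From (7.29) and (7.35) we get

\[
m_{i,j} = (m_{1,0} + \eta_1,\; m_{2,1} + \eta_2,\; \ldots,\; m_{j,j-1} + \eta_j,\; m_{j+1,j},\; \ldots,\; m_{i-1,j})\, \underline{r}_i.
\]

Since $j = 1, 2, \ldots, i-1$, we end up with a system of $i-1$ equations of the form

\[
\underline{T}_i = Q_{i-1}\, \underline{r}_i, \tag{7.36}
\]

where

\[
\underline{T}_i = \begin{pmatrix} m_{i,1} \\ m_{i,2} \\ \vdots \\ m_{i,i-1} \end{pmatrix}
\quad \text{and} \quad
Q_{i-1} = \begin{pmatrix}
\eta_1 & m_{2,1} & m_{3,1} & \cdots & m_{i-1,1} \\
\eta_1 & m_{2,1} + \eta_2 & m_{3,2} & \cdots & m_{i-1,2} \\
\eta_1 & m_{2,1} + \eta_2 & m_{3,2} + \eta_3 & \cdots & m_{i-1,3} \\
\vdots & \vdots & \vdots & & \vdots \\
\eta_1 & m_{2,1} + \eta_2 & m_{3,2} + \eta_3 & \cdots & m_{i-1,i-2} + \eta_{i-1}
\end{pmatrix}.
\]

Since $m_{i,j} - m_{i,j-1} = \theta_{i,j}$, $j = 1, 2, \ldots, i-1$, and $m_{i,0} = 0$, multiplying both sides of (7.36) from the left by the matrix

\[
M_{i-1} = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
-1 & 1 & 0 & \cdots & 0 \\
0 & -1 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 1
\end{pmatrix},
\]

the system can be written as

\[
\begin{pmatrix} \theta_{i,1} \\ \theta_{i,2} \\ \vdots \\ \theta_{i,i-1} \end{pmatrix}
=
\begin{pmatrix}
\eta_1 & \theta_{2,1} & \cdots & \theta_{i-1,1} \\
0 & \eta_2 & \cdots & \theta_{i-1,2} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \eta_{i-1}
\end{pmatrix} \underline{r}_i.
\]

Provided that $\eta_j \neq 0$, $j = 1, 2, \ldots, i-1$, the upper diagonal matrix $M_{i-1} Q_{i-1}$ is non-singular. Hence

\[
\underline{r}_i =
\begin{pmatrix}
\eta_1 & \theta_{2,1} & \cdots & \theta_{i-1,1} \\
0 & \eta_2 & \cdots & \theta_{i-1,2} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \eta_{i-1}
\end{pmatrix}^{-1}
\begin{pmatrix} \theta_{i,1} \\ \theta_{i,2} \\ \vdots \\ \theta_{i,i-1} \end{pmatrix}.
\]

• Since $\mathrm{Var}(Y_i \mid \underline{Y}_{i-1}) = \mathrm{Var}(Y_i) - \underline{r}_i' R_{i-1} \underline{r}_i$, we can now use the assessed conditional variance given by $V_{i|i-1}$ in (7.31) to estimate the unconditional variance $V_i$ as follows

\[
V_i = V_{i|i-1} + \underline{r}_i' R_{i-1} \underline{r}_i.
\]

Using the Schur complement, the matrix $R_i$ is positive-definite if and only if

\[
V_i - \underline{r}_i' R_{i-1} \underline{r}_i > 0,
\]

which is guaranteed from (7.31) since $V_{i|i-1} > 0$.

• Choosing the arbitrary values $\eta^*_j \neq 0$, $j = 1, 2, \ldots, i-1$, guarantees the existence of a unique solution for $\underline{r}_i$. It can be seen from the relation

\[
\eta_j = \Phi^{-1}[G_j(m^*_{j,j-1} + \eta^*_j)] - \Phi^{-1}[G_j(m^*_{j,j-1})]
\]

that $\eta_j \neq 0$ as $\eta^*_j \neq 0$, $j = 1, 2, \ldots, i-1$.

• With the proposed method, $R_i$ is a positive-definite matrix if $R_{i-1}$ is positive-definite ($i = 2, 3, \ldots, k-1$). Since $R_1 = 1 > 0$, by mathematical induction, the full correlation matrix $R = R_{k-1}$ is guaranteed to be positive-definite.

We have to note that, according to this method of elicitation, the variances on the main diagonal of $R$, say $r_{i,i}$, $i = 1, 2, \ldots, k-1$, will seldom equal one, except for the first element $r_{1,1}$.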
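The recursive construction can be sketched in a few lines of Python. The code below is an illustration only: it solves the upper-triangular system for $\underline{r}_i$ by back substitution, extends $R_{i-1}$ to $R_i$ using the partition (7.32) with $V_i = V_{i|i-1} + \underline{r}_i' R_{i-1} \underline{r}_i$, and confirms positive-definiteness with a Cholesky factorization. The function names and all numerical inputs (the $\eta$, $\theta$ and conditional variance values) are hypothetical and are not taken from the thesis; the final helper rescales the result to unit diagonal, which is one standard way to turn a covariance matrix into a correlation matrix.

```python
import math


def back_substitute(upper, rhs):
    """Solve upper * r = rhs for an upper-triangular matrix with non-zero diagonal."""
    n = len(rhs)
    sol = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(upper[row][col] * sol[col] for col in range(row + 1, n))
        sol[row] = (rhs[row] - tail) / upper[row][row]
    return sol


def extend(R_prev, r, v_cond):
    """Build R_i from R_{i-1}, r_i and the conditional variance V_{i|i-1}."""
    n = len(R_prev)
    cov = [sum(R_prev[a][c] * r[c] for c in range(n)) for a in range(n)]  # R_{i-1} r_i
    v = v_cond + sum(r[a] * cov[a] for a in range(n))  # V_i = V_{i|i-1} + r' R r
    R = [R_prev[a] + [cov[a]] for a in range(n)]
    R.append(cov + [v])
    return R


def cholesky(M):
    """Lower Cholesky factor; raises ValueError if M is not positive-definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = M[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L


def to_correlation(R):
    """Rescale a covariance matrix to unit diagonal."""
    s = [1.0 / math.sqrt(R[i][i]) for i in range(len(R))]
    return [[s[i] * R[i][j] * s[j] for j in range(len(R))] for i in range(len(R))]


# Hypothetical inputs: eta_1 = 0.5, eta_2 = 0.4, theta_{2,1} = -0.2,
# theta_{3,1} = 0.1, theta_{3,2} = 0.2, V_{2|1} = 0.8, V_{3|2} = 0.6.
R1 = [[1.0]]
r2 = back_substitute([[0.5]], [-0.2])
R2 = extend(R1, r2, v_cond=0.8)
r3 = back_substitute([[0.5, -0.2], [0.0, 0.4]], [0.1, 0.2])
R3 = extend(R2, r3, v_cond=0.6)
cholesky(R3)  # succeeds, so R3 is positive-definite
R_star = to_correlation(R3)
```

Because each step adds a strictly positive conditional variance to $\underline{r}_i' R_{i-1} \underline{r}_i$, the Schur complement of $R_{i-1}$ in $R_i$ equals $V_{i|i-1}$, which is why the Cholesky factorization cannot fail for valid inputs.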
It is easy, however, to transform $R$ into $R^*$, where $R^*$ is a suitable correlation matrix for the Gaussian copula function, satisfying both the unit variances and positive-definiteness. $R^*$ can be obtained from $R$ using the transformation

\[
R^* = A R A,
\]

where

\[
A = \mathrm{diag}\left( \frac{1}{\sqrt{r_{1,1}}}, \frac{1}{\sqrt{r_{2,2}}}, \ldots, \frac{1}{\sqrt{r_{k-1,k-1}}} \right).
\]

• The unit variances in the correlation matrix $R^*$ ensure that each marginal distribution $G_i(p_i)$ is still a beta distribution with the same marginal hyperparameters $\alpha_i$ and $\beta_i$ that were elicited before ($i = 1, 2, \ldots, k$).

• The accompanying software outputs the elicited pairs of beta parameters $\alpha_i$ and $\beta_i$, for $i = 1, 2, \ldots, k$, together with the elicited covariance matrix, $R^*$.

7.5 Example: Waste collection

The Environmental Agency in the UK is currently interested in the fuel consumption of waste collection vehicles. It is thought that substantial quantities of fuel are used to collect recyclable waste and that local authorities are insufficiently aware of the amounts involved. In this example, a waste management expert (Dr. Stephen Burnley, The Open University) used the PEGS-Copula elicitation software to quantify his opinion about the proportions of waste collection trips according to the type of recyclable waste.

Dr. Burnley is a fellow of the Chartered Institution of Waste Management. He advised that two main types of the waste are considered: urban recycle and rural recycle. Each of them may contain bins, sacks, garden waste and recycle waste. Hence, each collection trip is arranged by the local authority for only one of eight different waste types. Considering the proportions of collection trips for waste in each category, the problem can be formulated in a multinomial model with eight categories. Our method and software were used to quantify the expert's opinion about a Gaussian copula prior for the parameters of this multinomial model.
After initializing the software and defining the model, the expert assessed his medians of the proportion of collection trips for each of the following 8 types of waste: urban-bins, urban-sacks, urban-garden, rural-bins, rural-sacks, rural-garden, urban-recycle, rural-recycle. Then the expert assessed lower and upper quartiles for the proportion of each category. His assessed medians and quartiles are shown as blue bars and short dark blue horizontal lines, respectively, in Figure 7.7. These assessments are also given in Table 7.2 below.

Figure 7.7: The initially assessed marginal medians and quartiles

Table 7.2: Expert's assessments of medians and quartiles

                  p_1    p_2    p_3    p_4    p_5    p_6    p_7    p_8
Lower quartile    0.25   0.05   0.13   0.05   0.01   0.02   0.18   0.07
Median            0.30   0.08   0.20   0.07   0.03   0.05   0.25   0.09
Upper quartile    0.35   0.12   0.28   0.15   0.05   0.07   0.30   0.25

These assessments were used to elicit a marginal beta prior distribution for the proportion of trips in each category. For mathematical coherence, the expected values of these elicited beta priors must sum to 1, so the software used the initial assessments to elicit beta distributions that satisfy this condition. The median values and quartiles of the coherent beta distributions were computed and presented to the expert as feedback in Figure 7.8. During this feedback stage he was invited to accept or revise these quantities. The initial median values given by the expert have a sum that is nearly equal to one, so the coherent medians and quartiles suggested by the software in Figure 7.8 were close to his assessments and he naturally accepted them as representatives of his opinions.
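The coherence adjustment mentioned here corresponds to step 3 of Section 7.4.2: the beta means are rescaled to sum to one while each variance is held fixed, and the beta parameters are then recovered by the method of moments. A minimal sketch follows; the function name and the three example parameter pairs are hypothetical and are not the expert's values.

```python
def normalize_beta_marginals(params):
    """Rescale beta means to sum to one, keeping each variance fixed.

    params is a list of (alpha, beta) pairs; returns the modified pairs
    (alpha*, beta*) following equations (7.24) and (7.25).
    """
    means = [a / (a + b) for a, b in params]
    variances = [a * b / ((a + b) ** 2 * (a + b + 1.0)) for a, b in params]
    total = sum(means)
    adjusted = []
    for mu, var in zip(means, variances):
        mu_star = mu / total                        # normalized mean, (7.24)
        nu = mu_star * (1.0 - mu_star) / var - 1.0  # equals alpha* + beta*
        if nu <= 0.0:
            raise ValueError("variance too large for the normalized mean")
        adjusted.append((mu_star * nu, (1.0 - mu_star) * nu))
    return adjusted


# Hypothetical marginals whose means (0.30, 0.25, 0.50) sum to 1.05.
raw = [(6.0, 14.0), (5.0, 15.0), (10.0, 10.0)]
for alpha_star, beta_star in normalize_beta_marginals(raw):
    print(f"alpha* = {alpha_star:.4f}, beta* = {beta_star:.4f}")
```

Holding the variances fixed means the adjustment shifts each distribution slightly without changing how uncertain the expert is about each proportion, which is consistent with the observation that the coherent quartiles stayed close to the original assessments.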
Figure 7.8: The coherent assessments suggested by the software

To elicit a correlation matrix for the Gaussian copula prior, the expert gave conditional assessments that quantified his opinion about the dependence structure between the marginal beta distributions. To do that, he assessed conditional quartile values, under the condition that the assessed medians for the previous categories were actually the true values. For example, he assessed his conditional quartiles of the proportion for the fourth category, given that the median values for the first three categories equalled their true values. This is illustrated in Figure 7.9.

Figure 7.9: Assessing conditional quartiles

The expert's seven pairs of assessments for the lower and upper conditional quartiles are given in Table 7.3. The quartiles for the last category (the $p_8$ column of Table 7.3) were not assessed directly: they were automatically computed by the software when the expert assessed the two quartiles for the seventh category. This is also illustrated in Figure 7.10.

Table 7.3: Expert's assessments of conditional quartiles

                  p_2    p_3    p_4    p_5    p_6    p_7    p_8
Lower quartile    0.03   0.10   0.03   0.01   0.02   0.20   0.19
Upper quartile    0.13   0.23   0.08   0.04   0.08   0.28   0.27
re cy c le Figure 7.10: Assessing conditional quartiles for the last two categories Next, conditional on the proportion for the first category being 0.12, the expert gave conditional median assessments for the proportions of the seven remaining categories. The number of conditions was then increased in stages. For example, in Figure 7.11, the expert has assessed the conditional medians for the last five categories given th a t the proportions for the first three categories are 0.12, 0.04 and 0.08, respectively. Table 7.4 gives all the conditional median assessments, where the underlined values constitute the conditioning set at each stage. 207 • o z ^ z irz riiz iz z z z z z ^ ^ r * * llcrrr, you: bav® fTimSzed wiTlx t h i s fra m e . You m a y p r e s s 'f f a x f t o p ro c e e d |V^ - ■- jb i *s n j^ j FBe t e a Tools Help Eliciting conditional m edians o f P robabilities fo r Each C ategory 0.95 0.90 0.85 0.80 0.75 0.70 0.65 0.60 £ 0.55 I 0-50 | 0.45 0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05 0.00 U .b ln s U. s a c k s U. g a rd en R. bins R. sa c k s R. G arden U. recycle R. recycle C ategories r < BacK | -: Rs^seStfggesb'ons | f *cs«pt S u g g g s te n ij j !&* >■ | Figure 7.11: Assessing conditional medians Table 7.4: E xpert’s assessments of conditional medians Pi P2 P3 P4 Pb P6 P7 P8 0.12 0.09 0.16 0.10 0.06 0.12 0.15 0.14 0.12 0.04 0.16 0.14 0.06 0.14 0.14 0.2 0.12 0.04 0.08 0.14 0.06 0.18 0.14 0.22 0.12 0.04 0.08 0.07 0.10 0.20 0.14 0.23 0.12 0.04 0.08 0.07 0.05 0.22 0.15 0.26 0.12 0.04 0.08 0.07 0.05 0.11 0.22 0.33 This was the last assessment task, after which the software output the elicited hyperpa­ ram eters of the m arginal beta prior distributions as in Table 7.5. The dependence structure between these beta marginals was quantified as a m ultivariate Gaussian copula function w ith an elicited covariance m atrix as given in Table 7.6. 
Table 7.5: The elicited hyperparameters of marginal beta distributions

      p1        p2        p3       p4       p5        p6        p7       p8
a     3.7607    1.0661    1.6536   0.6951   0.5731    0.6344    1.2489   0.4545
b     12.0047   14.8133   8.6742   8.6012   19.5493   14.8684   3.3578   3.3669

Table 7.6: The elicited covariance matrix of the Gaussian copula prior

       Y1        Y2        Y3        Y4        Y5        Y6        Y7
Y1     1         -0.1279   -0.2601   -0.7773   -0.55     -0.6192   0.5414
Y2     -0.1279   1         0.1328    0.1479    -0.4842   -0.3304   0.3326
Y3     -0.2601   0.1328    1         -0.082    0.042     -0.03     -0.0618
Y4     -0.7773   0.1479    -0.082    1         0.2358    0.4632    -0.4406
Y5     -0.55     -0.4842   0.042     0.2358    1         0.5664    -0.5812
Y6     -0.6192   -0.3304   -0.03     0.4632    0.5664    1         -0.8354
Y7     0.5414    0.3326    -0.0618   -0.4406   -0.5812   -0.8354   1

The elicited matrix in Table 7.6 does not give covariances between the beta distributed proportions, $p_1, \dots, p_8$. Instead, it gives the covariances between the transformed normal variates, $Y_1, \dots, Y_7$. The eighth transformed normal variate is omitted so as to avoid the singularity of the elicited matrix, as discussed before. The Gaussian copula multivariate distribution is parameterized by both the marginal beta parameters and the covariance matrix in Table 7.6. The software produces a WinBUGS file with the Gaussian copula prior distribution. Marginal beta parameters can also be used to compute the expected value and variance of the proportions of each category. These are given in Table 7.7, where the expected values are very close to the coherent median assessments in Figure 7.8, and even closer to the initial median assessments in Table 7.2 and Figure 7.7.

The elicitation process took about an hour to complete. The expert stressed the importance of the convenient order of categories when conditioning. During the task of giving conditional assessments based on an increasing number of conditions, he commented that ordering the categories in a suitable sequence made it easier for him to think about these conditions according to his knowledge.
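As a quick check on how the marginal beta hyperparameters determine the reported means and variances, the sketch below applies the standard beta moment formulas, $E(p) = a/(a+b)$ and $V(p) = ab/[(a+b)^2(a+b+1)]$, to the values of Table 7.5. The function names are illustrative only and are not part of the PEGS software.

```python
# Hyperparameters (a, b) copied from Table 7.5, one pair per category p1..p8.
a = [3.7607, 1.0661, 1.6536, 0.6951, 0.5731, 0.6344, 1.2489, 0.4545]
b = [12.0047, 14.8133, 8.6742, 8.6012, 19.5493, 14.8684, 3.3578, 3.3669]


def beta_mean(a_i, b_i):
    """Mean of a Beta(a_i, b_i) distribution."""
    return a_i / (a_i + b_i)


def beta_variance(a_i, b_i):
    """Variance of a Beta(a_i, b_i) distribution."""
    s = a_i + b_i
    return a_i * b_i / (s * s * (s + 1.0))


means = [beta_mean(x, y) for x, y in zip(a, b)]
variances = [beta_variance(x, y) for x, y in zip(a, b)]

for j, (m, v) in enumerate(zip(means, variances), start=1):
    print(f"p{j}: mean={m:.3f} variance={v:.3f}")

# Coherence requires the eight means to sum to one (up to rounding).
print(f"sum of means = {sum(means):.3f}")
```

Rounded to three decimals, these values can be compared directly with the entries of Table 7.7.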
Table 7.7: Probability means and variances from marginal beta distributions

           p1      p2      p3      p4      p5      p6      p7      p8
E(p_i)     0.239   0.067   0.160   0.075   0.028   0.041   0.271   0.119
V(p_i)     0.011   0.004   0.012   0.007   0.001   0.002   0.035   0.022

7.6 Concluding comments

The elicitation methods for beta parameters proposed in the previous chapter have been used in this chapter as the main tools for eliciting two more flexible prior distributions for multinomial models. A novel elicitation method for the generalized Dirichlet distribution has been introduced. The method makes use of the fact that the conditional distributions of the generalized Dirichlet variates are beta distributions. The method has been implemented in user-friendly software that is freely available as PEGS-Dirichlet at http://statistics.open.ac.uk/elicitation.

The elicitation of copula functions for multinomial models faces two obstacles, as noted in the literature. The usual correlations cannot be transformed through the assumed copula transformation, which is one obstacle, and the need to elicit a positive-definite variance-covariance matrix is the other. Our proposed elicitation method for the Gaussian copula prior has overcome both problems. The assessed conditional quartiles could be transformed through the normalizing one-to-one transformation, making it possible to elicit correlations. Moreover, the method of Kadane et al. (1980) has been modified to elicit a positive-definite variance-covariance matrix for the Gaussian copula. The method has been implemented in the user-friendly PEGS-Copula software that is freely available at http://statistics.open.ac.uk/elicitation.

Chapter 8

Eliciting logistic normal priors for multinomial models

8.1 Introduction

The logistic normal distribution has long been used as a multivariate distribution for proportions (Aitchison, 1986).
The constrained proportions are obtained by transforming normally distributed unconstrained variables on the real space using some one-to-one transformation. Different multivariate logistic transformations are given in the literature, see for example Aitchison (1986). The most well-known and widely used logistic transformation, especially for multinomial logit models, is the additive logistic transformation.

We propose a method for quantifying opinion about a logistic normal prior for multinomial models. Our proposed method has been implemented in interactive graphical user-friendly software developed in Java. This is freely available as PEGS-Logistic at http://statistics.open.ac.uk/elicitation. The elicitation method proposed here is generalized in Chapter 9 to handle the case of multinomial models with covariates, or what are known as the multinomial logit models.

In Section 8.2 we define the logistic normal prior to be used and consider its assumptions. The required assessments with our structural procedure to elicit them using the software are given in Section 8.3. The use of these assessments to elicit the hyperparameters of the logistic normal prior distribution is proposed in Section 8.4. A method to obtain the prior's marginal quartiles, which are useful as feedback, is proposed in Section 8.5. We finish this chapter by giving an example in Section 8.6 and some concluding comments in Section 8.7.

8.2 The additive logistic normal distribution

The additive logistic transformation from $\underline{Y}^*$ to $\underline{p}$ is defined by
\[ p_1 = \frac{1}{1 + \sum_{j=2}^{k} \exp(Y_j)}, \qquad p_i = \frac{\exp(Y_i)}{1 + \sum_{j=2}^{k} \exp(Y_j)}, \quad i = 2, 3, \dots, k, \tag{8.1} \]
with inverse transformation
\[ Y_i = \log\left(\frac{p_i}{p_1}\right) = \log\left(\frac{p_i}{1 - p_2 - p_3 - \cdots - p_k}\right), \quad i = 2, 3, \dots, k, \tag{8.2} \]
where
\[ \underline{Y}^* = (Y_2, Y_3, \dots, Y_k)' \sim \mathrm{MVN}(\underline{\mu}_{k-1}, \Sigma_{k-1}). \tag{8.3} \]

• The transformation is one-to-one from the $k-1$ dimension random vector $\underline{Y}^*$ into the $k$ dimension random vector $\underline{p}$. The definition of an extra random variable $Y_1$ will be given later.
• For any values $Y_2, \dots, Y_k$, (8.1) gives $\sum_{i=1}^{k} p_i = 1$.

• The matrix $\Sigma_{k-1}$ is non-singular.

• The transformation is not symmetric in the $p_i$, as we choose a fill-up variable $p_1 = 1 - p_2 - p_3 - \cdots - p_k$.

• The transformation is used in the multinomial logit regression model when $Y_i = \underline{X}\,\underline{\beta}_i$.

• If (8.3) applies, the elements of the vector $\underline{p}$ are said to have the multivariate logistic normal distribution. Their joint density has the form
\[ f(\underline{p} \mid \underline{\mu}_{k-1}, \Sigma_{k-1}) = (2\pi)^{-\frac{k-1}{2}} |\Sigma_{k-1}|^{-\frac{1}{2}} (p_1 \times p_2 \times \cdots \times p_k)^{-1} \exp\left\{ -\tfrac{1}{2} \left[\log(\underline{p}_{-1}/p_1) - \underline{\mu}_{k-1}\right]' \Sigma_{k-1}^{-1} \left[\log(\underline{p}_{-1}/p_1) - \underline{\mu}_{k-1}\right] \right\}, \]
where $\underline{p}_{-1}' = (p_2 \; p_3 \; \dots \; p_k)$, $0 < p_i < 1$, $\sum_{i=1}^{k} p_i = 1$.

• This additive logistic normal distribution is said to be permutation invariant. That is, whatever be the ordering of the elements of the vector $\underline{p}$, the density function given above is invariant. For a theoretical proof of this property see Aitchison (1986).

Under the permutation invariance, any order of the elements of $\underline{p}$ can be considered. Consequently, the choice of the fill-up variable is arbitrary. Usually it is chosen as the probability of the most common category, the first category, or the last category. To elicit a logistic normal prior, we favour choosing the most common category as the first category and making $p_1$ the fill-up variable. This is more convenient for our method because of the order of conditioning we adopt later.

• For sampling compositional data, the problem of zero components has been reported by Aitchison (1986) as a critical irregular case that needs special attention in dealing with the logistic normal distribution. Clearly, the log transformation cannot be applied with zero components. However, we need not worry about this problem in our elicitation method, as categories with assessed zero probabilities can simply be removed from the analysis at the first early step without any loss.

We assume that prior opinion about $\underline{Y}^*$ can be represented by the multivariate normal distribution in (8.3).
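The properties listed above can be illustrated numerically. The following is a minimal sketch that assumes the standard additive logistic form with $p_1$ as the fill-up variable, which is the form implied by the inverse transformation (8.2); the function names and the input values are hypothetical.

```python
import math


def additive_logistic(y_star):
    """Map (Y_2, ..., Y_k) to (p_1, ..., p_k), with p_1 as the fill-up variable."""
    exps = [math.exp(y) for y in y_star]
    denom = 1.0 + sum(exps)
    return [1.0 / denom] + [e / denom for e in exps]


def inverse_additive_logistic(p):
    """Map (p_1, ..., p_k) back to Y_i = log(p_i / p_1), i = 2, ..., k."""
    return [math.log(p_i / p[0]) for p_i in p[1:]]


# Hypothetical values of (Y_2, Y_3, Y_4), so k = 4 categories.
y_star = [0.5, -1.0, 0.2]
p = additive_logistic(y_star)
y_back = inverse_additive_logistic(p)

print("p =", [round(x, 4) for x in p])
print("sum(p) =", sum(p))                         # unit sum constraint
print("recovered Y* =", [round(y, 4) for y in y_back])
```

The round trip recovers the original vector, which reflects the one-to-one property, and the proportions sum to one for any real inputs.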
As will be shown later, for the assessments of $\underline{p}$ to be fully transformable to $\underline{Y}^*$, a further normalizing transformation must be defined on the fill-up variable $p_1$. We define an extra variable $Y_1$ such that
\[ Y_1 = \log\left(\frac{p_1}{1 - p_1}\right). \tag{8.4} \]
Based on the normality assumption of $\underline{Y}^*$ in (8.3) and the unit sum constraint of $\underline{p}$, the random variable $e^{-Y_1}$ can be represented as a sum of $k-1$ lognormally distributed random variables, since
\[ e^{-Y_1} = \frac{1 - p_1}{p_1} = \sum_{i=2}^{k} \frac{p_i}{p_1} = \sum_{i=2}^{k} e^{Y_i}. \]
Although the sum of lognormal random variables has no simple exact distribution, it is common to approximate its distribution by another lognormal distribution. This is discussed in the next section.

8.2.1 Approximate distribution of the lognormal sum

Fenton (1960) considered the numerical convolution of lognormal distributions and showed that the sum of such distributions is a distribution that approximately follows the lognormal law. He added that the sum of two (or more) lognormal distributions can be assumed, as a first approximation, to have another lognormal distribution. Later, Schwartz and Yeh (1982) mentioned that there is an accumulated body of evidence indicating that the distribution of the sum of a finite number of lognormal random variables is well-approximated, at least to first order, by another lognormal distribution.

Several approximations have been introduced for the sum of lognormal random variables. Although the idea of approximating their sum using another lognormal distribution has been common in many studies, methods differ in approximating the moments of the lognormal distribution of the sum. Fenton (1960) matches the first two moments of the sum of lognormal random variables to the first two moments of an equivalent lognormal random variable. Schwartz and Yeh (1982) follow the same approach but compute the exact first two moments for the sum of two lognormal random variables; the procedure is then iteratively applied for the sum of more than two lognormal random variables.
Their method of computing the distribution of a sum of independent lognormal random variables was extended to the case of correlated lognormal random variables by Safak (1993).

Recently, based on approximating the distribution of the sum of lognormal random variables by another lognormal distribution, much work has been devoted to giving various approximation methods. For example, Beaulieu and Xie (2004) use a linearizing transform with a linear minimax approximation to determine an optimal lognormal approximation to a lognormal sum distribution. Tellambura and Senaratne (2010) use the classical complex integration techniques to approximate the moment generating function of the sum. Mahmoud (2010) approximates the characteristic function and the cumulative distribution function of the lognormal sum by exploiting the recent Hermite-Gauss quadrature-based approximation.

It is thus natural to approximate the distribution of $Y_1$ by a normal distribution with elicited mean and variance. (We do not require any approximations to obtain its parameters.) We can then state our main assumption:
\[ \underline{Y}_k = (Y_1, Y_2, \dots, Y_k)' \sim \mathrm{MVN}(\underline{\mu}_k, \Sigma_k). \tag{8.5} \]
The unit sum constraint of $\underline{p}$ will always lead to a singular matrix $\Sigma_k$. However, we assume that there is only one condition on the elements of $\underline{p}$, namely the unit sum. In particular, we assume that there does not exist any subset of categories such that the sum of their probabilities is known with certainty.

Although no density function can be defined for the singular multivariate normal distribution, its theoretical properties and numerical results have been investigated in the literature. See, for example, Bland and Owen (1966), Kwong and Iglewicz (1996), Albajar and Fidalgo (1997) or Genz and Kwong (1999).

Usage of the singular normal is thus feasible and has been exploited in numerous multivariate methods.
Khatri (1968) used the notion of a generalized inverse to utilize the singular normal distribution in multivariate regression. Styan (1970) discussed the distribution of quadratic forms in singular normal variables. West and Harrison (1997) defined the covariance matrix of the multivariate normal distribution as a non-negative definite matrix. In Chapter 8 of his book on linear statistical inference, Rao (2002) did not use the density function to define the multivariate normal distribution. Instead, he characterized it by the property that every linear function of its elements has a univariate normal distribution. He could then list properties and characterizations of the multivariate normal distribution without using the pdf.

The singular normal distribution is thus a special case of the standard normal distribution, and has similar properties, but with the usual inverse of the covariance matrix replaced by its generalized inverse. Conditional properties of the singular normal distribution have been extensively used in the current chapter for eliciting a logistic normal distribution.

To this end, using (8.5), we assume that the prior distribution of $\underline{p}$ is the logistic normal distribution induced by the vector
\[ \underline{Y}^* = A\,\underline{Y}_k \sim \mathrm{MVN}(\underline{\mu}_{k-1}, \Sigma_{k-1}), \tag{8.6} \]
where $A = (\underline{0} \mid I_{k-1})$,
\[ \underline{\mu}_{k-1} = A\,\underline{\mu}_k, \tag{8.7} \]
\[ \Sigma_{k-1} = A\,\Sigma_k A'. \tag{8.8} \]

We start by eliciting $\underline{\mu}_k$ and a matrix $\Sigma_k$ of rank $k-1$. In our approach, we modify the method of Kadane et al. (1980) and add a special treatment for the $k$th row and column. This will give the $\underline{\mu}_{k-1}$ and $\Sigma_{k-1}$ in equations (8.7) and (8.8). The matrix $\Sigma_k$ is singular of rank $k-1$, given that no other constraint can be imposed on subsets of probabilities except the unit sum. However, the matrix $\Sigma_{k-1}$ is shown to be positive-definite of full rank $k-1$, since it is simply $\Sigma_k$ with its first row and column removed. A formal proof of the positive-definiteness of $\Sigma_{k-1}$ will be given later in Section 8.4.2.
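The effect of the selection matrix $A$ can be sketched with a small numerical example. The code below uses a hypothetical singular $3 \times 3$ matrix in the role of $\Sigma_k$ (it is not an elicited matrix) and assumes that $A = (\underline{0} \mid I_{k-1})$ is applied to both the mean vector and the covariance matrix; it shows that $A \Sigma_k A'$ is simply $\Sigma_k$ with its first row and column removed.

```python
def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(r) for r in zip(*x)]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


k = 3
# Hypothetical singular covariance matrix of rank k - 1 (third row = first + second).
sigma_k = [[1.0, 0.0, 1.0],
           [0.0, 1.0, 1.0],
           [1.0, 1.0, 2.0]]
mu_k = [0.3, -0.2, 0.1]  # hypothetical mean vector

# A = (0 | I_{k-1}): a (k-1) x k matrix that drops the first coordinate.
A = [[0.0] + [1.0 if c == r else 0.0 for c in range(k - 1)] for r in range(k - 1)]

sigma_star = matmul(matmul(A, sigma_k), transpose(A))
mu_star = [sum(a * m for a, m in zip(row, mu_k)) for row in A]

print("det(Sigma_k) =", det3(sigma_k))        # singular matrix
print("A Sigma_k A' =", sigma_star)           # first row and column removed
print("A mu_k =", mu_star)
```

In this toy case the full matrix has zero determinant, while the reduced $2 \times 2$ matrix has determinant $1 \times 2 - 1 \times 1 = 1 > 0$ and is positive-definite.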
8.3 Assessment tasks

Since the transformations in (8.2) and (8.4) are strictly monotonic increasing from $\underline{p}$ to $\underline{Y}_k$, we can establish a one-to-one correspondence between the medians and quartiles of these two vectors. The required assessments are detailed as follows.

8.3.1 Assessing initial medians

• The choice of a category to start with is arbitrary, as discussed earlier. Hence it may be chosen by the expert as the most common category and its probability is denoted $p_1$. A median value $m_1$ for $p_1$ will be assessed as a first step. Then the expert assesses median values $m_j$, $j = 2, \dots, k$, for all the remaining categories. These assessed values are shown by the blue bars in Figure 8.1.

Figure 8.1: Assessing probability medians for logistic normal elicitation

• The normality assumption of $\underline{Y}_k$, together with the unit sum constraint of $\underline{p}$, can be used in Lemma 8.1 and Theorem 8.1 (which are given in Section 8.4) to show that the unit sum constraint must be also fulfilled by the $m_j$. That is, $\sum_{j=1}^{k} m_j = 1$. To attain mathematical coherence, the software suggests a normalized set of assessments, given by the yellow bars in Figure 8.1, as follows. Suppose the initial assessments were $m'_1, \dots, m'_k$. Then the coherent assessments that are suggested for the $m_j$ are given by
\[ m_j = \frac{m'_j}{\sum_{i=1}^{k} m'_i}, \quad \text{for } j = 1, 2, \dots, k. \]
With our software, the expert can keep changing her assessed values until she is happy with the normalized values that are suggested.

8.3.2 Assessing conditional quartiles

• In this assessment task, the expert is asked to assess a lower quartile $L_1^*$ and an upper quartile $U_1^*$ for $p_1$.
She is then asked to assume that $p_1 = m_1$ and gives a lower quartile $L_2^*$ and an upper quartile $U_2^*$ for $p_2$. For each remaining $p_j$, $j = 3, \dots, k-1$, she assesses the two quartiles $L_j^*$ and $U_j^*$ given that $p_1 = m_1, p_2 = m_2, \dots, p_{j-1} = m_{j-1}$. See Figure 8.2, where the expert has assessed the two quartiles of $p_3$ conditional on the median values of $p_1$ and $p_2$ as given by the red bars.

• The lower (upper) quartile $L_k^*$ ($U_k^*$) of $p_k$ is automatically shown to the expert once she assesses the upper (lower) quartile $U_{k-1}^*$ ($L_{k-1}^*$) of $p_{k-1}$, see Figure 8.2. The two quartiles $L_k^*$ and $U_k^*$ are also shown to the expert as a guide to help her choose $L_{k-1}^*$ and $U_{k-1}^*$. In fact, $L_k^*$ ($U_k^*$) is the lower (upper) quartile of $(p_k \mid p_1 = m_1, \dots, p_{k-2} = m_{k-2})$, as
\[ L_k^* + U_{k-1}^* = U_k^* + L_{k-1}^* = 1 - m_1 - \cdots - m_{k-2}, \]
from the unit sum constraint.

Figure 8.2: Assessing conditional quartiles with lognormal feedback

• To help the expert during this current task, the software presents an interactive graph showing the pdf curve of the lognormal distribution of $(p_j \mid p_1 = m_1, \dots, p_{j-1} = m_{j-1})$, for $j = 2, 3, \dots, k-1$, see Figure 8.2. The expert is able to change her assessed conditional quartiles of $p_j$ until the conditional pdf curve forms an acceptable representation of her opinion. With the aid of the lognormal curve, the expert is advised to make sure that her assessed interquartile range gives an almost zero probability of $p_j$ exceeding $1 - \sum_{i=1}^{j-1} m_i$. This boundary is given by the red vertical line on the pdf graph of Figure 8.2.
See Lemma 8.2 for the formal validity of the above results.

8.3.3 Assessing conditional medians

• Here, the expert is asked to assume that the median of $p_1$ has been changed from $m_1$ to $m_{1,1}^* = m_1 + \eta_1^*$. Given this information, the expert will be asked to change her previous medians $m_j$ of each $p_j$. Her new assessment, $m_{j,1}^*$, may be written as
\[ m_{j,1}^* = m_j + \theta_{j,1}^*, \quad \text{for } j = 2, \dots, k. \tag{8.9} \]

• In each successive step $i$, for $i = 2, 3, \dots, k-2$, the expert will be asked to suppose that the median values of $p_1, p_2, \dots, p_i$ are $m_{1,1}^* = m_1 + \eta_1^*$, $m_{2,2}^* = m_{2,1}^* + \eta_2^*$, $\dots$, $m_{i,i}^* = m_{i,i-1}^* + \eta_i^*$, respectively, shown as red bars in Figure 8.3. Given this information, she will be asked to change her assessed medians of the most recent previous step, $m_{i+1,i-1}^*, m_{i+2,i-1}^*, \dots, m_{k,i-1}^*$, shown by black lines in Figure 8.3. Her new assessments are $m_{i+1,i}^* = m_{i+1,i-1}^* + \theta_{i+1,i}^*$, $m_{i+2,i}^* = m_{i+2,i-1}^* + \theta_{i+2,i}^*$, $\dots$, $m_{k,i}^* = m_{k,i-1}^* + \theta_{k,i}^*$, respectively, which are shown as the blue bars in Figure 8.3. For $i = 2, 3, \dots, k-2$, and $j = i+1, i+2, \dots, k$, we can write
\[ m_{j,i}^* = m_{j,i-1}^* + \theta_{j,i}^* \text{ is the median of } (p_j \mid p_1 = m_{1,1}^*, \dots, p_i = m_{i,i}^*). \tag{8.10} \]

• For mathematical coherence, as will be proved in Lemma 8.3, we have to make sure that
\[ \sum_{j=1}^{i} m_{j,j}^* + \sum_{j=i+1}^{k} m_{j,i}^* = 1, \quad i = 1, 2, \dots, k-2. \]
The expert has the option of changing her initial set of assessments $m'_{i+1,i}, m'_{i+2,i}, \dots, m'_{k,i}$, the blue bars on Figure 8.3, until she feels that the suggested normalized set $m_{i+1,i}^*, \dots, m_{k,i}^*$, shown as yellow bars on Figure 8.3, gives the best representation of her opinion. The software suggests each normalized conditional median $m_{j,i}^*$ as
\[ m_{j,i}^* = m'_{j,i}\, \frac{1 - \sum_{r=1}^{i} m_{r,r}^*}{\sum_{r=i+1}^{k} m'_{r,i}}, \quad \text{for } i = 1, \dots, k-2, \; j = i+1, \dots, k. \]
Figure 8.3: Assessing conditional medians for logistic normal elicitation

• The current assessment task stops at step $k-2$, as we do not ask for any conditional assessments for the last remaining category $p_k$. As the condition of summing to one must be fulfilled, conditioning on specific values of all $p_1, p_2, \dots, p_{k-1}$ gives a fixed value for $p_k$. Then no upper or lower quartiles can be assessed for $p_k$, as mentioned before. Conditional medians of $Y_k$ given specific values of $Y_1, Y_2, \dots, Y_{k-1}$ can be automatically computed when needed, as will be shown later.

8.4 Eliciting prior hyperparameters

The normalizing one-to-one functions in equations (8.2) and (8.4) are used to transform the assessed conditional quartiles of $\underline{p}$ into conditional quartiles of $\underline{Y}_k$ and, hence, into conditional expectations, variances and covariances of the multivariate normal variables. In particular, letting $M(X)$ denote the median function of the random variable $X$, we proceed as follows. Let
\[ m_{j,0}^* = \begin{cases} m_1, & \text{for } j = 1, \\ M(p_j \mid p_1 = m_1), & \text{for } j = 2, 3, \dots, k. \end{cases} \tag{8.11} \]
Since the normal variates, $Y_j = \log(p_j/p_1)$, $j = 2, \dots, k$, depend on the fill-up probability $p_1$, eliciting prior hyperparameters for $\underline{Y}^*$ is tractable if we condition on $p_1$. That is why we define the extra normal variate $Y_1$ as in (8.4) and the conditional medians, $m_{j,0}^*$, as in (8.11). These conditional medians are required instead of the assessed unconditional medians, $m_j$, to elicit the hyperparameters of the logistic normal prior distribution.
However, we chose to elicit the unconditional medians as they are easier to assess than conditional medians. Fortunately, under the normality assumption of $\underline{Y}_k$ and the unit sum constraint of $\underline{p}$, we will show in Theorem 8.1 below that the marginal unconditional medians, $m_j$, are identical to conditional medians, $m_{j,0}^*$, of $p_j$, for $j = 1, 2, \dots, k$, respectively, provided the lognormal sum is adequately approximated by another lognormal random variable.

For $i = 1, 2, \dots, k$, let
\[ m_{i,0} = E(Y_i). \tag{8.12} \]

Remark 8.1 It is worth noting that $Y_1 = m_{1,0}$ when $p_1 = m_{1,0}^*$, but $Y_i = m_{i,0}$ when both $p_i = m_{i,0}^*$ and $p_1 = m_{1,0}^*$, for $i = 2, 3, \dots, k$.

Extensive use is made of the fact that each $Y_i$ follows a symmetric distribution (each has a normal distribution), so $E(Y_i) = M(Y_i)$. This is a key assumption in proving the following lemma, which states an important result that is needed in the proof of Theorem 8.1.

Lemma 8.1. Under the unit sum constraint of $\underline{p}$, and the multivariate normality of $\underline{Y}_k$,
\[ \sum_{i=1}^{k} m_{i,0}^* = 1. \]

Proof As given by (Rao, 2002, p 522), the conditional distribution of any subset of singular normal random variables is normally distributed with the usual conditional mean and variance, but with generalized inverses of matrices. This property enables us to write, as in the non-singular case,
\[ E(Y_k) = E[Y_k \mid Y_1 = E(Y_1)] = E[Y_k \mid Y_1 = E(Y_1), Y_2 = E(Y_2), \dots, Y_{k-1} = E(Y_{k-1})]. \]
Then, replacing means by medians and using (8.12), we get
\[ M(Y_k \mid Y_1 = m_{1,0}) = M(Y_k \mid Y_1 = m_{1,0}, Y_2 = m_{2,0}, \dots, Y_{k-1} = m_{k-1,0}). \]
Hence, from Remark 8.1,
\[ M[\log(p_k) - \log(p_1) \mid p_1 = m_{1,0}^*] = M[\log(p_k) - \log(p_1) \mid p_1 = m_{1,0}^*, \dots, p_{k-1} = m_{k-1,0}^*], \]
which gives
\[ \log(m_{k,0}^*) - \log(m_{1,0}^*) = \log[M(p_k \mid p_1 = m_{1,0}^*, \dots, p_{k-1} = m_{k-1,0}^*)] - \log(m_{1,0}^*), \]
i.e.
\[ m_{k,0}^* = M(p_k \mid p_1 = m_{1,0}^*, p_2 = m_{2,0}^*, \dots, p_{k-1} = m_{k-1,0}^*) = 1 - \sum_{i=1}^{k-1} m_{i,0}^*. \]
This is the unit sum constraint, which completes the proof of Lemma 8.1.
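Lemma 8.1 is the reason the software asks the expert to accept medians that sum to one. The normalization step described in Section 8.3.1 can be sketched as follows; the raw assessments are hypothetical and the function name is illustrative, not taken from the PEGS-Logistic software.

```python
def normalise_medians(raw_medians):
    """Rescale assessed medians m'_j so that they sum to one: m_j = m'_j / sum(m'_i)."""
    total = sum(raw_medians)
    if total <= 0:
        raise ValueError("assessed medians must have a positive sum")
    return [m / total for m in raw_medians]


# Hypothetical initial assessments for k = 4 categories (they sum to 1.10).
raw = [0.45, 0.30, 0.20, 0.15]
coherent = normalise_medians(raw)

print("suggested medians:", [round(m, 4) for m in coherent])
print("sum:", sum(coherent))
```

The rescaling preserves the ratios between the assessed medians, so the expert's relative ordering of the categories is unchanged.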
The main idea in Theorem 8.1 is that the fill-up category can be changed from the first category to any other category, and the same assumptions are still valid. We first give some relations and notations needed for the proof of the theorem.

Let $\underline{Y}_{(1)} = \underline{Y}^*$, and denote the mean vector and variance-covariance matrix of the multivariate normal distribution of $\underline{Y}_{(1)}$ by $\underline{\mu}_{(1)} = \underline{\mu}_{k-1}$ and $\Sigma_{(1)} = \Sigma_{k-1}$. We suppose $\underline{\mu}_{(1)}$ and $\Sigma_{(1)}$ have already been assessed. Moreover, let $Y_{1,1} = \log(p_1) - \log(1 - p_1)$, with $E_1 = E(Y_{1,1})$ and $V_1 = \mathrm{Var}(Y_{1,1})$.

To change the fill-up category from the first category to any other category $j$, for $j = 2, 3, \dots, k$, let
\[ \underline{Y}_{(j)} = \begin{pmatrix} Y_{1,j} \\ \vdots \\ Y_{j-1,j} \\ Y_{j+1,j} \\ \vdots \\ Y_{k,j} \end{pmatrix} = \begin{pmatrix} \log(p_1) - \log(p_j) \\ \vdots \\ \log(p_{j-1}) - \log(p_j) \\ \log(p_{j+1}) - \log(p_j) \\ \vdots \\ \log(p_k) - \log(p_j) \end{pmatrix}, \qquad Y_{j,j} = \log(p_j) - \log(1 - p_j), \]
with
\[ \underline{\mu}_{(j)} = E(\underline{Y}_{(j)}), \quad \Sigma_{(j)} = \mathrm{Var}(\underline{Y}_{(j)}), \quad E_j = E(Y_{j,j}), \quad V_j = \mathrm{Var}(Y_{j,j}), \]
and
\[ \mu_{i,j} = E(Y_{i,j}), \quad \sigma_{i,j}^2 = \mathrm{Var}(Y_{i,j}), \quad i, j = 1, 2, \dots, k, \; i \neq j. \]
It can easily be shown that, for $j = 2, \dots, k$,
\[ \underline{Y}_{(j)} = F_j\, \underline{Y}_{(j-1)}, \tag{8.13} \]
where $F_j$ is the identity matrix of degree $k-1$ with the $j$th column replaced by a column of $-1$. From the normality assumption of $\underline{Y}_{(1)}$, and in view of (8.13), we have
\[ \underline{Y}_{(j)} \sim \mathrm{MVN}(\underline{\mu}_{(j)}, \Sigma_{(j)}), \]
with
\[ \underline{\mu}_{(j)} = F_j\, \underline{\mu}_{(j-1)}, \qquad \Sigma_{(j)} = F_j\, \Sigma_{(j-1)} F_j'. \]
Approximate normality of each $Y_{j,j}$, for $j = 2, 3, \dots, k$, is thus induced from the normality assumption of $\underline{Y}_{(j)}$ in a manner exactly similar to that for $Y_{1,1}$. Hence, for each $j = 1, 2, \dots, k$, we can also assume that the $k$ random variables $Y_{i,j}$, for $i = 1, 2, \dots, k$, are multivariate normally distributed. Moreover, using the normality assumption of $Y_{1,1}$, we assume that the $k+1$ random variables $Y_{1,1}$ and $Y_{i,j}$, for $i = 1, 2, \dots, k$, are also multivariate normally distributed for each $j = 1, 2, \dots, k$.

Theorem 8.1. For any $j = 2, 3, \dots, k$, under the unit sum constraint of $\underline{p}$, and the multivariate normality of $\underline{Y}_{(j)}$,
\[ m_j = M(p_j) = M(p_j \mid p_1 = m_{1,0}^*) = m_{j,0}^*. \]
Proof Let
\[ m_{i,(j)} = M[\log(p_i) - \log(p_j)], \quad i = 1, 2, \dots, k, \; i \neq j, \]
then
\[ m_{i,(j)} = E(Y_{i,j}) = E(Y_{i,j} \mid Y_{j,j} = E_j) = M[\log(p_i) - \log(p_j) \mid p_j = M(p_j)]. \]
Hence, exponentiating both sides of the above relation, we get
\[ M[p_i \mid p_j = M(p_j)] = M(p_j) \exp(m_{i,(j)}). \tag{8.14} \]
As in Lemma 8.1, we put
\[ M(p_j) + \sum_{i \neq j} M[p_i \mid p_j = M(p_j)] = 1. \tag{8.15} \]
Solving (8.14) and (8.15) for $M(p_j)$, we get
\[ M(p_j) = \frac{1}{1 + \sum_{i \neq j} \exp(m_{i,(j)})}. \tag{8.16} \]
On the other hand, for $j \neq 1$, since
\[ \Pr\{p_j < m_{j,0}^* \mid p_1 = m_{1,0}^*\} = 0.5, \]
then
\[ \Pr\{(p_j/p_1) < (m_{j,0}^*/m_{1,0}^*) \mid p_1 = m_{1,0}^*\} = 0.5, \]
and
\[ \Pr\{\log(p_1/p_j) < \log(m_{1,0}^*/m_{j,0}^*) \mid Y_{1,1} = E_1\} = 0.5. \]
So, we can write
\[ \log(m_{1,0}^*/m_{j,0}^*) = M(Y_{1,j} \mid Y_{1,1} = E_1) = E(Y_{1,j} \mid Y_{1,1} = E_1) = E(Y_{1,j}) = m_{1,(j)}. \tag{8.17} \]
Moreover, for $j \neq i \neq 1$, since
\[ m_{i,(j)} = E(Y_{i,j} \mid Y_{1,1} = E_1, Y_{j,j} = E(Y_{j,j} \mid Y_{1,1} = E_1)) = M[\log(p_i/p_j) \mid p_1 = m_{1,0}^*, p_j = m_{j,0}^*], \]
we have that
\[ \Pr\{\log(p_i/p_j) < m_{i,(j)} \mid p_1 = m_{1,0}^*, p_j = m_{j,0}^*\} = 0.5, \]
and
\[ \Pr\{p_i < m_{j,0}^* \exp(m_{i,(j)}) \mid p_1 = m_{1,0}^*\} = 0.5. \]
So, $m_{i,0}^* = m_{j,0}^* \exp(m_{i,(j)})$, which gives
\[ m_{i,(j)} = \log(m_{i,0}^*/m_{j,0}^*). \tag{8.18} \]
Substituting (8.17) and (8.18) into (8.16) shows that $M(p_j)$ is as stated in Theorem 8.1.

8.4.1 Eliciting a mean vector

To elicit a mean vector $\underline{\mu}_k = (m_{1,0} \; m_{2,0} \; \dots \; m_{k,0})'$, we put
\[ m_{1,0} = E(Y_1) = M(Y_1) \tag{8.19} \]
\[ \phantom{m_{1,0}} = M(\log(p_1) - \log(1 - p_1)) = \log(m_{1,0}^*) - \log(1 - m_{1,0}^*). \tag{8.20} \]
For $i = 2, 3, \dots, k$, put
\[ m_{i,0} = E(Y_i) = E[Y_i \mid Y_1 = E(Y_1)] \tag{8.21} \]
\[ \phantom{m_{i,0}} = M(Y_i \mid Y_1 = m_{1,0}) = M[\log(p_i) - \log(p_1) \mid p_1 = m_{1,0}^*] = \log(m_{i,0}^*) - \log(m_{1,0}^*). \tag{8.22} \]

8.4.2 Eliciting a variance-covariance matrix

For $i = 1, 2, \dots, k-2$, and $j = i+1, \dots, k-1$, let
\[ m_{j,i} = E(Y_j \mid p_1 = m_{1,0}^* + \eta_1^*, p_2 = m_{2,1}^* + \eta_2^*, \dots, p_i = m_{i,i-1}^* + \eta_i^*). \]
Then
\[ m_{j,i} = \log\left(\frac{m_{j,i}^*}{m_{1,1}^*}\right). \tag{8.23} \]
For $i = 1, 2, \dots, k-2$, define $\eta_i$ by letting $\eta_i = Y_i - m_{i,i-1}$ when $p_i = m_{i,i-1}^* + \eta_i^*$. Then
\[ m_{j,i} = E(Y_j \mid Y_1 = m_{1,0} + \eta_1, Y_2 = m_{2,1} + \eta_2, \dots, Y_i = m_{i,i-1} + \eta_i), \tag{8.24} \]
and
\[ \eta_i = \begin{cases} \log\left(\dfrac{m_{1,0}^* + \eta_1^*}{1 - (m_{1,0}^* + \eta_1^*)}\right) - \log\left(\dfrac{m_{1,0}^*}{1 - m_{1,0}^*}\right), & \text{for } i = 1, \\[2ex] \log\left(\dfrac{m_{i,i-1}^* + \eta_i^*}{m_{1,1}^*}\right) - \log\left(\dfrac{m_{i,i-1}^*}{m_{1,1}^*}\right), & \text{for } i = 2, 3, \dots, k-2. \end{cases} \]
Analogous to $m_{i,i}^* = m_{i,i-1}^* + \eta_i^*$, define
\[ m_{i,i} = m_{i,i-1} + \eta_i, \quad \text{for } i = 1, 2, \dots, k-2, \tag{8.25} \]
so $Y_i = m_{i,i}$ when $p_i = m_{i,i}^*$.

For $i = 1, 2, \dots, k-2$, and $j = i+1, \dots, k-1$, analogous to $\theta_{j,i}^* = m_{j,i}^* - m_{j,i-1}^*$, define
\[ \theta_{j,i} = m_{j,i} - m_{j,i-1}, \]
so that
\[ \theta_{j,i} = \log\left(\frac{m_{j,i-1}^* + \theta_{j,i}^*}{m_{1,1}^*}\right) - \log\left(\frac{m_{j,i-1}^*}{m_{1,1}^*}\right). \]

To elicit a (singular) variance-covariance matrix $\Sigma_k$ of rank $k-1$, let
\[ V_1 = \mathrm{Var}(Y_1) = \left(\frac{U_1 - L_1}{1.349}\right)^2, \tag{8.26} \]
where $U_1$ and $L_1$ are the upper and lower quartile of $Y_1$, respectively. We have that
\[ U_1 = \log\left(\frac{U_1^*}{1 - U_1^*}\right), \qquad L_1 = \log\left(\frac{L_1^*}{1 - L_1^*}\right). \]
For $i = 1, 2, \dots, k-2$, and $j = i+1, \dots, k-1$, let
\[ V_{j,i} = \mathrm{Var}(Y_j \mid Y_1 = m_{1,0}, Y_2 = m_{2,0}, \dots, Y_i = m_{i,0}), \]
so that
\[ V_{j,j-1} = \left(\frac{U_j - L_j}{1.349}\right)^2, \quad \text{for } j = 2, 3, \dots, k-1, \tag{8.27} \]
with
\[ U_j = \log\left(\frac{U_j^*}{m_{1,0}^*}\right), \qquad L_j = \log\left(\frac{L_j^*}{m_{1,0}^*}\right). \]
Having defined the above quantities, we are ready to state and prove the following two lemmas.

Lemma 8.2. Under the assumptions of Lemma 8.1, for $i = 2, \dots, k-1$,

1. $(p_i \mid p_1 = m_{1,0}^*, p_2 = m_{2,0}^*, \dots, p_{i-1} = m_{i-1,0}^*) \sim \mathrm{Lognormal}(\mu_i^*, V_i^*)$, where
\[ \mu_i^* = m_{i,0} + \log(m_{1,0}^*) = \log(m_{i,0}^*), \qquad V_i^* = V_{i,i-1} = \left(\frac{U_i - L_i}{1.349}\right)^2. \]

2. $\Pr\left\{p_i > 1 - \sum_{j=1}^{i-1} m_{j,0}^*\right\} < \alpha$ if and only if
\[ \frac{U_i^*}{L_i^*} < \exp\left\{ \frac{1.349}{z_{1-\alpha}} \left[ \log\left(1 - \sum_{j=1}^{i-1} m_{j,0}^*\right) - \log(m_{i,0}^*) \right] \right\}, \]
where $z_\alpha$ is the $\alpha$ quantile of the standard normal distribution.

Proof From the normality of $\underline{Y}_k$ together with property (v) of the singular normal distribution in (Rao, 2002, p 522), we have
\[ (Y_i \mid Y_1 = m_{1,0}, \dots, Y_{i-1} = m_{i-1,0}) \sim N(m_{i,0}, V_{i,i-1}). \]
Then for known fixed $m_{1,0}^*$,
\[ (Y_i + \log(m_{1,0}^*) \mid Y_1 = m_{1,0}, \dots, Y_{i-1} = m_{i-1,0}) \sim N(m_{i,0} + \log(m_{1,0}^*), V_{i,i-1}). \]
The one-to-one transformations in (8.2) and (8.4) then give
\[ \left(\frac{p_i}{p_1}\, m_{1,0}^* \,\Big|\, p_1 = m_{1,0}^*, \dots, p_{i-1} = m_{i-1,0}^*\right) = (p_i \mid p_1 = m_{1,0}^*, \dots, p_{i-1} = m_{i-1,0}^*) \sim \mathrm{Lognormal}(m_{i,0} + \log(m_{1,0}^*), V_{i,i-1}). \]
Using equation (8.22), the first statement of the lemma is proved.
To prove the second statement, we use standard normal distribution theory and the first statement of this lemma to state that
\[ \Pr\left\{ \frac{\log(p_i) - \mu_i^*}{\sqrt{V_i^*}} > \frac{\log\left(1 - \sum_{j=1}^{i-1} m_{j,0}^*\right) - \mu_i^*}{\sqrt{V_i^*}} \right\} < \alpha \]
if and only if
\[ \frac{\log\left(1 - \sum_{j=1}^{i-1} m_{j,0}^*\right) - \mu_i^*}{\sqrt{V_i^*}} > z_{1-\alpha}, \]
or, equivalently, if and only if
\[ \sqrt{V_i^*} < \frac{\log\left(1 - \sum_{j=1}^{i-1} m_{j,0}^*\right) - \mu_i^*}{z_{1-\alpha}}. \]
This proves the second statement.

Lemma 8.3. Under the assumptions of Lemma 8.1,
\[ \sum_{j=1}^{i} m_{j,j}^* + \sum_{j=i+1}^{k} m_{j,i}^* = 1, \quad i = 1, 2, \dots, k-2. \]

Proof Using equation (8a.2.11) of (Rao, 2002, p 522), for $i = 1, 2, \dots, k-2$, we can state that
\[ E[Y_k \mid Y_1 = m_{1,1}, \dots, Y_i = m_{i,i}] = E[Y_k \mid Y_1 = m_{1,1}, \dots, Y_i = m_{i,i}, Y_{i+1} = E(Y_{i+1} \mid Y_1 = m_{1,1}, \dots, Y_i = m_{i,i}), \dots, Y_{k-1} = E(Y_{k-1} \mid Y_1 = m_{1,1}, \dots, Y_i = m_{i,i})]. \]
Then, from definitions (8.24) and (8.25),
\[ M(Y_k \mid Y_1 = m_{1,1}, \dots, Y_i = m_{i,i}) = M(Y_k \mid Y_1 = m_{1,1}, \dots, Y_i = m_{i,i}, Y_{i+1} = m_{i+1,i}, \dots, Y_{k-1} = m_{k-1,i}). \]
Hence
\[ M[\log(p_k) - \log(p_1) \mid p_1 = m_{1,1}^*, \dots, p_i = m_{i,i}^*] = M[\log(p_k) - \log(p_1) \mid p_1 = m_{1,1}^*, \dots, p_i = m_{i,i}^*, p_{i+1} = m_{i+1,i}^*, \dots, p_{k-1} = m_{k-1,i}^*], \]
which, utilizing equations (8.9) and (8.10), gives
\[ \log(m_{k,i}^*) - \log(m_{1,1}^*) = \log[M(p_k \mid p_1 = m_{1,1}^*, \dots, p_i = m_{i,i}^*, p_{i+1} = m_{i+1,i}^*, \dots, p_{k-1} = m_{k-1,i}^*)] - \log(m_{1,1}^*), \]
i.e.
\[ m_{k,i}^* = M(p_k \mid p_1 = m_{1,1}^*, \dots, p_i = m_{i,i}^*, p_{i+1} = m_{i+1,i}^*, \dots, p_{k-1} = m_{k-1,i}^*). \]
Since the condition in the RHS of the above equation relates to all $p$s except $p_k$, applying the unit sum constraint gives the conditional median in the form of the following complement:
\[ m_{k,i}^* = 1 - \sum_{j=1}^{i} m_{j,j}^* - \sum_{j=i+1}^{k-1} m_{j,i}^*, \]
which ends the proof of Lemma 8.3.

Now, we modify the method of Kadane et al. (1980) to show that the quantities in (8.24)-(8.27) are sufficient to elicit a positive-definite variance-covariance matrix $V_{k-1}$ for $\underline{Y}_{k-1} = (Y_1, \dots, Y_{k-1})'$. Then, based on the condition of $\sum_{i=1}^{k} p_i = 1$, and assuming that it is the only constraint on sums of these probabilities, we add a $k$th row and column to get $\Sigma_k$ as a singular variance-covariance matrix for all the elements of $\underline{Y}_k$.
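Removing the first row and column of $\Sigma_k$ will lead to the desired positive-definite variance-covariance matrix $\Sigma_{k-1}$ of $\underline{Y}^*$.

For $i = 1, 2, \dots, k-1$, let $\underline{Y}_i = (Y_1, \dots, Y_i)'$ and $V_i = \mathrm{Var}(\underline{Y}_i)$, with $V_1$ as defined in (8.26). Suppose that $V_{i-1}$ has been estimated as a positive-definite matrix. We aim now to elicit $V_i$ and investigate its positive-definiteness. $V_i$ can be partitioned as
\[
V_i = \begin{pmatrix} V_{i-1} & V_{i-1}\underline{u}_i \\ \underline{u}_i' V_{i-1} & \sigma_i^2 \end{pmatrix}, \tag{8.28}
\]
where $V_{i-1}\underline{u}_i = \mathrm{Cov}(\underline{Y}_{i-1}, Y_i)$, and $\sigma_i^2 = \mathrm{Var}(Y_i)$.

It is well-known from multivariate normal distribution theory that
\[
E(Y_i \mid \underline{Y}_{i-1}) - E(Y_i) = [\underline{Y}_{i-1} - E(\underline{Y}_{i-1})]' V_{i-1}^{-1} V_{i-1} \underline{u}_i = [\underline{Y}_{i-1} - E(\underline{Y}_{i-1})]' \underline{u}_i. \tag{8.29}
\]
Moreover, for $j < i-1$, taking the conditional expectation of both sides of (8.29), given that $\underline{Y}_j = \underline{y}_j = (m_{1,0} + \eta_1,\ m_{2,1} + \eta_2,\ \dots,\ m_{j,j-1} + \eta_j)'$, gives
\[
E[E(Y_i \mid \underline{Y}_{i-1}) \mid \underline{Y}_j = \underline{y}_j] - E(Y_i) = \{E[\underline{Y}_{i-1} \mid \underline{Y}_j = \underline{y}_j] - E(\underline{Y}_{i-1})\}' \underline{u}_i, \tag{8.30}
\]
i.e.
\[
E(Y_i \mid \underline{Y}_j = \underline{y}_j) - E(Y_i) = \big(y_1 - E(Y_1),\ y_2 - E(Y_2),\ \dots,\ y_j - E(Y_j),\ E(Y_{j+1} \mid \underline{Y}_j) - E(Y_{j+1}),\ \dots,\ E(Y_{i-1} \mid \underline{Y}_j) - E(Y_{i-1})\big)\, \underline{u}_i. \tag{8.31}
\]
From (8.24) and (8.31) we get
\[
m_{i,j} - m_{i,0} = \big(\eta_1,\ m_{2,1} - m_{2,0} + \eta_2,\ \dots,\ m_{j,j-1} - m_{j,0} + \eta_j,\ m_{j+1,j} - m_{j+1,0},\ \dots,\ m_{i-1,j} - m_{i-1,0}\big)\, \underline{u}_i.
\]
This holds for $j = 1, 2, \dots, i-1$, so we have a system of $i-1$ equations of the form
\[
\underline{T}_i = Q_{i-1} \underline{u}_i, \tag{8.32}
\]
where
\[
\underline{T}_i = \begin{pmatrix} m_{i,1} - m_{i,0} \\ m_{i,2} - m_{i,0} \\ \vdots \\ m_{i,i-1} - m_{i,0} \end{pmatrix}
\]
and
\[
Q_{i-1} = \begin{pmatrix}
\eta_1 & m_{2,1} - m_{2,0} & m_{3,1} - m_{3,0} & \cdots & m_{i-1,1} - m_{i-1,0} \\
\eta_1 & m_{2,1} - m_{2,0} + \eta_2 & m_{3,2} - m_{3,0} & \cdots & m_{i-1,2} - m_{i-1,0} \\
\eta_1 & m_{2,1} - m_{2,0} + \eta_2 & m_{3,2} - m_{3,0} + \eta_3 & \cdots & m_{i-1,3} - m_{i-1,0} \\
\vdots & \vdots & \vdots & & \vdots \\
\eta_1 & m_{2,1} - m_{2,0} + \eta_2 & m_{3,2} - m_{3,0} + \eta_3 & \cdots & m_{i-1,i-2} - m_{i-1,0} + \eta_{i-1}
\end{pmatrix}.
\]
Since $m_{i,j} - m_{i,j-1} = \theta_{i,j}$, $j = 1, 2, \dots, i-1$, multiplying both sides of (8.32) from the left by the matrix
\[
M_{i-1} = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
-1 & 1 & 0 & \cdots & 0 \\
0 & -1 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & -1 & 1
\end{pmatrix}
\]
gives
\[
\begin{pmatrix} \theta_{i,1} \\ \theta_{i,2} \\ \vdots \\ \theta_{i,i-1} \end{pmatrix} =
\begin{pmatrix}
\eta_1 & \theta_{2,1} & \cdots & \theta_{i-1,1} \\
0 & \eta_2 & \cdots & \theta_{i-1,2} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \eta_{i-1}
\end{pmatrix} \underline{u}_i.
\]
Provided that $\eta_j \neq 0$, $j = 1, 2, \dots, i-1$, the upper triangular matrix $M_{i-1} Q_{i-1}$ is non-singular and hence
\[
\underline{u}_i =
\begin{pmatrix}
\eta_1 & \theta_{2,1} & \cdots & \theta_{i-1,1} \\
0 & \eta_2 & \cdots & \theta_{i-1,2} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \eta_{i-1}
\end{pmatrix}^{-1}
\begin{pmatrix} \theta_{i,1} \\ \theta_{i,2} \\ \vdots \\ \theta_{i,i-1} \end{pmatrix}.
\]
Since
\[
\mathrm{Var}(Y_i \mid \underline{Y}_{i-1}) = \mathrm{Var}(Y_i) - \underline{u}_i' V_{i-1} \underline{u}_i,
\]
we can now use the assessed conditional variance given by $V_{i,i-1}$ in (8.27) to estimate the unconditional variance of $Y_i$ as follows:
\[
\sigma_i^2 = V_{i,i-1} + \underline{u}_i' V_{i-1} \underline{u}_i.
\]
Using the Schur complement, the matrix $V_i$ is positive-definite if and only if
\[
\sigma_i^2 - \underline{u}_i' V_{i-1} \underline{u}_i > 0,
\]
which is guaranteed from (8.27) since $V_{i,i-1} > 0$.

Choosing the arbitrary values $\eta^*_j \neq 0$, $j = 1, 2, \dots, i-1$, guarantees the existence of a unique solution for $\underline{u}_i$. It can be seen from the relation
\[
\eta_j =
\begin{cases}
\log\left(\dfrac{m^*_{1,0}+\eta^*_1}{1-(m^*_{1,0}+\eta^*_1)}\right) - \log\left(\dfrac{m^*_{1,0}}{1-m^*_{1,0}}\right), & \text{for } j = 1, \\[2ex]
\log\left(\dfrac{m^*_{j,j-1}+\eta^*_j}{m^*_{1,1}}\right) - \log\left(\dfrac{m^*_{j,j-1}}{m^*_{1,1}}\right), & \text{for } j = 2, 3, \dots, i-1,
\end{cases}
\]
that $\eta_j = 0$ if and only if $\eta^*_j = 0$, $j = 1, 2, \dots, i-1$.

So far, the proposed method estimates $V_i$ as a positive-definite matrix, assuming that $V_{i-1}$ is positive-definite. Since $V_1 > 0$, the method yields a positive-definite matrix $V_{k-1}$, by mathematical induction.

Estimating the last row and column of $\Sigma_k$

Let $\Sigma_k$ be partitioned as follows
\[
\Sigma_k = \begin{pmatrix} V_{k-1} & V_{k-1}\underline{u}_k \\ \underline{u}_k' V_{k-1} & \sigma_k^2 \end{pmatrix}, \tag{8.33}
\]
where $V_{k-1}\underline{u}_k = \mathrm{Cov}(\underline{Y}_{k-1}, Y_k)$, and $\sigma_k^2 = \mathrm{Var}(Y_k)$.

Note that, according to the condition that elements of $\underline{p}$ must sum to one, the conditional variance of $Y_k$, given any specific value for $\underline{Y}_{k-1}$, has a fixed value of zero. Hence, using the standard theory of the multivariate normal distribution, we estimate $\sigma_k^2$ as
\[
\sigma_k^2 = \underline{u}_k' V_{k-1} \underline{u}_k.
\]
To estimate $\underline{u}_k$ we write, as in (8.29),
\[
E(Y_k \mid \underline{Y}_{k-1}) - E(Y_k) = [\underline{Y}_{k-1} - E(\underline{Y}_{k-1})]' \underline{u}_k. \tag{8.34}
\]
Exploiting the condition that $\sum_{i=1}^{k} p_i = 1$, we can obtain $k-1$ estimates of $E(Y_k \mid \underline{Y}_{k-1})$ from $k-1$ different sets of conditioning values for $\underline{Y}_{k-1}$.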
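One induction step of the construction of $V_i$ from $V_{i-1}$ can be sketched as follows: solve an upper triangular system for $\underline{u}_i$ by back-substitution, then add the assessed conditional variance to $\underline{u}_i' V_{i-1} \underline{u}_i$ to obtain $\sigma_i^2$. The function names and the numerical inputs are hypothetical, chosen only to illustrate the algebra, and the code is not taken from the PEGS software.

```python
def solve_upper_triangular(A, b):
    """Solve A x = b by back-substitution, A upper triangular with non-zero diagonal."""
    n = len(b)
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        if A[r][r] == 0:
            raise ValueError("zero diagonal element: every eta_j must be non-zero")
        tail = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - tail) / A[r][r]
    return x

def extend_covariance(V_prev, u, cond_var):
    """Border V_prev with the column V_prev u and the variance cond_var + u' V_prev u."""
    n = len(u)
    cov = [sum(V_prev[r][c] * u[c] for c in range(n)) for r in range(n)]
    sigma2 = cond_var + sum(u[r] * cov[r] for r in range(n))
    V_new = [list(V_prev[r]) + [cov[r]] for r in range(n)]
    V_new.append(cov + [sigma2])
    return V_new

# Hypothetical inputs: V_1 = 0.33, eta_1 = 0.5, theta_{2,1} = -0.3, V_{2,1} = 0.26.
V1 = [[0.33]]
u2 = solve_upper_triangular([[0.5]], [-0.3])
V2 = extend_covariance(V1, u2, 0.26)
det_V2 = V2[0][0] * V2[1][1] - V2[0][1] * V2[1][0]
print(u2, V2, det_V2)
```

In this two-dimensional case the determinant of the bordered matrix equals $V_1$ times the conditional variance, which is the Schur complement argument in miniature: a strictly positive conditional variance keeps the extended matrix positive-definite.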
More preciously, let m k>o = E[Yk \Yi = m fc,i = E \Y k \Yi m i )0, Y2 = - m k)i = E[Yk \Yi = • • • , Yk- i m i , i , Y2 , , Y2= 7771 1 m 2,o, 1 >0], rn2)0, ■■ • = , , • • • , T i- 7772 2 = m k- 1 = 777 Yi+1 = 777i+i,i, • • • , Yfc_2 = ; _ i , ; _ i , Yi , Y fc- i = 777 = ^ U fc-i.o ], *,*—! , = "7fc-l,fc-l], for i = 2,3, • • • , k —2, Wlfc.jfe-l = £?[Yfc|Yi = 7771,1,^2 = 7^2,2, • • • , Yfc-2 = ^fc-2,fc-2, ^fc-1 = ^fc-l,fc-l], where m k- i ik- \ is an arbitrary value, which will be chosen such th a t l,fc—1 7^ ^fc—1,0* We require 777.fc—i,fc_i 7 ^ ^ fc - 1,0 ln order to solve the resulting system of equations, as will be shown later. This gives the system of k — 1 equations, Tfc = Q k- \ u k, 235 (8.35) where 1 m k ,0 2 rnk,Q Tk = 'W'k^k—1 'm,k,Q m 0 0 0 ••• 0 m m 2 ,l m 3,2 ''’ m 'k-2,k-3 m k-l,k-l m ™ 2 ,2 m 3,2 ■ ■• m 'k-2,k-3 m 'k-l,k-l m m 2,2 m 3,3 m k —2 , k —3 m k-l,k-l m 'k-2,k-2 m 'k-l,k-l Q k —1 — m m 2,2 r r iij - rrn t m 3,3 and m[ j = i = 2, 3, • • • , k - 1, o, j = i - 1, i. We multiply both sides of (8.35) from the left by the m atrix M k~i, which has a different structure from (i < k), taking the form 1 M k -1 — 0 0 0 -1 1 0 0 0 0 -1 0 0 1 0 0 0 -1 The system of equations can then be w ritten as n^k, i rrikfi m . 
m k,3 — m k,2 0 T]2 0 (8.36) 'nr1/k,k—1 ^ k ,0 mkjk —2 TYlk,k—l 0 -m ~ m 2,2 236 0 Vk- 2 ••• ~ m k - 2 ,k- 2 Vk- 1_ where ™>iti = rrii,i ~ m , o, Vk-i — m k - i= 2 ,3, • • • , k — 2 , i,o - Provided th a t rij t^O, j = l , 2 , k — 1, the lower triangular m atrix M k - i Q k - i is non-singular and hence - l r mk, i m 0 -| p 0 T)2 m k ,o m k fi ~ m k ,2 U.k = 0 0 -m — m m k , k —i V k-2 - m k -2 ,k -2 2 ,2 r]k-i m k ,o m k , k —2 m k , k —l P o sitiv e -d e fin iten ess o f th e variance-covariance m atrix As mentioned before, the inverse of the additive logistic transform ation is applied to the k dimension random vector p, transform ing it into the k — 1 dimension random vector Y_* = (y^ Y s , • • • , Pfc)* We are interested in the hyperparam eter T,k-i as this is the variance- covariance m atrix of Y_*. Although the whole m atrix is clearly a singular m atrix, we will show th a t the subm atrix £&_i is sure to be a positive-definite m atrix, provided th a t no subset of categories has a known fixed sum of probabilities. Consider the following partition of the singular m ultivariate normally distributed Y_k: log(pi) - log(l - Pi) 1” Fi y2 y3 = Yh- i Yk logfe) ~ log(pi) log(p3) - log(pi) log(pfc_i) - log(pi) log (P k ) 237 - log(pi) .* L y'** "n Recall th at, by definition, Y2 ' y3 y* y ** __ "Yk ' Yk- i "n" Let Sfc be conformally partitioned as a' i b V! a V*\ c Sfc = b "bi b 1 where V* is a (k —2) x (A; —2) square m atrix, a and c are (/c - 2) x 1 vectors, V\ ,cr| and b are scalars. Y\ a' a v* The m ethod we used to estim ate Vk-\ = guarantees its positive-definiteness, hence V* is also positive-definite. The m atrix Efc_i is then partitioned as ~v* d For Efc_i to be positive-definite, we must show th at 4 > e n v y 's . In fact, using the inverse of a partitioned m atrix, and for d = V\ — -l a A c * Vi. 
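\[
\begin{pmatrix} V_1 & \underline{a}' \\ \underline{a} & V^* \end{pmatrix}^{-1} =
\begin{pmatrix}
d^{-1} & -d^{-1}\underline{a}'(V^*)^{-1} \\
-(V^*)^{-1}\underline{a}\,d^{-1} & (V^*)^{-1} + (V^*)^{-1}\underline{a}\,d^{-1}\underline{a}'(V^*)^{-1}
\end{pmatrix},
\]
we may write
\[
\sigma_k^2 = (b,\ \underline{c}') \begin{pmatrix} V_1 & \underline{a}' \\ \underline{a} & V^* \end{pmatrix}^{-1} \begin{pmatrix} b \\ \underline{c} \end{pmatrix}
= \underline{c}'(V^*)^{-1}\underline{c} + \frac{1}{d}\left\{b^2 - 2b[\underline{a}'(V^*)^{-1}\underline{c}] + [\underline{c}'(V^*)^{-1}\underline{a}][\underline{a}'(V^*)^{-1}\underline{c}]\right\}
= \underline{c}'(V^*)^{-1}\underline{c} + \frac{1}{d}\left[b - \underline{a}'(V^*)^{-1}\underline{c}\right]^2.
\]
So, $\Sigma_{k-1}$ is positive-definite if and only if
\[
b - \underline{a}'(V^*)^{-1}\underline{c} \neq 0. \tag{8.37}
\]
The method used to estimate $\Sigma_k$ automatically guarantees the fulfilment of such a condition. In fact, using the following partition of $\underline{u}_k$,
\[
\underline{u}_k = \begin{pmatrix} u_1 \\ \underline{u}_2 \end{pmatrix}, \qquad \underline{u}_2 = (u_2, u_3, \dots, u_{k-1})',
\]
gives
\[
\begin{pmatrix} b \\ \underline{c} \end{pmatrix} = V_{k-1}\underline{u}_k = \begin{pmatrix} V_1 & \underline{a}' \\ \underline{a} & V^* \end{pmatrix} \begin{pmatrix} u_1 \\ \underline{u}_2 \end{pmatrix} = \begin{pmatrix} V_1 u_1 + \underline{a}'\underline{u}_2 \\ \underline{a}\,u_1 + V^*\underline{u}_2 \end{pmatrix}.
\]
Condition (8.37) thus holds if and only if
\[
[V_1 - \underline{a}'(V^*)^{-1}\underline{a}]\, u_1 \neq 0.
\]
But $V_1 - \underline{a}'(V^*)^{-1}\underline{a} > 0$ from the positive-definiteness of $V_{k-1}$, and hence $\Sigma_{k-1}$ is positive-definite if and only if $u_1 \neq 0$. It can be seen from (8.36) that
\[
u_1 = \frac{m_{k,1} - m_{k,0}}{\eta_1}.
\]
This condition is sure to be fulfilled since
\[
m_{k,0} = \log\left(\frac{1 - \sum_{j=1}^{k-1} m^*_{j,0}}{m^*_{1,0}}\right) \quad \text{and} \quad m_{k,1} = \log\left(\frac{1 - m^*_{1,1} - \sum_{j=2}^{k-1} m^*_{j,0}}{m^*_{1,1}}\right),
\]
from which $m_{k,1} \neq m_{k,0}$ unless $m^*_{1,1} = m^*_{1,0}$, which can never occur since $\eta^*_1 \neq 0$.

So, the proposed method for eliciting the matrix $\Sigma_k$ ensures that $\Sigma_{k-1}$ is positive-definite, even though $\Sigma_k$ is itself singular. Once $\underline{\mu}_k$ and $\Sigma_k$ have been estimated, equations (8.6)-(8.8) give the hyperparameters $\underline{\mu}_{k-1}$ and $\Sigma_{k-1}$ of the logistic normal prior distribution of $\underline{p}$ based on the normalizing transformations given by $\underline{Y}^*$.

8.5 Feedback using marginal quartiles of the logistic normal prior

After eliciting the mean vector $\underline{\mu}_{k-1}$ and the variance-covariance matrix $\Sigma_{k-1}$ of $\underline{Y}^*$, the software calculates marginal medians and quartiles of the probability of each category and displays their values as feedback to the expert. Since the initially assessed quartiles were all conditional, it is useful to inform the expert of the marginal quartiles and give her the option of changing them if she wants. To add this feedback option to the software, we had to develop a reliable technique for estimating marginal quartiles from the elicited hyperparameters $\underline{\mu}_{k-1}$ and $\Sigma_{k-1}$.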
Moreover, we must correspondingly modify the elicited hyperparameters once the marginal quartiles have been changed by the expert during the feedback stage.

A simple direct method for estimating the marginal moments, or quartiles, of the logistic normal distribution in closed form does not seem to exist in the literature. Aitchison (1986) suggested using Hermitian numerical integration methods to obtain marginal moments. However, he argued that the main practical interest is in the ratio of components, not in the components themselves. This is not the case here, as we are mainly interested in marginal probabilities, not in their ratios. Another approach, based on the Gibbs sampling technique, has been used by Forster and Skene (1994) to accurately approximate the posterior marginal densities and other summaries for a broad class of prior distributions including the Dirichlet and logistic normal distributions. However, the method approximates the marginal densities of the posterior distribution rather than the prior distribution.

Under the normality assumption of $\underline{Y}^*$ and the unit sum constraint, it has been proved in Theorem 8.1 that the marginal unconditional medians of $p_j$, $m^*_j$, are equal to their conditional medians, $m^*_{j,0}$, for $j = 1, 2, \dots, k$. Moreover, the same assumptions make it possible to estimate marginal lower and upper quartiles for each $p_j$, for $j = 1, 2, \dots, k$. In the following lemma we formally state and prove the above results. Then, we propose a method of revising the estimates of $\underline{\mu}_{k-1}$ and $\Sigma_{k-1}$ to reflect any change made by the expert to the marginal quartiles.

Lemma 8.4. For any $j = 1, 2, \dots, k$, under the assumptions of Theorem 8.1, and $V_j$ is guaranteed to be strictly greater than zero.
Proof. Since $Y_{i,j} \sim N(\mu_{i,j}, \sigma^2_{i,j})$, with known $\mu_{i,j}$, $\sigma^2_{i,j}$, the expected value of the lognormal distribution of $(p_i/p_j)$ is given by
\[
E\left(\frac{p_i}{p_j}\right) = \exp\left(\mu_{i,j} + \frac{1}{2}\sigma^2_{i,j}\right).
\]
On the other hand, by the assumption of approximate normality for $Y_{j,j}$, we have
\[
\log\left(\frac{p_j}{1 - p_j}\right) \sim N(E_j, V_j), \quad \text{so} \quad \log\left(\frac{1 - p_j}{p_j}\right) \sim N(-E_j, V_j),
\]
and
\[
M_j = E\left[\frac{1 - p_j}{p_j}\right] = \exp\left(-E_j + \frac{1}{2}V_j\right). \tag{8.39}
\]
We take $M_j$ as in (8.38), and Theorem 8.1 gives
\[
E_j = \log\left(\frac{m^*_{j,0}}{1 - m^*_{j,0}}\right). \tag{8.40}
\]
Equation (8.39) can be solved for $V_j$ to give the first statement of Lemma 8.4.

Substituting $m^*_{j,0}$ for $M(p_j)$ in equation (8.16) and putting $\mu_i = \mu_{i,j}$, gives
\[
E_j = -\log\left(\sum_{i \neq j} \exp(\mu_{i,j})\right). \tag{8.41}
\]
This guarantees that $V_j > 0$ in (8.39), since by comparing the RHSs of (8.38) and (8.41), we can see clearly that $M_j > \exp(-E_j)$. This ends the proof of Lemma 8.4.

The two unconditional quartiles of $p_j$ can be obtained from
\[
Q_1(p_j) = \frac{\exp[Q_1(Y_{j,j})]}{1 + \exp[Q_1(Y_{j,j})]} \quad \text{and} \quad Q_3(p_j) = \frac{\exp[Q_3(Y_{j,j})]}{1 + \exp[Q_3(Y_{j,j})]},
\]
with
\[
Q_1(Y_{j,j}) = E_j + \sqrt{V_j}\,\Phi^{-1}(0.25), \qquad Q_3(Y_{j,j}) = E_j + \sqrt{V_j}\,\Phi^{-1}(0.75),
\]
where $\Phi$ is the cdf of the standard normal distribution.

The unconditional quartiles $Q_1(p_j)$ and $Q_3(p_j)$ are presented to the expert as feedback with the unconditional median $M(p_j)$, for $j = 1, 2, \dots, k$. The expert has the option of changing any of the unconditional medians and/or quartiles. The changes are reflected in estimates of the hyperparameters $\underline{\mu}_{k-1}$ and $\Sigma_{k-1}$, using the following approach.

• Let $m'(p_j)$ denote the values of $M(p_j)$ after re-assessment ($j = 1, 2, \dots, k$). We revise $\mu_{j,1}$ to
\[
\mu^*_{j,1} = E^*(Y_{j,1}) =
\begin{cases}
\log(m^*(p_j)) - \log(1 - m^*(p_j)), & \text{for } j = 1, \\
\log(m^*(p_j)) - \log(m^*(p_1)), & \text{for } j = 2, \dots, k,
\end{cases}
\]
with a new normalized set of medians $m^*(p_j)$, where
\[
m^*(p_j) = \frac{m'(p_j)}{\sum_{i=1}^{k} m'(p_i)}, \quad j = 1, 2, \dots, k.
\]

• Suppose one or more of the marginal unconditional quartiles $Q_1(p_j)$ and/or $Q_3(p_j)$ are re-assessed as $Q'_1(p_j)$ and/or $Q'_3(p_j)$, respectively, for $j = 1, \dots, k$.
Then we change the variance-covariance matrix $\Sigma_{(1)}$ to
\[
\Sigma^*_{(1)} = \mathrm{Var}^*(\underline{Y}_{(1)}) = D\,\Sigma_{(1)}\,D, \tag{8.42}
\]
where $D$ is a diagonal matrix with diagonal elements
\[
d_i = \frac{\sigma^*_{i1}}{\sigma_{i1}}, \quad i = 2, 3, \dots, k,
\]
and $\sigma^{*2}_{i1}$ is defined by
\[
\sigma^{*2}_{i1} = \mathrm{Var}^*(\log(p_i) - \log(p_1)) = \mathrm{Var}^*(\log(p_i)) + \mathrm{Var}^*(\log(p_1)) - 2\,\mathrm{Cov}^*(\log(p_i), \log(p_1)). \tag{8.43}
\]
The modified variances and covariances, $\mathrm{Var}^*$ and $\mathrm{Cov}^*$, respectively, are determined as follows.

As $Y_{j,j}$ is assumed to have an approximate normal distribution, let
\[
V'_j = \mathrm{Var}\left[\log\left(\frac{p_j}{1 - p_j}\right)\right] = \left\{\frac{1}{1.349}\left[\log\left(\frac{Q'_3(p_j)}{1 - Q'_3(p_j)}\right) - \log\left(\frac{Q'_1(p_j)}{1 - Q'_1(p_j)}\right)\right]\right\}^2,
\]
so $Y_{j,j} \sim N(E_j, V'_j)$. Using a simple numerical integration technique on the normal pdf of $Y_{j,j}$, we can get the expectations, for $j = 1, 2, \dots, k$, in the RHS of the following equation,
\[
\mathrm{Var}^*(\log(p_j)) = E\left\{\left[\log\left(\frac{\exp(Y_{j,j})}{1 + \exp(Y_{j,j})}\right)\right]^2\right\} - \left\{E\left[\log\left(\frac{\exp(Y_{j,j})}{1 + \exp(Y_{j,j})}\right)\right]\right\}^2.
\]
To attain a strictly positive value of $\sigma^{*2}_{i1}$ as in (8.43), we modify $\mathrm{Cov}(\log(p_i), \log(p_1))$ by putting
\[
\mathrm{Cov}^*(\log(p_i), \log(p_1)) = w_i\,\mathrm{Cov}(\log(p_i), \log(p_1)), \quad i = 2, 3, \dots, k,
\]
where
\[
w_i = \sqrt{\frac{\mathrm{Var}^*(\log(p_i))\,\mathrm{Var}^*(\log(p_1))}{\mathrm{Var}(\log(p_i))\,\mathrm{Var}(\log(p_1))}}, \quad i = 2, 3, \dots, k.
\]
In (8.42) we use the diagonal matrix, $D$, so as to change the variances of $\underline{Y}_{(1)}$, while preserving correlations and also preserving the positive-definiteness of $\Sigma^*_{(1)}$.

Another feedback window is available on request for the expert, should she need to see the influence of changing one or more of the marginal quartile values. If this option is taken and further re-assessment made, then the method given in Lemma 8.4 is applied again on the modified matrix $\Sigma^*_{(1)}$, to give a new set of marginal quartiles. These can be changed again by the expert if she does not find them a satisfactory representation of her opinion. We should mention that the new set of marginal quartiles does not necessarily have the same values as the modified quartiles.
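The two computations of this section, marginal quartiles of $p_j$ from a normal distribution on the logit scale, and rescaling a covariance matrix by a diagonal matrix as in (8.42), can be sketched as follows. The function names are hypothetical and the inputs are illustrative: the $2 \times 2$ matrix is the leading block of Table 8.2, and the scale factors are made up. This is a sketch under those assumptions, not the PEGS implementation.

```python
import math
from statistics import NormalDist

def expit(y):
    """Inverse logit: exp(y) / (1 + exp(y))."""
    return 1.0 / (1.0 + math.exp(-y))

def marginal_quartiles(E_j, V_j):
    """Lower quartile, median and upper quartile of p_j when logit(p_j) ~ N(E_j, V_j)."""
    z = NormalDist().inv_cdf(0.75)          # about 0.6745; inv_cdf(0.25) is -z
    sd = math.sqrt(V_j)
    return expit(E_j - z * sd), expit(E_j), expit(E_j + z * sd)

def rescale_covariance(Sigma, d):
    """Return D Sigma D for D = diag(d); variances change, correlations do not."""
    n = len(d)
    return [[d[r] * Sigma[r][c] * d[c] for c in range(n)] for r in range(n)]

def correlation(S, r, c):
    return S[r][c] / math.sqrt(S[r][r] * S[c][c])

# Illustrative inputs: median 0.49 on the probability scale, variance 0.327 on the logit scale.
E1 = math.log(0.49 / 0.51)
q1, med, q3 = marginal_quartiles(E1, 0.327)

Sigma = [[0.3414, 0.1511], [0.1511, 0.9087]]   # leading 2 x 2 block of Table 8.2
Sigma_star = rescale_covariance(Sigma, [1.2, 0.8])   # hypothetical scale factors
print(q1, med, q3)
print(correlation(Sigma, 0, 1), correlation(Sigma_star, 0, 1))
```

Because $D \Sigma D$ is a congruence transformation with a non-singular diagonal matrix, it keeps a positive-definite matrix positive-definite, which is the property relied on in the text.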
The unit sum condition of $\underline{p}$, with the normality assumption of each $Y_{j,j}$, for $j = 1, 2, \dots, k$, always forces the marginal interquartile range for a single probability to partly depend on the other probabilities, as shown in Lemma 8.4. Hence, for mathematical coherence, the resulting set of marginal quartiles will not correspond exactly to the expert's assessments. The proposed approach that uses Lemma 8.4 and continuous feedback enables the expert to adjust the quartiles until she is happy with the feedback values.

8.6 Example: Transport preferences

In designing transport systems for the future, one ingredient is the relative importance of factors a person may consider in selecting the mode of transport for different journeys. Estimates of these preferences help in planning rail services, roads and other transport infrastructure. Such estimates are also of interest from the environmental point of view, because of the impact of transport emissions.

For a preparatory environmental study, estimates about factors affecting transport preferences in 2020 were needed. In this example, a transport expert quantified his opinion about the factors affecting the choice of transport for a hundred mile journey across the UK in that year. Primary interests of the expert (Dr. James Warren, The Open University) include modelling energy and emissions to gain a better understanding of transport systems and the potential effects of transportation policy and technology on the environment. He specified five quantities as the main factors a passenger would consider in choosing the means of transport for such a journey. These factors are: cost, journey time, environmental impact, comfort, and convenience.

Interest focuses on the relative frequency with which each of these quantities is the most important factor: For what proportion of people would cost be the most important factor in choosing the mode of transport for the journey?
For what proportion would it be journey time? And so on. The problem can thus be described as a multinomial model with five categories, one for each factor. Our method and PEGS-Logistic software were used by the expert to quantify his opinion about a logistic normal prior for the parameters of this multinomial model.

After initializing the software and defining the model, the expert assessed his medians of the proportion of people for whom Cost/ Time/ EcoImpact/ Comfort/ Convenience would be the most important factor. These median assessments were 0.61, 0.25, 0.04, 0.06, 0.10, respectively, and they are the blue bars in Figure 8.4. These values do not sum to 1 and the software suggested values (yellow bars) that did. Rather than accepting these suggestions, the expert revised his initial median assessments to be 0.49, 0.28, 0.04, 0.06, 0.11, respectively. As their sum is nearly equal to one, the medians suggested next were very close to his assessments and the expert accepted them as representative of his opinions.

[Figure 8.4: Software suggestions for initial medians. Software window "Eliciting Medians of Probabilities for Each Category", with one bar per category: Cost, Time, EcoImpact, Comfort, Convenience.]

The expert then gave his assessed upper and lower quartile values for the probability of the first category; these were 0.62 and 0.43 respectively. Then conditioning on his assessed medians for previous categories, he assessed his conditional quartile values.
The four conditional lower quartiles were 0.18, 0.03, 0.03, 0.10, respectively, while the four conditional upper quartiles were 0.36, 0.10, 0.08, 0.15, respectively. See Figure 8.5, in which the expert has given his two quartiles of the fourth category conditional on the probabilities of the first three categories. The quartiles of the last category follow automatically. Although the expert is not a statistician, he had no problems in assessing quartiles after a brief discussion about the method of bisection.

[Figure 8.5: Assessing conditional quartiles. Software window for eliciting quartiles of the category probabilities.]

Next, the expert gave conditional median assessments of 0.41, 0.16, 0.12, 0.33 for the remaining four categories, conditional on the probability of the first category being 0.25. The number of conditions was then increased in stages. Conditional on 0.25 and 0.20 being the probabilities for the first and second categories, respectively, the expert revised his probability median assessments for the last three categories to 0.13, 0.18 and 0.25, respectively. See Figure 8.6. Finally, he gave the conditional medians of 0.19, 0.30 for the last two categories given that the probabilities of the first three categories were 0.25, 0.20 and 0.07, respectively.

[Figure 8.6: Revised conditional medians. Software window "Eliciting conditional medians of Probabilities for Each Category", with one bar per category: Cost, Time, EcoImpact, Comfort, Convenience.]

It is worth mentioning that the suggestions given by the software played a crucial role in helping the expert choose medians that satisfy the unit sum constraint. During the elicitation process, the sums of the expert's assessments never equalled one exactly.
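A minimal sketch of how suggestions that satisfy the unit sum constraint could be produced from the expert's medians is given below. It assumes simple proportional rescaling; the thesis does not state in this passage which rule the software uses, so both the rule and the function name are assumptions made for illustration.

```python
def normalise(medians):
    """Rescale a list of assessed medians proportionally so that they sum to one."""
    total = sum(medians)
    if total <= 0:
        raise ValueError("medians must have a positive sum")
    return [m / total for m in medians]

# The expert's two rounds of initial median assessments (Cost, Time, EcoImpact,
# Comfort, Convenience), as reported in the text.
initial = [0.61, 0.25, 0.04, 0.06, 0.10]   # sums to 1.06
revised = [0.49, 0.28, 0.04, 0.06, 0.11]   # sums to 0.98

suggested_initial = normalise(initial)
suggested_revised = normalise(revised)
print([round(s, 4) for s in suggested_initial])
print([round(s, 4) for s in suggested_revised])
```

Under this assumed rule, the second set of suggestions differs from the revised assessments by about two per cent of each value, which is consistent with the remark that the medians suggested next were very close to the expert's own.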
When suggestions were offered by the software, he normally revised one assessment and then accepted the second round of offered suggestions.

After making his conditional median assessments, the expert was then shown the unconditional medians and unconditional quartiles that were implied by all his assessments. See Figure 8.7. During this feedback stage he was invited to accept or revise these quantities. The unconditional medians that were offered were accepted by the expert as an adequate representation of his opinion. However, he decided to use the change quartiles button to revise the unconditional quartiles and then reduced the interquartile range of the last category.

[Figure 8.7: Software suggestions for marginal medians and quartiles. Software feedback window with one bar per category: Cost, Time, EcoImpact, Comfort, Convenience.]

The elicitation process took about 20 minutes to complete. The expert commented that although the elicitation problem was quite tricky, the software gave a helpful form of visualization. He also mentioned that he had found it hard to make his median assessments sum to one, so that the software's suggestions had been very welcome. He also advised that it would be helpful if the different categories were ordered according to their importance, i.e. in descending order of their median probability values. He thought that this order would make it easier for him to think about conditional assessments.

The software output the following elicited hyperparameters of the logistic normal prior, as in Tables 8.1 and 8.2.
Table 8.1: The elicited mean vector of a logistic normal prior

  Y_2 = log(p_2/p_1)   Y_3 = log(p_3/p_1)   Y_4 = log(p_4/p_1)   Y_5 = log(p_5/p_1)
       -0.5058              -2.4517              -2.0639              -1.5043

Table 8.2: The elicited variance-covariance matrix of a logistic normal prior

                        Y_2 = log(p_2/p_1)   Y_3 = log(p_3/p_1)   Y_4 = log(p_4/p_1)   Y_5 = log(p_5/p_1)
  Y_2 = log(p_2/p_1)         0.3414               0.1511               0.1598              -0.3035
  Y_3 = log(p_3/p_1)         0.1511               0.9087               0.3677              -0.5551
  Y_4 = log(p_4/p_1)         0.1598               0.3677               1.0906              -1.9076
  Y_5 = log(p_5/p_1)        -0.3035              -0.5551              -1.9076               3.4683

This output gives the mean vector and variance-covariance matrix of a multivariate normal distribution of degree 4 for $Y_2, Y_3, Y_4, Y_5$. However, the marginal moments of each $p_i$ are not given as output. Instead, marginal medians and quartiles are presented to the expert during the feedback stage as discussed before, see Figure 8.7.

The multivariate normal distribution of $Y_2, Y_3, Y_4, Y_5$ may be used as a prior distribution in a Bayesian analysis. Details of the additive logistic transformations are also needed:
\[
p_i =
\begin{cases}
\dfrac{1}{1 + \sum_{j=2}^{5} \exp(Y_j)}, & \text{for } i = 1, \\[2ex]
\dfrac{\exp(Y_i)}{1 + \sum_{j=2}^{5} \exp(Y_j)}, & \text{for } i = 2, 3, \dots, 5.
\end{cases}
\]
Of course, the extra variable $Y_1$ is omitted as it is a redundant variable due to the unit sum constraint on $\underline{p}$.

The software has an option to implement this prior distribution in a WinBUGS file. After the sample data are obtained, the software produces a file for a WinBUGS model that contains sample data, a multinomial likelihood and a complete specification of the logistic normal prior distribution that the expert assessed.

8.7 Concluding comments

In Chapters 6 and 7, we introduced elicitation methods for Dirichlet, generalized Dirichlet and Gaussian copula as prior distributions for the parameter vector $\underline{p}$ of the multinomial model. Hence the logistic normal distribution is our fourth suggested prior distribution for this model. Among these priors, the logistic normal prior gives the most general correlation structure.
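As an illustration of the additive logistic transformation quoted with Tables 8.1 and 8.2, the sketch below maps a vector $(Y_2, \dots, Y_k)$ back to probabilities and applies it to the elicited mean vector of Table 8.1. The function name is hypothetical; the transformation itself is the one stated in the text.

```python
import math

def additive_logistic_inverse(y):
    """Map (Y_2, ..., Y_k), with Y_j = log(p_j / p_1), to (p_1, ..., p_k)."""
    exps = [math.exp(v) for v in y]
    denom = 1.0 + sum(exps)
    return [1.0 / denom] + [e / denom for e in exps]

# Elicited mean vector from Table 8.1.
mean_vector = [-0.5058, -2.4517, -2.0639, -1.5043]
p_at_mean = additive_logistic_inverse(mean_vector)
print([round(p, 2) for p in p_at_mean])
```

Evaluated at the elicited mean vector, the transformation gives probabilities of roughly 0.49, 0.30, 0.04, 0.06 and 0.11 for the five categories. Because the transformation of each $Y_j$ is monotonic, these values can be read as being on the scale of the expert's medians rather than as marginal means of the $p_i$.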
The PEGS-Multinomial software, which is freely available at http://statistics.open.ac.uk/elicitation, offers the option of eliciting any of these four prior distributions.

As noted earlier, it is tricky to elicit assessments that satisfy all the necessary requirements for multinomial models. For example, if there are only two categories, the lower probability quartile of one category and the upper quartile of the other must add up to one. As the number of categories increases, the number of requirements that must be satisfied increases. In our proposed elicitation method, we chose assessment tasks and a structure that led to a coherent set of assessments, without the expert having to be conscious of the requirements.

Chapter 9

Eliciting multinomial models with covariates

9.1 Introduction

With multinomial models, the membership probabilities of different categories may depend on one or more continuous or categorical explanatory variables (covariates) that influence these probabilities. The simplest well-known example in this context is logistic regression, where the probability of being in one of only two categories is related to a set of explanatory variables through the logit link function.

Suppose there are $k$ categories, let $p_1, p_2, \dots, p_k$ denote the membership probabilities and let $\underline{X} = (X_1, X_2, \dots, X_m)'$ be a vector of $m$ explanatory variables. Relating $\underline{X}$ to each probability $p_i$ using separate logit link functions is not the best choice. The inverse link functions give
\[
p_i(\underline{X}) = \frac{\exp(\alpha_i + \underline{X}'\underline{\beta}_i)}{1 + \exp(\alpha_i + \underline{X}'\underline{\beta}_i)}, \quad i = 1, 2, \dots, k, \tag{9.1}
\]
in which case, it will not be easy to investigate the conditions under which the constraint $\sum_{i=1}^{k} p_i(\underline{X}) = 1$ is fulfilled. Some other link functions are available in the literature [e.g. Aitchison (1986)]. However, the additive multinomial logistic link function is the most convenient, as it automatically accounts for the unit sum constraint. It links the classification probabilities to linear predictors in the form,
It links the classification probabilities to linear predictors in the form, k ’ 2==1, 1 + Y L exP(a i + X' Pj ) Pi(X) = 1 3= 2 (9.2) exp (a* + X ' f i J k %— 2, • • • , k. 1 + J 2 exp ( « j + x ' p . ) 3=2 Expressing the model in the form of (9.2) helps to generalize results obtained in the previous chapter to the current case. For the Bayesian analysis of the multinomial logit model, a m ultivariate norm al prior may be assumed [e.g O ’Hagan and Forster (2004)] for the param eter vector 253 where the vectors of coefficients, (ai,p.)', are category specific, for i = 2, • • • , k, i.e. each category has its own vector of regression coefficients. We select the first category as the fill-up category, hence, its regression coefficients, (a q ,/^ )', are not included in the prior distribution for identifiability. In this chapter we propose an elicitation m ethod for eliciting a mean vector and a positive-definite variance-covariance m atrix of the normal prior distribution of /3*. Our proposed m ethod is based on the results obtained in the previous chapter for the logis­ tic normal prior distribution of the multinomial model. The proposed m ethod has been implemented in the PEGS-M ultinomial with Covariates software th a t is freely available at http://statistics.open.ac.uk/elicitation. In Section 9.2, we define the underlying model, namely, the base-line multinomial logit model, in term s of the additive logistic transform ation. The required assumptions, notation and theoretical framework are discussed in Section 9.3. Elicitation m ethods and assessment tasks required for eliciting a mean vector and a positive-definite variance covariance m atrix for the regression coefficients are proposed in Sections 9.4 and 9.5. Final concluding comments of this chapter are given in Section 9.6. 
9.2 The base-line multinomial logit model

The model that uses the link function in (9.2) is known as the multinomial logistic (logit) model, since it has multinomial responses with a number of $k > 2$ categories. The model in (9.2) is usually given in the more general form
\[
p_i(\underline{X}) = \frac{\exp(\alpha_i + \underline{X}'\underline{\beta}_i)}{\sum_{j=1}^{k} \exp(\alpha_j + \underline{X}'\underline{\beta}_j)}, \quad i = 1, 2, \dots, k, \tag{9.3}
\]
which is called the base-line multinomial logit model. See, for example, Agresti (2002) or Powers and Xie (2000). In the rest of this chapter, for ease of notation, each classification probability $p_i(\underline{X})$, as defined in (9.2), will be denoted simply by $p_i$, for $i = 1, 2, \dots, k$.

To attain the unit sum constraint in the base-line model, an identifiability constraint must be imposed by equating the coefficients of the "base-line" category to zero. The selection of the base-line category is arbitrary. If we select the first category as the base-line category, then, under the identifiability constraint $(\alpha_1, \underline{\beta}_1')' = \underline{0}$, it can easily be shown that the model in (9.3) is equivalent to that in (9.2). Thus, the model has exactly $(k-1)(m+1)$ free parameters.

From (9.2), the linear predictor, $Y_j = \alpha_j + \underline{X}'\underline{\beta}_j$, can be written in terms of the logistic transformations in classification probabilities as
\[
Y_j = \alpha_j + \underline{X}'\underline{\beta}_j = \log(p_j) - \log(p_1), \quad \text{for } j = 2, 3, \dots, k, \tag{9.4}
\]
where the regression coefficients for the $j$th category are $(\alpha_j, \underline{\beta}_j')'$.

We define an extra variable, $Y_1$, as
\[
Y_1 = \log(p_1) - \log(1 - p_1). \tag{9.5}
\]
This extra variable is required to be used as a conditioning value in the elicitation process, as shown in the previous chapter. We do not assume $Y_1$ to be a linear predictor, since the trivial parameters, $\alpha_1$ and $\underline{\beta}_1$, will not appear in the elicited prior distribution. We adopt the conventions $\alpha_1 = 0$, $\underline{\beta}_1 = \underline{0}$, for identifiability of the base-line model.

9.3 Notation and theoretical framework

We assume that the prior opinion about the linear predictors $Y_2, \dots, Y_k$ can be adequately represented by a multivariate normal distribution of degree $k-1$. Then from equations (9.4), (9.5) and Section 8.2.1, $Y_1$ has an approximate normal distribution. In addition, the classification probabilities, $p_1, p_2, \dots, p_k$, have a logistic normal distribution as defined in Section 8.2. Following O'Hagan and Forster (2004), we assume a multivariate normal prior distribution for the regression coefficients.

For tractability in the elicitation process, the expert is asked to give her assessments for the classification probabilities, $p_1, \dots, p_k$, and consequently for $Y_2, \dots, Y_k$, for only one covariate at a time. All other covariates are assumed to be at their reference values/levels. By doing this for each covariate in turn, the expert can concentrate on revising her assessments as a result of the change in just one explanatory covariate.

The relationship between each $Y_j$ and each continuous covariate $X_r$ is not necessarily linear. A piecewise-linear relationship as discussed in Chapters 3 and 4 might be a reasonable choice here, as it can model many types of relationships. However, in dealing with $k$ categories and $m$ explanatory covariates, a piecewise-linear relationship will seldom be practical as it imposes a large number of dividing points (knots) at which the expert must give assessments. This would lead to a lengthy elicitation process. So, to simplify the elicitation process, we assume that relationships are linear.

Specifically, we assume a linear relationship between each continuous covariate $X_r$, $r = 1, 2, \dots, m$, and each $Y_j$, $j = 2, \dots, k$, of the form
\[
Y_j = \alpha_j + X_r \beta_{r,j}, \quad r = 1, \dots, m, \quad j = 2, \dots, k, \tag{9.6}
\]
given that all other covariates are fixed at their reference values/levels.
That is, equation (9.6) holds when $X_i = x_{i,0}$, for $i = 1, 2, \dots, m$, $i \neq r$, where $x_{i,0}$ is the reference value/level of $X_i$. If all covariates are at their reference values/levels, i.e. $X_i = x_{i,0}$, for $i = 1, 2, \dots, m$, then
\[
Y_j = \alpha_j, \quad j = 2, \dots, k. \tag{9.7}
\]
To achieve this, for $r = 1, 2, \dots, m$, if the covariate $X_r$ is a factor (categorical variable), with a reference level $x_{r,0}$ and any number $\delta(r)$ of levels, $x_{r,1}, x_{r,2}, \dots, x_{r,\delta(r)}$, then $X_r$ is split into $\delta(r)$ new factors, $X_{r,i}$, defined as
\[
X_{r,i} =
\begin{cases}
1 & \text{if } X_r = x_{r,i}, \\
0 & \text{otherwise},
\end{cases} \quad \text{for } i = 1, 2, \dots, \delta(r). \tag{9.8}
\]
If $X_r$ is a continuous covariate with a reference value $x_{r,0}$, then we define a new variable $X^*_r$ as
\[
X^*_r = X_r - x_{r,0}, \quad \text{for } r = 1, 2, \dots, m. \tag{9.9}
\]
With the new covariates defined by (9.8) and (9.9), the value of each covariate is equal to zero at its reference value. Hence, if $m$ consists of $m_1$ factors and $m_2$ continuous covariates, we get a new set of, say, $m^*$ explanatory variables, where
\[
m^* = \sum_{j=1}^{m_1} \delta(j) + m_2.
\]
To simplify the notation, with no loss of generality, we keep the notation $X_1, X_2, \dots, X_m$ for the set of covariates, while keeping in mind that $m$ actually denotes $m^*$ and that each $X_r$ is of the form of (9.8) for a factor or (9.9) for a continuous covariate. In this sense, the models in (9.6) and (9.7) are equivalent to (9.4).

It is convenient to rearrange the regression coefficients into a matrix, say $\beta$, of the form
\[
\beta = \begin{pmatrix}
\alpha_1 & \alpha_2 & \cdots & \alpha_k \\
\beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,k} \\
\vdots & \vdots & & \vdots \\
\beta_{m,1} & \beta_{m,2} & \cdots & \beta_{m,k}
\end{pmatrix}. \tag{9.10}
\]
Then we define the new set of vectors $\underline{\alpha}$, $\underline{\beta}_{(r)}$, for $r = 1, 2, \dots, m$, as the rows of $\beta$, of the form
\[
\underline{\alpha} = (\alpha_1, \alpha_2, \dots, \alpha_k)', \tag{9.11}
\]
\[
\underline{\beta}_{(r)} = (\beta_{r,1}, \beta_{r,2}, \dots, \beta_{r,k})', \tag{9.12}
\]
and the same set with the first zero elements removed, as
\[
\underline{\alpha}^1 = (\alpha_2, \alpha_3, \dots, \alpha_k)', \tag{9.13}
\]
\[
\underline{\beta}^1_{(r)} = (\beta_{r,2}, \beta_{r,3}, \dots, \beta_{r,k})'. \tag{9.14}
\]
Although each column of the $\beta$ matrix in (9.10) contains the regression coefficients that correspond to one category, it is more convenient to work with the rows, each of which corresponds to one covariate.
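The recoding in (9.8) and (9.9) can be sketched as below: a factor is expanded into one indicator per non-reference level, and a continuous covariate is centred at its reference value, so that every recoded covariate equals zero at the reference point. The function names, the covariate names and the levels are hypothetical.

```python
def recode_factor(value, non_reference_levels):
    """Indicators as in (9.8): one 0/1 variable per non-reference level of the factor."""
    return [1 if value == level else 0 for level in non_reference_levels]

def centre(value, reference):
    """Centred continuous covariate as in (9.9): X*_r = X_r - x_{r,0}."""
    return value - reference

def recoded_count(delta_per_factor, n_continuous):
    """m* = sum of delta(j) over the factors, plus the number of continuous covariates."""
    return sum(delta_per_factor) + n_continuous

# Hypothetical covariates: a factor 'region' with reference level 'north' and two
# other levels, and a continuous covariate 'age' with reference value 30.
levels = ["south", "west"]
at_reference = recode_factor("north", levels) + [centre(30, 30)]
at_new_point = recode_factor("west", levels) + [centre(40, 30)]
m_star = recoded_count([len(levels)], 1)
print(at_reference, at_new_point, m_star)
```

Coding the reference level as all zeros is what makes the intercepts $\alpha_j$ in (9.7) interpretable as the linear predictors at the reference point.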
In this case, elements of a single row correspond to classification probabilities, and hence these elements must be inter-related in a way that reflects the unit sum constraint of the probabilities. Therefore, we assume that the elements of $\underline{\alpha}$ are correlated, and that the elements of each $\underline{\beta}_{(r)}$ are also correlated, hence statistically dependent, a priori, for all $r = 1, 2, \dots, m$. Elements from different rows of $\beta$, which correspond to different covariates, are assumed to be independent a priori, so as to simplify the elicitation process and obtain a block-diagonal variance-covariance matrix. If we let $\underline{\beta}^1 = (\underline{\beta}^{1\prime}_{(1)}, \underline{\beta}^{1\prime}_{(2)}, \dots, \underline{\beta}^{1\prime}_{(m)})'$, then the multivariate normal prior distribution to be elicited is of the form

$$\begin{pmatrix} \underline{\alpha}^1 \\ \underline{\beta}^1 \end{pmatrix} \sim \text{MVN}\left( \begin{pmatrix} \underline{\mu}_\alpha \\ \underline{\mu}_\beta \end{pmatrix}, \begin{pmatrix} \Sigma_\alpha & \Sigma_{\alpha,\beta} \\ \Sigma_{\alpha,\beta}' & \Sigma_\beta \end{pmatrix} \right). \tag{9.15}$$

9.4 Eliciting the mean vector

To elicit the mean vectors $\underline{\mu}_\alpha$ and $\underline{\mu}_\beta$ in (9.15), we proceed as follows.

• The expert is asked to assume that all covariates are at their reference values/levels, i.e. $X_r = 0$, $r = 1, 2, \dots, m$. We call this situation the reference point. She then assesses a median value, say $m^*_{1,0,0}$, for the probability $p_1$ of the first category. As discussed in the previous chapter, since the choice of the first category is arbitrary, it is chosen by the expert as the most common category. Then the expert assesses median values $m^*_{j,0,0}$, $j = 2, \dots, k$, for all the remaining categories.

• As proved in Theorem 8.1 in the previous chapter, these unconditional median assessments are equal to the conditional medians of $(p_j \mid p_1 = m^*_{1,0,0})$ for $j = 2, 3, \dots, k$. For convenience, we denote both conditional and unconditional medians by $m^*_{j,0,0}$, $j = 2, \dots, k$. Lemma 8.1 in the previous chapter states that median assessments must sum to one, so they are normalized by the PEGS-Multinomial with Covariates software to fulfill this condition.
• For each covariate in turn, the expert is asked to assume a specific value of the current covariate, say $X_r = x_r$, while all other covariates are assumed to be at their reference values/levels. Under these assumptions the expert starts by assessing a median value for $p_1$, say $m^*_{1,0,r}$. Then she assesses a new set of median values, say $m^*_{j,0,r}$ for $j = 2, 3, \dots, k$, for all the remaining categories. Again, these assessments are normalized to satisfy the unit sum constraint. This process is repeated for each covariate, i.e. for $r = 1, 2, \dots, m$.

• Figure 9.1 shows the assessed probability medians when only one of the covariates, age, has changed from its reference value to a new value (40 years). To help the expert during this stage, the software gives the previously assessed medians when all covariates were at their reference values/levels. This is presented by the upper right graph of Figure 9.1. The reference value/level of each continuous covariate/factor is also listed in the upper left table, as in Figure 9.1.

[Figure 9.1: Assessing probability medians at age = 40 years. Software screenshot: main panel "Eliciting Medians of Probabilities for Each Category when the covariate (age) ..."; upper right panel "Probability medians at the reference point".]

Now, let the conditional median of $Y_j$, given that all covariates are at their reference levels, be denoted by $m_{j,0,0}$, for $j = 1, 2, \dots, k$. Also, let the conditional median of $Y_j$, given that $X_r = x_r$ and all other covariates are at their reference levels, be denoted by $m_{j,0,r}$, for $j = 1, 2, \dots, k$ and $r = 1, 2, \dots, m$.

As the transformations in (9.4) and (9.5) are monotonic increasing, medians and conditional medians of the probabilities are transformed into the corresponding medians of the $Y_j$. Hence we can write, for $r = 0, 1, 2, \dots, m$ and $j = 2, 3, \dots, k$,

$$m_{1,0,r} = \log(m^*_{1,0,r}) - \log(1 - m^*_{1,0,r}), \tag{9.16}$$

$$m_{j,0,r} = \log(m^*_{j,0,r}) - \log(m^*_{1,0,r}). \tag{9.17}$$

It is worth mentioning here that the validity of (9.17) is a result of defining $m^*_{j,0,r}$ as the conditional median of $(p_j \mid p_1 = m^*_{1,0,r})$, which implies that $m_{j,0,r}$ is a conditional median of $(Y_j \mid Y_1 = m_{1,0,r})$. That is why we need the redundant variable, $Y_1$, to be defined in (9.5).

The computed assessments from (9.16) and (9.17), together with the linearity assumptions in (9.6) and (9.7), enable us to determine $\mu_j = E(\alpha_j)$, for $j = 2, \dots, k$, as

$$\mu_j = E(Y_j \mid X_i = 0, \forall i = 1, 2, \dots, m) = m_{j,0,0}. \tag{9.18}$$

We must also determine $\mu_{r,j} = E(\beta_{r,j})$ for $r = 1, 2, \dots, m$, $j = 2, \dots, k$. If $X_r$ is a factor, then from (9.6) and (9.7), and utilizing the assessments in (9.16) and (9.17), we put

$$\mu_{r,j} = E(Y_j \mid X_r = 1, X_i = 0, \forall i \neq r) - E(Y_j \mid X_i = 0, \forall i = 1, 2, \dots, m) = m_{j,0,r} - m_{j,0,0}. \tag{9.19}$$

If $X_r$ is a continuous covariate, then $\beta_{r,j}$ is the slope of the linear relation in (9.6), so

$$\mu_{r,j} = \left[ E(Y_j \mid X_r = x_r, X_i = 0, \forall i \neq r) - E(Y_j \mid X_i = 0, \forall i = 1, 2, \dots, m) \right] / x_r = \left[ m_{j,0,r} - m_{j,0,0} \right] / x_r, \tag{9.20}$$

for $r = 1, 2, \dots, m$ and $j = 2, \dots, k$. Finally, we put

$$\underline{\mu}_\alpha = (\mu_2, \mu_3, \dots, \mu_k)', \tag{9.21}$$

and

$$\underline{\mu}_\beta = (\mu_{1,2}, \dots, \mu_{1,k}, \mu_{2,2}, \dots, \mu_{2,k}, \dots, \mu_{m,2}, \dots, \mu_{m,k})'. \tag{9.22}$$

9.5 Eliciting the variance matrix

To elicit a positive-definite matrix for the multivariate normal prior distribution of the regression coefficients in (9.15), we proceed as follows.

9.5.1 Eliciting the variance-covariance sub-matrices

We denote $\Sigma_\alpha = \text{Var}(\underline{\alpha}^1)$ by $\Sigma_0$, and put

$$\Sigma_{r|\alpha} = \text{Var}(\underline{\beta}^1_{(r)} \mid \underline{\alpha}^1) \tag{9.23}$$

and

$$\Sigma_{\beta|\alpha} = \text{Var}(\underline{\beta}^1 \mid \underline{\alpha}^1). \tag{9.24}$$

In order to develop a method for eliciting positive-definite matrices $\Sigma_0$ and $\Sigma_{r|\alpha}$ ($r = 1, \dots, m$), we proceed as follows. From (9.7) we put

$$\Sigma_0 = \text{Var}(\underline{Y}^1 \mid X_i = 0, \forall i = 1, 2, \dots, m) = V_0, \tag{9.25}$$

where $\underline{Y}^1 = (Y_2, Y_3, \dots, Y_k)'$. For continuous covariates, if we assume that $X_r = x_r$ and $X_i = 0$ for $i = 1, 2, \dots, m$, $i \neq r$, we have from (9.6) that

$$\text{Var}(\underline{Y}^1 \mid X_r = x_r, \underline{\alpha}^1 = \underline{\mu}_\alpha) = x_r^2 \, \text{Var}(\underline{\beta}^1_{(r)} \mid \underline{\alpha}^1 = \underline{\mu}_\alpha) = V_r. \tag{9.26}$$

Hence, for $r = 1, 2, \dots, m$,

$$\Sigma_{r|\alpha} = x_r^{-2} V_r. \tag{9.27}$$

For factors, (9.27) is reduced to

$$\Sigma_{r|\alpha} = V_r. \tag{9.28}$$

Each matrix $V_r$ ($r = 0, 1, \dots, m$) can be elicited as a positive-definite matrix in the way used to obtain the variance matrix of the logistic normal prior in Chapter 8.

Remark 9.1 The main difference between this chapter and Chapter 8 is that here the process of assessing the conditional medians and quartiles must be repeated $m + 1$ times. In the initial step, the expert is asked to assume that $X_i = 0$, $\forall i = 1, 2, \dots, m$. Then, in each successive step number $r$, for $r = 1, 2, \dots, m$, the expert is asked to assume that the $r$th covariate has changed from 0 to $x_r$, i.e. $X_r = x_r$, while all other covariates are at their reference values, i.e. $X_i = 0$ for $i = 1, 2, \dots, m$, $i \neq r$. During these remaining $m$ steps, another key assumption is made: the expert is also conditioning on $\underline{\alpha}^1 = \underline{\mu}_\alpha$. Under these main assumptions at step $r$, $r = 0, 1, \dots, m$, the assessment tasks can be detailed as follows.

9.5.2 Assessing conditional quartiles

• Under the assumptions listed in Remark 9.1, the expert is asked to assess a lower quartile $L^*_{1,r}$ and an upper quartile $U^*_{1,r}$ for $p_1$. She is then asked to assume that $p_1 = m^*_{1,0,r}$ and gives a lower quartile $L^*_{2,r}$ and an upper quartile $U^*_{2,r}$ for $p_2$.

• For each remaining $p_j$, $j = 3, \dots, k-1$, she assesses the two quartiles $L^*_{j,r}$ and $U^*_{j,r}$ given that $p_1 = m^*_{1,0,r}$, $p_2 = m^*_{2,0,r}$, ..., $p_{j-1} = m^*_{j-1,0,r}$.
• Using the interactive PEGS-Multinomial with Covariates software, and due to the unit sum constraint, the lower (upper) quartile $L^*_{k,r}$ ($U^*_{k,r}$) of $p_k$ is automatically shown to the expert once she assesses the upper (lower) quartile $U^*_{k-1,r}$ ($L^*_{k-1,r}$) of $p_{k-1}$.

• With the aid of a lognormal curve produced by the software, the expert is advised to make sure that her assessed interquartile range gives an almost zero probability of $p_j$ exceeding $1 - \sum_{i=1}^{j-1} m^*_{i,0,r}$. For more details on this, see Section 8.3.2 in the previous chapter.

9.5.3 Assessing conditional medians

• Under the assumptions listed in Remark 9.1, for $r = 0, 1, \dots, m$, the expert is asked to assume that the median of $p_1$ has been changed from $m^*_{1,0,r}$ to a new value $m^*_{1,1,r}$. Given this information, the expert is asked to change her previous medians $m^*_{j,0,r}$ of each $p_j$. Her new assessment, $m^*_{j,1,r}$, may be written as

$$m^*_{j,1,r} = m^*_{j,0,r} + \theta^*_{j,1,r}, \qquad j = 2, \dots, k. \tag{9.29}$$

• In each successive step $i$, for $i = 2, 3, \dots, k-2$, the expert is asked to suppose that the median values of $p_1, p_2, \dots, p_i$ are $m^*_{1,1,r}, m^*_{2,2,r}, \dots, m^*_{i,i,r}$, respectively, where each $m^*_{i,i,r}$ is obtained by adding a specified increment to $m^*_{i,i-1,r}$. These are shown as red bars in Figure 9.2. Given this information, she is asked to revise the medians that she assessed at the most recent previous step, $m^*_{i+1,i-1,r}, m^*_{i+2,i-1,r}, \dots, m^*_{k,i-1,r}$, shown by black lines in Figure 9.2. Her new assessments are denoted $m^*_{i+1,i,r} = m^*_{i+1,i-1,r} + \theta^*_{i+1,i,r}$, $m^*_{i+2,i,r} = m^*_{i+2,i-1,r} + \theta^*_{i+2,i,r}$, ..., $m^*_{k,i,r} = m^*_{k,i-1,r} + \theta^*_{k,i,r}$, respectively, which are shown as the blue bars in the main graph of Figure 9.2. In other words, for $i = 1, 2, \dots, k-2$ and $j = i+1, i+2, \dots, k$, we can write

$$m^*_{j,i,r} = m^*_{j,i-1,r} + \theta^*_{j,i,r} \ \text{ is the median of } \ (p_j \mid p_1 = m^*_{1,1,r}, \dots, p_i = m^*_{i,i,r}). \tag{9.30}$$

• For mathematical coherence, as proved in Lemma 8.3, Section 8.4.2 in the previous chapter, we have to make sure that

$$\sum_{j=1}^{i} m^*_{j,j,r} + \sum_{j=i+1}^{k} m^*_{j,i,r} = 1, \qquad i = 1, 2, \dots, k-2.$$
The software suggests new normalized conditional medians satisfying the above constraint.

• As mentioned in Remark 9.1, the expert assesses her conditional medians assuming that only one of the covariates, age, has changed from its reference value to 40 years, and assuming at the same time that her previously assessed medians at the reference point are correct. Probability medians at the reference point are presented to the expert in the upper right graph of Figure 9.2. The expert is asked to assume these medians are the true values while assessing her conditional medians on the main graph of Figure 9.2.

[Figure 9.2: Assessing conditional medians at age = 40 years. Software screenshot.]

Assessment tasks in Sections 9.5.2 and 9.5.3 will be repeated $m + 1$ times, for $r = 0, 1, \dots, m$. Then, as detailed in the previous chapter, the normalizing one-to-one functions in (9.4) and (9.5) are used to transform the assessed conditional quartiles of $\underline{p}$ into conditional quartiles of $\underline{Y}$ and, hence, into conditional expectations, variances and covariances of the multivariate normal elements. The method of Kadane et al. (1980) is modified, as in the previous chapter, to estimate a positive-definite variance-covariance matrix $V_r$ for $\underline{Y}^1 \mid X_r$ from the assessed conditional medians and quartiles. Because of the unit sum constraint, each positive-definite matrix $V_r$ is of order $(k-1)$.

Under the assumptions leading to (9.15), and in view of (9.23) and (9.24), the diagonal blocks of the block-diagonal matrix $\Sigma_{\beta|\alpha}$ are $\Sigma_{r|\alpha}$, where each $\Sigma_{r|\alpha}$ is given by (9.27), for $r = 1, 2, \dots, m$. Hence, $\Sigma_{\beta|\alpha}$ is a positive-definite matrix. The unconditional variance-covariance matrix $\Sigma_\beta$ will be obtained from $\Sigma_{\beta|\alpha}$ using the covariance matrix $\Sigma_{\alpha,\beta}$. The latter is elicited as follows.
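Before turning to the covariance matrix, the mean-vector computations of Section 9.4 and the scaling in (9.27) and (9.28) can be sketched in a few lines of Python. This is only an illustrative sketch, not the PEGS software: the function names, the list-based data layout and the convention that `x_r=None` marks a factor are assumptions made here for illustration.

```python
import math

def normalize(medians):
    """Rescale assessed probability medians so that they sum to one."""
    total = sum(medians)
    return [m / total for m in medians]

def to_y_scale(p_medians):
    """Map probability medians (categories 1..k) to medians of Y_1..Y_k."""
    p1 = p_medians[0]
    y1 = math.log(p1) - math.log(1.0 - p1)                      # (9.16)
    rest = [math.log(p) - math.log(p1) for p in p_medians[1:]]  # (9.17)
    return [y1] + rest

def mean_alpha(ref_medians):
    """(9.18): mu_j = m_{j,0,0} for j = 2..k."""
    return to_y_scale(normalize(ref_medians))[1:]

def mean_beta_r(ref_medians, new_medians, x_r=None):
    """(9.19) for a factor (x_r is None), (9.20) for a continuous covariate."""
    y_ref = to_y_scale(normalize(ref_medians))[1:]
    y_new = to_y_scale(normalize(new_medians))[1:]
    divisor = 1.0 if x_r is None else x_r
    return [(new - ref) / divisor for ref, new in zip(y_ref, y_new)]

def conditional_cov_block(v_r, x_r=None):
    """(9.28) for a factor (x_r is None), (9.27) for a continuous covariate."""
    divisor = 1.0 if x_r is None else x_r ** 2
    return [[v / divisor for v in row] for row in v_r]

# Hypothetical assessments for k = 3 categories and one continuous covariate
# that is moved 10 units away from its reference value.
ref = [0.5, 0.3, 0.2]
shifted = [0.4, 0.35, 0.25]
mu_alpha = mean_alpha(ref)
mu_beta_1 = mean_beta_r(ref, shifted, x_r=10.0)
sigma_1_given_alpha = conditional_cov_block([[4.0, 1.0], [1.0, 9.0]], x_r=10.0)
```

The numerical values at the end are invented solely to exercise the functions; in practice the matrix $V_r$ would come from the quartile assessments of Sections 9.5.2 and 9.5.3.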
9.5.4 Eliciting the covariance matrix $\Sigma_{\alpha,\beta}$

The covariance matrix of $\underline{\alpha}^1$ and $\underline{\beta}^1$ is the matrix $\Sigma_{\alpha,\beta}$ of order $(k-1) \times m(k-1)$. To elicit this matrix, it is convenient to conformally partition $\Sigma_{\alpha,\beta}$ as

$$\Sigma_{\alpha,\beta} = (\Sigma_{\alpha,\beta_1}, \Sigma_{\alpha,\beta_2}, \dots, \Sigma_{\alpha,\beta_m}), \tag{9.31}$$

where, for $r = 1, 2, \dots, m$,

$$\Sigma_{\alpha,\beta_r} = \text{Cov}(\underline{\alpha}^1, \underline{\beta}^1_{(r)}). \tag{9.32}$$

We denote the rows of each $\Sigma_{\alpha,\beta_r}$ by $\underline{\sigma}'_{\alpha,\beta_r,t}$, for $t = 2, \dots, k$, where

$$\underline{\sigma}'_{\alpha,\beta_r,t} = \text{Cov}(\alpha_t, \underline{\beta}^1_{(r)}). \tag{9.33}$$

For any specific value $\alpha^*_t$ satisfying $\alpha^*_t \neq \mu_t$, for $t = 2, \dots, k$, it can be seen from (9.15), (9.32), (9.33) and the theory of the multivariate normal distribution that

$$\underline{\mu}_{\beta_r|\alpha_t} = E(\underline{\beta}^1_{(r)} \mid \alpha_t = \alpha^*_t) = \underline{\mu}_{\beta_r} + \underline{\sigma}_{\alpha,\beta_r,t} \, \frac{\alpha^*_t - \mu_t}{\text{Var}(\alpha_t)}, \tag{9.34}$$

where $\underline{\mu}_{\beta_r} = E(\underline{\beta}^1_{(r)})$. From this,

$$\underline{\sigma}_{\alpha,\beta_r,t} = \frac{\text{Var}(\alpha_t)}{\alpha^*_t - \mu_t} \left( \underline{\mu}_{\beta_r|\alpha_t} - \underline{\mu}_{\beta_r} \right). \tag{9.35}$$

Since $\text{Var}(\alpha_t)$ is the $(t-1)$th element of the main diagonal of $\Sigma_0$ as in (9.25), then, from (9.32) and (9.33), $\Sigma_{\alpha,\beta_r}$ can be elicited using $(k-1)$ assessments of $\underline{\mu}_{\beta_r|\alpha_t}$, for $t = 2, \dots, k$. Under the normality assumptions, these conditional means of the regression coefficients can be computed from the conditional median assessments of the classification probabilities. This can be detailed as follows.

For each covariate $X_r$ ($r = 1, 2, \dots, m$) in turn, the expert is asked to assume that each single $\alpha_t$ ($t = 2, \dots, k$) in turn has changed from $\mu_t$ to $\alpha^*_t$, i.e. she is asked to assume that the true value of $(p_j \mid X_i = 0, \forall i = 1, 2, \dots, m)$ has changed from $m^*_{j,0,0}$ to a new specific value, $m^*_{j,0,0|\alpha_t}$. This is shown by the change from the black lines to the red bars in the upper right graph of Figure 9.3. Given this information, the expert then assesses her median of $(p_j \mid X_r = x_r, X_i = 0, i = 1, 2, \dots, m, i \neq r)$, which we denote by $m^*_{j,0,r|\alpha_t}$, for $j = 2, \dots, k$. These are assessed as the blue bars in the main graph of Figure 9.3.

[Figure 9.3: Assessing conditional medians given changes at the reference point. Software screenshot: main panel "Eliciting conditional Medians for Each Category when the covariate (age) ..."; upper right panel "Conditioning probabilities at the reference point".]

The choice of the specific values $\alpha^*_t$ is arbitrary, provided that $\alpha^*_t \neq \mu_t$. However, we select each of them to be the upper quartile of the normally distributed variable $\alpha_t$, namely,

$$\alpha^*_t = \mu_t + 0.674 \sqrt{\text{Var}(\alpha_t)}, \qquad t = 2, \dots, k. \tag{9.36}$$

This leads, from (9.4), (9.5) and (9.7), to sets of conditioning probabilities, $m^*_{j,0,0|\alpha_t}$, that are given by

$$m^*_{j,0,0|\alpha_t} = \frac{\exp(a_j)}{1 + \sum_{i=2}^{k} \exp(a_i)}, \qquad j = 1, \dots, k, \tag{9.37}$$

where $a_1 = 0$, $a_t = \alpha^*_t$ and $a_j = \mu_j$ for $j \neq t$.

Since, as in (9.34), we condition on changing $\alpha_t$, for $t = 2, \dots, k$, one at a time, we have to compute the resulting conditioning probabilities from this change as in (9.37). If we had chosen to first change the conditioning probabilities, the desired change for $\alpha_t$ would not have been guaranteed.

As in (9.17), the corresponding median assessments for $Y_j$ can be computed, for $j = 2, \dots, k$, $r = 0, 1, \dots, m$ and $t = 2, \dots, k$, as

$$m_{j,0,r|\alpha_t} = \log(m^*_{j,0,r|\alpha_t}) - \log(m^*_{1,0,r|\alpha_t}). \tag{9.38}$$

Hence, we denote $E(\beta_{r,j} \mid \alpha_t = \alpha^*_t)$ by $\mu_{r,j|\alpha_t}$ and compute it as follows. If $X_r$ is a factor, then as in (9.19), we put

$$\mu_{r,j|\alpha_t} = m_{j,0,r|\alpha_t} - m_{j,0,0|\alpha_t}. \tag{9.39}$$

If $X_r$ is a continuous covariate, then as in (9.20), we put

$$\mu_{r,j|\alpha_t} = \frac{m_{j,0,r|\alpha_t} - m_{j,0,0|\alpha_t}}{x_r}, \tag{9.40}$$

for $r = 1, 2, \dots, m$, $j = 2, \dots, k$ and $t = 2, \dots, k$. Putting

$$\underline{\mu}_{\beta_r|\alpha_t} = (\mu_{r,2|\alpha_t}, \mu_{r,3|\alpha_t}, \dots, \mu_{r,k|\alpha_t})', \tag{9.41}$$

all the components of $\underline{\sigma}_{\alpha,\beta_r,t}$ as in (9.35), and hence of $\Sigma_{\alpha,\beta_r}$ as in (9.32), are elicited. Then $\Sigma_{\alpha,\beta}$ as in (9.31) is fully determined.
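As a rough illustration of (9.35) to (9.37), the following Python sketch computes the shifted value $\alpha^*_t$, the resulting conditioning probabilities, and one row of $\Sigma_{\alpha,\beta_r}$. The function names and the list layout (entries for categories $2, \dots, k$ stored in order, with $t$ given as the category label) are assumptions made for this example; it is not the code of the PEGS software.

```python
import math

UPPER_QUARTILE_Z = 0.674  # standard normal upper quartile, as used in (9.36)

def shifted_alpha(mu_t, var_t):
    """(9.36): the upper quartile of the normal variable alpha_t."""
    return mu_t + UPPER_QUARTILE_Z * math.sqrt(var_t)

def conditioning_probabilities(mu_alpha, var_alpha, t):
    """(9.37): probabilities at the reference point after moving alpha_t.

    mu_alpha and var_alpha hold the means and variances of
    alpha_2..alpha_k; t is the category label in 2..k.
    """
    a = [0.0] + list(mu_alpha)            # a_1 = 0, a_j = mu_j
    a[t - 1] = shifted_alpha(mu_alpha[t - 2], var_alpha[t - 2])
    weights = [math.exp(v) for v in a]
    total = sum(weights)                  # equals 1 + sum_{i>=2} exp(a_i)
    return [w / total for w in weights]

def covariance_row(mu_beta_r, mu_beta_r_given_alpha_t, mu_t, var_t):
    """(9.35): Cov(alpha_t, beta_(r)) from the shift in the conditional mean."""
    alpha_star = shifted_alpha(mu_t, var_t)
    factor = var_t / (alpha_star - mu_t)
    return [factor * (cond - uncond)
            for uncond, cond in zip(mu_beta_r, mu_beta_r_given_alpha_t)]

# Hypothetical values for k = 3: alpha_2 and alpha_3 have mean 0, variance 1.
probs = conditioning_probabilities([0.0, 0.0], [1.0, 1.0], t=2)
row = covariance_row([0.1, 0.2], [0.1674, 0.2], mu_t=0.0, var_t=1.0)
```

In this invented example only the first component of the conditional mean moves when $\alpha_2$ is shifted, so only the first covariance in the row is non-zero.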
After obtaining the covariance matrix $\Sigma_{\alpha,\beta}$, and utilizing the elicited matrix $\Sigma_{\beta|\alpha}$, we get $\Sigma_\beta$ from the conditional variance

$$\Sigma_{\beta|\alpha} = \Sigma_\beta - \Sigma_{\alpha,\beta}' \Sigma_\alpha^{-1} \Sigma_{\alpha,\beta}, \tag{9.42}$$

which gives

$$\Sigma_\beta = \Sigma_{\beta|\alpha} + \Sigma_{\alpha,\beta}' \Sigma_\alpha^{-1} \Sigma_{\alpha,\beta}. \tag{9.43}$$

Since $\Sigma_{\beta|\alpha}$ and $\Sigma_\alpha$ are positive-definite, so is $\Sigma_\beta$. Also, from (9.43) and using the Schur complement, the full variance-covariance matrix of the multivariate normal prior distribution in (9.15) is positive-definite. It is of order $(k-1)(m+1)$ and does not contain variances or covariances of $\alpha_1$, nor of the corresponding first elements of the vectors $\underline{\beta}_{(r)}$. This is equivalent to the usual identifiability assumption of the base-line multinomial logit models, where the regression coefficients of the base-line category are set equal to zero.

9.6 Concluding comments

A novel method has been introduced for eliciting a multivariate normal prior distribution for the regression coefficients in a multinomial logit model with explanatory covariates. The method is an extension of our proposed method in Chapter 8 for eliciting a logistic normal prior for classification probabilities in a multinomial model. Specifically, under a base-line multinomial logit model containing $k$ categories and $m$ explanatory covariates, assessment tasks of a standard multinomial model are repeated $m + 1$ times. The expert assesses conditional medians and quartiles for the multinomial probabilities at specific values of each explanatory covariate. This determines a mean vector and a positive-definite variance-covariance matrix of a multivariate normal prior distribution for $(k-1)(m+1)$ regression coefficients.

Chapter 10

Concluding comments

This chapter summarizes the main results and conclusions of the thesis. We give a brief review of the elicitation methods proposed throughout this thesis, commenting on the main assumptions, strengths and weaknesses of each proposed method. In addition, the inter-relationships between related methods are mentioned and clarified.
The proposed methods divide naturally into two groups: methods of quantifying expert opinion for GLMs and methods of prior elicitation for multinomial models. The proposed methods in each group are briefly discussed in order. Some extensions for future research are given.

The method proposed by Garthwaite and Al-Awadhi (2006) and its extension in Garthwaite and Al-Awadhi (2011) can be considered a general tool for eliciting a multivariate normal prior for the regression coefficients in any GLM. In their method, opinion about the relationship between each continuous predictor variable and the response variable is modeled by a piecewise-linear function. This gives a flexible model that can represent a wide variety of opinion. Expert opinion about each categorical predictor variable (factor) is elicited through a bar-chart. Each slope of the piecewise-linear relationships and each level of the factors has a corresponding regression coefficient.

The expert assesses conditional medians and quartiles of the response variable at different selected design points. In this sense, the method applies the idea of the conditional means prior proposed by Bedrick et al. (1996). Conditional assessments are transformed, under the normality assumption of regression coefficients, to estimate a mean vector and a variance-covariance matrix for the multivariate normal prior distribution. Conditional quartiles are assessed in a structured way that ensures that the resulting matrix is positive-definite.

The method proposed by Garthwaite and Al-Awadhi (2011) has been implemented in interactive, graphical, user-friendly software, in which the expert draws piecewise-linear curves and bar-charts by clicking on interactive graphs on a computer screen to give her assessments. The software computes and offers suggestions to the expert to help reduce the burden of making assessments.
A prototype of this software was written in Java by Jenkinson (2007) and has been modified and extended in the current thesis to be more flexible and to include more options. A detailed description of the method and the current modifications to the software has been given in Chapter 3. Previously the software could only handle logistic regression, but now it handles a wide range of GLMs. As noted earlier, an important feedback option has been added to the software. As each covariate is assessed separately, this feedback option is very useful for helping the expert see the joint impact of all explanatory covariates that her assessments imply.

A simplifying assumption in the method of Garthwaite and Al-Awadhi (2011), which has been relaxed in this thesis, is that regression coefficients had been assumed to be independent, a priori, if attached to different explanatory variables. This yielded a block-diagonal variance-covariance matrix and reduced the number of required assessments for its elicitation. However, this independence assumption can be unrealistic in many practical situations. We proposed three elicitation methods for a multivariate normal prior distribution that do not impose this simplifying assumption. The proposed methods elicit full variance-covariance matrices, but additional assessments are needed in order to estimate the off-diagonal elements.

As noted earlier, the three proposed methods differ in their flexibility and in the number of additional assessments that they require. The first method is a direct extension to the method of Garthwaite and Al-Awadhi (2011). It is the most flexible method among the three and permits different correlations between regression coefficients attached to the same pair of covariates.
Consequently, it requires a large number of conditional assessments, but it should prove useful when there are only a few pairs of variables that, a priori, have highly correlated regression coefficients.

The second proposed method uses only one assessment to model the correlation between all regression coefficients attached to any specific pair of explanatory covariates. This assumption, of fixed correlations for all elements belonging to the same pair of vectors of coefficients, is useful as it reduces the assessment tasks to just one task. The expert is asked to use a slider to determine the correlation between two vectors of regression coefficients. This can be attractive as an easy and quick method for eliciting correlations if only two vectors of regression coefficients are thought to be correlated. Moreover, for the case where more than two vectors have correlated regression coefficients, we extended the method and showed it will yield a full variance-covariance matrix that is positive-definite.

The third method we proposed is suitable for GLMs that contain a large number of correlated vectors. It uses a few assessments that directly reflect the pattern of correlations between all pairs of vectors. In a dialogue box, the expert assesses the relative magnitudes and signs of the average correlations between each pair of vectors. Hence, for $n$ vectors of coefficients, $n(n-1)/2$ assessments are needed. These relative magnitudes should reflect the strength of the average correlation of each pair relative to other pairs. It is a comparatively easy task for the expert as these assessments need not be coherent correlation coefficients; they are scaled later to attain statistical coherence. The method avoids incremented conditioning and assesses all covariances simultaneously.
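To make the idea of scaling relative magnitudes concrete, the sketch below shows one simple scheme, which is an assumption made here for illustration and is not claimed to be the scaling used by the third method or its software: all relative magnitudes are multiplied by a common factor, which is shrunk until the implied correlation matrix is positive-definite (checked with a Cholesky factorization). The function names, the dictionary layout and the shrink factor are all hypothetical.

```python
import math

def n_pair_assessments(n):
    """Number of pairwise assessments for n vectors: n(n-1)/2."""
    return n * (n - 1) // 2

def is_positive_definite(a, tol=1e-12):
    """Attempt a Cholesky factorization; it succeeds only for a
    symmetric positive-definite matrix."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][p] * chol[j][p] for p in range(j))
            if i == j:
                if s <= tol:
                    return False
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True

def scaled_correlations(rel, n, shrink=0.9):
    """Shrink a common factor c until I + c * (relative magnitudes) is
    positive-definite.

    rel maps each pair (i, j) with i < j to a signed relative magnitude
    in [-1, 1]. Returns (c, correlation_matrix).
    """
    c = 1.0
    while True:
        corr = [[1.0 if i == j else c * rel[(min(i, j), max(i, j))]
                 for j in range(n)] for i in range(n)]
        if is_positive_definite(corr):
            return c, corr
        c *= shrink  # the loop terminates: the matrix tends to the identity

# Hypothetical relative magnitudes for n = 3 vectors that are mutually
# incoherent when read directly as correlations.
rel = {(0, 1): 1.0, (0, 2): -1.0, (1, 2): 1.0}
c, corr = scaled_correlations(rel, 3)
```

The point of the example is only that assessments which are incoherent as raw correlations can still carry a usable pattern of signs and relative strengths once a common scale is chosen.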
After assessing the relative magnitudes using the PEGS-GLM (Correlated Coefficients) software, the third method can be used alone or together with one of the other two proposed methods to obtain correlations. The default option, which implements this method alone, is to use one slider to determine correlation coefficients based on simultaneous interactive graphs that show the changes of different variables according to their assessed relative magnitudes. The other two alternative options need an assessment of the correlation of only one pair of vectors; all other correlation coefficients are then computed from this assessment using the relative magnitudes. The correlation assessment for one of the highly correlated pairs may be obtained using one of the other two proposed methods. The first of them needs more assessments, while the second method assumes a fixed correlation structure for the elements of the highly correlated pair of vectors.

Figure 10.1 shows the different options available to the expert for choosing which method to use when she is assessing correlations between regression coefficients in GLMs. These are the different options offered by our PEGS-GLM (Correlated Coefficients) software, which is freely available at http://statistics.open.ac.uk/elicitation.

[Figure 10.1: Options for assessing correlations between regression coefficients. Flowchart with decision nodes "Any correlated vectors?", "One pair or more?", "Fixed correlations?" and "Weight by a pair?", leading to "Eliciting a block-diagonal matrix", "Method 1", "Method 2", "Method 3" and "Finish".]

To complete the prior structure of GLMs with normal and gamma response variables, we proposed two methods of eliciting prior distributions for the extra parameters in these models. One of these methods elicits a conjugate chi-squared prior distribution for the random error variance in normal linear models. The expert is asked to revise her assessments conditional on various sets of hypothetical future samples.
A number of sets of hypothetical data are used in order to obtain several estimates of the hyperparameter that is most difficult to assess, namely, the degrees of freedom parameter of the chi-squared distribution. Reconciliation of these estimates, using the geometric mean, yields an overall estimate of the number of degrees of freedom. The second hyperparameter of the chi-squared prior distribution is also determined from the same assessments. The use of interactive graphical software greatly facilitates the tasks that the expert must perform.

For a gamma response variable, the additional parameter that must be assessed is the scale parameter. We assumed that prior opinion about this positive-valued parameter can be reasonably quantified as a lognormal distribution. To determine the hyperparameters of the lognormal prior distribution, the expert is asked to give a point estimate and an interquartile range for the lower quartile of the gamma response variable. We proved that the lower quartile is a monotonic increasing function of the scale parameter. The expert's assessments are thus transformed to quartiles of the lognormal distribution, and hence to the mean and variance of the lognormal distribution. An example of the questions that can be asked in order to obtain the expert's assessments has been given. As noted earlier, no other reasonable elicitation methods for the scale parameters of gamma GLMs seem to be available in the literature.

Eliciting flexible prior distributions for the classification probabilities in multinomial models has been another important interest of this thesis. In this context, we started by proposing two elicitation methods for the natural conjugate Dirichlet prior. The first method is based on marginal quartile assessments of the classification probabilities. These assessments were used to elicit separate marginal beta distributions of the Dirichlet prior distribution. A normal approximation and least-squares techniques have been used to obtain beta parameters from the quartile assessments. From three reconciliations of beta distributions into a Dirichlet prior distribution, the expert is asked to select the reconciliation that best describes her opinions, based on graphical feedback. The second method elicits conditional quartile assessments for the classification probabilities. These conditional assessments are used to determine conditional beta distributions that are averaged to obtain a Dirichlet prior distribution.

The same marginal and conditional quartile assessments for classification probabilities have been used to elicit two other flexible prior distributions for multinomial models. Conditional quartile assessments were used to elicit conditional beta distributions of a generalized Dirichlet prior distribution. As noted earlier, this distribution is more flexible than the standard Dirichlet distribution for quantifying expert opinion. It has the same number of hyperparameters as the total number of parameters in the conditional beta distributions that determine it. Hence no reconciliation is needed. The generalized Dirichlet distribution has a more general dependence structure than the standard Dirichlet. For example, its correlation structure allows positive correlations between classification probabilities.

Marginal assessments were used to elicit marginal beta distributions for multinomial probabilities. Then, instead of assuming a Dirichlet prior, the beta marginals were used in a Gaussian copula function to model the joint prior distribution of multinomial probabilities. This required further conditional quartile assessments to describe the correlation structure between these probabilities. The monotonicity of the Gaussian copula transformation allowed conditional quartiles of the multinomial probabilities to be transformed into normal quartiles.
The latter were used to obtain product-moment correlations for normal variates. This powerful technique of transforming quartiles avoids the difficulties encountered when transforming product-moment correlations. Structured assessment of the conditional quartiles has been used to ensure that the elicited variance-covariance matrix is positive-definite.

The conditional quartile assessments that were used to elicit correlations for a Gaussian copula prior were also used in a new method for eliciting a logistic normal prior distribution for multinomial probabilities. Quantifying expert opinion as a logistic normal prior raised some interesting points that do not seem to have arisen in elicitation contexts before. We made use of the natural approximation of the lognormal sum by another lognormally distributed random variable. In addition, our proposed method has extensively used the notion of the singular multivariate normal distribution; the available literature shows that the conditional properties of the singular normal distribution are nearly identical to the corresponding properties of the standard normal distribution. These results were used to prove that the medians, not only the means, of multinomial probabilities must sum to one, assuming they follow a logistic normal distribution. This was critical in building the elicitation method, as it enables assistance to be given to the expert that leads to statistically coherent assessments.

The four proposed prior distributions are interrelated regarding the assessments that they use. Each type of assessment can be used to elicit more than one prior distribution. The Prior Elicitation Graphical Software package for Multinomial models, PEGS-Multinomial, which is freely available at http://statistics.open.ac.uk/elicitation, arranges the assessment tasks that are required for the four proposed prior distributions.
Software is also available that elicits each of the prior distributions separately. The flowchart in Figure 10.2 shows the options for prior distributions that are available in PEGS-Multinomial and the corresponding assessments that they require. For example, it shows that a Gaussian copula prior is elicited using two types of assessments, and that a standard Dirichlet prior is elicited using either marginal or conditional assessments, as discussed before. Since conditional beta assessments can be used to elicit both the standard and generalized Dirichlet distributions, the software gives the option of eliciting both of them using the same conditional quartiles.

[Figure 10.2: A flowchart of the prior elicitation software for multinomial models. Starting from "Which Prior?", the chart routes through "Marginal Beta Assessments", "Conditional Beta Assessments" and "Conditional Quartiles Assessments" to the "Standard Dirichlet Prior", "Generalized Dirichlet Prior", "Gaussian Copula Prior" and "Logistic Normal Prior".]

All the proposed prior elicitation methods for multinomial models and their implementing software have been used in examples by real experts. In all examples, the experts suggested the problem according to their fields of expertise. They understood the multinomial formulation and were keen to participate in the elicitation process. After a brief discussion about the ideas of the bisection method and conditional assessments, they had no problem in assessing quartiles and conditional quartiles. All the experts expressed the view that visualization of the problem had helped them a lot in quantifying their opinions. They also made use of the coherent suggestions given by the software and used the feedback options to revise some of their assessments.
Thus the software proved important in providing visualization, coherent suggestions and feedback. It also helped the experts review and revise their assessments, and reduced the time taken by the elicitation processes.

Future research in assessment methods for GLMs may include eliciting prior distributions for the overdispersion parameters in binomial and Poisson GLMs. In these important GLMs, it is common that the data show a greater variability than the theoretical variability assumed by the model. However, no elicitation method has been proposed in the literature for quantifying opinion about overdispersion parameters. A reasonable approach might be to assume a generalized binomial distribution or a generalized Poisson distribution for the response variable, instead of the standard binomial or Poisson distributions. These generalized distributions have extra parameters that allow for overdispersion. Methods of assessing suitable prior distributions for these extra parameters need to be developed.

Another extension to the proposed method for GLM elicitation concerns the proportional hazards model. This model, also known as the Cox regression model, is often used to model survival data in medical research. See, for example, Collett (1994). Due to its wide practical importance, a large body of research has been devoted to investigating both theoretical and applied aspects of Bayesian analysis of a proportional hazards model. See Ibrahim and Chen (1998) and Zuashkiani et al. (2008), among others. Quantifying opinion about these models has also attracted some attention. See, for example, Chaloner et al. (1993) and Henschel et al. (2009). The current GLM elicitation methods need to be adapted to handle a proportional hazards model.

The method of eliciting logistic normal prior distributions for multinomial models has already been extended further in Chapter 9.
The extended method treats the case of multinomial models in which classification probabilities are influenced by explanatory covariates. Specifically, we proposed a method that quantifies opinion about the parameters of a baseline multinomial logit model as a multivariate normal prior distribution. The method uses conditional median and quartile assessments for the classification probabilities at different combinations of the explanatory variables. These assessments have been obtained in a structured way that yields the mean vector and positive-definite variance-covariance matrix of the prior multivariate normal distribution.

Another desirable extension would be to elicit a logistic normal prior distribution for the cell probabilities of contingency tables. The logistic normal distribution is considered a reasonable prior for contingency tables, see for example Goutis (1993). Hence, our proposed elicitation method for a logistic normal prior promises to be useful in further contexts.

Other models for which elicitation methods still need to be developed include time series analysis, extreme values analysis and modelling the spread of infectious diseases. These models sometimes investigate cases for which data are scarce, the events are rare, or situations are new and uncontrollable. Expert opinion is highly important in such situations, so the need for appropriate elicitation methods is clear.

Bibliography

Abadir, K. M. and Magnus, J. R. (2005). Matrix Algebra. Cambridge University Press, New York.

Agresti, A. (2002). Categorical Data Analysis. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., New Jersey, second edition.

Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society, Series B, 44, 139-177.

Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman and Hall, London.

Al-Awadhi, S. A. (1997).
Elicitation of Prior Distribution for a Multivariate Normal Distribution. Ph.D. thesis, University of Aberdeen, UK.

Al-Awadhi, S. A. and Garthwaite, P. H. (1998). An elicitation method for multivariate normal distributions. Communications in Statistics-Theory and Methods, 27, 1123-1142.

Al-Awadhi, S. A. and Garthwaite, P. H. (2001). Prior distribution assessment for a multivariate normal distribution: An experimental study. Journal of Applied Statistics, 28, 5-23.

Al-Awadhi, S. A. and Garthwaite, P. H. (2006). Quantifying expert opinion for modelling fauna habitat distributions. Computational Statistics, 21, 121-140.

Albajar, R. A. and Fidalgo, J. F. L. (1997). Characterizing the general multivariate normal distribution through the conditional distributions. Extracta Mathematica, 12, 15-18.

Albert, J. H. and Gupta, A. K. (1982). Mixtures of Dirichlet distributions and estimation in contingency tables. The Annals of Statistics, 10, 1261-1268.

Beaulieu, N. C. and Xie, Q. (2004). An optimal lognormal approximation to lognormal sum distributions. IEEE Transactions on Vehicular Technology, 53, 479-489.

Bedrick, E. J., Christensen, R., and Johnson, W. (1996). A new perspective on priors for generalized linear models. Journal of the American Statistical Association, 91, 1450-1460.

Bland, R. P. and Owen, D. B. (1966). A note on singular normal distributions. Annals of the Institute of Statistical Mathematics, 18, 113-116.

Bunn, D. W. (1978). Estimation of a Dirichlet prior distribution. Omega, 6, 371-373.

Bunn, D. W. (1979). Estimation of subjective probability distributions in forecasting and decision making. Technological Forecasting and Social Change, 14, 205-216.

Chaloner, K. and Duncan, G. T. (1983). Assessment of a beta prior distribution: PM elicitation. The Statistician, 32, 174-180.

Chaloner, K. and Duncan, G. T. (1987). Some properties of the Dirichlet-multinomial distribution and its use in prior elicitation.
Communications in Statistics-Theory and Methods, 16, 511-523.

Chaloner, K., Church, T., Louis, T. A., and Matts, J. P. (1993). Graphical elicitation of a prior distribution for a clinical trial. The Statistician, 42, 341-353.

Chen, M.-H. and Dey, D. K. (2003). Variable selection for multivariate logistic regression models. Journal of Statistical Planning and Inference, 111, 37-55.

Chen, M.-H. and Ibrahim, J. G. (2003). Conjugate priors for generalized linear models. Statistica Sinica, 13, 461-476.

Chen, M.-H., Ibrahim, J. G., and Yiannoutsos, C. (1999). Prior elicitation, variable selection and Bayesian computation for logistic regression models. Journal of the Royal Statistical Society, Series B, 61, 223-242.

Chen, M.-H., Ibrahim, J. G., and Shao, Q.-M. (2000). Power prior distributions for generalized linear models. Journal of Statistical Planning and Inference, 84, 121-137.

Chen, M.-H., Ibrahim, J. G., Shao, Q.-M., and Weiss, R. E. (2003). Prior elicitation for model selection and estimation in generalized linear mixed models. Journal of Statistical Planning and Inference, 111, 57-76.

Chen, M.-H., Huang, L., Ibrahim, J. G., and Kim, S. (2008). Bayesian variable selection and computation for generalized linear models with conjugate priors. Bayesian Analysis, 3, 585-614.

Clemen, R. T. and Reilly, T. (1999). Correlations and copulas for decision and risk analysis. Management Science, 45, 208-224.

Clemen, R. T., Fischer, G. W., and Winkler, R. L. (2000). Assessing dependence: Some experimental results. Management Science, 46, 1100-1115.

Collett, D. (1994). Modelling Survival Data in Medical Research. Chapman and Hall, London.

Connor, R. J. and Mosimann, J. E. (1969). Concepts of independence for proportions with a generalization of the Dirichlet distribution. Journal of the American Statistical Association, 64, 194-206.

Daneshkhah, A. and Oakley, J. (2010). Eliciting multivariate probability distributions. In K.
Bocker, editor, Rethinking Risk Measurement and Reporting: Volume I. Risk Books, London.

Demarta, S. and McNeil, A. J. (2005). The t copula and related copulas. International Statistical Review, 73, 111-129.

Denham, R. and Mengersen, K. (2007). Geographically assisted elicitation of expert opinion for regression models. Bayesian Analysis, 2, 99-136.

Dickey, J. M. (1968). Three multidimensional-integral identities with Bayesian applications. The Annals of Mathematical Statistics, 39, 1615-1627.

Dickey, J. M. (1983). Multiple hypergeometric functions: Probabilistic interpretations and statistical uses. Journal of the American Statistical Association, 78, 628-637.

Dickey, J. M., Jiang, J. M., and Kadane, J. B. (1983). Bayesian methods for multinomial sampling with noninformatively missing data. Technical Report 6/83 - #15, State University of New York at Albany, Department of Mathematics and Statistics.

Dickey, J. M., Dawid, A. P., and Kadane, J. B. (1986). Subjective probability assessment methods for multivariate-t and matrix-t models. In P. Goel and A. Zellner, editors, Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pages 177-195. North-Holland, Amsterdam.

Fan, D. Y. (1991). The distribution of the product of independent beta variables. Communications in Statistics-Theory and Methods, 20, 4043-4052.

Fenton, L. F. (1960). The sum of log-normal probability distributions in scatter transmission systems. IRE Transactions on Communications Systems, CS-8, 57-67.

Fischer, M. H. (2001). Cognition in the bisection task. TRENDS in Cognitive Sciences, 5, 460-462.

Flanagan, M. T. (2011). Michael Thomas Flanagan’s Java scientific library. http://www.ee.ucl.ac.uk/~mflanaga/java/. [Accessed 9 March 2011].

Forster, J. J. and Skene, A. M. (1994). Calculation of marginal densities for parameters of multinomial distributions. Statistics and Computing, 4, 279-286.

Frees, E. W. and Valdez, E. A. (1998).
Understanding relations using copulas. North American Actuarial Journal, 2, 1-25.

Garthwaite, P. H. (1994). Assessment of prior distributions for regression models: An experimental study. Communications in Statistics-Simulation and Computation, 23, 871-895.

Garthwaite, P. H. (1998). Quantifying expert opinion for modelling habitat distributions. Technical report, Sustainable Forest Management 1998/02, Department of Natural Resources Queensland.

Garthwaite, P. H. and Al-Awadhi, S. A. (2001). Non-conjugate prior distribution assessment for multivariate normal sampling. Journal of the Royal Statistical Society, Series B, 63, 95-110.

Garthwaite, P. H. and Al-Awadhi, S. A. (2006). Quantifying opinion about a logistic regression using interactive graphics. Technical Report 06/07, Statistics Group, The Open University, UK.

Garthwaite, P. H. and Al-Awadhi, S. A. (2011). Quantifying subjective opinion about generalized linear and piecewise-linear models. In preparation.

Garthwaite, P. H. and Dickey, J. M. (1985). Double- and single-bisection methods for subjective probability assessment in a location-scale family. Journal of Econometrics, 29, 149-163.

Garthwaite, P. H. and Dickey, J. M. (1988). Quantifying expert opinion in linear regression problems. Journal of the Royal Statistical Society, Series B, 50, 462-474.

Garthwaite, P. H. and Dickey, J. M. (1992). Elicitation of prior distributions for variable selection problems in regression. Annals of Statistics, 20, 1697-1719.

Garthwaite, P. H. and O’Hagan, A. (2000). Quantifying expert opinion in UK water industry: An experimental study. The Statistician, 49, 455-477.

Garthwaite, P. H., Kadane, J. B., and O’Hagan, A. (2005). Statistical methods for eliciting probability distributions. Journal of the American Statistical Association, 100, 680-701.

Garthwaite, P. H., Chilcott, J. B., Jenkinson, D. J., and Tappenden, P. (2008).
Use of expert knowledge in evaluating costs and benefits of alternative service provisions: A case study. International Journal of Technology Assessment in Health Care, 24, 350-357.

Gautschi, W. (1998). The incomplete gamma functions since Tricomi. In Tricomi’s Ideas and Contemporary Applied Mathematics, Atti dei Convegni Lincei, n. 147, Accademia Nazionale dei Lincei, Roma, pages 203-237.

Genz, A. and Kwong, K. (1999). Numerical evaluation of singular multivariate normal distributions. Journal of Statistical Computation and Simulation, 68, 1-21.

Good, I. J. (1976). On the application of symmetric Dirichlet distributions and their mixtures to contingency tables. The Annals of Statistics, 4, 1159-1189.

Goutis, C. (1993). Bayesian estimation methods for contingency tables. Journal of the Italian Statistical Society, 2, 35-54.

Grunwald, G. K., Raftery, A. E., and Guttorp, P. (1993). Time series of continuous proportions. Journal of the Royal Statistical Society, Series B, 55, 103-116.

Gupta, A. K. and Nadarajah, S. (2004). Products and linear combinations. In A. K. Gupta and S. Nadarajah, editors, Handbook of Beta Distribution and Its Applications. Marcel Dekker, Inc., New York.

Hankin, R. K. S. (2010). A generalization of the Dirichlet distribution. Journal of Statistical Software, 33, 1-18.

Henschel, V., Engel, J., Holzel, D., and Mansmann, U. (2009). A semiparametric Bayesian proportional hazards model for interval censored data with frailty effects. BMC Medical Research Methodology, 9:9. Available at http://www.biomedcentral.com/1471-2288/9/9 [Accessed 27th February 2009].

Hogarth, R. M. (1975). Cognitive processes and the assessment of subjective probability distributions. Journal of the American Statistical Association, 70, 271-289.

Hora, S. C., Hora, J. A., and Dodd, N. G. (1992). Assessment of probability distributions for continuous random variables: A comparison of the bisection and fixed value methods.
Organizational Behavior and Human Decision Processes, 51, 133-155.

Hughes, G. and Madden, L. V. (2002). Some methods for eliciting expert knowledge of plant disease epidemics and their application in cluster sampling for disease incidence. Crop Protection, 21, 203-215.

Ibrahim, J. G. and Chen, M.-H. (1998). Prior distributions and Bayesian computation for proportional hazards models. Sankhya: The Indian Journal of Statistics, Spl Series, 48-64.

Ibrahim, J. G. and Laud, P. W. (1991). On Bayesian analysis of generalized linear models using Jeffreys’s prior. Journal of the American Statistical Association, 86, 981-986.

Ibrahim, J. G. and Laud, P. W. (1994). A predictive approach to the analysis of designed experiments. Journal of the American Statistical Association, 89, 309-319.

James, A., Low Choy, S., and Mengersen, K. L. (2010). Elicitator: an expert elicitation tool for regression in ecology. Environmental Modelling & Software, 25, 129-145.

Jenkinson, D. J. (2007). Quantifying Expert Opinion as a Probability Distribution. Ph.D. thesis, The Open University, UK.

Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London.

Johnson, N., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate Distributions, volume 1. Wiley, New York, second edition.

Jouini, M. N. and Clemen, R. T. (1996). Copula models for aggregating expert opinions. Operations Research, 44, 444-457.

Kadane, J. B. and Wolfson, L. J. (1998). Experiences in elicitation. The Statistician, 47, 3-19.

Kadane, J. B., Dickey, J. M., Winkler, R., Smith, W., and Peters, S. (1980). Interactive elicitation of opinion for a normal linear model. Journal of the American Statistical Association, 75, 845-854.

Kadane, J. B., Shmueli, G., Minka, T. P., Borle, S., and Boatwright, P. (2006). Conjugate analysis of the Conway-Maxwell-Poisson distribution. Bayesian Analysis, 1, 363-374.

Khatri, C. G. (1968). Some results for the singular normal multivariate regression models.
Sankhya, 30, 267-280.

Koornwinder, T. H. (2008). On a monotonicity property of the normalized incomplete gamma function. http://staff.science.uva.nl/~thk/art/comment/. [Accessed 5 December 2011].

Krzysztofowicz, R. and Reese, S. (1993). Stochastic bifurcation processes and distributions of fractions. Journal of the American Statistical Association, 88, 345-354.

Kurowicka, D. and Cooke, R. (2006). Uncertainty Analysis with High Dimensional Dependence Modelling. Wiley Series in Probability and Statistics. John Wiley & Sons Ltd, Chichester.

Kwong, K. and Iglewicz, B. (1996). On singular multivariate normal distribution and its applications. Computational Statistics and Data Analysis, 22, 271-285.

Kynn, M. (2005). Eliciting Expert Knowledge for Bayesian Logistic Regression in Species Habitat Modelling in Natural Resources. Ph.D. thesis, Queensland University of Technology, Australia.

Kynn, M. (2006). Designing elicitor: Software to graphically elicit priors for logistic regression models in ecology. Available at http://www.winbugs-development.org.uk/elicitor/files/designing.elicitor.pdf [Accessed 10th October 2008].

Kynn, M. (2008). The ‘heuristics and biases’ bias in expert elicitation. Journal of the Royal Statistical Society, Series A, 171, 239-264.

Leonard, T. (1975). Bayesian estimation methods for two-way contingency tables. Journal of the Royal Statistical Society, Series B, 37, 23-37.

Lewandowski, D. (2008). High Dimensional Dependence: Copulae, Sensitivity, Sampling. Ph.D. thesis, Delft University of Technology, Netherlands.

Lindley, D. V., Tversky, A., and Brown, R. V. (1979). On the reconciliation of probability assessments. Journal of the Royal Statistical Society, Series A, 142, 146-180.

Lochner, R. H. (1975). A generalized Dirichlet distribution in Bayesian life testing. Journal of the Royal Statistical Society, Series B, 37, 103-113.

Low Choy, S., O’Leary, R., and Mengersen, K. (2009).
Elicitation by design in ecology: Using expert opinion to inform priors for Bayesian statistical models. Ecology, 90, 265-277.

Low Choy, S., James, A., Murray, J., and Mengersen, K. (2010). Indirect elicitation from ecological experts: From methods and software to habitat modelling and rock-wallabies. In A. O’Hagan and M. West, editors, The Oxford Handbook of Applied Bayesian Analysis. Oxford University Press, Inc., New York.

Mahmoud, A. S. H. (2010). New quadrature-based approximations for the characteristic function and the distribution function of sums of lognormal random variables. IEEE Transactions on Vehicular Technology, 59, 3364-3372.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman and Hall, London, second edition.

Meyer, M. C. and Laud, P. W. (2002). Predictive variable selection in generalized linear models. Journal of the American Statistical Association, 97, 859-871.

Miller, R. B. (1980). Bayesian analysis of the two-parameter gamma distribution. Technometrics, 22, 65-69.

Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308-313.

Nelsen, R. B. (1999). An Introduction to Copulas. Lecture Notes in Statistics, 139. Springer-Verlag, New York.

Oakley, J. (2010). Eliciting univariate probability distributions. In K. Bocker, editor, Rethinking Risk Measurement and Reporting: Volume I. Risk Books, London.

Oakley, J. E. and O’Hagan, A. (2010). SHELF: the Sheffield elicitation framework (version 2.0). School of Mathematics and Statistics, University of Sheffield, UK. http://tonyohagan.co.uk/shelf. [Accessed 9 March 2011].

O’Hagan, A. (1998). Eliciting expert beliefs in substantial practical applications. The Statistician, 47, 21-35.

O’Hagan, A. and Forster, J. (2004). Bayesian Inference, volume 2B of Kendall’s Advanced Theory of Statistics. Arnold, London, second edition.

O’Hagan, A., Buck, C. E., Daneshkhah, A., Eiser, J. R., Garthwaite, P.
H., Jenkinson, D. J., Oakley, J. E., and Rakow, T. (2006). Uncertain Judgements: Eliciting Expert Probabilities. John Wiley, Chichester.

O’Leary, R. A., Low Choy, S., Murray, J. V., Kynn, M., Denham, R., Martin, T. G., and Mengersen, K. (2009). Comparison of three expert elicitation methods for logistic regression on predicting the presence of the threatened brush-tailed rock-wallaby Petrogale penicillata. Environmetrics, 20, 379-398.

Oman, S. D. (1985). Specifying a prior distribution in structured regression problems. Journal of the American Statistical Association, 80, 190-195.

Palomo, J., Insua, D. R., and Ruggeri, F. (2007). Modeling external risks in project management. Risk Analysis, 27, 961-978.

Patel, J. K. and Read, C. B. (1982). Handbook of the Normal Distribution. Marcel Dekker, Inc., New York.

Peterson, C. R. and Beach, L. R. (1967). Man as an intuitive statistician. Psychological Bulletin, 68, 29-46.

Powers, D. A. and Xie, Y. (2000). Statistical Methods for Categorical Data Analysis. Academic Press, San Diego, CA.

Pratt, J. W., Raiffa, H., and Schlaifer, R. (1995). Introduction to Statistical Decision Theory. The MIT Press, London.

Rao, C. R. (2002). Linear Statistical Inference and its Applications. John Wiley & Sons, Inc., New York, second edition.

Rayens, W. S. and Srinivasan, C. (1994). Dependence properties of generalized Liouville distributions on the simplex. Journal of the American Statistical Association, 89, 1465-1470.

Safak, A. (1993). Statistical analysis of the power sum of multiple correlated log-normal components. IEEE Transactions on Vehicular Technology, 42, 58-61.

Schwartz, S. C. and Yeh, Y. S. (1982). On the distribution function and moments of power sums with log-normal components. The Bell System Technical Journal, 61, 1441-1462.

Shields, M., Gorber, S. C., and Tremblay, M. S. (2008). Effects of measurement on obesity and morbidity. Health Reports, Statistics Canada, Catalogue 82-003, 19, 1-8.
Stael von Holstein, C. A. S. (1971). The effect of learning on the assessment of subjective probability distributions. Organizational Behavior and Human Performance, 6, 304-315.

Styan, G. P. H. (1970). Notes on the distribution of quadratic forms in singular normal variables. Biometrika, 57, 567-572.

Sweeting, T. (1981). Scale parameters: a Bayesian treatment. Journal of the Royal Statistical Society, Series B, 43, 333-338.

Tellambura, C. and Senaratne, D. (2010). Accurate computation of the MGF of the lognormal distribution and its application to sum of lognormals. IEEE Transactions on Communications, 58, 1568-1577.

Tian, G. L., Tang, M. L., Yuen, K. C., and Ng, K. W. (2010). Further properties and new applications of the nested Dirichlet distribution. Computational Statistics and Data Analysis, 54, 394-405.

Tricomi, F. G. (1952). Sulla funzione gamma incompleta. Annali di Matematica Pura ed Applicata, 31, 263-279.

van Dorp, J. R. and Kotz, S. (2002). A novel extension of the triangular distribution and its parameter estimation. The Statistician, 51, 63-79.

van Dorp, J. R. and Mazzuchi, T. A. (2000). Solving for the parameters of a beta distribution under two quantile constraints. Journal of Statistical Computation and Simulation, 67, 189-201.

van Dorp, J. R. and Mazzuchi, T. A. (2003). Parameter specification of the beta distribution and its Dirichlet extensions utilizing quantiles. Beta Distributions and Its Applications, 29, 1-37.

van Dorp, J. R. and Mazzuchi, T. A. (2004). Parameter specification of the beta distribution and its Dirichlet extensions utilizing quantiles. In A. K. Gupta and S. Nadarajah, editors, Handbook of Beta Distribution and Its Applications. Marcel Dekker, Inc., New York.

Wallsten, T. S. and Budescu, D. V. (1983). Encoding subjective probabilities: A psychological and psychometric review. Management Science, 29, 151-173.

West, M. (1985).
Generalized linear models: Scale parameters, outlier accommodation and prior distributions. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith, editors, Bayesian Statistics 2, pages 531-558. Elsevier, North-Holland.

West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models. Springer-Verlag, New York, second edition.

West, M., Harrison, P. J., and Migon, H. S. (1985). Dynamic generalized linear models and Bayesian forecasting. Journal of the American Statistical Association, 80, 73-83.

Wilks, S. S. (1962). Mathematical Statistics. John Wiley & Sons, Inc., New York.

Winkler, R. L. (1967). The assessment of prior distributions in Bayesian analysis. Journal of the American Statistical Association, 62, 776-800.

Wong, T. T. (1998). Generalized Dirichlet distribution in Bayesian analysis. Applied Mathematics and Computation, 97, 165-181.

Wong, T. T. (2005). A Bayesian approach employing generalized Dirichlet priors in predicting microchip yields. Journal of the Chinese Institute of Industrial Engineers, 22, 210-217.

Wong, T. T. (2007). Perfect aggregation of Bayesian analysis on compositional data. Statistical Papers, 48, 265-282.

Wong, T. T. (2010). Parameter estimation of generalized Dirichlet distributions from the sample estimates of the first and the second moments of random variables. Computational Statistics and Data Analysis, 54, 1756-1765.

Yi, W. and Bier, V. M. (1998). An application of copulas to accident precursor analysis. Management Science, 44, S257-S270.

Zuashkiani, A., Banjevic, D., and Jardine, A. (2008). Estimating parameters of proportional hazards model based on expert knowledge and statistical data. Journal of the Operational Research Society, pages 1-16. Available at http://www.palgrave-journals.com/jors/journal/vaop/ncurrent/full/jors2008119a.html [Accessed 17th June 2009].