Expert Judgment in Risk Analysis

Stephen C. Hora
2009
Recommended Citation
Hora, Stephen C., "Expert Judgment in Risk Analysis" (2009). Non-published Research Reports. Paper 120.
http://research.create.usc.edu/nonpublished_reports/120
[…] must be made about how to proceed. These include the following: […]

[…] process is to determine the objectives and desired products. Judgments can be made about a number of different things. Some judgments are about facts, while others are about values. Roughly speaking, a fact is something that can be verified unambiguously, while a value is a measure of the desirability of something. For example, you might believe that it is better to expend tax dollars to fight HIV abroad than it is to improve the educational level of impoverished […]

[…] the resources for the more important issues. A sensitivity analysis using initial estimates of probabilities […] An issue put to the experts should satisfy the following:

• It should be resolvable in that, given sufficient time and/or resources, one could conceivably learn whether the event has occurred or learn the value of the quantity in question. Hence, the issue concerns a fact or set of facts.
• It should have a basis upon which judgments can be made and can be justified.

The requirement of resolvability means that the event or quantity is knowable and physically measurable.
We consider a counterexample. In a study of risk from a radioactive plume following a power plant failure, a simple Gaussian dispersion model of the form y = ax^b was employed [1]. In this model, a and b are simply parameters that give a good fit to the relation between x, the downwind distance, and y, the horizontal width of the plume. But not all experts subscribe to this model. More complex alternatives have been proposed with different types of parameters. Asking an expert to provide judgments about a and b violates the first principle above. One cannot verify whether the judgments are correct, experts may disagree on the definition of a and b, and experts who do not embrace the simple model will find the parameters not meaningful. It is very difficult to provide a value for something you do not believe exists.

The second requirement is that there is some knowledge that can be brought to bear on the event or quantity. For many issues, there are no directly applicable data, so data from analogs, models using social, medical, or physical principles, etc., may form the basis for the judgments. If the basis for judgments is incomplete or sketchy, the experts should reflect this by expressing greater uncertainty in their judgments.

Once issues have been identified, it is necessary to develop a statement that presents the issue to the experts in a manner that will not color the experts' responses. This is called framing the issue. Part of framing is deciding which conditions are to be treated as given and which conditions the experts are to integrate the uncertainty about into their responses. For example, in a study of dry deposition of radioactivity, the experts were told that the deposition surface was northern European grassland, but they were not told the length of the grass, which is thought to be an important determinant of the rate of deposition [1]. Instead, the experts were asked to treat the length of grass as an unknown and to incorporate any uncertainty that they might have into their responses. The experts should be informed about those factors that are considered to be known, those that are constrained in value, those that are uncertain, and, perhaps, those that should be excluded from their analyses.

Finally, once an issue has been framed and put in the form of a statement to be submitted to the experts, it should be tested. The best way to do this testing is through a dry run, with stand-in experts who have not been participants in the framing process. Although this seems like a lot of extra work, experience has shown that getting the issue right is both critical and difficult [2]. All too often, the expert's understanding of the question differs from what was intended by the analyst who drafted the question. It is also possible that the question being asked appears to be resolvable to the person who framed the question, but not to the expert who must respond.

Selecting the Experts

The identification of experts requires that one develop some criteria by which expertise can be measured. Generally, an expert is one who "has or is alleged to have superior knowledge about data, models and rules in a specific area or field" [3]. But measuring against this definition requires one to look at indicators of knowledge rather than knowledge per se. The following list contains such indicators:

• research in the area as identified by publications and grants
• citations of work
• degrees, awards, or other types of recognition

Beyond such indicators of expertise, potential experts need to meet some additional requirements. The expert should be free from motivational biases caused by economic, political, or other interest in the decision. The choice of whether to use internal or external experts often hinges on the appearance of motivational biases. Potential experts who are already on a project team may be much easier to engage in an expert judgment process, but questions about the independence of their judgments from project goals may be raised. Experts should be willing to participate, and they should be accountable for their judgments [6]. This means that they should be willing to have their names associated with their specific responses.
At times, physical proximity or availability will be an important consideration.

How the experts are to be organized also impacts the selection. Often, when more than one expert is used, the experts will be redundant of one another, meaning that they will perform the same tasks. In such a case, one should attempt to select experts with differing backgrounds, responsibilities, fields of study, etc., so as to gain a better appreciation of the differences among beliefs. In other instances, the experts will be complementary, each bringing unique expertise to the question. Here, they act more like a team and should be selected to cover the disciplines needed.

Some analyses undergo extreme scrutiny because of the public risks involved. This is certainly the case with radioactive waste disposal or purity of the blood supply. In such instances, the process for selecting (and excluding) experts should be transparent and well documented. In addition to written criteria, it may be necessary to isolate the project staff from the selection process. This can be accomplished by appointing an independent selection committee to seek nominations and make recommendations to the staff [7].

How many experts should be selected? Experience has shown that the differences among experts can be very important in determining the total uncertainty expressed about a question. Clemen and Winkler [8] examine the impact of dependence among experts using a normal model and conclude that three to five experts are adequate. Hora [9] created synthetic groups from the responses of real experts and found that three to six or seven experts are sufficient, with little benefit from additional experts beyond that point. When experts are organized in groups, and each group provides a single response, this advice would apply to the number of groups. The optimal number of experts within a group has not been investigated and is likely to be dependent on the complexity of the issues being answered.

The Quality of Judgments

There is no "true" probability that one can use as a measure of the accuracy of a single elicited probability. For example, consider the question "What is the probability the next elected president of the United States is a woman?" Individuals may hold different probabilities or degrees of belief about this event occurring. There is, however, no physical, verifiable probability that could be known but remains uncertain. The event will resolve as occurring or not, but will not resolve to a frequency or probability.

It is possible to address the goodness of probabilities, however. There are two properties that are desirable to have in probabilities:

• probabilities should be informative
• probabilities should authentically represent uncertainty.

The first property, being informative, means that probabilities closer to 0.0 or 1.0 should be preferred to those closer to 0.5, as the more extreme probabilities provide greater certainty about the outcome of an event. In a like manner, continuous probability distributions that are narrower or tighter convey more information than those that are diffuse. The second property, the appropriate representation of uncertainty, requires consideration of a set of assessed probabilities. For those events that are given an assessed probability of p, the relative frequency of occurrence of those events should approach p.

To illustrate this idea, consider two weather forecasters who have provided precipitation forecasts as probabilities. The forecasts are given to a precision of one digit. Thus a forecast of 0.2 is taken to mean that there is a 20% chance of precipitation. Forecasts from two such forecasters are shown in Figure 1. Ideally, each graph would have a 45° line indicating that the assessed probabilities are faithful in that they correctly represent the uncertainty about reality. Weather Forecaster B's graph shows a nearly perfect relation, while the graph for Forecaster A shows poorer correspondence between the assessed probabilities and relative frequencies, with the actual frequency of rain exceeding the forecast probability. The graph is not even monotonic at the upper end. Graphs showing the relation between assessed probabilities and relative frequencies are called calibration graphs.
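Constructing such a graph is mechanical. The sketch below, using a small hypothetical forecast history rather than the data behind Figure 1, groups forecasts by their stated probability and compares each stated value with the observed relative frequency of rain; these pairs are the points that a calibration graph plots.

```python
from collections import defaultdict

# Hypothetical forecast history: (stated probability of rain, 1 if it
# rained that day, 0 if it did not). Not Forecaster A's or B's real data.
history = [(0.2, 0), (0.2, 1), (0.2, 0), (0.2, 0), (0.5, 1),
           (0.5, 0), (0.8, 1), (0.8, 1), (0.8, 0), (0.8, 1)]

# Group the outcomes by the stated forecast probability. Forecasts are
# given to one-digit precision, so exact grouping is adequate here.
outcomes = defaultdict(list)
for p, rained in history:
    outcomes[p].append(rained)

# For a well-calibrated forecaster, the observed relative frequency in
# each group should lie close to the stated probability (the 45° line).
for p in sorted(outcomes):
    freq = sum(outcomes[p]) / len(outcomes[p])
    print(f"stated p = {p:.1f}   observed frequency = {freq:.2f}")
```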
[Figure 1. Calibration charts for Weather Forecasters A and B, plotting assessed probability of precipitation against observed relative frequency.]
For continuous quantities, let Fi(x) be a set of assessed continuous probability distribution functions and let xi be the corresponding actual values of the variables. If an expert is perfectly calibrated, the cumulative probabilities of the actual values measured on each corresponding distribution function, pi = Fi(xi), will be uniformly distributed on the interval [0,1]. We can use the area between the 45° line of perfect calibration and the observed calibration curve as a measure of miscalibration for continuous distributions.
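A minimal sketch of this check, assuming hypothetical normal assessments: each realized value is mapped through its assessed distribution function to give pi = Fi(xi), and the discrepancy between the sorted pi and uniform plotting positions approximates the area measure just described.

```python
import math

def normal_cdf(x, mu, sigma):
    # Distribution function of a normal, standing in for an assessed F_i.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical assessed distributions (mu, sigma) and realized values x_i.
assessments = [(10.0, 2.0), (5.0, 1.0), (0.0, 3.0), (20.0, 4.0), (8.0, 2.0)]
realized = [11.2, 4.1, -2.5, 26.0, 8.3]

# p_i = F_i(x_i); for a perfectly calibrated expert these are uniform on [0, 1].
p = sorted(normal_cdf(x, mu, s) for (mu, s), x in zip(assessments, realized))

# Approximate the area between the empirical calibration curve and the
# 45° line by comparing each sorted p_i with its uniform plotting
# position i/(n + 1); a value near zero indicates good calibration.
n = len(p)
area = sum(abs(pi - (i + 1) / (n + 1)) for i, pi in enumerate(p)) / n
print(f"miscalibration area ≈ {area:.3f}")
```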
Perfectly calibrated probability distributions may not be informative, however. For example, in an area where it rains on 25% of the days, a forecaster who always predicts a 25% chance of rain will be perfectly calibrated but provide no information from day to day about the relative likelihood of rain. But information and calibration are somewhat at odds. Increasing the information by making probabilities closer to zero or one, or by making distributions tighter, may reduce the level of calibration.

One approach to measuring the goodness of probabilities is through scoring rules. Scoring rules are functions of the assessed probabilities and the true outcome of the event or value of the variable that measure the goodness of the assessed distribution and incorporate both calibration and information into the score. The term strictly proper scoring rule refers to the property that the expected value of the function is maximized when the probabilities or probability functions are identical to the probabilities or probability functions that are used to take the expectation. An example will clarify.

A simple strictly proper scoring rule for the assessed probability p of an event is the Brier or quadratic rule [11]:

S(p) = −(1 − p)² if the event occurs; S(p) = −p² if it does not    (1)

If the expert believes the event has probability q, the mathematical expectation Eq[S(p)] = −q(1 − p)² − (1 − q)p² is maximized with respect to p by setting p = q. Thus, if an expert believes the probability is q, the expert will maximize the perceived expectation by responding with q. In contrast, the linear scoring rule S(p) = −(1 − p) if the event occurs and S(p) = −p if it does not, while intuitively pleasing, does not promote truthfulness. Instead, the expected score is maximized by providing a probability p of either 0.0 or 1.0, depending on whether q is less than or larger than 0.5. Winkler [12] provides a discussion of the Brier rule and other strictly proper scoring rules. See also [6, 10].
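This algebra is easy to confirm numerically. The sketch below assumes a believed probability q = 0.3, searches a grid of possible responses p, and shows that the quadratic rule is maximized by the truthful report p = q while the linear rule drives the report to an extreme.

```python
def expected_quadratic(p, q):
    # E_q[S(p)] for the Brier/quadratic rule of equation (1).
    return -q * (1 - p) ** 2 - (1 - q) * p ** 2

def expected_linear(p, q):
    # E_q[S(p)] for the improper linear rule: S(p) = -(1 - p) if the
    # event occurs and S(p) = -p if it does not.
    return -q * (1 - p) - (1 - q) * p

q = 0.3  # the probability the expert actually believes
grid = [i / 100 for i in range(101)]

best_quadratic = max(grid, key=lambda p: expected_quadratic(p, q))
best_linear = max(grid, key=lambda p: expected_linear(p, q))

print(f"quadratic rule is maximized at p = {best_quadratic:.2f}")  # 0.30
print(f"linear rule is maximized at p = {best_linear:.2f}")        # 0.00
```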
The concept of a strictly proper scoring rule can be extended to continuous distributions [13]. For example, the counterpart to the quadratic scoring rule for continuous densities, where f is the assessed density and w is the value that is realized, is:

S[f(x), w] = 2f(w) − ∫_{−∞}^{∞} f²(x) dx    (2)
Expected scores can sometimes be decomposed into recognizable components. The quadratic rule for continuous densities can be decomposed in the following manner. Suppose that an expert's uncertainty is correctly expressed through the density g(x), but the expert responds with f(x), either through inadvertence or intention. The expected score can be written as follows:

E_g{S[f(x), w]} = I(f) − C(f, g), where I(f) = ∫_{−∞}^{∞} f²(x) dx and C(f, g) = 2 ∫_{−∞}^{∞} f(x)[f(x) − g(x)] dx    (3)

I(f) is the expected density associated with the assessed distribution and is a measure of information. C(f, g) is a nonnegative function that increases as g(x) diverges from f(x). Thus C(f, g) is a measure of miscalibration. Further discussion of decomposition can be found in [10, 14, 15]. Haim [16] provides a theorem that shows how a strictly proper scoring rule can be generated from a convex function. See also Savage [17].
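The decomposition can be checked numerically. In the sketch below, f and g are hypothetical normal densities, I(f) and C(f, g) are approximated on a grid, and a Monte Carlo average of the score in equation (2) under g is compared with I(f) − C(f, g).

```python
import math, random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

f = lambda x: normal_pdf(x, 0.0, 1.0)  # density the expert reports
g = lambda x: normal_pdf(x, 0.5, 1.5)  # density of the expert's true beliefs

# Grid approximations of I(f) and C(f, g) from equation (3).
xs = [-10.0 + 0.01 * i for i in range(2001)]
I = sum(f(x) ** 2 for x in xs) * 0.01
C = 2.0 * sum(f(x) * (f(x) - g(x)) for x in xs) * 0.01

# Monte Carlo expectation, under g, of the quadratic score of equation (2).
random.seed(1)
draws = [random.gauss(0.5, 1.5) for _ in range(200_000)]
expected_score = sum(2.0 * f(w) for w in draws) / len(draws) - I

print(f"I(f) - C(f, g) = {I - C:.4f}")
print(f"E_g[score]     = {expected_score:.4f}")  # agrees to Monte Carlo error
```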
Combining Expert Judgments

There are two situations that may require probabilities or probability distributions from multiple experts to be combined. […] can be accomplished by simple probability manipulations. In more complicated situations, you may need to employ simulation methods to obtain a top event probability or distribution for a quantity. This is a […] of the inherent uncertainty, and doing so creates the problem of having multiple answers when it would be convenient to have a single answer. If you decide to evaluate the risk model separately, using the judgments of each individual expert, and you have multiple points in the model where different experts have given their judgments, the number of separate evaluations of the model is the product of the number of experts used at each place in the model and can be very large. Aggregation of the judgments into a single probability or distribution avoids this problem.

There are two classes of aggregation methods, behavioral and mathematical. Behavioral approaches entail negotiation to reach a representative or consensus distribution. Mathematical methods, in contrast, are based on a rule or formula. The approaches are not entirely exclusive, however, as they may both be used, to a greater or lesser degree, to perform an aggregation.

You may be familiar with the "Delphi" technique, developed at the Rand Corporation in the 1960s by Norman Dalkey [18]. In the Delphi method, the interaction among the experts is tightly controlled. In fact, they do not meet face to face but remain anonymous to one another. This is done to eliminate the influence that one might have because of position or personality. The judgments are exchanged among the experts, along with the reasoning for the judgments. After viewing all the judgments and rationales, the experts are given the opportunity to modify their judgments. The process is repeated – exchanging judgments and revising them – until the judgments become static or have converged to a consensus. Oftentimes, it will be […]

A related behavioral approach is the nominal group technique [19], in which a facilitator asks each expert to provide the idea or judgment and records this on a public media display such as a white board, flip chart, or computer screen. There may be several rounds of ideas/judgments, which are then followed by […]

Kaplan [20] proposes a behavioral method for combining judgments through negotiation with a facilitator. The facilitator and experts meet together to discuss the problem. The facilitator's role is to bring out information from the experts and interpret a "consensus body of evidence" that represents the aggregated wisdom of the group.
A wide range of mathematical methods for combining probability judgments have been proposed. Perhaps the simplest and most widely used is a simple average, termed the linear opinion pool [21]. This technique applies equally well to event probabilities and continuous probability densities or distributions. It is important to note that with continuous distributions, it is the probabilities, not the values, that are averaged. For example, it is tempting, given several medians, to average the medians, but this is not the approach we are referring to. An alternative to the simple average is to provide differential weights to the various experts, ensuring that the weights are nonnegative and sum to one. The values of the weights may be assigned by the staff performing the aggregation, or they may result from some measure of the experts' performance. Cooke [6] suggests that evidence of performance be obtained from assessments of seed variables, quantities whose values are known to the analyst, in the subject matter area addressed by the expert. The experts are given weights based on the product of the p value for the χ² test of calibration and the information, as measured by the entropy in the assessments. A cutoff value is used so that poorly calibrated experts are not included in the combination.
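The distinction between averaging probabilities and averaging values is worth making concrete. In the sketch below, three hypothetical experts give normal distributions for the same quantity; the linear opinion pool averages the distribution functions at each value, and the median of the pooled distribution differs from the weighted average of the experts' medians.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Three hypothetical experts' distributions (mu, sigma) and their weights,
# nonnegative and summing to one.
experts = [(10.0, 1.0), (14.0, 2.0), (25.0, 5.0)]
weights = [0.5, 0.3, 0.2]

def pooled_cdf(x):
    # Linear opinion pool: average the probabilities, not the values.
    return sum(a * normal_cdf(x, mu, s) for a, (mu, s) in zip(weights, experts))

def pooled_median(lo=-1000.0, hi=1000.0):
    # Invert the pooled distribution function at 0.5 by bisection.
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if pooled_cdf(mid) < 0.5 else (lo, mid)
    return (lo + hi) / 2.0

average_of_medians = sum(a * mu for a, (mu, _) in zip(weights, experts))
print(f"median of the pooled distribution: {pooled_median():.2f}")   # ≈ 11.5
print(f"weighted average of the medians:   {average_of_medians:.2f}")  # 14.20
```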
The most elegant approach to combining judgments is Bayesian. Morris [22, 23] and French [24] describe approaches to combining experts using a Bayesian approach. The decision maker can then develop the posterior distribution for the uncertain quantity or event using Bayes' theorem (see Bayes' Theorem and Updating of Belief).

Various mathematical methods for combining probability judgments have different desirable and undesirable properties. Genest and Zidek [25] describe the following property:

Strong set property
The combined probability of an event is a function only of the individual probabilities and maps [0, 1]^n → [0, 1]. In particular, the combination rule is not a function of the event or quantity in question.

This property, in turn, implies the following two properties:

Zero set property
If each assessor, i = 1, . . . , n, provides Pi(A) = 0, then the combined result, Pc(A), should also concur with Pc(A) = 0.

Marginalization property
If a subset of events is considered, the marginal probabilities from the combined distribution will be the same as the combined marginal probabilities.

The strong set property also implies that the combining rule is a linear opinion pool or weighted average of the form

Pc(A) = Σ_{i=1}^n αi Pi(A)    (4)

where the weights, αi, are nonnegative and sum to one.

Another property, termed the independence property, is defined by

Pc(A ∩ B) = Pc(A)Pc(B)    (5)

whenever each expert's probabilities satisfy Pi(A ∩ B) = Pi(A)Pi(B). The linear opinion pool does not satisfy this property: since

Pc(A|B)Pc(B) = Σ_{i=1}^n αi Pi(A|B)Pi(B)    (6)

the pooled probabilities do not, in general, factor, except when one of the weights is one and all others are zero, so that one expert is a "dictator".

The strong set property was used by Dalkey [26] to provide an impossibility theorem for combining rules. Dalkey's theorem adopts seven assumptions, with the strong set property taken as an assumption, and concludes that "there is no aggregation […]" conforming to all seven.
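Before turning to rules that do satisfy independence, the failure just described is easy to exhibit numerically: below, two hypothetical experts each judge events A and B independent, yet the equally weighted pool of equation (4) gives Pc(A ∩ B) ≠ Pc(A)Pc(B); only the "dictator" weighting restores equality.

```python
def linear_pool(values, weights):
    # Linear opinion pool of equation (4).
    return sum(a * v for a, v in zip(weights, values))

# Each expert judges A and B independent, so P_i(A ∩ B) = P_i(A) P_i(B).
pA = [0.9, 0.1]
pB = [0.9, 0.1]
pAB = [a * b for a, b in zip(pA, pB)]

for weights in ([0.5, 0.5], [1.0, 0.0]):  # equal weights, then a "dictator"
    lhs = linear_pool(pAB, weights)                        # pooled P(A ∩ B)
    rhs = linear_pool(pA, weights) * linear_pool(pB, weights)
    print(f"weights {weights}: Pc(A∩B) = {lhs:.3f}, Pc(A)Pc(B) = {rhs:.3f}")
```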
While the linear rule does not conform to the independence property, its cousin, the geometric or logarithmic rule, does. This rule is linear in the log probabilities and is given by

Pc(A) = k ∏_{i=1}^n Pi(A)^{αi}, where αi > 0 and Σ_{i=1}^n αi = 1    (7)

and k is a normalizing constant. The geometric rule also has the property of being externally Bayesian.

Externally Bayesian
The result of applying Bayes' theorem to the individual assessments and then combining the revised probabilities is the same as combining the probabilities and then applying Bayes' theorem.

While the geometric rule is externally Bayesian, it is also dictatorial in the sense that if one expert assigns Pi(A) = 0, the combined result is necessarily Pc(A) = 0. We note that the linear opinion pool is not externally Bayesian.

It is apparent that all the desirable mathematical properties of combining rules cannot be satisfied by a single rule. The selection of a combining method remains an open topic for investigation.
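A sketch of the geometric rule of equation (7) over a discrete set of outcomes, with hypothetical probabilities and weights: the constant k is obtained by renormalizing the weighted products, and the dictatorial zero-forcing behavior appears as soon as one expert assigns an outcome probability zero.

```python
def geometric_pool(prob_rows, weights):
    # Equation (7): Pc ∝ ∏ P_i^{α_i}, with k chosen so the result sums to 1.
    raw = [1.0 for _ in prob_rows[0]]
    for row, alpha in zip(prob_rows, weights):
        raw = [r * (p ** alpha) for r, p in zip(raw, row)]
    k = 1.0 / sum(raw)  # the normalizing constant
    return [k * r for r in raw]

# Two hypothetical experts over three outcomes; expert 2 rules out outcome C.
expert1 = [0.5, 0.3, 0.2]
expert2 = [0.6, 0.4, 0.0]
pooled = geometric_pool([expert1, expert2], [0.5, 0.5])
print([f"{p:.3f}" for p in pooled])  # outcome C is forced to probability 0
```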
Expert Judgment Designs

In addition to defining issues and selecting and training the expert(s), there are a number of questions that must be answered in designing the elicitation. These include the following:

• What type and amount of preliminary information is to be provided to the experts?
• What time and resources will be allocated to the preparation of responses?
• What is the venue – the expert's place of work, the project's home, or elsewhere?
• Will there be training, what kind, and how will it be accomplished?
• Are the names of the experts to be associated with their judgments, and will individual judgments be preserved and made available?

The choices result in the creation of a design for elicitation that has been termed a protocol. Some protocols are discussed in [6, 28–30]. We briefly outline two different protocols that illustrate the range of options that have been employed in expert elicitation studies.

Morgan and Henrion [28] identify the Stanford Research Institute (SRI) assessment protocol as, historically, the most influential in shaping structured probability elicitation. This protocol is summarized in [31]. It is designed around a single expert (subject) and a single analyst engaged in a five-stage process detailed below:

• motivating – rapport with the subject is established and possible motivational biases are explored;
• structuring – the structure of the uncertainty is defined;
• conditioning – the subject is conditioned to think fundamentally about his judgment and to avoid cognitive biases;
• encoding – this is the actual quantification in probabilistic terms;
• verifying – the responses obtained in the encoding are checked for consistency.

The role of the analyst in the SRI protocol is primarily to help the expert avoid psychological biases. The encoding of probabilities roughly follows a script. Stael von Holstein and Matheson [4] provide a script showing how an elicitation session might go forward.

The encoding stage for continuous variables is usually begun by asking for extreme values of the quantity; the expert is then asked to provide a probability of being outside the interval. The process next goes to a set of intermediate values whose cumulative probabilities are assessed with the help of the probability wheel. The probability wheel provides a visual representation of a probability and its complement. Then, an interval technique is used to obtain the median and quartiles. Finally, the judgments are verified by testing for coherence and conformance with the expert's beliefs.

While the SRI protocol was designed for solitary experts, a protocol developed by Sandia Laboratories for the US Nuclear Regulatory Commission [5, 32] was designed to bring multiple experts together. The Sandia protocol consists of two meetings:
First meeting agenda
• presentation of the issues and background materials;
• discussion by the experts of the issues and feedback on the questions;
• a training session, including feedback on judgments.

The first meeting is followed by a period of individual study of approximately 1 month.

Second meeting agenda
• discussion by the experts of the methods, models, and data sources used;
• individual elicitation of the experts.

The second meeting is followed by documentation of rationales and an opportunity for feedback from the experts. The final individual judgments are then combined, using simple averaging, into the final probabilities or distribution functions.

There are a number of significant differences between the SRI and Sandia protocols. First, the SRI protocol is designed for isolated experts, while the Sandia protocol brings multiple experts together and allows them to exchange information and viewpoints. They are not allowed, however, to view or participate in the individual encoding sessions or to comment on one another's judgments. Second, in the SRI protocol […] the probability wheel is today seldom employed by analysts. Third, the Sandia protocol places emphasis on obtaining […] those studies to which it had been applied.

References

[1] Harper, F.T., Hora, S.C., Young, M.L., Miller, L.A., Lui, C.H., McKay, M.D., Helton, J.C., Goossens, L.H.J., Cooke, R.M., Pasler-Sauer, J., Kraan, B. & Jones, J.A. (1994). Probability Accident Consequence Uncertainty Analysis, NUREG/CR-6244, EUR 15855 EN, USNRC and CEC DG XII, Brussels, Vols 1–3.
[2] Hora, S.C. & Jensen, M. (2002). Expert Judgement Elicitation, Swedish Radiation Protection Authority, Stockholm.
[3] Bonano, E.J., Hora, S.C., Keeney, R.L. & von Winterfeldt, D. (1989). Elicitation and Use of Expert Judgment in Performance Assessment for High-Level Radioactive Waste Repositories, NUREG/CR-5411, U.S. Nuclear Regulatory Commission, Washington, DC.
[4] Stael von Holstein, C.-A.S. & Matheson, J.E. (1979). A Manual for Encoding Probability Distributions, SRI International, Menlo Park.
[5] Hora, S.C. & Iman, R.L. (1989). Expert opinion in risk analysis: the NUREG-1150 experience, Nuclear Science and Engineering 102, 323–331.
[6] Cooke, R.M. (1991). Experts in Uncertainty, Oxford University Press, Oxford.
[7] Trauth, K.M., Hora, S.C. & Guzowski, R.V. (1994). A Formal Expert Judgment Procedure for Performance Assessments of the Waste Isolation Pilot Plant, SAND93-2450, Sandia National Laboratories, Albuquerque.
[8] Clemen, R.T. & Winkler, R.L. (1985). Limits for the precision and value of information from dependent sources, Operations Research 33, 427–442.
[9] Hora, S.C. (2004). Probability judgments for continuous quantities: linear combinations and calibration, Management Science 50, 597–604.
[10] Lichtenstein, S., Fischhoff, B. & Phillips, L.D. (1982). Calibration of probabilities: the state of the art to 1980, in Judgment Under Uncertainty: Heuristics and Biases, D. Kahneman, P. Slovic & A. Tversky, eds, Cambridge University Press, Cambridge.
[11] Brier, G. (1950). Verification of weather forecasts expressed in terms of probabilities, Monthly Weather Review 78, 1–3.
[12] […]
[13] […]
[14] […], Journal of Applied Meteorology 11, 273–282.
[15] Murphy, A.H. (1973). A new vector partition of the probability score, Journal of Applied Meteorology 12, 595–600.
[16] Haim, E. […] Proper Scoring Rules, Doctoral dissertation, University of California, Berkeley.
[17] Savage, L.J. (1971). The elicitation of personal probabilities and expectations, Journal of the American Statistical Association 66, 783–801.
[18] Dalkey, N.C. (1967). Delphi, Rand Corporation Report, Santa Monica.
[19] Delbecq, A.L., Van de Ven, A.H. & Gustafson, D.H. (1986). Group Techniques for Program Planning: A Guide to Nominal Group and Delphi Processes, Green Briar Press, Middleton.
[20] Kaplan, S. (1990). 'Expert information' vs 'expert opinions': another approach to the problem of eliciting/combining/using expert knowledge in PRA, Reliability Engineering and System Safety 39, 61–72.
[21] Stone, M. (1961). The linear opinion pool, Annals of Mathematical Statistics 32, 1339–1342.
[22] Morris, P.A. (1974). Decision analysis expert use, Management Science 20, 1233–1241.
[23] Morris, P.A. (1977). Combining expert judgments: a Bayesian approach, Management Science 23, 679–693.
[24] French, S. (1985). Group consensus probability distributions: a critical survey, in Bayesian Statistics 2, J.M. Bernardo, M.H. DeGroot, D.V. Lindley & A.F.M. Smith, eds, North-Holland, pp. 183–201.
[25] Genest, C. & Zidek, J.V. (1986). Combining probability distributions: a critique and annotated bibliography, Statistical Science 1, 114–148.
[26] Dalkey, N. (1972). An Impossibility Theorem for Group Probability Functions, The Rand Corporation, Santa Monica.
[27] Bordley, R.F. & Wolff, R.W. (1981). On the aggregation of individual probability estimates, Management Science 27, 959–964.
[28] Morgan, M.G. & Henrion, M. (1990). Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis, Cambridge University Press, Cambridge.
[29] Merkhofer, M.W. (1987). Quantifying judgmental uncertainty: methodology, experiences, and insights, IEEE Transactions on Systems, Man, and Cybernetics 17, 741–752.
[30] Keeney, R. & von Winterfeldt, D. (1991). Eliciting probabilities from experts in complex technical problems, IEEE Transactions on Engineering Management 38, 191–201.
[31] Spetzler, C.S. & Stael von Holstein, C.-A.S. (1975). Probability encoding in decision analysis, Management Science 22, 340–358.
[32] Ortiz, N.R., Wheeler, T.A., Breeding, R.J., Hora, S., Meyer, M.A. & Keeney, R.L. (1991). The use of expert judgment in the NUREG-1150, Nuclear Engineering and Design 126, 313–331.

Related Articles

Subjective Probability
Uncertainty Analysis and Dependence Modeling
Abstract: Experts are often used to provide uncertainty distributions in risk analyses. They play an important
role when insufficient data exist for quantification, or when the available data or models are conflicting. Multiple
steps are required in constructing a successful expert judgment process. These steps include selecting and
framing the issues, identifying the experts, deciding upon an organization structure, and possibly combining
the distributions from multiple experts.
Expert judgments are normally given as probabilities or probability distributions that express the uncertainty
about future events or unmeasured quantities. The goodness of probabilistic judgments is measured through
calibration and information, which, in turn, can be measured through a scoring rule.
Various behavioral and mathematical methods have been proposed for combining the judgments of experts.
There is, however, no single method that has emerged as the best.