A Critique of Benchmark Theory
Robert Bassett
Final draft: Do not cite without permission
September 1st 2014
Abstract
Benchmark Theory (BT), introduced by Ralph Wedgwood, departs from decision theories of pure expectation maximization like Evidential Decision Theory (EDT) and Causal Decision Theory (CDT), and instead ranks actions according to the desirability of the outcome they produce in each state of affairs compared to a standard—a benchmark—for that state of affairs. Wedgwood motivates BT through what he terms Gandalf's Principle: that the merits of an action in a given state should be evaluated relative only to the performances of other actions in that state, and not to their performances in other states. Although BT succeeds in selecting intuitively rational actions in a number of cases—including some in which EDT or CDT seem to go wrong—it places constraints on rational decision-making that either lack motivation or are untenable. Specifically, I argue that as it stands BT is committed both to endorsing and rejecting the Independence of Irrelevant Alternatives. Furthermore, its requirement that weakly dominated actions be excluded from consideration in rational decision-making lacks motivation and threatens to collide with traditional game theory. In the final section of the paper, I construct a counterexample to BT.1
1 Background
Decision theory considers problems in which an agent—who might be ignorant to some
extent about how the world is—is faced with a choice of (mutually exclusive) actions.
The agent has (or may have, at least) credences, which we will take to be a Kolmogorovian probability distribution, over states, i.e. (roughly) mutually exclusive ways the
world might relevantly be, and a utility function2 that assigns real numbers (values or
1 I am very grateful to Dorothy Edgington, Corine Besson, Dan Adams, Jonathan Nassim, Chris Sykes
and to two anonymous Synthese referees for helpful suggestions and comments on this paper.
2 Specifically, this is typically taken to be a von Neumann-Morgenstern utility function, i.e. it is unique
up to positive affine transformations: neither utility 0 nor utility 1 has special significance, but same-sized
differences in utility are significant. Nothing in what follows turns on the details of these functions. Indeed,
strictly speaking, utility is a measure of subjective preference and Wedgwood is keen to emphasize that he
does not assume in his theory that such preferences are measurable by utility functions. In this paper, I use
utility in a looser sense to denote the payoffs an agent associates with outcomes; these need not be interpreted
as measures of subjective preference, but whatever values Wedgwood needs to underwrite the details of his
theory.
utilities) to outcomes, i.e. pairs of actions and states. This encodes the agent's preferences: outcome O1 is assigned higher utility than outcome O2 iff O1 is preferred to O2.3
1.1 Evidential Decision Theory and Causal Decision Theory
Decision theorists largely agree that, at least in run-of-the-mill cases, rational decisionmaking consists in choosing actions that maximize expectation: the sum of utilities
of outcomes weighted by their subjective chances of occurring.4 However, there is
disagreement over how expectations should be calculated. Evidential Decision Theory5
(henceforth, EDT) states that utilities of outcomes should be weighted by the subjective probability of each state conditional on the action in question, i.e. one should choose an action A
that maximizes news value:6
EV(A) = ∑_i U(A, S_i) Cr(S_i | A)    (1)

where U(A, S_i) is the utility obtained by choosing action A when one is in state S_i and Cr(S_i | A) is the credence that state S_i obtains given that action A is chosen.7
Newcomb’s problem and medical Newcomb cases provide plausible and well-known
counterexamples to EDT.8 In Newcomb’s problem, a highly reliable predictor has two
sealed boxes, A and B, and offers an agent a choice between taking both A and B
or just B. Box A, which is transparent, contains £1000 for certain. Box B contains
£1m if the predictor predicted she would choose just B and nothing if the predictor
predicted she would take both. As the agent's credence that the predictor will have correctly predicted tends to 1, the news value of choosing just B tends to £1m; that of choosing both boxes tends to £1000. Indeed the news value of choosing just B exceeds that of choosing both as long as the agent's credence that the predictor will have made a correct prediction exceeds 1001/2000. However, regardless of how the money has
been distributed in the now sealed boxes, the agent is always better off by £1000 by
choosing both boxes; choosing both dominates choosing just B. Although choosing
just B is correlated with a higher payoff, this should not influence one’s choice because
one’s choice of action plays no role in determining the distribution of monies. Medical
Newcomb cases have a similar structure. Among migraineurs, migraines are correlated
with eating chocolate, but research has demonstrated that chocolate consumption is not
a trigger of migraines.9 Rather, pre-migraine tension can result
3 As sketched in the last footnote, more structure is usually built into these utility functions than mere monotonic preservation of preference orderings, but the details are not important here since they are not relevant for Wedgwood's theory.
4 Some theorists advocate modifications to this basic model in order to account for e.g. the Egan cases
discussed below. In this paper, I will focus only on Wedgwood’s response to these kinds of cases.
5 See Jeffrey (1983) for a classic treatment.
6 This term is taken from Gibbard and Harper (1978).
7 (Rational) conditional credences conform to Bayes' Rule, i.e. Cr(P|Q) = Cr(P∩Q)/Cr(Q), for Cr(Q) ≠ 0.
8 There is, however, more to be said in EDT’s defence against such cases than space permits here. See
e.g. Jeffrey (1983) and Eells (1981) for responses to these cases.
9 For an example of such research, see Marcus et al. (1997).
in a craving for chocolate. Should a migraine sufferer refrain from assuaging her craving? Certainly not on the grounds that doing so will obviate a migraine. Again eating
chocolate is correlated, among sufferers, with migraines, but this fact should play no
role in a sufferer’s decision of whether to eat chocolate, since doing so plays no role in
determining whether she will suffer an episode.
In response to these counterexamples, some philosophers have advocated Causal
Decision Theory (CDT)10 , according to which expected utility is calculated by weighting utilities of outcomes by probabilities of counterfactual conditionals:
EU(A) = ∑_i U(A, S_i) Cr(A → S_i)    (2)
Here Cr(A → Si ) is one’s credence that if one were to choose A, state Si would
obtain. Backtracking conditionals are excluded from consideration.11 Acting in accordance with CDT by maximizing EU, rather than in accordance with EDT and maximizing EV, one chooses the undominated action in Newcomb’s problem because one’s
choice does not result in or cause the money to be present. Given how things are, one
should never leave £1000 on the table. If B has the million then if one were to take A, it
would continue to be the case that B has the million. Likewise, it handles the migraine
case correctly. If a migraineur has a craving for chocolate then if she were to refrain
from eating some, a migraine would (likely) ensue nonetheless.
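Both verdicts are easy to check numerically. The sketch below (mine, with illustrative numbers) computes the two expectations for Newcomb's problem; because the boxes are already sealed, the causal credence q that box B holds the million is the same whichever action is taken.

def expectation(utilities, weights):
    # Works for both theories: EDT passes Cr(S_i | A), CDT passes Cr(A -> S_i).
    return sum(u * w for u, w in zip(utilities, weights))

p = 0.99  # credence that the predictor is correct
q = 0.50  # any action-independent credence that box B holds the million
# States: S1 = 'B contains the million', S2 = 'B is empty'.
one_box, two_box = [1_000_000, 0], [1_001_000, 1_000]
# EDT: one-boxing wins whenever p > 1001/2000.
print(expectation(one_box, [p, 1 - p]) > expectation(two_box, [1 - p, p]))  # True
# CDT: two-boxing wins for every q, by exactly 1000.
print(expectation(two_box, [q, 1 - q]) > expectation(one_box, [q, 1 - q]))  # True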
1.2 Counterexamples to Causal Decision Theory
Egan (2007) gives two counterexamples to CDT. In Murder Lesion, Mary chooses
whether or not to shoot Alfred. Since the world would be a far better place without Alfred, she receives a high payoff if she shoots and hits him. If, however, she shoots but
misses, Alfred, a grudge holder, will respond in a manner disagreeable to Mary, resulting in a very, very low payoff for her. If she does not shoot, she will receive the fairly
low payoff due to anyone suffering under Alfred’s reign of oppression. Mary knows of
a common brain lesion, sufferers of which attempt murder by shooting. Indeed most
murder attempts are committed by such sufferers, compelled to act by the lesion. However, the lesion also causes the aim of sufferers to fail at the crucial moment, and so the
vast majority of their murder attempts fail.
In Psychopath Button, Paul chooses whether or not to press the ‘destroy all psychopaths’ button. Paul prefers there to be no psychopaths to there being any, but also
prefers living in a world with psychopaths to his own death. Paul is fairly certain that
he is not a psychopath and he is quite sure that only a psychopath would press the
button.
Mary should not shoot and Paul should not press the button. They will very likely
cause their own demises if they do not heed this advice. However, CDT handles these
scenarios incorrectly because it ignores the evidence that Mary’s shooting and Paul’s
pressing the button would provide, namely that (it is very likely that) Mary is a terrible
shot and that Paul is a psychopath. CDT ignores this evidence because Mary’s shooting
10 Such philosophers include Stalnaker (1972), Gibbard and Harper (1978) and Lewis (1981).
11 For an exposition of this constraint, see Lewis (1981).
would not result in her being a terrible shot and Paul’s pressing the button would not
result in his being a psychopath.
2 Gandalf’s Principle and Benchmark Theory
In this section I outline Ralph Wedgwood’s response to the Egan counterexamples
described above. Wedgwood builds a decision theory crucially based on Gandalf’s
Principle, a maxim to the effect that agents should evaluate the merits of an action only
relative to the performances of other actions in the same state of the world, and not
make such evaluations across states. The theory assigns benchmarks to relevant states
of the world. Actions are then compared to the benchmark of each state (as described
below). I discuss how benchmarks are assigned to states and make explicit an important
constraint on this assignment.
Wedgwood (2011) argues that rational decision-making should incorporate Gandalf’s Principle (GP):
Gandalf’s Principle: The merits of an action in a given state should be evaluated relative only to the performances of other actions in that state, and not to their performances
in other states.
Gandalf’s Principle strikes me as an eminently sensible principle to incorporate into
rational decision-making. It provides insight into the error one-boxers commit in Newcomb’s problem. A one-boxer might argue as follows:
‘Choosing just box B means it is very likely I shall walk away with a million. Choosing
both boxes means it is very likely I shall walk away with only a piffling thousand. So I
should choose just box B.’
In light of GP, we can see an error in the reasoning. A proponent of GP might counterargue:
‘It’s not part of your choice whether you’ll likely walk away with a million or a thousand. Whatever state you’re actually in now, it does you no good to compare what
you’re best off actually doing now to how you might have done had things been different. That comparison does not serve to reflect the choice that you actually face.
Choosing just box B just looks like wishful thinking!’
Wedgwood develops a theory based on GP. Benchmark theory (BT) requires only that
comparisons of goodness of outcome are meaningful within states and that differences
in goodness can be compared across states. Outcomes (i.e. action-state pairs) are
evaluated relative to a benchmark for the state they occupy. It is in this way that the
theory incorporates Gandalf’s Principle. By comparing the utility of taking an action in
a state to a benchmark for that state, an agent avoids making cross-state comparisons
of actions.
The difference between the utility of an outcome and the benchmark for the state it occupies is the comparative value of that outcome (let CV(A, S_i) denote the comparative value of action A in the ith state, and b_i the benchmark for that state):

CV(A, S_i) = U(A, S_i) − b_i    (3)

The evidentially expected comparative value (ECV) of action A is:

ECV(A) = ∑_i CV(A, S_i) Cr(S_i | A)    (4)
BT, with some qualifications that I will discuss later, advises that a rational action
is that which maximizes ECV. It is a theory of evidentially expected comparative value
maximization.
Briggs (2010) shows that BT agrees with CDT in any case where states are probabilistically independent of action, i.e. one’s credence that one is in a certain state
does not vary across actions. Wedgwood notes that (at least with benchmarks that satisfy certain conditions) in two-action cases with a strictly dominating action (such as
Newcomb's problem), BT will always favour the strictly dominating action.12 Hence, unlike EDT, BT handles cases like Newcomb's problem correctly. Additionally, unlike CDT, it also handles problems like Murder Lesion and Psychopath Button correctly.
2.1 Benchmarks
Wedgwood advocates using weighted averages of utilities in state S as benchmarks for
S. 'Regret' is what he terms the weighting that assigns weight 0 to all outcomes in a state except the best, and 'relief' the weighting that assigns 0 to all outcomes except the worst.
He says:
Every measure of comparative value that results from using one of these
kinds of weighted averages to set the relevant benchmarks is, in effect, a
mixture of the two extreme measures that I mentioned earlier, “regret” and
“relief”. (Wedgwood 2011, p. 24)
There is presumably a tacit constraint at work here. It is reasonable to demand of a
rational decision theory that it approaches isomorphic decision problems in the same
way. For example, suppose that an agent is faced with a decision problem (decision
problem 1) with the following payoff matrix:
Let the agent have posterior credences Cr(S1 |A) = Cr(S2 |B) = p, i.e. she has the
‘evidential probability’ matrix given in table 2.
Because of the symmetry of the problem it would be odd if a decision theory
yielded, say, A as a rational choice of action and B as an irrational choice. There is
a non-trivial automorphism in the problem (i.e. we can swap A and B and S1 and S2 to
yield a problem that is the same in all relevant respects) that could be exploited to show
12 This does not generalize to cases with more options: BT, as I have presented it thus far, sometimes will
choose a dominated action. See below for discussion.
Table 1: Payoffs 1

Action/State   S1   S2
A              10   1
B              1    10

Table 2: Posterior credences 1

Action/State   S1      S2
A              p       1 − p
B              1 − p   p
that the same decision theory would yield B as a rational choice and A as an irrational
choice. Such a theory would therefore be inconsistent.
A simple way to satisfy this constraint, and to work weighted averages as mixtures of relief and regret in the way that Wedgwood suggests, is to rank outcomes by preference in each state before assigning weights (I take it that this is what Wedgwood has in mind). Specifically, then, in a decision problem with n actions and where Ω_i is the set of outcomes in the ith state, we take r_i : Ω_i → {x ∈ Z+ : x ≤ n} such that r_i(A_j, S_i) < r_i(A_k, S_i) if outcome ⟨A_j, S_i⟩ is strongly preferred to outcome ⟨A_k, S_i⟩, and it is a free choice how to rank outcomes among which the agent is indifferent (since these will be assigned the same utility by the agent, their relative ranking cannot make a difference to the benchmark). Benchmarks are set by choosing some probability mass function w (the weighting) on {x ∈ Z+ : x ≤ n} and taking the benchmark of the ith state to be:

b_i = ∑_j U(A_j, S_i)·w(r_i(A_j, S_i))    (5)
For example, the benchmarks for the problem described in table 1 would be calculated as follows if, say, we used a weighting that assigned 0·9 to the best (first) outcome and 0·1 to the worst (second). In state S1, action A yields the highest payoff, so outcome ⟨A, S1⟩ is ranked first in that state and outcome ⟨B, S1⟩ is ranked second, i.e. r1(A, S1) = 1 and r1(B, S1) = 2. Our weighting, w, assigns weight 0·9 to the first outcome and 0·1 to the second, i.e. w(1) = 0·9 and w(2) = 0·1. Hence we calculate the benchmark for state S1 as follows:

b1 = U(A, S1)·w(r1(A, S1)) + U(B, S1)·w(r1(B, S1))
   = U(A, S1)·w(1) + U(B, S1)·w(2)    (6)
   = U(A, S1) × 0·9 + U(B, S1) × 0·1 = 10 × 0·9 + 1 × 0·1 = 9·1

In state S2, it is action B that has the highest payoff and hence is ranked first, so we have r2(B, S2) = 1 and r2(A, S2) = 2. The benchmark for the state is thus:

b2 = U(A, S2)·w(r2(A, S2)) + U(B, S2)·w(r2(B, S2))
   = U(A, S2)·w(2) + U(B, S2)·w(1)    (7)
   = U(A, S2) × 0·1 + U(B, S2) × 0·9 = 1 × 0·1 + 10 × 0·9 = 9·1
Of course, in this case, the automorphism in the problem described above is what
secures that the benchmarks in both states are the same. In general, this need not be
so. With this way of calculating benchmarks, let the evidentially expected comparative
value of A according to weighting w be denoted ECV^w(A).
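To fix ideas, here is a sketch of how equations (3)–(5) might be implemented; the helper names are mine, and the final lines reproduce the benchmark of 9·1 from (6) and (7) and, setting p = 0·8 in table 2, the symmetry noted above (equal ECVs for A and B).

def benchmarks(payoffs, w):
    """payoffs[action][i] = U(action, S_i); w maps ranks 1..n to weights."""
    n_states = len(next(iter(payoffs.values())))
    bs = []
    for i in range(n_states):
        ranked = sorted(payoffs, key=lambda a: -payoffs[a][i])  # best first
        bs.append(sum(payoffs[a][i] * w[rank]
                      for rank, a in enumerate(ranked, start=1)))
    return bs

def ecv(action, payoffs, cred, w):
    """ECV^w(A) = sum_i (U(A, S_i) - b_i) * Cr(S_i | A), equations (3)-(4)."""
    bs = benchmarks(payoffs, w)
    return sum((payoffs[action][i] - b) * cred[action][i]
               for i, b in enumerate(bs))

payoffs = {'A': [10, 1], 'B': [1, 10]}      # table 1
w = {1: 0.9, 2: 0.1}                        # 0.9 on the best outcome
print(benchmarks(payoffs, w))               # [9.1, 9.1], as in (6) and (7)
cred = {'A': [0.8, 0.2], 'B': [0.2, 0.8]}   # table 2 with p = 0.8
print(ecv('A', payoffs, cred, w), ecv('B', payoffs, cred, w))  # equal, by symmetry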
Different weightings give different results. Wedgwood suggests that every weighting is equally reasonable and offers two ways in which to interpret this. First, an
agent could arbitrarily select a weighting and then maximize evidentially expected
comparative value as chosen by that weighting. Second, we can distinguish between
absolutely rationally permissible and absolutely rationally required actions. It is absolutely rationally required to prefer an action A to an action B iff for every weighting
w, ECV^w(A) ≥ ECV^w(B) and for some weighting v, ECV^v(A) > ECV^v(B). It is absolutely rationally permissible to prefer an action A to an action B iff it is not absolutely rationally required to prefer B to A.
Under the first suggestion it seems that it is rationally required to prefer A to B iff under every weighting w, ECV^w(A) > ECV^w(B). If, under some weighting v, A's ECV is not greater than B's then, if agents can arbitrarily choose weightings, it cannot be that it is rationally required to prefer A to B. An agent could (arbitrarily) choose weighting v, under which B receives at least as great an ECV as A. If A and B are the only actions available then B maximizes ECV under v. If it is not rationally required to prefer A to B then by employing BT it should be possible to choose B, i.e. there should be some weighting such that B's ECV is at least as good as A's.
3 Dominated actions
Dominated actions pose a problem for benchmark theory. It is widely agreed among
decision theorists that—at least in problems with finitely many available actions—a
dominance principle constrains rational choice of action, i.e. dominated actions should
not be chosen by rational agents. There are decision problems in which maximizing expected comparative value results in choosing a dominated action. Wedgwood’s
proposed solution to this problem is to rule out dominated actions from consideration
before expected comparative values are calculated. He offers a supposedly benchmark-theoretic motivation for this solution. In this section, I discuss in more detail how ECV maximization behaves with respect to dominated actions. I consider a challenge to benchmark theory due to Briggs (2010): nearly dominated actions. I argue that Wedgwood's motivation for ruling out dominated actions relies upon the Independence of
Irrelevant Alternatives, a principle with which benchmark theory is inconsistent. Finally I argue that the weak dominance principle should be treated with caution anyway,
and then consider and reject a candidate replacement principle.
3.1 Dominated actions and benchmark theory
Action X strictly dominates action Y iff for every state Si , U(X, Si ) > U(Y, Si ), and
weakly dominates Y iff for every state Si , U(X, Si ) ≥ U(Y, Si ) and for some state S j ,
U(X, S j ) > U(Y, S j ). An action is strictly (weakly) dominated iff there is an action that
strictly (weakly) dominates it.
As stated earlier, in two-action decision problems, BT always (i.e. with any weighting) rejects strictly dominated actions. In light of the constraints on benchmarks suggested above, we can now show a cluster of results concerning how BT behaves with
respect to dominated actions in two-action problems.
Theorem 1 In a decision problem with only two actions A and B, if A weakly dominates B then for any weighting w, ECV^w(A) ≥ ECV^w(B).
Proof 1 Let A and B be the only actions available in an n-state decision problem and let A weakly dominate B. Hence ∀i U(A, S_i) ≥ U(B, S_i) and ∃j U(A, S_j) > U(B, S_j). Since A weakly dominates B, for every i we can choose the ranking function r_i so that r_i(A, S_i) < r_i(B, S_i) (because we have a free choice about the ranking in states where the utility of A is equal to that of B). We choose an arbitrary weighting that assigns w ∈ [0, 1] to the top-ranked action (which is always A) and 1 − w to the bottom-ranked action (which is always B). The benchmark for state S_i is therefore

b_i = w·U(A, S_i) + (1 − w)·U(B, S_i)    (8)

The comparative values in state S_i are:

CV(A, S_i) = U(A, S_i) − b_i = (1 − w)(U(A, S_i) − U(B, S_i))    (9)

CV(B, S_i) = U(B, S_i) − b_i = −w(U(A, S_i) − U(B, S_i))    (10)

Since A weakly dominates B, we have U(A, S_i) ≥ U(B, S_i) for every S_i and, for some S_j, U(A, S_j) > U(B, S_j). Hence ∀i U(A, S_i) − U(B, S_i) ≥ 0 and, since 1 − w ≥ 0,

CV(A, S_i) = (1 − w)(U(A, S_i) − U(B, S_i)) ≥ 0    (11)

Since w ≥ 0,

CV(B, S_i) = −w(U(A, S_i) − U(B, S_i)) ≤ 0    (12)

Similarly, U(A, S_j) − U(B, S_j) > 0 and therefore either 13 or 14 holds:

CV(A, S_j) ≥ 0 > CV(B, S_j)    (13)

CV(A, S_j) > 0 ≥ CV(B, S_j)    (14)

Since credences are non-negative we therefore have:

CV(A, S_j)Cr(S_j | A) + ∑_{i≠j} CV(A, S_i)Cr(S_i | A) ≥ CV(B, S_j)Cr(S_j | B) + ∑_{i≠j} CV(B, S_i)Cr(S_i | B)    (15)

But the LHS and RHS of 15 are respectively ECV^w(A) and ECV^w(B), wherefore ECV^w(A) ≥ ECV^w(B).
Theorem 2 In a two-action decision problem with actions A and B, if A weakly dominates B and the agent facing the problem believes that it is possible that she is in a state in which the utility from choosing A exceeds that from choosing B, in the sense that there is some state S_j such that U(A, S_j) > U(B, S_j) and Cr(S_j | A) + Cr(S_j | B) > 0, then for any weighting w, ECV^w(A) = ECV^w(B) only if w assigns weight 0 or 1 to the top-ranked action.

Proof 2 We have the same set-up as in the proof of theorem 1 and the condition that there is some state S_j such that U(A, S_j) > U(B, S_j) and not both Cr(S_j | A), Cr(S_j | B) are zero. We assume that w ∈ (0, 1) is the weight assigned to the top-ranked action. Again, we choose a ranking function that prefers A in states where A and B are tied. Since 1 > w > 0 we obtain a stronger result than the disjunction of 13 and 14 in the proof of theorem 1:

CV(A, S_j) > 0 > CV(B, S_j)    (16)

Hence, since at least one of Cr(S_j | A) and Cr(S_j | B) is positive, we have:

CV(A, S_j)Cr(S_j | A) > CV(B, S_j)Cr(S_j | B)    (17)

Therefore:

CV(A, S_j)Cr(S_j | A) + ∑_{i≠j} CV(A, S_i)Cr(S_i | A) > CV(B, S_j)Cr(S_j | B) + ∑_{i≠j} CV(B, S_i)Cr(S_i | B)    (18)

But the LHS and RHS of 18 are respectively ECV^w(A) and ECV^w(B), so ECV^w(A) > ECV^w(B), and the ECVs cannot be equal.
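Theorems 1 and 2 can be spot-checked numerically. The sketch below (mine) hard-codes the two-action benchmark w·U(A, S_i) + (1 − w)·U(B, S_i) used in the proofs, with ties resolved in A's favour, and confirms on random problems in which A weakly dominates B that A's ECV is never the smaller.

import random

def ecv_pair(uA, uB, crA, crB, w_top):
    """Two-action ECVs with weight w_top on the better outcome in each state;
    ties are ranked in A's favour, as in the proofs."""
    ecv_a = ecv_b = 0.0
    for i in range(len(uA)):
        b = w_top * max(uA[i], uB[i]) + (1 - w_top) * min(uA[i], uB[i])
        ecv_a += (uA[i] - b) * crA[i]
        ecv_b += (uB[i] - b) * crB[i]
    return ecv_a, ecv_b

random.seed(0)
for _ in range(1000):
    n = random.randint(2, 5)
    uB = [random.uniform(0, 10) for _ in range(n)]
    # A matches B everywhere and strictly beats it in some randomly chosen states.
    uA = [u + random.choice([0, 1]) * random.uniform(0.1, 5) for u in uB]
    a, b = ecv_pair(uA, uB, [1 / n] * n, [1 / n] * n, random.random())
    # Theorem 1; strict (theorem 2) when some strict state has positive
    # credence and the weight is interior.
    assert a >= b - 1e-9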
There are two further results worth noting. It is possible in a two-action decision problem with a weakly dominated action that the ECVs of the actions are equal even if the agent believes it possible to be in a state where the dominating action gives higher utility. By theorem 2 this requires choosing a weighting that assigns 0 or 1 to the top-ranked action. The trick (and this is the only way it can be done) is to set CV(A, S_j)Cr(S_j | A) = CV(B, S_j)Cr(S_j | B) = 0, where A weakly dominates B and S_j is any state in which A outperforms B, by choosing the weighting such that one of the comparative values is 0 (by setting the weight assigned to the top-ranked action to 0 or 1) and 'setting' the credence on the other side to 0.
In a two-action problem with a strictly dominated action, the constraint on credences in theorem 2 is met. The agent believes she is in some state, and in such a problem the dominating action gives higher utility in every state, so the agent believes she is in some such state. By theorem 2 the ECVs of the two actions will be the same only if the weight assigned to the top-ranked action is 0 or 1. But the trick described above is now unavailable, since it would involve setting all the agent's posterior credences for one of the actions to 0. Hence a strictly dominating action always has the higher ECV in a two-action problem.
How one interprets these results depends in part on how one regards the relationship between weightings and rational choice of action. At the end of the last section, the two suggestions Wedgwood makes about how to regard this relationship
were outlined. Under both it is permissible to choose a weakly dominated action if
Cr(S j |A) = Cr(S j |B) = 0 whenever U(A, S j ) > U(B, S j ), where A weakly dominates
B. Under the first suggestion, even if this condition does not hold, it is sometimes
permissible to choose a weakly dominated action. This involves choosing a weighting
that assigns 0 or 1 and for credences to hold in the way described above. Under the
second suggestion, it follows as an immediate corollary to theorems 1 and 2 that if this
condition does not hold in a two-action decision problem then it is never permissible
to choose a weakly dominated action in it.
An analogue of these results does not hold for decision problems with more than
two actions. For example, consider the problem (decision problem 2) with the payoff
and evidential probability matrices given in tables 3 and 4.
Table 3: Payoffs 2

Action/State   S1   S2   S3
A              9    0    0
B              6    3    4
C              8    4    5

Table 4: Posterior credences 2

Action/State   S1   S2   S3
A              0    0    1
B              0    1    0
C              1    0    0
According to the weighting that assigns 0·3 to the two top outcomes in a state and 0·4
to the worst, B has the highest ECV, followed by C and A. Hence, according to BT (as
far as I have presented it here) it is rationally permissible to prefer B to A and C, but B
is strictly dominated by C.
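A quick computation confirms this, using tables 3 and 4 as I have reconstructed them above and the 0·3/0·3/0·4 weighting (the helper code is mine):

payoffs = {'A': [9, 0, 0], 'B': [6, 3, 4], 'C': [8, 4, 5]}   # table 3
cred    = {'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 0, 0]}   # table 4
w = {1: 0.3, 2: 0.3, 3: 0.4}                                 # best, middle, worst

def bench(i):
    vals = sorted((payoffs[a][i] for a in payoffs), reverse=True)
    return sum(v * w[r] for r, v in enumerate(vals, start=1))

for a in payoffs:
    print(a, sum((payoffs[a][i] - bench(i)) * cred[a][i] for i in range(3)))
# Benchmarks are 7.5, 2.1, 2.7; the ECVs come out at roughly
# A: -2.7, B: 0.9, C: 0.5, so B ranks highest despite being strictly dominated.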
Wedgwood is aware of this complication and writes:
The problem is that, in fact, in every situation in which we have to make a
choice, there is an enormous number of perfectly dreadful courses of action that are at least physically (if not psychologically) available. (Wedgwood 2011, p. 20)
Wedgwood (2011, p. 23) meets the challenge to BT posed by dominated actions by
ruling that they ‘should not be taken seriously in practical reasoning, and so should
be excluded from consideration altogether.’ In response to Briggs’ suggestion that this
move is ad hoc, Wedgwood argues that the proponent of BT has a perfectly good reason
for ruling out weakly dominated actions – namely that a weakly dominated action A
will lose out on a pairwise comparison with its dominating action B and therefore ‘there
is nothing to be said in favour of A that cannot also be said in favour of B, while there
is something to be said in favour of B that cannot be said in favour of A.’
3.2 Nearly dominated actions
Briggs (2010) raises the case of nearly dominated actions13 as an objection to BT and,
in particular, Wedgwood’s proposal about ruling out weakly dominated actions. These
are derived from weakly dominated actions by sweetening their utility very slightly
13 This is Wedgwood's term.
in some state. We might formally define them as follows: Action A nearly dominates action B iff there is a small positive ε such that there is some state S_j in which U(B, S_j) = U(A, S_j) + ε, some state S_k in which U(A, S_k) > U(B, S_k), and for every state S_i ≠ S_j, U(A, S_i) ≥ U(B, S_i). An action is nearly dominated iff there is some action that nearly dominates it.
Briggs argues that such actions will not be ruled out by Wedgwood’s proposal of
ruling out weakly dominated actions and therefore will be in the running for rationally
permissible actions. Here Wedgwood bites the bullet and replies that they should be. I
agree. Suppose A nearly dominates B and let S j be the state in which B performs ever
so slightly better than A. If an agent has very good reason to think that she is in state
S j then she has very good reason to think that she should choose action B, for it results
in a (very slightly) higher payoff than action A. Under the right circumstances, both
EDT and CDT will agree that B should be chosen. For instance, suppose for some very
small δ the agent has credences Cr(S_j | A) = Cr(S_j | B) = 1 − δ (in the case of EDT) or Cr(A → S_j) = Cr(B → S_j) = 1 − δ (in the case of CDT). Where U_A is the largest payoff A ever results in and U_B is the smallest payoff B ever results in, if δ < ε/(U_A + U_B + ε) then both CDT and EDT prefer B to A. After all, these are theories that maximize expected value, and no matter how small ε is, it is always possible to find credences that result in B having greater expectation than A as long as credences can be arbitrarily small.
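A toy calculation (mine; the payoffs and δ are arbitrary, with δ chosen below the threshold just stated) illustrates the point:

eps, delta = 0.01, 0.0005   # delta < eps / (U_A + U_B + eps) = 0.01 / 11.01
uA = [5.0, 10.0]            # A's payoffs in S1, S2; U_A = 10
uB = [5.0 + eps, 1.0]       # B wins S1 by eps, loses S2; U_B = 1
cr = [1 - delta, delta]     # both actions make S1 all but certain
evA = sum(u * c for u, c in zip(uA, cr))
evB = sum(u * c for u, c in zip(uB, cr))
print(evB > evA)  # True: the expectation maximizer picks B, the nearly dominated action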
Nearly dominated actions are, I think, beside the point. If they are a problem for
BT then they are already a problem for EDT and CDT, but it is far from clear that they
are a problem anyway: it is intuitive that sometimes it could be rational to choose a
nearly dominated action. Indeed, note that two actions can nearly dominate each other.
If these are the only actions available to an agent, one of them must be chosen. It
would be impossible not to choose a nearly dominated action, so it seems unreasonable
to rule that under such circumstances, choosing a nearly dominated action is irrational.
In contrast, weak domination is an asymmetric relation. It is never possible to have two
actions weakly dominate each other.14
Although Briggs’ argument from nearly dominated actions fails, there is nevertheless a case for Wedgwood’s proposal of ruling out weakly dominated actions being ad
hoc. By his own lights, Wedgwood’s argument seems to fail. The reason why concerns
the Independence of Irrelevant Alternatives.
3.3 The Independence of Irrelevant Alternatives
There is an apocryphal story about Sidney Morgenbesser ordering dessert. When
offered a choice between apple pie and blueberry pie by a waitress, Morgenbesser
14 In a decision problem with infinitely many actions to choose from (and only in such problems), every action can be strictly (or weakly) dominated. It does not seem unreasonable, in general, to rule that every action in such a problem is irrational to choose. In an infinite problem, there might always be a better
action available than whatever one chooses, and this is why one’s choice might always be irrational. This
plainly cannot be the case in a finite problem. In particular in a two-action problem in which both actions
are nearly dominated, they cannot both be better than each other. An anonymous reviewer comments that it
is implausible that every action in a decision problem be irrational to choose. While I’m not sure that I agree
with this point, if it is true then it serves to emphasize that it could be rational to choose a nearly dominated
action.
plumped for apple. The waitress took the order and left, only to return a minute later to
say that there was now also cherry pie on the menu. Morgenbesser replied that in that
case, he would take the blueberry. A money-pumping argument can be made against
someone with such preferences. Suppose Morgenbesser alternates between situations
in which he can choose between A and B and situations in which he can choose among
A, B and C. Let us also say that these choices will take the form of trade offers: he
starts with a default option and can offer to trade it by paying some premium for one of
the other options, and we will assume that if he prefers one option to another then there
is some amount he is willing to pay in order to have the former rather than the latter.
We initialize Morgenbesser with a default B, which he will trade, at a premium p > 0,
for A (since this is an A, B choice scenario and he prefers A to B in such a scenario).
If we introduce C onto the menu, Morgenbesser will now trade A back, at a premium
r > 0, for B. We’ve extracted p + r > 0 from Morgenbesser and returned him to his
initial endowment of B. We can iterate this process to extract arbitrarily large amounts
of money from Morgenbesser in return for nothing.
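The pump can be written out mechanically. In the sketch below (mine; the premiums are arbitrary), three rounds of the trading schedule just described extract six units of money while returning Morgenbesser to his initial endowment:

p, r = 1.0, 1.0          # premiums paid for each preferred swap
holding, extracted = 'B', 0.0
for _ in range(3):
    # Menu {A, B}: he prefers A, so he pays p to trade B for A.
    holding, extracted = 'A', extracted + p
    # Menu {A, B, C}: he now prefers B to A, so he pays r to trade back.
    holding, extracted = 'B', extracted + r
print(holding, extracted)  # 'B' 6.0 -- back where he started, 6.0 poorer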
The error, if it can be described as such, that an agent such as Morgenbesser is
committing here is to violate the Independence of Irrelevant Alternatives (IIA), i.e.
his preferences between two options are not constant when new options are provided.
There are situations in which this violation seems very reasonable (and indeed the
money-pumping argument above fails). Sen (1993) discusses three kinds. The first he
terms positional choice. There may be certain rules that govern some choice situations
that upset IIA. Sen’s examples are ‘do not take the largest slice’ and ‘do not take the last
apple’. The second concerns the epistemic value of additional choices. A clean-living
agent may prefer to accept an invitation to take tea with an acquaintance to declining,
but prefer to decline any invitation if the acquaintance offers the option of snorting
cocaine as well. The idea at work in this example is that the very presentation of the
option to snort cocaine has caused the agent to revise her beliefs about how enjoyable
taking tea with her acquaintance would be. Sen’s third case is freedom to reject. The
point of a hunger strike is to reject emphatically the option of eating well. If the only
alternative to a complete fast is scarcely eating at all, an agent’s preferences might
invert. Wedgwood discusses incommensurability as another kind. Where an agent has
a non-total preference relation, certain options A and B may be such that neither is
preferred to the other, yet the agent is not indifferent between them. Perhaps this could
be because the options in question are so unalike and so outré that the agent cannot
compare them. Now a sweetened version of A, A+ (which is just like A except the agent
also gets some minor additional benefit, say a bonus of £1) is strictly preferred to A. A
and B are to be considered rationally permissible when they are the only options, but
in the presence of another option A+, A is no longer considered rationally permissible,
although B might be. Since B is rationally permissible in this situation but A is not, B
can now be considered, in some sense at least, preferred to A.
Wedgwood is aware that BT violates IIA. An action A that is always rejected by
BT in favour of another B when just A and B are available may be chosen by BT when
another action C is available. He offers the following example.15 An agent is faced
15 I have, for clarity of exposition, relabelled actions to accord with my description of IIA above. However,
the structure of the problem and the values used for credences and utilities are true to Wedgwood’s example.
with a two-action, two-state decision problem (problem 3) with the following utilities
and posterior credences:
Table 5: Payoffs 3

Action/State   S1   S2
A              0    900
B              0    1800

Table 6: Credences 3

Action/State   S1    S2
A              0.1   0.9
B              0.9   0.1
Since B weakly dominates A in this example and it involves only non-zero posterior credences, it is easily seen from the results of subsection 3.1 that the ECV of B is always greater than that of A in this problem. Now suppose a new action is introduced to the problem (problem 4) as described in tables 7 and 8.
Table 7: Payoffs 4

Action/State   S1     S2
A              0      900
B              0      1800
C              2000   0

Table 8: Posterior credences 4

Action/State   S1    S2
A              0.1   0.9
B              0.9   0.1
C              0.1   0.9
There are weightings such that in this 'expanded' problem the ECV of A is larger than the ECV of B.16 The preference for B over A of an agent who follows BT with such a weighting in the two-action problem is inverted when the new option C is introduced. This violates IIA.
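The inversion can be verified directly. The sketch below (mine) computes the ECVs for problems 3 and 4, as reconstructed in tables 5–8, using a weighting of 0·5 on the top-ranked outcome and 0·25 on the second; these values fall outside the region 63 < 72w − 16v identified in footnote 16, so B wins the two-action problem while A wins the three-action one:

cr = {'A': [0.1, 0.9], 'B': [0.9, 0.1], 'C': [0.1, 0.9]}

def ecvs(payoffs, weights):
    def bench(i):
        vals = sorted((payoffs[a][i] for a in payoffs), reverse=True)
        return sum(v * wt for v, wt in zip(vals, weights))
    return {a: sum((payoffs[a][i] - bench(i)) * cr[a][i] for i in range(2))
            for a in payoffs}

print(ecvs({'A': [0, 900], 'B': [0, 1800]}, [0.5, 0.5]))
# roughly {'A': -405.0, 'B': 45.0}: B beats A in the two-action problem.
print(ecvs({'A': [0, 900], 'B': [0, 1800], 'C': [2000, 0]}, [0.5, 0.25, 0.25]))
# roughly {'A': -302.5, 'B': -832.5, 'C': -912.5}: with C present, A beats B.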
Wedgwood's response is to reject IIA. He does
...concede that there are many cases where the IIA does indeed hold. In
particular, the IIA certainly seems to hold in cases that involve (i) no incommensurability, and (ii) no probabilistic dependence of the states of
nature on the agent’s choices. (Wedgwood 2011, p. 28)
Sen's counterexamples suggest to me that IIA certainly seems not to hold in certain cases that are nevertheless commensurable and probabilistically choice-independent. Wedgwood's own reason for rejecting IIA, however, seems to be entirely grounded in considerations of incommensurability.17 Yet the example he gives of BT's inconsistency with
16 Wedgwood claims that wherever the benchmark is set, BT requires choosing A. This is false. There is a relatively small region of values (where the weighting given to the top-ranked action is small in comparison to the weighting assigned to the second-ranked action) where the ECV of B exceeds that of A. In particular, 63 < 72w − 16v, where v is the weighting given to the top-ranked action and w that given to the second, is a necessary and sufficient condition for B to receive a higher ECV than A. As an aside, C is never preferred to A in this problem but there is a small region of values in which C receives a higher ECV than B.
17 He cannot use probabilistic choice dependence as any grounds for rejecting IIA because his only reason
for thinking that IIA fails when there is choice dependence is that BT is inconsistent with IIA under these
circumstances. It would be begging the question to use this as a defence of BT. As far as I am aware, he has
no independent reason for thinking that IIA fails in these dependency situations.
IIA has nothing to do with incommensurability. Plainly for BT to work, actions need to
be commensurable within states – otherwise, how are we to obtain comparative values?
It is also clear that Wedgwood’s example is not a case of positional choice, epistemic
value or freedom to reject. So why should we accept that the example Wedgwood
gives is a case where IIA genuinely fails? It cannot be just because BT tells us so.
Wedgwood needs an independent reason to motivate the claim that we should let go
of IIA in this example, particularly since, outside of special circumstances, there are
good reasons, such as the money-pumping argument, for thinking it should constrain
rational decision-making. It does not even seem that the example at hand is of the same
kind as the incommensurability example. In the incommensurability example, we had
a clear reason why B is preferred to A when A+ is available: because A+ knocks A out
of the running. No such explanation is at hand in the example Wedgwood considers.
But even if Wedgwood can find such independent reasons, he is likely to be at
odds with his own argument for ruling out weakly dominated actions. In response to
the charge that it was ad hoc to do so, he gave a reason why a benchmark theorist
should rule out weakly dominated actions, namely that pairwise comparisons in BT
of weakly dominated actions with actions that dominate them result in the dominating
action being preferred under every weighting (modulo the special cases identified in
subsection 3.1). Surely the argument tacitly appeals to IIA – why else think that the
pairwise comparison is of any use in guiding decisions in the more inclusive problem?
However, Wedgwood writes:
The “contracted” choice situation is simply a different situation from the
“expanded” choice situation. It is logically impossible for any agent to
be in both situations at the same time. It cannot be the case both that the
only options available to you are A and B, and also that your available options include a third option C as well. The central idea of my approach is
precisely that the crucial factor in rational decision-making is the way in
which all the available options compare with each other within each state
of nature, and so my approach will obviously accept that two choice situations in which different options are available will be crucially different
from each other—at least so long as all of those options are ones that deserve to be taken seriously by a rational deliberator. (Wedgwood 2011, p.
26)
If weakly dominated actions are not among those that deserve to be taken seriously
by a rational deliberator then Wedgwood owes us an account to explain why. The only
account he gives is that a benchmark theorist has reason to rule them out in pairwise
comparisons, which aside from courting circularity, stands completely at odds with
what he says in the passage quoted above.
3.4 Weakly dominated actions and weakly dominated strategies
There are reasons to be cautious about throwing out weakly dominated actions in all
cases anyway. Call the following principles respectively the weak dominance principle
and the strict dominance principle:
Weak dominance principle: Where all relevant states are causally independent of A
and B, and A weakly dominates B, B should not be chosen.
Strict dominance principle: Where all relevant states are causally independent of A
and B, and A strictly dominates B, B should not be chosen.
The weak dominance principle is stronger than the strict dominance principle in the
sense that it entails it. Some authors use ‘dominance principle’ to mean the weak dominance principle, while others use it to mean the strict dominance principle.18 In this
subsection, I shall present some arguments for being, at least, suspicious of the weak
dominance principle. In the next subsection I discuss the weak dominance principle in
relation to benchmark theory in more detail.
CDT implies the strict dominance principle but not the weak dominance principle.
CDT will reject weakly dominated actions (causally independent of states) in all cases
where the prior credence given to some state in which a weakly dominated action is
defeated by its dominator is non-zero (and it is clear therefore that it always rules out
strictly dominated actions, since such actions always face some state with non-zero
prior credence in which it is defeated by a dominator). Where this is not the case, CDT
does not necessarily reject them. Suppose I have a choice between taking an old lottery
ticket and not taking it. I know that the ticket wasn’t a winner because I already know
the outcome of the lottery. The decision problem (‘Lottery ticket’) I face is described
by table 9.
Table 9: Lottery ticket

Action/State     It's a loser   It's a winner
Take it          £0             £1,000,000
Do not take it   £0             £0
I assign zero credence to the ticket’s being a winner (and deem the state probabilistically independent of my choice) and so it seems perfectly rational not to take it, even
though this is weakly dominated, which is exactly what EDT and CDT advocate anyway. It is also absolutely rationally permissible not to take it under BT, unless we eliminate weakly dominated actions beforehand.19
Let us change the situation a little. I no longer deem the state statistically independent of my choice. I know that I take the ticket if and only if it is a winner. In this case,
I have conditional credences Cr(it is a winner|I take it) = Cr(it is a loser|I do not take
it) = 1. Now EDT both advises me to take it and rules out not taking it. Under BT it is
absolutely rationally required to take it, since the ECV of taking it exceeds the ECV of
not taking it unless a weighting is used that assigns 1 to the top-ranked action, in which
18 Nozick (1969) and Kahneman and Tversky (1986) are examples of the former; Weirich (2004) is an
example of the latter.
19 Note that Briggs’ proof that BT agrees with EDT and CDT in cases where the state of the world is
probabilistically independent of one’s choice of action assumes that weakly dominated actions are not ruled
out beforehand.
case the ECVs are equal. But clearly taking it is bonkers if I know it is a sure loser
because I already know the results of the lottery! A natural line to take here is that,
in problems with finitely many states, both EDT and BT should ignore from all consideration any states that are assigned zero prior credence. There is good independent
reason for doing so: since these are states the agent anticipates will never happen, they
should not be considered at all relevant to the decision. I will return to this proposal in
the next subsection.
There is another, more decisive reason for at least advocating caution with respect
to weakly dominated actions. An open problem in decision theory is how it is to be
unified with game theory. For instance, one of game theory’s main tools for predicting
the behaviour of rational agents is Nash equilibrium, but there is no thorough-going
decision-theoretic explanation for why it should be rational to play as part of an equilibrium. Some games have only (pure) Nash equilibria that involve weakly dominated
strategies. For example consider the two-player game with the following normal form
matrix:
Table 10: Game 1

1/2   L     M     R
U     1,1   1,1   1,0
C     0,1   0,2   3,0
D     1,1   2,2   2,3
The only pure strategy Nash equilibrium of this game is < U, L >, but both U and
L are weakly dominated. The reduced game (matrix below) from which U and L are
eliminated has no pure strategy equilibrium.
Table 11: Game 2: game 1 reduced

1/2   M     R
C     0,2   3,0
D     2,2   2,3
There is a mixed strategy equilibrium in game 1 that effectively involves eliminating both U and L (because they are both played with zero probability in this equilibrium): < (0, 1/3, 2/3); (0, 1/3, 2/3) >, which corresponds to the unique equilibrium in game 2, < (1/3, 2/3); (1/3, 2/3) >. Some Nash equilibrium will always remain after eliminating weakly
dominated strategies in a game with only finitely many strategies.20 However, this fact
does not immediately justify discarding other equilibria. Yet it seems that the sort of
argument Wedgwood gives to justify eliminating weakly dominated actions in decision
20 Proof sketch: First note that since weak domination is irreflexive and transitive, every player with only
finitely many strategies must have some undominated strategy. Nash showed that every finite game has a
Nash equilibrium, and hence if we reduce a (finite) game Γ by eliminating a weakly dominated strategy, the resulting game will have one, π. But every strategy did at least as well in the non-reduced game Γ, since we only eliminated a weakly dominated strategy. Therefore π must contain only best responses in Γ and is
therefore Nash in Γ. It follows by induction that equilibria of Γ will remain regardless of how many weakly
dominated strategies are eliminated (or indeed if they are eliminated iteratively).
problems would also justify eliminating weakly dominated strategies in games. For
example, from player 1’s perspective in game 1, there is nothing to be said in favour of
U that cannot also be said in favour of D.
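These claims about game 1 can be checked mechanically (a sketch of mine, with the payoff matrix as reconstructed in table 10):

rows, cols = ['U', 'C', 'D'], ['L', 'M', 'R']
G = {('U','L'): (1,1), ('U','M'): (1,1), ('U','R'): (1,0),
     ('C','L'): (0,1), ('C','M'): (0,2), ('C','R'): (3,0),
     ('D','L'): (1,1), ('D','M'): (2,2), ('D','R'): (2,3)}

pure_nash = [(r, c) for r in rows for c in cols
             if all(G[(r, c)][0] >= G[(r2, c)][0] for r2 in rows)
             and all(G[(r, c)][1] >= G[(r, c2)][1] for c2 in cols)]
print(pure_nash)  # [('U', 'L')] -- the unique pure equilibrium
# ...yet D weakly dominates U, and M weakly dominates L:
print(all(G[('D', c)][0] >= G[('U', c)][0] for c in cols))  # True
print(all(G[(r, 'M')][1] >= G[(r, 'L')][1] for r in rows))  # True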
It seems reasonable to expect that, if game theory is to be unified with decision theory, the basis upon which a rational agent makes a decision in a non-strategic problem
will be the same as that upon which she makes a decision in a strategic (i.e. gametheoretic) problem. In particular then, we should expect that if the weak dominance
principle holds in decision theory, this will leak into game theory. For, in the sort of
simple game described above, a player faces choice under uncertainty that looks familiar enough to the decision theorist: there are possible states of the world (i.e. whatever
strategies have been chosen by other players) and which state obtains together with the
player’s choice of action determines her payoff. That these states are determined by
the actions of other agents facing similar problems might be relevant in choosing an
optimal action, but it does not seem relevant in characterizing the problem the agent
faces in this more decision-theoretic style. In game 1, for example, player 1 might be
uncertain about whether player 2 has chosen (or will choose) L, M or R. That which of
these states obtains has been decided by player 2, who is a player with such-and-such
preferences, might inform player 1 about what credences are rational to hold about
which state obtains. However, beyond this, it does not seem that there is any basis on
which player 1’s rational choice of action can be described as fundamentally different
to a choice made in a non-strategic context.21 In this case, if the weak dominance
principle holds, it holds for game-theoretic problems too.
There is another problem with eliminating weakly dominated strategies. If rational
players do not play weakly dominated strategies, then rational players with common
knowledge of rationality should eliminate weakly dominated strategies to obtain a reduced game, with common knowledge that it is this reduced game that they are playing.
If the reduced game has weakly dominated strategies, they should likewise eliminate
these by the same rationale, and so on. In other words, eliminating weakly dominated strategies motivates iterated elimination of weakly dominated strategies. However, what remains after weakly dominated strategies have been iteratively eliminated
depends upon the order in which they were removed.
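The order dependence is easy to exhibit. The game used below is a stock textbook example, not one of the games discussed in this paper; the helper (mine) verifies that each deletion really does remove a weakly dominated strategy at the point at which it is made.

G = {('T','L'): (1,1), ('T','R'): (0,0),
     ('M','L'): (1,1), ('M','R'): (2,1),
     ('B','L'): (0,0), ('B','R'): (2,1)}   # M weakly dominates both T and B

def reduce_game(rows, cols, deletions):
    rows, cols = list(rows), list(cols)
    for axis, s in deletions:
        pool, other, p = (rows, cols, 0) if axis == 'row' else (cols, rows, 1)
        cell = (lambda a, b: (a, b)) if axis == 'row' else (lambda a, b: (b, a))
        assert any(all(G[cell(t, o)][p] >= G[cell(s, o)][p] for o in other) and
                   any(G[cell(t, o)][p] > G[cell(s, o)][p] for o in other)
                   for t in pool if t != s), f'{s} not weakly dominated here'
        pool.remove(s)
    return rows, cols

print(reduce_game('TMB', 'LR', [('row', 'T'), ('col', 'L')]))  # (['M', 'B'], ['R'])
print(reduce_game('TMB', 'LR', [('row', 'B'), ('col', 'R')]))  # (['T', 'M'], ['L'])
# Deleting T first leaves outcomes worth (2, 1); deleting B first leaves (1, 1).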
A similar problem concerns the credences that players in a game might be expected
to have given their expectations about what other players will do. Recall that earlier
it was suggested that BT should ignore any states that are assigned zero credence. In
game 3 (below) if both players have common knowledge of rationality and rational
players do not play weakly dominated strategies then player 1 should give zero credence to player 2’s playing L, and so considers herself to face game 4.
U was weakly dominated in game 3 but is not in game 4. In this case, the outcome
might reasonably be expected to be < U, R >, even though it involves a weakly dominated strategy, U. It is difficult to see what a theory of games incorporating the weak
dominance principle might say about such a case.
21 Or at least if there is, this seems to cast a long shadow over the prospects of unifying decision theory
and game theory.
Table 12: Game 3

1/2   L     M     R
U     1,1   2,1   2,3
C     0,1   3,2   1,0
D     2,1   2,2   2,2

Table 13: Game 4

1/2   M     R
U     2,1   2,3
C     3,2   1,0
D     2,2   2,2
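The dominance facts behind games 3 and 4, as reconstructed above, can be checked as follows (the helper is mine):

G3 = {('U','L'): (1,1), ('U','M'): (2,1), ('U','R'): (2,3),
      ('C','L'): (0,1), ('C','M'): (3,2), ('C','R'): (1,0),
      ('D','L'): (2,1), ('D','M'): (2,2), ('D','R'): (2,2)}

def dominates(a, b, cols):
    """True iff row a weakly dominates row b for player 1 over these columns."""
    return (all(G3[(a, c)][0] >= G3[(b, c)][0] for c in cols)
            and any(G3[(a, c)][0] > G3[(b, c)][0] for c in cols))

print(dominates('D', 'U', ['L', 'M', 'R']))   # True: U weakly dominated in game 3
print(any(dominates(x, 'U', ['M', 'R']) for x in 'CD'))  # False: not in game 4
# And <U, R> is a pure equilibrium of game 4: U ties D as a best reply to R,
# while R is player 2's unique best reply to U (3 > 1).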
3.5 Benchmark theory and elimination of weakly dominated actions
I have argued that Wedgwood’s argument for eliminating weakly dominated actions
fails and that there is good reason not to eliminate them in the way he suggests. It
is worth examining how grave a problem this is for the benchmark theorist. After
all, one might think that if the weak dominance principle fails and ECV-maximization
sometimes chooses weakly dominated actions, there is no tension here and no difficulty
the benchmark theorist faces. I do not think that this is the case. For one thing, it is
far less controversial that the strict dominance principle holds, but ECV-maximization
sometimes yields strictly dominated actions as permissible.22
Let us assume that the benchmark theorist has some independently well-motivated
grounds for eliminating strictly dominated actions. Do considerations about dominated
actions pose any further difficulties? The problem for BT with respect to weakly dominated actions is not just that sometimes they do not seem irrational, it is this plus the
fact that sometimes they do seem irrational and BT does not rule them out simpliciter
in those cases. Action A in problem 3 is a case in point. If prior credences about
states are non-zero, A strikes me as irrational to choose here. I do not think I need
to provide a full explanation here of why I think this, but that it is weakly dominated
would doubtless enter into such an explanation were I to give one. Since A sometimes
maximizes ECV in this problem, it behooves the benchmark theorist to explain how it
is to be eliminated as impermissible by benchmark theory. Since I do not think that
in general weakly dominated actions can be systematically eliminated as irrational for
the reasons described in the last subsection, and I do not think that the benchmark theorist has good independent reason for doing so, the benchmark theorist cannot hope to
win me over by appealing to the weak dominance principle. I do not accept that this
principle explains why A is irrational here because I think the principle is false.
If the arguments given in the last subsection are correct and the weak dominance
principle is false, then presumably a good place for the benchmark theorist to start
in eliminating an action like A in the example discussed above is to seek a principle
weaker than weak dominance. Since the defence of weakly dominated actions given
in the previous subsection concerned to a large extent states certain not to occur, this is
a natural place to begin constructing such a principle. Say that action X non-trivially
weakly dominates action Y iff X weakly dominates Y and there is some epistemically
possible state S_j such that U(X, S_j) > U(Y, S_j). An action is non-trivially weakly dominated if some action non-trivially weakly dominates it. Consider the following principle:

Non-trivial weak dominance: Where all relevant states are causally independent of A and B, and A non-trivially weakly dominates B, B should not be chosen.

22 Of course, in the benchmark theorist's defence, independent motivation for the strict dominance principle will be more easily found than that for the weak dominance principle.
At first blush, endorsement of this principle enables the benchmark theorist to exclude undesirable actions like A in the example above, while not making absurd rulings
on (trivial) weakly dominated actions like in the lottery ticket example given earlier.
Furthermore, since every strictly dominated action is non-trivially weakly dominated,
the non-trivial weak dominance principle entails the strict dominance principle. Hence
countenancing non-trivial weak dominance should not count as a further complication
to the theory given that it countenances strict dominance: motivating or justifying this
principle already motivates or justifies strict dominance.
Despite its initial appeal, I will argue that non-trivial weak dominance raises fresh
problems for the benchmark theorist. The problem, which I shall spell out in greater
detail below, roughly, is this. Where an agent believes that if some state turns out to
be actual then she must have taken some particular action23 then it is plausible that if
the agent has ruled out that action then that state is no longer an epistemic possibility.
However, this means that what actions are ruled out by an agent can affect which remaining actions count as non-trivially dominated. The result is that some actions may
or may not be eliminated as irrational depending only on the order in which the agent
chooses to eliminate actions.
The notion of epistemic possibility at play here needs to be clarified. An objection
to the above might run that it is not at all plausible that if an agent rules out such an
action then such a state is no longer an epistemic possibility. After all, even if I rule
out some action as irrational, I might slip or stumble, or be deceived by an evil demon,
or change my mind, say ‘to hell with rationality’ and choose it anyway. This objection cannot get very far. If the non-trivial weak dominance principle is to do work for
the benchmark theorist then a much more robust conception of epistemic possibility
than that articulated in this objection is needed. Although anticipating making an error might sometimes be relevant in rationally choosing an action, it is not helpful to
include such possibilities as relevant epistemic possibilities because so doing threatens to impede the usefulness of the non-trivial weak dominance principle. Suppose I
faced a choice problem in which one action involved being gruesomely murdered and another involved being heavily rewarded, the exact details of each depending upon the state of the world. I could say with firm conviction that it is certain I would not choose
gruesome murder. My choosing it would not be a relevant possibility, even though in
the same breath I could accept that, in some sense, it is an epistemic possibility that
I would make such a choice. Whatever sense that might be is not the one that should
be read in the formulation of non-trivial weak domination and hence of the non-trivial
weak dominance principle, for if it were, the principle would be far too strong. It would
rule out A in the example above, but it would also rule out not taking the lottery ticket.
23 This, of course, should not be given a causal or counterfactual reading. In the lottery ticket example, if
it turns out that the ticket is a winner then I must’ve chosen it. In Newcomb’s problem, if it turns out that
box B has the million then I must’ve chosen to one-box.
I am certain that it is a losing ticket: I saw the results last night; my memory’s good;
I am not drunk or otherwise intoxicated; it was announced on the news that no ticket
bought had won, etc. Now of course, it remains an epistemic possibility in some very
weak sense that it is not a losing ticket (I am or was being deceived by an evil demon,
etc.), but that possibility is not relevant to my decision.
A natural way to define the requisite notion of epistemic possibility, given that it
arises in a decision-theoretic environment, is to do so in terms of agents’ credences. We
cannot say that a state S is epistemically possible for agent α iff α assigns S positive
prior credence. This condition is too strong. A state assigned zero prior credence by
an agent need not be a state the agent believes cannot occur. It is a state the agent
is almost sure cannot be actual. Problems with uncountably many states provide
obvious examples. Suppose I have a spinner that can take values between 0 and
2π. I spin the spinner, keeping its value hidden from you, and invite you to choose an
integer between 0 and 6. I tell you that once you have chosen, I will give you (x − a)²
utils, where x is the spinner’s value and a is your choice. If you uniformly distribute
your credences about the spinner’s value, then for any value between 0 and 2π you will
have credence 0 that the spinner has taken on that value. But clearly we should not
interpret you as believing, for any value, that that value cannot be the spinner’s actual
value.
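The point can be put in symbols. With credences distributed uniformly over [0, 2π], the credence that the spinner takes any particular value v is

Cr(x = v) = ∫ᵥᵛ (1/2π) dx = 0   for every v ∈ [0, 2π],

yet no such v is thereby ruled out as the spinner's actual value.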
There is another problem with defining epistemic possibility in the way described
above. An agent might not have well-defined prior credences over all states when
presented with a decision problem (even though posterior credences for states given
actions might be defined). The above definition precludes such states from counting as
epistemically possible. This is absurd, as the limiting case in which an agent has no
well-defined prior credences over any states demonstrates. Instead, the right way to define
epistemic possibility for present purposes seems to be as follows: a state S is epistemically impossible for agent α iff α's prior credences are such that α is sure that S is
not actual. A state S is epistemically possible iff it is not epistemically impossible. For
simplicity, in the following exposition, we will consider only problems such that if an
agent assigns a proposition zero credence then that agent is sure that that proposition
is false. Suppose that a perfectly reliable predictor put £1 million in box Z iff she
predicted that I’d choose gruesome death. I am sure that I will not choose gruesome
death, so I am sure (since the predictor is infallible) that the predictor did not predict
I would choose gruesome death. I am sure that the actual state of the world is one in
which that prediction wasn’t made, that the predictor did not put £1 million in box Z,
etc. This state of the world is, in the sense described above, epistemically impossible.
This seems like a perfectly reasonable pattern of reasoning to enter into while deliberating about a rational course of action. It should not be confused with the far stronger,
widely disputed and presumably false claim that an agent need assign prior credences
to all available actions before a rational action can be chosen.
Now consider problem 5:
Table 14: Payoffs 5

Action/State    S1    S2    S3
A                4     3     8
B                3     5     8
C                4     5     8

Table 15: Posterior credences 5

Action/State    S1     S2     S3
A              0.4      0    0.6
B                0    0.7    0.3
C                0      0      1

If, when confronted with the problem, the agent regards all three states as live possibilities, i.e. is not sure of any of them that it will not occur, then both A and B
are non-trivially weakly dominated by C. Since A is non-trivially weakly dominated,
our agent employs the non-trivial weak dominance principle to reject it as a rational
course of action. It is ruled out by the agent as a course of action: an action the agent
is sure not to choose. But since state S1 is the actual state only if A is chosen, state S1
is now regarded as epistemically impossible. But since S1 is the only state in which the
utility derived from C exceeds that from B, it is no longer the case that C non-trivially
weakly dominates B. ECV-maximization together with the non-trivial weak dominance
principle thus leaves B as permissible at this stage, although A was eliminated as impermissible. However, if the agent had instead ruled out B as non-trivially weakly dominated when
first approaching the problem, A would, by parallel reasoning, have been left as permissible. Finally,
the agent could have ruled out both A and B at the same time, leaving only C as permissible. Presumably it is this last arrangement of rejections that is the right one, but this
is neither explained nor justified by the non-trivial weak dominance principle. Indeed,
the order dependence undermines the claim that the principle is correct, since it was by
application of that very principle that the other arrangements of rejection were reached. This casts doubt on the coherence of the
principle as a principle governing rational choice, at least in scenarios where states are
not statistically independent of actions.
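The order dependence can be checked mechanically. The following sketch in Python encodes Tables 14 and 15 and treats a state as epistemically possible just in case some still-live action gives it positive posterior credence; this way of operationalizing epistemic possibility is my gloss on the discussion above rather than anything Wedgwood states:

```python
# Problem 5 (Tables 14 and 15). A state counts as epistemically
# possible iff some still-live action gives it positive posterior credence.
STATES = ["S1", "S2", "S3"]

payoffs = {"A": {"S1": 4, "S2": 3, "S3": 8},
           "B": {"S1": 3, "S2": 5, "S3": 8},
           "C": {"S1": 4, "S2": 5, "S3": 8}}

cred = {"A": {"S1": 0.4, "S2": 0.0, "S3": 0.6},
        "B": {"S1": 0.0, "S2": 0.7, "S3": 0.3},
        "C": {"S1": 0.0, "S2": 0.0, "S3": 1.0}}

def possible_states(live):
    """States that some still-live action makes a live possibility."""
    return [s for s in STATES if any(cred[a][s] > 0 for a in live)]

def dominated(x, live):
    """Is x non-trivially weakly dominated by some live action,
    relative to the currently possible states?"""
    states = possible_states(live)
    return any(all(payoffs[y][s] >= payoffs[x][s] for s in states) and
               any(payoffs[y][s] > payoffs[x][s] for s in states)
               for y in live if y != x)

def eliminate(order):
    """Eliminate actions one at a time, in the given order, whenever
    they are dominated relative to the actions still live."""
    live = {"A", "B", "C"}
    for x in order:
        if x in live and dominated(x, live):
            live.discard(x)
    return live

print(eliminate(["A", "B"]))   # A rejected first -> {'B', 'C'} survive
print(eliminate(["B", "A"]))   # B rejected first -> {'A', 'C'} survive
# Simultaneous rejection against the original state set:
print({x for x in "ABC" if not dominated(x, {"A", "B", "C"})})   # {'C'}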
It might be objected that agents do not revise their credences about states in the way
suggested, and so the problem of order dependence described above does not arise. I
withhold judgment about whether it is required by rationality to undertake such revision
(although it seems compelling that, at least in a large range of cases, rational agents
should exploit all the information available to them). Whether agents do in practice is
surely an empirical matter, but it seems very plausible that in certain cases—such as
the ‘gruesome death’ example given above, or Wedgwood’s own examples of ‘dreadful
courses of action’—some agents would do this and would not count as irrational for
doing so.
I have argued against ruling out weakly dominated actions simpliciter (i.e. against
the weak dominance principle). If the weak dominance principle is rejected, an unacceptable gap is left in benchmark theory, since ECV-maximization alone does not suffice to rule out intuitively irrational dominated actions from certain decision problems.
An alternative principle, the non-trivial weak dominance principle, was considered as a
candidate for filling the gap, but was found to be problematic for the reasons described
above.
4 An apparent counterexample to benchmark theory
So far the criticisms of benchmark theory presented in this paper have been ‘bottom
up’, i.e. I have argued that some of the foundational principles, in particular the weak
dominance principle, on which the theory is built are both unmotivated and problematic. In this section I will change tactics and offer a 'top-down' criticism. Even if
BT can be given a solid foundation, there are decision problems for which it yields an
intuitively wrong result. Such a problem is offered here as a counterexample to BT.
Before constructing the counterexample, it will be instructive to consider an example problem that Wedgwood gives. This problem forms part of Wedgwood’s response
to Briggs. He uses it to argue that it is not always irrational to play a nearly dominated
strategy. I agree with his verdict here, but the arguments he employs I will later use
to defend an action that BT rules as irrational to choose. Furthermore, I suggest that
these same arguments applied differently to this very example produce a conclusion
problematic for BT.
The decision problem he considers is as follows (problem 6):
Table 16: Payoffs 6

Action/State    S1      S2      S3
A                1       0       0
B                0    3000    9000
C                0    9000    3000

Table 17: Posterior credences 6

Action/State     S1      S2      S3
A               0.9    0.05    0.05
B              0.05     0.9    0.05
C              0.05    0.05     0.9
There are weightings that will yield under BT any of the three actions as rational
and hence all three are absolutely rationally permissible. Wedgwood argues that this
is the right result and I agree up to a point. It might seem prima facie odd that A is
rationally permissible since B and C possibly result in far larger payoffs (or in Wedgwood’s backstory, possibly save many more lives). However, as Wedgwood points out,
one never chooses between saving one life or thousands in this problem. The world
is such that one can save thousands, which is good, or only one, not so good, but one
is not faced with a choice between these ways the world could be. Indeed it is in just
this sort of problem that Gandalf-style reasoning clarifies the nature of the choice one
faces. Wedgwood offers two ways in which A seems a palatable choice. First, when
state S1 is given a high prior credence, A seems like a very sensible choice (and under
this circumstance CDT will recommend A). Second, Wedgwood suggests that one can
conceive of the problem in a way that makes B and C seem like poor choices. In
state S1 one is pretty powerless to act, but A uses that power prudently. In choosing A
one should believe that one is nearly powerless but using one’s powers wisely. In states
S2 and S3 , one is powerful. Hence in choosing either B or C one should believe that one
is powerful but using one’s powers badly. Here it sounds like Wedgwood is appealing
to ratifiability, in Jeffrey’s (1983) sense. An action is ratifiable iff it maximizes the
utility an agent expects to receive given that she chooses that action. For example, one-boxing in Newcomb's problem is not ratifiable because it does not maximize the utility
one expects to receive given that one has one-boxed. Given that one has one-boxed,
one expects there to be £1,000,000 in box B (and £1000 in box A), and under these
circumstances one maximizes one’s utility by two-boxing. Two-boxing is ratifiable:
given one has two-boxed, one expects there to be nothing in box B and under such circumstances one maximizes one’s utility by two-boxing. In the problem at hand, action
A is the only ratifiable action.
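The definition lends itself to a direct check. Here is a minimal sketch of the Newcomb calculation just given, assuming for simplicity a perfectly reliable predictor, so that one's credences conditional on a choice are concentrated on the matching prediction:

```python
# Newcomb's problem: states are the predictor's two predictions.
# Box A holds £1000; box B holds £1m iff one-boxing was predicted.
payoffs = {"one-box": {"pred-one": 1_000_000, "pred-two": 0},
           "two-box": {"pred-one": 1_001_000, "pred-two": 1000}}

# Credences over states conditional on each choice, assuming the
# predictor is perfectly reliable.
cred = {"one-box": {"pred-one": 1.0, "pred-two": 0.0},
        "two-box": {"pred-one": 0.0, "pred-two": 1.0}}

def eu(act, given):
    """Expected utility of act under the credences one has given one's choice."""
    return sum(cred[given][s] * payoffs[act][s] for s in cred[given])

def ratifiable(act):
    """act is ratifiable iff it maximizes expected utility relative to
    the credences one has given that one chooses it."""
    return all(eu(act, given=act) >= eu(other, given=act) for other in payoffs)

print(ratifiable("one-box"))  # False: given one-boxing, two-boxing
                              # would have yielded £1,001,000 > £1,000,000
print(ratifiable("two-box"))  # True: given two-boxing, it yields £1000 > £0
```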
It strikes me that both of these arguments backfire for the benchmark theorist. In
the first instance, suppose that state S1 has very low prior credence. Presumably precisely the same line of reasoning that made A look very sensible when S1 enjoyed high
prior credence makes it look like a poor choice when S1 has low prior credence. If one
strongly believes oneself to be in S2 or S3 , A seems a terrible choice. Since BT ignores
prior credences, it will still yield A as absolutely rationally permissible under such circumstances. The benchmark theorist can bite the bullet here and maintain that it still is,
but in that case we are surely owed some explanation as to why. Turning to the second
argument, which Wedgwood (2011, p. 31) describes as ‘the right way...to conceive of
this choice situation’, one can argue that A is nevertheless uniquely ratifiable and intuitions to the contrary are mistaken. This is what the Jeffrey-style evidential decision
theorist would maintain. However, such a decision theorist would reject B and C as
irrational because they are not ratifiable. This goes too far for the benchmark theorist,
since B and C are still absolutely rationally permissible. If we conceive of this problem
in ‘the right way’, B and C look like poor choices. If one is, rightly or wrongly, guided
by the maxim ‘Make one’s choice such that one uses wisely the powers one expects to
have given that choice’, then one ought not choose B or C.24
4.1 A counterexample
The observations just made are by no means devastating to benchmark theory. A benchmark theorist could think that, although it is nearly dominated, A is not irrational to
choose and still not endorse either of the arguments Wedgwood gives for thinking it
is not. I will now construct what seems to be a plausible counterexample to BT. A
rational-looking choice of action in this problem is irrational to choose according to
BT. The sorts of argument discussed above seem to vindicate this action as a rational
choice, and it enjoys independent plausibility as such.
The predictor is back (with fresh boxes) and offers you a choice between taking just
box A and taking just box B. If the predictor predicted that you would choose A, she
has put £1m in there and nothing in B. If the predictor predicted that you would choose
B, she has put £5m in there and £4.5m in A. The predictor is as reliable as you like and
the boxes were sealed, etc. after the prediction. What do you choose?
The above can be conceived as follows: each state has a default quantity associated
with it, and a bonus has been placed in the predicted box. In the A-predicted state
the default is £0, with a £1m bonus in A, and in the B-predicted state the default is
£4.5m, with a £500,000 bonus in B. The problem generalizes as in Table 18, where
a is the default when A is predicted and c is the bonus placed in A in that case,
and b is the default when B is predicted and d is the bonus placed in B in that case.
24 An anonymous reviewer suggests that perhaps Wedgwood does not conceive of ratifiability as a necessary condition on rational decision-making, but rather a feature that counts in favour of rationally choosing
an action. If this is so, then Wedgwood can maintain that the ratifiability of A may explain why it is rationally
permissible to choose, while denying that B and C are rationally impermissible simply because they are not
ratifiable choices. However, if Wedgwood does think this, it still seems that we are owed some explanation
of why B and C are rationally permissible when they are to be thought of as using one’s powers badly. Wedgwood might say here that, while this fact should count against them as rational choices, there remain other
considerations in their favour that it does not outweigh.
Table 18: Payoffs 7

Action/State    A predicted    B predicted
A                     a + c              b
B                         a          b + d
Solving the problem with BT: Let us assign weighting w to the highest payoff in any
state and 1 − w to the other payoff. This gives us benchmark (a + c)w + a(1 − w) =
a + cw when A is predicted and benchmark b(1 − w) + (b + d)w = b + dw when B is
predicted. This yields the following comparative values for outcomes:
Table 19: Comparative values 7

Action/State    A predicted    B predicted
A                  c(1 − w)            −dw
B                      −cw        d(1 − w)
If we assume the predictor is perfectly reliable (or more accurately, the agent facing
the problem has credences to that effect) then evidentially expected comparative values
are as follows:
Table 20: Evidentially expected comparative values 7

Action/State    A predicted    B predicted    Total
A                  c(1 − w)              0    c(1 − w)
B                         0       d(1 − w)    d(1 − w)
If w < 1, A has a greater ECV than B if the 'mark-up' under an A prediction (c) exceeds
that under a B prediction (d). When w = 1 (the weighting under which maximizing ECV
amounts to minimizing evidentially expected regret), both A and B have the same ECV
of 0. So, when c exceeds d, according to BT it is absolutely rationally
required to take A. Intuitively it is not irrational to choose B here, so this result is
problematic for BT. Arguably it is not irrational to choose A either. One might be
playing safe: A maximizes minimum gain. If we were told by a reliable witness that he
had seen the predictor putting £1m in A, for instance, it might be rational to choose A.
By the same token, it is rational to choose B if we were told by a reliable witness that
he had seen the predictor putting £5m in B.
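Before moving on, it is worth plugging the original story's figures into Table 20. With c = £1m and d = £500,000 we have, for any weighting w,

ECV_w(A) = c(1 − w) = £1m × (1 − w)   and   ECV_w(B) = d(1 − w) = £500,000 × (1 − w),

so A strictly exceeds B for every w < 1, and the two fall to a common value of 0 only at w = 1.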
Perhaps Wedgwood’s suggestion of the distinction between absolute rational requirement and absolute rational permissibility is too strong. We could turn instead
to his suggestion that an agent arbitrarily selects a weighting and chooses rationally
by choosing an action that maximizes ECV under this weighting. Since B maximizes
ECV when w = 1, BT does not rule out B as a rational choice under this way of viewing
different weightings.
But now consider the problem adjusted ever so slightly: we let the predictor be
arbitrarily reliable, but not perfect. Specifically, there is some ε with 0 < ε ≪ 1 such that
the predictor has a 1 − ε chance of being correct in her prediction. Our agent forms the
following conditional credences:
Table 21: Conditional credences 7′

Action/State    A predicted    B predicted
A                     1 − ε              ε
B                         ε          1 − ε
Comparative values are as before and the evidentially expected comparative values
in this problem are therefore:
Table 22: Evidentially expected comparative values 7′

Action/State    A predicted        B predicted        Total
A               c(1 − w)(1 − ε)    −dwε               c(1 − w)(1 − ε) − dwε
B               −cwε               d(1 − w)(1 − ε)    d(1 − w)(1 − ε) − cwε
Theorem 3 If c > d then the evidentially expected comparative value of A exceeds that
of B regardless of the weighting.
Proof 3 We need to show ECV_w(A) > ECV_w(B) for an arbitrary weighting w. We have

ECV_w(A) = c(1 − w)(1 − ε) − dwε                (19)

and

ECV_w(B) = d(1 − w)(1 − ε) − cwε                (20)

Hence:

ECV_w(A) − ECV_w(B) = (c − d)((1 − w)(1 − ε) + wε)                (21)

We'll denote this last quantity X. We now have three cases to consider: w = 1, w = 0
and 1 > w > 0. If w = 1 then X = (c − d)ε > 0 because ε, c − d > 0. If w = 0 then
X = (c − d)(1 − ε) > 0 because 1 − ε, c − d > 0 (since ε < 1). If 1 > w > 0 then
1 − w > 0, so only positive quantities are added and multiplied on the RHS of
(21), and hence X > 0. By cases, X > 0 and hence ECV_w(A) > ECV_w(B).
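For readers who want a quick sanity check of Theorem 3, the following sketch evaluates the two ECVs on a grid of weightings; the particular values of c, d and ε are illustrative choices of mine, not drawn from the text:

```python
# Numerical spot-check of Theorem 3: with c > d and 0 < eps < 1,
# ECV_w(A) - ECV_w(B) = (c - d)((1 - w)(1 - eps) + w*eps) > 0 for all w.
c, d, eps = 1_000_000, 500_000, 0.01   # illustrative values with c > d

def ecv_A(w):
    return c * (1 - w) * (1 - eps) - d * w * eps

def ecv_B(w):
    return d * (1 - w) * (1 - eps) - c * w * eps

# Check a fine grid of weightings, including the endpoints w = 0 and w = 1.
grid = [i / 1000 for i in range(1001)]
assert all(ecv_A(w) > ecv_B(w) for w in grid)
print("ECV_w(A) > ECV_w(B) on the whole grid")  # B never maximizes ECV
```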
The upshot of theorem 3 is that B maximizes ECV under no weighting and so is
never vindicated as a rational choice of action by BT. The same arguments that applied
earlier apply here: if I have a high prior that the predictor predicted B, then B seems
sensible. Furthermore, B is ratifiable: one uses the powers one expects to have wisely if
one chooses B. Aside from these arguments, B simply does not look like an irrational
choice of action here. Either BT has gone wrong in this case or the benchmark theorist
needs to explain why choosing B, contrary to appearances, really is irrational.
5 Conclusion
A number of objections to benchmark theory have been put forward. In particular, I have suggested that the weak dominance principle, especially if the Independence of Irrelevant
Alternatives is to be rejected, bedevils the theory, and it seems that there are decision
problems for which BT yields wrong results—or at least results that are wrong both
intuitively and by the reasoning that Wedgwood offers elsewhere to defend the results
BT gives for other problems.
There are two conclusions that I want to draw. The first is that Gandalf’s Principle seems worth pursuing as a principle of rational choice and perhaps should find a
foundational role in a decision theory equipped to handle Egan’s counterexamples to
CDT. However, it seems that BT is not a promising avenue along which to do this. If
it is to succeed, then I suggest that, at the very least, a principle more subtle than weak
dominance needs to be motivated and incorporated into the theory.
The second concerns the wider implications of these considerations for the theory of
rational choice. For one thing, I think that such considerations place important
constraints on what a good decision theory might be like. I think there is good reason to
be hesitant to accept any decision theory that entails the weak dominance principle, but
I also think that it is unlikely that good independent grounds for motivating it can be
found. A weaker principle such as non-trivial weak dominance seems more plausible,
and—while it does not seem suitable for BT—it might deserve consideration as another
constraint that a decision theory should satisfy.25
References
[1] Briggs, R. (2010). Decision-Theoretic Paradoxes as Voting Paradoxes. Philosophical Review, 119(1), 1–30.
[2] Eells, E. (1981). Causality, Utility and Decision. Synthese, 48(2), 295–329.
[3] Egan, A. (2007). Some Counterexamples to Causal Decision Theory. Philosophical Review, 116(1), 93–114.
[4] Gibbard, A. and W. Harper. (1978). Counterfactuals and Two Kinds of Expected
Utility. In C. A. Hooker, J. J. Leach, and E. F. McClennen (Eds.) Foundations and
Applications of Decision Theory (pp. 125–62). Dordrecht: Reidel.
[5] Jeffrey, R. (1983). The Logic of Decision (2nd ed.). Chicago: University of
Chicago Press.
[6] Kahneman, D. and A. Tversky (1986). Rational Choice and the Framing of Decisions. The Journal of Business, 59(4), S251–S278.
[7] Lewis, D. (1981). Causal Decision Theory. Australasian Journal of Philosophy,
59(1), 5–30.
[8] Marcus, D. A., Scharff, L., and Turk, D. C. (1997). A Double-Blind Provocative Study of Chocolate as a Trigger of Headache. Cephalalgia, 17, 855–862.
[9] Nozick, R. (1969). Newcomb’s Problem and Two Principles of Choice. In
Nicholas Rescher (Ed.), Essays in Honor of Carl G. Hempel (pp. 114–146). Dordrecht: Reidel.
[10] Sen, A. (1993). Internal Consistency of Choice. Econometrica, 61(3), 495–521.
[11] Stalnaker, R. (1972). Letter to David Lewis. In William Harper, Robert Stalnaker,
and Glenn Pearce (Eds.), Ifs: Conditionals, Belief, Decision, Chance, and Time
(pp. 151–152). Dordrecht: Reidel.
[12] Wedgwood, R. (2011). Gandalf's Solution to the Newcomb Problem. Synthese, doi:10.1007/s11229-011-9900-1.
[13] Weirich, P. (2004). Realistic Decision Theory: Rules for Nonideal Agents in Nonideal Circumstances. New York: Oxford University Press.