
A Critique of Benchmark Theory

Robert Bassett

Final draft: Do not cite without permission

September 1st 2014

Abstract

Benchmark Theory (BT), introduced by Ralph Wedgwood, departs from decision theories of pure expectation maximization like Evidential Decision Theory (EDT) and Causal Decision Theory (CDT) and instead ranks actions according to the desirability of an outcome they produce in some state of affairs compared to a standard—a benchmark—for that state of affairs. Wedgwood motivates BT through what he terms Gandalf’s Principle, that the merits of an action in a given state should be evaluated relative only to the performances of other actions in that state, and not to their performances in other states. Although BT succeeds in selecting intuitively rational actions in a number of cases—including some in which EDT or CDT seem to go wrong—it places constraints on rational decision-making that either lack motivation or are untenable. Specifically, I argue that as it stands BT is committed both to endorsing and rejecting the Independence of Irrelevant Alternatives. Furthermore, its requirement that weakly dominated actions be excluded from consideration of rational action lacks motivation and threatens to collide with traditional game theory. In the final section of the paper, I construct a counterexample to BT.[1]

1 Background

Decision theory considers problems in which an agent—who might be ignorant to some extent about how the world is—is faced with a choice of (mutually exclusive) actions. The agent has (or may have, at least) credences, which we will take to be a Kolmogorovian probability distribution, over states, i.e. (roughly) mutually exclusive ways the world might relevantly be, and a utility function[2] that assigns real numbers (values or

[1] I am very grateful to Dorothy Edgington, Corine Besson, Dan Adams, Jonathan Nassim, Chris Sykes and to two anonymous Synthese referees for helpful suggestions and comments on this paper.
[2] Specifically, this is typically taken to be a von Neumann-Morgenstern utility function, i.e. it is unique up to positive affine transformations: neither utility 0 nor utility 1 has special significance, but same-sized differences in utility are significant. Nothing in what follows turns on the details of these functions. Indeed, strictly speaking, utility is a measure of subjective preference and Wedgwood is keen to emphasize that he does not assume in his theory that such preferences are measurable by utility functions. In this paper, I use utility in a looser sense to denote the payoffs an agent associates with outcomes; these need not be interpreted as measures of subjective preference, but whatever values Wedgwood needs to underwrite the details of his theory.

utilities) to outcomes, pairs of actions and states. This encodes the agent’s preferences. Outcome O1 is assigned higher utility than outcome O2 iff O1 is preferred to O2.[3]

1.1 Evidential Decision Theory and Causal Decision Theory

Decision theorists largely agree that, at least in run-of-the-mill cases, rational decision-making consists in choosing actions that maximize expectation: the sum of utilities of outcomes weighted by their subjective chances of occurring.[4] However, there is disagreement over how expectations should be calculated. Evidential Decision Theory[5] (henceforth, EDT) states that utilities of outcomes should be weighted by the subjective probability conditional on the action in question, i.e.
one should choose an action A that maximizes news value:[6]

EV(A) = Σ_i U(A, S_i) Cr(S_i | A)    (1)

where U(A, S_i) is the utility obtained by choosing action A when one is in state S_i and Cr(S_i | A) is the conditional credence given to state S_i occurring given that action A is chosen.[7]

Newcomb’s problem and medical Newcomb cases provide plausible and well-known counterexamples to EDT.[8] In Newcomb’s problem, a highly reliable predictor has two sealed boxes, A and B, and offers an agent a choice between taking both A and B or just B. Box A, which is transparent, contains £1000 for certain. Box B contains £1m if the predictor predicted she would choose just B and nothing if the predictor predicted she would take both. As the agent’s credence that the predictor will have correctly predicted tends to 1, the news value of choosing just B tends to £1m; that of choosing both boxes tends to £1000. Indeed, the news value of choosing just B exceeds that of choosing both as long as the agent’s credence that the predictor will have made a correct prediction exceeds 1001/2000. However, regardless of how the money has been distributed in the now sealed boxes, the agent is always better off by £1000 by choosing both boxes; choosing both dominates choosing just B. Although choosing just B is correlated with a higher payoff, this should not influence one’s choice because one’s choice of action plays no role in determining the distribution of monies.

Medical Newcomb cases have a similar structure. Among migraineurs, migraines are correlated with eating chocolate, but research has demonstrated that chocolate consumption is not a trigger of migraines.[9] Rather, it is the case that pre-migraine tension can result

[3] As sketched in the last footnote, more structure is usually built into these utility functions than mere monotonic preservation of preference orderings, but the details are not important here since this is not relevant for Wedgwood’s theory.
[4] Some theorists advocate modifications to this basic model in order to account for e.g. the Egan cases discussed below. In this paper, I will focus only on Wedgwood’s response to these kinds of cases.
[5] See Jeffrey (1983) for a classic treatment.
[6] This term is taken from Gibbard and Harper (1978).
[7] (Rational) conditional credences conform to Bayes’ Rule, i.e. Cr(P|Q) = Cr(P ∩ Q)/Cr(Q), for Cr(Q) ≠ 0.
[8] There is, however, more to be said in EDT’s defence against such cases than space permits here. See e.g. Jeffrey (1983) and Eells (1981) for responses to these cases.
[9] For an example of such research, see Marcus et al. (1997).

in a craving for chocolate. Should a migraine sufferer refrain from assuaging her craving? Certainly not on the grounds that doing so will obviate a migraine. Again, eating chocolate is correlated, among sufferers, with migraines, but this fact should play no role in a sufferer’s decision of whether to eat chocolate, since doing so plays no role in determining whether she will suffer an episode.

In response to these counterexamples, some philosophers have advocated Causal Decision Theory (CDT)[10], according to which expected utility is calculated by weighting utilities of outcomes by probabilities of counterfactual conditionals:

EU(A) = Σ_i U(A, S_i) Cr(A → S_i)    (2)

Here Cr(A → S_i) is one’s credence that if one were to choose A, state S_i would obtain. Backtracking conditionals are excluded from consideration.[11] Acting in accordance with CDT by maximizing EU, rather than in accordance with EDT and maximizing EV, one chooses the undominated action in Newcomb’s problem because one’s choice does not result in or cause the money to be present. Given how things are, one should never leave £1000 on the table. If B has the million then if one were to take A, it would continue to be the case that B has the million. Likewise, it handles the migraine case correctly.
If a migraineur has a craving for chocolate then if she were to refrain from eating some, a migraine would (likely) ensue nonetheless.

1.2 Counterexamples to Causal Decision Theory

Egan (2007) gives two counterexamples to CDT. In Murder Lesion, Mary chooses whether or not to shoot Alfred. Since the world would be a far better place without Alfred, she receives a high payoff if she shoots and hits him. If, however, she shoots but misses, Alfred, a grudge holder, will respond in a manner disagreeable to Mary, resulting in a very, very low payoff for her. If she does not shoot, she will receive the fairly low payoff due to anyone suffering under Alfred’s reign of oppression. Mary knows of a common brain lesion, sufferers of which attempt murder by shooting. Indeed most murder attempts are committed by such sufferers, compelled to act by the lesion. However, the lesion also causes the aim of sufferers to fail at the crucial moment, and so the vast majority of their murder attempts fail.

In Psychopath Button, Paul chooses whether or not to press the ‘destroy all psychopaths’ button. Paul prefers there to be no psychopaths to there being any, but also prefers living in a world with psychopaths to his own death. Paul is fairly certain that he is not a psychopath and he is quite sure that only a psychopath would press the button.

Mary should not shoot and Paul should not press the button. They will very likely cause their own demises if they do not heed this advice. However, CDT handles these scenarios incorrectly because it ignores the evidence that Mary’s shooting and Paul’s pressing the button would provide, namely that (it is very likely that) Mary is a terrible shot and that Paul is a psychopath. CDT ignores this evidence because Mary’s shooting

[10] Such philosophers include Stalnaker (1972), Gibbard and Harper (1978) and Lewis (1981).
[11] For an exposition of this constraint, see Lewis (1981).
would not result in her being a terrible shot and Paul’s pressing the button would not result in his being a psychopath.

2 Gandalf’s Principle and Benchmark Theory

In this section I outline Ralph Wedgwood’s response to the Egan counterexamples described above. Wedgwood builds a decision theory crucially based on Gandalf’s Principle, a maxim to the effect that agents should evaluate the merits of an action only relative to the performances of other actions in the same state of the world, and not make such evaluations across states. The theory assigns benchmarks to relevant states of the world. Actions are then compared to the benchmark of each state (as described below). I discuss how benchmarks are assigned to states and make explicit an important constraint on this assignment.

Wedgwood (2011) argues that rational decision-making should incorporate Gandalf’s Principle (GP):

Gandalf’s Principle: The merits of an action in a given state should be evaluated relative only to the performances of other actions in that state, and not to their performances in other states.

Gandalf’s Principle strikes me as an eminently sensible principle to incorporate into rational decision-making. It provides insight into the error one-boxers commit in Newcomb’s problem. A one-boxer might argue as follows: ‘Choosing just box B means it is very likely I shall walk away with a million. Choosing both boxes means it is very likely I shall walk away with only a piffling thousand. So I should choose just box B.’ In light of GP, we can see an error in the reasoning. A proponent of GP might counterargue: ‘It’s not part of your choice whether you’ll likely walk away with a million or a thousand. Whatever state you’re actually in now, it does you no good to compare what you’re best off actually doing now to how you might have done had things been different. That comparison does not serve to reflect the choice that you actually face.
Choosing just box B just looks like wishful thinking!’

Wedgwood develops a theory based on GP. Benchmark Theory (BT) requires only that comparisons of goodness of outcome are meaningful within states and that differences in goodness can be compared across states. Outcomes (i.e. action-state pairs) are evaluated relative to a benchmark for the state they occupy. It is in this way that the theory incorporates Gandalf’s Principle. By comparing the utility of taking an action in a state to a benchmark for that state, an agent avoids making cross-state comparisons of actions.

The difference between the utility of an outcome and the benchmark for the state it occupies is the comparative value for that outcome (let CV(A, S_i) denote the comparative value of action A in the ith state):

CV(A, S_i) = U(A, S_i) − b_i    (3)

The evidentially expected comparative value (ECV) of action A is:

ECV(A) = Σ_i CV(A, S_i) Cr(S_i | A)    (4)

BT, with some qualifications that I will discuss later, advises that a rational action is one that maximizes ECV. It is a theory of evidentially expected comparative value maximization. Briggs (2010) shows that BT agrees with CDT in any case where states are probabilistically independent of action, i.e. one’s credence that one is in a certain state does not vary across actions. Wedgwood notes that (at least with benchmarks that satisfy certain conditions) in two-action cases with a strictly dominating action (such as Newcomb’s problem), BT will always favour the strictly dominating action.[12] Hence, unlike EDT, BT handles cases like Newcomb’s problem correctly. Additionally, Wedgwood notes, unlike CDT, BT also handles problems like Murder Lesion and Psychopath Button correctly.

2.1 Benchmarks

Wedgwood advocates using weighted averages of utilities in state S as benchmarks for S.
Regret is what he terms a weighting that assigns weight 0 to all outcomes in a state except the best, and relief is a weighting that assigns 0 to all outcomes except the worst. He says:

Every measure of comparative value that results from using one of these kinds of weighted averages to set the relevant benchmarks is, in effect, a mixture of the two extreme measures that I mentioned earlier, “regret” and “relief”. (Wedgwood 2011, p. 24)

There is presumably a tacit constraint at work here. It is reasonable to demand of a rational decision theory that it approaches isomorphic decision problems in the same way. For example, suppose that an agent is faced with a decision problem (decision problem 1) with the following payoff matrix:

Table 1: Payoffs 1

Action/State | S1 | S2
A            | 10 | 1
B            | 1  | 10

Let the agent have posterior credences Cr(S1|A) = Cr(S2|B) = p, i.e. she has the ‘evidential probability’ matrix given in table 2.

Table 2: Posterior credences 1

Action/State | S1    | S2
A            | p     | 1 − p
B            | 1 − p | p

Because of the symmetry of the problem it would be odd if a decision theory yielded, say, A as a rational choice of action and B as an irrational choice. There is a non-trivial automorphism in the problem (i.e. we can swap A and B, and S1 and S2, to yield a problem that is the same in all relevant respects) that could be exploited to show that the same decision theory would yield B as a rational choice and A as an irrational choice. Such a theory would therefore be inconsistent. A simple way to satisfy this constraint and to work weighted averages as mixtures of relief and regret in the way that Wedgwood suggests is to rank outcomes by preference in each state before assigning weights (I take it that this is what Wedgwood has in mind).

[12] This does not generalize to cases with more options: BT, as I have presented it thus far, sometimes will choose a dominated action. See below for discussion.
Specifically, then, in a decision problem with n actions and where Ω_i is the set of outcomes in the ith state, we take a ranking function r_i : Ω_i → {x ∈ Z+ : x ≤ n} such that r_i(A_j, S_i) < r_i(A_k, S_i) if outcome ⟨A_j, S_i⟩ is strongly preferred to outcome ⟨A_k, S_i⟩, and it is a free choice about the ranking of outcomes among which the agent is indifferent (since these will be assigned the same utility by the agent, their relative ranking cannot make a difference to the benchmark). Benchmarks are set by choosing some probability mass function w (the weighting) with support {x ∈ Z+ : x ≤ n} and taking the benchmark of the ith state as:

b_i = Σ_j U(A_j, S_i) · (w ∘ r_i)(A_j, S_i)    (5)

For example, the benchmarks for the problem described in table 1 would be calculated as follows if, say, we used a weighting that assigned 0.9 to the best (first) outcome and 0.1 to the worst (second). In state S1, action A yields the highest payoff, so outcome ⟨A, S1⟩ is ranked first in that state and outcome ⟨B, S1⟩ is ranked second, i.e. r1(A, S1) = 1 and r1(B, S1) = 2. Our weighting, w, assigns weight 0.9 to the first outcome and 0.1 to the second, i.e. w(1) = 0.9 and w(2) = 0.1. Hence we calculate the benchmark for state S1 as follows:

b1 = U(A, S1)·w(r1(A, S1)) + U(B, S1)·w(r1(B, S1))
   = U(A, S1)·w(1) + U(B, S1)·w(2)
   = U(A, S1)·0.9 + U(B, S1)·0.1
   = 10·0.9 + 1·0.1 = 9.1    (6)

In state S2, it is action B that has the highest payoff and hence is ranked first, so we have r2(B, S2) = 1 and r2(A, S2) = 2. The benchmark for the state is thus:

b2 = U(A, S2)·w(r2(A, S2)) + U(B, S2)·w(r2(B, S2))
   = U(A, S2)·w(2) + U(B, S2)·w(1)
   = U(A, S2)·0.1 + U(B, S2)·0.9
   = 1·0.1 + 10·0.9 = 9.1    (7)

Of course, in this case, the automorphism in the problem described above is what secures that the benchmarks in both states are the same. In general, this need not be so.
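As a cross-check on this worked example, the benchmark-setting procedure can be sketched in a few lines of Python. This is my own illustration, not part of Wedgwood's presentation; the value p = 0.8 is assumed purely for concreteness.

```python
# A minimal sketch of benchmark-setting and ECV (my own illustration).
# Outcomes in each state are ranked from best to worst, and w[k] is the
# weight assigned to the (k+1)-th best outcome in that state.

def benchmark(state_utils, w):
    return sum(wk * u for wk, u in zip(w, sorted(state_utils, reverse=True)))

def ecv(a, U, Cr, w):
    states = range(len(next(iter(U.values()))))
    return sum(Cr[a][s] * (U[a][s] - benchmark([U[x][s] for x in U], w))
               for s in states)

# Decision problem 1 (tables 1 and 2), with p = 0.8 assumed for concreteness.
U = {'A': [10, 1], 'B': [1, 10]}
p = 0.8
Cr = {'A': [p, 1 - p], 'B': [1 - p, p]}
w = (0.9, 0.1)  # 0.9 to the best outcome in a state, 0.1 to the worst

b1 = benchmark([U[x][0] for x in U], w)  # 10*0.9 + 1*0.1 = 9.1, as in (6)
b2 = benchmark([U[x][1] for x in U], w)  # also 9.1, as in (7)
```

By the automorphism of the problem, A and B receive the same ECV here (−0.9 with these numbers), exactly as the symmetry argument requires.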
With this way of calculating benchmarks, let the evidentially expected comparative value of A according to weighting w be denoted ECV_w(A). Different weightings give different results. Wedgwood suggests that every weighting is equally reasonable and offers two ways in which to interpret this. First, an agent could arbitrarily select a weighting and then maximize evidentially expected comparative value as determined by that weighting. Second, we can distinguish between absolutely rationally permissible and absolutely rationally required actions. It is absolutely rationally required to prefer an action A to an action B iff for every weighting w, ECV_w(A) ≥ ECV_w(B) and for some weighting v, ECV_v(A) > ECV_v(B). It is absolutely rationally permissible to prefer an action A to an action B iff it is not absolutely rationally required to prefer B to A.

Under the first suggestion it seems that it is rationally required to prefer A to B iff under every weighting w, ECV_w(A) > ECV_w(B). If, under some weighting v, A’s ECV is not greater than B’s, then if agents can arbitrarily choose weightings, it cannot be that it is rationally required to prefer A to B. An agent could (arbitrarily) choose weighting v, under which B receives at least as great an ECV as A. If A and B are the only actions available then B maximizes ECV under v. If it is not rationally required to prefer A to B then by employing BT it should be possible to choose B, i.e. there should be some weighting such that B’s ECV is at least as good as A’s.

3 Dominated actions

Dominated actions pose a problem for Benchmark Theory. It is widely agreed among decision theorists that—at least in problems with finitely many available actions—a dominance principle constrains rational choice of action, i.e. dominated actions should not be chosen by rational agents. There are decision problems in which maximizing expected comparative value results in choosing a dominated action.
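To make this concrete, here is a small problem of my own construction (the payoffs, credences and weighting are all assumed for illustration; this is not Wedgwood's or Briggs' example) in which ECV maximization favours a strictly dominated action:

```python
# My own illustrative example: ECV maximization selecting a strictly
# dominated action in a three-action, three-state problem.

def benchmark(state_utils, w):
    # w[k] is the weight given to the (k+1)-th best outcome in the state
    return sum(wk * u for wk, u in zip(w, sorted(state_utils, reverse=True)))

def ecv(a, U, Cr, w):
    states = range(len(next(iter(U.values()))))
    return sum(Cr[a][s] * (U[a][s] - benchmark([U[x][s] for x in U], w))
               for s in states)

U = {'A': [10, 0, 0], 'B': [6, 3, 4], 'C': [8, 4, 5]}  # C strictly dominates B
Cr = {'A': [0, 0, 1], 'B': [0, 1, 0], 'C': [1, 0, 0]}  # each action is strong
                                                       # evidence for one state
w = (0.3, 0.3, 0.4)  # 0.3 to the two top outcomes in a state, 0.4 to the worst

scores = {a: ecv(a, U, Cr, w) for a in U}
best = max(scores, key=scores.get)  # B, despite being strictly dominated by C
```

The dominated action B scores highest because choosing it is evidence for the one state (S2) in which B sits close to that state's benchmark, while each rival is evidence for a state in which it falls at or below its own benchmark.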
Wedgwood’s proposed solution to this problem is to rule out dominated actions from consideration before expected comparative values are calculated. He offers a supposedly benchmark-theoretic motivation for this solution. In this section, I discuss in more detail how ECV maximization behaves with respect to dominated actions. I consider a challenge to Benchmark Theory due to Briggs (2010): nearly dominated actions. I argue that Wedgwood’s motivation for ruling out dominated actions relies upon the Independence of Irrelevant Alternatives, a principle with which Benchmark Theory is inconsistent. Finally, I argue that the weak dominance principle should be treated with caution anyway, and then consider and reject a candidate replacement principle.

3.1 Dominated actions and benchmark theory

Action X strictly dominates action Y iff for every state S_i, U(X, S_i) > U(Y, S_i), and weakly dominates Y iff for every state S_i, U(X, S_i) ≥ U(Y, S_i) and for some state S_j, U(X, S_j) > U(Y, S_j). An action is strictly (weakly) dominated iff there is an action that strictly (weakly) dominates it. As stated earlier, in two-action decision problems, BT always (i.e. with any weighting) rejects strictly dominated actions. In light of the constraints on benchmarks suggested above, we can now show a cluster of results concerning how BT behaves with respect to dominated actions in two-action problems.

Theorem 1. In a decision problem with only two actions A and B, if A weakly dominates B then for any weighting w, ECV_w(A) ≥ ECV_w(B).

Proof 1. Let A and B be the only actions available in an n-state decision problem and let A weakly dominate B. Hence ∀i U(A, S_i) ≥ U(B, S_i) and ∃j U(A, S_j) > U(B, S_j). Since A weakly dominates B, for every i we can choose the ranking function r_i so that r_i(A, S_i) < r_i(B, S_i) (because we have a free choice about the ranking in states where the utility of A is equal to that of B).
We choose an arbitrary weighting that assigns w ∈ [0, 1] to the top-ranked action (which is always A) and 1 − w to the bottom-ranked action (which is always B). The benchmark for state S_i is therefore

b_i = w·U(A, S_i) + (1 − w)·U(B, S_i)    (8)

The comparative values in state S_i are:

CV(A, S_i) = U(A, S_i) − b_i = (1 − w)(U(A, S_i) − U(B, S_i))    (9)
CV(B, S_i) = U(B, S_i) − b_i = −w·(U(A, S_i) − U(B, S_i))    (10)

Since A weakly dominates B, we have U(A, S_i) ≥ U(B, S_i) for every S_i and, for some S_j, U(A, S_j) > U(B, S_j). Hence ∀i U(A, S_i) − U(B, S_i) ≥ 0 and, since 1 − w ≥ 0,

CV(A, S_i) = (1 − w)(U(A, S_i) − U(B, S_i)) ≥ 0    (11)

Since w ≥ 0,

CV(B, S_i) = −w(U(A, S_i) − U(B, S_i)) ≤ 0    (12)

Similarly, U(A, S_j) − U(B, S_j) > 0 and therefore either (13) or (14) holds:

CV(A, S_j) ≥ 0 > CV(B, S_j)    (13)
CV(A, S_j) > 0 ≥ CV(B, S_j)    (14)

Since credences are non-negative, we therefore have:

CV(A, S_j)Cr(S_j|A) + Σ_{i≠j} CV(A, S_i)Cr(S_i|A) ≥ CV(B, S_j)Cr(S_j|B) + Σ_{i≠j} CV(B, S_i)Cr(S_i|B)    (15)

But the LHS and RHS of (15) are respectively ECV_w(A) and ECV_w(B), wherefore ECV_w(A) ≥ ECV_w(B). ∎

Theorem 2. In a two-action decision problem with actions A and B, if A weakly dominates B and the agent facing the problem believes that it is possible that she is in a state in which the utility from choosing A exceeds that from choosing B, in the sense that there is some state S_j such that U(A, S_j) > U(B, S_j) and Cr(S_j|A) + Cr(S_j|B) > 0, then for any weighting w, ECV_w(A) = ECV_w(B) only if w assigns weight 0 or 1 to the top-ranked action.

Proof 2. We have the same set-up as in the proof of theorem 1 and the condition that there is some state S_j such that U(A, S_j) > U(B, S_j) and not both Cr(S_j|A), Cr(S_j|B) are zero. We assume that w ∈ (0, 1) is the weight assigned to the top-ranked action. Again, we choose a ranking function that prefers A in states where A and B are tied.
Since 1 > w > 0, we obtain a stronger result than the disjunction of (13) and (14) in the proof of theorem 1:

CV(A, S_j) > 0 > CV(B, S_j)    (16)

Hence, and since at least one of Cr(S_j|A) and Cr(S_j|B) is positive, we have:

CV(A, S_j)Cr(S_j|A) > CV(B, S_j)Cr(S_j|B)    (17)

Therefore:

CV(A, S_j)Cr(S_j|A) + Σ_{i≠j} CV(A, S_i)Cr(S_i|A) > CV(B, S_j)Cr(S_j|B) + Σ_{i≠j} CV(B, S_i)Cr(S_i|B)    (18)

∎

There are two further results worth noting. It is possible in a two-action decision problem with a weakly dominated action that the ECVs of the actions are equal even if the agent believes it possible to be in a state where the dominating action gives higher utility. By theorem 2 this requires choosing a weighting that assigns 0 or 1 to the top-ranked action. The trick (and this is the only way it can be done) is to set CV(A, S_j)Cr(S_j|A) = CV(B, S_j)Cr(S_j|B) = 0, where A weakly dominates B and S_j is any state in which A outperforms B, by choosing the weighting such that one of the comparative values is 0 (by setting the weight assigned to the top-ranked action to 0 or 1) and ‘setting’ the credence on the other side to 0.

In a two-action problem with a strictly dominated action, the constraint on credences in theorem 2 is met. The agent believes she is in some state, and in such a problem the dominating action gives higher utility in every state, so the agent believes she is in some such state. By theorem 2 the ECVs of the two actions will be the same only if the weight assigned to the top-ranked action is 0 or 1. But the trick described above is now not available, since this would involve setting all the agent’s posterior credences for one of the actions to 0. Hence a strictly dominating action always has the higher ECV in a two-action problem.

How one interprets these results depends in part on how one regards the relationship between weightings and rational choice of action.
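Before turning to that question, the two-action results can be spot-checked numerically. The following sketch (my own, on randomly generated problems) verifies that a weakly dominating action never receives the lower ECV, as theorem 1 states:

```python
import random

# Numerical spot-check of theorem 1 (my own sketch): in random two-action
# problems where A weakly dominates B, ECV_w(A) >= ECV_w(B) for any w.

def ecv_two_action(uA, uB, crA, crB, w):
    # w is the weight on the top-ranked action; A is top-ranked in every
    # state since it weakly dominates B (ties broken in A's favour).
    total_A = total_B = 0.0
    for i in range(len(uA)):
        b = w * uA[i] + (1 - w) * uB[i]   # equation (8)
        total_A += crA[i] * (uA[i] - b)   # credence-weighted CV(A, S_i)
        total_B += crB[i] * (uB[i] - b)   # credence-weighted CV(B, S_i)
    return total_A, total_B

random.seed(0)
for _ in range(1000):
    n = random.randint(2, 5)
    uB = [random.uniform(-10, 10) for _ in range(n)]
    # Make A weakly dominate B: non-negative increments, at least one strict.
    uA = [u + random.choice([0.0, random.uniform(0.1, 5)]) for u in uB]
    uA[random.randrange(n)] += 1.0
    crA = [random.random() for _ in range(n)]
    crB = [random.random() for _ in range(n)]
    sA, sB = sum(crA), sum(crB)
    crA = [c / sA for c in crA]           # normalize posterior credences
    crB = [c / sB for c in crB]
    w = random.random()
    eA, eB = ecv_two_action(uA, uB, crA, crB, w)
    assert eA >= eB - 1e-9
```

The check passes for every random instance, as the proof guarantees: CV(A, S_i) is non-negative and CV(B, S_i) non-positive in every state, whatever the credences.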
At the end of the last section, the two suggestions Wedgwood makes about how to regard this relationship were outlined. Under both, it is permissible to choose a weakly dominated action if Cr(S_j|A) = Cr(S_j|B) = 0 whenever U(A, S_j) > U(B, S_j), where A weakly dominates B. Under the first suggestion, even if this condition does not hold, it is sometimes permissible to choose a weakly dominated action. This involves choosing a weighting that assigns 0 or 1 to the top-ranked action and for credences to hold in the way described above. Under the second suggestion, it follows as an immediate corollary to theorems 1 and 2 that if this condition does not hold in a two-action decision problem then it is never permissible to choose a weakly dominated action in it.

An analogue of these results does not hold for decision problems with more than two actions. For example, consider the problem (decision problem 2) with the payoff and evidential probability matrices given in tables 3 and 4.

Table 3: Payoffs 2

Action/State | S1 | S2 | S3
A            | 9  | 0  | 0
B            | 6  | 3  | 4
C            | 8  | 4  | 5

Table 4: Posterior credences 2

Action/State | S1 | S2 | S3
A            | 0  | 0  | 1
B            | 0  | 1  | 0
C            | 1  | 0  | 0

According to the weighting that assigns 0.3 to the two top outcomes in a state and 0.4 to the worst, B has the highest ECV, followed by C and A. Hence, according to BT (as far as I have presented it here) it is rationally permissible to prefer B to A and C, but B is strictly dominated by C. Wedgwood is aware of this complication and writes:

The problem is that, in fact, in every situation in which we have to make a choice, there is an enormous number of perfectly dreadful courses of action that are at least physically (if not psychologically) available. (Wedgwood 2011, p. 20)

Wedgwood (2011, p.
23) meets the challenge to BT posed by dominated actions by ruling that they ‘should not be taken seriously in practical reasoning, and so should be excluded from consideration altogether.’ In response to Briggs’ suggestion that this move is ad hoc, Wedgwood argues that the proponent of BT has a perfectly good reason for ruling out weakly dominated actions – namely that a weakly dominated action A will lose out on a pairwise comparison with its dominating action B and therefore ‘there is nothing to be said in favour of A that cannot also be said in favour of B, while there is something to be said in favour of B that cannot be said in favour of A.’

3.2 Nearly dominated actions

Briggs (2010) raises the case of nearly dominated actions[13] as an objection to BT and, in particular, to Wedgwood’s proposal about ruling out weakly dominated actions. These are derived from weakly dominated actions by sweetening their utility very slightly in some state. We might formally define them as follows: Action A nearly dominates action B iff there is a small positive ε such that there is some state S_j in which U(B, S_j) = U(A, S_j) + ε, some state S_k in which U(A, S_k) > U(B, S_k), and for every state S_i ≠ S_j, U(A, S_i) ≥ U(B, S_i). An action is nearly dominated iff there is some action that nearly dominates it. Briggs argues that such actions will not be ruled out by Wedgwood’s proposal of ruling out weakly dominated actions and therefore will be in the running for rationally permissible actions. Here Wedgwood bites the bullet and replies that they should be in the running. I agree.

Suppose A nearly dominates B and let S_j be the state in which B performs ever so slightly better than A. If an agent has very good reason to think that she is in state S_j then she has very good reason to think that she should choose action B, for it results in a (very slightly) higher payoff than action A.

[13] This is Wedgwood’s term.
Under the right circumstances, both EDT and CDT will agree that B should be chosen. For instance, suppose for some very small δ the agent has credences Cr(S_j|A) = Cr(S_j|B) = 1 − δ (in the case of EDT) or Cr(A → S_j) = Cr(B → S_j) = 1 − δ (in the case of CDT). Where U_A is the largest payoff A ever results in and U_B is the smallest payoff B ever results in, if δ < ε/(U_A + U_B + ε) then both CDT and EDT prefer B to A. After all, these are theories that maximize expected value, and no matter how small ε is, it is always possible to find credences that result in B having greater expectation than A as long as credences can be arbitrarily small.

Nearly dominated actions are, I think, beside the point. If they are a problem for BT then they are already a problem for EDT and CDT, but it is far from clear that they are a problem anyway: it is intuitive that sometimes it could be rational to choose a nearly dominated action. Indeed, note that two actions can nearly dominate each other. If these are the only actions available to an agent, one of them must be chosen. It would be impossible not to choose a nearly dominated action, so it seems unreasonable to rule that under such circumstances choosing a nearly dominated action is irrational. In contrast, weak domination is an asymmetric relation. It is never possible to have two actions weakly dominate each other.[14]

Although Briggs’ argument from nearly dominated actions fails, there is nevertheless a case for Wedgwood’s proposal of ruling out weakly dominated actions being ad hoc. By his own lights, Wedgwood’s argument seems to fail. The reason why concerns the Independence of Irrelevant Alternatives.

3.3 The Independence of Irrelevant Alternatives

There is an apocryphal story about Sidney Morgenbesser ordering dessert.
When offered a choice between apple pie and blueberry pie by a waitress, Morgenbesser plumped for apple. The waitress took the order and left, only to return a minute later to say that there was now also cherry pie on the menu. Morgenbesser replied that in that case, he would take the blueberry.

A money-pumping argument can be made against someone with such preferences. Suppose Morgenbesser alternates between situations in which he can choose between A and B and situations in which he can choose among A, B and C. Let us also say that these choices will take the form of trade offers: he starts with a default option and can offer to trade it by paying some premium for one of the other options, and we will assume that if he prefers one option to another then there is some amount he is willing to pay in order to have the former rather than the latter. We initialize Morgenbesser with a default B, which he will trade, at a premium p > 0, for A (since this is an A, B choice scenario and he prefers A to B in such a scenario).

[14] In a decision problem with infinitely many actions to choose from (and only such problems), every action can be strictly (or weakly) dominated. It does not seem in general, therefore, that it is unreasonable to rule that every action in a problem is irrational to choose. In an infinite problem, there might always be a better action available than whatever one chooses, and this is why one’s choice might always be irrational. This plainly cannot be the case in a finite problem. In particular, in a two-action problem in which both actions are nearly dominated, they cannot both be better than each other. An anonymous reviewer comments that it is implausible that every action in a decision problem be irrational to choose. While I’m not sure that I agree with this point, if it is true then it serves to emphasize that it could be rational to choose a nearly dominated action.
If we introduce C onto the menu, Morgenbesser will now trade A back, at a premium r > 0, for B. We’ve extracted p + r > 0 from Morgenbesser and returned him to his initial endowment of B. We can iterate this process to extract arbitrarily large amounts of money from Morgenbesser in return for nothing.

The error, if it can be described as such, that an agent such as Morgenbesser is committing here is to violate the Independence of Irrelevant Alternatives (IIA), i.e. his preferences between two options are not constant when new options are provided. There are situations in which this violation seems very reasonable (and indeed the money-pumping argument above fails). Sen (1993) discusses three kinds. The first he terms positional choice. There may be certain rules that govern some choice situations that upset IIA. Sen’s examples are ‘do not take the largest slice’ and ‘do not take the last apple’. The second concerns the epistemic value of additional choices. A clean-living agent may prefer accepting an invitation to take tea with an acquaintance to declining it, but prefer to decline any invitation if the acquaintance offers the option of snorting cocaine as well. The idea at work in this example is that the very presentation of the option to snort cocaine has caused the agent to revise her beliefs about how enjoyable taking tea with her acquaintance would be. Sen’s third case is freedom to reject. The point of a hunger strike is to reject emphatically the option of eating well. If the only alternative to a complete fast is scarcely eating at all, an agent’s preferences might invert.

Wedgwood discusses incommensurability as another kind. Where an agent has a non-total preference relation, certain options A and B may be such that neither is preferred to the other, yet the agent is not indifferent between them. Perhaps this could be because the options in question are so unalike and so outré that the agent cannot compare them.
Now a sweetened version of A, A+ (which is just like A except the agent also gets some minor additional benefit, say a bonus of £1), is strictly preferred to A. A and B are to be considered rationally permissible when they are the only options, but in the presence of another option A+, A is no longer considered rationally permissible, although B might be. Since B is rationally permissible in this situation but A is not, B can now be considered, in some sense at least, preferred to A.

Wedgwood is aware that BT violates IIA. An action A that is always rejected by BT in favour of another B when just A and B are available may be chosen by BT when another action C is available. He offers the following example.15 An agent is faced with a two-action, two-state decision problem (problem 3) with the following utilities and posterior credences:

Table 5: Payoffs 3
Action/State   S1     S2
A              0      900
B              0      1800

Table 6: Credences 3
Action/State   S1     S2
A              0.1    0.9
B              0.9    0.1

Since B weakly dominates A in this example and it involves only non-zero posterior credences, it is easily seen by the results of subsection 3.1 that the ECV of B is always greater than that of A in this problem. Now suppose a new action is introduced to the problem (problem 4) as described in tables 7 and 8.

Table 7: Payoffs 4
Action/State   S1     S2
A              0      900
B              0      1800
C              2000   0

Table 8: Posterior credences 4
Action/State   S1     S2
A              0.1    0.9
B              0.9    0.1
C              0.1    0.9

There are weightings such that in this ‘expanded’ problem the ECV of A is larger than the ECV of B.16 The preference for B over A in the two-action problem of a BT-sensitive agent who used such a weighting is inverted when the new option C is introduced. This violates IIA. Wedgwood’s response is to reject IIA.

15 I have, for clarity of exposition, relabelled actions to accord with my description of IIA above. However, the structure of the problem and the values used for credences and utilities are true to Wedgwood’s example.
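For concreteness, the ECV calculations behind this example can be sketched in code. The sketch below assumes one simple reading of the benchmark: a rank-weighted average of the payoffs available in a state, with weight v on the top-ranked payoff and w on the second-ranked. The function and variable names are mine, not Wedgwood’s.

```python
def ecv(payoffs, credences, rank_weights, action):
    """Expected comparative value of `action`: in each state, compare the action's
    payoff to a benchmark (here, a rank-weighted average of all available payoffs
    in that state), and weight the differences by posterior credences."""
    total = 0.0
    for s in range(len(payoffs[action])):
        ranked = sorted((payoffs[a][s] for a in payoffs), reverse=True)
        benchmark = sum(w * u for w, u in zip(rank_weights, ranked))
        total += credences[action][s] * (payoffs[action][s] - benchmark)
    return total

# Problem 3: B weakly dominates A, and B's ECV always exceeds A's.
pay3 = {'A': [0, 900], 'B': [0, 1800]}
cr3 = {'A': [0.1, 0.9], 'B': [0.9, 0.1]}

# Problem 4: the same payoffs with C added; v = 0.9, w = 0.1 is a weighting
# on which the ranking of A and B is inverted.
pay4 = {'A': [0, 900], 'B': [0, 1800], 'C': [2000, 0]}
cr4 = {'A': [0.1, 0.9], 'B': [0.9, 0.1], 'C': [0.1, 0.9]}

for name, p, c, wts in [('problem 3', pay3, cr3, [0.9, 0.1]),
                        ('problem 4', pay4, cr4, [0.9, 0.1, 0.0])]:
    print(name, {a: round(ecv(p, c, wts, a), 1) for a in p})
# prints: problem 3 {'A': -729.0, 'B': 9.0}
#         problem 4 {'A': -909.0, 'B': -1611.0, 'C': -1519.0}
```

With the weighting v = 0.9, w = 0.1, B receives the higher ECV in problem 3 but A receives the higher ECV in problem 4: precisely the preference inversion at issue.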
He does concede that

...there are many cases where the IIA does indeed hold. In particular, the IIA certainly seems to hold in cases that involve (i) no incommensurability, and (ii) no probabilistic dependence of the states of nature on the agent’s choices. (Wedgwood 2011, p. 28)

Sen’s counterexamples suggest to me that IIA fails even in certain cases that are commensurable and probabilistically choice-independent. Wedgwood’s reason for rejecting IIA, moreover, seems to be entirely grounded in considerations of incommensurability.17 Yet the example he gives of BT’s inconsistency with IIA has nothing to do with incommensurability. Plainly for BT to work, actions need to be commensurable within states – otherwise, how are we to obtain comparative values? It is also clear that Wedgwood’s example is not a case of positional choice, epistemic value or freedom to reject.

16 Wedgwood claims that wherever the benchmark is set, BT requires choosing A. This is false. There is a relatively small region of values (where the weighting given to the top-ranked action is small in comparison to the weighting assigned to the second-ranked action) where the ECV of B exceeds that of A. In particular, 63 < 72w − 16v, where v is the weighting given to the top-ranked action and w that given to the second, is a necessary and sufficient condition for B to receive a higher ECV than A. As an aside, C is never preferred to A in this problem, but there is a small region of values in which C receives a higher ECV than B.

17 He cannot use probabilistic choice dependence as any grounds for rejecting IIA because his only reason for thinking that IIA fails when there is choice dependence is that BT is inconsistent with IIA under these circumstances. It would be begging the question to use this as a defence of BT. As far as I am aware, he has no independent reason for thinking that IIA fails in these dependency situations.
So why should we accept that the example Wedgwood gives is a case where IIA genuinely fails? It cannot be just because BT tells us so. Wedgwood needs an independent reason to motivate the claim that we should let go of IIA in this example, particularly since, outside of special circumstances, there are good reasons, such as the money-pumping argument, for thinking it should constrain rational decision-making. It does not even seem that the example at hand is of the same kind as the incommensurability example. In the incommensurability example, we had a clear reason why B is preferred to A when A+ is available: because A+ knocks A out of the running. No such explanation is at hand in the example Wedgwood considers.

But even if Wedgwood can find such independent reasons, he is likely to be at odds with his own argument for ruling out weakly dominated actions. In response to the charge that it was ad hoc to do so, he gave a reason why a benchmark theorist should rule out weakly dominated actions, namely that pairwise comparisons in BT of weakly dominated actions with actions that dominate them result in the dominating action being preferred under every weighting (modulo the special cases identified in subsection 3.1). Surely this argument tacitly appeals to IIA – why else think that the pairwise comparison is of any use in guiding decisions in the more inclusive problem? However, Wedgwood writes:

The “contracted” choice situation is simply a different situation from the “expanded” choice situation. It is logically impossible for any agent to be in both situations at the same time. It cannot be the case both that the only options available to you are A and B, and also that your available options include a third option C as well.
The central idea of my approach is precisely that the crucial factor in rational decision-making is the way in which all the available options compare with each other within each state of nature, and so my approach will obviously accept that two choice situations in which different options are available will be crucially different from each other—at least so long as all of those options are ones that deserve to be taken seriously by a rational deliberator. (Wedgwood 2011, p. 26)

If weakly dominated actions are not among those that deserve to be taken seriously by a rational deliberator, then Wedgwood owes us an account of why. The only account he gives is that a benchmark theorist has reason to rule them out in pairwise comparisons, which, aside from courting circularity, stands completely at odds with what he says in the passage quoted above.

3.4 Weakly dominated actions and weakly dominated strategies

There are reasons to be cautious about throwing out weakly dominated actions in all cases anyway. Call the following principles respectively the weak dominance principle and the strict dominance principle:

Weak dominance principle: Where all relevant states are causally independent of A and B, and A weakly dominates B, B should not be chosen.

Strict dominance principle: Where all relevant states are causally independent of A and B, and A strictly dominates B, B should not be chosen.

The weak dominance principle is stronger than the strict dominance principle in the sense that it entails it. Some authors use ‘dominance principle’ to mean the weak dominance principle, while others use it to mean the strict dominance principle.18 In this subsection, I shall present some arguments for being, at least, suspicious of the weak dominance principle. In the next subsection I discuss the weak dominance principle in relation to benchmark theory in more detail. CDT implies the strict dominance principle but not the weak dominance principle.
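The two dominance relations invoked by these principles can be stated compactly in code (a minimal sketch, with my own function names; actions are represented simply as lists of per-state payoffs):

```python
def weakly_dominates(x, y):
    """x weakly dominates y: at least as good in every state, strictly better in some."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

def strictly_dominates(x, y):
    """x strictly dominates y: strictly better in every state."""
    return all(a > b for a, b in zip(x, y))

# B weakly (but not strictly) dominates A in problem 3 above:
A, B = [0, 900], [0, 1800]
print(weakly_dominates(B, A))    # True
print(strictly_dominates(B, A))  # False
print(weakly_dominates(A, B))    # False -- weak domination is asymmetric
```

Note that on this definition every strictly dominated action is also weakly dominated, which is the sense in which the weak dominance principle entails the strict one.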
CDT will reject weakly dominated actions (causally independent of states) in all cases where non-zero prior credence is given to some state in which a weakly dominated action is defeated by its dominator (and it is clear therefore that CDT always rules out strictly dominated actions, since such actions are defeated by a dominator in every state, and some state must receive non-zero prior credence). Where this is not the case, CDT does not necessarily reject them. Suppose I have a choice between taking an old lottery ticket and not taking it. I know that the ticket wasn’t a winner because I already know the outcome of the lottery. The decision problem (‘Lottery ticket’) I face is described by table 9.

Table 9: Lottery ticket
Action/State     It’s a loser   It’s a winner
Take it          £0             £1,000,000
Do not take it   £0             £0

I assign zero credence to the ticket’s being a winner (and deem the state probabilistically independent of my choice) and so it seems perfectly rational not to take it, even though this is weakly dominated, which is exactly what EDT and CDT advocate anyway. It is also absolutely rationally permissible to take it under BT, unless we eliminate weakly dominated actions beforehand.19

Let us change the situation a little. I no longer deem the state statistically independent of my choice. I know that I take the ticket if and only if it is a winner. In this case, I have conditional credences Cr(it is a winner|I take it) = Cr(it is a loser|I do not take it) = 1. Now EDT both advises me to take it and rules out not taking it. Under BT it is absolutely rationally required to take it, since the ECV of taking it exceeds the ECV of not taking it unless a weighting is used that assigns 1 to the top-ranked action, in which case the ECVs are equal. But clearly taking it is bonkers if I know it is a sure loser because I already know the results of the lottery!

A natural line to take here is that, in problems with finitely many states, both EDT and BT should exclude from all consideration any states that are assigned zero prior credence. There is good independent reason for doing so: since these are states the agent anticipates will never happen, they should not be considered at all relevant to the decision. I will return to this proposal in the next subsection.

18 Nozick (1969) and Kahneman and Tversky (1986) are examples of the former; Weirich (2004) is an example of the latter.

19 Note that Briggs’ proof that BT agrees with EDT and CDT in cases where the state of the world is probabilistically independent of one’s choice of action assumes that weakly dominated actions are not ruled out beforehand.

There is another, more decisive reason for at least advocating caution with respect to weakly dominated actions. An open problem in decision theory is how it is to be unified with game theory. For instance, one of game theory’s main tools for predicting the behaviour of rational agents is Nash equilibrium, but there is no thorough-going decision-theoretic explanation for why it should be rational to play as part of an equilibrium. Some games have only (pure) Nash equilibria that involve weakly dominated strategies. For example, consider the two-player game with the following normal form matrix:

Table 10: Game 1
1/2   L      M      R
U     1, 1   1, 1   1, 0
C     0, 1   0, 2   3, 0
D     1, 1   2, 2   2, 3

The only pure strategy Nash equilibrium of this game is <U, L>, but both U and L are weakly dominated. The reduced game (matrix below) from which U and L are eliminated has no pure strategy equilibrium.
Table 11: Game 2: game 1 reduced
1/2   M      R
C     0, 2   3, 0
D     2, 2   2, 3

There is a mixed strategy equilibrium in game 1 that effectively involves eliminating both U and L (because they are both played with zero probability in this equilibrium): <(0, 1/3, 2/3); (0, 1/3, 2/3)>, which corresponds to the unique equilibrium in game 2, <(1/3, 2/3); (1/3, 2/3)>. Some Nash equilibrium will always remain after eliminating weakly dominated strategies in a game with only finitely many strategies.20 However, this fact does not immediately justify discarding other equilibria. Yet it seems that the sort of argument Wedgwood gives to justify eliminating weakly dominated actions in decision problems would also justify eliminating weakly dominated strategies in games. For example, from player 1’s perspective in game 1, there is nothing to be said in favour of U that cannot also be said in favour of D. It seems reasonable to expect that, if game theory is to be unified with decision theory, the basis upon which a rational agent makes a decision in a non-strategic problem will be the same as that upon which she makes a decision in a strategic (i.e. game-theoretic) problem.

20 Proof sketch: First note that since weak domination is irreflexive and transitive, every player with only finitely many strategies must have some undominated strategy. Nash showed that every finite game has a Nash equilibrium, and hence if we reduce a (finite) game Γ by eliminating a weakly dominated strategy, the resulting game will have one, π. But against π no player can do better in Γ than in the reduced game, since the only strategy eliminated was weakly dominated by one that remains. Therefore π must contain only best responses in Γ and is therefore Nash in Γ. It follows by induction that equilibria of Γ will remain regardless of how many weakly dominated strategies are eliminated (or indeed if they are eliminated iteratively).
In particular, then, we should expect that if the weak dominance principle holds in decision theory, it will leak into game theory. For, in the sort of simple game described above, a player faces a choice under uncertainty that looks familiar enough to the decision theorist: there are possible states of the world (i.e. whatever strategies have been chosen by other players), and which state obtains, together with the player’s choice of action, determines her payoff. That these states are determined by the actions of other agents facing similar problems might be relevant in choosing an optimal action, but it does not seem relevant in characterizing the problem the agent faces in this more decision-theoretic style. In game 1, for example, player 1 might be uncertain about whether player 2 has chosen (or will choose) L, M or R. That which of these states obtains has been decided by player 2, a player with such-and-such preferences, might inform player 1 about what credences are rational to hold about which state obtains. However, beyond this, it does not seem that there is any basis on which player 1’s rational choice of action can be described as fundamentally different from a choice made in a non-strategic context.21 In this case, if the weak dominance principle holds, it holds for game-theoretic problems too.

There is another problem with eliminating weakly dominated strategies. If rational players do not play weakly dominated strategies, then rational players with common knowledge of rationality should eliminate weakly dominated strategies to obtain a reduced game, with common knowledge that it is this reduced game that they are playing. If the reduced game has weakly dominated strategies, they should likewise eliminate these by the same rationale, and so on. In other words, eliminating weakly dominated strategies motivates iterated elimination of weakly dominated strategies.
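This elimination procedure can be sketched concretely for game 1 above (a minimal implementation; the dictionary representation of the game and the function names are mine):

```python
# Game 1 payoffs: (player 1, player 2) for each pure strategy pair.
G = {('U', 'L'): (1, 1), ('U', 'M'): (1, 1), ('U', 'R'): (1, 0),
     ('C', 'L'): (0, 1), ('C', 'M'): (0, 2), ('C', 'R'): (3, 0),
     ('D', 'L'): (1, 1), ('D', 'M'): (2, 2), ('D', 'R'): (2, 3)}

def weakly_dominated(G, rows, cols, player):
    """Return the strategies of `player` weakly dominated by a still-available one."""
    own, other = (rows, cols) if player == 0 else (cols, rows)
    def pay(s, t):
        return G[(s, t) if player == 0 else (t, s)][player]
    return {s for s in own
            for s2 in own
            if s2 != s
            and all(pay(s2, t) >= pay(s, t) for t in other)
            and any(pay(s2, t) > pay(s, t) for t in other)}

rows, cols = ['U', 'C', 'D'], ['L', 'M', 'R']
print(weakly_dominated(G, rows, cols, 0))  # {'U'}: U is weakly dominated by D
print(weakly_dominated(G, rows, cols, 1))  # {'L'}: L is weakly dominated by M

# After eliminating U and L (i.e. in game 2), nothing further is weakly dominated:
print(weakly_dominated(G, ['C', 'D'], ['M', 'R'], 0))  # set()
print(weakly_dominated(G, ['C', 'D'], ['M', 'R'], 1))  # set()
```

Eliminating U and L yields game 2, in which neither player has a weakly dominated strategy left; yet, as noted above, <U, L> was game 1’s only pure equilibrium.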
However, what remains after weakly dominated strategies have been iteratively eliminated depends upon the order in which they were removed. A similar problem concerns the credences that players in a game might be expected to have given their expectations about what other players will do. Recall that earlier it was suggested that BT should ignore any states that are assigned zero credence. In game 3 (below), if both players have common knowledge of rationality and rational players do not play weakly dominated strategies, then player 1 should give zero credence to player 2’s playing L, and so considers herself to face game 4. U was weakly dominated in game 3 but is not in game 4. In this case, the outcome might reasonably be expected to be <U, R>, even though it involves a weakly dominated strategy, U. It is difficult to see what a theory of games incorporating the weak dominance principle might say about such a case.

Table 12: Game 3
1/2   L      M      R
U     1, 1   2, 1   2, 3
C     0, 1   3, 2   1, 0
D     2, 1   2, 2   2, 2

Table 13: Game 4
1/2   M      R
U     2, 1   2, 3
C     3, 2   1, 0
D     2, 2   2, 2

21 Or at least if there is, this seems to cast a long shadow over the prospects of unifying decision theory and game theory.

3.5 Benchmark theory and elimination of weakly dominated actions

I have argued that Wedgwood’s argument for eliminating weakly dominated actions fails and that there is good reason not to eliminate them in the way he suggests. It is worth examining how grave a problem this is for the benchmark theorist. After all, one might think that if the weak dominance principle fails and ECV-maximization sometimes chooses weakly dominated actions, there is no tension here and no difficulty the benchmark theorist faces. I do not think that this is the case.
For one thing, it is far less controversial that the strict dominance principle holds, but ECV-maximization sometimes yields strictly dominated actions as permissible.22 Let us assume that the benchmark theorist has some independently well-motivated grounds for eliminating strictly dominated actions. Do considerations about dominated actions pose any further difficulties?

22 Of course, in the benchmark theorist’s defence, independent motivation for the strict dominance principle will be more easily found than that for the weak dominance principle.

The problem for BT with respect to weakly dominated actions is not just that sometimes they do not seem irrational; it is this plus the fact that sometimes they do seem irrational and BT does not rule them out simpliciter in those cases. Action A in problem 3 is a case in point. If prior credences about states are non-zero, A strikes me as irrational to choose here. I do not think I need to provide a full explanation of why I think this, but that it is weakly dominated would doubtless enter into such an explanation were I to give one. Since A sometimes maximizes ECV in this problem, it behooves the benchmark theorist to explain how it is to be eliminated as impermissible by benchmark theory. Since I do not think that in general weakly dominated actions can be systematically eliminated as irrational, for the reasons described in the last subsection, and I do not think that the benchmark theorist has good independent reason for doing so, the benchmark theorist cannot hope to win me over by appealing to the weak dominance principle. I do not accept that this principle explains why A is irrational here because I think the principle is false.

If the arguments given in the last subsection are correct and the weak dominance principle is false, then presumably a good place for the benchmark theorist to start in eliminating an action like A in the example discussed above is to seek a principle weaker than weak dominance. Since the defence of weakly dominated actions given in the previous subsection concerned to a large extent states certain not to occur, this is a natural place to begin constructing such a principle. Say that action X non-trivially weakly dominates action Y iff X weakly dominates Y and there is some epistemically possible state S_j such that U(X, S_j) > U(Y, S_j). An action is non-trivially weakly dominated if some action non-trivially weakly dominates it. Consider the following principle:

Non-trivial weak dominance: Where all relevant states are causally independent of A and B and A non-trivially weakly dominates B, B should not be chosen.

At first blush, endorsement of this principle enables the benchmark theorist to exclude undesirable actions like A in the example above, while not making absurd rulings on (trivially) weakly dominated actions like in the lottery ticket example given earlier. Furthermore, since every strictly dominated action is non-trivially weakly dominated, the non-trivial weak dominance principle entails the strict dominance principle. Hence countenancing non-trivial weak dominance should not count as a further complication to the theory given that it countenances strict dominance: motivating or justifying this principle already motivates or justifies strict dominance.

Despite its initial appeal, I will argue that non-trivial weak dominance raises fresh problems for the benchmark theorist. The problem, which I shall spell out in greater detail below, is roughly this. Where an agent believes that, if some state turns out to be actual, she must have taken some particular action,23 it is plausible that if the agent has ruled out that action then that state is no longer an epistemic possibility.
However, this means that which actions are ruled out by an agent can affect which remaining actions count as non-trivially dominated. The result is that some actions may or may not be eliminated as irrational depending only on the order in which the agent chooses to eliminate actions.

The notion of epistemic possibility at play here needs to be clarified. An objection to the above might run that it is not at all plausible that if an agent rules out such an action then such a state is no longer an epistemic possibility. After all, even if I rule out some action as irrational, I might slip or stumble, or be deceived by an evil demon, or change my mind, say ‘to hell with rationality’ and choose it anyway. This objection cannot get very far. If the non-trivial weak dominance principle is to do work for the benchmark theorist, then a much more robust conception of epistemic possibility than that articulated in this objection is needed. Although anticipating making an error might sometimes be relevant in rationally choosing an action, it is not helpful to include such possibilities as relevant epistemic possibilities, because doing so threatens to undermine the usefulness of the non-trivial weak dominance principle. Suppose I faced a choice problem in which one action involved—the exact details depending upon the state of the world—being gruesomely murdered, and another action involved—again with details depending upon the state of the world—being heavily rewarded. I could say with firm conviction that it is certain I would not choose gruesome murder. My choosing it would not be a relevant possibility, even though in the same breath I could accept that, in some sense, it is an epistemic possibility that I would make such a choice. Whatever sense that might be is not the one that should be read into the formulation of non-trivial weak domination and hence of the non-trivial weak dominance principle, for if it were, the principle would be far too strong.
It would rule out A in the example above, but it would also rule out not taking the lottery ticket. I am certain that it is a losing ticket: I saw the results last night; my memory’s good; I am not drunk or otherwise intoxicated; it was announced on the news that no ticket bought had won, etc. Now of course, it remains an epistemic possibility in some very weak sense that it is not a losing ticket (I am or was being deceived by an evil demon, etc.), but that possibility is not relevant to my decision.

23 This, of course, should not be given a causal or counterfactual reading. In the lottery ticket example, if it turns out that the ticket is a winner then I must’ve chosen it. In Newcomb’s problem, if it turns out that box B has the million then I must’ve chosen to one-box.

A natural way to define the requisite notion of epistemic possibility, given that it arises in a decision-theoretic environment, is to do so in terms of agents’ credences. We cannot say that a state S is epistemically possible for agent α iff α assigns S positive prior credence. This condition is too strong. A state assigned zero prior credence by an agent need not be a state the agent believes cannot occur. It is a state the agent is almost sure cannot be actual. Problems with uncountably many states provide obvious examples. Suppose I have a spinner that can take values between 0 and 2π. I spin the spinner, keeping its value hidden from you, and invite you to choose an integer between 0 and 6. I tell you that once you have chosen, I will give you (x − a)² utils, where x is the spinner’s value and a is your choice. If you uniformly distribute your credences over the spinner’s value, then for any value between 0 and 2π you will have credence 0 that the spinner has taken on that value. But clearly we should not interpret you as believing, for any value, that that value cannot be the spinner’s actual value.
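The spinner point can be checked numerically: under a uniform credence distribution, every exact value receives credence zero, while every interval of positive width receives positive credence (a sketch; the particular numbers are mine):

```python
import math

lo, hi = 0.0, 2 * math.pi  # the spinner's range of possible values

def credence(a, b):
    """Credence that the spinner lands in [a, b], under a uniform distribution."""
    return max(0.0, min(b, hi) - max(a, lo)) / (hi - lo)

x = 1.234
print(credence(x, x))                  # 0.0 -- yet x is not thereby ruled out as impossible
print(credence(x - 0.1, x + 0.1) > 0)  # True
```

Zero credence in an exact value is thus compatible with regarding that value as a live possibility, which is why epistemic possibility cannot be defined simply as positive prior credence.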
There is another problem with defining epistemic possibility in the way described above. An agent might not have well-defined prior credences over all states when presented with a decision problem (even though posterior credences for states given actions might be defined). The above definition precludes such states from counting as epistemically possible. This is absurd, as the limiting case where an agent has no well-defined prior credences over any states demonstrates. Instead, the right way to define epistemic possibility for present purposes seems to be as follows: a state S is epistemically impossible for agent α iff α’s prior credences are such that α is sure that S is not actual. A state S is epistemically possible iff it is not epistemically impossible. For simplicity, in the following exposition, we will consider only problems such that if an agent assigns a proposition zero credence then that agent is sure that that proposition is false.

Suppose that a perfectly reliable predictor put £1 million in box Z iff she predicted that I’d choose gruesome death. I am sure that I will not choose gruesome death, so I am sure (since the predictor is infallible) that the predictor did not predict I would choose gruesome death. I am sure that the actual state of the world is one in which that prediction wasn’t made, that the predictor did not put £1 million in box Z, etc. This state of the world is, in the sense described above, epistemically impossible. This seems like a perfectly reasonable pattern of reasoning to enter into while deliberating about a rational course of action. It should not be confused with the far stronger, widely disputed and presumably false claim that an agent need assign prior credences to all available actions before a rational action can be chosen.

Now consider problem 5. If, when confronted with the problem, the agent regards all three states as live possibilities, i.e.
for each of them she is not sure that it will not occur, then both A and B are non-trivially weakly dominated by C.

Table 14: Payoffs 5
Action/State   S1    S2    S3
A              4     3     8
B              3     5     8
C              4     5     8

Table 15: Posterior credences 5
Action/State   S1    S2    S3
A              0.4   0     0.6
B              0     0.7   0.3
C              0     0     1

Since A is non-trivially weakly dominated, our agent employs the non-trivial weak dominance principle to reject it as a rational course of action. It is ruled out by the agent as a course of action: an action the agent is sure not to choose. But since state S1 is the actual state only if A is chosen, state S1 is now regarded as epistemically impossible. And since S1 is the only state in which the utility derived from C exceeds that from B, it is no longer the case that C non-trivially weakly dominates B. ECV-maximization together with the non-trivial weak dominance principle leaves B as permissible at this stage, although A was eliminated as impermissible. However, if the agent had ruled out B as non-trivially weakly dominated when first approaching the problem, A would have been similarly left as permissible. Finally, the agent could have ruled both A and B out at the same time, leaving only C as permissible. Presumably it is this last arrangement of rejections that is the right one, but this fact is not explained or justified by the non-trivial weak dominance principle. Indeed it undermines the claim that the principle is correct, since it was by application of the principle that the other arrangements of rejection were reached. This casts doubt on the coherence of the principle as a principle governing rational choice, at least in scenarios where states are not statistically independent of actions.

It might be objected that agents do not revise their credences about states in the way suggested, and so the problem of order dependence described above does not arise.
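The order dependence just described can be verified directly. The sketch below (function names mine) counts a state as epistemically possible iff some still-live action gives it positive posterior credence, and applies the non-trivial weak dominance principle one elimination at a time:

```python
# Problem 5 (tables 14 and 15): payoffs and posterior credences per state.
pay = {'A': [4, 3, 8], 'B': [3, 5, 8], 'C': [4, 5, 8]}
cr  = {'A': [0.4, 0.0, 0.6], 'B': [0.0, 0.7, 0.3], 'C': [0.0, 0.0, 1.0]}

def possible_states(live):
    """A state is epistemically possible iff some still-live action makes it live."""
    return [s for s in range(3) if any(cr[a][s] > 0 for a in live)]

def ntwd(x, y, states):
    """x non-trivially weakly dominates y: weak dominance in every state, plus a
    strict advantage in some epistemically possible state."""
    return (all(pay[x][s] >= pay[y][s] for s in range(3))
            and any(pay[x][s] > pay[y][s] for s in states))

def eliminate(order):
    """Apply the non-trivial weak dominance principle to candidates in `order`."""
    live = {'A', 'B', 'C'}
    for victim in order:
        states = possible_states(live)
        if victim in live and any(ntwd(a, victim, states) for a in live - {victim}):
            live.remove(victim)
    return live

print(sorted(eliminate(['A', 'B'])))  # ['B', 'C']: A goes first, S1 drops out, B survives
print(sorted(eliminate(['B', 'A'])))  # ['A', 'C']: B goes first, S2 drops out, A survives
```

The surviving set of permissible actions thus depends only on the order of elimination, exactly as claimed above.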
I withhold judgment about whether such revision is required by rationality (although it seems compelling that, at least in a large range of cases, rational agents should exploit all the information available to them). Whether agents do revise in practice is surely an empirical matter, but it seems very plausible that in certain cases—such as the ‘gruesome death’ example given above, or Wedgwood’s own examples of ‘dreadful courses of action’—some agents would do this and would not count as irrational for doing so.

I have argued against ruling out weakly dominated actions simpliciter (i.e. against the weak dominance principle). If the weak dominance principle is rejected, an unacceptable gap is left in benchmark theory, since ECV-maximization alone does not suffice to rule out intuitively irrational dominated actions from certain decision problems. An alternative principle, the non-trivial weak dominance principle, was considered as a candidate for filling the gap, but was found to be problematic for the reasons described above.

4 An apparent counterexample to benchmark theory

So far the criticisms of benchmark theory presented in this paper have been ‘bottom up’, i.e. I have argued that some of the foundational principles on which the theory is built, in particular the weak dominance principle, are both unmotivated and problematic. In this section I will change tactics and offer a ‘top-down’ criticism. Even if BT can be given a solid foundation, there are decision problems for which it yields an intuitively wrong result. Such a problem is offered here as a counterexample to BT.

Before constructing the counterexample, it will be instructive to consider an example problem that Wedgwood gives. This problem forms part of Wedgwood’s response to Briggs. He uses it to argue that it is not always irrational to play a nearly dominated strategy.
I agree with his verdict here, but the arguments he employs I will later use to defend an action that BT rules irrational to choose. Furthermore, I suggest that these same arguments, applied differently to this very example, produce a conclusion problematic for BT. The decision problem he considers is as follows (problem 6):

Table 16: Payoffs 6
Action/State   S1    S2      S3
A              1     0       0
B              0     3000    9000
C              0     9000    3000

Table 17: Posterior credences 6
Action/State   S1     S2     S3
A              0.9    0.05   0.05
B              0.05   0.9    0.05
C              0.05   0.05   0.9

There are weightings that will yield under BT any of the three actions as rational, and hence all three are absolutely rationally permissible. Wedgwood argues that this is the right result, and I agree up to a point. It might seem prima facie odd that A is rationally permissible, since B and C possibly result in far larger payoffs (or, in Wedgwood’s backstory, possibly save many more lives). However, as Wedgwood points out, one never chooses between saving one life and saving thousands in this problem. The world is such that one can save thousands, which is good, or only one, not so good, but one is not faced with a choice between these ways the world could be. Indeed it is in just this sort of problem that Gandalf-style reasoning clarifies the nature of the choice one faces.

Wedgwood offers two ways in which A seems a palatable choice. First, when state S1 is given a high prior credence, A seems like a very sensible choice (and under this circumstance CDT will recommend A). Second, Wedgwood suggests that one can conceive of the problem in such a way as to make B and C seem like poor choices. In state S1 one is pretty powerless to act, but A uses that power prudently. In choosing A one should believe that one is nearly powerless but using one’s powers wisely. In states S2 and S3, one is powerful. Hence in choosing either B or C one should believe that one is powerful but using one’s powers badly.
Here it sounds like Wedgwood is appealing to ratifiability, in Jeffrey’s (1983) sense. An action is ratifiable iff it maximizes the utility an agent expects to receive given that she chooses that action. For example, one-boxing in Newcomb’s problem is not ratifiable because it does not maximize the utility one expects to receive given that one has one-boxed. Given that one has one-boxed, one expects there to be £1,000,000 in box B (and £1000 in box A), and under these circumstances one maximizes one’s utility by two-boxing. Two-boxing is ratifiable: given that one has two-boxed, one expects there to be nothing in box B, and under such circumstances one maximizes one’s utility by two-boxing. In the problem at hand, action A is the only ratifiable action.

It strikes me that both of these arguments backfire for the benchmark theorist. In the first instance, suppose that state S1 has very low prior credence. Presumably precisely the same line of reasoning that made A look very sensible when S1 enjoyed high prior credence makes it look like a poor choice when S1 has low prior credence. If one strongly believes oneself to be in S2 or S3, A seems a terrible choice. Since BT ignores prior credences, it will still yield A as absolutely rationally permissible under such circumstances. The benchmark theorist can bite the bullet here and maintain that it still is, but in that case we are surely owed some explanation as to why. Turning to the second argument, which Wedgwood (2011, p. 31) describes as ‘the right way...to conceive of this choice situation’, one can argue that A is nevertheless uniquely ratifiable and that intuitions to the contrary are mistaken. This is what the Jeffrey-style evidential decision theorist would maintain. However, such a decision theorist would reject B and C as irrational because they are not ratifiable. This goes too far for the benchmark theorist, since B and C are still absolutely rationally permissible.
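The ratifiability test just described is mechanical enough to be worth spelling out. The following is a minimal sketch (not from Wedgwood’s or Jeffrey’s texts) applying it to the two-box Newcomb case above; the perfect-predictor conditional credences and the identification of utility with money are illustrative assumptions.

```python
# Ratifiability check for Newcomb's problem (illustrative sketch).
# States: "pred-one" = one-boxing was predicted (box B holds 1,000,000),
#         "pred-two" = two-boxing was predicted (box B is empty).
payoffs = {
    "one-box": {"pred-one": 1_000_000, "pred-two": 0},
    "two-box": {"pred-one": 1_001_000, "pred-two": 1_000},
}

# Credences over states conditional on having chosen each act
# (assuming a perfectly reliable predictor).
cred_given = {
    "one-box": {"pred-one": 1.0, "pred-two": 0.0},
    "two-box": {"pred-one": 0.0, "pred-two": 1.0},
}

def expected_utility(act, credence):
    """Expected utility of `act` under a credence function over states."""
    return sum(credence[s] * payoffs[act][s] for s in credence)

def ratifiable(act):
    """An act is ratifiable iff it maximizes expected utility under the
    credences one would have given that one has chosen it."""
    cred = cred_given[act]
    return all(expected_utility(act, cred) >= expected_utility(other, cred)
               for other in payoffs)

for act in payoffs:
    print(act, "ratifiable:", ratifiable(act))
# Given one-boxing, two-boxing pays £1,000 more, so one-boxing fails the test;
# two-boxing passes it.
```

The check simply re-evaluates every rival act against the credences induced by one’s own choice, which is all Jeffrey’s criterion requires.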
If we conceive of this problem in ‘the right way’, B and C look like poor choices. If one is, rightly or wrongly, guided by the maxim ‘Make one’s choice such that one uses wisely the powers one expects to have given that choice’, then one ought not to choose B or C.24

4.1 A counterexample

The observations just made are by no means devastating to benchmark theory. A benchmark theorist could think that, although it is nearly dominated, A is not irrational to choose, while endorsing neither of the arguments Wedgwood gives for thinking it is not. I will now construct what seems to be a plausible counterexample to BT: a rational-looking choice of action in this problem is irrational to choose according to BT. The sorts of argument discussed above seem to vindicate this action as a rational choice, and it enjoys independent plausibility as such.

The predictor is back (with fresh boxes) and offers you a choice between taking just box A and taking just box B. If the predictor predicted that you would choose A, she has put £1m in there and nothing in B. If the predictor predicted that you would choose B, she has put £5m in there and £4.5m in A. The predictor is as reliable as you like, and the boxes were sealed, etc., after the prediction. What do you choose?

The above can be conceived as follows: each state has a default quantity associated with it, and a bonus has been placed in the predicted box. In the A-predicted state the default is £0, with a £1m bonus in A; in the B-predicted state the default is £4.5m, with a £500,000 bonus in B. The problem generalizes as in Table 18, where a is the default when A is predicted and c is the bonus placed in A in that case, and b is the default when B is predicted and d is the bonus placed in B in that case.

24 An anonymous reviewer suggests that perhaps Wedgwood does not conceive of ratifiability as a necessary condition on rational decision-making, but rather as a feature that counts in favour of rationally choosing an action.
If this is so, then Wedgwood can maintain that the ratifiability of A may explain why it is rationally permissible to choose, while denying that B and C are rationally impermissible simply because they are not ratifiable choices. However, if Wedgwood does think this, it still seems that we are owed some explanation of why B and C are rationally permissible when they are to be thought of as using one’s powers badly. Wedgwood might say here that, while this fact should count against them as rational choices, there remain other considerations in their favour that it does not outweigh.

Table 18: Payoffs 7
Action/State   A predicted   B predicted
A              a + c         b
B              a             b + d

Solving the problem with BT: let us assign weighting w to the highest payoff in any state and 1 − w to the other payoff. This gives us benchmark (a + c)w + a(1 − w) = a + cw when A is predicted, and benchmark b(1 − w) + (b + d)w = b + dw when B is predicted. This yields the following comparative values for outcomes:

Table 19: Comparative values 7
Action/State   A predicted   B predicted
A              c(1 − w)      −dw
B              −cw           d(1 − w)

If we assume the predictor is perfectly reliable (or, more accurately, that the agent facing the problem has credences to that effect), then the evidentially expected comparative values are as follows:

Table 20: Evidentially expected comparative values 7
Action/State   A predicted   B predicted   Total
A              c(1 − w)      0             c(1 − w)
B              0             d(1 − w)      d(1 − w)

If w < 1, A has a greater ECV than B just in case the ‘mark-up’ under A-prediction (c) exceeds that under B-prediction (d). When w = 1 (so that maximizing ECV amounts to minimizing evidentially expected regret), both A and B have the same ECV of 0. Since in the concrete problem the mark-up c = £1m exceeds d = £500,000, A maximizes ECV under every weighting, and so according to BT it is absolutely rationally required to take A. Intuitively it is not irrational to choose B here, so this result is problematic for BT. Arguably it is not irrational to choose A either. One might be playing safe: A maximizes minimum gain. If we were told by a reliable witness that he had seen the predictor putting £1m in A, for instance, it might be rational to choose A.
By the same token, it would be rational to choose B if we were told by a reliable witness that he had seen the predictor putting £5m in B. Perhaps Wedgwood’s suggestion of the distinction between absolute rational requirement and absolute rational permissibility is too strong. We could turn instead to his suggestion that an agent arbitrarily selects a weighting and chooses rationally by choosing an action that maximizes ECV under this weighting. Since B maximizes ECV when w = 1, BT does not rule out B as a rational choice under this way of viewing different weightings.

But now consider the problem adjusted ever so slightly: we let the predictor be arbitrarily reliable, but not perfect. Specifically, there is some ε with 1 ≫ ε > 0 such that the predictor has a 1 − ε chance of being correct in her prediction. Our agent forms the following conditional credences:

Table 21: Conditional credences 7′
Action/State   A predicted   B predicted
A              1 − ε         ε
B              ε             1 − ε

Comparative values are as before, and the evidentially expected comparative values in this problem are therefore:

Table 22: Evidentially expected comparative values 7′
Action/State   A predicted        B predicted        Total
A              c(1 − w)(1 − ε)    −dwε               c(1 − w)(1 − ε) − dwε
B              −cwε               d(1 − w)(1 − ε)    d(1 − w)(1 − ε) − cwε

Theorem 3 If c > d then the evidentially expected comparative value of A exceeds that of B regardless of the weighting.

Proof 3 We need to show ECV_w(A) > ECV_w(B) for an arbitrary weighting w. We have

ECV_w(A) = c(1 − w)(1 − ε) − dwε    (19)

and

ECV_w(B) = d(1 − w)(1 − ε) − cwε.    (20)

Hence:

ECV_w(A) − ECV_w(B) = (c − d)((1 − w)(1 − ε) + wε).    (21)

We’ll denote this quantity X. We now have three cases to consider: w = 1, w = 0 and 1 > w > 0. If w = 1 then X = (c − d)ε > 0, because ε, c − d > 0. If w = 0 then X = (c − d)(1 − ε) > 0, because 1 − ε, c − d > 0 (since ε < 1). If 1 > w > 0 then 1 − w > 0, so only positive quantities are being added and multiplied on the RHS of (21), and hence X > 0. By cases, X > 0 and hence ECV_w(A) > ECV_w(B). 
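Theorem 3 can also be checked numerically. The sketch below (the error rate ε = 0.01 and the grid of weightings are arbitrary illustrative choices) confirms that the gap between the two ECVs matches the factorization in (21) and stays positive at every weighting whenever c > d.

```python
# Numerical sanity check of Theorem 3 (illustrative sketch).
c, d = 1_000_000, 500_000   # bonuses, with c > d
eps = 0.01                  # predictor error rate, 0 < eps << 1

def ecv_A(w):
    """ECV of A in the imperfect-predictor problem (Table 22)."""
    return c * (1 - w) * (1 - eps) - d * w * eps

def ecv_B(w):
    """ECV of B in the imperfect-predictor problem (Table 22)."""
    return d * (1 - w) * (1 - eps) - c * w * eps

# Equation (21): the gap factors as (c - d) * ((1 - w)(1 - eps) + w * eps),
# which is positive for every w in [0, 1] whenever c > d and 0 < eps < 1.
for i in range(101):
    w = i / 100
    gap = ecv_A(w) - ecv_B(w)
    factored = (c - d) * ((1 - w) * (1 - eps) + w * eps)
    assert abs(gap - factored) < 1e-6   # gap matches (21) up to rounding
    assert gap > 0                      # A strictly beats B at this weighting
```

In particular, even at w = 1, where the perfect-predictor version tied the two options, the gap is (c − d)ε > 0, which is why B now maximizes ECV under no weighting at all.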
The upshot of Theorem 3 is that B maximizes ECV under no weighting, and so it is never vindicated as a rational choice of action by BT. The same arguments that applied earlier apply here: if I have a high prior that the predictor predicted B, then B seems sensible. Furthermore, B is ratifiable. One uses the powers one expects to have wisely if one chooses B. Aside from these arguments, B simply does not look like an irrational choice of action here. Either BT has gone wrong in this case or the benchmark theorist needs to explain why choosing B, contrary to appearances, really is irrational.

5 Conclusion

A number of objections to benchmark theory have been put forward. In particular, it has been suggested that the weak dominance principle, especially if the Independence of Irrelevant Alternatives is to be rejected, bedevils the theory, and it seems that there are decision problems for which BT yields wrong results—or at least results that are wrong both intuitively and by the reasoning that Wedgwood offers elsewhere to defend the results BT gives for other problems. There are two conclusions that I want to draw. The first is that Gandalf’s Principle seems worth pursuing as a principle of rational choice and perhaps should find a foundational role in a decision theory equipped to handle Egan’s counterexamples to CDT. However, it seems that BT is not a promising avenue along which to do this. If it is to succeed, then I suggest that, at the very least, a principle more subtle than weak dominance needs to be motivated and incorporated into the theory. The second concerns the wider implications of these considerations for the theory of rational choice. For one thing, I think that considerations of this sort place important constraints on what a good decision theory might be like.
I think there is good reason to be hesitant to accept any decision theory that entails the weak dominance principle, but I also think it unlikely that good independent grounds for motivating it can be found. A weaker principle such as non-trivial weak dominance seems more plausible, and—while it does not seem suitable for BT—it might deserve consideration as another constraint that a decision theory should satisfy.25

References

[1] Briggs, R. (2010). Decision-Theoretic Paradoxes as Voting Paradoxes. Philosophical Review, 119(1), 1–30.
[2] Eells, E. (1981). Causality, Utility and Decision. Synthese, 48(2), 295–329.
[3] Egan, A. (2007). Some Counterexamples to Causal Decision Theory. Philosophical Review, 116(1), 93–114.
[4] Gibbard, A. and W. Harper (1978). Counterfactuals and Two Kinds of Expected Utility. In C. A. Hooker, J. J. Leach, and E. F. McClennen (Eds.), Foundations and Applications of Decision Theory (pp. 125–162). Dordrecht: Reidel.
[5] Jeffrey, R. (1983). The Logic of Decision (2nd ed.). Chicago: University of Chicago Press.
[6] Kahneman, D. and A. Tversky (1986). The Behavioural Foundations of Economic Theory. The Journal of Business, 59(4), S251–S278.
[7] Lewis, D. (1981). Causal Decision Theory. Australasian Journal of Philosophy, 59(1), 5–30.
[8] Marcus, D. A., L. Schraff, and D. C. Turk (1997). A Double-Blind Provocative Study of Chocolate as a Trigger of Headache. Cephalalgia, 17, 855–862.
[9] Nozick, R. (1969). Newcomb’s Problem and Two Principles of Choice. In N. Rescher (Ed.), Essays in Honor of Carl G. Hempel (pp. 114–146). Dordrecht: Reidel.
[10] Sen, A. (1993). Internal Consistency of Choice. Econometrica, 61(3), 495–521.
[11] Stalnaker, R. (1972). Letter to David Lewis. In W. Harper, R. Stalnaker, and G. Pearce (Eds.), Ifs: Conditionals, Belief, Decision, Chance, and Time (pp. 151–152). Dordrecht: Reidel.
[12] Wedgwood, R. (2011).
Gandalf’s Solution to the Newcomb Problem. Synthese. doi:10.1007/s11229-011-9900-1.
[13] Weirich, P. (2004). Realistic Decision Theory: Rules for Nonideal Agents in Nonideal Circumstances. New York: Oxford University Press.