Bounded Rationality
Gregory Wheeler
Frankfurt School of Finance & Management
[email protected]
Herbert Simon introduced the term ‘bounded rationality’ (Simon 1957, p. 198) as a shorthand for his
brief against neoclassical economics and his call to replace the perfect rationality assumptions of homo
economicus with a conception of rationality tailored to cognitively limited agents.
Broadly stated, the task is to replace the global rationality of economic man with the kind
of rational behavior that is compatible with the access to information and the computa-
tional capacities that are actually possessed by organisms, including man, in the kinds of
environments in which such organisms exist (Simon 1955a).
‘Bounded rationality’ has since come to refer to a wide range of descriptive, normative, and prescriptive
accounts of effective behavior which depart from the assumptions of perfect rationality. This entry aims
to highlight key contributions—from the decision sciences, economics, cognitive- and neuropsychology,
biology, computer science, and philosophy—to our current understanding of bounded rationality.
Bounded rationality has come to broadly encompass models of effective behavior that weaken, or reject
altogether, the idealized conditions of perfect rationality assumed by models of economic man. In this
section we state what models of economic man are committed to and their relationship to expected utility
theory. In later sections we review proposals for departing from expected utility theory.
The perfect rationality of homo economicus imagines a hypothetical agent who has complete infor-
mation about the options available for choice, perfect foresight of the consequences from choosing those
options, and the wherewithal to solve an optimization problem (typically of considerable complexity)
that identifies an option which maximizes the agent’s personal utility. The meaning of ‘economic man’
has evolved from John Stuart Mill’s description of a hypothetical, self-interested individual who seeks
to maximize his personal utility (1844); to Jevons's mathematization of marginal utility to model an economic consumer (1871); to Frank Knight's portrayal of the slot-machine man of neo-classical economics (1921), which is Jevons's calculator man augmented with perfect foresight and determinately specified
risk; to the modern conception of an economically rational economic agent conceived in terms of Paul
Samuelson’s revealed preference formulation of utility (1947) which, together with von Neumann and
Morgenstern’s axiomatization (1944), changed the focus of economic modeling from reasoning behavior
to choice behavior.
Modern economic theory begins with the observation that human beings like some consequences
better than others, even if they only assess those consequences hypothetically. A perfectly rational
person, according to the canonical paradigm of synchronic decision making under risk, is one whose
comparative assessments of a set of consequences satisfy the recommendation to maximize expected
utility. Yet, this recommendation to maximize expected utility presupposes that qualitative comparative
judgements of those consequences (i.e., preferences) are structured in such a way (i.e., satisfy specific
axioms) so as to admit a mathematical representation that places those objects of comparison on the
real number line (i.e., as inequalities of mathematical expectations), ordered from worst to best. This
structuring of preference through axioms to admit a numerical representation is the subject of expected
utility theory.
A prospect P is simply the set of consequence-probability pairs, P = (x1 , p1 ; x2 , p2 ; . . . ; xn , pn ). By con-
vention, a prospect’s consequence-probability pairs are ordered by the value of each consequence, from
least favorable to most. When prospects P, Q, R are comparable under a specific preference relation ≽, and the (ordered) set of consequences X is fixed, then prospects may be simply represented by a vector of probabilities.
The expected utility hypothesis (Bernoulli 1738) states that rational agents ought to maximize ex-
pected utility. If your qualitative preferences over prospects satisfy the following three constraints, namely ordering, an Archimedean (continuity) condition, and independence, then your preferences will maximize expected utility (von Neumann and Morgenstern 1944).
A1. Ordering. The ordering condition states that preferences are both complete and transitive. For all prospects P, Q, completeness entails that either P ≽ Q, Q ≽ P, or both P ≽ Q and Q ≽ P, written P ∼ Q. For all prospects P, Q, R, transitivity entails that if P ≽ Q and Q ≽ R, then P ≽ R.
A2. Archimedean. For all prospects P, Q, R such that P ≻ Q and Q ≻ R, there exists some p ∈ (0, 1) such that (P, p; R, (1 − p)) ∼ Q, where (P, p; R, (1 − p)) is the compound prospect that yields the prospect P as a consequence with probability p or yields the prospect R with probability 1 − p.¹
A3. Independence. For all prospects P, Q, R, if P ≽ Q, then (P, p; R, (1 − p)) ≽ (Q, p; R, (1 − p)) for all p.
Specifically, if A1, A2, and A3 hold, then there is a real-valued function V(·) of the form

V(P) = p1 u(x1) + p2 u(x2) + · · · + pn u(xn),

where P is any prospect and u(·) is a von Neumann and Morgenstern utility function defined on the set of consequences X, such that P ≽ Q if and only if V(P) ≥ V(Q). In other words, if your qualitative comparative judgements of prospects at a given time satisfy A1, A2, and A3, then those qualitative judgments are representable numerically by inequalities of functions of the form V(·), yielding a logical calculus on an interval scale for determining the consequences of your qualitative comparative judgments at that time.
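To make the representation concrete, here is a minimal Python sketch (not from the original text) that scores prospects by V(·); the square-root utility function, the prospects, and all numbers are illustrative assumptions.

    # Minimal sketch: prospects as consequence-probability pairs, ranked by
    # expected utility V(P) = sum_i p_i * u(x_i). The utility function below
    # is a hypothetical example, not taken from the text.

    def V(prospect, u):
        """Expected utility of a prospect [(x1, p1), ..., (xn, pn)]."""
        return sum(p * u(x) for x, p in prospect)

    u = lambda x: x ** 0.5          # hypothetical concave utility over monetary gains
    P = [(0, 0.5), (100, 0.5)]      # even chance of 0 or 100
    Q = [(40, 1.0)]                 # 40 for certain

    print(V(P, u) >= V(Q, u))       # False: this agent prefers the sure 40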
Alternatives to A1. Weakening the ordering axiom introduces the possibility for an agent to forgo
comparing a pair of alternatives, an idea both Keynes and Knight advocated (Keynes 1921; Knight 1921).
Specifically, dropping the completeness axiom allows an agent to be in a position to neither prefer one
option to another nor be indifferent between the two (Koopman 1940; Aumann 1962; Fishburn 1982).
Decisiveness, which the completeness axiom encodes, is more mathematical convenience than principle
of rationality. The question, which is the question that every proposed axiomatic system faces, is what
logically follows from a system which allows for incomplete preferences. Led by (Aumann 1962), early
¹ For compound prospects with an uncountable number of consequences, the Archimedean condition is replaced by a continuity condition, which maintains that the relevant preference sets are closed in the topology of weak convergence.
axiomatizations of rational incomplete preferences were suggested by (Giles 1976) and (Giron and Rios
1980), and later studied by (Karni 1985), (Bewley 2002), (Walley 1991), (Seidenfeld, Schervish, and
Kadane 1995), (Ok 2002), (Nau 2006), (Galaabaatar and Karni 2013) and (Zaffalon and Miranda 2015).
In addition to accommodating indecision, such systems also allow you to reason about someone
else’s (possibly) complete preferences when your information about that other agent’s preferences is
incomplete.
Dropping transitivity limits extendability of elicited preferences (Luce and Raiffa 1957), since the
omission of transitivity as an axiomatic constraint allows for cycles and preference reversals. Although
violations of transitivity have been long considered both commonplace and a sign of human irrationality
(May 1954; Tversky 1969), reassessments of the experimental evidence challenge this received view
(Mongin 2000; Regenwetter, Dana, and Davis-Stober 2011). The axioms impose synchronic consistency
constraints on preferences, whereas the experimental evidence for violations of transitivity commonly
conflate dynamic and synchronic consistency (Regenwetter, Dana, and Davis-Stober 2011). Specifically,
the fact that a person's preferences at one moment in time are inconsistent with his preferences at another time is no evidence that the person holds logically inconsistent preferences at any single moment in time.
Arguments to limit the scope of transitivity in normative accounts of rational preference similarly point
to diachronic or group preferences, which likewise do not contradict the axioms (Kyburg 1978; Schick
1986; Anand 1987; Bar Hillel and Margalit 1988). Arguments that point to psychological processes
or algorithms that admit cycles or reversals of preference over time also point to a misapplication of,
rather than a counter-example to, the ordering condition. Finally, for decisions that involve explicit
comparisons of options over time, violating transitivity may be rational. For example, given the goal
of maximizing the rate of food gain, an organism’s current food options may reveal information about
food availability in the near future by indicating that a current option may soon disappear or that a better
option may soon reappear. Information about availability of options over time can, and sometimes does,
warrant non-transitive choice behavior over time that nevertheless maximizes food gain (McNamara,
Trimmer, and Houston 2014).
Alternatives to A2. Dropping the Archimedean axiom allows for an agent to have lexicographic pref-
erences (Blume, Brandenburger, and Dekel 1991); that is, the omission of A2 allows the possibility
for an agent to prefer one option infinitely more than another. One motivation for developing a non-Archimedean version of expected utility theory is to address a gap in the foundations of the standard
subjective utility framework that prevents a full reconciliation of admissibility (i.e., the principle that one
ought not select a weakly dominated option for choice) with full conditional preferences (i.e., that for
any event, there is a well-defined conditional probability to represent the agent’s conditional preferences)
(Pedersen 2014). Specifically, the standard subjective expected utility account cannot accommodate
conditioning on zero-probability events, which is of particular importance to game theory (Hammond
1994). Non-Archimedean variants of expected utility theory turn to techniques from nonstandard analy-
sis (Goldblatt 1998), full conditional probabilities (Renyi 1955; Popper 1959; Dubins 1975; Coletti and
Scozzafava 2002), and lexicographic probabilities (Halpern 2010; Brickhill and Horsten 2016), and are
all linked to imprecise probability theory (Wheeler and Cozman 2018).
Non-compensatory single-cue decision models, such as the Take-the-Best heuristic (Section 7.2), ap-
peal to lexicographically ordered cues, and admit a numerical representation in terms of non-Archimedean
expectations (Arló-Costa and Pedersen 2011).
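As a rough illustration of a non-compensatory lexicographic rule in the spirit of Take-the-Best, the following Python sketch decides between two options by the first cue that discriminates between them; the cue names, values, and their ordering are hypothetical.

    # Sketch of a one-reason, lexicographic decision rule: cues are inspected
    # in an assumed order of validity, and the first cue that discriminates
    # between the two options decides; all remaining cues are ignored.

    def take_the_best(a, b, cue_order):
        """Return 'a' or 'b' according to the first discriminating cue, or None."""
        for cue in cue_order:
            if a[cue] != b[cue]:
                return "a" if a[cue] > b[cue] else "b"
        return None  # no cue discriminates: fall back to guessing

    city_a = {"capital": 1, "soccer_team": 0, "airport": 1}
    city_b = {"capital": 0, "soccer_team": 1, "airport": 1}
    # The "capital" cue already discriminates, so the later cues never matter.
    print(take_the_best(city_a, city_b, ["capital", "soccer_team", "airport"]))  # "a"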
Alternatives to A3. A1 and A2 together entail that V (·) assigns a real-valued index to prospects such
that P ≽ Q if and only if V(P) ≥ V(Q). The independence axiom, A3, encodes a separability property for
choice, one that ensures that expected utilities are linear in probabilities. Motivations for dropping the
independence axiom stem from difficulties in applying expected utility theory to describe choice behav-
ior, including an early observation that humans evaluate possible losses and possible gains differently.
Although expected utility theory can represent a person who either gambles or purchases insurance,
Friedman and Savage remarked in their early critique of von Neumann and Morgenstern's axiomatization, it cannot simultaneously do both (Friedman and Savage 1948).
The principle of loss aversion (Kahneman and Tversky 1979; Rabin 2000) suggests that the subjective weights we assign to potential losses are larger than those we assign to potential gains. For example, the endowment effect (Thaler 1980)—the observation that people tend to value a good more highly when it is viewed as a potential loss than when it is viewed as a potential gain—is supported by neurological evidence for gains and losses being processed by different regions of the brain (Rick 2011).
However, even granting the affective differences in how we process losses and gains, those differences
do not necessarily translate to a general "negativity bias" (Baumeister, Bratslavsky, and Finkenauer
2001) in choice behavior (Hochman and Yechiam 2011; Yechiam and Hochman 2014). Yechiam and
colleagues report experiments in which participants do not exhibit loss aversion in their choices, such
as cases in which participants respond to repetitive situations that issue losses and gains and single-case
decisions involving small stakes. That said, observations of risk aversion (Allais 1953) and ambiguity
aversion (Ellsberg 1961) have led to alternatives to expected utility theory, all of which abandon A3.
Those alternative approaches include prospect theory (Section 2.4), regret theory (Bell 1982; Loomes
and Sugden 1982), and rank-dependent expected utility (Quiggin 1982).
Most models of bounded rationality do not even fit into this broad axiomatic family just outlined.
One reason is that bounded rationality has historically emphasized the procedures, algorithms, or psy-
chological processes involved in making a decision, rendering a judgment, or securing a goal (Section 2).
Samuelson’s shift from reasoning behavior to choice behavior abstracted away precisely these details,
however, treating them as outside the scope of rational choice theory. For Simon, that was precisely the
problem. A second reason is that bounded rationality often focuses on adaptive behavior suited to an or-
ganism's environment (Section 3). Since ecological modeling involves goal-directed behavior mediated
by the constitution of the organism and stable features of its environment, focusing on (synchronically)
coherent comparative judgments is often not, directly at least, the best way to frame the problem.
That said, one should be cautious about generalizations sometimes made about the limited role of
decision theoretic tools in the study of bounded rationality. Decision theory—broadly construed to
include statistical decision theory (Berger 1980)—offers a powerful mathematical toolbox even though
historically, particularly in its canonical form, it has traded in psychological myths such as "degrees of belief" and logical omniscience (Section 1.3). One benefit of studying axiomatic departures from
expected utility theory is to loosen the grip of Bayesian dogma to expand the range of possibilities for
applying a growing body of practical and powerful mathematical methods.
a remote digit of π would, in order to comply fully with the theory, have to compute that
digit, though this would really be wasteful if the cost of computation were more than the
prize involved. For the postulates of the theory imply that you should behave in accordance
with the logical implication of all that you know. Is it possible to improve the theory in this
respect, making allowances within it for the cost of thinking, or would that entail paradox, as
I am inclined to believe but unable to demonstrate? (Savage 1967, excerpted from Savage’s
prepublished draft. See notes in Seidenfeld et al., 2012)
Responses to Savage’s problem include a game-theoretic treatment proposed by I.J. Good (1983),
which swaps the extensional variable that is necessarily true for an intensional variable representing an
accomplice who knows the necessary truth but withholds enough information from you for you to be
(coherently) uncertain about what he knows. This trick changes the subject of your uncertainty, from
a necessarily true proposition that you cannot coherently doubt to a coherent guessing game about that
truth facilitated by your accomplice’s incomplete description. Another response sticks to the classical
line that failures of logical omniscience are deviations from the normative standard of perfect rationality
but introduces an index for incoherence to accommodate reasoning with incoherent probability assess-
ments (Schervish, Seidenfeld, and Kadane 2012). A third approach, suggested by de Finetti (1974), is
to restrict possible states of affairs to observable states with a finite verifiable procedure—which may
rule out theoretical states or any other that does not admit a verification protocol. Originally, what de
Finetti was after was a principled way to construct a partition over possible outcomes to distinguish
serious possible outcomes of an experiment from wildly implausible but logically possible outcomes,
yielding a method for distinguishing between genuine doubt and mere “paper doubts” (Peirce 1955).
Other proposals follow de Finetti’s line by tightening the admissibility criteria and include epistemically
possible events, which are events that are logically consistent with the agent’s available information;
apparently possible events, which include any event by default unless the agent has determined that it
is inconsistent with his information; and pragmatically possible events, which only includes events that
are judged sufficiently important (Walley 1991, §2.1).
The notion of apparently possible refers to a procedure for determining inconsistency, which is a
form of bounded procedural rationality (Section 2). The challenges of avoiding paradox, which Savage
alludes to, are formidable. However, work on bounded fragments of Peano arithmetic (Parikh 1971)
provided coherent foundations for exploring these ideas, which has been taken up specifically to formu-
late bounded-extensions of default logic for apparent possibility (Wheeler 2004) and more generally in
models of computational rationality (Lewis, Howes, and Singh 2014).
machine performing arithmetic. A curriculum for improving the arithmetical performance of elemen-
tary school children will differ from one designed to improve the performance of adults. Even though
the normative standard of Peano arithmetic is the same for both children and adults, stable psycholog-
ical differences in these two populations may warrant prescribing different approaches for improving
their arithmetic. Continuing, even though Peano’s axioms are the normative standard for full arithmetic,
nobody would prescribe Peano’s axioms for the purpose of improving anyone’s sums. There is no mis-
taking Peano’s axioms for a descriptive theory of arithmetical reasoning, either. Even so, a descriptive
theory of arithmetic will presuppose the Peano axioms as the normative standard for full arithmetic,
even if only implicitly. In describing how people sum two numbers, after all, one presumes that they are
attempting to sum two numbers rather than concatenate them, count out in sequence, or send a message
in code.
Finally, imagine an effective pedagogy for teaching arithmetic to children is known and we wish to
introduce children to cardinal arithmetic. A reasonable start on a prescriptive theory for cardinal arith-
metic for children might be to adapt as much of the successful pedagogy for full arithmetic as possible
while anticipating that some of those methods will not survive the change in normative standards from
Peano to (say) ZFC+. Some of those differences can be seen as a direct consequence of the change from
one standard to another, while other differences may arise unexpectedly from the observed interplay
between the change in task, that is, from performing full arithmetic to performing cardinal arithmetic,
and the psychological capabilities of children to perform each task.
To be sure, there are important differences between arithmetic and rational behavior. The objects
of arithmetic, numerals and the numbers they refer to, are relatively clear cut, whereas the objects of
rational behavior vary even when the same theoretical machinery is used. Return to expected utility
theory as an example. An agent may be viewed as deliberating over options with the aim to choose one
that maximizes his personal welfare, or viewed to act as if he deliberately does so without actually doing
so, or understood to do nothing of the kind but to instead be a bit part player in the population fitness of
his kind.
Separating the question of how to choose a normative standard from questions about how to eval-
uate or describe behavior is an important tool to reduce misunderstandings that arise in discussions of
bounded rationality. Even though Peano’s axioms would never be prescribed to improve, nor proposed
to describe, arithmetical reasoning, it does not follow that the Peano axioms of arithmetic are irrelevant
to descriptive and prescriptive theories of arithmetic. While it remains an open question whether the nor-
mative standards for human rational behavior admit axiomatization, there should be little doubt over the
positive role that clear normative standards play in advancing our understanding of how people render
judgments, or make decisions, and how they ought to do so.
Simon thought the shift in focus from reasoning behavior to choice behavior was a mistake. Since, in the 1950s, little was known about the processes involved in making judgments or reaching decisions, we were not in a position to freely abstract away all of those features from our mathematical models. Yet,
this ignorance also raised the question of how to proceed. The answer was to attend to the costs in effort
involved operating a procedure for making decisions and comparing those costs to the resources avail-
able to the organism using the procedure and, conversely, to compare how well an organism performs
in terms of accuracy (Section 8.2) with its limited cognitive resources in order to investigate models
with comparable levels of accuracy within those resource bounds. Effectively managing the trade-off
between the costs and quality of a decision involves another type of rationality, which Simon later called
procedural rationality (Simon 1976, p. 69).
In this section we highlight early, key contributions to modeling procedures for boundedly rational
judgment and decision-making, including the origins of the accuracy-effort trade-off, Simon’s satisficing
strategy, improper linear models, and the earliest effort to systematize several features of high-level,
cognitive judgment and decision-making, cumulative prospect theory.
2.1 Accuracy and Effort
Herbert Simon and I.J. Good were each among the first to call attention to the cognitive demands of
subjective expected utility theory, although neither one in his early writings abandoned the principle
of expected utility as the normative standard for rational choice. Good, for instance, referred to the
recommendation to maximize expected utility as the ordinary principle of rationality, whereas Simon
called the principle objective rationality and considered it the central tenet of global rationality. The
rules of rational behavior are costly to operate in both time and effort, Good observed, so real agents
have an interest in minimizing those costs (Good 1952, §7(i)). Efficiency dictates that one choose from
available alternatives an option that yields the largest result given the resources available, which Simon
emphasized is not necessarily an option that yields the largest result overall (Simon 1947, p. 79). So
reasoning judged deficient without considering the associated costs may be found meritorious once all
those costs are accounted for—a conclusion that a range of authors soon came to endorse, including
Amos Tversky:
It seems impossible to reach any definitive conclusions concerning human rationality in
the absence of a detailed analysis of the sensitivity of the criterion and the cost involved
in evaluating the alternatives. When the difficulty (or the costs) of the evaluations and the
consistency (or the error) of the judgments are taken into account, a [transitivity-violating
method] may prove superior (Tversky 1969).
Balancing the quality of a decision against its costs soon became a popular conception of bounded
rationality, particularly in economics (Stigler 1961), where it remains commonplace to formulate bound-
edly rational decision-making as a constrained optimization problem. On this view boundedly rational
agents are utility maximizers after all, once all the constraints are made clear (Arrow 2004). Another
reason for the popularity of this conception of bounded rationality is its compatibility with Milton Fried-
man’s as if methodology (Friedman 1953), which licenses models of behavior that ignore the causal
factors underpinning judgment and decision making. To say that an agent behaves as if he is a utility
maximizer is at once to concede that he is not but that his behavior proceeds as if he were. Similarly, to
say that an agent behaves as if he is a utility maximizer under certain constraints is to concede that he
does not solve constrained optimization problems but nevertheless behaves as if he did so.
Simon’s focus on computationally efficient methods that yield solutions that are good enough con-
trasts with Friedman’s as if methodology, since evaluating whether a solution is “good enough”, in
Simon’s terms, involves search procedures, stopping criteria, and how information is integrated in the
course of making a decision. Simon offers several examples to motivate inquiry into computationally
efficient methods. Here is one. Applying the game-theoretic minimax algorithm to the game of chess
calls for evaluating more chess positions than the number of molecules in the universe (Simon 1957,
p. 6). Yet if the game of chess is beyond the reach of exact computation, why should we expect everyday
problems to be any more tractable? Simon’s question is to explain how human beings manage to solve
complicated problems in an uncertain world given their meager resources. Answering Simon’s question,
as opposed to applying Friedman’s method to fit a constrained optimization model to observed behavior,
is to demand a model with better predictive power concerning boundedly rational judgment and deci-
sion making. In pressing this question of how human beings solve uncertain inference problems, Simon
opened two lines of inquiry that continue to this day, namely:
1. How do human beings actually make judgments and decisions?
2. How can the standard theories of global rationality be simplified to render them more tractable?
Simon’s earliest efforts aimed to answer the second question with, owing to the dearth of psycho-
logical knowledge at the time about how people actually make decisions, only a layman’s “acquaintance
with the gross characteristics of human choice” (Simon 1955a, p. 100). His proposal was to replace the optimization problem of maximizing expected utility with a simpler decision criterion he called satisficing, and, more generally, to pursue models with better predictive power.
2.2 Satisficing
Satisficing is the strategy of considering the options available to you for choice until you find one that
meets or exceeds a predefined threshold—your aspiration level—for a minimally acceptable outcome.
Although Simon originally thought of procedural rationality as a poor approximation of global ratio-
nality, and thus viewed the study of bounded rationality to concern “the behavior of human beings who
satisfice because they have not the wits to maximize” (Simon 1957, p. xxiv), there are a range of applica-
tions of satisficing models to sequential choice problems, aggregation problems, and high-dimensional
optimization problems, which are increasingly common in machine learning.
Given a specification of what will count as a good-enough outcome, satisficing replaces the optimiza-
tion objective from expected utility theory of selecting an undominated outcome with the objective of
picking an option that meets your aspirations. The model has since been applied to business (Bazerman
and Moore 2008; Puranam, Stieglitz, Osman, and Pillutla 2015), mate selection (Todd and Miller 1999)
and other practical sequential-choice problems, like selecting a parking spot (Hutchinson, Fanselow, and
Todd 2012). Ignoring the procedural aspects of Simon’s original formulation of satisficing, if one has
a fixed aspirational level for a given decision problem, then admissible choices from satisficing can be
captured by so-called ε-efficiency methods (Loridan 1984; White 1986).
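A minimal Python sketch of satisficing over sequentially encountered options follows, assuming a fixed aspiration level; the option values and the threshold are hypothetical.

    # Sketch of satisficing: take the first sequentially encountered option
    # whose value meets or exceeds a fixed aspiration level, rather than
    # searching for the best option overall.

    def satisfice(options, aspiration_level):
        """Return the first option meeting the aspiration level, else None."""
        for option, value in options:
            if value >= aspiration_level:
                return option
        return None  # search exhausted without a good-enough option

    parking_spots = [("spot 1", 0.2), ("spot 2", 0.5), ("spot 3", 0.7), ("spot 4", 0.9)]
    print(satisfice(parking_spots, aspiration_level=0.6))  # "spot 3", although "spot 4" is better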
Hybrid optimization-satisficing techniques are used in machine learning when many metrics are
available but no sound or practical method is available for combining them into a single value. Instead,
hybrid optimization-satisficing methods select one metric to optimize and satisfice the remainder. For
example, a machine learning classifier might optimize accuracy (i.e., maximize the proportion of exam-
ples for which the model yields the correct output; see Section 8.2) but set aspiration levels for the false
positive rate, coverage, and runtime.
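A hedged Python sketch of such a hybrid scheme: satisfice on the secondary metrics, then optimize the primary one among the survivors. The candidate models, metric names, and thresholds are invented for illustration.

    # Sketch of hybrid optimization-satisficing model selection: first filter
    # candidates by aspiration levels on secondary metrics (false-positive
    # rate, runtime), then maximize the primary metric (accuracy).

    candidates = [
        {"name": "model A", "accuracy": 0.91, "fpr": 0.12, "runtime_ms": 40},
        {"name": "model B", "accuracy": 0.89, "fpr": 0.04, "runtime_ms": 35},
        {"name": "model C", "accuracy": 0.93, "fpr": 0.03, "runtime_ms": 250},
    ]

    acceptable = [m for m in candidates if m["fpr"] <= 0.05 and m["runtime_ms"] <= 100]
    best = max(acceptable, key=lambda m: m["accuracy"])
    print(best["name"])  # "model B": highest accuracy among the satisficing candidates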
Selten's aspiration adaptation theory models decision tasks as problems with multiple incomparable goals that resist aggregation into a complete preference order over all alternatives (Selten 1998). Instead, the decision-maker has a vector of goal variables, and assignments to those goal variables are comparable by weak dominance. If vectors A and B are possible assignments for the goals, then A dominates B if there is no goal for which A assigns a value strictly less than B, and there is some goal for which A assigns a value strictly greater than B. Selten's model imagines an aspiration
level for each goal, which itself can be adjusted upward or downwards depending on the set of feasible
(admissible) options. Aspiration adaptation theory is a highly procedural and local account in the tradition
of Newell and Simon’s approach to human problem solving (Newell and Simon 1972), although it was
not initially offered as a psychological process model. Analogous approaches have been explored in the
AI planning literature (Bonet and Geffner 2001; Ghallab, Nau, and Traverso 2016).
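A small Python sketch of the weak-dominance comparison just described, assuming all goal variables are to be maximized; the example vectors are hypothetical.

    # Weak dominance on goal vectors: A dominates B if A is at least as good
    # on every goal and strictly better on at least one (goals maximized).

    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    # Illustrative goal vectors (e.g., profit, market share, safety).
    print(dominates((5, 3, 2), (4, 3, 2)))  # True
    print(dominates((5, 3, 2), (4, 3, 3)))  # False: the two vectors are incomparable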
Robin Dawes, returning to Meehl’s question about statistical versus clinical predictions, found that
even improper linear models perform better than clinical intuition (Dawes 1979). The distinguishing
feature of improper linear models is that the weights of a linear model are selected by some non-optimal
method. For instance, equal weights might be assigned to all of the predictor variables, or unit weights of +1 or −1 might be assigned to tally features supporting a positive or a negative prediction, respectively. As an example, Dawes proposed an improper model to predict subjective ratings
of marital happiness by couples based on the difference between their rates of lovemaking and fighting.
The results? Among the thirty happily married couples, only two argued more than they had intercourse, yet all twelve unhappy couples fought more often than they made love. And those results were replicated in other laboratories studying human sexuality in the 1970s. Both equal-weight regression and unit-weight tallying have
since been found to commonly outperform proper linear models on small data sets. Although no simple
improper linear model performs well across all common benchmark datasets, for almost every data set
in the benchmark there is some simple improper model that performs well in predictive accuracy (Lichtenberg and Şimşek 2016). This observation, and many others in the heuristics literature, points
to biases of simplified models that can lead to better predictions when used in the right circumstances
(Section 4).
Dawes’s original point was not that improper linear models outperform proper linear models in terms
of accuracy, but rather that they are more efficient and (often) close approximations of proper linear
models. “The statistical model may integrate the information in an optimal manner,” Dawes observed,
“but it is always the individual . . . who chooses variables” (Dawes 1979, p. 573). Moreover, Dawes
argued that it takes human judgment to know the direction of influence between predictor variables and
target variables, which includes the knowledge of how to numerically code those variables to make this
direction clear. Recent advances in machine learning chip away at Dawes’s claims about the unique role
of human judgment, and the result from Gigerenzer's ABC Group that unit-weight tallying outperforms linear regression in out-of-sample prediction tasks with small samples is an instance of improper linear models outperforming proper linear models (Czerlinski, Gigerenzer, and Goldstein 1999). Nevertheless,
Dawes’s general observation about the relative importance of variable selection over variable weighting
stands (Katsikopoulos, Schooler, and Hertwig 2010).
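The following Python sketch illustrates unit-weight tallying in the spirit of Dawes's improper models: human judgment supplies the predictor variables and the direction of their influence, while the "weights" are simply ±1. The variable names and data below are illustrative, not Dawes's.

    # Sketch of an improper linear model: unit-weight tallying. The modeler
    # fixes the sign of each predictor's influence; no coefficients are
    # estimated from the data.

    signs = {"lovemaking_per_week": +1, "fights_per_week": -1}

    def tally(case):
        return sum(sign * case[feature] for feature, sign in signs.items())

    couples = [
        {"lovemaking_per_week": 3, "fights_per_week": 1},   # tally = +2
        {"lovemaking_per_week": 1, "fights_per_week": 4},   # tally = -3
    ]
    print([tally(c) for c in couples])  # higher scores predict higher rated happiness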
1. Reference Dependence. Rather than make decisions by comparing the absolute magnitudes of
welfare, as prescribed by expected utility theory, people instead tend to value prospects by their
change in welfare with respect to a reference point. This reference point can be a person’s current
state of wealth, an aspiration level, or a hypothetical point of reference from which to evaluate op-
tions. The intuition behind reference dependence is that our sensory organs have evolved to detect
changes in sensory stimuli rather than store and compare absolute values of stimuli. Therefore, the argument goes, we should expect the cognitive mechanisms involved in decision-making to inherit this sensitivity to changes in perceptual attribute values.
In prospect theory, reference dependence is reflected by utility changing sign at the origin of the
valuation curve v(·) in Figure 1(a). The x-axis represents gains (right side) and losses (left side) in
euros, and y-axis plots the value placed on relative gains and losses by a valuation function v(·),
which is fit to experimental data on people’s choice behavior.
2. Loss Aversion. People are more sensitive to losses than gains of the same magnitude; the thrill
of victory does not measure up to the agony of defeat. So, Kahneman and Tversky maintained,
people will prefer an option that does not incur a loss to an alternative option that yields an
equivalent gain. The disparity in how potential gains and losses are evaluated also accounts for
the endowment effect, which is the tendency for people to value a good that they own more than a
comparatively valued substitute (Thaler 1980).
In prospect theory, loss aversion appears in Figure 1(a) in the (roughly) steeper slope of v(·) to
the left of the origin, representing losses relative to the subject’s reference point, than the slope of
v(·) for gains on the right side of the reference point. Thus, for the same magnitude of change in reward x from the reference point, the magnitude of the value of gaining x is less than the magnitude of the value of losing x.
Note that differences in affective attitudes toward, and the neurological processes responsible for
processing, losses and gains do not necessarily translate to differences in people’s choice behavior
(Yechiam and Hochman 2014). The role and scope that loss aversion plays in judgment and
decision making is less clear than was initially assumed (Section 1.2).
3. Diminishing Returns for both Gains and Losses. Given a fixed reference point, people’s sensi-
tivity to changes in asset values (x in Figure 1a) diminishes the further one moves from that reference
point, both in the domain of losses and the domain of gains. This is inconsistent with expected
utility theory, even when the theory is modified to accommodate diminishing marginal utility
(Friedman and Savage 1948).
In prospect theory, the valuation function v(·) is concave for gains and convex for losses, repre-
senting a diminishing sensitivity to both gains and losses. Expected utility theory can be made
to accommodate sensitivity effects, but the utility function is typically either strictly concave or
strictly convex, not both.
4. Probability Weighting. Finally, for known exogenous probabilities, people do not calibrate their
subjective probabilities by direct inference (Levi 1977), but instead systematically underweight
high-probability events and overweight low-probability events, with a cross-over point of approx-
imately one-third (Figure 1b). Thus, changes in very small or very large probabilities have greater
impact on the evaluation of prospects than they would under expected utility theory. People are
willing to pay more to reduce the number of bullets in the chamber of a gun from 1 to 0 than from
4 bullets to 3 in a hypothetical game of Russian roulette.
Figure 1(b) plots the median values for the probability weighting function w(·) that takes the ex-
ogenous probability p associated with prospects, as reported in (Tversky and Kahneman 1992).
Roughly, below probability values of one-third people overestimate the probability of an outcome
(consequence), and above probability one-third people tend to underestimate the probability of
an outcome occurring. Traditionally, overweighting is thought to concern the systematic miscal-
ibration of people’s subjective estimates of outcomes against a known exogenous probability, p,
serving as the reference standard. In support of this view, miscalibration appears to disappear
when people learn a distribution through sampling instead of learning identical statistics by de-
scription (Hertwig, Barron, Weber, and Erev 2004). Miscalibration in this context ought to be
distinguished from overestimating or underestimating subjective probabilities when the relevant
statistics are not supplied as part of the decision task. For example, televised images of the af-
termath of airplane crashes lead to an overestimation of the low-probability event of commercial
airplanes crashing. Even though a person’s subjective probability of the risk of a commercial
airline crash would be too high given the statistics, the mechanism responsible is different: here
the recency or availability of images from the evening news is to blame for scaring him out of
his wits, not the sober fumbling of a statistics table. An alternative view maintains that people
understand that their weighted probabilities are different from the exogenous probability but nevertheless prefer to act as if the exogenous probability were so weighted (Wakker 2010).
Figure 1: (a) plots the value function v(·) applied to consequences of a prospect; (b) plots the median value of the probability weighting function w(·) applied to positive prospects of the form (x, p; 0, 1 − p) with probability p.
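For concreteness, the following Python sketch uses the parametric value and probability-weighting functions commonly associated with Tversky and Kahneman (1992), with their reported median parameter estimates; treat the specific functional forms and numbers as assumptions for illustration rather than a definitive statement of the theory.

    # Sketch of commonly used cumulative prospect theory functions: a value
    # function that is concave for gains, convex and steeper for losses, and
    # an inverse-S probability weighting function. Parameters are the median
    # estimates reported by Tversky and Kahneman (1992), taken as assumptions.

    def v(x, alpha=0.88, beta=0.88, lam=2.25):
        """Value function: loss aversion via lam, diminishing sensitivity via alpha, beta."""
        return x ** alpha if x >= 0 else -lam * ((-x) ** beta)

    def w(p, gamma=0.61):
        """Weighting function: overweights small p, underweights moderate-to-large p."""
        return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))

    print(v(100), v(-100))   # the loss looms larger than the equal-sized gain
    print(w(0.05), w(0.95))  # 0.05 is overweighted, 0.95 is underweighted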
Prospect theory incorporates these components into models of human choice under risk by first
identifying a reference point that either refers to the status quo or some other aspiration level. The con-
sequences of the options under consideration then are framed in terms of deviations from this reference
point. Extreme probabilities are simplified by rounding off, which yields miscalibration of the given,
exogenous probabilities. Dominance reasoning is then applied: dominated alternatives are eliminated from choice, riskless components of options are separated from risky ones, probabilities associated with a specific outcome are combined, and a version of eliminating irrelevant alternatives is applied (Kahneman and Tversky 1979, pp. 284–285).
Nevertheless, prospect theory comes with problems. For example, a shift of probability from
less favorable outcomes to more favorable outcomes ought to yield a better prospect, all things con-
sidered, but the original prospect theory violates this principle of stochastic dominance. Cumulative
prospect theory satisfies stochastic dominance, however, by appealing to a rank-dependent method for
transforming probabilities (Quiggin 1982). For a review of the differences between prospect theory and
cumulative prospect theory, along with an axiomatization of cumulative prospect theory, see (Fennema
and Wakker 1997).
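To illustrate the rank-dependent step for a gains-only prospect, the following Python sketch computes decision weights as differences of a weighting function applied to decumulative probabilities; the weighting function and its parameter repeat the assumptions of the previous sketch, and the prospects are illustrative.

    # Sketch of the rank-dependent transformation for gains-only prospects:
    # each outcome's decision weight is w(P(at least this good)) minus
    # w(P(strictly better)), which is what restores respect for stochastic
    # dominance in cumulative prospect theory.

    def w(p, gamma=0.61):
        return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))

    def rank_dependent_value(prospect, value=lambda x: x):
        """prospect: (gain, probability) pairs with gains ordered from worst to best."""
        total = 0.0
        for i, (x, _) in enumerate(prospect):
            at_least_as_good = sum(p for _, p in prospect[i:])
            strictly_better = sum(p for _, p in prospect[i + 1:])
            total += (w(at_least_as_good) - w(strictly_better)) * value(x)
        return total

    # Shifting probability mass toward the better outcome raises the evaluation.
    print(rank_dependent_value([(0, 0.5), (100, 0.5)]))
    print(rank_dependent_value([(0, 0.4), (100, 0.6)]))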
Imagine a meadow whose plants are loaded with insects but in which few insects are in flight. This meadow is then a more favorable environment for a bird that gleans rather than hawks. In a similar fashion, a decision-
making environment might be more favorable for one decision-making strategy than for another. Just
as it would be “irrational” for a bird to hawk rather than glean, given the choice in this meadow, so too
what may be an irrational decision strategy in one environment may be entirely rational in another.
If procedural rationality attaches a cost to the making of a decision, then ecological rationality locates
that procedure in the world. The questions ecological rationality asks are what features of an environment can help or hinder decision-making and how we should model judgment and decision-making ecologies.
For example, people make causal inferences about patterns of covariation they observe—especially chil-
dren, who then perform experiments testing their causal hypotheses (Glymour 2001). Unsurprisingly,
people who draw the correct inferences about the true causal model do better than those who infer the
wrong causal model (Meder, Mayrhofer, and Waldmann 2014). More surprising, Meder and his col-
leagues found that those making correct causal judgments do better than subjects who make no causal
judgments at all. And perhaps most surprising of all is that those with true causal knowledge also beat
the benchmark standards in the literature which ignore causal structure entirely; the benchmarks encode,
spuriously, the assumption that the best we can do is to make no causal judgments at all.
In this section and the next we will cover five important contributions to the emergence of ecological
rationality. In this section, after reviewing Simon’s proposal for distinguishing between behavioral
constraints and environmental structure, we turn to three historically important contributions: the lens
model, rational analysis, and cultural adaptation. Finally, in Section 4, we review the bias-variance
decomposition, which has figured in the Fast and Frugal Heuristics literature (Section 7.2).
we must be prepared to accept the possibility that what we call “the environment” may
lie, in part, within the skin of the biological organisms. That is, some of the constraints that
must be taken as givens in an optimization problem may be physiological and psychological
limitations of the organism (biologically defined) itself. For example, the maximum speed
at which an organism can move establishes a boundary on the set of its available behavior
alternatives. Similarly, limits on computational capacity may be important constraints en-
tering into the definition of rational choice under particular circumstances. (Simon 1955a,
p. 101)
That said, what is classified as a behavioral constraint rather than an environmental affordance varies
across disciplines and the theoretical tools pressed into service. For example, one computational ap-
proach to bounded rationality, computational rationality theory (Lewis, Howes, and Singh 2014), classi-
fies the cost to an organism of executing an optimal program as a behavioral constraint, classifies limits
on memory as an environmental constraint, and treats the costs associated with searching for an optimal
program to execute as exogenous. Anderson and Schooler’s study and computational modeling of hu-
man memory (Anderson and Schooler 1991) within the ACT-R framework, on the other hand, views the
limits on memory and search-costs as behavioral constraints which are adaptive responses to the struc-
ture of the environment. Still another broad class of computational approaches are found in statistical
signal processing, such as adaptive filters (Haykin 2013), which are commonplace in engineering and
vision (Marr 1982; Ballard and Brown 1982). Signal processing methods typically presume the sharp
distinction between device and world that Simon cautioned against, however. Still others have chal-
lenged the distinction between behavioral constraints and environmental structure by arguing that there
is no clear way to separate organisms from the environments they inhabit (Gibson 1979), or by arguing
that features of cognition which appear body-bound may not be necessarily so (Clark and Chalmers
1998).
Bearing in mind the different ways the distinction between behavior and environment have been
drawn, and challenges to what precisely follows from drawing such a distinction, ecological approaches
to rationality all endorse the thesis that the ways in which an organism manages structural features of
its environment are essential to understanding how deliberation occurs and effective behavior arises. In
doing so theories of bounded rationality have traditionally focused on at least some of the following
features, under this rough classification:
• Behavioral Constraints – may refer to bounds on computation, such as the cost of searching the
best algorithm to run, an appropriate rule to apply, or a satisficing option to choose; the cost of
executing an optimal algorithm, appropriate rule, or satisficing choice; and costs of storing the
data structure of an algorithm, the constitutive elements of a rule, or the objects of a decision
problem.
• Ecological Structure – may refer to statistical, topological, or other perceptible invariances of
the task environment that an organism is adapted to; or to architectural features or biological fea-
tures of the computational processes or cognitive mechanisms responsible for effective behavior,
respectively.
Figure 2: Brunswik’s Lens Model
on what follows from the reclassification, which will depend on the model and the goal of inquiry
(Section 8). If we were using the lens model to understand the ecological validity of an organism’s
judgment, then reclassifying εs as an environmental constraint would only introduce confusion; if instead our focus were to distinguish between behavior that is subject to choice and behavior that is precluded
from choice, then the proposed reclassification may herald clarity—but then we would surely abandon
the lens model for something else, or in any case would no longer be referring to the parameter εs in
Figure 2.
Finally, it should be noted that the lens model, like nearly all linear models used to represent human
judgment and decision-making, does not scale well as a descriptive model. In multi-cue decision-making
tasks involving more than three cues, people often turn to simplifying heuristics due to the complications
involved in performing the necessary calculations (Section 2.1; see also Section 4). More generally, as
we remarked in Section 2.3, linear models involve calculating trade-offs that are difficult for people
to perform. Lastly, the supposition that the environment is linear is a strong modeling assumption.
Quite apart from the difficulties that arise for humans to execute the necessary computations, it becomes
theoretically more difficult to justify model selection decisions as the number of features increases.
The matching index G is a goodness-of-fit measure, but goodness-of-fit tests and residual analysis begin to yield misleading conclusions for models with five or more dimensions. Modern machine learning techniques for supervised learning get around this limitation by focusing on analogues of the achievement index, constructing predictive hypotheses purely instrumentally, and dispensing with matching altogether (Wheeler 2017).
tional limitations are accounted for, an optimal solution under those conditions is derived to explain why
a behavior that is otherwise ineffective may nevertheless be effective in achieving that goal under those
conditions (Marr 1982; Anderson 1991; Oaksford and Chater 1994; Palmer 1999). Rational analyses
are typically formulated independently of the cognitive processes or biological mechanisms that explain
how an organism realizes a behavior.
One theme to emerge from the rational analysis literature that has influenced bounded rationality is
the study of memory (Anderson and Schooler 1991). For instance, given the statistical features of our
environment, and the sorts of goals we typically pursue, forgetting is an advantage rather than a liability
(Schooler and Hertwig 2005). Memory traces vary in their likelihood of being used, so the memory
system will try to make readily available those memories which are most likely to be useful. This is a
rational analysis style argument, which is a common feature of the Bayesian turn in cognitive psychology
(Oaksford and Chater 2007; Friston 2010). More generally, spatial arrangements of objects in the environment can simplify perception, choice, and the internal computation necessary for producing an effective solution (Kirsch 1995). Compare this view to the discussion of recency or availability effects
distorting subjective probability estimates in Section 2.4.
Rational analyses separate the goal of behavior from the mechanisms that cause behavior. Thus,
when an organism’s observed behavior in an environment does not agree with the behavior prescribed
by a rational analysis for that environment, there are traditionally three responses. One strategy is to
change the specifications of the problem, by introducing an intermediate step or changing the goal alto-
gether, or altering the environmental constraints, et cetera (Anderson and Schooler 1991; Oaksford and
Chater 1994). Another strategy is to argue that mechanisms matter after all, so details of human psychol-
ogy are taken into an alternative account (Newell and Simon 1972; Gigerenzer, Todd, and Group 1999;
Todd, Gigerenzer, and Group 2012). A third option is to enrich rational analysis by incorporating com-
putational mechanisms directly into the model (Russell and Subramanian 1995; Chater 2014). Lewis,
Howes, and Singh, for instance, propose to construct theories of rationality from (i) structural features
of the task environment; (ii) the bounded machine the decision-process will run on, about which they
consider four different classes of computational resources that may be available to an agent; and (iii) a
utility function to specify the goal, numerically, so as to supply an objective function against which to
score outcomes (Lewis, Howes, and Singh 2014).
the adoption of maladaptive norms or stupid behavior.
The bias-variance trade-off refers to a particular decomposition of overall prediction error for an esti-
mator into its central tendency (bias) and dispersion (variance). Sometimes overall error can be reduced
by increasing bias in order to reduce variance, or vice versa, effectively trading an increase in one type
of error to afford a comparatively larger reduction in the other. To give an intuitive example, suppose
your goal is to minimize your score with respect to the following targets.
[Four target diagrams, each scored 0 (bullseye) to 4 (outermost ring), illustrating the combinations: low bias & low variance; low bias & high variance; high bias & low variance; high bias & high variance.]
Ideally, you would prefer a procedure for delivering your “shots” that had both a low bias and low
variance. Absent that, and given the choice between a low bias and high variance procedure versus a
high bias and low variance procedure, you would presumably prefer the latter procedure if it returned
a lower overall score than the former, which is true of the corresponding figures above. Although a
decision maker’s learning algorithm ideally will have low bias and low variance, in practice it is common
that the reduction in one type of error yields some increase in the other. In this section we explain the
conditions under which the relationship between expected squared loss of an estimator and its bias and
variance holds and then remark on the role that the bias-variance trade-off plays in research on bounded
rationality.
relationships between random variables, such as the relationship between the temperature in Rome, X,
and volume of Roman gelato consumption, Y , is the subject of regression analysis.
Suppose we predict that the value of Y is h. How should we evaluate whether this prediction is
any good? Intuitively, the best we can do is to pick an h that is as close to Y as we can make it, one
that would minimize the difference Y − h. If we are indifferent to the direction of our errors, viewing
positive errors of a particular magnitude to be no worse than negative errors of the same magnitude, and
vice versa, then a common practice is to measure the performance of h by its squared difference from Y, (Y − h)². (We are not always indifferent; consider the plight of William Tell aiming at that apple.)
Finally, since the values of Y vary, we might be interested in the average value of (Y − h)² by computing its expectation, E[(Y − h)²]. This quantity is the mean squared error of h,

MSE(h) := E[(Y − h)²].
Now imagine our prediction of Y is based on some data D about the relationship between X and
Y , such as last year’s daily temperatures and daily total sales of gelato in Rome. The role that this
particular dataset D plays as opposed to some other possible data set is a detail that will figure later. For
now, view our prediction of Y as some function of X, written h(X). Here again we wish to pick an h(·)
to minimize E[(Y − h(X))²], but how close h(·) is to Y will depend on the possible values of X, which we can represent by the conditional expectation E[(Y − h(X))² | X = x].
How then should we evaluate this conditional prediction? The same as before, only now accounting for
X. For each possible value x of X, the best prediction of Y is the conditional mean, E[Y | X = x]. The regression function of Y on X, r(x), gives the optimal prediction of Y for each value x of X:

r(x) := E[Y | X = x].
Although the regression function represents the true population value of Y given X, this function is
usually unknown and typically complicated, and therefore often approximated by a simplified model or learning
algorithm, h(·).
We might restrict candidates for h(X) to linear (or affine) functions of X, for instance. Yet making
predictions about the value of Y with a simplified linear model, or some other simplified model, can
introduce a systematic prediction error called bias. Bias results from a difference between the central
tendency of data generated by the true model, r(X) (for all x ∈ X), and the central tendency of our
estimator, E[h(X)], written

Bias(h(X)) := r(X) − E[h(X)],
where any non-zero difference between the pair is interpreted as a systematically positive or systemati-
cally negative error of the estimator, h(X).
Variance measures the average deviation of a random variable from its expected value. In the current setting we are comparing the predicted value h(X) of Y, with respect to some data D about the relationship between X and Y, and the average value of h(X), E[h(X)], which we will write

Var(h(X)) := E[(h(X) − E[h(X)])²].
The bias-variance decomposition of mean squared error is rooted in frequentist statistics, where the
objective is to compute an estimate h(X) of the true parameter r(X) with respect to data D about the
relationship between X and Y . Here the parameter r(X) characterizing the truth about Y is assumed to be
fixed and the data D is treated as a random quantity, which is exactly the reverse of Bayesian statistics.
What this means is that the data set D is interpreted to be one among many possible data sets of the same
dimension generated by the true model, the deterministic process r(X).
Following (Bishop 2006), we may derive the bias-variance decomposition of mean squared error of h as follows. Let h refer to our estimate h(X) of Y, r refer to the true value of Y, and E[h] to the expected value of the estimate h. Then,

MSE(h) = E[(r − h)²]
       = E[((r − E[h]) + (E[h] − h))²]
       = E[(r − E[h])²] + E[(E[h] − h)²] + 2 E[(E[h] − h) · (r − E[h])]
       = (r − E[h])² + E[(E[h] − h)²] + 0
       = Bias(h)² + Var(h).

Note that the frequentist assumption that r is a deterministic process is necessary for the derivation to go through; for if r were a random quantity, the reduction of E[r · E[h]] to r · E[h], which is needed to show that the cross term vanishes, would be invalid.
One last detail that we have skipped over is the prediction error of h(X) due to noise, N, which occurs independently of the model or learning algorithm used. Thus, the full bias-variance decomposition of the mean-squared error of an estimate h is the sum of the bias (squared), the variance, and the irreducible error:

E[(Y − h(X))²] = Bias(h(X))² + Var(h(X)) + N.     (3)
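As a check on the decomposition, the following Python sketch simulates repeated draws of small training sets, fits a deliberately biased estimator and a more flexible one, and compares squared bias plus variance plus noise to the empirical mean squared prediction error at a single query point. The true function, noise level, and estimators are assumptions made for illustration.

    # Simulation sketch of the bias-variance decomposition at one query point:
    # squared bias + variance + noise should approximately match the mean
    # squared prediction error for each estimator.
    import random

    random.seed(0)
    r = lambda x: 2.0 * x          # assumed "true" regression function
    noise_sd, x0, n, trials = 1.0, 1.5, 10, 20000

    def draw_dataset():
        xs = [random.uniform(0, 2) for _ in range(n)]
        ys = [r(x) + random.gauss(0, noise_sd) for x in xs]
        return xs, ys

    def fit_constant(xs, ys):      # high bias: ignores x, predicts the mean of y
        m = sum(ys) / len(ys)
        return lambda x: m

    def fit_line(xs, ys):          # low bias: least-squares line through the origin
        b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return lambda x: b * x

    for fit in (fit_constant, fit_line):
        preds, errors = [], []
        for _ in range(trials):
            xs, ys = draw_dataset()
            h = fit(xs, ys)
            preds.append(h(x0))
            errors.append((r(x0) + random.gauss(0, noise_sd) - h(x0)) ** 2)
        mean_h = sum(preds) / trials
        bias2 = (r(x0) - mean_h) ** 2
        var = sum((p - mean_h) ** 2 for p in preds) / trials
        print(fit.__name__, round(bias2 + var + noise_sd ** 2, 3), round(sum(errors) / trials, 3))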
humans learn a complicated skill, such as driving a car, from how a machine learning system learns the
same task. As harrowing an experience as it is to teach a teenager how to drive a car, teenagers do not need to crash into a utility pole 10,000 times to learn that utility poles are not traversable. What teenagers learn as children about the world through play and through observing other people drive lends them an understanding that utility poles are to be steered around, a piece of commonsense that our current machine learning
systems do not have but must learn from scratch on a case-by-case basis. We, unlike our machines, have
a remarkable capacity to transfer what we learn from one domain to another domain, a capacity fueled
in part by our curiosity (Kidd and Hayden 2015).
Viewed from the perspective of the bias-variance trade-off, the human ability to make accurate predictions from sparse data suggests that, although variance is the dominant source of error with small samples, our cognitive system often manages to keep this error within reasonable limits (Gigerenzer and Brighton 2009). Indeed, Gigerenzer and Brighton make a stronger argument, stating that “the bias-variance dilemma shows formally why a mind can be better off with an adaptive toolbox of biased, specialized heuristics” (Gigerenzer and Brighton 2009, p. 120); see also Section 7.2. The bias-variance decomposition, however, is a decomposition of squared loss, which means that the decomposition above depends on how total error (loss) is measured. There are many loss functions, and which one is appropriate depends on the type of inference one is making along with the stakes in making it. If one were to use a 0-1 loss function, for example, where all non-zero errors are treated equally (a miss is as good as a mile), the decomposition above breaks down. In fact, for 0-1 loss, bias and variance combine multiplicatively (Friedman 1997)! A generalization of the bias-variance decomposition that applies to a variety of loss functions L(·), including 0-1 loss, has been offered by Domingos (2000), in which the decomposition's terms are weighted by loss-dependent coefficients β1 and β2; the original bias-variance decomposition, Equation 3, appears as a special case, namely when L(h) = MSE(h) and β1 = β2 = 1.
Our discussion of improper linear models (Section 2.3) mentioned a model that often comes surprisingly close to approximating a proper linear model, and our discussion of the bias-variance decomposition (Section 4.2) referred to conjectures about how cognitive systems might manage to make accurate predictions with very little data. In this section we survey examples, drawn from the statistics of small samples and from game theory, of models which deviate from the normative standards of global rationality yet yield markedly improved outcomes, sometimes even outcomes which are impossible under the conditions of global rationality.
Kahneman and Tversky attributed this effect to a systematic failure of people to appreciate the biases that attend small samples, although Hertwig and others have offered evidence that the samples people draw from a single population are close in size to the known limits of working memory (Hertwig, Barron, Weber, and Erev 2004).
Overconfidence can be understood as an artifact of small samples. The Naïve Sampling Model
(Juslin, Winman, and Hannson 2007) assumes that agents base judgments on a small sample retrieved
from long-term memory at the moment a judgment is called for, even when there are a variety of other
methods available to the agent. This model presumes that people are naïve statisticians (Fiedler and
Juslin 2006) who assume, sometimes falsely, that samples are representative of the target population
of interest and that sample properties can be used directly to yield accurate estimates of a population.
The idea is that when sample properties are uncritically taken as estimators of population parameters, the result can be a reasonably accurate probability judgment that is nevertheless overconfident, even if the samples are unbiased, accurately represented, and correctly processed by the cognitive mechanisms of the agent.
When sample sizes are restricted, these effects are amplified.
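A rough sketch of the idea, with an arbitrarily chosen population probability and a sample size near the limits of working memory (both parameter values are assumptions of this illustration, not parameters from the Naïve Sampling Model itself), shows how uncritically reported sample frequencies look overconfident even though they are correct on average:

import numpy as np

rng = np.random.default_rng(1)
p_true = 0.7          # assumed population probability of the event
n_small = 7           # sample size near working-memory limits

estimates = rng.binomial(n_small, p_true, 100_000) / n_small

print(estimates.mean())                   # about 0.70: unbiased on average
print((estimates == 1.0).mean())          # about 8% of judgments report certainty
print(np.abs(estimates - p_true).mean())  # typical error of roughly 0.14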
However, sometimes effective behavior is aided by inaccurate judgments or cognitively adaptive
illusions (Howe 2011). The statistical properties of small samples are a case in point. One feature of
small samples is that correlations are amplified, making them easier to detect (Kareev 1995). This fact
about small samples, when combined with the known limits to human short-term memory, suggests
that our working-memory limits may be an adaptive response to our environment that we exploit at
different stages in our lives. Adult short-term working memory is limited to seven items, plus or minus
two. For correlations of 0.5 and higher, Kareev demonstrates that sample sizes between five and nine
are most likely to yield a sample correlation that is greater than the true correlation in the population (Kareev 2000), which makes those correlations easier to detect. Furthermore, children's short-term memories are even more restricted than adults', making correlations in the environment that
much easier to detect. Of course, there is no free lunch: this small-sample effect comes at the cost of
inflating estimates of the true correlation coefficients and admitting a higher rate of false positives (Juslin
and Olsson 2005). However, in many contexts, including child development, the cost of error arising
from under-sampling may be more than compensated by the benefits from simplifying choice (Hertwig
and Pleskac 2008) and accelerating learning. In the spirit of Brunswik’s argument for representative
experimental design (Section 3.2), a growing body of literature cautions that the bulk of experiments
on adaptive decision-making are performed in highly simplified environments that differ in important
respects from the natural world in which human beings make decisions (Fawcett, Fallenstein, Higginson,
Houston, Mallpress, Trimmer, and McNamara 2014). In response, Houston, McNamara, and colleagues argue that we should incorporate more environmental complexity into our models.
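Kareev's observation above can be checked with a short simulation (a sketch; the population correlation of 0.5, the sample size of seven, and the bivariate normal population are assumptions chosen to match the discussion, not Kareev's own materials):

import numpy as np

rng = np.random.default_rng(2)
rho, n = 0.5, 7                          # true correlation; sample size near 7 +/- 2
cov = [[1.0, rho], [rho, 1.0]]

r_samples = []
for _ in range(20_000):
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    r_samples.append(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])

r_samples = np.array(r_samples)
print((r_samples > rho).mean())   # a (slim) majority of samples overshoot rho
print(np.median(r_samples))       # the median sample correlation exceeds 0.5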
Game theory provides further examples of improper models yielding results that are strictly better than what is prescribed by the corresponding
proper model. In the early 1980s Robert Axelrod held a tournament to empirically test which among
a collection of strategies for playing iterations of the prisoner’s dilemma performed best in a round-
robin competition. The winner was a simple reciprocal altruism strategy called tit-for-tat (Rapoport and
Chammah 1965), which simply starts off each game cooperating then, on each successive round, copies
the strategy the opposing player played in the previous round. So, if your opponent cooperated in this
round, then you will cooperate on the next round; and if your opponent defected this round, then you
will defect the next. Subsequent tournaments have shown that tit-for-tat is remarkably robust against
much more sophisticated alternatives (Axelrod 1984). For example, even a rational utility maximizing
player playing against an opponent who only plays tit-for-tat (i.e., will play tit-for-tat no matter whom
he faces) must adapt and play tit-for-tat—or a strategy very close to it (Kreps, Milgrom, Roberts, and
Wilson 1982).
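Because tit-for-tat is so simple to state, it is also simple to write down. The sketch below pits tit-for-tat against itself and against an unconditional defector in a short iterated prisoner's dilemma; the payoff values are the conventional textbook ones and are assumptions of the illustration rather than Axelrod's tournament settings.

# C = cooperate, D = defect; payoffs are (row player, column player)
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(history):
    """Cooperate first, then copy the opponent's previous move."""
    return 'C' if not history else history[-1][1]

def always_defect(history):
    return 'D'

def play(strategy_a, strategy_b, rounds=10):
    history_a, history_b = [], []   # entries are (own move, opponent's move)
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(history_a), strategy_b(history_b)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        history_a.append((move_a, move_b))
        history_b.append((move_b, move_a))
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (30, 30): mutual cooperation throughout
print(play(tit_for_tat, always_defect))  # (9, 14): exploited once, then retaliates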
Since tit-for-tat is a very simple strategy, computationally, one can begin to explore a notion of ra-
tionality that emerges in a group of boundedly rational agents and even see evidence of those bounds
contributing to the emergence of pro-social norms. Rubinstein (Rubinstein 1986) studied finite automata
which play repeated prisoner’s dilemmas and whose aims are to maximize average payoff while mini-
mizing the number of states of the machine. Finite automata recognize regular languages, the lowest level of the Chomsky hierarchy, and thus model a type of boundedly rational agent. Solutions are a pair of machines in which the choice of machine is optimal for each player at every stage of the game. In an evolutionary interpretation of repeated games, each iteration of Rubinstein's game can be seen as successive
generations of agents. This approach is in contrast to Neyman’s study of players of repeated games who
can only play mixtures of pure strategies that can be programmed on finite automata, where the number
of states that are available is an exogenous variable whose value is fixed by the modeler. In Neyman’s
model, each generation plays the entire game and thus traits connected to reputation can arise (Neyman
1985). More generally, although cooperation among perfectly rational players is impossible in finitely repeated prisoner's dilemmas, a cooperative equilibrium exists in the finitely repeated game for finite automata players whose number of states is less than exponential in the number of rounds of the game (Papadimitriou and Yannakakis 1994; Ho 1996). The demands on memory may exceed the psychological capacities of
people, however, even for simple strategies like tit-for-tat played by a moderately sized group of play-
ers (Stevens, Volstorf, Schooler, and Rieskamp 2011). These theoretical models showing a number of
simple paths to pro-social behavior may not, on their own, be simple enough to offer plausible process
models for cooperation.
On the heels of work on the effects of time (finite iteration versus infinite iteration) and mem-
ory/cognitive ability (finite state automata versus Turing machines), attention soon turned to environ-
mental constraints. Nowak and May looked at the spatial distribution on a two-dimensional grid of
‘cooperators’ and ‘defectors’ in iterated prisoner’s dilemmas and found cooperation to emerge among
players without memories or strategic foresight (Nowak and May 1992). This work led to the study of
network topology as a factor in social behavior (Jackson 2010), including social norms (Bicchieri 2005;
Alexander 2007), signaling (Skyrms 2003), and wisdom of crowd effects (Golub and Jackson 2010).
When social ties in a network follow a scale-free distribution, the resulting diversity in the number
and size of public-goods games is found to promote cooperation, which contributes to explaining the
emergence of cooperation in communities without mechanisms for reputation and punishment (Santos,
Santos, and Pacheco 2008).
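A simplified variant of the Nowak-May setup can be sketched in a few lines (the grid size, the initial fraction of cooperators, the temptation payoff b, and the use of a four-neighbour wrap-around grid are all assumptions of this illustration, not the original model's exact settings): memoryless cells play the one-shot game with their neighbours and then copy the strategy of whichever neighbour scored best.

import numpy as np

rng = np.random.default_rng(3)
n, b = 20, 1.8                    # grid size and temptation payoff (assumed values)
grid = rng.random((n, n)) < 0.6   # True marks a cooperator, False a defector

def step(grid):
    # Payoffs against the four wrap-around neighbours: cooperators earn 1 per
    # cooperating neighbour, defectors earn b per cooperating neighbour, and
    # every other pairing earns 0.
    shifts = [(axis, s) for axis in (0, 1) for s in (1, -1)]
    coop_nbrs = sum(np.roll(grid, s, axis=axis).astype(int) for axis, s in shifts)
    payoff = np.where(grid, coop_nbrs * 1.0, coop_nbrs * b)
    # Each cell then imitates its best-scoring neighbour, keeping its own
    # strategy whenever it does at least as well.
    best_pay, best_strat = payoff.copy(), grid.copy()
    for axis, s in shifts:
        nb_pay, nb_strat = np.roll(payoff, s, axis=axis), np.roll(grid, s, axis=axis)
        better = nb_pay > best_pay
        best_pay = np.where(better, nb_pay, best_pay)
        best_strat = np.where(better, nb_strat, best_strat)
    return best_strat

for _ in range(50):
    grid = step(grid)
print(grid.mean())   # fraction of cooperators remaining after 50 synchronous updates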
But perhaps the simplest case for bounded rationality is the class of examples in which agents achieve a desirable goal without any deliberation at all. Insects, flowers, and even bacteria exhibit evolutionarily stable strategies (Maynard Smith 1982), effectively arriving at Nash equilibria in strategic normal form games. If
we imagine two species interacting with one another, say honey bees (Apis mellifera) and a species of
flower, each interaction between a bee and a flower has some bearing on the fitness of each species,
where fitness is defined as the expected number of offspring. There is an incremental payoff to bees and flowers, possibly negative, after each interaction, and the payoffs are determined by the genetic endowments of the bees and the flowers. The point is that there is no choice exhibited by these organisms nor in the models; the process itself selects the traits. The agents have no foresight. There are no strategies
that the players themselves choose. The process is entirely mechanical. What emerges in this setting are
evolutionary dynamics, a form of bounded rationality without foresight.
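One standard way to make this mechanical, foresight-free process concrete is the discrete-time replicator dynamic, in which strategy frequencies grow in proportion to their payoffs and no agent chooses anything. The Hawk-Dove payoffs below are stock textbook values (V = 2, C = 4) and are assumptions of the illustration, not drawn from the bee and flower example.

import numpy as np

# Hawk-Dove game with resource value V = 2 and fighting cost C = 4; the mixed
# evolutionarily stable strategy plays Hawk with probability V/C = 0.5.
A = np.array([[-1.0, 2.0],
              [ 0.0, 1.0]])
A = A + 2.0   # shifting all payoffs keeps fitnesses positive without moving equilibria

x = np.array([0.9, 0.1])             # initial population shares of Hawk and Dove
for _ in range(500):
    fitness = A @ x                  # expected payoff of each strategy
    x = x * fitness / (x @ fitness)  # discrete replicator update

print(x)   # approaches the ESS mix, roughly [0.5, 0.5]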
Of course, any improper model can misfire. A rule of thumb shared by people the world-over is
to not let other people take advantage of them. While this rule works most of the time, it misfires in
the ultimatum game (Güth, Schmittberger, and Schwarze 1982). The ultimatum game is a two-player
game in which one player, endowed with a sum of money, is given the task of splitting the sum with
another player, who may either accept the offer, in which case the pot is split accordingly between the two players, or reject it, in which case both players receive nothing. People receiving offers of 30
percent or less of the pot are often observed to reject the offer, even when players are anonymous and
therefore would not suffer the consequences of a negative reputation signal associated with accepting a
very low offer. In such cases, one might reasonably argue that any positive offer is better than the status quo of zero, so people ought to accept whatever they are offered.
Aumann advanced five arguments for bounded rationality, which we paraphrase here (Aumann 1997).
1. Even in very simple decision problems, most economic agents are not (deliberate) maximizers.
People do not scan the choice set and consciously pick a maximal element from it.
2. Even if economic agents aspired to pick a maximal element from a choice set, performing such
maximizations are typically difficult and most people are unable to do so in practice.
3. Experiments indicate that people fail to satisfy the basic assumptions of rational decision theory.
4. Experiments indicate that the conclusions of rational analysis (broadly construed to include rational decision theory) do not match observed behavior.

5. Some of the conclusions of rational analysis do not agree with a reasonable normative standard.
In the previous sections we covered the origins of each of Aumann's arguments. Here we briefly review each, highlighting the relevant material from earlier sections.
The first argument, that people are not deliberate maximizers, was a working hypothesis of Simon’s,
who maintained that people tend to satisfice rather than maximize (Section 2.2). Kahneman and Tver-
sky gathered evidence for the reflection effect in estimating the value of options, which is the reason for
reference points in prospect theory (Section 2.4) and analogous properties within rank-dependent utility
theory more generally (Sections 1.2 and 2.4). Gigerenzer’s and Hertwig’s groups at the Max Planck
Institute for Human Development both study the algorithmic structure of simple heuristics and the adap-
tive psychological mechanisms which explain their adoption and effectiveness; both of their research
programs start from the assumption that expected utility theory is not the right basis for a descriptive
theory of judgment and decision-making (Sections 3, 5.3, and 7.2).
The second argument, that people are often unable to maximize even if they aspire to, was made
by Simon and Good, among others, and later by Kahneman and Tversky. Simon's remarks about the complexity of Γ-maxmin reasoning in working out the end-game moves in chess (Section 2.2) are one of many examples he used over the span of his career, starting before his seminal papers on bounded
rationality in the 1950s. The biases and heuristics program spurred by Tversky and Kahneman’s work
in the late 1960s and 1970s (Section 7.1) launched the systematic study of when and why people’s
judgments deviate from the normative standards of expected utility theory and logical consistency.
The third argument, that experiments indicate that people fail to satisfy the basic assumptions of
expected utility theory, was known from early on and emphasized by the very authors who formulated
and refined the homo economicus hypothesis (Section 1) and whose names are associated with the
mathematical foundations. We highlighted an extended quote from Savage in Section 1.3, but could
mention as well a discussion of the theory’s limitations by de Finetti and Savage (de Finetti and Savage
1962), and even a closer reading of the canonical monographs of each, namely (Savage 1954) and (de
Finetti 1974). A further consideration, which we discussed in Section 1.3, is the demand of logical
omniscience in expected utility theory and nearly all axiomatic variants.
The fourth argument, regarding the differences between the predictions of rational analysis and
observed behavior, we addressed in discussions of Brunswik’s notion of ecological validity (Section 3.2)
and the traditional responses to these observations by rational analysis (Section 3.3). The fifth argument,
that some of the conclusions of rational analysis do not agree with a reasonable normative standard, was
touched on in Sections 1.2, 1.3, and the subject of Section 5.
Implicit in Aumann’s first four arguments is the notion that global rationality (Section 2) is a rea-
sonable normative standard but problematic for descriptive theories of human judgment and decision-
making (Section 8). Even the literature standing behind Aumann's fifth argument, namely that there are problems with expected utility theory as a normative standard, nevertheless typically addresses those shortcomings through modifications to, or extensions of, the underlying mathematical theory (Section 1.2).
This broad commitment to optimization methods, dominance reasoning, and logical consistency as
bedrock normative principles is behind approaches that view bounded rationality as optimization un-
der constraints.
Boundedly rational procedures are in fact fully optimal procedures when one takes account
of the cost of computation in addition to the benefits and costs inherent in the problem as
originally posed (Arrow 2004).
For a majority of researchers across disciplines, bounded rationality is identified with some form of
optimization problem under constraints.
Gerd Gigerenzer is among the most prominent and vocal critics of the role that optimization methods
and logical consistency play in commonplace normative standards for human rationality (Gigerenzer
and Brighton 2009), especially the role those standards play in Kahneman and Tversky’s biases and
heuristics program (Kahneman and Tversky 1996; Gigerenzer 1996). We turn to this debate next, in
Section 7.
Heuristics are simple rules of thumb for rendering a judgment or making a decision. Some examples
that we have seen thus far include Simon’s satisficing, Dawes’s improper linear models, Rapoport’s tit-
for-tat, imitation, and several effects observed by Kahneman and Tversky in our discussion of prospect
theory.
There are nevertheless two views on heuristics that are roughly identified with the research tradi-
tions associated with Kahneman and Tversky’s biases and heuristics program and Gigerenzer’s fast and
frugal heuristics program, respectively. A central dispute between these two research programs is the
appropriate normative standard for judging human behavior (Vranas 2000). According to Gigerenzer,
the biases and heuristics program mistakenly classifies all biases as errors (Gigerenzer, Todd, and the ABC Research Group 1999; Gigerenzer and Brighton 2009), despite evidence pointing to some biases in human psychology
being adaptive. In contrast, in a rare exchange with a critic, Kahneman and Tversky maintain that the
dispute is merely terminological (Kahneman and Tversky 1996; Gigerenzer 1996).
In this section we briefly survey each of these two schools. Our aim is to give a characterization of each
research program rather than an exhaustive overview.
A cab was involved in a hit and run accident at night. Two cab companies, the Green and
the Blue, operate in the city. You are given the following data:
(i) 85% of the cabs in the city are Green and 15% are Blue.
(ii) A witness identified the cab as a Blue cab. The court tested his ability to identify cabs
under the appropriate visibility conditions. When presented with a sample of cabs
(half of which were Blue and half of which were Green) the witness made correct
identifications in 80% of the cases and erred in 20% of the cases.
Question: What is the probability that the cab involved in the accident was Blue rather than
Green? (Tversky and Kahneman 1977, §3-3).
Continuing, Kahneman and Tversky report that several hundred subjects have been given slight variations of this question, and for all versions the modal and median response was 0.8, instead of the correct
answer of 12/29 (≈ 0.41). “Thus, the intuitive judgment of probability coincides with the credibility
of the witness and ignores the relevant base-rate, i.e., the relative frequency of Green and Blue cabs”
(Tversky and Kahneman 1977, §3-3).
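For reference, spelling out the Bayes computation behind the 12/29 figure, using only the base rate and the witness's hit and error rates given in the problem:

P(Blue | witness says Blue) = (0.80 × 0.15) / (0.80 × 0.15 + 0.20 × 0.85) = 0.12 / 0.29 = 12/29 ≈ 0.41.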
Critical responses to results of this kind fall into three broad categories. The first type of reply is to argue that the experimenters, rather than the subjects, are in error (Cohen 1981). In the taxi-cab problem, arguably Bayes sides with the folk (Levi 1983) or, alternatively, the case is inconclusive because the normative standard of the experimenter and the presumed normative standard of the subject each require a theory of witness testimony, neither of which is specified (Birnbaum 1979). Other cognitive biases
have been ensnared in the replication crisis, such as implicit bias (Oswald, Mitchell, Blanton, Jaccard, and Tetlock 2013; Forscher, Lai, Axt, Ebersole, Herman, Devine, and Nosek 2017) and social priming
(Doyen, Klein, Pichton, and Cleeremans 2012; Kahneman 2017).
The second response is to argue that there is an important difference between identifying a normative
standard for combining probabilistic information and applying it across a range of cases (Section 8.2),
and it is difficult in practice to determine that a decision-maker is representing the task in the manner
that the experimenters intend (Koehler 1996). Observed behavior that appears to be boundedly rational
or even irrational may result from a difference between the intended specification of a problem and the
actual problem subjects face.
For example, consider the systematic biases in people’s perception of randomness reported in some
of Kahneman and Tversky’s earliest work (Kahneman and Tversky 1972). For sequences of flips of
a fair coin, people expect to see, even for small samples, a roughly equal number of heads and tails, and alternation rates between heads and tails that are slightly higher than the long-run averages (Bar Hillel and
Wagenaar 1991). This effect is thought to explain the gambler’s fallacy, the false belief that a run of
heads from an i.i.d. sequence of fair coin tosses will make the next flip more likely to land tails. Hahn and Warren argue that the limited nature of people's experience with random sequences is a better explanation of this effect than a deficiency in people's cognitive machinery. Specifically, people only ever experience finite sequences of outputs from a randomizer, such as a sequence of fair coin tosses, and the limits to their memory (Section 5.1) of past outcomes in a sequence mean that not all possible sequences of a given length will appear to them with equal probability. Therefore, there is a psychologically plausible interpretation of the question, “is it more likely to see HHHT than HHHH from flips of a fair coin?”, for which the correct answer is, “Yes” (Hahn and Warren 2009). If the gambler's fallacy boils down to a failure to distinguish between sampling with and without replacement, Hahn and Warren's point is that our intuitive statistical abilities, acquired through experience alone, are unable to make the distinction between these two sampling methods. Analytical reasoning is necessary.
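Hahn and Warren's point about finite experience can be illustrated with a small simulation (a sketch; the window length of 20 flips and the number of trials are arbitrary assumptions): although every particular four-flip window is equally likely to show HHHT or HHHH, the pattern HHHT turns up somewhere in a short sequence noticeably more often than HHHH does.

import random

random.seed(0)

def occurs(pattern, length, trials=100_000):
    """Estimate the probability that `pattern` appears somewhere in a
    sequence of fair coin flips of the given length."""
    hits = 0
    for _ in range(trials):
        seq = ''.join(random.choice('HT') for _ in range(length))
        if pattern in seq:
            hits += 1
    return hits / trials

print(occurs('HHHT', 20))   # roughly 0.75
print(occurs('HHHH', 20))   # roughly 0.48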
Consider also the risky-choice framing effect that was mentioned briefly in Section 2.4. An example is the Asian disease problem, in which subjects are told that an outbreak of disease is expected to kill 600 people and are asked to choose between two programs:

(a) If program A is adopted, 200 people will be saved.

(b) If program B is adopted, there is a 1/3 probability that 600 people will be saved, and a 2/3 probability that no people will be saved (Tversky and Kahneman 1981, p. 453).
Tversky and Kahneman report that a majority of respondents (72 percent) chose option (a), whereas a
majority of respondents (78 percent) shown an equivalent reformulation of the problem in terms of the
number of people who would die rather than survive chose (b). A meta-analysis of subsequent experi-
ments has shown that the framing condition accounts for most of the variance, but it also reveals that no linear combination of the formally specified predictors used in prospect theory, cumulative prospect theory, and Markowitz's utility theory suffices to capture this framing effect (Kühberger, Schulte-Mecklenbeck,
and Perner 1999). Furthermore, the use of an indicative conditional in this and other experiments to ex-
press the consequences is also not (currently) adequately understood. Experimental evidence collected
about how people's judgments change when learning an indicative conditional, while straightforward
and intuitive, cannot be accommodated by existing theoretical frameworks for conditionals (Collins,
Krzyż, Hartmann, Wheeler, and Hahn 2018).
The point to this second line of criticism is not that people’s responses are at variance with the correct
normative standard but rather that the explanation for why they are at variance will matter not only for
assessing the rationality of people but also for what prescriptive interventions ought to be taken to counter the
error. It is rash to conclude that people, rather than the peculiarities of the task or the theoretical tools
available to us at the moment, are in error.
Lastly, the third type of response is to accept the experimental results but challenge the claim that
they are generalizable. In a controlled replication of Kahneman and Tversky’s lawyer-engineer exam-
ple (Tversky and Kahneman 1977), for example, a crucial assumption is whether the descriptions of
the individuals were drawn at random, which was tested by having subjects draw blindly from an urn
(Gigerenzer, Hell, and Blank 1988). Under these conditions, base-rate neglect disappeared. In response
to the Linda example (Tversky and Kahneman 1983), rephrasing the example in terms of which al-
ternative is more frequent rather than which alternative is more probable reduces occurrences of the
conjunction fallacy among subjects from 77% to 27% (Fiedler 1988). More generally, a majority of
people presented with the Linda example appear to interpret ‘probability’ nonmathematically but switch
to a mathematical interpretation when asked for frequency judgments (Hertwig and Gigerenzer 1999).
Ralph Hertwig and colleagues have since noted that a variety of other effects involving probability judgments diminish or disappear when subjects are permitted to learn the probabilities through sampling, suggesting that people are better adapted to making decisions from experience of the relevant probabilities than to making decisions from descriptions of them (Hertwig, Barron, Weber, and Erev 2004).
the Hudson River.
Here is a list of heuristics studied in the Fast and Frugal program (Gigerenzer, Hertwig, and Pachur 2011), along with an informal description of each and historical and selected contemporary references.
Imitation. People have a strong tendency to imitate the successful members of their communities
(Henrich and Gil-White 2001). “If some man in a tribe . . . invented a new snare or weapon,
or other means of attack or defense, the plainest self-interest, without the assistance of much
reasoning power, would prompt other members to imitate him” (Darwin 1871, p. 155). Imitation
is presumed to be fundamental to the speed of cultural adaptation including the adoption of social
norms (Section 3.4).
Preferential Attachment. When given the choice to form a new connection to someone, pick the
individual with the most connections to others (Yule 1911; Simon 1955b; Barabási and Albert
1999).
Default rules. If there is an applicable default rule, and no apparent reason for you to do other-
wise, follow the rule. (Fisher 1936; Reiter 1980; Wheeler 2004; Thaler and Sustein 2008).
Satisficing. Search available options and choose the first one that exceeds your aspiration level.
(Simon 1955a; Hutchinson, Fanselow, and Todd 2012).
Tallying. To estimate a target criterion, rather than estimate the weights of available cues, instead
count the number of positive instances (Dawes 1979; Dana and Dawes 2004).
One-bounce Rule (Hey's Rule B). Search for at least two price quotes, and stop as soon as a quote is larger than the previous one. The one-bounce rule plays “winning streaks” by continuing search while you keep receiving a series of lower and lower quotes, but stops as soon as your luck runs out (Hey 1982; Charness and Kuhn 2011).
Tit-for-tat. Begin by cooperating, then respond in kind to your opponent: if your opponent cooperates, then cooperate; if your opponent defects, then defect (Axelrod 1984; Rapoport, Seale,
and Colman 2015).
Linear Optical Trajectory (LOT). To intersect with another moving object, adjust your speed so
that your angle of gaze remains constant. (McBeath, Shaffer, and Kaiser 1995; Gigerenzer 2007).
Take-the-best. To decide which of two alternatives has a higher value on a specific criterion, (i) first search the cues in order of their predictive validity; (ii) next, stop search when a cue is found which discriminates between the alternatives; (iii) then, choose the alternative selected by the discriminating cue; (iv) if all cues fail to discriminate between the two alternatives, then choose an alternative by chance (Einhorn 1970; Gigerenzer and Goldstein 1996). A minimal code sketch of this procedure appears after this list.
Recognition: To decide which of two alternatives has a higher value on a specific criterion and
one of the two alternatives is recognized, choose the alternative that is recognized (Goldstein and
Gigerenzer 2002; Davis-Stober, Dana, and Budescu 2010; Pachur, Todd, Gigerenzer, Schooler,
and Goldstein 2012).
Fluency: To decide which of two alternatives has a higher value on a specific criterion, if both
alternatives are recognized but one is recognized faster, choose the alternative that is recognized
faster (Schooler and Hertwig 2005; Herzog and Hertwig 2013).
1/N Rule: For N feasible options, invest resources equally across all N options (Hertwig, Davis, and Sulloway 2002; DeMiguel, Garlappi, and Uppal 2009).
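The following is a minimal sketch of the Take-the-best procedure described in the list above (the city names, cue values, and cue ordering are invented for the illustration and are not data from the original studies):

import random

def take_the_best(cues, a, b):
    """Decide which of objects a and b scores higher on the criterion.
    `cues` is a list of binary cue functions assumed to be sorted in advance
    by descending cue validity; that sorting is part of the model's setup."""
    for cue in cues:
        if cue(a) != cue(b):                    # first discriminating cue decides
            return a if cue(a) > cue(b) else b
    return random.choice([a, b])                # no cue discriminates: guess

# Toy question: which of two (hypothetical) cities is larger?
cities = {'Munich':  {'capital': 0, 'team': 1, 'exposition': 1},
          'Hanover': {'capital': 0, 'team': 0, 'exposition': 1}}
cues = [lambda c: cities[c]['capital'],         # cues ordered by assumed validity
        lambda c: cities[c]['team'],
        lambda c: cities[c]['exposition']]
print(take_the_best(cues, 'Munich', 'Hanover')) # 'Munich', decided by the second cue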
There are three lines of response to the Fast and Frugal program to mention. Take-the-Best is an
example of a non-compensatory decision rule, which means that the first discriminating cue cannot be
“compensated” by the cue-information remaining down the order. This condition, when it holds, is
thought to warrant taking a decision on the first discriminating cue and ignoring the remaining cue-
information. The computational efficiency of Take-the-Best is supposed to come from only evaluating a
few cues, which number fewer than three on average in benchmark tests (Czerlinski, Gigerenzer, and Gold-
stein 1999). However, all of the cue validities need to be known by the decision-maker and sorted before
initiating the search. So, Take-the-Best by design treats a portion of the necessary computational costs
to execute the heuristic as exogenous. Although the lower-bound for sorting cues by comparison is
O(n log n), there is little evidence to suggest that humans sort cues by the most efficient sorting algo-
rithms in this class. On the contrary, such operations are precisely of the kind that qualitative probability
judgements demand (Section 1.2). Furthermore, in addition to the costs of ranking cue validities, there is
the cost of acquisition and the determination that the agent’s estimates are non-compensatory. Although
the exact accounting of the cognitive effort presupposed is unknown, and is argued to be lower than critics suggest (Katsikopoulos, Schooler, and Hertwig 2010), these necessary steps nevertheless threaten to render Take-the-Best frugal in execution only, and not in the setup required before the model can be executed.
A second line of criticism concerns the cognitive plausibility of Take the Best (Chater, Oaksford,
Nakisa, and Redington 2003). Nearly all of the empirical data on the performance characteristics of
Take-the-Best are by computer simulations, and those original competitions pitted Take the Best against
standard statistical models (Czerlinski, Gigerenzer, and Goldstein 1999) but omitted standard machine
learning algorithms that Chater, Oaksford and colleagues found performed just as well as Take the
Best. Since these initial studies, the focus has shifted to machine learning, and includes variants of
Take-the-Best, such as “greedy cue permutation” that performs provably better than the original and is
guaranteed to always find accurate solutions when they exist (Schmitt and Martignon 2006). Setting
aside criticisms targeting the comparative performance advantages of Take the Best qua decision model,
others have questioned the plausibility of using Take-the-Best as a cognitive model. For example, Take-
the-Best presumes that cue-information is processed serially, but the speed advantages of the model
translate into an advantage in human decision-making only if humans process cue information on
a serial architecture. If instead people process cue information on a parallel cognitive architecture, then
the comparative speed advantages of Take-the-Best would become moot (Chater, Oaksford, Nakisa, and
Redington 2003).
The third line of objection concerns whether the Fast-and-Frugal program truly mounts a challenge
to the normative standards of optimization, dominance-reasoning, and consistency, as advertised. Take-
the-Best is an algorithm for decision-making that does not comport with the axioms of expected utility
theory. For one thing, its lexicographic structure violates the Archimedean axiom (Section 1.2, A2). For
another, it is presumed to violate the transitivity condition of the Ordering axiom (A1). Further still,
the “less-is-more” effects appear to violate Good’s principle (Good 1967), a central pillar of Bayesian
decision theory, which recommends to delay making a terminal decision between alternative options if
the opportunity arises to acquire free information. In other words, according to canonical Bayesianism,
free advice is a bore but no one ought to turn down free information (Pedersen and Wheeler 2014).
If noncompensatory decision rules like Take-the-Best violate Good’s principle, then perhaps the whole
Bayesian machinery ought to go (Gigerenzer and Brighton 2009).
But these points merely tell us that attempts to formulate Take-the-Best in terms of an ordering
of prospects on a real-valued index won’t do, not that ordering and numerical indices have all got to
go. As we saw in Section 1.1, there is a long and sizable literature on lexicographic probabilities and
non-standard analysis, including early work specifically addressing non-compensatory nonlinear models
(Einhorn 1970). Second, Gigerenzer argues that “cognitive algorithms. . . need to meet more important
constraints than internal consistency” (Gigerenzer and Goldstein 1996), which includes transitivity, and
elsewhere advocates abandoning coherence as a normative standard (Arkes, Gigerenzer, and Hertwig
2016). However, Take-the-Best presupposes that cues are ordered by cue validity, which naturally entails transitivity; otherwise Take-the-Best could neither be coherently specified nor effectively executed.
More generally, the Fast and Frugal school’s commitment to formulating heuristics algorithmically and
implementing them as computational models commits them to the normative standards of optimization,
dominance reasoning, and logical consistency.
Finally, Good's principle states that a decision-maker facing a single-person decision problem cannot be made worse off (in expectation) by receiving free information. Exceptions are known in game theory
(Osborne 2003, p. 283), however, that involve asymmetric information among two or more decision-
makers. But there is also an exception for single-person decision-problems involving indeterminate or
imprecise probabilities (Pedersen and Wheeler 2015). The point is that Good’s principle is not a fun-
damental principle of probabilistic methods, but instead is a specific result that holds for the canonical
theory of single-person decision-making with determinate probabilities.
The rules of logic, the axioms of probability, the principles of utility theory—humans flout them all,
and do so as a matter of course. But are we irrational to do so? That depends on what being rational
amounts to. For a Bayesian, any qualitative comparative judgment that does not abide by the axioms of
probability is, by definition, irrational. For a baker, any recipe for bread that is equal parts salt and flour
is irrational, even if coherent. Yet Bayesians do not war with bakers. Why? Because bakers are satisfied
with the term ‘inedible’ and do not aspire to commandeer ‘irrational’.
The two schools of heuristics (Section 7) reach sharply different conclusions about human rational-
ity. Yet, unlike bakers, their disagreement involves the meaning of ‘rationality’ and how we ought to
appraise human judgment and decision making. The “rationality wars” are not the result of “rhetorical
flourishes” concealing a broad consensus (Samuels, Stich, and Bishop 2002), but substantive disagree-
ments (Section 7.2) that are obscured by ambiguous use of terms like ‘rationality’.
In this section we first distinguish seven different notions of rationality, highlighting differences in aim, scope, standards of assessment, and objects of evaluation. We then turn to consider two importantly different normative standards used in bounded rationality, followed
by an example, the perception-cognition gap, illustrating how slight variations of classical experimental
designs in the biases and heuristics literature change both the results and the normative standards used
to evaluate those results.
8.1 Rationality
While Aristotle is credited with saying that humans are rational, Bertrand Russell later confessed to
searching a lifetime in vain for evidence in Aristotle’s favor. Yet ‘rationality’ is what Marvin Minsky
called a suitcase word, a term that needs to be unpacked before getting anywhere.
One meaning, central to decision theory, is coherence, which is merely the requirement that your
commitments not be self-defeating. The subjective Bayesian representation of rational preference over
options as inequalities in subjective expected utility delivers coherence by applying a dominance prin-
ciple to (suitably structured) preferences. A closely related application of dominance reasoning is the
minimization of expected loss (or maximization of expected gain in economics) according to a suitable
loss function, which may even be asymmetric (Elliott, Komunjer, and Timmermann 2005) or applied
to radically restricted agents, such as finite automata (Rubinstein 1986). Coherence and dominance
reasoning underpin expected utility theory (Section 1.1), too.
A second meaning of rationality refers to an interpretive stance or disposition that we take to un-
derstand the beliefs, desires, and actions of another person (Dennett 1971) or to understand anything
they might say in a shared language (Davidson 1974). On this view, rationality refers to a bundle of
assumptions we grant to another person in order to understand their behavior, including speech. When
we offer a reason-giving explanation for another person’s behavior, we take such a stance. If I say “the
driver laughed because she made a joke” you would not get far in understanding me without granting
to me, and even this imaginary driver and woman, a lot. So, in contrast to the lofty normative standards
of coherence that few if any mortals meet, the standards of rationality associated with an interpretive
stance are met by practically everyone.
A third meaning of rationality, due to Hume (1738), applies to your beliefs, appraising them in
how well they are calibrated with your experience. If in your experience the existence of one thing
is invariably followed by an experience of another, then believing that the latter follows the former is
rational. We might even go so far as to say that your expectation of the latter given your experience of the
former is rational. This view of rationality is an evaluation of a person’s commitments, like coherence
standards; but unlike coherence, Hume’s notion of rationality seeks to tie the rational standing of a belief
directly to evidence from the world. Much of contemporary epistemology endorses this concept of
rationality while attempting to specify the conditions under which we can correctly attribute knowledge
to someone’s beliefs.
A fourth meaning of rationality, called substantive rationality by Max Weber (Weber 1905), applies
to the evaluation of your aims of inquiry. Substantive rationality invokes a Kantian distinction between
the worthiness of a goal, on the one hand, and how well you perform instrumentally in achieving that
goal, on the other. Aiming to count the blades of grass in your lawn is arguably not a rational end to
pursue, even if you were to use the instruments of rationality flawlessly to arrive at the correct count.
A fifth meaning of rationality, due to Peirce (1955) and taken up by the American pragmatists,
applies to the process of changing a belief rather than to the Humean appraisal of a currently held belief. On
Peirce’s view, people are plagued by doubt not by belief; we don’t expend effort testing the sturdiness of
our beliefs, but rather focus on those that come into doubt. Since inquiry is pursued to remove the doubts
we have, not certify the stable beliefs we already possess, principles of rationality ought to apply to the
methods for removing doubt (Dewey 1960). On this view, questions of what is or is not substantively
rational will be answered by the inquirer: for an agronomist interested in grass cover sufficient to crowd
out an invasive weed, obtaining the grass-blade count of a lawn would be a substantively rational aim to
pursue.
A sixth meaning of rationality appeals to an organism’s capacities to assimilate and exploit complex
information and revise or modify it when it is no longer suited to task. The object of rationality according
to this notion is effective behavior. Jonathan Bennett discusses this notion of rationality in his case study
of bees:
All our prima facie cases of rationality or intelligence were based on the observation that
some creature’s behaviour was in certain dependable ways successful or appropriate or apt,
relative to its presumed wants or needs. . . . There are canons of appropriateness whereby
we can ask whether an apian act is appropriate not to that which is particular and present to
the bee but rather to that which is particular and past or to that which is not particular at all
but universal (Bennett 1964, p. 85).
Like Hume's conception, Bennett's view ties rationality to successful interactions with the world. Fur-
ther, like the pragmatists, Bennett includes for appraisal the dynamic process rather than simply the
synchronic state of one’s commitments or the current merits of a goal. But unlike the pragmatists, Ben-
nett conceives of rationality to apply to a wider range of behavior than the logic of deliberation, inquiry,
and belief change.
A seventh meaning of rationality resembles the notion of coherence by defining rationality as the
absence of a defect. For Bayesians, sure-loss is the epitome of irrationality and coherence is simply its
absence. Sorensen has suggested a generalization of this strategy, one where rationality is conceived as
the absence of irrationality tout court, just as cleanliness is the absence of dirt. Yet, owing to the long
and varied ways that irrationality can arise, a consequence of this view is that there then would be no
unified notion of rationality to capture the idea of thinking as one ought to think (Sorensen 1991).
These seven accounts of rationality are neither exhaustive nor complete. But they suffice to illustrate
the range of differences among rationality concepts, from the objects of evaluation and the standards
of assessment, to the roles, if any at all, that rationality is conceived to play in reasoning, planning,
deliberation, explanation, prediction, signaling, and interpretation. One consequence of this hodgepodge
of rationality concepts is a pliancy in the attribution of irrationality that resembles Victorian methods for
diagnosing the vapors. The time may have come to retire talk of rationality altogether, or else to demand a specification of the objects of evaluation and the normative standards to be used for assessment, together with ample attention to the implications that follow from those commitments.
The accuracy paradox is one motivation for introducing other measures of predictive performance.
For our fraud detection problem there are two ways your prediction can be correct and two ways it can
be wrong. A prediction can be correct by predicting that Y = 1 when in fact a transaction is fraudulent
(a true positive) or predicting Y = 0 when in fact a transaction is legitimate (a true negative). Corre-
spondingly, one may err by either predicting Y = 1 when in fact Y = 0 (a false positive) or predicting
Y = 0 when in fact a transaction is fraudulent (a false negative). These four possibilities are presented in
the following two-by-two contingency table, which is sometimes referred to as a confusion matrix:
                                Actual Class
                             Y = 1              Y = 0
  Predicted Class   Y = 1    true positive      false positive
                    Y = 0    false negative     true negative
For a binary classification problem involving N examples, each prediction will fall into one of these four
categories. The performance of your classifier with respect to those N examples can then be assessed.
A perfectly inaccurate classifier will have all zeros in the diagonal; a perfectly accurate classifier will
have all zeros in the counterdiagonal. The precision of your classifier is the ratio of true positives to all positive predictions, that is, true positives / (true positives + false positives). The recall of your classifier is the ratio of true positives to all actual positives, that is, true positives / (true positives + false negatives).
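As a small worked illustration of these definitions (the counts below are invented, not taken from a real fraud data set), precision and recall can be read directly off the four cells of the confusion matrix:

def precision_recall(tp, fp, fn, tn):
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of predicted positives that were right
    recall = tp / (tp + fn)      # share of actual positives that were found
    return precision, recall

# Hypothetical run: 10,000 transactions, 50 of which are actually fraudulent.
tp, fp, fn, tn = 30, 120, 20, 9830
print(precision_recall(tp, fp, fn, tn))   # (0.2, 0.6)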
There are two points to notice. The first is that in practice there is typically a trade-off between
precision and recall, and the costs to you of each will vary from one problem to another. A trade-off
of precision and recall that suits detecting credit card fraud may not suit detecting cancer, even if the
frequencies of positive instances are identical. The point of training a classifier on known data is to make
predictions on out-of-sample instances. So, tuning your classifier to yield a suitable trade-off between
precision and recall in your training data is no guarantee that you will see this trade-off generalize.
The moral is that to evaluate the performance of your classifier it is necessary to specify the purpose
for making the classification and even then good performance on your training data may not generalize.
None of this is antithetical to coherence reasoning per se, as we are making comparative judgments
and reasoning by dominance. But putting the argument in terms of coherence changes the objects of
evaluation, moving the point of view from that of the first-person decision maker to that of a third-person decision modeler.
system is also robust to outliers (Körding and Wolpert 2004). What is more, advances in machine learn-
ing have been guided by treating human performance errors for a range of perception tasks as proxies
for Bayes error, yielding an observable, near-perfect normative standard. Unlike cognitive decisions,
there is very little controversy concerning the overall optimality of our motor-perceptual decisions. This
difference between high-level and low-level decisions is called the perception-cognition gap.
Some view the perception-cognition gap as evidence for the claim that people use fundamentally
different strategies for each type of task (Section 7.2). An approximation of an optimal method is
not necessarily an optimal approximation of that method, and the study of cognitive judgments and
deliberative decision-making is led astray by assuming otherwise (Mongin 2000). Another view of the
perception-cognition gap is that it is largely an artifact of methodological differences across studies
rather than a robust feature of human behavior. We review evidence for this second argument here.
Classical studies of decision-making present choice problems to subjects where probabilities are de-
scribed. For example, you might be asked to choose the prospect of winning €300 with probability 0.25 or the prospect of winning €400 with probability 0.2. Here, subjects are given a numerical description
of probabilities, are typically asked to make one-shot decisions without feedback, and their responses
are found to deviate from the expected utility hypothesis. However, in motor control tasks, subjects
have to use internal, implicit estimates of probabilities, often learned with feedback, and these internal
estimates are near optimal. Are perceptual-motor control decisions better because they provide feedback
whereas classical decision tasks do not, or are perceptual-motor control decisions better because they
are non-cognitive?
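To fix ideas with the example above, the expected values of the two prospects are 0.25 × €300 = €75 and 0.20 × €400 = €80, so an expected-value maximizer would take the second prospect.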
Jarvstad et al. (2013) explored the robustness of the perception-cognition gap by designing (a) a
finger-pointing task that involved varying target sizes on a touch-screen computer display; (b) an arith-
metic learning task involving summing four numbers and accepting or rejecting a proposed answer with
a target tolerance, where the tolerance range varied from problem to problem, analogous to the width of
the target in the motor-control task; and (c) a standard classical probability judgment task that involved
computing the expected value of two prospects. The probability information across the tasks was in
three formats: low-level, high-level, and classical, respectively.
Once confounding factors across the three types of tasks are controlled for, Jarvstad et al.’s results
suggest that (i) the perception-cognition gap is largely explained by differences in how performance is
assessed; (ii) the decisions by experience vs decisions by description gap (Hertwig, Barron, Weber, and
Erev 2004) is due to assuming that exogenous objective probabilities and subjective probabilities match;
(iii) people’s ability to make high-level decisions is better than the biases and heuristics literature sug-
gests (Section 7.1); and (iv) differences between subjects are more important for predicting performance
than differences between the choice tasks (Jarvstad, Hahn, Rushton, and Warren 2013).
The upshot, then, is that once the methodological differences are controlled for, the perception-
cognition gap appears to be an artifact of two different normative standards applied to tasks. If the
standards applied to assessing perceptual-motor tasks are applied to classical cognitive decision-making
tasks, then both appear to perform well. If instead the standards used for assessing the classical cognitive
tasks are applied to perceptual-motor tasks, then both will appear to perform poorly.
Acknowledgements
Thanks to Sebastian Ebert, Ulrike Hahn, Ralph Hertwig, Konstantinos Katsikopoulos, Jan Nagler, Chris-
tine Tiefensee, Conor Mayo-Wilson, and an anonymous referee for helpful comments on earlier drafts
of this article.
References
34
l’école américaine. Econometrica 21(4), 503–546. Reprint of Cowes Foundation Discussion Paper 807,
Anand, P. (1987). Are the preference axioms really ratio- 1986.
nal? Theory and Decision 23, 189–214. Bicchieri, C. (2005). The Grammar of Society. New York:
Anderson, J. R. (1991). The adaptive nature of human cat- Cambridge University Press.
egorization. Psychological Review 98, 409–429. Bicchieri, C. and R. Muldoon (2014). Social norms. In
Anderson, J. R. and L. J. Schooler (1991). Reflections of E. N. Zalta (Ed.), The Stanford Encyclopedia of Phi-
the environment in memory. Psychological Science 2, losophy (Spring 2014 ed.). Metaphysics Research
396–408. Lab, Stanford University.
Arkes, H. R., G. Gigerenzer, and R. Hertwig (2016). How Birnbaum, M. H. (1979). Base rates in Baysian infer-
bad is incoherence? Decision 3(1), 20–39. ence: Signal detection analysis of the cab problem.
The American Journal of Psychology 96(1), 85–94.
Arló-Costa, H. and A. P. Pedersen (2011). Bounded ra-
tionality: Models for some fast and frugal heuristics. Bishop, C. M. (2006). Pattern Recognition and Machine
In A. Gupta, J. van Benthem, and E. Pacuit (Eds.), Learning. New York: Springer.
Games, Norms and Reasons: Logic at the Crossroads. Blume, L., A. Brandenburger, and E. Dekel (1991). Lexi-
Springer. cographic probabilities and choice under uncertainty.
Arrow, K. (2004). Is bounded rationality unboundedly ra- Econometrica 58(1), 61–78.
tional? Some ruminations. In M. Augier and J. G. Bonet, B. and H. Geffner (2001). Planning as heuristic
March (Eds.), Models of a man: Essays in memory of search. Artificial Intelligence 129(1–2), 5–33.
Herbert A. Simon, Cambridge, MA, pp. 47–55. MIT Bowles, S. and H. Gintis (2011). A Cooperative Species:
Press. Human Reciprocity and its Evolution. Princeton, NJ:
Aumann, R. J. (1962). Utility theory without the complete- Princeton University Press.
ness axiom. Econometrica 30, 445–462. Boyd, R. and P. J. Richerson (2005). The Origin and Evolu-
Aumann, R. J. (1997). Rationality and bounded rationality. tion of Cultures. New York: Oxford University Press.
Games and Economic Behavior 21, 2–17. Brickhill, H. and L. Horsten (2016, August). Pop-
Axelrod, R. (1984). The Evolution of Cooperation. New per functions, lexicographic probability, and non-
York: Basic Books. Archimedean probability. arXiv:1608.02850v1.
Brown, S. D. and A. Heathcote (2008). The simplest com-
Ballard, D. H. and C. M. Brown (1982). Computer Vision.
plete model of choice response time: Linear ballistic
Englewood Cliffs, NJ: Prentice Hall.
accumulation. Cognitive Psychology 57, 153–178.
Bar Hillel, M. and A. Margalit (1988). How vicious are cy-
Brunswik, E. (1943). Organismic achievement and envi-
cles of intransitive choice? Theory and Decision 24,
ronmental probability. Psychological Review 50(3),
119–145.
255–272.
Bar Hillel, M. and W. A. Wagenaar (1991). The percep-
Brunswik, E. (1955). Representative design and proba-
tion of randomness. Advances in Applied Mathemat-
bilistic theory in a functional psychology. Psychologi-
ics 12(4), 428–454.
cal Review 62(3), 193–217.
Barabási, A.-L. and R. Albert (1999). Emergence of scal-
Charness, G. and P. J. Kuhn (2011). Lab labor: What can
ing in random networks. Science 286(5439), 509–512.
labor economists learn from the lab? In Handbook of
Barkow, J., L. Cosmides, and J. Tooby (Eds.) (1992). Labor Economics, Volume 4, pp. 229–330. Elsevier.
The Adapted Mind: Evolutionary Psychology and the
Chater, N. (2014). Cognitive science as an interface be-
Generation of Culture. New York: Oxford University
tween rational and mechanistic explanation. Topics in
Press.
Cognitive Science 6, 331–337.
Baumeister, R. F., E. Bratslavsky, and C. Finkenauer Chater, N., M. Oaksford, R. Nakisa, and M. Redington
(2001). Bad is stronger than good. Review of General (2003). Fast, frugal, and rational: How rational norms
Psychology 5(4), 323–370. explain behavior. Organizational Behavior and Hu-
Bazerman, M. H. and D. A. Moore (2008). Judgment in man Decision Processes 90, 63–86.
Managerial Decision Making (7th ed.). New York: Clark, A. and D. Chalmers (1998). The extended mind.
Wiley. Analysis 58(1), 7–19.
Bell, D. E. (1982). Regret in decision making under uncer- Cohen, L. J. (1981). Can human irrationality be experi-
tainty. Operations Research 30(5), 961–981. mentally demonstrated? Behavioral and Brain Sci-
Bennett, J. (1964). Rationality: An Essay towards an Anal- ences 4(3), 317–331.
ysis. London: Routledge. Coletti, G. and R. Scozzafava (2002). Probabilistic Logic
Berger, J. O. (1980). Statistical Decision Theory and in a Coherent Setting. Trends in logic, 15. Dordrecht:
Bayesian Analysis (2nd ed.). New York: Springer. Kluwer.
Bernoulli, D. (1954/1738). Exposition of a new theory on Collins, P. J., K. Krzyż, S. Hartmann, G. Wheeler, and
the measurement of risk. Econometrica 22(1), 23–36. U. Hahn (2018, January). Conditionals and testimony.
Trans. Louise Sommer, from “Specimen Theoriae No- Unpublished Manuscript.
vae de Mensura Sortis”, Commentarii Academiae Sci- Czerlinski, J., G. Gigerenzer, and D. G. Goldstein (1999).
entiarium Imperialis Petropolitanae, Tomus V, 1738, How good are simple heuristics? In G. Gigerenzer,
pp. 175–192. P. M. Todd, and T. A. Group (Eds.), Simple Heuristics
Bewley, T. S. (2002). Knightian decision theory: Part that Make Us Smart, pp. 97–118. Oxford University
I. Decisions in Economics and Finance 25, 79–110. Press.
35
Damore, J. A. and J. Gore (2012). Understanding microbial cooperation. Journal of Theoretical Biology 299, 31–41.
Dana, J. and R. M. Dawes (2004). The superiority of simple alternatives to regression for social science predictions. Journal of Educational and Behavioral Statistics 29(3), 317–331.
Darwin, C. (1871). The Descent of Man. New York: Penguin Classics.
Davidson, D. (1974). Belief and the basis of meaning. Synthese 27(3–4), 309–323.
Davis-Stober, C. P., J. Dana, and D. V. Budescu (2010). Why recognition is rational: Optimality results on single-variable decision rules. Judgment and Decision Making 5(4), 216–229.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist 34(7), 571–582.
de Finetti, B. (1974). Theory of Probability: A critical introductory treatment, Volume 1 and 2. Wiley.
de Finetti, B. and L. J. Savage (1962). Sul modo di scegliere le probabilità iniziali. Biblioteca del Metron, Serie C 1, 81–154.
DeMiguel, V., L. Garlappi, and R. Uppal (2009). Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Review of Financial Studies 22(5), 1915–1953.
Dennett, D. C. (1971). Intentional systems. Journal of Philosophy 68(4), 87–106.
Dewey, J. (1960). The Quest for Certainty. Gifford Lectures of 1929. New York: Capricorn Books.
Dhami, M. K., R. Hertwig, and U. Hoffrage (2004). The role of representative design in an ecological approach to cognition. Psychological Bulletin 130(6), 959–988.
Domingos, P. (2000). A unified bias-variance decomposition and its applications. In Proceedings of the 17th International Conference on Machine Learning, pp. 231–238. Morgan Kaufmann.
Doyen, S., O. Klein, C.-L. Pichon, and A. Cleeremans (2012). Behavioral priming: It’s all in the mind, but whose mind? PLoS One 7(1), e29081.
Dubins, L. E. (1975). Finitely additive conditional probability, conglomerability, and disintegrations. Annals of Probability 3, 89–99.
Einhorn, H. J. (1970). The use of nonlinear, noncompensatory models in decision making. Psychological Bulletin 73, 221–230.
Elliott, G., I. Komunjer, and A. Timmermann (2005). Estimation and testing of forecast rationality under flexible loss. Review of Economic Studies 72, 1107–1125.
Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms. Quarterly Journal of Economics 75, 643–669.
Fawcett, T. W., B. Fallenstein, A. D. Higginson, A. I. Houston, D. E. W. Mallpress, P. C. Trimmer, and J. M. McNamara (2014). The evolution of decision rules in complex environments. Trends in Cognitive Sciences 18(3), 153–161.
Fennema, H. and P. Wakker (1997). Original and cumulative prospect theory: A discussion of empirical differences. Journal of Behavioral Decision Making 10, 53–64.
Fiedler, K. (1988). The dependence of the conjunction fallacy on subtle linguistic factors. Psychological Research 50, 123–129.
Fiedler, K. and P. Juslin (2006). Information Sampling and Adaptive Cognition. Cambridge: Cambridge University Press.
Fishburn, P. C. (1982). The Foundations of Expected Utility. Dordrecht: D. Reidel.
Fisher, R. A. (1936). Uncertain inference. Proceedings of the American Academy of Arts and Sciences 71, 245–258.
Forscher, P., C. K. Lai, J. R. Axt, C. R. Ebersole, M. Herman, P. G. Devine, and B. A. Nosek (2017, July). A meta-analysis of change in implicit bias. Unpublished Manuscript. Under review.
Friedman, J. (1997). On bias, variance, 0-1 loss and the curse of dimensionality. Data Mining and Knowledge Discovery 1, 55–77.
Friedman, M. (1953). The methodology of positive economics. In Essays in Positive Economics, pp. 3–43. University of Chicago Press.
Friedman, M. and L. J. Savage (1948). The utility analysis of choices involving risk. Journal of Political Economy 56, 279–304.
Friston, K. (2010). The free-energy principle: A unified brain theory. Nature Reviews Neuroscience 11, 127–138.
Galaabaatar, T. and E. Karni (2013). Subjective expected utility with incomplete preferences. Econometrica 81(1), 255–284.
Gergely, G., H. Bekkering, and I. Király (2002). Developmental psychology: Rational imitation in preverbal infants. Nature 415, 755–756.
Ghallab, M., D. Nau, and P. Traverso (2016). Automated Planning and Acting. New York: Cambridge University Press.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gigerenzer, G. (1996). On narrow norms and vague heuristics: A reply to Kahneman and Tversky. Psychological Review 103(3), 592–596.
Gigerenzer, G. (2007). Gut Feelings: The Intelligence of the Unconscious. New York: Viking Press.
Gigerenzer, G. and H. Brighton (2009). Homo heuristicus: Why biased minds make better inferences. Topics in Cognitive Science 1(1), 107–143.
Gigerenzer, G. and D. Goldstein (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review 103, 650–669.
Gigerenzer, G., W. Hell, and H. Blank (1988). Presentation and content: The use of base rates as a continuous variable. Journal of Experimental Psychology: Human Perception and Performance 14(3), 513–525.
Gigerenzer, G., R. Hertwig, and T. Pachur (Eds.) (2011). Heuristics: The Foundations of Adaptive Behavior. New York: Oxford University Press.
Gigerenzer, G., P. M. Todd, and T. A. Group (Eds.) (1999). Simple Heuristics that Make Us Smart. Oxford University Press.
Giles, R. (1976). A logic for subjective belief. In W. Harper and C. A. Hooker (Eds.), Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, Volume I. Dordrecht: Reidel.
Giron, F. J. and S. Rios (1980). Quasi-Bayesian behavior: A more realistic approach to decision making? Trabajos de Estadistica Y de Investigacion Operativa 31(1), 17–38.
Glymour, C. (2001). The Mind’s Arrows. Cambridge, MA: MIT Press.
Goldblatt, R. (1998). Lectures on the Hyperreals: An Introduction to Nonstandard Analysis. Graduate Texts in Mathematics. New York: Springer-Verlag.
Goldstein, D. and G. Gigerenzer (2002). Models of ecological rationality: The recognition heuristic. Psychological Review 109(1), 75–90.
Golub, B. and M. O. Jackson (2010). Naïve learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics 2(1), 112–149.
Good, I. J. (1952). Rational decisions. Journal of the Royal Statistical Society, Series B 14(1), 107–114.
Good, I. J. (1967). On the principle of total evidence. The British Journal for the Philosophy of Science 17(4), 319–321.
Good, I. J. (1983). Twenty-seven principles of rationality. In Good Thinking: The Foundations of Probability and its Applications, pp. 15–20. Minneapolis: University of Minnesota Press.
Güth, W., R. Schmittberger, and B. Schwarze (1982). An experimental analysis of ultimatum bargaining. Journal of Economic Behavior and Organization 3(4), 367–388.
Hacking, I. (1967). Slightly more realistic personal probability. Philosophy of Science 34(4), 311–325.
Haenni, R., J.-W. Romeijn, G. Wheeler, and J. Williamson (2011). Probabilistic Logics and Probabilistic Networks. Synthese Library. Dordrecht: Springer.
Hahn, U. and P. A. Warren (2009). Perceptions of randomness: Why three heads are better than four. Psychological Review 116(2), 454–461. See correction in 116(4).
Halpern, J. Y. (2010). Lexicographic probability, conditional probability, and nonstandard probability. Games and Economic Behavior 68(1), 155–179.
Hammond, K. R. (1955). Probabilistic functioning and the clinical method. Psychological Review 62(4), 255–262.
Hammond, K. R., C. J. Hursch, and F. J. Todd (1964). Analyzing the components of clinical inference. Psychological Review 71(6), 438–456.
Hammond, P. J. (1994). Elementary non-Archimedean representations of probability for decision theory and games. In P. Humphreys (Ed.), Patrick Suppes: Scientific Philosopher, Volume 1: Probability and Probabilistic Causality, pp. 25–59. Dordrecht, The Netherlands: Kluwer.
Haykin, S. O. (2013). Adaptive Filter Theory (5th ed.). London: Pearson.
Henrich, J. and F. J. Gil-White (2001). The evolution of prestige: Freely conferred deference as a mechanism for enhancing the benefits of cultural transmission. Evolution and Human Behavior 22(3), 165–196.
Hertwig, R., G. Barron, E. U. Weber, and I. Erev (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science 15(8), 534–539.
Hertwig, R., J. N. Davis, and F. J. Sulloway (2002). Parental investment: How an equity motive can produce inequality. Psychological Bulletin 128(5), 728–745.
Hertwig, R. and G. Gigerenzer (1999). The ‘conjunction fallacy’ revisited: How intelligent inferences look like reasoning errors. Journal of Behavioral Decision Making 12, 275–305.
Hertwig, R. and T. J. Pleskac (2008). The game of life: How small samples render choice simpler. In The Probabilistic Mind: Prospects for Bayesian Cognitive Science, pp. 209–235. Oxford: Oxford University Press.
Herzog, S. and R. Hertwig (2013). The ecological validity of fluency. In C. Unkelbach and R. Greifeneder (Eds.), The Experience of Thinking: How Fluency of Mental Processes Influences Cognition and Behavior, pp. 190–219. Psychology Press.
Hey, J. D. (1982). Search for rules for search. Journal of Economic Behavior and Organization 3(1), 65–81.
Ho, T.-H. (1996). Finite automata play repeated prisoner’s dilemma with information processing costs. Journal of Economic Dynamics and Control 20, 173–207.
Hochman, G. and E. Yechiam (2011). Loss aversion in the eye and in the heart. Journal of Behavioral Decision Making 24(2), 140–156.
Hogarth, R. M. (2012). When simple is hard to accept. In P. M. Todd, G. Gigerenzer, and T. A. Group (Eds.), Ecological Rationality: Intelligence in the World, pp. 61–79. New York: Oxford University Press.
Hogarth, R. M. and N. Karelaia (2007). Heuristic and linear models of judgment: Matching rules and environments. Psychological Review 114(3), 733–758.
Howe, M. L. (2011). The adaptive nature of memory and its illusions. Current Directions in Psychological Science 20(5), 312–315.
Hume, D. (1738). A Treatise of Human Nature. Version by Jonathan Bennett, 2008: www.earlymoderntexts.com.
Hutchinson, J. M., C. Fanselow, and P. M. Todd (2012). Car parking as a game between simple heuristics. In P. M. Todd, G. Gigerenzer, and T. A. Group (Eds.), Ecological Rationality: Intelligence in the World, pp. 454–484. New York: Oxford University Press.
Jackson, M. O. (2010). Social and Economic Networks. Princeton, NJ: Princeton University Press.
Jarvstad, A., U. Hahn, S. K. Rushton, and P. A. Warren (2013). Perceptuo-motor, cognitive, and description-based decision-making seem equally good. Proceedings of the National Academy of Sciences 110(40), 16271–16276.
Jevons, W. S. (1871). The Theory of Political Economy. Palgrave Classics in Economics. London: Macmillan and Company.
Juslin, P. and H. Olsson (2005). Capacity limitations and the detection of correlations: Comment on Kareev. Psychological Review 112(1), 256–267.
Juslin, P., A. Winman, and P. Hansson (2007). The naïve intuitive statistician: A naïve sampling model of intuitive confidence intervals. Psychological Review 114(3), 678–703.
Kahneman, D. (2017). Reply to Schimmack, Heene, and Kesavan’s ‘Reconstruction of a Train Wreck: How Priming Research Went Off the Rails’. https://replicationindex.wordpress.com/2017/02/02/reconstruction-of-a-train-wreck-how-priming-research-went-of-the-rails/comment-page-1/#comment-1454.
Kahneman, D., P. Slovic, and A. Tversky (Eds.) (1982). Judgment Under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press.
Kahneman, D. and A. Tversky (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology 3, 430–454.
Kahneman, D. and A. Tversky (1979). Prospect theory: An analysis of decision under risk. Econometrica 47, 263–291.
Kahneman, D. and A. Tversky (1996). On the reality of cognitive illusions. Psychological Review 103(3), 582–591.
Kareev, Y. (1995). Through a narrow window: Working memory capacity and the detection of covariation. Cognition 56(3), 263–269.
Kareev, Y. (2000). Seven (indeed, plus or minus two) and the detection of correlations. Psychological Review 107(2), 397–402.
Karni, E. (1985). Decision Making Under Uncertainty: The Case of State-Dependent Preferences. Cambridge, MA: Harvard University Press.
Katsikopoulos, K. V. (2010). The less-is-more effect: Predictions and tests. Judgment and Decision Making 5(4), 244–257.
Katsikopoulos, K. V., L. J. Schooler, and R. Hertwig (2010). The robust beauty of ordinary information. Psychological Review 117(4), 1259–1266.
Kaufmann, E. and W. W. Wittmann (2016). The success of linear bootstrapping models: Decision domain-, expertise-, and criterion-specific meta-analysis. PLoS One 11(6), e0157914.
Keeney, R. L. and H. Raiffa (1976). Decisions with Multiple Objectives: Preferences and Value Trade-offs. New York: Wiley.
Kelly, K. T. and O. Schulte (1995). The computable testability of theories making uncomputable predictions. Erkenntnis 43(1), 29–66.
Keynes, J. M. (1921). A Treatise on Probability. London: Macmillan.
Kidd, C. and B. Y. Hayden (2015). The psychology and neuroscience of curiosity. Neuron 88(3), 449–460.
Kirsh, D. (1995). The intelligent use of space. Artificial Intelligence 73(1–2), 31–68.
Knight, F. H. (1921). Risk, Uncertainty and Profit. Boston: Houghton Mifflin.
Koehler, J. J. (1996). The base rate fallacy reconsidered: Descriptive, normative, and methodological challenges. Behavioral and Brain Sciences 19, 1–53.
Koopman, B. O. (1940). The axioms and algebra of intuitive probability. Annals of Mathematics 41(2), 269–292.
Körding, K. P. and D. M. Wolpert (2004). The loss function of sensorimotor learning. Proceedings of the National Academy of Sciences 101, 9839–9842.
Kreps, D. M., P. Milgrom, J. Roberts, and R. Wilson (1982). Rational cooperation in the finitely repeated prisoners’ dilemma. Journal of Economic Theory 27(2), 245–252.
Kühberger, A., M. Schulte-Mecklenbeck, and J. Perner (1999). The effects of framing, reflection, probability, and payoff on risk preference in choice tasks. Organizational Behavior and Human Decision Processes 78(3), 204–231.
Kyburg, Jr., H. E. (1978). Subjective probability: Criticisms, reflections, and problems. Journal of Philosophical Logic 7(1), 157–180.
Levi, I. (1977). Direct inference. Journal of Philosophy 74, 5–29.
Levi, I. (1983). Who commits the base-rate fallacy? Behavioral and Brain Sciences 6(3), 502–506.
Lewis, R. L., A. Howes, and S. Singh (2014). Computational rationality: Linking mechanism and behavior through bounded utility maximization. Topics in Cognitive Science 6, 279–311.
Lichtenberg, J. M. and Ö. Simsek (2016). Simple regression models. Proceedings of Machine Learning Research 58, 13–25.
Loomes, G. and R. Sugden (1982). Regret theory: An alternative theory of rational choice under uncertainty. Economic Journal 92(4), 805–824.
Loridan, P. (1984). ε-solutions in vector minimization problems. Journal of Optimization Theory and Applications 43(2), 265–276.
Luce, R. D. and H. Raiffa (1957). Games and Decisions: Introduction and Critical Survey. New York: Dover.
Marr, D. C. (1982). Vision. New York: Freeman.
May, K. O. (1954). Intransitivity, utility, and the aggregation of preference patterns. Econometrica 22(1), 1–13.
Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge: Cambridge University Press.
McBeath, M. K., D. M. Shaffer, and M. K. Kaiser (1995). How baseball outfielders determine where to run to catch fly balls. Science 268(5210), 569–573.
McNamara, J. M., P. C. Trimmer, and A. I. Houston (2014). Natural selection can favour ‘irrational’ behavior. Biology Letters 10(1), 20130935.
Meder, B., R. Mayrhofer, and M. Waldmann (2014). Structural induction in diagnostic causal reasoning. Psychological Review 121(3), 277–301.
Meehl, P. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.
Mill, J. S. (1844). On the definition of political economy. In J. M. Robson (Ed.), The Collected Works of John Stuart Mill, Volume IV of Essays on Economics and Society, Part I. Toronto: University of Toronto Press.
Mongin, P. (2000). Does optimization imply rationality? Synthese 124(1–2), 73–111.
Nau, R. (2006). The shape of incomplete preferences. The Annals of Statistics 34(5), 2430–2448.
Newell, A. and H. A. Simon (1956, June). The logic theory machine: A complex information processing system. Technical Report P-868, The Rand Corporation, Santa Monica, CA.
Newell, A. and H. A. Simon (1972). Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall.
Newell, A. and H. A. Simon (1976). Computer science as empirical inquiry: Symbols and search. Communications of the ACM 19(3), 113–126.
Neyman, A. (1985). Bounded complexity justifies cooperation in the finitely repeated prisoner’s dilemma. Economics Letters 19(3), 227–229.
Norton, M. I., D. Mochon, and D. Ariely (2012). The IKEA effect: When labor leads to love. Journal of Consumer Psychology 22(3), 453–460.
Nowak, M. A. and R. M. May (1992). Evolutionary games and spatial chaos. Nature 359, 826–829.
Oaksford, M. and N. Chater (1994). A rational analysis of the selection task as optimal data selection. Psychological Review 101(4), 608–631.
Oaksford, M. and N. Chater (2007). Bayesian Rationality. Oxford: Oxford University Press.
Ok, E. A. (2002). Utility representation of an incomplete preference relation. Journal of Economic Theory 104(2), 429–449.
Osborne, M. J. (2003). An Introduction to Game Theory. Oxford: Oxford University Press.
Oswald, F. L., G. Mitchell, H. Blanton, J. Jaccard, and P. E. Tetlock (2013). Predicting ethnic and racial discrimination: A meta-analysis of IAT criterion studies. Journal of Personality and Social Psychology 105(2), 171–192.
Pachur, T., P. M. Todd, G. Gigerenzer, L. J. Schooler, and D. Goldstein (2012). When is the recognition heuristic an adaptive tool? In P. M. Todd, G. Gigerenzer, and T. A. Group (Eds.), Ecological Rationality: Intelligence in the World, pp. 113–143. New York: Oxford University Press.
Palmer, S. E. (1999). Vision Science. Cambridge, MA: MIT Press.
Papadimitriou, C. H. and M. Yannakakis (1994). On complexity as bounded rationality. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing, Montreal, Quebec, pp. 726–733.
Parikh, R. (1971). Existence and feasibility in arithmetic. Journal of Symbolic Logic 36(3), 494–508.
Payne, J. W., J. R. Bettman, and E. J. Johnson (1988). Adaptive strategy selection in decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition 14(3), 534–552.
Pedersen, A. P. (2014). Comparative expectations. Studia Logica 102(4), 811–848.
Pedersen, A. P. and G. Wheeler (2014). Demystifying dilation. Erkenntnis 79(6), 1305–1342.
Pedersen, A. P. and G. Wheeler (2015). Dilation, disintegrations, and delayed decisions. In Proceedings of the 9th Symposium on Imprecise Probabilities and Their Applications (ISIPTA), Pescara, Italy, pp. 227–236.
Peirce, C. S. (1955). Philosophical Writings of Peirce. New York: Dover.
Peterson, C. R. and L. R. Beach (1967). Man as an intuitive statistician. Psychological Bulletin 68(1), 29–46.
Popper, K. R. (1959). The Logic of Scientific Discovery. London: Routledge.
Puranam, P., N. Stieglitz, M. Osman, and M. M. Pillutla (2015). Modelling bounded rationality in organizations: Progress and prospects. The Academy of Management Annals 9(1), 337–392.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization 3, 323–343.
Rabin, M. (2000). Risk aversion and expected-utility theory: A calibration theorem. Econometrica 68(5), 1281–1292.
Rapoport, A., D. A. Seale, and A. M. Colman (2015). Is Tit-for-Tat the answer? On the conclusions drawn from Axelrod’s tournaments. PLoS One 10(7), e0134128.
Rapoport, A. and A. Chammah (1965). Prisoner’s Dilemma: A Study in Conflict and Cooperation. Ann Arbor: University of Michigan Press.
Regenwetter, M., J. Dana, and C. P. Davis-Stober (2011). Transitivity of preferences. Psychological Review 118(1), 42–56.
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence 13, 81–132.
Rényi, A. (1955). On a new axiomatic theory of probability. Acta Math. Acad. Sci. Hungar. 6, 285–335.
Rick, S. (2011). Losses, gains, and brains: Neuroeconomics can help to answer open questions about loss aversion. Journal of Consumer Psychology 21, 453–463.
Rieskamp, J. and A. Dieckmann (2012). Redundancy: Environment structure that simple heuristics can exploit. In P. M. Todd, G. Gigerenzer, and T. A. Group (Eds.), Ecological Rationality: Intelligence in the World, pp. 187–215. New York: Oxford University Press.
Rubinstein, A. (1986). Finite automata play the repeated prisoner’s dilemma. Journal of Economic Theory 39(1), 83–96.
Russell, S. J. and D. Subramanian (1995). Provably bounded-optimal agents. Journal of Artificial Intelligence Research 2, 575–609.
Samuels, R., S. Stich, and M. Bishop (2002). Ending the rationality wars: How to make disputes about human rationality disappear. In R. Elio (Ed.), Common Sense, Reasoning, and Rationality. New York: Oxford University Press.
Samuelson, P. (1947). Foundations of Economic Analysis. Cambridge, MA: Harvard University Press.
Santos, F. C., M. D. Santos, and J. M. Pacheco (2008). Social diversity promotes the emergence of cooperation in public goods games. Nature 454, 213–216.
Savage, L. J. (1954). The Foundations of Statistics. New York: Wiley.
Savage, L. J. (1967, April). Difficulties in the theory of personal probability. Philosophy of Science 34(4), 311–325.
Schervish, M. J., T. Seidenfeld, and J. B. Kadane (2012). Measures of incoherence: How not to gamble if you must, with discussion. In J. Bernardo, A. P. Dawid, J. O. Berger, M. West, D. Heckerman, M. Bayarri, and A. F. M. Smith (Eds.), Bayesian Statistics 7: Proceedings of the 7th Valencia International Meeting, Oxford Science Publications, Oxford, pp. 385–402. Clarendon Press.
Schick, F. (1986). Dutch bookies and money pumps. Journal of Philosophy 83(2), 112–119.
Schmitt, M. and L. Martignon (2006). On the complexity of learning lexicographic strategies. Journal of Machine Learning Research 7, 55–83.
Schooler, L. J. and R. Hertwig (2005). How forgetting aids heuristic inference. Psychological Review 112(3), 610–628.
Seidenfeld, T., M. J. Schervish, and J. B. Kadane (1995). A representation of partially ordered preferences. The Annals of Statistics 23, 2168–2217.
Seidenfeld, T., M. J. Schervish, and J. B. Kadane (2012). What kind of uncertainty is that? Using personal probability for expressing one’s thinking about logical and mathematical propositions. Journal of Philosophy 109(8–9), 516–533.
Selten, R. (1998). Aspiration adoption theory. Journal of Mathematical Psychology 42(2–3), 191–214.
Simon, H. A. (1947). Administrative Behavior: a study of decision-making processes in administrative organization (1st ed.). New York: Macmillan.
Simon, H. A. (1955a). A behavioral model of rational choice. Quarterly Journal of Economics 69, 99–118.
Simon, H. A. (1955b). On a class of skew distribution functions. Biometrika 42(3–4), 425–440.
Simon, H. A. (1957). Administrative Behavior: a study of decision-making processes in administrative organization (2nd ed.). New York: Macmillan.
Simon, H. A. (1976). From substantive to procedural rationality. In T. Kastelein, S. Kuipers, W. Nijenhuis, and G. Wagenaar (Eds.), 25 Years of Economic Theory, pp. 65–86. Boston: Springer.
Skyrms, B. (2003). The Stag Hunt and the Evolution of Social Structure. Cambridge: Cambridge University Press.
Sorensen, R. A. (1991). Rationality as an absolute concept. Philosophy 66(258), 473–486.
Spirtes, P. (2010). Introduction to causal inference. Journal of Machine Learning Research 11, 1643–1662.
Stalnaker, R. (1991). The problem of logical omniscience, I. Synthese 89(3), 425–440.
Stanovich, K. E. and R. F. West (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences 23(5), 645–665.
Stein, E. (1996). Without Good Reason: The Rationality Debate in Philosophy and Cognitive Science. Oxford: Clarendon Press.
Stevens, J. R., J. Volstorf, L. J. Schooler, and J. Rieskamp (2011). Forgetting constrains the emergence of cooperative decision strategies. Frontiers in Psychology 1(235), 1–12.
Stigler, G. (1961). The economics of information. Journal of Political Economy 69, 213–225.
Tarski, A., A. Mostowski, and R. M. Robinson (1953). Undecidable Theories. North-Holland Publishing Co.
Thaler, R. H. (1980). Toward a positive theory of consumer choice. Journal of Economic Behavior and Organization 1(1), 39–60.
Thaler, R. H. and C. R. Sunstein (2008). Nudge: Improving Decisions about Health, Wealth, and Happiness. New Haven: Yale University Press.
Todd, P. M., G. Gigerenzer, and T. A. Group (Eds.) (2012). Ecological Rationality: Intelligence in the World. New York: Oxford University Press.
Todd, P. M. and G. F. Miller (1999). From pride and prejudice to persuasion: Satisficing in mate search. In G. Gigerenzer, P. M. Todd, and T. A. Group (Eds.), Simple Heuristics that Make Us Smart, pp. 287–308. Oxford University Press.
Trivers, R. L. (1971). The evolution of reciprocal altruism. The Quarterly Review of Biology 46(1), 35–57.
Trommershäuser, J., L. T. Maloney, and M. S. Landy (2003). Statistical decision theory and trade-offs in the control of motor response. Spatial Vision 16(3), 255–275.
Turner, B. M., C. A. Rodriguez, T. M. Norcia, S. M. McClure, and M. Steyvers (2016). Why more is better: Simultaneous modeling of EEG, fMRI, and behavioral data. NeuroImage 128, 96–115.
Tversky, A. (1969). Intransitivity of preferences. Psychological Review 76, 31–48.
Tversky, A. and D. Kahneman (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology 5(2), 207–232.
Tversky, A. and D. Kahneman (1974). Judgment under uncertainty: Heuristics and biases. Science 185(4157), 1124–1131.
Tversky, A. and D. Kahneman (1977, October). Causal schemata in judgments under uncertainty. Technical Report TR-1060-77-10, Defense Advanced Research Projects Agency (DARPA).
Tversky, A. and D. Kahneman (1981). The framing of decisions and the psychology of choice. Science 211(4481), 453–458.
Tversky, A. and D. Kahneman (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review 90(4), 293–315.
Tversky, A. and D. Kahneman (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty 5(4), 297–323.
von Neumann, J. and O. Morgenstern (1944). Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press.
Vranas, P. B. (2000). Gigerenzer’s normative critique of Kahneman and Tversky. Cognition 76, 179–193.
Wakker, P. P. (2010). Prospect Theory: For Risk and Ambiguity. Cambridge: Cambridge University Press.
Waldmann, M. R., K. J. Holyoak, and A. Fratianne (1995). Causal models and the acquisition of category structure. Journal of Experimental Psychology: General 124(2), 181–206.
Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. London: Chapman and Hall.
Weber, M. (1905). The Protestant Ethic and the Spirit of Capitalism. London: Allen and Unwin. Translated by Talcott Parsons (1930).
Wheeler, G. (2004). A resource bounded default logic. In J. Delgrande and T. Schaub (Eds.), 10th International Workshop on Non-Monotonic Reasoning (NMR 2004), Whistler, Canada, pp. 416–422.
Wheeler, G. (2017). Machine epistemology and big data. In L. McIntyre and A. Rosenberg (Eds.), The Routledge Companion to Philosophy of Social Science, pp. 321–329. Routledge.
Wheeler, G. and F. G. Cozman (2018, September). On the imprecision of full conditional probabilities. Unpublished Manuscript.
White, D. J. (1986). Epsilon efficiency. Journal of Optimization Theory and Applications 49(2), 319–337.
Yechiam, E. and G. Hochman (2014). Loss attention in a dual task setting. Psychological Science 25(2), 494–502.
Yule, G. U. (1911). A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society of London, Series B, Containing Papers of a Biological Character 213, 21–87.
Zaffalon, M. and E. Miranda (2015). Desirability and the birth of incomplete preferences. ArXiv e-prints, abs/1506.00529.