Game Theory
Markus M. Möbius
February 4, 2003
Strategic: Individual players account for this interdependence.
Rational: While accounting for this interdependence each player chooses
her best action. This condition can be weakened and we can assume that
agents are boundedly rational. Behavioral economics analyzes decision problems
in which agents behave in a boundedly rational way. Evolutionary game theory
is game theory with boundedly rational agents.
Axiom 1 Completeness. Any two outcomes can be ranked, e.g. a ≼ b or
b ≼ a.
Axiom 2 Transitivity. If a ≼ b and b ≼ c then a ≼ c.
Both axioms ensure that all choices can be ordered in a single chain without
gaps (axiom 1) and without cycles (axiom 2).
Although the preference relation is the basic primitive of any decision
problem (and generally observable) it is much easier to work with a consistent
utility function u : A → ℝ because we only have to remember n real numbers
{u1 , u2 , .., un }.
Definition 2 A utility function u : A → ℝ is consistent with the preference
relationship of a decision problem (A, ≼) if for all a, b ∈ A: a ≼ b if and only
if u(a) ≤ u(b).
Theorem 1 Assume the set of outcomes is finite. Then there exists a utility
function u which is consistent.
Proof: The proof is very simple. Simply collect all equivalent outcomes in
equivalence classes. There are finitely many of those equivalence classes
since there are only finitely many outcomes. Then we can order these
equivalence classes in a strictly increasing chain due to completeness
and transitivity.
Note that the utility function is not unique. In fact, any monotonic
transformation of a consistent utility function gives another utility function
which is also consistent.
We can now define what a rational decision maker is.
Definition 3 A rational decision maker who faces a decision problem (A, ¹)
chooses an outcome a∗ ∈ A which maximizes his utility (or, equivalently, for
each a ∈ A we have a ≼ a∗).
Remark 1 When there are infinitely many choices we want to make sure
that there is a continuous utility function. This requires one more axiom
which makes sure that preferences are continuous. For that purpose, one has
to define a topology on the set of outcomes. We won't deal with that since we
won’t gain much insight from it.
4 Decision Theory under Uncertainty
Lotteries are defined over the set of outcomes A (which is again assumed to be
finite to keep things simple).
Definition 4 A simple lottery is defined as the set {(a1 , p1 ) , (a2 , p2 ) , .. (an , pn )}
such that \sum_{i=1}^{n} p_i = 1 and 0 ≤ pi ≤ 1. In a simple lottery the outcome ai
occurs with probability pi .
When there are up to three outcomes we can conveniently describe the
set of lotteries in a graphical way (see triangle).
Under certainty the preference relationship can still be written down ex-
plicitly for finite A (simply write down all of the n(n + 1)/2 rankings). Under
uncertainty there are suddenly infinitely many lotteries. This poses two
problems. First of all, it’s impractical to write a large number of lottery
comparisons down. A second (and deeper) point is the observation that the
preference relationship is in principle unobservable because of the infinite
number of necessary comparisons.
John von Neumann and Oskar Morgenstern showed that under some ad-
ditional restrictions on preferences over lotteries there exists a utility function
over outcomes such that the expected utility of a lottery provides a consistent
ranking of all lotteries.
Definition 5 Assume a utility function u over the outcomes A. The expected
utility of the lottery L = {(a1 , p1 ) , (a2 , p2 ) , .. (an , pn )} is defined as
u(L) = \sum_{i=1}^{n} u(a_i)\, p_i
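To make Definition 5 concrete, here is a minimal Python sketch; the outcome names and utility numbers are illustrative assumptions, not taken from the lecture.

```python
# Expected utility of a simple lottery {(a_1, p_1), .., (a_n, p_n)},
# represented as a list of (outcome, probability) pairs.

def expected_utility(lottery, u):
    """u(L) = sum_i p_i * u(a_i)."""
    assert abs(sum(p for _, p in lottery) - 1.0) < 1e-9  # probabilities sum to 1
    return sum(p * u[a] for a, p in lottery)

u = {"win": 1.0, "draw": 0.5, "lose": 0.0}    # hypothetical vNM utilities
L = [("win", 0.3), ("draw", 0.2), ("lose", 0.5)]
print(expected_utility(L, u))                  # 0.3*1.0 + 0.2*0.5 + 0.5*0.0 = 0.4
```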
Axiom 3 Each compound lottery is equivalent to a simple lottery with the
same distribution over final outcomes.
Axiom 4 Monotonicity. If b ≻ w then the lottery {(b, α) , (w, 1 − α)} is
preferred to {(b, β) , (w, 1 − β)} if and only if α > β.
Axiom 5 Archimedean. For any outcomes a ≺ b ≺ c there is some lottery
L = {(a, α) , (c, 1 − α)} such that the agent is indifferent between L and b.
Proof: First of all we find the best and worst outcomes b and w (possible be-
cause there are only finitely many outcomes). Because of the Archimedean
axiom we can find a number αa for each outcome a such that the agent is
indifferent between a and the lottery La = {(b, αa ) , (w, 1 − αa )}. We can
define a utility function over each outcome a such that u (a) = αa . Using
the monotonicity axiom it can be shown that this number is unique. For
each lottery we can now calculate its expected utility. It remains to be
shown that this expected utility function is consistent with the original
preferences. So take two lotteries L1 and L2 such that L1 ≼ L2 . We can
write L1 = {(a1 , p1 ) , (a2 , p2 ) , .. (an , pn )} and L2 = {(a1 , q1 ) , (a2 , q2 ) , .. (an , qn )}.
Now replace each outcome ai by the above lotteries. The compound lottery
can be rewritten as L1 = {(b, \sum_{i=1}^{n} p_i u(a_i)) , (w, 1 − \sum_{i=1}^{n} p_i u(a_i))}.
Similarly, we get L2 = {(b, \sum_{i=1}^{n} q_i u(a_i)) , (w, 1 − \sum_{i=1}^{n} q_i u(a_i))}.
By the monotonicity axiom we can deduce that \sum_{i=1}^{n} p_i u(a_i) ≤ \sum_{i=1}^{n} q_i u(a_i),
i.e. EU (L1 ) ≤ EU (L2 ). QED
From now on all payoffs in our course will be assumed to represent vNM
utility values. The expected payoff will be the expected utility.
4.1 Puzzles
EUT forms the basis of modern microeconomics. Despite its success there
are important behavioral inconsistencies related to it. Some of those we are
going to discuss briefly before we turn our attention to proper games.
Most people in our class now preferred C2 over C1.
It can be checked that the lotteries Bi and Ci are derived from Ai just
by mixing the original lotteries with an irrelevant alternative - in the case of
(B) there is a 10 percent chance of getting nothing and a 90 percent chance
of getting (A), and in case of (C), there is a 75 percent chance of getting
nothing.
The Allais paradox is the most prominent example for behavioral incon-
sistencies related to the von Neumann Morgenstern axiomatic model of choice
under uncertainty. The Allais paradox shows that the significant majority of
real decision makers orders uncertain prospects in a way that is inconsistent
with the postulate that choices are independent of irrelevant alternatives.
There is an alternative explanation for the failure of EUT in this case.
Assume, that agents face the compound lottery instead of the simple lotteries
(B) and (C). Now the relationship to (A) is much more transparent - in fact,
one could tell a story such as: ”with 75 percent probability you are not invited
to choose between these two outcomes, and with 25 percent probability you
can choose either A1 or A2”. It’s likely that choices would be much more
consistent now.
The standard explanation for the failure of EUT is people's inability to
keep small probability differences apart. 80 percent and 100 percent ’looks’
quite different and people focus on the probabilities. 20 percent and 25 per-
cent ’looks’ the same - so people focus on the values instead. Prospect theory
(Kahneman and Tversky) can deal with the Allais paradox by weighting
probabilities accordingly.
Table 1: McNeil, Pauker and Tversky (1988)
              Survival            Mortality           Both
              Radiation  Surgery  Radiation  Surgery  Radiation  Surgery
 immediate    100        90       0          10
 1 year       77         68       23         32
 5 year       22         34       78         66
 US           16         84       50         50        44         56
 Israeli      20         80       45         56        34         66
Lecture III: Normal Form Games, Rationality
and Iterated Deletion of Dominated Strategies
Markus M. Möbius
February 7, 2003
S = S1 × S2
In our case, S has order 4. If player 1 can take 5 possible actions, and
player 2 can take 10 possible actions, the set of profiles has order 50.
Figure 1: General 2 by 2 game
E C
E 1,1 0,0
C 0,0 1,1
4. Players have preferences over the outcomes of the play. You should
realize that players cannot have preferences over the actions. In a game
my payoff depends on your action. In our New York game players just
want to be able to meet at the same spot. They don’t care if they
meet at the Empire State building or at Central Park. If they choose
E and the other player does so, too, fine! If they choose E but the
other player chooses C, then they are unhappy. So what matters to
players are outcomes, not actions (of course their actions influence the
outcome - but for each action there might be many possible outcomes -
in our example there are two possible outcomes per action). Recall, that
we can represent preferences over outcomes through a utility function.
Mathematically, preferences over outcomes are defined as:
ui : S → R
We’ll write S = S1 × S2 × .. × SI and we call s ∈ S a strategy profile
(s = (s1 , s2 , .., sI )). We denote the strategy choices of all players except
player i with s−i for (s1 , s2 , .., si−1 , si+1 , ..sI ).
H T
H 1,−1 −1,1
T −1,1 1,−1
the venues, but each wants to go to their favorite one (the husband is player 1
- the column player).
F O
F 2,1 0,0
O 0,0 1,2
t c
t −1,−1 10,0
c 0,10 5,5
C D
C 3,3 −1,4
D 4,−1 0,0
Note, that the best outcome in terms of welfare is if both cooperate. The
outcome (D, D) is worst in welfare terms, and is also Pareto dominated by
(C, C) because both players can do better. So clearly, (D, D) seems to be a
terrible outcome overall.
Some examples of Prisoner’s dilemmas are the following:
2.5 Cournot Competition
This game has an infinite strategy space. Two firms choose output levels qi
and have cost function ci (qi ). The products are undifferentiated and market
demand determines a price p (q1 + q2 ). Note, that this specification assumes
that the products of both firms are perfect substitutes, i.e. they are homoge-
nous products.
I = {1, 2}
S1 = S2 = R+
u1 (q1 , q2 ) = q1 p (q1 + q2 ) − c1 (q1 )
u2 (q1 , q2 ) = q2 p (q1 + q2 ) − c2 (q2 )
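A small numerical sketch of this game may help; it assumes the linear inverse demand p(q) = α − βq and constant marginal cost c that the notes use later, with illustrative parameter values.

```python
# Cournot duopoly payoffs and best responses under linear demand
# p(q) = alpha - beta*q and constant marginal cost c.

alpha, beta, c = 10.0, 1.0, 2.0      # illustrative parameter values

def price(q_total):
    return max(alpha - beta * q_total, 0.0)

def profit(q_own, q_other):
    # u_i(q_i, q_j) = q_i * p(q_i + q_j) - c * q_i
    return q_own * price(q_own + q_other) - c * q_own

def best_response(q_other):
    # maximizer of the quadratic profit function, truncated at zero
    return max((alpha - c) / (2 * beta) - q_other / 2, 0.0)

q_star = (alpha - c) / (3 * beta)    # symmetric Nash equilibrium output
print(best_response(q_star), q_star) # best_response(q*) = q*: mutual best responses
```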
4 Experiment: Iterated Deletion Game
Students play the game given below and are asked to choose a strategy for
player 1.
[Payoff matrix with strategies A, B, C, D omitted.]
Definition 2 Player i is rational with beliefs µi if

s_i \in \arg\max_{s_i'} E_{\mu_i(s_{-i})} u_i(s_i', s_{-i}),

or alternatively, si maximizes

\sum_{s_{-i}} u_i(s_i, s_{-i})\, \mu_i(s_{-i}).
Note, that player i faces a simple decision problem as soon as she has
formed her belief µi .
Definition 3 Strategy si is strictly dominated for player i if there is some
s′i ∈ Si such that

u_i(s_i', s_{-i}) > u_i(s_i, s_{-i})

for all s−i ∈ S−i .
Note that the inequality is strict for all s−i . A strategy is weakly domi-
nated if the inequality is weak for all s−i and strict for at least one s−i .
Proposition 1 If player i is rational he will not play a strictly dominated
strategy.
Proof: If strategy si is strictly dominated by strategy s′i we can deduce that
for any belief of player i we have E_{\mu_i(s_{-i})} u_i(s_i', s_{-i}) > E_{\mu_i(s_{-i})} u_i(s_i, s_{-i}).
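The definition of strict dominance translates directly into a check over payoff tables. The sketch below tests pure-strategy domination only (domination by mixed strategies, used later in these notes, is not covered); the payoff numbers come from the Prisoner's Dilemma above.

```python
# Check whether a row strategy is strictly dominated by another pure strategy.
# payoffs[s][k] is player i's payoff from own strategy s against opponent profile k.

def strictly_dominated(payoffs, s):
    return any(
        all(payoffs[t][k] > payoffs[s][k] for k in range(len(payoffs[s])))
        for t in range(len(payoffs)) if t != s
    )

# Prisoner's Dilemma, row player: row 0 = C, row 1 = D; columns = opponent's C, D.
pd = [[3, -1],
      [4,  0]]
print(strictly_dominated(pd, 0))  # True: C is strictly dominated by D
print(strictly_dominated(pd, 1))  # False
```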
[Payoff matrix with columns L, M, R omitted.]
2. Row player should realize this if he knows that the other player is ra-
tional. Thus he won't play D.
• Step II: Define

S_i^1 = \{ s_i \in S_i^0 \mid \nexists\, s_i' \in S_i^0 : u_i(s_i', s_{-i}) > u_i(s_i, s_{-i}) \ \forall s_{-i} \in S_{-i}^0 \}

S_i^{k+1} is the set of strategies still not strictly dominated when you know your
opponent uses some strategy in S_{-i}^k .

Note the restrictions S_{-i}^0 , S_{-i}^1 , ..
Players know that opponents are rational, know that opponents know
that they are rational ..., i.e. rationality is common knowledge.
• Step ∞: Let S_i^\infty = \bigcap_{k=1}^{\infty} S_i^k .
Note, that the process must stop after finitely many steps if the strategy
set is finite because the sets can only get smaller after each iteration.
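The whole procedure fits in a few lines for two-player finite games. The sketch below uses pure-strategy domination only and repeats the deletion rounds until nothing changes, which the finiteness argument above guarantees to terminate.

```python
# Iterated deletion of strictly dominated strategies (pure domination only).
# u1[r][c], u2[r][c]: payoffs when the row player plays r and the column player plays c.

def idsds(u1, u2):
    rows, cols = list(range(len(u1))), list(range(len(u1[0])))
    changed = True
    while changed:
        changed = False
        for s in rows[:]:   # delete strictly dominated rows
            if any(all(u1[t][c] > u1[s][c] for c in cols) for t in rows if t != s):
                rows.remove(s); changed = True
        for s in cols[:]:   # delete strictly dominated columns
            if any(all(u2[r][t] > u2[r][s] for r in rows) for t in cols if t != s):
                cols.remove(s); changed = True
    return rows, cols

# Prisoner's Dilemma: only (D, D) survives (strategy index 1 for both players).
u1 = [[3, -1], [4, 0]]
u2 = [[3, 4], [-1, 0]]
print(idsds(u1, u2))   # ([1], [1])
```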
Most games are not dominance solvable (e.g. coordination games, zero-sum
games).
We have not specified the order in which strategies are eliminated. You
will show in the problem set that the speed and order of elimination does
not matter.
The same is not true for the elimination of weakly dominated strategies
as the next example shows.
L R
T 1,1 0,0
M 1,1 2,1
B 0,0 2,1
We can first eliminate T and then L in which case we know that (2,1) is
played for sure. However, if we eliminate B first and then R we know that
(1,1) is being played for sure. So weak elimination does not deliver consistent
results and is therefore generally a less attractive option than the deletion of
strictly dominated strategies.
Therefore in the (2k + 1)-th stage only strategies in the interval S_1^{2k+1} =
S_2^{2k+1} = [BR_2(..BR_1(q)..), BR_2(..BR_1(\bar q)..)] survive.
It’s easy to show graphically that this interval shrinks in each iteration
and that the two limits converge to the intersection q1∗ = q2∗ of both best
response functions where q2∗ = BR2 (q1∗ ). Therefore, the Cournot game is
solvable through the iterated deletion of strictly dominated strategies.
Remark 3 It can be shown that the same game with three firms is NOT
dominance solvable. You have to show that on the problem set!
Lecture IV: Nash Equilibrium
Markus M. Möbius
March 3, 2003
A Nash equilibrium captures the idea of equilibrium. Both players know
what strategy the other player is going to choose, and no player has an
incentive to deviate from equilibrium play because her strategy is a best
response to her belief about the other player’s strategy.
   C    D
C  3,3  −1,4
D  4,−1 0,0
This game has the unique Nash equilibrium (D,D). It is easy to check
that each player can profitably deviate from every other strategy profile. For
example, (C,C) cannot be a NE because player 1 would gain from playing D
instead (as would player 2).
1.2 Example II
[Payoff matrix with columns L, M, R omitted.]
In this game the unique Nash equilibrium is (U,L). It is easy to see that
(U,L) is a NE because both players would lose from deviating to any other
strategy.
To show that there are no other Nash equilibria we could check every other
strategy profile, or note that S1∞ = {U } and S2∞ = {L} and use:
Proposition: If s∗ is a Nash equilibrium then it survives iterated deletion of
strictly dominated strategies.
Proof: Suppose not. Then there exists T such that s∗ ∈ S_1^T × ... × S_I^T
but s∗ ∉ S_1^{T+1} × ... × S_I^{T+1} . The definition of IDSDS implies that there
exists s_i' ∈ S_i^T ⊆ S_i such that u_i(s_i', s_{-i}) > u_i(s_i^*, s_{-i}) for all s_{-i} ∈
S_{-i}^T . Therefore there exists an s_i' ∈ S_i such that u_i(s_i', s_{-i}^*) > u_i(s_i^*, s_{-i}^*)
which contradicts that s∗ was a NE.
Lemma 1 (q1∗ , q2∗ ) is a NE if and only if qi∗ ∈ BRi (q∗−i ) for all i.
The NE is the solution to q1 = BR1 (q2 ) and q2 = BR2 (q1 ). This system
has exactly one solution. This can be shown algebraically or simply by
looking at the intersections of the BR graphs in figure 1. Because of symmetry
we know that q1 = q2 = q ∗ . Hence we obtain:
q^* = \frac{\alpha - c}{2\beta} - \frac{q^*}{2}

so that q∗ = (α − c)/(3β).
[Figure 1: BR functions of the two-firm Cournot game. BR1(q2) and BR2(q1)
intersect at (q1∗, q2∗); the intercepts lie at (α−c)/2β and (α−c)/β.]
We also assume that D (c) > 0, i.e. firms can sell a positive quantity if
they price at marginal cost (otherwise a market would not be viable - firms
couldn’t sell any output, or would have to accumulate losses to do so).
Lemma 2 The Bertrand game has the unique NE (p∗1 , p∗2 ) = (c, c).
Proof: First we must show that (c,c) is a NE. It is easy to see that each firm
makes zero profits. Deviating to a price below c would cause losses to
the deviating firm. If any firm sets a higher price it does not sell any
output and also makes zero profits. Therefore, there is no incentive to
deviate.
To show uniqueness we must show that any other strategy profile
(p1 , p2 ) is not a NE. It’s easiest to distinguish lots of cases.
Case I: p1 < c or p2 < c. In this case one (or both) of the players makes
losses. This player should set a price above his rival's price
and cut his losses by not selling any output.
Case II: c ≤ p1 < p2 or c ≤ p2 < p1 Assume first that c < p1 < p2
or c < p2 < p1 . In this case the firm with the higher price makes zero
profits. It could profitably deviate by setting a price equal to the rival’s
price and thus capture at least half of his market, and make strictly
positive profits. Now consider the case c = p1 < p2 or c = p2 < p1 .
Now the lower price firm can charge a price slightly above marginal
cost (but still below the price of the rival) and make strictly positive
profits.
Case III: c < p1 = p2 . Firm 1 could profitably deviate by setting a
price p1 = p2 − ε > c. The firm's profits before and after the deviation
are:

\pi_B = (p_2 - c) \frac{D(p_2)}{2}

\pi_A = (p_2 - \varepsilon - c) D(p_2 - \varepsilon)

The difference πA − πB (the gain from deviating) is strictly positive for suffi-
ciently small ε. Therefore, (p1 , p2 ) cannot be a NE.
Remark 1 In problem 11 of problem set 1 you had to solve for the unique
Nash equilibrium when one firm (say 2) has higher marginal cost c2 > c1 .
Intuitively the price in the unique NE should be just below c2 - this would
keep firm 2 out of the market and firm 1 has no incentive to cut prices any
further. However, if firms can set any real positive price there is no pure NE.
Assume c2 = 10. If firm 1 sets prices at 9.99 it can do better by setting them
at 9.999 etc. Therefore, we have to assume that the pricing is discrete, i.e.
can only be done in multiples of pennies say. In this way, the unique NE has
firm 1 setting a price p1 = c2 minus one penny.
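The remark can be checked mechanically on a penny grid; the demand function below is an illustrative assumption.

```python
# Asymmetric Bertrand with discrete (penny) pricing: firm 1's best response
# to firm 2 pricing at its own marginal cost c2 is one penny below c2.

c1, c2 = 5.00, 10.00
def demand(p): return max(100.0 - p, 0.0)   # assumed demand curve

def profit1(p1, p2):
    if p1 > p2:  return 0.0                  # no sales at the strictly higher price
    share = 0.5 if p1 == p2 else 1.0         # equal split on a tie
    return (p1 - c1) * demand(p1) * share

prices = [k / 100 for k in range(0, 10001)]  # penny grid on [0, 100]
best = max(prices, key=lambda p: profit1(p, c2))
print(best)                                   # 9.99 = c2 minus one penny
```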
Food for Thought: How would you modify the Bertrand game to make
it solvable through IDSDS? Hint: You have to (a) discretize the strategy
space, and (b) assume that D (p) = 0 for some sufficiently high price.
Lecture IV: Nash Equilibrium II - Multiple
Equilibria
Markus M. Möbius
March 3, 2003
E C
E 1,1 0,0
C 0,0 1,1
This game has two Nash equilibria - (E,E) and (C,C). In both cases no
player can profitably deviate. (E,C) and (C,E) cannot be NE because both
players would have an incentive to deviate.
1.2 Voting Game
Three players simultaneously cast ballots for one of three alternatives A,B
or C. If a majority chooses any policy that policy is implemented. If the
votes split 1-1-1 we assume that the status quo A is retained. Suppose the
preferences are:
Claim 1 The game has several Nash equilibria including (A, A, A), (B, B, B),
(C, C, C),(A, B, A), and (A, C, C).
Informal Proof: In the first three cases no single player can change the
outcome. Therefore there is no profitable deviation. In the last two
equilibria each of the two A and two C players, respectively, is pivotal
but still would not deviate because it would lead to a less desirable
result.
R L
R 1,1 0,0
L 0,0 1,1
We expect both drivers to choose (R,R) which is the social norm in this
game.
Next, let’s conduct a class experiment.
We played the game with four pairs of students - three pairs coordinated
on SAAB98, one pair did not coordinate.
This experiment is meant to illustrate that a strategy which looks quite
distinct from the set of other available strategies (here, Fiats) can be a focal
point in a one-shot game (when no social norm can guide us).
F O
F 2,1 0,0
O 0,0 1,2
(F,F) and (O,O) are both Nash equilibria of the game. The Battle of the
Sexes is an interesting coordination game because players are not indifferent
about which strategy to coordinate on. Men want to watch Football, while Women
want to go to the Opera.
Class Experiment 2 You are playing the battle of the sexes. You are player
2. Player 1 will make his choice first but you will not know what that move
was until you make your own. What will you play?
Last year: We divided the class up into men and women. 18 out of 25
men (i.e. 72 percent) chose the action which in case of coordination would
give them the higher payoff. In contrast, only 6 out of 11 women did the
same. These results replicate similar experiments by Rubinstein at Princeton
and Tel Aviv University. Men are simply more aggressive creatures...
When we aggregate up we found that 24 out of 36 people (66 percent)
play the preferred strategy in BoS.
Because there is an element of conflict in the BoS players use the framing
of the game in order to infer the strategies of their opponent. In the follow-
ing experiments the underlying game is always the above BoS. However, in
each case the results differ significantly from the basic experiment we just
conducted. This tells us that players signal their intention to each other, and
that the normal strategic form does not capture this belief formation process.
Class Experiment 3 You are player 1 in the Battle of the sexes. Player 2
makes the first move and chooses an action. You cannot observe her action
until you have chosen your own action. Which action will you choose?
Last year: Now a significantly higher number of students (17 instead of 12)
choose the less desirable action (O). Note, that the game is still the same
simultaneous move game as before. However, players seem to believe that
player 1 has an advantage by moving first, and they are more likely to ’cave
in’.
Class Experiment 4 You are player 1 in the Battle of the sexes. Before
actually playing the game, your opponent (player 2) had an opportunity to
make an announcement. Her announcement was ”I will play O”. You could
not make a counter-announcement. What will you play?
Now 35 out of 36 students chose the less desirable action. The announcement
seems to strengthen beliefs that the other player will choose O.
This kind of communication is called cheap talk because this type of
message is costless to the sender. For exactly this reason, it should not
matter for the analysis of the game. To see that, simply expand the strategy
set of player 2. Instead of strategies F and O she now has 4 strategies - Ff,
Fo, Of, Oo - where strategy Ff means that player 2 plays F and announces
to play F, while Of means that player 2 plays O and announces F. Clearly,
the strategies Of and Oo yield exactly the same payoffs when played against
any strategy of player 1. Therefore, the game has exactly the same NE as
before. However, the announcement seems to have successfully signalled to
player 1 that player 2 will choose her preferred strategy.
Class Experiment 5 Two players are playing the Battle of the Sexes. You
are player 1. Before actually playing the game, player 2 (the wife) had an
opportunity to make a short announcement. Player 2 chose to remain silent.
What is your prediction about the outcome of the game?
Less than 12 people chose the less desirable action in this case. Apparently,
silence is interpreted as weakness.
A B
A 9,9 −15,8
B 8,−15 7,7
Class Experiment 6 Ask class how many would choose strategy A in this
coordination game.
Observations:
1. This game has the two Nash equilibria, namely (A,A) and (B,B). Co-
ordinating on A Pareto dominates coordination on B. Unlike the New
York and the Battle of the Sexes game, one of the equilibria is clearly 'bet-
ter’ for both players. We might therefore be tempted to regard (A,A)
as the more likely equilibrium.
4 Interpretations of NE
IDSDS is a constructive algorithm to predict play and does not assume that
players know the strategy choices of other players. In contrast, in a Nash
equilibrium players have precise beliefs about the play of other players, and
these beliefs are self-fulfilling. However, where do these beliefs come from?
There are several interpretations:
them being rational. This is a good explanation for NE in
games with a unique NE. However, it is less compelling for games with
multiple NE.
Lecture V: Mixed Strategies
Markus M. Möbius
March 3, 2003
[Rock-Paper-Scissors payoff matrix omitted.]
H T
H 1,−1 −1,1
T −1,1 1,−1
Fearing this what might the opponent do? One solution is to randomize
and play a mixed strategy. Each player could flip a coin and play H with
probability 1/2 and T with probability 1/2.
Note that each player cannot be taken advantage of.
2. Vector: If the pure strategies are si1 ,..,siNi write (σi (si1 ) , .., σi (siNi )),
e.g. (1/2, 1/2).

3. 1/2 H + 1/2 T
Class Experiment 1 Three groups of two people. Play RPS with each other
30 times. Calculate frequency with which each strategy is being played.
2 Mixed Strategy Nash Equilibrium
Write Σi (also ∆ (Si )) for the set of probability distributions on Si .
Write Σ for Σ1 × .. × ΣI . A mixed strategy profile σ ∈ Σ is an I-tuple
(σ1 , .., σI ) with σi ∈ Σi .
We write ui (σi , σ−i ) for player i’s expected payoff when he uses mixed
strategy σi and all other players play as in σ−i .
u_i(\sigma_i, \sigma_{-i}) = \sum_{s_i, s_{-i}} u_i(s_i, s_{-i})\, \sigma_i(s_i)\, \sigma_{-i}(s_{-i})    (1)
4 Finding Mixed Strategy Equilibria I
Definition 3 In a finite game, the support of a mixed strategy σi , supp (σi ),
is the set of pure strategies to which σi assigns positive probability
Proposition 2 If σ∗ is a mixed strategy Nash equilibrium and s′i , s′′i ∈ supp (σi∗ )
then

u_i(s_i', \sigma_{-i}^*) = u_i(s_i'', \sigma_{-i}^*)
Example 2 Find all the Nash equilibria of the game below.
L R
U 1,1 0,4
D 0,2 2,1
It is easy to see that this game has no pure strategy Nash equilibria. For
a mixed strategy Nash equilibrium to exist player 1 has to be indifferent
between strategies U and D and player 2 has to be indifferent between L and
R. Assume player 1 plays U with probability α and player 2 plays L with
probability β.
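A short sketch of the computation; the resulting values reappear later in this lecture (σ1∗ = 1/4 U + 3/4 D and σ2∗ = 2/3 L + 1/3 R).

```python
# Player 2 is indifferent between L and R: u2(L) = 2 - alpha, u2(R) = 1 + 3*alpha.
# Player 1 is indifferent between U and D: u1(U) = beta, u1(D) = 2*(1 - beta).

alpha = 1 / 4    # solves 2 - alpha = 1 + 3*alpha
beta = 2 / 3     # solves beta = 2*(1 - beta)

assert abs((2 - alpha) - (1 + 3 * alpha)) < 1e-9   # player 2 indifferent
assert abs(beta - 2 * (1 - beta)) < 1e-9           # player 1 indifferent
print(alpha, beta)   # 0.25 0.666...
```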
Remark 2 Recall the Battle of the Sexes experiments from last class. It
can be shown that the game has a mixed NE where each agent plays her
favorite strategy with probability 2/3. This was not quite the proportion of
people playing it in class (but pretty close to the proportion of people choosing
it in the previous year)! In many instances of this experiment one finds that
men and women differed in their ’aggressiveness’. Does that imply that they
were irrational? No. In a mixed NE players are indifferent between their
strategies. As long as men and women are matched completely randomly
(i.e. woman-woman and man-man pairs are also possible) it only matters
how players mix in the aggregate! It does NOT matter if subgroups (i.e.
'men' and 'women') mix at different rates, although it would matter if they
would play only against players within their subgroup. Interestingly, that
suggests that letting women ’segregate’ into their own communities should
make them more aggressive, and men less aggressive. The term ’aggressive’
is a bit misleading because it does not result in bigger payoffs. However, you
could come up with a story of lexicographic preferences - people care first of
all about payoffs, but everything else equal they want to fit gender stereotypes
- so playing ’football’ is good for men’s ego.
[3 by 3 payoff matrix with columns L, C, R omitted.]
After we delete C we note that M is dominated by 2/3 D + 1/3 U . Using the above
proposition we can conclude that the only Nash equilibria are the NE of the
2 by 2 game analyzed in the previous section. Since that game had a unique
mixed strategy equilibrium we can conclude that the only NE of the 3 by 3
game is σ1∗ = 1/4 U + 3/4 D and σ2∗ = 2/3 L + 1/3 R.
It is useful to adjust the formal definition of IDSDS and allow for mixed
strategy domination:
Remark 3 Recall the above 3 by 3 game. If we would look for possible mixed
NE with supports (U,M) and (L,C) respectively, we would find a potential NE
(2/3 U + 1/3 M, 5/6 L + 1/6 C). However, this is NOT a NE because player 2 would play
R instead.
Remark 4 In the RPS game we cannot reduce the set of strategies through
IDSDS. Therefore we have to check all possible supports and check if it works.
Proposition 4 σ∗ is a Nash equilibrium if and only if σi∗ ∈ BRi (σ∗−i ) for
all i.
In the 2 by 2 game we have:

BR_1(\beta L + (1-\beta) R) = \begin{cases} U & \text{if } \beta > 2/3 \\ \{\alpha U + (1-\alpha) D \mid \alpha \in [0,1]\} & \text{if } \beta = 2/3 \\ D & \text{if } \beta < 2/3 \end{cases}

BR_2(\alpha U + (1-\alpha) D) = \begin{cases} L & \text{if } \alpha < 1/4 \\ \{\beta L + (1-\beta) R \mid \beta \in [0,1]\} & \text{if } \alpha = 1/4 \\ R & \text{if } \alpha > 1/4 \end{cases}
7 Interpretation of Mixed NE
1. Sometimes players explicitly flip coins. That fits games like Poker,
soccer etc., where players have to randomize credibly.
Lecture VI: Existence of Nash equilibrium
Markus M. Möbius
March 3, 2003
of how they should play a game, then the notion of NE and even IDSDS
can be too restrictive.
Behavioral game theory has tried to weaken the joint assumptions of
rationality and common knowledge in order to come up with better theories
of how real people play real games. Anyone interested should take David
Laibson’s course next year.
Despite these reservations about Nash equilibrium it is still a very useful
benchmark and a starting point for any game analysis.
In the following we will go through three proofs of the Existence Theorem
using various levels of mathematical sophistication:
• existence in 2 × 2 games using elementary techniques
L R
U 1,1 0,4
D 0,2 2,1
We next draw the best-response curves of both players. Recall that player
1’s strategy can be represented by a single number α such that σ1 = αU +
(1 − α)D while player 2’s strategy is σ2 = βL + (1 − β)R.
Let’s find the best-response of player 2 to player 1 playing strategy α:
u2 (L, αU + (1 − α)D) = 2 − α
u2 (R, αU + (1 − α)D) = 1 + 3α (1)
[Figure: best-response curves BR2(α) and BR1(β) in the unit square; they
cross at (α, β) = (1/4, 2/3).]
What’s useful about this approach is that it generalizes to a proof that
any two by two game has at least one Nash equilibrium, i.e. its two best
response correspondences have to intersect in at least one point.
An informal argument runs as follows:
3. Two lines which connect the left/right side and the upper/lower side
of the square respectively have to intersect in at least one point. Hence
each 2 by 2 game has a mixed Nash equilibrium.
giant correspondence BR : [0, 1] × [0, 1] → [0, 1] × [0, 1] in the following way:
BR(α, β) = BR1 (β) × BR2 (α) (4)
The following figure illustrates the properties of the combined best-response
map BR:
[Figure: the combined best-response map BR on the unit square, showing two
sample points (α1, β1) and (α2, β2) together with their images
(BR1(β1), BR2(α1)) and (BR1(β2), BR2(α2)).]
The neat fact about BR is that the Nash equilibria σ ∗ are precisely the
fixed points of BR, i.e. σ ∗ ∈ BR(σ ∗ ). In other words, if players have beliefs
σ ∗ then σ ∗ should also be a best response by them. The next lemma follows
directly from the definition of mixed Nash equilibrium:
Lemma 1 A mixed strategy profile σ ∗ is a Nash equilibrium if and only if it
is a fixed point of the BR correspondence, i.e. σ ∗ ∈ BR (σ ∗ ).
We therefore look precisely for the fixed points of the correspondence
BR which maps the square [0, 1] × [0, 1] onto itself. There is well developed
mathematical theory for these types of maps which we utilize to prove Nash
existence (i.e. that BR has at least one fixed point).
Brouwer's fixed point theorem alone is not sufficient for proving the existence
of Nash equilibria because it only applies to functions but not to correspondences.
Theorem 2 (Kakutani) A correspondence r : X → X has a fixed point
x ∈ X such that x ∈ r (x) if
1. X is a compact, convex and non-empty subset of ℝ^n .
2. r (x) is non-empty for all x ∈ X.
3. r (x) is convex for all x ∈ X.
4. r has a closed graph.
a correspondence r : [0, 1] → [0, 1] which fulfills the conditions above always
intersects with the diagonal.
[Figure: graph of a correspondence r : [0, 1] → [0, 1] crossing the diagonal;
r takes the value 3/4 to the left of x = 1/3 and 1/4 to the right of x = 2/3.]
3.1.2 Kakutani Condition II: r(x) is non-empty.
If r(x) could be empty we could define a correspondence r : [0, 1] → [0, 1]
such as the following:
r(x) = \begin{cases} 3/4 & \text{if } x \in [0, 1/3] \\ \emptyset & \text{if } x \in [1/3, 2/3] \\ 1/4 & \text{if } x \in [2/3, 1] \end{cases}    (6)
As before, this correspondence has no fixed point because of the hole in the
middle.
3.1.3 Kakutani Condition III: r(x) is convex.
[Figure: a correspondence r : [0, 1] → [0, 1] whose value at x = 1/2 is a
non-convex set.]
The graph is non-convex because r( 1/2 ) is not convex. It also does not have a
fixed point.
3.1.4 Kakutani Condition IV: r(x) has a closed graph.
This condition ensures that the graph cannot have holes. Consider the follow-
ing correspondence r : [0, 1] → [0, 1] which fulfills all conditions of Kakutani
except (4):

r(x) = \begin{cases} 1/2 & \text{if } x < 1/2 \\ [1/4, 1/2) & \text{if } x = 1/2 \\ 1/4 & \text{if } x > 1/2 \end{cases}    (8)

[Figure: graph of r with a half-open segment [1/4, 1/2) at x = 1/2.]

Note, that r( 1/2 ) is the convex set [1/4, 1/2) but that this set is not closed. Hence
the graph is not closed. For example, consider the sequence x_n = 1/2 and
y_n = 1/2 − 1/(n+2) for n ≥ 2. Clearly, we have y_n ∈ r(x_n ). However, x_n → x∗ = 1/2
and y_n → y∗ = 1/2 but y∗ ∉ r(x∗ ). Hence the graph is not closed.
We have to check (a) that BR is a map from some compact and convex
set X into itself, and (b) conditions (1) to (4) of Kakutani.
• First note, that BR : [0, 1] × [0, 1] → [0, 1] × [0, 1]. The square X =
[0, 1] × [0, 1] is convex, and it is compact because it is bounded and closed.
• Now check condition (2) of Kakutani - BR(σ) is non-empty. This is
true if BR1 (σ2 ) and BR2 (σ1 ) are non-empty. Let’s prove it for BR1 -
the proof for BR2 is analogous. Player 1 will get the following payoff
u1,β (α) from playing strategy α if the other player plays β:
u1,β (α) = αβu1 (U, L) + α(1 − β)u1 (U, R) +
+ (1 − α)βu1 (D, L) + (1 − α)(1 − β)u1 (D, R) (9)
The function u1,β is continuous in α. We also know that α ∈ [0, 1] which
is a closed interval. Therefore, we know that the continuous function
u1,β reaches its maximum over that interval (standard min-max result
from real analysis - continuous functions reach their minimum and max-
imum over closed intervals). Hence there is at least one best response
α∗ which maximizes player 1’s payoff.
• Condition (3) requires that if player 1 has two best responses α1∗ U +
(1 − α1∗ )D and α2∗ U + (1 − α2∗ )D to player 2 playing βL + (1 − β)R then
the strategy where player 1 chooses U with probability λα1∗ + (1 − λ)α2∗
for some 0 < λ < 1 is also a best response (i.e. BR1 (β) is convex).
But since both the α1 and the α2 strategy are best responses of player
1 to the same β strategy of player 2 they also have to provide the same
payoffs to player 1. But this implies that if player 1 plays strategy α1
with probability λ and α2 with probability 1 − λ she will get exactly
the same payoff as well. Hence the strategy where she plays U with
probability λα1∗ +(1−λ)α2∗ is also a best response and her best response
set BR1 (β) is convex.
• The final condition (4) requires that BR has a closed graph. To show
this consider a sequence σ n = (αn , β n ) of (mixed) strategy profiles and
σ̃ n = (α̃n , β̃ n ) ∈ BR (σ n ). Both sequences are assumed to converge to
σ ∗ = (α∗ , β ∗ ) and σ̃ ∗ = (α̃∗ , β̃ ∗ ), respectively. We now want to show
that σ̃∗ ∈ BR (σ∗ ) to prove that BR has a closed graph.
We know that for player 1, for example, we have

u_1(\tilde\alpha^n, \beta^n) \ge u_1(\alpha', \beta^n)

for any α′ ∈ [0, 1]. Note, that the utility function is continuous in both
arguments because it is linear in α and β. Therefore, we can take the
limit on both sides while preserving the inequality sign:

u_1(\tilde\alpha^*, \beta^*) \ge u_1(\alpha', \beta^*)

for all α′ ∈ [0, 1]. This shows that α̃∗ ∈ BR1 (β∗ ) and therefore σ̃∗ ∈
BR (σ∗ ). Hence the graph of the BR correspondence is closed.
Therefore, all four Kakutani conditions apply and the giant best-response
correspondence BR has a fixed point, and each 2 × 2 game has a Nash
equilibrium.
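The fixed point can also be confirmed numerically for the example game above (rows U, D; columns L, R): at the candidate (α, β) = (1/4, 2/3) no pure deviation improves either player's expected payoff, so σ∗ ∈ BR(σ∗). A minimal sketch:

```python
# Verify sigma* = (alpha, beta) = (1/4, 2/3) is a fixed point of BR
# by checking that no pure deviation raises expected payoffs.

def u1(a, b):   # player 1's expected payoff (payoffs 1,0 / 0,2 from the matrix)
    return a * b * 1 + (1 - a) * (1 - b) * 2

def u2(a, b):   # player 2's expected payoff (payoffs 1,4 / 2,1)
    return a * b * 1 + a * (1 - b) * 4 + (1 - a) * b * 2 + (1 - a) * (1 - b) * 1

a, b, tol = 1 / 4, 2 / 3, 1e-9
print(u1(a, b) >= max(u1(1.0, b), u1(0.0, b)) - tol)   # True: alpha is a best response
print(u2(a, b) >= max(u2(a, 1.0), u2(a, 0.0)) - tol)   # True: beta is a best response
```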
We can now check that all conditions of Kakutani are fulfilled in the gen-
eral finite case. Checking them is almost 1-1 identical to checking Kakutani’s
condition for 2 × 2 games.
Condition 1: The individual mixed strategy sets Σi are clearly non-
empty because every player has at least one strategy. Since Σi is compact
Σ = Σ1 ×...×ΣI is also compact. Hence the BR correspondence BR : Σ → Σ
acts on a compact and convex non-empty set.
Condition 2: For each player i we can calculate his utility ui,σ−i (σi ) for
σi ∈ Σi . Since Σi is compact and ui,σ−i is continuous the set of payoffs is also
compact and hence has a maximum. Therefore, BRi (σ−i ) is non-empty.
Condition 3: Assume that σi1 and σi2 are both BR of player i to σ−i .
Both strategies have to give player i equal payoffs then and any linear com-
bination of these two strategies has to be a BR for player i, too.
Condition 4: Assume that σ^n is a sequence of strategy profiles and
σ̃^n ∈ BR (σ^n ). Both sequences converge to σ∗ and σ̃∗ , respectively. We
know that u_i(\tilde\sigma_i^n, \sigma_{-i}^n) \ge u_i(\sigma_i', \sigma_{-i}^n)
for all σ′i ∈ Σi . Note, that the utility function is continuous in both arguments
because it is linear.3 Therefore, we can take the limit on both sides while
preserving the inequality sign:
u_i(\tilde\sigma_i^*, \sigma_{-i}^*) \ge u_i(\sigma_i', \sigma_{-i}^*)

for all σ′i ∈ Σi . This shows that σ̃i∗ ∈ BRi (σ∗−i ) and therefore σ̃∗ ∈ BR (σ∗ ).
Hence the graph of the BR correspondence is closed.
So Kakutani’s theorem applies and the giant best-response map BR has
a fixed point.
3 It is crucial here that the set of pure strategies is finite.
Lecture VII: Common Knowledge
Markus M. Möbius
March 4, 2003
1 A Model of Knowledge
There is a set of states of nature Ω = {ω1 , ω2 , .., ωn } which represent the
uncertainty which an agent faces when making a decision.
• (P1) ω ∈ hi (ω),
• (P2) if ω′ ∈ hi (ω) then hi (ω′ ) = hi (ω).
Note, that the subsets hi (ω) span Ω. We can think of hi (ω) as the knowledge
of agent i if the state of nature is in fact ω. Property P1 ensures that the true
state of nature ω is an element of an agent’s information set (or knowledge) -
this is called the axiom of knowledge. Property P2 is a consistency criterion.
Assume for example, that ω′ ∈ hi (ω) and that there is a state ω′′ ∈ hi (ω′ )
but ω′′ ∉ hi (ω). Then if the state of nature is ω the decision-maker could
argue that because ω′′ is inconsistent with his information the true state can
not be ω′ .
H1 = {{ω1 , ω2 } , {ω3 }}
So the agent has good information if the weather is going to be sunny but
cannot distinguish between the two bad-weather states.
So the set K (E) is the collection of all states in which the decision maker
knows E.
We are now ready to define common knowledge (for simplicity we only
consider two players).
This definition implies that players 1 and 2 know E, they know that the
other player knows it, and so on.
There is an equivalent definition of common knowledge which is frequently
easier to work with.
Example 1 (cont.) Agent 2 has information function
In this case the event E = {ω1 , ω2 } is common knowledge if the state of nature
is ω1 or ω2 . Both definitions can be applied - E survives iterated deletion, but
is also self-evident.
1. Ki (E) = E for i = 1, 2
Proof: Assume a). Then for every ω ∈ E we have hi (ω) ⊆ E and b) follows.
c) follows immediately. c) implies a).
2 Dirty Faces
We have played the game in class. Now we are going to analyze it by using
the mathematical language we just developed. Recall, that any agent can
only see the faces of all n − 1 other agents but not her own. Furthermore, at
least one face is dirty.
First of all we define the states of nature Ω. If there are n players there
are 2n − 1 possible states (since all faces clean cannot be a state of nature).
It’s convenient to denote the states by the n-tuples ω = (C, C, D, .., C). We
also denote the number of dirty faces with |ω| and note that |ω| ≥ 1 by
assumption (there is at least one dirty face).
The initial information set of each agent in some state of nature ω has at
most two elements. The agent knows the faces of all other agents ω (−i) but
does not know if she has a clean or dirty face, i.e. hi (ω) = {(C, ω (−i)) , (D, ω (−i))}.
Initially, all agents' information sets have two elements except in the case |ω| = 1
when one agent sees only clean faces - then she knows for sure that she has a
dirty face because all clean is excluded. You can easily show that the event
”There is at least one dirty face” is common knowledge as you would expect.
The game ends when at least one player knows the state of the world for sure,
i.e. her knowledge partition hi (ω) consists of a single element. In the first period
this will only be the case if the state of the world is such that only a single player
has a dirty face.
What happens if no agent raises her hand in the first period? All agents
update their information partition and exclude all states of nature with just
one dirty face. All agents who see just one dirty face now know for sure the
state of nature (they have a dirty face!). They raise their hand and the game
is over. Otherwise, all agents can exclude states of nature with at most two
dirty faces. Agents who see two dirty faces now know the state of nature for
sure (they have a dirty face!). etc.
The state of nature with k dirty faces therefore gets revealed at stage k
of the game. At that point all guys with dirty faces know the state of nature
for sure.
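The induction can be simulated directly: track, for each agent, the set of states consistent with what she sees and with the public history of silence. The sketch below is a minimal model of this reasoning, not code from the lecture.

```python
# Dirty faces: agents raise their hands once their knowledge set is a singleton.
from itertools import product

def solve(state):                        # state: tuple of 'C'/'D' with >= 1 'D'
    n = len(state)
    possible = [s for s in product("CD", repeat=n) if s.count("D") >= 1]
    for k in range(1, n + 1):            # round k
        raised = []
        for i in range(n):
            # states matching what agent i observes (every face except her own)
            consistent = [s for s in possible
                          if all(s[j] == state[j] for j in range(n) if j != i)]
            if len(consistent) == 1:     # agent i now knows the true state
                raised.append(i)
        if raised:
            return k, raised
        # public silence eliminates all states with exactly k dirty faces
        possible = [s for s in possible if s.count("D") > k]

print(solve(("D", "C", "C")))   # (1, [0])
print(solve(("D", "D", "C")))   # (2, [0, 1])
print(solve(("D", "D", "D")))   # (3, [0, 1, 2])
```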
Question: What happens if it is common knowledge that neither all faces
are clean, nor all faces are dirty?
Remark 1 The game crucially depends on the fact that it is common knowl-
edge that at least one agent has a dirty face. Assume no such information
would be known - so the state of the world where all faces are clean would
be a possible outcome. Then no agent in the first round would ever know for
sure if she had a dirty face. Hence the information partition would never get
refined after any number of rounds.
3 Coordinated Attack
This story shows that ’almost common knowledge’ can be very different from
common knowledge.
Two divisions of an army are camped on two hilltops overlooking a com-
mon valley. In the valley waits the enemy. If both divisions attack simultane-
ously they will win the battle, whereas if only one division attacks it will be
defeated. Neither general will attack unless he is sure that the other will attack
with him.
Commander A is in peace negotiations with the enemy. The generals
agreed that if the negotiations fail, commander A will send a message to
commander B with an attack plan. However, there is a small probability ε
that the messenger gets intercepted and the message does not arrive. The
messenger takes one hour normally. How long will it take to coordinate on
the attack?
The answer is: never! Once commander B receives the message he has to
confirm it - otherwise A is not sure that he received it and will not attack.
But B cannot be sure that A receives his confirmation and will not attack
until he receives another confirmation from A. etc. The messenger can run
back and forth countless times before he is intercepted but the generals can
never coordinate with certainty.
Let’s define the state of nature to be (n, m) if commander A sent n
messages and received n − 1 confirmations from commander B, and commander
B sent m messages and received m − 1 confirmations. We also introduce the
state of nature (0, 0). In that state the peace negotiations have succeeded,
and no attack is scheduled.1
The information partition of commander A is
HA = {{(0, 0)} , {(1, 0), (1, 1)} , {(2, 1), (2, 2)} , ..} . (1)
1 As was pointed out by an alert student in class this state is necessary to make this
exercise interesting. Otherwise, the generals could agree on an attack plan in advance,
and no communication would be necessary at all - the attack would be common knowledge
already.
The information partition of commander B is
HB = {{(0, 0), (1, 0)} , {(1, 1), (2, 1)} , ..} . (2)
• µi (ω) is agent i’s belief about the actions taken by his opponent.
• knows the other players’ actions: hi (ω) ⊆ {ω 0 ∈ Ω|a−i (ω 0 ) = a−i (ω)}
• has a belief which is consistent with his knowledge: the support µi (ω)
is a subset of {a−i (ω 0 ) ∈ A−i : ω 0 ∈ hi (ω)}
4.2 IDSDS
For applying iterated deletion of strictly dominated strategies we can use a
less restrictive information partition.
Lecture VIII: Learning
Markus M. Möbius
March 6, 2003
1 Introduction
What are the problems with Nash equilibrium? It has been argued that Nash
equilibria are a reasonable minimum requirement for how people should
play games (although this is debatable as some of our experiments have
shown). It has been suggested that players should be able to figure out
Nash equilibria starting from the assumption that the rules of the game, the
players’ rationality and the payoff functions are all common knowledge.1 As
Fudenberg and Levine (1998) have pointed out, there are some important
conceptual and empirical problems associated with this line of reasoning:
1. If there are multiple equilibria it is not clear how agents can coordinate
their beliefs about each other’s play by pure introspection.
3. Equilibrium theory does a poor job explaining play in early rounds of
most experiments, although it does much better in later rounds. This
shift from non-equilibrium to equilibrium play is difficult to reconcile
with a purely introspective theory.
1. We can imagine that players are locked into their actions for quite
a while (they invest infrequently, can't build a new factory overnight
etc.) and that their discount factors (the factor by which they weight
the future) are small compared to that lock-in length. It then makes sense
to treat agents as approximately myopic when making their decisions.
2 Cournot Adjustment
In the Cournot adjustment model two fixed players move sequentially and
choose a best response to the play of their opponent in the last period.
The model was originally developed to explain learning in the Cournot
model. Firms start from some initial output combination (q10 , q20 ). In the
first round both firms adapt their output to be the best response to the opponent's initial output. They
therefore play (BR1 (q20 ) , BR2 (q10 )).
This process is repeated and it can be easily seen that in the case of linear
demand and constant marginal costs the process converges to the unique
Nash equilibrium. If there are several Nash equilibria the initial conditions
will determine which equilibrium is selected.
• In each period play can actually change quite a lot. Intelligent firms
should anticipate their opponents play in the future and react accord-
ingly. Intuitively, this should speed up the adjustment process.
[Figure: Cournot adjustment dynamics. Starting from (q1^0, q2^0), play moves
along the best-response curves BR1(q2) and BR2(q1) and converges to the
Nash equilibrium (q1∗, q2∗); intercepts at (α−c)/2β and (α−c)/β.]
3 Fictitious Play
In the process of fictitious play players assume that their opponents strategies
are drawn from some stationary but unknown distribution. As in the Cournot
adjustment model we restrict attention to a fixed two-player setting. We also
assume that the strategy sets of both players are finite.
In fictitious play players choose the best response to their assessment of
their opponent’s strategy. Each player has some exogenously given weighting
function κ_i^0 : S_{-i} → ℝ_+ . After each period the weights are updated by adding
1 to each opponent strategy each time it has been played:

\kappa_i^t(s_{-i}) = \begin{cases} \kappa_i^{t-1}(s_{-i}) & \text{if } s_{-i} \ne s_{-i}^{t-1} \\ \kappa_i^{t-1}(s_{-i}) + 1 & \text{if } s_{-i} = s_{-i}^{t-1} \end{cases}
The player then chooses a pure strategy which is a best response to his
assessment of other players’ strategy profiles. Note that there is not neces-
sarily a unique best-response to every assessment - hence fictitious play is
not always unique.
We also define the empirical distribution d_i^t(s_i) of each player's strategies as

d_i^t(s_i) = \frac{\sum_{\tilde t = 0}^{t} I^{\tilde t}(s_i)}{t}

The indicator function I^{\tilde t}(s_i) is set to 1 if the strategy has been played in period t̃
and 0 otherwise. Note, that as t → ∞ the empirical distribution d_j^t of player
j's strategies approximates the weighting function κ_i^t (since in a two player
game we have j = −i).
Remark 1 The updating of the weighting function looks intuitive but also
somewhat arbitrary. It can be made more rigorous in the following way.
Assume, that there are n strategy profiles in S−i and that each profile is
played by player i's opponents with probability p (s−i ). Agent i has a prior
belief according to which these probabilities are distributed. This prior is
a Dirichlet distribution whose parameters depend on the weighting function.
After each round agents update their prior: it can be shown that the posterior
belief is again Dirichlet and the parameters of the posterior depend now on
the updated weighting function.
Proposition 1 If s is a strict Nash equilibrium, and s is played at date t in
the process of fictitious play, s is played at all subsequent dates. That is, strict
Nash equilibria are absorbing for the process of fictitious play. Furthermore,
any pure-strategy steady state of fictitious play must be a Nash equilibrium.
H T
H 1,−1 −1,1
T −1,1 1,−1
Assume that both players have initial weights (1.5,2) and (2,1.5).
Then fictitious play cycles as follows: In the first period, 1 and 2 play T, so
the weights the next period are (1.5,3) and (2, 2.5). Then 1 plays T and 2
plays H for the next two periods, after which 1’s weights are (3.5,3) and 2’s
are (2,4.5). At this point 1 switches to H, and both players play H for the
next three periods, at which point 2 switches to T, and so on. It may not be
obvious, but although the actual play in this example cycles, the empirical
distribution over each player's strategies converges to (1/2, 1/2) - this is
precisely the unique mixed Nash equilibrium.
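This cycling-with-convergence behavior is easy to reproduce; the sketch below runs fictitious play from the initial weights in the example and reports player 1's empirical frequency of H, which approaches 1/2.

```python
# Fictitious play in matching pennies from initial weights (1.5, 2) and (2, 1.5).
ACTIONS = ["H", "T"]
def u1(a, b): return 1 if a == b else -1    # player 1 wants to match
def u2(a, b): return -u1(b, a)              # zero-sum: player 2 wants to mismatch

def best_response(weights, payoff):
    # expected payoff of each own action against the empirical belief
    total = sum(weights.values())
    return max(ACTIONS, key=lambda a: sum(payoff(a, b) * w / total
                                          for b, w in weights.items()))

w1 = {"H": 1.5, "T": 2.0}   # player 1's weights over player 2's actions
w2 = {"H": 2.0, "T": 1.5}   # player 2's weights over player 1's actions
count_h, T = 0, 100_000
for t in range(T):
    a1, a2 = best_response(w1, u1), best_response(w2, u2)
    w1[a2] += 1; w2[a1] += 1
    count_h += a1 == "H"
print(count_h / T)   # close to 0.5, the mixed equilibrium frequency
```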
This observation leads to a general result.
These results don’t tell us when fictitious play converges. The next the-
orem does precisely that.
[3 by 3 payoff matrix with columns L, M, R omitted.]
The unique mixed NE of the game is s1 = s2 = (1/3, 1/3, 1/3).
If the initial play is (T, M ) then the sequence becomes (T, M ), (T, R),
(M, R), (M, L), (D, L), (D, M ), (T, M ).
One can show that the number of times each profile is played increases at
a fast enough rate such that the play never converges. Also note, that the
diagonal entries are never played.
A B
A 0,0 1,1
B 1,1 0,0
Assume that the initial weights are (1, √2) for both players. In the first
period both players think the other will play B, so both play A. The next pe-
riod the weights are (2, √2), and both play B; the outcome is the alternating
sequence (B, B), (A, A), (B, B), and so on. The empirical frequencies of each
player's choices converge to (1/2, 1/2), which is the Nash equilibrium. The
realized play is always on the diagonal, however, and both players receive
payoff 0 each period. Another way of putting this is that the empirical joint
distribution on pairs of actions does not equal the product of the two marginal
distributions, so the empirical joint distribution corresponds to correlated as
opposed to independent play.
This type of correlation is not very appealing. In particular, agents don't seem
to be smart enough to recognize cycles which they could exploit. Hence the
attractive property of convergence to a Nash equilibrium can be misleading
if the equilibrium is mixed.
Lecture IX: Evolution
Markus M. Möbius
March 15, 2003
1 Introduction
For models of learning we typically assume a fixed number of players who
find out about each other’s intentions over time. In evolutionary models the
process of learning is not explicitly modeled. Instead, we assume that strate-
gies which do better on average are played more often in the population over
time. The biological explanation for this is that individuals are genetically
programmed to play one strategy and their reproduction rate depends on
their fitness, i.e. the average payoff they obtain in the game. The economic
explanation is that there is social learning going on in the background -
people find out gradually which strategies do better and adjust accordingly.
However, that adjustment process is slower than the rate at which agents
play the game.
We will focus initially on models with random matching: there are N
agents who are randomly matched against each other over time to play a
certain game. Frequently, we assume that N is infinite. We have discussed
in the last lecture that random matching gives rise to myopic play because
there are no repeated game concerns (I’m unlikely to ever encounter my
current opponent again).
We will focus on symmetric n by n games for the purpose of this course.
In a symmetric game each player has the same strategy set and the payoff
matrix satisfies ui (si , sj ) = uj (sj , si ) for each player i and j and strategies
si , sj ∈ Si = Sj = {s1 , .., sn }. Many games we encountered so far in the
course are symmetric such as the Prisoner’s Dilemma, Battle of the Sexes,
Chicken and all coordination games. In symmetric games both players face
exactly the same problem and their optimal strategies do not depend on
whether they play the role of player 1 or player 2.
An important assumption in evolutionary models is that each agent plays
a fixed pure strategy until she dies, or has an opportunity to learn and update
her belief. The game is fully specified if we know the fraction of agents
who play strategy s1 , s2 , .., sn which we denote with x1 , x2 , .., xn such that
\sum_{i=1}^{n} x_i = 1.
3 Replicator Dynamics
The replicator dynamics is one particular selection mechanism which captures
the notion that strategies with above average payoff should spread in the
population. Typically, the replicator dynamics is modelled without allowing
for mutations - the dynamics therefore becomes deterministic.
Since the stage game is symmetric we know that u (si , sj ) = u1 (si , sj ) =
u2 (sj , si ) . The average payoff of strategy si for a player is u (si , x) since he
is randomly matched with probability xj against agents playing sj (x =
(x1 , x2 , .., xn )). The average payoff of all strategies is \sum_{i=1}^{n} x_i u(s_i , x) = u (x, x).

1 If we start from an all-cooperating state, mutating agents will defect and do better.
Hence they spread, and finally take over. Without mutations the system might be 'stuck'
in the all-cooperating state.
Strategy si does better than the average strategy if u (si , x) > u (x, x),
and worse otherwise. A minimum requirement for a selection mechanism is
that sgn (ẋi (t)) = sgn [u (si , x) − u (x, x)]. The share xi increases over time
if and only if strategy si does better than average. The replicator dynamics
is one particular example:
Definition 1 In the replicator dynamics the share xi of the population play-
ing strategy si evolves over time according to:
\frac{\dot x_i}{x_i} = u(s_i , x) - u(x, x)
If xi = 0 at time 0 then we have xi = 0 at all subsequent time periods: if
nobody plays strategy si then the share of population playing it can neither
decrease nor increase.
The next proposition makes sure that the definition is consistent (i.e.
population share always sum up to 1).
Proposition 1 The population shares always sum up to 1.
Note, that the reverse is NOT true. If all players cooperate in a Prisoner’s
Dilemma this will be a steady state (since there are no mutations).
3.1.1 Example I
In 2 by 2 games the replicator dynamics is easily understood. Look at the
following game:
A B
A 0,0 1,1
B 1,1 0,0
There are only two types in the population and x = (xA , xB ). It’s enough to
keep track of xA .
\frac{\dot x_A}{x_A} = x_B - 2 x_A x_B = (1 - x_A)(1 - 2 x_A)    (1)
It's easy to see that ẋA > 0 for 0 < xA < 1/2 and ẋA < 0 for 1 > xA > 1/2. This
makes xA = 1/2 a ’stable’ equilibrium (see below for a precise definition).
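A crude Euler discretization of equation (1) confirms this: from any interior starting point the share xA approaches 1/2. The step size and horizon below are arbitrary choices.

```python
# Discretized replicator dynamics for Example I: x_dot = x_A (1 - x_A)(1 - 2 x_A).

def simulate(x_a, dt=0.01, steps=50_000):
    for _ in range(steps):
        x_a += dt * x_a * (1 - x_a) * (1 - 2 * x_a)
    return x_a

print(simulate(0.05), simulate(0.95))   # both converge to 0.5
```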
3.1.2 Example II
Now look at the New York game.
E C
E 1,1 0,0
C 0,0 1,1
We can show:
\frac{\dot x_E}{x_E} = x_E - (x_E x_E + x_C x_C) = (1 - x_E)(2 x_E - 1)    (2)

Now the steady state xE = 1/2 is 'unstable'.
3.2 Stability
Definition 3 A steady state σ is stable if for all ² > 0 there exists δ > 0
such that if the process starts a distance δ away from the steady state it will
never get further away than ².
Proof: Assume not. Then there exists a profitable deviation si such that
u (si , σ) − u (σ, σ) = b > 0. Because the linear utility function is uni-
formly continuous there exists some ε such that for all x a distance less
than ε away we have |u (si , x) − u (si , σ)| < b/4 and |u (x, x) − u (σ, σ)| <
b/4. This implies that u (si , x) − u (x, x) > b/2. Because σ is stable
we know that for x starting close enough to σ (less than a distance δ) the dy-
namics stays within ε of σ. So take x (0) = (1 − δ/2) σ + (δ/2) si . Then we
have dx_i/dt = x_i (u (si , x) − u (x, x)) ≥ x_i b/2 ≥ (δ/2)(b/2). But this implies that
x_i (t) → ∞ which is a contradiction.
[Rock-Paper-Scissors payoff matrix omitted.]
We then get:
\frac{\dot x_R}{x_R} = -x_P + x_S    (5)

\frac{\dot x_P}{x_P} = x_R - x_S    (6)
The unique mixed equilibrium (1/3, 1/3, 1/3) is the only steady state of the system.
To see whether it is stable we have to linearize the system around the
steady state by defining x̃i = xi − 1/3.
Definition 6 The state x is ESS if for all y ≠ x there exists ε̄ such that
u (x, (1 − ε) x + εy) > u (y, (1 − ε) x + εy) for all 0 < ε < ε̄.
This definition captures the idea that a stable state is impervious to muta-
tions since they do worse than the original strategy.
Proposition 3 x is ESS iff for all y ≠ x either (a) u (x, x) > u (y, x) or (b)
u (x, x) = u (y, x) and u (x, y) > u (y, y).
The following results establish the link between Nash equilibrium and
ESS.
Proposition 6 If x is a totally mixed ESS then it is the unique ESS.
The next theorem establishes the link between ESS and stability under
the replicator dynamics.
5 Stochastic Stability
Stochastic stability combines both mutations and a selection mechanism. It
provides a much more powerful selection mechanism for Nash equilibria than
ESS and the replicator dynamics.
The ESS concept gives us some guidance as to which Nash equilibria are
stable under perturbations or ’experimentation’ by agents. The replicator
dynamics tells us how a group of agents can converge to some steady state
starting from initial conditions. However, selection relies in many cases on the
initial conditions (take the coordination game, for example). In particular,
we cannot select between multiple strict equilibria.
For this section we concentrate on generic coordination games:
A B
A a,a b,c
B c,b d,d
We assume, that a > c and d > b such that both (A, A) and (B, B) are NE
of the game. We assume that a + b > c + d such that A is the risk-dominant
strategy for each player. Note, that the risk-dominant NE is not necessarily
the Pareto optimal NE.
We have seen in experiments that agents tend to choose A. Can we justify
this with an evolutionary story?
YES!
Assume that there is a finite number n of agents who are randomly
matched against each other in each round. Assume that agents choose the
best response to whatever strategy did better in the population in the last
period. This is called the BR dynamics. Clearly, all agents will choose A if
x_A > q^* = \frac{d - b}{a - b + d - c}. Because A is risk-dominant we have q∗ < 1/2.
There are also mutations in each period. Specifically, with probability ε
each agent randomizes between both strategies.
We define the basin of attraction of xA = 1 (everybody plays A) to be
BA = [q ∗ , 1] and the basin of attraction of xA = 0 to be [0, q ∗ ]. Whenever
the initial state of the system is within the basin of attraction it converges
to all A/ all B for sure if there are no mutations. We define the radius of the
basin BA to be the number of mutations it takes to ’jump out’ of the state
all A. We get RA = (1 − q ∗ ) n. Similarly, we define the co-radius CRA as the
number of mutations it takes at most to ’jump into’ the basin BA . We get
CRA = q ∗ n.
Theorem 3 If CRA < RA then the state ’all A’ is stochastically stable. The
expected waiting time to reach the state ’all A’ is of the order ε^(−CRA).
Lecture X: Extensive Form Games
Markus M. Möbius
April 9, 2003
1 Introduction
While models presented so far are fairly general in some ways it should be
noted that they have one main limitation as far as accuracy of modeling goes
- in each game each player moves once and moves simultaneously.
This misses common features both of many classic games (bridge, chess)
and of many economic models.1 A few examples are:
2 Extensive Form Games
The extensive form of a game is a complete description of (1) who moves
when, (2) which actions each player can choose, (3) what players know when
they move, and (4) the payoffs at each terminal node.
[Figure: the entry game - firm 2 chooses In or Out; Out ends the game with
payoffs (2, 0); after In, firm 1 chooses F (fight), yielding (−1, −1), or A
(accommodate), yielding (1, 1). Payoffs are listed as (firm 1, firm 2).]
[Figure: a quantity-setting game - firm 1 chooses q1 ∈ {0, 1, 2}; firm 2
observes q1 and chooses q2 ∈ {0, 1, 2}. The terminal payoffs (u1, u2) are
profits with price 3 − q1 − q2 and zero costs:

            q2 = 0    q2 = 1    q2 = 2
  q1 = 0    (0, 0)    (0, 2)    (0, 2)
  q1 = 1    (2, 0)    (1, 1)    (0, 0)
  q1 = 2    (2, 0)    (0, 0)   (−2, −2)  ]
[Figure: matching pennies where player 2 observes player 1's move - player 1
chooses H or T; player 2 sees the choice and chooses H or T; payoffs are
(1, −1) if the choices match and (−1, 1) otherwise.]
[Figure: matching pennies where player 2 cannot observe player 1's move -
the same tree as above, but player 2's two nodes are joined in a single
information set.]
The extensive form representation allows that players can ’forget’ infor-
mation. For example we can assume that in a game with 4 rounds player 2
can observe player 1’s move in round 1, but in round 4 he has forgotten the
move of player 1. In most cases, we assume perfect recall which rules out
that players have such ’bad memory’.2
2. A finite set T of nodes which form a tree, along with functions giving
for each non-terminal node t ∉ Z (Z is the set of terminal nodes) the
player who moves at t, the information set containing t, and the set of
available actions.
3. Payoff functions ui : Z → R giving the players' payoffs as a function of
the terminal node reached (the terminal nodes are the outcomes of the
game).
We sometimes write i (h) and A (h) since the action set is the same for
each node in the same information set.
It is useful to go over the definition in detail in the matching pennies game
where player 2 can’t observe player 1’s move. Let’s number the non-terminal
nodes x1 , x2 and x3 (top to bottom).
3. The tree defines clearly the terminal nodes, and shows that x2 and x3
are successors to x1 .
Write Ai for the set of actions available to player i at any of his information
sets.
Note that a strategy is a complete contingent plan explaining what a
player will do in any situation that arises. At first, a strategy looks overspec-
ified: earlier action might make it impossible to reach certain sections of a
tree. Why do we have to specify how players would play at nodes which can
never be reached if a player follows his strategy early on? The reason is that
play off the equilibrium path is crucial to determine if a set of strategies form
a Nash equilibrium. Off-equilibrium threats are crucial. This will become
clearer shortly.
Out In
F 2,0 −1,−1
A 2,0 1,1
3 You might think that a more natural definition would simply define mixed strategies
as a randomization over a set of pure strategies (just as in simultaneous move games). It
can be shown that for games with perfect recall this definition is equivalent to the one
given here, i.e. a mixed strategy is a mixed behavior strategy and vice versa. In games
without perfect recall this is no longer true - it is instructive to convince yourself that in
such games each mixed behavior strategy is a mixed strategy but not vice versa.
4.2 Example II: Stackelberg Competition
Firm 1 chooses q1 and firm 2 chooses a quantity q2 (q1). With three possible
output levels, firm 1 has three strategies, while firm 2 has 3³ = 27 different
strategies because it can choose one of three quantities at each of its three
information sets.
4.3 Example III
[Table: in the matching pennies game where player 2 observes player 1's
move, player 2's four strategies are HH, HT, TH and TT.]
4.4 Example IV
Look at the following extensive form game:
[Figure: player 1 chooses L, ending the game with payoffs (1, 0), or R;
after R player 2 chooses L, yielding (0, 1), or R, after which player 1
chooses a, yielding (4, 0), or b, yielding (3, 3).]
One might be tempted to say that player 1 has three strategies because there
are only three terminal nodes which can be reached. However, there are 4
because La and Lb are two distinct strategies. After player 1 plays L it
is irrelevant for the final outcome what he would play in the bottom node.
However, this off-equilibrium play is important for player 2's decision process
which in turn makes player 1 decide whether to play L or R.
L R
La 1,0 1,0
Lb 1,0 1,0
Ra 0,1 4,0
Rb 0,1 3,3
5 Nash Equilibrium in Extensive Form Games
We can apply NE in extensive form games simply by looking at the normal
form representation. It turns out that this is not an appealing solution
concept because it allows for too many profiles to be equilibria.
Look at the entry game. There are two pure strategy equilibria: (A,In)
and (F,Out) as well as mixed equilibria (αF + (1 − α)A, Out) for α ≥ 1/2.
Why is (F,Out) a Nash equilibrium? Firm 2 stays out because he thinks
that firm 1 will fight entry. In other words, the threat to fight entry is
sufficient to keep firm 2 out. Note, that in equilibrium this threat is never
played since firm 2 stays out in the first place.
The problem with this equilibrium is that firm 2 could call firm 1’s bluff
and enter. Once firm 2 has entered it is in the interest of firm 1 to accom-
modate. Therefore, firm 1’s threat is not credible. This suggests that only
(A,In) is a reasonable equilibrium for the game since it does not rely on non-
credible threats. The concept of subgame perfection which we will introduce
in the next lecture rules out non-credible threats.
s1 = q1⁰
s2 = (1 − q1⁰)/2  if q1 = q1⁰
s2 = 1 − q1       if q1 ≠ q1⁰
In words: firm 2 floods the market such that the price drops to zero if firm
1 does not choose q10 . It is easy to see that these strategies form a NE. Firm
1 can only do worse by deviating since profits are zero if firm 2 floods the
market. Firm 2 plays a BR to q10 and therefore won’t deviate either.
Note, that in this game things are even worse. Unlike the Cournot game
where we got a unique equilibrium we now have a continuum of equilibria.
Second, we have even more disturbing non-credible threats. For instance, in
the equilibrium where q10 = 1 firm 2’s threat is ”if you don’t flood the market
and destroy the market I will”. Not only won’t the threat be carried out -
it’s also hard to see why it would be made in the first place.
Lecture XI: Subgame Perfect Equilibrium
Markus M. Möbius
April 9, 2003
1 Introduction
Last time we discussed extensive form representation and showed that there
are typically lots of Nash equilibria. Many of them look unreasonable because
they are based on out of equilibrium threats. For example, in the entry
game the incumbent can deter entry by threatening to flood the market. In
equilibrium this threat is never carried out. However, it seems unreasonable
because the incumbent would do better accommodating the entrant if entry
in fact occurs. In other words, the entrant can call the incumbent’s bluff by
entering anyway.
Subgame perfection is a refinement of Nash equilibrium. It rules out
non-credible threats.
2 Subgames
Definition 1 A subgame G′ of an extensive form game G consists of (1) a
single node x which is the only member of its information set, (2) all
successors of x in the game tree, and (3) the information sets, actions and
payoffs of G restricted to these nodes (no information set of G′ may contain
nodes outside G′).
2.1 Example I: Entry Game
[Figure: the entry game - firm 2 chooses In or Out; Out yields (2, 0); after
In, firm 1 chooses F, yielding (−1, −1), or A, yielding (1, 1).]
This game has two subgames. The entire game (which is always a sub-
game) and the subgame which is played after player 2 has entered the market:
[Figure: the subgame after entry - firm 1 chooses F, yielding (−1, −1), or
A, yielding (1, 1).]
2.2 Example II
[Figure: a two-round game - in round 1 players 1 and 2 simultaneously
choose L or R; in round 2 they play a simultaneous-move game with actions
a and b.]
This game has also two subgames. The entire game and the subgame (a
simultaneous move game) played after round 2:
[Figure: the simultaneous-move subgame of the second round, with actions a
and b for both players.]
The FOC for maximizing u2 is q2∗ = 1/2.
Note that firm 2 (the first mover) produces more than in Cournot.
There are many games which fit the Stackelberg paradigm such as mon-
etary policy setting by the central bank, performance pay for managers etc.
We will discuss general results for this class of games in the next lecture.
4 Backward Induction
The previous example illustrates the most common technique for finding and
verifying that you have found the SPE of a game. Start at the end of the
game and work your way back to the start.
We will focus for the moment on extensive form games where each infor-
mation set is a single node (i.e. players can perfectly observe all previous
moves).
What does generic mean? With generic payoffs players are never indiffer-
ent between two strategies. If payoffs are randomly selected at the terminal
nodes then indifference between two actions is a zero probability event. More
mathematically, we can say that the result holds for almost all games.
Proof: I did it in class, and I do a more general proof in the next section
for games with imperfect information. Intuitively, you solve the last
rounds of the game, then replace these subgames with the (unique)
outcome of the NE and repeat the procedure.
[Figure: player 1 chooses L or R; after L player 2 chooses A, yielding
(3, 2), or B, yielding (0, 0); after R player 2 chooses C, yielding (1, 1),
or D, also yielding (1, 1).]
5 Existence of SPE
The next theorem shows that the logic of backward induction can be extended
to games with imperfect information.
This theorem is the equivalent of the Nash existence theorem for extensive
form games. It establishes that SPE is not too strong in the sense that a
SPE exists for each extensive form game. We have seen that NE is too weak
in extensive form games because there are too many equilibria.
The proof of the theorem is a generalization of backward induction. In
backward induction we solve the game from the back by solving node after
node. Now we solve it backwards subgame for subgame.
Formally, define the set Γ of subgames of the game G. Γ is never empty
because G itself is a member. We can define a partial order on the set Γ such
that for two subgames G1 and G2 we have G1 ≥ G2 if G2 is a subgame of
G1 .
Look at the following example.
[Figure: player 1 chooses A, B, or C. A ends the game with payoffs (1, −1);
after B and after C the players play a matching pennies subgame in which
player 1 chooses H or T, player 2 (without observing player 1's choice)
chooses H or T, and payoffs are (1, −1) if the choices match and (−1, 1)
otherwise.]
This game has three subgames: the whole game G1 and two matching pennies
subgames G2 and G3 . We have G1 ≥ G2 and G1 ≥ G3 but G2 and G3 are
not comparable.
Step I Identify the terminal subgames. Terminal subgames are those
which do not dominate another subgame (G′ is terminal if there is no G″
such that G′ > G″).
Step II Solve the terminal subgames. These subgames have no further
subgames. They have a Nash equilibrium by the Nash existence result (they
are finite!).
In our example the matching pennies subgames have the unique NE
(1/2)H + (1/2)T for each player.
Step III Calculate the Nash payoffs of the terminal subgames and replace
these subgames with the Nash payoffs.
In our example the matching pennies payoffs are 0 for each player. We
get:
[Figure: the reduced game - player 1 chooses A, yielding (1, −1), B,
yielding (0, 0), or C, yielding (0, 0).]
Step IV Goto step I. Repeat this procedure until all subgames have been
exhausted. In this way we construct ’from the back’ a SPE. In many cases
the procedure does not produce a unique SPE if a subgame has multiple NE.
In our example we are lucky because matching pennies just has one NE. In
the reduced game player 1 plays A which is his unique BR. The unique SPE
is therefore s∗1 = (A, (1/2)H + (1/2)T, (1/2)H + (1/2)T) and
s∗2 = ((1/2)H + (1/2)T, (1/2)H + (1/2)T).
For concreteness assume that agents want to spend 100 Dollars over two
time periods. Their discount factor for next period's utility is 0.5. Their
utility function is the square root function u(x) = √x. You can check that
the agent would allocate 80 Dollars to today's consumption and the rest to
tomorrow's consumption (the FOC 1/(2√x1) = 0.5/(2√x2) gives x2 = x1/4,
hence x1 = 80 and x2 = 20).
The agent is necessarily time-consistent in a two-period model because
she cannot reallocate any resources in period 2. However this is no longer
true in a 3-period model.
Let's first look at exponential discounting where consumption in period t
is discounted by δ^(t−1):

U1 = u(x1) + δu(x2) + δ²u(x3)

Let's assume the same parameters as above. It's easy to check that the agent
would want to allocate 1600/21 ≈ 76 Dollars to the first period, 19 Dollars to
the second and 5 to the third.
Now the agent could potentially change her allocation in period 2. Would
she do so? The answer is no. Her decision problem in period 2 can be written
as maximizing U2 = u(x2) + δu(x3) given that x2 + x3 = 100 − 76. Note, that:

U1 = u(x1) + δU2   (4)

Therefore, if an agent would choose a different consumption plan in period 2
she would have done so in period 1 as well.
We say that there is no conflict between different selves in games
with exponential discounting. Agents are time-consistent. Time-
consistent preferences are assumed in most of micro and macro economics.
What would the period 2 agent do with her remaining 50 Dollars? Her
decision problem would look as follows:
U = u(x2) + βu(x3)   (6)

With β = 1/2 the FOC gives x3 = β²x2 = x2/4. So she would allocate 40
Dollars to period 2 and only 10 Dollars to the third period.
Therefore there is conflict between agent 1’s and agent 2’s preferences!
Agent 1 would like agent 2 to save more, but agent 2 can’t help herself and
splurges!
A hyperbolic agent could invest 50 Dollars in a 401k from which he can't
withdraw more than 25 Dollars in the second period. While a time-consistent
agent would never enter such a bargain (unexpected things might happen
- so why should he constrain his future choices), a time-inconsistent agent
might benefit.
Let's assume that there are three types of agents. Time consistent agents
(TC) have δ = 1 and β = 1. Naive agents have β = 1/2 but are unaware of
their self-control problem; sophisticated agents also have β = 1/2 and are
aware of it.
TC agents will write the report immediately and skip the mediocre movie.
Generally, TC agents will maximize v − c.
Naive agents will write the report in the last period. They believe that
they will write the report in the second period. In the second period, they
assume to write it in the third period (cost 4 versus 5 now). In the third
period they again procrastinate.
Sophisticated agents use backward induction. They know that period 3
agent would procrastinate. Period 2 agent would predict period 3’s procras-
tination and write the report. Period 1 agent knows that period 2 agent will
write the report and can therefore safely procrastinate.
This example captures the idea that sophistication can somehow help
to overcome procrastination because agents are aware of their future selves'
tendencies to procrastinate.
7.2 Salient Rewards
Example 2 Assume you can go to the cinema on Saturdays. The schedule
consists of a mediocre movie this week, a good movie next week, a great movie
in two weeks and (best of all) a Johnny Depp movie in three weeks. You have
no money but a coupon to spend on exactly one movie. The benefit of seeing
the movies are (v1 , v2 , v3 , v4 ) = (3, 5, 8, 13). Which movie do you see?
The TC agent would wait and see the Depp movie which gives the highest
benefit.
The naive agent would see the third movie. He would not see the mediocre
one because he expects to see the Depp movie later. He would
also not see the week 2 movie for the same reason. But in week 3 he caves
in to his impatience and spends the coupon.
The sophisticated agent would see the mediocre movie! She would expect
that period 3 self caves in to her desires. Period 2 self would then go to the
movies expecting period 3 self to cave in. But then period 1 self should go
immediately because 3 > 2.5.
The result is the opposite of the result we got for the procrastination
example. Sophistication hurts agents! The intuition is that naive agents
can pretend that they see the Depp movie and therefore are willing to wait.
Sophisticated agents know about their weakness and therefore don’t have the
same time horizon. This makes them cave in even earlier.
A sophisticated agent would not be able to withstand a jar of cookies
because she knows that she would cave in too early anyway, so she might as
well cave in right away.
his 1,000 Dollars at a rate of 6 percent. You make a quick calculation and
decide that you should do more research before signing on because research
only causes you a disutility of 30 Dollars and the compounded interest gain
over 30 years far exceeds this amount. What will you do?
You won't do anything and still have your 1000 Dollars after 30 years. Wait-
ing a year has a cost of 50 Dollars (lost interest) which is discounted by β = 1/2
and thus falls below the salient cost of doing research. So you wait, and wait,
and wait.
This is a nice example of why the perfect can be the enemy of the good:
more choice can lead to procrastination. For a naive decision maker choices
are determined by long-term benefits (just as for a TC decision maker). How-
ever, procrastination is caused by small period-to-period costs of waiting
which, once discounted, fall below the salient cost of doing research.
Lecture XIII: Analysis of Infinite Games
Markus M. Möbius
April 9, 2003
Readings for this class: OR (sections 6.5, 8.1 and 8.2) Fudenberg
and Tirole, chapters 4.2 and 4.3 are most useful; Dutta, chapter 15
[Figure: player 1 chooses L, ending the game with payoffs (10, 0), or R,
which leads to a simultaneous-move subgame in which player 1 chooses A or
B and player 2 chooses x or y; payoffs are (A, x) = (1, 1), (A, y) = (5, 0),
(B, x) = (0, −100), (B, y) = (4, 0).]
Here (L, A, x) is the unique SPE. However, player 2 has to put a lot of
trust into player 1’s rationality in order to play x. He must believe that
player 1 is smart enough to figure out that A is a dominant strategy in the
subgame following R. However, player 2 might have serious doubts about
player 1’s marbles after the guy has just foregone 5 utils by not playing L.1
Remark 1 The main reason for the breakdown of cooperation in the finitely
repeated Prisoner's Dilemma is not so much SPE by itself but the fact that
there is a final period in which agents would certainly defect. This raises the
question whether an infinitely repeated PD game would allow us to cooperate.
Essentially, we could cooperate as long as the other player does, and if there
is defection, we defect from then on. This still looks like a SPE - in any
subgame in which I have defected before, I might as well defect forever. If I
haven't defected yet, I can jeopardize cooperation by defecting, and therefore
should not do it as long as I care about the future sufficiently.
The proof proceeds by analyzing the last stage game where we would see
defection for sure. But then we would see defection in the second to last
stage game etc. In the same way we can show that the finitely repeated
Bertrand game results in pricing at marginal cost all the time.
These results should make us suspicious. Axelrod’s experiments (see fu-
ture lecture) showed that in the finitely repeated Prisoner’s Dilemma people
tend to cooperate until the last few periods when the ’endgame effect’ kicks
in. Similarly, there are indications that rival firms can learn to collude if they
interact repeatedly and set prices above marginal cost.
This criticism of SPE is reminiscent of our criticism of IDSDS. In both
cases we use an iterative procedure to find equilibrium. We might have doubts
whether real-world subjects are able (and inclined) to do this calculation.
A famous example for the perils of backward induction is Rosenthal’s
centipede game:
[Figure: Rosenthal's centipede game - players 1 and 2 alternate; at each
stage the mover chooses L (drop out) or R (continue). Dropping out at
successive stages yields payoffs (1, 0), (0, 2), (3, 1), (2, 4), (5, 3),
(4, 6); if player 2 continues at the last stage the game ends with payoffs
(7, 5).]
The game can be extended to even more periods. The unique SPE of the
game is to drop out immediately (play L) at each stage. However, in experi-
ments people typically continue until almost the last period before they drop
out.
ui(si, s−i) = ui0(si, s−i) + δui1(si, s−i) + δ²ui2(si, s−i) + ...   (1)

where uit(si, s−i) is the payoff received at t when the strategies are followed.
2.1.1 Interpretation of δ
1. Interest rate: δ = 1/(1 + r). Having two dollars today or two dollars tomor-
row makes a difference to you: your two dollars today are worth more,
because you can take them to the bank and get 2(1 + r) Dollars to-
morrow where r is the interest rate. By discounting future payoffs with
δ = 1/(1 + r) we correct for the fact that future payoffs are worth less to us
than present payoffs.
2. Probabilistic end of game: suppose the game is really finite, but that
the end of the game is not deterministic. Instead, given that stage t is
reached there is probability δ that the game continues and probability
1 − δ that the game ends after this stage. Note, that the expected
number of periods is then 1/(1 − δ) and finite. However, we can't apply
backward induction directly because we can never be sure that any
round is the last one. The probabilistic interpretation is particularly
attractive for interpreting bargaining games with many rounds.
Remark 2 There are finite versions of this game in which play ends after
period T. One has to make some assumption about what happens if there is no
agreement at time T - typically, one assumes that the pie simply disappears.
If T = 1 then we get the simple ultimatum game.
3 Continuity at Infinity
None of the tools we’ve discussed so far are easy to apply for infinite games.
First, backward induction isn’t feasible because there is no end to work back-
ward from. Second, using the definition of SPE alone isn’t very easy. There
are infinitely many subgames and uncountably many strategies that might
do better.
We will discuss a theorem which makes the analysis quite tractable in
most infinite horizon games. To do so, we must first discuss what continuity
at infinity means.
Formally, a game is continuous at infinity if, for each player i, the
supremum of |ui(σ) − ui(σ̂)| over all pairs of strategy profiles σ, σ̂ which
coincide at all information sets up to time T goes to zero as T → ∞.
In words: compare the payoffs of two strategies which are identical for all
information sets up to time T and might differ thereafter. As T becomes large
the maximal difference between any two such strategies becomes arbitrarily
small. Essentially, this means that distant future events have a very small
payoff effect.
For finite stage games we know that M = max_{i,a,a′} |ũi(a) − ũi(a′)| is finite.
This implies that

lim_{T→∞} |ui(σ) − ui(σ′)| ≤ lim_{T→∞} Σ_{t=T}^∞ δ^t M = lim_{T→∞} δ^T M/(1 − δ) = 0.
Note, that players in this game can win a prize of 1 by staying in the game
longer than the other player. However, staying in the game is costly for both
players. Each player wants the game to finish as quickly as possible, but also
wants the other player to drop out first.
This game is not continuous at ∞: one can construct strategy profiles σ^T
and σ′^T which agree up to period T such that

|ui(σ^T) − ui(σ′^T)| = 1.
Theorem 1 Let G be an infinite horizon multistage game with observed ac-
tions which is continuous at ∞. A strategy profile (σ1 , .., σI ) is a SPE if and
only if there is no player i and strategy σ̂i that agrees with σi except at a sin-
gle information set hti and which gives a higher payoff to player i conditional
on hti being reached.
We write ui(σi, σ−i|x) for the payoff conditional on x being reached. For
example, in the entry game below we have
u2(Accommodate, Out|node 1 is reached) = 1.
[Figure: the entry game - firm 2 chooses In or Out; Out yields (2, 0); after
In, firm 1 chooses F, yielding (−1, −1), or A, yielding (1, 1).]
Recall, that we can condition on nodes which are not on the equilibrium path
because the strategy of each player defines play at each node.
T=1: In this case the result is clear. Suppose σi and σ̂i differ only in
the information set in period t0. If t > t0 it is clear that ui(σ̂i, σ−i|xt) =
ui(σi, σ−i|xt) because the two strategies are identical at all the relevant in-
formation sets. If t ≤ t0 then:

ui(σ̂i, σ−i|xt) = Σ_{hit0} ui(σ̂i, σ−i|hit0) Prob{hit0|σ̂i, σ−i, xt}
             ≤ Σ_{hit0} ui(σi, σ−i|hit0) Prob{hit0|σi, σ−i, xt}
             = ui(σi, σ−i|xt)

The inequality follows from the one-stage deviation criterion; the probabili-
ties agree because σi and σ̂i prescribe the same play between t and t0.
T → T+1: Assuming that the result holds for T, let σ̂i be any strategy
differing from σi in T + 1 periods. Let t0 be the last period at which they
differ and define σ̃i by:

σ̃i(hit) = σ̂i(hit) if t < t0, and σ̃i(hit) = σi(hit) if t ≥ t0
In other words, σ̃i differs from σi only at T periods. Therefore we have for
any xt
ui (σ̃i , σ−i |xt ) ≤ ui (σi , σ−i |xt )
by the inductive hypothesis since we assumed that the claim holds for T .
However, we also know that σ̃ and σ̂ only differ in a single deviation at
round t0 . Therefore, we can use exactly the same argument as in the previous
step to show that
ui (σ̂i , σ−i |xt ) ≤ ui (σ̃i , σ−i |xt )
for any xt .4
Combining both inequalities we get the desired result:

ui(σ̂i, σ−i|xt) ≤ ui(σi, σ−i|xt)

This proves the result for finite games with observed actions.
4 It is important that we have defined σ̃i as differing from σ̂i only in the last-period
deviation. Therefore, after time t0 strategy σ̃i follows σi. This allows us to use the SPDP.
4.2 Proof for infinite horizon games
Note that the proof for finite-horizon games also establishes that, for a
profile σ which satisfies the SPDP, player i cannot improve on σi in any subgame
xt by considering a new strategy σ̂i with finitely many deviations from σi.
However, it is still possible that deviations at infinitely many periods might
be an improvement for player i.
Assume this would be the case for some σ̂i. Let's denote the gain from
using this strategy with ε:
Remark 3 Games which are continuous at ∞ are in some sense ’the next
best thing’ to finite games.
1. We get a roughly even split for δ close to 1 (little discounting). The
proposer can increase his share the more impatient the other player is.
3. The division becomes less equal for finite bargaining games. Essentially,
the last proposer at period T can take everything for himself. Therefore,
he will tend to get the greatest share of the pie in period 1 as well
- otherwise he would continue to reject and take everything in the
last period. In our two-period experiments we have indeed observed
greater payoffs to the last proposer (ratio of 2 to 1). However, half
of all bargains resulted in disagreement after the second period and so
zero payoffs for everyone. Apparently, people care about fairness as
well as payoffs which makes one wonder whether monetary payoffs are
the right way to describe the utility of players in this game.
the backward solution converges to the Rubinstein solution. This technique
also provides an alternative proof for uniqueness.
1 − x = δx

where 1 − x is player 2's payoff at period 1 and δx is player 2's discounted
payoff from period 2. Hence

x = 1/(1 + δ)
5.2.2 Uniqueness
This proof illustrates a number of techniques which enable us to prove prop-
erties about equilibrium without actually constructing it.
Let v̄ and v be the highest and lowest payoffs received by a proposer in
any SPE. We first observe:

1 − v̄ ≥ δv   (3)

If this equation would not be true then no proposer could propose v̄ because
the recipient could always get more in any subgame. We also find:

v ≥ 1 − δv̄   (4)

If not then no proposer would propose v - she would rather wait for the other
player to make her proposal because she would get a higher payoff this way.
We can use both inequalities to derive bounds on v and v̄:

v ≥ 1 − δv̄ ≥ 1 − δ(1 − δv)   (5)
(1 − δ²)v ≥ 1 − δ   (6)
v ≥ 1/(1 + δ)   (7)

Similarly, we find:

1 − v̄ ≥ δv ≥ δ(1 − δv̄)   (8)
v̄ ≤ 1/(1 + δ)   (9)

Hence v̄ = v = 1/(1 + δ). Clearly, no other strategies can generate this payoff in
every subgame.6
6 While being a nice result it does not necessarily hold anymore when we change the
game. For example, if both players make simultaneous proposals then any division is a
SPE. Also, it no longer holds when there are several players.
Lecture XIII: Repeated Games
Markus M. Möbius
April 23, 2003
1 Introduction
So far one might get a somewhat misleading impression about SPE. When
we first introduced dynamic games we noted that they often have a large
number of (unreasonable) Nash equilibria. In the models we’ve looked at so
far SPE has ’solved’ this problem and given us a unique NE. In fact, this is
not really the norm. We’ll see today that many dynamic games still have a
very large number of SPE.
2 Credible Threats
We introduced SPE to rule out non-credible threats. In many finite horizon
games though credible threats are common and cause a multiplicity of SPE.
Consider the following game:
[Payoff matrix omitted: a 3×3 game with column strategies L, C, R and row
strategies including T and M.]
The game has three NE: (T,L), (M,C) and ((1/2)T + (1/2)M, (1/2)L + (1/2)C)
Suppose that the players play the game twice and observe first period
actions before choosing the second period actions. Now one way to get a
SPE is to play any of the three profiles above followed by another of them
(or same one). We can also, however, use credible threats to get other actions
played in period 1, such as:
3 Repeated Prisoner’s Dilemma
Note, that the PD doesn’t have multiple NE so in a finite horizon we don’t
have the same easy threats to use. Therefore, the finitely repeated PD has a
unique SPE in which every player defects in each period.
C D
C 1,1 −1,2
D 2,−1 0,0
1. For δ ≥ 1/2 there is a SPE (grim trigger) where both players cooperate
until someone defects and defect forever thereafter. After a defection a
player gets

0 + δ·0 + δ²·0 + ... = 0

if he follows his strategy and

−1 + δ·0 + δ²·0 + ... = −1

if he deviates and cooperates once, so the punishment phase is a NE.
Along the equilibrium path a player gets 1 + δ·1 + δ²·1 + ... = 1/(1 − δ)
if he follows his strategy and

2 + δ·0 + δ²·0 + ... = 2

if he defects; cooperation is therefore optimal whenever 1/(1 − δ) ≥ 2,
i.e. δ ≥ 1/2.
2. For δ ≥ 1/2 there is a SPE where the players play D in the first period
and then C in all future periods.
3. For δ > 1/√2 there is a SPE where the players play D in every even
period and C in every odd period.
4. For δ ≥ 1/2 there is a SPE where the players play (C,D) in every even
period and (D,C) in every odd period.
• Assume you want to check that the cooperation with grim trigger pun-
ishment is SPE. There are two types of histories you have to check.
Along the equilibrium path there is just one history: everybody coop-
erated so far. Off the equilibrium path, there is again only one class:
one person has defected.
• Assume you want to check that cooperating in even periods and defect-
ing in odd periods plus grim trigger punishment in case of deviation
by any player from above pattern is SPE. There are three types of his-
tories: even and odd periods along the equilibrium path, and off the
equilibrium path histories.
• Assume you want to check that TFT (’Tit for Tat’) is SPE (which it
isn't - see next lecture). Then you have to check four histories: only
the play of both players in the last period matters for future play - so
there are four relevant histories such as player 1 and 2 both cooperated
in the last period, player 1 defected and player 2 cooperated etc.1
Corollary 1 A strategy profile which has players play the same NE in each
period is always a SPE.
4 Folk Theorem
The examples in 3.1 suggest that the repeated PD has a tremendous number
of equilibria when δ is large. Essentially, this means that game theory tells
us we can’t really tell what is going to happen. This turns out to be an
accurate description of most infinitely repeated games.
1 If a deviation triggers a switch to only NE1 this statement would no longer be true.
Let G be a simultaneous move game with action sets A1, A2, .., AI and
mixed strategy spaces Σ1, Σ2, .., ΣI and payoff functions ũi. The minmax
payoff of player i is

v_i = min_{σ−i} max_{σi} ũi(σi, σ−i).

We can think of this as the lowest payoff a rational player could ever get
in equilibrium if he anticipates his opponents' (possibly non-rational) play.
Intuitively, the minmax payoff v_i is the payoff player i can guarantee to
herself even if the other players try to punish her as badly as they can. The
minmax payoff is a measure of the punishment other players can inflict.
P + δP + δ²P + ... = ui(s∗)
• The minmax payoff for each player is 0 as you can easily check. The
other player can at most punish his rival by defecting, and each player
can secure herself 0 in this case.
[Figure: the set of feasible normalized payoffs, plotted with (1−δ)u1 on the
horizontal and (1−δ)u2 on the vertical axis - the convex hull of (0, 0),
(2, −1), (1, 1) and (−1, 2).]
Hence the theorem says that anything in this trapezoid is possible. Note,
that the equilibria I showed before generate payoffs inside this area.
    F     O
F  2,1   0,0
O  0,0   1,2
Here each player can guarantee herself at least the payoff 2/3 which is the pay-
off from playing the mixed strategy Nash equilibrium. You can check that
whenever player 2 mixes with different probabilities, player 1 can guarantee
herself more than this payoff by playing either F or O all the time.
2. If some player deviates punish him by having the other players for T
periods choose σ−i such that player i gets v_i.
3. After the end of the punishment phase reward all players (other than i)
for having carried out the punishment by switching to an action profile
where player i gets v_i^P < v_i and all other players get v_j^P + ε.
2 For example, in the BoS it is not possible to generate (3/2, 3/2) in the stage game even
with mixing. However, if players alternate and play (O,F) and then (F,O) the players can
get arbitrarily close for large δ.
Lecture XV: Axelrod Tournaments
Markus M. Möbius
April 23, 2003
1 Introduction
In 1983 Axelrod solicited computer programs from game theorists and psy-
chologists - each program was essentially a strategy of how to play a repeated
Prisoner’s Dilemma stage game 200 times:
C D
C 3,3 0,5
D 5,0 1,1
All 40 entries were matched against each other, and the score of each bilateral
match was calculated as the sum of all stage payoffs (i.e. δ = 1). All payoffs
were between 200 and 600 since a player can always guarantee at least a payoff
of 1 in each stage game. The average score of each strategy was calculated, and
the strategies ranked according to overall success.1
Perhaps surprisingly, the winner was the simple TIT FOR TAT rule - a
player starts to cooperate and then does whatever the last player did in the
previous period.
The reason that TFT did so well was the following:
1. TFT is ’nice’ because it never defects first. In fact, all of the best eight
strategies were nice. Nice strategies did primarily well because so many
of the entries were nice - hence they did particularly well against other
nice strategies (getting the maximum payoff of 600).
2. TFT is discriminating enough not to be exploited too badly. For ex-
ample, ’all C’ is easily exploited by ’all D’, but TFT can’t be as easily
duped.
3. TFT is forgiving - bygones are bygones, and one period cooperation can
lead to continued cooperation in the future. Many strategies did badly
because they were unforgiving. Forgiveness was the main criterion for
success amongst the set of ’nice’ strategies.
It’s ironic that the example on the solicitation letter would have done
even better than TFT. This was essentially a TIT for two TATs strategy - a
player will only defect if he sees two defections in a row. What’s nice about
TF2T is that it avoids ’echo effects’ where players are caught in an alternate
(C, D) and (D, C) cycle.
Most submissions were in some way or another variations of TFT. How-
ever, rather than making TFT more forgiving (as TF2T) participants devised
less nice rules which turned out to do worse on average.
How does cooperation arise in populations?
Ideally, evolutionary forces should select equilibria with more cooperation
over those with less cooperation, and provide some mechanism through which
the transition can be accomplished. This mechanism are typically ’mutations’
of agents who do better than non-cooperating agents.
In order to analyze evolution in repeated PD games we go through the
following steps:
Previously, agents were labelled by the strategy set (e.g. xi was the share
of agents using strategy si ). Now, the strategy set is the set of extensive form
strategies, i.e. s : H → {C, D}.2
Now we can define ESS, replicator dynamics etc. exactly as before. To
make life easier, we will use a slightly weaker form of ESS, namely collective
stability which is essentially NE:
U(s′, s) ≤ U(s, s) for all strategies s′.
Collective stability (like NE) captures the idea that no mutation will do
better than the original strategy does against itself.4
Proof: ’All C’ clearly doesn't do better than TFT since TFT is nice. We
have to check that ’all D’ doesn't do better, and also ’alternate D and
C’. Are there any other strategies we need to consider? No.
Look at a strategy s′ which contains both C and D along the equilib-
rium path. Assume there is a string of at least two D's somewhere in
that strategy, s′ = (...DDXXX..). Note, that TFT defects after the first
defection. Consider the modified strategy s″ = (...DXXXX). If s″ does
better than s′ against TFT then it does not pay to include an additional D
in the string of strategy s′. Hence we can eliminate all double D's from
s′ and get a better strategy against TFT. If s″ does not do better than
s′ against TFT then we can continue to insert defections to get suc-
cessively better strategies until we get back to ’all D’. This argument
implies that we only have to consider ’all D’ or mixed strategies with
no consecutive D's.
In the same way we can exclude consecutive C's. This shows that we
only have to consider ’all D’, and ’alternate D and C’.
u(TFT, TFT) = 3/(1 − δ)
u(All D, TFT) = 5 + δ/(1 − δ)
u(alternate D and C, TFT) = 5/(1 − δ²)

We just have to make sure that none of the two mutations does better than
TFT. This will be the case if δ > max(1/2, 2/3) = 2/3. QED
This look like horrible news - how can cooperation ever arise if everything
including ’All D’ is collectively stable?
6 Cluster-stability
For this section assume that δ = .9 such that TFT is collectively stable. The
standard ESS concept of a mutation assumes that mutations are ’small’ and
hence it’s unlikely that a mutant interacts with other mutants.
Axelrod’s idea is to consider mutations which are ’small’ compared to
the population, but whose members nonetheless interact. For example, the
senate is pretty large, but two freshmen senators from California might in-
teract sufficiently so that they don’t influence the payoffs of other senators
very much (just as individual mutations do), but they do affect each other
directly a lot.
Formally, we say that agents in a q-cluster mutation interact with each
other with probability q and interact with everyone else with probability
1 − q.
Recall, that a mutant agent playing TFT gets payoff U(TFT, All D) in
an ’All D’ equilibrium. However, mutants in a q-cluster get:

q U(TFT, TFT) + (1 − q) U(TFT, All D)

It can be easily checked that q-cluster mutants do better than the ’All D’
incumbents for q > 1/21 ≈ 5 percent. Hence, cluster mutants don't have to
interact a lot to do a lot better than the ’meanies’ who are the majority of
the population.
Proposition 3 TFT is q-cluster stable for q > 1/21 while ’All D’ is not q-
cluster stable for any q.
5 Note, that the intuition is very similar to stochastic stability with local interaction.
Lecture XV: Games with Incomplete
Information
Markus M. Möbius
April 23, 2003
1 Introduction
Informally, a game with incomplete information is a game where the game
being played is not common knowledge. This idea is tremendously important
in practice where its almost always a good idea to assume that something
about the game is unknown to some players. What could be unknown?
2 Examples
2.1 Example I: Crazy Incumbent
Think of a standard entry game where the incumbent is ’crazy’ with proba-
bility 1 − p and rational with probability p. The normal incumbent faces the
standard entry game from lecture 11:
[Figure: against the rational incumbent (probability p) - the entrant chooses
In or Out; Out yields (2, 0); after In the incumbent chooses F, yielding
(−1, −1), or A, yielding (1, 1).]
[Figure: against the crazy incumbent (probability 1 − p) - the same tree, but
fighting yields (1, −1) and accommodating yields (−1, 1).]
2.2 Example II: Auction
Two bidders are trying to purchase the same item at a sealed bid auction.
The players simultaneously choose b1 and b2 and the good is sold to the
highest bidder at his bid price (assume coin flip if b1 = b2 ). Suppose that the
players’ utilities are
ui(bi, b−i) = vi − bi        if bi > b−i
            = (vi − bi)/2    if bi = b−i
            = 0              if bi < b−i
The crucial incomplete information is that while each player knows his own
valuation, he does not know his rival’s. Assume, each has a prior that his
rival’s valuation is uniform on [0, 1] and that this is common knowledge.
[Payoff matrix for the public good (phone call) game with actions Call and
Dont omitted.]
Assume that the actions are chosen simultaneously and that players know
only their own costs. They have prior that c−i ∼ U[c, c̄].
Alternatively, we could have player 1's cost known to all (c1 = 1/2) but
c2 ∈ {c, c̄} known only to player 2.
Or, player 1 is a senior faculty member who knows from experience the
cost of such phone calls (c1 = 1/2, c2 = 2/3). Player 2 is a new assistant
professor who has priors c1, c2 ∼ U[0, 2].
3 Definitions
Definition 1 A game with incomplete information G = (Φ, S, P, u) consists
of
3. A joint probability distribution p (φ1 , .., φI ) over the types. For finite
type space assume p (φi ) > 0 for all φi ∈ Φi .
Note that payoffs can depend not only on your type but also on your rival’s
type as well.
Players know their own types but not the other players’ types.
To analyze games of incomplete information we rely on the following
observation (Harsanyi):
Observation: Games of incomplete information can be thought of as
games of complete but imperfect information where nature makes the first
move and not everyone is informed about nature’s move, i.e. nature chooses
Φ but only reveals φi to player i.
1 Note, that player i knows his own type.
[Figure: the Harsanyi representation - Nature (N) chooses Rational (prob. p)
or Crazy (prob. 1 − p); in each case the entrant chooses In or Out (Out
yields (2, 0)); after In the incumbent chooses F or A, with payoffs (−1, −1)
and (1, 1) for the rational type and (1, −1) and (−1, 1) for the crazy type.]
This just says that you maximize expected payoff, and given that you
know your type (and that all types have positive probability) this is equivalent
to saying that you maximize conditional on each possible type.
Remark 1 A Bayesian Nash equilibrium is simply a Nash equilibrium of
the game where Nature moves first, chooses φ ∈ Φ from a distribution with
probability p (φ) and reveals φi to player i.
2 Note, that a Bayesian strategy is simply an extensive form strategy where each type
is treated as a distinct information set.
4 Solved examples
4.1 Public good I
Suppose player 1 is known to have cost c1 < 1/2. Player 2 has cost c with
probability p, c̄ with probability 1 − p. Assume that 0 < c < 1 < c̄ and
p < 1/2.
[Payoff matrix for the Call/Dont game omitted.]
Proof: In a BNE each type of player must play a BR, so for the type c̄ of
player 2 calling is strictly dominated:

u2(s1, Call; c̄) = 1 − c̄ < 0 ≤ u2(s1, Dont; c̄)

for all s1. So f2∗(c̄) = Dont.
For player 1:

u1(Call, f2∗; c1) = 1 − c1
u1(Dont, f2∗; c1) = p u1(Dont, f2∗(c); c1) + (1 − p) u1(Dont, Dont; c1)
                 ≤ p · 1 + (1 − p) · 0 = p
Because 1 − c1 > p we know f1∗(c1) = Call.
For the type c of player 2 we have:

u2(f1∗, Call; c) = 1 − c
u2(f1∗, Dont; c) = 1

so the low-cost type does not call either: f2∗(c) = Dont.
Proof: Existence is easy - just check that each type is using a BR given that
his rival calls with probability 1/3 and doesn't call with probability 2/3.
I’ll show uniqueness to illustrate how to find the equilibrium.
Observation: If fi∗ (ci ) = Call then fi∗ (c0i ) = Call for all c0i < ci .
To see this write z−i for Prob{f∗−i(c−i) = Call}. If f∗i(ci) = Call then

E_{c−i} ui(Call, f∗−i(c−i); ci) ≥ E_{c−i} ui(Dont, f∗−i(c−i); ci)
1 − ci ≥ z−i

and the left-hand side only increases when ci falls, so Call remains a BR
for any lower cost.
In light of the observation a BNE must be of the form:3

f1∗(c1) = Call if c1 ≤ c∗1, Don't if c1 > c∗1
f2∗(c2) = Call if c2 ≤ c∗2, Don't if c2 > c∗2
4.3 Auctions
In the sealed bid auction where the players’ valuations are independently
uniformly distributed on [0, 1] the unique BNE is:
f1∗(v1) = v1/2
f2∗(v2) = v2/2
3 The cutoff values themselves are indeterminate - agents might or might not call. In
this sense the equilibrium won’t be unique. However, the cutoff values are probability zero
events and hence the strategy at these points won’t matter.
Proof: To verify that this is a BNE is easy. We just show that each type of
each player is using a BR:

E_{v2} u1(b1, f2∗; v1) = (v1 − b1) Prob(f2∗(v2) < b1) + (1/2)(v1 − b1) Prob(f2∗(v2) = b1)

We assume b1 ∈ [0, 1/2]. No larger bid makes sense given f2∗. Hence:
This gives us:

v1 = f1∗(v1) + f1∗′(v1) v1

This is a differential equation. Let's integrate on both sides:

(1/2)v1² + k = f1∗(v1) v1
f1∗(v1) = (1/2)v1 + k/v1

The only true solution is k = 0. We need k ≤ 0 to have increasing
bids, and with 0 as a minimum bid k < 0 is impossible.
Lecture XVI: Static Games of Incomplete
Information II
Markus M. Möbius
April 24, 2003
1 Purification
People are often uncomfortable with mixed equilibria - why should agents
who are indifferent between strategies mix precisely at the right rate to make
the other player indifferent between the strategies in her support?
Harsanyi (1973) developed the idea of ‘purifying’ mixed strategies. The
idea is best explained with an example. Consider the standard BOS game
below:
F O
F 2,1 0,0
O 0,0 1,2
The game has two pure Nash equilibria (F,F) and (O,O) and a mixed one
((2/3)F + (1/3)O, (1/3)F + (2/3)O).
Harsanyi pointed out that even if players know each other reasonably well
the above game is an idealization. In reality none of the two players can be
absolutely sure about the other’s payoffs. Instead, he suggested real-world
payoffs are more likely to look as follows:
F O
F 2+a,1 0,0
O 0,0 1,2+b
Each player's payoffs are slightly disturbed - assume that both a and b are
distributed uniformly over [0, ε] for ε small. Each player knows his own type
but not the type of the other player.
We now have a static game of incomplete information. As before an
informed guess might tell us that an equilibrium has some nice monotonicity
properties. In particular, we guess that player 1 chooses F whenever a > α
and player 2 chooses O when b > β.
Note, that we are looking for equilibria in which each player
plays a pure strategy.
In this case, the probability that player 1 will play action F is p = (ε − α)/ε
and the probability that player 2 will do so is q = β/ε. The marginal agent
has to be indifferent between choosing F and O in both cases. This gives us
the following equilibrium conditions:

(2 + α)q = 1 − q
p = (2 + β)(1 − p)
In other words, the BNE of the perturbed game in which each type of
agent plays a pure strategy ‘looks’ just like the mixed strategy of the original
game.
Harsanyi showed (1) that each perturbation gives rise to an equilibrium
which looks like a mixed equilibrium, and (2) that each mixed equilibrium
can be approximated by a sequence of BNE as constructed above.
Lecture XVII: Dynamic Games with
Incomplete Information
Markus M. Möbius
May 5, 2003
1 Introduction
In the last lecture I introduced the idea of incomplete information. We ana-
lyzed some important simultaneous move games such as sealed bid auctions
and public goods.
In practice, almost all of the interesting models with incomplete informa-
tion are dynamic games also. Before we talk about these games we’ll need a
new solution concept called Perfect Bayesian Equilibrium.
Intuitively, PBE is to extensive form games with incomplete information what
SPE is to extensive form games with complete information. The concept
we introduced last time, BNE, is simply the familiar Nash equilibrium under the
Harsanyi representation of incomplete information. In principle, we could
use the Harsanyi representation and SPE in dynamic games of incomplete
information. However, dynamic games with incomplete information typi-
cally don't have enough subgames to do SPE. Therefore, many ’non-credible’
threats are possible again and we get too many unreasonable SPE's. PBE
allows subgame reasoning at information sets which are not single nodes
whereas SPE only applies at single node information sets of players (because
only those can be part of a proper subgame).
The following example illustrates some problems with SPE.
1.1 Example I - SPE
Our first example has no incomplete information at all.
[Figure: player 1 chooses L, ending the game with payoffs (2, 2), or R;
after R player 2 chooses A, yielding (1, 1), or B, yielding (3, 3).]
[Figure: the same game except that player 1 now has two right moves, R and
RR, which player 2 cannot distinguish (they lie in one information set);
A and B yield the same payoffs after either move.]
The old SPE survives - (pR + (1 − p)RR, B) for all p is a SPE. But there
are suddenly strange SPE such as (L, qA + (1 − q)B) for q ≥ 1/2. Player 2's
strategy looks like a non-credible threat again - but our notion of SPE can't
rule it out!
Remember: SPE can fail to rule out actions which are not optimal given
any ’beliefs’ about uncertainty.
• Stage I: Player 1 (worker) observes his type and chooses education level
e ∈ {0, 1}. Education has cost ce/θ. Note, that higher ability workers
have lower cost and that getting no education is costless.
• Stage II: Player 2 (the competitive labor market) chooses the wage
rate w (e) of workers after observing the education level.
Suppose that u1(e, w; θ) = w − ce/θ and that u2(e, w; θ) = −(w − θ)².
Note, that education does not enter the firm’s utility function. Also note,
that the BR of the firm is to set wages equal to the expected ability of
the worker under this utility function. This is exactly what the competitive
labor market would do if θ is equal to the productivity of a worker (the
dollar amount of output he produces). If a firm pays above the expected
productivity it will run a loss, and if it pays below some other firm would
come in and offer more to the worker. So the market should offer exactly
the expected productivity. The particular (rather strange-looking) utility
function we have chosen implements the market outcome with a single firm
- it’s a simple shortcut.
Spence’s game is a signalling game. Each signalling game has the same
three-part structure: nature chooses types, the sender (worker) observes his
type and takes and action, the receiver (firm) sees that action but not the
worker’s type. Hence the firm tries to deduce the worker’s type using his
action. His action therefore serves as a signal. Spence’s game is extreme
because the signal (education) has no value to the firm except for its sig-
nalling function. This is not the case for all signalling models: think of a car
manufacturer who can be of low or high quality and wants to send a signal
to the consumer that he is a high-quality producer. He can offer a short or
long warranty for the car. The extended warranty will not only signal his
type but also benefit the consumer.
The (Harsanyi) extensive form representation of Spence’s game (and any
other signalling game) is given below.
[Figure: Nature (N) chooses type θ = 2 with probability p or θ = 3 with
probability 1 − p; the worker (W) observes θ and chooses education e; the
firm (F) observes e and chooses the wage w. Payoffs are w − ce/2 for the
θ = 2 worker and w − ce/3 for the θ = 3 worker, and −(w − 2)² and −(w − 3)²
respectively for the firm.]
1.3 Why does the SPE concept together with the Harsanyi
representation not work?
We could find the set of SPE in the Harsanyi representation of the game.
The problem is that the game has no proper subgame in the second round
when the firm makes its decision. Therefore, the firm can make unreasonable
threats such as the following: both workers buy education and the firm
pays the educated worker w = 3 − p (his expected productivity), and the
uneducated worker gets w = −235.11 (or something else). Clearly, every
worker would get education, and the firm plays a BR to a worker getting
education (check for yourself using the Harsanyi representation).
However, the threat of paying a negative wage is unreasonable. Once the
firm sees a worker who has no education it should realize that the worker has
at least ability level 2 and should therefore pay a wage of at least w = 2.
1. At every information set strategies are optimal given beliefs and oppo-
nents’ strategies (sequential rationality).
σi∗(h) maximizes E_{µi(x|h)} ui(σi, σ−i∗ | h, θi, θ−i)
2. Beliefs are derived from Bayes' rule wherever possible. Branches of the
game tree which are reached with zero probability cannot be derived using
Bayes rule: here you can choose arbitrary beliefs. However, the precise
specification will typically matter for deciding whether an equilibrium is
a PBE or not.
• The high ability worker gets education and the low ability worker does
not: e (θ = 2) = 0 and e (θ = 3) = 1. In this case my beliefs at the
information set e = 1 should be P rob (High|e = 1) = 1 and similarly,
P rob (High|e = 0) = 0.
• The high ability worker gets education and the low ability worker gets
education with probability q. This case is less trivial. What's the prob-
ability of seeing a worker get education? It's 1 − p + pq. What's the
probability of a worker being high ability and getting education? It's
1 − p. Hence the probability that the worker is high ability after we
have observed him getting education is (1 − p)/(1 − p + pq). This is the
non-trivial part of Bayes rule.
Formally, we can derive the beliefs at some information set hi of player
i as follows. There is a probability p (θj ) that the other player is of type θj .
These probabilities are determined by nature. Player j (i.e. the worker) has
taken some action aj such that the information set hi was reached. Each
type of player j takes action aj with some probability σj∗ (aj |θj ) according to
his equilibrium strategy. Applying Bayes rule we can then derive the belief
of player i that player j has type θj at information set hi :
1. In the job signalling game with separating beliefs Bayes rule gives us
exactly what we expect - we believe that a worker who gets education is
the high type.
2. In the pooling case Bayes rule gives us
Prob(High|e = 1) = ((1 − p) × 1)/(p × 1 + (1 − p) × 1) = 1 − p.
Note, that Bayes rule does NOT apply for finding the
beliefs after observing e = 0 because the denominator is zero.
3. In the semi-pooling case we get
Prob(High|e = 1) = ((1 − p) × 1)/(p × q + (1 − p) × 1). Sim-
ilarly, Prob(High|e = 0) = ((1 − p) × 0)/(p × (1 − q) + (1 − p) × 0) = 0.
• Stage 0: Nature chooses the type θ1 ∈ Θ1 of player 1 from probability
distribution p.
• Stage 1: Player 1 observes θ1 and chooses a1 ∈ A1 .
• Stage 2: Player 2 observes a1 and chooses a2 ∈ A2 .
The players utilities are:
u1 = u1 (a1 , a2 ; θ1 ) (3)
u2 = u2 (a1 , a2 ; θ1 ) (4)
3.1.4 Example 4: Pretrial Negotiation
• player 1 - defendant
• player 2 - plaintiff
• A1 - settlement offer
• A2 - accept/reject
2. Player 2's beliefs are compatible with Bayes' rule. If any type of player
1 plays a1 with positive probability then

µ2(θ1|a1) = p(θ1) Prob(s∗1(θ1) = a1) / Σ_{θ1′∈Θ1} p(θ1′) Prob(s∗1(θ1′) = a1)
for all θ1 ∈ Θ1
Remark 3 In the second stage of the education game the ’market’ must
attach probability µ(2|a1) to the player being type θ = 2. The wage in the
second period must then be between 2 and 3. This rules out the unreasonable
threat of the NE I gave you in the education game (with negative wages).1
µ2 (θ = 2|e = 0) = 1
µ2 (θ = 3|e = 0) = 0
µ2 (θ = 2|e = 1) = 0
µ2 (θ = 3|e = 1) = 1
1 If player 1's strategy is s∗1(θ = 2) = (1/2) × 0 + (1/2) × 1 and s∗1(θ = 3) = 1:

µ2(θ = 2|e = 0) = 1
µ2(θ = 3|e = 0) = 0
µ2(θ = 2|e = 1) = (p/2)/(p/2 + 1 − p) = p/(2 − p)
µ2(θ = 3|e = 1) = (2 − 2p)/(2 − p)
Also note, that Bayes rule does NOT apply after an action which should not
occur in equilibrium. Suppose s∗1(θ = 2) = s∗1(θ = 3) = 1; then it's OK to
assume

µ2(θ = 2|e = 0) = 57/64
µ2(θ = 3|e = 0) = 7/64
µ2(θ = 2|e = 1) = p
µ2(θ = 3|e = 1) = 1 − p
Remark 5 Examples SPE II and SPE III from the introduction now make
sense - if players update according to Bayes rule we get the ’reasonable’ beliefs
of players of being with equal probability in one of the two nodes.
• Now look at optimality. Player 2 sets the wage to the expected wage
so he is maximizing.
1. Note that for too small or too big costs there is no separating PBE.
• The beliefs are consistent with Bayes rule for e = 1. If e = 0 has been
observed Bayes rule does not apply because e = 0 should never occur
- hence any belief is fine. The belief that the worker is low type if he
does not get education makes sure that the worker gets punished for
not getting educated.
• The firm pays the expected wage - hence it plays a best response. The low
ability type won't deviate as long as 2.5 − c/2 ≥ 2 and the high ability type
won't deviate as long as 2.5 − c/3 ≥ 2. For c ≤ 1 both conditions are
true.
1. While this pooling equilibrium only works for small c there is always
another pooling equilibrium where no worker gets education and the
firm thinks that any worker who gets education is of the low type.
4.3 1 < c < 2
Assume that p = 1/2 for this section. In the parameter range 1 < c < 2 there is
a semiseparating PBE of the model. The high ability worker buys education
and the low ability worker buys education with positive probability q. The
wage is w = 2 if the firm observes no education and is set to w = 2 + 1/(1 + q)
if it observes education. The belief that the worker is the high type is zero if
he gets no education and 1/(1 + q) if he does.
Set 1 + q = 2/c. It's easy to check that the first condition is binding and
the second condition is strictly true. So we are done if we choose q = 2/c − 1.
Note, that as c → 2 we get back the separating equilibrium and as c → 1 we
get the pooling one.
Lecture XVIII: Games with Incomplete
Information II - More Examples
Markus M. Möbius
May 5, 2003
1 Introduction
This lecture gives more examples of games of incomplete information, in
particular signalling games.
• The Lobbyist gets a payoff of zero if the subsidy is not passed, a payoff
of 1 if the subsidy passes and times are bad, and a payoff of 1/2 if
the subsidy passes and times are good.
1. Is there a PBE in which the subsidy passes? Answer: NO
3 Legal Settlements
• There are two players, a plaintiff and a defendant in a civil suit. The
plaintiff knows whether or not he will win the case if he goes to trial,
but the defendant does not have this information.
• The defendant knows that the plaintiff knows who would win, and the
defendant has prior beliefs that there is probability 1/3 that the plaintiff
will win; these prior beliefs are common knowledge.
• If the plaintiff wins, his payoff is 3 and the defendant’s payoff is -4;
if the plaintiff loses, his payoff is -1 and the defendant’s is 0. (This
corresponds to the defendant paying cash damages of 3 if the plaintiff
wins, and the loser of the case paying court costs of 1.)
• The plaintiff has two possible actions: He can ask for either a low
settlement of m = 1 or a high settlement of m = 2. If the defendant
accepts a settlement offer of m, the plaintiff’s payoff is m and the
defendant’s is −m. If the defendant rejects the settlement offer, the
case goes to court.
1. List all the pure-strategy PBE strategy profiles. For each such profile,
specify the beliefs of the defendant as a function of m, and verify that
the combination of these beliefs and the profile is in fact a PBE.
4 Corporate Investment
This is a variant of the IPO game. An entrepreneur needs financing to realize
a new project and can offer an outside investor an equity stake in his company.
The stake gives the investor a share of the future (next period’s) cashflow
of the company: the profits from the existing business plus the profits from
the project. The profitability of the new project is known to both investor
and entrepreneur. However, only the entrepreneur knows the profitability of
the existing company. The investor therefore runs the risk of investing in
an unprofitable business. The new project requires investment I and gives
payoff R > (1 + r) I in the next period (where r is the interest rate).
• Nature chooses the type of the entrepreneur’s firm which can be highly
profitable (π = H) or less profitable (π = L). The business is not so
profitable with probability p.
• The entrepreneur observes π and then offers equity stake s such that
0 ≤ s ≤ 1.
5 Monetary Policy
5.1 The Barro-Gordon model (1983)
There are two periods. In the first period firms form expectations about
inflation πe . Their payoff is − (π − πe )2 . In the second period the Fed sets
inflation π. The Fed has objective function:
−cπ 2 − (y − y ∗ )2 (1)