Int. J. Business and Systems Research, Vol. 12, No. 1, 2018
Let the games begin and go on
Maria Cristina P. Matos
Instituto Politécnico de Viseu (IPV),
ESTV, CI&DETS,
VISEU, Portugal
Email:
[email protected]
Manuel Alberto M. Ferreira and
José António Filipe*
Instituto Universitário de Lisboa,
BRU-IUL, ISTAR-IUL,
Av. das Forças Armadas, 1649-026, Lisboa, Portugal
Email:
[email protected]
Email:
[email protected]
*Corresponding author
Abstract: Real life is a bigger game in which what a player does early on can
affect what others choose to do later on. In particular, we can strive to explain
how cooperative behaviour can be established as a result of rational behaviour.
When engaged in a repeated situation, players must consider not only their
short-term gains but also their long-term payoffs. The general idea of repeated
games is that players may be able to deter another player from exploiting his
short-term advantage by threatening a punishment that reduces his long-term
payoff. The aim of this paper is to present and discuss dynamic game theory.
There are three basic reasons, which are not mutually exclusive, to study what
happens in repeated games. First, the theory is elegant and very interesting,
and it has the advantage of making us more humble in our predictions. Second,
many of the most interesting economic interactions are repeated often and can
incorporate phenomena which we believe are important but which are not
captured when we restrict our attention to static games. Finally, economics,
and equilibrium-based theories more generally, do best when analysing
routinised interactions.
Keywords: dynamic games; code form game; repeated game.
Reference to this paper should be made as follows: Matos, M.C.P.,
Ferreira, M.A.M. and Filipe, J.A. (2018) ‘Let the games begin and go on’,
Int. J. Business and Systems Research, Vol. 12, No. 1, pp.43–52.
Biographical notes: Maria Cristina P. Matos graduated in Pure Mathematics at
the University of Coimbra. She holds a Master in Management and a PhD in
Quantitative Methods at ISCTE-Lisbon University Institute. She is an Adjunct
Professor at IPV – Viseu Polytechnic Institute. Her research interests are
mathematics; game theory; applications to management, economics, marketing
and finance. She has published more than 40 papers, in scientific journals
and conference proceedings, and one book chapter. She presented
16 communications in international scientific conferences.
Copyright © 2018 Inderscience Enterprises Ltd.
M.C.P. Matos et al.
Manuel Alberto M. Ferreira is an Electrotechnic Engineer and received his
Master in Applied Mathematics at the Lisbon Technical University, PhD in
Management-Quantitative Methods and Habilitation in Quantitative Methods at
the ISCTE-Lisbon University Institute. He is the former Chairman of the Board
of Directors and Vice-President of ISCTE-Lisbon University Institute. He is a
Full Professor at the ISCTE-Lisbon University Institute. He is the Director in
the Department of Mathematics at the ISTA-School of Technology and
Architecture. His research interests are mathematics, statistics, stochastic
processes (queues and applied probabilities), game theory, Bayesian statistics
(application to forensic identification), applications to economics,
management, business, marketing, finance and social problems in general. He
has published more than 370 papers in scientific journals and conference
proceedings, and 35 book chapters. He presented 163 communications in
international scientific conferences.
José António Filipe is an Assistant Professor at the Instituto Universitário de
Lisboa (ISCTE-IUL), has his Habilitation in Quantitative Methods, received his
PhD in Quantitative Methods, Master in Management and Graduation in
Economics. His current research interests include, among others, mathematics
and statistics, multi criteria decision making methods, chaos theory, game
theory, stochastic processes – queues and applied probabilities, Bayesian
statistics – application to forensic identification, applications to economics,
management, business, marketing, finance and social problems in general. He
has published more than 200 papers in scientific journals and conference
proceedings, and 18 book chapters. He presented about 100 communications in
international scientific conferences and other scientific events.
1 Introduction
In a contentious atmosphere players, who are not always human or aware of what they
are doing, compete in order to achieve their objectives. In a game each party’s interests
are confronted, which makes each player develop action strategies to maximise gains or
minimise losses, i.e., the player is looking for a strategy which will result in reaching a
certain objective in opposition with the other players who are also trying to optimise their
position. The final outcome depends on the set of strategies taken up by all participants.
That is, a game is any situation governed by rules with well-defined outcomes
characterised by strategic interdependence.
As we know, game theory is a discipline which allows vast and interesting results to
be achieved in classifying, formalising and solving distinct interaction situations. It is
often used to study competition in oligopolistic markets, because competition in this
type of environment is characterised by strategic interdependence.
Starting from precisely this assumption, we present and analyse the ‘Prisoner’s
dilemma’ game to demonstrate how game theory can be used to model situations which
occur in economic theory and to establish results for them.
‘Prisoner’s dilemma’ has attracted the attention of researchers because it depicts a
contradictory situation incisively: in looking for the best, each economic agent produces a
non-optimum outcome from the stakeholders’ point of view. In other words, this game
demonstrates that under certain circumstances looking out for their personal interests
leads economic agents to inefficient outcomes in the Pareto sense. Thus, a concerted
action may lead all agents to more favourable outcomes.
In a two-company cartel, both companies face an analogous situation. By cooperating
they can obtain half of a monopoly’s profits, but if both have to decide independently on
quantities or prices, then each company will consider it less favourable to cooperate
regardless of the competitor’s decision. So, in terms of self-interest, both companies
compete to obtain lower profits than they would obtain through cooperation. In the
context of a duopoly, this idea is validated by competitive variables in general, namely
production quantities, defining prices, costs with advertising campaigns, research and
development.
Nevertheless, in a repeated interaction, the expectation of future encounters makes
cooperating more attractive. The end of the interactions should not be known beforehand,
i.e., there should always be some probability of a next play. Company heads are often
heard to say: ‘things change’. Therein lies the challenge. If the business relationship
between the various economic agents occurs only once, knowledge of what the other agent
does is irrelevant. It does not matter what his strategy is; the best response is not to
cooperate. If the relationship is to last, it is an entirely different matter. Each one’s
decision depends on the durability of the interaction. It also depends on decisions taken
at earlier stages. It cannot, however, be conditioned on future actions, and no company
knows the line of reasoning of its competitor. That is, even in a game whose one-shot
result is not cooperative, cooperation may be a sub-game perfect Nash equilibrium of the
infinitely repeated game. This happens in the ‘Prisoner’s dilemma’. Thus, we can expect
companies to cooperate in a duopoly with an infinite horizon but theoretically not in the
finite case. Even though a non-cooperative game repeated any finite number of times is
still a non-cooperative game, reality shows that companies often try to implement ways
to cooperate.
2 ‘Prisoner’s dilemma’
Let us now analyse the ‘Prisoner’s dilemma’ game. This example of a non-cooperative
game highlights the rationality required when two individuals meet in a position where
the decision of one of them depends on the decision of the other one.
2.1 Symmetrical game with two players
Two individuals, player 1 and player 2, who are supposedly criminals are arrested, see
Straffin (1980). The problem for the police is that, assuming both are involved and in the
absence of proof, a confession is required. Imprisoned in individual and distant cells
without communication between them each one is given the rules in this case:
• If no one chooses to confess, both will be accused of a lesser offence which will
  mean a symbolic sentence of only one month in jail.
• If both confess, thereby assuming participation in the crime, both will be condemned
  to a four-year jail sentence.
• Finally, if one confesses and the other one does not, the one who confesses will be
  released immediately, and the other will receive the maximum penalty under the law:
  five years in jail.
The strategies in this case are: ‘confess’ or ‘not confess’. The payoffs are the sentences.
Figure 1 shows the game in code form, see Matos and Ferreira (2006) and Matos (2009).
Figure 1 ‘Prisoner’s dilemma’ code form
Consider player 1. If player 2 stays quiet (PC), confessing (C) is better, as player 1
will be freed immediately for confessing. If player 2 confesses, confessing is again
better, because prison is inevitable and a confession reduces the sentence from five
years to four. Whatever player 2 chooses, then, confessing is strictly better for
player 1. Consequently confessing is, for him, a dominant strategy. If the same analysis
is carried out for player 2, confessing is also a dominant strategy. Since each suspect
has a dominant strategy, confessing, the only result of the game for rational suspects is
(C, C).
Note that this analysis only requires that each player is rational and knows the
agreement that is being offered by authorities, i.e., his possible payoffs. A suspect does
not need to know what the other suspect has been promised nor whether or not the other
suspect is rational.
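The dominance argument above can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis: it assumes that each player's utility is simply the negative of his jail term in years (one month written as 1/12), and the function names are illustrative only.

```python
from itertools import product

ACTIONS = ("C", "PC")  # C = confess, PC = not confess (labels used in the text)

# Jail terms in years for (player 1, player 2), taken from the rules above.
SENTENCES = {
    ("PC", "PC"): (1 / 12, 1 / 12),  # one month each
    ("C", "C"): (4.0, 4.0),          # four years each
    ("C", "PC"): (0.0, 5.0),         # confessor released, other gets five years
    ("PC", "C"): (5.0, 0.0),
}


def payoff(profile, player):
    """Assumed utility of `player` (0 or 1): the negative of his jail term."""
    return -SENTENCES[profile][player]


def profile_for(player, own, other):
    """Build the (player 1, player 2) action pair from one player's viewpoint."""
    return (own, other) if player == 0 else (other, own)


def strictly_dominant_actions(player):
    """Actions that beat every alternative against every action of the opponent."""
    return [
        a
        for a in ACTIONS
        if all(
            payoff(profile_for(player, a, b), player)
            > payoff(profile_for(player, alt, b), player)
            for alt in ACTIONS
            if alt != a
            for b in ACTIONS
        )
    ]


def nash_equilibria():
    """Pure-strategy profiles from which no player gains by deviating alone."""
    equilibria = []
    for profile in product(ACTIONS, repeat=2):
        stable = all(
            payoff(profile, p) >= payoff(profile_for(p, alt, profile[1 - p]), p)
            for p in (0, 1)
            for alt in ACTIONS
        )
        if stable:
            equilibria.append(profile)
    return equilibria


if __name__ == "__main__":
    print(strictly_dominant_actions(0), strictly_dominant_actions(1))
    print(nash_equilibria())
```

Under these assumptions the check reports confessing as the only strictly dominant action for each player and (C, C) as the only pure-strategy equilibrium, in line with the text.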
2.2 ‘Prisoner’s dilemma’ in two stages
In a repeated game, players have the possibility of establishing a reputation for
cooperating, which will lead the other player to act accordingly. The entire strategy
will depend on whether the game is repeated a finite or an infinite number of times.
The strategic interaction process allows a history to be built between players, and this
history of the players’ behaviour is assessed so as to evaluate whether or not it is
convenient to go on with the game. Meanwhile, even though each player knows the
decisions that were taken in previous stages, he may be asked to decide without prior
knowledge of the other players’ choices in the current stage. Repeated games illustrate
how cooperation can be induced even when players would obtain significant gains by not
cooperating at each stage. The condition under which repetition of a game leads to a
valuable relationship between players concerns credibility. This fact can be observed
by analysing the ‘Prisoner’s dilemma’ game repeated in two stages as presented in code
form in Figure 2.
Figure 2 ‘Prisoner’s dilemma’ in code form repeated in two stages
Suppose that both players decide simultaneously on two occasions, seeing the result
of the first decision before deciding the second time. Suppose also that the payoff of
the complete game is the sum of the payoffs of each stage (meaning there is no discount).
In the previous section, we saw that when the ‘Prisoner’s dilemma’ game is played once
there is only one equilibrium, in which each player confesses. Using backward
induction, we can easily see that the only sub-game perfect equilibrium of the two-stage
‘Prisoner’s dilemma’ consists in both prisoners confessing in each sub-game. In each of
the four second-stage sub-games, one for each possible result of the first stage, there is
only one equilibrium, (C, C), so the equilibrium in the second stage is independent of
the result of the first stage. Since a sub-game perfect equilibrium must contain an
equilibrium in each sub-game, each player confesses in each sub-game. Second-stage play
therefore adds the same amount to every first-stage payoff, and we are led to the first
stage of the game, where there is once again only one equilibrium, (C, C).
Each player makes his choice regarding the strategy to be used considering the
consequences that this choice will have in playing out the game. In the previous
game we can observe that there is no incentive for cooperating in the second phase, as no
future relationship will be established. Cooperation in the first phase will not lead to
cooperation in the second phase. Lack of credibility keeps prisoners from achieving
a better result than the equilibrium of the stage game. One prisoner’s promise of not
confessing in the second period, which is what would be needed in order to obtain higher
payoffs, does not pass the credibility filter.
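The backward induction argument for the two-stage game can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code form: utilities are taken to be the negative of the jail terms from Section 2.1, total payoffs are the undiscounted sum over both stages, and the helper names `nash` and `solve_two_stage` are hypothetical.

```python
from itertools import product

ACTIONS = ("C", "PC")  # C = confess, PC = not confess

# Assumed stage utilities (player 1, player 2): minus the jail term in years.
STAGE = {
    ("PC", "PC"): (-1 / 12, -1 / 12),
    ("C", "C"): (-4.0, -4.0),
    ("C", "PC"): (0.0, -5.0),
    ("PC", "C"): (-5.0, 0.0),
}


def add(u, v):
    """Sum two payoff pairs (no discounting between stages)."""
    return (u[0] + v[0], u[1] + v[1])


def nash(game):
    """Pure-strategy Nash equilibria of a game given as {profile: (u1, u2)}."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        best1 = all(game[(a1, a2)][0] >= game[(d, a2)][0] for d in ACTIONS)
        best2 = all(game[(a1, a2)][1] >= game[(a1, d)][1] for d in ACTIONS)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


def solve_two_stage():
    """Solve the second-stage sub-games first, then the reduced first stage."""
    second_stage = {}
    reduced = {}
    for history in product(ACTIONS, repeat=2):
        # Sub-game after `history`: first-stage payoff plus second-stage payoff.
        subgame = {a: add(STAGE[history], STAGE[a]) for a in STAGE}
        (equilibrium,) = nash(subgame)  # unique in this example
        second_stage[history] = equilibrium
        reduced[history] = subgame[equilibrium]
    return nash(reduced), second_stage


if __name__ == "__main__":
    first, second = solve_two_stage()
    print("first stage:", first)
    print("second stage after each history:", second)
```

With these payoffs every second-stage sub-game has (C, C) as its only equilibrium, so the reduced first-stage game is the stage game shifted by a constant and again yields only (C, C).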
• Is a cooperative solution possible in a non-cooperative game?
• Are there ways to implement cooperation in lasting relationships?
Let us answer these questions in the following section.
2.3 The ‘prisoner’s dilemma’ in infinite stages
A finite game seems rather unrealistic for understanding strategic interactions. Hence,
the study of infinitely repeated games is more realistic. Just as in the case of games
that are repeated a finite number of times, the main issue in infinitely repeated games is
that credible threats and promises about the future may influence present behaviour.
Let us consider the ‘Prisoner’s dilemma’ repeated infinitely, see von Neumann and
Morgenstern (1967) and Jorgensen et al. (2007). Suppose that for each stage t, the results
of the previous t – 1 moves of the game are observed before stage t begins. Let us begin
by redefining payoffs for this type of game. Payoff assessment in games that are repeated
an infinite number of times presents some difficulties. As we know, a euro in one hundred
years will not be worth what it is worth today. To compare payoffs across the time
horizon, future payoffs are discounted with respect to the present. Thus, considering a
fixed discount factor δ, with 0 < δ < 1, the present value of the infinite succession of
payoffs π_1, π_2, π_3, … associated with the stages is
$$\pi_1 + \delta\pi_2 + \delta^2\pi_3 + \dots = \sum_{t \ge 1} \delta^{t-1}\pi_t$$
Considering our example, suppose each player’s discount factor is δ, and that each
player’s payoff in the repeated game is the present value of the player’s payoffs at the
stages of the game.
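As a small numerical illustration of discounting, the Python sketch below evaluates the present value of a payoff stream and compares a long truncated constant stream with the closed form π/(1 − δ) of the geometric series. The function names and the sample values (a stage payoff of 4 and δ = 0.9) are assumptions chosen only for the example.

```python
def present_value(payoffs, delta):
    """Present value pi_1 + delta*pi_2 + delta^2*pi_3 + ... of a finite stream."""
    return sum(delta**t * pi for t, pi in enumerate(payoffs))


def constant_stream_value(pi, delta):
    """Closed form for an infinite constant stream: pi / (1 - delta), 0 < delta < 1."""
    if not 0 < delta < 1:
        raise ValueError("the discount factor must satisfy 0 < delta < 1")
    return pi / (1 - delta)


if __name__ == "__main__":
    delta = 0.9
    # A long truncation of the infinite stream approximates the closed form.
    print(present_value([4] * 2000, delta))
    print(constant_stream_value(4, delta))
```

The closer δ is to 1, the more weight future stages carry, which is the sense in which a player is called patient.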
In the ‘Prisoner’s dilemma’ base game each player chooses his dominant
strategy, which is ‘to confess’. Even when the game is repeated a finite number of times,
because the stage game has a single Nash equilibrium, the only perfect equilibrium in
sub-games is reached when both players choose to confess in each period. Nevertheless,
when players are patient enough, cooperation can be maintained: keeping quiet in each
stage is the outcome of a perfect equilibrium of the infinitely repeated game. We begin
by seeing that cooperation is a Nash equilibrium of the repeated game and later that
cooperation is a perfect equilibrium in sub-games.
When an infinitely repeated game is played, each player i has a strategy for the
repeated game, $s_i$, that is a sequence of history-dependent stage-game strategies
$s_i^t$, that is, $s_i = (s_i^0, s_i^1, \dots)$.
The sequence of the n individual strategies of the repeated game is the strategic
profile of the repeated game s, that is, $s = (s_1, \dots, s_n)$.
The strategies of the repeated game which are sufficient to lead to cooperation are of
the following form: player i begins the infinitely repeated game by cooperating and keeps
cooperating in each following stage only if both players have cooperated in all of
the previous periods. In this ‘trigger strategy’, player i cooperates until the other
player stops cooperating, a situation which triggers non-cooperation in all future moves.
If both players adopt this strategy, the result of every stage of the infinitely repeated
game will be (PC, PC).
More precisely and formally, player i’s strategy in the repeated game is written
$s_i = (s_i^0, s_i^1, \dots)$, a sequence of history-dependent stage-game strategies
such that at period t and after history $h^t$,
$$s_i^t(h^t) = \begin{cases} PC, & t = 0 \lor h^t = ((PC, PC)^t) \\ C, & \text{otherwise} \end{cases}$$
Note that history $h^t$ is the sequence of the action profiles that were played in the
t periods 0, 1, 2, …, t – 1, and $((PC, PC)^t)$ is only a way of writing
((PC, PC), (PC, PC), …, (PC, PC)), where (PC, PC) is repeated t times. We can simplify
the previous definition so as to be useful to our later analysis: $h^0$ is the null
history, and we adopt the convention that $((PC, PC)^0)$, in which the profile (PC, PC)
is played 0 times, denotes it. Thus, t = 0 implies that $h^t = ((PC, PC)^t)$. Therefore
$$s_i^t(h^t) = \begin{cases} PC, & h^t = ((PC, PC)^t) \\ C, & \text{otherwise} \end{cases}$$
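The trigger strategy just defined is a function from histories to actions, which makes it easy to sketch in Python. The code below is only an illustration: the representation of a history as a tuple of action profiles, the comparison strategy `always_confess` and the helper `play` are assumptions introduced for the example.

```python
def trigger(history):
    """Play PC if every earlier profile was (PC, PC); otherwise play C.

    The empty history (t = 0) satisfies the condition, so play starts with PC.
    """
    return "PC" if all(profile == ("PC", "PC") for profile in history) else "C"


def always_confess(history):
    """Hypothetical comparison strategy: confess whatever the history is."""
    return "C"


def play(strategy1, strategy2, periods):
    """Simulate `periods` stages; both players observe the same past history."""
    history = []
    for _ in range(periods):
        past = tuple(history)
        history.append((strategy1(past), strategy2(past)))
    return history


if __name__ == "__main__":
    print(play(trigger, trigger, 5))
    print(play(trigger, always_confess, 3))
```

When both players use `trigger`, the simulated path is (PC, PC) in every period; against `always_confess` the trigger player is exploited once and confesses from the second period onwards.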
Let us prove, then, that for sufficiently patient players, the strategic profile
$s = (s_1, s_2)$ is a Nash equilibrium of the repeated game.
At t = 0 both keep quiet, so at t = 1 the history is $h^1$ = ((PC, PC)) and both keep
quiet again. Consequently, at t = 2, the history is $h^2$ = ((PC, PC), (PC, PC)), so both
stay quiet, and so on. So, the path associated with s is the infinite sequence of
cooperative action profiles ((PC, PC), (PC, PC), …). The payoff of the repeated game for
each player which corresponds to this path is easy to compute: the payoff at each stage
will be 4, the stage payoff each player obtains when both stay quiet.
Can player j gain by deviating from this strategy of the repeated game, given that
player i follows it faithfully? Let t be the period in which player j deviates from that
strategy for the first time. Given that player i will confess forever after period t, the
best response for player j is also to confess in every stage once the outcome of a stage
has been different from (PC, PC). If player j chooses to confess at this period he will
receive a payoff of 5. This deviation on the part of player j triggers player i’s
non-cooperation in every future move, so the payoff at each future stage would be 1.
Because
$$1 \times \delta + 1 \times \delta^2 + \dots = \frac{\delta}{1-\delta},$$
the present value of this payoff sequence is
$$5 + 1 \times \delta + 1 \times \delta^2 + \dots = 5 + \frac{\delta}{1-\delta}.$$
Alternatively, staying quiet would provide a payoff of 4 at this stage and will lead
to exactly the same choice between confessing and staying quiet in the following stage.
Let V be the present value of the infinite succession of payoffs player j receives when
making this choice in an optimum way. If staying quiet is optimum, then V = 4 + δV,
since staying quiet leads to the same decision in the following period, and therefore
$$V = \frac{4}{1-\delta}.$$
If confessing is optimum, then, as we have seen before,
$$V = 5 + \frac{\delta}{1-\delta}.$$
Thus, remaining quiet will be optimum if and only if
$$\frac{4}{1-\delta} \ge 5 + \frac{\delta}{1-\delta} \iff \delta \ge \frac{1}{4}.$$
Thus, as long as players are patient enough, namely as long as δ ≥ 1/4, cooperating is
a Nash equilibrium of the repeated game.
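The threshold can be verified with exact arithmetic. The Python sketch below uses the stage payoffs of the derivation (4 when both stay quiet, 5 for a unilateral confession, 1 when both confess); the constant names and the general formula (5 − 4)/(5 − 1), obtained by rearranging the same inequality, are labels and steps added here for illustration.

```python
from fractions import Fraction

# Stage payoffs used in the derivation; the names are illustrative labels.
REWARD = 4      # both players stay quiet
TEMPTATION = 5  # a player confesses while the other stays quiet
PUNISHMENT = 1  # both players confess


def cooperate_value(delta):
    """Present value of staying quiet forever: 4 / (1 - delta)."""
    return REWARD / (1 - delta)


def deviate_value(delta):
    """Present value of confessing now and being punished ever after."""
    return TEMPTATION + delta * PUNISHMENT / (1 - delta)


def critical_delta():
    """Smallest delta for which cooperation is worth at least as much as deviating."""
    return Fraction(TEMPTATION - REWARD, TEMPTATION - PUNISHMENT)


def cooperation_sustainable(delta):
    """True when the trigger strategy is a best response to itself."""
    return cooperate_value(delta) >= deviate_value(delta)


if __name__ == "__main__":
    print(critical_delta())
    for delta in (Fraction(1, 5), Fraction(1, 4), Fraction(1, 2)):
        print(delta, cooperate_value(delta), deviate_value(delta),
              cooperation_sustainable(delta))
```

At δ = 1/4 the two present values coincide, below it the deviation pays more, and above it cooperation pays more, which matches the condition δ ≥ 1/4.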
The following section presents a real-life example of a repeated ‘Prisoner’s dilemma’
in which the above concepts are demonstrated.
3 Price leadership in the breakfast cereal industry
Companies which have competed daily for a century approximate the case of infinite
repetitions. Each time the economic fundamentals change, the opportunities for profit and
the set of equilibria of the infinitely repeated game also change. The problem of
coordinating among an infinite number of equilibria becomes extremely complex.
Let us see how the breakfast cereal industry has adapted to the challenge brought about
by the infinite repetition of the game, see Gardner (1996).
Breakfast cereals were invented in the USA in 1890 by advocates of a healthy diet.
Two of the inventors, Kellogg and Post, gave their names to the companies they founded.
Two other companies, General Mills and Ralston-Purina, were also able to gain important
market shares in this industry. These four companies were rivals throughout the 20th
century. As large companies with strong profits and bright futures, these four breakfast
cereal companies had reasons to think they would be competing in this market for an
indefinite time. In this way they could act as if they were playing an infinitely
repeated game, one equilibrium of which corresponds to a monopolistic price policy.
In real life there is a restriction on the Folk Theorem for infinitely repeated games
that has not been mentioned. Since 1890, a federal law in the
USA has forbidden “monopolizing, the intention of monopolising and the conspiracy to
monopolize” a market. Government agencies took it upon themselves to ensure
compliance with this legislation in defence of competition. Normally, the government of
the USA does not try to show that companies have monopolised a market. Instead, the
government tries to show that companies have conspired to monopolise the market. The
behaviour underlying the Folk Theorem, which does not involve conspiracy in any way,
could hardly be considered a violation of laws in defence of competition.
Nevertheless, companies with wide temporal horizons move along the thin line which
separates legal from illegal practices.
Suppose the main companies in an industry reach a solution of an infinitely
repeated game which grants monopolistic profits. Each time one of the economic
parameters changes, such as costs or consumer preferences, the equilibria of the
infinitely repeated game also change. Companies need some mechanism to
pass from the equilibrium where they are to the new one. If this mechanism fails, they
may end up at the equilibrium of the basic game, with reduced profits for everyone. The
solution to this problem adopted in the breakfast cereal industry is called
price leadership. Under price leadership, one company, the price leader, takes charge of
the industry’s price policy. Each time a change in one of the economic parameters
requires a change in the price policy, the price leader carries it out. The members of the
industry depend on the price leader to adapt to the correct prices, so that the industry
reaps the highest possible profits.
Throughout most of the 20th century, the price leader in the breakfast cereal industry
was Kellogg’s, which was also number one in market share with over 40% of total sales.
Given inflation in the USA, particularly after World War II, most price changes
were upwards. From 1950 to 1972, 99% of all price changes were price increases. A large
proportion of all price increases, 80% between 1965 and 1970, were led by Kellogg’s.
Normally the other companies in the sector followed this leader quickly. Even when other
companies did not follow the lead, Kellogg’s would not go back on its price increases.
Instead, it spent more on advertising and waited for the rest of the industry to
adjust to the new price.
The price leader helped the breakfast cereal sector enjoy very high profit margins
and rates of return on assets well above the average. A government agency, always
on the lookout for signs of a conspiracy, filed a lawsuit against the breakfast cereal
companies. Admitting that it did not have proof of a conspiracy, the agency
argued that, through their behaviour, the breakfast cereal companies had in practice a
shared monopoly and should therefore submit to the agency’s dictates. The agency’s idea
of a shared monopoly, if correct, confirmed that the companies
in this sector, in following the price leader, had effectively achieved and maintained the
profit-maximising solution of their repeated game.
This situation dragged on for several years with all kinds of legal manoeuvres
regarding who should be the judge in the case. The case was also greatly politicised.
During the 1980 presidential campaign, Ronald Reagan wrote to Kellogg’s expressing his
concern about the situation. At the same time, labour organisations, fearing job losses,
strongly pressured President Carter not to press Kellogg’s. The agency realised that even
if it could prove the existence of a shared monopoly, it would not win the case in the
courts. The judge in the case against the breakfast cereal companies closed it in 1981.
Kellogg’s and its price followers went on enjoying impressive profits.
4 Conclusions
Most strategic games consist of repeated interactions between players. In this type of
game, the first step is to recognise the possibility of cooperating. Repeated games
have payoffs which generate tension between the players’ wish to compete and their wish
to cooperate. In repeated games, players interact repeatedly, so current behaviour may be
conditioned on past behaviour. This allows each player to be punished and rewarded,
and in the end it allows players to achieve higher payoffs, including escaping from the
prisoner’s dilemma. If the ‘Prisoner’s dilemma’ is only played once, the tension may
produce a competitive payoff. Players may then want to cooperate so as to achieve a
higher collective payoff, but the temptation to compete is irresistible. Nevertheless, as
we have seen, when this game is played repeatedly without a known end, cooperation is
reinforced, meaning players achieve higher payoffs. In a game where players are patient
enough, a trigger strategy may be used to enforce cooperation. However, it may take some
time until the players reach a tacit agreement on how to collaborate. This agreement can
be reached as soon as they realise that they are all strategic players whose goals are
not attainable immediately, since in order to achieve these goals they must sacrifice
something. One must also consider that cooperation allows higher payoffs to be reached
and that abandoning it brings punishment.
References
Gardner, S. (1996) Juegos para Empresarios y Economistas, edited by A. Bosch, Cambridge
University Press, Barcelona.
Jorgensen, S., Quincampoix, M. and Vincente, T.L. (2007) Advances in Dynamic Game Theory,
Birkhauser, Boston.
Matos, M.C.P. (2009) Jogos na Forma Codificada – Outra Forma de Representação dos Jogos,
PhD thesis, ISCTE-IUL, Lisboa.
Matos, M.C.P. and Ferreira, M.A.M. (2006) ‘Game representation-code form’, in Namatame, A.,
Kaizouji, T. and Aruka, Y. (Eds.): Economics and Heterogeneous Interacting Agents, Lecture
Notes in Economics and Mathematical Systems, Vol. 567, pp.321–334 [online]
http://dx.doi.org/10.1007/3-540-28727-2_22 (accessed 20 January 2016).
Straffin, P. (1980) ‘The prisoner’s dilemma’, UMAP Journal, Vol. 1, pp.101–103.
von Neumann, J. and Morgenstern, O. (1967) Theory of Games and Economic Behaviour, John
Wiley & Sons, Inc., New York.