Solving simple stochastic games with few coin toss positions∗
Peter Bro Miltersen
Department of Computer Science
Aarhus University
arXiv:1112.5255v3 [cs.GT] 20 Mar 2012
Rasmus Ibsen-Jensen
Department of Computer Science
Aarhus University
November 18, 2018
Abstract
Gimbert and Horn gave an algorithm for solving simple stochastic games with running time $O(r!\, n)$, where n is the number of positions of the simple stochastic game and r is the number of its coin toss positions. Chatterjee et al. pointed out that a variant of strategy iteration can be implemented to solve this problem in time $4^r r^{O(1)} n^{O(1)}$. In this paper, we show that an algorithm combining value iteration with retrograde analysis achieves a time bound of $O(r 2^r (r \log r + n))$, thus improving both time bounds. While the algorithm is simple, the analysis leading to this time bound is involved, using techniques of extremal combinatorics to identify worst case instances for the algorithm.
1 Introduction
Simple stochastic games are a class of two-player zero-sum games played on graphs that was introduced to the algorithms and complexity community by Condon [6]. A simple stochastic game is
given by a directed finite (multi-)graph G = (V, E), with the set of vertices V also called positions
and the set of arcs E also called actions. There is a partition of the positions into V1 (positions
belonging to player Max), V2 (positions belonging to player Min), VR (coin toss positions), and a
special terminal position GOAL. Positions of V1 , V2 , VR have exactly two outgoing arcs, while the
terminal position GOAL has none. We shall use r to denote |VR | (the number of coin toss positions)
and n to denote |V | − 1 (the number of non-terminal positions) throughout the paper. Between
moves, a pebble is resting at one of the positions k. If k belongs to a player, this player should
strategically pick an outgoing arc from k and move the pebble along this arc to another node. If
k is a position in VR , Nature picks an outgoing arc from k uniformly at random and moves the
pebble along this arc. The objective of player Max is to reach GOAL, and he should play so as to maximize his probability of doing so. The objective of player Min is to minimize player Max's probability of reaching GOAL.

∗ The authors acknowledge support from the Danish National Research Foundation and The National Science Foundation of China (under the grant 61061130540) for the Sino-Danish Center for the Theory of Interactive Computation, within which this work was performed. The authors also acknowledge support from the Center for Research in Foundations of Electronic Markets (CFEM), supported by the Danish Strategic Research Council.

A strategy for a simple stochastic game is a (possibly randomized) procedure for selecting which arc or action to take, given the history of the play so far. A positional strategy is the very special case of this where the choice is deterministic and only depends on the current position, i.e., a positional strategy is simply a map from positions to actions. If player Max plays using strategy x
and player Min plays using strategy y, and the play starts in position k, a random play p(x, y, k) of the game is induced. We let $u_k(x, y)$ denote the probability that player Max will reach GOAL in this random play. A strategy $x^*$ for player Max is said to be optimal if for all positions k it holds that
$$\inf_{y \in S_2} u_k(x^*, y) \ge \sup_{x \in S_1} \inf_{y \in S_2} u_k(x, y), \qquad (1)$$
where $S_1$ ($S_2$) is the set of strategies for player Max (Min). Similarly, a strategy $y^*$ for player Min is said to be optimal if
$$\sup_{x \in S_1} u_k(x, y^*) \le \inf_{y \in S_2} \sup_{x \in S_1} u_k(x, y). \qquad (2)$$
A general theorem of Liggett and Lippman ([12], correcting an error in a proof by Gillette [9]), restricted to simple stochastic games, implies that:
• Optimal positional strategies $x^*, y^*$ for both players exist.
• For such optimal $x^*, y^*$ and for all positions k,
$$\min_{y \in S_2} u_k(x^*, y) = \max_{x \in S_1} u_k(x, y^*).$$
This number is called the value of position k. We shall denote it $val(G)_k$ and the vector of values $val(G)$.
In this paper, we consider quantitatively solving simple stochastic games, by which we mean computing the values of all positions of the game, given an explicit representation of G. Once a simple
stochastic game has been quantitatively solved, optimal strategies for both players can be found
in linear time [2]. However, it was pointed out by Anne Condon twenty years ago that no worst
case polynomial time algorithm for quantitatively solving simple stochastic games is known. By
now, finding such an algorithm is a celebrated open problem. Gimbert and Horn [10] pointed out
that the problem of solving simple stochastic games parametrized by r = |VR | is fixed parameter
tractable. That is, simple stochastic games with “few” coin toss positions can be solved efficiently.
The algorithm of Gimbert and Horn runs in time $r!\, n^{O(1)}$. The next natural step in this direction is to try to find an algorithm with a better dependence on the parameter r. Dai and Ge [8] gave a randomized algorithm with expected running time $\sqrt{r!}\, n^{O(1)}$. Chatterjee et al. [4] pointed out that a variant of the standard algorithm of strategy iteration devised earlier by the same authors [5] can be applied to find a solution in time $4^r r^{O(1)} n^{O(1)}$ (they only state a time bound of $2^{O(r)} n^{O(1)}$, but a slightly more careful analysis yields the stated bound). The dependence on n in this bound is at least quadratic. The main result of this paper is an algorithm running in time $O(r 2^r (r \log r + n))$, thus improving all of the above bounds. More precisely, we show:
Theorem 1 Assuming unit cost arithmetic on numbers of bit length up to Θ(r), simple stochastic games with n positions out of which r are coin toss positions can be quantitatively solved in time $O(r 2^r (r \log r + n))$.
The algorithm is based on combining a variant of value iteration [15, 7] with retrograde analysis
[3, 1]. We should emphasize that the time bound of Theorem 1 is valid only for simple stochastic
games as originally defined by Condon.

Function SolveSSG(G)
    v ← (1, 0, ..., 0);
    for i ∈ {1, 2, ..., 2(ln 2^{5 max(r,6)+1}) · 2^{max(r,6)}} do
        v ← SolveDGG(G, v);
        v′ ← v;
        v_k ← (v′_j + v′_ℓ)/2, for all k ∈ V_R, j and ℓ being the two successors of k;
        Round each value v_k down to 7r binary digits;
    v ← SolveDGG(G, v);
    v ← KwekMehlhorn(v, 4^r);
    return v
Figure 1: Algorithm for solving simple stochastic games

The algorithm of Gimbert and Horn (and also the algorithm of Dai and Ge, though this is not stated in their paper) actually applies to a generalized version
of simple stochastic games where coin toss positions are replaced with chance positions that are
allowed arbitrary out-degree and where a not-necessarily-uniform distribution is associated to the
outgoing arcs. The complexity of their algorithm for this more general case is O(r!(|E| + p)), where
p is the maximum bit-length of a transition probability (they only claim O(r!(n|E| + p)), but by
using retrograde analysis in their Proposition 1, the time is reduced by a factor of n). The algorithm of Dai and Ge has analogous expected complexity, with the r! factor replaced by $\sqrt{r!}$. While our algorithm and the strategy improvement algorithm of Chatterjee et al. can be generalized to also work for these generalized simple stochastic games, the dependence on the parameter p would be much worse, in fact exponential in p. It is an interesting open problem to get an algorithm with a complexity polynomial in $2^r$ as well as p, thereby combining the desirable features of the algorithms
based on strategy iteration and value iteration with the features of the algorithm of Gimbert and
Horn.
1.1 Organization of the paper
In Section 2 we present the algorithm and show that the key to its analysis is to give upper bounds on the difference between the value of a given simple stochastic game and the value of a time bounded version of the same game. In Section 3, we then prove such upper bounds. In fact, we offer two such upper bounds: one bound with a relatively direct proof, leading to a variant of our algorithm with time complexity $O(r^2 2^r (r \log r + n))$, and an optimal bound on the difference in value, shown using techniques from extremal combinatorics, leading to an algorithm with time complexity $O(r 2^r (r \log r + n))$. In the Conclusion section, we briefly sketch how our technique also yields an improved upper bound on the time complexity of the strategy iteration algorithm of Chatterjee et al.
2 The algorithm
2.1 Description of the algorithm
Our algorithm for solving simple stochastic games with few coin toss positions is the algorithm of
Figure 1.
Procedure ModifiedValueIteration(G)
    v ← (1, 0, ..., 0);
    while true do
        v ← SolveDGG(G, v);
        v′ ← v;
        v_k ← (v′_j + v′_ℓ)/2, for all k ∈ V_R, j and ℓ being the two successors of k;
Figure 2: Modified value iteration
In this algorithm, the vectors v and v ′ are real-valued vectors indexed by the positions of G.
We assume the GOAL position has the index 0, so v = (1, 0, ..., 0) is the vector that assigns 1 to the
GOAL position and 0 to all other positions. SolveDGG is the retrograde analysis based algorithm
from Proposition 1 in Andersson et al. [1] for solving deterministic graphical games. Deterministic
graphical games are defined in a similar way as simple stochastic games, but they do not have coin
toss positions, and arbitrary real payoffs are allowed at terminals. The notation SolveDGG(G, v ′ )
means solving the deterministic graphical game obtained by replacing each coin toss position k of
G with a terminal with payoff vk′ , and returning the value vector of this deterministic graphical
game. Finally, KwekMehlhorn is the algorithm of Kwek and Mehlhorn [11]. KwekMehlhorn(v, q)
returns a vector where each entry vi in the vector v is replaced with the smallest fraction a/b with
a/b ≥ vi and b ≤ q.
The complexity analysis of the algorithm is straightforward, given the analyses of the procedures SolveDGG and KwekMehlhorn from [1, 11]. There are $O(r 2^r)$ iterations, each requiring time $O(r \log r + n)$ for solving the deterministic graphical game. Finally, the Kwek-Mehlhorn algorithm requires time O(r) for each replacement, and only r replacements need to be made: by standard properties of deterministic graphical games [1], the vector v has at most r distinct entries other than 0 and 1, corresponding to the r coin toss positions.
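For illustration only, the pieces of Figure 1 can be assembled into a small, exact-arithmetic Python sketch. It assumes a hypothetical representation in which a game is a dictionary mapping each non-terminal position to a pair (kind, (successor, successor)), with kind one of "max", "min" or "coin", and GOAL = 0. Two substitutions are made and labeled in the code: SolveDGG is replaced by n + 1 synchronous rounds of max/min backups starting from 0 (this gives the same values as retrograde analysis here, since all payoffs are nonnegative, but costs O(n·|E|) per call), and KwekMehlhorn is replaced by a brute-force search over denominators. The sketch therefore does not attain the bound of Theorem 1. It also rounds to 7·max(r, 6) bits rather than 7r bits, an assumption made so that the accumulated rounding error stays within the margin used in Section 2.2 when r < 6.

```python
import math
from fractions import Fraction

GOAL = 0  # as in the text, the GOAL position has index 0


def solve_dgg(game, coin_payoff):
    """Values of the deterministic graphical game obtained from `game` by
    turning every coin toss position k into a terminal with payoff
    coin_payoff[k]; plays that never reach a terminal are worth 0 to Max.

    Stand-in for retrograde analysis: n + 1 rounds of max/min backups from 0.
    Exact here because all payoffs are nonnegative, but O(n * |E|) per call.
    """
    val = {k: Fraction(0) for k in game}
    val[GOAL] = Fraction(1)
    val.update(coin_payoff)
    for _ in range(len(game) + 1):
        new = dict(val)
        for k, (kind, succ) in game.items():
            if kind == "max":
                new[k] = max(val[s] for s in succ)
            elif kind == "min":
                new[k] = min(val[s] for s in succ)
        val = new
    return val


def smallest_fraction_at_least(x, q):
    """Smallest a/b >= x with 1 <= b <= q, by brute force over b
    (a slow substitute for the Kwek-Mehlhorn algorithm)."""
    return min(Fraction(math.ceil(x * b), b) for b in range(1, q + 1))


def solve_ssg(game):
    """Sketch of SolveSSG (Figure 1) with the substitutions described above."""
    coins = [k for k, (kind, _) in game.items() if kind == "coin"]
    r = len(coins)
    r6 = max(r, 6)
    # Loop bound of Figure 1: 2 ln(2^(5 r6 + 1)) * 2^r6 iterations.
    iterations = math.ceil(2 * (5 * r6 + 1) * math.log(2) * 2 ** r6)
    scale = 2 ** (7 * r6)  # assumption: round down to 7 * max(r, 6) bits
    coin_payoff = {k: Fraction(0) for k in coins}  # v = (1, 0, ..., 0)
    for _ in range(iterations):
        v = solve_dgg(game, coin_payoff)
        for k in coins:
            a, b = game[k][1]
            average = (v[a] + v[b]) / 2
            coin_payoff[k] = Fraction(math.floor(average * scale), scale)
    v = solve_dgg(game, coin_payoff)
    return {k: smallest_fraction_at_least(x, 4 ** r) for k, x in v.items()}


# Hypothetical example game: position -> (kind, (successor, successor)).
EXAMPLE_GAME = {
    1: ("coin", (GOAL, 2)),
    2: ("min", (2, 2)),  # Min can loop forever, so this position has value 0
    3: ("coin", (1, GOAL)),
    4: ("max", (3, 1)),
    5: ("min", (3, 1)),
    6: ("coin", (4, 6)),
}
```

The values of EXAMPLE_GAME can be verified by hand (for instance, position 6 satisfies $v_6 = (v_4 + v_6)/2$ with $v_4 = 3/4$, so $v_6 = 3/4$), and solve_ssg should return 1/2, 0, 3/4, 3/4, 1/2 and 3/4 for positions 1 to 6. Because of the max(r, 6) in the loop bound, the main loop runs about 2,750 times even on this tiny game.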
2.2 Proof of correctness of the algorithm
We analyse our main algorithm by first analysing properties of a simpler non-terminating algorithm,
depicted in Figure 2. We shall refer to this algorithm as modified value iteration.
Let $v^t$ be the content of the vector v immediately after executing SolveDGG in the (t + 1)'st iteration of the loop of ModifiedValueIteration on input G. To understand this variant of value iteration, we may observe that the vectors $v^t$ can be given a "semantics" in terms of the value of a time bounded game.
Definition 2 Consider the "timed modification" $G^t$ of the game G defined as follows. The game is played as G, except that play stops and player Max loses when the play has encountered t + 1 (not necessarily distinct) coin toss positions. We let $val(G^t)_k$ be the value of $G^t$ when play starts in position k.
Lemma 3 $\forall k, t: v^t_k = val(G^t)_k$.
Proof Straightforward induction on t ("backwards induction").
From the semantics offered by Lemma 3 we immediately have $\forall k, t: val(G^t)_k \le val(G^{t+1})_k$. Furthermore, it is true that $\lim_{t \to \infty} val(G^t) = val(G)$, where $val(G)$ is the value vector of G. This latter statement is very intuitive, given Lemma 3, but might not be completely obvious. It may be established rigorously as follows:
Definition 4 For a given game G, let the game $\bar G^t$ be the following. The game is played as G, except that play stops and player Max loses when the play has encountered t + 1 (not necessarily distinct) positions. We let $val(\bar G^t)_k$ be the value of $\bar G^t$ when play starts in position k.
(We note that $val(\bar G^t)$ is the valuation computed after t iterations of unmodified value iteration [7].)
A very general theorem of Mertens and Neyman [13] linking the value of an infinite game to the values of its time limited versions implies that $\lim_{t \to \infty} val(\bar G^t) = val(G)$. Also, we immediately see that for any k, $val(\bar G^t)_k \le val(G^t)_k \le val(G)_k$, so we also have $\lim_{t \to \infty} val(G^t)_k = val(G)_k$.
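The sandwich $val(\bar G^t)_k \le val(G^t)_k \le val(G)_k$ can be observed numerically. The sketch below is a hypothetical Python illustration, not code from the paper: it represents a game as a dictionary from each non-terminal position to (kind, (a, b)) with GOAL = 0, computes unmodified value iteration directly, and computes modified value iteration as in Figure 2, with SolveDGG replaced by n + 1 rounds of max/min backups from 0 (slower than retrograde analysis, but giving the same values since all payoffs are nonnegative). Exact rational arithmetic avoids any rounding questions.

```python
from fractions import Fraction

GOAL = 0


def backup(game, val):
    """One synchronous round: max/min over successors for player positions,
    the average of the two successors for coin toss positions."""
    new = dict(val)
    for k, (kind, (a, b)) in game.items():
        if kind == "max":
            new[k] = max(val[a], val[b])
        elif kind == "min":
            new[k] = min(val[a], val[b])
        else:  # coin toss position
            new[k] = (val[a] + val[b]) / 2
    return new


def start_vector(game):
    """The vector (1, 0, ..., 0): 1 at GOAL, 0 elsewhere."""
    val = {k: Fraction(0) for k in game}
    val[GOAL] = Fraction(1)
    return val


def value_iteration(game, steps):
    """[val(Gbar^0), ..., val(Gbar^steps)]: unmodified value iteration."""
    history = [start_vector(game)]
    for _ in range(steps):
        history.append(backup(game, history[-1]))
    return history


def solve_dgg(game, coin_payoff):
    """Coin toss positions become terminals with the given payoffs; n + 1
    backup rounds from 0 then give the deterministic game's values."""
    val = start_vector(game)
    val.update(coin_payoff)
    for _ in range(len(game) + 1):
        val = backup(game, val)
        val.update(coin_payoff)  # terminals keep their payoffs
    return val


def modified_value_iteration(game, steps):
    """[v^0, ..., v^steps] as in Figure 2; by Lemma 3, v^t = val(G^t)."""
    coins = [k for k, (kind, _) in game.items() if kind == "coin"]
    payoff = {k: Fraction(0) for k in coins}
    history = []
    for _ in range(steps + 1):
        v = solve_dgg(game, payoff)
        history.append(v)
        payoff = {k: (v[game[k][1][0]] + v[game[k][1][1]]) / 2 for k in coins}
    return history


# Hypothetical example whose values are easy to compute by hand.
EXAMPLE_GAME = {
    1: ("coin", (GOAL, 2)),
    2: ("min", (2, 2)),
    3: ("coin", (1, GOAL)),
    4: ("max", (3, 1)),
    5: ("min", (3, 1)),
    6: ("coin", (4, 6)),
}
EXAMPLE_VALUES = {0: Fraction(1), 1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(3, 4),
                  4: Fraction(3, 4), 5: Fraction(1, 2), 6: Fraction(3, 4)}
```

Unmodified value iteration spends one round per move, while modified value iteration only counts coin tosses, which is why its iterates are never below those of unmodified value iteration.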
To relate SolveSSG of Figure 1 to modified value iteration of Figure 2, it turns out that we want to upper bound the smallest t for which
$$\forall i: val(G)_i - val(G^t)_i \le 2^{-5r}.$$
Let T(G) be that t. We will bound T(G) using two different approaches. The first, in Subsection 3.1, is rather direct and is included to show what may be obtained using completely elementary means. It shows that $T(G) \le 5(\ln 2) \cdot r^2 \cdot 2^r$, for any game G with r coin toss positions (Lemma 7).
The second, in Subsection 3.2, identifies an extremal game (with respect to convergence rate) with a given number of positions and coin toss positions. More precisely:
Definition 5 Let $S_{n,r}$ be the set of simple stochastic games with n positions out of which r are coin toss positions. Let $G \in S_{n,r}$ be given. We say that G is t-extremal if
$$\max_i \big(val(G)_i - val(G^t)_i\big) = \max_{H \in S_{n,r}} \max_i \big(val(H)_i - val(H^t)_i\big).$$
We say that G is extremal if it is t-extremal for all t.
Since $S_{n,r}$ is finite (up to renaming of positions), t-extremal games exist for any choice of n, r and t. (That extremal games exist for any choice of n and r is shown later in the present paper.) To find an extremal game, we use techniques from extremal combinatorics. By inspection of this game, we then get a better upper bound on the convergence rate than that offered by the first approach. We show using this approach that $T(G) \le 2(\ln 2^{5r+1}) \cdot 2^r$, for any game $G \in S_{n,r}$ (Corollary 14).
Assuming that an upper bound on T(G) is available, we are now ready to finish the proof of correctness of the main algorithm. We will only do so explicitly for the bound on T(G) obtained by the second approach from Subsection 3.2 (the weaker bound implies correctness of a version of the algorithm performing more iterations of its main loop). From Corollary 14, we have that for any game $G \in S_{n,r}$, $val(G^t)_k$, and hence modified value iteration, approximates $val(G)_k$ within an additive error of $2^{-5r}$ for $t \ge 2(\ln 2^{5r+1}) \cdot 2^r$ and k being any position. SolveSSG differs from ModifiedValueIteration by rounding down the values in the vector v in each iteration. Let $\tilde v^t$ be the content of the vector v immediately after executing SolveDGG in the t'th iteration of the loop of SolveSSG. We want to compare $val(G^t)_k$ with $\tilde v^t_k$ for any k. As each number is rounded down by less than $2^{-7r}$ in each iteration of the loop, and recalling Lemma 3, we see by induction that
$$val(G^t)_k - t \cdot 2^{-7r} \le \tilde v^t_k \le val(G^t)_k.$$
In particular, when $t = 2(\ln 2^{5r+1}) \cdot 2^r$, we have that $\tilde v^t_k$ approximates $val(G)_k$ within $2^{-5r} + 2(\ln 2^{5r+1}) \cdot 2^r \cdot 2^{-7r} < 2^{-4r}$, for any k, as we can assume $r \ge 6$ by the code of SolveSSG of Figure 1.
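The last inequality can be sanity-checked numerically. Multiplying through by $2^{4r}$ (to avoid floating point underflow), it becomes $2^{-r} + 2(5r+1)(\ln 2)\, 2^{-2r} < 1$. The short Python check below, an illustration rather than a proof, confirms this for $6 \le r < 200$ and shows that it fails for r = 1 and r = 2, consistent with the text relying on $r \ge 6$.

```python
import math


def rounding_margin_holds(r):
    """Check 2^(-5r) + 2 ln(2^(5r+1)) 2^r 2^(-7r) < 2^(-4r),
    after multiplying both sides by 2^(4r)."""
    return 2.0 ** -r + 2 * (5 * r + 1) * math.log(2) * 2.0 ** (-2 * r) < 1
```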
Lemma 2 of Condon [6] states that the value of a position in a simple stochastic game with n non-terminal positions can be written as a fraction with integral numerator and denominator at most $4^n$. As pointed out by Chatterjee et al. [4], it is straightforward to see that her proof in fact gives an upper bound of $4^r$, where r is the number of coin toss positions. It is well-known that two distinct fractions with denominator at most $m \ge 2$ differ by at least $\frac{1}{m(m-1)}$. Therefore, since $\tilde v^t_k$ approximates $val(G)_k$ within $2^{-4r} < \frac{1}{4^r(4^r - 1)}$ from below, we in fact have that $val(G)_k$ is the smallest rational number p/q such that $q \le 4^r$ and $p/q \ge \tilde v^t_k$. Therefore, the Kwek-Mehlhorn algorithm applied to $\tilde v^t_k$ correctly computes $val(G)_k$, and we are done.
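Both facts used in this step can be checked by brute force for small denominators. The Python sketch below (hypothetical helper names) enumerates the fractions in [0, 1] with denominator at most m, confirms that the smallest gap is exactly $\frac{1}{m(m-1)}$ (attained, for instance, by 1/m and 1/(m−1)), and checks that taking the smallest fraction with denominator at most q above a sufficiently close underestimate recovers the exact value. The search over denominators is a brute-force stand-in for the Kwek-Mehlhorn algorithm, not that algorithm itself.

```python
import math
from fractions import Fraction


def fractions_up_to(m):
    """Sorted list of the distinct fractions in [0, 1] with denominator <= m."""
    return sorted({Fraction(p, q) for q in range(1, m + 1) for p in range(q + 1)})


def min_gap(m):
    """Smallest difference between two consecutive fractions_up_to(m)."""
    fs = fractions_up_to(m)
    return min(b - a for a, b in zip(fs, fs[1:]))


def smallest_fraction_at_least(x, q):
    """Smallest a/b >= x with 1 <= b <= q (brute force over b)."""
    return min(Fraction(math.ceil(x * b), b) for b in range(1, q + 1))
```

With q = 4^r, an underestimate that is off by less than 1/(q(q − 1)) cannot have any other fraction with denominator at most q between it and the true value, which is exactly why the final rounding step is exact.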
We cannot use the bound on T(G) obtained by the first, direct approach (in Subsection 3.1) to show the correctness of SolveSSG, but we can show the correctness of the version of it that runs the main loop an additional factor of O(r) times, that is, i should range over $\{1, 2, \ldots, 5(\ln 2) \cdot r^2 2^r\}$ instead of over $\{1, 2, \ldots, 2(\ln 2^{5r+1}) \cdot 2^r\}$.
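The difference between the two loop lengths is easy to quantify: their ratio is $\frac{5(\ln 2) r^2 2^r}{2(\ln 2^{5r+1}) 2^r} = \frac{5r^2}{2(5r+1)}$, which is the O(r) factor mentioned above. The short Python check below (hypothetical function names) confirms this and also shows that the bound from Subsection 3.2 gives the shorter loop only from r = 3 on.

```python
import math


def direct_iterations(r):
    """Loop length from the direct approach (Lemma 7): 5 (ln 2) r^2 2^r."""
    return 5 * math.log(2) * r ** 2 * 2 ** r


def extremal_iterations(r):
    """Loop length used in Figure 1 (Corollary 14): 2 ln(2^(5r+1)) 2^r."""
    return 2 * (5 * r + 1) * math.log(2) * 2 ** r
```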
3 Bounds on the convergence rate
3.1 A direct approach
Lemma 6 Let $G \in S_{n,r}$ be given. For all positions k and all integers $i \ge 1$, we have
$$val(G)_k - val(G^{i \cdot r})_k \le (1 - 2^{-r})^i.$$
Proof If $val(G)_k = 0$, we also have $val(G^{i \cdot r})_k = 0$, so the inequality holds. Therefore, we can assume that $val(G)_k > 0$.
Fix some optimal positional strategy, x, for Max in G. Let y be any pure (i.e., deterministic,
but not necessarily positional) strategy for Min with the property that y guarantees that the pebble
will not reach GOAL after having been in a position of value 0 (in particular, any best reply to any
strategy of Max, including x, clearly has this property).
The two strategies x and y together induce a probability space $\sigma_k$ on the set of plays of the game, starting in position k. Let the associated probability measure be denoted $\Pr_{\sigma_k}$. Let $W_k$ be the event that this random play reaches GOAL. We shall also consider the event $W_k$ to be a set of plays. Note that any position occurring in any play in $W_k$ has non-zero value, by definition of y.
Claim. There is a play in $W_k$ where each position occurs at most once.
Proof of Claim. Assume to the contrary that for all plays in $W_k$, some position occurs at least twice. Let y′ be the modification of y where, the second time a position $v \in V_2$ is entered in a given play, y′ takes the same action as was used the first time v occurred. Let W′ be the set of plays generated by x and y′ for which the pebble reaches GOAL. We claim that W′ is in fact the empty set. Indeed, if W′ contains any play q, we can obtain a play in W′ where each position occurs only once, by removing all transitions in q occurring between repetitions of the same position. Such a play is also an element of $W_k$, contradicting the assumption that all plays in $W_k$ have a position occurring twice. The emptiness of W′ shows that the strategy x does not guarantee that GOAL is reached with positive probability when play starts in k. This contradicts either that x is optimal or that $val(G)_k > 0$. We therefore conclude that our assumption is incorrect, and that there is a play q in $W_k$ where each position occurs only once, as desired. This completes the proof of the claim.
Under $\Pr_{\sigma_k}$, any given play in which each coin toss position occurs at most once has probability at least $2^{-r}$.
Let $W^i_k$ be the set of plays in $W_k$ that contain at most i occurrences of coin toss positions (and also let $W^i_k$ denote the corresponding event with respect to the measure $\sigma_k$). Since the above claim holds for any position k of non-zero value, and plays in $W_k$ only visit positions of non-zero value, we see that $\Pr_{\sigma_k}[\neg W^{i \cdot r}_k \mid W_k] \le (1 - 2^{-r})^i$, for any i. Since x is optimal, we also have $\Pr_{\sigma_k}[W_k] \ge val(G)_k$. Therefore,
$$\Pr_{\sigma_k}[W^{i \cdot r}_k] = \Pr_{\sigma_k}[W_k] - \Pr_{\sigma_k}[\neg W^{i \cdot r}_k \mid W_k] \cdot \Pr_{\sigma_k}[W_k] \ge val(G)_k - (1 - 2^{-r})^i.$$
The above derivation is true for any y guaranteeing that no play can enter a position of value 0 and then reach GOAL, and therefore it is also true for y being the optimal strategy in the time-limited game $G^{i \cdot r}$. In that case, we have $\Pr_{\sigma_k}[W^{i \cdot r}_k] \le val(G^{i \cdot r})_k$. We can therefore conclude that $val(G^{i \cdot r})_k \ge val(G)_k - (1 - 2^{-r})^i$, as desired.
Lemma 7 Let $G \in S_{n,r}$ be given. Then
$$T(G) \le 5(\ln 2) \cdot r^2 \cdot 2^r.$$
Proof We will show that for any $t \ge 5(\ln 2) \cdot r^2 \cdot 2^r$ and any k, we have that $val(G)_k - val(G^t)_k < 2^{-5r}$. From Lemma 6 we have that $\forall i, k: val(G)_k - val(G^{i \cdot r})_k \le (1 - 2^{-r})^i$. Thus,
$$val(G)_k - val(G^t)_k \le (1 - 2^{-r})^{t/r} = \left((1 - 2^{-r})^{2^r}\right)^{\frac{t}{r \cdot 2^r}} < e^{-\frac{t}{r \cdot 2^r}} \le e^{-\frac{5(\ln 2) r^2 2^r}{r \cdot 2^r}} = 2^{-5r}.$$
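The two numerical steps of this chain, $(1 - 2^{-r})^{2^r} < e^{-1}$ and the resulting bound at $t = 5(\ln 2) r^2 2^r$, can be sanity-checked in floating point by comparing logarithms (using log1p for accuracy). This Python check is illustrative rather than a proof, and is restricted to $r \le 20$, where both margins are far above double-precision rounding error.

```python
import math


def log_of_key_factor(r):
    """Natural log of (1 - 2^-r)^(2^r), which the proof bounds by -1."""
    return 2 ** r * math.log1p(-(2.0 ** -r))


def lemma7_log_error(r):
    """Natural log of (1 - 2^-r)^(t / r) at t = 5 (ln 2) r^2 2^r."""
    t = 5 * math.log(2) * r ** 2 * 2 ** r
    return (t / r) * math.log1p(-(2.0 ** -r))
```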
3.2 An extremal combinatorics approach
The game of Figure 3 is a game in $S_{n,r}$. We will refer to this game as $E_{n,r}$. $E_{n,r}$ has no Max-positions. Each Min-position in $E_{n,r}$ has GOAL as a successor twice. The i'th coin toss position, for $i \ge 2$, has the (i − 1)'st and the r'th coin toss position as successors. The first coin toss position has GOAL and the r'th coin toss position as successors. The game is very similar to a simple stochastic game used as an example by Condon [7] to show that unmodified value iteration converges slowly.
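To see the slow convergence concretely, the sketch below builds $E_{n,r}$ in a hypothetical dictionary representation (GOAL = 0, coin toss positions 1, ..., r, Min positions r + 1, ..., n) and runs modified value iteration with exact fractions until every position is within $2^{-5r}$ of its value 1; by Lemma 3 this computes $T(E_{n,r})$ for small r. SolveDGG is replaced by n + 1 rounds of max/min backups from 0, an assumption that is adequate here but slower than retrograde analysis. Since every Min position of $E_{n,r}$ has GOAL as both successors, n does not affect the result.

```python
import math
from fractions import Fraction

GOAL = 0


def extremal_game(n, r):
    """E_{n,r}: coin toss positions 1..r, Min positions r+1..n (0 is GOAL)."""
    game = {1: ("coin", (GOAL, r))}
    for i in range(2, r + 1):
        game[i] = ("coin", (i - 1, r))
    for j in range(r + 1, n + 1):
        game[j] = ("min", (GOAL, GOAL))
    return game


def solve_dgg(game, coin_payoff):
    """Coin toss positions become terminals with the given payoffs; n + 1
    rounds of max/min backups from 0 give the deterministic game's values."""
    val = {k: Fraction(0) for k in game}
    val[GOAL] = Fraction(1)
    val.update(coin_payoff)
    for _ in range(len(game) + 1):
        new = dict(val)
        for k, (kind, succ) in game.items():
            if kind == "max":
                new[k] = max(val[s] for s in succ)
            elif kind == "min":
                new[k] = min(val[s] for s in succ)
        val = new
    return val


def convergence_time(game, r, max_steps=100_000):
    """Smallest t with max_k (1 - val(G^t)_k) <= 2^(-5r), for a game in which
    every position has value 1 (as in E_{n,r}); uses Lemma 3."""
    coins = [k for k, (kind, _) in game.items() if kind == "coin"]
    payoff = {k: Fraction(0) for k in coins}
    threshold = Fraction(1, 2 ** (5 * r))
    for t in range(max_steps + 1):
        v = solve_dgg(game, payoff)  # v = v^t = val(G^t)
        if max(1 - x for x in v.values()) <= threshold:
            return t
        payoff = {k: (v[game[k][1][0]] + v[game[k][1][1]]) / 2 for k in coins}
    raise RuntimeError("no convergence within max_steps")


def corollary14_bound(r):
    """The bound 2 ln(2^(5r+1)) * 2^r on T(G) stated in the text."""
    return 2 * (5 * r + 1) * math.log(2) * 2 ** r
```

For r = 1 the gap after t iterations is exactly $2^{-t}$, so $T(E_{n,1}) = 5$; for $r \le 4$ the computed convergence times stay below $2(\ln 2^{5r+1}) \cdot 2^r$.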
In this subsection we will show that En,r is an extremal game in the sense of Definition 5 and
upper bound T (En,r ), thereby upper bounding T (G) for all G ∈ Sn,r .
The first two lemmas in this subsection concern assumptions about t-extremal games that we can make without loss of generality.
Lemma 8 For all n, r, t, there is a t-extremal game in $S_{n,r}$ with $V_1 = \emptyset$, i.e., without positions belonging to Max.
Figure 3: The extremal game $E_{n,r}$. Circle nodes are coin toss positions, triangle nodes are Min positions and the node labeled GOAL is the GOAL position.
Proof Take any t-extremal game G ∈ Sn,r . Let x be an optimal positional strategy for Max
in this game. Now replace each position belonging to Max with a position belonging to Min
with both outgoing arcs making the choice specified by x. Call the resulting game H. We claim
that H is also t-extremal. First, clearly, each position k of H has the same value as it has in G, i.e., $val(G)_k = val(H)_k$. Also, if we compare the values of the positions of the games $H^t$ and $G^t$ defined in Definition 2, we see that $val(H^t)_k \le val(G^t)_k$, since the only difference between $H^t$ and $G^t$ is that player Max has more options in the latter game. Therefore, $val(H)_k - val(H^t)_k \ge val(G)_k - val(G^t)_k$, so H must also be t-extremal.
Lemma 9 For all n, r, t, there exists a t-extremal game in Sn,r , where all positions have value one
and where no positions belong to player Max.
Proof By Lemma 8, we can pick a t-extremal game G in Sn,r where no positions belong to player
Max. Suppose that not all positions in G have value 1. Then, it is easy to see that the set of
positions of value 0 is non-empty. Let this set be N . Let H be the game where all arcs into N
are redirected to GOAL. Clearly, all positions in this game have value 1. We claim that H is also
t-extremal.
Fix a position k. We shall show that $val(H)_k - val(H^t)_k \ge val(G)_k - val(G^t)_k$, and we shall be done. Let $\sigma_k$ be a (not necessarily positional) optimal strategy for player Min in $G^t$ for plays starting in k, and let the probability measure on plays of $G^t$ associated with this strategy be denoted $\Pr_{\sigma_k}$. As $\sigma_k$ is also a strategy that can be played in G, we have $\Pr_{\sigma_k}[\text{Play does not reach } N] \ge val(G)_k$. Also, by definition, $\Pr_{\sigma_k}[\text{Play reaches GOAL}] = val(G^t)_k$. That is,
$$\Pr_{\sigma_k}[\text{Play reaches neither GOAL nor } N] \ge val(G)_k - val(G^t)_k.$$
Let $\bar\sigma_k$ be an optimal strategy for plays starting in k for player Min in $H^t$. This strategy can also be used in $G^t$. Let the probability distribution on plays of $G^t$ associated with this strategy be denoted $\Pr_{\bar\sigma_k}$. Note that plays reaching GOAL in $H^t$ correspond to those plays reaching either GOAL or N in $G^t$. Thus, by definition, $\Pr_{\bar\sigma_k}[\text{Play reaches neither GOAL nor } N] = 1 - val(H^t)_k$. As $\sigma_k$ can also be used in $H^t$, where $\bar\sigma_k$ is optimal, we have
$$1 - val(H^t)_k \ge \Pr_{\sigma_k}[\text{Play reaches neither GOAL nor } N] \ge val(G)_k - val(G^t)_k.$$
But since $val(H)_k = 1$, this is the desired inequality
$$val(H)_k - val(H^t)_k \ge val(G)_k - val(G^t)_k.$$
The next lemma will be used to derive an ordering of the positions in any game G satisfying the restrictions of Lemma 9.
Lemma 10 Let G be a game without Max positions in which all positions have value one. Let V′ be a non-empty set of positions of G that does not include GOAL. Then at least one of the following two cases holds:
1. V ′ contains a Min position with both successors outside of V ′ or
2. V ′ contains a coin toss position with at least one successor outside of V ′ .
Proof Suppose not. Then the Min-player can force play to stay within V ′ when play starts in V ′ .
Thus, the values of all positions in V ′ are 0, a contradiction.
The following lemma will be used several times to change the structure of a game while only
making it more extremal, eventually making the game into the specific game En,r (in the context
of extremal combinatorics, this is a standard technique pioneered by Moon and Moser [14]).
Lemma 11 Let G be a game, let c be a coin toss position in G, and let k be an immediate successor of c. Also, let a position k′ with the following property be given: $\forall t: val(G^t)_{k'} \le val(G^t)_k$. Let H be the game where the arc from c to k is redirected to k′. Then $\forall t, j: val(H^t)_j \le val(G^t)_j$.
Proof In this proof we will throughout refer to the properties of ModifiedValueIteration and use Lemma 3. We show by induction on t that $\forall j, t: val(H^t)_j \le val(G^t)_j$. For t = 0 we have $val(H^0)_j = val(G^0)_j$ by inspection of the algorithm. Now assume that the inequality holds for all values smaller than t and for all positions i; we will show that it holds for t and all positions j. Consider a fixed position j. There are three cases.
1. The position j belongs to Max or Min. In this case, we observe that the function computed by SolveDGG to determine the value of position j in ModifiedValueIteration is a monotonically non-decreasing function of the terminal payoffs. Also, the deterministic graphical game obtained when replacing coin toss positions with terminals is the same for G and for H. By the induction hypothesis, we have that for all i, $val(H^{t-1})_i \le val(G^{t-1})_i$. So, $val(H^t)_j \le val(G^t)_j$.
2. The position j is a coin toss position, but not c. In this case, we have
$$val(G^t)_j = \tfrac{1}{2} val(G^{t-1})_a + \tfrac{1}{2} val(G^{t-1})_b$$
and
$$val(H^t)_j = \tfrac{1}{2} val(H^{t-1})_a + \tfrac{1}{2} val(H^{t-1})_b,$$
where a and b are the successors of j. By the induction hypothesis, $val(H^{t-1})_a \le val(G^{t-1})_a$ and $val(H^{t-1})_b \le val(G^{t-1})_b$. Again, we have $val(H^t)_j \le val(G^t)_j$.
3. The position j is equal to c. In this case, we have
$$val(G^t)_c = \tfrac{1}{2} val(G^{t-1})_a + \tfrac{1}{2} val(G^{t-1})_k,$$
where a and k are the successors of c in G, while
$$val(H^t)_c = \tfrac{1}{2} val(H^{t-1})_a + \tfrac{1}{2} val(H^{t-1})_{k'}.$$
By the induction hypothesis we have $val(H^{t-1})_a \le val(G^{t-1})_a$. We also have that $val(H^{t-1})_{k'} \le val(G^{t-1})_{k'}$, which is, by assumption, at most $val(G^{t-1})_k$. So, we have $val(H^t)_c \le val(G^t)_c$.

Figure 4: An example of $H_0$, with n = 5 and r = 3. Circle nodes are coin toss positions, triangle nodes are Min positions and the node labeled GOAL is the GOAL position. Note that this particular $H_0$ is not extremal.
Theorem 12 $E_{n,r}$ is an extremal game in $S_{n,r}$.
Proof Let $H_0 \in S_{n,r}$ be a game with $V_1 = \emptyset$ in which all positions have value one. We will show that for all t we have
$$\max_k \big(val(E_{n,r})_k - val(E^t_{n,r})_k\big) \ge \max_{k'} \big(val(H_0)_{k'} - val(H^t_0)_{k'}\big).$$
Since, by Lemma 9, we can take $H_0$ to be a t-extremal game for any t, $E_{n,r}$ is a t-extremal game for all t and is hence an extremal game.
To illustrate the proof, we will use the game in Figure 4 as a running example.
We shall construct a sequence H1 , H2 , . . . , Hn of games (which will in fact be identical to H0 ,
except that positions have been renumbered), so that Hk has the following property Pk .
Property Pk : Any Min position j among the positions 1, 2, . . . , k has all successors within the
set {1, 2, . . . , j − 1, GOAL}. Any coin toss position j among the positions {1, 2, . . . , k} has at least
one successor within {1, 2, . . . , j − 1, GOAL}.
Suppose we have already constructed $H_j$ for $j < k$ (note that $H_0$ trivially satisfies $P_0$). We show how to construct $H_k$ based on $H_{k-1}$. Applying Lemma 10 to the game $H_{k-1}$ with $V' = \{k, \ldots, n\}$, we find among the positions k, ..., n either a coin toss position u with at least one successor in $\{1, 2, \ldots, k-1, GOAL\}$ or a Min-position u with all successors in $\{1, 2, \ldots, k-1, GOAL\}$. In either case, we swap the indices of u and k (renumbering u to k and k to u) and let the resulting game be $H_k$.
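The renumbering can also be read as a greedy procedure: repeatedly pick, among the positions not yet numbered, a Min position whose successors are all numbered (or GOAL), or a coin toss position with at least one such successor; Lemma 10 guarantees that such a position exists. The following Python sketch builds the order directly instead of performing the swaps described above. The dictionary representation, the function names and the example game are hypothetical, and the game is assumed to have no Max positions.

```python
GOAL = 0


def order_positions(game):
    """Order positions so that every Min position has all successors earlier
    (or GOAL) and every coin toss position has at least one such successor.
    Assumes no Max positions; Lemma 10 guarantees progress when all values are 1."""
    placed, order = {GOAL}, []
    remaining = set(game)
    while remaining:
        for u in sorted(remaining):
            kind, succ = game[u]
            if (kind == "min" and all(s in placed for s in succ)) or (
                kind == "coin" and any(s in placed for s in succ)
            ):
                break
        else:
            # Min can keep the play inside `remaining` forever (cf. Lemma 10).
            raise ValueError("no candidate: remaining positions have value 0")
        order.append(u)
        placed.add(u)
        remaining.remove(u)
    return order


def has_property(game, order):
    """Check property P_n for the renumbering given by `order`."""
    rank = {GOAL: 0, **{u: i + 1 for i, u in enumerate(order)}}
    for u in order:
        kind, succ = game[u]
        lower = [s for s in succ if rank[s] < rank[u]]
        if kind == "min" and len(lower) < len(succ):
            return False
        if kind == "coin" and not lower:
            return False
    return True


# Hypothetical game without Max positions in which every position has value 1.
EXAMPLE_GAME = {
    1: ("min", (3, 3)),
    2: ("min", (4, GOAL)),
    3: ("coin", (2, 3)),
    4: ("coin", (3, GOAL)),
}
```

On EXAMPLE_GAME the procedure numbers positions 4, 2, 3, 1 in that order, and the resulting numbering satisfies property $P_n$, whereas the original numbering does not.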
Figure 5 shows Hn for the case of our running example.
Each coin toss position in $H_n$ has at least one successor with a lower index than itself. (Recall that GOAL has index 0.) In the following, we call this successor the lower successor and the other successor the higher successor. If both successors in fact have a lower index than the position, we choose one of the two successors arbitrarily as the higher successor.
Figure 5: $H_n$, if $H_0$ is the game in Figure 4.
Figure 6: $H'$, if $H_0$ is the game in Figure 4.
We now make a series of transformations from $H_n$, generating a new sequence of games. This will take us outside the set of games in $S_{n,r}$, but the last game in the sequence will again be in $S_{n,r}$. For each transformation, from G′ to G′′, we will show that $val(G''^t)_k \le val(G'^t)_k$ for all t and k. For the final game $E_{n,r}$ we arrive at, we clearly have that $val(E_{n,r})_k = 1$ for all k. This is in fact also true for all intermediate games, but we shall not need that fact.
For each of the original non-terminal positions 1, 2, . . . , n in Hn , we add a Min-position. We
assign index n + j to the Min-position associated with position j. We let the two successors of
position n + j be j and n + j − 1, except for the case of n + 1, where we let the two successors
be 1 and GOAL. Let the resulting game be denoted H ′ . For our running example, H ′ is shown in
Figure 6.
Applying Lemma 3 and inspecting the code of ModifiedValueIteration, we have that
$$\forall k \in \{1, 2, \ldots, n\}, t: val(H'^t)_k = val(H^t_n)_k.$$
We will only use that $\forall k \in \{1, 2, \ldots, n\}, t: val(H'^t)_k \le val(H^t_n)_k$.
We also have the following fact: $val(H'^t)_{2n} = \min_{j \in \{1, 2, \ldots, n\}} val(H'^t)_j$. Indeed, we can argue by induction on k that $val(H'^t)_{n+k} = \min_{j \in \{1, 2, \ldots, k\}} val(H'^t)_j$. For the base case, we have that $val(H'^t)_{n+1} = \min(1, val(H'^t)_1) = val(H'^t)_1$. For j > 1, we have
$$val(H'^t)_{n+j} = \min\big(val(H'^t)_j, val(H'^t)_{n+j-1}\big),$$
completing the proof. This property, and the fact that the proof only used information about the successors of n + k for k ∈ {1, 2, ..., n}, allows us to apply Lemma 11 iteratively to modify the game by changing the higher successor of each and every coin toss position to be the position 2n. We denote this game $H'_0$.
For our running example, $H'_0$ is shown in Figure 7.
Figure 7: $H'_0$, if $H_0$ is the game of Figure 4.
Figure 8: $H'_r$, if $H_0$ is the game in Figure 4.
Let the coin toss positions in $H'_0$ be $i_1 < i_2 < \cdots < i_r$. Then, we define a sequence of games $H'_1, H'_2, \ldots, H'_r$ as follows. We define $H'_1$ from $H'_0$ by changing the lower successor of $i_1$ to GOAL. For j > 1, we define $H'_j$ from $H'_{j-1}$ by changing the lower successor of $i_j$ to be $i_{j-1}$.
For our running example, $H'_r$ is given in Figure 8.
Claim. For t ≥ 0 and j ∈ {1, ..., r + 1}, the following holds. For a position with index k strictly smaller than $i_j$, we have $val(H'^t_{j-1})_k \ge val(H'^t_{j-1})_{i_{j-1}}$. Here, by convention, we let $i_0$ be the GOAL position when considering the statement for j = 1, and we let $i_{r+1}$ be ∞ when considering the statement for j = r + 1.
Proof of claim. The proof is by induction on j.
Clearly, $val(H'^t_{j'})_k = 1$ for all positions k in $1, 2, \ldots, i_1 - 1$ and for all j′ and t, so this settles the base case of j = 1.
For larger values of j, and $k < i_{j-1}$, we have by construction that $val(H'^t_{j-1})_k = val(H'^t_{j-2})_k$. Now let $k \ge i_{j-1}$. There are two cases: either k is a coin toss position or k is a Min-position. If k is a coin toss position, we can without loss of generality assume that $k = i_{j-2}$, since it has the smallest value among all coin toss positions, by the induction hypothesis. Therefore, we only need to show that $\forall t: val(H'^t_{j-1})_{i_{j-2}} \ge val(H'^t_{j-1})_{i_{j-1}}$. For j = 2, we have that $\forall t: val(H'^t_1)_{i_0} = 1 \ge val(H'^t_1)_{i_1}$. For $j \ge 3$, we have by the properties of the algorithm that $val(H'^0_{j-1})_{i_{j-2}} = 0 = val(H'^0_{j-1})_{i_{j-1}}$ and
$$\forall t \ge 1: val(H'^t_{j-1})_{i_{j-2}} = \tfrac{1}{2} val(H'^{t-1}_{j-1})_{i_{j-3}} + \tfrac{1}{2} val(H'^{t-1}_{j-1})_{2n}.$$
We have by the induction hypothesis that
$$\tfrac{1}{2} val(H'^{t-1}_{j-1})_{i_{j-3}} + \tfrac{1}{2} val(H'^{t-1}_{j-1})_{2n} \ge \tfrac{1}{2} val(H'^{t-1}_{j-1})_{i_{j-2}} + \tfrac{1}{2} val(H'^{t-1}_{j-1})_{2n} = val(H'^t_{j-1})_{i_{j-1}}.$$
If k is a Min-position in $\{i_{j-1} + 1, \ldots, i_j - 1\}$, assume to the contrary that some k fails to satisfy $val(H'^t_{j-1})_k \ge val(H'^t_{j-1})_{i_{j-1}}$. Consider the smallest such k. As k is a Min-position with two successors, both of which are smaller than k, we have that $val(H'^t_{j-1})_k$ is the minimum of two numbers, both of which are at least $val(H'^t_{j-1})_{i_{j-1}}$, either by the induction hypothesis or by the assumption that k is minimal. This completes the proof of the claim.

Figure 9: $H''$, if $H_0$ is the game in Figure 4.
By the claim, for all positions $k \in \{1, 2, \dots, n\}$, we have $\forall t \ge 0: \mathrm{val}((H'_r)^t)_k \ge \mathrm{val}((H'_r)^t)_{i_r}$. Also, by an induction argument identical to the one we used to argue a similar property for $H'$, we have $\forall t \ge 0: \mathrm{val}((H'_r)^t)_{2n} = \mathrm{val}((H'_r)^t)_{i_r}$. Thus we may define the game $H''$ from $H'_r$ by applying Lemma 11 and changing all higher successors of all coin toss positions $i_j$ to be $i_r$ (instead of $2n$).
For our running example, $H''$ is the game of Figure 9.
Finally, we arrive at $E_{n,r}$ by removing all “new” Min-positions $n+1, \dots, 2n$ and then changing both successors of every original Min-position of $H''$ to be the GOAL position.
For our running example, $E_{5,3}$ is the extremal game identified by the proof; it is shown in Figure 10. Note that the same game would also be identified for any other game with $r = 3$ and $n = 5$ (up to the indexing of the positions).
It is easy to see that all positions in $E_{n,r}$ have value 1. Each Min-position $k$ in $H''$ satisfies $\mathrm{val}((H'')^t)_k \ge \mathrm{val}((H'')^t)_{i_r}$.
We have that $\mathrm{val}((H_n)^t)_k \ge \mathrm{val}((H'')^t)_{i_r} = \mathrm{val}((E_{n,r})^t)_{i_r}$ for all $k \in \{1, 2, \dots, n\}$, since we obtained $H''$ either by applying Lemma 11 or in a way that did not change the value of any position for any time bound.
Also, $E_{n,r} \in S_{n,r}$. Therefore, since at least one possible option for $H_0 \in S_{n,r}$ (and therefore for $H_n$, since $H_n$ was a reindexing of the positions in $H_0$) was a $t$-extremal game for any $t$, $E_{n,r}$ is extremal.
This completes the proof of the lemma.

Figure 10: The game $E_{5,3}$, up to indexing of the positions.
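The coin toss part of $E_{n,r}$ is small enough to evaluate directly. The following Python sketch uses exact rationals and makes several assumptions, all taken from the construction above: the lower successor of $i_1$ is GOAL, the lower successor of $i_j$ is $i_{j-1}$, and every higher successor is $i_r$. It also assumes, following the Lemma 3 interpretation used below, that one iteration corresponds to one coin toss. Min-positions point to GOAL, so they are omitted. The function name `coin_chain_values` is a hypothetical illustration. With it one can observe both the ordering $\mathrm{val}(E_{n,r}^t)_{i_1} \ge \dots \ge \mathrm{val}(E_{n,r}^t)_{i_r}$ and the convergence of all values to 1.

```python
from fractions import Fraction
from typing import List


def coin_chain_values(r: int, t: int) -> List[Fraction]:
    """t-step values [v(i_1), ..., v(i_r)] of the coin toss positions of E_{n,r}.

    Assumed structure: lower(i_1) = GOAL, lower(i_j) = i_{j-1} for j > 1,
    and higher(i_j) = i_r for every j.  GOAL has value 1 at every step.
    """
    v = [Fraction(0)] * r                 # t = 0: no coin toss has been taken
    half = Fraction(1, 2)
    for _ in range(t):
        lower = [Fraction(1)] + v[:-1]    # lower successors: GOAL, i_1, ..., i_{r-1}
        higher = v[-1]                    # every higher successor is i_r
        v = [half * lo + half * higher for lo in lower]
    return v


if __name__ == "__main__":
    for t in range(5):
        print(t, [str(x) for x in coin_chain_values(3, t)])
```

For example, with $r = 3$ and $t = 3$, the value at $i_3$ is $1/8$: the only way to reach GOAL from $i_3$ within three tosses is three consecutive lower moves.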
Having identified the extremal game En,r , we next estimate T (En,r ).
Lemma 13 For all $n, r > 0$, $\epsilon > 0$, $t \ge 2\ln(2\epsilon^{-1}) \cdot 2^r$, and $k \in V$: $\mathrm{val}(E_{n,r})_k - \mathrm{val}(E_{n,r}^t)_k \le \epsilon$.
Proof We observe that for the purposes of estimating $\mathrm{val}(E_{n,r})_k - \mathrm{val}(E_{n,r}^t)_k$, we can view $E_{n,r}$ as a game containing $r$ coin toss positions only, since all Min-positions point directly to GOAL.
Also, when modified value iteration is applied to a game $G$ containing only coin toss positions, Lemma 3 implies that $\mathrm{val}(G^t)_k$ can be reinterpreted as the probability that the absorbing Markov process starting in state $k$ is absorbed within $t$ steps. By the structure of $E_{n,r}$, this is equal to the probability that a sequence of $t$ fair coin tosses contains $r$ consecutive tails. This is known to be exactly $1 - F^{(r)}_{t+2}/2^t$, where $F^{(r)}_{t+2}$ is the $(t+2)$nd Fibonacci $r$-step number, i.e., the number given by the linear homogeneous recurrence $F^{(r)}_m = \sum_{i=1}^{r} F^{(r)}_{m-i}$ and the boundary conditions $F^{(r)}_m = 0$ for $m \le 0$ and $F^{(r)}_1 = F^{(r)}_2 = 1$. Asymptotically solving this linear recurrence, we have that $F^{(r)}_m \le (\phi_r)^{m-1}$, where $\phi_r$ is the root near 2 of the equation $x + x^{-r} = 2$. Clearly, $\phi_r < 2 - 2^{-r}$, so
$$F^{(r)}_{t+2}/2^t < \frac{(2 - 2^{-r})^{t+1}}{2^t} = 2(1 - 2^{-r-1})^{t+1} < 2(1 - 2^{-r-1})^t.$$
Therefore, using $1 - x \le e^{-x}$, the probability that the chain is not absorbed within $t = 2\ln(2\epsilon^{-1}) \cdot 2^r$ steps is at most
$$2(1 - 2^{-r-1})^{2\ln(2\epsilon^{-1}) \cdot 2^r} \le 2e^{-\ln(2\epsilon^{-1})} = \epsilon.$$
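The facts used in this proof can be sanity-checked numerically for small parameters; this is a check, not a proof. The Python sketch below uses exact integer arithmetic and performs four checks:

- It computes $F^{(r)}_m$ from the stated recurrence and boundary conditions.
- It confirms by brute-force enumeration that $F^{(r)}_{t+2}$ counts the toss sequences of length $t$ without $r$ consecutive tails.
- It confirms that the $t$-step absorption probability from $i_r$ equals $1 - F^{(r)}_{t+2}/2^t$. This assumes the coin toss chain of $E_{n,r}$ is structured as in the construction: lower successors step down to GOAL, and higher successors return to $i_r$.
- It tests the bound $F^{(r)}_{t+2}/2^t < 2(1 - 2^{-r-1})^t$ in the equivalent integer form $F^{(r)}_{t+2} \cdot 2^{rt} < 2(2^{r+1} - 1)^t$.

All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def fib_r(r: int, m: int) -> int:
    """Fibonacci r-step number F^{(r)}_m with F_m = 0 for m <= 0 and F_1 = 1."""
    if m <= 0:
        return 0
    window = [0] * (r - 1) + [1]          # F_{2-r}, ..., F_0, F_1
    for _ in range(m - 1):
        window = window[1:] + [sum(window)]
    return window[-1]


def count_without_run(r: int, t: int) -> int:
    """Brute force: number of length-t toss sequences with no r consecutive tails."""
    run = "T" * r
    return sum(run not in "".join(s) for s in product("HT", repeat=t))


def absorbed_within(r: int, t: int) -> Fraction:
    """t-step absorption probability from i_r in the assumed coin toss chain
    (lower(i_1) = GOAL, lower(i_j) = i_{j-1}, every higher successor = i_r)."""
    v = [Fraction(0)] * r
    for _ in range(t):
        higher = v[-1]
        v = [(lo + higher) / 2 for lo in [Fraction(1)] + v[:-1]]
    return v[-1]


def bound_holds(r: int, t: int) -> bool:
    """Integer form of F_{t+2} / 2^t < 2 (1 - 2^{-r-1})^t."""
    return fib_r(r, t + 2) * 2 ** (r * t) < 2 * (2 ** (r + 1) - 1) ** t


if __name__ == "__main__":
    r = 3
    for t in range(9):
        print(t, absorbed_within(r, t), 1 - Fraction(fib_r(r, t + 2), 2 ** t))
```

Multiplying the bound through by $2^{(r+1)t}$ avoids floating-point comparisons entirely, so the check is exact even when $F^{(r)}_{t+2}$ has hundreds of digits.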
Corollary 14 For all $n, r > 0$: $T(E_{n,r}) \le 2\ln(2^{5r+1}) \cdot 2^r$.
Proof Substitute $\epsilon = 2^{-5r}$ into Lemma 13.
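As a concrete reading of Corollary 14, the sketch below computes the iteration count $\lceil 2\ln(2^{5r+1}) \cdot 2^r \rceil$. Using the identity from the proof of Lemma 13, it checks that the miss probability $F^{(r)}_{t+2}/2^t$ at that $t$ is at most $2^{-5r}$, tested exactly as $F^{(r)}_{t+2} \cdot 2^{5r} \le 2^t$. It also scans for the smallest $t$ meeting the same accuracy, which indicates how much slack the bound leaves for small $r$. The function names are hypothetical, and the sketch assumes the identity $\mathrm{val}(E_{n,r}) - \mathrm{val}(E_{n,r}^t) = F^{(r)}_{t+2}/2^t$ established above.

```python
import math


def corollary14_iterations(r: int) -> int:
    """Smallest integer t with t >= 2 ln(2^{5r+1}) * 2^r (Lemma 13 with eps = 2^{-5r})."""
    return math.ceil(2 * (5 * r + 1) * math.log(2) * 2 ** r)


def fib_r_shifted(r: int, t: int) -> int:
    """F^{(r)}_{t+2}, computed with a sliding window over the r-step recurrence."""
    window = [0] * (r - 1) + [1]          # F_{2-r}, ..., F_1
    for _ in range(t + 1):                # advance from F_1 to F_{t+2}
        window = window[1:] + [sum(window)]
    return window[-1]


def within_accuracy(r: int, t: int) -> bool:
    """True iff F_{t+2} / 2^t <= 2^{-5r}, i.e. the chain misses GOAL w.p. <= 2^{-5r}."""
    return fib_r_shifted(r, t) * 2 ** (5 * r) <= 2 ** t


def smallest_sufficient_t(r: int) -> int:
    """Smallest t with F_{t+2} / 2^t <= 2^{-5r}, found by an incremental scan."""
    window = [0] * (r - 1) + [1]
    window = window[1:] + [sum(window)]   # window now ends with F_2, i.e. t = 0
    t = 0
    while window[-1] * 2 ** (5 * r) > 2 ** t:
        window = window[1:] + [sum(window)]
        t += 1
    return t


if __name__ == "__main__":
    for r in range(1, 7):
        t_bound = corollary14_iterations(r)
        print(r, t_bound, smallest_sufficient_t(r), within_accuracy(r, t_bound))
```

Because $F^{(r)}_{t+3} \le 2F^{(r)}_{t+2}$, the miss probability is non-increasing in $t$. Therefore the scanned value can never exceed the corollary's bound.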
4 Conclusions
We have presented an algorithm for solving simple stochastic games whose worst-case running time, as a function of the number of coin toss positions, improves on that of previous algorithms.
It is relevant to observe that the best case complexity of the algorithm is strongly related to its
worst case complexity, as the number of iterations of the main loop is fixed in advance.
As mentioned in the introduction, our paper is partly motivated by a result of Chatterjee et
al. [4] analysing the strategy iteration algorithm of the same authors [5] for the case of simple
stochastic games. We can in fact improve their analysis, using the techniques of this paper. We
combine three facts:
1. ([5, Lemma 8]) For a game $G \in S_{n,r}$, after $t$ iterations of the strategy iteration algorithm [5] applied to $G$, the (positional) strategy computed for Max guarantees a probability of winning of at least $\mathrm{val}(\bar{G}^t)_k$ against any strategy of the opponent when play starts in position $k$, where $\bar{G}^t$ is the game defined from $G$ in Definition 4 of this paper.
2. For a game $G \in S_{n,r}$ and all $k, t$, $\mathrm{val}(\bar{G}^{t(n-r+1)})_k \ge \mathrm{val}(G^t)_k$. This is a direct consequence of the definitions of the two games and of the fact that, in optimal play, either a coin toss position is encountered at least once every $n - r + 1$ moves of the pebble, or never again.
3. Corollary 14 of the present paper.
These three facts together imply that after $2\ln(2^{5r+1}) \cdot 2^r (n - r + 1)$ iterations, the strategy iteration algorithm has computed a strategy that guarantees the values of the game within an additive error of $2^{-5r}$, for $r \ge 6$. As observed by Chatterjee et al. [4], such a strategy is in fact optimal. Hence, we conclude that their strategy iteration algorithm terminates in time $2^r n^{O(1)}$. This improves
their analysis of the algorithm significantly, but still yields a bound on its worst case running
time inferior to the worst case running time of the algorithm presented here. On the other hand,
unlike the algorithm presented in this paper, their algorithm has the desirable property that it may
terminate faster than its worst case analysis suggests.
References
[1] D. Andersson, K. A. Hansen, P. B. Miltersen, and T. B. Sørensen. Deterministic graphical
games revisited. In CiE ’08: Proceedings of the 4th conference on Computability in Europe,
pages 1–10, Berlin, Heidelberg, 2008. Springer-Verlag.
[2] D. Andersson and P. B. Miltersen. The complexity of solving stochastic games on graphs. In
Algorithms and Computation, 20th International Symposium, ISAAC 2009, Honolulu, Hawaii,
USA, December 16-18, 2009. Proceedings, volume 5878 of Lecture Notes in Computer Science,
pages 112–121. Springer, 2009.
[3] R. E. Bellman. On the application of dynamic programming to the determination of optimal
play in chess and checkers. Proceedings of the National Academy of Sciences of the United
States of America, 53:244–246, 1965.
[4] K. Chatterjee, L. de Alfaro, and T. A. Henzinger. Termination criteria for solving concurrent
safety and reachability games. In Proceedings of the Twentieth Annual ACM-SIAM Symposium
on Discrete Algorithms, SODA’09, pages 197–206, 2009.
[5] Krishnendu Chatterjee, Luca de Alfaro, and Thomas A. Henzinger. Strategy improvement
for concurrent reachability games. In Third International Conference on the Quantitative
Evaluation of Systems, QEST’06, pages 291–300. IEEE Computer Society, 2006.
[6] A. Condon. The complexity of stochastic games. Information and Computation, 96:203–224,
1992.
[7] A. Condon. On algorithms for simple stochastic games. In Advances in Computational Complexity Theory, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 51–73. American Mathematical Society, 1993.
[8] D. Dai and R. Ge. New results on simple stochastic games. In Algorithms and Computation,
20th International Symposium, ISAAC 2009, Honolulu, Hawaii, USA, December 16-18, 2009.
Proceedings, volume 5878 of Lecture Notes in Computer Science, pages 1014–1023. Springer,
2009.
[9] D. Gillette. Stochastic games with zero stop probabilities. In M. Dresher, A.W. Tucker,
and P. Wolfe, editors, Contributions to the Theory of Games III, volume 39 of Annals of
Mathematics Studies, pages 179–187. Princeton University Press, 1957.
[10] H. Gimbert and F. Horn. Simple Stochastic Games with Few Random Vertices are Easy to
Solve. In Proceedings of the 11th International Conference on the Foundations of Software Science and Computational Structures, FoSSaCS’08, volume 4962 of Lecture Notes in Computer
Science, pages 5–19. Springer-Verlag, 2008.
[11] S. Kwek and K. Mehlhorn. Optimal search for rationals. Inf. Process. Lett., 86(1):23–26, 2003.
[12] T. M. Liggett and S. A. Lippman. Stochastic games with perfect information and time average
payoff. SIAM Review, 11(4):604–607, 1969.
[13] J. F. Mertens and A. Neyman. Stochastic games. International Journal of Game Theory,
10:53–66, 1981.
[14] J. Moon and L. Moser. On cliques in graphs. Israel Journal of Mathematics, 3:23–28, 1965.
doi:10.1007/BF02760024.
[15] L. S. Shapley. Stochastic games. Proc. Nat. Acad. Science, 39:1095–1100, 1953.