2021 19th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt)
The Gittins Policy in the M/G/1 Queue
Ziv Scully* and Mor Harchol-Balter*
Carnegie Mellon University, Pittsburgh, PA, USA
[email protected], [email protected]
Abstract—The Gittins policy is a highly general scheduling
policy that minimizes a wide variety of mean holding cost metrics
in the M/G/1 queue. Perhaps most famously, Gittins minimizes
mean response time in the M/G/1 when jobs’ service times
are unknown to the scheduler. Gittins also minimizes weighted
versions of mean response time. For example, the well-known
“cµ rule”, which minimizes class-weighted mean response time
in the multiclass M/M/1, is a special case of Gittins.
However, despite the extensive literature on Gittins in the
M/G/1, it contains no fully general proof of Gittins’s optimality.
This is because Gittins was originally developed for the multi-armed bandit problem. Translating arguments from the multi-armed bandit to the M/G/1 is technically demanding, so it has
only been done rigorously in some special cases. The extent of
Gittins’s optimality in the M/G/1 is thus not entirely clear.
In this work we provide the first fully general proof of Gittins’s
optimality in the M/G/1. The optimality result we obtain is
even more general than was previously known. For example, we
show that Gittins minimizes mean slowdown in the M/G/1 with
unknown or partially known service times, and we show that
Gittins’s optimality holds under batch arrivals. Our proof uses a
novel approach that works directly with the M/G/1, avoiding the
difficulties of translating from the multi-armed bandit problem.
TABLE I.1
GITTINS OPTIMALITY RESULTS FOR M/G/1-LIKE QUEUES

Holding Cost                 | Model        | Preemption  | Service Times   | Prior Proofs^a
-----------------------------|--------------|-------------|-----------------|---------------
All equal                    | M/G/1        | allowed     | known           | 1
All equal                    | M/G^MP/1     | allowed     | see Section III | 11
By class                     | M/G/1        | allowed     | unknown         | 4, 8
By class                     | M/G/1+fbk^b  | not allowed | unknown         | 6, 9
By class                     | M/M/1+fbk^b  | allowed     | unknown         | 7, 9, 10
By class and service time^c  | M/G/1        | not allowed | known           | 2
By class and service time^c  | M/G/1        | allowed     | known           | 3
By class and service time^c  | M/G/1        | allowed     | unknown         | 5

^a A list of prior proofs appears in Section II.
^b Here "fbk" stands for feedback, meaning that whenever a job exits the system, it has some probability of being replaced by another job.
^c This includes minimizing mean slowdown, in which a job's holding cost is the reciprocal of its service time.
I. INTRODUCTION

Scheduling to minimize mean holding cost in queueing systems is an important problem. Minimizing metrics such as mean response time, weighted mean response time, and mean slowdown can all be viewed as special cases of minimizing holding cost [1].¹ In single-server queueing systems, specifically the M/G/1 and similar systems, a number of scheduling policies minimize mean holding cost in various special cases. Two famous examples are the Shortest Remaining Processing Time (SRPT) policy, which minimizes mean response time when service times are known to the scheduler, and the "cµ rule", which minimizes weighted mean response time in the multiclass M/M/1 with unknown service times.

It turns out that there is a policy that minimizes mean holding cost in the M/G/1 under very general conditions. This policy, now known as the Gittins policy after one of its principal creators [2], has a relatively simple form. Gittins assigns each job an index, which is a rating roughly corresponding to how valuable it would be to serve that job. A job's index depends only on its own state, not the state of any other jobs. Gittins then serves the job of maximal index at every decision time. The "magic" of Gittins is in how it determines each job's index, which we describe in detail in Section IV. SRPT and the cµ rule are both special cases of Gittins.

Given its generality, it is perhaps unsurprising that the Gittins policy has been discovered several times. Similarly, there are several proofs of its optimality under varying conditions. Table I.1 summarizes several M/G/1-like settings in which Gittins has been studied, and Section II gives a more detailed overview. In light of this substantial body of prior work, the optimality of Gittins for minimizing mean holding cost in the M/G/1 is widely accepted in the literature [3–7].

We ourselves are researchers whose work often cites Gittins's optimality in the M/G/1. However, in reviewing the literature, we found that there is no complete proof of Gittins's optimality in its full generality. This is in part because Gittins was originally developed not for the M/G/1 but for the Markovian multi-armed bandit problem [8]. There are elegant arguments for Gittins's optimality in the multi-armed bandit problem, but they do not easily translate to the M/G/1. Results for the M/G/1 thus suffer from a variety of limitations (Section II), so the extent of Gittins's optimality in the M/G/1 is not entirely clear.

In this work, we give a unifying presentation of the Gittins policy in M/G/1-like systems, resulting in the most general definition of Gittins (Definition IV.2) and optimality theorem (Theorem V.1) to date. Our approach deals directly with the M/G/1, avoiding the difficulties of translating from the multi-armed bandit problem. As a result, we actually extend the known scope of Gittins's optimality, such as including systems with batch arrivals. We make the following contributions:
• We discuss many prior proofs of Gittins's optimality, detailing the limitations of each one (Section II).
• We give a new general definition of the Gittins policy (Section IV). This involves introducing a new generalization of the M/G/1 called the M^B/G^MP/1 queue (Section III).
• We state (Section V) and prove (Sections VI and VII) Gittins's optimality in the M^B/G^MP/1.

* Supported by NSF grant nos. CMMI-1938909 and CSR-1763701 and a Google Faculty Award.
¹ A job's response time is the amount of time between its arrival and completion. Jobs may be sorted into classes which are weighted by importance. A job's slowdown is the ratio between its response time and service time.
II. HISTORY OF THE GITTINS POLICY IN THE M/G/1
In this section we review prior work on the Gittins policy
in M/G/1-like queues. This includes work on special cases
of Gittins, such as SRPT in the case of known service times,
that are not typically thought of as instances of the Gittins
policy. Unfortunately, every prior proof of Gittins’s optimality is
limited in some way. Most limitations are one of the following:
(i) Job finiteness. Most proofs assume some type of “finiteness” of the job model. This manifests as one of
(i-a) all service times being less than some finite bound,
(i-b) service time distributions being discrete with finitely
many support points, or
(i-c) finitely many job classes.
(ii) Simple job model or metric. Some proof techniques that
work for simple job models do not readily generalize.
This includes models with
(ii-a) known service times,
(ii-b) unknown, exponentially distributed service times, or
(ii-c) unknown, generally distributed service times with
nonpreemptive service.
(iii) Only considers index policies. Some proofs only show
that Gittins is an optimal index policy, as opposed to
optimal among all policies. An index policy is one that,
like Gittins, assigns each job an index based on the job’s
state and always serves the job of maximum index.
We now present prior work on the Gittins policy in rough
chronological order. Due to space limitations, we only briefly
summarize most prior proofs.
A. 1960s and 1970s: Initial Developments in Queueing
Prior Proof 1. Schrage [9].
Model: Preemptive single-server queue, known service times.
Holding costs: Same for all jobs.
Limitations: (ii-a).
Prior Proof 2. Fife [10, Section 4].
Model: Nonpreemptive M/G/1, known service times.
Holding costs: Based on class and service time.
Limitations: (i-b), (i-c), (ii-a), and (ii-c).
Prior Proof 3. Sevcik [11, Theorem 4-1].
Model: Preemptive M/G/1, known service times.
Holding costs: Based on class and service time.
Limitations: (ii-a) and (iii). Sevcik [11, Conjecture 4-1] argues
informally that an index policy should be optimal.
Prior Proof 4. Sevcik [11, Theorem 4-2].
Model: Preemptive M/G/1, unknown service times.
Holding costs: Based on class.
Limitations: (i-b), (i-c), and (iii). Sevcik [11, Conjecture 4-3]
argues informally that an index policy should be optimal.
Prior Proof 5. Von Olivier [12].
Model: Preemptive M/G/1, unknown service times.
Holding costs: Based on class and service time.
Limitations: (i-b), (i-c), and (iii).
One unique aspect of the von Olivier [12] result deserves
highlighting: jobs’ holding costs can depend on their unknown
service times. This allows minimizing metrics like mean
slowdown even when service times are unknown. However, this
result is not widely known in the queueing theory community,
perhaps in part because it has only been published in German.
Klimov [13] studied a nonpreemptive M/G/1 with feedback,
denoted M/G/1+fbk. In systems with feedback, whenever a
job exits the system, it has some probability of immediately
returning as another job, possibly of a different class.
Prior Proof 6. Klimov [13].
Model: Nonpreemptive M/G/1+fbk, unknown service times.
Holding costs: Based on class.
Limitations: (i-c) and (ii-c).
B. 1980s and 1990s: Connection to Multi-Armed Bandits
Prior Proof 7. Lai and Ying [14].
Model: Preemptive M/M/1+fbk, unknown service times.
Holding costs: Based on class.
Limitations: (i-c) and (ii-b).
Prior Proof 8. Gittins [2, Theorem 5.6].
Model: Preemptive M/G/1, unknown service times.
Holding costs: Based on class.
Limitations: (i-a) and (i-c).
Gittins’s result [2] is often cited in the literature as proving
the Gittins policy’s optimality in the M/G/1 [3–7]. As such, it
deserves some more detailed discussion.
Prior Proof 8 has two main steps. The first step simplifies
the problem by assuming the scheduler can only preempt jobs
in a discrete set of states² [2, Theorem 3.28]. The set can be
countable in principle, but the proof assumes a side condition
that is only guaranteed to hold if the set is finite. The condition
comes from translating multi-armed bandit results to the M/G/1.
The second step uses a limit argument to allow unrestricted
preemption [2, Theorem 5.6]. However, because the first step
is limited to finitely many job states, the second step’s result
is also limited. Specifically, it requires finitely many classes
and that all service times be less than some finite bound.
Prior Proof 9. Achievable region approaches. See Bertsimas
[15], Dacre et al. [16], and references therein.
Model: Preemptive M/M/1+fbk or nonpreemptive M/G/1+fbk,
unknown service times.
Holding costs: Based on class.
Limitations: (i-c), (ii-b), and (ii-c).
² In this setting, a job's state is the pair of its class and attained service.
C. 2000s and 2010s: Analyzing Gittins and Its Performance

The 2000s and 2010s did not, for the most part, see new proofs of Gittins's optimality. Researchers instead studied properties of the Gittins policy [3] and analyzed its performance [5–7, 17]. One performance analysis based on dynamic programming also gave a new optimality proof [17], but it did not expand the known scope of Gittins's optimality.

Prior Proof 10. Whittle [17].
Model: Preemptive M/M/1+fbk, unknown service times.
Holding costs: Based on class.
Limitations: (i-c) and (ii-b).

D. 2020: Modeling Jobs as General Markov Processes

Prior Proof 11. Scully et al. [18, Theorem 7.3].
Model: Preemptive M/G^MP/1, i.e. the preemptive M^B/G^MP/1 (Section III) without batch arrivals.
Holding costs: Same for all jobs.
Limitations: Assumes equal holding costs and that jobs are preemptible in any state.

Our work can be seen as a significant extension of Prior Proof 11. Specific aspects we address that Scully et al. [18] do not include varying holding costs, nonpreemptible or partially preemptible jobs, and batch arrivals.

III. SYSTEM MODEL: THE M^B/G^MP/1 QUEUE

We study scheduling in a generalization of the M/G/1 queue to minimize a variety of mean holding cost metrics. The average job arrival rate is λ, the service time distribution is S, and the load is ρ = λE[S]. We assume ρ < 1 for stability.

We call our model the M^B/G^MP/1 queue. The "M^B" indicates that jobs arrive in batches with Poisson arrival times. The "G^MP" indicates generally distributed service times, with each job's service time arising from an underlying Markov process.

The main feature of the M^B/G^MP/1 is that it models jobs as Markov processes. The key intuition is:

  A job's state encodes all information the scheduler knows about the job.

This means that the job Markov process differs depending on what information the scheduler knows. For example, to model the perfect-information case where the scheduler is told every job's service time when it arrives, a job's state might be its remaining service time, and the Markov process dynamics would be deterministic (Example III.1). On the other extreme, if the scheduler knows nothing other than the overall service time distribution S, then a job's state might be the amount of service it has received so far, and the Markov process dynamics would be stochastic (Example III.2). The M^B/G^MP/1 thus encompasses a wide variety of M/G/1-like queues.

This section explains the M^B/G^MP/1 queue in more detail. The model's main feature is that the information the scheduler knows about a job may change as the job receives service (Section III-A). A job's preemptibility (Section III-B) and holding cost (Section III-E) may also change during its service.

A. Markov-Process Jobs

We model jobs as absorbing continuous-time strong Markov processes. The state of a job encodes all information that the scheduler knows about the job. Without loss of generality, we assume all jobs share a common state space X and follow the same stochastic Markovian dynamics. However, the realization of the dynamics may be different for each job. In particular, the initial state of each job is drawn from a distribution Xnew, so different jobs may start in different states.

While a job is in service, its state stochastically advances according to the Markovian dynamics. This evolution is independent of the arrival process and the evolution of other jobs. A job's state does not change while waiting in the queue.

In addition to the main job state space X, there is one additional final state, denoted xdone. When a job enters state xdone, it completes and exits the system. One can think of a service time S as the stochastic amount of time it takes for a job to go from its initial state, which is drawn from Xnew, to the final state xdone. Because we assume E[S] < ∞, every job eventually reaches xdone with probability 1. For ease of notation, we follow the convention that xdone ∉ X.

Example III.1. To model known service times, let a job's state be its remaining service time. The state space is X = (0, ∞), the initial state distribution Xnew is the service time distribution S, and the final state is xdone = 0. During service, a job's state decreases at rate 1.

Example III.2. To model unknown service times, let a job's state be its attained service, meaning the amount of time it has been served so far. The state space is X = [0, ∞), all jobs start in initial state Xnew = 0, and the final state xdone is an isolated point. During service, a job's state increases at rate 1, but it also has a chance to jump to xdone. The jump probability depends on the service time distribution S: the probability a job jumps while being served from state x to state y > x is P[S ≤ y | S > x].
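To make these two examples concrete, the following is a minimal discrete-time sketch in Python. It is our own illustration, not part of the model itself: the class names and the step size DT are arbitrary choices, and the paper's dynamics are continuous-time.

```python
import random

DT = 0.01  # time step; illustrative discretization of the continuous-time model

class KnownSizeJob:
    """Example III.1: state = remaining service time, deterministic dynamics."""
    def __init__(self, service_time):
        self.state = service_time         # initial state drawn from S

    def serve(self, dt=DT):
        self.state = max(0.0, self.state - dt)  # decreases at rate 1

    @property
    def done(self):
        return self.state == 0.0          # x_done = 0 in this model

class UnknownSizeJob:
    """Example III.2: state = attained service, completion is a random jump."""
    def __init__(self, sample_service_time):
        self.state = 0.0                  # X_new = 0 for all jobs
        self.done = False
        self._s = sample_service_time()   # hidden realized service time

    def serve(self, dt=DT):
        # Jump to x_done exactly when the hidden service time lies in
        # (x, x + dt], which has probability P[S <= x + dt | S > x].
        if self._s <= self.state + dt:
            self.done = True
        else:
            self.state += dt              # attained service increases at rate 1

job = UnknownSizeJob(lambda: random.expovariate(1.0))
while not job.done:
    job.serve()
print("job completed with attained service about", round(job.state, 2))
```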
B. Preemptible and Nonpreemptible States

Every job state is either preemptible or nonpreemptible. The job in service can only be preempted if it is in a preemptible state. We write XP for the set of preemptible states and XNP = X \ XP for the set of nonpreemptible states. Naturally, we assume the scheduler knows which states are preemptible.

We assume all jobs start in a preemptible state, i.e. Xnew ∈ XP with probability 1. This means that all jobs in the queue are in preemptible states, and only the job in service can be in a nonpreemptible state.

We assume preemption occurs with no cost or delay. Because a job's state only changes during service, our model is preempt-resume, meaning that preemption does not cause loss of work.

C. Batch Poisson Arrival Process

In the M^B/G^MP/1, jobs arrive in batches. We represent a batch as a list of states, where the ith state is the initial state of the ith job in the batch. The batch vector has distribution Xbatch = (Xbatch,1, …, Xbatch,B), where B is the distribution
of the number of jobs per batch. The batch arrival times are
a Poisson process of rate λ/E[B], with each batch drawn
independently from Xbatch . The initial state distribution Xnew
is an aggregate distribution determined by picking a random
element from a length-biased sample of Xbatch .
We allow Xbatch to be an arbitrary distribution over lists of
preemptible states. That is, the starting states of the jobs within
a batch can be correlated with each other or with the size of a
batch. However, after arrival, jobs’ states evolve independently
of each other (Section III-A).
Our M^B/G^MP/1 model differs from the traditional M/G/1 with batch Poisson arrivals, often denoted M^X/G/1, in an important way. In the M^X/G/1, service times within a batch are drawn i.i.d. from S. The M^B/G^MP/1 is more general in that starting states within a batch can be correlated, so service times within a batch can also be correlated.
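As an illustration of this arrival process, the sketch below generates batch arrival times at rate λ/E[B] with correlated initial states within each batch. The particular Xbatch distribution is an arbitrary choice of ours, not one from the paper.

```python
import random

def sample_batch():
    """An illustrative X_batch: batch size B is 1 or 2, and jobs in a batch
    share a random scale, so their initial states are correlated (which the
    M^B/G^MP/1 permits, unlike the i.i.d. M^X/G/1)."""
    scale = random.expovariate(1.0)
    size = random.choice([1, 2])          # E[B] = 1.5
    return [scale * random.uniform(0.5, 1.5) for _ in range(size)]

def batch_arrivals(lam, mean_batch_size, horizon):
    """Batch arrival times form a Poisson process of rate lam / E[B]."""
    t, batches = 0.0, []
    while True:
        t += random.expovariate(lam / mean_batch_size)
        if t > horizon:
            return batches
        batches.append((t, sample_batch()))

for t, states in batch_arrivals(lam=1.0, mean_batch_size=1.5, horizon=10.0):
    print(f"t = {t:5.2f}: initial states {[round(x, 2) for x in states]}")
```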
D. System State
The state of the system can be described by a list (x1, …, xn). Here n is the number of jobs in the system, and xi ∈ X is the state of the ith job. We denote the equilibrium distribution of the system state as (X1, …, XN), where N is the equilibrium distribution of the number of jobs.

When discussing the equilibrium distribution of quantities under multiple scheduling policies, we use a superscript π, as in N^π, to refer to the distribution under scheduling policy π.
E. Holding Costs and Objective
We assume that each job incurs a cost for each unit of time it is not complete. Such a cost is called a holding cost, and it applies to every job. A job's holding cost depends on its state, so it may change during service. We denote the holding cost of state x ∈ X by hold(x). Holding costs have dimension COST/TIME. We assume that holding costs are deterministic, positive,³ and known to the scheduler. For ease of notation, we also define hold(xdone) = 0.

Let H = Σ_{i=1}^{N} hold(Xi) be the equilibrium distribution of
the total holding cost of all jobs in the system. Our objective
is to schedule to minimize mean holding cost E[H].
F. What Does the Scheduler Know?
We assume the scheduler knows a description of the job model: the state space X, the subset of preemptible states XP ⊆ X, and the Markovian dynamics that govern how a job's state evolves. This assumption is necessary for the Gittins policy, as the policy's definition depends on the job model.

The scheduler also knows, at every moment in time, the current state of all jobs in the system. This assumption is natural because the intuition of our model is that a job's state encodes everything the scheduler knows about the job.
Finally, we assume that the scheduler knows the holding
cost hold(x) of each state x ∈ X. However, it is possible to
transform some problems with unknown holding costs into problems with known holding costs. A notable example is minimizing mean slowdown when service times are unknown to the scheduler (Example V.2). After transforming such problems into known-holding-cost form, one can apply our results.

³ The holding cost of nonpreemptible states does not impact minimizing mean holding cost (Lemma VII.2), so one could have hold(x) ≤ 0 for x ∈ XNP.
G. Technical Foundations
We have thus far avoided discussing technical measurability
conditions that the job model must satisfy. For example, if the
job Markov process has uncountable state space X, one should
make some topological assumptions on X and XP , as well
as some continuity assumptions on holding costs. As another
example, when discussing subsets Y ⊆ XP (Definitions VI.1
and IV.2), one should restrict attention to measurable subsets.
See Scully et al. [18, Appendix D] for additional discussion.
We consider these technicalities outside the scope of this
paper. All of our results are predicated on being able to
apply basic optimal stopping theory to solve the Gittins game
(Section VI). Optimal stopping of general Markov processes
is a broad field, and the theory has been developed under
many different types of assumptions [19]. Our main result
(Theorem V.1) can be understood as proving Gittins’s optimality
in any setting where optimal stopping theory of the Gittins
game has been developed.
IV. THE GITTINS POLICY

We now define the Gittins policy, the scheduling policy that minimizes mean holding cost in the M^B/G^MP/1 (Section III).
Before defining Gittins, we discuss its intuitive motivation.
Suppose we are scheduling with the goal of minimizing mean
holding cost. How do we decide which job to serve? Because
our objective is minimizing mean holding cost, our aim should
be to quickly lower the holding cost of jobs in the system.
We can lower a job’s holding cost by completing it, in which
case its holding cost becomes hold(xdone ) = 0, or by serving
it until it reaches a state with lower holding cost.
The basic idea of Gittins is to always serve the job whose
holding cost we can decrease the fastest. To formalize this
description, we need to define what it means for a job’s holding
cost to decrease at a certain rate.
A. Gittins Index
As a warm-up, consider the setting of Example III.1: the
scheduler knows every job’s service time, and a job’s state is its
remaining service time. Suppose that every state is preemptible.
How quickly can we decrease the holding cost of a job in state x, meaning remaining service time x? Serving a job from state x to state y takes x − y time and decreases the job's holding cost by hold(x) − hold(y), so the holding cost decreases at rate (hold(x) − hold(y))/(x − y). To find the fastest possible decrease, we optimize over y:

  maximum holding cost decrease rate from x = sup_{y ∈ [0, x)} (hold(x) − hold(y))/(x − y).
The above quantity is called the (Gittins) index of state x. A
state’s index is the maximum rate at which we can decrease
its holding cost by serving it for some amount of time.
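The warm-up formula is easy to evaluate numerically. The following sketch, our own illustration, discretizes the supremum over y; with constant holding cost 1 it recovers index(x) = 1/x, consistent with Gittins reducing to SRPT in this setting (Section IV-B).

```python
def index_known_size(hold, x, grid=10**4):
    """Warm-up Gittins index for known service times: the fastest average rate
    at which serving from remaining size x down to some y < x decreases the
    job's holding cost. Convention: hold(0) = 0, since x_done = 0 here."""
    return max((hold(x) - hold(x * i / grid)) / (x - x * i / grid)
               for i in range(grid))

# Constant holding cost 1: index(x) = 1/x, so rank(x) = x, recovering SRPT.
hold = lambda y: 0.0 if y == 0 else 1.0
print(index_known_size(hold, x=2.0))  # prints 0.5 = 1/2
```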
To generalize the above discussion to general job models,
we need to make two changes. Firstly, because a job’s state
dynamics can be stochastic, we need to consider serving it until
it enters a set of states Y. Secondly, because we cannot stop
serving a job while it is nonpreemptible, we require Y ⊆ XP .
Definition IV.1. For all x ∈ X and Y ⊆ XP, let

  Serve(x, Y) = (service needed for a job starting in state x to first enter Y ∪ {xdone}),
  serve(x, Y) = E[Serve(x, Y)],
  Hold(x, Y) = (holding cost of a job starting in state x when it first enters Y ∪ {xdone}),
  hold(x, Y) = E[Hold(x, Y)].
To clarify, Serve(x, Y) and Hold(x, Y) are distributions. If
x ∈ Y, then Serve(x, Y) = 0 and Hold(x, Y) = hold(x).
If we serve a job from state x until it enters Y, its holding
cost decreases at rate (hold(x) − hold(x, Y))/ serve(x, Y) on
average. We obtain a state’s Gittins index by optimizing over Y.
Definition IV.2. The (Gittins) index of state x ∈ X is

  index(x) = sup_{Y ⊆ XP} (hold(x) − hold(x, Y)) / serve(x, Y).
When we say that a job has a certain index, we mean that the
job’s current state has that index.
Given the definition of the Gittins index, the Gittins policy
boils down to one rule: at every moment in time, unless the
job in service is nonpreemptible, serve the job of maximal
Gittins index, breaking ties arbitrarily.
Because the Gittins index depends on the job model, it might
be more accurate to view Gittins not as one specific policy but
rather as a family of policies, with one instance for every job
model. When we refer to “the” Gittins policy, we mean the
Gittins policy for the current system’s job model.
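Stated as code, the rule is short. The sketch below is our own illustration, with placeholder job objects and placeholder index and preemptible functions; it makes one Gittins scheduling decision.

```python
def gittins_decision(jobs, in_service, index, preemptible):
    """One Gittins scheduling decision. `jobs` is the collection of jobs
    present, `in_service` is the job currently in service (or None), and
    `index` and `preemptible` are functions of a job's state."""
    if in_service is not None and not preemptible(in_service.state):
        return in_service  # cannot preempt a nonpreemptible job
    # Serve the job of maximal Gittins index, breaking ties arbitrarily.
    return max(jobs, key=lambda job: index(job.state), default=None)
```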
B. Gittins Rank

Some work on the Gittins policy refers to the (Gittins) rank of a state [6, 11, 18, 20], which is the reciprocal of its index:

  rank(x) = 1/index(x).

Gittins thus always serves the job of minimal rank.

The Gittins rank sometimes has a more intuitive interpretation than the Gittins index. For instance, when jobs have known service times and constant holding cost 1, Gittins reduces to SRPT, and a job's rank is its remaining service time.

We use both the index and rank conventions in this work. This section mostly uses the index convention. Sections VI and VII, which prove Gittins's optimality, use the rank convention because it better matches the authors' intuitions, though this choice is certainly subjective.

V. SCOPE OF GITTINS'S OPTIMALITY

Our main result is that Gittins is optimal in the M^B/G^MP/1 with arbitrary state-based holding costs. Specifically, Gittins is optimal among nonclairvoyant scheduling policies, which are policies that make scheduling decisions based only on the current and past system states.

Theorem V.1. The Gittins policy minimizes mean holding cost in the M^B/G^MP/1. That is, for all nonclairvoyant policies π,

  E[H^Gittins] ≤ E[H^π].

All of the prior optimality results discussed in Section II are special cases of Theorem V.1. This makes Theorem V.1 a unifying theorem for Gittins's optimality in M/G/1-like systems. Theorem V.1 also holds in scenarios not covered by any prior result. For instance, no prior result handles batch arrivals or holding costs that change during service.

A. Mean Slowdown and Unknown Holding Costs

Recall from Section III-E that we assume that the holding cost of every job state is known to the scheduler. However, some scheduling problems involve unknown holding costs. An important example is minimizing mean slowdown, in which a job's holding cost is the reciprocal of its service time. Unless all service times are known to the scheduler, this involves unknown holding costs.

Fortunately, we can transform many problems with unknown holding costs into problems with known holding costs. Suppose a job's current unknown holding cost depends only on its current and future states. Then for all job states x ∈ X, let

  hold(x) = E[unknown holding cost of a job in state x | job reached state x],   (V.1)

where the expectation is taken over a random realization of a job's path through the state space. The mean holding cost of nonclairvoyant policies is unaffected by this transformation.

Example V.2 (Gittins for mean slowdown). Consider the system from Example III.2. It has unknown service times, and a job's state x is its attained service. Suppose all states are preemptible. To minimize mean slowdown, we give a job with service time s holding cost s⁻¹. This turns (V.1) into hold(x) = E[S⁻¹ | S > x], and the Gittins index becomes

  index(x) = sup_{y > x} E[S⁻¹ 1(S ≤ y) | S > x] / E[min{S, y} − x | S > x].
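One can sanity-check this formula by Monte Carlo. The sketch below is our own illustration; it takes the supremum over an arbitrary finite grid of thresholds y, whereas the true index optimizes over all y > x.

```python
import random

def slowdown_index(samples, x, ys):
    """Monte Carlo estimate of Example V.2's index at attained service x,
    taking the sup over a finite grid `ys` of candidate thresholds y > x.
    `samples` are i.i.d. draws from the service time distribution S."""
    tail = [s for s in samples if s > x]        # condition on S > x
    best = float("-inf")
    for y in ys:
        num = sum(1.0 / s for s in tail if s <= y) / len(tail)
        den = sum(min(s, y) - x for s in tail) / len(tail)
        best = max(best, num / den)
    return best

random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]
print(slowdown_index(samples, x=0.5, ys=[0.6, 1.0, 2.0, 5.0]))
```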
VI. THE GITTINS GAME

In this section we introduce the Gittins game, which is an optimization problem concerning a single job. The Gittins game serves two purposes. Firstly, it gives an alternative intuition for the Gittins rank. Secondly, its properties are important for proving Gittins's optimality. We define the Gittins game (Section VI-A), study its properties (Sections VI-B–VI-D), and explain its relationship to the Gittins rank (Section VI-E).

A. Defining the Gittins Game

The Gittins game is an optimal stopping problem concerning a single job. We are given a job in some starting state x ∈ X and a penalty parameter r ≥ 0, which has dimension TIME²/COST.

The goal of the Gittins game is to end the game as soon as possible. The game proceeds as follows.
We begin by serving the job. The job's state evolves as usual during service (Section III-A). If the job completes, namely by reaching state xdone, the game ends immediately. Whenever the job's state is preemptible, we may give up. If we do so, we stop serving the job, and the game ends after deterministic delay r hold(y), where y ∈ XP is the job's state when we give up.

We assume the job's current state is always visible. Playing the Gittins game thus boils down to deciding whether or not to give up based on the job's current state.

Because the job's state evolution is Markovian, the Gittins game is a Markovian optimal stopping problem. This means there is an optimal policy of the following form: for some give-up set Y ⊆ XP, give up when the job's state first enters Y. The strong Markov property implies that this set Y need not depend on the starting state, though it may depend on the penalty parameter. We use this observation and Definition IV.1 to formally define the Gittins game.

Definition VI.1. The Gittins game is the following optimization problem. The parameters are a starting state x ∈ X and penalty parameter r, and the control is a give-up set Y ⊆ XP. The cost of give-up set Y is

  game(x, r, Y) = serve(x, Y) + r hold(x, Y).

The objective is to choose Y to minimize game(x, r, Y). The optimal cost or cost-to-go function of the Gittins game is

  game(x, r) = inf_{Y ⊆ XP} game(x, r, Y).   (VI.1)

B. Shape of the Cost-To-Go Function

To gain some intuition for the Gittins game, we begin by proving some properties of the cost-to-go function, focusing on its behavior as the penalty parameter varies.

Lemma VI.2. For all x ∈ X and r ≥ 0, the cost-to-go function game(x, r) is (i) nondecreasing in r, (ii) concave in r, (iii) bounded by game(x, r) ≤ serve(x, XP) + r hold(x, XP), and (iv) bounded by game(x, r) ≤ serve(x, ∅). When x ∈ XP, property (iii) becomes game(x, r) ≤ r hold(x).

Proof. Properties (i) and (ii) follow from (VI.1), which expresses game(x, r) as an infimum of nondecreasing concave functions of r. Properties (iii) and (iv) follow from the fact that two possible give-up sets are XP, meaning giving up as soon as possible, and ∅, meaning never giving up. The simplification when x ∈ XP is due to Definition IV.1.

C. Optimal Give-Up Set

We now characterize one possible solution to the Gittins game. Because the Gittins game is a Markovian optimal stopping problem, we never need to look back at past states when deciding when to give up. This means we can find an optimal give-up set that depends only on the penalty parameter r. We ask for each preemptible state: is it optimal to give up immediately if we start in this state? The set of states for which we answer yes is an optimal give-up set.

Definition VI.3. The optimal give-up set for the Gittins game with penalty parameter r is

  Y*(r) = {x ∈ XP | game(x, r) = r hold(x)}.

Note that Y*(0) = XP. We also let Y*(∞) = ∅. For simplicity of language, we call Y*(r) "the" optimal give-up set, even though there may be other optimal give-up sets.

Basic results in optimal stopping theory [19] imply that game(x, r) = game(x, r, Y*(r)), so the infimum in (VI.1) is always attained, namely by Y*(r).

The sets Y*(r) are monotonic in r, i.e. Y*(r) ⊇ Y*(r′) for all r ≤ r′. This is because increasing the penalty makes giving up less attractive, so giving up is optimal in fewer states.

For most of the rest of this paper, when we discuss the Gittins game, we consider strategies that use optimal give-up sets, so we simplify the notation for that case.

Definition VI.4. For all x ∈ X and r ≥ 0, let

  Serve(x, r) = Serve(x, Y*(r)),

and similarly for serve(x, r), Hold(x, r), and hold(x, r).

D. Derivative of the Cost-To-Go Function

Suppose we solve the Gittins game for penalty parameter r, then change the penalty parameter to r ± ε for some small ε > 0. One would expect that the give-up set Y*(r) is nearly optimal for the new penalty parameter r ± ε, which would imply game(x, r ± ε) ≈ serve(x, r) + (r ± ε) hold(x, r). One can use Lemma VI.2 and a classic envelope theorem [21, Theorem 1] to formalize this argument. For brevity, we omit the proof. See Scully et al. [18, Lemma 5.3] for a similar proof.

Lemma VI.5. For all x ∈ XP, the function r ↦ game(x, r) is differentiable almost everywhere with derivative

  (d/dr) game(x, r) = hold(x, r).

E. Relationship to the Gittins Rank

The Gittins game and the optimal give-up set are closely related to the Gittins rank. In fact, we can use the Gittins game to give an alternative definition of a state's rank. For brevity, we simply state the connection below.

Lemma VI.6.
(i) For all r ≥ 0, we can write the optimal give-up set as Y*(r) = {x ∈ XP | rank(x) ≥ r}.
(ii) For all x ∈ XP, we can write the Gittins rank of x as rank(x) = max{r ≥ 0 | x ∈ Y*(r)}.
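Lemma VI.6 suggests a concrete way to compute ranks: solve the Gittins game for each penalty parameter r and record where giving up is optimal. The sketch below, a toy of our own construction, does this by value iteration for a discrete-time analogue of the game; the paper's game is continuous-time, so this is only an analogue.

```python
def game_cost(P, hold, r, iters=3000):
    """Value iteration for a discrete-time analogue of the Gittins game: in
    each (preemptible) state, either give up now, paying r * hold(x), or pay
    1 unit of service time and continue. P[x] maps next states to
    probabilities; the absorbing state "done" costs nothing."""
    g = {x: 0.0 for x in P}
    for _ in range(iters):
        g = {x: min(r * hold[x],
                    1.0 + sum(p * g.get(nx, 0.0) for nx, p in P[x].items()))
             for x in P}
    return g

# Toy job: from state "a", one step of service completes it w.p. 0.5, else it
# moves to "b", a state from which completion takes 10 steps on average.
P = {"a": {"done": 0.5, "b": 0.5}, "b": {"done": 0.1, "b": 0.9}}
hold = {"a": 1.0, "b": 1.0}

def rank(x, r_grid):
    # Lemma VI.6(ii): rank(x) = max{r | x in Y*(r)}, where x in Y*(r) means
    # game(x, r) = r * hold(x), i.e. giving up immediately is optimal.
    return max(r for r in r_grid
               if game_cost(P, hold, r)[x] >= r * hold[x] - 1e-9)

r_grid = [0.25 * i for i in range(1, 81)]
print({x: rank(x, r_grid) for x in P})  # expect rank(a) = 2.0, rank(b) = 10.0
```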
VII. PROVING GITTINS'S OPTIMALITY

We now prove Theorem V.1, namely that Gittins minimizes mean holding cost in the M^B/G^MP/1. Our proof has four steps. We begin by showing that minimizing mean holding cost E[H] is equivalent to minimizing the mean preemptible holding cost E[HP], which only counts the holding costs of jobs in preemptible states (Section VII-A). We define a new quantity called r-work, the amount of work in the system "below rank r"
(Section VII-B). We show how to relate an integral of r-work
to the preemptible holding cost HP (Section VII-C), with more r-work implying higher holding cost. We show that
Gittins minimizes mean r-work for all r ≥ 0, so it also
minimizes E[H] (Section VII-D).
A. Preemptible and Nonpreemptible Holding Costs

Definition VII.1. The system's preemptible holding cost is the total holding cost of all jobs in the system whose states are preemptible. It has equilibrium distribution HP = Σ_{i=1}^{N} 1(Xi ∈ XP) hold(Xi), where 1 is the indicator function. The nonpreemptible holding cost is defined analogously as HNP = Σ_{i=1}^{N} 1(Xi ∈ XNP) hold(Xi).

Our goal is to show that Gittins minimizes mean holding cost E[H] = E[HP] + E[HNP]. The lemma below shows that E[HNP] is unaffected by the scheduling policy. Minimizing E[H] thus amounts to minimizing E[HP].

Lemma VII.2. In the M^B/G^MP/1, the mean nonpreemptible holding cost has the same value under all scheduling policies:

  E[HNP] = λ E[total cost a job accrues while in a nonpreemptible state during service].

Proof. By a generalization of Little's law [1],

  E[HNP] = λ E[total cost a job accrues while in a nonpreemptible state].

The desired statement follows from the fact that if a job's state is nonpreemptible, it must be in service (Section III-B).
B. Defining r-Work

Definition VII.3. The (job) r-work of state x is Serve(x, r), namely the amount of service it requires to either complete or enter a preemptible state of rank at least r.⁴ The (system) r-work is the total r-work of all jobs in the system. Its equilibrium distribution, denoted W(r), is

  W(r) = Σ_{i=1}^{N} Serve(Xi, r),

where (X1, …, XN) is the equilibrium system state (Section III-D). In particular, we can think of W(0) as the amount of nonpreemptible work in the system.

Lemma VII.4. For all r ≥ 0,

  E[W(r)] = E[Σ_{i=1}^{N} serve(Xi, r)].

Proof. This follows from the law of total expectation and the fact that E[Serve(Xi, r) | Xi] = serve(Xi, r).

⁴ Strictly speaking, Definitions IV.1 and VI.4 introduce Serve(x, r) as a distribution, so the r-work of a job in state x is not Serve(x, r) itself but rather a random variable with distribution Serve(x, r).

C. Relating r-Work to Holding Cost

Theorem VII.5. In the M^B/G^MP/1, under all nonclairvoyant policies,

  E[HP] = ∫₀^∞ (E[W(r)] − E[W(0)]) / r² dr.

Proof. By Lemma VII.4 and the definition of HP it suffices to show that for all x ∈ XP,

  hold(x) = ∫₀^∞ (serve(x, r) − serve(x, 0)) / r² dr.   (VII.1)

Because x ∈ XP, it is optimal to give up in state x when playing the Gittins game with penalty parameter 0, so

  serve(x, 0) = 0,   hold(x, 0) = hold(x).

Using Lemma VI.5, we compute

  (d/dr)(game(x, r)/r) = (r hold(x, r) − game(x, r))/r² = −serve(x, r)/r².

This means the integral in (VII.1) becomes a difference between two limits. Using Lemmas VI.2 and VI.5, we compute

  ∫₀^∞ serve(x, r)/r² dr = lim_{r→0} game(x, r)/r − lim_{r→∞} game(x, r)/r = hold(x, 0) − 0 = hold(x).

Theorem VII.5 implies that to minimize E[HP], it suffices to minimize E[W(r)] − E[W(0)] for all r ≥ 0. It turns out that E[W(0)], much like E[HNP], is unaffected by the scheduling policy, so it suffices to minimize mean r-work E[W(r)]. We omit the proof, as it is very similar to that of Lemma VII.2.

Lemma VII.6. In the M^B/G^MP/1, the mean 0-work E[W(0)] has the same value under all scheduling policies.
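As a numeric sanity check of (VII.1), consider the known-service-time case with constant holding cost 1 (Example III.1), where rank(x) = x: then serve(x, r) = 0 for r ≤ x, since x ∈ Y*(r), and serve(x, r) = x for r > x, since the job must run to completion before entering Y*(r). The integral should then equal hold(x) = 1, and the short script below, our own illustration, confirms this.

```python
def integral_check(x, r_max=10**4, steps=10**6):
    """Numerically integrate serve(x, r) / r^2 over r, where serve(x, r) = 0
    for r <= x and serve(x, r) = x for r > x (the SRPT special case)."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        serve = x if r > x else 0.0
        total += serve / r**2 * dr
    return total

print(integral_check(x=2.0))  # about 1.0 = hold(x), up to truncation at r_max
```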
D. Gittins Minimizes Mean r-Work
Lemmas VII.2 and VII.6 and Theorem VII.5 together imply
that if a scheduling policy minimizes mean r-work E[W (r)] for
all r ≥ 0, then it minimizes mean holding cost E[H]. We show
that Gittins does exactly this, implying Gittins’s optimality.
Theorem VII.7. The Gittins policy minimizes mean r-work in
the M^B/G^MP/1. That is, for all scheduling policies π and r ≥ 0,

  E[W^Gittins(r)] ≤ E[W^π(r)].
Before proving Theorem VII.7, we introduce the main ideas
behind the proof. For the rest of this section, fix arbitrary r ≥ 0.
We classify jobs in the system into two types.
• A job is r-good if it is nonpreemptible or has Gittins rank less than r, i.e. its state is in X \ Y*(r).
• A job is r-bad if it has Gittins rank at least r, i.e. its state is in Y*(r).
During service, a job may alternate between being r-good
and r-bad. Gittins minimizes r-work because the jobs that
contribute to r-work are exactly the r-good jobs, and Gittins
always prioritizes r-good jobs over r-bad jobs. This means
that whenever the amount of r-work in the system is positive,
Gittins decreases it at rate 1, which is as quickly as possible.
Given that Gittins decreases r-work as quickly as possible,
does Theorem VII.7 immediately follow? The answer is no:
we need to look not just at how r-work decreases but also at
how it increases. Two types of events increase r-work.
• Arrivals can add r-work to the system.
• During service, a job can transition from being r-bad to
being r-good as its state evolves. Using the terminology
of Scully et al. [6, 18], we call this r-recycling the
job. Every r-recycling adds r-work to the system.
Arrivals are outside of the scheduling policy’s control, but
r-recyclings occur at different times under different scheduling
policies. Because Gittins prioritizes r-good jobs over r-bad
jobs, all r-recyclings occur when there is zero r-work. It turns
out that because the batch arrival process is Poisson, this
r-recycling timing minimizes mean r-work.
Proof of Theorem VII.7. We are comparing Gittins to an arbitrary scheduling policy π. It is convenient to allow π to be more
powerful than an ordinary policy: we allow π to devote infinite
processing power to r-bad jobs. This has two implications:
• Whenever there is r-work in the system, π controls at
what rate it decreases, where 1 is the maximum rate.
• Regardless of the rate at which r-work is decreasing,
whenever there is an r-bad job in the system, π controls at
what moment in time it either completes or is r-recycled.
A straightforward interchange argument shows that it suffices to
only compare against policies π which are “r-work-conserving”,
meaning they decrease r-work at rate 1 whenever r-work is
nonzero. Gittins is also r-work-conserving.
It remains only to show that among r-work-conserving
policies, mean r-work is minimized by only r-recycling jobs
when r-work is zero. This follows from classic decomposition
results for the M/G/1 with generalized vacations [22]. We first
explain how to view the r-work in the M^B/G^MP/1 as the virtual work in a vacation system.⁵
• Interpret a batch adding s r-work to the M^B/G^MP/1 as an arrival of service time s in the vacation system.
• Interpret an r-recycling adding v r-work to the M^B/G^MP/1 as a vacation of length v in the vacation system.
Using the above interpretation, a vacation system result of
Miyazawa [22, Theorem 3.3] implies
  E[W^π(r)] = c₁ + c₂ E[r-work sampled immediately before π r-recycles a job],

where c₁ and c₂ are constants that depend on the system
where c1 and c2 are constants that depend on the system
parameters but not on the scheduling policy π. Because Gittins
prioritizes r-good jobs over r-bad jobs, Gittins only r-recycles
when r-work is zero. This means the expectation on the
right-hand side is zero under Gittins. But the expectation is
nonnegative in general, so Gittins minimizes mean r-work.
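As an illustration of Theorem VII.7's consequence, Theorem V.1, the simulation sketch below (our own; all parameter values are arbitrary) compares time-average holding cost under FCFS and under Gittins in a two-class M/M/1 with unknown exponential service times, where Gittins reduces to the preemptive cµ rule.

```python
import random

def simulate(policy, horizon=10**5, seed=1):
    """Two-class M/M/1: class k has Poisson(lam[k]) arrivals, Exp(mu[k])
    service times unknown to the scheduler, and holding cost c[k]. With
    exponential (memoryless) states, each job's Gittins index is the
    constant c[k] * mu[k], so Gittins is the preemptive c-mu rule."""
    random.seed(seed)
    lam, mu, c = [0.3, 0.3], [1.0, 2.0], [1.0, 3.0]  # load = 0.45
    t, jobs, cost, order = 0.0, [], 0.0, 0  # jobs: (arrival order, class)
    next_arrival = [random.expovariate(l) for l in lam]
    while t < horizon:
        if policy == "gittins" and jobs:
            serving = max(jobs, key=lambda j: c[j[1]] * mu[j[1]])
        else:
            serving = min(jobs) if jobs else None   # FCFS by arrival order
        # Memorylessness makes resampling the remaining service time valid.
        t_done = t + random.expovariate(mu[serving[1]]) if serving else float("inf")
        t_event = min(t_done, min(next_arrival))
        cost += sum(c[j[1]] for j in jobs) * (t_event - t)  # accumulate holding cost
        t = t_event
        if t_done <= min(next_arrival):
            jobs.remove(serving)
        else:
            k = next_arrival.index(min(next_arrival))
            jobs.append((order, k)); order += 1
            next_arrival[k] = t + random.expovariate(lam[k])
    return cost / horizon  # time-average total holding cost, estimating E[H]

print("FCFS:", simulate("fcfs"), " Gittins/c-mu:", simulate("gittins"))
```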
VIII. CONCLUSION
We have given the first fully general statement (Theorem V.1)
and proof of Gittins’s optimality in the M/G/1. This simultaneously improves upon, unifies, and generalizes prior proofs,
all of which either apply only in special cases or require limiting technical assumptions (Section II).

⁵ Virtual work in a vacation system is the total remaining service time of all jobs in the system plus, if a vacation is in progress, the remaining vacation time.
We believe Gittins’s optimality holds even more generally
than we have shown. For example, our proof likely generalizes
to settings with “branching” jobs or additional priority constraints on the scheduler [23, Section 4.7]. It is also sometimes
possible to strengthen the sense in which Gittins is optimal.
For example, SRPT is optimal for non-Poisson arrival times,
and Gittins sometimes stochastically minimizes holding cost
in addition to minimizing the mean.
REFERENCES
[1] S. L. Brumelle, “On the relation between customer and time averages in
queues,” J. Appl. Probab., vol. 8, no. 3, pp. 508–520, 1971.
[2] J. C. Gittins, Multi-Armed Bandit Allocation Indices, 1st ed., ser. Wiley-Interscience Series in Systems and Optimization. Chichester, UK: Wiley,
1989.
[3] S. Aalto, U. Ayesta, and R. Righter, “On the Gittins index in the M/G/1
queue,” Queueing Syst., vol. 63, no. 1-4, pp. 437–458, Dec. 2009.
[4] ——, “Properties of the Gittins index with application to optimal
scheduling,” Prob. Eng. Inf. Sci., vol. 25, no. 3, pp. 269–288, Jul. 2011.
[5] E. Hyytiä, S. Aalto, and A. Penttinen, “Minimizing slowdown in
heterogeneous size-aware dispatching systems,” SIGMETRICS Perform.
Eval. Rev., vol. 40, no. 1, pp. 29–40, Jun. 2012.
[6] Z. Scully, M. Harchol-Balter, and A. Scheller-Wolf, “SOAP: One clean
analysis of all age-based scheduling policies,” Proc. ACM Meas. Anal.
Comput. Syst., vol. 2, no. 1, Apr. 2018.
[7] Z. Scully, L. van Kreveld, O. J. Boxma, J.-P. Dorsman, and A. Wierman,
“Characterizing policies with optimal response time tails under heavy-tailed job sizes,” Proc. ACM Meas. Anal. Comput. Syst., vol. 4, no. 2,
Jun. 2020.
[8] J. C. Gittins, “Bandit processes and dynamic allocation indices,” J. R.
Statist. Soc. B, vol. 41, no. 2, pp. 148–164, Jan. 1979.
[9] L. E. Schrage, “A proof of the optimality of the shortest remaining
processing time discipline,” Oper. Res., vol. 16, no. 3, pp. 687–690, Jun.
1968.
[10] D. W. Fife, “Scheduling with random arrivals and linear loss functions,”
Manag. Sci., vol. 11, no. 3, pp. 429–437, Jan. 1965.
[11] K. C. Sevcik, “The use of service time distributions in scheduling,” Ph.D.
dissertation, University of Chicago, Chicago, IL, Aug. 1971.
[12] G. von Olivier, “Kostenminimale prioritäten in wartesystemen vom typ
M/G/1 [Cost-minimum priorities in queueing systems of type M/G/1],”
Elektron. Rechenanl., vol. 14, no. 6, pp. 262–271, Dec. 1972.
[13] G. P. Klimov, “Time-sharing service systems. I,” Theory Probab. Appl.,
vol. 19, no. 3, pp. 532–551, 1974.
[14] T. L. Lai and Z. Ying, “Open bandit processes and optimal scheduling
of queueing networks,” Adv. Appl. Probab., vol. 20, no. 2, pp. 447–472,
1988.
[15] D. Bertsimas, “The achievable region method in the optimal control of
queueing systems; formulations, bounds and policies,” Queueing Syst.,
vol. 21, no. 3, pp. 337–389, Sep. 1995.
[16] M. Dacre, K. D. Glazebrook, and J. Niño-Mora, “The achievable region
approach to the optimal control of stochastic systems,” J. R. Statist. Soc.
B, vol. 61, no. 4, pp. 747–791, 1999.
[17] P. Whittle, “Tax problems in the undiscounted case,” J. Appl. Probab.,
vol. 42, no. 3, pp. 754–765, Sep. 2005.
[18] Z. Scully, I. Grosof, and M. Harchol-Balter, “The Gittins policy is nearly
optimal in the M/G/k under extremely general conditions,” Proc. ACM
Meas. Anal. Comput. Syst., vol. 4, no. 3, Nov. 2020.
[19] G. Peskir and A. N. Shiryaev, Optimal Stopping and Free-Boundary
Problems, ser. Lectures in Mathematics. ETH Zürich. Basel: Birkhäuser
Verlag, 2006.
[20] K. C. Sevcik, “Scheduling for minimum total loss using service time
distributions,” J. ACM, vol. 21, no. 1, pp. 66–75, Jan. 1974.
[21] P. Milgrom and I. Segal, “Envelope theorems for arbitrary choice sets,”
Econometrica, vol. 70, no. 2, pp. 583–601, Mar. 2002.
[22] M. Miyazawa, “Decomposition formulas for single server queues with
vacations: A unified approach by the rate conservation law,” Commun.
Statist.—Stochastic Models, vol. 10, no. 2, pp. 389–413, Jan. 1994.
[23] J. C. Gittins, K. D. Glazebrook, and R. Weber, Multi-Armed Bandit
Allocation Indices, 2nd ed. Chichester, UK: Wiley, 2011.