
The Gittins Policy in the M/G/1 Queue

2021 19th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt)


Ziv Scully* and Mor Harchol-Balter*
Carnegie Mellon University, Pittsburgh, PA, USA
[email protected], [email protected]

* Supported by NSF grant nos. CMMI-1938909 and CSR-1763701 and a Google Faculty Award.

Abstract—The Gittins policy is a highly general scheduling policy that minimizes a wide variety of mean holding cost metrics in the M/G/1 queue. Perhaps most famously, Gittins minimizes mean response time in the M/G/1 when jobs' service times are unknown to the scheduler. Gittins also minimizes weighted versions of mean response time. For example, the well-known "cµ rule", which minimizes class-weighted mean response time in the multiclass M/M/1, is a special case of Gittins. However, despite the extensive literature on Gittins in the M/G/1, it contains no fully general proof of Gittins's optimality. This is because Gittins was originally developed for the multi-armed bandit problem. Translating arguments from the multi-armed bandit to the M/G/1 is technically demanding, so it has only been done rigorously in some special cases. The extent of Gittins's optimality in the M/G/1 is thus not entirely clear. In this work we provide the first fully general proof of Gittins's optimality in the M/G/1. The optimality result we obtain is even more general than was previously known. For example, we show that Gittins minimizes mean slowdown in the M/G/1 with unknown or partially known service times, and we show that Gittins's optimality holds under batch arrivals. Our proof uses a novel approach that works directly with the M/G/1, avoiding the difficulties of translating from the multi-armed bandit problem.

TABLE I.1
GITTINS OPTIMALITY RESULTS FOR M/G/1-LIKE QUEUES

Holding Cost                   Model         Preemption   Service Times    Prior Proofs^a
All equal                      M/G/1         allowed      known            1
All equal                      M/G^MP/1      allowed      see Section III  11
By class                       M/G/1         allowed      unknown          4, 8
By class                       M/G/1+fbk^b   not allowed  unknown          6, 9
By class                       M/M/1+fbk^b   allowed      unknown          7, 9, 10
By class and service time^c    M/G/1         not allowed  known            2
By class and service time^c    M/G/1         allowed      known            3
By class and service time^c    M/G/1         allowed      unknown          5

^a A list of prior proofs appears in Section II.
^b Here "fbk" stands for feedback, meaning that whenever a job exits the system, it has some probability of being replaced by another job.
^c This includes minimizing mean slowdown, in which a job's holding cost is the reciprocal of its service time.

I. INTRODUCTION

Scheduling to minimize mean holding cost in queueing systems is an important problem. Minimizing metrics such as mean response time, weighted mean response time, and mean slowdown can all be viewed as special cases of minimizing holding cost [1].¹ In single-server queueing systems, specifically the M/G/1 and similar systems, a number of scheduling policies minimize mean holding cost in various special cases. Two famous examples are the Shortest Remaining Processing Time (SRPT) policy, which minimizes mean response time when service times are known to the scheduler, and the "cµ rule", which minimizes weighted mean response time in the multiclass M/M/1 with unknown service times.

¹ A job's response time is the amount of time between its arrival and completion. Jobs may be sorted into classes which are weighted by importance. A job's slowdown is the ratio between its response time and service time.

It turns out that there is a policy that minimizes mean holding cost in the M/G/1 under very general conditions. This policy, now known as the Gittins policy after one of its principal creators [2], has a relatively simple form. Gittins assigns each job an index, which is a rating roughly corresponding to how valuable it would be to serve that job. A job's index depends only on its own state, not the state of any other jobs. Gittins then serves the job of maximal index at every decision time. The "magic" of Gittins is in how it determines each job's index, which we describe in detail in Section IV. SRPT and the cµ rule are both special cases of Gittins.
Given its generality, it is perhaps unsurprising that the Gittins policy has been discovered several times. Similarly, there are several proofs of its optimality under varying conditions. Table I.1 summarizes several M/G/1-like settings in which Gittins has been studied, and Section II gives a more detailed overview. In light of this substantial body of prior work, the optimality of Gittins for minimizing mean holding cost in the M/G/1 is widely accepted in the literature [3–7]. We ourselves are researchers whose work often cites Gittins's optimality in the M/G/1. However, in reviewing the literature, we found that there is no complete proof of Gittins's optimality in its full generality. This is in part because Gittins was originally developed not for the M/G/1 but for the Markovian multi-armed bandit problem [8]. There are elegant arguments for Gittins's optimality in the multi-armed bandit problem, but they do not easily translate to the M/G/1. Results for the M/G/1 thus suffer from a variety of limitations (Section II), so the extent of Gittins's optimality in the M/G/1 is not entirely clear.

In this work, we give a unifying presentation of the Gittins policy in M/G/1-like systems, resulting in the most general definition of Gittins (Definition IV.2) and optimality theorem (Theorem V.1) to date. Our approach deals directly with the M/G/1, avoiding the difficulties of translating from the multi-armed bandit problem. As a result, we actually extend the known scope of Gittins's optimality, such as including systems with batch arrivals. We make the following contributions:
• We discuss many prior proofs of Gittins's optimality, detailing the limitations of each one (Section II).
• We give a new general definition of the Gittins policy (Section IV). This involves introducing a new generalization of the M/G/1 called the M^B/G^MP/1 queue (Section III).
• We state (Section V) and prove (Sections VI and VII) Gittins's optimality in the M^B/G^MP/1.

II. HISTORY OF THE GITTINS POLICY IN THE M/G/1

In this section we review prior work on the Gittins policy in M/G/1-like queues. This includes work on special cases of Gittins, such as SRPT in the case of known service times, that are not typically thought of as instances of the Gittins policy. Unfortunately, every prior proof of Gittins's optimality is limited in some way. Most limitations are one of the following:
(i) Job finiteness. Most proofs assume some type of "finiteness" of the job model. This manifests as one of (i-a) all service times being less than some finite bound, (i-b) service time distributions being discrete with finitely many support points, or (i-c) finitely many job classes.
(ii) Simple job model or metric. Some proof techniques that work for simple job models do not readily generalize.
This includes models with (ii-a) known service times, (ii-b) unknown, exponentially distributed service times, or (ii-c) unknown, generally distributed service times with nonpreemptive service.
(iii) Only considers index policies. Some proofs only show that Gittins is an optimal index policy, as opposed to optimal among all policies. An index policy is one that, like Gittins, assigns each job an index based on the job's state and always serves the job of maximum index.

We now present prior work on the Gittins policy in rough chronological order. Due to space limitations, we only briefly summarize most prior proofs.

A. 1960s and 1970s: Initial Developments in Queueing

Prior Proof 1. Schrage [9].
Model: Preemptive single-server queue, known service times.
Holding costs: Same for all jobs.
Limitations: (ii-a).

Prior Proof 2. Fife [10, Section 4].
Model: Nonpreemptive M/G/1, known service times.
Holding costs: Based on class and service time.
Limitations: (i-b), (i-c), (ii-a), and (ii-c).

Prior Proof 3. Sevcik [11, Theorem 4-1].
Model: Preemptive M/G/1, known service times.
Holding costs: Based on class and service time.
Limitations: (ii-a) and (iii). Sevcik [11, Conjecture 4-1] argues informally that an index policy should be optimal.

Prior Proof 4. Sevcik [11, Theorem 4-2].
Model: Preemptive M/G/1, unknown service times.
Holding costs: Based on class.
Limitations: (i-b), (i-c), and (iii). Sevcik [11, Conjecture 4-3] argues informally that an index policy should be optimal.

Prior Proof 5. Von Olivier [12].
Model: Preemptive M/G/1, unknown service times.
Holding costs: Based on class and service time.
Limitations: (i-b), (i-c), and (iii).

One unique aspect of the von Olivier [12] result deserves highlighting: jobs' holding costs can depend on their unknown service times. This allows minimizing metrics like mean slowdown even when service times are unknown. However, this result is not widely known in the queueing theory community, perhaps in part because it has only been published in German.

Klimov [13] studied a nonpreemptive M/G/1 with feedback, denoted M/G/1+fbk. In systems with feedback, whenever a job exits the system, it has some probability of immediately returning as another job, possibly of a different class.

Prior Proof 6. Klimov [13].
Model: Nonpreemptive M/G/1+fbk, unknown service times.
Holding costs: Based on class.
Limitations: (i-c) and (ii-c).

B. 1980s and 1990s: Connection to Multi-Armed Bandits

Prior Proof 7. Lai and Ying [14].
Model: Preemptive M/M/1+fbk, unknown service times.
Holding costs: Based on class.
Limitations: (i-c) and (ii-b).

Prior Proof 8. Gittins [2, Theorem 5.6].
Model: Preemptive M/G/1, unknown service times.
Holding costs: Based on class.
Limitations: (i-a) and (i-c).

Gittins's result [2] is often cited in the literature as proving the Gittins policy's optimality in the M/G/1 [3–7]. As such, it deserves some more detailed discussion. Prior Proof 8 has two main steps. The first step simplifies the problem by assuming the scheduler can only preempt jobs in a discrete set of states² [2, Theorem 3.28]. The set can be countable in principle, but the proof assumes a side condition that is only guaranteed to hold if the set is finite. The condition comes from translating multi-armed bandit results to the M/G/1. The second step uses a limit argument to allow unrestricted preemption [2, Theorem 5.6]. However, because the first step is limited to finitely many job states, the second step's result is also limited.
Specifically, it requires finitely many classes and that all service times be less than some finite bound.

² In this setting, a job's state is the pair of its class and attained service.

Prior Proof 9. Achievable region approaches. See Bertsimas [15], Dacre et al. [16], and references therein.
Model: Preemptive M/M/1+fbk or nonpreemptive M/G/1+fbk, unknown service times.
Holding costs: Based on class.
Limitations: (i-c), (ii-b), and (ii-c).

C. 2000s and 2010s: Analyzing Gittins and Its Performance

The 2000s and 2010s did not, for the most part, see new proofs of Gittins's optimality. Researchers instead studied properties of the Gittins policy [3] and analyzed its performance [5–7, 17]. One performance analysis based on dynamic programming also gave a new optimality proof [17], but it did not expand the known scope of Gittins's optimality.

Prior Proof 10. Whittle [17].
Model: Preemptive M/M/1+fbk, unknown service times.
Holding costs: Based on class.
Limitations: (i-c) and (ii-b).

D. 2020: Modeling Jobs as General Markov Processes

Prior Proof 11. Scully et al. [18, Theorem 7.3].
Model: Preemptive M/G^MP/1, i.e. the preemptive M^B/G^MP/1 (Section III) without batch arrivals.
Holding costs: Same for all jobs.
Limitations: Assumes equal holding costs and that jobs are preemptible in any state.

Our work can be seen as a significant extension of Prior Proof 11. Specific aspects we address that Scully et al. [18] do not include varying holding costs, nonpreemptible or partially preemptible jobs, and batch arrivals.

III. SYSTEM MODEL: THE M^B/G^MP/1 QUEUE

We study scheduling in a generalization of the M/G/1 queue to minimize a variety of mean holding cost metrics. The average job arrival rate is λ, the service time distribution is S, and the load is ρ = λE[S]. We assume ρ < 1 for stability. We call our model the M^B/G^MP/1 queue. The "M^B" indicates that jobs arrive in batches with Poisson arrival times. The "G^MP" indicates generally distributed service times, with each job's service time arising from an underlying Markov process.

The main feature of the M^B/G^MP/1 is that it models jobs as Markov processes. The key intuition is: A job's state encodes all information the scheduler knows about the job. This means that the job Markov process differs depending on what information the scheduler knows. For example, to model the perfect-information case where the scheduler is told every job's service time when it arrives, a job's state might be its remaining service time, and the Markov process dynamics would be deterministic (Example III.1). On the other extreme, if the scheduler knows nothing other than the overall service time distribution S, then a job's state might be the amount of service it has received so far, and the Markov process dynamics would be stochastic (Example III.2). The M^B/G^MP/1 thus encompasses a wide variety of M/G/1-like queues.

This section explains the M^B/G^MP/1 queue in more detail. The model's main feature is that the information the scheduler knows about a job may change as the job receives service (Section III-A). A job's preemptibility (Section III-B) and holding cost (Section III-E) may also change during its service.

A. Markov-Process Jobs

We model jobs as absorbing continuous-time strong Markov processes. The state of a job encodes all information that the scheduler knows about the job. Without loss of generality, we assume all jobs share a common state space X and follow the same stochastic Markovian dynamics. However, the realization of the dynamics may be different for each job. In particular, the initial state of each job is drawn from a distribution X_new, so different jobs may start in different states. While a job is in service, its state stochastically advances according to the Markovian dynamics. This evolution is independent of the arrival process and the evolution of other jobs. A job's state does not change while waiting in the queue.

In addition to the main job state space X, there is one additional final state, denoted x_done. When a job enters state x_done, it completes and exits the system. One can think of a service time S as the stochastic amount of time it takes for a job to go from its initial state, which is drawn from X_new, to the final state x_done. Because we assume E[S] < ∞, every job eventually reaches x_done with probability 1. For ease of notation, we follow the convention that x_done ∉ X.
Example III.1. To model known service times, let a job's state be its remaining service time. The state space is X = (0, ∞), the initial state distribution X_new is the service time distribution S, and the final state is x_done = 0. During service, a job's state decreases at rate 1.

Example III.2. To model unknown service times, let a job's state be its attained service, meaning the amount of time it has been served so far. The state space is X = [0, ∞), all jobs start in initial state X_new = 0, and the final state x_done is an isolated point. During service, a job's state increases at rate 1, but it also has a chance to jump to x_done. The jump probability depends on the service time distribution S: the probability a job jumps while being served from state x to state y > x is P[S ≤ y | S > x].
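To make Example III.2 concrete, the following is a minimal simulation sketch of its dynamics, assuming service is discretized into small steps and that the survival function P[S > x] is available and positive wherever evaluated; the step size and the exponential example are illustrative choices, not part of the model:

    import math
    import random

    def simulate_attained_service(survival, step=0.01, max_time=1e6):
        # Sketch of the Example III.2 job model. The state x is the job's
        # attained service; each small step of service either advances x
        # by `step` or jumps to x_done with probability
        # P[S <= x + step | S > x], computed from survival(x) = P[S > x].
        x = 0.0  # initial state X_new = 0: no attained service yet
        while x < max_time:
            p_done = 1.0 - survival(x + step) / survival(x)
            if random.random() < p_done:
                return x + step  # jumped to x_done; total service rendered
            x += step
        return x  # safety cutoff; rarely reached when E[S] is finite

    # Exponential service times, P[S > x] = e^(-x): the returned values
    # are (approximately) samples of S, as the model requires.
    print([round(simulate_attained_service(lambda x: math.exp(-x)), 2)
           for _ in range(5)])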
B. Preemptible and Nonpreemptible States

Every job state is either preemptible or nonpreemptible. The job in service can only be preempted if it is in a preemptible state. We write X_P for the set of preemptible states and X_NP = X \ X_P for the set of nonpreemptible states. Naturally, we assume the scheduler knows which states are preemptible.

We assume all jobs start in a preemptible state, i.e. X_new ∈ X_P with probability 1. This means that all jobs in the queue are in preemptible states, and only the job in service can be in a nonpreemptible state. We assume preemption occurs with no cost or delay. Because a job's state only changes during service, our model is preempt-resume, meaning that preemption does not cause loss of work.

C. Batch Poisson Arrival Process

In the M^B/G^MP/1, jobs arrive in batches. We represent a batch as a list of states, where the ith state is the initial state of the ith job in the batch. The batch vector has distribution X_batch = (X_batch,1, ..., X_batch,B), where B is the distribution of the number of jobs per batch. The batch arrival times are a Poisson process of rate λ/E[B], with each batch drawn independently from X_batch. The initial state distribution X_new is an aggregate distribution determined by picking a random element from a length-biased sample of X_batch.

We allow X_batch to be an arbitrary distribution over lists of preemptible states. That is, the starting states of the jobs within a batch can be correlated with each other or with the size of the batch. However, after arrival, jobs' states evolve independently of each other (Section III-A).

Our M^B/G^MP/1 model differs from the traditional M/G/1 with batch Poisson arrivals, often denoted M^X/G/1, in an important way. In the M^X/G/1, service times within a batch are drawn i.i.d. from S. The M^B/G^MP/1 is more general in that starting states within a batch can be correlated, so service times within a batch can also be correlated.
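To illustrate how X_new arises from X_batch, here is a small sketch that samples a job's initial state by length-biasing over batches; the rejection-sampling step and the assumed bound on batch size are simplifications for illustration:

    import random

    def sample_x_new(sample_batch, max_batch_size):
        # Sketch of the aggregate initial state distribution X_new: accept
        # a batch with probability proportional to its size (length
        # biasing), then return a uniformly random initial state from it.
        # Assumes batch sizes never exceed max_batch_size.
        while True:
            batch = sample_batch()  # one draw from X_batch: a list of states
            if random.random() < len(batch) / max_batch_size:
                return random.choice(batch)

    # Example: batches are one "long" job or three "short" jobs, each with
    # probability 1/2. Under X_new, "short" is 3x likelier than "long",
    # since a random job is more likely to belong to a larger batch.
    sample_batch = lambda: ["long"] if random.random() < 0.5 else ["short"] * 3
    print(sample_x_new(sample_batch, max_batch_size=3))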
D. System State

The state of the system can be described by a list (x_1, ..., x_n). Here n is the number of jobs in the system, and x_i ∈ X is the state of the ith job. We denote the equilibrium distribution of the system state as (X_1, ..., X_N), where N is the equilibrium distribution of the number of jobs. When discussing the equilibrium distribution of quantities under multiple scheduling policies, we use a superscript π, as in N^π, to refer to the distribution under scheduling policy π.

E. Holding Costs and Objective

We assume that each job incurs a cost for each unit of time it is not complete. Such a cost is called a holding cost, and it applies to every job. A job's holding cost depends on its state, so it may change during service. We denote the holding cost of state x ∈ X by hold(x). Holding costs have dimension COST/TIME. We assume that holding costs are deterministic, positive,³ and known to the scheduler. For ease of notation, we also define hold(x_done) = 0.

Let H = Σ_{i=1}^N hold(X_i) be the equilibrium distribution of the total holding cost of all jobs in the system. Our objective is to schedule to minimize mean holding cost E[H].

³ The holding cost of nonpreemptible states does not impact minimizing mean holding cost (Lemma VII.2), so one could have hold(x) ≤ 0 for x ∈ X_NP.

F. What Does the Scheduler Know?

We assume the scheduler knows a description of the job model: the state space X, the subset of preemptible states X_P ⊆ X, and the Markovian dynamics that govern how a job's state evolves. This assumption is necessary for the Gittins policy, as the policy's definition depends on the job model. The scheduler also knows, at every moment in time, the current state of all jobs in the system. This assumption is natural because the intuition of our model is that a job's state encodes everything the scheduler knows about the job.

Finally, we assume that the scheduler knows the holding cost hold(x) of each state x ∈ X. However, it is possible to transform some problems with unknown holding costs into problems with known holding costs. A notable example is minimizing mean slowdown when service times are unknown to the scheduler (Example V.2). After transforming such problems into known-holding-cost form, one can apply our results.

G. Technical Foundations

We have thus far avoided discussing technical measurability conditions that the job model must satisfy. For example, if the job Markov process has uncountable state space X, one should make some topological assumptions on X and X_P, as well as some continuity assumptions on holding costs. As another example, when discussing subsets Y ⊆ X_P (Definitions VI.1 and IV.2), one should restrict attention to measurable subsets. See Scully et al. [18, Appendix D] for additional discussion.

We consider these technicalities outside the scope of this paper. All of our results are predicated on being able to apply basic optimal stopping theory to solve the Gittins game (Section VI). Optimal stopping of general Markov processes is a broad field, and the theory has been developed under many different types of assumptions [19]. Our main result (Theorem V.1) can be understood as proving Gittins's optimality in any setting where optimal stopping theory of the Gittins game has been developed.

IV. THE GITTINS POLICY

We now define the Gittins policy, the scheduling policy that minimizes mean holding cost in the M^B/G^MP/1 (Section III). Before defining Gittins, we discuss its intuitive motivation.

Suppose we are scheduling with the goal of minimizing mean holding cost. How do we decide which job to serve? Because our objective is minimizing mean holding cost, our aim should be to quickly lower the holding cost of jobs in the system. We can lower a job's holding cost by completing it, in which case its holding cost becomes hold(x_done) = 0, or by serving it until it reaches a state with lower holding cost. The basic idea of Gittins is to always serve the job whose holding cost we can decrease the fastest. To formalize this description, we need to define what it means for a job's holding cost to decrease at a certain rate.

A. Gittins Index

As a warm-up, consider the setting of Example III.1: the scheduler knows every job's service time, and a job's state is its remaining service time. Suppose that every state is preemptible. How quickly can we decrease the holding cost of a job in state x, meaning a job with remaining service time x? Serving a job from state x to state y takes x − y time and decreases the job's holding cost by hold(x) − hold(y), so the holding cost decreases at rate (hold(x) − hold(y))/(x − y). To find the fastest possible decrease, we optimize over y:

    (maximum holding cost decrease rate from x) = sup_{y ∈ [0, x)} (hold(x) − hold(y))/(x − y).

The above quantity is called the (Gittins) index of state x. A state's index is the maximum rate at which we can decrease its holding cost by serving it for some amount of time.
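For intuition, here is a minimal numerical sketch of this warm-up index, approximating the supremum over y by a grid search; the grid resolution and the example holding cost function are illustrative choices, not part of the model:

    import numpy as np

    def warmup_index(x, hold, grid_step=1e-4):
        # Gittins index of a job with known remaining service time x
        # (Example III.1, all states preemptible):
        #   index(x) = sup over y in [0, x) of (hold(x) - hold(y)) / (x - y),
        # approximated by maximizing over a grid of candidate y values.
        ys = np.arange(0.0, x, grid_step)
        return np.max((hold(x) - hold(ys)) / (x - ys))

    # With constant holding cost 1 (and hold(0) = hold(x_done) = 0), the
    # supremum is attained at y = 0, so index(x) = 1/x: serving the job
    # of maximal index is SRPT, matching the remark in Section IV-B.
    hold = lambda y: np.where(y > 0, 1.0, 0.0)
    print(warmup_index(2.0, hold))  # prints 0.5, i.e. 1/x for x = 2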
To generalize the above discussion to general job models, we need to make two changes. Firstly, because a job's state dynamics can be stochastic, we need to consider serving it until it enters a set of states Y. Secondly, because we cannot stop serving a job while it is nonpreemptible, we require Y ⊆ X_P.

Definition IV.1. For all x ∈ X and Y ⊆ X_P, let

    Serve(x, Y) = (service needed for a job starting in state x to first enter Y ∪ {x_done}),
    serve(x, Y) = E[Serve(x, Y)],
    Hold(x, Y) = (holding cost of a job starting in state x when it first enters Y ∪ {x_done}),
    hold(x, Y) = E[Hold(x, Y)].

To clarify, Serve(x, Y) and Hold(x, Y) are distributions. If x ∈ Y, then Serve(x, Y) = 0 and Hold(x, Y) = hold(x).

If we serve a job from state x until it enters Y, its holding cost decreases at rate (hold(x) − hold(x, Y))/serve(x, Y) on average. We obtain a state's Gittins index by optimizing over Y.

Definition IV.2. The (Gittins) index of state x ∈ X is

    index(x) = sup_{Y ⊆ X_P} (hold(x) − hold(x, Y))/serve(x, Y).

When we say that a job has a certain index, we mean that the job's current state has that index.

Given the definition of the Gittins index, the Gittins policy boils down to one rule: at every moment in time, unless the job in service is nonpreemptible, serve the job of maximal Gittins index, breaking ties arbitrarily.

Because the Gittins index depends on the job model, it might be more accurate to view Gittins not as one specific policy but rather as a family of policies, with one instance for every job model. When we refer to "the" Gittins policy, we mean the Gittins policy for the current system's job model.

B. Gittins Rank

Some work on the Gittins policy refers to the (Gittins) rank of a state [6, 11, 18, 20], which is the reciprocal of its index:

    rank(x) = 1/index(x).

Gittins thus always serves the job of minimal rank. The Gittins rank sometimes has a more intuitive interpretation than the Gittins index. For instance, when jobs have known service times and constant holding cost 1, Gittins reduces to SRPT, and a job's rank is its remaining service time.

We use both the index and rank conventions in this work. This section mostly uses the index convention. Sections VI and VII, which prove Gittins's optimality, use the rank convention because it better matches the authors' intuitions, though this choice is certainly subjective.

V. SCOPE OF GITTINS'S OPTIMALITY

Our main result is that Gittins is optimal in the M^B/G^MP/1 with arbitrary state-based holding costs. Specifically, Gittins is optimal among nonclairvoyant scheduling policies, which are policies that make scheduling decisions based only on the current and past system states.

Theorem V.1. The Gittins policy minimizes mean holding cost in the M^B/G^MP/1. That is, for all nonclairvoyant policies π,

    E[H^Gittins] ≤ E[H^π].

All of the prior optimality results discussed in Section II are special cases of Theorem V.1. This makes Theorem V.1 a unifying theorem for Gittins's optimality in M/G/1-like systems. Theorem V.1 also holds in scenarios not covered by any prior result. For instance, no prior result handles batch arrivals or holding costs that change during service.

A. Mean Slowdown and Unknown Holding Costs

Recall from Section III-E that we assume that the holding cost of every job state is known to the scheduler. However, some scheduling problems involve unknown holding costs. An important example is minimizing mean slowdown, in which a job's holding cost is the reciprocal of its service time. Unless all service times are known to the scheduler, this involves unknown holding costs.

Fortunately, we can transform many problems with unknown holding costs into problems with known holding costs. Suppose a job's current unknown holding cost depends only on its current and future states. Then for all job states x ∈ X, let

    hold(x) = E[unknown holding cost of a job in state x | job reached state x],    (V.1)

where the expectation is taken over a random realization of a job's path through the state space. The mean holding cost of nonclairvoyant policies is unaffected by this transformation.

Example V.2 (Gittins for mean slowdown). Consider the system from Example III.2. It has unknown service times, and a job's state x is its attained service. Suppose all states are preemptible. To minimize mean slowdown, we give a job with service time s holding cost s⁻¹. This turns (V.1) into hold(x) = E[S⁻¹ | S > x], and the Gittins index becomes

    index(x) = sup_{y > x} E[S⁻¹ 1(S ≤ y) | S > x] / E[min{S, y} − x | S > x].
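As a concrete illustration of Example V.2, the following sketch evaluates this index for a discrete service time distribution; restricting the supremum to the support points of S is a simplification that is harmless here, since the numerator only changes at those points while the denominator grows between them:

    import numpy as np

    def slowdown_index(x, sizes, probs):
        # Gittins index for mean slowdown with unknown service times
        # (Example V.2). The state x is attained service; S is discrete
        # with the given support points and probabilities.
        #   index(x) = sup over y > x of
        #       E[S^(-1) 1(S <= y) | S > x] / E[min{S, y} - x | S > x]
        sizes = np.asarray(sizes, dtype=float)
        probs = np.asarray(probs, dtype=float)
        alive = sizes > x
        s = sizes[alive]
        p = probs[alive] / probs[alive].sum()  # distribution of S | S > x
        best = 0.0
        for y in s:  # the supremum is attained at a support point of S
            numerator = np.sum(p * (1.0 / s) * (s <= y))
            denominator = np.sum(p * (np.minimum(s, y) - x))
            best = max(best, numerator / denominator)
        return best

    # Jobs have size 1 or 10 with equal probability. A fresh job (x = 0)
    # has index 0.5, attained by serving up to y = 1 in case it is short.
    print(slowdown_index(0.0, sizes=[1.0, 10.0], probs=[0.5, 0.5]))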
VI. THE GITTINS GAME

In this section we introduce the Gittins game, which is an optimization problem concerning a single job. The Gittins game serves two purposes. Firstly, it gives an alternative intuition for the Gittins rank. Secondly, its properties are important for proving Gittins's optimality. We define the Gittins game (Section VI-A), study its properties (Sections VI-B–VI-D), and explain its relationship to the Gittins rank (Section VI-E).

A. Defining the Gittins Game

The Gittins game is an optimal stopping problem concerning a single job. We are given a job in some starting state x ∈ X and a penalty parameter r ≥ 0, which has dimension TIME²/COST. The goal of the Gittins game is to end the game as soon as possible.

The game proceeds as follows. We begin by serving the job. The job's state evolves as usual during service (Section III-A). If the job completes, namely by reaching state x_done, the game ends immediately. Whenever the job's state is preemptible, we may give up. If we do so, we stop serving the job, and the game ends after deterministic delay r·hold(y), where y ∈ X_P is the job's state when we give up. We assume the job's current state is always visible. Playing the Gittins game thus boils down to deciding whether or not to give up based on the job's current state.

Because the job's state evolution is Markovian, the Gittins game is a Markovian optimal stopping problem. This means there is an optimal policy of the following form: for some give-up set Y ⊆ X_P, give up when the job's state first enters Y. The strong Markov property implies that this set Y need not depend on the starting state, though it may depend on the penalty parameter. We use this observation and Definition IV.1 to formally define the Gittins game.

Definition VI.1. The Gittins game is the following optimization problem. The parameters are a starting state x ∈ X and penalty parameter r, and the control is a give-up set Y ⊆ X_P. The cost of give-up set Y is

    game(x, r, Y) = serve(x, Y) + r·hold(x, Y).

The objective is to choose Y to minimize game(x, r, Y). The optimal cost or cost-to-go function of the Gittins game is

    game(x, r) = inf_{Y ⊆ X_P} game(x, r, Y).    (VI.1)

B. Shape of the Cost-To-Go Function

To gain some intuition for the Gittins game, we begin by proving some properties of the cost-to-go function, focusing on its behavior as the penalty parameter varies.

Lemma VI.2. For all x ∈ X and r ≥ 0, the cost-to-go function game(x, r) is (i) nondecreasing in r, (ii) concave in r, (iii) bounded by game(x, r) ≤ serve(x, X_P) + r·hold(x, X_P), and (iv) bounded by game(x, r) ≤ serve(x, ∅). When x ∈ X_P, property (iii) becomes game(x, r) ≤ r·hold(x).

Proof. Properties (i) and (ii) follow from (VI.1), which expresses game(x, r) as an infimum of nondecreasing concave functions of r. Properties (iii) and (iv) follow from the fact that two possible give-up sets are X_P, meaning giving up as soon as possible, and ∅, meaning never giving up. The simplification when x ∈ X_P is due to Definition IV.1.
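To make the Gittins game concrete, here is a sketch that solves a discrete-time, finite-state analogue of Definition VI.1 by value iteration; the discrete-time setting, the unit cost per step of service, and the two-state example job are simplifying assumptions, not the paper's continuous-time model:

    import numpy as np

    def gittins_game(P, hold, r, iters=10_000):
        # Value iteration for a discrete-time analogue of the Gittins
        # game. From state i we either give up (ending cost r * hold[i],
        # allowed here in every state) or serve for one step (cost 1 plus
        # the expected cost-to-go). P[i, j] is the probability one step of
        # service moves the job from state i to state j; any leftover row
        # mass is absorption into x_done, which ends the game at no cost.
        hold = np.asarray(hold, dtype=float)
        g = np.zeros(len(hold))
        for _ in range(iters):
            g = np.minimum(1.0 + P @ g, r * hold)
        return g

    # Two-state job: state 0 always moves to state 1, and state 1
    # completes with probability 1/2 per step. With r = 3 and unit
    # holding costs, the cost-to-go is approximately [3.0, 2.0]; giving
    # up is (weakly) optimal exactly where game(x, r) = r * hold(x).
    P = np.array([[0.0, 1.0],
                  [0.0, 0.5]])
    print(gittins_game(P, hold=[1.0, 1.0], r=3.0))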
C. Optimal Give-Up Set

We now characterize one possible solution to the Gittins game. Because the Gittins game is a Markovian optimal stopping problem, we never need to look back at past states when deciding when to give up. This means we can find an optimal give-up set that depends only on the penalty parameter r. We ask for each preemptible state: is it optimal to give up immediately if we start in this state? The set of states for which we answer yes is an optimal give-up set.

Definition VI.3. The optimal give-up set for the Gittins game with penalty parameter r is

    Y*(r) = {x ∈ X_P | game(x, r) = r·hold(x)}.

Note that Y*(0) = X_P. We also let Y*(∞) = ∅. For simplicity of language, we call Y*(r) "the" optimal give-up set, even though there may be other optimal give-up sets.

Basic results in optimal stopping theory [19] imply that game(x, r) = game(x, r, Y*(r)), so the infimum in (VI.1) is always attained, namely by Y*(r).

The sets Y*(r) are monotonic in r, i.e. Y*(r) ⊇ Y*(r′) for all r ≤ r′. This is because increasing the penalty makes giving up less attractive, so giving up is optimal in fewer states.

For most of the rest of this paper, when we discuss the Gittins game, we consider strategies that use optimal give-up sets, so we simplify the notation for that case.

Definition VI.4. For all x ∈ X and r ≥ 0, let

    Serve(x, r) = Serve(x, Y*(r)),

and similarly for serve(x, r), Hold(x, r), and hold(x, r).

D. Derivative of the Cost-To-Go Function

Suppose we solve the Gittins game for penalty parameter r, then change the penalty parameter to r ± ε for some small ε > 0. One would expect that the give-up set Y*(r) is nearly optimal for the new penalty parameter r ± ε, which would imply

    game(x, r ± ε) ≈ serve(x, r) + (r ± ε)·hold(x, r).

One can use Lemma VI.2 and a classic envelope theorem [21, Theorem 1] to formalize this argument. For brevity, we omit the proof. See Scully et al. [18, Lemma 5.3] for a similar proof.

Lemma VI.5. For all x ∈ X_P, the function r ↦ game(x, r) is differentiable almost everywhere with derivative

    (d/dr) game(x, r) = hold(x, r).

E. Relationship to the Gittins Rank

The Gittins game and the optimal give-up set are closely related to the Gittins rank. In fact, we can use the Gittins game to give an alternative definition of a state's rank. For brevity, we simply state the connection below.

Lemma VI.6. (i) For all r ≥ 0, we can write the optimal give-up set as

    Y*(r) = {x ∈ X_P | rank(x) ≥ r}.

(ii) For all x ∈ X_P, we can write the Gittins rank of x as

    rank(x) = max{r ≥ 0 | x ∈ Y*(r)}.
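Continuing the discrete-time sketch above, Lemma VI.6(ii) suggests computing a state's rank by bisection on the penalty parameter, testing membership in Y*(r) via Definition VI.3; this reuses the gittins_game function and two-state job P from the earlier sketch and assumes an upper bound on the rank:

    def rank_via_game(P, hold, state, hi=1e6, iters=60):
        # Lemma VI.6(ii): rank(x) = max{r >= 0 | x in Y*(r)}. By the
        # monotonicity of Y*(r), bisect on r, testing x in Y*(r) via
        # Definition VI.3: x in Y*(r) iff game(x, r) = r * hold(x).
        # Assumes rank(state) < hi.
        lo = 0.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            giving_up_optimal = (
                gittins_game(P, hold, mid)[state] >= mid * hold[state] - 1e-9
            )
            if giving_up_optimal:
                lo = mid  # state is in Y*(mid), so rank(state) >= mid
            else:
                hi = mid
        return lo

    # For the two-state job above: serving state 1 to completion takes 2
    # steps in expectation and removes holding cost 1, so rank(1) = 2;
    # state 0 needs one extra step to reach state 1, so rank(0) = 3.
    print(rank_via_game(P, [1.0, 1.0], state=0),
          rank_via_game(P, [1.0, 1.0], state=1))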
VII. PROVING GITTINS'S OPTIMALITY

We now prove Theorem V.1, namely that Gittins minimizes mean holding cost in the M^B/G^MP/1. Our proof has four steps. We begin by showing that minimizing mean holding cost E[H] is equivalent to minimizing the mean preemptible holding cost E[H_P], which only counts the holding costs of jobs in preemptible states (Section VII-A). We define a new quantity called r-work, the amount of work in the system "below rank r" (Section VII-B). We show how to relate an integral of r-work to the preemptible holding cost H_P (Section VII-C), with more r-work implying higher holding cost. We show that Gittins minimizes mean r-work for all r ≥ 0, so it also minimizes E[H] (Section VII-D).

A. Preemptible and Nonpreemptible Holding Costs

Definition VII.1. The system's preemptible holding cost is the total holding cost of all jobs in the system whose states are preemptible. It has equilibrium distribution

    H_P = Σ_{i=1}^N 1(X_i ∈ X_P) hold(X_i),

where 1 is the indicator function. The nonpreemptible holding cost is defined analogously as

    H_NP = Σ_{i=1}^N 1(X_i ∈ X_NP) hold(X_i).

Our goal is to show that Gittins minimizes mean holding cost E[H] = E[H_P] + E[H_NP]. The lemma below shows that E[H_NP] is unaffected by the scheduling policy. Minimizing E[H] thus amounts to minimizing E[H_P].

Lemma VII.2. In the M^B/G^MP/1, the mean nonpreemptible holding cost has the same value under all scheduling policies:

    E[H_NP] = λE[total cost a job accrues while in a nonpreemptible state during service].

Proof. By a generalization of Little's law [1],

    E[H_NP] = λE[total cost a job accrues while in a nonpreemptible state].

The desired statement follows from the fact that if a job's state is nonpreemptible, it must be in service (Section III-B).

B. Defining r-Work

Definition VII.3. The (job) r-work of state x is Serve(x, r), namely the amount of service it requires to either complete or enter a preemptible state of rank at least r.⁴ The (system) r-work is the total r-work of all jobs in the system. Its equilibrium distribution, denoted W(r), is

    W(r) = Σ_{i=1}^N Serve(X_i, r),

where (X_1, ..., X_N) is the equilibrium system state (Section III-D). In particular, we can think of W(0) as the amount of nonpreemptible work in the system.

⁴ Strictly speaking, Definitions IV.1 and VI.4 introduce Serve(x, r) as a distribution, so the r-work of a job in state x is not Serve(x, r) itself but rather a random variable with distribution Serve(x, r).

Lemma VII.4. For all r ≥ 0,

    E[W(r)] = E[Σ_{i=1}^N serve(X_i, r)].

Proof. This follows from the law of total expectation and the fact that E[Serve(X_i, r) | X_i] = serve(X_i, r).

C. Relating r-Work to Holding Cost

Theorem VII.5. In the M^B/G^MP/1, under all nonclairvoyant policies,

    E[H_P] = ∫₀^∞ (E[W(r)] − E[W(0)])/r² dr.

Proof. By Lemma VII.4 and the definition of H_P, it suffices to show that for all x ∈ X_P,

    hold(x) = ∫₀^∞ (serve(x, r) − serve(x, 0))/r² dr.    (VII.1)

Because x ∈ X_P, it is optimal to give up in state x when playing the Gittins game with penalty parameter 0, so

    serve(x, 0) = 0,    hold(x, 0) = hold(x).

Using Lemma VI.5, we compute

    (d/dr)(game(x, r)/r) = (r·hold(x, r) − game(x, r))/r² = −serve(x, r)/r².

This means the integral in (VII.1) becomes a difference between two limits. Using Lemmas VI.2 and VI.5, we compute

    ∫₀^∞ serve(x, r)/r² dr = lim_{r→0} game(x, r)/r − lim_{r→∞} game(x, r)/r = hold(x, 0) − 0 = hold(x).

Theorem VII.5 implies that to minimize E[H_P], it suffices to minimize E[W(r)] − E[W(0)] for all r ≥ 0. It turns out that E[W(0)], much like E[H_NP], is unaffected by the scheduling policy, so it suffices to minimize mean r-work E[W(r)]. We omit the proof, as it is very similar to that of Lemma VII.2.

Lemma VII.6. In the M^B/G^MP/1, the mean 0-work E[W(0)] has the same value under all scheduling policies.
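As a quick sanity check of (VII.1), not from the paper, consider the known-service-time model of Example III.1 with hold(x) = 1 for all x ∈ X, so that Gittins is SRPT and rank(x) = x. By Lemma VI.6(i), Y*(r) = {y ∈ X_P | y ≥ r}. A job in state x < r is outside Y*(r) and, since its state only decreases during service, never enters it, so serve(x, r) = x; for x ≥ r, the job starts in Y*(r), so serve(x, r) = 0. The integrand of (VII.1) is therefore x/r² for r > x and 0 otherwise, giving

    ∫₀^∞ (serve(x, r) − serve(x, 0))/r² dr = ∫_x^∞ x/r² dr = x · (1/x) = 1 = hold(x),

as claimed.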
D. Gittins Minimizes Mean r-Work

Lemmas VII.2 and VII.6 and Theorem VII.5 together imply that if a scheduling policy minimizes mean r-work E[W(r)] for all r ≥ 0, then it minimizes mean holding cost E[H]. We show that Gittins does exactly this, implying Gittins's optimality.

Theorem VII.7. The Gittins policy minimizes mean r-work in the M^B/G^MP/1. That is, for all scheduling policies π and r ≥ 0,

    E[W^Gittins(r)] ≤ E[W^π(r)].

Before proving Theorem VII.7, we introduce the main ideas behind the proof. For the rest of this section, fix arbitrary r ≥ 0. We classify jobs in the system into two types.
• A job is r-good if it is nonpreemptible or has Gittins rank less than r, i.e. its state is in X \ Y*(r).
• A job is r-bad if it has Gittins rank at least r, i.e. its state is in Y*(r).
During service, a job may alternate between being r-good and r-bad. Gittins minimizes r-work because the jobs that contribute to r-work are exactly the r-good jobs, and Gittins always prioritizes r-good jobs over r-bad jobs. This means that whenever the amount of r-work in the system is positive, Gittins decreases it at rate 1, which is as quickly as possible.

Given that Gittins decreases r-work as quickly as possible, does Theorem VII.7 immediately follow? The answer is no: we need to look not just at how r-work decreases but also at how it increases. Two types of events increase r-work.
• Arrivals can add r-work to the system.
• During service, a job can transition from being r-bad to being r-good as its state evolves. Using the terminology of Scully et al. [6, 18], we call this r-recycling the job. Every r-recycling adds r-work to the system.
Arrivals are outside of the scheduling policy's control, but r-recyclings occur at different times under different scheduling policies. Because Gittins prioritizes r-good jobs over r-bad jobs, all r-recyclings occur when there is zero r-work. It turns out that because the batch arrival process is Poisson, this r-recycling timing minimizes mean r-work.

Proof of Theorem VII.7. We are comparing Gittins to an arbitrary scheduling policy π. It is convenient to allow π to be more powerful than an ordinary policy: we allow π to devote infinite processing power to r-bad jobs. This has two implications:
• Whenever there is r-work in the system, π controls at what rate it decreases, where 1 is the maximum rate.
• Regardless of the rate at which r-work is decreasing, whenever there is an r-bad job in the system, π controls at what moment in time it either completes or is r-recycled.
A straightforward interchange argument shows that it suffices to only compare against policies π which are "r-work-conserving", meaning they decrease r-work at rate 1 whenever r-work is nonzero. Gittins is also r-work-conserving.

It remains only to show that among r-work-conserving policies, mean r-work is minimized by only r-recycling jobs when r-work is zero. This follows from classic decomposition results for the M/G/1 with generalized vacations [22]. We first explain how to view the r-work in the M^B/G^MP/1 as the virtual work in a vacation system.⁵
• Interpret a batch adding s r-work to the M^B/G^MP/1 as an arrival of service time s in the vacation system.
• Interpret an r-recycling adding v r-work to the M^B/G^MP/1 as a vacation of length v in the vacation system.
Using the above interpretation, a vacation system result of Miyazawa [22, Theorem 3.3] implies

    E[W^π(r)] = c₁ + c₂·E[r-work sampled immediately before π r-recycles a job],

where c₁ and c₂ are constants that depend on the system parameters but not on the scheduling policy π. Because Gittins prioritizes r-good jobs over r-bad jobs, Gittins only r-recycles when r-work is zero. This means the expectation on the right-hand side is zero under Gittins. But the expectation is nonnegative in general, so Gittins minimizes mean r-work.

⁵ Virtual work in a vacation system is the total remaining service time of all jobs in the system plus, if a vacation is in progress, the remaining vacation time.

VIII. CONCLUSION

We have given the first fully general statement (Theorem V.1) and proof of Gittins's optimality in the M/G/1. This simultaneously improves upon, unifies, and generalizes prior proofs, all of which either apply only in special cases or require limiting technical assumptions (Section II). We believe Gittins's optimality holds even more generally than we have shown. For example, our proof likely generalizes to settings with "branching" jobs or additional priority constraints on the scheduler [23, Section 4.7].
It is also sometimes possible to strengthen the sense in which Gittins is optimal. For example, SRPT is optimal for non-Poisson arrival times, and Gittins sometimes stochastically minimizes holding cost in addition to minimizing the mean.

REFERENCES

[1] S. L. Brumelle, "On the relation between customer and time averages in queues," J. Appl. Probab., vol. 8, no. 3, pp. 508–520, 1971.
[2] J. C. Gittins, Multi-Armed Bandit Allocation Indices, 1st ed., ser. Wiley-Interscience Series in Systems and Optimization. Chichester, UK: Wiley, 1989.
[3] S. Aalto, U. Ayesta, and R. Righter, "On the Gittins index in the M/G/1 queue," Queueing Syst., vol. 63, no. 1-4, pp. 437–458, Dec. 2009.
[4] ——, "Properties of the Gittins index with application to optimal scheduling," Prob. Eng. Inf. Sci., vol. 25, no. 3, pp. 269–288, Jul. 2011.
[5] E. Hyytiä, S. Aalto, and A. Penttinen, "Minimizing slowdown in heterogeneous size-aware dispatching systems," SIGMETRICS Perform. Eval. Rev., vol. 40, no. 1, pp. 29–40, Jun. 2012.
[6] Z. Scully, M. Harchol-Balter, and A. Scheller-Wolf, "SOAP: One clean analysis of all age-based scheduling policies," Proc. ACM Meas. Anal. Comput. Syst., vol. 2, no. 1, Apr. 2018.
[7] Z. Scully, L. van Kreveld, O. J. Boxma, J.-P. Dorsman, and A. Wierman, "Characterizing policies with optimal response time tails under heavy-tailed job sizes," Proc. ACM Meas. Anal. Comput. Syst., vol. 4, no. 2, Jun. 2020.
[8] J. C. Gittins, "Bandit processes and dynamic allocation indices," J. R. Statist. Soc. B, vol. 41, no. 2, pp. 148–164, Jan. 1979.
[9] L. E. Schrage, "A proof of the optimality of the shortest remaining processing time discipline," Oper. Res., vol. 16, no. 3, pp. 687–690, Jun. 1968.
[10] D. W. Fife, "Scheduling with random arrivals and linear loss functions," Manag. Sci., vol. 11, no. 3, pp. 429–437, Jan. 1965.
[11] K. C. Sevcik, "The use of service time distributions in scheduling," Ph.D. dissertation, University of Chicago, Chicago, IL, Aug. 1971.
[12] G. von Olivier, "Kostenminimale Prioritäten in Wartesystemen vom Typ M/G/1 [Cost-minimum priorities in queueing systems of type M/G/1]," Elektron. Rechenanl., vol. 14, no. 6, pp. 262–271, Dec. 1972.
[13] G. P. Klimov, "Time-sharing service systems. I," Theory Probab. Appl., vol. 19, no. 3, pp. 532–551, 1974.
[14] T. L. Lai and Z. Ying, "Open bandit processes and optimal scheduling of queueing networks," Adv. Appl. Probab., vol. 20, no. 2, pp. 447–472, 1988.
[15] D. Bertsimas, "The achievable region method in the optimal control of queueing systems; formulations, bounds and policies," Queueing Syst., vol. 21, no. 3, pp. 337–389, Sep. 1995.
[16] M. Dacre, K. D. Glazebrook, and J. Niño-Mora, "The achievable region approach to the optimal control of stochastic systems," J. R. Statist. Soc. B, vol. 61, no. 4, pp. 747–791, 1999.
[17] P. Whittle, "Tax problems in the undiscounted case," J. Appl. Probab., vol. 42, no. 3, pp. 754–765, Sep. 2005.
[18] Z. Scully, I. Grosof, and M. Harchol-Balter, "The Gittins policy is nearly optimal in the M/G/k under extremely general conditions," Proc. ACM Meas. Anal. Comput. Syst., vol. 4, no. 3, Nov. 2020.
[19] G. Peskir and A. N. Shiryaev, Optimal Stopping and Free-Boundary Problems, ser. Lectures in Mathematics, ETH Zürich. Basel: Birkhäuser Verlag, 2006.
[20] K. C. Sevcik, "Scheduling for minimum total loss using service time distributions," J. ACM, vol. 21, no. 1, pp. 66–75, Jan. 1974.
[21] P. Milgrom and I. Segal, "Envelope theorems for arbitrary choice sets," Econometrica, vol. 70, no. 2, pp. 583–601, Mar. 2002.
[22] M. Miyazawa, "Decomposition formulas for single server queues with vacations: A unified approach by the rate conservation law," Commun. Statist.—Stochastic Models, vol. 10, no. 2, pp. 389–413, Jan. 1994.
[23] J. C. Gittins, K. D. Glazebrook, and R. Weber, Multi-Armed Bandit Allocation Indices, 2nd ed. Chichester, UK: Wiley, 2011.