Limiting Average Cost Control Problems in a Class of Discrete-Time Stochastic Systems
In this paper we give conditions for the existence of expected average cost
(EAC) optimal and sample path average cost (SPAC) optimal policies for
the limiting control system (3).
The motivation to study models of the type (1) comes from our interest
in time-varying control systems of the form

(4) x_{n+1} = G_n(x_n, a_n) + ξ_n  (n ∈ N),

where the drift term G_n(x, a) tends to "stabilize" as in (2). Systems of
this kind appear, for example, in the modelling of some biotechnological
processes ([2, 10]).
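For intuition, the following minimal sketch simulates a scalar instance of (4) with a hypothetical drift G_n converging to a limiting drift G_∞; the specific G_n, the control rule and the Gaussian disturbance are our own illustrative choices, not the models of [2, 10].

```python
import numpy as np

rng = np.random.default_rng(0)

def G(n, x, a):
    # Hypothetical time-varying drift; the 1/(n + 1) term makes G_n
    # converge to the limiting drift G_inf(x, a) = 0.5*x + a, as in (2).
    return 0.5 * x + a + 1.0 / (n + 1)

x = 1.0
for n in range(50):
    a = -0.25 * x                  # an admissible control a_n (illustrative)
    xi = rng.normal(scale=0.1)     # i.i.d. disturbance xi_n with distribution nu
    x = G(n, x, a) + xi            # system (4): x_{n+1} = G_n(x_n, a_n) + xi_n
print(x)  # after many steps the dynamics are close to the limiting system (3)
```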
Our study is partly a sequel to [9], where we considered related
α-discounted cost (α-DC) problems, for α ∈ (0, 1). In fact, our approach in the
present average cost (AC) case is similar to that in [9] in the sense that we
first consider the AC problems for the time-invariant system for each fixed
n ∈ N, and then we let n → ∞ to obtain the corresponding AC results for
the limiting system (3). A key fact to get the latter results is that, from (2),
we obtain the convergence

(5) Q_n(B | x, a) → Q_∞(B | x, a),

where, for each n ∈ N ∪ {∞}, Q_n denotes the transition law of (1), namely,

(6) Q_n(B | x, a) := Prob(x_{t+1} ∈ B | x_t = x, a_t = a)  (t ∈ N).

(A precise statement of (5) is given in Lemma 10(iii).)
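For the additive-noise model, the convergence (5) can be observed numerically: the sketch below estimates Q_n(B | x, a) by Monte Carlo for the same hypothetical drift as above (all numerical choices are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)

def G(n, x, a):
    # n = None stands for n = infinity (the limiting drift).
    return 0.5 * x + a + (0.0 if n is None else 1.0 / (n + 1))

def Q_hat(n, x, a, lo, hi, samples=200_000):
    # Monte Carlo estimate of Q_n([lo, hi] | x, a) in (6) for the
    # additive-noise system x_{t+1} = G_n(x_t, a_t) + xi_t, xi_t ~ N(0, 0.1^2).
    y = G(n, x, a) + rng.normal(scale=0.1, size=samples)
    return float(np.mean((lo <= y) & (y <= hi)))

x, a = 1.0, -0.25
for n in (0, 5, 50, None):
    print(n, Q_hat(n, x, a, 0.0, 0.5))  # the estimates converge as in (5)
```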
Still another connection with [9] is that the existence of EAC optimal
policies is obtained via the well-known "vanishing discount" approach (see,
for instance, [1, 3, 4, 5], [6, Chapter 4], [7, Chapter 10]). Thus an important
ingredient in the proof of Theorem 4 below is the α-DC optimality equation
in [9, Theorem 3] for n = ∞. Finally, the existence of SPAC optimal policies
is obtained as a consequence of Theorem 4 and recent results in [8].
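To illustrate the vanishing discount idea on a toy example: for a finite model one can compute the α-DC value function V^α by value iteration and watch (1 − α)V^α(x) approach the optimal average cost as α ↑ 1. The two-state model below is our own illustration, unrelated to the data of the paper.

```python
import numpy as np

# Toy finite model: Q[a, x, y] = transition probability, c[x, a] = cost.
# All numbers are illustrative.
Q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
c = np.array([[1.0, 2.0], [0.5, 3.0]])

def V_alpha(alpha, iters=20_000):
    # alpha-DC value iteration: V(x) = min_a [ c(x, a) + alpha * E V(x_1) ].
    V = np.zeros(2)
    for _ in range(iters):
        V = np.min(c + alpha * np.einsum('axy,y->xa', Q, V), axis=1)
    return V

for alpha in (0.9, 0.99, 0.999):
    print(alpha, (1 - alpha) * V_alpha(alpha))
# As alpha increases to 1, (1 - alpha) * V_alpha(x) flattens in x and
# approaches the optimal average cost, as in the vanishing-discount limit.
```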
The remainder of the paper is organized as follows. In §2 we introduce
some terminology and the AC criteria we are interested in. In §3 we present
the hypotheses and state our main results, Theorem 4 and Corollary 8, which
are proved in §4.
Here A(x) in (7) stands for the (nonempty) set of admissible controls (or
actions) when the state is x. Moreover, we suppose that the set

K := {(x, a) : x ∈ X, a ∈ A(x)}

of admissible state-action pairs is a Borel subset of X × A. Finally, Q_n
denotes the transition law in (6), defined for all B ∈ B(X) and (x, a) ∈ K,
whereas c : K → R is a measurable function that represents the
cost-per-stage.
Let Π be the set of all (randomized, history-dependent) admissible
control policies; see [1, 3, 6, 7, 8] for further information on these
policies. Let F be the set of all measurable functions f : X → A with values
f(x) in A(x) for all x ∈ X. As usual for Markov control processes, we
identify F with the subfamily Π_{DS} of Π that consists of the (deterministic)
stationary policies. (The Borel measurability of K and Assumption 2(a)
below ensure that F, and hence Π, is nonempty.)
For notational ease, let us define

c(x, f) := c(x, f(x)) and Q_n(· | x, f) := Q_n(· | x, f(x)),

for all f ∈ F, x ∈ X and n = 0, 1, . . . , ∞.
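For readers who prefer code, one possible finite-model rendering of the tuple in (7) and of a stationary policy f ∈ F is sketched below; the names and structure are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ControlModel:
    # A finite stand-in for the control model M_n in (7).
    states: List[int]                            # state space X
    admissible: Dict[int, List[int]]             # x -> A(x), nonempty
    Q: Callable[[int, int], Dict[int, float]]    # (x, a) -> Q_n(. | x, a)
    cost: Callable[[int, int], float]            # (x, a) -> c(x, a)

# A stationary policy f in F is a selector x -> f(x) with f(x) in A(x);
# for a finite X any such dict represents an element of Pi_DS.
f = {0: 0, 1: 1}
```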
AC criteria. For each fixed n = 0, 1, . . . , ∞, let J_n^0(π, x) be the long-run
sample path average cost (SPAC) using the policy π ∈ Π, for the initial
state x_0 = x, when the control model is M_n in (7); see also (1). That is,

(8) J_n^0(π, x) := \limsup_{t→∞} \frac{1}{t} \sum_{i=0}^{t−1} c(x_i, a_i).
A policy π^* ∈ Π is said to be SPAC optimal for the control model M_n if
there exists a constant \hat{ϱ}_n such that

(9) J_n^0(π^*, x) = \hat{ϱ}_n  P_x^{(n)π^*}-a.s. ∀x ∈ X, and
    J_n^0(π, x) ≥ \hat{ϱ}_n  P_x^{(n)π}-a.s. ∀π ∈ Π, x ∈ X,

where P_x^{(n)π} denotes the probability measure corresponding to M_n when
using the policy π, for the initial state x_0 = x. The constant \hat{ϱ}_n is called the
optimal sample path average cost for the model M_n.
The "expected" analogue of (8) is the long-run expected average cost
(EAC):

(10) J_n(π, x) := \limsup_{t→∞} \frac{1}{t} E_x^{(n)π} \Bigl[ \sum_{i=0}^{t−1} c(x_i, a_i) \Bigr].
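The distinction between (8) and (10) is easy to probe numerically: (8) averages the cost along a single realization, while (10) takes expectations first. A minimal sketch for a fixed stationary policy on a toy two-state model (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy two-state model under a fixed stationary policy f.
Q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
c = np.array([[1.0, 2.0], [0.5, 3.0]])
f = [0, 0]

def sample_path_average(x, t=100_000):
    # One realization of (8): average cost along a single trajectory.
    total = 0.0
    for _ in range(t):
        a = f[x]
        total += c[x, a]
        x = rng.choice(2, p=Q[a, x])
    return total / t

# Under ergodicity the SPAC (8) is a.s. constant and equals the EAC (10);
# independent trajectories should therefore give nearly identical averages.
print([round(sample_path_average(0), 4) for _ in range(3)])
```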
Result (b) of Theorem 4 can then be viewed as a further development of [9],
where the convergence of V_n^α(x) to V_∞^α(x) was proved for all x ∈ X and each
fixed α ∈ (0, 1).
As a consequence of Theorem 4, we may show the existence of SPAC
optimal policies under the following assumption.
Assumption 7. There exists a constant c_2 ≥ 0 such that

c^2(x, a) ≤ c_2 w(x)  ∀(x, a) ∈ K.
With this additional assumption we have:
Corollary 8. Suppose that Assumptions 1, 2, 3 and 7 are satisfied.
Then, for each n = 0, 1, . . . , ∞, a stationary policy is EAC optimal for the
model M_n if and only if it is SPAC optimal for M_n; hence, by Theorem 4,
there exists a policy f_n ∈ F that is SPAC optimal for M_n and, furthermore,
ϱ_n is the optimal sample path average cost, that is, ϱ_n = \hat{ϱ}_n, where \hat{ϱ}_n is the
constant in (9). In particular,

\hat{ϱ}_∞ = \lim_{n→∞} ϱ_n = \lim_{n→∞} \limsup_{α↑1} (1 − α) V_n^α(x)  ∀x ∈ X.
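Numerically, the conclusion of Corollary 8 can be previewed on a finite example: perturb a limiting kernel Q_∞ by kernels Q_n satisfying (5), compute the optimal average cost ϱ_n of each model M_n, and observe the convergence of ϱ_n. The sketch below uses relative value iteration, a standard computational stand-in for the vanishing-discount limit; all data are illustrative.

```python
import numpy as np

c = np.array([[1.0, 2.0], [0.5, 3.0]])   # illustrative cost-per-stage c(x, a)

def Q_n(n):
    # Hypothetical kernels satisfying (5): Q_n -> Q_inf, where n = None
    # plays the role of n = infinity and eps = 1/(n+1) is the perturbation.
    eps = 0.0 if n is None else 1.0 / (n + 1)
    Q = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.6, 0.4]]])
    Q[:, :, 0] = np.clip(Q[:, :, 0] + 0.05 * eps, 0.0, 1.0)
    Q[:, :, 1] = 1.0 - Q[:, :, 0]
    return Q

def rho(Q, iters=5000):
    # Relative value iteration: rho_est converges to the optimal EAC.
    V = np.zeros(2)
    rho_est = 0.0
    for _ in range(iters):
        W = np.min(c + np.einsum('axy,y->xa', Q, V), axis=1)
        rho_est, V = W[0], W - W[0]
    return rho_est

for n in (0, 10, 100, None):
    print(n, round(rho(Q_n(n)), 6))   # rho_n approaches its limit (n = None)
```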
Lemma 12. (a) Under Assumption 1, for all f ∈ F, x ∈ X, B ∈ B(X)
and α ∈ (0, 1),

(24) V_n^α(f, x, I_B) → V_∞^α(f, x, I_B) as n → ∞.

(b) Moreover, if in addition Assumptions 2 and 3 hold, then

(25) µ_{n,f}(B) → µ_{∞,f}(B) as n → ∞.
Proof. (a) By Lemma 10(iii), and by the formula

Q_n^t(B | x, f) = \int_X Q_n^{t−1}(B | y, f) Q_n(dy | x, f),  t ≥ 1,

we obtain, by induction on t, the convergence Q_n^t(B | x, f) → Q_∞^t(B | x, f)
for every t ≥ 1; (24) then follows by dominated convergence of the series
defining V_n^α(f, x, I_B).
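For a finite state space and a fixed stationary policy f, the displayed recursion is just the Chapman-Kolmogorov identity for powers of a transition matrix, as the following sketch illustrates (matrix entries illustrative):

```python
import numpy as np

# Under a fixed stationary policy f, Q_n(. | ., f) on a finite X is a
# transition matrix P, and the recursion above reads P^t = P @ P^(t-1).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # illustrative P[x, y] = Q_n({y} | x, f)

def t_step(P, t):
    Pt = np.eye(len(P))
    for _ in range(t):
        Pt = P @ Pt                 # one iterated integration per step
    return Pt

print(t_step(P, 1))                 # equals P
print(t_step(P, 50))                # rows approach the invariant measure, cf. (25)
```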
where V_n^α(x) is the optimal α-DC function (18) for the model M_n. Observe
that u_n^α(·) is a nonnegative function.
From (29) and Assumption 2(c), it is easy to verify that

0 ≤ V_n^α(x) ≤ \frac{c_1}{1 − α} \Bigl( 1 + \frac{ν}{1 − β} \Bigr) w(x),

which implies

0 ≤ ϱ_n(α) ≤ c_1 \Bigl( 1 + \frac{ν}{1 − β} \Bigr) \inf_X w(x).
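For completeness, here is one way to carry out this verification, assuming Assumption 2 takes the usual weighted-norm form c(x, a) ≤ c_1 w(x) and \int_X w(y) Q_n(dy | x, a) ≤ β w(x) + ν with β ∈ (0, 1) and w ≥ 1 (our reading of the hypotheses; the precise statement is in §3):

```latex
% Sketch: iterate the drift condition, then sum the discounted series.
\begin{align*}
E_x^{(n)\pi} w(x_t)
  &\le \beta^t w(x) + \nu\,(1 + \beta + \dots + \beta^{t-1})
   \le \beta^t w(x) + \frac{\nu}{1-\beta},\\
V_n^\alpha(x)
  &\le c_1 \sum_{t\ge 0} \alpha^t E_x^{(n)\pi} w(x_t)
   \le c_1\Bigl(\frac{w(x)}{1-\alpha\beta}
        + \frac{\nu}{(1-\beta)(1-\alpha)}\Bigr)
   \le \frac{c_1}{1-\alpha}\Bigl(1 + \frac{\nu}{1-\beta}\Bigr) w(x),
\end{align*}
% using w >= 1 in the last step; multiplying by (1 - alpha) and taking
% the infimum over X gives the bound on rho_n(alpha).
```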
Therefore, there is a number ϱ_n such that \limsup_{α↑1} ϱ_n(α) = ϱ_n. For each
n, let {α_{n,k}} ↑ 1 be a sequence of "discount factors" such that
ϱ_n = \lim_{k→∞} ϱ_n(α_{n,k}).

Now in (34) replace α with α_{n,k} and take the lower limit as k → ∞. Then,
by Fatou's Lemma, we get the ACOI (14) for all n = 0, 1, . . . , ∞.
The existence of f_n ∈ F that satisfies (15) is assured by Lemma 10(b)
and Assumption 2(a), if we use a well-known measurable selector theorem
(see [6, Proposition D.5], for example). Moreover, by Lemma 14 and the fact
that the cost c(·, ·) is nonnegative, [7, Theorem 10.3.1] gives

(35) ϱ_n = \inf_{f ∈ F} J_n(f) = J_n(f_n) = J_n^*(x)  ∀x ∈ X,

with J_n(f) as in (26).
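In the finite case the selector argument behind (15) reduces to a per-state argmin, as the following sketch shows (data illustrative):

```python
import numpy as np

# For finite X and A(x), the measurable selector theorem amounts to
# picking, for each x, a minimizer of the bracketed expression in (15).
Q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
c = np.array([[1.0, 2.0], [0.5, 3.0]])
h = np.array([0.0, 1.3])                    # some relative value function

values = c + np.einsum('axy,y->xa', Q, h)   # c(x, a) + integral of h dQ(.|x, a)
f_n = np.argmin(values, axis=1)             # a stationary policy f_n in F
print(f_n, values.min(axis=1))
```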
Finally, as mentioned in Remark 6, the fact that

ϱ_n = \limsup_{α↑1} (1 − α) V_n^α(x) for all x ∈ X
where h(x) := \liminf_{k→∞} h_{n_k}(x). Then, by Fatou's Lemma, and by
applying a general result on the interchange of limits and minima (see [6,
Lemma 4.2.4]), we obtain

(42) ϱ + h(x) ≥ \min_{a ∈ A(x)} \Bigl[ c(x, a) + \int_X h(y) Q_∞(dy | x, a) \Bigr]  ∀x ∈ X.
Moreover, as in part (a), Lemma 10(b) and Assumption 2(a) yield the
existence of a stationary policy f ∈ F that attains the minimum in (42); that
is, (42) becomes

(43) ϱ + h(x) ≥ c(x, f) + \int_X h(y) Q_∞(dy | x, f)  ∀x ∈ X.
On the other hand, iteration of the latter inequality yields, for all N =
1, 2, . . . ,

N ϱ + h(x) ≥ E_x^{(∞)f} \Bigl[ \sum_{t=0}^{N−1} c(x_t, f) \Bigr] + E_x^{(∞)f} h(x_N) ≥ E_x^{(∞)f} \Bigl[ \sum_{t=0}^{N−1} c(x_t, f) \Bigr],

where the second inequality uses the nonnegativity of h. Dividing by N and
taking the upper limit as N → ∞ gives J_∞(f, x) ≤ ϱ for all x ∈ X.
References
[3] E. Gordienko and O. Hernández-Lerma, Average cost Markov control policies with
weighted norms: existence of canonical policies, Appl. Math. (Warsaw) 23 (1995),
199–218.
[4] O. Hernández-Lerma, Average optimality in dynamic programming on Borel spaces—unbounded costs and controls, Systems Control Lett. 17 (1991), 237–242.
[5] O. Hernández-Lerma and J. B. Lasserre, Average cost optimal policies for Markov
control processes with Borel state space and unbounded costs, ibid. 15 (1990), 349–
356.
[6] —, —, Discrete-Time Markov Control Processes: Basic Optimality Criteria,
Springer, New York, 1996.
[7] —, —, Further Topics on Discrete-Time Markov Control Processes, Springer, New
York, 1999.
[8] O. Hernández-Lerma, O. Vega-Amaya and G. Carrasco, Sample-path optimality and
variance-minimization of average cost Markov control processes, SIAM J. Control
Optim. 38 (1999), 79–93.
[9] N. Hilgert and O. Hernández-Lerma, Limiting optimal discounted-cost control of a
class of time-varying stochastic systems, Systems Control Lett. 40 (2000), 37–42.
[10] N. Hilgert, R. Senoussi and J.-P. Vila, Nonparametric estimation of time-varying
autoregressive nonlinear processes, C. R. Acad. Sci. Paris Sér. I Math. 323 (1996),
1085–1090.
[11] A. Hordijk and A. A. Yushkevich, Blackwell optimality in the class of all policies
in Markov decision chains with Borel state space and unbounded rewards, Math.
Methods Oper. Res. 50 (1999), 421–448.
[12] N. V. Kartashov, Inequalities in theorems of ergodicity and stability for Markov chains with common phase space, II, Theory Probab. Appl. 30 (1986), 507–515.
[13] —, Strong Stable Markov Chains, VSP, Utrecht, 1996.
[14] H. U. Küenle, Markov games with average cost criterion under a geometric drift
condition, paper presented at the 10th INFORMS Applied Probability Conference,
University of Ulm, July 26–28, 1999.
[15] F. Luque-Vásquez and O. Hernández-Lerma, Semi-Markov control models with av-
erage costs, Appl. Math. (Warsaw) 26 (1999), 315–331.
[16] A. S. Nowak, Optimal strategies in a class of zero-sum ergodic stochastic games,
Math. Methods Oper. Res. 50 (1999), 399–419.
[17] —, Sensitive equilibria for ergodic stochastic games with countable state-spaces,
ibid., 65–76.
Received on 25.11.1999;
revised version on 6.11.2000 (1512)