Limiting Average Cost Control Problems in A Class of Discrete-Time Stochastic Systems


APPLICATIONES MATHEMATICAE

28,1 (2001), pp. 111–123

Nadine Hilgert (México and Montpellier) and


Onésimo Hernández-Lerma (México)

LIMITING AVERAGE COST CONTROL PROBLEMS


IN A CLASS OF DISCRETE-TIME STOCHASTIC SYSTEMS

Abstract. We consider a class of $\mathbb{R}^d$-valued stochastic control systems,
with possibly unbounded costs. The systems evolve according to a discrete-time
equation $x_{t+1} = G_n(x_t, a_t) + \xi_t$ ($t = 0, 1, \ldots$), for each fixed
$n = 0, 1, \ldots$, where the $\xi_t$ are i.i.d. random vectors, and the $G_n$ are
given functions converging pointwise to some function $G_\infty$ as $n \to \infty$.
Under suitable hypotheses, our main results state the existence of stationary
control policies that are expected average cost (EAC) optimal and sample path
average cost (SPAC) optimal for the limiting control system
$x_{t+1} = G_\infty(x_t, a_t) + \xi_t$ ($t = 0, 1, \ldots$).

1. Introduction. We are concerned with a discrete-time $\mathbb{R}^d$-valued
stochastic control process evolving according to the system equation
(1) $x_{t+1} = G_n(x_t, a_t) + \xi_t \quad (t \in \mathbb{N})$
(with $\mathbb{N} := \{0, 1, \ldots\}$), for each fixed $n \in \mathbb{N}$, where $x_t$ and $a_t$ denote the state
and control variables, respectively, and $\{\xi_t\}$ is a sequence of independent
and identically distributed (i.i.d.) random vectors. Suppose that $\{G_n\}$ is a
sequence of given functions converging pointwise to some function $G_\infty$, that
is,
(2) $G_n(x, a) \to G_\infty(x, a)$ for all $(x, a)$,
and consider the limiting control system
(3) $x_{t+1} = G_\infty(x_t, a_t) + \xi_t \quad (t \in \mathbb{N})$.

2000 Mathematics Subject Classification: 93E20, 90C40.


Key words and phrases: non-homogeneous Markov control processes, discrete-time
stochastic systems, long-run average cost criteria, discounted cost, optimal policies.
The research of the second author was partially supported by the Consejo Nacional
de Ciencia y Tecnología (CONACYT, México) grant 32299-E.


In this paper we give conditions for the existence of expected average cost
(EAC) optimal and sample path average cost (SPAC) optimal policies for
the limiting control system (3).
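To make the setting concrete, the following is a minimal simulation sketch of the recursion (1) and of its limit (3). Everything specific in it is an assumption made for illustration and is not taken from the paper: the scalar drift `G`, the clipped linear feedback `policy`, and the standard normal noise.

```python
import numpy as np


def G(n, x, a):
    """Hypothetical drift G_n(x, a); n=None stands for the limit G_infinity."""
    slope = 0.5 if n is None else 0.5 + 0.4 / (n + 1)
    return slope * x + a


def policy(x):
    """Hypothetical stationary policy with compact action set A(x) = [-1, 1]."""
    return float(np.clip(-0.2 * x, -1.0, 1.0))


def simulate(n, x0, horizon, rng):
    """Simulate x_{t+1} = G_n(x_t, a_t) + xi_t with i.i.d. N(0, 1) noise xi_t."""
    xs = np.empty(horizon + 1)
    xs[0] = x0
    for t in range(horizon):
        xs[t + 1] = G(n, xs[t], policy(xs[t])) + rng.standard_normal()
    return xs


if __name__ == "__main__":
    # Same seed for every n, so the paths differ only through the drift G_n.
    for n in (0, 10, 1000, None):
        path = simulate(n, x0=1.0, horizon=50, rng=np.random.default_rng(0))
        label = "inf" if n is None else n
        print(f"n={label}: x_50 = {path[-1]:.4f}")
```

In this sketch the index n is held fixed along each path, as in (1); letting the drift index move with time would instead give the time-varying system (4).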
The motivation to study models of the type (1) comes from our interest
in time-varying control systems of the form
(4) $x_{n+1} = G_n(x_n, a_n) + \xi_n \quad (n \in \mathbb{N})$,
where the drift term $G_n(x, a)$ tends to “stabilize” as in (2). Systems of this
kind appear, for example, in the modelling of some biotechnological
processes ([2, 10]).
Our study is partly a sequel to [9], where we considered related
$\alpha$-discounted cost ($\alpha$-DC) problems, for $\alpha \in (0, 1)$. In fact, our approach in the
present average cost (AC) case is similar to that in [9] in the sense that we
first consider the AC problems for the time-invariant system for each fixed
$n \in \mathbb{N}$, and then we let $n \to \infty$ to obtain the corresponding AC results for
the limiting system (3). A key fact to get the latter results is that, from (2),
we obtain the convergence
(5) $Q_n(B \mid x, a) \to Q_\infty(B \mid x, a)$,
where, for each $n \in \mathbb{N} \cup \{\infty\}$, $Q_n$ denotes the transition law of (1), namely,
(6) $Q_n(B \mid x, a) := \mathrm{Prob}(x_{t+1} \in B \mid x_t = x, a_t = a) \quad (t \in \mathbb{N})$.
(A precise statement of (5) is given in Lemma 10(iii).)
Still another connection with [9] is that the existence of EAC optimal
policies is obtained via the well known “vanishing discount” approach (see,
for instance, [1, 3, 4, 5], [6, Chapter 4], [7, Chapter 10]). Thus an important
ingredient in the proof of Theorem 4 below is the α-DC optimality equation
in [9, Theorem 3] for n = ∞. Finally, the existence of SPAC optimal policies
is obtained as a consequence of Theorem 4 and recent results in [8].
The remainder of the paper is organized as follows. In §2 we introduce
some terminology and the AC criteria we are interested in. In §3 we present
the hypotheses and state our main results, Theorem 4 and Corollary 8, which
are proved in §4.

2. Markov control models. For each fixed $n = 0, 1, \ldots, \infty$, we first
introduce the Markov control model $M_n$ associated with the system (1).
That is,
(7) $M_n := (X, A, \{A(x) \mid x \in X\}, Q_n, c)$ for $n = 0, 1, \ldots, \infty$,
with state space $X = \mathbb{R}^d$ (or a Borel subset of $\mathbb{R}^d$) and control (or action)
set $A$, which is assumed to be a Borel space, namely, a Borel subset of some
complete separable metric space. The spaces $X$ and $A$ are endowed with
their Borel $\sigma$-algebras $\mathcal{B}(X)$, $\mathcal{B}(A)$. For each $x \in X$, the set $A(x) \in \mathcal{B}(A)$
in (7) stands for the (nonempty) set of admissible controls (or actions) when
the state is $x$. Moreover, we suppose that the set
$K = \{(x, a) \mid x \in X, a \in A(x)\}$
of admissible state-action pairs is a Borel subset of $X \times A$. Finally, $Q_n$
denotes the transition law in (6), defined for all $B \in \mathcal{B}(X)$ and $(x, a) \in K$,
whereas $c : K \to \mathbb{R}$ is a measurable function that represents the cost-per-stage.
Let $\Pi$ be the set of all (randomized, history-dependent) admissible control
policies. If necessary, see [1, 3, 6, 7, 8] for further information on those
policies. Let $F$ be the set of all measurable functions $f : X \to A$ with values
$f(x)$ in $A(x)$ for all $x \in X$. As usual, for Markov control processes, we will
identify $F$ with the subfamily $\Pi_{DS}$ of $\Pi$ that consists of the (deterministic)
stationary policies. (The Borel measurability of $K$ and Assumption 2(a)
below ensure that $F$, and hence $\Pi$, is nonempty.)
For notational ease, let us define
$c(x, f) := c(x, f(x))$ and $Q_n(\cdot \mid x, f) := Q_n(\cdot \mid x, f(x))$,
for all $f \in F$, $x \in X$ and $n = 0, 1, \ldots, \infty$.
AC criteria. For each fixed $n = 0, 1, \ldots, \infty$, let $J_n^0(\pi, x)$ be the long-run
sample path average cost (SPAC) using the policy $\pi \in \Pi$, for the initial
state $x_0 = x$, when the control model is $M_n$ in (7); see also (1). That is,
(8) $J_n^0(\pi, x) := \limsup_{t \to \infty} \frac{1}{t} \sum_{i=0}^{t-1} c(x_i, a_i)$.
A policy $\pi^* \in \Pi$ is said to be SPAC optimal for the control model $M_n$ if
there exists a constant $\hat{\varrho}_n$ such that
(9) $J_n^0(\pi^*, x) = \hat{\varrho}_n$ $P_x^{(n)\pi^*}$-a.s. $\forall x \in X$, and
$J_n^0(\pi, x) \ge \hat{\varrho}_n$ $P_x^{(n)\pi}$-a.s. $\forall \pi \in \Pi$, $x \in X$,
where $P_x^{(n)\pi}$ denotes the probability measure corresponding to $M_n$ when
using the policy $\pi$, for the initial state $x_0 = x$. The constant $\hat{\varrho}_n$ is called the
optimal sample path average cost for the model $M_n$.
The “expected” analogue of (8) is the long-run expected average cost
(EAC):
(10) $J_n(\pi, x) := \limsup_{t \to \infty} \frac{1}{t} E_x^{(n)\pi} \Big[ \sum_{i=0}^{t-1} c(x_i, a_i) \Big]$.
The corresponding value function is
(11) $J_n^*(x) := \inf_{\pi \in \Pi} J_n(\pi, x) \quad \forall x \in X$.
A policy $\pi^* \in \Pi$ is said to be EAC optimal for the control model $M_n$ if it
attains the infimum in (11).
In the following section we show the existence of stationary policies
that are SPAC and EAC optimal for the model $M_n$, for each fixed
$n = 0, 1, \ldots, \infty$, and the convergence of the optimal costs $J_n^*$ and $\hat{\varrho}_n$ as $n \to \infty$.
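The two criteria (8) and (10) can be compared numerically on a toy model. The sketch below is only an illustration under assumptions that are not in the paper: scalar dynamics $x_{t+1} = 0.5 x_t + a_t + \xi_t$ with standard normal noise, the quadratic cost `cost`, and the stationary policy `policy` with $f(x) = 0$. Finite horizons stand in for the upper limits in (8) and (10).

```python
import numpy as np


def cost(x, a):
    """Hypothetical cost-per-stage c(x, a) = x^2 + a^2 (unbounded in x)."""
    return x**2 + a**2


def policy(x):
    """Hypothetical stationary policy f(x) = 0."""
    return np.zeros_like(x)


def time_averaged_costs(n_paths, horizon, rng, x0=0.0):
    """Return (1/T) * sum_{i<T} c(x_i, a_i) for n_paths independent paths."""
    x = np.full(n_paths, x0, dtype=float)
    total = np.zeros(n_paths)
    for _ in range(horizon):
        a = policy(x)
        total += cost(x, a)
        x = 0.5 * x + a + rng.standard_normal(n_paths)  # assumed dynamics
    return total / horizon


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # One long path: finite-horizon proxy for the sample path average cost (8).
    spac_estimate = time_averaged_costs(1, 100_000, rng)[0]
    # Many short paths, averaged: Monte Carlo proxy for the expected cost (10).
    eac_estimate = time_averaged_costs(2_000, 200, rng).mean()
    print(spac_estimate, eac_estimate)
```

For this linear example the stationary second moment of the state is $1/(1 - 0.25) = 4/3$, so both estimates should be close to $4/3$.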

3. Main results. We shall require three different sets of hypotheses.
The first one, Assumption 1, refers to the system (4):
Assumption 1. (a) For each $n \in \mathbb{N}$, the function $G_n : K \to X$ is
continuous, and, furthermore, there exists a continuous function $G_\infty : K \to X$
for which (2) holds for all $(x, a) \in K$.
(b) The $X$-valued random disturbances $\xi_t$ are i.i.d., and their common
distribution is absolutely continuous (with respect to the Lebesgue measure)
with a continuous, bounded and positive density $\gamma$.
The next two assumptions are more specific to the control model (7),
and have already been used in previous works; see for instance [3, 8, 15] or
[7, Chapter 10].
Assumption 2. For each state $x \in X$:
(a) $A(x)$ is a (nonempty) compact subset of $A$, and
(b) $c(x, \cdot)$ is lower semicontinuous (l.s.c.) on $A(x)$.
(c) Moreover, $c(x, a)$ is nonnegative, and there exist a constant $c_1 \ge 0$
and a measurable function $w \ge 1$ on $X$ such that, for each $x \in X$ and $n \in \mathbb{N}$,
$\sup_{a \in A(x)} c(x, a) \le c_1 w(x)$.
Notation. Let $w(\cdot)$ be as in Assumption 2, and let $B_w(X)$ be the Banach
space of measurable functions $u : X \to \mathbb{R}$ with a finite $w$-norm $\|u\|_w$,
which is defined as
(12) $\|u\|_w := \sup_{x \in X} |u(x)| / w(x)$.
Further, $B_0(X) \subset B_w(X)$ denotes the subspace of bounded measurable
functions on $X$. We write an integral as $\mu(u) := \int u \, d\mu$, whenever it is well
defined.
Assumption 3. There exist a probability measure $\nu$ and two constants
$\gamma > 0$ and $\beta \in (0, 1)$ for which the following holds: for each $f \in F$ and
$n \in \mathbb{N}$, there is a measurable function $0 \le l_{f,n}(\cdot) \le 1$ such that
(a) $Q_n(B \mid x, f) \ge l_{f,n}(x) \nu(B) \quad \forall x \in X, B \in \mathcal{B}(X)$;
(b) $\nu(l_{f,n}) := \int_X l_{f,n} \, d\nu \ge \gamma$;
(c) $\nu(w) := \int_X w \, d\nu = \|\nu\|_w < \infty$; and
(d) for all $x \in X$,
(13) $\int_X w(y) \, Q_n(dy \mid x, f) \le \beta w(x) + l_{f,n}(x) \nu(w)$.
For a discussion of Assumption 3, as well as related ergodicity conditions
and examples, see, for instance, [7, §10.2.C] or [8, Remark 6.1]. Similar
ergodicity conditions have been used by other authors, in particular [11, 14]
or [16, 17].
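As a rough illustration of the drift inequality (13), the sketch below checks only its weaker consequence $\int w \, dQ \le \beta w(x) + \text{const}$ (the form that appears later as (28)) on a grid, for a toy model. It does not verify the minorization conditions (a) and (b) of Assumption 3. The model is an assumption made for this example: $G(x, a) = \theta x + a$ with $a \in [-1, 1]$, standard normal noise, and $w(x) = 1 + x^2$; the constants `BETA` and `B` are hand-picked candidates.

```python
import numpy as np

THETA = 0.5   # assumed drift slope, |THETA| < 1
BETA = 0.5    # candidate contraction constant in the drift inequality
B = 4.0       # candidate additive constant (not sharp)


def w(x):
    """Assumed weight function w(x) = 1 + x^2 >= 1."""
    return 1.0 + x**2


def expected_w_next(x, a):
    """E[w(THETA*x + a + xi)] for xi ~ N(0, 1): 1 + mean^2 + variance."""
    return 1.0 + (THETA * x + a) ** 2 + 1.0


def drift_gap(xs, actions):
    """BETA*w(x) + B - E[w(x_{t+1}) | x_t = x, a_t = a] on a grid."""
    X, A = np.meshgrid(xs, actions, indexing="ij")
    return BETA * w(X) + B - expected_w_next(X, A)


if __name__ == "__main__":
    gap = drift_gap(np.linspace(-50.0, 50.0, 2001), np.linspace(-1.0, 1.0, 41))
    print("smallest gap on the grid:", gap.min())
```

A nonnegative gap everywhere on the grid is consistent with the inequality for these constants; here it can also be confirmed by hand, since the gap equals $2.5 + 0.25 x^2 - a x - a^2 \ge 0.5 + (0.5|x| - 1)^2$ for $|a| \le 1$.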
We now state our first result, which is proved in §4.
Theorem 4. Suppose that Assumptions 1 to 3 hold. Then:
(a) For each $n = 0, 1, \ldots, \infty$, there exist a constant $\varrho_n$, a nonnegative
function $h_n$ in $B_w(X)$, and a stationary policy $f_n \in F$ such that, for each
state $x \in X$, the Average Cost Optimality Inequality (ACOI) holds, i.e.,
(14) $\varrho_n + h_n(x) \ge \min_{a \in A(x)} \Big[ c(x, a) + \int_X h_n(y) \, Q_n(dy \mid x, a) \Big]$,
and, moreover, $f_n(x) \in A(x)$ attains the minimum in (14), so that
(15) $\varrho_n + h_n(x) \ge c(x, f_n) + \int_X h_n(y) \, Q_n(dy \mid x, f_n)$.
In addition, $f_n$ is EAC optimal for the model $M_n$, with $\varrho_n$ being the optimal
value:
(16) $\varrho_n = J_n^*(x) \quad \forall x \in X$.
(b) The sequence $\{\varrho_n\}$ converges to $\varrho_\infty$ as $n \to \infty$.
Remark 5. The ACOI (14) is not the same as in the standard EAC
optimality results. Indeed, as Assumptions 1 to 3 are stated for each finite
n ∈ N, it is well known that the ACOI (14) holds for finite n ∈ N. The
interesting point of Theorem 4 is that (14) also holds for n = ∞.
Remark 6. Theorem 4 is proved in §4 via the $\alpha$-DC optimality (see [9]),
using the “vanishing discount” approach (see [1, 3, 4, 6, 7]), so let us recall
the definition of the $\alpha$-DC cost, $\alpha \in (0, 1)$, for the control model $M_n$, using
the policy $\pi \in \Pi$ with the initial state $x_0 = x$:
(17) $V_n^\alpha(\pi, x) := E_x^{(n)\pi} \Big[ \sum_{t=0}^{\infty} \alpha^t c(x_t, a_t) \Big]$.
The corresponding optimal $\alpha$-DC function is
(18) $V_n^\alpha(x) := \inf_{\pi \in \Pi} V_n^\alpha(\pi, x)$.
It will be shown that, for all $x \in X$ and $n = 0, 1, \ldots, \infty$,
(19) $\varrho_n = J_n^*(x) = \limsup_{\alpha \uparrow 1} (1 - \alpha) V_n^\alpha(x)$.
Result (b) of Theorem 4 can then be viewed as a further result of [9], where
the convergence of $V_n^\alpha(x)$ to $V_\infty^\alpha(x)$ was proved for all $x \in X$ and each fixed
$\alpha \in (0, 1)$.
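The relation (19) between discounted and average costs can be observed on a small finite model. The sketch below is not the paper's Borel-space setting: it assumes a hypothetical two-state, two-action Markov decision process (`P`, `C`), computes $V^\alpha$ by value iteration on the discounted optimality equation, and compares $(1 - \alpha) V^\alpha(x)$ with the optimal average cost obtained by enumerating the deterministic stationary policies.

```python
import itertools

import numpy as np

# Hypothetical finite model: P[s, a, s'] transition probabilities, C[s, a] costs.
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.4, 0.6], [0.8, 0.2]]])
C = np.array([[1.0, 0.5],
              [3.0, 4.0]])


def discounted_value(alpha, tol=1e-10, max_iter=200_000):
    """Value iteration for V(x) = min_a [c(x, a) + alpha * sum_y V(y) P(y | x, a)]."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        V_new = (C + alpha * P @ V).min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def optimal_average_cost():
    """Minimum over deterministic stationary policies f of the stationary mean cost."""
    n_states, n_actions = C.shape
    idx = np.arange(n_states)
    best = np.inf
    for f in itertools.product(range(n_actions), repeat=n_states):
        P_f = P[idx, list(f)]            # transition matrix under f
        c_f = C[idx, list(f)]            # cost vector under f
        # Invariant distribution: mu P_f = mu together with sum(mu) = 1.
        A = np.vstack([P_f.T - np.eye(n_states), np.ones(n_states)])
        rhs = np.append(np.zeros(n_states), 1.0)
        mu = np.linalg.lstsq(A, rhs, rcond=None)[0]
        best = min(best, float(mu @ c_f))
    return best


if __name__ == "__main__":
    print("optimal average cost:", optimal_average_cost())
    for alpha in (0.5, 0.9, 0.99, 0.999):
        print(alpha, (1.0 - alpha) * discounted_value(alpha))
```

In this toy model every policy induces an irreducible chain, so the average cost does not depend on the initial state and the rescaled discounted values approach it from every state as $\alpha \uparrow 1$.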
As a consequence of Theorem 4, we may show the existence of SPAC
optimal policies under the following assumption.
Assumption 7. There exists a constant $c_2 \ge 0$ such that
$c^2(x, a) \le c_2 w(x) \quad \forall (x, a) \in K$.
With this additional assumption we have:
Corollary 8. Suppose that Assumptions 1, 2, 3 and 7 are satisfied.
Then, for each $n = 0, 1, \ldots, \infty$, a stationary policy is EAC optimal for the
model $M_n$ if and only if it is SPAC optimal for $M_n$; hence, by Theorem 4,
there exists a policy $f_n \in F$ that is SPAC optimal for $M_n$ and, furthermore,
$\varrho_n$ is the optimal sample path average cost, that is, $\varrho_n = \hat{\varrho}_n$ where $\hat{\varrho}_n$ is the
constant in (9). In particular,
$\hat{\varrho}_\infty = \lim_{n \to \infty} \varrho_n = \lim_{n \to \infty} \limsup_{\alpha \uparrow 1} (1 - \alpha) V_n^\alpha(x) \quad \forall x \in X$.

4. Proofs. Before proving Theorem 4 itself we shall state some preliminary
facts.
Lemma 9. For each fixed $n$ in $\mathbb{N}$, the state process $\{x_t\}$ ($t = 0, 1, \ldots$),
following the model $M_n$ under a stationary policy $f \in F$, is a Markov chain
that, under Assumption 1, is aperiodic and $\lambda$-irreducible, where $\lambda$ denotes
the Lebesgue measure. If Assumption 3 is also satisfied, then the chain is
positive Harris-recurrent. We denote by $\mu_{f,n}$ the corresponding (unique)
invariant probability measure (i.p.m.), which satisfies $\mu_{f,n}(w) < \infty$ and the
following property: there exist two constants $R \ge 0$ and $r \in (0, 1)$ such that
(20) $\Big| \int_X u(y) \, Q_n^t(dy \mid x, f) - \mu_{f,n}(u) \Big| \le R r^t \|u\|_w w(x)$
for all $f \in F$, $x \in X$, $u \in B_w(X)$, $n \in \mathbb{N}$ and $t = 0, 1, \ldots$, which means, in
other words, that $\{x_t\}$ is $w$-geometrically ergodic.
Proof. Choose arbitrary $n$ in $\mathbb{N}$ and $f$ in $F$. Then, by Assumption 1(b),
we may write $Q_n$ as
(21) $Q_n(B \mid x, f) = \int_B \gamma[s - G_n(x, f(x))] \, ds$.
As the density $\gamma$ is positive, $Q_n(B \mid x, f) > 0$ for all $x \in X$ if $\lambda(B) > 0$.
Moreover, it is easy to check that $\{x_t\}$ has the weak Feller property, and
$\{x_t\}$ is, therefore, $\lambda$-irreducible.
The aperiodicity is a straightforward consequence of the $\lambda$-irreducibility
and the fact that $\gamma$ is positive.
The positive Harris-recurrence is proved, for example, in [15, Lemma 4.1],
and the $w$-geometric ergodicity of $\{x_t\}$ is due to [3, Lemmas 3.3 and 3.4],
which follows ideas of [12, 13]. Furthermore, the fact that $R$ and $r$ are
independent of $f$ comes directly from the latter references, while the
independence of $n$ results from Assumption 3, because $\gamma$, $\beta$ and $\nu$ are supposed
to be independent of $n$, and $\{l_{f,n}\}$ is uniformly bounded in $n$.
Lemma 10. (a) If Assumption 1 holds, then for each $n = 0, 1, \ldots, \infty$:
(i) The transition law $Q_n$ is strongly continuous, that is, $Q_n(B \mid \cdot)$ is
continuous on $K$ for every Borel set $B \in \mathcal{B}(X)$. (Equivalently, $\int u(y) \, Q_n(dy \mid \cdot)$
is continuous on $K$ for each function $u$ in $B_0(X)$.)
(ii) The function $u'(\cdot) := \int u(y) \, Q_n(dy \mid \cdot)$ is l.s.c. on $K$ for every
nonnegative function $u$ in $B_w(X)$; in particular, $u'(x, \cdot)$ is l.s.c. on $A(x)$ for
each $x \in X$.
(iii) $Q_n(\cdot \mid x, a)$ converges setwise to $Q_\infty(\cdot \mid x, a)$ as $n \to \infty$; that is, (5)
holds for each $B \in \mathcal{B}(X)$.
(b) As a consequence of (ii) and Assumption 2(b), for each $n = 0, 1, \ldots, \infty$,
$x \in X$ and $u \in B_w(X)$ nonnegative, the function $c(x, \cdot) + \int u(y) \, Q_n(dy \mid x, \cdot)$
is l.s.c. on $A(x)$.
Proof. Part (i) follows from (21) and the continuity of $\gamma$ and $G_n$. The
Monotone Convergence Theorem and the Dominated Convergence Theorem
are respectively used to prove (ii) and (iii). See [9].
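The representation (21) and the convergence in Lemma 10(iii) can be illustrated in one dimension. The sketch below assumes, purely for illustration, a standard normal density for $\gamma$, the hypothetical drift `G`, and sets $B$ that are intervals $[lo, hi]$, for which the integral in (21) reduces to a difference of normal distribution functions. Setwise convergence concerns all Borel sets; intervals are only the easiest case to compute.

```python
import math


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def G(n, x, a):
    """Hypothetical drift G_n(x, a); n=None stands for the limit G_infinity."""
    slope = 0.5 if n is None else 0.5 + 0.4 / (n + 1)
    return slope * x + a


def Q(n, lo, hi, x, a):
    """Q_n([lo, hi] | x, a) = integral over [lo, hi] of gamma(s - G_n(x, a)) ds."""
    m = G(n, x, a)
    return Phi(hi - m) - Phi(lo - m)


if __name__ == "__main__":
    limit = Q(None, 0.0, 1.0, 2.0, 0.3)
    for n in (1, 10, 100, 1000):
        print(n, abs(Q(n, 0.0, 1.0, 2.0, 0.3) - limit))
```

The printed gaps shrink as n grows because $G_n(x, a) \to G_\infty(x, a)$ and the normal distribution function is continuous, which is the mechanism behind part (iii) in this special case.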
Lemma 11. Under Assumptions 1 to 3:
(a) Assumption 3 holds for $n = \infty$; that is, for each $f$ in $F$, there exists
a measurable function $0 \le l_{f,\infty}(\cdot) \le 1$ that satisfies the inequalities (a), (b)
and (d) of Assumption 3.
(b) The results of Lemma 9 hold for $n = \infty$.
Proof. (a) We will show that $l_{f,\infty} := \limsup_{n \to \infty} l_{f,n}$ satisfies
inequalities (a), (b) and (d) in Assumption 3.
Let $B \in \mathcal{B}(X)$, $x \in X$, and $f \in F$. Then, from Lemma 10(iii), taking
the upper limit in Assumption 3(a) we deduce that
$Q_\infty(B \mid x, f) \ge l_{f,\infty}(x) \nu(B)$.
Similarly, taking the upper limit in Assumption 3(b), by Fatou’s Lemma we
get
$\gamma \le \limsup_{n \to \infty} \int_X l_{f,n}(x) \, \nu(dx) \le \int_X l_{f,\infty}(x) \, \nu(dx)$.
Now, to prove (13) for $n = \infty$, let $\{u_k\}$ be a nondecreasing sequence in
$B_0(X)$ such that $u_k(x) \uparrow w(x)$ for all $x \in X$. Choose arbitrary $x \in X$ and
$f \in F$. Then, by (13), for each $k$, $n$ in $\mathbb{N}$,
$\int_X u_k(y) \, Q_n(dy \mid x, f) \le \int_X w(y) \, Q_n(dy \mid x, f) \le \beta w(x) + l_{f,n}(x) \nu(w)$.
Taking the lower limit as $n \to \infty$, we infer by Lemma 10(iii) and an extension
of Fatou’s Lemma (see [7, Lemma 8.3.7]) that
$\int_X u_k(y) \, Q_\infty(dy \mid x, f) \le \liminf_{n \to \infty} \int_X u_k(y) \, Q_n(dy \mid x, f)$
$\le \beta w(x) + \liminf_{n \to \infty} l_{f,n}(x) \nu(w)$
$\le \beta w(x) + l_{f,\infty}(x) \nu(w)$.
Thus, letting $k \to \infty$, we obtain (by monotone convergence)
$\int_X w(y) \, Q_\infty(dy \mid x, f) \le \beta w(x) + l_{f,\infty}(x) \nu(w)$.
(b) This follows from (a) and Assumption 1.


Choose arbitrary $B \in \mathcal{B}(X)$, $n = 0, 1, \ldots, \infty$, and $f \in F$. Let $\mu_{n,f}$ be
the unique i.p.m. of $Q_n(B \mid x, f)$ in Lemmas 9 and 11(b), i.e.,
(22) $\mu_{n,f}(\cdot) = \int_X Q_n(\cdot \mid x, f) \, \mu_{n,f}(dx)$.
The following Lemma 12 proves the convergence of $\mu_{n,f}$, as $n \to \infty$, for all
$f \in F$. To state it, for each $\alpha \in (0, 1)$, let us first define the $\alpha$-discounted
cost $V_n^\alpha(f, x, I_B)$ when $c(x, a)$ is replaced by the indicator function $I_B(\cdot)$ of
$B \in \mathcal{B}(X)$, i.e.,
(23) $V_n^\alpha(f, x, I_B) := \sum_{t=0}^{\infty} \alpha^t E_x^{(n)f} I_B(x_t) = \sum_{t=0}^{\infty} \alpha^t Q_n^t(B \mid x, f)$
for all $x \in X$.
Lemma 12. (a) Under Assumption 1, for all $f \in F$, $x \in X$, $B \in \mathcal{B}(X)$
and $\alpha \in (0, 1)$,
(24) $V_n^\alpha(f, x, I_B) \to V_\infty^\alpha(f, x, I_B)$ as $n \to \infty$.
(b) Moreover, if in addition Assumptions 2 and 3 hold, then
(25) $\mu_{n,f}(B) \to \mu_{\infty,f}(B)$ as $n \to \infty$.
Proof. (a) By Lemma 10(iii), and by the formula
$Q_n^t(B \mid x, f) = \int_X Q_n^{t-1}(B \mid y, f) \, Q_n(dy \mid x, f), \quad t \ge 1$,
a straightforward induction argument and the Bounded Convergence Theorem
give
$\lim_{n \to \infty} Q_n^t(B \mid x, f) = Q_\infty^t(B \mid x, f), \quad t = 0, 1, \ldots$
Therefore, by (23) and the Bounded Convergence Theorem again, we
get (24).
(b) Let $B \in \mathcal{B}(X)$, $n = 0, 1, \ldots, \infty$, $f \in F$ and $x \in X$.
From the definition (23) of $V_n^\alpha(f, x, I_B)$ and a straightforward calculation
(or from formula (5.3.10) in [6] with $\mu_{n,f}(B)$ in lieu of $J(\pi, x)$), we get
$V_n^\alpha(f, x, I_B) = \frac{\mu_{n,f}(B)}{1 - \alpha} + (1 - \alpha) \sum_{t=1}^{\infty} \alpha^{t-1} \Big[ \sum_{k=0}^{t-1} Q_n^k(B \mid x, f) - t \mu_{n,f}(B) \Big]$,
which can be written as
$\mu_{n,f}(B) = (1 - \alpha) V_n^\alpha(f, x, I_B) - (1 - \alpha)^2 \sum_{t=1}^{\infty} \alpha^{t-1} \sum_{k=0}^{t-1} \big( Q_n^k(B \mid x, f) - \mu_{n,f}(B) \big)$.
To estimate $|\mu_{n,f}(B) - \mu_{\infty,f}(B)|$, we write $\mu_{n,f}(B) - \mu_{\infty,f}(B) = I_{\alpha,n} + II_{\alpha,n}$,
where
$I_{\alpha,n} = (1 - \alpha) [V_n^\alpha(f, x, I_B) - V_\infty^\alpha(f, x, I_B)]$,
and
$II_{\alpha,n} = -(1 - \alpha)^2 \sum_{t=1}^{\infty} \alpha^{t-1} \sum_{k=0}^{t-1} \{ (Q_n^k(B \mid x, f) - \mu_{n,f}(B)) - (Q_\infty^k(B \mid x, f) - \mu_{\infty,f}(B)) \}$.
From part (a), we see that for each $\alpha \in (0, 1)$, $I_{\alpha,n} \to 0$ as $n \to \infty$.
Moreover, taking $u(\cdot) = I_B(\cdot)$ in (20) gives
$|Q_n^k(B \mid x, f) - \mu_{f,n}(B)| \le R r^k w(x) \quad \forall k = 0, 1, \ldots$
We then deduce that
$|II_{\alpha,n}| \le (1 - \alpha)^2 \sum_{t=1}^{\infty} \alpha^{t-1} \sum_{k=0}^{t-1} 2 R r^k w(x) \le (1 - \alpha) \frac{2R}{1 - r} w(x)$.
As $\alpha$ is arbitrary, letting $\alpha \to 1$, we obtain the convergence of $|II_{\alpha,n}|$ to 0,
for all $n$, which finally gives (25).
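The identity expressing the invariant mass of $B$ through the discounted cost of the indicator $I_B$ can be checked numerically on a finite chain. The sketch below is a sanity check under assumptions that are not in the paper: a hypothetical three-state ergodic transition matrix `P` plays the role of $Q_n(\cdot \mid x, f)$, the set `B` is $\{1, 2\}$, and the outer series is truncated at `n_terms` terms.

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])     # hypothetical ergodic transition matrix
IND_B = np.array([0.0, 1.0, 1.0])   # indicator of the set B = {1, 2}


def stationary_mass_of_B():
    """mu(B) for the invariant distribution mu, from mu P = mu and sum(mu) = 1."""
    k = P.shape[0]
    A = np.vstack([P.T - np.eye(k), np.ones(k)])
    rhs = np.append(np.zeros(k), 1.0)
    mu = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return float(mu @ IND_B)


def discounted_indicator_cost(alpha):
    """V^alpha(x, I_B) = sum_t alpha^t (P^t I_B)(x), computed as (I - alpha P)^{-1} I_B."""
    return np.linalg.solve(np.eye(P.shape[0]) - alpha * P, IND_B)


def correction_term(alpha, mu_B, n_terms=5000):
    """(1-alpha)^2 * sum_{t>=1} alpha^(t-1) * sum_{k<t} (P^k I_B - mu_B), truncated."""
    q = IND_B.copy()              # q holds P^k I_B, starting at k = 0
    inner = np.zeros_like(q)      # running sum over k < t
    outer = np.zeros_like(q)
    for t in range(1, n_terms + 1):
        inner += q - mu_B
        outer += alpha ** (t - 1) * inner
        q = P @ q
    return (1.0 - alpha) ** 2 * outer


if __name__ == "__main__":
    mu_B = stationary_mass_of_B()
    for alpha in (0.5, 0.9, 0.99):
        rhs = (1.0 - alpha) * discounted_indicator_cost(alpha) - correction_term(alpha, mu_B)
        print(alpha, rhs, mu_B)
```

For each discount factor, every component of the reconstructed vector should agree with $\mu(B)$ up to rounding, whatever the initial state.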
Lemma 13. Under Assumptions 1 to 3, for all $f \in F$, $x \in X$ and
$n = 0, 1, \ldots, \infty$,
(26) $J_n(f) := \int_X c(y, f) \, \mu_{f,n}(dy) = \lim_{N \to \infty} \frac{1}{N} E_x^{(n)f} \Big[ \sum_{i=0}^{N-1} c(x_i, f) \Big]$.
Proof. By Assumption 2(c), $c(\cdot, f)$ is in $B_w(X)$ for all $f \in F$. Then (26)
follows from (20) with $u(\cdot) := c(\cdot, f)$.
Lemma 14. Under Assumptions 1 to 3, for all $u \in B_w(X)$, $x \in X$,
$\pi \in \Pi$ and $n = 0, 1, \ldots, \infty$,
(27) $\lim_{N \to \infty} \frac{1}{N} \sup_{\pi} E_x^{(n)\pi} |u(x_N)| = 0$.
Proof. Choose arbitrary $u \in B_w(X)$, $x \in X$, $\pi \in \Pi$, $t \in \mathbb{N}$, and
$n = 0, 1, \ldots, \infty$. Observe that (13) in Assumption 3 yields
(28) $\int_X w(y) \, Q_n(dy \mid x, a) \le \beta w(x) + \bar{\nu} \quad \forall x \in X, a \in A(x)$,
where $\bar{\nu} := \|\nu\|_w$, and so
$E_x^{(n)\pi} w(x_t) \le \beta E_x^{(n)\pi} w(x_{t-1}) + \bar{\nu}$.
Iteration of this inequality gives
(29) $E_x^{(n)\pi} w(x_t) \le \beta^t w(x) + \bar{\nu} \sum_{i=0}^{t-1} \beta^i \le \Big( 1 + \frac{\bar{\nu}}{1 - \beta} \Big) w(x)$
as $w(\cdot) \ge 1$. On the other hand, from (12) we have
$E_x^{(n)\pi} |u(x_t)| \le \|u\|_w E_x^{(n)\pi} w(x_t)$,
which together with (29) implies (27).
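The iteration leading to (29) is elementary, and a few lines of code make the bound tangible. The sketch below iterates the equality case of the recursion, $m_t = \beta m_{t-1} + \bar{\nu}$ with $m_0 = w(x)$, and compares it with the closed form and with the uniform bound in (29). The numerical values of `beta`, `nu_bar` and `w_x` are arbitrary assumptions chosen for the example.

```python
def worst_case_moments(w_x, beta, nu_bar, horizon):
    """Iterate m_t = beta * m_{t-1} + nu_bar from m_0 = w(x) (equality case)."""
    moments = [w_x]
    for _ in range(horizon):
        moments.append(beta * moments[-1] + nu_bar)
    return moments


def closed_form(w_x, beta, nu_bar, t):
    """beta^t * w(x) + nu_bar * sum_{i<t} beta^i, using the geometric sum."""
    return beta**t * w_x + nu_bar * (1.0 - beta**t) / (1.0 - beta)


def uniform_bound(w_x, beta, nu_bar):
    """(1 + nu_bar / (1 - beta)) * w(x); relies on w(x) >= 1."""
    return (1.0 + nu_bar / (1.0 - beta)) * w_x


if __name__ == "__main__":
    w_x, beta, nu_bar = 3.0, 0.8, 2.0   # hypothetical values with w(x) >= 1
    moments = worst_case_moments(w_x, beta, nu_bar, horizon=100)
    print("largest m_t:", max(moments))
    print("uniform bound:", uniform_bound(w_x, beta, nu_bar))
    print("m_100 / 100:", moments[-1] / 100)
```

Since the bound does not depend on $t$ or on the policy, dividing by $N$ sends the normalized moment to zero, which is the content of (27).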
We are now ready for the proof of Theorem 4.
Proof of Theorem 4. (a) With Lemmas 9, 11 and 14 and the results in [9]
at hand, the proof of part (a), following the “vanishing discount” approach,
is more or less standard—see [1, 3, 4, 5] and [7, Theorem 10.3.1]. Hence we
shall only give the main ideas.
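Let $\alpha \in (0, 1)$, $x \in X$, and $n = 0, 1, \ldots, \infty$, and define
(30) $M_n(\alpha) := \inf_{x \in X} V_n^\alpha(x)$,
(31) $\varrho_n(\alpha) := (1 - \alpha) M_n(\alpha)$,
(32) $u_n^\alpha(x) := V_n^\alpha(x) - M_n(\alpha)$,
where $V_n^\alpha(x)$ is the optimal $\alpha$-DC function (18) for the model $M_n$. Observe
that $u_n^\alpha(\cdot)$ is a nonnegative function.
From (29) and Assumption 2(c), it is easy to verify that, with $\bar{\nu} := \|\nu\|_w$ as in (28),
$0 \le V_n^\alpha(x) \le \frac{c_1}{1 - \alpha} \Big( 1 + \frac{\bar{\nu}}{1 - \beta} \Big) w(x)$,
which implies
$0 \le \varrho_n(\alpha) \le c_1 \Big( 1 + \frac{\bar{\nu}}{1 - \beta} \Big) \inf_X w(x)$.
Therefore, there is a number $\varrho_n$ such that $\limsup_{\alpha \uparrow 1} \varrho_n(\alpha) = \varrho_n$. For each
$n$, let $\{\alpha_{n,k}\} \uparrow 1$ be a sequence of “discount factors” such that
$\varrho_n = \lim_{k \to \infty} \varrho_n(\alpha_{n,k})$, and define
(33) $h_n(x) := \liminf_{k \to \infty} u_n^{\alpha_{n,k}}(x) \quad \forall x \in X$.
Note that $h_n(\cdot)$ is nonnegative, by (32).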


From the $\alpha$-DC Optimality Equation obtained in [9], namely,
$V_n^\alpha(x) = \min_{a \in A(x)} \Big[ c(x, a) + \alpha \int_X V_n^\alpha(y) \, Q_n(dy \mid x, a) \Big]$,
and (30)–(32), we deduce
(34) $\varrho_n(\alpha) + u_n^\alpha(x) = \min_{a \in A(x)} \Big[ c(x, a) + \alpha \int_X u_n^\alpha(y) \, Q_n(dy \mid x, a) \Big]$.
Now in (34) replace $\alpha$ with $\alpha_{n,k}$ and take the lower limit as $k \to \infty$. Then,
by Fatou’s Lemma, we get the ACOI (14) for all $n = 0, 1, \ldots, \infty$.
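The passage from the $\alpha$-DC Optimality Equation to (34) is stated without detail. One way to see it, using only the definitions (30) to (32) and the fact that $Q_n(\cdot \mid x, a)$ is a probability measure, is the following short computation, in which $V_n^\alpha$ is replaced by $u_n^\alpha + M_n(\alpha)$:

```latex
\begin{align*}
u_n^\alpha(x) + M_n(\alpha)
  &= \min_{a \in A(x)} \Big[ c(x,a)
     + \alpha \int_X \big( u_n^\alpha(y) + M_n(\alpha) \big)\, Q_n(dy \mid x,a) \Big] \\
  &= \min_{a \in A(x)} \Big[ c(x,a)
     + \alpha \int_X u_n^\alpha(y)\, Q_n(dy \mid x,a) \Big] + \alpha M_n(\alpha),
\end{align*}
% because Q_n(X | x, a) = 1, the constant M_n(alpha) leaves the integral
% and the minimum unchanged. Subtracting alpha * M_n(alpha) from both sides:
\begin{equation*}
(1-\alpha) M_n(\alpha) + u_n^\alpha(x)
  = \min_{a \in A(x)} \Big[ c(x,a)
     + \alpha \int_X u_n^\alpha(y)\, Q_n(dy \mid x,a) \Big],
\end{equation*}
% and (1 - alpha) M_n(alpha) is exactly varrho_n(alpha) by (31).
```

The same substitution gives $(1 - \alpha) V_n^\alpha(x) = (1 - \alpha) u_n^\alpha(x) + \varrho_n(\alpha)$, which is the decomposition behind the final estimate in part (a) of the proof.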
The existence of $f_n \in F$ that satisfies (15) is assured by Lemma 10(b)
and Assumption 2(a), if we use a well known measurable selector theorem
(see [6, Proposition D.5], for example). Moreover, by Lemma 14 and the fact
that the cost $c(\cdot, \cdot)$ is nonnegative, [7, Theorem 10.3.1] gives
(35) $\varrho_n = \inf_{f \in F} J_n(f) = J_n(f_n) = J_n^*(x) \quad \forall x \in X$,
with $J_n(f)$ as in (26).
Finally, as mentioned in Remark 6, the fact that
$\varrho_n = \limsup_{\alpha \uparrow 1} (1 - \alpha) V_n^\alpha(x)$ for all $x \in X$
can be obtained by noting that
$|(1 - \alpha) V_n^\alpha(x) - \varrho_n| \le (1 - \alpha) u_n^\alpha(x) + |\varrho_n(\alpha) - \varrho_n| \quad \forall x \in X$.
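(b) Define $\underline{\varrho} := \liminf_{n \to \infty} \varrho_n$ and $\overline{\varrho} := \limsup_{n \to \infty} \varrho_n$. The basic idea
of the proof is to show that
(36) $\underline{\varrho} = \overline{\varrho} = \varrho_\infty$.
For each $n \in \mathbb{N}$, the first equality in (35) yields
(37) $\varrho_n \le J_n(f_\infty)$,
where $f_\infty$ is an EAC optimal policy for the model $M_\infty$. Moreover, from
Lemma 13 we get $J_n(f_\infty) := \int_X c(y, f_\infty) \, \mu_{n,f_\infty}(dy)$. Thus, by (25) and Fatou’s
Lemma, we deduce from (37) that
(38) $\overline{\varrho} \le J_\infty(f_\infty) = \varrho_\infty$.
We now wish to prove
(39) $\underline{\varrho} \ge \varrho_\infty$.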
To get this, let $\{n_k\}$ be a subsequence of $\{n\}$ such that
$\underline{\varrho} = \lim_{k \to \infty} \varrho_{n_k}$,
and in the ACOI (14) replace $n$ with $n_k$. Thus
(40) $\varrho_{n_k} + h_{n_k}(x) \ge \min_{a \in A(x)} \Big[ c(x, a) + \int_X h_{n_k}(y) \, Q_{n_k}(dy \mid x, a) \Big] \quad \forall x \in X$.
Now take the lower limit in (40) as $k \to \infty$ to get
(41) $\underline{\varrho} + h(x) \ge \liminf_{k \to \infty} \min_{a \in A(x)} \Big[ c(x, a) + \int_X h_{n_k}(y) \, Q_{n_k}(dy \mid x, a) \Big] \quad \forall x \in X$,
where $h(x) := \liminf_{k \to \infty} h_{n_k}(x)$. Then, by Fatou’s Lemma, and by applying
a general result on the interchange of limits and minima (see [6,
Lemma 4.2.4]), we obtain
(42) $\underline{\varrho} + h(x) \ge \min_{a \in A(x)} \Big[ c(x, a) + \int_X h(y) \, Q_\infty(dy \mid x, a) \Big] \quad \forall x \in X$.
Moreover, as in part (a), Lemma 10(b) and Assumption 2(a) yield the existence
of a stationary policy $f \in F$ that attains the minimum in (42); that
is, (42) becomes
(43) $\underline{\varrho} + h(x) \ge c(x, f) + \int_X h(y) \, Q_\infty(dy \mid x, f) \quad \forall x \in X$.
On the other hand, iteration of the latter inequality yields, for all
$N = 1, 2, \ldots$,
$N \underline{\varrho} + h(x) \ge E_x^{(\infty)f} \sum_{t=0}^{N-1} c(x_t, f) + E_x^{(\infty)f} h(x_N) \ge E_x^{(\infty)f} \sum_{t=0}^{N-1} c(x_t, f)$,
as $h(\cdot)$ is nonnegative. Hence dividing by $N$ and letting $N \to \infty$, from (26)
we obtain $\underline{\varrho} \ge J_\infty(f)$. Finally, by (35), $J_\infty(f) \ge \varrho_\infty$ and (39) follows, which
together with (38) gives (36).
Proof of Corollary 8. Let n = 0, 1, . . . , ∞. From Lemmas 9 and 11(b),
the state process {xt }, following the model Mn under a stationary policy
f ∈ F, is an aperiodic and λ-irreducible Markov chain. The equivalence
between EAC optimality and SPAC optimality for Mn is then a consequence
of [8, Theorem 3.7(b)] applied to each model Mn .

References

[1] A. Arapostathis, V. S. Borkar, E. Fernández-Gaucherand, M. K. Ghosh and S. I.
Marcus, Discrete-time controlled Markov processes with average cost criterion: a
survey, SIAM J. Control Optim. 31 (1993), 282–344.
[2] G. Bastin and D. Dochain, On-line Estimation and Adaptive Control of Bioreactors,
Elsevier, Amsterdam, 1990.

[3] E. Gordienko and O. Hernández-Lerma, Average cost Markov control policies with
weighted norms: existence of canonical policies, Appl. Math. (Warsaw) 23 (1995),
199–218.
[4] O. Hernández-Lerma, Average optimality in dynamic programming on Borel spaces
—Unbounded costs and controls, Systems Control Lett. 17 (1991), 237–242.
[5] O. Hernández-Lerma and J. B. Lasserre, Average cost optimal policies for Markov
control processes with Borel state space and unbounded costs, ibid. 15 (1990), 349–
356.
[6] —, —, Discrete-Time Markov Control Processes: Basic Optimality Criteria,
Springer, New York, 1996.
[7] —, —, Further Topics on Discrete-Time Markov Control Processes, Springer, New
York, 1999.
[8] O. Hernández-Lerma, O. Vega-Amaya and G. Carrasco, Sample-path optimality and
variance-minimization of average cost Markov control processes, SIAM J. Control
Optim. 38 (1999), 79–93.
[9] N. Hilgert and O. Hernández-Lerma, Limiting optimal discounted-cost control of a
class of time-varying stochastic systems, Systems Control Lett. 40 (2000), 37–42.
[10] N. Hilgert, R. Senoussi and J.-P. Vila, Nonparametric estimation of time-varying
autoregressive nonlinear processes, C. R. Acad. Sci. Paris Sér. I Math. 323 (1996),
1085–1090.
[11] A. Hordijk and A. A. Yushkevich, Blackwell optimality in the class of all policies
in Markov decision chains with Borel state space and unbounded rewards, Math.
Methods Oper. Res. 50 (1999), 421–448.
[12] N. V. Kartashov, Inequalities in theorems of ergodicity and stability for Markov
chains with common phase space, II , Theory Probab. Appl. 30 (1986), 507–515.
[13] —, Strong Stable Markov Chains, VSP, Utrecht, 1996.
[14] H. U. Küenle, Markov games with average cost criterion under a geometric drift
condition, paper presented at the 10th INFORMS Applied Probability Conference,
University of Ulm, July 26–28, 1999.
[15] F. Luque-Vásquez and O. Hernández-Lerma, Semi-Markov control models with av-
erage costs, Appl. Math. (Warsaw) 26 (1999), 315–331.
[16] A. S. Nowak, Optimal strategies in a class of zero-sum ergodic stochastic games,
Math. Methods Oper. Res. 50 (1999), 399–419.
[17] —, Sensitive equilibria for ergodic stochastic games with countable state-spaces,
ibid., 65–76.

Departamento de Matemáticas Departamento de Matemáticas


CINVESTAV-IPN CINVESTAV-IPN
Apartado Postal 14-740 Apartado Postal 14-740
México D.F. 07000, México México D.F. 07000, México
E-mail: [email protected]
Permanent address:
Laboratoire de Biométrie
INRA-ENSA.M
2 place Viala
34060 Montpellier Cedex 1, France
E-mail: [email protected]

Received on 25.11.1999;
revised version on 6.11.2000 (1512)
