Admission Control
Cynara C. Wu
Dimitri P. Bertsekas
[email protected] [email protected]
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
Cambridge, MA 02139, USA
Abstract
With the population of wireless subscribers increasing at a rapid rate, overloaded situations are likely to become an increasing problem. Admission control can be used to balance the goals of maximizing bandwidth utilization and ensuring sufficient resources for high-priority events. In this paper, we formulate the admission control problem as a Markov decision problem. While dynamic programming can be used to solve such problems, the large size of the state space makes this impractical. We propose an approximate dynamic programming technique, which involves creating an approximation of the original model with a state space sufficiently small so that dynamic programming can be applied. Our results show that the method improves significantly on policies that are generally in use, in particular the greedy policy and the reservation policy. Much of the computation required for our method can be done off-line, and the real-time computation required is easily distributed between the cells.
Keywords
Cellular networks, admission control, dynamic programming.
1 Introduction
Efficient resource utilization is a primary problem in cellular communications systems. Resource issues include determining with which users to establish connections, and assigning transmit power levels to connected users subject to acceptable signal quality. For most users, the inability to initiate a call is perceived as more tolerable than the unexpected termination of a call. As a result, admission control, in which users requesting connections are not automatically admitted even if resources are available to handle the connections, may be necessary to ensure sufficient resources are available for handoffs and other higher-priority events.
The most common method of dealing with handoffs is to reserve a number of "guard" channels exclusively for handoffs. New calls are admitted only if the number of available channels exceeds the number of guard channels. Such systems were introduced in the mid-1980s ([HR86], [PG85]). A great deal of work has been done on developing more complex admission control schemes. However, many studies make a number of unrealistic, simplifying assumptions, such as modeling handoff arrivals as Poisson processes that are independent of new call arrivals ([AAN96], [Gue88], [TJ92]). Other studies use more realistic models that require extremely complex calculations ([AS96], [LAN97]). Several authors have noted that while the problem is best modeled as a continuous-time Markov chain, the size of the state space makes the problem difficult to solve ([BWE95], [SS97]). Chang and Geraniotis actually use dynamic programming to solve their Markov decision problem, but their method does not generalize well to larger problems [CG98]. All the above works have focused on systems in which channels are orthogonal and all users require the same resources. In certain systems, such as CDMA systems, different users may require varying resources. In addition, there may be different classes of users with different resource requirements.
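The guard-channel rule described above reduces to a simple threshold test at each request. A minimal sketch (function names and the channel-count interface are illustrative, not from the paper):

```python
def admit_new_call(free_channels: int, guard_channels: int) -> bool:
    """Guard-channel rule: admit a new call only if the number of free
    channels strictly exceeds the number reserved for handoffs."""
    return free_channels > guard_channels

def admit_handoff(free_channels: int) -> bool:
    """Handoffs may use any free channel, including the guard channels."""
    return free_channels > 0
```

For example, with 2 guard channels, a new call is admitted only when at least 3 channels are free, while a handoff needs just one free channel.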
In this paper, we consider the problem of optimal admission control: given a particular configuration of users of various classes in various regions, determine whether or not to accept a new call request. We assume we have available an algorithm that can determine, for any distribution of users of various classes in various regions, whether there is a feasible power assignment satisfying the signal-to-noise requirements for all users, and if so, provides a unique power assignment for the distribution. Our goal is to formulate the problem as a Markov decision process and to provide a solution method that is general enough to be widely applicable and can be implemented in real time.
In Sec. 2, we develop a model for a multiple access cellular communications system and
formulate the admission control problem as a Markov decision process. While such processes
can be solved by dynamic programming, the size of the problem makes this impractical.
In Sec. 3, we consider an approximate dynamic programming solution. We then present
computational results in Sec. 4.
Figure 1: Model of a cellular system. A cell can be of any shape and is divided into regions that are
represented by the dashed lines. The base station is represented by the small solid circle near the middle.
If a user in region r, for r = 1, ..., R, transmits a signal with power level w, the signal strength received at the base station in cell j, for j = 1, ..., M, is a_rj w. The value a_rj is the amount that a signal transmitted from region r is attenuated by interference, multi-path, etc., by the time it is received by the base station in cell j. We assume that these values can be obtained either empirically or by analyzing appropriate propagation models, and that they are available.
We assume that we have an algorithm that can determine whether any particular configuration of users in cell regions is feasible and, if so, provides a unique assignment of power levels to each user. The system at any particular time is then characterized by the number of users of each class in each region. We represent this information by a C × R matrix n of nonnegative integers:

    n = [ n_11  ...  n_1R
            .          .
          n_C1  ...  n_CR ]
where ncr is the number of users of class c in cell region r.
The arrival of connection requests in region r = 1, ..., R by users of class c = 1, ..., C is modeled as a Poisson process with rate λ_cr. When a request arrives, a decision has to be made regarding whether or not to admit the user. Given that an existing user of class c is currently in region r, the probability that he moves to region r′ is independent of the regions the user has already visited and is given to be q_crr′. The probability that this user ends his connection in the current region r is q_crt = 1 − Σ_r′ q_crr′ ≥ 0. If a user of class c does not move to any more regions, the length of the remaining connection time is exponentially distributed with rate μ_c. Otherwise, assuming the user moves from region r to region r′, the length of time until this occurs is exponentially distributed with rate μ_crr′.
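As a concrete check of these definitions, the termination probability q_crt is simply the leftover mass of the movement distribution. A small sketch with hypothetical two-region numbers (none of these values are from the paper):

```python
# Hypothetical movement probabilities for one user class (c = 0) in a
# two-region system: Q[c][r][r2] is q_crr', the probability that a
# class-c user currently in region r next moves to region r2.
Q = {0: [[0.1, 0.3],    # region 0: stays w.p. 0.1, moves to region 1 w.p. 0.3
         [0.2, 0.5]]}   # region 1: moves to region 0 w.p. 0.2, stays w.p. 0.5

def termination_prob(Q, c, r):
    """q_crt = 1 - sum_r' q_crr': the probability that the class-c user
    ends its connection in region r instead of moving again."""
    q = 1.0 - sum(Q[c][r])
    assert q >= 0.0, "movement probabilities must sum to at most 1"
    return q
```

Here a class-0 user in region 0 terminates with probability 1 − 0.4 = 0.6, and one in region 1 with probability 1 − 0.7 = 0.3.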
While we assume we have some information regarding the cellular network, such as the call request arrival rate to the various regions and the probability distribution of traffic movement, we do not assume knowledge of where a particular user is heading or a history of its locations; however, our methodology extends to cases where such information is available. In addition, we have assumed that a user's movement and remaining connection time are independent of his previous movement and the previous length of connection. As a result, the matrix n providing the number of users of each class in each cell region describes the "state" of the system. We assume that we can determine this information either through measurements or other means.
As requests for connections are accepted, existing connections are completed, and users
move from one region to another, the state of the system changes. The changes depend on
random elements such as user requests to establish connections, existing connections being
completed, and movements of users from one region to another, as well as on decisions that
need to be made, such as admission decisions. Costs or rewards can be associated with
admission decisions, as well as with lengths of connection times. The admission control
problem can therefore be viewed as a continuous-time Markov decision process. We describe
how to do so in the next section.
In an infinite horizon Markov decision process with a finite set of states S, the state evolves through time according to given transition probabilities P_ii′ that depend on a decision or control from a set U that depends on the current state. Suppose transitions occur at times t_1, t_2, .... If the system is in state x(t_k) = i after the kth transition and decision u(t_k) ∈ U(i) is selected, then during the (k + 1)st transition, the system moves to state x(t_{k+1}) = i′ with given probability P_ii′(u(t_k)). The interval between transitions is referred to as a "stage." During the kth stage, we incur a cost g(x(t_k), u(t_k)) τ_k, where g is a given function and τ_k = t_{k+1} − t_k is the length of the kth stage.

The goal is to minimize, over all possible decisions at each stage, the average cost per unit time. This cost function is of the form

    lim_{N→∞} (1 / E{t_N}) E{ ∫_0^{t_N} g(x(t), u(t)) dt }.

We assume the process is stationary, i.e., the transition probabilities, the set U of available controls, the cost function g, etc., do not depend on the particular transition.
To formulate the admission control problem as a discrete-state infinite horizon Markov decision process, we first define the state as a combination of a "cell configuration" and the current event. We then provide the control space, the probability distribution of the interval between transitions, the transition probabilities, and the cost function.

The state space S is composed of all possible combinations of primary state components and random state components and is given by

    S = { i = (n, ω) | n ∈ N, ω ∈ Ω, and ω_cr ≥ 0 if Σ_{l=1}^{W_c} n_crl = 0 }

where the last condition holds since departures and movements from region r by a user of class c can only occur if the number of users of class c in region r is positive.
on the number of possible random events. The arrival of each of these events is a Poisson
process with the rates provided in Table 2.
    Random State Component  | Available Controls          | Next Primary State Component
    ------------------------+-----------------------------+-----------------------------
    Arrival   ω ∈ Ω_a       | Admit user    u = u_a       | n′ = n + ω
                            | Block user    u = u_b       | n′ = n
    Departure ω ∈ Ω_d       | None                        | n′ = n + ω
    Movement  ω ∈ Ω_m       | Maintain connection u = u_h | n′ = n + ω
                            | Drop connection     u = u_d | n′ = n + ω⁻  (ω⁻_cr = min{ω_cr, 0})

Table 1: List of the possible random state components, the available controls, and the resulting cell configurations according to the selected control, given a state i = (n, ω).
    Random Event | Rate               | Total Rate
    -------------+--------------------+--------------------------------------------------
    Arrival      | λ_cr               | Σ_{c=1}^{C} Σ_{r=1}^{R} λ_cr
    Departure    | n_cr q_crt μ_c     | Σ_{c=1}^{C} Σ_{r=1}^{R} n_cr q_crt μ_c
    Movement     | n_cr q_crr′ μ_crr′ | Σ_{c=1}^{C} Σ_{r=1}^{R} Σ_{r′=1}^{R} n_cr q_crr′ μ_crr′

Table 2: Possible next events for a particular configuration n ∈ N and the associated arrival rate. The second column indicates the rate for the particular random event described. The third column indicates the overall rate for all events of the type described.
The overall rate ν_n at which events occur starting from a configuration n is the sum of the rates of all possible events and is given by

    ν_n = Σ_{c=1}^{C} Σ_{r=1}^{R} [ λ_cr + n_cr ( Σ_{r′=1}^{R} q_crr′ μ_crr′ + q_crt μ_c ) ].
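The total event rate ν_n can be computed directly from the model data. A sketch with hypothetical array layouts (lam[c][r] = λ_cr, q[c][r][r2] = q_crr′, and so on; none of these names are from the paper):

```python
def total_event_rate(lam, n, q, mu_move, q_t, mu):
    """nu_n = sum over classes c and regions r of
    lam[c][r] + n[c][r] * ( sum_r' q[c][r][r'] * mu_move[c][r][r']
                            + q_t[c][r] * mu[c] ),
    i.e. the combined rate of arrivals, movements, and departures."""
    C, R = len(lam), len(lam[0])
    total = 0.0
    for c in range(C):
        for r in range(R):
            move = sum(q[c][r][rp] * mu_move[c][r][rp] for rp in range(R))
            total += lam[c][r] + n[c][r] * (move + q_t[c][r] * mu[c])
    return total
```

With one class, two regions, two users in region 0, and illustrative rates, the sum is just arrivals plus per-user movement and departure rates.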
For any state i = (n, ω) ∈ S, the control u ∈ U(i) determines the cell configuration n′ for the next state i′ = (n′, ω′), as seen in Table 1. Assuming the control takes effect immediately, the transition rate from state i under control u ∈ U(i) is then ν_n′. We denote the expected value of the time from the transition to state i under control u to the transition to the next state as τ_i(u):

    τ_i(u) = 1 / ν_n′.
1. The next cell configuration n′ is determined by the control u ∈ U(i) exercised for state i.

2. The probability distribution of the next random event ω′ depends on the cell configuration resulting from the current state i and control u. Given the next cell configuration n′, the probability of a transition from i = (n, ω) to i′ = (n′, ω′) is the rate at which the event occurs divided by the total transition rate:
    P_ii′(u) =
      { λ_cr / ν_n′                       if ω′ ∈ Ω_a, ω′_cr = 1, c = 1, ..., C, r = 1, ..., R
      { n′_cr q_crt μ_c / ν_n′            if ω′ ∈ Ω_d, ω′_cr = −1, c = 1, ..., C, r = 1, ..., R
      { n′_cr1 q_cr1r2 μ_cr1r2 / ν_n′     if ω′ ∈ Ω_m, ω′_cr1 = −1, ω′_cr2 = 1, c = 1, ..., C,
                                             r1, r2 = 1, ..., R.                            (1)
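Since Eq. (1) divides each event's rate by the total rate ν_n′, the resulting probabilities over all possible next events sum to one. A sketch with hypothetical array layouts (lam[c][r] = λ_cr, and so on; the layout is illustrative, not from the paper):

```python
def event_distribution(lam, n_next, q, mu_move, q_t, mu):
    """Probability of each possible next event from configuration n_next:
    the event's rate divided by the total rate nu_{n'}, as in Eq. (1)."""
    C, R = len(lam), len(lam[0])
    events = {}  # (kind, indices) -> rate
    for c in range(C):
        for r in range(R):
            events[("arrival", c, r)] = lam[c][r]
            if n_next[c][r] > 0:
                events[("departure", c, r)] = n_next[c][r] * q_t[c][r] * mu[c]
                for rp in range(R):
                    rate = n_next[c][r] * q[c][r][rp] * mu_move[c][r][rp]
                    if rate > 0:
                        events[("move", c, r, rp)] = rate
    nu = sum(events.values())
    return {e: rate / nu for e, rate in events.items()}
```

A quick sanity check on any toy configuration is that the returned probabilities sum to one.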
Figure 2: Illustration of the transition from state (n, ω) when there is a connection request. Under u = u_a the next configuration is n′ = n + ω; under u = u_b it is n′ = n. In either case, the next random event ω′ is then drawn with probabilities Pr(ω′ ∈ Ω_a | n′), Pr(ω′ ∈ Ω_d | n′), and Pr(ω′ ∈ Ω_m | n′).
where g is a given function. We can rewrite this objective cost function as a sum of stage costs:

    lim_{N→∞} (1 / E{t_N}) Σ_{k=1}^{N} E{G_k}

where

    G_k = ∫_{t_k}^{t_{k+1}} g(x(t_k), u(t_k)) dt

is the cost of the kth stage. We assume the function g has a component ĝ that depends linearly on the length of time spent in a particular state and a component ḡ that does not depend on the length of time spent in a particular state. In this case, the cost of the kth stage is then

    G_k = (t_{k+1} − t_k) ĝ(x(t_k), u(t_k)) + ḡ(x(t_k), u(t_k)).

The first component can accommodate assigning rewards (or negative costs) for the amount of time users are connected, while the second component can accommodate assigning costs for not admitting a user.

Let τ_i(u) denote the expected length of the transition time corresponding to state i when control u = μ(i) is exercised. The expected cost for a single stage corresponding to state i when control u is exercised, which we denote G(i, u), is then

    G(i, u) = τ_i(u) ĝ(i, u) + ḡ(i, u).
It can be shown that, for the admission control problem, the optimal differential cost vector h* and the optimal average cost v* satisfy the following system of equations (Bellman's equation):

    h*(i) = min_{u ∈ U(i)} { G(i, u) − v* τ_i(u) + Σ_{i′ ∈ S} P_ii′(u) h*(i′) }   for all i ∈ S.   (2)
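One standard way to solve a system like Eq. (2) is relative value iteration. The sketch below handles the special case of unit expected stage lengths (τ_i(u) = 1); the data layout and the example MDP in the test are illustrative, not from the paper:

```python
def relative_value_iteration(g, P, tol=1e-10, max_iter=10000):
    """Relative value iteration for an average-cost MDP with unit stage
    lengths (tau_i(u) = 1), a special case of Eq. (2).
    g[i][u] is the stage cost and P[i][u][j] the transition probability.
    Returns (v, h): the average cost per stage and the differential costs,
    normalized so that h[0] = 0 (state 0 is the reference state)."""
    S = len(g)
    h = [0.0] * S
    for _ in range(max_iter):
        Th = [min(g[i][u] + sum(P[i][u][j] * h[j] for j in range(S))
                  for u in range(len(g[i]))) for i in range(S)]
        v = Th[0]                      # offset at the reference state
        h_new = [x - v for x in Th]
        if max(abs(a - b) for a, b in zip(h_new, h)) < tol:
            return v, h_new
        h = h_new
    return v, h
```

On a two-state example where action 1 in state 0 avoids the costly state most of the time, the iteration recovers the optimal average cost analytically computed from the stationary distribution.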
For the admission control problem, the optimal policy μ*(i) for any state i = (n, ω) ∈ S is obtained by solving the following equation at each stage:

    μ*(i) =
      { arg min_{u ∈ {u_a, u_b}} { G(i, u) − v* τ_i(u) + Σ_{i′ ∈ S} P_ii′(u) h*(i′) }   if ω ∈ Ω_a
      { arg min_{u ∈ {u_h, u_d}} { G(i, u) − v* τ_i(u) + Σ_{i′ ∈ S} P_ii′(u) h*(i′) }   if ω ∈ Ω_m   (3)

where the quantities G, τ_i, and P_ii′ are those provided in Sec. 2.2. Note that if ω ∈ Ω_d, there is no control to exercise.
There are a number of methods for solving Bellman's equation to obtain the values for v* and h*. Most of these are forms of the value iteration and policy iteration algorithms. The computation can be done off-line, i.e., before the real system starts operating. Once this computation is completed, the optimal decision at each call request can be determined by comparing the cost resulting from applying each of the available controls and selecting the control with the minimum value. A comprehensive treatment of DP can be found in [Ber95].

One major advantage of DP over other methods is that the computation required to incorporate any system details or an arbitrarily complex cost function is done off-line and therefore need not be performed in real time. Once the cost function is computed, the optimal policy is often determined quickly using the cost function together with real-time information on the current state of the system. Furthermore, the optimal cost function can vary from cell to cell to account for differences in such factors as arrival rates, and cell shapes and sizes.
Unfortunately, even off-line, the computation required to determine these values is overwhelming due to the large number of states. As a result, suboptimal methods for the solution must be used. In the next section, we discuss a technique to obtain an approximation of the optimal average cost and optimal differential cost function.
For each state-control pair, define

    H*(i, u) = G(i, u) − v* τ_i(u) + Σ_{i′ ∈ S} P_ii′(u) h*(i′),   i ∈ S, u ∈ U(i).
If we were able to obtain the optimal average cost per stage v* and the optimal differential cost vector h*, the optimal policy, given by Eq. 3, would be equivalent to

    μ*(i) = arg min_{u ∈ U(i)} H*(i, u) =
      { arg min_{u ∈ {u_a, u_b}} H*(i, u)   if ω ∈ Ω_a
      { arg min_{u ∈ {u_h, u_d}} H*(i, u)   if ω ∈ Ω_m,      i ∈ S.   (4)
We refer to H*(i, u) as the Q-factor of the state-control pair (i, u). It is the expected cost corresponding to exercising control u at state i and then proceeding to follow an optimal policy. Note that the optimal differential cost vector h* satisfies the equation

    h*(i) = min_{u ∈ U(i)} H*(i, u).

Instead of calculating the actual average cost per stage v* and differential cost vector h* and using Eq. 4 to determine the optimal decision at every state, we construct approximations of the Q-factors H*(i, u), to which we refer as H̃(i, u). After determining such approximations, the control for every state is given by

    μ̃(i) = arg min_{u ∈ U(i)} H̃(i, u)   for all i ∈ S.
In particular, we construct an approximate model and compute the average cost ṽ and differential cost function h̃ of the approximation, thereby obtaining an approximation of H*(i, u) of the form

    H̃(i, u) = G(i, u) − ṽ τ_i(u) + Σ_{i′ ∈ S} P_ii′(u) h̃(i′),   i ∈ S, u ∈ U(i).   (5)
In the approximate problem, we decompose the original model into individual cells and consider each cell independently of the others. For each cell j, we obtain an estimate of the average cost per stage ṽ_j and the differential cost vector h̃_j. Costs associated with a particular user are attributed to the cell in which the user is located when the cost is incurred. For instance, suppose there is a fixed cost associated with blocking a connection request by a user of class c. Each cell in which a user of class c is blocked is then attributed that blocking cost. The total average cost per stage and total differential cost is the sum of the costs associated with each cell:

    ṽ = Σ_{j=1}^{N} ṽ_j   and   h̃(i) = Σ_{j=1}^{N} h̃_j(i),   i ∈ S.   (6)
To obtain the approximations ṽ_j and h̃_j, we formulate a Markov decision process for each cell. The state of each process consists of "features" of the state of the original admission control problem. With an appropriate selection of features, each process is simple enough so that traditional dynamic programming techniques such as value iteration can be applied to obtain ṽ_j and h̃_j. Specifically, for each state i ∈ S and cell j = 1, ..., N, f_j(i) = (f_j1(i), ..., f_jK_j(i)) is a mapping of the state i to a vector of features, where K_j is the number of features that are to be extracted for use in evaluating cell j. The differential cost function is then approximated by a sum over all cells of functions that depend on these features instead of the actual state:

    h̃(i) = Σ_{j=1}^{N} h̃_j(f_j(i)).

This approximation is illustrated in Fig. 3. Ideally, the features that are "extracted" summarize important characteristics of the state which affect the cost, and they should incorporate the designer's prior knowledge or intuition about the problem and about the structure of the optimal controller.
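For the two feature sets used later in Sec. 3.2 (per-class user counts inside the cell, and per-class counts in interfering outside regions), the mapping f_j(i) can be computed directly from the matrix n. A sketch with an illustrative region layout (the region index sets are hypothetical):

```python
def extract_features(n, cell_regions, interfering_regions):
    """Feature vector for one cell: for each class c, the number of users
    in the cell's own regions, followed by the number of users in the
    outside regions that interfere with the cell's base station.
    n[c][r] is the number of class-c users in region r."""
    feats = []
    for counts in n:                       # first C features: in-cell counts
        feats.append(sum(counts[r] for r in cell_regions))
    for counts in n:                       # next C features: interfering counts
        feats.append(sum(counts[r] for r in interfering_regions))
    return feats
```

With two classes and four regions, where the cell owns regions {0, 1} and region 3 interferes with it, the feature vector has length 2C = 4.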
We summarize the cell-by-cell decomposition with feature extraction procedure as follows.

1. Select suitable features for each cell.

2. Formulate a Markov decision process for each cell where the state consists of the features selected in Step 1. We refer to the formulated process as the "feature-based MDP" for the cell.
Figure 3: Cell-by-cell decomposition with feature extraction. For any state i = (n, ω), a feature vector f_j(i) is extracted for each cell j = 1, ..., M; a feature-based MDP is formulated and solved for each cell to obtain h̃_j(f_j(i)), and these are summed to form h̃(i). The features are used as the state space for a Markov decision process associated with the cell. The costs resulting from solving the Markov decision process are approximations of the costs incurred by that cell.
3. Calculate the average costs per stage ṽ_j and the differential cost vectors h̃_j for the Markov decision processes formulated for each cell in Step 2.

4. Approximate the overall average cost per stage and differential cost vector as a sum of the costs associated with the individual cells determined in Step 3 (see Eq. 6).

These values can be used in Eq. 5 to obtain H̃(i, u), which is then used to make decisions at state i ∈ S according to

    μ̃(i) = arg min_{u ∈ U(i)} H̃(i, u)   for all i ∈ S.
Note that after using cell-by-cell decomposition and feature extraction to obtain an approximate problem, the features of a number of cells will not be affected by the current decision, and therefore these cells need not be included in evaluating decisions. Specifically, given a state i = (n, ω), let J(i) be the set of cells whose features are modified by any control in U(i). The decision at state i ∈ S is then given by

    μ̃(i) = arg min_{u ∈ U(i)} H̃(i, u)
          = arg min_{u ∈ U(i)} { G(i, u) − τ_i(u) Σ_{j ∈ J(i)} ṽ_j + Σ_{i′ ∈ S} P_ii′(u) Σ_{j ∈ J(i)} h̃_j(f_j(i′)) }.
If we do not include costs associated with cells whose features are not affected by the addition of a new user, the amount of calculation required to make each admission decision becomes independent of the total number of cells in the system. The total amount of calculation then grows linearly in the number of cells. Furthermore, since each admission decision depends on local information, the method is easily distributed so that each cell includes a controller responsible for making decisions for requests within its regions.

We describe the first three steps in detail in the following sections. In Sec. 3.1, we describe the process of selecting suitable features and provide several examples. In Sec. 3.2, we illustrate with an example the process of formulating the Markov decision process for a particular cell given a set of features. In Sec. 3.3, we describe the process of determining the values of ṽ_j and h̃_j given a Markov decision process formulation for cell j.
Figure 4: The user in cell 2 is likely to generate a fair amount of interference at the base station in cell 1 and therefore affect the ability to accept future connection requests. Its influence on future arrivals in cell 3, however, is likely to be significantly smaller.

Figure 5: Assume traffic flows only left to right on the illustrated road. The user on the left is more likely to be connected to the base station in the center cell for a longer period of time than the one on the right. In addition, the user on the right is likely to require that its call be handed off in the near future.
Users that are likely to leave the cell are more likely to generate less interference in the future. This factor suggests including the following set of features:

3. f_j3c(i): the number of users of class c in cell j that are positioned in regions from which the probability of moving to a region of another cell is greater than 0.

The expected number of future drops for users attempting to move to a particular cell depends on the number of existing users of each class positioned to move into this cell, in addition to the interference currently present. For instance, the user on the right in Fig. 5 is likely to move to the cell on the right and may require that its call be handed off. This suggests the following set of features:

4. f_j4c(i): the number of users of class c in other cells that are positioned in regions from which the probability of moving to cell j is greater than 0.
The set of features can be tailored to each cell. Once the set of features is selected for a particular cell, a Markov decision process is formulated whose state consists of the selected features. This feature-based MDP is not automatically determined by the chosen features. Instead, the transition probabilities and other characteristics are based on additional assumptions and approximations that are to some extent heuristic and reflect our understanding of the practical problem at hand. We illustrate this process in the next section.

In this section, we illustrate with an example the second step involved in the cell-by-cell decomposition with feature extraction procedure: the process of formulating the feature-based MDP for a particular cell. This example is the basis of a portion of the computational results presented in Sec. 4. In this example, we extract from any state i ∈ S the following 2C features for each cell j = 1, ..., N:

1. f_jc(i) = the number of users of class c ∈ {1, ..., C} in all of the regions in cell j,

2. f_j(C+c)(i) = the number of users of class c ∈ {1, ..., C} in all regions r such that w a_rj > T, where r is a region not in cell j, w is the power level assigned to any user in region r, and T is an arbitrary constant.

The purpose of the first set of features is to indicate how much interference is generated by users within a particular cell. The purpose of the second set of features is to indicate how much interference is generated by users in other cells. For convenience, let R_j be the set of regions in cell j, and let R^o_jc(i) be the set of regions r not in cell j such that, given a state i, the available power control algorithm assigns a power level w to users of class c such that w a_rj > T. In this example, we assume that for any cell j, the set of regions not in j in which users generate sufficient interference at the base station in j is independent of the user class and of the actual power assignment (and consequently, of the state). We therefore simplify notation by referring to this set as R^o_j.
We now formulate the Markov decision process based on the selection of these features. Both the state space and the control space are analogous to those in the formulation of the admission control problem. In general, the transition rates and probabilities, and the cost function for this feature-based MDP cannot be determined exactly. We provide examples of suitable approximations in our formulation.

The State Space. The state space consists of the following two components:

1. A primary state component, which describes the feature values of the state for cell j and is denoted by ñ = (ñ_1, ..., ñ_2C). In this example, the features are the number of users of each class in cell j and the number of users of each class in regions in R^o_j. We also refer to this component as the feature configuration.

2. A random state component, which describes the current random event and is denoted by ω̃ = (ω̃_1, ..., ω̃_2C).

Let Ñ_j be the set of primary state components for cell j. We have

    Ñ_j = { ñ | ñ_c = Σ_{r ∈ R_j} n_cr,  ñ_(C+c) = Σ_{r ∈ R^o_j} n_cr,  c = 1, ..., C,  n ∈ N }

where N is the set of primary state components for the admission control problem.
We are only concerned with events that can change a particular cell configuration. These consist of arrivals to, departures from, and movements involving regions in R_j or R^o_j. Movements of users from a region in R_j to another region in R_j, or from a region in R^o_j to another region in R^o_j, need not be considered an event. In addition, we can treat a movement by a user from a region in R_j or R^o_j to a region not in either R_j or R^o_j as a departure, and a movement by a user in a region not in either R_j or R^o_j to a region in R_j or R^o_j as an arrival. The set of all random events for a particular cell j is given by

    Ω̃ = Ω̃_a ∪ Ω̃_a^o ∪ Ω̃_d ∪ Ω̃_d^o ∪ Ω̃_m1 ∪ Ω̃_m2

where Ω̃_a and Ω̃_a^o are the sets of all events consisting of an arrival to cell j and to a region in R^o_j, respectively; Ω̃_d and Ω̃_d^o are the sets of all events consisting of a departure from cell j and from a region in R^o_j, respectively; and Ω̃_m1 and Ω̃_m2 are the sets of all events consisting of a user moving from a region in cell j to a region in R^o_j, and vice versa, respectively.

The state space associated with cell j, which we denote S̃_j, is composed of all possible combinations of primary state components and random state components and is given by

    S̃_j = { ĩ = (ñ, ω̃) | ñ ∈ Ñ_j, ω̃ ∈ Ω̃, and ω̃_c ≥ 0 if ñ_c = 0 }

where the last condition holds since departures and movements by a user of class c can only occur if the number of users of class c is positive.
The Control Space. For any state ĩ = (ñ, ω̃), the possible controls, as well as the resulting feature configurations ñ′, are given in Table 3 according to the various possible values of ω̃. If the random state component is a departure, there is no choice of control and the event is accommodated. Otherwise, the control can either be to accommodate or not accommodate the event.

    Random State Component     | Available Controls          | Next Primary State Component
    ---------------------------+-----------------------------+-----------------------------
    Arrival   ω̃ ∈ Ω̃_a ∪ Ω̃_a^o  | Admit user    u = u_a       | ñ′ = ñ + ω̃
                               | Block user    u = u_b       | ñ′ = ñ
    Departure ω̃ ∈ Ω̃_d ∪ Ω̃_d^o  | None                        | ñ′ = ñ + ω̃
    Movement  ω̃ ∈ Ω̃_m1 ∪ Ω̃_m2  | Maintain connection u = u_h | ñ′ = ñ + ω̃
                               | Drop connection     u = u_d | ñ′ = ñ + ω̃⁻  (ω̃⁻_c = min{ω̃_c, 0})

Table 3: List of the possible random state components, the available controls, and the resulting configurations according to the selected control.
We can similarly approximate the probability that a user of class c in a region in R^o_j moves to a region in cell j, which we denote q̃^o_jcm, by

    q̃^o_jcm = [ Σ_{r ∈ R^o_j} λ_cr ( Σ_{r′ ∈ R_j} q_crr′ ) ] / [ Σ_{r ∈ R^o_j} λ_cr ].
Note that from the perspective of cell j, we can consider a movement by a user in a region in R^o_j to a region that is neither in R^o_j nor in cell j as a departure or connection completion. We therefore include the probability of such movements as part of the probability that a user of class c in a region in R^o_j ends an existing connection, which we denote q̃^o_jct:

    q̃^o_jct = [ Σ_{r ∈ R^o_j} λ_cr ( q_crt + Σ_{r′ ∉ R_j ∪ R^o_j} q_crr′ ) ] / [ Σ_{r ∈ R^o_j} λ_cr ].

We also denote the total departure probability of a user of class c in cell j by q̃_jct:

    q̃_jct = [ Σ_{r ∈ R_j} λ_cr q_crt ] / [ Σ_{r ∈ R_j} λ_cr ].
Transition Rates. Before presenting the transition rates and probabilities, we denote the total rate at which users of class c request a connection in each cell j = 1, ..., N by λ̃_jc:

    λ̃_jc = Σ_{r ∈ R_j} λ_cr

and the total rate at which users of class c request a connection in a region not in cell j but which interferes with the base station in cell j by λ̃^o_jc:

    λ̃^o_jc = Σ_{r ∈ R^o_j} λ_cr.
We can make approximations specifying the overall rates at which users move or complete their connections in a manner similar to that used in approximating the probabilities of movements or connection completions. For instance, to determine an approximation to the rate at which a user of class c in cell j moves to a region in R^o_j, which we denote μ̃_jcm, we can weight the various possible transition rates according to the arrival rates to the individual regions in cell j, and then consider the probability of moving from any of these regions to a region in R^o_j. Such a weighting results in the transition rate

    μ̃_jcm = [ Σ_{r ∈ R_j} λ_cr ( Σ_{r′ ∈ R^o_j} q_crr′ μ_crr′ ) ] / [ Σ_{r ∈ R_j} λ_cr ( Σ_{r′ ∈ R^o_j} q_crr′ ) ].
An approximation to the rate at which a user of class c in R^o_j moves to a region in cell j can be obtained in a similar manner, resulting in the rate

    μ̃^o_jcm = [ Σ_{r ∈ R^o_j} λ_cr ( Σ_{r′ ∈ R_j} q_crr′ μ_crr′ ) ] / [ Σ_{r ∈ R^o_j} λ_cr ( Σ_{r′ ∈ R_j} q_crr′ ) ].
The rate at which a user of class c in cell j ends an existing connection is independent of the particular region and is therefore also independent of the cell:

    μ̃_ct = μ_c.
Since the probability that a user of class c in R^o_j ends an existing connection includes the probability that such a user moves to a region not in either R^o_j or cell j, we can similarly weight such transitions to obtain an approximation to the rate at which a user of class c in R^o_j ends an existing connection:

    μ̃^o_jct = [ Σ_{r ∈ R^o_j} λ_cr ( q_crt μ_c + Σ_{r′ ∉ R_j ∪ R^o_j} q_crr′ μ_crr′ ) ] / [ Σ_{r ∈ R^o_j} λ_cr ( q_crt + Σ_{r′ ∉ R_j ∪ R^o_j} q_crr′ ) ].
Note that in the above approximations, the probabilities and rates of transitions for users in cell j and in R^o_j are weighted based on the arrival rates to the individual regions of cell j and of R^o_j. Such a weighting assumes that the distribution of existing connections in the various regions is the same as the distribution of connection requests. Such an assumption would be appropriate if admission decisions were made independently of the particular region in which connection requests occurred. However, it is likely that controls which give preference to connection requests in certain regions result in better performance. As a result, a weighting that is based on arrival rates may not be appropriate. In many systems, however, the transition rates may not vary significantly from region to region, and therefore such a weighting can still be very accurate.
Using the preceding approximations, for any feature configuration ñ ∈ Ñ_j of cell j, the time until the next random event occurs for the Markov process associated with cell j is exponentially distributed with an approximate rate

    ν̃_jñ = Σ_{c=1}^{C} [ λ̃_jc + λ̃^o_jc + ñ_c ( q̃_jcm μ̃_jcm + q̃_jct μ̃_ct ) + ñ_(C+c) ( q̃^o_jcm μ̃^o_jcm + q̃^o_jct μ̃^o_jct ) ].
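Since the transition probabilities in Table 5 must sum to one, ν̃_jñ is just the sum of the individual event rates. A sketch assuming the per-class aggregate quantities (λ̃_jc, q̃_jcm, μ̃_jcm, and so on) have already been computed; the argument names and test numbers are illustrative:

```python
def cell_event_rate(n_t, lam, lam_o, q_m, mu_m, q_t, mu_t,
                    q_om, mu_om, q_ot, mu_ot):
    """Approximate total event rate for the feature-based MDP of one cell:
    arrivals to the cell and to its interfering regions, plus movement and
    departure rates for the n_t[c] in-cell and n_t[C+c] interfering users
    of each class c. All arguments except n_t are per-class lists."""
    C = len(lam)
    total = 0.0
    for c in range(C):
        total += lam[c] + lam_o[c]                              # arrivals
        total += n_t[c] * (q_m[c] * mu_m[c] + q_t[c] * mu_t[c])  # in-cell users
        total += n_t[C + c] * (q_om[c] * mu_om[c] + q_ot[c] * mu_ot[c])
    return total
```

Each per-user term weights a movement or departure rate by its probability, mirroring the sum above.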
Transition Probabilities. As described in Sec. 2.2.4, the transition for any cell j from a state ĩ = (ñ, ω̃) ∈ S̃_j under control u ∈ U(ĩ) to a state ĩ′ = (ñ′, ω̃′) ∈ S̃_j depends on two factors. The configuration for the next state is determined by the control exercised at state ĩ, as shown in Table 4, and the probability distribution of the random event for the next state depends on the next configuration, as shown in Table 5.

    Random State Component     | Control           | Next Configuration
    ---------------------------+-------------------+-------------------
    Arrival   ω̃ ∈ Ω̃_a ∪ Ω̃_a^o  | Admit    u = u_a  | ñ′ = ñ + ω̃
                               | Block    u = u_b  | ñ′ = ñ
    Departure ω̃ ∈ Ω̃_d ∪ Ω̃_d^o  | None              | ñ′ = ñ + ω̃
    Movement  ω̃ ∈ Ω̃_m1 ∪ Ω̃_m2  | Maintain u = u_h  | ñ′ = ñ + ω̃
                               | Drop     u = u_d  | ñ′ = ñ + ω̃⁻

Table 4: List of the possible random events and the resulting configurations according to the exercised control.
Next Event        Transition Probability from (ñ, ω) to (ñ', ω')
ω' ∈ Ω_a          λ̃_jc / Δ̃_jñ'
ω' ∈ Ω_ao         λ̃^o_jc / Δ̃_jñ'
ω' ∈ Ω_d          ñ'_jc q_jct ν̃_ct / Δ̃_jñ'
ω' ∈ Ω_do         ñ'^o_jc q^o_jct ν̃^o_jct / Δ̃_jñ'
ω' ∈ Ω_m1         ñ'_jc q_jcm ν̃_jcm / Δ̃_jñ'
ω' ∈ Ω_m2         ñ'^o_jc q^o_jcm ν̃^o_jcm / Δ̃_jñ'

Table 5: List of the transition probabilities for the possible next random events.
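The transition structure in Tables 4 and 5 amounts to competing exponential clocks: each event type occurs with probability proportional to its rate. A simulation of the per-cell chain might sample the next event as follows (a sketch; the event names are illustrative, not part of the formulation):

```python
import random

def sample_next_event(rates, rng=random):
    """Sample the next random event for a cell's approximate Markov
    chain. `rates` maps event types (e.g. 'arrival', 'departure',
    'movement') to their rates; each event occurs with probability
    rate / total, as in Table 5."""
    total = sum(rates.values())
    x, acc = rng.uniform(0.0, total), 0.0
    for event, rate in rates.items():
        acc += rate
        if x <= acc:
            return event
    return event  # guard against floating-point round-off
```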
The Cost Function Instead of associating costs with the current state and control, we need to reformulate costs so that they are associated with a particular cell and can be derived from the features of the state that are selected. Essentially, we approximate the expected cost for a single stage corresponding to state i under control u by a sum of costs associated with each cell:

\[
G(i, u) \approx \sum_{j=1}^{N} \tilde{G}_j(f_j(i), u),
\]

where \(\tilde{G}_j(f_j(i), u)\) contains the expected single-stage costs corresponding to state i under control u that are associated with cell j, and must be derived from the features of the state used to represent information relevant to cell j.
Determining how to decompose the costs according to the individual cells depends on the particular form of the cost function. Decomposing costs according to cells is straightforward if costs are associated only with blocking or accepting connection requests, and with dropping or maintaining existing connections. Then, in the case in which a connection is requested, costs can be associated with the cell in which the connection request is located, and in the case in which an existing user moves, costs can be associated with the cell to which the user is moving. Note that the features extracted from a state for any cell would need to contain the information necessary to collect the desired costs, as in the current formulation example.

If costs depend on the entire state, e.g., if there were bonuses for maintaining a certain number of connections in the entire system, decomposing the costs would be much less straightforward unless the selected features included such information. As a result, the selection of features to represent the relevant information for a particular cell should take into consideration the form of the cost function.
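As a concrete sketch of the straightforward case, the single-stage cost can be charged to cells as follows (hypothetical event encoding; the default costs B = 0.1 and D = 1.0 match the values used in the experiments of Sec. 4):

```python
def stage_cost_by_cell(event, cells, B=0.1, D=1.0):
    """Decompose the single-stage cost across cells: a blocked
    connection request costs B, charged to the cell containing the
    request; a dropped connection costs D, charged to the cell the
    user was moving into."""
    costs = {j: 0.0 for j in cells}
    if event["type"] == "block":
        costs[event["cell"]] += B
    elif event["type"] == "drop":
        costs[event["dest_cell"]] += D
    return costs
```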
3.3 Calculating ṽ and h̃

In this section, we describe the third step involved in the cell-by-cell decomposition with feature extraction procedure: the process of determining the values of ṽ_j and h̃_j given a feature-based MDP for each cell j. After these values are determined, the decision at each state is obtained by determining the cells affected by the decision, summing the approximate costs associated with each of these cells, and selecting the control that results in the smallest cost:
\[
\tilde{\mu}(i) = \arg\min_{u \in U(i)} \tilde{H}(i, u) =
\begin{cases}
\arg\min_{u_a,\, u_b} \left\{ \tilde{H}(i, u_a),\ \tilde{H}(i, u_b) \right\} & \text{if } \omega \in \Omega_a \\
\arg\min_{u_h,\, u_d} \left\{ \tilde{H}(i, u_h),\ \tilde{H}(i, u_d) \right\} & \text{if } \omega \in \Omega_m
\end{cases}
\qquad i \in S, \tag{7}
\]

where

\[
\tilde{H}(i, u) = G(i, u) - \bar{\tau}_i(u) \sum_{j \in J(i)} \tilde{v}_j + \sum_{i' \in S} P_{ii'}(u) \sum_{j \in J(i)} \tilde{h}_j(f_j(i')).
\]
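Given the per-cell values ṽ_j and h̃_j, the decision rule (7) reduces to comparing two candidate controls. A minimal sketch (`H` is assumed to be a callable implementing H̃(i, u); the control names are illustrative):

```python
def choose_control(H, i, is_arrival):
    """Select the control minimizing the approximate cost-to-go
    H(i, u), as in Eq. 7: admit vs. block for an arrival event,
    maintain vs. drop for a movement event."""
    candidates = ("admit", "block") if is_arrival else ("maintain", "drop")
    return min(candidates, key=lambda u: H(i, u))
```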
Our method of determining the values of ṽ_j and h̃_j is based on the discussion in [Ber95], Section 5.3. Before we consider the problem of determining for each cell j the values of ṽ_j and h̃_j, we consider the "auxiliary discrete-time average cost" version of the problem. Let S̃_j be the feature space associated with cell j and let τ be any scalar such that

\[
0 < \tau < \frac{\bar{\tau}_i(u)}{1 - \tilde{P}_{jii}(u)}
\]

for all i ∈ S̃_j and u ∈ U(i) with P̃_{jii}(u) < 1. We define new transition probabilities for all i and u ∈ U(i):

\[
\hat{P}_{ii'}(u) =
\begin{cases}
\dfrac{\tau \tilde{P}_{jii'}(u)}{\bar{\tau}_i(u)}, & i' \neq i, \\[2mm]
1 - \dfrac{\tau \bigl( 1 - \tilde{P}_{jii}(u) \bigr)}{\bar{\tau}_i(u)}, & i' = i,
\end{cases}
\]

where the P̃_{jii'} are the transition probabilities for the Markov process associated with cell j, and define

\[
\hat{G}_j(i, u) = \frac{\tau\, \tilde{G}_j(i, u)}{\bar{\tau}_i(u)}
\]
as the expected cost per stage associated with cell j. If for any cell j the scalar v̂_j and vector ĥ_j satisfy

\[
\hat{h}_j(i) = \min_{u \in U(i)} \left[ \hat{G}_j(i, u) - \tau \hat{v}_j + \sum_{i' \in \tilde{S}_j} \hat{P}_{ii'}(u)\, \hat{h}_j(i') \right], \qquad i \in \tilde{S}_j, \tag{8}
\]

then ṽ_j and h̃_j, where

\[
\tilde{v}_j = \hat{v}_j, \qquad \tilde{h}_j(i) = \hat{h}_j(i), \quad i \in \tilde{S}_j,
\]

satisfy

\[
\tilde{h}_j(i) = \min_{u \in U(i)} \left[ \tilde{G}_j(i, u) - \tilde{v}_j\, \bar{\tau}_i(u) + \sum_{i' \in \tilde{S}_j} \tilde{P}_{jii'}(u)\, \tilde{h}_j(i') \right], \qquad i \in \tilde{S}_j.
\]
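The construction of P̂ can be sketched directly from its definition (a single-control sketch for brevity; with controls, the same formula is applied to each transition matrix P(u) separately):

```python
def uniformize(P, tau_bar, tau):
    """Auxiliary discrete-time transition matrix P_hat built from the
    transition probabilities P and expected holding times tau_bar,
    following the definition above (controls suppressed).
    Requires 0 < tau < tau_bar[i] / (1 - P[i][i]) whenever P[i][i] < 1."""
    n = len(P)
    P_hat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for ip in range(n):
            if ip != i:
                P_hat[i][ip] = tau * P[i][ip] / tau_bar[i]
        P_hat[i][i] = 1.0 - tau * (1.0 - P[i][i]) / tau_bar[i]
    return P_hat
```

Each row of P̂ sums to one by construction, so the result is a valid discrete-time transition matrix.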
We now consider the problem of determining the values of v̂_j and ĥ_j. One method of calculating these values is a technique called relative value iteration. To simplify notation, we refer to v̂_j as v_j and to ĥ_j as h_j.

Let t be any fixed state, and initialize h^0_j(i) to arbitrary scalars for all i ∈ S̃_j. We run the following iterative algorithm:

\[
(T h^k_j)(i) = \min_{u \in U(i)} \left\{ \hat{G}_j(i, u) + \sum_{i' \in \tilde{S}_j} \hat{P}_{ii'}(u)\, h^k_j(i') \right\}, \qquad h^{k+1}_j(i) = (T h^k_j)(i) - (T h^k_j)(t). \tag{9}
\]
Suppose that at any particular stage there are a total of m connected users in the entire system, regardless of class. Due to capacity constraints, m must be bounded. There is then some ε > 0 such that the probability that each of the next m stages involves some user completing his connection is greater than ε. As a result, for our Markov problem, there exists a positive integer m such that under any policy, the probability of reaching the "empty" state (the state in which no users are connected) from any state within m stages is greater than some ε > 0. It has been shown that for such a problem, the sequence {h^k_j} generated by Eq. 9 converges to a vector h_j such that (Th_j)(t) is equal to the optimal average cost per stage and h_j is an associated differential cost vector ([Ber95]). It also follows that
\[
\tilde{\mu}(i) =
\begin{cases}
\arg\min_{u_a,\, u_b} \left\{ \hat{G}(i, u_a) - \tau \sum_{j \in J(i)} \hat{v}_j + \sum_{i' \in S} \hat{P}_{ii'}(u_a) \sum_{j \in J(i)} \hat{h}_j(f_j(i')), \right. \\
\qquad\qquad \left. \hat{G}(i, u_b) - \tau \sum_{j \in J(i)} \hat{v}_j + \sum_{i' \in S} \hat{P}_{ii'}(u_b) \sum_{j \in J(i)} \hat{h}_j(f_j(i')) \right\} & \text{if } \omega \in \Omega_a, \\[2mm]
\arg\min_{u_h,\, u_d} \left\{ \hat{G}(i, u_h) - \tau \sum_{j \in J(i)} \hat{v}_j + \sum_{i' \in S} \hat{P}_{ii'}(u_h) \sum_{j \in J(i)} \hat{h}_j(f_j(i')), \right. \\
\qquad\qquad \left. \hat{G}(i, u_d) - \tau \sum_{j \in J(i)} \hat{v}_j + \sum_{i' \in S} \hat{P}_{ii'}(u_d) \sum_{j \in J(i)} \hat{h}_j(f_j(i')) \right\} & \text{if } \omega \in \Omega_m,
\end{cases}
\tag{10}
\]

where

\[
\hat{G}(i, u) = \frac{\tau\, G(i, u)}{\bar{\tau}_i(u)}.
\]

Finally, note that since the term \(\tau \sum_{j \in J(i)} \hat{v}_j\) is independent of the control, it need not be calculated in making decisions and can be omitted from Eq. 10.
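Relative value iteration as in Eq. 9 can be sketched on a small discrete-time average-cost MDP standing in for the auxiliary problem (a toy sketch using numpy; `P[u]` is the transition matrix and `G[u]` the per-stage cost vector under control u, and `t` is the fixed reference state):

```python
import numpy as np

def relative_value_iteration(P, G, t=0, iters=200):
    """Relative value iteration: apply the DP mapping T and subtract
    the value at the reference state t at every iteration, as in
    Eq. 9. Returns (average cost per stage, differential cost vector)."""
    n = P.shape[1]
    h = np.zeros(n)
    for _ in range(iters):
        Th = np.min(G + P @ h, axis=0)  # (Th)(i) = min_u [G(i,u) + sum_i' P_ii'(u) h(i')]
        h = Th - Th[t]                  # h^{k+1}(i) = (Th^k)(i) - (Th^k)(t)
    v = np.min(G + P @ h, axis=0)[t]    # (Th)(t) -> average cost per stage
    return v, h
```

On a two-state, two-control example in which one control is uniformly cheaper, the iteration reaches its fixed point within a couple of steps.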
4 Computational Results
In this section, we describe the results of applying the approximation technique described
in Sec. 3 to a simulated cellular system. We rst describe the simulated system in Sec. 4.1.
We then describe the speci c details involved in implementing the approximation technique
for this particular system, and present some computational results in Sec. 4.2.
onal shape. Each cell is divided into two concentric zones, and the outer zone is further divided into six identical regions, as shown in Figure 6. The parameters that need to be specified to determine the geometry for a particular simulation are
1. Mv, the number of cells in the vertical direction,
2. Mh, the number of cells in the horizontal direction, and
3. Ei, the fraction of the area of a cell that comprises the inner region. (Note that Eo, the fraction of the area of a cell that comprises an outer region, is (1 − Ei)/6.)
Path Gains We assume that the path attenuation gain from any part of an inner region of any cell to that cell's base station is A1; i.e., if a user in an inner region of a cell transmits at a power level of w, the signal level received at the cell's base station is wA1. We also assume that the path gain from a periphery region of any cell to that cell's base station is A2 < A1. We assume that the power assignment algorithm assigns power levels of W1 and W2 to users in an inner region of a cell and users in a periphery region, respectively,
Figure 6: Cell Model: Each cell consists of an inner region and six identical periphery regions.
and that W1A1 = W2A2; i.e., we assume that power levels are assigned according to the inverted-channel power control scheme, so that the signal levels received by a base station are the same for all users connected to that base station. We also assume that mobiles within the inner region of a cell, where transmissions occur at relatively low power, cause negligible interference in neighboring cells, while mobiles within the periphery regions, where transmissions occur at relatively high power, may cause significant interference in the neighboring cells that border the particular outer region. We therefore assume that a signal from a user in another cell, in a region adjacent to cell j, is attenuated by a factor of A3 < A2 by the time it is received at the base station of cell j. Fig. 7 illustrates the various possibilities.
Figure 7: The path gain from the inner region of any cell to its own base station is A1, and 0 elsewhere. The path gains from region R1 to cells 1 and 2 are A2 and A3, respectively, and 0 elsewhere. The path gains from R3 to cells 1 and 3 are A2 and A3, respectively, and 0 elsewhere. The path gains from R2 to cells 1, 2, and 3 are A2, A3, and A3, respectively, and 0 elsewhere.
Given our assumptions, the signals received at any base station are of two levels: transmissions from any user within the cell are received at a signal level of I1 = W1A1 = W2A2, and transmissions from any user in an adjacent region of a neighboring cell are received at a signal level of I2 = W2A3. The two signal levels essentially specify the three different path gains, and we use them as the parameters for the simulated system. In addition, we assume that each user must satisfy the following signal-to-noise requirement:

\[
\frac{I_1}{n_1 I_1 + n_2 I_2} > T,
\]
where n1 is the number of users within the cell, n2 is the number of users in adjacent regions of neighboring cells of j, and T is a given threshold. We assume that the receiver noise at each base station is negligible. The parameters that need to be specified are then
1. I1, the signal level received at a base station from any user within the cell,
2. I2, the signal level received at a base station from any user in an adjacent region of a neighboring cell, and
3. T, the signal-to-noise threshold ratio.
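The requirement above can be checked directly. A minimal sketch, with default parameter values taken from the experiments reported later in Sec. 4.2 (I1 = 1, I2 = 0.3, T = 1/50.15):

```python
def signal_requirement_met(n1, n2, I1=1.0, I2=0.3, T=1.0 / 50.15):
    """Check the signal-to-noise requirement I1 / (n1*I1 + n2*I2) > T
    for a base station with n1 users in its own cell and n2 users in
    adjacent regions of neighboring cells."""
    return I1 / (n1 * I1 + n2 * I2) > T
```

With these defaults, a cell can carry 50 interference units before the requirement fails, which matches the acceptable interference total of 50.15 used in the simulations.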
User Behavior The arrival rate of new connection requests to cell j is λ_j. The probability that a connection request occurs in any particular region of a cell is proportional to the size of the region. The directions of travel of users can be unrestricted, in which case a user can move to any region adjacent to his current region, or specified. One possible direction specification is to allow users to move only to regions that are "above" the current region. The probability that a user moves from one cell region to another is q. If the user moves to another region, he is equally likely to move to any region to which he is allowed to move. We assume that the probability that a user moves more than once is zero. The lengths of time that a moving user stays in its originating region and in its destination region are both exponentially distributed. The length of a call of a stationary user is also exponentially distributed.
The parameters that need to be specified to describe user arrivals, departures, and movements are as follows:
1. λ_j,
\[
G(i, u) =
\begin{cases}
B & \text{if } \omega \in \Omega_a,\ u = u_b \\
D & \text{if } \omega \in \Omega_m,\ u = u_d \\
0 & \text{otherwise.}
\end{cases}
\]
The objective function that we wish to minimize is a linear combination of the expected average numbers of blocks and of drops per unit time:

\[
\lim_{N \to \infty} \frac{1}{E(t_N)} \sum_{k=1}^{N} E\{G_k\},
\]

where G_k is the cost of the kth stage.

The parameters that need to be specified to describe the cost function are as follows:
Table 6: List of the features and those used in each feature set.
After the selection of a set of features, we formulated an appropriate feature-based MDP for each cell as illustrated in Sec. 3.2. We then applied relative value iteration to the resulting formulations to obtain the average costs per stage and differential cost vectors for the approximate problems, as described in Sec. 3.3. We applied the policies resulting from these cost functions to various random sequences of a large number of simulated events. We refer to each of these sequences as a particular data set. Each data set consisted of simulated events over a period of 3000 time units.
The problems we considered consisted of a system of four cells by four cells with a single user class. The area of the inner region of a cell was set to Ei = 1/4 and the area of each periphery region was set to Eo = 1/8. Calls generated a unit of interference (I1 = 1) at the base station of the cell in which the call was located. Calls in adjacent regions of neighboring cells generated a fraction 0.3 of a unit of interference (I2 = 0.3). The total amount of interference that was acceptable was set to 50.15, resulting in a threshold of T = 1/50.15. Arrival rates varied from λ = 15 to 30 call requests per unit time. The call completion rate was set to ν = 1 per unit time. The probability that a particular call moves varied from q = 0.2 to q = 0.4. In several of the runs, users that moved were equally likely to move to any neighboring region; in others, users that moved were equally likely to move to any neighboring region that was above the region in which the call was initiated. The cost associated with a blocked call was B = 0.1, and the cost associated with a dropped call was D = 1.0.

For the first six data sets, the arrival rate of connection requests to each cell was the same and varied from 20 to 25 requests per unit time. In the seventh data set, the arrival rate of connection requests was 30 per unit time to the four inner cells and 15 per unit time to the twelve outer cells.
The specific details for each data set are provided in Table 7.

The number of value iterations we performed to obtain our DP policies varied from 40 to 500. We compared the results of applying these policies to the above data sets with those obtained when the greedy algorithm or a reservation policy was applied. Under a greedy algorithm, a user is admitted if there are sufficient resources to handle the user; under a reservation policy, a user is admitted if there are sufficient resources to handle a certain number of later handoffs in addition to the user. In our computations, the optimal number of later handoffs for which to reserve resources was always one. The greedy and reservation algorithms are
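The two baseline rules just described can be sketched as a single admission test (a hypothetical interface; `load` and `demand` are in the same resource units as `capacity`, and `guard` is the number of later handoffs for which resources are reserved):

```python
def admit(load, demand, capacity, policy="greedy", guard=1):
    """Baseline admission rules: the greedy policy admits whenever
    resources suffice for the new user; the reservation policy also
    keeps enough spare capacity for `guard` later handoffs."""
    if policy == "greedy":
        return load + demand <= capacity
    return load + (1 + guard) * demand <= capacity
```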
Data Set   Arrival    Travel      Prob of    Total Req'd    # Attempted
           Rate       Direction   Movement   Connections    Movements
1          25         Uniform     0.2        1197265        232824–234402
2          25         Upwards     0.2        1199966        224420–226087
3          20         Uniform     0.3        958454         285919–275728
4          20         Upwards     0.3        959784         275128–275728
5          25         Uniform     0.4        1198456        437022–444531
6          25         Upwards     0.4        1199540        413321–426210
7          15 or 30   Uniform     0.3        898742         254554–256559

Table 7: List of the simulation data sets and their parameters. Each simulation was run for 3000 time units. The expected length of a connection was 1 unit time.
Table 8: Results of applying cell-by-cell decomposition with feature extraction to Data Set 1.
The policies obtained by applying dynamic programming to the formulations resulting from cell-by-cell decomposition with feature extraction performed significantly better than the greedy policy and somewhat better than the reservation policy. The improvement in cost over the greedy policy varied from twenty to nearly forty-five percent. In a few cases when the number of value iterations
Table 9: Results of applying cell-by-cell decomposition with feature extraction to Data Set 2.
was small, the DP solution performed slightly worse than the reservation policy, but generally the DP solution resulted in a cost that improved upon the reservation policy by amounts ranging from negligible to about thirteen percent. Increasing the number of iterations generally improved performance, but neither consistently nor substantially. When the utilization of the system was high, the DP solution outperformed the greedy policy to a greater extent, but outperformed the reservation policy to a lesser extent. In general, performance did not seem to depend on which of the three feature sets was used. In fact, the first feature set often outperformed the others despite having the coarsest state space.

Table 10 summarizes the best results of applying cell-by-cell decomposition with feature extraction.
5 Summary

In this paper, we formulated the admission control problem as a Markov decision problem. One of the main advantages of such a formulation is that it can incorporate an arbitrary amount of the detail necessary to describe real cellular systems. While dynamic programming can be used to solve such problems, the large size of the state space makes this impractical. We proposed an approximate dynamic programming technique, cell-by-cell decomposition with feature extraction, which involves creating an approximation of the original model with a state space sufficiently small that dynamic programming can be applied. The real-time computation required by our proposed technique is small and is easily distributed between
           Feature Set 1       Feature Set 2       Feature Set 3
Data Set   % Imp/    % Imp/    % Imp/    % Imp/    % Imp/    % Imp/
Number     Greedy    Reserv    Greedy    Reserv    Greedy    Reserv
1          23.97     10.34     24.01     10.39     23.62      9.92
2          27.49      9.69     27.43      9.63     26.54      8.52
3          25.02     12.60     23.91     11.31     25.05     12.64
4          30.77     11.84     30.77     11.84     29.99     10.84
5          40.40      3.08     40.39      3.07     39.33      1.33
6          44.70      4.71     44.43      4.24     43.96      3.43
7          33.39      7.23     33.14      6.89     32.21      5.59

Table 10: Summary of the best results of applying cell-by-cell decomposition with feature extraction to the seven data sets. The cost improvements of the results for each of the feature sets over the results of applying the greedy and reservation policies are provided.
the cells. Our comparisons with commonly used heuristic policies on a simulated system have shown significant improvements in performance for our methodology.
Acknowledgments
This research was supported by NSF under grant NCR-9622636.
References

[AAN96] P. Agrawal, D. K. Anvekar, and B. Narendran. Channel management policies for handovers in cellular networks. Bell Labs Technical Journal, pp. 97–110, Autumn 1996.

[AS96] M. Asawa and W. E. Stark. Optimal scheduling of handoffs in cellular networks. IEEE/ACM Transactions on Networking, 4:428–441, June 1996.

[Ber95] D. P. Bertsekas. Dynamic Programming and Optimal Control, Vols. I and II. Athena Scientific, Belmont, MA, 1995.

[BWE95] C. Barnhart, J. Wieselthier, and A. Ephremides. Admission-control policies for multihop wireless networks. Wireless Networks, 1:373–387, 1995.

[CG98] Y. Chang and E. Geraniotis. Optimal policies for handoff and channel assignment in networks of LEO satellites using CDMA. Wireless Networks, 4:181–187, 1998.

[Gue88] R. Guerin. Queueing-blocking system with two arrival streams and guard channels. IEEE Transactions on Communications, 36(2):153–163, 1988.

[HR86] D. Hong and S. S. Rappaport. Traffic model and performance analysis for cellular mobile radio telephone systems with prioritized and nonprioritized handoff procedures. IEEE Transactions on Vehicular Technology, VT-35(3):77–92, Aug. 1986.

[LAN97] D. A. Levine, I. F. Akyildiz, and M. Naghshineh. A resource estimation and call admission algorithm for wireless multimedia networks using the shadow cluster concept. IEEE/ACM Transactions on Networking, 5:1–12, Feb. 1997.

[PG85] E. C. Posner and R. Guerin. Traffic policies in cellular radio that minimize blocking of handoff calls. ITC 11, Kyoto, Japan, 1985.

[SS97] M. Sidi and D. Starobinski. New call blocking versus handoff blocking in cellular networks. Wireless Networks, 3:15–27, 1997.

[TJ92] S. Tekinay and B. Jabbari. A measurement-based prioritization scheme for handovers in mobile cellular networks. IEEE Journal on Selected Areas in Communications, 10:1343–1350, 1992.