Optimizing Coverage and Capacity in Cellular Networks Using Machine Learning




Ryan M. Dreifuerst∗, Samuel Daulton†, Yuchen Qian†, Paul Varkey†, Maximilian Balandat†, Sanjay Kasturia†, Anoop Tomar†, Ali Yazdan†, Vish Ponnampalam†, Robert W. Heath‡

∗Wireless Networking and Communications Group, The University of Texas at Austin, Austin, TX
†Facebook, Menlo Park, CA
‡Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC

arXiv:2010.13710v2 [eess.SP] 8 Feb 2021

Abstract—Wireless cellular networks have many parameters that are normally tuned upon deployment and re-tuned as the network changes. Many operational parameters affect reference signal received power (RSRP), reference signal received quality (RSRQ), signal-to-interference-plus-noise ratio (SINR), and, ultimately, throughput. In this paper, we develop and compare two approaches for maximizing coverage and minimizing interference by jointly optimizing the transmit power and downtilt (elevation tilt) settings across sectors. To evaluate different parameter configurations offline, we construct a realistic simulation model that captures geographic correlations. Using this model, we evaluate two optimization methods: deep deterministic policy gradient (DDPG), a reinforcement learning (RL) algorithm, and multi-objective Bayesian optimization (BO). Our simulations show that both approaches significantly outperform random search and converge to comparable Pareto frontiers, but that BO converges with two orders of magnitude fewer evaluations than DDPG. Our results suggest that data-driven techniques can effectively self-optimize coverage and capacity in cellular networks.

Index Terms—coverage and capacity optimization, Bayesian optimization, machine learning, reinforcement learning

I. INTRODUCTION

Simultaneous optimization of coverage and capacity (CCO) is a central problem in the deployment and configuration of wireless cellular networks. Data-driven methods are ideally suited to this problem, given the complexity of network configuration and the availability of performance-related data. However, prior work on intelligent network configuration in industry has only considered rule-based configuration settings, e.g., the self-organizing network (SON) paradigm for 4G networks [1], and has required costly manual oversight due to the many radio environments and deployment configurations [2], [3], [4]. Machine learning (ML) and modern Bayesian optimization techniques have the potential to identify solutions that improve performance and reduce the operationally expensive manual oversight.

As the data collection capabilities of cellular networks increase, there are more opportunities to tackle CCO with ML [5]. In addition, without improved optimization, the demand on cellular networks will require dramatic densification [6]. Densification will increase cell overlap and result in massive, complex networks to manage, an impossible task for rule-based systems. Initially, optimization was only considered for static cell boundaries by first dividing a region into serving cell blocks and then optimizing poorly performing cells using ML techniques [5]. This does not allow cell regions to change, however, which is very inaccurate when antenna downtilt is modified. Reinforcement learning is a useful tool for CCO thanks to its natural formulation as a state-action-reward dynamic ([7], [8]). Early work was reported in [7], but this required individual cells to be optimized independently. Multi-cell optimization was performed with RL in [9], but the choice of optimization parameters for signal power and interference power was not investigated. More recently, [8] proposed a multi-agent mean-field RL algorithm for macrocell optimization. While scalable to large networks, this method did not consider realistic radio channel characteristics.

In this paper, we propose optimizing capacity and coverage in a multi-cell network by jointly tuning the downtilt and the transmit power in each sector. We treat the coverage and capacity objectives as black-box functions, with no analytical formula and no gradient observations. The goal is to maximize coverage and capacity by optimizing the downtilt and transmit power in each sector over a bounded set $\mathcal{X} \subset \mathbb{R}^d$. In this multi-objective optimization problem, there is no single best solution; rather, the goal is to identify the set of Pareto optimal solutions, where any increase in coverage means degrading capacity (and vice versa). We evaluate two black-box optimization techniques for solving the multi-objective optimization problem: Bayesian optimization (BO) [10], a sample-efficient method that uses a probabilistic surrogate model to balance exploration and exploitation, and the deep deterministic policy gradient algorithm (DDPG) [11], which has been shown to have strong empirical performance over high-dimensional, continuous action spaces. Since there is no single best solution, we compare the methods in terms of the quality of the solution sets identified by each method and the sample efficiency of each method. Since running field experiments on a real cell network is time-consuming and could degrade network performance, we develop a spatially consistent RF coverage map simulator for evaluating the optimization methods. Due to the computational complexity of realistic propagation models, we combine offline computation of radio
propagation using a MATLAB-based open-source simulator from the Fraunhofer Heinrich Hertz Institute [12] with Python-based post-processing to compute the user-defined coverage and capacity objectives. Our code and simulations are available on GitHub [13].

II. PROBLEM FORMULATION

Consider a pre-defined area of interest containing multiple base stations with multiple sectors, under the management of a single controller. We identify areas of under-coverage (coverage holes) as those locations that do not have enough received signal strength. We identify areas of over-coverage as those where the interfering cells generate too much interference. To define these concepts more precisely, we use the notion of RSRP. In LTE networks, RSRP is a common performance indicator reported by user equipment (UE) to represent the signal level and coverage quality, and it is used for determining the serving cell. Using the RSRP, under-covered areas are those locations where the maximum RSRP from any sector falls below a pre-specified threshold, while over-covered areas are those locations where the difference between the maximum RSRP and the sum of received powers from the other sectors does not exceed another pre-specified threshold. The use of pre-defined thresholds is based on receiver sensitivity, selectivity, network density, and interference management for frequency reuse, which is standard in LTE networks. Common values for these thresholds are 6 dB for over-coverage and -110 dBm for under-coverage.

A configuration x specifies a joint setting of transmit power and downtilt for each sector. In this work, we consider 10 discretized values for downtilt selection and allow the transmit power to be a continuous value defined over a restricted range. Note that the joint setting space, for N controllable antennas, is exponential in N, which precludes naive brute-force search.

Let $\gamma_w$ and $\gamma_o$ be the weak-coverage and over-coverage thresholds, respectively. We consider a 2D grid-world representation of the area of interest, and index the row and column of each grid point by $i$ and $j$, respectively. In the following, we let $b$ denote the sector antenna from which the highest signal power is received at a given location, also known as the attached or serving sector, and $r_{ij}^{(b)}$ denotes the RSRP from that sector; $b'$ denotes any other sector.

If $r_{ij}^{(b)} < \gamma_w$ (or $r_{ij}^{(b)} - \sum_{b' \neq b} r_{ij}^{(b')} < \gamma_o$), location $[i, j]$ is said to be under-covered (respectively, over-covered). Minimizing both under-coverage and over-coverage presents a multi-objective optimization problem. Typically in multi-objective optimization, there is no single best solution; rather, the goal is to identify the set of Pareto optimal solutions. A solution is Pareto optimal if any improvement of one metric deteriorates another. Provided with the Pareto set of optimal trade-offs, network operators can select a Pareto optimal configuration according to their preferences. The choice of optimizing over coverage thresholds ensures serving areas are evaluated fairly and that network-wide performance metrics can be defined over regions rather than UE-specific metrics.

It is illustrative to understand the significance of these two objectives by inspecting a linear combination of them. Let a linear combination of these objectives depend on the setting x and a parameter $\lambda$. Then an optimal setting $x^*$ that maximizes the linear combination may be written as

$$
\begin{aligned}
x^* &= \arg\max_x \; u(x) \\
    &= \arg\max_x \; \lambda \sum_{i,j} r_{ij}^{(b)} + (1-\lambda) \sum_{i,j} \Big( r_{ij}^{(b)} - \sum_{b' \neq b} r_{ij}^{(b')} \Big) \\
    &= \arg\max_x \; \sum_{i,j} \Big( r_{ij}^{(b)} - (1-\lambda) \sum_{b' \neq b} r_{ij}^{(b')} \Big) \\
    &\overset{(a)}{=} \arg\max_x \; \sum_{i,j} \Big( 10 \log S_{ij}^{(b)} - (1-\lambda)\, 10 \log I_{ij}^{(b)} \Big) \\
    &= \arg\max_x \; \sum_{i,j} 10 \log \frac{S_{ij}^{(b)}}{\big(I_{ij}^{(b)}\big)^{1-\lambda}}
\end{aligned}
$$

In step (a), we change variables from $r_{ij}^{(b)}$ to $10 \log S_{ij}^{(b)}$ and from $\sum_{b' \neq b} r_{ij}^{(b')}$ to $10 \log I_{ij}^{(b)}$ to signify signal and interference power (in dB).

Thus we see that the optimization seamlessly trades off between optimizing signal-to-interference power ($\lambda = 0$) and RSRP ($\lambda = 1$), which roughly correspond to optimizing for capacity and coverage, respectively.

In the rest of this work, we select the following two sums of sigmoid functions, centered around the respective thresholds, as the dual objectives to optimize. The under-coverage (1) and over-coverage (2) objectives are defined as

$$
\sum_{i,j} \sigma\big(\gamma_w - r_{ij}^{(b)}\big) \quad : \; \text{under-coverage} \qquad (1)
$$

$$
\sum_{i,j} \sigma\Big(\sum_{b' \neq b} r_{ij}^{(b')} - r_{ij}^{(b)} + \gamma_o\Big) \quad : \; \text{over-coverage} \qquad (2)
$$

where $\sigma$ is the sigmoid function. The use of sigmoids is important because gradient-based optimization with hard thresholding results in sparse gradients and poor learning. Specifically, learning only occurs if a region is either under- or over-covered, but not if a region could merely be improved. Instead, a soft thresholding function that saturates, is differentiable, and is monotonic will lead to dense gradients. The sigmoid function in particular is a commonly used choice for soft thresholding. Note that the signs are flipped to signify that both under-coverage and over-coverage are to be minimized.
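To make the two objectives concrete, the following NumPy sketch evaluates the soft under-coverage and over-coverage penalties of (1) and (2) on a grid of per-sector RSRP values. This is an illustration, not the released code; the array layout, dBm units, and the linear-domain summation of the interfering powers are assumptions.

```python
import numpy as np

def sigmoid(z):
    """Numerically safe logistic function used as the soft threshold."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60.0, 60.0)))

def coverage_objectives(rsrp_dbm, gamma_w=-110.0, gamma_o=6.0):
    """Soft under-coverage (1) and over-coverage (2) penalties.

    rsrp_dbm: array of shape (num_sectors, H, W) with the RSRP in dBm
              received from every sector antenna at every grid point.
    Returns (under, over); both are to be minimized.
    """
    serving = rsrp_dbm.max(axis=0)                        # r_ij^(b), shape (H, W)
    # Sum the interfering powers in the linear domain and convert back to dB;
    # this is one reading of the sum over b' != b in Eq. (2).
    total_mw = np.power(10.0, rsrp_dbm / 10.0).sum(axis=0)
    interf_mw = np.maximum(total_mw - np.power(10.0, serving / 10.0), 1e-12)
    interference_db = 10.0 * np.log10(interf_mw)

    under = sigmoid(gamma_w - serving).sum()                   # Eq. (1)
    over = sigmoid(interference_db - serving + gamma_o).sum()  # Eq. (2)
    return under, over
```

Evaluating this pair of numbers for a candidate configuration's RSRP maps is the black-box query that both optimizers considered below are given.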
III. SIMULATION DATA GENERATION

The RF reference signal received power (RSRP) maps are simulated using a MATLAB tool suite called QuaDRiGa [12]. We consider a 1.2 × 1.2 km² network site composed of five base stations with three sector antennas each, with controllable downtilt $d_i \in \{0, 1, \ldots, 10\}$ and transmit power $p_i \in [30, 50]$ dBm, making the configuration vector $x = [a_1, a_2, \ldots, a_{15}]^T$, $a_i = [d_i, p_i]^T$. Locations of the base stations and azimuth orientations of the sectors are selected in advance
and antenna parameters are based on commonly available hardware. Simulating each of the exponentially many joint settings is infeasible (with 10 downtilt values for each of the 15 sectors, there are already $10^{15}$ discrete tilt combinations alone). Instead, we sample the RF channel parameters once for each grid point from each sector antenna. Then we iteratively store the coverage map generated by $a_i$ for each downtilt, totaling 150 coverage maps for one RF environment. During model training, for any given input joint configuration x, the associated downtilt maps are scaled by adjusting the RSRP at each point by the associated transmit power. An example of the sum of RSRP maps over each sector antenna is shown in Fig. 1.

Fig. 1. Example of an RSRP map summed over all the sectors. Base stations are marked by a red circle, and all antennas are configured to 5° downtilt and 46 dBm transmit power. The upper right base station, circled, has a strong effect on local RSRP due to being placed only 20 m above ground as opposed to 25-30 m for the other sites.

To ground intuition about the dual objectives of interest (under-coverage and over-coverage), in Figure 2 we illustrate the RF coverage KPIs for minimum power and maximum power, on the left and right sides, respectively. The downtilt was arbitrarily set. The gray color represents good coverage, and black and white represent, respectively, areas of under-coverage (prevalent in the low-power regime) and over-coverage (prevalent in the high-power regime).

Fig. 2. RF coverage maps showing under-coverage and over-coverage measures for minimum and maximum powers, respectively. Under-coverage is shown in black while over-coverage is shown in white.
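A minimal sketch of the lookup-and-scale step described above is shown below. The array layout, function names, and the reference transmit power are illustrative assumptions rather than the released simulator interface.

```python
import numpy as np

def assemble_rsrp_maps(stored_maps, downtilts, tx_powers_dbm, ref_power_dbm=46.0):
    """Build the per-sector RSRP maps for one configuration x.

    stored_maps:   array (num_sectors, num_downtilts, H, W), RSRP in dBm,
                   simulated once per (sector, downtilt) at ref_power_dbm.
    downtilts:     int array (num_sectors,), downtilt index per sector.
    tx_powers_dbm: float array (num_sectors,), requested transmit powers.
    Returns an array (num_sectors, H, W) of RSRP maps for this configuration.
    """
    num_sectors = stored_maps.shape[0]
    maps = np.empty((num_sectors,) + stored_maps.shape[2:])
    for s in range(num_sectors):
        # Select the stored map for this sector's downtilt and shift it by the
        # difference between requested and reference transmit power (in dB).
        maps[s] = stored_maps[s, downtilts[s]] + (tx_powers_dbm[s] - ref_power_dbm)
    return maps
```

The resulting maps can then be fed directly into the objective computation sketched in Section II.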
IV. BAYESIAN OPTIMIZATION

Bayesian optimization (BO) is a sample-efficient method for black-box optimization that uses a probabilistic surrogate model to balance exploration and exploitation. The surrogate model is typically a Gaussian process (GP), which provides a posterior distribution $P(f \mid \mathcal{D})$ over the true function values given observed data $\mathcal{D} = \{x_n, y_n\}_{n=1}^{N}$ and is known for its well-calibrated uncertainty estimates [14]. BO provides a distinct advantage over deep learning approaches because of these uncertainty estimates and its natural foundation in Bayesian theory. In BO, the next candidate is selected as the configuration that maximizes an acquisition function $\alpha(x)$ that quantifies the utility of evaluating x on the true function. While the true function is expensive to evaluate, the acquisition function is cheap to compute and optimize using the surrogate model, so the next candidate is selected as $x_{N+1} = \arg\max_x \alpha(x)$.

Evaluating a configuration x in a real deployment is time-consuming because it requires running a field trial using x and measuring the observed weak-coverage and over-coverage. Since only a single configuration can be evaluated at a time, efficiently converging to the set of optimal configurations is critical.

We model each objective with an independent GP with a Matérn-5/2 ARD kernel and fit the GP hyperparameters using maximum a posteriori estimation. To optimize both objectives simultaneously, we use the q-Expected Hypervolume Improvement (qEHVI) acquisition function [15]. We initialized the optimization with a space-filling design using 512 points from a scrambled Sobol sequence and then ran 500 iterations of BO.
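The setup above maps closely onto the open-source BoTorch stack [10], [15]. The following is a minimal sketch in that style, not the authors' released code: import paths and constructor signatures vary across BoTorch releases, the dummy evaluate() stands in for the simulator, and the reference point, unit-cube bounds, and purely continuous treatment of downtilt are illustrative assumptions.

```python
import torch
from botorch.models import SingleTaskGP, ModelListGP
from botorch.fit import fit_gpytorch_model
from botorch.optim import optimize_acqf
from botorch.utils.sampling import draw_sobol_samples
from botorch.acquisition.multi_objective.monte_carlo import (
    qExpectedHypervolumeImprovement,
)
from botorch.utils.multi_objective.box_decompositions.non_dominated import (
    NondominatedPartitioning,
)
from gpytorch.mlls import SumMarginalLogLikelihood

# Normalized bounds of the 15 x (downtilt, power) configuration space, d = 30.
bounds = torch.stack([torch.zeros(30), torch.ones(30)]).double()
# BoTorch maximizes, so both penalties are negated; the reference point is an
# assumed worst acceptable value for each (negated) objective.
ref_point = torch.tensor([-1.0e3, -1.0e3], dtype=torch.double)

def evaluate(X):
    # Stand-in for the simulator: negated (under-, over-coverage) penalties.
    return -torch.rand(X.shape[0], 2, dtype=torch.double) * 1.0e3

# Space-filling initialization (the paper uses 512 scrambled Sobol points).
train_X = draw_sobol_samples(bounds=bounds, n=512, q=1).squeeze(1)
train_Y = evaluate(train_X)

for _ in range(500):
    # One independent GP per objective (Matern-5/2 ARD is the SingleTaskGP default).
    model = ModelListGP(
        *[SingleTaskGP(train_X, train_Y[:, i : i + 1]) for i in range(2)]
    )
    fit_gpytorch_model(SumMarginalLogLikelihood(model.likelihood, model))

    partitioning = NondominatedPartitioning(ref_point=ref_point, Y=train_Y)
    acqf = qExpectedHypervolumeImprovement(
        model=model, ref_point=ref_point.tolist(), partitioning=partitioning
    )
    candidate, _ = optimize_acqf(
        acqf, bounds=bounds, q=1, num_restarts=10, raw_samples=256
    )
    train_X = torch.cat([train_X, candidate])
    train_Y = torch.cat([train_Y, evaluate(candidate)])
```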
V. REINFORCEMENT LEARNING: DDPG

The DDPG algorithm integrates deep neural networks with the deterministic policy gradient algorithm. Based on the actor-critic architecture, it takes advantage of both policy-based and value-based learning methods to learn 'end-to-end' policies in high-dimensional and continuous action spaces [11]. The actor network learns the policy, mapping states to specific actions deterministically, and the critic network learns the Q-values of state-action pairs.

In the current DDPG model, the states are defined as the operational status of the different sectors. Typically all sectors operate normally, so the state remains unchanged over time. The actions contain the N-length configuration vector x. The reward function is defined to be a convex combination of the two coverage metrics:

$$
\lambda \sum_{i,j} \sigma\big(\gamma_w - r_{ij}^{(b)}\big) + (1-\lambda) \sum_{i,j} \sigma\Big(\sum_{b' \neq b} r_{ij}^{(b')} - r_{ij}^{(b)} + \gamma_o\Big).
$$

We formulate the reward as a convex combination because the two metrics, over- and under-coverage, are intuitively and visually shown to trade off along a convex frontier in the BO results of Fig. 4. As a result, a convex combination will explore the Pareto frontier. The algorithm will concentrate on areas of over-coverage or areas of under-coverage according to different λ values (see the discussion in Section II).
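For reference, the reward for a given λ can be expressed in terms of the objective helper sketched in Section II; the negation is an implementation assumption, since the combined term is a penalty to be minimized while an RL agent maximizes reward.

```python
def cco_reward(under, over, lam):
    """Convex-combination reward for a given lambda.

    `under` and `over` are the outputs of coverage_objectives() from the
    Section II sketch; the sign flip is an assumed implementation detail.
    """
    return -(lam * under + (1.0 - lam) * over)
```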
The simulation sweeps λ from 0 to 1 in strides of 0.1 to determine the best configurations in different situations. The replay memory capacity is set to 5,000. At training time, exploration is performed by adding Gaussian noise with decaying variance to the actions. Initially, the noise variance starts at 1, and the decay coefficient is 0.9996 per iteration. For one specific setting of λ, the algorithm runs for 30,000 iterations. The choice of parameters did not greatly affect performance, but we found these to perform best in our investigation.
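Schematically, the λ sweep and exploration schedule amount to the loop below. The agent and environment interfaces are placeholders for illustration, not the authors' implementation.

```python
import numpy as np

def train_ddpg_sweep(make_agent, env, iters_per_lambda=30_000,
                     noise_var0=1.0, noise_decay=0.9996, replay_capacity=5_000):
    """Sweep lambda from 0 to 1 in 0.1 strides, training one DDPG agent each.

    make_agent(replay_capacity) -> agent exposing select_action(state) and
    update(state, action, reward, next_state); env.step(action, lam) returns
    (next_state, reward). These interfaces are assumed placeholders.
    """
    best_per_lambda = {}
    for lam in np.round(np.arange(0.0, 1.01, 0.1), 1):
        agent = make_agent(replay_capacity)
        state = env.reset()
        noise_var = noise_var0
        best = None
        for _ in range(iters_per_lambda):
            # Exploration: Gaussian noise with decaying variance on the actions.
            action = agent.select_action(state)
            action = action + np.random.normal(0.0, np.sqrt(noise_var), size=action.shape)
            next_state, reward = env.step(action, lam)
            agent.update(state, action, reward, next_state)
            state = next_state
            noise_var *= noise_decay
            if best is None or reward > best[0]:
                best = (reward, action)
        best_per_lambda[lam] = best
    return best_per_lambda
```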
Figure 3 contrasts the improved RF coverage obtained by DDPG (right-hand side) with the best setting found by random search (left-hand side) for λ = 0.6 (as before, gray represents areas of good coverage, while black and white represent under-coverage and over-coverage, respectively). Note that the results from BO are visually similar and so are not shown here.

Fig. 3. Random vs. DDPG for λ = 0.6: (A) the RF coverage map with the best settings from random search, resulting in 12% under-coverage and 25% over-coverage area; (B) the RF coverage map with optimized settings from DDPG, resulting in 9% under-coverage and 17% over-coverage area.

Fig. 4. Comparison of the Pareto frontiers for random search, BO, and DDPG. DDPG achieves the best frontier, with an average improvement of 1.0% over BO. As expected, the two conflicting metrics create a convex frontier in $\mathbb{R}^2$.

VI. DISCUSSION AND FUTURE WORK

In Figure 4, we compare DDPG and BO with a baseline policy that randomly selects configurations with uniform probability and retains the best configuration. Both BO and DDPG significantly outperform random search and identify comparable Pareto frontiers. In finding the frontier, however, DDPG required an explicit convex combination, whereas BO is able to sample along the frontier through multi-objective optimization. We also see that the RL approach is able to identify a slightly better Pareto frontier. However, DDPG required evaluating over two orders of magnitude more configurations than BO: the DDPG Pareto frontier was identified with 300,000 evaluations, while BO used only 1,012 evaluations. Given that BO can identify a comparable Pareto front and improve sample efficiency by over two orders of magnitude relative to DDPG, BO shows significant promise for optimizing network coverage and capacity in real-world field experiments where sample efficiency is crucial.

In future work, we consider two directions: network scaling and risk-averse optimization. Scaling up to larger geographic areas with larger search spaces will be critical for real-world deployment, where optimization from a single central source may be infeasible for hundreds or thousands of cells. Another practical concern is ensuring that a parameter configuration does not lead to unacceptable quality of network service (e.g., from large coverage gaps). Neither the DDPG nor the BO method considered in this work ensures that only "safe" parameter configurations will be tested, but there is a body of literature on risk-averse BO [16] and RL [17] that could be used to mitigate the risk of severe degradation in network performance.

Automated coverage and capacity optimization (CCO) in cellular networks has a long history, beginning with rule-based or heuristic SONs. With the extensive data collection capabilities now supported by 3GPP, LTE networks are moving from rule-based SONs to ML-based SONs [5]. Our work considers two approaches: deep deterministic policy gradient and Bayesian optimization. We have created a simulation environment to generate realistic RF environments and have shown the results of these algorithms for optimizing cell configurations to balance the conflicting goals of capacity and coverage in a network. Our results demonstrate the utility of principled multi-objective optimization for sample-efficient network optimization.

REFERENCES

[1] O. G. Aliu, A. Imran, M. A. Imran, and B. Evans, "A Survey of Self Organisation in Future Cellular Networks," IEEE Communications Surveys & Tutorials, vol. 15, no. 1, pp. 336–361, 2013.
[2] V. Buenestado, M. Toril, S. Luna-Ramírez, J. M. Ruiz-Avilés, and A. Mendo, "Self-tuning of Remote Electrical Tilts Based on Call Traces for Coverage and Capacity Optimization in LTE," IEEE Transactions on Vehicular Technology, vol. 66, no. 5, pp. 4315–4326, 2017.
[3] A. Temesvary, "Self-Configuration of Antenna Tilt and Power for Plug & Play Deployed Cellular Networks," in 2009 IEEE Wireless
Communications and Networking Conference, 2009, pp. 1–6.
[4] M. Amirijoo, L. Jorguseski, R. Litjens, and L. C. Schmelz, “Cell
Outage Compensation in LTE Networks: Algorithms and Performance
Assessment,” in 2011 IEEE 73rd Vehicular Technology Conference (VTC
Spring), 2011, pp. 1–5.
[5] Z. Lin, Y. Ouyang, L. Su, W. Lu, and Z. Li, “A Machine Learning
Assisted Method of Coverage and Capacity Optimization (CCO) in 4G
LTE Self Organizing Networks (SON),” in 2019 Wireless Telecommu-
nications Symposium (WTS), 2019, pp. 1–9.
[6] R. Jurdi, J. G. Andrews, D. Parsons, and R. W. Heath, “Identifying
coverage holes: Where to densify?,” in 2017 51st Asilomar Conference
on Signals, Systems, and Computers, 2017, pp. 1417–1421.
[7] R. Razavi, S. Klein, and H. Claussen, “Self-optimization of Capacity
and Coverage in LTE Networks using a Fuzzy Reinforcement Learning
Approach,” in 21st Annual IEEE International Symposium on Personal,
Indoor and Mobile Radio Communications, 2010, pp. 1865–1870.
[8] E. Balevi and J. G. Andrews, “Online Antenna Tuning in Heteroge-
neous Cellular Networks With Deep Reinforcement Learning,” IEEE
Transactions on Cognitive Communications and Networking, vol. 5, no.
4, pp. 1113–1124, 2019.
[9] N. Dandanov, H. Al-Shatri, A. Klein, and V. Poulkov, “Dynamic Self-
Optimization of the Antenna Tilt for Best Trade-off Between Coverage
and Capacity in Mobile Networks,” Wireless Personal Communications,
vol. 92, no. 1, pp. 251–278, 2017.
[10] M. Balandat, B. Karrer, D. R. Jiang, S. Daulton, B. Letham, A. G. Wil-
son, and E. Bakshy, “BoTorch: Programmable Bayesian Optimization
in PyTorch,” in Advances in Neural Information Processing Systems 33.
2020.
[11] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. A.
Riedmiller, “Deterministic Policy Gradient Algorithms,” in ICML, 2014.
[12] F. Burkhardt, S. Jaeckel, E. Eberlein, and R. Prieto-Cerdeira,
“QuaDRiGa: A MIMO channel model for land mobile satellite,” in
European Conference on Antennas and Propagation (EuCAP 2014),
2014, pp. 1274–1278.
[13] R. M. Dreifuerst, “Quadriga-based RF channel and response simulation
(data and code),” https://github.com/Ryandry1st/CCO-in-ORAN.
[14] C. E. Rasmussen, Gaussian Processes in Machine Learning, pp. 63–71,
Springer Berlin Heidelberg, Berlin, Heidelberg, 2004.
[15] S. Daulton, M. Balandat, and E. Bakshy, “Differentiable Expected
Hypervolume Improvement for Parallel Multi-Objective Bayesian Opti-
mization,” in Advances in Neural Information Processing Systems 33.
2020.
[16] S. Cakmak, R. Astudillo, P. Frazier, and E. Zhou, “Bayesian Optimiza-
tion of Risk Measures,” 2020.
[17] P. Geibel and F. Wysotzki, Risk-Sensitive Reinforcement Learning, pp.
2866–2869, Springer US, Boston, MA, 2012.
