Q-Learning Based Link Adaptation in 5G
Abstract—This paper proposes a novel method of constructing a Q-learning framework for link adaptation (LA) in the fifth generation (5G) mobile network. The state-action function is approximated via a neural network (NN). The state space relies on the hybrid automatic repeat request (HARQ) and channel state information (CSI) reports from the user equipment (UE). The output of the Q-learning based LA (QLA) approach consists of the assigned modulation and coding schemes (MCSs) and the number of layers, which are used to construct the action space. The reward is calculated based on the HARQ information and the transport block size (TBS). System-level simulations in a typical indoor hotspot scenario have been performed, showing that the proposed QLA outperforms the ordinary LA approach in terms of user throughput.

Index Terms—Q-learning, link adaptation, neural network, 5G.

I. INTRODUCTION

The emerging demand for high-speed, reliable communications with significantly improved user experience has been driving the development of the fifth generation (5G) wireless communication network [1]. Currently, 5G networks have introduced technologies such as massive multiple-input multiple-output (MIMO), millimeter wave (mmWave), and vehicle-to-everything (V2X) to cover an unprecedentedly wide range of application scenarios. This means that a 5G network should be able to deliver information in various channel conditions. Additionally, it can be expected that the application of artificial intelligence and machine learning [2] can further enhance the capability of 5G networks.

Link adaptation (LA) is used to ensure link throughput and reliability under different channel conditions. LA relies on the channel measurement feedback and packet acknowledgement from a UE to meet a certain block error rate (BLER) target, and it has played an important role in both the fourth generation (4G) and 5G mobile networks.

LA consists of adjusting the modulation order and code rate (MCS) and the number of MIMO layers, also known as the transmission rank. A higher MCS and transmission rank can potentially increase the data rate but at the same time increase the BLER. Ordinary LA (OLA) approaches such as [3] and [4] use filtering methods to decide the increment and decrement of MCSs based on the statistics of the HARQ information of a UE. These rely on many heuristically tuned parameters in the LA design process to optimize performance. The main benefits of OLA are simple implementation and smooth transitions between MCSs and transmission ranks. However, there are two main drawbacks in the OLA approach. First, heuristically tuned parameters can lead to suboptimal performance. Second, the number of these parameters can explode as the number of 5G multi-service scenarios becomes massive.

As a result, researchers have been studying how to introduce network intelligence in order to simultaneously minimize the number of heuristically tuned parameters and provide optimal performance [5]. One consideration is reinforcement learning [6]. The authors in [7] considered the state-action-reward-state-action (SARSA) learning approach for LA. However, the number of layers, which plays a crucial role in a 5G MIMO system, is not taken into account in the framework of [7]. Also, the state in [7] does not consider any past acknowledgement information, which can result in sudden fluctuations in instantaneous UE throughput. In a multi-cell mobile network, two architectures of reinforcement learning are widely considered. One is the single-agent method, which assumes that there exists a central agent controlling all the actions of the base stations. This architecture is close to optimum, but it is less practical because the overhead is high when the central agent attempts to collect information from all base stations. The other architecture is the multi-agent method, i.e., multiple agents perform reinforcement learning individually. There may be shared information across all agents, which can still increase overhead.

This paper proposes a Q-learning framework for LA (QLA). The key contributions of this paper are two-fold.

1) The proposed framework combines the benefits of OLA and reinforcement learning based methods. The proposed QLA considers past HARQ information and CSI feedback from the user in a state. This design takes the filtering element from OLA to guarantee smooth transitions between MCSs and transmission ranks. In the action space, the number of transmission ranks is obtained in addition to the MCSs. Additionally, a neural network (NN) is applied to approximate the state-action function. The corresponding architecture of the QLA-based radio access network (RAN) is depicted in Fig. 1, where multiple QLA modules are equipped in the medium access control (MAC)
layer. Each module can be assigned to a UE and performs LA for that UE individually.
where a_K ∈ {0, 1}^K is a vector recording the HARQ information and r_K ∈ {1, 2, 3, 4}^K is a vector recording the rank information of the past K transmissions. The metric β is then mapped to the MCSs and Nlayers to be assigned in the next transmission: if β is larger than a heuristic threshold, the MCS and/or Nlayers are increased, whereas if β is smaller than a heuristic threshold, the MCS and/or Nlayers are decreased.
Fig. 1. Architecture of the proposed QLA-based RAN.

When K is small, the output MCSs and Nlayers are more closely tied to the instantaneous CSI report, causing larger fluctuations in user throughput. When K is large, the output MCSs and Nlayers rely more on past information, resulting in smoother but more conservative user throughput. Also, as mentioned before, the number of heuristic parameters such as K and the thresholds can grow massively when more scenarios need to be covered by the system.
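To make the filtering behaviour above concrete, the following minimal Python sketch computes a β-like metric from the past K HARQ and rank reports and adjusts the MCS and number of layers against two heuristic thresholds. The particular form of β, the threshold values, the MCS range, and the helper names (compute_beta, ola_step) are illustrative assumptions of this sketch, not the specific OLA algorithms of [3], [4].

```python
import numpy as np

# Assumed form of beta: average ACK rate weighted by the rank used, normalized
# by the maximum rank. Only the thresholding structure mirrors the description above.
def compute_beta(ack_hist, rank_hist, max_rank=4):
    ack = np.asarray(ack_hist, dtype=float)    # a_K in {0, 1}^K (1 = ACK)
    rank = np.asarray(rank_hist, dtype=float)  # r_K in {1, 2, 3, 4}^K
    return float(np.mean(ack * rank) / max_rank)

def ola_step(mcs, n_layers, ack_hist, rank_hist,
             beta_up=0.6, beta_down=0.3,       # heuristic thresholds (assumed values)
             max_mcs=14, max_layers=4):
    """One OLA decision: raise MCS/rank when beta is high, lower them when it is low."""
    beta = compute_beta(ack_hist, rank_hist, max_layers)
    if beta > beta_up:
        # Link looks good: push MCS first, then the number of layers.
        if mcs < max_mcs:
            mcs += 1
        elif n_layers < max_layers:
            n_layers += 1
    elif beta < beta_down:
        # Link looks poor: back off MCS first, then the number of layers.
        if mcs > 0:
            mcs -= 1
        elif n_layers > 1:
            n_layers -= 1
    return mcs, n_layers

# Example with K = 8 past transmissions.
mcs, n_layers = ola_step(mcs=7, n_layers=2,
                         ack_hist=[1, 1, 1, 0, 1, 1, 1, 1],
                         rank_hist=[2, 2, 2, 2, 2, 2, 2, 2])
```

In this structure, K and the two thresholds are exactly the kind of heuristically tuned parameters that the proposed Q-learning framework aims to remove.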
TABLE I
SETTINGS OF THE NN GRAPH

Input layer neurons       128
Output layer neurons      (Nlayers − 1) × 15 × 15 + 15
Hidden layer neurons      twice the output layer neurons
Optimizer                 AdamOptimizer [12]

1:  Q(s, a) = 0, ∀s, a;  π(s, a) = 1/|A(s)|, ∀a;
2:  for t = 1, 2, 3, . . . do
3:      BS chooses action a according to π(s);
4:      BS assigns the MCSs and Nlayers corresponding to a and transmits data packets to the UE;
5:      UE feeds back ACK/NACK and CSI to the BS, forming the new state s′;
6:      BS calculates the reward r and computes a′ = π(s′);
7:      BS updates the state-action function:
            Q(s, a) = Q(s, a) + α (r + γ Q(s′, a′) − Q(s, a));
8:      BS updates the policy π(s) = argmax_u Q(s, u);
9:      s = s′, a = a′;
10: end for

Fig. 3. Pseudo code of QLA.
10: end for
this paper is defined by the root mean squared logarithmic
error (RMSLE), which considers the relative error instead
Fig. 3. Pseudo codes of QLA. of absolute error values. The RMSLE can be expressed as
RMSLE = sqrt( (1/N) Σ_{i=1}^{N} ( log(Q_i^predict + 1) − log(Q_i^target + 1) )^2 ),

where Q_i^predict is the ith element of the NN output vector and Q_i^target is the ith element of the target vector.
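The NN settings of Table I and the RMSLE objective can be combined into a single training step, as in the PyTorch sketch below. The layer sizes follow Table I, presumably with one output Q-value per (MCS, rank) action; the choice of ReLU activations, the learning rate, the assumption of a maximum of two layers (giving 240 outputs), and the +1 shift and clamping inside the logarithm are implementation assumptions of this sketch, not taken from the paper.

```python
import torch
import torch.nn as nn

# Layer sizes follow Table I; MAX_LAYERS = 2 is an assumption (ranks 1-2, 15 MCS levels
# per codeword), giving (2 - 1) * 15 * 15 + 15 = 240 output Q-values, one per action.
N_IN = 128
MAX_LAYERS = 2
N_OUT = (MAX_LAYERS - 1) * 15 * 15 + 15
N_HIDDEN = 2 * N_OUT                      # "twice the output layer neurons"

q_net = nn.Sequential(
    nn.Linear(N_IN, N_HIDDEN),
    nn.ReLU(),
    nn.Linear(N_HIDDEN, N_OUT),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)   # Adam optimizer, as in Table I [12]

def rmsle(pred, target):
    """Root mean squared logarithmic error; clamping and the +1 shift are sketch choices."""
    pred = torch.clamp(pred, min=0.0)
    target = torch.clamp(target, min=0.0)
    return torch.sqrt(torch.mean((torch.log1p(pred) - torch.log1p(target)) ** 2))

def train_step(states, q_targets):
    """One supervised fit of the Q-network toward bootstrapped targets such as r + gamma * Q(s', a')."""
    optimizer.zero_grad()
    q_pred = q_net(states)                # shape: (batch, N_OUT)
    loss = rmsle(q_pred, q_targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the TBS-based targets of single-layer low-MCS actions and multi-layer high-MCS actions differ by roughly two orders of magnitude, the logarithm makes the fit relative rather than absolute, which is the motivation for RMSLE given above.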
Fig. 5. Comparison of MCS1 between QLA and OLA.

Fig. 7. RMSLE with respect to the number of iterations.
TABLE II
PMFS OF NUMBERS OF LAYERS IN OLA AND QLA

                      Rank 1    Rank 2
Rank PMF     OLA        89%       11%
             QLA         1%       99%
ACK PMF      OLA        57%       42%
             QLA        70%       85%
TABLE III
USER THROUGHPUT GAIN OF QLA OVER OLA

                        Nt = 8, Nr = 2    Nt = 8, Nr = 4
5% throughput                1%                3%
Median throughput           13%               12%
95% throughput               5%                5%
Mean throughput              6%                8%

Fig. 9. Histogram of MCSs in OLA and QLA.