Hybrid Precoding For Multi-User Millimeter Wave Massive MIMO Systems: A Deep Learning Approach
arXiv:1911.04239v1 [eess.SP] 11 Nov 2019

Abstract—In multi-user millimeter wave (mmWave) multiple-input-multiple-output (MIMO) systems, hybrid precoding is a crucial task for lowering complexity and cost while achieving a sufficient sum-rate. Previous works on hybrid precoding were usually based on optimization or greedy approaches. These methods either have high complexity or achieve sub-optimal performance. Moreover, their performance mostly relies on the quality of the channel data. In this work, we propose a deep learning (DL) framework that improves performance and requires less computation time than conventional techniques. In particular, we design a convolutional neural network for MIMO (CNN-MIMO) that accepts an imperfect channel matrix as input and yields the analog precoder and combiners at the output. The procedure includes two main stages. First, we develop an exhaustive search algorithm that selects the analog precoder and combiners from a predefined codebook so as to maximize the achievable sum-rate. Then, the selected precoder and combiners are used as output labels in the training stage of CNN-MIMO, where the input-output pairs are obtained. We evaluate the performance of the proposed method through extensive simulations and show that the proposed DL framework outperforms conventional techniques. Overall, CNN-MIMO provides a robust hybrid precoding scheme in the presence of imperfections in the channel matrix. On top of this, the proposed approach requires less computation time than the optimization- and codebook-based approaches.

Index Terms—Hybrid precoding, mmWave systems, multi-user MIMO transmission, deep learning, convolutional neural networks.

I. INTRODUCTION

Millimeter wave (mmWave) communication systems provide higher data rates and wider bandwidths at high frequencies (in the range of 30–300 GHz) [1]. Consequently, mmWave has become a leading candidate for fifth-generation (5G) wireless networks [2]. However, in mmWave bands the propagation loss is higher than in conventional systems operating at lower frequencies [1], [2]. To overcome the high propagation path loss and to provide beamforming power gain, massive numbers of antennas are used at both the transmitter and receiver sides. This yields a massive multiple-input-multiple-output (MIMO) structure that enhances the signal-to-noise ratio (SNR) of the received signal [3].

Signal processing in conventional systems with frequencies lower than 3 GHz is performed digitally, where both the amplitudes and the phases are processed in the baseband. For this reason, dedicated radio-frequency (RF) hardware is required for each antenna element [4]. Unfortunately, for mmWave MIMO systems implemented with a large number of antennas, fully digital processing is not cost-efficient, since it incurs high hardware cost and significant complexity. To reduce the cost while providing sufficient performance, hybrid precoding architectures have been proposed, in which the signal is processed by both analog and digital precoders [5]–[8]. In the analog processing part of hybrid systems, phase shifters with constant modulus are usually employed. The role of the phase shifters is to introduce discrete phases to the transmitted/received signal in order to steer the beam and thus increase the gain [8].

In recent years, several techniques have been proposed to design hybrid precoders in mmWave MIMO systems. In particular, initial works focused on the single-user scenario [6], in which the user is assumed to be equipped with multiple antennas. The single-user case constitutes the baseline for multi-user systems, which are of practical interest. In such systems, the interference from other users must be taken into account when designing the precoders [7]–[10]. In [8], the performance of low-resolution analog-to-digital converters (ADCs) is investigated when a single RF chain is used at the mobile users. In [9], simultaneous channel estimation is considered for multi-user systems, while in [10], antenna selection in mmWave MIMO is considered together with hybrid precoding estimation. The authors in [7] also consider the multi-user scenario, but the hybrid precoders are obtained by a greedy-like approach as in [6], where a simultaneous orthogonal matching pursuit (SOMP) algorithm is proposed. It is worth mentioning that all of the above methods assume perfect channel state information. They also assume the availability of the array response sets, namely $\mathcal{F}$ and $\mathcal{W}$ for the precoder and combiner design, respectively. These sets are composed of the transmit and receive steering vectors corresponding to the directions of arrival/departure (DOAs/DODs) of the user locations. These array responses are directly related to the singular value matrix of the channel through a linear transformation, so they are the best candidates for the precoder design problem [5]–[7].

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

A. M. Elbir is with the Department of Electrical and Electronics Engineering, Duzce University, Duzce, Turkey (e-mail: [email protected]).

A. Papazafeiropoulos is with the Communications and Intelligent Systems Research Group, University of Hertfordshire, Hatfield AL10 9AB, U.K., and also with SnT (http://www.securityandtrust.lu), University of Luxembourg, L-1855 Luxembourg City, Luxembourg (e-mail: [email protected]).
As a class of machine learning techniques, DL has recently gained much interest for solving many challenging problems such as speech recognition, visual object recognition, and language processing [11], [12]. DL has several advantages. These include low computational complexity when solving optimization-based or combinatorial search problems, and the ability to extrapolate new features from a limited set of features contained in a training set [11]. Very recently, DL-based techniques have received a great deal of attention in radar [13]. They have also been applied to fundamental communication theory topics [14]–[24] such as channel estimation [16], DOA estimation [17], and analog beam selection [18]. In the physical layer of wireless communications in particular, DL has been applied to signal detection [19], channel estimation [21], [25], [26], and dynamic multi-channel access problems [20]. In this direction, an end-to-end communication scenario is modeled in [21] and [22] by using auto-encoders, where single-input-single-output (SISO) systems are considered. The authors in [23] have also used auto-encoders for the channel state information (CSI) feedback problem. Interestingly, [24] studies physical layer structures without channel models via DL.

An interesting topic is the investigation of the hybrid precoding problem in the context of DL [27]–[31]. Inspired by dense fully connected layers, deep multilayer perceptrons (MLPs) have been proposed in [27]–[29]. Specifically, in [27] and [28], an MLP has been employed only for the precoder design and only for the single-user scenario. In [29], an MLP architecture is considered for coordinated beam training, where perfect CSI is assumed to be known. Moreover, in [30], a convolutional neural network (CNN)-based approach has been proposed for the joint precoder and combiner design problem, but again for the single-user setting. Also, in [31], quantized and unquantized CNNs have been used for hybrid precoding in a single-user MIMO system. The performance of DL-based approaches such as [27]–[29] strongly relies on the perfectness of the channel matrix. In contrast, [30] and [31] propose DL approaches that are robust against imperfections in the channel data, but these works are developed only for the single-user scenario.

A. Motivation

Although there are optimization-based approaches that directly estimate the precoders, they suffer from large computational complexity and from local-minimum problems due to random initialization [32]. Also, the design of hybrid precoders for the common multi-user MIMO scenario, which is of high practical importance, has not been considered in the context of DL. Thus, driven by the advantages of DL such as its low computational complexity, we develop a method for hybrid precoding design in multi-user MIMO transmission in the mmWave band. The method is designed for the case where only corrupted channel feedback data is available.

B. Contribution

In this paper, we propose a CNN-based DL framework for mmWave hybrid precoding design, henceforth called CNN-MIMO. In our DL framework, the channel matrix of the users is selected as the input of CNN-MIMO, and the output labels are selected as the hybrid precoder weights. The training stage is an offline process (please see Fig. 2). In it, we generate several channel realizations of multiple users and obtain the corresponding hybrid precoders via an exhaustive search algorithm. This process requires knowledge of the feasible sets of array responses $\mathcal{F}$, $\mathcal{W}$, which are not used in the prediction stage. Once the network is trained, CNN-MIMO is used to predict the hybrid precoders by simply feeding the network with the channel matrix of the users. The proposed DL framework provides a nonlinear mapping between the channel matrix and the hybrid beamformers. Hence, the proposed method achieves more robust performance than the competing algorithms. The deep network can handle imperfections and corruptions in the input channel data, whereas the other algorithms do not have such a capability. The proposed approach also has superior sum-rate performance because it uses the "best" hybrid beamformers, which are obtained via an exhaustive search in the training process. The main contributions of this work are as follows.
• A DL-based approach is proposed for hybrid precoding in multi-user massive MIMO mmWave systems. We leverage DL to estimate the precoder and combiner weights so that CNN-MIMO is more robust against deviations in the channel matrix. Hence, the proposed DL framework has superior performance compared with the conventional greedy and codebook-based techniques [6]–[8], whose performance strongly relies on the quality of the channel.
• In most previous works, such as [6], [7], the codebooks formed by the feasible sets of array responses $\mathcal{F}$ and $\mathcal{W}$ are assumed to be known. The analog precoding design problem then reduces to selecting the best candidates in $\mathcal{F}$ and $\mathcal{W}$ to maximize the sum-rate. In this work, we only need $\mathcal{F}$ and $\mathcal{W}$ in the training stage to obtain the network labels. The proposed DL technique does not require such information in the prediction stage, where the DL network itself obtains the analog precoder weights by learning the features hidden in the input data.
• To train the network, a very large training dataset (almost half a million samples) is generated. Hence, robust performance is achieved against imperfect channels and deviations in the channel data.
• The proposed approach also requires less computation time for hybrid precoding design. The conventional techniques require an optimization process or greedy searches. In contrast, our CNN approach estimates the precoders by simply feeding the network with the corrupted channel matrix.

C. Notation

Vectors and matrices are denoted by boldface lower- and upper-case symbols, respectively. For a vector $\mathbf{a}$, $[\mathbf{a}]_i$ represents its $i$th element. For a matrix $\mathbf{A}$, $[\mathbf{A}]_{:,i}$ and $[\mathbf{A}]_{i,j}$ denote the $i$th column and the $(i,j)$th entry, respectively. $\mathbf{I}_N$ is the identity matrix of size $N \times N$, $E\{\cdot\}$ denotes the
III. PROBLEM FORMULATION

The principal aim of this work is to design the hybrid precoder and combiners $\mathbf{F}_{\mathrm{BB}}$, $\mathbf{F}_{\mathrm{RF}}$ and $\{\mathbf{w}_{\mathrm{RF}_k}\}_{k=1}^K$ by maximizing the sum-rate in the presence of imperfect channel data. Specifically, we first develop an algorithm to compute the hybrid precoders that maximize the sum-rate. We then design a deep network that predicts the hybrid precoders when fed with imperfect CSI.

In a nutshell, the proposed DL framework provides a nonlinear mapping from the channel matrix $\mathbf{H}$ to the analog beamformers $\mathbf{F}_{\mathrm{RF}}$ and $\{\mathbf{w}_{\mathrm{RF}_k}\}_{k=1}^K$. The label generation process depends on the channel model. However, the channel model is not required for updating the network parameters in the training stage. Hence, CNN-MIMO can also be used for various channel models in mmWave systems [37]. Our main focus is hybrid beamforming, so in this work we use the block-fading channel model because of the simple structure of its channel matrix and rate computation [25], [26], [38]. The application of DL to other channel models is the topic of ongoing research.

Estimating the channel matrix of the users is a challenging task, especially with the large number of antennas found in massive MIMO systems [39], [40]. In addition, the coherence time of the channel is very short in the mmWave massive MIMO scenario. As a result, the parameters related to the channel characteristics change greatly within a short time [41]. To obtain robust precoding performance, we feed the deep network with several channel realizations corrupted by synthetic noise during the training stage, which is an offline process. Hence, in the testing stage, when the network predicts the precoder weights, it does not necessarily require perfect CSI [30]. We show through simulations that the proposed approach can handle a corrupted channel matrix and exhibits satisfactory performance in terms of the achievable sum-rate.

The main stages of the proposed DL framework are label generation, training, and prediction. In the following section, we first discuss how the labels are obtained from the channel data. Then, in Section V, we present the details of the training and prediction stages.

IV. HYBRID PRECODING DESIGN IN MULTI-USER MIMO SYSTEMS

In order to design the network and the training data, we first need to solve the hybrid precoding problem and obtain the labels of the training data samples. For this reason, we develop an exhaustive search algorithm that visits all precoder and combiner combinations in the feasible sets $\mathcal{F}$ and $\mathcal{W}$ to find the one that maximizes the sum-rate in (6). We then solve the exhaustive search problem offline to obtain the training data inputs and labels. The advantage of a DL approach is that it reduces the computation time of the hybrid precoding design problem while attaining the near-optimum performance of an exhaustive search.

We start by formulating the optimization problem for hybrid precoding in the multi-user scenario as

$$
\begin{aligned}
\{\hat{\mathbf{F}}_{\mathrm{BB}}, \hat{\mathbf{F}}_{\mathrm{RF}}, \hat{\mathbf{W}}_{\mathrm{RF}}\} = \operatorname*{argmax}_{\mathbf{F}_{\mathrm{BB}}, \mathbf{F}_{\mathrm{RF}}, \mathbf{W}_{\mathrm{RF}}} \; & \bar{R} \\
\text{subject to: } & \mathbf{F}_{\mathrm{RF}} \in \mathcal{F}, \; \mathbf{W}_{\mathrm{RF}} \in \mathcal{W}, \\
& \|\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\|_F^2 = K,
\end{aligned} \tag{7}
$$

where $\mathbf{W}_{\mathrm{RF}} = [\mathbf{w}_{\mathrm{RF}_1}, \mathbf{w}_{\mathrm{RF}_2}, \ldots, \mathbf{w}_{\mathrm{RF}_K}]$ denotes the analog combiner of all users. Here, $\mathcal{F}$ and $\mathcal{W}$ are the feasible sets of the precoder and combiners. In practice, $\mathcal{F}$ and $\mathcal{W}$ are composed of the steering vectors $\mathbf{a}_T(\Theta_T^{(l,k)})$ and $\mathbf{a}_R(\Theta_R^{(l,k)})$, $\forall l,k$, respectively, with quantized phases. Specifically, the array response sets are selected as

$$\mathcal{F} = \{Q(\mathbf{a}_T(\Theta_T^{(1,1)})), \ldots, Q(\mathbf{a}_T(\Theta_T^{(L,K)}))\}, \tag{8}$$

and

$$\mathcal{W} = \{Q(\mathbf{a}_R(\Theta_R^{(1,1)})), \ldots, Q(\mathbf{a}_R(\Theta_R^{(L,K)}))\}, \tag{9}$$

where $Q(\cdot)$ denotes the phase quantization operator as mentioned before.

In the exhaustive search algorithm, we wish to visit all possible combinations of the elements in the feasible sets $\mathcal{F}$ and $\mathcal{W}$ to achieve near-optimum performance. For this reason, we design new feasible sets $\mathcal{F}$ and $\mathcal{W}$ that include all precoder and combiner combinations. The search algorithm visits all the nodes in the direction set

$$\mathcal{D} = \left[0, \frac{2\pi}{\bar{L}}, \frac{4\pi}{\bar{L}}, \ldots, \frac{(\bar{L}-1)2\pi}{\bar{L}}\right], \tag{10}$$

where $|\mathcal{D}| = \bar{L}$. Assuming that the BS receives $\bar{L}$ paths from each user, the $k$th column of $\mathbf{F}_{\mathrm{RF}}$ can take $\bar{L}$ different values, i.e., $\{Q(\mathbf{a}_T(\Theta_T^{(l,k)}))\}_{l=1}^{\bar{L}}$. Generalizing this to all users gives $Q_{\mathcal{F}} = \bar{L}^K$ possible candidates for $\mathbf{F}_{\mathrm{RF}}$. Thus, we define a new set as

$$\mathcal{F} = \{\mathbf{F}_1, \mathbf{F}_2, \ldots, \mathbf{F}_{Q_{\mathcal{F}}}\}, \tag{11}$$

where $\mathbf{F}_{q_{\mathcal{F}}} \in \mathbb{C}^{N_T \times K}$ is given by

$$\mathbf{F}_{q_{\mathcal{F}}} = [Q(\mathbf{a}_T(\Theta_T^{(l_1,1)})), Q(\mathbf{a}_T(\Theta_T^{(l_2,2)})), \ldots, Q(\mathbf{a}_T(\Theta_T^{(l_K,K)}))],$$

with the indices for each user given by $l_1, l_2, \ldots, l_K = 1, \ldots, \bar{L}$. Hence, $q_{\mathcal{F}} = 1, \ldots, \bar{L}^K$ indexes the precoder candidates for $K$ users. In a similar way, the set of analog combiners is defined as $\mathcal{W} = \{\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_{Q_{\mathcal{W}}}\}$, where $\mathbf{W}_{q_{\mathcal{W}}} \in \mathbb{C}^{N_R \times K}$ is given by

$$\mathbf{W}_{q_{\mathcal{W}}} = [Q(\mathbf{a}_R(\Theta_R^{(l_1,1)})), Q(\mathbf{a}_R(\Theta_R^{(l_2,2)})), \ldots, Q(\mathbf{a}_R(\Theta_R^{(l_K,K)}))],$$

with $\mathbf{w}_{\mathrm{RF}_k}$ selected as the $k$th column of $\mathbf{W}_{q_{\mathcal{W}}}$, i.e., $Q(\mathbf{a}_R(\Theta_R^{(l_k,k)}))$. Once the analog precoders are selected from the sets $\mathcal{F}$ and $\mathcal{W}$, the effective channel $\mathbf{H}^{\mathrm{eff}}_{q_{\mathcal{F}},q_{\mathcal{W}}} \in \mathbb{C}^{K \times N_T^{\mathrm{RF}}}$ is given by

$$\mathbf{H}^{\mathrm{eff}}_{q_{\mathcal{F}},q_{\mathcal{W}}} = \begin{bmatrix} \mathbf{h}^{\mathrm{eff}}_{q_{\mathcal{F}},q_{\mathcal{W}},1} \\ \mathbf{h}^{\mathrm{eff}}_{q_{\mathcal{F}},q_{\mathcal{W}},2} \\ \vdots \\ \mathbf{h}^{\mathrm{eff}}_{q_{\mathcal{F}},q_{\mathcal{W}},K} \end{bmatrix}. \tag{12}$$
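The construction of the candidate sets in (8)–(11) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The array geometry is defined in a part of the paper not reproduced here, so a half-wavelength uniform linear array (ULA) steering vector driven by a single angle is assumed. The phase quantizer $Q(\cdot)$ is assumed to round each entry's phase to the nearest of $2^B$ uniformly spaced levels while keeping a constant modulus. The helper names `ula_steering`, `quantize_phase` and `candidate_precoders` are hypothetical. Every user is given the same direction grid $\mathcal{D}$ from (10), so the generator yields $\bar{L}^K$ candidate matrices $\mathbf{F}_{q_{\mathcal{F}}}$.

```python
import itertools
import numpy as np


def ula_steering(n_ant: int, angle: float) -> np.ndarray:
    """Unit-norm steering vector of an assumed half-wavelength ULA."""
    n = np.arange(n_ant)
    return np.exp(1j * np.pi * n * np.sin(angle)) / np.sqrt(n_ant)


def quantize_phase(v: np.ndarray, bits: int) -> np.ndarray:
    """Assumed form of Q(.): snap every phase to the nearest of 2**bits
    uniformly spaced levels and keep the constant modulus 1/sqrt(N)."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    phase = np.mod(np.angle(v), 2 * np.pi)        # phases in [0, 2*pi)
    idx = np.mod(np.round(phase / step), levels)  # wrap a rounded 2*pi back to 0
    return np.exp(1j * idx * step) / np.sqrt(v.size)


def candidate_precoders(n_ant, per_user_angles, bits):
    """Yield every F_q = [Q(a(theta_{l_1,1})), ..., Q(a(theta_{l_K,K}))].

    per_user_angles[k] holds the L-bar candidate directions of user k, so
    the generator produces L-bar**K matrices of shape (n_ant, K).
    """
    columns = [
        [quantize_phase(ula_steering(n_ant, a), bits) for a in angles]
        for angles in per_user_angles
    ]
    for combo in itertools.product(*columns):
        yield np.stack(combo, axis=1)


# Direction grid D from (10) with L-bar = 4, shared by K = 3 users.
L_bar, K, N_T, B = 4, 3, 8, 3
D = 2 * np.pi * np.arange(L_bar) / L_bar
F_set = list(candidate_precoders(N_T, [D] * K, bits=B))
print(len(F_set), F_set[0].shape)  # 64 (8, 3)
```

The same generator with receive-side steering vectors would produce the combiner candidates $\mathbf{W}_{q_{\mathcal{W}}}$. The count grows as $\bar{L}^K$. With the azimuth grid $\bar{L} = 60$ and $K = 3$ used later in the paper, this gives $60^3 = 216000$ precoder candidates, which is why the search is run only offline.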
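Here, the effective channel of each user is calculated as

$$\mathbf{h}^{\mathrm{eff}}_{q_{\mathcal{F}},q_{\mathcal{W}},k} = [\mathbf{W}_{q_{\mathcal{W}}}]_{:,k}^H \mathbf{H}_k \mathbf{F}_{q_{\mathcal{F}}}. \tag{13}$$

The baseband precoder is given by $\mathbf{F}_{\mathrm{BB},q_{\mathcal{F}},q_{\mathcal{W}}} = (\mathbf{H}^{\mathrm{eff}}_{q_{\mathcal{F}},q_{\mathcal{W}}})^{\dagger}$. Each of its columns is normalized as $\mathbf{f}^{(q_{\mathcal{F}},q_{\mathcal{W}})}_{\mathrm{BB}_k} = \mathbf{f}^{(q_{\mathcal{F}},q_{\mathcal{W}})}_{\mathrm{BB}_k} / \|\mathbf{F}_{q_{\mathcal{F}}}\mathbf{f}^{(q_{\mathcal{F}},q_{\mathcal{W}})}_{\mathrm{BB}_k}\|_F$ [7]. Thus, the achievable sum-rate can be written as

$$\bar{R}_{q_{\mathcal{F}},q_{\mathcal{W}}} = \log_2 \left| \mathbf{I}_K + \frac{P}{K\sigma^2} \mathbf{H}^{\mathrm{eff}}_{q_{\mathcal{F}},q_{\mathcal{W}}} \mathbf{F}_{\mathrm{BB},q_{\mathcal{F}},q_{\mathcal{W}}} \mathbf{F}^H_{\mathrm{BB},q_{\mathcal{F}},q_{\mathcal{W}}} (\mathbf{H}^{\mathrm{eff}}_{q_{\mathcal{F}},q_{\mathcal{W}}})^H \right|. \tag{14}$$

Using the sets $\mathcal{F}$ and $\mathcal{W}$, the optimization problem in (7) can be rewritten as

$$
\begin{aligned}
\{\bar{q}_{\mathcal{F}}, \bar{q}_{\mathcal{W}}\} = \operatorname*{argmax}_{q_{\mathcal{F}}, q_{\mathcal{W}}} \; & \bar{R}_{q_{\mathcal{F}},q_{\mathcal{W}}} \\
\text{subject to: } & \mathbf{F}_{\mathrm{RF}} = \mathbf{F}_{q_{\mathcal{F}}}, \; \mathbf{w}_{\mathrm{RF}_k} = [\mathbf{W}_{q_{\mathcal{W}}}]_{:,k}, \\
& \mathbf{h}^{\mathrm{eff}}_k = \mathbf{w}^H_{\mathrm{RF}_k} \mathbf{H}_k \mathbf{F}_{\mathrm{RF}}, \\
& \mathbf{F}_{\mathrm{BB}} = (\mathbf{H}^{\mathrm{eff}})^{\dagger}, \\
& \mathbf{f}_{\mathrm{BB}_k} = \mathbf{f}_{\mathrm{BB}_k} / \|\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_k}\|_F,
\end{aligned} \tag{15}
$$

where $\bar{q}_{\mathcal{F}}$ and $\bar{q}_{\mathcal{W}}$ denote the indices providing the maximum sum-rate. We summarize the algorithmic steps of the proposed approach in Algorithm 1. Note that the proposed hybrid precoding optimization in (15) differs from the one in [7]. In [7], not all possible combinations of the analog precoders are considered, whereas this work considers them all. In Section VI, we show that (15) yields better results than [7]. The problem in (15) requires visiting $Q_{\mathcal{F}}Q_{\mathcal{W}}$ nodes to estimate the hybrid precoders. To reduce the complexity and the need for the array responses, the following section proposes a DL-based approach. There, we elaborate on the details of the training data generation and the network architecture.

Algorithm 1 Hybrid precoding for multi-user MIMO
Input: $\{\mathbf{H}_k\}_{k=1}^K$, $\mathcal{F}$, $\mathcal{W}$, $\mathcal{D}$.
Output: $\hat{\mathbf{F}}_{\mathrm{RF}}$, $\hat{\mathbf{W}}_{\mathrm{RF}}$.
1: for $1 \le q_{\mathcal{F}} \le Q_{\mathcal{F}}$ do
2:   $\mathbf{F}_{\mathrm{RF}} = \mathbf{F}_{q_{\mathcal{F}}}$,
3:   for $1 \le q_{\mathcal{W}} \le Q_{\mathcal{W}}$ do
4:     $\mathbf{w}_{\mathrm{RF}_k} = [\mathbf{W}_{q_{\mathcal{W}}}]_{:,k}$,
5:     $\mathbf{h}^{\mathrm{eff}}_k = \mathbf{w}^H_{\mathrm{RF}_k}\mathbf{H}_k\mathbf{F}_{\mathrm{RF}}$,
6:     $\mathbf{F}_{\mathrm{BB}} = (\mathbf{H}^{\mathrm{eff}})^{\dagger}$,
7:     $\mathbf{f}_{\mathrm{BB}_k} = \mathbf{f}_{\mathrm{BB}_k}/\|\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_k}\|_F$,
8:     Compute $\bar{R}_{q_{\mathcal{F}},q_{\mathcal{W}}}$ as in (14).
9:   end for $q_{\mathcal{W}}$,
10: end for $q_{\mathcal{F}}$,
11: $\{\bar{q}_{\mathcal{F}}, \bar{q}_{\mathcal{W}}\} = \operatorname{argmax}_{q_{\mathcal{F}},q_{\mathcal{W}}} \bar{R}_{q_{\mathcal{F}},q_{\mathcal{W}}}$.
12: $\hat{\mathbf{F}}_{\mathrm{RF}} = \mathbf{F}_{\bar{q}_{\mathcal{F}}}$ and $\hat{\mathbf{W}}_{\mathrm{RF}} = \mathbf{W}_{\bar{q}_{\mathcal{W}}}$.

V. LEARNING-BASED HYBRID PRECODING

In this part, we present our DL framework for hybrid precoding design. The proposed network architecture is illustrated in Fig. 2. The CNN-MIMO architecture consists of ten layers. It accepts input data of size $N_R \times N_T \times 3$ and yields a $K(N_T + N_R) \times 1$ vector at the output. The overall network architecture of CNN-MIMO can be represented by the function $\Pi(\cdot): \mathbb{R}^{N_R \times N_T \times 3} \to \mathbb{R}^{K(N_R+N_T)}$. Denoting the arithmetic operation of the $i$th layer in the network by $f^{(i)}(\cdot)$, the overall network can be represented as

$$\Pi(\mathbf{X}) = f^{(10)}\left(f^{(9)}\left(\cdots f^{(1)}(\mathbf{X}) \cdots\right)\right) = \mathbf{z}, \tag{16}$$

where each layer performs a specific task. In the sequel, we explicitly show the arithmetic operations of the fully connected and convolutional layers.

Let $\bar{\mathbf{W}} \in \mathbb{R}^{C_x \times C_y}$ be the weights of a fully connected layer in the network with input $\bar{\mathbf{x}} \in \mathbb{R}^{C_x}$ and output $\bar{\mathbf{y}} \in \mathbb{R}^{C_y}$. The $c_y$th element of the output of the layer is given by the inner product

$$\bar{y}_{c_y} = \langle \bar{\mathbf{W}}_{c_y}, \bar{\mathbf{x}} \rangle = \sum_i [\bar{\mathbf{W}}^T]_{c_y,i} \, \bar{x}_i, \tag{17}$$

for $c_y = 1, \ldots, C_y$, where $\bar{\mathbf{W}}_{c_y}$ is the $c_y$th column vector of $\bar{\mathbf{W}}$. For a convolutional layer, define $\bar{\mathbf{X}} \in \mathbb{R}^{d_x \times d_x \times C_x}$ and $\bar{\mathbf{Y}} \in \mathbb{R}^{d_y \times d_y \times C_y}$ as the input feature maps and the output of the layer, respectively. Let us also define $d_x \times d_y$ as the size of the convolutional kernel, and $C_x \times C_y$ as the size of the response of the convolutional layer for each feature map. Then, the response of a convolutional layer becomes

$$\bar{Y}_{p_y,c_y} = \sum_{p_k,p_x} \langle \bar{\mathbf{W}}_{c_y,p_k}, \bar{\mathbf{X}}_{p_x} \rangle, \tag{18}$$

where $\bar{Y}_{p_y,c_y}$ is the response for the 2-D spatial region $p_y$ in the $c_y$th channel of the feature maps. Here, $\bar{\mathbf{W}}_{c_y,p_k} \in \mathbb{R}^{C_x}$ denotes the weights of the $c_y$th convolutional kernel, and $\bar{\mathbf{X}}_{p_x} \in \mathbb{R}^{C_x}$ is the input feature map at spatial position $p_x$. Hence, $p_x$ and $p_k$ denote the 2-D spatial positions in the feature maps and convolutional kernels, respectively [42].

A. Training Data Generation

In order to train the network, we prepare a training dataset from several channel realizations. We generate $N$ different channel realizations for $K$ users. Next, each of these channel matrices is corrupted by synthetic noise for $G$ realizations. The noise is added to each entry of the channel matrix. We define the SNR for the training data generation as $\mathrm{SNR}_{\mathrm{TRAIN}} = 20\log_{10}\left(\frac{|[\mathbf{H}_k^{(n,g)}]_{i,j}|^2}{\sigma^2_{\mathrm{TRAIN}}}\right)$, where $\sigma^2_{\mathrm{TRAIN}}$ is the variance of the synthetic noise. Note that $[\mathbf{H}_k^{(n,g)}]_{i,j}$ denotes the $(i,j)$th entry of the $k$th channel matrix for the $(n,g)$th realization with $n = 1, \ldots, N$ and $g = 1, \ldots, G$.

The input of the network consists of three channels. The first channel contains the absolute values of the entries of the channel matrix. The second and third channels contain the real and imaginary parts of the channel matrix, respectively. This approach provides good features for solving the problem [31]. Specifically, let $\mathbf{X} \in \mathbb{R}^{N_R \times N_T \times 3}$ be the input of the network. Then, for a channel matrix $\mathbf{H} \in \mathbb{C}^{N_R \times N_T}$, the first channel of the input is given by $[[\mathbf{X}]_{:,:,1}]_{i,j} = |[\mathbf{H}]_{i,j}|$. The second and third channels are given by $[[\mathbf{X}]_{:,:,2}]_{i,j} = \mathrm{Re}\{[\mathbf{H}]_{i,j}\}$ and $[[\mathbf{X}]_{:,:,3}]_{i,j} = \mathrm{Im}\{[\mathbf{H}]_{i,j}\}$, respectively.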
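The noise corruption (line 5 of Algorithm 2) and the three-channel input described above can be sketched as follows. This is a hedged NumPy sketch, not the authors' MATLAB code. It follows the text's definition $\mathrm{SNR}_{\mathrm{TRAIN}} = 20\log_{10}(|[\mathbf{H}]_{i,j}|^2/\sigma^2_{\mathrm{TRAIN}})$ literally, setting the noise variance entry by entry from each entry's power. It also assumes circularly-symmetric complex Gaussian noise split equally between the real and imaginary parts. The channel model itself is not reproduced in this section, so an i.i.d. Rayleigh matrix stands in for $\mathbf{H}$. The function names `corrupt_channel` and `channel_to_input` are hypothetical.

```python
import numpy as np


def corrupt_channel(H, snr_db, rng):
    """Draw [H_noisy]_ij ~ CN([H]_ij, sigma_ij^2), as in line 5 of Algorithm 2.

    sigma_ij^2 follows the text's definition SNR = 20*log10(|h_ij|^2 / sigma^2),
    applied entry by entry (an assumption about how the SNR is applied).
    """
    sigma2 = np.abs(H) ** 2 / 10 ** (snr_db / 20)
    noise = np.sqrt(sigma2 / 2) * (
        rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)
    )
    return H + noise


def channel_to_input(H):
    """Stack |H|, Re{H} and Im{H} into the N_R x N_T x 3 network input X."""
    return np.stack([np.abs(H), H.real, H.imag], axis=-1).astype(np.float32)


rng = np.random.default_rng(0)
N_R, N_T = 9, 36
# Placeholder i.i.d. Rayleigh channel for one user (assumption, not the paper's model).
H = (rng.standard_normal((N_R, N_T)) + 1j * rng.standard_normal((N_R, N_T))) / np.sqrt(2)

# Noisy copies of one user channel at each SNR_TRAIN level; G = 4 here for
# brevity, whereas the paper uses G = 100.
X_batch = [
    channel_to_input(corrupt_channel(H, snr, rng))
    for snr in (15, 20, 25)
    for _ in range(4)
]
print(len(X_batch), X_batch[0].shape)  # 12 (9, 36, 3)
```

Keeping the magnitude alongside the real and imaginary parts duplicates information. The text motivates this choice as providing good features for the network [31].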
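The labels in these training pairs come from the exhaustive search of Algorithm 1. The NumPy sketch below mirrors its steps. For every pair $(q_{\mathcal{F}}, q_{\mathcal{W}})$, it forms $\mathbf{H}^{\mathrm{eff}}$ as in (13) and takes the zero-forcing baseband precoder $\mathbf{F}_{\mathrm{BB}} = (\mathbf{H}^{\mathrm{eff}})^{\dagger}$. It then normalizes each column so that $\|\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_k}\| = 1$ and evaluates (14), reading $|\cdot|$ in (14) as the determinant. It is illustrative only. The function names, the linear SNR argument standing in for $P/\sigma^2$, and the random constant-modulus candidates that replace the codebooks $\mathcal{F}$ and $\mathcal{W}$ are assumptions.

```python
import numpy as np


def sum_rate(H_eff, F_bb, snr_linear):
    """Eq. (14): log2 det(I_K + P/(K sigma^2) H_eff F_BB F_BB^H H_eff^H)."""
    K = H_eff.shape[0]
    G = H_eff @ F_bb
    M = np.eye(K) + (snr_linear / K) * (G @ G.conj().T)
    _, logdet = np.linalg.slogdet(M)  # M is Hermitian positive definite
    return logdet / np.log(2)


def exhaustive_search(H, F_cands, W_cands, snr_linear):
    """Sketch of Algorithm 1.

    H: list of K user channels, each N_R x N_T.
    F_cands / W_cands: candidate N_T x K precoders and N_R x K combiners.
    Returns (best sum-rate, q_F index, q_W index).
    """
    K = len(H)
    best = (-np.inf, None, None)
    for q_f, F_rf in enumerate(F_cands):          # lines 1-2
        for q_w, W in enumerate(W_cands):         # lines 3-4
            # line 5 / eq. (13): row k of H_eff is w_k^H H_k F_RF
            H_eff = np.stack([W[:, k].conj() @ H[k] @ F_rf for k in range(K)])
            F_bb = np.linalg.pinv(H_eff)          # line 6: zero-forcing
            # line 7: scale column k so that ||F_RF f_BB_k|| = 1
            F_bb = F_bb / np.linalg.norm(F_rf @ F_bb, axis=0)
            rate = sum_rate(H_eff, F_bb, snr_linear)  # line 8
            if rate > best[0]:                    # line 11: running argmax
                best = (rate, q_f, q_w)
    return best


rng = np.random.default_rng(1)
K, N_T, N_R = 3, 16, 4
H = [(rng.standard_normal((N_R, N_T)) + 1j * rng.standard_normal((N_R, N_T))) / np.sqrt(2)
     for _ in range(K)]
# Random constant-modulus candidates stand in for the codebooks F and W.
F_cands = [np.exp(1j * rng.uniform(0, 2 * np.pi, (N_T, K))) / np.sqrt(N_T) for _ in range(8)]
W_cands = [np.exp(1j * rng.uniform(0, 2 * np.pi, (N_R, K))) / np.sqrt(N_R) for _ in range(8)]
rate, q_f, q_w = exhaustive_search(H, F_cands, W_cands, snr_linear=10.0)
print(f"best (q_F, q_W) = ({q_f}, {q_w}), sum-rate = {rate:.2f} bits/s/Hz")
```

The double loop makes the $Q_{\mathcal{F}}Q_{\mathcal{W}}$ cost of (15) explicit. This cost is paid once per training sample during label generation, not at prediction time.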
Fig. 2. (Top) The proposed network architecture. The input is the channel matrix of any user in the network and the output is the corresponding analog precoder and combiners. (Bottom) The diagram for the training and prediction stages of the proposed DL framework.

The output of the network is composed of the analog precoder and combiners. Let $\mathbf{z} \in \mathbb{R}^{N_T K + N_R K}$ be a real-valued vector. We design the output as

$$\mathbf{z} = [\angle\{\mathrm{vec}(\mathbf{F}_{\mathrm{RF}})^T\}, \angle\{\mathrm{vec}(\mathbf{W}_{\mathrm{RF}})^T\}]^T, \tag{19}$$

where $\mathbf{F}_{\mathrm{RF}} \in \mathbb{C}^{N_T \times K}$ and $\mathbf{W}_{\mathrm{RF}} \in \mathbb{C}^{N_R \times K}$. Hence, the input-output pair of the network is $(\mathbf{X}, \mathbf{z})$. We summarize the data generation process in Algorithm 2. The total number of inputs is $T = NGK$ for $K$ users. Note that the input data is composed of the channel information of each user, as in lines 7–12 of Algorithm 2. We record the analog precoder and combiner associated with each user channel. Note also that the same analog precoders are used for all noisy channel realizations. This introduces synthetic noise into the input dataset to make the network robust against corrupted channel data [13], [31].

Algorithm 2 Training data generation for CNN-MIMO.
Input: $N$, $G$, $K$, $\mathrm{SNR}_{\mathrm{TRAIN}}$.
Output: Training data $\mathcal{D}_{\mathrm{TRAIN}}$.
1: Generate $N$ different realizations of the multi-user MIMO scenario with channel matrices $\{\mathbf{H}_k^{(n)}\}_{n=1}^N$ and corresponding feasible sets $\{\mathcal{F}^{(n)}\}_{n=1}^N$, $\{\mathcal{W}^{(n)}\}_{n=1}^N$, $\forall k$.
2: Initialize $t = 1$; the dataset length is $T = NGK$.
3: for $1 \le n \le N$ do
4:   for $1 \le g \le G$ do
5:     $[\mathbf{H}_k^{(n,g)}]_{i,j} \sim \mathcal{CN}([\mathbf{H}_k^{(n)}]_{i,j}, \sigma^2_{\mathrm{TRAIN}})$.
6:     Using $\mathbf{H}_k^{(n,g)}$, $\mathcal{F}^{(n)}$, $\mathcal{W}^{(n)}$ in Algorithm 1, find $\hat{\mathbf{F}}_{\mathrm{RF}}^{(n,g)}$ and $\hat{\mathbf{W}}_{\mathrm{RF}}^{(n,g)}$ using $\bar{q}_{\mathcal{F}}^{(n,g)}$ and $\bar{q}_{\mathcal{W}}^{(n,g)}$.
7:     for $1 \le k \le K$ do
8:       $[[\mathbf{X}^{(t)}]_{:,:,1}]_{i,j} = |[\mathbf{H}_k^{(n,g)}]_{i,j}|$.
9:       $[[\mathbf{X}^{(t)}]_{:,:,2}]_{i,j} = \mathrm{Re}\{[\mathbf{H}_k^{(n,g)}]_{i,j}\}$.
10:      $[[\mathbf{X}^{(t)}]_{:,:,3}]_{i,j} = \mathrm{Im}\{[\mathbf{H}_k^{(n,g)}]_{i,j}\}$, $\forall i,j$.
11:      $\mathbf{z}^{(t)} = [\angle\{\mathrm{vec}(\hat{\mathbf{F}}_{\mathrm{RF}}^{(n,g)})^T\}, \angle\{\mathrm{vec}(\hat{\mathbf{W}}_{\mathrm{RF}}^{(n,g)})^T\}]^T$.
12:      Construct the input-output pair $(\mathbf{X}^{(t)}, \mathbf{z}^{(t)})$.
13:      $t = t + 1$.
14:    end for $k$,
15:  end for $g$,
16: end for $n$,
17: Training data for CNN-MIMO is obtained from the collection of the input-output pairs as $\mathcal{D}_{\mathrm{TRAIN}} = \{(\mathbf{X}^{(1)}, \mathbf{z}^{(1)}), (\mathbf{X}^{(2)}, \mathbf{z}^{(2)}), \ldots, (\mathbf{X}^{(T)}, \mathbf{z}^{(T)})\}$.

B. Network Architecture

The proposed network shown in Fig. 2 is composed of ten layers. The first layer is the input layer. It accepts channel matrix data of size $N_R \times N_T \times 3$, i.e., three "channels", each of size $N_R \times N_T$. The second and fourth layers are convolutional layers with 256 filters of size $2 \times 2$ that extract the features hidden in the input data. We feed the network with the real and imaginary parts of the channel data. These provide a large number of features [13], [30] that help the network learn the mapping from the input data to the corresponding label data. After each convolutional layer, there is a normalization layer that normalizes the output and provides better convergence. The sixth and eighth layers are fully connected layers with 2048 units each. There are dropout layers after the fully connected layers (the seventh and ninth layers) with a 50% probability. The dropout layers make the network less dependent on the initial weights. The output layer is a regression layer with $K(N_R + N_T)$ units, which contain the phase information of the analog precoders. To obtain network parameters such as the number of layers, the number of filters, and the kernel sizes, we conducted a hyperparameter tuning process to achieve sufficiently good network accuracy and sum-rate performance [11], [13], [30]. The current network architecture with a kernel size of $2 \times 2$ is one possible solution to the considered problem. Network structures with different kernels achieve similar or identical performance. In other words, although different kernel sizes can also be used for this problem, in this work we have first considered a hyperparameter tuning process providing the
sufficient performance for the considered scenario with less computational complexity [11], [13], [30].

The computational workload of a CNN results from the intensive use of arithmetic operations in its layers. Most of the operations occur in the convolutional parts of the network, and convolutional layers are responsible for more than 90% of the execution time during inference [43]. Conversely, most of the CNN weights are contained in the fully connected layers, which require approximately 90% of the memory due to their large number of weights [43]. Hence, the complexity of a CNN is directly proportional to the number of parameters and the number of layers. The layers of the proposed CNN structure are described above. The number of parameters can be calculated as $C^2\left(2N_{cv}(wh+1) + \frac{50}{100}\left([N_{fc1}+1] + [N_{fc2}+1]\right)\right)$ [43]. Here, $C = 3$ corresponds to the number of channels, $w = h = 2$ is the filter size, and $N_{cv} = 256$ is the number of filters in both convolutional layers. The variables $N_{fc1} = N_{fc2} = 2048$ describe the number of units in the fully connected layers, and the factor $50/100$ reflects the 50% dropout probability. Hence, the CNN-MIMO structure in Fig. 2 has $9(2560 + 2049) = 41481$ parameters.

C. Training

The CNN structure in Fig. 2 is implemented and trained in MATLAB on a PC with a single GPU and a 768-core processor. We used the stochastic gradient descent algorithm with momentum 0.9 [44]. The network parameters were updated with a learning rate of 0.005 and a mini-batch size of 500 samples for 100 epochs. As a loss function², we used the MSE given by $\mathcal{L} = \frac{1}{T}\sum_{t=1}^{T}\|\mathbf{z}^{(t)} - f(\mathbf{X}^{(t)})\|^2$. Here, $f(\mathbf{X})$ is a function of the input data $\mathbf{X}$ that represents the nonlinear transformation performed by the network [11].

To train the proposed CNN structure, $N = 500$ different multi-user scenarios are realized with $K = 3$ users (1500 channel realizations in total), as in Algorithm 2. For each channel matrix, AWGN is added at different levels $\mathrm{SNR}_{\mathrm{TRAIN}} \in \{15, 20, 25\}$ dB with $G = 100$ to account for different channel characteristics. The use of multiple $\mathrm{SNR}_{\mathrm{TRAIN}}$ levels provides a wide range of corrupted data in training, which improves the learning and robustness of the network. Hence, the total size of the training data is $N_R \times N_T \times 3 \times 450000$, where $450000 = 1500 \times 3 \times 100$ (channel realizations × $\mathrm{SNR}_{\mathrm{TRAIN}}$ levels × $G$). In the training process, 80% and 20% of all generated data are selected as the training and validation datasets, respectively. The validation data aids hyperparameter tuning during the training phase. It prevents the network from simply memorizing the training data instead of learning general features for accurate prediction on new data. The validation data is also used to test the performance of the network in the simulations over $J_T = 100$ Monte Carlo trials. To prevent similarity between the test data and the training data, we also add synthetic noise to the test data. The SNR during testing is defined similarly to $\mathrm{SNR}_{\mathrm{TRAIN}}$ as $\mathrm{SNR}_{\mathrm{TEST}} = 20\log_{10}\left(\frac{|[\mathbf{H}]_{i,j}|^2}{\sigma^2_{\mathrm{TEST}}}\right)$. The number of grid points is selected as $\bar{L} = 60$ for the azimuth and $\bar{L} = 20$ for the elevation angular sectors in Algorithm 1. In addition, the propagation environment is modeled with $L = 10$ paths from the users. All the user directions, i.e., all the azimuth and elevation angles, are selected uniformly at random from the intervals $\phi \in [-30^\circ, 30^\circ]$ and $\theta \in [-20^\circ, 20^\circ]$, respectively [6]. We use a sectorized angular range by setting the antenna gains $g_R(\Theta_R^{(l,k)})$ and $g_T(\Theta_T^{(l,k)})$ to unity within these angular ranges and to zero otherwise. The resulting sectorized angular interval increases the beamforming gain and reduces interference [5]. Hence, the training data includes a large number of scenarios in which the users are randomly located. For each scenario, the corresponding precoder and combiners are obtained by Algorithm 1.

The training stage takes about 5 hours for $T = 450000$ samples. This process includes both the labeling and the input data generation. Note that the training stage is performed only once. In the prediction stage, estimating the hybrid precoders then takes only milliseconds, as demonstrated in the simulations (please see Table I). Hence, the proposed approach, which provides high data rates and low latency, is quite attractive since it meets the 5G requirements.

The trained network can work for different parameters, such as the number of users¹ $K$, the number of paths $L$, $\mathrm{SNR}_{\mathrm{TEST}}$, and $\mathrm{SNR}_{\mathrm{TRAIN}}$, which motivates the practical implementation of the proposed DL framework. The proposed CNN structure must be retrained if there is a change in parameters such as $N_T$, $N_R$, or $N_T^{\mathrm{RF}}$, which directly dictate the input and output dimensions of the deep network. The performance of the network also depends on the angular interval selected in $\mathcal{D}$ when designing the feasible sets $\mathcal{F}$ and $\mathcal{W}$. It also depends on the antenna gains used to obtain the sectorized angular intervals.

¹When the network is trained for $K_{\mathrm{TRAIN}}$ users, the output size of the network is $\mathbf{z} \in \mathbb{R}^{N_T K_{\mathrm{TRAIN}} + N_R K_{\mathrm{TRAIN}}}$. The trained network can then be used for hybrid beamforming when there are $K \le K_{\mathrm{TRAIN}}$ users. This is done by taking the network output of size $(N_T K + N_R K) \times 1$ corresponding to those $K$ users.

D. Prediction

Once CNN-MIMO is trained offline as depicted in Fig. 2, it can be used to predict the hybrid beamformers. To generate the test data in the prediction stage, we pick users randomly from the validation data. Synthetic noise with $\mathrm{SNR}_{\mathrm{TEST}}$ is also added to the test data to eliminate the similarity between the test and training datasets. The corrupted channel data of each user is fed to the network, and the analog precoders are predicted from the output layer of the network. Their phases are then quantized in $[0, 2\pi]$ with $2^B$ discrete points. Specifically, the quantized phases belong to the set $\{\frac{2\pi b}{2^B}\}_{b=1}^{2^B}$. This allows the analog precoder and combiners to be realized in a hardware-efficient manner.

VI. NUMERICAL SIMULATIONS

In this section, we present the performance of the proposed method, CNN-MIMO, via several experiments. We train the network with the parameters described in Section V-C, i.e., $N = 500$, $K = 3$, $G = 100$, $\mathrm{SNR}_{\mathrm{TRAIN}} = \{15, 20, 25\}$ dB, learning rate 0.005, batch size 500, and 100 epochs. We compare the performance of CNN-MIMO with state-of-the-art hybrid precoding techniques such as the manifold optimization (MO) [45], the low-resolution hybrid beamforming
8
Sum-Rate [bits/s/Hz]
compared with the DL-based approach MLP proposed in [27]. 6
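The two data-handling steps described earlier can be illustrated in a few lines of Python. The first is corrupting test channels at a prescribed $\mathrm{SNR}_{\mathrm{TEST}}$. The second is quantizing the predicted phases onto the $2^B$-point set. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the $\mathrm{SNR}_{\mathrm{TEST}}$ definition is applied entry-wise, so each $[\mathbf{H}]_{i,j}$ gets its own noise variance. It also assumes nearest-point rounding for the quantizer and a $1/\sqrt{N}$ constant-modulus scaling of the analog weights. The function names are hypothetical.

```python
import cmath
import math
import random


def noise_variance(h: complex, snr_test_db: float) -> float:
    """Invert SNR_TEST = 20*log10(|h|^2 / sigma^2) for a single channel entry h.

    Assumption: the definition is applied entry-wise, exactly as written.
    """
    return abs(h) ** 2 / 10 ** (snr_test_db / 20)


def corrupt_channel(H, snr_test_db, rng):
    """Add circularly-symmetric complex AWGN to every entry of H (a list of rows)."""
    noisy = []
    for row in H:
        noisy_row = []
        for h in row:
            # Split the variance equally between the real and imaginary parts.
            std = math.sqrt(noise_variance(h, snr_test_db) / 2)
            noisy_row.append(h + complex(rng.gauss(0.0, std), rng.gauss(0.0, std)))
        noisy.append(noisy_row)
    return noisy


def quantize_phase(theta: float, bits: int) -> float:
    """Round a phase to the nearest point of {2*pi*b / 2^B : b = 1, ..., 2^B}.

    Assumption: nearest-point rounding; the text does not state the rule.
    """
    levels = 2 ** bits
    step = 2 * math.pi / levels
    b = round((theta % (2 * math.pi)) / step) % levels
    # b = 0 and b = 2^B are the same point on the circle; report it as b = 2^B.
    return (b if b != 0 else levels) * step


def analog_weights(phases, bits):
    """Constant-modulus analog weights exp(j*phi)/sqrt(N) from quantized phases.

    Assumption: the 1/sqrt(N) normalization is chosen for illustration.
    """
    n = len(phases)
    return [cmath.exp(1j * quantize_phase(t, bits)) / math.sqrt(n) for t in phases]


if __name__ == "__main__":
    rng = random.Random(0)
    H = [[1 + 1j, 0.5 - 0.2j], [0.3j, -1.0 + 0j]]
    print(corrupt_channel(H, snr_test_db=20, rng=rng))
    print(analog_weights([0.1, 1.7, 3.0, 5.9], bits=3))
```

With $B = 1$, every phase collapses onto either $\pi$ or $2\pi$. This is the one-bit regime examined in the quantization experiment.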
[Figure: Fig. 3 plot, Sum-Rate [bits/s/Hz] versus SNR [dB] for No Interference, Manifold Optimization, Algorithm 1, CNN-MIMO, LRHB, MLP, TS-HB and SOMP, with an inset zooming in around SNR = -5 dB.]
Fig. 3. Sum-rate versus SNR ($N_T = 36$, $N_R = 9$, $K = 3$, $B = 3$ and $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB).

[Figure: Fig. 4 plot, panels (a) to (c); panel (b) shows RMSE versus $\mathrm{SNR}_{\mathrm{TEST}}$ [dB] and panel (c) shows RMSE, for Manifold Optimization, CNN-MIMO, LRHB, MLP, TS-HB and SOMP.]
In Fig. 3, we present the achievable sum-rate performance of the algorithms with respect to different SNR levels. The design parameters of CNN-MIMO are given in Section IV-B. Moreover, we select the number of antennas per BS and per user as $N_T = 36$ and $N_R = 9$, respectively. Synthetic
and TS-HB have poorer performance than CNN-MIMO. In particular, since SOMP was originally proposed for the single-user case, we adapted it to the multi-user scenario. There, the analog precoders are designed based on the similarity between the optimum precoder and the analog precoders. As a result, SOMP does not always find the optimum weights that maximize the sum-rate [31]. TS-HB performs better than SOMP since it is based on sum-rate maximization. Its performance converges to that of SOMP when each user has a single path.

The feedback data, namely the channel matrices $\{\mathbf{H}_k\}_{k=1}^{K}$ and the feasible array response sets $\mathcal{F}$ and $\mathcal{W}$, may not always be perfectly available. To evaluate the robustness of the algorithms against corrupted feedback, we simulate their performance for different $\mathrm{SNR}_{\mathrm{TEST}}$ levels with the same setting as in the previous simulation. In this case, complex AWGN is added to both the channel and the array response data to emulate deviations in the feedback data. The results are presented in Fig. 4. The achievable sum-rate is shown in Fig. 4(a), while the RMS errors of the analog precoder $\mathbf{F}_{\mathrm{RF}}$ and combiner $\mathbf{W}_{\mathrm{RF}}$ are shown in Figs. 4(b) and 4(c), respectively. Note that Algorithm 1 is fed with perfect CSI to demonstrate the best achievable performance. As can be seen from Fig. 4, CNN-MIMO is more robust against corruption in the channel data than the other methods.

Note also that manifold optimization, LRHB, MLP and CNN-MIMO are affected only by the corruption in the channel data, since they estimate the analog precoders directly. In contrast, SOMP and TS-HB require the feasible sets $\mathcal{F}$ and $\mathcal{W}$ as input. As a result, the performance of TS-HB and SOMP relies heavily on the accuracy of both the channel matrix and the array response sets. Moreover, CNN-MIMO needs the feasible sets $\mathcal{F}$ and $\mathcal{W}$, together with the channel data used to generate the labels, only in the training stage; they are not used in the prediction stage. Algorithms such as SOMP and TS-HB, however, require this information to solve the hybrid precoding problem. Overall, these results show the robustness of the proposed CNN-MIMO.

The analog precoders are designed with constant-modulus discrete phase shifters to steer the beam in spatial precoding. To assess the effect of the phase resolution of the phase shifters, we present the sum-rate of the algorithms for different quantization resolutions. Here the phases of the analog precoder and combiners are quantized with $B \in \{1, \dots, 8\}$ bits. The results are depicted in Fig. 5. The sum-rates of the other algorithms saturate beyond 4 bits, while, remarkably, the proposed CNN approach achieves a higher sum-rate from one-bit quantization onward.

[Figure: Fig. 5 plot, Sum-Rate [bits/s/Hz] versus Number of Quantization Bits for No Interference, Manifold Optimization, Algorithm 1, CNN-MIMO, LRHB, MLP, TS-HB and SOMP.]
Fig. 5. Sum-rate versus angular resolution of the analog precoders ($N_T = 36$, $N_R = 9$, SNR = 0 dB, $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB).

In Fig. 6(a), the performance is evaluated for a varying number of users, namely $K \in \{2, \dots, 8\}$, with $L = 10$ fixed. Notably, CNN-MIMO performs better than the other algorithms. In particular, the gap between "No Interference" and CNN-MIMO becomes larger as $K$ increases. MLP outperforms LRHB for $K \ge 5$ and, like CNN-MIMO, exhibits robust performance, albeit with a certain performance loss. The main reason is the use of training data prepared with Algorithm 1, which provides more accurate beamformers than the other algorithms. We also see that CNN-MIMO closely follows the performance of Algorithm 1. The remaining gap to "No Interference" is due to insufficient interference cancellation. This suggests that more effective algorithms are needed to handle the interference among the users.

In Fig. 6(b), we evaluate the performance of CNN-MIMO when the number of paths for each user is not fixed. To this end, we train the network with the same parameters, except that $L$ is selected uniformly at random from the interval $[1, 10]$. Using different $L$ values for different users reduces the similarity between the users' channel data. CNN-MIMO still achieves satisfactory performance, consistent with the observations made when $L$ is fixed.

TABLE I
COMPUTATION TIMES (IN SECONDS).

NT     Algorithm 1   CNN-MIMO   MLP      LRHB     TS-HB    SOMP
4      0.1061        0.0039     0.0034   0.0059   0.0093   0.0122
16     0.1164        0.0043     0.0038   0.0113   0.0103   0.0139
64     0.1175        0.0049     0.0045   0.0159   0.0108   0.0216
100    0.1242        0.0052     0.0049   0.0318   0.0125   0.0282

In Fig. 7, we illustrate the performance for a varying number of BS antennas. Similar observations hold: CNN-MIMO performs better than the other algorithms. Furthermore, Table I lists the computation times of the algorithms, in seconds, for different numbers of BS antennas. Algorithm 1 has the highest complexity due to the exhaustive search. The DL-based approaches, i.e., CNN-MIMO and MLP, have the lowest computation times, below those of LRHB and the remaining methods. MLP has slightly lower complexity than CNN-MIMO owing to its simpler structure; however, it performs worse, as shown in the previous experiments. The complexity of TS-HB and SOMP depends on the number of elements in the feasible sets $\mathcal{F}$ and $\mathcal{W}$. Of the two, TS-HB requires less computation time than SOMP, since it does not include an OMP stage to obtain the precoders but selects
[Figure: Fig. 6 plot, (a) Sum-Rate [bits/s/Hz] versus Number of Users; (b) Spectral Efficiency [bits/s/Hz]; curves for No Interference, Manifold Optimization, Algorithm 1, CNN-MIMO, LRHB, MLP, TS-HB and SOMP.]

[Figure: Fig. 7 plot, Sum-Rate [bits/s/Hz] versus Number of BS Antennas for the same algorithms.]
Fig. 7. Sum-rate versus number of BS antennas ($K = 3$, $N_R = 9$, SNR = 0 dB and $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB).
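The relative figures behind the complexity comparison can be read directly from Table I. The short Python script below is a convenience sketch, not part of the original evaluation. It transcribes the table and computes three quantities: how many times faster CNN-MIMO runs than Algorithm 1, which method is fastest in each row, and how much each method's time grows from $N_T = 4$ to $N_T = 100$. The helper names `speedup`, `fastest` and `growth` are hypothetical.

```python
# Computation times in seconds, transcribed from Table I and keyed by N_T.
TABLE_I = {
    4: {"Algorithm 1": 0.1061, "CNN-MIMO": 0.0039, "MLP": 0.0034,
        "LRHB": 0.0059, "TS-HB": 0.0093, "SOMP": 0.0122},
    16: {"Algorithm 1": 0.1164, "CNN-MIMO": 0.0043, "MLP": 0.0038,
         "LRHB": 0.0113, "TS-HB": 0.0103, "SOMP": 0.0139},
    64: {"Algorithm 1": 0.1175, "CNN-MIMO": 0.0049, "MLP": 0.0045,
         "LRHB": 0.0159, "TS-HB": 0.0108, "SOMP": 0.0216},
    100: {"Algorithm 1": 0.1242, "CNN-MIMO": 0.0052, "MLP": 0.0049,
          "LRHB": 0.0318, "TS-HB": 0.0125, "SOMP": 0.0282},
}


def speedup(n_t, method="CNN-MIMO", baseline="Algorithm 1"):
    """How many times faster `method` runs than `baseline` for a given N_T."""
    row = TABLE_I[n_t]
    return row[baseline] / row[method]


def fastest(n_t):
    """Name of the method with the smallest computation time for a given N_T."""
    row = TABLE_I[n_t]
    return min(row, key=row.get)


def growth(method):
    """Ratio of the computation time at N_T = 100 to that at N_T = 4."""
    return TABLE_I[100][method] / TABLE_I[4][method]


if __name__ == "__main__":
    for n_t in sorted(TABLE_I):
        print(f"N_T={n_t}: CNN-MIMO is {speedup(n_t):.1f}x faster than "
              f"Algorithm 1; fastest method: {fastest(n_t)}")
    for name in ("CNN-MIMO", "LRHB", "SOMP"):
        print(f"{name}: time grows {growth(name):.2f}x from N_T=4 to N_T=100")
```

With the tabulated values, CNN-MIMO is roughly 24 to 27 times faster than Algorithm 1 at every $N_T$, and MLP is the fastest method in every row. LRHB's time grows by the largest factor (about 5.4x) between $N_T = 4$ and $N_T = 100$.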
VII. CONCLUSIONS