Hybrid Precoding For Multi-User Millimeter Wave Massive MIMO Systems: A Deep Learning Approach

Ahmet M. Elbir and Anastasios Papazafeiropoulos, Senior Member, IEEE
arXiv:1911.04239v1 [eess.SP] 11 Nov 2019

Abstract—In multi-user millimeter wave (mmWave) multiple-input-multiple-output (MIMO) systems, hybrid precoding is a crucial task to lower the complexity and cost while achieving a sufficient sum-rate. Previous works on hybrid precoding were usually based on optimization or greedy approaches. These methods either have high complexity or yield sub-optimal performance. Moreover, their performance mostly relies on the quality of the channel data. In this work, we propose a deep learning (DL) framework that improves the performance and reduces the computation time compared to conventional techniques. In particular, we design a convolutional neural network for MIMO (CNN-MIMO) that accepts an imperfect channel matrix as input and gives the analog precoder and combiners at the output. The procedure includes two main stages. First, we develop an exhaustive search algorithm to select the analog precoder and combiners from a predefined codebook maximizing the achievable sum-rate. Then, the selected precoder and combiners are used as output labels in the training stage of CNN-MIMO, where the input-output pairs are obtained. We evaluate the performance of the proposed method through extensive simulations and show that the proposed DL framework outperforms conventional techniques. Overall, CNN-MIMO provides a robust hybrid precoding scheme in the presence of imperfections in the channel matrix. On top of this, the proposed approach exhibits less computation time than the optimization and codebook based approaches.

Index Terms—Hybrid precoding, mmWave systems, multi-user MIMO transmission, deep learning, convolutional neural networks.

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

A. M. Elbir is with the Department of Electrical and Electronics Engineering, Duzce University, Duzce, Turkey (e-mail: [email protected]).

A. Papazafeiropoulos is with the Communications and Intelligent Systems Research Group, University of Hertfordshire, Hatfield AL10 9AB, U.K., and also with SnT (http://www.securityandtrust.lu), University of Luxembourg, L-1855 Luxembourg City, Luxembourg (e-mail: [email protected]).

I. INTRODUCTION

Millimeter wave (mmWave) communication systems provide a higher data rate and wider bandwidth at high frequencies (in the range of 30–300 GHz) [1]. Consequently, mmWave has become a leading candidate for the fifth-generation (5G) wireless networks [2]. However, in mmWave bands, the propagation loss is higher than in conventional systems operating at lower frequencies [1], [2]. To overcome the high propagation path-loss and to provide beamforming power gain, massive numbers of antennas are used at both the transmitter and receiver sides, yielding a massive multiple-input-multiple-output (MIMO) structure that enhances the signal-to-noise ratio (SNR) of the received signal [3].

Signal processing in conventional systems with frequencies lower than 3 GHz is performed digitally, where both the amplitudes and the phases are processed in the baseband. For this reason, dedicated radio-frequency (RF) hardware is required for each antenna element [4]. Unfortunately, in mmWave MIMO systems implemented with a large number of antennas, fully digital processing is not cost-efficient since it incurs high hardware cost and significant complexity. To reduce the cost while providing sufficient performance, hybrid precoding architectures have been proposed, where the signal is processed by both analog and digital precoders [5]–[8]. In the analog processing part of hybrid systems, phase shifters with constant modulus are usually used. The phase shifters introduce discrete phases to the transmitted/received signal to steer the beam and thus increase the gain [8].

In recent years, several techniques have been proposed to design the hybrid precoders in mmWave MIMO systems. In particular, initial works focused on the single-user scenario [6], in which the user is assumed to be equipped with multiple antennas. While the single-user case constitutes the baseline for multi-user systems, which are of practical interest, in the latter the interference from other users should be taken into account when designing the precoders [7]–[10]. In [8], the performance of low-resolution analog-to-digital converters (ADCs) is investigated when a single RF chain is used at mobile users. In [9], simultaneous channel estimation is considered for multi-user systems, while in [10], antenna selection in mmWave MIMO is considered together with hybrid precoding estimation. The authors in [7] also consider the multi-user scenario, but the hybrid precoders are obtained by a greedy-like approach as in [6], where a simultaneous orthogonal matching pursuit (SOMP) algorithm is proposed. It is worth mentioning that all of the above methods assume perfect channel state information and the availability of the array response sets, namely, $\mathcal{F}$ and $\mathcal{W}$ for the precoder and combiner design, respectively. These sets are composed of the transmit and receive steering vectors with respect to the directions of arrival/departure (DOAs/DODs) of the user locations. Since these array responses are directly related to the singular value matrix of the channel through a linear transformation, they become the best candidates for the precoder design problem [5]–[7].
As a class of machine learning techniques, DL has recently gained much interest for solving many challenging problems such as speech recognition, visual object recognition, and language processing [11], [12]. DL has several advantages, such as low computational complexity when solving optimization-based or combinatorial search problems and the ability to extrapolate new features from a limited set of features contained in a training set [11]. Very recently, DL-based techniques have received a great deal of attention in radar [13] and in fundamental communication theory topics [14]–[24] such as channel estimation [16], DOA estimation [17], and analog beam selection [18]. In particular, in the physical layer of wireless communications, DL has been applied to signal detection [19], channel estimation [21], [25], [26], and dynamic multi-channel access problems [20]. In this direction, an end-to-end communication scenario is modeled in [21] and [22] by using auto-encoders, where single-input-single-output (SISO) systems are considered. The authors in [23] have also used auto-encoders for the channel state information (CSI) feedback problem. Interestingly, [24] studies physical layer structures without channel models via DL.

An interesting topic concerns the investigation of the hybrid precoding problem in the context of DL [27]–[31]. Inspired by dense fully connected layers, deep multilayer perceptrons (MLPs) have been proposed in [27]–[29]. Specifically, in [27] and [28], an MLP has been employed only for the precoder design and only for the single-user scenario. In [29], an MLP architecture is considered for coordinated beam training, where perfect CSI is assumed to be known. Moreover, in [30], a convolutional neural network (CNN)-based approach has been proposed for the joint precoder and combiner design problem, but again for the single-user setting. Also, in [31], quantized and unquantized CNNs have been used for hybrid precoding in the case of a single-user MIMO system. The performance of DL-based approaches such as [27]–[29] strongly relies on the perfectness of the channel matrix, whereas in [30] and [31], DL approaches that are robust against imperfections in the channel data are proposed, but these works are developed only for the single-user scenario.

A. Motivation
Although there are optimization-based approaches that directly estimate the precoders, they suffer from large computational complexity and local-minimum problems due to random initialization [32]. Also, the design of hybrid precoders for the common multi-user MIMO scenario, which is of high practical importance, has not been considered in the context of DL. Thus, driven by the advantages of DL such as its low computational complexity, we develop a method that can handle the hybrid precoding design for multi-user MIMO transmission in the mmWave region when corrupted channel feedback data is available.

B. Contribution
In this paper, we propose a DL framework in terms of a CNN for mmWave hybrid precoding design, henceforth called CNN-MIMO. In our DL framework, the channel matrix of users is selected as the input of CNN-MIMO, and the output labels are selected as the hybrid precoder weights. In the training stage, which is an offline process (see Fig. 2), we generate several channel realizations of multiple users and obtain the corresponding hybrid precoders via an exhaustive search algorithm. This process requires the knowledge of the feasible sets of array responses $\mathcal{F}$, $\mathcal{W}$, which are not used in the prediction stage. Once the network is trained, CNN-MIMO is used to predict the hybrid precoders by simply feeding the network with the channel matrix of users. The proposed DL framework provides a nonlinear mapping between the channel matrix and the hybrid beamformers. Hence, the proposed method achieves more robust performance than the competing algorithms since the deep network can handle the imperfections and corruptions in the input channel data, whereas the other algorithms do not have such capability. The proposed approach also has superior sum-rate performance due to the use of the "best" hybrid beamformers, which are obtained via an exhaustive search in the training process. The main contributions of this work are as follows.
• A DL-based approach is proposed for hybrid precoding in multi-user massive MIMO mmWave systems. We leverage DL to estimate the precoder and combiner weights so that CNN-MIMO is more robust against deviations in the channel matrix. Hence, the proposed DL framework has superior performance compared to the conventional greedy and codebook based techniques [6]–[8], whose performance strongly relies on the quality of the channel.
• In most previous works, such as [6], [7], the codebooks formed by the feasible sets of array responses $\mathcal{F}$ and $\mathcal{W}$ are assumed to be known. Then, the analog precoding design problem reduces to the selection of the best candidates in $\mathcal{F}$ and $\mathcal{W}$ to maximize the sum-rate. In this work, we only need $\mathcal{F}$ and $\mathcal{W}$ in the training stage to obtain the network labels, and the proposed DL technique does not require such information in the prediction stage, where the network itself obtains the analog precoder weights by learning the features hidden in the input data.
• To train the network, a very large training dataset (almost half a million samples) is generated. Hence, robust performance against imperfect channels and deviations in the channel data is achieved.
• The proposed approach also requires less computation time for hybrid precoding design. While the conventional techniques require an optimization process or greedy searches, our CNN approach estimates the precoders by simply feeding the network with the corrupted channel matrix.

C. Notation
Vectors and matrices are denoted by boldface lower and upper case symbols, respectively. For a vector $\mathbf{a}$, $[\mathbf{a}]_i$ represents its $i$th element. For a matrix $\mathbf{A}$, $[\mathbf{A}]_{:,i}$ and $[\mathbf{A}]_{i,j}$ denote the $i$th column and the $(i,j)$th entry, respectively. $\mathbf{I}_N$ is the identity matrix of size $N \times N$, $\mathbb{E}\{\cdot\}$ denotes the statistical expectation, and $\|\cdot\|_F$ is the Frobenius norm. The notation $(\cdot)^\dagger$ denotes the Moore-Penrose pseudo-inverse, and $\angle\{\cdot\}$ denotes the angle of a complex scalar/vector. A convolutional layer with $N$ filters of size $D \times D$ is denoted by $N@D \times D$. For a complex scalar $a = e^{j\phi}$ with continuous phase $\phi$, $Q(a) = e^{j\phi_B}$ denotes the quantization operator, where $\phi_B$ is the quantized angle in $[0, 2\pi]$ sampled with $2^B$ points.
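To make the quantization operator concrete, the following Python sketch implements $Q(\cdot)$ with NumPy. The function name quantize_phase is hypothetical, and the grid (the $2^B$ angles $0, 2\pi/2^B, \dots, (2^B-1)2\pi/2^B$, with nearest-point rounding and $2\pi$ wrapping to $0$) is an assumption, since the text only states that the angle is sampled with $2^B$ points in $[0, 2\pi]$.

```python
import numpy as np


def quantize_phase(a, B):
    """Q(a): snap the phase of a (scalar or array) to a uniform grid of 2**B angles.

    Assumption: the grid is {0, 2*pi/2**B, ..., (2**B - 1)*2*pi/2**B} and each phase is
    rounded to the nearest grid point (2*pi wraps to 0). The output has unit modulus.
    """
    levels = 2 ** B
    step = 2 * np.pi / levels
    phi = np.mod(np.angle(a), 2 * np.pi)        # continuous phase mapped to [0, 2*pi)
    idx = np.mod(np.round(phi / step), levels)  # nearest grid index, wrapping at 2*pi
    return np.exp(1j * idx * step)              # e^{j phi_B}


# Example: with B = 2 the grid is {0, pi/2, pi, 3*pi/2}; a phase of 1.7 rad snaps to pi/2.
q = quantize_phase(np.exp(1j * 1.7), 2)
```

Because the output always has unit modulus, a codebook whose entries are scaled by $1/\sqrt{N_T}$ (so that $|[\mathbf{F}_{\mathrm{RF}}]_{i,j}|^2 = 1/N_T$) would need to be rescaled after quantization.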
II. SYSTEM MODEL

Fig. 1. A multi-user MIMO system with hybrid (analog and baseband) precoding at the BS and analog-only combining at $K$ users.

We consider a multi-user mmWave MIMO system as shown in Fig. 1. The base station (BS), serving $K$ users each of which has $N_R$ antennas, is equipped with $N_T$ antennas and $N_T^{\mathrm{RF}}$ RF chains. Taking into consideration the cheaper hardware at each user, and consequently its low power consumption, we assume that the BS communicates with each user via a single stream, i.e., $N_S = 1$ [7]. Hence, only analog combining is applied at the receiver. Another assumption is that $N_T^{\mathrm{RF}} \geq K$, i.e., the maximum number of simultaneously served users cannot be greater than the number of BS RF chains. In the downlink, the BS applies the baseband precoder $\mathbf{F}_{\mathrm{BB}} = [\mathbf{f}_{\mathrm{BB}_1}, \mathbf{f}_{\mathrm{BB}_2}, \dots, \mathbf{f}_{\mathrm{BB}_K}] \in \mathbb{C}^{N_T^{\mathrm{RF}} \times K}$ to the transmit signal $\mathbf{s} = [s_1, s_2, \dots, s_K]^T \in \mathbb{C}^K$, which obeys $\mathbb{E}\{\mathbf{s}\mathbf{s}^H\} = \frac{P}{K}\mathbf{I}_K$ under equal power allocation among the users. Note that $P$ denotes the average power. The RF precoders $\mathbf{F}_{\mathrm{RF}} \in \mathbb{C}^{N_T \times N_T^{\mathrm{RF}}}$, which are constructed by phase shifters, are used to convey the signal to the $N_T$ transmit antennas. Since $\mathbf{F}_{\mathrm{RF}}$ consists of analog phase shifters, we assume that the RF precoder has constant equal-norm elements, i.e., $|[\mathbf{F}_{\mathrm{RF}}]_{i,j}|^2 = 1/N_T$. In addition, we have the power constraint $\|\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\|_F^2 = K$, which is enforced by the normalization of $\mathbf{F}_{\mathrm{BB}}$. Thus, the $N_T \times 1$ transmitted signal is written as

$$\mathbf{x} = \mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\mathbf{s}. \tag{1}$$

The received signal of the $k$th user for a narrowband block-fading channel can be written as [33]

$$\tilde{\mathbf{y}}_k = \mathbf{H}_k \mathbf{F}_{\mathrm{RF}} \sum_{n=1}^{K} \mathbf{f}_{\mathrm{BB}_n} s_n + \mathbf{n}_k, \tag{2}$$

where $\mathbf{H}_k \in \mathbb{C}^{N_R \times N_T}$ is the channel matrix between the BS and the $k$th user with $\|\mathbf{H}_k\|_F = N_R N_T$. The vector $\mathbf{n}_k \in \mathbb{C}^{N_R}$ denotes the complex additive white Gaussian noise (AWGN) with $\mathbf{n}_k \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_R})$.

Once the transmitted signal is received at the $k$th user, it is processed by the combiner $\mathbf{w}_{\mathrm{RF}_k} \in \mathbb{C}^{N_R}$ as $y_k = \mathbf{w}_{\mathrm{RF}_k}^H \tilde{\mathbf{y}}_k$, i.e.,

$$y_k = \mathbf{w}_{\mathrm{RF}_k}^H \mathbf{H}_k \mathbf{F}_{\mathrm{RF}} \sum_{n=1}^{K} \mathbf{f}_{\mathrm{BB}_n} s_n + \mathbf{w}_{\mathrm{RF}_k}^H \mathbf{n}_k, \tag{3}$$

where the RF combiners are constructed by means of phase shifters with the normalization constraint $|[\mathbf{w}_{\mathrm{RF}_k}]_i|^2 = 1/N_R$.

A. Channel Model
In mmWave transmission, the channel can be represented by a geometric model with limited scattering [34]–[36]. Hence, we assume that the channel matrix $\mathbf{H}_k$ includes the contributions of $L$ scattering paths. Considering a 2-D uniform planar array (UPA), the channel matrix corresponding to the $k$th user is given by

$$\mathbf{H}_k = \gamma \sum_{l=1}^{L} \alpha_{l,k}\, g_R(\Theta_R^{(l,k)})\, g_T(\Theta_T^{(l,k)})\, \mathbf{a}_R(\Theta_R^{(l,k)})\, \mathbf{a}_T^H(\Theta_T^{(l,k)}),$$

where $\Theta_R^{(l,k)} = (\phi_R^{(l,k)}, \theta_R^{(l,k)})$ and $\Theta_T^{(l,k)} = (\phi_T^{(l,k)}, \theta_T^{(l,k)})$ denote the angles of arrival and departure, respectively. Note that the angular parameters $\phi$ and $\theta \in [0, 2\pi]$ correspond to the azimuth and elevation angles, respectively. The scalar $\gamma = \sqrt{N_T N_R / L}$ is the normalization factor and $\alpha_{l,k}$ is the complex channel gain associated with the $k$th user and $l$th path, $l = 1, \dots, L$. Also, $g_R(\Theta_R^{(l,k)})$ and $g_T(\Theta_T^{(l,k)})$ are the antenna element gains of the arrays, while $\mathbf{a}_R(\Theta_R^{(l,k)})$ and $\mathbf{a}_T(\Theta_T^{(l,k)})$ are the $N_R \times 1$ and $N_T \times 1$ steering vectors representing the array responses at the $k$th user and the BS, respectively. The $n$th element of the steering vector $\mathbf{a}_R(\Theta_R^{(l,k)})$ is given as

$$[\mathbf{a}_R(\Theta_R^{(l,k)})]_n = \exp\left\{-j\frac{2\pi}{\lambda}\mathbf{p}_n^T \mathbf{r}(\Theta_R^{(l,k)})\right\}, \tag{4}$$

where $\lambda$ is the wavelength and $\mathbf{p}_n = [x_n, y_n, z_n]^T$ is the position of the $n$th antenna in the Cartesian coordinate system. The direction vector is given by

$$\mathbf{r}(\Theta_R^{(l,k)}) = \left[\sin(\phi_R^{(l,k)})\cos(\theta_R^{(l,k)}),\ \sin(\phi_R^{(l,k)})\sin(\theta_R^{(l,k)}),\ \cos(\theta_R^{(l,k)})\right]^T. \tag{5}$$

Similarly, the transmitter-side steering vector $\mathbf{a}_T(\Theta_T^{(l,k)})$ can be defined as for $\mathbf{a}_R(\Theta_R^{(l,k)})$.

Assuming that Gaussian symbols are transmitted through the mmWave channel under study, the achievable rate of the $k$th user is written as [5], [7]

$$R_k = \log_2\left(1 + \frac{\frac{P}{K}\left|\mathbf{w}_{\mathrm{RF}_k}^H \mathbf{H}_k \mathbf{F}_{\mathrm{RF}} \mathbf{f}_{\mathrm{BB}_k}\right|^2}{\frac{P}{K}\sum_{n \neq k}\left|\mathbf{w}_{\mathrm{RF}_k}^H \mathbf{H}_k \mathbf{F}_{\mathrm{RF}} \mathbf{f}_{\mathrm{BB}_n}\right|^2 + \sigma^2}\right), \tag{6}$$

and the achievable sum-rate of the system is given by $\bar{R} = \sum_{k=1}^{K} R_k$.
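The channel model and the rate expression above can be checked numerically. The following NumPy sketch is an illustration under assumptions not fixed by the text: antenna positions are given in units of wavelength (so $\lambda = 1$) with half-wavelength spacing on a UPA in the $x$-$y$ plane, the antenna gains are set to one (an in-sector user), and all function names (upa_positions, steering_vector, channel, sum_rate) are hypothetical. The direction vector follows (5) as written, and the interference term of (6) is computed from the entries $\mathbf{w}_{\mathrm{RF}_k}^H\mathbf{H}_k\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_n}$, $n \neq k$, consistent with the received signal in (3).

```python
import numpy as np


def upa_positions(n_x, n_y, spacing=0.5):
    """Positions p_n of an n_x-by-n_y UPA in the x-y plane, in units of wavelength."""
    xs, ys = np.meshgrid(np.arange(n_x), np.arange(n_y), indexing="ij")
    return spacing * np.column_stack([xs.ravel(), ys.ravel(), np.zeros(n_x * n_y)])


def steering_vector(pos, phi, theta):
    """Eqs. (4)-(5) with lambda = 1: [a]_n = exp(-j*2*pi * p_n^T r(phi, theta))."""
    r = np.array([np.sin(phi) * np.cos(theta),
                  np.sin(phi) * np.sin(theta),
                  np.cos(theta)])
    return np.exp(-1j * 2 * np.pi * (pos @ r))


def channel(pos_r, pos_t, rx_angles, tx_angles, alphas):
    """Geometric L-path channel H_k; antenna gains g_R = g_T = 1 (in-sector assumption)."""
    n_r, n_t, L = len(pos_r), len(pos_t), len(alphas)
    gamma = np.sqrt(n_t * n_r / L)
    H = np.zeros((n_r, n_t), dtype=complex)
    for (phi_r, th_r), (phi_t, th_t), alpha in zip(rx_angles, tx_angles, alphas):
        a_r = steering_vector(pos_r, phi_r, th_r)
        a_t = steering_vector(pos_t, phi_t, th_t)
        H += alpha * np.outer(a_r, a_t.conj())    # alpha * a_R a_T^H
    return gamma * H


def sum_rate(H_list, W_rf, F_rf, F_bb, P, sigma2):
    """Eq. (6) summed over users; column k of W_rf is w_RF_k, column n of F_bb is f_BB_n."""
    K = len(H_list)
    total = 0.0
    for k in range(K):
        g = W_rf[:, k].conj() @ H_list[k] @ F_rf @ F_bb   # entries w_k^H H_k F_RF f_BB_n
        signal = P / K * abs(g[k]) ** 2
        interference = P / K * (np.sum(np.abs(g) ** 2) - abs(g[k]) ** 2)
        total += np.log2(1 + signal / (interference + sigma2))
    return float(total)
```

For example, with a single user ($K = 1$) the interference term vanishes and sum_rate reduces to $\log_2(1 + P|g|^2/\sigma^2)$ with $g = \mathbf{w}_{\mathrm{RF}_1}^H\mathbf{H}_1\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_1}$.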
III. PROBLEM FORMULATION

The principal aim of this work is to design the hybrid precoder and combiners $\mathbf{F}_{\mathrm{BB}}$, $\mathbf{F}_{\mathrm{RF}}$, and $\{\mathbf{w}_{\mathrm{RF}_k}\}_{k=1}^K$ in the presence of imperfect channel data by maximizing the sum-rate. Specifically, we first develop an algorithm to compute the hybrid precoders that maximize the sum-rate, and then a deep network is designed such that the hybrid precoders are predicted by feeding the network with imperfect CSI.

In a nutshell, the proposed DL framework provides a nonlinear mapping from the channel matrix $\mathbf{H}$ to the analog beamformers $\mathbf{F}_{\mathrm{RF}}$ and $\{\mathbf{w}_{\mathrm{RF}_k}\}_{k=1}^K$. Only the label generation process depends on the channel model; the channel model itself is not required for updating the network parameters in the training stage. Hence, CNN-MIMO can also be used for various channel models in mmWave systems [37]. Given that our main focus is hybrid beamforming, in this work we use the block-fading channel model due to the simple structure of the channel matrix model and rate computation [25], [26], [38]. The application of DL to other channel models is the topic of ongoing research.

Estimating the channel matrix of the users is a challenging task, especially with the large number of antennas in massive MIMO systems [39], [40]. In addition, since the coherence time of the channel is very short in the mmWave massive MIMO scenario, the parameters related to the channel characteristics change greatly in a short time [41]. To obtain robust precoding performance, in the training stage, which is an offline process, we feed the deep network with several channel realizations that are corrupted by synthetic noise. Hence, in the testing stage, when the network predicts the precoder weights, the network does not necessarily require perfect CSI [30]. We show through simulations that the proposed approach can handle a corrupted channel matrix and exhibits satisfactory performance in terms of the achievable sum-rate.

The main stages of the proposed DL framework are label generation, training, and prediction. In the following section, we first discuss how the labels are obtained from the channel data. Then, in Section V, we present the details of the training and prediction stages.

IV. HYBRID PRECODING DESIGN IN MULTI-USER MIMO SYSTEMS

In order to design the network and training data, we first need to solve the hybrid precoding problem and obtain the labels of the training data samples. For this reason, we first develop an exhaustive search algorithm that visits all precoder and combiner combinations in the feasible sets $\mathcal{F}$ and $\mathcal{W}$ such that the sum-rate in (6) is maximized. Then, we solve the exhaustive search problem offline to obtain the training data inputs and labels. The advantage of using a DL approach is that it reduces the computation time of the hybrid precoding design problem while attaining near-optimum performance close to that of an exhaustive search.

We start by formulating the optimization problem for hybrid precoding in the multi-user scenario as

$$\begin{aligned}
\{\hat{\mathbf{F}}_{\mathrm{BB}}, \hat{\mathbf{F}}_{\mathrm{RF}}, \hat{\mathbf{W}}_{\mathrm{RF}}\} = \operatorname*{argmax}_{\mathbf{F}_{\mathrm{BB}}, \mathbf{F}_{\mathrm{RF}}, \mathbf{W}_{\mathrm{RF}}} &\ \bar{R} \\
\text{subject to: } &\ \mathbf{F}_{\mathrm{RF}} \in \mathcal{F},\ \mathbf{W}_{\mathrm{RF}} \in \mathcal{W}, \\
&\ \|\mathbf{F}_{\mathrm{RF}}\mathbf{F}_{\mathrm{BB}}\|_F^2 = K,
\end{aligned} \tag{7}$$

where $\mathbf{W}_{\mathrm{RF}} = [\mathbf{w}_{\mathrm{RF}_1}, \mathbf{w}_{\mathrm{RF}_2}, \dots, \mathbf{w}_{\mathrm{RF}_K}]$ denotes the analog combiner of all users, while $\mathcal{F}$ and $\mathcal{W}$ are the feasible sets of the precoder and combiners. In practice, $\mathcal{F}$ and $\mathcal{W}$ are composed of the steering vectors $\mathbf{a}_T(\Theta_T^{(l,k)})$ and $\mathbf{a}_R(\Theta_R^{(l,k)})$, $\forall l, k$, respectively, with quantized phases. Specifically, the array response sets are selected as

$$\mathcal{F} = \{Q(\mathbf{a}_T(\Theta_T^{(1,1)})), \dots, Q(\mathbf{a}_T(\Theta_T^{(L,K)}))\}, \tag{8}$$

and

$$\mathcal{W} = \{Q(\mathbf{a}_R(\Theta_R^{(1,1)})), \dots, Q(\mathbf{a}_R(\Theta_R^{(L,K)}))\}, \tag{9}$$

where $Q(\cdot)$ denotes the phase quantization operator as mentioned before.

In the exhaustive search algorithm, we wish to visit all possible combinations of the elements in the feasible sets $\mathcal{F}$ and $\mathcal{W}$ to achieve near-optimum performance. For this reason, we redefine the feasible sets $\mathcal{F}$ and $\mathcal{W}$ so that they include all precoder and combiner combinations. The search algorithm visits all the nodes in the direction set

$$\mathcal{D} = \left[0, \frac{2\pi}{\bar{L}}, \frac{4\pi}{\bar{L}}, \dots, \frac{(\bar{L}-1)2\pi}{\bar{L}}\right], \tag{10}$$

where $|\mathcal{D}| = \bar{L}$. Assuming that the BS receives $\bar{L}$ paths from each user, the $k$th column of $\mathbf{F}_{\mathrm{RF}}$ can take $\bar{L}$ different values, i.e., $\{Q(\mathbf{a}_T(\Theta_T^{(l,k)}))\}_{l=1}^{\bar{L}}$. Generalizing this to all users, we have $Q_F = \bar{L}^K$ possible candidates to design $\mathbf{F}_{\mathrm{RF}}$. Thus, we define a new set as

$$\mathcal{F} = \{\mathbf{F}_1, \mathbf{F}_2, \dots, \mathbf{F}_{Q_F}\}, \tag{11}$$

where $\mathbf{F}_{q_F} \in \mathbb{C}^{N_T \times K}$ is given by

$$\mathbf{F}_{q_F} = \left[Q(\mathbf{a}_T(\Theta_T^{(l_1,1)})), Q(\mathbf{a}_T(\Theta_T^{(l_2,2)})), \dots, Q(\mathbf{a}_T(\Theta_T^{(l_K,K)}))\right]$$

with the indices for each user given by $l_1, l_2, \dots, l_K = 1, \dots, \bar{L}$. Hence, $q_F = 1, \dots, \bar{L}^K$ indexes the precoder candidates for $K$ users. Similarly, the set of analog combiners is defined as $\mathcal{W} = \{\mathbf{W}_1, \mathbf{W}_2, \dots, \mathbf{W}_{Q_W}\}$, where $\mathbf{W}_{q_W} \in \mathbb{C}^{N_R \times K}$ is given by

$$\mathbf{W}_{q_W} = \left[Q(\mathbf{a}_R(\Theta_R^{(l_1,1)})), Q(\mathbf{a}_R(\Theta_R^{(l_2,2)})), \dots, Q(\mathbf{a}_R(\Theta_R^{(l_K,K)}))\right]$$

with $\mathbf{w}_{\mathrm{RF}_k}$ selected from the $k$th column of $\mathbf{W}_{q_W}$, i.e., $Q(\mathbf{a}_R(\Theta_R^{(l_k,k)}))$. Once the analog precoders are selected from the sets $\mathcal{F}$ and $\mathcal{W}$, the effective channel $\mathbf{H}^{\mathrm{eff}}_{q_F,q_W} \in \mathbb{C}^{K \times N_T^{\mathrm{RF}}}$ is given by

$$\mathbf{H}^{\mathrm{eff}}_{q_F,q_W} = \begin{bmatrix} \mathbf{h}^{\mathrm{eff}}_{q_F,q_W,1} \\ \mathbf{h}^{\mathrm{eff}}_{q_F,q_W,2} \\ \vdots \\ \mathbf{h}^{\mathrm{eff}}_{q_F,q_W,K} \end{bmatrix}, \tag{12}$$
where the corresponding effective channel of each user can be calculated as

$$\mathbf{h}^{\mathrm{eff}}_{q_F,q_W,k} = [\mathbf{W}_{q_W}]_{:,k}^H \mathbf{H}_k \mathbf{F}_{q_F}. \tag{13}$$

The baseband precoder can be given by $\mathbf{F}_{\mathrm{BB},q_F,q_W} = (\mathbf{H}^{\mathrm{eff}}_{q_F,q_W})^\dagger$, and it is normalized as $\mathbf{f}_{\mathrm{BB}_k}^{(q_F,q_W)} = \mathbf{f}_{\mathrm{BB}_k}^{(q_F,q_W)} / \|\mathbf{F}_{q_F}\mathbf{f}_{\mathrm{BB}_k}^{(q_F,q_W)}\|_F$ [7]. Thus, the achievable sum-rate can be written as

$$\bar{R}_{q_F,q_W} = \log_2\left|\mathbf{I}_K + \frac{P}{K\sigma^2}\mathbf{H}^{\mathrm{eff}}_{q_F,q_W}\mathbf{F}_{\mathrm{BB},q_F,q_W}\mathbf{F}_{\mathrm{BB},q_F,q_W}^H\left(\mathbf{H}^{\mathrm{eff}}_{q_F,q_W}\right)^H\right|. \tag{14}$$

Using the sets $\mathcal{F}$ and $\mathcal{W}$, the optimization problem in (7) can be rewritten as

$$\begin{aligned}
\{\bar{q}_F, \bar{q}_W\} = \operatorname*{argmax}_{q_F, q_W} &\ \bar{R}_{q_F,q_W} \\
\text{subject to: } &\ \mathbf{F}_{\mathrm{RF}} = \mathbf{F}_{q_F},\ \mathbf{w}_{\mathrm{RF}_k} = [\mathbf{W}_{q_W}]_{:,k}, \\
&\ \mathbf{h}^{\mathrm{eff}}_k = \mathbf{w}_{\mathrm{RF}_k}^H \mathbf{H}_k \mathbf{F}_{\mathrm{RF}}, \\
&\ \mathbf{F}_{\mathrm{BB}} = (\mathbf{H}^{\mathrm{eff}})^\dagger, \\
&\ \mathbf{f}_{\mathrm{BB}_k} = \mathbf{f}_{\mathrm{BB}_k} / \|\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_k}\|_F,
\end{aligned} \tag{15}$$

where $\bar{q}_F$ and $\bar{q}_W$ denote the indices providing the maximum sum-rate. We summarize the algorithmic steps of the proposed approach in Algorithm 1. Note that the proposed hybrid precoding optimization in (15) differs from the one in [7], in which not all possible combinations of the analog precoders are considered, as is done in this work. In Section VI, we show that (15) yields better results than [7]. The problem in (15) requires visiting $Q_F Q_W$ nodes to estimate the hybrid precoders. To reduce the complexity and the need for the array responses, in the following section, we propose a DL-based approach, where we elaborate on the details of the training data generation and network architecture.

Algorithm 1 Hybrid precoding for multi-user MIMO
Input: $\{\mathbf{H}_k\}_{k=1}^K$, $\mathcal{F}$, $\mathcal{W}$, $\mathcal{D}$.
Output: $\hat{\mathbf{F}}_{\mathrm{RF}}$, $\hat{\mathbf{W}}_{\mathrm{RF}}$.
1: for $1 \leq q_F \leq Q_F$ do
2:   $\mathbf{F}_{\mathrm{RF}} = \mathbf{F}_{q_F}$,
3:   for $1 \leq q_W \leq Q_W$ do
4:     $\mathbf{w}_{\mathrm{RF}_k} = [\mathbf{W}_{q_W}]_{:,k}$,
5:     $\mathbf{h}^{\mathrm{eff}}_k = \mathbf{w}_{\mathrm{RF}_k}^H \mathbf{H}_k \mathbf{F}_{\mathrm{RF}}$,
6:     $\mathbf{F}_{\mathrm{BB}} = (\mathbf{H}^{\mathrm{eff}})^\dagger$,
7:     $\mathbf{f}_{\mathrm{BB}_k} = \mathbf{f}_{\mathrm{BB}_k} / \|\mathbf{F}_{\mathrm{RF}}\mathbf{f}_{\mathrm{BB}_k}\|_F$,
8:     Compute $\bar{R}_{q_F,q_W}$ as in (14).
9:   end for $q_W$,
10: end for $q_F$,
11: $\{\bar{q}_F, \bar{q}_W\} = \arg\max_{q_F,q_W} \bar{R}_{q_F,q_W}$.
12: $\hat{\mathbf{F}}_{\mathrm{RF}} = \mathbf{F}_{\bar{q}_F}$ and $\hat{\mathbf{W}}_{\mathrm{RF}} = \mathbf{W}_{\bar{q}_W}$.

V. LEARNING-BASED HYBRID PRECODING

In this part, we present our DL framework for hybrid precoding design. The proposed network architecture is illustrated in Fig. 2. The CNN-MIMO architecture consists of ten layers; it accepts input data of size $N_R \times N_T \times 3$ and yields a $K(N_T + N_R) \times 1$ vector at the output. The overall network architecture of CNN-MIMO can be represented by the function $\Pi(\cdot): \mathbb{R}^{N_R \times N_T \times 3} \to \mathbb{R}^{K(N_R+N_T)}$. Denoting the arithmetic operation of the $i$th layer of the network by $f^{(i)}(\cdot)$, the overall network can be represented as

$$\Pi(\mathbf{X}) = f^{(10)}\left(f^{(9)}\left(\cdots f^{(1)}(\mathbf{X}) \cdots\right)\right) = \mathbf{z}, \tag{16}$$

where each layer performs a certain task (see Fig. 2). We explicitly show the arithmetic operations of the fully connected and convolutional layers in the sequel.

Let $\bar{\mathbf{W}} \in \mathbb{R}^{C_x \times C_y}$ be the weights of a fully connected layer in the network with input $\bar{\mathbf{x}} \in \mathbb{R}^{C_x}$ and output $\bar{\mathbf{y}} \in \mathbb{R}^{C_y}$. The $c_y$th element of the output of the layer is given by the inner product

$$\bar{y}_{c_y} = \langle \bar{\mathbf{W}}_{c_y}, \bar{\mathbf{x}} \rangle = \sum_i [\bar{\mathbf{W}}^T]_{c_y,i}\, \bar{x}_i, \tag{17}$$

for $c_y = 1, \dots, C_y$, where $\bar{\mathbf{W}}_{c_y}$ is the $c_y$th column vector of $\bar{\mathbf{W}}$. For a convolutional layer, define $\bar{\mathbf{X}} \in \mathbb{R}^{d_x \times d_x \times C_x}$ and $\bar{\mathbf{Y}} \in \mathbb{R}^{d_y \times d_y \times C_y}$ as the input feature maps and the output of the convolutional layer, respectively. Let us also define $d_x \times d_y$ as the size of the convolutional kernel, and $C_x \times C_y$ as the size of the response of the convolutional layer for each feature map. Then, the response of a convolutional layer becomes

$$\bar{Y}_{p_y,c_y} = \sum_{p_k,p_x} \langle \bar{\mathbf{W}}_{c_y,p_k}, \bar{\mathbf{X}}_{p_x} \rangle, \tag{18}$$

where $\bar{Y}_{p_y,c_y}$ is the response for the 2-D spatial region $p_y$ in the $c_y$th channel of the feature maps, $\bar{\mathbf{W}}_{c_y,p_k} \in \mathbb{R}^{C_x}$ denotes the weights of the $c_y$th convolutional kernel, and $\bar{\mathbf{X}}_{p_x} \in \mathbb{R}^{C_x}$ is the input feature map at spatial position $p_x$. Here, $p_x$ and $p_k$ denote the 2-D spatial positions in the feature maps and convolutional kernels, respectively [42].

A. Training Data Generation
In order to train the network, we prepare a training dataset for several channel realizations. We generate $N$ different channel realizations for $K$ users. Next, each of these channel matrices is corrupted by synthetic noise for $G$ realizations. The noise is added to each entry of the channel matrix, and we define the SNR for training data generation as $\mathrm{SNR}_{\mathrm{TRAIN}} = 20\log_{10}\left(\frac{|[\mathbf{H}_k^{(n,g)}]_{i,j}|^2}{\sigma_{\mathrm{TRAIN}}^2}\right)$, where $\sigma_{\mathrm{TRAIN}}^2$ is the variance of the synthetic noise. Note that $[\mathbf{H}_k^{(n,g)}]_{i,j}$ denotes the $(i,j)$th entry of the $k$th channel matrix for the $(n,g)$th realization with $n = 1, \dots, N$ and $g = 1, \dots, G$.

The input of the network consists of three channels. In the first channel, the absolute values of the entries of the channel matrix are used. The second and the third channels include the real and imaginary parts of the channel matrix, respectively. This approach provides good features for solving the problem [31]. Specifically, let $\mathbf{X} \in \mathbb{R}^{N_R \times N_T \times 3}$ be the input of the network; then, for a channel matrix $\mathbf{H} \in \mathbb{C}^{N_R \times N_T}$, the first channel of the input is given by $[[\mathbf{X}]_{:,:,1}]_{i,j} = |[\mathbf{H}]_{i,j}|$. The second and the third channels are given by $[[\mathbf{X}]_{:,:,2}]_{i,j} = \mathrm{Re}\{[\mathbf{H}]_{i,j}\}$ and $[[\mathbf{X}]_{:,:,3}]_{i,j} = \mathrm{Im}\{[\mathbf{H}]_{i,j}\}$, respectively.
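The corruption and input-construction steps of this subsection can be sketched as follows. This is a minimal NumPy illustration, not the authors' MATLAB implementation; the function names corrupt_channel and to_network_input are hypothetical, and because the text defines $\mathrm{SNR}_{\mathrm{TRAIN}}$ per entry, the sketch replaces $|[\mathbf{H}_k^{(n,g)}]_{i,j}|^2$ by the mean entry power of the channel (an assumption) while keeping the $20\log_{10}$ convention of the text.

```python
import numpy as np


def corrupt_channel(H, snr_train_db, rng):
    """Add CN(0, sigma^2) noise to every entry of H.

    sigma^2 follows SNR_TRAIN = 20*log10(|h|^2 / sigma^2), with |h|^2 replaced by the
    mean entry power of H (assumption; the text defines the SNR per entry).
    """
    power = np.mean(np.abs(H) ** 2)
    sigma2 = power / 10 ** (snr_train_db / 20)
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(H.shape)
                                   + 1j * rng.standard_normal(H.shape))
    return H + noise


def to_network_input(H):
    """Stack |H|, Re{H}, Im{H} into the N_R x N_T x 3 real input X."""
    return np.stack([np.abs(H), H.real, H.imag], axis=-1)
```

Calling corrupt_channel $G$ times on each clean channel and passing each result through to_network_input yields the $G$ noisy inputs associated with one channel realization.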
Fig. 2. (Top) The proposed network architecture. The input is the channel matrix of any user in the network and the output is the corresponding analog
precoder and combiners. (Bottom) The diagram for the training and prediction stage of the proposed DL framework.
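The labels used in the training stage shown in Fig. 2 (bottom) are produced by the exhaustive search of Algorithm 1. The NumPy sketch below follows (13), the pseudo-inverse baseband precoder with per-column normalization, and the rate in (14). The candidate lists tx_cands and rx_cands (one list of quantized steering vectors per user, so that each $\mathbf{F}_{q_F}$ and $\mathbf{W}_{q_W}$ takes one column per user) and the function names are hypothetical stand-ins for the sets $\mathcal{F}$ and $\mathcal{W}$; the sketch assumes $N_T^{\mathrm{RF}} = K$, as implied by $\mathbf{F}_{q_F} \in \mathbb{C}^{N_T \times K}$.

```python
import itertools

import numpy as np


def rate_eq14(H_eff, F_bb, P, sigma2):
    """Eq. (14): log2 det(I_K + P/(K*sigma^2) * H_eff F_bb F_bb^H H_eff^H)."""
    K = H_eff.shape[0]
    M = np.eye(K) + P / (K * sigma2) * (H_eff @ F_bb @ F_bb.conj().T @ H_eff.conj().T)
    _, logdet = np.linalg.slogdet(M)   # M is Hermitian positive definite
    return logdet / np.log(2)


def exhaustive_search(H_list, tx_cands, rx_cands, P, sigma2):
    """Sketch of Algorithm 1: return (best rate, F_RF, W_RF) over all candidate pairs."""
    K = len(H_list)
    best_rate, best_F, best_W = -np.inf, None, None
    for tx_idx in itertools.product(*(range(len(c)) for c in tx_cands)):
        # F_qF: column k is the chosen transmit candidate of user k
        F_rf = np.column_stack([tx_cands[k][i] for k, i in enumerate(tx_idx)])
        for rx_idx in itertools.product(*(range(len(c)) for c in rx_cands)):
            W_rf = np.column_stack([rx_cands[k][i] for k, i in enumerate(rx_idx)])
            # Eq. (13): row k of H_eff is w_RF_k^H H_k F_RF
            H_eff = np.vstack([W_rf[:, k].conj() @ H_list[k] @ F_rf for k in range(K)])
            F_bb = np.linalg.pinv(H_eff)                       # F_BB = (H_eff)^dagger
            F_bb = F_bb / np.linalg.norm(F_rf @ F_bb, axis=0)  # f_BB_k / ||F_RF f_BB_k||
            rate = rate_eq14(H_eff, F_bb, P, sigma2)
            if rate > best_rate:
                best_rate, best_F, best_W = rate, F_rf, W_rf
    return best_rate, best_F, best_W
```

The nested loops visit $Q_F Q_W$ combinations ($\bar{L}^{2K}$ when every user has $\bar{L}$ transmit and $\bar{L}$ receive candidates), which is why the search is run only offline for label generation.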
The output of the network is composed of the analog precoder and combiners. Let $\mathbf{z} \in \mathbb{R}^{N_T K + N_R K}$ be a real-valued vector; then we design the output as

$$\mathbf{z} = \left[\angle\{\mathrm{vec}(\mathbf{F}_{\mathrm{RF}})^T\}, \angle\{\mathrm{vec}(\mathbf{W}_{\mathrm{RF}})^T\}\right]^T, \tag{19}$$

where $\mathbf{F}_{\mathrm{RF}} \in \mathbb{C}^{N_T \times K}$ and $\mathbf{W}_{\mathrm{RF}} \in \mathbb{C}^{N_R \times K}$. Hence, the input-output pair of the network is $(\mathbf{X}, \mathbf{z})$. We summarize the data generation process in Algorithm 2. The total number of inputs is $T = NGK$ for $K$ users. Note that the input data is composed of each user's channel information, as in lines 7–12 of Algorithm 2, and we record the analog precoder and combiner associated with each user channel. Note also that the same analog precoders are used for all noisy channel realizations. This introduces synthetic noise in the input dataset to make the network robust against corrupted channel data [13], [31].

Algorithm 2 Training data generation for CNN-MIMO.
Input: $N$, $G$, $K$, $\mathrm{SNR}_{\mathrm{TRAIN}}$.
Output: Training data $\mathcal{D}_{\mathrm{TRAIN}}$.
1: Generate $N$ different realizations of the multi-user MIMO scenario with channel matrices $\{\mathbf{H}_k^{(n)}\}_{n=1}^N$ and corresponding feasible sets $\{\mathcal{F}^{(n)}\}_{n=1}^N$, $\{\mathcal{W}^{(n)}\}_{n=1}^N$, $\forall k$.
2: Initialize $t = 1$, where the dataset length is $T = NGK$.
3: for $1 \leq n \leq N$ do
4:   for $1 \leq g \leq G$ do
5:     $[\mathbf{H}_k^{(n,g)}]_{i,j} \sim \mathcal{CN}([\mathbf{H}_k^{(n)}]_{i,j}, \sigma_{\mathrm{TRAIN}}^2)$.
6:     Using $\mathbf{H}_k^{(n,g)}$, $\mathcal{F}^{(n)}$, $\mathcal{W}^{(n)}$ in Algorithm 1, find $\hat{\mathbf{F}}_{\mathrm{RF}}^{(n,g)}$ and $\hat{\mathbf{W}}_{\mathrm{RF}}^{(n,g)}$ using $\bar{q}_F^{(n,g)}$ and $\bar{q}_W^{(n,g)}$.
7:     for $1 \leq k \leq K$ do
8:       $[[\mathbf{X}^{(t)}]_{:,:,1}]_{i,j} = |[\mathbf{H}_k^{(n,g)}]_{i,j}|$.
9:       $[[\mathbf{X}^{(t)}]_{:,:,2}]_{i,j} = \mathrm{Re}\{[\mathbf{H}_k^{(n,g)}]_{i,j}\}$.
10:      $[[\mathbf{X}^{(t)}]_{:,:,3}]_{i,j} = \mathrm{Im}\{[\mathbf{H}_k^{(n,g)}]_{i,j}\}$, $\forall i,j$.
11:      $\mathbf{z}^{(t)} = [\angle\{\mathrm{vec}(\hat{\mathbf{F}}_{\mathrm{RF}}^{(n,g)})^T\}, \angle\{\mathrm{vec}(\hat{\mathbf{W}}_{\mathrm{RF}}^{(n,g)})^T\}]^T$.
12:      Construct the input-output pair $(\mathbf{X}^{(t)}, \mathbf{z}^{(t)})$.
13:      $t = t + 1$.
14:    end for $k$,
15:  end for $g$,
16: end for $n$,
17: Training data for CNN-MIMO is obtained from the collection of the input-output pairs as $\mathcal{D}_{\mathrm{TRAIN}} = \{(\mathbf{X}^{(1)}, \mathbf{z}^{(1)}), (\mathbf{X}^{(2)}, \mathbf{z}^{(2)}), \dots, (\mathbf{X}^{(T)}, \mathbf{z}^{(T)})\}$.

B. Network Architecture
The proposed network shown in Fig. 2 is composed of ten layers. The first layer is the input layer, accepting the channel matrix data of size $N_R \times N_T \times 3$, i.e., three "channels", each of size $N_R \times N_T$. The second and fourth layers are convolutional layers with 256 filters of size $2 \times 2$ that extract the features hidden in the input data. We feed the network with the real and imaginary parts of the channel data, which provide a large number of features [13], [30] that help the network learn to map the input data to their labels. After each convolutional layer, there is a normalization layer to normalize the output and provide better convergence. The sixth and eighth layers are fully connected layers with 2048 units each. There are dropout layers after the fully connected layers (the seventh and ninth layers) with a 50% probability. The dropout layers make the network independent of the initial weights. The output layer is a regression layer with $K(N_R + N_T)$ units, which include the phase information of the analog precoders. In order to obtain network parameters such as the number of layers, the number of filters, and the kernel sizes, we have conducted a hyperparameter tuning process to achieve sufficiently good network accuracy and sum-rate performance [11], [13], [30]. The current network architecture with a kernel size of $2 \times 2$ is one possible solution to the considered problem, with performance similar to that of network structures with different kernels. In other words, although different kernel sizes can also be used for this problem, in this work, we have first considered a hyperparameter tuning process providing the
sufficient performance for the considered scenario with less computational complexity [11], [13], [30].

The computational workload of a CNN results from the intensive use of arithmetic operations in its layers. Most of these operations occur in the convolutional part of the network. Hence, the convolutional layers account for more than 90% of the execution time during inference [43]. In contrast, most of the CNN weights reside in the fully connected layers, which require approximately 90% of the memory because of their large number of weights [43]. Hence, the complexity of a CNN scales with the number of parameters and the number of layers. The layers of the proposed CNN structure are described above, and the number of parameters can be calculated as $C^2 \cdot 2N_{cv}(wh + 1) + \left([N_{fc1} + 1] + [N_{fc2} + 1]\right) \cdot \frac{50}{100}$ [43]. Here, $C = 3$ is the number of channels, $w = h = 2$ is the filter size, and $N_{cv} = 256$ is the number of filters in both convolutional layers. The variables $N_{fc1} = N_{fc2} = 2048$ denote the number of units in the fully connected layers, each with a 50% dropout probability. Hence, the CNN-MIMO structure in Fig. 2 has 41481 parameters.

C. Training

The CNN structure in Fig. 2 is realized and trained in MATLAB on a PC with a single GPU and a 768-core processor. We used the stochastic gradient descent algorithm with momentum 0.9 [44] and updated the network parameters with a learning rate of 0.005 and a mini-batch size of 500 samples for 100 epochs. As the loss function², we used the MSE given by $\mathcal{L} = \frac{1}{T}\sum_{t=1}^{T} \left\| \mathbf{z}^{(t)} - f(\mathbf{X}^{(t)}) \right\|_2^2$, where $f(\mathbf{X})$ is a function of the input data $\mathbf{X}$ that represents the nonlinear transformation achieved by the network [11].

To train the proposed CNN structure, $N = 500$ different multi-user scenarios are realized with $K = 3$ users (1500 channel realizations in total), as in Algorithm 2. For each channel matrix, AWGN is added at different power levels, $\mathrm{SNR}_{\mathrm{TRAIN}} \in \{15, 20, 25\}$ dB, with $G = 100$ noise realizations to account for different channel characteristics. The use of multiple $\mathrm{SNR}_{\mathrm{TRAIN}}$ levels provides a wide range of corrupted data during training, which improves the learning and the robustness of the network. Hence, the total size of the training data is $N_R \times N_T \times 3 \times 450000$. In the training process, 80% and 20% of all generated data are selected as the training and validation datasets, respectively. The validation data aid hyperparameter tuning during the training phase and prevent the network from simply memorizing the training data rather than learning general features for accurate prediction on new data. The validation data are also used to test the performance of the network in the simulations for $J_T = 100$ Monte Carlo trials. To prevent similarity between the test data and the training data, we also add synthetic noise to the test data, where the SNR during testing is defined similarly to $\mathrm{SNR}_{\mathrm{TRAIN}}$ as $\mathrm{SNR}_{\mathrm{TEST}} = 20\log_{10}\left(\frac{|[\mathbf{H}]_{i,j}|^2}{\sigma_{\mathrm{TEST}}^2}\right)$. The number of grid points is selected as $\bar{L} = 60$ for the azimuth and $\bar{L} = 20$ for the elevation angular sectors in Algorithm 1. In addition, the propagation environment is modeled with $L = 10$ paths from the users, and all the user directions, i.e., all the azimuth and elevation angles, are selected uniformly at random from the intervals $\phi \in [-30^\circ, 30^\circ]$ and $\theta \in [-20^\circ, 20^\circ]$, respectively [6]. We use a sectorized angular range by setting the antenna gains $g_R(\Theta_R^{(l,k)})$ and $g_T(\Theta_T^{(l,k)})$ to unity within these angular ranges and to zero otherwise, which increases the beamforming gain and reduces interference [5]. Hence, the training data include a large number of scenarios in which the users are randomly located. For each scenario, the corresponding precoders and combiners are obtained by Algorithm 1.

The training stage takes about 5 hours for $T = 450000$ samples. This process includes both the labeling and the input data generation. Note that the training stage is performed only once. Then, in the prediction stage, estimating the hybrid precoders takes only a few milliseconds, as demonstrated in the simulations (see Table I). Hence, the proposed approach, which combines a high data rate with low latency, is attractive for meeting 5G requirements.

The trained network can work for different values of parameters such as the number of users¹ $K$, the number of paths $L$, $\mathrm{SNR}_{\mathrm{TEST}}$ and $\mathrm{SNR}_{\mathrm{TRAIN}}$, which motivates the practical implementation of the proposed DL framework. The proposed CNN structure must be retrained if there is a change in parameters such as $N_T$, $N_R$ or $N_T^{\mathrm{RF}}$, which directly dictate the input and output dimensions of the deep network. The performance of the network also depends on the angular interval selected in $\mathcal{D}$ when designing the feasible sets $\mathcal{F}$ and $\mathcal{W}$, as well as on the antenna gains used to obtain the sectorized angular intervals.

¹ When the network is trained for $K_{\mathrm{TRAIN}}$ users, the output of the network is $\mathbf{z} \in \mathbb{R}^{N_T K_{\mathrm{TRAIN}} + N_R K_{\mathrm{TRAIN}}}$. The trained network can then be used for hybrid beamforming with $K \le K_{\mathrm{TRAIN}}$ users by taking the part of the network output of size $(N_T K + N_R K) \times 1$ that corresponds to those $K$ users.

D. Prediction

Once CNN-MIMO has been trained offline as illustrated in Fig. 2, it can be used to predict the hybrid beamformers. To generate the test data in the prediction stage, we pick users randomly from the validation data, and synthetic noise with $\mathrm{SNR}_{\mathrm{TEST}}$ is also added to the test data to eliminate the similarity between the test and training datasets. The corrupted channel data of each user are fed to the network, and the analog precoders are predicted from the output layer of the network. Then, their phases are quantized in $[0, 2\pi]$ with $2^B$ discrete points. Specifically, the quantized phases belong to the set $\left\{\frac{2\pi b}{2^B}\right\}_{b=1}^{2^B}$, which allows the analog precoders and combiners to be realized in a hardware-efficient manner.

VI. NUMERICAL SIMULATIONS

In this section, we present the performance of the proposed method, CNN-MIMO, via several experiments in which we train the network with the parameters described in Section V-B, namely $N = 500$, $K = 3$, $G = 100$, $\mathrm{SNR}_{\mathrm{TRAIN}} \in \{15, 20, 25\}$ dB, learning rate 0.005, batch size 500 and 100 epochs. We compare the performance of CNN-MIMO with state-of-the-art hybrid precoding techniques, namely manifold optimization (MO) [45], low-resolution hybrid beamforming (LRHB) [8], SOMP [6] and the two-stage hybrid beamforming (TS-HB) algorithm [7]. Since manifold optimization and SOMP were proposed for the single-user scenario, we adapt these algorithms to the multi-user case by using the same interference-cancellation strategy as in [7]. CNN-MIMO is also compared with the DL-based approach MLP proposed in [27]. MLP is designed as described in [27] but adapted to the multi-user scenario and trained with the same training data as CNN-MIMO. As another benchmark, denoted as “No interference” in the simulations, we present the performance of fully digital beamforming and combining in which the interference is completely eliminated. In addition, the performance of the precoders used in the test data (obtained from Algorithm 1) is labeled “Algorithm 1” in the experiments.
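The sample counts quoted for training can be cross-checked with a few lines of Python. This is a minimal sketch, not the authors' MATLAB pipeline. It assumes that each of the $NK = 1500$ channel realizations is corrupted $G = 100$ times at each of the three $\mathrm{SNR}_{\mathrm{TRAIN}}$ levels, an assumption that reproduces the stated $T = 450000$. The names `training_budget`, `TrainingBudget` and `input_shape` are hypothetical.

```python
# Minimal sketch (not the authors' MATLAB code): bookkeeping for the
# CNN-MIMO training set. Assumption: every one of the N*K channel
# realizations is corrupted G times at each SNR_TRAIN level.
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingBudget:
    total_samples: int      # T
    train_samples: int      # 80% split
    val_samples: int        # 20% split
    iters_per_epoch: int    # SGD updates per epoch (full mini-batches only)
    total_iters: int        # SGD updates over all epochs


def training_budget(n_scenarios=500, n_users=3, n_noisy=100,
                    snr_train_db=(15, 20, 25), train_frac=0.8,
                    batch_size=500, epochs=100):
    """Hypothetical helper; arguments mirror N, K, G and the SGD settings."""
    realizations = n_scenarios * n_users                  # N*K channel matrices
    total = realizations * n_noisy * len(snr_train_db)    # noisy copies per level
    n_train = round(train_frac * total)
    n_val = total - n_train
    iters = n_train // batch_size
    return TrainingBudget(total, n_train, n_val, iters, iters * epochs)


def input_shape(n_r=9, n_t=36, channels=3):
    """Per-sample input tensor N_R x N_T x 3 (C = 3 input channels)."""
    return (n_r, n_t, channels)


if __name__ == "__main__":
    print(training_budget())
    print(input_shape())
```

With the default arguments, the sketch gives 450000 samples, split into 360000 training and 90000 validation samples. That corresponds to 720 mini-batch updates per epoch and 72000 updates over 100 epochs. Because 360000 is a multiple of 500, no partial mini-batch arises.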
[Figure: sum-rate versus SNR (dB), comparing No Interference, Manifold Optimization, Algorithm 1, CNN-MIMO, LRHB, MLP, TS-HB and SOMP.]
Fig. 3. Sum-rate versus SNR ($N_T = 36$, $N_R = 9$, $K = 3$, $B = 3$ and $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB).
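The test-data corruption behind the $\mathrm{SNR}_{\mathrm{TEST}}$ values used in these experiments, for instance $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB in Fig. 3, can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB code. It solves the definition $\mathrm{SNR}_{\mathrm{TEST}} = 20\log_{10}(|[\mathbf{H}]_{i,j}|^2/\sigma_{\mathrm{TEST}}^2)$ from the training subsection literally for a per-entry noise variance. It also assumes circularly symmetric complex Gaussian noise. The helper names `noise_variance` and `corrupt`, and the random $9 \times 36$ example channel, are hypothetical.

```python
# Minimal sketch (not the authors' code) of the test-data corruption.
# It follows the definition SNR_TEST = 20*log10(|[H]_ij|^2 / sigma^2)
# literally, entry by entry, and assumes circularly symmetric complex AWGN.
import math
import random


def noise_variance(h_ij: complex, snr_test_db: float) -> float:
    """sigma^2_TEST for one channel entry, solved from the SNR_TEST definition."""
    return abs(h_ij) ** 2 / 10 ** (snr_test_db / 20)


def corrupt(H, snr_test_db, rng=None):
    """Return a noisy copy of H, given as a list of rows of complex entries."""
    rng = rng or random.Random()
    noisy = []
    for row in H:
        new_row = []
        for h in row:
            # Real and imaginary parts each carry half of the noise power.
            std = math.sqrt(noise_variance(h, snr_test_db) / 2)
            new_row.append(h + complex(rng.gauss(0.0, std), rng.gauss(0.0, std)))
        noisy.append(new_row)
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical N_R x N_T = 9 x 36 channel with i.i.d. entries, for illustration only.
    H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(36)] for _ in range(9)]
    H_test = corrupt(H, snr_test_db=20, rng=rng)
    print(len(H_test), len(H_test[0]))
```

Because the definition is applied per entry, stronger channel entries receive proportionally more noise. Computing a single variance from the average entry power would be an alternative reading of the text.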
[Figure: (a) sum-rate versus $\mathrm{SNR}_{\mathrm{TEST}}$ (dB); (b) and (c) RMSE versus $\mathrm{SNR}_{\mathrm{TEST}}$ (dB).]
Fig. 4. Performance comparison for corrupted channel data. In (a), the sum-rate versus $\mathrm{SNR}_{\mathrm{TEST}}$ is given, whereas the RMSE for the precoder $\mathbf{F}_{\mathrm{RF}}$ and the combiner $\mathbf{W}_{\mathrm{RF}}$ are shown in (b) and (c), respectively ($N_T = 36$, $N_R = 9$, $K = 3$, $B = 3$ and SNR $= 0$ dB).

In Fig. 3, we present the achievable sum-rate of the algorithms for different SNR levels. The design parameters of CNN-MIMO are given in Section IV-B. Moreover, we select the number of antennas per BS and per user as $N_T = 36$ and $N_R = 9$, respectively. Synthetic noise is added to both the channel matrices and the array responses with $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB, and $B = 3$ quantization bits are used. The number of users is $K = 3$, and there are $L = 10$ paths for each user. As benchmarks, we use fully digital beamforming and the MO algorithm, which has the best performance since it obtains near-optimum analog and baseband precoders. Our CNN approach follows the performance of the MO algorithm and, apart from these benchmarks, provides the highest sum-rate among the compared algorithms. Notably, although LRHB is the state-of-the-art technique based on phase extraction and is regarded as the best-performing technique in the literature [8], CNN-MIMO outperforms it. MLP performs worse because it lacks the feature extraction achieved by the convolutional layers in CNN-MIMO. In particular, the effectiveness of CNN-MIMO can be attributed to the maximization of the sum-rate by visiting all possible combinations of the analog parts at both the receiver and the transmitter side through an exhaustive search, together with a well-trained deep network. The ultimate performance of CNN-MIMO would be obtained if CNN-MIMO yielded exactly the same output as the labels obtained by Algorithm 1. Hence, the performance of CNN-MIMO is limited by the performance of Algorithm 1. We observe that the performance of CNN-MIMO is close to that of Algorithm 1, and the gap between the two is due to the corruption in the input data. SOMP and TS-HB perform worse than CNN-MIMO. In particular, since SOMP was initially proposed for the single-user case, we have adapted it to the multi-user scenario, where the analog precoders are designed based on the similarity between the optimum precoder and the analog precoders. As a result, SOMP does not always find the optimum weights maximizing the sum-rate [31]. The TS-HB algorithm performs better than SOMP since it is based on the maximization of the sum-rate, and its performance converges to that of SOMP when there is a single path from each user.

The analog precoders are designed with discrete phase shifters of constant modulus to steer the beam in spatial precoding. To assess the effect of the phase-shifter resolution, we present the sum-rate of the algorithms for different quantization resolutions, where the phases of the analog precoders and combiners are quantized with $B \in \{1, \dots, 8\}$ bits. The results are depicted in Fig. 5, where we observe that the other algorithms converge after 4 bits while, remarkably, the proposed CNN approach achieves a higher sum-rate starting from one-bit quantization.

In Fig. 6(a), the performance is evaluated for a varying number of users, namely $K \in \{2, \dots, 8\}$, where $L = 10$ is fixed. Notably, CNN-MIMO performs better than the other algorithms. In particular, the gap between “No interference” and CNN-MIMO becomes larger as $K$ increases. We observe that MLP outperforms LRHB for $K \ge 5$ and exhibits robust performance similar to CNN-MIMO, with a certain performance loss. The main reason is the use of training data prepared with Algorithm 1, which provides more accurate beamformers than the other algorithms. We also see that CNN-MIMO closely follows the performance of Algorithm 1. However, the gap to “No interference” appears due to the insufficient performance of the interference cancellation. This suggests developing more effective algorithms to handle the interference among the users.
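The $B$-bit phase quantization behind the Fig. 5 experiment can be sketched in Python. In the prediction stage, the quantized phases belong to $\{2\pi b/2^B\}_{b=1}^{2^B}$. This is a minimal illustration rather than the authors' implementation. It assumes nearest-level rounding on the unit circle, which the text does not specify. The function names `phase_levels`, `quantize_phase` and `quantized_analog_weights` are hypothetical.

```python
# Minimal sketch (not the authors' code) of B-bit phase quantization.
# Assumption: each predicted phase is mapped to the nearest admissible level
# on the unit circle; the text does not state the rounding rule.
import cmath
import math


def phase_levels(bits: int) -> list:
    """Admissible phases {2*pi*b / 2^B : b = 1, ..., 2^B}."""
    n = 2 ** bits
    return [2 * math.pi * b / n for b in range(1, n + 1)]


def quantize_phase(phi: float, bits: int) -> float:
    """Nearest admissible level to phi (radians), measured around the circle."""
    n = 2 ** bits
    step = 2 * math.pi / n
    k = round((phi % (2 * math.pi)) / step)  # nearest multiple of the step, 0..n
    b = k % n or n                           # k = 0 and k = n both mean 2*pi (b = 2^B)
    return 2 * math.pi * b / n


def quantized_analog_weights(phases, bits):
    """Constant-modulus weights exp(j*phi_q); any 1/sqrt(N) normalization is omitted."""
    return [cmath.exp(1j * quantize_phase(p, bits)) for p in phases]


if __name__ == "__main__":
    for bits in (1, 3):
        print(bits, [round(p, 3) for p in phase_levels(bits)])
    print(quantized_analog_weights([0.1, 1.0, -2.0], bits=3))
```

For $B = 1$, only the phases $\pi$ and $2\pi$ (equivalently 0) are available, which is the one-bit case mentioned for Fig. 5. For $B = 3$, as used in Figs. 3 and 4, the eight levels are spaced by $\pi/4$.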
[Figure: sum-rate versus number of quantization bits, $B = 1, \dots, 8$.]
Fig. 5. Sum-rate versus angular resolution of the analog precoders ($N_T = 36$, $N_R = 9$, SNR $= 0$ dB, $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB).

The feedback data, namely the channel matrices $\{\mathbf{H}_k\}_{k=1}^{K}$ and the feasible array response sets $\mathcal{F}$ and $\mathcal{W}$, may not always be perfectly available. To evaluate the robustness of the algorithms against corrupted feedback, we simulate their performance for different $\mathrm{SNR}_{\mathrm{TEST}}$ levels with the same setting as in the previous simulation. In this case, complex AWGN is added to both the channel and the array response data to emulate deviations in the feedback data. The results are presented in Fig. 4. The achievable sum-rate is shown in Fig. 4(a), and the RMS errors of the precoder $\mathbf{F}_{\mathrm{RF}}$ and the combiner $\mathbf{W}_{\mathrm{RF}}$ are shown in Figs. 4(b) and 4(c), respectively. Note that Algorithm 1 is fed with perfect CSI to demonstrate the best achievable performance. As can be seen from Fig. 4, CNN-MIMO is more robust against the corruption in the channel data than the other methods. Note that manifold optimization, LRHB, MLP and CNN-MIMO are affected only by the corruption in the channel data, since they estimate the analog precoders directly. In contrast, SOMP and TS-HB require the feasible sets $\mathcal{F}$ and $\mathcal{W}$ as input. As a result, the performance of TS-HB and SOMP relies heavily on the accuracy of both the channel matrix and the array response sets. Moreover, CNN-MIMO needs the channel data and the feasible sets $\mathcal{F}$ and $\mathcal{W}$ only in the training stage, to obtain the labels; they are not used in the prediction stage. SOMP and TS-HB, however, require this information to solve the hybrid precoding problem. Overall, these results show the robustness of the proposed CNN-MIMO.

In Fig. 6(b), we evaluate the performance of CNN-MIMO when the number of paths for each user is not fixed. To this end, we train the network with the same parameters, except that $L$ is selected uniformly at random from the interval $[1, 10]$. Using different values of $L$ for different users reduces the similarity between the channel data of the users, and CNN-MIMO still performs satisfactorily, consistent with the observations made when $L$ is fixed.

TABLE I
COMPUTATION TIMES (IN SECONDS).

$N_T$   Algorithm 1   CNN-MIMO   MLP      LRHB     TS-HB    SOMP
4       0.1061        0.0039     0.0034   0.0059   0.0093   0.0122
16      0.1164        0.0043     0.0038   0.0113   0.0103   0.0139
64      0.1175        0.0049     0.0045   0.0159   0.0108   0.0216
100     0.1242        0.0052     0.0049   0.0318   0.0125   0.0282

In Fig. 7, we illustrate the performance for a varying number of BS antennas. Similar observations can be made: CNN-MIMO performs better than the other algorithms. Furthermore, Table I lists the computation times of the algorithms, in seconds, for different numbers of BS antennas. The complexity of Algorithm 1 is the highest due to the exhaustive search. The DL-based approaches, i.e., CNN-MIMO and MLP, have lower computation times than LRHB and the remaining methods. MLP has slightly lower complexity than CNN-MIMO due to its simpler structure; however, it performs worse, as shown in the previous experiments. In addition, the complexity of TS-HB and SOMP depends on the number of elements in the feasible sets $\mathcal{F}$ and $\mathcal{W}$. TS-HB has a lower computation time than SOMP since it does not use an OMP stage to obtain the precoders but selects
the ones with the highest channel gain from the codebook [7]. It is also worth mentioning the trade-off between the computation time and the performance of CNN-MIMO. While the MO algorithm has slightly better performance than CNN-MIMO, the proposed DL framework computes the hybrid beamformers significantly faster than the MO algorithm. The complexity of MO also increases at a higher rate than that of CNN-MIMO. This observation demonstrates that CNN-MIMO is advantageous in terms of computational complexity even for a very large number of antennas, which is the case in 5G systems. Hence, we believe the proposed approach can be a promising technique for mmWave systems where low complexity and robust performance are required. The run times of CNN-MIMO can be further reduced by implementing the network in dedicated hardware such as FPGAs. For example, domain-specific architectures have been implemented in [46] for AlexNet [43] and VGG-16 for real-time image classification, achieving 194 GOP/s (billions of fixed-point operations per second) while consuming only 300 mW. These promising results encourage us to develop more energy-efficient DL approaches for problems in communication systems.

[Figure: (a) sum-rate versus number of users; (b) spectral efficiency (bits/s/Hz) versus number of users.]
Fig. 6. Sum-rate versus number of users. The number of paths is fixed as $L = 10$ in (a) and selected uniformly at random from the interval $[1, 10]$ in (b) ($N_T = 100$, $N_R = 9$, SNR $= 0$ dB and $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB).

[Figure: sum-rate versus number of BS antennas.]
Fig. 7. Sum-rate versus number of BS antennas ($K = 3$, $N_R = 9$, SNR $= 0$ dB and $\mathrm{SNR}_{\mathrm{TEST}} = 20$ dB).

VII. CONCLUSIONS

We proposed a DL framework for hybrid precoding design in multi-user mmWave MIMO systems. The proposed network architecture is a CNN that accepts the channel matrices of the users as input and outputs the analog precoders and combiners. The proposed technique was compared with both optimization- and greedy-based approaches as well as with DL-based techniques such as MLP. The effectiveness of the proposed CNN approach was evaluated through several experiments, which showed that CNN-MIMO achieves better performance than the state-of-the-art hybrid precoding approaches with less computation time. The effectiveness of CNN-MIMO can be attributed to the use of exhaustive search to obtain the best analog precoders and combiners in the training stage. To train the network, a large training dataset of nearly half a million samples was used. Notably, the large training dataset provides robust performance against deviations in the channel data. Moreover, we showed that CNN-MIMO achieves more robust results in the presence of imperfections in the channel matrix and array responses.

REFERENCES

[1] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An Overview of Signal Processing Techniques for Millimeter Wave MIMO Systems,” IEEE J. Sel. Topics Signal Process., vol. 10, pp. 436–453, April 2016.
[2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What Will 5G Be?,” IEEE J. Sel. Areas Commun., vol. 32, pp. 1065–1082, June 2014.
[3] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling Up MIMO: Opportunities and Challenges with Very Large Arrays,” IEEE Signal Process. Mag., vol. 30, pp. 40–60, Jan 2013.
[4] L. Wei, R. Q. Hu, Y. Qian, and G. Wu, “Key elements to enable millimeter wave communications for 5G wireless systems,” IEEE Wireless Communications, vol. 21, pp. 136–143, December 2014.
[5] A. Alkhateeb, O. E. Ayach, G. Leus, and R. W. Heath, “Hybrid precoding for millimeter wave cellular systems with partial channel knowledge,” in 2013 Information Theory and Applications Workshop (ITA), pp. 1–5, Feb 2013.
[6] O. E. Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially Sparse Precoding in Millimeter Wave MIMO Systems,” IEEE Trans. Wireless Commun., vol. 13, pp. 1499–1513, March 2014.
[7] A. Alkhateeb, G. Leus, and R. W. Heath, “Limited feedback hybrid precoding for Multi-User millimeter wave systems,” IEEE Trans. Wireless Commun., vol. 14, pp. 6481–6494, Nov. 2015.
[8] Z. Wang, M. Li, Q. Liu, and A. L. Swindlehurst, “Hybrid Precoder and Combiner Design With Low-Resolution Phase Shifters in mmWave MIMO Systems,” IEEE J. Sel. Topics Signal Process., vol. 12, pp. 256–269, May 2018.
[9] M. Kokshoorn, H. Chen, Y. Li, and B. Vucetic, “Beam-On-Graph: Simultaneous Channel Estimation for mmWave MIMO Systems With Multiple Users,” IEEE Trans. Commun., vol. 66, pp. 2931–2946, July 2018.
[10] X. Zhai, Y. Cai, Q. Shi, M. Zhao, G. Y. Li, and B. Champagne, “Joint Transceiver Design With Antenna Selection for Large-Scale MU-MIMO mmWave Systems,” IEEE J. Sel. Areas Commun., vol. 35, pp. 2085–2096, Sep. 2017.
[11] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[12] D. Yu and L. Deng, “Deep learning and its applications to signal and information processing [Exploratory DSP],” IEEE Signal Process. Mag., vol. 28, pp. 145–154, Jan 2011.
[13] A. M. Elbir, K. V. Mishra, and Y. C. Eldar, “Cognitive radar antenna selection via deep learning,” IET Radar, Sonar & Navigation, vol. 13, pp. 871–880, June 2019.
[14] Z. Jiang, S. Chen, A. F. Molisch, R. Vannithamby, S. Zhou, and Z. Niu, “Exploiting Wireless Channel State Information Structures Beyond Linear Correlations: A Deep Learning Approach,” IEEE Commun. Mag., vol. 57, pp. 28–34, March 2019.
[15] M. Feng and S. Mao, “Dealing with Limited Backhaul Capacity in Millimeter-Wave Systems: A Deep Reinforcement Learning Approach,” IEEE Commun. Mag., vol. 57, pp. 50–55, March 2019.
[16] H. Ye, G. Y. Li, and B. Juang, “Power of Deep Learning for Channel Estimation and Signal Detection in OFDM Systems,” IEEE Wireless Communications Letters, vol. 7, pp. 114–117, Feb 2018.
[17] H. Huang, J. Yang, H. Huang, Y. Song, and G. Gui, “Deep Learning for Super-Resolution Channel Estimation and DOA Estimation Based Massive MIMO System,” IEEE Trans. Veh. Technol., vol. 67, pp. 8549–8560, Sep. 2018.
[18] Y. Long, Z. Chen, J. Fang, and C. Tellambura, “Data-Driven-Based Analog Beam Selection for Hybrid Beamforming Under mm-Wave Channels,” IEEE J. Sel. Topics Signal Process., vol. 12, pp. 340–352, May 2018.
[19] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” in 2017 IEEE 18th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 1–5, July 2017.
[20] S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep Reinforcement Learning for Dynamic Multichannel Access in Wireless Networks,” IEEE Transactions on Cognitive Communications and Networking, vol. 4, pp. 257–265, June 2018.
[21] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, “Deep Learning Based Communication Over the Air,” IEEE J. Sel. Topics Signal Process., vol. 12, pp. 132–143, Feb 2018.
[22] V. Raj and S. Kalyani, “Backpropagating Through the Air: Deep Learning at Physical Layer Without Channel Models,” IEEE Commun. Lett., vol. 22, pp. 2278–2281, Nov 2018.
[23] C. Wen, W. Shih, and S. Jin, “Deep Learning for Massive MIMO CSI Feedback,” IEEE Wireless Communications Letters, vol. 7, pp. 748–751, Oct 2018.
[24] V. Raj and S. Kalyani, “Backpropagating through the air: Deep learning at physical layer without channel models,” IEEE Commun. Lett., vol. 22, pp. 2278–2281, Nov. 2018.
[25] P. Dong, H. Zhang, G. Y. Li, N. Naderializadeh, and I. Gaspar, “Deep CNN based channel estimation for mmWave massive MIMO systems,” arXiv e-prints, p. arXiv:1904.06761, 2019.
[26] H. He, C. Wen, S. Jin, and G. Y. Li, “Deep Learning-Based Channel Estimation for Beamspace mmWave Massive MIMO Systems,” IEEE Wireless Communications Letters, vol. 7, pp. 852–855, Oct 2018.
[27] H. Huang, Y. Song, J. Yang, G. Gui, and F. Adachi, “Deep-Learning-based Millimeter-Wave Massive MIMO for Hybrid Precoding,” IEEE Trans. Veh. Technol., pp. 1–1, 2019.
[28] T. Lin and Y. Zhu, “Beamforming Design for Large-Scale Antenna Arrays Using Deep Learning,” arXiv e-prints, p. arXiv:1904.03657, Apr 2019.
[29] A. Alkhateeb, S. P. Alex, P. Varkey, Y. Li, Q. Z. Qu, and D. Tujkovic, “Deep Learning Coordinated Beamforming for Highly-Mobile Millimeter Wave Systems,” IEEE Access, vol. 6, pp. 37328–37348, 2018.
[30] A. M. Elbir, “CNN-based precoder and combiner design in mmWave MIMO systems,” IEEE Commun. Lett., vol. 23, no. 7, pp. 1240–1243, 2019.
[31] A. M. Elbir and K. V. Mishra, “Joint Antenna Selection and Hybrid Beamformer Design using Unquantized and Quantized Deep Learning Networks,” arXiv e-prints, p. arXiv:1905.03107, May 2019.
[32] X. Yu, J. Shen, J. Zhang, and K. B. Letaief, “Alternating minimization algorithms for hybrid precoding in millimeter wave MIMO systems,” IEEE J. Sel. Top. Signal Process., vol. 10, pp. 485–500, Apr. 2016.
[33] E. Torkildson, C. Sheldon, U. Madhow, and M. Rodwell, “Millimeter-Wave Spatial Multiplexing in an Indoor Environment,” in 2009 IEEE Globecom Workshops, pp. 1–6, Nov 2009.
[34] R. Méndez-Rial, C. Rusu, A. Alkhateeb, N. González-Prelcic, and R. W. Heath, “Channel estimation and hybrid combining for mmWave: Phase shifters or switches?,” in 2015 Information Theory and Applications Workshop (ITA), pp. 90–97, Feb 2015.
[35] V. Raghavan and A. M. Sayeed, “Multi-antenna capacity of sparse multipath channels,” IEEE Trans. Inf. Theory, 2006.
[36] T. S. Rappaport, F. Gutierrez, E. Ben-Dor, J. N. Murdock, Y. Qiao, and J. I. Tamir, “Broadband Millimeter-Wave Propagation Measurements and Models Using Adaptive-Beam Antennas for Outdoor Urban Cellular Communications,” IEEE Trans. Antennas Propag., vol. 61, pp. 1850–1859, April 2013.
[37] I. A. Hemadeh, K. Satyanarayana, M. El-Hajjar, and L. Hanzo, “Millimeter-Wave Communications: Physical Channel Models, Design Considerations, Antenna Constructions, and Link-Budget,” IEEE Commun. Surveys Tuts., vol. 20, pp. 870–913, Second quarter 2018.
[38] H. Huang, J. Yang, H. Huang, Y. Song, and G. Gui, “Deep learning for super-resolution channel estimation and DOA estimation based massive MIMO system,” IEEE Trans. Veh. Technol., vol. 67, pp. 8549–8560, Sept 2018.
[39] Z. Marzi, D. Ramasamy, and U. Madhow, “Compressive Channel Estimation and Tracking for Large Arrays in mm-Wave Picocells,” IEEE J. Sel. Topics Signal Process., vol. 10, pp. 514–527, April 2016.
[40] J. Wang, Z. Lan, C.-W. Pyo, T. Baykas, C.-S. Sum, M. A. Rahman, J. Gao, R. Funada, F. Kojima, H. Harada, and S. Kato, “Beam codebook based beamforming protocol for multi-Gbps millimeter-wave WPAN systems,” IEEE J. Sel. Areas Commun., vol. 27, pp. 1390–1399, October 2009.
[41] E. Björnson, L. Van der Perre, S. Buzzi, and E. G. Larsson, “Massive MIMO in Sub-6 GHz and mmWave: Physical, Practical, and Use-Case Differences,” arXiv e-prints, p. arXiv:1803.11023, Mar 2018.
[42] J. Cheng, J. Wu, C. Leng, Y. Wang, and Q. Hu, “Quantized CNN: A unified approach to accelerate and compress convolutional networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 10, pp. 4730–4743, 2018.
[43] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[44] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, New York, 2006.
[45] X. Yu, J. Shen, J. Zhang, and K. B. Letaief, “Alternating Minimization Algorithms for Hybrid Precoding in Millimeter Wave MIMO Systems,” IEEE J. Sel. Topics Signal Process., vol. 10, pp. 485–500, April 2016.
[46] B. Sun, L. Yang, P. Dong, W. Zhang, J. Dong, and C. Young, “Ultra Power-Efficient CNN Domain Specific Accelerator with 9.3TOPS/Watt for Mobile and Embedded Applications,” arXiv e-prints, p. arXiv:1805.00361, Apr 2018.