Spatial Deep Learning For Wireless Scheduling

Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

1

Spatial Deep Learning for Wireless Scheduling


Wei Cui, Kaiming Shen, Student Member, IEEE, and Wei Yu, Fellow, IEEE

Abstract—The optimal scheduling of interfering links in optimizing the schedule based on the estimated channels.
a dense wireless network with full frequency reuse is a This model-based approach, however, suffers from two key
challenging task. The traditional method involves first esti- shortcomings. First, the need to estimate not only the direct
mating all the interfering channel strengths then optimizing
the scheduling based on the model. This model-based method channels but also all the interfering channels is resource
is however resource intensive and computationally hard, intensive. In a network of N transmitter-receiver pairs, N 2
arXiv:1808.01486v2 [eess.SP] 16 Dec 2018

because channel estimation is expensive in dense networks; channels need to be estimated within each coherence block.
further, finding even a locally optimal solution of the resulting Training takes valuable resources away from the actual data
optimization problem may be computationally complex. This transmissions; further, pilot contamination is inevitable in
paper shows that by using a deep learning approach, it
is possible to bypass channel estimation and to schedule large networks. Second, the achievable data rates in an
links efficiently based solely on the geographic locations of interfering environment are nonconvex functions of the
transmitters and receivers for networks in which the channels transmit powers. Moreover, scheduling variables are binary.
are largely functions of distance dependent path-losses. This Hence, even with full channel knowledge, the optimization
is accomplished by unsupervised training over randomly of scheduling is a nonconvex integer programming problem
deployed networks, and by using a novel neural network
architecture that takes the geographic spatial convolutions for which finding an optimal solution is computationally
of the interfering or interfered neighboring nodes as input complex and is challenging for real-time implementation.
over multiple feedback stages to learn the optimum solution. This paper proposes a new approach, named spatial
The resulting neural network gives near-optimal performance
for sum-rate maximization and is capable of generalizing to
learning, to address the above two issues. Our key idea
larger deployment areas and to deployments of different link is to recognize that in many deployment scenarios, the
densities. Moreover, to provide fairness, this paper proposes a optimal link scheduling does not necessarily require the
novel scheduling approach that utilizes the sum-rate optimal exact channel estimates, and further the interference pattern
scheduling algorithm over judiciously chosen subsets of links in a network is to a large extent determined by the relative
for maximizing a proportional fairness objective over the
network. The proposed approach shows highly competitive
locations of the transmitters and receivers. Hence, it ought
and generalizable network utility maximization results. to be possible to learn the optimal scheduling based
solely on the geographical locations of the neighboring
Index Terms—Deep learning, discrete optimization, geo-
graphic location, proportional fairness, scheduling, spatial
transmitters/receivers, thus bypassing channel estimation
convolution altogether. Toward this end, this paper proposes a neural
network architecture that takes the geographic spatial con-
volution of the interfering or interfered neighboring trans-
I. I NTRODUCTION mitters/receivers as input, and learns the optimal scheduling
Scheduling of interfering links is one of the most fun- in a densely deployed D2D network over multiple stages
damental tasks in wireless networking. Consider a densely based on the spatial parameters alone.
deployed device-to-device (D2D) network with full fre- We are inspired by the recent explosion of successful
quency reuse, in which nearby links produce significant in- applications of machine learning techniques [1], [2] that
terference for each other whenever they are simultaneously demonstrate the ability of deep neural networks to learn
activated. The task of scheduling amounts to judiciously rich patterns and to approximate arbitrary function map-
activating a subset of mutually “compatible” links so as pings [3]. We further take advantage of the recent progress
to avoid excessive interference for maximizing a network on fractional programming methods for link scheduling
utility. [4]–[6] that allows us to compare against the state-of-
The traditional approach to link scheduling is based the-art benchmark. The main contribution of this paper
on the paradigm of first estimating the interfering chan- is a specifically designed neural network architecture that
nels (or at least the interference graph topology), then facilitates the spatial learning of geographical locations of
Manuscript submitted to IEEE Journal on Selected Areas in Com- interfering or interfered nodes and is capable of achieving
munications on July 21, 2018, revised December 18, 2018. This work a large portion of the optimum sum rate of the state-of-the-
is supported by Natural Science and Engineering Research Council art algorithm in a computationally efficient manner, while
(NSERC) via the Canada Research Chairs program. The materials in this
paper have been be presented in part at the IEEE Global Communication requiring no explicit channel state information (CSI).
Conference (Globecom), Abu Dhabi, December 2018. The authors are Traditional approach to scheduling over wireless inter-
with The Edward S. Rogers Sr. Department of Electrical and Computer
Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e- fering links for sum rate maximization are all based on
mails: {cuiwei2, kshen, weiyu}@ece.utoronto.ca). (non-convex) optimization, e.g., greedy heuristic search
2

[7], iterative methods for achieving quality local optimum This paper considers the objective function of max-
[4], [8], methods based on information theory considera- imizing the weighted sum rate over the N users over
tions [9], [10] or hyper-graph coloring [11], [12], or meth- each scheduling slot. More specifically, for fixed values
ods for achieving the global optimum but with worst-case of weights wi , the scheduling problem is formulated as
exponential complexity such as polyblock-based optimiza- N
tion [13] or nonlinear column generation [14]. The recent
X
maximize wi Ri (2a)
re-emergence of machine learning has motivated the use x
i=1
of neural networks for wireless network optimization. This subject to xi ∈ {0, 1}, ∀i. (2b)
paper is most closely related to the recent work of [15], [16]
in adapting deep learning to perform power control and The weights wi indicate the priorities assigned to each
[17] in utilizing ensemble learning to solve a closely related user, (i.e., the higher priority users are more likely to be
problem, but we go one step further than [15]–[17] in that scheduled). The overall problem is a challenging discrete
we forgo the traditional requirement of CSI for spectrum optimization problem, due to the complicated interactions
optimization. We demonstrate that in wireless networks for between different links through the interference terms
which the channel gains largely depend on the path-losses, in the signal-to-interference-and-noise (SINR) expressions,
the location information (which can be easily obtained via and the different priority weights each user may have.
global positioning system) can be effectively used as a The paper begins by treating the scheduling problem
proxy for obtaining near-optimum solution, thus opening with equal weights w1 = w2 = · · · = wN , equivalent
the door for much wider application of learning theory to to a sum-rate maximization problem. The second part
resource allocation problems in wireless networking. of this paper deals with the more challenging problem
The rest of the paper is organized as follows. Section II of scheduling under adaptive weights w1 , w2 , · · · , wN for
establishes the system model. Section III proposes a deep maximizing a network utility. The assignment of weights
learning based approach for wireless link scheduling for is typically based on upper-layer considerations, e.g., as
sum-rate maximization. The performance of the proposed function of the queue length in order to minimize delay or
method is provided in Section IV. Section V discusses to stabilize the queues [19], or as function of the long-term
how to adapt the proposed method for proportionally fair average rate of each user in order to provide fairness across
scheduling. Conclusions are drawn in Section VI. the network [20], or as combination of both.
This paper utilizes unsupervised training to optimize
the parameters of the neural network. The results will be
II. W IRELESS L INK S CHEDULING compared against multiple benchmarks including a recently
developed and the-state-of-art fractional programming ap-
Consider a scenario of N independent D2D links located proach (referred to as FPLinQ or FP) [4] for obtaining
in a two-dimensional region. The transmitter-receiver dis- high-quality local optimum benchmark solutions. We re-
tance can vary from links to links. We use pi to denote the mark that the FPLinQ benchmark solutions can also be uti-
fixed transmit power level of the ith link, if it is activated. lized as training targets for supervised training of the neural
Moreover, we use hij ∈ C to denote the channel from the network, and a numerical comparison is provided later in
transmitter of the jth link to the receiver of the ith link, the paper. FPLinQ relies on a transformation of the SINR
and use σ 2 to denote the background noise power level. expression that decouples the signal and the interference
Scheduling occurs in a time slotted fashion. In each time terms and a subsequent coordinated ascent approach to find
slot, let xi ∈ {0, 1} be an indicator variable for each link i, the optimal transmit power for all the links. The FPLinQ
which equals to 1 if the link is scheduled and 0 otherwise. algorithm is closely related to the weighted minimum
We assume full frequency reuse with bandwidth W . Given mean-square-error (WMMSE) algorithm for weighted sum-
a set of scheduling decisions xi , the achievable rate Ri for rate maximization [8]. For the scheduling task, FPLinQ
link i in the time slot can be computed as quantizes the optimized power in a specific manner to
!
|hii |2 pi xi obtain the optimized binary scheduling variables.
Ri = W log 1 + P , (1)
Γ( j6=i |hij |2 pj xj + σ 2 ) III. D EEP L EARNING BASED L INK S CHEDULING FOR
where Γ is the SNR gap to the information theoretical S UM -R ATE M AXIMIZATION
channel capacity, due to the use of practical coding and We begin by exploring the use of deep neural network
modulation for the linear Gaussian channel [18]. Because for scheduling, while utilizing only location information,
of the interference between the links, activating all the links under the sum-rate maximization criterion. The sum-rate
at the same time would yield poor data rates. The wireless maximization problem (i.e., with equal weights) is con-
link scheduling problem is that of selecting a subset of siderably simpler than weighted rate-sum maximization,
links to activate in any given transmission period so as to because all the links have equal priority. We aim to use
maximize some network utility function of the achieved path-losses and the geographical locations information to
rates. determine which subset of links should be scheduled.
3

Transmitter Density Grid

1 2
1 3
1
1

1 1
1
2
2 1
Original Links Layout Layout with Discretized Cells 1
Receiver Density Grid

Fig. 1. Transmitter and receiver density grids

A. Learning Based on Geographic Location Information In fact, if we account for fast fading in addition, the CSI
can be thought of as a stochastic function of GLI
A central goal of this paper is to demonstrate that for
wireless networks in which the channel gains are largely CSI = f (GLI). (3)
functions of distance dependent path-losses, the geographi- While optimization approaches to the wireless link
cal location information is already sufficient as a proxy for scheduling problem aim to find a mapping g(·) from CSI
optimizing link scheduling. This is in contrast to traditional to the scheduling decisions, i.e.,
optimization approaches for solving (2) that require the full
instantaneous CSI, and also in contrast to the recent work x = g(CSI), (4)
[15] that proposes to use deep learning to solve the power the deep learning architecture of this paper aims to capture
control problem by learning the WMMSE optimization directly the mapping from GLI to x, i.e., to learn the
process. In [15], a fully connected neural network is function
designed that takes in the channel coefficient matrix as the x = g(f (GLI)). (5)
input, and produces optimized continuous power variables
as the output to maximize the sum rate. While satisfactory
scheduling performance has been obtained in [15], the
architecture of [15] is not scalable. In a D2D links network B. Transmitter and Receiver Density Grid as Input
with N transmitter-receiver pairs, there are N 2 channel To construct the input to the neural network based on
coefficients. A fully connected neural network with N 2 GLI, we quantize the continuous (dtx rx
i , di ) in a grid form.
nodes in the input layer and N output layer would require Without loss of generality, we assume a square `×` meters
at least O(N 3 ) interconnect weights (and most likely much deployment area, partitioned into equal-size square cells
more). Thus, the neural network architecture proposed in with an edge length of `/M , so that there are M 2 cells
[15] has training and testing complexity that grows rapidly in total. We use (s, t) ∈ [1 : M ] × [1 : M ] to index the
with the number of links. cells. For a particular link i, let (stx tx
i , ti ) be the index of
Instead of requiring the full set of CSI between every the cell where the transmitter di is located, and (srx
tx rx
i , ti )
transmitter and every receiver as the inputs to the neu- be the index of the cell where the receiver di is located. rx
ral network {hij }, which has O(N 2 ) entries, this paper We use the tuple (stx tx rx rx
i , ti , si , ti ) to represent the location
proposes to use the geographic location information (GLI) information of the link.
as input, defined as a set of vectors {(dtx rx
i , di )}i , where We propose to construct two density grid matrices of size
2 2
dtx
i ∈ R and drx
i ∈ R are the transmitter and the receiver M × M , denoted by T and R, to represent the density of
locations of the ith link, respectively. Note that the input the active transmitters and receivers, respectively, in the ge-
now scales linearly with the number of links, i.e., O(N ). ographical area. The density grid matrices are constructed
We advocate using GLI as a substitute for CSI because in by simply counting the total number of active transmitters
many wireless deployment scenarios, GLI already captures and receivers in each cell, as illustrated in Fig. 1. The
the main feature of channels: the path-loss and shadow- activation pattern {xi } is initialized as a vector of all 1’s at
ing of a wireless link are mostly functions of distance the beginning. As the algorithm progressively updates the
and location. This is essentially true for outdoor wireless activation pattern, the density grid matrices are updated as
channels, and especially so in rural regions or remote X
areas, where the number of surrounding objects to reflect T (s, t) = xi , (6)
the wireless signals is sparse. An example application is {i|(stx tx
i ,ti )=(s,t)}
X
a sensor network deployed outdoors for environmental R(s, t) = xi . (7)
monitoring purposes. {i|(srx rx
i ,ti )=(s,t)}
4

Original Field in Grids

Raw Weight Parameters


60 3
Convolution Filter
Direct Channel Strength
Estimation

50 2

40 1

30 0
Transmitter’s
location
20 −1 Filter Center Anchor:
Receiver’s location

10 −2
Fig. 3. Extracting direct channel strength from convolution filter
0
0 10 20 30 40 50 60

pre-defined size and trainable parameters. The value of


Fig. 2. A trained spatial convolution filter (in log scale)
each entry of the filter can be interpreted as the channel
coefficient of a transceiver located at a specific distance
from the center of the filter. Through training, the filter
C. Novel Deep Neural Network Structure learns the channel coefficient by adjusting its weights.
Fig. 2 shows a trained filter. As expected, the trained filter
The overall neural network structure for link scheduling exhibits a circular symmetric pattern with radial decay.
with sum-rate objective is an iterative computation graph. The convolution stage described above summarizes two
A key novel feature of the network structure is a forward quantities for each link: the total interference produced by
path including two stages: a convolution stage that captures the transmitter and the total interference the receiver is
the interference patterns of neighboring links based on being exposed to. Furthermore, we can also extract another
the geographic location information and a fully connected important quantity for scheduling from the trained convolu-
stage that captures the nonlinear functional mapping of the tion filter: the direct channel strength. At the corresponding
optimized schedule. Further, we propose a novel feedback relative location of the transmitter from its receiver, the
connection between the iterations to update the state of value of the convolution filter describes the channel gain
optimization. The individual stages and the overall network of the direct link between this transmitter/receiver pair.
structure are described in detail below. The procedure for obtaining this direct channel strength is
1) Convolution Stage: The convolution stage is respon- illustrated in Fig. 3. The direct channel strength is referred
sible for computing two functions, corresponding to that to as DCSi for link i.
of the interference each link causes to its neighbors and 2) Fully Connected Stage: The fully connected stage
the interference each link receives from its neighbors, is the second stage of the forward computation path,
respectively. As a main innovation in the neural network following the convolution stage described above. It takes a
architecture, we propose to use spatial convolutional filters, feature vector extracted for each link as input and produces
whose coefficients are optimized in the training process, an output xi ∈ [0, 1], (which can be interpreted as a relaxed
that operate directly on the transmitter and receiver density scheduling variable or alternatively as continuous power)
grids described in the previous section. The transmitter and for that link.
receiver spatial convolutions are computed in parallel on The feature vector for each link comprises of the fol-
the two grids. At the end, two pieces of information are lowing entries: TxINTi , RxINTi , DCSi , DCSmax , DCSmin ,
computed for the transmitter-receiver pair of each link: xt−1
i . The first three terms have been explained in the
a convolution of spatial geographic locations of all the previous section. DCSmax and DCSmin denote the largest
nearby receivers that the transmitter can cause interference and smallest direct channel strength among links in the
to, and a convolution of spatial geographic locations of entire layout; and xt−1i represents the fully connected stage
all the nearby transmitters that the receiver can experience output at the previous iteration in the overall feedback
interference from. The computed convolutions are referred structure, as described later. The tuple (TxINTi , RxINTi )
to as TxINTi and RxINTi , respectively, for link i. describes the interference between the ith link and its
Since the idea is to estimate the effect of total inter- neighbors, while the triplet (DCSi , DCSmax , DCSmin )
ference each link causes to nearby receivers and effect describes the link’s own channel strength as compared to
of the total interference each link is exposed to, we the strongest and the weakest links in the entire layout.
need to exclude the link’s own transmitter and receiver It is worth noting that the minimum and maximum chan-
in computing the convolutions. This is done by subtracting nel strengths over the layout are chosen here to characterize
the contributions each link’s own transmitter and receiver the range of the direct channel strengths. This is appropriate
in the respective convolution sum. when the D2D link pairwise distances are roughly uniform,
The convolution filter is a 2D square matrix with fixed as we assume in the numerical simulations of this paper.
5

Convolution Filter

Filter Center Anchor: Receiver’s location

ReLu nonlinearity ReLu nonlinearity


f (x) f (x)

x x
1 2
Sigmoid nonlinearity
1 3 f (x)

1 Feature Vector Per Link x

1
Transmitter Density Grid

Filter Center Anchor: Transmitter’s location

1 1
DCS* of the link
1

2
Largest DCS*
2 1 in the layout
Output Per Link
1
Smallest DCS*
Receiver Density Grid
in the layout

Previous Iteration
Allocation (0 ∼ 1) *DCS: Direct Channel Strength

Fig. 4. Forward computation path for a single link with spatial convolutions and link distance as input to a neural network

However, if the D2D link pairwise distances do not follow then able to converge within a small number of iterations.
a uniform distributions, a more robust characterization The feedback stage is designed as following: After the
could be, for example, 10th and 90th percentile values of completion of (t − 1)th forward computation, the x vector
the channel strength distribution, to alleviate the effect of of [0, 1] values is obtained, with each entry representing
potential outliers. the activation status for each of the N links. Then, a new
The value xi for this link is computed based on its forward computation is started with input density grids
feature vector through the functional mapping of a fully prepared by feeding this x vector into (6)-(7). In this way,
connected neural network (denoted here as Ff c ) over the the activation status for all N links are updated in the
feedback iterations indexed by t: density grids for subsequent interference estimations. Note
that the trainable weights of the convolutional filter and the
xti ← Ff c (TxINTi , RxINTi , DCSi , neural network are tied together over the multiple iterations
DCSmax , DCSmin , xt−1
i ). (8) for more efficient training.
The convolution stage and the fully connected stage After a fixed number of iterations, the scheduling deci-
together form one forward computation path for each sions are obtained from the neural network by quantizing
transmitter-receiver pair, as depicted in Fig. 4. In the the x vector from the last iteration into binary values,
implementation, we use two hidden layers with 30 neurons representing the scheduling decisions of the N links.
in each layer to ensure sufficient expressive power of the The overall feedback structure is depicted in Fig. 5. We
neural network. A rectified linear unit (ReLU) is used at emphasize here that the neural network is designed on
each neuron in the hidden layers; a sigmoid nonlinearity is a per-link basis, thus the overall model is scalable with
used at the output node to produce a value in [0, 1]. respect to the network size. Specifically, at the convolution
3) Feedback Connection: The forward computation stage, the convolutions are computed based on the fixed
(which includes the convolution stage and the fully con- (and trained) convolution filter that covers the neighboring
nected stage) takes the link activation pattern xi as the input non-negligible interference sources. At the fully connected
for constructing the density grid. In order to account for stage, the neural networks of different links are decoupled,
the progressive (de)activation pattern of the wireless links thus scheduling can be performed in a distributed fashion.
through the iterations (i.e., each subsequent interference Moreover, in the training stage, the convolutional filter
estimates need to be aware of the fact that the deactivated parameters and the neural network weights of the different
links no longer produce or are subject to interference), we links are tied together. This facilitates efficient training,
propose a feedback structure, in which each iteration of and implicitly assumes that the propagation environments
the neural network takes the continuous output x from the of the different links are similar. Under this homogeneity
previous iteration as input, then iterate for a fixed number assumption, regardless of how large the layout is and how
of iterations. We find experimentally that the network is many links are to be scheduled in the network, the overall
6

Continuous Scheduling Binary Scheduling


Spatial Convolutions
Variable Variable

Forward Path Q

Forward Path Q

Forward Path Q

Forward Path Q

Forward Path Q

Forward Path Q

Feedback

Fig. 5. Overall neural network with one forward path per link and with feedback connections and quantized output (denoted as “Q”).

trained neural network model can be directly utilized for


scheduling, without adjustment or re-training,

D. Training Process
The overall deep neural network is trained using wireless
network layouts with randomly located links and with the
transmitter-receiver distances following a specific distri-
bution. Specifically, we train the model to maximize the
target sum rate via gradient descent on the convolutional
filter weights and neural network weight parameters. It is Fig. 6. Oscillatory behavior in the neural network training process.
worth noting that while the channel gains are needed at the
training stage for computing rates, they are not needed for
As noted earlier, we could have also used a state-of-
scheduling, which only requires GLI.
the-art optimization algorithm to generate locally optimal
To allow the gradients to be back-propagated through the
schedules as targets and train the neural network in a
network, we do not discretize the network outputs when
supervised fashion. Promising results have been obtained
computing the rates. Therefore, the unsupervised training
for specific transmitter-receiver distance distributions (e.g.,
process is essentially performing a power control task for
2∼65 meters) [21], but supervised learning does not always
maximizing the sum rate. The scheduling decisions are
work well for more general distributions; see Section IV-E.
obtained from discretizing the optimized power variables.
A possible explanation is that the high quality local optimal
We randomly generate wireless D2D networks consist-
schedules are often not smooth functional mappings of the
ing of N = 50 D2D pairs in a 500 meters by 500 meters
network parameters, and are therefore difficult to learn.
region. The locations for the transmitters are generated
uniformly within the region. The locations of the receivers
are generated according to a uniform distribution within E. Symmetry Breaking
a pairwise distances of dmin ∼ dmax meters from their The overall neural network is designed to encourage
respective transmitters. We generate 800,000 such network links to deactivate either when it produces too much
layouts for training. interference to its neighbors, or when it experiences too
The transmitter-receiver distance has significant effect much interference from its neighbors. However, because
on the achievable rate. Link scheduling for sum-rate max- training happens in stages and all the links update their
imization tends to favor short links over long links, so the activation pattern in parallel, the algorithm frequently gets
distribution of the link distances has significant effect on into situations in which multiple links may oscillate be-
the scheduling performance. To develop the capacity of the tween being activated and deactivated.
proposed deep learning approach to accommodate varying Consider the following scenario involving two closely
transmitter-receiver distances, we generate training samples located links with identical surroundings. Starting from the
based on the following distribution: initialization stage where both links are fully activated, both
• Generate dmin uniformly from 2 ∼ 70 meters. links see severe interference coming from each other. Thus,
• Generate dmax uniformly from dmin ∼ 70 meters. at the end of the first forward path, both links would be
• Generate D2D links distance as uniform dmin ∼ dmax . turned off. Now assuming that there are no other strong
7

TABLE I TABLE II
D ESIGN PARAMETERS FOR THE S PATIAL D EEP N EURAL N ETWORK AVERAGE S UM R ATE AS P ERCENTAGE OF FP

Parameters Values % CSI 30∼70 2∼65 10∼50 all 30


Convolution Filter Size 63 cells × 63 cells Learning 7 92.19 98.36 98.42 96.90
Cell Size 5m by 5m Greedy X 84.76 97.08 94.00 84.56
First Hidden Layer 30 units Strongest 7 59.66 82.03 75.41 N/A
Second Hidden Layer 30 units Random 7 35.30 47.47 49.63 50.63
Training 3∼20 iterations All 7 26.74 54.18 48.22 43.40
Number of Iterations
Testing 20 iterations FP X 100 100 100 100

interference in the neighborhood, then at the end of the monotonic convergence for the optimization over the con-
second iteration, both links would see little interference; tinuous power variables, it does not necessarily produce
consequentially both would be encouraged to be turned monotonically increasing sum rate for scheduling. Exper-
back on. This oscillation pattern can keep going, and imentally, scheduling outputs after 100 iterations show
the training process for the neural network would never good numerical performance. We generate 5000 layouts
converge to a good schedule (which is that one of the for testing in this section.
two links should be on). Fig. 6 shows a visualization The design parameters for the neural network are sum-
of the phenomenon. Activation patterns produced by the marized in the Table I. We compare the sum rate perfor-
actual training process are shown in successive snapshots. mance achieved by the trained neural network with each
Notice that the three closely located strong interfering links of the following benchmarks in term of both the average
located at middle bottom of the layout have the oscillating and the maximum sum rate over all the testing samples:
pattern between successive iterations. The network could
not converge to an optimal scheduling where only one of • All Active: Activate all links.
the three links are scheduled. The same happens to the two • Random: Schedule each link with 0.5 probability.
links in the upper left part of the area. • Strongest Links First: We sort all links according
To resolve this problem, a stochastic update mechanism to the direct channel strength, then schedule a fixed
to break the symmetry is proposed. At the end of each portion of the strongest links. The optimal percentage
forward path, the output vector x contains the updated is taken as the average percentage of active links in
activation pattern for all the links. However, instead of the FP target.
feeding back x directly to the next iteration, we feed- • Greedy: Sort all links according to the link distance,
back the updated entries of x with 50% probability (and then schedule one link at a time. Choose a link to
feedback the old entries of x with 50% probability). This be active only if scheduling this link strictly increases
symmetry breaking is used in both the training and testing the objective function (i.e., the sum rate). Note that the
phase and is observed to benefit the overall performance interference at all active links needs to be re-evaluated
of the neural network. in each step as soon as a new link is turned on or off.
• FP: Run FPLinQ for 100 iterations.
IV. P ERFORMANCE OF S UM -R ATE M AXIMIZATION We run experiments with the following D2D links pair-
A. Testing on Layouts of Same Size as Training Samples wise distance distributions in the test samples:
We generate testing samples of random wireless D2D • Uniform in 30 ∼ 70 meters.
network layouts of the same number of links and the same • Uniform in 2 ∼ 65 meters.
size as the training samples, except with fixed uniform • Uniform in 10 ∼ 50 meters.
link distance distribution between some values of dmin and • All links at 30 meters.
dmax . The channel model is adapted from the short-range The distance distribution affects the optimal scheduling
outdoor model ITU-1411 with a distance-dependent path- strategies, e.g., in how advantageous it is for scheduling
loss [22], over 5MHz bandwidth at 2.4GHz carrier fre- only the strongest links. The sum rate performance of
quency, and with 1.5m antenna height and 2.5dBi antenna each of the above methods are reported in Table II. The
gain. The transmit power level is 40dBm; the background performance is expressed as the percentages as compared
noise level is -169dBm/Hz. We assume an SNR gap of to FPLinQ.
6dB to Shannon capacity to account for practical coding As shown in Table II, the proposed spatial learning ap-
and modulation. proach always achieves more than 92% of the average sum
For each specific layout and each specific channel real- rate produced by FPLinQ for all cases presented, without
ization, the FPLinQ algorithm [4] is used to generate the explicitly knowing the channels. The neural network also
sum-rate maximizing scheduling output with a maximum outperforms the greedy heuristic (which requires CSI) and
iteration of 100. We note that although FPLinQ guarantees outperforms other benchmarks by large margins.
8

500
Activation Result by Greedy on Sample #2 TABLE III
450 G ENERALIZABILITY TO L AYOUTS OF L ARGER D IMENSIONS BUT
400
S AME L INK D ENSITY AND L INK D ISTANCE : S UM R ATE AS % OF FP
350
300
Tx
250 Rx 2m∼65m All 30m
200 Size (m2 ) Links
150 DL Greedy DL Greedy
100
50 750 × 750 113 98.5 102.4 98.4 98.4
0
0 50 100 150 200 250 300 350 400 450 500
1000 × 1000 200 99.2 103.2 98.3 98.8
(a) Greedy 1500 × 1500 450 99.5 103.8 98.3 100.0
2000 × 2000 800 99.7 104.1 98.8 100.8
500
450
Activation Result by FP on Sample #2
500
450
Activation Result by Neural Network on Sample #2
2500 × 2500 1250 99.7 104.2 99.1 101.3
400 400
350 350
300 300
250
Tx
Rx 250
Tx
Rx
TABLE IV
200
150
200
150
G ENERALIZABILITY TO L AYOUTS WITH D IFFERENT L INK D ENSITIES :
100 100 S UM R ATE AS % OF FP
50 50
0 0
0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500
2m∼65m All 30m
Size (m2 ) Links
(b) FP (c) Neural Network DL Greedy DL Greedy
10 95.5 90.0 94.9 81.6
Fig. 7. Greedy heuristic prematurely activates the strongest link
30 97.0 93.2 96.1 81.3
500 × 500 100 98.6 99.8 99.0 88.7
The main reason that the greedy heuristics performs 200 97.8 101.7 96.0 92.4
poorly is that it always activate the strongest link first, 500 93.0 104.1 92.91 92.8
but once activated, the algorithm does not reconsider the 1 50 iterations are required for deep learning to achieve this result
scheduling decisions already made. The earlier scheduling
decision may be suboptimal; this leads to poor performance
as illustrated in an example in Fig. 7. Note that under the 1) Generalizability to Layouts of Large Sizes: First, we
channel model used in simulation, the interference of an keep the link density and distance distribution the same
activated link reaches a range of 100m to 300m. If a greedy and test the performance of the neural network on larger
algorithm activates a link in the center of the 500m by layouts occupying an area of up to 2.5km by 2.5km and
500m layout, it could preclude the activation of all other 1250 links. The resulting sum rate performance is presented
links, while the optimal scheduling should activate multiple in Table III. Note that following the earlier convention,
weaker links roughly 100m to 300m apart as shown in the entries for the deep learning (DL) neural network and
Fig. 7. greedy method are the percentages of sum rates achieved
Throughout testings of many cases, including the exam- as compared with FP, averaged over the testing set.
ple shown in Fig. 7, the spatial learning approach always Table III shows that the neural network is able to
produces a scheduling pattern close to the FP output. This generalize to layouts of larger dimensions very well, with
shows that the neural network is capable of learning the its performance very close to FP. It is worth emphasizing
state-of-the-art optimization strategy. that while the greedy algorithm also performs well (likely
because the phenomenon of Fig. 7 is less likely to occur on
B. Generalizability to Arbitrary Topologies larger layouts), it requires CSI, as opposed to just location
An important test of the usefulness of the proposed information utilized by spatial deep learning.
spatial deep learning design is its ability to generalize 2) Generalizability to Layouts with Different Link Den-
to different layout dimensions and link distributions. In- sities: We further explore the neural network’s generaliza-
tuitively, the neural network performs scheduling based tion ability in optimizing scheduling over layouts that have
on an estimate of the direct channel and the aggregate different link densities as compared to the training set. For
interference from a local region surrounding the transmitter this part of the evaluation, we fix the layout size to be 500
and the receiver of each link. Since both of these estimates meters by 500 meters as in the training set, but instead
are local, one would expect that the neural network should of having 50 links, we vary the number of links in each
be able to extend to general layout. layout from 10 to 500. The resulting sum rate performances
To validate this generalization ability, we test the trained of deep learning and the greedy heuristics are presented in
neural network on layouts with larger number of links, Table IV.
while first keeping the link density the same, then further As shown in Table IV, with up to 4-fold increase in
test on layouts in which the link density is different. Note the density of interfering links, the neural network is able
that we do not perform any further training on the neural to perform near optimally, achieving almost the optimal
network. For each test, 500 random layouts are generated FP sum rate, while significantly outperforming the greedy
to obtain the average maximum sum rate. algorithm, especially when the network is sparse.
9

TABLE V 1) Theoretical Analysis: We first provide the complexity


S UM R ATE AS % OF FP ON C HANNELS WITH FAST FADING scaling for each of the methods as functions of the number
% CSI 30∼70 2∼65 10∼50 30 of links N :
DL 7 71.8 88.6 82.5 73.9 • FPLinQ Algorithm: Within each iteration, to up-

FP no fade X 77.7 88.9 82.7 76.3 date scheduling outputs and relevant quantities, the
Greedy X 95.9 98.3 97.7 96.7 dominant computation includes matrix multiplication
Strongest X 65.4 80.8 75.0 68.8 with the N ×N channel coefficient matrix. Therefore,
Random 7 31.7 44.5 44.0 42.7 the complexity per iteration is O(N 2 ). Assuming
that a constant number of iterations is needed for
All Active 7 25.3 50.4 43.8 38.4
convergence, the total run-time complexity is then
FP X 100 100 100 100
O(N 2 ).
• Greedy Heuristic: The greedy algorithm makes

However, the generalizability of deep learning does have scheduling decisions for each link sequentially. When
limitations. When the number of links increases to 500 or deciding whether to schedule the ith link, it needs
more (10-fold increase as compared to training set), the to compare the sum rate of all links that have been
neural network becomes harder to converge, resulting in scheduled so far, with and without activating the
dropping in performance. This is reflected in one entry in new link. This involves re-computing the interference,
the last row of Table IV, where it takes 50 iterations for the which costs O(i) computation. As i ranges from 1 to
neural network to reach a satisfactory rate performance. If N , the overall complexity of the greedy algorithm is
the link density is further increased, it may fail to converge. therefore O(N 2 ).
• Neural Network Let the discretized grid be of di-
Likely, new training set with higher link density would be
needed. mension K × K, and the spatial filter be of dimen-
sion J × J. Furthermore, let h0 denotes the size of
input feature vector for fully connected stage, and let
C. Sum Rate Optimization with Fast Fading
(h1 , h2 , ...hn ) denote the number of hidden units for
So far we have tested on channels with only a path-loss each of the n hidden layers (note that the output layer
component (according to the ITU-1411 outdoor model). has one unit). The total run-time complexity of the
Since path-loss is determined by location, the channels are proposed neural network can be computed as:
essentially deterministic function of the location.
In this section, Rayleigh fast fading is introduced into K 2 × J 2 +N × (h0 h1 + · · · + hn−1 hn + hn )
| {z } | {z }
the testing channels. This is more challenging, because the Convolution Stage Fully Connected Stage per Pair
channel gains are now stochastic functions of GLI inputs. (9)
Note that the neural network is still trained using channels
Thus, given a layout of fixed region size, the time
without fading.
complexity of neural network scales as O(N ).
We use test layouts of 500 meters by 500 meters with 50
2) Experimental Verification: In actual implementa-
D2D links and with 4 uniform link distance distributions.
tions, due to its ability to utilize parallel computation
The sum rate performance results are presented in Table
architecture, the run-time of the neural network can be
V, with an additional benchmark:
even less than O(N ). To illustrate this point, we measure
• FP without knowing fading: Run FP based on
the total computation time of scheduling one layout of
the CSI without fast fading effect added in. This varying number of D2D links by using FP and greedy
represents the best that one can do without knowing algorithms and by using the proposed neural network. The
the fast-fading. timing is conducted on a single desktop, with the hardware
As shown in Table V, the performance of deep learning specifications as below:
indeed drops significantly as compared to FP or Greedy • FP and Greedy: Intel CPU Core i7-8700K @ 3.70GHz
(both of which require exact CSI). However, it is still • Neural Network: Nvidia GPU GeForce GTX 1080Ti
encouraging to see the performance of neural network
To provide reasonable comparison of the running time, we
matches FP without knowing fading, indicating that it is
select hardwares most suited for each of the algorithms.
already performing near optimally given that only GLI is
The implementation of the neural network is highly par-
available as inputs.
allelizable; it greatly benefits from the parallel computa-
tion power of GPU. On the other hand, FP and greedy
D. Computational Complexity algorithms have strictly sequential computation flows, thus
In this section, we further argue that the proposed benefiting more from CPU due to its much higher clock
neural network has a computation complexity advantage speed. The CPU and GPU listed above are selected at about
as compared to the greedy or FP algorithms by providing the same level of computation power and price point with
a theoretical analysis and some experimental verification. regard to their respective classes.
10

TABLE VI
3.0 U NSUPERVISED VS . S UPERVISED T RAINING – S UM R ATE AS % OF FP
CPU/GPU run time for optimizing one layout (seconds) FP
Neural Net
2.5 Greedy
Sum Rate (%) 2∼65 10∼50 30∼70 30
2.0 Unsupervised 98.4 98.4 92.2 96.9
1.5 Supervised 96.2 90.3 83.2 82.0
1.0

0.5 When the layouts contain links of similar distances, many


0.0 distinct local optima emerge, which tend to confuse the
50200 1000 2000 3000
Number of D2D Links
4000 5000 supervised learning process. In these cases, using the sum-
rate objective directly tends to produce better results.
Fig. 8. Computation time on layouts with varying number of D2D links.
V. S CHEDULING WITH P ROPORTIONAL FAIRNESS
As illustrated in Fig. 8, the computational complexity of This paper has thus far focused on scheduling with sum-
the proposed neural network is approximately constant, and rate objective, which does not include a fairness criterion,
is indeed several orders of magnitude less than FP baseline thus tends to favor shorter links and links that do not expe-
for layouts with large number of D2D links. rience large amount of interference. Practical applications
We remark here that the complexity comparison is in- of scheduling, on the other hand, almost always requires
herently implementation dependent. For example, the bot- fairness. In the remaining part of this paper, we first
tleneck of our neural network implementation is the spatial illustrate the challenges in incorporating fairness in spatial
convolutions, which are computed by built-in functions deep learning, then offer a solution that takes advantage of
in TensorFlow [23]. The built-in function for computing the existing sum-rate maximization framework to provide
convolution in TensorFlow, however, computes convolution fair scheduling across the network.
in every location in the entire geographic area, which is an
overkill. If a customized convolution operator is used only
at specific locations of interests, the rum-time complexity A. Proportional Fairness Scheduling
of our neural network can be further reduced. The complex- We can ensure fairness in link scheduling by defining
ity is expected to be O(N ), but with much smaller constant an optimization objective of a network utility function over
than the complexity curve in Fig. 8. We also remark that the long-term average rates achieved by the D2D links. The
the computational complexity of traditional optimization long-term average rate, for example, can be defined over
approaches can potentially be reduced by further heuristics; a duration of T time slots, with an exponential weighted
see, e.g., [24]. window:
To conclude, the proposed neural network has significant
computational complexity advantage in large networks, R̄it = (1 − α)R̄it−1 + αRit t≤T (10)
while maintaining near-optimal scheduling performance. where Rit is the instantaneous rate achieved by the D2D
This is remarkable considering that the neural network has link i in time slot t, which can be computed as in (1) based
only been trained on layouts with 50 links, and requires on the scheduling decision binary vector in each time slot,
only O(N ) GLI rather than O(N 2 ) CSI. xt . Define a concave and non-decreasing utility function
U (R̄i ) for each link. The network utility maximization
E. Unsupervised vs. Supervised Training problem is that of maximizing
As mentioned earlier, the neural network can also be N
X
trained in a supervised fashion using the locally opti- U (R̄i ). (11)
mal schedule from FP, or in a unsupervised fashion di- i=1
rectly using the sum-rate objective. Table VI compares In the proportional fairness scheduling, the utility function
these two approaches on the layouts of 500 meters by is chosen to be U (·) = log(·).
500 meters with 50 D2D links, but with link distances The idea of proportional fairness scheduling is to max-
following four different distributions. It is interesting to imize the quantity defined in (11) incrementally [25].
observe that while supervised learning is competitive for Assuming large T , in each new time slot, the incremental
link distance distribution of 2m∼65m, it generally has contribution of the achievable rates of the scheduled links
inferior performance in other cases. An intuitive reason to the network utility is approximately equivalent to a
is that when the layouts contain very short links, the sum- weighted sum rate [20]
rate maximization scheduling almost always chooses these N
short links. It is easier for the neural network to learn X
wi Rit (12)
such pattern in either a supervised or unsupervised fashion. i=1
11

Time Slot #1
Consecutive Time Slots at a Fixed Layout
Time Slot #2 Time Slot #3 weight increasing over time until they cross a threshold,
1.0 1.0 1.0 weights
FP allocs then all the sudden it gets scheduled. Thus, the mapping
Normalized Weight/Allocation Value

Normalized Weight/Allocation Value

Normalized Weight/Allocation Value


0.8 0.8 0.8
between the proportional fairness weights and the optimal
0.6 0.6 0.6 schedule is highly sensitive to these sharp turns. If we
0.4 0.4 0.4 desire to learn this mapping from a data-driven approach,
0.2 0.2 0.2
one should expect to need a considerably larger amount of
0.0 0.0 0.0
training samples to be collected just to be able to survey the
l1 l10 l20 l30
Link Index
l40 l50 l1 l10 l20 l30
Link Index
l40 l50 l1 l10 l20 l30
Link Index
l40 l50 functional landscape, not to mention the many more local
sharp optima that would make training difficult. Further
Fig. 9. The optimal scheduling can drastically change over slow varying exacerbating the difficulty is the fact that there is no easy
proportional fairness weights way to sample the space of proportional fairness weights.
In a typical scheduling process, the sequence of weights
where the weights are set as: are highly non-uniform.

∂U (R̄it ) ∂ log(R̄it ) 1
wi = = = . (13) C. Weighted Sum Rate Maximization via Binary Weights
∂R R̄it ∂R R̄it R̄it
To tackle the proportionally fair scheduling problem,
Thus, the original network utility maximization problem this paper proposes the following new idea: Since the
(11) can be solved by a series of weighted sum-rate neural network proposed in the first part of this paper is
maximization, where the weights are updated in each capable of generalizing to arbitrary topologies for sum-
time slot as in (13). The mathematical equivalence of rate maximization, we take advantage of this ability by
decomposition (11) to this series of weighted sum-rate emulating weighted sum-rate maximization by sum-rate
maximization (12)-(13) is established in [26]. In the rest of maximization, but over a judiciously chosen subset of links.
the paper, to differentiate the weights in the weighted rate- The essence of scheduling is to select an appropriate
sum maximization from the weights in the neural network, subset of users to activate. Our idea is therefore to first
we refer wi as the proportional fairness weights. construct a shortlist of candidate links based on the pro-
The weights wi can take on any positive real values. portional fairness weights alone, then further refine the can-
This presents a significant challenge to deep learning based didate set of links using deep learning. Alternatively, this
scheduling. In theory, one could train a different neural can also be thought of as to approximate the proportional
network for each set of weights, but the complexity of fairness weights by a binary weight vector taking only the
doing so would be prohibitive. To incorporate wi as an values of 0 or 1.
extra input to the neural network turns out to be quite The key question is how to select this initial shortlist
difficult as well. We explain this point in the next section, of candidate links, or equivalently how to construct the
then offer a solution. binary weight vector. Denote the original proportional
fairness weights as described in (13) by wt . Obviously, the
B. Challenge in Learning to Maximize Weighted Sum Rate links with higher weights should have higher priority. The
A natural idea is to incorporate the proportional fairness question is how many of the links with the large weights
weights as an extra input for each link in the neural should be included.
network. However, this turns out to be quite challenging. This paper proposes to include the following subset of
We have implemented both the spatial convolution based links. We think of the problem as to approximate wt by
neural network (using the structure mentioned in the first a binary 0-1 vector ŵt . The proposed scheme finds this
part of the paper, while taking an extra proportional fairness binary approximation in such a way so that the dot product
weight parameter) and the most general fully connected between wt (normalized to unit `2 -norm) and ŵt (also
neural network to learn the mapping from the proportional normalized) is maximized. For a fixed real-valued weight
fairness weights to the optimal scheduling. With millions vector wt , we find the binary weight vector ŵt as follows:
of training data, the neural network is unable to learn such 
y wt

t
a mapping, even for a single fixed layout. ŵ = arg min , (14)
y∈{0,1}N kyk2 kwt k2
The essential difficulty lies in the high dimensionality
of the function mapping. To visualize this complexity, we where h·, ·i denotes the dot product of two vectors. Geo-
provide a series of plots of proportional fairness weights metrically, this amounts to finding an ŵt that is closest to
against FP scheduling allocations in sequential time slots wt in term of the angle between the two vectors.
in Fig. 9. It can be observed that the FP schedule can Algorithmically, such a binary vector can be easily found
change drastically when the proportional weights only by first sorting the entries of wt , then setting the largest k
vary by a small amount. This is indeed a feature of entries to 1 and the rest of 0, where k is found by a linear
proportional fairness scheduling: an unscheduled link sees search using the objective function in (14). With the binary
its average rate decreasing and its proportional fairness weight vector ŵt , the weighted sum rate optimization
12

reweighting scheme wi = U 0 (R̄i ), the corresponding utility


4
objective must be U (R̄i ). In our case, the utility function
3
U (R̄i ) can be computed explicitly as
Z
2
U (R̄i ) = α W (R̄i ) dR̄i
α 
1 = αR̄i − ln 1 + exp(κ(R̄i − θ)) + β (18)
κ
Utility

0 where α > 0 is a scaling parameter and β ∈ R is


Proportional Fairness an offset parameter. These two parameters do not affect
Fixed-Threshold Scheme, =0.5
-1 Fixed-Threshold Scheme, =1.0 the scheduling performance. Fig. 10 compares the utility
Fixed-Threshold Scheme, =5.0 function U (R̄i ) of the binary weighting scheme with the
-2 log-utility proportional fairness function. It is observed that
the utility of the fixed-threshold scheme follows the same
-3
0 5 10 15 20 25 30 trend as the proportional fairness utility.
Long-term average rate (Mbps)
Note that the above simplified analysis assumes that the
threshold θ is fixed, but in the proposed binary reweighting
Fig. 10. Utility function of the fixed-threshold binary weighting scheme
vs. proportional fairness scheduling. Here θ = 0.1. scheme, the threshold changes adaptively in each step, so
this analysis is an approximation. Observe also that the
utility function of the binary reweighting scheme saturates
is effectively reduced to sum rate optimization, over the when R̄ is greater than the threshold, in contrast to the
subset of links with weights equal to 1. We can then utilize proportional fairness utility which grows logarithmically
spatial deep learning to perform scheduling over this subset with R̄. This difference becomes important in the numeri-
of links. cal evaluation of the proposed scheme.

D. Utility Analysis of Binary Reweighting Scheme


VI. P ERFORMANCE OF P ROPORTIONAL FAIRNESS
The proposed binary reweighting scheme is a heuristic S CHEDULING
for producing fair user scheduling, but a rigorous analysis
of such a scheme is challenging. In the following, we We now evaluate the performance of the deep learning
provide a justification as to why such a scheme provides based approach with binary reweighting for proportional
fairness. From a stochastic approximation perspective [26], fairness scheduling in three types of wireless network
the proposed way of updating the weights can be thought layouts:
of as maximizing a particular utility function of the long- • The layouts with the same size and link density;
term average user rate. To see what this utility function • The larger layouts but with same link density;
looks like, we start with a simple fixed-threshold scheme: • The larger layouts but with different link density.
For testing on layouts with the same setting, 20 distinct
(
1, if wi ≥ θ
ŵi = (15) layouts are generated for testing, with each layout being
0, otherwise
scheduled over 500 time slots. For the other two settings,
for some fixed threshold θ > 0, where ŵi and wi are the 10 distinct layouts are generated and scheduled over 500
binary weight and the original weight, respectively. Since time slots. Since scheduling is performed here within a
wi = 1/R̄i , we can rewrite (15) as finite number of time slots, we compute the mean rate of
 each link by averaging the instantaneous rates over all the
 1, if R̄ ≤ 1 time slots:
i
ŵi = θ . (16) T
1X t
 0, otherwise R̄i = R. (19)
T t=1 i
Recognizing (16) as a reverse step function with sharp
transition from 1 to 0 at 1/θ, we propose to use the The utility of each link is computed as the logarithm of the
following reverse sigmoid function to mimic ŵi : mean rates in Mbps. The network utility is the sum of link
utilities as defined in (11). The utilities of distinct layouts
1
W (R̄i ) = (17) are averaged and presented below. To further illustrate the
1 + exp(κ(R̄i − θ)) mean rate distribution of the D2D link, we also plot the
where the parameter κ > 0 controls the steepness of the cumulative distribution function (CDF) of the mean link
W (R̄i ). We can now recover the utility function that the rates, serving as a visual illustration of fairness.
reweighting scheme (17) implicitly maximizes. The proposed deep learning based proportional fairness
For a fixed strictly concave utility U (R̄i ), the user scheduling solves a sum-rate maximization problem over a
weights are set as wi = U 0 (R̄i ). Thus, given some subset of links using the binary reweighting scheme in each
13

1.0 1.0

0.8 0.8
Cumulative Distribution Function

Cumulative Distribution Function


0.6 0.6

0.4 0.4
Deep Learning Deep Learning
FP FP
0.2 Weighted Greedy 0.2 Weighted Greedy
Max Weight Only Max Weight Only
All Active All Active
Random Random
0.0 0.0
0 1 2 3 4 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6
Mean Rate for each link (Mbps) Mean Rate for each link (Mbps)

Fig. 11. CDF of mean rates for layouts of 50 links in 500m×500m area Fig. 12. CDF for mean rates of Layouts with 200 links in 500m×500m
for the case that link distance distribution is 30m to 70m. area with link distance fixed at 30m

TABLE VII
M EAN L OG U TILITY P ERFORMANCE FOR P ROPORTIONALLY FAIR •Uniform in 10 ∼ 50 meters.
S CHEDULING •All 30 meters.
CSI 30-70 2-65 10-50 30 The log utility values achieved by the various schemes
DL 7 45.9 61.9 63.3 62.6 are presented in Table VII. The CDF plot of mean rates
W. Greedy X 39.7 51.5 51.1 49.0 achieved for the case of link distributed in 30m-70m is
Max Weight 7 38.3 42.1 41.9 41.4 presented in Fig. 11.
Random 7 0.76 38.4 38.7 35.1 Remarkably, despite the many approximations, the deep
All Active 7 -27.6 24.0 20.9 15.7 learning approach with binary reweighting achieves excel-
FP X 45.2 63.1 63.3 63.0 lent log-utility values as compared to the FP. Its log-utility
also exceeds the weighted greedy algorithm noticeably.
We again emphasize that this is achieved with geographic
time slot. In addition to the baseline schemes mentioned information only without explicit CSI.
previously, we also include: It is interesting to observe that the deep learning ap-
• Max Weight: Schedule the single link with the high- proach has a better CDF performance as compared to the
est proportional fairness weight in each time slot. FP in the low-rate regime, but worse mean rate beyond
• Weighted Greedy: Generate a fixed ordering of all the 80-percentile range. This is a consequence of the fact
links by sorting all the links according to the propor- that the implicit network utility function of the binary
tional fairness weight of each link multiplied by the reweighting scheme is higher than proportional fairness
maximum direct link rate it can achieve without inter- utility at low rates, but saturates at high rate, as shown
ferences, then schedule one link at a time in this order. in Fig. 10.
Choose a link to be active only if scheduling this 2) Performance on Larger Layouts with Same Link
link strictly increases the weighted sum rate. Note that Density: To demonstrate the ability of the neural network
interference is taken into account when computing the to generalize to layouts of larger size under the proportional
link rate in the weighted sum rate computation. Thus, fairness criterion, we conduct further testing on larger
CSI is required. In fact, the interference at all active layouts with the same link density. We again emphasize
links needs to be re-evaluated in each step whenever that no further training is conducted. We test the following
a new link is activated. two D2D links pairwise distance distributions:
1) Performance on Layouts of Same Size and Link • Uniform in 2 ∼ 65 meters.
Density: In this first case, we generate testing layouts with • All 30 meters.
size 500 meters by 500 meters, with 50 D2D links in each The results for this setting are summarized in Table VIII.
layout. Similar to sum rate optimization evaluation, we It is observed that under the proportional fairness cri-
have conducted the testing under the following 4 D2D links terion, the spatial deep learning approach still generalizes
pairwise distance distributions: really well. It is competitive with respect to both FP and
• Uniform in 30 ∼ 70 meters. the weighted greedy methods, using only O(N ) GLI as
• Uniform in 2 ∼ 65 meters. input and using the binary weight approximation.
14

TABLE VIII
M EAN L OG U TILITY P ERFORMANCE FOR P ROPORTIONALLY FAIR S CHEDULING ON L ARGER L AYOUTS WITH S AME L INK D ENSITY

2m∼65m all 30 m
Layout Size Links
FP DL W. Greedy FP DL W. Greedy
750m ×750m 113 127 124 106 127 126 111
1000m ×1000m 200 217 205 203 219 214 205
1500m ×1500m 450 462 432 454 466 448 462

TABLE IX
M EAN L OG U TILITY P ERFORMANCE FOR P ROPORTIONALLY FAIR S CHEDULING ON L AYOUTS WITH D IFFERENT L INK D ENSITY

2m∼65m all 30m


Layout Size Links
FP DL W. Greedy FP DL W. Greedy
30 52 49 47 50 50 44
500m ×500m 200 -11 -13 -90 -11 -26 -102
500 -511 -514 -736 -485 -542 -739

3) Performance on Layout with Different Link Density: on larger wireless networks as compared to the traditional
We further test the neural network on a more challenging optimization algorithms and the competing heuristics.
case: layouts with different link densities than the setting on Moreover, this paper proposes a binary reweighting
which it is trained. Specifically, we experiment on layouts scheme to allow the weighted sum-rate maximization prob-
of 500 meters by 500 meters size and varying number of lem under the proportional fairness scheduling criterion to
D2D links. The resulting sum log utility value, averaged be solved using the neural network. The proposed method
over 10 testing layouts, are summarized in Table IX. achieves near optimal network utility, while maintaining
It is observed that the neural network still competes the advantage of bypassing the need for CSI.
really well against FP in log utility, and outperforms the Taken together, this paper shows that deep learning
weighted greedy method significantly. To visualize, we is promising for wireless network optimization tasks, es-
select one specific layout of 500 meters by 500 meters pecially when the models are difficult or expensive to
region with 200 links with link distances fixed to 30 meters, obtain and when computational complexity of existing
and provide the CDF plot of long-term mean rates achieved approaches is high. In these scenarios, a carefully crafted
by each link in Fig. 12. neural network topology specifically designed to match the
problem structure can be competitive to the state-of-the-art
VII. C ONCLUSION methods.

Deep neural network has had remarkable success in


many machine learning tasks, but the ability of deep R EFERENCES
neural networks to learn the outcome of large-scale discrete
[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based
optimization in still an open research question. This paper learning applied to document recognition,” Proc. IEEE, vol. 86,
provides evidence that for the challenging scheduling task no. 11, pp. 2278–2324, Nov. 1998.
for the wireless D2D networks, deep learning can perform [2] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, pp.
436–444, May 2015.
very well for sum-rate maximization. In particular, this [3] K. Hornik, “Multilayer feedforward networks are universal approx-
paper demonstrates that in certain network environments, imators,” Neural Netw., vol. 2, pp. 359–366, 1989.
by using a novel geographic spatial convolution for esti- [4] K. Shen and W. Yu, “FPLinQ: A cooperative spectrum sharing
strategy for device-to-device communications,” in IEEE Int. Symp.
mating the density of the interfering neighbors around each Inf. Theory (ISIT), Jun. 2017, pp. 2323–2327.
link and a feedback structure for progressively adjusting [5] ——, “Fractional programming for communication systems—Part
the link activity patterns, a deep neural network can in I: Power control and beamforming,” IEEE Trans. Signal Process.,
effect learn the network interference topology and perform vol. 66, no. 10, pp. 2616–2630, May 15, 2018.
[6] ——, “Fractional programming for communication systems—Part
scheduling to near optimum based on the geographic II: Uplink scheduling via matching,” IEEE Trans. Signal Process.,
spatial information alone, thereby eliminating the costly vol. 66, no. 10, pp. 2631–2644, May 15, 2018.
channel estimation stage. [7] X. Wu, S. Tavildar, S. Shakkottai, T. Richardson, J. Li, R. Laroia,
and A. Jovicic, “FlashLinQ: A synchronous distributed scheduler
Furthermore, this paper demonstrates the generalization for peer-to-peer ad hoc networks,” IEEE/ACM Trans. Netw., vol. 21,
ability of the neural network to larger layouts and to no. 4, pp. 1215–1228, Aug. 2013.
layouts of different link density (without the need for [8] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, “An iteratively
weighted MMSE approach to distributed sum-utility maximization
any further training). This ability to generalize provides for a MIMO interfering broadcast channel,” IEEE Trans. Signal
computational complexity advantage for the neural network Process., vol. 59, no. 9, pp. 4331–4340, Apr. 2011.
15

[9] N. Naderializadeh and A. S. Avestimehr, “ITLinQ: A new approach


for spectrum sharing in device-to-device communication systems,”
IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1139–1151, Jun.
2014.
[10] X. Yi and G. Caire, “Optimality of treating interference as noise: A
combinatorial perspective,” IEEE Trans. Inf. Theory, vol. 62, no. 8,
pp. 4654–4673, Jun. 2016.
[11] B. Zhuang, D. Guo, E. Wei, and M. L. Honig, “Scalable spectrum
allocation and user association in networks with many small cells,”
IEEE Trans. Commun., vol. 65, no. 7, pp. 2931–2942, Jul. 2017.
[12] I. Rhee, A. Warrier, J. Min, and L. Xu, “DRAN: Distributed
randomized TDMA scheduling for wireless ad hoc networks,” IEEE
Trans. Mobile Comput., vol. 8, no. 10, pp. 1384–1396, Oct. 2009.
[13] L. P. Qian and Y. J. Zhang, “S-MAPEL: Monotonic optimization
for non-convex joint power control and scheduling problems,” IEEE
Trans. Wireless Commun., vol. 9, no. 5, pp. 1708–1719, May 2010.
[14] M. Johansson and L. Xiao, “Cross-layer optimization of wireless
networks using nonlinear column generation,” IEEE Trans. Wireless
Commun., vol. 5, no. 2, pp. 435–445, Feb. 2006.
[15] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos,
“Learning to optimize: Training deep neural networks for interfer-
ence management,” IEEE Trans. Signal Process., vol. 66, no. 20,
pp. 5438–5453, Aug. 2018.
[16] M. Eisen, C. Zhang, L. F. O. Chamon, D. D. Lee, and A. Ribeiro,
“Learning optimal resource allocations in wireless systems,” [On-
line] Available: https://arxiv.org/pdf/1807.08088.
[17] F. Liang, C. Shen, and F. Wu, “Power control for interference
management via ensembling deep neural networks,” 2018, preprint.
[Online] Available: https://arxiv.org/pdf/1807.10025.
[18] J. G. D. Forney and G. Ungerboeck, “Modulation and coding for
linear gaussian channels,” IEEE Trans. Inf. Theory, vol. 44, no. 6,
Oct. 1998.
[19] M. J. Neely, Stochastic Network Optimization with Application to
Communication and Queueing Systems. Morgan & Claypool, 2010.
[20] J. Huang, R. Berry, and M. Honig, “Distributed interference com-
pensation for wireless networks,” IEEE J. Sel. Areas Commun.,
vol. 24, no. 5, pp. 1074–1084, May 2006.
[21] W. Cui, K. Shen, and W. Yu, “Spatial deep learning in wireless
scheduling,” in IEEE Global Commun. Conf. (GLOBECOM), Abu
Dhabi, UAE, Dec. 2018.
[22] Recommendation ITU-R P.1411-8. International Telecommunica-
tion Union, 2015.
[23] M. Abadi et al., “TensorFlow: Large-scale machine learning
on heterogeneous systems,” 2015, software available from
tensorflow.org. [Online]. Available: https://www.tensorflow.org/
[24] Z. Zhou and D. Guo, “1000-cell global spectrum management,” in
ACM Int. Symp. Mobile Ad Hoc Netw. Comput. (MobiHoc), Jul.
2017.
[25] E. F. Chaponniere, P. J. Black, J. M. Holtzman, and D. N. C.
Tse, “Transmitter directed code division multiple access system
using path diversity to equitably maximize throughput,” U.S. Patent
345 700, Jun. 30, 1999.
[26] H. J. Kushner and P. A. Whiting, “Convergence of proportional-fair
sharing algorithms under general conditions,” IEEE Trans. Wireless
Commun., vol. 3, no. 4, pp. 1250–1259, Jul. 2004.

You might also like