arXiv:1906.00284v1 [cs.NI] 1 Jun 2019
Proportional Fair RAT Aggregation in HetNets
Ehsan Aryafar (Portland State University, Portland, OR)
Alireza Keshavarz-Haddad (Shiraz University, Shiraz, Iran)
Carlee Joe-Wong (Carnegie Mellon University, Silicon Valley, CA)
Abstract—Heterogeneity in wireless network architectures (i.e.,
the coexistence of 3G, LTE, 5G, WiFi, etc.) has become a key
component of current and future generation cellular networks.
Simultaneous aggregation of each client’s traffic across multiple
such radio access technologies (RATs) / base stations (BSs) can
significantly increase the system throughput, and has become an
important feature of cellular standards on multi-RAT integration.
Distributed algorithms that can realize the full potential of this
aggregation are thus of great importance to operators. In this
paper, we study the problem of resource allocation for multi-RAT
traffic aggregation in HetNets (heterogeneous networks). Our
goal is to ensure that the resources at each BS are allocated so that
the aggregate throughput achieved by each client across its RATs
satisfies a proportional fairness (PF) criterion. In particular, we
provide a simple distributed algorithm for resource allocation at
each BS that extends the PF allocation algorithm for a single BS.
Despite its simplicity and lack of coordination across the BSs, we
show that our algorithm converges to the desired PF solution and
provide (tight) bounds on its convergence speed. We also study
the characteristics of the optimal solution and use its properties
to prove the optimality of our algorithm’s outcomes.
I. INTRODUCTION
The increasing demand for wireless data has led to denser
and more heterogeneous wireless network deployments. This
heterogeneity manifests itself in terms of network deployments
across multiple radio access technologies (e.g., 3G, LTE, WiFi,
5G), cell sizes (e.g., macro, pico, femto), and frequency bands
(e.g., TV bands, 1.8-2.4 GHz, mmWave), etc. To realize the
gains associated with such heterogeneous networks (HetNets),
consumer (client) devices are also being equipped with an
increasing number of radio access technologies (RATs), and
some are already able to simultaneously aggregate the traffic
across multiple RATs to increase throughput [1].
To support such traffic aggregation on the network side, the
3GPP (3rd generation partnership project) has been actively
developing multi-RAT integration solutions. The introduction
of LWA (LTE-WiFi Aggregation) as part of the 3GPP Release
13 [2] was a step in this direction. LWA allows using both
LTE and WiFi links for a single traffic flow and is generally
more efficient than transport layer aggregation protocols (e.g.,
MultiPath TCP), due to coordination at lower protocol stack
layers. LWA’s design primarily follows the LTE Dual Connectivity (DC) architecture (defined in 3GPP Release 12 [3]),
which allows a wireless device to connect to two LTE eNBs
that are on different carrier frequencies, and utilize the radio
resources that belong to both of them. Currently, the 3GPP
is working on a solution to support below-IP (layer 2) multi-RAT integration across any combination of RATs, including
LTE, WiFi, 802.11ad/ay, and 5G New Radio (NR) [4]. The
proposed architecture would allow for dynamic traffic splitting
across RATs for each client, which can lead to a significant
increase in the system performance (e.g., total throughput).
However, it is difficult to design resource allocation algorithms for each BS¹ that realize the performance benefits of
such integrated HetNets. Specifically, (i) backhaul links from
different BSs in HetNets show diverse capacity and latency
characteristics and depend on the underlying backhauling technology. For example, cable and DSL have on average 28 and
62 ms roundtrip latencies, respectively [5], [6]. The latency can
be even higher when a network operator uses a third party ISP
to communicate with its BSs (e.g., a mobile operator that uses
a wired ISP to control its WiFi BSs). Such latencies make it
infeasible for BSs to communicate with each other or a central
controller for real-time resource allocation at each BS. As a
result, any practical resource allocation algorithm for multi-RAT HetNets should be fully distributed (i.e., autonomously
executed by each BS). (ii) Resource allocation has many
practical constraints. Conventional BS hardware allows only
minor modifications to existing resource allocation algorithms
through software updates, limiting the algorithm design space.
New algorithms should also incur minimal signaling overhead
and computational complexity. Distributed algorithms based
on the traditional network utility maximization framework [7],
[8] do not meet these requirements because, as we will
show later through simulations, the resulting algorithms are
radically different from how conventional BSs operate, have
significant over-the-air signaling overhead, and increase the
computational complexity on the client side. (iii) In HetNets,
each client has access to a client-specific set of RATs, and
receives packets at a different PHY rate on each RAT. These
rates are naturally different across clients. This multi-rate
property of HetNets makes it particularly challenging to design
resource allocation algorithms with performance guarantees. As
a result, existing solutions in the literature are all limited to
simple setups, e.g., when each client has only two RATs as in
the case of LWA [9] or LTE DC [10].
¹ We use “BS” generically to mean an LTE eNB, WiFi AP, etc.
In this paper, we study the problem of resource allocation
for traffic aggregation in multi-RAT HetNets. We focus on
the proportional-fair (PF) objective as it is widely
used and implemented in BSs and provides a balance between
fairness and throughput [11], [12]. We first consider PF
resource allocation in a single BS, and then use our insights
from this case to design a distributed algorithm that meets our
three research challenges. We next show that our algorithm
converges to an optimal PF resource allocation. The key
contributions are as follows:
• Algorithm Design: We study the basics of PF resource
allocation in a single BS to gain intuition for the distributed
algorithm design. We show that PF resource allocation in a
single BS can be viewed as a special type of water-filling.
We generalize this observation to a new fully distributed
water-filling algorithm (named AFRA) that makes a minor
modification to the conventional single BS algorithm and
achieves PF in HetNets.
• Convergence and Speed: We show that AFRA is guaranteed to converge to an equilibrium as BSs autonomously
execute it [Theorem 1] and derive tight bounds on its
convergence time (speed) [Theorem 2].
• Optimality: We first show that at optimality, the sum of
the inverse water-fill levels across all BSs is equal to the
sum of the weights (numbers that show clients’ priorities)
across all clients [Theorem 3]. Next, we use this property
to prove that any equilibrium outcome of AFRA is globally
optimal [Theorem 4]. Finally, we show that at equilibrium
the vector of throughput rates across all clients is unique;
however, there could be infinitely many resource allocations
that realize this outcome [Theorem 5].
• Practicality: We construct a testbed with programmable BS
hardware, and show that we can successfully aggregate the
throughput across multiple BSs at the MAC layer. We also
show that replacing the conventional resource allocation
algorithm on each BS with AFRA can substantially increase
the system throughput and fairness.
• Performance: We conduct extensive simulations to characterize AFRA’s convergence time properties as we scale the
number of BSs and clients. We also introduce policies that
reduce the convergence time by more than 30%. Finally, we
compare the performance of AFRA against DDNUM, a dual
decomposition algorithm that we derived from the NUM
framework. We show that compared to DDNUM, AFRA is
2-3 times faster with 4-5 times less over-the-air overhead.
This paper is organized as follows. We discuss the related
work in Section II. We present the system model and details
of AFRA in Section III. In Sections IV and V we prove the
convergence and optimality of AFRA. We present the results
of our experiments, simulations, and comparisons against
DDNUM in Section VI. We conclude the paper in Section VII.
II. RELATED WORK
We discuss the related work in the areas of multi-BS
communication and distributed optimization, and highlight
their differences from this paper.
Single-RAT Multi-BS Communication. Prior works have
studied the problem of traffic aggregation when a client can
simultaneously communicate with multiple same technology
BSs. For example, [13] uses game theory to model selfish traffic splitting by each client in WLANs. On the other hand, the
resource allocation problem in HetNets is primarily addressed
at the BS side. Similarly, [10] proposes an approximation algorithm to address the problem of client association and traffic
splitting in LTE DC. Our algorithm (AFRA) goes beyond
this and other related work by guaranteeing optimal resource
allocation for any number of RATs and BSs. Other works
have developed centralized client association algorithms to
achieve max-min [14] and proportional fairness [15] in multi-rate WLANs. In contrast, the problem of resource allocation
in HetNets needs to be solved in a fully distributed manner.
Multi-RAT Communication. Resource allocation algorithms that realize the capacity gains in HetNets are still in
their early stages. The problem of PF resource allocation for
LWA was studied in [9]. In the proposed setup, each client
has one LTE and one WiFi RAT. Further, there is only a
single LTE BS in the network, and each client’s throughput
across its WiFi RAT is fixed. Next, the authors propose a
water-filling based resource allocation algorithm at the LTE
BS that achieves PF. Similarly, we show that the optimal
PF resource allocation in a single BS can be interpreted as
a form of water-filling. However, we use the observation to
design an optimal algorithm for the generic problem with any
number of BSs and client RATs, and explicitly model the
impact of system dynamics on the throughput that each client
gets from every BS. In our prior work [16], we addressed
the problem of max-min fair resource allocation in HetNets.
However, even with opportunistic centralized network supervision over autonomous resource allocation at each BS we
could not optimally solve the problem. Here, we focus on
the PF objective, which is commonly implemented in BSs,
and show that we can optimally solve the problem in a
purely distributed manner. Other works have built testbeds to
evaluate the over-the-air performance of MAC-level cross-RAT
throughput aggregation [17]–[20]. All these works have relied
on conventional scheduling algorithms on each BS and focused
on higher layer transport and application performance. We
experimentally show that replacing the conventional resource
allocation algorithms with AFRA can substantially increase
the system throughput and fairness.
Distributed Network Utility Maximization (NUM). There
is a large body of general results on the mathematics of
distributed computation, some of which are summarized in
standard textbooks such as [21], [22]. More recently, the
framework of NUM [7], [8], [23] has emerged as a mathematical tool to optimize layered network architectures. The
framework allows for decomposition of a global optimization
problem into subsets of local problems that are carried out
distributedly and implicitly solve the global NUM problem.
We have derived an alternative distributed algorithm (named
DDNUM) by leveraging dual decomposition and the NUM
framework. We will show through simulations that DDNUM
is 2-3 times slower than AFRA (in terms of convergence
time) and increases the over-the-air signaling overhead by 4-5
times. These disadvantages, coupled with the increased client
side computational complexity and lack of compatibility with
conventional BSs, make NUM-based algorithms impractical
for multi-RAT traffic aggregation.
III. SYSTEM MODEL
We discuss the system model and the resource allocation
algorithm that is autonomously executed by each BS.
A. Network Model
We consider a HetNet composed of a set of BSs $\mathcal{M} = \{1, ..., M\}$ and a set of clients $\mathcal{N} = \{1, ..., N\}$. Each BS has a
limited transmission range and can only serve clients within its
range. Each client has a client-specific number of RATs, and
therefore has access to a subset of BSs. We model clients that
can aggregate traffic across BSs of the same technology (e.g.,
LTE DC) with multiple such RATs. Fig. 1 shows an example
HetNet topology. We assume that clients split their traffic
over the BSs and focus on the resource allocation problem
at each BS. It is itself a challenging problem to determine
which BS to associate with among same technology BSs (e.g.,
choosing the optimal LTE BS if a client has an LTE RAT).
We assume there exists a rule to pre-determine client RAT to
BS association. The pre-determination rule could for instance
be any load balancing algorithm [24], [25], or based on the
received signal strength. Similar to [13]–[16], [24], we assume
that the transmission in one BS does not interfere with an
adjacent BS. This can be achieved through spectrum separation
between BSs that belong to different access networks and
frequency reuse among same technology BSs.
Fig. 1. A heterogeneous network with 4 access technologies. Each client
is in the coverage area of a group of BSs (dotted lines) and can split or
aggregate its traffic across the corresponding BSs (RATs). The 3GPP is
actively developing several new RATs for both sub-6 GHz and mmWave
bands, re-emphasizing the heterogeneity of future wireless networks.
B. Throughput Model
We consider a multi-rate system and use $R_{i,j}$ to denote the
PHY rate of client i to BS j. Since each BS generally serves
more than one client, clients of the same BS need to share
resources such as time and frequency slots (e.g. in 3/4/5G)
or transmission opportunities (e.g. in WiFi). The throughput
achieved by client i from BS j thus depends on the load of
the BS and will be a fraction of $R_{i,j}$. We assume that each
BS employs a TDMA throughput sharing model² and let $\lambda_{i,j}$
denote the fraction of time allocated to client i by BS j. Hence,
the throughput achieved by client i from BS j is equal to
$\lambda_{i,j} R_{i,j}$ and its total throughput across all its RATs would be
$$\text{Total Throughput of Client } i = r_i = \sum_{j=1}^{M} \lambda_{i,j} R_{i,j} \tag{1}$$
The total amount of time fractions available to each BS
cannot exceed 1. Thus, for the $\lambda_{i,j}$s to be feasible we have
$$\sum_{i=1}^{N} \lambda_{i,j} \le 1 \quad \forall j \in \mathcal{M} \tag{2}$$
$$\lambda_{i,j} \ge 0 \quad \forall i \in \mathcal{N}, j \in \mathcal{M} \tag{3}$$
² In Section VI-A, we discuss how we can extend our model and algorithm
to capture practical implementation issues such as WiFi contention.
TABLE I
MAIN NOTATION
$\mathcal{N}$ and $N$ : Set and number of all clients in the network
$\mathcal{M}$ and $M$ : Set and number of all BSs in the network
$R_{i,j}$ : PHY rate of client i to BS j
$R_{max}$ : Maximum PHY rate across all clients and BSs
$R_{min}$ : Non-zero minimum PHY rate across all clients and BSs
$\lambda_{i,j}$ : Fraction of time allocated to client i by BS j
$\lambda$ : Vector of $\lambda_{i,j}$s across all clients and BSs
$r_i$ : Total throughput of client i across all its RATs
$\omega_i$ : A positive number that represents client i’s weight or priority
$\theta_j$ : Water-fill level at BS j
C. Background: Conventional PF Allocation in a Single BS
We first describe the basics of the PF resource allocation
that is conventionally implemented in today’s BSs. Consider
a network topology consisting of only a single BS j and
$n'$ clients. Let $r_i$ denote the throughput of client i and
$\omega_i$ a positive number that denotes its weight (or priority).
A widely used objective function for PF is to maximize
$\sum_{i=1}^{n'} \omega_i \log(r_i)$ [11], [12]. It represents a tradeoff between
throughput and fairness among the clients. Let $\lambda_i$ denote the
time fraction allocated to client i by BS j. To maximize the
PF objective function, the BS needs to solve the following
problem
$$\textbf{P1}: \quad \max \sum_{i=1}^{n'} \omega_i \log(R_{i,j} \lambda_i)$$
$$\text{s.t.} \quad \sum_{i=1}^{n'} \lambda_i \le 1$$
$$\text{variables:} \quad \lambda_i \ge 0$$
Problem P1 can be easily solved through a simple algorithm. The Lagrangian of P1 can be expressed as
$$L(\lambda, \mu) = \sum_{i=1}^{n'} \omega_i \log(R_{i,j} \lambda_i) + \mu \Big(1 - \sum_{i=1}^{n'} \lambda_i\Big) \tag{4}$$
where µ is a constant number (Lagrange multiplier) chosen
to meet the time resource constraint. Differentiating with
respect to time fraction resource $\lambda_i$ and setting to zero gives
$$\frac{R_{i,j}\,\omega_i}{R_{i,j}\,\lambda_i} - \mu = 0 \implies \frac{\omega_i}{\lambda_i} = \mu \quad \forall i \in \{1, ..., n'\} \tag{5}$$
Since the sum of time fractions at optimality is equal to 1,
we can conclude from Eq. (5) that $\mu = \sum_i \omega_i$. With known µ
and $\omega_i$, we can derive the $\lambda_i$s from Eq. (5).
Now, let $\theta_j$ be defined as $\frac{1}{\mu}$. Leveraging Eq. (5), we have
$$\frac{\lambda_i}{\omega_i} = \theta_j \ \ \forall i \in \{1, ..., n'\} \implies \frac{r_i}{\omega_i R_{i,j}} = \theta_j \ \ \forall i \in \{1, ..., n'\} \tag{6}$$
Eq. (6) has an interesting water-filling based interpretation:
the time allocated to each client is such that the throughput
of the client divided by its PHY rate times its weight is the
same across all clients. We refer to this ratio (i.e., θj ) as the
water-fill level of BS j. In the next section, we will turn this
observation in a single BS into a distributed resource allocation
algorithm in HetNets.
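The closed form implied by Eqs. (5) and (6) can be checked numerically. The following is a minimal sketch, not taken from the paper: the function name `single_bs_pf` and the example weights and PHY rates are assumptions chosen for illustration. It computes λi = ωi / Σ ωi and confirms that every client ends up at the same water-fill level θj = 1/µ.

```python
def single_bs_pf(weights, rates):
    """Closed-form solution of P1 for one BS (hypothetical helper).

    weights[i] is omega_i and rates[i] is R_{i,j} for the single BS j.
    From Eq. (5), mu = sum(omega), so lambda_i = omega_i / mu.
    """
    mu = sum(weights)
    lam = [w / mu for w in weights]             # time fractions, sum to 1
    theta = 1.0 / mu                            # water-fill level, Eq. (6)
    thr = [l * r for l, r in zip(lam, rates)]   # r_i = lambda_i * R_{i,j}
    return lam, theta, thr


# Illustrative values only (not from the paper).
weights = [1.0, 2.0, 1.0]
rates = [10.0, 20.0, 40.0]
lam, theta, thr = single_bs_pf(weights, rates)

# Every client satisfies r_i / (omega_i * R_{i,j}) == theta, as in Eq. (6).
levels = [t / (w * r) for t, w, r in zip(thr, weights, rates)]
print(lam, theta, levels)
```

Note that the time fractions depend only on the weights; the PHY rates affect the resulting throughput but not the split.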
D. Distributed Resource Allocation in HetNets
There are two approaches to designing a resource allocation
algorithm for generic HetNets. One approach, as we show in
the Appendix, is to extend the formulation in P1 to include
multiple BSs and client RATs, and use dual decomposition
to derive a distributed algorithm. This approach converges to
the optimal solution; however, the Lagrange multipliers across
BSs would no longer correspond to BSs’ water-fill levels.
The second approach is to directly generalize the water-filling
interpretation to derive an alternative algorithm, which still
converges to the optimal solution (Section V) with far less
overhead, convergence time, and complexity than the dual
decomposition based algorithm (Section VI-C).
From Eq. (6), we observe that in a network with only a
single BS, the BS allocates its time resources so that the clients
who get the time resources reach the same water-fill level (i.e.,
throughput divided by $\omega_i R_{i,j}$). Thus, in generic HetNets, if
each BS considers the total throughput of each client across all
its RATs (i.e., $r_i$) divided by $\omega_i R_{i,j}$ in its water-fill definition,
this should lead to a fair distributed algorithm. In other words,
each BS j should share its time resources across its clients
such that: (1) all clients who get the time resources reach the
same water-fill level at BS j (i.e., $\theta_j$), and (2) if a client (e.g.,
$i'$) does not get any time resources from BS j, its $\frac{r_{i'}}{\omega_{i'} R_{i',j}}$ is
greater than $\theta_j$. Fig. 2 illustrates this operation.
Fig. 2. There are 4 clients with non-zero PHY rates to BS j. Blue boxes
denote contributions to $\frac{r_i}{\omega_i R_{i,j}}$ by BS j (when it allocates time resources)
and white boxes show contributions to it by other BSs. BS j allocates
its time resources so that all clients that get resources achieve the same
water-fill level ($\theta_j$). Clients that do not get any resources from BS j have
a higher $\frac{r_i}{\omega_i R_{i,j}}$ than $\theta_j$. Client $i_3$ is one such client in this example.
We next turn this idea into a distributed resource allocation
algorithm. Consider slotted time for now. Algorithm AFRA
(Fig. 3) summarizes the steps that are autonomously executed
by each BS j. There are three main steps in the algorithm: (i)
clients are sorted based on the total throughput they receive
from other BSs ($r'_i$) divided by $\omega_i R_{i,j}$ (Line 3), (ii) BS j
finds the water-fill level ($\theta_j$) and allocates the time resources
accordingly (Line 4), and (iii) finally we introduce a randomization parameter to limit concurrent resource adaptation of a
single client by multiple BSs (Line 5).
Fig. 3. Resource allocation algorithm autonomously run by each BS j.
We next elaborate on how each BS j finds its water-fill level
and its clients’ time resource fractions (Line 4). Let $n'$ denote
the number of clients such that $R_{i,j} > 0$. Let $r'_i$ denote the
total throughput of client i from all BSs other than j. Consider
an ordering in clients’ $\frac{r'_i}{\omega_i R_{i,j}}$ according to Line 3 of AFRA.
In order to solve the water-fill problem (i.e., Line 4 of AFRA),
we need to find the water-fill level $\theta_j$, client index k, and time
fractions $\lambda_{i,j}$s such that
$$\frac{r'_1 + \lambda_{1,j} R_{1,j}}{\omega_1 R_{1,j}} = \frac{r'_2 + \lambda_{2,j} R_{2,j}}{\omega_2 R_{2,j}} = ... = \frac{r'_k + \lambda_{k,j} R_{k,j}}{\omega_k R_{k,j}} = \theta_j \tag{7}$$
$$\frac{r'_k}{\omega_k R_{k,j}} < \theta_j \le \frac{r'_{k+1}}{\omega_{k+1} R_{k+1,j}} \tag{8}$$
$$\sum_{i=1}^{k} \lambda_{i,j} = 1, \quad \lambda_{i,j} > 0 \tag{9}$$
We can find these variables with a simple set of linear operations. First, we can find k by checking a set of inequalities
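$$\frac{\frac{r'_2\,\omega_1 R_{1,j}}{\omega_2 R_{2,j}} - r'_1}{R_{1,j}} \ge 1 \implies k = 1 \quad \text{else}$$
$$\frac{\frac{r'_3\,\omega_1 R_{1,j}}{\omega_3 R_{3,j}} - r'_1}{R_{1,j}} + \frac{\frac{r'_3\,\omega_2 R_{2,j}}{\omega_3 R_{3,j}} - r'_2}{R_{2,j}} \ge 1 \implies k = 2 \quad \text{else}$$
$$...$$
$$\frac{\frac{r'_{n'}\,\omega_1 R_{1,j}}{\omega_{n'} R_{n',j}} - r'_1}{R_{1,j}} + ... + \frac{\frac{r'_{n'}\,\omega_{n'-1} R_{n'-1,j}}{\omega_{n'} R_{n',j}} - r'_{n'-1}}{R_{n'-1,j}} \ge 1 \implies k = n' - 1 \quad \text{else} \quad k = n'$$
In the first inequality, we first check if $\frac{r'_1 + R_{1,j}}{\omega_1 R_{1,j}} \le \frac{r'_2}{\omega_2 R_{2,j}}$.
If this is true, from Eq. (7) we conclude that client 2 would
have a higher $\frac{r'_2}{\omega_2 R_{2,j}}$ than $\frac{r_1}{\omega_1 R_{1,j}}$ even if BS j allocated all
its time resources to client 1 (i.e., to the client with minimum
$\frac{r'_i}{\omega_i R_{i,j}}$ across all $n'$ clients). As a result k should be equal to
1. This procedure (and logic) is continued until k is found.
With known k, we can find $\theta_j$ by combining Eqs. (7) and
(9) and solving the following linear equation
$$\sum_{i=1}^{k} \frac{\theta_j\,\omega_i R_{i,j} - r'_i}{R_{i,j}} = 1 \tag{10}$$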
With known k and θj , the λi,j s can be found from Eq. (7).
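The water-fill step described by Eqs. (7)-(10) can be sketched as follows. This is an illustrative reading of the text rather than the authors' implementation: the function name `water_fill` and the example inputs are assumptions, and the sketch finds k with a linear scan over the sorted clients instead of a binary search. For each candidate k it solves Eq. (10) for θj, which gives θj = (1 + Σ r′i / Ri,j) / Σ ωi over the first k clients, and stops once Eq. (8) holds.

```python
def water_fill(r_other, weights, rates):
    """Water-fill step for one BS j (hypothetical helper).

    r_other[i]: throughput r'_i that client i gets from all other BSs
    weights[i]: omega_i, rates[i]: R_{i,j} > 0
    Returns (theta_j, list of time fractions lambda_{i,j}).
    """
    n = len(rates)
    # Sort clients by r'_i / (omega_i * R_{i,j}) (Line 3 in the text).
    order = sorted(range(n), key=lambda i: r_other[i] / (weights[i] * rates[i]))
    sum_w = 0.0    # running sum of omega_i over the first k clients
    sum_rr = 0.0   # running sum of r'_i / R_{i,j} over the first k clients
    theta = 0.0
    k = 0
    for pos, i in enumerate(order):
        sum_w += weights[i]
        sum_rr += r_other[i] / rates[i]
        theta = (1.0 + sum_rr) / sum_w   # Eq. (10) solved for theta_j
        k = pos + 1
        if k < n:
            nxt = order[k]
            # Eq. (8): stop when the next client already sits at or above theta_j.
            if theta <= r_other[nxt] / (weights[nxt] * rates[nxt]):
                break
    lam = [0.0] * n
    for i in order[:k]:
        # Eq. (7): lift each served client up to the common level theta_j.
        lam[i] = (theta * weights[i] * rates[i] - r_other[i]) / rates[i]
    return theta, lam


# Illustrative input: the third client already gets 30 units elsewhere.
theta, lam = water_fill([0.0, 0.0, 30.0], [1.0, 1.0, 1.0], [10.0, 20.0, 10.0])
print(theta, lam)
```

In this example the third client already sits above the resulting water-fill level, so it receives no time from this BS and the other two clients split the time equally.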
AFRA’s Computational Complexity and Message Passing Overhead. We calculate AFRA’s computational complexity in finding the new time resource fractions (λi,j s) for a BS j.
Let $n'$ denote the number of clients with non-zero PHY rates to
j. The complexity of sorting clients (Line 3) is $O(n' \log(n'))$.
The complexity of finding the water-fill level and the new
time resource fractions (Line 4) is $O(n' \log(n'))$ (with a
binary search to find k). Thus, the overall computational
complexity is $O(n' \log(n'))$. If we assume that each client has
on average K RATs, then on average $n'$ would be equal to
$\frac{KN}{M}$. Thus, the computational complexity would also be equal
to $O\big(\frac{KN}{M} \log(\frac{KN}{M})\big)$.
Each BS uses the total throughput of each client across all
its RATs in its calculations to find the water-fill level and the
new λi,j s. Each time a client’s time resource (and hence total
throughput) is changed, the client needs to inform all BSs to
which it is connected about its new total throughput. Thus, the
total message passing overhead generated by clients of a single
BS is at most equal to $O(n' K)$, or alternatively $O\big(\frac{K^2 N}{M}\big)$.
IV. CONVERGENCE AND SPEED OF AFRA
In this section, we investigate the convergence properties
of AFRA. We first show that as BSs autonomously execute
AFRA, the system converges to an equilibrium. Next, we
investigate the convergence time properties of AFRA and
provide tight bounds to quantify it.
A. Convergence to an Equilibrium
Before we discuss convergence, we present a formal definition of an equilibrium.
Definition 1 Equilibrium: The vector of time fractions across
all the BSs and clients is an equilibrium outcome if none of the
BSs can increase its water-fill level through unilateral change
of its time resource allocations.
Our next theorem guarantees the convergence of AFRA.
Theorem 1 Let each BS autonomously execute AFRA.
Then, the system converges to an equilibrium, i.e., $\forall i \in \mathcal{N}$
and $j \in \mathcal{M}$: $\lambda_{i,j} \to \lambda_{i,j}^{eq}$, $\theta_j \to \theta_j^{eq}$, and $r_i \to r_i^{eq}$.
Proof: Let $\lambda$ denote the vector of time fractions ($\lambda_{i,j}$s)
across all clients and BSs, and $f(\lambda) = \sum_{i=1}^{N} \omega_i \log(r_i)$ be the
potential function. A potential function [26] is a useful tool
to analyze equilibrium properties, as it maps the payoff (e.g.,
throughput) of all clients into a single function.
Since the number of clients and BSs is finite, f is bounded.
The key step to prove convergence is to show that each time a
BS j adjusts its time fractions (i.e., $\lambda_{i,j}$s), the potential function (f) increases. This property, coupled with f’s boundedness,
guarantees its convergence. We will show later in Eq. (15) that
the change in potential function is proportional to the product
of the change in water-fill levels and the change in $\lambda_{i,j}$s. Since
f converges (i.e., its variations converge to 0), one or both of
these terms should converge to 0. Either of these conditions
guarantees the convergence of the $\lambda_{i,j}$s (and hence, $\theta_j$s and $r_i$s).
Next, we show that each time a BS runs AFRA, f increases.
When a BS runs AFRA, it takes some time resources from
clients with high $\frac{r_i}{\omega_i R_{i,j}}$ and distributes them across clients
with lower values. To ease the proof presentation, we focus
on two clients and follow the changes on f as the BS adjusts
the $\lambda_{i,j}$s dedicated to these clients.
Let i and $i'$ denote two clients who are currently receiving time
resources from BS j. Assume the following initial (old) order
between these two clients
$$\frac{r_i}{\omega_i R_{i,j}} < \frac{r_{i'}}{\omega_{i'} R_{i',j}} \tag{11}$$
Therefore, as BS j executes AFRA it changes the time resources from $\lambda_{i,j}$ and $\lambda_{i',j}$ to $\lambda_{i,j} + \delta$ and $\lambda_{i',j} - \delta$, respectively. This
only changes the two corresponding terms in the potential
function, i.e.
$$f(\lambda)^{new} - f(\lambda)^{old} = \omega_i \log(r_i + \delta R_{i,j}) - \omega_i \log(r_i) + \omega_{i'} \log(r_{i'} - \delta R_{i',j}) - \omega_{i'} \log(r_{i'}) = \omega_i \log\Big(1 + \delta \frac{R_{i,j}}{r_i}\Big) + \omega_{i'} \log\Big(1 - \delta \frac{R_{i',j}}{r_{i'}}\Big) \tag{12}$$
Let $g(\delta)$ denote the variation in potential function, i.e.
$$g(\delta) = \omega_i \log\Big(1 + \delta \frac{R_{i,j}}{r_i}\Big) + \omega_{i'} \log\Big(1 - \delta \frac{R_{i',j}}{r_{i'}}\Big) \tag{13}$$
Thus, to prove convergence, we need to prove that $g(\delta)$ is
always positive. We prove this by showing that first $g'(\delta) \ge 0$.
This shows that $g(\delta)$ is always non-decreasing. Second, we
show that $g(\delta)$ is positive for very small values of δ. Now
$$g'(\delta) = \omega_i \frac{\frac{R_{i,j}}{r_i}}{1 + \delta \frac{R_{i,j}}{r_i}} - \omega_{i'} \frac{\frac{R_{i',j}}{r_{i'}}}{1 - \delta \frac{R_{i',j}}{r_{i'}}} = \frac{\omega_i R_{i,j}}{r_i + \delta R_{i,j}} - \frac{\omega_{i'} R_{i',j}}{r_{i'} - \delta R_{i',j}} = \frac{\omega_i R_{i,j}}{r_i^{new}} - \frac{\omega_{i'} R_{i',j}}{r_{i'}^{new}} \ge 0 \tag{14}$$
Here $r_i^{new}$ and $r_{i'}^{new}$ are the new throughput values for clients
i and $i'$, respectively. It is clear that after BS j adjusts the time
resources, we still have $\frac{r_i^{new}}{\omega_i R_{i,j}} \le \frac{r_{i'}^{new}}{\omega_{i'} R_{i',j}}$. This is because after
BS j reduces $\lambda_{i',j}$, $\frac{r_{i'}^{new}}{\omega_{i'} R_{i',j}}$ would be either equal to the new
water-fill level or higher than it (if $\lambda_{i',j} = 0$). On the other
hand, $\frac{r_i^{new}}{\omega_i R_{i,j}}$ would be equal to the new water-fill level. As a
result, the final term in Eq. (14) is non-negative. Finally, $g(\delta)$
is greater than zero for small values of δ because
$$g(\delta) \overset{\text{Taylor Approx}}{\approx} \omega_i \delta \frac{R_{i,j}}{r_i} - \omega_{i'} \delta \frac{R_{i',j}}{r_{i'}} = \delta \Big(\frac{\omega_i R_{i,j}}{r_i} - \frac{\omega_{i'} R_{i',j}}{r_{i'}}\Big) > 0 \tag{15}$$
The last term in the above equation is due to Eq. (11).
B. Convergence Time
Before we can derive a bound on convergence time, we
need to define a discretization factor on the time fractions
(i.e., $\lambda_{i,j}$s). This technicality is due to the fact that $\lambda_{i,j}$s in
our model are continuous variables, which can cause some
BSs to continuously make infinitesimal adjustments to them.
These adjustments converge to 0 as time goes to infinity.
In practice, operations always happen in discretized levels.
For example, consider the following discretization policy:
Definition 2 Discretization Policy: During water-fill calculation by a BS j in AFRA, the time fraction allocated to the client
with minimum $\frac{r_i}{\omega_i R_{i,j}}$ should increase by at least $\epsilon$. Otherwise,
the BS would not update its time fractions.
Based on the above discretization policy, we can derive the
following bound on the convergence time.
Theorem 2 Consider a HetNet with N clients and M
BSs. Then, the number of steps that it takes for AFRA
to converge is upper bounded by $O\big(\frac{N M^2 \log(MN)}{\epsilon^2}\big)$.
Proof: Let $f(\lambda) = \sum_{i=1}^{N} \omega_i \log(r_i)$ be the potential
function from the proof of Theorem 1. To compute a bound
on the convergence time, we study the increments of f. The
key step is to find a lower bound on f’s increments. Since
f increases whenever a BS makes adjustments to its $\lambda_{i,j}$s,
the convergence time is then upper bounded by the difference
between the maximum and minimum possible values of f
divided by the lower bound on f’s increments.
We take the following steps to find a lower bound on the
potential function’s increments. Let $\{i_1, i_2, ..., i_q\}$ denote the
set of clients with non-zero PHY rates to BS j and assume
the following initial (old) order among the clients
$$\frac{r_{i_1}^{old}}{\omega_{i_1} R_{i_1,j}} \le \frac{r_{i_2}^{old}}{\omega_{i_2} R_{i_2,j}} \le ... \le \frac{r_{i_q}^{old}}{\omega_{i_q} R_{i_q,j}} \tag{16}$$
When BS j executes AFRA, it adjusts the time fractions in
a way that increases the time resources allocated to client $i_1$.
Let $\epsilon_{i_1}$ denote the increase in client $i_1$’s time resources and
$r_{i_1}^{new} = r_{i_1}$ its new throughput. Let $\epsilon_{i_p}$ denote the change in
client $i_p$’s ($i_p \in \{i_2, ..., i_q\}$) time resources and $r_{i_p}^{new}$ its new
throughput. Hence, we have
$$r_{i_1}^{new} = r_{i_1}, \quad r_{i_1}^{old} = r_{i_1} - \epsilon_{i_1} R_{i_1,j} \tag{17}$$
$$r_{i_p}^{new} = r_{i_p}, \quad r_{i_p}^{old} = r_{i_p} + \epsilon_{i_p} R_{i_p,j} \quad \forall i_p \in \{i_2, ..., i_q\} \tag{18}$$
$$\epsilon_{i_1} = \epsilon_{i_2} + \epsilon_{i_3} + ... + \epsilon_{i_q} \tag{19}$$
However, even after BS j adjusts its time resources, $i_1$
would still have the minimum $\frac{r_i}{\omega_i R_{i,j}}$ across all clients. This
is due to the water-fill based operation in AFRA. As a result
$$\frac{r_{i_1}}{\omega_{i_1} R_{i_1,j}} \le \frac{r_{i_p}}{\omega_{i_p} R_{i_p,j}} \quad \forall i_p \in \{i_2, ..., i_q\} \tag{20}$$
$$\implies \frac{R_{i_p,j}}{r_{i_p}} \le \frac{\omega_{i_1} R_{i_1,j}}{\omega_{i_p} r_{i_1}} \tag{21}$$
Next, we find a lower bound on the potential function’s
increments
$$f(\lambda)^{old} - f(\lambda)^{new} = \omega_{i_1} \log\Big(1 - \frac{\epsilon_{i_1} R_{i_1,j}}{r_{i_1}}\Big) + \sum_{p=2}^{q} \omega_{i_p} \log\Big(1 + \frac{\epsilon_{i_p} R_{i_p,j}}{r_{i_p}}\Big) \overset{\text{Eq. (21)}}{\le} \omega_{i_1} \log\Big(1 - \frac{\epsilon_{i_1} R_{i_1,j}}{r_{i_1}}\Big) + \sum_{p=2}^{q} \omega_{i_p} \log\Big(1 + \frac{\epsilon_{i_p} R_{i_1,j}\,\omega_{i_1}}{r_{i_1}\,\omega_{i_p}}\Big) \tag{22}$$
Let $W = \sum_{p=2}^{q} \omega_{i_p}$ and $x_p = \frac{\epsilon_{i_p} R_{i_1,j}\,\omega_{i_1}}{r_{i_1}\,\omega_{i_p}}$. Since the
logarithm is a concave function, from Jensen’s inequality [27],
$$\sum_{p=2}^{q} \omega_{i_p} \log(1 + x_p) = W \sum_{p=2}^{q} \frac{\omega_{i_p}}{W} \log(1 + x_p) \le W \log\Big(\sum_{p=2}^{q} \Big(\frac{\omega_{i_p}}{W} + \frac{\omega_{i_p}}{W} x_p\Big)\Big) = W \log\Big(1 + \sum_{p=2}^{q} \frac{\omega_{i_p}}{W} x_p\Big) \tag{23}$$
Leveraging Eq. (23), we conclude that Eq. (22) is
$$\le \omega_{i_1} \log\Big(1 - \frac{\epsilon_{i_1} R_{i_1,j}}{r_{i_1}}\Big) + W \log\Big(1 + \frac{\omega_{i_1} R_{i_1,j}}{W r_{i_1}} \overbrace{\sum_{p=2}^{q} \epsilon_{i_p}}^{=\epsilon_{i_1}}\Big) = \omega_{i_1} \Big[\log(1 - z) + \gamma \log\Big(1 + \frac{z}{\gamma}\Big)\Big] \overset{\text{Taylor Series}}{\le} -\omega_{i_1} \frac{z^2}{2}$$
$$\implies f(\lambda)^{new} - f(\lambda)^{old} \ge \omega_{i_1} \frac{z^2}{2} \tag{24}$$
7
R
1 ,j
and γ = ωWi . Note that since we seek
where z = i1 rii1
1
an upper bound on convergence time, we can choose a small
enough i1 so that z, γz < 1. These assumptions increase the
upper bound but allow us to use the Taylor series in Eq. (24).
If we let Rmin and Rmax denote the minimum and maximum
PHY rates across all the clients and BSs, then we have
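$$
\begin{aligned}
\text{Convergence Time} &\le \frac{\max f(\lambda) - \min f(\lambda)}{\frac{1}{2}\,\epsilon^{2}\,\omega_{\min}\left(\frac{R_{i_1,j}}{r_{i_1}}\right)^{2}} \\
&\le \frac{\left(\sum_{i=1}^{N}\omega_i\right)\left(\log(r_{\max}) - \log(r_{\min})\right)}{\frac{1}{2}\,\epsilon^{2}\,\omega_{\min}\left(\frac{R_{\min}}{M R_{\max}}\right)^{2}} \\
&\le \frac{\left(\sum_{i}\omega_i\right)\left(\log(M R_{\max}) - \log\!\left(\frac{\omega_{\min}}{\sum_{i}\omega_i} R_{\min}\right)\right)}{\frac{1}{2}\,\epsilon^{2}\,\omega_{\min}\left(\frac{R_{\min}}{M R_{\max}}\right)^{2}} \\
&\equiv O\!\left(\frac{N M^{2}\log(MN)}{\epsilon^{2}}\right)
\end{aligned}
\tag{25}
$$

V. OPTIMALITY OF AFRA
Beyond convergence, we study the optimality properties
of AFRA’s equilibria. We first derive some useful properties
of the equilibria that we leverage for optimality analysis.
Next, we prove that the equilibria also maximize the global
proportional fair resource allocation problem across all the
BSs, and hence are globally optimal. Finally we discuss
the uniqueness of the equilibria and prove that while the
equilibrium throughput vector across all the clients is unique,
there could be infinitely many resource allocations that realize
this outcome. For simplicity, we do not consider discretization
in this section.

Theorem 3. Consider an equilibrium outcome of AFRA.
Let $r_i^{eq}$ denote the throughput of client i, $\theta_j^{eq}$ the water-fill
level of BS j, and $\lambda_{i,j}^{eq}$ the fraction of time allocated to
client i by BS j. Then
I. $\frac{\omega_i R_{i,j}}{r_i^{eq}} \le \frac{1}{\theta_j^{eq}} \quad \forall i \in N, j \in M$
II. $\sum_{i=1}^{N} \lambda_{i,j}^{eq} = 1 \quad \forall j \in M$
III. $\sum_{i=1}^{N} \omega_i = \sum_{j=1}^{M} \frac{1}{\theta_j^{eq}}$

Proof: Part 1. From the water-fill definition we have

$$
R_{i,j},\ \lambda_{i,j}^{eq} > 0 \implies \frac{r_i^{eq}}{\omega_i R_{i,j}} = \theta_j^{eq}
\tag{26}
$$

$$
R_{i,j} > 0,\ \lambda_{i,j}^{eq} = 0 \implies \frac{r_i^{eq}}{\omega_i R_{i,j}} \ge \theta_j^{eq}
\tag{27}
$$

Property I follows from Eqs. (26) and (27).
Part 2. Every BS can always increase its water-fill level
by distributing its unused time resources across its clients.
The property follows, since at equilibrium the water-fill levels
cannot be further increased.
Part 3. We leverage I and II to derive property III as
follows

$$
\begin{aligned}
\sum_{i=1}^{N}\omega_i
&= \sum_{i=1}^{N}\frac{\omega_i r_i^{eq}}{r_i^{eq}}
= \sum_{i=1}^{N}\sum_{j=1}^{M}\frac{\omega_i \lambda_{i,j}^{eq} R_{i,j}}{r_i^{eq}}
= \sum_{\lambda_{i,j}^{eq} > 0}\frac{\omega_i \lambda_{i,j}^{eq} R_{i,j}}{r_i^{eq}} \\
&\overset{\text{Eq. (26)}}{=} \sum_{\lambda_{i,j}^{eq} > 0}\frac{\lambda_{i,j}^{eq}}{\theta_j^{eq}}
\overset{\text{II}}{=} \sum_{j=1}^{M}\frac{1}{\theta_j^{eq}}
\end{aligned}
\tag{28}
$$

We next show that any equilibrium outcome of AFRA is
globally optimal, i.e., it maximizes the global PF resource
allocation problem.

Theorem 4. Consider an equilibrium outcome of AFRA.
Then, the equilibrium outcome also maximizes the
global PF resource allocation problem, i.e., it maximizes
$\sum_{i=1}^{N} \omega_i \log(r_i)$ subject to the feasibility constraints in
Eqs. (1)-(3).

Proof: Let $r_i^{eq}$ and $\theta_j^{eq}$ denote the throughput of client i
and water-fill level of BS j at an equilibrium, respectively.
We prove that for any feasible selection of $\lambda_{i,j}$s (i.e., $\lambda_{i,j}$s
that satisfy the feasibility conditions in Eqs. (2) and (3))
and the corresponding clients’ throughput values (i.e., $r_i$s as
defined in Eq. (1)) we have

$$
\sum_{i=1}^{N}\omega_i\log(r_i) \le \sum_{i=1}^{N}\omega_i\log(r_i^{eq})
\tag{29}
$$

Define $W = \sum_{i=1}^{N} \omega_i$. Eq. (29) can then be proved through
the following inequalities by leveraging properties I and
III from Theorem 3:

$$
\begin{aligned}
\sum_{i=1}^{N}\omega_i\log(r_i) - \sum_{i=1}^{N}\omega_i\log(r_i^{eq})
&= \sum_{i=1}^{N}\omega_i\log\!\left(\frac{r_i}{r_i^{eq}}\right)
= W\sum_{i=1}^{N}\frac{\omega_i}{W}\log\!\left(\frac{r_i}{r_i^{eq}}\right) \\
&\overset{\text{Jensen Inequality}}{\le} W\log\!\left(\sum_{i=1}^{N}\left(\frac{\omega_i}{W}\times\frac{r_i}{r_i^{eq}}\right)\right)
= W\log\!\left(\frac{1}{W}\times\sum_{i=1}^{N}\frac{\omega_i r_i}{r_i^{eq}}\right) \\
&= W\log\!\left(\frac{1}{W}\times\sum_{i=1}^{N}\sum_{j=1}^{M}\frac{\omega_i\lambda_{i,j}R_{i,j}}{r_i^{eq}}\right)
\overset{\text{I}}{\le} W\log\!\left(\frac{1}{W}\times\sum_{i=1}^{N}\sum_{j=1}^{M}\frac{\lambda_{i,j}}{\theta_j^{eq}}\right) \\
&\overset{\text{Eq. (2)}}{\le} W\log\!\left(\frac{1}{W}\times\sum_{j=1}^{M}\frac{1}{\theta_j^{eq}}\right)
\overset{\text{III}}{=} W\log\!\left(\frac{1}{W}\times\sum_{i=1}^{N}\omega_i\right) = 0
\end{aligned}
\tag{30}
$$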
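As a quick sanity check of Eq. (29), the sketch below considers the special case of a single BS (M = 1), where we assume the well-known closed-form PF allocation $\lambda_i = \omega_i / W$. The weights, PHY rates, and helper names are hypothetical and chosen only for illustration; the snippet samples random feasible allocations and records the largest objective gap relative to the closed-form allocation, which Eq. (29) predicts is never positive.

```python
import math
import random

def pf_objective(weights, rates, fractions):
    """Weighted PF objective sum_i w_i * log(lambda_i * R_i) for one BS."""
    return sum(w * math.log(lam * r) for w, r, lam in zip(weights, rates, fractions))

def single_bs_equilibrium(weights):
    """Assumed single-BS closed form: lambda_i = w_i / W."""
    total = sum(weights)
    return [w / total for w in weights]

def random_feasible_fractions(n, rng):
    """Draw strictly positive time fractions whose sum is at most 1."""
    raw = [rng.uniform(0.01, 1.0) for _ in range(n)]
    scale = rng.uniform(0.5, 1.0) / sum(raw)
    return [x * scale for x in raw]

weights = [1.0, 2.0, 3.0]   # hypothetical omega_i
rates = [5.5, 11.0, 2.0]    # hypothetical PHY rates R_i in Mbps
lam_eq = single_bs_equilibrium(weights)
best = pf_objective(weights, rates, lam_eq)

rng = random.Random(0)
gaps = [pf_objective(weights, rates, random_feasible_fractions(3, rng)) - best
        for _ in range(1000)]
max_gap = max(gaps)         # largest (candidate - equilibrium) objective gap

# Property III with M = 1: theta = r_i / (w_i * R_i), so 1 / theta should equal W.
theta = lam_eq[0] * rates[0] / (weights[0] * rates[0])
```

In this special case the water-fill level is the same for every client, so computing `theta` from the first client is sufficient.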
In our last theorem we prove that while the equilibrium
throughput vector across all clients is unique, there could be
infinitely many resource allocations that realize this outcome.
Theorem 5. Let $r^{eq} = (r_1^{eq}, ..., r_N^{eq})$ denote the vector
of throughput rates across all clients at an equilibrium.
Then, $r^{eq}$ is unique. However, there could be infinitely
many resource allocations across the BSs that realize $r^{eq}$.
Proof: Part 1. We first prove that $r^{eq}$ is unique. Let
$r^{eq}$ maximize the global proportional-fair resource allocation
across all clients and assume $r'^{eq}$ is a different equilibrium.
From Theorem 4, we know that every other equilibrium should
also maximize the global PF resource allocation. This means
that all inequalities in Eq. (30) should be equalities for any
equilibrium, including $r'^{eq}$. Now, for the first inequality to be
an equality (i.e., the Jensen inequality of Eq. (30)), the following
condition needs to be satisfied [27]

$$
\frac{r_1^{eq}}{r_1'^{eq}} = \frac{r_2^{eq}}{r_2'^{eq}} = \dots = \frac{r_N^{eq}}{r_N'^{eq}}
\implies r_i'^{eq} = \alpha\, r_i^{eq} \quad \forall i \in N
\tag{31}
$$

Further, since $\sum_{i=1}^{N} \omega_i \log(r_i^{eq}) = \sum_{i=1}^{N} \omega_i \log(r_i'^{eq})$, substituting Eq. (31)
gives $\log(\alpha) \sum_{i=1}^{N} \omega_i = 0$, i.e., $\alpha = 1$, and we
conclude that

$$
r_i^{eq} = r_i'^{eq} \quad \forall i \in N
\tag{32}
$$
Part 2. To prove that there could be infinitely many resource
allocations that realize $r^{eq}$, we provide an example. Consider
a topology with two BSs ($j_1$, $j_2$) and two clients ($i_1$, $i_2$). Let
$R_{i_1,j} = 1 \ \forall j \in M$, $R_{i_2,j} = 2 \ \forall j \in M$, and $\omega_{i_1} = \omega_{i_2} = 2$.
Then, $\sum_i \omega_i \log(r_i)$ is maximized by the following time
fractions for any $\alpha \in [0, 1]$.

$$
\begin{aligned}
\lambda_{i,j} &= \alpha && \text{for } i = i_1, j = j_1 \text{ and } i = i_2, j = j_2 \\
\lambda_{i,j} &= 1 - \alpha && \text{for } i = i_1, j = j_2 \text{ and } i = i_2, j = j_1
\end{aligned}
\tag{33}
$$

Here, irrespective of $\alpha$, $r_{i_1} = 1$ and $r_{i_2} = 2$.
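The example of Eq. (33) can be checked numerically. The following sketch (helper names are ours and purely illustrative) evaluates the throughputs for several values of α and compares the resulting objective against a brute-force search over a grid of allocations in which both BSs use all of their time. The grid search is only an approximate check, not a proof.

```python
import math

# PHY rates and weights of the two-BS, two-client example.
R = {("i1", "j1"): 1.0, ("i1", "j2"): 1.0, ("i2", "j1"): 2.0, ("i2", "j2"): 2.0}
OMEGA = {"i1": 2.0, "i2": 2.0}

def allocation(alpha):
    """Time fractions of Eq. (33) for a given alpha in [0, 1]."""
    return {("i1", "j1"): alpha, ("i2", "j2"): alpha,
            ("i1", "j2"): 1.0 - alpha, ("i2", "j1"): 1.0 - alpha}

def throughputs(lam):
    """Per-client throughput r_i = sum_j lambda_{i,j} * R_{i,j}."""
    return {i: sum(lam[(i, j)] * R[(i, j)] for j in ("j1", "j2")) for i in ("i1", "i2")}

def objective(r):
    """Weighted PF objective sum_i omega_i * log(r_i)."""
    return sum(OMEGA[i] * math.log(r[i]) for i in r)

def grid_best(steps=50):
    """Best objective over a grid of allocations where each BS is fully used."""
    best = -math.inf
    for a in range(steps + 1):          # share of j1 given to i1
        for b in range(steps + 1):      # share of j2 given to i1
            lam = {("i1", "j1"): a / steps, ("i2", "j1"): 1 - a / steps,
                   ("i1", "j2"): b / steps, ("i2", "j2"): 1 - b / steps}
            r = throughputs(lam)
            if min(r.values()) > 0:
                best = max(best, objective(r))
    return best

results = [throughputs(allocation(k / 10)) for k in range(11)]
eq_value = objective({"i1": 1.0, "i2": 2.0})
```

Every entry of `results` should equal (1, 2) up to floating-point error, and `grid_best()` should match `eq_value`, which is consistent with the claim that all of these allocations realize the same optimal throughput vector.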
VI. PERFORMANCE EVALUATION
In this section, we evaluate AFRA’s performance through
experiments and simulations. First, we investigate the benefits
of MAC level traffic aggregation in a small testbed composed
of four SDR (software-defined radio)-based BSs and clients.
Next, we conduct simulations to evaluate AFRA’s equilibria
properties as we scale the number of clients and BSs. Finally,
we compare AFRA’s speed and over-the-air signaling overhead
against DDNUM, a dual decomposition based algorithm that
we derived from the NUM framework.
A. SDR-Based Implementation and Real-World Performance
Implementation. We construct a HetNet topology composed of a WiFi BS, a cellular BS, and two clients. The
two BSs are physically separated from each other and are
placed in an indoor lab environment (Fig. 4(a)). We use a
WARP board [28] with 802.11a reference design as our WiFi
BS. We use another WARP board with OFDM PHY (WARP
OFDM reference design) and a custom TDMA (Time Division
Multiple Access) MAC to mimic a cellular BS. We use two
other WARP boards to construct our two clients. Each client
has access to both WiFi and cellular radios, and remains static
and connected to both BSs throughout the experiments.
A server running iPerf sessions is connected to both BSs
through Ethernet. For each client, the server generates a single
fully-backlogged UDP traffic flow with 500 byte packets.
We implement a below-IP sublayer to split this traffic flow
between the two BSs. This sublayer is responsible for selection
of the BS to be used for each packet, and acts similar to the
LWA Adaptation Protocol (LWAAP) in the LWA standard [2].
In our implementation, we sequentially iterate between the
WiFi and cellular BSs to route the packets of each traffic flow.
AFRA, as presented in Section III-D, does not account for
various types of overhead (e.g., PHY/MAC header, ACKs, idle
slots, collisions) that exist in PHY/MAC protocols. To address
this issue, we introduce the notion of effective rate ($R^{eff}$) and
replace all $R_{i,j}$s in AFRA with $R^{eff}_{i,j}$s. For a single packet, $R^{eff}$
can be calculated as the number of bits in the packet divided
by the total time a BS takes to successfully transmit
that packet (including all overhead). In our implementation,
each BS keeps track of the total time spent in successfully
transmitting the past 5 packets of each traffic flow (i.e., the past
5 packets of each client) to calculate its $R^{eff}_{i,j}$. The averaging
over 5 packets is to account for channel fluctuations in our
experiments, and can be adjusted based on the client mobility.
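The effective-rate bookkeeping described above can be sketched as follows. This is a minimal illustration, not the code running on the WARP boards: the class name, method names, and the choice to divide total bits by total time over the window are our assumptions.

```python
from collections import deque

class EffectiveRateEstimator:
    """Tracks R_eff for one (client, BS) pair over the last `window` delivered packets."""

    def __init__(self, window=5):
        # Each sample is (payload_bits, airtime_seconds); old samples fall off automatically.
        self._samples = deque(maxlen=window)

    def record(self, payload_bits, airtime_seconds):
        """Log one successfully delivered packet and the total time it took, overhead included."""
        if airtime_seconds <= 0:
            raise ValueError("airtime must be positive")
        self._samples.append((payload_bits, airtime_seconds))

    def rate_bps(self):
        """Total bits divided by total time across the window, or None before any packet."""
        if not self._samples:
            return None
        bits = sum(b for b, _ in self._samples)
        seconds = sum(t for _, t in self._samples)
        return bits / seconds

# Hypothetical usage with 500-byte (4000-bit) packets.
estimator = EffectiveRateEstimator(window=5)
estimator.record(4000, 0.002)            # one slow packet
for _ in range(5):
    estimator.record(4000, 0.0005)       # five faster packets push the slow one out
current_rate = estimator.rate_bps()
```

A larger `window` smooths channel fluctuations more aggressively, while a smaller one reacts faster to client mobility.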
We implement the following mechanisms: (i) WiFi only: the
cellular BS is off but the WiFi BS is active; (ii) Cellular only:
WiFi BS is off; (iii) AGG-RR: this scheme uses aggregation
but with a round robin (RR) scheduler at the WiFi BS
and conventional PF MAC at the cellular BS. With the RR
scheduler, the WiFi BS maintains a different queue for each
client and sequentially serves a single packet from each queue
at every round. With the PF MAC at the cellular BS, the
BS dedicates its time resources to each client according to
Section III-C (single BS PF); (iv) AFRA: each BS uses its
calculated λi,j s to determine the number of packets that should
be served from each queue in WiFi and the number of time
slots that should be dedicated to each queue (client) in cellular,
at every round. In our implementation, both clients’ ωi are
equal to 1 and the BSs update their λi,j s every 5 ms.
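One possible way to turn time fractions into an integer number of TDMA slots per round at the cellular BS is a largest-remainder split, sketched below. The function name, the `slots_per_round` parameter, and the rounding rule are hypothetical; the text above does not specify how rounding is handled. Deriving WiFi packet counts would additionally require each client's per-packet airtime, which is not modeled here.

```python
import math

def slots_per_client(fractions, slots_per_round):
    """Split an integer number of slots according to time fractions (largest remainder)."""
    if any(f < 0 for f in fractions.values()) or sum(fractions.values()) > 1 + 1e-9:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    ideal = {c: f * slots_per_round for c, f in fractions.items()}
    # The small epsilon guards against floating-point results such as 2.9999999999.
    slots = {c: math.floor(x + 1e-9) for c, x in ideal.items()}
    leftover = max(0, math.floor(sum(ideal.values()) + 1e-9) - sum(slots.values()))
    # Hand the leftover slots to the clients with the largest fractional remainders.
    ranked = sorted(ideal, key=lambda c: ideal[c] - slots[c], reverse=True)
    for c in ranked[:leftover]:
        slots[c] += 1
    return slots

# Hypothetical round of 10 slots shared by three clients.
example = slots_per_client({"client_1": 0.5, "client_2": 0.25, "client_3": 0.25}, 10)
```

With this rule the allocated slots never exceed the round length, and the rounding error per client stays below one slot.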
Performance Results. Fig. 4(c) shows the performance of
the four schemes. In both the WiFi only and Cellular only
options, only a single BS is active throughout the experiments.
We observe that the Cellular only scheme provides a higher
sum throughput than the WiFi only scheme. With careful
evaluation of packet transmission traces, we discovered that
this higher throughput is primarily due to the corresponding
MAC protocols. In particular, WiFi MAC provides the same
transmission opportunity to each traffic flow (client). As a
result, the client with lower PHY rate occupies the channel
for a longer duration than the other client. This decreases the
throughput for both clients. In contrast, the cellular TDMA
MAC provides the same transmission time for both clients
(with 2 clients, single BS PF equally divides the time between
the clients (Eq. 5)). As a result, the throughput of the client
with higher PHY rate does not drop because of the client with
a lower PHY rate. This, along with other MAC issues such as
WiFi contention, reduces the WiFi only throughput.

Fig. 4. We use two WARP boards to construct two BSs in our testbed. The BSs are connected to a server through Ethernet. The server runs a single
fully-backlogged DL UDP iPerf session to each client. A sublayer implementation below the IP layer at the server selects the BS for each packet
of every traffic flow. The clients (not shown in the photo) have access to both radios, and remain static and connected to both BSs throughout the
experiments (a); Cellular TDMA and WiFi MACs. The PHY header and ACKs are sent at a fixed transmission rate. Clients embed the throughput
they receive from the other BS in their ACK packets. The MAC header and payload are transmitted at a variable transmission rate. We define $R^{eff}$
as the total number of payload bits divided by the total time it takes to successfully transmit a packet. We replace all $R_{i,j}$s in AFRA with $R^{eff}_{i,j}$
to derive the $\lambda_{i,j}$s and determine the number of packets that should be served from each queue (b); Total throughput across the two clients for
four schemes: WiFi only (WiFi), Cellular only (Cellular), AGG-RR, and AFRA. AFRA achieves a higher average total throughput (29 Mbps vs 20
Mbps) and PF index (2.3 vs 1.97) compared to AGG-RR (c); Per-client throughput values for both AFRA and AGG-RR (d).

Fig. 5. AFRA’s performance evaluation results. Average number of steps to convergence as a function of number of clients (a) and number of BSs
(b). Evolution of potential function for two simulation scenarios one with M=10, N=10 (c) and the other with M=10, N=20 (d). Each Run in these
figures corresponds to a different simulation realization. In the priority curves (solid black curve with * markers), the BS with the highest local
increase in potential function gets priority in executing AFRA. Leveraging this policy reduces the average convergence time by more than 30%.
[Fig. 5 panels: (a) Avg Num of Steps vs. Number of Clients (N), curves for M=10, 20, 50; (b) Avg Num of Steps vs. Number of BSs (M), curves for N=10, 20, 50; (c) Potential Function vs. Step Number for M=10, N=10, curves Run1 to Run5 and Priority; (d) Potential Function vs. Step Number for M=10, N=20, curves Run1 to Run5 and Priority.]
Fig. 4(c) also shows that the two RAT aggregation schemes
(AGG-RR and AFRA) can successfully aggregate WiFi and
cellular capacities and provide a higher sum throughput than
the WiFi only and Cellular only options. Further, AFRA
increases the average total throughput by 45% (from 20 to 29
Mbps) with 18 and 11 Mbps per-client total throughput values
(per-client throughput plots are shown in Fig. 4(d)). Let us
define the proportional fairness index as $PF = \sum_{i=1}^{2} \log(r_i)$ ($r_i$
is the total throughput of each client across its RATs in Mbps).
Then, the PF index in AFRA would be 2.3. With AGG-RR, the
per-client throughput rates drop to 12.5 and 7.5 Mbps. Thus,
the PF index reduces to 1.97. AGG-RR uses the conventional
scheduling algorithms on each BS (i.e., it uses RR in WiFi
and single BS PF in cellular), which reduces both the sum
throughput and the PF index.
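The reported PF indices can be reproduced from the per-client throughputs. The sketch below assumes a base-10 logarithm, which is our inference: it matches the reported 2.3 and 1.97, whereas a natural logarithm would give roughly 5.29 for AFRA. The function name is illustrative.

```python
import math

def pf_index(throughputs_mbps):
    """Sum of base-10 logs of per-client total throughputs (log base is an assumption)."""
    return sum(math.log10(r) for r in throughputs_mbps)

afra_index = pf_index([18.0, 11.0])        # per-client throughputs under AFRA
agg_rr_index = pf_index([12.5, 7.5])       # per-client throughputs under AGG-RR

# Relative gain in total throughput: 29 Mbps versus 20 Mbps.
throughput_gain = (18.0 + 11.0) / (12.5 + 7.5) - 1.0
```

Rounded to the precision used in the text, these evaluate to 2.3, 1.97, and a 45% throughput gain.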
B. AFRA’s Equilibria Properties
Setup. We simulated network deployments with N clients
and M BSs to evaluate AFRA’s equilibria properties as we
scale the number of clients and BSs. All clients’ ωi s are
equal to 1. Half of the BSs are WiFi and the other half are
cellular. Each client has access to 4 RATs, two WiFi and
two cellular. The PHY rates for the WiFi and cellular RATs
are randomly selected from the sets {1, 2, 5.5, 11} Mbps and
{5.2, 10.3, 25.5, 51} Mbps, respectively. In each simulation
realization, we randomly associate clients’ RATs with BSs.
Next, we run AFRA until an equilibrium is reached. We set
the discretization factor equal to 0.05, i.e., a BS adjusts its
time fractions only if the increase in time fraction (i.e., λi,j )
at its client with minimum $\frac{r_i}{\omega_i R_{i,j}}$ is greater than or equal to
0.05. For the initial allocation, each BS equally divides its
time across its clients. Unless otherwise specified, each of our
simulation points is an average of 100 simulation realizations.
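A simulation realization of this setup could be generated along the following lines. This is a sketch under stated assumptions: the function names are ours, BS indices in the first half are treated as WiFi and the rest as cellular, and a client's two RATs of the same type are attached to two distinct BSs (the text does not say whether repeats are allowed). It requires at least four BSs.

```python
import random

WIFI_RATES = [1, 2, 5.5, 11]          # Mbps
CELL_RATES = [5.2, 10.3, 25.5, 51]    # Mbps

def random_topology(num_clients, num_bs, rng):
    """Return rates[i][j]: PHY rate of client i at BS j, or 0.0 when not associated."""
    wifi_bs = list(range(num_bs // 2))
    cell_bs = list(range(num_bs // 2, num_bs))
    rates = [[0.0] * num_bs for _ in range(num_clients)]
    for i in range(num_clients):
        for j in rng.sample(wifi_bs, 2):      # two distinct WiFi BSs
            rates[i][j] = rng.choice(WIFI_RATES)
        for j in rng.sample(cell_bs, 2):      # two distinct cellular BSs
            rates[i][j] = rng.choice(CELL_RATES)
    return rates

def equal_split(rates):
    """Initial allocation: each BS divides its time equally across its clients."""
    n, m = len(rates), len(rates[0])
    lam = [[0.0] * m for _ in range(n)]
    for j in range(m):
        clients = [i for i in range(n) if rates[i][j] > 0]
        for i in clients:
            lam[i][j] = 1.0 / len(clients)
    return lam

rng = random.Random(1)
rates = random_topology(num_clients=20, num_bs=10, rng=rng)
initial_lam = equal_split(rates)
```

Seeding the generator makes each realization reproducible, which is convenient when averaging over many realizations.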
AFRA’s Convergence Time. Figs. 5(a) and 5(b) depict
the impact of the number of clients and BSs on AFRA’s
convergence time. In each of these figures, we count the
number of steps until convergence is reached. At each step, a
single BS that needs to adjust its time fractions is randomly
selected. In Fig. 5(a), we vary the number of clients from 10
to 100 and plot the corresponding convergence times for three
different M values: 10, 20, and 50. We repeat this simulation
by changing the N and M variables and plot the corresponding
results in Fig. 5(b). From these two figures, we observe that
time to convergence is highest when the number of clients is
between one and two times the number of BSs. As the ratio
between the number of clients and BSs (i.e., $N/M$) leaves this
range, the convergence time rapidly drops and then stabilizes.
The results show that AFRA requires a small number of steps
to reach an equilibrium.
Policies to Further Reduce AFRA’s Convergence Time.
Our next goal is to design policies that can further reduce
AFRA’s convergence time. To gain intuition on how to design
such policies, we simulated a topology with 10 clients and
10 BSs and plotted the evolution of the potential function
(i.e., $\sum_i \log(r_i)$) as BSs adjusted their time fractions. The
results are shown in Fig. 5(c). Here, each Run corresponds
to a different simulation realization. From these realizations
we make two observations. First, there is a wide gap in
the convergence times. Second, a high jump in the potential
function pushes the system closer to equilibrium. Based on
these observations, we designed a prioritization policy among
the BSs to reduce the convergence time.
We let each BS calculate the increase in the potential
function assuming that it is the only BS executing AFRA.
Since in AFRA each BS knows the current total throughput
of its clients, it has all the needed information to calculate the
increase in the potential function due to its action. Next, each
BS broadcasts its calculated value. Finally, the BS with the
highest value gets priority in executing AFRA. This distributed
policy can be easily implemented in networks where all the
BSs are connected to the same backbone (e.g., Ethernet). The
solid black curve in Fig. 5(c) shows the potential function’s
evolution with this policy. We observe that on average, the
convergence time drops from 15 steps to 10, i.e., the prioritization policy reduces the convergence time by 33%. We repeated
this simulation for another setup with 20 clients to increase
the topological redundancy. The results are plotted in Fig. 5(d).
Similarly, the average convergence time reduces from 19 steps
to 13, i.e., a 32% reduction in convergence time.
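The selection step of this prioritization policy can be sketched as follows. We assume each BS has already computed the throughput vector that would result if it alone executed AFRA; how that vector is obtained is outside this snippet, and the function and variable names are hypothetical.

```python
import math

def potential(weights, throughputs):
    """Potential function sum_i omega_i * log(r_i)."""
    return sum(w * math.log(r) for w, r in zip(weights, throughputs))

def pick_priority_bs(weights, current, proposals):
    """Pick the BS whose solo update yields the largest increase in the potential.

    `proposals` maps a BS id to the client throughput vector that would result
    if only that BS adjusted its time fractions.
    """
    base = potential(weights, current)
    gains = {bs: potential(weights, r) - base for bs, r in proposals.items()}
    winner = max(gains, key=gains.get)
    return winner, gains

# Hypothetical two-client example with unit weights.
winner, gains = pick_priority_bs(
    weights=[1.0, 1.0],
    current=[4.0, 4.0],
    proposals={"bs_a": [5.0, 4.0], "bs_b": [4.0, 8.0]},
)
```

In a deployment, the `gains` values correspond to what each BS would broadcast over the shared backbone before the winner executes its update.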
C. Comparison Against DDNUM
We have compared AFRA’s performance against DDNUM,
a distributed algorithm that we developed by leveraging dual
decomposition and the NUM framework. Dual decomposition
is appropriate to solve the multi-RAT PF allocation problem,
because the coupling constraint (Eq. (2)) can be relaxed
through the dual problem and then the problem decouples into
subproblems that can be iteratively solved by clients and BSs.
DDNUM is in essence similar to the standard dual algorithm
presented in [7] to solve the basic NUM problem. We modified
the algorithm in [7] to capture the constraints of our problem.
At a high level, DDNUM has three main steps (for detailed
algorithm derivation and discussions, refer to the Appendix):
• Step 1: Initialization: set t = 0 and µ (0) to some nonnegative value for each BS. Here, µ (t) is the vector of Lagrange
multipliers that shows the cost or congestion across all BSs.
Each BS broadcasts its µj (0) to clients with Ri,j > 0.
• Step 2: Each client i locally solves its Lagrangian problem,
i.e., finds its time fractions (λ∗i,j (µj (t))) for each BS with
Ri,j > 0 and informs those BSs.
• Step 3: Each BS updates its price with a step size γ and
broadcasts the new price µj (t + 1) to all its clients.
This procedure is repeated until a satisfactory termination
point is reached (e.g., the solution is within a desired proximity of the optimal solution). Similar to AFRA, DDNUM is
guaranteed to converge and maximize the global optimization
problem. However, there are several practicality and performance issues. We highlight a few of these issues next.
Fig. 6. Compared to AFRA, DDNUM increases the average convergence
time by 2.4x (a) and the average over-the-air signaling overhead by 4.5x (b).

Setup. To compare AFRA to DDNUM, we used the simulation setup in Section VI-B (without the BS prioritization
policy). We first run AFRA and let the system converge to
an equilibrium. Next, we take 95% of the value of AFRA’s
potential function at equilibrium as the desired algorithm
termination point. We count the number of steps to reach
the termination point and the resulting over-the-air signaling
overhead in each of these two schemes. In DDNUM, the step
size γ (step 3) provides a balance between the final throughput
values and speed. We choose the γ that results in the fastest
convergence time, subject to the potential function reaching
the termination point. Finally, both AFRA and DDNUM can
operate in either parallel or sequential mode with similar
relative performance. We present the sequential mode results,
i.e., at each time only a single BS adjusts its water-fill level (in
AFRA) or announces a new price (in DDNUM). We assume
that clients immediately update their BSs about their new
throughput values (in AFRA) and desired λi,j s (in DDNUM)
with no impact on the convergence time (similar to an FDD
system in which uplink data is immediately available).
Speed. Fig. 6(a) shows the convergence time results for
a scenario with 10 BSs and a varying number of clients. We
observe that irrespective of the number of clients, DDNUM
increases the convergence time by a factor of 2-3x with an
average of 2.4x. In AFRA, each BS simultaneously calculates
the water-fill level and finds the corresponding time fraction
for each client. In DDNUM, the pricing mechanism requires a
high number of iterations so that clients can find their optimal
time fractions. This increases the convergence time.
Over-the-Air Overhead. Fig. 6(b) shows the wireless signaling overhead results of the two schemes. We observe that
DDNUM increases the signaling overhead by a factor of 4-5x with an average of 4.5x. There are several factors that
contribute to DDNUM’s high signaling overhead. First, the
increase in convergence time results in a similar multiplicative
increase in overhead. Second, in DDNUM both BSs and
clients contribute to overhead. BSs continuously broadcast new
prices and clients continuously inform each of their BSs about
their desired time fractions. In contrast, in AFRA only clients
update the BSs regarding their new throughput values. Third,
with careful examination of simulation traces, we observed
that in AFRA the water-fill operation only impacts a few of a
BS’s clients each time. In DDNUM, each time a BS updates
its price, most of its clients would request new time fractions.
Practicality. In DDNUM, each BS broadcasts its price
while each client finds its desired λ∗i,j s from its BSs. However,
in real wireless systems BSs are responsible for resource
allocation. Note that in DDNUM, it is not practical to shift
the calculation of λ∗i,j s (i.e., step 2) to BSs. This is because in
order for a BS j to find the λ∗i,j s for each of its clients (e.g., i),
it would require knowledge of the client’s Ri,j and µj for
every other BS for which the client’s rate (i.e., Ri,j ) is greater
than zero. This information is only available at the client and
pushing it to the BS would significantly increase the overhead,
which is already very high in DDNUM.
Complexity. In DDNUM, each client has to solve a complex
Lagrangian subproblem to find its desired time fraction for
each BS (step 2). This increases the computational complexity
on the client devices. In contrast, AFRA identifies the time
resources at the BSs, which have higher power and computing
resources. Moreover, as we discussed in Section III-D, AFRA
has a very low total computational complexity.
VII. CONCLUSION
We addressed the problem of proportional fair multi-RAT
traffic aggregation in HetNets. We studied the conventional
PF resource allocation in a single BS and showed that we can
look at the problem as a special type of water-filling. Based
on this observation, we designed a new fully distributed water-filling algorithm for HetNets. We also studied the convergence,
speed, and optimality of our algorithm. We proved that our
algorithm quickly converges to equilibria and derived tight
bounds to quantify its speed. We also studied the characteristics of the optimal outcome, and used these properties to prove
that the outcomes of our algorithm are globally optimal.
REFERENCES
[1] “Samsung download booster: use WiFi and LTE simultaneously,” https://www.pcmag.com/article2/0,2817,2455011,00.asp
[2] 3GPP, “Introduction of LTE-WLAN radio level integration and interworking enhancement,” in 3GPP Technical Report, R2-156737, 2015.
[3] A. Zakrzewska, D. Lopez-Perez, S. Kucera, and H. Claussen, “Dual connectivity in LTE hetnets with split control and user plane,” in Proceedings of IEEE GLOBECOM Workshops, 2013.
[4] 3GPP, “Study on new radio (NR) access technology (release 14),” in 3GPP Technical Report, TR 38.912, 2017.
[5] FCC, “2016 broadband progress report,” January 2016.
[6] FCC’s Office of Engineering & Technology and Consumer & Governmental Affairs Bureau, “2015 measuring broadband america fixed broadband report: A report on consumer fixed broadband performance in the US,” 2015.
[7] D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” in IEEE Journal on Selected Areas in Communications, 2006.
[8] F. P. Kelly, A. Maulloo, and D. Tan, “Rate control for communication networks: shadow prices, proportional fairness and stability,” in Journal of the Operational Research Society, 1998.
[9] S. Singh, M. Geraseminko, S.-P. Yeh, N. Himayat, and S. Talwar, “Proportional fair traffic splitting and aggregation in heterogeneous wireless networks,” in IEEE Communications Letters, 2016.
[10] N. Prasad and S. Rangarajan, “Exploiting dual connectivity in heterogeneous cellular networks,” in Proceedings of IEEE WiOpt, 2017.
[11] A. Stolyar, “On the asymptotic optimality of the gradient scheduling algorithm for multi-user throughput allocation,” in Operations Research Journal, 2005.
[12] S.-B. Lee, S. Choudhury, A. Khoshnevis, S. Xu, and S. Lu, “Downlink MIMO with frequency-domain packet scheduling for 3GPP LTE,” in Proceedings of IEEE INFOCOM, 2009.
[13] S. Shakkottai, E. Altman, and A. Kumar, “Multihoming of users to access points in WLANs: a population game perspective,” in IEEE Journal on Selected Areas in Communication, 2009.
[14] Y. Bejerano, S.-J. Han, and L. Li, “Fairness and load balancing in wireless lans using association control,” in IEEE/ACM Transactions on Networking, 2007.
[15] L. Li, M. Pal, and Y. Yang, “Proportional fairness in multi-rate wireless lans,” in Proceedings of IEEE INFOCOM, 2008.
[16] E. Aryafar, A. K. Haddad, C. Joe-Wong, and M. Chiang, “Max-min fair resource allocation in hetnets: Distributed algorithms and hybrid architecture,” in Proceedings of IEEE ICDCS, 2017.
[17] D. Ibarra, N. Desai, and I. Demirkol, “Software-based implementation of LTE/Wi-Fi aggregation and its impact on higher layer protocols,” in Proceedings of IEEE ICC, 2018.
[18] Y. Khadraoui, X. Lagrange, and A. Gravey, “Implementation of LTE/WiFi link aggregation with very tight coupling,” in Proceedings of IEEE PIMRC, 2017.
[19] T. V. Pasca, N. Sen, V. Reddy, B. R. Tamma, and A. Franklin, “A framework for integrating MPTCP over LWA - a testbed evaluation,” in Proceedings of ACM WiNTECH, 2018.
[20] Y.-B. Lin, H.-C. Tseng, L.-C. Wang, and L.-J. Chen, “Performance of splitting LTE-WLAN aggregation,” in Mobile Networks and Applications, Springer, 2018.
[21] D. P. Bertsekas and J. N. Tsitsiklis, “Parallel and distributed computation: numerical methods,” in Englewood Cliffs, NJ: Prentice-Hall, 1989.
[22] D. P. Bertsekas and R. G. Gallager, “Data networks,” in Englewood Cliffs, NJ: Prentice-Hall, 1987.
[23] X. Lin, N. B. Shroff, and R. Srikant, “A tutorial on cross-layer optimization in wireless networks,” in IEEE Journal on Selected Areas in Communications, 2006.
[24] W. Wang, X. Liu, J. Vicente, and P. Mohapatra, “Integration gain of heterogeneous WiFi/WiMAX networks,” in IEEE Transactions on Mobile Computing, 2011.
[25] Q. Ye, B. Rong, Y. Chen, M. Al-Shalash, C. Caramanis, and J. G. Andrews, “User association for load balancing in heterogeneous cellular networks,” in IEEE Transactions on Wireless Communications, 2013.
[26] D. Monderer and L. S. Shapley, “Potential games,” in Games and Economic Behavior, 1996.
[27] Jensen Inequality, https://en.wikipedia.org/wiki/Jensen%27s_inequality
[28] “WARP Project,” https://warpproject.org/trac
APPENDIX
To maximize the PF objective function in generic multi-RAT
HetNets we need to solve the following problem
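$$
\begin{aligned}
\text{P2}: \quad \max \quad & \sum_{i=1}^{N} \omega_i \log(r_i) \\
\text{s.t.} \quad & r_i = \sum_{j=1}^{M} \lambda_{i,j} R_{i,j} \quad \forall i \in N \\
& \sum_{i=1}^{N} \lambda_{i,j} \le 1 \quad \forall j \in M \\
\text{variables:} \quad & \lambda_{i,j} \ge 0 \quad \forall i \in N, j \in M
\end{aligned}
$$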
By capturing the first constraint in the objective function
we can reformulate P2 as
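$$
\begin{aligned}
\text{P3}: \quad \max \quad & \sum_{i=1}^{N} \left(\omega_i \log\!\left(\sum_{j=1}^{M} \lambda_{i,j} R_{i,j}\right)\right) \\
\text{s.t.} \quad & \sum_{i=1}^{N} \lambda_{i,j} \le 1 \quad \forall j \in M \\
\text{variables:} \quad & \lambda_{i,j} \ge 0 \quad \forall i \in N, j \in M
\end{aligned}
$$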
We can use dual decomposition to solve P3 since the
constraints that couple the λi,j variables (i.e., the first line
of constraints in P3 ) can be relaxed using Lagrange duality,
and then the optimization problem decouples into several
subproblems that, as we show next, can be solved in a distributed manner.
Let µj be the Lagrange multiplier for the j th constraint.
Then the Lagrangian of P3 can be written as
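$$
\begin{aligned}
L(\lambda, \mu) &= \sum_{i=1}^{N} \left(\omega_i \log\!\left(\sum_{j=1}^{M} \lambda_{i,j} R_{i,j}\right)\right) + \sum_{j=1}^{M} \mu_j \left(1 - \sum_{i=1}^{N} \lambda_{i,j}\right) \\
&= \sum_{i=1}^{N} \left[\omega_i \log\!\left(\sum_{j=1}^{M} \lambda_{i,j} R_{i,j}\right) - \sum_{j=1}^{M} \mu_j \lambda_{i,j}\right] + \sum_{j=1}^{M} \mu_j
\end{aligned}
\tag{34}
$$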
Here λ is the vector of original optimization variables,
which are also referred to as primal variables. The Lagrange
multipliers (µj ) are also referred to as dual variables. The
problem now separates into two levels of optimization [7].
At the lower level, each client i needs to solve the following
Lagrangian subproblem for a given µ
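$$
\begin{aligned}
\max_{\lambda_{i,j}} \quad & \omega_i \log\!\left(\sum_{j=1}^{M} \lambda_{i,j} R_{i,j}\right) - \sum_{j=1}^{M} \mu_j \lambda_{i,j} \\
\text{s.t.} \quad & \lambda_{i,j} \ge 0 \quad \forall i \in N, j \in M
\end{aligned}
\tag{35}
$$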
At a higher level, we have the master dual problem in charge
of updating the dual variables ($\mu$) by solving the following dual
problem:

$$
\begin{aligned}
\min_{\mu} \quad & \sum_{i} g_i(\mu) + \sum_{j=1}^{M} \mu_j \\
\text{s.t.} \quad & \mu \ge 0
\end{aligned}
\tag{36}
$$

where $g_i(\mu)$ is the dual function, obtained as the maximum
value of the Lagrangian subproblem solved in (35) for a given
$\mu$. This approach solves the dual problem. However, since the
original problem in P3 is convex (and there exists a strictly
feasible solution), solving the dual problem equivalently solves
the primal problem in P3.
Note that the objective function in (36) is convex and
differentiable. Hence, we can use the following simple gradient
method at each BS j to solve (36):

$$
\mu_j(t+1) = \left[\mu_j(t) - \gamma\left(1 - \sum_{i=1}^{N} \lambda^{*}_{i,j}(t)\right)\right]^{+}
\tag{37}
$$

where $\lambda^{*}_{i,j}$ is the solution to (35), t is the iteration index, $\gamma > 0$
is a positive step size, and $[\cdot]^{+}$ denotes the projection onto the
non-negative orthant.
As t → ∞, the dual variables converge to the dual
optimal $\mu^{*}$ and the primal variables $\lambda^{*}(\mu(t))$ converge to the
optimal primal variable $\lambda^{*}$. Algorithm DDNUM, shown below,
summarizes the above steps.
DDNUM: Dual Decomposition Based Resource Allocation
Inputs: Known Ri,j at each client i for every BS j for which
Ri,j > 0.
Initialization: Set t = 0 and µ(0) to some nonnegative value
for each BS.
• Step 1: Each client i locally solves its Lagrangian problem
in (35), i.e., finds its time fractions ($\lambda^{*}_{i,j}(\mu(t))$) for each BS
j with Ri,j > 0, and informs those BSs.
• Step 2: Each BS updates its price according to Eq. (37) and
broadcasts the new price to all its clients (i.e., clients with
Ri,j > 0).
• Step 3: Set t ← t + 1 and go to step 1 (until a satisfactory
termination point is reached).
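One iteration of the steps above can be sketched in Python as follows. The client step uses a closed form that is our own derivation rather than something stated in the text: because the argument of the logarithm in (35) is linear in the $\lambda_{i,j}$, a maximizer places all of the client's request on the BS with the largest $R_{i,j}/\mu_j$ and sets $\lambda_{i,j} = \omega_i/\mu_j$ there. This assumes strictly positive prices, and ties are broken by the lowest BS index. Function names are hypothetical.

```python
def client_best_response(weight, rates, prices):
    """Solve (35) for one client under the assumed closed form (prices must be > 0)."""
    usable = [j for j, r in enumerate(rates) if r > 0]
    best = max(usable, key=lambda j: rates[j] / prices[j])   # best rate per unit price
    fractions = [0.0] * len(rates)
    fractions[best] = weight / prices[best]
    return fractions

def price_update(prices, requested, step):
    """Projected gradient step of Eq. (37) at every BS."""
    load = [sum(req[j] for req in requested) for j in range(len(prices))]
    return [max(0.0, mu - step * (1.0 - load[j])) for j, mu in enumerate(prices)]

# Hypothetical instance: two clients, two BSs, unit weights.
weights = [1.0, 1.0]
rates = [[1.0, 2.0], [4.0, 2.0]]      # rates[i][j] = R_{i,j}
prices = [2.0, 2.0]                   # mu_j(t)

requested = [client_best_response(w, r, prices) for w, r in zip(weights, rates)]
new_prices = price_update(prices, requested, step=0.5)
```

Note that the requested fractions are not constrained to be at most 1 in the subproblem. Feasibility is only restored through the price updates over many iterations, which is consistent with the larger number of iterations observed for DDNUM.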