
Proportional Fair RAT Aggregation in HetNets

2019, arXiv (Cornell University)

Heterogeneity in wireless network architectures (i.e., the coexistence of 3G, LTE, 5G, WiFi, etc.) has become a key component of current and future generation cellular networks. Simultaneous aggregation of each client's traffic across multiple such radio access technologies (RATs) / base stations (BSs) can significantly increase the system throughput, and has become an important feature of cellular standards on multi-RAT integration. Distributed algorithms that can realize the full potential of this aggregation are thus of great importance to operators. In this paper, we study the problem of resource allocation for multi-RAT traffic aggregation in HetNets (heterogeneous networks). Our goal is to ensure that the resources at each BS are allocated so that the aggregate throughput achieved by each client across its RATs satisfies a proportional fairness (PF) criterion. In particular, we provide a simple distributed algorithm for resource allocation at each BS that extends the PF allocation algorithm for a single BS. Despite its simplicity and lack of coordination across the BSs, we show that our algorithm converges to the desired PF solution and provide (tight) bounds on its convergence speed. We also study the characteristics of the optimal solution and use its properties to prove the optimality of our algorithm's outcomes.

arXiv:1906.00284v1 [cs.NI] 1 Jun 2019

Ehsan Aryafar (Portland State University, Portland, OR), Alireza Keshavarz-Haddad (Shiraz University, Shiraz, Iran), Carlee Joe-Wong (Carnegie Mellon University, Silicon Valley, CA)

I. INTRODUCTION

The increasing demand for wireless data has led to denser and more heterogeneous wireless network deployments. This heterogeneity manifests itself in network deployments across multiple radio access technologies (e.g., 3G, LTE, WiFi, 5G), cell sizes (e.g., macro, pico, femto), and frequency bands (e.g., TV bands, 1.8-2.4 GHz, mmWave).
To realize the gains associated with such heterogeneous networks (HetNets), consumer (client) devices are also being equipped with an increasing number of radio access technologies (RATs), and some are already able to simultaneously aggregate the traffic across multiple RATs to increase throughput [1].

To support such traffic aggregation on the network side, the 3GPP (3rd Generation Partnership Project) has been actively developing multi-RAT integration solutions. The introduction of LWA (LTE-WiFi Aggregation) as part of 3GPP Release 13 [2] was a step in this direction. LWA allows using both LTE and WiFi links for a single traffic flow and is generally more efficient than transport-layer aggregation protocols (e.g., MultiPath TCP), due to coordination at lower protocol stack layers. LWA's design primarily follows the LTE Dual Connectivity (DC) architecture (defined in 3GPP Release 12 [3]), which allows a wireless device to connect to two LTE eNBs that are on different carrier frequencies and to utilize the radio resources that belong to both of them. Currently, the 3GPP is working on a solution to support below-IP (layer 2) multi-RAT integration across any combination of RATs, including LTE, WiFi, 802.11ad/ay, and 5G New Radio (NR) [4]. The proposed architecture would allow for dynamic traffic splitting across RATs for each client, which can lead to a significant increase in system performance (e.g., total throughput).

However, it is difficult to design resource allocation algorithms for each BS¹ that realize the performance benefits of such integrated HetNets. Specifically, (i) backhaul links from different BSs in HetNets show diverse capacity and latency characteristics that depend on the underlying backhauling technology. For example, cable and DSL have on average 28 and 62 ms round-trip latencies, respectively [5], [6].
The latency can be even higher when a network operator uses a third-party ISP to communicate with its BSs (e.g., a mobile operator that uses a wired ISP to control its WiFi BSs). Such latencies make it infeasible for BSs to communicate with each other or with a central controller for real-time resource allocation at each BS. As a result, any practical resource allocation algorithm for multi-RAT HetNets should be fully distributed (i.e., autonomously executed by each BS). (ii) Resource allocation has many practical constraints. Conventional BS hardware allows only minor modifications to existing resource allocation algorithms through software updates, limiting the algorithm design space. New algorithms should also incur minimal signaling overhead and computational complexity. Distributed algorithms based on the traditional network utility maximization (NUM) framework [7], [8] do not meet these requirements: as we will show later through simulations, the resulting algorithms are radically different from how conventional BSs operate, have significant over-the-air signaling overhead, and increase the computational complexity on the client side. (iii) In HetNets, each client has access to a client-specific set of RATs, and receives packets at a different PHY rate on each RAT. These rates naturally differ across clients. This multi-rate property of HetNets makes it particularly challenging to design resource allocation algorithms with performance guarantees. As a result, existing solutions in the literature are all limited to simple setups, e.g., when each client has only two RATs as in the case of LWA [9] or LTE DC [10].

In this paper, we study the problem of resource allocation for traffic aggregation in multi-RAT HetNets. We focus on the proportional fairness (PF) objective as it is widely used and implemented in BSs and provides a balance between fairness and throughput [11], [12].
We first consider PF resource allocation in a single BS, and then use our insights from this case to design a distributed algorithm that addresses the three challenges above. We next show that our algorithm converges to an optimal PF resource allocation.

¹ We use “BS” generically to mean an LTE eNB, WiFi AP, etc.

The key contributions are as follows:
• Algorithm Design: We study the basics of PF resource allocation in a single BS to gain intuition for the distributed algorithm design. We show that PF resource allocation in a single BS can be viewed as a special type of water-filling. We generalize this observation to a new fully distributed water-filling algorithm (named AFRA) that makes a minor modification to the conventional single BS algorithm and achieves PF in HetNets.
• Convergence and Speed: We show that AFRA is guaranteed to converge to an equilibrium as BSs autonomously execute it [Theorem 1] and derive tight bounds on its convergence time (speed) [Theorem 2].
• Optimality: We first show that at optimality, the sum of the inverse water-fill levels across all BSs is equal to the sum of the weights (numbers that show clients' priorities) across all clients [Theorem 3]. Next, we use this property to prove that any equilibrium outcome of AFRA is globally optimal [Theorem 4]. Finally, we show that at equilibrium the vector of throughput rates across all clients is unique; however, there could be infinitely many resource allocations that realize this outcome [Theorem 5].
• Practicality: We construct a testbed with programmable BS hardware, and show that we can successfully aggregate the throughput across multiple BSs at the MAC layer. We also show that replacing the conventional resource allocation algorithm on each BS with AFRA can substantially increase the system throughput and fairness.
• Performance: We conduct extensive simulations to characterize AFRA's convergence time properties as we scale the number of BSs and clients.
We also introduce policies that reduce the convergence time by more than 30%. Finally, we compare the performance of AFRA against DDNUM, a dual decomposition algorithm that we derived from the NUM framework. We show that compared to DDNUM, AFRA is 2-3 times faster with 4-5 times less over-the-air overhead.

This paper is organized as follows. We discuss the related work in Section II. We present the system model and details of AFRA in Section III. In Sections IV and V we prove the convergence and optimality of AFRA. We present the results of our experiments, simulations, and comparisons against DDNUM in Section VI. We conclude the paper in Section VII.

II. RELATED WORK

We discuss the related work in the areas of multi-BS communication and distributed optimization, and highlight their differences from this paper.

Single-RAT Multi-BS Communication. Prior works have studied the problem of traffic aggregation when a client can simultaneously communicate with multiple same-technology BSs. For example, [13] uses game theory to model selfish traffic splitting by each client in WLANs. On the other hand, the resource allocation problem in HetNets is primarily addressed at the BS side. Similarly, [10] proposes an approximation algorithm to address the problem of client association and traffic splitting in LTE DC. Our algorithm (AFRA) goes beyond this and other related work by guaranteeing optimal resource allocation for any number of RATs and BSs. Other works have developed centralized client association algorithms to achieve max-min fairness [14] and proportional fairness [15] in multi-rate WLANs. In contrast, the problem of resource allocation in HetNets needs to be solved in a fully distributed manner.

Multi-RAT Communication. Resource allocation algorithms that realize the capacity gains in HetNets are still in their early stages. The problem of PF resource allocation for LWA was studied in [9]. In the proposed setup, each client has one LTE and one WiFi RAT.
Further, there is only a single LTE BS in the network, and each client's throughput across its WiFi RAT is fixed. Next, the authors propose a water-filling based resource allocation algorithm at the LTE BS that achieves PF. Similarly, we show that the optimal PF resource allocation in a single BS can be interpreted as a form of water-filling. However, we use the observation to design an optimal algorithm for the generic problem with any number of BSs and client RATs, and explicitly model the impact of system dynamics on the throughput that each client gets from every BS. In our prior work [16], we addressed the problem of max-min fair resource allocation in HetNets. However, even with opportunistic centralized network supervision over autonomous resource allocation at each BS, we could not optimally solve the problem. Here, we focus on the PF objective, which is commonly implemented in BSs, and show that we can optimally solve the problem in a purely distributed manner. Other works have built testbeds to evaluate the over-the-air performance of MAC-level cross-RAT throughput aggregation [17]–[20]. All these works have relied on conventional scheduling algorithms on each BS and focused on higher-layer transport and application performance. We experimentally show that replacing the conventional resource allocation algorithms with AFRA can substantially increase the system throughput and fairness.

Distributed Network Utility Maximization (NUM). There is a large body of general results on the mathematics of distributed computation, some of which are summarized in standard textbooks such as [21], [22]. More recently, the framework of NUM [7], [8], [23] has emerged as a mathematical tool to optimize layered network architectures. The framework allows for decomposition of a global optimization problem into subsets of local problems that are solved in a distributed manner and implicitly solve the global NUM problem.
We have derived an alternative distributed algorithm (named DDNUM) by leveraging dual decomposition and the NUM framework. We will show through simulations that DDNUM is 2-3 times slower than AFRA (in terms of convergence time) and increases the over-the-air signaling overhead by 4-5 times. These disadvantages, coupled with the increased client-side computational complexity and lack of compatibility with conventional BSs, make NUM-based algorithms impractical for multi-RAT traffic aggregation.

III. SYSTEM MODEL

We discuss the system model and the resource allocation algorithm that is autonomously executed by each BS.

A. Network Model

We consider a HetNet composed of a set of BSs $\mathcal{M} = \{1, ..., M\}$ and a set of clients $\mathcal{N} = \{1, ..., N\}$. Each BS has a limited transmission range and can only serve clients within its range. Each client has a client-specific number of RATs, and therefore has access to a subset of BSs. We model clients that can aggregate traffic across BSs of the same technology (e.g., LTE DC) with multiple such RATs. Fig. 1 shows an example HetNet topology.

We assume that clients split their traffic over the BSs and focus on the resource allocation problem at each BS. It is itself a challenging problem to determine which BS to associate with among same-technology BSs (e.g., choosing the optimal LTE BS if a client has an LTE RAT). We assume there exists a rule to pre-determine client RAT to BS association. The pre-determination rule could for instance be any load balancing algorithm [24], [25], or be based on the received signal strength. Similar to [13]–[16], [24], we assume that the transmission in one BS does not interfere with an adjacent BS. This can be achieved through spectrum separation between BSs that belong to different access networks and frequency reuse among same-technology BSs.

Fig. 1. A heterogeneous network with 4 access technologies. Each client is in the coverage area of a group of BSs (dotted lines) and can split or aggregate its traffic across the corresponding BSs (RATs). The 3GPP is actively developing several new RATs for both sub-6 GHz and mmWave bands, re-emphasizing the heterogeneity of future wireless networks.

TABLE I. MAIN NOTATION
$\mathcal{N}$ and $N$: Set and number of all clients in the network
$\mathcal{M}$ and $M$: Set and number of all BSs in the network
$R_{i,j}$: PHY rate of client $i$ to BS $j$
$R_{max}$: Maximum PHY rate across all clients and BSs
$R_{min}$: Non-zero minimum PHY rate across all clients and BSs
$\lambda_{i,j}$: Fraction of time allocated to client $i$ by BS $j$
$\boldsymbol{\lambda}$: Vector of $\lambda_{i,j}$s across all clients and BSs
$r_i$: Total throughput of client $i$ across all its RATs
$\omega_i$: A positive number that represents client $i$'s weight or priority
$\theta_j$: Water-fill level at BS $j$

B. Throughput Model

We consider a multi-rate system and use $R_{i,j}$ to denote the PHY rate of client $i$ to BS $j$. Since each BS generally serves more than one client, clients of the same BS need to share resources such as time and frequency slots (e.g., in 3/4/5G) or transmission opportunities (e.g., in WiFi). The throughput achieved by client $i$ from BS $j$ thus depends on the load of the BS and will be a fraction of $R_{i,j}$. We assume that each BS employs a TDMA throughput sharing model² and let $\lambda_{i,j}$ denote the fraction of time allocated to client $i$ by BS $j$. Hence, the throughput achieved by client $i$ from BS $j$ is equal to $\lambda_{i,j} R_{i,j}$, and its total throughput across all its RATs is

$$r_i = \sum_{j=1}^{M} \lambda_{i,j} R_{i,j} \tag{1}$$

The total amount of time fractions available to each BS cannot exceed 1. Thus, for the $\lambda_{i,j}$s to be feasible we have

$$\sum_{i=1}^{N} \lambda_{i,j} \le 1 \quad \forall j \in \mathcal{M} \tag{2}$$

$$\lambda_{i,j} \ge 0 \quad \forall i \in \mathcal{N},\ j \in \mathcal{M} \tag{3}$$

² In Section VI-A, we discuss how we can extend our model and algorithm to capture practical implementation issues such as WiFi contention.

C. Background: Conventional PF Allocation in a Single BS

We first describe the basics of the PF resource allocation that is conventionally implemented in today's BSs. Consider a network topology consisting of only a single BS $j$ and $n'$ clients. Let $r_i$ denote the throughput of client $i$ and $\omega_i$ a positive number that denotes its weight (or priority). A widely used objective function for PF is to maximize $\sum_{i=1}^{n'} \omega_i \log(r_i)$ [11], [12]. It represents a tradeoff between throughput and fairness among the clients. Let $\lambda_i$ denote the time fraction allocated to client $i$ by BS $j$. To maximize the PF objective function, the BS needs to solve the following problem

$$\text{P1:} \quad \max \sum_{i=1}^{n'} \omega_i \log(R_{i,j} \lambda_i) \quad \text{s.t.} \quad \sum_{i=1}^{n'} \lambda_i \le 1, \qquad \text{variables: } \lambda_i \ge 0$$

Problem P1 can be easily solved through a simple algorithm. The Lagrangian of P1 can be expressed as

$$L(\lambda, \mu) = \sum_{i=1}^{n'} \omega_i \log(R_{i,j} \lambda_i) + \mu \Big(1 - \sum_{i=1}^{n'} \lambda_i\Big) \tag{4}$$

where $\mu$ is a constant number (Lagrange multiplier) chosen to meet the time resource constraint. Differentiating with respect to the time fraction $\lambda_i$ and setting the result to zero gives

$$\frac{R_{i,j}\, \omega_i}{R_{i,j}\, \lambda_i} - \mu = 0 \implies \frac{\omega_i}{\lambda_i} = \mu \quad \forall i \in \{1, ..., n'\} \tag{5}$$

Since the sum of time fractions at optimality is equal to 1, we can conclude from Eq. (5) that $\mu = \sum_{i=1}^{n'} \omega_i$. With known $\mu$ and $\omega_i$, we can derive the $\lambda_i$s from Eq. (5).

Now, let $\theta_j$ be defined as $\frac{1}{\mu}$. Leveraging Eq. (5), we have

$$\frac{\lambda_i}{\omega_i} = \theta_j \quad \forall i \in \{1, ..., n'\} \implies \frac{r_i}{\omega_i R_{i,j}} = \theta_j \quad \forall i \in \{1, ..., n'\} \tag{6}$$

Eq. (6) has an interesting water-filling based interpretation: the time allocated to each client is such that the throughput of the client divided by the product of its weight and its PHY rate is the same across all clients. We refer to this ratio (i.e., $\theta_j$) as the water-fill level of BS $j$. In the next section, we turn this single-BS observation into a distributed resource allocation algorithm for HetNets. Algorithm AFRA (Fig. 3) summarizes the steps that are autonomously executed by each BS $j$.
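As a sanity check on Eqs. (5)-(6), the single-BS closed form can be written in a few lines of Python. This is a minimal sketch under the model above, not the authors' implementation: the function names `single_bs_pf` and `pf_objective` and the numeric values are illustrative assumptions.

```python
import math


def single_bs_pf(weights, rates):
    """Closed-form solution of P1 for one BS, following Eqs. (5)-(6).

    weights[i] is the weight w_i > 0 and rates[i] the PHY rate R_{i,j} > 0.
    Returns (theta, lam, thr): water-fill level, time fractions, throughputs.
    """
    mu = sum(weights)                           # Lagrange multiplier: mu = sum of weights
    theta = 1.0 / mu                            # water-fill level theta_j = 1 / mu
    lam = [w * theta for w in weights]          # Eq. (6): lambda_i = w_i * theta_j
    thr = [l * r for l, r in zip(lam, rates)]   # r_i = R_{i,j} * lambda_i
    return theta, lam, thr


def pf_objective(weights, rates, lam):
    """PF objective of P1: sum_i w_i * log(R_{i,j} * lambda_i)."""
    return sum(w * math.log(r * l) for w, r, l in zip(weights, rates, lam))


# Illustrative numbers (assumed, not taken from the paper): three clients on one BS.
weights = [1.0, 2.0, 1.0]
rates = [10.0, 20.0, 5.0]
theta, lam, thr = single_bs_pf(weights, rates)
print(theta, lam, thr)
```

Note that the time fractions depend only on the weights: the PHY rates scale each client's throughput but do not change how the BS divides its time, which is why every served client ends up at the same level $r_i / (\omega_i R_{i,j})$.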
D. Distributed Resource Allocation in HetNets

There are two approaches to designing a resource allocation algorithm for generic HetNets. One approach, as we show in the Appendix, is to extend the formulation in P1 to include multiple BSs and client RATs, and use dual decomposition to derive a distributed algorithm. This approach converges to the optimal solution; however, the Lagrange multipliers across BSs would no longer correspond to BSs' water-fill levels. The second approach is to directly generalize the water-filling interpretation to derive an alternative algorithm, which still converges to the optimal solution (Section V) with far less overhead, convergence time, and complexity than the dual decomposition based algorithm (Section VI-C).

From Eq. (6), we observe that in a network with only a single BS, the BS allocates its time resources so that the clients who get the time resources reach the same water-fill level (i.e., throughput divided by $\omega_i R_{i,j}$). Thus, in generic HetNets, if each BS considers the total throughput of each client across all its RATs (i.e., $r_i$) divided by $\omega_i R_{i,j}$ in its water-fill definition, this should lead to a fair distributed algorithm. In other words, each BS $j$ should share its time resources across its clients such that: (1) all clients who get the time resources reach the same water-fill level at BS $j$ (i.e., $\theta_j$), and (2) if a client (e.g., $i'$) does not get any time resources from BS $j$, its $\frac{r_{i'}}{\omega_{i'} R_{i',j}}$ is greater than $\theta_j$. Fig. 2 illustrates this operation.

Fig. 2. There are 4 clients with non-zero PHY rates to BS $j$. Blue boxes denote contributions to $\frac{r_i}{\omega_i R_{i,j}}$ by BS $j$ (when it allocates time resources) and white boxes show contributions to it by other BSs. BS $j$ allocates its time resources so that all clients that get resources achieve the same water-fill level ($\theta_j$). Clients that do not get any resources from BS $j$ have a higher $\frac{r_i}{\omega_i R_{i,j}}$ than $\theta_j$. Client $i_3$ is one such client in this example.

We next turn this idea into a distributed resource allocation algorithm. Consider slotted time for now. Algorithm AFRA has three main steps: (i) clients are sorted based on the total throughput they receive from other BSs ($r'_i$) divided by $\omega_i R_{i,j}$ (Line 3), (ii) BS $j$ finds the water-fill level ($\theta_j$) and allocates the time resources accordingly (Line 4), and (iii) finally we introduce a randomization parameter to limit concurrent resource adaptation of a single client by multiple BSs (Line 5).

Fig. 3. Resource allocation algorithm autonomously run by each BS $j$.

We next elaborate on how each BS $j$ finds its water-fill level and its clients' time resource fractions (Line 4). Let $n'$ denote the number of clients such that $R_{i,j} > 0$. Let $r'_i$ denote the total throughput of client $i$ from all BSs other than $j$. Consider an ordering of the clients by $\frac{r'_i}{\omega_i R_{i,j}}$ according to Line 3 of AFRA. In order to solve the water-fill problem (i.e., Line 4 of AFRA), we need to find the water-fill level $\theta_j$, client index $k$, and time fractions $\lambda_{i,j}$ such that

$$\frac{r'_1 + \lambda_{1,j} R_{1,j}}{\omega_1 R_{1,j}} = \frac{r'_2 + \lambda_{2,j} R_{2,j}}{\omega_2 R_{2,j}} = \dots = \frac{r'_k + \lambda_{k,j} R_{k,j}}{\omega_k R_{k,j}} = \theta_j \tag{7}$$

$$\frac{r'_k}{\omega_k R_{k,j}} < \theta_j \le \frac{r'_{k+1}}{\omega_{k+1} R_{k+1,j}} \tag{8}$$

$$\sum_{i=1}^{k} \lambda_{i,j} = 1, \quad \lambda_{i,j} > 0 \tag{9}$$

We can find these variables with a simple set of linear operations. First, we can find $k$ by checking a set of inequalities:

$$\begin{cases}
\dfrac{\frac{r'_2\, \omega_1 R_{1,j}}{\omega_2 R_{2,j}} - r'_1}{R_{1,j}} \ge 1 \Rightarrow k = 1, & \text{else} \\[2ex]
\dfrac{\frac{r'_3\, \omega_1 R_{1,j}}{\omega_3 R_{3,j}} - r'_1}{R_{1,j}} + \dfrac{\frac{r'_3\, \omega_2 R_{2,j}}{\omega_3 R_{3,j}} - r'_2}{R_{2,j}} \ge 1 \Rightarrow k = 2, & \text{else} \\[1ex]
\quad \vdots \\[1ex]
\dfrac{\frac{r'_{n'}\, \omega_1 R_{1,j}}{\omega_{n'} R_{n',j}} - r'_1}{R_{1,j}} + \dots + \dfrac{\frac{r'_{n'}\, \omega_{n'-1} R_{n'-1,j}}{\omega_{n'} R_{n',j}} - r'_{n'-1}}{R_{n'-1,j}} \ge 1 \Rightarrow k = n' - 1, & \text{else} \\[2ex]
k = n'
\end{cases}$$

In the first inequality, we check if $\frac{r'_1 + R_{1,j}}{\omega_1 R_{1,j}} \le \frac{r'_2}{\omega_2 R_{2,j}}$. If this is true, from Eq. (7) we conclude that client 2 would have a higher $\frac{r'_2}{\omega_2 R_{2,j}}$ than $\frac{r_1}{\omega_1 R_{1,j}}$ even if BS $j$ allocated all its time resources to client 1 (i.e., to the client with minimum $\frac{r'_i}{\omega_i R_{i,j}}$ across all $n'$ clients). As a result, $k$ should be equal to 1. This procedure (and logic) is continued until $k$ is found. With known $k$, we can find $\theta_j$ by combining Eqs. (7) and (9) and solving the following linear equation

$$\sum_{i=1}^{k} \frac{\theta_j\, \omega_i R_{i,j} - r'_i}{R_{i,j}} = 1 \tag{10}$$

With known $k$ and $\theta_j$, the $\lambda_{i,j}$s can be found from Eq. (7).

AFRA's Computational Complexity and Message Passing Overhead. We calculate AFRA's computational complexity in finding the new time resource fractions ($\lambda_{i,j}$s) for a BS $j$. Let $n'$ denote the number of clients with non-zero PHY rates to $j$. The complexity of sorting clients (Line 3) is $O(n' \log(n'))$. The complexity of finding the water-fill level and the new time resource fractions (Line 4) is $O(n' \log(n'))$ (with a binary search to find $k$). Thus, the overall computational complexity is $O(n' \log(n'))$. If we assume that each client has on average $K$ RATs, then on average $n'$ would be equal to $\frac{KN}{M}$. Thus, the computational complexity would also be equal to $O(\frac{KN}{M} \log(\frac{KN}{M}))$.

Each BS uses the total throughput of each client across all its RATs in its calculations to find the water-fill level and the new $\lambda_{i,j}$s. Each time a client's time resource (and hence total throughput) is changed, the client needs to inform all BSs to which it is connected about its new total throughput. Thus, the total message passing overhead generated by clients of a single BS is at most equal to $O(n' K)$, or alternatively $O(\frac{K^2 N}{M})$.

IV. CONVERGENCE AND SPEED OF AFRA

In this section, we investigate the convergence properties of AFRA. We first show that as BSs autonomously execute AFRA, the system converges to an equilibrium. Next, we investigate the convergence time properties of AFRA and provide tight bounds to quantify it.

A. Convergence to an Equilibrium

Before we discuss convergence, we present a formal definition of an equilibrium.
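To make the water-fill step of Section III-D (Eqs. (7)-(10)) and the convergence behaviour studied in this section concrete, the following is a minimal Python sketch. It rests on several assumptions that are not taken from the paper: BSs update one at a time in round-robin order, the randomization of Line 5 and any discretization are omitted, $k$ is found by a linear scan rather than a binary search, and the names `water_fill` and `afra_round_robin` as well as the toy topology are hypothetical.

```python
import math


def water_fill(r_other, rates, weights):
    """One BS's water-fill step, following Eqs. (7)-(10).

    r_other[i]: throughput r'_i that client i receives from all other BSs
    rates[i]:   PHY rate R_{i,j} > 0 of client i to this BS
    weights[i]: weight w_i > 0 of client i
    Returns (theta, lam): water-fill level and new time fractions.
    """
    n = len(rates)
    ratio = [r_other[i] / (weights[i] * rates[i]) for i in range(n)]
    order = sorted(range(n), key=lambda i: ratio[i])   # Line 3: sort clients
    sum_w = 0.0      # sum of w_i over the first k clients
    sum_r = 0.0      # sum of r'_i / R_{i,j} over the first k clients
    theta = 0.0
    for pos, i in enumerate(order):
        sum_w += weights[i]
        sum_r += r_other[i] / rates[i]
        theta = (1.0 + sum_r) / sum_w                  # Eq. (10) solved for theta_j
        if pos + 1 == n or theta <= ratio[order[pos + 1]]:
            break                                      # Eq. (8): k found
    # Eq. (7): lambda_i = theta * w_i - r'_i / R_{i,j}; unserved clients get 0.
    lam = [max(0.0, theta * weights[i] - r_other[i] / rates[i]) for i in range(n)]
    return theta, lam


def afra_round_robin(R, weights, rounds=60):
    """Run the water-fill step at each BS in turn (assumed schedule).

    R[i][j] is the PHY rate of client i to BS j (0 if out of range).
    Returns the time fractions and the PF objective after each update.
    """
    n, m = len(R), len(R[0])
    lam = [[0.0] * m for _ in range(n)]
    history = []
    for _ in range(rounds):
        for j in range(m):
            clients = [i for i in range(n) if R[i][j] > 0]
            if not clients:
                continue
            r_other = [sum(lam[i][b] * R[i][b] for b in range(m) if b != j)
                       for i in clients]
            _, new = water_fill(r_other, [R[i][j] for i in clients],
                                [weights[i] for i in clients])
            for i, value in zip(clients, new):
                lam[i][j] = value
            thr = [sum(lam[i][b] * R[i][b] for b in range(m)) for i in range(n)]
            if all(t > 0 for t in thr):   # the objective is undefined at zero throughput
                history.append(sum(w * math.log(t) for w, t in zip(weights, thr)))
    return lam, history


# Toy topology (assumed): client 0 sees BS 0, client 2 sees BS 1, client 1 sees both.
R = [[10.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
lam, history = afra_round_robin(R, [1.0, 1.0, 1.0])
print(lam)
print(history[0], history[-1])
```

On this symmetric toy topology the recorded objective $\sum_i \omega_i \log(r_i)$ should be non-decreasing across updates, and the allocation approaches 2/3 of each BS's time for its single-RAT client and 1/3 for the shared client, which equalizes all three throughputs.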
Definition 1 (Equilibrium): The vector of time fractions across all the BSs and clients is an equilibrium outcome if none of the BSs can increase its water-fill level through unilateral change of its time resource allocations.

Our next theorem guarantees the convergence of AFRA.

Theorem 1: Let each BS autonomously execute AFRA. Then, the system converges to an equilibrium, i.e., $\forall i \in \mathcal{N}$ and $j \in \mathcal{M}$: $\lambda_{i,j} \to \lambda^{eq}_{i,j}$, $\theta_j \to \theta^{eq}_j$, and $r_i \to r^{eq}_i$.

Proof: Let $\lambda$ denote the vector of time fractions ($\lambda_{i,j}$s) across all clients and BSs, and $f(\lambda) = \sum_{i=1}^{N} \omega_i \log(r_i)$ be the potential function. A potential function [26] is a useful tool to analyze equilibrium properties, as it maps the payoff (e.g., throughput) of all clients into a single function. Since the number of clients and BSs is finite, $f$ is bounded. The key step to prove convergence is to show that each time a BS $j$ adjusts its time fractions (i.e., $\lambda_{i,j}$s), the potential function ($f$) increases. This property coupled with $f$'s boundedness guarantees its convergence. We will show later in Eq. (15) that the change in potential function is proportional to the product of the change in water-fill levels and the change in $\lambda_{i,j}$s. Since $f$ converges (i.e., its variations converge to 0), one or both of these terms should converge to 0. Either of these conditions guarantees the convergence of the $\lambda_{i,j}$s (and hence, $\theta_j$s and $r_i$s).

Next, we show that each time a BS runs AFRA, $f$ increases. When a BS runs AFRA, it takes some time resources from clients with high $\frac{r_i}{\omega_i R_{i,j}}$ and distributes them across clients with lower values. To ease the proof presentation, we focus on two clients and follow the changes on $f$ as the BS adjusts the $\lambda_{i,j}$s dedicated to these clients. Let $i, i'$ denote two clients who are currently receiving time resources from BS $j$. Assume the following initial (old) order between these two clients

$$\frac{r_i}{\omega_i R_{i,j}} < \frac{r_{i'}}{\omega_{i'} R_{i',j}} \tag{11}$$

Therefore, as BS $j$ executes AFRA it changes the time resources from $\lambda_i$ and $\lambda_{i'}$ to $\lambda_i + \delta$ and $\lambda_{i'} - \delta$, respectively. This only changes the two corresponding terms in the potential function, i.e.,

$$f(\lambda)^{new} - f(\lambda)^{old} = \omega_i \log(r_i + \delta R_{i,j}) - \omega_i \log(r_i) + \omega_{i'} \log(r_{i'} - \delta R_{i',j}) - \omega_{i'} \log(r_{i'}) = \omega_i \log\Big(1 + \delta \frac{R_{i,j}}{r_i}\Big) + \omega_{i'} \log\Big(1 - \delta \frac{R_{i',j}}{r_{i'}}\Big) \tag{12}$$

Let $g(\delta)$ denote the variation in potential function, i.e.,

$$g(\delta) = \omega_i \log\Big(1 + \delta \frac{R_{i,j}}{r_i}\Big) + \omega_{i'} \log\Big(1 - \delta \frac{R_{i',j}}{r_{i'}}\Big) \tag{13}$$

Thus, to prove convergence, we need to prove that $g(\delta)$ is always positive. We prove this by showing first that $g'(\delta) \ge 0$, so that $g(\delta)$ is always non-decreasing. Second, we show that $g(\delta)$ is positive for very small values of $\delta$. Now

$$g'(\delta) = \omega_i \frac{\frac{R_{i,j}}{r_i}}{1 + \delta \frac{R_{i,j}}{r_i}} - \omega_{i'} \frac{\frac{R_{i',j}}{r_{i'}}}{1 - \delta \frac{R_{i',j}}{r_{i'}}} = \frac{\omega_i R_{i,j}}{r_i + \delta R_{i,j}} - \frac{\omega_{i'} R_{i',j}}{r_{i'} - \delta R_{i',j}} = \frac{\omega_i R_{i,j}}{r_i^{new}} - \frac{\omega_{i'} R_{i',j}}{r_{i'}^{new}} \ge 0 \tag{14}$$

Here $r_i^{new}$ and $r_{i'}^{new}$ are the new throughput values for clients $i$ and $i'$, respectively. It is clear that after BS $j$ adjusts the time resources, we still have $\frac{r_i^{new}}{\omega_i R_{i,j}} \le \frac{r_{i'}^{new}}{\omega_{i'} R_{i',j}}$. This is because after BS $j$ reduces $\lambda_{i',j}$, $\frac{r_{i'}^{new}}{\omega_{i'} R_{i',j}}$ would be either equal to the new water-fill level or higher than it (if $\lambda_{i',j} = 0$). On the other hand, $\frac{r_i^{new}}{\omega_i R_{i,j}}$ would be equal to the new water-fill level. As a result, the final term in Eq. (14) is non-negative. Finally, $g(\delta)$ is greater than zero for small values of $\delta$ because, by a Taylor approximation,

$$g(\delta) \approx \omega_i \delta \frac{R_{i,j}}{r_i} - \omega_{i'} \delta \frac{R_{i',j}}{r_{i'}} = \delta \Big(\frac{\omega_i R_{i,j}}{r_i} - \frac{\omega_{i'} R_{i',j}}{r_{i'}}\Big) > 0 \tag{15}$$

The last term in the above equation is positive due to Eq. (11).

B. Convergence Time

Before we can derive a bound on convergence time, we need to define a discretization factor on the time fractions (i.e., $\lambda_{i,j}$s). This technicality is due to the fact that the $\lambda_{i,j}$s in our model are continuous variables, which can cause some BSs to continuously make infinitesimal adjustments to them. These adjustments converge to 0 as time goes to infinity. In practice, operations always happen in discretized levels. For example, consider the following discretization policy:

Definition 2 (Discretization Policy): During water-fill calculation by a BS $j$ in AFRA, the time fraction allocated to the client with minimum $\frac{r_i}{\omega_i R_{i,j}}$ should increase by at least $\epsilon$. Otherwise, the BS would not update its time fractions.

Based on the above discretization policy, we can derive the following bound on the convergence time.

Theorem 2: Consider a HetNet with $N$ clients and $M$ BSs. Then, the number of steps that it takes for AFRA to converge is upper bounded by $O\big(\frac{N M^2 \log(MN)}{\epsilon^2}\big)$.

Proof: Let $f(\lambda) = \sum_{i=1}^{N} \omega_i \log(r_i)$ be the potential function from the proof of Theorem 1. To compute a bound on the convergence time, we study the increments of $f$. The key step is to find a lower bound on $f$'s increments. Since $f$ increases whenever a BS makes adjustments to its $\lambda_{i,j}$s, the convergence time is then upper bounded by the difference between the maximum and minimum possible values of $f$ divided by the lower bound on $f$'s increments.

We take the following steps to find a lower bound on the potential function's increments. Let $\{i_1, i_2, ..., i_q\}$ denote the set of clients with non-zero PHY rates to BS $j$ and assume the following initial (old) order among the clients

$$\frac{r^{old}_{i_1}}{\omega_{i_1} R_{i_1,j}} \le \frac{r^{old}_{i_2}}{\omega_{i_2} R_{i_2,j}} \le \dots \le \frac{r^{old}_{i_q}}{\omega_{i_q} R_{i_q,j}} \tag{16}$$

When BS $j$ executes AFRA, it adjusts the time fractions in a way that increases the time resources allocated to client $i_1$. Let $\epsilon_{i_1}$ denote the increase in client $i_1$'s time resources and $r^{new}_{i_1} = r_{i_1}$ its new throughput. Let $\epsilon_{i_p}$ denote the change in client $i_p$'s ($i_p \in \{i_2, ..., i_q\}$) time resources and $r^{new}_{i_p} = r_{i_p}$ its new throughput. Hence, we have

$$r^{new}_{i_1} = r_{i_1}, \quad r^{old}_{i_1} = r_{i_1} - \epsilon_{i_1} R_{i_1,j} \tag{17}$$

$$r^{new}_{i_p} = r_{i_p}, \quad r^{old}_{i_p} = r_{i_p} + \epsilon_{i_p} R_{i_p,j} \quad \forall i_p \in \{i_2, ..., i_q\} \tag{18}$$

$$\epsilon_{i_1} = \epsilon_{i_2} + \epsilon_{i_3} + \dots + \epsilon_{i_q} \tag{19}$$

However, even after BS $j$ adjusts its time resources, $i_1$ would still have the minimum $\frac{r_i}{\omega_i R_{i,j}}$ across all clients. This is due to the water-fill based operation in AFRA. As a result

$$\frac{r_{i_1}}{\omega_{i_1} R_{i_1,j}} \le \frac{r_{i_p}}{\omega_{i_p} R_{i_p,j}} \quad \forall i_p \in \{i_2, ..., i_q\} \tag{20}$$

$$\implies \frac{R_{i_p,j}}{r_{i_p}} \le \frac{\omega_{i_1} R_{i_1,j}}{\omega_{i_p} r_{i_1}} \tag{21}$$

Next, we find a lower bound on the potential function's increments. Using Eq. (21) for the inequality,

$$f(\lambda)^{old} - f(\lambda)^{new} = \omega_{i_1} \log\Big(1 - \frac{\epsilon_{i_1} R_{i_1,j}}{r_{i_1}}\Big) + \sum_{p=2}^{q} \omega_{i_p} \log\Big(1 + \frac{\epsilon_{i_p} R_{i_p,j}}{r_{i_p}}\Big) \le \omega_{i_1} \log\Big(1 - \frac{\epsilon_{i_1} R_{i_1,j}}{r_{i_1}}\Big) + \sum_{p=2}^{q} \omega_{i_p} \log\Big(1 + \frac{\epsilon_{i_p} R_{i_1,j}\, \omega_{i_1}}{r_{i_1}\, \omega_{i_p}}\Big) \tag{22}$$

Let $W = \sum_{p=2}^{q} \omega_{i_p}$ and $x_p = \frac{\epsilon_{i_p} R_{i_1,j}\, \omega_{i_1}}{r_{i_1}\, \omega_{i_p}}$. Since the logarithm is a concave function, from Jensen's inequality [27],

$$\sum_{p=2}^{q} \omega_{i_p} \log(1 + x_p) = W \sum_{p=2}^{q} \frac{\omega_{i_p}}{W} \log(1 + x_p) \le W \log\Big(\sum_{p=2}^{q} \Big(\frac{\omega_{i_p}}{W} + \frac{\omega_{i_p}}{W} x_p\Big)\Big) = W \log\Big(1 + \sum_{p=2}^{q} \frac{\omega_{i_p}}{W} x_p\Big) \tag{23}$$

Leveraging Eq. (23), we conclude that Eq. (22) is

$$\le \omega_{i_1} \log\Big(1 - \frac{\epsilon_{i_1} R_{i_1,j}}{r_{i_1}}\Big) + W \log\Big(1 + \frac{\omega_{i_1} R_{i_1,j}}{W r_{i_1}} \underbrace{\sum_{p=2}^{q} \epsilon_{i_p}}_{= \epsilon_{i_1}}\Big) = \omega_{i_1} \Big[\log(1 - z) + \frac{1}{\gamma} \log(1 + \gamma z)\Big] \le -\omega_{i_1} \frac{z^2}{2}$$

$$\implies f(\lambda)^{new} - f(\lambda)^{old} \ge \omega_{i_1} \frac{z^2}{2} \tag{24}$$

where $z = \frac{\epsilon_{i_1} R_{i_1,j}}{r_{i_1}}$ and $\gamma = \frac{\omega_{i_1}}{W}$, and the last inequality follows from the Taylor series of the logarithm. Note that since we seek an upper bound on convergence time, we can choose a small enough $\epsilon_{i_1}$ so that $z, \gamma z < 1$. These assumptions increase the upper bound but allow us to use the Taylor series in Eq. (24). If we let $R_{min}$ and $R_{max}$ denote the minimum and maximum PHY rates across all the clients and BSs, then we have

$$\text{Convergence Time} \le \frac{\max f(\lambda) - \min f(\lambda)}{\frac{1}{2} \omega_{min}\, \epsilon^2 \big(\frac{R_{i_1,j}}{r_{i_1}}\big)^2} \le \frac{\big(\sum_{i=1}^{N} \omega_i\big)\big(\log(r_{max}) - \log(r_{min})\big)}{\frac{1}{2} \omega_{min}\, \epsilon^2 \big(\frac{R_{min}}{M R_{max}}\big)^2} \le \frac{\big(\sum \omega_i\big)\big(\log(M R_{max}) - \log\big(\frac{\omega_{min}}{\sum \omega_i} R_{min}\big)\big)}{\frac{1}{2} \omega_{min}\, \epsilon^2 \big(\frac{R_{min}}{M R_{max}}\big)^2} \equiv O\Big(\frac{N M^2 \log(MN)}{\epsilon^2}\Big) \tag{25}$$

V. OPTIMALITY OF AFRA

Beyond convergence, we study the optimality properties of AFRA's equilibria. We first derive some useful properties of the equilibria that we leverage for optimality analysis. Next, we prove that the equilibria also maximize the global proportional fair resource allocation problem across all the BSs, and hence are globally optimal. Finally, we discuss the uniqueness of the equilibria and prove that while the equilibrium throughput vector across all the clients is unique, there could be infinitely many resource allocations that realize this outcome. For simplicity, we do not consider discretization in this section.

Theorem 3: Consider an equilibrium outcome of AFRA. Let $r^{eq}_i$ denote the throughput of client $i$, $\theta^{eq}_j$ the water-fill level of BS $j$, and $\lambda^{eq}_{i,j}$ the fraction of time allocated to client $i$ by BS $j$. Then
I. $\frac{\omega_i R_{i,j}}{r^{eq}_i} \le \frac{1}{\theta^{eq}_j} \quad \forall i \in \mathcal{N},\ j \in \mathcal{M}$
II. $\sum_{i=1}^{N} \lambda^{eq}_{i,j} = 1 \quad \forall j \in \mathcal{M}$
III. $\sum_{i=1}^{N} \omega_i = \sum_{j=1}^{M} \frac{1}{\theta^{eq}_j}$

Proof: Part 1. From the water-fill definition we have

$$R_{i,j} > 0,\ \lambda^{eq}_{i,j} > 0 \implies \frac{r^{eq}_i}{\omega_i R_{i,j}} = \theta^{eq}_j \tag{26}$$

$$R_{i,j} > 0,\ \lambda^{eq}_{i,j} = 0 \implies \frac{r^{eq}_i}{\omega_i R_{i,j}} \ge \theta^{eq}_j \tag{27}$$

Property I follows from Eqs. (26) and (27).

Part 2. Every BS can always increase its water-fill level by distributing its unused time resources across its clients. The property follows, since at equilibrium the water-fill levels cannot be further increased.

Part 3. We leverage I and II to derive property III as follows

$$\sum_{i=1}^{N} \omega_i = \sum_{i=1}^{N} \omega_i \frac{r^{eq}_i}{r^{eq}_i} = \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{\omega_i \lambda^{eq}_{i,j} R_{i,j}}{r^{eq}_i} = \sum_{\lambda^{eq}_{i,j} > 0} \frac{\omega_i \lambda^{eq}_{i,j} R_{i,j}}{r^{eq}_i} \overset{\text{Eq. (26)}}{=} \sum_{\lambda^{eq}_{i,j} > 0} \frac{\lambda^{eq}_{i,j}}{\theta^{eq}_j} \overset{\text{II}}{=} \sum_{j=1}^{M} \frac{1}{\theta^{eq}_j} \tag{28}$$

We next show that any equilibrium outcome of AFRA is globally optimal, i.e., it maximizes the global PF resource allocation problem.

Theorem 4: Consider an equilibrium outcome of AFRA. Then, the equilibrium outcome also maximizes the global PF resource allocation problem, i.e., it maximizes $\sum_{i=1}^{N} \omega_i \log(r_i)$ subject to the feasibility constraints in Eqs. (1)-(3).

Proof: Let $r^{eq}_i$ and $\theta^{eq}_j$ denote the throughput of client $i$ and water-fill level of BS $j$ at an equilibrium, respectively. We prove that for any feasible selection of $\lambda_{i,j}$s (i.e., $\lambda_{i,j}$s that satisfy the feasibility conditions in Eqs. (2) and (3)) and the corresponding clients' throughput values (i.e., $r_i$s as defined in Eq. (1)) we have

$$\sum_{i=1}^{N} \omega_i \log(r_i) \le \sum_{i=1}^{N} \omega_i \log(r^{eq}_i) \tag{29}$$

Define $W = \sum_{i=1}^{N} \omega_i$. Eq. (29) can then be proved through the following inequalities by leveraging properties I and III from Theorem 3:

$$\sum_{i=1}^{N} \omega_i \log(r_i) - \sum_{i=1}^{N} \omega_i \log(r^{eq}_i) = \sum_{i=1}^{N} \omega_i \log\Big(\frac{r_i}{r^{eq}_i}\Big) = W \sum_{i=1}^{N} \frac{\omega_i}{W} \log\Big(\frac{r_i}{r^{eq}_i}\Big) \overset{\text{Jensen}}{\le} W \log\Big(\sum_{i=1}^{N} \frac{\omega_i}{W} \times \frac{r_i}{r^{eq}_i}\Big)$$

$$= W \log\Big(\frac{1}{W} \times \sum_{i=1}^{N} \frac{\omega_i r_i}{r^{eq}_i}\Big) = W \log\Big(\frac{1}{W} \times \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{\omega_i \lambda_{i,j} R_{i,j}}{r^{eq}_i}\Big) \overset{\text{I}}{\le} W \log\Big(\frac{1}{W} \times \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{\lambda_{i,j}}{\theta^{eq}_j}\Big)$$

$$\overset{\text{Eq. (2)}}{\le} W \log\Big(\frac{1}{W} \times \sum_{j=1}^{M} \frac{1}{\theta^{eq}_j}\Big) \overset{\text{III}}{=} W \log\Big(\frac{1}{W} \times \sum_{i=1}^{N} \omega_i\Big) = 0 \tag{30}$$

In our last theorem we prove that while the equilibrium throughput vector across all clients is unique, there could be infinitely many resource allocations that realize this outcome.

Theorem 5: Let $r^{eq} = (r^{eq}_1, ..., r^{eq}_N)$ denote the vector of throughput rates across all clients at an equilibrium. Then, $r^{eq}$ is unique. However, there could be infinitely many resource allocations across the BSs that realize $r^{eq}$.

Proof: Part 1. We first prove that $r^{eq}$ is unique. Let $r^{eq}$ maximize the global proportional-fair resource allocation across all clients and assume $r'^{eq}$ is a different equilibrium. From Theorem 4, we know that every other equilibrium should also maximize the global PF resource allocation. This means that all inequalities in Eq. (30) should be equalities for any equilibrium, including $r'^{eq}$. Now, for the first inequality to be an equality (i.e., Jensen inequality of Eq. (30)), the following condition needs to be satisfied [27]

$$\frac{r'^{eq}_1}{r^{eq}_1} = \frac{r'^{eq}_2}{r^{eq}_2} = \dots = \frac{r'^{eq}_N}{r^{eq}_N} \implies r'^{eq}_i = \alpha\, r^{eq}_i \quad \forall i \in \mathcal{N} \tag{31}$$
= r10eq r20eq rn0eq PN PN eq 0eq Further, since i=1 ωi log(ri ) = i=1 ωi log(ri ), we conclude that rieq = ri0eq ∀i ∈ N (32) Part 2. To prove that there could be infinitely many resource allocations that realize req , we provide an example. Consider a topology with two BSs (j1 , j2 ) and two clients (i1 , i2 ). Let Ri1 ,j = 1P∀j ∈ M, Ri2 ,j = 2 ∀j ∈ M, and ωi1 = ωi2 = 2. Then, ωi log(ri ) is maximized by the following time fractions for any α ∈ [0 1]. λi,j = α for i = i1 , j = j1 and i = i2 , j = j2 λi,j = 1 − α for i = i1 , j = j2 and i = i2 , j = j1 (33) Here, irrespective of α, ri1 = 1 and ri2 = 2. VI. P ERFORMANCE E VALUATION In this section, we evaluate AFRA’s performance through experiments and simulations. First, we investigate the benefits of MAC level traffic aggregation in a small testbed composed of four SDR (software-defined radio)-based BSs and clients. Next, we conduct simulations to evaluate AFRA’s equilibria properties as we scale the number of clients and BSs. Finally, we compare AFRA’s speed and over-the-air signaling overhead against DDNUM, a dual decomposition based algorithm that we derived from the NUM framework. A. SDR-Based Implementation and Real-World Performance Implementation. We construct a HetNet topology composed of a WiFi BS, a cellular BS, and two clients. The two BSs are physically separated from each other and are placed in an indoor lab environment (Fig. 4(a)). We use a WARP board [28] with 802.11a reference design as our WiFi BS. We use another WARP board with OFDM PHY (WARP OFDM reference design) and a custom TDMA (Time Division Multiple Access) MAC to mimic a cellular BS. We use two other WARP boards to construct our two clients. Each client has access to both WiFi and cellular radios, and remains static and connected to both BSs throughout the experiments. A server running iPerf sessions is connected to both BSs through Ethernet. 
For each client, the server generates a single fully-backlogged UDP traffic flow with 500 byte packets. We implement a below-IP sublayer to split this traffic flow between the two BSs. This sublayer is responsible for selection of the BS to be used for each packet, and acts similar to the LWA Adaptation Protocol (LWAAP) in the LWA standard [2]. In our implementation, we sequentially iterate between the WiFi and cellular BSs to route the packets of each traffic flow.

AFRA, as presented in Section III-D, does not account for various types of overhead (e.g., PHY/MAC header, ACKs, idle slots, collisions) that exist in PHY/MAC protocols. To address the issue, we introduce the notion of effective rate ($R^{eff}$) and replace all $R_{i,j}$s in AFRA with $R_{i,j}^{eff}$s. For a single packet, $R^{eff}$ can be calculated as the number of bits in the packet divided by the total time it takes by a BS to successfully transmit that packet (including all overhead). In our implementation, each BS keeps track of the total time spent in successfully transmitting the past 5 packets of each traffic flow (i.e., the past 5 packets of each client) to calculate its $R_{i,j}^{eff}$. The averaging over 5 packets is to account for channel fluctuations in our experiments, and can be adjusted based on the client mobility.

We implement the following mechanisms: (i) WiFi only: the cellular BS is off but the WiFi BS is active; (ii) Cellular only: WiFi BS is off; (iii) AGG-RR: this scheme uses aggregation but with a round robin (RR) scheduler at the WiFi BS and conventional PF MAC at the cellular BS. With the RR scheduler, the WiFi BS maintains a different queue for each client and sequentially serves a single packet from each queue at every round.
With the PF MAC at the cellular BS, the BS dedicates its time resources to each client according to Section III-C (single BS PF); (iv) AFRA: each BS uses its calculated $\lambda_{i,j}$s to determine the number of packets that should be served from each queue in WiFi and the number of time slots that should be dedicated to each queue (client) in cellular, at every round. In our implementation, both clients' $\omega_i$ are equal to 1 and the BSs update their $\lambda_{i,j}$s every 5 ms.

Fig. 4. We use two WARP boards to construct two BSs in our testbed. The BSs are connected to a server through Ethernet. The server runs a single fully-backlogged DL UDP iPerf session to each client. A sublayer implementation below the IP layer at the server selects the BS for each packet of every traffic flow. The clients (not shown in the photo) have access to both radios, and remain static and connected to both BSs throughout the experiments (a); Cellular TDMA and WiFi MACs. The PHY header and ACKs are sent at a fixed transmission rate. Clients embed the throughput they receive from the other BS in their ACK packets. The MAC header and payload are transmitted at a variable transmission rate. We define $R^{eff}$ as the total number of payload bits divided by the total time it takes to successfully transmit a packet. We replace all $R_{i,j}$s in AFRA with $R_{i,j}^{eff}$ to derive the $\lambda_{i,j}$s and determine the number of packets that should be served from each queue (b); Total throughput across the two clients for four schemes: WiFi only (WiFi), Cellular only (Cellular), AGG-RR, and AFRA. AFRA achieves a higher average total throughput (29 Mbps vs 20 Mbps) and PF index (2.3 vs 1.97) compared to AGG-RR (c); Per-client throughput values for both AFRA and AGG-RR (d).

Fig. 5. AFRA's performance evaluation results. Average number of steps to convergence as a function of number of clients (a) and number of BSs (b). Evolution of potential function for two simulation scenarios, one with M=10, N=10 (c) and the other with M=10, N=20 (d). Each Run in these figures corresponds to a different simulation realization. In the priority curves (solid black curve with * markers), the BS with the highest local increase in potential function gets priority in executing AFRA. Leveraging this policy reduces the average convergence time by more than 30%.

Performance Results. Fig. 4(c) shows the performance of the four schemes. In both the WiFi only and Cellular only options, only a single BS is active throughout the experiments. We observe that the Cellular only scheme provides a higher sum throughput than the WiFi only scheme. With careful evaluation of packet transmission traces, we discovered that this higher throughput is primarily due to the corresponding MAC protocols. In particular, WiFi MAC provides the same transmission opportunity to each traffic flow (client). As a result, the client with lower PHY rate occupies the channel for a longer duration than the other client. This decreases the throughput for both clients. In contrast, the cellular TDMA MAC provides the same transmission time for both clients (with 2 clients, single BS PF equally divides the time between the clients (Eq. 5)). As a result, the throughput of the client with higher PHY rate does not drop because of the client with a lower PHY rate. This, along with other MAC issues such as WiFi contention, reduces the WiFi only throughput. Fig. 4(c) also shows that the two RAT aggregation schemes (AGG-RR and AFRA) can successfully aggregate WiFi and cellular capacities and provide a higher sum throughput than the WiFi only and Cellular only options.
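The MAC-level effect described in the Performance Results above (equal transmission opportunities in WiFi versus equal transmission time in the TDMA MAC) can be illustrated with a minimal sketch. It is an idealized model written for illustration only: it ignores headers, ACKs, contention and all other overhead, assumes equal-size packets, and the function names and the PHY rates in the example are hypothetical, not values measured in the testbed.

```python
def packet_fair_throughputs(phy_rates_mbps):
    """Per-client throughput when every client gets the same number of
    equal-size packet transmissions per round (idealized WiFi-like MAC)."""
    # Airtime needed to deliver one unit of data to every client once.
    round_time = sum(1.0 / rate for rate in phy_rates_mbps)
    # Each client receives exactly one unit of data per round.
    return [1.0 / round_time for _ in phy_rates_mbps]


def time_fair_throughputs(phy_rates_mbps):
    """Per-client throughput when airtime is split equally across clients
    (idealized TDMA MAC, i.e., single-BS PF with equal weights)."""
    share = 1.0 / len(phy_rates_mbps)
    return [share * rate for rate in phy_rates_mbps]


if __name__ == "__main__":
    rates = [54.0, 6.0]  # hypothetical PHY rates in Mbps (assumption)
    print("packet-fair sum:", sum(packet_fair_throughputs(rates)))
    print("time-fair sum:  ", sum(time_fair_throughputs(rates)))
```

With these hypothetical rates, the equal-opportunity model yields a sum throughput of about 10.8 Mbps, while the equal-time model yields 30 Mbps, because the low-rate client no longer holds the channel for most of each round.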
Further, AFRA increases the average total throughput by 45% (from 20 to 29 Mbps) with 18 and 11 Mbps per-client total throughput values (per-client throughput plots are shown in Fig. 4(d)). Let us define the proportional fairness index as $PF = \sum_{i=1}^{2} \log(r_i)$ ($r_i$ is the total throughput of each client across its RATs in Mbps). Then, the PF index in AFRA would be 2.3. With AGG-RR, the per-client throughput rates drop to 12.5 and 7.5 Mbps. Thus, the PF index reduces to 1.97. AGG-RR uses the conventional scheduling algorithms on each BS (i.e., it uses RR in WiFi and single BS PF in cellular), which reduce both the sum throughput and the PF fairness index.

B. AFRA's Equilibria Properties

Setup. We simulated network deployments with $N$ clients and $M$ BSs to evaluate AFRA's equilibria properties as we scale the number of clients and BSs. All clients' $\omega_i$s are equal to 1. Half of the BSs are WiFi and the other half are cellular. Each client has access to 4 RATs, two WiFi and two cellular. The PHY rates for the WiFi and cellular RATs are randomly selected from the sets {1, 2, 5.5, 11} Mbps and {5.2, 10.3, 25.5, 51} Mbps, respectively. In each simulation realization, we randomly associate clients' RATs with BSs. Next, we run AFRA until an equilibrium is reached. We set the discretization factor $\epsilon$ equal to 0.05, i.e., a BS adjusts its time fractions only if the increase in time fraction (i.e., $\lambda_{i,j}$) at its client with minimum $\frac{r_i}{\omega_i R_{i,j}}$ is greater than or equal to 0.05. For the initial allocation, each BS equally divides its time across its clients. Unless otherwise specified, each of our simulation points is an average of 100 simulation realizations.

AFRA's Convergence Time. Figs. 5(a) and 5(b) depict the impact of the number of clients and BSs on AFRA's convergence time. In each of these figures, we count the number of steps until convergence is reached. At each step, a single BS that needs to adjust its time fractions is randomly selected. In Fig.
5(a), we vary the number of clients from 10 to 100 and plot the corresponding convergence times for three different $M$ values: 10, 20, and 50. We repeat this simulation by changing the $N$ and $M$ variables and plot the corresponding results in Fig. 5(b). From these two figures, we observe that time to convergence is highest when the number of clients is between one and two times the number of BSs. As the ratio between the number of clients and BSs (i.e., $\frac{N}{M}$) leaves this range, the convergence time rapidly drops and then stabilizes. The results show that AFRA requires a small number of steps to reach an equilibrium.

Policies to Further Reduce AFRA's Convergence Time. Our next goal is to design policies that can further reduce AFRA's convergence time. To gain intuition on how to design such policies, we simulated a topology with 10 clients and 10 BSs and plotted the evolution of the potential function (i.e., $\sum_i \log(r_i)$) as BSs adjusted their time fractions. The results are shown in Fig. 5(c). Here, each Run corresponds to a different simulation realization. From these realizations we make two observations. First, there is a wide gap in the convergence times. Second, a high jump in the potential function pushes the system closer to equilibrium.

Based on these observations, we designed a prioritization policy among the BSs to reduce the convergence time. We let each BS calculate the increase in the potential function assuming that it is the only BS executing AFRA. Since in AFRA each BS knows the current total throughput of its clients, it has all the needed information to calculate the increase in the potential function due to its action. Next, each BS broadcasts its calculated value. Finally, the BS with the highest value gets priority in executing AFRA. This distributed policy can be easily implemented in networks where all the BSs are connected to the same backbone (e.g., Ethernet). The solid black curve in Fig.
5(c) shows the potential function's evolution with this policy. We observe that on average, the convergence time drops from 15 steps to 10, i.e., the prioritization policy reduces the convergence time by 33%. We repeated this simulation for another setup with 20 clients to increase the topological redundancy. The results are plotted in Fig. 5(d). Similarly, the average convergence time reduces from 19 steps to 13, i.e., a 32% reduction in convergence time.

C. Comparison Against DDNUM

We have compared AFRA's performance against DDNUM, a distributed algorithm that we developed by leveraging dual decomposition and the NUM framework. Dual decomposition is appropriate to solve the multi-RAT PF allocation problem, because the coupling constraint (Eq. (2)) can be relaxed through the dual problem and then the problem decouples into subproblems that can be iteratively solved by clients and BSs. DDNUM is in essence similar to the standard dual algorithm presented in [7] to solve the basic NUM problem. We modified the algorithm in [7] to capture the constraints of our problem. At a high level, DDNUM has three main steps (for detailed algorithm derivation and discussions, refer to the Appendix):

• Step 1: Initialization: set $t = 0$ and $\mu(0)$ to some nonnegative value for each BS. Here, $\mu(t)$ is the vector of Lagrange multipliers that shows the cost or congestion across all BSs. Each BS broadcasts its $\mu_j(0)$ to clients with $R_{i,j} > 0$.
• Step 2: Each client $i$ locally solves its Lagrangian problem, i.e., finds its time fractions ($\lambda_{i,j}^{*}(\mu_j(t))$) for each BS with $R_{i,j} > 0$ and informs those BSs.
• Step 3: Each BS updates its price with a step size $\gamma$ and broadcasts the new price $\mu_j(t + 1)$ to all its clients.

This procedure is repeated until a satisfying termination point is reached (e.g., the solution is within a desired proximity of the optimal solution). Similar to AFRA, DDNUM is guaranteed to converge and maximize the global optimization problem.
However, there are several practicality and performance issues. We highlight a few of these issues next.

Setup. To compare AFRA to DDNUM, we used the simulation setup in Section VI-B (without the BS prioritization policy). We first run AFRA and let the system converge to an equilibrium. Next, we consider the 95% value of AFRA's potential function at equilibrium as the desired algorithm termination point. We count the number of steps to reach the termination point and the resulting over-the-air signaling overhead in each of these two schemes. In DDNUM, the step size $\gamma$ (step 3) provides a balance between the final throughput values and speed. We choose the $\gamma$ that results in the fastest convergence time, subject to the potential function reaching the termination point. Finally, both AFRA and DDNUM can operate in either parallel or sequential mode with similar relative performance. We present the sequential mode results, i.e., at each time only a single BS adjusts its water-fill level (in AFRA) or announces a new price (in DDNUM). We assume that clients immediately update their BSs about their new throughput values (in AFRA) and desired $\lambda_{i,j}$s (in DDNUM) with no impact on the convergence time (similar to an FDD system in which uplink data is immediately available).

Fig. 6. Compared to AFRA, DDNUM increases the average convergence time by 2.4x and the average over-the-air signaling overhead by 4.5x.

Speed. Fig. 6(a) shows the convergence time results for a scenario with 10 BSs and varying number of clients. We observe that irrespective of the number of clients, DDNUM increases the convergence time by a factor of 2-3x with an average of 2.4x. In AFRA, each BS simultaneously calculates the water-fill level and finds the corresponding time fraction for each client. In DDNUM, the pricing mechanism requires a high number of iterations so that clients can find their optimal time fractions. This increases the convergence time.

Over-the-Air Overhead. Fig.
6(b) shows the wireless signaling overhead results of the two schemes. We observe that DDNUM increases the signaling overhead by a factor of 4-5x with an average of 4.5x. There are several factors that contribute to DDNUM's high signaling overhead. First, the increase in convergence time results in a similar multiplicative increase in overhead. Second, in DDNUM both BSs and clients contribute to overhead. BSs continuously broadcast new prices and clients continuously inform each of their BSs about their desired time fractions. In contrast, in AFRA only clients update the BSs regarding their new throughput values. Third, with careful examination of simulation traces, we observed that in AFRA the water-fill operation only impacts a few of a BS's clients each time. In DDNUM, each time a BS updates its price, most of its clients would request new time fractions.

Practicality. In DDNUM, each BS broadcasts its price while each client finds its desired $\lambda_{i,j}^{*}$s from its BSs. However, in real wireless systems BSs are responsible for resource allocation. Note that in DDNUM, it is not practical to shift the calculation of $\lambda_{i,j}^{*}$s (i.e., step 2) to BSs. This is because in order for a BS $j$ to find the $\lambda_{i,j}^{*}$s for each of its clients (e.g., $i$), it would require knowledge about the client's $R_{i,j}$ and $\mu_j$ for every other BS for which the client's rate (i.e., $R_{i,j}$) is greater than zero. This information is only available at the client and pushing it to the BS would significantly increase the overhead, which is already very high in DDNUM.

Complexity. In DDNUM, each client has to solve a complex Lagrangian subproblem to find its desired time fraction for each BS (step 2). This increases the computational complexity on the client devices. In contrast, AFRA identifies the time resources at the BSs, which have higher power and computing resources. Moreover, as we discussed in Section III-D, AFRA has a very low total computational complexity.

VII.
CONCLUSION

We addressed the problem of proportional fair multi-RAT traffic aggregation in HetNets. We studied the conventional PF resource allocation in a single BS and showed that we can look at the problem as a special type of water-filling. Based on this observation, we designed a new fully distributed water-filling algorithm for HetNets. We also studied the convergence, speed, and optimality of our algorithm. We proved that our algorithm quickly converges to equilibria and derived tight bounds to quantify its speed. We also studied the characteristics of the optimal outcome, and used the properties to prove the outcomes of our algorithm are globally optimal.

REFERENCES

[1] “Samsung download booster: use WiFi and LTE simultaneously,” https://www.pcmag.com/article2/0,2817,2455011,00.asp
[2] 3GPP, “Introduction of LTE-WLAN radio level integration and interworking enhancement,” in 3GPP Technical Report, R2-156737, 2015.
[3] A. Zakrzewska, D. Lopez-Perez, S. Kucera, and H. Claussen, “Dual connectivity in LTE hetnets with split control and user plane,” in Proceedings of IEEE GLOBECOM Workshops, 2013.
[4] 3GPP, “Study on new radio (NR) access technology (release 14),” in 3GPP Technical Report, TR 38.912, 2017.
[5] FCC, “2016 broadband progress report,” January 2016.
[6] FCC’s Office of Engineering & Technology and Consumer & Governmental Affairs Bureau, “2015 measuring broadband america fixed broadband report: A report on consumer fixed broadband performance in the US,” 2015.
[7] D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” in IEEE Journal on Selected Areas in Communications, 2006.
[8] F. P. Kelly, A. Maulloo, and D. Tan, “Rate control for communication networks: shadow prices, proportional fairness and stability,” in Journal of the Operational Research Society, 1998.
[9] S. Singh, M. Geraseminko, S.-P. Yeh, N. Himayat, and S. Talwar, “Proportional fair traffic splitting and aggregation in heterogeneous wireless networks,” in IEEE Communications Letters, 2016.
[10] N. Prasad and S. Rangarajan, “Exploiting dual connectivity in heterogeneous cellular networks,” in Proceedings of IEEE WiOpt, 2017.
[11] A. Stolyar, “On the asymptotic optimality of the gradient scheduling algorithm for multi-user throughput allocation,” in Operations Research Journal, 2005.
[12] S.-B. Lee, S. Choudhury, A. Khoshnevis, S. Xu, and S. Lu, “Downlink MIMO with frequency-domain packet scheduling for 3GPP LTE,” in Proceedings of IEEE INFOCOM, 2009.
[13] S. Shakkottai, E. Altman, and A. Kumar, “Multihoming of users to access points in WLANs: a population game perspective,” in IEEE Journal on Selected Areas in Communication, 2009.
[14] Y. Bejerano, S.-J. Han, and L. Li, “Fairness and load balancing in wireless lans using association control,” in IEEE/ACM Transactions on Networking, 2007.
[15] L. Li, M. Pal, and Y. Yang, “Proportional fairness in multi-rate wireless lans,” in Proceedings of IEEE INFOCOM, 2008.
[16] E. Aryafar, A. K. Haddad, C. Joe-Wong, and M. Chiang, “Max-min fair resource allocation in hetnets: Distributed algorithms and hybrid architecture,” in Proceedings of IEEE ICDCS, 2017.
[17] D. Ibarra, N. Desai, and I. Demirkol, “Software-based implementation of LTE/Wi-Fi aggregation and its impact on higher layer protocols,” in Proceedings of IEEE ICC, 2018.
[18] Y. Khadraoui, X. Lagrange, and A. Gravey, “Implementation of LTE/WiFi link aggregation with very tight coupling,” in Proceedings of IEEE PIMRC, 2017.
[19] T. V. Pasca, N. Sen, V. Reddy, B. R. Tamma, and A. Franklin, “A framework for integrating MPTCP over LWA - a testbed evaluation,” in Proceedings of ACM WiNTECH, 2018.
[20] Y.-B. Lin, H.-C. Tseng, L.-C. Wang, and L.-J. Chen, “Performance of splitting LTE-WLAN aggregation,” in Mobile Networks and Applications, Springer, 2018.
[21] D. P. Bertsekas and J. N. Tsitsiklis, “Parallel and distributed computation: numerical methods,” in Englewood Cliffs, NJ: Prentice-Hall, 1989.
[22] D. P. Bertsekas and R. G. Gallager, “Data networks,” in Englewood Cliffs, NJ: Prentice-Hall, 1987.
[23] X. Lin, N. B. Shroff, and R. Srikant, “A tutorial on cross-layer optimization in wireless networks,” in IEEE Journal on Selected Areas in Communications, 2006.
[24] W. Wang, X. Liu, J. Vicente, and P. Mohapatra, “Integration gain of heterogeneous WiFi/WiMAX networks,” in IEEE Transactions on Mobile Computing, 2011.
[25] Q. Ye, B. Rong, Y. Chen, M. Al-Shalash, C. Caramanis, and J. G. Andrews, “User association for load balancing in heterogeneous cellular networks,” in IEEE Transactions on Wireless Communications, 2013.
[26] A. Monderer and L. S. Shapley, “Potential games,” in Games and Economic Behavior, 1996.
[27] Jensen Inequality, https://en.wikipedia.org/wiki/Jensen%27s_inequality
[28] “WARP Project,” https://warpproject.org/trac

APPENDIX

To maximize the PF objective function in generic multi-RAT HetNets we need to solve the following problem

$$P_2: \quad \max \sum_{i=1}^{N} \omega_i \log(r_i)$$
$$\text{s.t.} \quad r_i = \sum_{j=1}^{M} \lambda_{i,j} R_{i,j} \quad \forall i \in N$$
$$\sum_{i=1}^{N} \lambda_{i,j} \leq 1 \quad \forall j \in M$$
$$\text{variables:} \quad \lambda_{i,j} \geq 0 \quad \forall i \in N, j \in M$$

By capturing the first constraint in the objective function we can reformulate $P_2$ as

$$P_3: \quad \max \sum_{i=1}^{N} \left(\omega_i \log\left(\sum_{j=1}^{M} \lambda_{i,j} R_{i,j}\right)\right)$$
$$\text{s.t.} \quad \sum_{i=1}^{N} \lambda_{i,j} \leq 1 \quad \forall j \in M$$
$$\text{variables:} \quad \lambda_{i,j} \geq 0 \quad \forall i \in N, j \in M$$

We can use dual decomposition to solve $P_3$ since the constraints that couple the $\lambda_{i,j}$ variables (i.e., the first line of constraints in $P_3$) can be relaxed using Lagrange duality, and then the optimization problem decouples into several subproblems that, as we show next, can be solved distributedly. Let $\mu_j$ be the Lagrange multiplier for the $j$th constraint.
Then the Lagrangian of $P_3$ can be written as

$$L(\lambda, \mu) = \sum_{i=1}^{N} \left(\omega_i \log\left(\sum_{j=1}^{M} \lambda_{i,j} R_{i,j}\right)\right) + \sum_{j=1}^{M} \mu_j \left(1 - \sum_{i=1}^{N} \lambda_{i,j}\right) = \sum_{i=1}^{N} \left[\omega_i \log\left(\sum_{j=1}^{M} \lambda_{i,j} R_{i,j}\right) - \sum_{j=1}^{M} \mu_j \lambda_{i,j}\right] + \sum_{j=1}^{M} \mu_j \tag{34}$$

Here $\lambda$ is the vector of original optimization variables, which are also referred to as primal variables. The Lagrange multipliers ($\mu_j$) are also referred to as dual variables. The problem now separates into two levels of optimization [7]. At the lower level, each client $i$ needs to solve the following Lagrangian subproblem for a given $\mu$

$$\max_{\lambda_{i,j}} \quad \omega_i \log\left(\sum_{j=1}^{M} \lambda_{i,j} R_{i,j}\right) - \sum_{j=1}^{M} \mu_j \lambda_{i,j}$$
$$\text{s.t.} \quad \lambda_{i,j} \geq 0 \quad \forall i \in N, j \in M \tag{35}$$

At a higher level, we have the master dual problem in charge of updating the dual variables ($\mu$) by solving the following dual problem:

$$\min_{\mu} \quad \sum_{i} g_i(\mu) + \sum_{j=1}^{M} \mu_j$$
$$\text{s.t.} \quad \mu \geq 0 \tag{36}$$

where $g_i(\mu)$ is the dual function, obtained as the maximum value of the Lagrangian subproblem solved in (35) for a given $\mu$. This approach solves the dual problem. However, since the original problem in $P_3$ is convex (and there exists a strictly feasible solution), solving the dual problem equivalently solves the primal problem in $P_3$. Note that the objective function in (36) is convex and differentiable. Hence, we can use the following simple gradient method at each BS $j$ to solve (36):

$$\mu_j(t + 1) = \left[\mu_j(t) - \gamma \left(1 - \sum_{i=1}^{N} \lambda_{i,j}^{*}(t)\right)\right]^{+} \tag{37}$$

where $\lambda_{i,j}^{*}$ is the solution to (35), $t$ is the iteration index, $\gamma > 0$ is a positive step size, and $[\cdot]^{+}$ denotes the projection onto the non-negative orthant. As $t \to \infty$, the dual variables converge to the dual optimal $\mu^{*}$ and the primal variables $\lambda^{*}(\mu(t))$ converge to the optimal primal variable $\lambda^{*}$. Algorithm DDNUM, shown below, summarizes the above steps.

DDNUM: Dual Decomposition Based Resource Allocation

Inputs: Known $R_{i,j}$ at each client $i$ for every BS $j$ for which $R_{i,j} > 0$.
Initialization: Set $t = 0$ and $\mu(0)$ to some nonnegative value for each BS.
• Step 1: Each client $i$ locally solves its Lagrangian problem in (35), i.e., finds its time fractions ($\lambda_{i,j}^{*}(\mu_j(t))$) for each BS with $R_{i,j} > 0$, and informs those BSs.
• Step 2: Each BS updates its price according to Eq. (37) and broadcasts the new price to all its clients (i.e., clients with $R_{i,j} > 0$).
• Step 3: Set $t \leftarrow t + 1$ and go to step 1 (until the satisfying termination point is reached).
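One DDNUM iteration (Steps 1 and 2 above) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. In particular, the closed-form client response is our own derivation for subproblem (35): because the subproblem is linear in the cost term, a client places all of its demand on the BS with the largest $R_{i,j}/\mu_j$ and requests $\lambda = \omega_i/\mu_j$ there. This assumes strictly positive prices, breaks ties by taking the first best BS, and places no upper cap on $\lambda$. The function names are hypothetical. With such a response the primal iterates may oscillate in practice, and the step-size tuning used in the paper's simulations is not reproduced here.

```python
def client_best_response(omega, rates, prices):
    """Assumed closed-form solution of the client subproblem (35).

    Maximizes omega*log(sum_j lam_j*R_j) - sum_j mu_j*lam_j over lam >= 0
    by requesting time only from the BS with the best rate-per-price ratio.
    """
    lam = [0.0] * len(rates)
    # Only BSs with a positive PHY rate and a positive price are candidates.
    candidates = [j for j, r in enumerate(rates) if r > 0 and prices[j] > 0]
    if not candidates:
        return lam
    best = max(candidates, key=lambda j: rates[j] / prices[j])
    lam[best] = omega / prices[best]
    return lam


def price_update(prices, lam, gamma):
    """Projected gradient step of Eq. (37) for every BS j."""
    updated = []
    for j, mu in enumerate(prices):
        demand = sum(row[j] for row in lam)  # total time requested from BS j
        updated.append(max(0.0, mu - gamma * (1.0 - demand)))
    return updated


def ddnum_step(omegas, rate_matrix, prices, gamma):
    """One iteration: clients respond to prices, then BSs update prices."""
    lam = [client_best_response(w, r, prices)
           for w, r in zip(omegas, rate_matrix)]
    return lam, price_update(prices, lam, gamma)
```

As a usage example, with `omegas = [1, 1]`, `rate_matrix = [[1, 2], [2, 1]]`, unit prices and `gamma = 0.1`, each client requests one unit of time from its faster BS, the total demand at each BS equals 1, and the prices stay unchanged, so this price vector is a fixed point of the iteration.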