
Universal Beamforming: A Deep RFML Approach

Proceedings of the International Conference on Modeling Analysis and Simulation of Wireless and Mobile Systems

We introduce, design, and evaluate a set of universal receiver beamforming techniques. Our approach and system, DEFORM, is a Deep Learning (DL)-based RX beamformer that achieves significant gain for multi-antenna RF receivers while being agnostic to the transmitted signal features (e.g., modulation or bandwidth). It is well known that combining coherent RF signals from multiple antennas results in a beamforming gain proportional to the number of receiving elements. In practice, however, this approach relies heavily on explicit channel estimation techniques, which are link-specific and require significant communication overhead to be transmitted to the receiver. DEFORM addresses this challenge by leveraging a Convolutional Neural Network to estimate the channel characteristics, in particular the relative phase between antenna elements. It is specifically designed to address the unique features of complex-valued wireless signal samples, such as the ambiguous 2π phase discontinuity and the high sensitivity of the link Bit Error Rate. The channel prediction is subsequently used in the Maximum Ratio Combining algorithm to achieve an optimal combination of the received signals. Although it is trained on a fixed, basic set of RF settings, we show that DEFORM's DL model is universal, achieving up to 3 dB of SNR gain for a two-antenna receiver in an extensive evaluation covering various modulations and bandwidths.

CCS CONCEPTS • Networks → Wireless access networks; • Computing methodologies → Neural networks.

Hai N. Nguyen, [email protected], Khoury College of Computer Sciences, Northeastern University
Guevara Noubir, [email protected], Khoury College of Computer Sciences, Northeastern University

KEYWORDS
Universal RX beamforming; deep learning; RFML

ACM Reference Format: Hai N. Nguyen and Guevara Noubir. 2022. Universal Beamforming: A Deep RFML Approach. In Proceedings of the International Conference on Modeling Analysis and Simulation of Wireless and Mobile Systems (MSWiM ’22), October 24–28, 2022, Montreal, QC, Canada. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3551659.3559041

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. © 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-9479-6/22/10. . . $15.00

1 INTRODUCTION

The success of wireless communications was accompanied by a dramatic crowding of the RF spectrum. Beamforming is a widely used spatial filtering technique to steer RF emissions towards or from other devices. It enables the array and diversity gains of multiple-input-single-output (MISO) systems [7, 27] and the cancellation of interference from unwanted sources. Beamforming has been extensively investigated in a variety of applications such as radars, sonars, acoustics, astronomy, and even medical device design [18]. Beamforming and more advanced MIMO techniques are widely used in systems targeting high throughput and spectral efficiency, such as cellular systems since the third generation (3GPP 3G) and IEEE 802.11n. Extensive research has been conducted on beamforming [4, 19, 28, 30, 34]. However, beamforming in today's systems requires explicit mechanisms such as sounding and feedback in 802.11, the Demodulation Reference Signal (DMRS) in 5G, training sequences, etc. This approach introduces critical limitations. Firstly, it results in additional overhead to transmit reference signals and estimate the channel. Secondly, accurate channel estimation typically exhibits a long delay that is undesirable in fast-changing channels. Thirdly, it requires compatibility between the transmitter and receiver (to agree on when, what, and how reference signals are transmitted).

Advances in Deep Learning (DL) techniques and models have recently achieved great success in numerous areas such as computer vision [8], speech recognition [13], and wireless communications [21]. A deep neural network can analyze complex patterns in raw I/Q data collected from the RF front-end and accurately predict channel characteristics in a very short amount of time. Such capabilities eliminate the requirement of TX-RX compatibility and are crucial for a universal beamforming component that supports the communications of different technologies operating over the same spectrum. However, despite this great potential, most prior work on DL-based RX beamforming [16, 32] is limited to analytical and simulation results, and lacks the goal of universality.

In this work, we introduce a set of techniques, together with extensive experimental evaluations, demonstrating the practicality of universal beamforming for wireless receivers using Deep Learning. Our approach and system, denoted DEFORM, is agnostic to the specifics of transmitted signals such as modulation, bandwidth, or standard. DEFORM is designed around a deep convolutional neural network (CNN) augmented with a Maximum Ratio Combiner (MRC). It is specifically designed to address the unique features of complex-valued wireless signal samples, such as the ambiguous $2\pi$ phase discontinuity. Also, RF links targeting low Bit Error Rates (e.g., below $10^{-4}$) are sensitive to the typical variations and outliers in the continuous-valued estimations of neural networks [3]. DEFORM addresses these challenges, successfully achieving the optimal beamforming gain. Our contributions can be summarized as follows:

• A design, model, and algorithm for a novel Deep Learning-based universal receiver beamformer (DEFORM). To the best of our knowledge, our work is the first in the literature that leverages DL to enable multi-antenna receivers to beamform RF signals irrespective of modulations or bandwidths (and without requiring any explicit mechanisms such as sounding).
• A two-antenna SDR [1] receiver prototype leveraging DEFORM that supports arbitrary and unseen modulations and bandwidths. DEFORM is trained on a fixed, basic set of RF settings (BPSK, 1 MHz bandwidth, in a cable environment) and bolstered with an efficient RF dataset collection process with phase augmentation.
• DEFORM is extensively evaluated (B/Q/8-PSK, 16-QAM, GMSK with different bandwidths) in a cabled setup with emulated fading channel effects. DEFORM achieved the optimal 3 dB gain of a two-antenna receiver.

Figure 1: DEFORM's beamforming workflow. (Blocks: Transmitter; channels $h_1$, $h_2$; Receiver; Phase estimation, comprising CNN Estimation (Section 3.1), Double-output selection (Section 3.2), and Stabilizing optimization (Section 3.3); Amplitude estimation (Section 2.2); Beamforming weights calculation; Combiner; Data Decoder.)

2 PROBLEM AND APPROACH

Receiver (RX) beamforming aims to optimally combine the received signals from multiple antennas to maximize the Signal-to-Noise Ratio (SNR). In this section, we identify and formulate the RX beamforming problem and present the key ideas of DEFORM.

2.1 Theory and Challenge

Consider a transmitter and a receiver communicating through a slow fading channel. The receiver has $N$ antenna elements, where we model the signal $R_i$ received by antenna $i$ as the transmitted signal $S$ adjusted by the channel gain $h_i$ plus the additive Gaussian noise $N_i$:

$$R_i = h_i S + N_i = s_i + N_i \tag{1}$$

Receiver beamforming aims to leverage the diversity of the independent wireless channels between the transmitting and receiving antennas by combining the receiving branches with adequate complex beamforming weights:

$$R_\Sigma = \sum_{i=1}^{N} a_i R_i = \sum_{i=1}^{N} \left(a_i s_i + a_i N_i\right) \tag{2}$$

The beamforming weights $a_i$ are chosen to maximize the combining Signal-to-Noise Ratio (SNR), which is given by:

$$SNR_\Sigma = \frac{\left(\sum_{i=1}^{N} a_i s_i\right)^2}{N_0 \sum_{i=1}^{N} |a_i|^2} \tag{3}$$

where we assume the noise in each branch is independent and has the same Power Spectral Density (PSD) $N_0$. The maximization of $SNR_\Sigma$ is solved using the Cauchy-Schwarz inequality [7] and yields the optimal weights:

$$\hat{a}_i = \frac{s_i^*}{\sum_{j=1}^{N} |s_j|} \quad \forall i \in 1, ..., N \tag{4}$$

where the denominator is the scaling factor for the weights. Substituting into Equation (3), we now have the combining SNR:

$$SNR_\Sigma = \frac{\sum_{i=1}^{N} s_i^2}{N_0} = \sum_{i=1}^{N} SNR_i \tag{5}$$

The optimal combining SNR is the sum of the SNRs from all receiving branches. In the best-case scenario, when all branches have the same SNR, this beamforming technique (also known as Maximal-Ratio Combining [7]) can achieve a final SNR of $N$ times the SNR acquired from a single branch. For instance, with a two-antenna receiver we gain twice the SNR (corresponding to a 3 dB gain).

The main challenge in achieving the optimal combiner is how to estimate the optimal weight $\hat{a}_i$ for each receiving branch $i$. In Equation (4), $\hat{a}_i$ depends on $s_i$ ($s_i = h_i S$), which is unknown to the receiver. Conventional techniques, which are still extensively used in OFDM or MIMO systems [26], have to insert mutually known data such as a training sequence or pilot symbols into the transmitted signals, which causes significant communication overhead. Meanwhile, phased-array systems [27] do not require channel estimation, but are especially inaccurate against multi-path effects, which add up the channel gains from multiple paths and distort the channel characteristics of the direct path.

2.2 Approach

We design the receiver beamforming system with the goal of estimating the optimal beamforming weights $\hat{a}_i$ accurately. We first reformulate $\hat{a}_i$ from Equation (4) in the polar representation:

$$\hat{a}_i = \frac{|s_i|\, e^{-j\theta_i}}{\sum_{j=1}^{N} |s_j|} \quad \forall i \in 1, ..., N \tag{6}$$

To find $\hat{a}_i$, we need to estimate the amplitude $A_i = \frac{|s_i|}{\sum_{j=1}^{N} |s_j|}$ and the phase $\theta_i$. First, we present a simple approach to estimate the amplitude. Out of $N$ receiving branches, we pick an arbitrary branch $k$, and approximate $A_i$ for every branch $i$ with the transformation:

$$A_i = \frac{\frac{|s_i|}{|s_k|}}{\sum_{j=1}^{N} \frac{|s_j|}{|s_k|}} \approx \frac{\frac{|R_i|}{|R_k|}}{\sum_{j=1}^{N} \frac{|R_j|}{|R_k|}} \tag{7}$$

where the ratio $\frac{|s_i|}{|s_k|}$ is approximated by the amplitude ratio of the received signals $\frac{|R_i|}{|R_k|}$ for every $i \neq k$ (if $i = k$, the ratio is 1). It is easy to see that the approximation is correct when $|s_i| \approx |s_k|\ \forall i \in 1, ..., N$. Nonetheless, as $|s_i|$ gets significantly bigger than $|s_k|$, the approximation error increases. The worst-case scenario is when $|s_i| \gg |s_k|$ and, at the same time, the signal power $|s_k|^2$ is close to the noise PSD $N_0$. However, we can minimize the approximation error by calculating the average of $A_i$ over multiple RF samples, instead of only using a single sample. As shown by the numerical analysis in Figure 2, the approximation error decreases as we increase the number of samples used for the approximation, regardless of the number of antennas in the system. When 128 samples are used for the estimation, we can reduce the error to less than 5% even at a very low SNR (i.e., about 3 dB in Figure 2). Moreover, many wireless communication systems typically require a sufficiently high SNR to operate (e.g., over 20 dB for Wi-Fi [5]). As we will discuss in later sections, an SNR of 10 dB is the minimum requirement for the RX to achieve a target Bit Error Rate of less than $10^{-4}$.

Figure 2: Error analysis of the approximation of $A_i$ (Equation (7)) with regard to different SNR levels, number of samples, and number of antennas. Each data point is acquired by 10,000 Monte Carlo simulations in the AWGN channel. (Panels: (a) Two antennas, (b) Four antennas. Axes: SNR (dB) vs. Approximation Error Ratio. Curves: 1, 2, 8, 32, and 128 samples.)

Estimating the phase $\theta_i$ is more challenging, especially in the absence of explicit information from the transmitter (e.g., a reference signal or sounding mechanism). This is due to the effect of multi-path propagation, in which the constructive and destructive phase combining of multiple copies of the signal traversing the space is typically unpredictable. At the same time, estimating the phase is a critical requirement for the optimal beamforming weights to make the receiving branches co-phased in the combined signal. Without this, the branch signals will not add up coherently and the combined signal will experience even further fading (similar to the behavior of multi-path) [7]. To address this, we present a new approach to achieve co-phasing and optimal beamforming weights. Instead of finding the absolute signal phase $\theta_i$, we estimate the relative signal phase $\Delta\theta_i = \theta_i - \theta_k$ between the current branch $i$ and a pre-selected arbitrary branch $k$, resulting in the new weight:

$$\bar{a}_i = A_i\, e^{-j\Delta\theta_i} \tag{8}$$

which makes the received signal from branch $i$ co-phased with the signal from branch $k$. As a result, all branch signals are co-phased in the combiner and we achieve the optimal SNR gain.

To estimate the relative phase, we need to separate the signal component from the noise component. Phased-array systems [27] are known for their capability to disentangle different components in the received signals. However, they typically perform poorly in the presence of multi-path fading [23]. To address this problem, we propose a novel Deep Learning-based universal RX beamforming system (DEFORM) that centers around an efficient, powerful Convolutional Neural Network (CNN). Inspired by the capability of CNNs to filter and extract relevant low-level features from data in various areas such as text [13], RF [21], vision [25], and speech [2], we develop a CNN model that precisely estimates the relative phase directly from the branch signals. The CNN is complemented with optimization techniques that address the unique features of complex-valued wireless signal samples, such as the misleading $2\pi$ phase discontinuity and the link Bit Error Rate sensitivity, bolstering DEFORM's universality. DEFORM is implemented for a two-antenna receiver and extensively evaluated in various RF settings of modulations and bandwidths to validate its universality in achieving the optimal beamforming gain. Here, we emphasize that existing phased-array-based systems addressing multi-path typically require more antennas [31] or multiple types of sensing hardware [15]. Furthermore, they are limited by assumptions on the communication channels, and are therefore not universal.

The operation workflow of DEFORM is depicted in Figure 1, where the branch signals are combined using the optimal beamforming weights, resulting in the output signal which is sent to the decoder to decode the data. The design and optimization of the most important module, phase estimation, is presented in detail in Section 3.

3 PHASE ESTIMATION FOR UNIVERSAL RX BEAMFORMING

In this section, we present the design of DEFORM's phase estimation module, which leverages a CNN to accurately estimate the relative phases (which is critical for DEFORM to calculate the beamforming weights), supported by optimization techniques specially designed to address the unique features of complex-valued wireless signal samples. We considered several neural network architectures and chose a CNN because it is very powerful at extracting relevant low-level features embedded in data of various types [2, 13, 21, 25]. It is natural to leverage such capabilities to disentangle the signal components and provide accurate phase estimations.

Figure 3: The CNN structure for DEFORM with rotational double-output feature. (Layers: Input; three Conv. 3x3 layers with ReLU; one Conv. 2x1 layer with ReLU; FC layer with linear activation; outputs $E_1 \in [-\pi, \pi]$ and $E_2 \in [0, 2\pi]$.)

3.1 Neural Network Design

Goals. We define three goals for the design of the CNN model. Firstly, the model should be able to process the continuous stream of I/Q data efficiently. Feeding a very long stream of samples to the CNN would significantly increase the size of the network as well as the computation cost. Secondly, the model should output the real-valued relative phase with the smallest possible estimation error. Finally, the network estimation needs to be fast and computationally efficient. We note that there is a natural trade-off between the second and third goals; therefore, we aim to find the model that best balances accuracy and speed throughout the training and validation processes.

CNN Architecture.
To reduce the computational cost of the network, a long stream of RF samples is divided into equal chunks of $M$ samples ($M = 128$ in our implementation). As complex RF samples are composed of In-phase and Quadrature components, it is intuitive to view a block of $M$ samples as a matrix of size $2 \times M$ where each entry is an In-phase (1st row) or Quadrature (2nd row) value. The matrices of $N$ antennas ($N = 2$ in our current system) are stacked along the third dimension, forming a $2 \times M \times N$ tensor of real-valued elements for each input. After a network structure search, we converged on an optimized structure that achieves good performance in both processing speed and estimation correctness. We find that as the network gets deeper, the estimation error generally decreases. However, after reaching a certain depth, the improvement becomes less significant. As one of our goals is estimation efficiency, we choose the fastest model that can achieve performance comparable to deeper models. We justify our selection by comparing it with popular DL architectures (discussed in Section 4.1). The structure of our neural network is shown in Figure 3. The first layers are three convolutional layers with a kernel size of $3 \times 3$, followed by one $2 \times 1$ convolutional layer and the fully connected output layer. $3 \times 3$ convolutional layers have been widely used in state-of-the-art DL architectures [9, 25] due to their capability of extracting low-level features appearing in local regions of the input data. The $2 \times 1$ convolutional layer, on the other hand, aims to provide high-level semantics of angular distance through the sample-wise combining of the I/Q channels. The fully connected layer synthesizes the output from the previous layers and makes the prediction.
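The input layout described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the function names `split_stream` and `chunk_to_tensor` are hypothetical, and nested lists stand in for whatever tensor type a real implementation would use. The index order `tensor[row][sample][antenna]` is one reading of the $2 \times M \times N$ shape, with row 0 holding In-phase values and row 1 holding Quadrature values.

```python
from typing import List, Sequence


def split_stream(stream: Sequence[complex], m: int = 128) -> List[Sequence[complex]]:
    """Cut a long I/Q stream into consecutive, non-overlapping chunks of m samples.

    A trailing partial chunk (fewer than m samples) is dropped.
    """
    return [stream[i:i + m] for i in range(0, len(stream) - m + 1, m)]


def chunk_to_tensor(branches: Sequence[Sequence[complex]], m: int = 128) -> List[List[List[float]]]:
    """Arrange one chunk per antenna branch as a 2 x m x n real-valued tensor.

    tensor[0][t][a] is the In-phase part of sample t on antenna a,
    tensor[1][t][a] is the Quadrature part of the same sample.
    """
    n = len(branches)
    if any(len(b) < m for b in branches):
        raise ValueError("each branch needs at least m samples")
    tensor = [[[0.0] * n for _ in range(m)] for _ in range(2)]
    for a, samples in enumerate(branches):
        for t in range(m):
            tensor[0][t][a] = samples[t].real  # In-phase row
            tensor[1][t][a] = samples[t].imag  # Quadrature row
    return tensor


if __name__ == "__main__":
    # Two synthetic antenna streams of 256 complex samples each (illustrative data only).
    ant1 = [complex(t, -t) for t in range(256)]
    ant2 = [complex(2 * t, 0.5) for t in range(256)]
    chunks1, chunks2 = split_stream(ant1), split_stream(ant2)
    x = chunk_to_tensor([chunks1[0], chunks2[0]])
    print(len(x), len(x[0]), len(x[0][0]))  # dimensions of the tensor
```

With $M = 128$ and $N = 2$, each tensor holds 256 complex samples in total, which matches the per-tensor sample count given for the dataset in Section 4.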
Rectified Linear Unit (ReLU) activation is used for the convolutional layers because it is computationally efficient and more effective against the vanishing gradient problem [6], while linear activation is applied to the output layer. We utilize Batch Normalization [12] for all convolutional layers to improve the training convergence and eliminate the needs for regularization. To avoid overfitting, we use Learning Rate Decay [33] which lowers the learning rate if the validation error remains unimproved for a period of time (e.g., a few epochs). Algorithm 1: Double-output selection Data: 𝐸 1, 𝐸 2, 𝑆𝑇 𝐷 1, 𝑆𝑇 𝐷 2, 𝜖 Result: 𝐸𝑐𝑢𝑟 if 𝜋 − 𝜖 < 𝐸 2 < 𝜋 + 𝜖 then 𝑆𝐸𝐿𝐸𝐶𝑇2 ← 𝑇 𝑟𝑢𝑒 else 𝑆𝐸𝐿𝐸𝐶𝑇2 ← 𝐹𝑎𝑙𝑠𝑒 end if if −𝜖 < 𝐸 1 < 𝜖 then 𝑆𝐸𝐿𝐸𝐶𝑇1 ← 𝑇 𝑟𝑢𝑒 else 𝑆𝐸𝐿𝐸𝐶𝑇1 ← 𝐹𝑎𝑙𝑠𝑒 end if Convert 𝐸 2 to [−𝜋, +𝜋] by Equation (9) if 𝑆𝐸𝐿𝐸𝐶𝑇2 = 𝑇 𝑟𝑢𝑒 and 𝑆𝐸𝐿𝐸𝐶𝑇1 = 𝐹𝑎𝑙𝑠𝑒 then 𝐸𝑐𝑢𝑟 ← 𝐸 2 else if 𝑆𝐸𝐿𝐸𝐶𝑇2 = 𝐹𝑎𝑙𝑠𝑒 and 𝑆𝐸𝐿𝐸𝐶𝑇1 = 𝑇 𝑟𝑢𝑒 then 𝐸𝑐𝑢𝑟 ← 𝐸 1 else if 𝑆𝐸𝐿𝐸𝐶𝑇2 = 𝑇 𝑟𝑢𝑒 and 𝑆𝐸𝐿𝐸𝐶𝑇1 = 𝑇 𝑟𝑢𝑒 then if 𝑆𝑇 𝐷 1 < 𝑆𝑇 𝐷 2 then 𝐸𝑐𝑢𝑟 ← 𝐸 1 else 𝐸𝑐𝑢𝑟 ← 𝐸 2 end if else 𝐸𝑐𝑢𝑟 ← (𝐸 1 + 𝐸 2 )/2 end if 3.2 Rotational Double-Output Theoretically, the neural network is required to provide one estimation for one relative phase between two receiving branches. Nonetheless, while investigating various models, we find that the phase estimation exhibits abrupt variations as the relative phase gets very close to the boundaries of phase values (i.e. upper and lower bounds of the ranges [−𝜋, 𝜋] or [0, 2𝜋]). More specifically, the behavior is described as follows: 𝑆𝐸𝐿𝐸𝐶𝑇2 to imply the correctness of the respective outputs 𝐸 1 and 𝐸 2 . If both indicators are 𝑇 𝑟𝑢𝑒, the algorithm selects the predictions with a smaller Standard Deviation (STD) in a history windows of 𝐾 = 10 elements. In the last case, the average value of two outputs is taken. 
We note that eventually, 𝐸 2 should be converted to [−𝜋, 𝜋] to avoid inconsistencies of the selections: ( 𝐸 2 − 2𝜋 𝐸 2 > 𝜋 𝐸2 = (9) 𝐸2 else • If the network estimates the phase in [−𝜋, 𝜋], then the estimation abruptly fluctuates when the true value is either in [−𝜋, −(𝜋 − 𝜖)] or [𝜋 − 𝜖, 𝜋]. • If the network estimates the phase in [0, 2𝜋], then the estimation abruptly fluctuates when the true value is either in [0, 𝜖] or [2𝜋 − 𝜖, 2𝜋]. where 𝐸 2 is the mean of 𝐾 = 10 last predictions in [0, 2𝜋]. With this feature, our CNN is trained to minimize the modified Mean Squared Error: where 𝜖 ≈ 0.2𝜋 based on our investigation during the validation process. We believe that such behavior is related to the discontinuity of the phase when it travels beyond the boundaries, for example, above 𝜋 and below −𝜋 for [−𝜋, 𝜋]. Because of the rotational characteristics (i.e., 2𝜋 + 𝜃 = 𝜃 mod 2𝜋), the phase will be shifted backward by an angle of 2𝜋. This confuses the estimation model because those values are far apart (by a distance of 2𝜋) in the numerical axis. To overcome this problem, we enhance the CNN model with what we call a rotational double-output feature, which incorporates two estimation outputs 𝐸 1 , 𝐸 2 (as depicted in Figure 3) for the relative phase converted in [−𝜋, 𝜋] and [0, 2𝜋], respectively. We note that the two estimations do not experience abrupt variations simultaneously. Therefore, if we know which output is experiencing errors and not usable, we can select the other output. The double-output selection is described in Algorithm 1, where we define 𝑆𝐸𝐿𝐸𝐶𝑇1 and L𝜃 = (Δ𝜃 1 − 𝐸 1 ) 2 + (Δ𝜃 2 − 𝐸 2 ) 2 (10) where Δ𝜃 1 and Δ𝜃 2 are the conversions of the true relative phase Δ𝜃 in [−𝜋, 𝜋] and [0, 2𝜋], respectively. 3.3 Stabilizing The Estimations When DEFORM system directly uses the continuous-valued phase estimations to compute the beamforming weights, it becomes susceptible to variations and outliers as typically seen in neural network models [8]. 
Meanwhile, practical wireless communication systems and standards require not only high precision, but also stability in the estimations as they target a Bit Error Rate in the orders of 10 −4 for a proper operation. Short-term changes on some samples have significant and lasting impacts on the whole packet decoding and easily aggravate the Bit Error Rate. To address this, we propose two different methods for stabilizing the phase estimations: 168 Universal Beamforming: A Deep RFML Approach Data Generation Recording TX & RX samples b1b2b3…bN Random data ... ... Modulate Data Augmentation MSWiM ’22, October 24–28, 2022, Montreal, QC, Canada when the number of samples is limited but higher computation power is available (e.g., offline processing), we can instead stabilize the prediction by processing the same samples multiple times (thus multi-trials). The algorithm is described in Algorithm 2. To avoid repeatedly getting the same estimation, in each trial, we augment the RF samples by artifically adjusting the phases with a random 𝜃𝑟𝑎𝑛𝑑 . We note that after being processed by the CNN, the current estimation 𝐸𝑐𝑢𝑟 needs to be re-adjusted to account for the prior augmentation. To deal with possible drastic changes, we categorizes the estimations across 𝑁 trials into two clusters based on 𝑇 and 𝐸𝑇 for their values: We maintain the average estimations 𝐸𝐶 𝐶2 1 the clusters at time period 𝑇 . An estimation 𝐸𝑐𝑢𝑟 for any trial is 𝑇 | < 𝛼 where 𝛼 = 1.5𝜋 is categorized into cluster 𝐶 1 if |𝐸𝑐𝑢𝑟 − 𝐸𝐶 1 the categorization threshold, otherwise cluster 𝐶 2 . Finally, after 𝑁 trials, we count and compare the number of elements in each group, and choose the average estimation where the corresponding cluster has more elements, as the final estimation for current period. 
Automatic Labeling Antenna 1 samples Antenna 2 samples Cross-correlation with TX samples 𝛉1 𝛉2 Im 𝛉 Re Complex samples Phaseshifting RX samples Δ𝛉 = 𝛉2 - 𝛉1 Figure 4: The procedure of building dataset for training the DL model of DEFORM. Temporal smoothing. Because wireless channels typically change with at a much slower rate than the incoming rate of RF samples (128 RF samples collected at 1Msamp/s is only 0.128 ms long), we can stabilize the prediction at a given instant by combining with estimations from the recent past. We stabilize the estimation and improve the robustness of the RX beamforming by using the exponential smoothing function: 𝐸𝑇 = 𝐸𝑐𝑢𝑟 𝜆 + 𝐸𝑇 −1 (1 − 𝜆) 3.4 Dataset Collection It is widely known that adequate, curated training data is critical to any Deep Learning approaches. For supervised learning, data is required to have sufficient high-quality labels. Unfortunately, data labeling is typically a manual process done by humans, requires domain knowledge, is slow, arduous, and can be very costly. Furthermore, open datasets of real RF emissions for RX multi-antenna beamforming remain absent. To address this, we devise an efficient multi-stage approach towards building a sufficiently large dataset for training our Deep Learning model, as depicted in Figure 4. Our setup comprises a single-antenna transmitter (TX) and a multi-antenna receiver (RX). As a first step, we generate random complex samples and save them in the memory of both the TX and RX. Then, the TX transmits the saved samples and the RX collects the samples from receiving branches and saves to files. Because of the channel effects, the received samples will experience unknown phase shifts. To determine these phase shifts, we first chunk the received samples, which we subsequently cross-correlate with the transmitted samples already saved in the RX. The phase shift is calculated as the argument of the peak in the correlation output. 
The labels (relative phase Δ𝜃 ) are obtained by taking the difference between the two phase shifts. When the channels are static, the acquired labels will have very little variance. This would negatively impact the training and bias the DL model towards a small range of values. To address this, and improve the diversity of the dataset, we employ a simple data augmentation technique. For each chunk of RF samples, we randomly shift the phases by a value in [−𝜋, 𝜋] and adjust the label accordingly. We emphasize that while this process is quite efficient, it is not necessary for DEFORM to repeat this process for each type of data (modulation or bandwidth). As we will show in later sections, our deep beamforming system is agnostic to bandwidths and modulations. Thanks to its universality, DEFORM can be quickly deployed without a prior knowledge about the RF signal and channel parameters. (11) where the final phase estimation at the current time period 𝑇 is computed using the estimation from previous period 𝑇 − 1 and the current CNN output estimation acquired from Algorithm 1. Parameter 𝜆 controls the smoothness of the result, and is chosen with the best value 𝜆 = 0.2 through the validation process. It should be noted that if the offset between 𝐸𝑇 −1 and 𝐸𝑐𝑢𝑟 is significant (exceeding a certain threshold 𝛼, i.e. 𝛼 = 1.5𝜋, we should select the instantaneous estimation 𝐸𝑐𝑢𝑟 instead. This behavior is typically seen when the channel changes from being vacant to being occupied by the transmitter. In this case, a drastic change of the phase estimation indicates the beginning of a packet that we should account for. 
Algorithm 2: Multi-trial averaging Data: 𝑁 , RF samples in period 𝑇 Result: 𝐸𝑇 repeat N times Artificially adjust the phases with a random 𝜃𝑟𝑎𝑛𝑑 ; Compute instantaneous 𝐸𝑐𝑢𝑟 using Algorithm 1; Update 𝐸𝑐𝑢𝑟 to original value with 𝜃𝑟𝑎𝑛𝑑 ; Categorize 𝐸𝑐𝑢𝑟 into two clusters 𝐶 1, 𝐶 2 ; end Count cluster elements 𝐶𝑂𝑈 𝑁𝑇𝐶1 , 𝐶𝑂𝑈 𝑁𝑇𝐶2 ; 𝑇 , 𝐸𝑇 ; Calculate average estimations 𝐸𝐶 𝐶2 1 if 𝐶𝑂𝑈 𝑁𝑇𝐶1 > 𝐶𝑂𝑈 𝑁𝑇𝐶2 then 𝑇 𝐸𝑇 ← 𝐸𝐶 1 else 𝑇 𝐸𝑇 ← 𝐸𝐶 2 end if 4 Multi-trial averaging. Temporal smoothing requires multiple chunks of RF samples to achieve a stabilized prediction. In cases EVALUATION We validate the universality of DEFORM for different modulations and bandwidths under fading wireless channels. To train the DL 169 15.52 3.28 3 10.19 2 1 0 Estimation error Forward time 18 12 6 0.26 0.2 ResNet18 VGG16 0.26 1.26 0.3 Models MR-CNN DEFORM Forward time (ms) Estimation error 4 Hai N. Nguyen & Guevara Noubir Relative Phase (rad) MSWiM ’22, October 24–28, 2022, Montreal, QC, Canada 0 −40 −50 −50 −60 −70 −0.5 0.0 0.5 Baseband frequency (MHz) 1.0 −80 −1.0 0 1 2 3 Data Index 4 5 1e6 Figure 7: Emulated fading pattern for over-the-cables evaluation. The initial relative phase is randomly chosen within [−𝜋, 𝜋], then slowly changing throughout the experiment for a total amount of 2𝜋 radian. −60 −70 −80 0.0 −3.0 PSD (dB/Hz) −60 −90 −1.0 −40 PSD (dB/Hz) PSD (dB/Hz) −50 1.5 −1.5 Figure 5: Comparison of the DL models −40 3.0 −70 −0.5 0.0 0.5 Baseband frequency (MHz) 1.0 −80 −1.0 −0.5 0.0 0.5 Baseband frequency (MHz) 1.0 with a test error of 12.6 times lower (0.26 compared to 3.28). Moreover, DEFORM’s CNN is 8 times faster than VGG16 and 12 times faster than VGG16 while having comparable estimation error (equal to ResNet18, and only 0.06 lower compared to VGG16). Compared with these two models, DEFORM is lightweight, more computationally efficient, and can quickly estimate phases with high precision, which makes it more suitable for real-time and embedded systems. 
Figure 6: Power Spectral Density plot shows the center frequency offsets in the received wideband signals. It is noted that the y-axis scale is relative. model, we use the techniques described in Section 3.4 and collect a dataset of over 167 million complex RF samples transformed into 654, 553 real-valued tensors of size 2 × 128 × 2, each corresponds to a total of 256 samples collected by the two RX antennas. For this dataset, we use an Ettus USRP B210 software-defined radio (SDR) with the TX implemented using GNURadio [1] to transmit BPSK signals (SNR ranging from 0 to 35 dB) to the RX through two identical coaxial cables with a fixed TX bandwidth of 1 MHz. The RX bandwidth is maintained at 1 MHz for the whole dataset. The dataset is split into training, validation, and test sets with ratio 0.64 : 0.16 : 0.2, respectively. While being trained with a fixed set of RF settings, DEFORM still performs very well on other unseen and more sophisticated settings of modulations, bandwidths, and channels. To the best of our knowledge, our work is the first universal RX beamforming system in the literature that is designed using Convolutional Neural Network (CNN) and extensively evaluated for practical, universal RF beamforming capabilities. 4.2 Over-the-cables Evaluation We evaluate DEFORM’s performance in a relatively idealistic environment where the RF signals propagate through coaxial cables, where multi-path and other fading effects are absent. Data packets are sent from the TX to the two analog inputs of RX (devices are Ettus USRP B210 with SDR) through a pair of identical cables (So the received signals have similar SNRs). TX signal is modulated using differential BPSK, QPSK, 8-PSK, GMSK and 16-QAM techniques, and with a fixed TX bandwidth of 1 MHz and center frequency of 795 MHz. We assess DEFORM’s wideband capability by using different values of RX bandwidths, with a random shift of RX center frequency when the bandwidth is larger than 1 MHz (Figure 6). 
Evaluation for Model Optimizations. To highlight the impact of the optimization techniques, we measure and analyze the received Bit Error Rate (BER) in five cases: (1) beamforming is turned off and one of the two received signals is selected for decoding (note that in our over-the-cables experiments, the received signals have similar SNRs); (2) beamforming is used with the single-output CNN (BF-SO); (3) beamforming is used with the rotational double-output CNN (BF-DO); (4) DEFORM is enabled with the double-output CNN and temporal smoothing (DEFORM-TS); and (5) DEFORM is enabled with the double-output CNN and multi-trial averaging (DEFORM-MT). To calculate BER, we record TX and RX signals at the SDRs and transfer them to the host computer to compare and count the bit errors. Figure 8 illustrates the evaluation results, where we compare the approaches with BPSK transmissions. For each scenario, RX signal features (relative phase, bandwidth) are artificially adjusted to show the effects of model optimizations. For Figure 8a, because Δθ is far from the phase boundaries, all beamforming settings, including the baseline single-output CNN, achieve a 3 dB gain compared to non-beamforming. When Δφ = 9π/10, which is very close to π (where the estimate of the single-output CNN, which estimates the phase in [−π, π], experiences abrupt variations; Figure 8b), the gain of BF-SO decreases to less than 1 dB.

4.1 DL Model Comparison

To validate our neural network design and highlight the benefits of our model for the specific task of phase estimation, we evaluate and compare the CNN model with three popular CNN architectures: VGG16 [25], ResNet18 [9], and MR-CNN [21], using the test set of our beamforming dataset. All models are implemented using the PyTorch library [22], then trained and tested on an NVIDIA GeForce GTX 1080 GPU using the Adam optimizer [14] and the ReduceLROnPlateau learning rate decay scheduler [33] with an initial learning rate lr = 0.005. The evaluation metrics are estimation error and network forward time.
The Mean Square Error loss function (Equation (10)) is used to calculate the estimation error. We use torch.cuda.synchronize from PyTorch to synchronize CUDA operations before and after the network propagation function, so that the elapsed time of that function is measured accurately. Figure 5 compares the estimation error of the models on the test set and the network forward time. It is clear that, in terms of estimation correctness, DEFORM outperforms MR-CNN.

Figure 8 panels (logBER versus SNR; curves: No BF, BF-SO, BF-DO, DEFORM-MT, DEFORM-TS): (a) Δφ = π/10, 1 MHz bandwidth; (b) Δφ = 9π/10, 1 MHz bandwidth; (c) Δφ = π/2, 6 MHz bandwidth.
Figure 8: BER comparison between non-beamforming, and beamforming with the baseline CNNs and DEFORM’s optimizations.

When Δφ is very close to π (Figure 8b), BF-DO and the further optimization settings maintain the 3 dB gain. When we increase the RX bandwidth to 6 MHz (Figure 8c), DEFORM-TS and DEFORM-MT outperform the baseline CNNs with 3 dB and 2 dB gains compared to non-beamforming, respectively. This demonstrates the effectiveness of DEFORM’s optimizations in addressing the increased variations and outliers that wider-bandwidth communication settings introduce into the baseline CNNs' estimates.

When multiple users transmit simultaneously on the same frequency band, the resulting interference is significant and damages the phase structure in the captured RF samples. In this case, the prediction could be accurate for only one user, or might not work for any user. It would be interesting to investigate the CNN’s capability for multi-band spectrum and collision analysis in future work.
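To see why phase estimation errors translate into lost beamforming gain, the following sketch evaluates an idealized two-branch model. It is not taken from the paper and rests on stated assumptions: both branches carry the same signal amplitude and independent noise of equal power (roughly the over-the-cables situation, where the received SNRs are similar), so Maximum Ratio Combining reduces to co-phasing and summing. With a residual phase error ε, the SNR gain over a single branch is then (|1 + e^{jε}|^2)/2 = 1 + cos ε. The function name is hypothetical.

```python
import cmath
import math


def combining_gain_db(phase_error):
    """SNR gain in dB of two co-phased, summed branches over one branch.

    Idealized model: equal signal amplitudes, independent equal-power noise.
    phase_error is the residual relative-phase error in radians.
    """
    signal_power = abs(1 + cmath.exp(1j * phase_error)) ** 2  # vs. 1 per branch
    noise_power = 2.0  # independent noise terms add in power
    gain = signal_power / noise_power  # equals 1 + cos(phase_error)
    return 10 * math.log10(gain) if gain > 0 else float("-inf")
```

In this model a perfect estimate gives 10 log10(2), about 3 dB, while a residual error of π/2 removes the gain entirely. This is consistent with the observation above that abrupt estimation variations near the phase boundary reduce the gain of BF-SO.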
Theoretical aspects of beamforming have been investigated in the literature [4, 19, 28, 30, 34], including some efforts to build so-called blind beamformers [4, 10, 34], which estimate the channel phase offsets without explicit knowledge from the transmitter. The practicality of those methods and systematic deployment guidelines, however, are still open questions. Popular wireless communication technologies still use informed beamformer approaches that estimate the channel using information from the transmitter, such as the pilot sequence [35] in IEEE 802.11 or the reference signal [17] in 5G radios. As discussed, this approach typically requires significant communication overhead and is limited to specific TX-RX setups and communication techniques. Advances in Machine Learning (ML) and Deep Learning (DL) have emerged as a solution for critical problems in various areas, including wireless communications. In recent years, ML and DL approaches have been extensively used in tasks such as modulation recognition [21], RF technology identification [20], and wireless localization [29]. There are also some efforts to enable ML-driven beamforming. For example, in [32], a Deep Neural Network (DNN) is developed for OFDM channel estimation. In [11], a DNN is also used for downlink MIMO beamforming. Nonetheless, those works are limited by simulated data and specific assumptions about the channel model. Furthermore, they lack an explanation and evaluation of the impact of various RF characteristics, i.e., link modulations and bandwidths. Compared to those, our work is unique in two main aspects. First, our CNN-based DL model is trained with real RF data acquired by an efficient dataset collection process. Second, our system is agnostic to different RF settings of modulations and bandwidths. Despite being trained on fixed, basic RF settings, we can still achieve the optimal beamforming gain in complex, unseen settings.
Hence, DEFORM can be used as a universal RX beamforming module for existing multi-antenna RF receivers. As the next step, we will apply DEFORM to RF receivers operating in realistic over-the-air environments with various wireless channel artifacts (e.g., multi-path fading).

Evaluation for Signal Features Universality. As mentioned above, we consider various settings of modulations and bandwidths to evaluate the universality of DEFORM. We emulate the slow fading effect of real wireless channels by artificially adjusting the phases of the received signals so that the relative phase Δθ follows a stair-step pattern, as illustrated in Figure 7, covering the whole 2π phase range. Figure 9 shows the Bit Error Rate (BER) analysis, where DEFORM is evaluated with both the DEFORM-TS and DEFORM-MT approaches and compared with non-beamforming. For all combinations of modulation and RX bandwidth, DEFORM-TS provides a 3 dB SNR gain compared to using a single RX branch, with the only exception of GMSK-2MHz, where the gain at BER = $10^{-5}$ is approximately 2.5 dB. Meanwhile, DEFORM-MT also achieves a 3 dB gain for 1 and 2 MHz RX bandwidths, and about 2-3 dB gain for the 4 MHz bandwidth. However, with the 6 MHz bandwidth, the SNR gain of DEFORM-MT declines to 1 dB for BPSK and QPSK at BER ≥ $10^{-4}$, and to 2 dB for 8-PSK and 16-QAM. Interestingly, it still performs as well as DEFORM-TS for GMSK at 6 MHz RX bandwidth, with both having a 3 dB gain at all BER levels.

5 DISCUSSION AND RELATED WORK

The Deep Learning-based approach of DEFORM can be extended to larger multi-antenna systems to achieve gains higher than 3 dB. For an RF receiver system of N receiving antenna elements, the deep learning architecture can be modified to have 2 × (N − 1) outputs, in which two estimations are made for each relative phase between the pre-selected antenna and one other antenna.
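The N-antenna extension described above can be illustrated with a minimal sketch. It assumes equal signal amplitude and independent, equal-power noise on every branch, so that combining reduces to rotating each branch by its estimated relative phase and summing. Under those assumptions the ideal gain is 10 log10(N) dB. All function names are hypothetical, and the sketch takes one already-resolved phase per antenna pair rather than modeling how the two network outputs per pair are merged.

```python
import cmath
import math


def num_cnn_outputs(n_antennas):
    """Two estimations per relative phase to the pre-selected antenna."""
    return 2 * (n_antennas - 1)


def cophase_and_sum(branches, relative_phases):
    """Align every branch to branches[0] and add them sample by sample.

    branches: list of N equal-length lists of complex samples.
    relative_phases: N - 1 phases (radians) of branches[1:] relative to
    branches[0], e.g. resolved from the model's estimates.
    """
    combined = list(branches[0])
    for branch, phi in zip(branches[1:], relative_phases):
        rotation = cmath.exp(-1j * phi)  # undo the relative phase
        for i, sample in enumerate(branch):
            combined[i] += sample * rotation
    return combined


def ideal_array_gain_db(n_antennas):
    """Ideal SNR gain with perfect phases and equal-SNR branches."""
    return 10 * math.log10(n_antennas)
```

For N = 2 this reduces to the two-antenna case with 2 outputs and an ideal gain of about 3 dB. For N = 4 the model would need 6 outputs and the ideal gain would be about 6 dB.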
The new requirement of such systems is a time synchronization mechanism for the RX radios, as typical wireless peripherals have a limited number of antennas (most of Ettus’s USRPs only support 2 simultaneous RX channels [24]). Our current system assumes that there is only one user in the observed spectrum at any given time. The problem becomes more challenging when multiple users are simultaneously transmitting on the same frequency band.

Figure 9 panels (logBER versus SNR; curves: DEFORM-TS, DEFORM-MT, No BF): BPSK, QPSK, 8-PSK, GMSK, and 16-QAM, each at RX bandwidths of 1, 2, 4, and 6 MHz.
Figure 9: BER analysis of over-the-cables communications for different modulation schemes and RX bandwidths. DEFORM is evaluated with Temporal Smoothing (DEFORM-TS) and Multi-Trial Averaging (DEFORM-MT).

We will also justify DEFORM’s universality with the beamforming-relay application supporting the communications of different wireless technologies such as ZigBee and LoRa.

ACKNOWLEDGMENTS

This work was partially supported by grants NAVY/N00014-20-12124, NCAE-Cyber Research Program, and NSF/DGE-1661532.

REFERENCES

[1] 2021. GNURadio. https://www.gnuradio.org
[2] O. Abdel-Hamid et al. 2014. Convolutional Neural Networks for Speech Recognition. IEEE/ACM TASLP (2014).
[3] C. M. Bishop. 2006. Pattern Recognition and Machine Learning.
[4] J.F. Cardoso et al. 1993. Blind beamforming for non-gaussian signals. IEEE Proceedings F (1993).
[5] CISCO Meraki. 2018. SNR and Wireless Signal Strength.
[6] X. Glorot et al. 2011. Deep Sparse Rectifier Neural Networks. In AISTATS’11.
[7] Andrea Goldsmith. 2005. Wireless Communications. Cambridge University Press.
[8] I. Goodfellow et al. 2016. Deep Learning.
[9] K. He et al. 2016. Deep Residual Learning for Image Recognition. In CVPR’16.
[10] I. Himawan et al. 2011. Clustered Blind Beamforming From Ad-Hoc Microphone Arrays. IEEE TASLP (2011).
[11] H. Huang et al. 2018. Unsupervised learning-based fast beamforming design for downlink MIMO. IEEE Access (2018).
[12] S. Ioffe et al. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In ICML’15.
[13] N. Kalchbrenner et al. 2014. A Convolutional Neural Network for Modelling Sentences. arXiv:1404.2188
[14] D. Kingma et al. 2017. Adam: A Method for Stochastic Optimization. arXiv:1412.6980
[15] S. Kumar et al. 2014. Accurate indoor localization with zero start-up cost. In ACM Mobicom’14.
[16] H.J. Kwon et al. 2019. Machine Learning-Based Beamforming in Two-User MISO Interference Channels. In ICAIIC’19.
[17] X. Lin et al. 2019. 5G New Radio: Unveiling the Essentials of the Next Generation Wireless Access Technology. IEEE CSM (2019).
[18] W. Liu et al. 2010. Wideband Beamforming: Concepts and Techniques.
[19] C. Masouros et al. 2015. Exploiting Known Interference as Green Signal Power for Downlink Beamforming Optimization. IEEE TSP (2015).
[20] H. N. Nguyen et al. 2021. Wideband, Real-time Spectro-Temporal RF Identification. In ACM MobiWac’21.
[21] T. O’Shea. 2016. Convolutional Radio Modulation Recognition Networks. In Engineering Applications of Neural Networks.
[22] A. Paszke et al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In NIPS’19.
[23] D. Rahamim et al. 2004. Source localization using vector sensor array in a multipath environment. IEEE TSP (2004).
[24] Ettus Research. 2021. Products. https://www.ettus.com/products/
[25] K. Simonyan and A. Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. In arXiv:1409.1556.
[26] G. Stuber et al. 2004. Broadband MIMO-OFDM wireless communications. Proc. IEEE (2004).
[27] H.L. Van Trees. 2004. Optimum Array Processing: Part IV of Detection, Estimation, and Modulation Theory. Wiley.
[28] V. Venkateswaran et al. 2010. Analog Beamforming in MIMO Communications With Phase Shift Networks and Online Channel Estimation. IEEE TSP (2010).
[29] X. Wang et al. 2016. CSI-based fingerprinting for indoor localization: A deep learning approach. IEEE TVT (2016).
[30] Q. Wu et al. 2019. Intelligent Reflecting Surface Enhanced Wireless Network via Joint Active and Passive Beamforming. IEEE TWC (2019).
[31] J. Xiong et al. 2013. ArrayTrack: A Fine-Grained Indoor Location System. In NSDI’13.
[32] H. Ye. 2018. Power of Deep Learning for Channel Estimation and Signal Detection in OFDM Systems. IEEE WCL (2018).
[33] M. Zaheer. 2018. Adaptive Methods for Nonconvex Optimization. In NIPS’18.
[34] L. Zhang et al. 2017. An Eigendecomposition-Based Approach to Blind Beamforming in a Multipath Environment. IEEE CL (2017).
[35] Z. Zhao et al. 2013. Channel Estimation Schemes for IEEE 802.11p Standard. IEEE ITSM (2013).