B11 - Mid Report
Done by
Manda Abhinay Reddy - 204242
Pammi Vijay Hemanth - 204255
Suhel Faraz Siddiqui - 204268
Supervisor
Dr. Chayan Bhar
Assistant Professor
Department of Electronics and Communication Engineering
Declaration
I declare that this written submission represents my ideas in my own words
and where others' ideas or words have been included, I have adequately cited
and referenced the original sources. I also declare that I have adhered to all
principles of academic honesty and integrity and have not misrepresented,
fabricated or falsified any idea/data/fact/source in my submission. I
understand that any violation of the above will be cause for disciplinary action
by the Institute and can also evoke penal action from the sources that have
thus not been properly cited or from whom proper permission has not been
taken when needed.
Date: 30-10-2023
Certificate
This is to certify that the dissertation work entitled “ML BASED USER-BASE
STATION ASSOCIATION FOR HIGH MOBILITY SCENARIOS” is a bonafide
record of work carried out by Manda Abhinay Reddy (204242), Pammi Vijay
Hemanth (204255), and Suhel Faraz Siddiqui (204268).
Contents
Approval Sheet ……………………………………………………………………………….2
Declaration ……………………………………………………………………………………. 3
Certificate ……………………………………………………………………………………….4
1. Introduction ……………………………………………………………………………….7
1.1 Problem Statement……………………………………………………………..8
1.2 Theory…………………………………………………………………………….…8
1.3 Applications, Merits, and Demerits……………………………….……11
2. Literature Survey….…………………………………………………………………...12
2.1 Human Mobility……………………………………………………………….12
2.2 Poisson Point Process (PPP)................................................15
2.3 Reinforcement Learning…………………………………………………….19
2.3.1 Proximal Policy Optimization…………………………………20
2.4 Path loss, Shadowing and Fading…………………………………………21
3. Workflow…………………………………………………………………………………...24
4. Work Done and Results……………………………………………………………….26
4.1 3D Contour………………………………………………………………………26
4.2 Poisson Point Process (PPP)................................................29
4.3 Base Stations………………………………………………………………………32
4.4 RL Agent…………………….…………………………………………………….33
References……………………………………………………………………………………..
List of Acronyms
Chapter 1
Introduction
1.1 Problem Statement
● User and base station distributions are modeled as Poisson point
processes (PPP), with users and base stations placed as individual points.
● A dynamic and immersive 3D contour-based environment is used to
simulate realistic scenarios.
● Real-world factors such as shadowing, signal fading, and path loss are
integrated to determine the optimal base station association.
● A model of human mobility guides the actions of the users.
1.2 Theory
(i) Supervised learning: Algorithms learn from labeled data, making
predictions or decisions based on known outcomes. Common
algorithms include linear regression, decision trees, and neural networks.
Figure 1.1(a),(b)
Figure 1.2
In reinforcement learning, at each time interval the agent receives observations
and a reward from the environment and sends an action to the environment.
Figure 1.3
A base station connects cellphones to the internet or to other cellphones
within the cell. The size of the base station depends on the size of the area
covered, the number of clients supported, and the local geography.
Chapter 2
Literature Survey
2.1 Human Mobility

In the continuous-time random walk (CTRW) model of human mobility, the
step-size distribution follows P(Δr) ∼ |Δr|^(−1−α) and the waiting-time
distribution follows P(Δt) ∼ |Δt|^(−1−β), with 0 < α ≤ 2 and 0 < β ≤ 1.
Three empirical observations indicate that human trajectories follow
reproducible scaling laws, but they also illustrate the shortcomings of the
CTRW model in capturing the observed scaling properties.
● Visitation frequency: the frequency of visits to the k-th most visited
location follows f_k ∼ k^(−ζ) with ζ ≈ 1.2 ± 0.1. This suggests that the
visitation frequency distribution follows Zipf's law.
● Ultraslow diffusion: the CTRW model predicts that the mean square
displacement (MSD) asymptotically follows ⟨Δx²(t)⟩ ∼ t^ν
with ν = 2β/α ≈ 3.1. As both P(Δr) and P(Δt) have
cutoffs, asymptotically the MSD should converge to a Brownian
behavior with ν =1. However, this convergence is too slow to be
relevant in our observational time frame. Either way, CTRW predicts
that the longer we follow a human trajectory, the further it will drift
from its initial position. Yet, humans have a tendency to return home
on a daily basis, suggesting that simple diffusive processes, which are
not recurrent in two dimensions, do not offer a suitable description of
human mobility. Measurements indicate an ultraslow diffusive process,
in which the MSD seems to follow a slower than logarithmic growth.
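To make these scaling laws concrete, the following minimal Python sketch draws
step lengths and waiting times with the power-law tails quoted above using
inverse-transform (Pareto) sampling. The exponent values and the unit lower
cut-off are illustrative assumptions, not values taken from the report.

import numpy as np

rng = np.random.default_rng(42)
alpha, beta = 0.55, 0.8   # illustrative exponents; 0 < α ≤ 2, 0 < β ≤ 1

def pareto_tail(exponent, size, x_min=1.0):
    # Samples x >= x_min with density proportional to x^-(1+exponent).
    u = rng.random(size)
    return x_min * (1.0 - u) ** (-1.0 / exponent)

step_lengths = pareto_tail(alpha, 10_000)   # Δr samples
wait_times = pareto_tail(beta, 10_000)      # Δt samples
print(np.median(step_lengths), np.median(wait_times))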
Figure 2.1
After a waiting time is chosen from the P(Δt) distribution, the individual
will change his/her location. We assume that the individual has two choices.
(i) Exploration: with probability P_new = ρS^(−γ), the individual moves to a
new location (different from the S locations he/she visited before). The
distance Δr that he/she covers during this exploratory jump is chosen from
the P(Δr) distribution, and his/her direction is selected at random. As the
individual moves to this new position, the number of previously visited
locations increases from S to S + 1.
(ii) Preferential return: with probability 1 − P_new, the individual returns to
one of the S previously visited locations. The probability Πi of returning to
location i is proportional to the number of visits the user previously had to
that location. That is, we assume that Πi = fi.
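A minimal Python sketch of this exploration / preferential-return step is given
below. The forms P_new = ρS^(−γ) and Πi = fi follow the model above, but the
numerical values of ρ, γ, and the step-length exponent α are illustrative
assumptions.

import numpy as np

rng = np.random.default_rng(0)
rho, gamma, alpha = 0.6, 0.21, 0.55   # assumed constants for illustration

visited = [np.zeros(2)]   # locations visited so far
visit_counts = [1]        # number of visits to each location
position = visited[0]

for _ in range(1000):
    S = len(visited)
    if rng.random() < rho * S ** (-gamma):
        # Exploration: jump a power-law distributed distance in a random direction.
        step = (1.0 - rng.random()) ** (-1.0 / alpha)
        angle = rng.uniform(0, 2 * np.pi)
        position = position + step * np.array([np.cos(angle), np.sin(angle)])
        visited.append(position)
        visit_counts.append(1)
    else:
        # Preferential return: revisit location i with probability proportional
        # to its past visit frequency (Πi = fi).
        probs = np.array(visit_counts) / sum(visit_counts)
        i = rng.choice(S, p=probs)
        position = visited[i]
        visit_counts[i] += 1

print("distinct locations visited:", len(visited))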
2.2 Poisson Point Process (PPP)
PPPs have been used to model co-channel interference from macro
cellular base stations, cross-tier interference from femtocells, co-channel
interference in ad hoc networks, and as a generic source of interference.
Modeling co-channel interference from other base stations in this way is a
good starting point for developing insights into heterogeneous network
interference. PPPs are used to model various components of a
telecommunications network including subscriber locations, base station
locations, as well as network infrastructure leveraging results on Voronoi
tessellation. A cellular system with PPP distribution of base stations, called a
shotgun cellular system, is shown to lower bound a hexagonal cellular
system in terms of certain performance metrics and to be a good
approximation in the presence of shadow fading.
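As a small illustration of this stochastic-geometry viewpoint, the sketch below
drops base stations as a homogeneous PPP on a square region and computes one
user's downlink signal-to-interference ratio (SIR) under nearest-station
association and a power-law path loss. The density, region size, and path-loss
exponent are assumed values.

import numpy as np

rng = np.random.default_rng(2)
area_side, density = 1000.0, 1e-5   # metres, base stations per square metre

n_bs = rng.poisson(density * area_side ** 2)        # Poisson number of stations
bs_xy = rng.uniform(0, area_side, size=(n_bs, 2))   # uniform positions given the count
user = np.array([area_side / 2, area_side / 2])

d = np.linalg.norm(bs_xy - user, axis=1)
rx = d ** -3.5                       # power-law path loss with unit transmit power
serving = int(np.argmin(d))          # nearest-station association
sir = rx[serving] / (rx.sum() - rx[serving])
print(f"{n_bs} base stations, SIR = {10 * np.log10(sir):.1f} dB")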
Figure 2.3
(a) Common fixed geometry model with hexagonal cells and multiple tiers of interference. (b)
Stochastic geometric model where all base stations are distributed according to some 2D
random process. (c) A proposed hybrid approach where there is a fixed cell of a fixed size
surrounded by base stations distributed according to some 2D random process, with an
exclusion region around the cell and a dominant interferer at the boundary of the guard region.
The distribution of the number of users per cell has been derived using
approximations for the distribution of the cell area. This approach does not
extend to path-loss models that include shadowing, where users do not
necessarily connect to the spatially nearest base station. The purpose of this
study is to examine a good model for the distribution of the number of users
per cell in a shadowing environment.
We can write the distribution of the number of users per cell, denoted as
Dist(n), in terms of the negative binomial (NB) distribution.
Note that, in the standard parameterization (number of failures before the
r-th success, with success probability p), the probability mass function (PMF)
of a negative binomial (NB) random variable T is

P(T = t) = C(t + r − 1, t) · p^r · (1 − p)^t,   t = 0, 1, 2, …
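For reference, this PMF can be evaluated directly with SciPy; the parameter
values below are illustrative and are not fitted to the cited paper's data.

from scipy.stats import nbinom

r, p = 4, 0.4   # assumed NB parameters for illustration
for t in range(5):
    # nbinom.pmf(t, r, p): probability of t failures before the r-th success
    print(t, nbinom.pmf(t, r, p))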
Figure 2.5: Negative binomial fit and simulation for s = 0 (no shadowing),
s = 5, and s = 10.
2.3 Reinforcement Learning

Reinforcement learning (RL) is a machine learning paradigm in which an agent
learns to take actions to maximize a cumulative reward over time. It draws inspiration from
behavioral psychology and is often used in applications where an agent must
learn to make a sequence of decisions, such as robotics, gaming,
recommendation systems, and autonomous vehicles. The agent is the learner
or decision-maker that interacts with the environment. It makes decisions
based on a policy to achieve a certain goal. A reward (R) is a scalar value that
the environment provides to the agent after each action. It quantifies how
good or bad the action was in the given state. The agent's objective is to
maximize the cumulative reward over time.
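The interaction described above can be sketched as a simple loop. The toy
environment, its observations, and its reward below are placeholders rather
than the project's simulator.

import random

class DummyEnv:
    # Toy environment: the agent picks one of 3 "base stations" each step.
    def reset(self):
        self.t = 0
        return 0.0   # initial observation

    def step(self, action):
        self.t += 1
        obs = random.random()                  # next observation
        reward = 1.0 if action == 1 else 0.0   # reward for the chosen action
        done = self.t >= 10
        return obs, reward, done

env = DummyEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    action = random.randrange(3)    # random policy as a stand-in for a learned one
    obs, reward, done = env.step(action)
    total += reward                 # the cumulative reward the agent tries to maximize
print("episode return:", total)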
2.3.1 Proximal Policy Optimization

Proximal policy optimization (PPO) attains some of the benefits of trust
region policy optimization (TRPO), while using only first-order optimization.
The authors propose a novel objective with clipped probability ratios, which
forms a pessimistic estimate (i.e., lower bound) of the performance of the
policy. To optimize policies, PPO alternates between sampling data from the
policy and performing several epochs of optimization on the sampled data.
The main objective proposed in the PPO paper is

L^CLIP(θ) = Ê_t[ min( r_t(θ)·Â_t , clip(r_t(θ), 1 − ε, 1 + ε)·Â_t ) ],

where r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t) is the probability ratio,
Â_t is an estimate of the advantage at time t, and epsilon (ε) is a
hyperparameter.
Figure 2.6

When the policy and value networks share parameters, the clipped surrogate is
combined with a value-function term and an entropy bonus:

L_t^(CLIP+VF+S)(θ) = Ê_t[ L_t^CLIP(θ) − c1·L_t^VF(θ) + c2·S[π_θ](s_t) ],

where c1, c2 are coefficients, S denotes an entropy bonus, and L_t^VF is a
squared-error value-function loss.
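A minimal NumPy sketch of the clipped surrogate objective is shown below, with
made-up probability ratios and advantage estimates.

import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    # L^CLIP = mean( min(r*A, clip(r, 1-eps, 1+eps)*A) )
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

ratio = np.array([0.7, 1.0, 1.4])        # π_θ(a|s) / π_θ_old(a|s)
advantage = np.array([1.0, -0.5, 2.0])   # advantage estimates Â_t
print(clipped_surrogate(ratio, advantage))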
2.4 Path Loss, Shadowing and Fading

● Path loss: Path loss refers to the reduction in signal strength as a radio
wave travels through the air or any other medium. It is a consequence of the
signal spreading out as it propagates; in free space the received power is
inversely proportional to the square of the distance between the transmitter
and receiver.
● Shadowing: Shadowing (shadow fading) is the slow, large-scale variation in
received signal strength caused by obstacles such as buildings, terrain, and
foliage in the propagation path. It is commonly modeled as a log-normal
random variable added to the path loss in dB.
● Fading: Fading refers to the rapid fluctuations in signal strength over
short time scales due to the superposition of multiple reflected and
refracted paths between the transmitter and receiver. This
phenomenon results in constructive and destructive interference,
causing signal strength to vary significantly. Fading is primarily caused
by multipath propagation, where signals take different paths and arrive
at the receiver with different phases. This can lead to constructive
(signal reinforcement) or destructive (signal cancellation)
interference. There are two main types of fading:
○ Small-Scale Fading: Occurs over a short distance and time scale
and is responsible for the fast variations in signal strength.
○ Large-Scale Fading: Occurs over a larger distance and time scale
and is often influenced by path loss and shadowing.
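A minimal sketch combining the three effects into a received-power calculation
is shown below. The log-distance path-loss model, the log-normal shadowing
standard deviation, and the Rayleigh fading assumption are illustrative
choices, not the report's exact settings.

import numpy as np

rng = np.random.default_rng(1)

def received_power_dbm(tx_dbm, d_m, exponent=3.5, pl0_db=40.0, shadow_sigma_db=6.0):
    path_loss_db = pl0_db + 10 * exponent * np.log10(d_m)   # log-distance path loss
    shadowing_db = rng.normal(0.0, shadow_sigma_db)          # log-normal shadowing
    fading_db = 10 * np.log10(rng.exponential(1.0))          # Rayleigh fading power gain
    return tx_dbm - path_loss_db + shadowing_db + fading_db

print(received_power_dbm(tx_dbm=43.0, d_m=200.0))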
Chapter 3
Workflow
3D Contour:
Creating a realistic 3D surface for the environment helps the simulation align
with real-world scenarios. A continuous terrain is generated with Perlin noise
in Python, with adjustable parameters such as scale, octaves, persistence,
lacunarity, and seed controlling the appearance of the landscape. The
resulting height map is normalized to fit within a specified height range,
ensuring it can be visualized accurately.
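A minimal sketch of this step is shown below, assuming the pnoise2 function
from the Python noise package (the parameter names above suggest this
library); the parameter values and the 0-100 height range are illustrative.

import numpy as np
from noise import pnoise2

size, scale = 128, 50.0
octaves, persistence, lacunarity, seed = 4, 0.5, 2.0, 7   # assumed example values

height = np.zeros((size, size))
for i in range(size):
    for j in range(size):
        height[i, j] = pnoise2(i / scale, j / scale,
                               octaves=octaves,
                               persistence=persistence,
                               lacunarity=lacunarity,
                               base=seed)

# Normalize the height map to a chosen range (here 0-100) for visualization.
height = (height - height.min()) / (height.max() - height.min()) * 100.0
print(height.shape, float(height.min()), float(height.max()))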
User Mobility:
User mobility is random, and modeling it is still an active research area. In
the Python code we use the approximations from the paper “Modelling the
scaling properties of human mobility”, which provide a close approximation of
human mobility. Parameters such as the waiting time, the step size, and the
probabilities of exploring a new location or returning to a previously visited
one are taken from this paper.
Base Station Association & UAV:
We write Python code in which the base stations are distributed among the
users using a Poisson point process. There can also be mobile base stations
(i.e., UAVs) whose positions change dynamically over time in the environment.
The code should return an optimum base station for a user by considering all
of the parameters.
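A minimal sketch of the association step is shown below. It uses a simple
distance-based power model as a stand-in for the full path-loss/shadowing/
fading model of Section 2.4, PPP-style fixed stations, and one UAV position;
all values are illustrative, and rx_power_dbm is a hypothetical helper.

import numpy as np

rng = np.random.default_rng(3)

def rx_power_dbm(user_xy, bs_xy, tx_dbm=43.0, exponent=3.5):
    d = np.linalg.norm(user_xy - bs_xy) + 1.0   # +1 m to avoid log(0)
    return tx_dbm - (40.0 + 10 * exponent * np.log10(d))

user = np.array([120.0, 80.0])
fixed_bs = rng.uniform(0, 500, size=(5, 2))   # PPP-style fixed station positions
uav_bs = np.array([100.0, 100.0])             # mobile (UAV) station, updated each time step
all_bs = np.vstack([fixed_bs, uav_bs])

powers = np.array([rx_power_dbm(user, bs) for bs in all_bs])
best = int(np.argmax(powers))                 # optimum base station for this user
print("associate with BS", best, "at", round(powers[best], 1), "dBm")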
RL Agent:
The goal of reinforcement learning is to train an agent to complete a task
within an uncertain environment. At each time interval, the agent receives
observations and a reward from the environment and sends an action to the
environment. We train the agent in this environment with the Proximal Policy
Optimization (PPO) algorithm to find the best base station for the user.
Chapter 4
Work Done and Results
4.1 3D Contour
To create a more realistic scenario we used the noise module in Python. The
Python code generates synthetic terrain using Perlin noise and visualizes it
from different perspectives.
● Parameters:
○ Scale: The scale parameter determines the "zoom" or granularity
of the noise. A larger scale value results in a smoother, broader
pattern, while a smaller scale value produces finer, more detailed
noise. Adjusting the scale can change the overall appearance of
the generated terrain.
4.2 Poisson Point Process (PPP)
The users and Base Stations are distributed on the contour using the
Poisson Point Process. The code generates random user locations within a
grid of cells and visualizes the user distribution.
● Calculate the expected number of users in the entire space based on the
user density. Generate a Poisson point process to determine the
number of users in each cell of the grid.
import numpy as np

# Illustrative settings (the actual values used in the report are not listed):
space_size, cell_size, lambda_value = 1000, 100, 5   # region side, cell side, mean users per cell

# Users in each grid cell, drawn independently from a Poisson distribution.
user_counts = np.random.poisson(lambda_value, (space_size // cell_size, space_size // cell_size))

● Find the total number of users

val = int(user_counts.sum())   # same as summing the counts cell by cell
print("Total Users Count from poisson distribution:", val)
4.3 Base Stations
Base Stations are also distributed using the Poisson Point Process. Here
we included a few mobile Base Stations.
● The edge weights and edge rewards are assigned randomly. We then train the
reward of each edge by updating its value according to the current edge
reward, the Euclidean distance between the nodes, and the edge weight (a
small illustrative sketch follows this list).
● After training the edge rewards, we randomly add a new node to the graph,
along with new edges between the new node and the original nodes. The new
edge weights are again assigned randomly, and the model is trained to find
the best path from source to destination, taking into account all of the
newly added paths.
● For each episode we add a random node to the graph and train for 10 epochs.
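The sketch below is one possible reading of this procedure using networkx. The
reward-update rule (pulling each edge reward toward a target derived from the
Euclidean distance and the edge weight) and all numerical values are
assumptions, since the exact formula is not stated above.

import numpy as np
import networkx as nx

rng = np.random.default_rng(5)
G = nx.complete_graph(5)
pos = {n: rng.uniform(0, 100, size=2) for n in G.nodes}
for u, v in G.edges:
    G.edges[u, v]["weight"] = rng.uniform(1, 10)
    G.edges[u, v]["reward"] = rng.uniform(0, 1)

def train_rewards(graph, lr=0.1, epochs=10):
    # Assumed update: move each edge reward toward 1 / (distance + weight).
    for _ in range(epochs):
        for u, v, data in graph.edges(data=True):
            dist = np.linalg.norm(pos[u] - pos[v])
            target = 1.0 / (dist + data["weight"])
            data["reward"] += lr * (target - data["reward"])

for episode in range(3):
    # Each episode: add a random node with random edges, then retrain for 10 epochs.
    new = max(G.nodes) + 1
    pos[new] = rng.uniform(0, 100, size=2)
    for other in rng.choice(list(G.nodes), size=2, replace=False):
        G.add_edge(new, int(other), weight=rng.uniform(1, 10), reward=rng.uniform(0, 1))
    train_rewards(G, epochs=10)

# Best path: treat (1 - reward) as a cost and take the cheapest source-to-destination path.
path = nx.shortest_path(G, source=0, target=new, weight=lambda u, v, d: 1.0 - d["reward"])
print("best path:", path)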
4.4 RL Agent:
PPO is a policy gradient algorithm. There are two main variants, PPO-Penalty
and PPO-Clip, and we use PPO-Clip to train the agent in the environment. It
usually outperforms the penalty-based variant and is simpler to implement.
Rather than bothering with changing penalties over time, we simply restrict
the range within which the policy can change.
The project utilizes the Proximal Policy Optimization (PPO) algorithm from
the Stable Baselines3 library to train an agent to make power allocation
decisions.
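A minimal sketch of this training setup with Stable Baselines3 is shown below;
CartPole-v1 is only a stand-in for the project's own environment.

import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=0)   # PPO-Clip with an MLP policy
model.learn(total_timesteps=10_000)        # alternate sampling and optimization epochs

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
print("chosen action:", action)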
References
1. Song, C., Koren, T., Wang, P. et al. Modelling the scaling properties of
human mobility. Nature Phys 6, 818–823 (2010).
https://doi.org/10.1038/nphys1760
2. González, M., Hidalgo, C. & Barabási, AL. Understanding individual
human mobility patterns. Nature 453, 779–782 (2008).
https://doi.org/10.1038/nature06958
3. R. W. Heath, M. Kountouris and T. Bai, "Modeling Heterogeneous
Network Interference Using Poisson Point Processes," in IEEE
Transactions on Signal Processing, vol. 61, no. 16, pp. 4114-4126,
Aug.15, 2013, doi: 10.1109/TSP.2013.2262679.
4. B. Pilanawithana, S. Atapattu and J. Evans, "Distribution of number of
users per cell in a Poisson wireless network with shadowing," 2016
Australian Communications Theory Workshop (AusCTW),
Melbourne, VIC, Australia, 2016, pp. 153-156, doi:
10.1109/AusCTW.2016.7433666.
5. Oliveira, F.; Luís, M.; Sargento, S. Machine Learning for the Dynamic
Positioning of UAVs for Extended Connectivity. Sensors 2021, 21,
4618. https://doi.org/10.3390/s21134618
6. F. Zishen, X. Xianzhong, S. Zhaoyuan and C. Xiping, "Proximal Policy
Optimization Based Continuous Intelligent Power Control in
Cognitive Radio Network," 2020 IEEE 6th International Conference
on Computer and Communications (ICCC), Chengdu, China, 2020,
pp. 820-824, doi: 10.1109/ICCC51575.2020.9345062.
7. Y. Meng, S. Kuppannagari, R. Kannan and V. Prasanna, "PPOAccel: A
High-Throughput Acceleration Framework for Proximal Policy
Optimization," in IEEE Transactions on Parallel and Distributed
Systems, vol. 33, no. 9, pp. 2066-2078, 1 Sept. 2022, doi:
10.1109/TPDS.2021.3134709.
8. S. Sun, T. A. Thomas, T. S. Rappaport, H. Nguyen, I. Z. Kovacs and I.
Rodriguez, "Path Loss, Shadow Fading, and Line-of-Sight Probability
Models for 5G Urban Macro-Cellular Scenarios," 2015 IEEE
Globecom Workshops (GC Wkshps), San Diego, CA, USA, 2015, pp.
1-7, doi: 10.1109/GLOCOMW.2015.7414036.