Final Project


A Synopsis Report

on
Design and develop a system for Exploratory analysis of
Geolocational Data in Python
Submitted to the Department of Computer Science and Engineering
In partial fulfilment of the requirements
For the degree of
Bachelor of Technology
In

Computer Science and Engineering

by
MADHAV SHARMA

(2200140100054)

SARTHAK YADAV

(2200140100094)

SHIVANSH GUPTA

(2200140100101)

Group No. 14

Guided By

Er. ANU SAXENA

Department of Computer Science and Engineering


Shri Ram Murti Smarak College of Engineering & Technology, Bareilly
Dr. A. P. J. Abdul Kalam Technical University, Lucknow
February, 2024
Acknowledgement

We would like to express our special thanks to our mentor, Ms. Anu Saxena, for the time and
effort she gave us throughout the year. Her advice and suggestions were very
helpful to us during the project’s completion, and we are deeply grateful to her.

We acknowledge that this project was completed entirely by us and not by anyone else.

Signature………………………………… Signature………………………………

Name…………………………………….. Name…………………………………..

Roll No………………………………….. Roll No…………………………………

Date…………………………………….. Date…………………………………

Signature…………………………………

Name……………………………………..

Roll No…………………………………..

Date……………………………………..
Abstract
This project uses K-Means clustering to find the best accommodation for migrants to a city
by classifying accommodations according to their preferences for facilities, budget and
proximity to a location. The aim is to fetch, clean and analyse geolocational data and run
K-Means clustering on it to recommend accommodations to immigrants to a city.
Keywords: Data, Dataset, Recommendation, Map
TABLE OF CONTENTS

Chapters

1 Acknowledgement

2 Abstract

3 Introduction of Project

3.1 Problem statement

3.2 Literature Review

4 Technology Used

5 Way of approach

6 Project stages

Reference
3. Introduction of Project

In an era where data-driven decision-making is paramount, the analysis of geolocational data
has emerged as a crucial tool across various domains. Geolocational data, encompassing GPS
coordinates, addresses, and spatial attributes, holds the key to understanding spatial patterns,
trends, and relationships that influence everything from urban planning to environmental
conservation. To harness the full potential of this data, there is a growing demand for robust
systems that facilitate its exploratory analysis.

This project endeavors to design and develop a comprehensive system for exploratory
analysis of geolocational data using the versatile programming language Python. By
leveraging Python's rich ecosystem of libraries and tools, we aim to create a platform that
empowers users to extract actionable insights from geospatial datasets with ease and
efficiency.

The system will be tailored to cater to the diverse needs of users across different domains,
including urban planners seeking to optimize city infrastructure, environmental scientists
monitoring ecological changes, and businesses looking to understand consumer behavior
based on spatial dynamics.

Through this endeavor, we aim to democratize access to geospatial analysis tools, enabling
researchers, analysts, and decision-makers to unlock the hidden potential of geolocational
data and make informed choices that shape our world. From data import and cleaning to
advanced spatial analysis techniques and visualization, this system will provide a seamless
and intuitive user experience, paving the way for data-driven insights that drive positive
change and innovation.
3.1 Problem statement

• Users feel uneasy when their location data is collected without their knowledge or
consent.
• Geolocation is subject to limitations imposed by the user's device and internet connection.
• Location data is affected by a variety of factors, including businesses opening or
closing, people relocating and devices changing hands. Inaccurate data can
lead to poor business decisions, wasted resources and reduced customer satisfaction,
so its accuracy needs to be improved.
• Location data can be complex, and it may be difficult to integrate with other types of
data or into a product without advanced analytics and data processing tools
to derive insights.
• Location-based app development can benefit from ready-to-use map APIs provided by
Google Maps, Mapbox, TomTom or other providers. It is important to compare
the pricing options offered by these providers carefully, because the cost of using
each geolocation feature varies from one API provider to another.

3.2 Literature Review


Many algorithms have been proposed to determine k automatically. Most of these methods
are wrappers around k-means or some other clustering algorithm for a fixed k. Wrapper methods use
splitting and merging rules for centers to increase or decrease the value of k as the algorithm
progresses, typically after calculating a score such as the Bayesian Information Criterion (BIC, a method for scoring
and selecting a model) for each clustering model. Apart from BIC, other scoring functions are also
available. Some researchers use the Minimum Description Length (MDL) framework to find the best k,
where the description length is a measure of how well the data fit the model.
This algorithm starts with a large value for k and
removes a center (reduces k) whenever that choice reduces the description length. Between
the k-reduction steps, they use the k-means algorithm to optimize the fit of the model to the data.
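
As a minimal sketch of the wrapper idea described above, the following Python code runs a small k-means for several fixed values of k and keeps the k with the lowest score. It is illustrative only: the simplified BIC-style score n·ln(SSE/n) + k·d·ln(n), the farthest-point initialisation, the function names and the sample points are all assumptions made for this example and are not taken from the methods reviewed.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Deterministic farthest-point initialisation (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point that is farthest from its nearest chosen center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain k-means for a fixed k; returns the centers and the sum of squared errors."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


def bic_score(points, k, sse):
    """Simplified BIC-style score: fit term plus a penalty on k (lower is better)."""
    n, d = len(points), len(points[0])
    return n * math.log(max(sse, 1e-12) / n) + k * d * math.log(n)


def choose_k(points, k_values):
    """Wrapper: score one k-means run per candidate k and return the best k."""
    scores = {k: bic_score(points, k, kmeans(points, k)[1]) for k in k_values}
    return min(scores, key=scores.get), scores


# Made-up sample data: three small, well-separated groups of 2-D points.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

best_k, scores = choose_k(points, range(1, 6))
print(best_k)
```

The penalty term grows with k, so adding a center is accepted only when it reduces the squared error enough to pay for the extra parameters.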
4. Technology used

• GPS
  • The Global Positioning System (GPS) was originally developed for military navigation, but
    nowadays anyone with a GPS device can receive the radio signals that its satellites
    broadcast. This global satellite system provides geolocation and time information to a
    GPS receiver almost anywhere on Earth, provided there are no obstructions and at least four
    GPS satellites are in view.
  • A big advantage of GPS is its accuracy. It can locate a receiver to within about five meters,
    or even better with dual-band GPS receivers. The accuracy depends on many factors,
    and it is also important to take into account the time it takes to determine a position,
    known as the fix time.
  • Another advantage is that GPS works everywhere outdoors and requires no specific
    infrastructure. The downside is that…
• Bluetooth Low Energy
  • Bluetooth is a wireless short-range communications technology standard, mainly
    designed for communicating over short distances. The signals do not carry very far:
    even in optimal circumstances, devices need to be within 100 meters of each other. Although
    Bluetooth has been around for two decades, its low-energy variant, Bluetooth Low Energy
    (BLE), is making big strides in geolocation and positioning.
  • There are two options to localise a tracking device via BLE:
  • Geobeacon (fixed-location BLE beacon): a BLE location marker at a fixed location
    advertises its position over BLE. The tracking device receives this data and reports the
    position to the end user.
  • BLE gateway: the tracking device announces its presence via BLE to a BLE gateway
    on a fixed locatio…
• Wi-Fi positioning
  • Wi-Fi positioning taps into wireless local area networks (WLANs), which are
    networks of devices that communicate on a specific radio frequency, usually 2.4 GHz or
    5 GHz. A Wi-Fi device transmits signals over a range of up to one hundred meters,
    which means Wi-Fi can cover both indoor and outdoor sites. A tracking device
    scans for nearby Wi-Fi access points (APs). By reading a unique identifier of the
    APs, such as the MAC address, a position can be determined. Local or public
    databases provide the link between observed MAC addresses and geolocation.
  • Tracking devices only scan for Wi-Fi signals; they do not have to connect to the
    network. Wi-Fi positioning can therefore also use Wi-Fi networks that you don’t own or
    can’t access. For instan…
• Network-based geolocation
  • Location can also be determined by using a service provider’s network infrastructure.
    The accuracy of network-based techniques varies, depending on both the
    density of base stations and the use of the most up-to-date timing
    methods. A technique used by various network providers is network triangulation:
    the location of a point is determined by forming triangles to it
    from known points. To use a service provider’s network infrastructure, the tracking
    device must be equipped with a module from that service provider.
  • Of all the geolocation technologies discussed, network-based geolocation requires the
    least energy. The accuracy of this positioning technique depends on the network and
    the …
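
The triangulation idea mentioned for network-based geolocation, locating a point by forming a triangle to it from known points, can be sketched geometrically as follows. This is only an illustration under stated assumptions: the points lie on a flat plane, each known point measures the direction to the target as an angle in degrees counterclockwise from the x-axis, and the function name and sample values are hypothetical. Real networks derive positions from timing and signal measurements that are not modelled here.

```python
import math


def triangulate(point_a, angle_a_deg, point_b, angle_b_deg):
    """Intersect the two sight lines from known points A and B.

    Returns the (x, y) intersection, or None when the lines are parallel.
    """
    ax, ay = point_a
    bx, by = point_b
    # Unit direction vectors of the two sight lines.
    dax, day = math.cos(math.radians(angle_a_deg)), math.sin(math.radians(angle_a_deg))
    dbx, dby = math.cos(math.radians(angle_b_deg)), math.sin(math.radians(angle_b_deg))
    # 2-D cross product of the directions; zero means the lines never meet.
    denom = dax * dby - day * dbx
    if abs(denom) < 1e-12:
        return None
    # Distance along A's sight line to the intersection point.
    t = ((bx - ax) * dby - (by - ay) * dbx) / denom
    return (ax + t * dax, ay + t * day)


# Two known points 10 units apart, both sighting the same target.
estimate = triangulate((0.0, 0.0), 45.0, (10.0, 0.0), 135.0)
print(estimate)
```

With the sample angles the two sight lines and the baseline between the known points form an isosceles triangle, so the estimate lies midway between the points along the x-axis.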
5. Way of Approach

• Get datasets from the relevant locations. (Data collection)

• Clean the datasets to prepare them for analysis. (Data cleaning via Pandas)

• Visualise the data using boxplots. (Using Matplotlib/Seaborn/Pandas)

• Fetch geolocational data. (Foursquare API, REST APIs)

• Use K-Means clustering to cluster the locations. (Using scikit-learn)

• Display the locations on a map. (Using Folium/Seaborn)
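
The cleaning and boxplot steps listed above can be sketched as follows. The project plans to use Pandas and Matplotlib/Seaborn for these steps; this sketch uses only the Python standard library as a stand-in so that it runs without extra packages. The field names (`name`, `lat`, `lon`, `rent`), the validity rules and the sample rows are assumptions made for illustration, not the project's actual dataset.

```python
import statistics


def clean_records(records):
    """Drop rows with missing, non-numeric or out-of-range values, and duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        try:
            lat, lon, rent = float(rec["lat"]), float(rec["lon"]), float(rec["rent"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric field
        if not (-90 <= lat <= 90 and -180 <= lon <= 180 and rent >= 0):
            continue  # impossible coordinate or negative rent (also rejects NaN)
        key = (rec.get("name"), lat, lon)
        if key in seen:
            continue  # duplicate listing
        seen.add(key)
        cleaned.append({"name": rec.get("name"), "lat": lat, "lon": lon, "rent": rent})
    return cleaned


def five_number_summary(values):
    """Minimum, quartiles and maximum: the numbers a boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Made-up sample rows, including the kinds of problems cleaning should remove.
raw = [
    {"name": "A", "lat": 28.36, "lon": 79.43, "rent": 4000},
    {"name": "B", "lat": 28.37, "lon": 79.42, "rent": "5000"},
    {"name": "C", "lat": None, "lon": 79.40, "rent": 4500},   # missing latitude
    {"name": "D", "lat": 128.0, "lon": 79.41, "rent": 6000},  # latitude out of range
    {"name": "A", "lat": 28.36, "lon": 79.43, "rent": 4000},  # duplicate of the first row
    {"name": "E", "lat": 28.35, "lon": 79.44, "rent": 6000},
    {"name": "F", "lat": 28.38, "lon": 79.45, "rent": 7000},
    {"name": "G", "lat": 28.34, "lon": 79.46, "rent": 8000},
]

cleaned = clean_records(raw)
summary = five_number_summary([rec["rent"] for rec in cleaned])
print(len(cleaned), summary)
```

The cleaned coordinates are what a later clustering step would consume, and the five-number summary is the same information a boxplot of rents would display.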


6. Project Stages

1. Data Collection Module: Collects data from the users and stores
it in the database for later use.

2. Searching Module: After giving their input, the user searches for
locations that fall within their budget and offer their required facilities.

3. Recommendation Module: After the user enters the required
information in the search bar, the system shows recommendations
based on budget and requirements.

4. Communication Module: From the recommendations shown, the
user selects the best accommodation, and the
communication module redirects the user to the contact details
of the owner.
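
The Searching and Recommendation modules described above could work along the following lines. This is a hedged sketch, not the project's implementation: the listing fields, the `recommend` function, the choice to rank matches by great-circle (haversine) distance and the sample listings are all assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def recommend(listings, target, budget, required, top_n=3):
    """Keep listings within budget that offer every required facility,
    then return the top_n closest to the target (lat, lon)."""
    matches = [
        item for item in listings
        if item["rent"] <= budget and required <= item["facilities"]
    ]
    matches.sort(key=lambda item: haversine_km(item["lat"], item["lon"], *target))
    return matches[:top_n]


# Made-up listings for illustration only.
listings = [
    {"name": "Hostel A", "lat": 28.360, "lon": 79.430, "rent": 4000, "facilities": {"wifi", "mess"}},
    {"name": "PG B", "lat": 28.400, "lon": 79.500, "rent": 3500, "facilities": {"wifi", "mess", "laundry"}},
    {"name": "Flat C", "lat": 28.361, "lon": 79.431, "rent": 9000, "facilities": {"wifi", "mess"}},
    {"name": "Room D", "lat": 28.365, "lon": 79.435, "rent": 4500, "facilities": {"mess"}},
    {"name": "PG E", "lat": 28.370, "lon": 79.440, "rent": 5000, "facilities": {"wifi", "mess"}},
]

results = recommend(listings, target=(28.362, 79.432), budget=5000, required={"wifi", "mess"})
names = [item["name"] for item in results]
print(names)
```

Here "Flat C" is excluded for exceeding the budget and "Room D" for lacking a required facility; the remaining listings are ordered from nearest to farthest.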
Reference

[1] Cao, L., & Cong, G. (2018). Big data analytics in geolocational data: a survey.
ACM Computing Surveys (CSUR), 51(1), 1-36.

[2] Long, Y., & Shekhar, S. (2019). Spatial big data: a review on data acquisition,
storage, and management. Information Systems Frontiers, 21(6), 1247-1269.

[3] Gao, S., et al. (2017). Exploring human mobility patterns using geolocated
tweets. Applied Geography, 81, 44-54.

[4] Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2000). Quantitative
geography: perspectives on spatial data analysis. Sage.

[5] Anselin, L. (1995). Local indicators of spatial association—LISA. Geographical
analysis, 27(2), 93-115.

“Design and develop a system for Exploratory
analysis of Geolocational Data in Python”
by
Group no.14

Signature………………………….. Signature………………………

Name: Madhav Sharma Name: Sarthak Yadav

Roll no: 2200140100054 Roll no: 2200140100094

Signature…………………………..
Name: Shivansh Gupta

Roll no: 2200140100101

Er. Hiresh Gupta Er. Anubha Dhaka Er. Anu Saxena

HOD (CSE) MINI PROJECT INCHARGE SUPERVISOR/GUIDE

Department of Computer Science and Engineering


Shri Ram Murti Smarak College of Engineering & Technology, Bareilly

Dr. APJ Abdul Kalam Technical University, Lucknow

February, 2024
