Capstone Air Pollution Review 2 PT
18MIS0074 - Jerline Anne S S
Worldwide, air pollution is responsible for around 1.3 million deaths annually,
according to the World Health Organization (WHO). The depletion of air quality is
just one of the harmful effects of pollutants released into the air. Other
detrimental consequences, such as acid rain, global warming, aerosol formation, and
photochemical smog, have also increased over the last several decades. The recent
rapid spread of COVID-19 has prompted many researchers to investigate underlying
pollution-related conditions that contribute to COVID-19 outbreaks in different
countries. Several lines of evidence suggest that air pollution is linked to
significantly higher COVID-19 death rates, and that patterns in COVID-19 death
rates mirror patterns in areas with both high population density and high PM2.5
exposure. Together, these findings raise an urgent need to anticipate and plan for
pollution fluctuations to help communities and individuals better mitigate the
negative impact of air pollution. To do so, air quality evaluation plays a
significant role in monitoring and controlling air pollution.
In developing countries like India, the rapid increase in population and the
economic upswing in cities have led to environmental problems such as air
pollution, water pollution, noise pollution and many more. Air pollution has a
direct impact on human health, and public awareness of it has increased in our
country. Global warming, acid rain, and an increase in the number of asthma
patients are some of the long-term consequences of air pollution. Precise air
quality forecasting can reduce the effect of peak pollution on humans and the
biosphere. Hence, improving air quality forecasting is one of the prime targets for
society. Sulfur dioxide is one of the major gaseous pollutants present in air. It
is colorless and has a sharp, unpleasant smell. It combines easily with other
chemicals to form harmful substances such as sulfuric acid and sulfurous acid.
Sulfur dioxide affects human health when it is breathed in: it irritates the nose,
throat, and airways, causing coughing, wheezing, shortness of breath, or a tight
feeling around the chest. The concentration of sulfur dioxide in the atmosphere can
also influence habitat suitability for plant communities and animal life. The
proposed system is capable of predicting the quality of air using ML
Over the past few decades, due to human activities, industrialization, and
urbanization, air pollution has become a life-threatening factor in many countries
around the world. Air, an essential natural resource, has been compromised in terms
of quality by economic activities. Air pollution is a severe problem in areas where
population density is high, such as metropolitan cities. Various types of emissions
caused by human activities, such as transportation, power generation, and fuel use,
are affecting air quality. Considerable research has been devoted to predicting
instances of poor air quality, but most studies are limited by insufficient
longitudinal data, making it difficult to account for seasonal and other factors.
Several prediction models have been developed using an 11-year dataset collected by
Taiwan's Environmental Protection Administration (EPA). We forecast air quality by
using machine learning to predict the air quality index of a given area. The air
quality index is a standard measure used to indicate pollutant (SO2, NO2, RSPM,
SPM, etc.) levels over a period. ML models such as Decision Tree and Random Forest
Classifier will be implemented and compared to show better
Air pollution, which is rapidly increasing due to various human activities, is the
introduction into the atmosphere of chemicals, particulates or biological materials
that cause discomfort, disease or death in humans, damage other living organisms
such as food crops, or damage the natural or built environment. Air pollution is
one of the most important environmental problems in metropolitan and industrial
cities, so it is very important to predict pollution and avoid these problems. Air
pollution prediction is an interesting and challenging task, and we present
prediction techniques that can give next-day, next-month, and next-year air
pollution levels to help avoid these problems.
Sravya, B.L., Mahalakshmi, A.P., Swarupini, D.B., & Jaswanth, B.V. (2020). A Deep
Learning based Air Quality Prediction.
Industries are a major source of air pollutants. Air pollution in the form of
carbon dioxide and methane raises the earth's temperature; the less gasoline we
burn, the more we reduce air pollution and the harmful effects of climate change.
Especially in metropolitan cities, the change in temperature combined with harmful
chemicals may lead to dangerous levels of air pollution. Air quality prediction
techniques are therefore of major importance today. Much research has applied
machine learning algorithms to identifying the air quality index. Applying deep
learning models to these data can make a great difference in predicting the quality
of air. The authors proposed an LSTM-based deep learning technique for evaluating
hourly air quality. The proposed results outperformed the existing model results
through predicting RMSE
Ijmtst, Editor. (2022). Air Pollution Control using Data Mining. International
Journal for Modern Trends in Science and Technology. 8. 303-312.
Air pollution is one of the major environmental hazards. Every living organism
needs fresh, good-quality air at every moment, and none can survive without it. But
automobiles, agricultural activities, factories and industries, mining activities,
and the burning of fossil fuels are polluting our air. These activities release
sulfur dioxide, nitrogen dioxide, carbon monoxide, and particulate matter into the
air, which is harmful to all living organisms. The air we breathe every moment
therefore causes several health issues, so we need a good system that predicts such
pollution and helps create a better environment. This leads us to look for advanced
techniques for predicting air pollution. Here we predict air pollution for our
smart city using data mining techniques. Our model uses a multivariate, multi-step
time series data mining technique based on the random forest algorithm. The system
takes past and current data and applies them to the model to predict air pollution.
This model reduces complexity, improves effectiveness and practicability, and can
provide more reliable and accurate decisions for the environmental protection
departments of smart cities.
T. Madan, S. Sagar and D. Virmani, "Air Quality Prediction using Machine Learning
Algorithms -A Review," 2020 2nd International Conference on Advances in Computing,
Communication Control and Networking (ICACCCN), Greater Noida, India, 2020, pp.
140-145, doi: 10.1109/ICACCCN51052.2020.9362912.
Predicting air quality is a necessary step for governments, as air quality is
becoming a major concern for human health. The Air Quality Index measures the
quality of air. Air pollutants such as carbon dioxide, nitrogen dioxide, and carbon
monoxide are released from the burning of natural gas, coal and wood, and from
industries and vehicles. Air pollution can cause severe diseases such as lung
cancer and brain disease, and can even lead to death. Machine learning algorithms
help in determining the air quality index. Much research is being done in this
field, but results are still not accurate. Datasets are available from Kaggle and
air quality monitoring sites and are divided into two parts, training and testing.
Machine learning algorithms employed for this are Linear Regression, Decision Tree,
Random Forest, Artificial Neural Network, Support Vector
C. Shi, Y. Wang, Y. Wan and S. Wu, "Air Quality Prediction Based on Machine
Learning," 2022 International Conference on Machine Learning and Knowledge
Engineering (MLKE), Guilin, China, 2022, pp. 1-5, doi:
In recent years, due to the vigorous development of industrialization,
environmental protection measures cannot be effectively guaranteed. Increasingly
serious environmental problems have gradually become the primary problem affecting
the quality of national life. Therefore, we need to establish a relatively accurate
air quality prediction model to understand the possible air pollution process in
advance. According to the prediction results of the model, it is of great
significance to establish and take corresponding control measures to reduce air
pollution. This paper makes full use of data mining methods such as mutual
information theory, neural networks, and intelligent optimization algorithms. We
use the basic data of long-term air quality prediction of open monitoring points as
a training set and test set. Firstly, the SOM neural network model is used for
unsupervised clustering of relevant pollutant data to analyze the correlation
between various monitored pollutants. Aiming at the problems of a large amount of
data and long calculation time of the algorithm, combined with the clustering
results, an NSGA-II optimized neural network is proposed to predict the future
pollution situation. The experimental results show that the prediction accuracy of
pollutants can reach more than 90%.
J. Collado and C. Pinzon, "Air Pollution Prediction Using Machine Learning
Algorithms: A Literature Review," 2022 V Congreso Internacional en Inteligencia
Ambiental, Ingeniería de Software y Salud Electrónica y Móvil (AmITIC), San José,
Costa Rica, 2022, pp. 1-6, doi: 10.1109/AmITIC55733.2022.9941271.
Air pollution continues to be a problem that affects us all worldwide, since it is
estimated that 7 million people die each year due to repeated exposure to
pollutants that cause chronic conditions such as severe respiratory diseases,
cardiovascular problems and cancer. On the other hand, climatological effects lead
to the deterioration of the planet's ecosystems. Therefore, air quality monitoring
systems are the main tools used by governments to control the emission of toxic
gases into the atmosphere. This makes it possible to ensure the quality of life
and the general well-being of the population, as well as to strengthen the
agricultural and industrial sectors. The objective of this work is to present a
review of the characteristics present in the mechanisms applied in the prediction
of air pollution, with the main objective of synthesizing the knowledge found,
identifying the models, approaches and variables that are most studied. The results
show that the hybrid models based on CNN-LSTM are the most used. Other studies use
GRU and ELM that have good results when making predictions. Suspended particles
(PM10 and PM2.5) are the main object of study. Multivariate models are also more
accurate and efficient when it comes to forecasting.
S. Yarragunta, M. A. Nabi, J. P and R. S, "Prediction of Air Pollutants Using
Supervised Machine Learning," 2021 5th International Conference on Intelligent
Computing and Control Systems (ICICCS), Madurai, India, 2021, pp. 1633-1640, doi:
The air quality index can be estimated from the contributing pollutants, namely
PM10, PM2.5, SO2, CO, and NO2. These are the components used in supervised machine
learning procedures to compare the air quality of an environment. The main goal is
to create a machine learning model and investigate the air quality index by
selecting the best results from various machine learning algorithms based on their
precision. We used logistic regression, decision tree, support vector machine,
random forest, Naive Bayes, and K-nearest neighbor as six basic machine learning
algorithms.
S. Jeya and L. Sankari, "Air Pollution Prediction by Deep Learning Model," 2020 4th
International Conference on Intelligent Computing and Control Systems (ICICCS),
Madurai, India, 2020, pp. 736-741, doi: 10.1109/ICICCS48265.2020.9120932.
The impact of harmful air pollutants on human health is a vast area of research,
and preventing, controlling, and monitoring these pollutants is a major
responsibility of any governing body. Several computing models, from statistical
and machine learning to deep learning, have been compared and contrasted to date to
establish the accuracy of forecasting air quality standards. The level of
pollutants is still not under control in several parts of the world due to various
sources and reasons. This paper attempts to forecast PM2.5, which is one of the
most harmful disease-triggering pollutants across the globe, by using a
bidirectional long short-term memory model. The proposed model is reported to be
more accurate than the existing model on the following error metrics: root mean
square error = 9.86, mean absolute error = 7.53, and symmetric mean absolute
percentage error = 0.1664.
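The three error metrics named above can be computed as in the following minimal sketch. The sample readings are hypothetical and are not taken from the cited paper, and the SMAPE definition used here (a fraction between 0 and 2) is only one of several in common use, so it is an assumption that the paper uses the same form.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, as a fraction.

    Assumes no pair of values is zero at the same time, since that
    would make the denominator zero.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(np.mean(np.abs(y_true - y_pred) / denom))


# Hypothetical PM2.5 readings, for illustration only.
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 73.0, 76.0]

print("RMSE :", rmse(observed, predicted))
print("MAE  :", mae(observed, predicted))
print("SMAPE:", smape(observed, predicted))
```

Lower values are better for all three metrics. RMSE penalizes large errors more heavily than MAE, which is why the two are usually reported together.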
Examining and protecting air quality has become one of the most essential
activities for governments in many industrial and urban areas today. Meteorological
and traffic factors, the burning of fossil fuels, and industrial parameters play
significant roles in air pollution. With air pollution increasing, we need models
that record information about the concentrations of air pollutants (SO2, NO2,
etc.). The deposition of these harmful gases in the air affects the quality of
people's lives, especially in urban areas. Lately, many researchers have begun to
use Big Data Analytics approaches, as environmental sensing networks and sensor
data are available
The Air Quality Index (AQI) is used to measure the quality of air. Earlier,
classical methods such as probability and statistics were used to predict air
quality, but those methods are very complex to apply. With advances in technology,
it is now easy to collect data about air pollutants using sensors and store it in
files. Assessing the raw data to detect pollutants requires rigorous analysis, for
which ML models are suited.
The existing system only predicts air quality country-wise, which is not sufficient
to understand the impact of air quality in depth. One drawback of the existing
system is that it cannot predict air quality in sub-regions, where each sub-region
can have different air pollutant levels from the others. To overcome this, we use
the Air Quality Analysis and Prediction system. In this system we fetch the air
pollution data. Once the data is fetched, the system is trained on it according to
the environment. This data is used to generate patterns in later phases.
Region-wise air quality analysis is performed, and future air quality is predicted.
There are two primary phases in the system:
Training phase: The system is trained on the data in the data set and fits a
model (line/curve) based on the chosen algorithm.
Testing phase: The system is provided with inputs and tested, and its accuracy is
checked. The data used to train or test the model therefore has to be appropriate.
The system is designed to detect and predict the AQI level, so appropriate
algorithms must be used for these two different tasks. Before the algorithms were
selected for further use, different algorithms were compared for accuracy, and the
best-suited one for each task was chosen.
Step 1: Extraction of historical dataset.
Step 2: Data pre-processing and normalization.
Step 3: Divide dataset in 70:30 ratio.
Step 4: Perform Feature selection on the dataset features.
Step 5: Train and test using different regression algorithms.
To predict the air quality of the NCR area, we need the concentrations of all the
pollutants present in the air. These are available on the website that holds data
on the pollutants contaminating the area every year. We use data from several
stations, which measure many elements present in the atmosphere. Data is taken from
10 different stations in NCR. The data is stored as a table with a total of 3469
rows and 8 columns. The AQI formulae are then applied to calculate the AQI, and the
various regression algorithms are used to predict it for a particular year.
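The text does not state which AQI formulae are applied. A common approach, assumed here, is to compute a piecewise-linear sub-index per pollutant from a breakpoint table and take the maximum sub-index as the overall AQI. The sketch below illustrates that idea; the breakpoint values are illustrative placeholders and should be replaced with the official table for the region, and the names `sub_index` and `overall_aqi` are hypothetical.

```python
# Each entry: (conc_low, conc_high, index_low, index_high).
# ILLUSTRATIVE placeholder breakpoints only; use the official table in practice.
ILLUSTRATIVE_BREAKPOINTS = {
    "so2": [(0, 40, 0, 50), (40, 80, 50, 100), (80, 380, 100, 200)],
    "no2": [(0, 40, 0, 50), (40, 80, 50, 100), (80, 180, 100, 200)],
}


def sub_index(conc, breakpoints):
    """Linearly interpolate a pollutant concentration onto the index scale."""
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= conc <= c_hi:
            return i_lo + (i_hi - i_lo) * (conc - c_lo) / (c_hi - c_lo)
    raise ValueError(f"concentration {conc} is outside the breakpoint table")


def overall_aqi(concentrations, table):
    """Overall AQI taken as the worst (largest) pollutant sub-index."""
    return max(sub_index(c, table[p]) for p, c in concentrations.items())


# Hypothetical station reading.
example = overall_aqi({"so2": 20.0, "no2": 60.0}, ILLUSTRATIVE_BREAKPOINTS)
print(example)
```

Taking the maximum means the overall index is driven by whichever pollutant is worst at that station, which is why each pollutant needs its own breakpoint table.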
After the input dataset is given, the data will be preprocessed by
• Removing null values from the data frame and replacing NaN values with default
• Encoding categorical data. Sometimes our data will be in qualitative form, that
is, we have text as our data and can find categories in text form. It is harder for
machines to understand and process text than numbers, since the models are based on
mathematical equations and calculations. Therefore, we have to encode the
categorical data.
• Fitting the transformation to the data, then transforming the data according to
the fitted transformation.
After the preprocessing, the data is scaled to a fixed range, usually 0 to 1. The
cost of having this bounded range, in contrast to standardization, is that we will
end up with smaller standard deviations, which can suppress the effect of outliers.
Then, using the s_to_super function, the first column of row(t) is shifted to the
last column of row(t-1) and concatenated. This transforms a normal preprocessed
dataset into a recurrent dataset.
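The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's own code: the sample table and column names are hypothetical, the column mean is assumed as the "default" fill value, and `to_supervised` is a hypothetical stand-in for the s_to_super function, pairing the features at time t-1 with the target at time t.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical sample of station readings (not the project's dataset).
raw = pd.DataFrame({
    "station": ["A", "A", "B", "B", "A"],
    "so2": [10.0, None, 14.0, 12.0, 16.0],
    "no2": [30.0, 34.0, None, 38.0, 40.0],
    "aqi": [80.0, 90.0, 95.0, 100.0, 110.0],
})

# 1. Replace missing values (assumption: the column mean is the default).
numeric_cols = ["so2", "no2", "aqi"]
clean = raw.copy()
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].mean())

# 2. Encode the categorical column as integers.
clean["station"] = LabelEncoder().fit_transform(clean["station"])

# 3. Scale every column to the range 0 to 1.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(clean), columns=clean.columns)


# 4. Reframe as a supervised problem: features at t-1, target at t.
def to_supervised(frame, target, n_lags=1):
    lagged = [frame.shift(lag).add_suffix(f"(t-{lag})")
              for lag in range(n_lags, 0, -1)]
    current = frame[[target]].add_suffix("(t)")
    # The first n_lags rows have no history, so they are dropped.
    return pd.concat(lagged + [current], axis=1).dropna().reset_index(drop=True)


supervised = to_supervised(scaled, target="aqi", n_lags=1)
print(supervised)
```

In a real pipeline the scaler would be fitted on the training portion only and then applied to the test portion, so that information from the test set does not leak into training.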
Feature selection is the method of choosing a subset of the primary features that
carries the information important for predicting the output. Where the data
contains unnecessary features, feature extraction is used to choose the best input
parameters from the chosen input dataset. The unified dataset thus gathered is used
for further study. The maximum number of inputs available for review is seven, so
all the inputs are selected for the computations.
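For a dataset with more candidate inputs, feature selection could be sketched as below. The text does not name a selection method, so univariate scoring with scikit-learn's `SelectKBest` and `f_regression` is an assumption, and the data is synthetic with hypothetical feature names.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: only the first and third features drive the target.
rng = np.random.default_rng(0)
n_samples = 200
names = ["so2", "no2", "rspm", "spm"]  # hypothetical feature names
X = rng.normal(size=(n_samples, len(names)))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(scale=0.1, size=n_samples)

# Keep the k features with the highest univariate F-score against the target.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
selected = [name for name, keep in zip(names, selector.get_support()) if keep]

print("scores  :", dict(zip(names, selector.scores_.round(1))))
print("selected:", selected)
```

With only seven inputs, as in this project, keeping all of them is a reasonable choice; a step like this becomes useful mainly when many weakly related columns are available.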
Now we need to split our dataset into two sets: a training set and a test set. We
train our machine learning models on the training set, i.e. the models try to learn
any correlations in the training set, and then we test the models on the test set
to check how accurately they can predict. A general rule of thumb is to allocate
80% of the dataset to the training set and the remaining 20% to the test set. For
this task, we import train_test_split from the model_selection module of
scikit-learn.
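A minimal sketch of the split using scikit-learn's `train_test_split` from `sklearn.model_selection`. The arrays here are placeholder data, and `random_state=42` is an arbitrary choice made only so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 20 samples with 2 features each.
X = np.arange(40, dtype=float).reshape(20, 2)
Y = np.arange(20, dtype=float)

# test_size=0.2 gives the 80/20 split; use test_size=0.3 for a 70:30 split.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```

By default the rows are shuffled before splitting. For time-ordered pollution data, passing `shuffle=False` keeps the test set strictly later than the training set, which is usually the safer evaluation.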
Now, to build our training and test sets, we create four sets: X_train (training
part of the matrix of features), X_test (test part of the matrix of features),
Y_train (training part of the dependent variable associated with X_train, and
therefore with the same indices), and Y_test (test part of the dependent variable
associated with X_test, and therefore with the same indices). We assign to them the
output of train_test_split, which takes the parameters: arrays (X and Y),
Now, we need to build a model to train the data. Here the model used is Decision
tree and random forest.
The random forest is a supervised learning algorithm that randomly creates and
merges multiple decision trees into one "forest." The goal is not to rely on a
single learning model, but rather a collection of decision models to improve
accuracy. The primary difference between this approach and the standard decision
tree algorithm is that the root nodes feature splitting nodes are generated
The Decision Tree Regressor observes the features of the data and trains a model in
the form of a tree to predict future values. How the tree is grown is controlled by
settings such as its maximum depth, according to which the system analyzes the
data.
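The comparison between the two models can be sketched as follows. This is an illustration on synthetic data, not the project's results: the regressor variants are assumed because the task is described as regression, and the hyperparameters (`max_depth=5`, `n_estimators=100`) are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for pollutant concentrations and an AQI-like target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(300, 3))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(scale=5.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training set and score it on the held-out test set.
rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

for name, score in rmse_by_model.items():
    print(f"{name}: RMSE = {score:.2f}")
```

A random forest averages many trees trained on bootstrapped samples, which typically reduces the variance of a single deep tree. Whether it actually scores better depends on the data, so both models should be evaluated on the same held-out test set.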
The regulation of air pollutant levels is rapidly becoming one of the most
important tasks. It is important that people know the level of pollution in their
surroundings and take steps to fight it. The proposed system will help the general
public as well as those in the meteorological department to detect and predict
pollution levels and take the necessary action accordingly. It will also help
people establish a data source for small localities, which are usually left out in
comparison to large cities. The aim of my work is not only to raise awareness but
also to minimize pollution through proper measures and to ensure, through regular
pollution checks, that vehicles emit pollutants within the permitted range. This
can lead to a pollution-free region.
Okokpujie, Kennedy & Noma-Osaghae, Etinosa & Odusami, Modupe & John, Samuel &
Oluwatosin, Oluga. (2018). A Smart Air Pollution Monitoring System. International
Journal of Civil Engineering and Technology. 9. 799-809.
Vineeta , Ajit Bhat , Asha S Manek , Pranay Mishra, 2019, Machine Learning based
Prediction System for Detecting Air Pollution, INTERNATIONAL JOURNAL OF ENGINEERING
RESEARCH & TECHNOLOGY (IJERT) Volume 08, Issue 09 (September 2019)
M. Ghoneim and S. M. Hamed, "Towards a Smart Sustainable City: Air Pollution
Detection and Control using Internet of Things," 2019 5th International Conference
on Optimization and Applications (ICOA), Kenitra, Morocco, 2019, pp. 1-6, doi:
Sravya, B.L., Mahalakshmi, A.P., Swarupini, D.B., & Jaswanth, B.V. (2020). A Deep
Learning based Air Quality Prediction.
Ijmtst, Editor. (2022). Air Pollution Control using Data Mining. International
Journal for Modern Trends in Science and Technology. 8. 303-312.
T. Madan, S. Sagar and D. Virmani, "Air Quality Prediction using Machine Learning
Algorithms -A Review," 2020 2nd International Conference on Advances in Computing,
Communication Control and Networking (ICACCCN), Greater Noida, India, 2020, pp.
140-145, doi: 10.1109/ICACCCN51052.2020.9362912.
C. Shi, Y. Wang, Y. Wan and S. Wu, "Air Quality Prediction Based on Machine
Learning," 2022 International Conference on Machine Learning and Knowledge
Engineering (MLKE), Guilin, China, 2022, pp. 1-5, doi:
J. Collado and C. Pinzon, "Air Pollution Prediction Using Machine Learning
Algorithms: A Literature Review," 2022 V Congreso Internacional en Inteligencia
Ambiental, Ingeniería de Software y Salud Electrónica y Móvil (AmITIC), San José,
Costa Rica, 2022, pp. 1-6, doi: 10.1109/AmITIC55733.2022.9941271.
S. Yarragunta, M. A. Nabi, J. P and R. S, "Prediction of Air Pollutants Using
Supervised Machine Learning," 2021 5th International Conference on Intelligent
Computing and Control Systems (ICICCS), Madurai, India, 2021, pp. 1633-1640, doi:
S. Jeya and L. Sankari, "Air Pollution Prediction by Deep Learning Model," 2020 4th
International Conference on Intelligent Computing and Control Systems (ICICCS),
Madurai, India, 2020, pp. 736-741, doi: 10.1109/ICICCS48265.2020.9120932.
U. Mahalingam, K. Elangovan, H. Dobhal, C. Valliappa, S. Shrestha and G. Kedam, "A
Machine Learning Model for Air Quality Prediction for Smart Cities," 2019
International Conference on Wireless Communications Signal Processing and
Networking (WiSPNET), Chennai, India, 2019, pp. 452-457, doi:
V. Hable-Khandekar and P. Srinath, "Machine Learning Techniques for Air Quality
Forecasting and Study on Real-Time Air Quality Monitoring," 2017 International
Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune,
India, 2017, pp. 1-6, doi: 10.1109/ICCUBEA.2017.8463746.
R. Murugan and N. Palanichamy, "Smart City Air Quality Prediction using Machine
Learning," 2021 5th International Conference on Intelligent Computing and Control
Systems (ICICCS), Madurai, India, 2021, pp. 1048-1054, doi:
Forecasting Criteria Air Pollutants Using Data Driven Approaches; An Indian Case
Study Tikhe Shruti, Dr. Mrs. Khare , Dr. Londhe,IOSR-JESTFT (Mar. - Apr. 2013)
Air Quality Forecasting Methods,
Multivariate Multistep Time series Forecasting model for Air Pollution.
multivariate-multi-stepair-pollution-time-series-forecasting/
Yi-Ting Tsai, Yu-Ren,Zeng, Yue-Shan Chang, "Air pollution forecasting using RNN
with LSTM", IEEE(2018)
Mansi Yadav, Suruchi Jain and K. R. Seeja, "Prediction of Air Quality Using Time
Series Data Mining", Springer (2019)
Manisha Bisht and K. R. Seeja, "Air Pollution Prediction Using Extreme Learning
Machine: A Case Study on Delhi.",
Khaled Bashir Shaban, Senior Member, IEEE, Abdullah Kadri, Member, IEEE, and Eman
Rezk," Air Pollution Monitoring System With Forecasting Models.", IEEE(2016)
Khaled Bashir Shaban, Abdullah Kadri, Eman Rezk, "Urban Air Pollution Monitoring
System With Forecasting Models",IEEE SENSORS JOURNAL, VOL. 16, NO. 8, APRIL 15,