Telecom Report
PROJECT REPORT
Submitted by
POOJA PRADEEP
(Reg. No: ICE22MCA-2038)
CERTIFICATE
External Examiner
ACKNOWLEDGEMENT
Apart from my own efforts, the success of this project depends largely on the
encouragement and guidance of many others. I take this opportunity to express my
gratitude to the people who have been instrumental in the successful completion of this
project. I would like to show my heartfelt gratitude towards Prof. Dr. K A Navas,
Principal, Ilahia College of Engineering and Technology, for granting me permission
to work on this project. I would also like to express my greatest gratitude towards the
Head of the Department of Master of Computer Applications, Prof. Anoop R, Project
Guide Prof. Sukrith Lal P S, Assistant Professor, and Project Coordinator Dr. Sulfath
P M, Assistant Professor, for their valuable advice and guidance. Finally, I express my
gratitude and thanks to all our teachers and other faculty members of the Department
of Master of Computer Applications for their sincere and friendly cooperation in
completing this project.
POOJA PRADEEP
Abstract
Chapter
ACKNOWLEDGEMENT...................................................................................
ABSTRACT....................................................................................................
LIST OF FIGURES.........................................................................................
LIST OF TABLES...........................................................................................
1. INTRODUCTION.......................................................................................
2. SUPPORTING LITERATURE..................................................................
2.1 Literature Review.................................................................................
2.2 Findings And Proposals........................................................................
3. SYSTEM ANALYSIS...............................................................................
3.1 Analysis of dataset................................................................................
3.1.1 About the dataset.............................................................................
3.1.2 Explore the dataset..........................................................................
3.2 Data Preprocessing...............................................................................
3.2.1 Data Cleaning..................................................................................
3.2.2 Feature Engineering........................................................................
3.3 Data Visualization.................................................................................
3.4 Analysis Of Algorithm..........................................................................
3.4.1 Logistic Regression.........................................................................
3.5 Project Pipeline.....................................................................................
3.6 Feasibility Analysis...............................................................................
3.6.1 Technical Feasibility........................................................................
3.6.2 Operational Feasibility....................................................................
3.6.3 Economic Feasibility.......................................................................
3.7 System Environment.............................................................................
3.7.1 Software Environment.....................................................................
3.7.2 Hardware Environment...................................................................
4. SYSTEM DESIGN....................................................................................
4.1 Model Building.....................................................................................
4.1.1 Implementation of code...................................................................
4.1.2 Model Planning...............................................................................
4.1.3 Model Training................................................................................
4.1.4 Model Testing..................................................................................
4.2 Database Design...................................................................................
5. RESULTS AND DISCUSSION................................................................
6. MODEL DEPLOYMENT.........................................................................
6.1 UI Design..............................................................................................
7. CONCLUSION........................................................................................
8. SCOPE AND FUTURE ENHANCEMENT.............................................
9. APPENDIX................................................................................................
10. REFERENCES........................................................................................
LIST OF FIGURES
1. INTRODUCTION
2. SUPPORTING LITERATURE
Year 2023
Key Findings Experimental results found that Random Forest provided the most
accurate results in predicting customer churn, with an accuracy
of 79%.
Paper 2: Customer Churn Prediction Using Apriori Algorithm and Ensemble Learning
Authors: Diaa Azzam, Manar Hamed, Nora Kasiem, Yomna Eid, Walaa Medhat
Published in: 2023 5th Novel Intelligent and Leading Emerging Sciences
Conference (NILES)
Customer churn poses a formidable challenge within the Telecom industry, as it can
result in significant revenue losses. In this research, we conducted an extensive study
aimed at developing a viable customer churn prediction method. Our method utilizes
the Apriori algorithm’s strength to identify the key causes of customer churn. In the
pursuit of this goal, we utilized multiple machine learning predictive models. All of
which were developed from the insights gleaned from the Apriori algorithm’s feature
extraction for churning customers. This extensive analysis encompassed a spectrum
of machine learning techniques that include Logistic Regression, Naive Bayes,
Support Vector Machines, Random Forests, and Decision Trees. Furthermore, we
utilized an ensemble learning approach to enhance the predictive accuracy of our
models. We also used a voting classifier refined with the best features within our
dataset. The voting classifier yielded an accuracy rate of 81.56%, underscoring the
effectiveness of our approach in addressing the critical issue of customer churn in the
Telecom industry.
Year 2023
Paper 3: Exploratory Data Analysis and Customer Churn Prediction for the
Telecommunication Industry
Authors: Kiran Deep Singh, Prabh Deep Singh, Ankit Bansal, Gaganpreet Kaur,
Vikas Khullar
Published in: 2023 3rd International Conference on Advances in Computing,
Communication, Embedded and Secure Systems (ACCESS)
The telecommunications business is one of the key industries with a higher risk of
revenue loss owing to client turnover and environmental impact. Thus, efficient and
effective churn management includes targeted marketing campaigns, special
promotions, or other incentives to keep the customer engaged in technological
progress. There are a lot of machine learning algorithms available now, but very few
of them can effectively take into account the asymmetrical structure of the
telecommunications dataset. The efficiency of machine learning algorithms may also
vary depending on how closely they approximate the real-world telecommunications
data rather than the publicly available dataset. As a result, the researchers used
various predictive models, including XGBoost, for this dataset. The accuracy
achieved on the native dataset is 82.80%. Results show the effectiveness of the
predictive model with great technological capabilities.
Paper Title Exploratory Data Analysis and Customer Churn Prediction for
the Telecommunication Industry
Year 2023
Algorithm XGBoost
The data preprocessing procedures for the customer churn prediction dataset involve
several pivotal steps to ensure the dataset's suitability for predictive modeling.
Initially, comprehensive data cleaning is undertaken, addressing missing values,
duplicates, and outliers. Missing values are either imputed or removed, duplicates are
identified and eliminated to prevent redundancy, and outliers are addressed through
techniques like trimming or winsorization. This meticulous cleaning process is
crucial for maintaining data quality and integrity, minimizing biases or inaccuracies
that could impact subsequent analyses. Following data cleaning, feature encoding is
performed to transform categorical variables into numerical format, making the data
compatible with the predictive model.
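The cleaning steps above (imputation, de-duplication, and outlier clipping) can be sketched with pandas. This is an illustrative example on a made-up toy frame, not the project's actual code; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the churn dataset.
df = pd.DataFrame({
    "tenure": [1, 2, 2, np.nan, 60, 500],          # 500 is an implausible outlier
    "monthly_charges": [29.9, 56.1, 56.1, 70.0, 99.9, 55.0],
})

# 1. Impute missing values (the median is robust to outliers).
df["tenure"] = df["tenure"].fillna(df["tenure"].median())

# 2. Identify and eliminate exact duplicate rows.
df = df.drop_duplicates()

# 3. Winsorize: clip tenure to its 1st-99th percentile range.
lo, hi = df["tenure"].quantile([0.01, 0.99])
df["tenure"] = df["tenure"].clip(lo, hi)

print(df.shape)
```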
Feature engineering plays a pivotal role in enhancing the telecom customer churn
prediction dataset's predictive power and interpretability. Following data cleaning,
various feature engineering techniques are applied to enrich the dataset and optimize
model performance. Feature scaling is first applied to standardize the scales of
continuous variables, ensuring equal contribution to the model. Next, categorical
variables are transformed into numerical format through feature encoding, further
enhancing model interpretability and performance. Feature engineering can also
involve deriving new attributes from existing ones to capture additional predictive
signal.
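As a sketch of the scaling and encoding steps just described, a scikit-learn ColumnTransformer can standardize the continuous columns and one-hot encode the categorical ones. The column names here are illustrative, not the dataset's actual fields.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with one numeric and one categorical column (illustrative names).
df = pd.DataFrame({
    "monthly_charges": [20.0, 50.0, 80.0],
    "contract": ["Month-to-month", "One year", "Two year"],
})

pre = ColumnTransformer([
    ("scale", StandardScaler(), ["monthly_charges"]),   # feature scaling
    ("encode", OneHotEncoder(), ["contract"]),          # feature encoding
])

X = pre.fit_transform(df)
print(X.shape)   # 1 scaled column + 3 one-hot columns
```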
Cost Function:
• Logistic regression uses the cross-entropy loss function to measure the
difference between the predicted probabilities and the actual binary outcomes.
• The cost function for logistic regression is derived from maximum likelihood
estimation.
• The goal is to minimize the cost function by adjusting the model parameters θ
using optimization algorithms like gradient descent.
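For concreteness, the cross-entropy cost described above can be written out in a few lines of NumPy. This is an illustrative sketch, not the project's code; the function and variable names are my own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(theta, X, y):
    """Average negative log-likelihood for logistic regression."""
    p = sigmoid(X @ theta)
    eps = 1e-12                      # guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# With theta = 0 every predicted probability is 0.5, so the cost equals ln 2.
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
print(cross_entropy_cost(np.zeros(2), X, y))   # ≈ 0.6931
```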
Model Training:
• The model parameters θ are initially assigned random values.
• The cost function is iteratively minimized using optimization techniques such
as gradient descent.
• During each iteration, the model parameters are updated in the direction that
reduces the cost function.
• The process continues until convergence, where the change in the cost
function becomes negligible.
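The training loop described by these bullets (random initialization, repeated gradient steps, stopping when the change in cost is negligible) can be sketched on a tiny synthetic problem. This is illustrative rather than the project's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny synthetic problem: an intercept column plus one feature.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = (X[:, 1] > 0).astype(float)          # label depends on the feature's sign

theta = rng.normal(size=2)               # random initial parameters
lr, prev_cost = 0.5, np.inf
for _ in range(5000):
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) / len(y)        # gradient of the cross-entropy cost
    theta -= lr * grad                   # step against the gradient
    cost = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    if abs(prev_cost - cost) < 1e-9:     # convergence: negligible change
        break
    prev_cost = cost

pred = sigmoid(X @ theta) >= 0.5
accuracy = np.mean(pred == (y == 1))
print(round(accuracy, 3))
```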
Decision Boundary:
• Logistic regression learns a linear decision boundary that separates the two
classes in the feature space.
• The decision boundary is determined by the coefficients θ learned during
training.
Model Evaluation:
• Logistic regression models are evaluated using metrics such as accuracy,
precision, recall, F1-score, and ROC curve.
• Accuracy measures the overall correctness of predictions, while precision and
recall provide insights into the model's performance on individual classes.
• ROC curve and AUC (Area Under the ROC Curve) quantify the trade-off
between true positive rate and false positive rate at different classification
thresholds.
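All of these evaluation metrics are available directly in scikit-learn; a minimal example with made-up labels and probabilities:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Illustrative true labels, hard predictions, and predicted churn probabilities.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.1, 0.2, 0.6, 0.9, 0.4, 0.3, 0.8, 0.2]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("roc auc  :", roc_auc_score(y_true, y_prob))   # uses probabilities
```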
Assumptions:
• Logistic regression assumes that the relationship between the input features
and the log-odds of the outcome variable is linear.
• It assumes that there is no multicollinearity among the predictor variables.
• The observations are assumed to be independent of each other.
Applications:
• Logistic regression finds applications in various fields such as healthcare,
finance, marketing, and more.
• It is often used as a baseline model for binary classification tasks due to its
simplicity and interpretability.
For customer churn details, the Telecom churn dataset is used. The dataset has to be
split into train and test sets. The preprocessing needed is cleaning and feature
engineering. The model is fitted using the training dataset and evaluated using the
test dataset. The trained model is saved and later used for prediction.
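A minimal end-to-end sketch of this pipeline (split, fit, evaluate, save) might look as follows. The synthetic data and the file name are stand-ins, not the project's actual dataset or paths.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned, feature-engineered churn table.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Split into train and test sets, fit on the train set, evaluate on the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))

# Persist the trained model so the deployment phase can reuse it.
joblib.dump(model, "churn_model.joblib")
loaded = joblib.load("churn_model.joblib")
```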
Deployment Pipeline
In the system deployment phase, the user needs to register on the website and provide
personal details to create an account. The user must verify the account details, after
which the connection must be approved by the admin. The saved model is used to
predict customer churn. Approved connections can check their connection and
payment details.
Feasibility study is carried out to select the best system that meets the performance
requirements. It involves preliminary investigation of the project and examines
whether the designed system will be useful to the organization.
Technical Feasibility
Operational Feasibility
Economic Feasibility
The proposed churn management system not only demonstrates technical feasibility
but also exhibits strong operational feasibility, as it requires minimal training for staff
members and is designed with user-friendliness in mind. With straightforward
instructions and intuitive interfaces, any necessary training can be delivered quickly
and easily, ensuring that staff members can effectively utilize the system without
extensive prior knowledge of computer operations. Moreover, the project is
developed with the general public in mind, prioritizing accessibility for individuals
with varying levels of technical expertise.
System environment specifies the hardware and software configuration of the new
system. Regardless of how the requirement phase proceeds, it ultimately ends with
the software requirement specification. A good SRS contains all the system
requirements to a level of detail sufficient to enable designers to design a system that
satisfies those requirements. The system specified in the SRS will assist the potential
users to determine if the system meets their needs or how the system must be
modified to meet their needs.
3.7.1 Software Environment
Various software used for the development of this application are the following:
1. Python
a. NumPy:
NumPy is a fundamental Python library for numerical computing; it provides a
multidimensional array object and a collection of routines for processing those
arrays. In this application, it is used for handling arrays.
b. Matplotlib:
Matplotlib is a cross-platform data visualization and graphical plotting library for
Python. One of the greatest benefits of visualization is that it gives us visual access
to huge amounts of data in easily digestible visuals. In this application, it is used for
plotting graphs.
c. TensorFlow:
TensorFlow is an open-source library developed by Google, primarily for deep
learning applications. In this application, it is used for creating and handling the
model.
d. Keras:
Keras is a powerful and easy-to-use free open-source Python library for developing
and evaluating deep learning models. It wraps the efficient numerical computation
libraries Theano and TensorFlow and allows you to define and train neural network
models in just a few lines of code. In this application, it is used for creating and
handling the model.
e. Scikit-learn:
Scikit-learn (Sklearn) is the most useful and robust library for machine learning in
Python. It provides a selection of efficient tools for machine learning and statistical
modelling including classification, regression, clustering and dimensionality
reduction via a consistent interface in Python. In this application, it is used to plot
the confusion matrix.
f. OS:
The OS module in Python provides functions for interacting with the operating
system. OS comes under Python's standard utility modules. This module provides a
portable way of using operating system-dependent functionality. In this application,
it is used for saving the model.
g. Seaborn:
Seaborn is a Python data visualization library built on top of Matplotlib, providing a
high-level interface for creating attractive and informative statistical graphics. It
simplifies the process of generating complex visualizations by offering a wide range
of built-in themes, color palettes, and statistical plotting functions.
2. Jupyter Notebook
Jupyter Notebook is an open-source web application that allows users to create and
share documents containing live code, equations, visualizations, and narrative text. It
supports multiple programming languages, including Python, R, and Julia, making it
a versatile tool for data analysis, scientific computing, machine learning, and
education. With its interactive interface, users can write and execute code in
individual cells, view the output immediately, and document their analysis in
markdown cells. Jupyter Notebook promotes collaboration and reproducibility by
enabling users to share their work as interactive notebooks, facilitating seamless
communication and knowledge dissemination within the data science community and
beyond.
3. Pycharm
PyCharm is a powerful integrated development environment (IDE) specifically
designed for Python development. Developed by JetBrains, it offers a wide range of
features to enhance productivity and streamline the coding process. PyCharm
provides intelligent code completion, code navigation, and refactoring tools, helping
developers write clean, efficient code. Its robust debugger allows for easy
troubleshooting and testing of Python scripts and applications. PyCharm also offers
integration with version control systems like Git, as well as support for popular web
frameworks like Django and Flask, making it a versatile tool for web development.
With its user-friendly interface and extensive plugin ecosystem, PyCharm is a
preferred choice for both beginner and experienced Python developers looking to
build high-quality Python projects efficiently.
4. HTML, CSS, JS, Bootstrap:
HTML, CSS, and JS are the fundamental building blocks of web development.
HTML (Hypertext Markup Language) provides the structure and content of web
pages, defining elements like headings, paragraphs, and images. CSS (Cascading
Style Sheets) controls the presentation and layout of these elements, allowing
developers to style pages consistently. JavaScript (JS) adds interactivity and
dynamic behavior to web pages. Bootstrap is a front-end framework built on these
technologies that provides ready-made, responsive components for faster
development.
The minimum requirement for the implementation of the system is only one
machine:
CPU: Intel Core i3
RAM: 4GB
Keyboard: Standard 108 keys Enhanced Keyboard
Display: 15” Monitor
Pointing device: Serial Mouse
4. SYSTEM DESIGN
Overall, model planning in this context aims to create a robust predictive
model tailored to the telecom industry's specific needs.
id	Int	10	Number of id
The aim of the telecom customer churn prediction project is to leverage data-driven
insights to understand and predict customer behavior within the telecommunications
industry, specifically focusing on identifying potential churners. Through analysis of
the customer churn prediction dataset, the project aims to uncover patterns and trends
in customer churn rates based on factors such as gender and service types. By
integrating advanced machine learning algorithms and additional functionalities like
bill alerts and payment history viewing, the project seeks to provide telecom
companies with a powerful tool for proactive churn mitigation and enhanced
customer engagement. Ultimately, the goal is to optimize services, foster customer
loyalty, and drive sustainable growth in a competitive telecom landscape through
data-driven decision-making and effective retention strategies.
Accuracy
In our thorough evaluation of the logistic regression model for customer churn
prediction, we conducted an extensive analysis, considering key metrics such as
accuracy scores of multiple algorithms. Upon examination, we found that the logistic
regression model achieved an impressive accuracy score of 0.882 on the dataset,
indicating an overall accuracy of approximately 88.2%. This accuracy metric
represents the proportion of correctly classified instances out of the total number of
instances, offering a comprehensive view of the model's performance. However, it's
important to note that while accuracy provides a general assessment, it may not
capture the intricacies of class-specific predictions. Despite this limitation, the high
accuracy score obtained suggests that the logistic regression model effectively
generalizes patterns within the data, showcasing its proficiency in predicting
customer churn. This performance underscores the reliability and robustness of the
model, providing valuable insights for telecommunications companies aiming to
mitigate customer churn and enhance overall business strategies. Through its
accurate predictions and ability to discern patterns, the logistic regression model
serves as a valuable tool for guiding decision-making processes and optimizing
customer retention efforts in the competitive telecom industry landscape.
Confusion Matrix
The confusion matrix, an essential tool in evaluating classification models, offers a
detailed breakdown of the model's performance across different classes. It illustrates
the number of true negatives (TN), false positives (FP), false negatives (FN), and
true positives (TP). In this case, the confusion matrix reveals that the model
accurately predicted 542 instances of non-churned customers (TN), while incorrectly
classifying 19 instances as churned customers (FP). Additionally, it correctly
identified 20 instances of churned customers (TP) but misclassified 56 instances as
non-churned (FN).
Classification Report
Further analysis through the classification report provides insights into the precision,
recall, and F1-score of the model across different classes. For non-churned customers
(class 0), the model demonstrated high precision (0.91), recall (0.97), and F1-score
(0.94), indicating its effectiveness in accurately predicting non-churned customers.
However, for churned customers (class 1), the model's performance was
comparatively lower, with precision, recall, and F1-score values of 0.51, 0.26, and
0.35, respectively. While the model shows promise in identifying churned customers,
there is potential for improvement in its predictive capabilities for this class.
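These class-level figures follow directly from the confusion-matrix counts reported above; the short computation below reproduces them (values rounded as in the report).

```python
# Counts reported in the confusion matrix above.
TN, FP, FN, TP = 542, 19, 56, 20

precision = TP / (TP + FP)            # 20 / 39
recall    = TP / (TP + FN)            # 20 / 76
f1        = 2 * precision * recall / (precision + recall)
accuracy  = (TP + TN) / (TP + TN + FP + FN)

print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 3))
# → 0.51 0.26 0.35 0.882
```

Note that the 0.882 accuracy quoted earlier is exactly (TP + TN) divided by the total of 637 test instances, confirming the reported metrics are internally consistent.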
ROC Curve
The ROC curve evaluates a binary classification model's performance by plotting the
True Positive Rate (TPR) against the False Positive Rate (FPR) at varying
classification thresholds. TPR measures the proportion of actual positive cases
correctly identified by the model, while FPR represents the proportion of actual
negative cases incorrectly classified as positive. A model with a curve closer to the
top-left corner signifies high TPR and low FPR, indicating better performance. The
Area Under the Curve (AUC) quantifies overall model performance, with higher
values suggesting better discriminative ability. ROC curves are widely used in
machine learning to assess and compare classification models.
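A minimal illustration of computing the ROC curve and AUC with scikit-learn, using made-up labels and churn probabilities (not the project's results):

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Illustrative labels and predicted churn probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9])

# FPR and TPR at each classification threshold, then the area under the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print("AUC:", auc(fpr, tpr))
```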
6. MODEL DEPLOYMENT
Model deployment in deep learning encompasses the process of transitioning a
trained machine learning model from development to practical use in a production
environment, aiming to enable end-users to interact with the model and generate
predictions based on new input data. Central to this process is the focus on designing
and implementing a user-friendly interface for seamless user interaction. In the
present context, the user interface is constructed using HTML, CSS, JavaScript, and
Bootstrap, enhancing the web application's interactivity and usability. The inclusion
of these technologies contributes to a more engaging and intuitive user experience,
with visual aids such as figures providing clarity on the system's interface layout.
The resulting interface is designed to be simple and easy to navigate, with only
essential elements displayed on the pages, ensuring that users can interact with the
deployed model efficiently and effectively.
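The report does not name the web framework behind this interface, so the following Flask sketch is purely illustrative: the route, the JSON field names, and the inline stand-in model (used here instead of loading the persisted one) are all assumptions.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model; in the real system this would be the saved model,
# e.g. joblib.load("churn_model.joblib").
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # list of numeric features
    prob = model.predict_proba([features])[0][1]   # probability of churn
    return jsonify({"churn_probability": round(float(prob), 3)})

# Exercise the endpoint without starting a server.
client = app.test_client()
resp = client.post("/predict", json={"features": [0.1, -0.5, 1.2, 0.3]})
print(resp.get_json())
```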
6.1 UI Design
Fig 6.1.1 Home page
Fig 6.1.3 User-Payment
7. CONCLUSION
In conclusion, our analysis of the telecom customer churn prediction project has
unveiled valuable insights into customer behavior and service preferences within the
telecommunications industry. By delving into the customer churn prediction dataset,
we identified distinct patterns in churn rates based on gender and service types,
shedding light on potential areas for targeted retention efforts. Moreover, our
investigation into profitable service types has informed strategic decision-making,
enabling companies to tailor offerings and maximize revenue. Through the
application of advanced machine learning algorithms and the integration of bill alerts
and payment history viewing functionalities, this project offers telecom companies a
powerful tool for proactive churn mitigation and enhanced customer engagement.
Overall, these findings underscore the importance of data-driven decision-making in
optimizing services, fostering customer loyalty, and driving sustainable growth in a
competitive telecom landscape.
9. APPENDIX
10. REFERENCES
• https://ieeexplore.ieee.org/document/9793315
• https://ieeexplore.ieee.org/document/10303463
• https://ieeexplore.ieee.org/document/9718460
• https://www.kaggle.com/datasets/shrutiarora185/churn
• https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0191-6
• https://www.sciencedirect.com/science/article/pii/S2666720723001443
• https://www.linkedin.com/pulse/predicting-telecom-churn-machine-learning-
approach-asiedu-kingsley
• https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9051585/
• https://link.springer.com/article/10.1007/s10844-022-00739-z