Telecom Report
PROJECT REPORT
Submitted by
POOJA PRADEEP
(Reg. No: ICE22MCA-2038)
CERTIFICATE
External Examiner
ACKNOWLEDGEMENT
Apart from my own efforts, the success of this project depends largely on the
encouragement and guidance of many others. I take this opportunity to express my
gratitude to the people who have been instrumental in the successful completion of this
project. I would like to show my heartfelt gratitude towards Prof. Dr. K A Navas,
Principal, Ilahia College of Engineering and Technology, for granting me permission
to work on this project. I would also like to express my greatest gratitude towards the
Head of the Department of Master of Computer Applications, Prof. Anoop R, Project
Guide Prof. Sukrith Lal P S, Assistant Professor, and Project Coordinator Dr. Sulfath
P M, Assistant Professor, for their valuable advice and guidance. Finally, I express my
gratitude and thanks to all our teachers and other faculty members of the Department
of Master of Computer Applications for their sincere and friendly cooperation in
completing this project.
POOJA PRADEEP
Abstract
Chapter
ACKNOWLEDGEMENT...................................................................................
ABSTRACT....................................................................................................
LIST OF FIGURES.........................................................................................
LIST OF TABLES...........................................................................................
1. INTRODUCTION.......................................................................................
2. SUPPORTING LITERATURE..................................................................
2.1 Literature Review.................................................................................
2.2 Findings And Proposals........................................................................
3. SYSTEM ANALYSIS...............................................................................
3.1 Analysis of dataset................................................................................
3.1.1 About the dataset.............................................................................
3.1.2 Explore the dataset..........................................................................
3.2 Data Preprocessing...............................................................................
3.2.1 Data Cleaning..................................................................................
3.2.2 Feature Engineering........................................................................
3.3 Data Visualization.................................................................................
3.4 Analysis Of Algorithm..........................................................................
3.4.1 Logistic Regression.........................................................................
3.5 Project Pipeline.....................................................................................
3.6 Feasibility Analysis...............................................................................
3.6.1 Technical Feasibility........................................................................
3.6.2 Operational Feasibility....................................................................
3.6.3 Economic Feasibility.......................................................................
3.7 System Environment.............................................................................
3.7.1 Software Environment.....................................................................
3.7.2 Hardware Environment...................................................................
4. SYSTEM DESIGN....................................................................................
4.1 Model Building.....................................................................................
4.1.1 Implementation of code...................................................................
4.1.2 Model Planning...............................................................................
4.1.3 Model Training................................................................................
4.1.4 Model Testing..................................................................................
4.2 Database Design...................................................................................
5. RESULTS AND DISCUSSION................................................................
6. MODEL DEPLOYMENT.........................................................................
6.1 UI Design..............................................................................................
7. CONCLUSION........................................................................................
8. SCOPE AND FUTURE ENHANCEMENT.............................................
9. APPENDIX................................................................................................
10. REFERENCES........................................................................................
LIST OF FIGURES
1. INTRODUCTION
2. SUPPORTING LITERATURE
Year 2023
Key Findings Experimental results found that Random Forest provided the most
accurate results in predicting customer churn, with an accuracy
of 79%.
Paper 2: Customer Churn Prediction Using Apriori Algorithm and Ensemble Learning
Authors: Diaa Azzam, Manar Hamed, Nora Kasiem, Yomna Eid, Walaa Medhat
Published in: 2023 5th Novel Intelligent and Leading Emerging Sciences
Conference (NILES)
Customer churn poses a formidable challenge within the Telecom industry, as it can
result in significant revenue losses. In this research, we conducted an extensive study
aimed at developing a viable customer churn prediction method. Our method utilizes
the Apriori algorithm’s strength to identify the key causes of customer churn. In the
pursuit of this goal, we utilized multiple machine learning predictive models. All of
which were developed from the insights gleaned from the Apriori algorithm’s feature
extraction for churning customers. This extensive analysis encompassed a spectrum
of machine learning techniques that include Logistic Regression, Naive Bayes,
Support Vector Machines, Random Forests, and Decision Trees. Furthermore, we
utilized an ensemble learning approach to enhance the predictive accuracy of our
models. We also used a voting classifier refined with the best features within our
dataset. The voting classifier yielded an accuracy rate of 81.56%, underscoring the
effectiveness of our approach in addressing the critical issue of customer churn in the
Telecom industry.
Year 2023
Paper 3: Exploratory Data Analysis and Customer Churn Prediction for the
Telecommunication Industry
Authors: Kiran Deep Singh, Prabh Deep Singh, Ankit Bansal, Gaganpreet Kaur,
Vikas Khullar
Published in: 2023 3rd International Conference on Advances in Computing,
Communication, Embedded and Secure Systems (ACCESS)
The telecommunications business is one of the key industries with a higher risk of
revenue loss owing to client turnover and environmental impact. Thus, efficient and
effective churn management includes targeted marketing campaigns, special
promotions, or other incentives to keep the customer engaged in technological
progress. There are a lot of machine learning algorithms available now, but very few
of them can effectively take into account the asymmetrical structure of the
telecommunications dataset. The efficiency of machine learning algorithms may also
vary depending on how closely they approximate the real-world telecommunications
data rather than the publicly available dataset. As a result, the researchers used
various predictive models, including XGBoost, for this dataset. The accuracy
achieved on the native dataset is 82.80%. Results show the effectiveness of the
predictive model with great technological capabilities.
Paper Title Exploratory Data Analysis and Customer Churn Prediction for
the Telecommunication Industry
Year 2023
Algorithm XGBoost
The data preprocessing procedures for the customer churn prediction dataset involve
several pivotal steps to ensure the dataset's suitability for predictive modeling.
Initially, comprehensive data cleaning is undertaken, addressing missing values,
duplicates, and outliers. Missing values are either imputed or removed, duplicates are
identified and eliminated to prevent redundancy, and outliers are addressed through
techniques like trimming or winsorization. This meticulous cleaning process is
crucial for maintaining data quality and integrity, minimizing biases or inaccuracies
that could impact subsequent analyses. Following data cleaning, feature encoding is
performed to transform categorical variables into numerical format, making the data
compatible with the predictive model.
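The cleaning steps above (imputation, de-duplication, and outlier clipping) can be sketched with pandas. This is an illustrative example on a made-up toy frame, not the project's actual code; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the churn dataset.
df = pd.DataFrame({
    "tenure": [1, 2, 2, np.nan, 60, 500],          # 500 is an implausible outlier
    "monthly_charges": [29.9, 56.1, 56.1, 70.0, 99.9, 55.0],
})

# 1. Impute missing values (the median is robust to outliers).
df["tenure"] = df["tenure"].fillna(df["tenure"].median())

# 2. Identify and eliminate exact duplicate rows.
df = df.drop_duplicates()

# 3. Winsorize: clip tenure to its 1st-99th percentile range.
lo, hi = df["tenure"].quantile([0.01, 0.99])
df["tenure"] = df["tenure"].clip(lo, hi)

print(df.shape)
```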
Feature engineering plays a pivotal role in enhancing the telecom customer churn
prediction dataset's predictive power and interpretability. Following data cleaning,
various feature engineering techniques are applied to enrich the dataset and optimize
model performance. Feature scaling is first applied to standardize the scales of
continuous variables, ensuring equal contribution to the model. Next, categorical
variables are transformed into numerical format through feature encoding, further
enhancing model interpretability and performance. Feature engineering can also
involve deriving new attributes from existing ones to capture additional predictive
signal.
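As a sketch of the scaling and encoding steps just described, a scikit-learn ColumnTransformer can standardize the continuous columns and one-hot encode the categorical ones. The column names here are illustrative, not the dataset's actual fields.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with one numeric and one categorical column (illustrative names).
df = pd.DataFrame({
    "monthly_charges": [20.0, 50.0, 80.0],
    "contract": ["Month-to-month", "One year", "Two year"],
})

pre = ColumnTransformer([
    ("scale", StandardScaler(), ["monthly_charges"]),   # feature scaling
    ("encode", OneHotEncoder(), ["contract"]),          # feature encoding
])

X = pre.fit_transform(df)
print(X.shape)   # 1 scaled column + 3 one-hot columns
```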
Cost Function:
• Logistic regression uses the cross-entropy loss function to measure the
difference between the predicted probabilities and the actual binary outcomes.
• The cost function for logistic regression is derived from maximum likelihood
estimation.
• The goal is to minimize the cost function by adjusting the model parameters θ
using optimization algorithms like gradient descent.
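For concreteness, the cross-entropy cost described above can be written out in a few lines of NumPy. This is an illustrative sketch, not the project's code; the function and variable names are my own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(theta, X, y):
    """Average negative log-likelihood for logistic regression."""
    p = sigmoid(X @ theta)
    eps = 1e-12                      # guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# With theta = 0 every predicted probability is 0.5, so the cost equals ln 2.
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
print(cross_entropy_cost(np.zeros(2), X, y))   # ≈ 0.6931
```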
Model Training:
• The model parameters θ are initially assigned random values.
• The cost function is iteratively minimized using optimization techniques such
as gradient descent.
• During each iteration, the model parameters are updated in the direction that
reduces the cost function.
• The process continues until convergence, where the change in the cost
function becomes negligible.
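The training loop described by these bullets (random initialization, repeated gradient steps, stopping when the change in cost is negligible) can be sketched on a tiny synthetic problem. This is illustrative rather than the project's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny synthetic problem: an intercept column plus one feature.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = (X[:, 1] > 0).astype(float)          # label depends on the feature's sign

theta = rng.normal(size=2)               # random initial parameters
lr, prev_cost = 0.5, np.inf
for _ in range(5000):
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) / len(y)        # gradient of the cross-entropy cost
    theta -= lr * grad                   # step against the gradient
    cost = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    if abs(prev_cost - cost) < 1e-9:     # convergence: negligible change
        break
    prev_cost = cost

pred = sigmoid(X @ theta) >= 0.5
accuracy = np.mean(pred == (y == 1))
print(round(accuracy, 3))
```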
Decision Boundary:
• Logistic regression learns a linear decision boundary that separates the two
classes in the feature space.
• The decision boundary is determined by the coefficients θ learned during
training.
Model Evaluation:
• Logistic regression models are evaluated using metrics such as accuracy,
precision, recall, F1-score, and ROC curve.
• Accuracy measures the overall correctness of predictions, while precision and
recall provide insights into the model's performance on individual classes.
• ROC curve and AUC (Area Under the ROC Curve) quantify the trade-off
between true positive rate and false positive rate at different classification
thresholds.
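All of these evaluation metrics are available directly in scikit-learn; a minimal example with made-up labels and probabilities:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Illustrative true labels, hard predictions, and predicted churn probabilities.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.1, 0.2, 0.6, 0.9, 0.4, 0.3, 0.8, 0.2]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("roc auc  :", roc_auc_score(y_true, y_prob))   # uses probabilities
```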
Assumptions:
• Logistic regression assumes that the relationship between the input features
and the log-odds of the outcome variable is linear.
• It assumes that there is no multicollinearity among the predictor variables.
• The observations are assumed to be independent of each other.
Applications:
• Logistic regression finds applications in various fields such as healthcare,
finance, marketing, and more.
• It is often used as a baseline model for binary classification tasks due to its
simplicity and interpretability.
For customer churn details, the Telecom churn dataset is used. The dataset has to be
split into train and test sets. The preprocessing needed is cleaning and feature
engineering. The model is fitted using the training dataset and evaluated using the
test dataset. The trained model is saved and later used for prediction.
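A minimal end-to-end sketch of this pipeline (split, fit, evaluate, save) might look as follows. The synthetic data and the file name are stand-ins, not the project's actual dataset or paths.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned, feature-engineered churn table.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Split into train and test sets, fit on the train set, evaluate on the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))

# Persist the trained model so the deployment phase can reuse it.
joblib.dump(model, "churn_model.joblib")
loaded = joblib.load("churn_model.joblib")
```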
Deployment Pipeline
In the system deployment phase, the user needs to register on the website and provide
personal details to create an account. The user must verify the account details, after
which the connection must be approved by the admin. The saved model is used to
predict customer churn. Approved connections can check their connection and
payment details.
Feasibility study is carried out to select the best system that meets the performance
requirements. It involves preliminary investigation of the project and examines
whether the designed system will be useful to the organization.
Technical Feasibility
Operational Feasibility
Economic Feasibility
The proposed churn management system not only demonstrates technical feasibility
but also exhibits strong operational feasibility, as it requires minimal training for staff
members and is designed with user-friendliness in mind. With straightforward
instructions and intuitive interfaces, any necessary training can be delivered quickly
and easily, ensuring that staff members can effectively utilize the system without
extensive prior knowledge of computer operations. Moreover, the project is
developed with the general public in mind, prioritizing accessibility for individuals
with varying levels of technical expertise.
System environment specifies the hardware and software configuration of the new
system. Regardless of how the requirement phase proceeds, it ultimately ends with
the software requirement specification. A good SRS contains all the system
requirements to a level of detail sufficient to enable designers to design a system that
satisfies those requirements. The system specified in the SRS will assist the potential
users to determine if the system meets their needs or how the system must be
modified to meet their needs.
3.7.1 Software Environment
Various software used for the development of this application are the following:
1. Python
a. NumPy:
NumPy is a fundamental Python library for numerical computing; it provides a
multidimensional array object and a collection of routines for processing those
arrays. In this application, it is used for handling arrays.
b. Matplotlib:
Matplotlib is a cross-platform data visualization and graphical plotting library for
Python. One of the greatest benefits of visualization is that it gives us visual access
to huge amounts of data in easily digestible visuals. In this application, it is used for
plotting graphs.
c. TensorFlow:
TensorFlow is an open-source library developed by Google, primarily for deep
learning applications. In this application, it is used for creating and handling the
model.
d. Keras:
Keras is a powerful and easy-to-use free open-source Python library for developing
and evaluating deep learning models. It wraps the efficient numerical computation
libraries Theano and TensorFlow and allows you to define and train neural network
models in just a few lines of code. In this application, it is used for creating and
handling the model.
e. Scikit-learn:
Scikit-learn (Sklearn) is the most useful and robust library for machine learning in
Python. It provides a selection of efficient tools for machine learning and statistical
modelling including classification, regression, clustering and dimensionality
reduction via a consistent interface in Python. In this application, it is used to plot
the confusion matrix.
f. OS:
The OS module in Python provides functions for interacting with the operating
system. OS comes under Python's standard utility modules. This module provides a
portable way of using operating system-dependent functionality. In this application,
it is used for saving the model.
g. Seaborn:
Seaborn is a Python data visualization library built on top of Matplotlib, providing a
high-level interface for creating attractive and informative statistical graphics. It
simplifies the process of generating complex visualizations by offering a wide range
of built-in themes, color palettes, and statistical plotting functions.
2. Jupyter Notebook
Jupyter Notebook is an open-source web application that allows users to create and
share documents containing live code, equations, visualizations, and narrative text. It
supports multiple programming languages, including Python, R, and Julia, making it
a versatile tool for data analysis, scientific computing, machine learning, and
education. With its interactive interface, users can write and execute code in
individual cells, view the output immediately, and document their analysis in
markdown cells. Jupyter Notebook promotes collaboration and reproducibility by
enabling users to share their work as interactive notebooks, facilitating seamless
communication and knowledge dissemination within the data science community and
beyond.
3. Pycharm
PyCharm is a powerful integrated development environment (IDE) specifically
designed for Python development. Developed by JetBrains, it offers a wide range of
features to enhance productivity and streamline the coding process. PyCharm
provides intelligent code completion, code navigation, and refactoring tools, helping
developers write clean, efficient code. Its robust debugger allows for easy
troubleshooting and testing of Python scripts and applications. PyCharm also offers
integration with version control systems like Git, as well as support for popular web
frameworks like Django and Flask, making it a versatile tool for web development.
With its user-friendly interface and extensive plugin ecosystem, PyCharm is a
preferred choice for both beginner and experienced Python developers looking to
build high-quality Python projects efficiently.
4. HTML, CSS, JS, Bootstrap:
HTML, CSS, and JS are the fundamental building blocks of web development.
HTML (Hypertext Markup Language) provides the structure and content of web
pages, defining elements like headings, paragraphs, and images. CSS (Cascading
Style Sheets) controls the presentation and layout of these elements, allowing
developers to style pages consistently. JavaScript (JS) adds interactivity and
dynamic behavior to web pages. Bootstrap is a front-end framework built on these
technologies that provides ready-made, responsive components for faster
development.
The minimum requirement for the implementation of the system is only one
machine:
CPU: Intel Core i3
RAM: 4GB
Keyboard: Standard 108 keys Enhanced Keyboard
Display: 15” Monitor
Pointing device: Serial Mouse
4. SYSTEM DESIGN
Overall, model planning in this context aims to create a robust predictive
model tailored to the telecom industry's specific needs.
id	Int	10	Number of id
The aim of the telecom customer churn prediction project is to leverage data-driven
insights to understand and predict customer behavior within the telecommunications
industry, specifically focusing on identifying potential churners. Through analysis of
the customer churn prediction dataset, the project aims to uncover patterns and trends
in customer churn rates based on factors such as gender and service types. By
integrating advanced machine learning algorithms and additional functionalities like
bill alerts and payment history viewing, the project seeks to provide telecom
companies with a powerful tool for proactive churn mitigation and enhanced
customer engagement. Ultimately, the goal is to optimize services, foster customer
loyalty, and drive sustainable growth in a competitive telecom landscape through
data-driven decision-making and effective retention strategies.
Accuracy
In our thorough evaluation of the logistic regression model for customer churn
prediction, we conducted an extensive analysis, considering key metrics such as
accuracy scores of multiple algorithms. Upon examination, we found that the logistic
regression model achieved an impressive accuracy score of 0.882 on the dataset,
indicating an overall accuracy of approximately 88.2%. This accuracy metric
represents the proportion of correctly classified instances out of the total number of
instances, offering a comprehensive view of the model's performance. However, it's
important to note that while accuracy provides a general assessment, it may not
capture the intricacies of class-specific predictions. Despite this limitation, the high
accuracy score obtained suggests that the logistic regression model effectively
generalizes patterns within the data, showcasing its proficiency in predicting
customer churn. This performance underscores the reliability and robustness of the
model, providing valuable insights for telecommunications companies aiming to
mitigate customer churn and enhance overall business strategies. Through its
accurate predictions and ability to discern patterns, the logistic regression model
serves as a valuable tool for guiding decision-making processes and optimizing
customer retention efforts in the competitive telecom industry landscape.
Confusion Matrix
The confusion matrix, an essential tool in evaluating classification models, offers a
detailed breakdown of the model's performance across different classes. It illustrates
the number of true negatives (TN), false positives (FP), false negatives (FN), and
true positives (TP). In this case, the confusion matrix reveals that the model
accurately predicted 542 instances of non-churned customers (TN), while incorrectly
classifying 19 instances as churned customers (FP). Additionally, it correctly
identified 20 instances of churned customers (TP) but misclassified 56 instances as
non-churned (FN).
Classification Report
Further analysis through the classification report provides insights into the precision,
recall, and F1-score of the model across different classes. For non-churned customers
(class 0), the model demonstrated high precision (0.91), recall (0.97), and F1-score
(0.94), indicating its effectiveness in accurately predicting non-churned customers.
However, for churned customers (class 1), the model's performance was
comparatively lower, with precision, recall, and F1-score values of 0.51, 0.26, and
0.35, respectively. While the model shows promise in identifying churned customers,
there is potential for improvement in its predictive capabilities for this class.
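These class-level figures follow directly from the confusion-matrix counts reported above; the short computation below reproduces them (values rounded as in the report).

```python
# Counts reported in the confusion matrix above.
TN, FP, FN, TP = 542, 19, 56, 20

precision = TP / (TP + FP)            # 20 / 39
recall    = TP / (TP + FN)            # 20 / 76
f1        = 2 * precision * recall / (precision + recall)
accuracy  = (TP + TN) / (TP + TN + FP + FN)

print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 3))
# → 0.51 0.26 0.35 0.882
```

Note that the 0.882 accuracy quoted earlier is exactly (TP + TN) divided by the total of 637 test instances, confirming the reported metrics are internally consistent.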
ROC Curve
The ROC curve evaluates a binary classification model's performance by plotting the
True Positive Rate (TPR) against the False Positive Rate (FPR) at varying
classification thresholds. TPR measures the proportion of actual positive cases
correctly identified by the model, while FPR represents the proportion of actual
negative cases incorrectly classified as positive. A model with a curve closer to the
top-left corner signifies high TPR and low FPR, indicating better performance. The
Area Under the Curve (AUC) quantifies overall model performance, with higher
values suggesting better discriminative ability. ROC curves are widely used in
machine learning to assess and compare classification models.
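A minimal illustration of computing the ROC curve and AUC with scikit-learn, using made-up labels and churn probabilities (not the project's results):

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Illustrative labels and predicted churn probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9])

# FPR and TPR at each classification threshold, then the area under the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print("AUC:", auc(fpr, tpr))
```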
6. MODEL DEPLOYMENT
Model deployment in deep learning encompasses the process of transitioning a
trained machine learning model from development to practical use in a production
environment, aiming to enable end-users to interact with the model and generate
predictions based on new input data. Central to this process is the focus on designing
and implementing a user-friendly interface for seamless user interaction. In the
present context, the user interface is constructed using HTML, CSS, JavaScript, and
Bootstrap, enhancing the web application's interactivity and usability. The inclusion
of these technologies contributes to a more engaging and intuitive user experience,
with visual aids such as figures providing clarity on the system's interface layout.
The resulting interface is designed to be simple and easy to navigate, with only
essential elements displayed on the pages, ensuring that users can interact with the
deployed model efficiently and effectively.
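The report does not name the web framework behind this interface, so the following Flask sketch is purely illustrative: the route, the JSON field names, and the inline stand-in model (used here instead of loading the persisted one) are all assumptions.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model; in the real system this would be the saved model,
# e.g. joblib.load("churn_model.joblib").
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # list of numeric features
    prob = model.predict_proba([features])[0][1]   # probability of churn
    return jsonify({"churn_probability": round(float(prob), 3)})

# Exercise the endpoint without starting a server.
client = app.test_client()
resp = client.post("/predict", json={"features": [0.1, -0.5, 1.2, 0.3]})
print(resp.get_json())
```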
6.1 UI Design
Fig 6.1.1 Home page
Fig 6.1.3 User-Payment
7. CONCLUSION
In conclusion, our analysis of the telecom customer churn prediction project has
unveiled valuable insights into customer behavior and service preferences within the
telecommunications industry. By delving into the customer churn prediction dataset,
we identified distinct patterns in churn rates based on gender and service types,
shedding light on potential areas for targeted retention efforts. Moreover, our
investigation into profitable service types has informed strategic decision-making,
enabling companies to tailor offerings and maximize revenue. Through the
application of advanced machine learning algorithms and the integration of bill alerts
and payment history viewing functionalities, this project offers telecom companies a
powerful tool for proactive churn mitigation and enhanced customer engagement.
Overall, these findings underscore the importance of data-driven decision-making in
optimizing services, fostering customer loyalty, and driving sustainable growth in a
competitive telecom landscape.
9. APPENDIX
10. REFERENCES
• https://ieeexplore.ieee.org/document/9793315
• https://ieeexplore.ieee.org/document/10303463
• https://ieeexplore.ieee.org/document/9718460
• https://www.kaggle.com/datasets/shrutiarora185/churn
• https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0191-6
• https://www.sciencedirect.com/science/article/pii/S2666720723001443
• https://www.linkedin.com/pulse/predicting-telecom-churn-machine-learning-
approach-asiedu-kingsley
• https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9051585/
• https://link.springer.com/article/10.1007/s10844-022-00739-z