Literature Survey On Customer Churn Prediction
Abstract: This paper reviews work on customer churn prediction using machine learning techniques. Customer churn prediction is significant in many areas, such as banking, telecom, online applications and game providers, so the review focuses on the methodologies rather than on the areas of application. Along with the review of significant papers in the field, the methodologies used in them are summarized. The paper thus gives an overview of research on customer churn prediction over the last few years.
I. INTRODUCTION
Customer churn refers to customers switching from one service provider to another. Reasons for churn range from problems with the service provider to cancellation by mistake. Since every field has numerous service providers, customer churn is an important issue to deal with. Customer Relationship Management (CRM) handles issues with customers and acts on the insights from customer churn analysis. Churn prediction identifies the customers who are likely to end their service with the current provider soon. Once the prediction is made, the reasons for their churn are analyzed and CRM takes the necessary actions to retain those customers. Companies are concerned with churn prediction for two reasons. First, the churn of a large number of customers affects the reputation and reliability of the service provider. Second, churn is costly: acquiring a new customer costs five to six times as much as retaining an existing one.
Customer churn prediction has applications in many fields, though telecommunications is the most explored one. There are numerous service providers in telecommunications, and as their number has grown, competition among them has produced schemes that tempt customers to switch frequently. Another field is banking, where customers may close their accounts for various reasons. Churn analysis has recently found its way into the online application and gaming sectors, where companies try to retain their users with attractive offers and rewards. Beyond these, churn prediction can be applied in any field where a company provides customers with a continuous service.
This paper is concerned with predicting customer churn using machine learning techniques. Customers churn for various reasons. Reasons such as terminating a service at the end of a fixed contract period or cancelling by mistake are not addressed with machine learning. Voluntary termination due to unsatisfactory customer service, a sudden price increase, a lack of offers over a long period and similar causes are the ones that machine learning prediction deals with. Machine learning is the study of algorithms that enable machines to perform a specific task without human intervention. The machines are trained on a specific set of features and then perform the task using the patterns learned during training. Supervised machine learning deals mainly with two types of tasks, regression and classification. Predicting churning customers is a classification problem, in which each customer is classified as either churn or non-churn. Numerous classification algorithms are in use, with hybrid and boosting algorithms often going a step further than the rest. This paper reviews the approaches taken to predict customer churn with various machine learning methods.
Customer churn prediction with machine learning cannot be confined to classification algorithms, since algorithmic modelling is only one part of the whole process. Data has long been called the fuel of machine learning, and rightly so. Data collection is therefore an important step, and it requires a data source with relevant and useful features. The collected data must then be transformed into a usable form; this step is called data processing and includes data wrangling, cleaning, scaling, normalizing and standardizing. The processed data is then used for modelling. After modelling, optimization techniques are applied and the model is evaluated with various measures. Prediction with machine learning is thus far more than finding the right classification algorithm.
This paper reviews papers published on customer churn prediction using various machine learning techniques. The next section reviews the literature, and the following section summarizes the methodologies used in the reviewed papers, subdivided into the algorithms used, preprocessing and optimization techniques, and evaluation criteria. The last section presents the analysis of the reviewed papers and the conclusion.
II. LITERATURE REVIEW
This section reviews papers on customer churn prediction using machine learning techniques. Papers published in the last ten years are reviewed and analyzed based on the methodologies used.
Kiran Dahiya and Surbhi Bhatia [1] describe the churn analysis process. Data mining is a knowledge discovery process in which new insights are derived from a large set of data. Churn refers to customers switching from one service provider to another, in services such as telecommunications or banking, and churn analysis estimates each customer's probability of churning. The analysis consists of the stages of data acquisition, data preparation, data pre-processing, data extraction and decision. The model is implemented using a decision tree and logistic regression, and the resulting insights are passed to CRM to take the actions needed to reduce the churn rate. Of the two implementations, the decision tree shows higher accuracy and efficiency.
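Decision trees, used in [1] and in several other reviewed papers, grow by repeatedly choosing the split that best separates churners from non-churners. The sketch below shows only that core step for a single numeric feature, assuming the CART-style Gini impurity criterion (C4.5 and C5.0 use the gain ratio instead). The feature, the data and the helper names are hypothetical; this is not the implementation of [1].

```python
from typing import List, Tuple

def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (1 = churn)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def best_threshold(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    n = len(values)
    best = (float("nan"), float("inf"))
    # The largest distinct value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: complaint calls per customer and churn label.
calls = [0, 1, 1, 2, 4, 5, 6]
churn = [0, 0, 0, 0, 1, 1, 1]
print(best_threshold(calls, churn))  # (2, 0.0): customers with more than 2 calls churned
```

A full tree applies this search to every feature, splits on the best one, and recurses on each side until the leaves are pure or a depth limit is reached.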
Navid Forhad, Md. Shahriar Hussain and Rashedur M Rahman [2] discuss the process of predicting churners. The paper explains input data collection, methods of analysis, filtering and rule generation. The result is generated based on the frequency of bill payment. Possible obstacles in churn analysis are also discussed. The paper concludes that a complete dataset is necessary for accurate churn prediction, and that comparing methods and mining tools are further areas to consider.
IJRAR2002049 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org 347
© 2020 IJRAR March 2020, Volume 7, Issue 1 www.ijrar.org (E-ISSN 2348-1269, P- ISSN 2349-5138)
Nadeem Ahmad Naz, Umar Shoaib and M. Shahzad Sarfraz [3] focus on explaining the concept of churn prediction. Because markets such as telecommunications are saturated, the main objective of service providers is to retain existing customers rather than to acquire new ones, and data mining techniques can be used for this customer churn analysis. The phases of data mining involve selection, pre-processing, data mining, transformation and evaluation. Churners can be classified as voluntary or involuntary, and voluntary churners as either deliberate or incidental; data mining techniques are applied mainly to the analysis of deliberate churners. The choice among the various modelling techniques depends on the objective of the analysis: decision trees and Support Vector Machines are used to find the true churn rate and false churn rate, while logistic regression is used to find the churn probability.
I. M. M. Mitkees, S. M. Badr and A. I. B. ElSeddawy [4] use data from the IBM Watson Analytics community and perform the data mining phases on it. The techniques include classification for prediction, and clustering and association for detection. Classification is done in MATLAB using decision tree, logistic regression and SVM. For clustering, the authors use the k-means and DBSCAN algorithms. Association rules are mined with Apriori-type and FP-Growth algorithms in the Weka mining software. The efficiency of the algorithms is presented in tables and graphs.
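Apriori and FP-Growth, used in [4], both search for itemsets whose support exceeds a threshold and then keep the rules whose confidence is high. The sketch below computes only those two measures for one candidate rule; it is not Apriori or FP-Growth itself, and the customer records and attribute names are hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]

def support(itemset: set, transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent: set, consequent: set, transactions: List[Transaction]) -> float:
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / base if base else 0.0

# Hypothetical customer records encoded as sets of attribute values.
data = [
    frozenset({"high_bill", "many_complaints", "churn"}),
    frozenset({"high_bill", "many_complaints", "churn"}),
    frozenset({"high_bill", "no_complaints"}),
    frozenset({"low_bill", "no_complaints"}),
]
rule_a, rule_c = {"high_bill", "many_complaints"}, {"churn"}
print(support(rule_a | rule_c, data))    # 0.5
print(confidence(rule_a, rule_c, data))  # 1.0
```

Apriori avoids scoring every possible rule by extending only itemsets whose subsets are already frequent, while FP-Growth compresses the transactions into a prefix tree.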
David L. Garcia, Alfredo Vellido and Angela Nebot [5] describe the design and development of a predictive model in four stages. The first stage is identifying and obtaining the best data, since the data used determines the accuracy and efficiency of the prediction. The second stage is the selection of attributes. It has two phases, a search phase and an evaluation phase, and the data used for the two phases should be different to avoid overfitting. The third stage is the development of the predictive model, for which standard methods such as decision trees and regression analysis, or other soft computing methods, can be used. In the fourth stage, the result is validated by dividing the whole dataset in a 70:30 ratio, where 70% of the data is used for training the model and the remaining 30% for testing.
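The 70:30 hold-out validation described in [5] can be sketched as a shuffled split. The helper name `holdout_split` and the fixed seed are illustrative assumptions (this is not scikit-learn's `train_test_split`); the customer IDs are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def holdout_split(rows: Sequence[T], train_ratio: float = 0.7, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle rows and split them into training and test sets (70:30 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]

customers = list(range(100))  # hypothetical customer IDs
train, test = holdout_split(customers)
print(len(train), len(test))  # 70 30
```

Because churners are usually a minority, a stratified variant that splits churners and non-churners separately keeps the churn rate equal in both parts; the plain shuffle above does not guarantee that.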
N. L. R. Machado and D. D. A. Ruiz [6] propose a churn prediction method based on mobile application usage, motivated by companies' interest in understanding how customers use their applications. To analyze customer behavior, customers are grouped by their activity patterns. Prediction is carried out with algorithms such as K-Means, STREAM and DBSCAN, among which K-Means performs best. Further tests show an accuracy of 87%, which is satisfactory.
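Grouping customers by activity pattern with K-Means, as in [6], can be sketched with Lloyd's algorithm: assign each customer to the nearest centroid, move each centroid to the mean of its customers, and repeat until nothing changes. The two usage features and the data are hypothetical, and the sketch is not the method of [6].

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, iters: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]  # start from k data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical usage features: (sessions per week, minutes per session).
usage = [(1, 2), (1.5, 1.8), (1, 1), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = kmeans(usage, k=2)
print(labels)  # the first three customers share one cluster, the last three the other
```

The cluster indices themselves are arbitrary; only the grouping is meaningful, which is why a cluster must be interpreted (for example as "low activity") before it can inform churn prediction.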
Abinash Mishra and U. Srinivasulu Reddy [7] compare ensemble-based classifiers, namely Bagging, Boosting and Random Forest, with well-known classifiers for customer churn prediction in the telecom industry. The classifiers are compared based on error rate, specificity, sensitivity and accuracy, and Random Forest performs best among them. The authors also compare Random Forest with an existing churn prediction model, Classification and Regression Tree (CART), and its variants; here too Random Forest achieves higher performance.
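Bagging, one of the ensembles compared in [7], trains each base classifier on a bootstrap sample (drawn with replacement) and combines them by majority vote. In the sketch below the base learner is a hypothetical one-feature threshold rule (a decision stump) to keep the example short; the studies reviewed here use full decision trees, and the data is hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, label) with label 1 = churn

def fit_stump(data: Sequence[Sample]) -> Callable[[float], int]:
    """Fit the rule 'predict 1 if x > t', choosing t to minimise training errors."""
    best_t, best_err = float("-inf"), len(data) + 1
    for t in sorted({x for x, _ in data}):
        err = sum(1 for x, y in data if (1 if x > t else 0) != y)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: 1 if x > t else 0

def bagging(data: Sequence[Sample], n_models: int = 11, seed: int = 0) -> Callable[[float], int]:
    """Train n_models stumps on bootstrap samples and predict by majority vote."""
    rng = random.Random(seed)
    models: List[Callable[[float], int]] = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in range(len(data))]  # sampling with replacement
        models.append(fit_stump(boot))

    def predict(x: float) -> int:
        votes = Counter(m(x) for m in models)  # an odd n_models avoids ties for 0/1 labels
        return votes.most_common(1)[0][0]

    return predict

# Hypothetical monthly charges and churn labels.
data = [(20, 0), (25, 0), (30, 0), (35, 0), (70, 1), (80, 1), (90, 1), (95, 1)]
model = bagging(data)
print(model(10), model(100))  # 0 1
```

Random Forest extends this idea by also restricting each split to a random subset of the features, which further decorrelates the trees.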
G. Xia, H. Wang and Y. Jiang [8] build a model suited to the large volume and class imbalance of churn data and verify it on real telecom data. Compared with Bayes, Decision Tree (DT), Artificial Neural Network (ANN) and Support Vector Machine (SVM) classifiers, the ensemble learning algorithms show potential advantages. The benefit of the ensemble is most evident when the base classifiers are Support Vector Machines, giving a better hit rate, lift coefficient and accuracy, so it can serve as an effective method for customer churn prediction. The experiments are conducted in MATLAB 2011a; the decision tree uses the classical C4.5 algorithm, the Bayes model is Naive Bayes, and the ANN is trained with the back-propagation (BP) algorithm and has a single hidden layer.
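The lift coefficient reported in [8] measures how much better a model concentrates churners at the top of its ranking than random selection would. A minimal sketch, assuming the common top-fraction definition (the churn rate among the highest-scored customers divided by the overall churn rate); [8] may define it differently, and the scores and labels are hypothetical.

```python
from typing import Sequence

def lift_at(scores: Sequence[float], labels: Sequence[int], fraction: float = 0.1) -> float:
    """Churn rate among the top `fraction` of customers by score, divided by the overall churn rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Hypothetical churn scores for ten customers, two of whom actually churned.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(lift_at(scores, labels, 0.2))  # 5.0: the top 20% is five times richer in churners
```

A lift of 1 means the model is no better than contacting customers at random, which is why lift is a natural measure when a retention campaign can reach only a fraction of the customer base.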
P.K.D.N.M. Alwis, B.T.G.S. Kumara and H.A.C.S. Hapuarachchi [9] build a predictive churn model to obtain the customer churn rate of five telecommunication companies. For model building, the relevant variables are identified using the Pearson chi-square test, cluster analysis and association rule mining. Models are developed with the C5.0 decision tree, Bayesian network, logistic regression and neural network algorithms, and the C5.0 decision tree proves to be the best of them, with 85 percent accuracy.
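The Pearson chi-square test used in [9] checks whether a categorical variable is associated with churn by comparing observed counts with the counts expected under independence. The sketch computes the statistic for a contingency table; the contract-type table is hypothetical.

```python
from typing import Sequence

def chi_square(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a table of observed counts (no all-zero rows or columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = contract type, columns = (churned, stayed).
contract_vs_churn = [[30, 70],   # month-to-month
                     [10, 90]]   # yearly
print(chi_square(contract_vs_churn))  # 12.5
```

The table has (2 - 1)(2 - 1) = 1 degree of freedom, and the 5% critical value for one degree of freedom is 3.841, so a statistic of 12.5 would mark contract type as a variable worth keeping.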
Kamya Eria and Booma Poolan Marikannan [10] identify Support Vector Machines, Naive Bayes, Decision Trees and Neural Networks as the most commonly used customer churn prediction (CCP) techniques. Feature selection is the most commonly used data preparation method, followed by normalization and noise removal. Support Vector Machines and Neural Networks are the most preferred prediction techniques. However, ensembles of these techniques improve prediction accuracy because they combine the advantages of their components.
A. Saran and D. Chandrakala [11] review machine learning algorithms used for churn prediction across various sectors. The paper treats churn prediction as a binary classification task with two classes, churners and non-churners. It identifies the main reasons for churn as dissatisfaction with customer service, high costs, unattractive plans and poor support. The reviewed papers mainly use SVM and various boosting and ensemble-based algorithms. The paper concludes by recommending SVM combined with boosting algorithms, which showed remarkable accuracy.
Shini Renjith [12] proposes a framework for proactive detection of customer churn based on a support vector machine and a hybrid recommendation strategy. While the SVM predicts e-commerce customer churn, the recommendation strategy suggests personalized retention actions. The SVM uses the kernel trick, in which every dot product is replaced by a nonlinear kernel function. Highly personalized recommendations are generated from the individual profile, past transactional traits, demographic information, behavioral patterns of similar users and similarity functions. The proposed framework thus uses a hybrid approach to finalize the retention strategy.
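The kernel trick mentioned in [12] lets an SVM work in a higher-dimensional feature space without computing that space explicitly. The sketch checks this for a degree-2 polynomial kernel, whose value equals the dot product of an explicit three-dimensional feature map, and also shows the Gaussian (RBF) kernel. Both kernels and the vectors are illustrative choices, not necessarily the kernel used in [12].

```python
import math
from typing import Sequence, Tuple

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def phi(v: Tuple[float, float]) -> Tuple[float, float, float]:
    """Explicit degree-2 feature map for 2-D inputs: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Homogeneous polynomial kernel k(a, b) = (a . b)^2, computed without phi."""
    return dot(a, b) ** 2

def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian kernel exp(-gamma * ||a - b||^2); its feature space is infinite-dimensional."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(a, b))   # 121.0, i.e. (1*3 + 2*4)^2
print(dot(phi(a), phi(b)))  # the same value up to floating-point rounding
```

Since an SVM's training and prediction depend on the data only through dot products, substituting `poly2_kernel` or `rbf_kernel` for `dot` yields a nonlinear decision boundary in the original feature space.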
B. Mishachandar and Kakelli Anil Kumar [13] propose a customer churn model that predicts both the likelihood and the time of churn. The paper presents a novel approach that combines machine learning algorithms and big data analytics tools with a retention technique called targeted proactive retention. The model uses Naïve Bayes and decision tree classifiers, and the entire experiment is performed in Azure Workbench. The paper details the experimental process in Azure and concludes that churn prediction models combined with retention strategies are the preferred approach.
A. Idris and A. Khan [14] propose a churn prediction model that exploits the power of feature selection. The approach is evaluated on two standard telecom datasets using 10-fold cross-validation. The model transforms the feature set into discriminative features using the minimum Redundancy Maximum Relevance (mRMR) method and then constructs diverse ensemble members from them. Higher accuracy is attained by combining the ensemble members (Random Forest, Rotation Forest and KNN) through majority voting. Among these, Rotation Forest achieves high accuracy by employing Principal Component Analysis.
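Majority voting, the combination rule in [14], assigns each customer the label predicted by most ensemble members. The sketch takes already-computed predictions; the three prediction lists are hypothetical stand-ins for the outputs of Random Forest, Rotation Forest and KNN, not results from [14].

```python
from collections import Counter
from typing import List, Sequence

def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists (one list per model) into one label per sample."""
    combined = []
    for votes in zip(*predictions):  # votes = labels given to one sample by all models
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions of three ensemble members for five customers (1 = churn).
rf = [1, 0, 1, 0, 1]
rof = [1, 1, 0, 0, 1]
knn = [0, 1, 1, 0, 0]
print(majority_vote([rf, rof, knn]))  # [1, 1, 1, 0, 1]
```

With binary labels, an odd number of members guarantees that no tie can occur; with an even number, `Counter.most_common` would break a tie in favour of the label seen first.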
Mumin Yildiz and Songul Albayrak [15] use down-sampling and Rotation Forest for efficient customer churn prediction. Down-sampling creates subsets containing x churn customers and 2x non-churn customers. Although rotation forest works on a principle similar to random forest, the data used to train each decision tree is determined by Principal Component Analysis. The results of rotation forest are compared with Ant-Miner+ and C4.5, and Rotation Forest shows better results.
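The x:2x down-sampling in [15] can be sketched as keeping every churner and drawing twice as many non-churners at random without replacement. The helper name, the seed and the customer records are hypothetical; calling the function with different seeds would produce several such subsets.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[str, int]  # (customer id, label) with label 1 = churn

def downsample(rows: Sequence[Row], ratio: int = 2, seed: int = 0) -> List[Row]:
    """Keep all churners and `ratio` times as many randomly chosen non-churners."""
    churners = [r for r in rows if r[1] == 1]
    stayers = [r for r in rows if r[1] == 0]
    n_keep = min(len(stayers), ratio * len(churners))
    rng = random.Random(seed)
    return churners + rng.sample(stayers, n_keep)  # sample() draws without replacement

# Hypothetical data set: 10 churners and 90 non-churners.
rows = [(f"c{i}", 1 if i < 10 else 0) for i in range(100)]
subset = downsample(rows)
print(sum(lab for _, lab in subset), len(subset))  # 10 30
```

Down-sampling discards many non-churner records, which is the price paid for a training set in which the minority churn class is no longer swamped.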
III. METHODOLOGIES
The reviewed papers use various methodologies: along with classification algorithms, they apply various preprocessing methods and evaluation criteria. This section discusses the most significant of them.
3.1 Preprocessing
Preprocessing is the most important part of any machine learning modelling, since how well the data is preprocessed determines the accuracy and efficiency of the model. The preprocessing techniques used in the reviewed papers are discussed briefly here.
1. Data Wrangling: It is the process of converting data from one format to another to ease analysis and modelling.
2. Data Cleaning: Cleaning deals with missing values and noisy data, for example by filling missing values with the mean, correcting errors or removing outliers.
3. Data Transformation: This step converts data into a format suitable for mining and modelling. It includes scaling and normalizing, which bring the features into a common range, and standardizing, which converts the data to a common scale (typically zero mean and unit variance). Creating new attributes from existing ones is another transformation method.
4. Data Reduction: This stage deals with large volumes of data and large numbers of features. The two main reduction techniques are feature selection, where only the features significant to the model are kept, and dimensionality reduction, where the number of dimensions of the data is reduced.
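Three of the preprocessing steps listed above, mean imputation for cleaning and min-max normalization and standardization for transformation, are sketched below on a single hypothetical column. The function names are illustrative, and standardization here uses the population standard deviation.

```python
import statistics
from typing import List, Optional, Sequence

def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Data cleaning: replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values: Sequence[float]) -> List[float]:
    """Normalization: rescale to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]

def standardize(values: Sequence[float]) -> List[float]:
    """Standardization: zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

monthly_charges = [20.0, None, 60.0, 100.0]  # hypothetical column with one missing value
filled = impute_mean(monthly_charges)        # [20.0, 60.0, 60.0, 100.0]
print(min_max_scale(filled))                 # [0.0, 0.5, 0.5, 1.0]
print(standardize(filled))
```

In practice the mean, minimum, maximum and standard deviation should be computed on the training set only and then reused on the test set, so that no information leaks from the test data into the model.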
3.2 Algorithms
Machine learning algorithms are divided mainly into supervised learning, unsupervised learning and reinforcement learning. In supervised learning the model is trained with labelled data, whereas in unsupervised learning training is done without labelled data. Reinforcement learning is learning from experience: the model learns from the rewards it receives for the actions it has taken. Customer churn prediction is a supervised classification problem. In classification, each data point is assigned to one of a set of specified classes; here, each customer is classified as either churn or non-churn. The classification algorithms used in the reviewed papers are described below.
1. Logistic Regression: This algorithm predicts a discrete value (0 or 1) from a given set of independent variables. It estimates the probability that an event occurs by fitting the data to a logistic function, that is, by modelling the log-odds (logit) of the event as a linear function of the variables.
2. Support Vector Machine: In SVM, each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Classification is then performed by finding the hyperplane that separates the two classes with the largest margin.
3. Naïve Bayes: It is a classification algorithm derived from Bayes' theorem that assumes the features are conditionally independent given the class; it is often used with high-dimensional data.
4. Decision Tree: It is a classification algorithm that models data as a tree by recursively dividing the data into smaller subsets, producing a final tree of decision nodes and leaf nodes.
5. Random Forest: It is a combination of many decision trees, where each tree is trained on a bootstrap sample of the data using random subsets of the features, and the prediction is the majority vote of these weakly correlated trees.
6. Bagging Algorithms: Bagging is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (by voting or by averaging) to form a final prediction.
7. Boosting Algorithms: Boosting refers to a family of algorithms that combine weak learners into a strong learner. Weak rules are found iteratively by applying a base algorithm to differently weighted distributions of the data, each iteration generating a new weak prediction rule.
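Logistic regression (item 1) can be sketched from scratch: the churn probability is the sigmoid of a linear score, and the weights are fitted by batch gradient descent on the mean log-loss, whose gradient is the average of (p - y) times each feature. The learning rate, epoch count and toy data are illustrative assumptions; the reviewed papers rely on library implementations.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # p - y
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated churn probability; label 1 (churn) if it exceeds a 0.5 threshold."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical standardized feature (e.g. complaints) and churn labels.
X = [[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [1.2]), predict_proba(w, b, [-1.2]))  # high, then low
```

The 0.5 threshold is only a default; in churn prediction it is often lowered so that more likely churners are flagged for retention at the cost of more false alarms.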
3.3 Evaluation Criteria
The reviewed papers evaluate their models with the following measures, treating churn as the positive class.
1. Accuracy: It is the ratio of the number of correct predictions to the total number of input samples.
2. Error rate: It is the proportion of incorrect predictions, i.e., 1 - accuracy.
3. Specificity: It is the proportion of actual negatives that are correctly predicted as negative (true negative rate).
4. Sensitivity: It is the proportion of actual positives that are correctly predicted as positive (true positive rate).
5. Precision: It is the fraction of relevant instances among the retrieved instances, i.e., the proportion of predicted churners who actually churn.
6. Recall: It is the fraction of relevant instances that are retrieved; for a binary classifier it equals sensitivity.
7. ROC Curve: It is a graph showing the performance of a classification model at all classification thresholds by plotting the True Positive Rate against the False Positive Rate.
8. AUC: It is the area under the entire ROC curve and provides an aggregate measure of performance across all possible classification thresholds.
9. F Score: It is the weighted harmonic mean of precision and recall; with equal weights it is the F1 score, 2 x precision x recall / (precision + recall).
10. Confusion Matrix: It is a summary of the prediction results of a classification problem, in which the numbers of correct and incorrect predictions are given as counts broken down by class.
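The measures listed above can all be derived from the four confusion-matrix counts, except AUC, which needs scores rather than hard labels. The sketch computes AUC with the rank (Mann-Whitney) formulation: the fraction of churner/non-churner pairs in which the churner receives the higher score, with ties counted as half. The labels, predictions and scores are hypothetical.

```python
from typing import Dict, Sequence

def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN and FN with label 1 = churn (positive class)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
    }

def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    tp, fp, tn, fn = c["TP"], c["FP"], c["TN"], c["FN"]
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # identical to sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }

def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking; ties count as half a win."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical results for eight customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
m = metrics(y_true, y_pred)
print(m["accuracy"], m["specificity"])  # 0.75 0.8
print(auc(y_true, scores))              # about 0.933 (14 of 15 pairs ranked correctly)
```

On imbalanced churn data, accuracy alone can look high even for a model that never predicts churn, which is why sensitivity, precision, F score and AUC are reported alongside it.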
IV. CONCLUSION
Customer churn prediction is an application of machine learning that has been explored extensively in recent years. This paper presented an overview of the machine learning techniques used for customer churn prediction. Regarding algorithms, hybrid algorithms outperform single models in many cases. Another important point is that boosting algorithms provide significant improvements in performance. The paper also summarized the methodologies used in modelling.
REFERENCES
[1] K. Dahiya and S. Bhatia, "Customer churn analysis in telecom industry," 2015 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), 2015.
[2] Navid Forhad, Md. Shahriar Hussain and Rashedur M Rahman, "Churn Analysis: Predicting Churners," IEEE, 2014.
[3] Nadeem Ahmad Naz, Umar Shoaib and M. Shahzad Sarfraz, "A review on customer churn prediction data mining modelling techniques," Indian Journal of Science and Technology, vol. 11, no. 27, July 2018.
[4] I. M. M. Mitkees, S. M. Badr and A. I. B. ElSeddawy, "Customer churn prediction model using data mining techniques," 2017 13th International Computer Engineering Conference (ICENCO), 2017.
[5] David L. Garcia, Alfredo Vellido and Angela Nebot, "Predictive Models in Churn Data Mining."
[6] N. L. R. Machado and D. D. A. Ruiz, "Customer: A novel customer churn prediction method based on mobile application usage," 2017 13th International Wireless Communications and Mobile Computing Conference (IWCMC), Valencia, 2017, pp. 2146-2151.
[7] Abinash Mishra and U. Srinivasulu Reddy, "A Comparative Study of Customer Churn Prediction in Telecom Industry Using Ensemble Based Classifiers," Proceedings of the International Conference on Inventive Computing and Informatics (ICICI 2017).
[8] G. Xia, H. Wang and Y. Jiang, "Application of customer churn prediction based on weighted selective ensembles," 2016 3rd International Conference on Systems and Informatics (ICSAI), Shanghai, 2016, pp. 513-519.
[9] P.K.D.N.M. Alwis, B.T.G.S. Kumara and H.A.C.S. Hapuarachchi, "Customer Churn Analysis and Prediction in Telecommunication for Decision Making," International Conference on Business Innovation (ICOBI), 25-26 August 2018.
[10] Kamya Eria and Booma Poolan Marikannan, "Systematic Review of Customer Churn Prediction in the Telecom," Journal of Applied Technology and Innovation, 2018.
[11] A. Saran and D. Chandrakala, "A Survey on Customer Churn Prediction using Machine Learning Techniques," International Journal of Computer Applications, vol. 154, pp. 13-16, 2016, doi: 10.5120/ijca2016912237.
[12] Shini Renjith, "B2C E-Commerce Customer Churn Management: Churn Detection using Support Vector Machine and Personalized Retention using Hybrid Recommendations," International Journal on Future Revolution in Computer Science & Communication Engineering, ISSN: 2454-4248, vol. 3, no. 11.
[13] B. Mishachandar and Kakelli Anil Kumar, "Predicting customer churn using targeted proactive retention," International Journal of Engineering & Technology, 2018.
[14] A. Idris and A. Khan, "Ensemble Based Efficient Churn Prediction Model for Telecom," 2014 12th International Conference on Frontiers of Information Technology, Islamabad, 2014, pp. 238-244.
[15] Mumin Yildiz and Songul Varlı, "Customer churn prediction in telecommunication," 2015 23rd Signal Processing and Communications Applications Conference (SIU 2015), 2015.