Conference Paper · October 2020
DOI: 10.1109/ICSCAN49426.2020.9262443
IEEE - International Conference on System, Computation, Automation and Networking (ICSCAN), 3 & 4 July 2020
ISBN: 978-1-7281-6201-0

An Optimal Genetic Algorithm with Support Vector Machine for Cloud Based Customer Churn Prediction

S. Venkatesh, Assistant Professor, Dept. of Computer Science, Govt. Arts and Science College, Perumbakkam, Chennai. venkas76@gmail.com
Dr. M. Jeyakarthic, Assistant Director (Academic), Tamil Virtual Academy, Chennai. [email protected]

Abstract— Customer Churn Prediction (CCP) has become a familiar research area and is mainly addressed with machine learning models. Advances in the Internet of Things (IoT) and cloud computing platforms allow customer data to be collected for carrying out CCP. This paper introduces an optimal genetic algorithm (OGA) with a support vector machine (SVM) model for CCP. Initially, the OGA is derived from the double-chain quantum genetic algorithm. Next, the derived OGA is applied to optimize the SVM parameters C and γ. The outcome of the OGA-SVM model is tested against a benchmark dataset from the telecom industry. The experimental outcome indicates that OGA-SVM shows excellent results, offering a sensitivity of 94.50, specificity of 66.06, accuracy of 90.27, F-score of 94.30 and kappa value of 61.17.

Keywords— Customer Churn Prediction, SVM, Genetic algorithm

I. INTRODUCTION

In general, data mining (DM) is defined as the extraction of previously hidden information from massive databases. A DM model enables the conversion of raw data into business information. It is associated with selecting, identifying and labeling large amounts of data to expose hidden patterns for commercial purposes. Rapid market growth in all domains has led to a greater number of service providers. Many participants, novel and creative business techniques, and good services have increased user acquisition in developing markets, and organizations must work hard to retain existing clients and satisfy user needs [1]. In addition, cloud computing (CC) techniques have been employed to provide the dimension of primary users. Thus, the service provider has to work hard at predicting churn.

Nowadays, Customer Relationship Management (CRM) methods replace conventional marketing principles with well-defined marketing procedures. Such personalized marketing services help identify the subset of existing clients who stop using the goods or services provided by a firm. Since user churn results in a loss of profit and customers, even a minor improvement in the retention rate yields meaningful gains for providers and a gradual profit. Usually, CRM focuses on devoted users when making important decisions about churn. To handle user churn effectively, it is very important to create a productive and precise churn prediction method; statistical and DM methods are applied to build such churn prediction applications.

[2] proposed a technique for CCP that finds customers with churn behavior. The method was developed on the basis of k-means clustering and used JRip to extract rules. Its disadvantage is that it cannot be applied in the banking sector to detect user churn, although it offers several other procedures for client retention. [3] presented a user classification method based on rough set theory for classifying user churn, and claimed that the rough set classifier performs more effectively than linear regression, J48, and Neural Network (NN) models. Besides, [4] examined the dimension of social interactions among clients in a telecommunication system. That model detects a user's churn behavior by observing incoming and outgoing calls; on this basis, it can easily be detected that a user has a relationship with another customer who has previously churned. It also remains very complex to decide which predictive model can be fixed as the reference technique for CCP. Although earlier work [5] revealed that SVM is an optimal classifier since it is capable of handling nonlinearities, [6] addressed churn prediction in the telecommunication industry using data certainty.

In addition, [7] described a random sampling framework as an optimization technique applied prior to the classification method. This helps reduce the irregular data distribution that arises from the scarcity of churn-class samples. It has also been reported [8] that class imbalance cannot enhance the performance of a prediction technique. Apart from this, [9] applied weighted random forests (RF), and the derived outcome showed a small enhancement of the CCP method; however, it has been criticized for its difficulty of learning and interpretation [10]. On the other hand, there are few studies reporting the factors involved in modeling a CCP method for the telecommunication industry (TCI).

Here, CCP is treated as a binary classification problem in which users are classified into 2 feasible behaviors, namely Churn and

MVIT, PONDICHERRY 522



Non-Churn. Furthermore, churn behavior is divided into voluntary customer churn, where the user decides to leave a company, and involuntary customer churn, where the firm decides to terminate the contract signed with a user [11]. The study reveals that voluntary churn is very complex to predict, whereas involuntary churn is simple to extract using simple queries. Besides, previous models indicate that there is no single optimal technique to address the CCP issue, owing to the irregular outcomes obtained on different datasets. Hence, there is a major requirement to determine the quality of a classification model with respect to certainty or uncertainty while predicting customer churn.

This paper introduces an optimal genetic algorithm (OGA) with support vector machine (SVM) model for CCP. Initially, the OGA is derived by the use of the double-chain quantum genetic algorithm. Next, the derived OGA is applied to optimize the SVM parameters C and γ. The outcome of the OGA-SVM model is tested against a benchmark dataset from the telecom industry.

II. PROPOSED WORK

A. SVM

Generally, SVM depends upon the model of decision planes that define decision boundaries. A decision plane separates collections of objects that have different class memberships. In the case of linearly separable classes, an optimal classification hyperplane divides the instances into 2 labels. In the case of linearly inseparable problems, samples in the original space are mapped into a high-dimensional feature space with the help of a nonlinear transformation.

For developing the best hyperplane, SVM applies an iterative training procedure that minimizes an error function. A typical SVM classifier can be defined as the arithmetic optimization problem

  min_{w,b,ξ} (1/2)‖w‖² + C Σ_{i=1}^{N} ξ_i
  s.t. y_i(wᵀφ(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, …, N, (1)

where C implies a penalty attribute and ξ_i denotes a slack variable to handle indivisible data. Reducing the objective function of Eq. (1) involves improving the margin between the 2 classes and decreasing the misclassification value; the variable C handles the trade-off between the slack-variable penalty and the margin size. The index i denotes the training samples, y_i are the class labels, and x_i are the independent input parameters. The mapping φ can be applied to convert data from the input space to the feature space. The architecture of SVM is depicted in Fig. 1.

Fig. 1 The optimal classification hyperplane of SVM

Here, a kernel function is mainly employed to convert data from the input space to the feature space. There are 3 types of kernel functions commonly available, namely the linear kernel, the polynomial kernel and the RBF kernel. This study prefers the RBF kernel as the major kernel function of SVM, as given in Eq. (2):

  K(x_i, x_j) = exp(−γ ‖x_i − x_j‖²), (2)

where γ is inversely proportional to the width of the kernel. To resolve Eq. (1) with the RBF kernel function, the dual formulation is provided:

  max_α Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j K(x_i, x_j)
  s.t. 0 ≤ α_i ≤ C, Σ_{i=1}^{N} α_i y_i = 0. (3)

Conventionally, the SVM classification technique applies a basic setting of C and γ for resolving the pattern classification issue, which does not always provide a reasonable classification outcome, as SVM classifiers with diverse settings of C and γ behave differently from one another. Exploring an efficient path to obtain the best attributes C and γ is therefore a very significant requirement to enhance the performance of SVM.

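To make the role of the RBF kernel and the dual variables in Eqs. (1)-(3) of Section II-A concrete, the following is a minimal pure-Python sketch of the kernel K(x, z) and the resulting decision value f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b. The support vectors, multipliers αᵢ and bias b below are illustrative values chosen for the sketch, not trained ones.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2); gamma is
    inversely related to the kernel width, as in Eq. (2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, alphas, labels, b, gamma):
    """Dual-form decision value f(x) = sum_i alpha_i * y_i * K(x_i, x) + b;
    the predicted class is the sign of this value."""
    return sum(a * y * rbf_kernel(sv, x, gamma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

# Tiny hand-made example: one support vector per class.
svs = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
labels = [1, -1]
print(rbf_kernel([0, 0], [1, 0], 0.5))                         # exp(-0.5) ~ 0.6065
print(svm_decision([0.1, 0.0], svs, alphas, labels, 0.0, 0.5) > 0)  # True
```

Note how γ controls locality: a large γ makes K(x, z) decay quickly with distance, so each support vector influences only its neighborhood, which is exactly why the choice of (C, γ) matters so much.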

B. Optimal Genetic Algorithm (OGA)

OGA is a probability searching technique for continuous-space optimization problems. It applies quantum bits to code the chromosomes, probability amplitudes to define possible solutions, and a quantum rotation gate to update the chromosomes.

When the solution of an n-dimensional space optimization problem is considered as a vector X = (x_1, …, x_n), where x_i is a variable of the optimization problem, the continuous optimization issue can be defined as follows:

  min f(x_1, x_2, …, x_n), s.t. a_i ≤ x_i ≤ b_i, i = 1, …, n. (4)

To improve the quality of the proposed solutions, the fitness function (FF) is defined as follows:

  Fit(X) = C_max − f(X), (5)

where C_max denotes an upper estimate of f(X); the larger the amount of FF, the better the solution, and maximizing the fitness is the optimization task.

By assuming the random behaviour of encoding as well as the conditions of quantum-state probability amplitudes, the double-chain coding system is given as follows:

  p_i = [ |cos t_{i1}|, |cos t_{i2}|, …, |cos t_{in}| ;
          |sin t_{i1}|, |sin t_{i2}|, …, |sin t_{in}| ], (6)

where t_{ij} = 2π × rnd (rnd a random number in (0, 1)), i = 1, 2, …, m, j = 1, 2, …, n; m represents the population size and n implies the count of quantum bits. Hence, the first line of a chromosome is named the cosine solution, and the alternate one is named the sine solution.

Obviously, every chromosome consists of a group of n qubits. Later, the solution represented by every chromosome is mapped from the unit space onto the solution space of the continuous optimization issue.

Guided by the recent best individual, the updating process of a qubit chromosome is conducted with the application of the quantum rotation gate depicted in Eq. (7):

  U(Δθ) = [ cos Δθ  −sin Δθ ;
            sin Δθ   cos Δθ ]. (7)

The updating function can be described as follows:

  [ cos(t + Δθ) ; sin(t + Δθ) ] = U(Δθ) [ cos t ; sin t ], (8)

where Δθ represents the rotation angle and U(Δθ) denotes the quantum rotation gate.

Δθ is very important for the convergence quality: the angle sign governs the converging direction, while the angle size decides the speed of convergence. The sign as well as the size of the rotation angle depend upon the gradient of the objective function, which can be estimated with the help of the following technique.

Let A = α_0 β_1 − α_1 β_0, where α_0 and β_0 are the probability amplitudes of the global optimal solution, and α_1 and β_1 are the corresponding probability amplitudes of the recent solution. If A ≠ 0, then the rotation direction is −sgn(A); else (A = 0), either the positive or the negative direction can be adopted.

The rotation angle size may be computed on the basis of the gradient step-size searching theory for a continuous optimization issue. Hence, the size of the rotation angle can be estimated as follows:

  Δθ_{ij} = −sgn(A) · Δθ_0 · exp( −( |∇f(x_{ij})| − ∇f_min ) / ( ∇f_max − ∇f_min ) ), (9)

where Δθ_0 implies the initial rotation angle. The quantities ∇f_max and ∇f_min are described as:

  ∇f_max = max_j | ∂f(X_j) / ∂x_i |,  ∇f_min = min_j | ∂f(X_j) / ∂x_i |. (10)

In the case of a discrete optimization problem, the size of the rotation angle may be computed according to the first-order difference between 2 generations, as defined by Eq. (11):

  ∇f(x_{ij}) = f(X_j^{t}) − f(X_j^{t−1}), (11)

where X_j^{t−1} and X_j^{t} are the parent and offspring chromosomes, correspondingly.

C. Improved SVM Classifier based on OGA

The OGA-based optimization of the penalty variable C as well as the kernel attribute γ of the SVM classifier with the RBF kernel is depicted in Fig. 2, and the resulting model is named OGA-SVM. The OGA-SVM optimization procedure is defined as follows:

1. Specimen collection. Gather the samples of the various classes, and divide the instances into training and testing groups.


2. Parameter initialization. Fix the population size, the mutation probability, the crossover probability, and the number of iterations for OGA. Assign initial values of C and γ for SVM.
3. Parameter optimization. Provide the training group to the SVM classifier; estimate the squared error between the predicted output and the actual value as the objective function of OGA. Determine the FF for every chromosome and acquire the best parameters C and γ by applying mutation, crossover, and selection operations.
4. Test and verification. Give the testing group to the trained SVM classifier that contains the best attributes C and γ, and evaluate the accuracy of the classification process.
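The steps above can be sketched end-to-end. The snippet below is a simplified stand-in, not the paper's exact procedure: it uses the double-chain cosine encoding of Section II-B to search assumed (C, γ) ranges, but it replaces the cross-validated SVM error of step 3 with a toy quadratic objective (optimum at C = 10, γ = 0.5) so the sketch stays self-contained, and it approximates the rotation-gate update of Eqs. (7)-(9) with a fixed-size rotation toward the best chromosome.

```python
import math
import random

def decode(angles, lo, hi):
    """Map qubit angles t to parameter values: the cosine chain
    |cos t| lies in [0, 1] and is scaled into each search range."""
    return [l + (h - l) * abs(math.cos(t))
            for t, l, h in zip(angles, lo, hi)]

def fitness(params):
    """Stand-in for step 3's objective (cross-validated SVM error):
    a toy quadratic whose optimum is C = 10, gamma = 0.5."""
    C, gamma = params
    return -((C - 10.0) ** 2 + 100.0 * (gamma - 0.5) ** 2)

def oga_search(pop_size=20, n_iter=60, theta0=0.02 * math.pi, seed=1):
    random.seed(seed)
    lo, hi = [0.1, 0.001], [100.0, 2.0]      # assumed (C, gamma) ranges
    pop = [[random.uniform(0.0, 2.0 * math.pi) for _ in range(2)]
           for _ in range(pop_size)]
    best = list(max(pop, key=lambda a: fitness(decode(a, lo, hi))))
    for _ in range(n_iter):
        for chrom in pop:
            for j in range(2):
                # Fixed-size rotation toward the best chromosome's angle
                # (a simplification of the sgn(A)-based rule of Eq. (9)).
                chrom[j] += theta0 if best[j] > chrom[j] else -theta0
        cand = max(pop, key=lambda a: fitness(decode(a, lo, hi)))
        if fitness(decode(cand, lo, hi)) > fitness(decode(best, lo, hi)):
            best = list(cand)
    return decode(best, lo, hi)

C_opt, gamma_opt = oga_search()
```

In the full method, `fitness` would train an SVM with the decoded (C, γ) on the training group and return, for example, the negative validation error, and the rotation angle would follow Eqs. (9)-(11) rather than a fixed step.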

III. RESULT ANALYSIS

Table 1 offers a detailed description of the dataset employed for the simulation of the OGA-SVM method, drawn from a telecommunication system. The dataset contains 3333 samples and 21 features. The count of classes available in the database is 2, representing churners and non-churners. Overall, 14.49% of the instances belong to the churner class and the remaining 85.51% of the samples belong to the non-churner category. Fig. 3 illustrates the sample distribution present in every class of the employed dataset.

TABLE I
DATASET DESCRIPTION

Description                      Dataset
Number of Instances              3333
Number of Features               21
Number of Classes                2
Percentage of Positive Samples   14.49%
Percentage of Negative Samples   85.51%
Data source                      [12]

Fig. 2 The flow chart of OGA-SVM

Fig. 3 Frequency Distribution of Dataset for all attributes
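The class balance in Table 1 can be cross-checked directly: with 3333 instances, a churner count of 483 (the integer split consistent with the reported percentages) reproduces the 14.49% / 85.51% figures.

```python
# Verify the churner / non-churner split reported in Table 1.
total = 3333
churners = 483                     # positive (churner) class count
non_churners = total - churners    # 2850 negative samples

churn_pct = round(100.0 * churners / total, 2)
non_churn_pct = round(100.0 * non_churners / total, 2)
print(churn_pct, non_churn_pct)    # 14.49 85.51
```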


TABLE II
PERFORMANCE EVALUATION OF DIFFERENT TRADITIONAL METHODS WITH PROPOSED

Methods       Sensitivity   Specificity   Accuracy   F-Score   Kappa
OGA-SVM       94.50         66.06         90.27      94.30     61.17
Naïve Bayes   92.21         57.98         87.64      92.81     48.44
SVM           93.46         60.13         88.30      93.19     61.40
Vote          85.51         -             85.51      92.18     0

Table 2 and Fig. 4 show the results offered by the OGA-SVM alongside the classical models. It is shown that the Vote model offers the poorest results, attaining a sensitivity of 85.51%, accuracy of 85.51% and F-score of 92.18%. Besides, the NB model has offered a slightly better outcome, with a sensitivity of 92.21%, specificity of 57.98%, accuracy of 87.64%, F-score of 92.81% and kappa value of 48.44%. Along with that, the SVM model has offered near-optimal results, with a higher sensitivity of 93.46%, specificity of 60.13%, accuracy of 88.30%, F-score of 93.19% and kappa value of 61.40%. However, the proposed model shows superior results, offering a maximum sensitivity of 94.50%, specificity of 66.06%, accuracy of 90.27% and F-score of 94.30%, with a kappa value of 61.17%.

Fig. 4 Comparative analysis with traditional methods

For further examination of the optimal result of the OGA-SVM method, a relative investigation is conducted against recently projected methods in terms of accuracy and F-measure, as given in Table 3 and Fig. 5. Upon calculating the final outcome in terms of accuracy, the LDT/UDT-1 as well as LDT/UDT-2 methods accomplish a minimum and identical accuracy of 84.00. A gradually better classification result is provided by LDT/UDT-10, which shows an accuracy of 84.30.

TABLE III
COMPARISON WITH RECENT METHODS FOR APPLIED DATASET IN TERMS OF ACCURACY AND F-SCORE

Methods      Accuracy   F-Measure
OGA-SVM      90.27      94.30
LDT/UDT-1    84.00      57.89
LDT/UDT-2    84.00      54.29
LDT/UDT-3    85.33      54.17
LDT/UDT-4    84.75      55.47
LDT/UDT-5    85.40      56.29
LDT/UDT-6    84.67      54.90
LDT/UDT-7    84.86      57.60
LDT/UDT-8    84.63      58.02
LDT/UDT-9    84.78      56.23
LDT/UDT-10   84.30      56.02

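The measures reported in Tables 2 and 3 all derive from a binary confusion matrix. A minimal sketch follows; the confusion-matrix counts are made up for illustration and are not taken from the paper's experiments.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, F-score and Cohen's kappa
    from a binary (churn vs non-churn) confusion matrix."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)            # recall on churners
    specificity = tn / (tn + fp)            # recall on non-churners
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e)
    return sensitivity, specificity, accuracy, f_score, kappa

sens, spec, acc, f1, kap = binary_metrics(tp=90, fn=10, tn=30, fp=20)
print(round(sens, 2), round(spec, 2), round(acc, 2),
      round(f1, 4), round(kap, 4))   # 0.9 0.6 0.8 0.8571 0.5263
```

This also makes the Vote row of Table 2 legible: a classifier that always predicts the majority (non-churn) class reaches an accuracy equal to the majority share but a kappa of 0, which is why kappa is reported alongside accuracy on this imbalanced dataset.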

Then, the LDT/UDT-8 and LDT/UDT-6 approaches depict a reasonable final outcome compared with the previous models, reaching close accuracy measures of 84.63 and 84.67, correspondingly. Simultaneously, the LDT/UDT-4 and LDT/UDT-7 frameworks show acceptable results, reaching accuracies of 84.75 and 84.86, correspondingly. Afterwards, the LDT/UDT-3 and LDT/UDT-5 methods exhibit better classification results of 85.33 and 85.40, respectively. However, the proposed OGA-SVM technique attains a qualified CCP classification, reaching an optimal accuracy of 90.27. Estimating the simulation outcome in terms of F-measure, the LDT/UDT-3 and LDT/UDT-2 methods obtain the lowest and near-identical F-measures of 54.17 and 54.29. Next, a small increase in classification result is provided by LDT/UDT-6, which gives an F-measure of 54.90. Then, the LDT/UDT-4 and LDT/UDT-10 techniques show a gradually increased outcome over the traditional models, reaching F-measures of 55.47 and 56.02, correspondingly. Simultaneously, the LDT/UDT-9 and LDT/UDT-7 approaches show manageable final outcomes, with F-measures of 56.23 and 57.60, correspondingly. Afterwards, the LDT/UDT-1 and LDT/UDT-8 methods attain good classification results of 57.89 and 58.02. Finally, the projected OGA-SVM technique attains an optimal CCP classification by achieving the greatest F-measure of 94.30.

Fig. 5 Comparative analysis with recently presented models

IV. CONCLUSION

This paper has introduced a new OGA-SVM model for CCP. Initially, the OGA is derived by the use of the double-chain quantum genetic algorithm. Next, the derived OGA is applied to optimize the SVM parameters C and γ. The outcome of the OGA-SVM model is tested against a benchmark dataset from the telecom industry. The experimental outcome indicated that the OGA-SVM shows excellent results, offering a sensitivity of 94.50, specificity of 66.06, accuracy of 90.27, F-score of 94.30 and kappa value of 61.17.

REFERENCES

[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, India, 2006.
[2] V. Veeramanikandan and M. Jeyakarthic, "An ensemble model of outlier detection with random tree data classification for financial credit scoring prediction system," International Journal of Recent Technology and Engineering (IJRTE), vol. 8, no. 3, Sep. 2019.
[3] M. Muhammad, S. Nafis, M. Mohammad, M. Awang, M. Rahman, and M. Deris, "Churn classification model for local telecommunication company based on rough set theory," Journal of Fundamental and Applied Sciences, 9(68), pp. 854–868, 2017.
[4] M. Haenlein, "Social interactions in customer churn decisions: The impact of relationship directionality," International Journal of Research in Marketing, vol. 30, no. 3, pp. 236–248, 2013. http://dx.doi.org/10.1016/j.ijresmar.2013.03.003
[5] I. Brandusoiu and G. Toderean, "Churn prediction in the telecommunications sector using support vector machines," Annals of the Oradea University, Fascicle of Management and Technological Engineering, no. 1, 2013.
[6] A. Amin, F. Al-Obeidat, B. Shah, A. Adnan, J. Loo, and S. Anwar, "Customer churn prediction in telecommunication industry using data certainty," Journal of Business Research, vol. 94, pp. 290–301, 2019.
[7] B. He, Y. Shi, Q. Wan, and X. Zhao, "Prediction of customer attrition of commercial banks based on SVM model," Procedia Computer Science, vol. 31, pp. 423–430, 2014. http://dx.doi.org/10.1016/j.procs.2014.05.286
[8] J. Burez and D. Van den Poel, "Handling class imbalance in customer churn prediction," Expert Systems with Applications, vol. 36, no. 3, pp. 4626–4636, 2009.
[9] J. Burez and D. Van den Poel, Data Mining Concepts and Techniques. Pearson Education Asia Inc., 2012.
[10] Y. Richter and N. Slonim, "Predicting customer churn in mobile networks through analysis of social groups," in Proceedings of the 2010 SIAM International Conference on Data Mining, pp. 732–741, 2010.
[11] N. Lu, H. Lin, J. Lu, and G. Zhang, "A customer churn prediction model in telecom industry using boosting," Journal of Industrial Informatics, vol. 10, no. 2, pp. 1–7, 2014.
[12] http://www.sgi.com/tech/mlc/db/ (last accessed: November 30, 2017).
