Olatunji 2017
Abstract— Extreme Learning Machines (ELM) and Support Vector Machines (SVM) have recently become two of the most widely used machine learning techniques for both classification and regression problems. The comparison of ELM and SVM has therefore often caught the attention of researchers. In this work, an attempt has been made at investigating how SVM and ELM compare on the unique and important problem of email spam detection, which is a classification problem. The importance of email in the present age cannot be overemphasized; hence the need to promptly and accurately detect and isolate unsolicited mails through a spam detection system. Empirical results from experiments carried out using a very popular dataset indicate that both techniques outperformed the best earlier published techniques on the same dataset. SVM performed better than ELM in terms of accuracy, but in terms of speed of operation, ELM outperformed SVM significantly.

Keywords— Extreme Learning Machines; Support Vector Machines; Email; Spam; Non-Spam; Spam detector; computational intelligence

I. INTRODUCTION

Extreme Learning Machines (ELM) were recently introduced as a learning algorithm for training single hidden-layer feed-forward neural networks, and within a very short time ELM has distinguished itself as one of the most widely used and successful techniques for both prediction and classification problems [1]–[4]. Meanwhile, Support Vector Machines (SVM), which have their origin in statistical learning theory, have distinguished themselves as among the best machine learning techniques of recent times, with successful applications in several fields, for both classification and regression, often with excellent results [5]–[9]. The comparison of ELM and SVM for classification and regression problems has consequently caught the attention of several researchers [10]–[12]. In this paper, the comparison between ELM and SVM is investigated for the unique and important problem of email spam detection, which happens to be a classification problem, in order to achieve better accuracy in the detection process.

It is an established fact that electronic mail (email) has become extremely popular. In fact, hardly can people do without sending or receiving email messages several times daily, within minutes or hours. In essence, the importance of email in present-day life is very clear and self-convincing. Despite the huge benefits of email, its usage has unfortunately been bedeviled by the huge presence of unsolicited and sometimes fraudulent messages that have often caused severe damage to individuals and corporate establishments, both psychologically and financially. To mitigate the effect of such email abuses, there has always been the need to promptly detect and isolate such unwanted emails through what is popularly referred to as a spam detection system. Spam detection separates spam from non-spam emails, thereby making it possible to prevent spam from reaching users' inboxes. Thus, spam detection is the first and most important stage of the email filtering process, particularly in this age in which bulk mailing tools have pushed up the amount of spam email in a skyrocketing manner.

Several spam detection models have been proposed and tested in the literature, but the reported accuracies still leave room for improvement. The authors in [13] made use of an artificial neural network based model for spam detection but only succeeded in achieving 86% accuracy, which is still considered far from ideal. In [14], the authors applied a naive Bayes approach, incorporating cost-sensitive multi-objective Genetic Programming for feature extraction, and achieved an accuracy of 79.3% correctly detected email types. In [15], the authors presented a spam detection system based on interval type-2 fuzzy sets, exploring the capabilities of type-2 fuzzy logic, but only succeeded in obtaining a detection accuracy of 86.9% on the testing set. The authors in [16] made use of a genetic algorithm based hybrid on the same dataset but were only able to push the testing accuracy to 90%. More recent is the work of [17], where a hybrid model combining the negative selection algorithm (NSA) with particle swarm optimization (PSO) was presented; it pushed the testing accuracy to 91.22%. Considering the work done so far and the accuracies obtained to date, it is clear that there is still the need to explore the possibility of achieving better results on the same popular dataset. This work is thus set to come up with
A. Extreme Learning Machines (ELM)

Extreme learning machine (ELM) is a learning algorithm for single-hidden-layer feed-forward neural networks (SLFNs) which chooses its hidden nodes randomly and then determines the output weights of the SLFN analytically [18], [19]. This learning algorithm was proposed as a means of overcoming the perennial problems of classical feed-forward neural networks (FFNN), particularly their slow gradient-based learning algorithms and the iterative tuning of their parameters. ELM is tuning-free and avoids the time-consuming gradient-based learning algorithms altogether. Further details of this interesting technique can be found in [18], [19].

The aim of the SLFN is to minimize the difference between the network output (o_j) and the target (t_j). This can be expressed mathematically as:

\sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N \quad (2)

where w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T is the weight vector that connects the i-th hidden neuron and the input neurons, \beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T is the weight vector that connects the i-th hidden neuron and the output neurons, and b_i represents the threshold of the i-th hidden neuron. The "\cdot" in w_i \cdot x_j stands for the inner product of w_i and x_j. It must be noted that in the case of email spam, the inputs x_i are the 57 independent variables. That is, there are inputs x_{ij}, i = 1, \ldots, N; j = 1, \ldots, M, and targets t_i, the class labels indicating whether sample i is a spam email or not, where N is the number of data points and M = 57 is the number of independent variables. Details about the dataset are presented later in Section III.

The N equations above can be written compactly as H\beta = T (3), where

H(w_1, \ldots, w_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}}, x_1, \ldots, x_N) =
\begin{bmatrix}
g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\
\vdots & \ddots & \vdots \\
g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}})
\end{bmatrix}_{N \times \tilde{N}} \quad (4a)

\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}_{\tilde{N} \times m}
\quad \text{and} \quad
T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m} \quad (4b)

As proposed by Huang and Babri in [20], [21], H is referred to as the output matrix of the neural network. Based on the aforementioned, the training procedure for the ELM-based classifier can be summarized in the following algorithmic steps. See [18], [19] for further details on the workings of the ELM algorithm.

B. Support Vector Machines (SVM)

Support vector machines (SVM) is a statistically based machine learning technique with a unique ability to model complex relationships among variables [22]. It uniquely addresses the curse of dimensionality through generalization control. The curse of dimensionality often limits the performance of machine learning techniques when few data samples are available, but SVM has distinguished itself as a technique able to perform excellently even with few data samples [22]–[24]. In support vector machines, the formulation leads to a global quadratic optimization problem with box constraints, which is readily solved by interior point methods [22], [25]. SVM is uniquely empowered, through its kernel functions, to map non-separable problems to higher dimensions where they become easily separable.

Generally, in prediction and classification problems, the purpose is to determine the relationship among the set of input and output variables of a given dataset D = \{Y, X\}, where X \in R^{n \times p} represents the n-by-p matrix of the p input variables, also known as predictors or independent variables, and Y = (y_1, \ldots, y_n)^T the output variables. The lower-case letters x_{i1}, x_{i2}, \ldots, x_{ip} for all i = 1, \ldots, n refer to the values of each observation of the input variables, and y = k assigns the response variable Y to class A_k for k = 1, 2, \ldots, c, where c \geq 2; in this case of spam detection, c = 2, representing the two class labels spam and non-spam.

In what follows, the basic ideas behind SVM for pattern recognition, especially for the two-class classification problem, are briefly described; the reader is referred to [22], [25] for a full description of the technique.

According to [22], [25], the goal of two-class SVM is to construct a binary classifier, or derive a decision function from the available samples, which has a small probability of misclassifying a future sample. SVM implements the following idea: it maps the input vectors x \in R^d into a high-dimensional feature space \Phi(x) \in H (see figure 1) and constructs an Optimal Separating Hyperplane (OSH), which maximizes the margin, that is, the distance between the hyperplane and the nearest data points of each class in the space H (see figure 2). Different mappings construct different SVMs. The mapping \Phi(\cdot) is performed by a kernel function K(x_i, x_j) which defines an inner product in the space H. The decision function implemented by SVM can be written as [22], [25]:

f(x) = \mathrm{sgn}\left( \sum_{i=1}^{N} y_i \alpha_i \, K(x, x_i) + b \right) \quad (5)

where the coefficients \alpha_i are obtained by solving the following convex Quadratic Programming (QP) problem:

\text{Maximize} \quad \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)

\text{Subject to} \quad 0 \leq \alpha_i \leq C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad i = 1, 2, \ldots, N. \quad (6)
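To make the ELM training procedure described above concrete, the following is a minimal sketch, not the study's actual implementation (which used the MATLAB codes of [28]): the hidden weights w_i and biases b_i are drawn at random, the hidden-layer output matrix H of Eq. (4a) is formed, and the output weights beta are obtained analytically as the Moore–Penrose pseudo-inverse solution of H beta = T. The toy XOR dataset and all parameter values here are illustrative assumptions.

```python
import numpy as np

def elm_train(X, T, n_hidden=20, seed=0):
    """Train a single-hidden-layer ELM: random (w_i, b_i), analytic beta."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights w_i
    b = rng.normal(size=n_hidden)                # random hidden thresholds b_i
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix H (N x n_hidden)
    beta = np.linalg.pinv(H) @ T                 # output weights: beta = pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Class label = sign of the network output."""
    return np.sign(np.tanh(X @ W + b) @ beta)

# Toy two-class problem (XOR), labels in {-1, +1} like spam / non-spam
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([-1., 1., 1., -1.])
W, b, beta = elm_train(X, T)
preds = elm_predict(X, W, b, beta)  # H has full row rank here, so H @ beta fits T exactly
```

Note that training is a single pseudo-inverse computation with no iterative tuning, which is precisely why ELM is so much faster than gradient-based training and than SVM's quadratic programming.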
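Equations (5) and (6) can likewise be sketched end-to-end. In this illustrative sketch, the dual QP of Eq. (6) is solved with a generic constrained optimizer (scipy's SLSQP, standing in for the interior-point methods mentioned above; the study itself used the SVM toolbox of [26]), and the resulting alpha_i and bias b are plugged into the decision function of Eq. (5). The linear kernel, C value, and tiny dataset are assumptions for demonstration only.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set, labels y_i in {-1, +1}
X = np.array([[0., 0.], [1., 0.], [0., 1.], [2., 2.], [3., 3.], [2., 3.]])
y = np.array([-1., -1., -1., 1., 1., 1.])
C = 10.0
K = X @ X.T                              # linear kernel: K(x_i, x_j) = x_i . x_j
Q = (y[:, None] * y[None, :]) * K        # Q_ij = y_i y_j K(x_i, x_j)
n = len(y)

# Eq. (6): maximize sum(a) - 0.5 a^T Q a  s.t.  0 <= a_i <= C,  sum_i a_i y_i = 0
res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),
               np.zeros(n),
               jac=lambda a: Q @ a - np.ones(n),
               bounds=[(0.0, C)] * n,
               constraints={'type': 'eq', 'fun': lambda a: a @ y})
alpha = res.x

# Bias b from a margin support vector (one with 0 < alpha_i < C)
sv = np.argmax((alpha > 1e-6) & (alpha < C - 1e-6))
b = y[sv] - (alpha * y) @ K[:, sv]

def f(x):
    """Eq. (5): f(x) = sgn( sum_i y_i alpha_i K(x, x_i) + b )."""
    return np.sign((alpha * y) @ (X @ x) + b)

preds = np.array([f(x) for x in X])      # all six training points recovered
```

Only the points nearest the separating hyperplane end up with nonzero alpha_i (the support vectors), so the decision function of Eq. (5) depends on a small subset of the training data.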
III. EMPIRICAL STUDIES

For the empirical work, the popular dataset [27] earlier used by several researchers was acquired. Computational intelligence methodologies and procedures based on SVM and ELM were then followed to arrive at the final outcome of the empirical work.

... spam or not. After successfully training and validating the SVM model, the testing set that had been kept away from the system was then presented, to be used for the actual testing that determines the performance accuracy of the proposed system. Similar procedures were also followed for the ELM model.
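One point worth making explicit before the results: the improvement figures reported in this paper (3.11%, 2.0%, 1.1%) are relative improvements over the baseline accuracy, not absolute percentage-point differences (e.g., 94.06 − 91.22 = 2.84 points, but 2.84/91.22 ≈ 3.11%). A quick check reproduces them from the raw accuracies:

```python
def rel_improvement(new_acc, base_acc):
    """Relative improvement of new_acc over base_acc, in percent."""
    return (new_acc - base_acc) / base_acc * 100.0

svm_vs_prior = round(rel_improvement(94.06, 91.22), 2)  # SVM vs best prior (NSA-PSO): 3.11
elm_vs_prior = round(rel_improvement(93.04, 91.22), 2)  # ELM vs best prior: 2.0
svm_vs_elm = round(rel_improvement(94.06, 93.04), 1)    # SVM vs ELM: 1.1
```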
... during the testing phase, ELM also achieved the shortest possible time compared to SVM.

B. Results of ELM and SVM compared with earlier works

Furthermore, in order to make the improvement provided by the two proposed models clearer, their accuracies are compared below with those of other earlier published schemes implemented on the same dataset, to support the discussion of the results.

TABLE II. COMPARISON OF TESTING ACCURACY FOR THE PROPOSED SVM AND ELM BASED SPAM DETECTORS WITH OTHER EARLIER PUBLISHED CLASSIFIERS ON THE SAME DATASET

Classifiers                      Accuracy (%)
Proposed SVM-spam detector       94.06
Proposed ELM-spam detector       93.04
NSA-PSO [17]                     91.22
PSO [17]                         81.32
NSA [17]                         68.86
BART [29]                        79.3
IT2 Fuzzy Set [15]               86.9

From the results in Table II above, it can clearly be seen that the two proposed SVM and ELM based spam detectors outperformed all the classifiers earlier applied to the same dataset, as cited in [17]. Specifically, the proposed SVM and ELM spam detectors achieved improvements of 3.11% and 2.0% respectively over the best of the earlier schemes, the hybridized negative selection algorithm–particle swarm optimization (NSA-PSO) proposed in [17]. NSA-PSO hybridized NSA with PSO in order to achieve better accuracy [17], yet both the ELM and SVM based classifiers in this work outperformed all three schemes, including the hybrid, perhaps due to the systematic parameter search procedures implemented here, coupled with the excellent reputation of both ELM and SVM in various previous research findings.

Thus, it is very clear from the accuracies presented in Table II that both the ELM and SVM based spam detectors implemented in this work outperformed earlier schemes applied to the same dataset. This work has further corroborated the often reported superior performance of SVM and ELM models in various fields of application [7], [9], [24], [30], [31].

V. CONCLUSION

In this work, ELM and SVM, two very popular and recently successful computational intelligence techniques, have been compared on the problem of email spam detection. The two models were proposed, trained and tested using a popular, often used standard database. Empirical results from the simulations indicate that the proposed SVM based scheme outperformed ELM in terms of accuracy, while ELM outperformed SVM in terms of speed of operation. Specifically, SVM achieved 94.06% testing accuracy while ELM achieved 93.04%, indicating just a 1.1% performance improvement of SVM over ELM. Since the improvement offered by SVM over ELM is minimal, it is suggested that in situations where time of detection is very important, as in real-time systems, the ELM spam detector should be given preference over the SVM spam detector. This is in tandem with the outcomes of previous research [11], [32], [33] comparing SVM with ELM in other problem areas: in all these cases, ELM has demonstrated a speed advantage over SVM, while SVM often achieves better accuracy than ELM, though only by a minimal margin or at par.

Furthermore, the testing accuracies of the two compared systems (ELM and SVM) were also compared with those of other recently published spam detector schemes tested on the same popular database used in this study. The need for a better and more accurate email spam detector scheme is definitely germane; hence the two spam detector models implemented in this research came appropriately and timely as two improved schemes over the best of the previous methods used on the same dataset. In fact, the best accuracy reported in the literature was 91.22% testing accuracy [17], meaning that SVM and ELM achieved 3.11% and 2.0% improvements respectively over the best reported model on the same dataset. Thus, the encouraging outcomes recorded in this work further corroborate the unique reputation of SVM and ELM as two very viable and reliable prediction and classification tools with excellent performance in different fields of application. As a result of the promising results achieved in this work, efforts shall next be made to investigate possible means of improving the performance further, while also exploring the unique capabilities of ELM and SVM classifiers in other germane areas where accurate prediction or classification outcomes are highly desirable, for instance in biomedical predictions to save lives and facilitate preemptive diagnosis of diseases. Finally, the comparison of ELM and SVM models in different fields has caught the attention of researchers in recent times, and this work presents the first attempt at comparing ELM and SVM on the problem of email spam detection. Hopefully this work will spur further efforts in this direction.

Acknowledgment

The author would like to acknowledge the University of Dammam, Kingdom of Saudi Arabia, for some of the facilities utilized during the course of this research.

2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE)

References

[1] T. Mantoro, A. Olowolayemo, and S. O. Olatunji, Mobile user location determination using extreme learning machine. IEEE, 2010, pp. D25–D30.

[2] S. O. Olatunji, I. A. Adeleke, and A. Akingbesote, "Data Mining Based on Extreme Learning Machines for the Classification of Premium and Regular Gasoline in Arson and Fuel Spill Investigation," J. Comput., vol. 3, no. 3, pp. 130–136, 2011.

[3] S. O. Olatunji, Z. Rasheed, K. A. Sattar, A. M. Al-Mana, M. Alshayeb, and E. A. El-Sebakhy, "Extreme Learning Machine as Maintainability Prediction model for Object-Oriented Software Systems," J. Comput., vol. 2, no. 8, pp. 42–56, Aug. 2010.

[4] S. O. Olatunji, A. Selamat, and A. Abdulraheem, "A hybrid model through the fusion of type-2 fuzzy logic systems and extreme learning machines for modelling permeability prediction," Inf. Fusion, vol. 16, pp. 29–45, Mar. 2014.

[5] K. O. Akande, T. O. Owolabi, and S. O. Olatunji, "Investigating the effect of correlation-based feature selection on the performance of support vector machines in reservoir characterization," J. Nat. Gas Sci. Eng., vol. 22, pp. 515–522, Jan. 2015.

[6] E. A. El-Sebakhy, "Forecasting PVT properties of crude oil systems based on support vector machines modeling scheme," J. Pet. Sci. Eng., vol. 64, no. 1–4, pp. 25–34, 2009.

[7] T. O. Owolabi, K. O. Akande, and S. O. Olatunji, "Application of computational intelligence technique for estimating superconducting transition temperature of YBCO superconductors," vol. 43, pp. 143–149, 2016.

[8] T. O. Owolabi, K. O. Akande, and S. O. Olatunji, "Estimation of Surface Energies of Transition Metal Carbides Using Machine Learning Approach," Int. J. Mater. Sci. Eng., pp. 104–119, Jun. 2015.

[9] M. O. Ibitoye, N. A. Hamzaid, A. K. Abdul Wahab, N. Hasnan, S. O. Olatunji, and G. M. Davis, "Estimation of electrically-evoked knee torque from mechanomyography using support vector regression," Sensors (Switzerland), vol. 16, no. 7, 2016.

[10] Q. Xu, "A Comparison Study of Extreme Learning Machine and Least Squares Support Vector Machine for Structural Impact Localization," Math. Probl. Eng., vol. 2014, pp. 1–8, 2014.

[11] G.-J. Cheng, L. Cai, and H.-X. Pan, "Comparison of Extreme Learning Machine with Support Vector Regression for Reservoir Permeability Prediction," in 2009 International Conference on Computational Intelligence and Security, 2009, pp. 173–176.

[12] S. O. Olatunji, "Comparison of Extreme Learning Machines and Support Vector Machines on Premium and Regular Gasoline Classification for Arson and Oil Spill Investigation," Asian J. Eng. Sci. Technol., vol. 1, no. 1, pp. 1–7, 2011.

[13] L. Özgür, T. Güngör, and F. Gürgen, "Spam Mail Detection Using Artificial Neural Network and Bayesian Filter," pp. 505–510, 2004.

[14] Y. Zhang, H. Li, M. Niranjan, and P. Rockett, "Applying Cost-Sensitive Multiobjective Genetic Programming to Feature Extraction for Spam E-mail Filtering," Springer Berlin Heidelberg, 2008, pp. 325–336.

[15] R. Ariaeinejad and A. Sadeghian, "Spam detection system: A new approach based on interval type-2 fuzzy sets," in 2011 24th Canadian Conference on Electrical and Computer Engineering (CCECE), 2011, pp. 000379–000384.

[16] F. Temitayo, O. Stephen, and A. Abimbola, "Hybrid GA-SVM for Efficient Feature Selection in E-mail Classification," vol. 3, no. 3, 2012.

[17] I. Idris and A. Selamat, "Improved email spam detection model with negative selection algorithm and particle swarm optimization," Appl. Soft Comput., vol. 22, pp. 11–27, Sep. 2014.

[18] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," in International Joint Conference on Neural Networks (IJCNN2004), vol. 2, Budapest, Hungary, 2004, pp. 985–990.

[19] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[20] G.-B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions," IEEE Trans. Neural Networks, vol. 9, no. 1, pp. 224–229, 1998.

[21] G. B. Huang and H. A. Babri, "Feedforward neural networks with arbitrary bounded nonlinear activation functions," IEEE Trans. Neural Netw., vol. 9, no. 1, pp. 224–229, 1998.

[22] C. Cortes and V. Vapnik, "Support vector networks," Mach. Learn., vol. 20, pp. 273–297, 1995.

[23] T. O. Owolabi, K. O. Akande, and S. O. Olatunji, "Development and validation of surface energies estimator (SEE) using computational intelligence technique," Comput. Mater. Sci., vol. 101, pp. 143–151, Apr. 2015.

[24] A. A. Adewumi, T. O. Owolabi, I. O. Alade, and S. O. Olatunji, "Estimation of physical, mechanical and hydrological properties of permeable concrete using computational intelligence approach," Appl. Soft Comput., vol. 42, pp. 342–350, Feb. 2016.

[25] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.

[26] S. Canu, Y. Grandvalet, V. Guigue, and A. Rakotomamonjy, "SVM and Kernel Methods Matlab Toolbox," Perception Systemes et Information, INSA de Rouen, Rouen, France, 2008.

[27] M. Hopkins, E. Reeber, G. Forman, and J. Suermondt, "SpamBase Dataset," Hewlett-Packard Labs, 1501 Page Mill Rd., Palo Alto, CA 94304, 1999.

[28] G.-B. Huang, "MATLAB Codes of ELM Algorithm," http://www.ntu.edu.sg/home/egbhuang/ELM_Codes.htm, 2006.

[29] S. Abu-Nimeh, D. Nappa, X. Wang, and S. Nair, "Bayesian Additive Regression Trees-Based Spam Detection for Enhanced Email Privacy," in 2008 Third International Conference on Availability, Reliability and Security, 2008, pp. 1044–1051.

[30] T. O. Owolabi, K. O. Akande, and S. O. Olatunji, "Computational intelligence method of estimating solid-liquid interfacial energy of materials at their melting temperatures," J. Intell. Fuzzy Syst., 2016.

[31] T. O. Owolabi, M. Faiz, S. O. Olatunji, and I. K. Popoola, "Computational intelligence method of determining the energy band gap of doped ZnO semiconductor," vol. 101, pp. 277–284, 2016.

[32] S. A. Mahmoud and S. O. Olatunji, "Automatic recognition of off-line handwritten Arabic (Indian) numerals using support vector and extreme learning machines," Int. J. Imaging, vol. 2, no. 9A, pp. 34–53, 2009.

[33] S. O. Olatunji and H. Arif, "Identification of Erythemato-Squamous Skin Diseases Using Extreme Learning Machine and Artificial Neural Network," ICTACT J. Soft Comput., Oct. 2013.