
An incremental ensemble of classifiers as a technique for prediction of student’s career choice


Ms. Roshani Ade
Research Scholar, Dept. of Computer Science and Engg,
Sipna College of Engineering and Technology, Amravati,
Amravati University, Maharashtra, India
[email protected]

Dr. P. R. Deshmukh
Professor, Dept. of Computer Science and Engg,
Sipna College of Engineering and Technology, Amravati,
Amravati University, Maharashtra, India
[email protected]

Abstract— The ability to predict the career of students can be beneficial to a large number of techniques connected with the education system. Students’ marks and the results of psychometric tests conducted on students can form the training set for supervised data mining algorithms. As the amount of student data in educational systems increases day by day, incremental learning properties are important for machine learning research. In contrast to a classical batch learning algorithm, an incremental learning algorithm tries to discard unrelated information while training on new instances. Combining classifiers, which is nothing but taking more than one opinion, contributes a lot to obtaining more accurate results. Therefore, an incremental ensemble of three classifiers, namely Naïve Bayes, K-Star and SVM, combined by a voting scheme, is suggested. The ensemble technique proposed in this paper is compared with the same incremental algorithms used without any ensemble concept on the student data set, and it was found that the proposed algorithm gives better accuracy.

Keywords— incremental learning, ensemble, voting scheme.

I. INTRODUCTION

Nowadays, choosing the right career is one of the most important aspects of a student’s learning process, and it is difficult to choose the right career option when a large number of options are available. It is very important to consider interest, talent, and the projected growth or sustainability of a particular area before choosing a career. It is commonly seen that many students have a poor academic record simply because they chose their career without considering their own capabilities, which causes a waste of time and money, so it is important to choose the right career in the first place. It is also observed that psychological parameters have an impact on choosing the right career option. Psychometric tests are therefore conducted on the students, and the students are classified in order to suggest the right career option for them.

In the supervised machine learning approach, a model is built by using predefined labeled instances, which produces a model hypothesis; that hypothesis is then used to predict new incoming instances. When multiple hypotheses that support the final decision making process are integrated together, this is called ensemble learning [1]. The ensemble learning technique has the benefit of better accuracy compared to a model that generates a single hypothesis, and it has therefore attracted increasing interest in the computational intelligence community [2]-[6]. Ensemble learning mainly focuses on two issues related to the hypotheses formed during training. Firstly, how the hypotheses can be developed in a principled way, which includes approaches like bagging, boosting, subspace methods, stacked generalization and the mixture of experts. Secondly, how the hypotheses will be integrated; for this, a number of rules such as the geometric average rule, the arithmetic average rule, the majority voting rule and the median value rule can be used. In this paper, for the incremental learning concept, the mixture-of-experts ensemble is combined by using the majority voting rule.
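As an illustration of two of these integration rules (the two used later in the experiments), the following sketch, which is not taken from the paper, combines the outputs of three hypothetical classifiers by majority voting and by averaging their class-probability vectors; the labels and probabilities are made up for the example:

```python
import numpy as np

def majority_vote(labels):
    """Return the label predicted by most classifiers (ties broken by lowest label)."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

def average_of_probabilities(prob_vectors):
    """Average the class-probability vectors and return the most probable class index."""
    return int(np.argmax(np.mean(prob_vectors, axis=0)))

# Hypothetical outputs of three classifiers for one test sample and three classes.
labels = ["career_B", "career_A", "career_B"]
probs = [np.array([0.2, 0.7, 0.1]),
         np.array([0.5, 0.3, 0.2]),
         np.array([0.1, 0.8, 0.1])]
print(majority_vote(labels))            # career_B
print(average_of_probabilities(probs))  # class index 1
```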
In most cases data is being generated continuously, and batch learning is then meaningless because it cannot handle the endlessly produced data. For larger training sets batch algorithms are not feasible, since they are expensive from the examining and processing point of view. The ensemble-of-classifiers approach, which is generally used for improving the generalization accuracy of a classifier, can also be used to tackle the issue of incremental learning.

An incremental learning algorithm has the ability to learn from new data coming into the system even after the classifier has already been generated from the previously available data. It produces a sequence of hypotheses for the sequence of training samples, where the most recent hypothesis describes all the data accumulated thus far and always depends on the preceding hypothesis and the recent training data [7]-[15]. An incremental learning algorithm should learn the novel information, and it should preserve formerly acquired knowledge without accessing the formerly seen data.

In this paper, an incremental ensemble is proposed which is a combination of three classifiers that can work incrementally, namely Naïve Bayes, K-Star and SVM, combined using the majority voting methodology.

This paper is organized as follows. Section II discusses batch and incremental learning, the ensemble and incremental learning techniques in the literature, and the applications of the incremental learning concept. Section III describes the proposed architecture with its methodology and the dataset used. Section IV discusses the experimentation and results of the proposed incremental learning concept on the student dataset. Finally, Section V gives the conclusion of the work and its future scope.

Fig 1. The proposed incremental ensemble

II. BATCH LEARNING AND INCREMENTAL LEARNING

Handling the massive amount of data is one of the challenges in data mining, and online and batch algorithms can be used for it. In batch learning, an algorithm has a set of examples which can be used to classify test instances; in this case there is no modification of the hypothesis once the model is built. In the case of online learning, the algorithm continually modifies its hypothesis as it receives samples: the model repeatedly receives a pattern, predicts its class, and then the hypothesis is modified accordingly.

Incremental learning is useful in many applications such as computer security, market basket analysis and intelligently acting user interfaces. For example, students’ career preferences change as new courses or facilities become available, so the learning algorithm needs to obtain the set of concept descriptions from training data that is scattered over time [16], [17]. Algorithms for coping with concept drift, where the definition of a class changes as time passes, must converge quickly and precisely to the new target concept, and they should be efficient in terms of space and time complexity. Some of the important characteristics of incremental learning algorithms are:

• While training, it should require a small, stable amount of time per sample.
• There should be only one sample at a time in memory, so a fixed amount of memory is used.
• It should build the model by scanning the database only once.
• It should preserve previously obtained knowledge.

The incremental learning algorithm thus processes each training sample one at a time and maintains a current hypothesis.
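To make this protocol concrete, the following is a minimal sketch of single-pass incremental training. It uses scikit-learn's partial_fit purely as an illustration (the paper's experiments use the Witten and Frank / Weka implementations), and the data is randomly generated as a placeholder for the 300 psychometric records:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))      # placeholder for the 10 psychometric attributes
y = rng.integers(0, 7, size=300)    # placeholder for the 7 career classes

model = GaussianNB()
classes = np.arange(7)              # all classes must be declared up front
for xi, yi in zip(X, y):            # one sample in memory at a time, single pass over the data
    model.partial_fit(xi.reshape(1, -1), [yi], classes=classes)

print(model.predict(X[:5]))
```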
III. PROPOSED INCREMENTAL ENSEMBLE

Ensemble learning is a machine learning approach in which a number of classifiers is used for training, instead of a single machine learning classifier where only one hypothesis is generated from the training data. An ensemble method constructs a number of hypotheses by using a number of classifiers, and these hypotheses are combined by using a voting rule, here the majority voting rule. As the majority voting rule is used, we expect to get good results, because the decisions of a number of experts are counted.

In the proposed ensemble, three supervised algorithms are used, namely Naïve Bayes, K-Star and SVM.

A. Naïve Bayes

Naïve Bayes is a simpler form of Bayesian network, as it assumes that each feature is independent of the remaining features. The naïve Bayes algorithm is usually used for batch learning, although it does not perform its operations well when the algorithm handles each training sample separately, as described in [18]-[21]. The algorithm can, however, be trained incrementally. As per the characteristics of an incremental learning algorithm, the naïve Bayes algorithm can be trained in one pass only, using the steps below:

1. Initialize the counts and the total to 0.
   a. Go through all the training samples, one sample at a time.
   b. Each training sample t(x, y) has its label associated with it.
   c. Increment the count values as it goes through the particular training sample.
2. The conditional probability is calculated by dividing the individual count by the number of training samples of the same class attribute.
3. Compute the prior probabilities p(y) as the fraction of all training samples which are in class y.
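A minimal count-based sketch of this one-pass procedure, assuming discrete (binned) attribute values and simple Laplace smoothing; this is an illustrative helper written for this description, not the Weka implementation used in the experiments:

```python
from collections import defaultdict

class IncrementalNaiveBayes:
    def __init__(self):
        self.class_count = defaultdict(int)                       # samples seen per class
        self.feat_count = defaultdict(lambda: defaultdict(int))   # per-class (feature index, value) counts
        self.total = 0

    def update(self, x, y):
        """Step 1: go through one training sample (x, y) and increment its counts."""
        self.total += 1
        self.class_count[y] += 1
        for i, v in enumerate(x):
            self.feat_count[y][(i, v)] += 1

    def predict(self, x):
        """Steps 2-3: combine the prior p(y) with the per-feature relative counts."""
        best, best_score = None, float("-inf")
        for y, n_y in self.class_count.items():
            score = n_y / self.total                               # prior p(y)
            for i, v in enumerate(x):
                # Laplace-smoothed likelihood; denominator assumes binary-valued features.
                score *= (self.feat_count[y][(i, v)] + 1) / (n_y + 2)
            if score > best_score:
                best, best_score = y, score
        return best

nb = IncrementalNaiveBayes()
for x, y in [((1, 0), "A"), ((1, 1), "A"), ((0, 1), "B")]:
    nb.update(x, y)
print(nb.predict((1, 0)))   # expected: "A"
```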



TABLE I. Histogram of all the attributes, total score and the class
(score ranges per attribute: high / medium / low)

A (Self Awareness):       11 and above / 4 to 10 / 3 and below
B (Empathy):              15 and above / 7 to 14 / 6 and below
C (Self Motivation):      18 and above / 9 to 17 / 8 and below
D (Emotional Stability):  11 and above / 4 to 10 / 3 and below
E (Managing Relations):   12 and above / 5 to 11 / 4 and below
F (Integrity):            8 and above / 4 to 7 / 3 and below
G (Self Development):     6 and above / 2 to 5 / 1 and below
H (Value Orientation):    6 and above / 2 to 5 / 1 and below
I (Commitment):           6 and above / 2 to 5 / 1 and below
J (Altruistic Behavior):  6 and above / 2 to 5 / 1 and below

B. K-Star

K-Star is an instance-based learner, in which the class of a test sample is decided by using the class labels of the training samples through a similarity function; it uses an entropy-based distance function [22]. The probability that instance x belongs to class y is calculated by summing the probabilities from sample x to each training sample that is a member of y. The probability for each class is calculated using equation (1), and the relative probabilities give an estimate of the class distribution at the point x of the sample space.

P^*(y|x) = \sum_{x_1 \in y} P^*(x_1|x)        (1)
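A simplified sketch of equation (1): the score of each class is the sum, over its training samples, of a probability derived from the distance to the test sample. K* proper uses the entropic distance of [22]; here an exponential of the Euclidean distance stands in for P*(x1|x), so the snippet only illustrates the summation, not the actual K* measure:

```python
import numpy as np

def kstar_like_scores(x, X_train, y_train):
    """Score each class by summing a similarity-derived probability over its members (eq. 1)."""
    scores = {}
    for xi, yi in zip(X_train, y_train):
        p = np.exp(-np.linalg.norm(x - xi))   # stand-in for P*(xi|x); K* uses an entropic distance
        scores[yi] = scores.get(yi, 0.0) + p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}   # relative probability per class

X_train = np.array([[11.0, 15.0], [12.0, 16.0], [3.0, 6.0]])
y_train = ["Class1", "Class1", "Class2"]
print(kstar_like_scores(np.array([11.5, 15.5]), X_train, y_train))
```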
C. SVM

The training of an SVM requires the solution of a quadratic programming optimization problem. The amount of memory needed to train the dataset using SVM is linear in the number of samples, which allows SVM to handle very large data sets [23]-[27]. The SVM implementation used here globally replaces all missing values and transforms nominal attributes into binary ones. The training can be done with polynomial or RBF kernels, and multiclass problems are solved using pairwise classification.
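A hedged illustration of an equivalent configuration using scikit-learn's SVC (the paper used the Weka implementation): an RBF kernel and pairwise (one-vs-one) multiclass classification on placeholder data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))      # placeholder attribute scores
y = rng.integers(0, 7, size=300)    # placeholder career classes

# RBF (or polynomial) kernel; the 7-class problem is handled by pairwise (one-vs-one) classifiers.
clf = SVC(kernel="rbf", decision_function_shape="ovo")
clf.fit(X, y)
print(clf.predict(X[:5]))
```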
The proposed incremental ensemble learning algorithm is created by using the set of three classifiers, namely Naïve Bayes, K-Star and SVM. Whenever new samples arrive, they pass through all the algorithms, a prediction is accepted from each model, and in this incremental setting the hypotheses h1, h2 and h3 produced by the individual learners are combined; the final hypothesis h is calculated using the voting strategy given in equation (2). The proposed ensemble is shown in Fig. 1.
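A compact sketch of the architecture of Fig. 1 under stated substitutions: GaussianNB stands in for Naïve Bayes, a k-nearest-neighbour learner is a rough replacement for the instance-based K-Star, and SGDClassifier with hinge loss acts as an incrementally trainable linear SVM. The actual experiments used Weka implementations, so this is only an approximation of the proposed ensemble, run on randomly generated placeholder data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import SGDClassifier

class VotingIncrementalEnsemble:
    """Three learners updated on incoming data; predictions combined by majority vote (eq. 2)."""
    def __init__(self):
        # KNeighborsClassifier stands in for K-Star, SGDClassifier (hinge loss) for an incremental SVM.
        self.nb = GaussianNB()
        self.knn = KNeighborsClassifier(n_neighbors=5)
        self.svm = SGDClassifier(loss="hinge", random_state=0)
        self.X_seen, self.y_seen = [], []

    def update(self, X_new, y_new, classes):
        self.nb.partial_fit(X_new, y_new, classes=classes)
        self.svm.partial_fit(X_new, y_new, classes=classes)
        # The instance-based member simply stores every sample it has seen so far.
        self.X_seen.extend(X_new)
        self.y_seen.extend(y_new)
        self.knn.fit(np.array(self.X_seen), np.array(self.y_seen))

    def predict(self, X):
        votes = np.stack([self.nb.predict(X), self.knn.predict(X), self.svm.predict(X)])
        # Majority vote over the three hypotheses for each test sample.
        return np.array([np.bincount(votes[:, j]).argmax() for j in range(votes.shape[1])])

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
y = rng.integers(0, 7, size=300)
ens = VotingIncrementalEnsemble()
for start in range(0, 300, 50):                 # data arriving in batches of 50
    ens.update(X[start:start + 50], y[start:start + 50], classes=np.arange(7))
print(ens.predict(X[:5]))
```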
The proposed algorithm is parallelized over the different learning algorithms, so it can predict the career choice of a student accurately while also increasing its speed. As the system takes advantage of parallel or distributed execution, its efficiency in terms of speed and time is very good, which broadens the collection of applications where such efficiency is required.

Data Set Used

The dataset is created by conducting a psychometric test on 300 students of age group 16 to 20. The dataset therefore contains 300 samples, with the 10 attributes described in Table I and 7 classes, which denote the categories of interest of the student [28]. All attributes are numeric, except the class attribute. The class value depends on the total score of a student: the total score is the sum of the scores of all 10 attributes, and there are 7 classes depending on the total score of the student.
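The mapping from attribute scores to the class label can be sketched as follows; the paper does not give the seven cut-points on the total score, so the thresholds below are purely hypothetical placeholders used only to illustrate the mapping:

```python
import numpy as np

# Hypothetical cut-points; the paper only states that 7 classes are derived from the total score.
HYPOTHETICAL_CUTS = [30, 45, 60, 75, 90, 105]   # 6 cut-points -> 7 interest categories

def career_class(attribute_scores):
    """Sum the 10 attribute scores (Table I) and bin the total into one of 7 classes."""
    total = int(np.sum(attribute_scores))
    return int(np.digitize(total, HYPOTHETICAL_CUTS))   # class index 0..6

print(career_class([11, 15, 18, 11, 12, 8, 6, 6, 6, 6]))   # total = 99 -> class 5 with these cuts
```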
IV. EXPERIMENT RESULTS AND COMPARISONS

For the purpose of this study, the data set described in Table I is used; there is no natural order in the dataset. The accuracy is calculated by dividing the whole training set into ten sets of equal proportion, i.e. 10-fold cross validation is used. The experimentation has been done by using the freely available source code by Witten and Frank (Weka). Table II compares the output of the different algorithms in the first experiment and shows that the proposed algorithm gives good accuracy by using the majority voting rule. The majority voting rule is stated in equation (2):

x_t \rightarrow y_j \ \text{satisfying} \ \max_{y_j} \sum_{i=1}^{N} \Delta_i(y_j|x_t)        (2)

where \Delta_i(y_j|x_t) is 1 if the i-th classifier assigns x_t to class y_j and 0 otherwise, and N is the number of classifiers in the ensemble.

During the second experiment, the proposed algorithm gives good accuracy when compared with AdaBoost using SVM as a weak learner [29], [30], with SVM, and with a multilayer perceptron, as shown in Table III. Table IV shows that the proposed algorithm also takes less time than Adaboost.SVM, SVM and the multilayer perceptron.

TABLE II. Accuracy (%) of the individual algorithms and of the proposed ensemble with the two voting schemes used

Proposed ensemble (Majority Voting):        90.8
Proposed ensemble (Average of Probability): 90.4
Naïve Bayes:                                89.6
K-Star:                                     89.2
SVM:                                        89.2

TABLE III. Accuracy (%) of the proposed ensemble compared with Adaboost.SVM, SVM and Multilayer Perceptron

Proposed ensemble algorithm: 90.8
Adaboost.SVM:                89.2
SVM:                         89.2
Multilayer Perceptron:       88
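For reference, the same 10-fold cross-validation protocol can be sketched with scikit-learn on placeholder data; the reported figures come from the Weka implementations, so this snippet will not reproduce the 90.8% accuracy:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
y = rng.integers(0, 7, size=300)

# 10-fold cross-validation: the data is split into ten equal parts, each used once as the test fold.
scores = cross_val_score(GaussianNB(), X, y, cv=10, scoring="accuracy")
print(scores.mean())
```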



TABLE IV. Training time (in seconds) of the proposed algorithm and the compared algorithms on a dual-core 2 GHz system with 2 GB memory

Proposed ensemble algorithm: 0.31
Adaboost.SVM:                4.18
SVM:                         0.31
Multilayer Perceptron:       3.32

Obviously, an incremental update is quicker than retraining a batch algorithm on the data. Efficiency can be increased further by minimizing the time needed for training, since a main research area of data analysis is the search for accurate techniques that can be applied to problems with a huge number of training samples, or with a smaller number of training samples that are introduced to the system over a period of time.
V. CONCLUSION AND FUTURE SCOPE

This paper intends to fill the gap between the experimental classification of students for their career choice and the existing machine learning techniques of the incremental learning concept. In a situation where data is being endlessly generated, storing all of it is not possible for the batch learning concept. The incremental ensemble learning algorithm proposed in this paper is thus found to be a useful technique for offering the best career choice to the student.

Use of the incremental learning algorithm for data of different types, including time series, web logs, spatial and multimedia data, is an important area of future work, as such data is stream data, where the use of batch classifiers is impracticable. The combining strategies used in this paper are the average of probability and the majority vote; apart from these, it may be worthwhile to examine different mixtures of rules in order to find the best match between the mixture of rules, the individual classifiers and the dataset used.
REFERENCES

[1] S. B. Kotsiantis, K. Patriarcheas, and M. Xenos, “A combinational incremental ensemble of classifiers as a technique for predicting students’ performance in distance education,” Knowledge-Based Systems, vol. 23, pp. 529-535, 2010.
[2] K. Woods, W. P. Kegelmeyer Jr., and K. Bowyer, “Combination of multiple classifiers using local accuracy estimates,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 4, pp. 405-410, Apr. 1997.
[3] N. Oza, “Online ensemble learning,” Ph.D. dissertation, Dept. Comput. Sci., Univ. California, Berkeley, 2001.
[4] L. I. Kuncheva, “Classifier ensembles for changing environments,” in Multiple Classifier Systems, vol. 3077. New York: Springer-Verlag, 2004, pp. 1-15.
[5] L. I. Kuncheva, “Classifier ensembles for detecting concept change in streaming data: Overview and perspectives,” in Proc. Eur. Conf. Artif. Intell., 2008, pp. 5-10.
[6] M. Muhlbaier, A. Topalis, and R. Polikar, “Learn++.NC: Combining ensemble of classifiers with dynamically weighted consult-and-vote for efficient incremental learning of new classes,” IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 152-168, Jan. 2009.
[7] C. Giraud-Carrier, “A note on the utility of incremental learning,” Artificial Intelligence Comm., vol. 13, no. 4, pp. 215-223, 2000.
[8] S. Lange and S. Zilles, “Formal models of incremental learning and their analysis,” in Proc. Int’l Joint Conf. Neural Networks, vol. 4, pp. 2691-2696, 2003.
[9] M. Muhlbaier and R. Polikar, “An ensemble approach for incremental learning in nonstationary environments,” in Proc. Seventh Int’l Conf. Multiple Classifier Systems, pp. 490-500, 2007.
[10] H. He and S. Chen, “IMORL: Incremental multiple-object recognition and localization,” IEEE Trans. Neural Netw., vol. 19, no. 10, pp. 1727-1738, Oct. 2008.
[11] J. Gao, B. Ding, F. Wei, H. Jiawei, and P. S. Yu, “Classifying data streams with skewed class distributions and concept drifts,” IEEE Internet Comput., vol. 12, no. 6, pp. 37-49, Nov.-Dec. 2008.
[12] H. He and S. Chen, “IMORL: Incremental multiple-object recognition and localization,” IEEE Trans. Neural Networks, vol. 19, no. 10, pp. 1727-1738, Oct. 2008.
[13] M. Karnick, M. D. Muhlbaier, and R. Polikar, “Incremental learning in non-stationary environments with concept drift using a multiple classifier based approach,” in Proc. 19th Int. Conf. Pattern Recognit., Tampa, FL, Dec. 2008, pp. 1-4.
[14] R. Elwell and R. Polikar, “Incremental learning in nonstationary environments with controlled forgetting,” in Proc. Int’l Joint Conf. Neural Networks (IJCNN ’09), pp. 771-778, 2009.
[15] R. Elwell and R. Polikar, “Incremental learning in nonstationary environments with controlled forgetting,” in Proc. Int. Joint Conf. Neural Netw., Atlanta, GA, Jun. 2009, pp. 771-778.
[16] M. Scholz and R. Klinkenberg, “Boosting classifiers for drifting concepts,” Intell. Data Anal., vol. 11, no. 1, pp. 3-28, Jan. 2007.
[17] R. Elwell and R. Polikar, “Incremental learning of concept drift in nonstationary environments,” IEEE Trans. Neural Networks, vol. 22, no. 10, pp. 1517-1531, Oct. 2011.
[18] S. Viaene, R. A. Derrig, and G. Dedene, “A case study of applying boosting naive Bayes to claim fraud diagnosis,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 5, pp. 612-620, May 2004.
[19] B. Mihaljevic, P. Larrañaga, and C. Bielza, “Augmented semi-naive Bayes classifier,” IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, vol. 36, no. 5, pp. 1149-116, Oct. 2006.
[20] V. Robles, P. Larrañaga, J. M. Pria, E. Menasalvas, and M. S. Perez, “Interval estimation naïve Bayes,” Advanced Data Mining and Applications, vol. 4632, pp. 134-145, 2007.
[21] L. Jiang, D. Wang, Z. Cai, and X. Yan, “Survey of improving naive Bayes for classification,” Advances in Artificial Intelligence, vol. 8109, pp. 159-167, 2013.
[22] J. G. Cleary and L. E. Trigg, “K*: An instance-based learner using an entropic distance measure,” in Proc. ICML, pp. 108-114, 1995.
[23] J. C. Platt, B. Schölkopf, C. Burges, and A. Smola, “Training of support vector machines using sequential minimal optimization,” in Advances in Kernel Methods - Support Vector Learning, pp. 185-208, MIT Press, 1999.
[24] L. J. Cao and F. E. H. Tay, “Support vector machine with adaptive parameters in financial time series forecasting,” IEEE Transactions on Neural Networks, vol. 14, no. 6, 2003, 1045-9227.
[25] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 8, 2004, 0196-2892.
[26] P. G. V. Axelberg, I. Y.-H. Gu, and M. H. J. Bollen, “Support vector machine for classification of voltage disturbances,” IEEE Transactions on Power Delivery, vol. 22, no. 3, 2007, 0885-8977.
[27] G. Wang and J. Ma, “A hybrid ensemble approach for enterprise credit risk assessment based on support vector machine,” Expert Systems with Applications, vol. 39, pp. 5325-5331, 2012.
[28] R. Ade and P. R. Deshmukh, “Classification of students using psychometric tests with the help of incremental naïve Bayes algorithm,” IJCA, vol. 89, no. 14, pp. 27-31, March 2014.
[29] S. Kawaguchi and R. Nishii, “Hyperspectral image classification by bootstrap AdaBoost with random decision stumps,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 11, November 2007.
[30] C. A. Lupaşcu, D. Tegolo, and E. Trucco, “FABC: Retinal vessel segmentation using AdaBoost,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 5, Sept. 2010.

