ISSN 2229-5518
Abstract— Faults are related to failures, and they say little about whether a system offers higher quality than the baseline that end-users expect. System faults are defects that reside in executable files. Conventional approaches employ experts to trace errors directly in the source code. However, as systems have grown in size, the complexity of this task has grown exponentially, creating scope for new methods of fault classification. Experimental studies have shown that minor bugs are the cause of faults and that only a small portion of software modules cause faults in software systems. In a system of considerable size, modules are labeled faulty or non-faulty during the modular phase. This paper presents adaptive neuro fuzzy c-means clustering for fault classification, built on fuzzy c-means clustering. The NASA PC1 dataset is used for the experiments, and the results of this approach improve on previous clustering-based approaches.
Index Terms— Adaptive-neuro fuzzy c-means clustering, Data Classification, Defect Prediction, Fault Prediction, Fault proneness, Fuzzy
clustering, Software Project Success.
—————————— ——————————
1 INTRODUCTION
[...] have classified data of faulty modules. Today, software fault detection systems reach 85% accuracy in their analysis, subject to the dataset considered and the evaluation parameters taken into account. This level of accuracy is usable in the cost-driven world of the software industry.

Fault prediction techniques draw on historical data. Research suggests that a system under development is prone to faults if its software metrics measure properties similar to those of faulty modules previously developed and observed in the same environment [10]. Conventional applications provide a platform for assessing fault proneness and a methodology for fault prediction and mitigation techniques. The supervised learning methods reported in the literature combine fault data with software metrics and apply different learning algorithms. In the late 1990s and the early years of the 21st century, many techniques surfaced as solutions to this problem. Neural networks [11, 12, 13] and genetic algorithms [14] were developed for large datasets to derive a generalized relation. Dempster-Shafer networks [15] treat data as faulty depending on the combined evidence from different sources. Naïve Bayes [16, 17] considers data faulty, irrespective of its nature, if its parameters match the predefined threshold for faulty systems. Decision trees [18] map observations to predicted outcomes. Artificial Immune Systems [19] are valued because, once trained, they define cognitive patterns for updating data. Support Vector Machines [20, 21] use an associated learning algorithm to recognize and analyze datasets for classification. Case-based Reasoning [22] keeps track of past solutions and detects problems based on those results. Ant-Colony Optimization [23] is a probabilistic approach that tackles computational problems by a graph-based method. Fuzzy logic [24] is used as a clustering approach to collect fault data with high accuracy. Basili et al. (1996) [25] proposed logistic regression that employs do- [...]

[...] datasets considered by the authors. Yue Jiang, Bojan Cukic and Tim Menzies [30] investigated the suitability of early-development metrics for finding fault-prone software modules. The authors also demonstrated a predictive model that uses metrics to characterize textual requirements. The model was tested on 5 datasets. Early lifecycle metrics play a significant role in project management, but requirement metrics are unable to predict the data by themselves.

3 METHODOLOGY
Fuzzy C-Means clustering is the conventional approach to classifying faults in software fault prediction. The Fuzzy C-Means clustering method [31] serves as the reference for the adaptive method, which improves the performance index of fault classification for software systems. The enhancement combines a feed-forward neural network [33] with fuzzy c-means to improve cluster assignment with respect to mean deviation and absolute error, minimizing the distance used for fault prediction.

The PC1 (NASA) data is the input to the system. Before Fuzzy C-Means can cluster the faulty data, the data must be pre-processed to minimize time consumption. The output of Fuzzy C-Means clustering is stored for comparison and is also fed to the adaptive Neuro-Fuzzy C-Means clustering algorithm. The algorithm, tuned by a neural network, is trained on the data and improves the performance index. Its output is compared with the output of Fuzzy C-Means in terms of accuracy, reliability, RMSE and MAE.

TABLE 1
PC1 DATASET (SOURCE: NASA)

              Faulty Information   Faultless Information
PC1 Dataset   23                   77

IJSER © 2014
http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 5, Issue 9, September-2014 294

3.1 Data Pre-processing
This paper uses the PC1 dataset as input. The dataset is a matrix Xij of size A × B, where Xij is the initial PC1 data matrix, A is the size of the dataset, and B is the number of distinguishing features.

The standard deviation s is computed from the number of elements in the sample and the values of B taken from the finite data set (eq. 1), and t is the mean value of the distinguishing features (eq. 2). From eq. (1) and eq. (2), the new preprocessed dataset, in matrix representation, is formed from s and t (eq. 3), where s is the standard deviation and t is the mean of the distinguishing features.

[Figure: flow from Data to Data Preprocessing, Calculation of Objective Function, and Calculation of cluster centers and membership functions]

3.2 Fuzzy C-Means Clustering
Fuzzy C-Means Clustering (FCM), also called ISODATA, is the clustering algorithm that determines the degree to which each point belongs to a cluster. Generally, a restrictive constraint limits the performance of hard clustering algorithms when they partition the original data into different cluster groups. Bezdek [32] proposed Fuzzy C-Means Clustering to improve on Hard C-Means Clustering (HCM) in cases where different groups overlap.

The reference Fuzzy C-Means clustering considered here is studied in the work of Zhiwei Gao et al. [31]. In this method, the output vectors of preprocessing (eq. 3), taken as input, are classified into c clusters, and each clustering center is calculated. The primary difference between FCM and HCM is the fuzzy partition. This partition determines the degree to which every data point belongs to every group, using a membership function in the range 0 to 1. Each element of the membership matrix U is allowed to vary between 0 and 1, which yields the fuzzy partition. By the normalization rule, the membership degrees of each data point across all clusters sum to 1 [31]:

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j = 1, \ldots, n \tag{4}$$

where c is the number of clusters.

The objective function (or cost function) of FCM is given by the generalized equation:

$$J(U, c_1, \ldots, c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \tag{5}$$

where $u_{ij}$ is between 0 and 1, $c_i$ is the clustering center of fuzzy group $i$, $d_{ij} = \lVert c_i - x_j \rVert$ is the Euclidean distance between the ith clustering center and the jth data point, $x_j$ is the jth data point, and $m$ is the weighted index.

To obtain the required parameters, a new objective function is constructed so that equation (5) attains its minimum:

$$\bar{J}(U, c_1, \ldots, c_c, \lambda_1, \ldots, \lambda_n) = J(U, c_1, \ldots, c_c) + \sum_{j=1}^{n} \lambda_j \left( \sum_{i=1}^{c} u_{ij} - 1 \right) \tag{6}$$

where $\lambda_j$, $j = 1, \ldots, n$, are the Lagrange multipliers of the n constraint formulas described in equation (4). The values that minimize equation (5) are as follows:

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \tag{7}$$

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}} \tag{8}$$

Step 1: Initialize the membership matrix U with random values between 0 and 1 that satisfy eq. (4).

Step 2: Calculate the clustering centers $c_i$ from eq. (7).
Step 3: Calculate the cost function according to equation (5). The iteration terminates once the cost, or its change relative to the previous cost function value, is less than a certain threshold.

Step 4: Calculate the next matrix U (based on eq. 8) and return to Step 2.

In conclusion, the FCM algorithm requires two parameters: the weighted index m and the number of clusters c. The index m decides the flexibility of the algorithm: if it is too high, the clustering effect deteriorates, and if it is too small, the algorithm approaches Hard C-Means Clustering (HCM). The number of clusters must satisfy c > 1 and should be far smaller than the total number of samples.

4 ADAPTIVE NEURO FUZZY INFERENCE
The adaptive neuro fuzzy inference system is a fuzzy reasoning system that is trained by a neural network to compute the membership function parameters. The method models the input-output data as a non-linear relation with inputs x and y and output f.

[Figure: ANFIS architecture, Layer 1 to Layer 5, with nodes A, B, Π, N and Σ]

Layer 4: The input of this layer is the fuzzy output of layer 2, while the output is a single number. The parameters are defined by eq. (13) [35].
(13)
Layer 5: The overall output is computed by summing all incoming signals [34].
(14)

1. Data Set of NASA (100*21) values
2. Preprocessing (100*2) by Mean and Standard Deviation
3. FCM (Formation of 25 Centers)
4. FIS generation of Data from FCM
5. Trained FIS (This FIS takes the input and provides Result)

Fig 3. Flow Diagram of Adaptive Neuro Fuzzy C-Means Clustering
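The first three stages of the flow in Fig. 3 (raw data, preprocessing by mean and standard deviation, FCM with 25 centers) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `preprocess` and `fcm` are hypothetical, the input is synthetic random data rather than PC1, the sample standard deviation (`ddof=1`), the weighted index m = 2 and the stopping tolerance are assumptions, and the update rules are the standard FCM center and membership formulas.

```python
import numpy as np

def preprocess(X, ddof=1):
    """Reduce each row (module) to [mean t, standard deviation s] of its features."""
    t = X.mean(axis=1)
    s = X.std(axis=1, ddof=ddof)  # ddof=1 (sample std) is an assumption
    return np.column_stack([t, s])

def fcm(X, c, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Standard fuzzy c-means; returns centers (c, d), memberships U (c, n), cost."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: random memberships, normalised so each data point sums to 1
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    prev_cost = np.inf
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distances d_ij between center i and point j
        d = np.linalg.norm(centers[:, None, :] - X[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Step 3: cost function and stopping test on its change
        cost = float(np.sum(Um * d ** 2))
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
        # Step 4: membership update, then back to Step 2
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
    return centers, U, cost

# Synthetic stand-in with the shape quoted in Fig. 3 (100 modules, 21 features)
X_raw = np.random.default_rng(42).random((100, 21))
Z = preprocess(X_raw)            # shape (100, 2)
centers, U, cost = fcm(Z, c=25)  # 25 centers, as in the flow diagram
```

Each column of `U` holds the membership degrees of one module across the 25 clusters. The later stages of the flow (FIS generation and training) are not covered by this sketch.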
Fig.7. Membership Function Plot of Adaptive Neuro Fuzzy Inference System

Class distribution: in the training dataset taken as input, the class value (defects) is discrete in nature.
% data with positive attribute: 23 = 23%
% data with negative attribute: 77 = 77%

TABLE 2
COMPARATIVE TABLE BASED ON SIMULATION FACTORS FOR FUZZY C-MEANS AND ADAPTIVE NEURO FUZZY C-MEANS CLUSTERING

            Fuzzy C-Means   Adaptive Neuro Fuzzy C-Means
Accuracy    77%             91%
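The comparison criteria named in the methodology (accuracy, RMSE and MAE) can be computed as in the sketch below. The helper `evaluate`, the 0.5 decision threshold and the toy labels and scores are hypothetical and chosen only for illustration; they are not the paper's data or results.

```python
import numpy as np

def evaluate(y_true, y_score, threshold=0.5):
    """Accuracy of thresholded scores, plus RMSE and MAE of the raw scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(float)  # 1 = faulty, 0 = faultless
    err = y_score - y_true
    return {
        "accuracy": float(np.mean(y_pred == y_true)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
    }

# Toy example: five modules with made-up labels and model outputs
y_true = [1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.1]
metrics = evaluate(y_true, y_score)
```

Accuracy is measured on the thresholded predictions, while RMSE and MAE are measured on the continuous outputs, so a model can improve on the error measures without changing its accuracy.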
[8] N. E. Fenton and M. Neil. A critique of software defect prediction models. IEEE Transactions on Software Engineering, 25(5):675–689, 1999.
[9] N. E. Fenton and N. Ohlsson. Quantitative analysis of faults and failures in a complex software system. IEEE Transactions on Software Engineering, 26(8):797–814, 2000.
[10] Khoshgoftaar TM, Allen EB, Ross FD, Munikoti R, Goel N, Nandi A (1997) Predicting fault-prone modules with case-based reasoning. The Eighth International Symposium on Software Engineering (ISSRE '07). IEEE Computer Society, pp 27–35.
[11] Thwin, M. M.; Quah, T. Application of Neural Networks for Software Quality Prediction using Object-oriented Metrics. ICSM 2003, Amsterdam, The Netherlands, 2003, pp. 113-122.
[12] Quah, T. S. Estimating Software Readiness using Predictive Models. Information Sciences, 179, 4(2009), pp. 430-445.
[13] Kanmani S, Uthariaraj V. R., Sankaranarayanan V, Thambidurai, P. Object-oriented Software Fault Prediction using Neural Networks. Information and Software Technology, 49, 5(2007), pp. 483-492.
[14] Evett, M.; Khoshgoftaar, T.; Chien, P.; Allen, E. GP-based Software Quality Prediction. Proceedings of the 3rd Annual Genetic Programming Conference, San Francisco, CA, 1998, pp. 60-65.
[15] Guo, L.; Cukic, B.; Singh, H. Predicting Fault Prone Modules by the Dempster-Shafer Belief Networks. Proceedings of the 18th IEEE Int’l Conf. on Automated Software Eng., Montreal, Canada, 2003, pp. 249-252.
[16] Menzies, T.; Greenwald, J.; Frank, A. Data Mining Static Code Attributes to Learn Defect Predictors. IEEE Transactions on Software Engineering, 33, 1(2007), pp. 2-13.
[17] Zhang, M. L.; Peña, J. M.; Robles, V. Feature Selection for Multi-label Naive Bayes Classification. Information Sciences, 179, 19(2009), pp. 3218-3229.
[18] Khoshgoftaar, T. M.; Seliya, N. Software Quality Classification Modeling using the SPRINT Decision Tree Algorithm. Proc. of the 4th IEEE Int’l Conf. on Tools with AI, Washington, DC, 2002, pp. 365-374.
[19] Catal, C.; Diri, B. Investigating the Effect of Dataset Size, Metrics Sets, and Feature Selection Techniques on Software Fault Prediction Problem. Information Sciences, 179, 3(2009), pp. 1040-1058.
[20] Elish, K. O.; Elish, M. O. Predicting Defect-Prone Software Modules using
[29] Hassan Najadat and Izzat Alsmadi, Enhance Rule Based Detection for Software Fault Prone Modules. International Journal of Software Engineering and Its Applications, Vol. 6, No. 1, January, 2012.
[30] Zhiwei Guo, Chengqing Yuan, Peng Liu, “Study on Identification Model of Cylinder Liner-Piston Ring Using Vibration Analysis Based on Fuzzy C-means Clustering” The Open Mechanical Engineering Journal, 2012, 6, (Suppl 2: M2) 126-132 1874-155X
[31] James C. Bezdek, Robert Ehrlich, William Full, “FCM: The Fuzzy C-Means Clustering Algorithm” Computers & Geosciences, Vol. 10, No. 2-3, pp. 191-203, 1984.
[32] Pejman Tahmasebi, Ardeshir Hezarkhani, “Application of Adaptive Neuro-Fuzzy Inference System for Grade Estimation; Case Study, Sarcheshmeh Porphyry Copper Deposit, Kerman, Iran” Australian Journal of Basic and Applied Sciences, 4(3): 408-420, 2010.
[33] Ahmed A. M. Emam, Eisa Bashier M. Tayeb, A. Taifour Al, Ammar Hassan Habiballh, “ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM IDENTIFICATION OF AN INDUCTION MOTOR” European Journal of Science and Engineering, Vol. 1, Issue 1, 2013.
[34] M. H. Olyaee, H. Abasi, M. Yaghoobi, “Using Hierarchical Adaptive Neuro Fuzzy Systems And Design Two New Edge Detectors In Noisy Images” Volume 2013, Year 2013, Article ID jsca-00030, 10 Pages, doi:10.5899/2013/jsca-00030.