Artificial Neural Networks and Support Vector Machine for Voice Disorders Identification

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 7, No. 5, 2016

Abstract—The diagnosis of voice diseases through invasive medical techniques is efficient but often uncomfortable for patients; therefore, automatic speech recognition methods have attracted growing interest in recent years and have achieved real success in the identification of voice impairments. In this context, this paper proposes a reliable algorithm for voice disorders identification based on two classification algorithms: the Artificial Neural Networks (ANN) and the Support Vector Machine (SVM). The feature extraction task is performed by the Mel Frequency Cepstral Coefficients (MFCC) and their first and second derivatives. In addition, the Linear Discriminant Analysis (LDA) is proposed as a feature selection procedure in order to enhance the discriminative ability of the algorithm and minimize its complexity. The proposed voice disorders identification system is evaluated with widespread performance measures such as the accuracy, sensitivity, specificity, precision and Area Under Curve (AUC).

Keywords—Automatic Speech Recognition (ASR); Pathological voices; Artificial Neural Networks (ANN); Support Vector Machine (SVM); Linear Discriminant Analysis (LDA); Mel Frequency Cepstral Coefficients (MFCC)

I. INTRODUCTION

When the mechanism of voice production is affected, the voice becomes pathological and sometimes unintelligible, which causes many problems and difficulties in integrating the social environment and having an easy exchange between members of the same community. Therefore, the diagnosis of voice impairments is imperative to avoid such issues. Voice disorders can be classified into three main categories: organic, functional, or a combination of both [1]. This study is designed for organic voice disorders. Indeed, a voice disorder is organic if it is caused by a structural (anatomic) or physiologic disease, either a disease of the larynx itself or a remote systemic or neurologic disease that alters laryngeal structure or function [2]. In this research, we have worked on both structural and neurogenic disorders. Four types of pathologies are examined: chronic laryngitis, cyst, Reinke's edema and spasmodic dysphonia, since they are widespread diseases and their medical analysis remains tricky to date. Among the many techniques for identifying voice diseases, automatic acoustic analysis has proven its efficiency in recent years and has met with growing success. The advantage of acoustic analysis is its nonintrusive nature and its potential for providing quantitative data with reasonable expenditure of analysis time [3]. Therefore, several techniques and methods have been introduced and many studies have been conducted in the literature. Some of these researches indicate that voice disorders identification can be done by the exploitation of Mel Frequency Cepstral Coefficients (MFCC) together with the harmonics-to-noise ratio, normalized noise energy and glottal-to-noise excitation ratio, where a Gaussian mixture model was used as classifier [4]. Also, Daubechies' discrete wavelet transform, linear prediction coefficients, and the least-squares Support Vector Machine (LS-SVM) were investigated in [5]. In addition, a voice recognition algorithm was proposed in [6] based on the MFCC coefficients, their first and second derivatives, the F-ratio and Fisher's discriminant ratio as feature reduction methods, and the Gaussian Mixture Model (GMM) as classifier; the main idea there consists in demonstrating that the detection of voice impairments can be performed using both mel cepstral vectors and their first derivative, ignoring the second derivative. In this paper, we will show that the contribution of the first and second derivatives of the MFCC features mainly depends on the classifier. Indeed, the Artificial Neural Networks (ANN) and the Support Vector Machine (SVM) are investigated as classifiers in this work and a comparative study between their respective performances is conducted. In addition, three combinations of the MFCC features and their first and second derivatives are proposed for the feature extraction task. In order to select the most relevant parameters from the resulting feature vector, the Linear Discriminant Analysis (LDA) is suggested as a feature selection procedure. Furthermore, the system performance is assessed in terms of the accuracy, sensitivity, specificity, precision and Area Under Curve (AUC). In the next section, the methodology and database used in this work are described as well as the performance measures. Then, Section 3 presents the experimental results and Section 4 discusses these results. Finally, we conclude this paper in Section 5.

II. MATERIALS AND METHODS

A. Database

In this research, we have selected the voice samples from the 'Saarbrucken Voice Database' (SVD) [7], [8], which is a German voice disorders database collected in collaboration with the Department of Phonetics and ENT at the Caritas clinic St. Theresia in Saarbrucken and the Institute of Phonetics of the University of the Saarland. It contains 2225 voice samples with a sampling rate of 50 kHz and a 16-bit amplitude resolution. Subjects sustained the vowels [i], [a] and [u] for 1 s. In this study, the continuous vowel [a] phonation produced by 50 normal people and 70 patients was examined. Four types of pathologies are investigated: chronic laryngitis
(24), cyst (6), Reinke's edema (19) and spasmodic dysphonia (21).

B. The Proposed Algorithm

In this paper, the extraction of the acoustical features from the speech signal is performed by the MFCC parameterization method. In addition, the first and second derivatives, which provide information about the dynamics of the time variation in the original MFCC features, were investigated to verify their contribution to the proposed algorithm. In order to optimize the voice disorders detection, a projection-based Linear Discriminant Analysis (LDA) is suggested as the feature selection method, and a comparative study is elaborated between optimized and non-optimized features for every tested combination. As regards the classification task, the Artificial Neural Networks (ANN) are used as an unconventional approach in addition to the Support Vector Machine as a new method successfully exploited in recent years, Fig. 1.

[…] binning. Therefore, the Mel filtering process has to be performed. Thus, the obtained speech signal spectrum is filtered by a group of triangular bandpass filters that simulate the characteristics of the human ear [9], [10]. The following equation is used to compute the Mel frequency f_Mel for a given linear frequency f_Hz in Hz:

    f_Mel = 2595 * log10(1 + f_Hz / 700)    (1)

The nonlinear frequency characteristic of the human auditory system is approximated by the Mel filtering procedure. At this stage, a natural logarithm is applied to each output spectrum from the Mel filter bank. Finally, the Discrete Cosine Transform (DCT) is performed to convert the log Mel spectrum back into the time domain; thus, the Mel Frequency Cepstrum Coefficients (MFCC) are obtained. Besides, there are several ways to approximate the first derivative of a cepstral coefficient. In this research, we use the following formula [11]:

    Δx(t) = dx(t)/dt ≈ Σ_{m = -M..M} m · x(t + m)    (2)
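As an illustration, the Mel mapping in (1) and the delta approximation in (2) can be sketched in a few lines. The paper's experiments were run in MATLAB; this Python transcription is only a sketch, and the window half-width M = 2 is an assumed value, not one reported above:

```python
import math

def hz_to_mel(f_hz):
    """Eq. (1): map a linear frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def delta(coeffs, t, M=2):
    """Eq. (2): unnormalized first derivative of a cepstral
    trajectory at frame t, using a window of +/- M frames.
    Frame indices outside the sequence are clamped to the edges."""
    total = 0.0
    for m in range(-M, M + 1):
        idx = min(max(t + m, 0), len(coeffs) - 1)
        total += m * coeffs[idx]
    return total

# By construction of the scale, 1000 Hz maps to roughly 1000 mel
print(round(hz_to_mel(1000.0), 2))   # → 999.98

# A linearly increasing trajectory has a constant positive delta
c = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(delta(c, 3, M=2))              # → 10.0
```

In practice the sum in (2) is often normalized by Σ m², but the paper's formula is stated without that factor, so the sketch follows it literally.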
[…] rates in order to conclude the most effective classifier for the identification of voice disorders.

1) Support Vector Machine:

Support Vector Machines are a class of learning techniques introduced by Vladimir Vapnik in the early 90s [14], [15]. In binary classification, the training data come from only two different classes (+1 or -1). The idea of the SVM is to find a hyperplane that best separates the two classes with maximum margin. If the data are linearly separable, it is called a "hard-margin SVM". If the data are non-linearly separable, it is called a "soft-margin SVM". In this case, the data are mapped into a higher-dimensional space where the separating function becomes linear. This transformation is often performed using a "kernel mapping function" and the new space is called the "feature space". The most widely used SVM kernel functions are the linear kernel, the polynomial kernel and the Radial Basis Function (RBF), i.e. the Gaussian kernel.

The training phase of the SVM classifier involves searching for the hyperplane that maximizes the margin. Such a hyperplane is called the "optimal separating hyperplane".

In this research, the proposed algorithm was trained with the Radial Basis Function (RBF) as a Gaussian SVM kernel, using LIBSVM, which is an SVM library [16].

2) Artificial Neural Networks:

Artificial Neural Networks are certainly one of the most effective approaches for speech recognition thanks to their numerous architectures and learning algorithms. In this paper, the architecture of the proposed neural network is composed of three layers: an input layer for the transmission of the input features without distortion, a hidden layer containing 250 neurons (with the sigmoid as activation function) and an output layer containing one neuron with a linear activation function. Each layer is fully connected to the next one. The proposed neural network learning is performed based on the principles of Bayesian regularization algorithms. Indeed, the network weight values are adjusted successively at every learning step in order to achieve an output as close as possible to the considered data [17].

Concerning the Bayesian approach, it is based on the exploitation of a random distribution of the network weight probabilities. The neural network learning consists in determining this distribution given the training data. Indeed, after examination of the training data, the initial probability attributed to the weights before learning is transformed into a final distribution through the application of Bayes' theorem [17].

F. Evaluation Process

In order to judge the effectiveness and the robustness of the proposed algorithm, it has to be assessed according to different performance measures. In this research, five performance measures were used: accuracy, sensitivity, specificity, precision and the Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. Indeed, sensitivity measures the ability of the algorithm to recognise pathological samples. It opposes specificity, which evaluates the ability of the algorithm to identify normal samples. Precision represents the proportion of well-classified pathological samples from the pathological class. Furthermore, accuracy measures the algorithm's correct classification rate, and the AUC is an important statistical property for evaluating the discriminability between the two classes of normal and pathological samples. Therefore, the AUC provides another way to measure the accuracy of the proposed system. These measures are based on the following notions:

TP : True Positive : identified as pathological when pathological samples are actually present

TN : True Negative : identified as normal when normal samples are actually present

FP : False Positive : identified as pathological when normal samples are actually present

FN : False Negative : identified as normal when pathological samples are actually present

These measures can be calculated as follows:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

    Sensitivity = TP / (TP + FN)

    Specificity = TN / (TN + FP)

    Precision = TP / (TP + FP)

    Area Under Curve (AUC) = (1/2) * (TP / (TP + FN) + TN / (TN + FP))

III. EXPERIMENTAL RESULTS

In this research, the dataset was divided into two parts: 70% of the data were used for training and 30% for validation. All simulations were conducted in MATLAB 2013a on an Intel Core-i7, 2.20 GHz CPU with 4 GB RAM.

A. Evaluation Based on the SVM Performance

In this part of the article, we present the SVM performance rates for different combinations of the MFCC coefficients before and after applying the LDA feature selection procedure. Table 1 shows the SVM performance in terms of accuracy (Acc %), sensitivity (Sens %), specificity (Spec %), precision (Prec %) and AUC (%) for the different MFCC feature vectors.

The experimental results show that there is a slight increase in the SVM performance rates between the MFCC and MFCC_Delta1 combinations, of 0.04% in the accuracy rate, 0.03% in the AUC rate, 0.04% in the sensitivity rate, 0.05% in the specificity rate and 0.07% in the precision rate. Whereas the system performances are exactly equal for the combinations of MFCC_Delta1 and MFCC_Deltas1&2, with an accuracy rate of 80.4%, sensitivity of 87.83%, specificity of 73.58%, AUC of 80.7% and precision of 72.29%. Therefore, we can note that the first and the second derivatives do not provide a significant improvement in the system performances when the SVM is used as classifier, which demonstrates that the
SVM algorithm is not sensitive to the information provided by these features about the dynamics of the time variation in the original MFCC vector. Besides, after applying the LDA procedure, the SVM performance rates are certainly less close, but not distant enough to change the whole analysis about the contribution of the first and the second derivatives in the […]

[…] for the MFCC_Delta1 and 6.94% between the optimized and non-optimized MFCC_Delta1&2 features.

Fig. 4. Comparison between the SVM AUC rates of the optimized and non-optimized MFCC features (non-optimized: 80.67%, 80.70%, 80.70%; optimized: 87.53%, 87.31%, 87.64% for MFCC, MFCC_Delta1 and MFCC_Deltas1&2, respectively)

[Figure: comparison of the SVM accuracy rates (%) of the optimized and non-optimized MFCC features (non-optimized: 80.36%, 80.40%, 80.40%; optimized: 86.28%, 86.07%, 86.44% for MFCC, MFCC_Delta1 and MFCC_Deltas1&2, respectively); caption lost in extraction]
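The rates discussed above are all derived from the confusion counts defined in the Evaluation Process section. A minimal sketch of that computation follows, in Python rather than the MATLAB used for the experiments; the confusion counts shown are hypothetical, since the paper reports only the derived rates:

```python
def performance_measures(tp, tn, fp, fn):
    """Compute the five measures of Section II-F from raw
    confusion counts (pathological = positive class)."""
    sens = tp / (tp + fn)                      # sensitivity (recall)
    spec = tn / (tn + fp)                      # specificity
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sens,
        "specificity": spec,
        "precision":   tp / (tp + fp),
        # the balanced-accuracy form of the AUC used in this paper
        "auc":         0.5 * (sens + spec),
    }

# Hypothetical confusion counts for a 30% validation split
# (NOT the paper's actual counts, which are not reported):
m = performance_measures(tp=60, tn=28, fp=8, fn=4)
print({k: round(100 * v, 2) for k, v in m.items()})
# → {'accuracy': 88.0, 'sensitivity': 93.75, 'specificity': 77.78,
#    'precision': 88.24, 'auc': 85.76}
```

Note that a high sensitivity with a lower specificity, as in the results above, means the classifier misses few pathological voices at the cost of more false alarms on normal ones.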
[…] combinations is observed before and after applying the LDA transformation.

As regards the LDA method, it was applied to the different MFCC combinations in order to select the most significant parameters from the feature extraction task to be the input vector of the ANN architecture. This strategy leads to an optimization of the system performance. Indeed, the experimental results show an improvement in the ANN performance measurements for all the optimized MFCC feature combinations. Fig. 5 compares the ANN accuracy rates of the optimized and non-optimized MFCC vectors.

Fig. 5. Comparison between the ANN accuracy rates of the optimized and non-optimized MFCC features (non-optimized: 75.13%, 81.19%, 85.20%; optimized: 80.25%, 84.06%, 87.82% for MFCC, MFCC_Delta1 and MFCC_Deltas1&2, respectively)

The experimental results exposed in Fig. 5 show an optimization of 5.12% in the accuracy rate of the non-optimized MFCC features, while the improvement is about 2.87% for the combination of the MFCC features and their first derivatives. Also, the optimization procedure provides a 2.62% increase in the accuracy rate of the MFCC features associated with their first and second derivatives. In fact, the improvement was observed for all performance measures, namely the AUC rates, which were improved to reach 81.87% for the MFCC combination with an optimization of 6.85%, while 3.85% and 2.75% were the improvement rates for the combinations of MFCC_Delta1 and MFCC_Delta1&2, respectively, Fig. 6.

Fig. 6. Comparison between the ANN AUC rates of the optimized and non-optimized MFCC features (non-optimized: 75.02%, 81.74%, 85.21%; optimized: 81.87%, 85.59%, 87.96% for MFCC, MFCC_Delta1 and MFCC_Deltas1&2, respectively)

Finally, the optimized MFCC_Delta1&2 combination reached the best ANN performance rates with an accuracy rate of 87.82%, sensitivity of 99.12%, specificity of 80.31%, AUC of 87.96% and a precision of 81.42%, as mentioned in Table 2.

IV. DISCUSSION

In this paper, the ANN is proposed as an unconventional approach in addition to the SVM as a new method successfully exploited in speech recognition. The main motivation for conducting this research was to investigate the efficiency of each of those classifiers in the identification of voice disorders. In addition, it was interesting to scrutinize the contribution of the first and second derivatives of the MFCC features for every classifier. The experimental results demonstrate that the effect of these derivative features depends on the classifier. Indeed, when the SVM is used as classifier, the first and second derivatives do not provide any improvement to the system performance compared to the original MFCC features. However, when the ANN is used as classifier, these derivative features can be considered important since they contribute to the improvement of the system performance. In this case, there is an average improvement of about 4% between the combinations of the MFCC, MFCC_Delta1 and MFCC_Delta1&2.

Besides, the LDA procedure is used to select the most relevant parameters from a resulting feature vector in order to reduce the system dimensionality without affecting its performance. Indeed, our findings show that the LDA method minimizes the system complexity while improving the performance rates for every feature combination; therefore, it can be considered as an optimization procedure.

Table 3 compares the proposed algorithms with previous significant works. It is observed that the proposed algorithm appears competitive for the detection of voice disorders from the Saarbrucken Voice Database (SVD).

TABLE III. COMPARATIVE TABLE BETWEEN PROPOSED ALGORITHM AND PREVIOUS WORKS

Finally, with an accuracy rate of 86.44%, sensitivity of 98.24%, specificity of 77.04%, AUC of 87.64% and precision of 74.42%, the SVM classifier can be judged efficient for voice disorders identification. Also, the ANN classifier offers an accuracy rate of 87.82%, sensitivity of 99.12%, specificity of
80.31%, AUC of 87.96% and precision of 81.42%, which are slightly better than those of the SVM classifier, leading to the conclusion that the ANN classifier is likewise effective for voice impairment identification. With these performance rates, the proposed algorithm can be considered reliable for the identification of pathological voices from normal ones.

V. CONCLUSION

This paper proposes an optimized voice disorders identification algorithm based on short-term cepstral parameters and the Linear Discriminant Analysis as feature selection method. As regards the classification task, it is performed by the Artificial Neural Networks and the Support Vector Machine. The three combinations of MFCC, MFCC_Delta1 and MFCC_Delta1&2 are examined in order to conclude the role of the derivative features. Indeed, the experimental results demonstrate that the contribution of the first and second derivatives of the MFCC features varies according to the classifier. In addition, the LDA transformation can be considered as an optimization procedure since it improves the system performance while reducing its dimensionality. Accuracy rates of 86.44% and 87.82% were obtained by the SVM and the ANN, respectively. Therefore, we can conclude that the ANN and the SVM are efficient for voice disorders identification, with a slight advantage to the ANN. Many future improvements can be proposed, such as including other feature extraction methods in a hybrid scheme in order to improve the performance rates. For instance, we can suggest the Discrete Wavelet Transform to be integrated with the proposed MFCC features. In addition, the real-time implementation of the proposed algorithm may be envisaged.

REFERENCES

[1] A. Akbari and M. K. Arjmandi, "An efficient voice pathology classification scheme based on applying multi-layer linear discriminant analysis to wavelet packet-based features," Biomedical Signal Processing and Control, vol. 10, pp. 209-223, 2014.

[2] A. E. Aronson and D. M. Bless, Clinical Voice Disorders, 4th ed., New York: Thieme, 2009.

[3] Lions Voice Clinic, University of Minnesota, Department of Otolaryngology, Minneapolis, MN 55455, USA.

[4] D. Martinez, E. Lleida, A. Ortega, A. Miguel and J. Villalba, "Voice pathology detection on the Saarbruecken Voice Database with calibration and fusion of scores using MultiFocal toolkit," Advances in Speech and Language Technologies for Iberian Languages, vol. 328, pp. 99-109, 2012.

[5] E. F. Fonseca, R. C. Guido, P. R. Scalassara, C. D. Maciel and J. C. Pereira, "Wavelet time-frequency analysis and least squares support vector machine for the identification of voice disorders," Computers in Biology and Medicine, vol. 37, pp. 571-578, 2007.

[6] J. I. Godino-Llorente, P. Gomez-Vilda and M. Blanco-Velasco, "Dimensionality reduction of a pathological voice quality assessment system based on Gaussian mixture models and short-term cepstral parameters," IEEE Trans. Biomed. Eng., vol. 53, pp. 1943-1953, 2006.

[7] W. J. Barry and M. Putzer, Saarbrucken Voice Database, Institute of Phonetics, University of the Saarland.

[8] M. Putzer and J. Koreman, "A German database of patterns of pathological vocal fold vibration," Phonus 3, Institute of Phonetics, University of the Saarland, pp. 143-153, 1997.

[9] X. Xiong, "Robust speech features and acoustic models for speech recognition," PhD dissertation, School of Computer Engineering, Nanyang Technological University, 2009.

[10] V. Tiwari, "MFCC and its applications in speaker recognition," International Journal on Emerging Technologies, vol. 1, pp. 19-22, 2010.

[11] J. W. Picone, "Signal modeling techniques in speech recognition," Proc. of the IEEE, vol. 81, pp. 1215-1247, 1993.

[12] G. Quanquan, L. Zhenhui and H. Jiawei, "Linear discriminant dimensionality reduction," in Machine Learning and Knowledge Discovery, ser. Lecture Notes in Computer Science, Germany: Springer, 2011, pp. 549-564.

[13] V. S. Tomar, "Discriminant feature space transformations for automatic speech recognition," Department of Electrical and Computer Engineering, McGill University, Montreal, 2012.

[14] I. Guyon, B. Boser and V. Vapnik, "Automatic capacity tuning of very large VC-dimension classifiers," Advances in Neural Information Processing Systems, pp. 147-155, 1993.

[15] B. E. Boser, I. M. Guyon and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Proc. 5th Annual Workshop on Computational Learning Theory (COLT '92), New York, 1992.

[16] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 27:1-27:27, 2011.

[17] R. M. Neal, Bayesian Learning for Neural Networks, New York: Springer-Verlag, 1996.

[18] A. Al-nasheri, Z. Ali, G. Muhammad and M. Alsulaiman, "Voice pathology detection using auto-correlation of different filters bank," in Proc. AICCSA'14, Doha, Qatar, 2014.

[19] I. M. M. El Emary, M. Fezari and F. Amara, "Towards developing a voice pathologies detection system," Journal of Communications Technology and Electronics, vol. 59, pp. 1280-1288, 2014.