International Journal of Electrical and Computer Engineering (IJECE)
Vol. 13, No. 2, April 2023, pp. 1997~2008
ISSN: 2088-8708, DOI: 10.11591/ijece.v13i2.pp1997-2008
Characterization of Arabic sibilant consonants
Youssef Elfahm1, Nesrine Abajaddi1, Badia Mounir2, Laila Elmaazouzi2, Ilham Mounir2,
Abdelmajid Farchi1
1 IMII Laboratory, Faculty of Sciences and Technics, University Hassan First, Settat, Morocco
2 LAPSSII Laboratory, High School of Technology, University Cadi Ayyad, Safi, Morocco
Article Info
Article history:
Received Jan 25, 2022
Revised Sep 15, 2022
Accepted Oct 12, 2022
Keywords:
Alveolar
Classification
Energy bands
Post-alveolar
Sibilant fricatives
ABSTRACT
The aim of this study is to develop an automatic speech recognition system
that classifies Arabic sibilant consonants into two groups: alveolar
consonants and post-alveolar consonants. The proposed method uses the
energy distribution in a consonant-vowel (CV) syllable as an acoustic cue.
Applying this method to our own corpus reveals that the amount of energy
in a speech signal is a very important parameter in the characterization of
Arabic sibilant consonants. The accuracy achieved in identifying consonants
as alveolar or post-alveolar is 100%. Within the post-alveolar group, the
recognition rate is 96%, and within the alveolar group, it is over 94%. Our
classification technique outperformed existing algorithms based on support
vector machines and neural networks in terms of classification rate.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Youssef Elfahm
Laboratory IMII, Electrical and Computer Engineering Department, Faculty of Sciences and Technologies,
Hassan First University
Road of Casablanca B.P: 577, Settat, Morocco
Email:
[email protected]
1. INTRODUCTION
The field of automatic speech processing has undergone considerable development in recent years.
This development has allowed humans to communicate with machines. As a result, speech recognition systems
are used in a wide range of professional and public activities, such as newspaper writing and the control of
industrial machinery. Automatic speech recognition draws on knowledge from various fields, such as signal
processing, linguistics, phonology, computer science, and statistics. From a phonetic perspective, vowels and
consonants are the two basic kinds of speech sounds. Producing a vowel requires free circulation of air in the
vocal tract, whereas producing a consonant requires an interruption or disturbance of the airflow at some
point [1], [2]. In articulatory phonetics, occlusives and fricatives are the two basic modes of consonantal
articulation. Occlusives are short noisy sounds marked by a silence caused by the complete closure of the
vocal tract at a specific point (as in the consonant /k/). Fricatives, on the other hand, are noisy sounds created
by turbulent airflow: a frictional noise arises when this flow hits a constriction (as in the consonant /s/) [3],
[4]. Fricatives can be grouped into sibilant and non-sibilant categories. Unlike non-sibilants, sibilants are
produced by directing a stream of air with the tongue towards the edge of the teeth, which are kept closed,
resulting in a distinctive hissing sound [5]. Figure 1 shows the classification of Arabic fricative consonants.
Researchers have conducted many studies in order to design speech recognition systems and/or
improve their performance. They have used a variety of acoustic cues, including the duration and amplitude
of the frication, the center of gravity (CoG), spectral moments (mean, standard deviation, skewness, and
kurtosis), gammatone filter outputs, and Mel-frequency cepstral coefficients (MFCCs).
Journal homepage: http://ijece.iaescore.com
For sibilant consonants, several studies have been conducted. Behrens and Blumstein [6] examined the
temporal changes in the spectral features of the English sibilants /s/ and /ʃ/ as part of their work on the
characterization of sibilant consonants. According to their findings, tracking the frequency of the spectral
peak at the start, middle, and end of the consonant allows for highly accurate identification of these sounds.
The consonant /s/ had a higher peak frequency than the consonant /ʃ/. According to Borden [7], the spectrum
and intensity of fricative consonants can be used to determine the place of articulation. Compared to
non-sibilants, sibilants (/s/, /z/, /sh/, and /zh/) have relatively steep high-frequency spectral peaks and
comparatively high intensity levels. The spectral peak of the alveolars (/s/ and /z/) is around 4 kHz. For a
typical male speaker, that of the post-alveolars (/sh/ and /zh/) is around 2.5 kHz. The duration and amplitude
of the frication are also related to the place of articulation, which makes it possible to discriminate between
sibilants and non-sibilants [8].
Figure 1. Diagram classifying the Arabic fricative consonants
To identify English fricative sounds, Ali et al. [9] used the maximum normalized spectral slope
(MNSS) and the spectral CoG (SCoG). They stated that sibilants are detected in two stages: the first
determines the voicing, and the second determines the place of articulation. They reported an average
accuracy of 87 percent. Regarding the recognition of English sibilants as alveolar or post-alveolar, Ali et al.
[10] found that the alveolar spectrum peaks around 5 kHz while the post-alveolar spectrum peaks around
3 kHz. The CoGs, which lie between 2 and 4 kHz for the post-alveolars and between 4 and 8 kHz for the
alveolars, can be used to distinguish the two classes of sibilants [11], [12]. From an articulatory standpoint,
the front cavity of the post-alveolar consonant /ʃ/ is larger than that of the alveolar consonant /s/. This
difference is accompanied by a qualitative difference in the shape of the front cavity: for /ʃ/, the tongue is
positioned so that a sublingual cavity is formed behind the lower incisors, whereas for /s/, the underside of
the tongue tip comes into contact with the lower incisors, so that no sublingual cavity is formed [13]–[15].
In terms of spectral moments and related measures, non-sibilant English fricatives have larger standard
deviations, lower overall amplitudes, and shorter durations than sibilant ones. The palato-alveolar /ʃ/ had a
lower spectral mean (4.7 kHz) than the alveolar /s/ (7.1 kHz). The average asymmetry (skewness) for the
consonant /s/ was negative in all female productions and considerably positive in all male productions [16],
[17]. Kong et al. [18] focused on classifying English fricatives as alveolar, post-alveolar, or non-sibilant.
The data were analyzed using spectral characteristics, gammatone filter outputs, and MFCCs. They achieved
an accuracy of 88 percent with gammatone filter outputs and 87 percent with non-gammatone filter outputs.
Kochetov [19] examined the CoG, the formants F1, F2, and F3 of the following vowel, and the duration of
the four voiceless Russian sibilants (/s, sj, ʂ, and ʃj/). He found that the CoG during the frication helps detect
the anterior versus posterior contrast. F1 and especially F2 at the beginning and middle of the following
vowel distinguished palatalized from non-palatalized sibilants. Only /ʃj/ is distinguished by the fricative
duration.
Working on Arabic fricatives, Cooper et al. [20] studied voiceless fricatives using spectral moments,
median power, and fricative duration as acoustic cues. This study demonstrated that spectral asymmetry may
be used to determine the place of articulation: the asymmetry value increases as the place of articulation
moves from the front to the back of the vocal tract. In terms of spectral mean, the value for /s/ was larger
than that for /ʃ/. The highest values were found in alveolar fricatives, followed by post-alveolar fricatives,
while the lowest values were found in non-sibilant fricatives. The spectral standard deviation values of
sibilant fricatives were lower than those of non-sibilant fricatives. In his research on Arabic fricative
consonants, Al-Khair [21] found that the spectral position of the peak is an acoustic cue that makes it
possible to distinguish between the alveolar sibilants /s and z/ and the post-alveolar /ʃ/. Arabic
sibilants have a compact spectrum with a greater intensity and a higher CoG frequency than non-sibilants,
according to Benamrane [22]. The consonants /s and sҁ/ have a high CoG in comparison with the consonants
/z, ʒ, and ʃ/. Mokari and Mahdinezhad [23] compared two Azerbaijani fricative classifiers. The first system
uses spectral moments, spectral peak, amplitude, and duration, whereas the second one employs cepstral
coefficients. This comparison shows that the cepstral coefficients were more reliable predictors in the
classification of the nine Azerbaijani fricatives. Using the energy in frequency bands as an acoustic cue,
Elfahm et al. [24] developed a technique for classifying Arabic fricative consonants into two main groups:
sibilant and non-sibilant. They found that sibilant consonants had zero energy in the band from 800 to
2,000 Hz, while non-sibilants had their lowest energy in the band from 5,000 to 8,000 Hz.
As can be seen from this overview, most previous work was limited to classifying the two sibilant
consonants /s and ʃ/ using spectral moments and CoG values as acoustic cues. In this study, our contribution
is to extend the classification to the other Arabic sibilant consonants /s, sҁ, z, ʒ, ʃ/. Our algorithm uses the
energy distribution in the syllable to classify these consonants into two groups: alveolar /s, sҁ, and z/ and
post-alveolar /ʒ and ʃ/. Then, it recognizes the consonants within each group. This paper is organized as
follows: the methodology and instruments employed, as well as the experiments conducted, are presented in
section 2. The results are presented and discussed in section 3. Section 4 summarizes the findings and
presents the conclusions.
2. METHOD
This study took place in two phases: a first phase of corpus construction and segmentation, and a
second phase of signal processing and acoustic analysis. The purpose of the first phase is to record speech
sequences and segment them into syllabic units of the consonant-vowel (CV) type. In the signal processing
phase, we calculated the landmarks and the energy in frequency bands in order to use them in the acoustic
analysis of the speech signal.
2.1. Corpus and signal processing
The data used for the acoustic analyses presented in this study consist of audio recordings from our
own corpus. We asked nine male Moroccan speakers to repeat a CVCVCV sequence four times (see Figure 2).
All recordings were made in an isolated chamber with a Labtech AM232 microphone placed 20 cm from the
corner of the mouth at a 45° angle to improve recording quality. The audio files were recorded in rural areas,
using Praat software, at a sampling frequency of 22.05 kHz. From this dataset, we performed a landmark-based
segmentation to extract CV sequences.
Figure 2. Using the Praat software, recording the sibilant consonant /s/ followed by the vowel /a/
The speech signal’s spectrogram was computed using MATLAB software as follows: The signal is
initially divided into 11.6 ms segments, with adjacent segments overlapping by 9.6 ms. To obtain appropriate
frequency resolution, these segments underwent Hamming windowing, which was preceded and followed by
zero padding. After that, the fast Fourier transform (FFT) is computed. To get the normalized energy EB(n)
of vowels and fricative consonants, use (1).
$$E_B(n) = \sum_{i} 10 \log\left( |X(n,i)|^2 \right) \tag{1}$$
X(n,i) is the amplitude of the spectrum smoothed by a moving average of 20 points along the time index n.
The frequency band is represented by B. The frequency index i ranges over the DFT indices between the
lower and upper boundaries of each band. Then, using (2), we computed the energy percentage E_{Bn}(n) of
band B for each window n.
$$E_{Bn}(n) = \frac{E_B(n)}{E_T(n)} \tag{2}$$
E_T(n) represents the global energy in segment n.
In order to identify a landmark, it is necessary to measure the rate of change of a number of
characteristics derived from the speech signal over a brief period of time. The following equation was
applied to determine the rate of change (ROC) of the energy in band B.
$$R_{E_B}(n) = E_B(n) - E_B(n - J) \tag{3}$$
The time step is represented by J. This measurement gives the difference in energy between the current
window n and the one preceding it by J windows.
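The quantities in (1) to (3) can be sketched in a few lines of Python. This is only an illustrative sketch, not the MATLAB code used in the study. The function names, the base-10 logarithm, the small floor that avoids log(0), the half-open band limits, and the choice of E_T(n) as the sum over the five consonant bands are all assumptions. The 20-point moving average is omitted, so the input is assumed to be an already smoothed magnitude spectrum.

```python
import math

# Consonant frequency bands of Table 3, in Hz.
BANDS_HZ = {"B1": (100, 400), "B2": (400, 1600), "B3": (1600, 3000),
            "B4": (3000, 5000), "B5": (5000, 8000)}


def band_bins(lo_hz, hi_hz, n_fft, fs):
    """DFT indices i whose frequency i*fs/n_fft lies in [lo_hz, hi_hz)."""
    return [i for i in range(n_fft // 2 + 1) if lo_hz <= i * fs / n_fft < hi_hz]


def band_energy(frame_mag, bins, floor=1e-12):
    """Eq. (1): sum over the band of 10*log10(|X(n, i)|^2) for one frame n."""
    return sum(10.0 * math.log10(max(frame_mag[i] ** 2, floor)) for i in bins)


def band_percentages(frame_mag, n_fft, fs, bands=BANDS_HZ):
    """Eq. (2): E_B(n) / E_T(n), with E_T taken as the sum over all bands (assumption)."""
    energies = {name: band_energy(frame_mag, band_bins(lo, hi, n_fft, fs))
                for name, (lo, hi) in bands.items()}
    total = sum(energies.values())
    return {name: e / total for name, e in energies.items()}


def rate_of_change(energy_track, step):
    """Eq. (3): E_B(n) - E_B(n - J) for every frame n >= J, with J = step."""
    return [energy_track[n] - energy_track[n - step]
            for n in range(step, len(energy_track))]
```

Because (1) sums values in dB, E_B(n) can be negative for very weak frames; how the study handles that case is not stated, so the sketch leaves it untreated.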
2.2. Segmentation
Vowels and consonants are produced when the vocal tract suddenly constricts. This articulatory
action is mirrored in the speech signal’s spectrum by a sudden change at the moment in time when the sound
is closed or released [25], [26]. These time points serve as markers for determining the beginning and the end
of a consonant or vowel. We employed two types of landmark points in our approach. The first is the acoustic
cue (g), which indicates when the vocal cords begin to vibrate (g+) and when they stop vibrating (g-). These
times correspond to the points where the ROC curve of the first band B1 crosses above the threshold of 9 dB
(g+) and below the threshold of -9 dB (g-), respectively. The second is the burst acoustic cue (b), with (b+)
indicating the start of the frication noise for fricative consonants or the start of the burst for plosive
consonants, and (b-) indicating the end of the frication or aspiration noise. Between the points (g-) and (g+),
the landmark point (b) is located at the most prominent peak of the ROC curve of the bands B2 to B5. The
following landmark sequences indicate the location of a consonant or vowel: (g+, g-) marks a voiced
consonant or a vowel; (b+, g+, g-) marks a syllable that starts with a frication, with (b+) indicating the onset
of the frication; and (b+, b-, g+, g-) marks a syllable with an initial plosive, with (b+, b-) denoting the start
and end of the release.
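The landmark rules described above can be illustrated with a short Python sketch. It assumes that the ROC curves are available as lists of dB values per frame, and it interprets "the most prominent peak" as the largest ROC value across bands B2 to B5; both function names and this interpretation are assumptions, not details given in the text.

```python
def glottal_landmarks(roc_b1, threshold_db=9.0):
    """Return (g_plus, g_minus): frames where the B1 ROC crosses +9 dB upward
    or -9 dB downward."""
    g_plus, g_minus = [], []
    for n in range(1, len(roc_b1)):
        if roc_b1[n - 1] <= threshold_db < roc_b1[n]:
            g_plus.append(n)   # upward crossing of +9 dB: voicing starts
        if roc_b1[n - 1] >= -threshold_db > roc_b1[n]:
            g_minus.append(n)  # downward crossing of -9 dB: voicing stops
    return g_plus, g_minus


def burst_landmark(roc_bands, start, end):
    """Frame in [start, end) holding the largest ROC value over the given
    bands (B2 to B5 in the text)."""
    return max(range(start, end),
               key=lambda n: max(track[n] for track in roc_bands))
```

For a CV syllable with an initial fricative, the frame returned by `burst_landmark` would play the role of (b+), followed by the first (g+) and (g-) frames of the vowel.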
2.3. Support vector machine and artificial neural network methods
An artificial neural network (ANN) is a mathematical model inspired by the functioning of the
human brain. The multilayer perceptron (MLP) is a type of neural network that is widely used in
classification. In this feed-forward network, the input and output nodes are separated by one or more hidden
layers. We tested the MLP network with one and two hidden layers, each with a different number of neurons.
The input layer is made up of four neurons, whereas the output layer is made up of two neurons [27].
Several guidelines exist for choosing the number of neurons per hidden layer: the size of the hidden layer
should be either the same as the size of the input layer [28] or 75 percent of it [29].
Support vector machine (SVM) is a supervised machine learning technique for classification.
The objective is to find a decision function based on the separating hyperplane with the optimal (maximum)
margin. Support vectors are the data points closest to the hyperplane. When the data to be processed are not
linearly separable, SVM transforms the representation space of the input data into a higher-dimensional
space where a linear separation is more likely [30]. This is accomplished through the use of a kernel
function. The polynomial kernel is one of the most commonly used kernels in SVMs.
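For illustration, a polynomial kernel in its usual form K(x, y) = (gamma * <x, y> + coef0)^degree can be written as below. The parameter names and default values are assumptions chosen for this sketch; the degree and coefficients used in the experiments are not stated in the text.

```python
def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree
```

The kernel value replaces the plain dot product in the SVM decision function, which is what allows a linear separator in the implicit higher-dimensional space.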
3. RESULTS AND DISCUSSION
Our classification algorithm works in three steps. It starts by recognizing the vowel that follows
the consonant. Then, for a given vowel, it divides the consonants into two categories, alveolar and
post-alveolar. Finally, it distinguishes the consonants within each of the two categories.
3.1. Vowel classifications
In this section, we detail the operation and show the results of the classification method for the three
Arabic vowels. We first divided the time domain of each vowel into three equal segments: onset, middle,
and offset. Then, for each vowel frequency band shown in Table 1, we calculated the normalized energy in
the middle of each vowel, as shown in Figure 3.
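The split into onset, middle, and offset, followed by a per-band summary of the middle segment, might be sketched as follows. Two points are assumptions made for this illustration: leftover frames (when the frame count is not a multiple of three) go to the last segment, and the middle segment is summarized by averaging the band percentages over its frames, since the text does not say how the middle value is aggregated.

```python
def split_thirds(frames):
    """Split a sequence of frames into onset, middle, and offset segments."""
    n = len(frames)
    a, b = n // 3, (2 * n) // 3
    return frames[:a], frames[a:b], frames[b:]


def mean_band_percentage(frames_pct):
    """Average each band's energy percentage over the given frames.
    Each frame is a dict mapping a band name to its percentage."""
    bands = frames_pct[0].keys()
    return {b: sum(f[b] for f in frames_pct) / len(frames_pct) for b in bands}
```

The same two helpers apply unchanged to consonant segments, with the bands of Table 3 in place of those of Table 1.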
We see that the vowel /a/ is characterized by a high energy in the BV1 band (more than 50%),
while the BV2 band has only 40%. The energy in the BV3 band is about 10%. In the case of the vowel /u/,
band BV1 carries the most energy, roughly 80%. The BV2 band has a 20% energy level. The BV3 band has
the lowest energy value. The vowel /i/ differs from the other vowels in that it has almost no energy in
the BV2 band and a substantial share in the BV3 band (around 30 percent). These findings are consistent
with those of Abajaddi et al. [31].
Based on these observations, we developed the algorithm depicted in Figure 4 to classify the three
Arabic vowels. First, we look at the energy in the middle of the vowel in the second band BV2 (MV2). If it is
greater than 5%, we proceed to the analysis of the energy in the middle of the vowel in band BV1; if not,
we examine the energy in the middle of the vowel in band BV3. In the first case, the vowel is /a/ if the
energy in band BV1 is less than 55%; otherwise, it is /u/. In the second case, the vowel is /i/ if the energy
in band BV3 is greater than 5%; otherwise, it is /u/. The evaluation of this algorithm showed a very high
classification rate (over 98%). Table 2 summarizes the results that our algorithm achieved.
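The decision rules of Figure 4, as described in the text, amount to a small decision tree. The sketch below assumes that the three inputs are the energy percentages (0 to 100) measured in the middle of the vowel; the function name and argument names are illustrative.

```python
def classify_vowel(bv1_pct, bv2_pct, bv3_pct):
    """Classify a vowel as 'a', 'i', or 'u' from its mid-vowel band energies (%)."""
    if bv2_pct > 5.0:
        # Energy present in BV2: /a/ unless BV1 dominates, in which case /u/.
        return "a" if bv1_pct < 55.0 else "u"
    # Almost no energy in BV2: /i/ if BV3 carries energy, otherwise /u/.
    return "i" if bv3_pct > 5.0 else "u"
```

With the typical distributions reported above, `classify_vowel(52, 40, 8)` falls in the /a/ branch and `classify_vowel(65, 1, 30)` in the /i/ branch.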
Table 1. Vowel frequency bands
Band | Frequency range
BV1 | 100 to 600 Hz
BV2 | 600 to 1,800 Hz
BV3 | 1,800 to 4,600 Hz
BV4 | 4,600 to 8,000 Hz
Figure 3. The energy distribution in the middle of the three Arabic vowels in the four frequency bands
Figure 4. The algorithm for classifying the three Arabic vowels /a/, /i/, and /u/
Table 2. Accuracy of vowel classification
Vowel | Classification rate
/a/ | 98.5%
/i/ | 98.9%
/u/ | 97.4%
3.2. Classification of alveolar/post-alveolar consonants followed by vowels /a/, /i/ and /u/
3.2.1. Alveolar/post-alveolar consonants followed by vowels /a/ or /i/
To characterize the consonants, we took the following steps. We divided the time domain of each
consonant into three equal segments. Then, we calculated the normalized energy in the middle of each
consonant for each consonant frequency band in Table 3, as shown in Figure 5. By analyzing the energy
distribution graphs of alveolar and post-alveolar consonants in Figure 5, we found that the energy follows
the same pattern in bands B1, B2, and B3. On the other hand, the energy distribution differs in bands B4
and B5. For alveolar consonants, the energy proportion is less than 30% in the B4 band and larger than 50%
in the B5 band. The energy distribution for post-alveolar consonants is reversed, with more than 60% of the
energy in the B4 band and less than 10% in the B5 band.
Table 3. Consonant frequency bands
Band | Frequency range
B1 | 100 to 400 Hz
B2 | 400 to 1,600 Hz
B3 | 1,600 to 3,000 Hz
B4 | 3,000 to 5,000 Hz
B5 | 5,000 to 8,000 Hz
Figure 5. The energy distribution of alveolar and post-alveolar consonants followed by the vowel /a/, /i/ and /u/
Based on the findings, we devised the algorithm shown in Figure 6, which classifies sibilant
consonants into alveolar and post-alveolar consonants when they are followed by one of the two vowels /a/ or
/i/. The following is how the algorithm works: The consonant is classed as alveolar if the energy percentage
in the middle of the consonant in the B4 band (3,000 to 5,000 Hz) is less than 35 percent. Otherwise, the
consonant is post-alveolar. This method achieves a classification rate of 100%.
Figure 6. Alveolar and post-alveolar consonants classification algorithm followed by the vowel /a/ or /i/
3.2.2. Alveolar/post-alveolar consonants followed by the vowel /u/
The energy distribution of alveolar and post-alveolar consonants followed by the vowel /u/ is shown
in Figure 5. The first observation is that the energy distribution of consonants followed by the vowel /u/
differs from that of consonants followed by the vowels /a/ and /i/: the vowel /u/ shifts energy from higher
to lower frequencies. The second observation is that alveolar consonants have almost no energy in the B3
band, whereas post-alveolar consonants do have energy in this band. The energy in the other bands behaves
in the same way for the two consonant classes.
Based on this analysis, we developed the classification algorithm for sibilant consonants
followed by the vowel /u/, as shown in Figure 7. This method works as follows: the consonant is classified
as alveolar if the energy percentage in the middle of the consonant in band B3 (1,600 to 3,000 Hz) is less
than 5%. Otherwise, the consonant is considered post-alveolar. This algorithm has an accuracy of 90.5%.
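The two place-of-articulation rules (B4 below 35% after /a/ or /i/, B3 below 5% after /u/) can be combined into one vowel-dependent function. This is a sketch under the assumption that the band energies are given as percentages in a dictionary keyed by the band names of Table 3; the function name and the dictionary layout are illustrative.

```python
def classify_place(vowel, pct):
    """Return 'alveolar' or 'post-alveolar' from mid-consonant band energies (%).

    vowel: 'a', 'i', or 'u', the vowel following the consonant.
    pct:   dict of band energy percentages, e.g. {'B3': 2.0, 'B4': 25.0}.
    """
    if vowel in ("a", "i"):
        # Rule of Figure 6: little energy in B4 (3,000 to 5,000 Hz) means alveolar.
        return "alveolar" if pct["B4"] < 35.0 else "post-alveolar"
    if vowel == "u":
        # Rule of Figure 7: almost no energy in B3 (1,600 to 3,000 Hz) means alveolar.
        return "alveolar" if pct["B3"] < 5.0 else "post-alveolar"
    raise ValueError(f"unsupported vowel: {vowel!r}")
```

Dispatching on the vowel first mirrors the three-step design: the consonant rule cannot be chosen until the following vowel has been recognized.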
Figure 7. Classification algorithm for alveolar and post-alveolar consonants followed by the vowel /u/
3.3. Classification of post-alveolar consonants /ʒ/ and /ʃ/ accompanied by the three vowels
The distribution of the energy percentage in the middle of the two post-alveolar consonants (/ʒ/ and
/ʃ/) in the five frequency bands is shown in Figure 8. It can be observed that the energy in bands B2, B3,
B4, and B5 follows the same distribution for both consonants. The energy in the B1 band, on the other hand,
allows the two post-alveolar consonants to be differentiated. The consonant /ʒ/ is distinguished by the
presence of energy in the first band, whereas the consonant /ʃ/ has no energy in this band. On the basis of
this finding, we proposed the algorithm shown in Figure 9, which classifies the post-alveolar consonants
(/ʒ/ and /ʃ/) as follows: if the energy in band B1 is zero, the consonant is classified as /ʃ/; otherwise, the
consonant is classified as /ʒ/. This method has a classification rate of 96.76 percent.
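The rule of Figure 9 reduces to a test on the B1 energy. In the sketch below, the `zero_tol` parameter is an assumption added for illustration: the text speaks of zero energy, and a small tolerance would let measured values that are only approximately zero count as zero.

```python
def classify_post_alveolar(b1_pct, zero_tol=0.0):
    """Return 'ʃ' when band B1 carries no energy, otherwise 'ʒ'.

    b1_pct:   energy percentage in band B1 (100 to 400 Hz), mid-consonant.
    zero_tol: values at or below this are treated as zero (hypothetical parameter).
    """
    return "ʃ" if b1_pct <= zero_tol else "ʒ"
```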
Figure 8. The energy distribution of post-alveolar consonants followed by the three vowels
Figure 9. Classification algorithm for post-alveolar consonants followed by the three vowels.
3.4. Classification of alveolar consonants /z/, /s/ and /sҁ/ accompanied by the three vowels
Classifying the three alveolar consonants /z/, /s/, and /sҁ/ according to the energy in the
middle of the consonant is more difficult, as seen in Figure 10. In particular, the two consonants /s/ and
/sҁ/ have the same energy distribution in all bands, regardless of the vowel that follows the consonant.
However,
thanks to band B1, we can distinguish the consonant /z/ from the two consonants /s/ and /sҁ/. The consonant
/z/ has an energy percentage of more than 45 percent in band B1, whereas the two consonants /s/ and /sҁ/
have an energy percentage of zero in this band.
From this result, we derived the algorithm of Figure 11, which classifies the alveolar
consonants /z/, /s/, and /sҁ/. This algorithm operates as follows: if the energy in band B1 is more than 30%,
the consonant is classified as /z/; otherwise, the consonant is classified as /sҁ/ or /s/. The accuracy of this
algorithm is 94.75 percent. The results of the three methods are summarized in Table 4. We can see that
our approach consistently outperforms or equals the other algorithms (ANN and SVM).
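The alveolar rule of Figure 11 can be sketched in the same way. Since the energy distribution does not separate /s/ from /sҁ/, the function returns the set of remaining candidates rather than a single label; this return convention and the function name are choices made for the illustration.

```python
def classify_alveolar(b1_pct):
    """Return the candidate alveolar consonants from the B1 energy percentage.

    More than 30% of the energy in band B1 indicates the voiced /z/;
    otherwise the consonant is /s/ or /sҁ/, which this cue cannot separate.
    """
    return ("z",) if b1_pct > 30.0 else ("s", "sҁ")
```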
Figure 10. The energy distribution of alveolar consonants followed by the vowel /a/
Figure 11. Alveolar consonant classification algorithm followed by the vowel /a/, /i/ or /u/
Table 4. A comparison of our algorithm’s performance with that of the ANN and SVM algorithms
Classification task | Our algorithm | ANN algorithm | SVM algorithm
Vowel classifications | 93% | 83.64% | 83.33%
Classification of alveolar/post-alveolar consonants followed by vowels /a/ and /i/ | 100% | 100% | 99.63%
Classification of alveolar/post-alveolar consonants followed by the vowel /u/ | 90.50% | 90.22% | 86.11%
Classification of post-alveolar consonants /ʒ/ and /ʃ/ accompanied by the three vowels | 96.76% | 94.77% | 93.80%
Classification of alveolar consonants /z/, /s/ and /sҁ/ accompanied by the three vowels | 94.75% | 93.35% | 93.15%
3.5. Discussions
The goal of this study is to develop an algorithm that can identify the Arabic sibilant consonants (/s/,
/sҁ/, /z/, /ʒ/, and /ʃ/) followed by the three Arabic vowels (/a/, /i/, and /u/) using the normalized energy
distribution of the speech signal in the previously described frequency bands. Our algorithm begins by
identifying the vowel that follows the consonant. The acoustic analysis of the energy in the middle of the
vowel revealed that the three vowels contain a large amount of energy in the BV1 band. This behavior is
explained by the fact that vowels are voiced sounds, that is, sounds generated by the vibration of the vocal
cords. The energy of the vowel /i/ is concentrated in two bands: the first (100 to 600 Hz) and the third
(1,800 to 4,600 Hz), with the first band containing the majority of the energy. The high energy concentration
in the first band is explained by the first formant, which lies in the low frequencies (F1<300 Hz), in addition
to the energy due to the vibration of the vocal cords in voiced sounds. The second formant, which lies in the
high frequencies (F2>2,000 Hz), is responsible for the energy present in the third band. In terms of
articulation, /i/ is the most anterior and closed vowel. As a result, it has the smallest front cavity and,
consequently, a very large back cavity, resulting in a very high F2 and a very low F1. Due to the distribution
of the formants F1 and F2 (F1>600 Hz and F2>1,000 Hz), the majority of the energy of the vowel /a/ lies in
the BV1 and BV2 bands. In articulatory phonetics, /a/, being the least anterior and most open front vowel,
has a back cavity associated with a very high F1, exhibiting intermediate lowering and advancement of the
tongue. Because the first two formants F1 and F2 lie in the low frequencies (F1>100 Hz and F2 1,000 Hz),
the energy of the vowel /u/, which is the most closed, posterior, and rounded vowel, is concentrated in the
first band BV1 [32], [33].
Once the vowel has been identified, the second phase of our algorithm consists of recognizing the
consonant. Figure 5 depicts the energy distribution of the two types of sibilant consonants: alveolar and
post-alveolar. We found that when post-alveolar consonants are followed by the vowels /a/ and /i/, the
majority of their energy is concentrated in the fourth band (3,000 to 5,000 Hz), whereas alveolar
consonants have a substantial energy share in the fifth band (5,000 to 8,000 Hz). From an articulatory
standpoint, this energy distribution is explained by the point of constriction of the vocal tract. The alveolar
sibilants (/s/, /sҁ/, and /z/) are articulated between the tip of the tongue and the alveolar ridge. The alveolar
consonants show their maximum energy in the high frequencies due to the pressure of the expelled air at
this constriction. The post-alveolar sibilants (/ʒ/ and /ʃ/) are articulated between the blade of the tongue and
the rear of the alveolar ridge. At this place of articulation, the energy shifts down to the band from 3,000 to
5,000 Hz. We found that when consonants are followed by the vowel /u/, the consonant’s energy migrates to
a lower frequency range (1,600 to 3,000 Hz). This is due to the coarticulatory influence of the vowel /u/ on
the consonant that precedes it. The voiced consonants (/ʒ/ and /z/) differ from the voiceless consonants (/s/,
/sҁ/, and /ʃ/) in that they have more overall energy in the low frequencies. As previously stated, voiced
consonants are produced with vibration of the vocal cords, which explains this behavior [34], [35].
4. CONCLUSION
In this work, the Arabic sibilant fricative consonants (/s, sҁ, z, ʒ, and ʃ/), followed by the three vowels
(/a/, /i/, and /u/), were classified. The key characteristic of the proposed method is the use of the normalized
energy in frequency bands as an acoustic cue. Our findings indicate that the energy contained in the speech
signal is a critical element in sound characterization. The correct classification rate exceeds 90%. The
characterization of non-sibilant consonants will be the focus of future research.
REFERENCES
[1]
[2]
J. Cantineau, “Arabic phonetics lessons,” (in French), Paris: Klincksieck, 1960.
A. Juneja and C. Espy-Wilson, “Segmentation of continuous speech using acoustic-phonetic parameters and statistical learning,”
Int J Elec & Comp Eng, Vol. 13, No. 2, April 2023: 1997-2008
Int J Elec & Comp Eng
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
[35]
ISSN: 2088-8708
2007
in Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP ‘02., 2002, vol. 2,
pp. 726–730, doi: 10.1109/ICONIP.2002.1198153.
A. A. al Nassir, “Sibawayh the phonologist : A critical study of the phonetic and phonological theory of Sibawayh as presented in
his treatise ?Al Kitab?,” University of York, 1993.
K. N. Stevens, “Airflow and turbulence noise for fricative and stop consonants: Static considerations,” The Journal of the
Acoustical Society of America, vol. 50, no. 4B, pp. 1180–1192, Oct. 1971, doi: 10.1121/1.1912751.
P. Ladefoged and I. Maddieson, The sounds of the world’s languages. Wiley, 1996.
S. J. Behrens and S. E. Blumstein, “Acoustic characteristics of English voiceless fricatives: a descriptive analysis,” Journal of
Phonetics, vol. 16, no. 3, pp. 295–298, Jul. 1988, doi: 10.1016/S0095-4470(19)30504-2.
L. J. Raphael, G. J. Borden, and K. S. Harris, Speech science primer: Physiology, acoustics, and perception of speech. Williams
& Wilkins, 1984.
A. Jongman, R. Wayland, and S. Wong, “Acoustic characteristics of English fricatives,” The Journal of the Acoustical Society of
America, vol. 108, no. 3, pp. 1252–1263, 2000, doi: 10.1121/1.1288413.
A. M. Abdelatty Ali, J. Van Der Speigel, and P. Mueller, “Auditory-based speech processing based on the average localized
synchrony detection,” in 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat.
No.00CH37100), 2000, vol. 3, pp. 1623–1626, doi: 10.1109/ICASSP.2000.862016.
A. M. Abdelatty Ali, J. Van der Spiegel, and P. Mueller, “Acoustic-phonetic features for the automatic classification of
fricatives,” The Journal of the Acoustical Society of America, vol. 109, no. 5, pp. 2217–2235, May 2001, doi: 10.1121/1.1357814.
J. Goodacre and Y. Nakajima, “The perception of fricative peaks and noise bands,” Journal of Physiological Anthropology and
Applied Human Science, vol. 24, no. 1, pp. 151–154, 2005, doi: 10.2114/jpa.24.151.
M. Toda, “Speaker normalization of fricative noise: Considerations on language-specific contrast,” in Proceedings of the16th
International Congress on Phonetic Sciences, 2007, pp. 825–828.
M. Toda and K. Honda, “An MRI-based cross-linguistic study of sibilant fricatives,” in Proceedings of the 6th International
Seminar on Speech Production, 2003, pp. 1–6.
J. S. Perkell et al., “The distinctness of speakers’ /s/—/∫/ contrast is related to their auditory discrimination and use of an
articulatory saturation effect,” Journal of Speech, Language, and Hearing Research, vol. 47, no. 6, pp. 1259–1269, Dec. 2004,
doi: 10.1044/1092-4388(2004/095).
S. McLeod, A. Roberts, and J. Sita, “Tongue/palate contact for the production of /s/ and /z/,” Clinical Linguistics & Phonetics,
vol. 20, no. 1, pp. 51–66, Jan. 2006, doi: 10.1080/02699200400021331.
K. Maniwa, A. Jongman, and T. Wade, “Acoustic characteristics of clearly spoken English fricatives,” The Journal of the
Acoustical Society of America, vol. 125, no. 6, pp. 3962–3973, Jun. 2009, doi: 10.1121/1.2990715.
K. L. Haley, E. Seelinger, K. C. Mandulak, and D. J. Zajac, “Evaluating the spectral distinction between sibilant fricatives
through a speaker-centered approach,” Journal of Phonetics, vol. 38, no. 4, pp. 548–554, Oct. 2010, doi:
10.1016/j.wocn.2010.07.006.
Y.-Y. Kong, A. Mullangi, and K. Kokkinakis, “Classification of fricative consonants for speech enhancement in hearing devices,”
PLoS ONE, vol. 9, no. 4, Apr. 2014, doi: 10.1371/journal.pone.0095001.
A. Kochetov, “Acoustics of Russian voiceless sibilant fricatives,” Journal of the International Phonetic Association, vol. 47,
no. 3, pp. 321–348, Dec. 2017, doi: 10.1017/S0025100317000019.
D. S. Cooper, C. Scholl, L. Petrosino, R. C. Scherer, and L. H. Small, “The acoustics of fricative consonants in Gulf spoken
Arabic,” ProQuest Dissertations Publishing, Bowling Green State University, Ohio, 2005.
M. A. Al-Khair, “Acoustic characteristics of Arabic fricatives,” University of Florida, 2005.
A. Benamrane, “Acoustic study of standard Arabic fricatives (Algerian speakers),” (in French), Université de Strasbourg,
Strasbourg, 2013.
P. Ghaffarvand Mokari and N. Mahdinezhad Sardhaei, “Predictive power of cepstral coefficients and spectral moments in the
classification of Azerbaijani fricatives,” The Journal of the Acoustical Society of America, vol. 147, no. 3, pp. EL228–EL234,
Mar. 2020, doi: 10.1121/10.0000830.
Y. Elfahm, N. Abajaddi, B. Mounir, L. Elmaazouzi, I. Mounir, and A. Farchi, “Classification of Arabic fricative consonants
according to their places of articulation,” International Journal of Electrical and Computer Engineering (IJECE), vol. 12, no. 1,
pp. 936–945, Feb. 2022, doi: 10.11591/ijece.v12i1.pp936-945.
S. A. Liu, “Landmark detection for distinctive feature‐based speech recognition,” The Journal of the Acoustical Society of
America, vol. 96, no. 5, pp. 3227–3227, Nov. 1994, doi: 10.1121/1.411152.
S. Boyce, H. Fell, and J. MacAuslan, “SpeechMark: Landmark detection tool for speech analysis,” 13th Annual Conference of the
International Speech Communication Association, pp. 1894–1897, 2012.
R. P. Lippmann, “Review of neural networks for speech recognition,” Neural Computation, vol. 1, no. 1, pp. 1–38, Mar. 1989,
doi: 10.1162/neco.1989.1.1.1.
J. Bloemer, J. Lemmink, and H. Kasper, “Neural nets versus marketing models in time series analysis: A simulation study,” in
Proceedings of the 23rd Annual Conference of the European Marketing Academy, 1994, pp. 1139–1153.
V. Venugopal and W. Baets, “Neural networks and statistical techniques in marketing research,” Marketing Intelligence &
Planning, vol. 12, no. 7, pp. 30–38, Aug. 1994, doi: 10.1108/02634509410065555.
C. J. C. Burges, “Tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2,
pp. 121–167, 1998, doi: 10.1023/A:1009715923555.
N. Abajaddi, Y. Elfahm, B. Mounir, L. Elmaazouzi, I. Mounir, and A. Farchi, “Efficiency of the energy contained in modulators
in the Arabic vowels recognition,” International Journal of Electrical and Computer Engineering (IJECE), vol. 11, no. 4,
pp. 3601–3608, Aug. 2021, doi: 10.11591/ijece.v11i4.pp3601-3608.
Y. A. Alotaibi and A. Hussain, “Speech recognition system and formant based analysis of spoken Arabic vowels,” in Future
Generation Information Technology, 2009, pp. 50–60.
Y. Korkmaz and A. Boyaci, “Classification of Turkish vowels based on formant frequencies,” in 2018 International Conference
on Artificial Intelligence and Data Processing (IDAP), Sep. 2018, pp. 1–4, doi: 10.1109/IDAP.2018.8620877.
M. Toda, “Articulatory and acoustic study of sibilant fricatives,” (in French), Ph.D. thesis, Université Paris III, 2009.
Y. Meynadier, “Elements of acoustic phonetics,” (in French), in Méthodes et outils pour l’analyse phonétique des grands corpus
oraux, Hermes Science Publications, 2010, pp. 25–83.
BIOGRAPHIES OF AUTHORS
Youssef Elfahm received a master’s degree specializing in automatic control, signal
processing and industrial computing from the University of Hassan First in Settat, Morocco, in
2017. Currently, he is a professor in the Department of Electrical Engineering at Alkhaouarizmi
Technical High School. His research interests include speech recognition systems, speech
production, and artificial intelligence. He can be contacted at
[email protected].
Nesrine Abajaddi was born in Casablanca, Morocco, in 1994. She received a master’s
degree specializing in automatic control, signal processing and industrial computing from the
University of Hassan First in Settat, Morocco, in 2017. She is currently a Ph.D. student in
the Engineering, Mechanical, Industrial Management, and Innovation Laboratory,
Faculty of Sciences and Technics, Hassan First University. She can be contacted at
[email protected].
Badia Mounir was born in Casablanca, Morocco, in 1968. She received an engineering
degree in automatic and industrial computing from the Mohammadia School of Engineering,
Rabat, Morocco, in 1992. She has been an assistant professor at the Graduate School of Technology,
University Cadi Ayyad, since 1992. She has been habilitated to supervise research (HDR) since 2007
and a professor of higher education (PES) since 2017. She is a member of the Laboratory of Process,
Signals, Industrial Systems,
Informatic (LAPSSII). Her research interests include speech recognition, signal processing,
energy optimization and modeling. She can be contacted at
[email protected].
Laila Elmaazouzi is an engineer (Ing.) and holds a Ph.D. in telecommunication and networks.
She is habilitated to supervise research (HDR) at the High School of Technology, Cadi Ayyad
University, and is a member of the LAPSSII Laboratory (Laboratory of Process, Signals, Industrial
Systems, Informatic). Her research interests include telecommunication, signal processing, emotion
recognition, and machine learning. She can be contacted at
[email protected].
Ilham Mounir holds a Ph.D. in applied mathematics. She is habilitated to supervise
research (HDR) at the High School of Technology, Cadi Ayyad University, and is a member of LAPSSII
(Laboratory of Process, Signals, Industrial Systems, Informatic). Her research interests include
applied mathematics, signal processing, emotion recognition, speech recognition, and energy
optimization and modeling. She can be contacted at
[email protected].
Abdelmajid Farchi is an engineer (Ing.) who received a Ph.D. in electrical engineering and
telecommunications and now heads the “Signals and Systems” research team in the Laboratory of
Engineering, Industrial Management and Innovation. He is the pedagogical coordinator of the
engineering cycle in electrical systems and embedded systems at the Faculty of Sciences and
Technology of Settat, Morocco. He can be contacted at
[email protected].