Report End of Studies
ASSIAR Nouheila
Title :
Host Company: Schneider Electric
Internal
Acknowledgments
I am deeply grateful to everyone who supported and accompanied me during my
end-of-studies internship. First, I would like to thank my supervisors at Schneider Electric,
Denis DODELIN and Nicolas WENZEL, for their efforts. Their guidance helped me to better
understand research methodology and allowed me to sharpen my thinking and
obtain new findings. I would also like to express my sincere gratitude to my professors
at Arts et Metiers, Arnaud POLETTE and Mustapha HAIN, who provided me with clear
direction and explanations and were always available to answer my questions and give
valuable feedback. Finally, I would like to give special thanks to the project team, who
were always willing to share their expertise, explanations, and details of the project and
contributed in a variety of ways to facilitate my integration into the company.
Abstract
Deep learning has gained much interest in many fields of diagnosis; however, there is still a lack
of research on fault diagnosis of circuit breakers. This paper proposes Shazam-Net, a lightweight
deep learning network that combines a convolutional neural network with a recurrent
neural network. It takes the raw audio signal of the tripping sound in the circuit breaker and classifies it
into five classes of faults with an accuracy of 95%. The network is composed of three parts: a
convolutional block that extracts information from the raw audio waveform, long short-term
memory (LSTM) layers that retrieve the temporal dependencies from the high-dimensional output,
and finally a multi-layer perceptron that classifies the extracted features into
five classes. In addition, quantization and pruning are used to compress the
model so that it can be embedded in devices with limited resources. The proposed approach is trained and
validated using sound signals collected from an embedded microphone recording the tripping event
inside the circuit breaker. The results show that the network achieves high diagnostic accuracy compared
to other models that require more memory.
Contents
Acknowledgments
Abstract
List of Figures
List of Tables
I-Introduction
  I.1 Problem Statement
  I.2 Motivation and Contribution
  I.3 Overview of the organization
II-State-of-the-Art
  II.1 The spectrogram-based approach
  II.2 The raw audio signal-based approach
III-Shazam-Net architecture
IV-Experimental Setup
  IV.1 Data generation pipeline
    IV.1.1 Data acquisition
    IV.1.2 Data pre-processing
    IV.1.3 Data augmentation
  IV.2 Implementation
    IV.2.1 Models development
    IV.2.2 Models deployment
V-Results and discussion
  V.1 Evaluation criteria
  V.2 Quantitative analysis
  V.3 Synthesis of comparison
  V.4 Qualitative analysis
VI-Conclusion and future work
References
List of Figures
Figure 1: Compact NSX250 circuit breaker with the smart cover for signal classification function
Figure 2: Schneider Electric – R&D hub Electropole
Figure 3: Overall architecture of Shazam-Net
Figure 4: Internal structure of an LSTM unit
Figure 5: Experimental setup used to record the signals in the circuit breaker
Figure 6: Sound vibration signals
Figure 7: Data pre-processing pipeline for tripping sound classification in the circuit breaker
Figure 8: Example of aberrant signal
Figure 9: Audio signal with a portion of silence
Figure 10: Architecture of 1D convolution neural network
Figure 11: Spectrogram representation of High-overcurrent signal
Figure 12: Architecture of 2D convolution model
Figure 13: Architecture of a multi-layer perceptron with 1 dense layer
Figure 18: Architecture of different models used for the evaluation
Figure 15: Classification accuracy and number of parameters for various filter sizes
Figure 16: Classification accuracy and number of parameters for various numbers of convolution layers and filters
Figure 17: Accuracy for each number of dense layers with various numbers of hidden neurons in MLP
Figure 18: Results in terms of Flash and RAM before and after optimization of Shazam-Net
Figure 19: Specifications of different STM32 boards
Figure 20: Inference time of Shazam-Net model on different STM32 microcontrollers
Figure 21: Confusion matrix of Shazam-Net
List of Tables
I-Introduction
I.1 Problem Statement
The circuit breaker is one of the most important components in a power system. It is designed to
protect an electrical circuit in the event of an overcurrent or short circuit. There are many different types
of circuit breakers depending on their operating voltage, from low-voltage ones designed to protect
low-voltage circuits or household equipment, to high-voltage ones used to protect the high-voltage circuits that feed
an entire city. The function of the circuit breaker is to interrupt current flow to prevent the risk of fire;
the circuit breaker is then reset to resume normal operation.
Traditional maintenance is costly and time-consuming. In recent years, various
approaches have been proposed to diagnose the circuit breaker using the vibration signal. Initially,
machine learning algorithms relied on features extracted from the sound signal, such as the root mean square
envelope, the amplitude envelope, the energy distribution and other signal characteristics, to analyse the
signals. Recently, deep learning-based methods have received attention from researchers in the field.
Two approaches exist. The first extracts spectrograms from the signals, which transforms
the 1D signal into images, and applies image classification techniques to classify them. The second
directly uses the one-dimensional signal as the model input.
Although these techniques have been shown to be effective, two issues remain open. First,
most of the existing models have a high number of parameters and hence require a large amount of memory
to be deployed on an edge device. Second, prior knowledge and parameter selection are required to
extract the features from the signal, so insufficient expert knowledge may reduce the diagnostic
accuracy of these methods.
Figure 1: Compact NSX250 circuit breaker with the smart cover for
signal classification function
I.2 Motivation and Contribution
The project team has already worked on models that classify the trip causes by extracting features
from the tripping sound signal and applying different machine learning algorithms. Recently, deep
learning has been successfully used to solve complex problems that are sometimes difficult to solve with
more traditional approaches. It is therefore important to investigate deep learning approaches for
this application and whether they can be deployed despite its constraints.
Our motivation is to fill the gap in the existing techniques by proposing a novel deep learning
approach able to classify the trip causes of the circuit breaker directly from the raw audio signal.
In summary, the main contributions of this study are threefold:
(i) To the best of our knowledge, this is the first attempt to classify the trip causes of a
low-voltage circuit breaker by applying a deep learning algorithm to the sound signal.
(ii) A model that takes the raw tripping audio signal of the circuit breaker as input and
classifies it into five classes of faults with an accuracy of 95.50%.
(iii) A lightweight model with a small number of parameters that can be embedded in
devices with limited resources.
The paper is organized as follows. Section 2 reviews the works related to deep learning
approaches for diagnosing the circuit breaker using the sound signal. Section 3 presents the overall
architecture of Shazam-Net with the details of each part. Section 4 describes the
experimental setup, and the results are presented and discussed in section 5. The paper ends with a
conclusion and a discussion of future work in section 6.
I.3 Overview of the organization
Schneider Electric is a world leader in energy management and automation, present in more than 115
countries. It is a global specialist in electrical management, medium voltage, low voltage, and secure
energy. The company proposes integrated solutions for residential housing, buildings, the service sector,
data centers, infrastructure, and industry. Schneider Electric is a Fortune Global 500 company and is
publicly traded on the Euronext Exchange. The company offers products and services to meet the needs
of its customers, from energy and sustainability consulting to optimizing the lifecycle of assets. It has
made several acquisitions over the years, including Télémécanique in 1988, Square D in 1991, and
Merlin Gerin in 1992. In January 1999, Schneider acquired the Scandinavian switch-maker Lexel, and later
that year the company renamed itself Schneider Electric to reflect its focus on the electricity sector.
In November 2021, the company launched a global AI Hub focused on data and analytics. It has
more than 200 AI experts collaborating with the company's market-leading domain experts to offer
the best services to customers and remain competitive in the technology market.
To assist its customers in making agile decisions, the AI Hub aims to offer AI solutions via
the cloud and the edge in different fields. These include energy efficiency using solar panels,
automation in electric vehicles, improving the customer experience on websites, and predictive
maintenance applications on different products. The AI Hub also works to improve internal processes by
developing chatbots for recruitment as well as marketing and logistics tools.
The Electropole site, located in Eybens, is one of Schneider Electric's main R&D centers for
low-voltage electrical distribution products. It comprises 33,500 m2 of premises, including 7,000 m2
of laboratories. My internship took place in the Energy Management Business Unit, which includes 5
divisions, among them Home and Distribution, Power Systems, Digital Energy, and Secure Power. I had the
opportunity to work within the power product division under the supervision of Denis DODELIN,
within the Embedded system team managed by Jean-Baptiste BERNARD.
II-State-of-the-Art
Research in fault diagnosis has gained much momentum in recent years. Different techniques
for circuit breakers have been developed to reduce maintenance time and costs. This section gathers
traditional ML techniques and deep learning approaches that address the fault diagnosis of circuit
breakers.
Embedding a microphone inside the circuit breaker is a complex task. Therefore, most of the
existing fault diagnosis methods are based on vibration signals, and there is a lack of research
on techniques based on the sound signal. Vibration and sound signals differ, but
both are represented by a time-series waveform. It is therefore worthwhile to also review
techniques applied to vibration signals, although our research is focused on sound signals.
The traditional approaches rely on information extracted from the signal to describe its
properties, such as the RMS, the amplitude envelope of the signal and the zero-crossing rate. H.
N. et al. [1] use a support vector machine on the vibration signal to detect mechanical
faults in high-voltage circuit breakers, but the method can only distinguish the fault and no-fault states of the circuit breaker
and cannot identify the category of fault. W.B et al. [2] exploited 16 features extracted
directly from the original signal in the time domain to construct a light gradient boosting model that can
locate faulty components in power distribution systems. These methods show good results but suffer
from several limitations. For instance, the growth strategy of the decision tree directly affects the
efficiency of classification, and a large number of features increases the risk of overfitting. In
recent years, a few attempts have been made to overcome the limitations of traditional methods by using
deep learning models to analyse the signals and monitor the state of the circuit breaker. The techniques
for fault detection based on the sound signal can be classified into two main categories, namely the
spectrogram-based approach and the raw audio signal-based approach.
II.1 The spectrogram-based approach
Instead of directly processing the raw audio signal, Yang et al. [3] used the Hilbert-Huang
transform to map the raw audio signal into the frequency domain; a 2D convolutional neural
network is then trained to recognize the electrical fault. L. A. P. et al. [4] used AlexNet, a 2D convolutional
neural network (2D-CNN), to diagnose the faults of the contact system in a conventional circuit breaker
based on spectrograms extracted from the original vibration signal. Although these methods seem to
achieve good results, information can be lost when the raw signal is transformed into a spectrogram,
which affects the fault diagnosis performance. Ling et al. [6]
introduced a dual-stream convolutional neural network based on bi-spectrum analysis to detect three types
of faults: jam of the tripping and closing electromagnet, jam of the principal axles, and half-shaft jam. Qiang et al.
[7] used a long short-term memory network to detect four fault types in an HVDC substation located in southwest
China. Luo et al. [8] proposed a technique to diagnose faults by applying the sparse representation
classification (SRC) algorithm to the Mel-frequency cepstrum coefficients (MFCC) extracted from the
tripping sound signal of the circuit breaker. The MFCC features are extracted based on the characteristics
of human hearing; however, experiments show that they are unable to differentiate the fault causes from
the audio signal. Although the above research has achieved some good results, the
following problems remain:
• Most of these techniques rely on features extracted from the raw signal. These features
are considered hand-crafted, as various parameters such as the window
type, the number of Mel bins, and the hop length must be defined to extract the spectrogram. The results
of fault detection models are directly affected by the quality of these features and hence by
the choice of suitable parameters, which requires strong expert experience.
• The spectrogram-based approach requires a pre-processing step to transform the raw audio
signal into a spectrogram represented as a two-dimensional image, which consumes a large
amount of computing resources.
II.2 The raw audio signal-based approach
The second approach keeps the raw audio signal unchanged and uses the amplitude of the signal
at each timestamp to train the network. Sun et al. [9] adapted a one-dimensional convolutional neural
network. H. F. et al. [10] proposed a feature fusion module composed of autoencoders to extract
information entropy from the signals and constructed a dual intelligent fault diagnosis algorithm. Zhang
et al. [11] introduced a novel method named deep convolutional neural networks with Wide First-layer
kernel (WDCNN), which takes raw sound vibration signals as input and uses wide kernels in the first
convolution layer to suppress high-frequency noise in the signals. Several works have demonstrated the
ability to obtain good performance using convolutional neural networks. Reference [12] proposed the Simple Net network
to learn from the raw vibration signal with a very simple structure and high model interpretability. Reference [13]
introduced a one-dimensional convolutional capsule neural network that adds an attention mechanism
after the pooling layer to increase the feature extraction capability. These methods have been shown to be
effective thanks to their ability to learn features from the raw audio signal, and they overcome the
drawbacks of the spectrogram-based approach. However, they still suffer from a very large number of
parameters because the input signal is high-dimensional. In addition, they need a large dataset of signals to
prevent overfitting, which is hard to obtain because of the difficulty of measurement in the circuit breaker
environment.
III- Shazam-Net architecture
To extract the characteristics of sound signals and improve the accuracy of fault diagnosis,
Shazam-Net is proposed, combining the traditional 1D-CNN with a recurrent neural network. The overall
structure is shown in Fig 3.
The intuition behind Shazam-Net is to let the model itself perform the feature extraction efficiently
and to build a full architecture with a limited number of parameters that can determine the right current class
from the raw sound signal. To achieve this goal, Shazam-Net has been implemented with three
main components: a feature extractor, a temporal aggregation mechanism, and a multi-layer perceptron
classifier.
• Feature extractor: used to extract features from the raw audio signal in an efficient
way. It has been implemented as a convolutional neural network. The input is a
one-dimensional vector [x_1, x_2, x_3, ..., x_t], where x_t is the amplitude of the signal
at timestamp t. We feed the vector into a 1D convolutional neural network for feature
extraction. A 1D-CNN is a neural network used to recognize patterns in data. It
typically has two main layers: a convolutional layer and a pooling layer. The first layer
extracts features that distinguish different waveforms by performing a dot product
between two vectors. One vector is the set of learnable parameters, otherwise known as
a kernel or filter, and the other vector is the audio signal. The second layer is the pooling
layer, which summarizes the presence of features in the output of the
convolutional layers and thereby reduces the parameters and overall computation of the
network. CNNs are among the most used architectures for computer vision, and they also
appear in almost all state-of-the-art models operating on audio waveforms. However, most of the
existing approaches tried to classify audio signals using a convolutional neural network
with a large number of layers, which makes the model difficult to optimize and to
embed.
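The convolution and pooling operations described above can be illustrated with a minimal pure-Python sketch. The kernel weights, the ReLU activation and the pooling size are assumptions chosen for illustration; they are not the settings of Shazam-Net, whose kernels are learned during training.

```python
def conv1d(signal, kernel, bias=0.0):
    """Slide `kernel` over `signal` (stride 1, no padding) and take dot products."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k])) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Assumed activation: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Keep the maximum of each non-overlapping window of `size` samples."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


signal = [0.0, 1.0, 3.0, 2.0, -1.0, 0.5]  # toy amplitudes x_1 ... x_t
kernel = [1.0, 0.0, -1.0]                 # illustrative weights, normally learned
features = max_pool1d(relu(conv1d(signal, kernel)), size=2)
```

Each output of `conv1d` is the dot product between the kernel and one window of the signal, and `max_pool1d` halves the length of the feature vector, which is how pooling reduces the computation of the following layers.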
Figure 4: Internal structure of an LSTM unit. h_(t-1): hidden state at the previous timestamp t-1 (short-term
memory); C_(t-1): cell state at the previous timestamp t-1 (long-term memory); X_t: input vector at t;
h_t: hidden state at t; C_t: cell state at t; σ: sigmoid function
An LSTM neural network is composed of multiple LSTM cells, and each LSTM cell
contains a certain number of units. Figure 4 shows the internal structure of an LSTM
unit. The architecture has two main states: the cell state, which represents the long-term
memory, and the hidden state, which represents the short-term memory. To manage the
interaction between the long-term and short-term memories, the LSTM unit is composed of
three gates:
1-Forget gate: This gate controls what information should be thrown away
from the cell state, as described in equation (1). Depending on the result of the sigmoid
function, the information in the cell state is completely discarded (a value of
0), fully remembered (a value of 1), or partially preserved (a value between 0 and 1).
2-Input gate: This gate identifies what information will be written into
the cell state. It contains two main blocks: a first block that combines the short-term
memory and the input to create a potential long-term memory according to equation
(2), and a second block that determines what percentage of that potential memory to add
to the long-term memory according to equation (3).
3-Output gate: This final stage of the LSTM updates the short-term memory, or
hidden state. The long-term memory is passed through a tanh activation function,
and the output represents a potential short-term memory. This output is multiplied by
the result of the sigmoid function to determine what percentage of the information should
be remembered, according to equations (4) and (5).
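The three gates can be sketched for a single scalar LSTM unit as follows. This is a minimal illustration that assumes the standard LSTM formulation; the equations numbered (1) to (5) in the report are not reproduced here, and the weight dictionary `p` with its gate names is a hypothetical container introduced only for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_unit_step(x_t, h_prev, c_prev, p):
    """One time step of a single scalar LSTM unit.

    `p` maps each gate name to hypothetical (w_x, w_h, b) weights.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre("forget"))            # share of the long-term memory to keep
    c_tilde = math.tanh(pre("candidate"))   # potential long-term memory
    i_t = sigmoid(pre("input"))             # share of the candidate to add
    c_t = f_t * c_prev + i_t * c_tilde      # updated cell state (long-term memory)
    o_t = sigmoid(pre("output"))            # share of the potential short-term memory to keep
    h_t = o_t * math.tanh(c_t)              # updated hidden state (short-term memory)
    return h_t, c_t


# With all weights at zero every gate outputs 0.5 and the candidate is 0.
zero = {name: (0.0, 0.0, 0.0) for name in ("forget", "candidate", "input", "output")}
h_t, c_t = lstm_unit_step(x_t=0.3, h_prev=0.1, c_prev=2.0, p=zero)
```

In a full layer the scalars become vectors and the weights become matrices, but the order of operations stays the same.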
IV-Experimental Setup
This section introduces the data generation pipeline designed to train and validate the Shazam-Net
model. It also covers the settings and details of the different blocks of Shazam-Net, and the
implementation of other networks, including a multi-layer perceptron, a 1D-CNN and a 2D-CNN, used to compare
the results in terms of accuracy and number of parameters.
IV.1.1-Data Acquisition
We intend to classify the tripping events of the Compact NSX250 circuit breaker using the
tripping sound. For this purpose, a Compact NSX250 from Schneider Electric's product range is used. Fig
5 shows the experimental equipment diagram.
Figure 5: The experimental Setup used to record the signals in the circuit breaker
A microphone is used to record the tripping sound of the circuit breaker while different
values of current are injected into it. Specifically, an audio dataset with 5 different classes is collected based on the
current that caused the trip. We also define a "no current" scenario, in which the user operates the circuit breaker
manually, to represent the case where no fault has occurred to trip the circuit breaker.
Figure 6 shows different signals with their respective current class. From the figure, we can observe that
the waveforms of different types of signals are significantly different. However, signals within the same
class cannot always be recognized by ear or from the waveform alone.
Figure 6: The sound vibration signal: (a) no current, (b) small current, (c) low overcurrent, (d) moderate
overcurrent, (e) high overcurrent
During data acquisition, information about the experimental conditions is collected. It includes
the station, the product serial number, the microphone reference, the current and voltage values, the type
of defect (single-phase, bi-phase or three-phase) and the signal recording time. This information may
be needed to compare and interpret the models' classification mistakes.
The sound signal during the circuit breaker operation is collected by a data acquisition card at
a sampling rate of 55 kHz for a duration of 0.54 s. A total of 2500 signals is collected. The experimental
sample information is shown in Table 1. As described in the table, the classes are
not balanced: there are fewer moderate-OC and high-OC signals than signals of the other classes, because obtaining
signals in this range of current damages the circuit breaker, which is costly and more
time-consuming since broken breakers have to be replaced more often during the test. This paper studies the
classification of the sound signals into the five classes to determine the trip causes of the circuit
breaker.
No current 0A 200
IV.1.2-Data pre-processing
The dataset is pre-processed and used to compare Shazam-Net with other deep learning models.
Data pre-processing is an important step in building any machine learning model and enhancing its
performance. It includes removing aberrant signals, resampling, removing silence and splitting the data.
Figure 7 shows the data pre-processing pipeline for tripping sound classification in the circuit breaker.
Figure 7 : The data pre-processing pipeline for tripping sound classification in the circuit
breaker.
The first step in pre-processing is to remove the aberrant signals caused by microphone recording
errors, as they can influence the overall dataset. Figure 8 shows an
example of an aberrant signal. The maximum amplitude of each signal is calculated, and the signal is
removed from the dataset according to a threshold on this value.
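The amplitude check can be sketched as follows. The report does not state the threshold value or the direction of the comparison, so this example assumes that a signal is aberrant when its peak absolute amplitude exceeds the threshold; the function names and the toy recordings are illustrative only.

```python
def peak_amplitude(signal):
    """Maximum absolute amplitude of one recording."""
    return max(abs(s) for s in signal)


def remove_aberrant(signals, threshold):
    """Keep only signals whose peak amplitude stays at or below `threshold` (assumed rule)."""
    return [s for s in signals if peak_amplitude(s) <= threshold]


recordings = [[0.1, -0.4, 0.2], [0.0, 9.5, -0.3], [0.05, 0.3, -0.25]]
clean = remove_aberrant(recordings, threshold=1.0)  # threshold value is illustrative
```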
The original raw audio signals have a varying sampling rate, which is not suitable because training a
deep learning model requires inputs of the same length. The length of the one-dimensional vector directly
determines the input size and hence the number of parameters of the models. Resampling is therefore applied
to balance computational cost and accuracy. The resulting one-dimensional vector has a sampling rate
of 16 kHz, which results in 1200 samples, where each sample corresponds to the amplitude of the signal at one
timestamp.
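As an illustration of resampling to a common rate, the sketch below uses plain linear interpolation. This is an assumption made for the example: the report does not say which resampling method was used, and a production pipeline would normally rely on a dedicated resampler with anti-aliasing filtering.

```python
def resample_linear(signal, src_rate, dst_rate):
    """Resample by linear interpolation between neighbouring samples (no anti-aliasing)."""
    n_out = int(len(signal) * dst_rate / src_rate)
    step = src_rate / dst_rate          # distance between output samples, in input samples
    last = len(signal) - 1
    out = []
    for k in range(n_out):
        t = k * step
        i = min(int(t), last)
        frac = t - i
        nxt = min(i + 1, last)
        out.append(signal[i] * (1.0 - frac) + signal[nxt] * frac)
    return out


halved = resample_linear([0.0, 1.0, 2.0, 3.0], src_rate=4, dst_rate=2)
```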
Figure 9 shows that a large portion of each signal consists of silence, which does not help the
models distinguish between different classes. Therefore, we add a function that removes silence in order
to reduce the length of the input signal, and then apply zero-padding to ensure that all the signal vectors
are of equal length.
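A minimal sketch of these two steps is given below. It assumes that silence is detected with a simple threshold on the absolute amplitude and removed only at the beginning and end of the signal; the report does not specify the detection rule, and both function names are illustrative.

```python
def trim_silence(signal, threshold):
    """Drop leading and trailing samples whose absolute amplitude is below `threshold`."""
    loud = [i for i, s in enumerate(signal) if abs(s) >= threshold]
    if not loud:
        return []
    return signal[loud[0]:loud[-1] + 1]


def pad_to_length(signal, length):
    """Right-pad with zeros (or truncate) so every vector has exactly `length` samples."""
    return (signal + [0.0] * length)[:length]


raw = [0.0, 0.01, 0.5, -0.3, 0.0, 0.2, 0.02, 0.0]
trimmed = trim_silence(raw, threshold=0.1)   # threshold value is illustrative
fixed = pad_to_length(trimmed, length=6)
```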
The dataset is split into three subsets: the train set is used to fit the model parameters, the validation set is used to
tune the model hyperparameters, and the test set is reserved to evaluate the models. A ratio of 80% is
reserved for the train set, 10% for the validation set and 10% for the test set, which implies that the train set contains
1800 signals, the validation set contains 255 signals and the test set contains 140 signals.
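The three-way split can be sketched as a shuffle followed by two cuts. This is only an illustration: the seed, the function name and the label-agnostic shuffling are assumptions, and with imbalanced classes a per-class split may be preferable.

```python
import random


def split_dataset(items, train=0.8, val=0.1, seed=0):
    """Shuffle `items` and cut them into train / validation / test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],   # the remainder goes to the test set
    )


train_set, val_set, test_set = split_dataset(range(100))
```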
IV.1.3 Data augmentation
The original dataset contains only 2500 audio signals, which is not sufficient for deep learning models.
Therefore, several augmentations have been applied to increase the size of the dataset.
Data augmentation is a technique used in machine learning to increase the diversity of the
training set. Applying different transformations to the original dataset reduces overfitting, especially
when the dataset is small. For audio signals, there are two types of augmentations:
augmentations applied directly to the raw audio, and augmentations applied to the
spectrograms extracted from the audio signal. The specific transformations applied to the dataset are described in
table 2. Although these augmentations add a level of robustness to the model, they can introduce
artifacts that compromise the legitimacy of the original data. Therefore, the dataset is split
before augmentation, and augmentations are applied only to the train set, whereas the
validation and test sets remain untransformed so that the performance
of the models is measured accurately. Specifically, we apply 6 augmentations directly to the raw audio signals of the training
set, which increases the size of the train set from 1800 signals to 10 800 signals. All
augmentations are applied using the audiomentations Python library.
Transformation Type         Transformation          Description
Raw audio augmentations     Time stretching         Increase/decrease the audio speed
Raw audio augmentations     Pitch scaling           Change the pitch or frequency
Raw audio augmentations     Time shifting           Shift audio left/right along the time axis
Raw audio augmentations     Noise addition          Add Gaussian or white noise
Raw audio augmentations     Low/high-pass filters   Apply various filters individually
Spectrogram augmentations   Time masking            Mask a certain portion of time
Spectrogram augmentations   Frequency masking       Mask certain frequencies
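Two of the raw audio transformations listed above, time shifting and noise addition, can be illustrated with hand-written stand-ins. This sketch does not call the audiomentations library used in the project; the wrap-around shift, the noise level and the function names are assumptions made for illustration.

```python
import random


def time_shift(signal, offset):
    """Shift the signal along the time axis, wrapping samples around the ends."""
    offset %= len(signal)
    return signal[-offset:] + signal[:-offset] if offset else list(signal)


def add_gaussian_noise(signal, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def augment_train_set(train_signals, seed=0):
    """Return one shifted and one noisy copy of every training signal."""
    rng = random.Random(seed)
    augmented = []
    for s in train_signals:
        augmented.append(time_shift(s, rng.randrange(len(s))))
        augmented.append(add_gaussian_noise(s, sigma=0.01, rng=rng))
    return augmented


extra = augment_train_set([[0.1, 0.2, 0.3, 0.0]] * 4)
```

Only the train set is passed to `augment_train_set`, in line with the choice to leave the validation and test sets untransformed.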
IV.2 Implementation
The main experiment consists of two steps. The first is to implement a lightweight
model able to identify the trip causes using the raw signal of the tripping sound. The second is
to optimize the size of the model and compare different neural networks on different
microcontrollers. This section explains the implementation of Shazam-Net along with the other
models, including the multi-layer perceptron, the 1D-CNN and the 2D-CNN.
As no previous research has been done on a similar dataset, a direct comparison is not possible.
However, to assess the performance of the proposed method, we rely on models previously used
in audio signal classification works. Three models have been tuned on our dataset: a 1D convolutional
neural network, a multi-layer perceptron and a 2D convolutional neural network trained on spectrograms
extracted from the raw audio signal.
Several experiments have been conducted to choose the best architecture of each model. With this
experiment, we are interested to compare the performance of the proposed model shazam-Net with other
architectures in terms of accuracy and number of parameters. The same training strategy has been
adopted to train the simple 1D convolution neural network, the MLP and 2D-CNN Network.
1. Shazam-Net
To decide the architecture of each of the three components of Shazam-Net, several experiments have
been conducted:
• First, we determine the architecture of the feature extractor block. As mentioned in
section 4, it consists of a 1D convolutional neural network. Architecture parameters
such as the number of convolution layers, the number of pooling layers, the number of
filters and the filter size directly affect performance. There are too many combinations
to try every possible architecture, so I trained the model by changing only one
parameter at a time, keeping the best value, then proceeding to the next parameter.
• Multi-layer perceptron classifier: the MLP is used in the last part of the architecture.
It consists of one or more non-linear layers called hidden layers; each layer has
multiple neurons that learn a set of weights. Generally, with a small dataset, a large
number of hidden layers is not needed. However, there is no proven or accepted theory
for determining the number of neurons in a hidden layer, so selecting a suitable number
of hidden layers and hidden neurons is crucial for an optimal classification result.
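The one-parameter-at-a-time strategy described in the first bullet can be sketched as follows. This is
only an illustration: `toy_evaluate` is a hypothetical stand-in for "train the model and return the
validation accuracy", and the candidate values in `search_space` are assumptions, not the grid used in
the study.

```python
def one_at_a_time_search(search_space, defaults, evaluate):
    """Tune one parameter at a time, freezing the best value before moving on."""
    best_config = dict(defaults)
    best_score = evaluate(best_config)
    for name, candidates in search_space.items():
        for value in candidates:
            trial = dict(best_config, **{name: value})
            score = evaluate(trial)
            if score > best_score:
                best_config, best_score = trial, score
    return best_config, best_score

# Hypothetical scoring function; a real run would train and validate a model here.
def toy_evaluate(config):
    return (
        1.0
        - 0.01 * abs(config["n_conv_layers"] - 3)
        - 0.001 * abs(config["n_filters"] - 32)
        - 0.002 * abs(config["filter_size"] - 20)
    )

search_space = {
    "n_conv_layers": [1, 2, 3, 4],
    "n_filters": [8, 16, 32, 64],
    "filter_size": [3, 5, 10, 20, 40],
}
defaults = {"n_conv_layers": 1, "n_filters": 8, "filter_size": 3}

best_config, best_score = one_at_a_time_search(search_space, defaults, toy_evaluate)
print(best_config, round(best_score, 3))
```

With this grid the search needs 1 + 4 + 4 + 5 = 14 evaluations instead of the 4 × 4 × 5 = 80 of an
exhaustive search. The price is that interactions between parameters are ignored, so the result is not
guaranteed to be the global optimum.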
The notation (10,20,30) in section 5.1 indicates that three convolution layers were used, with 10, 20
and 30 filters respectively. Temporal aggregation is handled by the LSTM network; the number of LSTM
layers and the number of units in each layer are the main hyperparameters that influence the results
of this second part of the network. In the same manner, the notation (10,20) indicates two LSTM
layers with 10 and 20 units respectively. Finally, the last part of the network is a multi-layer
perceptron whose last layer has 5 neurons, matching the number of current classes; the number of
dense layers and the number of neurons in each layer are varied.
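Since configurations are compared by their number of parameters, it can help to see how a notation
such as (10,20,30) / (10,20) translates into a parameter count. The sketch below uses the standard
formulas for 1D convolution, LSTM and dense layers. The kernel size of 3, the single input channel and
the choice that only the last LSTM state feeds the classifier are assumptions made for the example;
pooling layers add no parameters.

```python
def conv1d_params(in_channels, filters, kernel_size):
    # one kernel of kernel_size x in_channels weights per filter, plus one bias per filter
    return (kernel_size * in_channels + 1) * filters

def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # one weight per input per neuron, plus one bias per neuron
    return (input_dim + 1) * units

def count_params(conv_filters, lstm_units, dense_units, kernel_size=3, in_channels=1):
    """Total trainable parameters of a conv -> LSTM -> dense stack."""
    total = 0
    channels = in_channels
    for f in conv_filters:
        total += conv1d_params(channels, f, kernel_size)
        channels = f
    features = channels
    for u in lstm_units:
        total += lstm_params(features, u)
        features = u
    for d in dense_units:
        total += dense_params(features, d)
        features = d
    return total

# (10,20,30) convolution filters, (10,20) LSTM units, one output layer with 5 neurons
print(count_params((10, 20, 30), (10, 20), (5,)))
```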
Setting the appropriate configuration requires a large number of experiments, which is time
consuming and computationally expensive. However, some assumptions can be made to limit the range
of experiments. Generally, with a small dataset, a small number of layers is enough to build a
suitable model. In addition, the aim of this study is to build an accurate model with a small number
of parameters, and a deeper network with a large number of units would significantly increase the
size of the network. We therefore limit the number of layers in each block of the network.
In deep learning, different hyperparameters, including the loss function, the number of epochs and
the batch size, generate different results. The optimum values are decided based on experience and on
the effectiveness of the model on the validation set. For this purpose, the network was randomly
initialised and trained for 50 epochs, using the cross-entropy loss function with the Adam optimizer
and a batch size of 64. Different values were then tested to decide the optimum hyperparameters. The
results for the different hyperparameters are shown in section 5.1.
2. 1D-Convolution neural Network
In this experiment, we feed the raw audio waveform to the model and let it extract the useful
features from the signal. First, we investigate how the number of convolutional layers and filters
affects the network; second, how much the pooling layers affect learning.
Filter size is one of the parameters of a convolution layer; it refers to the length of the filter
used for the convolution operation. Most existing architectures use 3 or 5 as the filter size by
analogy with networks from the computer vision field; however, to our knowledge, no experiment or
study explains the idea behind this size. In a convolutional neural network, does increasing the
kernel size always result in better accuracy? To answer this question, we use one convolution layer
with 20 filters and various filter sizes, followed by a max pooling layer. Figure 10 shows the
accuracy along with the number of parameters for the different filter sizes.
In a second experiment we investigate the number of convolution layers and filters. We begin with a
simple network containing one convolutional layer followed by a max pooling layer and a dense layer
with 5 neurons to perform the classification. The architecture used in these experiments is presented
in figure 12. It is common for a convolutional layer to learn from 32 to 512 filters in parallel for
a given input [17]. We varied the number of filters within this range to find the best configuration.
Figure 13 shows the classification accuracy along with the number of parameters for different numbers
of filters in the convolution layer.
3. 2D-Convolution neural Network
The 2D convolutional neural network is the model used for the spectrogram-based approach: first we
extract the spectrogram from the raw audio, then we feed it to a 2D convolutional neural network. A
spectrogram is a time-frequency representation of a signal.
Figure 11 represents the spectrogram of a high-overcurrent audio signal. The vertical axis displays
frequencies in hertz, the horizontal axis represents time, and amplitude is represented by
brightness: the brighter the colour, the more the sound energy is concentrated around those
frequencies, and the darker the colour, the quieter the sound. Spectrograms are obtained by first
splitting the audio into overlapping windows, then applying the Fourier transform to each window (the
short-time Fourier transform). This is considered a handcrafted approach, as we must define several
parameters, including the window length, the window type and the hop length, which directly affect
the shape of the spectrogram and hence the performance and the number of parameters of the model.
In this experiment, we first investigate how these parameters affect the performance of the model,
then we compare the spectrogram-based approach with the raw-audio approach, where the features are
learned automatically by the model.
The hop length is the number of samples between consecutive frames. Ideally, a shorter hop length
should yield a finer time resolution and hence better results, at a higher computational cost. In
this experiment, the whole audio signal has a total duration of 54 ms. Therefore, we choose to
reduce the frame size and hop length to smaller values to obtain better precision.
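To make the role of the frame length and hop length concrete, here is a minimal NumPy sketch of a
magnitude spectrogram computed without padding. The 44.1 kHz sample rate, the Hann window and the
synthetic 1 kHz tone are assumptions for illustration; the report does not state the sample rate, so
the number of frames printed here need not match the shapes reported in the results.

```python
import numpy as np

def spectrogram(signal, frame_length=1024, hop_length=300):
    """Magnitude spectrogram: Hann-windowed overlapping frames, then an FFT per frame."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    window = np.hanning(frame_length)
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # rfft keeps the non-negative frequencies only: frame_length // 2 + 1 bins
    return np.abs(np.fft.rfft(frames, axis=1))

sample_rate = 44_100                    # assumed, not stated in the report
duration_s = 0.054                      # 54 ms, as in the report
t = np.arange(int(sample_rate * duration_s)) / sample_rate
signal = np.sin(2 * np.pi * 1000 * t)   # synthetic 1 kHz tone as a stand-in

for hop in (300, 128):
    spec = spectrogram(signal, frame_length=1024, hop_length=hop)
    print(f"hop={hop}: T x M = {spec.shape[0]} x {spec.shape[1]}")
```

A shorter hop length produces more frames for the same signal, which enlarges the input of the 2D
network and therefore its parameter count, while the number of frequency bins depends only on the
frame length.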
Figure 12 shows the architecture of the 2D convolution model used in this experiment. The
methodology adopted to choose the number of convolution layers, the number of filters and the filter
size is similar to that of the previous experiments with Shazam-Net and the 1D convolution model.
4. Multi-Layer-Perceptron
Several previous studies have proposed the MLP, using a combination of features extracted
manually from the audio signal. In this experiment, however, we feed the entire raw waveform to the
network. It is up to the model to extract the meaningful features and perform the classification
task. We use an MLP with various numbers of hidden layers and hidden neurons to compare the
performance. The architecture used in this section consists of three layers: an input layer, a
hidden layer and an output layer. The input layer represents the raw waveform, and the hidden layer
consists of 1-50 neurons. The output layer consists of five neurons and represents the
classification result over the five classes of current. Figure 13 shows the architecture of a
multi-layer perceptron with only one hidden layer. In further experiments, we used the same
architecture but varied the number of hidden layers. To prevent overfitting, a dropout layer has been
added to the network.
We would also have liked to explore pretrained models on the Shazam dataset; however, most of the
publicly available models have a huge number of parameters. The Shazam function has specific
constraints in terms of memory usage; we therefore presume that these models cannot be deployed as
they are.
Several experiments have been done to choose the optimal configuration and hyperparameters
for each model. Figure 19 shows the architecture of the different models used for the evaluation. The
second part of this work is to optimize and compare the different models for deployment. One of the
main criteria for choosing the best model is memory size. Deep learning models are generally heavy in
terms of number of parameters, which requires a large amount of memory. To reduce the model size,
there are two main techniques: pruning and quantization. Pruning removes some of the model weights to
reduce the number of parameters and hence the memory consumption. It has been shown to achieve
significant efficiency improvements when deploying models. Quantization converts the weights of the
model to an integer type, so that they consume less memory and computations run faster. However, both
quantization and pruning can reduce accuracy. Quantization-aware training is a technique used during
training to preserve the accuracy of the model. TensorFlow provides a library to quantize trained
models: first we optimize the model using the TensorFlow API, then we use the STM32 Cube AI developer
platform to optimize, quantize and deploy the trained model on different STM32 microcontrollers. The
optimization can target the inference time, the memory size, or a trade-off between both. The
quantization stage is done with the TensorFlow quantization interface, then the models are run
remotely on several STM32 boards hosted on STM premises to obtain the required flash and RAM as well
as the inference time.
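The two compression ideas can be illustrated on a single weight matrix with plain NumPy. This is a
simplified sketch, not the TensorFlow or STM32 Cube AI implementation used in the study: pruning is
shown as magnitude-based pruning and quantization as affine 8-bit quantization, and the random matrix,
the 50% sparsity and all function names are assumptions for the example.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights).astype(weights.dtype)

def quantize_int8(weights):
    """Affine (scale and zero point) quantization of float weights to int8."""
    w_min = min(float(weights.min()), 0.0)
    w_max = max(float(weights.max()), 0.0)
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:
        scale = 1.0
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate float weights."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 32)).astype(np.float32)  # stand-in weight matrix

w_pruned = prune_by_magnitude(w, sparsity=0.5)
q, scale, zp = quantize_int8(w)
w_restored = dequantize(q, scale, zp)

print("zeroed weights:", int((w_pruned == 0).sum()), "of", w.size)
print("bytes float32 -> int8:", w.nbytes, "->", q.nbytes)
print("max abs error:", float(np.abs(w - w_restored).max()))
```

Storing int8 instead of float32 divides the weight storage by four, at the cost of a rounding error
bounded by the quantization step. That rounding error is the source of the accuracy loss which
quantization-aware training tries to limit.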
V Results and discussion
This section discusses the results of the Shazam-Net model proposed in section 3. The model is
validated against three other models trained on the same dataset, compared in terms of accuracy and
number of parameters.
Our intended application for audio signal classification is a smart cover for the breaker that
determines the trip cause. In this application, edge computing is preferred over cloud computing, as
the circuit breaker will not have any connectivity with the external environment. Therefore, the
trained model will be deployed on a microcontroller of the STM32U5 family. It is a low-power series
used in various industrial applications, offering up to 786 kB of memory; the memory allocated to the
classification model is 500 kB. A compromise between memory size and accuracy is the main criterion
for choosing the best architecture. The inference time is not crucial in the deployment, as the
predictions are not used in real time. However, it will be taken into consideration to assess the
performance of the chosen microcontroller and compare it with other microcontrollers for further
projects.
It is necessary to mention that the memory size is directly affected by the number of parameters in
a neural network. Measuring the RAM and flash size required by each model is time consuming,
especially with a large number of configurations for each architecture. Therefore, we rely on the
number of parameters to choose the best configuration for each architecture, then compare the best
configurations in terms of memory usage. Apart from using accuracy to see the global performance of
the model, it is necessary to look at the confusion matrix, precision and recall to analyse the
predictions on each class. For instance, a model which confuses the class of high overcurrent with
that of small current makes a far more serious error than a model which confuses high overcurrent
with moderate overcurrent.
The equations below present the definition of the metrics used. Note that TP, TN, FP, and FN are True
Positive, True Negative, False Positive, and False Negative, respectively.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$
$$\text{F1\_score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
V.2- Quantitative analysis
Table 3 shows the results for various configurations of the proposed network Shazam-Net in
terms of accuracy and number of parameters. Even with a simple network, accuracy is above 85%, and
the model approaches 95% as the architecture becomes more complex. A higher number of parameters
yields better accuracy. Nonetheless, accuracy does not grow linearly with the number of parameters: a
configuration with 7269 parameters reaches an accuracy of 86.03%, and increasing the number of
parameters twelvefold (configuration 4) increases the accuracy by only 1.5%.
The best results are obtained with configuration 6, which reaches 95% accuracy with 140 553
parameters. We choose this architecture to continue the experiments and find the optimal
hyperparameters. Note that changing those hyperparameters does not affect the number of parameters of
the model; it can affect the training time, which is not crucial in this study as training is
performed only once.
Table 3 : Results with various configurations in terms of accuracy and number of parameters
Table 4 shows the effect of the number of epochs, batch size, optimizer and learning rate on the
validation accuracy. Based on these experiments, I selected 200 epochs, a batch size of 100, and the
Adam optimizer with a learning rate of 0.001. The model with these hyperparameters achieves an
accuracy of 95.39% on the validation set.
Hyperparameter   Definition                                        Value     Accuracy (%)
Epoch            Number of times a whole dataset is passed         50        91.8
                 through the model                                 100       92.3
                                                                   150       93.9
                                                                   200       95.3
Batch size       Number of samples that will be propagated         16        89.4
                 through the model in each training iteration      32        91.8
                                                                   64        94.5
                                                                   100       95.3
                                                                   132       94.3
Optimizer        Algorithm to adjust the parameters for a model    SGD       89.1
                                                                   Adam      95.3
                                                                   Rmsprop   91.2
Learning rate    The step size to adjust the weights after each    0.001     95.3
                 iteration                                         0.01      80.21
                                                                   0.1       78.26
Figure 15 shows that the accuracy of the 1D-CNN architecture varies irregularly with the filter
size; it also shows that the number of parameters decreases as the filter gets larger.
The configuration with a filter size of 20 achieves the highest accuracy while requiring few
parameters, unlike the configuration with a filter size of 5, where the number of parameters is much
larger and the accuracy is lower.
According to [12], the convolution layers learn which frequency spectra are informative and
discriminative for the task. First, with respect to the amount of information extracted, a small
kernel has a smaller receptive field, which means it looks at very few samples of the audio signal at
once, whereas a large kernel looks at a larger frame of audio. The features extracted by a small
kernel are therefore highly local, whereas the features extracted by a large kernel are more generic
and spread across the signal. Second, with respect to memory and number of parameters, small kernels
reduce the signal dimension slowly, which results in more parameters to connect the output of the
convolutional layer to the dense layer, whereas large kernels decrease the size of the signal faster.
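The second point can be checked with a short calculation for the single-layer network used in this
experiment (one convolution layer with 20 filters, max pooling, then a dense layer with 5 neurons).
The input length of 2381 samples (54 ms at an assumed 44.1 kHz), the pooling size of 2 and the use of
unpadded convolution are assumptions made for this sketch; the stride actually used in the study is
not stated.

```python
def conv_pool_dense_params(input_len, kernel_size, filters=20, stride=1, pool=2, classes=5):
    """Parameter counts for Conv1D -> MaxPooling1D -> Flatten -> Dense(classes)."""
    conv_len = (input_len - kernel_size) // stride + 1   # unpadded convolution output length
    pooled_len = conv_len // pool                        # non-overlapping max pooling
    conv = (kernel_size + 1) * filters                   # single input channel, plus biases
    dense = (pooled_len * filters + 1) * classes         # flattened features to classes
    return conv, dense, conv + dense

input_len = 2381   # assumed: 54 ms sampled at 44.1 kHz
for k in (3, 5, 10, 20, 40):
    conv, dense, total = conv_pool_dense_params(input_len, k)
    print(f"kernel={k:>2}  conv={conv:>4}  dense={dense}  total={total}")
```

The dense layer dominates the total, so a larger kernel lowers the parameter count even though the
convolution layer itself grows. With a stride of 1 the effect is modest; passing a larger `stride`
(for example one tied to the kernel size) shrinks the convolution output, and hence the dense layer,
much faster.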
Figure 15: Classification accuracy and number of parameters for various filter sizes
Figure 16 shows fluctuations in the accuracy of the model. Contrary to expectations, increasing
the number of filters does not lead to better performance. However, increasing the number of
convolution layers leads to better results, and accuracy fluctuates less as more layers are added.
The models with 3 and 4 convolution layers keep approximately the same accuracy as the number of
filters changes, contrary to the models with one and two convolution layers. The highest accuracy is
obtained using 4 convolution layers and 192 filters. Adding more convolution layers leads to a
greater number of parameters, as the number of parameters depends on the number of filters and the
filter size used in each convolution layer. The deeper the network, the greater the number of
parameters.
Figure 16: Classification accuracy and number of parameters for various numbers of convolution
layers and filters
Table 5 shows the effect of the hop length on the validation set in terms of both model accuracy
and number of parameters. [TxM] is the shape of the resulting spectrogram, where T is the number of
frames, which depends directly on the window size and the hop length, and the height M is equal to
one half of the frame length plus one (513 for a window length of 1024). The model with a hop length
of 128 achieves the best accuracy, 90%, with a total of 797 765 parameters. As expected, a shorter
hop length leads to more precision and better performance, and hence requires more computational
resources. The gain in accuracy is slight compared with the increase in the number of parameters as
the hop length decreases.
Hop Length   TxM     Accuracy   Number of parameters
300          4x513   89%        533,573
Table 5: Results of different hop lengths for spectrograms on the test set using a window length of 1024
Figure 17 shows the accuracy for one, two and four dense layers with different numbers of
hidden neurons. It is important to emphasize that the number of parameters increases rapidly with
more hidden layers and more neurons.
The results show that the MLP with 2 hidden layers and 60 hidden neurons has the highest average
accuracy. The architectures with 1 and 2 hidden layers perform better across the different numbers
of neurons than the one with 4 hidden layers.
The number of hidden neurons influences the classification result, but the accuracy does not
increase regularly: we notice irregular variations as the number of layers and neurons changes. This
experiment indicates that a deeper neural network does not always lead to better performance, while
more depth implies more parameters and more computational resources.
Across the different experiments with the MLP, the highest accuracy does not exceed 70% on the test
set, while it reaches 95% on the training set. This indicates that the model does not generalize well
to new data. This is one of the main challenges faced by the model; it arises when selecting the best
features from a high-dimensional input such as images or, in our case, signals. Without extracting
the best features, the MLP is prone to overfitting. It is difficult to extract the best features for
a particular task, and moreover these features must be hand-picked. Convolutional neural networks
alleviate this problem, as they learn the features themselves.
Figure 17 : Accuracy for each number of dense layers with various number of hidden
neurons in MLP
Figure 18 compares the model before and after quantization in terms of RAM and flash size. With
quantization, the RAM size drops from 495 KiB to only 256 KiB; likewise, the flash size is five times
smaller.
Figure 18: Results in terms of Flash and Ram before and after optimization of
Shazam-net
The best configuration of each model (Shazam-Net, the multi-layer perceptron, the 1D-CNN and
the 2D-CNN) is selected, then the STM32 Cube AI platform is used to benchmark the models on
different STM32 boards. The quantization technique is then used to reduce the model size, and the
evaluation is performed on the STM32U5 microcontroller.
Table 6 gathers the results of the best configurations of the discussed models according to the
number of parameters, accuracy, RAM size and flash size allocated for each model. To estimate the RAM
and flash allocated, we use the STM32 Cube AI platform to run the models on the STM32U5
microcontroller.
The results show that Shazam-Net and the 1D-CNN achieve the best accuracy. Shazam-Net, with only
144 553 parameters, achieves a score of 95.50%, whereas the 1D-CNN needs more parameters to increase
the accuracy by only 1%. The multi-layer perceptron (MLP) has the lowest allocated memory size.
However, it does not perform well on the signals: its score does not exceed 70%. The 2D-CNN has a
good score of 90.24%, but it requires more memory, as it must process the new signal and extract the
spectrogram before feeding the data to the model. The memory and flash sizes are calculated after
quantization of the models. The STM32U5 has 500 kB of memory allocated to the audio signal
recognition function and 2 MB of flash, which should be sufficient for the developed models.
Although the technical specifications require the use of the STM32U5, as it provides a trade-off
between price and computational resources, it is interesting to compare the results of the models on
different boards. STM32 provides five families of microcontrollers with different performance
levels; we run our model on one board of each family to compare the results. Figure 19 describes the
specifications of the different STM32 boards used for the comparison.
Figure 19 :The specifications of different STM32 board
Figure 20 shows the inference time of the Shazam-Net model on different STM32 boards. The inference
time is the time it takes for the model to process new data and classify an input signal. The STM32U5
has an inference time of 2.5 seconds; the STM32F7 is faster, while the STM32L4 takes more time to
process the data and classify the signals. The results show that the higher the performance of the
microcontroller, the better the results and the shorter the inference time.
Figure 20: The inference time of Shazam-net model on different STM32 microcontrollers
V.4- Qualitative analysis
The breaker is more likely to break down during tests with higher values of current; the class
distribution is therefore unbalanced. We have more signals with small current and low overcurrent
than in the other classes. Usually, in such cases, models that merely predict the most frequent class
correctly get high accuracy scores. It is therefore essential to know how the model performs on each
class. Our aim is a model that correctly classifies as many High-Overcurrent signals as possible; in
other words, the lower the error on the High-Overcurrent class, the better the model. A trip caused
by a high overcurrent requires the intervention of an expert and can further degrade the health
status of the circuit breaker. In addition, a model that confuses the no-current class with
High-Overcurrent is more dangerous for the user than a model that confuses Small-Overcurrent with
High-Overcurrent, or Moderate-Overcurrent with High-Overcurrent.
Table 6.5 shows the results of the three best configurations of each model (1D-CNN, 2D-CNN and
MLP) along with the proposed Shazam-Net architecture. To compare the results, we use precision,
recall and F1-score. They rely on the concepts of true positives and false negatives: each class in
turn is considered the positive one and the others negative, to understand how the model performs on
each class separately. As mentioned above, we are most interested in comparing the results on the
High-OC class.
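The one-versus-rest computation described above can be sketched directly from a confusion matrix. The
3-class counts below are made up for illustration and are not results from this study; rows are
assumed to hold the true classes and columns the predicted classes.

```python
import numpy as np

def per_class_metrics(confusion):
    """Precision, recall and F1 per class, plus overall accuracy,
    from a confusion matrix (rows = true class, columns = predicted class)."""
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp   # predicted as the class but belonging to another
    fn = confusion.sum(axis=1) - tp   # belonging to the class but predicted as another
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / confusion.sum()
    return precision, recall, f1, accuracy

# Made-up counts for an unbalanced 3-class problem (illustration only).
confusion = [
    [50, 0, 0],
    [5, 40, 5],
    [0, 10, 10],
]
precision, recall, f1, accuracy = per_class_metrics(confusion)
for k in range(3):
    print(k, round(precision[k], 3), round(recall[k], 3), round(f1[k], 3))
print("accuracy:", round(accuracy, 3))
```

In this toy example the overall accuracy looks acceptable while half of the samples of the smallest
class are missed, which is why the comparison relies on per-class scores rather than on accuracy
alone.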
Shazam-Net yields better results in terms of F1-score on the High-OC class than the other
models, with an F1-score of 0.93. Almost all models have lower scores on the High-OC and Moderate-OC
classes than on the other classes. The MLP has the lowest F1-score on the High-OC class; it does not
exceed 50% on this class.
Table 7 : The results of the best models in term of F1-score on each class
Figure 21 shows the confusion matrix of Shazam-Net on the test set. The results indicate that the
model confuses the Moderate-Overcurrent and High-Overcurrent classes more than the other classes,
which is not as critical as the other cases. No High-Overcurrent signal is predicted as small current
or Low-OC, and only one Moderate-OC signal is incorrectly classified as Low-Overcurrent.
Figure 21 : The confusion Matrix of Shazam-net
For further analysis, we explore the signals predicted incorrectly by the models, to observe
whether they are the same from one model to another and whether they look different from the
correctly classified signals.
The models are more likely to confuse Moderate-OC signals with High-OC. First we extract the
incorrectly classified Moderate-OC signals, then we retrieve the information from the file containing
each signal. The results show that the incorrectly classified signals were recorded at a current of
2500 A, which coincides with the upper end of the Moderate-OC current range. We assume that the
confusion between High-OC and Moderate-OC signals is caused by this high value of current.
VI- Conclusion and future Work
In the current work, two basic investigations have been performed. The first task was to
implement a model able to classify trip causes in the circuit breaker; the second was to optimize
the model so that it can be deployed on a microcontroller with limited resources.
This study proposes a network named Shazam-Net, which combines a convolutional neural network and
a long short-term memory network. The model can accurately classify the trip causes of a circuit
breaker from the raw audio signal of the tripping sound. The network is trained on the audio signal
dataset recorded using the Compact NSX 250 product of Schneider Electric.
Compared with traditional techniques that work with manually extracted features, the proposed
approach automatically extracts relevant information from the raw audio signal and achieves very
good results with low memory usage compared with existing deep learning models. The results have
shown that Shazam-Net and the 1D-CNN achieve the best accuracy, with 95.39% and 96.57% respectively.
However, Shazam-Net requires less memory than the 1D-CNN.
It is essential to highlight that the proposed method is evaluated by comparison with other models
on only one dataset. Future studies can collect a wider variety of audio signals from different
circuit breaker products to expand the range of application of the model. In addition, further
studies should try other customized quantization algorithms to reduce both the memory requirements
and the computational cost of the models. Moreover, the trip-cause recognition function can be used
to evaluate the health status of the breaker; producing a reliable diagnosis from the history of the
circuit breaker remains an open problem, which will be tackled in future work.
References
1. N Huang, H Chen, S Zhang, et al., Mechanical fault diagnosis of high voltage circuit breakers based
on wavelet time-frequency entropy and one-class support vector machine, Entropy 18 (1) (2016) 7.
2. Wang, B.; Yang, K.; Wang, D.; Chen, S.-Z.; Shen, H.-J. The applications of XGBoost in Fault
Diagnosis of Power Networks. IEEE Innov. Smart Grid Technol. 2019, 3496–3500.
3. Qiuyu, Y., Jiangjun R., Zhijian, Z., & Daochun, H. (2019). Condition Evaluation for Opening Damper
of Spring Operated High-Voltage Circuit Breaker Using Vibration Time-Frequency Image. IEEE Sensors
Journal, 19(18), 8116-8126.
4. S Sun, T Zhang, Q Li, et al., Fault Diagnosis of Conventional Circuit Breaker Contact System Based
on Time–Frequency Analysis and Improved AlexNet, IEEE Transactions on Instrumentation and
Measurement 70 (2020) 1–12.
5. Chuan, L. & Qifeng, X. (2021). Fault diagnosis of circuit breaker based on bispectrum and
two-stream convolutional neural network. Electronic Measurement Technology.
6. Yupeng, C., Lin, L., Qiao, W., & Jianliang, Z. (2021). Fault diagnosis of high-voltage vacuum circuit
breaker with a convolutional deep network. Power System Protection and Control, 49(03), 39-47.
7. Qian Chen, Jiyang Wu, Qiang Li, Ximing Gao. Long Short-Term Memory Network-Based HVDC
Systems Fault Diagnosis under Knowledge Graph.
8. Sun Y, Luo L, Wang H, Sheng G, Jiang X . Mechanical Fault Diagnosis of Circuit Breaker Based on
MFCC and Improved SRC. IEE Innov. Electrical and Power Engineering (CEEPE).
9. Quan Sun, Xianghai Yu,Hongsheng Li. Open-circuit Fault Diagnosis Based on 1D-CNN for Three-
phase Full-bridge Inverter.
10. Zhiwu Shang, Wanxiang Li, Maosheng Gao, Xia Liu. An Intelligent Fault Diagnosis Method of
Multi-Scale Deep Feature Fusion Based on Information Entropy.
11. W.Zhang, G.Peng, C.Li, Y.Chen and Z.Zhang. A New Deep Learning Model for Fault Diagnosis
with Good Anti-Noise and Domain Adaptation Ability on Raw Vibration Signals. Physical Sensors.
12. Anonymous authors. How do deep convolutional neural networks Learn from raw audio
waveforms. Under review as a conference paper at ICLR 2018.
13. Xinyu Ye, Jing Yan, Yanxin Wang, Lei Lu, Ruixin He. A Novel Capsule Convolutional Neural
Network with Attention Mechanism for High-Voltage Circuit Breaker Fault Diagnosis.
Electrical power system Research.
14. Iurii Lezhenin, Natalia Bogach.Urban Sound Classification using Long short-term memory neural
network. Institute of Computer Science and Technology.
15. Deyin Xu, Lin Luo, Qiao Wang. Online Fault diagnosis of high-voltage Vacuum Circuit Breaker
based on deep convolutional Long short-term memory network.
16. Ascensión G, Juan M. On combining acoustic and modulation spectrograms in an attention LSTM-
based system for speech intelligibility level classification.
17. https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/#:~:text=Convolutional%20neural%20networks%20do%20not,parallel%20for%20a%20given%20input
Université Hassan II Casablanca
École Nationale Supérieure d’Arts et Métiers Casablanca
Report title:
« Deep Learning-Based Fault diagnosis of circuit breakers Using acoustic
signals»
Abstract:
Deep learning has gained much interest in many fields of diagnosis; however, there is
still a lack of research on fault diagnosis of circuit breakers. This paper proposes
Shazam-Net, a lightweight deep learning network combining a convolutional neural network
and a recurrent neural network, which takes the raw audio signal of the tripping sound
in the circuit breaker and classifies it into five classes of faults with an accuracy of 95%. The
network is composed of three parts: a convolutional block that extracts information
from the raw audio waveform, long short-term memory layers that retrieve the
temporal dependencies from the high-dimensional output, and finally a multi-layer perceptron
that classifies the extracted features into five classes. In addition,
quantization and pruning techniques are used to compress the model so that it can be
embedded in devices with limited resources. The proposed approach is trained and validated
using sound signals collected from an embedded microphone recording the tripping event
inside the circuit breaker. The results show that the network has a high diagnostic accuracy
compared to other models that require more memory.