sensors
Article
Partial Discharge Recognition with a
Multi-Resolution Convolutional Neural Network
Gaoyang Li 1, Xiaohua Wang 1,*, Xi Li 2, Aijun Yang 1 and Mingzhe Rong 1,*
1 State Key Laboratory of Electrical Insulation and Power Equipment, School of Electrical Engineering,
Xi’an Jiaotong University, Xi’an 710049, China; [email protected] (G.L.);
[email protected] (A.Y.)
2 The Global Energy Interconnection Development and Cooperation Organization (GEIDCO), Beijing 100031,
China; [email protected]
* Correspondence: [email protected] (X.W.); [email protected] (M.R.)

Received: 24 September 2018; Accepted: 16 October 2018; Published: 18 October 2018

Abstract: Partial discharge (PD) is not only an important symptom for monitoring the imperfections
in the insulation system of a gas-insulated switchgear (GIS), but also the factor that accelerates the
degradation. At present, monitoring ultra-high-frequency (UHF) signals induced by PDs is regarded
as one of the most effective approaches for assessing the insulation severity and classifying the PDs.
Therefore, in this paper, a deep learning-based PD classification algorithm is proposed and realized
with a multi-column convolutional neural network (CNN) that incorporates UHF spectra of multiple
resolutions. First, three subnetworks, as characterized by their specified designed temporal filters,
frequency filters, and texture filters, are organized and then integrated by a fully-connected neural
network. Then, a long short-term memory (LSTM) network is utilized for fusing the embedded
multi-sensor information. Furthermore, to alleviate the risk of overfitting, a transfer learning approach
inspired by manifold learning is also presented for model training. To demonstrate, 13 modes of defects
considering both the defect types and their relative positions were well designed for a simulated GIS
tank. A detailed analysis of the performance reveals the clear superiority of the proposed method,
compared to 18 typical baselines. Several advanced visualization techniques are also implemented to
explore the possible qualitative interpretations of the learned features. Finally, a unified framework
based on matrix projection is discussed to provide a possible explanation for the effectiveness of
the architecture.

Keywords: partial discharge; ultra-high-frequency signals; multi-resolution analysis; convolutional neural network

1. Introduction
Gas-insulated switchgears (GISs) are widely used as the major control and protection equipment
in medium to ultra-high voltage substations, due to their superior compactness, high reliability,
strong dielectric strength, and maintenance-free properties. However, unavoidable imperfections
and deterioration in the insulation system pose serious threats to the reliability and safety of the
GIS and whole power grids [1–3]. The occurrence of partial discharges (PDs) is not only one of the
main characteristics that can effectively reflect the inner dielectric flaws, but is also the cause of the
accelerated degradation [4,5]. Therefore, it is imperative to assess the potential correlation between the
PD patterns and the defect types, so that corresponding maintenance activities can be taken before a
complete breakdown.
The conventional monitoring methods of PD could be summarized into three main categories:
chemical, electrical, and acoustic [6]. Among these, ultra-high-frequency (UHF) signals are becoming

Sensors 2018, 18, 3512; doi:10.3390/s18103512 www.mdpi.com/journal/sensors



increasingly important due to their high sensitivity, better immunity against interference, and the ability
for real-time monitoring [7,8]. Considering the stochastic nature of PDs, machine learning has long
been the mainstream method for the diagnosis of UHF signals. The existing literature covers a wide
range of techniques, including Support Vector Machines (SVM) [9], Neural Networks (NN) [10–12],
cluster analysis [13], fuzzy logic [14] and so on. A more detailed taxonomy of NNs could give rise to
methodologies such as the self-organizing map (SOM) [10], Ensemble neural networks (ENN) [11],
and Probabilistic Neural Networks (PNN) [12]. Compared with classifiers, feature extraction plays
a more dominant role in the success of PD recognition due to the high dimensionality of the PD
data. A gamut of techniques have been applied to both the time-resolved partial discharge (TRPD)
and phase-resolved partial discharge (PRPD) patterns [6], of which the statistical parameters [15],
the time-power ratio map [16], fractal features [17], Fourier transform [18], wavelet analysis [12],
Hilbert-Huang transform (HHT) [19], and S-transform [20] are a few. In addition, some advanced
signal processing techniques such as compressed sensing [21] and principal component analysis
(PCA) [22] are also utilized. A summary of these implementations is shown in Table 1.
Although the cooperation of features and algorithms indicates a considerable degree of success,
it is pertinent to note that most of them suffer from a fundamental limitation: the carefully selected
feature extraction process usually calls for much human expertise and hence, it is subject to human
error. The significant randomness of the electromagnetic waves caused by propagation and reflection
makes the process even more intricate [23]. Besides, even with all these efforts, the features designed
for a particular task are not guaranteed to be the optimal choices for the other application scenarios
and datasets. Therefore, learning features directly from the dataset itself may be a better choice.
Among all of the analysis methods, time–frequency (TF) analysis is especially applicable, due
to its invertibility, and its ability to provide both time and frequency resolutions [24]. From the
perspective of manifold learning, the intrinsic advantage of the TF method is to obtain continuity
and compactness in a high-dimensional space at the cost of dimensionality. However, the high
dimensionality poses a substantial impediment for the traditional machine learning models. The matrix
compression algorithms for dimensionality reduction such as the two-dimensional PCA (2DPCA) [25]
and Non-negative matrix factorization (NMF) [26] inevitably share similar obstacles with the other
feature engineering-based methodologies, such as adaptation and generalization.
Fortunately, the recent advances in deep learning make it possible to extract high-level features
automatically from high-dimensional sensory data, and they demonstrate dramatic success in various
areas such as natural language processing, image classification, and auto-driving, of which the
Convolutional Neural Network (CNN) has been designed for vision-related tasks [27,28]. Therefore,
the combination of the CNN and TF maps becomes a potentially promising option for the PD signal
recognition tasks. Nevertheless, although it sounds straightforward, directly applying a CNN to TF maps is
not a good idea, due to the inherently non-local and serial characteristics of TF maps. In our preliminary
work [29], we proposed a CNN of frequency matching to obtain the semantic meaning. In this paper,
we further improve the framework into multiple columns to realize multi-resolution diagnosis, thus
promoting the diagnostic accuracy to a new level. The major contributions are summarized as follows.

1. We propose a deep multi-column architecture that incorporates multi-resolution information for


the recognition of UHF signals with specified temporal filters, frequency filters, and texture filters.
Multi-sensor fusion based on LSTM is also utilized to further improve the diagnostic accuracy.
2. To alleviate the risk of overfitting induced by the expanded structure, a spectral embedding
technique based on manifold learning is utilized for the pretraining. We demonstrate that
the transfer learning-based training process is able to achieve remarkable performance with
limited data.
3. A unified matrix projection-based framework is proposed to enhance the interpretability of the
proposed model.

Table 1. Summary of the relevant features and classifiers that have been used for partial discharge (PD) recognition.

| Reference | PD Types | Feature Extraction | Classifier | Recognition Accuracy |
|---|---|---|---|---|
| Umamaheswari and Sarathi [9] | Four types of artificial insulation defects | Partial power analysis | SVM | Average of 95.25% |
| Darabad et al. [10] | Ten types of PD sources | Texture features + PCA | SOM | Grouped data visualization |
| Mas'ud et al. [11] | Six PD fault geometries in oil, air, and poly-ethylene-terephthalate | Statistical parameters from the PRPD patterns | ENN | Average of 95% |
| Evagorou et al. [12] | Four types of artificial PDs in oil and air | Wavelet packet transform + statistical parameters | PNN | 97.49%, 91.9%, 100%, and 99.8% |
| Wang et al. [13] | Four types of artificial defect models in the oil and air | S-transform | Affinity propagation clustering | 99.67% |
| Li et al. [14] | Four kinds of typical defects in a 252 kV GIS | Cumulative energy in time and frequency domain + mathematical morphological gradient | Fuzzy maximum-likelihood algorithm | 98% |
| Li et al. [15] | Four kinds of defects in GIS | Statistical parameters of both the TRPD and PRPD patterns | Dempster–Shafer evidence theory + neural network | 97.25% |
| Albarracín et al. [16] | Separation of PDs from electromagnetic noises | Power ratios and the signal times | The separation criteria were given | Grouped data visualization |
| Li et al. [17] | Six types of artificial insulation defect models in the oil, air, and paper-fiber interface | Wavelet packages + fractal features + linear discriminant analysis | Finding the closest centroid | 99.4%, 94.5%, 99.4%, 91.9%, 87.5%, and 97.7%, respectively |
| Wang et al. [18] | Three typical PD defect patterns in a 252 kV GIS | Fourier transform + chromatic methodology | SVM | 86.67% |
| Gu et al. [19] | Three common defect types of 25 kV cross-linked polyethylene (XLPE) power cable | HHT + fractal parameters | NN | 100%, with 5% random white noise |
| Dai et al. [20] | Four kinds of artificial defects in GIS | S-transform + singular value decomposition (SVD) | SVM | 98.33% |
| Majidi et al. [21] | Air voids with dimensions of 1, 1.5, and 2 mm | 1-norm, 2-norm, and infinity-norm of the statistical features | Sparse representation vector | 99.7%, 92.9%, 94.0%, and 81.6% for four different scenarios |
| Khan et al. [22] | Parallel-plane stainless steel electrodes in SF6 with 10 different particle lengths and positions | Statistical features + PCA | NN | 88% |

The rest of this paper is organized as follows. Some related theories are introduced first in
Section 2. Section 3 provides the details of both the model architectures and the transfer learning
process. The platform presented in Section 4 provides the necessary dataset for verifying the proposed
method, and Section 5 describes the recognition results. Based on the experimental results, Section 6
discusses the effectiveness of the proposed model. Finally, Section 7 concludes the paper.

2. Related Work
In deep learning, the most straightforward way to improve the model performance has shifted
from feature engineering to network architecture design. ResNet [30], proposed by Microsoft, is
a good example, which introduces an extra column with residual mapping, and has state-of-the-art
performance. Although the early implementation of the multi-column neural network only aimed
at accelerating the training process with multiple Graphics Processing Units (GPUs) [28], it has been
realized that the several columns can become specified experts for different features. For example,
the Inception model [31] is a successful multi-column deep neural network (DNN) to approximate the
optimal sparse structure of the input with different convolutional kernels. The Xception model [32] goes
even further by decoupling both the cross-channel and spatial correlations to obtain a multi-branch
structure. In addition, Saining exposed a new dimension called cardinality to standardize the
multi-branch structure [33]. Ling attempted to train a multi-column neural network with spectral
clustering to embed the hierarchical manifolds into a compact vector [34].
Although sharing the similar structure of multiple columns, the multi-resolution theories focus
more on the structures of different scales. Sander attempted to learn music features by extracting the
mel-spectrograms with different windows [35]. Michael utilized CNN to predict future images from
different resolutions and concluded that a lower resolution was able to capture the entire context, while
a higher resolution recovered more details [36]. The similar idea was also shared by MMDenseNet [37],
which enables the neural network to model both the fine-grained structures and the long contexts
efficiently. In the Deep Belief Network, the multi-resolution structures also showed their superiority in
image-related tasks [38]. Furthermore, the idea of multiple resolutions has long been integrated into the
underlying design of neural networks. The pooling layers and the variants, such as subsequent spatial
pyramid pooling all attempt to obtain the larger space by reducing the resolution of the intermediary
layers [39].

3. Proposed Method

3.1. Gabor Representations and Multi-Resolution Analysis


Fourier transform is a powerful tool to provide the frequency distributions of complex signals.
However, it must be remarked that the one-dimensional solutions are delocalized in time and
not comprehensive enough. The bidimensional functions that provide both the frequency and time
localizations are preferred. To introduce the time resolution, a time-localized window is utilized to
suppress the signal outside the neighborhood, which is the well-known short-time Fourier transform
(STFT) [24]:

$$F_x(t, v; h) = \int_{-\infty}^{+\infty} x(u)\, h^*(u - t)\, \exp[-j 2\pi v u]\, du \qquad (1)$$

where x (u) is the original signal, and h is the analysis window. t and v represent the time and frequency,
respectively. In practical applications, the discrete STFT is preferred, which is given by:
$$F_x[n, m; h] = F_x(n t_0, m v_0; h) = \int_{-\infty}^{+\infty} x(u)\, h^*(u - n t_0)\, \exp[-j 2\pi m v_0 u]\, du \qquad (2)$$

where m, n ∈ Z. t0 and v0 are the time and frequency steps. Specifically, the STFT with Gaussian
windows is able to theoretically maximize the concentration in the TF plane by approaching the lower
bound of the Heisenberg–Gabor inequality [24]:
Downloaded from https://iranpaper.ir
https://www.tarjomano.com https://www.tarjomano.com

Sensors 2018, 18, 3512 5 of 27

$$T \times B \geq 1 \qquad (3)$$

Here, T is the time duration, and B is the normalized bandwidth. The STFT of this particular case
is also called the Gabor transform. The most important inference of the Heisenberg-Gabor inequality
is the trade-off between time and frequency resolutions in accordance with the width of the analysis
window. An illustrative example is shown in Figure 1.

Figure 1. Gabor representations of different time and frequency resolutions: (a) windows of different lengths; (b) spectrogram of higher time resolution; (c) spectrogram of higher frequency resolution; (d) spectrum of medium resolution.

As shown in Figure 1, the long window gives rise to harmonic information, while the pulses emerge from the short time window. In this study, the spectrograms of different resolutions are all employed for the diagnosis of the UHF signals. It should be noted that the concept of multiple resolutions is different from the wavelet or S-transform, which still generate only a single resolution for each frequency band.

Furthermore, a semiquantitative method is proposed as an assistance criterion for choosing the proper analysis windows, considering the morphology characteristics of the STFT spectrograms. As indicated by Figure 2, centering on the maximum value, the time and frequency ranges before reaching a certain threshold are calculated, which can be regarded as a rough description of the peak shape. Intuitively, the greater the time range is, the less the time resolution is retained, while the frequency isolation is more obvious. The frequency range varies in the opposite way. The proper window can be selected based on the trade-off between the time and frequency resolutions, depending on their priorities.

Figure 2. Schematic diagram of the time range and frequency range before reaching the threshold.
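The multi-resolution Gabor maps of Equations (1)–(2) and the peak-range criterion of Figure 2 can be sketched as follows. This is a minimal illustration, not the paper's code: the test pulse, sampling rate, window lengths, and the 50% threshold are all assumed values.

```python
import numpy as np
from scipy.signal import stft
from scipy.signal.windows import gaussian

# Hypothetical UHF-like pulse: a damped 1.2 GHz oscillation sampled at 10 GS/s.
fs = 10e9
t = np.arange(2048) / fs
x = np.exp(-t / 50e-9) * np.sin(2 * np.pi * 1.2e9 * t)

def gabor_map(x, nperseg):
    """Gaussian-windowed STFT (Gabor transform) magnitude, cf. Eqs. (1)-(2)."""
    win = gaussian(nperseg, std=nperseg / 6)
    f, tt, Z = stft(x, fs=fs, window=win, nperseg=nperseg, noverlap=nperseg // 2)
    return f, tt, np.abs(Z)

def peak_ranges(S, threshold=0.5):
    """Rough peak-shape description (cf. Figure 2): count the time frames and
    frequency bins whose magnitude stays above `threshold` * max, through
    the row and column of the global maximum."""
    i, j = np.unravel_index(np.argmax(S), S.shape)
    level = threshold * S[i, j]
    freq_range = np.count_nonzero(S[:, j] >= level)  # extent along frequency
    time_range = np.count_nonzero(S[i, :] >= level)  # extent along time
    return time_range, freq_range

# A short window favors time resolution; a long one favors frequency resolution.
for nperseg in (32, 128, 512):
    f, tt, S = gabor_map(x, nperseg)
    print(nperseg, S.shape, peak_ranges(S))
```

The printed shapes show the resolution trade-off directly: the 32-sample window yields many time frames but few frequency bins, and the 512-sample window the opposite.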
3.2. Convolutional Neural Network and the Design of Convolutional Kernels

In recent years, CNN and its derivatives have been regarded as the state-of-the-art choices for computer vision tasks. The key ideas in CNN include the concepts of local receptive fields, shared weights, and spatial subsampling. Specifically, in each convolutional layer, the feature maps are first convolved with several kernels. After the biases, the activation function and, sometimes, the pooling operation, the outputs are then fed into the next convolutional layer, which can be summarized as:

$$x_j^l = f\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l \right) \qquad (4)$$
where $*$ denotes the convolutional operator. $x_i^{l-1}$ and $x_j^l$ are the input and output of the $l$th layer, respectively, and $b_j^l$ is the bias. $M_j$ is a selection of the feature maps, and $k_{ij}^l$ is the element of the kernel matrix.

The activation function $f$ is chosen as the ReLU function for the convenience of the gradient calculation. A simple parameter analysis can be given as follows. Assuming that the volume of the input is $W_1 \times H_1 \times D_1$, after applying convolutions of $k$ filters (without zero padding), the new volume $W_2 \times H_2 \times D_2$ is:

$$W_2 = \frac{W_1 - F_1}{S_1} + 1 \qquad (5)$$

$$H_2 = \frac{H_1 - F_2}{S_2} + 1 \qquad (6)$$

$$D_2 = k \qquad (7)$$
where $F_1$ and $F_2$ are the width and height of the filter cores, and $S_1$ and $S_2$ are the strides. One special characteristic of CNN is the shift invariance. However, although dramatically useful in image recognition, the invariance features pose an impediment for the recognition of Gabor representations that own a clear physical meaning. Any shift of a generally rectangular filter may distort the original TF distributions and confuse the CNN. To alleviate the restriction, three kinds of convolutional kernels are designed in accordance with the spectrograms of different resolutions, namely frequency filters, temporal filters, and texture filters.

As shown by the red rectangles in Figure 3, the temporal filters are designed for maps of higher time resolutions, and they share the same widths with the maps, while the frequency filters are suitable for higher frequency resolutions, and they are as high as the spectrograms. The texture filters are of medium size as supplements.

Figure 3. Schematic diagram of the different convolutional filters and their corresponding outputs: (a) temporal filter; (b) frequency filter; (c) texture filter.
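The filter geometry can be checked numerically with Equations (5) and (6): a kernel spanning the full frequency axis collapses that axis to a single bin, so its output is one-dimensional along time, and vice versa. The map and kernel sizes below are illustrative assumptions, not the paper's actual dimensions.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
S = rng.random((64, 100))  # hypothetical TF map: 64 frequency bins x 100 time frames

# Frequency filter: as high as the spectrogram -> output is 1-D along time.
freq_filter = rng.random((64, 5))
# Temporal filter: as wide as the spectrogram -> output is 1-D along frequency.
temp_filter = rng.random((8, 100))

# "valid" mode matches Eqs. (5)/(6) with stride 1 and no zero padding.
out_freq = correlate2d(S, freq_filter, mode="valid")  # (64-64+1, 100-5+1) = (1, 96)
out_temp = correlate2d(S, temp_filter, mode="valid")  # (64-8+1, 100-100+1) = (57, 1)
print(out_freq.shape, out_temp.shape)
```

The one-dimensional outputs are what dramatically reduce the model size, as noted in the text.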

The designing principles of the special convolutional filters may be explained qualitatively with orthogonality, since the convolutional operation is basically a template matching process. The first step is to make the filter cover either a complete frequency axis or a whole time axis, thereby allowing for perfect frequency or temporal matching. Second, by assuming that the filter is also a kind of special signal, it is assumed that orthogonality exists between the unfolding direction of the filters and the spectrograms, to maximize the diversity in a filter as much as possible. Following this idea, the frequency filters are applied to the Gabor maps of higher frequency resolutions, while the temporal filters are applied to the higher time resolutions. Another advantage of the frequency and temporal filters refers to Equations (5) and (6). The outputs of the frequency and temporal filters are only one-dimensional, which dramatically decreases the model's size.
3.3. Architecture of the Proposed Model
As has been mentioned before, TF maps of different resolutions can reveal different rhythms in the 2-dimensional space. It is hypothesized that it would be of great benefit to explicitly provide TF maps of different resolutions as inputs to a deep model. This combines the multiple views with deep learning's ability to extract intrinsic properties from raw inputs, so that the discriminative features could be learned while maintaining explanatory factors with physical meaning.

Accordingly, the proposed multi-resolution network is summarized in Figure 4. It mainly contains three parts, namely, single-resolution embedding, multi-resolution fusion, and multi-sensor fusion. Here, the first part is realized by three subnetworks; the second part is a fully-connected neural network attached at the end of the subnetworks, and Long Short-Term Memory (LSTM) [40] is implemented for multi-sensor fusion.

Figure 4. Structure of the multi-resolution CNN and the LSTM fusion.
1. Single-resolution embeddings with the subnetworks
Each branch of the subnetworks has its own micro-structure, which is basically a CNN with the convolutional operation. The main differences lie in the specified designed frequency filters, temporal filters, and texture filters. After the convolutional filter, an activation function, a pooling layer, and a dropout layer are arranged hierarchically as a basic building block, which repeats itself twice before connecting to a flattening layer. Finally, a fully-connected layer is attached at the end of each subnetwork to generate an embedding vector as the intermediate feature space.
2. Multi-resolution information fusion using the fully-connected neural network
The fully connected neural network is a normal multi-layer perceptron (MLP) containing two layers of 100 units and 50 units, respectively, that are attached after the concatenation of the three subnetworks. The model automatically chooses the best filters by assigning them higher weights, and the information of different resolutions is merged to offer comprehensive representations of compact forms.
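A toy numpy sketch of this fusion stage: three embedding vectors are concatenated and passed through the 100- and 50-unit layers. The embedding size (64) and the random weights are made-up illustrative values, not the trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_layer(x, w, b):
    # One fully-connected layer with the ReLU activation used throughout the model.
    return np.maximum(w @ x + b, 0.0)

# Hypothetical 64-dimensional embeddings from the three subnetworks.
e_temporal, e_frequency, e_texture = (rng.random(64) for _ in range(3))
z = np.concatenate([e_temporal, e_frequency, e_texture])  # multi-resolution concatenation

# Two fully-connected layers of 100 and 50 units, as described in the text.
w1, b1 = 0.05 * rng.standard_normal((100, z.size)), np.zeros(100)
w2, b2 = 0.05 * rng.standard_normal((50, 100)), np.zeros(50)
fused = relu_layer(relu_layer(z, w1, b1), w2, b2)
print(fused.shape)  # (50,)
```

In training, the first layer's weights on each 64-element slice of `z` play the role of the automatic "filter selection" described above.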
3. Multi-sensor information fusion using the LSTM
The propagation and reflections of UHF signals in the GIS can significantly influence the characteristics of the signals, especially at the L corner, owing to the inconsistency of the wave impedance [41]. Therefore, sensors at different positions can reveal different patterns. By fusing the information from different sensors, it is possible to obtain a more comprehensive understanding of the PD patterns.

Recurrent neural networks (RNNs) [42] are the most popular choice for sequence mining tasks such as speech recognition, natural language translation, and video recognition. As shown in Figure 5, in each time step t, the hidden state $h_t$ is updated by a function of the last hidden state $h_{t-1}$ and the new input $x_t$:
$$h_t = \tanh(W x_t + U h_{t-1} + b) \qquad (8)$$
where the hidden state $h_t$ is a d-dimensional vector, and $b$ is the bias. $W$ and $U$ are the weights. Tanh is the activation function. LSTM is a branch of RNN, which introduces a memory cell to preserve the information for long sequences, and relieves the exploding or vanishing gradients problem of the original RNN [40]. The basic unit of LSTM can be explained as a collection of a block input $x_t$, a memory cell $c_t$, and a hidden state $h_t$, controlled by an input gate $i_t$, a forget gate $f_t$, and an output gate $o_t$, as shown in Figure 5, with the following relationships:
i_t = σ(W^(i) x_t + U^(i) h_{t−1} + b_i)    (9)

f_t = σ(W^(f) x_t + U^(f) h_{t−1} + b_f)    (10)

o_t = σ(W^(o) x_t + U^(o) h_{t−1} + b_o)    (11)

u_t = tanh(W^(u) x_t + U^(u) h_{t−1} + b_u)    (12)

c_t = i_t ⊙ u_t + f_t ⊙ c_{t−1}    (13)

h_t = o_t ⊙ tanh(c_t)    (14)
where σ is the sigmoid function and ⊙ denotes the point multiplication. W^(i), W^(f), W^(o), W^(u), U^(i), U^(f), U^(o), and U^(u) are the weights. b_i, b_f, b_o, and b_u are the biases. Intuitively, the input gate and the forget gate control how much the new information and the old memories are recombined as the new memory cell. The hidden state h_t is a gated, partial view of the internal memory of the cell. By varying the gate values in different time steps, the LSTM can keep itself up-to-date while maintaining the most discriminable information acquired from different time steps. Finally, the features from different sensors are integrated for information fusion to give the final diagnosis results.
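The cell update above can be traced step by step in a few lines of NumPy. This is a minimal, framework-free sketch of Equations (9)–(14); the dimensions (d = 4 hidden units, n = 3 inputs) and the random weights are illustrative assumptions, not the trained network used later in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step following Equations (9)-(14)."""
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate,  Eq. (9)
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate, Eq. (10)
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate, Eq. (11)
    u = np.tanh(W["u"] @ x_t + U["u"] @ h_prev + b["u"])   # block input, Eq. (12)
    c = i * u + f * c_prev                                 # memory cell, Eq. (13)
    h = o * np.tanh(c)                                     # hidden state, Eq. (14)
    return h, c

# Toy dimensions and random weights (illustrative only).
rng = np.random.default_rng(0)
d, n = 4, 3
W = {g: rng.standard_normal((d, n)) for g in "ifou"}
U = {g: rng.standard_normal((d, d)) for g in "ifou"}
b = {g: np.zeros(d) for g in "ifou"}
h, c = lstm_step(rng.standard_normal(n), np.zeros(d), np.zeros(d), W, U, b)
```

Because o_t and tanh(c_t) are both bounded, the hidden state always stays inside (−1, 1), which is what makes the gated view of the cell memory stable over long sequences.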
As the net structure is very complex, more sophisticated techniques are required to relieve the risk of overfitting. Details of the training process will be presented in the next part.
Figure 5. Architectures of LSTM and a general RNN.
3.4. Model Training by Transfer Learning

Although the multi-resolution views and the multi-stream structures gain great potential for more comprehensive diagnosis, a larger number of parameters accompanies its bigger size, making the model more prone to overfitting. Thus, a transfer learning framework was proposed that classifies the training process into the three stages of single-resolution embedding, multi-resolution fusion, and multi-sensor fusion, in accordance with the network's structure.
Transfer learning is a technique focusing on transferring knowledge gained from one problem to a different but related task [43]. The parameters of the target network are partly or all initialized from another network that is pretrained from the source task, while the nontransferred weights are randomly initialized. In most cases, transfer learning is done by weight transfer.
The usage of transfer learning was more flexible in this study, where the source and target problems use the same dataset. The key idea is the relative independence among columns of different resolutions. The same dataset is much less prone to overfitting when it is applied to a smaller problem. Therefore, it is beneficial if the single columns are trained first independently. The knowledge is then transferred to the higher levels for more sophisticated targets. The details of the three learning stages are shown in Figure 6.
1. The single-column features extracted by each subnetwork are learned using manifold embedding.
2. The weights of the MLP attached at the end of the three subnetworks are randomly initialized, and the whole structure is fine-tuned.
3. The weights gained from step (2) are frozen as feature extractors. Only the weights of the LSTM are updated for multi-sensor fusion.
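The essence of the three stages above is which parameter groups are trainable at each step. The following toy NumPy sketch makes that explicit; the parameter groups, shapes, and gradients are illustrative stand-ins, not the paper's actual architecture.

```python
import numpy as np

# One parameter group per structural block (shapes are arbitrary stand-ins).
params = {
    "columns": np.ones(5),   # three subnetworks (pretrained in stage 1)
    "mlp":     np.ones(3),   # fusion MLP (randomly initialised in stage 2)
    "lstm":    np.ones(4),   # multi-sensor LSTM (trained in stage 3)
}

# Which groups receive gradient updates at each stage.
TRAINABLE = {
    1: {"columns"},          # single-column manifold embedding
    2: {"columns", "mlp"},   # whole single-sensor model fine-tuned
    3: {"lstm"},             # columns + MLP frozen as feature extractors
}

def sgd_step(params, grads, stage, lr=0.1):
    """Apply a gradient step only to the groups trainable in the given stage."""
    return {k: v - lr * grads[k] if k in TRAINABLE[stage] else v
            for k, v in params.items()}

grads = {k: np.ones_like(v) for k, v in params.items()}
after_stage3 = sgd_step(params, grads, stage=3)   # only "lstm" moves
```

In a framework such as Keras the same effect is obtained by setting the frozen layers' `trainable` flag to `False` before compiling the stage-3 model.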
Figure 6. Three stages of the knowledge transfer process.
3.4.1. Single Column Embedding with Manifold Learning

It is worth noting that the subnetwork that embeds the high-dimensional data into a fixed vector is similar to a general dimensionality reduction problem that maps the high-dimensional points onto a low-dimensional manifold to make the similar points adjacent. Although some general dimensionality reduction frameworks exist [44,45], the same key point is to preserve the distances of the original vertex pairs. Given the data x_1, ..., x_U, the loss function parameterized by α and embedding function f is:
Loss = Σ_{i,j}^{U} ( ‖f(x_i, α) − f(x_j, α)‖ − W_ij )    (15)
where W_ij is the element of a pairwise distance matrix, defined in a class-dependent or class-independent way. For example, multidimensional scaling (MDS) [46] can be used in this pattern by defining W_ij as the Euclidean distance. For the specified problem, the distances are defined in a class-dependent way as follows: W_ij = 0 if i and j belong to the same class, and 1 otherwise.
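For illustration, the class-dependent matrix W_ij can be built directly from the label vector by comparing every pair of labels (a sketch; the four labels here are made up):

```python
import numpy as np

def class_dependent_w(labels):
    """W_ij = 0 for same-class pairs, 1 otherwise, as defined for Equation (15)."""
    labels = np.asarray(labels)
    # Broadcasting compares every label against every other label at once.
    return (labels[:, None] != labels[None, :]).astype(float)

W = class_dependent_w([0, 0, 1, 2])   # made-up labels for four samples
```

The diagonal is always zero (every sample shares its own class), and the matrix is symmetric, as a pairwise distance target should be.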
Furthermore, to make the loss function learnable and more suitable for the subnetworks, a Siamese network structure [47] is implemented to realize the pairwise affinity calculation. A Siamese network is a special neural network with two symmetric branches, as shown in Figure 7. Each time, only one subnetwork is fabricated into the Siamese structure for pretraining with the following loss function:
Loss = ‖f(x_i, α) − f(x_j, α)‖²,                if W_ij = 0
Loss = max(0, m − ‖f(x_i, α) − f(x_j, α)‖²),    if W_ij = 1    (16)
where m is the margin. The loss function is different from MDS in two ways. First, the distance is class-dependent. Second, the losses are chosen as hinge losses for better generalization. After accomplishing the model construction, the Adadelta optimizer is utilized for the weights update.
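A minimal NumPy sketch of the pairwise loss in Equation (16); the margin m and the embedded points are illustrative, whereas the actual training feeds the Siamese branches' learned embeddings into this expression.

```python
import numpy as np

def contrastive_loss(fi, fj, w_ij, m=1.0):
    """Pairwise hinge loss of Equation (16) for one embedded pair."""
    d = np.linalg.norm(fi - fj)
    if w_ij == 0:
        return d ** 2                 # same class: pull the pair together
    return max(0.0, m - d ** 2)       # different class: push apart up to margin m

same = contrastive_loss(np.array([0.0, 0.0]), np.array([0.3, 0.4]), w_ij=0)
diff = contrastive_loss(np.array([0.0, 0.0]), np.array([3.0, 4.0]), w_ij=1)
```

The hinge makes dissimilar pairs that are already farther apart than the margin contribute zero loss, which is the generalization benefit mentioned above.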
Figure 7. Siamese network for embedding the feature maps into a constant vector.
3.4.2. Transfer Learning for Cascaded Training
For the other two steps, the weights of the MLP that were attached at the end of the three subnetworks were randomly initialized first, and then another layer with the same node number as the classes was connected to make a complete single-sensor diagnosis model. After fine-tuning with Adadelta, the weights gained from step (2) were frozen, and they only acted as feature extractors for the information fusion of different sensors, while the weights of the LSTM were made trainable. The convergence speeds of the two steps were likely to be much faster compared with the first step, due to the existing class-dependent knowledge. In addition, some advanced training techniques such as early stopping and heuristic learning rate adaption were also utilized.
4. PD Laboratory Setup

For the purpose of the experimental study, a simulative L-shaped GIS tank with a coaxial cylindrical conductor was implemented to simulate the realistic equipment, as shown in Figures 8 and 9a, with the central conductor diameter being 90 mm, and the enclosure tank diameter being 320 mm. The tank was filled with SF6 gas of 0.1 MPa to simulate the propagation medium. Four planar equiangular spiral antennas (PESAs) with impedance transformers were installed inside the GIS chamber in the hand holes parallel to the axis of the GIS bus-bar, as shown in Figures 8 and 9b. The PESA utilized in this study is an ultra-wideband antenna that can be installed internally in the GIS. Its outside and inside radii are 109 mm and 2 mm, respectively, with the substrate thickness being 1 mm. The other key parameters that can affect the performance of the PESA sensors include the relative dielectric constant, the spiral growth rate, and the rotation angle, which were chosen as 2.65, 0.364, and 3.5π, respectively. In practice, the most commonly quoted parameter to evaluate the performance of an antenna is the reflection coefficient S11, which represents how much power is reflected from the antenna. The bandwidth satisfying S11 < −10 dB ranged from 0.8 GHz to 3.5 GHz for the PESAs. Besides, the S11 parameter ranged between −8 dB and −6 dB from 0.2 GHz to 0.7 GHz, and was close to −10 dB in the 0.7 GHz to 0.8 GHz range, which satisfied the measurement requirements of the UHF signals. The detailed S11 curve can be found in our previous study [48]. The geometrical dimensions of the GIS tank and the relative distances between the PESAs are also illustrated in Figure 8.

High voltage was introduced by an insulating bushing connected with a non-PD testing transformer (Xinyuan Electric, Yangzhou, China, YDTW-30/150) with an amplitude of 0–150 kV and a capacity of 30 kVA. The four PESAs received the UHF waves successively, based on their locations. Furthermore, a four-channel digital oscilloscope (Tektronix, Beaverton, OR, USA, DPO7354C, 3.5 GHz, 40 GS/s) was employed to acquire and record the TPRD UHF patterns.
Figure 8. Schematic diagram of the GIS tank, PESA sensors, and the experimental circuit.
Figure 9. GIS experimental platform: (a) GIS tank and the positions of the UHF sensors; (b) the installation of the UHF sensors inside the tank.
Five types of PDs were fabricated to simulate the various insulation defects, including the floating electrode, a metal protrusion on the conductor and on the tank, surface contamination, and free metal particles. Besides, three relative angles between the defect position and the PESA antennas, which were 0°, 90°, and 180°, were integrated with the five defect types in the defect simulations.

As shown in Figure 10, the floating electrode defect was simulated by fixing two adjacent copper nuts to an insulated bolt that was attached on the conductor. The nuts were 3 mm away from the conductor, and the distance between the copper nuts was 1 mm. Two kinds of metal protrusions were replicated by a 30 mm needle adhered to the high-voltage conductor and to the tank, respectively. Surface contamination was replicated by a piece of aluminum foil (2 × 20 mm²) adhered to the surface of the spacer. Finally, the free particles defect was produced by connecting an insulation tube that contained some aluminum particles inside the conductor, with the length of the insulation tube being 20 mm. The first four types of defects were arranged at three relative angles between the defect positions and the sensor position, namely 0°, 90°, and 180°, as shown in Figure 8, generating 13 PD conditions in total (the free metal particles were only simulated at 90°). Figure 11 shows some typical acquisitions of the UHF signals within 100 ns, and their corresponding frequency spectra.
Figure 10. Typical PD defect patterns: (a) floating electrode; (b) metal protrusion on the conductor; (c) metal protrusion on the tank; (d) surface contamination; (e) free metal particles.
Figure 11. Typical UHF waveforms: (a) UHF signals of five different defect types; (b) UHF signals acquired from the four channels; (c) UHF signals of the three relative angles.
5. Experiment Evaluations

A detailed analysis was carried out to ascertain the model's discriminatory capability. Both the relative angles and defect types were taken into consideration, including the floating electrode (0°, 90°, and 180°), the metal protrusion on the conductor (0°, 90°, and 180°), the metal protrusion on the tank (0°, 90°, and 180°), the surface contamination (0°, 90°, and 180°), and the free particles (90° only), thus creating 13 defect modes. There were 1386 samples in total, of which 20% were used for testing. Two cases were designed for recognition with the same dataset, namely, (1) the combined diagnosis of positions and defect types together, which had 13 classes in total, and (2) diagnosis of the defect types only, which had five classes.

5.1. Implementation Details

First, the lengths of the analysis windows were chosen based on the frequency decline range and the time decline range described in Figure 2, and 50% of the max value was chosen as the threshold. Instead of using the ranges directly, their proportions in the whole analysis time axis and frequency axis were calculated for ease of comparison. The time range limit and frequency range limit were chosen as 100 ns and 3 GHz, respectively. The sampling rate was 10 GHz, indicating that 0.1 ns is added when the window length increases by 1. Figure 12 shows the variations of the percentages of the average time and frequency ranges. It is obvious that as the window length increased, the frequency resolution increased as well, while the time resolution showed the opposite trend. In order to retain enough resolution information for both axes, 30% was selected as the threshold. It is observed that the window lengths were near 6 and 100 at this point. Therefore, the temporal window and the frequency window were chosen as 6 and 100, respectively. Besides, the crossing point was obtained near 30, which was selected as the window for the texture channel.

Figure 12. Percentages of the time and frequency decline ranges along the window lengths.

Besides, in order to further reduce the input dimension, a bigger stride was used along the time axis for the frequency channel, while a bigger frequency stride was utilized for the temporal channel in STFT, which could be seen as a subsampling. A cut-off of 2 GHz was also utilized in the temporal channel and the texture channel, considering that the details in the high-frequency band were trivial in their diagnosis. Finally, the windows and the input shapes of the three subnetworks are summarized in Table 2. Considering the limited amount of data, the neural network architecture was wide, but it had only a few convolutional maps per column. The detailed net structures are also shown in Table 2. The three columns are the time column, frequency column, and texture column, respectively.
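The time/frequency resolution trade-off behind these window choices can be reproduced with a simplified, non-overlapping STFT. The sketch below uses the 10 GHz sampling rate and the window lengths 6 and 100 from the text, but the damped test signal is synthetic, and real STFT implementations overlap their windows.

```python
import numpy as np

fs = 10e9                                               # 10 GHz sampling rate (0.1 ns per sample)
t = np.arange(1000) / fs                                # 100 ns record
x = np.exp(-t / 20e-9) * np.sin(2 * np.pi * 1e9 * t)    # synthetic damped 1 GHz burst

def spectrogram(x, nperseg):
    """Magnitude STFT via non-overlapping Hann-windowed FFT frames (simplified)."""
    n = len(x) // nperseg
    frames = x[:n * nperseg].reshape(n, nperseg) * np.hanning(nperseg)
    return np.abs(np.fft.rfft(frames, axis=1))

S_time = spectrogram(x, nperseg=6)     # temporal channel: many frames, few frequency bins
S_freq = spectrogram(x, nperseg=100)   # frequency channel: few frames, 100 MHz bins
```

The short window yields 166 time frames but only 4 frequency bins (about 1.67 GHz apart), while the long window yields only 10 frames but 51 bins spaced 100 MHz apart, which is exactly the trade-off Figure 12 quantifies.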
Table 2. Detailed network structure.

Index     Window  Input Shape  CNN Structure
Column 1  6       30 × 50      (Conv2 × 50 − MaxPooling2 × 1 − Dropout) − (Conv2 × 1 − MaxPooling2 × 1 − Dropout) − (Dense100 − Dense50)
Column 2  100     150 × 36     (Conv150 × 4 − MaxPooling1 × 2 − Dropout) − (Conv1 × 3 − MaxPooling1 × 2 − Dropout) − (Dense100 − Dense50)
Column 3  30      100 × 64     (Conv50 × 4 − MaxPooling2 × 2 − Dropout) − (Conv2 × 2 − MaxPooling2 × 2 − Dropout) − (Dense100 − Dense50)

MLP (shared across columns): Flatten − Concatenation − Dense100 − Dense50
LSTM (multi-sensor fusion): LSTM with 16 inner nodes − Flatten − Dense50 − Output
The learning algorithm setup was as follows: the Adadelta optimizer with a learning rate of 1.0 for the pretraining and MLP embedding stages, and an RMSprop optimizer for the LSTM with a learning rate of 0.001 and an epsilon of 1 × 10−6. The pretraining was carried out for 1000, 800, and 400 iterations, with the patience being 100, 50, and 50. All networks were realized by Keras and NVIDIA 750TI GPUs (NVIDIA, Santa Clara, CA, USA). The tests averaged over five sets of initialization parameters were reported as the final results.
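For reference, the RMSprop update used for the LSTM stage can be written out directly. The learning rate 0.001 and epsilon 1 × 10−6 follow the text; the decay rate rho = 0.9 is an assumed default, not a value stated in the paper.

```python
import numpy as np

def rmsprop_step(w, g, s, lr=0.001, rho=0.9, eps=1e-6):
    """One RMSprop update: s keeps a moving average of squared gradients,
    so each weight's step size adapts to its own gradient magnitude."""
    s = rho * s + (1 - rho) * g ** 2
    return w - lr * g / (np.sqrt(s) + eps), s

w, s = np.array([1.0]), np.array([0.0])
w, s = rmsprop_step(w, np.array([2.0]), s)   # a single illustrative step
```

After one step the accumulator holds 0.1 · g² = 0.4 and the weight moves by lr · g / √s ≈ 0.0032, illustrating how the per-parameter scaling keeps updates bounded even for large gradients.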

5.2. PD Pattern Recognition Results

5.2.1. Diagnosis Accuracies


The recognition accuracies of the two cases of (1) diagnosis of both the defect positions and types,
and (2) the defect types only, are shown in Tables 3 and 4. Overall, we achieved 97.51% testing accuracy
for the recognition of both the angles and defect types, and 98.20% for the defect types only, based on
the experimental dataset.
Furthermore, the diagnostic accuracies of single resolutions and single sensors with partial
configurations are also presented as quantitative measures of the informativeness. The following
points may be observed, based on the recognition accuracies.

1. Multi-resolution diagnosis leads to a smaller misclassification rate than using only single-resolution information.
2. Sensor1 has the best accuracy among all the sensors with respect to the shortest distance to the
defect location. The information loss can be quite significant after the L-shaped corner.
3. A higher frequency resolution is more valuable for single resolution diagnosis.
4. The multi-sensor combination is not necessarily better than a single sensor without appropriate fusion methods, owing to the increased dimensionality and the confusing information carried by different sensors.

Table 3. Diagnostic accuracies and the intermediate outputs for classifying both the positions and defect types.

Index  Structure          Single Sensor Diagnosis (%)           Multi-Sensor (%)
                          Sensor1  Sensor2  Sensor3  Sensor4
1      Temporal column    85.72    86.94    66.62    71.94      87.91
2      Frequency column   94.57    93.56    85.43    83.78      95.35
3      Texture column     94.42    93.05    84.78    86.30      93.70
4      Multi-resolution   96.08    95.07    91.01    90.18      97.51

Table 4. Diagnostic accuracies and the intermediate outputs for classifying the defect types only.

Index  Structure          Single Sensor Diagnosis (%)           Multi-Sensor (%)
                          Sensor1  Sensor2  Sensor3  Sensor4
1      Temporal column    89.53    90.86    81.22    81.04      94.53
2      Frequency column   95.58    96.29    90.14    87.81      96.51
3      Texture column     96.15    95.07    89.53    87.48      95.79
4      Multi-resolution   97.37    97.09    91.29    90.79      98.20

5.2.2. Loss Curves and Time Consumption


To diagnose the effectiveness of transfer learning, the loss curves and time consumptions of the
single column training, multi-column training, and the LSTM training are shown in Figures 13 and 14.
It is obvious that during the pretraining stage, the convergence speed of the texture channel was
the fastest, while the frequency channel ranked second, and the temporal channel was the slowest.
However, the time consumption was the other way around. The texture channel called for the longest
training time. This was attributed to the number of dot calculations for each iteration being much more
than the other two kinds of kernels. Besides, both the iteration number and the time consumption of
the second and third training stages were much less, compared with their previous steps. For the final LSTM information fusion stage, less than 10 s was needed.

The results in Tables 3 and 4, and Figures 13 and 14, show the crucial role played by transfer learning. It is significant that during the three training stages, the diagnostic accuracy increased, while both the training iterations and time decreased dramatically, even with a larger network. This dynamic indicates that the knowledge learned from the earlier stages was successfully transferred to a higher level with less proneness to overfitting.

Figure 13. Loss curves of the three stages of the transfer learning: (a) the defect types and positions; (b) the defect types only.

Figure 14. Time consumptions of the three stages of the transfer learning: (a) the defect types and positions; (b) the defect types only.
5.2.3. Visualization of the Convolutional Filters

Neural networks are usually considered as black boxes. However, with the help of visualization methods, it is possible to obtain good qualitative interpretations of the black boxes of deep networks. Besides, visualization is also an effective way to diagnose misclassifications and evaluate the degree of overfitting. In this study, the Activation Maximization [49] method was utilized to visualize the learned filters. The idea of Activation Maximization is quite simple: generate an image that maximizes the activation of a hidden unit as:

x* = argmax_{x, s.t. ||x|| = ρ} h_ij(θ, x)    (17)

where x* is the visualization result, and θ denotes the fixed neural network parameters. Thus, h_ij(θ, x) is the activation value of unit j in layer i with the parameters θ fixed and the norm of x bounded by ρ. For simplicity and the convenience of comparison, only the first layer of filters was visualized, as shown in Figure 15.
It is clear that the learned convolutional filters were quite smooth in space, thus indicating that the training was sufficient. The temporal filters were sensitive to pulses along the time axis, while the frequency filters showed clear patterns of different frequency combinations. Besides, the diversity of patterns was richer in the visualization results of the texture filters, where both the temporal and frequency patterns revealed their importance. It is also encouraging that both the temporal filters and frequency filters gained significant resolution power over multiple scales, as shown by the different densities of the lines.

Figure 15. Visualization of the convolutional filters: (a) temporal filters; (b) frequency filters; (c) texture filters.

Besides, a direct visualization of the outputs of the filters was helpful in illustrating the automatic feature extraction and the information compression of the filters. An illustrative example is shown in Figure 16. The outputs of the frequency filter and the temporal filter were only one-dimensional, while the texture filter transferred the two-dimensional input to a smaller feature map. That is also why the texture filter calls for the longest training time.
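As a concrete sketch of the Activation Maximization in Equation (17), the constrained maximization can be run as projected gradient ascent on the input. The code below is our own illustration, not the paper's implementation: a toy linear unit stands in for a real hidden unit, and `grad_fn` would be supplied by a deep learning framework's autodiff in practice.

```python
import numpy as np

def activation_maximization(grad_fn, dim, rho=1.0, lr=0.1, steps=200, seed=0):
    """Projected gradient ascent for Eq. (17): find x with ||x|| = rho that
    maximizes a unit's activation. grad_fn(x) returns d h_ij(theta, x) / dx
    (supplied by autodiff when a real network is used)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)
    x *= rho / np.linalg.norm(x)           # start on the constraint sphere
    for _ in range(steps):
        x = x + lr * grad_fn(x)            # ascend the activation
        x *= rho / np.linalg.norm(x)       # project back onto ||x|| = rho
    return x

# Toy "hidden unit": a linear filter h(x) = w . x. Its maximizer on the unit
# sphere is w / ||w||, so the visualization recovers the filter direction.
w = np.array([3.0, -4.0])
x_star = activation_maximization(lambda x: w, dim=2)
```

For a linear unit, the maximizer on the sphere is the normalized weight vector itself, which is why visualizing first-layer units produces filter-like images.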

Figure 16. The outputs of the first layer of convolutional filters: (a) the output of the temporal filter; (b) the output of the frequency filter; (c) the output of the texture filter.

5.2.4. Visualization of the Learned Embedding Features
Furthermore, by using the t-Distributed Stochastic Neighbor Embedding (t-SNE) projection algorithm, the 50-dimensional features learned by the subnetworks were reduced to three dimensions, as shown in Figures 17 and 18 for the cases of 13 and five classes, respectively. It was observed that the features learned automatically by the CNN for different defects fell into distinct clusters in the three-dimensional space, where each cluster showed clear manifold characteristics of a continuous distribution in the space.

Figure 17. Visualization of the learned features for both the defects and positions: (a) temporal channel; (b) frequency channel; (c) texture channel.
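The three-dimensional projections of Figures 17 and 18 can be reproduced with any off-the-shelf t-SNE implementation; the paper does not name one, so this sketch assumes scikit-learn's `TSNE`, with synthetic clustered vectors standing in for the 50-dimensional learned embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-ins for the 50-dimensional embeddings of one subnetwork:
# three "defect classes", each a Gaussian cluster in feature space.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 50))
features = np.vstack([c + rng.normal(size=(20, 50)) for c in centers])

# Project to three dimensions for visualization, as in Figures 17 and 18.
emb = TSNE(n_components=3, perplexity=5, random_state=0).fit_transform(features)
```

Each row of `emb` is one sample's 3-D coordinate; well-separated input clusters should remain separated after the projection.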

Figure 18. Visualization of the learned features for the defect types only: (a) temporal channel; (b) frequency channel; (c) texture channel.

5.3. Comparison with the Baselines
To further verify the effectiveness, some baselines were implemented. Instead of focusing on specific methods, we first extracted the common framework used in the practice of UHF signal recognition, which consists of two-dimensionalization, matrix compression, decomposition, feature extraction, feature selection (FS), and a final classifier. Second, by using typical models at certain steps, representative methods could be obtained. A description of some of the implemented techniques is as follows.
1. Two-dimensionalization: Wavelet spectrum and Hilbert–Huang spectrum;
2. Matrix compression: NMF and 2DPCA;
3. Signal decomposition: Wavelet transform and Hilbert–Huang transform (HHT);
4. Feature extraction: The extracted time and frequency (T&F) features include the max value, root mean square deviation, standard deviation, skewness, kurtosis, and the peak-to-peak value in the time domain. The frequency features include the mean frequency, frequency center, root mean square frequency, and the standard deviation frequency. Besides, the entropy is also calculated;
5. Feature selection: ExtraTreesClassifier and LinearSVC feature selection;
6. Classifier: Finally, both the SVM and DNN of dense layers are chosen as the final classifiers, due to their high representativeness in engineering practice.
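As a rough sketch of how a T&F feature vector of this kind might be assembled from one UHF waveform (the function name and the exact statistic definitions are our assumptions; the paper lists the statistics only by name):

```python
import numpy as np

def tf_features(x, fs):
    """Assemble T&F statistics of the kind listed above from one UHF
    waveform x sampled at fs Hz (definitions are our assumptions)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma
    feats = {
        "max": x.max(),
        "rms": np.sqrt(np.mean(x ** 2)),
        "std": sigma,
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4) - 3.0,          # excess kurtosis
        "peak_to_peak": x.max() - x.min(),
    }
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    p = spec ** 2 / np.sum(spec ** 2)               # normalized power spectrum
    f_mean = np.sum(freqs * p)
    feats.update({
        "mean_frequency": f_mean,
        "frequency_center": np.sum(freqs * spec) / np.sum(spec),
        "rms_frequency": np.sqrt(np.sum(freqs ** 2 * p)),
        "std_frequency": np.sqrt(np.sum((freqs - f_mean) ** 2 * p)),
        "spectral_entropy": -np.sum(p[p > 0] * np.log(p[p > 0])),
    })
    return feats

# Sanity check on a 1 MHz tone sampled at 10 MHz.
t = np.arange(1024) / 10e6
feats = tf_features(np.sin(2 * np.pi * 1e6 * t), fs=10e6)
```

For a pure tone, the power-weighted mean frequency lands near the tone frequency, which gives a quick way to validate such a feature extractor.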
Through the combination of the different techniques, 18 diagnosis models are summarized in Table 5 as the baselines, including both simple methods and complicated flows. However, the recognition accuracies are not necessarily proportional to the complexity of the methods. All of the hyperparameters were selected based on cross-validation to the best of our knowledge. The final recognition accuracies for (1) the combined diagnosis of positions and defect types (13 classes), and (2) the diagnosis of the defect types only (five classes), are shown in Tables 6 and 7, respectively.
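For instance, baseline 11 (STFT + NMF + SVM) is essentially a compression-then-classification flow. A minimal sketch with scikit-learn, omitting the feature-selection step and using synthetic non-negative vectors as stand-ins for flattened STFT magnitude maps:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in data: 60 flattened "STFT magnitude maps" (non-negative), with a
# class-dependent overall level so the toy problem is learnable.
rng = np.random.default_rng(0)
X = np.abs(np.vstack([rng.normal(loc=c, size=(20, 128)) for c in (1.0, 2.0, 3.0)]))
y = np.repeat([0, 1, 2], 20)

# Baseline-11-style flow: NMF compression to 4 components, then an SVM.
model = make_pipeline(
    NMF(n_components=4, init="random", random_state=0, max_iter=500),
    SVC(),
)
model.fit(X, y)
acc = model.score(X, y)   # training accuracy on the toy data
```

The four NMF coefficients act as the compressed representation handed to the SVM, mirroring the low-dimensional projection discussed below.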
Table 5. Typical diagnosis models for comparisons.

Index  Summary              Two-Dimensionalization  Matrix Compression  Decomposition  Features  Feature Selection  Classifier
1      Map + CNN            HHT Map                 -                   -              -         -                  CNN
2      Map + CNN            Wavelet Map             -                   -              -         -                  CNN
3      Raw input            -                       -                   -              -         Yes                SVM
4      Raw input            -                       -                   -              -         Yes                DNN
5      T&F features         -                       -                   -              T&F       Yes                SVM
6      T&F features         -                       -                   -              T&F       Yes                DNN
7      Wavelet + T&F        -                       -                   Wavelet        T&F       Yes                SVM
8      Wavelet + T&F        -                       -                   Wavelet        T&F       Yes                DNN
9      HHT + T&F            -                       -                   HHT            T&F       Yes                SVM
10     HHT + T&F            -                       -                   HHT            T&F       Yes                DNN
11     STFT + NMF           STFT                    NMF                 -              -         Yes                SVM
12     STFT + NMF           STFT                    NMF                 -              -         Yes                DNN
13     STFT + NMF + T&F     STFT                    NMF                 -              T&F       Yes                SVM
14     STFT + NMF + T&F     STFT                    NMF                 -              T&F       Yes                DNN
15     STFT + 2DPCA         STFT                    2DPCA               -              -         Yes                SVM
16     STFT + 2DPCA         STFT                    2DPCA               -              -         Yes                DNN
17     STFT + 2DPCA + T&F   STFT                    2DPCA               -              T&F       Yes                SVM
18     STFT + 2DPCA + T&F   STFT                    2DPCA               -              T&F       Yes                DNN

Table 6. Diagnostic accuracies of the baseline methods for classifying both defect types and positions.

Index  Summary  Sensor1 (%)  Sensor2 (%)  Sensor3 (%)  Sensor4 (%)  Multi-Sensor (%)
1 HHT spectrum + CNN 65.58 61.73 45.94 48.49 60.79
2 Wavelet spectrum + CNN 90.72 91.83 84.82 84.46 90.61
3 FS + SVM 89.20 87.41 67.27 66.18 66.18
4 FS + DNN 89.17 86.58 67.66 62.99 78.77
5 T&F + FS + SVM 55.04 55.04 41.73 49.64 72.30
6 T&F + FS + DNN 55.58 56.73 43.20 46.37 74.31
7 Wavelet + T&F + FS + SVM 61.87 55.76 47.48 49.64 76.61
8 Wavelet + T&F + FS + DNN 65.07 58.02 46.62 50.58 76.87
9 HHT + T&F + FS + SVM 37.33 32.73 22.66 32.01 45.68
10 HHT + T&F + FS + DNN 38.41 36.76 30.14 33.45 42.58
11 STFT + NMF + FS + SVM 90.29 88.85 72.66 68.70 92.09
12 STFT + NMF + FS + DNN 90.93 90.58 76.12 71.04 91.55
13 STFT + NMF + T&F + FS + SVM 77.70 71.58 56.83 57.91 85.97
14 STFT + NMF + T&F + FS + DNN 80.14 72.91 59.57 58.09 84.86
15 STFT + 2DPCA + FS + SVM 87.41 84.53 57.19 55.39 87.05
16 STFT + 2DPCA + FS + DNN 87.62 84.10 60.28 55.36 85.89
17 STFT + 2DPCA + T&F + FS + SVM 84.53 76.25 49.28 49.64 82.37
18 STFT + 2DPCA + T&F + FS + DNN 84.06 76.47 54.60 49.78 81.72

Table 7. Diagnostic accuracies of the baseline methods for classifying the defect types only.

Index  Summary  Sensor1 (%)  Sensor2 (%)  Sensor3 (%)  Sensor4 (%)  Multi-Sensor (%)
1 HHT Spectrum + CNN 73.02 74.1 60.68 60.86 76.47
2 Wavelet Spectrum + CNN 92.41 93.81 90.32 89.82 91.94
3 FS + SVM 92.45 90.29 73.37 71.94 81.29
4 FS + DNN 90.72 89.28 76.80 76.16 77.48
5 T&F + FS + SVM 64.39 60.07 52.52 62.95 81.29
6 T&F + FS + DNN 67.63 68.71 56.87 66.22 82.01
7 Wavelet + T&F + FS + SVM 68.35 66.91 57.55 61.51 82.73
8 Wavelet + T&F + FS + DNN 72.60 67.63 60.07 65.00 81.85
9 HHT + T&F + FS + SVM 48.92 50.00 42.81 48.56 56.83
10 HHT + T&F + FS + DNN 54.93 54.14 44.82 53.53 60.07
11 STFT + NMF + FS + SVM 91.01 90.65 82.37 77.70 93.88
12 STFT + NMF + FS + DNN 92.59 90.72 83.67 76.73 93.48
13 STFT + NMF + T&F + FS + SVM 79.50 74.82 68.35 69.78 88.49
14 STFT + NMF + T&F + FS + DNN 84.03 78.13 70.36 68.71 89.28
15 STFT + 2DPCA + FS + SVM 87.76 87.76 66.19 67.63 87.41
16 STFT + 2DPCA + FS + DNN 89.71 86.01 70.14 71.94 86.47
17 STFT + 2DPCA + T&F + FS + SVM 86.69 80.58 64.03 63.31 87.05
18 STFT + 2DPCA + T&F + FS + DNN 85.79 82.41 61.83 65.32 84.38

The accuracy comparisons may provide some insights into the fundamental differences between
the traditional feature-based methods and the deep learning approaches, as well as some essential
flaws of the feature engineering methods.

1. The misclassification rate of the proposed multi-resolution CNN clearly indicates its superior
capability compared with the baseline methods. STFT’s performance is slightly better than the
Wavelet spectrum in certain resolutions, and much better than the HHT map.
2. The best performance in the baseline methods is gained by the combination of STFT, NMF,
and SVM, which is 92.09% for the 13 classes, and 93.88% for the five classes. First, it is observed that NMF slightly outperforms 2DPCA, which may be explained as follows. In the experiments, the NMF gains the best performance with four dimensions, compared with the 12 dimensions of the 2DPCA, indicating that the redundancy in the STFT maps is huge. Thus, searching for a global projection in the redundant data may not be a good idea. Second, feature extraction usually does not increase the model performance, which reflects the inherent trade-off among the learning ability of the model, the input dimensions, the distribution of the data, and the discernibility of the features.
3. The performances of SVM and DNN are similar when utilizing the same input features. Therefore,
for traditional feature-based recognition methods, the recognition accuracy relies much on the
discriminative features.
4. Similar to the conclusion drawn from Tables 3 and 4, the performance of multiple sensors is not necessarily better than that of a single sensor, especially when the nearest single sensor can already achieve good diagnostic accuracy.

6. Discussion
Although deep networks have achieved tremendous performance in many areas, low interpretability is always an obstacle to applying them in highly regulated environments, due to the uncontrollability that comes with an unclear mechanism. Sometimes, the training of neural networks becomes a trial-and-error problem. In recent years, a great number of studies have focused on the interpretability of deep learning, and great progress has been made in both theory and experiment. In this discussion, instead of pursuing a general solution, we concentrate only on the special CNN structures implemented here, and try to explain their effectiveness from the perspective of manifold learning.

A comprehensive comparison of the similarities between the filters learned by the CNN and the matrix projection-based methods, such as 2DPCA, is presented to provide some insights into the advantages of the deep learning methods, compared with the feature extraction solutions. It is quite interesting to notice the similarity between the projection operation in 2DPCA and the convolutions in the CNN. In 2DPCA, an m × n image A is projected into a lower dimension by:

Y_k = A X_k    (18)

where X_k is a vector of n dimensions, and k is the index. A visualization example of the 2DPCA's compressed result of the STFT spectrum is shown in Figure 19. It is easy to notice its similarity with the outputs in Figure 16, while the information concentration in Figure 16 is better.

Figure 19. Visualization of the compressed results of the 2DPCA.

Moreover, it is also possible to reconstruct the original image from the compressed information by:

Â = Σ_{k=1}^{d} Y_k X_k^T    (19)

where d is the compression dimension. Therefore, in 2DPCA, it is assumed that any input can be approximately reconstructed as a weighted sum of smaller collections. In summary, the key points in 2DPCA include:

(1) The projection operation for compressing;
(2) The assumption that complex images are composed of eigenimages.

It is quite interesting that similar characteristics are shared by the proposed temporal and frequency filters:

(1) The dot product between the frequency filters and the spectrum, and the similar operation between the temporal filters and the spectrum, can be seen as a projection operation;
(2) In the CNN, it is also assumed that images are formed from elements of a lower level, like pixels, up to more composite representations [28].
Therefore, a common framework that distinguishes different categories by matrix projection may be extracted to unify the deep structure and 2DPCA. The CNN and 2DPCA both try to classify different categories by using matrix projections, and the main difference lies in their supervision methods: the CNN is supervised by labels, while 2DPCA is supervised by the inner product in a high-dimensional space. Furthermore, a special case of NMF that aims at reducing the reconstruction errors by projecting could also be defined to fit this framework. Its main specialty lies in that the loss function is defined on the single sample, and it uses the mean square error between the original images and the reconstructed ones.
Moreover, based on this framework, the 2DPCA can also be reformulated and approximated as a special case of the CNN with the following setups: (1) use only a single layer of convolutions, with the width of the convolutional kernels restricted to one; and (2) define the loss function as the sum of the pairwise Euclidean distance losses, to force the model to maintain its relative relations in the higher space. Some new algorithms may be developed based on this idea.
Therefore, the superiority of the proposed method may be explained as follows. Based on the same framework, only the CNN is goal-oriented toward the final classification task, while the 2DPCA and NMF aim at other specified goals. It is usually hard to determine the effectiveness of these specified goals until the final results are obtained. Is it better to maintain the global information as in 2DPCA, or to keep the local completeness as in NMF? In addition, the multiple projections also equip the CNN with a better learning ability to handle more complex problems.

7. Conclusions
Inspired by the phenomenon that the PD spectrograms of different resolutions can capture
the information of different patterns, both multi-resolution and multi-sensor fusion algorithms are
proposed. They are realized by a multi-column deep CNN characterized by specified filters and a
sequential LSTM network. A multi-stage transfer learning framework is also presented to train the
model with limited data. Several conclusions may be drawn based on the detailed analysis.

1. The proposed multi-resolution model gains accuracies of 97.51% and 98.20% on the tasks of diagnosing (1) the positions and defect types together, and (2) the defect types only, thus indicating its clear superiority compared with the baselines.
2. The loss curves, time consumptions, and diagnostic accuracies show the effectiveness of transfer
learning, which successfully transfers knowledge from a lower level to a higher level with less
proneness to overfitting.
3. The comparisons with the baseline methods reveal several fundamental flaws of the feature-based methods, such as the difficulty of choosing the most suitable approach, and the curse of dimensionality when encountering too many features from multiple sensors.
4. A matrix projection framework is proposed to enhance the interpretability of the proposed deep
network structure.

Author Contributions: G.L. and X.W. conceived and designed the experiments; X.L. provided the experimental
data; G.L. wrote the paper; A.Y. revised the contents and reviewed the manuscript; M.R. provided theoretical
guidance and supported the study.
Funding: This work was supported by the National Natural Science Foundation of China (No. 51521065).
Acknowledgments: The author gratefully acknowledges the support of the K. C. Wong Education Foundation.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Metwally, I.A. Technology progress in high-voltage gas-insulated substations. IEEE Potentials 2010, 29, 25–32.
[CrossRef]
2. Riechert, U.; Holaus, W. Ultra high-voltage gas-insulated switchgear—A technology milestone. Eur. Trans.
Electr. Power 2012, 22, 60–82. [CrossRef]
3. Schichler, U.; Koltunowicz, W.; Endo, F.; Feser, K.; Giboulet, A.; Girodet, A.; Hama, H.; Hampton, B.;
Kranz, H.-G.; Lopez-Roldan, J. Risk assessment on defects in GIS based on PD diagnostics. IEEE Trans.
Dielectr. Electr. Insul. 2013, 20, 2165–2172. [CrossRef]
4. Stone, G.C. Partial discharge diagnostics and electrical equipment insulation condition assessment.
IEEE Trans. Dielectr. Electr. Insul. 2005, 12, 891–904. [CrossRef]

5. Bartnikas, R. Partial discharges. Their mechanism, detection and measurement. IEEE Trans. Dielectr.
Electr. Insul. 2002, 9, 763–808. [CrossRef]
6. Wu, M.; Cao, H.; Cao, J.; Nguyen, H.L. An overview of state-of-the-art partial discharge analysis techniques
for condition monitoring. IEEE Electr. Insul. Mag. 2015, 31, 22–35. [CrossRef]
7. Kurrer, R.; Feser, K. The application of ultra-high-frequency partial discharge measurements to gas-insulated
substations. IEEE Trans. Power Deliv. 1998, 13, 777–782. [CrossRef]
8. High Voltage Test Techniques—Measurement of Partial Discharges by Electromagnetic and Acoustic Methods; IEC TS
62478:2016; BSI: London, UK, 2016.
9. Umamaheswari, R.; Sarathi, R. Identification of partial discharges in gas-insulated switchgear by
ultra-high-frequency technique and classification by adopting multi-class support vector machines.
Electr. Power Compon. Syst. 2011, 39, 1577–1595. [CrossRef]
10. Darabad, V.; Vakilian, M.; Blackburn, T.; Phung, B. An efficient PD data mining method for power transformer
defect models using SOM technique. Int. J. Electr. Power Energy Syst. 2015, 71, 373–382. [CrossRef]
11. Mas’Ud, A.A.; Stewart, B.; McMeekin, S. Application of an ensemble neural network for classifying partial
discharge patterns. Electr. Power Syst. Res. 2014, 110, 154–162. [CrossRef]
12. Evagorou, D.; Kyprianou, A.; Lewin, P.; Stavrou, A.; Efthymiou, V.; Metaxas, A.; Georghiou, G. Feature
extraction of partial discharge signals using the wavelet packet transform and classification with a
probabilistic neural network. IET Sci. Meas. Technol. 2010, 4, 177–192. [CrossRef]
13. Wang, K.; Li, J.; Zhang, S.; Liao, R.; Wu, F.; Yang, L.; Li, J.; Grzybowski, S.; Yan, J. A hybrid algorithm based
on s transform and affinity propagation clustering for separation of two simultaneously artificial partial
discharge sources. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 1042–1060. [CrossRef]
14. Zhu, M.-X.; Xue, J.-Y.; Zhang, J.-N.; Li, Y.; Deng, J.-B.; Mu, H.-B.; Zhang, G.-J.; Shao, X.-J.; Liu, X.-W.
Classification and separation of partial discharge ultra-high-frequency signals in a 252 kV gas insulated
substation by using cumulative energy technique. IET Sci. Meas. Technol. 2016, 10, 316–326. [CrossRef]
15. Li, L.; Tang, J.; Liu, Y. Partial discharge recognition in gas insulated switchgear based on multi-information
fusion. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 1080–1087. [CrossRef]
16. Albarracín, R.; Robles, G.; Martinez-Tarifa, J.M.; Ardila-Rey, J. Separation of sources in radiofrequency
measurements of partial discharges using time–power ratio maps. ISA Trans. 2015, 58, 389–397. [CrossRef]
[PubMed]
17. Li, J.; Jiang, T.; Harrison, R.F.; Grzybowski, S. Recognition of ultra high frequency partial discharge signals
using multi-scale features. IEEE Trans. Dielectr. Electr. Insul. 2012, 19, 1412–1420. [CrossRef]
18. Wang, X.; Li, X.; Rong, M.; Xie, D.; Ding, D.; Wang, Z. UHF Signal Processing and Pattern Recognition
of Partial Discharge in Gas-Insulated Switchgear Using Chromatic Methodology. Sensors 2017, 17, 177.
[CrossRef] [PubMed]
19. Gu, F.-C.; Chang, H.-C.; Chen, F.-H.; Kuo, C.-C.; Hsu, C.-H. Application of the Hilbert-Huang transform with
fractal feature enhancement on partial discharge recognition of power cable joints. IET Sci. Meas. Technol.
2012, 6, 440–448. [CrossRef]
20. Dai, D.; Wang, X.; Long, J.; Tian, M.; Zhu, G.; Zhang, J. Feature extraction of GIS partial discharge signal
based on S-transform and singular value decomposition. IET Sci. Meas. Technol. 2016, 11, 186–193. [CrossRef]
21. Majidi, M.; Fadali, M.S.; Etezadi-Amoli, M.; Oskuoee, M. Partial discharge pattern recognition via sparse
representation and ANN. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 1061–1070. [CrossRef]
22. Khan, Y. Partial discharge pattern analysis using PCA and back-propagation artificial neural network for the
estimation of size and position of metallic particle adhering to spacer in GIS. Electr. Eng. 2016, 98, 29–42.
[CrossRef]
23. Li, X.; Wang, X.; Yang, A.; Xie, D.; Ding, D.; Rong, M. Propagation characteristics of PD-induced UHF signal
in 126 kV GIS with three-phase construction based on time–frequency analysis. IET Sci. Meas. Technol. 2016,
10, 805–812. [CrossRef]
24. Auger, F.; Flandrin, P.; Gonçalvès, P.; Lemoine, O. Time-Frequency Toolbox; CNRS France: Paris, France;
Rice University: Houston, TX, USA, 1996.
25. Zhang, S.; Li, C.; Wang, K.; Li, J.; Liao, R.; Zhou, T.; Zhang, Y. Improving recognition accuracy of partial
discharge patterns by image-oriented feature extraction and selection technique. IEEE Trans. Dielectr.
Electr. Insul. 2016, 23, 1076–1087. [CrossRef]
Sensors 2018, 18, 3512 26 of 27
26. Wang, K.; Liao, R.; Yang, L.; Li, J.; Grzybowski, S.; Hao, J. Optimal features selected by NSGA-II for partial
discharge pulses separation based on time-frequency representation and matrix decomposition. IEEE Trans.
Dielectr. Electr. Insul. 2013, 20, 825–838. [CrossRef]
27. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
28. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks.
In Proceedings of the Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012;
pp. 1097–1105.
29. Li, G.; Rong, M.; Wang, X.; Li, X.; Li, Y. Partial discharge patterns recognition with deep Convolutional Neural
Networks. In Proceedings of the Condition Monitoring and Diagnosis, Xi’an, China, 25–28 September 2016;
pp. 324–327.
30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016;
pp. 770–778.
31. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A.
Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
32. Chollet, F. Xception: Deep Learning With Depthwise Separable Convolutions. In Proceedings of the 2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017;
pp. 1251–1258.
33. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks.
In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu,
HI, USA, 21–26 July 2017; pp. 5987–5995.
34. Shao, L.; Wu, D.; Li, X. Learning Deep and Wide: A Spectral Method for Learning Deep Networks. IEEE Trans.
Neural Netw. Learn. Syst. 2014, 25, 2303–2308. [CrossRef] [PubMed]
35. Dieleman, S.; Schrauwen, B. Multiscale approaches to music audio feature learning. In Proceedings of the
14th International Society for Music Information Retrieval Conference (ISMIR-2013), Pontifícia Universidade
Católica do Paraná, Curitiba, Brazil, 4–8 November 2013; pp. 116–121.
36. Mathieu, M.; Couprie, C.; LeCun, Y. Deep multi-scale video prediction beyond mean square error.
arXiv Preprint 2015, arXiv:1511.05440.
37. Takahashi, N.; Mitsufuji, Y. Multi-Scale multi-band densenets for audio source separation. In Proceedings of
the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz,
NY, USA, 15–18 October 2017; pp. 21–25.
38. Tang, Y.; Mohamed, A.-R. Multiresolution deep belief networks. In Proceedings of the Artificial Intelligence
and Statistics, La Palma, Canary Islands, Spain, 21–23 April 2012; pp. 1203–1211.
39. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual
Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [CrossRef] [PubMed]
40. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A Search Space Odyssey.
IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2222–2232. [CrossRef] [PubMed]
41. Li, X.; Wang, X.; Xie, D.; Wang, X.; Yang, A.; Rong, M. Time–frequency analysis of PD-induced UHF signal in
GIS and feature extraction using invariant moments. IET Sci. Meas. Technol. 2018, 12, 169–175. [CrossRef]
42. Mikolov, T.; Karafiát, M.; Burget, L.; Cernocký, J.; Khudanpur, S. Recurrent neural network based language
model. In Proceedings of INTERSPEECH 2010, 11th Annual Conference of the International Speech
Communication Association, Makuhari, Chiba, Japan, 26–30 September 2010; pp. 1045–1048.
43. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
[CrossRef]
44. Weston, J.; Ratle, F.; Mobahi, H.; Collobert, R. Deep learning via semi-supervised embedding. In Neural
Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 2012; pp. 639–655.
45. Bengio, Y.; Paiement, J.F.; Vincent, P.; Delalleau, O.; Roux, N.L.; Ouimet, M. Out-of-sample extensions for
LLE, IsoMap, MDS, Eigenmaps, and Spectral Clustering. Adv. Neural Inf. Process. Syst. 2004, 16, 177–184.
46. Williams, C.K.I. On a Connection between Kernel PCA and Metric Multidimensional Scaling. Mach. Learn.
2002, 46, 11–19. [CrossRef]
47. Chopra, S.; Hadsell, R.; Lecun, Y. Learning a Similarity Metric Discriminatively, with Application to Face
Verification. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, San Diego, CA, USA, 20–25 June 2005; pp. 539–546.
48. Li, T.; Rong, M.; Zheng, C.; Wang, X. Development simulation and experiment study on UHF partial
discharge sensor in GIS. IEEE Trans. Dielectr. Electr. Insul. 2012, 19, 1421–1430. [CrossRef]
49. Erhan, D.; Bengio, Y.; Courville, A.; Vincent, P. Visualizing Higher-Layer Features of a Deep Network; Technical
Report 1341; University of Montreal: Montreal, QC, Canada, 2009; pp. 1–13.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).