J Neucom 2019 05 052

Accepted Manuscript
An improved deep convolutional neural network with multi-scale

information for bearing fault diagnosis
Wenyi Huang , Junsheng Cheng , Yu Yang , Gaoyuan Guo
PII: S0925-2312(19)30758-1
DOI: https://doi.org/10.1016/j.neucom.2019.05.052
Reference: NEUCOM 20835
To appear in: Neurocomputing
Received date: 24 January 2019

Revised date: 28 April 2019
Accepted date: 17 May 2019
Please cite this article as: Wenyi Huang , Junsheng Cheng , Yu Yang , Gaoyuan Guo , An improved
deep convolutional neural network with multi-scale information for bearing fault diagnosis, Neurocom-
puting (2019), doi: https://doi.org/10.1016/j.neucom.2019.05.052
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
An improved deep convolutional neural network with multi-scale
information for bearing fault diagnosis

a,
Wenyi Huanga, Junsheng Cheng *, Yu Yanga, Gaoyuan Guoa
a
State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body,
College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082,
China
*Corresponding author.
T
E-mail address: [email protected]
IP
Abstract: In recent years, deep learning technique has been used in mechanical intelligent fault
diagnosis and it has achieved much success. Among the deep learning models, convolutional
CR
neural network (CNN) is able to accomplish the feature learning without priori knowledge and
pattern classification automatically, which makes it to be an end-to-end method. However, CNN
may fall into local optimum when lack of useful information in the input signal. Diversity
resolution expressions of signal in frequency domain can be obtained by using the filters with
US
different scales (lengths) and more expressions may provide more useful information. Thus, in this
paper, an improved CNN named multi-scale cascade convolutional neural network (MC-CNN) is
AN
proposed for the classification information enhancement of input. In MC-CNN, a new layer has
been added before convolutional layers to construct a new signal of more distinguishable
information. The composed signal is obtained by concatenating the signals convolved by original
input and kernels of different lengths. To reduce the abundant neurons produced by the multi-scale
M
signal, a convolutional layer with kernels of small size and a pooling layer are added after the
multi-scale cascade layer. To verify the proposed method, the original CNNs and MC-CNN are
applied to the pattern classification of bearing vibration signal with four conditions under normal
ED
and noise environments, respectively. The classification results show that the proposed MC-CNN
is more effective than the commonly CNNs. In addition, the lower t-distributed stochastic
neighbor embedding (t-SNE) clustered error verifies the effectiveness and necessity of MC layer
PT
further. By analyzing the kernels learned from the multi-scale cascade layer, it can be found that
the kernels act as filters of different resolutions to make the frequency domain structure of
CE
different fault signals more distinguishable. By studying the influence of kernel scale in MC layer
on fault diagnosis, it is found that the optimal scale does exist and will be a research emphasis in
the future. Moreover, the effectiveness of MC-CNN is verified furthermore by analyzing the
AC
application of MC-CNN in bearing fault diagnosis under nonstationary working conditions.

Key Words: deep learning; convolutional neural network; multi-scale cascade; rolling
bearing; fault diagnosis
1、 Introduction
With the rapid development of science and technology and the continuous development of
industrial applications, mechanical equipments are becoming more and more sophisticated,
automatic and intelligent in modern industries than before. Therefore, the requirement of industrial
condition monitoring and fault diagnosis system is higher and higher in practical application [1-4]
under stationary or nonstationary operation condition [5-7] because faults of any location may
lead significant loss. The rolling bearing is the critical element of rotating machinery since its fault
1
ACCEPTED MANUSCRIPT
probability is 30% in all faults of rotating element [8]. Thus, it is significant to monitor the healthy
condition of bearing on the working state [9-10]. Accordingly, many researchers pay more and
more attention to bearing fault diagnosis in recent years. Generally, feature extraction and feature
classification are necessary for bearing fault diagnosis. In the step of feature extraction [11-15],
the collected signals of bearing will be analyzed and the useful features containing fault
information will be selected based on priori knowledge. However, such feature designing
processes make full use of human knowledge in signal processing techniques and diagnostic
expertise, but they lacks the adaptability to working conditions and environment. In order to get
rid of this situation, more advanced artificial intelligence techniques should be proposed to
accomplish the diagnosis tasks of feature learning and classification automatically.
T
In recent years, deep learning has been paid much attention and achieved much success in
IP
many fields, such as image processing, machine vision and speech recognition [16-20]. Since deep
learning is a full-automatic toll of end to end recognition which can leave out the steps of feature
CR
extraction based on human knowledge, it has been introduced to the intelligent fault diagnosis of
machinery. For instance, Hongkai Jiang et al. [21] proposed a novel method called improved
convolutional deep belief network (CDBN) combining with compressed sensing (CS) for feature
learning and fault diagnosis of rolling bearing. In addition, they also proposed a novel method
US
called deep wavelet auto-encoder (DWAE) with extreme learning machine for bearing fault
classification and the results confirmed that the proposed method is superior to the traditional
AN
methods and standard deep learning methods [22]. Wentao Mao [23] et al. integrated multi-layer
extreme learning machine (ML-ELM) and auto-encoder for bearing feature extraction and fault
diagnosis. Haidong Shao [24] et al. proposed a novel method called continuous deep belief
network with locally linear embedding for rolling bearing detection and the results showed that the
M
proposed method can accurately predict the tendency of bearing performance index, which is more
effective than traditional methods. Liang Gao [25] et al. proposed an automatic feature extraction
method using a subset based deep auto-encoder (SBTDA) model which combined with swarm
ED
optimization algorithm for bearing fault diagnosis and the results showed that the proposed
method had better accuracy than traditional methods. In brief, deep learning techniques have the
great capacity to overcome the inherent disadvantages of traditional intelligent methods and have
PT
been paid much attention by researchers.

Among the deep learning models, convolutional neural network (CNN) aims at processing
CE
data with known grid-like shape [26-30], such as 2D image data, or 1D time-series data. It had
been proved that CNN is suitable to learn features from rotating mechanical signals because of its
ability in handing the periodic signals [31]. Thus, many researchers have applied CNN to fault
AC
diagnosis of rotating machinery. For instance, Xiang Li et al. [32] utilized CNN to predict the
remaining useful life (RUL) of bearing and the root mean square error of RUL based on the
proposed method was lower than the classical methods such as Kalman filter, support vector
machine classifier and long short term memory network in deep learning. By intelligent fusion of
the multi-level wavelet packet, Baoping Tang [33] et al. presented a dynamic ensemble CNN for
bearing diagnosis in which the input of the paralleled CNNs are 2-D matrices containing the
wavelet packet time-frequency information of different levels. To reveal how kernels are learned
in CNN, many researches utilized 1-D CNN for fault diagnosis of machinery. Yaguo Lei [31] et al.
proposed a deep normalized convolutional neural network (DNCNN) for imbalanced fault
classification of machinery. The results showed that DNCNN is able to deal with the imbalanced
2
ACCEPTED MANUSCRIPT
classification problem more effectively than the commonly used CNNs. Guoliang Peng [34] et al.
proposed a deep CNN with new training methods for bearing fault diagnosis under noisy
environment and different working load. The classification results showed that the model can
achieve pretty high accuracy under noisy environment and load changing environment. However,
there are still some disadvantages in CNN. Compared with other classification methods that do not
need so much data such as SVM, the high classification accuracy of CNN relies on a great number
of training samples because the mathematical model of deep neural network is complex and more
samples are needed to increase the generalization ability of the model to prevent over-fitting.
Moreover, in traditional feature extraction methods, the features composed of information coming
from different scales can represent deep fault-related information better, such as multiscale
T
dispersion entropy [35], multiscale fuzzy entropy [36], wavelet-based multiscale slope features
IP
[37] and so on. Nevertheless, the kernel scale in each convolutional layer is fixed in CNN layer,
which cannot extract the classifiable information comes from different resolutions varied by
CR
kernel scales. On the other hand, the physical meaning of CNN needs to be revealed because it is
helpful to comprehend the process of feature extraction in CNN. Thus, how to extract multiscale
classification information of higher generalization ability in CNN and reveal the physical meaning
of learned kernels in each layer is still worth studying.
US
By analyzing the researches of CNN, it can be found that the structures of CNN are almost
the same, which contains the convolutional layer, pooling layer and full-connected layer. In
AN
addition, the kernel size of each convolutional layer is fixed. As we know, signals convoluted by
kernels of different sizes have diverse resolutions of frequency domain. It is possible to find
sensitive bands for fault classification in frequency domain of different resolutions and more
sensitive information can lead higher and more stable prediction accuracy. Thus, it is beneficial to
M
add the kernels of different size in convolutional layer. However, different kernel sizes make the
length of outputs to be inconsistent and the outputs cannot be superimposed before putting into
activation function. To tackle this issue, it is reasonable to make the outputs convoluted by kernels
ED
of different scale cascaded in the first layer because the cascaded signal contains all the sensitive
bands of the outputs. Moreover, one more convolutional layer and pooling layer will be followed
after the cascaded layer for neuro reduction. Thus, an improved CNN named multi-scale cascade
PT
CNN (MC-CNN) is proposed in this paper for rolling bearing feature learning and fault diagnosis.
The structure of MC-CNN is adding a layer of kernels with different scale before the first layer of
CE
original CNN. The MC-CNN can not only extract fault features adaptively which is different from
traditional feature extraction methods, but also fuse multi-scale information of input signals which
is superior to original CNN. In this paper, the proposed method is applied to the bearing vibration
AC
datasets under normal environment, noise environment and nonstationary working conditions. All
the diagnosis results confirm that the proposed method MC-CNN has higher classification
accuracy than original CNNs. In addition, the evolution of signals in MC-CNN is displayed to
show how kernels are learned in each layer and the full-connected features of the four conditions
are compared to verify the classification ability of MC-CNN further. Finally, by studying the
influence of kernel scale in MC layer on fault diagnosis, it has been proved that the optimal scale
does exist and will be a research emphasis in the future.
2、 Backgrounds
2.1 Structure of CNN
Convolutional neural network is a special depth feed-forward neural network. It can avoid the
3
ACCEPTED MANUSCRIPT
parameter redundancy caused by the full connection between layers, which makes the training of
network model rely on the large amounts of data [38-40]. The connection mode of CNN is local
connection, which conforms to the sparse response characteristics of biological neurons. Thus, it
can greatly reduce the parameter size of the network model and the dependence on the amount of
training data. CNN mainly contains three kinds of layers: convolutional layer, pooling layer and
full-connected layer. The function of the three layers is feature learning and the features can be
learned automatically, which are not depend on any priori knowledge. In this paper, 1-D CNN will
be introduced because the input of CNN is raw vibration signal.
(1) Convolutional layer: It convolves the input local regions with filter kernels and each
kernel is convoluted across the input vector, producing a feature vector. Each filter uses the same
T
kernel to extract the local feature of the input local region, which is usually referred to as weight
IP
sharing in literature. Suppose that input signal x∈Rn, and filter w∈Rm, the valid convolution
process is described as follows:
CR
 xl   ( z l )
 l
z  Y  b
l l
 l
Y  conv( xl 1 , wl ,' valid ')  ( y l (1),..., y l (t ),... y l (n  m  1))  R n  m 1 (1)
 m
 y l (t )  xl 1 (t  i  1) wl (i )
 
i 1
US
AN
where bl is bias vector and zl is the linear activation vector in the lth layer, б(·) is the nonlinear
activation function and xl is the output features in lth layer. To illustrate the convolution step
intuitively, the process is shown in Fig.1 (a). In addition, the meaning of introducing the activation
M
function is to improve the characterization ability of the feature through nonlinear operation. In
application, the commonly used activation functions are Rectified Linear Unit (Relu) and sigmoid.
ED
(a) w1 (b) Max pooling

x1 w1x1+w2x2+w3x3 x1 max(x1,x2)
* Convolution Pooling size=2
w2
x2 w1x2+w2x3+w3x4 x2
w3 max(x3,x4)
PT
max(x5,x6)
Output Vector Output Vector
x5 x5
CE
x6 x6
Input Vector Input Vector
AC
Fig.1. (a) Illustration of the convolution process in the convolutional layer, and (b)
illustration of the max pooling process in the pooling layer
(2) Pooling layer: Essentially, the function of pooling operation is reducing the spatial
dimension, which can reduce computational complexity and effectively control the risk of
over-fitting. The commonly pooling methods contain average pooling, max pooling and norm
pooling. Max pooling is applied in this paper and the stride size of the pooling layer equals to the
pooling size. In the pooling layer l, the max pooling is conducted by
xl  down( xl 1, s) (2)
4
ACCEPTED MANUSCRIPT
where down(·) represents the down-sampling function of max pooling, xl is the output feature
vector of the pooling layer, xl-1 is the feature vector in the previous layer, and s is the pooling size.
For better understanding, Fig. 1(b) shows a max pooling process when s equals to 2.
(3) Full-connected (FC) layer: After several convolutional layers and pooling layers, the
learned features of each kernel are flattened into one vector and it will be the input of the classifier.
The output xl of the lth fully connected layer is obtained by
 x   ( z )
l l
 l l T l 1
(3)
 z  ( w ) x  b
l
where ul is the linear activation in FC layer, xl-1 is the output vector of the layer l-1,wl is the
T
weight matrix in FC layer, and bl is the bias vector.
IP
2.2 Tricks of CNN
Two tricks can be used to serve as interference during the training process to enhance the
CR
anti-noise and domain adaptation ability of CNN. First, batch normalize (BN) [41] is designed to
reduce the shift of internal covariance and accelerate the training process of deep neural network.
Second, drop out [42] is designed for avoiding over fitting by abandon some neural network unit
from the network randomly.
US
(1) BN: BN layer is usually added right after the convolutional layer or fully-connected layer
and before the activation unit. Given the q-dimension input to the lth BN layer yl=(yl(1),…,yl(q)), if
AN
BN layer is added right after the convolutional layer, yl(i)=(yl(i,1),…,yl(i,q)) (i denotes the ith kernel),
and if BN layer is added right after the fully-connected layer, yl(i)=yl(i)=yl(i,1). As shown above, in
BN operation, fully-connected layer can be treated as a special kind of convolutional layer, where
M
the number of neurons q in each feature map equals 1. The transformation of BN layer can be
described as follows:
 l (i , j ) y l (i , j )  
ED
 yˆ 
 ( 2   ) (4)
 l (i , j )
z   l (i ) yˆ l (i , j )   l (i )
PT
where z l(i,j) is the output of one neuron response (j denotes the jth input), μ=E[yl(i,j)],
б2=Var[yl(i,j)], ε is a small constant added for numerical stability, γl(i) and βl(i) are the scale and shift
parameters to be learned, respectively.
CE
(2) Dropout: Dropout trick was firstly proposed by Srivastava et al. to prevent the CNN from
over-fitting. It is simple but effective, which randomly deactivate kernels along with their
connection with kernels in other layers with a probability p during the training process. It had been
AC
proved that drop out can prevent units from co-adapting too much. It makes the full connected of
kernels in convolutional convert to semi-connected, which can be interpreted as sampling a
―thinned‖ network to original ―full‖ network. The drop rate function can be described as follows:
ri l ~Bernoulli ( p )

 l ri * K i
l l
 Ki  (5)
 p
 z l ( j)  K l * x l
 i i j
where p is the Dropout rate, * denotes the element-wise product and ril follows Bernoulli
5
ACCEPTED MANUSCRIPT
distribution, which is used to decide the ith kernel of the lth convolutional layer is dropped or not.
zil(j) is the thinned network output of jth input xj after convolutional process.
2.3 Proposed of MC-CNN

In original CNN, the inputs of network are the raw images or signals without any pre-process
and it may lead low prediction accuracy no matter how hyper-parameters change because the
useful information of the input is not sufficient for precise classification. Thus, it is meaningful to
enhance the prediction ability of the model by applying information enhancement technology to
the input. In 2-D CNN, there are some common tricks for information augmentation such as
horizontal/vertical turnover, color transformation, rotation transformation and noise adding. The
T
basic theory of these approaches is strengthening the robust of model by increasing number of
inputs. Another kind of tricks is information filtering such as random crop and random size
IP
changing. The main purpose of these tricks is to highlight the classification information of the
input. However, both information augmentation and information filtering tricks rely on the human
CR
knowledge and operation, which cannot ensure the performance improvement of model when
meet with variety of input samples. Moreover, the information enhancement technology is still not
applied into 1-D CNN when deal with the machinery fault diagnosis. Therefore, it is necessary to
design a self-adaptive information enhancement method for 1-D CNN.
As we know, the sensitive frequencies of signals are diverse in different conditions and they
may exist in different frequency bands. It is useful to apply the filters with different frequency
US
AN
resolution to find them out, which can enhance the classification information of input in frequency
domain. In addition, the different scales (lengths) of filters have different frequency resolution.
Thus, it is reasonable to add a multi-scale information fusion layer before convolutional layer in
M
CNN to enhance the classification information of the input. To combine the information of
different scales, the signals convoluted by kernels of different scales have been cascaded in series
since the frequency components of series signal contain all the sensitive frequency bands of
ED
multi-scale convoluted signals. The multi-scale information fusion layer named MC-CNN is
proposed in this paper. The architecture of MC-CNN is shown in Fig.2.
PT
.
.
.
.
CE
Scale 1
Sub-
sampling
-0.25
-0.15
-0.05
0.05
0.15
0.25
-0.2
-0.1
0.1
0.2
.
0
0
Sub-
Convolution
. sampling
100
Convolution
200
. . .
. Sub-
300
. sampling
Convolution
.
. .
400
.
Convolution
. . . .
AC
. .
500
. .
600
. . . . .
. .
700
. . . .
.
Scale 2 . . . .
.
800
.
.
. .
900
.
. . .
.
1000
.
.
.
. .
.
.
Scale n
Input MC C1 P1 C2 P2 C3 P3 F
Signal Kernel size Kernel size Pooling size Kernel size Pooling size Kernel size Pooling size Neuron number
[100 200 300] 8×1×8 2 32×1×8 2 16×1×8 2 112
Stride Stride Stride
2 4 2
Fig.2 The architecture of MC-CNN

6
ACCEPTED MANUSCRIPT
2.4 Training of MC-CNN

The training of 1-D CNN was detailed in many references [43-44]. In this section, the
training process of MC-CNN especially the kernels in MC layer has been elaborated. The loss
function of MC-CNN is the cross-entropy between the estimated soft-max output probability
distribution and the target class probability distribution. Suppose that p(x) denotes the target
distribution and q(x) denotes estimation distribution, the cross-entropy between p(x) and q(x) is
shown as follows
Loss  H ( p( x), q( x))   p( x)log q( x) (6)
x
For the back propagation of MC-CNN, the gradient δ of output layer can be described as
T
follows
IP
L( w, b) L( w, b)
L     '( z L ) (7)
z L
x L
CR
where  represents for Hadamard product.
The transmission of the gradient in each layer can be stated as follows
(1) MC layer
j 1
m   L ku  1
US
 j i ,l   i ,l 1[m  1: m  L  k j  1]  rot180( wm j l 1 )   ' ( z i ,l )
(8)
AN
u 1
where wmj is the kernel of jth scale in MC layer, б(zi,l) is output of lth layer of ith sample ,kj is
the length of jth kernel and L is the length of δi,j.
M
(2) Full-connected layer
 i ,l  (wl 1 )T  i ,l 1   ' ( zi ,l ) (9)

ED
(3) Convolutional layer
 i ,l   i ,l 1  rot180(wl 1 )   ' ( z i ,l ) (10)

PT
(4) Pooling layer
 i ,l  upsample( i ,l 1 )   ' ( zi ,l ) (11)

CE
In addition, the update of kernels w, wm and bias b can be processed in every epoch and they
can be described as
AC
(1) MC layer
m
wm j l  wm j l    j i ,l * rot180(ai ,l 1 ) (12)
i 1
(2) Full-connected layer

m m
wl  wl    i ,l (ai ,l 1 )T , bl  bl    i ,l (13)
i 1 i 1
(3) Convolutional layer
7
ACCEPTED MANUSCRIPT
m m
w  w     rot180(a
l l i ,l i , l 1
), b  b    ( i ,l )u , v
l l
(13)
i 1 i 1 u , v
After the optimization, the signal can act as the input of MC-CNN directly and the learned
kernels and output features of different condition signals will be analyzed.
3、 Experimental verification
In this section, the bearing datasets involving normal, inner fault, outer fault and ball fault are
used to verify the performance of MC-CNN. The structural framework of the proposed
fault diagnosis method based on MC-CNN is shown in Fig.3. The original experiments data
T
was collected from the accelerometers of the motor driving mechanical system at a sampling
frequency of 12 kHz from the Case Western Reserve University (CWRU) Bearing Data center
IP
[45]. Each sample contains 985 data points, which is matching with the structure of the MC-CNN.
There are 200 sets of sampling signal in each healthy condition. The diagnosis results of the
CR
experiment datasets based on original CNN and MC-CNN are compared and the specific
evolution of 1-D data in MC-CNN is displayed. In addition, different number of training samples
and samples with different signal noise ratios (SNRs) are used to verify the advantages of
MC-CNN compared with original 1-D CNN.
US
Signal Acquisition
AN
M
Training Sample Testing Sample

8
6
6
4
4
2 2
0
a/(ms -2)
0
加速度
a/(ms -2)
加速度
-2
-2
-4
-4
-6
-6 -8
-10
-8 0 0.02 0.04 0.06 0.08 0.1 0.12
0 0.02 0.04 0.06 0.08 0.1 0.12 时间 t/s
时间 t/s
ED
MC-CNN Model Construction

Training
Testing
PT
.
.
. .
Kernel Learning . Classification
.
. .
. .
.
CE
Bearing Fault Diagnosis

AC
Fig.3 The structural framework of the proposed fault diagnosis method based on MC-CNN
3.1 Diagnosis results and performance comparison with different training datasets
To compare the classification performance of MC-CNN and original CNN, three datasets
(Dataset A, B and C) shown in Table 1 are constituted with different training samples and testing
samples. The details of the architecture of MC-CNN and original CNN are shown in Table 2 and
Table 3, respectively. There are three convolutional layers, pooling layers and one MC layer in the
MC-CNN. However, the original CNN only contains convolutional layers and pooling layers. The
kernel sizes of the first layer in both MC-CNN and CNN are wide because the wider kernels can
8
ACCEPTED MANUSCRIPT
better restrain high frequency noise compared with small kernels [27]. The small kernels in the
following layer make the networks deeper, which helps to get useful representation of the input
signals and improve the performance of the network. The length of multi-scale kernels in the first
layer of MC-CNN is set to be 100, 200 and 300, respectively. On the other hand, the kernel size in
the first layer of original CNN is set to be 200 and the kernel number is the same as that of
MC-CNN in first layer. The normalized training samples are the input of both MC-CNN and
original CNN. The activation function in the MC-CNN is Sigmoid and pooling type is max
pooling. Both Sigmoid and Relu activation functions are used in original CNN to compare the
effectiveness with MC-CNN. The learning rate of the three methods is 0.01 and the dropout rate is
0.4. The training epoch is set to be 20 and the batch size is 20. After each convolutional layer and
T
full-connected layer, batch normalization is used to improve the performance of MC-CNN. The
IP
number of output neurons in MC-CNN is 112 and that in original CNN is 152.
Tab.1 Description of rolling element bearing datasets
CR
Condition Normal Ball Inner Outer
Dataset train 75 75 75 75
A test 125 125 125 125
Dataset
B
Dataset
train
test
train
100
100
175
US 100
100
175
100
100
175
100
100
175
AN
C test 25 25 25 25
Tab.2 The details of the architecture of MC-CNN

Layer Parameter name Parameter size Stride Output size
M
Input layer / / / 985×1

MC Kernels [100 200 300] 1 2358×1
ED
C1 Kernels 8×1×8 2 1176×8

P1 Max pooling 2 / 588×8
C2 Kernels 32×1×8 4 140×8
PT

C3 Kernels 16×1×8 2 28×8
CE
F / / / 112
Tab.3 The details of the architecture of CNN

AC

C1 Kernels 200×1×3 1 786×3
C2 Kernels 8×1×8 1 779×8
P1 Max pooling 2 / 390×8(padding)
C2 Kernels 32×1×8 2 180×8
C3 Kernels 16×1×8 2 38×8
F / / / 152
9
ACCEPTED MANUSCRIPT
Fifty times of the experiments had been accomplished and the average accuracies of the three
models are shown in Fig.4. It can be seen from the Fig.3 that the accuracies of MC-CNN are
higher than that of CNN-S (CNN of sigmoid activation function) and CNN-R (CNN of Relu
activation function) obviously in all the datasets. Although, it can be found that the original CNNs
can distinguish most of the testing samples accurately, but they are still less excellent than
MC-CNN in classification, which can be concluded that multi-scale cascade signals can provide
more distinguishing information than the raw signals.
100%
MC-CNN-S
CNN-S
95 % CNN-R
T
IP
90 %
85 %
CR
80 %
75 %
70 %
US
AN
Dataset A Dataset B Dataset C
MC-CNN-S 97.2±0.16 98.5±0.14 99.7±0.11

CNN-S 94.1±0.37 94.9±0.29 96.5±0.23
CNN-R 85.7±0.42 87.2±0.32 92.0±0.17
M
Fig. 4. Testing accuracies of the three datasets using the three methods
To reveal what limits the effectiveness of original CNNs, the losses of the three methods
ED
based on the dataset C are shown in Fig. 5. The loss of MC-CNN is close to zero in the end of
epochs, but the losses of original CNNs converge to about 0.1, which can be attributed to the lack
of classification information in the raw signals. Thus, it can be proved that MC layer actually
PT
make the signal information enhancement by combining the useful components come from
different frequency resolutions. Even though the data amount has been increased caused by series
operation, one more convolutional layer with small kernels and pooling layer can address it
CE
effectively. Therefore, it is reasonable to add MC layer in the CNN when deal with the fault
classification problem of machinery signals. In addition, the running time of MC-CNN and
original CNN is compared in Table 4. It can be sure that the running time of MC-CNN is higher
AC
than CNN because of the deeper layer and longer input produced by the cascade. However, the
sampling time of one sample is about 0.085s because the sampling frequency is 12 kHz and the
running time of MC-CNN is lower than the sampling time. Therefore, the fault diagnosis method
based on MC-CNN can be applied to real time monitoring of rotating machinery.
Tab.4 Running time comparison of MC-CNN and CNN
Running Time(s)
Model
Testing Sample(Dataset A) Average Sample
MC-CNN 38.22 0.0637
CNN 27.85 0.0464
10
ACCEPTED MANUSCRIPT
0.9
MC-CNN-S
0.8 CNN-R
CNN-S
0.7
0.6
Error 0.5
0.4
0.3
0.2
T
0.1
IP
0
0 100 200 300 400 500 600 700
Epoch
CR
Fig. 5. The loss of MC-CNN-S, CNN-R and CNN-S
3.2 The details of MC-CNN
US
In order to reveal the function of multi-scale cascade layer, the evolution of the neurons in the
MC-CNN, the learned cascade kernels, multi-scale cascade signals and the output of each layer in
the MC-CNN are displayed in this section. It can be seen from the Fig. 6 that the length of cascade
AN
signals has been increased because of the series connection of signals convoluted by kernels with
different scales. In addition, the waveform shapes of cascaded signals have been changed. In Fig.
6 (a), there is little difference between the signal wave shapes (SWS) of ball and normal. However,
M
in Fig. 6 (b), the cascade-SWS of them are totally different, which can provide more useful
features for classifier. In addition, the shapes of inner and outer cascade signals have been altered
and own more information than raw signals. On the other hand, the frequency domains of the raw
ED
signals and MC signals are shown in Fig. 7. It can be concluded from the comparisons of the
frequency components that the MC signals remain the principal frequencies but get rid of some
non-essential frequencies, which can increase the discrimination degree of each signal because the
PT
convolution operator in time domain is equal to product in frequency domain.

CNN are able to learn features from the signals automatically and t-distributed stochastic
neighbor embedding (t-SNE) is utilized to visualize the learned features. To verify that MC
CE
operation can make the classification information of signal enhanced, the three learned features in
the MC layer and the first convolutional layer of CNNs are mapped into two-dimension features.
The mapped features of the three models are shown in Fig. 8 and it can be seen from the maps that
AC
the features in first layer of CNNs overlap in ball and normal. It is the same with the samples of
outer and inner. In contrary, the features in MC-layer cluster well in the samples of normal and
outer even the ball and inner are overlapped. Moreover, the t-SNE clustered error of MC-CNN is
less than that of CNNs, which makes network to converge easier.
To reveal the meaning of learned kernels in MC-CNN, the cascade kernels are shown in Fig.
9. It can be seen from the Fig. 9 that the three cascaded kernels are consist of several sinusoidal
signals of different low frequencies. The kernels of larger scale have higher resolution in
frequency domains, which are used to integrate the useful frequency bands containing more
classification information under different frequency resolutions. Thus classification degrees of the
four cascaded signals in time domain are higher than raw signals. After cascaded, the features in
11
ACCEPTED MANUSCRIPT
full-connected layer are filtered out by the following three convolutional layers. The convolutional
kernels shown in Fig. 10 are sets of filters actually. Although a few of these filters present similar
properties in different layers, most of them have different frequency bands concerning
low-frequency or high-frequency information of the input MC signals.
0.5 4 20
10
2 10
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
-2 -10
-10
-0.5 -4 -20
0 500 1000 0 500 1000 0 500 1000 1500 2000 0 500 1000 1500 2000
Sampling Point Sampling Point Sampling Point Sampling Point
Ball Inner Ball Inner
T
5 2 100
0.2
1 50
Amplitude
Amplitude
Amplitude
0.1
Amplitude
IP
0 0 0 0
-0.1
-1 -50
-0.2
-5 -2
CR
-100
0 500 1000 0 500 1000 0 500 1000 1500 2000 0 500 1000 1500 2000
Sampling Point (a) Sampling Point Sampling Point Sampling Point
Normal Outer Normal
(b) Outer
Fig.6 Raw signals (a) and MC signals (b) of each healthy condition of bearings
0.1 0.4
0.3
US 1.5 3
Amplitude
Amplitude
Amplitude
1
Amplitude
0.05 0.2
0.5 1
AN
0.1
0 0 0 0
0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000
Frequency(Hz) Frequency(Hz) Frequency(Hz) Frequency(Hz)
0.08 0.8 0.8 30

M
0.06 0.6 0.6

Amplitude
Amplitude
20
Amplitude
Amplitude
0.04 0.4 0.4

10
0.02 0.2 0.2
0 0
ED
0 0
0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000
Normal (a) Outer Normal (b) Outer
Fig.7. Frequency domain of Raw signals (a) and MC signals (b) of each healthy condition
100 100
PT
Mapped feature 2
Mapped feature 2
50 50
0 0
CE
-50 -50
-100 -100
-100 -50 0 50 100 -100 -50 0 50 100
Mapped feature 1
AC
Mapped feature 1
CNN-S:0.91 CNN-R:1.06
100
Ball
Mapped feature 2
50
Inner
0 Normal
-50 Outer
-100
-100 -50 0 50 100
Mapped feature 1
MCCNN:0.71
Fig.8. The visualization of the MC layer in MC-CNN and the first convolutional layer of CNN-S
and CNN-R for Dataset C
12
ACCEPTED MANUSCRIPT
1 0.4
Amplitude
Amplitude
0 0.2
-1 0
0 10 20 30 40 50 60 70 80 90 100 0 1000 2000 3000 4000 5000 6000
Kernel size:100 Frequency(Hz)
K1 K1
1 0.4
Amplitude
Amplitude
0 0.2
-1 0
0 20 40 60 80 100 120 140 160 180 200 0 1000 2000 3000 4000 5000 6000
K2 K2
0.5 0.1
Amplitude
Amplitude
0 0.05
-0.5 0
0 50 100 150 200 250 300 0 1000 2000 3000 4000 5000 6000
K3 K3
(a) Time domain
(b) Frequency domain
T
Fig. 9. The learned cascaded kernels
IP
Learned kernels in C1 Learned kernels in C2
0.5 0.5 0.2 0.2
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
CR
-0.5 -0.5 -0.2 -0.2
2 4 6 8 2 4 6 8 5 10 15 20 25 30 5 10 15 20 25 30
0.5 0.5 0.2 0.2
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
-0.5 -0.5 -0.2 -0.2

2 4 6 8 2 4 6 8 5 10 15 20 25 30 5 10 15 20 25 30
US
0.5 0.5 0.2 0.1
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
-0.5 -0.5 -0.2 -0.1

2 4 6 8 2 4 6 8 5 10 15 20 25 30 5 10 15 20 25 30
1 0.5 0.2 0.2
Amplitude
Amplitude
Amplitude
Amplitude
AN
0 0 0 0
-1 -0.5 -0.2 -0.2

2 4 6 8 2 4 6 8 5 10 15 20 25 30 5 10 15 20 25 30
Learned kernels in C3
-0.05 -0.05
Amplitude
Amplitude
M
-0.1 -0.1
-0.15 -0.15
5 10 15 5 10 15
-0.05 -0.05
Amplitude
Amplitude
ED
-0.1 -0.1
-0.15 -0.15
5 10 15 5 10 15
-0.05 -0.05
Amplitude
Amplitude
-0.1 -0.1
PT
-0.15 -0.15
5 10 15 5 10 15
-0.05 -0.05
Amplitude
Amplitude
-0.1 -0.1
CE
-0.15 -0.15
5 10 15 5 10 15
Fig.10. The learned kernels of convolutional layers in MC-CNN

AC
In order to reveal how the raw signals change in the MC-CNN and what is the input of the
classifier, the evolution of the inputs in MC-CNN and the features of full-connected (FC) layer are
shown in Fig. 11 and Fig. 12, respectively. It can be seen from the Fig. 11 that the outputs of the
C3 (the third convolutional layer) are similar to the wave shape of MC signals and they can be
distinguished obviously in the four conditions. The full connected features are constituted by the
features in C3 and it can be seen from the Fig.12 that the features before putting into the classifier
are linear separable which can increase the classification accuracy. Furthermore, the full connected
features of CNN-S, CNN-R and MC-CNN are mapped into two-dimension features using t-SNE
which are shown in Fig. 13. Obviously, the mapped features of MC-CNN cluster better than CNNs
even though the CNNs can differentiate most samples. Thus, the t-SNE error of MC-CNN is the
13
ACCEPTED MANUSCRIPT
lowest among the three models.

1 1 1 1 1 1
0.5 0.5 0.5 0.5 0.5 0.5
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
0.7
1 1 1 1 1 1
0.6 8
0.6
0.5
6
0.5 0.5 0.5 0.5 0.5 0.5
0.4
4
0.3 0.5
2
0 0 0 0 0 0
0.2
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
0.1 0
1 1 1 1 0.5 1
0.4
0
-2
-0.1
-4
0.5 0.5 0.3
-0.2 0.5 0.5 0.5
-6
-0.3
0.2
-0.4
0 100 200 300 400 500 600 700 800 900 1000
-8
0 500 1000 1500 2000 2500 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 0
0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
1 1 1 1 C3 1 1
0.1
0.5 0.5 0.5 0.5

0
0.5 0.5 0 20 40 60 80 100 120
0 0
0 500 1000 1500 0 500 1000 1500 0 0 50 100 150
0
0 50 100 150
0 0
0 10 20 30 0 10 20 30
Input MC C1 C2 C3 F
(a):Ball
1 1 1 1 1 1
0.5 0.5 0.5 0.5 0.5 0.5

0.7
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
1 1 1 1 1 1 0.6
3
15
T
2 10
0.5 0.5 0.5 0.5 0.5 0.5 0.5
5
1
0 0 0 0 0 0
0
0 0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30 0.4
-5 1 1 1 1 0.5 1
-1
-10 0.3
-2
0.5 0.5 0.5 0.5 0.5
-15
-20
0.2
-3
0 100 200 300 400 500 600 700 800 900 1000 0 500 1000 1500 2000 2500 0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
C3
IP
1 1 1 1 1 1 0.1
0.5 0.5 0.5 0.5 0.5 0.5 0

0 20 40 60 80 100 120
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
Input MC C1 C2 C3 F
CR
(b):Inner
1 1 0.1 0.1
0.8 0.8
0.5 0.5 0.05 0.05
0.6 0.6
0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0.4 0.4 0.6
1 1 0 10 20 30 0 10 20 30
0.25 2 0.2 0.1
0.2
0.8 0.8
1.5
0.15 0.5 0.5 0.1 0.05 0.55
0.1
1
0.6 0.6
0.5
0.05 0 0 0 0
0 0 0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0.4 0.4
1 1 0 10 20 30 0 10 20 30 0.5
-0.05
0.1 0.1
US
-0.5
-0.1 0.8
-1
-0.15 0.5 0.5 0.45
0.05 0.05
-0.2
-1.5
0.4 0.6 0.45
-0.25
0 100 200 300 400 500 600 700 800 900 1000
-2
0 500 1000 1500 2000 2500 0 0 0.35
0 500 1000 1500 0 500 1000 1500 0 0
0 50 100 150 0 50 100 150 0.4
1 1 0 10 20 30 0 10 20 30
0.2 0.2 0.4
0.8 0.8
0.5 0.5
0.1 0.1
0.6 0.6
0.35
0 0 0 20 40 60 80 100 120
0 500 1000 1500 0 500 1000 1500 0 0 0.4 0.4
0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
Input MC C1 C2 C3 F
AN
(c):Normal
1 1 1 1 0.04 0.04
0.5 0.5 0.5 0.5 0.02 0.02
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
5 0.03
100
1 1 1 1 0.04 0.02
4
80
3
60 0.5 0.5 0.5 0.5 0.02 0.01
2
0.025
40
1 20
0 0 0 0 0 0
0 0 0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
0.02
-1 -20 1 1 1 1 0.02 0.04
-40
-2
-60
-3 0.5 0.5 0.5 0.5 0.01 0.02
-80 0.015
M
-4
0 100 200 300 400 500 600 700 800 900 1000 -100
0 500 1000 1500 2000 2500
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
0.01
1 1 1 1 0.04 0.04
0.5 0.5 0.5 0.5 0.02 0.02 0.005
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
0
Input MC C1 C2 C3 F
0 20 40 60 80 100 120
(d):Outer
ED
Fig.11 The whole evolution of signals of each healthy condition in MC-CNN
0.8 0.8
PT
0.6 0.6
Amplitude
Amplitude
0.4 0.4
CE
0.2 0.2
0 0
0 50 100 0 50 100
AC
FC feature FC feature
Ball Inner
0.6 0.03
0.55
0.5 0.02
Amplitude
Amplitude
0.45
0.4 0.01
0.35
0
0 50 100 0 50 100
Normal Outer
Fig.12 Full-connected layer features of each signals based on MC-CNN
14
ACCEPTED MANUSCRIPT
60 40
Mapped feature 2
40
Mapped feature 2
20
20
0
0
-20 -20
-40 -40
-40 -20 0 20 40 60 -20 -10 0 10 20 30
Mapped feature 1 Mapped feature 1
CNN-S:0.418 CNN-R:0.437
60
Ball
Mapped feature 2
40
T
Inner
20
Normal
IP
0
Outer
-20
CR
-40
-100 -50 0 50 100
Mapped feature 1
MCCNN:0.236
Fig.13 The visualization and cluster error of the full-connected features for Dataset C
3.3 Diagnosis results and performance comparison under noise condition US

In this section, the diagnosis accuracy of the proposed MC-CNN method under noise
AN
environment will be discussed. The white Gaussian noise will be added to the four original signals
to composite signals with different SNR, and the definition of SNR is shown as follows:
Pnoise
SNRdb  10log10(
M
)
Psignal
where Psignal and Pnoise are the power of signal and the noise, respectively. The noisy signals
ED
samples contain 0dB, 2dB, 4dB and 6dB. The structures and parameters of MC-CNN and CNNs
are the same as the previous part.
The diagnosing results of the proposed MC-CNN model with noise signals have been
PT
obtained after fifty times of experiments and the average accuracy are shown in Table 5. It can be
seen that all the models achieve higher accuracy when SNR value is lower and the accuracy of
MC-CNN achieve highest value no matter how does the SNR changes. The noisy samples of 4dB
CE
are used as an example to display the process of MC-CNN. The time domain and frequency
domain of original signals and MC signals with 4dB SNR are shown in Fig. 14 and Fig. 15,
respectively. It cannot be found some obvious difference between raw signals and MC signals
AC
especially in ball and normal. However, in frequency domain, the frequency components of MC
signals are totally different from that of raw signals. The frequency distributions of raw signals in
ball and normal are almost the same, which is caused by the noise. On contrary, some
characteristic frequencies arise after the raw signals are multi-scale cascaded in signals of ball and
normal, which can help the convolutional layer of CNN to extract the classification information.
In addition, time domain and frequency domain of the learned cascaded kernels are shown in
Fig.16. It can be seen from the frequency domain of different scale kernels that each kernel has its
own sensitive frequency band and the cascaded signals have all the sensitive bands of the three
kernels. Finally, the full-connected features of the four noise signals are shown in Fig.17 (a). It can
be clearly revealed that the difference of outer from other features and the features of ball，normal
15
ACCEPTED MANUSCRIPT
and inner are similar in general. In order to prove that the full-connected feature difference of
different conditions is greater than that of same condition, the full-connected features of four
samples come from the same condition are shown in Fig. 17 (b), (c), (d), respectively. It can be
found that the wave of features come from the same condition are almost the same and details of
features come from different condition can be figured out by the red circle. Thus, it can be
demonstrated that MC-CNN still has the ability to distinguish the fault conditions even though in
noisy environment.
4 4
200 200
2 2 100 100
Amplitude
Amplitude
Amplitude
Amplitude
T
0 0 0 0
-2 -2 -100 -100
IP
-4 -4 -200 -200
0 500 1000 0 500 1000 0 500 1000 1500 2000 0 500 1000 1500 2000
4 10 200 1000
CR
2 5 100 500
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
-2 -5 -100 -500
-4 -10 -200 -1000
US
0 500 1000 0 500 1000 0 500 1000 1500 2000 0 500 1000 1500 2000
Fig.14. Original signals (a) and MC signals (b) with SNR value of 4 dB
AN
0.2 0.4 10 20
0.15 0.3 15
Amplitude
Amplitude
Amplitude
Amplitude
0.1 0.2 5 10
M
0.05 0.1 5
0 0 0 0
0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000
0.2 1 10 80
ED
0.15 60
Amplitude
Amplitude
Amplitude
Amplitude
0.1 0.5 5 40
0.05 20
PT
0 0 0 0
0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000
Frequency(Hz) (a) Frequency(Hz) Frequency(Hz) (b) Frequency(Hz)
Normal Outer Normal Outer
Fig.15. Frequency domain of Raw signals (a) and MC signals (b) with SNR value of 4 dB
CE
20 4
Amplitude
Amplitude
0 2
-20 0
AC
0 10 20 30 40 50 60 70 80 90 100 0 1000 2000 3000 4000 5000 6000

K1 K1
10
Amplitude
2
Amplitude
0 1
-10 0
0 20 40 60 80 100 120 140 160 180 200 0 1000 2000 3000 4000 5000 6000
K2 K2
10
Amplitude
2
Amplitude
0
1
-10
0 50 100 150 200 250 300 0
0 1000 2000 3000 4000 5000 6000
Kernel size:300
Frequency(Hz)
K3
K3
(a) Time domain (b) Frequency domain
Fig.16 The learned cascaded kernels of noisy signals
16
ACCEPTED MANUSCRIPT
1 1 0.8 0.8
0.6 0.6
Amplitude
Amplitude
Amplitude
Amplitude
0.5 0.5 0.4 0.4
0.2 0.2
0 0 0 0
0 50 100 0 50 100 0 50 100 0 50 100
Ball Inner Inner Inner
1 0.4
1 1
0.3
Amplitude
Amplitude
Amplitude
Amplitude
0.5 0.2
0.5 0.5
0.1
0 0
0 50 100 0 50 100 0 0
0 50 100 0 50 100
T
Normal Outer Inner
(a) Four conditions (b) Four samples of Inner Inner
IP
0.8 0.8 0.8 0.8
0.6 0.6 0.6 0.6
Amplitude
Amplitude
Amplitude
Amplitude
CR
0.4 0.4 0.4 0.4
0.2 0.2 0.2 0.2
0 0 0 0
0 50 100 0 50 100 0 50 100 0 50 100
Ball Ball Normal Normal
US
0.8 0.8 1 1
0.6 0.6
Amplitude
Amplitude
Amplitude
Amplitude
0.4 0.4 0.5 0.5
0.2 0.2
AN
0 0 0 0
0 50 100 0 50 100 0 50 100 0 50 100
(c) Four samples of Ball (d) Four samples of Normal
samples of No
Fig.17 Full-connected layer features of each condition signals with SNR value of 4 dB
M
Tab.5 Diagnosis accuracies of the three models under different SNR

SNR(dB) 0 2 4 6
ED
MC-CNN 99.61±0.11 97.33±0.23 95.71±0.17 91.33±0.08

CNN-S 95.32±0.38 89.54±0.27 83.69±0.41 74.54±0.62
CNN-R 88.13±0.46 83.17±0.39 71.82±0.42 62.27±0.35
PT
3.4 Diagnosis results comparison with different kernel scales in MC layer

In the part 3.2, the scale of the kernels in MC layer is set to be 100, 200 and 300. The scale is
CE
set artificially and it is hard to define the optimum scale for any signal because the frequency
distribution of different signal is diverse. However, it is determined that the different scales have
different contributions to the fault diagnosis. In this section, three sets of kernels with different
AC
scales in MC layer are used to construct three MC-CNN models and the fault diagnosis accuracy
will be compared. The three scales of kernels are shown in Table 6.
Tab.6 Different scales of kernels in MC layers
Kernels Scale 1 Scale 2 Scale 3
Ⅰ 50 100 150
Ⅱ 100 200 300
Ⅲ 150 300 450
The training data is the dataset A, B and C in part 3.1 and the training parameters are the
same. After fifty times of experiments, the comparison results are shown in Fig.18. It can be found
17
ACCEPTED MANUSCRIPT
from the results that the all the three models have higher diagnosis accuracy than original CNNs
and the kernels Ⅱ has the highest in all the three datasets, which can be deduced that the optimal
scale exists and it can help to obtain the higher recognition accuracy. In the future research, the
optimal scale will be considered as a hyperparameter to be learned using some adaptive
optimization algorithms.
100%
Scale 1
Scale 2
95 % Scale 3
T
90 %
IP
85 %
CR
80 %
75 %
70 %
Dataset A US
Dataset B Dataset C
AN
Scale 1 95.7±0.07 97.0±0.21 96.9±0.17
Scale 2 97.2±0.16 98.5±0.14 99.7±0.11
Scale 3 96.3±0.31 98.2±0.18 98.5±0.12
Fig. 18. Testing accuracies of kernels with different scales in MC layer
M
3.5 Diagnosis results and performance comparison using nonstationary dataset

In rotation mechanical fault diagnosis, vibration signal of rolling bearing under vibration
ED
condition can be considered as the nonstationary signal because the signal dose not satisfy the
stability requirement of Fourier transform. However, the operation conditions of rolling bearing
are always changing in practical engineering. Therefore, it is meaningful to study the effectiveness
PT
of the proposed method in the application of non-stationary signals. In this section, the signals
under variable speed and loading are used to verify the validity of MC-CNN. The Suzhou
University bearing fault simulation bench is shown in Fig.19 and the signals are collected on it.
CE
The type of the experimental rolling bearing is SKF6205. Six operation conditions had been
simulated during the experiment as shown in Table 7. Vibration signals of different fault patterns
under the six operation conditions had been collected, which are normal, inner fault of 0.2mm,
AC
inner fault 0.4mm, outer fault of 0.2mm, outer fault of 0.3mm and rolling ball fault of 0.2mm. The
sampling frequency is 8192Hz and the sampling length of one sample is 1024. There were 160
samples in each fault pattern dataset under one operation condition. The datasets of the training
samples are the same as Table 1. The time-plot of different fault pattern signal under operation
condition Ⅰ and the time-plot of 0.4mm inner fault signal under different operation conditions are
shown in Fig.20 and Fig.21, respectively.
The model structure of MC-CNN is the same as the Tab.3. The activation function is Sigmoid
and pooling type is max pooling. Similarly, the Sigmoid and Relu activation functions of CNN are
compared with MC-CNN. In addition, the learning rate of the three methods is 0.01, the dropout
rate is 0.4, the training epoch is set to be 50 and the batch size is 20.
18
ACCEPTED MANUSCRIPT
Fig.19 Suzhou University bearing fault simulation bench
T
IP
Operation Condition Loading(N) Speed (rpm)
CR
Ⅰ 0 900
Ⅱ 2000 900
Ⅲ 0 1200
Ⅳ
Ⅴ
Ⅵ
US
2000
0
2000
1200
1500
1500
AN
Normal Inner fault of 0.2mm
5
Amplitude
Amplitude
a/(ms -2)
200
a/(ms -2)
0 0
-200
-5
0 0.05 0.1 0 0.05 0.1
M
Time t/s Time t/s
Inner fault of 0.4mm Outer fault of 0.2mm

200 200
Amplitude
Amplitude
a/(ms -2)
a/(ms -2)
0 0
ED
-200 -200
0 0.05 0.1 0 0.05 0.1
Time t/s Time t/s
Outer fault of 0.3mm Rolling ball fault of 0.2mm

200 20
Amplitude
Amplitude
a/(ms -2)
a/(ms -2)
PT
0 0
-200 -20
0 0.05 0.1 0 0.05 0.1
Time t/s Time t/s
CE
Fig.20 Time-plot of different fault pattern signal under operation condition Ⅰ

0N+900rpm 0N+1200rpm
200 200
Amplitude
Amplitude
a/(ms -2)
a/(ms -2)
AC
0 0
-200 -200
0 0.05 0.1 0 0.05 0.1
Time t/s Time t/s
0N+1500rpm 2000N+900rpm
500 100
Amplitude
Amplitude
a/(ms -2)
a/(ms -2)
0 0
-500 -100
0 0.05 0.1 0 0.05 0.1
Time t/s Time t/s
2000N+1200rpm 2000N+1500rpm
100 500
Amplitude
Amplitude
a/(ms -2)
a/(ms -2)
0 0
-100 -500
0 0.05 0.1 0 0.05 0.1
Time t/s Time t/s
Fig.21 Time-plot of 0.4mm inner fault signal under different operation condition
19
ACCEPTED MANUSCRIPT
90%
MC-CNN-S
85% CNN-S
CNN-R
80%
75%
70%
65%
T
60%
IP
55%
CR
50%
MC-CNN-S 75.0±0.53 81.4±0.46 83.2±0.47

CNN-S 56.8±0.65 62.2±0.42 64.1±0.41
CNN-R 52.1±0.71
US
67.7±0.50 65.9±0.57
Fig. 22. Testing accuracies of the three nonstationary datasets using the three methods
AN
After fifty times of the experiments, the comparison results are shown in Fig.22. It can be
concluded that the nonstationary information caused by the variable condition does interfere with
model learning, but MC-CNN can still distinguish the different fault pattern in general. Moreover,
M
MC-CNN also has higher recognition accuracy than original CNNs, which proves the
effectiveness of multi-scale cascade layer further.
4、 Conclusions
ED
An improved framework called MC-CNN is proposed for fault classification of machinery. In

this framework, a multi-scale cascaded layer is proposed and it is added in original CNN to
PT
enhance the distinguish ability of signals come from different fault conditions by integrating the
multi-scale information of the original signal. The way to obtain the multi-scale cascaded kernels
is the same as the convolutional kernels, which can be learned by back propagation algorithm.
CE
Bearing datasets with four conditions under normal and noisy environments are used to verify the
proposed MC-CNN. The classification results show that MC-CNN has higher classification
accuracy than original CNNs both in normal and noisy environment. Compared with clustering
AC
effect of the features in MC layer and first convolutional layer in original CNN using t-SNE, the
cascaded features have lower t-SNE clustered error, which can verify the effectiveness and
necessity of MC layer. Furthermore, the whole evolution of signals in MC-CNN has been
displayed in this paper, which may help us understand how the kernels are learned in each layer of
MC-CNN. Finally, it has been proved that MC-CNN can be applied to fault diagnosis under
non-stationary working conditions by analyzing the non-stationary signals. In addition, the
number and scale of kernels in MC layer will be optimized using some adaptive optimization
algorithms in the future research.
20
ACCEPTED MANUSCRIPT
Acknowledgement
This research is supported by the National Natural Science Foundation of China (51575168 and 51875
183) and Key Research and Development Program of Hunan Province (2017GK2182). The authors als
o would like to thank the support from the Collaborative Innovation Center of Intelligent New Energy
Vehicle, the Hunan Collaborative Innovation Center for Green Car.
Reference
[1] Zhao C , Gao F . Fault Subspace Selection Approach Combined With Analysis of Relative Changes for Reconstruction
T
Modeling and Multifault Diagnosis[J]. IEEE Transactions on Control Systems Technology, 2016, 24(3):928-939.
[2] Yu W , Zhao C . Sparse Exponential Discriminant Analysis and Its Application to Fault Diagnosis[J]. IEEE Transactions on
IP
Industrial Electronics, 2018, 65(7):5931-5940.
[3] Yu W , Zhao C . Online Fault Diagnosis in Industrial Processes Using Multimodel Exponential Discriminant Analysis
CR
Algorithm[J]. IEEE Transactions on Control Systems Technology, 2018:1-9.
[4] Yu W , Zhao C . Recursive Exponential Slow Feature Analysis for Fine-scale Adaptive Processes Monitoring with
Comprehensive Operation Status Identification[J]. IEEE Transactions on Industrial Informatics, 2018:1-1.
US
[5] Zhao C , Huang B . A Full-condition Monitoring Method for Nonstationary Dynamic Chemical Processes with Cointegration
and Slow Feature Analysis[J]. AIChE Journal, 2018, 64(5): 1662-1681.
[6] Zhao C, He S. Dynamic distributed monitoring strategy for large-scale nonstationary processes subject to frequent varying
AN
conditions under closed-loop control[J]. IEEE Transactions on Industrial Electronics, 2019, 66(6):4749-4758.
[7] Sun H , Zhang S , Zhao C , et al. A sparse reconstruction strategy for online fault diagnosis in nonstationary processes with no
priori fault information[J]. Industrial & Engineering Chemistry Research, 2017, 56 (24), 6993-7008.
M
[8] Djurdjanovic D, Lee J, Ni J. Watchdog Agent—an infotronics-based prognostics approach for product performance
degradation assessment and prediction [J]. Advanced Engineering Informatics, 2003, 17(3–4):109-125.
[9] Li B, Chow M Y, Tipsuwan Y, et al. Neural-network-based motor rolling bearing fault diagnosis[J]. Industrial Electronics
ED
IEEE Transactions on, 2002, 47(5):1060-1069.

[10] Muruganatham B, Sanjith M A, Krishnakumar B, et al. Roller element bearing fault diagnosis using singular spectrum
analysis[J]. Mechanical Systems & Signal Processing, 2013, 35(1-2):150-166.
PT
[11] Yan R, Gao R X. Approximate entropy as a diagnostic tool for machine health monitoring[J]. Mechanical Systems and
Signal Processing, 2007, 21(2): 824-839.
[12] Sun C, Zhang Z, He Z, et al. Novel method for bearing performance degradation assessment–A kernel locality preserving
CE
Projection-based approach[J]. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical
Engineering Science, 2014, 228(3): 548-560.
[13] Sun C, Zhang Z, He Z, et al. Manifold learning-based subspace distance for machinery damage assessment [J]. Mechanical
AC
Systems and Signal Processing, 2016, 70: 637-649.

[14] Pan Y, Chen J, Guo L. Robust bearing performance degradation assessment method based on improved wavelet packet–
support vector data description[J]. Mechanical Systems and Signal Processing, 2009, 23(3): 669-681.
[15] Yu J. Bearing performance degradation assessment using locality preserving projections and Gaussian mixture models [J].
Mechanical Systems and Signal Processing, 2011, 25(7): 2573-2588. [7] Jin X, Zhao M, Chow T W S, et al. Motor Bearing Fault
Diagnosis Using Trace Ratio Linear Discriminant Analysis[J]. IEEE Transactions on Industrial Electronics, 2014,
61(5):2441-2451.
[16] Pan B, Shi Z, Xu X. R-VCANet: A New Deep-Learning-Based Hyperspectral Image Classification Method[J]. IEEE Journal
of Selected Topics in Applied Earth Observations & Remote Sensing, 2017, 10(5):1975-1986.
[17] Fang B, Li Y, Zhang H, et al. Semi-Supervised Deep Learning Classification for Hyperspectral Image Based on
21
ACCEPTED MANUSCRIPT
Dual-Strategy Sample Selection [J]. Remote Sensing, 2018, 10(4):574.

[18] Brunetti A, Buongiorno D, Trotta G F, et al. Computer Vision and Deep Learning Techniques for Pedestrian Detection and
Tracking: A Survey[J]. Neurocomputing, 2018, 300: 17-33.
[19] Nguyen V N, Jenssen R, Roverso D. Automatic autonomous vision-based power line inspection: A review of current status
and the potential role of deep learning [J]. International Journal of Electrical Power & Energy Systems, 2018, 99:107-120.
[20] G. Hinton, L. Deng, D. Yu, G.E. Dahl, A.R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath, B.
Kingsbury, Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE
Signal Process. Magaz. 29 (6) (2012) 82–97.
[21] Shao H, Jiang H, Zhang H, et al. Rolling bearing fault feature learning using improved convolutional deep belief network
with compressed sensing[J]. Mechanical Systems & Signal Processing, 2018, 100:743-765.
T
[22] Shao H, Jiang H, Lin Y, et al. A novel method for intelligent fault diagnosis of rolling bearings using ensemble deep
IP
auto-encoders [J]. Mechanical Systems & Signal Processing, 2018, 102:278-297.
[23] Wentao Mao, Wushi Feng, Xihui Liang, et al. A novel deep output kernel learning method for bearing fault structural
CR
diagnosis [J]. Mechanical Systems & Signal Processing, 2019, 117:293-318.
[24] Shao H, Jiang H, Li X, et al.. Rolling bearing fault detection using continuous deep belief network with locally linear
embedding [J]. Computers in Industry, 2018, 96:27-39.
US
[25] Zhang Y, Li X, Gao L, et al. A new subset based deep feature learning method for intelligent fault diagnosis of bearing[J].
Expert Systems with Applications, 2018, 110: 125-142.
[26] Osama Abdeljaber a , Onur Avci a, Mustafa Serkan Kiranyaz, et al. 1-D CNNs for structural damage detection: Verification
AN
on a structural health monitoring benchmark data [J]. Neurocomputing, 2018, 275: 1308-1317.
[27] Osama Abdeljaber a, OnurAvci a,n, SerkanKiranyaz, et al. Real-time vibration-based structural damage detection using
one-dimensional convolutional neural networks [J]. Journal of Sound and Vibration, 2017, 388: 154-170.
[28] David George, Xianghua Xie, Gary KL Tam. 3D mesh segmentation via multi-branch 1D convolutional neural networks [J].
M
Graphical Models,2018,96:1-10.
[29] Dat Thanh Tran ,Alexandros Iosifidis, Moncef Gabbouj. Improving efficiency in convolutional neural networks with
multilinear filters [J]. Neural Networks, 2018, 105: 328-339.
ED
[30] Mohammed Yousefhussien, David J. Kelbe , Emmett J. Ientilucci. A multi-scale fully convolutional network for semantic
labeling of 3D point clouds [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 143:191-204.
[31] Feng jia, Yaguo Lei, Na Lu, et al. Deep normalized convolutional neural network for imbalanced fault classification of
PT
machinery and its understanding via visualization [J]. Mechanical Systems & Signal Processing, 2018, 110:349-367.
[32] Xiang Li, Qian Ding, Jianqiao Sun. Remaining useful life estimation in prognostics using deep convolution neural networks
CE
[J]. Reliability Engineering & System Safety, 2018, 172:1-11.

[33]Han Y, Tang B, Deng L. Multi-level wavelet packet fusion in dynamic ensemble convolutional neural network for fault
diagnosis[J]. Measurement, 2018, 127: 246-255
AC
[34].Wei Zhang, Chuanhao Li, Guoliang Peng, et al. A deep convolutional neural network with new training methods for bearing
fault diagnosis under noisy environment and different working load [J]. Mechanical Systems & Signal Processing, 2018,
100:439-453.
[35] Xiaoan Y , Minping J . Intelligent fault diagnosis of rotating machinery using improved multiscale dispersion entropy and
mRMR feature selection[J]. Knowledge-Based Systems, 2019,163:450-471.
[36] Jinde Z , Zhanwei J , Haiyang P . Sigmoid-based refined composite multiscale fuzzy entropy and t-SNE based fault
diagnosis approach for rolling bearing[J]. Measurement, 2018, 129:332-342.
[37] Li P , Kong F , He Q , et al. Multiscale slope feature extraction for rotating machinery fault diagnosis using wavelet
analysis[J]. Measurement, 2013, 46(1):497-505.
[38] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (11)
22
ACCEPTED MANUSCRIPT
(1998) 2278–2324.
[39 Bouvrie. Notes on Convolutional Neural Neworks [J] . Neural Nets, 2006.
[40] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition [J]. Computer Science,
2014.
[41]S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015,
Also Available at: arXiv preprint arXiv:1502.03167.
[42] N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural
networks from overfitting, J. Mach. Learn. Res. 15 (1) (2014) 1929–1958.
[43] Abdeljaber O , Avci O , Kiranyaz M S , et al. 1-D CNNs for Structural Damage Detection: Verification on a Structural
Health Monitoring Benchmark Data[J]. Neurocomputing, 2018, 275:1308-1317.
T
[44] Gan J , Wang W , Lu K . A new perspective: Recognizing online handwritten Chinese characters via 1-dimensional CNN[J].
IP
Information Sciences, 2019, 478:375-390.
[45] Case Western Reserve University. Bearing data center web-site: bearing data center seeded fault test data [EB/OL].
CR
[2007-11-27]. http://www/eecs/cwru/edu/laboratory/bearing/.
US
AN
M
ED
PT
CE
AC
23
ACCEPTED MANUSCRIPT
T
IP
CR
Wenyi Huang: received the B.S. degree from East China Jiaotong University, Nanchang, China,
in 2015. Now he is currently working toward the Ph.D. Degree in Hunan University, Changsha,
China. His main research interests include pattern recognition and machinery fault diagnosis.
US
AN
M
ED
PT
CE
Junsheng Cheng: received the Ph.D. degree in manufacturing engineering and automation form
AC
Hunan University in 2005. He is currently a professor in College of Mechanical and Vehicle

Engineering, Hunan University. His main research interests include mechanical fault diagnosis,
dynamics signal processing, vibration and noise control.
24
ACCEPTED MANUSCRIPT
T
IP
CR
Yu Yang: received the B.S. degree, the M.S. and Ph.D. degrees in mechanical engineering from
the College of Mechanical and Vehicle Engineering, Hunan University, Changsha, PR China, in
US
1994, 1997 and 2005, respectively. Her research interests include pattern recognition, digital
signal processing and machine fault diagnosis.
AN
M
ED
PT
CE
AC
Gaoyuan Guo: received the B.E. degree from Southwest Jiaotong University, ChengDu,China,in
2017.Now she is currently working toward the B.S. degree in Hunan University, Changsha, China.
Her main research interests include image processing and pattern recognition.
25
ACCEPTED MANUSCRIPT
(a) w1 (b) Max pooling

x1 w1x1+w2x2+w3x3 x1 max(x1,x2)
* Convolution Pooling size=2
w2
w3 max(x3,x4)
max(x5,x6)
Output Vector Output Vector
x5 x5
T
x6 x6
Input Vector Input Vector
IP
Fig.1. (a) Illustration of the convolution process in the convolutional layer, and (b) illustration of
CR
the max pooling process in the pooling layer
Scale 1
.
.
. US Sub-
sampling
AN
-0.25
-0.15
-0.05
0.05
0.15
0.25
-0.2
-0.1
0.1
0.2
.
0
0
Sub-
Convolution
. sampling
100
Convolution
200
. . .
. Sub-
300
. sampling
Convolution
.
. .
400
.
Convolution
. . . .
. .
500
. .
600
. . . . .
. .
700
. . . .
.
Scale 2 . . . .
.
800
.
.
. .
M
900
.
. . .
.
1000
.
.
.
ED
. .
.
.
Scale n
Input MC C1 P1 C2 P2 C3 P3 F
Signal Kernel size Kernel size Pooling size Kernel size Pooling size Kernel size Pooling size Neuron number
8×1×8 32×1×8 16×1×8
PT
[100 200 300] 2 2 2 112

Stride Stride Stride
2 4 2
Fig.2 The architecture of MC-CNN

CE
AC
26
ACCEPTED MANUSCRIPT
Signal Acquisition
Training Sample Testing Sample

8
6
6
4
4
2 2
a/(ms -2)
0
加速度
a/(ms -2)
加速度
-2
-2
-4
-4
-6
-6 -8
-10
-8 0 0.02 0.04 0.06 0.08 0.1 0.12
0 0.02 0.04 0.06 0.08 0.1 0.12 时间 t/s
时间 t/s
Training MC-CNN Model Construction
Testing
T
.
.
. .
Kernel Learning . Classification
IP
.
. .
. .
.
CR
Bearing Fault Diagnosis
Fig.3 The structural framework of the proposed fault diagnosis method based on MC-CNN
Tab.1 Description of rolling element bearing datasets

US
AN
Condition Normal Ball Inner Outer
A test 125 125 125 125
M

B test 100 100 100 100
ED
C test 25 25 25 25
Tab.2 The details of the architecture of MC-CNN

PT

CE
MC Kernels [100 200 300] 1 2358×1

C1 Kernels 8×1×8 2 1176×8
AC
C2 Kernels 32×1×8 4 140×8

C3 Kernels 16×1×8 2 28×8
F / / / 112
Tab.3 The details of the architecture of CNN

C1 Kernels 200×1×3 1 786×3
27
ACCEPTED MANUSCRIPT
C2 Kernels 8×1×8 1 779×8

P1 Max pooling 2 / 390×8(padding)
C2 Kernels 32×1×8 2 180×8
C3 Kernels 16×1×8 2 38×8
F / / / 152
100%
MC-CNN-S
CNN-S
95 % CNN-R
T
IP
90 %
CR
85 %
80 %
75 %
70 %
US
AN
MC-CNN-S 97.2±0.16 98.5±0.14 99.7±0.11

CNN-S 94.1±0.37 94.9±0.29 96.5±0.23
CNN-R 85.7±0.42 87.2±0.32 92.0±0.17
M
Fig. 4. Testing accuracies of the three datasets using the three methods
ED
PT
Tab.4 Running time comparison of MC-CNN and CNN

Running Time(s)
Model
Testing Sample(Dataset A) Average Sample
CE
MC-CNN 38.22 0.0637

CNN 27.85 0.0464
AC
28
ACCEPTED MANUSCRIPT
0.9
MC-CNN-S
0.8 CNN-R
CNN-S
0.7
0.6
Error 0.5
0.4
0.3
0.2
T
0.1
IP
0
0 100 200 300 400 500 600 700
Epoch
CR
Fig. 5. The loss of MC-CNN-S, CNN-R and CNN-S
US
0.5 4 20
10
2 10
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
-2 -10
-10
AN
-0.5 -4 -20
0 500 1000 0 500 1000 0 500 1000 1500 2000 0 500 1000 1500 2000
5 2 100
0.2
1 50
Amplitude
Amplitude
Amplitude
0.1
M
Amplitude
0 0 0 0
-0.1
-1 -50
-0.2
-5 -2 -100
0 500 1000 0 500 1000 0 500 1000 1500 2000 0 500 1000 1500 2000
ED
Sampling Point (a) Sampling Point Sampling Point Sampling Point

Normal Outer Normal
(b) Outer
Fig.6 Raw signals (a) and MC signals (b) of each healthy condition of bearings
PT
0.1 0.4 1.5 3
0.3
Amplitude
Amplitude
2
Amplitude
1
Amplitude
0.05 0.2
0.5 1
0.1
CE
0 0 0 0
0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000
0.08 0.8 0.8 30

AC
0.06 0.6 0.6

Amplitude
Amplitude
20
Amplitude
Amplitude
0.04 0.4 0.4

10
0.02 0.2 0.2
0 0 0 0
0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000
Fig.7. Frequency domain of Raw signals (a) and MC signals (b) of each healthy condition
29
ACCEPTED MANUSCRIPT
100 100
Mapped feature 2
Mapped feature 2
50 50
0 0
-50 -50
-100 -100
-100 -50 0 50 100 -100 -50 0 50 100
CNN-S:0.91 CNN-R:1.06
100
Ball
Mapped feature 2
50
Inner
T
0 Normal
IP
-50 Outer
-100
-100 -50 0 50 100
CR
Mapped feature 1
MCCNN:0.71
Fig.8. The visualization of the MC layer in MC-CNN and the first convolutional layer of CNN-S
and CNN-R for Dataset C
1
US 0.4
Amplitude
Amplitude
0 0.2
AN
-1 0
0 10 20 30 40 50 60 70 80 90 100 0 1000 2000 3000 4000 5000 6000
K1 K1
1 0.4
Amplitude
Amplitude
0 0.2
0
M
-1
0 20 40 60 80 100 120 140 160 180 200 0 1000 2000 3000 4000 5000 6000
K2 K2
0.5 0.1
Amplitude
Amplitude
0 0.05
ED
-0.5 0
0 50 100 150 200 250 300 0 1000 2000 3000 4000 5000 6000
K3 K3
(a) Time domain
Fig. 9. The learned cascaded kernels

PT
CE
AC
30
ACCEPTED MANUSCRIPT
Learned kernels in C1 Learned kernels in C2

0.5 0.5 0.2 0.2
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
-0.5 -0.5 -0.2 -0.2

2 4 6 8 2 4 6 8 5 10 15 20 25 30 5 10 15 20 25 30
0.5 0.5 0.2 0.2
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
-0.5 -0.5 -0.2 -0.2

2 4 6 8 2 4 6 8 5 10 15 20 25 30 5 10 15 20 25 30
0.5 0.5 0.2 0.1
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
-0.5 -0.5 -0.2 -0.1

2 4 6 8 2 4 6 8 5 10 15 20 25 30 5 10 15 20 25 30
1 0.5 0.2 0.2
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
T
-1 -0.5 -0.2 -0.2
2 4 6 8 2 4 6 8 5 10 15 20 25 30 5 10 15 20 25 30
IP
Learned kernels in C3
-0.05 -0.05
Amplitude
Amplitude
-0.1 -0.1
CR
-0.15 -0.15
5 10 15 5 10 15
-0.05 -0.05
Amplitude
Amplitude
-0.1 -0.1
-0.15
-0.05
5 10
US 15
-0.15
-0.05
5 10 15
Amplitude
Amplitude
-0.1 -0.1
AN
-0.15 -0.15
5 10 15 5 10 15
-0.05 -0.05
Amplitude
Amplitude
-0.1 -0.1
-0.15 -0.15
5 10 15 5 10 15
M
Fig.10. The learned kernels of convolutional layers in MC-CNN

ED
PT
CE
AC
31
ACCEPTED MANUSCRIPT
1 1 1 1 1 1
0.5 0.5 0.5 0.5 0.5 0.5
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
0.7
1 1 1 1 1 1
0.6 8
0.6
0.5
6
0.5 0.5 0.5 0.5 0.5 0.5
0.4
4
0.3 0.5
2
0 0 0 0 0 0
0.2
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
0.1 0
1 1 1 1 0.5 1
0.4
0
-2
-0.1
-4
0.5 0.5 0.3
-0.2 0.5 0.5 0.5
-6
-0.3
0.2
-0.4
0 100 200 300 400 500 600 700 800 900 1000
-8
0 500 1000 1500 2000 2500 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 0
0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
1 1 1 1 C3 1 1
0.1
0.5 0.5 0.5 0.5

0
0.5 0.5 0 20 40 60 80 100 120
0 0
0 500 1000 1500 0 500 1000 1500 0 0 50 100 150
0
0 50 100 150
0 0
0 10 20 30 0 10 20 30
Input MC C1 C2 C3 F
(a):Ball
1 1 1 1 1 1
0.5 0.5 0.5 0.5 0.5 0.5
T
0.7
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
1 1 1 1 1 1 0.6
3
15
2 10
0.5 0.5 0.5 0.5 0.5 0.5 0.5
5
1
0 0 0 0 0 0
0
0 0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30 0.4
1 1 1 1 0.5 1
IP
-5
-1
-10 0.3
-2
0.5 0.5 0.5 0.5 0.5
-15
-20
0.2
-3
0 100 200 300 400 500 600 700 800 900 1000 0 500 1000 1500 2000 2500 0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
1 1 1 1 C3 1 1 0.1
0.5 0.5 0.5 0.5 0.5 0.5 0

0 20 40 60 80 100 120
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
Input MC C1 C2 C3 F
CR
(b):Inner
1 1 0.1 0.1
0.8 0.8
0.5 0.5 0.05 0.05
0.6 0.6
0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0.4 0.4 0.6
1 1 0 10 20 30 0 10 20 30
0.25 2 0.2 0.1
0.8 0.8
US
0.2
1.5
0.15 0.5 0.5 0.1 0.05 0.55
0.1
1
0.6 0.6
0.5
0.05 0 0 0 0
0 0 0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0.4 0.4
1 1 0 10 20 30 0 10 20 30 0.5
-0.05
-0.5 0.1 0.1
-0.1 0.8
-1
-0.15 0.5 0.5 0.45
0.05 0.05
-0.2
-1.5
0.4 0.6 0.45
-0.25
0 100 200 300 400 500 600 700 800 900 1000
-2
0 500 1000 1500 2000 2500 0 0 0.35
0 500 1000 1500 0 500 1000 1500 0 0
0 50 100 150 0 50 100 150 0.4
1 1 0 10 20 30 0 10 20 30
0.2 0.2 0.4
0.8 0.8
0.5 0.5
0.1 0.1
0.6 0.6
0.35
0 0 0 20 40 60 80 100 120
0 500 1000 1500 0 500 1000 1500 0 0 0.4 0.4
0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
Input MC C1 C2 C3 F
AN
(c):Normal
1 1 1 1 0.04 0.04
0.5 0.5 0.5 0.5 0.02 0.02
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
5 0.03
100
1 1 1 1 0.04 0.02
4
80
3
60 0.5 0.5 0.5 0.5 0.02 0.01
2
0.025
40
M
1 20
0 0 0 0 0 0
0 0 0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
0.02
-1 -20 1 1 1 1 0.02 0.04
-40
-2
-60
-3 0.5 0.5 0.5 0.5 0.01 0.02
-80 0.015
-4
0 100 200 300 400 500 600 700 800 900 1000 -100
0 500 1000 1500 2000 2500
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
0.01
1 1 1 1 0.04 0.04
0.5 0.5 0.5 0.5 0.02 0.02 0.005
0 0 0 0 0 0
0 500 1000 1500 0 500 1000 1500 0 50 100 150 0 50 100 150 0 10 20 30 0 10 20 30
0
Input MC C1 C2 C3 F
0 20 40 60 80 100 120
ED
(d):Outer
Fig.11 The whole evolution of signals of each healthy condition in MC-CNN
PT
0.8 0.8
0.6 0.6
Amplitude
Amplitude
CE
0.4 0.4
0.2 0.2
0 0
AC
0 50 100 0 50 100
Ball Inner
0.6 0.03
0.55
0.5 0.02
Amplitude
Amplitude
0.45
0.4 0.01
0.35
0
0 50 100 0 50 100
Normal Outer
Fig.12 Full-connected layer features of each signals based on MC-CNN
32
ACCEPTED MANUSCRIPT
60 40
Mapped feature 2
40
Mapped feature 2
20
20
0
0
-20 -20
-40 -40
-40 -20 0 20 40 60 -20 -10 0 10 20 30
CNN-S:0.418 CNN-R:0.437
60
T
Ball
Mapped feature 2
40
Inner
IP
20
Normal
0
Outer
CR
-20
-40
-100 -50 0 50 100
Mapped feature 1
MCCNN:0.236
Fig.13 The visualization and cluster error of the full-connected features for Dataset C
US
AN
4 4
200 200
2 2 100 100
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
M
-2 -2 -100 -100
-4 -4 -200 -200
0 500 1000 0 500 1000 0 500 1000 1500 2000 0 500 1000 1500 2000
ED
4 10 200 1000
2 5 100 500
Amplitude
Amplitude
Amplitude
Amplitude
0 0 0 0
-2 -5 -100 -500
PT
-4 -10 -200 -1000

0 500 1000 0 500 1000 0 500 1000 1500 2000 0 500 1000 1500 2000
Fig.14. Original signals (a) and MC signals (b) with SNR value of 4 dB
CE
0.2 0.4 10 20
0.15 0.3 15
Amplitude
Amplitude
Amplitude
Amplitude
AC
0.1 0.2 5 10
0.05 0.1 5
0 0 0 0
0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000
0.2 1 10 80
0.15 60
Amplitude
Amplitude
Amplitude
Amplitude
0.1 0.5 5 40
0.05 20
0 0 0 0
0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000 0 2000 4000 6000
Frequency(Hz) (a) Frequency(Hz) Frequency(Hz) (b) Frequency(Hz)
Normal Outer Normal Outer
Fig.15. Frequency domain of Raw signals (a) and MC signals (b) with SNR value of 4 dB
33
ACCEPTED MANUSCRIPT
20 4
Amplitude
Amplitude
0 2
-20 0
0 10 20 30 40 50 60 70 80 90 100 0 1000 2000 3000 4000 5000 6000
K1 K1
10
Amplitude
Amplitude
0 1
-10 0
0 20 40 60 80 100 120 140 160 180 200 0 1000 2000 3000 4000 5000 6000
K2 K2
10
Amplitude
Amplitude
0
1
-10
0 50 100 150 200 250 300 0
0 1000 2000 3000 4000 5000 6000
Kernel size:300
Frequency(Hz)
K3
K3
(a) Time domain
T
Fig.16 The learned cascaded kernels of noisy signals
IP
1 1 0.8 0.8
0.6 0.6
Amplitude
Amplitude
Amplitude
Amplitude
0.5 0.5 0.4 0.4
CR
0.2 0.2
0 0 0 0
0 50 100 0 50 100 0 50 100 0 50 100
Ball Inner Inner Inner
US
1 0.4
1 1
0.3
Amplitude
Amplitude
Amplitude
Amplitude
0.5 0.2
0.5 0.5
0.1
AN
0 0
0 50 100 0 50 100 0 0
0 50 100 0 50 100
Normal Outer Inner
(a) Four conditions (b) Four samples of Inner Inner
0.8 0.8 0.8 0.8

M
0.6 0.6 0.6 0.6

Amplitude
Amplitude
Amplitude
Amplitude
0.4 0.4 0.4 0.4
0.2 0.2 0.2 0.2

ED
0 0 0 0
0 50 100 0 50 100 0 50 100 0 50 100
0.8 0.8 1 1
0.6 0.6
PT Amplitude
Amplitude
Amplitude
Amplitude
0.4 0.4 0.5 0.5
0.2 0.2
0 0 0 0
CE
0 50 100 0 50 100 0 50 100 0 50 100

(c) Four samples of Ball (d) Four samples of Normal
samples of No
Fig.17 Full-connected layer features of each condition signals with SNR value of 4 dB
AC
34
ACCEPTED MANUSCRIPT

SNR(dB) 0 2 4 6
MC-CNN 99.61±0.11 97.33±0.23 95.71±0.17 91.33±0.08
CNN-S 95.32±0.38 89.54±0.27 83.69±0.41 74.54±0.62
CNN-R 88.13±0.46 83.17±0.39 71.82±0.42 62.27±0.35
T
Tab.6 Different scales of kernels in MC layers
IP
Kernels Scale 1 Scale 2 Scale 3
Ⅰ 50 100 150
CR
Ⅱ 100 200 300
Ⅲ 150 300 450
100%
US
AN
Scale 1
Scale 2
95 % Scale 3
M
90 %
85 %
ED
80 %
PT
75 %
CE
70 %
Scale 1 95.7±0.07 97.0±0.21 96.9±0.17

Scale 2 97.2±0.16 98.5±0.14 99.7±0.11
AC
Scale 3 96.3±0.31 98.2±0.18 98.5±0.12

Fig. 18. Testing accuracies of kernels with different scales in MC layer
35
ACCEPTED MANUSCRIPT
Fig.19 Suzhou University bearing fault simulation bench
T
IP
Operation Condition Loading(N) Speed (rpm)
CR
Ⅰ 0 900
Ⅱ 2000 900
Ⅲ 0 1200
Ⅳ
Ⅴ
Ⅵ
US
2000
0
2000
1200
1500
1500
AN
Normal Inner fault of 0.2mm
5
Amplitude
Amplitude
a/(ms -2)
200
a/(ms -2)
0 0
-200
-5
0 0.05 0.1 0 0.05 0.1
M
Time t/s Time t/s
Inner fault of 0.4mm Outer fault of 0.2mm

200 200
Amplitude
Amplitude
a/(ms -2)
a/(ms -2)
0 0
ED
-200 -200
0 0.05 0.1 0 0.05 0.1
Time t/s Time t/s
Outer fault of 0.3mm Rolling ball fault of 0.2mm

200 20
Amplitude
Amplitude
a/(ms -2)
a/(ms -2)
PT
0 0
-200 -20
0 0.05 0.1 0 0.05 0.1
Time t/s Time t/s
CE
Fig.20 Time-plot of different fault pattern signal under operation condition Ⅰ

0N+900rpm 0N+1200rpm
200 200
Amplitude
Amplitude
a/(ms -2)
a/(ms -2)
AC
0 0
-200 -200
0 0.05 0.1 0 0.05 0.1
Time t/s Time t/s
0N+1500rpm 2000N+900rpm
500 100
Amplitude
Amplitude
a/(ms -2)
a/(ms -2)
0 0
-500 -100
0 0.05 0.1 0 0.05 0.1
Time t/s Time t/s
2000N+1200rpm 2000N+1500rpm
100 500
Amplitude
Amplitude
a/(ms -2)
a/(ms -2)
0 0
-100 -500
0 0.05 0.1 0 0.05 0.1
Time t/s Time t/s
Fig.21 Time-plot of 0.4mm inner fault signal under different operation condition
36
ACCEPTED MANUSCRIPT
90%
MC-CNN-S
85% CNN-S
CNN-R
80%
75%
70%
65%
T
60%
IP
55%
CR
50%
MC-CNN-S 75.0±0.53 81.4±0.46 83.2±0.47

CNN-S 56.8±0.65 62.2±0.42 64.1±0.41
CNN-R 52.1±0.71
US
67.7±0.50 65.9±0.57
Fig. 22. Testing accuracies of the three nonstationary datasets using the three methods
AN
M
ED
PT
CE
AC
37

J Neucom 2019 05 052

Uploaded by

Copyright:

Available Formats

J Neucom 2019 05 052

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

J Neucom 2019 05 052

Uploaded by

Copyright:

Available Formats

Accepted Manuscript

An improved deep convolutional neural network with multi-scale

Wenyi Huang , Junsheng Cheng , Yu Yang , Gaoyuan Guo

To appear in: Neurocomputing

Received date: 24 January 2019

An improved deep convolutional neural network with multi-scale

information for bearing fault diagnosis

application of MC-CNN in bearing fault diagnosis under nonstationary working conditions.

been paid much attention by researchers.

(a) w1 (b) Max pooling

xl  down( xl 1, s) (2)

2.3 Proposed of MC-CNN

Fig.2 The architecture of MC-CNN

2.4 Training of MC-CNN

(2) Full-connected layer

 i ,l  (wl 1 )T  i ,l 1   ' ( zi ,l ) (9)

(3) Convolutional layer

 i ,l   i ,l 1  rot180(wl 1 )   ' ( z i ,l ) (10)

(4) Pooling layer

 i ,l  upsample( i ,l 1 )   ' ( zi ,l ) (11)

(2) Full-connected layer

(3) Convolutional layer

Training Sample Testing Sample

MC-CNN Model Construction

Bearing Fault Diagnosis

Tab.2 The details of the architecture of MC-CNN

Input layer / / / 985×1

C1 Kernels 8×1×8 2 1176×8

P2 Max pooling 2 / 70×8

Tab.3 The details of the architecture of CNN

Layer Parameter name Parameter size Stride Output size

MC-CNN-S 97.2±0.16 98.5±0.14 99.7±0.11

3.2 The details of MC-CNN

convolution operator in time domain is equal to product in frequency domain.

0.08 0.8 0.8 30

0.06 0.6 0.6

0.04 0.4 0.4

-0.5 -0.5 -0.2 -0.2

-0.5 -0.5 -0.2 -0.1

-1 -0.5 -0.2 -0.2

Fig.10. The learned kernels of convolutional layers in MC-CNN

lowest among the three models.

0.5 0.5 0.5 0.5 0.5 0.5

0.5 0.5 0.5 0.5

0.5 0.5 0.5 0.5 0.5 0.5

0.5 0.5 0.5 0.5 0.5 0.5 0

0.5 0.5 0.5 0.5 0.02 0.02

0.5 0.5 0.5 0.5 0.02 0.02 0.005

Fig.11 The whole evolution of signals of each healthy condition in MC-CNN

3.3 Diagnosis results and performance comparison under noise condition US

-4 -10 -200 -1000

0 10 20 30 40 50 60 70 80 90 100 0 1000 2000 3000 4000 5000 6000

Fig.16 The learned cascaded kernels of noisy signals

0.6 0.6 0.6 0.6

0.2 0.2 0.2 0.2

0.4 0.4 0.5 0.5

Tab.5 Diagnosis accuracies of the three models under different SNR

MC-CNN 99.61±0.11 97.33±0.23 95.71±0.17 91.33±0.08