Comparison of Neural Network Architectures For Machinery Fault Diagnosis
GT2003-38450
N. Rieger
STI Technologies, Inc.
Rochester, New York 14623
mse = (1/Q) ∑_{k=1..Q} e(k)² = (1/Q) ∑_{k=1..Q} (t(k) − a(k))²    (6)

The LMS algorithm adjusts the weights and biases of the linear network to minimize this mean square error. The LMS rule is applied through the Widrow-Hoff algorithm, as follows:

W(k+1) = W(k) + 2αe(k)pᵀ(k)    (7)
b(k+1) = b(k) + 2αe(k)    (8)

Here the error e and the bias b are vectors, and α is the learning rate. If α is large, learning occurs quickly, but if it is too large it may lead to instability, and the errors may even increase. To ensure stable learning, the learning rate must be less than the reciprocal of the largest eigenvalue of the correlation matrix pᵀp of the input vectors.

… makes them suitable for back-propagation applications.
Once the network weights and biases have been initialized, the network is ready for training. The network can be trained for function approximation (nonlinear regression), pattern association, or pattern classification. The training process requires a set of examples of proper network behavior: network inputs p and target outputs t. During training, the weights and biases of the network are iteratively adjusted to minimize the network performance function. The most commonly used performance function for feed-forward networks is the mean square error (MSE), the average squared error between the network outputs a and the target outputs t.
The modified Levenberg-Marquardt algorithm [10] used here was designed to approach second-order training speed without having to compute the Hessian matrix, owing to the Hessian's large memory requirements; it uses the Jacobian matrix to approximate the Hessian matrix, while the standard Levenberg-Marquardt uses the Hessian matrix to accelerate convergence.
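As a concrete illustration, here is a minimal Python sketch of the Widrow-Hoff updates (7)-(8), including the stability bound on the learning rate noted above. NumPy, the function name, and the training-loop details are illustrative assumptions rather than anything specified in the paper:

import numpy as np

def lms_train(P, T, epochs=50):
    # Train a single-layer linear network with the Widrow-Hoff rule.
    # P: (Q, n_inputs) input vectors; T: (Q, n_outputs) target outputs.
    Q, n_in = P.shape
    W = np.zeros((T.shape[1], n_in))
    b = np.zeros(T.shape[1])
    # Stability: keep alpha below the reciprocal of the largest
    # eigenvalue of the input correlation matrix.
    alpha = 0.9 / np.linalg.eigvalsh(P.T @ P / Q).max()
    for _ in range(epochs):
        for k in range(Q):
            p = P[k]
            e = T[k] - (W @ p + b)           # error e(k) = t(k) - a(k)
            W += 2 * alpha * np.outer(e, p)  # Eq. (7)
            b += 2 * alpha * e               # Eq. (8)
    return W, b

For comparison, the Levenberg-Marquardt step discussed above takes the standard form W(k+1) = W(k) − (JᵀJ + μI)⁻¹Jᵀe, where J is the Jacobian of the network errors with respect to the weights and biases, μ is an adaptive damping parameter, and JᵀJ is the Jacobian-based approximation to the Hessian.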
Self-Organizing Networks
Self-organizing networks (SON) learn to classify input
vectors according to how they are grouped in the input space.
They differ from competitive layers in that neighboring
neurons in the self-organizing network learn to recognize
neighboring sections of the input space. Thus, self-organizing
networks learn both the distribution (as do competitive layers)
and topology of the input vectors they are trained on.
A self-organizing network learns to categorize input vectors. It also learns the distribution of input vectors: a SON allocates more neurons to recognize parts of the input space where many input vectors occur, and fewer neurons to parts of the input space where few input vectors occur. Self-organizing networks also learn the topology of their input vectors. Neurons next to each other in the network learn to respond to similar vectors; the layer of neurons can be imagined as a rubber net stretched over the regions of the input space where input vectors occur. Self-organizing networks allow neurons that are neighbors of the winning neuron to output values. Thus, the transition between output vectors is much smoother than that obtained with competitive layers, where only one neuron has an output at a time. Here a self-organizing network identifies a winning neuron i* using the same procedure as employed by a competitive layer [10]. However, instead of updating only the winning neuron, all neurons within a certain neighborhood N_i*(d) of the winning neuron are updated using the Kohonen rule. Specifically, we adjust all such neurons i ∈ N_i*(d) as follows:

w_i(q) = w_i(q−1) + α(p(q) − w_i(q−1))    (12)
or, equivalently, w_i(q) = (1−α)w_i(q−1) + αp(q)    (13)

Figure 4 SON Architecture

Learning Vector Quantization Networks
LVQ networks classify input vectors into target classes by using a competitive layer to find subclasses of input vectors, and then combining them into the target classes. Unlike perceptrons, LVQ networks can classify any set of input vectors, not just linearly separable sets. The only requirement is that the competitive layer must have enough neurons, and each class must be assigned enough competitive neurons.
To ensure that each class is assigned an appropriate number of competitive neurons, it is important that the target vectors used to initialize the LVQ network have the same distribution of targets as the training data the network is trained on. If this is done, target classes with more vectors will be the union of more subclasses.
Learning vector quantization (LVQ) is a method for training competitive layers in a supervised manner. A competitive layer automatically learns to classify input vectors. However, the classes that the competitive layer finds depend only on the distance between input vectors. If two input vectors are very similar, the competitive layer will probably put them in the same class. There is no mechanism in a strictly competitive layer design to say whether or not any two input vectors are in the same class or in different classes.
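To make the Kohonen rule concrete, here is a minimal Python sketch of Eq. (12) (equivalently Eq. (13)) for a one-dimensional grid of neurons. NumPy, the grid layout, and the radius-based neighborhood are illustrative assumptions, not details taken from the paper:

import numpy as np

def kohonen_step(W, p, alpha, d):
    # W: (n_neurons, n_inputs) weight matrix, one row per neuron.
    # Winning neuron i*: the neuron whose weight vector is closest to p.
    i_star = np.argmin(np.linalg.norm(W - p, axis=1))
    # Neighborhood N_i*(d): neurons within distance d of i* on the grid.
    neighbors = np.abs(np.arange(W.shape[0]) - i_star) <= d
    # Eq. (12): w_i(q) = w_i(q-1) + alpha * (p(q) - w_i(q-1)),
    # which is algebraically the same as Eq. (13).
    W[neighbors] += alpha * (p - W[neighbors])
    return W

In an LVQ network the analogous update becomes supervised: the winning neuron is moved toward p when its subclass belongs to the correct target class, and away from p when it does not.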
[Figure 8 Block Diagram of the required Signal Processing: conditioning amplifier → waveform acquisition (sampling) → FFT with phase unwrapping, giving amplitude and phase spectra → averaging → scaling → integration → saving to text file → amplitude spectrum display]

Basically, to obtain a spectrum, the steps shown in the block diagram in Figure 8 are to be implemented in the signal processing.

Figure 9 The test rig showing planted fault locations

No Fault: After setting up the test rig and the acquisition system, nine data files are collected with no faults planted. These files are regarded as the No-Fault (NF) class. Each measurement is performed while the rotor is running at 1800 RPM. This class is characterized by a small 1X amplitude and smaller amplitudes for all other components, as in the sample shown in Figure 10.
Mechanical Unbalance: A mechanical unbalance is introduced by tightening a 10 gm bolt to one of the discs close to the middle of the rotor. Again, nine readings are obtained. The recorded files are labeled as the (UN) class. It is characterized by a large 1X with smaller harmonics, as in the sample shown in Figure 11.
Structural Looseness: In order to introduce a suitable looseness in a safe way, two bolts of the test rig frame are loosened. The accelerometer was held radially on the same side as the loosened frame bolts, as shown in Figure 9. Nine more data files are recorded here and have been labeled as the (SL) class. It is characterized by a large 1X with possibly larger, then decreasing, harmonics, as in the sample shown in Figure 12.
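As an illustration of the Figure 8 chain, the following is a minimal Python sketch that segments a sampled waveform, averages the single-sided FFT amplitude spectra, and integrates from acceleration to velocity. NumPy, the number of averages, the scaling convention, and the output file name are placeholder assumptions, not values from the paper:

import numpy as np

def averaged_spectrum(x, fs, n_avg=8):
    # x: sampled waveform (after the conditioning amplifier); fs: sample rate in Hz.
    seg_len = len(x) // n_avg
    amps = []
    for i in range(n_avg):
        seg = x[i * seg_len:(i + 1) * seg_len]
        X = np.fft.rfft(seg)
        # Single-sided peak scaling (DC and Nyquist bins are approximate).
        amps.append(2.0 * np.abs(X) / seg_len)
        # Phase spectrum, if needed: np.unwrap(np.angle(X))
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    amp = np.mean(amps, axis=0)  # spectrum averaging
    # Integration (acceleration -> velocity) in the frequency domain:
    # divide each spectral line by omega = 2*pi*f, skipping the DC bin.
    vel = np.zeros_like(amp)
    vel[1:] = amp[1:] / (2 * np.pi * freqs[1:])
    np.savetxt("spectrum.txt", np.column_stack([freqs, amp, vel]))  # save to text file
    return freqs, amp, vel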
When the order of the data files is changed, the perceptron network fails to achieve the same performance. This is attributed to the original arrangement of the data files producing a trained network whose decision boundary causes misrecognition, owing to the linear inseparability of the data vectors.

CONCLUSION
A neural-based comparison was conducted between five different architectures concerning their capabilities to perform machinery fault diagnosis.
This comparison was demonstrated with the use of a desktop test rig, which was subjected to two different mechanical faults, mass unbalance and structural looseness, seeking to examine the capabilities of the different networks for diagnosing rotating machinery faults based on the characteristic vibration signatures and amplitude spectra of each fault, measured on the desktop test rig.
The ability of the different network architectures to recognize particular vibration signatures and correlate them with different test rig conditions was tested.
The five diagnostic network architectures succeeded, to degrees varying between 75% and 100%, in detecting the different test rig conditions in both the training and validation phases. Most of the networks detected the training data set with 100% success, except the feed-forward network, which reached 87% on the training data set and 75% on the validation set; it could reach better performance with more hidden neurons or more hidden layers.
On the validation data set, two networks, the perceptron and the LVQ, reached 100% correct fault diagnosis. The self-organizing network was particularly interesting because it needed only one data file per test rig condition for the training phase; even with this small amount of training data, it correctly diagnosed 84% of the validation data, which demonstrates its high capability for this task.