
Roller Bearing Fault Detection with DCT Signal Processing and Machine Learning Classifier

ABSTRACT – Condition monitoring significantly reduces the number of unexpected machine failures and creates a strong defense against sudden breakdowns. Because of the intricate structure and operating conditions of mechanical systems, bearing vibration data often exhibit nonlinear and non-stationary time-varying features. These non-stationary time-varying properties are difficult to characterize fully with a study conducted entirely in the time or frequency domain, which is based primarily on stationary signals. A fault within a bearing affects the machine's performance and the production process, so detection and diagnosis of bearing faults are essential in industry. Time-frequency analysis efficiently reveals the machinery's health condition when signals are non-stationary, and time-frequency-based analysis carries a significant amount of sensitive fault information about bearings, which can be inferred from various features. In this work, we propose a Discrete Cosine Transform (DCT) based fault diagnosis technique; the DCT decomposes a signal into cosine functions at different frequencies. Statistical features are ranked using the ReliefF feature ranking technique, and results are obtained for a wide range of machine learning classifiers. The results reveal that decision tree classifiers diagnose bearing faults most accurately; the other classifiers also achieve good tenfold accuracy.
Keywords: Condition monitoring; Time-frequency analysis; DCT; ReliefF; Decision tree.

1. Introduction
Rolling bearings are among the essential components in most machinery applications. Despite careful design, manufacturing, and testing, it is observed that bearings often do not attain their calculated service life, and premature bearing failure can occur for a variety of reasons [1]. Bearing failures generally cause financial losses due to lost production and consequential damage to adjacent parts [2]. To detect problems in bearings, or deviations from typical operation, vibration monitoring has been frequently deployed for defect diagnosis of rotating machinery [3].
Time- and frequency-domain fault features are essential parameters that reveal information about healthy and defective bearings through their variation. Widely used features are the mean, median, mode, RMS, standard deviation, variance, kurtosis, skewness, energy, etc. [4], [5]. Many researchers have utilized various signal processing techniques for bearing fault diagnosis, among them ensemble empirical mode decomposition (EEMD) and fuzzy c-means [6]. Another study proposed a methodology to diagnose bearing faults using the Hilbert transform and the fast Fourier transform [7]. Wavelet transform-based methods are mainly used for time-frequency signal processing. General and hybrid approaches are also considered for diagnosis purposes, and results obtained through numerical and experimental simulation have demonstrated their effectiveness. The Walsh-Hadamard transform requires less storage space and less computation time for analysis. The DCT is a popular technique for analyzing signals in the frequency domain. It is an excellent classical compression procedure that yields a low mean squared error (MSE); consequently, data compression through the discrete cosine transform is quite common [8]. As it works almost as well as the theoretically ideal Karhunen-Loeve transform for a wide range of signals, the DCT is frequently employed in block signal coding. The DCT concentrates highly correlated data, transforms it into the frequency domain, and exhibits good autocorrelation performance within the signal. It compresses the matrix data according to the statistical features of the signals in the frequency range [9].
Feature extraction and selection is a critical stage that affects diagnosis accuracy. Various authors have proposed feature selection criteria for obtaining better results with less computation time; different criteria that achieve better accuracy with fewer features for diagnosing various bearing faults are described in [10]. In this work, statistical features were computed after decomposing the signals and afterward ranked using the ReliefF score across the various fault conditions recorded in the CWRU data set. Learning from existing and past data makes machine learning (ML) algorithms effective for predicting and classifying various bearing faults; ML algorithms help analyze the fault patterns in the measured signal [11]. Various authors have successfully applied different ML algorithms to detect and diagnose faults and their severity levels. Driven by the above facts, the key contributions of the proposed methodology are:
1 The CWRU bearing test rig was used to collect signals under various bearing conditions (healthy, IRD, ORD, and BD) at different speeds.
2 Essential features are extracted and analyzed with the 1-D DCT signal processing method.
3 ReliefF is used for feature ranking.
4 The feature subsets are evaluated by tenfold accuracy using machine learning classifiers.
The paper is organized as follows: Section 2 briefly describes the Discrete Cosine Transform, Section 3 discusses the ReliefF feature selection method, Section 4 introduces machine learning techniques, Section 5 focuses on the experimental procedure, Section 6 highlights the results and discussion, and lastly, Section 7 presents the conclusion. Fig. 1 displays the flow chart of the approach used in our study.

Fig. 1 Flow chart of the process: vibration signals → DCT signal processing method → ReliefF feature ranking → machine learning methods → bearing fault diagnosis.

2. Discrete Cosine Transform (DCT)


A popular transformation method in signal processing and data compression is the DCT, which Nasir Ahmed first suggested in 1972 [12]. The DCT is related to the Fourier transform and decomposes a signal into cosine functions. It operates on real data points and possesses orthogonality and symmetry, in contrast to the DFT, which also contains sine terms. Due to its better energy compaction characteristics, it is mainly used in signal processing, image processing, digital media, and audio. Eight main types of DCT have been studied by researchers; of these, the 1-D DCT is utilized here for bearing vibration signal analysis. It can concentrate a large amount of a signal's information in a few low-frequency components, requiring much less storage space and computation time. If we consider $x$ as an input vector of $N$ samples, then its DCT $y$ can be defined as
$$y = D^{T} x$$
where the elements of the matrix $D$ are
$$D(m, l) = \begin{cases} \sqrt{\dfrac{1}{N}}, & l = 0 \\[4pt] \sqrt{\dfrac{2}{N}}\,\cos\!\left(\dfrac{\pi (2m+1) l}{2N}\right), & 1 \le l \le N-1 \end{cases} \qquad 0 \le m \le N-1.$$
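As an illustration, the following is a minimal Python sketch of this transform, assuming SciPy is available; the orthonormal DCT-II (`norm='ortho'`) corresponds to the matrix $D$ defined above, and the signal here is only a stand-in for a measured vibration window.

```python
import numpy as np
from scipy.fft import dct

# Stand-in vibration segment of N samples (a real window would come from the data set).
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)

# Orthonormal DCT-II, equivalent to y = D^T x with the matrix D above.
y = dct(x, type=2, norm='ortho')

# Energy compaction: fraction of total energy held by the first 10% of coefficients.
k = len(y) // 10
print(np.sum(y[:k] ** 2) / np.sum(y ** 2))
```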

3. Feature selection and ranking using ReliefF


To minimize complexity without degrading the information content, feature selection is essential for choosing a minimal subset of the features from the initial feature set. Which features must be kept and which must be eliminated depends on the technique used. An attribute selection tool should measure the relevance of each attribute in a feature set with respect to the class labels, so that irrelevant attributes can be removed [13]. Feature-ranking procedures include dimension reduction, information gain, Fisher score, and ReliefF. In addition to decreasing dimensionality, feature-ranking algorithms improve feature separability while preserving the necessary information. Robnik-Šikonja and Kononenko [14] used ReliefF as a feature subset selection method in their study and found that it is a powerful attribute estimator applicable to several classification problems. ReliefF calculates the weight $w_i$ of a feature from the feature set $X_i$. Let $NH_i$ and $NM_i$ denote the nearest hit and nearest miss, i.e., the closest instance from the same class and from a different class, respectively. The weight can be computed as [15]
$$w_i \leftarrow w_i + \varepsilon_0 \left| X_i - NM_i \right| - \varepsilon_1 \left| X_i - NH_i \right|$$
The weight of a feature thus depends on how that feature behaves in nearby instances of the same and different classes. ReliefF uses $\varepsilon_0 = \varepsilon_1 = 1$, meaning that within-class preservation and between-class discrepancy are weighted equally.
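The update above can be sketched in a few lines of NumPy; this is a simplified version assuming numeric features scaled to [0, 1] and a single nearest hit and miss per sampled instance (the full ReliefF of [14] averages over k neighbors):

```python
import numpy as np

def relieff_weights(X, y, n_iter=100, seed=0):
    """Simplified ReliefF sketch: one nearest hit/miss, eps0 = eps1 = 1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)  # distance to every instance
        dist[i] = np.inf                     # exclude the sampled instance itself
        same, other = y == y[i], y != y[i]
        nh = X[np.where(same)[0][np.argmin(dist[same])]]    # nearest hit
        nm = X[np.where(other)[0][np.argmin(dist[other])]]  # nearest miss
        w += np.abs(X[i] - nm) - np.abs(X[i] - nh)
    return w / n_iter
```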
Table 1. Feature extraction using DCT

| Feature | Formula / Description |
|---|---|
| Average | $\text{Average} = (a_1 + a_2 + a_3 + \dots + a_n)/n$ |
| Kurtosis | $\text{Kurtosis} = \mu_4 / \sigma^4$ |
| Skewness | Measures the asymmetry of the probability distribution of the data |
| Covariance | $\text{Covariance} = \sum (x_i - \bar{x})(y_i - \bar{y}) / (N-1)$ |
| Standard deviation | Measures the variation of the data about the average value |
| Variance | $\text{Variance} = \sum (x_i - \bar{x})^2 / (N-1)$ |
| Root mean square level | $\text{RMS} = \sqrt{\sum x_i^2 / n}$ |
| Peak2Peak | The difference between the highest and lowest data points |
| Peak2RMS | The ratio of the peak value to the RMS value |
| Root sum of squares level | The square root of the sum of the squared data values |
| Signal-to-noise ratio | $\text{SNR} = 20 \log_{10}(S/N)$ |
| Mean frequency | Estimates the central tendency of the power distribution |
| Median frequency | The frequency that divides the power spectrum into two halves of equal power |
| Band power | The average value of the power spectral density |
| Crest factor | $\text{C.F.} = \text{Peak} / \text{RMS}$ |
| Form factor | $\text{F.F.} = \text{Peak-to-peak} / \text{Average}$ |
| Shape factor | $\text{S.F.} = \text{RMS} / \text{Average}$ |
| L factor | $\text{L.F.} = \text{Peak} / \text{Root sum of squares}$ |
| Shannon entropy | Measures the uncertainty in the data |
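For concreteness, a short sketch of how several of the Table 1 statistics might be computed from a vector of DCT coefficients in Python (the function name and the feature subset are illustrative):

```python
import numpy as np
from scipy.stats import kurtosis, skew

def extract_features(y):
    """Compute a subset of the Table 1 statistics from DCT coefficients y."""
    rms = np.sqrt(np.mean(y ** 2))
    peak = np.max(np.abs(y))
    return {
        "average": np.mean(y),
        "kurtosis": kurtosis(y, fisher=False),  # mu_4 / sigma^4
        "skewness": skew(y),
        "variance": np.var(y, ddof=1),
        "rms": rms,
        "peak2peak": np.ptp(y),
        "crest_factor": peak / rms,
        "shape_factor": rms / np.mean(np.abs(y)),  # mean of |y| avoids a near-zero denominator
    }
```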

Fig. 2 ReliefF feature ranking (ReliefF score plotted against feature name).

4. Machine Learning

Machine learning (ML) refers to the capability of a statistical algorithm to enhance its performance with experience; it is an artificial intelligence (AI) approach that mimics human learning. There are three main types of ML: supervised, unsupervised, and reinforcement learning.
4.1 Decision Tree

In supervised machine learning, a decision tree classifier can handle both classification and regression problems. It works on a tree structure in which classes and features are the leaves and branches of the tree. It is one of the simplest algorithms and is used for data analysis in various applications [16].
4.2 Discriminant Analysis

Discriminant analysis statistically separates two data sets. Linear discriminant analysis (LDA) is one of the most popular methods. It is closely related to regression analysis techniques such as ANOVA: ANOVA uses categorical independent variables, while discriminant analysis uses continuous ones [17].
4.3 Naive Bayes

Naive Bayes is a Bayesian statistical method that models the relationship between the class label and the feature vector of an instance. It assumes that the value of a particular feature is independent of the value of any other feature. It can be used for real-time data prediction [18].
4.4 Support vector machine

In the last few decades, the support vector machine has emerged as an effective supervised learning method for classification and regression [19]. SVMs have shown strong prediction capabilities, even in studies with small sample sizes, because of their relative simplicity and versatility in tackling various classification problems. In general, two classes are separated by a hyperplane, and different kernel functions can be used. The support vectors are the data points nearest the hyperplane and determine its position. Further detail on SVMs can be found in the literature [20].
4.5 K Nearest Neighbor (KNN)

K-nearest neighbor (KNN) is a non-parametric supervised learning method for classification and regression. KNN categorizes instances by the similarity between their features. With the help of the extracted features, the feature vector is analyzed using the Euclidean distance to find the closest neighbors and compute the output class [21]. The Euclidean distance used to determine the closest neighbors is

$$\text{Euclidean}(A, B) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
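A minimal sketch of the KNN decision rule built on this distance (illustrative only, not the paper's implementation):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k Euclidean-nearest neighbors."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distance to each training point
    nearest = np.argsort(d)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]
```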

4.6 Ensemble Classifier

Bagging is an ensemble learning method that reduces variance within a dataset. Rather than depending on a single tree, the bagged tree relies on many decision trees. Decision trees are unstable, meaning that an insignificant variation in the records can lead to a significant change in the structure of the optimal decision tree [22]. Bagging used with decision trees significantly increases model stability, improving accuracy, reducing variance, and mitigating the challenge of overfitting.
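To make the comparison concrete, the following sketch shows how the classifier families of Sections 4.1-4.6 might be evaluated under tenfold cross-validation using scikit-learn; this is an assumption for illustration (the paper does not name its software, and the classifier names suggest MATLAB's Classification Learner, which offers equivalent models):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

classifiers = {
    "Decision tree": DecisionTreeClassifier(),
    "Linear discriminant": LinearDiscriminantAnalysis(),
    "Gaussian naive Bayes": GaussianNB(),
    "Cubic SVM": SVC(kernel="poly", degree=3),        # analogue of "cubic SVM"
    "Fine KNN": KNeighborsClassifier(n_neighbors=1),  # analogue of "fine KNN"
    "Bagged trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=30),
}

def compare(X, y):
    """X: feature matrix (instances x ranked features); y: fault labels (HB/IRD/ORD/BD)."""
    for name, clf in classifiers.items():
        acc = cross_val_score(clf, X, y, cv=10)  # tenfold cross-validation
        print(f"{name}: {acc.mean():.3f}")
```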

5. Experimental setup
Case Western Reserve University (CWRU) generated all the non-stationary vibration signals corresponding to the different bearing defect states [23]. As shown in Fig. 3, vibration signals are captured by an accelerometer at the drive end of the motor, which is connected through a coupling to a dynamometer. Four different bearing conditions are considered at motor speeds of 1720, 1750, 1772, and 1797 RPM. The motor load varies from 1 hp to 3 hp. The sampling rate is 12 kHz, and each vibration signal lasts 10 seconds. This study used a 6205-2RSL JEM SKF deep groove ball bearing with the dimensions listed in Table 2.
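For readers reproducing the setup, a hedged sketch of loading one CWRU record and segmenting it into windows follows; the file name and variable key below follow CWRU's published conventions but are assumptions to be checked against the actual download:

```python
import numpy as np
from scipy.io import loadmat

# Hypothetical CWRU record; drive-end channels use keys of the form "X097_DE_time".
mat = loadmat("97.mat")                 # e.g., healthy bearing record at 1797 RPM
signal = mat["X097_DE_time"].squeeze()  # drive-end accelerometer signal

# Split the 12 kHz record into fixed-length windows for DCT feature extraction.
win = 2048
segments = np.stack([signal[i:i + win]
                     for i in range(0, len(signal) - win + 1, win)])
```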

Fig. 3 CWRU Bearing test rig.


Table 2 CWRU Bearing specifications.
| Class of bearing | Outer race diameter (mm) | Inner race diameter (mm) | Ball size (mm) | Number of balls |
|---|---|---|---|---|
| 6205 (SKF) | 51.99 | 25.01 | 7.94 | 9 |

Fig. 4 CWRU bearing conditions for analysis (share of cases: healthy bearing 6%, ball fault 25%, inner race fault 25%, outer race fault 44%).

Fig. 5 Sample vibration signals for the bearing conditions.
6. Results and discussion

This investigation extracted features from the CWRU bearing dataset using the DCT signal processing method. Sixty-four vibration signals from CWRU, representing the fault conditions HB, IRD, ORD, and BD, were used for the study. Several statistical features were extracted from each condition and ranked with the ReliefF score, since not every statistical feature carries relevant information about the various fault conditions. Classifiers such as SVM and KNN are trained on the feature vectors, and tenfold cross-validation is performed: the instances are divided into ten equal sets, and ten iterations are carried out. Confusion matrices were obtained from the tenfold procedure for the considered classification algorithms. Table 3 shows the comparison of results attained by all the machine learning classifiers. It can be seen that 100% tenfold accuracy was reached by all the decision tree classifiers for the HB-BD-IRD ternary fault condition. For discriminant analysis, 83.3% tenfold accuracy was achieved for the HB-ORD-BD bearing conditions. Among the naïve Bayes classifiers, the kernel naïve Bayes classifier reaches 91.7% accuracy. Among the SVMs, the cubic SVM gives 86.1% tenfold accuracy for the HB-BD-IRD bearing conditions. Similarly, fine KNN reaches 88.9%, close to the SVM. Finally, among the ensemble classifiers, the bagged trees obtain good results for all three ternary conditions and the quaternary condition.
Table 3 Tenfold accuracy (%).

| Sr. No | Type | Classifier | HB_BD_IRD | HB_IRD_ORD | HB_ORD_BD | HB_BD_IRD_ORD |
|---|---|---|---|---|---|---|
| 1 | Decision Trees | Fine Tree | 100 | 70.8 | 77.1 | 71.9 |
| 2 | Decision Trees | Medium Tree | 100 | 70.8 | 77.1 | 71.9 |
| 3 | Decision Trees | Coarse Tree | 100 | 72.9 | 77.1 | 59.4 |
| 4 | Discriminant Analysis | Linear Discriminant | 69.4 | 56.3 | 83.3 | 59.4 |
| 5 | Naïve Bayes | Gaussian Naïve Bayes | 69.4 | 62.5 | 77.1 | 57.8 |
| 6 | Naïve Bayes | Kernel Naïve Bayes | 91.7 | 68.8 | 81.3 | 64.1 |
| 7 | Support Vector Machine | Linear SVM | 69.4 | 68.8 | 70.8 | 51.6 |
| 8 | Support Vector Machine | Quadratic SVM | 80.6 | 62.5 | 83.3 | 53.1 |
| 9 | Support Vector Machine | Cubic SVM | 86.1 | 66.7 | 77.3 | 54.7 |
| 10 | Support Vector Machine | Fine Gaussian SVM | 63.9 | 58.3 | 58.3 | 43.8 |
| 11 | Support Vector Machine | Medium Gaussian SVM | 72.2 | 62.5 | 75 | 53.1 |
| 12 | Support Vector Machine | Coarse Gaussian SVM | 52.8 | 58.3 | 66.7 | 48.8 |
| 13 | Nearest Neighbour Classifier | Fine KNN | 88.9 | 77.1 | 83.3 | 71.9 |
| 14 | Nearest Neighbour Classifier | Medium KNN | 50 | 54.2 | 56.3 | 35.9 |
| 15 | Nearest Neighbour Classifier | Coarse KNN | 44.4 | 58.3 | 58.3 | 43.8 |
| 16 | Nearest Neighbour Classifier | Cosine KNN | 50 | 56.3 | 56.3 | 43.8 |
| 17 | Nearest Neighbour Classifier | Cubic KNN | 47.2 | 45.8 | 56.3 | 32.8 |
| 18 | Nearest Neighbour Classifier | Weighted KNN | 69.4 | 64.6 | 66.7 | 54.7 |
| 19 | Ensemble Classifier | Boosted Trees | 44.4 | 58.3 | 58.3 | 43.8 |
| 20 | Ensemble Classifier | Bagged Trees | 97.2 | 79.2 | 97.9 | 87.5 |
| 21 | Ensemble Classifier | Subspace Discriminant | 80.6 | 66.7 | 79.2 | 60.9 |
| 22 | Ensemble Classifier | Subspace KNN | 86.1 | 66.7 | 72.9 | 53.1 |
| 23 | Ensemble Classifier | RUS Boosted Trees | 86.1 | 54.2 | 75 | 54.7 |

Fig. 6 Confusion matrices: (a) HB-BD-IRD, (b) HB-IRD-ORD, (c) HB-ORD-BD, and (d) HB-BD-IRD-ORD.

Further, to assess the class-wise precision and usefulness of all the ML algorithms, a confusion matrix is an effective way to summarize individual accuracies in tabular form, particularly when errors are frequent. In the confusion matrix, the diagonal elements show correct predictions, whereas mispredicted cases are highlighted in red, as shown in Fig. 6. For the fine tree classifier, all instances are predicted correctly in the HB-BD-IRD case. For HB-IRD-ORD, ten cases are incorrectly classified, and for HB-ORD-BD, one instance is misclassified. Lastly, considering all 64 cases, 8 cases are incorrectly identified by the given classifiers. It should also be noted that all instances of ball defects and healthy bearings are reliably recognized for CWRU using decision tree classifiers. The tenfold cross-validation accuracy demonstrates the recommended methodology's effectiveness at locating defects.
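A confusion matrix of this kind can be assembled, for example, by pooling the predictions from the ten folds; the sketch below again assumes scikit-learn and reuses the feature matrix X and labels y from the earlier sketches:

```python
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier

# Pool out-of-fold predictions across the ten folds, then tabulate them.
y_pred = cross_val_predict(DecisionTreeClassifier(), X, y, cv=10)
print(confusion_matrix(y, y_pred))  # diagonal entries are correct predictions
```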

7. Conclusion
In this work, the authors applied the Discrete Cosine Transform to diagnose various fault conditions in the CWRU data set. Several statistical features were extracted and ranked with the ReliefF score to identify fault conditions efficiently. Twenty-three classifiers are compared across the different bearing fault conditions using tenfold accuracy. The results can be summarized as follows:
1 Maximum tenfold accuracy was observed at the eighth and seventeenth features with the HB-BD-IRD conditions on the given dataset.
2 A maximum accuracy of 83.3% is observed for linear discriminant analysis with all 19 features extracted from the CWRU dataset.
3 A maximum accuracy of 91.7% is observed for the naïve Bayes classifier with all 19 features extracted from the CWRU dataset.
4 A maximum accuracy of 86.1% is observed for SVM with all 19 features extracted from the CWRU dataset.
5 A maximum accuracy of 88.9% is observed for KNN with all 19 features extracted from the CWRU dataset.
6 The ensemble classifier reaches a maximum accuracy of 97.2% with all 19 features extracted from the CWRU dataset.

REFERENCES
[1] Z. Feng, M. Liang, and F. Chu, "Recent advances in time-frequency analysis methods for machinery fault
diagnosis: A review with application examples," Mech. Syst. Signal Process., vol. 38, no. 1, pp. 165–205,
2013.
[2] S. A. McInerny and Y. Dai, "Basic vibration signal processing for bearing fault detection," IEEE Trans.
Educ., vol. 46, no. 1, pp. 149–156, Feb. 2003.
[3] N. Tandon and A. Choudhury, "A review of vibration and acoustic measurement methods for the detection of defects in rolling element bearings," Tribol. Int., vol. 32, no. 8, pp. 469–480, 1999.
[4] P. K. Kankar, S. C. Sharma, and S. P. Harsha, "Vibration-based fault diagnosis of a rotor-bearing system
using artificial neural network and support vector machine," Int. J. Model. Identif. Control, vol. 15, no. 3, pp.
185–198, 2012.
[5] A. Rai and S. H. Upadhyay, "A review on signal processing techniques utilized in the fault diagnosis of
rolling element bearings," Tribol. Int., vol. 96, pp. 289–306, Apr. 2016.
[6] V. Dave, S. Singh, and V. Vakharia, "Diagnosis of bearing faults using multi-fusion signal processing
techniques and mutual information," Indian J. Eng. Mater. Sci., vol. 27, no. 4, pp. 878–888, 2020.
[7] T. Bettahar, C. Rahmoune, D. Benazzouz, and B. Merainani, "New method for gear fault diagnosis using
empirical wavelet transform, Hilbert transform, and cosine similarity metric," Adv. Mech. Eng., vol. 12, no. 6,
2020.
[8] K. Bhakta, N. Sikder, A. Al Nahid, and M. M. M. Islam, "Rotating Element Bearing Fault Diagnosis Using
Discrete Cosine Transform and Supervised Machine Learning Algorithm," 5th Int. Conf. Comput. Commun.
Chem. Mater. Electron. Eng. IC4ME2 2019, pp. 11–12, 2019.
[9] J. Li, H. Wang, L. Song, and L. Cui, "A novel feature extraction method for roller bearing using sparse
decomposition based on self-Adaptive complete dictionary," Meas. J. Int. Meas. Confed., vol. 148, p.
106934, 2019.
[10] K. Kappaganthu and C. Nataraj, "Feature selection for fault detection in rolling element bearings using
mutual information," J. Vib. Acoust. Trans. ASME, vol. 133, no. 6, pp. 1–11, 2011, doi: 10.1115/1.4003400.
[11] P. K. Kankar, S. C. Sharma, and S. P. Harsha, "Fault diagnosis of ball bearings using machine learning
methods," Expert Syst. Appl., vol. 38, no. 3, pp. 1876–1886, 2011.
[12] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete Cosine Transform," IEEE Trans. Comput., vol. C-23, no. 1, pp. 90–93, Jan. 1974.
[13] T. W. Rauber, F. De Assis Boldt, and F. M. Varejão, "Heterogeneous feature models and feature selection
applied to bearing fault diagnosis," IEEE Trans. Ind. Electron., vol. 62, no. 1, pp. 637–646, 2015.
[14] M. Robnik-Šikonja and I. Kononenko, "Theoretical and Empirical Analysis of ReliefF and RReliefF," Mach. Learn., vol. 53, no. 1–2, pp. 23–69, 2003.
[15] V. Vakharia, V. K. Gupta, and P. K. Kankar, "Efficient fault diagnosis of ball bearing using ReliefF and
Random Forest classifier," J. Brazilian Soc. Mech. Sci. Eng., vol. 39, no. 8, pp. 2969–2982, Aug. 2017.
[16] V. Sugumaran and K. I. Ramachandran, "Automatic rule learning using decision tree for fuzzy classifier in
fault diagnosis of roller bearing," Mech. Syst. Signal Process., vol. 21, no. 5, pp. 2237–2247, 2007.
[17] R. Z. Haddad and E. G. Strangas, "Fault detection and classification in permanent magnet synchronous
machines using Fast Fourier Transform and Linear Discriminant Analysis," Proc. - 2013 9th IEEE Int. Symp.
Diagnostics Electr. Mach. Power Electron. Drives, Sdemped 2013, pp. 99–104, 2013.
[18] V. Muralidharan and V. Sugumaran, "A comparative study of Naïve Bayes classifier and Bayes net classifier
for fault diagnosis of monoblock centrifugal pump using wavelet analysis," Appl. Soft Comput. J., vol. 12, no.
8, pp. 2023–2029, 2012.
[19] R. Liu, B. Yang, E. Zio, and X. Chen, "Artificial intelligence for fault diagnosis of rotating machinery: A
review," Mechanical Systems and Signal Processing, vol. 108. Academic Press, pp. 33–47, Aug. 01, 2018.
[20] A. Widodo and B. S. Yang, "Support vector machine in machine condition monitoring and fault diagnosis,"
Mech. Syst. Signal Process., vol. 21, no. 6, pp. 2560–2574, 2007.
[21] A. Moosavian, H. Ahmadi, A. Tabatabaeefar, and M. Khazaee, "Comparison of two classifiers; K-nearest
neighbor and artificial neural network, for fault diagnosis on a main engine journal-bearing," Shock Vib., vol.
20, no. 2, pp. 263–272, 2013.
[22] B. S. Yang, X. Di, and T. Han, "Random forests classifier for machine fault diagnosis," J. Mech. Sci.
Technol., vol. 22, no. 9, pp. 1716–1725, 2008.
[23] A. Boudiaf, A. Moussaoui, A. Dahane, and I. Atoui, "A Comparative Study of Various Methods of Bearing
Faults Diagnosis Using the Case Western Reserve University Data," J. Fail. Anal. Prev., vol. 16, no. 2, pp.
271–284, 2016.
