Abstract
Signal processing is an important paradigm in industrial engineering for fault detection and diagnosis. The various methods operate in the time, frequency, or time–frequency domain, processing state and output signals from the considered process. The objective of this work is to perform a comparative analysis of the most widely used methods based on the signal processing paradigm, in the context of fault detection and process diagnosis. Electromechanical equipment that generates mechanical vibrations, as an effect of bearing faults, is considered and analyzed. The recorded data are explored with short, sliding frames, adapted to the processing criteria used. Seven methods are evaluated: two in the time domain, two in the frequency domain, and three in the time–frequency domain. The main problem is to extract and select the right features to use in the classification stage. The time domain methods are based on statistical moments and signal modeling. The frequency domain methods use either the discrete components of power spectra or features of the frequency domain. In the time–frequency domain, the coefficients of time–frequency transforms define digital images, which are further processed. For testing, the methods are evaluated with real recorded data from bearings with several types and sizes of faults, i.e., incipient, medium, advanced, and large. Finally, the considered methods are compared from the point of view of five criteria, namely, the recognition rate, window length, response time, computational resources, and complexity of the algorithms. A global quality criterion is built and used to assess the quality of the methods. The results of the computer-based experiments show acceptable performance of all methods for the bearing test case, and the potential to detect more complex faults and behavioral changes in machines in general.
Time–frequency methods offer an optimal trade-off.
Article highlights
A practical comparative overview of the main signal processing-based methods used in process diagnosis and fault detection.
Comparing methods from different domains of representation: time, frequency, and time–frequency.
An example of a quality criterion for assessing signal processing methods for process diagnosis and fault detection.
Introduction
Change detection, fault detection and classification, and process diagnosis are important activities in industrial engineering, e.g., for condition monitoring, cost optimization, maintenance, and safety. A more general name for the above problem is change detection and diagnosis (CDD), which considers any change in the state of the process; CDD includes fault detection and diagnosis. The roots of the field come from [1, 2]. Depending on the imposed objectives and the computational resources, solutions either use the available data directly, which is called the data-driven approach, or seek and use specific models and patterns in the data, which is called the data-modelling or model-driven approach. Obviously, data-driven approaches do not require preliminary information about the data but can fail when the basic properties of the data, or of the source that generated them, change. Some examples and more details are available in [3, 4].
At another level, model-driven approaches are more sophisticated and need knowledge about the process under study in order to identify and use the necessary model. The two categories are sometimes used together, as described in [5, 6–7]. Within the model-driven category, two main model-based approaches are available: one based on the equations of the process and one based on signal processing transforms, as described, e.g., in [8, 9], the latter including machine learning and artificial intelligence techniques [10]. The first uses a process model; the second uses a model of the signal or of its generation process.
The present work focuses on the second approach, i.e., the signal modeling and processing paradigm. The signals could come from mechanical vibrations but could also represent electrical (e.g., current, voltage) or non-electrical variables (e.g., acoustic and ultrasonic waves). The methods can be used not only for industrial and physical processes but also in medicine and biology, where various kinds of signals are available. Statistical techniques are applied in all three domains of representation, i.e., time, frequency, and time–frequency. Current trends and results are obtained with advanced signal processing techniques and combinations of classic methods, e.g., multiscale transforms, information fusion, and machine learning [11, 12, 13–14].
Vibration and acoustic emissions are common effects of incipient and advanced machinery faults and are intensively used as signal sources, e.g., in [15, 16]. All signals coming from monitored machines/processes are preprocessed by analog-to-digital conversion (where applicable), low-pass or bandpass filtering, source separation, as in [17, 18], and beamforming [19]. The state of research on bearing faults based on vibration processing is well outlined by [20, 21–22].
The rationale of the work comes from the researchers' need for a guide and a reference source with practical examples showing the signal processing-based methods at work. Three main domains of signal processing, i.e., time (t), frequency (f), and time–frequency (t, f), are considered for the CDD problem. The end user will decide which one to implement, based on the specific requirements of the diagnosis and fault detection task, in conjunction with the process under study.
Figure 1 presents the general structure for change detection and diagnosis using a signal processing approach. Signals x(t) from the observed and measured process are input to the signal processing chain for segmentation and modeling, followed by the feature extraction and selection block, which prepares the input for classification, i.e., for decisions related to CDD. The considered signals are mechanical vibrations, but other types of signals could be processed, e.g., current or voltage signals. The signal processing level is drawn above the level of the process, which also means that the operation of the observed process is not affected or changed.
Fig. 1 [Images not available. See PDF.]
Change detection and process diagnosis structure using the signal processing paradigm
It is important to state the main hypothesis used in this work: the faults are assumed to be independent of one another. This may be somewhat removed from reality, because real physical faults may not satisfy this hypothesis. A general solution is blind source separation (BSS), applied before fault detection and process diagnosis; for each identified source, one or more of the discussed signal processing-based methods for fault detection could then be applied [23]. This is, however, a more complex scenario.
The three domains, i.e., time, frequency, and time–frequency, are briefly presented in the next three sections: Sect. 2 for the time domain, Sect. 3 for the frequency domain, and Sect. 4 for the time–frequency domain. In the time domain, statistical moments and signal modeling approaches are discussed. In the frequency domain, two methods for power spectrum estimation are presented and discussed. Section 4 addresses some specific data transforms and the structure of the CDD method based on time–frequency images. Each domain has some basic general features as well as specific ones. Section 5 describes the basic features used for CDD objectives. The results of the experiments using real data records are presented in Sect. 6, where the test data are described and analyzed, and a performance criterion is introduced and evaluated for the considered methods. Finally, the conclusion section presents the main qualitative results and some potential future steps.
Methods of the time domain
Time domain methods are the oldest in the field and use statistical moments of various orders and types, such as the average, median, mode, variance, standard deviation, peak values, RMS (root mean square), crest factor, skewness, and kurtosis coefficient (the basic standard equations are presented later). These are considered features. As an alternative, the data could be used for signal modeling, and the parameters of the model then serve as features for analysis and classification. The values and trends of the features are compared with reference values (which describe the normal working conditions). If the values of the features exceed the reference limits, then a fault is detected, as presented in Fig. 2.
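As a minimal sketch of this computation (not the authors' implementation; the chosen feature subset and the reference limits are illustrative assumptions):

```python
import numpy as np

def time_features(x):
    """Common time-domain features of one vibration frame."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    mu, sigma = x.mean(), x.std()
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,
        "skewness": np.mean(((x - mu) / sigma) ** 3),
        "kurtosis": np.mean(((x - mu) / sigma) ** 4),
    }

def detect_fault(features, limits):
    """Flag a fault when any feature exceeds its reference limit
    (the limits describe the normal working condition)."""
    return any(features[k] > limits[k] for k in limits)
```

For a zero-mean Gaussian frame the kurtosis is close to 3; a pronounced bearing fault typically raises it, which is one reason kurtosis is a common trigger feature.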
Fig. 2 [Images not available. See PDF.]
Computational structure for the time domain methods
Let x represent the data sequence (time series); real-valued signals are considered. There are various model structures, starting with the general ARMA (autoregressive moving average) model and continuing with its particular cases, AR (autoregressive) and MA (moving average). For example, the ARMA model is
$$x[n] = \sum_{i=1}^{M} a_i\, x[n-i] + \sum_{j=1}^{K} b_j\, v[n-j] + v[n] \tag{1}$$
with M values for the AR part and K values for the MA part. The signal v is a white random sequence. The parameters make the connection between the current value of the signal, x[n], past values of the signal, x[n-i], and past values of the noise, v[n-j], the input of the model. For example, in the case of the Kalman estimator and for a model of order (M + K), the vector of parameters at the discrete time moment n defines the state vector as

$$\mathbf{x}(n) = \left[a_1(n), \ldots, a_M(n), b_1(n), \ldots, b_K(n)\right]^T \tag{1.1}$$
and the measurement matrix is

$$\mathbf{H}(n) = \left[x(n-1), \ldots, x(n-M), v(n-1), \ldots, v(n-K)\right] \tag{1.2}$$
The output of the model is then

$$\hat{y}(n) = \mathbf{H}(n)\, \mathbf{x}(n) \tag{1.3}$$
An error criterion, based on the difference between the real output y(n) and the estimated one, ŷ(n), is considered and minimized. The state vector x(n) provides the desired parameters of the model.
The parameters are estimated in an adaptive framework by using, e.g., the recursive least squares (RLS) algorithm or Kalman estimator, as described in detail in [24, 25–26].
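A sketch of the adaptive estimation with RLS, restricted to the AR part (the MA part and the Kalman variant are omitted; the forgetting factor and the synthetic AR(2) coefficients below are illustrative assumptions):

```python
import numpy as np

def rls_ar(x, order, lam=0.999, delta=100.0):
    """Recursive least squares estimate of the AR parameters in
    x[n] = sum_i a_i * x[n-i] + v[n]."""
    theta = np.zeros(order)       # parameter estimates [a_1, ..., a_M]
    P = delta * np.eye(order)     # inverse of the regressor correlation matrix
    for n in range(order, len(x)):
        phi = x[n - order:n][::-1]                 # regressor [x[n-1], ..., x[n-M]]
        k = P @ phi / (lam + phi @ P @ phi)        # gain vector
        theta = theta + k * (x[n] - phi @ theta)   # correct with a priori error
        P = (P - np.outer(k, phi @ P)) / lam
    return theta

# Identify a synthetic AR(2) process driven by white noise
rng = np.random.default_rng(1)
a_true = np.array([0.6, -0.3])
x = np.zeros(5000)
for n in range(2, len(x)):
    x[n] = a_true @ x[n - 2:n][::-1] + rng.standard_normal()
theta = rls_ar(x, order=2)  # should approach a_true
```

The estimated vector `theta` is then used directly as the feature vector of the frame, as described above.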
Methods of the frequency domain
Figure 3 presents the structure of the computation for CDD by using information from the frequency domain, i.e., the power spectrum of the signal. The spectrum is processed to extract and select the right features for classification. Finally, a classifier is used to estimate the fault or the state of the observed process. In fact, if the selected features are not acceptable for the available classifier, the power spectrum could be used directly for classification. The features of the frequency domain describe the distribution of power over all frequency ranges or in the range/bandwidth of interest.
Fig. 3 [Images not available. See PDF.]
Computational structure for the frequency domain methods
There are two methods for estimating the power spectrum. The first is based on the following definition:
$$S_x(f) = \lim_{T \to \infty} \frac{1}{T}\, E\!\left\{ \left| X_T(f) \right|^2 \right\} \tag{2}$$
where T is the time interval of the analysis and $X_T(f)$ is the Fourier transform of the signal x(t) observed over T. The second is based on the Wiener–Khinchin theorem, in which the power spectrum is the Fourier transform of the autocorrelation function:

$$S_x(f) = \int_{-\infty}^{\infty} r_x(\tau)\, e^{-j 2\pi f \tau}\, d\tau \tag{3}$$
where $r_x(\tau)$ is the autocorrelation function of the signal x(t), defined as

$$r_x(\tau) = E\{x(t)\, x(t+\tau)\} \tag{3.1}$$
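The equivalence of the two estimation routes can be checked numerically in the discrete, circular case: the periodogram equals the DFT of the circular autocorrelation. This is a sketch of the identity, not the estimator used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
N = len(x)

# Route 1: direct definition -- periodogram |X(f)|^2 / N
psd_direct = np.abs(np.fft.fft(x)) ** 2 / N

# Route 2: Wiener-Khinchin -- DFT of the circular autocorrelation r[k]
r = np.array([np.dot(x, np.roll(x, -k)) for k in range(N)]) / N
psd_wk = np.fft.fft(r).real

print(np.allclose(psd_direct, psd_wk))  # True
```

By Parseval's relation, the sum of either estimate over all bins equals the total signal energy, so the two routes distribute the same power over frequency.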
Methods of the time–frequency domain
Time–frequency transforms (TFTs) are used especially for data coming from nonstationary sources, such as audio signals [27], medical signals [28], or mechanical vibrations [29, 30].
The representation of the transform coefficients is an image, referred to as a time–frequency image (TFI), i.e., a 2D data structure/array. Such an image contains a large amount of information and offers solutions to problems in various engineering fields where nonstationary signals are involved.
Change detection methods based on image processing can be classified into two classes, namely, bitemporal change detection and temporal trajectory analysis methods. The former makes a comparison at two time points. The latter considers an analysis on a quasicontinuous time scale by defining/computing the trajectories or curves from temporal image data. It is applied for cases in which a high temporal resolution is available. An important application of the temporal trajectory analysis method is real-time detection, such as video image sequence analysis. At the level of image analysis, seven categories for change detection are commonly used: direct comparison (DC), classification-based methods (CM), object-oriented methods (OOM), model methods (MM), time-series analysis (TSA), visual analysis (VA), and hybrid methods (HM) [31, 32, 33–34].
Figure 4 presents the structure of signal processing in the (t, f) domain. The signal x(t) comes from the measured process. By using a time–frequency transform, a time–frequency image I is obtained. By implementing image analysis and processing, an array F of features is constructed. If possible, a selection process is involved at the level of features to find and use the most sensitive features P for the classification block. The output of the classifier is information about the fault detected, such as the type and size, if possible.
Fig. 4 [Images not available. See PDF.]
Structure of the CDD method based on time–frequency image processing
The methods based on TFTs are efficient solutions for detecting changes in vibrational processes, i.e., both the frequency range and the moment in time when changes occur. This is especially valid for periodic and dynamic faults, and it indicates the capacity to detect and locate transient signals. Discrete-time data transforms are used.
The elements of a TFI have physical meaning. This is important: the features extracted from these images can themselves have meaning and can be explained and used more effectively than virtual/abstract features. The three most used transforms for time–frequency analysis are the short-time Fourier transform (STFT), the quadratic time–frequency transform (QTFT), and the wavelet transform (WT). Some key elements of each transform are introduced in the following.
The STFT, also called the windowed Fourier transform (FT), uses a window of given length, w(t), along the signal of interest, x(t):
$$x_\tau(t) = x(t)\, w(t - \tau) \tag{4}$$
The FT is applied to the windowed signal, allowing measurement and analysis of content in this signal segment/window:
$$\mathrm{STFT}(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j 2\pi f t}\, dt \tag{5}$$
By translating the analysis window, the signal is divided into segments. The analysis window can take various forms, such as the Hamming, Hanning, or Gaussian window [35]. The width of the window determines the resolution in time and frequency: a wide window gives a fine frequency resolution but a poor time resolution, while a narrow window gives the opposite. Good resolution in both domains cannot be achieved simultaneously.
The STFTs are indicated for slowly varying signals over time. The Gaussian window is an optimal solution because it offers optimal resolutions in terms of time and frequency. The expression is
$$w(t) = e^{-\alpha t^2} \tag{6}$$
where the parameter α controls the width of the kernel. The FT with this window function is called the Gabor transform. The result of the transformation is represented in a two-dimensional time–frequency diagram, where each point in time corresponds to a spectrum based on the average characteristics of the signal in the window under consideration. The representation of the squared modulus of the Fourier transform is called a spectrogram:
$$S(\tau, f) = \left| \mathrm{STFT}(\tau, f) \right|^2 \tag{7}$$
The trade-off between time (Δt) and frequency resolution (Δf) is a consequence of the uncertainty principle, which sets a lower limit for the product of time–frequency resolutions
$$\Delta t \cdot \Delta f \ge \frac{1}{4\pi} \tag{8}$$
This limitation is the main disadvantage when using spectrograms as a time–frequency analysis tool.
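Based on the STFT and spectrogram definitions above, a minimal sketch follows; the window type, length, and hop size are illustrative choices, not the settings used in the experiments:

```python
import numpy as np

def spectrogram(x, nwin=128, hop=64):
    """|STFT|^2 with a Hanning analysis window slid along the signal."""
    w = np.hanning(nwin)
    frames = np.array([x[i:i + nwin] * w
                       for i in range(0, len(x) - nwin + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # rows: time, cols: frequency

# Linear chirp test signal: the dominant frequency bin rises with time
fs = 1000
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * (50 + 100 * t) * t)
S = spectrogram(x)
```

For this chirp, the peak frequency bin of the first frame is lower than that of the last frame, reflecting the rising instantaneous frequency that a single Fourier spectrum of the whole record would not localize in time.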
If x(t) is a continuous (possibly complex) signal, as described in [33], the time–frequency transform can be obtained from the general formula
$$C_x(t, f) = \iiint x\!\left(u + \frac{\tau}{2}\right) x^{*}\!\left(u - \frac{\tau}{2}\right) \phi(\theta, \tau)\, e^{-j 2\pi (\theta t + f \tau - \theta u)}\, du\, d\tau\, d\theta \tag{9}$$
where φ(θ, τ) is the kernel function, which imposes the properties of the distribution, and "*" denotes complex conjugation. The kernel function corresponds to the type of window used in window-based distributions and determines certain properties, particularly how energy is distributed over time and frequency. If the kernel function equals one, the Wigner distribution is obtained. For the case where x(t) is an analytic signal, the Wigner distribution is called the Wigner–Ville distribution (WVD) [36]. This distribution satisfies many desirable mathematical properties, as described in the specialized literature, e.g., [37, 38–39]. In particular, the WVD is always real valued, it preserves time and frequency shifts, and it satisfies the marginal properties. The approach also has several drawbacks, such as cross-terms.
The Wigner–Ville distribution aims to overcome the limited resolution of the STFT (due to the length of the observation window) by weighting the signal x(t) with time and frequency translations of the signal itself instead of a window function. The distribution is defined by considering the FT of the instantaneous autocorrelation function $x(t + \tau/2)\, x^{*}(t - \tau/2)$:
$$W_x(t, f) = \int_{-\infty}^{\infty} x\!\left(t + \frac{\tau}{2}\right) x^{*}\!\left(t - \frac{\tau}{2}\right) e^{-j 2\pi f \tau}\, d\tau \tag{10}$$
The WVD is a measure of overlapping signals from past moments to future moments. Its disadvantage is that it generates interference (intermediate or cross terms), i.e., large oscillating terms resulting from the superposition of the signal's separate spectral components, which can lead to false interpretations. These interferences can be reduced by averaging, but the resolution is then reduced as well. A solution is weighting with an observation window h(t) to obtain the pseudo-Wigner–Ville distribution:
$$PW_x(t, f) = \int_{-\infty}^{\infty} h(\tau)\, x\!\left(t + \frac{\tau}{2}\right) x^{*}\!\left(t - \frac{\tau}{2}\right) e^{-j 2\pi f \tau}\, d\tau \tag{11}$$
In this work, the Choi–Williams distribution (CWD) is used, with the following kernel function:
$$\phi(\theta, \tau) = e^{-\theta^2 \tau^2 / \sigma} \tag{12}$$
which has a fast computation algorithm, as in [40]. The parameter σ acts as a variance and controls the spread of the kernel. This distribution adopts an exponential kernel to suppress the cross-terms resulting from components that differ in both their time and frequency centers. The value of σ specifies the extent to which the cross-terms are mitigated; stronger attenuation of the cross-terms comes at the cost of a lower time–frequency resolution of the distribution. Wavelet transforms are localized equivalents of the Fourier transform and are powerful for representing local transient signals. The wavelet transform is defined in a manner analogous to the Fourier transform, except that the harmonics are replaced by a family of basis functions (called wavelets) of the following form:
$$\psi_{\tau, s}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t - \tau}{s}\right) \tag{13}$$
where τ and s are the translation and scaling (expansion or contraction) parameters. The function ψ is called the mother wavelet; there are many mother wavelets, such as the Mexican hat or the Morlet wavelet [41, 42]. The continuous wavelet transform of the signal x(t) is defined by
$$W_x(\tau, s) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \tag{14}$$
The discrete version of the wavelet transform is obtained by first discretizing the scale parameter s on a logarithmic scale. The time parameter is then discretized according to the scale parameter, so a different sampling rate is used for each scale; in other words, sampling is performed on a dyadic sampling grid. Through this sampling, the signal x(t) can be decomposed over orthogonal basis functions (shifted and scaled versions of the mother wavelet ψ), and the coefficients are represented on levels. The representation of the wavelet transform is called a scalogram.
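A continuous wavelet transform sketch with a complex Morlet mother wavelet; the center frequency w0 and the scale grid are illustrative assumptions, and production code would normally use a dedicated wavelet library:

```python
import numpy as np

def scalogram(x, scales, w0=6.0):
    """|CWT|^2 of x with a complex Morlet wavelet; rows are scales."""
    rows = []
    for s in scales:
        t = np.arange(-4 * s, 4 * s + 1) / s           # dimensionless support
        psi = np.exp(1j * w0 * t) * np.exp(-t ** 2 / 2) / np.sqrt(s)
        # correlation with the scaled wavelet == convolution with its reverse
        rows.append(np.convolve(x, np.conj(psi[::-1]), mode="same"))
    return np.abs(np.array(rows)) ** 2

# A pure tone at f = 0.05 cycles/sample peaks near scale s = w0 / (2*pi*f) ~ 19
n = np.arange(1000)
x = np.cos(2 * np.pi * 0.05 * n)
scales = np.arange(5, 40)
P = scalogram(x, scales)
best = scales[np.argmax(P[:, 200:800].mean(axis=1))]
```

The central portion of the output is used when locating the dominant scale, to avoid the border distortion of the finite convolution.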
Description of features
Features are parameters that describe the state of the process under consideration. For each domain, a set of features is presented. The main statistical features from the time domain are presented in Table 1. The possible features of the frequency domain are presented in Table 2.
Table 1. Features of the time domain

| No | Name | Expression |
|---|---|---|
| 1 | Sample mean | $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x[i]$ |
| 2 | Range | $x_{\max} - x_{\min}$ |
| 3 | Sample variance | $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x[i] - \bar{x})^2$ |
| 4 | Mean of absolute values | $\frac{1}{N} \sum_{i=1}^{N} \lvert x[i] \rvert$ |
| 5 | RMS (root mean square) | $x_{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x[i]^2}$ |
| 6 | Peak value | $x_p = \max_i \lvert x[i] \rvert$ |
| 7 | Crest factor | $x_p / x_{RMS}$ |
| 8 | Skewness coefficient | $\frac{1}{N} \sum_{i=1}^{N} \left( \frac{x[i] - \bar{x}}{s} \right)^3$ |
| 9 | Kurtosis coefficient | $\frac{1}{N} \sum_{i=1}^{N} \left( \frac{x[i] - \bar{x}}{s} \right)^4$ |
Table 2. Features of the frequency domain

| No | Name | Expression |
|---|---|---|
| 1 | Mean in frequency | $\bar{f} = \frac{\sum_k f_k\, X(f_k)}{\sum_k X(f_k)}$ |
| 2 | Variance in frequency | $\frac{\sum_k (f_k - \bar{f})^2\, X(f_k)}{\sum_k X(f_k)}$ |
| 3 | Central moment of order 3 | $\frac{\sum_k (f_k - \bar{f})^3\, X(f_k)}{\sum_k X(f_k)}$ |
| 4 | Central moment of order 4 | $\frac{\sum_k (f_k - \bar{f})^4\, X(f_k)}{\sum_k X(f_k)}$ |
| 5 | Median | median(X(f)) |
| 6 | Energy of the signal | $\sum_k \lvert X(f_k) \rvert^2$ |
| 7 | Mean of amplitude | $\bar{X} = \frac{1}{N_f} \sum_k X(f_k)$ |
| 8 | Variance in amplitude | $\frac{1}{N_f} \sum_k (X(f_k) - \bar{X})^2$ |
The processed signal is x(t) (x[i] in discrete time). A sliding, nonoverlapping window of length w ≪ N is considered, and the features are computed for each window position. Each observation window is a data frame/segment, and, by transformation, a set of features is obtained. For frame/window #i, the set of features is
$$F_i = \left[ f_{i,1},\ f_{i,2},\ \ldots,\ f_{i,P} \right] \tag{15}$$
Thus, for all windows i = 1,2,…, nw, a feature matrix F is defined
$$F = \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_{n_w} \end{bmatrix} \tag{16}$$
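The windowing and stacking can be sketched as follows; the four features computed per frame are an illustrative subset of Table 1:

```python
import numpy as np

def feature_matrix(x, w):
    """Split x into non-overlapping frames of length w and stack one
    feature vector per frame into the matrix F (one row per frame)."""
    nw = len(x) // w                  # number of frames
    F = np.empty((nw, 4))
    for i in range(nw):
        frame = x[i * w:(i + 1) * w]
        mu, sigma = frame.mean(), frame.std()
        F[i] = (mu,
                sigma ** 2,                                 # variance
                np.sqrt(np.mean(frame ** 2)),               # RMS
                np.mean(((frame - mu) / sigma) ** 4))       # kurtosis
    return F
```

Note that the w and nw pairs listed in Tables 5, 6, 7 satisfy w × nw = 480,000, i.e., a non-overlapping framing of the test data.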
The elements and the content of TFIs can be considered directly the features of the domain, which is the result of a data transform. The features could have physical meaning or could be abstract variables without an immediate clear meaning.
The (t, f) features are presented in Table 3 as an extension of the features used in the time or frequency domain. The features are computed for an image I. The first five have an immediate statistical meaning, and the last two try to cover the dynamics of the change. The features of Table 3 could be extended with features from other levels of information/knowledge representation, e.g., features based on diverse types of entropies. It is difficult to determine which features should be considered without information about their behavior and, especially, about their sensitivity to the considered faults and detection criteria. For example, the spectral flow feature measures the rate of change in the spectral content of the signal, and spectral flatness is calculated by dividing the geometric mean by the arithmetic mean of the spectrogram (image). Table 4 presents the data used, described in detail in Subsection 6.1.
Table 3. Features of the time–frequency domain

| No | Feature | Time | (t, f) Extension |
|---|---|---|---|
| 1 | Mean | $\bar{x}$ (Table 1) | $\mu_I = \frac{1}{MN} \sum_{m,n} I(m,n)$ |
| 2 | Variance | $s^2$ (Table 1) | $\sigma_I^2 = \frac{1}{MN} \sum_{m,n} (I(m,n) - \mu_I)^2$ |
| 3 | Skewness | Table 1 | $\frac{1}{MN} \sum_{m,n} \left( \frac{I(m,n) - \mu_I}{\sigma_I} \right)^3$ |
| 4 | Kurtosis | Table 1 | $\frac{1}{MN} \sum_{m,n} \left( \frac{I(m,n) - \mu_I}{\sigma_I} \right)^4$ |
| 5 | Coefficient of variation | $s / \bar{x}$ | $\sigma_I / \mu_I$ |
| 6 | Spectral flow | Frequency | rate of change of spectral content between consecutive frames |
| 7 | Spectral flatness | Frequency | geometric mean / arithmetic mean of the spectrogram |
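The last two features of Table 3 can be computed from a spectrogram matrix S (rows: time frames, columns: frequency bins). This is a sketch of one common formulation, not necessarily the exact variant used in the experiments:

```python
import numpy as np

def spectral_flux(S):
    """Rate of change of spectral content between consecutive frames."""
    d = np.diff(S, axis=0)
    return np.sqrt(np.sum(d ** 2, axis=1))

def spectral_flatness(S, eps=1e-12):
    """Geometric mean divided by arithmetic mean of each frame's spectrum."""
    geo = np.exp(np.mean(np.log(S + eps), axis=1))
    return geo / (np.mean(S, axis=1) + eps)
```

A flat, noise-like frame gives a flatness near 1, while a frame dominated by a single spectral line gives a value near 0, which is what makes the feature useful for separating broadband from tonal fault signatures.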
Table 4. Data test set (6205 bearing type); load 0 HP, 1797 rpm; 12 K drive end bearing fault

| Fault size [inch/1000] | F0: Free | F1: Inner race | F2: Ball | F3: Outer race (6:00) | F3: Outer race (3:00) | F3: Outer race (12:00) |
|---|---|---|---|---|---|---|
| 0.0 | d0 (97) | – | – | – | – | – |
| 7.0 | – | d1 (105) | d2 (118) | d3 (130) | d4 (144) | d5 (156) |
| 14.0 | – | d6 (169) | d7 (185) | d8 (197) | – | – |
| 21.0 | – | d9 (209) | d10 (222) | d11 (234) | d12 (246) | d13 (258) |
| 28.0 | – | d14 (3001) | d15 (3005) | – | – | – |

The variables in bold refer to vectors
Experiments and results
The experimental section begins with a description of the data, in fact a benchmark used for fault detection in the bearings of rotating machines. The available data are analyzed first in terms of probability distribution and then in frequency. This analysis supports the decision of choosing the right methods. Moreover, such an analysis helps in better understanding the signals, as effects of the faults, and the generative processes.
Sub-Sect. 6.3 groups the data into classes, one class per fault, i.e., C0 for fault #0, C1 for fault #1, and so on. For each class, the prototype vectors are computed.
Sub-Sect. 6.4 prepares the test vectors used for fault detection and diagnosis. The test vectors are associated with the type and size of the faults. Independent faults and a stationary working regime are considered.
Sub-Sect. 6.5 describes the classifier and the classification results, organized in Tables 5, 6 and 7. The values in bold and italics show the averages on rows and columns.
Table 5. Classification rates R [%] for the time domain methods
Method | w | Distance type | 100 | 500 | 1000 | 2000 | 5000 | 10,000 | 20,000 | |
|---|---|---|---|---|---|---|---|---|---|---|
nw | 4800 | 960 | 480 | 240 | 96 | 48 | 24 | |||
M1: STAT | Test_1 (incipient fault) | E | 45.79 | 60.20 | 63.75 | 68.33 | 70.83 | 68.75 | 70.83 | 64.06 |
M | 50.29 | 61.25 | 65.00 | 71.25 | 73.95 | 68.75 | 70.83 | 62.90 | ||
Test_2 (medium fault) | E | 27.06 | 29.89 | 31.66 | 30.83 | 29.16 | 25.00 | 25.00 | 28.37 | |
M | 30.58 | 31.66 | 31.25 | 29.58 | 28.12 | 25.00 | 25.00 | 28.74 | ||
Test_3 (advanced fault) | E | 37.27 | 62.08 | 65.41 | 68.33 | 70.83 | 72.91 | 75.00 | 64.54 | |
M | 47.83 | 62.81 | 65.00 | 62.91 | 65.62 | 68.75 | 75.00 | 63.98 | ||
Test_4 (large fault) | E | 55.33 | 65.62 | 62.08 | 58.33 | 50.00 | 50.00 | 50.00 | 55.90 | |
M | 59.16 | 63.22 | 63.75 | 59.16 | 50.00 | 50.00 | 50.00 | 56.47 | ||
44.16 | 54.59 | 55.98 | 56.09 | 54.81 | 53.64 | 55.20 | ||||
M2: SMODEL | Test_1 (incipient fault) | E | 38.58 | 42.29 | 45.41 | 22.50 | 18.75 | 56.25 | 45.83 | 38.51 |
M | 44.02 | 41.87 | 40.41 | 27.08 | 6.25 | 52.08 | 37.50 | 35.60 | ||
Test_2 (medium fault) | E | 36.31 | 47.60 | 52.08 | 35.00 | 15.62 | 33.33 | 25.00 | 34.99 | |
M | 38.52 | 46.45 | 54.37 | 35.41 | 12.50 | 37.50 | 25.00 | 35.67 | ||
Test_3 (advanced fault) | E | 43.06 | 46.14 | 44.16 | 25.00 | 31.25 | 50.00 | 33.33 | 38.99 | |
M | 46.79 | 46.85 | 43.75 | 20.41 | 31.25 | 50.00 | 33.33 | 38.91 | ||
Test_4 (large fault) | E | 37.00 | 27.91 | 23.12 | 58.33 | 56.25 | 75.00 | 58.33 | 47.99 | |
M | 35.70 | 22.39 | 22.91 | 40.41 | 56.25 | 75.00 | 33.33 | 40.85 | ||
39.99 | 40.18 | 40.77 | 33.01 | 28.51 | 53.64 | 36.45 |
The values in bold and italics show the averages on rows and columns
Table 6. Classification rates R [%] for methods in the frequency domain
Method | w | Distance type | 100 | 500 | 1000 | 2000 | 5000 | 10,000 | 20,000 | |
|---|---|---|---|---|---|---|---|---|---|---|
nw | 4800 | 960 | 480 | 240 | 96 | 48 | 24 | |||
M3: INDIRECT (feature based) | Test_1 (incipient fault) | E | 56.29 | 44.16 | 35.20 | 48.75 | 34.37 | 25 | 25 | 38.39 |
M | 58.64 | 48.02 | 48.33 | 50 | 36.45 | 29.16 | 25 | 42.22 | ||
Test_2 (medium fault) | E | 57.35 | 50.52 | 54.16 | 60.41 | 58.33 | 47.91 | 54.16 | 54.69 | |
M | 57.22 | 49.27 | 51.04 | 57.08 | 56.25 | 37.50 | 50 | 51.19 | ||
Test_3 (advanced fault) | E | 46.81 | 48.64 | 37.50 | 50 | 47.91 | 50 | 50 | 47.26 | |
M | 50.37 | 49.27 | 41.66 | 50 | 47.91 | 50 | 50 | 48.45 | ||
Test_4 (large fault) | E | 51.87 | 75 | 50 | 75 | 75 | 75 | 75 | 68.12 | |
M | 64.89 | 64.79 | 61.87 | 75 | 75 | 75 | 75 | 70.22 | ||
55.43 | 53.70 | 47.47 | 58.28 | 53.90 | 48.69 | 50.52 | ||||
M4: DIRECT (Spectrum based) | Test_1 (incipient fault) | E | 64.25 | 51.25 | 50 | 63.33 | 50 | 50 | 50 | 54.11 |
M | 61.35 | 50.52 | 50 | 50 | 40.62 | 33.33 | 29.16 | 44.99 | ||
Test_2 (medium fault) | E | 58.41 | 44.16 | 46.66 | 57.91 | 36.45 | 33.33 | 25 | 43.13 | |
M | 55.29 | 34.37 | 32.08 | 37.50 | 31.25 | 29.16 | 25 | 34.95 | ||
Test_3 (advanced fault) | E | 53.52 | 47.50 | 51.04 | 75.41 | 73.95 | 56.25 | 75 | 61.81 | |
M | 46.37 | 48.64 | 50 | 69.16 | 51.04 | 50 | 50 | 52.17 | ||
Test_4 (large fault) | E | 81.79 | 83.75 | 90.20 | 99.58 | 100 | 100 | 95.83 | 93.02 | |
M | 87.37 | 80.41 | 83.75 | 97.50 | 96.87 | 93.75 | 83.33 | 88.99 | ||
63.54 | 55.07 | 56.71 | 68.79 | 60.02 | 55.72 | 54.16 |
The values in bold and italics show the averages on rows and columns
Table 7. Classification rates R [%] for methods in the time–frequency domain
Method | w | Distance type | 100 | 500 | 1000 | 2000 | 5000 | |
|---|---|---|---|---|---|---|---|---|
nw | 4800 | 960 | 480 | 240 | 96 | |||
M5: STFT | Test_1 (incipient fault) | E | 44.41 | 63.85 | 57.50 | 50.00 | 50.00 | 53.15 |
M | 25.00 | 60.00 | 52.70 | 50.00 | 50.00 | 47.54 | ||
Test_2 (medium fault) | E | 42.12 | 25.00 | 25.00 | 25.00 | 27.08 | 28.84 | |
M | 25.00 | 25.41 | 25.00 | 25.00 | 25.00 | 25.08 | ||
Test_3 (advanced fault) | E | 44.04 | 64.47 | 67.70 | 72.50 | 71.87 | 64.11 | |
M | 25.00 | 63.12 | 66.66 | 71.66 | 72.91 | 59.87 | ||
Test_4 (large fault) | E | 80.12 | 96.87 | 99.16 | 100 | 100 | 95.23 | |
M | 25.00 | 98.33 | 99.37 | 100 | 100 | 84.54 | ||
38.83 | 62.13 | 61.63 | 61.77 | 62.10 | ||||
M6: CWT | Test_1 (incipient fault) | E | 54.50 | 47.50 | 48.75 | 37.08 | 25.00 | 42.56 |
M | 55.08 | 50.62 | 25.00 | 25.00 | 25.00 | 36.14 | ||
Test_2 (medium fault) | E | 25.60 | 25.41 | 23.75 | 25.00 | 29.16 | 25.78 | |
M | 25.22 | 25.00 | 25.00 | 25.00 | 25.00 | 25.04 | ||
Test_3 (advanced fault) | E | 30.93 | 56.25 | 65.41 | 70.00 | 69.79 | 58.47 | |
M | 37.33 | 58.43 | 64.37 | 70.00 | 68.75 | 59.77 | ||
Test_4 (large fault) | E | 70.10 | 95.20 | 98.33 | 100 | 100 | 92.72 | |
M | 76.06 | 97.08 | 97.91 | 100 | 100 | 94.21 | ||
46.85 | 56.93 | 56.06 | 56.51 | 55.33 | ||||
M7:WT | Test_1 (incipient fault) | E | 46.00 | 46.45 | 49.37 | 45.83 | 50.00 | 47.53 |
M | 46.35 | 45.83 | 48.75 | 46.25 | 53.12 | 48.06 | ||
Test_2 (medium fault) | E | 32.52 | 31.56 | 40.00 | 37.91 | 46.87 | 37.72 | |
M | 32.52 | 31.66 | 41.66 | 37.50 | 43.75 | 37.41 | ||
Test_3 (advanced fault) | E | 44.02 | 53.12 | 54.79 | 56.66 | 44.79 | 50.67 | |
M | 43.68 | 52.81 | 54.37 | 56.25 | 48.95 | 51.21 | ||
Test_4 (large fault) | E | 43.45 | 48.33 | 44.79 | 48.33 | 52.08 | 47.39 | |
M | 42.70 | 46.77 | 45.41 | 50.83 | 47.91 | 46.72 | ||
41.40 | 44.56 | 47.69 | 47.44 | 48.43 |
The values in bold and italics show the averages on rows and columns
The last subsection presents a qualitative global analysis of the evaluated methods and domains, providing a reference among the various signal processing-based methods for fault detection and diagnosis.
Data description
The experimental data were considered for the case of bearing faults. The signals come from [43]. These are also explained in [44] and described in Table 4. The numbers inside the parentheses indicate the names of the files from the original source of data vibrations, i.e., [43].
Three types of faults are available: F1 (inner race; class #1), F2 (ball; class #2), and F3 (outer race, class #3). Case F0 (class #0) indicates no faults. In the case of fault F3, there are three subcases, depending on the fault position relative to the load zone: ‘centered’ (fault in the 6:00 o’clock position), ‘orthogonal’ (3:00 o’clock), and ‘opposite’ (12:00 o’clock) [44]. Vibration data from four fault sizes are available. Fault sizes ranging from incipient/small (0.007″) to larger (0.028″) are available. The sampling rate is 12,000 Hz, the motor load is 0 hp, and all the data are from the drive end bearing (DE).
Defects in the various moving elements were seeded by electro-discharge machining (EDM), with diameters of {0.007, 0.014, 0.021, 0.028} inches. The depth of the defects is 0.011 inches = 0.2794 mm.
For testing based on computer experiments, new names for variables were considered. All names beginning with “d” indicate a vector with 120,000 samples from normal conditions (no faults) or from the records with faults. The variable d0 contains the first 120,000 elements of the raw file, named 97 from [43]. The data are not scaled.
Data analysis
The signal analysis is made to indicate the frequency bandwidth of each fault, the signal range, and the overlap between fault spectra. For example, even a small overlap in the frequency domain directly indicates a difficulty in separating faults by bandpass filtering and, indirectly, the need to carefully select the features for the classification stage. The sampling frequency also depends on the size of the overlap: a high overlap imposes a higher sampling rate. Concerning the range of the vibration signals, a significant difference between signals associated with different faults, in size and type, could suggest using simple threshold-based (triggering) methods. On the other hand, data normalization makes it difficult to use the amplitude information in detection and diagnosis. All these details are considered in the data analysis stage. Classification is based on the selected features, with or without physical meaning. There is a link between the power of the features in describing a state (fault) of the process and the capacity of the classifier to indicate the right class (fault). A small set of features requires a high-performance classifier.
Preliminary data analysis is performed in the time and frequency domains to evaluate the major features of the processed signals and identify the complexity of the fault detection problem.
Figure 5 presents the first 500 samples from the record files, with the load of the electrical machine ranging from 0 to 4 hp. While the amplitude of the normal behavior (no faults, class #0) is very low, the highest amplitude is obtained for class #3 (the outer race fault).
Fig. 5 [Images not available. See PDF.]
Vibration for various bearing faults, and different loads (in HP)
Figure 6 presents a histogram set of the prototype vectors for incipient bearing faults, with the signal values on the x axis. The differences between the histograms suggest the use of features such as skewness and kurtosis for fault classification, which can be used in statistical methods.
Fig. 6 [Images not available. See PDF.]
Histograms for incipient bearing faults (the signal values are on x axis)
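Since the histograms point to statistical moments as discriminative features, the computation can be sketched in Python with SciPy (the paper works in MATLAB). The two synthetic signals below, a near-Gaussian one and an impulsive one, are invented stand-ins for healthy and faulty records, not CWRU data:

```python
import numpy as np
from scipy import stats

def stat_features(x):
    """Per-window statistical moments used as candidate features:
    mean, standard deviation, skewness, and kurtosis."""
    x = np.asarray(x, dtype=float)
    return np.array([x.mean(), x.std(), stats.skew(x), stats.kurtosis(x)])

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 0.05, 12_000)            # near-Gaussian, low amplitude
faulty = rng.normal(0.0, 0.05, 12_000) + \
         0.5 * (rng.random(12_000) < 0.01)         # sparse impacts -> heavy tail

f_h, f_f = stat_features(healthy), stat_features(faulty)
# the impulsive (faulty) signal shows higher skewness and kurtosis
print(f_h, f_f)
```

The impulsive signal departs from the Gaussian shape, which is exactly the difference the histograms of Fig. 6 reveal.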
Figure 7 presents an example of four signals, one for each fault, in the frequency domain by using the power spectral densities (psd). The signals correspond to d0, d1, d2 and d3 (see Table 4). On the top side, the psd is in [dB/Hz]. On the bottom side, the normalized psd [W/Hz] is presented. The vibration signals with no load on the system are used. The periodogram function of MATLAB [45] was used for the computation of the power spectral density, with a Tukey window. The frequency content is quite large for all signals, and a wide overlap is present. This suggests a preliminary difficulty in distinguishing the class of the signals by using frequency-domain methods. Selection methods based on filtering could fail, especially when the mechanical load of the system varies, which also induces a variation in the spectral content of the vibration signals. The power spectral density is estimated by working and averaging on data windows, based on the Discrete Fourier Transform (DFT). The number of points for the FFT is important: the length of the window imposes the resolution in the frequency domain. For example, a window of 512 samples provides a frequency resolution of about 23 Hz, and a window of 12,000 samples generates a frequency resolution of 1 Hz.
Fig. 7 [Images not available. See PDF.]
Example of psd of the vibration signals (d#0 to d#3), one for each class (C#0 to C#3), with normalized psd at the bottom side
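The resolution figures can be checked in Python with SciPy's periodogram, standing in for MATLAB's. The 1 kHz test tone and the Tukey shape parameter 0.25 are assumptions made for illustration:

```python
import numpy as np
from scipy.signal import periodogram

fs = 12_000                                   # CWRU sampling rate [Hz]
t = np.arange(12_000) / fs                    # one second of data
# hypothetical 1 kHz test tone plus noise, standing in for a vibration record
x = np.sin(2 * np.pi * 1_000 * t) \
    + 0.1 * np.random.default_rng(1).normal(size=t.size)

# psd with a Tukey window, mirroring the MATLAB periodogram computation
f, pxx = periodogram(x, fs=fs, window=('tukey', 0.25))
df = f[1] - f[0]                              # 12,000-sample window -> 1 Hz

f512, pxx512 = periodogram(x[:512], fs=fs, window=('tukey', 0.25))
df512 = f512[1] - f512[0]                     # 512-sample window -> ~23 Hz

print(df, df512)
```

The frequency resolution is fs divided by the window length, which reproduces the 1 Hz and approximately 23 Hz figures quoted above.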
Figures 8, 9, 10, 11 present a more complex representation by using the time–frequency transform. The Choi–Williams time–frequency distribution (CWD) was used with a weighting window of Kaiser type and length 99. The associated images are scaled. The window time length is 0.2 s, which means n = 2,400 samples at a sampling frequency of 12 kHz. The images obtained are for faults F0 (file no. 97), F1 (file no. 105), F2 (file no. 118) and F3 (file no. 130). On the left side of the figures, the power spectral density is represented, and on the bottom, the evolution over time is drawn. For each image, the current values of the parameters are also presented, in terms of frame/window and file number. This is important because the images change in time, i.e., from one window to another, which reveals the non-stationary nature of the analyzed signal. The analysis of these images reveals a dominant component at approximately 1 kHz in the case without faults. When the fault size increases, the power spectrum extends up to 4 kHz and the main components are between 2 and 4 kHz. For each fault, it is possible to define specific sub-images or block patterns, which could be used for the detection of incipient faults and later to implement a diagnosis plan. This research direction, based on the image processing paradigm, is not considered here, but an example is presented in [46].
Fig. 8 [Images not available. See PDF.]
Time–frequency image for class C0 (No fault) (parameters: frame #10; file #97)
Fig. 9 [Images not available. See PDF.]
Time–frequency image for class C1 (Fault F1) (Parameters: frame #10; file #105)
Fig. 10 [Images not available. See PDF.]
Time–frequency image for class C2 (Fault F2) (Parameters: frame #12; file #118)
Fig. 11 [Images not available. See PDF.]
Time–frequency image for class C3 (Fault F3) (Parameters: frame #5; file #130)
To simplify the end-user equipment, a smaller observation window length is desired. The length of the data frame is imposed by the time–frequency patterns, in the sense that the time–frequency image (TFI) must contain enough information for detection and diagnosis purposes. The requirement of real-time change detection, which imposes short data processing windows, must also be considered. In this work, five values for the length were considered: 100, 500, 1000, 2000, and 5000 samples. A length of less than 100 does not accurately describe faults in the time–frequency (TF) domain. A length greater than 5000 will increase the complexity and cost of CDD equipment at the end-user point.
In the case of faults in bearings, there is a complex pattern in the time–frequency images. Short-length windows contain only parts/fragments of the specific pattern and thus could generate difficulties in finding the right features for classification. At the same time, long-term records reveal some stationary patterns in the analyzed images.
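The framing of a record can be sketched as follows; `frame_signal` is a hypothetical helper, and the hop-size handling is an assumption, since the text speaks of sliding frames without fixing the overlap:

```python
import numpy as np

def frame_signal(x, w, hop=None):
    """Split a record into frames of length w (hop defaults to w, i.e. no overlap)."""
    hop = w if hop is None else hop
    n = (len(x) - w) // hop + 1
    return np.stack([x[i * hop : i * hop + w] for i in range(n)])

record = np.zeros(120_000)               # one 120,000-sample record, as in Table 4
for w in (100, 500, 1000, 2000, 5000):   # the window lengths evaluated here
    frames = frame_signal(record, w)
    print(w, frames.shape)               # e.g. w=5000 -> (24, 5000)
```

Each of the resulting frames is then passed to the transform of the chosen domain before feature extraction.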
Prototype vectors of the classes
The prototype vectors represent the pattern classes in the classification process. The principle adopted here is based on averaging followed by normalization of the features. The patterns are computed for each type of time–frequency transform before classification. The set of prototype vectors should be updated in time to consider the change in the parameters when the process is running.
For each class, the structure of the data for the computation of prototypes is:
[Equations (15)–(18) not available. See PDF.]
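A sketch of the averaging-plus-normalization principle follows; the exact expressions are Eqs. (15)–(18), not reproduced here, so the per-feature max-absolute normalization below is an assumed concrete choice:

```python
import numpy as np

def class_prototypes(features, labels):
    """Prototype = mean feature vector per class, followed by per-feature
    normalization computed over all classes. The same normalization factors
    are reused later for the features of the current test window."""
    classes = np.unique(labels)
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    scale = np.abs(protos).max(axis=0)   # per-feature factor over all classes
    scale[scale == 0] = 1.0              # guard against all-zero features
    return protos / scale, scale

rng = np.random.default_rng(2)
feats = rng.normal(size=(40, 4))         # toy feature vectors (4 per-window features)
labs = np.repeat([0, 1, 2, 3], 10)       # classes C#0..C#3
protos, scale = class_prototypes(feats, labs)
print(protos.shape)                      # (4, 4): one prototype per class
```

Returning the scale vector alongside the prototypes is what allows the classification stage to normalize a current window consistently, as stated below.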
For each domain, a set of prototype vectors is considered. As an example, for the (t,f) domain, Fig. 12 presents the prototype images of the classes. For the STFT method, the spectrogram function of MATLAB [45] was used with a Gaussian window of length 100 and 50% overlap. The size of the data window is 5,000 samples. Similar figures are obtained for the methods based on the CWD and WT.
Fig. 12 [Images not available. See PDF.]
Prototype vectors as spectrograms
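The spectrogram settings can be sketched in Python with SciPy, standing in for the MATLAB call; the Gaussian standard deviation of 12.5 samples is an assumed value not given in the text, and the random input is a stand-in for a real data window:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 12_000
rng = np.random.default_rng(3)
x = rng.normal(size=5_000)     # one 5,000-sample data window, as for Fig. 12

# STFT image: Gaussian window of length 100 with 50% overlap, mirroring the
# spectrogram parameters reported in the text
f, t, Sxx = spectrogram(x, fs=fs, window=('gaussian', 12.5),
                        nperseg=100, noverlap=50)
print(Sxx.shape)               # 51 frequency bins x 99 time frames
```

The resulting 51 × 99 array is the time–frequency image from which the (t,f) features and class prototypes are derived.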
At this stage, the normalization of a feature is performed by considering all classes. The values used in normalization are also used at the classification stage when the features of the current window are considered.
Test vectors
The data for testing the classifier are organized in four levels of fault detection, i.e., for incipient, medium, advanced, and large faults, which means sizes of 0.007″, 0.014″, 0.021″, and 0.028″, respectively. The test vectors are defined by considering the names of the data files from the used database (first set) and, respectively, the name of the vectors (second set) of Table 4:
[Equations (19)–(22) not available. See PDF.]
Figure 13 presents the signal vectors used for testing. The test signal is composed of one vector from each class, for a total of four vectors, each with 120,000 samples.
Fig. 13 [Images not available. See PDF.]
The signal used for testing
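Assembling such a test signal can be sketched as follows; the synthetic stand-in records and their noise scales are invented, and only the 4 × 120,000 layout follows the text:

```python
import numpy as np

# stand-ins for the four 120,000-sample class records d0..d3 (synthetic here)
rng = np.random.default_rng(4)
d = [rng.normal(scale=s, size=120_000) for s in (0.05, 0.1, 0.2, 0.4)]

# test signal: the four class vectors concatenated, as in Fig. 13, with a
# ground-truth label per sample for scoring the classifier afterwards
test_signal = np.concatenate(d)
labels = np.repeat([0, 1, 2, 3], 120_000)

print(test_signal.shape, labels.shape)   # (480000,) (480000,)
```

Keeping a per-sample label vector makes the later computation of the classification rate a simple comparison over windows.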
Classification results
For diagnosis purposes, a classifier based on a similarity metric is considered; the metric computes the similarity between the input vector and the prototype vectors of the considered classes. The index of the minimum distance indicates the estimated class of the fault.
The similarity metric is based on the general Minkowski distance, with two cases, Euclidean and Manhattan distances. For two vectors a and b, each of length n, the distances are computed by
d_E(a, b) = [ Σ_{i=1..n} (a_i − b_i)² ]^(1/2)  (23)
and
d_M(a, b) = Σ_{i=1..n} |a_i − b_i|  (24)
The output of the classifier is the index/number of the class associated with the considered fault or state of the process. The classifier measures the distance between the current feature vector xi and the prototype feature vectors of the classes, pk. For n time windows and four classes, the output k* is computed by
k* = arg min_{k ∈ {0, 1, 2, 3}} d(x_i, p_k), i = 1, …, n  (25)
The classification rate is defined as the ratio between the number of right/correct outputs and the total number of windows. The results are presented in Tables 5, 6, 7 for various lengths of the observation window, i.e., variable w, and various test signals, i.e., Test_1 to Test_4. The length w of the observation window is varied from 100 to 20,000 samples. The variable nw denotes the number of windows in the test vector. The average over the methods used is computed, and the average values of the recognition rate for various window lengths are also presented.
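The distance computations, the decision rule, and the classification rate can be sketched in Python; the toy prototypes and feature vectors below are invented for illustration, while real features would come from the respective domain transforms:

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))        # Euclidean distance, Eq. (23)

def manhattan(a, b):
    return np.sum(np.abs(a - b))                # Manhattan distance, Eq. (24)

def classify(x, prototypes, dist=euclidean):
    """Minimum-distance classifier: the index of the nearest prototype
    is the estimated fault class, as in Eq. (25)."""
    return int(np.argmin([dist(x, p) for p in prototypes]))

protos = np.eye(4)                      # toy prototypes for classes C#0..C#3
x = np.array([0.1, 0.9, 0.0, 0.1])      # a test feature vector near class 1
print(classify(x, protos))              # -> 1 with either distance

# classification rate: correct outputs divided by the number of windows
tests = np.eye(4) + 0.05                # one toy feature vector per class
labels = np.arange(4)
rate = np.mean([classify(xi, protos) == yi for xi, yi in zip(tests, labels)])
print(rate)                             # -> 1.0
```

Swapping `dist=manhattan` into the call is all that is needed to compare the two metrics, which is how the Euclidean/Manhattan columns of Tables 5 to 7 differ.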
For the time-domain methods (Table 5), there is no significant difference between the Euclidean and Manhattan distances. For the statistical method M1, increasing the window length slightly raises the recognition rate. For both methods, M1 and M2, the recognition rates are not remarkably high, the best value being 75%. The statistical method M1 generates better results than M2, which is based on modeling. Changing the order of the model could improve the classification results.
For the frequency-domain methods (Table 6), the Euclidean distance provides slightly better results. The best recognition rate is obtained for test vector #4, which is associated with large faults and therefore a more detailed spectrum. Also, large data windows (at least 1,000 samples) are necessary to obtain classification rates higher than 90%. The direct method (M4) provides better results than the feature-based method (M3), at the price of a larger input to the classifier. The best values are obtained for large faults, which is explained by the more significant components in the frequency spectrum used for classification.
In the time–frequency domain (Table 7), the STFT (M5) and CWD (M6) methods provide the best results, in contrast to WT (M7). Large window sizes are necessary, i.e., higher than 1,000 samples. Again, the best results are obtained for test vector #4, which corresponds to large faults.
Figure 14 presents the evolution of the average classification rates for various methods and test signals, based on the values in Tables 5, 6, 7. The average classification results show a better performance for the direct spectrum method (M4). Considering the representations over intervals (in the middle and on the right side of the figure), M4 gives the best result for interval no. 4, i.e., a window length of around 2,000 samples. This means a necessary response time of the order of 2,000/Fs ≈ 0.16 s related to the samples, and approximately 0.2 s if the processing time is included. Regarding the time–frequency methods, i.e., M5 to M7, the length of the window must be higher than 500 samples.
Fig. 14 [Images not available. See PDF.]
Average results of the classification
A large window offers better results than a narrow window, e.g., w = 2,000 samples compared with w = 1,000 or w = 500 samples. However, increasing the length of the observation window makes the TF image more complex, and the features become less sensitive to the considered faults. The use of supervised classifiers and prior information about the regimes and the expected spectral components could improve the quality of the diagnosis.
There was no significant difference between the Euclidean and Manhattan distances, at least for the considered set of features. Changing the set of features could change the balance between them.
From a computational point of view, the methods need serious resources, e.g., the memory to store and process TF images of size 1000 × 1000 elements. If the window length is increased, this could represent an obstacle for the detection and diagnosis equipment. It also imposes special and fast algorithms for the data transforms, which is necessary for commercial implementation.
Qualitative global analysis
Table 8 presents a qualitative analysis of the evaluated methods from various points of view: recognition rate, window length, response time, computational resources, and complexity of the method/features. The highest recognition rate is 100%. The highest (used) window length is 20,000 samples; the shorter, the better. The response time is computed by multiplying the window length by the sampling period. Computational (hardware) resources refer to the necessary circuits, e.g., signal processors and memory. The complexity of the method refers to the mathematical expression of the method/algorithm.
Table 8. Comparative analysis
| Domain | Method | Name | Recognition rate (RR) (0:100) | Window length (NW) (1:20,000) | Response time (RT) [s] | Computational resources (CR) (1:10) | Complexity (CX) (1:10) | Qn |
|---|---|---|---|---|---|---|---|---|
| Desired trend | | | ↑ | ↓ | ↓ | ↓ | ↓ | ↓ |
| t | M1 | Statistical moments | 75 | 20,000 | 1.66 | 8 | 8 | 2.30 |
| t | M2 | Signal modeling | 75 | 10,000 | 0.83 | 6 | 8 | 2.02 |
| f | M3 | Features | 75 | 2,000 | 0.16 | 6 | 7 | 1.94 |
| f | M4 | Direct | 100 | 5,000 | 0.41 | 6 | 7 | 1.64 |
| (t,f) | M5 | Short-time Fourier | 100 | 10,000 | 0.83 | 5 | 5 | 1.58 |
| (t,f) | M6 | Choi–Williams | 100 | 10,000 | 0.83 | 5 | 5 | 1.58 |
| (t,f) | M7 | Wavelets | 56 | 10,000 | 0.83 | 5 | 5 | 2.36 |

The values in bold show the best results
The table also contains a line with arrows in the up and down directions, which shows the desired values and trends for the considered variables. Finally, the normalized criterion Qn should have small values.
The quality-normalized criterion Qn is considered an aggregate of the variables from the previous table as
[Equation (26) not available. See PDF.]
The local variables were normalized by dividing by their highest values. Finally, if it is necessary to consider the sensitivity to the components, a weighted criterion of the components Ci can be used as
Q_w = Σ_{i=1..5} w_i · C_i  (27)
The optimum corresponds to the minimum value of the quality criterion, as
Q* = min_j Q_n(M_j), j = 1, …, 7  (28)
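Under the stated normalization by highest values, the aggregation can be sketched as below; the exact form of Eq. (26) is not reproduced in this text, so the formula used here is a plausible assumption, and its Q values therefore differ slightly from the tabulated Qn:

```python
import numpy as np

# Table 8 values per method M1..M7: [RR, NW, RT, CR, CX]
methods = ["M1", "M2", "M3", "M4", "M5", "M6", "M7"]
C = np.array([
    [ 75, 20_000, 1.66, 8, 8],   # M1 statistical moments
    [ 75, 10_000, 0.83, 6, 8],   # M2 signal modeling
    [ 75,  2_000, 0.16, 6, 7],   # M3 frequency-domain features
    [100,  5_000, 0.41, 6, 7],   # M4 direct spectrum
    [100, 10_000, 0.83, 5, 5],   # M5 short-time Fourier
    [100, 10_000, 0.83, 5, 5],   # M6 Choi-Williams
    [ 56, 10_000, 0.83, 5, 5],   # M7 wavelets
])

# Normalize each criterion by its highest value; the recognition rate enters
# inversely, since a HIGH rate is desired while Q should be SMALL.
# ASSUMED aggregation, not the paper's exact Eq. (26).
Cn = C / C.max(axis=0)
Q = 1.0 / Cn[:, 0] + Cn[:, 1:].mean(axis=1)

# Eq. (27)-style weighted variant; with all-ones weights it recovers Q
w = np.ones(5)
Qw = w[0] / Cn[:, 0] + (Cn[:, 1:] * w[1:]).sum(axis=1) / w[1:].sum()

print(methods[int(np.argmin(Q))])   # one of M4/M5/M6, the minima set of Fig. 15
```

Even with this assumed aggregation, the ranking agrees with the qualitative conclusions: the minimum lands in the {M4, M5, M6} set and the wavelet method M7 scores worst.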
Figure 15 presents the values of the normalized criterion Qn for the set of seven evaluated methods. The minimum values are obtained by a set of three methods: M4, M5 and M6. The absolute minimum is obtained for M5 and M6, corresponding to the time–frequency domain.
Fig. 15 [Images not available. See PDF.]
The global quality criterion
The main point of view was to highlight the results obtained at the level of each domain, i.e., time, frequency, and time–frequency, under the signal processing paradigm. Considering the details of each method needs more space, time to optimize the parameters of the used models, and specific numerical details related to the computer implementation stage. Moreover, the considered methods are not the only candidates. Other methods could be considered in each of the domains, e.g., methods based on Bayesian decision theory [47], statistical correlation [48], and more advanced data transforms such as the generalized S-transform and the synchroextracting transform [49].
As presented in Fig. 15, the time–frequency domain is followed by the frequency domain and, finally, by the time domain. This hierarchy is not surprising, because working in the frequency and time–frequency domains means one more data transform before feature extraction and selection.
The performances of the methods are evaluated with the help of the classifier, which is simple but useful for understanding the principle of each domain where signal processing is involved, as opposed to the process modeling approach. More advanced classifiers could be used to improve the recognition rates, e.g., those based on the machine learning paradigm [50].
Conclusion
The objective of this work was to evaluate several methods of fault detection and diagnosis under a signal processing paradigm.
Three representation domains were considered: time, frequency, and time–frequency. Seven solutions were proposed, two for the time domain, two for the frequency domain and three for the time–frequency domain.
The signals come from the observed process, electrical or mechanical, as an effect of changes in the state of the elements of the process under consideration. As a test case, faults in bearings and vibration signals were considered. Static conditions are assumed, i.e., processes under stationary regimes: constant speed of the rotating machinery and constant mechanical load. The bearing faults include inner race, ball, and outer race faults, with diverse sizes: incipient, medium, advanced, and large.
The raw data are explored in frames, with a tradeoff between the accuracy of the change detection, which requires small lengths, and the performance of the diagnosis, which requires longer lengths. Each frame/window is processed by a suitable transform.
For each domain, a set of features was proposed and used by a simple classifier based on the similarity between the prototypes of the classes and the input pattern. Superior performance could be obtained by using a more complex classifier, e.g., one based on the machine learning paradigm.
All methods have the potential to solve the problems of fault detection and process diagnosis. Each one has pros and cons. The preferred solution is determined by the user and must be adapted to the features of the available signals. Moreover, each method needs to be optimized, from algorithms and numerical implementation points of view.
A performance criterion is formulated based on the recognition rate, signal length, response time, computational resources, and complexity of the method. The minimum value of this criterion indicates the best method; the results show that the time–frequency approaches are the best.
Even if the data used in this work were from bearings, the proposed and implemented solutions have potential for other processes where other types of measured or estimated signals are available or of interest.
Author contributions
The author contributed to the study conception and design. Material preparation, data collection and analysis were performed by Dorel Aiordachioaie.
Funding
The author declares that no funds, grants, or other support were received during the preparation of this manuscript.
Data availability
Case Western Reserve University Bearing Data Center (2024), available at https://engineering.case.edu/bearingdatacenter.
Declarations
Competing interests
The author declares no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Isermann, R. Preface to the special section of papers on supervision, fault detection and diagnosis of technical systems. Control Eng Practice; 1997; 5,
2. Gertler, J. Fault detection and diagnosis in engineering systems; 1998; Marcel Dekker:
3. Zhou, Z; Li, G; Wang, J; Chen, H; Zhong, H; Cao, Z. A comparison study of basic data-driven fault diagnosis methods for variable refrigerant flow system. Energy Build; 2020; 224, 110232. [DOI: https://dx.doi.org/10.1016/j.enbuild.2020.110232]
4. Severson, K; Chaiwatanodom, P; Braatz, RD. Perspectives on process monitoring of industrial systems. Ann Rev Control; 2016; 42, pp. 190-200. [DOI: https://dx.doi.org/10.1016/j.arcontrol.2016.09.001]
5. Tidriri, K; Chatti, N; Verron, S; Tiplica, T. Bridging data-driven and model-based approaches for process fault diagnosis and health monitoring: a review of researches and future challenges. Ann Rev Control; 2016; 42, pp. 63-81. [DOI: https://dx.doi.org/10.1016/j.arcontrol.2016.09.008]
6. Atoui, MA; Cohen, A. Coupling data-driven and model-based methods to improve fault diagnosis. Comput Indus; 2021; 128, 103401. [DOI: https://dx.doi.org/10.1016/j.compind.2021.103401]
7. Mansouri, M; Harkat, MF; Nounou, HN; Nounou, MN. Data-driven and model-based methods for fault detection and diagnosis; 2020; Netherlands, Elsevier:
8. Chen, J; Patton, RJ. Robust model-based fault diagnosis for dynamic systems; 1999; Kluwer Academic Publishers: [DOI: https://dx.doi.org/10.1007/978-1-4615-5149-2]
9. Sobie, C; Freitas, C; Nicolai, M. Simulation-driven machine learning: Bearing fault classification. Mech Syst Signal Process; 2018; 99, pp. 403-419. [DOI: https://dx.doi.org/10.1016/j.ymssp.2017.06.025]
10. Liu, R; Yang, B; Zio, E; Chen, X. Artificial intelligence for fault diagnosis of rotating machinery: a review. Mech Syst Signal Process; 2018; 108, pp. 33-47. [DOI: https://dx.doi.org/10.1016/j.ymssp.2018.02.016]
11. Rai, A; Upadhyay, SH. A review on signal processing techniques utilized in the fault diagnosis of rolling element bearings. Tribol Int; 2016; 96, pp. 289-306. [DOI: https://dx.doi.org/10.1016/j.triboint.2015.12.037]
12. Timusk, M; Lipsett, M; Mechefske, CK. Fault detection using transient machine signals. Mech Syst Signal Process; 2008; 22, pp. 724-749. [DOI: https://dx.doi.org/10.1016/j.ymssp.2008.01.013]
13. Lin, TR; Kim, E; Tan, ACC. A practical signal processing approach for condition monitoring of low-speed machinery using Peak-Hold-Down-Sample algorithm. Mech Syst Signal Proc; 2013; 36, pp. 256-270. [DOI: https://dx.doi.org/10.1016/j.ymssp.2012.11.003]
14. Randall, RB. Vibration-based condition monitoring: industrial, aerospace, and automotive applications; 2011; John Wiley & Sons: [DOI: https://dx.doi.org/10.1002/9780470977668]
15. Popescu, ThD; Aiordachioaie, D; Culea-Florescu, A. Basic tools for vibration analysis with applications to predictive maintenance of rotating machines: an overview. Int J Adv Manuf Technol; 2022; 118, pp. 2883-2899. [DOI: https://dx.doi.org/10.1007/s00170-021-07703-1]
16. Sadhu, A; Narasimhan, S; Antoni, J. A review of output-only structural mode identification literature employing blind source separation methods. Mech Syst Signal Process; 2017; 94, pp. 415-431. [DOI: https://dx.doi.org/10.1016/j.ymssp.2017.03.001]
17. Popescu, ThD. Blind separation of vibration signals and source change detection—application to machine monitoring. Appl Math Model; 2010; 34,
18. Antoni, J. Blind separation of vibration components: principles and demonstrations. Mech Syst Signal Process; 2005; 19, pp. 1166-1180. [DOI: https://dx.doi.org/10.1016/j.ymssp.2005.08.008]
19. Cabada, EC; Leclere, Q; Antoni, J; Hamzaoui, N. Fault detection in rotating machines with beamforming: spatial visualization of diagnosis features. Mechanical Syst Signal Proc; 2017; 97, pp. 33-43. [DOI: https://dx.doi.org/10.1016/j.ymssp.2017.04.018]
20. Cerrada, M; Sánchez, RV; Li, C; Pacheco, F; Cabrera, D; de Oliveira, JV; Vásquez, RE. A review on data-driven fault severity assessment in rolling bearings. Mech Syst Signal Proc; 2018; 99, pp. 169-196. [DOI: https://dx.doi.org/10.1016/j.ymssp.2017.06.012]
21. Randall, RB; Antoni, J. Rolling element bearing diagnostics—a tutorial. Mech Syst Signal Process; 2011; 25, pp. 485-520. [DOI: https://dx.doi.org/10.1016/j.ymssp.2010.07.017]
22. Thalji, I; Jantunen, E. A summary of fault modelling and predictive health monitoring of rolling element bearings. Mech Syst Signal Process; 2015; 60–61, pp. 252-272. [DOI: https://dx.doi.org/10.1016/j.ymssp.2015.02.008]
23. Yang, Y; Xie, R; Li, M; Cheng, W. A review on the application of blind source separation in vibration analysis of mechanical systems. Measurement; 2024; 227, 114241. [DOI: https://dx.doi.org/10.1016/j.measurement.2024.114241]
24. Alexander, ST. Adaptive signal processing; 1986; New York, Springer: [DOI: https://dx.doi.org/10.1007/978-1-4612-4978-8]
25. Adali, T; Haykin, S. Adaptive signal processing: next generation solutions; 2010; New York, Wiley-IEEE Press: [DOI: https://dx.doi.org/10.1002/9780470575758]
26. Gustafsson, F. Adaptive filtering and change detection; 2001; Wiley: [DOI: https://dx.doi.org/10.1002/0470841613]
27. Umapathy, K; Ghoraani, B; Krishnan, S. Audio signal processing using time-frequency approaches: coding, classification, fingerprinting, and watermarking. EURASIP J Adv Signal Process; 2010; [DOI: https://dx.doi.org/10.1155/2010/451695]
28. Boashash, B; Azemi, G; Khan, NA. Principles of time–frequency feature extraction for change detection in nonstationary signals: applications to newborn EEG abnormality detection. Pattern Recogn; 2015; 48,
29. Ahmed, HOA; Nandi, AK. Vibration image representations for fault diagnosis of rotating machines: a review. Machines; 2022; 10, 1113. [DOI: https://dx.doi.org/10.3390/machines10121113]
30. Meng, Q; Qu, L. Rotating machinery fault diagnosis using Wigner distribution. Mech Syst Signal Process; 1991; 5,
31. Radke, RJ; Andra, S; Al-Kofahi, O; Roysam, B. Image change detection algorithms: a systematic survey. IEEE Trans Image Process; 2005; 14,
32. Xiaolu, S; Bo, C. Change detection using change vector analysis from landsat TM images in Wuhan. Elsevier Procedia Environ Sci; 2011; 11, pp. 238-244. [DOI: https://dx.doi.org/10.1016/j.proenv.2011.12.037]
33. İlsever, M; Ünsalan, C. Two-dimensional change detection methods: remote sensing applications; 2012; London, Springer London: [DOI: https://dx.doi.org/10.1007/978-1-4471-4255-3]
34. Ashok, HG; Patil, DR. Survey on change detection in SAR images. Int J Comput Appl; 2014; 0975–8887, pp. 4-7.
35. Proakis JG, Manolakis DG. Digital signal processing: principles, algorithms and applications, 5th edition, Pearson 2022.
36. McFadden PD, Wang W. Time-Frequency Domain Analysis of Vibration Signals for Machinery Diagnostics. (I) Introduction to the Wigner-Ville Distribution, University of Oxford, Report OUEL 1859/92 1990.
37. Cohen, L. Time-frequency distributions—a review. Proc IEEE; 1989; 77,
38. Auger F, Flandrin P, Gonçalvès OL. Time-frequency Toolbox, CNRS France - Rice University 1996.
39. Hlawatsch, F; Boudreaux-Bartels, GF. Linear and quadratic time-frequency signal representations. IEEE Signal Proc Magaz; 1992; 9,
40. Barry, DT. Fast calculation of the Choi-Williams time-frequency distribution. IEEE Trans Signal Process; 1992; 40,
41. Daubechies I. Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, Series No. 61, 1st Ed. 1992.
42. Debnath L, Shah FA. Wavelet Transforms and Their Applications, Birkhäuser Boston, MA 2014.
43. Case Western Reserve University Bearing Data Center (2024), available at https://engineering.case.edu/bearingdatacenter
44. Smith, WA; Randall, RB. Rolling element bearing diagnostics using the Case Western Reserve University data: a benchmark study. Mech Syst Signal Process; 2015; 64–65, pp. 100-131. [DOI: https://dx.doi.org/10.1016/j.ymssp.2015.04.021]
45. The MathWorks, Inc. MATLAB version: 9.4.0.813654 (R2018a), 2024. https://www.mathworks.com
46. Aiordachioaie D, Popescu Th D, Dumitrascu BA. Method of feature extraction from time-frequency images of vibration signals in faulty bearings for classification purposes, EMERGING-2019, special session: advanced techniques of signal processing with application in operating and monitoring the industrial processes, Porto, Portugal, ISBN 978–1–61208–740–5, 34–39. 2019. https://personales.upv.es/thinkmind/dl/conferences/emerging/emerging_2019/emerging_2019_2_20_58002.pdf
47. Soleimani, M; Shahbeigi, S; Esfahani, MN. A Bayesian network development methodology for fault analysis; case study of the automotive aftertreatment system. Mech Syst Signal Process; 2024; 216, 111459. [DOI: https://dx.doi.org/10.1016/j.ymssp.2024.111459]
48. Liu, H; Zhang, J; Cheng, Y; Lu, C. Fault diagnosis of gearbox using empirical mode decomposition and multi-fractal detrended cross-correlation analysis. J Sound Vib; 2016; 385, pp. 350-371. [DOI: https://dx.doi.org/10.1016/j.jsv.2016.09.005]
49. Wang, H et al. A novel time-frequency analysis method for fault diagnosis based on generalized S-transform and synchroextracting transform. Meas Sci Technol; 2024; 35, 036101. [DOI: https://dx.doi.org/10.1088/1361-6501/ad0e59]
50. Emmert-Streib, F; Dehmer, M. Taxonomy of machine learning paradigms: a data-centric perspective. WIREs Data Min Knowl Discov; 2022; [DOI: https://dx.doi.org/10.1002/widm.1470]
Copyright Springer Nature B.V. Jan 2025