Transfer Learning-Based Fault Diagnosis of

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

The primary source of power for automobiles, trains, boats, and other vehicles is the internal combustion (IC) engine [1]. The gearbox acts as the major power transmission unit between the IC engine and the wheels and initiates vehicle movement. It consists of two shafts carrying gears of varying sizes that vary the speed and torque supplied to the wheels. In general, the gearbox operates continually during vehicle usage and is therefore subjected to various loads and stresses. Such prolonged operation, variable load, high power transmission, thermomechanical stresses, and the presence of working components can induce faults in the gearbox. Pitting, plastic flow, bearing defects, and gear breakage are some common gearbox faults. These faults arise from factors such as wear, surface fatigue, inadequate lubrication, high oil level in the case, overload, and contamination. A faulty gearbox can disrupt smooth gear transmission, damage internal gear components, and lead to a buildup of vibration and noise, thereby degrading the working of the gearbox. Diagnosing the condition of a gearbox is therefore essential to avoid catastrophic failure, downtime, and high service cost and to ensure smooth operation of the engine [1]. Numerous techniques have been adopted for identifying machine element faults, as described below:

• Oil analysis: In this method, the used oil extracted from machine elements is analyzed to determine the presence of faults. The chemical and physical characteristics of the extracted oil are evaluated to ascertain the state of health of the machine component. The major drawbacks of oil analysis are its limited applicability to heavy machinery and its high cost [2–4].

• Temperature analysis: This method uses a thermal imaging camera to measure the temperature differential between components. These data are used to detect faults in the machine element. Temperature analysis allows for quick defect detection. Its application is limited because it cannot detect faults in machine components working at high temperatures [5–8].

• Sound emission and acoustic analysis: Acoustic analysis, also called sonic analysis, examines the sound waves created by the rotating parts of a system. Faults are detected from variations in the frequency of the emitted sound waves. High signal-to-noise ratio, easy implementation, fault localization, and suitability for high-frequency applications are some of the advantages of acoustic and sound emission analysis. High computational time and sensitivity toward noise are its disadvantages [9–12].

• Electromagnetic field analysis: This technique aids in identifying areas of weakness in machine components as well as fractures, dents, and corrosion. Eddy current testing is used to assess the defects in machine components. The application of electromagnetic field analysis is limited by its sensitivity toward the surrounding environment and noise [13].

• Vibration analysis: In this fault diagnosis method, vibration signals are acquired using an accelerometer and used to evaluate the condition of a machine element. Vibration analysis has several advantages over the other methods mentioned: it requires less computational time, is more accurate and reliable, and yields signals that are easy to extract and process. Hence, it is the most widely used and one of the best techniques for identifying machine element faults [14–19].

Among the above-mentioned techniques, vibration analysis has been widely adopted due to its nondestructive nature, unique pattern creation, precise measurement, ease of computation, and the presence of superior feature information. Several features, such as wavelet, auto-regressive moving average, histogram, and statistical features, can be extracted from vibration signals. Additionally, feature selection algorithms like J48, wavelet transform [20], particle swarm optimization [21], principal component analysis [22], Fourier spectral analysis, and genetic algorithm [23] have been applied in fault diagnosis applications. To carry out the classification process, a number of machine learning algorithms, such as neural networks [24, 25], decision trees, multifeatured vectors [26], and support vector machines (SVMs) [27], have been adopted. Despite the excellence of machine learning approaches in diagnosing gearbox faults, feature engineering remains the major challenge. Deep knowledge and a high level of expertise in the specific application are required to choose the best feature extraction technique, as the performance of the fault diagnosis technique depends upon the features extracted from the vibration signals. Furthermore, feature extraction and classification are considered tedious and time-consuming due to the complexity of the mathematical modeling involved. Such scenarios have led researchers to look for an automated technique that can perform classification without explicitly extracting features from vibration signals. Deep learning has evolved as one such technique that can perform classification on images without explicit feature extraction. However, acquiring images from the inside of the engine gearbox is infeasible and costly.

On the other hand, vibration signal acquisition from a gearbox is an established and familiar activity. The vibration signals may be transformed into radar plots that deep learning networks can use as input. Convolutional neural networks (CNNs) are the building blocks of deep learning architectures. An important trait of deep learning is its capability to learn image patterns and provide accurate classification. Deep learning networks are available in two variants, namely, models built from scratch and pretrained models. Models built from scratch are generated from a user-defined configuration, which necessitates large volumes of training data and consumes more time. Alternatively, pretrained models have already been trained on large datasets, and their weights are available in public repositories for custom applications. Training a pretrained network is relatively simple and requires minimal time in comparison with models built from scratch. Additionally, pretrained models carry prior knowledge that can be adapted to solve the problem at hand. Such a knowledge transfer process is termed transfer learning. Transfer learning has been extensively adopted in recent times for solving fault diagnosis problems. Simple construction, accurate classification, minimal training time, and versatile application are certain advantages of adopting transfer learning [28].

The scientific community has been drawn to technological advancements in artificial intelligence (AI) and its evolution in conjunction with the Internet of Things (IoT). Recent years have witnessed the rise of AI as a powerful tool in various applications that include data exploration and big data analysis. AI has also been a driving force for the development of modern fault diagnosis techniques. The application of AI in the field of science was initially introduced by Hinton, Krizhevsky, and Sutskever [29], which paved the way for research in several fields of study. In a hierarchical framework, several neuronal layers are placed on top of one another to create deep neural networks (DNNs), which are able to comprehensively extract input information. Such networks are referred to as "deep" since the original data are processed via many layers of learning. DNNs analyze complex dataset structures using stacked layers and extract the most important information from the input data. Many disciplines have used deep learning models, including visual inspection, robotics, surveillance object detection, and language processing, due to their automatic feature learning ability and advanced nonlinear regression capability [30–33]. Thus, there is a growing need and opportunity for adopting DNNs with automatic learning capabilities to diagnose faults in mechanical systems.

As shown in Figure 1, mechanical system defect detection with deep learning has been implemented in two stages, based on the DNN's capacity to learn complicated features. The two stages are explained below.

[figure(s) omitted; refer to PDF]

• In the initial stage, DNNs were used to mimic traditional fault diagnosis techniques, i.e., either to classify or to conduct feature selection. First, features were extracted from the acquired signals using different feature extraction techniques. The DNN that handles the classification job was then trained using the extracted features. The following studies follow this approach. Shao et al. [34] used an optimized deep belief network with 18 time-domain features to detect bearing faults. Verma et al. [35] used a sparse autoencoder to diagnose air compressor faults. Chen, Li, and Sánchez [36] used a CNN with time and frequency characteristics to infer the status of a gearbox. Because the full feature learning capacity of DNNs had not yet been extensively explored, DNNs were merely used to replace conventional approaches (Figure 1a).

• The feature learning capabilities of DNNs were first investigated in 2015. In this stage, the vibration data (stored as raw signals or images) are used by the DNN to perform both feature extraction and classification. This approach has eliminated the requirement for domain experts and has minimized the amount of time consumed. The following studies belong to this stage. Zhao et al. [37] employed a convolutional long short-term memory network to perform tool condition monitoring. Guo, Chen, and Shen [38] developed an ensemble-based strategy employing dual deep CNNs to track the health of roller bearings. One CNN carried out feature extraction and fault pattern recognition, while the other CNN performed fault classification [38]. Since DNNs have the capacity to learn features, the defect detection procedure becomes more automated and efficient (Figure 1b).

Several studies have addressed gearbox fault diagnosis using machine learning and deep learning, but they were not specifically oriented toward the IC engine gearbox. Most of the works based on machine learning algorithms utilized vibration or acoustic signal features to perform diagnostics. Nowakowski et al. [39] determined gearbox condition using acoustic measurements and decision tree algorithms. Jamadar et al. [40] applied SVMs to diagnose gearbox faults through time-frequency domain analysis. Afia et al. [41] carried out a comparative study of three classifiers, namely, random forest (RF), ensemble tree (ET), and k-nearest neighbor (KNN), to determine gearbox faults. Although machine learning classifiers classified precisely, the feature engineering process was time-consuming and required expert domain knowledge. To alleviate these challenges, researchers turned to deep learning algorithms. For instance, Liu et al. [42] attempted to diagnose gearbox faults using continuous wavelet transform images derived from vibration signals. The authors used the gray wolf optimization algorithm and symmetric cross entropy to perform segmentation [42]. In another study, a counterfactual augmented few-shot contrastive learning approach was utilized for intelligent machine fault diagnosis [43]. Karabacak and Özmen [44] performed common spatial pattern-based feature extraction to detect worm gear faults with the aid of acoustic and vibration measurements. The authors performed classification with KNN, SVM, deep learning, and artificial neural networks (ANNs) and inferred that ANN outperformed all other classifiers considered [44]. In another study, the authors also included thermal images to diagnose faults in worm gears. Vibration and acoustic time domain features, along with thermal image features, were extracted and evaluated individually and fused in different combinations with ANN and SVM. The fusion of all three feature types produced the best result [45]. Furthermore, thermal images of worm gear conditions were analyzed with a CNN-based model to demonstrate the superior performance of deep learning models [46]. Liu et al. [47] adopted wavelet capsule generative adversarial networks (WCGAN) to detect and diagnose rolling bearing faults. The same authors also adopted adversarial variational autoencoders for imbalanced fault diagnosis [48]. Thus, deep learning algorithms have attracted considerable attention due to their efficient training, precise models, better pattern recognition, freedom from manual feature engineering, and independence from prerequisite domain knowledge.

CNNs are the building blocks of deep learning and are used to extract intricate characteristics from visual data. They are widely utilized in many different fields, including defect diagnosis, object identification, and voice recognition. Yet only a small number of studies have addressed the identification of IC engine gearbox defects. Moreover, no attempt was made to use CNNs with transfer learning to diagnose problems with the gearbox of an IC engine. Table 1 lists various publications that discuss the application of transfer learning to mechanical systems. In the current study, the effectiveness of a number of cutting-edge pretrained networks, including GoogLeNet [60], residual network (ResNet)-50 [61], AlexNet [62], and Visual Geometry Group 16 (VGG-16) [63], was assessed for analyzing images generated from vibration signals to diagnose the status of an IC engine gearbox. Five hyperparameters (epochs, optimizer, batch size, learning rate, and train-test split ratio) were varied for three load situations (no load, 9.6 Nm, and 13.3 Nm) to conduct the experiment. The findings were compared to find the optimum network for diagnosing IC engine gearbox faults. Figure 2 depicts the whole experimental workflow.

[figure(s) omitted; refer to PDF]

Table 1

Works related to mechanical system fault diagnosis.

Mechanical system	Deep learning technique	References
Aero engine	Transfer learning using extreme learning machine	[49]
Rotor system	Transfer learning over feature vectors of vibration signals	[50]
Roller bearing	Ensemble lightweight model combined with transfer learning	[51]
Roller bearing	Transfer learning method/Wasserstein adversarial networks	[52]
Roller bearing	Layered alternately transfer learning	[53]
Roller bearing	Transfer learning	[54]
Roller bearing	Deep transfer learning using residual networks	[55]
Roller bearing	Deep transfer learning with reinforced ensemble	[51]
Nuclear power plant	Transfer learning with labeled data	[56]
Nuclear power plant	Transfer learning with unlabeled data	[57]
Wind turbine gearbox	Deep-boosted transfer learning	[58]
Wind turbine gearbox	Transfer learning	[59]

The primary technical contributions performed in the present study are detailed as follows:

• Comprehensive Analysis of Gearbox Fault Conditions: The present study examines four distinct IC engine gearbox fault conditions (100% defect, 75% defect, 50% defect, and 25% defect) alongside a healthy condition. Each condition is evaluated under three load conditions (13.3, 9.6 Nm, and no load), providing a comprehensive assessment.

• Radar Plot Image Generation: The collected vibration signal for every condition is transformed into radar plot images. Radar plots offer a versatile and effective tool for visualizing multidimensional data, facilitating comparative analysis, pattern recognition, and communication of insights across various domains and applications. Furthermore, the images are resized to 227 × 227 and 224 × 224 to facilitate further analysis.

• Application of Pretrained Networks: Four pretrained networks, namely, AlexNet, VGG-16, ResNet-50, and GoogLeNet, were considered to perform fault classification in the IC engine gearbox. Of the four pretrained networks, AlexNet alone was supplied with 227 × 227-pixel images, and the other three networks were supplied with 224 × 224-pixel images.

• Hyperparameter Tuning for Classification Effectiveness: Five hyperparameters (epochs, solver, train-test split ratio, batch size, and learning rate) were varied to assess the classification effectiveness of the pretrained networks.

• Identification of Top Performing Pretrained Network: Based on the outcomes of the evaluation, the study identifies the top-performing deep learning architecture tailored for IC engine gearbox fault detection. This finding guides practitioners and researchers in selecting the most effective model for similar applications in the field.

This study presents a novel methodology for detecting faults in IC engine gearbox. The key novel contribution points toward the conversion of vibration signals into radar plot images. Radar plots were applied in the present study due to the various advantages that include ease in representation, visualization of multidimensional data, an excellent tool for comparative analysis, definite patterns, and communication of insights. Adding to the primary novel contribution, a comprehensive analysis of four distinct gearbox fault conditions alongside a healthy condition under various load conditions was carried out to mimic the real-world operation. Furthermore, the study evaluates the performance of pretrained neural network architectures like AlexNet, VGG-16, ResNet-50, and GoogLeNet, utilizing hyperparameter tuning to optimize classification effectiveness. The identification of the top-performing network provides valuable guidance for practitioners and researchers in selecting appropriate models for similar applications. Overall, these contributions significantly advance gearbox fault detection techniques, offering practical insights for predictive maintenance and reliability enhancement in industrial machinery.

2. Experimental Studies

The experiment was carried out on the gearbox of a four-stroke IC engine coupled to an eddy current dynamometer. Figures 3 and 4 represent the schematic and physical experimental setup. The specification of the engine used for the experiment is given in Table 2. The test was conducted for five different gear conditions (25%, 50%, 75%, and 100% defect and healthy condition). A base frame provided support for the setup, preventing unneeded foundation excitation. Figure 5 shows the placement of the test gear inside the gearbox. A tri-axial accelerometer (make: YMC, China, Type: IEPE) installed on the gearbox casing of the IC engine was used to record the vibration data. The experiment was run for the various gear conditions by replacing each gear in turn. The vibration signal was digitized using a data acquisition device DAQ-9234 (Make: National Instruments) at a 25.6 kHz sampling rate. LabVIEW software was used to acquire and process the vibration signals. Three distinct loading situations were used throughout the experiments, and the vibration signals were recorded. The engine and gearbox parts were examined beforehand to prevent any errors in the signal arising from other damaged components. Before the data were acquired, the engine was kept running for a while to reach a consistent speed. The experiments covered the healthy state and the fractured tooth states of the gearbox. In total, 30 samples were taken for each of the five gearbox conditions under each of the three loading circumstances (T2 = 13.3 Nm, T1 = 9.6 Nm, and no load), for a total of 450 samples. The loading sequence for the IC engine is shown in Table 3. The vibration signals perpendicular to the input shaft were plotted as radar plots for investigation.
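The sample count quoted above follows directly from the experimental design. The following minimal Python sketch enumerates one entry per recording; the label strings and the function name are illustrative assumptions, not identifiers used in the study.

```python
from itertools import product

# Gear conditions and load cases as described in the text.
# The label strings themselves are illustrative assumptions.
GEAR_CONDITIONS = ["healthy", "25% defect", "50% defect", "75% defect", "100% defect"]
LOAD_CASES_NM = {"no load": 0.0, "T1": 9.6, "T2": 13.3}
SAMPLES_PER_CASE = 30  # recordings per gear condition per load case


def build_sample_index():
    """Return one (load, condition, sample_number) tuple per recording."""
    return [
        (load, condition, n)
        for load, condition in product(LOAD_CASES_NM, GEAR_CONDITIONS)
        for n in range(1, SAMPLES_PER_CASE + 1)
    ]


sample_index = build_sample_index()
per_load = sum(1 for load, _, _ in sample_index if load == "T1")
print(len(sample_index), per_load)  # 450 150
```

Each load case therefore contributes 150 recordings (5 conditions × 30 samples), which is the dataset size seen by a network trained on a single load condition.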

[figure(s) omitted; refer to PDF]

Table 2

IC engine specifications adopted for the study.

IC engine parameter	Values
Number of driven gears	5
Displacement	124.66 cc
Power	11 bhp @ 8000 rpm
Torque	10.8 Nm @ 5500 rpm
Type	4 stroke, Air-cooled

Abbreviation: IC, internal combustion.

Table 3

IC engine loading condition.

S. no	Status of engine	Load condition (Nm)	Time duration (s)
1	Idle/start	Not applicable	<300
2	Speed of crank (uniform)	0 (no load), 9.6 (Torque 1), 13.3 (Torque 2)	<1800
3	Idle	Not applicable	<30
4	Stop	Not applicable	End

Abbreviation: IC, internal combustion.

3. Description of CNNs

CNNs are a subset of machine learning. They are one of several distinct ANN designs that are used for diverse purposes and data types. A CNN is a deep learning network architecture used primarily for image identification tasks that require the processing of pixel input. A CNN does not require manual feature selection, as it learns directly from the data. The features extracted from the images are associated with weights and biases within the CNN, and images are classified based on the learned image features. As shown in Figure 6, a CNN is made up of three main layer types: the convolution layer, the pooling layer, and the fully connected layer. The purposes of these layers are as follows:

[figure(s) omitted; refer to PDF]

• The input layer for CNN is the initial layer, which gathers and stores the picture pixel values.

• After the input layer, a convolution layer with a particular filter size is present. This layer performs a mathematical operation between the filter and the input image by sliding the filter across the given picture.

• After the convolution layer comes the pooling layer. The primary purpose of this layer is to decrease the dimensions of the features mapped in the previous layer. This is achieved by down-sampling the feature maps along their spatial dimensions, which minimizes the computational expense.

• The fully connected layer comes after the pooling layer. This layer transforms picture feature matrices into vector form. Using an activation function, it assigns a score to each class, from which the data are classified.

A CNN uses downsampling and convolution techniques to transform the given image input through the convolution, pooling, and fully connected layers and to provide class scores for regression and classification purposes. Knowing only the general architecture of a CNN is not enough, because these models need some time to develop and improve. The following sections examine the layers of each network in more detail, including information on connections and hyperparameters. Figure 6 outlines the overall architecture of a CNN.
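The layer sequence described above can be illustrated on a toy image. The following is a minimal pure-Python sketch, not the implementation used in the study; the 5 × 5 image, the 2 × 2 edge filter, and all function names are assumptions chosen only to keep the arithmetic easy to follow.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def flatten(feature_map):
    """Turn the 2-D feature map into the vector fed to a fully connected layer."""
    return [v for row in feature_map for v in row]


# Toy image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to left-to-right intensity increases

features = flatten(max_pool(relu(conv2d_valid(image, kernel))))
print(features)  # [2, 0, 2, 0]
```

The convolution turns the 5 × 5 image into a 4 × 4 feature map that is nonzero only at the edge, pooling reduces it to 2 × 2, and flattening yields the vector that a fully connected layer would score.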

4. Diagnosing Faults in IC Engine Gearbox Using Pretrained Models

In this study, fault diagnosis in the IC engine gearbox was carried out by converting the acquired vibration signals to images (vibration radar plots). The converted images were preprocessed and resized to 224 × 224 or 227 × 227 pixels, depending on the network selected. Four pretrained networks were used to categorize the images and determine the gearbox's condition: ResNet-50, GoogLeNet, AlexNet, and VGG-16. Among these, AlexNet received images of 227 × 227 pixels, while the other networks received images of 224 × 224 pixels. The whole process for defect diagnosis in an IC engine gearbox utilizing pretrained networks is shown in Figure 7. The following subsections elaborate on the pretrained networks employed along with the formulation of the dataset.

[figure(s) omitted; refer to PDF]

4.1. The Formation and Preprocessing of Datasets

In the present study, the vibration signals for the IC engine gearbox fault conditions were obtained for three load situations (T2 = 13.3 Nm, T1 = 9.6 Nm, and no load). Within each load condition, five gear conditions, namely, 100% tooth defect, 75% tooth defect, 50% tooth defect, 25% tooth defect, and healthy gear, were adopted. The collected vibration signals were stored as data files (.csv) saved in separate folders for every gearbox condition. Microsoft Excel macros were used to generate the radar plots. A macro written in Visual Basic inside Microsoft Excel plotted the radar plots sequentially for all the data files of every gearbox condition. The generated radar plots were saved as images corresponding to the gearbox condition using the Kutools plugin for Microsoft Excel. Figure 8 represents the gear tooth defects used in diagnosing gearbox faults. A total of 450 samples were collected, with 30 samples recorded for each of the five gearbox conditions under each of the three loading circumstances. The images were then resized so that the input image size was acceptable to the adopted pretrained networks. Figure 9 shows samples of the radar plots obtained for different gear conditions under the T1 load condition.
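The study generated its radar plots with Excel macros, whose code is not given. As a rough illustration of what such a plot encodes, the following Python sketch maps a one-dimensional signal onto the vertices of a radar polygon. The min-max normalization, the clockwise layout starting at the top, and the function name are assumptions of this sketch and may differ from the settings used by the authors.

```python
import math


def radar_vertices(samples):
    """Map a 1-D signal onto radar-plot polygon vertices.

    Sample k is placed on spoke k of N equally spaced spokes, starting at
    the top of the plot and moving clockwise. The radius is the min-max
    normalized amplitude (an assumption of this sketch).
    """
    lo, hi = min(samples), max(samples)
    span = hi - lo
    n = len(samples)
    vertices = []
    for k, s in enumerate(samples):
        r = (s - lo) / span if span else 0.0
        theta = 2.0 * math.pi * k / n
        vertices.append((r * math.sin(theta), r * math.cos(theta)))
    return vertices


# Four hypothetical amplitude values, purely for illustration.
vertices = radar_vertices([0.0, 1.0, 0.0, 1.0])
```

Joining consecutive vertices (and closing the polygon) with any plotting tool would give a radar-style image; different vibration patterns then appear as differently shaped polygons, which is the kind of visual pattern an image classifier can learn.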

[figure(s) omitted; refer to PDF]

4.2. VGG-16 Pretrained Network

Karen Simonyan and Andrew Zisserman introduced VGG-16 at the annual ILSVRC 2014, where the VGG entry placed first in the localization task and second in the classification task. Oxford Net is another name for VGG-16, since it was created by a team of academics from Oxford University. The network has 16 weight layers and was built to operate on pictures having 224 × 224 input dimensions. The network design comprises 13 convolutional layers, 5 max-pooling layers, three fully connected layers, and a classification layer. To illustrate how the VGG-16 architecture functions, consider a 3 × 3 kernel with two learnable parameters, the weights B and the bias d. The convolution process is then represented by the equation y = f(Bx + d), where x is the input, y is the output, and f is the activation function. The initial layers of the VGG-16 architecture learn basic characteristics like edges, while the deeper layers extract and learn complex feature information. During convolution, each picture is processed in small patches and progressively reduced in size in order to extract more useful features. The stride value and filter size are the two key variables that affect the output produced by a convolution layer. Each convolution layer uses the rectified linear unit (ReLU) as its activation function, and the layer before the fully connected layer has a dropout of 0.5. Convolution layers are coupled with max-pooling layers, which downsample the activation maps and add a degree of translation invariance.
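The layer counts quoted for VGG-16 can be checked against the standard published layout of the network. The sketch below assumes that layout (3 × 3 convolutions with padding 1, which preserve the spatial size, and 2 × 2 max pooling with stride 2); the constant and function names are illustrative.

```python
# Standard VGG-16 layout: numbers are the output channels of 3x3
# convolutions, "M" marks a 2x2 max-pooling layer with stride 2.
VGG16_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]
FULLY_CONNECTED = [4096, 4096, 1000]  # sizes of the original ImageNet head


def summarize(layout, input_size=224):
    """Count convolution and pooling layers and track the feature map size."""
    conv_layers = sum(1 for item in layout if item != "M")
    pool_layers = sum(1 for item in layout if item == "M")
    size = input_size
    for item in layout:
        if item == "M":
            size //= 2  # each pooling layer halves height and width
        # 3x3 convolutions with padding 1 leave the size unchanged
    return conv_layers, pool_layers, size


conv_layers, pool_layers, final_size = summarize(VGG16_LAYOUT)
weight_layers = conv_layers + len(FULLY_CONNECTED)
print(conv_layers, pool_layers, final_size, weight_layers)  # 13 5 7 16
```

The 13 convolutional and 3 fully connected layers give the 16 weight layers in the network's name, and five pooling steps reduce a 224 × 224 input to a 7 × 7 feature map. In a transfer learning setting, the 1000-way output layer would typically be replaced by one sized to the number of target classes, here the five gearbox conditions.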

4.3. ResNet Pretrained Network

ResNet, developed by He et al. [61], is one of the most efficient and successful networks of recent times. The model was declared the best-performing model in the annual ILSVRC 2015. The network is intended to operate on pictures having 224 × 224 input dimensions. ResNet is formed by stacking residual units. It was originally trained on the ImageNet dataset and was also evaluated on the Common Objects in Context (COCO) dataset. Depending upon the number of layers and residual units, the ResNet architecture comes in many forms, such as 18, 34, 50, 101, 152, and 1202 layers. The effectiveness of the ResNet design stems from the use of identity shortcuts, which add the input of a residual unit directly to its output. Convolution, pooling, and fully connected layers are also parts of ResNet. The ResNet-50 architecture considered in the study has a total of 49 convolution layers and one fully connected layer. The ResNet architecture resembles the VGG network architecture, but the former is up to eight times deeper than the latter (in its 152-layer form). The key advantages of ResNet are fast convergence and accurate classification.
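The identity shortcut and the layer count of ResNet-50 can be sketched in a few lines of Python. The block below is a conceptual illustration on plain lists rather than a real network layer; the stage layout (3, 4, 6, 3 bottleneck blocks with three convolutions each) is the standard ResNet-50 configuration, and the function names are assumptions.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where residual_fn computes the residual branch F."""
    fx = residual_fn(x)
    return [xi + fi for xi, fi in zip(x, fx)]


# Standard ResNet-50 layout: bottleneck blocks per stage,
# three convolutions per bottleneck block.
BLOCKS_PER_STAGE = (3, 4, 6, 3)
CONVS_PER_BLOCK = 3


def resnet50_layer_count():
    """Count weight layers, ignoring the 1x1 projections on some shortcuts."""
    conv_layers = 1 + CONVS_PER_BLOCK * sum(BLOCKS_PER_STAGE)  # stem + blocks
    fc_layers = 1
    return conv_layers, fc_layers


# If the residual branch outputs zeros, the block passes its input unchanged.
identity_out = residual_block([1.0, -2.0, 3.0], lambda x: [0.0] * len(x))
print(identity_out)            # [1.0, -2.0, 3.0]
print(resnet50_layer_count())  # (49, 1)
```

Because a block only has to learn the difference between its input and the desired output, a block that has nothing useful to add can fall back to the identity, which is one common explanation for why very deep residual networks remain trainable.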

4.4. AlexNet Pretrained Network

AlexNet was among the top-performing networks and was created by Alex Krizhevsky in cooperation with Ilya Sutskever and Geoffrey Hinton. AlexNet was trained on 1000 picture classes totaling more than 1.2 million images. The network, which has 61 million learnable parameters and eight layers overall, has been built to operate on pictures with a 227 × 227 input size. Convolution layers combined with max pooling make up the initial five layers, which are followed by three fully connected layers. In the first convolution layer, which has 96 filters of size 11 × 11, the picture is subjected to convolution, normalization, and max pooling procedures, and the convolution produces an output feature map of size 55 × 55. The second layer has 256 filters and is followed by a max pooling layer of size 3 × 3; feature maps of dimension 27 × 27 are produced at this stage. The size is then further decreased as the picture passes through convolution layers three, four, and five, yielding a 13 × 13 feature map. ReLU is used as the activation function for each layer in order to introduce nonlinearity. The last convolution layer is followed by two fully connected layers of 4096 units each, which operate on the vectorized feature maps. Lastly, a softmax-enabled output layer performs classification on the provided problem. To avoid model overfitting, a dropout layer with a ratio of 0.5 was added before the fully connected layer.
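The feature map sizes quoted for AlexNet follow from the usual output-size relation for convolution and pooling layers, out = floor((in + 2 × padding − kernel) / stride) + 1. The sketch below applies it using the strides and paddings of the standard AlexNet configuration, which are assumptions here because the text gives only the filter sizes.

```python
def output_size(input_size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (input_size + 2 * padding - kernel) // stride + 1


def alexnet_feature_sizes(input_size=227):
    """Track the feature map width through the AlexNet convolution stack.

    Strides and paddings follow the standard AlexNet configuration
    (an assumption; the text only states the filter sizes).
    """
    sizes = {}
    s = output_size(input_size, kernel=11, stride=4)   # conv1, 96 filters
    sizes["conv1"] = s
    s = output_size(s, kernel=3, stride=2)             # 3x3 max pooling
    sizes["pool1"] = s
    s = output_size(s, kernel=5, stride=1, padding=2)  # conv2, 256 filters
    sizes["conv2"] = s
    s = output_size(s, kernel=3, stride=2)             # 3x3 max pooling
    sizes["pool2"] = s
    for name in ("conv3", "conv4", "conv5"):           # 3x3, padding 1
        s = output_size(s, kernel=3, stride=1, padding=1)
        sizes[name] = s
    return sizes


sizes = alexnet_feature_sizes()
print(sizes["conv1"], sizes["conv2"], sizes["conv5"])  # 55 27 13
```

With a 227 × 227 input, the first convolution gives (227 − 11) / 4 + 1 = 55, and the same relation reproduces the 27 × 27 and 13 × 13 maps mentioned above.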

4.5. GoogLeNet Pretrained Network

Szegedy et al. [60] created the network architecture known as GoogLeNet. It excelled in challenges related to object identification and picture classification at ILSVRC 2014. The network is 22 layers deep, or 27 layers when the pooling layers are also counted. GoogLeNet, which accepts pictures with a size of 224 × 224, couples a total of nine inception modules with convolution layers, four max pooling layers, three average pooling layers, five fully connected layers, and three softmax layers. It has wide applications in fields such as robotics, face recognition, and adversarial training. The fully connected layers use ReLU as their activation function, supported by a dropout layer with a ratio of 0.5. The introduction of inception modules has given GoogLeNet the potential to handle a wide range of problems. The key benefit of employing inception modules is the decrease in computing time and dimensional complexity. These advantages are achieved because each module identifies complex features using convolutional filters of different sizes.
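The saving in computation offered by an inception module can be illustrated by counting multiplications in its 5 × 5 branch with and without a 1 × 1 reduction layer. The sketch below uses the channel sizes of the first inception module in the published GoogLeNet configuration (28 × 28 × 192 input, reduction to 16 channels, 32 output channels); these figures and the function name are assumptions not stated in the text above, and bias terms are ignored.

```python
def conv_multiplications(height, width, in_channels, out_channels, kernel):
    """Multiplications needed for one convolution layer (same-size output)."""
    return height * width * out_channels * kernel * kernel * in_channels


# Channel sizes of the first inception module in the published GoogLeNet
# configuration (an assumption of this sketch).
H = W = 28
IN_CHANNELS = 192
REDUCED_CHANNELS = 16
OUT_CHANNELS = 32

# 5x5 convolution applied directly to all 192 input channels.
direct = conv_multiplications(H, W, IN_CHANNELS, OUT_CHANNELS, 5)

# 1x1 reduction to 16 channels followed by the 5x5 convolution.
reduced = (conv_multiplications(H, W, IN_CHANNELS, REDUCED_CHANNELS, 1)
           + conv_multiplications(H, W, REDUCED_CHANNELS, OUT_CHANNELS, 5))

print(direct, reduced)  # 120422400 12443648
```

Under these assumptions the reduction layer cuts the work in this branch by roughly a factor of ten, which is the kind of decrease in computing time and dimensional complexity attributed to inception modules.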

5. Results and Discussion

Four pretrained models (ResNet-50, GoogLeNet, VGG-16, and AlexNet) were employed in this investigation. Their effectiveness in diagnosing IC engine gearbox faults is assessed in this section. A total of five experiments were run by varying the train-test split ratio, optimizer, batch size, epochs, and learning rate. MATLAB R2022a was used to conduct the experiments. A thorough investigation of the considered scenarios is described in the sections below.

5.1. Outcome of Varying Train-Test Split Ratio

The train-test split ratio is the ratio at which the complete dataset is divided into training and test datasets. The training dataset is used to teach image patterns to the adopted network, while the test dataset is used to evaluate the predictions of the trained model. A train-test split ratio of 0.8, for instance, indicates that 80% of the total samples were used to train the model and 20% were used to test it. Three load conditions (T2 = 13.3 Nm, T1 = 9.6 Nm, and no load) were taken into consideration in this study. The experimentation was carried out using five different ratios, namely 0.60, 0.70, 0.75, 0.80, and 0.85, while keeping all other hyperparameters constant (stochastic gradient descent with momentum [SGDM] solver, batch size of 10, 10 epochs, and learning rate of 0.0001). The results of varying the train-test split ratio for the four networks are shown in Figure 10a–c.
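A train-test split of this kind can be sketched as follows. The study does not state how the split was implemented, so the per-class (stratified) splitting, the round-half-up rule, the fixed random seed, and all names below are assumptions made purely for illustration.

```python
import random


def stratified_split(labels, train_percent, seed=0):
    """Split sample indices class by class.

    train_percent is an integer percentage (e.g. 80 for a 0.80 ratio).
    Per-class splitting and round-half-up are assumptions of this sketch.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_train = (len(shuffled) * train_percent + 50) // 100  # round half up
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test


# One load condition: 5 gear conditions with 30 radar plot images each.
labels = [condition for condition in range(5) for _ in range(30)]

split_sizes = {}
for percent in (60, 70, 75, 80, 85):
    train_idx, test_idx = stratified_split(labels, percent)
    split_sizes[percent] = (len(train_idx), len(test_idx))

print(split_sizes[80])  # (120, 30)
```

Under these assumptions, a ratio of 0.80 leaves 30 test images per load condition, so a single misclassified image changes the accuracy by about 3.3 percentage points. Small test sets of this size make the reported accuracies fairly coarse-grained.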

[figure(s) omitted; refer to PDF]

• No Load Condition: The classification performance of the pretrained networks for the no-load condition can be observed in Figure 10a. From the plot in Figure 10a, one can interpret that ResNet-50 produces 100% classification accuracy for all the train-test split ratios. However, to balance the training and testing of the network, a train-test split ratio of 0.80 was adopted for ResNet-50. On the other hand, VGG-16 and AlexNet also produced a maximum classification accuracy of 100% for train-test split ratios of 0.85 and 0.80, respectively. GoogLeNet obtained 96.7% classification accuracy for a train-test split ratio of 0.80.

• T1 (9.6 Nm) Load Condition: Figure 10b represents the classification accuracy of all the pretrained networks considered for T1 (9.6 Nm) load condition. From the plot in Figure 10b, one can observe that the highest classification accuracy was achieved by the VGG-16 network with 100% for 0.85 train-test split ratio. A classification accuracy of 97.1% was achieved by both ResNet-50 and AlexNet for 0.75 train-test split ratio. GoogLeNet produced a classification accuracy of 96.7% for 0.8 train-test split ratio.

• T2 (13.3 Nm) Load Condition: The plot in Figure 10c represents the classification accuracy of the adopted pretrained networks for T2 (13.3 Nm) load condition. The obtained results state that 100% classification accuracy was achieved by ResNet-50 and VGG-16 networks for train-test split ratios of 0.7 and 0.85, respectively. GoogLeNet attained a classification accuracy of 95.6% for 0.7 train-test split ratio, while AlexNet obtained 95% for 0.85 train-test split ratio.

• Observation and Significance: Train-test split ratios of 0.6, 0.7, 0.75, 0.8, and 0.85 were evaluated to find the ratio that achieves the highest classification accuracy for each load condition. The optimal train-test split ratio of 0.8 (for most cases) signifies that a larger training set allows the deep learning models to generalize across the radar plot data. However, a larger training share of 0.85 did not always improve the accuracy, indicating that a balanced train-test split is essential for achieving reliable performance. This result highlights the importance of choosing the amount of training data so as to avoid overfitting (excessive training) or underfitting (insufficient training).
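The accuracy values reported above change in discrete steps because each test set is small. The sketch below makes that resolution explicit. It assumes, as this section does not state directly, that the 450 images are spread evenly over the three load conditions and five gear conditions (30 images per class per load condition) and that the split is applied per class with rounding to the nearest integer; the function name is hypothetical.

```python
SAMPLES_PER_CLASS = 30   # assumption: 450 images / 3 loads / 5 gear conditions
NUM_CLASSES = 5          # healthy, 25%, 50%, 75%, and 100% defect


def holdout_size(train_percent, per_class=SAMPLES_PER_CLASS, classes=NUM_CLASSES):
    """Test images per load condition for a per-class split (round half up)."""
    train_per_class = (per_class * train_percent + 50) // 100
    return (per_class - train_per_class) * classes


for train_percent in (60, 70, 75, 80, 85):
    n_test = holdout_size(train_percent)
    one_error = 100.0 * (n_test - 1) / n_test   # accuracy with a single misclassification
    print(f"split 0.{train_percent}: {n_test} test images, "
          f"one error -> {one_error:.1f}%")
```

Under these assumptions, a single misclassified image corresponds to 96.7% at a 0.80 split, 97.1% at 0.75, 97.8% at 0.70, and 95.0% at 0.85, which is consistent with several of the values reported in this section. Differences of this size between networks may therefore amount to one test image.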

5.2. Outcome of Varying Optimizers

Optimizers are used to improve the accuracy and reduce the overall loss by adjusting attributes of the neural network such as its weights and learning rate. The optimizers varied in this experiment were adaptive moment estimation (ADAM), root mean square propagation (RMSprop), and SGDM. Based on the findings from Section 5.1, the top-performing train-test split ratio was fixed for each network for all the load conditions. Figure 11a–c shows how the choice of optimizer affects the performance of the trained networks.

[figure(s) omitted; refer to PDF]

• No Load Condition: The plot in Figure 11a details that 100% classification accuracy was obtained by ResNet-50 and VGG-16 network for the SGDM solver, while GoogLeNet produced the same accuracy for the RMSprop solver. AlexNet delivered an accuracy of 96.7% for the SGDM solver.

• T1 (9.6 Nm) Load Condition: From Figure 11b, one can infer that RMSprop optimized ResNet-50 to achieve the maximum classification accuracy of 100%. AlexNet and GoogLeNet were optimized by ADAM optimizer to deliver accuracies of 91.4% and 96.7%, respectively. VGG-16 obtained 90% accuracy for the SGDM solver.

• T2 (13.3 Nm) Load Condition: ResNet-50 and GoogLeNet produced maximum classification accuracies of 95.6% and 97.8%, respectively, with the ADAM optimizer, as is evident from Figure 11c. Additionally, the SGDM optimizer yielded the maximum classification accuracy for AlexNet and VGG-16, with 85% and 100%, respectively.

• Observation and Significance: Several optimizers (SGDM, ADAM, and RMSprop) were evaluated for performance. Some networks, like VGG-16, performed well with SGDM, while GoogLeNet performed well with RMSprop. The superior performance of VGG-16 with the SGDM optimizer reflects the optimizer's capability to make the model learn slowly and steadily for the fault diagnosis task, thereby avoiding sudden changes to the learning rate. On the other hand, RMSprop's strength in handling sparse gradients was better suited to GoogLeNet. This analysis provides an understanding of how optimizers aid model convergence to an optimal solution.
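The difference between the three optimizers compared above lies in how each turns a gradient into a weight update. The sketch below writes out the textbook update rules for a single scalar weight and applies them to the toy objective f(w) = w². It is an illustration only: the momentum, decay rates, epsilon, learning rate, and all function names are common defaults or hypothetical choices, not settings taken from this study, and the MATLAB implementations may differ in detail.

```python
import math


def sgdm_step(w, grad, state, lr=0.01, momentum=0.9):
    """SGD with momentum: a velocity term accumulates past gradients."""
    v = momentum * state.get("v", 0.0) - lr * grad
    state["v"] = v
    return w + v


def rmsprop_step(w, grad, state, lr=0.01, decay=0.9, eps=1e-8):
    """RMSprop: scale the step by a running average of squared gradients."""
    s = decay * state.get("s", 0.0) + (1 - decay) * grad ** 2
    state["s"] = s
    return w - lr * grad / (math.sqrt(s) + eps)


def adam_step(w, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: momentum plus RMSprop-style scaling, with bias correction."""
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", 0.0) + (1 - b1) * grad
    v = b2 * state.get("v", 0.0) + (1 - b2) * grad ** 2
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize(step, w0=1.0, iters=1000):
    """Run one optimizer on f(w) = w**2 and return the final weight."""
    w, state = w0, {}
    for _ in range(iters):
        grad = 2.0 * w  # derivative of w**2
        w = step(w, grad, state)
    return w


for name, step in (("SGDM", sgdm_step), ("RMSprop", rmsprop_step), ("ADAM", adam_step)):
    print(f"{name}: final |w| = {abs(minimize(step)):.2e}")
```

SGDM applies the same learning rate to every weight, whereas RMSprop and ADAM rescale each step by the recent gradient magnitude, which is one possible reason why different networks favored different optimizers.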

5.3. Outcome of Varying Batch Size

The term “batch size” describes how many training samples are used in each iteration. When the batch size is increased, the memory consumed also increases. The performance of the pretrained models was evaluated for batch sizes of 8, 10, 16, 24, and 32. Based on the outcomes from Sections 5.1 and 5.2, the best-performing train-test split ratio and optimizer were fixed for each network. Figure 12a–c shows the pretrained network performance for various batch sizes.

[figure(s) omitted; refer to PDF]

• No Load Condition: The plot in Figure 12a details that 100% classification accuracy was obtained by ResNet-50 and VGG-16 network for a batch size of 8, while GoogLeNet and AlexNet produced the same accuracy for 16 batch size.

• T1 (9.6 Nm) Load Condition: From Figure 12b, one can infer that batch sizes of 24 and 8 assisted ResNet-50 and VGG-16, respectively, to achieve the maximum classification accuracy of 100%. AlexNet and GoogLeNet delivered accuracies of 94.3% and 96.7% for batch sizes of 32 and 16, respectively.

• T2 (13.3 Nm) Load Condition: ResNet-50 and VGG-16 produced a maximum classification accuracy of 100% for a batch size of 16, as is evident from Figure 12c. Additionally, a batch size of 10 yielded the maximum classification accuracy for AlexNet and GoogLeNet, with 95% and 97.8%, respectively.

• Observation and Significance: The impact of batch sizes (8, 10, 16, 24, and 32) on model performance was analyzed in the present section. Smaller batch sizes like 8 and 16 produced higher classification accuracy in comparison to the larger batch sizes. The analysis of batch size highlights the trade-off between memory size and model performance. Smaller batch sizes lead to more frequent weight updates that improve the learning process, at the price of higher computational cost. On the other hand, larger batch sizes, though computationally efficient, compromise on classification accuracy. This observation signifies that smaller batch sizes aid in fine-tuning and shaping the model better.
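The link between batch size and update frequency noted above is simple arithmetic, sketched below. The figure of 120 training images is an assumption (a 0.80 split of an assumed 150 images per load condition), and the function name and the `drop_last` switch are hypothetical; whether the final incomplete mini-batch is discarded depends on the training software.

```python
import math

TRAIN_IMAGES = 120  # assumption: 0.80 split of 150 images per load condition


def updates_per_epoch(n_train, batch_size, drop_last=True):
    """Weight updates in one pass over the training set.

    drop_last=True discards a final incomplete mini-batch; False keeps it.
    """
    if drop_last:
        return n_train // batch_size
    return math.ceil(n_train / batch_size)


for batch_size in (8, 10, 16, 24, 32):
    per_epoch = updates_per_epoch(TRAIN_IMAGES, batch_size)
    print(f"batch size {batch_size:2d}: {per_epoch:2d} updates per epoch, "
          f"{10 * per_epoch} over 10 epochs")
```

With these assumed numbers, a batch size of 8 gives five times as many weight updates per epoch as a batch size of 32, so a network trained for a fixed number of epochs is adjusted far more often with small batches.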

5.4. Outcome of Varying Epochs

A hyperparameter called “epochs” determines how many times the learning algorithm will run over the whole training dataset. To assess the performance of the model, an experiment with 10, 20, and 30 epochs was conducted in this study. Based on the outcomes from Sections 5.1, 5.2, and 5.3, the best-performing train-test split ratio, optimizer, and batch size were fixed for each network. Figure 13a–c shows how the pretrained networks performed at various epochs.

[figure(s) omitted; refer to PDF]

• No Load Condition: The classification performance of the pretrained networks for the no-load condition can be observed in Figure 13a. From the plot in Figure 13a, one can interpret that ResNet-50, VGG-16, and GoogLeNet produced 100% classification accuracy for 20 epochs, whereas AlexNet delivered the same accuracy for 30 epochs.

• T1 (9.6 Nm) Load Condition: Figure 13b represents the classification accuracy of all the pretrained networks considered for T1 (9.6 Nm) load condition. From the plot in Figure 13b, one can observe that the highest classification accuracy was achieved by VGG-16 and ResNet-50 network with 100% for 20 and 10 epochs, respectively. A classification accuracy of 96.7% was achieved by GoogLeNet for 30 epochs. AlexNet produced a classification accuracy of 88.6% for 10 epochs.

• T2 (13.3 Nm) Load Condition: The plot in Figure 13c represents the classification accuracy of the adopted pretrained networks for T2 (13.3 Nm) load condition. The obtained results state that 100% classification accuracy was achieved by the VGG-16 network for 30 epochs. GoogLeNet, ResNet-50, and AlexNet attained classification accuracies of 95.6%, 93.3%, and 95%, respectively, for 20 epochs.

• Observation and Significance: Testing different epochs (10, 20, 30) revealed that most networks achieved peak performance around 20 epochs with diminishing returns at 30 epochs. The optimal performance at 20 epochs signifies that the networks need sufficient training to capture the complex features in the radar plots. However, increasing the epochs beyond this resulted in either no improvement or overfitting. This suggests that fault classification with vibration signals reaches its limit of generalization at a moderate level of training, thereby balancing between learning the data well and avoiding excessive memorization.

5.5. Effect of Learning Rate

The rate at which a model adjusts to a problem depends on its learning rate. Choosing an appropriate learning rate can be difficult, since a high learning rate can lead to ineffective training, while a low learning rate can lead to an increase in computing time. Learning rates of 0.001, 0.0001, and 0.0003 were employed to assess the model performance. Considering the outcomes of Sections 5.1, 5.2, 5.3, and 5.4, the optimal train-test split ratio, optimizer, batch size, and epochs were fixed for each network. The effectiveness of each pretrained network for various learning rates is shown in Figure 14a–c.

[figure(s) omitted; refer to PDF]

• No Load Condition: The plot in Figure 14a details that 100% classification accuracy was obtained by AlexNet and VGG-16 network for a learning rate of 0.0001, while ResNet-50 produced the same accuracy for 0.001 learning rate. GoogLeNet delivered 96.7% accuracy for 0.0001 learning rate.

• T1 (9.6 Nm) Load Condition: From Figure 14b, one can infer that learning rates of 0.0001 and 0.0003 assisted GoogLeNet and VGG-16, respectively, to achieve the maximum classification accuracy of 100%. AlexNet and ResNet-50 delivered accuracies of 85.7% and 94.2%, respectively, for a learning rate of 0.0001.

• T2 (13.3 Nm) Load Condition: AlexNet and VGG-16 produced a maximum classification accuracy of 100% for a learning rate of 0.0001, as is evident from Figure 14c. Additionally, GoogLeNet and ResNet-50 both reached a maximum classification accuracy of 97.8% for a learning rate of 0.0001.

• Observation and Significance: Among the various learning rates (0.0001, 0.0003, 0.001) explored, 0.0001 emerged as the best rate for most networks. The learning rate analysis demonstrates the importance of finding the correct learning speed for the network. A lower learning rate (0.0001) allowed the models to adjust the weights slowly and accurately, thereby reducing the risk of missing the optimal solution. A higher learning rate could lead to overshooting, which explains why the models underperformed. This shows that in delicate fault diagnosis tasks where the signal patterns are subtle, a lower learning rate helps the models converge more reliably.
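The overshooting behavior mentioned above can be reproduced on a toy problem. The sketch below runs plain gradient descent on f(w) = w², for which each step multiplies w by (1 − 2 × learning rate). The function name and the learning rates are hypothetical choices for illustration; their absolute values are specific to this toy function and are not comparable with the rates of 0.0001 to 0.001 used for the networks.

```python
def gradient_descent(learning_rate, w0=1.0, steps=50):
    """Minimize f(w) = w**2 with plain gradient descent; return the final |w|."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * 2.0 * w   # gradient of w**2 is 2w
    return abs(w)


for lr in (0.001, 0.1, 0.9, 1.1):
    print(f"learning rate {lr}: |w| after 50 steps = {gradient_descent(lr):.3g}")
```

The smallest rate barely moves the weight in 50 steps, the intermediate rates converge (the rate of 0.9 does so while jumping across the minimum on every step), and the largest rate diverges. This is the same trade-off between slow training and overshooting that is discussed in this section.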

5.6. Performance Comparison of the Adopted Pretrained Models

This section presents the performance assessment of the pretrained networks. Based on the findings from the preceding sections, the hyperparameters that gave the best performance for each pretrained model were identified, and they are listed in Table 4. Figure 15 compares the overall effectiveness of the pretrained networks with their best hyperparameters. Figure 15 shows that VGG-16 achieved the best performance under all load conditions. Hence, VGG-16 can be recommended for diagnosing IC engine gearbox faults. Figure 16a–c shows the training progress of the best-performing VGG-16 pretrained network. Furthermore, the confusion matrices of VGG-16 for each load condition are presented in Figure 17a–c. From the confusion matrices and training progress, one can infer that the VGG-16 pretrained network classified all test samples without any misclassification. These observations indicate that the training loss converged to zero and that the network successfully learned the features for all three load conditions.

[figure(s) omitted; refer to PDF]
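As a reminder of how a confusion matrix translates into the accuracy figures quoted in this section, the sketch below computes overall accuracy and per-class recall. The matrices are hypothetical and are not copied from Figure 17: they assume five gear conditions with four test images each, and the function names are illustrative.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


# Hypothetical 5-class matrix (rows: true class, columns: predicted class),
# assuming 4 test images per gear condition; not copied from Figure 17.
perfect = [[4 if i == j else 0 for j in range(5)] for i in range(5)]

# Same test set with one healthy sample predicted as a 25% defect.
one_error = [row[:] for row in perfect]
one_error[0][0], one_error[0][1] = 3, 1

print(accuracy_from_confusion(perfect))    # 1.0
print(accuracy_from_confusion(one_error))  # 0.95
```

A purely diagonal matrix, as reported for VGG-16, gives an accuracy of 1.0, while a single off-diagonal entry in a test set of this assumed size already lowers it to 0.95.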

Table 4

The list of optimal hyperparameters selected for the study.

Load condition	Pretrained network	Train-test split ratio	Optimizer	Batch size	Epochs	Learning rate
No load	AlexNet	0.80	SGDM	16	30	0.0001
	GoogLeNet	0.80	RMSprop	16	20	0.0001
	ResNet-50	0.80	SGDM	8	20	0.001
	VGG-16	0.85	SGDM	8	20	0.0001

T1 (9.6 Nm)	AlexNet	0.75	ADAM	32	10	0.0001
	GoogLeNet	0.80	ADAM	16	30	0.0001
	ResNet-50	0.75	RMSprop	24	10	0.0001
	VGG-16	0.85	SGDM	8	20	0.0003

T2 (13.3 Nm)	AlexNet	0.85	SGDM	10	20	0.0001
	GoogLeNet	0.70	ADAM	10	20	0.0001
	ResNet-50	0.70	ADAM	16	20	0.0001
	VGG-16	0.85	SGDM	16	30	0.0001

Abbreviations: ADAM, adaptive moment estimation; ResNet, residual network; RMSprop, root mean square propagation; SGDM, stochastic gradient descent with momentum; VGG-16, Visual Geometry Group 16.
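The tuning procedure of Sections 5.1 to 5.5, in which each hyperparameter is varied in turn while the best values found so far are held fixed, can be sketched as a one-at-a-time search. The candidate grids and the initial solver, batch size, epochs, and learning rate follow the text; everything else is an assumption for illustration: the function names are hypothetical, ties are broken by taking the first candidate, and `toy_evaluate` is a stand-in for training a network and measuring its test accuracy (its target values merely echo one row of Table 4).

```python
SEARCH_ORDER = [
    ("split", [0.60, 0.70, 0.75, 0.80, 0.85]),
    ("optimizer", ["sgdm", "adam", "rmsprop"]),
    ("batch_size", [8, 10, 16, 24, 32]),
    ("epochs", [10, 20, 30]),
    ("learning_rate", [0.001, 0.0001, 0.0003]),
]

# Values held constant until their own experiment is reached (Section 5.1).
# The initial split is arbitrary because it is the first value to be varied.
BASELINE = {"split": 0.80, "optimizer": "sgdm", "batch_size": 10,
            "epochs": 10, "learning_rate": 0.0001}


def sequential_search(evaluate, order=SEARCH_ORDER, baseline=BASELINE):
    """Tune one hyperparameter at a time, freezing the best value found so far."""
    config = dict(baseline)
    for name, candidates in order:
        scores = []
        for value in candidates:
            trial = dict(config, **{name: value})
            scores.append((evaluate(trial), value))
        best_score = max(score for score, _ in scores)
        # Ties are resolved in favour of the first candidate listed.
        config[name] = next(v for s, v in scores if s == best_score)
    return config


def toy_evaluate(config):
    """Hypothetical stand-in for 'train the network and return test accuracy'."""
    target = {"split": 0.85, "optimizer": "sgdm", "batch_size": 8,
              "epochs": 20, "learning_rate": 0.0003}
    return sum(config[k] == v for k, v in target.items()) / len(target)


print(sequential_search(toy_evaluate))
```

A search of this kind needs 5 + 3 + 5 + 3 + 3 = 19 training runs per network and load condition, against 5 × 3 × 5 × 3 × 3 = 675 for a full grid, but it can miss combinations whose benefit only appears when two hyperparameters change together.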

5.7. Performance Comparison With Various State of the Art Techniques

To diagnose gearbox faults, a variety of machine learning techniques have been utilized in the literature. In this study, the effectiveness of several pretrained network models was evaluated, and the top-performing network was selected. The achieved outcome is compared with other state-of-the-art techniques reported in the literature in Table 5. From Table 5, it can be seen that VGG-16 outperformed these techniques in identifying gearbox defects. In light of this performance assessment, VGG-16 can be recommended for real-time fault diagnosis in the gearbox system. The experiment was also conducted five times in order to account for run-to-run variability, and the outcomes remained the same each time.

Table 5

Performance comparison of state-of-the-art techniques.

State-of-the-art method	Classification accuracy (%)	Reference
kNN	96.83	[20]
K-star	97.50	[20]
LWL	95.66	[20]

LSTM	94.33	[64]

SVM	93.87	[65]
ANN	95.97	[65]
CNN	98.30	[65]
SVM integrating CNN	99.72	[65]

VGG-16 (proposed)	100.00	—

Abbreviations: ANN, artificial neural network; CNN, convolutional neural network; kNN, k-nearest neighbor; LSTM, long short-term memory; LWL, locally weighted learning; SVM, support vector machine; VGG-16, Visual Geometry Group 16.

6. Conclusion

In the current research, the transfer learning approach was used to identify faults in an IC engine gearbox. An experimental test rig was built, and vibration radar plots were gathered. A total of 450 samples were collected under three distinct load conditions (no load, T1 = 9.6 Nm, and T2 = 13.3 Nm) for each condition of the gearbox: healthy, 25% defect, 50% defect, 75% defect, and 100% defect. On the basis of the vibration radar plots, four pretrained deep learning models (GoogLeNet, VGG-16, ResNet-50, and AlexNet) were used to identify the defects in the IC engine gearbox. The ideal hyperparameters for each network were found by varying the train-test split ratio, optimizer, batch size, epochs, and learning rate. With 100% accuracy, VGG-16 outperformed ResNet-50 (97.3%), GoogLeNet (98.1%), and AlexNet (95.23%). In light of this performance assessment, VGG-16 can be recommended for real-time fault diagnosis in the gearbox system. The key benefits of the suggested method are shorter calculation time, increased accuracy, resource conservation, and support for training models when only unlabeled datasets are available. The main limitation is the cost of the sensor, which is considerable. Using microelectromechanical systems (MEMS) sensors may lower the sensor cost and thereby support new product development.

References

[1] J. Wang, S. Li, Y. Xin, Z. An, "Gear Fault Intelligent Diagnosis Based on Frequency-Domain Feature Extraction," Journal of Vibration Engineering & Technologies, vol. 7 no. 2, pp. 159-166, DOI: 10.1007/s42417-019-00089-1, 2019.

[2] J. Sun, L. Wang, J. Li, F. Li, J. Li, H. Lu, "Online Oil Debris Monitoring of Rotating Machinery: A Detailed Review of More Than Three Decades," Mechanical Systems and Signal Processing, vol. 149, DOI: 10.1016/j.ymssp.2020.107341, 2021.

[3] D. Pradhan, A. K. Mishra, "Analysis of ISO VG 68 Bearing Oil for Condition Monitoring Collected From An Externally Pressurized Ball Bearing System," Materials Today: Proceedings, vol. 44, pp. 4602-4606, DOI: 10.1016/j.matpr.2020.10.831, 2021.

[4] C. Li, M. Liang, "Extraction of Oil Debris Signature Using Integral Enhanced Empirical Mode Decomposition and Correlated Reconstruction," Measurement Science and Technology, vol. 22 no. 8, DOI: 10.1088/0957-0233/22/8/085701, 2011.

[5] A. M. D. Younus, B.-S. Yang, "Intelligent Fault Diagnosis of Rotating Machinery Using Infrared Thermal Image," Expert Systems with Applications, vol. 39 no. 2, pp. 2082-2091, DOI: 10.1016/j.eswa.2011.08.004, 2012.

[6] Y. Li, X. Du, X. Wang, S. Si, "Industrial Gearbox Fault Diagnosis Based on Multi-Scale Convolutional Neural Networks and Thermal Imaging," ISA Transactions, vol. 129, pp. 309-320, DOI: 10.1016/j.isatra.2022.02.048, 2022.

[7] Y. Li, X. Du, F. Wan, X. Wang, H. Yu, "Rotating Machinery Fault Diagnosis Based on Convolutional Neural Network and Infrared Thermal Imaging," Chinese Journal of Aeronautics, vol. 33 no. 2, pp. 427-438, DOI: 10.1016/j.cja.2019.08.014, 2020.

[8] H. Zhiyi, S. Haidong, Z. Xiang, Y. Yu, C. Junsheng, "An Intelligent Fault Diagnosis Method for Rotor-Bearing System Using Small Labeled Infrared Thermal Images and Enhanced CNN Transferred From CAE," Advanced Engineering Informatics, vol. 46, DOI: 10.1016/j.aei.2020.101150, 2020.

[9] O. AlShorman, F. Alkahatni, M. Masadeh, "Sounds and Acoustic Emission-Based Early Fault Diagnosis of Induction Motor: A Review Study," Advances in Mechanical Engineering, vol. 13 no. 2, DOI: 10.1177/1687814021996915, 2021.

[10] A. Glowacz, R. Tadeusiewicz, S. Legutko, "Fault Diagnosis of Angle Grinders and Electric Impact Drills Using Acoustic Signals," Applied Acoustics, vol. 179, DOI: 10.1016/j.apacoust.2021.108070, 2021.

[11] C. K. Madhusudana, H. Kumar, S. Narendranath, "Face Milling Tool Condition Monitoring Using Sound Signal," International Journal of System Assurance Engineering and Management, vol. 8 no. 2, pp. 1643-1653, 2017.

[12] T. Toutountzakis, C. K. Tan, D. Mba, "Application of Acoustic Emission to Seeded Gear Fault Detection," NDT & E International, vol. 38 no. 1, pp. 27-36, DOI: 10.1016/j.ndteint.2004.06.008, 2005.

[13] S. J. Kim, K. Kim, T. Hwang, "Motor-Current-Based Electromagnetic Interference De-Noising Method for Rolling Element Bearing Diagnosis Using Acoustic Emission Sensors," Measurement, vol. 193, DOI: 10.1016/j.measurement.2022.110912, 2022.

[14] H. Cui, Y. Guan, H. Chen, "Rolling Element Fault Diagnosis Based on VMD and Sensitivity MCKD," IEEE Access, vol. 9, pp. 120297-120308, DOI: 10.1109/ACCESS.2021.3108972, 2021.

[15] A. Kumar, C. P. Gandhi, G. Vashishtha, "VMD Based Trigonometric Entropy Measure: A Simple and Effective Tool for Dynamic Degradation Monitoring of Rolling Element Bearing," Measurement Science and Technology, vol. 33 no. 1, DOI: 10.1088/1361-6501/ac2fe8, 2021.

[16] M. Liang, P. Cao, J. Tang, "Rolling Bearing Fault Diagnosis Based on Feature Fusion With Parallel Convolutional Neural Network," The International Journal of Advanced Manufacturing Technology, vol. 112 no. 3-4, pp. 819-831, DOI: 10.1007/s00170-020-06401-8, 2021.

[17] L. Pan, L. Zhao, A. Song, S. She, S. Wang, "Research on Gear Fault Diagnosis Based on Feature Fusion Optimization and Improved Two Hidden Layer Extreme Learning Machine," Measurement, vol. 177, DOI: 10.1016/j.measurement.2021.109317, 2021.

[18] R. F. R. Junior, I. A. dos S. Areias, G. F. Gomes, "Fault Detection and Diagnosis Using Vibration Signal Analysis in Frequency Domain for Electric Motors Considering Different Real Fault Types," Sensor Review, vol. 41 no. 3, pp. 311-319, DOI: 10.1108/SR-02-2021-0052, 2021.

[19] V. Indira, R. Vasanthakumari, R. Jegadeeshwaran, V. Sugumaran, "Determination of Minimum Sample Size for Fault Diagnosis of Automobile Hydraulic Brake System Using Power Analysis," Engineering Science and Technology, an International Journal, vol. 18 no. 1, pp. 59-69, DOI: 10.1016/j.jestch.2014.09.007, 2015.

[20] K. N. Ravikumar, C. K. Madhusudana, H. Kumar, K. V. Gangadharan, "Classification of Gear Faults in Internal Combustion (IC) Engine Gearbox Using Discrete Wavelet Transform Features and K Star Algorithm," Engineering Science and Technology, an International Journal, vol. 30, DOI: 10.1016/j.jestch.2021.08.005, 2022.

[21] G. Li, Y. Li, H. Chen, W. Deng, "Fractional-Order Controller for Course-Keeping of Underactuated Surface Vessels Based on Frequency Domain Specification and Improved Particle Swarm Optimization Algorithm," Applied Sciences, vol. 12 no. 6, DOI: 10.3390/app12063139, 2022.

[22] A. Mehta, D. Goyal, A. Choudhary, B. S. Pabla, S. Belghith, "Machine Learning-Based Fault Diagnosis of Self-Aligning Bearings for Rotating Machinery Using Infrared Thermography," Mathematical Problems in Engineering, vol. 2021, 2021.

[23] W. Deng, X. Zhang, Y. Zhou, "An Enhanced Fast Non-Dominated Solution Sorting Genetic Algorithm for Multi-Objective Problems," Information Sciences, vol. 585, pp. 441-453, DOI: 10.1016/j.ins.2021.11.052, 2022.

[24] V. Muralidharan, V. Sugumaran, "Feature Extraction Using Wavelets and Classification Through Decision Tree Algorithm for Fault Diagnosis of Mono-Block Centrifugal Pump," Measurement, vol. 46 no. 1, pp. 353-359, DOI: 10.1016/j.measurement.2012.07.007, 2013.

[25] K. Ostad-Ali-Askari, M. Shayannejad, H. Ghorbanizadeh-Kharazi, "Artificial Neural Network for Modeling Nitrate Pollution of Groundwater in Marginal Area of Zayandeh-Rood River, Isfahan, Iran," KSCE Journal of Civil Engineering, vol. 21 no. 1, pp. 134-140, DOI: 10.1007/s12205-016-0572-8, 2017.

[26] H. Chen, F. Miao, Y. Chen, Y. Xiong, T. Chen, "A Hyperspectral Image Classification Method Using Multifeature Vectors and Optimized KELM," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 2781-2795, DOI: 10.1109/JSTARS.2021.3059451, 2021.

[27] S. Natarajan, "Vibration Signal Analysis Using Histogram Features and Support Vector Machine for Gear Box Fault Diagnosis," International Journal of Systems, Control and Communications, vol. 8 no. 1, DOI: 10.1504/IJSCC.2017.081542, 2017.

[28] G. Chakrapani, V. Sugumaran, "Transfer Learning Based Fault Diagnosis of Automobile Dry Clutch System," Engineering Applications of Artificial Intelligence, vol. 117, DOI: 10.1016/j.engappai.2022.105522, 2023.

[29] G. E. Hinton, A. Krizhevsky, I. Sutskever, "Imagenet Classification With Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, vol. 25 no. 1, pp. 1106-1114, 2012.

[30] Y. Bengio, "Learning Deep Architectures for AI," Foundations and Trends® in Machine Learning, vol. 2 no. 1, DOI: 10.1561/2200000006, 2009.

[31] Y. Bengio, A. Courville, "Deep Learning of Representations," Handbook on Neural Information Processing, vol. 49, DOI: 10.1007/978-3-642-36657-4, 2013.

[32] Y. Bengio, A. Courville, P. Vincent, "Representation Learning: A Review and New Perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35 no. 8, pp. 1798-1828, DOI: 10.1109/TPAMI.2013.50, 2013.

[33] Z. Chen, X. Zeng, W. Li, G. Liao, "Machine Fault Classification Using Deep Belief Network," DOI: 10.1109/I2MTC.2016.7520473.

[34] H. Shao, H. Jiang, X. Zhang, M. Niu, "Rolling Bearing Fault Diagnosis Using an Optimization Deep Belief Network," Measurement Science and Technology, vol. 26 no. 11, DOI: 10.1088/0957-0233/26/11/115002, 2015.

[35] N. K. Verma, V. K. Gupta, M. Sharma, R. K. Sevakula, "Intelligent Condition Based Monitoring of Rotating Machines Using Sparse Auto-Encoders," DOI: 10.1109/ICPHM.2013.6621447.

[36] Z. Chen, C. Li, R.-V. Sánchez, "Multi-Layer Neural Network With Deep Belief Network for Gearbox Fault Diagnosis," Journal of Vibroengineering, vol. 17 no. 5, pp. 2379-2392, 2015.

[37] R. Zhao, R. Yan, J. Wang, K. Mao, "Learning to Monitor Machine Health With Convolutional Bi-Directional LSTM Networks," Sensors, vol. 17 no. 2, DOI: 10.3390/s17020273, 2017.

[38] X. Guo, L. Chen, C. Shen, "Hierarchical Adaptive Deep Convolution Neural Network and Its Application to Bearing Fault Diagnosis," Measurement, vol. 93, pp. 490-502, DOI: 10.1016/j.measurement.2016.07.054, 2016.

[39] T. Nowakowski, F. Tomaszewski, P. Komorski, G. M. Szymański, "Tram Gearbox Condition Monitoring Method Based on Trackside Acoustic Measurement," Measurement, vol. 207, DOI: 10.1016/j.measurement.2022.112358, 2023.

[40] I. M. Jamadar, R. Nithin, S. Nagashree, "Spur Gear Fault Detection Using Design of Experiments and Support Vector Machine (SVM) Algorithm," Journal of Failure Analysis and Prevention, vol. 23 no. 5, pp. 2014-2028, DOI: 10.1007/s11668-023-01742-4, 2023.

[41] A. Afia, F. Gougam, C. Rahmoune, W. Touzout, H. Ouelmokhtar, D. Benazzouz, "Gearbox Fault Diagnosis Using REMD, EO and Machine Learning Classifiers," Journal of Vibration Engineering & Technologies, vol. 12 no. 3, pp. 4673-4697, DOI: 10.1007/s42417-023-01144-8, 2024.

[42] Y. Liu, J. Kang, L. Wen, Y. Bai, C. Guo, W. Yu, "Fault Diagnosis Algorithm of Gearboxes Based on GWO-SCE Adaptive Multi-Threshold Segmentation and Subdomain Adaptation," Processes, vol. 11 no. 2, DOI: 10.3390/pr11020556, 2023.

[43] Y. Liu, H. Jiang, R. Yao, T. Zeng, "Counterfactual-Augmented Few-Shot Contrastive Learning for Machinery Intelligent Fault Diagnosis With Limited Samples," Mechanical Systems and Signal Processing, vol. 216, DOI: 10.1016/j.ymssp.2024.111507, 2024.

[44] Y. E. Karabacak, N. G. Özmen, "Common Spatial Pattern-Based Feature Extraction and Worm Gear Fault Detection Through Vibration and Acoustic Measurements," Measurement, vol. 187, DOI: 10.1016/j.measurement.2021.110366, 2022.

[45] Y. E. Karabacak, N. G. Özmen, L. Gümüşel, "Intelligent Worm Gearbox Fault Diagnosis Under Various Working Conditions Using Vibration, Sound and Thermal Features," Applied Acoustics, vol. 186, DOI: 10.1016/j.apacoust.2021.108463, 2022.

[46] Y. E. Karabacak, N. G. Özmen, L. Gümüşel, "Worm Gear Condition Monitoring and Fault Detection From Thermal Images via Deep Learning Method," Eksploatacja i Niezawodność—Maintenance and Reliability, vol. 22 no. 3, pp. 544-556, DOI: 10.17531/ein.2020.3.18, 2020.

[47] Y. Liu, H. Jiang, C. Liu, W. Yang, W. Sun, "Data-Augmented Wavelet Capsule Generative Adversarial Network for Rolling Bearing Fault Diagnosis," Knowledge-Based Systems, vol. 252, DOI: 10.1016/j.knosys.2022.109439, 2022.

[48] Y. Liu, H. Jiang, R. Yao, H. Zhu, "Interpretable Data-Augmented Adversarial Variational Autoencoder With Sequential Attention for Imbalanced Fault Diagnosis," Journal of Manufacturing Systems, vol. 71, pp. 342-359, DOI: 10.1016/j.jmsy.2023.09.019, 2023.

[49] Y.-P. Zhao, Y.-B. Chen, "Extreme Learning Machine Based Transfer Learning for Aero Engine Fault Diagnosis," Aerospace Science and Technology, vol. 121, DOI: 10.1016/j.ast.2021.107311, 2022.

[50] S. Wang, Q. Wang, Y. Xiao, W. Liu, M. Shang, "Research on Rotor System Fault Diagnosis Method Based on Vibration Signal Feature Vector Transfer Learning," Engineering Failure Analysis, vol. 139, DOI: 10.1016/j.engfailanal.2022.106424, 2022.

[51] H. Zhong, Y. Lv, R. Yuan, D. Yang, "Bearing Fault Diagnosis Using Transfer Learning and Self-Attention Ensemble Lightweight Convolutional Neural Network," Neurocomputing, vol. 501, pp. 765-777, DOI: 10.1016/j.neucom.2022.06.066, 2022.

[52] X. Li, H. Jiang, M. Xie, T. Wang, R. Wang, Z. Wu, "A Reinforcement Ensemble Deep Transfer Learning Network for Rolling Bearing Fault Diagnosis With Multi-Source Domains," Advanced Engineering Informatics, vol. 51, DOI: 10.1016/j.aei.2021.101480, 2022.

[53] C. Huo, Q. Jiang, Y. Shen, C. Qian, Q. Zhang, "New Transfer Learning Fault Diagnosis Method of Rolling Bearing Based on ADC-CNN and LATL Under Variable Conditions," Measurement, vol. 188, DOI: 10.1016/j.measurement.2021.110587, 2022.

[54] D. Ruan, F. Zhang, J. Yan, "Transfer Learning Between Different Working Conditions on Bearing Fault Diagnosis Based on Data Augmentation," IFAC-PapersOnLine, vol. 54 no. 1, pp. 1193-1199, DOI: 10.1016/j.ifacol.2021.08.141, 2021.

[55] Y. Z. Liu, K. M. Shi, Z. X. Li, G. F. Ding, Y. S. Zou, "Transfer Learning Method for Bearing Fault Diagnosis Based on Fully Convolutional Conditional Wasserstein Adversarial Networks," Measurement, vol. 180, DOI: 10.1016/j.measurement.2021.109553, 2021.

[56] J. Li, M. Lin, Y. Li, X. Wang, "Transfer Learning With Limited Labeled Data for Fault Diagnosis in Nuclear Power Plants," Nuclear Engineering and Design, vol. 390, DOI: 10.1016/j.nucengdes.2022.111690, 2022.

[57] J. Li, M. Lin, Y. Li, X. Wang, "Transfer Learning Network for Nuclear Power Plant Fault Diagnosis With Unlabeled Data Under Varying Operating Conditions," Energy, vol. 254, DOI: 10.1016/j.energy.2022.124358, 2022.

[58] F. Jamil, T. Verstraeten, A. Nowé, C. Peeters, J. Helsen, "A Deep Boosted Transfer Learning Method for Wind Turbine Gearbox Fault Detection," Renewable Energy, vol. 197, pp. 331-341, DOI: 10.1016/j.renene.2022.07.117, 2022.

[59] Y. Zhu, C. Zhu, J. Tan, Y. Tan, L. Rao, "Anomaly Detection and Condition Monitoring of Wind Turbine Gearbox Based on LSTM-FS and Transfer Learning," Renewable Energy, vol. 189, pp. 90-103, DOI: 10.1016/j.renene.2022.02.061, 2022.

[60] C. Szegedy, W. Liu, Y. Jia, "Going Deeper With Convolutions," DOI: 10.1109/CVPR.2015.7298594.

[61] K. He, X. Zhang, S. Ren, J. Sun, "Deep Residual Learning for Image Recognition," pp. 770-778, DOI: 10.1109/CVPR.2016.90.

[62] A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet Classification With Deep Convolutional Neural Networks," Communications of the ACM, vol. 60 no. 6, pp. 84-90, DOI: 10.1145/3065386, 2017.

[63] K. Simonyan, A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition."

[64] K. N. Ravikumar, A. Yadav, H. Kumar, K. V. Gangadharan, A. V. Narasimhadhan, "Gearbox Fault Diagnosis Based on Multi-Scale Deep Residual Learning and Stacked LSTM Model," Measurement, vol. 186, DOI: 10.1016/j.measurement.2021.110099, 2021.

[65] Z. Chen, C. Liu, K. Gryllias, W. Li, "Gearbox Fault Diagnosis Using Convolutional Neural Networks and Support Vector Machines," DOI: 10.23919/EUSIPCO.2019.8902686.


Copyright © 2024 S. Naveen Venkatesh et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/

Abstract


Due to constant loads, gear wear, and harsh working conditions, gearboxes are subject to fault occurrences. Faults in the gearbox can cause damage to the engine components, create unnecessary noise, degrade efficiency, and impact power transfer. Hence, the detection of faults at an early stage is highly necessary. In this work, an effort was made to use transfer learning to identify gear failures under five gear conditions (healthy condition, 25% defect, 50% defect, 75% defect, and 100% defect) and three load conditions (no load, T1 = 9.6 Nm, and T2 = 13.3 Nm). Vibration signals were collected for various gear and load conditions using an accelerometer mounted on the casing of the gearbox. The load was applied using an eddy current dynamometer on the output shaft of the engine. The obtained vibration signals were processed and stored as vibration radar plots. Residual network (ResNet)-50, GoogLeNet, Visual Geometry Group 16 (VGG-16), and AlexNet were the network models used for transfer learning in this study. Hyperparameters, including learning rate, optimizer, train-test split ratio, batch size, and epochs, were varied in order to achieve the highest classification accuracy for each pretrained network. From the results obtained, the VGG-16 pretrained network outperformed all other networks with a classification accuracy of 100%.

Details

Title

Transfer Learning-Based Fault Diagnosis of Internal Combustion (IC) Engine Gearbox Using Radar Plots

Author

S Naveen Venkatesh¹; Srivatsan, B²; Sugumaran, V²; Ravikumar, K N³; Kumar, Hemantha⁴; Vetri Selvi Mahamuni⁵

¹ Division of Operation and Maintenance Engineering Luleå University of Technology Luleå Norbotten Sweden
² School of Mechanical Engineering (SMEC) Vellore Institute of Technology Chennai Tamil Nadu India
³ School of Technology (Mechanical Engineering) Gati Shakti Vishwavidyalaya (A Central University, Under Ministry of Railways, Govt of India) Lalbaugh Vadodara Gujarat India
⁴ Department of Mechanical Engineering National Institute of Technology Surathkal Karnataka India
⁵ Department of Project Management Mettu University Mettu Ethiopia

Editor

Tomasz Wandowski

Publication year

2024

Publication date

2024

Publisher

John Wiley & Sons, Inc.

ISSN

1687725X

e-ISSN

16877268

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1155/js/8869808

ProQuest document ID

3144715339
