Article

Domain Adaptation for Bearing Fault Diagnosis Based on SimAM and Adaptive Weighting Strategy

1
School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300384, China
2
Maritime College, Tianjin University of Technology, Tianjin 300384, China
3
Engineering Training Center, Tianjin University of Technology, Tianjin 300384, China
4
Institute of Intelligent Control and Fault Diagnosis, Tianjin University of Technology, Tianjin 300384, China
*
Author to whom correspondence should be addressed.
Sensors 2024, 24(13), 4251; https://doi.org/10.3390/s24134251
Submission received: 1 June 2024 / Revised: 22 June 2024 / Accepted: 26 June 2024 / Published: 30 June 2024
(This article belongs to the Special Issue Feature Papers in Fault Diagnosis & Sensors 2024)

Abstract

Domain adaptation techniques are crucial for addressing the discrepancies between training and testing data distributions caused by varying operational conditions in practical bearing fault diagnosis. However, transfer fault diagnosis faces significant challenges under complex conditions with dispersed data and distinct distribution differences. Hence, this paper proposes CWT-SimAM-DAMS, a domain adaptation method for bearing fault diagnosis based on SimAM and an adaptive weighting strategy. The proposed scheme first uses Continuous Wavelet Transform (CWT) and Unsharp Masking (USM) for data preprocessing, and then feature extraction is performed using the Residual Network (ResNet) integrated with the SimAM module. This is combined with the proposed adaptive weighting strategy based on Joint Maximum Mean Discrepancy (JMMD) and Conditional Adversarial Domain Adaption Network (CDAN) domain adaptation algorithms, which minimizes the distribution differences between the source and target domains more effectively, thus enhancing domain adaptability. The proposed method is validated on two datasets, and experimental results show that it improves the accuracy of bearing fault diagnosis.

1. Introduction

With the advent of Industry 4.0, modern information technology has become deeply integrated with manufacturing, leading to significant advancements in machine manufacturing and industrial production. Rotating machinery equipment is extensively used in these fields, and bearings, as key mechanical components of such machinery, affect the safe operation of the equipment in its entirety [1,2,3]. Statistics show that about 40% of failures in rotating equipment are caused by bearing faults [4]. Thus, accurate and real-time detection of bearing faults is essential for the smooth progress of mechanical manufacturing and industrial production.
With the boom in big data and artificial intelligence technology, data-driven intelligent fault diagnosis methods have become a key research focus in recent years [5,6,7]. Data processing plays a key role in the effectiveness of fault diagnosis. Raw bearing fault signals typically reflect time domain information and, after further processing, can reveal frequency domain information. However, a model that considers only time domain or only frequency domain information often performs suboptimally on nonlinear bearing fault signals. Therefore, attention has been given to the Continuous Wavelet Transform (CWT), which reflects time and frequency domain information simultaneously. Gu et al. [8] proposed a hybrid deep learning model for fault diagnosis that effectively extracts fault features from bearings and handles small sample datasets. This model uses variational mode decomposition (VMD) [9] and CWT algorithms for data processing and employs a convolutional neural network (CNN) [10] for model training. Cheng et al. [11] introduced a rotating machinery diagnosis method based on the CWT and Local Binary Convolutional Neural Networks. Stable and accurate fault diagnosis technology can reliably detect the types of faults in motors, providing reliable support for the operational monitoring and maintenance of rotating machinery [5,6,7]. As an intelligent algorithm, the deep learning model can extract fault features from bearing data for end-to-end fault diagnosis. Wang et al. [12] proposed a method combining an improved residual network and wavelet transform for intelligent gearbox fault diagnosis. This approach effectively extracts features and diagnoses single faults, compound faults, and unbalance faults. Jiang et al. [13] introduced a multi-scale convolutional neural network with channel attention that uses both max pooling and average pooling layers to identify bearing fault characteristics at different scales.
Regarding nonlinear feature extraction methods, Zhang et al. [14] developed an adaptive activation function with a tanh function and slope thresholding. These were incorporated into the Residual Network (ResNet), allowing the network to extract features that differ significantly between fault types. However, deep learning requires a large amount of data for training, and deep learning algorithms need the training and testing data to come from the same operational conditions, meaning they must share the same distribution. Real-time changes in operating conditions such as humidity, voltage, speed, current fluctuations, and load can cause data distribution variations during the normal operation of actual rotating equipment. These changes decrease the accuracy of deep learning algorithms on test data [15].
Recently, bearing fault diagnosis methods based on domain adaptation transfer learning have addressed several challenges. Specifically, they have resolved the issues of low generalizability and low robustness owing to limited data in deep learning. They have tackled the problems associated with the source and target data being in different feature spaces or distributions. Schwendemann et al. [16] proposed the Layered Maximum Mean Discrepancy (LMMD) method, an extension of the Maximum Mean Discrepancy (MMD) that incorporates the unique characteristics of the proposed intermediary domain. Lu et al. [17] developed an architecture in which the conditional and marginal distributions are adapted across multiple neural network layers. This method uses the MMD to measure the distribution discrepancies and introduces an adaptive weighting strategy to ascertain the importance of different distributions. Mao et al. [18] combined the adaptability of Domain Adversarial Neural Networks (DANNs) with structured relational information across various failure models to enhance transfer learning effectiveness. Chen et al. [19] proposed the Multi-Gradient Hierarchical Domain Adaptation Network, which concurrently acquires transferable domain invariance and class-discriminative insights, improving the diagnostic transferability of bearing faults. All of these methods have achieved satisfactory results in some respects. However, traditional bearing fault diagnosis methods based on transfer learning and CWT time–frequency images still face the following major challenges in feature capture and domain adaptation:
(1) When the fault signal is weak, the data are smooth, or the feature contrast is not apparent, CWT alone may not clearly display bearing fault characteristics. Therefore, enhancing data contrast through sharpening methods to improve the discriminative power of data features is particularly important.
(2) In the process of feature extraction for fault diagnosis, models need strong feature capture capabilities. Traditional residual networks often struggle to adequately focus on important features when capturing complex fault patterns, leading to suboptimal feature extraction.
(3) Domain adaptation algorithms based on kernel methods, such as the MMD, LMMD, and Joint Maximum Mean Discrepancy (JMMD), rely heavily on the selection and tuning of the kernel function to achieve feature alignment. When dealing with data exhibiting complex nonlinear distributions, the choice of kernel function greatly influences the algorithm’s ability to capture feature differences and interactions within the data. Domain adaptation algorithms based on adversarial learning, such as DANNs and Conditional Adversarial Domain Adaption Networks (CDANs), align features between the source and target domains through adversarial training. Although adversarial learning excels at capturing complex nonlinear distribution differences, the training process is prone to gradient instability, mode collapse, and vanishing gradients, making it difficult for the model to converge. Additionally, a DANN primarily focuses on aligning feature distributions and lacks explicit alignment of class conditions, which can adversely affect classification performance.
In order to solve the problems mentioned above, this paper proposes the CWT-SimAM-DAMS model. The specific innovations and contributions are as follows:
(1) The one-dimensional bearing fault signal is segmented using a sliding window, the segmented data are processed with the CWT algorithm, and, finally, the resulting CWT time–frequency images are enhanced by overlaying high-frequency features using the Unsharp Masking (USM) algorithm. This method is named CWT-USM.
(2) The SimAM attention mechanism is integrated into the Residual Network to enhance the model’s feature extraction capability for input images and provide a robust feature extraction foundation for JMMD and CDAN domain adaptation algorithms. This model is named SimAM-ResNet.
(3) The model’s generalization ability is enhanced by combining the JMMD and CDAN domain adaptation algorithms and designing an adaptive weighting strategy. The JMMD algorithm provides stable distribution alignment, which makes adversarial training more stable, and the CDAN algorithm mitigates the JMMD algorithm’s dependence on the kernel method by capturing complex nonlinear distribution differences through adversarial learning. Both the CDAN and JMMD algorithms focus on the joint distribution of labels and features. The adaptive weighting strategy considers the classification, JMMD, and CDAN losses, effectively reducing the discrepancy in joint distributions and achieving global domain alignment. Additionally, parameters are adaptively adjusted at various stages of model training to ensure the model’s optimal performance.
The rest of the paper is organized as follows: Section 2 describes the theoretical concepts of transfer learning, CWT, USM, ResNet, and SimAM. Section 3 presents a new domain adaptation method for diagnosing bearing faults, including the SimAM attention mechanism, the JMMD and CDAN domain adaptation algorithms, and the adaptive weighting strategy. Section 4 describes the specifics of the datasets and the parameter settings used in this study. Section 5 provides and analyzes the experimental results. Section 6 summarizes the paper.

2. Theoretical Background

2.1. Description of Transfer Learning Problems

In domain adaptation [5], the source domain is defined as $D_s = \{\chi_s, P(x_s)\}$ and the target domain as $D_t = \{\chi_t, P(x_t)\}$. The dataset for the source domain is $X_s = \{x_i^s\}_{i=1}^{N_s}$, with labels $Y_s = \{y_i^s\}_{i=1}^{N_s}$, where $y_i^s \in \{1, 2, \ldots, K\}$ and $K$ denotes the total number of categories. The dataset for the target domain is $X_t = \{x_i^t\}_{i=1}^{N_t}$. The main problem addressed in this paper is that the feature spaces of the source and target domains are the same, i.e., $\chi_s = \chi_t$, but their marginal distributions differ, i.e., $P(x_s) \neq P(x_t)$.

2.2. Continuous Wavelet Transform

When dealing with the continuous one-dimensional vibration signals of motor faults, an effective feature extraction strategy is to convert the signals into two-dimensional time–frequency images. This method not only enriches the representation of frequency domain information, but also makes it more suitable for the learning process of neural networks due to its two-dimensional structure. In the time–frequency plots, one can directly observe the changes in signal frequency components over time. The CWT is particularly adept due to its window scaling, which overcomes the limitations of the Short-Time Fourier Transform (STFT) [20,21], where window sizes do not vary with frequency or time, making it more suitable for handling transient signals like those in motor faults. The CWT is mathematically formulated as follows [22]:
$$W_x(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \tag{1}$$
where $W_x(a, b)$ represents the wavelet coefficients, $x(t)$ is the signal, $\psi^{*}$ is the complex conjugate of the mother wavelet, $a$ is the scaling parameter, and $b$ is the translation parameter. The choice of the mother wavelet is crucial in wavelet transforms as it determines the accuracy and efficiency of the transform. Common mother wavelets include the Daubechies Wavelet, Reverse Biorthogonal Wavelet, Bior Wavelet, and Morlet Wavelet. This paper selects the cmor wavelet as the mother wavelet for the CWT because the cmor wavelet, a complex version of the Morlet Wavelet, possesses excellent time–frequency localization properties and effective filtering and signal reconstruction capabilities.
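As an illustration of Equation (1), the following minimal sketch approximates a single wavelet coefficient by a Riemann sum over the samples, using only the Python standard library. The function names, the bandwidth and center-frequency values (1.5 and 1.0), the 100 Hz sampling rate, and the 10 Hz test tone are assumptions chosen for the example; they are not the settings used in this paper.

```python
import cmath
import math


def cmor(t, bandwidth=1.5, center_freq=1.0):
    """Complex Morlet mother wavelet at dimensionless time t (assumed parameters)."""
    norm = 1.0 / math.sqrt(math.pi * bandwidth)
    return norm * cmath.exp(2j * math.pi * center_freq * t) * math.exp(-t * t / bandwidth)


def cwt_coefficient(signal, scale, shift):
    """Riemann-sum approximation of W_x(a, b); scale a and shift b are in samples."""
    total = 0j
    for n, x_n in enumerate(signal):
        total += x_n * cmor((n - shift) / scale).conjugate()
    return total / math.sqrt(scale)


fs = 100.0          # hypothetical sampling rate in Hz
center_freq = 1.0   # must match the default used in cmor()
signal = [math.sin(2 * math.pi * 10.0 * n / fs) for n in range(200)]

# A scale of center_freq * fs / f samples tunes the wavelet to frequency f.
magnitudes = {
    freq: abs(cwt_coefficient(signal, scale=center_freq * fs / freq, shift=100))
    for freq in (5.0, 10.0, 20.0)
}
```

For the 10 Hz test tone, the coefficient magnitude is largest at the scale tuned to 10 Hz. Evaluating the coefficient over a grid of scales and shifts yields the time–frequency image that serves as network input.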

2.3. Unsharp Masking

Unsharp Masking is a widely used technique for sharpening enhancement. The USM algorithm acquires high-frequency components by subtracting the low-pass filtered (blurred) image from the original image. These high-frequency components are then multiplied by a gain coefficient and added back to the original image, which enhances their contrast and thereby improves the visual clarity of image details and edges. The processing steps of Unsharp Masking are as follows [23]:
Step 1: Use a Gaussian filter to create a blurred version of the original image and reduce its high-frequency content.
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}} \tag{2}$$
where $x$ and $y$ denote the positions relative to the center pixel, and $\sigma$ is the standard deviation of the Gaussian distribution, which controls the extent of blurring.
Step 2: Use a high-pass filter to extract the edges and texture information from the image, i.e., the high-frequency components.
$$H = I - (G * I) \tag{3}$$
where $I$ represents the original image, $G * I$ represents the image after applying Gaussian filtering, and $H$ represents the image containing the high-frequency components.
Step 3: Add the high-frequency image to the original image according to a coefficient, adjust the sharpening intensity, and merge them.
$$B = I + \alpha H \tag{4}$$
where $\alpha$ represents the sharpening intensity and $B$ is the final image after sharpening.
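The three steps can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the kernel radius, the replicate-border handling, and the default values of `sigma` and `alpha` are hypothetical, and the kernel is normalized by the sum of its discrete weights instead of the analytic factor $1/(2\pi\sigma^{2})$.

```python
import math


def gaussian_weights(radius, sigma):
    """Discrete Gaussian weights on a (2r+1) x (2r+1) grid, normalized to sum to 1."""
    weights = [[math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
                for x in range(-radius, radius + 1)]
               for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def unsharp_mask(image, sigma=1.0, alpha=1.5, radius=2):
    """Return B = I + alpha * (I - G * I) for a 2D list of grey values."""
    height, width = len(image), len(image[0])
    kernel = gaussian_weights(radius, sigma)
    sharpened = []
    for r in range(height):
        row = []
        for c in range(width):
            blurred = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    rr = min(max(r + dy, 0), height - 1)  # replicate border pixels
                    cc = min(max(c + dx, 0), width - 1)
                    blurred += kernel[dy + radius][dx + radius] * image[rr][cc]
            high = image[r][c] - blurred            # Step 2: high-frequency part H
            row.append(image[r][c] + alpha * high)  # Step 3: B = I + alpha * H
        sharpened.append(row)
    return sharpened


flat = [[0.5] * 6 for _ in range(6)]              # no edges: should stay unchanged
edge = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]  # vertical step edge
flat_out = unsharp_mask(flat)
edge_out = unsharp_mask(edge)
```

A constant image is left unchanged because its high-frequency part is zero, whereas the step edge overshoots on both sides, which is the contrast enhancement the method relies on. In practice the result is usually clipped back to the valid intensity range.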

2.4. Residual Network

To obtain richer image features, a common method is to increase the depth of the network. However, as the network depth increases, the model may encounter vanishing or exploding gradient problems, which can decrease its accuracy. He et al. [24] proposed the Residual Network to simplify the training of deep networks. ResNet improves the traditional CNN and effectively addresses this issue. The structure of a residual module is illustrated in Figure 1. The output $G(x)$ of the residual module combines the input $x$ with the mapping function $F(x)$, i.e., $G(x) = F(x) + x$.

2.5. SimAM

SimAM is an attention mechanism distinct from traditional channel attention mechanisms or spatial attention mechanisms [25]. SimAM identifies neurons with higher spatial suppression effects by defining an energy function and assigning them higher weights. Its specific framework is illustrated in Figure 2, and the energy function is expressed as follows:
$$e_t(w_t, b_t, \mathbf{y}, x_i) = (y_t - \hat{t})^{2} + \frac{1}{M-1} \sum_{i=1}^{M-1} (y_o - \hat{x}_i)^{2} \tag{5}$$
where $t$ and $x_i$ represent the target neuron and the other neurons on a single channel of the input $X \in \mathbb{R}^{C \times H \times W}$ ($C$, $H$, and $W$ denote the number of channels, height, and width, respectively, and $\mathbb{R}$ is the set of real numbers). $y_t$ and $y_o$ are the labels assigned to the target neuron and the other neurons, $w_t$ and $b_t$ denote the weight and bias, $M = H \times W$ represents the number of neurons in that channel, and $\hat{t} = w_t t + b_t$ and $\hat{x}_i = w_t x_i + b_t$ are the linear transformations of $t$ and $x_i$.
Using binary labels ($y_t = 1$ and $y_o = -1$) and introducing the regularization coefficient $\lambda$ on the weights, the energy formula is as follows:
$$e_t(w_t, b_t, \mathbf{y}, x_i) = \frac{1}{M-1} \sum_{i=1}^{M-1} \left(-1 - (w_t x_i + b_t)\right)^{2} + \left(1 - (w_t t + b_t)\right)^{2} + \lambda w_t^{2} \tag{6}$$
The solutions for $w_t$ and $b_t$ are obtained as follows:
$$w_t = -\frac{2 (t - \mu_t)}{(t - \mu_t)^{2} + 2\sigma_t^{2} + 2\lambda}, \qquad b_t = -\frac{1}{2} (t + \mu_t)\, w_t \tag{7}$$
where $\mu_t = \frac{1}{M-1} \sum_{i=1}^{M-1} x_i$ and $\sigma_t^{2} = \frac{1}{M-1} \sum_{i=1}^{M-1} (x_i - \mu_t)^{2}$ represent the mean and variance of the channel excluding the target neuron.
The final simplified minimum energy is as follows:
$$e_t^{*} = \frac{4 (\hat{\sigma}^{2} + \lambda)}{(t - \hat{\mu})^{2} + 2\hat{\sigma}^{2} + 2\lambda} \tag{8}$$
where $\hat{\mu}$ and $\hat{\sigma}^{2}$ represent the mean and variance computed once over all neurons of the channel, under the assumption that all neurons in a channel follow the same distribution. Equation (8) shows that the smaller the energy value, the greater the separability between the target neuron and the rest of the neurons. Because the energy and the separability are inversely related, the attention parameter is given by $1 / e_t^{*}$.
Finally, the enhanced input with attention is obtained as follows:
$$\tilde{X} = \mathrm{sigmoid}\!\left(\frac{1}{E}\right) \odot X \tag{9}$$
where $E$ groups all $e_t^{*}$ across the channel and spatial dimensions, and $\odot$ denotes element-wise multiplication.
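A minimal sketch of how the attention weights follow from the minimum energy is given below for a single channel, using only the standard library. It relies on the identity $1/e_t^{*} = (t - \hat{\mu})^{2} / \left(4(\hat{\sigma}^{2} + \lambda)\right) + 0.5$, which is a direct rearrangement of the minimum-energy expression. The function names, the value of `lam`, and the choice of $M - 1$ as the variance normalizer are assumptions made for this example.

```python
import math


def simam_weights(channel, lam=1e-4):
    """Per-neuron attention weights sigmoid(1 / e_t*) for one H x W channel."""
    values = [v for row in channel for v in row]
    count = len(values)
    mean = sum(values) / count
    # Assumption: the variance estimate is normalized by M - 1.
    variance = sum((v - mean) ** 2 for v in values) / (count - 1)
    weights = []
    for row in channel:
        w_row = []
        for t in row:
            inv_energy = (t - mean) ** 2 / (4.0 * (variance + lam)) + 0.5
            w_row.append(1.0 / (1.0 + math.exp(-inv_energy)))
        weights.append(w_row)
    return weights


def apply_simam(channel, lam=1e-4):
    """Element-wise product of the attention weights with the channel."""
    weights = simam_weights(channel, lam)
    return [[w * t for w, t in zip(w_row, row)]
            for w_row, row in zip(weights, channel)]


# One distinctive neuron (value 5) surrounded by identical neighbours.
channel = [[1.0, 1.0, 1.0],
           [1.0, 5.0, 1.0],
           [1.0, 1.0, 1.0]]
weights = simam_weights(channel)
refined = apply_simam(channel)
```

In this toy channel, the distinctive center neuron receives the largest weight, while every weight stays above $\mathrm{sigmoid}(0.5) \approx 0.62$ because the inverse energy is never smaller than 0.5. No learnable parameters are involved.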

3. The Proposed Method

The proposed CWT-SimAM-DAMS method is illustrated in Figure 3. The process begins with converting vibration signals into time–frequency images using the CWT algorithm. These images are then enhanced with the USM algorithm, and the enhanced data serve as input for the model. In the source domain, the feature extraction model first extracts the bearing fault features. These features are then subjected to dimensionality reduction and nonlinear transformation through a bottleneck layer, which includes a Dropout layer ($p = 0.5$), a fully connected layer, and a ReLU activation function. The transformed features are passed through a linear classifier to compute the classification loss. Simultaneously, the features from the bottleneck layer are used with the JMMD and CDAN domain adaptation algorithms to align the joint distribution between the source and target domains and calculate the domain adaptation loss for these two algorithms. The JMMD provides a smooth and continuous alignment target, making adversarial training more stable, while the CDAN captures complex nonlinear distribution differences through adversarial learning. Additionally, the proposed weight adaptive algorithm can adjust the weights of each part in real time based on the losses from the classification, JMMD, and CDAN during model training, achieving the optimal fault monitoring state.

3.1. Data Processing Based on Unsharp Masking and Continuous Wavelet Transform

Information inevitably suffers some loss or degradation when it is transformed or transmitted. Therefore, when converting one-dimensional raw data into two-dimensional images, some loss of data information is unavoidable. To limit this loss, this paper adopts a 50% data overlap strategy [26] when generating CWT images. The specific procedure is as follows. First, calculate the number of samples per cycle based on the following equation:
$$N_{\min} = \frac{f_Z}{r / 60} \tag{10}$$
where $f_Z$ represents the sampling frequency of the vibration signal [27], and $r$ is the rotation speed of the bearing in revolutions per minute. $N_{\min}$ is therefore the minimum number of samples, i.e., the number of samples collected during one revolution of the bearing. However, to maintain the completeness of the sampling data, we consider $N \geq 1.5 N_{\min}$. Then, the data are segmented through a sliding window, where the moving step size of the data segmentation window is half the number of samples per cycle. This process continues until the end of the data is reached. This procedure is illustrated in Figure 3. After the data are segmented using a sliding window, the steps for image processing are as follows:
Step 1: Normalize the segmented data according to
$$f(x) = \frac{x - \min}{\max - \min} \tag{11}$$
where $\min$ and $\max$ are the minimum and maximum values of the segment.
Step 2: Apply the CWT algorithm to the data to transform them into a two-dimensional time–frequency image.
Step 3: Set different values for the key parameters of the USM algorithm and enhance each image generated by the CWT algorithm using the USM algorithm.
Step 4: Conduct comparative experiments on the images generated by the USM algorithm under different parameters, and select the images processed with the parameters yielding the highest fault diagnosis accuracy as the experimental input data.
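The segmentation and normalization steps can be sketched as follows. The function names are hypothetical, the step size is taken as half the window length to realize the 50% overlap, and the 1500 rpm speed in the usage example is an assumed value for illustration; only the 64 kHz sampling frequency is taken from the dataset description.

```python
def samples_per_revolution(sampling_rate_hz, speed_rpm):
    """N_min = f_Z / (r / 60): samples recorded during one shaft revolution."""
    return sampling_rate_hz / (speed_rpm / 60.0)


def sliding_windows(signal, window, step=None):
    """Split a 1D signal into overlapping segments (50% overlap by default)."""
    step = window // 2 if step is None else step
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, step)]


def min_max_normalize(segment):
    """Scale a segment to [0, 1] with f(x) = (x - min) / (max - min)."""
    low, high = min(segment), max(segment)
    if high == low:  # constant segment: avoid division by zero
        return [0.0 for _ in segment]
    return [(v - low) / (high - low) for v in segment]


# Window length for a 64 kHz signal at an assumed speed of 1500 rpm.
n_min = samples_per_revolution(64000, 1500)
window = int(1.5 * n_min)

# Toy signal of 10 samples, window of 4 samples, step of 2 samples.
segments = sliding_windows(list(range(10)), window=4)
normalized = [min_max_normalize(seg) for seg in segments]
```

Under the assumed speed, the window length evaluates to 3840 samples. Each normalized segment would then be passed to the CWT and USM stages.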

3.2. Domain Adaptation Model Based on SimAM and ResNet

3.2.1. Residual Network Integrated with SimAM

In fault diagnosis, neural networks play a crucial role in feature extraction. However, traditional residual models often struggle to effectively identify fault features when dealing with complex time–frequency images, mainly due to their limited feature extraction capabilities. To address this issue, we propose the SimAM-ResNet model, which uses ResNet as the backbone. ResNet addresses the vanishing gradient problem in deep networks by introducing residual connections, enabling easier training and optimization. Specifically, ResNet adds the input of a block directly to its output during forward propagation, thus implementing “skip connections”. This connection method enables the network to learn more accurate feature representations and significantly reduces training errors. Additionally, ResNet employs batch normalization techniques and pre-activation structures to further enhance network performance and stability. The SimAM attention mechanism is introduced on this basis. As described in Section 2.5, SimAM assigns an attention weight to each neuron of a feature map according to its energy function: neurons that are more separable from the other neurons in the same channel have lower energy and receive higher weights, whereas less distinctive neurons are down-weighted, which reduces the impact of noise and redundant information. SimAM is used here to refine the feature maps and thereby improve the robustness and generalization ability of the fault diagnosis model. The specific network structure of the fault classification module is presented in Figure 4 and Table 1.

3.2.2. Joint Maximum Mean Discrepancy

The Maximum Mean Discrepancy [28] is a non-parametric metric for evaluating the difference between the distributions of different datasets. It operates by mapping the feature representations of the source and target domains into a Reproducing Kernel Hilbert Space (RKHS), where the discrepancy between the marginal distributions $P(X_s)$ and $Q(X_t)$ of the two domains is measured as the distance between their mean embeddings. The MMD is defined as follows:
$$MMD^{2}(P, Q) = \sup_{\|\phi\|_{\mathcal{H}} \leq 1} \left\| E[\phi(x^{s})] - E[\phi(x^{t})] \right\|_{\mathcal{H}}^{2} \tag{12}$$
where $\sup$ denotes the supremum, $\phi$ represents the mapping function, which maps the original dataset into the reproducing kernel Hilbert space, $\mathcal{H}$ denotes the reproducing kernel Hilbert space, and the subscript $\|\phi\|_{\mathcal{H}} \leq 1$ indicates that the norm of the function in the Hilbert space is less than or equal to 1. The empirical estimate of the MMD is given by
$$MMD^{2}(P, Q) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} \phi(x_i^{s}) - \frac{1}{n_t} \sum_{j=1}^{n_t} \phi(x_j^{t}) \right\|_{\mathcal{H}}^{2} = \frac{1}{n_s^{2}} \sum_{i=1}^{n_s} \sum_{j=1}^{n_s} k(x_i^{s}, x_j^{s}) - \frac{2}{n_s n_t} \sum_{i=1}^{n_s} \sum_{j=1}^{n_t} k(x_i^{s}, x_j^{t}) + \frac{1}{n_t^{2}} \sum_{i=1}^{n_t} \sum_{j=1}^{n_t} k(x_i^{t}, x_j^{t}) \tag{13}$$
where $k(\cdot, \cdot)$ is the kernel function, and $k(x_i, y_i) = \exp\!\left(-\|x_i - y_i\|^{2} / (2\sigma^{2})\right)$.
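The empirical estimate in Equation (13) can be computed directly from the three kernel sums. The sketch below uses only the standard library; the function names, the single fixed bandwidth `sigma`, and the toy feature vectors are assumptions for illustration (practical implementations often combine several bandwidths).

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased empirical MMD^2 between two lists of feature vectors."""
    n_s, n_t = len(source), len(target)
    k_ss = sum(rbf_kernel(a, b, sigma) for a in source for b in source) / n_s ** 2
    k_st = sum(rbf_kernel(a, b, sigma) for a in source for b in target) / (n_s * n_t)
    k_tt = sum(rbf_kernel(a, b, sigma) for a in target for b in target) / n_t ** 2
    return k_ss - 2.0 * k_st + k_tt


source = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]]
shifted = [[x + 2.0, y + 2.0] for x, y in source]  # same shape, different location

same_domain = mmd_squared(source, source)
cross_domain = mmd_squared(source, shifted)
```

The estimate is zero when both sample sets coincide and grows as the target samples move away from the source samples, which is the quantity a kernel-based adaptation loss drives toward zero.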
The MMD, serving as a kernel-based two-sample test statistic, is extensively utilized to assess the distinction between marginal distributions but has not been employed to gauge the difference between joint distributions. Moreover, the MMD exhibits limited domain adaptation capability under complex multimodal conditions, and optimizing kernel parameters poses challenges. Therefore, the JMMD [29] is proposed by considering the empirical joint distributions P ( X s , Y s ) and Q ( X t , Y t ) between the two domains. The JMMD is defined as follows:
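$$L_{JMMD}(P, Q) = \left\| E_P\!\left( \otimes_{l=1}^{|L|} \phi^{l}(z_l^{s}) \right) - E_Q\!\left( \otimes_{l=1}^{|L|} \phi^{l}(z_l^{t}) \right) \right\|_{\otimes_{l=1}^{|L|} \mathcal{H}^{l}}^{2} \tag{14}$$
where $\otimes_{l=1}^{|L|} \phi^{l}(z_l) = \phi^{1}(z_1) \otimes \cdots \otimes \phi^{|L|}(z_{|L|})$, and $z_l^{s}$ represents the output of the activation function of the $l$-th layer of the network.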

3.2.3. Conditional Adversarial Domain Adaption

A DANN is a domain adaptive network model based on adversarial concepts. DANNs optimize learning through adversarial training between a feature extractor and a domain classifier. During the training process, domain adaptation is embedded into the model’s learning, enabling the model to extract and recognize domain-invariant features. The category classifier trained based on adversarial concepts demonstrates good generalization in the target domain. However, DANNs do not consider the joint distribution of features and labels, which can lead to the neglect of class-specific features during training. Additionally, when the data distribution exhibits a multimodal structure, focusing solely on feature distribution makes it challenging for DANNs to accurately align the source and target domains. To address these issues, Long et al. [30] proposed the Conditional Domain Adversarial Network. The CDAN divides the entire network structure into three modules: a feature extractor, a category classifier, and a domain discriminator. The CDAN addresses the problem of the DANN neglecting the joint distribution of features and labels by introducing a multilinear conditioning mechanism. Specifically, it conditions the domain discriminator on both the features $f$ and the classifier predictions $g$ through multilinear mapping, thereby considering the joint distribution of features and labels. $T_{\otimes}(f, g)$ and $T_{\odot}(f, g)$ are the multilinear mapping methods proposed by the CDAN. When $d_f \times d_g \leq 4096$, the CDAN takes $T_{\otimes}(f, g)$ as input for the domain discriminator. When $d_f \times d_g > 4096$, to avoid the dimensionality explosion, the CDAN randomly selects certain dimensions of the features and labels for multilinear mapping. In this case, the CDAN takes $T_{\odot}(f, g)$ as the input for the domain discriminator. The multilinear mapping method can capture the distribution characteristics of multimodal complex data. The loss function of the CDAN can be expressed as
$$L_{CDAN}(\theta_f, \theta_d) = -E_{x_i^{s} \sim D_s} \log\left[ D(f_i^{s}, g_i^{s}) \right] - E_{x_i^{t} \sim D_t} \log\left[ 1 - D(f_i^{t}, g_i^{t}) \right] \tag{15}$$
where $f_i^{s} = G_f(x_i^{s}, \theta_f)$, $g_i^{s} = G_c(G_f(x_i^{s}, \theta_f))$, and $D(f, g) = G_d(f \otimes g, \theta_d)$.
Substituting $f_i^{s}$, $g_i^{s}$, and $D(f, g)$ into Equation (15), the loss function of the CDAN can be expressed as
$$L_{CDAN}(\theta_f, \theta_d) = -E_{x_i^{s} \sim D_s} \log\left[ G_d\!\left( G_f(x_i^{s}) \otimes G_c(G_f(x_i^{s})) \right) \right] - E_{x_i^{t} \sim D_t} \log\left[ 1 - G_d\!\left( G_f(x_i^{t}) \otimes G_c(G_f(x_i^{t})) \right) \right] \tag{16}$$
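To make the multilinear conditioning and the discriminator loss concrete, the sketch below builds the flattened outer product $f \otimes g$ and evaluates the loss for given discriminator outputs. The domain discriminator itself is not modeled: the score lists are hypothetical values standing in for $G_d(\cdot)$, and the function names and the small constant `eps` are assumptions of this example.

```python
import math


def multilinear_map(features, predictions):
    """Flattened outer product f (x) g with d_f * d_g entries."""
    return [f * g for f in features for g in predictions]


def cdan_loss(source_scores, target_scores, eps=1e-12):
    """Discriminator loss: -mean log D(source) - mean log(1 - D(target)).

    The scores are discriminator outputs in (0, 1), where values close to 1
    mean "source domain".
    """
    source_term = -sum(math.log(s + eps) for s in source_scores) / len(source_scores)
    target_term = -sum(math.log(1.0 - t + eps) for t in target_scores) / len(target_scores)
    return source_term + target_term


features = [1.0, 2.0, 3.0]      # hypothetical bottleneck features, d_f = 3
predictions = [0.5, 0.5]        # hypothetical softmax outputs, d_g = 2
joint_input = multilinear_map(features, predictions)

confused = cdan_loss([0.5, 0.5], [0.5, 0.5])     # discriminator cannot tell domains apart
confident = cdan_loss([0.99, 0.98], [0.01, 0.02])  # discriminator separates domains well
```

When the discriminator cannot distinguish the domains, every score is 0.5 and the loss equals $2 \ln 2$. The feature extractor is trained to push the discriminator toward this state, while the discriminator is trained to lower the loss.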

3.2.4. Domain Adaptation Based on Adaptive Weighting Strategy

This paper proposes a domain adaptation method that improves the accuracy of cross-domain fault diagnosis by enabling the model to reduce joint distribution discrepancies as the JMMD does and to achieve global domain alignment as the CDAN does. Additionally, this paper designs an adaptive weighting strategy based on the principle that the parts of the loss function with larger values should receive more attention during training. The objective of this strategy is to allocate higher weights to objectives that are difficult to achieve at the current stage of the model, thereby prioritizing these parts during training. To ensure that the model learns effective source domain features in the early stages of training and improves its domain adaptation ability in the later stages, the final loss function $L_{JMMD}$ is obtained by multiplying the JMMD loss function by a parameter $\lambda_{JMMD}$. The CDAN requires minimizing the label classification loss and maximizing the domain classification loss during optimization. To avoid solving this simultaneous minimization and maximization problem directly, a Gradient Reversal Layer (GRL) is introduced between the feature extractor and the domain discriminator. Specifically, during forward propagation, the GRL performs no operation and passes the features through unchanged. During backward propagation, the GRL takes the gradient from the subsequent network, reverses its sign, scales it by the parameter $\lambda_{CDAN}$, and passes it to the previous layer. Through these operations, the final loss function of the CDAN is obtained as $L_{CDAN}$. To effectively integrate the JMMD and CDAN, we introduce three key weights: the classifier weight $W_c$, the distance weight $W_{JMMD}$, and the adversarial weight $W_{CDAN}$. The adaptive weighting strategy adjusts these weights dynamically during training according to the current values of the corresponding losses.
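The behavior of the Gradient Reversal Layer can be illustrated without an automatic differentiation framework. The class below is a hypothetical, framework-free sketch: in a deep learning library the same logic would be placed in a custom backward function, and the sign reversal shown here follows the usual GRL definition.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged.
        return list(features)

    def backward(self, upstream_gradient):
        # Gradient from the domain discriminator is reversed and scaled
        # before it reaches the feature extractor.
        return [-self.lam * g for g in upstream_gradient]


grl = GradientReversal(lam=1.0)
features = [0.3, -1.2, 2.0]
passed = grl.forward(features)
reversed_gradient = grl.backward([0.5, -0.25, 1.0])
```

Because of the reversed sign, one ordinary gradient descent step lowers the discriminator loss with respect to the discriminator parameters while raising it with respect to the feature extractor parameters.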
Algorithm 1 describes the process of the CWT-SimAM-DAMS model training. The overall loss function of the CWT-SimAM-DAMS model can be expressed as
$$L = W_c L_c + W_{JMMD} L_{JMMD} + W_{CDAN} L_{CDAN} \tag{17}$$
$$W_c^{k} = 3 \times \frac{L_c^{k}}{L_c^{k} + L_{JMMD}^{k} + L_{CDAN}^{k}} \tag{18}$$
$$W_{JMMD}^{k} = 3 \times \frac{L_{JMMD}^{k}}{L_c^{k} + L_{JMMD}^{k} + L_{CDAN}^{k}} \tag{19}$$
$$W_{CDAN}^{k} = 3 \times \frac{L_{CDAN}^{k}}{L_c^{k} + L_{JMMD}^{k} + L_{CDAN}^{k}} \tag{20}$$
where $L$ represents the overall loss function of the CWT-SimAM-DAMS model, and $L_c$ is the classification loss of the model. $W_c^{k}$ represents the weight corresponding to $L_c$ at the $k$-th epoch. $L_{JMMD}$ is the loss function of the JMMD algorithm, and $W_{JMMD}^{k}$ represents the weight corresponding to $L_{JMMD}$ at the $k$-th epoch. $L_{CDAN}$ is the loss function of the CDAN algorithm, and $W_{CDAN}^{k}$ represents the weight corresponding to $L_{CDAN}$ at the $k$-th epoch. Because there are three loss terms, each normalized weight is multiplied by 3 so that the weights sum to 3; this constant has no special significance.
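The weighting rule can be sketched in a few lines. The function names and the numerical loss values are hypothetical; the sketch also assumes that the weights are treated as constants when the weighted loss is differentiated, which the text does not state explicitly.

```python
def adaptive_weights(loss_c, loss_jmmd, loss_cdan):
    """Weights proportional to each loss value, scaled so that they sum to 3."""
    losses = (loss_c, loss_jmmd, loss_cdan)
    total = sum(losses)
    if total == 0.0:  # degenerate case: fall back to equal weights
        return (1.0, 1.0, 1.0)
    return tuple(3.0 * value / total for value in losses)


def weighted_total_loss(loss_c, loss_jmmd, loss_cdan):
    """Overall loss L = W_c L_c + W_JMMD L_JMMD + W_CDAN L_CDAN."""
    weights = adaptive_weights(loss_c, loss_jmmd, loss_cdan)
    return sum(w * value for w, value in zip(weights, (loss_c, loss_jmmd, loss_cdan)))


# Hypothetical loss values at one epoch.
w_c, w_jmmd, w_cdan = adaptive_weights(0.6, 0.3, 0.1)
total = weighted_total_loss(0.6, 0.3, 0.1)
```

With these example values, the classification loss is the largest term and therefore receives the largest weight (1.8 out of a total of 3), in line with the principle that harder objectives are prioritized.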
Algorithm 1:  CWT-SimAM-DAMS algorithm.
  • Input: epoch: maximum number of epochs;
  • D: CWT-USM images;
  • Randomly initialized network parameters θ;
  • Training:
  • set n = 0,
  • While n < epoch do
  •    for each batch in D do
  •       obtain SimAM-ResNet results: $f = F(X)$
  •       obtain classification loss
  •       obtain JMMD loss by Equation (14) and CDAN loss by Equation (16)
  •       calculate CWT-SimAM-DAMS loss by Equations (17)–(20)
  •       obtain predicted results: $G(z)$
  •       update θ by gradient descent;
  •    end
  •    set n = n + 1;
  • end
  • Output: Predicted label $y = G(z)$

4. Data Description

The experimental platform involves a Windows 11 64-bit operating system using a 13th Gen Intel(R) Core(TM) i9-13900HX at 2.20 GHz and an NVIDIA GeForce RTX 4060 laptop GPU. The program runs in the PyCharm 2023.3.4 ×64 environment.

4.1. Dataset Introduction

This paper primarily utilizes two publicly available bearing fault datasets: the Case Western Reserve University (CWRU) bearing dataset and the Paderborn University (PU) bearing dataset. The number of epochs for the CWRU dataset is 80, and, for the PU dataset, it is 800. According to Equation (10), the number of sampling points per signal period is 800 for the CWRU dataset and 3840 for the PU dataset. The data are split into training and testing data with a ratio of 75:25. Below is a detailed introduction to the datasets.

4.1.1. Case Western Reserve University Dataset

The CWRU dataset contains vibration acceleration data [31] collected at the drive end and fan end of the motor (Figure 5). The dataset includes normal bearing and faulty bearing operation data. This paper uses the drive-end fault data sampled at 12 kHz.
The bearing speed is categorized into four levels, labeled “0, 1, 2, 3”, with a different load at each speed. The data are accordingly divided into four operating conditions, as reported in Table 2. The CWRU dataset comprises 10 bearing health conditions, covering the normal state and three types of faults. “IF” represents an inner ring fault, “BF” represents a ball fault, “OF” represents an outer ring fault, and “NA” represents normal bearings, as presented in Table 3. The transfer task 0-1 represents the transfer from source domain operating condition 0 to target domain operating condition 1.

4.1.2. Paderborn University Dataset

The PU dataset [32] contains two sets of data: an artificial damage dataset and a real bearing damage dataset. This paper selects the real bearing damage data collected from accelerated life experiments. The experimental setup [33] is illustrated in Figure 6. The electric motor comprises a drive motor, adjusting nut, spring package, and housing. The vibration acceleration signal sampling frequency for the PU dataset is 64 kHz. Based on the changes in load torque, radial force, and speed in the PU dataset, this paper selects three working conditions for the motor, as reported in Table 4. Six transfer learning tasks are constructed accordingly. This paper investigates transfer learning tasks under different operating conditions using data from 13 bearings damaged in accelerated life experiments. Table 5 provides the classification information.
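The transfer tasks for both datasets are the ordered pairs of distinct operating conditions, which gives 12 tasks for the four CWRU conditions and six tasks for the three PU conditions. A short sketch of this enumeration follows; the helper name `transfer_tasks` is hypothetical.

```python
from itertools import permutations


def transfer_tasks(num_conditions):
    """List all 'source-target' pairs of distinct operating conditions."""
    return [f"{s}-{t}" for s, t in permutations(range(num_conditions), 2)]


cwru_tasks = transfer_tasks(4)  # conditions 0..3
pu_tasks = transfer_tasks(3)    # conditions 0..2
```

The resulting order matches the task columns used in the result tables (0-1, 0-2, ..., 3-2 for CWRU).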

4.2. Experimental Parameter Settings

This paper employs the Adam algorithm as the optimizer. The $\lambda_{CDAN}$ and $\lambda_{JMMD}$ settings for this article are as follows:
$$\lambda_{CDAN} = 1, \qquad \lambda_{JMMD} = \frac{2}{1 + e^{-10 \times \frac{current\_epoch - middle\_epoch}{max\_epoch - middle\_epoch}}} - 1$$
where $middle\_epoch$ is set to 0. Different datasets have varying maximum numbers of epochs, and the parameter $max\_epoch$ differs accordingly. The maximum number of epochs for the CWRU dataset is 80, and, for the PU dataset, it is 800.
When $current\_epoch \in [0, 40)$, the learning rate is set to $10^{-3}$. When $current\_epoch \in [40, 60)$, the learning rate is set to $10^{-4}$. When $current\_epoch \in [60, max\_epoch)$, the learning rate is set to $10^{-5}$.
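The schedule above can be sketched in a few lines. This is a minimal illustration that assumes the JMMD trade-off follows the usual sigmoid-shaped ramp 2 / (1 + exp(-10 p)) - 1, with p the training progress, and that the learning rates are 1e-3, 1e-4, and 1e-5; the minus signs are a reading of the formula, and the function names are hypothetical.

```python
import math


def lambda_jmmd(current_epoch, max_epoch, middle_epoch=0):
    """Trade-off for the JMMD loss: ramps from 0 towards 1 during training."""
    progress = (current_epoch - middle_epoch) / (max_epoch - middle_epoch)
    return 2.0 / (1.0 + math.exp(-10.0 * progress)) - 1.0


def learning_rate(current_epoch):
    """Piecewise-constant learning rate with steps at epochs 40 and 60."""
    if current_epoch < 40:
        return 1e-3
    if current_epoch < 60:
        return 1e-4
    return 1e-5


LAMBDA_CDAN = 1.0  # constant trade-off for the CDAN loss

# Example: values over a CWRU run with max_epoch = 80.
schedule = [(e, lambda_jmmd(e, 80), learning_rate(e)) for e in (0, 20, 40, 60, 79)]
```

Under this reading, the JMMD term is switched off at the first epoch and approaches full strength towards the end of training, while the CDAN trade-off stays fixed.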

5. Experimental Verification

5.1. Experiment on Unsharp Mask Parameter Settings

We selected nine combinations of the σ and λ parameters of the Unsharp Masking algorithm based on reference [34] and evaluated them on the CWRU dataset. The ResNet model was utilized, and each experiment was repeated five times. Table 6 reports the corresponding results, which demonstrate the feasibility of the USM algorithm.
Table 6 shows that applying the USM algorithm improved the overall fault diagnosis performance for most parameter settings. In transfer task 2-3, the accuracy of all nine parameter configurations selected in this study was higher than the result obtained with the original CWT images. Moreover, the choice of parameters had a certain impact on the final results. Specifically, the algorithm performed best with σ = 1.0 , λ = 1.5 , achieving an average accuracy of 86.59%, compared with 84.89% without the USM algorithm. The setting σ = 1.0 corresponds to moderate blurring, and λ = 1.5 corresponds to strong enhancement of image edges and details during sharpening. This combination reduces minor noise without losing too many detailed features. It makes the fault features more pronounced without excessively emphasizing noise, effectively balancing the signal-to-noise ratio and showcasing the frequency and time information of CWT images at different scales. Therefore, σ = 1.0 , λ = 1.5 were selected as the parameter settings for the Unsharp Masking algorithm.
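To make the roles of the two parameters concrete, the following pure-Python sketch applies the textbook form of unsharp masking, sharpened = image + strength × (image − Gaussian blur of the image), to a toy grayscale image. It is an illustration under stated assumptions, not the implementation used in the experiments: the function names, the kernel radius of about 3σ, the edge clamping, and the clipping to [0, 255] are all choices made here, and `strength` stands for the sharpening parameter written λ in the text.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel with a radius of about 3*sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def unsharp_mask(image, sigma=1.0, strength=1.5):
    """Add the scaled high-frequency residual back to the image."""
    blurred = gaussian_blur(image, sigma)
    return [[min(255.0, max(0.0, p + strength * (p - b)))
             for p, b in zip(row, blurred_row)]
            for row, blurred_row in zip(image, blurred)]


# Toy 5x5 image with a vertical edge between columns 2 and 3.
image = [[50.0, 50.0, 50.0, 200.0, 200.0] for _ in range(5)]
sharpened = unsharp_mask(image, sigma=1.0, strength=1.5)
```

On the toy image, pixels on the dark side of the edge become darker and pixels on the bright side become brighter, while a uniform region is left unchanged. A larger σ widens the band around each edge that is affected, and a larger strength increases the contrast boost.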

5.2. Comparative Experiment of Image Processing Method

Comparative experiments were conducted on the CWRU dataset to validate the effectiveness of the proposed image extraction method (CWT-USM) by combining the CWT and USM in the signal feature extraction process. In the experiment, several different image transformation methods [35] were selected for comparison, including Gramian Angular Summation Fields (GASF), Gramian Angular Difference Fields (GADF), Recurrence Plot (RP), and Markov Transition Fields (MTF) methods. The corresponding two-dimensional images are shown in Figure 7.
The ResNet model was selected for the experiments. A 50% data overlap was used for all image processing methods, and the number of sampling points per signal period was the same as that used for the CWT algorithm. Figure 8 depicts the results after conducting five experiments for each method and averaging the results. The accuracy of images processed using the GASF, GADF, RP, and MTF methods in transfer task 0-1 was below 60%. In contrast, the accuracy of images processed with the CWT-USM method in transfer task 0-1 was 85.04%, which is 30.40 percentage points higher than the second-highest accuracy, achieved by MTF (54.64%). In the other transfer tasks, the accuracy of RP and MTF was significantly higher than that of GASF and GADF, but still lower than that of the CWT-USM method proposed in this study. The results indicate that the CWT-USM method can extract richer and more accurate data features, significantly improving the accuracy of bearing fault diagnosis.

5.3. Comparative Experiments with Different Dimensional Inputs

To compare the impact of different dimensional inputs on fault diagnosis outcomes, we evaluated the original one-dimensional (1D) time domain signal, the 1D frequency domain signal processed by FFT, and the proposed CWT-USM method, which includes time–frequency domain information. The experiments were conducted using the ResNet model, and the results are shown in Table 7.
The results indicate that CWT-USM outperformed the 1D frequency domain input across all transfer tasks. Although the accuracy of CWT-USM was lower than that of the 1D time domain input in transfer tasks 0-3, 1-2, 1-3, and 2-1, its overall average accuracy was higher. Specifically, the CWT-USM method improved the average accuracy by 3.61 percentage points compared to the 1D time domain input and by 14.46 percentage points compared to the 1D frequency domain input.
These experimental results demonstrate the superiority of using CWT-USM as input. By encompassing both frequency domain and time domain information relating to the vibration signal, CWT-USM provides richer feature information, leading to better fault diagnosis performance.
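The averages and differences quoted above can be reproduced directly from the per-task accuracies in Table 7, as the following short check illustrates. The variable names are illustrative; the numbers are copied from the table.

```python
# Per-task accuracies (%) from Table 7, in the order 0-1, 0-2, ..., 3-2.
time_domain = [80.32, 79.84, 75.65, 81.77, 99.84, 96.69,
               74.19, 93.87, 94.68, 69.11, 72.34, 77.42]
freq_domain = [73.22, 67.31, 54.48, 72.45, 79.01, 81.34,
               67.00, 83.43, 82.33, 55.78, 70.35, 78.86]
cwt_usm = [85.04, 80.24, 72.32, 95.12, 95.28, 87.28,
           88.60, 89.92, 95.64, 79.84, 79.20, 90.56]


def mean(values):
    return sum(values) / len(values)


avg_time = round(mean(time_domain), 2)
avg_freq = round(mean(freq_domain), 2)
avg_cwt = round(mean(cwt_usm), 2)

# Differences in percentage points between average accuracies.
gain_over_time = round(mean(cwt_usm) - mean(time_domain), 2)
gain_over_freq = round(mean(cwt_usm) - mean(freq_domain), 2)

# Tasks where the 1D time domain input is more accurate than CWT-USM.
tasks = ["0-1", "0-2", "0-3", "1-0", "1-2", "1-3",
         "2-0", "2-1", "2-3", "3-0", "3-1", "3-2"]
time_wins = [t for t, a, b in zip(tasks, time_domain, cwt_usm) if a > b]
```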

5.4. Comparative Experiments on Different Domain Adaptation Strategies

To make the evaluation more convincing and more general, the experiments in this section use the PU dataset in addition to the CWRU dataset. The experiments compared several transfer strategies: the baseline model without any transfer strategy (SimAM-ResNet), a model using the Conditional Adversarial Domain Adaptation Network (SimAM-ResNet-CDAN), a model using the Joint Maximum Mean Discrepancy (SimAM-ResNet-JMMD), a model combining CDAN and JMMD but without the adaptive weighting algorithm (SimAM-ResNet-CDAN-JMMD), and the proposed method (CWT-SimAM-DAMS). The experiments were repeated five times, and the results on the CWRU and PU datasets are presented in Table 8 and Table 9, as well as Figure 9 and Figure 10, respectively.
On the CWRU dataset, SimAM-ResNet-CDAN-JMMD, which combines two domain adaptation algorithms, achieved a higher average fault diagnosis accuracy than SimAM-ResNet without domain adaptation and than SimAM-ResNet-CDAN and SimAM-ResNet-JMMD, which each use a single domain adaptation algorithm. However, in some migration tasks its accuracy was lower than that obtained with a single domain adaptation algorithm, e.g., migration task 0-2. The proposed method, CWT-SimAM-DAMS, achieves an accuracy greater than or equal to that of the other domain adaptation algorithms across all migration tasks, and it resolves the accuracy drop of SimAM-ResNet-CDAN-JMMD relative to SimAM-ResNet-CDAN and SimAM-ResNet-JMMD on migration task 0-2. On the PU dataset, the proposed method showed a significant improvement in migration tasks 0-2 and 2-1. Although its accuracy was below that of the best competing method in migration tasks 0-1, 1-0, and 2-0, the accuracy in tasks 0-2 and 2-1 improved by 14.03 and 13.42 percentage points, respectively, compared to the SimAM-ResNet-CDAN-JMMD method. Thus, it significantly improves the average fault diagnosis accuracy. Compared to the other domain adaptation algorithms, the proposed CWT-SimAM-DAMS method exhibits stronger adaptability and accuracy. Because it adjusts the weights of three optimization objectives (classification, JMMD, and CDAN) during training, this method reduces the distribution differences between the source and target domains, enhancing the model’s ability to diagnose bearing failures.

5.5. Model Comparison Experiment

To verify the feasibility of our proposed bearing fault diagnosis model compared to other bearing fault diagnosis models, we selected several common algorithms and models in bearing fault diagnosis for comparative verification. Each model was applied five times to obtain the average diagnostic accuracy. Figure 11 and Figure 12, as well as Table 10 and Table 11, present the experimental results comparing the CWT-SimAM-DAMS model with the competitor models. Table 12 and Table 13 present the training and testing times of different models. Additionally, the performance of each method was assessed through confusion matrices, as shown in Figure 13 and Figure 14.
The experimental results show that the CWT-SimAM-DAMS model achieved an average accuracy of 99.29% on the CWRU dataset and 86.93% on the PU dataset. Compared to several traditional bearing fault diagnosis methods, the CWT-SimAM-DAMS method significantly improves average accuracy. Specifically, based on Table 10 and Table 11, the average accuracy of the CWT-SimAM-DAMS method on the CWRU and PU datasets was 12.70 and 25.42 percentage points higher, respectively, than that of the traditional ResNet model (86.59% and 61.51%). Similarly, compared to the CNN model, the CWT-SimAM-DAMS method achieved an average accuracy improvement of 12.18 and 30.66 percentage points on the CWRU and PU datasets, respectively. This indicates that the CWT-SimAM-DAMS model has superior feature extraction and domain alignment capabilities.
Table 12 and Table 13 show that the ResNet model had the longest training and testing times on both the CWRU and PU datasets. Although the CWT-SimAM-DAMS model has relatively long training times compared to other models, its testing time does not significantly increase. For the CWRU dataset, the training time difference between all models was less than one minute, and the testing time of the CWT-SimAM-DAMS model was only 0.068898 min longer than the fastest AlexNet model. For the PU dataset, the training time difference between all models was within 10 min, and the testing time of the CWT-SimAM-DAMS model was only 0.4784722 min longer than the fastest CNN model. Considering that, in practical industrial applications, model training is usually conducted offline, training time is not a critical issue compared to model accuracy. Additionally, the difference in testing time between models is not significant. Taking both model accuracy and testing time into account, the CWT-SimAM-DAMS model still has a significant advantage.

5.6. Ablation Study

Various ablation experiments were conducted on the CWRU dataset to verify the proposed CWT-SimAM-DAMS method, referred to as Method 1 in Table 14, and the efficacy of each component. These ablation experiments involved systematically removing key modules of Method 1 and observing their impact on the final performance, thereby revealing the contributions and importance of each module.
Comparing the different combinations shows that removing the adaptive weighting module led to a significant decrease in performance, indicating the critical importance of this module for the effectiveness of the CWT-SimAM-DAMS method. Conversely, when CWT-USM was replaced with regular CWT or the residual network integrated with SimAM was replaced by a standard residual network, performance decreased, but the impact was relatively minor. This indicates that, while the image processing module and the residual network integrated with SimAM contribute to performance enhancement, their effect is not as pronounced as that of the adaptive weighting strategy module. The complete Method 1 model outperformed all other combinations in average accuracy, validating that integrating all modules achieves the best performance.

6. Conclusions

This study proposes a bearing fault diagnosis method based on SimAM and an adaptive weighting transfer strategy. The proposed method transforms one-dimensional vibration time series signals of bearing faults into CWT images and enhances the detailed features of the images using the USM algorithm, facilitating feature extraction by the model. Integrating the SimAM attention mechanism into the residual network enhances the model’s feature extraction capability in the source domain. Additionally, by combining the JMMD and CDAN algorithms and employing a weight adaptive strategy, the domain adaptation transfer capability of the model is strengthened.
The proposed method is validated on both the CWRU and PU datasets, achieving an accuracy of 99.29% on the CWRU dataset and 86.93% on the PU dataset, representing a significant improvement compared to other models. Moreover, ablation experiments conducted on the CWRU dataset verify the importance and effectiveness of each component. The experimental results demonstrate that this method effectively reduces the distribution difference between the source and target domains, improving fault diagnosis accuracy.
In future research, further optimization of the model architecture will be pursued to enhance its generalization, and application in more realistic industrial scenarios will be explored. Additionally, refinement of model parameters will be conducted to improve both training and testing times for fault diagnosis while maintaining the model’s accuracy.

Author Contributions

Conceptualization, Z.T., X.H. (Xinhao Hou) and X.W.; methodology, Z.T. and X.H. (Xinhao Hou); software, Z.T., X.H. (Xinhao Hou), X.H. (Xinheng Huang) and J.Z.; validation, Z.T., X.H. (Xinhao Hou), X.H. (Xinheng Huang) and J.Z.; formal analysis, Z.T.; investigation, Z.T., X.H. (Xinhao Hou) and X.W.; resources, Z.T. and X.W.; data curation, Z.T. and J.Z.; writing—original draft preparation, Z.T.; writing—review and editing, Z.T. and X.W.; funding acquisition, Z.T. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the key project of the National University Innovation and Entrepreneurship Training Programs Foundation under grant number 202210060002.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting this study are included in the article. For further details, please contact the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wei, H.; Zhang, Q.; Shang, M.; Gu, Y. Extreme learning machine-based classifier for fault diagnosis of rotating machinery using a residual network and continuous wavelet transform. Measurement 2021, 183, 109864.
  2. Jiang, X.; Song, Q.; Wang, H.; Du, G.; Guo, J.; Shen, C.; Zhu, Z. Central frequency mode decomposition and its applications to the fault diagnosis of rotating machines. Mech. Mach. Theory 2022, 174, 104919.
  3. Wan, L.; Li, Y.; Chen, K.; Gong, K.; Li, C. A novel deep convolution multi-adversarial domain adaptation model for rolling bearing fault diagnosis. Measurement 2022, 191, 110752.
  4. Kompella, K.D.; Rao, M.V.G.; Rao, R.S. Bearing fault detection in a 3 phase induction motor using stator current frequency spectral subtraction with various wavelet decomposition techniques. Ain Shams Eng. J. 2018, 9, 2427–2439.
  5. Zhao, Z.; Zhang, Q.; Yu, X.; Sun, C.; Wang, S.; Yan, R.; Chen, X. Applications of unsupervised deep transfer learning to intelligent fault diagnosis: A survey and comparative study. IEEE Trans. Instrum. Meas. 2021, 70, 1–28.
  6. Wang, X.; Wang, X.; Li, T.; Zhao, X. A fault diagnosis method based on a rainbow recursive plot and deep convolutional neural networks. Energies 2023, 16, 4357.
  7. Wang, X.; Wang, X.; Zhang, X.; Chen, Q. Motor fault diagnosis under variable working conditions based on two-dimensional time series and transfer learning. In Proceedings of the 2022 25th International Conference on Electrical Machines and Systems (ICEMS), Chiang Mai, Thailand, 29 November–2 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
  8. Gu, J.; Peng, Y.; Lu, H.; Chang, X.; Chen, G. A novel fault diagnosis method of rotating machinery via VMD, CWT and improved CNN. Measurement 2022, 200, 111635.
  9. Zhang, J.; Zhang, J.; Zhong, M.; Zheng, J.; Yao, L. A GOA-MSVM based strategy to achieve high fault identification accuracy for rotating machinery under different load conditions. Measurement 2020, 163, 108067.
  10. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
  11. Cheng, Y.; Lin, M.; Wu, J.; Zhu, H.; Shao, X. Intelligent fault diagnosis of rotating machinery based on continuous wavelet transform-local binary convolutional neural network. Knowl.-Based Syst. 2021, 216, 106796.
  12. Wang, S.; Tian, J.; Liang, P.; Xu, X.; Yu, Z.; Liu, S.; Zhang, D. Single and simultaneous fault diagnosis of gearbox via wavelet transform and improved deep residual network under imbalanced data. Eng. Appl. Artif. Intell. 2024, 133, 108146.
  13. Jiang, Q.; Lin, X.; Lu, X.; Shen, Y.; Zhu, Q.; Zhang, Q. Self-supervised learning-based dual-classifier domain adaptation model for rolling bearings cross-domain fault diagnosis. Knowl.-Based Syst. 2024, 284, 111229.
  14. Zhang, T.; Liu, S.; Wei, Y.; Zhang, H. A novel feature adaptive extraction method based on deep learning for bearing fault diagnosis. Measurement 2021, 185, 110030.
  15. Lu, W.; Liang, B.; Cheng, Y.; Meng, D.; Yang, J.; Zhang, T. Deep model based domain adaptation for fault diagnosis. IEEE Trans. Ind. Electron. 2016, 64, 2296–2305.
  16. Schwendemann, S.; Amjad, Z.; Sikora, A. Bearing fault diagnosis with intermediate domain based layered maximum mean discrepancy: A new transfer learning approach. Eng. Appl. Artif. Intell. 2021, 105, 104415.
  17. Lu, N.; Xiao, H.; Sun, Y.; Han, M.; Wang, Y. A new method for intelligent fault diagnosis of machines based on unsupervised domain adaptation. Neurocomputing 2021, 427, 96–109.
  18. Mao, W.; Liu, Y.; Ding, L.; Safian, A.; Liang, X. A new structured domain adversarial neural network for transfer fault diagnosis of rolling bearings under different working conditions. IEEE Trans. Instrum. Meas. 2020, 70, 1–13.
  19. Chen, J.; Liu, H. A multi-gradient hierarchical domain adaptation network for transfer diagnosis of bearing faults. Expert Syst. Appl. 2023, 225, 120139.
  20. Sun, H.; He, Z.; Zi, Y.; Yuan, J.; Wang, X.; Chen, J.; He, S. Multiwavelet transform and its applications in mechanical fault diagnosis–a review. Mech. Syst. Signal Process. 2014, 43, 1–24.
  21. Du, Y.; Chen, Y.; Meng, G.; Ding, J.; Xiao, Y. Fault severity monitoring of rolling bearings based on texture feature extraction of sparse time–frequency images. Appl. Sci. 2018, 8, 1538.
  22. Meng, L.; Su, Y.; Kong, X.; Xu, T.; Lan, X.; Li, Y. Intelligent fault diagnosis of gearbox based on differential continuous wavelet transform-parallel multi-block fusion residual network. Measurement 2023, 206, 112318.
  23. Wang, D.; Gao, T. An efficient USM sharpening detection method for small-size JPEG image. J. Inf. Secur. Appl. 2020, 51, 102451.
  24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  25. Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 18–24 July 2021; pp. 11863–11874.
  26. Sun, D.; Meng, Z.; Guan, Y.; Liu, J.; Cao, W.; Fan, F. Intelligent fault diagnosis scheme for rolling bearing based on domain adaptation in one dimensional feature matching. Appl. Soft Comput. 2023, 146, 110669.
  27. Wu, Z.; Jiang, H.; Lu, T.; Zhao, K. A deep transfer maximum classifier discrepancy method for rolling bearing fault diagnosis under few labeled data. Knowl.-Based Syst. 2020, 196, 105814.
  28. Borgwardt, K.M.; Gretton, A.; Rasch, M.J.; Kriegel, H.-P.; Schölkopf, B.; Smola, A.J. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 2006, 22, e49–e57.
  29. Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep transfer learning with joint adaptation networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 2208–2217.
  30. Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2018; Volume 31, pp. 1647–1657.
  31. Bearings Data Center, Seeded Fault Test Data, Case Western Reserve University. Available online: http://csegroups.case.edu/bearingdatacenter/pages/download-data-file (accessed on 19 February 2024).
  32. Lessmeier, C.; Kimotho, J.K.; Zimmer, D. KAt-DataCenter, Chair of Design and Drive Technology. Paderborn University. 2022. Available online: https://mb.uni-paderborn.de/kat/forschung/datacenter/bearing-datacenter/ (accessed on 19 February 2024).
  33. Lessmeier, C.; Kimotho, J.K.; Zimmer, D.; Sextro, W. Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification. In Proceedings of the PHM Society European Conference, Bilbao, Spain, 5–8 July 2016; Volume 3.
  34. Ye, J.; Shen, Z.; Behrani, P.; Ding, F.; Shi, Y.-Q. Detecting USM image sharpening by using CNN. Signal Process. Image Commun. 2018, 68, 258–264.
  35. Sun, Y.; Wang, W. Role of image feature enhancement in intelligent fault diagnosis for mechanical equipment: A review. Eng. Fail. Anal. 2023, 156, 107815.
Figure 1. The basic architecture of ResNet.
Figure 2. The architecture of the SimAM attention mechanism.
Figure 3. The proposed CWT-SimAM-DAMS framework.
Figure 4. Residual network structure integrated with the SimAM attention mechanism.
Figure 5. CWRU testing platform.
Figure 6. PU testing platform.
Figure 7. Data preprocessing 2D images: (a) Gramian Angular Difference Fields; (b) Recurrence Plot; (c) Markov Transition Fields; (d) Gramian Angular Summation Fields.
Figure 8. Experimental results of comparison between different two-dimensional images.
Figure 9. Average accuracy (%) of different domain adaptation methods on the CWRU dataset.
Figure 10. Average accuracy (%) of different domain adaptation methods on the PU dataset.
Figure 11. Average diagnostic accuracy (%) of different models on the CWRU dataset.
Figure 12. Average diagnostic accuracy (%) of different models on the PU dataset.
Figure 13. Visualization of confusion matrix for different models in the target domain of CWRU dataset 0-1 migration task.
Figure 14. Visualization of confusion matrix for different models in the target domain of PU dataset 0-1 migration task.
Table 1. Parameters of the residual network structure integrated with the SimAM attention mechanism.

| Layer | Operation | Parameter | Output Size |
|---|---|---|---|
| SimAM_module | - | - | 32 × 32 × 3 |
| Conv2d | Conv2d | kernel_size = 7, stride = 2, pad = 3 | 16 × 16 × 64 |
| Layer1 | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 16 × 16 × 64 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 16 × 16 × 64 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 16 × 16 × 64 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 16 × 16 × 64 |
| Layer2 | Conv2d | kernel_size = 3, stride = 2, pad = 1 | 8 × 8 × 128 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 8 × 8 × 128 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 8 × 8 × 128 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 8 × 8 × 128 |
| Layer3 | Conv2d | kernel_size = 3, stride = 2, pad = 1 | 4 × 4 × 256 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 4 × 4 × 256 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 4 × 4 × 256 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 4 × 4 × 256 |
| Layer4 | Conv2d | kernel_size = 3, stride = 2, pad = 1 | 2 × 2 × 512 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 2 × 2 × 512 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 2 × 2 × 512 |
| | Conv2d | kernel_size = 3, stride = 1, pad = 1 | 2 × 2 × 512 |
| AdaptiveAvgPool | - | - | 1 × 1 × 512 |
Table 2. CWRU dataset operating conditions and data splitting.

| Task | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Speed (rpm) | 1797 | 1772 | 1750 | 1730 |
| Load (HP) | 0 | 1 | 2 | 3 |
| Dataset: train | 75 | 75 | 75 | 75 |
| Dataset: test | 25 | 25 | 25 | 25 |
Table 3. CWRU dataset fault condition information.

| Class Label | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Fault Location | NA | IF | BF | OF | IF | BF | OF | IF | BF | OF |
| Fault Size (mils) | 0 | 7 | 7 | 7 | 14 | 14 | 14 | 21 | 21 | 21 |
Table 4. PU dataset operating conditions and data splitting.

| Task | 0 | 1 | 2 |
|---|---|---|---|
| Load (Nm) | 0.7 | 0.1 | 0.7 |
| Radial force (N) | 1000 | 1000 | 400 |
| Speed (rpm) | 1500 | 1500 | 1500 |
| Dataset: train | 75 | 75 | 75 |
| Dataset: test | 25 | 25 | 25 |
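Table 5. PU dataset fault condition information.

| Class Label | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bearing Code | KA04 | KA15 | KA16 | KA22 | KA30 | KB23 | KB24 | KB27 | KI14 | KI16 | KI17 | KI18 | KI21 |
| Bearing Element | OR | OR | OR | OR | OR | IR(+OR) | IR(+OR) | IR | IR | IR | IR | IR | IR |
| Combination | S | S | R | S | R | M | M | M | M | S | R | S | S |

S: single damage; M: multiple damage; R: repetitive damage; IR: inner ring; OR: outer ring.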
Table 6. Experimental results of USM algorithm parameter comparison (accuracy, %).

| Parameters of USM | 0-1 | 0-2 | 0-3 | 1-0 | 1-2 | 1-3 | 2-0 | 2-1 | 2-3 | 3-0 | 3-1 | 3-2 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Not using USM | 79.96 | 76.4 | 68.88 | 92.72 | 94.8 | 84.84 | 88.12 | 90.48 | 93.7 | 80.56 | 79.12 | 89.12 | 84.89 |
| σ = 0.7, α = 1.0 | 69.04 | 73.6 | 71.76 | 92.48 | 95.52 | 90.24 | 88.32 | 89.84 | 95.68 | 83.12 | 78.96 | 91.6 | 85.01 |
| σ = 1.0, α = 0.5 | 85.2 | 76.72 | 72.16 | 90.64 | 95.76 | 90.92 | 86.16 | 91.84 | 97.12 | 80.48 | 78 | 89.28 | 86.19 |
| σ = 1.0, α = 0.8 | 76.88 | 81.16 | 71.04 | 87.44 | 91.28 | 86.8 | 87.6 | 88.4 | 96.16 | 81.04 | 77.88 | 89.12 | 84.57 |
| σ = 1.0, α = 1.0 | 80.96 | 76.16 | 70.32 | 90.6 | 95.28 | 88.48 | 87.84 | 90.08 | 97.84 | 77.76 | 79.12 | 88.4 | 85.24 |
| σ = 1.0, α = 1.3 | 80 | 75.28 | 72.76 | 94.24 | 93.68 | 90 | 88.56 | 89.76 | 97.44 | 79.84 | 79.2 | 89.84 | 85.88 |
| σ = 1.0, α = 1.5 | 85.04 | 80.24 | 72.32 | 95.12 | 95.28 | 87.28 | 88.6 | 89.92 | 95.64 | 79.84 | 79.2 | 90.56 | 86.59 |
| σ = 1.3, α = 1.0 | 74.56 | 78.56 | 73.44 | 89.28 | 94.48 | 83.68 | 88.8 | 89.84 | 98.68 | 78.56 | 80.16 | 91.84 | 85.16 |
| σ = 1.3, α = 1.5 | 79.56 | 76.186 | 70.32 | 91.12 | 91.574 | 85.866 | 89.76 | 91.04 | 95.84 | 79.84 | 78.64 | 88.24 | 84.83 |
| σ = 1.5, α = 1.0 | 78.56 | 72.32 | 68.48 | 91.44 | 97.76 | 87.28 | 91.52 | 88.88 | 95.48 | 80.4 | 80.08 | 90.48 | 85.22 |
Table 7. Comparative experiments with different dimensional inputs for the CWRU dataset (accuracy, %).

| Methods | 0-1 | 0-2 | 0-3 | 1-0 | 1-2 | 1-3 | 2-0 | 2-1 | 2-3 | 3-0 | 3-1 | 3-2 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1D time domain input | 80.32 | 79.84 | 75.65 | 81.77 | 99.84 | 96.69 | 74.19 | 93.87 | 94.68 | 69.11 | 72.34 | 77.42 | 82.98 |
| 1D frequency domain input | 73.22 | 67.31 | 54.48 | 72.45 | 79.01 | 81.34 | 67 | 83.43 | 82.33 | 55.78 | 70.35 | 78.86 | 72.13 |
| CWT-USM | 85.04 | 80.24 | 72.32 | 95.12 | 95.28 | 87.28 | 88.6 | 89.92 | 95.64 | 79.84 | 79.2 | 90.56 | 86.59 |
Table 8. Experimental results of comparing different domain adaptation strategies on the CWRU dataset (accuracy, %).

| Methods | 0-1 | 0-2 | 0-3 | 1-0 | 1-2 | 1-3 | 2-0 | 2-1 | 2-3 | 3-0 | 3-1 | 3-2 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SimAM-ResNet | 81.04 | 80.4 | 73.68 | 96 | 94.4 | 85.6 | 89.44 | 89.92 | 94.8 | 77.92 | 79.2 | 93.8 | 86.28 |
| SimAM-ResNet-JMMD | 94 | 98.64 | 89.68 | 99.6 | 99.92 | 99.52 | 99.6 | 99.92 | 100 | 88.16 | 91.76 | 99.76 | 96.71 |
| SimAM-ResNet-CDAN | 96.88 | 98.88 | 86.8 | 99.84 | 100 | 99.84 | 95.36 | 100 | 99.92 | 81.76 | 85.6 | 99.68 | 95.38 |
| SimAM-ResNet-CDAN-JMMD | 97.28 | 94.4 | 92.08 | 100 | 99.92 | 99.84 | 100 | 96.24 | 100 | 95.36 | 97.44 | 99.28 | 97.65 |
| CWT-SimAM-DAMS | 100 | 99.84 | 92.08 | 100 | 100 | 100 | 100 | 100 | 100 | 99.92 | 99.84 | 99.84 | 99.29 |
Table 9. Experimental results of comparing different domain adaptation strategies on the PU dataset (accuracy, %).

| Methods | 0-1 | 0-2 | 1-0 | 1-2 | 2-0 | 2-1 | Average |
|---|---|---|---|---|---|---|---|
| SimAM-ResNet | 82.4 | 50.65 | 84.92 | 51.63 | 40.74 | 41.66 | 58.67 |
| SimAM-ResNet-JMMD | 97.11 | 74.52 | 97.66 | 79.82 | 74.89 | 72.18 | 82.7 |
| SimAM-ResNet-CDAN | 97.17 | 60.18 | 96.55 | 73.05 | 65.6 | 70.09 | 77.11 |
| SimAM-ResNet-CDAN-JMMD | 97.05 | 69.78 | 98.4 | 77.05 | 79.2 | 77.04 | 83.09 |
| CWT-SimAM-DAMS | 97.05 | 83.81 | 98.09 | 80.37 | 71.81 | 90.46 | 86.93 |
Table 10. Average diagnostic accuracy (%) of different models on the CWRU dataset.

| Methods | 0-1 | 0-2 | 0-3 | 1-0 | 1-2 | 1-3 | 2-0 | 2-1 | 2-3 | 3-0 | 3-1 | 3-2 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VGG | 33.52 | 36 | 35.76 | 40 | 42.4 | 30.8 | 36.48 | 40.4 | 49.72 | 38.96 | 32.88 | 33.84 | 37.56 |
| AlexNet | 76.54 | 75.04 | 74.72 | 99.12 | 90.96 | 82.4 | 85.84 | 88.8 | 97.12 | 80.48 | 80.16 | 82.24 | 84.45 |
| LeNet | 84.96 | 73.76 | 73.38 | 98.64 | 89.68 | 80 | 92.24 | 92.26 | 85.4 | 79.44 | 76.24 | 79.52 | 83.79 |
| CNN | 79.36 | 73.2 | 71.52 | 93.92 | 94.96 | 91.44 | 89.52 | 89.68 | 97.84 | 84 | 83.36 | 96.56 | 87.11 |
| ResNet | 85.04 | 80.24 | 72.32 | 95.12 | 95.28 | 87.28 | 88.6 | 89.92 | 95.64 | 79.84 | 79.2 | 90.56 | 86.59 |
| CWT-SimAM-DAMS | 100 | 99.84 | 92.08 | 100 | 100 | 100 | 100 | 100 | 100 | 99.92 | 99.84 | 99.84 | 99.29 |
Table 11. Average diagnostic accuracy (%) of different models on the PU dataset.

| Methods | 0-1 | 0-2 | 1-0 | 1-2 | 2-0 | 2-1 | Average |
|---|---|---|---|---|---|---|---|
| VGG | 41.34 | 14.74 | 38.36 | 11.08 | 19.16 | 22.548 | 24.54 |
| AlexNet | 71.48 | 40.12 | 79.22 | 51.97 | 34.02 | 30.32 | 51.19 |
| LeNet | 79.79 | 27.99 | 67.14 | 34.41 | 29.23 | 24.62 | 43.86 |
| CNN | 85.16 | 38.03 | 86.7 | 46.95 | 38.26 | 42.5 | 56.27 |
| ResNet | 83.82 | 47.38 | 87.95 | 59.26 | 42.48 | 48.16 | 61.51 |
| CWT-SimAM-DAMS | 97.05 | 83.81 | 98.09 | 80.37 | 71.81 | 90.46 | 86.93 |
Table 12. Average training and testing time of each model on the CWRU dataset.
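| Methods | VGG | AlexNet | LeNet | CNN | ResNet | CWT-SimAM-DAMS |
| --- | --- | --- | --- | --- | --- | --- |
| Training time (min) | 1.856917 | 1.658083 | 1.841167 | 2.258319 | 2.541681 | 2.431865 |
| Test time (min) | 0.282778 | 0.268042 | 0.316833 | 0.381056 | 0.385319 | 0.33694 |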
Table 13. Average training and testing time of each model on the PU dataset.
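| Methods | VGG | AlexNet | LeNet | CNN | ResNet | CWT-SimAM-DAMS |
| --- | --- | --- | --- | --- | --- | --- |
| Training time (min) | 27.183528 | 25.044139 | 23.018417 | 23.274472 | 32.809889 | 30.7255 |
| Test time (min) | 4.0978333 | 3.8351389 | 3.8351944 | 3.7904167 | 4.8703611 | 4.2688889 |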
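To put the training times in Table 13 on a common scale, each value can be divided by that of a chosen baseline model. The sketch below does this for the PU dataset; the dictionary `TRAIN_MIN_PU` and the helper `relative_to` are hypothetical names introduced only for illustration, and the choice of baseline is an assumption of this example.

```python
# Average training times (min) copied from Table 13 (PU dataset).
# The dictionary and helper names are illustrative, not from the paper.
TRAIN_MIN_PU = {
    "VGG": 27.183528,
    "AlexNet": 25.044139,
    "LeNet": 23.018417,
    "CNN": 23.274472,
    "ResNet": 32.809889,
    "CWT-SimAM-DAMS": 30.7255,
}


def relative_to(times, baseline):
    """Return each model's time as a ratio of the baseline model's time."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


if __name__ == "__main__":
    # Ratios below 1 mean the model trains faster than the baseline.
    for name, ratio in relative_to(TRAIN_MIN_PU, "ResNet").items():
        print(f"{name}: {ratio:.3f} x ResNet training time")
```

With ResNet as the baseline, the ratio for CWT-SimAM-DAMS falls below 1, that is, about 6% less training time than plain ResNet on this dataset, while it remains slower than the lighter networks such as LeNet.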
Table 14. Ablation experiment results on the CWRU dataset.
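Modules: Image Processing (CWT-USM), Model Selection (SimAM-ResNet), Domain Adaptation (Adaptive Weight Strategy). Accuracy is given for each transfer task and as the average.

| Method | Module marks | 0-1 | 0-2 | 0-3 | 1-0 | 1-2 | 1-3 | 2-0 | 2-1 | 2-3 | 3-0 | 3-1 | 3-2 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Method 1 |  | 100 | 99.84 | 92.08 | 100 | 100 | 100 | 100 | 100 | 100 | 99.92 | 99.84 | 99.84 | 99.29 |
| Method 2 | × | 81.04 | 80.4 | 73.68 | 96 | 94.4 | 85.6 | 89.44 | 89.92 | 94.8 | 77.92 | 79.2 | 93.8 | 86.28 |
| Method 3 | ×× | 85.04 | 80.24 | 72.32 | 95.12 | 95.28 | 87.28 | 88.6 | 89.92 | 95.64 | 79.84 | 79.2 | 90.56 | 86.59 |
| Method 4 | × | 96.16 | 90.56 | 93.36 | 99.92 | 99.76 | 99.92 | 97.76 | 98.8 | 100 | 99.92 | 99.9 | 99.8 | 97.92 |
| Method 5 | ×× | 80.08 | 73.5 | 75 | 93.92 | 94.32 | 84.72 | 85.2 | 89.68 | 94.4 | 78.32 | 77.12 | 85.6 | 84.76 |
| Method 6 | × | 100 | 99.92 | 93.44 | 100 | 100 | 100 | 100 | 100 | 100 | 92.16 | 84.48 | 99.76 | 97.48 |
| Method 7 | ××× | 79.96 | 76.4 | 68.88 | 92.72 | 94.8 | 84.84 | 88.12 | 90.48 | 93.7 | 80.56 | 79.12 | 89.12 | 84.89 |
| Method 8 | ×× | 99.92 | 97.52 | 90.08 | 99.84 | 99.84 | 99.76 | 100 | 100 | 100 | 95.92 | 85.68 | 99.84 | 97.37 |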
✓ indicates that the corresponding module is selected; × indicates that it is not selected.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tang, Z.; Hou, X.; Huang, X.; Wang, X.; Zou, J. Domain Adaptation for Bearing Fault Diagnosis Based on SimAM and Adaptive Weighting Strategy. Sensors 2024, 24, 4251. https://doi.org/10.3390/s24134251
