Abstract

Deformation is one of the critical response quantities reflecting the structural safety of dams. To enhance outlier identification and denoising in dam deformation monitoring data, this study proposes a novel preprocessing method based on optimized Variational Mode Decomposition (VMD) and Kernel Density Estimation (KDE). The approach systematically processes data in three steps: First, VMD decomposes raw data into intrinsic mode functions without recursion. The parallel Jaya algorithm is used to adaptively optimize VMD parameters for improved decomposition. Second, the intrinsic mode functions containing outlier and noise characteristics are identified and separated using sample entropy and correlation coefficients. Finally, KDE thresholds are applied for outlier localization, while a data superposition method ensures effective denoising. Validation using simulated deformation data and Global Navigation Satellite Systems (GNSS)-based observed horizontal deformation from dam engineering demonstrates the method’s robustness in accurately identifying outliers and denoising data, achieving superior preprocessing performance.

1. Introduction

Traditional dam deformation monitoring often relies on instruments such as levels, theodolites, and total stations, which require manual operation [1]. Although modern total stations and wire extensometers have enabled automated operations, these methods demand stringent observation conditions. In recent years, digital technologies have been increasingly integrated into dam engineering [2], offering innovative solutions for data analysis, monitoring, and decision-making. These advancements enhance the ability to process complex datasets, predict structural behavior, and identify anomalies. Global Navigation Satellite System (GNSS) observations can compensate for the shortcomings of traditional monitoring methods, enhancing the accuracy and stability of dam deformation monitoring [3,4]. By continuously tracking the precise positions of reference points on dams over time, GNSS enables the detection of even subtle movements or deformations [5]. The installation of receivers at key locations facilitates the identification of slow and gradual changes that may not be evident through visual inspection alone.

With the development and application of GNSS in dam deformation monitoring, a substantial amount of deformation monitoring data has been accumulated during both construction and operation phases. The precision of this monitoring data is crucial for accurately assessing dam structural behavior and ensuring safety. Under the influence of accidental or non-accidental uncertainties (e.g., poor atmospheric conditions, poor satellite geometry, signal interference, and human factors), GNSS-based deformation monitoring sequences are often affected by noise or outliers. These interferences complicate dam safety evaluations and reduce their reliability. Therefore, it is essential to preprocess the raw GNSS deformation data to improve its quality before conducting a comprehensive dam safety analysis.

Existing methods for outlier identification and denoising in dam deformation data are mainly categorized into three types: statistical analysis methods, regression-based methods, and signal processing methods [6]. Statistical analysis methods often rely on assumptions about the underlying data distribution, such as normality, and utilize statistical metrics to identify data points that deviate significantly from the expected pattern [7]. The PauTa criterion, Chauvenet criterion, and Grubbs criterion are widely employed in statistical analysis and are primarily designed for deformation data that follow a normal distribution. Non-parametric methods, such as boxplot analysis, provide a robust alternative by summarizing key statistical characteristics like central tendency, dispersion, and skewness, enabling outlier detection even for deformation data with non-normal distributions [8]. Regression-based methods, such as robust regression [9,10,11] and machine learning-aided regression approaches [12,13,14,15], are powerful tools for outlier identification or data denoising. These methods model the relationship between target variables and influential variables, making them effective in many scenarios. However, their performance can be hindered by incomplete data or violation of underlying assumptions regarding data distribution and variable relationships.

Signal decomposition algorithms are widely recognized as powerful tools for preprocessing and data mining in time-series analysis. These algorithms decompose the original signal into a set of modal sequences with distinct frequencies and bandwidths. High-frequency modes typically capture noise or roughness features, while middle- and low-frequency modes contain trending or cyclic patterns relevant to deformation sequences. Commonly used decomposition algorithms for deformation sequences include the Wavelet Transform (WT) [16,17], Short-Time Fourier Transform (STFT) [18], Empirical Mode Decomposition (EMD) [19,20,21], and Ensemble Empirical Mode Decomposition (EEMD) [22,23,24]. However, the effectiveness of these methods is limited by certain inherent challenges. For instance, the efficacy of WT depends on the appropriate selection of wavelet basis functions [25]. The performance of STFT is constrained by its fixed time-frequency resolution, while EMD and its variants are susceptible to modal aliasing and endpoint effects, which may compromise the reliability and accuracy of data decomposition.

Variational Mode Decomposition (VMD), proposed by Dragomiretskiy et al. [26], is a signal processing algorithm designed for non-recursive and adaptive decomposition of time-series signals. By effectively mitigating issues such as modal aliasing and endpoint effects, VMD enhances the accuracy and robustness of signal analysis. To date, it has been widely applied across various research domains, such as time-series prediction [27,28,29], fault diagnosis [30,31,32], and structural health monitoring [33,34].

In this study, we address the challenge of preprocessing GNSS-based dam deformation monitoring data to enhance outlier identification and denoising performance. We propose a novel method combining Optimized Variational Mode Decomposition (OVMD) and Kernel Density Estimation (KDE). The workflow of the proposed method is illustrated in Figure 1. The key novelties and contributions of this research can be summarized as follows:

(1). A novel framework based on the envelope function and Parallel Jaya (PJaya) algorithm is proposed to optimize the key parameters of VMD.
(2). Sample entropy and the Pearson correlation coefficient are selected as hybrid discriminating indicators to distinguish the high-frequency Intrinsic Mode Functions (IMFs) obtained by OVMD.
(3). KDE is employed to identify the outlier information concealed within the extracted high-frequency IMFs, while data denoising is accomplished by the superposition of the remaining low-frequency IMFs.

The rest of the paper is organized as follows: Section 2 describes the related methodologies and the procedure of the proposed method. Case studies, together with the results and discussion, are presented in Section 3. Finally, Section 4 summarizes the key findings and provides recommendations for future research.

2. Methods

2.1. Optimized Variational Mode Decomposition

2.1.1. Variational Mode Decomposition

VMD transforms the non-recursive decomposition problem of the data sequence into a variational problem. This approach iteratively searches for the optimal solution of the variational modes, enabling the determination of the center frequency and bandwidth of each IMF. As a result, the data sequence is effectively separated from low to high frequencies.

The estimation of each IMF is achieved through the following steps: the computation of the one-sided spectrum of each mode component using the Hilbert transform; the addition of a correction exponential term to adjust the central frequency, thereby modulating the individual spectrum to correspond to the baseband; and the estimation of the bandwidth of the demodulated signal with the assistance of $H^{1}$ Gaussian smoothing, which enables the decomposition of the data sequence into a constrained variational problem [26], as follows:

(1) $\left\{\begin{array}{l} \min_{\{u_{k}\}, \{ω_{k}\}} \{\sum_{k} {‖\partial_{t} [(δ (t) + \frac{i}{π t}) \ast u_{k} (t)] e^{- i ω_{k} t}‖}_{2}^{2}\} \\ \mathrm{s.t.} \; \sum_{k = 1}^{K} u_{k} (t) = f (t) \end{array}\right.$

where $u_{k}$ denotes the mode function, $ω_{k}$ is the center frequency of each mode, $K$ is the number of modes, $\ast$ denotes convolution, and $f (t)$ denotes the original deformation monitoring sequence.

Introducing a quadratic penalty factor $α$ and a Lagrange multiplier $λ (t)$ , the above constrained variational problem is transformed into an unconstrained variational problem, as follows:

(2) $\begin{aligned} L (\{u_{k}\}, \{ω_{k}\}, λ) = {} & α \sum_{k} {‖\partial_{t} [(δ (t) + \frac{i}{π t}) \ast u_{k} (t)] e^{- i ω_{k} t}‖}_{2}^{2} \\ & + {‖f (t) - \sum_{k} u_{k} (t)‖}_{2}^{2} + \langle λ (t), f (t) - \sum_{k} u_{k} (t) \rangle \end{aligned}$

The saddle point of Equation (2), which is the optimal solution of the constrained problem in Equation (1), is found with the alternating direction method of multipliers (ADMM) by alternately updating $u_{k}^{n + 1}$ , $ω_{k}^{n + 1}$ , and $λ^{n + 1}$ , where $n + 1$ is the index of the current update.

The final solution for ${\hat{u}}_{k}^{n + 1}$ is obtained as follows:

(3) ${\hat{u}}_{k}^{n + 1} (ω) = \frac{\hat{f} (ω) - \sum_{i \neq k} {\hat{u}}_{i} (ω) + \frac{\hat{λ} (ω)}{2}}{1 + 2 α {(ω - ω_{k})}^{2}}$

Similarly, the final solution for $ω_{k}^{n + 1}$ is:

(4) $ω_{k}^{n + 1} = \frac{\int_{0}^{\infty} ω {|{\hat{u}}_{k} (ω)|}^{2} d ω}{\int_{0}^{\infty} {|{\hat{u}}_{k} (ω)|}^{2} d ω}$

In summary, the main steps for implementing VMD are as follows:

(1). Set the mode number $K$ and the penalty parameter $α$ , and initialize the modal functions $\{{\hat{u}}_{k}^{1}\}$ , the center frequencies $\{{\hat{ω}}_{k}^{1}\}$ , and the Lagrange multiplier ${\hat{λ}}^{1}$ . Let $n \leftarrow 0$ .
(2). Let $n \leftarrow n + 1$ , and update ${\hat{u}}_{k}$ and ${\hat{ω}}_{k}$ in accordance with Equations (3) and (4).
(3). Update ${\hat{λ}}^{n + 1} = {\hat{λ}}^{n} + τ (\hat{f} - \sum_{k} {\hat{u}}_{k}^{n + 1})$ , where $τ$ is the noise threshold parameter in the deformation monitoring sequence.
(4). Compute the convergence measure $D = \sum_{k} {‖{\hat{u}}_{k}^{n + 1} - {\hat{u}}_{k}^{n}‖}_{2}^{2} / {‖{\hat{u}}_{k}^{n}‖}_{2}^{2}$ and terminate the loop once $D < ε$ is satisfied (here $ε = 0.01$ ). Otherwise, return to step (2).
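
The per-iteration updates of steps (2) and (4) can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not the authors' implementation: spectra are passed as plain lists of complex or real values sampled on a non-negative frequency grid `omegas`, the function names are hypothetical, and a small constant is added to the denominator of the convergence measure to avoid division by zero when a mode is initialized to zero.

```python
def update_mode(f_hat, others_hat, lam_hat, omegas, omega_k, alpha):
    """Equation (3): Wiener-filter-like update of one mode spectrum.

    others_hat is the sum of the spectra of all modes except mode k.
    """
    return [
        (f - o + lam / 2) / (1 + 2 * alpha * (w - omega_k) ** 2)
        for f, o, lam, w in zip(f_hat, others_hat, lam_hat, omegas)
    ]


def update_center_frequency(u_hat, omegas):
    """Equation (4): power-weighted mean frequency of the mode spectrum."""
    power = [abs(u) ** 2 for u in u_hat]
    return sum(w * p for w, p in zip(omegas, power)) / sum(power)


def relative_change(new_modes, old_modes, tiny=1e-12):
    """Convergence measure D of step (4), summed over all modes.

    `tiny` is an assumption added here to guard against zero-valued modes.
    """
    total = 0.0
    for new, old in zip(new_modes, old_modes):
        num = sum(abs(a - b) ** 2 for a, b in zip(new, old))
        den = sum(abs(b) ** 2 for b in old) + tiny
        total += num / den
    return total
```

For a spectrum with a dominant peak at 0.1 and a weaker one at 0.3, starting from a center frequency of 0.12, one pass of `update_mode` followed by `update_center_frequency` pulls the center frequency toward 0.1, since the filter in Equation (3) strongly attenuates content far from the current center.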

2.1.2. PJaya Algorithm

The Jaya algorithm is a population-based optimization algorithm proposed by R. Venkata Rao [35]. It is effective for solving both constrained and unconstrained optimization problems. The name “Jaya” means victory or success in Sanskrit, reflecting the algorithm’s goal of finding the optimal solution. The idea of the Jaya algorithm is to move toward the best solution while avoiding the worst solution. This is realized by applying a set of updating rules to each individual in the population, where the individual is updated based on the difference between the best candidate, the existing one, and the worst solution. In the ith iteration, the value of the jth variable for the kth candidate is updated as follows:

(5) $x_{j, k, i}^{'} = x_{j, k, i} + r_{1 j, i} (x_{j, best, i} - |x_{j, k, i}|) - r_{2 j, i} (x_{j, worst, i} - |x_{j, k, i}|)$

where $x_{j, best, i}$ and $x_{j, worst, i}$ denote the values of the jth variable for the best candidate and the worst candidate, respectively, and $r_{1 j, i}$ and $r_{2 j, i}$ denote two random coefficients in the range [0, 1] for the jth variable in the ith iteration.

To enhance the computational efficiency of the Jaya algorithm, a variant of the Jaya algorithm, PJaya, is adopted for VMD parameter optimization. PJaya introduces a strategy that partitions the initial population into $P$ sub-populations, allowing each sub-population to evolve independently. Within each sub-population, the solution $x_{j, k, i}^{'}$ is updated according to Equation (5), promoting parallel exploration of the solution space. Consequently, PJaya leverages parallel architecture to ensure efficient resource utilization, making it especially effective for parameter tuning of VMD. In this study, the initial parameters for the PJaya algorithm are specified as follows: the number of iterations is set to 20, the population size is set to 500, and the number of sub-populations is set to 5.
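
A minimal sketch of the PJaya idea is given below, assuming a generic objective to be minimized over box bounds. It is not the authors' code: the function names are hypothetical, the sub-populations are evolved sequentially (each call could instead be dispatched to its own worker), candidates are clipped to the bounds, and a candidate replaces its parent only if it improves the objective, which is the acceptance rule of the standard Jaya algorithm but is not spelled out in the text above.

```python
import random


def jaya_step(pop, objective, bounds, rng):
    """One Jaya iteration (Equation (5)) on a single sub-population."""
    scores = [objective(x) for x in pop]
    best = pop[scores.index(min(scores))]
    worst = pop[scores.index(max(scores))]
    new_pop = []
    for x, score in zip(pop, scores):
        cand = []
        for j, (lo, hi) in enumerate(bounds):
            r1, r2 = rng.random(), rng.random()
            v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
            cand.append(min(max(v, lo), hi))  # assumption: clip to the search box
        # Greedy acceptance: keep the candidate only if it is better.
        new_pop.append(cand if objective(cand) < score else x)
    return new_pop


def pjaya(objective, bounds, pop_size=500, n_sub=5, n_iter=20, seed=0):
    """Split the population into n_sub groups and evolve each independently."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    size = pop_size // n_sub
    subs = [pop[i * size:(i + 1) * size] for i in range(n_sub)]
    for _ in range(n_iter):
        subs = [jaya_step(s, objective, bounds, rng) for s in subs]
    # Return the best individual found across all sub-populations.
    return min((x for s in subs for x in s), key=objective)
```

The defaults mirror the settings reported above (20 iterations, population of 500, 5 sub-populations). When the search variables are $(K, α)$ , the mode number would additionally have to be rounded to an integer before each VMD run; that step is omitted here.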

2.1.3. Optimize the VMD Parameters Using the PJaya Algorithm

The mode number $K$ and the penalty parameter $α$ of VMD are two crucial controlling parameters that have a significant impact on the performance of data decomposition. If $K$ is too large, the sequence may be over-decomposed. Conversely, if $K$ is too small, the sequence may be incompletely decomposed, leading to insufficient extraction of characteristic information. Similarly, if $α$ is set too large or too small, the modes of each layer may be too smooth or overfitted. Envelope entropy [36] is a metric used to describe the complexity of signal envelopes, which can characterize the time-frequency characteristics and sparsity of IMFs, as shown below:

(6) $\begin{cases} E_{k} = - \sum_{i = 1}^{N} p_{i} \log_{2} p_{i} \\ p_{i} = a (i) / \sum_{j = 1}^{N} a (j) \end{cases}$

where $E_{k}$ denotes the envelope entropy of the kth mode ( $k = 1, 2, \dots, K$ ), $N$ denotes the number of samples in each mode, and $a (i)$ represents the envelope signal sequence.

The optimization of VMD parameters can be transformed into a constrained optimization problem. In this study, the PJaya algorithm is applied to determine the optimal VMD parameters, i.e., the mode number $K$ and the penalty parameter $α$ , where minimizing the sum of the envelope entropies of the first and second modes is taken as the objective:

(7) $(K_{best}, α_{best}) = \underset{(K, α)}{\arg\min} \{E_{1} + E_{2}\}$

The proposed optimization process adaptively determines the optimal parameters for VMD, removing the reliance on manual parameter tuning. This capability addresses the inherent uncertainty caused by random or subjective parameter selection in traditional methods. By incorporating an intelligent optimization mechanism, OVMD enhances the robustness and reliability of the decomposition process, ensuring higher accuracy and consistency in its outcomes.
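
As a small illustration of Equations (6) and (7), the sketch below evaluates the envelope entropy in its usual Shannon form, $- \sum p_{i} \log_{2} p_{i}$ , and the resulting objective. It assumes that the non-negative envelope $a (i)$ of each mode has already been computed (for example, as the magnitude of the analytic signal), which is not shown; the function names are hypothetical.

```python
import math


def envelope_entropy(envelope):
    """Equation (6): base-2 Shannon entropy of the normalized envelope a(i)."""
    total = sum(envelope)
    probs = [a / total for a in envelope]
    # Terms with p = 0 contribute nothing (0 * log 0 is taken as 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


def vmd_objective(envelopes):
    """Equation (7): E_1 + E_2, using the envelopes of the first two modes."""
    return envelope_entropy(envelopes[0]) + envelope_entropy(envelopes[1])
```

A flat envelope of four samples gives the maximum entropy of 2 bits, whereas an envelope concentrated in a single sample gives 0, which is why a lower value indicates a sparser, more regular mode.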

2.2. Kernel Density Estimation

KDE is a non-parametric approach used to estimate the probability density function (PDF) of an unknown distribution. KDE works by placing a kernel, a small probability distribution, at each observed data point and summing these contributions to obtain a smooth estimate of the PDF. KDE is particularly useful in exploratory data analysis, as it provides insights into the shape of the underlying distribution without making strong assumptions about its form.

Given a set of independent and identically distributed observations $X_{1}, X_{2}, \dots, X_{n}$ from an unknown PDF $f (x)$ , the goal of KDE is to construct an estimate $\hat{f} (x)$ of $f (x)$ . For one-dimensional data, the general formula of KDE is as follows:

(8) ${\hat{f}}_{d} (x) = \frac{1}{n d} \sum_{i = 1}^{n} K (\frac{x - X_{i}}{d})$

where ${\hat{f}}_{d} (x)$ is the estimated density at point $x$ , $K (\cdot)$ denotes the kernel function, and $d > 0$ denotes the bandwidth parameter. The kernel function $K (\cdot)$ is a symmetric, non-negative function used to measure the similarity between two variables. In this study, the widely used Gaussian kernel is selected. The bandwidth $d$ controls the degree of smoothing. A larger $d$ results in a smoother estimate but may lead to underfitting, whereas a smaller $d$ captures more local variations but risks overfitting.

Once $\hat{f} (x)$ is estimated, outliers can be identified by setting a threshold on the density values. Typically, this threshold is set as a fraction of the maximum density, $μ \cdot {\hat{f}}_{\max}$ $(μ \leq 0.05)$ . Any data point with a density below this threshold can be considered a potential outlier. The choice of the threshold is application-dependent: a lower threshold is more conservative and flags fewer points as outliers, while a higher threshold flags more. In this study, we set $μ = 0.01$ as the threshold for outlier detection, as this value effectively captures data points located in the tail region of the density distribution, where the probability density falls below 1% of the peak density.
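
The thresholding rule can be sketched as follows. This is an illustrative fragment rather than the authors' implementation, and it rests on two assumptions that the text does not settle: the bandwidth $d$ defaults to the normal-reference rule of thumb $1.06 \, σ \, n^{- 1 / 5}$ , and ${\hat{f}}_{\max}$ is approximated by the largest density found at the sample points themselves. The function names are hypothetical.

```python
import math
import statistics


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates Equation (8) with a Gaussian kernel."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


def kde_outliers(data, mu=0.01, bandwidth=None):
    """Indices of points whose density is below mu times the peak density."""
    if bandwidth is None:
        # Assumption: rule-of-thumb bandwidth; the source does not specify d.
        bandwidth = 1.06 * statistics.stdev(data) * len(data) ** (-0.2)
    density = gaussian_kde(data, bandwidth)
    dens = [density(x) for x in data]
    threshold = mu * max(dens)  # peak approximated at the sample points
    return [i for i, d in enumerate(dens) if d < threshold]
```

Because each sample contributes to its own density, an isolated point always has a density of at least $K (0) / (n d)$ . With $μ = 0.01$ the rule therefore needs a reasonably long sequence (on the order of several hundred samples or more) before an isolated spike can fall below the threshold; the cost of this direct evaluation grows quadratically with the sequence length.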

2.3. Discriminating Indicators

After decomposing the dam deformation data sequence using VMD, a total of K mode functions with different frequencies can be obtained. Among them, the low- and medium-frequency modes retain the main trend and cycle change characteristics of the deformation monitoring sequence, while the high-frequency modes contain a large amount of redundant information, including noise and outliers. Therefore, it is crucial to identify the demarcation point between the high-frequency modes and the middle- and low-frequency modes to distinguish the redundant information. In this study, we adopted sample entropy and the Pearson correlation coefficient as discriminating indicators to identify the high-frequency IMFs.

2.3.1. Sample Entropy

Sample entropy [37] is a statistical indicator used to measure the complexity and randomness of time series data. When the sequence complexity and stochasticity are stronger, the sample entropy is larger. Conversely, when the sequence periodicity and trend are more prominent, the sample entropy is smaller [38]. The main principle of sample entropy is as follows:

For the mode function sequence $\{u (i), i = 1, 2, \dots, N\}$ , construct a set of $m$ -dimensional vector sequences as follows:

(9) $U (i) = [u (i), u (i + 1), \dots, u (i + m - 1)], i = 1, 2, \dots, N - m + 1$

Define the distance $D_{i j}$ between $U (i)$ and $U (j)$ as the maximum absolute difference between their corresponding elements:

(10) $D_{i j} = \max_{0 \leq k \leq m - 1} | u (i + k) - u (j + k) |$

For a given $U (i)$ and a similarity tolerance parameter $r$ , define $C_{i}$ as the number of $D_{i j}$ less than $r$ . Define the ratio of $C_{i}$ to the total number of distances $N - m + 1$ as $C_{i}^{m} (r)$ :

(11) $C_{i}^{m} (r) = \frac{C_{i}}{N - m + 1}, i = 1, 2, \dots, N - m$

Define $C^{m} (r)$ as:

(12) $C^{m} (r) = \frac{1}{N - m} \sum_{i = 1}^{N - m} C_{i}^{m}$

Update $m = m + 1$ and repeat the above calculation steps to obtain $C^{m + 1} (r)$ .

Theoretically, the sample entropy is expressed as follows:

(13) $S E (m, r) = \lim_{N \to \infty} [- \ln \frac{C^{m + 1} (r)}{C^{m} (r)}]$

For a sequence of finite length N, Equation (13) is estimated in the following form:

(14) $S E (m, r, N) = \ln C^{m} (r) - \ln C^{m + 1} (r)$

where $m$ is set to 2, and $r$ is set to 0.2 times the standard deviation of the data sequence. The sample entropy threshold is empirically set to 0.2 to identify high-frequency modes: if the sample entropy of an IMF exceeds the threshold, the IMF is regarded as a potential high-frequency mode function.
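
The calculation in Equations (9) to (14) can be sketched as below. The fragment follows the common formulation of sample entropy, in which self-matches are excluded and the same $N - m$ templates are used for both template lengths, so its normalization may differ in detail from the authors' implementation; the function name is hypothetical. Because both counts share the same normalizing factor, the factor cancels in Equation (14) and raw match counts are used directly.

```python
import math
import statistics


def sample_entropy(u, m=2, r_factor=0.2):
    """Sample entropy SE(m, r, N) with r = r_factor * std(u)."""
    n = len(u)
    r = r_factor * statistics.pstdev(u)

    def count_matches(length):
        # Same n - m templates for both lengths (Equation (9)).
        templates = [u[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # i < j: no self-matches
                # Chebyshev distance between templates (Equation (10)).
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist < r:
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # entropy is undefined without matches
    return math.log(b) - math.log(a)
```

A smooth periodic sequence yields a markedly smaller value than white noise of the same length, which is the property exploited when comparing each IMF against the threshold.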

2.3.2. Pearson Correlation Coefficient

The Pearson correlation coefficient is a statistical metric of the strength of the linear relationship between two continuous variables. Its expression is shown as follows:

(15) $PCC (f, u) = \frac{\sum_{i = 1}^{N} (f_{i} - \bar{f}) (u_{i} - \bar{u})}{\sqrt{\sum_{i = 1}^{N} {(f_{i} - \bar{f})}^{2} \sum_{i = 1}^{N} {(u_{i} - \bar{u})}^{2}}}$

where $f$ is the original data sequence, $u$ is the IMF, and $\bar{f}$ and $\bar{u}$ are the mean values of $f$ and $u$ , respectively. Empirically, the high-frequency IMFs, which contain noise and outlier features, have a low correlation with the original data sequence. Therefore, the correlation coefficient threshold is set to 0.2 to identify potential high-frequency mode functions.
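
Equation (15) and the threshold test can be sketched as follows. How the two indicators are combined is not stated explicitly, so the rule below, which flags an IMF only when its sample entropy exceeds 0.2 and its correlation with the original sequence is below 0.2, is an assumption made for illustration; the function names are hypothetical.

```python
import math


def pearson(f, u):
    """Equation (15): Pearson correlation between sequences f and u."""
    n = len(f)
    mean_f = sum(f) / n
    mean_u = sum(u) / n
    cov = sum((a - mean_f) * (b - mean_u) for a, b in zip(f, u))
    var_f = sum((a - mean_f) ** 2 for a in f)
    var_u = sum((b - mean_u) ** 2 for b in u)
    return cov / math.sqrt(var_f * var_u)


def is_high_frequency(se, pcc, se_threshold=0.2, pcc_threshold=0.2):
    """Assumed screening rule: both indicators must point to a noisy mode."""
    return se > se_threshold and pcc < pcc_threshold
```

Requiring both conditions is the more cautious choice, since a mode is then discarded only when it is both irregular and weakly related to the measured sequence.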

2.4. Model Implementation

In summary, the proposed pre-processing method of dam deformation monitoring data via OVMD and KDE is shown in Figure 2, and the specific implementation steps are as follows:

(1). Parameter optimization: Input the deformation monitoring data and calculate the optimal parameter combination of VMD using the average envelope entropy objective function and the PJaya algorithm.
(2). Data decomposition: Decompose the original deformation data using VMD with the optimized parameters to obtain $K$ IMFs, ordered from low to high frequency as IMF₁~IMF_K.
(3). Mode screening: Calculate the sample entropy and Pearson correlation coefficient of each IMF and identify the group of high-frequency modes by comparing these indicators against the thresholds.
(4). Outlier identification: Sum the high-frequency IMFs to obtain a reconstructed sequence. The PDF of the reconstructed sequence is estimated using KDE, and the threshold is set to $μ \cdot {\hat{f}}_{\max}$ . Any data point with a density below this threshold is identified as an outlier and eliminated.
(5). Data denoising: Sum the remaining IMFs to obtain the denoised sequence, thereby achieving data noise reduction.
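
The superposition in steps (4) and (5) amounts to two element-wise sums. A minimal sketch, assuming the IMFs are available as equal-length lists and that the screening of step (3) has already produced one Boolean flag per IMF (the function name is hypothetical):

```python
def split_and_reconstruct(imfs, high_flags):
    """Sum flagged IMFs into the high-frequency sequence, the rest into the denoised one."""
    n = len(imfs[0])
    high = [0.0] * n   # input to the KDE-based outlier search, step (4)
    low = [0.0] * n    # denoised sequence, step (5)
    for imf, flagged in zip(imfs, high_flags):
        target = high if flagged else low
        for i, value in enumerate(imf):
            target[i] += value
    return high, low
```

Since every IMF goes to exactly one of the two sums, the two returned sequences add up to the superposition of all IMFs.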

3. Case Study

In this study, we applied the proposed OVMD-KDE method for dam deformation data preprocessing. To verify the validity of the proposed method, both simulation data (case study 1) and measured horizontal deformation data (case study 2) were used. For comparison, the OVMD method was benchmarked against the standard EMD, EEMD, and VMD methods. In both the EMD and EEMD methods, the number of IMFs is adaptively determined based on the signal characteristics. For EEMD, the controlling parameters, namely ensemble size and noise standard deviation, are set to 25 and 0.1, respectively. Similarly, for the standard VMD method, the controlling parameters, namely the number of IMFs and the penalty coefficient, are set to 10 and 3000, respectively.

To systematically evaluate the effectiveness of the proposed OVMD-KDE and provide a comprehensive comparison with the other preprocessing methods, we adopted three metrics: the Signal-to-Noise Ratio (SNR), Pearson Correlation Coefficient (PCC), and Signal Energy Ratio (SER). SNR compares the level of the desired signal with the level of background noise, while PCC and SER reflect the similarity in shape and in energy, respectively, between the sequences before and after denoising. The evaluation metrics are defined as follows:

(16) $SNR = 10 \cdot \lg (\frac{\sum_{i = 1}^{N} f_{i}^{2}}{\sum_{i = 1}^{N} {(f_{i} - y_{i})}^{2}})$

(17) $PCC (f, y) = \frac{\sum_{i = 1}^{N} (f_{i} - \bar{f}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{N} {(f_{i} - \bar{f})}^{2} \sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}}$

(18) $SER = \frac{\sqrt{\sum_{i = 1}^{N} y_{i}^{2}}}{\sqrt{\sum_{i = 1}^{N} {f_{i}}^{2}}}$

where $f$ is the original simulation data sequence, $y$ is the denoised sequence, $\bar{f}$ and $\bar{y}$ are the mean values of $f$ and $y$ , respectively, and $N$ is the number of samples. For the SNR, $PCC (f, y)$ , and SER metrics, the larger the value, the better the performance of the data preprocessing.
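
Equations (16) and (18) translate directly into code. The sketch below uses hypothetical function names and takes $\lg$ to be the base-10 logarithm; PCC in Equation (17) has the same form as Equation (15) and is therefore omitted.

```python
import math


def snr(f, y):
    """Equation (16): signal-to-noise ratio in decibels."""
    signal = sum(a * a for a in f)
    noise = sum((a - b) ** 2 for a, b in zip(f, y))
    return 10.0 * math.log10(signal / noise)


def ser(f, y):
    """Equation (18): ratio of the root energy of y to that of f."""
    return math.sqrt(sum(b * b for b in y)) / math.sqrt(sum(a * a for a in f))
```

Note that `snr` is undefined when `y` equals `f` exactly, because the residual energy in the denominator is then zero.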

3.1. Case Study 1: Preprocessing of the Simulated Deformation Data

The monitored dam deformation can be regarded as a combination of real deformation and noise sequences [39]. According to existing research, dam deformation is influenced by hydrostatic pressure, temperature, and time-dependent effects [40,41,42], where the hydrostatic pressure and temperature components are typically characterized by low-frequency cyclic variations, while the time-dependent component typically exhibits a trend-like variation [43,44,45]. To study the inherent characteristics of dam deformation, a set of simulation data series is generated. The simulation data consists of two parts: a periodic term to simulate the deformation under the influence of reservoir water level and temperature, and a trend term to simulate the time-dependent deformation. The expression of the original simulation data sequence $f_{1}$ is as follows:

(19) $f_{1} = 10 \sin [2 π (t - 90) / 365] + \cos (2 π t / 365) + 2 \log (t / 100 + 1)$

where the number of data sampling points is 2000, and $t$ denotes the time ( $t = 1, 2, \dots, 2000$ ).

To simulate the noise and outlier characteristics of dam deformation data, Gaussian white noise with a standard deviation of 1.0 and an outlier sequence with an amplitude of 10.0 were artificially generated. These two sequences, jointly denoted as $f_{n}$ , were subsequently superimposed onto the original simulation sequence $f_{1}$ to generate the noisy data sequence $f_{2}$ . The sequences $f_{1}$ and $f_{2}$ are illustrated in Figure 3a and Figure 3b, respectively.
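
A sketch of how such a test sequence could be generated from Equation (19) is given below. Several details are assumptions, since they are not stated above: $\log$ is taken as the natural logarithm, the number of outliers (`n_outliers`), their random positions and signs, and the random seed are hypothetical choices, and the function name is illustrative.

```python
import math
import random


def simulate(n=2000, noise_std=1.0, outlier_amp=10.0, n_outliers=20, seed=0):
    """Return (f1, fn, f2, outlier_idx) for t = 1, ..., n."""
    rng = random.Random(seed)
    t = range(1, n + 1)
    # Equation (19): two periodic terms plus a logarithmic trend term.
    f1 = [
        10 * math.sin(2 * math.pi * (ti - 90) / 365)
        + math.cos(2 * math.pi * ti / 365)
        + 2 * math.log(ti / 100 + 1)
        for ti in t
    ]
    # Gaussian white noise with the stated standard deviation.
    fn = [rng.gauss(0.0, noise_std) for _ in t]
    # Assumption: outliers of fixed amplitude at random positions and signs.
    outlier_idx = sorted(rng.sample(range(n), n_outliers))
    for i in outlier_idx:
        fn[i] += rng.choice((-1.0, 1.0)) * outlier_amp
    f2 = [a + b for a, b in zip(f1, fn)]
    return f1, fn, f2, outlier_idx
```

Returning the outlier positions alongside the sequences makes it possible to score any detection method against a known ground truth.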

For OVMD, the optimized values of $(K, α)$ are $(7, 6.56 \times 10^{4})$ . Based on the obtained optimized parameters, OVMD is adopted to decompose the signal $f_{2}$ into seven IMFs, as illustrated in Figure 4. It can be seen from Figure 4 that the IMF₁ and IMF₂ have low-frequency periodic characteristics, while the rest of the IMFs show irregular high-frequency characteristics.

To distinguish the IMF with noisy features, the sample entropy and correlation coefficient discriminators of each IMF are calculated, as listed in Table 1. It is observed that IMF₃~IMF₆ can be categorized as high-frequency, which is consistent with the frequency characteristics reflected in Figure 4.

The reconstructed high-frequency sequence (marked as $f_{3}$ ) is obtained by superposition of IMF₃~IMF₆, and $f_{3}$ is shown in Figure 3c, in comparison with $f_{n}$ . It can be observed that the distribution range of $f_{3}$ is close to that of $f_{n}$ , and the mutation values in $f_{3}$ and $f_{n}$ almost overlap. Next, we apply KDE to estimate the probability density function of $f_{3}$ and determine the upper limit to localize the outliers. It is seen from Figure 3c that the precision of outlier identification is nearly 100%.

So far, the high-frequency IMFs have been excluded according to the obtained sample entropy threshold and correlation coefficient threshold. The preprocessed data is obtained by superimposing IMF₁ and IMF₂, as shown in Figure 3d. It is evident that the reconstructed data sequence closely matches the original simulation data $f_{1}$ , exhibiting a high degree of fidelity and smoothness. Since the noise signal is known, SNR is adopted as the evaluation metric herein to measure the effectiveness, and the SNR values of different methods are listed in Table 2. It is seen that the SNR value of OVMD is the largest, indicating that the proposed OVMD-KDE method outperforms the other methods.

3.2. Case Study 2: Preprocessing of the Measured Dam Horizontal Deformation

YQ Dam, an earth dam located in north China, was selected as a case study. The dam reaches a maximum height of 24 m and has a crest length of over 2000 m, with a total capacity of 13 × 10⁸ m³. To effectively monitor and analyze the surface deformation of the dam, a comprehensive GNSS deformation monitoring system has been installed, see Figure 5. However, the accuracy of the deformation monitoring data is compromised by various factors, including poor satellite signals and a complex observation environment. As a result, the data is often noisy, complicating the modeling and analysis process. In this study, the horizontal deformation data of the dam slope collected from four monitoring points were selected for data preprocessing. The time series of the data is shown in Figure 6, where the monitoring frequency is once per day.

The optimized parameters of OVMD are summarized in Table 3, while the discriminating indicators of different deformation are listed in Table 4. Based on the proposed OVMD-KDE method, the dam deformation data collected from four GNSS receivers is automatically preprocessed. Taking the deformation collected from the GNSS-1 receiver as an example, the corresponding decomposed IMFs are shown in Figure 7. It can be seen from Figure 7 that the OVMD can extract the features of the original deformation sequence through its adaptive decomposition mechanism. Notably, IMF₁ and IMF₂, identified as the most significant components based on the SE and PCC discriminative indicators, are utilized for deformation reconstruction and preprocessing. Figure 8a–d show the time series of the measured deformation and preprocessed deformation, where the blue curves denote the measured deformation and the red curves denote the preprocessed deformation.

As displayed in Figure 7, the original measured data exhibits significant noise and local abrupt changes, particularly noticeable around days 1000 and 1500. After preprocessing with the OVMD-KDE method, the noise is significantly reduced and the underlying deformation trends become clearer. The proposed method effectively mitigates high-frequency noise and anomalous local spikes, providing a smoother and more reliable representation of the real deformation behavior.

Generally, it is impossible to obtain noise-free data due to the influence of various complex factors, and the real noise signal is unknown. Consequently, SNR is not appropriate as an evaluation metric in this case. To evaluate the preprocessing performance of each method in terms of noise reduction, only SER and PCC metrics were calculated. Table 5 lists the results of the evaluation metrics, and the bar plots of the SER and PCC metrics in different scenarios are illustrated in Figure 9 and Figure 10, respectively.

As seen from the evaluation metric values, the proposed OVMD-KDE method not only achieves the highest SER values but also maintains the highest PCC values across different scenarios (GNSS-1 to GNSS-4). The obtained results highlight the robustness and effectiveness of the OVMD-KDE method in preprocessing dam deformation data.

4. Conclusions

In this study, we present a novel method for preprocessing dam deformation data utilizing the OVMD-KDE approach. This effectively reduces data noise and enhances outlier identification in in-situ dam deformation data collected through GNSS technology. The method is validated using both simulation data and measured horizontal deformation data collected from a dam engineering project. The following conclusions can be drawn from the analysis:

(1). The proposed method utilizes the average envelope entropy as the objective function and leverages a PJaya algorithm to optimize the number of decomposition modes $K$ adaptively and the penalty parameter $α$ . This approach addresses the limitations associated with manual parameter selection and mitigates the risks of over- and under-decomposition of the deformation data.
(2). By employing sample entropy and the correlation coefficient as discriminant indicators, the proposed method can accurately identify high-frequency mode functions. Additionally, outliers in residual high-frequency mode functions are effectively located through kernel density estimation, facilitating precise localization and identification of outliers in dam deformation data.
(3). By superimposing low- and medium-frequency mode functions, the obtained preprocessed data effectively eliminates noise from the original dataset. The efficacy and superiority of the proposed data preprocessing method have been validated through multiple evaluation metrics and comparisons with other signal decomposition-based approaches, including standard VMD, EMD, and EEMD.

For future research, the following remarks can be considered:

(1). Develop a robust data preprocessing framework for denoising dam deformation data, even with incomplete datasets. Additionally, extend the application of the proposed preprocessing method to analyze other types of dam monitoring data, including but not limited to seepage pressure, stress distribution, and temperature gradients.
(2). The proposed method is currently designed for data preprocessing and quality control, not real-time monitoring and anomaly detection. Future research should focus on integrating the preprocessing method with real-time warning systems to improve anomaly detection performance in dam safety monitoring.

Author Contributions

Conceptualization, S.C. and J.S.; methodology, S.C. and C.L.; software, S.C. and C.L.; formal analysis, S.C.; investigation, S.C. and Y.G.; resources, Y.G.; writing—original draft preparation, S.C.; writing—review and editing, C.L. and M.A.H.-A.; visualization, S.C. and C.L.; funding acquisition, S.C., C.L. and J.S. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1. Diagram of the proposed data preprocessing method for GNSS dam deformation.

Figure 2. Flowchart of the proposed method.

Figure 3. Time-series of the simulation data: (a) [Formula omitted. See PDF.]; (b) [Formula omitted. See PDF.]; (c) Comparison of [Formula omitted. See PDF.] and separated noisy data; (d) Comparison of [Formula omitted. See PDF.] and preprocessed data.

Figure 4. Time series of the obtained IMFs.

Figure 5. The GNSS pillars and receiver located on the YQ dam.

Figure 6. Time-series of the measured deformation data.

Figure 7. Decomposition result of GNSS-1 deformation data using OVMD.

Figure 8. Comparison of the measured deformation and preprocessed deformation. (a) GNSS-1; (b) GNSS-2; (c) GNSS-3; (d) GNSS-4.

Figure 9. Bar plots of the SER metrics.

Figure 10. Bar plots of the PCC metrics.

Table 1

Discriminating indicators of case study 1.

Discriminating Indicators	IMF₁	IMF₂	IMF₃	IMF₄	IMF₅	IMF₆	IMF₇
SE	0.0066	0.0387	0.5945	0.5372	0.2776	0.4496	0.5874
$PCC (f, y)$	0.2069	0.9639	0.0337	0.0436	0.0375	0.0464	0.0380

Table 2

Evaluation metrics for the preprocessing effectiveness of simulation data.

Method	OVMD	VMD	EMD	EEMD
SNR	34.5875	30.2050	28.6438	31.7735

The best values are highlighted in bold.

Table 3

Controlling parameters of OVMD in case study 2.

Parameters	GNSS-1	GNSS-2	GNSS-3	GNSS-4
( $K, α$ )	$(5, 1.01 \times 10^{3})$	$(5, 2.61 \times 10^{3})$	$(4, 2.05 \times 10^{3})$	$(4, 3.36 \times 10^{3})$

Table 4

Discriminating indicators of case study 2.

Point Number	Discriminating Indicators	IMF₁	IMF₂	IMF₃	IMF₄	IMF₅
GNSS-1	SE	0.0214	0.1911	0.5123	0.4493	0.5795
GNSS-1	$PCC (f, y)$	0.9707	0.2472	0.1075	0.1077	0.0936
GNSS-2	SE	0.0198	0.0963	0.3128	0.4633	0.2552
GNSS-2	$PCC (f, y)$	0.9531	0.3479	0.1559	0.0913	0.0877
GNSS-3	SE	0.0158	0.1512	0.3039	0.3495	/
GNSS-3	$PCC (f, y)$	0.9736	0.2580	0.0952	0.0749	/
GNSS-4	SE	0.0184	0.1229	0.3954	0.3299	/
GNSS-4	$PCC (f, y)$	0.9709	0.2701	0.0853	0.0683	/

Table 5

Evaluation metrics for the preprocessing effectiveness of measured deformation data.

Scenarios	Metrics	OVMD	VMD	EMD	EEMD
GNSS-1	SER	0.9683	0.9562	0.9163	0.9518
GNSS-1	$PCC (f, y)$	0.9859	0.9814	0.9829	0.9756
GNSS-2	SER	0.9596	0.9580	0.8473	0.9458
GNSS-2	$PCC (f, y)$	0.9833	0.9822	0.9699	0.9772
GNSS-3	SER	0.9788	0.9706	0.9767	0.9633
GNSS-3	$PCC (f, y)$	0.9895	0.9871	0.9821	0.9854
GNSS-4	SER	0.9829	0.9661	0.9273	0.9704
GNSS-4	$PCC (f, y)$	0.9885	0.9855	0.9843	0.9855

The best values are highlighted in bold.

References

1. Seyed-Kolbadi, S.M.; Hariri-Ardebili, M.A.; Mirtaheri, M.; Pourkamali-Anaraki, F. Instrumented Health Monitoring of an Earth Dam. Infrastructures; 2020; 5, 26. [DOI: https://dx.doi.org/10.3390/infrastructures5030026]

2. Hariri-Ardebili, M.A.; Mahdavi, G.; Nuss, L.K.; Lall, U. The role of artificial intelligence and digital technologies in dam engineering: Narrative review and outlook. Eng. Appl. Artif. Intell.; 2023; 126, 6813. [DOI: https://dx.doi.org/10.1016/j.engappai.2023.106813]

3. Li, Z.; Dai, J.; Huang, D.; Wen, Z.; Li, H.; Wang, Z.; Kang, R. Application and prospects of satellite remote sensing monitoring technology in water conservancy projects. Adv. Water Sci.; 2023; 34, pp. 798-811. [DOI: https://dx.doi.org/10.14042/j.cnki.32.1309.2023.05.014]

4. Scaioni, M.; Marsella, M.; Crosetto, M.; Tornatore, V.; Wang, J. Geodetic and Remote-Sensing Sensors for Dam Deformation Monitoring. Sensors; 2018; 18, 3682. [DOI: https://dx.doi.org/10.3390/s18113682]

5. Barzaghi, R.; Cazzaniga, N.; De Gaetani, C.; Pinto, L.; Tornatore, V. Estimating and Comparing Dam Deformation Using Classical and GNSS Techniques. Sensors; 2018; 18, 756. [DOI: https://dx.doi.org/10.3390/s18030756]

6. Deng, Z.; Gao, Q.; Huang, M.; Wan, N.; Zhang, J.; He, Z. From data processing to behavior monitoring: A comprehensive overview of dam health monitoring technology. Structures; 2025; 71, 108094. [DOI: https://dx.doi.org/10.1016/j.istruc.2024.108094]

7. Li, B.; Yang, J.; Hu, D.X. Dam monitoring data analysis methods: A literature review. Struct. Control Health Monit.; 2019; 27, e2501. [DOI: https://dx.doi.org/10.1002/stc.2501]

8. Fontana, M.; Bernardi, M.S.; Cigna, F.; Tapete, D.; Menafoglio, A.; Vantini, S. Identification of Precursors in InSAR Time Series Using Functional Data Analysis Post-Processing: Demonstration on Mud Volcano Eruptions. Remote Sens.; 2024; 16, 1191. [DOI: https://dx.doi.org/10.3390/rs16071191]

9. Erdogan, H. The Effects of Additive Outliers on Time Series Components and Robust Estimation: A Case Study on the Oymapinar Dam, Turkey. Exp. Tech.; 2012; 36, pp. 39-52. [DOI: https://dx.doi.org/10.1111/j.1747-1567.2010.00676.x]

10. Yuen, K.-V.; Ortiz, G.A. Outlier detection and robust regression for correlated data. Comput. Methods Appl. Mech. Eng.; 2017; 313, pp. 632-646. [DOI: https://dx.doi.org/10.1016/j.cma.2016.10.004]

11. Li, X.; Li, Y.L.; Lu, X.; Wang, Y.F.; Zhang, H.; Zhang, P. An online anomaly recognition and early warning model for dam safety monitoring data. Struct. Health Monit.; 2020; 19, pp. 796-809. [DOI: https://dx.doi.org/10.1177/1475921719864265]

12. Lin, C.; Chen, S.; Hariri-Ardebili, M.A.; Li, T. An Explainable Probabilistic Model for Health Monitoring of Concrete Dam via Optimized Sparse Bayesian Learning and Sensitivity Analysis. Struct. Control Health Monit.; 2023; 2023, 2979822. [DOI: https://dx.doi.org/10.1155/2023/2979822]

13. Chen, S.; Gu, C.; Lin, C.; Wang, Y.; Hariri-Ardebili, M.A. Prediction, monitoring, and interpretation of dam leakage flow via adaptative kernel extreme learning machine. Measurement; 2020; 166, 108161. [DOI: https://dx.doi.org/10.1016/j.measurement.2020.108161]

14. Zheng, D.J.; Li, X.Q.; Yang, M.; Su, H.Z.; Gu, C.S. Copula entropy and information diffusion theory-based new prediction method for high dam monitoring. Earthq. Struct.; 2018; 14, pp. 143-153.

15. Ren, Q.; Li, M.; Kong, R.; Shen, Y.; Du, S.L. A hybrid approach for interval prediction of concrete dam displacements under uncertain conditions. Eng. Comput.; 2021; 39, pp. 1285-1303. [DOI: https://dx.doi.org/10.1007/s00366-021-01515-3]

16. Gu, C.; Wu, B.; Chen, Y. A High-Robust Displacement Prediction Model for Super-High Arch Dams Integrating Wavelet De-Noising and Improved Random Forest. Water; 2023; 15, 1271. [DOI: https://dx.doi.org/10.3390/w15071271]

17. Jia, D.; Yang, J.; Sheng, G. Dam deformation prediction model based on the multiple decomposition and denoising methods. Measurement; 2024; 238, 115268. [DOI: https://dx.doi.org/10.1016/j.measurement.2024.115268]

18. Mata, J.; Tavares de Castro, A.; Sá da Costa, J. Time–frequency analysis for concrete dam safety control: Correlation between the daily variation of structural response and air temperature. Eng. Struct.; 2013; 48, pp. 658-665. [DOI: https://dx.doi.org/10.1016/j.engstruct.2012.12.013]

19. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci.; 1998; 454, pp. 903-995. [DOI: https://dx.doi.org/10.1098/rspa.1998.0193]

20. Zhang, Z.; Gu, C.; Bao, T.; Zhang, L.; Yu, H. Application analysis of empirical mode decomposition and phase space reconstruction in dam time-varying characteristic. Sci. China-Technol. Sci.; 2010; 53, pp. 1711-1716. [DOI: https://dx.doi.org/10.1007/s11431-010-3098-1]

21. Bian, K.; Wu, Z.Y. Data-based model with EMD and a new model selection criterion for dam health monitoring. Eng. Struct.; 2022; 260, 114171. [DOI: https://dx.doi.org/10.1016/j.engstruct.2022.114171]

22. Wu, Z.; Huang, N.E. Ensemble Empirical Mode Decomposition: A Noise-Assisted Data Analysis Method. Adv. Adapt. Data Anal.; 2009; 1, pp. 1-41. [DOI: https://dx.doi.org/10.1142/S1793536909000047]

23. Su, H.; Li, H.; Chen, Z.; Wen, Z. An approach using ensemble empirical mode decomposition to remove noise from prototypical observations on dam safety. Springerplus; 2016; 5, 650. [DOI: https://dx.doi.org/10.1186/s40064-016-2304-4] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27330916]

24. Li, B.; Zhang, L.; Zhang, Q.; Yang, S. An EEMD-Based Denoising Method for Seismic Signal of High Arch Dam Combining Wavelet with Singular Spectrum Analysis. Shock Vib.; 2019; 2019, 4937595. [DOI: https://dx.doi.org/10.1155/2019/4937595]

25. Guo, T.; Zhang, T.; Lim, E.; Lopez-Benitez, M.; Ma, F.; Yu, L. A Review of Wavelet Analysis and Its Applications: Challenges and Opportunities. IEEE Access; 2022; 10, pp. 58869-58903. [DOI: https://dx.doi.org/10.1109/ACCESS.2022.3179517]

26. Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process.; 2014; 62, pp. 531-544. [DOI: https://dx.doi.org/10.1109/TSP.2013.2288675]

27. Chen, H.; Lu, T.; Huang, J.; He, X.; Yu, K.; Sun, X.; Ma, X.; Huang, Z. An Improved VMD-LSTM Model for Time-Varying GNSS Time Series Prediction with Temporally Correlated Noise. Remote Sens.; 2023; 15, 3694. [DOI: https://dx.doi.org/10.3390/rs15143694]

28. Zhiyao, L.; Yong, D.; Denghua, L. Coupling VMD and MSSA denoising for dam deformation prediction. Structures; 2023; 58, 105503. [DOI: https://dx.doi.org/10.1016/j.istruc.2023.105503]

29. Wang, H.; Ao, Y.; Wang, C.; Zhang, Y.; Zhang, X. A dynamic prediction model of landslide displacement based on VMD-SSO-LSTM approach. Sci. Rep.; 2024; 14, 9203. [DOI: https://dx.doi.org/10.1038/s41598-024-59517-2]

30. Li, H.; Liu, T.; Wu, X.; Chen, Q. An optimized VMD method and its applications in bearing fault diagnosis. Measurement; 2020; 166, 108185. [DOI: https://dx.doi.org/10.1016/j.measurement.2020.108185]

31. Jin, Z.; He, D.; Wei, Z. Intelligent fault diagnosis of train axle box bearing based on parameter optimization VMD and improved DBN. Eng. Appl. Artif. Intell.; 2022; 110, 104713. [DOI: https://dx.doi.org/10.1016/j.engappai.2022.104713]

32. Li, Z.; Chen, J.; Zi, Y.; Pan, J. Independence-oriented VMD to identify fault feature for wheel set bearing fault diagnosis of high speed locomotive. Mech. Syst. Signal Process.; 2017; 85, pp. 512-529. [DOI: https://dx.doi.org/10.1016/j.ymssp.2016.08.042]

33. Xin, J.; Jiang, Y.; Zhou, J.; Peng, L.; Liu, S.; Tang, Q. Bridge deformation prediction based on SHM data using improved VMD and conditional KDE. Eng. Struct.; 2022; 261, 114285. [DOI: https://dx.doi.org/10.1016/j.engstruct.2022.114285]

34. Civera, M.; Surace, C. A Comparative Analysis of Signal Decomposition Techniques for Structural Health Monitoring on an Experimental Benchmark. Sensors; 2021; 21, 1825. [DOI: https://dx.doi.org/10.3390/s21051825] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33807884]

35. Rao, R. Jaya: A simple and new optimization algorithm for solving constrained and unconstrained optimization problems. Int. J. Ind. Eng. Comput.; 2016; 7, pp. 19-34.

36. Lei, W.; Wang, G.; Wan, B.; Min, Y.; Wu, J.; Li, B. High voltage shunt reactor acoustic signal denoising based on the combination of VMD parameters optimized by coati optimization algorithm and wavelet threshold. Measurement; 2024; 224, 113854. [DOI: https://dx.doi.org/10.1016/j.measurement.2023.113854]

37. Richman, J.S.; Moorman, J.R. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol.; 2000; 278, pp. 2039-2049. [DOI: https://dx.doi.org/10.1152/ajpheart.2000.278.6.H2039]

38. Delgado-Bonal, A.; Marshak, A. Approximate Entropy and Sample Entropy: A Comprehensive Tutorial. Entropy; 2019; 21, 541. [DOI: https://dx.doi.org/10.3390/e21060541]

39. Chen, S.; Gu, C.; Lin, C.; Hariri-Ardebili, M.A. Prediction of arch dam deformation via correlated multi-target stacking. Appl. Math. Model.; 2021; 91, pp. 1175-1193. [DOI: https://dx.doi.org/10.1016/j.apm.2020.10.028]

40. Gu, C.; Wu, Z. Safety Monitoring of Dams and Dam Foundations-Theories & Methods and Their Application; Hohai University Press: Nanjing, China, 2006.

41. Mata, J. Interpretation of concrete dam behaviour with artificial neural network and multiple linear regression models. Eng. Struct.; 2011; 33, pp. 903-910. [DOI: https://dx.doi.org/10.1016/j.engstruct.2010.12.011]

42. Salazar, F.; Morán, R.; Toledo, M.A.; Oñate, E. Data-Based Models for the Prediction of Dam Behaviour: A Review and Some Methodological Considerations. Arch. Comput. Methods Eng.; 2015; 24, pp. 1-21. [DOI: https://dx.doi.org/10.1007/s11831-015-9157-9]

43. Shi, Y.; Yang, J.; Wu, J.; He, J. A statistical model of deformation during the construction of a concrete face rockfill dam. Struct. Control Health Monit.; 2018; 25, e2074. [DOI: https://dx.doi.org/10.1002/stc.2074]

44. Prakash, G.; Sadhu, A.; Narasimhan, S.; Brehe, J.M. Initial service life data towards structural health monitoring of a concrete arch dam. Struct. Control Health Monit.; 2018; 25, e2036. [DOI: https://dx.doi.org/10.1002/stc.2036]

45. Lin, C.; Li, T.; Chen, S.; Yuan, L.; van Gelder, P.H.A.J.M.; Yorke-Smith, N. Long-term viscoelastic deformation monitoring of a concrete dam: A multi-output surrogate model approach for parameter identification. Eng. Struct.; 2022; 266, 114553. [DOI: https://dx.doi.org/10.1016/j.engstruct.2022.114553]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Dam Deformation Data Preprocessing with Optimized Variational Mode Decomposition and Kernel Density Estimation
