Abstract
In engineering structure performance monitoring, capturing real-time on-site data and conducting precise analysis are critical for assessing structural condition and safety. However, equipment instability and complex on-site environments often lead to data anomalies and gaps, hindering accurate performance evaluation. This study, conducted within a wind farm reinforcement project in Shandong Province, addresses these challenges by focusing on anomaly detection and data imputation for weld nail strain, anchor cable axial force, and concrete strain. We propose an innovative iterative rolling difference-Z-score method for anomaly detection and a machine learning-based imputation framework combining linear interpolation with LightGBM. Experimental results show that the iterative rolling difference-Z-score method detects single-point and clustered anomalies with a Z-score threshold of 4, achieving robust performance even with 80% data loss. The imputation framework maintains low mean squared error (MSE) of 0.0214–0.0227 and root mean squared error (RMSE) of 0.14–0.15 for continuous missing data scenarios (60–200 points), with reliable reconstruction up to 50% data loss. This research provides a robust solution for ensuring the precision and integrity of wind farm monitoring data, enhancing long-term structural reliability in renewable energy applications.
Introduction
In the increasingly competitive landscape of new energy technologies, wind power has become a significant global choice due to its clean and renewable nature [1]. The foundation of a wind turbine is critical to its operation, as it supports the turbine’s weight and withstands dynamic loads during operation [2–4]. The strength and stability of the turbine foundation directly affect the turbine’s functionality. Given the prevalence of turbine failures attributable to foundation damage, especially in older models, reinforcing the turbine foundation is crucial for ensuring long-term stable operation [5].
In recent years, substantial progress has been made in the maintenance and reinforcement of turbine foundations. For instance, Zhou et al. [6] conducted extensive field monitoring and numerical analysis of large wind turbine reinforced concrete foundations, revealing changes in reinforcement effects under environmental loads. Gondle et al. [7] developed a low-cost mechanical displacement indicator and corresponding finite element models to assess the long-term performance and potential degradation of turbine foundation systems. Wei et al. [8] used field monitoring methods to analyze the performance of rock anchor-based turbine foundation reinforcement.
During the long-term monitoring of wind turbine foundations, environmental factors may cause abnormal data fluctuations, and electromagnetic interference can compromise data transmission accuracy, leading to abnormal monitoring data. Additionally, unstable power supplies or sensor damage due to environmental pressures may result in data loss. Addressing these issues is a significant challenge in long-term monitoring [9].
Currently, standard methods for anomaly detection [10] include Isolation Forest, One-Class SVM, DBSCAN, Local Outlier Factor (LOF), K-Means, Gaussian Mixture Model, Autoencoder, and Random Projection. Machine learning-based anomaly monitoring methods use models such as Convolutional Neural Networks (CNN) [11], which transform detection problems into classification or prediction tasks, including models like Quantile Regression Neural Networks (QRNN) [12], Artificial Neural Networks (ANN) combined with error analysis [13], Stacked Adversarial Variational Recurrent Neural Networks (SAVRNN) [14], and combinations of Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) [15–17] for anomaly detection. Standard methods for handling missing data include statistical methods and machine learning techniques.
However, these algorithms often suffer from poor interpretability and limited transferability in practical engineering projects; while they can identify anomalies, they fail to explain the causes or probabilities of these anomalies, and a trained neural network is difficult to transfer to other datasets. To address this issue, this paper proposes a new anomaly detection method that effectively solves anomaly detection problems in practical engineering projects. For missing data, statistical methods rely on the statistical characteristics of the dataset to infer and fill in missing values and are suited to simpler data distributions [18,19]. Common techniques in this domain include expectation-maximization filling [20], regression filling [20], and multiple imputation [21]. In contrast, machine learning methods [22] take a more dynamic and complex approach, treating missing values as target attributes to be predicted and filled [23–29].
This study, based on a wind farm reinforcement project in Shandong Province, explores how to manage data anomalies and data loss encountered during long-term monitoring. For monitoring data anomalies, we propose an Iterative Rolling Difference-Z-score method for anomaly detection, which effectively addresses large-scale continuous data anomalies caused by sensor failures and extensive data loss. For monitoring data loss, we introduce a new data imputation framework combining linear interpolation, machine learning, and the Iterative Rolling Difference-Z-score, providing a robust guarantee for the accuracy and integrity of wind farm monitoring data.
Engineering example analysis
Project overview
The wind farm consists of 24 wind turbine units, each supported by foundations constructed using secondary grouting micro-piles and base platforms. The wind farm began operations and grid-connected power generation in November 2013. During a maintenance check in early 2021, it was discovered that over 180 internal anchor rods and more than 20 external anchor rods in the turbines had fractured, with the numbers continuing to rise. Fig 1(a) illustrates the situation of an internal anchor rod fracture. Additionally, the turbine foundations show radial and circumferential cracks of varying depths and widths, with radial cracks being predominant and circumferential cracks concentrated in the root area, as shown in Fig 1(b).
[Figure omitted. See PDF.]
Given the significant safety hazards posed by the existing foundation issues affecting the operational integrity of the units, the decision was made to initiate experimental reinforcement work on Turbine #1. The proposed reinforcement design for the turbine foundation encompasses the integration of weld nails into the tower structure and the application of an external concrete encasement at the base. Specifically, weld nails will be strategically positioned within a defined height range at the tower’s base, accompanied by an external concrete layer encasing both the existing foundation and the weld-nailed section of the tower. The new and existing concrete foundations will be integrally connected via pressure-dispersive anchor rods extending into the ground. The construction site designated for the reinforcement is illustrated in Fig 2(a), with a detailed schematic of the reinforcement plan presented in Fig 2(b).
[Figure omitted. See PDF.]
Arrangement of monitoring points
To evaluate the structural safety and performance enhancement post-reinforcement of the turbines, it is essential to ascertain the force patterns and load distribution of the turbine foundation following the upgrade. Within this project, strain gauges and anchor assemblies were embedded during construction to facilitate comprehensive monitoring. The specific structural monitoring scope encompasses the external concrete, foundation reinforcing bars, external concrete sidewall rebars, newly added weld nail forces, and axial forces of the anchor cables. The arrangement of monitoring points is as follows: strain gauges for weld nail monitoring are strategically placed in the weld nail region on the outer surface of the wind turbine tower, with the planes designated as D1 to D5. In the non-door area, nine monitoring points are uniformly selected per row of weld nails, whereas in the door area, eight points per row are designated for measurement. The anchor gauge for prestressed anchor monitoring delivers real-time tension data. Strain monitoring of the external concrete structure entails embedding concrete strain gauges within the cast-in-place concrete. The strain gauge for section C1 monitoring is situated at the interface between the new foundation and rock, whereas the device for section C2 monitoring is positioned at the interface between the new and old foundations. The monitoring of internal forces within the foundation reinforcing bars and external concrete sidewall rebars is executed using rebar force meters affixed to the rebars. The rebar meters are positioned on the reinforcing bars (S1 position) and within the external concrete rebars (S2 position), evenly distributed across the four quadrants of the circumference. The schematic diagram illustrating the layout of monitoring points is presented in Fig 3.
[Figure omitted. See PDF.]
Data collection
Establishing monitoring points enables real-time assessment of the reinforcement effect on the wind turbine foundation within the wind farm, thereby ensuring the stability of the engineering structure and its safe, continuous operation. This provides essential data support for evaluating long-term benefits. The detailed data collection procedure is as follows: 14,000 monitoring data points were gathered from August 1, 2021, to December 25, 2021, at 15-minute intervals. Fig 4 illustrates a partial data collection scenario.
[Figure omitted. See PDF.]
The monitoring data depicted in Fig 4 reveals a significant presence of missing and anomalous values, which impedes data analysis. Therefore, addressing the anomalies and imputing the missing data is essential. To resolve these issues, this paper proposes a method for anomaly detection under missing data conditions and introduces a novel data imputation framework to handle missing values.
Outlier handling
Iterative rolling difference-Z-score outlier detection
To tackle the challenge of detecting anomalies in large-scale continuous data caused by sensor failures and massive data loss, this study introduces an innovative iterative rolling difference-Z-score method for anomaly detection. This method is particularly well-suited for the detection and cleansing of anomalies in time-series data. The methodology proceeds as follows: Initially, rolling differences are computed within a predefined window to effectively accentuate data fluctuations, thereby distinguishing between normal and abnormal variations. Subsequently, Z-scores [10] are employed to transform the difference data, quantifying each data point’s deviation from the mean and identifying potential anomalies characterized by significant Z-score values. Following these steps, the rolling difference and Z-score analyses are iteratively performed until anomalies within the dataset are meticulously identified and effectively addressed. This method continuously minimizes the number of anomalies through an iterative optimization process, thereby significantly enhancing overall data quality and analytical accuracy. The anomaly detection process can be represented by the following equation:
$Z = \dfrac{\mathrm{rolling\_diff} - \mathrm{rolling\_diff.mean}}{\mathrm{rolling\_diff.std}} \quad (1)$
In the formula:
rolling_diff: Denotes the rolling difference value; rolling_diff.mean: Indicates the mean of the rolling difference; rolling_diff.std: Signifies the standard deviation of the rolling difference.
The complete procedure corresponding to the above derivation is given in Algorithm 1, followed by a Python sketch.
Algorithm 1. Iterative rolling difference-Z-score outlier detection
Input: DataFrame ‘data’
Output: DataFrame ‘processed_data’
For each column in ‘data’:
1. Drop NaN values from the column, store in ‘series’
2. Initialize ‘iterations’ to 0
3. Set ‘max_iterations’ to 10 # Assuming a maximum of 10 iterations
4. While ‘iterations’ < ‘max_iterations’:
a. Calculate rolling difference with a specified period, store in ‘rolling_diff’
b. Calculate Z-score for ‘rolling_diff’
c. Set ‘threshold’ to a predefined value or calculate based on data characteristics
d. Identify anomalies where abs(Z-score)> ‘threshold’
e. If no anomalies detected, break loop
f. Filter out anomalies from ‘series’
g. Increment ‘iterations’ by 1
5. Add cleaned ‘series’ to ‘processed_data’, reindexed to the index of ‘data’
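For concreteness, a minimal Python sketch of Algorithm 1 is given below. It assumes the monitoring channels are columns of a pandas DataFrame; the function name, the default window of 12, the threshold of 4, and the cap of 10 iterations are illustrative choices mirroring the parameters reported in this paper, not a verbatim reproduction of the authors’ code.

```python
import pandas as pd


def iterative_rolling_diff_zscore(data: pd.DataFrame, window: int = 12,
                                  threshold: float = 4.0,
                                  max_iterations: int = 10) -> pd.DataFrame:
    """Iterative rolling difference-Z-score outlier removal (sketch of Algorithm 1)."""
    processed = {}
    for col in data.columns:
        series = data[col].dropna()
        for _ in range(max_iterations):
            # Rolling difference over the chosen window accentuates fluctuations.
            rolling_diff = series.diff(periods=window)
            # Z-score of the differences, as in Eq. (1).
            z = (rolling_diff - rolling_diff.mean()) / rolling_diff.std()
            anomalies = z.abs() > threshold
            if not anomalies.any():
                break  # no anomalies left, stop iterating
            # Drop the flagged points and repeat on the cleaned series.
            series = series[~anomalies]
        processed[col] = series
    # Reindex to the original index; removed points appear as NaN for later imputation.
    return pd.DataFrame(processed).reindex(data.index)
```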
Comparative analysis of anomaly detection models
Presented below is a comparative analysis of the anomaly detection capabilities of the Iterative Rolling Difference-Z-score, Isolation Forest, One-Class SVM, DBSCAN, LOF (Local Outlier Factor), K-Means, and Gaussian Mixture Model algorithms on feature S1-4, which contains clustered anomalies (Fig 5).
[Figure omitted. See PDF.]
where (a)–(g) denote the Iterative Rolling Difference-Z-score, Isolation Forest, One-Class SVM, DBSCAN, LOF, K-Means, and Gaussian Mixture Model, respectively.
As demonstrated in Fig 5, the Iterative Rolling Difference-Z-score method identifies anomalies more effectively than the other standard anomaly detection algorithms, thereby ensuring the accuracy of the monitoring data. Compared against manual anomaly removal, the Iterative Rolling Difference-Z-score achieves a correct removal rate of 98.9%, whereas the other algorithms remain below 15%. The algorithm also requires simpler parameter tuning than the alternatives. The anomaly detection parameters for feature S1-4 are three iterations, a rolling difference range of 12, and a Z-score threshold of 4.
Experimental analysis of outlier detection models
Feature S1-2 suffers from large-scale missing values; 230 outlier points were deliberately introduced into it, encompassing both continuous anomalies and outliers of substantial magnitude. Compared with manual inspection, the anomaly removal effectiveness reaches 100%. The results of the anomaly detection are illustrated in Fig 6.
[Figure omitted. See PDF.]
The anomaly detection parameters for S1-2 data include three iterations, a rolling difference range of 200, and a Z-score threshold of 4. The Iterative Rolling Difference-Z-score method effectively detects clustered anomalies and identifies anomalies in large-scale missing data scenarios. This method is also characterized by its simplicity in adjustment, necessitating only three parameters to achieve optimal results. For datasets with extensive missing values, adjusting the threshold and expanding the rolling difference range can effectively prevent misjudgments of missing data as anomalies, thereby ensuring accurate anomaly detection and maintaining data integrity.
Parameter selection for the Iterative Rolling Difference-Z-score method is guided by the following considerations (an illustrative configuration sketch follows the list):
1. Iteration Count Selection:
(1) Frequency of Data Fluctuations: If the data exhibits large fluctuations or contains complex anomaly patterns, increasing the iteration count can help capture these anomalies more effectively. Typically, three iterations are sufficient to smooth the data and extract anomalies, but if there are long-term anomalies or periodic fluctuations within the data, more iterations may be required to improve detection accuracy.
(2) Data Size: For larger datasets, increasing the number of iterations may lead to longer computation times. Therefore, it is essential to balance accuracy with computational efficiency when selecting the iteration count. For smaller datasets, fewer iterations (such as 3) are usually adequate.
(3) Degree of Anomaly Clustering: When a particular segment of the data exhibits significant shifts and a high concentration of anomalies (e.g., 20 or more consecutive anomalies, such as in the S1-4 feature with clustered anomalies), increasing the iteration count is recommended to effectively eliminate these continuous clustered anomalies.
2. Rolling Difference Range Selection:
(1) Data Periodicity or Seasonality: If the data exhibits clear periodic or seasonal patterns, the rolling difference range should match the data’s period or seasonal cycle. For data with an annual cycle, a larger rolling difference range should be chosen to account for the full seasonal variation. In contrast, for short-term data or data with frequent fluctuations, a smaller rolling difference range (e.g., 12) can improve sensitivity.
(2) Degree of Anomaly Clustering: When anomalies are highly clustered in the data, expanding the rolling difference range can help capture the trend changes of anomalous data more effectively. However, an excessively large difference range might make it difficult to detect smaller anomalies, so adjustments should be made based on the actual data characteristics.
(3) Data Missingness: When a dataset experiences large-scale missing values, it is advisable to increase the rolling difference range to allow the remaining data around the missing values to align better with the overall trend, thereby preventing misclassification of missing data as anomalies.
3. Z-score Threshold Selection:
(1) Data Distribution: The Z-score threshold is primarily used to determine whether a data point is anomalous. A higher Z-score threshold (e.g., 4 or 5) offers a more stringent criterion for anomaly detection, which is suitable for data that is more concentrated. A lower Z-score threshold (e.g., 2 or 3) is more appropriate for data with a more dispersed distribution or for cases where anomalies are less pronounced. Typically, a Z-score threshold around 4 provides a good balance.
(2) Type of Anomalies: If the goal is to strictly detect all types of anomalies, especially those with large magnitudes, lowering the Z-score threshold can improve sensitivity. Conversely, for more subtle or less significant anomalies, raising the Z-score threshold can help reduce false positives.
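As an illustration of the guidance above, the two parameter sets reported in this paper for features S1-4 and S1-2 can be captured in a small configuration table; the dictionary layout and the commented-out call are hypothetical conveniences for use with the detector sketched earlier.

```python
# Parameter sets reported in this study, expressed as a reusable configuration.
DETECTION_PARAMS = {
    "S1-4": {"iterations": 3, "window": 12,  "z_threshold": 4},   # clustered anomalies, short window
    "S1-2": {"iterations": 3, "window": 200, "z_threshold": 4},   # large-scale missing data, wide window
}

# Hypothetical usage with the detector sketched above:
# p = DETECTION_PARAMS["S1-4"]
# cleaned = iterative_rolling_diff_zscore(data[["S1-4"]], window=p["window"],
#                                         threshold=p["z_threshold"],
#                                         max_iterations=p["iterations"])
```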
Data imputation
Because the reinforcement components act together in resisting wind and dynamic loads, inherent relationships and patterns exist among the monitoring points. Machine learning enables computers to derive patterns and knowledge from data to address various complex problems [30]. Consequently, machine learning methods are employed for data-driven imputation by learning the relationships within the data.
The machine learning algorithms employed include Support Vector Machine Regression (SVR), K-Nearest Neighbors Regression (KNN), Ridge Regression, Random Forest Regression, LightGBM, XGBoost, and CatBoost [31–37].
Data imputation evaluation indicators
Three common evaluation metrics for regression tasks are used: Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE).
Mean Squared Error (MSE) calculation formula:
$\mathrm{MSE} = \dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \quad (2)$
Mean Absolute Error (MAE) calculation formula:
$\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (3)$
Root Mean Squared Error (RMSE) calculation formula:
$\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (4)$
Where:
$y_i$ is the actual value, $\hat{y}_i$ is the imputed value, and $n$ is the number of samples.
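A short computational sketch of Eqs. (2)–(4) follows; y_true and y_pred are hypothetical arrays holding the original and imputed values.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, Eq. (2)."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (3)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (4)."""
    return float(np.sqrt(mse(y_true, y_pred)))
```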
Monitoring data imputation process framework
After analyzing the data, the missing data types are categorized into isolated missing and continuous missing. Fig 7 illustrates the display of the missing data situation.
[Figure omitted. See PDF.]
Corresponding imputation strategies are employed for different types of missing data. Linear interpolation is used for isolated missing data, while machine learning algorithms are utilized for continuous missing data. Based on this, a data imputation framework combining linear interpolation, machine learning, and iterative rolling difference-Z-score algorithm is proposed and divided into two phases.
In the first phase, linear interpolation is used to impute isolated missing data. For columns with relatively few missing values in continuous missing data, machine learning algorithms are used for initial imputation. The imputed data from this phase serves as input for subsequent imputation steps, gradually completing the data for all features through an iterative approach. This phase may introduce potential outliers that need further processing.
In the second phase, linear interpolation and machine learning are combined with the iterative rolling difference-Z-score algorithm for outlier detection and imputation. First, the iterative rolling difference-Z-score technique is applied to identify and remove outliers. Isolated missing data is then imputed using linear interpolation. Next, one column at a time is chosen as the target variable, with the other columns serving as features, and machine learning is used to predict and fill its missing values. This process is repeated iteratively until the number of outliers stabilizes, after which the remaining missing values are imputed using the KNN algorithm. Detailed steps of the entire data imputation framework are illustrated in Fig 8, and a condensed code sketch follows the figure.
[Figure omitted. See PDF.]
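The two-phase logic can be condensed into the following sketch. It reuses the iterative_rolling_diff_zscore() detector sketched earlier and LightGBM’s scikit-learn interface; the column ordering, the limit=1 interpolation rule for isolated gaps, and the stopping test are simplifications of the workflow in Fig 8 rather than the authors’ exact implementation, and it assumes no column is entirely missing.

```python
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.impute import KNNImputer


def impute_column_with_lgbm(df: pd.DataFrame, target: str) -> pd.Series:
    """Predict the missing entries of one column from all the other columns."""
    known = df[target].notna()
    if known.all() or not known.any():
        return df[target]
    features = df.drop(columns=[target])           # LightGBM tolerates NaN features
    model = LGBMRegressor(random_state=42)
    model.fit(features[known], df.loc[known, target])
    filled = df[target].copy()
    filled[~known] = model.predict(features[~known])
    return filled


def two_phase_impute(data: pd.DataFrame, max_rounds: int = 10) -> pd.DataFrame:
    # Phase 1: linear interpolation for isolated gaps, then an initial LightGBM
    # pass starting from the columns with the fewest missing values.
    df = data.interpolate(method="linear", limit=1)
    for col in df.isna().sum().sort_values().index:
        df[col] = impute_column_with_lgbm(df, col)

    # Phase 2: alternate outlier removal and re-imputation until the number of
    # detected outliers stops decreasing, then finish with KNN (k = 3).
    prev_nan = float("inf")
    for _ in range(max_rounds):
        cleaned = iterative_rolling_diff_zscore(df)   # removed outliers become NaN
        n_nan = int(cleaned.isna().sum().sum())       # outliers plus any remaining gaps
        df = cleaned.interpolate(method="linear", limit=1)
        for col in df.columns:
            df[col] = impute_column_with_lgbm(df, col)
        if n_nan >= prev_nan:
            break
        prev_nan = n_nan
    return pd.DataFrame(KNNImputer(n_neighbors=3).fit_transform(df),
                        columns=df.columns, index=df.index)
```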
Monitoring data imputation experiment
To determine the most suitable imputation algorithm, a comparative analysis was performed utilizing Support Vector Machine Regression (SVR), Ridge Regression, Random Forest Regression, LightGBM, XGBoost, and CatBoost. These algorithms were applied to a dataset with missing values omitted, allocating 70% as the training set and 30% as the test set. To ensure reproducibility, the random seed was fixed at 42. Table 1 enumerates the evaluation metrics for the various machine learning algorithms tested, while Table 2 delineates the detailed parameters for these algorithms.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
In this experiment, traditional Support Vector Machine regression (SVR) and Ridge Regression performed poorly in capturing the complex relationships within the data. However, as illustrated in Table 1, the tree-based models excelled at handling nonlinear relationships, with the LightGBM algorithm performing best on the data imputation task.
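A hypothetical reproduction of this comparison is sketched below: rows with missing values are dropped, the data are split 70/30 with seed 42, and each regressor is scored by MSE on the held-out set. Model hyperparameters beyond the random seed are library defaults, which may differ from those listed in Table 2.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor


def compare_models(data: pd.DataFrame, target: str) -> dict:
    """Test-set MSE for each candidate regressor on the complete (non-missing) rows."""
    complete = data.dropna()                                 # omit rows with missing values
    X, y = complete.drop(columns=[target]), complete[target]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
    models = {
        "SVR": SVR(),
        "Ridge": Ridge(),
        "RandomForest": RandomForestRegressor(random_state=42),
        "LightGBM": LGBMRegressor(random_state=42),
        "XGBoost": XGBRegressor(random_state=42),
        "CatBoost": CatBoostRegressor(random_state=42, verbose=0),
    }
    return {name: mean_squared_error(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```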
To validate the effectiveness of linear interpolation in addressing isolated missing data, the following experiment was conducted: Seven feature columns were randomly selected from the monitoring data, encompassing 1526 data points across 218 rows, with 2% randomly designated as isolated missing values. Linear interpolation was employed to fill these isolated missing points, and the results before and after imputation are compared visually in Fig 9. The results clearly demonstrate the efficacy of the linear interpolation method in addressing isolated missing data issues and maintaining the accuracy of the original data attributes.
[Figure omitted. See PDF.]
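The isolated-missing experiment can be sketched as follows; the 2% masking rate matches the text, while the random seed and the MSE check are illustrative additions.

```python
import numpy as np
import pandas as pd


def isolated_missing_test(df: pd.DataFrame, frac: float = 0.02, seed: int = 42):
    """Hide ~2% of points at random, repair them by linear interpolation, report MSE."""
    rng = np.random.default_rng(seed)
    mask = rng.random(df.shape) < frac              # roughly 2% of points hidden
    masked = df.mask(mask)                          # hidden entries become NaN
    filled = masked.interpolate(method="linear")    # linear interpolation repair
    diff = (filled - df).to_numpy()[mask]           # error at the hidden positions
    return filled, float(np.nanmean(diff ** 2))
```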
To validate the efficacy of the data imputation framework in addressing continuous missing data, the following experiment was conducted: A feature, C1-1, with a missing rate of 32.6% in the original data, was selected as a case study, and the LightGBM algorithm was utilized. For this feature, various degrees of continuous missing data were simulated. Notably, in the monitoring data involved in this study, except for individual instances where the data missing rate reached 80%, continuous missing segments did not exceed 180 data points, averaging approximately 60 consecutive missing points. Therefore, this experiment manually introduced continuous missing segments of 60, 120, and 200 points into the originally missing dataset, with Mean Squared Error (MSE) serving as the evaluation metric. The results, presented in Table 3 and Fig 10, clearly indicate that the method proposed in this study exhibits good imputation performance for such missing data, demonstrating the feasibility and effectiveness of the approach.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
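A sketch of the continuous-missing experiment on a single feature (e.g., C1-1) is given below; the block start position and the reuse of the two_phase_impute() sketch are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def continuous_gap_test(data: pd.DataFrame, target: str, gap: int, start: int = 1000) -> float:
    """Blank out a contiguous block of observed points, re-impute, and return the block MSE."""
    observed = data[target].notna().to_numpy()
    block = data.index[observed][start:start + gap]   # contiguous block of observed points
    truth = data.loc[block, target].copy()
    corrupted = data.copy()
    corrupted.loc[block, target] = np.nan             # simulate continuous missing data
    repaired = two_phase_impute(corrupted)            # framework sketched earlier
    return float(np.mean((repaired.loc[block, target] - truth) ** 2))


# Hypothetical usage:
# for gap in (60, 120, 200):
#     print(gap, continuous_gap_test(monitoring_df, "C1-1", gap))
```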
Data repair effect and evaluation
In applying the imputation framework, certain noisy data (i.e., outliers) were intentionally retained to bolster the model’s robustness. The objective was to develop a model capable of making accurate predictions even when confronted with poor-quality input data. With this aim, the first phase of data imputation was executed.
Due to the high volatility of the data and the retention of original noise during training, there was a potential risk of the algorithm introducing significant bias for certain missing data. Through qualitative analysis of data volatility and the application of the iterative rolling difference-Z-score algorithm, the noise and biased data generated in the first phase were effectively filtered out. The imputation work for the second phase proceeded with the LightGBM algorithm until the number of outliers no longer decreased. In the data imputation using LightGBM, the non-missing data is divided into 80% for the training set, 20% for the test set, and the missing values are used as the validation set for imputation. As the number of iterations increased, the count of outliers gradually decreased, eventually stabilizing (see Fig 11(a)). Upon reaching stability, the remaining small amount of missing data was imputed using the KNN algorithm. The results indicated that the noisy data was effectively removed, leaving only a small number of outliers that were challenging for the second-phase algorithm to learn. Consequently, feature autocorrelation imputation was performed on these outliers using the KNN algorithm. To ascertain the optimal number of neighbors for the KNN application, the study introduced 1% random missing values to the data imputed in the first phase and established a K value search range of [1,20]. Five-fold cross-validation was conducted, utilizing the average MSE on the test set as the evaluation criterion. Upon verification, the relationship between the K value and MSE was obtained (see Fig 11(b)), and the optimal number of neighbors was determined to be 3.
[Figure omitted. See PDF.]
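The neighbour-count search can be sketched as follows: roughly 1% of entries are hidden at random, each candidate k in [1, 20] is scored by five-fold cross-validated MSE on the hidden entries, and the best k is returned (reported optimum in this study: 3). The masking details are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.model_selection import KFold


def select_k(phase1_data: pd.DataFrame, frac: float = 0.01, seed: int = 42) -> int:
    """Pick the KNN neighbour count by 5-fold CV on ~1% artificially hidden entries."""
    rng = np.random.default_rng(seed)
    values = phase1_data.to_numpy(dtype=float)
    scores = {}
    for k in range(1, 21):                                # search range [1, 20]
        fold_mse = []
        kf = KFold(n_splits=5, shuffle=True, random_state=seed)
        for _, test_idx in kf.split(values):
            mask = np.zeros_like(values, dtype=bool)
            mask[test_idx] = rng.random((len(test_idx), values.shape[1])) < frac
            if not mask.any():
                continue
            corrupted = values.copy()
            corrupted[mask] = np.nan                      # hide ~1% of the test-fold entries
            filled = KNNImputer(n_neighbors=k).fit_transform(corrupted)
            fold_mse.append(np.mean((filled[mask] - values[mask]) ** 2))
        scores[k] = float(np.mean(fold_mse))
    return min(scores, key=scores.get)
```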
The following are the data imputation results of this study, highlighting the units with the highest missing rates. A comparison is drawn between the imputed data and the original data for units with missing rates of 51% (C2-1), 80% (S1-2), 38% (D5-2), and 33% (MS-3), as shown in Fig 12.
[Figure omitted. See PDF.]
Kernel density plots [38,39] visualize the data distribution [40,41] before and after imputation, assisting in assessing the imputation effect, as shown in Fig 13.
[Figure omitted. See PDF.]
The mean and variance are two important statistical characteristics of a data distribution. Comparing them between the imputed and original data further quantifies the degree of deviation in statistical properties introduced by imputation, as shown in Fig 14.
[Figure omitted. See PDF.]
For the MS-3 dataset with 33% data missing, the imputed data closely aligns with the original in terms of kernel density distribution, mean, and variance, showing an ideal imputation effect. Similarly, the D5-2 and C2-1 datasets, with data missing rates of 38% and 51% respectively, exhibit imputed data that generally matches the original in kernel density plots with minor differences in mean and variance, indicating good imputation results. However, for the S1-2 dataset with a high data missing rate of 80%, the imputed data shows noticeable deviations in the kernel density distribution compared to the original data, especially in areas with significant data loss. Although there are overall shortcomings, the imputation results are still relatively ideal in certain parts.
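The distribution check behind Figs 13 and 14 can be reproduced with a short sketch comparing kernel density estimates and the mean/variance of a channel before and after imputation; SciPy and matplotlib are assumed to be available, and the default channel name is a placeholder.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde


def compare_distributions(original, imputed, name="MS-3"):
    """Overlay kernel density estimates and report mean/variance before and after imputation."""
    orig = np.asarray(original.dropna(), dtype=float)   # original series, gaps excluded
    imp = np.asarray(imputed, dtype=float)              # fully imputed series
    xs = np.linspace(min(orig.min(), imp.min()), max(orig.max(), imp.max()), 200)
    plt.plot(xs, gaussian_kde(orig)(xs), label="original")
    plt.plot(xs, gaussian_kde(imp)(xs), label="imputed")
    plt.title(f"{name}: mean {orig.mean():.3f} -> {imp.mean():.3f}, "
              f"variance {orig.var():.3f} -> {imp.var():.3f}")
    plt.legend()
    plt.show()
```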
Model comparison
This paper compares the data imputation framework with common imputation methods, including Multiple Imputation by Chained Equations (MICE), Mean Imputation, and K-Nearest Neighbors Imputation (KNN) (Fig 15).
[Figure omitted. See PDF.]
Comparison of these methods shows that the proposed data imputation framework better reflects the data’s changing patterns and delivers superior performance in data imputation tasks. By combining linear interpolation with machine learning, the framework shows particular advantages in high missing-rate scenarios: it not only improves prediction accuracy but also effectively handles the nonlinear relationships and complex patterns in the data.
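A sketch of this baseline comparison is given below; MICE is approximated by scikit-learn’s IterativeImputer, and the masking of ground-truth entries is assumed to have been done beforehand (with no column entirely missing).

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer, KNNImputer


def baseline_comparison(truth: pd.DataFrame, corrupted: pd.DataFrame) -> dict:
    """MSE of each baseline imputer on the artificially hidden entries."""
    mask = (corrupted.isna() & truth.notna()).to_numpy()     # entries hidden for the test
    imputers = {
        "MICE": IterativeImputer(random_state=42),
        "Mean": SimpleImputer(strategy="mean"),
        "KNN": KNNImputer(n_neighbors=3),
    }
    results = {}
    for name, imputer in imputers.items():
        filled = imputer.fit_transform(corrupted)            # same shape, NaN replaced
        results[name] = float(np.mean((filled[mask] - truth.to_numpy()[mask]) ** 2))
    return results
```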
Conclusion
This study addresses critical challenges in long-term wind turbine foundation monitoring—data anomalies and missing values—through innovative methodologies that advance the field of structural health monitoring in renewable energy. Within the context of a wind farm reinforcement project in Shandong Province, we introduce a novel iterative rolling difference-Z-score method for anomaly detection and a data imputation framework integrating linear interpolation with machine learning (LightGBM). These methods offer significant improvements over existing approaches by providing interpretable, transferable, and robust solutions for real-time monitoring data, ensuring the accuracy and reliability essential for assessing turbine foundation health and extending operational longevity. Key findings include:
(1) Anomaly Detection: The iterative rolling difference-Z-score method, a novel contribution, excels in detecting single-point and clustered anomalies, maintaining high accuracy even with 80% data loss, unlike traditional methods (e.g., Isolation Forest, One-Class SVM) that struggle with interpretability and transferability.
(2) Data Imputation: Our imputation framework, uniquely combining linear interpolation with LightGBM, achieves superior performance with mean squared error (MSE) of 0.0214–0.0227 for continuous missing data (60–200 points) and reliable reconstruction up to 50% data loss, addressing limitations of statistical and less robust machine learning methods.
(3) Framework Performance: This dual approach ensures comprehensive data integrity, enabling precise structural assessments critical for preventing wind turbine failures. Its adaptability makes it a scalable solution for other renewable energy and infrastructure monitoring applications.
(4) Broader Impact: By enhancing data reliability, our methods support the renewable energy industry’s push for sustainable, long-term wind power solutions, reducing maintenance costs and improving operational safety across global wind farms.
(5) Future Work: Future research will explore advanced models, such as Transformer networks and generative adversarial networks (GANs), to further improve imputation accuracy for complex data patterns, potentially broadening applications to other critical infrastructure systems.
The proposed methods not only address immediate monitoring challenges but also set a new standard for data-driven structural health assessment, with significant implications for the reliability and sustainability of renewable energy infrastructure worldwide.
Acknowledgments
We sincerely thank all those who provided help and support during this research. We especially appreciate the valuable assistance from the team at Shandong Jianzhu University and the insightful comments and suggestions from the peer reviewers. Additionally, we would like to express our gratitude to all the team members and collaborators who contributed to this study; their efforts were crucial to the success of the research.
References
1. Piotrowska K, Piasecka I, Kłos Z, Marczuk A, Kasner R. Assessment of the life cycle of a wind and photovoltaic power plant in the context of sustainable development of energy systems. Materials (Basel). 2022;15(21):7778. pmid:36363369
2. Bai JL, Wang RY, Wang YH, et al. A review of onshore wind turbine prefabricated foundation structures [J/OL]. J Civil Environ Eng. Available from: https://kns.cnki.net/kcms/detail/50.1218.TU.20230904.0953.002.html
3. Ding H, Peng Y, Zhang P, Nie L, Zhai H. Numerical simulation of vacuum preloading for reinforcing soil inside composite bucket foundation for offshore wind turbines. JMSE. 2019;7(8):273.
4. Fan Q, Zheng J. Macro element model of subplasticity for circular shallow foundations of offshore wind turbines under combined loading modes. Civil Architectural Environ Eng. 2014;36(3):59–63.
5. Laying the foundation for wind turbines now and in the future. Wind Systems Magazine. Available from: https://www.windsystemsmag.com/laying-the-foundation-for-wind-turbines-now-and-in-the-future/
6. Zhou Y, Liu X, Deng Z, Gao Q-F. Field monitoring and numerical analysis of the reinforced concrete foundation of a large‐scale wind turbine. Adv Mater Sci Eng. 2021;2021(1):1–14.
7. Gondle RK, Kurup PU, Niezrecki C, et al. Evaluation of wind turbine-foundation degradation. In: Barla M, Di Donna A, Sterpi D, editors. Challenges and innovations in geomechanics. IACMAG 2021. Lecture Notes in Civil Engineering. Cham: Springer; 2021. p. 29–42.
8. Wei HW, Lei SL, Song ZX. Field monitoring of the force performance of rock anchor reinforced foundation for wind turbine. Civil Environ Eng J (Chin Eng). 2023;29(1):1–9. Available from: http://kns.cnki.net/kcms/detail/50.1218.TU.20230904.0953.002.html
9. Vidal Y, Tutivén C. Field demonstration of real-time wind turbine foundation strain monitoring. Sensors. 2020;20(12):3429.
10. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. JMLR. 2011;12:2825–30.
11. Wang W, Liu Y, Hong HJ. A CNN-based model for detecting similar and duplicate records in security data. Comput Appl Softw. 2023;40(2):17–25.
12. Xiong L, Liu J, Yang F, Zhang G, Dang J. Anomaly detection of hydropower units based on recurrent neural network. Front Energy Res. 2022;10:856635.
13. Jeatrakul P, Wong KW, Fung CC. Data cleaning for classification using misclassification analysis. J Adv Comput Intell Intell Inform. 2010;14(3):297–302.
14. Chen WC, Fang BW, Dai L. Multi-dimensional time series anomaly detection based on stacked adversarial variational recurrent neural networks. Sci China Inf Sci. 2023;53(9):1750–67.
15. Kim T-Y, Cho S-B. Web traffic anomaly detection using C-LSTM neural networks. Expert Syst Appl. 2018;106:66–76.
16. Yang S, Zhang Y, Lu X, Guo W, Miao H. Multi-agent deep reinforcement learning based decision support model for resilient community post-hazard recovery. Reliab Eng Syst Safe. 2024;242:109754.
17. Diao TC, Zhang JK, Chen Y. Chinese text sentiment classification application based on recurrent neural networks. Wireless Internet Technol. 2021;18(19):96–7.
18. Ge Y, Li Z, Zhang J. A simulation study on missing data imputation for dichotomous variables using statistical and machine learning methods. Sci Rep. 2023.
19. Nizam H, Zafar S, Lv Z, Wang F, Hu X. Real-time deep anomaly detection framework for multivariate time-series data in industrial IoT. IEEE Sensors J. 2022;22(23):22836–49.
20. Avanzi F, Zheng Z, Coogan A, Rice R, Akella R, Conklin MH. Gap-filling snow-depth time-series with Kalman filtering-smoothing and expectation maximization: proof of concept using spatially dense wireless-sensor-network data. Cold Reg Sci Technol. 2020;175:103066.
21. Shahbazi H, Karimi S, Hosseini V, et al. A novel regression imputation framework for Tehran air pollution monitoring network using outputs from WRF and CAMx models. Atmos Environ. 2018;187:24–33.
22. Sterne JAC, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393. pmid:19564179
23. Chen J, Wang XY, Luo LL. Comparison of machine learning and statistical learning in the imputation of missing values. Stat Decis. 2020;36(17):28–32.
24. Li D, Chi JJ, Xiang B. Missing oil value filling algorithm based on KNN and SMOTE. Math Pract Theory. 2019;49(17):187–95.
25. Liao X, Zhang Y, Zheng X, Kang J, Zhao H, Wang N. Building energy efficiency assessment base on predict-center criterion under diversified conditions. Energy Build. 2024;311:114118.
26. Liu Y, Li Q, Wang K. Revealing the degradation patterns of lithium-ion batteries from impedance spectroscopy using variational auto-encoders. Energy Storage Mater. 2024;69:103394.
27. Lopez Alcaraz JM, Strodthoff N. Diffusion-based time series imputation and forecasting with structured state space models. Technical report, arXiv. 2022.
28. Fan W, Zheng S, Yi X, Cao W, Fu Y, Bian J, et al. DEPTS: Deep expansion learning for periodic time series forecasting. International Conference on Learning Representations; 2022.
29. Shen L, Chen W, Kwok J. Multi-resolution diffusion models for time series forecasting. The Twelfth International Conference on Learning Representations; 2024.
30. Kirch W. Z-score. In: Kirch W, editor. Encyclopedia of public health. Dordrecht: Springer; 2008.
31. Vapnik VN. The nature of statistical learning theory. New York: Springer; 1995.
32. Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inform Theory. 1967;13(1):21–7.
33. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.
34. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
35. Ke G, Meng Q, Finley T. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30:3146–54. Available from: https://proceedings.neurips.cc/paper/2017/hash/6449f44a102fde848b693b38c01e38f0-Abstract.html
36. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York; 2016. p. 785–94.
37. Prokhorenkova L, Gusev G, Vorobev A. CatBoost: unbiased boosting with categorical features. Adv Neural Inf Process Syst. 2018;31:6638–48. Available from: https://proceedings.neurips.cc/paper/2018/hash/14491b756b3a51daac41c24863285549-Abstract.html
38. Feng Y, Zhou G, Zhang Z, et al. Machine learning-based data imputation in structural health monitoring: a comparative study. J Comput Civil Eng. 2020;34(5):04020028.
39. Gramacki A. Kernel density estimation. In: Nonparametric kernel density estimation and its computational aspects. Studies in big data, vol 37. Cham: Springer; 2018.
40. Khan H, Ullah I, Shabaz M, Omer MF, Usman MT, Guellil MS, et al. Visionary vigilance: optimized YOLOV8 for fallen person detection with large-scale benchmark dataset. Image Vis Comput. 2024;149:105195.
41. Khan H, Usman MT, Koo J. Bilateral feature fusion with hexagonal attention for robust saliency detection under uncertain environments. Inf Fusion. 2025;121:103165.
Citation: Li R, Lu X, Zhao J, Chen W, Wei H, Liu C (2025) Iterative rolling difference-Z-score and machine learning imputation for wind turbine foundation monitoring. PLoS One 20(9): e0331213. https://doi.org/10.1371/journal.pone.0331213
About the Authors:
Renjie Li
Roles: Conceptualization, Funding acquisition
Affiliation: Shandong Electric Power Engineering Consulting Institute Corp., Ltd., Jinan, China
Xiangxing Lu
Roles: Funding acquisition
Affiliation: Shandong Electric Power Engineering Consulting Institute Corp., Ltd., Jinan, China
Jizhang Zhao
Roles: Conceptualization, Data curation, Formal analysis, Investigation, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft
E-mail: [email protected]
Affiliations: School of Civil Engineering, Shandong Jianzhu University, Jinan, China, Key Laboratory of Building Structural Retrofitting and Underground Space Engineering, Ministry of Education, Jinan, China, Subway Protection Research Institute, Shandong Jianzhu University, Jinan, China
ORCID: https://orcid.org/0009-0009-1572-6158
Weibing Chen
Roles: Funding acquisition
Affiliation: Shandong Electric Power Engineering Consulting Institute Corp., Ltd., Jinan, China
Huanwei Wei
Roles: Conceptualization, Data curation, Writing – review & editing
Affiliations: School of Civil Engineering, Shandong Jianzhu University, Jinan, China, Key Laboratory of Building Structural Retrofitting and Underground Space Engineering, Ministry of Education, Jinan, China, Subway Protection Research Institute, Shandong Jianzhu University, Jinan, China
Cong Liu
Roles: Writing – review & editing
Affiliations: School of Civil Engineering, Shandong Jianzhu University, Jinan, China, Key Laboratory of Building Structural Retrofitting and Underground Space Engineering, Ministry of Education, Jinan, China, Subway Protection Research Institute, Shandong Jianzhu University, Jinan, China
© 2025 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.