1. Introduction
Concrete is a mixture of cement, water, fine and coarse aggregates, and additives and admixtures, and it is widely used in the construction of water-related buildings and hydraulic structures [1,2]. Hydraulic concrete is a special type of concrete used mainly in hydraulic structures such as water conservancy, port, and ocean engineering projects [3,4,5]. As the most widely used building material at present, concrete is employed in high-rise buildings, bridges, port terminals, dams, and other projects because of its abundant raw materials, low cost, ease of preparation, fire resistance, scour resistance, and strong adaptability [6]. However, the anti-seepage capability of hydraulic concrete structures is easily degraded by the long-term alternation of drying, wetting, freezing, and thawing caused by the rise and fall of reservoir water; the structures are therefore prone to erosion, damage, and cracking, which reduce their impermeability and durability.
The compressive strength of concrete is one of the key indicators for assessing the safety of concrete structures [6]. Predicting the concrete compressive strength helps ensure that the engineering structure has sufficient load-bearing capacity during use, thereby ensuring the safety of the project [7,8,9,10,11]. In addition, accurately predicting the compressive properties of concrete allows the mix ratio to be optimized, enabling a reasonable assessment of the structural health and ensuring that the design is tailored to the project’s specific needs. This approach helps achieve both optimal economic benefits and high project quality.
With the rapid development of computer science, artificial intelligence (AI) technology has gradually matured [12,13,14]. As an important branch of AI, machine learning (ML) can filter and organize massive amounts of information and discover the patterns within it, improving the efficiency of data utilization [15,16,17,18]. An ML-based algorithm can autonomously learn and optimize a prediction model from historical data and empirical knowledge, improving the accuracy of concrete compressive strength prediction [19]. For example, Lyngdoh et al. [9] developed a prediction model for concrete strengths enabled by missing data imputation and interpretable machine learning algorithms. Rahman et al. [20] proposed a data-driven approach for the shear strength prediction of steel-fiber-reinforced concrete beams using machine learning. Nguyen et al. [12] developed a deep neural network with high-order neurons for the prediction of foamed concrete strength. Charrier et al. [12] developed an artificial neural network for the prediction of the fresh properties of cementitious materials. However, a single machine learning model is prone to overfitting and insufficient prediction accuracy when predicting concrete compressive strength. Further analysis of the shortcomings of traditional machine learning models shows that their limited performance is partly caused by insufficient optimization of model parameters and insufficient capability to handle non-linear relationships.
To address the aforementioned challenges, this study introduces a novel method for predicting concrete compressive strength using hybrid ensemble learning and metaheuristic optimization algorithms. Ensemble machine learning models, which combine the predictions of multiple individual models, typically offer improved performance. They are more robust and accurate than single models because they capture diverse aspects of the data and mitigate overfitting [21,22]. Initially, we utilize several classic machine learning models as base models and optimize their parameters using the metaheuristic-based improved gray wolf algorithm. In the subsequent stage, we employ the light gradient boosting machine (LightGBM) model alongside a metaheuristic-based optimization algorithm to further integrate the information extracted from the base models. This approach helps to identify the primary factors influencing the compressive performance of concrete.
The contributions of this study are as follows:
(1). Multiple advanced machine learning models are used as base learners, with metaheuristic optimization algorithms determining the optimal parameters.
(2). A two-stage method for predicting concrete compressive strength is developed that combines a stacking ensemble learning strategy with the lightweight gradient boosting tree model, utilizing data fusion techniques.
(3). The effectiveness of the integrated learning and data fusion-based prediction model is thoroughly validated through both qualitative evaluations and quantitative calculations.
The remainder of this paper is organized as follows: Section 2 introduces the basic methodology of multiple machine learning soft computing methods. Section 3 details the sources and specific parameters of the concrete compressive strength dataset. Section 4 presents a series of evaluation experiments conducted to test the model’s performance in predicting concrete compressive strength. Finally, Section 5 describes the contributions of this study and the advantages of the developed model, including discussions on its limitations and plans for future research.
2. Methodology
This section first introduces the related basic theories of each base learner algorithm for predicting the compressive strength of concrete and then introduces the theory of the stacking-based ensemble learning strategy.
2.1. Statistical and ML-Based Algorithms
Multiple linear regression (MLR) is a statistical method that analyzes the relationship between multiple independent variables and a dependent variable [23]. Unlike simple linear regression, which considers only one predictor, MLR incorporates multiple predictors to create a linear equation that best predicts the dependent variable’s value. This technique is widely used in fields such as economics, social sciences, and finance to model and predict outcomes effectively.
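For illustration, a minimal MLR sketch using scikit-learn is given below; the feature matrix X and target y are randomly generated placeholders rather than the dataset used in this study.

```python
# Minimal MLR sketch with scikit-learn; X and y are random placeholders,
# not the concrete dataset used in this study.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((100, 6))                                  # six illustrative mix-proportion features
y = X @ np.array([3.0, 1.5, -2.0, 0.5, 1.0, 2.5]) + rng.normal(0, 0.1, 100)

mlr = LinearRegression().fit(X, y)                        # ordinary least-squares fit
print(mlr.coef_, mlr.intercept_)                          # fitted coefficients of the linear equation
```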
Support vector machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks, and is commonly applied in pattern recognition and data analysis. Figure 1 shows the working principle diagram of SVM. SVM is effective at handling both linearly separable and non-linearly separable data by finding the optimal hyperplane that maximally separates different classes in the data.
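A minimal support vector regression sketch follows; the RBF kernel and the C value are illustrative choices, not the tuned settings reported later in this paper.

```python
# SVR sketch (the regression form of SVM); kernel and C are illustrative.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((100, 6)); y = X.sum(axis=1) + rng.normal(0, 0.1, 100)

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, gamma="scale"))
svr.fit(X, y)                       # fits the maximum-margin regression hyperplane
print(svr.predict(X[:5]))
```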
Random forest (RF) is a widely used machine learning algorithm for both classification and regression tasks [24]. As depicted in Figure 2, it is an ensemble learning method that constructs multiple decision trees and combines their predictions to make a final decision. RF operates by randomly selecting a subset of features from the dataset to build each decision tree. The final prediction is made by aggregating the predictions from all the trees. This approach helps reduce overfitting and enhances the model’s accuracy.
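The bootstrap-and-aggregate idea can be sketched with scikit-learn as follows; the number of trees is an illustrative setting rather than the value tuned in this study.

```python
# Random forest regression sketch; n_estimators is an illustrative value.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 6)); y = X.sum(axis=1) + rng.normal(0, 0.1, 200)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, y)                        # each tree sees a bootstrap sample and a random feature subset
print(rf.predict(X[:5]))            # predictions are averaged over all trees
```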
Gaussian process regression (GPR) is a probabilistic machine learning technique used for regression tasks [25]. It relies on the concept of Gaussian processes, which are flexible non-parametric models capable of capturing complex patterns in data. In GPR, the objective is to estimate a function that maps input variables to output variables. This method models the function as a distribution over functions, characterized by its mean and covariance. The mean function represents the average behavior of the data, while the covariance function captures the relationships between different input points.
f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\right) (1)

m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})] (2)

k(\mathbf{x}, \mathbf{x}') = \mathbb{E}\left[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))\right] (3)

where $\mathbf{x}$ represents the variables that affect the strength of concrete, such as age, water–cement ratio, etc.; $m(\mathbf{x})$ represents the mean function; and $k(\mathbf{x}, \mathbf{x}')$ represents the covariance function. The basic purpose of GPR is to predict future concrete strength values by learning the response laws of the dam structure and inputting new environmental variable data. The observed strength $y$ is modeled as

y = f(\mathbf{x}) + \varepsilon (4)

where $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$ is assumed to be independent and identically distributed white noise and

\operatorname{cov}(y_p, y_q) = k(\mathbf{x}_p, \mathbf{x}_q) + \sigma_n^2 \delta_{pq} (5)

where $\delta_{pq}$ represents the Kronecker function. The joint prior distribution of the training outputs $\mathbf{y}$ and the test outputs $\mathbf{f}_*$ is

\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right) (6)

where $K(X, X)$ and $K(X_*, X_*)$ represent the covariance matrices of the training set and test set, respectively, and $K(X, X_*)$ represents the covariance matrix obtained from the training and test data sets. The posterior distribution can be derived by combining the prior distribution for the given training set and test set:

\mathbf{f}_* \mid X, \mathbf{y}, X_* \sim \mathcal{N}\left(\bar{\mathbf{f}}_*, \operatorname{cov}(\mathbf{f}_*)\right) (7)

where $\bar{\mathbf{f}}_*$ and $\operatorname{cov}(\mathbf{f}_*)$ can be denoted as follows:

\bar{\mathbf{f}}_* = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} \mathbf{y} (8)

\operatorname{cov}(\mathbf{f}_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*) (9)
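A minimal NumPy sketch of the posterior mean and covariance in Eqs. (8) and (9) is given below, assuming an RBF covariance function with illustrative hyperparameters; the inputs are placeholders, not the study's data.

```python
# NumPy sketch of the GPR posterior, Eqs. (8)-(9), with an RBF covariance
# function; data and hyperparameters are illustrative placeholders.
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, sigma_f=1.0):
    """k(x, x') = sigma_f^2 * exp(-||x - x'||^2 / (2 * length_scale^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f ** 2 * np.exp(-0.5 * d2 / length_scale ** 2)

rng = np.random.default_rng(0)
X_train = rng.random((50, 6))                              # placeholder influencing factors
y_train = X_train.sum(axis=1) + rng.normal(0, 0.1, 50)     # placeholder strengths
X_test = rng.random((10, 6))

sigma_n = 0.1                                              # noise level in Eq. (4)
K = rbf_kernel(X_train, X_train) + sigma_n ** 2 * np.eye(len(X_train))   # K(X, X) + sigma_n^2 I
K_s = rbf_kernel(X_train, X_test)                          # K(X, X*)
K_ss = rbf_kernel(X_test, X_test)                          # K(X*, X*)

K_inv = np.linalg.inv(K)
mean_post = K_s.T @ K_inv @ y_train                        # Eq. (8): posterior mean
cov_post = K_ss - K_s.T @ K_inv @ K_s                      # Eq. (9): posterior covariance
print(mean_post[:3], np.diag(cov_post)[:3])
```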
An artificial neural network (ANN) is a machine-learning algorithm inspired by the structure and function of the human brain [26]. As shown in Figure 3, the ANN model consists of layers of interconnected nodes or neurons that process information and learn from data through a process called training. ANNs are versatile and can be used for various tasks, including classification, regression, and pattern recognition.
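A minimal sketch with scikit-learn's multilayer perceptron is shown below; the layer sizes are illustrative assumptions, not the architecture tuned in this study.

```python
# Feed-forward ANN regression sketch; the hidden-layer sizes are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((300, 6)); y = X.sum(axis=1) + rng.normal(0, 0.1, 300)

ann = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
ann.fit(X, y)                       # weights are learned by backpropagation during training
print(ann.predict(X[:5]))
```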
LightGBM is a gradient-boosting framework that employs tree-based learning algorithms and is designed for efficiency and scalability, particularly with large-scale datasets [27]. As illustrated in Figure 4, the network architecture of the LightGBM model utilizes a leaf-wise tree growth strategy and histogram-based feature bundling to achieve faster training speeds and lower memory usage compared to other gradient-boosting frameworks. It supports categorical features and can handle missing data.
LightGBM creates an ensemble of decision trees using the gradient-boosting algorithm. Gradient boosting is an iterative process that combines multiple weak models (decision trees) into a strong predictive model by fitting each new model to the residuals (errors) of the previous models, thereby gradually reducing the overall error. With these optimizations, LightGBM can efficiently handle large datasets with many features and produce accurate predictions. It is widely used for various machine learning tasks, including classification and regression problems.
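A minimal LightGBM regression sketch is shown below; num_leaves controls the leaf-wise growth, and all parameter values are illustrative rather than the optimized settings of this study.

```python
# LightGBM regression sketch; leaf-wise growth is bounded by num_leaves.
# All parameter values are illustrative, not the study's tuned settings.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.random((500, 6)); y = X.sum(axis=1) + rng.normal(0, 0.1, 500)

gbm = lgb.LGBMRegressor(n_estimators=300, num_leaves=31, learning_rate=0.05)
gbm.fit(X, y)                       # each new tree fits the residuals of the current ensemble
print(gbm.predict(X[:5]))
```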
2.2. The Metaheuristic-Based Parameter Optimization Algorithm
The gray wolf optimization (GWO) algorithm is a metaheuristic optimization technique inspired by the social behavior of gray wolves [28]. It mimics the wolves’ hunting behavior to find optimal solutions for given problems. As depicted in Figure 5, the GWO algorithm’s social hierarchy consists of four categories of wolves: alpha, beta, delta, and omega. These wolves interact through hunting, encircling, and attacking behaviors, which are mathematically modeled to update the positions of candidate solutions. GWO has been effectively applied to various optimization problems in engineering, computer science, and other fields.
The first computational step of the GWO algorithm is to initialize the gray wolf population, i.e., to manually assign the algorithm's initial parameters, including the maximum number of iterations, the upper and lower boundaries of the machine learning parameter matrix being optimized, and the number of gray wolves in the wolf pack used by the model. The prediction accuracy of the machine learning model is defined as the objective function of the gray wolf algorithm.
The distribution of wolves in the gray wolf optimization (GWO) algorithm is determined by their positions relative to the prey, representing the combination of model parameters. These positions are either randomly generated or updated during the iteration process. Initially, when the GWO algorithm starts iterating, the wolves are scattered and randomly distributed within a defined range of parameter values. This random generation of parameter combinations serves as the initial position of the gray wolf population, signifying that the prey is surrounded.
The parameter combination represented by each gray wolf is then updated toward the best parameter combination estimated in the current iteration. The update process is expressed as follows:
D = \left| C \cdot X_p(t) - X(t) \right| (10)

where $D$ is the distance between the gray wolf individual and the prey, that is, the distance between the parameter combination being optimized in the current iteration and the currently estimated optimal parameter combination; $t$ is the current iteration step number; $X_p(t)$ is the position vector of the prey (because the true optimal parameter combination cannot be determined for the time being, the optimal parameter combination estimated at the current iteration step is used as an approximate replacement); $X(t)$ is the position vector of the gray wolf, which refers to the parameter combination used for optimization in the current step; and $C$ is a coefficient vector. After the current iteration is completed, a new parameter combination with a certain probability of improving the accuracy of the prediction model is generated from the currently estimated optimal parameter combination and used in the next iteration step. Its update formula is as follows:

X(t + 1) = X_p(t) - A \cdot D (11)

where $A$ is a coefficient vector and $X(t + 1)$ is the updated position vector of the gray wolf.
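A minimal sketch of the standard GWO update in Eqs. (10) and (11) is given below; it uses the usual alpha/beta/delta averaging and a simple sphere objective as a stand-in for the model-accuracy objective, and it does not reproduce the improved variant proposed in this paper.

```python
# Minimal standard GWO sketch implementing Eqs. (10)-(11); the sphere function
# stands in for the prediction-accuracy objective, and the improved variant
# used in this study is not reproduced here.
import numpy as np

def gwo(objective, dim, n_wolves=20, max_iter=100, lb=-10.0, ub=10.0):
    rng = np.random.default_rng(0)
    wolves = rng.uniform(lb, ub, (n_wolves, dim))        # random initial parameter combinations
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter                     # control parameter decreases from 2 to 0
        fitness = np.array([objective(w) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(fitness)[:3]]   # three current best wolves
        for i in range(n_wolves):
            new_pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                D = np.abs(C * leader - wolves[i])       # Eq. (10): distance to the estimated prey
                new_pos += (leader - A * D) / 3.0        # Eq. (11): position update (averaged)
            wolves[i] = np.clip(new_pos, lb, ub)
    fitness = np.array([objective(w) for w in wolves])
    return wolves[np.argmin(fitness)], fitness.min()

best_x, best_f = gwo(lambda x: np.sum(x ** 2), dim=4)    # e.g., four hyperparameters to tune
print(best_x, best_f)
```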
2.3. Stacking Ensemble Learning
Ensemble learning is an ML technique that combines multiple models to improve the overall performance of the system. The idea is that by combining the predictions of several models, we can reduce the risk of errors and improve the accuracy of the system. There are several ensemble learning methods, including bagging, boosting, and stacking [29]. Bagging involves training multiple models on different subsets of the data, while boosting involves training models sequentially and adjusting the weights of the data points based on their performance. Stacking involves combining the predictions of multiple models using another model as a meta-model.
The stacking ensemble learning model has a hierarchical structure, as shown in Figure 6. First, the original data are input into the first-layer base learners for training, and the corresponding prediction results are obtained. Then, the prediction results of the first layer are used as the input data of the second layer, and the meta-learner is trained to obtain the final prediction results. By introducing a meta-learner, the stacking model integrates the prediction results of multiple base learners, and its prediction performance is usually better than that of any single base learner.
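A minimal stacking sketch with scikit-learn is given below; the base learners and the ridge meta-learner are illustrative stand-ins for the GWO-tuned base models and LightGBM meta-model used in this study.

```python
# Stacking sketch: first-layer base learners feed a second-layer meta-learner.
# The estimators shown here are illustrative stand-ins, not the study's models.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.random((300, 6)); y = X.sum(axis=1) + rng.normal(0, 0.1, 300)

stack = StackingRegressor(
    estimators=[("svm", SVR()),
                ("rf", RandomForestRegressor(n_estimators=100, random_state=0))],
    final_estimator=Ridge(),        # meta-learner trained on the base-model predictions
    cv=5,                           # out-of-fold predictions reduce information leakage
)
stack.fit(X, y)
print(stack.predict(X[:5]))
```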
2.4. Evaluation Indicators
In this study, three evaluation indicators, namely the coefficient of determination (R2), mean absolute error (MAE), and mean square error (MSE), are utilized to evaluate the predictive model performance. They are calculated as follows:

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \quad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \quad \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 (12)

where $y_i$ and $\hat{y}_i$ denote the measured and predicted concrete strength values, respectively; $\bar{y}$ denotes the mean of the measured values; and $n$ denotes the number of samples.
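For reference, the three indicators in Eq. (12) can be computed directly with NumPy; the values below are placeholders.

```python
# NumPy sketch of the evaluation indicators in Eq. (12); values are placeholders.
import numpy as np

y_true = np.array([30.2, 42.1, 25.7, 38.4])    # measured strengths (illustrative)
y_pred = np.array([31.0, 40.8, 27.1, 37.9])    # predicted strengths (illustrative)

r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
mae = np.mean(np.abs(y_true - y_pred))         # mean absolute error
mse = np.mean((y_true - y_pred) ** 2)          # mean square error
print(r2, mae, mse)
```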
3. Case Study
3.1. Concrete Strength Prediction Database
This study combines an open-source dataset with physical model experiments to construct a compressive performance dataset for hydraulic concrete structures. The specific information on the public open-source dataset is as follows. The University of California, Irvine (UCI) hosts the UCI Machine Learning Repository, which is a valuable resource for researchers and ML enthusiasts. The repository contains various datasets that can be used for experimental purposes in ML and data analysis, covering a wide range of domains from medical data to image classification and concrete strength prediction.
To compensate for the limitations of the public dataset, physical model experiments are used for verification and supplementation. Figure 7 demonstrates the curing of the hydraulic concrete specimens and the evaluation of their mechanical properties. In total, 1050 data samples were used for model building, validation, and performance testing.
Table 1 shows the statistical description of the developed hydraulic concrete compressive strength dataset. Based on the above database, the main factors affecting the compressive strength of hydraulic concrete comprise six parameters: the specimen age, the amount of blast furnace slag relative to the total binder content, the amount of fine aggregate relative to the total binder content, the water content relative to the total binder content, the ratio of coarse aggregate to fine aggregate, and the amount of superplasticizer relative to the fine aggregate. The model output variable is the compressive strength of the hydraulic concrete after it reaches the required age.
3.2. Data Collection and Preprocessing
Figure 8 shows histograms of the distributions of the main variables that influence the properties of concrete materials. It can be seen from the figure that the distributions of the different influencing factors differ markedly in both shape and range. This shows that the factors affecting the performance of concrete materials are complex and diverse, and mining these influencing factors can provide a reference for predicting the performance and service life of concrete.
Figure 9 shows the Pearson coefficient correlation matrix of influencing factors and compressive strength. It should be noted that the larger the circle in the figure is, the stronger the correlation between the two variables is. It can be seen from the figure that the compressive strength of concrete is correlated with these factors. Therefore, the prediction of concrete material properties based on influencing factors is a typical nonlinear data mining problem.
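The correlation analysis behind Figure 9 can be reproduced in outline with pandas; the column names and values below are placeholders, not the study's dataset.

```python
# Pearson correlation sketch corresponding to Figure 9; data are placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, 4)),
                  columns=["cement", "water", "age", "strength"])   # illustrative columns
corr = df.corr(method="pearson")               # pairwise Pearson coefficients
print(corr["strength"].sort_values(ascending=False))
```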
3.3. Model Building and Evaluation
A total of 1030 data samples are divided into a training set and a test set at a ratio of 8:2. That is, 80% of the data (824 samples) are used to build the model, and the remaining 20% (206 samples) are used to evaluate the model's compressive strength prediction performance. The training and validation processes of the method used in this study were implemented on a computing workstation with an Intel i9-13900K CPU, a single NVIDIA GeForce RTX 4090 D GPU, and 128 GB of RAM. The proposed and benchmark algorithms were coded in Python and developed in the VS Code IDE.
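The 8:2 partition described above can be sketched as follows; the arrays are placeholders whose sizes simply mirror the 824/206 split.

```python
# 8:2 train/test split sketch mirroring the 824/206 partition; data are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1030, 6)); y = rng.random(1030)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))               # 824 training samples, 206 test samples
```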
Hyperparameters in ML are the settings or configurations that are not learned directly from the data but are set by the user before training a model. They affect the learning process and can significantly impact the performance of the model. After parameter optimization, the optimal hyperparameter settings of the model are as follows. For the SVM model, the optimal combination of hyperparameters is and . For the RF model, the optimal combination of hyperparameters is ,, and . For the GPR model, the optimal combination of hyperparameters is . For the ANN model, the optimal combination of hyperparameters is .
4. Analysis of the Experimental Results
4.1. Comparison of ML-Based Models
The training and testing of the base models directly affect the prediction performance of ensemble learning. Therefore, this study first examines the performance of single ML models in predicting concrete compressive strength.
Figure 10 shows a box plot of the results from the base models, and Table 2 reports their performance on the test set. It can be inferred from the figure and the table that ML-based algorithms predict concrete material properties significantly better than the traditional statistical regression method. This is mainly because traditional linear models based on statistical regression have great limitations when dealing with nonlinear problems such as concrete material performance prediction. Different ML algorithms also show certain differences in their predictions of concrete compressive strength. In terms of the MAE metric, the GPR, ANN, SVM, and RF soft computing algorithms achieve improvements of 59.51%, 38.23%, 43.78%, and 48.49%, respectively, over the classic statistical regression method; in terms of the MSE metric, they achieve improvements of 41.56%, 21.35%, 24.11%, and 26.23%, respectively. From the above analysis, it can be seen that introducing ML-based soft computing methods significantly reduces the error and improves the accuracy of concrete compressive strength predictions.
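As a cross-check, the MAE improvements quoted above can be recomputed from the Table 2 values with the short sketch below; the MSE figures are quoted as reported.

```python
# Recomputing the relative MAE improvements over MLR from the Table 2 values.
mae = {"MLR": 7.7456, "GPR": 3.1365, "ANN": 4.7842, "SVM": 4.3545, "RF": 3.9900}
for model in ("GPR", "ANN", "SVM", "RF"):
    reduction = (mae["MLR"] - mae[model]) / mae["MLR"] * 100
    print(f"{model}: {reduction:.2f}% lower MAE than MLR")
# GPR: 59.51%, ANN: 38.23%, SVM: 43.78%, RF: 48.49%
```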
4.2. Application of Ensemble Learning Models
Ensemble learning is a technique that combines multiple models to improve the overall performance of the system. Here are some common ensemble strategies:
Bagging involves training multiple models independently on different subsets of the training data and then combining their predictions by averaging or voting.
Boosting involves training multiple models iteratively, with each new model focusing on the examples that the previous models have struggled with.
Stacking involves training multiple models and then using their predictions as input to a higher-level model that learns how to combine them.
Cascading involves training a sequence of models, where the output of each model is used as input to the next one.
Based on actual engineering applications and practice, each strategy has its advantages and disadvantages, and the choice of strategy depends on the problem at hand and the characteristics of the data; a brief comparative sketch of the three strategies considered in Table 3 is given below.
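The sketch uses generic scikit-learn estimators and placeholder data, and it assumes a recent scikit-learn version for the `estimator` argument of BaggingRegressor; it is not the study's tuned pipeline.

```python
# Comparing bagging, boosting, and stacking with generic scikit-learn estimators;
# models and data are placeholders, not the tuned models reported in Table 3.
import numpy as np
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.svm import SVR
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((400, 6)); y = X.sum(axis=1) + rng.normal(0, 0.1, 400)

strategies = {
    "bagging": BaggingRegressor(estimator=SVR(), n_estimators=20, random_state=0),
    "boosting": GradientBoostingRegressor(random_state=0),
    "stacking": StackingRegressor(
        estimators=[("svm", SVR()), ("rf", RandomForestRegressor(random_state=0))],
        final_estimator=Ridge()),
}
for name, model in strategies.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R2 = {r2:.3f}")
```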
Figure 11 demonstrates the model parameter optimization process based on the metaheuristic optimization algorithm. Note that because the heuristic optimization algorithm proposed in this article takes the maximization of the objective function as its optimization goal, the negative mean square error is used as the loss function. It can be seen from the figure that, as the number of iterations increases, the value of the objective function gradually increases. The optimal result appears at iteration 93, with a value of −0.0814238.
Table 3 shows the comparison of the concrete compressive strength prediction results from different ensemble learning strategies. As can be seen in the table, compared with the other two ensemble learning strategies, the stacking-based ensemble learning strategy can better correct the problem of insufficient prediction accuracy caused by a single model. This is mainly due to the inconsistent concrete compression prediction effects of different single ML-based methods in different scenarios. The stacking ensemble learning strategy can make full use of the advantages of different benchmark models to achieve a more accurate prediction result.
Based on the abovementioned analysis, this study uses a stacking ensemble learning strategy for predictive model fusion. The proposed method uses the prediction results of the single independent learners described in the previous sections as model input and builds an ensemble learning model through a combination of metaheuristic optimization algorithms and gradient-boosting trees.
Figure 12 shows the integrated prediction results for the concrete compressive strength. It can be inferred from the figure that the values predicted by the proposed method are close to the measured values. Figure 13 presents a radar chart analysis of the prediction results from the different regression algorithms. For the R2 evaluation index, a larger value indicates better predictive performance; as can be seen in the figure, the ensemble learning strategy achieves a significantly higher R2 than the single ML-based methods. For the MAE and MSE evaluation indices, larger values indicate worse predictive performance; as can be seen in the figure, the ensemble learning strategy yields significantly smaller MAE and MSE values than the single ML-based methods.
4.3. Assessment of the Importance of Influencing Factors
Ensemble learning methods are also valuable for assessing the importance of influencing factors. This is because they combine the prediction results of multiple models, improving the accuracy and stability of the assessment. In practical applications, the estimation of influencing factors usually involves multiple models and algorithms, each of which may have certain errors and biases; by using the ensemble learning method, the advantages of multiple models can be combined to obtain more accurate and reliable importance estimates. Figure 14 shows the analysis of the importance of the factors affecting the compressive strength of concrete. As can be seen in the figure, the five parameters that most significantly affect the compressive strength of concrete are the amount of cement, the proportions of coarse and fine aggregate, the water–cement ratio, and the age of the concrete. These data mining results are consistent with the mechanical property test results for conventional concrete materials. This is mainly because cement is the main cementing material, and its dosage significantly affects the compressive strength of concrete. Coarse and fine aggregates occupy a large proportion of the concrete volume, and their relative proportions significantly affect its load-bearing capacity. The curing age is a key factor affecting the mechanical properties of concrete, and the water–cement ratio affects both the mechanical properties and the workability of concrete.
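The factor ranking in Figure 14 can be sketched with LightGBM's gain-based feature importance; the feature names and data below are illustrative placeholders rather than the study's dataset.

```python
# Gain-based feature importance sketch with LightGBM; names and data are placeholders.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
names = ["cement", "coarse_aggregate", "fine_aggregate",
         "water_cement_ratio", "superplasticizer", "age"]           # illustrative factor names
X = rng.random((500, len(names)))
y = X @ rng.random(len(names)) + rng.normal(0, 0.1, 500)

gbm = lgb.LGBMRegressor(n_estimators=200, importance_type="gain").fit(X, y)
for name, score in sorted(zip(names, gbm.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {score:.1f}")                                   # higher gain = more influential
```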
5. Conclusions
This study introduces a novel hybrid method for predicting the compressive strength of concrete, which is crucial for evaluating the structural integrity of hydraulic structures. By integrating multiple soft computing algorithms and leveraging the stacking ensemble learning strategy, the method addresses the common issues of overfitting and low prediction accuracy in single machine learning models. It was proven that the proposed hybrid ensemble learning method, which combines the LightGBM model with a metaheuristic-based optimization algorithm, outperforms other advanced predictive methods, achieving a high regression coefficient of 0.9329, a mean absolute error (MAE) of 2.7695, and a mean square error (MSE) of 4.0891. This demonstrates its superior accuracy in predicting concrete compressive strength. Thanks to this study, more reliable predictions of the compressive strength of concrete used in hydraulic structures are possible, thereby providing a tool for assessing the service life and structural damage status. The results also highlight the primary factors influencing concrete strength, offering valuable insights for both researchers and engineers in the field of water-related construction.
In future research, it will be necessary to conduct further indoor tests and concrete-related mechanical performance testing and evaluation based on the characteristics of hydraulic concrete structures themselves. In addition, we will work from multiple perspectives, such as internal structural damage identification and structural damage detection, to make further contributions to the field of the intelligent diagnosis and assessment of concrete structure damage.
Conceptualization, methodology, experiment, software, T.L. (Tianyu Li); validation, funding, X.H.; formal analysis, investigation, experiment, T.L. (Tao Li); methodology, L.M. and J.G.; validation, resources, J.L. and H.T. All authors have read and agreed to the published version of the manuscript.
Data are available upon request from the corresponding author.
Author Jinlong Gu was employed by the company Jiangsu Vast New Material Technology Co., Ltd. and Nantong Shengmao Building Materials Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Figure 7. Preparation of hydraulic concrete structure specimens and evaluation of the compressive performance.
Figure 8. Statistical description of the hydraulic concrete compressive strength dataset.
Figure 9. Correlation diagram of factors affecting the hydraulic concrete compressive strength.
Figure 11. Model parameter optimization process based on the improved metaheuristic optimization algorithm.
Figure 13. Radar chart of the prediction results from different regression algorithms.
Figure 14. Analysis of the main factors affecting the compressive strength of concrete.
The statistical description of the concrete compressive strength dataset.
Parameters | Mean | Standard Deviation | Minimum | Median | Maximum |
---|---|---|---|---|---|
Age of specimen | 45.66 | 63.17 | 1 | 28.00 | 365 |
Blast furnace slag | 73.90 | 86.28 | 0 | 22.00 | 359.4 |
Fly ash | 54.19 | 64.00 | 0 | 0.00 | 200.1 |
Superplasticizer | 6.20 | 5.97 | 0 | 6.4 | 32.20 |
Coarse aggregate | 972.92 | 77.75 | 801 | 968 | 1145 |
Fine aggregate | 773.58 | 80.18 | 594 | 779.50 | 992.6 |
Compressive strength/MPa | 0.46 | 0.13 | 0.22 | 0.47 | 0.90 |
The comparison of prediction results from different regression algorithms.
Model | R2 | MAE | MSE |
---|---|---|---|
MLR_pre | 0.4232 | 7.7456 | 9.7965 |
GPR_pre | 0.9136 | 3.1365 | 4.5268 |
ANN_pre | 0.8339 | 4.7842 | 6.0920 |
SVM_pre | 0.8522 | 4.3545 | 5.8779 |
RF_pre | 0.8399 | 3.9900 | 5.7142 |
Prediction results from different ensemble learning strategies.
Method | R2 | MAE | MSE |
---|---|---|---|
Bagging ensemble | 0.9052 | 3.34 | 4.743 |
Boosting ensemble | 0.9086 | 3.328 | 4.612 |
Stacking ensemble | 0.9329 | 2.770 | 4.089 |
References
1. Mahjoubi, S.; Barhemat, R.; Meng, W.; Bao, Y. AI-Guided Auto-Discovery of Low-Carbon Cost-Effective Ultra-High Performance Concrete (UHPC). Resour. Conserv. Recycl.; 2023; 189, 106741. [DOI: https://dx.doi.org/10.1016/j.resconrec.2022.106741]
2. Ren, Q.; Li, H.; Zheng, X.; Li, M.; Xiao, L.; Kong, T. Multi-Block Synchronous Prediction of Concrete Dam Displacements Using MIMO Machine Learning Paradigm. Adv. Eng. Inform.; 2023; 55, 101855. [DOI: https://dx.doi.org/10.1016/j.aei.2022.101855]
3. Li, M.; Si, W.; Ren, Q.; Song, L.; Liu, H. An Integrated Method for Evaluating and Predicting Long-Term Operation Safety of Concrete Dams Considering Lag Effect. Eng. Comput.; 2021; 37, pp. 2505-2519. [DOI: https://dx.doi.org/10.1007/s00366-020-00956-6]
4. Shu, J.; Yu, H.; Liu, G.; Yang, H.; Guo, W.; Phoon, C.; Alfred, S.; Hu, H. Proposing an Inherently Interpretable Machine Learning Model for Shear Strength Prediction of Reinforced Concrete Beams with Stirrups. Case Stud. Constr. Mater.; 2024; 20, e03350. [DOI: https://dx.doi.org/10.1016/j.cscm.2024.e03350]
5. Kashem, A.; Karim, R.; Das, P.; Datta, S.D.; Alharthai, M. Compressive Strength Prediction of Sustainable Concrete Incorporating Rice Husk Ash (RHA) Using Hybrid Machine Learning Algorithms and Parametric Analyses. Case Stud. Constr. Mater.; 2024; 20, e03030. [DOI: https://dx.doi.org/10.1016/j.cscm.2024.e03030]
6. Getahun, M.A.; Shitote, S.M.; Abiero Gariy, Z.C. Artificial Neural Network Based Modelling Approach for Strength Prediction of Concrete Incorporating Agricultural and Construction Wastes. Constr. Build. Mater.; 2018; 190, pp. 517-525. [DOI: https://dx.doi.org/10.1016/j.conbuildmat.2018.09.097]
7. Li, Q.; Song, Z. Prediction of Compressive Strength of Rice Husk Ash Concrete Based on Stacking Ensemble Learning Model. J. Clean. Prod.; 2023; 382, 135279. [DOI: https://dx.doi.org/10.1016/j.jclepro.2022.135279]
8. Gomaa, E.; Han, T.; ElGawady, M.; Huang, J.; Kumar, A. Machine Learning to Predict Properties of Fresh and Hardened Alkali-Activated Concrete. Cem. Concr. Compos.; 2021; 115, 103863. [DOI: https://dx.doi.org/10.1016/j.cemconcomp.2020.103863]
9. Lyngdoh, G.A.; Zaki, M.; Krishnan, N.M.A.; Das, S. Prediction of Concrete Strengths Enabled by Missing Data Imputation and Interpretable Machine Learning. Cem. Concr. Compos.; 2022; 128, 104414. [DOI: https://dx.doi.org/10.1016/j.cemconcomp.2022.104414]
10. Liang, M.; Chang, Z.; Wan, Z.; Gan, Y.; Schlangen, E.; Šavija, B. Interpretable Ensemble-Machine-Learning Models for Predicting Creep Behavior of Concrete. Cem. Concr. Compos.; 2022; 125, 104295. [DOI: https://dx.doi.org/10.1016/j.cemconcomp.2021.104295]
11. Almustafa, M.K.; Nehdi, M.L. Machine Learning Prediction of Structural Response of Steel Fiber-Reinforced Concrete Beams Subjected to Far-Field Blast Loading. Cem. Concr. Compos.; 2022; 126, 104378. [DOI: https://dx.doi.org/10.1016/j.cemconcomp.2021.104378]
12. Nguyen, T.; Kashani, A.; Ngo, T.; Bordas, S. Deep Neural Network with High-Order Neuron for the Prediction of Foamed Concrete Strength. Comput. Civ. Infrastruct. Eng.; 2019; 34, pp. 316-332. [DOI: https://dx.doi.org/10.1111/mice.12422]
13. Hou, C.; Wei, Y.; Zhang, H.; Zhu, X.; Tan, D.; Zhou, Y.; Hu, Y. Stress Prediction Model of Super-High Arch Dams Based on EMD-PSO-GPR Model. Water; 2023; 15, 4087. [DOI: https://dx.doi.org/10.3390/w15234087]
14. Lu, Q.; Gu, Y.; Wang, S.; Liu, X.; Wang, H. Deformation Field Analysis of Small-Scale Model Experiment on Overtopping Failure of Embankment Dams. Water; 2023; 15, 4309. [DOI: https://dx.doi.org/10.3390/w15244309]
15. Imran, H.; Ibrahim, M.; Al-Shoukry, S.; Rustam, F.; Ashraf, I. Latest Concrete Materials Dataset and Ensemble Prediction Model for Concrete Compressive Strength Containing RCA and GGBFS Materials. Constr. Build. Mater.; 2022; 325, 126525. [DOI: https://dx.doi.org/10.1016/j.conbuildmat.2022.126525]
16. Ahmad, A.; Ostrowski, K.A.; Maślak, M.; Farooq, F.; Mehmood, I.; Nafees, A. Comparative Study of Supervised Machine Learning Algorithms for Predicting the Compressive Strength of Concrete at High Temperature. Materials; 2021; 14, 4222. [DOI: https://dx.doi.org/10.3390/ma14154222]
17. Shahmansouri, A.A.; Yazdani, M.; Ghanbari, S.; Akbarzadeh Bengar, H.; Jafari, A.; Farrokh Ghatte, H. Artificial Neural Network Model to Predict the Compressive Strength of Eco-Friendly Geopolymer Concrete Incorporating Silica Fume and Natural Zeolite. J. Clean. Prod.; 2021; 279, 123697. [DOI: https://dx.doi.org/10.1016/j.jclepro.2020.123697]
18. Joshi, D.A.; Menon, R.; Jain, R.K.; Kulkarni, A.V. Deep Learning Based Concrete Compressive Strength Prediction Model with Hybrid Meta-Heuristic Approach. Expert Syst. Appl.; 2023; 233, 120925. [DOI: https://dx.doi.org/10.1016/j.eswa.2023.120925]
19. Li, Y.; Bao, T.; Li, T.; Wang, R. A Robust Real-Time Method for Identifying Hydraulic Tunnel Structural Defects Using Deep Learning and Computer Vision. Comput. Civ. Infrastruct. Eng.; 2022; 38, pp. 1381-1399. [DOI: https://dx.doi.org/10.1111/mice.12949]
20. Rahman, J.; Ahmed, K.S.; Khan, N.I.; Islam, K.; Mangalathu, S. Data-Driven Shear Strength Prediction of Steel Fiber Reinforced Concrete Beams Using Machine Learning Approach. Eng. Struct.; 2021; 233, 111743. [DOI: https://dx.doi.org/10.1016/j.engstruct.2020.111743]
21. Zhong, H.; Lv, Y.; Yuan, R.; Yang, D. Bearing Fault Diagnosis Using Transfer Learning and Self-Attention Ensemble Lightweight Convolutional Neural Network. Neurocomputing; 2022; 501, pp. 765-777. [DOI: https://dx.doi.org/10.1016/j.neucom.2022.06.066]
22. Cai, R.; Han, T.; Liao, W.; Huang, J.; Li, D.; Kumar, A.; Ma, H. Prediction of Surface Chloride Concentration of Marine Concrete Using Ensemble Machine Learning. Cem. Concr. Res.; 2020; 136, 106164. [DOI: https://dx.doi.org/10.1016/j.cemconres.2020.106164]
23. Kang, F.; Liu, X.; Li, J. Temperature Effect Modeling in Structural Health Monitoring of Concrete Dams Using Kernel Extreme Learning Machines. Struct. Health Monit.; 2020; 19, pp. 987-1002. [DOI: https://dx.doi.org/10.1177/1475921719872939]
24. Breiman, L. Random Forests. Mach. Learn.; 2001; 45, pp. 5-32. [DOI: https://dx.doi.org/10.1023/A:1010933404324]
25. Williams, C.; Rasmussen, C. Gaussian Processes for Regression. Advances in Neural Information Processing Systems 8; Bradford Books: Denver, CO, USA, 1995.
26. Zou, J.; Han, Y.; So, S.-S. Overview of Artificial Neural Networks. Artificial Neural Networks; Methods in Molecular Biology Humana Press: Totowa, NJ, USA, 2009; pp. 14-22.
27. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv. Neural Inf. Process. Syst.; 2017; 30, 52.
28. Xu, Y.; Qian, W.; Li, N.; Li, H. Typical Advances of Artificial Intelligence in Civil Engineering. Adv. Struct. Eng.; 2022; 25, pp. 3405-3424. [DOI: https://dx.doi.org/10.1177/13694332221127340]
29. Feng, D.C.; Liu, Z.T.; Wang, X.D.; Jiang, Z.M.; Liang, S.X. Failure Mode Classification and Bearing Capacity Prediction for Reinforced Concrete Columns Based on Ensemble Machine Learning Algorithm. Adv. Eng. Inform.; 2020; 45, 101126. [DOI: https://dx.doi.org/10.1016/j.aei.2020.101126]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Concrete is the material of choice for constructing hydraulic structures in water-related buildings, and its mechanical properties are crucial for evaluating the structural damage state. Machine learning models have proven effective in predicting these properties. However, a single machine learning model often suffers from overfitting and low prediction accuracy. To address this issue, this study introduces a novel hybrid method for predicting concrete compressive strength by integrating multiple soft computing algorithms and the stacking ensemble learning strategy. In the initial stage, several classic machine learning models are selected as base models, and the optimal parameters of these models are obtained using the improved metaheuristic-based gray wolf algorithm. In the subsequent stage, the light gradient boosting machine (LightGBM) model and the metaheuristic-based optimization algorithm are combined to integrate information from base models. This process identifies the primary factors affecting concrete compressive strength. The experimental results demonstrate that the hybrid ensemble learning and heuristic optimization algorithm achieve a regression coefficient of 0.9329, a mean absolute error (MAE) of 2.7695, and a mean square error (MSE) of 4.0891. These results indicate superior predictive performance compared to other advanced methods. The proposed method shows potential for application in predicting the service life and assessing the structural damage status of hydraulic concrete structures, suggesting broad prospects.
1 School of Civil Engineering, Sanjiang University, Nanjing 210012, China
2 School of Civil Engineering, Sanjiang University, Nanjing 210012, China
3 Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
4 Jiangsu Vast New Material Technology Co., Ltd., Nantong 202158, China;