Abstract
Accurate geomechanical parameters are key to stability evaluation, disaster forecasting, structural design, and support optimization. Intelligent back analysis based on monitored information is widely recognized as the most efficient and cost-effective technique for inverting such parameters. To address the low accuracy of measured data and the scarcity of comprehensive datasets, this study proposes a back analysis framework tailored for small sample sizes. The framework combines data augmentation with advanced optimization and machine learning techniques. An auxiliary classifier generative adversarial network (ACGAN)-based data augmentation algorithm is first employed to generate synthetic yet realistic samples that follow the underlying probability distribution of the original data, thereby expanding the dataset and mitigating the impact of small sample sizes. Optimized particle swarm optimization (OPSO) integrated with a support vector machine (SVM) is then used to mine the nonlinear relationships between input and output variables. A case study is used to examine the validity of the augmented data and the performance of the OPSO-SVM algorithm trained on two different sample sizes. Results show that the new datasets generated by ACGAN almost coincide with the actual monitored convergences, with a correlation coefficient exceeding 0.86. The superiority of the OPSO-SVM algorithm is also demonstrated by comparing the displacement prediction capability of various algorithms through four indices. The relative error of the predicted displacements falls from almost 20% for the OPSO-SVM model trained with 25 samples to 5% for the model trained with 625 samples. Finally, the inverted parameters and corresponding convergences predicted by the two OPSO-SVM models trained with different samples are discussed, indicating the feasibility of combining ACGAN and OPSO-SVM in the back analysis of geomechanical parameters. This work not only advances underground engineering analysis in scenarios with limited data, but also serves as a reference for researchers and practitioners.
Keywords: Back analysis; Machine learning; Data augmentation; Geomechanical parameters
(ProQuest: ... denotes formulae omitted.)
1 Introduction
The geomechanical parameters of the surrounding rock mass are of vital importance for the layout, planning, design, and construction of support systems in underground engineering. Because of the randomness, complexity, and spatial variability of geological masses, parameters estimated from field or laboratory experiments are not representative, leading to inaccurate stability evaluation and unreliable support system design. Numerical simulation has been a promising tool for stability analysis and structural behavior prediction under real geological conditions (Liu et al., 2023). However, it requires numerous accurate geological parameters, such as elastic modulus, cohesion, and Poisson's ratio. Therefore, such parameters must be estimated accurately to construct a realistic numerical model for further analysis.
The back analysis method, which estimates the geomechanical parameters by forward and backward procedures, is considered one of the most useful methods. The idea is first to establish the intrinsic relationship between the geomechanical parameters and the monitoring data, and then to adjust and update the geomechanical parameters against the actual monitoring data so as to minimize the fitness function, normally the error between the monitored and calculated data. Generally, back analysis methods can be subdivided into three categories (Qi & Fourie, 2018): analytical or semi-analytical methods (Bertuzzi, 2017), numerical simulations (Luo et al., 2018), and artificial intelligence (AI)-based methods (Li et al., 2023b; Chang et al., 2023). The analytical or semi-analytical methods are based on the traditional convergence-confinement methods, which rest on many assumptions that are hard to meet (Chu et al., 2019). Numerical simulations, including the finite element method (FEM), discrete element method (DEM), and finite discrete element method (FDEM), estimate the geomechanical parameters by trial and error. This requires repeated computation to obtain the calculated data, which is then compared with the actual monitoring data (Manzanal et al., 2016). If the error between them satisfies the minimum error threshold, the geological parameters corresponding to the computed data are regarded as an approximation of the actual condition. Otherwise, the geomechanical parameters are updated and the computation is repeated. This process is undoubtedly time-consuming and redundant. AI-based back analysis methods fully utilize the benefits of machine learning algorithms (Mahmoodzadeh et al., 2021; Gao, 2020; Zhang et al., 2021; Kashani et al., 2021).
In the back-forward procedure, machine learning (ML) algorithms are trained on the dataset obtained from the forward analysis to explore the highly nonlinear relationship between the monitoring data and the geomechanical parameters. Nevertheless, the performance and generalization of the trained ML model depend mainly on the quality of the provided dataset. With small samples, it appears impossible to capture the high-order relationship without overfitting (Ding et al., 2020).
Currently, data augmentation algorithms include geometric transformations, window cropping (Iwana & Uchida, 2021), random forest (RF) (Zhang et al., 2022), support vector machine (SVM), and some neural network-based algorithms (Goodfellow et al., 2014; Radford et al., 2015; Zhu et al., 2017), such as artificial neural networks (ANNs) (Bani-Hani, 2007; Ni & Li, 2016), generative adversarial networks (GANs), deep neural networks (DNNs), and convolutional neural networks (CNNs) (Oh et al., 2020; Zhang et al., 2019). In particular, the deep convolutional generative adversarial network (DCGAN) has been shown to perform excellently in mining the high-dimensional and nonlinear distribution characteristics of the relationship between the input and output data (Mao et al., 2021). Conditional GAN (CGAN) added class label constraints to the original GAN, making the generated samples more controllable (Mirza & Osindero, 2014). Building on CGAN, the auxiliary classifier generative adversarial network (ACGAN) enhances sample generation quality by incorporating label information into the input of the generator, allowing for the production of multi-mode and high-quality samples (Zou et al., 2020). Additionally, the discriminator in ACGAN is capable of discerning both sample authenticity and category (Odena et al., 2017). GANs have been widely applied in various fields, especially in image processing, natural language processing, and anomaly detection (Fahimi et al., 2020; Lee et al., 2019). By learning the structural probability distribution of the samples, GANs can generate a new dataset that follows the distribution characteristics of the real sample data. The above algorithms are mainly applied to sequential data or image data (Argilaga, 2023; Kim et al., 2023).
For datasets relating the geomechanical parameters of the surrounding rock mass to the stability response, the numerical experimental groups of geomechanical parameters are usually obtained from Latin hypercube sampling (McKay et al., 1979), orthogonal experimental design (Zhao et al., 2020), uniform design (Xiong et al., 2019), or random generation (Sun et al., 2023). They are discrete and may be independent. Specifically, the stability response values of the surrounding rock mass used for back analysis are normally deterministic values at a certain time, such as the values computed when a specific floor of an underground powerhouse has been excavated and the mechanical response is stable (Li et al., 2023a). Because numerical simulation is time-consuming and measured data are of low accuracy, even a combination of data from various sources cannot meet the requirements of machine learning algorithms. It is therefore essential to enhance discrete datasets by fully mining the probability distribution of the real data, and an improvement based on ACGAN is well suited to discrete data augmentation.
In this study, to address the inherent challenges in underground engineering analysis, particularly the time-consuming nature of numerical simulations, the low accuracy of measured data, and the scarcity of comprehensive datasets, we propose a back analysis framework tailored for small sample sizes. The scarcity of rich sequential data during actual construction poses a significant barrier to the application of advanced machine learning algorithms. To overcome this limitation, we introduce a multi-faceted back analysis approach that combines data augmentation with advanced optimization and machine learning techniques. The ACGAN-based algorithms are first employed to generate synthetic yet realistic samples that follow the underlying probability distribution of the original data, thereby expanding the dataset and mitigating the impact of small sample sizes. Then, the optimized particle swarm optimization (OPSO)-SVM algorithm is used to mine the nonlinear relationship between the input and output data. This hybrid OPSO-SVM algorithm serves as a surrogate not only for uncovering these relationships but also for optimizing the geomechanical parameters to minimize the discrepancy between calculated and monitored displacement values. As a case study, the proposed framework is verified on a representative underground engineering project by comparing the inversion accuracy. This work not only contributes to the advancement of underground engineering analysis under limited data conditions but also provides a valuable reference for researchers and practitioners.
2 Methodology
2.1 Convolutional neural network
The concept of CNN was first proposed by Fukushima (Fukushima, 1980). In 1989, Lecun et al. improved it to solve the handwritten character recognition problem (Lecun et al., 1989). Results showed that the improved network performed much better than any other technique (Lecun & Bottou, 1998). A CNN can be regarded as an improvement on the conventional artificial neural network (ANN). Matrix-format data whose entries depend on one another, like the pixels of an image, benefit from the hierarchical feature extraction of CNNs. For data whose features vary in distribution and scale, CNNs can learn to handle such variations effectively, leading to more robust models. They are also well suited to data with spatial relationships or complex patterns that traditional models may struggle to capture. Unlike a traditional ANN, the hidden layers of a CNN consist of convolution, pooling, activation, and fully connected layers, which together with the input and output layers form the CNN structure. In particular, the alternating convolution and pooling layers are the key modules for extracting high-dimensional features. The size of the convolution kernel (length × width × depth, called the filter) determines how the original matrix-format data is divided into blocks. As the filter moves across the input feature maps, arithmetic operations between each block and the filter transform the original data into new feature maps. Lastly, the feature maps of the convolution and pooling layers are merged by a fully connected layer, and the output features are then processed by the activation layer and passed to the output layer. The typical architecture of a CNN is shown in Fig. 1. All input to a CNN must be processed into matrix form with pixels.
Each pixel has multiple channels corresponding to different kinds of information. For example, a color digital image has three channels, each representing one of the primary colors, generally red, green, and blue. In each channel, the image forms a matrix-format pixel map of a single color. Thus, each pixel of a color image includes three values corresponding to the intensities in the three channels (Jiang et al., 2023). For non-image discrete data, different types of variables at the same position in the surrounding rock mass can be treated as channels, with the variable values as the intensities of each channel.
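The sliding-filter operation described above can be illustrated with a minimal one-dimensional sketch in plain Python. The function name `conv1d` and the example kernel are illustrative assumptions, not part of the study; a real CNN layer would also learn the kernel weights and add a bias.

```python
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float],
           stride: int = 1, pad: int = 0) -> List[float]:
    """Slide `kernel` over a zero-padded 1-D signal.

    This is the cross-correlation used in CNN layers: each output value is
    the sum of element-wise products between the kernel and one block.
    """
    padded = [0.0] * pad + [float(v) for v in signal] + [0.0] * pad
    k = len(kernel)
    out = []
    # Move the filter across the padded input in steps of `stride`.
    for start in range(0, len(padded) - k + 1, stride):
        block = padded[start:start + k]
        out.append(sum(w * x for w, x in zip(kernel, block)))
    return out


# A single row of four values, e.g. one 1 x 4 x 1 sample (hypothetical numbers).
row = [1.0, 2.0, 3.0, 4.0]
same_size = conv1d(row, [1.0, 1.0, 1.0], stride=1, pad=1)   # length preserved
downsampled = conv1d(row, [1.0, 1.0, 1.0], stride=2, pad=1) # length halved
```

With a kernel of size 3 and one zero on each side, a stride of 1 keeps the output the same length as the input, while a stride of 2 halves it.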
2.2 Generative adversarial network
Drawing on minimax game theory, the GAN is a contest system containing two players (networks): a generator and a discriminator (Goodfellow et al., 2014). The generator produces new data from a noise sample, and the discriminator judges whether given data is real or fake (see Fig. 2). Correspondingly, the generator is updated until the generated data can trick the discriminator, and the discriminator is trained until real and fake data can be told apart. The ultimate aim of GANs is to reach Nash equilibrium through the competition of the generator and discriminator during training. Let x be a given data sample, z the noise data, D( ) the discriminator network, and G( ) the generator network. After training on the input data and corresponding output data, the generator network produces forged data G(z) for the given noise data. Then, the discriminator network judges the authenticity of the given data. Thus, D(G(z)) is the probability that the discriminator network judges the generated data to be real. On the one hand, the generator network is expected to generate data very close to the true output information. On the other hand, the discriminator is expected to distinguish the real information from the fake. This can be regarded as a game in which the discriminator attempts to maximize the discrimination accuracy (lg D(x)) and the generator tries to minimize the probability (lg(1 − D(G(z)))) of its data being identified as fake through interactive computations. The loss function of the training process is
... (1)
where E represents the expectation over the specified data distribution; p_z(z) is the noise distribution; p_data is the distribution of the real data without noise.
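Equation (1) is omitted in this copy. For orientation, the minimax objective of Goodfellow et al. (2014), which the description above appears to follow, is commonly written as below using the symbols of this section (with lg in place of log). This is a reference sketch and the published Eq. (1) may differ in notation.

```latex
\min_{G}\max_{D} V(D,G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\lg D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\lg\bigl(1 - D(G(z))\bigr)\bigr]
```

The first term rewards the discriminator for assigning high probability to real data, and the second rewards it for rejecting generated data; the generator acts only on the second term.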
2.3 Auxiliary classifier generative adversarial network
ACGAN is an extension of CGAN (Odena et al., 2017), which receives the class conditions as input along with the data and classifies the dataset into different classes. It also replaces the multilayer perceptron networks of the generator and discriminator with CNN architectures (Radford et al., 2015), resulting in more robust feature extraction. Incorporating class labels into the conditional generative adversarial network (CGAN) architecture directs the generator to produce data corresponding to predefined class labels. Unlike CGAN, ACGAN does not provide the class label directly to the discriminator. Rather, the discriminator is trained to predict the class labels of the data it assesses. This entails not only discerning the authenticity of the data but also accurately identifying the class label of the provided data (Odena et al., 2017). The loss functions of source and label are defined as
... (2)
... (3)
where P(S = real | x_real) and P(S = generated | x_generated) are the probabilities that the real data is judged to be true and the fake data is identified as false, respectively; P(Class = c | x_real) and P(Class = c | x_generated) represent the category probabilities of the real and generated data, respectively. Training maximizes L_Source + L_Class for the discriminator and L_Class − L_Source for the generator.
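Equations (2) and (3) are omitted in this copy. In Odena et al. (2017), the source and class log-likelihoods take the following form, written here with this section's symbols; it is offered as a reference sketch rather than a transcription of the omitted equations.

```latex
L_{\mathrm{Source}}
  = \mathbb{E}\bigl[\log P(S = \mathrm{real} \mid x_{\mathrm{real}})\bigr]
  + \mathbb{E}\bigl[\log P(S = \mathrm{generated} \mid x_{\mathrm{generated}})\bigr]

L_{\mathrm{Class}}
  = \mathbb{E}\bigl[\log P(\mathrm{Class} = c \mid x_{\mathrm{real}})\bigr]
  + \mathbb{E}\bigl[\log P(\mathrm{Class} = c \mid x_{\mathrm{generated}})\bigr]
```

Both networks gain from a high class term, so the generator is pushed to produce samples whose class is recognizable, while the two networks pull in opposite directions on the source term.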
ACGAN has been shown to have an outstanding capability to augment data and resolve data imbalance (Ding et al., 2022; Li et al., 2022). Unlike the original GAN, the structure of ACGAN is built from convolution layers, batch normalization layers, and activation layers. The convolution layer is used for feature extraction. The batch normalization layer centers the data and improves the training efficiency. Activation is applied to all layers, with the LeakyReLU function used in every layer except the output layers of the generator and discriminator networks. The Sigmoid and softmax activation functions are used in the output layer of the discriminator to identify the authenticity and the category of the provided sample, respectively. Figure 3 shows the general structure of the ACGAN used in this study. For discrete data, the original data is first reconstructed into a matrix with the shape of 1 × n × 1 for a single sample. The value of the nth pixel is the value of the nth variable. The input of the generator is the combination of noise data and multiple label information for the various variables, and the output is the generated values of the samples. For the discriminator, the input is data of the same scale as the original data, and the output includes the authenticity that the discriminator judges for the input data and the predicted labels of the variables.
2.4 Framework of the comprehensive back analysis technique
Particle swarm optimization (PSO) is one of the most popular algorithms for solving optimization problems in geotechnical engineering (Khatti et al., 2024). It is inspired by animal foraging behavior and was first proposed by Eberhart and Kennedy (1995). Because the learning ability of traditional PSO is not adjustable and its search area shrinks during iteration, we previously improved the algorithm in two respects, yielding the optimized particle swarm optimization (OPSO) (Li et al., 2023b). The SVM method has been shown to perform well in feature extraction from small datasets. However, the generalization of the SVM model is mainly affected by the hyperparameters of the algorithm. Tuning them by trial and error is undoubtedly time-consuming and inefficient. The estimation of the hyperparameters can instead be regarded as an optimization problem, which can be solved with the aid of OPSO. Therefore, the combination of OPSO and SVM is used to optimize the geomechanical parameters corresponding to the real structural response.
As mentioned above, ACGAN has the potential to generate additional data consistent with the inherent regularity of the initial samples. The generated samples enrich the training database to avoid over- or underfitting. Based on the augmented database, the back analysis method OPSO-SVM is employed to mine the complex relationship and then obtain the optimal parameter set through the intelligent optimization algorithm. On this basis, a multi-faceted back analysis approach combining data augmentation with advanced optimization and machine learning techniques is proposed to invert the rock parameters corresponding to the monitored data. The framework is presented in Fig. 4 and comprises three phases: data generation in phase I, hyperparameter optimization in phase II, and geomechanical parameter estimation in phase III. In Fig. 4, C is the penalty coefficient of the objective function; σ² is the coefficient of the kernel function; ε is the slack variable.
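The parameter estimation step can be illustrated with a minimal sketch: given a forward model that maps parameters to displacements, the inverted parameters are those that minimize the error against the monitored values. Everything here is an assumption for illustration. `toy_forward` is a hypothetical stand-in for the trained surrogate, the candidate values are made up, and an exhaustive grid search replaces the OPSO search used in the study.

```python
import itertools
from typing import Callable, Iterable, List, Sequence, Tuple

Params = Tuple[float, ...]


def fitness(predicted: Sequence[float], monitored: Sequence[float]) -> float:
    """Sum of squared errors between calculated and monitored displacements."""
    return sum((p - m) ** 2 for p, m in zip(predicted, monitored))


def back_analyze(forward: Callable[[Params], List[float]],
                 candidates: Iterable[Params],
                 monitored: Sequence[float]) -> Tuple[Params, float]:
    """Return the candidate parameter set with the smallest fitness value."""
    best = min(candidates, key=lambda p: fitness(forward(p), monitored))
    return best, fitness(forward(best), monitored)


def toy_forward(params: Params) -> List[float]:
    """Hypothetical surrogate: stiffer, stronger rock gives smaller convergence."""
    e_mod, cohesion = params
    return [10.0 / e_mod + 1.0 / cohesion, 5.0 / e_mod + 2.0 / cohesion]


# Hypothetical candidate grid: elastic modulus (GPa) x cohesion (MPa).
grid = list(itertools.product([6.0, 7.0, 8.0, 9.0, 10.0],
                              [2.0, 2.5, 3.0, 3.5, 4.0]))
monitored = toy_forward((8.0, 3.0))  # pretend these were measured on site
best_params, best_error = back_analyze(toy_forward, grid, monitored)
```

Because the "monitored" values were produced by the same toy model, the search recovers the parameters that generated them with zero error; with real monitoring data the minimum would be nonzero.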
3 Case study
3.1 Data source
To demonstrate the performance of the proposed data augmentation method and the back analysis technique based on the OPSO-SVM algorithm, a case study is introduced in this section. The case is a tunnel project in which the tunnel is buried in an interlayered soft-hard surrounding rock mass. The diameter of the tunnel is 16 m with a buried depth of 230 m. The FEM is used to calculate the mechanical response. To eliminate the boundary effect, the model is built as 180 m (width) × 200 m (length) × 150 m (height). The behavior of the surrounding rock mass is based on elastoplastic theory and follows the Mohr-Coulomb failure criterion. The four sides of the model are constrained in the normal direction, and the bottom side is restrained in three directions. The overburden effect of the rock mass 200 m above the upper surface is simulated by a compressive stress of 5 MPa. The elastic moduli and cohesions are regarded as the parameters to be inverted. The search ranges and the settings of the other parameters are all tabulated in Table 1.
To verify the feasibility of the numerical model and constitutive relations used in this article, we compare the numerical deformation characteristics of the interlayered soft-hard rock mass under lateral coefficients of 0.6, 1.0, and 1.5 with in situ ones. As shown in Fig. 5, the lateral coefficient has a vital influence on the deformation distribution of the rock mass. The distribution of deformation is clearly asymmetric, showing the soft rock squeezing toward the free surface. For example, when the horizontal stress is 1.5 times the vertical stress, the excavated tunnel is squeezed into an ellipse along the vertical orientation, resulting from the horizontal convergence of the soft rock mass. In contrast, when the horizontal stress is 0.6 times the vertical stress, the maximum deformation of the surrounding rock mass is located in the soft rock atop the tunnel, resulting in an elliptical tunnel section along the horizontal direction. The deformation features are consistent with the in situ observations reported by other researchers (Blumling et al., 2007; Chen et al., 2019; Xu et al., 2020).
As a common and accessible monitoring parameter, the displacement of the surrounding rock mass resulting from the tunnel excavation is used for the back analysis of geomechanical parameters. Four monitoring points in the soft and hard rocks are selected in this study, whose positions are shown in Fig. 6, and the orthogonal experiment design is adopted to generate the parameter groups for the monitored displacement data. Because the numerical analysis is computationally expensive, only 25 parameter groups are generated, and the corresponding convergence responses at the four monitoring points are recorded.
3.2 Data preprocessing
Table 2 presents the 25 samples generated by the orthogonal experiment design. As shown in this table, each group of geomechanical parameters corresponds to four convergence values at the specific monitoring points. In this study, the training process of ACGAN aims to generate sufficient and precise datasets that follow the distribution of the original data. As mentioned above, the input of ACGAN is commonly matrix-format data with pixel values. To generate new data that obeys the logical relationship of the provided data records, each result can be regarded as a parameter combination with four labels. That is, for each numerical model, the four convergences are rebuilt into an image with the size of 1 × 4 × 1 (one row, four columns, and one channel in depth), and correspondingly, this data belongs to a four-label classification. Furthermore, each geomechanical parameter is divided into five levels to carry out the orthogonal experiment; thus, the values of each parameter are labeled as five classes. For example, as shown in Fig. 7 and Table 3, each pixel of the input data is the corresponding convergence value of a geomechanical parameter set, and each geomechanical parameter set has its own multi-label. Specifically, as shown in Table 3, for the elastic modulus of hard rock (Label 1), levels 0 to 4 represent the values of 6, 7, 8, 9, and 10 GPa, respectively; for the cohesion of hard rock (Label 2), levels 0 to 4 represent the values of 2.0, 2.5, 3.0, 3.5, and 4.0 MPa, respectively; for the elastic modulus of soft rock (Label 3), levels 0 to 4 represent the values of 1.0, 1.5, 2.0, 2.5, and 3.0 GPa, respectively; for the cohesion of soft rock (Label 4), levels 0 to 4 represent the values of 0.60, 0.95, 1.30, 1.65, and 2.00 MPa, respectively. Therefore, the feature extraction of ACGAN becomes the problem of training on the parameter groups with their labels, and the generation of new data is then the prediction of the generator when given a label combination.
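The labeling scheme above can be sketched in a few lines of plain Python. The level values are taken from the text; the names `LEVELS`, `decode`, and `to_image` are illustrative assumptions. Enumerating every combination of four labels with five levels each gives 5^4 = 625 combinations, which matches the size of the generated dataset reported in this paper, although the text does not state explicitly that the augmented set is the full factorial.

```python
import itertools
from typing import List, Sequence, Tuple

# Level values per label, as listed in the text (levels 0 to 4).
LEVELS = {
    "E_hard_GPa": [6.0, 7.0, 8.0, 9.0, 10.0],      # Label 1
    "c_hard_MPa": [2.0, 2.5, 3.0, 3.5, 4.0],       # Label 2
    "E_soft_GPa": [1.0, 1.5, 2.0, 2.5, 3.0],       # Label 3
    "c_soft_MPa": [0.60, 0.95, 1.30, 1.65, 2.00],  # Label 4
}


def decode(labels: Sequence[int]) -> Tuple[float, ...]:
    """Map a four-label combination (each 0..4) to physical parameter values."""
    return tuple(values[level] for values, level in zip(LEVELS.values(), labels))


def to_image(convergences: Sequence[float]) -> List[List[List[float]]]:
    """Reshape four convergences into a nested list of shape 1 x 4 x 1."""
    if len(convergences) != 4:
        raise ValueError("expected one convergence per monitoring point")
    return [[[float(c)] for c in convergences]]


# Every possible label combination a trained generator could be conditioned on.
ALL_LABELS = list(itertools.product(range(5), repeat=4))
```

For example, `decode((0, 4, 2, 1))` returns the parameter set with 6 GPa and 4.0 MPa for the hard rock and 2.0 GPa and 0.95 MPa for the soft rock.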
3.3 Results and verification
3.3.1 Verification of the data augmentation technique
The size of the filter in the generator is 3, the stride is 2, and zero padding is applied to keep the sizes of the feature maps consistent. The output values of each layer are activated by the LeakyReLU function and then fed to the next layer. Batch normalization layers are used in both the generator and the discriminator to improve the training efficiency. For the discriminator, the size of the convolution kernels is 3, and the convolution stride is 2. Dropout layers with a dropout ratio of 20% follow each convolution layer to avoid overfitting. The output layers give the probability that the input data is real and the four labels, activated by the Sigmoid and Softmax functions, respectively. The detailed structure can be seen in Fig. 3. After 10,000 iterations with the Adam optimizer, the generator performs well in generating authentic data and the discriminator performs well in judging the authenticity of the input data. The evolution curves of the discriminator and generator loss functions during training are shown in Fig. 8, which indicates that the loss values decrease rapidly at the initial training stage and then decrease gradually as the number of iterations increases. The loss functions become stable only after many iterations, a behavior strongly affected by the hyperparameters of the proposed model. For faster convergence, the hyperparameters need to be estimated through optimization algorithms, which we leave for further study.
To evaluate the model performance, five indices are used: the root mean squared error (RMSE), determination coefficient (R²), relative error (σ), variance accounted for (VAF), and A-20 index (IA-20). They are defined as follows:
... (4)
... (5)
... (6)
... (7)
... (8)
where ŷ_i and y_i represent the predicted and real values, respectively; ȳ is the mean of the real values; n is the number of sample points; m20 is the number of samples whose predicted values fall within 0.8 to 1.2 times the true values; N is the number of given samples.
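Since Eqs. (4) to (8) are omitted in this copy, the sketch below implements the five indices using their commonly used definitions, which may differ in detail from the published formulas. The function names are illustrative, and the relative error and A-20 index assume nonzero (here positive) true values.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Determination coefficient: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_error(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Per-sample relative error in percent."""
    return [abs(p - t) / abs(t) * 100.0 for t, p in zip(y_true, y_pred)]


def _variance(values: Sequence[float]) -> float:
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def vaf(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Variance accounted for, in percent."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return (1.0 - _variance(residuals) / _variance(y_true)) * 100.0


def a20(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of samples whose prediction lies within 0.8 to 1.2 times the truth."""
    m20 = sum(1 for t, p in zip(y_true, y_pred) if 0.8 <= p / t <= 1.2)
    return m20 / len(y_true)
```

A perfect prediction gives an RMSE of 0, R² of 1, VAF of 100, and an A-20 index of 1; the indices move away from these values as predictions degrade.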
When the ACGAN algorithm is used to generate new data that follows the original nonlinear mapping relationship, the training dataset is used to train the ACGAN model, and the testing dataset is used to validate the trained model. To investigate the impact of different split ratios on the generation capability of ACGAN, we split the initial dataset at ratios of 8:2, 7:3, and 6:4. All parameter settings for the ACGAN algorithm remain the same throughout this study. The generated samples are compared with the provided initial data, and the relative error (σ) is analyzed and presented in Fig. 9. With a split ratio of 8:2, the relative error is consistently lower across all monitoring points than with 7:3 or 6:4. This suggests that a smaller quantity of training data is insufficient for the algorithm to capture the inner relationships among the variables. Therefore, we divide the numerical dataset into training and testing datasets at a ratio of 8:2.
Figure 10 compares the true values with those generated by the ACGAN algorithm. In these figures, the horizontal coordinates (x) are the true convergences calculated by the numerical method, and the vertical coordinates (y) represent the values generated by the trained ACGAN model. The red lines are the best-fitting lines through the points composed of true and generated convergences. The green lines represent the lines on which x equals y. For the convergence at monitoring point S1, all the sample points lie near the 45° diagonal with an R² value of 0.96, which indicates remarkable agreement between the two sets of results. For the convergence at monitoring point H2, the sample points deviate slightly from the 45° diagonal but still follow the y = x line, with an R² value of 0.86 and an RMSE value of 0.43. Overall, the generated values agree remarkably well with the corresponding provided values, with a mean R² value of nearly 0.94. Moreover, the slopes of the best-fitting lines are all close to 1 with a correlation coefficient no less than 0.9, indicating the feasibility of the data augmentation technique for generating data without redundant numerical analysis.
The relative error (Eq. (6)) is also used to assess the performance of the data augmentation techniques (ACGAN, AE, and SVM algorithms). As shown in Fig. 11, for ACGAN, the relative error between the augmented data and the original 25 samples is almost entirely within 15%, and the average relative errors at the four points are all below 6%, smaller than those of the other two algorithms, demonstrating the ability of ACGAN to generate valid synthetic data.
3.3.2 Performance of intelligent models trained by various datasets
To validate the superiority of the proposed OPSO-SVM algorithm in data prediction, the performances of commonly used machine learning algorithms are compared based on four indices. Owing to length limits, the detailed theories are not repeated here and can be found in the cited studies. The algorithms compared, including multiple linear regression (MLR) (Montgomery et al., 2012), the OPSO-SVM algorithm (Li et al., 2023a), the PSO-artificial neural network (ANN) algorithm (McCulloch & Pitts, 1943; LeCun et al., 2015), and the ensemble algorithm integrating LightGBM (Ke et al., 2017; Liang et al., 2020), XGBoost (Chen & Guestrin, 2016; Zhu et al., 2024), CatBoost (Hancock & Khoshgoftaar, 2020) and NGBoost algorithms (Duan et al., 2020), are trained on the 25 original samples and the 625 generated samples. The 25 samples are partitioned into training and testing datasets at a ratio of 8:2, while the 625 samples are divided into training, validation and testing datasets at a ratio of 6:2:2. To evaluate the trained models fairly, all parameter settings other than the sample size are kept the same for each algorithm. The search ranges of the hyperparameters of the SVM algorithm are C = [0.01, 1000], σ² = [0.1, 10], and ε = [10⁻⁸, 0.1]. The parameters of the OPSO algorithm are set as follows: the learning rates are 2, and the minimum and maximum inertia weights are 0.4 and 0.9, respectively. The customized time node is set to 150, and the search is terminated after 200 iterations. The initial particle number is 20, and 30% of the particles are reinitialized at each iteration. Default parameter settings are used for the MLR, LightGBM, XGBoost, CatBoost and NGBoost algorithms. The ANN is designed with four fully connected layers, in which PSO is applied to optimize the number of neurons. For fairness, the PSO hyperparameters and the particle characteristics are the same as those set in the OPSO-SVM algorithm.
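To make the listed settings concrete, the following is a minimal particle swarm sketch that uses the stated learning rates (2), inertia weight range (0.9 down to 0.4), 20 particles, 200 iterations and 30% reinitialization per iteration. It is not the OPSO implementation of Li et al. (2023a). Several choices are assumptions made only for illustration: the inertia weight decreases linearly, the worst 30% of particles (by current fitness) are the ones reinitialized, velocities are clamped to half the search range, C and ε are searched on a log10 scale, and a toy quadratic objective stands in for the SVM validation error. The customized time node is not modelled because its role is not defined in this section.

```python
import random

def opso_minimize(objective, bounds, n_particles=20, n_iter=200,
                  c1=2.0, c2=2.0, w_max=0.9, w_min=0.4,
                  reinit_frac=0.3, seed=0):
    """PSO sketch with decreasing inertia weight and partial reinitialization."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_position():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    pos = [random_position() for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for it in range(n_iter):
        # Assumed schedule: linear decrease from w_max to w_min.
        w = w_max - (w_max - w_min) * it / max(n_iter - 1, 1)
        cur_val = []
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v_max = 0.5 * (hi - lo)          # assumed velocity clamp
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            cur_val.append(val)
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        # Assumed rule: reinitialize the worst 30% of particles.
        n_reinit = int(reinit_frac * n_particles)
        worst = sorted(range(n_particles), key=cur_val.__getitem__,
                       reverse=True)[:n_reinit]
        for i in worst:
            pos[i] = random_position()
            vel[i] = [0.0] * dim
    return gbest, gbest_val

# Search ranges from the text, with C and epsilon on a log10 scale (assumption):
# log10(C) in [-2, 3], sigma^2 in [0.1, 10], log10(epsilon) in [-8, -1].
BOUNDS = [(-2.0, 3.0), (0.1, 10.0), (-8.0, -1.0)]

def toy_objective(p):
    """Hypothetical stand-in for the SVM validation error."""
    return (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2 + (p[2] + 3.0) ** 2

best, best_val = opso_minimize(toy_objective, BOUNDS, seed=1)
print("best (log10 C, sigma^2, log10 eps):", best, "objective:", best_val)
```

In an actual application, the objective would train an SVM with the candidate hyperparameters and return its prediction error on held-out data.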
Comparisons of the performance of the six algorithms are shown in Fig. 12. For each algorithm, the model trained with the augmented dataset performs much better than that trained with the original, insufficient dataset, which indicates the feasibility and robustness of the proposed data augmentation method for underground engineering data. Moreover, owing to the strength of the SVM model on small datasets and the capability of the OPSO algorithm in hyperparameter optimization, the OPSO-SVM algorithm achieves higher R², VAF, and IA-20 values than the other algorithms. In addition, thanks to the augmented dataset, the OPSO-SVM model trained with 625 samples outperforms the other seven models, with the lowest RMSE and the highest R², VAF, and IA-20 values.
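For readers unfamiliar with the less common indices, a brief sketch is given below. The definitions are assumptions, since the section does not restate them: VAF is taken as the variance accounted for, (1 − Var(y − ŷ)/Var(y)) × 100, and IA-20 is taken to be an a20-type index, that is, the fraction of samples whose predicted-to-true ratio lies within 0.8 to 1.2. The data are hypothetical.

```python
def variance(values):
    """Population variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def vaf(y_true, y_pred):
    """Variance accounted for, in percent (assumed definition)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return (1.0 - variance(residuals) / variance(y_true)) * 100.0

def a20_index(y_true, y_pred):
    """Fraction of predictions within +/-20% of the true value (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if 0.8 <= p / t <= 1.2)
    return hits / len(y_true)

# Hypothetical true and predicted convergences (mm).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 19.0, 31.0, 50.0]

print("VAF (%):", vaf(y_true, y_pred))
print("a20-type index:", a20_index(y_true, y_pred))
```

Under these definitions, higher values of both indices indicate better predictions, which is consistent with how the indices are read in the comparison above.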
Figures 13 and 14 compare the monitored convergence values with the predicted ones on the training, testing, and validation datasets, in which the horizontal axis (x-axis) represents the monitored convergence used for training the OPSO-SVM model, and the vertical axis (y-axis) represents the values predicted by the OPSO-SVM models trained with different quantities of data. The results show that the prediction performance of the OPSO-SVM model trained with 25 samples is clearly worse than that of the model trained with 625 samples. For example, for the former model, the points representing the convergence values at monitoring point S1 on the training dataset are distributed along the 1:1 line with an R² value of 0.97 and an RMSE value of 0.97. Moreover, as shown in Fig. 13 (c), the predicted convergence values at point H1 on the training dataset deviate considerably from the diagonal, with the best-fitting line being y = 0.88x + 1.92. In addition, the R² of the points with respect to the y = x line is only 0.82. For the model trained with 625 samples, the monitored and predicted values almost coincide with the y = x line, regardless of whether they belong to the training, validation, or testing datasets. Figure 14 (a) shows the results on the three datasets for the convergence values at point S1. The best-fitting line is essentially y = x, and the mean R² and RMSE of the points with respect to this line are 0.998 and 0.243, respectively. Furthermore, in contrast to the model trained with 25 samples, the predicted convergence values at all points are consistent with the monitored values, with R² very close to 1, indicating promising predictive ability. The relative errors of the two models on the training, validation and testing datasets are also compared in Figs. 15 and 16. The relative errors of the model trained with 25 samples are almost all within 20%.
By contrast, the relative errors of the model trained with 625 samples are mostly no more than 5% on the training, testing and validation datasets. Furthermore, the statistics of the relative errors of the two models on the training dataset are shown in Fig. 17. They reveal that the prediction capability of the model trained with 25 samples is much more unstable, and that this model has a higher mean error than the model trained with 625 samples. Specifically, the predicted relative error is reduced from 8.54% to 0.78% for point S2, and the mean relative error decreases from 5.2% to 0.82%. In this regard, the OPSO-SVM algorithm shows satisfactory potential for displacement prediction based on small datasets. In addition, the quantity of samples is a key factor affecting the prediction capability and generalization. A large training dataset helps the algorithm to mine more of the inner information of the data and to build highly nonlinear mapping relationships among the variables.
3.3.3 Back analysis of geomechanical parameters
Displacement back analysis is a robust approach for the rapid and accurate estimation of geomechanical parameters. These parameters, which are critical for the stability assessment of surrounding rock masses during construction, are often difficult to obtain experimentally. They encompass elastic properties, plasticity characteristics, damage parameters, rheological properties, etc. The determination of geomechanical parameters through back analysis can be formulated as an optimization problem, with the objective of minimizing the errors between the calculated and observed displacements. Thus, the optimal values of these parameters are those yielding calculated displacements that closely match the observed ones. Back analysis iteratively searches for a set of geomechanical parameters, namely X* = [x*_1, x*_2, ..., x*_m]^T, such that the following formula is approximately satisfied:
... (9)
where f(·) is the fitness function, u*_i denotes the true displacement at the ith monitoring point, and u_i(X) represents the calculated displacement at the ith monitoring point for a specific set of geomechanical parameters X = [x_1, x_2, ..., x_m]^T (m is the number of geomechanical parameters to be inverted).
In this study, the true values of the geomechanical parameters are assumed to be 8 GPa, 3.2 MPa, 2.6 GPa, and 1.6 MPa, and the corresponding monitored convergences are 11.28, 17.94, 9.09, and 5.40 mm, respectively. With the two trained OPSO-SVM models, the redundant and time-consuming numerical analysis is replaced by a surrogate model. Then, the OPSO algorithm is applied to estimate the optimal set of geomechanical parameters. Finally, the back analysis capabilities of the two models are discussed. As shown in Fig. 18, compared with the assumed true geomechanical values, the inversion based on the model trained with 625 samples augmented by the ACGAN algorithm performs much better than that based on the 25 numerical samples, yielding more precise inverted parameters. This indicates the importance of the quantity and quality of the training datasets. The relative error between the predicted and monitored convergence values also indicates the superiority of the model trained with 625 samples (as shown in Fig. 19). Note that the parameter settings of the OPSO algorithm are the same for the two cases, and the only difference lies in the sample size used for model training. Meanwhile, the relative errors corresponding to the inverted geomechanical parameters show an obvious discrepancy. The average relative error based on 25 samples is nearly 0.69%, whereas that based on 625 samples is only 0.01%, almost 70 times smaller than the former, indicating better inverted parameters.
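The surrogate-based inversion loop described above can be sketched in a few lines of Python. Everything except the monitored convergences and the assumed true parameter values is hypothetical: the toy surrogate (a power-law function constructed to reproduce the monitored values at the true parameters) stands in for the trained OPSO-SVM model, the fitness is assumed to be the sum of squared relative errors, the search bounds are invented, and a simple random local search with step adaptation is used in place of OPSO to keep the example short.

```python
import random

# Values quoted in the text.
MONITORED = [11.28, 17.94, 9.09, 5.40]     # monitored convergences (mm)
TRUE_PARAMS = [8.0, 3.2, 2.6, 1.6]         # GPa, MPa, GPa, MPa

# Hypothetical search bounds and sensitivity exponents for the toy surrogate.
BOUNDS = [(4.0, 12.0), (1.0, 5.0), (1.0, 4.0), (0.5, 3.0)]
SENS = [[1.0, 0.2, 0.1, 0.1],
        [0.2, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.2],
        [0.1, 0.1, 0.2, 1.0]]

def toy_surrogate(x):
    """Hypothetical stand-in for the trained model: parameters -> convergences."""
    out = []
    for target, row in zip(MONITORED, SENS):
        u = target
        for xj, tj, a in zip(x, TRUE_PARAMS, row):
            u *= (tj / xj) ** a
        out.append(u)
    return out

def fitness(x):
    """Assumed fitness: sum of squared relative errors at the monitoring points."""
    return sum(((u - m) / m) ** 2 for u, m in zip(toy_surrogate(x), MONITORED))

def random_local_search(f, bounds, n_iter=3000, seed=0):
    """Simple stand-in optimizer (not OPSO): accept a candidate only if it improves."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for lo, hi in bounds]
    fx = f(x)
    start_fx = fx
    step = 0.2                                # fraction of each parameter range
    for _ in range(n_iter):
        cand = [min(hi, max(lo, xi + rng.gauss(0.0, step * (hi - lo))))
                for xi, (lo, hi) in zip(x, bounds)]
        fc = f(cand)
        if fc < fx:
            x, fx = cand, fc
            step *= 1.5                       # enlarge step after a success
        else:
            step *= 0.9                       # shrink step after a failure
    return x, fx, start_fx

inverted, final_fit, start_fit = random_local_search(fitness, BOUNDS)
print("inverted parameters:", inverted)
print("fitness: start", start_fit, "-> final", final_fit)
```

Because each surrogate evaluation is cheap, thousands of candidate parameter sets can be tested without rerunning the numerical model, which is the practical benefit of replacing the numerical analysis with a trained model.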
4 Conclusions
This paper proposes a multi-faceted back analysis approach that combines data augmentation with advanced optimization and machine learning techniques to address the scarcity of rich sequential data during actual construction. To mitigate the risk of over- or under-fitting, we employ the ACGAN algorithm to generate a new database that preserves the inherent relationships within the initial datasets. We validate the feasibility of the data augmentation technique and investigate the performance of the developed OPSO-SVM algorithm for different sample sizes. Our primary conclusions are as follows.
(1) The developed ACGAN model exhibits satisfactory performance in augmenting discrete databases comprising geomechanical parameters and corresponding convergence values of surrounding rock mass.
(2) A larger training database is conducive to modeling the relationships between geomechanical parameters and displacement values.
(3) The OPSO-SVM model outperforms the MLR, PSO-ANN, and ensemble algorithms in predicting displacements of the surrounding rock mass.
(4) The back analysis method combining the augmentation algorithm with the OPSO-SVM machine learning technique shows robust potential on limited original datasets.
5 Limitations and further research
This study has some limitations, and they are illustrated as follows.
(1) Only the convergence values of the surrounding rock mass are used to invert the geomechanical parameters, neglecting the influence of strain, stress, and energy dissipation.
(2) The proposed back analysis method is a deterministic inversion method, which neglects the randomness and diversity of the geomechanical parameters.
(3) The proposed model is based on an artificial numerical dataset and ignores the noise present in experimental or field data.
Accordingly, a probabilistic back analysis method considering more experimental or in situ monitored variables will be developed in our future research.
CRediT authorship contribution statement
Hui Li: Writing - review & editing, Writing - original draft, Visualization, Validation, Methodology, Conceptualization. Weizhong Chen: Writing - review & editing, Validation, Supervision, Methodology, Conceptualization. Xianjun Tan: Writing - review & editing, Supervision.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work is supported by the National Natural Science Foundation of China (Grant Nos. 51991392 and 51922104).
* Corresponding author.
E-mail address: [email protected] (W. Chen).
Peer review under the responsibility of Tongji University
References
Argilaga, A. (2023). Fractal informed generative adversarial networks (FIGAN): Application to the generation of X-ray CT images of a self-similar partially saturated sand. Computers and Geotechnics, 158, 105384.
Bani-Hani, K. A. (2007). Vibration control of wind-induced response of tall buildings with an active tuned mass damper using neural networks. Structural Control and Health Monitoring, 14(1), 83–108.
Bertuzzi, R. (2017). Back-analysing rock mass modulus from monitoring data of two tunnels in Sydney, Australia. Journal of Rock Mechanics and Geotechnical Engineering, 9(5), 877–891.
Blumling, P., Bernier, F., Lebon, P., & Martin, C. D. (2007). The excavation damaged zone in clay formations time-dependent behavior and influence on performance assessment. Physics and Chemistry of the Earth, 32, 588–599.
Chang, X. Y., Wang, H., & Zhang, Y. M. (2023). Back analysis of rock mass parameters in tunnel engineering using machine learning techniques. Computers and Geotechnics, 163, 105738.
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16), San Francisco, California, USA (pp. 785–794). Association for Computing Machinery.
Chen, W. Z., Tian, Y., Wang, X. H., Tian, H. M., Cao, H. X., & Xie, H. D. (2019). Squeezing prediction of tunnel in soft rocks based on modified [BQ]. Rock and Soil Mechanics, 40(8), 321513133 (in Chinese).
Chu, Z., Wu, Z., Liu, B., & Liu, Q. (2019). Coupled analytical solutions for deep-buried circular lined tunnels considering tunnel face advancement and soft rock rheology effects. Tunnelling and Underground Space Technology, 94, 103111.
Ding, H., Chen, L., Dong, L., Fu, Z., & Cui, X. (2022). Imbalanced data classification: A KNN and generative adversarial networks-based hybrid approach for intrusion detection. Future Generation Computer Systems, 131, 240–254.
Ding, J., Wang, M., Ping, Z., Fu, D., & Vassiliadis, V. S. (2020). An integrated method based on relevance vector machine for short-term load forecasting. European Journal of Operational Research, 287(2), 497–510.
Duan, T., Avati, A., Ding, D. Y., Thai, K. K., Basu, S., Ng, A., & Schuler, A. (2020). NGBoost: Natural gradient boosting for probabilistic prediction. In Proceedings of the 37th International Conference on Machine Learning, PMLR 119 (pp. 2690–2700). PMLR.
Fahimi, F., Dosen, S., Ang, K. K., Mrachacz-Kersting, N., & Guan, C. (2020). Generative adversarial networks-based data augmentation for brain-computer interface. IEEE Transactions on Neural Networks and Learning Systems, 32(9), 4039–4051.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
Gao, M. Y., Zhang, N., Shen, S. L., & Zhou, A. (2020). Real-time dynamic earth-pressure regulation model for shield tunneling by integrating GRU deep learning method with GA optimization. IEEE Access, 31(8), 64310–64323.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, P., Warde-Farley, D., Ozair, S., Aaron, C., & Bengio, Y. (2014). Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Dec. 8–13, 2014 (pp. 2672–2680). Association for Computing Machinery.
Hancock, J. T., & Khoshgoftaar, T. M. (2020). CatBoost for big data: An interdisciplinary review. Journal of Big Data, 7(1), 94.
Iwana, B. K., & Uchida, S. (2021). An empirical survey of data augmentation for time series classification with neural networks. PLoS One, 16(7), e0254841.
Jiang, S. H., Zhu, G. Y., Wang, Z. Z., Huang, Z. T., & Huang, J. S. (2023). Data augmentation for CNN-based probabilistic slope stability analysis in spatially variable soils. Computers and Geotechnics, 160, 105501.
Kashani, A. R., Raymond, C., Seyedali, M., & Gandomi, A. H. (2021). Particle swarm optimization variants for solving geotechnical problems: Review and comparative analysis. Archives of Civil and Mechanical Engineering, 28(3), 1871–1927.
Ke, G. L., Meng, Q., Finley, T., Wang, T. F., Chen, W., Ma, W. D., Ye, Q. W., & Liu, T. Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (pp. 3149–3157). Curran Associates Inc.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of ICNN'95 - International Conference on Neural Networks, Perth, WA, Australia (pp. 1942–1948). IEEE Xplore.
Khatti, H., Grover, K. S., Kim, H. J., Mawuntu, K. B., & Park, T. W. (2024). Prediction of ultimate bearing capacity of shallow foundations on cohesionless soil using hybrid LSTM and RVM approaches: An extended investigation of multicollinearity. Computers and Geotechnics, 165, 105912.
Kim, Y., Lim, S. Y., Kim, K. Y., & Yun, T. S. (2023). Prediction of compressional wave velocity of cement-reinforced soil from core images using a convolutional neural network regression model. Computers and Geotechnics, 153, 105067.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
LeCun, Y., & Bottou, L. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lee, M. B., Kim, Y. H., & Park, K. R. (2019). Conditional generative adversarial network-based data augmentation for enhancement of iris recognition accuracy. IEEE Access, 7, 122134–122152.
Li, H., Chen, W. Z., & Tan, X. J. (2023a). Displacement-based back analysis of mitigating the effects of displacement loss in underground engineering. Journal of Rock Mechanics and Geotechnical Engineering, 15(10), 2626–2638.
Li, H., Chen, W. Z., Tan, X. Y., & Tan, X. J. (2023b). Back analysis of geomechanical parameters for rock mass under complex geological conditions using a novel algorithm. Tunnelling and Underground Space Technology, 136, 105099.
Li, W., Zhong, X., Shao, H., Cai, B., & Yang, X. (2022). Multi-mode data augmentation and fault diagnosis of rotating machinery using modified ACGAN designed with new framework. Advanced Engineering Informatics, 52, 101552.
Liang, W. Z., Luo, S. Z., Zhao, G. Y., & Wu, H. (2020). Predicting hard rock pillar stability using GBDT, XGBoost, and LightGBM algorithms. Mathematics, 8(5), 765.
Liu, Q. S., Lei, Y. M., Yin, X., Lei, J. S., Pan, Y. C., & Sun, L. (2023). Development and application of a novel probabilistic back-analysis framework for geotechnical parameters in shield tunneling based on the surrogate model and Bayesian theory. Acta Geotechnica, 18(9), 4899–4921.
Luo, Y., Chen, J., Chen, Y., Diao, P., & Qiao, X. (2018). Longitudinal deformation profile of a tunnel in weak rock mass by using the back analysis method. Tunnelling and Underground Space Technology, 71, 478–493.
Mahmoodzadeh, A., Mohammadi, M., Abdulhamid, S. N., Ibrahim, H. H., Ali, H. F. H., & Salim, S. G. (2021). Dynamic reduction of time and cost uncertainties in tunneling projects. Tunnelling and Underground Space Technology, 109, 103774.
Manzanal, D., Drempetic, V., Haddad, B., Pastor, M., Martin, S. M., & Mira, P. (2016). Application of a new rheological model to rock avalanches: An SPH approach. Rock Mechanics and Rock Engineering, 49(6), 2353–2372.
Mao, J., Wang, H., & Spencer, B. F. (2021). Toward data anomaly detection for automated structural health monitoring: Exploiting generative adversarial nets and autoencoders. Structural Control and Health Monitoring, 20(4), 1609–1626.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133.
McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2), 239–245.
Mirza, M., & Osindero, S. (2014). Conditional generative adversarial net. Computer Science, 2672–2680.
Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to linear regression analysis. John Wiley & Sons.
Ni, Y., & Li, M. (2016). Wind pressure data reconstruction using neural network techniques: A comparison between BPNN and GRNN. Measurement, 88, 468–476.
Odena, A., Olah, C., & Shlens, J. (2017). Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning, Sydney NSW Australia, PMLR 70 (pp. 2642–2651). PMLR.
Oh, B. K., Glisic, B., Kim, Y., & Park, H. S. (2020). Convolutional neural network-based data recovery method for structure health monitoring. Structural Control and Health Monitoring, 19(6), 1821.
Qi, C., & Fourie, A. (2018). A real-time back-analysis technique to infer rheological parameters from field monitoring. Rock Mechanics and Rock Engineering, 51(10), 3029–3043.
Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434. https://arxiv.org/abs/1511.06434.
Sun, J. L., Wu, S. C., Wang, H., Wang, T., Geng, X. J., & Zhang, Y. J. (2023). Inversion of surrounding rock mechanical parameters in a soft rock tunnel based on a hybrid model EO-LightGBM. Rock Mechanics and Rock Engineering, 56(9), 6691–6707.
Xiong, Q., Li, Z., Luo, H., & Zhao, Z. (2019). Wind tunnel test study on wind load coefficients variation law of heliostat based on uniform design method. Solar Energy, 184, 209–229.
Xu, G. W., He, C., Wang, J., & Zhang, J. B. (2020). Study on the damage evolution of secondary tunnel lining in layered rock stratum. Bulletin of Engineering Geology and the Environment, 79(7), 1697041.
Zhang, F., Jiang, A., Guo, X., & Yang, X. (2021). Creep parameter inversion and long-term stability analysis of tunnel based on GP-DE intelligent algorithm. In Advances in Materials Science and Engineering (pp. 3769474).
Zhang, Y. Q., Tang, Z. Y., & Yang, R. J. (2022). Data anomaly detection for structural health monitoring by multi-view representation based on local binary patterns. Measurement, 202, 111804.
Zhang, Y., Miyamori, Y., Mikami, S., & Saito, T. (2019). Vibration-based structural state identification by a 1-dimensional convolutional neural network. Computer-Aided Civil and Infrastructure Engineering, 34(9), 822–839.
Zhao, B. Y., Wang, X. P., Zhang, C., Li, W. C., Abbassi, R., & Chen, K. (2020). Structural integrity assessment of shield tunnel crossing of a railway bridge using orthogonal experimental design. Engineering Failure Analysis, 114, 104594.
Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of International Conference on Computer Vision, Venice, Oct 22–29, 2017 (pp. 2242–2251). IEEE.
Zhu, M. L., Peng, H., Liang, M., Song, G. X., Huang, N. H., Xie, W. W., & Han, Y. (2024). RC-XGBoost based mechanical parameters back analysis of rock mass in heavily fractured tunnel: A case in Yunnan, China. Rock Mechanics and Rock Engineering, 57(4), 2997–3019.
Zou, L., Zhang, H., Wang, C., Wu, F., & Gu, F. (2020). MW-ACGAN: Generating multiscale high-resolution SAR images for ship detection. Sensors, 20(22), 6673.
© 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.